- 💼 Principal AI Researcher at Together AI, creator and lead of TGL, the company's proprietary inference engine.
- 🧑‍💻 I initiated and led the end-to-end DeepSeek V3/R1 effort on SGLang, from day-0 support and performance optimization to large-scale EP deployment and GB200 NVL72 integration, driving roadmap, coordination, and execution across community collaborations that pushed the frontier of open-source inference engines at the time.
- 🎤 Interviewed by The New York Times (Article 1, Article 2); featured speaker at AI Engineer World's Fair 2025, AMD AI DevDay 2025, and PyTorch Conference 2025.
- 📄 Co-author of the FlashInfer paper (MLSys 2025 Best Paper) and a committer to FlashInfer. Previously, I was Lead Software Engineer at Baseten (co-authored the DeepSeek V3 and Qwen 3 launches) and led CTR GPU inference and vector retrieval system development at Meituan.
- 🧩 My journey with SGLang has evolved from being one of the first core developers, to leading inference optimization efforts, to eventually taking on a core maintainer role to support its next phase of growth.
- 📫 Contact: me@zhyncs.com | Telegram | LinkedIn | Homepage
- Bay Area, CA (UTC -07:00)
- https://zhyncs.com
- https://orcid.org/0009-0006-7743-2508
- @zhyncs42
Pinned:
- sgl-project/sglang — SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer — FlashInfer: Kernel Library for LLM Serving