- 💼 Principal AI Researcher at Together AI, creator and lead of TGL, the company's proprietary inference engine.
- 🧑‍💻 I initiated and led the end-to-end DeepSeek V3/R1 effort on SGLang, from day-0 support and performance optimization to large-scale EP deployment and GB200 NVL72 integration, driving roadmap, coordination, and execution across community collaborations that pushed the frontier of open-source inference engines at the time.
- 🎤 Interviewed by The New York Times (Article 1, Article 2); featured speaker at AI Engineer World's Fair 2025, AMD AI DevDay 2025, and PyTorch Conference 2025.
- 📄 Co-author of the FlashInfer paper (MLSys 2025 Best Paper) and a committer to FlashInfer. Previously, I was Lead Software Engineer at Baseten (co-authored the DeepSeek V3 and Qwen 3 launches) and led CTR GPU inference and vector retrieval system development at Meituan.
- 🧩 My journey with SGLang has evolved from being one of the first core developers, to leading inference optimization efforts, to eventually taking on a core maintainer role to support its next phase of growth.
- 📫 Contact: me@zhyncs.com | Telegram | LinkedIn | Homepage
- Bay Area, CA (UTC -07:00)
- https://zhyncs.com
- https://orcid.org/0009-0006-7743-2508
- @zhyncs42
Pinned:
- sgl-project/sglang — SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer — FlashInfer: Kernel Library for LLM Serving