@NVIDIAAI: SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell. Good to see fast progress in open source DeepSeek-V4 ...

@NVIDIAAI 3 Information level 3 Published: 2026-04-30T21:31 Fetched: 2026-05-03 15:25

AI compute industry

Summary

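NVIDIA AI announced that SGLang reaches 180 tok/s/GPU on DeepSeek-V4 decode on Blackwell hardware with roughly 1M context. The result comes from Blackwell-specific optimizations by lmsysorg that make better use of the model's hybrid sparse attention.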

Objective facts

NVIDIA SGLang DeepSeek-V4 Blackwell lmsysorg

SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell.

Good to see fast progress in open source DeepSeek-V4 inference on new hardware.

This comes from Blackwell-specific optimizations by @lmsysorg that better use the model’s hybrid sparse attention.

likes: 311 | retweets: 32 | replies: 11 | views: 32014