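NVIDIA AI announced that SGLang reaches 180 tok/s/GPU for DeepSeek-V4 inference on Blackwell hardware with about 1M context; the gains come from Blackwell-specific optimizations by lmsysorg that make better use of the model's hybrid sparse attention.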
SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell.
Good to see fast progress in open-source DeepSeek-V4 inference on new hardware.
This comes from Blackwell-specific optimizations by @lmsysorg that better use the model’s hybrid sparse attention.
likes: 311 | retweets: 32 | replies: 11 | views: 32014