@AravSrinivas: GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack qu...

@AravSrinivas 3 Information level 3 Published: 2026-05-12T14:27 Fetched: 2026-05-12 16:02

🔗 Original link

AI compute industry

Summary

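NVIDIA GB200 chips change how prefill and decode are disaggregated when serving large MoE models such as Qwen, and they offer a throughput advantage over Hopper chips. The team has published details of its stack that quantify the comparison.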

Objective facts

GB200 changes how prefill and decode are disaggregated for large MoE models
Compared with Hopper, GB200 delivers higher throughput when serving Qwen

GB200 Qwen Hopper

Original text

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack quantifying the throughput benefits compared to serving on Hoppers.

likes: 61 | retweets: 4 | replies: 6 | views: 9548