
@AravSrinivas: GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack qu...

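@AravSrinivas. Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely significant event). The score measures information salience and is not investment advice. Published: 2026-05-12T14:27. Fetched: 2026-05-12 16:02.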
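Summary

NVIDIA GB200 changes how prefill and decode are disaggregated when serving large MoE models such as Qwen, and it offers a throughput advantage over Hopper. The team has published details of its stack that quantify the comparison.

Objective facts
  • GB200 changes how prefill and decode are disaggregated for large MoE models
  • Compared with Hopper, GB200 delivers higher throughput when serving large MoEs such as Qwen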
GB200 Qwen Hopper

Original text

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack quantifying the throughput benefits compared to serving on Hoppers.

likes: 61 | retweets: 4 | replies: 6 | views: 9548