
@SemiAnalysis_: GB300 NVL72 Rack Scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T 🚀   The high throughp...

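@SemiAnalysis_ | Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely significant event. The score measures information salience and is not investment advice.) | Published: 2026-04-30T07:47 | Scraped: 2026-05-03 15:25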
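Summary

On the DeepSeekv4 Pro 1.6T model, GB300 NVL72 running rack-scale Dynamo SGLang disaggregation achieves up to 6.5x better performance than B200. The high-throughput configuration uses DeepSeek's MegaMoe kernels. The post credits teams at Radix Ark, LMSYS Org, and NVIDIA AI with enabling the result, and thanks CoreWeave for contributing temporary GB300 NVL72 racks.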
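
For readers unfamiliar with the terms in the post, the sketch below walks through the three logical stages that the MegaMoe kernels are said to fuse and overlap: EP (expert-parallel) dispatch, the per-expert GEMMs, and EP combine. It is a toy, unfused, single-process reference in plain Python, not the MegaMoe kernel or any SGLang/Dynamo API. Every function name, the routing table, and the weights are hypothetical and chosen only for illustration.

```python
# Toy, unfused reference for the three MoE stages named in the post.
# All names (dispatch, expert_gemm, combine, moe_unfused) are hypothetical.

def matvec(w, x):
    """Multiply matrix w (rows x cols) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def dispatch(tokens, routes):
    """EP dispatch: bucket (token index, gate, vector) by destination expert."""
    buckets = {}
    for idx, (x, picks) in enumerate(zip(tokens, routes)):
        for expert_id, gate in picks:
            buckets.setdefault(expert_id, []).append((idx, gate, x))
    return buckets

def expert_gemm(buckets, expert_weights):
    """Per-expert GEMM: each expert multiplies its own bucket by its weights."""
    return {
        e: [(idx, gate, matvec(expert_weights[e], x)) for idx, gate, x in items]
        for e, items in buckets.items()
    }

def combine(outputs, num_tokens, dim):
    """EP combine: return results to token order, summing gate-weighted outputs."""
    result = [[0.0] * dim for _ in range(num_tokens)]
    for items in outputs.values():
        for idx, gate, y in items:
            for d in range(dim):
                result[idx][d] += gate * y[d]
    return result

def moe_unfused(tokens, routes, expert_weights):
    """Run the three stages one after another, with no fusion or overlap."""
    dim = len(next(iter(expert_weights.values())))  # output dimension
    buckets = dispatch(tokens, routes)
    outputs = expert_gemm(buckets, expert_weights)
    return combine(outputs, len(tokens), dim)

# Three 2-d tokens; each route is a list of (expert_id, gate_weight) pairs.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
routes = [[(0, 1.0)], [(1, 0.5), (0, 0.5)], [(1, 1.0)]]
expert_weights = {
    0: [[1.0, 0.0], [0.0, 1.0]],  # identity
    1: [[2.0, 0.0], [0.0, 2.0]],  # scale by 2
}
result = moe_unfused(tokens, routes, expert_weights)
print(result)
```

In a real expert-parallel deployment the dispatch and combine stages are cross-GPU communication, so running them as separate steps leaves compute idle while data moves. The post attributes the high-throughput result to performing all three stages inside one kernel.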

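Objective facts
  • GB300 NVL72 achieves up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T
  • The high-throughput configuration uses DeepSeek's MegaMoe kernels, which fuse and overlap EP dispatch, EP combine, and the GEMMs into a single kernel
  • Teams at Radix Ark, LMSYS Org, and NVIDIA AI enabled the result; CoreWeave contributed temporary GB300 NVL72 racks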
GB300 · B200 · DeepSeek · Radix Ark · LMSYS Org · NVIDIA AI · CoreWeave

Original text

GB300 NVL72 Rack Scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T 🚀   The high throughput configuration uses @deepseek_ai 's MegaMoe kernels  which fully fuses & overlaps EP dispatch & EP combine & the GEMMs into an single kernel. This performance is achieved from the 10x engineers @BanghuaZ, Tom & the rest of the team at @radixark, @lmsysorg & @NVIDIAAI for rapidly enabling this performance! Big Shoutout to @CoreWeave to contributing temporary GB300 NVL72 racks towards the open source performance optimization for all to benefit!

likes: 176 | retweets: 20 | replies: 4 | views: 32971