@SemiAnalysis_: GB300 NVL72 Rack Scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T 🚀 The high throughp...

@SemiAnalysis_ 3 Information level 3 Published: 2026-04-30T07:47 Fetched: 2026-05-03 15:25

AI Compute Semiconductors

Summary

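On the DeepSeekv4 Pro 1.6T model, GB300 NVL72 running SGLang disaggregation with DeepSeek's MegaMoe kernels delivers up to 6.5x better performance than B200. The result came from teams at Radix Ark, LMSYS Org, and NVIDIA AI, with CoreWeave contributing temporary GB300 NVL72 racks.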

Objective facts

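GB300 NVL72 delivers up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T
The high-throughput configuration uses DeepSeek's MegaMoe kernels, which fully fuse and overlap EP dispatch, EP combine, and the GEMMs into a single kernel
Teams at Radix Ark, LMSYS Org, and NVIDIA AI enabled the optimization; CoreWeave contributed temporary GB300 NVL72 racks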
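
For readers unfamiliar with the stages named in the post, here is a minimal pure-Python sketch of an unfused mixture-of-experts layer under expert parallelism (EP): dispatch, per-expert GEMM, then combine, run as three separate steps. It is an illustration only, assuming top-1 routing and toy list-based tensors. Every function name is hypothetical, and it does not represent the MegaMoe kernels, SGLang, or Dynamo; the post only states that MegaMoe fuses and overlaps these stages in a single kernel.

```python
# Illustrative sketch only: an UNFUSED MoE layer with expert parallelism (EP).
# All names are hypothetical; this is not DeepSeek's MegaMoe implementation.

def matvec(weight, x):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def ep_dispatch(tokens, routes, num_experts):
    """Stage 1: group tokens by destination expert (stands in for the all-to-all send)."""
    buckets = [[] for _ in range(num_experts)]
    for idx, (tok, (expert, _gate)) in enumerate(zip(tokens, routes)):
        buckets[expert].append((idx, tok))
    return buckets

def expert_gemm(buckets, expert_weights):
    """Stage 2: each expert applies its own weight matrix to the tokens it received."""
    return [
        [(idx, matvec(expert_weights[e], tok)) for idx, tok in bucket]
        for e, bucket in enumerate(buckets)
    ]

def ep_combine(expert_outputs, routes, num_tokens):
    """Stage 3: restore the original token order and scale by the gate weight."""
    out = [None] * num_tokens
    for bucket in expert_outputs:
        for idx, vec in bucket:
            gate = routes[idx][1]
            out[idx] = [gate * v for v in vec]
    return out

def moe_layer_unfused(tokens, routes, expert_weights):
    """Run the three stages back to back, with no overlap between them."""
    buckets = ep_dispatch(tokens, routes, len(expert_weights))
    outputs = expert_gemm(buckets, expert_weights)
    return ep_combine(outputs, routes, len(tokens))
```

In this sketch each stage finishes before the next begins, so communication (dispatch and combine) and compute (the GEMMs) never overlap. That serialization is the overhead a fused, overlapped kernel of the kind the post describes would aim to hide.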

GB300 B200 DeepSeek Radix Ark LMSYS Org NVIDIA AI CoreWeave

Original text

GB300 NVL72 Rack Scale Dynamo SGLang disaggregation has up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T 🚀 The high throughput configuration uses @deepseek_ai 's MegaMoe kernels which fully fuses & overlaps EP dispatch & EP combine & the GEMMs into an single kernel. This performance is achieved from the 10x engineers @BanghuaZ, Tom & the rest of the team at @radixark, @lmsysorg & @NVIDIAAI for rapidly enabling this performance! Big Shoutout to @CoreWeave to contributing temporary GB300 NVL72 racks towards the open source performance optimization for all to benefit!

likes: 176 | retweets: 20 | replies: 4 | views: 32971