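On the DeepSeekv4 Pro 1.6T model, GB300 NVL72 with SGLang disaggregation and DeepSeek MegaMoe kernels delivers up to 6.5x better performance than B200. The result comes from a collaboration among teams including Radix Ark, LMSYS Org, NVIDIA AI, and CoreWeave.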
GB300 NVL72 rack-scale Dynamo SGLang disaggregation delivers up to 6.5x better performance than B200 on DeepSeekv4 Pro 1.6T 🚀 The high-throughput configuration uses @deepseek_ai's MegaMoe kernels, which fully fuse and overlap EP dispatch, EP combine, and the GEMMs into a single kernel. Credit goes to the 10x engineers @BanghuaZ, Tom, and the rest of the team at @radixark, @lmsysorg & @NVIDIAAI for rapidly enabling this performance! Big shoutout to @CoreWeave for contributing temporary GB300 NVL72 racks towards open source performance optimization for all to benefit!
likes: 176 | retweets: 20 | replies: 4 | views: 32971