@SemiAnalysis_: THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines together over RoCEv2 CX-7 ethernet with Tomahawk switches with a...

@SemiAnalysis_ 3 Information level 3 Published: 2026-05-12T17:01 Fetched: 2026-05-13 04:02

🔗 Original post link

AI compute data centers

Summary

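A tweet describing how ganging up multiple B200 8-GPU machines over RoCEv2 CX-7 ethernet with Tomahawk switches, combined with an inference optimization called PD (prefill/decode) disaggregation, raises per-GPU token throughput by up to 7x and thereby lowers cost per million tokens by up to 7x.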

Objective facts

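Multiple B200 machines are ganged together over RoCEv2 with Tomahawk switches to run PD disaggregation
Per-GPU token throughput increases by up to 7x
Cost per million tokens decreases by up to 7x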
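
The link between the last two facts is plain arithmetic: if the cost of a GPU-hour stays fixed, cost per million tokens is inversely proportional to per-GPU throughput. The sketch below works this through in Python. The GPU-hour price and the baseline throughput are hypothetical placeholders, not figures from the post; only the "up to 7x" factor comes from the source. The calculation also assumes the per-GPU hourly cost is unchanged by the multi-node setup, which the post does not state.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec_per_gpu: float) -> float:
    """Return the USD cost of generating one million tokens on one GPU."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to illustrate the arithmetic.
GPU_HOUR_USD = 3.00      # assumed rental price per GPU-hour (not from the post)
BASELINE_TPS = 1000.0    # assumed baseline tokens/sec per GPU (not from the post)
SPEEDUP = 7.0            # upper bound quoted in the post ("up to 7x")

baseline_cost = cost_per_million_tokens(GPU_HOUR_USD, BASELINE_TPS)
disagg_cost = cost_per_million_tokens(GPU_HOUR_USD, BASELINE_TPS * SPEEDUP)

print(f"baseline:      ${baseline_cost:.4f} per million tokens")
print(f"disaggregated: ${disagg_cost:.4f} per million tokens")
print(f"cost reduction factor: {baseline_cost / disagg_cost:.2f}x")
```

Because the GPU-hour price appears in both costs, it cancels out of the ratio: whatever price is plugged in, the reduction factor equals the throughput speedup, as long as that price does not change between the two configurations.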

B200 RoCEv2 CX-7 Tomahawk inferact vllm_project NVIDIA Dynamo

Original text

THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines together over RoCEv2 CX-7 ethernet with Tomahawk switches with an inference optimization called PD disaggregation, the per GPU token throughput increases up to 7x. By increasing per GPU token throughput by up to 7x, this decreases cost per million tokens by up to 7x also.

Great work to @inferact & @vllm_project for building this amazing OSS engine & for @NVIDIADC @KranenKyle for building dynamo inference orchestrator. More improvements to disagg b200 perf to come!

likes: 121 | retweets: 12 | replies: 4 | views: 22135