
@SemiAnalysis_: THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines together over RoCEv2 CX-7 ethernet with Tomahawk switches with a...

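@SemiAnalysis_ | Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely major event. The score reflects information significance and is not investment advice.) | Published: 2026-05-12T17:01 | Fetched: 2026-05-13 04:02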
Summary

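A tweet reports that ganging multiple B200 8-GPU machines together over RoCEv2 CX-7 Ethernet with Tomahawk switches, and applying the inference optimization PD (prefill/decode) disaggregation, raises per-GPU token throughput by up to 7x and lowers cost per million tokens by up to 7x.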
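The link between the two 7x figures is simple arithmetic: if the hourly cost of a GPU is held fixed, cost per million tokens is inversely proportional to per-GPU throughput. A minimal sketch of that relationship follows. The GPU-hour price and baseline throughput are hypothetical placeholders, not numbers from the tweet, and the function name is illustrative only.

```python
def cost_per_million_tokens(gpu_hour_cost_usd: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """USD cost to generate one million tokens on a single GPU."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hour_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs (assumptions for illustration, not from the tweet).
GPU_HOUR_COST_USD = 3.0   # assumed rental price per GPU-hour
BASELINE_TPS = 1000.0     # assumed baseline tokens/sec per GPU
SPEEDUP = 7.0             # the "up to 7x" upper bound quoted in the tweet

baseline = cost_per_million_tokens(GPU_HOUR_COST_USD, BASELINE_TPS)
disagg = cost_per_million_tokens(GPU_HOUR_COST_USD, BASELINE_TPS * SPEEDUP)

print(f"baseline: ${baseline:.4f} per million tokens")
print(f"with 7x throughput: ${disagg:.4f} per million tokens")
print(f"cost ratio: {baseline / disagg:.1f}x")
```

The ratio equals the speedup only under the stated assumption that cost per GPU-hour does not change between the two configurations.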

Objective facts
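  • Multiple B200 machines are combined over RoCEv2 with Tomahawk switches, and PD disaggregation is applied
  • Per-GPU token throughput increases by up to 7x
  • Cost per million tokens decreases by up to 7x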
B200 RoCEv2 CX-7 Tomahawk inferact vllm_project NVIDIA Dynamo

Original text

THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines together over RoCEv2 CX-7 ethernet with Tomahawk switches with an inference optimization called PD disaggregation, the per GPU token throughput increases up to 7x. By increasing per GPU token throughput by up to 7x, this decreases cost per million tokens by up to 7x also.

Great work to @inferact & @vllm_project for building this amazing OSS engine & for @NVIDIADC @KranenKyle for building dynamo inference orchestrator. More improvements to disagg b200 perf to come!

likes: 121 | retweets: 12 | replies: 4 | views: 22135