
Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling

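NVIDIA Technical Blog | Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely major event. The score reflects information salience and is not investment advice.) | Published: 2026-05-07T21:20 | Fetched: 2026-05-07 22:13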
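Summary

NVIDIA published a technical blog post explaining that the GB200 NVL72 system extends NVLink coherence across an entire rack to deliver rack-scale performance, that this makes rack-scale locality a hard constraint, and that the Slurm scheduler has been optimized to improve cluster efficiency.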

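Objective facts
  • GB200 NVL72 extends NVLink coherence across an entire rack
  • Rack-scale locality becomes a hard constraint; performance drops sharply when workloads cross domain boundaries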
NVIDIA GB200 NVL72 Slurm

Original text

NVIDIA GB200 NVL72 introduces a fundamentally new way to build GPU clusters by extending NVIDIA NVLink coherence across an entire rack. This design enables exascale performance, but it also changes the assumptions that many scheduling systems were built on. As a result, “rack-scale locality” becomes a hard constraint. When workloads cross domain boundaries, performance drops sharply…
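
To make the "hard constraint" concrete, here is a minimal Python sketch of a placement check that refuses to split a job across NVLink domains. It is an illustration only and is not Slurm's block scheduling algorithm. The function name `place_in_single_domain`, the rack names, and the node names are all hypothetical, and the best-fit choice of rack is an assumption made for the example.

```python
from typing import Dict, List, Optional


def place_in_single_domain(
    free_nodes: Dict[str, List[str]], nodes_needed: int
) -> Optional[List[str]]:
    """Return `nodes_needed` nodes from one domain, or None if no domain fits.

    `free_nodes` maps a (hypothetical) NVLink domain name to its idle nodes.
    """
    # Keep only the domains that can hold the whole job on their own.
    candidates = [
        (len(nodes), name)
        for name, nodes in free_nodes.items()
        if len(nodes) >= nodes_needed
    ]
    if not candidates:
        # Enough nodes may exist cluster-wide, but spanning domains is not allowed.
        return None
    # Best fit (assumed policy): pick the smallest domain that still fits,
    # leaving larger free domains available for larger jobs.
    _, best = min(candidates)
    return sorted(free_nodes[best])[:nodes_needed]


free = {
    "rack-a": ["a01", "a02", "a03"],
    "rack-b": ["b01", "b02", "b03", "b04", "b05"],
}
print(place_in_single_domain(free, 4))  # ['b01', 'b02', 'b03', 'b04']
print(place_in_single_domain(free, 6))  # None
```

The second call returns `None` even though eight nodes are idle in total, because no single rack has six free nodes. A locality-unaware scheduler would instead treat those eight nodes as one interchangeable pool.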
