Achieving Peak System and Workload Efficiency on NVIDIA GB200 NVL72 with Slurm Block Scheduling

NVIDIA Technical Blog. Information level: 3. Published: 2026-05-07T21:20. Fetched: 2026-05-07 22:13

🔗 Original article link

Industry: AI compute data centers

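Summary

NVIDIA published a technical blog post describing how the GB200 NVL72 system extends NVLink coherence across an entire rack to deliver rack-scale performance. This makes rack-level locality a hard constraint, and the post covers optimizations to the Slurm scheduler that improve cluster efficiency.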

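Objective facts

GB200 NVL72 extends NVLink coherence across an entire rack
Rack-level locality becomes a hard constraint; performance drops sharply when workloads cross domain boundaries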

NVIDIA GB200 NVL72 Slurm

Original text

NVIDIA GB200 NVL72 introduces a fundamentally new way to build GPU clusters by extending NVIDIA NVLink coherence across an entire rack. This design enables exascale performance, but it also changes the assumptions that many scheduling systems were built on. As a result, “rack-scale locality” becomes a hard constraint. When workloads cross domain boundaries, performance drops sharply…
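
To make the "hard constraint" concrete, here is a minimal Python sketch of a placement check that refuses to split a job across rack domains. It is an illustration only, not Slurm's implementation and not the method described in the post. The names `domain_of` and `place_in_single_domain`, the integer node numbering, and the figure of 18 nodes per domain are all assumptions chosen for the example.

```python
from collections import defaultdict

# Assumption for illustration: each NVLink domain (rack) holds 18 schedulable nodes.
NODES_PER_DOMAIN = 18


def domain_of(node_index, nodes_per_domain=NODES_PER_DOMAIN):
    """Map a node index to its (hypothetical) rack domain by integer division."""
    return node_index // nodes_per_domain


def place_in_single_domain(free_nodes, nodes_needed, nodes_per_domain=NODES_PER_DOMAIN):
    """Pick nodes_needed free nodes that all sit in one domain.

    Returns a list of node indices, or None when no single domain has
    enough free nodes. The job is never split across domains.
    """
    by_domain = defaultdict(list)
    for node in sorted(free_nodes):
        by_domain[domain_of(node, nodes_per_domain)].append(node)

    # Best fit: prefer the domain with the fewest free nodes that still fits,
    # so larger contiguous domains stay available for larger jobs.
    candidates = [
        (len(nodes), domain)
        for domain, nodes in by_domain.items()
        if len(nodes) >= nodes_needed
    ]
    if not candidates:
        return None
    _, best_domain = min(candidates)
    return by_domain[best_domain][:nodes_needed]


if __name__ == "__main__":
    # Domain 0 has 10 free nodes, domain 1 has 4 free nodes.
    free = list(range(0, 10)) + list(range(18, 22))
    print(place_in_single_domain(free, 4))
    print(place_in_single_domain(free, 12))
```

In this sketch a 12-node request is rejected even though 14 nodes are free in total, because no single domain can hold it. That is the difference between treating locality as a hard constraint and treating it as a preference.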

Source