DynoSim: Simulating the Pareto Frontier

NVIDIA Technical Blog | Information level: 3 | Published: 2026-05-29T22:31 | Fetched: 2026-05-30 04:13

AI compute

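Summary

NVIDIA has released DynoSim, a simulation tool that models the Pareto frontier of LLM serving. It helps choose configurations across interacting layers such as model backend, tensor-parallel shape, and prefill/decode split, addressing the difficulty of tuning modern LLM serving.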

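Objective facts

NVIDIA has released the DynoSim simulation tool.
DynoSim simulates the Pareto frontier of LLM serving.
The tool helps choose configurations across multiple interacting layers.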

NVIDIA DynoSim

Original text

Modern LLM serving is hard to tune because each deployment is a stack of interacting choices: model backend, tensor-parallel shape, prefill/decode split, worker counts, scheduler settings, routing policy, KV cache behavior, autoscaling thresholds, and topology. Those choices interact across layers, and a local improvement can shift the bottleneck somewhere else. For larger models…
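
To make the "Pareto frontier" in the title concrete, here is a minimal sketch of how a frontier can be picked out of a set of candidate deployments. It is not DynoSim's API or method: the `Candidate` type, the configuration labels, the two metrics (throughput per GPU and p99 latency), and every number below are hypothetical and chosen only for illustration. A configuration is on the frontier when no other configuration is at least as good on both metrics and strictly better on one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One simulated deployment (hypothetical fields, not a DynoSim type)."""
    name: str
    tokens_per_s_per_gpu: float  # higher is better
    p99_latency_ms: float        # lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both metrics and better on at least one."""
    no_worse = (a.tokens_per_s_per_gpu >= b.tokens_per_s_per_gpu
                and a.p99_latency_ms <= b.p99_latency_ms)
    strictly_better = (a.tokens_per_s_per_gpu > b.tokens_per_s_per_gpu
                       or a.p99_latency_ms < b.p99_latency_ms)
    return no_worse and strictly_better


def pareto_frontier(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates, sorted by latency."""
    frontier = [c for c in candidates
                if not any(dominates(other, c) for other in candidates)]
    return sorted(frontier, key=lambda c: c.p99_latency_ms)


# Made-up results for illustration only; not measurements from any system.
CANDIDATES = [
    Candidate("tp1-aggregated", 900.0, 400.0),
    Candidate("tp2-aggregated", 700.0, 250.0),
    Candidate("tp2-disaggregated", 800.0, 220.0),
    Candidate("tp4-disaggregated", 500.0, 120.0),
    Candidate("tp4-aggregated", 450.0, 150.0),
]

FRONTIER_NAMES = [c.name for c in pareto_frontier(CANDIDATES)]

if __name__ == "__main__":
    for name in FRONTIER_NAMES:
        print(name)
```

In this toy data, `tp2-aggregated` and `tp4-aggregated` drop out because another configuration beats them on both metrics. The three that remain each represent a different latency/throughput trade-off, which is the set an operator would actually choose from.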
