@NVIDIAAI: What does it actually take to run agentic workloads at scale? ⚡Agents push token consumption, context length, and latency into extremely de...

@NVIDIAAI | Information level: 3 | Published: 2026-05-05T16:00 | Scraped: 2026-05-05 16:04

AI Compute / Data Centers

Summary

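NVIDIA AI posted that running agentic workloads at scale places extreme demands on token consumption, context length, and latency. Built through extreme co-design for these complex workloads, the Vera Rubin platform can deliver 400+ tokens per second per user on trillion-parameter MoE models.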

Key Facts

NVIDIA has introduced the Vera Rubin platform for running agentic workloads
The platform delivers 400+ tokens per second per user on trillion-parameter MoE models
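The headline number is a per-user decode rate, so it converts directly into latency. The sketch below is a back-of-the-envelope illustration, not NVIDIA's methodology: it assumes a steady 400 tokens/sec/user, ignores prefill and time-to-first-token, and uses a hypothetical 20,000-token agent run. The function name `decode_latency` is illustrative only.

```python
# Hypothetical helper: converts a steady per-user decode rate into latency figures.
# Assumption: constant decode speed; prefill, time-to-first-token, and tool-call
# latency are ignored.
def decode_latency(tokens_per_sec_per_user: float, output_tokens: int) -> tuple[float, float]:
    """Return (milliseconds per token, seconds to stream output_tokens)."""
    ms_per_token = 1000.0 / tokens_per_sec_per_user
    total_seconds = output_tokens / tokens_per_sec_per_user
    return ms_per_token, total_seconds

# Assumed example: an agent that emits 20,000 output tokens across all of its steps.
ms, secs = decode_latency(400, 20_000)
print(f"{ms:.1f} ms/token, {secs:.0f} s total")  # 2.5 ms/token, 50 s total
```

Because agent steps typically run sequentially, per-user speed rather than aggregate throughput sets how long the whole task takes; halving the rate would double the 50 s in this example.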

NVIDIA Vera Rubin

Original Post

What does it actually take to run agentic workloads at scale?

⚡Agents push token consumption, context length, and latency into extremely demanding regions. Extreme co-design on the Vera Rubin platform is built for these complex workloads, delivering 400+ tokens/sec/user on trillion-parameter MoE models.

Tech blog ➡️ https://t.co/DIxW96omML

likes: 1 | retweets: 0 | replies: 0 | views: 0