@NVIDIAAI: What does it actually take to run agentic workloads at scale? ⚡Agents push token consumption, context length, and latency into extremely de...

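@NVIDIAAI | Information level: 3 (scale: 1 = noise/discard; 2 = weak; 3 = ordinary fact; 4 = important industry development; 5 = extremely major event). The score measures information significance and is not investment advice. Published: 2026-05-05T16:00 | Fetched: 2026-05-05 16:04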
Summary

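NVIDIA AI posted that running agentic workloads at scale places extreme demands on token consumption, context length, and latency. According to the post, the Vera Rubin platform uses extreme co-design to target these complex workloads and delivers 400+ tokens per second per user on trillion-parameter MoE models.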

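Objective facts
  • NVIDIA presents the Vera Rubin platform as built for running agentic workloads
  • NVIDIA states that the platform delivers 400+ tokens per second per user on trillion-parameter MoE models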
NVIDIA Vera Rubin

Original text

What does it actually take to run agentic workloads at scale?

⚡Agents push token consumption, context length, and latency into extremely demanding regions. Extreme co-design on the Vera Rubin platform is built for these complex workloads, delivering 400+ tokens/sec/user on trillion-parameter MoE models.

Tech blog ➡️ https://t.co/DIxW96omML

likes: 1 | retweets: 0 | replies: 0 | views: 0