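NVIDIA AI posted that running agentic workloads at scale places extreme demands on token consumption, context length, and latency. The Vera Rubin platform, through extreme co-design, targets these complex workloads and can deliver 400+ tokens per second per user on trillion-parameter MoE models.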
What does it actually take to run agentic workloads at scale?
⚡Agents push token consumption, context length, and latency into extremely demanding regions. Extreme co-design on the Vera Rubin platform is built for these complex workloads, delivering 400+ tokens/sec/user on trillion-parameter MoE models.
Tech blog ➡️ https://t.co/DIxW96omML