@NVIDIAAI: TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads. Read their blog to learn more about its ad...

@NVIDIAAI 3 Information level 3 Published: 2026-05-06T16:21 Fetched: 2026-05-07 04:01

🔗 Original link

AI compute industry

Summary

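NVIDIA AI highlighted the launch of TokenSpeed, a new inference engine from @lightseekorg built for high-speed agentic workloads. According to the post, the engine offers advanced KV cache management, a safe and efficient scheduler, and a pluggable layered kernel system designed for multi-silicon support, and it has the fastest MLA attention kernel on NVIDIA Blackwell.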

Objective facts

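TokenSpeed is a brand new inference engine launched by @lightseekorg and promoted by NVIDIA AI
NVIDIA AI states that TokenSpeed has the fastest MLA attention kernel on NVIDIA Blackwell
The engine uses a pluggable layered kernel system designed for multi-silicon support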

NVIDIA TokenSpeed NVIDIA Blackwell

Original text

TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads.

Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. Plus, it also has the fastest MLA attention kernel on NVIDIA Blackwell.

Congrats to @lightseekorg on the launch!

likes: 375 | retweets: 31 | replies: 13 | views: 42352