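NVIDIA AI announced the launch of the TokenSpeed inference engine, designed for high-speed agentic workloads. The engine features advanced KV cache management, a safe and efficient scheduler, and a pluggable layered kernel system with multi-silicon support, and it delivers the fastest MLA attention kernel on NVIDIA Blackwell.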
TokenSpeed is a brand-new inference engine purpose-built for speed-of-light agentic workloads.
Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. It also has the fastest MLA attention kernel on NVIDIA Blackwell.
Congrats to @lightseekorg on the launch!