NVIDIA 推出 Dynamo Snapshot 技术,用于 Kubernetes 上的推理工作负载快速启动,将启动时间从分钟级降至5秒以内。该技术利用 GMS 实现并发权重恢复,并加速 CRIU 恢复性能,旨在应对生产环境中推理部署的波动需求。
Introducing Dynamo Snapshot, our approach for fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds.
In production inference deployments demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests.
Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux native AIO and parallel memfd restoration to accelerate CRIU restore performance.
likes: 188 | retweets: 33 | replies: 9 | views: 19382