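NVIDIA Research has released a new paper proposing speculative decoding in NeMo-RL with vLLM to accelerate RL post-training, reporting 1.8x higher throughput at 8B and a projected 2.5x end-to-end speedup at 235B.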
RL post-training is hitting a rollout bottleneck.
This new paper from #NVIDIAResearch shows how speculative decoding in NeMo-RL + @vllm_project can accelerate rollouts losslessly, with 1.8x higher throughput at 8B and projected 2.5x end-to-end speedup at 235B.
Read the full paper: https://t.co/twR4LEQNmy
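For readers wondering what "losslessly" means here: a minimal sketch of the standard speculative sampling accept/reject rule, assuming toy three-token distributions. This is not the NeMo-RL or vLLM implementation; `speculative_step`, `empirical`, `P`, and `Q` are hypothetical names chosen for illustration. The draft proposes a token, the target accepts it with probability min(1, p/q), and on rejection a token is resampled from the renormalized residual max(p - q, 0), so the output follows the target distribution exactly.

```python
import random


def speculative_step(p_target, q_draft, rng):
    """Draw one token: propose from the draft q, then accept or reject against the target p."""
    tokens = list(range(len(p_target)))
    x = rng.choices(tokens, weights=q_draft)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x
    # On rejection, resample from the residual max(p - q, 0);
    # rng.choices normalizes the weights internally.
    residual = [max(p - q, 0.0) for p, q in zip(p_target, q_draft)]
    return rng.choices(tokens, weights=residual)[0]


def empirical(p_target, q_draft, n=200_000, seed=0):
    """Estimate the output distribution of speculative_step by sampling n times."""
    rng = random.Random(seed)
    counts = [0] * len(p_target)
    for _ in range(n):
        counts[speculative_step(p_target, q_draft, rng)] += 1
    return [c / n for c in counts]


# Toy distributions (illustrative assumptions, not from the paper).
P = [0.6, 0.3, 0.1]  # target model
Q = [0.2, 0.5, 0.3]  # draft model, deliberately mismatched
freqs = empirical(P, Q)
print(freqs)  # should track P closely, even though proposals come from Q
```

Even with a poorly matched draft, the sampled frequencies follow the target; a better draft only raises the acceptance rate, which is where the speedup comes from.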
likes: 571 | retweets: 86 | replies: 12 | views: 49758