
@ClementDelangue: The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore. The problem: ev...

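@ClementDelangue · Information level: 4 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely major event). This score measures information salience and is not investment advice. Published: 2026-05-28T13:23 · Fetched: 2026-05-28 17:18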
🔗 Original link
Summary

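The Hugging Face science team has introduced an optimization for asynchronous reinforcement-learning (RL) weight sync that transmits only the weights that changed, cutting bandwidth cost by roughly 100x. Validated on Qwen3-0.6B, it shrinks the per-step payload from 1.2 GB to 20-35 MB and enables fully disaggregated training without a shared cluster.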

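Objective facts
  • The HF team cut the bandwidth cost of async RL weight sync by roughly 100x
  • The new method is implemented in TRL and transmits only the weight elements that changed
  • On Qwen3-0.6B, the per-step payload drops from 1.2 GB to 20-35 MB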
Hugging Face TRL vLLM Qwen3-0.6B

Original text

The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore.

The problem: every RL step, the trainer typically has to sync fresh weights to the inference engine. for a 7B in bf16 that's ~14GB. for a frontier 1T fp8 checkpoint, that's ~1TB; in bf16 it would be ~2TB. per sync.
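The sizes quoted above follow from multiplying parameter count by bytes per element. A minimal sketch of that arithmetic, assuming nominal parameter counts (7B, 1T, and 0.6B for Qwen3-0.6B, taken from the model names rather than exact counts) and decimal gigabytes; the function name is illustrative only:

```python
# Bytes per stored element for the two dtypes mentioned in the post.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def full_sync_bytes(num_params: int, dtype: str) -> int:
    """Size of a full (dense) weight sync: every parameter is sent."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


if __name__ == "__main__":
    # Nominal parameter counts; real checkpoints differ slightly.
    for label, params, dtype in [
        ("7B bf16", 7_000_000_000, "bf16"),
        ("1T fp8", 10**12, "fp8"),
        ("1T bf16", 10**12, "bf16"),
        ("0.6B bf16", 600_000_000, "bf16"),
    ]:
        print(f"{label}: {to_gb(full_sync_bytes(params, dtype)):.1f} GB per sync")
```

Under these assumptions the 0.6B bf16 case comes to 1.2 GB, which matches the dense per-step payload quoted for Qwen3-0.6B.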

The insight: between two RL steps, ~99% of bf16 weights are bit-identical. at RL learning rates, the optimizer is whispering and bf16 literally cannot hear most of it. the stored bf16 bits don't change.
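To see why a small update can leave the stored bits untouched, here is a standard-library sketch that rounds a value to bf16 (the top 16 bits of a float32, round-to-nearest-even) and compares bit patterns before and after an update. The weight value 0.5 and the step sizes are illustrative assumptions, not figures from the write-up, and NaN/infinity handling is omitted:

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for a finite float.

    The value is first rounded to float32 by struct, then the low 16 bits
    are dropped using round-to-nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1          # last bit that survives truncation
    bits += 0x7FFF + lsb            # rounding bias, ties go to even
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Expand a bf16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


if __name__ == "__main__":
    w = 0.5
    tiny_step = 1e-6    # hypothetical update size at a small learning rate
    big_step = 1e-2     # hypothetical update large enough to register

    before = to_bf16_bits(w)
    after_tiny = to_bf16_bits(w - tiny_step)
    after_big = to_bf16_bits(w + big_step)

    print(hex(before), hex(after_tiny), before == after_tiny)
    print(hex(before), hex(after_big), before == after_big)
```

bf16 keeps only 8 significant bits, so near 0.5 the representable values are about 0.002 to 0.004 apart. An update far smaller than half that gap rounds back to the same bit pattern, which is the effect the post describes.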

What they shipped in TRL: only the changed elements get encoded as a sparse safetensors file, dropped into a Hugging Face Bucket, and fetched by vLLM. on Qwen3-0.6B, per-step payload goes from 1.2 GB to 20 to 35 MB. This is exactly what we built Buckets for: S3-like object storage on the Hub, Xet-backed (so even full snapshots only transfer the changed chunks).
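The idea of sending only changed elements can be sketched as an index/value delta. This is not the TRL implementation or the sparse safetensors layout, neither of which is described here; the function names and the assumed cost of 4 bytes per index plus 2 bytes per bf16 value are hypothetical:

```python
from typing import List, Sequence, Tuple


def encode_delta(old: Sequence[int], new: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (indices, values) for positions whose bit pattern changed."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    indices = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    values = [new[i] for i in indices]
    return indices, values


def apply_delta(old: Sequence[int], indices: Sequence[int], values: Sequence[int]) -> List[int]:
    """Rebuild the new weights on the receiving side from old weights plus a delta."""
    patched = list(old)
    for i, v in zip(indices, values):
        patched[i] = v
    return patched


def payload_bytes(indices: Sequence[int], index_bytes: int = 4, value_bytes: int = 2) -> int:
    """Rough payload size under the assumed per-element encoding cost."""
    return len(indices) * (index_bytes + value_bytes)


if __name__ == "__main__":
    old = [0x3F00] * 8          # eight bf16 bit patterns, all equal to 0.5
    new = list(old)
    new[3] = 0x3F03             # a single element changed this step

    idx, vals = encode_delta(old, new)
    assert apply_delta(old, idx, vals) == new
    print(idx, [hex(v) for v in vals], payload_bytes(idx), "bytes")
```

With this assumed 6 bytes per changed element, 1% of 600 million elements would come to about 36 MB, the same order of magnitude as the 20 to 35 MB quoted above. A real encoding could be more or less compact.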

The cherry on top: we ran a FULL disaggregated training where:
- the trainer lived on one box
- vLLM ran inside a Hugging Face Space
- the Wordle environment ran in another Space
- weights flowed through one Hub bucket

no shared cluster. no RDMA. no VPN. no NCCL across clouds. just HTTPS and a bucket.
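The flow above (trainer publishes, inference side fetches) can be sketched with a local directory standing in for the bucket. This is only an illustration of the publish/poll pattern: it does not use the Hub or any Buckets API, and the file naming, JSON layout, and function names are assumptions made for the example:

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def publish_delta(bucket: Path, step: int, indices: List[int], values: List[int]) -> Path:
    """Trainer side: write one delta object per step into the stand-in bucket."""
    path = bucket / f"step-{step:08d}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"step": step, "indices": indices, "values": values}))
    tmp.replace(path)   # rename so a reader never sees a half-written file
    return path


def fetch_new_deltas(bucket: Path, last_step: int) -> List[Dict]:
    """Inference side: return deltas newer than last_step, in step order."""
    deltas = []
    for path in sorted(bucket.glob("step-*.json")):   # zero-padded names sort by step
        delta = json.loads(path.read_text())
        if delta["step"] > last_step:
            deltas.append(delta)
    return deltas


def demo():
    """Run two publish steps and one catch-up fetch against a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        bucket = Path(d)
        weights = [10, 20, 30, 40]      # toy "inference engine" weights
        last_step = 0

        publish_delta(bucket, 1, [1], [21])
        publish_delta(bucket, 2, [3], [41])

        for delta in fetch_new_deltas(bucket, last_step):
            for i, v in zip(delta["indices"], delta["values"]):
                weights[i] = v
            last_step = delta["step"]
        return weights, last_step


if __name__ == "__main__":
    print(demo())
```

Because the two sides share only stored objects, the inference process can lag behind and then catch up by applying every delta it missed in order. In the setup described above, that storage is a Hub bucket reached over HTTPS.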

one GPU + a Hugging Face account is now enough to do real disaggregated RL. multi-replica inference fleets across regions become a small devops exercise, not a research project.

Full write-up: https://t.co/CG115IjT0q

Open source RL keeps eating the moat!

likes: 254 | retweets: 32 | replies: 19 | views: 18086