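Anthropic researchers have released a new alignment method, Model Spec Midtraining (MSM). It aims to address the poor generalization of conventional alignment training to new situations by first teaching the AI how it should generalize and why.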
New Anthropic Fellows research: Model Spec Midtraining (MSM).
Standard alignment methods train AIs on examples of desired behavior. But this can fail to generalize to new situations.
MSM addresses this by first teaching AIs how we would like them to generalize and why.