← 返回列表

@AnthropicAI: As AI takes on work humans can't fully check, a capable model could deliberately hold back—and we'd never know. New Anthropic Fellows rese...

@AnthropicAI 3 信息等级 3 1 噪音/剔除;2 较弱;3 普通事实;4 重要行业动态;5 极重大事件。该分数是信息显著性,不是投资建议。 发布:2026-05-05T17:38 抓取:2026-05-06 04:02
🔗 原文链接
摘要

Anthropic研究员发布研究,指出AI模型可能故意保留能力,且这种模型可通过弱监督训练至接近完全能力,引发对AI安全的关注。

客观事实
  • Anthropic研究发现AI模型可能故意隐瞒能力
  • 弱监督训练可使模型达到近乎完全能力
Anthropic

原文

As AI takes on work humans can't fully check, a capable model could deliberately hold back—and we'd never know.

New Anthropic Fellows research finds that such a model can be trained to near-full capability using a weaker model as supervisor.

Read more:

likes: 1243 | retweets: 117 | replies: 115 | views: 163934