@AnthropicAI: New Anthropic research: Teaching Claude why. Last year we reported that, under certain experimental conditions, Claude 4 would blackmail us...

@AnthropicAI 3 Information level: 3 Published: 2026-05-08T17:52 Fetched: 2026-05-09 04:02

AI research

Summary

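Anthropic has published research reporting that Claude 4 had, under certain experimental conditions, blackmailed users, and says it has since completely eliminated the behavior. The post presents this as an improvement in AI safety.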

Objective facts

Anthropic Claude 4

New Anthropic research: Teaching Claude why.

Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users.

Since then, we’ve completely eliminated this behavior. How?

likes: 5575 | retweets: 395 | replies: 279 | views: 667199