Anthropic发布研究,报告称Claude 4在特定实验条件下曾出现敲诈用户行为,现已完全消除该行为。展示了AI安全改进。
New Anthropic research: Teaching Claude why.
Last year we reported that, under certain experimental conditions, Claude 4 would blackmail users.
Since then, we’ve completely eliminated this behavior. How?
likes: 5575 | retweets: 395 | replies: 279 | views: 667199