@AnthropicAI: New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activation...

@AnthropicAI 3 Information level: 3 Published: 2026-05-07T17:08 Fetched: 2026-05-08 04:02

AI Research

Summary

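Anthropic has published new research on Natural Language Autoencoders: Claude is trained to translate its internal activations (numerical encodings) into human-readable text, with the aim of improving model interpretability.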

Objective facts

Anthropic Claude

New Anthropic research: Natural Language Autoencoders.

Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read.

Here, we train Claude to translate its activations into human-readable text. https://t.co/pMLsxM2VAO

likes: 9987 | retweets: 1040 | replies: 366 | views: 994616