@rasbt: New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4. I focus on long-context efficiency tweaks like...

@rasbt 3 Information level 3 Published: 2026-05-16T13:10 Fetched: 2026-05-16 16:03

AI industry

Summary

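Twitter user @rasbt published an illustrated article on recent advances in large language model architectures, covering models from Gemma 4 to DeepSeek V4. It focuses on long-context efficiency techniques such as KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.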

Objective facts

Gemma 4 DeepSeek V4

New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.

I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.

Link: https://t.co/KO81y3kTH7 https://t.co/wTx51QpQu4

likes: 460 | retweets: 82 | replies: 19 | views: 18912