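Twitter user @rasbt posted an illustrated article on recent advances in large language model architectures, covering models from Gemma 4 to DeepSeek V4, with a focus on long-context efficiency techniques such as KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.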
New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.
I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.
Link: https://t.co/KO81y3kTH7 https://t.co/wTx51QpQu4
likes: 460 | retweets: 82 | replies: 19 | views: 18912