Sourcing — Feed

Showing 4 of 3,560 items

3 @rasbt: The MiniMax M2 series was one of the most widely used open-weight LLM series earlier this year. Now, we got a technical report with some int...

2026-05-27T15:07

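The MiniMax M2 technical report has been released and summarizes several technical findings: the team chose full attention over a hybrid sliding-window design; linear and sparse attention are hard to deploy in production systems and have poor prefix-caching support; fine-grained MoE (128 experts, top-8) markedly improves reasoning and coding ability at the 2B-parameter scale; and the training pipeline adds training on software-engineering agent behavior.

MiniMax M2 uses full attention and drops the hybrid sliding-window design
Sparse attention is hard to deploy in production and has poor prefix-caching support
At 2B parameters, fine-grained MoE raises the MATH score from 19.6 to 24.1

@rasbt ↗ X AI compute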
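
For readers unfamiliar with the "128 experts, top-8" notation in the summary above, here is a minimal sketch of top-k expert routing. It is an illustration only, not MiniMax's implementation: the function name `route_top_k`, the random router scores, and the choice to softmax-normalize over the selected experts are all assumptions.

```python
import math
import random

def route_top_k(router_logits, k=8):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Hypothetical helper: real MoE routers differ in normalization,
    load balancing, and shared-expert handling.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

random.seed(0)
# Made-up router scores for a single token over 128 experts.
logits = [random.gauss(0.0, 1.0) for _ in range(128)]
routes = route_top_k(logits, k=8)
for expert_id, weight in routes:
    print(f"expert {expert_id:3d}  weight {weight:.3f}")
```

Only 8 of the 128 expert networks run for this token, so per-token compute scales with the selected experts rather than with the total parameter count.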

2 @rasbt: Added a DeepSeek Sparse Attention (DSA) from-scratch implementation to my LLMs-from-scratch repo thanks to an awesome new reader contrib. W...

2026-05-23T15:20

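A from-scratch implementation of DeepSeek Sparse Attention (DSA) has been added to the LLMs-from-scratch repository as standalone example code, covering the motivation, an overview, and a reference implementation for a GPT-style model.

A DSA implementation was added to the LLMs-from-scratch repository
The implementation covers the motivation, an overview, and reference code for a GPT-style model

@rasbt ↗ X AI research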
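
The general idea behind top-k sparse attention can be sketched in a few lines. This is not the code from the LLMs-from-scratch repository and not DeepSeek's exact formulation: as publicly described, DSA selects tokens with a separate lightweight indexer, whereas this simplified sketch reuses the attention scores themselves for selection. The function name `sparse_causal_attention` and the toy inputs are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_causal_attention(q, k, v, top_k):
    """For each position t, attend only to the top_k best-scoring keys at positions <= t."""
    out = []
    for t, q_t in enumerate(q):
        # Scaled dot-product scores against all causally visible keys.
        scores = [(dot(q_t, k[s]) / math.sqrt(len(q_t)), s) for s in range(t + 1)]
        # Assumption: selection reuses these scores instead of a separate indexer.
        kept = sorted(scores, reverse=True)[:top_k]
        m = max(sc for sc, _ in kept)
        weights = [(math.exp(sc - m), s) for sc, s in kept]
        z = sum(e for e, _ in weights)
        out.append([sum(e / z * v[s][d] for e, s in weights)
                    for d in range(len(v[0]))])
    return out

# Toy sequence of three tokens with 2-dimensional heads (made-up values).
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
out = sparse_causal_attention(q, k, v, top_k=1)
print(out)
```

With `top_k` at least as large as the sequence length, the function reduces to ordinary dense causal attention; smaller values cap the number of keys each query reads.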

3 @rasbt: New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4. I focus on long-context efficiency tweaks like...

2026-05-16T13:10

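@rasbt published an illustrated article on recent LLM architecture advances, covering models from Gemma 4 to DeepSeek V4 and focusing on long-context efficiency techniques such as KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.

The article reviews LLM architecture advances from Gemma 4 to DeepSeek V4
It focuses on long-context efficiency techniques, including KV sharing and compressed attention
The article is presented visually and includes a link

@rasbt ↗ X AI industry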

3 @rasbt: Here is a 2nd batch of April architecture drops. What a month! - Ant Ling 2.6 1T - Minimax M2.7 - Xiaomi MiMo V2.5 - Poolside Laguna XS.2 - ...

2026-05-03T17:17

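A second batch of AI model architecture releases from April 2026, including Ant Ling 2.6 1T, MiniMax M2.7, Xiaomi MiMo V2.5, Poolside Laguna XS.2, Tencent Hy3-preview, and IBM Granite 4.1, among others.

Ant Group released the Ant Ling 2.6 1T model
MiniMax released the M2.7 model
Xiaomi, Tencent, IBM, and others released new models

@rasbt ↗ X AI industry news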
