Sourcing — Feed

3 @SemiAnalysis_: One of the data points we keep flagging from our power-crisis research, because it captures the entire mismatch between what AI operators wa...

2026-05-29T17:01

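SemiAnalysis research points to a large gap in ERCOT between the data center interconnection requests filed by AI operators and what the grid can actually approve, a gap that captures the supply-demand mismatch at the heart of the power crisis.

In ERCOT, data center interconnection requests far exceed what the grid is willing to underwrite
The gap exposes the mismatch between AI operators' build-out plans and the grid's approval capacity

@SemiAnalysis_ ↗ X Industry Data centers Compute AI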

3 @SemiAnalysis_: Running a single deep coding model at max context on Cerebras requires 24 systems ($24M Capex) just to support 256 concurrent users. At that...

2026-05-29T04:00

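SemiAnalysis posted that running a single deep coding model at max context on Cerebras takes 24 systems ($24M of capex) just to support 256 concurrent users, whereas the same money spent on standard GB300 racks buys more memory bandwidth.

Running a deep coding model on Cerebras takes 24 systems ($24M) to support 256 concurrent users
For the same money, standard GB300 racks provide more memory bandwidth

@SemiAnalysis_ ↗ X AI Compute Industry Data centers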
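
The Cerebras figures above can be broken down per system and per user. The sketch below uses only the three numbers quoted in the post (24 systems, $24M, 256 concurrent users); the per-unit breakdown is our own arithmetic, not a figure from SemiAnalysis.

```python
# Figures quoted in the post; the per-unit breakdown is illustrative arithmetic.
SYSTEMS = 24
CAPEX_USD = 24_000_000
CONCURRENT_USERS = 256

capex_per_system = CAPEX_USD / SYSTEMS
capex_per_user = CAPEX_USD / CONCURRENT_USERS
users_per_system = CONCURRENT_USERS / SYSTEMS

print(f"Capex per system: ${capex_per_system:,.0f}")
print(f"Capex per concurrent user: ${capex_per_user:,.0f}")
print(f"Concurrent users per system: {users_per_system:.1f}")
```

On these numbers, each concurrent user ties up roughly $94k of hardware, or about 11 users per system.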

3 @SemiAnalysis_: HUGE DEEP DIVE ALERT 🚨: After watching 800VDC sidecar prototypes steal the show at every major conference we’ve attended this 2026, we sat d...

2026-05-29T01:00

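SemiAnalysis published a deep-dive report projecting that 800VDC power delivery will drive roughly 39GW of new data center capacity by 2030, and analyzing the technology's penetration rate, market opportunity, and challenges.

800VDC projected to drive roughly 39GW of new data center capacity by 2030

@SemiAnalysis_ ↗ X Data centers Compute Industry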

3 @SemiAnalysis_: AGI ALERT 🚨 : 63% of sessions do not use sub-agents at all, while 25.9% use 1-5 concurrent sub-agents. 9.8% of sessions use over 5+ paralle...

2026-05-28T01:00

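According to SemiAnalysis data on AI agent usage, 63% of sessions use no sub-agents, 25.9% use 1-5 concurrent sub-agents, and 9.8% use more than 5 parallel sub-agents. Parallel sub-agents can speed up task completion without increasing HBM bandwidth requirements.

63% of sessions use no sub-agents
25.9% use 1-5 concurrent sub-agents
9.8% use more than 5 parallel sub-agents

@SemiAnalysis_ ↗ X AI Compute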

3 @SemiAnalysis_: PDOOM ALERT 🚨 : ~48% of e2e LLM latency is prefill, ~52% is decode. Prefill itself breaks into 2 ops: 🟠 Prefill extend (cache write) — inge...

2026-05-26T23:00

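SemiAnalysis published an LLM inference latency breakdown: prefill accounts for ~48% of end-to-end latency and decode for ~52%; prefill itself splits into prefill extend (cache write) and cache read.

Prefill accounts for ~48% of end-to-end LLM latency
Decode accounts for ~52% of end-to-end LLM latency
Prefill splits into prefill extend (cache write) and cache read

@SemiAnalysis_ ↗ X AI Compute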

3 @SemiAnalysis_: PoV: 70% of New Grad SWE at Meta being reassigned to apply their engineering talent to this RL task https://t.co/UGfvJtFQlK

2026-05-26T19:03

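In SemiAnalysis's telling, Meta is reassigning 70% of its new-grad software engineers to a reinforcement learning (RL) task, a sign of how heavily the company is tilting resources toward RL.

Meta is reassigning 70% of new-grad software engineers to an RL task

@SemiAnalysis_ ↗ X AI Compute Industry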

3 @SemiAnalysis_: One of the threads we kept pulling on in our recent piece on how AI labs are solving the power crisis is that onsite gas has stopped being a...

2026-05-23T21:00

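SemiAnalysis notes that onsite gas has stopped being a fringe option and has quietly become the default planning assumption for the next generation of US AI training clusters.

Onsite gas is now the default planning assumption for next-generation US AI training clusters
The shift happened quietly; onsite gas was previously seen as a fringe option

@SemiAnalysis_ ↗ X Industry AI Compute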

3 @SemiAnalysis_: FACT ALERT 🚨 : In modern agentic coding, 42% of the time is spent on CPU doing tool use such as editing files, running Bash scripts, running...

2026-05-23T14:00

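According to the analysis, 42% of the time in modern agentic coding is spent on the CPU doing tool use, such as editing files and running Bash scripts. Traditional cloud computing bills per CPU core, whereas the agent economy bills per token, so growing token revenue requires adding CPU compute.

42% of the time in modern agentic coding is spent on the CPU doing tool use.

@SemiAnalysis_ ↗ X AI Compute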

3 @SemiAnalysis_: Great BoM Analysis from our friends at Morgan Stanley A couple things to point out: 1. The memory value indicated here is referring to...

2026-05-22T17:37

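Morgan Stanley published a BoM analysis of the NVL72, and SemiAnalysis adds several caveats: the memory value excludes HBM; Nvidia marks up the memory it procures; PCB content rises in both area and material grade because of the cableless design; and the BoM prices are OEM channel prices, with hyperscalers and Neoclouds paying less.

Memory cost in the NVL72 BoM excludes HBM, which is counted under the GPU line item
Nvidia marks up the memory it procures, so supplier revenue is lower than the BoM suggests
PCB content rises in area and material grade because of the cableless design

@SemiAnalysis_ ↗ X Industry Semiconductors AI Compute Data centers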

3 @SemiAnalysis_: Agentic workloads are quietly rewriting inference economics. We pulled data from 432k real coding agent requests at SemiAnalysis and the med...

2026-05-22T17:01

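SemiAnalysis analyzed 432k real coding agent requests and found a median input length of 96k tokens, longer than the full text of The Great Gatsby, a sign that agentic workloads are changing inference economics.

Median input length: 96k tokens
Data source: 432k real coding agent requests

@SemiAnalysis_ ↗ X AI Compute Research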

3 @SemiAnalysis_: TPU ALERT: For OSS production Kubernetes distributed inferencing, Google just added nightly CI for llm-d. Great step by Google to start enab...

2026-05-21T02:00

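Google has added nightly CI for llm-d, the open-source tool for production distributed inference on Kubernetes. TPU is catching up to NVIDIA on llm-d CI and code quality. AMD has not yet added its GPUs or NICs to this CI.

Google added nightly CI for llm-d.
TPU is catching up to NVIDIA on llm-d CI and code quality.
AMD has not yet added its GPUs or NICs to the llm-d CI.

@SemiAnalysis_ ↗ X Industry AI Cloud Compute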

3 @SemiAnalysis_: Warren Buffett's Berkshire Hathaway first invested in Google in Q3 2025, coincidentally the same time that SemiAnalysis called out a huge in...

2026-05-19T21:00

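Berkshire Hathaway first invested in Google in Q3 2025 and added to its position in Q1 2026. According to the post, Buffett cited his understanding of the TPU v5p architecture and likened it to a railroad system.

Berkshire Hathaway first invested in Google in Q3 2025
Berkshire added to its Google stake in Q1 2026

@SemiAnalysis_ ↗ X Companies AI Compute US stocks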

3 @SemiAnalysis_: At Stanford CS153 Frontier Systems, Jensen states word for word that he "would like to be at low MFU all the time" & the reasoning Jensen gi...

2026-05-17T02:20

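Speaking at Stanford's CS153 Frontier Systems, Jensen Huang said he "would like to be at low MFU all the time" (MFU: model FLOPs utilization), reasoning that overprovisioning compute, networking, and memory yields higher intelligence, and hinted that xAI may follow this strategy.

Jensen Huang says he wants to be at low MFU all the time.
The reasoning: overprovisioning compute, networking, and memory raises intelligence.
Huang hinted that xAI may follow this philosophy.

@SemiAnalysis_ ↗ X AI Compute Industry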

3 @SemiAnalysis_: SERIOUS & COOL: AIPerf -- a sub-repo of the Nvidia Dynamo project focused on benchmarking LLM workloads -- just accepted an upstream contrib...

2026-05-16T20:27

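AMD has contributed code to Nvidia's open-source AIPerf project for the first time; the repo focuses on benchmarking LLM workloads. The contribution is seen as a meaningful step for the open-source community and could help advance high-quality, vendor-neutral code.

AMD submitted code to AIPerf, a sub-repo of Nvidia's Dynamo project
This is the first time AMD has been accepted as an upstream contributor to an Nvidia repo

@SemiAnalysis_ ↗ X AI Compute Industry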

4 @SemiAnalysis_: During their last Google Cloud Next conference in Las Vegas, Google unveiled their new inference-focused TPU, featuring a novel network topo...

2026-05-14T17:00

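At Google Cloud Next, Google unveiled a new inference-focused TPU with a novel network topology called Broadfly. Thanks to a high-radix design, a single pod scales to as many as 1152 TPUs, 4.5x the pod size of Ironwood, while the network diameter shrinks to at most 7 hops between any two chips.

Google unveiled a new inference-focused TPU using the Broadfly network topology
A single pod of the new TPU scales to 1152 chips
Pod size is 4.5x that of Ironwood, with at most 7 hops between chips

@SemiAnalysis_ ↗ X Industry AI Compute Data centers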
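
The pod-size comparison above implies a baseline that the summary does not state. A minimal sketch, using only the two quoted figures (1152 chips and the 4.5x ratio), derives it; the resulting baseline is our inference, not a number from the post.

```python
# Figures quoted in the summary; the baseline is derived, not quoted.
NEW_POD_CHIPS = 1152     # max TPUs per pod for the new inference TPU
POD_SIZE_RATIO = 4.5     # stated pod-size increase versus Ironwood

implied_baseline_chips = NEW_POD_CHIPS / POD_SIZE_RATIO
print(f"Implied Ironwood comparison pod: {implied_baseline_chips:.0f} chips")
```

This suggests the 4.5x claim is measured against a 256-chip Ironwood configuration.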

3 @SemiAnalysis_: THE MORE U BUY, THE MORE U SAVE: By ganging up multiple B200 8-GPU machines together over RoCEv2 CX-7 ethernet with Tomahawk switches with a...

2026-05-12T17:01

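The post describes ganging multiple B200 8-GPU machines together over RoCEv2 CX-7 Ethernet with Tomahawk switches and applying prefill-decode (PD) disaggregated inference optimizations, which raises token throughput per GPU by up to 7x and lowers cost per million tokens by 7x.

B200 machines are ganged together over RoCEv2 with Tomahawk switches to enable PD disaggregation
Token throughput per GPU rises by up to 7x
Cost per million tokens falls by 7x

@SemiAnalysis_ ↗ X AI Compute Data centers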

3 @SemiAnalysis_: SPEED IS THE MOAT: AMD ROCm software stack has improved performance by over 75x in the last 14 days since DeepSeekv4 launch. The performance...

2026-05-10T17:00

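The AMD ROCm software stack has improved performance by over 75x in the 14 days since the DeepSeekv4 launch, by fusing mHC operations and the RoPE Hadamard transform to cut CPU overhead and raise HBM utilization. In addition, the attention indexer and KVCache compressor were written in TileLang and Triton to speed up development. Next targets: a further 5x to match single-node B200, and a further 1.5x to match PD-disaggregated B200.

AMD ROCm software stack improved performance by over 75x in 14 days
Improvements include fusing mHC operations and the RoPE Hadamard transform
Targets: a further 5x to match single-node B200, and a further 1.5x to match PD-disaggregated B200

@SemiAnalysis_ ↗ X AI Semiconductors Compute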

2 @SemiAnalysis_: Amazing work from the @sgl_project and @radixark team for their work optimizing DeepSeek V4 inference on B200, B300, and the recent 4x iso-...

2026-05-09T01:00

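The SGL Project and Radixark teams optimized DeepSeek V4 inference on B200 and B300, and delivered a 4x improvement in interactive throughput on GB300.

The teams optimized DeepSeek V4 inference on B200 and B300
4x improvement in interactive throughput on GB300

@SemiAnalysis_ ↗ X AI Compute Industry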

3 @SemiAnalysis_: Datacenter developers are increasingly planning projects in unincorporated county land, and it's not an accident. Outside city limits, they ...

2026-05-08T17:01

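Datacenter developers are increasingly planning projects on unincorporated county land, and it's not an accident. Outside city limits, they can bypass city council approvals, zoning votes, and land-use reviews, which is redrawing the map of where large-scale AI infrastructure gets built.

Datacenter developers favor unincorporated county land
It lets them avoid city approval processes
The result is a redrawn map of AI infrastructure siting

@SemiAnalysis_ ↗ X Industry AI Data centers Compute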

3 @SemiAnalysis_: POV of @vllm_project maintainers optimizing DeepSeekv4 performance on day 0 and merging their initial model support PR over the weekend. SPE...

2026-05-08T03:00

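vLLM project maintainers optimized DeepSeekv4 performance on day 0 and merged their initial model support PR over the weekend, underscoring that speed is the key advantage.

vLLM maintainers optimized DeepSeekv4 performance
The initial model support PR was merged over the weekend
Speed is framed as the core advantage

@SemiAnalysis_ ↗ X AI Compute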

3 @SemiAnalysis_: RT @SemiAnalysis_: when Anthropic adds 200MW on a Wednesday https://t.co/YzjDiq5po1

2026-05-07T05:15

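SemiAnalysis posted that Anthropic added 200MW of power capacity on a Wednesday. The post implies Anthropic is expanding its compute infrastructure but gives no project details.

Anthropic added 200MW of power capacity on a Wednesday

@SemiAnalysis_ ↗ X Industry AI Compute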

3 @SemiAnalysis_: Canyon Overlook, @ZionNPS - MI355x on SGLang has achieved >10x improvement on throughput PER GPU since day-0 release for DeepSeekv4 Pro. ...

2026-05-06T13:00

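AMD MI355x running DeepSeekv4 Pro on SGLang has improved throughput per GPU by more than 10x since the day-0 release.

AMD MI355x achieves a >10x per-GPU throughput improvement on SGLang
The result is for the DeepSeekv4 Pro model

@SemiAnalysis_ ↗ X AI Compute Industry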

3 @SemiAnalysis_: MINECRAFT STEVE ALERT: GB300 ultra NVL72 is already 2.7x faster 🚀 than GB200 NVL72 on one of the industry standard inference engine known a...

2026-05-04T21:00

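According to the post, GB300 ultra NVL72 is already 2.7x faster than GB200 NVL72 on the vLLM inference engine. The theoretical uplift is only 1.5x, but full-stack optimization delivers higher real-world performance. The temporary test systems were provided by Nvidia, Inferact, and CoreWeave for open-source work.

GB300 ultra NVL72 is 2.7x faster than GB200 NVL72 on vLLM
On paper, GB300 has only 1.5x the NVFP4 FLOPs and 1.5x the HBM capacity
The gain comes from compounding improvements across full-stack optimization

@SemiAnalysis_ ↗ X AI Compute Industry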
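
One way to read the gap between the 2.7x measured speedup and the 1.5x spec-sheet ratio is to divide one by the other. The sketch below rests on two assumptions of ours that the post does not state: that gains compose multiplicatively, and that the full 1.5x hardware ratio is realized in practice.

```python
MEASURED_SPEEDUP = 2.7   # GB300 vs GB200 NVL72 on vLLM, as quoted
HARDWARE_RATIO = 1.5     # NVFP4 FLOPs and HBM capacity ratio, as quoted

# Assumption (ours): speedups multiply and the hardware ratio is fully
# realized, so whatever remains is attributed to full-stack optimization.
residual = MEASURED_SPEEDUP / HARDWARE_RATIO
print(f"Residual beyond the spec-sheet ratio: {residual:.2f}x")
```

Under those assumptions, roughly 1.8x of the gain would come from optimization rather than from the raw hardware ratio.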

3 @SemiAnalysis_: A common misconception is that TPU v8i must be the training chip because it has two compute dies. Die count is not the relevant metric, what...

2026-05-04T17:00

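SemiAnalysis flags a common misconception: TPU v8i is the inference chip, not the training chip. v8i carries 8 stacks of HBM3E 12-Hi memory for 288GB total at 8.6 TB/s, versus 6 stacks, 216GB, and 6.5 TB/s on v8t. v8i has 384MB of on-chip SRAM versus 128MB on v8t. On FP4 compute, v8i delivers 10.1 PFLOPs versus 12.6 PFLOPs on v8t.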
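
The v8i and v8t figures above can be compared as ratios rather than raw numbers. The sketch below uses only the quoted specs; the `TpuSpec` class and the bytes-per-kFLOP metric are our own illustrative choices, not something from SemiAnalysis or Google.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpuSpec:
    """Hypothetical container for the specs quoted in the summary."""
    name: str
    hbm_stacks: int
    hbm_gb: float
    hbm_tbps: float      # HBM bandwidth in TB/s
    sram_mb: float
    fp4_pflops: float

    @property
    def bytes_per_kflop(self) -> float:
        # TB/s divided by PFLOP/s gives bytes of HBM traffic per 1000 FLOPs.
        return self.hbm_tbps / self.fp4_pflops

    @property
    def gb_per_stack(self) -> float:
        return self.hbm_gb / self.hbm_stacks


v8i = TpuSpec("v8i", 8, 288, 8.6, 384, 10.1)
v8t = TpuSpec("v8t", 6, 216, 6.5, 128, 12.6)

for chip in (v8i, v8t):
    print(f"{chip.name}: {chip.gb_per_stack:.0f} GB/stack, "
          f"{chip.bytes_per_kflop:.2f} bytes per kFLOP, "
          f"{chip.sram_mb:.0f} MB SRAM")

bandwidth_advantage = v8i.bytes_per_kflop / v8t.bytes_per_kflop
print(f"v8i bandwidth per FLOP vs v8t: {bandwidth_advantage:.2f}x")
```

On these numbers v8i has more memory bandwidth per unit of FP4 compute and three times the SRAM, while v8t has more raw FLOPs, which fits the claim that v8i is the inference part.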