
@SemiAnalysis_: PDOOM ALERT 🚨 : ~48% of e2e LLM latency is prefill, ~52% is decode. Prefill itself breaks into 2 ops: 🟠 Prefill extend (cache write) — inge...

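@SemiAnalysis_ · Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely significant event). This score measures information salience and is not investment advice. Published: 2026-05-26T23:00 · Fetched: 2026-05-26 23:18
🔗 Original link
Summary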

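SemiAnalysis published a breakdown of LLM inference latency: prefill accounts for about 48% of end-to-end latency and decode for about 52%. Prefill itself splits into prefill extend (cache write) and cache read.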

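Objective facts
  • Prefill accounts for about 48% of end-to-end LLM latency
  • Decode accounts for about 52% of end-to-end LLM latency
  • Prefill splits into prefill extend (cache write) and cache read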
SemiAnalysis

Original text

PDOOM ALERT 🚨 : ~48% of e2e LLM latency is prefill, ~52% is decode. Prefill itself breaks into 2 ops:

🟠 Prefill extend (cache write) — ingests new context/files, writes fresh KV tokens
🟠 Cache read — reuses existing KV cache from prior turns https://t.co/zzKrZFZKhX
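
To make the two prefill operations concrete, here is a minimal sketch of how a serving stack might split an incoming prompt into a reused part (cache read) and a newly computed part (prefill extend). It assumes the simplest possible reuse rule, a longest shared token prefix with the previous turn; `PrefillPlan`, `plan_prefill`, and the token IDs are hypothetical names and values for illustration, not taken from the post or from any particular inference engine.

```python
from dataclasses import dataclass


@dataclass
class PrefillPlan:
    cache_read_tokens: int      # tokens whose KV entries are reused from prior turns
    prefill_extend_tokens: int  # new tokens whose KV entries must be computed and written


def plan_prefill(cached_tokens: list[int], prompt_tokens: list[int]) -> PrefillPlan:
    """Split a prompt into a reused prefix and a newly computed suffix.

    Assumption: reuse is only possible for the longest common token prefix
    between what is already cached and the new prompt.
    """
    shared = 0
    for cached, new in zip(cached_tokens, prompt_tokens):
        if cached != new:
            break
        shared += 1
    return PrefillPlan(
        cache_read_tokens=shared,
        prefill_extend_tokens=len(prompt_tokens) - shared,
    )


# Hypothetical token IDs: turn 1 left five tokens in the cache,
# turn 2 resends them and appends three new ones.
turn_1 = [101, 7, 8, 9, 102]
turn_2 = turn_1 + [55, 56, 57]
plan = plan_prefill(turn_1, turn_2)
print(plan)  # PrefillPlan(cache_read_tokens=5, prefill_extend_tokens=3)
```

In this toy case five tokens are served by cache read and three go through prefill extend; real engines typically track reuse at block or tree granularity rather than per token.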

likes: 10 | retweets: 0 | replies: 0 | views: 1980