Sourcing — Feed


3 @AravSrinivas: RT @perplexity_ai: We're open-sourcing the Unigram tokenizer we rebuilt to reduce CPU utilization by 5-6x. Small rerankers and embedders r…

2026-05-27T15:59

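Perplexity AI announced that it is open-sourcing its rebuilt Unigram tokenizer, which it says reduces CPU utilization by 5-6x. The post also mentions small rerankers and embedders.

Perplexity AI open-sources its Unigram tokenizer
Claimed 5-6x reduction in CPU utilization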

Aravind Srinivas ↗ X AI Compute

3 @AravSrinivas: GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack qu...

2026-05-12T14:27

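According to the post, NVIDIA GB200 changes how prefill and decode disaggregation is done when serving large MoE models such as Qwen and offers a throughput advantage over Hopper. The team has published quantitative comparison results.

GB200 changes how prefill and decode are disaggregated for large MoE models
GB200 delivers higher throughput than Hopper when serving Qwen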

Aravind Srinivas ↗ X AI Compute Industry

3 @AravSrinivas: RT @perplexity_ai: We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 i…

2026-05-12T14:22

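Perplexity AI published new research on how it serves post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks, covering hardware and model inference optimization.

Perplexity AI publishes research on model serving on NVIDIA GB200 NVL72
The research covers post-trained Qwen3 235B models
The deployment runs on Blackwell-based GB200 systems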

Aravind Srinivas ↗ X AI Compute Industry

3 @AravSrinivas: RT @NVIDIAAI: Perplexity runs on NVIDIA. Nice breakdown from the team on how they’re using the CUTLASS Python stack to optimize their mod…

2026-05-07T22:00

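NVIDIA AI notes that Perplexity runs on NVIDIA and highlights the Perplexity team's breakdown of how it uses the CUTLASS Python stack to optimize its models, reflecting the two companies' collaboration on AI compute.

Perplexity runs on the NVIDIA platform.
Perplexity uses the CUTLASS Python stack to optimize its models.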

Aravind Srinivas ↗ X AI Compute

3 @AravSrinivas: We serve almost all our production and API traffic, ranging from embeddings to trillion-parameter MoEs, with our own runtime-optimized infer...

2026-05-06T15:15

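Perplexity's in-house inference engine, ROSE, serves almost all of its production and API traffic, from embeddings to trillion-parameter MoEs. ROSE integrates CuTeDSL to speed up kernel deployment and reach peak performance on Hopper and Blackwell GPUs.

The in-house inference engine ROSE covers production and API traffic from embeddings to trillion-parameter MoEs
ROSE integrates CuTeDSL to speed up kernel deployment
ROSE reaches peak performance on Hopper and Blackwell GPUs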

Aravind Srinivas ↗ X AI Compute Industry

3 @AravSrinivas: RT @perplexity_ai: We’ve developed our own inference engine Runtime-Optimized Serving Engine (ROSE) to serve models ranging from embeddings…

2026-05-06T15:09

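Perplexity AI announced its in-house inference engine, the Runtime-Optimized Serving Engine (ROSE), which serves models ranging from embedding models to models of various sizes, with a focus on runtime optimization.

Perplexity AI has developed its own inference engine, ROSE.
ROSE serves models ranging from embedding models to models of various sizes.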

Aravind Srinivas ↗ X AI Compute Updates
