
@AravSrinivas: We serve almost all our production and API traffic, ranging from embeddings to trillion-parameter MoEs, with our own runtime-optimized infer...

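@AravSrinivas | Information level: 3 (scale: 1 noise/discard; 2 weak; 3 ordinary fact; 4 important industry development; 5 extremely major event. The score reflects information salience and is not investment advice.) | Published: 2026-05-06T15:15 | Scraped: 2026-05-06 16:02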
Summary

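The company serves almost all of its production and API traffic, from embeddings to trillion-parameter MoEs, with ROSE, its in-house, runtime-optimized inference engine. ROSE now integrates CuTeDSL, which the company says lets it push kernels to production faster and achieve peak performance on Hopper and Blackwell GPUs.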

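Objective facts
  • The company's in-house inference engine, ROSE, serves almost all of its production and API traffic, from embeddings to trillion-parameter MoEs
  • ROSE has integrated CuTeDSL to push kernels to production faster
  • The integration targets peak performance on Hopper and Blackwell GPUs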
ROSE CuTeDSL Hoppers Blackwells

Original text

We serve almost all our production and API traffic, ranging from embeddings to trillion-parameter MoEs, with our own runtime-optimized inference engine ROSE. We've now integrated CuTeDSL to push kernels faster to production and achieve peak performance on Hoppers and Blackwells. https://t.co/IMhv27O8kN

likes: 50 | retweets: 4 | replies: 10 | views: 4277