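The company built its own inference engine, ROSE, to handle production and API traffic ranging from embeddings to trillion-parameter MoEs. ROSE now integrates CuTeDSL to get kernels into production faster and reach peak performance on Hopper and Blackwell GPUs.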
We serve almost all our production and API traffic, ranging from embeddings to trillion-parameter MoEs, with our own runtime-optimized inference engine ROSE. We've now integrated CuTeDSL to push kernels faster to production and achieve peak performance on Hoppers and Blackwells. https://t.co/IMhv27O8kN
likes: 50 | retweets: 4 | replies: 10 | views: 4277