DeepSeek has released V4, introducing MegaMoE, a 1,400-line fused CUDA kernel that computes the entire MoE forward pass.
As we've come to expect from a DeepSeek release, DeepSeek V4 comes with more flashy ML systems optimizations. This time? MegaMoE, a 1,400-line fused CUDA kernel that computes the entire MoE forward pass. Let's see how it works (1/4) 🧵 https://t.co/rqv6y2i3JV
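For context on what "the entire MoE forward pass" usually covers, here is a minimal, unfused reference sketch in plain Python for a single token: route, pick the top-k experts, run each expert's feed-forward network, and combine the weighted outputs. This is not DeepSeek's kernel or its API. The function names, the `(W_up, W_down)` expert layout, the SiLU activation, and the gate renormalization are all assumptions chosen for illustration.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def silu(v):
    """Elementwise SiLU activation: a * sigmoid(a)."""
    return [a / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def moe_forward(x, router_W, experts, top_k):
    """Unfused reference MoE forward pass for one token vector x.

    experts is a list of (W_up, W_down) pairs; this layout is hypothetical.
    Returns the combined output and the indices of the chosen experts.
    """
    # 1. Routing: score every expert and keep the top_k highest-scoring ones.
    probs = softmax(matvec(router_W, x))
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # 2. Renormalize the selected gate weights (an assumption; schemes vary).
    total = sum(probs[i] for i in chosen)

    # 3. Run each selected expert's FFN, then 4. sum the gate-weighted outputs.
    out = [0.0] * len(x)
    for i in chosen:
        W_up, W_down = experts[i]
        y = matvec(W_down, silu(matvec(W_up, x)))
        gate = probs[i] / total
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen
```

Written this way, each numbered step is a separate pass over the data. A fused kernel, as the post describes MegaMoE, would presumably carry out the equivalent work inside one kernel rather than in separate stages.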