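Huawei has successfully pretrained a large language model on Ascend chips, using hyper-node optimized training and DSA, with the aim of demonstrating what its hardware can do.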
Huawei has finally, credibly (?), pretrained a big LLM on Ascends. "Hyper-node optimized training" suggests 950s, I guess. It builds on DSA ("with SWA"). They want to prove it can be done on their hardware.
What is ModAttn?
(pics from Reddit, some translations are off) https://t.co/sxJJrBwb49