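llama.cpp has added MTP (multi-token prediction) support, making local models run significantly faster. On an A10G, Qwen3.6-27B (dense) generation speed rises from 25 tok/s to 45 tok/s, a 78% increase.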
llama.cpp with MTP support makes local models fast enough to use as daily drivers 🚀
Qwen3.6-27B dense generation on an A10G, shown below: from 25 tok/s to 45 tok/s (+78%)! https://t.co/rLjBVa3Yzh