llama.cpp 新增 MTP(多令牌预测)支持,本地模型推理速度显著提升,足以作为日常驱动。Qwen3.6-27B 密集生成在 A10 GPU 上得到展示,推动本地 AI 部署实用性。
RT @victormustar: llama.cpp with MTP support makes local models fast enough to use as daily drivers 🚀
Qwen3.6-27B dense generation (on A10…
likes: 429 | retweets: 51 | replies: 19 | views: 51513