
@danielhanchen: Qwen3.6 MTP Unsloth GGUFs now run 1.8x faster, increased from 1.4x just two days ago! This is due to llama.cpp adding --spec-draft-p-min 0....

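@danielhanchen · Information level: 3 (scale: 1 = noise/discard; 2 = weak; 3 = ordinary fact; 4 = important industry development; 5 = extremely significant event. The score reflects information salience and is not investment advice.) Published: 2026-05-15T13:10 · Crawled: 2026-05-15 16:03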
Summary

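Qwen3.6 MTP Unsloth GGUFs now run 1.8x faster, thanks to the new --spec-draft-p-min parameter in llama.cpp. MTP GGUF models in several sizes from 0.8B to 9B were also released, and two speculative decoding algorithms can now be used together.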

Objective facts
  • Qwen3.6 MTP Unsloth GGUFs went from a 1.4x speedup to a 1.8x speedup
  • The speedup comes from llama.cpp adding the --spec-draft-p-min 0.75 parameter
  • Qwen3.6 MTP GGUF models from 0.8B to 9B were released
Qwen3.6 Unsloth GGUF llama.cpp

Original post

Qwen3.6 MTP Unsloth GGUFs now run 1.8x faster, increased from 1.4x just two days ago!

This is due to llama.cpp adding --spec-draft-p-min 0.75!

Args have also changed from
--spec-type mtp
to
--spec-type draft-mtp
Also increase --spec-draft-n-max 2 to 6

We also released Qwen3.6-0.8B, 2B, 4B, 9B MTP GGUFs! We'll be providing more soon!

For folks who find the new updated branch to have some perf regression, set --spec-draft-p-min to 0.0 to get the old behavior - we provided a plot of the old branch (red) vs the new branch (blue / green) as well.
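The post does not spell out what the threshold does. As a rough illustration only, the sketch below assumes --spec-draft-p-min is a minimum draft-token probability: drafting stops at the first token whose confidence falls below it, or once --spec-draft-n-max tokens have been drafted. The function name `truncate_draft` and the probability values are hypothetical and are not taken from llama.cpp.

```python
def truncate_draft(token_probs, p_min=0.75, n_max=6):
    """Count the drafted tokens kept before the main model verifies them.

    Assumed semantics (not confirmed by the post): stop at the first token
    whose draft probability is below p_min, or after n_max tokens.
    """
    kept = 0
    for p in token_probs[:n_max]:
        if p < p_min:
            break  # low-confidence token: end the draft early
        kept += 1
    return kept


# Hypothetical draft-token confidences, invented for illustration.
probs = [0.98, 0.91, 0.62, 0.95, 0.88, 0.97, 0.99]

print(truncate_draft(probs, p_min=0.75, n_max=6))  # cut off at the 0.62 token
print(truncate_draft(probs, p_min=0.0, n_max=6))   # no confidence cutoff
print(truncate_draft(probs, p_min=0.0, n_max=2))   # previous n_max from the post
```

Under this assumption, a p_min of 0.0 never ends a draft early, which matches the post's note that 0.0 restores the old behavior. A higher threshold would avoid spending verification work on tokens that are likely to be rejected.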

Also you can use 2 speculative decoding algos - you can add ngram via --spec-type ngram-mod,draft-mtp - the perf isn't yet optimized so I'll do more benchmarks to find better numbers - see https://t.co/0WKkIC0kyW
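To keep the flags from the post in one place, here is a minimal sketch that assembles a command line from them. The flag names and values (--spec-type, --spec-draft-n-max, --spec-draft-p-min) come from the post. The `llama-server` binary name, the `-m` model flag, the model filename, and the helper `build_spec_args` are assumptions for illustration, so check them against the linked guide before use.

```python
import shlex


def build_spec_args(model_path, spec_types=("draft-mtp",), n_max=6, p_min=0.75):
    """Assemble an argument list using the speculative decoding flags from the post."""
    return [
        "llama-server",                       # assumed binary name
        "-m", model_path,                     # assumed model flag and path
        "--spec-type", ",".join(spec_types),  # e.g. "ngram-mod,draft-mtp"
        "--spec-draft-n-max", str(n_max),
        "--spec-draft-p-min", str(p_min),
    ]


# Hypothetical model filename; combines ngram with MTP drafting as in the post.
cmd = build_spec_args("model-mtp.gguf", spec_types=("ngram-mod", "draft-mtp"))
print(shlex.join(cmd))

# Setting p_min to 0.0 gives what the post describes as the old behavior.
print(shlex.join(build_spec_args("model-mtp.gguf", p_min=0.0)))
```

Building the command as a list rather than a single string makes it easy to pass to `subprocess.run` unchanged and to sweep values such as `p_min` when benchmarking.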

Guide for MTP: https://t.co/x9BYC3iXCL

likes: 234 | retweets: 28 | replies: 26 | views: 10991