@danielhanchen: We released experimental MTP Qwen3.6 Unsloth GGUFs! Qwen3.6 27B MTP now runs at 140 tokens/s. Qwen3.6 35B-A3B MTP gets 220 tokens/s generat...

@danielhanchen Information level: 3 Published: 2026-05-13T12:20 Scraped: 2026-05-13 16:03


AI Compute

Summary

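Unsloth released experimental Qwen3.6 MTP GGUFs. The 27B model reaches 140 tokens/s and the 35B-A3B model reaches 220 tokens/s on a single GPU, more than 1.4x faster than the original GGUFs with no change in accuracy. The recommended maximum number of draft tokens is 2.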

Objective facts

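Released Qwen3.6 MTP GGUFs with support for speculative decoding
27B model generates at 140 tokens/s
35B-A3B model generates at 220 tokens/s on a single GPU, more than 1.4x faster than the original GGUF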

Qwen Unsloth GPU

Original text

We released experimental MTP Qwen3.6 Unsloth GGUFs!

Qwen3.6 27B MTP now runs at 140 tokens/s. Qwen3.6 35B-A3B MTP gets 220 tokens/s generation on a single GPU.

Qwen3.6 27B and 35B-A3B have >1.4x speed-up over the original GGUFs without any change in accuracy.

Guide + GGUFs + Benchmarks: https://t.co/x9BYC3iXCL

In terms of average speedup, we see about 1.4x for dense models at draft tokens = 2, and around 1.15x to 1.2x for the MoE.

We do not recommend more than 2 draft tokens because the acceptance rate drops precipitously from 83% to 50% with 4 draft tokens, and the forward passes for MTP become less beneficial.
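
To make the acceptance-rate argument concrete, here is a rough Python sketch of how many tokens one verification pass yields on average. It assumes the quoted 83% and 50% figures are per-token acceptance probabilities, that each draft token is accepted independently, and that drafting stops at the first rejection. None of these assumptions is stated in the post, and the function name is made up for illustration. The sketch also ignores the extra cost of the MTP forward passes themselves.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_tokens: int) -> float:
    """Expected tokens emitted per verification pass.

    Simplified model (an assumption, not from the post): every draft token
    is accepted independently with the same probability, and drafting stops
    at the first rejection. The verification pass always contributes one
    token of its own, which is the k = 0 term of the sum.
    """
    return sum(acceptance_rate ** k for k in range(draft_tokens + 1))


two_drafts = expected_tokens_per_step(0.83, 2)
four_drafts = expected_tokens_per_step(0.50, 4)

print(f"2 draft tokens at 83% acceptance: {two_drafts:.2f} tokens per pass")
print(f"4 draft tokens at 50% acceptance: {four_drafts:.2f} tokens per pass")
```

Under this simplified model, two draft tokens at 83% acceptance yield more tokens per pass than four draft tokens at 50% acceptance, while also requiring fewer draft passes, which is consistent with the recommendation to stay at 2.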

Use --spec-type mtp --spec-draft-n-max 2
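
As a hedged illustration of where the flags above might go, the sketch below assembles a command line in Python. Only `--spec-type mtp --spec-draft-n-max 2` comes from the post. The binary name (`llama-server` from llama.cpp) and the model filename are assumptions chosen for illustration; the post does not say which build or binary supports these flags, so check the linked guide before running anything.

```python
import shlex

# Flags quoted verbatim from the post.
MTP_FLAGS = ["--spec-type", "mtp", "--spec-draft-n-max", "2"]

# Assumptions for illustration only: neither value appears in the post.
BINARY = "llama-server"                # assumed llama.cpp server binary
MODEL_PATH = "Qwen3.6-27B-MTP.gguf"    # hypothetical local filename

# Build the argument list first, then render it as a shell-safe string.
command = [BINARY, "-m", MODEL_PATH, *MTP_FLAGS]
print(shlex.join(command))
```

Keeping the arguments as a list means the same `command` value can be passed directly to `subprocess.run` without shell quoting issues.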

Thanks to Aman for https://t.co/0WKkIC0kyW!

likes: 268 | retweets: 35 | replies: 22 | views: 16336