Instructions to use froggeric/Qwen-Fixed-Chat-Templates with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen-Fixed-Chat-Templates with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen-Fixed-Chat-Templates froggeric/Qwen-Fixed-Chat-Templates
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
Reasoning off mode issue
I tried multiple qwen models with llama.cpp in --reasoning off mode, and it happens very often that an orphaned </think> tag appears. It did not do this with other chat templates.
here too:
config.ini:
[*]
n-gpu-layers = all
ctx-size = 65536
threads = 18
batch-size = 2048
ubatch-size = 1024
parallel = 2
mlock = true
mmap = true
; no-mmap = true
flash-attn = true
cache-type-k = q8_0
cache-type-v = q8_0
cache-type-k-draft = q8_0
cache-type-v-draft = q8_0
reasoning = false
prio = 3
seed = 3407
jinja = true
[Qwen3.6-35B-A3B:UD-Q4_K_XL]
model = /models/Qwen3.6/Qwen3.6-35B-A3B-UD-Q4_K_XL.gguf
mmproj = /models/Qwen3.6/mmproj-F16.gguf
temperature = 0.7
top-p = 0.8
top-k = 20
min-p = 0.0
presence-penalty = 1.5
repeat-penalty = 1.0
image-min-tokens = 1024
spec-type = draft-mtp
spec-draft-n-max = 2
chat-template-file = /templates/froggeric-chat_template-v19.jinja
open-webui v0.9.5
prompt:
**Build a VRAM and KV cache calculator tool for llama.cpp server.** The tool should include the following parameters: model type (e.g., Qwen2.5-72B), bit precision (4-bit/8-bit), total `--ctx-size`, number of `--parallel` slots, and batch size. The output should display the estimated VRAM usage, KV cache allocation per slot, and warnings if this configuration exceeds physical GPU limits to avoid 400 errors or stuttering/lag when running multiple concurrent threads.
response:
I'll build a comprehensive VRAM and KV Cache Calculator for llama.cpp server. Let me first research the current understanding of these calculations to ensure accuracy.
<function=web_search>
<parameter=search_queries>
["llama.cpp VRAM calculation KV cache formula 2024", "llama.cpp --ctx-size --parallel VRAM usage calculator", "KV cache memory calculation transformer models bits per token"]
</parameter>
</function>
</tool_call>
Found this one, works nice:
https://huggingface.co/spiritbuun/buun-Qwen3.6-chat_template
Thanks, I will give a try.
In the v20 release, I completely overhauled the thinking toggles and state tracking to handle reasoning-off environments better. Please try the latest v20 template and see if that cleans up the orphaned tags. If you're still seeing it, you might need to update your llama.cpp server to the latest build.
👋
In the v20 release, I completely overhauled the thinking toggles and state tracking to handle reasoning-off environments better. Please try the latest v20 template and see if that cleans up the orphaned tags. If you're still seeing it, you might need to update your llama.cpp server to the latest build.
Thank you for this!
However I've been working with this template tonight and running PiehSoft/Qwen3.6-40B-Deckard-MTP model overnight. And now see these error logs:
Now let me update the test file to inline the functions.
⤵ 1K ⤴ 81 cache: 95K
┌─── ✎ Write: 🟦 src/widgets/Speedometer/__tests__/Speedometer.test.ts · 1 line ──────────────────────────────────────────────────────────────────────────┐
│ 1 import { describe, it, expect, vi, │
│ ✘ Diagnostics (1 error(s)) │
│ └─ 🟦 src/widgets/Speedometer/__tests__/Speedometer.test.ts │
│ └─ ✘:1:35 '}' expected. (1005) │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Error: 500 Failed to parse tool call arguments as JSON: [json.exception.parse_error.101] parse error at line 1, colu…
...
Error: Retry failed after 10 attempts: 500 Failed to parse tool call arguments as JSON: [json.exception.parse_error.101] parse error at line 1, column
168: syntax error while parsing value - invalid string: missing closing quote; last read: '"import { describe, it, expect, vi,'
Also in llamacpp server logs I got this:
175.27.099.097 W srv operator(): got exception: {"error":{"code":500,"message":"Failed to parse tool call arguments as JSON: [json.exception.parse_error.101] parse error at line 1, column 168: syntax error while parsing value - invalid string: missing closing quote; last read: '\"import { describe, it, expect, vi,'","type":"server_error"}}
Not sure it's a template problem but this is exactly what was highlighted to me by my assistant model.
Its said that this template has probable fix to add kwarg auto_disable_thinking_with_tools. Please have a look and maybe this template also needs it? :
https://huggingface.co/spiritbuun/buun-Qwen3.6-chat_template
