Instructions to use froggeric/Qwen-Fixed-Chat-Templates with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use froggeric/Qwen-Fixed-Chat-Templates with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen-Fixed-Chat-Templates froggeric/Qwen-Fixed-Chat-Templates
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
toolcall inside thinking
i get many time a tool call inside the thinking tag... even i use your profile
services:
llama-server:
image: ghcr.io/ggml-org/llama.cpp:full-cuda13-b9209
container_name: llama-server
restart: unless-stopped
ports:
- "16384:8080"
volumes:
- ./models:/models:ro
command: >
--server
--model /models/Qwen3.6-27B-Q4_K_M-uc-mtp-v2.gguf
--alias "Qwen3.6 27B"
--temp 0.6
--top-p 0.95
--min-p 0.00
--top-k 20
--port 8080
--host 0.0.0.0
--fit off
--ctx-size 200000
--presence-penalty 0.0
--repeat-penalty 1.0
--jinja
--chat-template-file /models/Qwen3.6-11.jinja
--mmproj /models/Qwen3.6-27B-Q4_K_M-MTP-mmproj-f16-uc-v2.gguf
--webui
--spec-draft-p-min 0.75
--spec-type draft-mtp
--spec-draft-n-max 3
--chat-template-kwargs '{"preserve_thinking": true}'
--reasoning-budget 8192
--reasoning-budget-message "... thinking budget exceeded, let's answer now.\n"
--split-mode tensor
user: "1000:1000"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
environment:
- NVIDIA_VISIBLE_DEVICES=all
i tested it with your jinja 11 template for qwen 3.6 up to the template 18 ... and i still face this issue...
is this a problem of opencode or is this a problem with the template?
because your template completely kills the thinking process.... i want the thinking process to occur....
your template works exactly the same as if i would turn off thinking...
i need thinking for the 8k given thinking budget, because this is what makes the model so extremely good. but i dont want to let it think for 65k tokens thats why i am limiting it.
additional your template causes which are not recognized as thinking/reasoning text tags as seen in the screenshot , no clue if its think or thinking as the tag... maybe model specific....
@snapo , it was fixed in v1.1
Now is v1.1.2 out which also fixes error case when some apps call the wrong tool
Please test it out and let me know
Just want to say froggeric v19 templates for qwen 3.6 solved it... i had not one single problem anymore...
Apparently latest claude code\ikllama.cpp broke this, and normal qwen template as well:
It straight up outputs tool calling in main text.
llama-server -m "G:\xlam2\Qwen3.6-35B-A3B-APEX-I-Balanced.gguf" -ngl 999 --chat-template-kwargs "{"preserve_thinking":true}" --parallel-tool-calls --prompt-cache prompt.cache --no-mmap -c 230000 --fit --fit-margin 2048 --grouped-expert-routing --cache-type-k q8_0 --cache-type-v q8_0 --k-cache-hadamard --v-cache-hadamard -np 1 -fa on --mlock -t 8 -tb 16 --merge-qkv -b 1280 -ub 1280 -muge --jinja --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.0 --reasoning-budget -1 --repeat-penalty 1.0 --presence-penalty 0.0 --alias 'qwen-3-apex' --cache-ram 8192 --ctx-checkpoints 64 --ctx-checkpoints-interval 4096 -sm layer --host 127.0.0.1 --port 8080 --chat-template-file G:\xlam2\chat_template.jinja
UPDATE: Definitely CC update issue. Pi Code works as expected
Qwen models inherently tend to bleed tool calls into thinking blocks. In the v20 release, I updated the system prompt instructions to be much more strict about closing the block before emitting . Give the new template a try and let me know if it helps.
Qwen models inherently tend to bleed tool calls into thinking blocks. In the v20 release, I updated the system prompt instructions to be much more strict about closing the block before emitting . Give the new template a try and let me know if it helps.
Thank you! that update fixed my issue. CC works now.




