Instructions to use mistral-experimental/pixtral-12b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mistral-experimental/pixtral-12b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="mistral-experimental/pixtral-12b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("mistral-experimental/pixtral-12b")
model = AutoModelForMultimodalLM.from_pretrained("mistral-experimental/pixtral-12b")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use mistral-experimental/pixtral-12b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mistral-experimental/pixtral-12b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mistral-experimental/pixtral-12b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
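
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on the default port 8000. `build_chat_payload` and `send_chat_request` are hypothetical helper names introduced for illustration; they are not part of vLLM.

```python
import json
import urllib.request


def build_chat_payload(model, text, image_url=None):
    """Build an OpenAI-style chat completion body (hypothetical helper)."""
    content = [{"type": "text", "text": text}]
    if image_url is not None:
        # Image parts use the same shape as in the curl example.
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"model": model, "messages": [{"role": "user", "content": content}]}


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to the server's /v1/chat/completions endpoint."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_chat_payload(
    "mistral-experimental/pixtral-12b",
    "Describe this image in one sentence.",
    image_url="https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With a server running, the reply text would be read like this:
# print(send_chat_request(payload)["choices"][0]["message"]["content"])
```

Omitting `image_url` produces a text-only request with the same structure.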

Use Docker

docker model run hf.co/mistral-experimental/pixtral-12b

SGLang

How to use mistral-experimental/pixtral-12b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mistral-experimental/pixtral-12b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mistral-experimental/pixtral-12b",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mistral-experimental/pixtral-12b" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use mistral-experimental/pixtral-12b with Docker Model Runner:
```
docker model run hf.co/mistral-experimental/pixtral-12b
```

Cannot apply chat template from tokenizer

#31

by DarkLight1337 - opened Mar 14, 2025

Discussion

DarkLight1337

Mar 14, 2025

edited Mar 14, 2025

The tokenizer loaded from AutoTokenizer cannot be used to apply the chat template, which is quite unexpected.

>>> from transformers import AutoProcessor, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("mistral-community/pixtral-12b")
>>> tokenizer.chat_template  # The result is None
>>> processor = AutoProcessor.from_pretrained("mistral-community/pixtral-12b")
>>> processor.chat_template  # The result is correct

I have tested various versions of Transformers, and this happens from v4.44 onward (possibly earlier), so I don't think it's a problem with the Transformers library.

RaushanTurganbay

Mar 14, 2025

Hey! This is because the tokenizer config at https://huggingface.co/mistral-community/pixtral-12b/blob/main/tokenizer_config.json doesn't have a "chat_template" field. We usually don't add a chat template to the tokenizer when the model is multimodal, i.e. we don't expect anyone to use the tokenizer and image processor separately.
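
As a rough illustration of that check, the sketch below tests whether a tokenizer config defines a "chat_template" field. The two JSON strings are toy stand-ins for the contents of tokenizer_config.json (their `tokenizer_class` values are placeholders), and `has_chat_template` is a hypothetical helper name.

```python
import json


def has_chat_template(tokenizer_config_text):
    """Return True if the tokenizer config JSON defines a non-empty chat template."""
    config = json.loads(tokenizer_config_text)
    return bool(config.get("chat_template"))


# Toy configs standing in for the contents of tokenizer_config.json.
without_template = '{"tokenizer_class": "LlamaTokenizer"}'
with_template = '{"tokenizer_class": "LlamaTokenizer", "chat_template": "{{ messages }}"}'

print(has_chat_template(without_template))  # False
print(has_chat_template(with_template))  # True
```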

But for some vision LLMs, like Gemma 3, the chat template is duplicated because the model has a Gemma3ForCausalLM class, which allows users to run simple text-only inference.
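
One possible workaround, sketched here as an assumption rather than an official fix, is to copy the processor's template onto the tokenizer when the tokenizer has none. `ensure_chat_template` is a hypothetical helper, and the stand-in objects below only mimic the `chat_template` attribute so the sketch runs without downloading the model. Whether the copied template renders plain text-only messages correctly depends on the template itself.

```python
from types import SimpleNamespace


def ensure_chat_template(tokenizer, processor):
    """Copy the processor's chat template onto the tokenizer if it has none."""
    if getattr(tokenizer, "chat_template", None) is None:
        tokenizer.chat_template = processor.chat_template
    return tokenizer


# Stand-in objects so the sketch runs without downloading the model.
# With Transformers, pass the objects returned by AutoTokenizer.from_pretrained
# and AutoProcessor.from_pretrained instead.
tokenizer = SimpleNamespace(chat_template=None)
processor = SimpleNamespace(chat_template="{{ messages }}")

ensure_chat_template(tokenizer, processor)
print(tokenizer.chat_template)
```

A tokenizer that already has a template is left unchanged.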

pseudotensor

May 9, 2025

You should definitely expect text and vision queries to be sent separately. The original Pixtral model from mistralai worked fine, but it no longer works in newer versions of vLLM. This community version, however, has no tokenizer chat template for vLLM to use for normal text queries.
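
A possible way around this, untested against this model and offered only as a sketch, is to save a chat template to a file and point vLLM at it with its `--chat-template` option. `save_chat_template` is a hypothetical helper, and the template string is a placeholder; in practice it would be the `processor.chat_template` value shown earlier in the thread.

```python
import tempfile
from pathlib import Path


def save_chat_template(template, path):
    """Write a Jinja chat template string to a file and return its path."""
    path = Path(path)
    path.write_text(template, encoding="utf-8")
    return path


# Placeholder template; in practice this would be processor.chat_template
# from AutoProcessor.from_pretrained("mistral-community/pixtral-12b").
template = "{% for message in messages %}{{ message['content'] }}{% endfor %}"
template_path = save_chat_template(
    template, Path(tempfile.gettempdir()) / "pixtral_chat_template.jinja"
)

# The file could then be passed to the server, for example:
#   vllm serve "mistral-community/pixtral-12b" --chat-template <path to the file>
print(template_path)
```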
