This page documents how vLLM's Python package dependencies are declared, organized, tracked, and installed. It covers the requirements/ directory structure, version pinning for Docker builds, and the infrastructure for handling various hardware backends and nightly PyTorch releases.
For how dependencies are consumed during Docker image construction, see docker/Dockerfile:1-125. For the CMake build system and C extension dependencies, see setup.py:184-250. For ROCm-specific build setup, see docker/Dockerfile.rocm:1-121.
All Python dependency declarations live in the requirements/ directory. The files split dependencies by purpose (runtime, build, test) and by platform (CUDA, ROCm, XPU, CPU).
| File | Purpose | Locked? |
|---|---|---|
| requirements/common.txt | Core runtime dependencies, all platforms | No |
| requirements/cuda.txt | CUDA runtime deps; includes common.txt | No |
| requirements/rocm.txt | ROCm runtime deps; includes common.txt | No |
| requirements/xpu.txt | XPU runtime deps; includes common.txt | No |
| requirements/cpu.txt | CPU runtime deps; includes common.txt | No |
| requirements/test/cuda.txt | Fully-locked CUDA test deps (auto-generated) | Yes |
| requirements/test/rocm.txt | Fully-locked ROCm test deps (auto-generated) | Yes |
| requirements/test/xpu.txt | Fully-locked XPU test deps (auto-generated) | Yes |
Sources: requirements/common.txt:1-59, requirements/cuda.txt:1-30, requirements/rocm.txt:1-30, requirements/xpu.txt:1-21, requirements/cpu.txt:1-24
The following diagram shows the include relationships between requirements files and how they map to the files on disk.
[Diagram: Requirements File Include Graph]
Sources: requirements/cuda.txt:1-2, requirements/rocm.txt:1-2, requirements/xpu.txt:1-2, requirements/cpu.txt:1-3
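The include relationship can be illustrated with a short sketch that expands `-r` lines the way pip-style requirements files are read. This is a simplified illustration, not vLLM tooling: `resolve_requirements` is a hypothetical helper, it handles only the short `-r` form, and the two miniature files are written to a temporary directory rather than read from the repository.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Set


def resolve_requirements(path: Path, seen: Optional[Set[Path]] = None) -> List[str]:
    """Return the requirement lines of `path`, expanding `-r` includes in order."""
    seen = seen if seen is not None else set()
    path = path.resolve()
    if path in seen:  # skip files already expanded (also guards against cycles)
        return []
    seen.add(path)

    resolved: List[str] = []
    for raw in path.read_text().splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("-r "):
            # Includes are resolved relative to the including file.
            resolved += resolve_requirements(path.parent / line[3:].strip(), seen)
        else:
            resolved.append(line)
    return resolved


# Hypothetical miniature copies of two requirements files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "common.txt").write_text("pyzmq >= 25.0.0\nmsgspec\n")
    (root / "cuda.txt").write_text("-r common.txt\ntorch==2.11.0\n")
    cuda_requirements = resolve_requirements(root / "cuda.txt")

print(cuda_requirements)
```

Because the include line comes first in the platform file, the shared packages appear before the platform-specific ones in the flattened list.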
common.txt
requirements/common.txt lists packages required on every platform. Notable categories:
| Category | Key Packages |
|---|---|
| Tokenization | transformers >= 5.5.3, tokenizers >= 0.21.1, sentencepiece, tiktoken >= 0.6.0 |
| Serving | fastapi[standard] >= 0.115.0, openai >= 2.0.0, aiohttp >= 3.13.3, pydantic >= 2.12.0 |
| Observability | prometheus_client >= 0.18.0, opentelemetry-sdk >= 1.27.0, python-json-logger |
| Structured output | xgrammar >= 0.2.1, outlines_core == 0.2.14, lm-format-enforcer == 0.11.3, llguidance >= 1.7.0 |
| Quantization | compressed-tensors == 0.17.0, partial-json-parser |
| Communication | pyzmq >= 25.0.0, msgspec, cbor2, mcp |
| Multimodal | pillow, mistral_common[image] >= 1.11.3, opencv-python-headless >= 4.13.0, einops |
| Build tooling | ninja, setuptools >= 77.0.3, < 81.0.0 |
Sources: requirements/common.txt:1-59
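The table mixes lower bounds (`>=`), exact pins (`==`), and one bounded range (setuptools `>= 77.0.3, < 81.0.0`). The range can be read as a half-open interval, as in the sketch below. It assumes plain dotted numeric versions; `version_tuple` and `in_range` are hypothetical helpers and do not implement the full PEP 440 ordering that real installers apply.

```python
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted numeric version such as '77.0.3' to a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper, comparing component by component."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


# The setuptools bounds listed in the table above.
LOWER, UPPER = "77.0.3", "81.0.0"

for candidate in ("77.0.2", "80.9.0", "81.0.0"):
    print(candidate, in_range(candidate, LOWER, UPPER))
```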
cuda.txt
requirements/cuda.txt adds the NVIDIA-specific stack on top of common.txt. PyTorch is pinned to an exact version (torch==2.11.0) at requirements/cuda.txt:7. The corresponding CUDA wheel index is defined by PYTORCH_CUDA_INDEX_BASE_URL in the Dockerfile (docker/Dockerfile:83).
Sources: requirements/cuda.txt:1-30, docker/Dockerfile:83
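PyTorch publishes CUDA wheels under index paths named after the CUDA major and minor version (for example `cu130`). A minimal sketch of how a base URL and a CUDA version could be combined into such an index URL follows. The base URL value and the `cuda_wheel_index` helper are assumptions for illustration; the actual value of PYTORCH_CUDA_INDEX_BASE_URL and the way the Dockerfile derives the suffix are not shown on this page.

```python
def cuda_wheel_index(base_url: str, cuda_version: str) -> str:
    """Build a wheel index URL such as <base>/cu130 from a CUDA version string."""
    major, minor = cuda_version.split(".")[:2]  # the patch component is ignored
    return f"{base_url.rstrip('/')}/cu{major}{minor}"


# Assumed base URL (the public PyTorch wheel host) and a full CUDA version.
index_url = cuda_wheel_index("https://download.pytorch.org/whl", "13.0.2")
print(index_url)
```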
rocm.txt
requirements/rocm.txt adds the AMD-specific stack. It includes datasets, peft, and specialized packages like amd-quark (requirements/rocm.txt:11-24).
ROCm builds use several custom repositories and specific branches during Docker image construction:
- https://github.com/ROCm/triton.git (docker/Dockerfile.rocm_base:3)
- https://github.com/ROCm/pytorch.git (docker/Dockerfile.rocm_base:5)
- https://github.com/ROCm/aiter.git (docker/Dockerfile.rocm_base:13)
- https://github.com/ROCm/mori.git (docker/Dockerfile.rocm_base:15)

Sources: requirements/rocm.txt:1-30, docker/Dockerfile.rocm_base:1-15
xpu.txt
requirements/xpu.txt targets Intel XPU platforms. It pins torch==2.12.0 (requirements/xpu.txt:15) and includes the vllm_xpu_kernels wheel (requirements/xpu.txt:20). It also pulls from the Intel-specific PyTorch index (requirements/xpu.txt:14).
Sources: requirements/xpu.txt:1-21, docker/Dockerfile.xpu:40
cpu.txt
requirements/cpu.txt provides support for x86_64 and aarch64 CPU platforms. It targets the CPU variant of PyTorch (requirements/cpu.txt:10) and includes intel-openmp for x86 optimizations (requirements/cpu.txt:20).
Sources: requirements/cpu.txt:1-24, docker/Dockerfile.cpu:1-5
The setup.py script acts as the primary coordinator for building C++, CUDA, and Rust extensions. It auto-detects the target device type via VLLM_TARGET_DEVICE (setup.py:44-103) and determines the number of compilation jobs based on MAX_JOBS (setup.py:197-211).
It also handles bundling tcmalloc for CPU builds to improve performance (setup.py:124-182).
[Diagram: Build Logic Data Flow]
Sources: setup.py:44-211
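The two environment variables described above can be read with a few lines of Python. This is a sketch under stated assumptions, not the logic in setup.py: the default device of `cuda`, the fallback to the CPU count, and the helper names `target_device` and `compile_jobs` are all illustrative choices.

```python
import os
from typing import Mapping

# Platforms named on this page; the real build may accept other values.
SUPPORTED_DEVICES = ("cuda", "rocm", "xpu", "cpu")


def target_device(env: Mapping[str, str]) -> str:
    """Read VLLM_TARGET_DEVICE, defaulting to 'cuda' (an assumed default)."""
    device = env.get("VLLM_TARGET_DEVICE", "cuda").strip().lower()
    if device not in SUPPORTED_DEVICES:
        raise ValueError(f"unsupported VLLM_TARGET_DEVICE: {device!r}")
    return device


def compile_jobs(env: Mapping[str, str]) -> int:
    """Use MAX_JOBS when set, otherwise fall back to the CPU count (assumed)."""
    raw = env.get("MAX_JOBS", "").strip()
    if raw:
        return max(1, int(raw))
    return os.cpu_count() or 1


example_env = {"VLLM_TARGET_DEVICE": "rocm", "MAX_JOBS": "8"}
print(target_device(example_env), compile_jobs(example_env))
```

Passing the environment as a mapping, rather than reading `os.environ` directly, keeps the helpers easy to test; in a real build script the call would be `target_device(os.environ)`.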
docker/versions.json
The docker/versions.json file is the authoritative machine-readable source of pinned versions used in Docker builds. It is auto-generated from docker/Dockerfile ARG defaults.
[Diagram: version tracking flow]
Key versions currently tracked:
- CUDA_VERSION: 13.0.2 (docker/Dockerfile:25)
- PYTHON_VERSION: 3.12 (docker/Dockerfile:26)
- UBUNTU_VERSION: 22.04 (docker/Dockerfile:27)

Sources: docker/Dockerfile:9-27
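Generating a versions file from ARG defaults amounts to scanning the Dockerfile for `ARG NAME=value` lines. The sketch below shows one way to do that. The helper `arg_defaults`, the sample Dockerfile text, and the flat name-to-value JSON layout are assumptions; the actual generator and the real schema of docker/versions.json are not shown on this page.

```python
import json
import re
from typing import Dict

# Matches 'ARG NAME=value'; ARG lines without a default are ignored.
ARG_PATTERN = re.compile(r"^ARG\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def arg_defaults(dockerfile_text: str) -> Dict[str, str]:
    """Collect the first default value declared for each ARG."""
    defaults: Dict[str, str] = {}
    for line in dockerfile_text.splitlines():
        match = ARG_PATTERN.match(line.strip())
        if match:
            name, value = match.groups()
            # Keep the first declaration; later stages may redeclare the ARG.
            defaults.setdefault(name, value.strip().strip('"'))
    return defaults


# Illustrative Dockerfile fragment using the versions listed above.
sample = """\
ARG CUDA_VERSION=13.0.2
ARG PYTHON_VERSION=3.12
ARG UBUNTU_VERSION=22.04
FROM ubuntu:${UBUNTU_VERSION}
ARG CUDA_VERSION
"""

versions = arg_defaults(sample)
print(json.dumps(versions, indent=2))
```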
vLLM supports using precompiled extensions via VLLM_USE_PRECOMPILED (setup.py:50). For the Rust-based frontend, setup.py checks for precompiled binaries in vllm/vllm-rs and vllm/_rust_*.so (setup.py:37-40).
The build process for the Rust frontend is isolated in a dedicated Docker stage to avoid including the Rust toolchain in final images (docker/Dockerfile.rocm:130-160, docker/Dockerfile.xpu:1-34).
Sources: setup.py:37-78, docker/Dockerfile.rocm:127-160
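A check for prebuilt Rust artifacts could look like the sketch below, which searches a package directory for the two locations named above. The helper names, the truthy values accepted for VLLM_USE_PRECOMPILED, and the file names created in the temporary directory are assumptions for illustration and may differ from what setup.py does.

```python
import tempfile
from pathlib import Path
from typing import List, Mapping


def use_precompiled(env: Mapping[str, str]) -> bool:
    """Treat '1' or 'true' as enabling precompiled extensions (assumed parsing)."""
    return env.get("VLLM_USE_PRECOMPILED", "").strip().lower() in {"1", "true"}


def find_rust_artifacts(package_dir: Path) -> List[Path]:
    """Return any _rust_*.so modules plus the vllm-rs binary, if present."""
    artifacts = sorted(package_dir.glob("_rust_*.so"))
    binary = package_dir / "vllm-rs"
    if binary.exists():
        artifacts.append(binary)
    return artifacts


# Hypothetical package directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp)
    (package_dir / "_rust_example.so").touch()
    (package_dir / "other_module.so").touch()
    (package_dir / "vllm-rs").touch()
    found_names = [path.name for path in find_rust_artifacts(package_dir)]

print(use_precompiled({"VLLM_USE_PRECOMPILED": "1"}), found_names)
```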
uv
vLLM uses uv for high-performance dependency resolution and virtual environment management. Key environment variables set in Docker (docker/Dockerfile:117-121):
| Variable | Value | Effect |
|---|---|---|
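| UV_HTTP_TIMEOUT | 500 | Raises the HTTP timeout to 500 seconds so large wheel downloads do not time out |
| UV_INDEX_STRATEGY | "unsafe-best-match" | Searches all configured indexes and picks the best matching version, instead of stopping at the first index that has the package |
| UV_PYTHON_INSTALL_DIR | /opt/uv/python | Managed Python location |
| UV_CACHE_DIR | /opt/uv/cache | Shared download cache |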
Sources: docker/Dockerfile:117-121, docker/Dockerfile.rocm:58-63, docker/Dockerfile.cpu:53-59
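Outside Docker, the same settings can be reproduced by passing these variables to a uv subprocess. The sketch below only builds the environment and the command; it does not invoke uv. The helper names are hypothetical, and the choice of `requirements/cuda.txt` as the target file is just an example.

```python
import os
from typing import Dict, List, Mapping

# Values copied from the table above.
UV_DOCKER_ENV = {
    "UV_HTTP_TIMEOUT": "500",
    "UV_INDEX_STRATEGY": "unsafe-best-match",
    "UV_PYTHON_INSTALL_DIR": "/opt/uv/python",
    "UV_CACHE_DIR": "/opt/uv/cache",
}


def uv_environment(base_env: Mapping[str, str]) -> Dict[str, str]:
    """Copy the base environment and overlay the uv settings."""
    env = dict(base_env)
    env.update(UV_DOCKER_ENV)
    return env


def uv_install_command(requirements_file: str) -> List[str]:
    """Command line for installing one requirements file with uv."""
    return ["uv", "pip", "install", "-r", requirements_file]


command = uv_install_command("requirements/cuda.txt")
environment = uv_environment(os.environ)
print(" ".join(command))

# To actually run the install (requires uv on PATH):
#   import subprocess
#   subprocess.run(command, env=environment, check=True)
```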