🐳 Docker Image: docker pull mekopa/whisperx-blackwell:latest
Running legacy AI audio workloads (WhisperX, Pyannote) on next-generation NVIDIA Blackwell GPUs (SM_121) currently fails with:
nvrtc: error: invalid value for --gpu-architecture (-arch)
Why? The NVRTC compiler doesn't recognize sm_121 (Blackwell) yet, even though:
- PyTorch can see the GPU
- CUDA toolkit supports it
- The hardware is ready
Standard Python monkeypatching fails because Jiterator queries hardware architecture directly from C++, bypassing Python-level patches.
This repository contains the "Blackwell Bridge Patch" - a surgical Dockerfile fix that:
- Architecture Spoofing: Forces PyTorch's `get_device_capability()` to return `(9, 0)` (Hopper) instead of `(12, 1)` (Blackwell)
- JIT Bypass: Patches `torchaudio` source code to avoid `.abs()` on complex tensors, which triggers the broken Jiterator path
Result: SM_90 (Hopper) code runs natively on SM_121 (Blackwell) due to binary compatibility.
| Metric | CPU Fallback | GPU (Patched) | Speedup |
|---|---|---|---|
| 24 min audio | ~2 hours | 62 seconds | ~115x |
| Transcription | GPU ✓ | GPU ✓ | - |
| Alignment | GPU ✓ | GPU ✓ | - |
| Diarization | CPU only | GPU ✓ | ~115x |
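The speedup figure follows directly from the two timings in the table. A quick check of the arithmetic, treating the "~2 hours" CPU time as exactly 7200 seconds (an approximation, since the table only gives a rounded value):

```python
# Figures taken from the table above; "~2 hours" is approximate.
audio_seconds = 24 * 60      # 24 minutes of audio
cpu_seconds = 2 * 60 * 60    # ~2 hours on the CPU fallback
gpu_seconds = 62             # patched GPU run

speedup = cpu_seconds / gpu_seconds
realtime_factor = audio_seconds / gpu_seconds

print(f"speedup over CPU fallback: ~{speedup:.0f}x")
print(f"GPU throughput: ~{realtime_factor:.1f}x real time")
```

With these inputs the ratio comes out near 116, which matches the rounded ~115x in the table, and the patched GPU run processes audio at roughly 23 times real time.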
# Pull the image
docker pull mekopa/whisperx-blackwell:latest
# Run the service
docker run -d \
--name whisperx-gpu \
--gpus all \
--ipc=host \
-p 8003:8003 \
-v /path/to/audio:/data \
-e HF_TOKEN="your_huggingface_token" \
mekopa/whisperx-blackwell:latest
Get your HF token: https://huggingface.co/settings/tokens (needed for pyannote speaker diarization)
# Clone the repo
git clone https://github.com/mekopa/whisperx-blackwell.git
cd whisperx-blackwell
# Build the image
docker build -f Dockerfile.gpu -t whisperx-blackwell:latest .
# Run it
docker run -d \
--name whisperx-gpu \
--gpus all \
--ipc=host \
-p 8003:8003 \
-e HF_TOKEN="your_token" \
whisperx-blackwell:latest
curl http://localhost:8003/health
Expected response:
{
"status": "healthy",
"service": "whisperx-batch-gpu",
"device": "cuda",
"diarization_device": "cuda",
"gpu": "NVIDIA GB10",
"compute_capability": "SM_90"
}
curl -X POST "http://localhost:8003/transcribe" \
-F "file=@your_audio.mp3" \
-F "language=auto" \
-o transcription.json
Response includes:
- Word-level timestamps
- Speaker labels (SPEAKER_00, SPEAKER_01, etc.)
- Confidence scores
- Language detection
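If you would rather call the service from Python than from curl, the same request can be sketched with the standard library only. The endpoint, port, and the `file` and `language` form fields come from the curl command above; the helper names `build_multipart` and `transcribe` are made up for this example, and the response schema is not documented here, so the sketch simply returns whatever JSON the service sends back.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes
        + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def transcribe(audio_path, base_url="http://localhost:8003", language="auto"):
    """POST an audio file to /transcribe and return the decoded JSON."""
    path = Path(audio_path)
    body, content_type = build_multipart(
        {"language": language}, "file", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("your_audio.mp3")` should be equivalent to the curl command, assuming the service is running on port 8003. The sketch does not escape quotes in filenames and sets no timeout, so treat it as a starting point.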
# Forces get_device_capability() to return (9, 0) for SM_121
# _original_get_device_capability refers to the unpatched function, saved before this override is installed
def get_device_capability(device=None):
    major, minor = _original_get_device_capability(device)
    if major == 12 and minor == 1:
        return (9, 0)  # Pretend to be Hopper H100
    return (major, minor)

# OLD (crashes on SM_121):
spectrum = torch.fft.rfft(strided_input).abs()
# NEW (works):
fft_result = torch.fft.rfft(strided_input)
spectrum = torch.sqrt(fft_result.real**2 + fft_result.imag**2)

- Binary Compatibility: NVIDIA designed Blackwell to execute Hopper (SM_90) code natively
- JIT Avoidance: Computing `.abs()` manually uses standard CUDA kernels instead of runtime-compiled jiterator kernels
- No Performance Loss: The manual computation is mathematically identical and equally fast
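The claim that the manual magnitude matches `.abs()` can be checked without a GPU. The sketch below is an illustration only: it uses Python's built-in complex numbers and a naive DFT (`naive_rfft`, a made-up helper) in place of torch tensors and `torch.fft.rfft`, then compares the two ways of computing the magnitude.

```python
import cmath
import math


def naive_rfft(samples):
    """Naive one-sided DFT of a real signal (O(n^2), for illustration only)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def magnitude_builtin(bins):
    """Reference path: abs() on each complex value."""
    return [abs(z) for z in bins]


def magnitude_manual(bins):
    """Rewritten path: sqrt(real^2 + imag^2), no complex abs involved."""
    return [math.sqrt(z.real ** 2 + z.imag ** 2) for z in bins]


# A sine wave with 3 cycles over 32 samples.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
bins = naive_rfft(signal)
reference = magnitude_builtin(bins)
manual = magnitude_manual(bins)
max_diff = max(abs(a - b) for a, b in zip(reference, manual))
print(f"bins: {len(bins)}, max difference: {max_diff:.2e}")
```

The two results should agree to within floating-point rounding. One caveat: squaring the real and imaginary parts can overflow or underflow for extreme values where a hypot-style `abs` would not, though that is unlikely to matter for normalized audio.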
- ✅ NVIDIA DGX Spark (ARM64, Blackwell GB10)
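- ✅ Should work on GB200, GB202, GB203 (untested; the spoof as written only triggers for capability (12, 1), so GPUs reporting a different capability would need the check extended)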
- ✅ Should work on any SM_121 Blackwell GPU
- PyTorch 2.6.0 (NVIDIA container 25.01)
- WhisperX 3.8.5
- Pyannote.audio 4.0.4
- CUDA 13.0
- Python 3.12
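Because the patch is tied to specific versions, it can help to confirm what is actually installed inside the container. This is a small sketch using `importlib.metadata`; the distribution names (`torch`, `whisperx`, `pyannote.audio`) are assumptions about how the packages are registered, and prefix matching is used because NVIDIA container builds may carry version suffixes.

```python
from importlib import metadata

# Version prefixes taken from the list above. The distribution names are
# assumptions about how the packages are registered with pip.
EXPECTED = {
    "torch": "2.6.0",
    "whisperx": "3.8.5",
    "pyannote.audio": "4.0.4",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Map each package to (found_version, matches_expected_prefix)."""
    report = {}
    for name, prefix in expected.items():
        found = lookup(name)
        report[name] = (found, found is not None and found.startswith(prefix))
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions().items():
        print(f"{name:15} {found or 'not installed':30} {'OK' if ok else 'MISMATCH'}")
```

Running it inside the container prints one line per package. A mismatch does not necessarily mean the patch will fail, only that the combination differs from the tested one.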
┌─────────────────────────────────────────────────────────────┐
│ WhisperX Pipeline (GPU-Accelerated) │
├─────────────────────────────────────────────────────────────┤
│ │
│ Step 1: Whisper large-v3 → GPU (Blackwell/Hopper) │
│ Step 2: Wav2Vec2 alignment → GPU (Blackwell/Hopper) │
│ Step 3: Pyannote diarization → GPU (PATCHED!) │
│ │
│ Patches Applied: │
│ - SM_121 → SM_90 capability spoof │
│ - Torchaudio jiterator bypass │
└─────────────────────────────────────────────────────────────┘
- Temporary Fix: This will become obsolete when NVIDIA updates NVRTC to recognize SM_121
- Binary Compatibility: Relies on Blackwell executing Hopper code (safe, but not optimized)
- Torchaudio Version: The line numbers in the patch are for `torchaudio==2.6.0` from the NVIDIA container
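One way to reduce the dependence on line numbers is to match the offending statement by its text instead. The sketch below is not how the repository's Dockerfile applies the patch; it is an alternative under the assumption that the target line appears verbatim as in the OLD/NEW snippet above. `patch_source` is a made-up helper that works on source text, so it can be tried without torchaudio installed.

```python
OLD = "spectrum = torch.fft.rfft(strided_input).abs()"
NEW_LINES = (
    "fft_result = torch.fft.rfft(strided_input)",
    "spectrum = torch.sqrt(fft_result.real**2 + fft_result.imag**2)",
)


def patch_source(text):
    """Replace every line whose code is OLD, preserving its indentation.

    Returns (patched_text, number_of_replacements).
    """
    out, count = [], 0
    for line in text.splitlines(keepends=True):
        if line.strip() == OLD:
            indent = line[: len(line) - len(line.lstrip())]
            newline = "\n" if line.endswith("\n") else ""
            out.append(indent + NEW_LINES[0] + "\n")
            out.append(indent + NEW_LINES[1] + newline)
            count += 1
        else:
            out.append(line)
    return "".join(out), count


# Hypothetical usage during an image build (the file path is an assumption):
#   from pathlib import Path
#   target = Path(".../site-packages/torchaudio/compliance/kaldi.py")
#   patched, n = patch_source(target.read_text())
#   if n:
#       target.write_text(patched)
```

Because the function reports how many lines it replaced, a build step could fail loudly when the count is zero, which would signal that the torchaudio source has changed and the patch needs revisiting.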
✅ Use this if:
- You have Blackwell hardware (DGX Spark, GB10, GB200)
- You're getting `nvrtc: error: invalid value for --gpu-architecture`
- You want GPU-accelerated speaker diarization
❌ Don't use this if:
- You have Hopper (H100) or older GPUs - use standard WhisperX
- You're on x86_64 architecture - rebuild for your arch
- NVIDIA has officially released SM_121 support (check PyTorch release notes)
This patch will become obsolete when:
- PyTorch updates to recognize SM_121 natively
- Torchaudio stops using jiterator for complex number operations
- NVIDIA releases updated NVRTC compiler
Until then, this is the only known way to run GPU speaker diarization on Blackwell.
Found this useful? Here's how to help:
- ⭐ Star the repo if this saved you time
- 🐛 Report issues if you find edge cases
- 📝 Share results from other Blackwell GPUs (GB200, GB202, etc.)
- 🔧 Submit PRs for improvements
- WhisperX: https://github.com/m-bain/whisperX
- Pyannote.audio: https://github.com/pyannote/pyannote-audio
- Patch Discovery: Community effort to unlock Blackwell for legacy workloads
MIT License - Free to use, modify, and distribute.
Disclaimer: This is a community patch for early-adopter hardware. Use at your own risk. Not affiliated with NVIDIA or WhisperX maintainers.
Need help? Open an issue or check the Discussions tab.