WhisperX on NVIDIA Blackwell (DGX Spark / GB10 / GB200)

🐳 Docker Image: docker pull mekopa/whisperx-blackwell:latest

The Problem

Running legacy AI audio workloads (WhisperX, Pyannote) on next-generation NVIDIA Blackwell GPUs (SM_121) currently fails with:

nvrtc: error: invalid value for --gpu-architecture (-arch)

Why? The NVRTC compiler doesn't recognize sm_121 (Blackwell) yet, even though:

PyTorch can see the GPU
CUDA toolkit supports it
The hardware is ready

Python monkeypatching alone does not fix this: Jiterator queries the hardware architecture directly from C++, so Python-level patches never reach that code path.

The Solution

This repository contains the "Blackwell Bridge Patch", a surgical Dockerfile fix that applies two patches:

Architecture Spoofing: Forces PyTorch's get_device_capability() to return (9, 0) (Hopper) instead of (12, 1) (Blackwell)
JIT Bypass: Patches torchaudio source code to avoid .abs() on complex tensors, which triggers the broken Jiterator path

Result: Code targeting SM_90 (Hopper) runs on SM_121 (Blackwell), so the whole pipeline, including diarization, stays on the GPU.

Performance

Metric / stage	Unpatched (CPU fallback)	GPU (Patched)	Speedup
24 min audio	~2 hours	62 seconds	~115x
Transcription	GPU ✓	GPU ✓	-
Alignment	GPU ✓	GPU ✓	-
Diarization	CPU only	GPU ✓	~115x
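The speedup figure can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that takes "~2 hours" as exactly 7200 seconds, which is an approximation; the helper names are illustrative and not part of the repository.

```python
def speedup(baseline_seconds: float, patched_seconds: float) -> float:
    """How many times faster the patched run is than the baseline."""
    return baseline_seconds / patched_seconds


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


audio_s = 24 * 60      # 24 minutes of audio
cpu_s = 2 * 60 * 60    # "~2 hours", assumed to be exactly 2 h here
gpu_s = 62             # patched GPU run from the table

print(f"speedup: ~{speedup(cpu_s, gpu_s):.0f}x")
print(f"real-time factor (GPU): ~{realtime_factor(audio_s, gpu_s):.1f}x")
print(f"real-time factor (CPU): ~{realtime_factor(audio_s, cpu_s):.2f}x")
```

With these inputs the ratio comes out at about 116x, consistent with the rounded ~115x in the table, and the patched run processes roughly 23 seconds of audio per second.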

Quick Start

Option 1: Pre-built Docker Image (Recommended)

# Pull the image
docker pull mekopa/whisperx-blackwell:latest

# Run the service
docker run -d \
  --name whisperx-gpu \
  --gpus all \
  --ipc=host \
  -p 8003:8003 \
  -v /path/to/audio:/data \
  -e HF_TOKEN="your_huggingface_token" \
  mekopa/whisperx-blackwell:latest

Get your HF token: https://huggingface.co/settings/tokens (needed for pyannote speaker diarization)

Option 2: Build from Source

# Clone the repo
git clone https://github.com/mekopa/whisperx-blackwell.git
cd whisperx-blackwell

# Build the image
docker build -f Dockerfile.gpu -t whisperx-blackwell:latest .

# Run it
docker run -d \
  --name whisperx-gpu \
  --gpus all \
  --ipc=host \
  -p 8003:8003 \
  -e HF_TOKEN="your_token" \
  whisperx-blackwell:latest

Usage

Health Check

curl http://localhost:8003/health

Expected response:

{
  "status": "healthy",
  "service": "whisperx-batch-gpu",
  "device": "cuda",
  "diarization_device": "cuda",
  "gpu": "NVIDIA GB10",
  "compute_capability": "SM_90"
}
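For scripted checks, the same endpoint can be validated from Python. The sketch below assumes the response keys shown above and the port from the docker run command; `check_health` and `fetch_health` are hypothetical helper names, not part of the service.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8003/health"  # port taken from the docker run command


def check_health(payload: dict) -> list[str]:
    """Return a list of problems found in a /health payload (empty means OK)."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}, expected 'healthy'")
    # Both the main pipeline and diarization should report the GPU.
    for key in ("device", "diarization_device"):
        if payload.get(key) != "cuda":
            problems.append(f"{key} is {payload.get(key)!r}, expected 'cuda'")
    return problems


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Offline demonstration using the expected response from this README.
sample = json.loads("""
{
  "status": "healthy",
  "service": "whisperx-batch-gpu",
  "device": "cuda",
  "diarization_device": "cuda",
  "gpu": "NVIDIA GB10",
  "compute_capability": "SM_90"
}
""")
print(check_health(sample) or "OK")
```

Against a running container, `check_health(fetch_health())` performs the same check on the live response. A `diarization_device` other than `cuda` would suggest the patch did not take effect.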

Transcribe Audio

curl -X POST "http://localhost:8003/transcribe" \
  -F "file=@your_audio.mp3" \
  -F "language=auto" \
  -o transcription.json

Response includes:

Word-level timestamps
Speaker labels (SPEAKER_00, SPEAKER_01, etc.)
Confidence scores
Language detection
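The saved transcription.json can then be turned into a speaker-labelled transcript. The exact response schema is not documented here, so this sketch assumes a WhisperX-style layout: a top-level `segments` list whose items carry `start`, `end`, `text`, and `speaker`. Adjust the keys if the service returns something different.

```python
import json


def load_result(path: str) -> dict:
    """Read a saved transcription file such as transcription.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def speaker_turns(result: dict) -> list[tuple[str, float, float, str]]:
    """Merge consecutive segments from the same speaker into (speaker, start, end, text)."""
    turns: list[tuple[str, float, float, str]] = []
    for seg in result.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg.get("text", "").strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker keeps talking: extend the previous turn.
            _, start, _, prev_text = turns[-1]
            turns[-1] = (speaker, start, seg["end"], f"{prev_text} {text}")
        else:
            turns.append((speaker, seg["start"], seg["end"], text))
    return turns


# Hypothetical sample in the assumed layout, for illustration only.
sample = {
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there.", "speaker": "SPEAKER_00"},
        {"start": 1.6, "end": 3.0, "text": " How are you?", "speaker": "SPEAKER_00"},
        {"start": 3.2, "end": 4.0, "text": " Fine, thanks.", "speaker": "SPEAKER_01"},
    ]
}

for speaker, start, end, text in speaker_turns(sample):
    print(f"[{start:7.2f} - {end:7.2f}] {speaker}: {text}")
```

For a real run, replace `sample` with `load_result("transcription.json")`.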

Technical Details

The Patches

1. PyTorch Capability Spoof (`Dockerfile.gpu` lines 88-99)

# Forces get_device_capability() to return (9, 0) for SM_121
def get_device_capability(device=None):
    major, minor = _original_get_device_capability(device)
    if major == 12 and minor == 1:
        return (9, 0)  # Pretend to be Hopper H100
    return (major, minor)
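The excerpt above depends on `_original_get_device_capability`, which is defined elsewhere in the patch. A self-contained sketch of the same idea, runnable without a GPU, might look like the following. The wrapper and the stand-in query function are illustrative; this is not necessarily how Dockerfile.gpu applies the change.

```python
from typing import Callable, Tuple

Capability = Tuple[int, int]


def spoof_capability(cap: Capability) -> Capability:
    """Map SM_121 (12, 1) to SM_90 (9, 0); leave every other capability unchanged."""
    return (9, 0) if cap == (12, 1) else cap


def wrap_capability_query(original: Callable[..., Capability]) -> Callable[..., Capability]:
    """Build a drop-in replacement for a get_device_capability-style function."""
    def patched(device=None) -> Capability:
        major, minor = original(device)
        return spoof_capability((major, minor))
    return patched


# Stand-in for torch.cuda.get_device_capability so the sketch runs anywhere.
def fake_blackwell_query(device=None) -> Capability:
    return (12, 1)


patched = wrap_capability_query(fake_blackwell_query)
print(patched())                 # (9, 0)
print(spoof_capability((8, 0)))  # (8, 0)

# With PyTorch installed, the equivalent runtime monkeypatch would be:
#   import torch
#   torch.cuda.get_device_capability = wrap_capability_query(
#       torch.cuda.get_device_capability
#   )
```

A wrapper like this only changes what Python callers see. Code that queries the device from C++ is unaffected, which is why the second patch is still needed.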

2. Torchaudio Jiterator Bypass (`Dockerfile.gpu` lines 113-118)

# OLD (crashes on SM_121):
spectrum = torch.fft.rfft(strided_input).abs()

# NEW (works):
fft_result = torch.fft.rfft(strided_input)
spectrum = torch.sqrt(fft_result.real**2 + fft_result.imag**2)
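The replacement relies on the identity |z| = sqrt(re² + im²). The sketch below checks it in plain Python with built-in complex numbers rather than torch tensors, so it runs without a GPU. It also shows one numerical difference: squaring can overflow for extreme magnitudes where a library `abs` does not, which is unlikely to matter for audio spectra.

```python
import math


def magnitude_manual(z: complex) -> float:
    """Magnitude computed the way the patch does it: sqrt(real^2 + imag^2)."""
    return math.sqrt(z.real * z.real + z.imag * z.imag)


samples = [3 + 4j, -1.5 + 2j, 0j, 1e-3 - 7j]
for z in samples:
    # Built-in abs() and the manual formula agree to floating-point precision.
    assert math.isclose(abs(z), magnitude_manual(z), rel_tol=1e-12)
    print(z, abs(z), magnitude_manual(z))

# Caveat: the squares overflow to infinity for very large components,
# while abs() stays finite.
huge = complex(1e200, 1e200)
print(abs(huge), magnitude_manual(huge))
```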

Why This Works

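Compatibility: On the tested GB10, code targeting Hopper (SM_90) runs without modification. CUDA's documented guarantee across major architectures is forward compatibility through PTX rather than prebuilt binaries, so treat this as observed behavior, not a guarantee.
JIT Avoidance: Computing .abs() manually uses standard CUDA kernels instead of runtime-compiled jiterator kernels
No Expected Performance Loss: The manual computation is mathematically equivalent to .abs() and is expected to run at comparable speed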

Tested Hardware

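✅ NVIDIA DGX Spark (ARM64, Blackwell GB10)
❓ GB200, GB202, GB203: expected to work, but untested (the capability spoof only matches (12, 1), so GPUs reporting a different compute capability would need that check adjusted)
❓ Other SM_121 Blackwell GPUs: expected to work, but untested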

Tested Software

PyTorch 2.6.0 (NVIDIA container 25.01)
WhisperX 3.8.5
Pyannote.audio 4.0.4
CUDA 13.0
Python 3.12

Architecture

┌─────────────────────────────────────────────────────────────┐
│  WhisperX Pipeline (GPU-Accelerated)                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Step 1: Whisper large-v3      → GPU (Blackwell/Hopper)     │
│  Step 2: Wav2Vec2 alignment    → GPU (Blackwell/Hopper)     │
│  Step 3: Pyannote diarization  → GPU (PATCHED!)             │
│                                                             │
│  Patches Applied:                                           │
│  - SM_121 → SM_90 capability spoof                          │
│  - Torchaudio jiterator bypass                              │
└─────────────────────────────────────────────────────────────┘

Known Limitations

Temporary Fix: This will become obsolete when NVIDIA updates NVRTC to recognize SM_121
Compatibility: Relies on Blackwell executing code that targets Hopper, which works on the tested GB10 but does not use Blackwell-specific optimizations
Torchaudio Version: The line numbers in the patch are for torchaudio==2.6.0 from the NVIDIA container
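One way to reduce the dependence on line numbers is to match the offending statement by pattern instead. The sketch below is an assumption about how such a patch could be written, not a description of what Dockerfile.gpu does. It operates on source text passed in as a string, preserves the original indentation, and reports how many replacements were made so a build can fail loudly when nothing matched.

```python
import re

# Matches the exact statement shown in the "OLD" snippet, at any indentation.
OLD_CALL = re.compile(
    r"^(?P<indent>[ \t]*)spectrum = torch\.fft\.rfft\(strided_input\)\.abs\(\)[ \t]*$",
    re.MULTILINE,
)

NEW_LINES = (
    "{indent}fft_result = torch.fft.rfft(strided_input)\n"
    "{indent}spectrum = torch.sqrt(fft_result.real**2 + fft_result.imag**2)"
)


def patch_source(source: str) -> tuple[str, int]:
    """Replace every matching .abs() line; return (new_source, replacement_count)."""
    def _replace(match: re.Match) -> str:
        return NEW_LINES.format(indent=match.group("indent"))
    return OLD_CALL.subn(_replace, source)


# Hypothetical input resembling the affected code, for illustration only.
before = (
    "def _frames(strided_input):\n"
    "    spectrum = torch.fft.rfft(strided_input).abs()\n"
    "    return spectrum\n"
)
after, count = patch_source(before)
print(count)
print(after)
```

Running the function a second time on its own output makes zero replacements, so the patch step is safe to repeat.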

When to Use This

✅ Use this if:

You have Blackwell hardware (DGX Spark, GB10, GB200)
You're getting nvrtc: error: invalid value for --gpu-architecture
You want GPU-accelerated speaker diarization

❌ Don't use this if:

You have Hopper (H100) or older GPUs - use standard WhisperX
You're on x86_64 architecture: the pre-built image was tested on ARM64 (DGX Spark), so build from source for your architecture instead
NVIDIA has officially released SM_121 support (check PyTorch release notes)

Future Work

This patch will become obsolete when:

PyTorch recognizes SM_121 natively
Torchaudio stops using jiterator for complex number operations
NVIDIA releases an updated NVRTC compiler

Until then, this is the only approach we know of for running GPU speaker diarization on Blackwell.

Contributing

Found this useful? Here's how to help:

⭐ Star the repo if this saved you time
🐛 Report issues if you find edge cases
📝 Share results from other Blackwell GPUs (GB200, GB202, etc.)
🔧 Submit PRs for improvements

Credits

WhisperX: https://github.com/m-bain/whisperX
Pyannote.audio: https://github.com/pyannote/pyannote-audio
Patch Discovery: Community effort to unlock Blackwell for legacy workloads

License

MIT License - Free to use, modify, and distribute.

Disclaimer: This is a community patch for early-adopter hardware. Use at your own risk. Not affiliated with NVIDIA or WhisperX maintainers.

Need help? Open an issue or check the Discussions tab.
