KittenTTS is an open-source, ultra-lightweight text-to-speech model with just 15 million parameters and a model size under 25 MB. It is optimized for fast, real-time speech synthesis on CPUs, so it runs on a wide range of devices without a GPU, and it ships with several high-quality voice options.

Features

  • Model size under 25 MB for easy distribution and deployment
  • CPU-optimized inference—no GPU required
  • Multiple predefined high-quality, realistic voices (e.g., expr-voice-2-f/m through expr-voice-5-f/m)
  • Fast, real-time audio generation suitable for interactive applications
  • Python API for easy integration: install with pip and call the model directly from Python code
  • Apache 2.0 licensed for permissive reuse and modification
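
The sketch below shows a minimal Python workflow and expands the voice range from the feature list into concrete IDs. It assumes, based on the project's README at the time of writing, that the `kittentts` package exposes a `KittenTTS` class with a `generate(text, voice=...)` method, that the model ID is `KittenML/kitten-tts-nano-0.1`, and that output audio is 24 kHz; verify these against the current release. `soundfile` is a separate third-party package, and `check_voice` and `synthesize` are hypothetical helper names.

```python
# Voice IDs expanded from the "expr-voice-2-f/m through expr-voice-5-f/m" range.
VOICES = [f"expr-voice-{n}-{g}" for n in range(2, 6) for g in ("m", "f")]

# Assumptions taken from the project README; check them against the current release.
MODEL_ID = "KittenML/kitten-tts-nano-0.1"
SAMPLE_RATE = 24_000


def check_voice(voice: str) -> str:
    """Return the voice ID unchanged, or raise ValueError if it is not a known option."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {VOICES}")
    return voice


def synthesize(text: str, voice: str = "expr-voice-2-f", out_path: str = "output.wav") -> str:
    """Generate speech on the CPU and save it as a WAV file (hypothetical helper)."""
    check_voice(voice)
    # Imported lazily so the voice helpers above work without these packages installed.
    from kittentts import KittenTTS  # assumed import path of the pip-installed package
    import soundfile as sf           # third-party WAV writer: pip install soundfile

    model = KittenTTS(MODEL_ID)                # load the model (assumed to be fetched on first use)
    audio = model.generate(text, voice=voice)  # array of audio samples
    sf.write(out_path, audio, SAMPLE_RATE)     # write samples to a WAV file at the assumed rate
    return out_path
```

For example, `synthesize("Hello from KittenTTS", voice="expr-voice-4-m")` would write `output.wav` under the assumptions above. Validating the voice before loading the model keeps a typo from costing a model load.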
