GPT-SoVITS

Just 1 minute of voice data can be used to train a good TTS model.

This is an exact mirror of the GPT-SoVITS project, hosted at https://github.com/RVC-Boss/GPT-SoVITS. SourceForge is not affiliated with GPT-SoVITS.

Last Update: 2025-07-29

GPT‑SoVITS is a state-of-the-art voice conversion and TTS system. It supports zero‑shot synthesis from a short vocal sample (about 5 seconds) and few‑shot synthesis after fine-tuning on about 1 minute of data. It also supports cross‑lingual speech synthesis in English, Chinese, Japanese, Korean, Cantonese, and more. It is built on the VITS architecture, extended for few-sample adaptation and real-time use.

Features

Zero‑shot TTS: generate speech from a 5‑second voice sample
Few‑shot fine-tuning: 1 minute of data for improved voice likeness
Cross-lingual support across multiple languages
Web UI for inference and batch generation
Open-source with pretrained model weights
Active community and publication‑grade performance
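The 5-second and 1-minute figures above are the main data requirements. You can check recordings against them before training with a small sketch that uses only Python's standard `wave` module. It assumes uncompressed PCM WAV files, a single reference clip, and a flat folder of training clips. The helper names and the thresholds are illustrative assumptions, not part of GPT-SoVITS or its WebUI.

```python
import wave
from pathlib import Path

# Illustrative thresholds taken from the figures quoted on this page
# (about 5 s for a zero-shot reference, about 1 min for fine-tuning).
# These are assumptions, not limits enforced by GPT-SoVITS.
ZERO_SHOT_SECONDS = 5.0
FEW_SHOT_SECONDS = 60.0


def wav_duration(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration(folder: Path) -> float:
    """Sum the durations of every .wav file directly inside `folder`."""
    return sum(wav_duration(p) for p in sorted(folder.glob("*.wav")))


def check_data(reference: Path, dataset: Path) -> dict:
    """Report whether the clips roughly reach the quoted durations."""
    ref_len = wav_duration(reference)
    data_len = total_duration(dataset)
    return {
        "reference_seconds": ref_len,
        "dataset_seconds": data_len,
        "zero_shot_ok": ref_len >= ZERO_SHOT_SECONDS,
        "few_shot_ok": data_len >= FEW_SHOT_SECONDS,
    }
```

For example, `check_data(Path("reference.wav"), Path("dataset"))` reports the measured lengths and two booleans. Both paths are hypothetical placeholders for your own files. The check only measures length. It does not assess noise, transcription quality, or speaker consistency, which also affect results.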

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Voice Cloning Software, Python Text-to-Speech (TTS) Models

Registered

2025-07-29

Similar Business Software

Qwen3-TTS

Qwen3-TTS is an open source series of advanced text-to-speech models developed by the Qwen team at Alibaba Cloud under the Apache-2.0 license, offering stable, expressive, and real-time speech generation with features such as voice cloning, voice design, and fine-grained control of prosody and...

Inworld TTS

Inworld TTS is a state-of-the-art text-to-speech platform designed to deliver ultra-realistic, context-aware speech synthesis and precise voice-cloning capabilities at a radically accessible price. The flagship model, TTS-1, is optimized for real-time applications and supports low-latency...

Voxtral TTS

Voxtral TTS is a state-of-the-art, multilingual text-to-speech model designed to generate highly realistic and emotionally expressive speech from text, combining strong contextual understanding with advanced speaker modeling to produce natural, human-like audio output. Built as a lightweight...

Voicv

Voicv is a cutting-edge voice cloning platform that transforms your voice into a digital asset in minutes, supporting multiple languages and zero-shot learning. It allows users to clone any voice with just a 10-30-second audio sample, maintaining high fidelity and natural expression. It...

Piper TTS

Piper is a fast, local neural text-to-speech (TTS) system optimized for devices like the Raspberry Pi 4, designed to deliver high-quality speech synthesis without relying on cloud services. It utilizes neural network models trained with VITS and exported to ONNX Runtime, enabling efficient and...

Chatterbox

Chatterbox is a free, open source voice cloning AI model developed by Resemble AI, licensed under MIT. It enables zero-shot voice cloning using just 5 seconds of reference audio, eliminating the need for training. The model offers expressive speech synthesis with unique emotion control, allowing...

Recommended Projects

OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model
VALL-E X
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
CosyVoice
Multi-lingual large voice generation model, providing inference
IndexTTS2
Industrial-level controllable zero-shot text-to-speech system
MOSS-TTS Family
MOSS‑TTS Family open‑source speech and sound generation model