Voice Cloning
#7
by isaiahbjork - opened
I created a repo that lets you clone your voice with CSM-1B. The results aren't the best, but the cloned voice is recognizable.
Hi, I am trying voice cloning using the repository https://github.com/isaiahbjork/csm-voice-cloning. When I provide an audio input of around 3–4 minutes, I encounter the following error:
ValueError: Inputs too long, must be below max_seq_len - max_audio_frames: 1861.
Could you please help me understand how to provide longer audio inputs to achieve better voice cloning accuracy? Additionally, could you let me know which languages are currently supported by this voice cloning approach?
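For a rough sense of why a 3–4 minute clip trips this check, here is a minimal sketch of the arithmetic. It assumes one audio frame per 80 ms (12.5 frames per second), 24 kHz audio, and that the 1861 limit from the error counts text tokens plus audio frames together; none of these are confirmed in this thread. The helper names (`audio_frames`, `max_prompt_frames`, `trim_to_budget`) are hypothetical and are not part of the repository.

```python
import math

FRAME_MS = 80          # assumption: one audio frame per 80 ms (12.5 frames/s)
SAMPLE_RATE = 24_000   # assumption: 24 kHz mono audio
TOKEN_BUDGET = 1861    # limit reported in the error message above


def audio_frames(duration_s: float) -> int:
    """Estimate how many frames a clip of the given duration occupies."""
    return math.ceil(duration_s * 1000 / FRAME_MS)


def max_prompt_frames(text_tokens: int, budget: int = TOKEN_BUDGET) -> int:
    """Frames left for audio once the transcript's tokens are counted.

    The error says the input must be *below* the limit, so one slot
    is held back to keep the total strictly under the budget.
    """
    return max(budget - text_tokens - 1, 0)


def trim_to_budget(samples, text_tokens, sample_rate=SAMPLE_RATE,
                   budget=TOKEN_BUDGET):
    """Cut a sequence of samples so its frame count fits the budget."""
    keep = max_prompt_frames(text_tokens, budget) * FRAME_MS * sample_rate // 1000
    return samples[:keep]


# A 3-minute clip under these assumptions:
print(audio_frames(180))  # 2250, already above the 1861 limit
```

Under these assumptions, a reference clip with a 100-token transcript would need to stay under roughly 140 seconds. Whether raising the limit or splitting the reference into several shorter segments gives better cloning is a separate question that this sketch does not address.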