Choosing the same voice for all audio generated files
#58
by abesimi - opened
Hi,
I have recently been using csm-1b to generate audio in Python, and it works fine, but I cannot seem to keep the same speaker across different executions.
Each time I run the script, a different voice is produced.
Any advice on how to keep the speaker consistent is welcome.
Something like the set of enumerated speakers that Google Gemini offers would be ideal.
Here's my code.
import os

# aprocessor, model, and device are created elsewhere (not shown here)
def getAudioFromText(text: str, tempID: str) -> bool:
    conversation = [
        {"role": "0", "content": [{"type": "text", "text": text}]},
    ]
    inputs = aprocessor.apply_chat_template(
        conversation,
        tokenize=True,
        return_dict=True,
    ).to(device)
    # run inference and save the result
    try:
        audio = model.generate(**inputs, output_audio=True)
        audio_url = os.path.join(tempID, "output_audio.wav")
        aprocessor.save_audio(audio, audio_url, sampling_rate=24000, format="wav")
        # Why is the audio file missing the last word or second?
        # Fix by adding 200 ms of silence at the end of the audio.
        return True
    except Exception as e:
        # report the failure instead of silently swallowing it
        print(f"Audio generation failed: {e}")
        return False
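
The 200 ms silence workaround mentioned in the comments could look roughly like the sketch below. It is only an illustration: the helper name `pad_with_silence` is made up, and it assumes the audio is available as a flat sequence of float samples at 24000 Hz (the sampling rate used in the code above), not in whatever form `model.generate` actually returns.

```python
def pad_with_silence(samples, sampling_rate=24000, silence_ms=200):
    """Return a new list with `silence_ms` of zero-valued samples appended.

    Hypothetical helper: `samples` is assumed to be a flat sequence of
    float audio samples (mono), not the raw output of the model.
    """
    if silence_ms < 0:
        raise ValueError("silence_ms must be non-negative")
    # number of samples that correspond to the requested duration
    n_silence = sampling_rate * silence_ms // 1000
    return list(samples) + [0.0] * n_silence


# 200 ms at 24000 Hz is 4800 samples of silence
padded = pad_with_silence([0.1, -0.2, 0.3])
print(len(padded))
```

If the generated audio is a tensor, it would first need converting to a flat sequence, or the same idea could be applied by concatenating a block of zeros before saving.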