feat(backends): add Moonshine streaming STT backend #2115
Adds a new 'moonshine' recipe for Moonshine speech-to-text using the moonshine_voice C++ streaming API. Integrates with existing OpenAI-compatible /v1/audio/transcriptions and WebSocket /realtime endpoints.

Includes:
- MoonshineServer backend (HTTP /inference, TCP streaming)
- IStreamingTranscriptionServer interface for generic streaming backends
- TcpJsonlClient for line-delimited JSON over TCP
- RealtimeSessionManager integration for WebSocket/audio forwarding
- ModelManager support for moonshine cache resolution and download
- CPU-only, cross-platform (Linux/Windows x86_64)
- End-to-end smoke test

The existing Whisper path is unchanged.
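For readers unfamiliar with line-delimited JSON over TCP, here is a minimal sketch of the framing step a client such as TcpJsonlClient would need. It is not taken from this PR: the `JsonlFramer` class and its `feed` and `pending` methods are hypothetical names, and the sketch assumes each message is a single JSON object terminated by `\n` (optionally `\r\n`). TCP delivers a byte stream, so one read may contain a partial message or several messages at once. The framer buffers bytes and returns only complete lines.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (not the PR's TcpJsonlClient): accumulates raw bytes
// read from a socket and splits them into complete newline-terminated lines.
class JsonlFramer {
public:
    // Append one chunk as received from the socket and return every complete
    // line that chunk finishes. A trailing partial line stays buffered until
    // a later chunk supplies its newline.
    std::vector<std::string> feed(std::string_view chunk) {
        buffer_.append(chunk.data(), chunk.size());

        std::vector<std::string> lines;
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = buffer_.find('\n', start);
            if (nl == std::string::npos) break;

            std::string line = buffer_.substr(start, nl - start);
            // Tolerate CRLF line endings.
            if (!line.empty() && line.back() == '\r') line.pop_back();
            // Skip blank lines; they carry no JSON payload.
            if (!line.empty()) lines.push_back(std::move(line));
            start = nl + 1;
        }

        // Drop the consumed bytes, keeping only the unfinished tail.
        buffer_.erase(0, start);
        return lines;
    }

    // Number of buffered bytes that do not yet form a complete line.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::string buffer_;
};
```

A caller would pass each `recv` result to `feed` and hand every returned line to a JSON parser. Keeping framing separate from parsing means a message split across reads never reaches the parser half-finished.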
bitgamma
left a comment
Thanks for your PR!
I see the current implementation relies on a system-wide Python installation, which is not something we can depend on. Generally, we try to avoid Python-based backends for this reason. The only Python-based backend we have is vLLM, but it has been packaged to be self-contained and does not rely on (or interfere with) the system Python. Similar work must be done before this can be considered viable.
Additionally, I see the models are not being downloaded from Hugging Face. Adding another download repository is something we also want to avoid. I'd much prefer that you find or re-upload the models on HF instead. I also see voices are being downloaded to the user folder, outside the usual lemonade model folder; this is another thing we want to avoid.
In general, this needs to be reworked to follow the same patterns as the other backends.
Thank you!
Requesting @bitgamma to review, per Discord comments.