Automagically synchronize subtitles with video.
Updated Jun 7, 2026 - Python
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Android Voice Activity Detection (VAD) library. Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
Voice Activity Detection based on Deep Learning & TensorFlow
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper, via whisper.cpp.
Synchronize your subtitles using machine learning
EduSense: Practical Classroom Sensing at Scale
iOS Voice Activity Detection (VAD). Supports WebRTC VAD GMM, Silero VAD DNN, Yamnet VAD DNN models.
A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.
Speech-end detection library, based on WebRTC's VAD engine
Voice Activity Detection library for Rust with a unified trait interface over multiple backends (WebRTC VAD, Silero). Includes vad-lab, a web-based tool for live experimentation and comparison.
Identifying individual speakers in an audio stream based on the unique characteristics found in individual voices using Python
Visual-only speech detection by lip movement. There are countless situations where you can't hear the audio, and it's really frustrating.
Developer experimentation tools for the WaveKat libraries. Includes vad-lab, a web-based tool for testing and comparing VAD backends side by side.
Speech Detection 💬
PocketPiglet for iOS
VadRecorder, based on WebRTC's VAD engine and the vo-aac encoder. Records valid speech and discards silence/noise data.
AI-based Python CLI that automatically removes silences and non-speech segments from videos, using Silero VAD and FFmpeg.
Local-first VAD, barge-in, and turn-taking primitives for interruptible voice agents.
Swift library for Voice Activity Detection (VAD) using NVIDIA NeMo MarbleNet model converted to CoreML. Detect speech segments in real-time on iOS/macOS with high accuracy and low latency.