Whisper Diarization

In-browser automatic speech recognition with
word-level timestamps and speaker segmentation

You are about to download two models: whisper-base, a speech recognition model that generates word-level timestamps across 100 different languages, and pyannote-segmentation-3.0, a speaker segmentation model. Once loaded, the models (77MB + 6MB) will be cached and reused when you revisit the page.
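
To give a rough idea of how word-level timestamps and speaker segments could be combined into a diarized transcript, here is a minimal sketch. It is not taken from this demo's source: the `Word` and `SpeakerSegment` shapes and the `assignSpeakers` helper are hypothetical, and the rule used (each word goes to the segment it overlaps most in time) is just one simple choice.

```typescript
// Hypothetical data shapes; the actual model outputs may be structured differently.
interface Word { text: string; start: number; end: number; }
interface SpeakerSegment { speaker: string; start: number; end: number; }
interface LabeledWord extends Word { speaker: string | null; }

// Length in seconds of the overlap between two time intervals (0 if disjoint).
function overlap(aStart: number, aEnd: number, bStart: number, bEnd: number): number {
  return Math.max(0, Math.min(aEnd, bEnd) - Math.max(aStart, bStart));
}

// Assign each word to the speaker segment it overlaps the most.
// Words that overlap no segment get a null speaker.
function assignSpeakers(words: Word[], segments: SpeakerSegment[]): LabeledWord[] {
  return words.map((word) => {
    let best: string | null = null;
    let bestOverlap = 0;
    for (const seg of segments) {
      const o = overlap(word.start, word.end, seg.start, seg.end);
      if (o > bestOverlap) {
        bestOverlap = o;
        best = seg.speaker;
      }
    }
    return { ...word, speaker: best };
  });
}
```

With this rule, a word that straddles a speaker boundary is attributed to whichever speaker covers more of it, and a word that falls in a gap between segments is left unlabeled rather than guessed.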

Everything runs locally in your browser using 🤗 Transformers.js and ONNX Runtime Web, meaning no API calls are made to a server for inference. You can even disconnect from the internet after the models have loaded!
