You are about to download whisper-base and pyannote-segmentation-3.0, two powerful speech recognition models for generating word-level timestamps across 100 different languages and speaker segmentation, respectively. Once loaded, the models (196MB + 6MB) will be cached and reused when you revisit the page.
Everything runs locally in your browser using 🤗 Transformers.js and ONNX Runtime Web, meaning no API calls are made to a server for inference. You can even disconnect from the internet after the model has loaded!