
OpenAI Whisper Setup (Speech-to-Text)

Install the OpenAI Whisper model and convert audio files to text.

Intermediate · 25 min

Setup Steps

1. Install ffmpeg, which Whisper uses to decode audio (Debian/Ubuntu command shown):

sudo apt install ffmpeg

2. Install Whisper via pip:

pip install openai-whisper

3. Command line usage (the language can be given as a name such as English or a code such as en):

whisper audio_file.mp3 --language English --model medium

4. Available models (smallest to largest): tiny, base, small, medium, large-v3
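
Larger models are generally more accurate but need more memory and run slower. The following is a minimal sketch of choosing a model by available GPU memory. It assumes the approximate VRAM figures listed in the Whisper README (about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, 10 GB for large); `pick_model` and its table are illustrative helpers, not part of the whisper package.

```python
# Approximate VRAM needs in GB (assumed from the Whisper README; rough guidance only).
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_gb: float) -> str:
    """Return the largest model whose approximate VRAM need fits in available_gb."""
    chosen = "tiny"  # fall back to the smallest model
    for name, needed_gb in APPROX_VRAM_GB:
        if needed_gb <= available_gb:
            chosen = name
    return chosen


print(pick_model(6))
```

The returned name can be passed to `--model` on the command line or to `whisper.load_model` in Python.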

5. Python usage:

python
import whisper
model = whisper.load_model("medium")
result = model.transcribe("audio_file.mp3", language="en")
print(result["text"])
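
When transcribing several files, it usually pays to load the model once and reuse it, since loading takes noticeable time. A minimal sketch, assuming a hypothetical helper `transcribe_all` (not a whisper function) that receives an already loaded model:

```python
from typing import Dict, Iterable


def transcribe_all(model, paths: Iterable[str], language: str = "en") -> Dict[str, str]:
    """Transcribe each file with the same loaded model; returns {path: text}."""
    transcripts = {}
    for path in paths:
        result = model.transcribe(path, language=language)
        # Strip surrounding whitespace from the transcript text.
        transcripts[path] = result["text"].strip()
    return transcripts


# Usage (requires openai-whisper and real audio files; file names are placeholders):
#   import whisper
#   model = whisper.load_model("medium")
#   print(transcribe_all(model, ["a.mp3", "b.mp3"]))
```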

6. Subtitle output (writes an .srt file):

whisper audio.mp3 --language en --output_format srt
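
A similar subtitle file can be produced from Python using the `segments` list in the result of `model.transcribe`, where each segment carries `start` and `end` times in seconds plus its `text`. A minimal sketch; `srt_timestamp` and `segments_to_srt` are hypothetical helpers written for illustration, not whisper functions:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end', and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Usage (requires openai-whisper):
#   result = model.transcribe("audio.mp3", language="en")
#   with open("audio.srt", "w", encoding="utf-8") as f:
#       f.write(segments_to_srt(result["segments"]))
```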

7. GPU accelerated usage:

python
import whisper
model = whisper.load_model("large-v3", device="cuda")  # needs a CUDA-enabled PyTorch build
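
Requesting `device="cuda"` fails on a machine without a usable GPU. If `device` is omitted, `load_model` already picks CUDA when PyTorch reports it as available, but an explicit check is handy when the same value is needed elsewhere. A minimal sketch, assuming PyTorch is importable (it is installed as a dependency of openai-whisper); `pick_device` is a hypothetical helper:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is missing, so no GPU can be used
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)

# Usage (requires openai-whisper):
#   import whisper
#   model = whisper.load_model("large-v3", device=device)
#   # Half precision only applies on GPU, so disable it explicitly on CPU.
#   result = model.transcribe("audio_file.mp3", fp16=(device == "cuda"))
```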

8. faster-whisper, a CTranslate2-based reimplementation that typically runs faster:

pip install faster-whisper
python
from faster_whisper import WhisperModel
model = WhisperModel("large-v3", device="cuda")
segments, info = model.transcribe("audio.mp3", language="en")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
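
`WhisperModel` also accepts a `compute_type` argument that trades precision for speed and memory; `float16` on GPU and `int8` on CPU are common choices. Note that `segments` is a generator, so the transcription only runs as it is iterated. A minimal sketch; `compute_type_for` and `join_segments` are hypothetical helpers, and the chosen compute types are an assumption to tune for your hardware:

```python
def compute_type_for(device: str) -> str:
    """Pick a commonly used faster-whisper compute_type for the device."""
    return "float16" if device == "cuda" else "int8"


def join_segments(segments) -> str:
    """Consume the segment generator and return the full transcript."""
    return " ".join(segment.text.strip() for segment in segments)


# Usage (requires faster-whisper):
#   from faster_whisper import WhisperModel
#   device = "cuda"
#   model = WhisperModel("large-v3", device=device,
#                        compute_type=compute_type_for(device))
#   segments, info = model.transcribe("audio.mp3", language="en")
#   print(join_segments(segments))
```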