Extract text from sound or video files with Whisper

Whisper is a general-purpose speech recognition model that converts the speech in sound or video files to text. It supports many languages.

This is how to install and use it on Debian, Ubuntu and similar Debian-based operating systems.

First, make sure that you have Python 3 installed:

$ python3 -V
Python 3.11.2
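
The same check can be scripted. The sketch below is only an illustration: missing_commands is a made-up helper name, and the ffmpeg check is an addition that is not part of the steps in this guide (Whisper's README lists ffmpeg as a requirement for decoding media files).

```shell
# Print the name of every given command that is not found on PATH,
# one per line. Prints nothing when all of them are available.
missing_commands() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# python3 comes from the step above; ffmpeg is an addition to this guide.
missing=$(missing_commands python3 ffmpeg)
if [ -z "$missing" ]; then
    echo "python3 and ffmpeg found"
else
    # $missing is left unquoted on purpose so the names end up on one line.
    echo "Not found:" $missing
fi
```

If ffmpeg is reported as missing, sudo apt install ffmpeg installs it on Debian and Ubuntu.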

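To avoid getting an "Externally Managed Environment" error, which pip raises when you try to install packages into the system Python on recent Debian and Ubuntu releases, use the virtual environment solution from "Fix PIP by Creating a Virtual Environment (Recommended)".

Install the venv module: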

sudo apt install python3-venv

Create a new virtual environment and install Whisper:

python3 -m venv whisper_soundlasers
source whisper_soundlasers/bin/activate
pip install -U openai-whisper
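
Before running pip, it can help to confirm that the environment really is active in the current shell. This is a minimal sketch that relies on the VIRTUAL_ENV variable exported by the activate script; venv_active is a hypothetical helper name, not part of Whisper or venv.

```shell
# Succeeds only when a virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV; it is unset otherwise.
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if venv_active; then
    echo "Using virtual environment: $VIRTUAL_ENV"
else
    echo "No virtual environment active; run the source command first"
fi
```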

You can re-use an existing environment by running the "source" command again, for example in a new terminal. Also, it's best to pass media files as absolute paths, such as /home/username/my-video.mp4.
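
If you only have a relative path at hand, the shell can expand it for you. A small sketch, assuming the realpath tool from GNU coreutils (present on a default Debian or Ubuntu install); abs_media_path is a hypothetical helper name.

```shell
# Print the absolute path of an existing media file.
# Fails with a message on stderr if the file does not exist.
abs_media_path() {
    if [ ! -f "$1" ]; then
        echo "No such file: $1" >&2
        return 1
    fi
    realpath "$1"
}

# Example use (commented out so that nothing runs by accident):
# whisper --device=cpu --language=en "$(abs_media_path my-video.mp4)"
```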

Use it like this. Add --device=cpu if you have a weak graphics card:

whisper --device=cpu --language=en Sound-Lasers-Could-Soon-Become-Reality.mp4
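
To transcribe several files in one go, the command can be wrapped in a loop. This is a sketch under a few assumptions: the --output_dir and --output_format options are not used elsewhere in this guide (they are listed by whisper --help), the transcripts directory name is arbitrary, and the WHISPER variable and transcribe_all function are hypothetical helpers that let you preview the commands first.

```shell
# Program to run; set WHISPER=echo to print the arguments instead of transcribing.
WHISPER=${WHISPER:-whisper}

# Transcribe every file given as an argument, one after another,
# writing .srt subtitle files into the "transcripts" directory.
transcribe_all() {
    for media in "$@"; do
        "$WHISPER" --device=cpu --language=en \
            --output_dir=transcripts --output_format=srt "$media"
    done
}

# Example use (commented out so that nothing runs by accident):
# transcribe_all /home/username/first.mp4 /home/username/second.mp3
```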

To transcribe another language, name it with the --language parameter, which accepts a language name as well as a code such as en:

whisper --device=cpu --language Danish Jarlen-Kim-Larsen.mp4

With --task translate it can even translate the speech into English:

whisper --device=cpu --language Danish --task translate Jarlen-Kim-Larsen.mp4
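
Whisper appears to name its output files after the input file, so running both tasks in the same directory would let the translation overwrite the original-language transcript. One way around that, sketched here with assumed directory names (original and english) and a hypothetical transcribe_and_translate helper, is to give each task its own --output_dir:

```shell
# Program to run; set WHISPER=echo to print the arguments instead of transcribing.
WHISPER=${WHISPER:-whisper}

# $1 = media file, $2 = spoken language (for example Danish).
# Writes the transcript to ./original and the English translation to ./english.
transcribe_and_translate() {
    media=$1
    lang=$2
    "$WHISPER" --device=cpu --language "$lang" --task transcribe \
        --output_dir original "$media" &&
    "$WHISPER" --device=cpu --language "$lang" --task translate \
        --output_dir english "$media"
}

# Example use (commented out so that nothing runs by accident):
# transcribe_and_translate Jarlen-Kim-Larsen.mp4 Danish
```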