Extract text from sound or video files with Whisper
Whisper is a general-purpose speech recognition model that converts speech to text. It supports multiple languages.
This is how to install and use it on Debian, Ubuntu and similar Debian-based operating systems.
First, make sure that you have Python 3 installed:
$ python3 -V
Python 3.11.2
To avoid getting an "Externally Managed Environment" error from pip, use the virtual environment solution from Fix PIP by Creating a Virtual Environment (Recommended).
Install the virtual environment module:
sudo apt install python3-venv
Create a new virtual environment, activate it and install Whisper:
python3 -m venv whisper_soundlasers
source whisper_soundlasers/bin/activate
pip install -U openai-whisper
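Whisper's own documentation lists the ffmpeg command-line tool as a requirement for decoding media files, so it may be worth checking that it is on the PATH before the first run. A minimal sketch; check_tool is a hypothetical helper name used only for this example, not part of Whisper:

```shell
# check_tool is a hypothetical helper, not part of Whisper.
# It reports whether a command is available on the PATH
# and returns a non-zero status when it is not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}
```

It could be used as: check_tool ffmpeg || sudo apt install ffmpeg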
You can re-use an existing environment by running the "source" command again. Also, it's best to use absolute paths, such as /home/username/my-video.mp4.
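If you only have a relative file name at hand, the realpath tool from GNU coreutils can turn it into an absolute path. A small sketch; abs_media_path is a hypothetical helper name and my-video.mp4 is just the example file name from above:

```shell
# abs_media_path is a hypothetical helper, not part of Whisper.
# It prints the absolute path of the file given as first argument.
abs_media_path() {
    realpath -- "$1"
}
```

For example, whisper "$(abs_media_path my-video.mp4)" would pass the full path to Whisper regardless of the current directory.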
Use it like this. Add --device=cpu if you have a weak graphics card:
whisper --device=cpu --language=en Sound-Lasers-Could-Soon-Become-Reality.mp4
To transcribe a specific language, name it with the --language parameter:
whisper --device=cpu --language Danish Jarlen-Kim-Larsen.mp4
It can even translate the speech into English with --task translate:
whisper --device=cpu --language Danish --task translate Jarlen-Kim-Larsen.mp4
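When there are several files to process, the commands above can be wrapped in a loop. This is only a sketch under assumptions: transcribe_dir and the WHISPER_CMD variable are hypothetical names made up for this example, and it only looks at .mp4 files. The flags are the same ones used above; "transcribe" is Whisper's default task.

```shell
# transcribe_dir and WHISPER_CMD are hypothetical names for this sketch.
# Usage: transcribe_dir DIRECTORY [LANGUAGE] [TASK]
# Set WHISPER_CMD=echo to print the commands instead of running them.
transcribe_dir() {
    dir=$1
    lang=${2:-en}           # language of the recordings
    task=${3:-transcribe}   # "transcribe" or "translate"
    for f in "$dir"/*.mp4; do
        [ -e "$f" ] || continue   # skip when the glob matched nothing
        "${WHISPER_CMD:-whisper}" --device=cpu --language "$lang" --task "$task" "$f"
    done
}
```

For example, transcribe_dir /home/username/videos Danish translate would translate every Danish .mp4 file in that directory into English, one after another.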