Use ELAN to edit text transcribed with speech recognition models like Whisper

ELAN

ELAN is an extremely versatile annotation tool for text extracted from audio and video recordings.

It provides several ways to view and edit annotations, and it supports multiple tiers for different kinds of information, such as the transcribed text in one tier and the gestures of people in the source video in another.

You can see both the video and the sound file at the same time, which is very useful for checking where there is sound and where there is silence, and for splitting the recording into meaningful chunks. You can use Audacity to extract an audio file in WAV format.
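Before loading an extracted WAV file into ELAN, it can help to confirm that the export produced what you expected. The following is a minimal sketch that uses only Python's standard wave module; the function name describe_wav is made up for this illustration, and it assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    # Duration is the number of frames divided by frames per second.
    return channels, rate, frames / rate
```

Calling describe_wav("interview.wav"), where interview.wav is a hypothetical file name, returns the channel count, the sample rate and the length in seconds, which you can compare against the length of the video.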

ELAN is extremely feature-rich, so visit the ELAN website at https://archive.mpi.nl/tla/elan for more info. Even though it is from 2020, Introduction to Elan (Nick Thieberger) is still a great way to start using ELAN.

See https://emcawiki.net/Transcription_Resources for a great comparison of the most popular transcription tools, such as ELAN and CLAN.

Note: To run the similarly named alternative transcription tool CLAN on Linux, you need to compile it from source. Also, that build only provides the analysis commands; there is no CLAN editor for UNIX.

Installing ELAN

To install ELAN on Debian, Ubuntu, and other Debian-based distributions, go to https://archive.mpi.nl/tla/elan/download and find the latest version. For example, for ELAN 6.8 and Simple-ELAN 1.5 (a simplified, very limited transcription editing tool):

wget https://www.mpi.nl/tools/elan/ELAN_6-8_linux.deb
sudo dpkg --install ELAN_6-8_linux.deb
wget https://www.mpi.nl/tools/elan/simple/Simple-ELAN_1-5_linux.deb
sudo dpkg --install Simple-ELAN_1-5_linux.deb

Using ELAN

One great way of using ELAN is to correct text that has been extracted from a media file with the automatic speech recognition (ASR) tool Whisper. Whisper automatically generates text files in several formats (JSON, SRT, TSV, TXT, VTT), of which the SRT subtitle format is especially useful in ELAN. VTT is also supported. Whisper is great at extracting text, but it does make mistakes, and ELAN makes the editing much easier.

Normally you'll work mostly in ELAN itself, using all its features, for example to run structured searches across multiple annotation files, but it is also an excellent SRT editor.
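If you have not looked inside an SRT file before, the sketch below shows its structure: numbered cues separated by blank lines, each with a timing line followed by the text. It is an illustrative parser written for this article using only the Python standard library, not part of ELAN or Whisper. The names parse_srt and SAMPLE are hypothetical, and the timings in SAMPLE are invented.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text):
    """Parse SRT text into a list of (start, end, text) tuples."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Everything after the timing line is the cue text.
                cues.append((start, end, "\n".join(lines[i + 1:]).strip()))
                break
    return cues


# Invented sample: the timings are made up for illustration.
SAMPLE = """1
00:00:00,000 --> 00:00:03,500
Dit liv er en film,

2
00:00:03,500 --> 00:00:06,000
og du bestemmer selv handlingen
"""
```

Each cue roughly corresponds to one annotation on the imported tier in ELAN, with the start and end times defining the segment you see on the timeline.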

Convert Danish text to IPA

In ELAN, you can add a phonetic transcription in a separate tier. You can convert Danish text into IPA (International Phonetic Alphabet) format with the Danish Text-To-IPA tool, which looks up words in https://udtaleordbog.dk/. Both were built by Ruben Schachtenhaufen (https://schwa.dk). Here is an example, phonetically transcribed with the tool:

Text: Dit liv er en film, og du bestemmer selv handlingen
IPA: /ˈtit ˈlḭːv æɐ en ˈfil̰m, ɒʊ tu pestɛmɐ ˈsɛl̰ ˈhænlɪŋən/

Edit an SRT subtitle file in ELAN

  1. In ELAN, click File > New and select the media file (for example in MP4 format) and the corresponding audio file, typically a WAV file
  2. Import the SRT file via File > Import > Subtitle Audacity label file
  3. Click the Play button to register everything.
  4. Click Options > Transcription Mode, and under "select type for column" select "import-sub".

You should now see the SRT subtitles together with the media file, ready for editing. Enable the Loop button to have the current segment replayed continuously.

To export an updated SRT file:

  1. Click File > Save
  2. Click File > Export As > Subtitle Text
  3. Select "Subtitle-Tier"
  4. Click OK > Save
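After exporting, you may want to review exactly what you changed compared with the original Whisper output. The following is a minimal sketch using Python's standard difflib module; the function name srt_diff and the file labels whisper.srt and edited.srt are assumptions chosen for this example.

```python
import difflib


def srt_diff(original_text, edited_text):
    """Return unified-diff lines describing what changed between two SRT texts."""
    return list(
        difflib.unified_diff(
            original_text.splitlines(),
            edited_text.splitlines(),
            fromfile="whisper.srt",  # label only, no file is read
            tofile="edited.srt",     # label only, no file is read
            lineterm="",
        )
    )
```

Read both files into strings, pass them to srt_diff, and print the result line by line. Lines starting with "-" come from the Whisper output and lines starting with "+" are your corrections. An empty list means the two files are identical.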