When considering automatic speech recognition (ASR), it is important to realise that some AV sources are more suitable than others. Here are some things to keep in mind. First, the audio quality must be high: voices should be clear and free of echo, and preferably recorded close to the mouth with a suitable microphone.
Secondly, ASR works best with monologues. If an audio file is full of people interrupting and talking over each other, the results can be confusing to read, as not all software can tell speakers apart by their voices. Ideally, the file should have a separate channel for each speaker.
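If a recording already has one speaker on the left channel and the other on the right, the channels can be separated before transcription so that each speaker is processed on their own. The following is a minimal sketch using only Python's standard `wave` module. It assumes an uncompressed stereo WAV file; the function name `split_stereo_wav` and the file names are made up for illustration.

```python
import wave


def split_stereo_wav(src_path, left_path, right_path):
    """Write the left and right channels of a stereo WAV file to two mono WAV files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo (2-channel) WAV file")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()    # samples per second
        frames = src.readframes(src.getnframes())

    # Stereo frames are interleaved: left sample, right sample, left sample, ...
    frame_size = 2 * width
    left = bytearray()
    right = bytearray()
    for i in range(0, len(frames), frame_size):
        left += frames[i:i + width]
        right += frames[i + width:i + frame_size]

    # Write each channel as its own mono file with the same sample format.
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(bytes(data))
```

Calling `split_stereo_wav("interview.wav", "speaker_left.wav", "speaker_right.wav")` would then give two files that can be transcribed separately. This only helps when the speakers were recorded on separate channels in the first place.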
Finally, ASR is generally not very good at dealing with accents and dialects. Even when migrants or rural dwellers speak with an accent that is easy for you to understand, ASR can have great difficulty with it, let alone with accents that are difficult for outsiders to understand.
ASR software
When using software, bear in mind that you are uploading privacy-sensitive files.
Always read the terms and conditions of an ASR service before deciding whether it meets your privacy requirements.
As of January 2023 (version 3.6.12), a new automatic speech recognition option has been built into Subtitle Edit.
This version of Subtitle Edit includes two speech recognition features under the Video tab:
Brief installation instructions for Subtitle Edit 3.6.12, to make the program work best for Whisper speech recognition:
Quickly and easily transcribe audio files into text with OpenAI’s state-of-the-art transcription technology Whisper.
The Pro version requires a small fee of €16.
The Pro version uses Medium and Large models, where transcription results are often much better.
Users who do not want to install software on their computer but still want to use Whisper can use SteveDigital’s free online service.
Convert audio files or YouTube videos into text online with OpenAI’s advanced transcription technology Whisper.
There are some drawbacks to using it online, though:
Advantages:
Automatic speech recognition
Automatic speech recognition with Word in Office 365.
With a Microsoft registration, the service can be used online for free.
The disadvantage is that the result is a document without time codes.
Via an option in YouTube Studio, subtitles with time codes can be created.
DOWNLOAD the separate instruction document.
Instruction document for automatic speech recognition in Word can be downloaded here:
Automatic transcription
Automatic transcription with Word in Office 365.
The service can only be used with an Office 365 premium subscription.
(300 minutes of speech recognition per month)
The result is a document with start times per paragraph. An option in YouTube Studio can be used to turn it into a readable subtitle file with time codes.
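To illustrate what turning start times into time codes involves, here is a minimal sketch that builds SubRip (SRT) subtitle text from paragraphs with start times. It is not the YouTube Studio route mentioned above, and it does not read Word's output directly: it assumes the transcript has already been reduced to a list of `(start_in_seconds, text)` pairs. Each cue ends where the next paragraph begins, and the last cue gets an arbitrary assumed duration of five seconds. The names `format_timestamp` and `paragraphs_to_srt` are made up for illustration.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT time code, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def paragraphs_to_srt(paragraphs, last_duration=5.0):
    """Build SRT text from (start_seconds, text) pairs sorted by start time."""
    cues = []
    for index, (start, text) in enumerate(paragraphs):
        if index + 1 < len(paragraphs):
            end = paragraphs[index + 1][0]   # cue ends when the next one starts
        else:
            end = start + last_duration      # assumed length of the final cue
        cues.append(
            f"{index + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    print(paragraphs_to_srt([(0.0, "Good morning."), (4.0, "Thank you for coming.")]))
```

Because whole paragraphs are usually too long to read as a single subtitle, the text would still need to be split into shorter cues to be comfortable on screen.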
Download the separate instruction document.
Instruction document for automatic transcription in Word can be downloaded here:
The automatic speech recognition can be used with a Google Account.
The disadvantage is that the result is a document without time codes.
Using an option in YouTube Studio, subtitles with time codes can be created from this.
DOWNLOAD the separate instruction document.
Instruction document for automatic speech recognition in Google Docs can be downloaded here:
The automatic subtitles can be created with a Google / YouTube account.
Only suitable for video files.
If you want to have an audio file (mp3, wav, ogg, etc.) automatically transcribed, it must first be converted to a video file so that it can be uploaded to YouTube. There are all kinds of free programmes for this. The trick is to load the sound track and display a single still picture for the entire length of the sound file, then save the whole thing as an mp4 file. The result is ready for uploading to YouTube.
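One free way to do this conversion is the command-line tool FFmpeg. The sketch below is one possible approach, not the method from the instruction document. It assumes FFmpeg is installed and on the PATH and that the build includes the libx264 encoder. The function names and file names are made up for illustration.

```python
import subprocess


def build_ffmpeg_command(audio_path, image_path, output_path):
    """Return an FFmpeg command that pairs a still image with an audio track."""
    return [
        "ffmpeg",
        "-loop", "1",            # repeat the single image as a video stream
        "-i", image_path,        # input 0: the still picture
        "-i", audio_path,        # input 1: the sound file
        "-c:v", "libx264",       # H.264 video
        "-tune", "stillimage",   # x264 tuning for static content
        "-pix_fmt", "yuv420p",   # widely supported pixel format
        "-c:a", "aac",           # AAC audio
        "-shortest",             # stop when the audio ends
        output_path,
    ]


def audio_to_video(audio_path, image_path, output_path):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(audio_path, image_path, output_path), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is converted by accident.
    print(" ".join(build_ffmpeg_command("interview.mp3", "picture.png", "interview.mp4")))
```

With the `yuv420p` pixel format, libx264 requires the picture to have an even width and height, so an image with odd dimensions would need to be resized first.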
Instruction document can be downloaded here
Transcription Portal
The Transcription Portal is an online ASR tool developed and hosted by LMU Munich for academic transcription purposes. The tool is not an ASR service itself, but allows you to process your audio files through many different ASR services. You can then correct and edit the results within the OH-Portal or export them in a file type of your choice.