Transcription

Definition

Transcription is the process of converting spoken language, whether phonetic, phonological, lexical or morphological elements, into written form. In simpler terms, it involves transforming an audio or video recording into a text document.

In the audiovisual world, it helps us create subtitles, closed captions (https://localizationlab.com/ca/la-importancia-dels-subtitols/) and written documents, making videos and audio content more accessible to everyone.

Why is it useful?

According to an article by Scope for Business, a leading UK disability equality charity, “transcriptions enable access to video content in a different format and medium. They also include descriptions of on-screen titles and visual cues that viewers would see when watching the video.”

“The main difference is that transcripts function independently of the video player. Users don’t need to interact with the player itself; instead, transcriptions can be accessed on a separate webpage or embedded within an existing page (e.g., as an accordion element).”

Transcriptions also provide the only way for deafblind users to access video content independently. With a tactile device, such as a Braille display, they can read the transcribed content.

Learn more about making video and audio content more accessible from the nonprofit World Wide Web Consortium (W3C): https://www.w3.org/WAI/media/av/

Types of transcription

Edited

An edited transcript corrects grammatical errors and removes word repetitions, hesitations (like “um,” “uh,” “mmm”), and extraneous sounds (such as coughing or doors opening and closing). The goal is to ensure that the message is clear and the reading flows smoothly.

This type of transcript doesn’t alter the meaning of the content but presents it in a more polished, formal way. It is often used in professional or educational settings, such as conferences, classes, seminars, or talks.

Verbatim

A verbatim transcript captures exactly what is heard in the audio or video, providing a word-for-word representation. It includes:

Grammatical errors
Background noises
Hesitations (such as “um,” “uh,” “mmm”)
Other sounds (like laughter, crying, applause, etc.)

These elements are noted in square brackets, for example: [laughter], [clears their throat], [crying], [door opens], [another person enters and gestures]. This captures additional context, but it can make the transcript harder to read.

Verbatim transcriptions are commonly used in legal and judicial contexts (e.g., hearings, court proceedings, witness statements) as well as in medical and healthcare settings (e.g., recorded patient notes, medical visits).

Smart Verbatim

This style strikes a balance between edited and verbatim transcription. It omits sounds, background noises, and hesitations but retains the original language without formalising it, as an edited transcript would.

Smart verbatim is more subjective, as the transcriber decides what to keep or omit. It’s also suitable for professional or educational settings, like conferences, classes, seminars, or talks.
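As a rough illustration of how a smart verbatim draft relates to a verbatim one, the following Python sketch strips bracketed annotations and filler words while leaving the speaker's wording untouched. The function name and the filler list are assumptions made for this example, not part of any standard or of our own workflow, and a mechanical pass like this can only produce a first draft: the transcriber still decides what to keep or omit.

```python
import re

# Hypothetical filler list for this example; in practice it would be
# agreed with the client and adapted to the language of the recording.
FILLERS = ("um", "uh", "mmm")


def to_smart_verbatim(verbatim: str) -> str:
    """Remove bracketed annotations and fillers without rewording the speech."""
    # Drop annotations such as [laughter] or [door opens].
    text = re.sub(r"\[[^\]]*\]", " ", verbatim)
    # Drop fillers as whole words, plus a directly following comma if present.
    filler_pattern = r"\b(?:" + "|".join(FILLERS) + r")\b,?"
    text = re.sub(filler_pattern, " ", text, flags=re.IGNORECASE)
    # Collapse the leftover whitespace and tidy spaces before punctuation.
    text = re.sub(r"\s+", " ", text)
    text = re.sub(r"\s+([,.!?])", r"\1", text)
    return text.strip()


sample = "So, um, we was [laughter] gonna leave early. [door opens]"
print(to_smart_verbatim(sample))
# prints: So, we was gonna leave early.
```

Note that the grammatical slip ("we was") survives, which is the point of smart verbatim: an edited transcript would correct it.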

Phonetics

According to Wikipedia, “phonetic transcription is a visual representation of speech sounds in a language by means of symbols used in a phonetic alphabet.” The scientific community uses the International Phonetic Alphabet (IPA) for this purpose, with symbols shown in square brackets.

Two main types of phonetic transcription exist: broad and narrow. Broad transcription omits finer details, while narrow transcription captures subtle sound nuances. If a transcription includes only phonemes and archiphonemes, it is referred to as a phonological transcription.

This type is valuable for linguistic studies, language learning, or speech therapy.

In each case, clients can choose the type that best suits their intended use.

Creating the transcript

Today, artificial intelligence can assist in creating a first draft of transcriptions, with various online software options available. These help transcribers to work more efficiently and focus on the most challenging sections and important annotations.

When transcribing a video, it’s essential to provide context for each scene:

If a graphic or drawing appears, it should be described and explained.
If speakers are visible and there are changes in interlocutors, these details must be noted.
If gestures are used to clarify or emphasise points, this information should also be added to the transcript.

Project: Video transcripts for the Democratic Memorial of Catalonia

At LocalizationLab, we partnered with the Democratic Memorial of the Generalitat de Catalunya to support the preservation and dissemination of knowledge about the Second Republic, the Civil War, its diverse victims, Francoist repression, exile, deportation, and the anti-Franco foundations of the restored democracy.

We were tasked with transcribing six interviews with a Catalan survivor of the Ravensbrück concentration camp, who was deported, joined the French resistance, and shared their experiences in these recordings. The interviews were recorded in French and Catalan and transcribed according to the Democratic Memorial’s guidelines for the transcription of interviews.

If you need assistance with a video or audio transcription in any language, we’d be happy to help. Send us an email (info@localizationlab.com) and we will contact you as soon as possible.

The LocalizationLab Team
