🧹 How to Clean Transcripts

What Happens During 'Transcript Cleaning'?

During the cleaning stage, Junior transforms speaker-labeled transcripts into client- or partner-ready call notes. He does this by putting each transcript through multiple transformations:

  • [Only for audio files] Turn the speech into a highly accurate verbatim text of the conversation

  • Parse out question and answer pairs ("QA Blocks")

  • Clean the language of grammatical and syntactical errors, as well as interruptions and repetitions

    • Remove informal language, colloquialisms, and filler words ('ums', 'ahs')

  • Extract Entities from every QA block

Junior sees all interviews as a series of questions and answers and chunks the conversation into "QA Blocks" that comprise the base unit of insight in the platform.
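
To make the idea of a QA Block concrete, here is a minimal Python sketch of how a speaker-labeled transcript could be chunked into question and answer pairs with filler words stripped. It is an illustration only, not Junior's actual implementation: the names `QABlock`, `strip_fillers` and `to_qa_blocks`, the filler list, and the rule that every interviewer turn opens a new block are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical filler pattern; Junior's real cleaning rules are not documented here.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)


@dataclass
class QABlock:
    question: str
    answer: str


def strip_fillers(text: str) -> str:
    """Remove simple filler words and collapse leftover whitespace."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


def to_qa_blocks(turns, interviewer):
    """Group (speaker, text) turns into QA blocks.

    Assumption: each interviewer turn opens a new block, and the turns from
    other speakers that follow are joined into that block's answer. Turns
    that come before the first interviewer turn are ignored.
    """
    blocks = []
    for speaker, text in turns:
        text = strip_fillers(text)
        if speaker == interviewer:
            blocks.append(QABlock(question=text, answer=""))
        elif blocks:
            current = blocks[-1]
            current.answer = f"{current.answer} {text}".strip()
    return blocks


# Example usage with made-up speaker labels and text.
turns = [
    ("Analyst", "Um, how big is the market?"),
    ("Expert", "Uh, roughly ten billion."),
    ("Expert", "It is growing."),
    ("Analyst", "Who leads it?"),
    ("Expert", "Ah, Acme does."),
]
for block in to_qa_blocks(turns, interviewer="Analyst"):
    print(block.question, "->", block.answer)
```

A real pipeline would need to handle follow-up questions, interruptions and multi-speaker answers, which this sketch deliberately ignores.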

As a result, Junior typically removes ~30-50% of the verbatim text from a transcript, giving you a set of notes that is more synoptic in nature and less of a pure reference record.
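
If you want to sanity-check how much a cleaned transcript has been condensed, a rough measure is the fraction of words removed. The Python sketch below assumes the reduction is measured by whitespace-separated word count; the source does not say how the ~30-50% figure is calculated, and `reduction_ratio` is a hypothetical helper, not part of Junior.

```python
def reduction_ratio(verbatim: str, cleaned: str) -> float:
    """Return the fraction of words removed between verbatim and cleaned text.

    Assumption: words are counted by splitting on whitespace.
    """
    original_words = len(verbatim.split())
    if original_words == 0:
        return 0.0
    return 1 - len(cleaned.split()) / original_words
```

For example, a 10-word verbatim passage cleaned down to 6 words gives a ratio of about 0.4, which sits inside the typical range.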

Why Do We Have a Review Stage?

Junior does a great job, but can't do a perfect job out of the box. He still requires a human-in-the-loop to guide him, for 3 main reasons:

  1. AI transcription is liable to mishear or mistranscribe. While our transcription service is consistently best in class, the quality of the output still depends on the quality and clarity of the input. It is possible, therefore, for the AI to mistranscribe a word like "impossible" as "possible" if there was interference at the wrong moment in the audio file.

  2. Proper Nouns remain the most difficult problem to solve in transcription. Even the best transcription models struggle with technical industry terms and competitor names. As part of the Review Transcript flow, we've built functionality around Junior to address this.

  3. Large Language Models (LLMs) tend to oversummarise. While they anchor very well to ideas, concepts and arguments, they tend to strip out some nuance and anecdotes during the cleaning process. Some of these details may well be parts of the conversation you would like to keep in a 'cleaned' transcript.

The importance of the cleaning process should not be understated. By reviewing and approving Junior's output, you are contributing to the creation of a ‘single source of truth’: the workflow tools in Junior are built on top of the cleaned version of the transcripts - errors or omissions at this stage will cascade through the application.

There are additional benefits to the cleaning process:

  • it ensures your comprehension following a call

  • it gives you the opportunity to tag the best insights for use later on

  • by correcting Proper Nouns, you reduce the time taken to clean future transcripts, increase the accuracy of workflow tools and contribute to your firm's knowledge graph
