How to Clean Transcripts
What Happens During 'Transcript Cleaning'?
During the cleaning stage, Junior transforms speaker-labeled transcripts into client- or partner-ready call notes. He does this by putting each transcript through multiple transformations:
[Only for audio files] Turn the speech into a highly accurate verbatim text of the conversation
Parse out question and answer pairs ("QA Blocks")
Clean the language for grammatical and syntactical errors as well as interruptions and repetitions
Remove informal language, colloquialisms, and filler words ('ums', 'ahs')
Extract Entities from every QA block
The end result is that Junior typically removes ~30-50% of the verbatim text from a transcript, giving you a set of notes from the conversation that is more synoptic in nature and less of a pure reference document.
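To make the language-cleaning steps above more concrete, here is a minimal Python sketch of filler-word and repetition removal, along with a way to measure how much text was removed. It is not Junior's actual implementation: the filler list, the function names `clean_text` and `reduction`, and the regex-based approach are all assumptions made for illustration, and Junior's real pipeline is considerably more involved.

```python
import re

# Hypothetical filler list for illustration only; not Junior's actual rules.
FILLERS = re.compile(r"\b(?:um+|uh+|ah+)\b[,.]?\s*", re.IGNORECASE)

# Collapses an immediately repeated word ("the the" -> "the").
# Note: this naive rule would also collapse legitimate repeats such as "had had".
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_text(text: str) -> str:
    """Strip filler words and immediate word repetitions, then tidy whitespace."""
    text = FILLERS.sub("", text)
    text = REPEATS.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


def reduction(original: str, cleaned: str) -> float:
    """Return the fraction of words removed, between 0.0 and 1.0."""
    before = len(original.split())
    after = len(cleaned.split())
    return 0.0 if before == 0 else (before - after) / before


verbatim = "Um, I think the the market is uh growing"
notes = clean_text(verbatim)
print(notes)
print(f"{reduction(verbatim, notes):.0%} of words removed")
```

A rule-based pass like this only handles the mechanical part of cleaning. Deciding which anecdotes and nuances to keep is a judgment call, which is why the review stage described below matters.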
Why Do We Have a Review Stage?
Junior does a great job, but can't do a perfect job out of the box. He still requires a human in the loop to guide him, for three main reasons:
AI-generated transcripts are liable to mishear or mistranscribe. While our transcription service is consistently best in class, the quality of the output is still dependent on the quality and clarity of the input. It is therefore possible for the AI to mistranscribe a word like "impossible" as "possible" if there was interference at the wrong moment in the audio file.
Proper Nouns are still the most difficult problem to solve with respect to transcription. Even the best transcription models still struggle with industry technical terms and competitor names. As part of the Review Transcript flow, we've built functionality around Junior to solve this.
Large Language Models (LLMs) tend to oversummarise. While they anchor very well to ideas, concepts and arguments, they tend to remove some nuance and anecdotes during the cleaning process. Some of these details may well be parts of the conversation you would like to include in a 'cleaned' transcript.
It is critical that you review all output that has undergone transformation via AI.
We have adopted a key design principle throughout Junior: users have the ability to quickly and seamlessly double-check and approve output created by Junior. Ultimately, it is up to you to ensure that the work product meets your firm's standards.
The importance of the cleaning process should not be understated. By reviewing and approving Junior's output, you are contributing to the creation of a ‘single source of truth’. The workflow tools in Junior are built on top of the cleaned version of the transcripts, so errors or omissions at this stage will cascade through the application.
There are additional benefits to the cleaning process:
it ensures your comprehension following a call
it gives you the opportunity to tag the best insights for use later on
by correcting Proper Nouns, you reduce the time taken to clean future transcripts, increase the accuracy of workflow tools and contribute to your firm's knowledge graph