Speech Recognition versus Transcription

posted Feb 22, 2018, 5:58 PM by Amy LaBranch

A lot of people wonder how I can run a thriving transcription business. Isn't there an app for that? I can tell my smartphone what to do or talk to Alexa. Won't your job soon be obsolete? Over the 12 long years I've been a transcriptionist, I have heard again and again that speech-recognition software is getting better and better, and yet it is still nowhere near accurate enough for any serious publication. There have been claims that artificial intelligence and machine learning have reached human parity or will soon eclipse us in pattern recognition, but when it comes to processing language, meaning is incredibly important. Even if AI can translate a sound into a word, can it interpret the meaning (i.e. syntax and semantics) well enough to punctuate a sentence correctly? So far, the answer is a big fat no.

Using Dragon for Dictation

To clarify, when it comes to dictation, Nuance's Dragon is the best on the market (which is why it is so expensive). It can understand one person speaking into a recorder (with no accent and no background noise) pretty well because you train it to your voice, and a personal language file is slowly compiled after hours of correcting its output. Still, you have to have a heavy editing hand or else verbalize "comma", "period", "new paragraph". The same goes for Siri or Cortana. You can give them commands or even short-form voice messages, but they are going to have a hard time with multiple speakers and people with thick accents, and the processing time for long-form transcription (i.e. for an hour-long lecture or interview) will be longer than it would take a professional to transcribe the same audio.

A Human Transcriptionist is Necessary When You Cannot Compromise on Quality

I work for journalists, academics, politicians, and businesspeople who cannot tolerate mistakes. They may publish those transcripts in newspapers, on blogs, in scientific journals, or as a formal record of proceedings. "ur" (your) or "gonna" (going to) and no periods might be sufficient when communicating via SMS to friends or family, but it's totally unacceptable for formal publications. Incorrect grammar, spelling, punctuation, and capitalization can also be misleading and, frankly, can damage your credibility. You could be at risk of misquoting an interviewee. And anyone who's in marketing knows that messaging is everything.

A Comparison

From time to time, I test speech-recognition software that's available to the public (i.e. free). I have trained Windows Speech Recognition to my voice and tried to use my transcription software to process audio files. I have collected hundreds of hilarious and totally absurd transcripts, all of them completely unusable for a paying client. Not only do they take longer to proofread than it would take to transcribe the audio from scratch, but the recognition itself takes a lot of processing power, and my (relatively new) computer slows down to the point that I can't run any other programs.

Using YouTube Auto-Caps

This week, I decided to use YouTube's automatic caption feature, which is based on Google's state-of-the-art automatic speech recognition technology. I had a series of excellent-quality audio files of around an hour each. The first thing to know about using YouTube to transcribe is that, even though it is free, it is very labor intensive. You cannot just upload an audio file to YouTube; it must have some sort of image. So just the process of adding an image to the audio file (with Movie Maker) takes a long time, about 50 minutes in my case.

Then you have to upload the file to YouTube and wait for it to process (about 20 minutes). There's no notification to let you know the captions are available, but I've found that this is generally an overnight process.

On average, I can transcribe a one-hour file in 2.5-3 hours. For excellent audio quality, it would be close to two. So already, YouTube's processing has taken longer than I would have, as a human transcriptionist.

Once the auto-caps are ready, they are available for download as an .srt, .vtt, or .scc file. These all have timecodes and other tagging embedded within them. I've been doing this for a while, so I have a series of macros to take all of the extraneous coding out, leaving only the spoken words. Then the problem remains that you have a long stream of text, all lowercase, with no punctuation, no differentiation between speakers, not to mention recognition errors. So there is a lot of proofreading to do.
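For readers who would rather script the cleanup step, here is a minimal Python sketch of the same idea applied to an .srt file. It is not the macros described above, which are not shown in this post; the function name `srt_to_text` is my own, and the sketch assumes the standard .srt layout of a cue number, a timing line, and then the caption text.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s+-->\s+\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_text(srt: str) -> str:
    """Return only the spoken words from SRT caption text (hypothetical helper)."""
    spoken = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank separators, cue numbers, and timing lines.
        # Caveat: a caption consisting only of digits is also skipped.
        if not line or line.isdigit() or TIMING.match(line):
            continue
        # Drop inline formatting tags such as <i>...</i>.
        spoken.append(re.sub(r"<[^>]+>", "", line))
    return " ".join(spoken)
```

The result is exactly the long, unpunctuated stream of text described above, so all of the proofreading still remains to be done.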

The Verdict: Transcription Still Significantly Faster and Substantially Higher Quality

Of the five one-hour files I tested, proofreading alone took me anywhere from 2 to 6.5 hours. So taking into account all of the processing, the overnight captioning, and the proofreading, this is not beneficial to my productivity. Therefore, just as a professional translator would not judge Google Translate as adequate, I will not give discounts to proofread speech recognition files, because proofreading them can literally take up to three times longer than transcribing the same audio from scratch.
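To put the numbers from this post side by side, here is a rough tally in Python. The figures are the ones quoted above; the assumptions are mine: the 50 minutes of adding an image and the 20 minutes of upload processing are counted once per file, and the overnight wait for captions is left out entirely because I have no exact figure for it.

```python
# Figures quoted in the post, converted to minutes.
ADD_IMAGE = 50                    # adding an image with Movie Maker
UPLOAD_AND_PROCESS = 20           # YouTube upload and processing
PROOFREAD_BEST = 2 * 60           # fastest proofreading time observed
PROOFREAD_WORST = int(6.5 * 60)   # slowest proofreading time observed
TRANSCRIBE_FROM_SCRATCH = 2 * 60  # one hour of excellent-quality audio

def auto_caption_minutes(proofread_minutes: int) -> int:
    """Total time for the auto-caption route, excluding the overnight wait."""
    return ADD_IMAGE + UPLOAD_AND_PROCESS + proofread_minutes

best = auto_caption_minutes(PROOFREAD_BEST)    # 190 minutes
worst = auto_caption_minutes(PROOFREAD_WORST)  # 460 minutes
print(best, worst, TRANSCRIBE_FROM_SCRATCH)
```

Even in the best case, the auto-caption route comes to 190 minutes against 120 minutes for transcribing from scratch, and that is before the overnight wait.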

In the fast-paced world of breaking news and social media, even waiting overnight for a transcript that would have taken a human transcriptionist only two hours to complete is too long. Why rely on an inconsistent app only to slog through hours of editing when you could just hire an experienced professional to do it for you?

So you want to get started as a transcriptionist

posted Apr 15, 2017, 11:32 AM by Amy LaBranch

Are you a super-fast typist, at least 80 wpm?
Are you a careful listener?
Do you have perfect grammar and spelling?
Can you meet strict deadlines?
Do you have a decent set of headphones?

Perhaps transcription would be right for you.


First things first, you will need the right software and hardware for the job. I use Microsoft Word and Express Scribe (free). Make sure you choose the "free version": there are two links, and the other one will ask you to upgrade to the pro version after a trial period. Using a foot pedal revolutionizes the job because you never have to take your fingers off the keyboard. Using F-keys will suffice when you are starting out, but investing in a foot pedal will tremendously speed you up, i.e. you will earn more.

Also, a pair of good-quality headphones is required. They are industry standard for transcription because headphones keep out the background noise and focus you more clearly on the audio. There's no way you can hear the subtleties if you're just listening through loudspeakers. Sometimes, I get jobs for film projects where the sound guys are using super sensitive equipment to detect even the faintest background noise. So they will see your mistakes if you are not listening carefully. You don't have to spend a lot of money for good-quality headphones. I used a pair of $20 Sennheisers for years before I moved to Bose. And they're great, lightweight enough that you can't even tell you're wearing them. I prefer the ones that go over my whole ear, but many transcriptionists use earbuds (I hate them). You can even buy open-backed ones to give your ears a break and help with listening fatigue.

There is a bit of a learning curve with new software, so I suggest testing out a random audio file, maybe 10 minutes, to see how long it takes and to see if you can stomach the work. Keep in mind that industry standard is a turnaround time of 6:1. That means one hour of audio should not take longer than six hours to type. The faster you type, the more you will earn. On average, most jobs are about an hour long and due back in 24 hours. Every once in a while, I get a file format that doesn't work. It's important that you check if the file works right away so that it can be converted into another format if necessary or the client can be notified that there is a problem.

Educate Yourself

For more information about the transcription industry, I recommend joining a forum of professionals, like the Transcription Essentials Forum. Its members firmly oppose jobs that pay less than $1 per audio minute and educate prospective transcriptionists about exploitative low-paying and crowd-sourcing companies that pay unacceptable wages. To put this in perspective, if it takes you one hour to type 10 minutes of audio, at a rate of $1 per audio minute, then you are making $10/hour. Anything less is not worth your time. Accepting slave wages just brings the whole industry down. Don't do it! The forum has a great list of companies that hire and meet its requirements. Here is also a good place to start: Over 100 Work From Home Transcription Companies (although beware of the low-paying ones, as mentioned above).
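The pay arithmetic above can be checked for any rate and typing speed with a few lines of Python. This is only a sketch; the function name `hourly_earnings` is my own invention.

```python
def hourly_earnings(rate_per_audio_minute: float,
                    audio_minutes: float,
                    hours_worked: float) -> float:
    """Effective hourly wage for a job paid per audio minute (hypothetical helper)."""
    return rate_per_audio_minute * audio_minutes / hours_worked

# The example from the post: 10 audio minutes typed in one hour at $1 per minute.
print(hourly_earnings(1.00, 10, 1))   # 10.0

# One hour of audio at the 6:1 industry-standard turnaround works out the same.
print(hourly_earnings(1.00, 60, 6))   # 10.0

# At 50 cents per audio minute, the same work pays only half as much.
print(hourly_earnings(0.50, 10, 1))   # 5.0
```

The calculation makes the point plain: the per-minute rate and your typing speed are the only two levers, so a low rate can only be offset by typing faster.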

As far as I can tell, even though the world seems focused on AI and machine learning right now, there is still no good long-form (i.e. more than a few minutes) speech recognition software: Why Our Crazy-Smart AI Still Sucks at Transcribing Speech. Nuance's Dragon is the best dictation software, and Google and Siri are great for one person, short snippets, voicemails. But so far, I've seen nothing that can ID multiple speakers, do correct grammar and punctuation, and be as accurate (and fast) as a human transcriptionist, especially for speakers with accents.

Being Self-Employed

The feast-or-famine nature of being a freelancer means you don't always have consistent work, and you have to plan for the downtimes. My schedule, for instance, revolves around the quarterly financial earnings seasons. So I have four major busy times every year:
  • Last week of January to the end of February
  • Last week of April to the end of May
  • Last week of July to the end of August
  • Last week of October to the end of November
These jobs are ASAP, meaning I get the assignment, I often have to dial into the live conference call or find the Webcast and record it, and then I have four hours from the end of the call to finish the transcript. Otherwise, I have a variety of other clients, generally with 24-hour deadlines, from legal to corporate to interviews to film editing projects to focus groups, and I specialize in international transcription, where English might not be the speakers' native language.

Finally, freelancers do not get benefits like health insurance or paid vacations, and taxes are not automatically deducted from your paychecks. I highly recommend retaining a CPA to save you money and take the headache out of dealing with all the forms. If you start to rely on transcription as a main source of income, you will need to pay quarterly estimated taxes so you don't get stuck with a giant tax bill on April 15th.


To sum up, if you think you fulfill all the requirements above, I would suggest:
  1. Downloading Express Scribe,
  2. Loading a 10-minute audio file and familiarizing yourself with the software,
  3. Talking to other people who have experience with transcription, and
  4. Applying to some of the above firms to get started.

Transcription of Academic Research, Market Research, Focus Groups, and Qualitative Research

posted Jul 25, 2014, 10:45 AM by Amy LaBranch   [ updated Jun 17, 2015, 10:51 AM ]

Whether you are an academic researcher or someone doing market research, it's crucial to get your interviews and focus groups into written format so you can start your analysis. Note taking during interviews can be distracting and can even introduce nonverbal bias. Notes can also be incomplete and might be deciphered later out of context, when what your research needs is accurate documentation. And even if you digitally record your field interviews or focus groups, typing up hours and hours of content can be a time-consuming and daunting task.

In some cases, you may need a verbatim or exact word-for-word transcript to capture things like mispronunciation, improper grammar, vocalizations, hesitations, or if a video is available, physical gesticulations and fidgeting, that could be important in a social or behavioral context or for conversational analysis.

If you are compiling a qualitative database or using software such as NVivo in order to code or tag your content for easier analysis, transcripts that are already timestamped, coded for speaker identification, or formatted with headings are easier to import, access, and search than fast forwarding and rewinding through audio files. Also, for large projects with multiple interviews, using a standardized format or protocol for the entire collection creates an orderly and consistent database for high-quality analysis that can also be accessed in collaboration with a team of researchers.

If you must deliver examples of first-hand experience or anecdotal evidence, a transcript makes it easy to simply copy-and-paste exact quotes or even whole blocks of text directly into your report.

Finally, if you're writing a master's thesis, doctoral dissertation, white paper, or book, and you find it easier to dictate or hand-write your research, you can leave the typing to me.  Additionally, if you speak English as a second language, and you want an extra set of eyes to review your spelling and grammar, I can proofread your work for a coherent and precise paper.

If you need transcription services for your academic research, marketing research, focus groups, or qualitative research; if you need a verbatim transcript or conversation analysis; if you need a specific template or format to import into a qualitative database; or if you simply need proofreading, please get in touch!

More Listeners with Podcast Transcription

posted Jul 11, 2014, 2:42 PM by Amy LaBranch   [ updated Jun 17, 2015, 10:51 AM ]

Why should you have your podcasts transcribed? Because a typewritten document of any media file creates more accessibility. Since audio and video files are not as easily indexed by Google, a transcript will give your podcast far more online exposure so that prospective listeners can search for key words. Improved search-engine ranking will increase traffic to your Website or blog and ultimately increase your subscribers.

A transcript will give you instant content for any publicity or marketing efforts you may have. Scanning through a text document can make it easier for you to edit your podcast in post-production or pull quotes and sound bites for making promos, rather than fast-forwarding through an audio file.

Sometimes, we just don't have time to listen to every podcast in our queue. With a transcript, your followers can scan it quickly and get the important points, or they can copy-and-paste bits of the text through social media. A transcript will literally make your podcast more accessible for the hearing-impaired audience. And for people who speak English as a second language, a transcript will help them follow your podcast more easily.

Save time by having a professional transcribe your podcasts much more quickly so that you can focus on producing more content to air. Contact me today!

Better Journalism with Transcription

posted Jul 9, 2014, 10:43 AM by Amy LaBranch   [ updated Jun 17, 2015, 10:51 AM ]

It is so much easier to record an interview and have it transcribed than to attempt to be fully engaged in a conversation with someone while you're taking notes. Not only that, but when you get a verbatim transcript of the interview, you can accurately report quotes or use the transcript to pull relevant sound bites for your broadcast or podcast.

Whether you're using a Q&A-style format, writing an article for print media that will virtually write itself, or trying to get a juicy inside scoop through an easygoing, natural conversation, you can leave all the typing up to me so you can concentrate on asking the right questions and setting the stage for a great piece of journalism.

If you'd like to have an interview transcribed, let me know.

Transcription: Green

posted Jun 27, 2014, 3:41 PM by Amy LaBranch   [ updated Jun 17, 2015, 10:51 AM ]

One of my ultimate goals when I started out in transcription was to operate an environmentally friendly and sustainable business.

Because I work from my home office, and therefore avoid the daily commute, I save on energy cost and reduce fuel consumption and emissions, thereby lowering my overall carbon footprint.

I also conduct all of my business virtually, which means I have a completely paperless office. All communication is over telephone or email, and audio files are simply uploaded securely or shared via Dropbox.

Transcript documents are proofread electronically and returned via email along with paperless invoices, which saves on the cost of envelopes and postage, and saves some trees. Payment can easily be made directly to my bank account or over PayPal.

Every little bit helps to save our planet, and I'm happy to offer a green service. :)

Contact me for all your transcription needs.

