Zoom Transcription Guide

How to Transcribe a Zoom Call for Free
No Business Plan Required

Zoom's built-in transcription requires the Business plan, at roughly 199 dollars per user per year. Skip it. Record locally, drag the .m4a into a free Windows app, and get the full transcript in minutes.

Five Steps from Zoom Call to Transcript

Works with any Zoom tier, including the free plan.

1. Enable Local Recording in Zoom

Open Zoom, go to Settings, Recording, and turn on Local Recording. Pick a folder you can find later (the default is Documents, Zoom). Local recording is available to free Zoom users as long as you are the host of the meeting. If you are not the host, ask the host to either record themselves or grant you recording permission.

2. Press Record during the meeting

While in the call, click Record in the Zoom toolbar (or hit Alt+R) and pick Record on This Computer if you have both local and cloud options. Zoom will display a small Recording indicator in the corner so everyone knows. When the call ends, leave the meeting normally. Zoom takes a minute or two to convert the recording into final files.

3. Find audio_only.m4a in your Zoom folder

Open File Explorer and navigate to Documents, Zoom. Inside you will see a dated subfolder for each recording. Open the most recent one. You will find audio_only.m4a (the call audio, smaller file) and zoom_0.mp4 (the screen-share video). For transcription you only need the audio file. The video file works too if you prefer; StarWhisper extracts audio from .mp4 automatically.
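If you record often, finding the newest file can be scripted. The following is a minimal sketch, assuming the default Documents, Zoom location and the audio_only.m4a file name described in this step; the function name latest_zoom_audio is made up for illustration and is not part of Zoom or StarWhisper.

```python
from pathlib import Path
from typing import Optional


def latest_zoom_audio(zoom_root: Path) -> Optional[Path]:
    """Return the most recently modified audio_only.m4a one level under zoom_root."""
    if not zoom_root.is_dir():
        return None
    # Each recording is assumed to live in its own subfolder of zoom_root.
    candidates = list(zoom_root.glob("*/audio_only.m4a"))
    if not candidates:
        return None
    # Pick the file with the newest modification time.
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Assumed default location; change this if you picked another folder in Zoom.
    root = Path.home() / "Documents" / "Zoom"
    found = latest_zoom_audio(root)
    print(found if found else f"No audio_only.m4a found under {root}")
```

Run it after a call and it prints the path of the file to drag into StarWhisper, or a short message if nothing was found.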

4. Drag the file into StarWhisper

Install StarWhisper if you have not yet. Open it and drag audio_only.m4a from File Explorer onto the window. The app detects the language and starts transcribing. A one-hour call typically processes in 5 to 15 minutes on a modern CPU, or 1 to 3 minutes on an NVIDIA GPU. Progress shows in real time. You can leave it running in the background.

5. Review and export the transcript

The transcript appears in the StarWhisper window. Read through it, copy the full text to the clipboard, or save it to a .txt file. Paste into Notion, OneNote, Google Docs, or your meeting notes system. The transcript is plain text without speaker labels, which keeps the file portable and easy to clean up. Total cost: zero. No upload, no Zoom Business upgrade, no per-minute fees.
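Because the export is plain text, a few lines of Python can do a first cleanup pass. This is a rough sketch, not a StarWhisper feature: it splits the continuous transcript into sentences and keeps the ones containing a cue phrase. The CUES list and both function names are hypothetical, and simple keyword matching will miss things, so treat the output as a starting point for your review.

```python
import re
from typing import List, Sequence

# Hypothetical cue phrases; tune these to how your team actually talks.
CUES = ("action item", "follow up", "deadline", "i will", "we will")


def split_sentences(text: str) -> List[str]:
    """Split a continuous transcript on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def find_action_items(text: str, cues: Sequence[str] = CUES) -> List[str]:
    """Return the sentences that contain any cue phrase (case-insensitive)."""
    return [
        sentence
        for sentence in split_sentences(text)
        if any(cue in sentence.lower() for cue in cues)
    ]
```

To use it, read the saved .txt file into a string and pass it to find_action_items, then paste the result at the top of your meeting notes.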

Why This Beats Zoom Business Transcription

A free workflow that covers the transcription you would otherwise buy the 199 dollar plan for.

Zero ongoing cost

Zoom Business is around 199 dollars per user per year. This workflow stays on the free Zoom tier and uses StarWhisper's free local transcription. Annual savings stack up fast for solo users or small teams.

Audio stays on your machine

Both the Zoom local recording and the StarWhisper transcription live on your hard drive. No upload to Zoom's servers for transcription, no third-party meeting bot joining the call.

Works for any meeting platform

Same workflow handles Microsoft Teams, Google Meet, Webex, Slack Huddles, or any other call recorded locally. The audio file is the only thing that matters.

96 languages for international calls

Working with a team across Berlin, Tokyo, and Sao Paulo? Whisper auto-detects the language of each recording. See the multi-language support details.

Real-time dictation INTO Zoom chat

Same install also lets you dictate by voice directly into Zoom chat (or any other text field). See the voice-to-text in Zoom guide for the press-and-hold workflow.

GPU acceleration

NVIDIA GPU owners process a one-hour meeting in 1 to 3 minutes via CUDA. Without a GPU you still get usable speed on modern CPUs. See the GPU details.

The Pricing Problem with Zoom's Built-In Transcription

Zoom does include automatic meeting transcription, but only on the Business plan and above. As of writing, the Business plan is roughly 199 dollars per user per year (149 to 199 depending on contract length and seat count). For a single freelancer or a small team where most calls do not need real-time transcription, that is a lot of money for a feature you would use occasionally.

The official Zoom transcription also has limitations that surprise people the first time. The audio is processed in Zoom's cloud, so it leaves your network. Recordings are stored on Zoom's servers, subject to Zoom's retention policies. And it is locked to Zoom calls specifically: it cannot transcribe a recorded Microsoft Teams call or a podcast interview you did somewhere else.

This guide describes the alternative most people end up using. Keep the free Zoom tier, use built-in Local Recording (which is free), and process the resulting file with StarWhisper on your Windows PC. Total cost: zero. Total annual savings vs Zoom Business: 199 dollars per seat.
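The savings arithmetic is simple enough to check for your own team size. A minimal sketch, assuming the per-seat figures quoted in this section (199 dollars, or 149 at the low end of the range); the function name annual_savings is illustrative.

```python
def annual_savings(seats: int, price_per_seat: float = 199.0) -> float:
    """Yearly cost avoided by not buying `seats` Zoom Business licenses."""
    if seats < 0:
        raise ValueError("seats must be zero or more")
    return seats * price_per_seat


if __name__ == "__main__":
    # Example: a five-person team at both ends of the quoted price range.
    print(annual_savings(5))         # 995.0
    print(annual_savings(5, 149.0))  # 745.0
```

Actual Zoom pricing changes over time, so plug in the current per-seat price before relying on the result.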

What Zoom Local Recording Actually Captures

When you record a Zoom meeting locally as host, Zoom captures the mixed audio of every participant (your mic plus everyone else's mic streams). That gets saved as audio_only.m4a. It also records the video grid and screen share to zoom_0.mp4. For transcription you only need the .m4a.
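If you only have the .mp4 and want a standalone audio file, the audio track can be copied out with ffmpeg. This sketch is an illustration, not a description of how StarWhisper extracts audio internally. It assumes ffmpeg is installed separately and on your PATH, and that the video's audio track is AAC so it can be copied into an .m4a without re-encoding. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_cmd(video: Path, audio_out: Path) -> List[str]:
    """Build an ffmpeg command that copies the audio track out of a video file."""
    return [
        "ffmpeg",
        "-i", str(video),   # input video file
        "-vn",              # drop the video stream
        "-acodec", "copy",  # copy the audio as-is, no re-encoding
        str(audio_out),
    ]


def extract_audio(video: Path, audio_out: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_extract_cmd(video, audio_out), check=True)
```

Calling extract_audio(Path("zoom_0.mp4"), Path("call_audio.m4a")) from inside the recording folder would write the audio next to the video. In the normal workflow you do not need this step, since Zoom already saves audio_only.m4a.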

What you do NOT get by default is per-speaker audio. The recording is a single mixed track, so StarWhisper produces a continuous transcript without speaker labels. For most action-item-and-decisions use cases this is fine: the transcript is quick to read, and if you remember the discussion you can reconstruct who said what. If speaker diarization is critical for your workflow, you would need a paid cloud transcription service or a more advanced local setup.

Zoom's video file also captures anything shown during a screen share, but the transcript covers only spoken audio, not text on the screen. If someone shared a document with critical information, save that document separately.

Step-by-Step Zoom Recording Settings

One-time setup in Zoom that makes the rest of this workflow smooth:

  1. Open Zoom (desktop client). Click your profile picture in the top right, then Settings.
  2. Click Recording in the left sidebar.
  3. Set the Local Recording location to a folder you can find easily. Default is fine for most people.
  4. Check "Record a separate audio file for each participant" if you want better source separation later. This produces individual .m4a files per speaker, which can help if you need to do manual speaker labeling.
  5. Optionally check "Add a timestamp to the recording". Useful if you cross-reference transcripts with calendar entries later.
  6. Check "Optimize for 3rd party video editor" only if you also use the .mp4 for video editing. Otherwise leave off to keep file sizes smaller.

That is it. Future meetings will record locally to that folder. You only have to set this up once per machine.
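Since every recording lands in its own dated subfolder, matching recordings to calendar entries can be scripted. The sketch below assumes the folder name begins with a stamp in the form YYYY-MM-DD HH.MM.SS followed by the meeting topic. That naming pattern is an assumption, so check it against your own Zoom folder first. The function name recording_start is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed layout of the folder name prefix, e.g. "2024-03-05 14.30.00 Weekly Standup".
STAMP_FORMAT = "%Y-%m-%d %H.%M.%S"
STAMP_LENGTH = 19  # number of characters in "YYYY-MM-DD HH.MM.SS"


def recording_start(folder_name: str) -> Optional[datetime]:
    """Parse the leading date stamp from a recording folder name, if present."""
    try:
        return datetime.strptime(folder_name[:STAMP_LENGTH], STAMP_FORMAT)
    except ValueError:
        # The name does not start with the assumed stamp.
        return None
```

A folder that does not match the assumed pattern returns None, so the function is safe to run over everything in the Zoom folder.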

Speed and Hardware Expectations

Transcribing a recorded meeting is faster than real-time, sometimes much faster. The exact speed depends on your hardware and which Whisper model you use. Rough numbers for the default medium model on representative machines:

Hardware                          | 30-min meeting | 60-min meeting | 2-hr meeting
Modern laptop CPU (i7 or Ryzen 7) | 3 to 6 min     | 6 to 12 min    | 12 to 25 min
NVIDIA RTX 3060 (CUDA)            | 30 to 60 sec   | 1 to 2 min     | 2 to 5 min
NVIDIA RTX 4090 (CUDA)            | 10 to 20 sec   | 20 to 40 sec   | 1 to 2 min
Older CPU (5+ years)              | 10 to 20 min   | 25 to 45 min   | 50 to 90 min

For most office laptops bought in the last three years, expect a one-hour meeting to transcribe in 6 to 12 minutes. That is faster than re-listening to the meeting at 2x speed, which would take 30 minutes. If you do a lot of meeting transcription and have an NVIDIA GPU sitting around, enabling the CUDA pack cuts the time by roughly 6x on an RTX 3060 and about 18x on an RTX 4090, going by the one-hour figures.
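For meeting lengths that are not in the table, you can scale the one-hour figures linearly. This is a rough sketch: the ranges are copied from the 60-minute column above, the hardware keys are made-up labels, and linear scaling is an approximation (the table's own 2-hour figures are slightly off a straight line).

```python
from typing import Dict, Tuple

# (low, high) processing minutes for a 60-minute recording, from the table above.
MINUTES_PER_HOUR_OF_AUDIO: Dict[str, Tuple[float, float]] = {
    "modern_cpu": (6.0, 12.0),
    "rtx_3060": (1.0, 2.0),
    "rtx_4090": (20 / 60, 40 / 60),  # 20 to 40 seconds, expressed in minutes
    "older_cpu": (25.0, 45.0),
}


def estimate_minutes(meeting_minutes: float, hardware: str) -> Tuple[float, float]:
    """Scale the one-hour range linearly to the given meeting length."""
    low, high = MINUTES_PER_HOUR_OF_AUDIO[hardware]
    scale = meeting_minutes / 60.0
    return (low * scale, high * scale)


if __name__ == "__main__":
    # A 90-minute meeting on a modern laptop CPU.
    print(estimate_minutes(90, "modern_cpu"))  # (9.0, 18.0)
```

Treat the result as a ballpark. Actual speed depends on the Whisper model you pick and on what else your machine is doing.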

Real-Time Dictation INTO Zoom Chat

Separate use case worth mentioning. Beyond transcribing recorded meetings, StarWhisper's main feature is press-and-hold voice dictation into any text field. During a Zoom call you can use it to type into Zoom chat by voice without breaking eye contact with your camera.

The workflow: click into the Zoom chat input, hold the StarWhisper hotkey (default is Right Alt), speak the message, release. Your speech becomes typed text. This is useful for sending detailed messages to attendees while screen sharing, capturing fast notes during a call without alt-tabbing to a notes app, or running an open chat thread alongside the verbal conversation.

For the full real-time-dictation workflow with Zoom, see the dedicated voice-to-text in Zoom guide. The transcription engine is the same; only the trigger is different.

Privacy: What Stays Local and What Does Not

This workflow keeps the meeting audio and transcript on your device. Zoom local recording saves to your hard drive. StarWhisper Local Mode processes it locally. The resulting transcript is a .txt file on your PC. None of this leaves your network unless you choose to share it (paste into a cloud doc, email it, upload it).

Compare to alternatives. Otter.ai joins your meeting as a bot and uploads audio to Otter's servers. Notta does the same. Even Zoom's own transcription processes audio in Zoom's cloud. For confidential calls (M&A discussions, performance reviews, customer interviews under NDA), the local-only workflow is a meaningful improvement. The privacy and offline architecture page covers the full data-flow analysis.

If you are in a regulated industry (healthcare, legal, financial services) the same architecture supports your compliance posture. The HIPAA compliance FAQ walks through what local processing means for protected health information specifically.

For Sales, HR, and Customer Calls Specifically

Sales reps doing discovery calls, recruiters doing screens, and account managers doing renewals all benefit from transcripts of recorded calls. The workflow here is the same: record locally, transcribe afterward. If you want a deeper look at how sales teams and HR functions are using local transcription, see the role-specific pages. The voice-to-text for HR managers guide covers candidate screening workflows. The voice-to-text for content creators page covers podcast and interview workflows that overlap heavily with sales call transcription.

Frequently Asked Questions

Do I need Zoom Pro or Zoom Business to transcribe my calls?
No. Zoom's built-in transcription is only included with the Business plan, which is around 199 dollars per year per user. With this workflow you keep the free Zoom plan (or any paid tier) and use local recording, which is available to free users as long as you are the host. Then StarWhisper transcribes the resulting audio file offline on your PC. Total cost is zero unless you outgrow the StarWhisper free tier.
Can I transcribe a Zoom meeting in real time while it is happening?
Yes for dictation, with a caveat for full-meeting transcription. StarWhisper dictates into any active text field, so you can use it to type into Zoom chat by voice during a call. For full multi-speaker meeting transcription in real time you need to route Zoom audio (everyone's voices) to StarWhisper, which requires a virtual audio cable setup. The simpler workflow most people use is local recording during the call, then transcription right after.
What about Zoom cloud recordings (stored on Zoom's servers)?
Cloud recordings work the same way. Sign into your Zoom account on the web, go to Recordings, find the meeting, and download the audio-only file (Zoom offers M4A or MP4 download). Then drag the downloaded file into StarWhisper. Cloud recording is a paid Zoom feature, but if you already have it, this gives you a free way to transcribe without also upgrading to Zoom Business for transcription.
Does this work for Microsoft Teams and Google Meet too?
Yes. The pattern is the same: record the meeting and get the file onto your PC (Teams has a built-in Record button and saves the recording to OneDrive or SharePoint; Google Meet recordings arrive as MP4 in your Drive), then drag the audio or video file into StarWhisper. The app extracts audio from video files automatically. There are also dedicated guides for using voice-to-text inside Zoom, Teams, and Word elsewhere on the site. The transcription engine does not care which platform the meeting was on.
Can I get speaker labels (who said what) in the transcript?
Not currently. StarWhisper does not include automatic speaker diarization, so the transcript comes back as a continuous block of text. For most use cases (action items, decisions, getting the gist) this is fine and easy to clean up. If speaker labels are critical, the workaround is to add timestamps in StarWhisper and then manually annotate during a quick review pass. Cloud services like Otter and Notta do speaker labels, but at the cost of uploading meeting audio and paying a subscription.
Is the audio uploaded anywhere when I use this method?
No. The Zoom local recording is saved to your hard drive. StarWhisper runs in Local Mode by default and processes the audio entirely on your CPU or GPU using a Whisper model stored on your machine. Nothing is uploaded to OpenAI, to StarWhisper, or to any third party. This matters for confidential calls (client meetings, performance reviews, deal discussions). You can confirm by disconnecting your network during transcription; the app keeps working.
What file format does Zoom save recordings in?
Zoom local recordings produce two files by default. A video file (zoom_0.mp4 or similar) with screen share and video, and an audio-only file (audio_only.m4a) which is just the call audio. For transcription you only need the audio file. Drag audio_only.m4a into StarWhisper. The .mp4 also works (StarWhisper extracts audio automatically) but the .m4a is smaller and processes slightly faster.
Can I dictate INTO Zoom chat using StarWhisper during a call?
Yes. StarWhisper's main feature is press-and-hold dictation into any active text field on Windows. Click into the Zoom chat panel, hold the hotkey (default Right Alt), speak, release. Your speech becomes typed text in the chat input. This is useful for taking notes during a meeting without breaking eye contact or for sending detailed chat messages while screen sharing. Works in 96 languages with auto-detection.

Stop Paying for Zoom Business Just for Transcripts

Free Windows download. Drag any meeting recording in, get a full transcript in minutes. No upload.

Download StarWhisper for Windows