Zoom's built-in transcription needs the Business plan at 199 dollars per user per year. Skip it. Record locally, drag the .m4a into a free Windows app, and get the full transcript in minutes.
Works with any Zoom tier, including the free plan.
Open Zoom, go to Settings, Recording, and turn on Local Recording. Pick a folder you can find later (the default is Documents\Zoom). Local recording is available to free Zoom users as long as you are the host of the meeting. If you are not the host, ask the host to either record themselves or grant you recording permission.
While in the call, click Record in the Zoom toolbar (or hit Alt+R) and pick Record on This Computer if you have both local and cloud options. Zoom will display a small Recording indicator in the corner so everyone knows. When the call ends, leave the meeting normally. Zoom takes a minute or two to convert the recording into final files.
Open File Explorer and navigate to Documents\Zoom. Inside you will see a dated subfolder for each recording. Open the most recent one. You will find audio_only.m4a (the call audio, the smaller file) and zoom_0.mp4 (the meeting video, including any screen share). For transcription you only need the audio file. The video file works too if you prefer; StarWhisper extracts audio from .mp4 automatically.
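If you record often, hunting for the newest folder by hand gets tedious. The Python sketch below picks the most recently modified subfolder of the Zoom recording directory that contains a recording, and returns its audio file, falling back to the video. It assumes the default location Documents\Zoom under your user profile (a Documents folder redirected to OneDrive will live elsewhere) and the audio_only.m4a and zoom_0.mp4 filenames described above; the function name newest_zoom_recording is hypothetical and not part of Zoom or StarWhisper.

```python
from pathlib import Path
from typing import Optional

# Filenames as described in this guide; other Zoom versions may name files differently (assumption).
AUDIO_NAME = "audio_only.m4a"
VIDEO_NAME = "zoom_0.mp4"


def newest_zoom_recording(zoom_dir: Path) -> Optional[Path]:
    """Return the file to drag into StarWhisper from the newest recording folder.

    Prefers audio_only.m4a and falls back to zoom_0.mp4. Returns None if
    zoom_dir is missing or no subfolder contains either file.
    """
    if not zoom_dir.is_dir():
        return None
    # Each local recording lives in its own subfolder; sort newest modification time first.
    folders = sorted(
        (p for p in zoom_dir.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for folder in folders:
        # Check the audio file first because it is smaller and all transcription needs.
        for name in (AUDIO_NAME, VIDEO_NAME):
            candidate = folder / name
            if candidate.is_file():
                return candidate
    return None


if __name__ == "__main__":
    # Assumed default local-recording location on Windows: Documents\Zoom in your profile.
    found = newest_zoom_recording(Path.home() / "Documents" / "Zoom")
    print(found if found else "No Zoom recording found")
```

Run it after the call and drag the printed file onto the StarWhisper window. Sorting by modification time rather than by folder name avoids depending on Zoom's folder-naming scheme.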
Install StarWhisper if you have not yet. Open it and drag audio_only.m4a from File Explorer onto the window. The app detects the language and starts transcribing. A one-hour call typically processes in 5 to 15 minutes on a modern CPU, or 1 to 3 minutes on an NVIDIA GPU. Progress shows in real time. You can leave it running in the background.
The transcript appears in the StarWhisper window. Read through it, copy the full text to the clipboard, or save it to a .txt file. Paste into Notion, OneNote, Google Docs, or your meeting notes system. The transcript is plain text without speaker labels, which keeps the file portable and easy to clean up. Total cost: zero. No upload, no Zoom Business upgrade, no per-minute fees.
A free workflow that replaces the transcription feature of the 199-dollar plan.
Zoom Business is around 199 dollars per user per year. This workflow stays on the free Zoom tier and uses StarWhisper's free local transcription. Annual savings stack up fast for solo users or small teams.
Both the Zoom local recording and the StarWhisper transcription live on your hard drive. No upload to Zoom's servers for transcription, no third-party meeting bot joining the call.
Same workflow handles Microsoft Teams, Google Meet, Webex, Slack Huddles, or any other call recorded locally. The audio file is the only thing that matters.
Working with a team across Berlin, Tokyo, and São Paulo? Whisper auto-detects the language of each recording. See the multi-language support page for details.
Same install also lets you dictate by voice directly into Zoom chat (or any other text field). See the voice-to-text in Zoom guide for the press-and-hold workflow.
NVIDIA GPU owners process a one-hour meeting in 1 to 3 minutes via CUDA. Without a GPU you still get usable speed on modern CPUs. See the GPU page for details.
Zoom does include automatic meeting transcription, but only on the Business plan and above. As of writing, the Business plan costs roughly 149 to 199 dollars per user per year, depending on contract length and seat count. For a single freelancer or a small team where most calls do not need transcription, that is a lot of money for a feature you would use occasionally.
The official Zoom transcription also has limitations that surprise people the first time. The audio is processed in Zoom's cloud, so it leaves your network. Recordings are stored on Zoom's servers, subject to Zoom's retention policies. And it is locked to Zoom calls specifically: it cannot transcribe a recorded Microsoft Teams call or a podcast interview you did somewhere else.
This guide describes the alternative most people end up using. Keep the free Zoom tier, use built-in Local Recording (which is free), and process the resulting file with StarWhisper on your Windows PC. Total cost: zero. Total annual savings vs Zoom Business: up to 199 dollars per seat.
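The savings arithmetic is simple enough to check for your own team size. This sketch multiplies the seat count by the 149 to 199 dollar per-seat range quoted in this guide. The prices are an assumption that will drift over time, so check Zoom's current pricing before relying on the numbers; annual_savings is a hypothetical helper name.

```python
# Zoom Business price per user per year as quoted in this guide (assumption; check current pricing).
BUSINESS_PRICE_RANGE_USD = (149.0, 199.0)


def annual_savings(
    seats: int, price_range: tuple[float, float] = BUSINESS_PRICE_RANGE_USD
) -> tuple[float, float]:
    """Return the (low, high) yearly cost avoided by skipping Zoom Business."""
    if seats < 0:
        raise ValueError("seats must be non-negative")
    low, high = price_range
    # Free Zoom tier plus free local transcription costs 0, so the saving is the full price avoided.
    return seats * low, seats * high


if __name__ == "__main__":
    for team in (1, 5, 20):
        low, high = annual_savings(team)
        print(f"{team} seat(s): {low:,.0f} to {high:,.0f} dollars per year")
```

For a five-person team, annual_savings(5) returns (745.0, 995.0).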
When you record a Zoom meeting locally as host, Zoom captures the mixed audio of every participant (your mic plus everyone else's mic streams). That gets saved as audio_only.m4a. It also records the video grid and screen share to zoom_0.mp4. For transcription you only need the .m4a.
What you do NOT get out of the box is per-speaker audio. The recording is a single mixed track. This means StarWhisper will produce a continuous transcript without speaker labels. For most action-item-and-decisions use cases this is fine: it is fast to read through and reconstruct who said what when you remember the discussion. If speaker diarization is critical for your workflow, you would need a paid cloud transcription service or a more advanced local setup. Honest disclosure.
Zoom also records your screen share, but only as video. The transcript covers spoken audio only, not text shown on screen. If someone shared a document with critical information, save that document separately.
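One-time setup in Zoom that makes the rest of this workflow smooth: go to Settings, Recording, turn on Local Recording, and pick a storage folder you can find later (the default is Documents\Zoom).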
That is it. Future meetings will record locally to that folder. You only have to set this up once per machine.
Transcribing a recorded meeting is faster than real-time, sometimes much faster. The exact speed depends on your hardware and which Whisper model you use. Rough numbers for the default medium model on representative machines:
| Hardware | 30-min meeting | 60-min meeting | 2-hr meeting |
|---|---|---|---|
| Modern laptop CPU (i7 or Ryzen 7) | 3 to 6 min | 6 to 12 min | 12 to 25 min |
| NVIDIA RTX 3060 (CUDA) | 30 to 60 sec | 1 to 2 min | 2 to 5 min |
| NVIDIA RTX 4090 (CUDA) | 10 to 20 sec | 20 to 40 sec | 1 to 2 min |
| Older CPU (5+ years) | 10 to 20 min | 25 to 45 min | 50 to 90 min |
For most office laptops bought in the last three years, expect a one-hour meeting to transcribe in 6 to 12 minutes. That is faster than re-listening to the meeting at 2x speed, which takes 30 minutes. If you do a lot of meeting transcription and have an NVIDIA GPU sitting around, enabling the CUDA pack cuts the time by roughly an order of magnitude.
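To estimate times for meeting lengths not in the table, you can scale the 60-minute column linearly, which the 30-minute and 2-hour columns roughly support. The sketch below does that and also checks the comparison with re-listening at 2x speed. The hardware keys are hypothetical labels for the table rows, not StarWhisper settings, and real times will vary with your machine and model choice.

```python
# Minutes to transcribe a 60-minute meeting with the medium model, from the table above.
# Keys are illustrative labels for the table rows, not StarWhisper settings.
MINUTES_PER_HOUR_OF_AUDIO = {
    "laptop_cpu": (6.0, 12.0),       # modern i7 / Ryzen 7
    "rtx_3060": (1.0, 2.0),          # CUDA
    "rtx_4090": (20 / 60, 40 / 60),  # 20 to 40 seconds
    "older_cpu": (25.0, 45.0),       # 5+ years old
}


def estimate_minutes(meeting_minutes: float, hardware: str) -> tuple[float, float]:
    """Return a (low, high) estimate, assuming time scales linearly with meeting length."""
    low, high = MINUTES_PER_HOUR_OF_AUDIO[hardware]
    scale = meeting_minutes / 60
    return low * scale, high * scale


def beats_relistening(meeting_minutes: float, hardware: str, playback_speed: float = 2.0) -> bool:
    """True if even the slow end of the estimate is quicker than replaying the audio."""
    _, high = estimate_minutes(meeting_minutes, hardware)
    return high < meeting_minutes / playback_speed


if __name__ == "__main__":
    for hw in MINUTES_PER_HOUR_OF_AUDIO:
        low, high = estimate_minutes(60, hw)
        print(f"{hw}: {low:.1f} to {high:.1f} min, beats 2x replay: {beats_relistening(60, hw)}")
```

For example, estimate_minutes(90, "laptop_cpu") gives (9.0, 18.0). Note that beats_relistening(60, "older_cpu") returns False, since up to 45 minutes exceeds the 30 minutes of 2x playback; the claim above is about recent laptops.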
Separate use case worth mentioning. Beyond transcribing recorded meetings, StarWhisper's main feature is press-and-hold voice dictation into any text field. During a Zoom call you can use it to type into Zoom chat by voice without breaking eye contact with your camera.
The workflow: click into the Zoom chat input, hold the StarWhisper hotkey (default is Right Alt), speak the message, release. Your speech becomes typed text. This is useful for sending detailed messages to attendees while screen sharing, capturing fast notes during a call without alt-tabbing to a notes app, or running an open chat thread alongside the verbal conversation.
For the full real-time-dictation workflow with Zoom, see the dedicated voice-to-text in Zoom guide. The transcription engine is the same; only the trigger is different.
This workflow keeps the meeting audio and transcript on your device. Zoom local recording saves to your hard drive. StarWhisper Local Mode processes it locally. The resulting transcript is a .txt file on your PC. None of this leaves your network unless you choose to share it (paste into a cloud doc, email it, upload it).
Compare that with the alternatives. Otter.ai joins your meeting as a bot and uploads audio to Otter's servers. Notta does the same. Even Zoom's own transcription processes audio in Zoom's cloud. For confidential calls (M&A discussions, performance reviews, customer interviews under NDA), the local-only workflow is a meaningful improvement. The privacy and offline architecture page covers the full data-flow analysis.
If you are in a regulated industry (healthcare, legal, financial services), the same architecture supports your compliance posture. The HIPAA compliance FAQ walks through what local processing means for protected health information specifically.
Sales reps doing discovery calls, recruiters doing screens, and account managers doing renewals all benefit from transcripts of recorded calls. The workflow here is the same: record locally, transcribe afterward. If you want a deeper look at how sales teams and HR functions are using local transcription, see the role-specific pages. The voice-to-text for HR managers guide covers candidate screening workflows. The voice-to-text for content creators page covers podcast and interview workflows that overlap heavily with sales call transcription.