Google's official Meet transcription is locked behind Workspace Business Standard at about 14 dollars per user per month. This guide shows the free workaround: a Meet recording, a free Windows app, and a transcript in minutes.
Works with or without a Workspace subscription.
If your account is on Workspace Business Standard or higher, recording is built in: click the three-dot menu inside Meet, then Record meeting. If you are on a personal Google account or a Workspace tier without recording, run a free local screen recorder before joining the call. OBS Studio is the most common choice. The built-in Xbox Game Bar (Win+G) and ShareX both work too. Whichever you pick, make sure "Capture system audio" is enabled, or you will record video without sound.
If Google Meet recorded the call for you, it saves the recording as an MP4 in the Meet Recordings folder of My Drive. Open Drive, find the file, and download it. If you used a screen recorder, the file is already on your disk wherever you told the recorder to save it. Either way you should end up with a single .mp4 file. File sizes typically run 50 to 300 MB per hour depending on resolution and screen-share content.
Download StarWhisper from the homepage. The installer is around 200 MB because it bundles the Whisper model. There is no account signup, no credit card, and no cloud component required. After installing, launch the app once and complete the 30-second setup (pick your default microphone, choose a hotkey, accept defaults for everything else). You are now ready to transcribe.
Open File Explorer to wherever you saved the recording. Drop the .mp4 onto the StarWhisper window. The app extracts the audio track automatically, detects the spoken language, and starts transcribing. A 60-minute call typically takes 5 to 15 minutes on a recent laptop CPU, or 1 to 3 minutes on an NVIDIA GPU using the CUDA acceleration pack. Progress shows in real time. You can minimize the window and keep working.
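StarWhisper does the audio extraction for you, so nothing here is required. For readers curious what that step involves, or who want to do it by hand, the sketch below builds an equivalent ffmpeg command. It assumes ffmpeg is installed and on your PATH. The function names are illustrative and this is not StarWhisper's own code. The 16 kHz mono WAV target matches the audio format Whisper models work on.

```python
import subprocess
from pathlib import Path


def build_extract_command(video: Path, audio: Path) -> list[str]:
    """Return an ffmpeg command that pulls the audio track out of a video file."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(video),
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to a single (mono) channel
        "-ar", "16000",  # resample to 16 kHz
        str(audio),
    ]


def extract_audio(video: Path) -> Path:
    """Run ffmpeg and return the path of the WAV written next to the video."""
    audio = video.with_suffix(".wav")
    subprocess.run(build_extract_command(video, audio), check=True)
    return audio
```

Calling `extract_audio(Path("meeting.mp4"))` would write `meeting.wav` alongside the recording, provided ffmpeg is available.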
When the run finishes, the transcript appears in the StarWhisper window. Read it on screen, copy the full text to clipboard, or save it to a .txt file. Paste into Google Docs, Notion, Confluence, OneNote, or your team's notes system. The transcript is plain text without speaker labels, which keeps the file portable. Total cost: zero. No Workspace upgrade, no Otter subscription, no per-minute transcription fee.
A free workflow that does what the 14 dollar per user plan does.
Workspace Business Standard is roughly 14 dollars per user per month, or 168 per year per seat. For a five-person team that is 840 a year just to enable transcription. This workflow stays free for occasional transcribers, or 10 dollars per month per person on Pro if you transcribe long meetings every day.
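The arithmetic above can be checked, or rerun for your own team size, with a few lines of Python. The prices are the approximate figures quoted in this guide, not official price lists, and the function name is illustrative.

```python
# Approximate prices quoted in this guide (USD per seat per month).
BUSINESS_STANDARD_PER_SEAT_MONTHLY = 14
STARWHISPER_PRO_PER_SEAT_MONTHLY = 10


def annual_cost(per_seat_monthly: float, seats: int) -> float:
    """Yearly cost in USD for `seats` users at a flat monthly per-seat price."""
    return per_seat_monthly * 12 * seats


if __name__ == "__main__":
    seats = 5
    workspace = annual_cost(BUSINESS_STANDARD_PER_SEAT_MONTHLY, seats)
    pro = annual_cost(STARWHISPER_PRO_PER_SEAT_MONTHLY, seats)
    print(f"Workspace Business Standard, {seats} seats: ${workspace:.0f} per year")
    print(f"StarWhisper Pro, {seats} seats: ${pro:.0f} per year")
    print("StarWhisper free tier: $0 per year")
```

Change `seats` to match your team. The comparison only holds while the per-seat prices above remain accurate.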
The MP4 sits on your hard drive. StarWhisper runs the transcription locally on your CPU or GPU. Nothing is uploaded to Google, OpenAI, or any third party during the transcription itself. See the privacy and offline architecture page for details.
Same workflow handles Zoom, Microsoft Teams, Webex, Slack Huddles, and anything else you can record. The transcription engine treats every recording as just an audio file. See the related guides on transcribing Zoom calls for the platform-specific recording steps.
Distributed teams running Meet calls across English, Spanish, German, French, Japanese, and Mandarin all benefit. Whisper auto-detects the spoken language. See the multi-language support page for details.
Same install also lets you dictate into Google Docs, Chat, or any Windows text field by holding a hotkey. See the voice-to-text in Google Docs guide for the press-and-hold workflow.
NVIDIA GPU owners process a one-hour call in 1 to 3 minutes via CUDA 11 or 12. Without a GPU, modern CPUs handle the same workload in 5 to 15 minutes. Either path is faster than re-listening to the meeting.
Google Meet has live captions for free, but the official transcript export is gated behind Workspace Business Standard or higher. Business Standard is about 14 dollars per user per month, billed annually. For a solo freelancer or a small team where only a few calls per month actually need a transcript, that is a heavy line item. Many teams keep the free tier or a cheaper Workspace plan and end up either taking handwritten notes or paying for an outside service like Otter or Fireflies on top of Workspace.
The cheaper outside services have their own tradeoffs. They join the meeting as a bot, which announces itself in the participant list and unsettles people on confidential calls. They upload meeting audio to their servers, which is a problem for legal, medical, HR, or M&A discussions. And they add another monthly subscription on top of Workspace, which is what people were trying to avoid by not upgrading to Business Standard in the first place.
This guide describes the workflow most independent transcribers and privacy-minded teams settle on. Capture the Meet call (with Workspace recording, with a free screen recorder, or by getting the recording from a colleague), then transcribe the resulting MP4 with StarWhisper on your Windows PC. Free, local, no bot in the meeting, no per-minute fee.
The trick to this whole workflow is having a recording in the first place. Three common ways to get one:
If your Workspace plan includes recording, click the three-dot menu inside the meeting, choose Record meeting, and confirm the prompt that asks you to notify participants. When the call ends, the recording is processed and lands in the Meet Recordings folder of My Drive, usually within a few minutes. You get an MP4 with mixed audio of everyone plus video of the active speaker and any screen share.
Before joining the call, start OBS Studio, the built-in Xbox Game Bar (Win+G in Windows 10/11), ShareX, or any other screen recorder. The critical setting is "Capture desktop audio" or "Record system sound", which records what your computer is playing through its speakers. Without this you get video only. Pick MP4 as the output format if your recorder offers a choice. Start recording right before you join the call and stop it after everyone leaves.
If the host or another participant recorded the call and sent you a Drive link, click the link, download the MP4, then jump to step three.
Always tell the other participants you are recording. Consent laws vary: some jurisdictions require only one party's consent, while others require consent from everyone on the call, so disclosure is both the safe legal default and sound professional practice. Some workplaces and contracts explicitly forbid local capture of internal meetings, so check before you rely on this for sensitive calls.
A Google Meet recording (either Workspace or screen-recorder) is a single MP4 with one mixed audio track containing every voice plus a single video track of whoever was on screen at the time. The audio is what matters for transcription. There are no per-speaker channels, so neither StarWhisper nor any other single-track transcriber can automatically label who said what.
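If you want to confirm what a particular recording contains, ffprobe (shipped with ffmpeg) can list its audio streams. The sketch below assumes ffprobe is on your PATH. The helper names are illustrative.

```python
import json
import subprocess


def build_probe_command(path: str) -> list[str]:
    """Return an ffprobe command that describes the audio streams as JSON."""
    return [
        "ffprobe",
        "-v", "error",             # suppress the banner and info logging
        "-select_streams", "a",    # audio streams only
        "-show_entries", "stream=index,codec_name,channels",
        "-of", "json",
        path,
    ]


def count_audio_streams(ffprobe_json: str) -> int:
    """Count the audio streams in ffprobe's JSON output."""
    return len(json.loads(ffprobe_json).get("streams", []))


def audio_stream_count(path: str) -> int:
    """Run ffprobe on a file and return how many audio streams it has."""
    result = subprocess.run(
        build_probe_command(path), capture_output=True, text=True, check=True
    )
    return count_audio_streams(result.stdout)
```

A count of one means a single mixed track. Note that a stereo track (two channels) is still one mix of all voices, not one channel per speaker.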
StarWhisper will produce a clean continuous transcript with sentence breaks and natural punctuation. For typical action-items-and-decisions meetings this is fine: skim the transcript, mentally attribute the lines to whoever you remember speaking, lift out the four or five decisions and action items, share with the team. If you need formal verbatim transcripts with speaker labels (court proceedings, depositions, academic research interviews), you will need either a paid cloud diarization service or a multi-microphone setup where each speaker has their own track.
Transcribing a recorded meeting is faster than real time, sometimes much faster. Approximate runtimes for the default medium Whisper model across common hardware:
| Hardware | 30-min meeting | 60-min meeting | 2-hr meeting |
|---|---|---|---|
| Modern laptop CPU (i7 or Ryzen 7) | 3 to 6 min | 6 to 12 min | 12 to 25 min |
| NVIDIA RTX 3060 (CUDA) | 30 to 60 sec | 1 to 2 min | 2 to 5 min |
| NVIDIA RTX 4090 (CUDA) | 10 to 20 sec | 20 to 40 sec | 1 to 2 min |
| Older CPU (5+ years) | 10 to 20 min | 25 to 45 min | 50 to 90 min |
For most office laptops bought in the last three years, expect a one-hour Meet recording to finish transcribing in 6 to 12 minutes. If you do this regularly and have an NVIDIA GPU in the machine, the CUDA pack cuts that to 1 to 2 minutes on an RTX 3060, and to under a minute on an RTX 4090.
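For meeting lengths the table does not list, a rough estimate can be scaled from its 60-minute column. The sketch below assumes runtime grows linearly with recording length, which the table's figures only approximately follow, so treat the output as a ballpark. The dictionary keys and function name are illustrative.

```python
# (low, high) minutes of processing per 60 minutes of audio,
# copied from the 60-min column of the table above.
MINUTES_PER_HOUR = {
    "laptop_cpu": (6, 12),
    "rtx_3060": (1, 2),
    "rtx_4090": (20 / 60, 40 / 60),
    "older_cpu": (25, 45),
}


def estimate_minutes(hardware: str, meeting_minutes: float) -> tuple[float, float]:
    """Return a (low, high) runtime estimate in minutes, assuming linear scaling."""
    low, high = MINUTES_PER_HOUR[hardware]
    scale = meeting_minutes / 60
    return (low * scale, high * scale)


if __name__ == "__main__":
    for hw in MINUTES_PER_HOUR:
        low, high = estimate_minutes(hw, 90)
        print(f"{hw}: about {low:.1f} to {high:.1f} min for a 90-minute call")
```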
Distributed teams running Meet calls across Berlin, Tokyo, and Sao Paulo are a major use case for this workflow. Whisper supports 96 languages with strong accuracy in English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Japanese, Chinese, Korean, Hindi, Russian, Arabic, and Turkish, among others. The model auto-detects the spoken language at the start of the file.
For meetings where speakers switch languages mid-call (a common European pattern), Whisper handles short code switches reasonably well, though it commits to a primary language. If you have a half-Spanish half-English meeting, you may get better results by splitting the recording into two clips and transcribing each in its declared language. The multi-language feature page covers per-language accuracy in more detail.
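One way to do the split is with ffmpeg, cutting the recording at the point where the language changes. The sketch below assumes ffmpeg is on your PATH and that you know the switch time in seconds. The function names and the `_part1` / `_part2` file naming are illustrative. It writes audio-only WAV clips, which avoids the keyframe imprecision of stream-copying video.

```python
from pathlib import Path


def build_clip_command(video: str, start_s: int, end_s: int, out_wav: str) -> list[str]:
    """Return an ffmpeg command extracting audio between start_s and end_s (seconds)."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    return [
        "ffmpeg", "-y",
        "-ss", str(start_s),         # seek to the clip start in the input
        "-i", video,
        "-t", str(end_s - start_s),  # keep this many seconds
        "-vn",                       # audio only
        "-ac", "1", "-ar", "16000",  # mono, 16 kHz
        out_wav,
    ]


def split_commands(video: str, switch_s: int, total_s: int) -> list[list[str]]:
    """Return two commands: one for each side of the language switch."""
    src = Path(video)
    part1 = str(src.with_name(f"{src.stem}_part1.wav"))
    part2 = str(src.with_name(f"{src.stem}_part2.wav"))
    return [
        build_clip_command(video, 0, switch_s, part1),
        build_clip_command(video, switch_s, total_s, part2),
    ]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)`. For example, `split_commands("call.mp4", 1800, 3600)` describes two 30-minute clips.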
Translation is also possible. StarWhisper can take a non-English recording and transcribe it directly into English text using Whisper's translate mode. This is useful for internal teams in the US or UK trying to follow a partner meeting in another language without paying a translator. Quality is generally good for major languages and degrades for less-common ones.
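This guide does not describe how StarWhisper exposes translate mode in its interface. For readers who want to see the underlying Whisper option, the sketch below builds a command for the open-source `whisper` command-line tool from the `openai-whisper` Python package. Using that tool is an assumption for illustration, not part of the StarWhisper workflow, and the function name is made up.

```python
from typing import Optional


def build_whisper_command(
    media: str,
    task: str = "transcribe",
    language: Optional[str] = None,
    model: str = "medium",
    output_dir: str = ".",
) -> list[str]:
    """Return a command for the open-source `whisper` CLI.

    task="transcribe" keeps the spoken language; task="translate" outputs English.
    language is an optional code such as "es"; omit it to let Whisper auto-detect.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    cmd = [
        "whisper", media,
        "--model", model,
        "--task", task,
        "--output_format", "txt",
        "--output_dir", output_dir,
    ]
    if language is not None:
        cmd += ["--language", language]
    return cmd
```

For example, `build_whisper_command("partner_call.mp4", task="translate", language="es")` describes a run that reads a Spanish recording and writes English text. Pass the list to `subprocess.run(cmd, check=True)` on a machine where the `whisper` tool and ffmpeg are installed.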
This workflow keeps transcription and the transcript on your device. Workspace recordings live in your own Google Drive, where you control sharing. Screen-recorder recordings save to your hard drive. StarWhisper Local Mode processes the file locally on CPU or GPU. The transcript output is a plain .txt file on your PC. Apart from a Workspace recording that Google already stores in your Drive, none of this leaves your machine unless you choose to share it (paste into a cloud doc, email it, upload it).
Compare this with cloud transcription services. Otter, Fireflies, Notta, and similar tools join the call as a bot and upload audio to their servers. Even Google's own transcription processes audio in Google's cloud. For confidential calls (M&A discussions, performance reviews, customer interviews under NDA, legal strategy, medical case reviews) the local-only workflow is a meaningful improvement in data control.
If you are in a regulated industry, the same architecture supports your compliance posture. The HIPAA compliance FAQ covers what local processing means for protected health information specifically.
Recruiters running candidate screens, sales reps on discovery calls, and CS leaders doing renewals all want transcripts but rarely justify a separate transcription line item. The workflow here is the same as any other meeting: record locally, transcribe afterward. For sales teams doing volume work, the voice-to-text for sales reps guide covers integration with CRMs. For HR and recruiting workflows, the voice-to-text for HR managers page covers candidate screening transcripts and the confidentiality requirements that come with them. For a deeper integration with Teams instead of Meet, the voice-to-text in Teams guide is the direct equivalent.
Same local-recording workflow without the Zoom Business 199 dollar plan.
Capture Teams calls and turn the OneDrive recording into searchable text.
Dictate by voice directly into Teams chat during a live meeting.
Candidate screening transcripts and confidential interview notes.