WhatsApp Voice Note Guide

How to Transcribe a WhatsApp Voice Message: Free, Offline, Any Language

Tired of seven-minute voice notes from family or colleagues? Drop the file into a free Windows app and read the transcript in seconds. It supports 96 languages, and the audio is never uploaded.

Download for Windows
Microsoft Store
  • Trusted by Windows
  • Quick 30-second setup
"Sorry, I know this is a long one, but..."

Five Steps, About a Minute Per Voice Note

No signup, no upload, no per-minute fee.

1. Install WhatsApp Desktop on Windows

If you only use WhatsApp on your phone, grab the desktop client from whatsapp.com or the Microsoft Store. Open it, scan the QR code with your phone, and your full chat history syncs in. WhatsApp Desktop gives you a reliable right-click Save As menu on voice notes that WhatsApp Web in a browser often does not.

2. Right-click the voice note, Save As

Find the voice message in your chat. Right-click on it and pick Save As. WhatsApp Desktop will offer a file name and save the audio to your Downloads folder as either .opus or .ogg. Both are standard Opus-codec files and StarWhisper handles them natively. You do not need to convert anything.

3. Install StarWhisper

Download StarWhisper from the homepage. The installer is small and the setup walks you through a one-time model download so the app can work offline afterward. The free tier covers 500 words per day and 3,500 per week, enough for typical personal use without a Pro plan.

4. Drag the file into StarWhisper

Open StarWhisper and drag the .ogg or .opus file from File Explorer onto the window. The app picks the language automatically and starts transcribing. A typical 30-second voice note finishes in two to five seconds on a modern CPU. With an NVIDIA GPU it typically finishes in under a second.

5. Read, copy, or save the transcript

The text appears in the StarWhisper window. Copy it to clipboard, paste it into a chat or doc, or save it to a .txt file. The voice note is now searchable, skimmable, quotable text. You never had to listen to the whole thing.

Why People Use This Instead of an Online Tool

Specific reasons, not vague benefits.

Audio stays on your computer

Default Local Mode runs OpenAI Whisper on your own machine. No upload, no third-party storage, no servers seeing your family group chat.

96 languages auto-detected

Whether the voice note is in Spanish, Hindi, Arabic, Mandarin, Polish, or any of 96 supported languages, StarWhisper picks the language automatically.

Native Opus and OGG support

WhatsApp's .opus and .ogg files load directly. No third-party converter, no online MP3 ripper, no pasted command-line ffmpeg invocations.

Works offline after install

One-time model download, then full offline operation. Useful for flights, sensitive recordings, or anywhere you do not trust the network.

Free 500 words a day

Covers roughly 3 one-minute voice notes a day, or about 6 of around 30 seconds, with no signup wall, no credit card, no trial countdown. Free tier details are on the dedicated page.

GPU acceleration if you have it

NVIDIA GPU owners get near-instant transcription via CUDA: under a second for a typical 30-second voice note. GPU support details are on the dedicated page.

Why Transcribing WhatsApp Voice Notes Is Worth the Five Minutes of Setup

WhatsApp voice messages have a particular problem. They are convenient for the sender, who can monologue while walking, but they are inefficient for the receiver, who has to find headphones, put them in, and listen at real-time speed to extract maybe twenty seconds of actual information. A six-minute voice note from a relative often contains one date, one question, and a lot of context. Reading the transcript in fifteen seconds is a strictly better experience.

The other reason is searchability. WhatsApp's own search only indexes text messages, so months of voice notes become a black box. Once a voice note is transcribed and the transcript is saved to a notes app or document, you can search for the words inside it and retrieve the information later. For people who get a lot of voice notes from a particular contact (a parent, a manager, a project lead), having them as text makes the chat far easier to keep up with.
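
If you save each transcript as a .txt file in one folder, searching them takes only a few lines. This is a minimal sketch, not a StarWhisper feature: the function name and the one-folder layout are assumptions made for illustration.

```python
from pathlib import Path


def search_transcripts(folder: Path, query: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, line) for every line containing query."""
    needle = query.casefold()  # case-insensitive match
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.casefold():
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling search_transcripts on your transcript folder with a word like "friday" returns every matching line together with the file it came from, so you can jump straight to the right voice note.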

Cloud transcription services exist, but most charge per minute, ask you to upload sensitive personal audio to their servers, and require a signup with a credit card. The costs add up: at 10 cents per minute, ten voice notes a week averaging two minutes each come to 20 minutes and about 2 dollars a week, or roughly 8 to 9 dollars a month for what is genuinely a small task. StarWhisper is a free local install that transcribes at no charge up to the free-tier word caps (500 words per day, 3,500 per week). Most casual WhatsApp users never reach those caps.
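
The arithmetic behind that estimate is easy to check. Here is a minimal sketch using the illustrative figures from the paragraph above (10 cents per minute, ten notes a week, two minutes each). The function name and the 52 / 12 weeks-per-month conversion are assumptions for illustration, not any provider's actual pricing.

```python
def monthly_cloud_cost(notes_per_week: int, minutes_per_note: float,
                       dollars_per_minute: float,
                       weeks_per_month: float = 52 / 12) -> float:
    """Estimate the monthly bill for a per-minute transcription service."""
    minutes_per_week = notes_per_week * minutes_per_note
    return minutes_per_week * dollars_per_minute * weeks_per_month


cost = monthly_cloud_cost(notes_per_week=10, minutes_per_note=2,
                          dollars_per_minute=0.10)
print(f"about {cost:.2f} dollars per month")
```

Passing weeks_per_month=4 gives the rounder figure of 8 dollars. Plug in your own volume to see where a per-minute service would land for you.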

Getting the Audio Off Your Phone: The Three Reliable Methods

The fastest path is WhatsApp Desktop on the same Windows PC as StarWhisper. Once linked, every voice note in every chat is right-clickable to save. This is the recommended setup for anyone who plans to transcribe voice notes more than occasionally.

Method 1: WhatsApp Desktop right-click Save As

Already covered in the steps above. Right-click, Save As, drag into StarWhisper. Two clicks of friction. This works for every voice note in any chat, individual or group, as long as you have the desktop app linked.

Method 2: Forward to email from your phone

On Android, long-press the voice note, tap the three-dot menu, choose Share, and send to your own email address as an attachment. On iPhone, long-press the voice note, tap Forward, then the share-arrow icon, then choose Mail. Open Gmail or Outlook on Windows, download the attachment, and drag the resulting file into StarWhisper. The file usually arrives as .opus on Android or .m4a on iPhone. StarWhisper handles both.
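
If you are not sure which kind of file actually arrived, the first few bytes usually tell you. Here is a small sketch that checks for the Ogg page marker ("OggS"), the Opus identification header ("OpusHead"), and the "ftyp" box that MP4-family files such as .m4a start with. The function name and return labels are made up for illustration. StarWhisper does not need this step.

```python
from pathlib import Path


def sniff_audio_container(path: Path) -> str:
    """Guess the container of a voice-note file from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(b"OggS"):
        # Ogg Opus streams carry an "OpusHead" packet in the first page.
        return "ogg-opus" if b"OpusHead" in head else "ogg-other"
    if head[4:8] == b"ftyp":
        # ISO base media files (MP4, M4A) begin with a "ftyp" box.
        return "mp4-family"
    return "unknown"
```

A result of "ogg-opus" matches what WhatsApp saves on Android and desktop, and "mp4-family" matches the .m4a files that iPhone shares produce.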

Method 3: Export the entire chat from your phone

For batch transcription of months of voice notes, open the chat on your phone, go to chat settings, choose Export Chat, and pick the option to include media. WhatsApp produces a zip file with every audio attachment as .opus. Transfer the zip to your PC, extract it, and drop the folder onto StarWhisper. The app will process every voice note in sequence and label each transcript by file name. This is what people use when migrating years of family chat audio into searchable text.
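
If the exported zip is large, you may prefer to pull out only the audio before dropping anything onto StarWhisper. The sketch below does that with Python's standard zipfile module. The suffix list, the function names, and the transcribe callable are all hypothetical: this guide does not describe a scripting interface for StarWhisper, so transcribe_all only shows the "one transcript per file name" shape of the batch step.

```python
import zipfile
from pathlib import Path
from typing import Callable

AUDIO_SUFFIXES = {".opus", ".ogg", ".m4a"}  # assumed set of voice-note types


def extract_voice_notes(zip_path: Path, out_dir: Path) -> list[Path]:
    """Extract only the audio attachments from an exported chat archive."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            name = Path(info.filename).name  # drop any folder part
            if info.is_dir() or Path(name).suffix.lower() not in AUDIO_SUFFIXES:
                continue
            target = out_dir / name
            target.write_bytes(archive.read(info))
            extracted.append(target)
    return sorted(extracted)


def transcribe_all(files: list[Path],
                   transcribe: Callable[[Path], str]) -> dict[str, str]:
    """Run a transcription callable over each file, keyed by file name."""
    return {path.name: transcribe(path) for path in files}
```

In practice you would run extract_voice_notes once and then drag the output folder onto the StarWhisper window, which keeps the chat log and images out of the batch.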

What the Free Tier Covers and When Pro Becomes Worth It

StarWhisper's free plan gives you 500 words per day, capped at 3,500 words per week. A typical 60-second WhatsApp voice note transcribes to around 150 words of text. At that rate the free tier covers about 3 one-minute voice notes per day (500 / 150) or about 23 per week (3,500 / 150), and proportionally more if your notes are shorter. For most personal WhatsApp use, this is enough.
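
To see how the caps translate for your own voice notes, here is a rough calculator. The caps are the ones stated in this guide. The 150 words per minute speaking rate is the guide's rule of thumb, and the function name is an assumption for illustration.

```python
DAILY_WORD_CAP = 500      # free-tier daily limit stated in this guide
WEEKLY_WORD_CAP = 3_500   # free-tier weekly limit stated in this guide
WORDS_PER_MINUTE = 150    # assumed speaking rate (about 150 words per 60 s)


def notes_within_cap(word_cap: int, note_seconds: float,
                     words_per_minute: float = WORDS_PER_MINUTE) -> int:
    """Return how many whole voice notes of a given length fit under a cap."""
    words_per_note = words_per_minute * note_seconds / 60
    return int(word_cap // words_per_note)


for seconds in (30, 60, 120):
    per_day = notes_within_cap(DAILY_WORD_CAP, seconds)
    per_week = notes_within_cap(WEEKLY_WORD_CAP, seconds)
    print(f"{seconds:>3}-second notes: {per_day} per day, {per_week} per week")
```

Fast talkers and long monologues eat the budget quicker, so treat the output as a ballpark rather than a guarantee.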

If you run a small business through WhatsApp Business, get a high volume of voice notes from clients, or do bulk historical transcription, the limits will start to bite. The Pro plan is 10 dollars per month or 80 dollars per year and removes the word cap entirely. Pro plan details and pricing are on the dedicated page. There is also a free 7-day trial that unlocks unlimited access if you want to verify it works for your workload before paying.

Free Local Mode and Pro Local Mode produce identical transcripts. The Pro plan does not get a different or smarter model. It just removes the word cap and adds some workflow features (custom hotkeys, vocabulary, priority cloud fallback if you opt in). For anyone who only wants to read the occasional long voice note from a parent, the free tier is genuinely sufficient.

Privacy: Why Local Transcription Matters for Personal Voice Notes

Voice notes from friends and family are some of the most personal audio data on your phone. They contain medical complaints, relationship drama, opinions about coworkers, family secrets, and offhand comments people would not want preserved on a server somewhere. Uploading that audio to a cloud transcription service means a third party gets a copy.

StarWhisper runs in Local Mode by default. The audio file you drag in is decoded on your CPU or GPU, the Whisper model on your hard drive does the transcription, and the resulting text appears on screen. Nothing is uploaded. Nothing is logged on a remote server. Nothing is reviewed by humans for quality assurance. You can verify this yourself by unplugging your network connection before processing a file; the transcription still works.

Cloud Mode exists as an opt-in toggle in Settings if you specifically want to use the OpenAI Whisper API for a small accuracy improvement on edge cases. It is clearly labeled, off by default, and never silently switched on. For sensitive personal voice notes, just leave the default settings alone. For the deeper privacy story, see the privacy and offline architecture page.

Speed: How Long a Voice Note Actually Takes

Transcription speed depends on your hardware and the length of the voice note. Rough numbers from the Whisper medium model on common machines:

Hardware                            30-sec voice note   2-min voice note   10-min voice note
Modern laptop CPU (i7 or Ryzen 7)   2 to 5 sec          10 to 20 sec       1 to 2 min
NVIDIA RTX 3060 (CUDA)              under 1 sec         2 to 4 sec         10 to 20 sec
NVIDIA RTX 4090 (CUDA)              under 1 sec         under 1 sec        5 to 8 sec
Older CPU (5+ years)                5 to 10 sec         30 to 60 sec       3 to 6 min

The Whisper model size also matters. StarWhisper defaults to a balanced choice (medium) but you can switch to the smaller (faster, slightly less accurate) or larger (slower, more accurate) models in Settings. For voice notes, the default is almost always fine. The big quality gap is between built-in Windows dictation and Whisper, not between Whisper model sizes.
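
For a voice-note length that is not in the table, you can scale the published numbers. The sketch below takes the 2-minute timings from the table above and assumes processing time grows linearly with audio length. That is a simplification, and the names are illustrative. Real timings vary with model size and what else your machine is doing.

```python
# (low, high) processing times in seconds for a 2-minute voice note,
# copied from the table in this guide.
TWO_MINUTE_TIMINGS = {
    "Modern laptop CPU": (10, 20),
    "NVIDIA RTX 3060": (2, 4),
    "Older CPU": (30, 60),
}
REFERENCE_AUDIO_SECONDS = 120


def estimate_range(hardware: str, audio_seconds: float) -> tuple[float, float]:
    """Scale the 2-minute timings linearly to another note length."""
    low, high = TWO_MINUTE_TIMINGS[hardware]
    scale = audio_seconds / REFERENCE_AUDIO_SECONDS
    return low * scale, high * scale


low, high = estimate_range("Modern laptop CPU", audio_seconds=7 * 60)
print(f"7-minute note: roughly {low:.0f} to {high:.0f} seconds")
```

For the seven-minute monologue on a modern laptop, that works out to roughly 35 to 70 seconds of waiting instead of seven minutes of listening.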

Edge Cases: When This Approach Hits Limits

Here is where the approach works less well. First, very noisy audio: voice notes recorded outdoors in heavy traffic or wind can see accuracy drop from 95-plus percent to roughly 80 percent. The transcript will still be readable, but expect some wrong words. Second, heavy code-switching mid-sentence: if a voice note flips between two languages every few words, Whisper sometimes picks one and transliterates the other. Third, very strong regional dialects in certain languages: standard Spanish from Spain, Mexico, and Argentina works well, but very thick rural dialects can confuse the model.

For all of these, the workaround is the same: try the transcription and accept that the result will be a useful first draft rather than a perfect record. For most personal voice notes the accuracy is well past good enough.

There is also no built-in speaker diarization for group-chat voice notes that have multiple voices in one recording (rare, but it happens). StarWhisper transcribes everything as a single block of text. You can manually split it after the fact if you need that.

Related Workflows You May Want Next

If you found this useful, the same pipeline works for other audio types. Many people install StarWhisper to handle WhatsApp voice notes and then discover they also want it for interview transcription, podcast transcription, or meeting transcription. The drag-and-drop file flow is the same; only the audio source changes. There is also a real-time dictation mode for typing into any app by voice, which is a separate use case but the same install.

Frequently Asked Questions

What audio format does WhatsApp use for voice messages?
WhatsApp records voice messages with the Opus codec in an Ogg container, saved with either an .opus or .ogg extension. Some older Android versions also produce .aac. StarWhisper handles all three natively, plus MP3, WAV, M4A, FLAC, and other common audio formats. You do not need to convert the file before transcribing. Drag it in as-is and StarWhisper will decode it locally.
Does this work for voice messages in Spanish, German, Arabic, Hindi, or other languages?
Yes. StarWhisper uses OpenAI Whisper, which supports 96 languages including Spanish, German, French, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Arabic, Turkish, Hindi, Japanese, Korean, Mandarin Chinese, Vietnamese, Thai, Indonesian, and many more. Language is auto-detected from the audio, so you do not need to pick one manually. Accuracy is strongest on clear speech and slightly weaker on heavy regional dialects or low-quality recordings.
Can I transcribe several WhatsApp voice messages at once?
Yes. StarWhisper accepts multiple files in one drag-and-drop session. You can also export an entire WhatsApp chat from your phone (Settings, Chats, Export Chat) which produces a zip with every .opus file from that conversation, then drop the unzipped folder onto StarWhisper. The app processes them sequentially and gives you each transcript labeled by file name. There is no per-file or batch size limit, although the free tier's word caps still apply to the total.
Is StarWhisper really free for transcribing WhatsApp voice messages?
Yes. The free tier covers 500 words per day and 3,500 words per week, which is enough for roughly 3 to 6 typical voice messages a day depending on length (about 3 at one minute each, about 6 at 30 seconds each). There is no credit card required, no signup wall, and no trial timer that quietly converts. If you outgrow the free tier the Pro plan is 10 dollars per month or 80 dollars per year for unlimited use, but most casual WhatsApp transcription users never need it.
Does this work with WhatsApp Web instead of WhatsApp Desktop?
Partially. WhatsApp Web in a browser does not always expose a Save As option on voice notes; the behavior depends on your browser and any extensions you have installed. The desktop app gives you a reliable right-click Save As menu every time. If you only have WhatsApp Web available, you can forward the voice note to yourself via email from your phone, download the attachment on Windows, then drag it into StarWhisper. Same end result.
How do I get a voice message off my iPhone or Android phone?
On Android, long-press the voice message in WhatsApp, tap the share icon, and send it to your own email or cloud storage. On iPhone, tap and hold the voice note, choose Forward, then share to Mail, AirDrop to a Mac, or upload to iCloud Drive. Open the email or cloud folder on your Windows PC, save the .opus or .m4a attachment, and drag it into StarWhisper. The pipeline is phone, then any file transfer method, then StarWhisper.
Can I transcribe WhatsApp voice messages without an internet connection?
Yes. After the initial install and one-time model download, StarWhisper runs fully offline. The Whisper model lives on your hard drive and processes audio locally on your CPU or GPU. No internet is required to transcribe a voice note. This is useful on flights, in low-signal areas, or when handling sensitive recordings where you do not want any data leaving the device.
Is my voice message data private when I use StarWhisper?
Yes, by default. StarWhisper runs in Local Mode out of the box, which means the audio is processed entirely on your machine. Nothing is uploaded to OpenAI, to StarWhisper, or to any third party. The transcript stays on your hard drive. Cloud Mode exists as an opt-in for users who want to use the OpenAI Whisper API for a slight accuracy boost, but it is off by default and clearly labeled. For sensitive personal voice notes, just leave the default settings alone.

Stop Listening to Long Voice Notes

Free download. Drag in a voice note, read the transcript in seconds. No upload, no signup.

Download StarWhisper for Windows