iPhone Voice Memos, QuickTime audio recordings, and many YouTube audio downloads use the .m4a format. Drag the file into a free Windows app and get the transcript in minutes. No upload, no per-minute fees, no conversion step.
Drag, wait, copy. Native M4A support, no format conversion required.
Grab the free installer for Windows 10 or 11 from the StarWhisper homepage. The setup takes about two minutes and includes a one-time download of the Whisper model so the app can work fully offline. No signup, no credit card, no email confirmation.
If the M4A is on your iPhone, the easiest path is iCloud Drive (save the Voice Memo to the Files app, then open iCloud Drive on Windows through the iCloud for Windows app) or a USB cable. If the M4A came from a Mac (QuickTime export, GarageBand bounce), just copy it across via any cloud sync or USB stick. YouTube downloads that landed as .m4a are already on the PC and ready to go.
Open StarWhisper. Open File Explorer and find the .m4a file. Drag it onto the StarWhisper window. The app reads M4A natively, so there is no need to convert to MP3 or WAV first. The language is detected automatically from among the 96 that Whisper supports.
StarWhisper processes the audio locally on your CPU or GPU. Speed: up to roughly 10 times faster than real time on a modern laptop CPU and about 50 times faster on an NVIDIA GPU with CUDA. A one-hour M4A transcribes in about 6 to 12 minutes on CPU or 1 to 2 minutes on a mid-range GPU. A short voice memo finishes in seconds.
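The timing figures follow from simple division. Here is a minimal Python sketch, assuming the rough speed factors quoted here hold for your hardware (the 12-minute CPU figure corresponds to about 5 times real time). The function name is illustrative and not part of StarWhisper.

```python
def estimate_minutes(audio_minutes: float, speed_factor: float) -> float:
    """Estimate processing time in minutes.

    speed_factor is how many times faster than real time the machine
    transcribes. The values used below are rough figures, not benchmarks.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_minutes / speed_factor


# One-hour file at the assumed speed factors:
cpu_slow = estimate_minutes(60, 5)    # slower laptop CPU
cpu_fast = estimate_minutes(60, 10)   # modern laptop CPU
gpu = estimate_minutes(60, 50)        # NVIDIA GPU with CUDA
print(f"CPU: {cpu_fast:.0f} to {cpu_slow:.0f} min, GPU: about {gpu:.1f} min")
```

Actual speed depends on the model size, the hardware, and the recording, so treat the result as an order-of-magnitude estimate.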
The transcript appears in the StarWhisper window when processing finishes. Copy the full text to clipboard, save it as a .txt file, or export with optional timestamps if you need to cross-reference the audio. Paste into Notion, Word, Google Docs, your CMS, or anywhere else the text needs to go.
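To show what a timestamped export can look like, here is a small Python sketch that renders transcript segments with a leading time marker. The segment structure (start time in seconds plus text) and the bracketed layout are assumptions for illustration, not StarWhisper's actual export format.

```python
def format_timestamp(seconds: float) -> str:
    """Format a position in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[tuple[float, str]]) -> str:
    """Turn (start_seconds, text) pairs into one timestamped line each."""
    return "\n".join(
        f"[{format_timestamp(start)}] {text}" for start, text in segments
    )


# Hypothetical segments, for illustration only.
example = [(0.0, "Quick note before the meeting."), (65.4, "Second point.")]
print(render_transcript(example))
```

A marker at the start of each line makes it easy to jump to the matching position in the audio when you need to check a quote.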
Specific advantages, not vague benefits.
Many online tools demand you convert M4A to MP3 or WAV before they will accept it. StarWhisper reads M4A natively. Skip the conversion, and with it the small quality loss that re-encoding to MP3 brings.
Voice memos are personal: half-finished thoughts, private conversations, confidential notes. Cloud converters upload all of it. StarWhisper keeps the audio on your machine.
Free transcription up to the daily word cap. Rev charges 10 to 25 cents per minute; StarWhisper charges nothing per minute, and the free tier uses the same Whisper model as the paid plan, so accuracy is identical.
iPhone built-in mics produce clean audio when the speaker is within a meter. Whisper handles this well, with roughly 95 to 99 percent word accuracy on quiet recordings.
Whisper handles 96 languages including all major European, Asian, and Middle Eastern ones. Voice memos in any of these are transcribed without you choosing a language manually.
NVIDIA GPU owners install the CUDA pack and transcription speed jumps roughly 5x. A one-hour M4A becomes a two-minute job.
M4A is the everyday audio container Apple chose for almost every consumer recording it ships. iPhone Voice Memos saves as M4A. QuickTime audio recordings export as M4A. iTunes audio purchases are M4A, and Apple Music downloads use the same container family (as DRM-protected .m4p files). Even the audio-only mode in many YouTube downloaders produces M4A files, because YouTube serves AAC audio streams inside an MP4 container, which is exactly what M4A is.
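Because M4A is an MP4-family container, a file can usually be recognized by the `ftyp` box at its start. The Python sketch below reads the first 12 bytes and returns the major brand. It assumes the `ftyp` box is the first box in the file, which is typical but not guaranteed. Brand values vary: Apple audio commonly uses `M4A `, while other tools write brands such as `isom` or `mp42`. The function name is illustrative.

```python
import struct
from pathlib import Path
from typing import Optional


def read_major_brand(path: Path) -> Optional[str]:
    """Return the major brand from a leading 'ftyp' box, or None if absent.

    Layout of the first 12 bytes of an MP4-family file:
    4-byte big-endian box size, 4-byte box type, 4-byte major brand.
    """
    with open(path, "rb") as handle:
        header = handle.read(12)
    if len(header) < 12:
        return None  # too short to hold an ftyp box header
    _size, box_type, brand = struct.unpack(">I4s4s", header)
    if box_type != b"ftyp":
        return None  # not an MP4-family file, or ftyp is not the first box
    return brand.decode("ascii", errors="replace")
```

A result of `None` for a file named .m4a suggests it was renamed rather than encoded as M4A. Any non-`None` brand means the file is at least an MP4-family container.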
If you own an iPhone or a Mac, you have probably accumulated dozens, possibly hundreds, of M4A files: voice memos you took during meetings or while walking, interviews you recorded, lectures you sat through, ideas you captured at 2 a.m. The natural next question, sooner or later, is how to read what is inside them. Listening back is slow. Searching the audio for that one specific phrase you remember is impractical. The answer is transcription, and the obstacle most people hit is that few of the easy options are free and private at the same time.
Online M4A converters charge per file or per minute, force a sign-up, upload your audio to their servers, and cap file size on free tiers. StarWhisper handles M4A as a first-class format on your own machine, free, offline, with no per-file fee or upload step.
The most common M4A sources and the cleanest way to get them onto a Windows PC for transcription:
Open Voice Memos on the iPhone, tap the memo you want, tap the three-dot menu, and tap Share. From the share sheet, choose Save to Files and save into iCloud Drive (then access it via the iCloud for Windows app), email the memo to yourself, drop it into a shared cloud folder like Dropbox or OneDrive, or AirDrop it to a nearby Mac and copy from there. The file lands as a .m4a with the recording's name.
QuickTime saves new audio recordings as .m4a by default. Move the file to Windows via any cloud sync or USB stick. No conversion needed in either direction; the file is already the right format.
Tools like yt-dlp can extract just the audio track from a YouTube video. Request an audio-only format and the output is often .m4a (an AAC stream that YouTube serves directly). Drop the resulting .m4a onto StarWhisper to get the spoken transcript without re-encoding.
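One way to request an audio-only M4A stream is yt-dlp's format selector. The Python sketch below only builds the command line, using yt-dlp's `-f` (format) and `-o` (output template) options. The helper name, the output folder, and the URL are placeholders for illustration.

```python
import shlex


def build_audio_command(url: str, output_dir: str = ".") -> list[str]:
    """Build a yt-dlp command that fetches the best audio-only M4A stream."""
    return [
        "yt-dlp",
        "-f", "bestaudio[ext=m4a]",                # audio-only stream already in M4A
        "-o", f"{output_dir}/%(title)s.%(ext)s",   # name the file after the video title
        url,
    ]


# Placeholder URL; replace VIDEO_ID with a real video id.
command = build_audio_command("https://www.youtube.com/watch?v=VIDEO_ID", "downloads")
print(shlex.join(command))
```

With yt-dlp installed, the list can be passed to `subprocess.run(command, check=True)`. If a video has no M4A audio stream, the selector matches nothing and yt-dlp reports an error instead of re-encoding.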
Zoom can save just the audio track of a recorded meeting as .m4a. Drop that file in to get a clean transcript of the conversation without needing to extract from the video file separately.
For a related workflow that starts directly from an iPhone voice memo (with detailed phone-to-PC transfer instructions), see how to transcribe iPhone voice memos. For Zoom calls specifically, the Zoom call transcription guide walks through the audio-only export step.
Real scenarios with rough timing:
Five-minute voice memo recorded on the iPhone while walking. AirDrop the .m4a to your Mac and sync it to Windows, or skip the Mac and use iCloud Drive directly. Drop on StarWhisper. Output: about 700 words of text in under 60 seconds. Useful for capturing thoughts on the move and then having a searchable record.
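90-minute interview saved as .m4a from QuickTime. Drop on StarWhisper. CPU processing time: 10 to 18 minutes. Output: roughly 12,000 words of plain-text transcript. Light editing, paste into your draft article. At the 25-cents-per-minute top rate quoted above, the same task on Rev would cost 22.50 dollars; StarWhisper has no per-minute charge, although a transcript this long exceeds the free daily word cap and needs Pro or the 7-day trial.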
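The per-minute comparison is easy to reproduce. This Python sketch uses the 10 to 25 cents per minute range quoted in this article (not independently verified pricing) and works in whole cents to avoid floating-point rounding. The function names are illustrative.

```python
def per_minute_cost_cents(audio_minutes: int, cents_per_minute: int) -> int:
    """Total cost in cents for a service billed per audio minute."""
    return audio_minutes * cents_per_minute


def format_dollars(cents: int) -> str:
    """Render a whole number of cents as dollars, e.g. 2250 -> '22.50'."""
    return f"{cents // 100}.{cents % 100:02d}"


interview_minutes = 90
low = per_minute_cost_cents(interview_minutes, 10)    # low end of the quoted range
high = per_minute_cost_cents(interview_minutes, 25)   # high end of the quoted range
print(f"{format_dollars(low)} to {format_dollars(high)} dollars")
```

For a 90-minute interview the range works out to 9.00 to 22.50 dollars.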
30 voice memos accumulated over a week of meetings and conferences. Drag the entire folder onto StarWhisper. The app queues them, processes one at a time, and saves each transcript with the matching file name. Walk away, come back to 30 ready-to-search transcripts.
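The matching-file-name convention can be pictured with a short Python sketch that pairs each .m4a in a folder with a transcript path of the same base name. The .txt extension and the choice to write next to the audio are assumptions for illustration; the text does not specify where StarWhisper saves batch output.

```python
from pathlib import Path


def plan_batch(folder: Path) -> list[tuple[Path, Path]]:
    """Pair every .m4a file in folder with a .txt path of the same base name.

    Files are returned in sorted order, which is also a reasonable order
    for a one-at-a-time processing queue.
    """
    audio_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".m4a"   # case-insensitive match
    )
    return [(audio, audio.with_suffix(".txt")) for audio in audio_files]
```

Calling `plan_batch(Path("memos"))` on a hypothetical folder returns pairs such as `memos/standup.m4a` and `memos/standup.txt`, which keeps each transcript easy to trace back to its recording.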
Five long-form YouTube interviews relevant to a project, downloaded as .m4a audio-only. Drag in. Transcribe overnight. Now you have searchable text instead of having to scrub through hours of video to find the quote you remember.
People sometimes worry that M4A will give a worse transcript than MP3 or WAV. The opposite is closer to the truth.
M4A files normally carry AAC audio, a more efficient codec than MP3. At equivalent bitrates, AAC retains slightly more high-frequency detail and produces a cleaner reconstruction of the original audio. For Whisper, this means M4A source audio is generally as transcribable as MP3 or better, especially at lower bitrates where MP3 starts to mangle consonant edges.
WAV is uncompressed and represents the original audio bit-for-bit. It is the gold standard for source quality. But Whisper does not need that quality to produce a great transcript: the model was trained on a mix of compressed and uncompressed audio and handles all common formats well. The difference between an M4A transcript and a WAV transcript of the same recording is typically not detectable.
The practical implication: do not waste time converting M4A to WAV or MP3 before transcribing. Drop the original file in. If you want to transcribe a different source format, see how to convert MP3 to text or how to convert WAV to text; the workflow is identical.
Voice memos are personal. Some are mundane (grocery lists, reminders). Many are not: people record half-finished thoughts, private conversations, and confidential notes.
Uploading these to an online M4A-to-text converter sends the audio to a server you do not control. Some services keep the audio for days or weeks for "quality improvement". Some sell access to the data. Even when the privacy policy says the right things, the audio still sits on someone else's hardware.
StarWhisper Local Mode keeps the entire pipeline on your device. The .m4a is decoded by the app, the Whisper model runs on your CPU or GPU, and the resulting text is written to your hard drive. Nothing leaves the device. For deeper details on how this works, see privacy and offline architecture and how to transcribe audio offline. For specific professions handling sensitive M4A recordings, see voice-to-text for journalists, voice-to-text for therapists, or voice-to-text for lawyers.
The free tier of StarWhisper provides 500 words per day and 3,500 words per week of transcribed output. A typical 5-minute voice memo produces about 700 words, which already exceeds the daily cap; a typical 60-minute interview produces about 8,000 words. At that speaking rate, about three and a half minutes of audio fit under the daily cap, so the free tier comfortably covers one short voice memo per day. If you transcribe long interviews routinely, you will hit the cap.
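The cap arithmetic can be checked with a short Python sketch. It assumes a speaking rate of 140 words per minute, derived from the 700-words-in-5-minutes figure; real recordings vary, and the interview figure here implies a slightly slower rate. The function names are illustrative.

```python
DAILY_CAP_WORDS = 500     # free-tier daily cap from the text
WEEKLY_CAP_WORDS = 3500   # free-tier weekly cap from the text
WORDS_PER_MINUTE = 140    # assumption: 700 words / 5 minutes


def estimated_words(audio_minutes: float) -> int:
    """Rough transcript length for a recording of the given duration."""
    return round(audio_minutes * WORDS_PER_MINUTE)


def fits_daily_cap(audio_minutes: float) -> bool:
    """True if the estimated transcript stays within the daily cap."""
    return estimated_words(audio_minutes) <= DAILY_CAP_WORDS


def minutes_within_cap(cap_words: int) -> float:
    """How many minutes of speech fit under a word cap at the assumed rate."""
    return cap_words / WORDS_PER_MINUTE


print(f"Daily cap covers about {minutes_within_cap(DAILY_CAP_WORDS):.1f} min of speech")
print(f"Weekly cap covers about {minutes_within_cap(WEEKLY_CAP_WORDS):.0f} min of speech")
```

At the assumed rate, the daily cap covers roughly three and a half minutes of speech and the weekly cap about 25 minutes, so a full 5-minute memo or any long interview goes past the free tier.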
Pro removes the word cap entirely. It costs 10 dollars per month or 80 dollars per year. There is also a 7-day full-access trial of Pro that you can use to verify the workflow on a long M4A file before committing.
Free and Pro use the same Whisper model and produce identical transcripts. The Pro plan only removes limits and adds workflow features. For pure M4A-to-text use, the only practical difference is the daily output cap. If your usage is occasional voice memos, stay free. If your usage is regular long-form transcription, Pro is the right call.
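How to convert MP3 to text: the same workflow for the world's most common audio format.
How to convert WAV to text: process uncompressed studio recordings and pro audio files.
How to transcribe iPhone voice memos: the full iPhone-to-Windows voice memo transcription workflow.
Voice-to-text for journalists: interview audio, source recordings, and confidentiality concerns.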