Outlook's built-in Dictate button needs Microsoft 365 and uses Microsoft's older speech engine. StarWhisper is free, runs locally, uses OpenAI Whisper for higher accuracy, and works in classic Outlook, new Outlook, and Outlook on the web.
Download StarWhisper from the homepage or the Microsoft Store. The installer is around 200 MB because it bundles the Whisper model. No account signup, no credit card, no cloud requirement. After install, launch the app once and complete the 30-second setup: pick the default microphone, choose a hotkey, and accept the defaults for the rest. The app then sits in the system tray and stays out of the way.
Launch whichever Outlook you use: classic desktop, new Outlook (the rebuilt, web-based version), or Outlook on the web in a browser. Click New Email (or press Ctrl+N) to open a compose window. This works the same way for replies and forwards. You do not need to enable anything inside Outlook itself: no add-in, no setting, no toggle. StarWhisper operates outside Outlook entirely.
Place the cursor inside the message body, exactly where you want the dictated text to appear. You can also dictate into the Subject line, To/Cc/Bcc fields, or any other text input in Outlook by clicking there first. The rule is simple: wherever the keyboard cursor is, that is where StarWhisper will type. This is why it works equally well in classic, new, and web versions of Outlook.
Press and hold the default hotkey (Right Alt). A small recording indicator appears in the corner of your screen. Speak the email naturally. You can say "comma", "period", "new paragraph", "question mark", or just let the engine punctuate from intonation, which Whisper does well. The shorter and more natural the utterance, the better the transcription, but it handles long monologues too.
Release the hotkey. Within a second or two (less if you have a GPU) the transcript appears in the Outlook body, typed in like keyboard input. Read through, fix the rare error, hit Send. You can also hold the hotkey again for additional bursts of dictation to add paragraphs as you go, or mix dictation with normal typing for technical bits (model numbers, URLs) that voice handles poorly.
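The hold, speak, release flow described in these steps can be sketched as a small state machine. This is only an illustration under stated assumptions, not StarWhisper's actual implementation: `PushToTalkSession`, `transcribe`, and `type_text` are hypothetical names, and the speech engine and keystroke injection are replaced by stand-in callables so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalkSession:
    """Hypothetical hold-to-talk flow: record while held, transcribe on release."""

    transcribe: Callable[[bytes], str]  # stand-in for a local speech-to-text call
    type_text: Callable[[str], None]    # stand-in for OS-level simulated keystrokes
    _chunks: List[bytes] = field(default_factory=list)
    _recording: bool = False

    def on_hotkey_down(self) -> None:
        # Each press starts a fresh burst; earlier bursts were already typed out.
        self._chunks.clear()
        self._recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio that arrives while the hotkey is not held is ignored.
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_up(self) -> None:
        if not self._recording:
            return
        self._recording = False
        text = self.transcribe(b"".join(self._chunks)).strip()
        if text:
            # The text goes to whichever field currently has keyboard focus.
            self.type_text(text)


# Demo with fakes: the "engine" just decodes bytes, the "keyboard" is a list.
typed: List[str] = []
session = PushToTalkSession(
    transcribe=lambda audio: audio.decode("utf-8"),
    type_text=typed.append,
)
session.on_audio(b"not recorded")      # hotkey not held yet
session.on_hotkey_down()
session.on_audio(b"Hi team, ")
session.on_audio(b"see notes attached.")
session.on_hotkey_up()
print(typed)  # ['Hi team, see notes attached.']
```

Because the sketch only ever calls `type_text`, it has no knowledge of Outlook at all, which mirrors the point that the text lands wherever the cursor is.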
Higher accuracy, no subscription, more places it works.
Outlook's Dictate button only appears in Microsoft 365 subscription accounts and recent perpetual licenses. StarWhisper has no Microsoft dependency at all. Free Outlook on the web, an older Outlook 2016, or any 365 tier all get the same dictation.
Whisper benchmarks at roughly 98 percent accuracy. Microsoft Speech (used by Outlook Dictate) sits closer to 88 to 92 percent depending on accent and vocabulary. The difference shows up most on accented English, names, technical terms, and longer utterances. Accuracy details.
The three Outlook flavors are diverging right now. Microsoft's Dictate button is most reliable in classic and the 365 web client. StarWhisper works identically in all three because it types at the OS layer, not as an Outlook add-in.
Whisper supports 96 languages and detects which one you are speaking automatically. Reply in English, then immediately reply in German, then Japanese, no setting changes needed. Useful for international support, sales, and recruiting teams.
StarWhisper Local Mode processes audio entirely on your CPU or GPU. Audio never leaves the machine. Useful for compliance contexts (legal, medical, HR), for working on planes or in low-signal areas, or for shared workstations where outbound audio is restricted. Privacy details.
Same install dictates into Word, Excel, Teams, browsers, Slack, Notion, and any other Windows text field. Universal-app coverage.
Microsoft added a Dictate button to Outlook around 2018 and improved it incrementally since. It works. For one-off short messages on a high-end mic in a quiet room, it does the job. But there are three persistent issues that drive people to look for alternatives.
First, the speech model. Microsoft uses its own speech engine, which has been around for years and gets steady but small updates. OpenAI Whisper, by contrast, was trained on 680,000 hours of multilingual audio and benchmarks roughly 6 to 10 percentage points lower in word error rate on standard test sets, with the gap widening on accented English, technical vocabulary, names, and longer utterances. For a software engineer dictating a bug report or a recruiter dictating candidate notes after an interview, those are exactly the conditions where Microsoft Speech struggles.
Second, the subscription gate. The Dictate ribbon button is a Microsoft 365 feature. If you are on a perpetual-license Outlook 2019 or an older version, it simply is not there. If your organization keeps you on a cheaper 365 tier or a non-business edition, the button may also be missing.
Third, the new Outlook transition. Microsoft is rolling out a new Outlook based on web technologies to gradually replace the classic Win32 desktop app. The Dictate button works in both, but the underlying behavior is slightly different, hotkeys differ, and add-ins do not transfer cleanly. People in mixed-version organizations end up with inconsistent dictation behavior. StarWhisper sidesteps all of this because it is not part of Outlook; it types into the active text field at the OS level.
Approximate word error rate (lower is better) across common email-dictation conditions:
| Condition | Microsoft Dictate | StarWhisper (Whisper) |
|---|---|---|
| Native English, quiet room, headset mic | ~5% WER | ~2% WER |
| Native English, laptop mic, office noise | ~10% WER | ~4% WER |
| Accented English (Indian, Chinese, Eastern European) | ~15-22% WER | ~5-8% WER |
| Technical vocabulary (model numbers, jargon, acronyms) | ~18% WER | ~6% WER |
| Long-form (3+ paragraphs) | ~12% WER | ~3% WER |
| Non-English (German, Spanish, French, Japanese) | ~12-25% WER | ~3-7% WER |
Numbers are approximate and vary by mic, room, and content. The pattern holds across the conditions that matter most for email dictation: longer messages, technical content, and accented speakers.
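For readers who want to check figures like these on their own recordings, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a minimal, assumption-laden version: it lowercases and splits on whitespace only, with no punctuation or number normalization, and the function name `word_error_rate` and the sample sentences are illustrative rather than taken from any benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "please send the q3 report by friday"
dropped_word = "please send the q3 report friday"
split_term = "please send the queue three report by friday"

print(round(word_error_rate(reference, dropped_word), 3))  # 0.143 (1 error / 7 words)
print(round(word_error_rate(reference, split_term), 3))    # 0.286 (2 errors / 7 words)
```

The second example shows why technical vocabulary is costly: one misheard term ("q3" heard as "queue three") counts as two errors, a substitution plus an insertion.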
StarWhisper does not have a special concept of "Outlook fields", which is what makes this work simply. Wherever the keyboard cursor is, that is where the transcribed text lands. A few practical patterns:
Click into the Subject field. Hold the hotkey. Say the subject. Release. Done. Subjects tend to be short, so the dictation completes almost instantly.
Click into the body. Hold and dictate the full message in one go, or in several bursts. You can think between bursts; the engine processes each burst independently when you release the hotkey. Mix dictation with normal typing for chunks that are awkward to say (URLs, codes, technical configs).
Dictating into the To, Cc, and Bcc fields is less useful because Outlook's autocomplete is fast for known contacts. But you can dictate the start of an address, then let autocomplete take over once the first letters land.
Replies and forwards follow the same flow as a new message. Click into the reply body above the quoted text, dictate, send. The quoted thread is unaffected.
The same hotkey dictates into a calendar invite description, a Notes app, OneNote, Teams chat, or any other Microsoft text input. There is no special integration; it just works because every Outlook field is a Windows text input under the hood.
Email dictation gets interesting for people working across languages. A recruiter in Berlin might reply to a German hiring manager in German, then an English candidate in English, then a Japanese applicant in Japanese, all in the same morning. With Microsoft Dictate that requires manually switching the dictation language each time. With Whisper's automatic language detection, no switching is needed.
Strong languages where Whisper accuracy is roughly comparable to native English: German, Spanish, French, Italian, Portuguese, Dutch, Polish, Japanese, Mandarin Chinese, Korean, Russian, Arabic, Turkish, Hindi, Swedish, Danish, Norwegian, and Finnish. The multi-language page covers the full list. Less common languages also work but accuracy varies more.
For sales teams writing to international leads, the voice-to-text for sales reps guide has more on multilingual outbound. For HR teams running cross-border recruiting, the voice-to-text for HR managers page covers candidate communication workflows.
Email content is often confidential: client communications, HR matters, legal advice, financial details, customer support tickets with personal data. StarWhisper's default Local Mode means the audio you speak never leaves the device. The transcription runs locally on CPU or GPU, the text lands in Outlook, end of pipeline. No third-party service holds a recording of you discussing a deal or a personnel matter.
Compare to cloud dictation services that send audio to a transcription API. Even Microsoft Dictate sends audio to Microsoft's cloud for processing in some configurations. For organizations subject to HIPAA, attorney-client privilege rules, GDPR concerns, or internal data-residency policies, local-only dictation is much easier to defend in an audit. The HIPAA FAQ walks through the specific architecture and what it means for protected health information.
For deeper detail on the audio path and the architecture choices behind Local Mode, see the local vs cloud Whisper FAQ.
Average typing speed is around 40 words per minute. Average comfortable speaking speed is 130 to 150 words per minute. In practice, dictation does not produce a 3 to 4x improvement because you have to think about what to say and you spend some time correcting errors, but a realistic 1.5 to 2.5x speedup is common for long-form email. Where the saving really lands is fatigue: dictating for an hour is far less physically tiring than typing for an hour, especially for people managing wrist or shoulder strain.
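The arithmetic behind these figures is easy to rerun with your own numbers. The sketch below takes the 40 words-per-minute typing speed and the 1.5x to 2.5x realistic speedup range from the paragraph above; the daily volume of 3,000 words is purely an assumption for illustration, as is the helper name `minutes_saved`.

```python
TYPING_WPM = 40  # average typing speed quoted above


def minutes_saved(words_per_day: int, speedup: float,
                  typing_wpm: float = TYPING_WPM) -> float:
    """Minutes saved per day if dictation is `speedup` times faster than typing."""
    typing_minutes = words_per_day / typing_wpm
    dictation_minutes = typing_minutes / speedup
    return typing_minutes - dictation_minutes


WORDS_PER_DAY = 3000  # assumed daily email volume, not a figure from the text

for speedup in (1.5, 2.5):
    saved = minutes_saved(WORDS_PER_DAY, speedup)
    print(f"{speedup}x speedup: {saved:.0f} minutes saved per day")
# 1.5x speedup: 25 minutes saved per day
# 2.5x speedup: 45 minutes saved per day
```

At 3,000 words, typing alone takes 75 minutes, so even the conservative end of the range recovers a meaningful share of the day; lighter email volumes scale the saving down proportionally.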
For knowledge workers replying to large email volumes, the 30 to 60 minutes per day saved by dictation accumulates fast. For sales reps replying to leads from the road (on a laptop, in a car, on a train) the productivity case is even clearer. For people in physical therapy for RSI, switching to dictation is often part of the standard recovery protocol; see the voice-to-text for carpal tunnel guide for more.