Problem to Fix

Why Is Windows Dictation So Bad?
(And the Free Fix)

Windows Voice Typing (Win+H) uses Microsoft's pre-transformer speech models. Accuracy on clear English hovers around 88 percent. Accents break it. Other languages break it. OpenAI Whisper is the modern alternative: around 98 percent accuracy on clear English, strong on accents and across 96 languages, and it runs free and locally on the same Windows PC.

"Whisper accuracy: 98%. Win+H: 88%. Same mic."

The Accuracy Gap, in Plain Numbers

Same microphone, same Windows PC, two different speech models.

The built-in

Windows Voice Typing (Win+H)

Microsoft's built-in speech recognition is convenient but runs on an older recognition stack. Accuracy on clear American English benchmarks around 88 percent (roughly one error every eight words). On accented English it falls into the 70s. On most non-English languages it is unusable for actual writing. It is free, it is built in, and it works for grocery lists.

The fix

OpenAI Whisper via StarWhisper

Whisper is a modern transformer speech recognition model from OpenAI, trained on 680,000 hours of audio. Independent benchmarks put accuracy around 97 to 98 percent on clear English, with strong performance on accents and 96 languages. StarWhisper bundles Whisper into a free Windows app that runs locally on your PC. Same microphone. Substantially better text.
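To make the headline percentages concrete, they can be converted into how often an error shows up while you dictate. The sketch below is plain arithmetic on the approximate figures quoted above; the function name is illustrative and not part of any tool.

```python
def words_per_error(accuracy_percent: float) -> float:
    """Average number of words dictated per transcription error."""
    error_percent = 100 - accuracy_percent
    if error_percent <= 0:
        raise ValueError("accuracy must be below 100 percent")
    return 100 / error_percent


for name, accuracy in [("Win+H", 88), ("Whisper", 98)]:
    print(f"{name}: about one error every {words_per_error(accuracy):.0f} words")
```

At 88 percent that is roughly one error every 8 words. At 98 percent it is one every 50, so a 300-word email needs about 36 corrections in the first case and about 6 in the second.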

Six Things Whisper Gets Right That Win+H Gets Wrong

Specific accuracy differences you will notice on day one

Accented English

Indian English, Scottish, Singaporean, South African, Caribbean, Australian. Whisper was trained on all of them. Win+H was trained primarily on American English and shows it. The gap is much larger than the headline 10 points.

Non-English languages

Whisper handles 96 languages. Win+H supports a much shorter list and accuracy varies widely. For German, French, Spanish, Mandarin, Japanese, Korean, Hindi, Arabic, Russian, and most others, the gap is functionally the difference between usable and unusable.

Technical vocabulary

Whisper handles programming terms, medical vocabulary, legal language, and scientific terminology more accurately because the training corpus included that content. Win+H tends to replace technical words with similar-sounding common English words.

Proper nouns

Names of people, places, brands, products. Whisper preserves more of them. Win+H frequently mangles non-English names or substitutes a phonetic guess.

Longer dictation

Whisper holds context across sentences and produces more coherent paragraphs. Win+H is optimized for short utterances and tends to lose the thread on multi-sentence dictation.

Punctuation and casing

Whisper inserts punctuation contextually and respects sentence boundaries. By default, Win+H requires you to say "comma" and "period" explicitly, which slows down natural speech and produces awkward transcripts.

Why Windows Dictation feels stuck in 2014

Microsoft has shipped speech recognition on Windows for over twenty years. Windows Vista had Windows Speech Recognition (WSR), the desktop dictation and voice-command tool that almost nobody used. Windows 10 added a Voice Typing redesign in 2017, accessible via the Win+H hotkey. Windows 11 polished the UI further. What has not changed in any meaningful way is the underlying speech model.

The underlying acoustic model in Windows Voice Typing dates to the pre-transformer era. It uses recurrent neural network architectures trained on a relatively small corpus of mostly American English. By contrast, the field has moved on twice over: first to transformer-based models, then to massive-scale multilingual pretraining. Whisper is the most prominent open example of the second wave, with 680,000 hours of training data across 96 languages.

The accuracy gap is structural, not a tuning problem. Microsoft is presumably working on next-generation speech, but for now, the built-in Windows tool sits on top of older tech. If you have ever wondered why dictation on your Pixel phone or your iPhone feels more accurate than on your Windows laptop, it is the same explanation: those phones run newer models.

Concrete examples of where Win+H fails

The accuracy difference shows up immediately on real sentences. Below are typical examples from user reports. The spoken column is what was said. The Win+H column is the verbatim output. The Whisper column is what StarWhisper produced from identical audio.

| Spoken | Win+H output | Whisper (StarWhisper) output |
| --- | --- | --- |
| "The deployment went to staging at 3 PM" | the deployment went to staging at three p m | The deployment went to staging at 3 PM. |
| "Schedule a meeting with Aoife on Thursday" | schedule a meeting with eva on Thursday | Schedule a meeting with Aoife on Thursday. |
| "The patient reported intermittent dyspnea" | the patient reported intermittent disney | The patient reported intermittent dyspnea. |
| "Refactor the auth middleware to use JWT tokens" | refactor the off middleware to use jay w t tokens | Refactor the auth middleware to use JWT tokens. |
| "Send the contract to [email protected]" | send the contract to monara at example dot com | Send the contract to [email protected]. |

These examples are not cherry-picked. They are representative of the kind of error you see if you dictate for any length of time with anything other than the most generic American English vocabulary.
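Accuracy figures like the ones on this page are usually reported as word error rate (WER): the number of word substitutions, insertions, and deletions divided by the number of words actually spoken. The following is a minimal sketch of that calculation, assuming a simple normalization (lowercase, punctuation stripped); real benchmarks use more careful text normalization, and the function names here are illustrative.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and strip punctuation so only word choice is compared."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


spoken = "The deployment went to staging at 3 PM"
print(word_error_rate(spoken, "the deployment went to staging at three p m"))
print(word_error_rate(spoken, "The deployment went to staging at 3 PM."))
```

The first call scores 0.375 (three edits against eight spoken words) and the second scores 0.0. A single sentence is far too small a sample to stand in for a benchmark, but it shows how one mangled phrase moves the number.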

What Whisper does differently, technically

The accuracy difference is not magic; it is architecture and scale. Whisper is a sequence-to-sequence transformer trained end-to-end on a massive, diverse audio corpus. StarWhisper bundles the Whisper model and runs it on your Windows PC locally.

Bigger and more diverse training data

OpenAI trained Whisper on roughly 680,000 hours of audio collected from the web, including 117,000 hours of multilingual data and 125,000 hours of translation data. This is roughly two orders of magnitude more than what the older Microsoft stack was trained on. Larger and more diverse training data is the single biggest reason Whisper handles accents, technical vocabulary, and non-English languages well.
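The split of those hours is easier to see as shares of the total. The sketch below uses only the three figures quoted above. The remainder is obtained by subtraction, and treating it as English transcription audio is an assumption of this example.

```python
TOTAL_HOURS = 680_000
MULTILINGUAL_HOURS = 117_000  # non-English transcription audio
TRANSLATION_HOURS = 125_000   # translation data
# Assumption: everything left over is English transcription audio.
ENGLISH_HOURS = TOTAL_HOURS - MULTILINGUAL_HOURS - TRANSLATION_HOURS


def share(hours: int) -> int:
    """Percentage of the full corpus, rounded to the nearest whole number."""
    return round(100 * hours / TOTAL_HOURS)


for label, hours in [
    ("English", ENGLISH_HOURS),
    ("Multilingual", MULTILINGUAL_HOURS),
    ("Translation", TRANSLATION_HOURS),
]:
    print(f"{label}: {hours:,} hours ({share(hours)}%)")
```

Roughly a third of the corpus is therefore not English transcription, which is consistent with the model's strength outside clear American English.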

Transformer architecture

Whisper uses an encoder-decoder transformer, the architecture originally introduced for machine translation and a close relative of the decoder-only transformers behind GPT. Transformers are much better at long-range context than the recurrent models that dominated speech recognition through the 2010s. It is why Whisper produces coherent paragraphs while older systems produce coherent sentences and lose the thread between them.

Multitask training

Whisper was trained jointly on multiple speech tasks: transcription, translation, language identification, and voice activity detection. This multitask setup produces a model that stays robust in conditions where a single-task model would typically degrade. In practice, it means Whisper handles silent gaps, background noise, and language switching gracefully.
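In the open-source Whisper release, the task is selected by a short sequence of special tokens placed at the start of the decoder input: a start marker, a language tag, a task tag, and an optional no-timestamps flag. The helper below is a simplified illustration of that prompt format, not code from Whisper or StarWhisper. The token spellings follow the published format, and the function name is hypothetical.

```python
def build_task_prompt(language: str, task: str = "transcribe",
                      timestamps: bool = False) -> list[str]:
    """Assemble the special tokens that tell the decoder what to do."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        prompt.append("<|notimestamps|>")
    return prompt


# Same model, same audio pipeline, two different jobs:
print(build_task_prompt("en"))               # English dictation
print(build_task_prompt("de", "translate"))  # German speech to English text
```

Because one set of weights serves every task, switching from dictation to translation changes the prompt and nothing else. The language tag is also what the model predicts when it identifies a language, and a separate `<|nospeech|>` token covers segments with no speech.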

It runs locally

Because Whisper is open source and reasonably sized, it fits on a consumer Windows machine and runs at usable speeds on CPU. That is why StarWhisper can package it as a free local tool. No cloud subscription is involved, no audio leaves your PC, and the accuracy advantage applies regardless of internet connectivity. The full detail of how the model runs locally is on the privacy and offline features page.
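For readers who want to see local transcription without any app, here is a minimal sketch using the open-source `openai-whisper` Python package. It is not StarWhisper's implementation, which is not documented on this page. It assumes `pip install openai-whisper` and an `ffmpeg` binary on the PATH; the helper names and the file name `meeting.wav` are hypothetical.

```python
def decode_options(language=None, use_gpu=False):
    """Build keyword arguments for Whisper's transcribe() call."""
    # Half precision only helps on a GPU; on CPU, request full precision.
    options = {"task": "transcribe", "fp16": use_gpu}
    if language is not None:
        options["language"] = language  # skip auto-detection, e.g. "en"
    return options


def transcribe_locally(audio_path, model_name="small", language=None):
    """Transcribe an audio file on this machine; no audio is uploaded."""
    import whisper  # imported lazily so the helper above has no dependencies

    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path, **decode_options(language))
    return result["text"].strip()


# Example call (needs the package, ffmpeg, and a real audio file):
# print(transcribe_locally("meeting.wav", language="en"))
```

After the model weights have been downloaded once, the call works with the network disconnected, which is the property the paragraph above describes.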

Where Win+H wins, honestly

The built-in tool has its place

Windows Voice Typing is free, it is built in, it ships on every Windows 10 and 11 machine, and it requires zero setup. For the case where you want to dictate a single sentence into a text box and you do not care about accents, technical vocabulary, or non-English, it works. Many users get the same kind of value on phones, where the equivalent built-in dictation is also good enough for short messages.

If your dictation needs are limited to "occasional short sentence in Notepad, in clear American English, with no proper nouns," there is no reason to install anything else. The friction of installing a separate app is not worth it for one sentence every few weeks.

Specifically, Win+H is fine when

  • You only dictate occasionally. Once a week, one sentence at a time, into casual text fields.
  • Your speech is clear American English. Standard vocabulary, no proper nouns, no acronyms.
  • You do not want to install anything. It is already there, zero setup cost.
  • You are testing voice input before committing. Win+H tells you whether voice in general works for your workflow.

Side-by-side feature comparison

| Capability | Windows Voice Typing (Win+H) | StarWhisper (Whisper) |
| --- | --- | --- |
| Clear English accuracy | ~88% | ~97-98% |
| Accented English | Weak | Strong |
| Non-English languages | Limited | 96 languages |
| Technical / medical / legal vocabulary | Mangled | Preserved |
| Auto punctuation | Manual ("comma", "period") | Automatic |
| Auto numerals (3 PM vs three p m) | No | Yes |
| Audio leaves your device | Yes (Microsoft cloud) | No (Local Mode) |
| Works offline | No | Yes |
| GPU acceleration | No | NVIDIA CUDA + Vulkan |
| Cost | Free, built-in | Free up to 500 words per day, $10/mo unlimited |
| Hotkey | Win+H (fixed) | Configurable |
| Works in any text field | Most | All |

How to install the fix and keep Win+H

You do not have to choose. Both can coexist. Here is the simplest path.

Install StarWhisper

  • Download the free installer from the StarWhisper homepage
  • Run the installer. Default settings are fine. The bundled Whisper model is included.
  • The app launches and lives in your system tray

Configure your hotkey

  • Open StarWhisper settings
  • Pick a hotkey that does not conflict with Win+H. Many users pick a side key like the menu key, or remap Caps Lock.
  • Test by opening Notepad, pressing the hotkey, and speaking a sentence

Keep Win+H as a fallback

  • Win+H still works. Use it for whatever quick cases you prefer the built-in tool for.
  • Use StarWhisper for everything that needs accuracy or non-English support

Most users find that within a week they stop pressing Win+H entirely because the accuracy difference is large enough that the built-in tool becomes annoying by comparison. If you want a deeper comparison of the two tools side by side, the dedicated StarWhisper vs Windows Voice Typing page covers the trade-offs in more detail.

Hardware: what your machine needs

Whisper is a real neural network and it does want some compute to run quickly, but the requirements are modest by 2026 standards.

The minimum case

  • Windows 10 (64-bit) or Windows 11
  • A multi-core x64 CPU made in the last 7-8 years
  • 4 GB of RAM (8 GB recommended for the larger Whisper models)
  • Around 1 GB of free disk space for bundled model files

The fast case

  • NVIDIA GPU with CUDA support (any GTX 10-series or newer is enough)
  • 16 GB of system RAM
  • SSD storage (not strictly required, just nicer)

For older or lower-spec machines, StarWhisper picks a suitable Whisper model size automatically. The small model runs in real time on basically any modern Windows laptop, even one with only integrated graphics. The medium and large models are slower but more accurate, and they benefit from a GPU. Vulkan is available as a cross-vendor GPU path for AMD and Intel cards.
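How StarWhisper chooses a model size is not documented here, so the following is only a sketch of one plausible heuristic: pick the largest model whose approximate memory footprint fits the memory you can spare. The figures are the rough per-model memory requirements listed in the open-source Whisper README; the function name and the fallback to the smallest model are assumptions of this example.

```python
# Approximate memory needed per model size, in GB (rough guides only).
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model_size(available_gb: float) -> str:
    """Return the largest model whose approximate footprint fits."""
    chosen = "tiny"  # fall back to the smallest model on very tight machines
    for name, needed_gb in MODEL_MEMORY_GB.items():  # ordered small to large
        if needed_gb <= available_gb:
            chosen = name
    return chosen


for spare_gb in (2, 4, 6, 12):
    print(f"{spare_gb} GB spare -> {pick_model_size(spare_gb)}")
```

With these figures a machine that can spare 2 to 4 GB lands on the small model, and medium or large become options only with 5 GB or 10 GB free, which in practice tends to mean a discrete GPU or plenty of system RAM.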

If you are asking "why is Windows dictation so bad" because you want a free local fix that respects your hardware, the short answer is that this works on machines you already own. There is more detail on the professional accuracy features page.

What about specific Windows dictation problems

"Windows voice typing not working at all"

This is a common Win+H complaint. The fix from Microsoft's support docs is usually to reset speech permissions or reinstall language packs. If you have hit this multiple times and want a more stable tool, installing a separate dictation app is a reasonable workaround. StarWhisper runs independently of the Windows speech stack, so it does not break in the same ways.

"Windows dictation doesn't punctuate"

Win+H does not auto-punctuate by default. You can enable a setting called "auto-punctuation" in some recent Windows builds but the behavior is inconsistent. Whisper handles punctuation contextually based on sentence structure, so spoken pauses become commas, ends become periods, and so on, without manual intervention.

"Windows dictation doesn't understand my accent"

This is the single most common complaint and the one where switching helps most. Whisper transcribes accented English with accuracy close to what it achieves on unaccented speech. If your accent is anything other than American, the gap is large enough that switching to a Whisper-based tool feels like getting glasses for the first time.

"Windows dictation doesn't work in [specific app]"

Win+H works in most standard Windows text fields but has edge cases in particular apps. StarWhisper uses the same paste mechanism as any other Windows IME, so it works wherever your keyboard works, including in apps where Win+H fails. This applies to Word, Outlook, Chrome address bars, Slack, and so on. The dedicated offline voice dictation FAQ walks through the compatibility list.

Cost: free to start, $10/month if you need unlimited

The free plan covers 500 words per day, which is enough to evaluate the accuracy difference on real work for a week or two. If you find yourself using dictation heavily (writers, researchers, content creators, anyone who produces more than a few thousand words per day), Pro is $10 per month or $80 per year. There is no per-seat math and no upsell tier. Pricing details are in the pricing section of the homepage.
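A quick way to decide between the plans is to run the numbers on your own output. The sketch below uses only the prices and the daily limit quoted above. It assumes the free allowance resets each day and that unused words do not carry over, which this page does not confirm.

```python
import math

MONTHLY_PRICE = 10       # USD, Pro billed monthly
YEARLY_PRICE = 80        # USD, Pro billed yearly
FREE_WORDS_PER_DAY = 500


def yearly_savings() -> int:
    """Dollars saved per year by paying annually instead of monthly."""
    return MONTHLY_PRICE * 12 - YEARLY_PRICE


def days_on_free_plan(total_words: int) -> int:
    """Days needed to dictate a document within the free daily limit."""
    return math.ceil(total_words / FREE_WORDS_PER_DAY)


print(f"Annual billing saves ${yearly_savings()} per year")
print(f"A 3,000-word draft takes {days_on_free_plan(3000)} days on the free plan")
```

Annual billing saves $40 over twelve monthly payments, and a 3,000-word draft spread across the free allowance takes six days, so anyone writing at that volume regularly is the intended Pro user.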

For writers in particular, the speed of Whisper-based dictation is the main attraction once accuracy is no longer the blocker. See voice to text for writers for the long-form writing workflow specifically.

Frequently Asked Questions

What is wrong with Windows Voice Typing (Win+H)?
Windows Voice Typing uses Microsoft's older speech recognition stack, which dates to the pre-transformer era. Accuracy on clear North American English is around 88 percent. It degrades quickly on accented English, technical vocabulary, proper nouns, and non-English languages. It also struggles with longer-form dictation because it does not maintain enough context between utterances. None of this is news to Microsoft; the underlying tech is just old.
Why is OpenAI Whisper more accurate?
Whisper is a newer transformer-based speech recognition model from OpenAI, trained on around 680,000 hours of multilingual audio. The training corpus is roughly two orders of magnitude larger than what the older Microsoft stack was trained on, and the architecture is more modern. Independent benchmarks consistently put Whisper accuracy on clear English around 97 to 98 percent, with strong performance on accents and non-English languages where the Windows stack collapses.
Do I have to uninstall Windows Dictation to use StarWhisper?
No. Windows Voice Typing and StarWhisper coexist peacefully. They use different hotkeys (Win+H for the built-in, configurable for StarWhisper) and do not interfere with each other. You can keep using Win+H for quick single-sentence dictation and reach for StarWhisper when you need accuracy on longer text, accented English, or non-English content. Most users just stop opening Win+H once they have StarWhisper running.
Can I use both at the same time?
Technically you can have both installed and active. In practice, they listen to the same microphone, so triggering both at once makes them compete for the same audio input. Pick one per session. Most users either replace Win+H entirely or use StarWhisper for content and keep Win+H for the rare case where it is faster to hit Win+H than to switch.
What about accents? Does Whisper handle them better?
Yes, substantially better. Whisper was trained on multilingual audio that included a wide range of regional accents, code-switching, and second-language speakers. Indian English, Scottish, Caribbean, Singaporean, South African, Australian, all transcribe with high accuracy. Windows Voice Typing was trained primarily on American English and shows it. If your accent is anything other than North American, the accuracy gap is much larger than 10 percentage points.
What about other languages?
Whisper supports 96+ languages, including German, French, Spanish, Italian, Portuguese, Dutch, Polish, Swedish, Danish, Norwegian, Finnish, Czech, Hungarian, Romanian, Japanese, Chinese, Korean, Hindi, Russian, Arabic, Turkish, Vietnamese, Thai, Indonesian, and Ukrainian among others. Windows Voice Typing supports a shorter list and accuracy varies widely by language. For non-English dictation on Windows, the gap is large enough that Whisper is functionally the only practical option.
Does Whisper run on an integrated GPU?
Whisper runs on CPU just fine, which means any modern Windows laptop, including those with only integrated graphics, can run it. NVIDIA GPUs accelerate transcription via CUDA. AMD and Intel GPUs are supported through the Vulkan path. If you have no discrete GPU at all, the CPU path is still fast enough for real-time dictation on the small and medium Whisper models that StarWhisper uses by default.
What about older Windows 10 machines? Will Whisper still work?
Yes. StarWhisper supports Windows 10 and Windows 11. The minimum requirements are modest: a multi-core x64 CPU, 4 GB of RAM (8 GB recommended for the larger Whisper models), and around 1 GB of disk space for the bundled model files. Machines from the last 7-8 years all run it without issue. The CPU path runs on essentially any Windows 10 machine.

Replace Win+H With Something Accurate

Free plan covers 500 words per day. Runs locally on your Windows PC. No setup beyond install.

Download StarWhisper