Average typing: 40 WPM. Fast typist: 80 WPM. Average speech: 150 WPM. Voice beats every typist except the world-record sprinters, and Whisper transcribes in real time. This page breaks down the math, the edit-time tradeoff, and where voice loses.
"Is dictation faster than typing?" is the wrong framing. The right one is "for what kind of writing, and after how much editing?"
The marketing pitch for dictation tools usually says "speak as fast as you think." That implies voice is universally faster. It is not. It is faster for first-draft generation and slower or equal for editing, precision punctuation, and code. People who try voice expecting it to replace typing are often disappointed within an hour.
StarWhisper is a push-to-talk dictation app. While you hold the hotkey, it transcribes your voice. When you release, your keyboard works normally. The right framing is "use voice for the parts where voice wins, typing for the parts where typing wins." Net throughput is higher than with either method alone.
Five concrete WPM benchmarks worth knowing before you decide.
The most widely cited figure for adult typing speed on a standard QWERTY keyboard. This holds across multiple decades of typing speed surveys. Sustained over a long session, with errors and backspaces, the real-world output is often slightly lower.
Professional typists in keyboard-heavy roles (transcriptionists, legal secretaries, court reporters using QWERTY) reach 65 to 80 WPM sustained. This is roughly half of average speech. A "fast typist" by everyday standards is still slower than an average speaker.
The Monkeytype world record exceeds 300 WPM, but only over a one-minute burst on short common-word tests. No human sustains that pace over a 1,500-word piece. Almost nobody you know writes at world-record pace. The relevant comparison is against your own typing.
The standard figure for conversational English. Newscasters target 140 to 160 WPM because that range is comfortable for listeners. Spontaneous adult conversation sits in the 120 to 180 range. Modern speech recognition handles this entire range with no special configuration.
Some podcast hosts and audiobook narrators reach 180 to 200 WPM comfortably. Auctioneers go higher in short bursts. The upper limit of comfortable spoken English for most people is around 200 WPM before clarity starts to degrade.
OpenAI's Whisper model, the engine under StarWhisper, transcribes audio faster than real time (above 1x speed) on modest hardware. On a mid-range NVIDIA GPU, transcription is essentially instant. There is no waiting for the model to catch up to your speech.
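"Faster than 1x speed" is usually expressed as a real-time factor: processing time divided by audio length. The sketch below shows the calculation only. The function names and the 12-second measurement are hypothetical illustrations, not StarWhisper or Whisper benchmark figures.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def keeps_up_with_speech(processing_seconds: float, audio_seconds: float) -> bool:
    """True when transcription finishes sooner than the audio took to speak."""
    return real_time_factor(processing_seconds, audio_seconds) < 1.0


# Hypothetical measurement: 60 seconds of speech transcribed in 12 seconds.
rtf = real_time_factor(processing_seconds=12.0, audio_seconds=60.0)
print(f"RTF {rtf:.2f}, {1 / rtf:.1f}x real time")
```

To check your own hardware, time one transcription of a clip of known length and plug the two numbers in.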
Numbers verified against multiple typing-speed surveys, the Monkeytype leaderboards as of May 2026, and standard linguistic references for conversational speech rates.
| Category | Words per minute | Sustained? | Vs voice (avg 150) |
|---|---|---|---|
| Average adult typing | 40 WPM | Yes | Voice 3.75x faster |
| Professional typist | 65 to 80 WPM | Yes | Voice 2x faster |
| Very fast typist | 100 WPM | Sort of | Voice 1.5x faster |
| Monkeytype world record | 300+ WPM | No (one-min burst) | Typing 2x faster (in burst) |
| Average conversational speech | 150 WPM | Yes | Baseline |
| Fast speech (podcast, audiobook) | 180 to 200 WPM | Yes | 1.2x to 1.3x faster than baseline |
| Auctioneer burst | 250 to 350 WPM | No (short bursts) | 2x burst, not sustainable |
The takeaway is simple. Voice beats average typing by almost 4x. Voice beats a fast professional typist by about 2x. Voice still beats a very fast 100 WPM typist by 1.5x. Voice loses to world-record sprint pace, but only over one-minute bursts, which nobody sustains over a real writing session. For the universe of people who type at normal human speeds, voice is faster.
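The ratios in the table come from dividing the 150 WPM speech baseline by each typing speed. Here is a minimal sketch of that arithmetic. The function name is illustrative, and where the table gives a range, the upper bound is used as an assumption.

```python
VOICE_WPM = 150  # average conversational speech, the baseline used in the table

# Typing speeds from the table (upper bound where the table gives a range).
TYPING_WPM = {
    "average adult": 40,
    "professional typist": 80,
    "very fast typist": 100,
    "world-record burst": 300,
}


def voice_advantage(typing_wpm: float, voice_wpm: float = VOICE_WPM) -> float:
    """How many times faster voice is than typing; below 1.0 means typing wins."""
    return voice_wpm / typing_wpm


for label, wpm in TYPING_WPM.items():
    ratio = voice_advantage(wpm)
    winner = "voice" if ratio > 1 else "typing"
    print(f"{label}: {ratio:.2f}x ({winner} is faster)")
```

Swap in your own measured typing speed to see where you fall.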
Pure WPM is not the whole story. Voice dictation produces a messier first draft because you cannot see what you are saying as you say it. The first pass usually has more filler words, more false starts, and more places where you restated something. So the honest accounting is "voice WPM minus edit time."
Take a 1,500-word piece. Pure typing at 40 WPM with normal pauses to think takes about 50 to 60 minutes for a clean draft. Pure dictation at 150 WPM takes about 10 to 12 minutes for a rough draft, plus 8 to 12 minutes of editing afterward. The total dictation workflow comes to about 20 to 25 minutes. That is still 2 to 3 times faster than typing, even accounting for the cleanup.
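The time accounting above can be checked with a few lines of arithmetic. This sketch assumes a 1,500-word piece, the length this page uses elsewhere. The helper names are illustrative, and the workflow totals are the rounded figures quoted in the text.

```python
WORDS = 1500  # assumed piece length


def draft_minutes(words: int, wpm: float) -> float:
    """Minutes to produce `words` at a steady `wpm`, with no pauses."""
    return words / wpm


def speedup(typing_total_min: float, dictation_total_min: float) -> float:
    """How many times faster the dictation workflow finishes."""
    return typing_total_min / dictation_total_min


raw_dictation = draft_minutes(WORDS, 150)  # 10.0 minutes of pure speaking
raw_typing = draft_minutes(WORDS, 40)      # 37.5 minutes before pauses to think

# Totals quoted in the text: typing 50 to 60 min, dictate-then-edit 20 to 25 min.
worst_case = speedup(typing_total_min=50, dictation_total_min=25)  # 2.0
best_case = speedup(typing_total_min=60, dictation_total_min=20)   # 3.0

print(f"Raw drafting: {raw_dictation:.1f} min dictated vs {raw_typing:.1f} min typed")
print(f"Workflow speedup: {worst_case:.1f}x to {best_case:.1f}x")
```

The raw 37.5 minutes of typing becomes 50 to 60 once thinking pauses are included, which is why the comparison uses total time rather than raw WPM.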
Editing is a different cognitive task from drafting. Most edits are small targeted movements: cutting a filler word, fixing a sentence ending, restructuring two sentences. These are fast on a keyboard. They are slow if you try to dictate them. So the dictate-then-edit workflow uses each input method where it is strongest. Voice produces the bulk material at high speed, keyboard does the surgical fixes.
If you measure throughput as "finished words per total time," the dictate-then-edit workflow comfortably beats pure typing for most prose. For a 1,500-word piece finished in 20 to 25 minutes, net throughput works out to roughly 60 to 75 WPM: the underlying dictation happens at 150, and the editing time brings the average down. Typing the same piece in 50 to 60 minutes nets 25 to 30 WPM, so the combined workflow is still 2x to 3x faster than typing alone.
Voice is not a universal upgrade. It loses on tasks that require precise punctuation, dense special characters, or surgical editing. Legal contracts where the exact placement of a comma changes the meaning of a clause are typed, not dictated. Code with brackets, colons, snake_case identifiers, and indentation is faster to type than to spell out aloud. Editing existing prose is faster by keyboard than by voice because the changes are tiny and targeted. Spreadsheets and structured documents with lots of formatting are faster by keyboard shortcut than by spoken command. Voice is also impractical wherever you cannot speak freely: shared offices, libraries, public transit, or at home late at night. None of these cases means dictation is bad. They mean dictation is the wrong tool for that specific job. Push-to-talk dictation tools like StarWhisper are designed to coexist with typing, not replace it.
Most writers do a mix of these tasks every day. The right tool varies per task.
| Task | Best input | Speed advantage |
|---|---|---|
| First-draft blog post or article | Voice | 3x to 4x faster |
| Email and Slack/Teams reply | Voice | 3x to 5x faster |
| Long-form fiction first draft | Voice | 2x to 3x faster |
| Marketing copy draft | Voice | 3x to 4x faster |
| Note-taking and journaling | Voice | 3x to 4x faster |
| Editing existing prose | Keyboard | 2x faster (typing) |
| Writing code | Keyboard | 2x to 3x faster (typing) |
| Legal contracts, citations | Keyboard | Precision required |
| Spreadsheets and structured data | Keyboard | Shortcuts win |
Older dictation tools forced you into a "dictation mode" where your keyboard partly stopped working and you had to issue voice commands to switch back. That created friction. Modern push-to-talk dictation, including StarWhisper, eliminates the mode switch entirely.
You bind a hotkey, often a side mouse button, a remapped Caps Lock, or a function key. When you hold the hotkey, the app records and transcribes. When you release, your keyboard works exactly as it did before. There is no mode to enter and no mode to exit. You can dictate a sentence, release, type a correction with your fingers, hold the hotkey again, dictate the next sentence. The transitions are imperceptible.
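The hold-to-dictate behaviour described above is a two-state machine: recording while the hotkey is down, idle otherwise, with the keyboard passing through in both states. Below is a toy model of that logic. The class and method names are hypothetical, it captures no real audio or key events, and it is not StarWhisper's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PushToTalk:
    """Toy model: speech is captured only while the hotkey is held down."""
    recording: bool = False
    output: List[str] = field(default_factory=list)
    _buffer: List[str] = field(default_factory=list)

    def hotkey_down(self) -> None:
        # Start a fresh recording.
        self.recording = True
        self._buffer.clear()

    def hear(self, words: str) -> None:
        # Speech outside a hotkey hold is ignored.
        if self.recording:
            self._buffer.append(words)

    def hotkey_up(self) -> None:
        # Releasing the key emits the transcript and returns to idle.
        if self.recording and self._buffer:
            self.output.append(" ".join(self._buffer))
        self.recording = False

    def type_text(self, text: str) -> None:
        # Keyboard input always passes straight through; there is no mode to exit.
        self.output.append(text)


ptt = PushToTalk()
ptt.hotkey_down()
ptt.hear("voice is faster")
ptt.hear("for first drafts")
ptt.hotkey_up()
ptt.type_text("(typed correction)")
ptt.hear("ignored because the hotkey is not held")
print(ptt.output)
```

The point of the design is that `type_text` never checks a mode flag, which is what separates push-to-talk from the older "dictation mode" tools.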
This matters because the right workflow for most writers is "voice for bulk text generation, typing for everything else." If switching between the two has any friction at all, you stop using voice for the parts where it wins. Push-to-talk reduces the friction to zero. You can also read the deeper coverage of the dictate-then-edit method at how to write faster.
WPM is meaningless if accuracy is bad, because every error eats edit time. The good news is that modern speech recognition has closed the accuracy gap to typing for clean English audio.
OpenAI Whisper, the engine under StarWhisper, was trained on roughly 680,000 hours of multilingual audio. Its word accuracy on clear English audio falls in the 95 to 98 percent range, which is comparable to or better than a careful typist's actual error rate (typing studies put the typical error rate at 2 to 5 percent before correction). Accuracy on accented English, non-native speakers, and code-switched speech is markedly better than older HMM-based engines like the one in Dragon Professional Individual.
In practical terms, this means a dictated 1,500-word piece has about 30 to 75 errors before editing, most of them minor (a misheard word, a punctuation glitch). A typed 1,500-word piece has roughly the same error count before correction. Voice does not introduce a meaningful accuracy penalty, only a different distribution of errors. See the professional accuracy feature page for the full benchmark numbers and the model trade-offs.
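The 30 to 75 figure follows directly from the accuracy range: errors are the word count times the share of words that come out wrong. Here is a minimal sketch of that estimate. The function name is illustrative, and it assumes errors are counted per word.

```python
def expected_errors(words: int, accuracy_percent: float) -> float:
    """Expected number of wrong words at a given word accuracy (0 to 100)."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return words * (100 - accuracy_percent) / 100


WORDS = 1500
best = expected_errors(WORDS, 98)   # 30.0 errors at 98 percent accuracy
worst = expected_errors(WORDS, 95)  # 75.0 errors at 95 percent accuracy
print(f"{best:.0f} to {worst:.0f} errors in a {WORDS}-word draft")
```

The same function applies to typing: a 2 to 5 percent typing error rate is equivalent to 95 to 98 percent accuracy, which is why the two error counts land in the same range.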
If you searched "voice typing vs typing speed" or "is dictation faster than typing," you are probably trying to make one of these decisions.
In all six cases, the experiment is free, takes 5 minutes to set up, and you will know within one session whether voice fits your brain. The downside is small, the upside is large.
Voice typing beats average typing by almost 4x in raw WPM and still wins by 2x to 3x after edit time. It loses to world-record typing sprints but only over short bursts that nobody sustains in real work. Per-task, voice wins for bulk prose generation, email, notes, and creative drafting; typing wins for editing, code, formatting-heavy documents, and precision content. The best workflow uses both, with push-to-talk dictation making the switch frictionless.
StarWhisper runs locally on Windows 10 and 11, is free for personal use (unlimited Pro costs $10 per month), and supports 96 languages. There is no per-user voice training step and no setup beyond picking a hotkey. If you are trying to find out whether voice fits your workflow, the cost of the test is 5 minutes and zero dollars.