Dictate Mandarin Chinese into any Windows application. Whisper-grade Simplified Chinese output, faster than pinyin IME typing, local processing, no audio upload. Free for 500 words a day.
What Whisper does for Chinese that pinyin IMEs cannot
Speak Mandarin, get Simplified Chinese characters at your cursor. No pinyin typing, no candidate-list disambiguation, no tone marks to think about.
Native Mandarin speakers speak around 200 to 250 characters per minute. Fast pinyin IME typists rarely exceed 100. Voice dictation is two to three times faster than typing.
Whisper uses acoustic features, not tone marks. Speak naturally with whatever tones you produce and the engine picks correct characters using sound plus context.
In Local Mode, audio never leaves your Windows machine. Important for cross-border data flow concerns and any sensitive business, legal, or government work.
WeChat for Windows, DingTalk, Feishu, Word, WPS Office, Outlook, browsers. Press the hotkey, dictate, text appears at the cursor. Same install, every app.
500 words a day on the free plan (Chinese is counted roughly one character per word), 3,500 a week. Pro is 10 dollars per month for unlimited dictation if you write long Chinese documents daily.
Typing Chinese on a Windows machine has been a compromise for as long as personal computers have existed. The dominant approach, the pinyin input method editor (IME), requires you to type the romanized pinyin spelling of each word, then disambiguate from a candidate list of homophone characters. Microsoft Pinyin (built into Windows), Sogou Pinyin, and Google Pinyin all work this way. Even with predictive candidates and personal-dictionary learning, the workflow imposes a constant cognitive interruption: type pinyin, scan the candidate bar, pick the right character, repeat.
A native Mandarin speaker can speak around 200 to 250 Chinese characters per minute in clear speech. A fast pinyin IME typist rarely exceeds 80 to 100 characters per minute, and most casual users sit at 40 to 60. The math is straightforward: for the same content, voice dictation is two to three times faster than even a fast typist, and the gap is wider for casual users. The cognitive cost is also lower because you skip the disambiguation step entirely. The engine has acoustic context and surrounding-word context, so character selection is often more accurate than what a manual IME pick would produce, especially for casual users who do not have well-trained personal dictionaries.
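The speed comparison can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the characters-per-minute figures quoted above; the function name `speedup_range` is made up for illustration.

```python
# Characters per minute, taken from the figures quoted in the text.
SPEECH_CPM = (200, 250)      # native Mandarin speech
FAST_IME_CPM = (80, 100)     # fast pinyin IME typist
CASUAL_IME_CPM = (40, 60)    # casual pinyin IME user


def speedup_range(speech, typing):
    """Return (lowest, highest) speech-to-typing ratio for two (min, max) ranges."""
    lowest = speech[0] / typing[1]    # slowest speaker vs fastest typist
    highest = speech[1] / typing[0]   # fastest speaker vs slowest typist
    return lowest, highest


fast_low, fast_high = speedup_range(SPEECH_CPM, FAST_IME_CPM)
casual_low, casual_high = speedup_range(SPEECH_CPM, CASUAL_IME_CPM)

print(f"vs fast typists:   {fast_low:.2f}x to {fast_high:.2f}x")
print(f"vs casual typists: {casual_low:.2f}x to {casual_high:.2f}x")
```

Against fast typists the ratio runs from 2 to just over 3, which is where the "two to three times" figure comes from. Against casual typists it is higher across the whole range.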
StarWhisper packages OpenAI's Whisper as a Windows-native dictation tool. The Whisper Chinese model is trained on Simplified Chinese broadcast, news, podcast, and YouTube content at substantial scale. The output is publishable Chinese characters ready to paste into a document, email, or chat. For Mandarin speakers who write a lot of Chinese on Windows (which is to say, most professional Mandarin speakers in mainland China, Singapore, and the diaspora), the workflow gain is immediate.
StarWhisper produces Simplified Chinese by default when you set the language to Chinese. The Whisper Chinese model is trained primarily on Simplified Chinese, which is the mainland China and Singapore standard and the dominant register on the Chinese-language internet. For mainland users, this matches what you want.
If you need Traditional Chinese (the Hong Kong and Taiwan standard), the cleanest workflow is to dictate in Simplified and convert. Conversion is reliable for modern written Chinese, with one caveat: a small number of Simplified characters correspond to more than one Traditional character (发 becomes 發 or 髮 depending on the word). Use a converter that works at the phrase level, such as OpenCC, and proofread names and rare terms.
For workflows where every output needs to be Traditional, this two-step approach is the cleanest path. The conversion adds maybe a second of friction per document. For one-off Traditional needs, online OpenCC converters work well.
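Simplified-to-Traditional converters generally match whole phrases before falling back to single characters, because a few Simplified characters have more than one Traditional counterpart. The sketch below illustrates that idea with toy tables. The tables and the `to_traditional` function are hypothetical and far smaller than a real dictionary, and this is not OpenCC's implementation.

```python
# Toy tables for illustration only; real converters ship dictionaries
# with thousands of phrase and character entries.
PHRASES = {
    "头发": "頭髮",   # "hair": 发 becomes 髮
    "发现": "發現",   # "discover": 发 becomes 發
}
CHARS = {
    "头": "頭",
    "发": "發",       # fallback used when no phrase matches
    "现": "現",
    "们": "們",
    "这": "這",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in PHRASES)


def to_traditional(text):
    """Convert Simplified to Traditional, trying the longest phrase first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_PHRASE_LEN, 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            # No phrase matched: map one character, leaving unknown
            # characters unchanged.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_traditional("我们发现这头发"))
```

The same Simplified character 发 comes out as 發 in one word and 髮 in the other, which a pure character-by-character table could not do.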
One nuance worth flagging: Hong Kong written Chinese is conventionally standard Chinese rather than written Cantonese. Hong Kong news, business documents, and government writing all use standard Chinese grammar with Traditional characters. So the Simplified-to-Traditional conversion produces output that fits the Hong Kong professional register cleanly. For casual Hong Kong-style writing that intentionally uses Cantonese-specific written characters, you would need to either type or use the Cantonese language setting (which has its own accuracy trade-offs covered below).
Mandarin Chinese has four lexical tones plus a neutral tone, and tonal information is essential for distinguishing words. The syllable ma in first tone (ma1, 妈) means mother, in second tone (ma2, 麻) means hemp, in third tone (ma3, 马) means horse, and in fourth tone (ma4, 骂) means scold. In a pinyin IME, you type the toneless syllable and then pick from a candidate list that includes all the homophones across tones.
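The candidate-list step can be pictured as a lookup from a toneless syllable to every character that shares it. This is a toy sketch: the `CANDIDATES` table and `candidate_bar` function are hypothetical, and a real IME dictionary holds many more characters per syllable and ranks them by frequency and context.

```python
# Hypothetical miniature IME dictionary: toneless pinyin -> candidates.
CANDIDATES = {
    "ma": [
        ("妈", "ma1", "mother"),
        ("麻", "ma2", "hemp"),
        ("马", "ma3", "horse"),
        ("骂", "ma4", "scold"),
    ],
}


def candidate_bar(pinyin):
    """Return the characters an IME would offer for a toneless syllable."""
    return [char for char, _tone, _gloss in CANDIDATES.get(pinyin, [])]


# Typing "ma" yields four choices; the user still has to pick one.
print(candidate_bar("ma"))
```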
Whisper handles tones acoustically. The model is trained on actual audio of Mandarin speech, so tonal patterns are part of the acoustic feature set it learns. You speak naturally with whatever tones you produce, and the engine picks the correct character using sound plus surrounding-word context. You do not type pinyin, you do not pick from a candidate list, you do not think about tone marks. The output is in Chinese characters directly.
One side benefit: speakers whose tone production is less precise (non-native Mandarin learners, dialect speakers whose first language is not Mandarin, kids) get the benefit of contextual disambiguation. If you say something that is acoustically ambiguous between two words but only one fits the surrounding sentence, the engine usually picks the right one. This is closer to how human listeners interpret Mandarin than to how a deterministic pinyin lookup works.
The output is always characters, never pinyin or bopomofo. If you specifically need pinyin output (for language-learning materials, for romanization tables, for academic citation), you would dictate Chinese normally and then run the character output through a pinyin annotation tool. For standard dictation use cases, character output at the cursor is what you want.
For mainland-to-overseas data flow, cross-border employee monitoring, and any business or government workflow involving Chinese-language content, the audio upload question is often the first thing decision-makers ask. The answer for StarWhisper is straightforward: in Local Mode (the default), audio never leaves your Windows machine. There is no upload, no foreign cloud processor, no telemetry of audio content, no transcript retention anywhere remote.
For users concerned about U.S. cloud providers processing Chinese-language content, or about the reverse (Chinese cloud providers being involved in non-Chinese workflows), Local Mode sidesteps both. The Whisper model runs on your CPU or GPU; the audio buffer is discarded immediately after transcription. Nothing is logged.
Cloud Mode is opt-in and clearly labeled in the UI. When enabled for a single transcription, audio is sent to the OpenAI Whisper API for that request and that request only. There is no batch upload, no background telemetry. For any work where data sovereignty matters (legal documents, journalism with sensitive sources, business communications, government work), leave Cloud Mode off. The privacy and offline mode page covers the technical detail.
This contrasts with most cloud-based Chinese transcription services where every audio segment is uploaded by definition. For Chinese-speaking professionals outside mainland China who handle sensitive content, the on-device path is often the simplest defensible posture.
Sales emails, internal memos, customer-service responses, partnership proposals. Standard business Chinese is well represented in the Whisper corpus and transcribes cleanly. Dictate into WeChat for Windows, DingTalk, Feishu, Outlook, or any Windows email client. Press the hotkey, speak the message, release. Output lands at the cursor in proper Chinese characters with appropriate business register.
News articles, feature pieces, opinion columns, interview drafts. Whisper handles journalistic Chinese register well. Standard Mandarin proper nouns (politicians, companies, places) come through correctly for names common in the training corpus; very obscure names may need correction. Long-form Chinese writing benefits more from voice dictation than English does because the per-character typing penalty is higher. The voice to text for content creators page applies equally to Chinese content workflows.
Chinese-American professionals in tech, finance, consulting, and academia routinely produce documents that mix English with Chinese. Whisper code-switching handles this well. Set the StarWhisper language to Chinese for Chinese-dominant content with English brand names and technical terms mixed in, or to Auto-detect for full bilingual switching paragraph by paragraph. The engine recognizes Microsoft, Google, Tencent (in their English forms), API, dashboard, deploy, and other technical English inline.
Mandarin learners can dictate practice sentences and see immediate character output, which is useful for verifying that what you said came out as what you meant. Tone-precision feedback is implicit: if your tones are off enough that Whisper picks the wrong character, you know to practice. The 500-word free plan is plenty for daily practice. The multi-language feature page covers the full list of supported languages if you are also learning other languages.
Professional translators working from English (or other languages) into Chinese can dictate Chinese target text directly into CAT tools like memoQ, SDL Trados Studio, OmegaT, or any translation interface that accepts text input at the cursor. This is significantly faster than typing Chinese in pinyin IMEs and reduces the cognitive cost of staying in the target language. The voice to text for translators page goes deeper into translator workflows.
Mandarin (putonghua) is one of multiple Chinese languages and is the only one with full Whisper support at high accuracy. The Whisper model also includes Cantonese (yue) as a separate language, plus partial coverage of other Chinese varieties through the Chinese (zh) language setting.
Cantonese is supported as a separate Whisper language. Set the StarWhisper language to Cantonese if you speak Cantonese. Accuracy is meaningfully lower than Mandarin because Cantonese has less training data in the Whisper corpus, but it is functional for clear broadcast-register Cantonese. The output is in Chinese characters, which may include Cantonese-specific written characters that appear in casual Hong Kong writing. For formal writing in Hong Kong (which conventionally uses standard Chinese), set the language to Chinese and dictate in Mandarin if you can.
Shanghainese (Wu), Hokkien and Teochew (Min), Hakka, and other Chinese topolects are not separately supported by Whisper. Speakers of these topolects typically write in standard Chinese (Mandarin-based written form) rather than in their spoken language, so the workflow is to set the language to Chinese and dictate in Mandarin even if your daily speech is in a topolect. For speakers who are not fully comfortable in Mandarin, this is the limitation of the current model; Whisper-class speech recognition for topolects is still a research area.
StarWhisper runs on Windows 10 and Windows 11. The free installer is around 100 MB. The Whisper model files (selected based on your hardware) download on first use. CPU-only operation works on any reasonably modern Intel or AMD machine. An NVIDIA GPU with CUDA accelerates the larger models significantly. Vulkan provides a cross-vendor GPU path for AMD and Intel discrete GPUs.
For Mandarin dictation, the medium Whisper model is the sweet spot. The small model is fast and produces acceptable results for clear speech but misses more characters in noisy conditions or for less common vocabulary. The large model gives marginal accuracy gains at substantial VRAM cost. The app picks a sensible default based on your hardware; you can change it in Settings. See the GPU acceleration page for the VRAM and speed trade-offs.
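A hardware-based default could look something like the sketch below. How StarWhisper actually chooses is not described here, so `pick_model`, the CPU-only fallback, and the "step down until it fits" rule are all assumptions. The VRAM figures are the approximate requirements listed in the openai/whisper README.

```python
# Model names from smallest to largest, with approximate VRAM needs in GB
# (rough figures from the openai/whisper README).
MODELS_BY_SIZE = ["base", "small", "medium", "large"]
APPROX_VRAM_GB = {"base": 1, "small": 2, "medium": 5, "large": 10}


def pick_model(vram_gb, preferred="medium"):
    """Return the largest model no bigger than `preferred` that fits.

    vram_gb is the available GPU memory in GB, or None for CPU-only.
    """
    if vram_gb is None:
        return "small"  # assumed CPU-only fallback, not a documented default
    start = MODELS_BY_SIZE.index(preferred)
    for name in reversed(MODELS_BY_SIZE[:start + 1]):
        if APPROX_VRAM_GB[name] <= vram_gb:
            return name
    return MODELS_BY_SIZE[0]


print(pick_model(8))                      # enough VRAM for medium
print(pick_model(4))                      # steps down from medium
print(pick_model(24, preferred="large"))  # opt in to the large model
```

Capping the default at medium reflects the trade-off described above: a larger card does not switch to the large model unless you ask for it.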
Microphone quality matters more than model size for Chinese accuracy. A USB headset or directional desk microphone produces noticeably cleaner output than laptop built-in mics. Mandarin relies on tone to tell words apart, so noise that blurs the signal can cost more accuracy than it would in English, and cleaner input audio pays off. For best results in an office, sit reasonably close to the mic (within about 20 to 30 centimeters for a desk mic) and keep it out of the airflow from fans or air conditioning.
| Plan | Dictation allowance | Price |
|---|---|---|
| Free | 500 words/day, 3,500/week (Chinese is counted roughly one character per word) | $0 |
| Pro Monthly | Unlimited | $10/month |
| Pro Annual | Unlimited | $80/year ($6.67/month) |
There is no separate Chinese language fee. The 96+ language pack including Chinese ships in the same installer. Billing is in USD through Stripe; your bank handles RMB or other currency conversion at the prevailing rate. For full pricing detail, the homepage pricing section lists what each tier includes. The no-subscription feature page explains how the free tier works without any recurring commitment.