Speak in your target language. See what got transcribed. If the words came out right, your pronunciation was intelligible. If not, you can hear the gap. StarWhisper handles 96 languages, runs locally on Windows, and the free plan covers daily practice.
Six properties that matter when you are learning a new language
Speak a sentence in your target language. The transcription is what the model heard. If it matches, you are intelligible. If not, you can see exactly which word failed.
Switch from Spanish to Japanese to German with a couple of clicks. No per-language downloads, no separate apps. The same hotkey works across all of them.
Whisper was trained on diverse accent data, including learners speaking languages they did not grow up with. You do not need a native accent for the system to work.
Build flashcards faster by dictating both sides of a card in any language. Type responses in Duolingo's web app without keyboard-switching headaches.
Mandarin, Vietnamese, Thai, Cantonese, and other tonal languages work. If your tones are off, the wrong character may appear, which is direct feedback.
500 words per day and 3,500 per week, no credit card required. Enough for daily journaling, flashcard building, and pronunciation drills in your target language.
Pronunciation feedback used to require a teacher, a tutor on italki or Preply, or a friend who already speaks the language. They listen to you, they tell you what is off, you try again. The bottleneck is the cost and scheduling of another human being. For solo learners between sessions, the cheapest pronunciation feedback was usually no feedback at all. You said the word, you moved on, you had no idea whether it was right.
A modern transcription model is a passable substitute for the listening half of that loop. If you say a Spanish sentence and a model trained on thousands of hours of Spanish audio writes down what you said correctly, your pronunciation was intelligible enough for the model. That is a useful proxy for "intelligible enough for a real Spanish speaker." If the model writes down something different, the difference is informative: it usually means the word you said sounded closer to what the model wrote than to what you intended. You can hear the gap, try a new shape with your mouth, and see what gets transcribed.
StarWhisper is a dictation app that gives you that loop in any Windows text field, in any of 96 languages. Open a notes app, pick the target language, press the hotkey, say a sentence, see the transcription. The loop closes in about two seconds. You can run it a hundred times in a focused 20-minute practice block.
The practical workflow looks like this. Take a textbook sentence, an Anki card, a phrase from a podcast, anything you want to practice saying. Say it aloud once in your target language. When the transcription appears, compare it word by word with the sentence you meant to say; any word that differs is the one to work on.
This is not a replacement for a real teacher or tutor, who can correct things a transcription model cannot, like rhythm, stress, intonation, and naturalness. But the transcription loop is available 24 hours a day, costs nothing on the free plan, and runs in any room where you can speak out loud. That makes it an easy way to add real practice between paid sessions or class meetings.
Anki is the spaced-repetition flashcard app many serious language learners build their study around. Building good decks is slow if you type everything: switching keyboard layouts for accents, typing characters in scripts you barely know, copy-pasting from dictionaries. Dictation removes most of that cost. The desktop Anki app accepts text input in any field, and StarWhisper drops dictation into any Windows text field, so creating a card becomes a matter of clicking into the front field, pressing the hotkey, and speaking, then doing the same for the back.
For a beginner Japanese learner building a kanji deck, dictating a word and seeing it appear in the usual mix of kanji, hiragana, and katakana is dramatically faster than typing romaji and waiting for IME conversion. For a Spanish learner building a verb conjugation deck, dictating each form means accent marks and inverted question marks land correctly the first time. The same applies to Russian Cyrillic, Arabic script, Greek, Korean Hangul, and Thai script.
One of the most useful intermediate-level practices is writing journal entries or short essays in the target language. The friction for most learners is twofold: producing the language at all, and typing in a script or accent system they do not have muscle memory for. Dictation removes the second friction. You think a sentence in the target language, you say it, the transcription handles all the orthographic details.
This is also a useful diagnostic. If you can speak a coherent paragraph in your target language at a natural pace and the transcription comes out right, you are close to being able to say that paragraph in conversation. Many learners are surprised by the gap between "I can read this" and "I can produce this out loud." The dictation workflow exposes that gap directly. Reading is passive. Producing language under the time pressure of normal speech is active and harder. Practicing the active side regularly is one of the highest-leverage things a learner can do.
For long-form writing, the pattern is the same as native-language dictation: talk through a section, stop, edit, talk through the next. The companion voice-to-text for writers page covers that general workflow in more depth. The only addition for language learners is a grammar-correction step in the loop, whether that is a tutor, a language model, or a tool like LanguageTool.
StarWhisper supports 96 languages through Whisper. The accuracy is genuinely high for most major languages and useful for most smaller ones. A few language-specific notes worth knowing:
German
Strong accuracy. Handles compound words and case endings well. Umlauts and the eszett (ß) appear correctly without keyboard switching. See the dedicated German dictation software page for more.
Spanish
Strong accuracy. Handles both Iberian and Latin American varieties. Accented vowels, ñ, and inverted punctuation (¿ and ¡) appear correctly. See the dedicated Spanish dictation software page for more.
Japanese
Strong accuracy. Output uses the natural mix of kanji, hiragana, and katakana the model learned from training data, not a single script. Useful for kanji-reading practice because you can dictate a sentence in your own pronunciation and check whether the kanji that appear match what you meant. See the dedicated Japanese dictation software page for more.
Mandarin Chinese
Strong accuracy. Output appears in simplified characters by default. Tones matter for word-level accuracy. If a syllable's tone is off, the wrong character can appear, which is exactly the feedback a learner needs.
Korean
Strong accuracy in Hangul. Useful for sentence-level pronunciation practice and for building Anki decks faster than typing with an IME.
Thai, Vietnamese, and other languages
Thai, Vietnamese, and many other languages are supported with good accuracy, each in its native script. As tonal languages, Thai and Vietnamese give the same "wrong word means wrong tone" feedback as Mandarin.
For the full multi-language feature breakdown, see the multi-language feature page.
StarWhisper does not replace your tutor, your textbook, your Duolingo streak, or your italki sessions. It is a general dictation tool that runs in any Windows text field, so it slots into whatever language-learning stack you already use.
For a complete language-learning stack, the highest-leverage habit is using dictation daily, even briefly. Two minutes of dictating sentences in your target language each morning can add up to noticeable pronunciation improvement over a few months, because the model gives consistent feedback at zero marginal cost.
Language practice often happens in places where reliability matters more than convenience: on a commute, in a quiet apartment late at night, in a hotel room while traveling, in a cabin without internet. Cloud dictation tools depend on a network connection, so a weak signal means delays or failed requests that break the flow of practice. StarWhisper Local Mode runs the transcription model on your PC. No upload, no API call, no reconnect spinner. The same flow works offline and online.
This also matters for privacy. Dictating personal journal entries in any language, especially grammatically rough early attempts, is more comfortable when the audio never leaves the device. In Local Mode there is no server logging your practice sentences, no model training on your audio, no analytics dashboard your data feeds. Cloud Mode is available as an opt-in for slightly higher accuracy on long-form audio, but it is not the default and is not required.
Download from the homepage, run the Windows installer, allow microphone access, pick a hotkey. Open the language picker in the app, choose your target language, then open any text field, press the hotkey, speak. The first transcription in your target language is the convincing moment. You say a sentence in Spanish, the text appears in Spanish with proper accents, and the pronunciation practice loop just works.
For multi-language learners, leave the picker on auto-detect or switch manually when you change study targets; choosing the language manually is the safer option if auto-detect ever misidentifies accented speech. The free plan at 500 words per day and 3,500 per week covers most daily practice indefinitely, and Pro at $10 per month is the unlimited option for learners producing long-form writing in their target language daily.
Dedicated pages for specific target languages
German
Strong accuracy for compound words, umlauts, and case endings. Useful for B1 and up.
Spanish
Handles both Iberian and Latin American varieties. Accent marks and inverted punctuation come out right.
Japanese
Natural mix of kanji, hiragana, and katakana in the output. Useful for kanji-reading practice.
All 96 languages
The full 96-language list, how switching works, and notes on accuracy across language families.