日本語の音声入力. Dictate Japanese into any Windows app with automatic kanji, hiragana, and katakana conversion. Skip the IME entirely. Free for 500 words a day.
Native script output, polite forms, and no IME juggling
Output in the mixed orthography Japanese is actually written in. Common nouns in kanji, particles in hiragana, foreign loanwords in katakana. No conversion candidate menus.
Skip the romaji-kana-kanji conversion pipeline that has slowed Japanese typing for decades. Speak the sentence, and the transcript appears at the cursor in the correct script mix.
Polite forms (です/ます), humble forms (謙譲語), and honorific forms (尊敬語) are transcribed as spoken. Business email, client correspondence, formal documents all work.
English brand names, technical terms, and short phrases embedded in Japanese sentences come through correctly. Tech, finance, and consulting writing flows naturally.
Local Mode keeps audio on your Windows machine. Confidential client communications, internal company memos, and personal writing never leave the building.
500 words a day, 3,500 a week. Pro at $10/month unlocks unlimited dictation for high-volume Japanese writing.
Japanese typing on Windows has been the same painful workflow for thirty years. You type romaji into the IME, watch it convert to hiragana, press space to see kanji conversion candidates, pick the right one, repeat for every word. Even experienced typists hit a ceiling well below their thinking speed because of the cognitive overhead of monitoring the IME candidate window. Voice dictation removes that whole pipeline.
StarWhisper uses OpenAI's Whisper model to transcribe Japanese audio directly into the mixed kanji-hiragana-katakana orthography that Japanese is actually written in. You speak natural Japanese and the transcript appears at your cursor in the correct script, with no IME menus, no conversion candidates, and no Tab-and-Enter key dance. StarWhisper is a Windows desktop application, so it works in Word, Outlook, OneNote, Excel, Teams, Slack, Notion, your browser, and any other Windows app with a text input.
Japanese is one of Whisper's supported languages with substantial training data from Japanese broadcasts, podcasts, anime, drama, news, and YouTube. The model has learned not just Japanese sounds and grammar, but Japanese orthographic conventions: which words are conventionally written in kanji, which stay in hiragana, and which appear in katakana. That convention-following is what makes the output usable without manual post-editing.
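For readers who want to see what a Whisper transcription call looks like, here is a minimal sketch using the open-source `openai-whisper` Python package. It is an illustration only, not StarWhisper's own code: the function name `transcribe_japanese` and the file name in the usage note are hypothetical, and the package (plus ffmpeg) must be installed separately.

```python
from typing import Optional


def transcribe_japanese(audio_path: str,
                        model_name: str = "medium",
                        language: Optional[str] = "ja") -> str:
    """Transcribe an audio file with the open-source openai-whisper package.

    Assumption: this mirrors the public package only; how StarWhisper
    calls the model internally is not documented on this page.
    """
    # Imported lazily so this module loads even without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # language="ja" pins decoding to Japanese; language=None lets Whisper
    # detect the language from the start of the audio instead.
    result = model.transcribe(audio_path, language=language)
    return result["text"]
```

Calling `transcribe_japanese("memo.wav")` on a recording would return the transcript as a single string in mixed kanji, hiragana, and katakana.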
Japanese is written with three scripts simultaneously. Kanji (Chinese characters) carry the lexical content of nouns, verb roots, and adjective roots. Hiragana handles grammatical particles, verb conjugation endings, and native Japanese words for which kanji feels heavy. Katakana takes foreign loanwords, onomatopoeia, and certain stylistic uses. Native writers move between the three constantly within a single sentence, and the rules for which script to use where are partly conventional and partly stylistic.
Whisper produces this mixed orthography directly. A sample sentence like 「今日の会議は午後3時に変更になりました」 ("Today's meeting has been changed to 3 PM") comes out exactly as written: 今日, 会議, 午後, 時, and 変更 in kanji, the particles の, は, and に in hiragana, and the polite past verb なりました in hiragana. You did not type any romaji; you spoke the sentence and the transcript matches what a native writer would produce.
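The script breakdown of that sample sentence can be checked mechanically. The sketch below groups characters by Unicode block; it covers only the common Hiragana, Katakana, and CJK Unified Ideographs ranges, and the function names are illustrative rather than part of any StarWhisper API.

```python
from collections import defaultdict


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (common ranges only)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "latin_or_digit"
    return "other"


def group_by_script(text: str) -> dict:
    """Collect the characters of `text` per script, in reading order."""
    groups = defaultdict(list)
    for ch in text:
        groups[script_of(ch)].append(ch)
    return {name: "".join(chars) for name, chars in groups.items()}


sentence = "今日の会議は午後3時に変更になりました"
groups = group_by_script(sentence)
print(groups["kanji"])     # 今日会議午後時変更
print(groups["hiragana"])  # のはにになりました
```

The sample sentence contains no katakana, so that key is absent from the result; a loanword such as プロジェクト would land entirely in the katakana group.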
Katakana for foreign loanwords is automatic. Computer becomes コンピューター, meeting becomes ミーティング, project becomes プロジェクト, system becomes システム. Brand names and English proper nouns stay in their original Latin script (Microsoft, Google, GitHub). The output flows into any Windows app at native speed.
Japanese has elaborate registers of politeness that English does not. The same idea can be expressed in plain form for friends, polite form for colleagues and acquaintances, humble form when describing your own actions in formal settings, and honorific form when describing the actions of someone you respect. Business Japanese in particular relies on a layered combination of these registers, and miscalibrating them is a real professional risk.
Whisper transcribes keigo exactly as spoken without normalization. If you dictate お世話になっております (a standard business correspondence opener), that is what appears in the transcript. If you dictate ご確認いただけますでしょうか (a polite way to request review), that appears verbatim. If you dictate ありがとうございます or ありがとう, you get exactly what you said. The model does not "fix" your register up or down.
This makes Japanese business email dictation directly usable. The common opening, body, and closing patterns of business mail (お世話になっております, さて, ご検討のほどよろしくお願いいたします) come through cleanly. Internal team chat in plain or semi-formal Japanese works the same way. Client-facing documents that require sustained keigo can be dictated at speech speed instead of typed sentence by sentence.
Standard business Japanese (営業, 経理, 取引先, 案件, 見積もり, 請求書, 納期, 検収) is well represented in training data and comes through correctly. Standard technical Japanese (システム, データベース, インターフェース, クラウド, API, リリース, デプロイ) is recognized for both the katakana-loanword side and the native Japanese side.
Industry-specific terminology varies. Common terms across major industries (finance, IT, marketing, consulting, manufacturing, retail) generally work. Very specialized terms (rare scientific vocabulary, niche legal terminology, internal company jargon) may be transcribed phonetically and need correction. A personal find-and-replace list handles repeat offenders. The model does not learn from your corrections between sessions, which keeps your data local but means the same misrecognitions recur until you add them to that list.
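A personal find-and-replace list can be as simple as a dictionary applied to each transcript. The sketch below is one way to do it outside the app; the function name and the two sample entries (a product name and an abbreviation transcribed phonetically in katakana) are hypothetical examples, not observed StarWhisper output.

```python
def apply_corrections(transcript: str, corrections: dict) -> str:
    """Replace known misrecognitions in a transcript.

    Longer keys are applied first so a short entry never rewrites
    part of a longer one.
    """
    for wrong in sorted(corrections, key=len, reverse=True):
        transcript = transcript.replace(wrong, corrections[wrong])
    return transcript


# Hypothetical entries: phonetic katakana output mapped to the intended term.
corrections = {
    "スターウィスパー": "StarWhisper",
    "ケーピーアイ": "KPI",
}

raw = "スターウィスパーでケーピーアイの報告を書きます"
fixed = apply_corrections(raw, corrections)
print(fixed)  # StarWhisperでKPIの報告を書きます
```

Keep entries specific. A blanket replacement of a common word would also rewrite the sentences where that word was transcribed correctly.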
For long-form Japanese writing workflows, see the voice-to-text for content creators page; the same workflow applies to Japanese output. For multilingual workflows generally, the multi-language feature page covers the full supported set.
Modern Japanese workplace and academic writing routinely embeds English brand names, English technical terms, and short English phrases inside Japanese sentences. A sentence like 「明日のMTGでSlackの新機能についてレビューします」 ("We will review Slack's new features at tomorrow's meeting") is normal in tech workplaces. Whisper handles this kind of mixed input without breaking the Japanese flow.
Set the StarWhisper language to Japanese for Japanese-dominant content. Embedded English brand names, abbreviations, and short phrases come through with the original casing preserved. The model also recognizes katakana transliterations of English words where Japanese has adopted them (the same word may appear as the original English or as the katakana version depending on what the speaker used). For documents that flip between long English and long Japanese paragraphs, switch the language to Auto-detect so the engine picks per-segment.
For Japanese-to-English translation workflows, dictate the Japanese source into one document and use any translation tool of your choice on the result. For English-to-Japanese drafting, dictate directly in Japanese rather than dictating English and translating afterwards. The output quality is consistently better when you speak in the target language.
Outlook and Gmail messages, internal team updates, client follow-ups, formal correspondence in keigo. The Japanese business email format with お世話になっております opening, body, and よろしくお願いいたします closing dictates cleanly. Dictating a longer business email in Japanese takes roughly a third of the time of typing it through an IME, and reviewing the result is faster than monitoring conversion candidates word by word.
Light novel authors, manga script writers, and visual novel writers can dictate first drafts at speech speed. Dialogue in different character registers (casual, polite, archaic, role-language) comes through as spoken. Long narrative passages in standard Japanese work cleanly. For long-form fiction work, the unlimited Pro plan removes the daily word cap. See the voice-to-text for writers page for general long-form workflows.
Transcribing recorded meetings, interviews, lectures, and podcasts into Japanese text. StarWhisper supports both real-time dictation and audio file transcription. Standard Japanese in clear recording conditions transcribes accurately.
Translators working into Japanese can dictate their translations directly rather than typing through an IME. The output is rough-draft quality and benefits from a polishing pass, but the speed advantage over IME typing is substantial. Reading the English source and speaking the Japanese translation aloud is faster than typing it.
Journaling, blog posts, social media, Note articles, Twitter/X threads in Japanese. The free tier covers most personal writing volumes. For users who write in Japanese daily as part of a content workflow, Pro is the better fit.
Japan has historically been underserved by Western voice dictation tools because the script complexity, the IME-centric typing culture, and the limited Japanese training data in older speech recognition systems combined to produce poor results. Whisper changed that on the engine side; the gap between Japanese and English output quality is much smaller than it was even three years ago.
StarWhisper funnel data shows Japan generating around 14 daily installs of the Windows app, which makes it a meaningful and growing market. The current first-success rate for Japanese users sits at about 16.7 percent, well below the 57 percent rate seen for German users in the same dataset. That gap reflects two things: setup friction specific to Japanese users (script selection, language settings, microphone configuration), and the genuine difficulty of dictation when the input mixes natural Japanese with embedded English. Both are improving as the product iterates.
For Japanese users considering whether voice dictation is finally good enough to replace IME typing for daily writing, the practical answer is that the engine quality is there; the limiting factor is now setup and habit. The free tier lets you test the workflow on your own machine without commitment. The FAQ covers common Japanese setup questions, and the privacy and offline mode page explains the local processing posture.
| Plan | Words | Price (USD) |
|---|---|---|
| Free | 500 words/day, 3,500/week | $0 |
| Pro Monthly | Unlimited | $10/month |
| Pro Annual | Unlimited | $80/year ($6.67/month) |
Billing runs through Stripe in USD. Your bank converts to JPY at the prevailing rate. There is no separate fee for Japanese; the 96+ language pack including Japanese ships in the free installer. Word counts for Japanese are measured with a sensible heuristic, because character density per "word" is much higher in Japanese than in English.
The free tier is genuinely usable for personal Japanese writing volumes: business email, journaling, blog posts, social media. Pro at $10 a month makes sense for users writing long-form Japanese daily: novelists, translators, journalists, full-time business correspondents, and content creators. The annual plan saves about a third compared to monthly billing. The full pricing breakdown is in the homepage pricing section. The no-subscription page explains how the free tier works without any recurring commitment.
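The plan arithmetic is easy to verify. This short sketch uses only the prices and word caps from the table above; the variable names are illustrative.

```python
MONTHLY_USD = 10      # Pro Monthly price
ANNUAL_USD = 80       # Pro Annual price
FREE_WORDS_PER_DAY = 500

# Cost of twelve months on the monthly plan.
twelve_months_monthly = MONTHLY_USD * 12

# Effective monthly price of the annual plan, rounded to cents.
annual_per_month = round(ANNUAL_USD / 12, 2)

# Saving from choosing annual over monthly billing.
saving_usd = twelve_months_monthly - ANNUAL_USD
saving_ratio = saving_usd / twelve_months_monthly

# Seven days at the daily free cap.
weekly_free_words = FREE_WORDS_PER_DAY * 7

print(f"Annual plan: ${annual_per_month}/month, saves ${saving_usd} "
      f"({saving_ratio:.0%}) per year")
print(f"Free tier: {weekly_free_words} words per week")
```

The annual plan works out to $6.67 a month and saves $40 a year, which is the "about a third" quoted above, and seven days at 500 words matches the 3,500-word weekly figure.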
StarWhisper runs on Windows 10 and Windows 11. Not on Mac, not on mobile. The installer is around 100 MB; Whisper model files download on first use. CPU-only operation works on any reasonably modern Intel or AMD machine. An NVIDIA GPU with CUDA accelerates the larger models significantly, useful for high-volume Japanese transcription. Vulkan provides a cross-vendor GPU path for AMD and Intel discrete GPUs.
For Japanese dictation, the medium or large Whisper model is recommended over the small model because Japanese benefits more from the extra parameters than English does. The large model is the highest-accuracy option if your machine can run it.
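As a rough guide to which model your machine can run, the sketch below picks the largest model that fits in available GPU memory. The memory figures are the approximate VRAM requirements listed in the open-source openai-whisper README; they are an assumption here, since StarWhisper's own requirements may differ, and the function name is hypothetical.

```python
# Approximate VRAM needs in GB, per the openai-whisper README.
# Assumption: StarWhisper's packaged models may need more or less.
APPROX_VRAM_GB = {"small": 2, "medium": 5, "large": 10}


def pick_model(available_gb: float) -> str:
    """Return the largest model that fits in the given memory.

    Falls back to "small" when nothing larger fits.
    """
    for name in ("large", "medium", "small"):
        if available_gb >= APPROX_VRAM_GB[name]:
            return name
    return "small"


for gb in (12, 6, 3):
    print(gb, "GB ->", pick_model(gb))
```

With these figures, a 12 GB card selects the large model, a 6 GB card selects medium, and anything smaller falls back to small, where Japanese accuracy is likely to be noticeably lower.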
Microphone quality matters more than you might expect. A USB headset or a directional desk microphone produces noticeably cleaner Japanese transcription than a laptop built-in microphone, especially for soft-voiced speakers or rooms with background noise. The investment in a 5,000-yen USB mic pays back quickly in reduced correction time. For more on the GPU side, see the GPU acceleration feature page.
Other StarWhisper pages that pair well with Japanese dictation
Umlauts, compound nouns, and DACH-region dialects. GDPR-friendly local processing.
Castilian, Mexican, Argentinian, and Caribbean Spanish with inverted punctuation.
Long-form drafting workflows that work in any supported language including Japanese.
The full 96+ language pack that ships with the StarWhisper free installer.