Voice to Text for Translators: Dictate in 96 Languages

Name: StarWhisper
Rating: 4.8 (50 reviews)
Author: StarWhisper

Why Translators Dictate Instead of Typing

Professional translators have used voice dictation for decades. The reason is simple arithmetic. Typing speed in your second or third language is usually slower than in your native language, often dramatically so. A translator who types 80 words per minute in English may only type 50 to 60 in German or French because the muscle memory for the keyboard layout, special characters, and language-specific punctuation has been built up over far fewer years. Speaking speed, by contrast, is roughly the same across all languages you are fluent in. You can speak German at the same pace you speak English: around 150 words per minute.

That gap (50 to 60 wpm typed vs around 150 wpm spoken in the target language) is why translator dictation tools like Dragon Professional with the language pack have been a staple of the profession since the late 1990s. The new variable is that StarWhisper brings the same productivity gain at roughly one-tenth the price, with the bonus that local processing means no audio ever leaves the laptop. For translators who handle legal, patent, medical, or corporate confidential work, the local-only constraint is not a nice-to-have; it is often contractually required.
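The arithmetic behind that gap can be sketched in a few lines. The 3,000-word daily volume is an illustrative assumption, and the result covers raw input time only; thinking, research, and corrections are not modeled.

```python
def minutes_to_produce(words: int, words_per_minute: float) -> float:
    """Return the minutes needed to produce `words` at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return words / words_per_minute


# Rates from the text: about 50 wpm typed (low end) vs about 150 wpm spoken.
# DAILY_WORDS is an assumed figure used only for illustration.
DAILY_WORDS = 3000
typing_minutes = minutes_to_produce(DAILY_WORDS, 50)
speaking_minutes = minutes_to_produce(DAILY_WORDS, 150)

print(f"Typing:   {typing_minutes:.0f} min")
print(f"Speaking: {speaking_minutes:.0f} min")
print(f"Raw input time saved: {typing_minutes - speaking_minutes:.0f} min")
```

Swap in your own measured typing and speaking rates to get a figure that reflects your language pair.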

The accuracy story has also shifted. Whisper's multilingual training corpus is dramatically larger than the data behind the per-language models that powered Dragon's pre-2020 architecture. For German, Spanish, French, Italian, Portuguese, Dutch, Japanese, Mandarin, Korean, Russian, and Arabic, Whisper typically transcribes professional-grade source material at 95 to 98 percent word accuracy. That is comparable to a fast bilingual typist, and the corrections needed are usually one or two words per segment, not the constant fixing required by earlier-generation tools.

Setting Up Per-Language Hotkeys

The standard StarWhisper setup for a working translator is one global hotkey per target language. For an EN-DE-FR translator, that means three hotkeys (English, German, French), each tied to the corresponding Whisper language mode. Open the Settings panel, find the Hotkeys section, and bind each language to a free function key. F7 for English, F8 for German, F9 for French is a common setup. F10 onward stay open for other languages added later.

Once bound, switching mid-document is one keystroke. Place the cursor in the target segment, press the hotkey for that language, speak, release. The text appears in the correct language. There is no menu to navigate and no confirmation dialog to dismiss. The hotkeys are global, meaning they work whether the foreground window is memoQ, Trados, a browser tab open to Smartcat, or a Word document for an unstructured translation.

For translators who work in more than three or four languages regularly, an alternative setup is to use a single hotkey for the dominant target language and use the menu picker for occasional secondaries. This keeps the keyboard real estate manageable. The trade-off is one extra click per language switch, which adds up if you actually work in five languages in the same day. Most multilingual translators settle on three hotkeys for their three highest-volume languages and the menu for everything else.
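Before binding keys, it can help to plan the layout on paper or in a few lines of code. The sketch below is a hypothetical planning aid, not StarWhisper's configuration format: the dictionary, the reserved-key set, and both helper functions are invented for illustration, and the reserved key stands in for whatever shortcuts your CAT tool already uses.

```python
# Hypothetical planning aid; this is NOT StarWhisper's settings format.
HOTKEYS = {"F7": "en", "F8": "de", "F9": "fr"}

# Shortcuts your CAT tool already claims. The single entry is an example;
# replace it with the bindings from your own tool.
RESERVED_BY_CAT_TOOL = {"F8"}


def find_conflicts(hotkeys: dict[str, str], reserved: set[str]) -> list[str]:
    """Return the planned hotkeys that collide with existing CAT tool shortcuts."""
    return sorted(key for key in hotkeys if key in reserved)


def languages_needing_menu(working_languages: list[str],
                           hotkeys: dict[str, str]) -> list[str]:
    """Return the working languages that have no hotkey and need the menu picker."""
    bound = set(hotkeys.values())
    return [lang for lang in working_languages if lang not in bound]


print(find_conflicts(HOTKEYS, RESERVED_BY_CAT_TOOL))
print(languages_needing_menu(["en", "de", "fr", "it", "es"], HOTKEYS))
```

Any key reported as a conflict should be moved to a free function key before you start dictating inside the CAT tool.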

Workflow Inside CAT Tools

The basic flow in memoQ, Trados, Smartcat, or Phrase is identical because StarWhisper does not integrate with the CAT tool; it simply sends keystrokes to wherever the cursor is. Open your CAT tool, navigate to the segment you want to translate, click into the target field, press the language hotkey, dictate the translation, and release the hotkey. The text appears in the target field. Move to the next segment and repeat.

For tag-heavy segments (most marketing copy, software UI, technical documentation), the recommended approach is to dictate the prose first, then copy the source tags using the CAT tool's tag-copy shortcut. Trados uses Ctrl-Alt-down for "Copy source to target tags". memoQ uses F8 (so pick a different StarWhisper hotkey if you use memoQ). Smartcat has a tag-insert panel. Once tags are in, position your cursor inside the target text and the dictated translation drops in correctly. Some translators reverse the order, inserting tags first and then dictating the prose around them; either way works.

Concordance lookups and TM matches are unaffected because StarWhisper does not change anything about the CAT tool's behavior. You still see your TM matches, terminology suggestions, and QA flags as normal. Dictation is purely an input method for the target-language text. Some translators find it useful to dictate the gist of the translation first, then use the CAT tool's edit-distance display to refine toward the TM match. Others dictate cold and ignore the TM panel until a polish pass.

Subtitling Workflow

Subtitling is one of the strongest use cases for translator dictation, because subtitle segments are short, time-constrained, and need to read naturally in the target language. The combined audio-transcription and dictation workflow looks like this: extract audio from the source video using any free tool (FFmpeg, Audacity, VLC export). Open StarWhisper, drag the audio file in, set the source language, export as SRT. You now have a timestamped source transcript.
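The audio-extraction step can be scripted if you do it often. This is a minimal sketch that assumes FFmpeg is installed and on the PATH; the function names are illustrative, and the mono 16 kHz WAV settings are an assumption chosen because Whisper works on 16 kHz mono audio internally, not a documented StarWhisper requirement.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video: Path, audio: Path) -> list[str]:
    """Build an FFmpeg command that writes the audio track of `video` to `audio`."""
    return [
        "ffmpeg",
        "-i", str(video),   # input video file
        "-vn",              # drop the video stream
        "-ac", "1",         # downmix to mono (assumed setting)
        "-ar", "16000",     # resample to 16 kHz (assumed setting)
        str(audio),
    ]


def extract_audio(video: Path) -> Path:
    """Run FFmpeg and return the path of the extracted WAV file."""
    audio = video.with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(video, audio), check=True)
    return audio


# Show the command without running it.
print(" ".join(build_ffmpeg_command(Path("episode.mp4"), Path("episode.wav"))))
```

Calling `extract_audio(Path("episode.mp4"))` would produce `episode.wav` next to the video, ready to drag into StarWhisper.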

Open your subtitling tool of choice (Aegisub for hobbyists, Subtitle Edit for free professional use, EZTitles or OOONA Studio for studio work) and load both the video and the SRT. Translate each subtitle by clicking into the target field and dictating the translation using StarWhisper's per-language hotkey. The subtitling tool handles timing, line breaks, character counts, and reading-speed warnings; StarWhisper handles the typing.
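For readers who have not looked inside an SRT file, the sketch below parses the basic format (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text lines, blank separator) and computes characters per second, the figure behind reading-speed warnings. The `Cue` class, the helper names, and the sample text are illustrative assumptions; real subtitling tools handle more edge cases.

```python
import re
from dataclasses import dataclass

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, skipping blocks that are too short to be valid."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), to_seconds(start), to_seconds(end),
                        "\n".join(lines[2:])))
    return cues


def chars_per_second(cue: Cue) -> float:
    """Reading speed of a cue, ignoring line breaks inside the subtitle."""
    duration = cue.end - cue.start
    return len(cue.text.replace("\n", "")) / duration if duration > 0 else float("inf")


SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Guten Morgen.

2
00:00:04,000 --> 00:00:06,000
Wie geht es Ihnen?
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, f"{chars_per_second(cue):.1f} cps", cue.text)
```

A dictated translation that pushes a cue well above your client's reading-speed limit is a signal to condense the line, which is usually quicker to re-dictate than to retype.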

For a 30-minute video at typical subtitling density (around 400 to 500 subtitles), this combined workflow typically cuts the project time by 30 to 50 percent versus typing every subtitle. The bigger savings come on dialogue-heavy content where each subtitle is full sentences; the savings are smaller on technical or visual content where most subtitles are short identifiers. The subtitling-from-scratch how-to covers the full pipeline including the audio-extraction step.

Privacy: Why Local Matters for Translation Work

Translator NDAs are typically stricter than NDAs in most other professions. A pre-release product manual, a draft of a merger announcement, an unfiled patent specification, or a sealed court document carries serious consequences for any leak. The standard NDA language for high-end translation work prohibits using any cloud service for processing the source or target text, with some agencies maintaining explicit blacklists of providers (Google Translate, DeepL's free tier, Otter, anything that uploads text to a third party).

Local-only dictation sidesteps the issue cleanly. StarWhisper in Local Mode runs OpenAI Whisper on your PC. The audio is processed by code running on your CPU or GPU. No audio is uploaded, no transcript is sent anywhere, no telemetry includes the dictation content. From the agency's perspective, this is the same as if you were typing the translation yourself. The dictation step is invisible to the NDA. The privacy and offline architecture page goes into the technical details of what "local-only" actually means.

Cloud Mode exists as an opt-in feature for non-NDA work. It sends audio to OpenAI's Whisper API for processing, which is faster on machines without a GPU but obviously cannot be used for confidential content. The setting is clearly labeled and stays off by default. For translators who work across both NDA and non-NDA projects, the safe default is to leave Cloud Mode disabled and only flip it on for explicitly non-sensitive work.

Cost vs Other Translator Dictation Options

Option	Cost	Language coverage	Local processing	Works in CAT tools
StarWhisper	Free / $10/mo	96 languages	Yes (Local Mode)	Yes (keystroke)
Dragon Professional + language pack	$699 + ~$300/language	1 language per pack	Yes	Yes
Dragon Anywhere	$15/mo	6 languages, mobile only	No (cloud)	Limited
Windows Speech Recognition	Free	~10 languages	Yes	Yes, low accuracy
Apple Dictation	Free, Mac/iOS only	~40 languages	Mixed (depends on mode)	Mac CAT tools only

The cost gap is most dramatic against Dragon. A translator working in three target languages historically needed Dragon Professional plus two additional language packs, which adds up to roughly $1,300 in upfront cost. StarWhisper covers all three languages (and the other 93) on the $10 per month Pro plan. For a working translator, the Pro plan pays for itself within the first week: one hour saved per week at typical translator rates of $40 to $80 per hour already exceeds the monthly cost. The multi-language feature page details which languages have which accuracy class.
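The comparison can be worked through with the table's figures. This is a rough sketch: the prices are the approximate ones quoted above, the function names are illustrative, and upgrades, taxes, and discounts are ignored.

```python
DRAGON_BASE = 699           # Dragon Professional, from the table
DRAGON_LANGUAGE_PACK = 300  # approximate price per additional language pack
PRO_MONTHLY = 10            # StarWhisper Pro, per month


def dragon_upfront(target_languages: int) -> int:
    """Base licence covers one language; each further language needs a pack."""
    extra_packs = max(target_languages - 1, 0)
    return DRAGON_BASE + extra_packs * DRAGON_LANGUAGE_PACK


def months_until_equal_spend(target_languages: int) -> float:
    """Months of Pro subscription that cost as much as the Dragon setup."""
    return dragon_upfront(target_languages) / PRO_MONTHLY


def weekly_value_of_time_saved(hours_saved: float, hourly_rate: float) -> float:
    """Money value of the time saved in one week."""
    return hours_saved * hourly_rate


print(dragon_upfront(3))
print(months_until_equal_spend(3))
print(weekly_value_of_time_saved(1, 40) > PRO_MONTHLY)
```

With three target languages the Dragon route comes to $1,299 upfront, which is the "roughly $1,300" in the text, and even at the low end of the rate range one saved hour per week is worth more than a month of Pro.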

Frequently Asked Questions

Which CAT tools does StarWhisper work with?

Any CAT tool with a normal text input field on Windows. StarWhisper types into the focused window using simulated keystrokes, so it does not need plugin integration. Confirmed working with memoQ (desktop), SDL Trados Studio, Phrase TMS, Smartcat (web), Wordfast, OmegaT, MateCat, Across, and CafeTran. It also works in browser-based interfaces for DeepL, Google Translate, Reverso, Linguee, and any custom translation portal a client provides. The keystroke approach is tool-agnostic and survives CAT tool updates.

How do I switch between source and target languages quickly?

Set a separate hotkey for each language you work in, each mapped to that language in Whisper. A typical setup for a bilingual EN-DE translator: press F7 to dictate in English, F8 to dictate in German. The hotkeys are global, so they work inside any CAT tool. For work in more than three or four languages, most translators bind hotkeys to their highest-volume languages and use the menu language picker for occasional secondary languages. Switching is one keystroke per direction, with no menu hunting.

Does dictation preserve formatting like tags and placeholders?

StarWhisper inserts plain text at the cursor; it does not insert CAT-tool formatting tags or inline placeholders by itself. The standard workflow is to dictate the segment text first, then use the CAT tool's tag-copy shortcut (typically Ctrl-Alt-down arrow in Trados, F8 in memoQ) to bring the source tags into the target. This is the same workflow as typing the translation, just faster on the prose portion. Some translators dictate the prose, then add tags during a second pass, which is also a common approach.

Does it handle technical terminology in specialized fields?

OpenAI Whisper, the model behind StarWhisper, has been trained on a large multilingual corpus that includes technical, legal, medical, and academic text. Most domain terminology comes through correctly. Highly specialized terms (rare chemistry compounds, proprietary product codes, obscure case law citations) may need correction, especially in less common target languages. The fastest workflow is to dictate the segment, glance for any obvious mis-recognition, and fix one or two terms by keyboard. Net time is still well below typing the whole segment.

Can I dictate punctuation, line breaks, and capitalization?

Yes. Say comma, period, question mark, exclamation point, colon, semicolon, open quote, close quote, open parenthesis, close parenthesis. For line structure, say new line or new paragraph. Capitalization usually follows automatically from sentence position; for proper-noun capitalization that Whisper misses, use 'cap' before the word in some dictation modes. Different target languages have different punctuation conventions (Spanish inverted question marks, French spaced punctuation, Japanese full-width forms) and Whisper generally handles these correctly when the language is set right.

Is it private enough for confidential or NDA translation work?

In Local Mode, yes. Audio is processed entirely on your PC and never sent to any external server. This matters for legal translation, patent translation, corporate M&A documents, pre-launch product materials, and any client work under an NDA. Cloud-based dictation tools like Dragon Anywhere or Wispr Flow transmit audio to vendor servers, which is typically prohibited by translator NDAs even if the vendor's privacy policy is reasonable. StarWhisper's local-only mode sidesteps that issue. Cloud Mode is opt-in and should be disabled for NDA work.

What is the free tier and is it enough for working translators?

The free plan includes 500 dictated words per day and 3,500 per week. For occasional dictation, glossary work, or trying it out before committing, that is usable. For full-time translators producing 2,000 to 4,000 target words per day, the free tier will hit the cap by mid-morning. The Pro plan at $10 per month or $80 per year removes the limit entirely. A 7-day free Pro trial is available to measure your actual usage. Compared to dedicated translator dictation tools that run $40 to $100 per month, $10 is significantly cheaper.
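To see where the "mid-morning" estimate comes from, the cap can be compared with a steady hourly output. The sketch assumes an eight-hour working day, even output across the day, and that every target word is dictated; all three are simplifications, and the function name is illustrative.

```python
FREE_DAILY_CAP = 500   # dictated words per day on the free plan
PRO_MONTHLY = 10       # dollars per month
PRO_YEARLY = 80        # dollars per year


def hours_until_cap(words_per_day: int, working_hours: float = 8.0,
                    cap: int = FREE_DAILY_CAP) -> float:
    """Hours of steady dictation before the daily cap is reached."""
    words_per_hour = words_per_day / working_hours
    return cap / words_per_hour


for volume in (2000, 4000):
    print(f"{volume} words/day: cap reached after {hours_until_cap(volume):.1f} h")

# Annual billing compared with twelve monthly payments.
print(f"Yearly plan saves ${12 * PRO_MONTHLY - PRO_YEARLY} per year")
```

At 2,000 to 4,000 words per day the cap arrives after one to two hours of work, which matches the estimate above.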

Can I use it for subtitling work?

Yes, in two ways. First, dictate translated subtitle text into your subtitling tool (Aegisub, Subtitle Edit, EZTitles, OOONA) the same way you would dictate into a CAT tool. Second, transcribe the source-language audio of a video file to get a starting SRT, then translate it. StarWhisper accepts MP3, WAV, M4A, and OGG audio input directly (extract the audio from video first using any free tool) and produces SRT output with timestamps, which becomes the starting point for translation. Combined, the workflow can shorten a one-hour subtitling job by 30 to 50 percent.


Built for Translator Workflows

96 Languages Supported

Works in Any CAT Tool

Per-Language Hotkeys

Local Processing for NDA Work

Dictate into DeepL, MT Postediting

SRT Output for Subtitling

