Developer Workflow

Voice Typing for Coding:
Dictate Prompts, Comments, and Commits

Vibe coding works best at 150 words per minute. Dictate long prompts to Cursor, Claude Code, and Copilot. Type comments, docstrings, commit messages, and Slack chat with your voice. Local Whisper, Windows native, free to start.

Download for Windows
Microsoft Store
  • Trusted by Windows
  • Quick 30-second setup
"Refactor the auth middleware to use the new session schema..."

Built for the way developers actually code in 2026

Andrej Karpathy called it "vibe coding": dictating intent to an LLM and letting it write the code. Typing, not thinking, becomes the bottleneck, and voice removes it.

For the AI-first developer

150 WPM speech, 60 WPM typing

Frontier models reward long, specific prompts. Typing a 400-word prompt to Cursor takes 6 to 8 minutes. Dictating one takes under 3 minutes. You get more iterations per hour.

  • Works in Cursor, Claude Code, Windsurf, VS Code, Copilot Chat
  • Dictate prompts, comments, docstrings, commits, Slack
  • Local Whisper, no code leaks to a transcription cloud
  • NVIDIA GPU acceleration for sub-second turnaround
  • $10/month or $80/year, or free 500 words/day
Where voice is NOT the right tool

Literal code is still faster typed

Voice typing is not a replacement for typing brackets, colons, and snake_case identifiers. The win is dictating the prose around the code, not the code itself.

  • Single line edits in the middle of a function: type
  • Quick fix to a syntax error: type
  • Variable renames: use the IDE refactor
  • Long prompts, comments, commit messages: voice
  • Slack threads explaining a decision: voice

What developers dictate, every day

Six places where voice typing replaces hand typing in a modern dev workflow

Prompts to Cursor and Claude Code

The prompt to your AI pair programmer is mostly prose, often several hundred words for a non-trivial task. Dictating cuts that to well under half the time and tends to produce clearer, more specific prompts because you can hear your own ambiguity.

Comments and docstrings

The codebase needs Google-style or JSDoc-style comments and nobody writes them because typing them out is friction. Voice typing removes the friction. Explain what the function does in normal English, edit a few words, move on.

Commit messages

The two-paragraph commit that explains the why takes a minute to type and under half a minute to dictate. Conventional commit prefixes like feat, fix, and chore take a few keystrokes to add after Whisper hands you the body.
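As a rough illustration of that last step, here is a minimal sketch that joins a dictated summary and body with a conventional commit prefix. The function name, the list of allowed types, and the trailing-period cleanup are assumptions made for the example, not part of StarWhisper or of any Git tooling.

```python
from typing import Optional

# Illustrative subset of conventional commit types (an assumption, extend as needed).
COMMIT_TYPES = ("feat", "fix", "chore", "docs", "refactor", "test")


def build_commit_message(commit_type: str, summary: str, body: str = "",
                         scope: Optional[str] = None) -> str:
    """Combine a typed prefix with a dictated summary and optional body."""
    if commit_type not in COMMIT_TYPES:
        raise ValueError(f"unknown commit type: {commit_type!r}")
    # Dictated sentences usually end with a period; commit headers usually do not.
    summary = summary.strip().rstrip(".")
    prefix = f"{commit_type}({scope})" if scope else commit_type
    message = f"{prefix}: {summary}"
    body = body.strip()
    if body:
        # A blank line separates the header from the body.
        message += "\n\n" + body
    return message


if __name__ == "__main__":
    print(build_commit_message(
        "fix",
        "Keep the session cookie on logout when remember-me is set.",
        "The logout handler cleared every cookie. It now skips the remember-me token.",
        scope="auth",
    ))
```

The prefix stays a typed argument while the summary and body are the parts you would dictate.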

Slack and Discord chat with the team

Pair programming over a call, debugging together in a Slack huddle, dropping a design note in a thread. All prose, all easier dictated, especially when you are also screen sharing and want your hands free to point at things.

PR descriptions and design docs

Every pull request needs a description and every team has a "good PR description" template. Dictating a four-paragraph PR description into the GitHub form takes a couple of minutes. Reviewers thank you for the context.

Bug reports and issue triage

Filing a clear bug report in Linear or Jira is the kind of writing that gets rushed because typing is slow. Voice typing gives you the bandwidth to actually describe the repro steps, the expected versus actual, and the workaround you tried.

Vibe coding: why voice matters more than ever

The term "vibe coding" was popularized by Andrej Karpathy in early 2025 to describe a new mode of development where a human directs an LLM in natural language and the LLM produces, edits, and refactors the code. The shift matters because the bottleneck moves. In traditional development, typing speed barely mattered: thinking, designing, and debugging took the time. In vibe coding, you are mostly writing prompts. A prompt is text. The faster you can produce text, the more iterations you get per hour. StarWhisper is a Windows desktop dictation app built around exactly this shift.

The average developer types at around 50 to 70 words per minute. The average speaker dictates fluent prose at 130 to 160 words per minute. That is a 2 to 3x throughput multiplier on every prompt, every code comment, every commit message, every Slack thread, every PR description. Multiply that across a day of AI-assisted development and the time savings are meaningful.

The second-order effect matters more. When typing is the bottleneck, developers under-specify their prompts. They write "fix the auth bug" instead of "the session cookie is being cleared on logout when the user has the remember-me checkbox set, here is the relevant code, please trace why this might happen and propose a fix." The long version produces better LLM output. Voice removes the cost of the long version.

Where voice typing fits in a modern AI coding stack

StarWhisper is not a code editor and does not try to replace your IDE. It is a global Windows dictation layer that types into whatever text field is focused. That means it works equally well in every tool a developer touches in a normal day:

  • Cursor: the chat sidebar, the Ctrl+K inline edit popup, and Composer mode all accept dictated prompts. No Cursor extension required.
  • Claude Code: runs in a terminal, which is just a text input. Hold the StarWhisper hotkey, dictate your task, release, hit Enter.
  • Windsurf and Aider: Aider is a terminal CLI and works the same as Claude Code. Windsurf is an editor, so dictate into its chat panel as you would in Cursor.
  • VS Code with GitHub Copilot Chat: dictate into the Copilot Chat pane. Inline ghost-text suggestions still work normally.
  • ChatGPT, Claude.ai, Gemini, Perplexity: the browser-based chat boxes accept dictation just like any other web input.
  • Slack, Discord, Linear, Jira, GitHub: comment fields, message boxes, issue descriptions all accept dictation.

There is no integration layer because there is nothing to integrate with. StarWhisper hooks into Windows at the input level and pastes wherever your cursor is. This is the same model as the operating system's built-in voice typing, except the engine is OpenAI Whisper instead of Windows Speech Recognition, and the audio never leaves your machine.

How Whisper handles technical vocabulary

The standard complaint about voice dictation in technical contexts is that it mangles library names, framework names, and product names. This was true for older speech recognition systems trained on general English corpora. It is much less true for Whisper, which OpenAI trained on 680,000 hours of multilingual web audio, a corpus that likely includes a substantial amount of technical podcasts, conference talks, and tutorial content.

In practice, common tech vocabulary lands cleanly: React, Vue, Svelte, Next.js, Postgres, MySQL, Redis, Kafka, Docker, Kubernetes, Terraform, Ansible, Django, Flask, FastAPI, Express, Spring Boot, Rails, TensorFlow, PyTorch, NumPy, Pandas, scikit-learn, OpenAI, Anthropic, Hugging Face. The medium and large Whisper models, which Pro users get on NVIDIA GPU paths, handle these noticeably better than the small or base models.

Newer or more obscure names sometimes need a one-word correction. "tRPC" becomes "TRPC" or "T R P C" depending on how you pronounce it. "Pydantic" usually comes out right but sometimes lands as "PI dantic." For names that come up constantly in your work, you learn the pronunciation that Whisper transcribes cleanly within a day or two of use. For everything else, manual correction is faster than re-typing the entire sentence.
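If the same few names keep needing correction, one option is a small personal substitution table applied to the transcript. The sketch below is only an illustration of that idea: the `CORRECTIONS` table and `apply_corrections` helper are hypothetical and are not a StarWhisper feature.

```python
import re
from typing import Dict

# Hypothetical personal table: what the transcript says -> what you meant.
CORRECTIONS: Dict[str, str] = {
    "T R P C": "tRPC",
    "TRPC": "tRPC",
    "PI dantic": "Pydantic",
}


def apply_corrections(text: str, corrections: Dict[str, str] = CORRECTIONS) -> str:
    """Replace known mis-transcriptions, matching whole words only (case-sensitive)."""
    # Longest entries first so multi-word entries win over shorter overlaps.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(wrong) + r"(?!\w)")
        text = pattern.sub(corrections[wrong], text)
    return text


if __name__ == "__main__":
    print(apply_corrections("Validate the TRPC input with a PI dantic model."))
```

Whole-word matching keeps the table from rewriting the inside of longer identifiers.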

The speed math, with real numbers

Task | Typing at 60 WPM | Voice at 150 WPM | Time saved
200-word Cursor prompt | 3 min 20 sec | 1 min 20 sec | 2 minutes
400-word Claude Code task description | 6 min 40 sec | 2 min 40 sec | 4 minutes
100-word commit message body | 1 min 40 sec | 40 sec | 1 minute
300-word PR description | 5 minutes | 2 minutes | 3 minutes
500-word Slack design discussion | 8 min 20 sec | 3 min 20 sec | 5 minutes
20 such items across a typical day | ~90 minutes | ~35 minutes | ~55 minutes

The voice column is raw dictation time at 150 WPM. It assumes the dictated text is about 90% usable and needs a quick edit pass. In practice that pass adds about 10% of the original typing time (roughly 20 seconds on the 200-word prompt) and is not included in the voice column, so real savings run slightly below the last column. The point is not the precise minutes saved but the order of magnitude. Close to an hour a day of recovered focus time, across a year of working days, is roughly 200 hours, or five working weeks.
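The per-task rows are simply word count divided by words per minute. A minimal sketch for rerunning the arithmetic with your own rates follows. The names `seconds_for`, `savings`, and `TASKS` are illustrative, and the 60 and 150 WPM defaults are the figures used in the table.

```python
TYPING_WPM = 60
VOICE_WPM = 150

# Word counts taken from the rows of the table.
TASKS = {
    "Cursor prompt": 200,
    "Claude Code task description": 400,
    "Commit message body": 100,
    "PR description": 300,
    "Slack design discussion": 500,
}


def seconds_for(words: int, wpm: float) -> int:
    """Whole seconds needed to produce `words` at `wpm` words per minute."""
    return round(words / wpm * 60)


def savings(words: int, typing_wpm: float = TYPING_WPM, voice_wpm: float = VOICE_WPM):
    """Return (typing seconds, dictation seconds, seconds saved)."""
    typed = seconds_for(words, typing_wpm)
    spoken = seconds_for(words, voice_wpm)
    return typed, spoken, typed - spoken


def fmt(seconds: int) -> str:
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs:02d} sec"


if __name__ == "__main__":
    for name, words in TASKS.items():
        typed, spoken, saved = savings(words)
        print(f"{name}: {fmt(typed)} typed, {fmt(spoken)} dictated, {fmt(saved)} saved")
```

Swapping in your own measured typing and speaking rates shows how sensitive the savings are to each.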

Privacy: why local Whisper matters when you dictate about code

If your prompts to Cursor or Claude Code describe your employer's codebase, that prose is just as confidential as the code itself. Sending it to a third-party cloud transcription service raises the same questions your security team asks about pasting code into a public LLM: where does the audio go, who has access, how long is it retained, what is the audit story?

StarWhisper runs Whisper locally. The audio is captured by your microphone, processed by the model on your CPU or GPU, and turned into text on your machine. There is no upload step, no third-party transcription cloud, no retention period to ask about. If you unplug your network cable, dictation still works. This is structurally easier to defend in a security review than "we delete after 30 days," which is a common cloud dictation posture.

Cloud Mode, which sends audio to the OpenAI Whisper API for faster results, is opt-in and disabled by default. For dictation about proprietary code, leave it off. The performance gap on a modern NVIDIA GPU is small enough that there is rarely a reason to enable it for this use case.

Setup for the first day of voice-driven coding

The setup is short. Install StarWhisper from the download page or the Microsoft Store. The installer auto-detects whether you have an NVIDIA GPU and picks the right Whisper model pack: CPU, CUDA 11, or CUDA 12. First run downloads the model files, which takes a couple of minutes on a normal connection. After that the app sits in your system tray.
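The choice between the three packs can be pictured as a small decision function. This is a sketch under stated assumptions and not StarWhisper's actual installer logic: the function name, the pack labels, and the fallback to CPU for older CUDA versions are all hypothetical.

```python
from typing import Optional


def pick_model_pack(has_nvidia_gpu: bool, cuda_major: Optional[int]) -> str:
    """Pick a pack label from GPU presence and the CUDA major version, if known."""
    if not has_nvidia_gpu or cuda_major is None:
        return "cpu"
    if cuda_major >= 12:
        return "cuda12"
    if cuda_major == 11:
        return "cuda11"
    # Assumption: anything older than CUDA 11 falls back to the CPU pack.
    return "cpu"


if __name__ == "__main__":
    for gpu, cuda in [(False, None), (True, 11), (True, 12), (True, 10)]:
        print(gpu, cuda, "->", pick_model_pack(gpu, cuda))
```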

Pick a push-to-talk hotkey that does not collide with anything else in your IDE. Many developers use right-side keys like Right Ctrl, Right Alt, the side button on a mouse, or a foot pedal. Press, dictate, release, and the text auto-pastes into the focused field. That is the whole interaction model.
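That press, dictate, release loop is a two-state machine. The sketch below models it with injected callbacks so it runs without a microphone or a speech model. The `PushToTalk` class and its method names are assumptions for illustration, not StarWhisper's internals.

```python
from typing import Callable, List


class PushToTalk:
    """Record while the key is held; transcribe and paste on release."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 paste: Callable[[str], None]) -> None:
        self._transcribe = transcribe
        self._paste = paste
        self._chunks: List[bytes] = []
        self.recording = False

    def key_down(self) -> None:
        # Ignore keyboard auto-repeat while the key is already held.
        if not self.recording:
            self._chunks = []
            self.recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio arriving while the key is up is dropped.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        text = self._transcribe(b"".join(self._chunks)).strip()
        if text:
            self._paste(text)


if __name__ == "__main__":
    pasted: List[str] = []
    # Stand-in transcriber: reports how many bytes it received.
    ptt = PushToTalk(lambda audio: f" {len(audio)} bytes heard ", pasted.append)
    ptt.on_audio(b"dropped")  # key is not held yet
    ptt.key_down()
    ptt.on_audio(b"abc")
    ptt.on_audio(b"de")
    ptt.key_up()
    print(pasted)
```

Because audio is only buffered between key down and key up, nothing said outside that window reaches the transcriber.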

For the first week, treat it as a tool for prompts and commit messages only. Build the habit there because the wins are largest and the ways it can go wrong are fewest. Once dictating a Cursor prompt feels normal, extend to comments, docstrings, Slack, and PR descriptions. Most developers settle into a stable pattern within two weeks. From there, voice typing becomes one of those tools you only notice when it is not available, like a good mechanical keyboard or a second monitor.

For more general context on dictation in AI chat interfaces, see how to use voice to text with ChatGPT. For a related niche, the voice to text for content creators page covers the same pattern applied to writing rather than coding.

What this does not replace

Voice typing is not a code editor. It is not going to type "for (let i = 0; i < arr.length; i++)" for you, and even if it could, the IDE autocomplete already does that faster. The category of work it replaces is the prose that surrounds the code: prompts, comments, commit messages, chat, descriptions, documentation. That category has grown significantly in the AI-first developer workflow, because the LLM does more of the literal code writing.

If your workflow is hand-writing every line of code in vim with no chat, no LLM, no docs, voice typing has a smaller upside for you. If your workflow involves writing long prompts, summarizing changes for the team, and explaining design decisions in writing, the upside is large.

Frequently Asked Questions

Does StarWhisper work in Cursor for voice prompts?
Yes. StarWhisper types text into any Windows text field that accepts keyboard input, and Cursor's chat sidebar, the Ctrl+K inline edit popup, and Composer mode all qualify. Press the hotkey, dictate your prompt or request, release, and StarWhisper pastes the transcribed text wherever your cursor is. No extension is needed and there is no Cursor configuration to change.
What about Claude Code or VS Code terminals?
Claude Code runs in a terminal, which is a regular Windows text input. StarWhisper auto-pastes into it the same way it does anywhere else. The same applies to the VS Code integrated terminal, Aider, and any other CLI agent you launch from the terminal. Editor-based assistants such as Windsurf and Cody take dictation in their chat panels instead. Dictate the prompt, release the hotkey, then press Enter. If a tool blocks pasting, you can fall back to dictating into a scratch file and copying across.
Can I actually dictate code (variable names, snake_case, camelCase)?
Whisper transcribes natural language, not raw code. For literal code, voice typing is awkward and usually not faster than typing. The real win is dictating the intent: the prompt you give Cursor, the comment you leave above a function, the description in the commit message, the design discussion in Slack. The LLM writes the code, you describe what you want. That is the productivity multiplier most developers find when they try voice-driven AI coding.
How accurate is Whisper on tech terms and library names?
Whisper was trained on a large corpus of web audio, which likely includes plenty of technical talks, podcasts, and tutorials. Common library names like React, Postgres, Django, FastAPI, Kubernetes, and TensorFlow come out cleanly. Brand new framework names or obscure project codes will sometimes need a one-word correction, but the model handles tech vocabulary noticeably better than generic Windows speech recognition or older Dragon engines. Pro users on GPU get the medium or large Whisper model, which is the highest accuracy tier.
Does it punctuate code-friendly characters like parens and brackets?
Whisper outputs normal English punctuation when you speak it: 'comma', 'period', 'open paren', 'close paren' and so on. It is not a code dictation system in the Dragon NaturallySpeaking sense where you map every symbol to a voice command. For long blocks of literal code, type. For prompts, comments, docstrings, commit messages, and chat, where the input is mostly prose with a few symbols, voice is comfortable.
Can I dictate while reading the screen at the same time?
Yes, and this is one of the bigger ergonomic wins. When you type, your eyes flick between the code on screen and the keyboard. When you dictate, your eyes stay on the diff, the error, the design doc you are referring to. Many developers report being able to think more clearly about the problem because they are not splitting attention between the input mechanic and the screen. The hotkey is the only motor action your hands have to do.
What about pair programming on Zoom or Slack huddles?
StarWhisper does not transcribe the call itself, but it does help you fire off long Slack messages, paste detailed bug reports into the team channel, or write a multi-paragraph design proposal while screen sharing. The hotkey is push-to-talk, so the app only listens when you tell it to. There is no risk of accidentally typing your pair partner's audio because your microphone is only routed to Whisper when you hold the key.
Can I dictate commit messages?
Yes, and this is one of the easier wins to adopt. Open the commit message buffer in your editor, your terminal, or GitHub Desktop, press the hotkey, dictate the summary and body, release, hit save. Whisper handles conventional commit prefixes like 'feat colon', 'fix colon', 'chore colon' if you speak them, but most developers dictate prose and edit the prefix manually. A two-paragraph commit message that takes a minute to type takes under half a minute to dictate.

Try StarWhisper Free for Coding

500 words per day on the free tier. No credit card. Audio never leaves your device.

Download StarWhisper