Vibe coding works best at 150 words per minute. Dictate long prompts to Cursor, Claude Code, and Copilot. Type comments, docstrings, commit messages, and Slack chat with your voice. Local Whisper, Windows native, free to start.
Andrej Karpathy called it "vibe coding": describing intent to an LLM in natural language and letting it write the code. Typing is the bottleneck, and voice removes it.
Frontier models reward long, specific prompts. Typing a 400-word prompt to Cursor takes 6 to 8 minutes. Dictating one takes about 2.5 minutes. You get more iterations per hour.
Voice typing is not a replacement for typing brackets, colons, and snake_case identifiers. The win is dictating the prose around the code, not the code itself.
Six places where voice typing replaces hand typing in a modern dev workflow
The prompt to your AI pair programmer is mostly prose, often several hundred words for a non-trivial task. Dictating it takes well under half the time of typing it, and tends to produce clearer, more specific prompts because you can hear your own ambiguity.
The codebase needs Google-style or JSDoc-style comments and nobody writes them because typing them out is friction. Voice typing removes the friction. Explain what the function does in normal English, edit a few words, move on.
The two-paragraph commit that explains the why takes a minute to type and under half a minute to dictate. Conventional commit prefixes such as feat, fix, or chore get added with one keystroke after Whisper hands you the body.
Pair programming over a call, debugging together in a Slack huddle, dropping a design note in a thread. All prose, all easier dictated, especially when you are also screen sharing and want your hands free to point at things.
Every pull request needs a description and every team has a "good PR description" template. Dictating a four-paragraph PR description into the GitHub form takes a minute or two. Reviewers thank you for the context.
Filing a clear bug report in Linear or Jira is the kind of writing that gets rushed because typing is slow. Voice typing gives you the bandwidth to actually describe the repro steps, the expected versus actual, and the workaround you tried.
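Two of the six items above, docstrings and commit messages, fit into one small illustration. The sketch below is a hypothetical helper, not part of StarWhisper or git: `format_commit`, `ALLOWED_PREFIXES`, and the 72-column wrap width are assumptions made for the example. The docstring is Google-style and reads the way dictated prose tends to come out after a light edit, and the function adds a conventional commit prefix to a dictated body.

```python
import textwrap

# Prefixes named in the text above; teams commonly allow others as well.
ALLOWED_PREFIXES = ("feat", "fix", "chore")


def format_commit(prefix: str, summary: str, body: str, width: int = 72) -> str:
    """Build a commit message from a prefix, a summary, and a dictated body.

    The body is treated as raw dictated prose. Blank lines separate
    paragraphs, and each paragraph is re-wrapped so the result reads
    well in a terminal.

    Args:
        prefix: Conventional commit type, for example "fix".
        summary: One-line description of the change.
        body: Dictated explanation of why the change was made.
        width: Maximum line width for the wrapped body.

    Returns:
        The full commit message, with a blank line between the
        subject line and each body paragraph.

    Raises:
        ValueError: If the prefix is not one of ALLOWED_PREFIXES.
    """
    if prefix not in ALLOWED_PREFIXES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    # Split on blank lines, then collapse stray whitespace inside each paragraph.
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    subject = f"{prefix}: {summary.strip()}"
    return "\n\n".join([subject, *wrapped])
```

The prefix is the only part typed by hand; everything in `body` can come straight from dictation.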
The term "vibe coding" was popularized by Andrej Karpathy in early 2025 to describe a new mode of development where a human directs an LLM in natural language and the LLM produces, edits, and refactors the code. The shift matters because the bottleneck moves. In traditional development, typing speed barely mattered: thinking, designing, and debugging took the time. In vibe coding, you are mostly writing prompts. A prompt is text. The faster you can produce text, the more iterations you get per hour. StarWhisper is a Windows desktop dictation app built around exactly this shift.
The average developer types at around 50 to 70 words per minute. The average speaker dictates fluent prose at 130 to 160 words per minute. That is a 2 to 3x throughput multiplier on every prompt, every code comment, every commit message, every Slack thread, every PR description. Multiply that across a day of AI-assisted development and the time savings are meaningful.
The second-order effect matters more. When typing is the bottleneck, developers under-specify their prompts. They write "fix the auth bug" instead of "the session cookie is being cleared on logout when the user has the remember-me checkbox set, here is the relevant code, please trace why this might happen and propose a fix." The long version produces better LLM output. Voice removes the cost of the long version.
StarWhisper is not a code editor and does not try to replace your IDE. It is a global Windows dictation layer that types into whatever text field is focused. That means it works equally well in every tool a developer touches in a normal day, from Cursor, Claude Code, and Copilot to Slack, GitHub, Linear, and Jira.
There is no integration layer because there is nothing to integrate with. StarWhisper hooks into Windows at the input level and pastes wherever your cursor is. This is the same model as the operating system's built-in voice typing, except the engine is OpenAI Whisper instead of Windows Speech Recognition, and the audio never leaves your machine.
The standard complaint about voice dictation in technical contexts is that it mangles library names, framework names, and product names. This was true for older speech recognition systems trained on general English corpora. It is much less true for Whisper, which OpenAI trained on 680,000 hours of multilingual audio collected from the web, a corpus that likely includes a good deal of technical podcasts, conference talks, and tutorial content.
In practice, common tech vocabulary lands cleanly: React, Vue, Svelte, Next.js, Postgres, MySQL, Redis, Kafka, Docker, Kubernetes, Terraform, Ansible, Django, Flask, FastAPI, Express, Spring Boot, Rails, TensorFlow, PyTorch, NumPy, Pandas, scikit-learn, OpenAI, Anthropic, Hugging Face. The medium and large Whisper models, which Pro users get on NVIDIA GPU paths, handle these noticeably better than the small or base models.
Newer or more obscure names sometimes need a one-word correction. "tRPC" becomes "TRPC" or "T R P C" depending on how you pronounce it. "Pydantic" usually comes out right but sometimes lands as "PI dantic." For names that come up constantly in your work, you learn the pronunciation that Whisper transcribes cleanly within a day or two of use. For everything else, manual correction is faster than re-typing the entire sentence.
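For names that keep coming out wrong, one option is a small post-processing pass over the dictated text. The sketch below is an assumption about how you might do that yourself, not a StarWhisper feature: `CORRECTIONS` and `apply_corrections` are hypothetical names, and the entries are the examples from the paragraph above.

```python
import re

# Hypothetical corrections: mis-transcribed form -> preferred spelling.
CORRECTIONS = {
    "TRPC": "tRPC",
    "T R P C": "tRPC",
    "PI dantic": "Pydantic",
}


def apply_corrections(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Replace known mis-transcriptions with their preferred spelling."""
    # Longer keys go first so a multi-word form is handled before shorter ones.
    for wrong in sorted(corrections, key=len, reverse=True):
        # Match whole tokens only, so "TRPC" inside a longer word is left alone.
        pattern = re.compile(r"(?<!\w)" + re.escape(wrong) + r"(?!\w)")
        text = pattern.sub(corrections[wrong], text)
    return text
```

Matching is case-sensitive on purpose, so text that is already spelled "tRPC" passes through unchanged.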
| Task | Typing at 60 WPM | Voice at 150 WPM | Time saved |
|---|---|---|---|
| 200-word Cursor prompt | 3 min 20 sec | 1 min 20 sec | 2 minutes |
| 400-word Claude Code task description | 6 min 40 sec | 2 min 40 sec | 4 minutes |
| 100-word commit message body | 1 min 40 sec | 40 sec | 1 minute |
| 300-word PR description | 5 minutes | 2 minutes | 3 minutes |
| 500-word Slack design discussion | 8 min 20 sec | 3 min 20 sec | 5 minutes |
| 20 such items across a typical day | ~90 minutes | ~35 minutes | ~55 minutes |
The per-task rows are raw dictation time at 150 WPM. They assume the dictated text is about 90% usable and needs a quick edit pass; most developers find that pass adds about 10% of the original typing time, which trims the savings somewhat. The point is not the precise minutes saved but the order of magnitude. Close to an hour a day of recovered focus time, across a year of working days, is roughly 200 hours, or five working weeks.
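The per-task rows in the table are plain arithmetic: word count divided by words per minute. A minimal sketch for running the same numbers on your own prompt lengths follows. The function names are made up for the example, and `edit_fraction` is a hypothetical knob for adding an edit pass, expressed as a share of the typing time.

```python
TYPING_WPM = 60   # typing speed used in the table
VOICE_WPM = 150   # dictation speed used in the table


def seconds_for(words: int, wpm: float) -> int:
    """Seconds needed to produce `words` at `wpm` words per minute."""
    return round(words * 60 / wpm)


def compare(words: int, edit_fraction: float = 0.0) -> tuple[int, int, int]:
    """Return (typing, voice, saved) in seconds for a text of `words` words.

    With edit_fraction=0.0 the voice figure is pure dictation time.
    A value of 0.1 adds an edit pass equal to 10% of the typing time.
    """
    typing = seconds_for(words, TYPING_WPM)
    voice = seconds_for(words, VOICE_WPM) + round(typing * edit_fraction)
    return typing, voice, typing - voice


def fmt(seconds: int) -> str:
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} sec"
```

For a 200-word prompt, `compare(200)` gives 200 seconds of typing against 80 seconds of dictation, matching the first row. `compare(200, edit_fraction=0.1)` raises the voice figure to 100 seconds.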
If your prompts to Cursor or Claude Code describe your employer's codebase, that prose is just as confidential as the code itself. Sending it to a third-party cloud transcription service raises the same questions your security team asks about pasting code into a public LLM: where does the audio go, who has access, how long is it retained, what is the audit story?
StarWhisper runs Whisper locally. The audio is captured by your microphone, processed by the model on your CPU or GPU, and turned into text on your machine. There is no upload step, no third-party transcription cloud, no retention period to ask about. If you unplug your network cable, dictation still works. This is structurally easier to defend in a security review than "we delete after 30 days," which is the standard cloud dictation posture.
Cloud Mode, which sends audio to the OpenAI Whisper API for faster results, is opt-in and disabled by default. For dictation about proprietary code, leave it off. The performance gap on a modern NVIDIA GPU is small enough that there is rarely a reason to enable it for this use case.
The setup is short. Install StarWhisper from the download page or the Microsoft Store. The installer auto-detects whether you have an NVIDIA GPU and picks the right Whisper model pack: CPU, CUDA 11, or CUDA 12. First run downloads the model files, which takes a couple of minutes on a normal connection. After that the app sits in your system tray.
Pick a push-to-talk hotkey that does not collide with anything else in your IDE. Many developers use right-side keys like Right Ctrl or Right Alt, the side button on a mouse, or a foot pedal. Press, dictate, release, and the text auto-pastes into the focused field. That is the whole interaction model.
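The press, dictate, release loop can be pictured as a two-state machine. The toy model below is only an illustration of that loop, not how StarWhisper is implemented: `PushToTalk`, `transcribe`, and `paste` are hypothetical names, and the transcriber and paste target are injected stand-ins so the sketch runs without a microphone.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PushToTalk:
    """Toy model of a press, dictate, release, paste loop."""

    transcribe: Callable[[list[bytes]], str]  # stand-in for a speech-to-text engine
    paste: Callable[[str], None]              # stand-in for typing into the focused field
    recording: bool = False
    _chunks: list[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Start a fresh recording; repeated key-down events while held are ignored.
        if not self.recording:
            self.recording = True
            self._chunks = []

    def audio(self, chunk: bytes) -> None:
        # Audio only accumulates while the hotkey is held.
        if self.recording:
            self._chunks.append(chunk)

    def key_up(self) -> None:
        # Releasing the key ends the recording and pastes the transcript, if any.
        if not self.recording:
            return
        self.recording = False
        text = self.transcribe(self._chunks)
        if text:
            self.paste(text)
```

Nothing is captured while the key is up, which is why a hotkey that never collides with an IDE shortcut matters.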
For the first week, treat it as a tool for prompts and commit messages only. Build the habit there because the wins are largest and the failure modes are fewest. Once dictating a Cursor prompt feels normal, extend to comments, docstrings, Slack, and PR descriptions. Most developers settle into a stable pattern within two weeks. From there, voice typing becomes one of those tools you only notice when it is not available, like a good mechanical keyboard or a second monitor.
For more general context on dictation in AI chat interfaces, see how to use voice to text with ChatGPT. For a related niche, the voice to text for content creators page covers the same pattern applied to writing rather than coding.
Voice typing is not a code editor. It is not going to type "for (let i = 0; i < arr.length; i++)" for you, and even if it could, the IDE autocomplete already does that faster. The category of work it replaces is the prose that surrounds the code: prompts, comments, commit messages, chat, descriptions, documentation. That category has grown significantly in the AI-first developer workflow, because the LLM does more of the literal code writing.
If your workflow is hand-writing every line of code in vim with no chat, no LLM, no docs, voice typing has a smaller upside for you. If your workflow involves writing long prompts, summarizing changes for the team, and explaining design decisions in writing, the upside is large.