Many cloud transcription tools process audio on vendor infrastructure. That can be inappropriate for confidential client calls, medical conversations, legal work, HR meetings, or R&D recordings unless the vendor and workflow have been reviewed. StarWhisper can run OpenAI Whisper locally on your Windows PC. In Local Mode, audio is not uploaded for transcription.
Where most transcription tools send your audio, and the alternative.
Cloud transcription services process audio on vendor infrastructure and may retain files, logs, or derived content according to their own terms. The convenience is real. The review burden is also real.
For non-sensitive content this is fine. For NDA-bound calls, medical or legal conversations, HR matters, or anything covered by GDPR or HIPAA, it forces a procurement conversation and a paper trail before you can even use the tool.
StarWhisper bundles the OpenAI Whisper model with the installer. When you transcribe, the model loads into your machine's memory, runs the audio through its neural network using your CPU or GPU, and produces text. There is no upload, no server, no log to subpoena, no retention period to ask about.
If you unplug the network, transcription still works. This is structural privacy, not a policy promise.
What "local processing" actually buys you
In Local Mode, no part of the audio leaves the device. This is checkable with any network monitor. You can verify it before you trust it.
Disconnect from the internet and transcription still works. Cloud tools fail under the same conditions. Offline operation is the cleanest proof of local processing.
Cloud services handle retention according to their own policies. With Local Mode, transcription audio is not uploaded to a transcription vendor, but users should still review local files, application settings, and their own retention obligations.
Some cloud transcription services pass your audio through additional AI models for cleanup or summarization, multiplying the parties that have access. Local processing keeps the data path to one machine.
Reducing third-party copies can reduce one category of external request risk. This matters for journalists, lawyers, and anyone whose source material is sensitive enough that legal process is a real consideration.
Whisper is open source. The audio you process today is not trapped in a vendor's account. If StarWhisper ever ceased to exist, the underlying model would still work.
Open the privacy policy of a cloud transcription service and you need to review where audio is processed, who the subprocessors are, how long content is retained, whether model-training options exist, and what controls are available on your tier.
For competitor tools, verify current vendor documentation against that checklist before use.
For most use cases, this trade-off is fine. The cloud handles the heavy compute, you get a polished product, the audio is encrypted in transit, the company has SOC 2. For some use cases, no amount of policy is enough because the audio still leaves the trusted environment, and the trusted environment is the only one whose security you actually control.
"Local" gets used loosely in marketing. Here is what it means in StarWhisper specifically.
The OpenAI Whisper model files are bundled with the installer. They sit in the StarWhisper installation directory on your Windows drive. You can see them, you can checksum them, you can copy them to another machine. They are not loaded from the internet at runtime. After you have installed the app, you do not need a network connection to dictate.
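If you want to checksum the model files, a short script is enough. This is a minimal sketch, not something StarWhisper ships: the install path in the usage note and the `.bin` / `.gguf` suffixes are assumptions, so adjust both to match what you actually see on disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_models(folder: Path, suffixes=(".bin", ".gguf")) -> dict[str, str]:
    """Map each model file (path relative to `folder`) to its SHA-256 digest.

    The default suffixes are an assumption about how model files are named.
    """
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p.suffix.lower() in suffixes
    }
```

Call it as `checksum_models(Path(r"C:\Program Files\StarWhisper"))` (a hypothetical location) and save the output. Running it again later, or on a second machine you copied the files to, should give identical digests if the files are unchanged.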
When you press the dictation hotkey, microphone audio is captured into a memory buffer, fed into the loaded Whisper model, and the model produces text using your machine's compute. No data is sent over the network. If your machine has an NVIDIA GPU, the inference runs on CUDA cores and is faster. If it does not, the CPU path works too, just slower.
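The GPU-or-CPU choice can be illustrated with the open-source `openai-whisper` Python package. This is a sketch of the general technique, not StarWhisper's own code: the function names are hypothetical, the package needs `torch` and `ffmpeg` installed, and unlike a bundled-model setup it downloads weights on first use unless they are already cached.

```python
def pick_device(cuda_available: bool) -> str:
    """Prefer the GPU when CUDA is usable; otherwise fall back to the CPU."""
    return "cuda" if cuda_available else "cpu"


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe a local audio file on this machine's own CPU or GPU."""
    # Third-party imports stay inside the function so pick_device() can be
    # used and tested without them installed.
    import torch
    import whisper  # the open-source `openai-whisper` package

    device = pick_device(torch.cuda.is_available())
    model = whisper.load_model(model_name, device=device)
    # Half precision only helps on the GPU; the CPU path runs in fp32.
    result = model.transcribe(audio_path, fp16=(device == "cuda"))
    return result["text"]
```

Once the weights are on disk, inference itself makes no network request: the audio goes from file to model to text on the same machine.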
This is the cleanest distinction between local and cloud transcription. A cloud product makes an HTTPS request to its API. A local product does not. You can confirm this by running a network monitor while you dictate. The result is the same as if the app had no internet permission at all.
The app does talk to the network for two things: checking for new versions (only when you click the button, per StarWhisper's strict no-auto-update policy) and verifying your license if you are on the paid tier. Neither of those touches your audio. Both can be inspected separately. If you want to use StarWhisper on an air-gapped machine, the free tier requires no license check at all.
HIPAA-sensitive conversations should not be routed through any transcription workflow until the vendor, data path, settings, and retention policy have been reviewed. Local Mode can reduce transcription-vendor exposure because audio is not uploaded for transcription, but each practice still needs its own HIPAA review, workstation safeguards, retention policy, and Cloud Mode controls. We cover related considerations in voice to text for therapists.
Drafting privileged content into a cloud transcription tool is, depending on jurisdiction, either explicitly problematic or a gray area that most legal ethics opinions advise avoiding. The reasoning is that storing privileged communications on a third party's servers may waive privilege under some bar interpretations. Local Mode can reduce third-party processing exposure, but lawyers should follow their jurisdiction, client instructions, firm policy, and retention rules.
Performance reviews, termination conversations, complaint investigations, and compensation discussions are exactly the type of content that should not appear in a third party's transcription database. Even if the SaaS vendor's posture is excellent, the surface area is unnecessary. Local transcription removes the question.
If your source agreed to talk on background, "the audio is in our cloud, deleted after 30 days" is a different story than "the audio never left my laptop." Reputable journalists default to the second story when they can. Local transcription supports that default.
If your employer's data policy says "no customer data in third-party SaaS without security review," that same policy almost certainly applies to voice recordings of internal conversations about that data. Local processing keeps the conversation inside the trusted environment.
For anything approaching SBU, CUI, or classified handling, cloud SaaS is generally off the table. Local processing is the only option that fits the threat model.
| Property | Cloud transcription | StarWhisper Local Mode |
|---|---|---|
| Audio leaves device | Yes | No |
| Retention window | 30 days typical, varies | None (not stored) |
| Third-party LLM processing | Sometimes | No |
| Works offline | No | Yes |
| Subpoena-able server log | Yes | No |
| BAA required for HIPAA | Yes | Customer review |
| Used to train vendor models | Sometimes (opt-out varies) | Never |
| Works behind air gap | No | Yes |
| Verifiable by network capture | Upload traffic visible | No audio upload traffic |
The reason "local" matters more than "private" is that local is checkable. You do not have to trust a policy statement. You can verify the property directly.
Install a network monitor on Windows. GlassWire is the easiest GUI option; Wireshark is the comprehensive one; the built-in Resource Monitor (resmon.exe, Network tab) is enough for a quick check. Start dictating in Local Mode and watch the StarWhisper process. You should see zero outbound bytes to any transcription endpoint during the dictation itself. The only outbound traffic associated with the app should be unrelated control-plane traffic such as license verification or user-initiated update checks.
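For a scriptable spot check, you can also parse the output of the built-in `netstat -ano` command, which lists each TCP connection with its owning process ID. The sketch below is a rough complement to a real network monitor, not a replacement: it shows open connections at one instant, not byte counts, and the function names are hypothetical.

```python
import subprocess


def netstat_snapshot() -> str:
    """Run `netstat -ano` (Windows) and return its text output."""
    completed = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return completed.stdout


def remote_endpoints(netstat_output: str, pid: int) -> set[str]:
    """Return the remote TCP endpoints owned by `pid` in `netstat -ano` output."""
    endpoints = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, remote, state, PID.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _proto, _local, remote, state, owner = fields
        # Listening sockets have no remote peer, so skip them.
        if owner != str(pid) or state == "LISTENING":
            continue
        endpoints.add(remote)
    return endpoints
```

Look up the StarWhisper process ID in Task Manager's Details tab, then call `remote_endpoints(netstat_snapshot(), pid)` a few times while dictating. In Local Mode you would expect an empty set, or only the license and update endpoints.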
Disconnect from the network entirely. Disable Wi-Fi, unplug Ethernet, turn on airplane mode. Open StarWhisper and dictate. It still works. This is the cleanest proof because it is impossible to fake. Cloud transcription tools simply error out under air-gap conditions because they have nowhere to send the audio.
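Before running the offline test, it helps to confirm the machine really is offline. A minimal sketch using only the Python standard library; the function name is hypothetical and the address in the usage note is just an arbitrary public host.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and "network unreachable".
        return False
```

With the network disabled, `can_reach("1.1.1.1", 443)` should return `False`. If dictation still produces text at that point, transcription did not depend on a remote server.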
Open the StarWhisper installation folder. You will see the Whisper model files (the GGML or GGUF formats, depending on backend). These are large binary files (several hundred MB to a few GB depending on model size). Their presence on disk is what makes local processing possible. They are the model. They are the entire pipeline. Nothing about transcription has to leave the folder they live in.
With these checks alone, you cannot verify that the app does not buffer audio to disk before discarding it. (It does not, but this is a code-level assertion.) You also cannot verify that Microsoft Windows itself is not capturing microphone audio independently. Those are separate concerns. For the OS layer, the standard Windows hardening guides apply.
For a lot of users, cloud transcription is genuinely the right tool. Multi-speaker meeting transcription with speaker labels is much better in Otter or Fireflies than in any single-microphone local tool. Cross-device sync works because the cloud is the storage layer. Automatic AI summarization runs faster on dedicated GPU servers than on a laptop. Customer support and integrations are stronger from a venture-backed product than a small Windows app.
If your content is not particularly sensitive, you are working across multiple devices, and you want the polished AI-summary-and-share workflow, a cloud tool is probably the better answer. StarWhisper is specifically the answer for users where the audio path matters, and the bar for adoption is whether you trust that path.
StarWhisper does ship with an optional Cloud Mode that sends audio to the OpenAI Whisper API. This exists because some users on low-spec machines want faster transcription and do not have a privacy concern with cloud processing. Cloud Mode is:
If your reason for considering StarWhisper is privacy, keep Cloud Mode off. The full Local Mode experience does not require it. The deeper local vs cloud reference is on the Whisper local vs cloud FAQ page.
StarWhisper is free to download. The free plan covers 500 words per day, which is enough for most users to evaluate the workflow on real content for a week or two before deciding. Pro is $10 per month or $80 per year and removes the daily limit. There is no per-seat pricing, no tier upsell, no usage meter beyond the daily word count. Full details are in the pricing section of the homepage.
System requirements are Windows 10 or 11. Any modern CPU works for the local Whisper path; an NVIDIA GPU makes it faster but is not required. The installer is a few hundred megabytes including the bundled model. Once installed, no network connection is needed for transcription. For more on the offline behavior, the dedicated privacy and offline features page goes into the architectural detail.
The technical detail of how local processing works in StarWhisper.
Side-by-side reference on the two operating modes and when to use each.
Local-only transcription for confidential clinical notes.
How to set up StarWhisper to work entirely without an internet connection.