Transcription Privacy: Keep Your Audio Off the Cloud (Free)

Name: StarWhisper
Rating: 4.8 (50 reviews)
Author: StarWhisper

What every cloud transcription service does with your audio

Open the privacy policy of any major transcription service and you will find a similar structure. The audio you upload is processed on their servers, may be passed through third-party AI providers, and is retained for some period (usually 30 days, sometimes 90, sometimes "until you delete"). Most policies also reserve the right to use anonymized portions of audio to improve the underlying model, unless you opt out, which is sometimes a paid-tier feature.

Specifically, here is the audio path for some of the largest providers:

Otter.ai: Audio uploaded to AWS, processed through proprietary models, stored on their infrastructure, available for export. Pro tier offers data retention controls.
Rev: Audio uploaded for either human or AI transcription. Stored on Rev infrastructure. Privacy posture is reasonable but the audio is unambiguously off your device.
Happy Scribe: EU-based, GDPR-friendly, but still cloud. Audio uploaded, transcribed on their servers, available for download.
Notta: Cloud-based. Free tier includes some limits but no local option.
Trint: Cloud, enterprise-focused. Used heavily in media, but the audio still uploads.
Sonix: Cloud. Stripe-style API, but audio flows through their pipeline.

For most use cases, this trade-off is fine: the cloud handles the heavy compute, you get a polished product, the audio is encrypted in transit, and the company has a SOC 2 report. For some use cases, no amount of policy is enough, because the audio still leaves the trusted environment, and the trusted environment is the only one whose security you actually control.

What local-only transcription actually means

"Local" gets used loosely in marketing. Here is what it means in StarWhisper specifically.

The model lives on your disk

The OpenAI Whisper model files are bundled with the installer. They sit in the StarWhisper installation directory on your Windows drive. You can see them, you can checksum them, you can copy them to another machine. They are not loaded from the internet at runtime. After you have installed the app, you do not need a network connection to dictate.

Inference runs on your CPU or GPU

When you press the dictation hotkey, microphone audio is captured into a memory buffer, fed into the loaded Whisper model, and the model produces text using your machine's compute. No data is sent over the network. If your machine has an NVIDIA GPU, the inference runs on CUDA cores and is faster. If it does not, the CPU path works too, just slower.

There is no remote API call

This is the cleanest distinction between local and cloud transcription. A cloud product makes an HTTPS request to its API. A local product does not. You can confirm this by running a network monitor while you dictate. As far as your audio is concerned, the result is the same as if the app had no internet permission at all.

What about updates and license checks

The app does talk to the network for two things: checking for new versions (only when you click the button, per StarWhisper's strict no-auto-update policy) and verifying your license if you are on the paid tier. Neither of those touches your audio. Both can be inspected separately. If you want to use StarWhisper on an air-gapped machine, the free tier requires no license check at all.

Use cases where local transcription is the right call

Healthcare and medical scribing

HIPAA-covered conversations between clinicians and patients should not be uploaded to a cloud transcription service unless that service has a signed BAA and the use case has been reviewed by compliance. Many SaaS transcription products do offer BAAs but only on enterprise tiers. Local processing avoids the question entirely: no BAA needed because no data crosses to a third party. We cover this in detail in voice to text for therapists and on the upcoming HIPAA dictation reference page.

Legal work and attorney-client privilege

Drafting privileged content into a cloud transcription tool is, depending on jurisdiction, either explicitly problematic or a gray area that most legal ethics opinions advise avoiding. The reasoning is that storing privileged communications on a third party's servers may waive privilege under some bar interpretations. Local processing keeps the content on the attorney's machine, which is the same standard that has applied to dictation tools for fifty years.

HR and personnel matters

Performance reviews, termination conversations, complaint investigations, and compensation discussions are exactly the type of content that should not appear in a third party's transcription database. Even if the SaaS vendor's posture is excellent, the surface area is unnecessary. Local transcription removes the question.

Journalism and source protection

If your source agreed to talk on background, "the audio is in our cloud, deleted after 30 days" is a different story than "the audio never left my laptop." Reputable journalists default to the second story when they can. Local transcription supports that default.

R&D, trade secrets, NDA-bound work

If your employer's data policy says "no customer data in third-party SaaS without security review," that same policy almost certainly applies to voice recordings of internal conversations about that data. Local processing keeps the conversation inside the trusted environment.

Government, defense, classified-adjacent work

For anything approaching SBU, CUI, or classified handling, cloud SaaS is generally off the table. Local processing is the only option that fits the threat model.

Comparison: local vs cloud audio handling

Property	Cloud transcription	StarWhisper Local Mode
Audio leaves device	Yes	No
Retention window	30 days typical, varies	None (not stored)
Third-party LLM processing	Sometimes	No
Works offline	No	Yes
Subpoena-able server log	Yes	No
BAA required for HIPAA	Yes	Not applicable
Used to train vendor models	Sometimes (opt-out varies)	Never
Works behind air gap	No	Yes
Verifiable by network capture	Upload traffic visible	No transcription traffic

How to verify the privacy claim yourself

The reason "local" matters more than "private" is that local is checkable. You do not have to trust a policy statement. You can verify the property directly.

Test 1: Network capture

Install a network monitor on Windows. GlassWire is the easiest GUI option; Wireshark is the comprehensive one; the built-in Resource Monitor (run resmon, then open the Network tab) is enough for a quick check. Start dictating in Local Mode and watch the StarWhisper process. You should see zero outbound bytes to any transcription endpoint during the dictation itself. The only outbound traffic associated with the app should be unrelated control-plane things like license verification or user-initiated update checks.
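
For a scripted version of the quick check, one option is to take a point-in-time snapshot with the Windows `netstat -ano` command and filter it by the app's process ID. The sketch below is illustrative only: the function names are made up for this example, the PID is something you look up yourself (for instance in Task Manager), and the parser assumes English-language netstat output. A snapshot can miss short-lived connections, so a capture tool such as Wireshark remains the more thorough option.

```python
import subprocess
from typing import NamedTuple


class Connection(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str   # empty for UDP rows, which have no state column
    pid: int


def parse_netstat(output: str) -> list[Connection]:
    """Parse the rows of `netstat -ano` output; headers and blank lines are skipped."""
    conns = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "TCP" and parts[4].isdigit():
            conns.append(Connection(parts[0], parts[1], parts[2], parts[3], int(parts[4])))
        elif len(parts) == 4 and parts[0] == "UDP" and parts[3].isdigit():
            conns.append(Connection(parts[0], parts[1], parts[2], "", int(parts[3])))
    return conns


def remote_endpoints(output: str, pid: int) -> list[str]:
    """Return the distinct remote addresses with an established TCP connection for one PID."""
    return sorted({c.remote for c in parse_netstat(output)
                   if c.pid == pid and c.state == "ESTABLISHED"})


def snapshot() -> str:
    """Run `netstat -ano` (Windows) and return its text output."""
    return subprocess.run(["netstat", "-ano"], capture_output=True,
                          text=True, check=True).stdout
```

Calling `remote_endpoints(snapshot(), pid)` a few times while dictating should return an empty list, or only endpoints you can attribute to license verification or an update check you triggered.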

Test 2: Air gap

Disconnect from the network entirely. Disable Wi-Fi, unplug Ethernet, turn on airplane mode. Open StarWhisper and dictate. It still works. This is the cleanest proof because it is impossible to fake. Cloud transcription tools simply error out under air-gap conditions because they have nowhere to send the audio.
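
Before running this test it is worth confirming that the machine really is offline, since a forgotten VPN adapter or second network interface can quietly keep a route open. The following is a minimal sketch, assuming a few probe hosts of your own choosing (the `example.com` entry is only a placeholder); it checks outbound TCP reachability and nothing more, so it does not prove physical isolation.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, unreachable networks, and timeouts.
        return False


def is_air_gapped(probes: list[tuple[str, int]], timeout: float = 3.0) -> bool:
    """Return True only if none of the probe endpoints can be reached."""
    return not any(can_reach(host, port, timeout) for host, port in probes)


# Placeholder probe list; replace with hosts that are meaningful on your network.
PROBES = [("example.com", 443)]
```

If `is_air_gapped(PROBES)` returns True and dictation still works, the transcription path does not depend on those endpoints being reachable.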

Test 3: Inspect the install

Open the StarWhisper installation folder. You will see the Whisper model files (the GGML or GGUF formats, depending on backend). These are large binary files (several hundred MB to a few GB depending on model size). Their presence on disk is what makes local processing possible. They are the model. They are the entire pipeline. Nothing about transcription has to leave the folder they live in.
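
One way to make this inspection repeatable is to list the large files in the folder and record a SHA-256 checksum for each, so that a copy on another machine can be compared against the original. This is a sketch under stated assumptions: the 50 MB size threshold is an arbitrary cutoff for "probably a model file", and the function names are illustrative rather than anything StarWhisper ships.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def model_manifest(folder: Path, min_bytes: int = 50 * 1024 * 1024) -> dict[str, str]:
    """Map each file of at least min_bytes (path relative to folder) to its SHA-256."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p.stat().st_size >= min_bytes
    }
```

Point `model_manifest` at your actual installation directory (the location depends on where you installed the app) and save the result; running it again later, or on a second machine, shows whether the model files are byte-for-byte identical.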

What you cannot fully verify

You cannot verify that the app does not buffer audio to disk before discarding it. (It does not, but this is a code-level assertion.) You also cannot verify that Microsoft Windows itself is not capturing microphone audio independently. Those are separate concerns. For the OS layer, the standard Windows hardening guides apply.

Where cloud transcription wins, honestly

This is not a one-sided argument

For a lot of users, cloud transcription is genuinely the right tool. Multi-speaker meeting transcription with speaker labels is much better in Otter or Fireflies than in any single-microphone local tool. Cross-device sync works because the cloud is the storage layer. Automatic AI summarization runs faster on dedicated GPU servers than on a laptop. Customer support and integrations are stronger from a venture-backed product than from a small Windows app.

If your content is not particularly sensitive, you are working across multiple devices, and you want the polished AI-summary-and-share workflow, a cloud tool is probably the better answer. StarWhisper is specifically the answer for users where the audio path matters, and the bar for adoption is whether you trust that path.

Specifically, cloud transcription is better when

You need speaker labels and multi-party transcription. StarWhisper is built for one speaker (you).
You need cross-device sync. StarWhisper is desktop Windows only, no mobile or cloud sync.
You want post-meeting AI summarization with action item extraction. This is a cloud-tool strength.
Your team has standardized on a particular tool. The integration cost may outweigh the privacy upside.

What about StarWhisper's optional Cloud Mode

StarWhisper does ship with an optional Cloud Mode that sends audio to the OpenAI Whisper API. This exists because some users on low-spec machines want faster transcription and do not have a privacy concern with cloud processing. Cloud Mode is:

Off by default. The app ships in Local Mode out of the box.
Opt-in. You enable it in Settings; the toggle is clearly labeled.
Reversible. You can turn it off at any time and the app returns to local-only behavior.
Disclosed. The settings UI explains what changes when you enable it.

If your reason for considering StarWhisper is privacy, keep Cloud Mode off. The full Local Mode experience does not require it. The deeper local vs cloud reference is on the Whisper local vs cloud FAQ page.

Pricing and how to start

StarWhisper is free to download. The free plan covers 500 words per day, which is enough for most users to evaluate the workflow on real content for a week or two before deciding. Pro is $10 per month or $80 per year and removes the daily limit. There is no per-seat pricing, no tier upsell, no usage meter beyond the daily word count. Full details are in the pricing section of the homepage.
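
The arithmetic behind those numbers is simple enough to check by hand, but a short sketch makes the comparison explicit. The prices and the 500-word limit come from the paragraph above; the speaking rate passed to `free_minutes_per_day` is your own estimate, not a figure from StarWhisper.

```python
MONTHLY_USD = 10          # Pro, billed monthly
ANNUAL_USD = 80           # Pro, billed yearly
FREE_WORDS_PER_DAY = 500  # free plan daily limit


def annual_savings() -> int:
    """Dollars saved over twelve months by paying yearly instead of monthly."""
    return MONTHLY_USD * 12 - ANNUAL_USD


def break_even_months() -> float:
    """Number of monthly payments that cost the same as one annual payment."""
    return ANNUAL_USD / MONTHLY_USD


def free_minutes_per_day(words_per_minute: float) -> float:
    """Minutes of dictation the free plan covers at a given speaking rate."""
    return FREE_WORDS_PER_DAY / words_per_minute
```

Paying yearly saves $40 over twelve monthly payments and costs the same as eight of them, so the annual plan is cheaper if you expect to stay subscribed for more than eight months.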

System requirements are Windows 10 or 11. Any modern CPU works for the local Whisper path; an NVIDIA GPU makes it faster but is not required. The installer is a few hundred megabytes including the bundled model. Once installed, no network connection is needed for transcription. For more on the offline behavior, the dedicated privacy and offline features page goes into the architectural detail.

Frequently Asked Questions

Does StarWhisper ever send audio anywhere?

Not in Local Mode, which is the default. Audio is captured by your microphone, fed straight into the local Whisper model, turned into text, and discarded. There is no upload step, no third-party processor, no transcript stored on a remote server. The only way audio leaves the device is if you explicitly enable Cloud Mode in settings, which is opt-in and disclosed at the moment you turn it on.

What about Cloud Mode, when does that send audio?

Cloud Mode sends audio to the OpenAI Whisper API only after you explicitly enable it in Settings. It is off by default. You can disable it at any time. The toggle exists for users who want slightly faster transcription on low-end hardware and do not need local-only processing. The Local Mode default never touches the network for transcription.

Can I prove that the audio does not leave my device?

Yes. Open a network monitor like Wireshark, Resource Monitor, or GlassWire on Windows. Start a dictation session in Local Mode. You will see zero outbound traffic from StarWhisper to any transcription endpoint during transcription. The only network traffic associated with the app is occasional license verification and update checks, both unrelated to your audio.

What about telemetry or analytics, does that include audio?

No. StarWhisper's telemetry covers usage events (e.g., dictation started, app version, OS version) and crash reports. It does not include audio, transcribed text content, or any payload that could identify what you said. Telemetry can also be disabled in Settings if you prefer to send nothing at all. The full data inventory is documented in the privacy policy.

Is the transcript stored anywhere?

StarWhisper does not store a transcript history server-side. The transcribed text is pasted into the application you have focused (Word, Notion, Outlook, etc.) and that application handles storage on your own machine. If you use the optional local history feature, transcripts are saved to a folder on your PC that you control and can delete at any time. Nothing is uploaded.

What does local processing actually mean technically?

The OpenAI Whisper model is bundled with the installer and stored on your disk. When you dictate, the app loads the model into memory, captures microphone audio, runs the audio through the model's neural network using your CPU or GPU, and produces text. There is no remote API call. The same architecture would work on a fully air-gapped machine. This is fundamentally different from a SaaS transcription product where the model lives on the vendor's servers.

What about Windows itself or other apps spying on me?

That is a separate concern and outside the scope of any single application. Windows has its own telemetry, which you can configure in Settings. Other apps on your machine may have microphone access. StarWhisper cannot speak to what those do; it can only speak to what it does itself, which is process audio locally. If your threat model includes the OS, you should harden the OS independently.

How do I verify all of this for myself?

Three steps. First, run a network capture during dictation and confirm no upload. Second, check the StarWhisper installation folder to confirm the Whisper model files are present locally. Third, disconnect from the internet entirely and confirm dictation still works in Local Mode. The third test is the cleanest proof because cloud services would simply fail if the network were unavailable.

Six Privacy Properties

Zero upload by default
Works offline
No retention window to manage
No third-party LLM hop
No subpoena surface
No vendor lock-in