How fast is offline Whisper on Windows? A real speed benchmark (2026)

Name: StarWhisper Offline Whisper Speed Benchmark on Windows (2026)
Creator: StarWhisper
Published: 2026-07-04

We measured whisper.cpp transcription speed for all five Whisper model sizes (tiny, base, small, medium, and large-v3) on three compute engines (CPU, NVIDIA CUDA, and Vulkan) on Windows. The short version: on a modern GPU, offline speech-to-text is effectively instant, and you do not need an NVIDIA card to get there.

Engine: whisper.cpp. Sample: the standard 11-second jfk.wav clip. Hardware: Intel Core i9-13980HX (8 threads used) and an NVIDIA RTX 4090 Laptop GPU. Best of two runs per configuration.

The short answer

On a GPU, it is instant. The default small model transcribes 11 seconds of speech in about 0.83 seconds, roughly 13x faster than real time. Even the largest model (large-v3) runs faster than real time (2.8x with CUDA, 3.0x with Vulkan).
You do not need NVIDIA. Vulkan matches CUDA to within a few percent on the same GPU, and Vulkan is the engine AMD and Intel GPUs use, so owners of those cards are not stuck with a slower fallback. Offline Whisper is not an NVIDIA-only story.
CPU-only is the real constraint. The tiny and base models still beat real time on a fast CPU (5.6x and 2.4x), but the small model drops to about 0.5x real time and large-v3 to 0.1x (over two minutes for an 11-second clip). On a CPU, stick to the tiny or base model, or turn on GPU acceleration.

0.83s: small model, 11s of audio, on GPU (13x real time)

~1%: overall speed gap between Vulkan and CUDA on the same GPU (total time across all five models)

35x: real-time factor for the tiny model on GPU

GPU results (RTX 4090 Laptop): CUDA vs Vulkan

On a modern GPU every model size runs faster than real time (2.8x or better), so live dictation keeps up with your speech. Notice how close CUDA and Vulkan are: the cross-vendor Vulkan path is not a downgrade.

Model	CUDA time (11s clip)	CUDA speed (x real time)	Vulkan time (11s clip)	Vulkan speed (x real time)
tiny	0.32s	34.3x	0.31s	35.5x
base	0.41s	27.2x	0.42s	26.5x
small (default)	0.84s	13.1x	0.82s	13.4x
medium	1.91s	5.7x	2.02s	5.4x
large-v3	3.90s	2.8x	3.73s	3.0x
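The ~1% headline is an aggregate rather than a per-model figure. This short Python sketch (the dictionary and function names are illustrative) recomputes it from the GPU table: summed over all five models, Vulkan needs 7.30 s against CUDA's 7.38 s, about 1% less, while individual models differ by roughly 2% to 6% in either direction.

```python
# GPU wall-clock times in seconds for the 11 s jfk.wav clip, copied from the table.
# Each value is (cuda_seconds, vulkan_seconds).
GPU_TIMES = {
    "tiny": (0.32, 0.31),
    "base": (0.41, 0.42),
    "small": (0.84, 0.82),
    "medium": (1.91, 2.02),
    "large-v3": (3.90, 3.73),
}


def percent_gap(cuda_s: float, vulkan_s: float) -> float:
    """Relative gap in percent; positive means Vulkan was slower than CUDA."""
    return (vulkan_s - cuda_s) / cuda_s * 100.0


# Per-model gaps: these swing both ways and stay under about 6%.
for model, (cuda_s, vulkan_s) in GPU_TIMES.items():
    print(f"{model:9s} Vulkan vs CUDA: {percent_gap(cuda_s, vulkan_s):+5.1f}%")

# Aggregate gap over all five models, which is where the ~1% comes from.
total_cuda = sum(c for c, _ in GPU_TIMES.values())
total_vulkan = sum(v for _, v in GPU_TIMES.values())
print(f"all models: CUDA {total_cuda:.2f}s, Vulkan {total_vulkan:.2f}s, "
      f"gap {percent_gap(total_cuda, total_vulkan):+.1f}%")
```

A negative sign means Vulkan finished first; on this GPU that happened for tiny, small, and large-v3.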

CPU results (Core i9-13980HX, 8 threads)

A fast CPU handles the tiny and base models comfortably, but the compute cost climbs steeply with model size. This is exactly why StarWhisper defaults to a right-sized model and uses your GPU when one is available.

Model	CPU time (11s clip)	CPU speed (x real time)	Verdict for live dictation
tiny	1.98s	5.6x	Comfortably real time
base	4.54s	2.4x	Real time
small	20.4s	0.5x	Slower than real time
medium	73.6s	0.1x	GPU recommended
large-v3	133.5s	0.1x	GPU required in practice
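Setting the CPU and GPU tables side by side makes the GPU advantage concrete. This Python sketch uses only the measured times (the dictionary and function names are illustrative) to print each model's CPU speed with two decimals and how many times faster the CUDA run was: roughly 6x for tiny, 24x for small, and about 34x to 39x for medium and large-v3. The extra decimal also separates medium (about 0.15x) from large-v3 (about 0.08x), which the one-decimal table rounds to 0.1x for both.

```python
AUDIO_S = 11.0  # length of jfk.wav in seconds

# Wall-clock seconds for the 11 s clip, copied from the CPU and GPU (CUDA) tables.
CPU_S = {"tiny": 1.98, "base": 4.54, "small": 20.4, "medium": 73.6, "large-v3": 133.5}
CUDA_S = {"tiny": 0.32, "base": 0.41, "small": 0.84, "medium": 1.91, "large-v3": 3.90}


def cpu_real_time(model: str) -> float:
    """CPU speed as a multiple of real time (above 1.0 keeps up with speech)."""
    return AUDIO_S / CPU_S[model]


def gpu_speedup(model: str) -> float:
    """How many times faster the CUDA run was than the 8-thread CPU run."""
    return CPU_S[model] / CUDA_S[model]


for model in CPU_S:
    print(f"{model:9s} CPU {cpu_real_time(model):.2f}x real time, "
          f"CUDA {gpu_speedup(model):.1f}x faster than CPU")
```

The speedup grows with model size, which is why the verdict column shifts from "real time" to "GPU required in practice".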

What this means if you dictate all day

For interactive voice typing, the small model on a GPU is the sweet spot: solid everyday dictation accuracy with sub-second latency you never feel. The larger models are worth it for difficult audio or file transcription, but only with a GPU. If you are on a laptop with no discrete GPU, the tiny and base models keep dictation responsive, and StarWhisper picks a sensible default for your machine automatically. Vulkan runs on NVIDIA, AMD, and Intel GPUs, and it performed like CUDA here, so StarWhisper can accelerate on all three vendors rather than just one.
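The advice above boils down to a latency budget. The Python sketch below is a hypothetical rule, not StarWhisper's actual model-selection logic: `pick_model` chooses the largest model whose estimated latency fits a budget, scaling the measured 11-second timings linearly with utterance length (a simplifying assumption). With a one-second budget it lands on small for CUDA and falls back to tiny on the CPU.

```python
# Measured seconds to transcribe the 11 s clip, copied from this article's tables.
MEASURED_S = {
    "cpu": {"tiny": 1.98, "base": 4.54, "small": 20.4, "medium": 73.6, "large-v3": 133.5},
    "cuda": {"tiny": 0.32, "base": 0.41, "small": 0.84, "medium": 1.91, "large-v3": 3.90},
}
MODELS_SMALL_TO_LARGE = ["tiny", "base", "small", "medium", "large-v3"]
CLIP_S = 11.0


def pick_model(engine: str, utterance_s: float, budget_s: float) -> str:
    """Hypothetical rule: the largest model whose estimated latency fits the budget.

    Assumes latency grows linearly with utterance length (a simplification)
    and falls back to "tiny" when no model fits.
    """
    best = "tiny"
    for model in MODELS_SMALL_TO_LARGE:
        est_latency_s = MEASURED_S[engine][model] * utterance_s / CLIP_S
        if est_latency_s <= budget_s:
            best = model
    return best


print(pick_model("cuda", 11.0, 1.0))  # small
print(pick_model("cpu", 11.0, 1.0))   # tiny (nothing fits, so fall back)
print(pick_model("cpu", 11.0, 5.0))   # base
```

A real app would also weigh accuracy and memory, but the budget framing shows why the same rule lands on different models for CPU and GPU.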

Methodology (so you can reproduce it)

Sample: The canonical 11.0-second jfk.wav clip shipped with whisper.cpp, so anyone can run the same test.
Engine: whisper.cpp's whisper-cli, the same engine StarWhisper bundles, using separate CPU, CUDA, and Vulkan builds.
Timing: Total time as reported by whisper.cpp's whisper_print_timings, taking the best of two runs per configuration to reduce one-off load variance. Real-time factor = 11.0 s of audio divided by processing seconds, so higher is faster.
Hardware: Intel Core i9-13980HX (8 threads used, matching the app default) and an NVIDIA RTX 4090 Laptop GPU, 32 GB RAM, Windows 11.
Honest caveat: These are speed measurements, not an accuracy ranking. Latency scales with hardware, so a slower CPU or GPU will take longer, but the relative picture (GPU is far faster, Vulkan matches CUDA, model size drives cost) should hold.
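To rerun the timings yourself, a harness along the lines of this Python sketch is one option; it is not StarWhisper's own benchmarking code. It assumes whisper-cli is on your PATH, that the model and sample sit at the hypothetical paths models/ggml-small.bin and samples/jfk.wav, and that the summary whisper.cpp prints to stderr contains a line of the form "total time = 3900.12 ms". If that line is missing, it falls back to process wall-clock time, which also counts process start-up. It passes only the -m (model), -f (input file), and -t (threads) options.

```python
from __future__ import annotations

import re
import shutil
import subprocess
import time

AUDIO_S = 11.0  # duration of jfk.wav in seconds

# Assumed shape of whisper.cpp's stderr summary, e.g.
# "whisper_print_timings:    total time =  3900.12 ms"
TOTAL_RE = re.compile(r"total time\s*=\s*([0-9.]+)\s*ms")


def parse_total_ms(log: str) -> float | None:
    """Return the reported total time in milliseconds, or None if absent."""
    match = TOTAL_RE.search(log)
    return float(match.group(1)) if match else None


def speed_multiple(processing_s: float, audio_s: float = AUDIO_S) -> float:
    """Audio seconds divided by processing seconds; higher is faster."""
    return audio_s / processing_s


def best_of(cmd: list[str], runs: int = 2) -> float:
    """Run cmd `runs` times and return the fastest time in seconds.

    Prefers whisper.cpp's own "total time" line; otherwise uses wall-clock time.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              errors="replace", check=True)
        wall_s = time.perf_counter() - start
        total_ms = parse_total_ms(proc.stderr)
        best = min(best, total_ms / 1000.0 if total_ms is not None else wall_s)
    return best


if __name__ == "__main__":
    cli = shutil.which("whisper-cli")
    if cli is None:
        print("whisper-cli not found on PATH")
    else:
        # Hypothetical paths: point these at your own whisper.cpp model and sample.
        cmd = [cli, "-m", "models/ggml-small.bin", "-f", "samples/jfk.wav", "-t", "8"]
        secs = best_of(cmd)
        print(f"small: {secs:.2f}s, {speed_multiple(secs):.1f}x real time")
```

Running the same script against each build (CPU, CUDA, Vulkan) and each model file reproduces the grid in the tables, within normal run-to-run variance.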

Get StarWhisper free for Windows

StarWhisper runs Whisper entirely on your own machine, so no audio leaves your device, and it picks the right model and engine for your hardware automatically.