How fast is offline Whisper on Windows? A real speed benchmark (2026)

We measured whisper.cpp transcription speed for five Whisper model sizes (tiny through large-v3) on three compute engines (CPU, NVIDIA CUDA, and Vulkan) on Windows. The short version: on a modern GPU, offline speech-to-text runs well ahead of real time for every model size, and you do not need an NVIDIA card to get there.

Published July 4, 2026. Engine: whisper.cpp. Sample: the standard 11-second jfk.wav clip. Hardware: Intel Core i9-13980HX (8 threads used) and an NVIDIA RTX 4090 Laptop GPU. Best of two runs per configuration.

The short answer

0.83s: small model, 11s of audio, on GPU (13x real time)
~1%: Vulkan vs CUDA gap in total time across the five models on the same GPU (individual models differ by roughly 2% to 6%)
35x: real-time factor for the tiny model on GPU

GPU results (RTX 4090 Laptop): CUDA vs Vulkan

On a modern GPU every model size runs faster than real time, from 2.8x for large-v3 to about 35x for tiny, so transcription keeps pace with live dictation. Notice how close CUDA and Vulkan are: neither engine wins consistently, and the cross-vendor Vulkan path is not a downgrade.

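| Model | CUDA time (11s clip) | CUDA speed | Vulkan time (11s clip) | Vulkan speed |
| --- | --- | --- | --- | --- |
| tiny | 0.32s | 34.3x | 0.31s | 35.5x |
| base | 0.41s | 27.2x | 0.42s | 26.5x |
| small (default) | 0.84s | 13.1x | 0.82s | 13.4x |
| medium | 1.91s | 5.7x | 2.02s | 5.4x |
| large-v3 | 3.90s | 2.8x | 3.73s | 3.0x |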
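
The speed columns and the Vulkan vs CUDA comparison can be recomputed from the times in the table. The sketch below is only an illustration: the names `GPU_TIMES`, `real_time_factor`, and `relative_gap` are made up for this example, and because the table's times are rounded to two decimals, a recomputed factor can differ from the published one in the last digit.

```python
AUDIO_SECONDS = 11.0  # length of the jfk.wav clip

# Processing times in seconds, copied from the GPU table (rounded to 0.01 s).
GPU_TIMES = {
    "tiny": {"cuda": 0.32, "vulkan": 0.31},
    "base": {"cuda": 0.41, "vulkan": 0.42},
    "small": {"cuda": 0.84, "vulkan": 0.82},
    "medium": {"cuda": 1.91, "vulkan": 2.02},
    "large-v3": {"cuda": 3.90, "vulkan": 3.73},
}


def real_time_factor(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Seconds of audio transcribed per second of processing."""
    return audio_seconds / processing_seconds


def relative_gap(cuda_seconds, vulkan_seconds):
    """Absolute Vulkan vs CUDA time difference as a fraction of the CUDA time."""
    return abs(cuda_seconds - vulkan_seconds) / cuda_seconds


# Gap for each model, and gap between the summed times of all five models.
per_model_gap = {
    model: relative_gap(t["cuda"], t["vulkan"]) for model, t in GPU_TIMES.items()
}
total_cuda = sum(t["cuda"] for t in GPU_TIMES.values())
total_vulkan = sum(t["vulkan"] for t in GPU_TIMES.values())
total_gap = relative_gap(total_cuda, total_vulkan)

for model, t in GPU_TIMES.items():
    print(
        f"{model:9s} CUDA {real_time_factor(t['cuda']):5.1f}x  "
        f"Vulkan {real_time_factor(t['vulkan']):5.1f}x  "
        f"gap {per_model_gap[model]:.1%}"
    )
print(f"gap in total time: {total_gap:.1%}")
```

On these rounded figures the per-model gap falls between roughly 2% and 6%, while the gap in summed time across the five models is about 1%.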

CPU results (Core i9-13980HX, 8 threads)

A fast CPU handles the tiny and base models comfortably, but the compute cost climbs steeply with model size: the small model already falls below real time. This is exactly why StarWhisper defaults to a right-sized model and uses your GPU when one is available.

| Model | CPU time (11s clip) | CPU speed | Verdict for live dictation |
| --- | --- | --- | --- |
| tiny | 1.98s | 5.6x | Comfortably real time |
| base | 4.54s | 2.4x | Real time |
| small | 20.4s | 0.5x | Slower than real time |
| medium | 73.6s | 0.1x | GPU recommended |
| large-v3 | 133.5s | 0.1x | GPU required in practice |
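
To see how steeply the CPU cost climbs relative to the GPU, divide each CPU time by the CUDA time for the same model. This is a small illustrative sketch using the published, rounded times from the two tables; the names `CPU_TIMES`, `CUDA_TIMES`, and `gpu_speedup` are invented for the example.

```python
# Processing times in seconds for the 11 s clip, copied from the two tables.
CPU_TIMES = {"tiny": 1.98, "base": 4.54, "small": 20.4, "medium": 73.6, "large-v3": 133.5}
CUDA_TIMES = {"tiny": 0.32, "base": 0.41, "small": 0.84, "medium": 1.91, "large-v3": 3.90}


def gpu_speedup(model):
    """How many times faster the CUDA run was than the 8-thread CPU run."""
    return CPU_TIMES[model] / CUDA_TIMES[model]


for model in CPU_TIMES:
    print(
        f"{model:9s} CPU {CPU_TIMES[model]:7.2f}s  "
        f"CUDA {CUDA_TIMES[model]:5.2f}s  "
        f"speedup {gpu_speedup(model):5.1f}x"
    )
```

On these figures the GPU advantage grows from roughly 6x for tiny to more than 30x for medium and large-v3, which is why the larger models are only practical with a GPU.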

What this means if you dictate all day

For interactive voice typing, the small model on a GPU is the sweet spot: strong accuracy for everyday dictation, with the 11-second clip transcribed in under a second. The larger models are worth it for difficult audio or file transcription, but only with a GPU. If you are on a laptop with no discrete GPU, the tiny and base models keep dictation responsive, and StarWhisper picks a sensible default for your machine automatically. Because Vulkan performs like CUDA here, StarWhisper can accelerate on NVIDIA, AMD, and Intel GPUs, not just one vendor.
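
One way to turn the numbers above into a model choice is to pick the largest model that still clears a minimum real-time factor on a given engine. The sketch below is a hypothetical rule written for illustration only; it is not StarWhisper's actual selection logic, and the `pick_model` function and its `min_rtf` threshold are assumptions.

```python
AUDIO_SECONDS = 11.0  # length of the benchmark clip

# Models ordered from smallest to largest.
MODELS_BY_SIZE = ["tiny", "base", "small", "medium", "large-v3"]

# Measured processing times in seconds, copied from the benchmark tables.
MEASURED = {
    "cpu": {"tiny": 1.98, "base": 4.54, "small": 20.4, "medium": 73.6, "large-v3": 133.5},
    "cuda": {"tiny": 0.32, "base": 0.41, "small": 0.84, "medium": 1.91, "large-v3": 3.90},
    "vulkan": {"tiny": 0.31, "base": 0.42, "small": 0.82, "medium": 2.02, "large-v3": 3.73},
}


def pick_model(engine, min_rtf):
    """Return the largest model whose real-time factor is at least min_rtf.

    Hypothetical rule: returns None when no model is fast enough.
    """
    best = None
    for model in MODELS_BY_SIZE:
        rtf = AUDIO_SECONDS / MEASURED[engine][model]
        if rtf >= min_rtf:
            best = model  # keep overwriting so the largest passing model wins
    return best


print(pick_model("cpu", min_rtf=2.0))      # largest model at 2x real time on CPU
print(pick_model("cuda", min_rtf=10.0))    # largest model at 10x real time on CUDA
print(pick_model("vulkan", min_rtf=10.0))  # same threshold on Vulkan
```

With a 2x threshold the CPU lands on base, and with a stricter 10x threshold both GPU engines land on small, which matches the recommendations in the paragraph above. The threshold itself is a tuning choice, not something the benchmark measures.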

Methodology (so you can reproduce it)
Sample
The canonical 11.0-second jfk.wav clip shipped with whisper.cpp, so anyone can run the same test.
Engine
whisper.cpp whisper-cli, the same engine StarWhisper bundles, with the CPU, CUDA, and Vulkan builds.
Timing
Total wall-clock time reported by whisper.cpp whisper_print_timings, best of two runs per configuration to exclude one-time load variance. Real-time factor = 11.0s of audio divided by processing seconds.
Hardware
Intel Core i9-13980HX (8 threads used, matching the app default) and an NVIDIA RTX 4090 Laptop GPU, 32 GB RAM, Windows 11.
Honest caveat
These are speed measurements, not an accuracy ranking. Absolute times depend on hardware, so expect longer times on a slower CPU or GPU, but the relative picture (GPU is far faster, Vulkan matches CUDA, model size drives cost) should hold.
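
A rough way to script the same best-of-two procedure is sketched below. It makes several assumptions: it times the whole `whisper-cli` process from the outside with `time.perf_counter`, which includes process startup and model loading and so will not match the total printed by whisper_print_timings exactly; the helper names and the paths in the usage comment are hypothetical. Only the `-m`, `-f`, and `-t` options of `whisper-cli` are used.

```python
import subprocess
import time


def build_command(cli_path, model_path, wav_path, threads=8):
    """Build a whisper-cli command line: -m model, -f audio file, -t threads."""
    return [cli_path, "-m", model_path, "-f", wav_path, "-t", str(threads)]


def best_of(runs, action):
    """Call action() `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark(cli_path, model_path, wav_path, threads=8, runs=2, audio_seconds=11.0):
    """Return (best elapsed seconds, real-time factor) for one configuration."""
    command = build_command(cli_path, model_path, wav_path, threads)
    elapsed = best_of(
        runs,
        lambda: subprocess.run(command, check=True, capture_output=True),
    )
    return elapsed, audio_seconds / elapsed


# Example call (hypothetical paths, adjust to your own build and model files):
# elapsed, rtf = benchmark("whisper-cli.exe", "models/ggml-small.bin", "samples/jfk.wav")
# print(f"{elapsed:.2f}s, {rtf:.1f}x real time")
```

Taking the minimum of the runs follows the best-of-two rule described under Timing; to compare engines, point `cli_path` at the CPU, CUDA, or Vulkan build in turn.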

Get StarWhisper free for Windows

StarWhisper runs Whisper entirely on your own machine, so no audio leaves your device, and it picks the right model and engine for your hardware automatically.