macOS voice-to-text overlay: dictate into any text field in any app. Wispr Flow-style ergonomics, but on-device and integrated with our own toolchain.
Background
Typing is friction when capturing ideas, drafting client messages, or steering AI tools mid-flow. Existing dictation (Apple Dictation, Wispr Flow) either lives outside our preferred toolchain or sends audio to third-party servers. We wanted a focused on-device voice-to-text overlay we control.
What we built
A Swift macOS app using on-device WhisperKit transcription. Phase 1 ships the pipeline end-to-end: AVAudioEngine capture → WhisperKit actor → transcript on screen. It includes a language picker (Swedish, English, or auto-detect) and a debug UI for triggering and reviewing each capture.
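To make the "actor" stage of the pipeline concrete, here is a minimal sketch of how it could be shaped. It is not the app's actual code: `SpeechEngine`, `TranscriptionActor`, `DictationLanguage`, and `EchoEngine` are hypothetical names, and the real WhisperKit calls are deliberately hidden behind the protocol rather than guessed at. The samples are assumed to be mono Float audio already converted to whatever format the model expects.

```swift
import Foundation

// Language choices mirroring the picker (Swedish / English / auto-detect).
enum DictationLanguage: String, CaseIterable {
    case swedish = "sv"
    case english = "en"
    case auto = "auto"

    /// Language hint for the engine; nil means "let the model detect it".
    var hint: String? { self == .auto ? nil : rawValue }
}

// Hypothetical abstraction over the speech engine. A WhisperKit-backed type
// would conform to this in the real app; its API is not shown here.
protocol SpeechEngine: Sendable {
    func transcribe(samples: [Float], languageHint: String?) async throws -> String
}

// Actor that owns the engine and the capture history, so that mutable state
// is only ever touched from one isolation domain.
actor TranscriptionActor {
    private let engine: any SpeechEngine
    private(set) var history: [String] = []

    init(engine: any SpeechEngine) {
        self.engine = engine
    }

    func transcribe(samples: [Float], language: DictationLanguage) async throws -> String {
        let raw = try await engine.transcribe(samples: samples, languageHint: language.hint)
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        history.append(trimmed)  // kept so a debug UI could review each capture
        return trimmed
    }
}

// Stand-in engine so the sketch runs without downloading a model.
struct EchoEngine: SpeechEngine {
    func transcribe(samples: [Float], languageHint: String?) async throws -> String {
        "  \(samples.count) samples, language: \(languageHint ?? "auto")  "
    }
}
```

Keeping the engine behind a protocol means the debug UI can be exercised with a stand-in like `EchoEngine`, while the on-device model is swapped in separately.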
Approach
Native Swift over cross-platform: going macOS-first lets us hook into NSPasteboard, CGEvent, and global hotkeys properly. On-device transcription via WhisperKit avoids network round-trip latency and the privacy concerns of audio leaving the device.
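As an illustration of what those native hooks allow, the sketch below places text on the general pasteboard, posts a synthetic Cmd+V with CGEvent, and then restores the previous clipboard string. It is an assumption about how such a step might look, not the project's implementation: `pasteIntoFrontmostApp` and `restoreDelay` are hypothetical names, only the plain-string clipboard representation is preserved, and posting key events generally requires the user to grant the app Accessibility permission.

```swift
import AppKit
import CoreGraphics

/// Puts `text` on the general pasteboard, posts a synthetic Cmd+V, then
/// restores the previous plain-text clipboard contents after `restoreDelay`.
/// Hypothetical helper; richer clipboard types (images, RTF) are not preserved.
func pasteIntoFrontmostApp(_ text: String, restoreDelay: TimeInterval = 0.3) {
    let pasteboard = NSPasteboard.general
    let previous = pasteboard.string(forType: .string)

    pasteboard.clearContents()
    pasteboard.setString(text, forType: .string)

    // 0x09 is the virtual key code for "V" (kVK_ANSI_V).
    let vKey: CGKeyCode = 0x09
    let source = CGEventSource(stateID: .combinedSessionState)
    let keyDown = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: true)
    let keyUp = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: false)
    keyDown?.flags = .maskCommand
    keyUp?.flags = .maskCommand
    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)

    // Restore after a short delay so the target app has time to read the
    // pasteboard. The delay value is a guess and would need tuning.
    DispatchQueue.main.asyncAfter(deadline: .now() + restoreDelay) {
        pasteboard.clearContents()
        if let previous = previous {
            pasteboard.setString(previous, forType: .string)
        }
    }
}
```

Going through the pasteboard rather than synthesizing individual keystrokes keeps the insertion independent of keyboard layout and text length, at the cost of briefly overwriting the user's clipboard.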
Outcome
Phase 1 ships transcription end-to-end. Roadmap: Phase 2 output pipeline (paste + clipboard preserve/restore), Phase 3 global hotkey state machine, Phase 4 floating HUD, Phase 5 menubar, Phase 6 polish + packaging.
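The Phase 3 hotkey state machine is not specified in detail here, so the following is only one plausible shape for it, assuming a push-to-talk interaction (hold to record, release to transcribe). The states, events, and the `nextState` function are all hypothetical names chosen for illustration.

```swift
// Hypothetical states for a push-to-talk hotkey flow.
enum DictationState: Equatable {
    case idle
    case recording
    case transcribing
}

// Hypothetical events that drive the flow.
enum DictationEvent {
    case hotkeyDown
    case hotkeyUp
    case transcriptReady
    case failed
}

/// Pure transition function: returns the next state, or the current state
/// unchanged when the event does not apply (for example, a key-up while idle).
func nextState(from state: DictationState, on event: DictationEvent) -> DictationState {
    switch (state, event) {
    case (.idle, .hotkeyDown):
        return .recording
    case (.recording, .hotkeyUp):
        return .transcribing
    case (.transcribing, .transcriptReady):
        return .idle
    case (_, .failed):
        return .idle  // any failure drops back to idle
    default:
        return state  // ignore events that make no sense in this state
    }
}
```

Keeping the transitions in a pure function makes them testable without registering a real global hotkey; side effects such as starting capture or pasting output would hang off the state changes.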
