iOS 26's SpeechAnalyzer on a live mic: the 5 things the docs don't tell you

This is a condensed version. The full write-up, with the complete SpeechSession, the AudioBufferConverter, and the SFSpeechRecognizer → SpeechAnalyzer migration table, is in the original post, and the runnable sample is on GitHub (MIT).

iOS 26 introduces SpeechAnalyzer and its composable modules as the successor to SFSpeechRecognizer. The new model is nicer: an orchestrator you attach modules to, optimized for longer on-device audio, with no "enable dictation in Settings" requirement. But if you follow the WWDC sample to wire it to a live microphone, you can end up with code that compiles and produces no text. Here are the five things that cost real time.

The mental model

mic ─► AVAudioEngine.installTap ─► AVAudioConverter ─► AnalyzerInput
                                                          │
                                    SpeechAnalyzer([ SpeechTranscriber ])
                                                          │
                            for try await result in transcriber.results
                                result.text (AttributedString) / result.isFinal

SpeechAnalyzer coordinates; you attach a SpeechTranscriber. Audio goes in as AnalyzerInput; results come out of an AsyncSequence.

guard let locale = await SpeechTranscriber.supportedLocale(equivalentTo: .current) else {
    throw Failure.localeNotSupported
}
let transcriber = SpeechTranscriber(
    locale: locale,
    transcriptionOptions: [],
    reportingOptions: [.volatileResults],   // partial text WHILE speaking
    attributeOptions: []                     // add .audioTimeRange for per-word timing
)
let analyzer = SpeechAnalyzer(modules: [transcriber])
// bestAvailableAudioFormat returns an optional; unwrap it before handing it to the converter.
let analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])

1. You must convert the audio buffer (the #1 trap)

AVAudioEngine's input node format (often 48 kHz, hardware-dependent) usually does not match SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith:). Feed a mismatched buffer and you get a clean compile and zero transcription — no error. Run every buffer through AVAudioConverter first.

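// Assumed setup: `builder` is the continuation of the AsyncStream the analyzer reads from.
let (inputSequence, builder) = AsyncStream<AnalyzerInput>.makeStream()
try await analyzer.start(inputSequence: inputSequence)

let converter = AudioBufferConverter()   // capture locals; never touch self in the tap
let input = audioEngine.inputNode
let micFormat = input.outputFormat(forBus: 0)   // hardware format, often 48 kHz

input.installTap(onBus: 0, bufferSize: 4096, format: micFormat) { buffer, _ in
    // Drop the buffer if conversion fails; a mismatched buffer would produce no text anyway.
    guard let converted = try? converter.convert(buffer, to: analyzerFormat) else { return }
    builder.yield(AnalyzerInput(buffer: converted))
}
audioEngine.prepare()
try audioEngine.start()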
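
The AudioBufferConverter used in the tap is not reproduced in this condensed version. As a rough idea of what such a helper might look like, here is a minimal sketch built on AVAudioConverter. The class body, the error enum, and the capacity calculation are assumptions for illustration, not the code from the full write-up, and under Swift 6 strict concurrency the input-block captures may need adjusting for your SDK.

```swift
import AVFoundation

/// Hypothetical stand-in for the article's AudioBufferConverter.
final class AudioBufferConverter {
    enum ConversionError: Error {
        case converterCreationFailed
        case bufferAllocationFailed
        case conversionFailed(String)
    }

    private var converter: AVAudioConverter?

    func convert(_ buffer: AVAudioPCMBuffer, to format: AVAudioFormat) throws -> AVAudioPCMBuffer {
        let inputFormat = buffer.format
        // Nothing to do when the mic already delivers the target format.
        guard inputFormat != format else { return buffer }

        // Reuse the converter across calls; rebuild it only if a format changes.
        if converter == nil || converter?.inputFormat != inputFormat || converter?.outputFormat != format {
            converter = AVAudioConverter(from: inputFormat, to: format)
            converter?.primeMethod = .none   // avoid priming latency on a live stream
        }
        guard let converter else { throw ConversionError.converterCreationFailed }

        // Size the output for the sample-rate ratio, rounded up.
        let ratio = format.sampleRate / inputFormat.sampleRate
        let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up))
        guard let output = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: capacity) else {
            throw ConversionError.bufferAllocationFailed
        }

        var nsError: NSError?
        var consumed = false
        let status = converter.convert(to: output, error: &nsError) { _, inputStatus in
            // Hand the source buffer over exactly once, then report "no data for now".
            defer { consumed = true }
            inputStatus.pointee = consumed ? .noDataNow : .haveData
            return consumed ? nil : buffer
        }
        guard status != .error else {
            throw ConversionError.conversionFailed(nsError?.localizedDescription ?? "unknown")
        }
        return output
    }
}
```

Keeping one converter instance per tap matters for sample-rate conversion, because the converter carries filter state from one buffer to the next.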

2. The model downloads on first use — handle offline

Transcription runs on-device, but the language model is a system-shared asset: it doesn't count against your app's bundle size, and it may not be installed yet. A first run with no network can't download it, so handle that state explicitly instead of failing silently.

let installed = await Set(SpeechTranscriber.installedLocales.map { $0.identifier(.bcp47) })
if !installed.contains(locale.identifier(.bcp47)) {
    if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
        try await request.downloadAndInstall()   // has .progress for a UI
    }
}
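
One way to make the offline case explicit is to wrap the same calls in a function that returns a state the UI can react to. This is a sketch under assumptions: ModelReadiness and prepareModel are hypothetical names, the deployment target is taken to be iOS 26, and it assumes a failed download surfaces as a thrown error from downloadAndInstall().

```swift
import Speech

/// Hypothetical result type; not part of the Speech framework.
enum ModelReadiness {
    case ready
    case unavailable(Error)   // e.g. first launch with no network
}

/// Checks whether the locale's model is installed and tries to download it if not.
func prepareModel(for transcriber: SpeechTranscriber, locale: Locale) async -> ModelReadiness {
    let installed = await Set(SpeechTranscriber.installedLocales.map { $0.identifier(.bcp47) })
    if installed.contains(locale.identifier(.bcp47)) { return .ready }

    do {
        // A nil request means there is nothing left to install.
        if let request = try await AssetInventory.assetInstallationRequest(supporting: [transcriber]) {
            try await request.downloadAndInstall()
        }
        return .ready
    } catch {
        // Surface the failure so the UI can say "connect once to download the model".
        return .unavailable(error)
    }
}
```

The caller can then switch on the result before starting the audio engine, rather than starting a session that will never produce text.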

3. Volatile vs. finalized results

reportingOptions: [.volatileResults] gives fast partials while the user is still speaking; result.isFinal marks committed text. Show volatile text dimmed, replace it when a final result arrives, and persist only final results. result.text is an AttributedString.

for try await result in transcriber.results {
    let piece = String(result.text.characters)
    if result.isFinal { finalizedText += piece; volatileText = "" }
    else              { volatileText = piece }
}
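
The same rule (finals append, volatile text replaces) can be pulled into a small value type so it is testable without a microphone. TranscriptState and its members are hypothetical names for illustration, not part of the Speech framework or the original sample.

```swift
/// Hypothetical holder for the two kinds of transcript text.
struct TranscriptState {
    private(set) var finalizedText = ""
    private(set) var volatileText = ""

    /// What the UI shows: committed text followed by the current partial.
    var displayText: String { finalizedText + volatileText }

    /// Finals are appended and clear the partial; volatile results replace the partial.
    mutating func apply(text: String, isFinal: Bool) {
        if isFinal {
            finalizedText += text
            volatileText = ""
        } else {
            volatileText = text
        }
    }
}
```

Inside the results loop this would be called as state.apply(text: String(result.text.characters), isFinal: result.isFinal), and only finalizedText would be persisted.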

4. There is no custom vocabulary

SFSpeechRecognizer had contextualStrings to bias toward known terms. SpeechAnalyzer, as of iOS 26.0, exposes no equivalent. If your domain is full of proper nouns or jargon, budget for that gap now.

5. watchOS: SpeechAnalyzer isn't there — but voice input still is

SpeechAnalyzer ships on iOS, iPadOS, macOS, visionOS and tvOS 26 — not watchOS. That doesn't mean "no voice on the Watch": you fall back to the system dictation UI, which hands back finished text (you lose volatile results, time ranges, and your own tap).

// watchOS — the system handles dictation and returns text:
TextFieldLink(prompt: Text("Speak or type")) {
    Image(systemName: "mic.fill")
} onSubmit: { text in send(text) }

A note on latency (with the conditions attached)

The most-cited SpeechAnalyzer latency figure is a WWDC25-era developer-forum report of ~14s+ to the first result on an iPhone 16 Pro (iOS 26.0 beta, Xcode beta 5). On shipping iOS 26.5, on an iPhone 16e — the non-Pro A18, the least powerful A18 device — time to the first volatile result is ~0.3–0.5s on a warm start (model installed, locale allocated). First-ever launch is different (it downloads the model once), so budget for that path separately and show progress.

This is a first-hand measurement (time to first volatile result), not a controlled head-to-head: the device differs, and so does the OS (shipping vs. beta). Measure on your own device and publish device + OS + metric alongside the number. The likely takeaway: the beta-era latency was a preheat/config/beta issue, not a hardware limit. On-device transcription runs primarily on the Neural Engine, the same 16-core unit across the whole A18 family.
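
To capture that metric on your own device, a small timer around the session is enough. The sketch below is an assumption about how one might do it: FirstResultTimer and its methods are hypothetical names, and the start point (here, when the audio engine starts) is a choice you should state alongside the number.

```swift
/// Hypothetical helper for measuring time to the first volatile result.
struct FirstResultTimer {
    private let clock = ContinuousClock()
    private var start: ContinuousClock.Instant?
    private(set) var timeToFirstResult: Duration?

    /// Call right after the audio engine starts.
    mutating func markAudioStarted() {
        start = clock.now
        timeToFirstResult = nil
    }

    /// Call for every result; only the first one after a start is recorded.
    mutating func markResult() {
        guard timeToFirstResult == nil, let start else { return }
        timeToFirstResult = start.duration(to: clock.now)
    }

    /// Formats the number together with device and OS, as the text recommends.
    func report(device: String, os: String) -> String? {
        guard let timeToFirstResult else { return nil }
        return "\(device), \(os), time to first volatile result: \(timeToFirstResult)"
    }
}
```

Run it once on a warm start and once on a first-ever launch, and report the two numbers separately, since the first launch includes the model download.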

Swift 6 concurrency footnote

The tap closure runs on a real-time audio thread. Under complete strict concurrency, capture only locals (the continuation, the target format, a fresh converter) and never touch a @MainActor object inside the tap — then it compiles without @unchecked Sendable.

Full code and the migration table: github.com/simplememofast/ios26-speechanalyzer-live-mic (MIT). Corrections from real device builds welcome via PR.