Following our recent announcements on on-device LLM support and text embeddings, we are excited to expand our on-device AI capabilities for React Native developers.
Today, we are introducing support for speech transcription, powered by Apple's latest speech recognition frameworks.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'

// `audio` is an ArrayBuffer or a base64-encoded string
const response = await transcribe({
  model: apple.transcriptionModel(),
  audio,
})
To get started, install the latest version of the @react-native-ai/apple package:
npm install @react-native-ai/apple@latest
Under the Hood: Apple's SpeechAnalyzer Framework
Apple's speech recognition system, introduced in iOS 26, is built around the new SpeechAnalyzer class, part of the Speech framework. This API introduces a modular approach to audio analysis, enabling developers to configure specific capabilities based on the needs of their application.
Modules are attached to the SpeechAnalyzer to perform targeted tasks. In iOS 26, two modules are available: SpeechTranscriber, which performs speech-to-text transcription, and SpeechDetector, which identifies voice activity in an audio stream. In many use cases, SpeechTranscriber alone is sufficient, but the modular design allows for greater flexibility when combining different types of analysis.
The SpeechAnalyzer class coordinates this process. It manages the attached modules, receives incoming audio, and controls the overall analysis workflow. Modules can be added or removed dynamically during a session, and each module only processes audio from the point it was attached. This architecture gives developers control over performance, resource usage, and the level of detail captured during transcription.
How to Use It
We've integrated this new capability with the AI SDK's experimental_transcribe function, making it incredibly simple to get started. With just a few lines of code, you can transcribe audio files.
The function takes an audio source (as an ArrayBuffer or a base64-encoded string) and returns the transcribed text, along with detailed segment information and the audio's total duration.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'
const file = await fetch('https://example.com/audio.wav')
const audio = await file.arrayBuffer()
const { text, segments, durationInSeconds } = await transcribe({
model: apple.transcriptionModel(),
audio,
})
console.log(text)
// "This is a test transcription."
console.log(segments)
// [
// { text: 'This', startTime: 0.1, endTime: 0.3 },
// { text: 'is', startTime: 0.3, endTime: 0.4 },
// { text: 'a', startTime: 0.4, endTime: 0.5 },
// { text: 'test', startTime: 0.5, endTime: 0.8 },
// { text: 'transcription.', startTime: 0.8, endTime: 1.5 }
// ]
console.log(durationInSeconds)
// 1.5
The API is highly optimized to support ArrayBuffers natively, thanks to JSI. This means you can pass audio data directly, without any additional encoding or decoding steps.
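If your audio lives on the device rather than on a server, you can pass it as a base64-encoded string instead. Here is a minimal sketch of that flow, assuming expo-file-system is installed and that recording.wav is a hypothetical file in the app's document directory; any approach that produces an ArrayBuffer or base64 string works the same way.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'
import * as FileSystem from 'expo-file-system'

// Read a local recording as a base64-encoded string
const base64Audio = await FileSystem.readAsStringAsync(
  `${FileSystem.documentDirectory}recording.wav`,
  { encoding: FileSystem.EncodingType.Base64 }
)

// Pass the string directly; no manual decoding is required
const { text } = await transcribe({
  model: apple.transcriptionModel(),
  audio: base64Audio,
})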
Multi-Language Support
You can specify a language using its ISO 639-1 code for more precise control:
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'

await transcribe({
  model: apple.transcriptionModel(),
  audio: audioArrayBuffer,
  providerOptions: {
    apple: {
      language: 'fr' // Transcribe in French
    }
  }
})
By default, the model will use the device's primary language for transcription.
Painless Asset Management
Apple's SpeechAnalyzer requires language-specific assets to be present on the device. Our provider handles this automatically, requesting the necessary assets when a transcription is initiated.
For a more proactive approach, you can manually prepare the assets. This is useful for ensuring that the models are already available before the user needs them, resulting in an instantaneous transcription experience.
For example, to prepare the English language assets, you could do the following:
import { NativeAppleSpeech } from '@react-native-ai/apple'
await NativeAppleSpeech.prepare('en')
A key benefit of this architecture is that all language models are stored in a system-wide asset catalog, completely separate from your app bundle. This means zero impact on your app's size.
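One way to take advantage of this is to start preparing assets as soon as your app launches, so the models are ready before the user records anything. The sketch below assumes a React root component named App and preloads the English assets from a useEffect; it is only one possible place to trigger the download.
import { useEffect } from 'react'
import { NativeAppleSpeech } from '@react-native-ai/apple'

export function App() {
  useEffect(() => {
    // Start downloading the English transcription assets in the background
    // so the first transcription can begin instantly
    NativeAppleSpeech.prepare('en').catch(console.warn)
  }, [])

  // ... render your app as usual
  return null
}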
Unmatched Performance
Running on-device with Apple's optimized frameworks delivers a significant performance advantage.
According to community benchmarks, transcribing a 34-minute audio file with the new SpeechAnalyzer is over 2.2 times faster than with Whisper's Large V3 model.
Conclusion
With the addition of on-device speech transcription, we are continuing our mission to provide React Native developers with a comprehensive, private, and high-performance AI toolkit.
We encourage you to explore the documentation, try out the new feature, and share your feedback with us on GitHub.