Following our recent announcements on on-device LLM support and text embeddings, we are excited to expand our on-device AI capabilities for React Native developers.
Today, we are introducing support for speech transcription, powered by Apple's latest speech recognition frameworks.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'

// `audio` is an ArrayBuffer or a base64-encoded string
const response = await transcribe({
  model: apple.transcriptionModel(),
  audio,
})
To get started, install the latest version of the @react-native-ai/apple package:
npm install @react-native-ai/apple@latest
Under the Hood: Apple's SpeechAnalyzer Framework
Apple's speech recognition system, introduced in iOS 26, is built around the new SpeechAnalyzer class, part of the Speech framework. This API introduces a modular approach to audio analysis, enabling developers to configure specific capabilities based on the needs of their application.
Modules are attached to the SpeechAnalyzer to perform targeted tasks. In iOS 26, two modules are available: SpeechTranscriber, which performs speech-to-text transcription, and SpeechDetector, which identifies voice activity in an audio stream. In many use cases, SpeechTranscriber alone is sufficient, but the modular design allows for greater flexibility when combining different types of analysis.
The SpeechAnalyzer class coordinates this process. It manages the attached modules, receives incoming audio, and controls the overall analysis workflow. Modules can be added or removed dynamically during a session, and each module only processes audio from the point it was attached. This architecture gives developers control over performance, resource usage, and the level of detail captured during transcription.
How to Use It
We've integrated this new capability with the AI SDK's experimental_transcribe function, making it incredibly simple to get started. With just a few lines of code, you can transcribe audio files.
The function takes an audio source (as an ArrayBuffer or a base64-encoded string) and returns the transcribed text, along with detailed segment information and the audio's total duration.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'
const file = await fetch('https://example.com/audio.wav')
const audio = await file.arrayBuffer()
const { text, segments, durationInSeconds } = await transcribe({
model: apple.transcriptionModel(),
audio,
})
console.log(text)
// "This is a test transcription."
console.log(segments)
// [
// { text: 'This', startTime: 0.1, endTime: 0.3 },
// { text: 'is', startTime: 0.3, endTime: 0.4 },
// { text: 'a', startTime: 0.4, endTime: 0.5 },
// { text: 'test', startTime: 0.5, endTime: 0.8 },
// { text: 'transcription.', startTime: 0.8, endTime: 1.5 }
// ]
console.log(durationInSeconds)
// 1.5
The API is highly optimized to support ArrayBuffers natively, thanks to JSI. This means you can pass audio data directly, without any additional encoding or decoding steps.
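If your audio lives on the device rather than on a server, you can pass it as a base64-encoded string instead. Here is a minimal sketch of that flow, assuming expo-file-system is installed and that recording.wav is a hypothetical file in the app's document directory; any approach that produces an ArrayBuffer or base64 string works the same way.
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'
import * as FileSystem from 'expo-file-system'

// Read a local recording as a base64-encoded string
const base64Audio = await FileSystem.readAsStringAsync(
  `${FileSystem.documentDirectory}recording.wav`,
  { encoding: FileSystem.EncodingType.Base64 }
)

// Pass the string directly; no manual decoding is required
const { text } = await transcribe({
  model: apple.transcriptionModel(),
  audio: base64Audio,
})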
Multi-Language Support
You can specify a language using its ISO 639-1 code for more precise control:
import { experimental_transcribe as transcribe } from 'ai'
import { apple } from '@react-native-ai/apple'

await transcribe({
  model: apple.transcriptionModel(),
  audio: audioArrayBuffer,
  providerOptions: {
    apple: {
      language: 'fr' // Transcribe in French
    }
  }
})
By default, the model will use the device's primary language for transcription.
Painless Asset Management
Apple's SpeechAnalyzer requires language-specific assets to be present on the device. Our provider handles this automatically, requesting the necessary assets when a transcription is initiated.
For a more proactive approach, you can manually prepare the assets. This is useful for ensuring that the models are already available before the user needs them, resulting in an instantaneous transcription experience.
For example, to prepare the English language assets, you could do the following:
import { NativeAppleSpeech } from '@react-native-ai/apple'
await NativeAppleSpeech.prepare('en')
A key benefit of this architecture is that all language models are stored in a system-wide asset catalog, completely separate from your app bundle. This means zero impact on your app's size.
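One way to take advantage of this is to start preparing assets as soon as your app launches, so the models are ready before the user records anything. The sketch below assumes a React root component named App and preloads the English assets from a useEffect; it is only one possible place to trigger the download.
import { useEffect } from 'react'
import { NativeAppleSpeech } from '@react-native-ai/apple'

export function App() {
  useEffect(() => {
    // Start downloading the English transcription assets in the background
    // so the first transcription can begin instantly
    NativeAppleSpeech.prepare('en').catch(console.warn)
  }, [])

  // ... render your app as usual
  return null
}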
Unmatched Performance
Running on-device with Apple's optimized frameworks delivers a significant performance advantage.
According to community benchmarks, transcribing a 34-minute audio file with the new SpeechAnalyzer is over 2.2 times faster than with Whisper's Large V3 model.
Conclusion
With the addition of on-device speech transcription, we are continuing our mission to provide React Native developers with a comprehensive, private, and high-performance AI toolkit.
We encourage you to explore the documentation, try out the new feature, and share your feedback with us on GitHub.