At Callstack, we've been working on bringing on-device providers to the AI SDK. As part of that effort, we recently added support for generation, embeddings, and transcription. Today, we're excited to announce the latest addition to our on-device AI capabilities: speech synthesis.
With this release, our Apple provider for the AI SDK is now feature-complete, giving React Native developers a comprehensive toolkit for building private, local AI experiences.
import { apple } from '@react-native-ai/apple'
import { experimental_generateSpeech as speech } from 'ai'

const response = await speech({
  model: apple.speechModel(),
  text: 'Hello from Apple on-device speech!',
})
// Access the audio
console.log(response.audio)
To get started, install the latest version of the @react-native-ai/apple package:
npm install @react-native-ai/apple@latest
Under the Hood: AVSpeechSynthesizer
This feature leverages Apple's AVSpeechSynthesizer to perform text-to-speech synthesis entirely on the user's device.
For maximum performance, our native module operates directly on the audio data stream. It captures the AVAudioPCMBuffer chunks emitted by the synthesizer, concatenates their raw PCM data, and prepends the proper WAV headers to produce a complete audio buffer.
This file-free approach avoids disk I/O entirely, so you receive a ready-to-use audio byte stream in your React Native app with minimal latency.
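To make the idea concrete, here is a rough TypeScript sketch of what assembling a WAV file from raw 16-bit PCM looks like. It is illustrative only: the real work happens inside the native Swift module, and the default sample rate and channel count below are assumptions, not values read from the synthesizer.

// Illustrative only: wrap raw 16-bit PCM samples in a minimal 44-byte WAV header.
function pcmToWav(pcm: Uint8Array, sampleRate = 44100, channels = 1): Uint8Array {
  const bitsPerSample = 16
  const byteRate = (sampleRate * channels * bitsPerSample) / 8
  const blockAlign = (channels * bitsPerSample) / 8
  const header = new DataView(new ArrayBuffer(44))
  const writeAscii = (offset: number, text: string) =>
    [...text].forEach((char, i) => header.setUint8(offset + i, char.charCodeAt(0)))

  writeAscii(0, 'RIFF')
  header.setUint32(4, 36 + pcm.length, true) // remaining file size
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  header.setUint32(16, 16, true)             // fmt chunk size for PCM
  header.setUint16(20, 1, true)              // audio format: linear PCM
  header.setUint16(22, channels, true)
  header.setUint32(24, sampleRate, true)
  header.setUint32(28, byteRate, true)
  header.setUint16(32, blockAlign, true)
  header.setUint16(34, bitsPerSample, true)
  writeAscii(36, 'data')
  header.setUint32(40, pcm.length, true)     // PCM payload size

  const wav = new Uint8Array(44 + pcm.length)
  wav.set(new Uint8Array(header.buffer), 0)
  wav.set(pcm, 44)
  return wav
}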
How to Use It
We've integrated this capability with the AI SDK, making on-device speech generation straightforward. The API takes your input text and returns the synthesized audio in multiple formats for convenience.
import { experimental_generateSpeech as speech } from 'ai'
import { apple } from '@react-native-ai/apple'

const { audio } = await speech({
  model: apple.speechModel(),
  text: 'This is a test of on-device speech synthesis.',
})
// The audio is available in different formats
console.log(audio.uint8Array)
console.log(audio.base64)
The returned audio is a WAV byte stream. The sample rate and bit depth depend on the system voice; 44.1 kHz, 16-bit PCM is common.
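One way to hear the result is to write the base64 audio to a temporary file and hand the file URI to an audio player. The sketch below assumes an Expo setup with expo-file-system and expo-av installed; any player that accepts a WAV file URI works the same way.

import * as FileSystem from 'expo-file-system'
import { Audio } from 'expo-av'

// `audio` comes from the speech() call above.
const fileUri = FileSystem.cacheDirectory + 'speech.wav'
await FileSystem.writeAsStringAsync(fileUri, audio.base64, {
  encoding: FileSystem.EncodingType.Base64,
})

const { sound } = await Audio.Sound.createAsync({ uri: fileUri })
await sound.playAsync()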
Rich Voice Selection
You can easily control the language or select a specific voice available on the device. To see which voices are installed, call AppleSpeech.getVoices().
import { AppleSpeech } from '@react-native-ai/apple'
const voices = await AppleSpeech.getVoices()
console.log(voices)
// [
// { identifier: 'com.apple.voice.compact.en-US.Samantha', name: 'Samantha', ... },
// ...
// ]
Each voice object contains useful metadata, including its identifier, name, language, and quality.
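That metadata makes it easy to narrow the list down before choosing a voice. As a quick sketch (the 'enhanced' value mirrors AVSpeechSynthesisVoiceQuality, but treat the exact quality strings as an assumption about the provider's typing):

import { AppleSpeech } from '@react-native-ai/apple'

const voices = await AppleSpeech.getVoices()

// Narrow down to US English, preferring an enhanced voice when one is installed.
const englishVoices = voices.filter((voice) => voice.language === 'en-US')
const preferredVoice =
  englishVoices.find((voice) => voice.quality === 'enhanced') ?? englishVoices[0]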
Use a voice's unique identifier for precise control:
await speech({
  model: apple.speechModel(),
  text: 'Custom voice synthesis',
  voice: 'com.apple.voice.super-compact.en-US.Samantha',
})
Alternatively, specify only a language to use the system's default voice for that locale:
await speech({
  model: apple.speechModel(),
  text: 'Bonjour tout le monde!',
  language: 'fr-FR',
})
If both voice and language are provided, the voice identifier takes priority.
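For example, with both options set, the en-US voice below is used and the fr-FR hint is ignored:

await speech({
  model: apple.speechModel(),
  text: 'The voice identifier wins.',
  voice: 'com.apple.voice.compact.en-US.Samantha',
  language: 'fr-FR', // ignored in favor of the explicit voice
})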
Personal Voice and Asset Management
The system gives you access to a catalog of built-in voices, including premium variants that may require a one-time download. For users on iOS 17 and later, this includes access to their own "Personal Voice" if one has been created. Our provider automatically requests the necessary user authorization to list and use a Personal Voice if it is available.
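Once authorization is granted, a Personal Voice appears in the same getVoices() list and is used like any other voice. The isPersonalVoice flag below is a hypothetical field name, shown only to illustrate the flow; check the voice metadata returned on your device for the actual shape.

import { AppleSpeech, apple } from '@react-native-ai/apple'
import { experimental_generateSpeech as speech } from 'ai'

const voices = await AppleSpeech.getVoices()

// Hypothetical flag: the real metadata field may be named differently.
const personalVoice = voices.find((voice) => voice.isPersonalVoice)

if (personalVoice) {
  await speech({
    model: apple.speechModel(),
    text: 'Speaking with your Personal Voice.',
    voice: personalVoice.identifier,
  })
}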
A key benefit of this architecture is that all voice assets are managed by the operating system, completely separate from your app bundle. To manage voices, users can go to iOS Settings → Accessibility → Spoken Content → Voices.
Why On-Device Speech?
- Privacy First: All text-to-speech conversion happens locally. No user data ever leaves the device.
- No Network Latency: Audio is generated almost instantly, making it perfect for real-time, responsive applications.
- Offline Functionality: Because it runs entirely on-device, the feature works flawlessly without an internet connection.
Conclusion
With the addition of on-device speech synthesis, we are continuing our mission to provide React Native developers with a comprehensive, private, and high-performance AI toolkit.
Up next, we'll focus on comprehensive documentation and an Android equivalent.
We encourage you to explore the documentation, try out the new feature, and share your feedback with us on GitHub.