When to Use Apple Foundation Models on Mobile
If you are building AI features on mobile, Apple’s Foundation Models framework is worth looking at for one reason above all: it gives you a practical local default.
That does not mean Apple’s on-device model should become your main model for everything. It should not. It is generally a weaker model than a good cloud LLM, and it is the wrong tool for long-context reasoning, deep synthesis, or anything that depends on remote knowledge.
The useful question is simpler: which requests are small enough, frequent enough, and local enough that sending them to the cloud is wasteful?
Where Apple’s local model fits
Apple’s documentation positions the Foundation Models framework around tasks like summarization, extraction, understanding, and text refinement. That is a useful boundary.
A good local-first request usually has most of these properties:
- short prompt and short output
- value drops quickly if latency is noticeable
- request can be answered from local context alone
- request should still work offline or in degraded network conditions
- request touches personal data you would rather not ship to a server unless necessary
That makes Apple’s local model a strong fit for things like:
- rewriting text in a composer
- generating a quick title, label, or reply suggestion
- extracting structured fields from a local document
- lightweight classification or routing before a heavier downstream workflow
- short-form transformations where the user can immediately review the output
These are not benchmark-friendly tasks. They are product tasks. And they are exactly where low latency, privacy, and responsiveness matter more than maximum intelligence.
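As an illustration of the classification case above, a local pre-classification step can be a few lines. This sketch assumes the same @react-native-ai/apple provider and AI SDK setup used later in this post, and that the provider plugs into the SDK's generateText the same way it does streamText; the labels and prompt are made up for the example:

import { apple } from '@react-native-ai/apple';
import { generateText } from 'ai';

// Illustrative: classify a note on-device before deciding whether a
// heavier cloud workflow is needed at all.
async function classifyNote(note) {
  const { text } = await generateText({
    model: apple(),
    prompt:
      'Classify this note as one of: todo, idea, receipt, other. ' +
      'Reply with the label only.\n\n' + note,
  });
  return text.trim().toLowerCase();
}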
Where it does not fit
This is the part worth being blunt about: Apple’s local model is not the model you pick when the quality ceiling matters more than latency.
If the feature needs any of the following, a cloud model is usually the better primary path:
- large context windows
- multi-document reasoning
- tool use against backend systems
- strong cross-platform parity across iOS, Android, and web
- deep reasoning where a weak answer is more damaging than a slow one
- knowledge that does not exist on the device
A good concrete example is coding assistance.
On-device models can be useful for lightweight developer UX, such as short inline rewrites, small autocomplete suggestions, or private edits to a single file. But they are usually a poor primary choice for serious coding workflows that depend on repository-wide understanding, long context windows, multi-file refactors, debugging across logs and stack traces, or reliable tool use.
If the assistant needs to understand architecture across many files, reason over documentation and code at the same time, call external tools, or produce consistently strong implementation plans, the local model becomes a weak link. In those cases, lower latency is not enough to compensate for lower reasoning quality.
In practice, coding is often exactly where a hybrid model makes more sense: local for fast, private, low-risk interactions, and cloud for the tasks where correctness and broader context matter.
That is why local-first is a much better framing than on-device-only.
You are not limited to Apple’s built-in local model. Downloadable third-party models can make sense when you need a different capability profile on-device. But they come with real costs: model packaging or download flow, storage footprint, update strategy, memory pressure, startup time, and often a more complex safety and compatibility story. Third-party local models expand what is possible, but they also turn on-device AI into a product delivery problem, not just an API choice.
The practical React Native integration pattern
At the React Native level, the integration shape is simple:
import { apple } from '@react-native-ai/apple';
import { streamText } from 'ai';

// cloudModel, messages, and appendChunk are defined elsewhere in the app.

// Gate on availability: not every device supports the local path.
const canRunLocally = apple.isAvailable();

// One controller per request so the stream can be cancelled.
const controller = new AbortController();

const stream = streamText({
  // Explicit routing: local when available, cloud otherwise.
  model: canRunLocally ? apple() : cloudModel,
  messages,
  abortSignal: controller.signal,
});

// Stream progressively into the UI instead of waiting for the full response.
for await (const chunk of stream.textStream) {
  appendChunk(chunk);
}

There are a few decisions worth noting here:
First, gate on availability. Do not assume the device supports the local path.
Second, make routing explicit. The model choice should be a product decision, not an accident of whichever SDK call is closest.
Third, stream into the UI. For chat, rewrite, and suggestion flows, progressive output matters.
Fourth, support cancellation. On mobile, users edit, retry, and interrupt constantly. If a newer request supersedes an older one, abort the old stream.
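A minimal sketch of that supersede-and-abort pattern, reusing the routing from the snippet above (sendMessage and appendChunk are hypothetical app helpers, not part of the library):

import { apple } from '@react-native-ai/apple';
import { streamText } from 'ai';

// Keep at most one in-flight request; abort the previous stream
// whenever a newer request supersedes it.
let activeController = null;

async function sendMessage(messages, cloudModel) {
  activeController?.abort();
  activeController = new AbortController();

  const stream = streamText({
    model: apple.isAvailable() ? apple() : cloudModel,
    messages,
    abortSignal: activeController.signal,
  });

  for await (const chunk of stream.textStream) {
    appendChunk(chunk); // hypothetical UI callback
  }
}

Depending on the SDK version, an aborted stream may surface as an error, so production code likely wants a try/catch around the loop.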
Case study: on-device titles and summaries for a multi-model chat app
ChatXOS is a chat app that lets users talk to and compare multiple AI models in one place. To improve everyday usability, the team uses our open source on-device AI library to generate chat titles and summaries locally.
That is a good example of where Apple’s local model is genuinely useful.
Titles and short summaries are:
- small
- frequent
- latency-sensitive
- cheap to regenerate
- not worth a cloud round trip every time
Using the local path here keeps small interactions fast and makes multi-model comparison feel smoother, while leaving the actual heavy model work to stronger remote systems where needed.
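A sketch of what that title path can look like, again assuming the Apple provider works with the AI SDK's generateText (generateChatTitle is a hypothetical helper, and the prompt is illustrative):

import { apple } from '@react-native-ai/apple';
import { generateText } from 'ai';

// Illustrative: derive a short chat title from the opening exchange,
// on-device when possible, cloud otherwise.
async function generateChatTitle(openingMessages, cloudModel) {
  const { text } = await generateText({
    model: apple.isAvailable() ? apple() : cloudModel,
    messages: [
      ...openingMessages,
      {
        role: 'user',
        content:
          'Write a short title for this conversation, five words max. ' +
          'Reply with the title only.',
      },
    ],
  });
  return text.trim();
}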
That is the pattern to copy: use the local model for the cheap, repeated UX layer around AI, not necessarily for the hardest AI task in the app.
A better way to think about fallback
If the local model can do the job well enough, it should usually be the first hop.
If the request is too long, too important, too complex, or too dependent on remote context, escalate to the cloud.
A simple rule works well in practice:
- use Apple locally for short, cheap, private, offline-friendly tasks
- use cloud for long-context, high-quality, or backend-dependent tasks
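Expressed as code, that rule is a single routing function. The threshold and flags below are illustrative, not tuned values:

import { apple } from '@react-native-ai/apple';

// Illustrative heuristic: short, self-contained, offline-friendly
// requests go local; everything else goes to the cloud.
function chooseModel(request, cloudModel) {
  const isShort = request.prompt.length < 2000; // illustrative cutoff
  const isSelfContained =
    !request.needsBackendTools && !request.needsRemoteKnowledge;

  if (apple.isAvailable() && isShort && isSelfContained) {
    return apple();
  }
  return cloudModel;
}

Keeping the decision in one function makes the routing auditable: when a request ends up on the wrong model, there is exactly one place to look.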
That is the real value of Apple’s local AI on mobile. Not that it replaces the cloud, but that it stops you from using the cloud where you never needed it in the first place.
