Agent Device: iOS & Android Automation for AI Agents


As AI coding agents become part of our daily engineering workflows, one gap remains clear: they can read and write code, but still struggle to reliably interact with mobile apps.

That’s why we built agent-device: a lightweight, token-efficient CLI for automating iOS and Android devices and simulators, with a command model designed for agents and developers.

Vercel’s agent-browser was a big inspiration for this project. If browser automation gave agents “eyes and hands” on the web without burning tokens, agent-device brings that same idea to native mobile UI.

What is agent-device?

agent-device is a CLI + daemon stack for agent-driven mobile automation with practical primitives:

  • open / close for app lifecycle and session flow
  • snapshot for structured accessibility trees
  • click, fill, scroll, type, get for interactions
  • find for semantic targeting
  • replay for deterministic reruns from recorded .ad scripts

The workflow is straightforward: an agent explores the mobile app using accessibility tree snapshots, stores refs to interactive elements, and performs actions.

On top of that, it can also take screenshots, record videos, and save “replay” scripts for later reuse.

Why it helps developers

As engineering teams increasingly use agents like Codex, Claude Code, and Gemini in day-to-day development, expectations are shifting from “assistive” to “autonomous.” Quickly. Teams want agents to handle more of the workflow end to end, including mobile UI tasks, so developers can move faster with less manual intervention.

agent-device is built for that direction: one command surface across iOS and Android, compact structured snapshots that fit LLM context limits, and session-aware behavior aligned with real app testing flows. It also bridges exploratory agent actions with deterministic replay, while screenshots and trace logs make flaky behavior easier to debug.

The outcomes are higher leverage from agents, less context switching for developers, and a clearer path to reliable mobile automation.

Why we’re building it

Agent-native mobile tooling is still early, and the ecosystem around it hasn’t matured yet. Among many other things, we’re still missing:

  1. Reliable mobile UI understanding for agents
  2. A practical bridge from exploratory actions to deterministic replay
  3. A unified agent experience across iOS and Android
  4. Debugging tools that are optimized for agents

agent-device is our way to close some of these gaps with a small, composable tool that works for both people and agents. After all, it’s a simple CLI. There's no AI inside.

Getting started

This is how you, or rather your agent, will typically interact with an app using agent-device:

agent-device open SampleApp --platform ios
agent-device snapshot -i                 # Get interactive elements and capture refs (@eN)
agent-device click @e2                   # Click by ref
agent-device snapshot -i                 # Update snapshot after interaction
agent-device fill @e3 "test@example.com" # Clear then type
agent-device snapshot -i
agent-device get text @e1                # Get text content
agent-device screenshot page.png         # Save screenshot
agent-device close

If you haven’t used agent-browser yet, this may be the first time you see snapshot-ref notation, and it’s perfectly fine to wonder: what are these @e2 and @e3, and where do they come from?

Fortunately, it becomes obvious once you see the output of the snapshot command:

> agent-device snapshot -i
@e1 [application] "Settings"
@e2 [window]
@e3 [other] "Settings"
@e4 [navigation-bar] "Settings"
@e5 [button] "Settings"
@e6 [other] "Hover Text, Off"
@e7 [other] "Vision"
@e8 [table] "Accessibility"

Now we can see that @e5 in the snapshot above, taken from the iOS Settings app, corresponds to the “Settings” button. An agent can discover interactive elements (the -i flag) or non-interactive ones and “see” our UI, provided the app is accessible, e.g., to screen readers, as any app should be.
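For example, once the agent holds that ref, tapping the button and refreshing the tree takes two calls, reusing the same syntax as in the getting-started example:

agent-device click @e5                   # Tap the “Settings” button discovered above
agent-device snapshot -i                 # Refresh the refs after the UI changes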

And if a task such as “fill this 401k form for me” stalls because accessible elements are missing, an agent can fall back to the screenshot command, work out what’s on the screen, pick coordinates, and use the alternative coordinate-based syntax for interactions such as click or fill, as sketched below.
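A rough sketch of that fallback follows; the coordinate arguments to click are hypothetical, so confirm the exact syntax with the help command:

agent-device screenshot form.png         # Capture the current screen for the agent to inspect
agent-device click 160 420               # Hypothetical coordinate-based click at x=160, y=420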

From there, you can record and replay flows (saved as .ad scripts) and use them as end-to-end tests that are deterministic and easy to maintain with the replay --update command.
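A minimal sketch of that loop, assuming replay takes the script path as its argument and that --update refreshes the recorded script when the UI drifts (login-flow.ad is just an example name):

agent-device replay login-flow.ad            # Re-run the recorded flow deterministically
agent-device replay login-flow.ad --update   # Re-run and refresh the script after UI changes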

Use official Skills

How will an agent know all of that? This is where our built-in skills come in. Using a single command:

npx skills add callstackincubator/agent-device

…your agent can download the latest documentation and usage best practices for the agent-device project. This helps a lot when an agent is exploring a new tool like this one.

Without skills, agents are essentially blind to how the tool should be used and fall back to self-discovery through the help command or, sometimes, web tool calls.

With skills loaded, any agent such as Claude Code or Codex becomes proficient with the tool from its first call, saving both time and tokens. We’ve seen this many times in our testing already, which is why we ship skills right away.

Try it yourself

If this workflow is relevant to you, give agent-device a try. Feel free to let us know what breaks, what’s missing, and what you want the project to support next.

And if you find it useful, star the repo to support the project and help more developers discover it: https://github.com/callstackincubator/agent-device
