Agent Device: iOS & Android Automation for AI Agents


As AI coding agents become part of our daily engineering workflows, one gap remains clear: they can read and write code, but still struggle to reliably interact with mobile apps.

That’s why we built agent-device: a lightweight, token-efficient CLI for automating iOS and Android devices and simulators, with a command model designed for agents and developers.

Vercel’s agent-browser was a big inspiration for this project. If browser automation gave agents “eyes and hands” on the web without burning tokens, agent-device brings that same idea to native mobile UI.

What is agent-device?

agent-device is a CLI + daemon stack for agent-driven mobile automation with practical primitives:

  • open / close for app lifecycle and session flow
  • snapshot for structured accessibility trees
  • click, fill, scroll, type, get for interactions
  • find for semantic targeting
  • replay for deterministic reruns from recorded .ad scripts

The workflow is straightforward: an agent explores the mobile app using accessibility tree snapshots, stores refs to interactive elements, and performs actions.

On top of that, it can also take screenshots, record videos, and save “replay” scripts for later reuse.

Why it helps developers

As engineering teams increasingly use agents like Codex, Claude Code, and Gemini in day-to-day development, expectations are shifting from “assistive” to “autonomous.” Quickly. Teams want agents to handle more of the workflow end to end, including mobile UI tasks, so developers can move faster with less manual intervention.

agent-device is built for that direction: one command surface across iOS and Android, compact structured snapshots that fit LLM context limits, and session-aware behavior aligned with real app testing flows. It also bridges exploratory agent actions with deterministic replay, while screenshots and trace logs make flaky behavior easier to debug.

The outcomes are higher leverage from agents, less context switching for developers, and a clearer path to reliable mobile automation.

Why we’re building it

Agent-native mobile tooling is still early, and the ecosystem around it hasn’t matured yet. Among many other things, we’re still missing:

  1. Reliable mobile UI understanding for agents
  2. A practical bridge from exploratory actions to deterministic replay
  3. A unified agent experience across iOS and Android
  4. Debugging tools that are optimized for agents

agent-device is our way to close some of these gaps with a small, composable tool that works for both people and agents. After all, it’s a simple CLI. There's no AI inside.

Getting started

This is how you, or rather your agent, will typically interact with an app using agent-device:

agent-device open SampleApp --platform ios
agent-device snapshot -i                 # Get interactive elements and capture refs (@eN)
agent-device click @e2                   # Click by ref
agent-device snapshot -i                 # Update snapshot after interaction
agent-device fill @e3 "test@example.com" # Clear then type
agent-device snapshot -i
agent-device get text @e1                # Get text content
agent-device screenshot page.png         # Save screenshot
agent-device close

If you haven’t used agent-browser yet, this may be the first time you see snapshot-ref notation, and it’s perfectly fine to wonder: what are these @e2 and @e3, and where do they come from?

Fortunately, it becomes obvious once you see the output of the snapshot command:

> agent-device snapshot -i
@e1 [application] "Settings"
@e2 [window]
@e3 [other] "Settings"
@e4 [navigation-bar] "Settings"
@e5 [button] "Settings"
@e6 [other] "Hover Text, Off"
@e7 [other] "Vision"
@e8 [table] "Accessibility"

Now we can see that @e5 in the snapshot above, taken from the iOS Settings app, corresponds to the “Settings” button. An agent can discover interactive elements (the -i flag) or non-interactive ones and “see” our UI, provided the app is accessible, e.g., to screen readers, as any app should be.
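For example, once the agent holds that ref, tapping the button and refreshing the tree takes two calls, reusing the same syntax as in the getting-started example:

agent-device click @e5                   # Tap the “Settings” button discovered above
agent-device snapshot -i                 # Refresh the refs after the UI changes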

And if a task such as “fill this 401k form for me” stalls because accessible elements are missing, an agent can fall back to the screenshot command, work out what’s on the screen, pick coordinates, and use the alternative coordinate-based syntax for interactions such as click or fill, as sketched below.
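A rough sketch of that fallback follows; the coordinate arguments to click are hypothetical, so confirm the exact syntax with the help command:

agent-device screenshot form.png         # Capture the current screen for the agent to inspect
agent-device click 160 420               # Hypothetical coordinate-based click at x=160, y=420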

From there, you can record and replay flows (saved as .ad scripts) and use them as end-to-end tests that are deterministic and easy to maintain with the replay --update command.
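A minimal sketch of that loop, assuming replay takes the script path as its argument and that --update refreshes the recorded script when the UI drifts (login-flow.ad is just an example name):

agent-device replay login-flow.ad            # Re-run the recorded flow deterministically
agent-device replay login-flow.ad --update   # Re-run and refresh the script after UI changes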

Use official Skills

How will an agent know all of that? This is where our built-in skills come in. Using a single command:

npx skills add callstackincubator/agent-device

…your agent can download the latest documentation and usage best practices for the agent-device project. This helps a lot when an agent is exploring a new tool like this one.

Without skills, agents are essentially blind to how the tool should be used and fall back to self-discovery through the help command or, sometimes, web tool calls.

With skills loaded, any agent such as Claude Code or Codex becomes proficient with the tool from its first call, saving both time and tokens. We’ve seen this many times in our testing already, which is why we ship skills right away.

Try it yourself

If this workflow is relevant to you, give agent-device a try. Feel free to let us know what breaks, what’s missing, and what you want the project to support next.

And if you find it useful, star the repo to support the project and help more developers discover it: https://github.com/callstackincubator/agent-device
