Testing AI Agent Skills Reliably with SkillGym

Date: Thursday, June 11, 2026
Time: 5:30 PM - 9:30 PM [CEST]
Location: Gdańsk, Poland; Online

Szymon Chmal will speak at meet.js Gdańsk TypeScript Meetup #20 about SkillGym, a framework for testing AI agent skills across real runners and workflows.

Organizer: meet.js
Speaker: Szymon Chmal, Software Developer @ Callstack

Testing AI Agent Skills Like Real Software

AI agents increasingly rely on skills to understand tools, project conventions, and repeatable workflows. A skill can tell an agent when to use a CLI, which files to inspect, how to operate a device, or how to follow a team’s engineering process. Once those instructions become part of real delivery, developers need a way to know whether they still work after a change.

In this talk, Szymon Chmal will show how SkillGym brings repeatable testing to agent skills. Instead of checking a prompt manually and hoping the next run behaves the same way, SkillGym runs real agent sessions against configured runners such as Codex, Claude Code, OpenCode, or Cursor Agent, then captures what happened during the run.

From Prompt Checks to Assertions

The core idea is simple: describe the behavior the skill should produce, run the agent, and assert on the result. SkillGym can verify whether the right skill was loaded, whether the expected files were read, whether commands ran in the right order, and whether the final output matched the intended outcome.
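As a rough illustration of what "assert on the result" can mean, the sketch below checks a captured run against an expectation. It is not SkillGym's API: the `RunTrace` and `Expectation` shapes and the `checkRun` helper are hypothetical names made up for this example, assuming only that a run can be reduced to the skills loaded, the files read, the commands executed, and the final output.

```typescript
// Hypothetical shape of a captured agent run. This is an assumption made for
// illustration and is not taken from SkillGym itself.
interface RunTrace {
  loadedSkills: string[];
  filesRead: string[];
  commands: string[];
  finalOutput: string;
}

// Hypothetical description of the behavior a skill should produce.
interface Expectation {
  skill: string;
  filesRead: string[];
  commandsInOrder: string[];
  outputIncludes: string;
}

// True when every item of `expected` appears in `actual` in the same relative
// order. Other items may appear in between.
function isSubsequence(expected: string[], actual: string[]): boolean {
  let matched = 0;
  for (const item of actual) {
    if (matched < expected.length && item === expected[matched]) {
      matched++;
    }
  }
  return matched === expected.length;
}

// Returns human-readable failures. An empty list means the run passed.
function checkRun(trace: RunTrace, expected: Expectation): string[] {
  const failures: string[] = [];
  if (trace.loadedSkills.indexOf(expected.skill) === -1) {
    failures.push("skill not loaded: " + expected.skill);
  }
  for (const file of expected.filesRead) {
    if (trace.filesRead.indexOf(file) === -1) {
      failures.push("file not read: " + file);
    }
  }
  if (!isSubsequence(expected.commandsInOrder, trace.commands)) {
    failures.push("commands were skipped or ran out of order");
  }
  if (trace.finalOutput.indexOf(expected.outputIncludes) === -1) {
    failures.push("final output missing: " + expected.outputIncludes);
  }
  return failures;
}
```

Returning a list of failures rather than throwing on the first one keeps every mismatch from a single run visible at once, which matters when each agent session is slow or costly to repeat.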

That makes skill development less dependent on one-off manual inspection. When a change to a SKILL.md file alters behavior, the test can catch it. When one runner follows the instruction and another does not, the report shows where the behavior diverged.
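One way to picture "where the behavior diverged" is to compare the ordered steps of two runs and report the first position at which they differ. The sketch below assumes each run has already been reduced to a list of step labels. The label format and the `firstDivergence` helper are hypothetical and do not describe how SkillGym builds its reports.

```typescript
// Hypothetical: a run reduced to an ordered list of step labels,
// for example "load:release" or "run:npm test".
type Steps = string[];

interface Divergence {
  index: number;
  left: string | undefined;  // undefined when the left run ended earlier
  right: string | undefined; // undefined when the right run ended earlier
}

// Returns the first position where the two sequences differ,
// or null when they are identical.
function firstDivergence(left: Steps, right: Steps): Divergence | null {
  const length = Math.max(left.length, right.length);
  for (let index = 0; index < length; index++) {
    if (left[index] !== right[index]) {
      return { index, left: left[index], right: right[index] };
    }
  }
  return null;
}
```

For two runs of the same test case, a divergence at an early index suggests the runners disagreed about which skill or file to start from, while a late one points at the final steps of the workflow.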

Debugging Agent Behavior Across Real Workflows

Szymon will cover the practical side of using SkillGym: setting up test cases, running them across different agent harnesses, inspecting normalized reports, and using failed runs to understand whether the problem sits in the prompt, the skill wording, the workspace, or the model’s interpretation.
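The workflow described above, test cases run across several harnesses and collected into one report, can be sketched as a small matrix loop. Everything here is an assumption for illustration: `TestCase`, `Runner`, `ReportRow`, and `runMatrix` are hypothetical names, and the runners are plain synchronous functions, whereas a real runner would start an agent session and would almost certainly be asynchronous.

```typescript
// Hypothetical types. None of these come from SkillGym.
interface TestCase {
  name: string;
  prompt: string;
  expectSkill: string;
}

interface RunResult {
  loadedSkills: string[];
}

// A runner takes a prompt and reports what happened. Kept synchronous here
// to keep the sketch short.
type Runner = (prompt: string) => RunResult;

// One row of a normalized report: same shape regardless of the runner.
interface ReportRow {
  testCase: string;
  runner: string;
  passed: boolean;
}

// Runs every test case against every runner and collects one row per pair.
function runMatrix(cases: TestCase[], runners: Record<string, Runner>): ReportRow[] {
  const rows: ReportRow[] = [];
  for (const testCase of cases) {
    for (const runnerName of Object.keys(runners)) {
      const result = runners[runnerName](testCase.prompt);
      rows.push({
        testCase: testCase.name,
        runner: runnerName,
        passed: result.loadedSkills.indexOf(testCase.expectSkill) !== -1,
      });
    }
  }
  return rows;
}
```

Because every row has the same shape, filtering for `passed === false` gives a short list of case and runner pairs to inspect first.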

This matters for teams building agent-facing tools because skills are becoming part of the developer experience. They guide agents through mobile testing, debugging, code review, repository workflows, and internal automation. SkillGym gives developers a way to iterate on those instructions with evidence instead of guesswork.

Join Szymon at meet.js Gdańsk TypeScript Meetup #20 to learn how repeatable tests can make AI agent skills more predictable, maintainable, and ready for real development workflows.
