6/11/2026

5:30 PM - 9:30 PM [CEST]

Online

Testing AI Agent Skills Reliably with SkillGym

Location

Gdańsk, Poland

Online

Szymon Chmal will speak at meet.js Gdańsk TypeScript Meetup #20 about SkillGym, a framework for testing AI agent skills across real runners and workflows.

Organizer

Meet.JS

Speakers

Szymon Chmal

Software Developer

Callstack

Testing AI Agent Skills Like Real Software

AI agents increasingly rely on skills to understand tools, project conventions, and repeatable workflows. A skill can tell an agent when to use a CLI, which files to inspect, how to operate a device, or how to follow a team’s engineering process. Once teams depend on those instructions to ship real work, developers need a way to verify that the instructions still behave as intended after a change.

In this talk, Szymon Chmal will show how SkillGym brings repeatable testing to agent skills. Instead of checking a prompt manually and hoping the next run behaves the same way, SkillGym runs real agent sessions against configured runners such as Codex, Claude Code, OpenCode, or Cursor Agent, then captures what happened during the run.

From Prompt Checks to Assertions

The core idea is simple: describe the behavior the skill should produce, run the agent, and assert on the result. SkillGym can verify whether the right skill was loaded, whether the expected files were read, whether commands ran in the right order, and whether the final output matched the intended outcome.

That makes skill development less dependent on one-off manual inspection. When a change to a SKILL.md file alters behavior, the test can catch it. When one runner follows the instruction and another does not, the report shows where the behavior diverged.
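To make the assertion idea concrete, here is a minimal TypeScript sketch. It does not use SkillGym's actual API; the `RunReport` and `Expectation` shapes, the `checkRun` helper, and the `run-tests` skill name are all hypothetical. It assumes a run has already been captured as a list of normalized events and checks the four kinds of behavior described above: skill loading, file reads, command order, and final output.

```typescript
// Hypothetical normalized events captured from one agent run.
// This shape is an illustration only, not SkillGym's report format.
type RunEvent =
  | { kind: "skill_loaded"; name: string }
  | { kind: "file_read"; path: string }
  | { kind: "command"; command: string };

interface RunReport {
  runner: string;
  events: RunEvent[];
  finalOutput: string;
}

// What the skill is expected to make the agent do (hypothetical fields).
interface Expectation {
  skill: string;
  filesRead: string[];
  commandOrder: string[];
  outputIncludes: string;
}

// True if `expected` appears in `actual` as an ordered subsequence
// (other commands may run in between).
function inOrder(actual: string[], expected: string[]): boolean {
  let next = 0;
  for (const item of actual) {
    if (next < expected.length && item === expected[next]) next++;
  }
  return next === expected.length;
}

// Returns human-readable failures; an empty list means the run passed.
function checkRun(report: RunReport, exp: Expectation): string[] {
  const failures: string[] = [];

  const loaded = report.events.some(
    (e) => e.kind === "skill_loaded" && e.name === exp.skill,
  );
  if (!loaded) failures.push(`skill "${exp.skill}" was not loaded`);

  const read = new Set(
    report.events.flatMap((e) => (e.kind === "file_read" ? [e.path] : [])),
  );
  for (const path of exp.filesRead) {
    if (!read.has(path)) failures.push(`file "${path}" was not read`);
  }

  const commands = report.events.flatMap((e) =>
    e.kind === "command" ? [e.command] : [],
  );
  if (!inOrder(commands, exp.commandOrder)) {
    failures.push("commands did not run in the expected order");
  }

  if (!report.finalOutput.includes(exp.outputIncludes)) {
    failures.push("final output did not match");
  }
  return failures;
}

// Example run of a hypothetical "run-tests" skill.
const expectation: Expectation = {
  skill: "run-tests",
  filesRead: ["package.json"],
  commandOrder: ["npm install", "npm test"],
  outputIncludes: "All tests passed",
};

const report: RunReport = {
  runner: "example-runner",
  events: [
    { kind: "skill_loaded", name: "run-tests" },
    { kind: "file_read", path: "package.json" },
    { kind: "command", command: "npm install" },
    { kind: "command", command: "npm test" },
  ],
  finalOutput: "All tests passed.",
};

console.log(checkRun(report, expectation)); // prints []
```

Collecting failures instead of stopping at the first one means a single run reports every broken expectation at once, which makes a changed SKILL.md easier to diagnose.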

Debugging Agent Behavior Across Real Workflows

Szymon will cover the practical side of using SkillGym: setting up test cases, running them across different agent harnesses, inspecting normalized reports, and using failed runs to understand whether the problem lies in the prompt, the skill wording, the workspace, or the model’s interpretation.
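Comparing runners can be sketched in the same spirit. Assuming each run is reduced to a list of short step labels (a hypothetical normalization, not SkillGym's report format), the first index at which two traces differ points to where their behavior diverged. The `RunnerTrace` type, the `firstDivergence` helper, and the runner names are illustrative only.

```typescript
// Hypothetical normalized trace: each step is a short label such as
// "read:package.json". This is an illustration, not SkillGym's format.
interface RunnerTrace {
  runner: string;
  steps: string[];
}

interface Divergence {
  index: number;
  left: string | undefined; // step from the first trace (undefined if it ended earlier)
  right: string | undefined; // step from the second trace (undefined if it ended earlier)
}

// Returns the first position where the two traces differ, or null if they match.
function firstDivergence(a: RunnerTrace, b: RunnerTrace): Divergence | null {
  const length = Math.max(a.steps.length, b.steps.length);
  for (let i = 0; i < length; i++) {
    if (a.steps[i] !== b.steps[i]) {
      return { index: i, left: a.steps[i], right: b.steps[i] };
    }
  }
  return null;
}

// Two hypothetical runners executing the same skill.
const runnerA: RunnerTrace = {
  runner: "runner-a",
  steps: ["skill:run-tests", "read:package.json", "cmd:npm install", "cmd:npm test"],
};
const runnerB: RunnerTrace = {
  runner: "runner-b",
  steps: ["skill:run-tests", "cmd:npm test"],
};

const divergence = firstDivergence(runnerA, runnerB);
if (divergence) {
  console.log(
    `${runnerA.runner} and ${runnerB.runner} diverge at step ${divergence.index}: ` +
      `${divergence.left} vs ${divergence.right}`,
  );
}
// prints: runner-a and runner-b diverge at step 1: read:package.json vs cmd:npm test
```

Stopping at the first mismatch keeps the sketch short; a fuller comparison could align both traces to show every step one runner skipped.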

Reliable skill testing matters for teams building agent-facing tools because skills are becoming part of the developer experience. They guide agents through mobile testing, debugging, code review, repository workflows, and internal automation. SkillGym gives developers a way to iterate on those instructions with evidence instead of guesswork.

Join Szymon at meet.js Gdańsk TypeScript Meetup #20 to learn how repeatable tests can make AI agent skills more predictable, maintainable, and ready for real development workflows.