Testing AI Agent Skills Reliably with SkillGym
Szymon Chmal will speak at meet.js Gdańsk TypeScript Meetup #20 about SkillGym, a framework for testing AI agent skills across real runners and workflows.

Testing AI Agent Skills Like Real Software
AI agents increasingly rely on skills to understand tools, project conventions, and repeatable workflows. A skill can tell an agent when to use a CLI, which files to inspect, how to operate a device, or how to follow a team’s engineering process. Once those instructions become part of real delivery, developers need a way to know whether they still work after a change.
In this talk, Szymon Chmal will show how SkillGym brings repeatable testing to agent skills. Instead of checking a prompt manually and hoping the next run behaves the same way, SkillGym runs real agent sessions against configured runners such as Codex, Claude Code, OpenCode, or Cursor Agent, then captures what happened during the run.
From Prompt Checks to Assertions
The core idea is simple: describe the behavior the skill should produce, run the agent, and assert on the result. SkillGym can verify whether the right skill was loaded, whether the expected files were read, whether commands ran in the right order, and whether the final output matched the intended outcome.
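The describe, run, assert loop can be sketched in TypeScript. This announcement does not show SkillGym's actual API, so every name below (`RunReport`, `Expectation`, `isSubsequence`, `assertRun`) is hypothetical, and the report is written by hand rather than captured from a real agent session.

```typescript
// Hypothetical shape of a captured run; NOT SkillGym's real report format.
interface RunReport {
  skillsLoaded: string[];
  filesRead: string[];
  commands: string[];
  finalOutput: string;
}

// Hypothetical description of the behavior a skill should produce.
interface Expectation {
  skill: string;
  filesRead: string[];
  commandsInOrder: string[];
  outputIncludes: string;
}

// True if every item of `expected` appears in `actual` in the same
// relative order; unrelated commands in between are allowed.
function isSubsequence(actual: string[], expected: string[]): boolean {
  let next = 0;
  for (const item of actual) {
    if (next < expected.length && item === expected[next]) next++;
  }
  return next === expected.length;
}

// Returns one message per failed check; an empty array means the run passed.
function assertRun(report: RunReport, expected: Expectation): string[] {
  const failures: string[] = [];
  if (!report.skillsLoaded.includes(expected.skill)) {
    failures.push(`skill not loaded: ${expected.skill}`);
  }
  for (const file of expected.filesRead) {
    if (!report.filesRead.includes(file)) {
      failures.push(`file not read: ${file}`);
    }
  }
  if (!isSubsequence(report.commands, expected.commandsInOrder)) {
    failures.push("commands missing or out of order");
  }
  if (!report.finalOutput.includes(expected.outputIncludes)) {
    failures.push(`output does not include: ${expected.outputIncludes}`);
  }
  return failures;
}

// Hand-written sample data standing in for a real captured session.
const report: RunReport = {
  skillsLoaded: ["release-notes"],
  filesRead: ["SKILL.md", "CHANGELOG.md"],
  commands: ["git fetch", "git log --oneline", "npm test"],
  finalOutput: "Release notes drafted for v1.2.0",
};

const expectation: Expectation = {
  skill: "release-notes",
  filesRead: ["CHANGELOG.md"],
  commandsInOrder: ["git log --oneline", "npm test"],
  outputIncludes: "Release notes",
};

const failures = assertRun(report, expectation);
console.log(failures.length === 0 ? "pass" : failures.join("\n"));
```

Collecting failure messages instead of throwing on the first mismatch is a design choice in this sketch: a single run can then report every check that failed.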
That makes skill development less dependent on one-off manual inspection. When a change to a SKILL.md file alters behavior, the test can catch it. When one runner follows the instruction and another does not, the report shows where the behavior diverged.
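To illustrate what "where the behavior diverged" could mean in practice, here is a small TypeScript sketch that compares two step traces and reports the first position at which they differ. The trace format, the `firstDivergence` helper, and the runner names are assumptions made for illustration; the announcement does not describe how SkillGym's normalized reports are structured.

```typescript
// Hypothetical normalized trace: one string per step, such as
// "read:SKILL.md" or "run:git log". Not SkillGym's real format.
type Trace = string[];

interface Divergence {
  index: number;
  a: string | undefined; // step taken by the first runner, if any
  b: string | undefined; // step taken by the second runner, if any
}

// Returns the first position where the two traces differ,
// or null when they are identical step for step.
function firstDivergence(a: Trace, b: Trace): Divergence | null {
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) {
      return { index: i, a: a[i], b: b[i] };
    }
  }
  return null;
}

// Hand-written traces for two placeholder runners.
const runnerA: Trace = [
  "load:release-notes",
  "read:SKILL.md",
  "read:CHANGELOG.md",
  "run:git log",
];
const runnerB: Trace = [
  "load:release-notes",
  "read:SKILL.md",
  "run:git log",
];

const divergence = firstDivergence(runnerA, runnerB);
if (divergence === null) {
  console.log("runners behaved identically");
} else {
  console.log(
    `diverged at step ${divergence.index}: ` +
      `A=${divergence.a ?? "(none)"} B=${divergence.b ?? "(none)"}`
  );
}
```

In this sample the second runner skips reading CHANGELOG.md, so the comparison points at step 2, which is the kind of signal that helps decide whether the skill wording or the runner is at fault.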
Debugging Agent Behavior Across Real Workflows
Szymon will cover the practical side of using SkillGym: setting up test cases, running them across different agent harnesses, inspecting normalized reports, and using failed runs to understand whether the problem sits in the prompt, the skill wording, the workspace, or the model’s interpretation.
This matters for teams building agent-facing tools because skills are becoming part of the developer experience. They guide agents through mobile testing, debugging, code review, repository workflows, and internal automation. SkillGym gives developers a way to iterate on those instructions with evidence instead of guesswork.
Join Szymon at meet.js Gdańsk TypeScript Meetup #20 to learn how repeatable tests can make AI agent skills more predictable, maintainable, and ready for real development workflows.