Hydra tutorial series — Part 4: Skills
Which skills run inside the automated Hydra factory, which exist for humans on the CLI, how they get invoked, and when you should add a new skill to the loop yourself. Fourth of six short modules.
This part dives straight into the Hydra-specific skill families. If you'd rather first learn what a Claude Skill even is, how the frontmatter works, and when you'd write one yourself, take the public Claude Skills tutorial series (three short modules, ~40 minutes). From here on we assume you know the basics.
The previous parts were about what Hydra does. This part is about how: the skills that let the personas do their job. By the end you'll know which skills the automated pipeline (the "Hydra factory") runs and which ones you call yourself as a human, you'll know the five families, and you'll be able to judge when a new skill is worth writing.
Skills, in one paragraph
A skill in Claude Code is a folder with a SKILL.md (and optionally scripts, examples/, helpers). The folder sits under .claude/skills/<name>/. The description in the frontmatter tells Claude when the skill is relevant; the contents are the instructions Claude follows once the skill is loaded.
For Hydra, skills are the unit that bundles behaviour. Instead of a thousand lines of prompt text pasted straight into a persona's CLAUDE.md, that behaviour lives in a skill, where it is reusable and testable.
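To make that folder layout concrete, here is a minimal Python sketch that scaffolds one. It assumes only the name and description frontmatter fields described above. scaffold_skill and the example skill are hypothetical and not part of Hydra or Claude Code. In practice you would use the skill-creator skill for this.

```python
from pathlib import Path


def scaffold_skill(repo_root: Path, name: str, description: str, body: str) -> Path:
    """Create .claude/skills/<name>/SKILL.md with minimal frontmatter (hypothetical helper)."""
    skill_dir = repo_root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    skill_md = skill_dir / "SKILL.md"
    # Frontmatter between '---' lines; the description is what Claude reads
    # to decide whether the skill is relevant. The body holds the instructions.
    skill_md.write_text(
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{body.strip()}\n",
        encoding="utf-8",
    )
    return skill_md
```

The sketch writes the description verbatim. A description that contains a colon would need YAML quoting, which this sketch does not handle.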
How a skill gets invoked
A single skill can be triggered in two ways:
- Manually: you type /opsx-apply in a Claude session. Direct, predictable, and the skill runs exactly when you want it to.
- Automatically: Claude reads the description of every available skill and picks one itself when the current situation matches. Ask "what changed?" and Claude will trigger a skill like summarize-changes on its own if one is available.
Which mode is allowed is set in the skill's own frontmatter:
- disable-model-invocation: true → only you can call it via /<name>. Use this for skills with side-effects (/opsx-apply modifies code, /create-pr opens a PR). Claude shouldn't just decide to do that on its own.
- user-invocable: false → only Claude itself may load it. Use this for background knowledge (for example a legacy-system-context skill that doesn't make sense as an action).
- Neither set → both are allowed.
So that distinction — manual vs. automatic — is about configuration of one skill, not about two different kinds of files. A persona in a Docker container has no keyboard and leans heavily on auto-activation; a human on the CLI usually wants explicit control and types /.
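The frontmatter rules listed above fit in a small function. This sketch models the behaviour as this part describes it and assumes the frontmatter is already parsed into a dict. The function name is hypothetical, and the sketch does not show how Claude Code implements the rules internally.

```python
def allowed_invocations(frontmatter: dict) -> set[str]:
    """Return the invocation modes a skill's frontmatter allows."""
    modes = {"manual", "automatic"}
    # disable-model-invocation: true -> Claude may not pick the skill on its own.
    if frontmatter.get("disable-model-invocation") is True:
        modes.discard("automatic")
    # user-invocable: false -> no /<name> command for humans.
    if frontmatter.get("user-invocable") is False:
        modes.discard("manual")
    return modes
```

A side-effecting skill such as /create-pr evaluates to {"manual"}. A background-knowledge skill evaluates to {"automatic"}. A skill with neither flag allows both modes.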
Two worlds: the Hydra factory vs. skills for humans
Hydra's hydra/.claude/skills/ contains ~70 skills, but the automated pipeline only uses a handful. The rest is tooling for you, behind a keyboard. It is important to keep the two worlds apart:
What the Hydra factory actually runs
Every pipeline container has its own, limited set of skills baked into its Docker image:
| Container | Persona | Skills in image | Main function |
|---|---|---|---|
| Builder | Al Gorithm | All .claude/skills/* + all vendor skills | Implements tasks, primarily with opsx-apply. The other opsx skills are on hand for context during verification, for re-orienting on a change, and for fixing after a quality fail. |
| Reviewer | Juan Claude | hydra-gates + the hydra-gate-* skills + vendor code-review | Mandatory hydra-gates check + deeper code review via the Anthropic community skill. |
| Security | Clyde Barcode | hydra-gates + the hydra-gate-* skills + vendor trailofbits + vendor owasp | Mandatory hydra-gates + SAST (Semgrep) + OWASP top 10 checklists. |
| Applier | Axel Plier | No skills, no write tools (Read + Bash only) | Reads the final diff and both review verdicts and returns one binary answer — pass or not. It applies no fixes; it's a go/no-go gate, not an editor. |
| Browser UI Tester | (sonnet, headless) | One skill: hydra-ui-test | Logs in, navigates the live app via Playwright MCP, delivers verdict JSON. |
On top of that, scripts/run-hydra-gates.sh runs all 22 hydra-gates mechanically in every container, independent of which skill files are in the image. The skill files serve Claude as documentation when fixing; the script does the detection.
What the factory doesn't do — for humans on the CLI
All the other skills (~50 out of 70) are for you, or for a fellow dev in a Claude Code session on their laptop. Examples:
- Preparing a change: opsx-new, opsx-ff, opsx-explore and opsx-plan-to-issues are things you do as a human before you hand the work to the pipeline. The Builder only runs opsx-apply on an existing change.
- Taking on a role: team-architect, team-backend, team-po etc. are pure roleplay frames for one-human-one-role sessions. The pipeline doesn't use them.
- Running tests: you start /test-counsel (all 8 personas) or /test-app (one browser sweep) by hand. The factory has its own browser tester (hydra-ui-test); the test-* family is separate from that.
- PRs and day-to-day work: /create-pr, /review-pr and /report-out are the "three times a day" tools for you, not for the pipeline.
Remember: the factory reaches for a handful of skills, while a human can call all ~70. That's the whole difference.
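One way to see the split for yourself is to collect what each container uses and subtract it from everything in the repo. The container contents below are simplified from the table above. The Builder's full .claude/skills/* is reduced to the three opsx skills it actually runs, and all hydra-gate-* skills are collapsed into one entry. FACTORY_SKILLS and human_only are hypothetical names.

```python
# Simplified from the container table; "hydra-gate-*" stands for all per-gate skills.
FACTORY_SKILLS: dict[str, set[str]] = {
    "builder": {"opsx-apply", "opsx-verify", "opsx-archive"},
    "reviewer": {"hydra-gates", "hydra-gate-*", "code-review"},
    "security": {"hydra-gates", "hydra-gate-*", "trailofbits", "owasp"},
    "applier": set(),  # no skills, no write tools
    "browser-ui-tester": {"hydra-ui-test"},
}


def human_only(repo_skills: set[str]) -> set[str]:
    """Skills in the repo that no pipeline container reaches for."""
    used_by_factory = set().union(*FACTORY_SKILLS.values())
    return repo_skills - used_by_factory
```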
The five skill families in full
With that split in mind, here are all five families. The tags used in the tables: 🤖 factory = ships inside a pipeline container, ⌨️ human = CLI only.
Family 1: OpenSpec workflow (opsx-*, 16 skills)
The OPSX skills implement the Conduction workflow for OpenSpec changes — from proposal to archive.
| Skill | For whom | Does |
|---|---|---|
| opsx-apply | 🤖 Builder | Implements tasks from a change (the pipeline default). |
| opsx-verify | 🤖 Builder | Verifies that the implementation covers the change artefacts. |
| opsx-archive | 🤖 Builder + ⌨️ | Archives a completed change, syncing delta into the spec. |
| opsx-new | ⌨️ | Starts a new change (proposal scaffold, schema choice). |
| opsx-ff | ⌨️ | "Fast-forward": creates a change + all artefacts in a single pass. |
| opsx-continue | ⌨️ | Pick up an interrupted change. |
| opsx-explore | ⌨️ | Explore pre-spec what a change should even be. |
| opsx-onboard | ⌨️ | Onboarding flow on a new repo (sets up OpenSpec config + structure). |
| opsx-plan-to-issues | ⌨️ | Converts tasks.md into GitHub issues + a tracking issue. |
| opsx-sync | ⌨️ | Sync delta specs back into the main spec. |
| opsx-bulk-archive | ⌨️ | Archive multiple completed changes in one go. |
| opsx-apply-loop | ⌨️ | Headless build → quality-fix loop for local dev runs. |
| opsx-pipeline | ⌨️ | Run multiple changes in parallel (multi-agent). |
| opsx-coverage-scan | ⌨️ | Audit a legacy app for spec ↔ code coverage. |
| opsx-annotate | ⌨️ | Applies @spec PHPDoc tags after a coverage scan. |
| opsx-reverse-spec | ⌨️ | Reverse-engineers a spec from existing code. |
Sixteen skills is a lot. In practice you reach for one of a handful, and the choice follows a simple decision aid:
- Starting fresh and sure of the feature? → opsx-ff (one pass to all artefacts), then opsx-plan-to-issues.
- Starting fresh but unsure of scope/approach? → opsx-explore first, then opsx-new and opsx-continue step by step.
- New repo with no OpenSpec yet? → opsx-onboard.
- Legacy code with no specs? → opsx-coverage-scan → opsx-reverse-spec / opsx-annotate.
- Change is written and you want it built locally without the full factory? → opsx-apply-loop (one change) or opsx-pipeline (several in parallel).
- Inside the factory, the Builder only ever runs opsx-apply, opsx-verify and opsx-archive. The rest is human-driven preparation.
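The decision aid above can be written as a lookup. The situation labels ("new-repo", "legacy", "build-locally", "new-change") and the function name are invented for this sketch. The returned skill sequences are the ones listed above.

```python
def opsx_route(situation: str, *, sure_of_scope: bool = True, parallel: bool = False) -> list[str]:
    """Suggest an opsx-* sequence for a (hypothetical) situation label."""
    if situation == "new-repo":
        return ["opsx-onboard"]
    if situation == "legacy":
        # Scan first, then either reverse-engineer specs or annotate code.
        return ["opsx-coverage-scan", "opsx-reverse-spec / opsx-annotate"]
    if situation == "build-locally":
        return ["opsx-pipeline"] if parallel else ["opsx-apply-loop"]
    if situation == "new-change":
        if sure_of_scope:
            return ["opsx-ff", "opsx-plan-to-issues"]
        return ["opsx-explore", "opsx-new", "opsx-continue"]
    raise ValueError(f"unknown situation: {situation!r}")
```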
Family 2: Quality + security gates (hydra-gate-*)
The parent skill hydra-gates is a dispatcher that calls every gate together. The script engine under the hood (scripts/run-hydra-gates.sh) is the single source of truth, and it currently runs 22 gates (gate-1 … gate-22) in every container; the per-gate skill files serve Claude as reference material when fixing a failure.
Rather than reproduce the full table here, we point you to part 3: Quality gates, where it is kept accurate and numbered. For the skill family, it's enough to know that the gates fall into three rough groups:
- Hygiene: spdx-headers, forbidden-patterns, stub-scan, composer-audit, conflict-markers.
- Security & auth: route-auth, orphan-auth, no-admin-idor, unsafe-auth-resolver, semantic-auth, route-reachability.
- Conduction conventions & traceability: initial-state, admin-router, nc-input-labels, modal-isolation, dashboard-antipattern, redundant-controller, notification-dialect, or-objectservice-api, manifest-validation, and the two traceability gates spec-coverage (gate-16) and e2e-coverage (gate-19).
All 22 run 🤖 inside the Hydra factory (Reviewer + Security containers; Builder also runs them as a post-flight check during fix-quality). When a new incident produces a new gate, it's added here and the count grows — so trust the script's ALL N GATES GREEN line, not a hard-coded number in any doc.
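Here is a toy Python version of the dispatcher pattern. It has a registry of gate functions, a runner that calls each one, and a summary line derived from the registry size rather than a hard-coded number. The two gates are simplified, hypothetical re-implementations of conflict-markers and spdx-headers. The real ones live in scripts/run-hydra-gates.sh. Only the ALL N GATES GREEN wording comes from this page; the failure summary format is invented.

```python
import re
from typing import Callable

# A gate takes {path: file text} and returns findings; an empty list means green.
Gate = Callable[[dict[str, str]], list[str]]


def conflict_markers(files: dict[str, str]) -> list[str]:
    """Toy conflict-markers gate: flag leftover git merge markers."""
    return [
        f"{path}:{lineno}: merge conflict marker"
        for path, text in files.items()
        for lineno, line in enumerate(text.splitlines(), start=1)
        if line.startswith(("<<<<<<<", ">>>>>>>")) or line == "======="
    ]


def spdx_headers(files: dict[str, str]) -> list[str]:
    """Toy spdx-headers gate: PHP files need an SPDX tag in their first 10 lines."""
    return [
        f"{path}: missing SPDX-License-Identifier"
        for path, text in files.items()
        if path.endswith(".php")
        and "SPDX-License-Identifier:" not in "\n".join(text.splitlines()[:10])
    ]


# The registry: adding an entry here is all the dispatcher needs to run a new gate.
GATES: dict[str, Gate] = {
    "conflict-markers": conflict_markers,
    "spdx-headers": spdx_headers,
}


def run_all_gates(files: dict[str, str]) -> tuple[str, dict[str, list[str]]]:
    """Run every registered gate; return (summary line, findings per failing gate)."""
    failures = {name: found for name, gate in GATES.items() if (found := gate(files))}
    if failures:
        return f"{len(failures)} OF {len(GATES)} GATES FAILED", failures
    return f"ALL {len(GATES)} GATES GREEN", failures


def is_green(summary: str) -> bool:
    """Accept 'ALL N GATES GREEN' for any N instead of hard-coding the gate count."""
    return re.fullmatch(r"ALL \d+ GATES GREEN", summary.strip()) is not None
```

Adding a third gate takes one more line in GATES. The green line then becomes ALL 3 GATES GREEN on its own, which is why is_green matches the pattern rather than a fixed count.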
Family 3: Team roles (team-*, 7 skills) — ⌨️ humans only
Skills that model a specific kind of work via a persona frame. Intended for a human working in a terminal alongside Hydra who wants to step into a single role for a bit.
- team-po (Product Owner): writes user stories and acceptance criteria.
- team-sm (Scrum Master): manages backlog and sprint planning artefacts.
- team-architect: makes architecture decisions, writes ADR drafts.
- team-backend: implements backend work (PHP, services, mappers).
- team-frontend: implements frontend work (Vue 2, Pinia, NL Design System).
- team-reviewer: a manual variant of the reviewer work.
- team-qa: writes test cases and test plans.
None of these ship in a pipeline container — they have no place in the automated loop.
Family 4: Test suites (test-*, 19 skills) — almost all ⌨️ for humans
The largest family by count: nineteen skills that drive agentic browser and API testing, in three clusters:
Test types (one per test kind):
- test-app: automated browser test of a whole Nextcloud app (Playwright MCP).
- test-functional: functional scenarios against implemented features.
- test-api: API checks against the PHP built-in server.
- test-accessibility, test-performance, test-security, test-regression: specialised variants.
Personas (test-persona-*) — eight Dutch user profiles, each looking at an app from their own angle: annemarie, fatima, henk, janwillem, mark, noor, priya, sem. Henk reads with large type and looks for simple navigation; Noor hammers on RBAC and audit trails; Annemarie checks NLGov/GEMMA mapping. The persona cards themselves live in hydra/personas/.
Scenario management — test-scenario-create, test-scenario-edit, test-scenario-run write, edit and run reusable TS-NNN-*.md scenarios per app.
The dispatcher for this family is test-counsel: it coordinates all eight personas against one feature and delivers a combined report. Same pattern as hydra-gates for the quality-gates family.
There are three separate "testing" surfaces and they're easy to confuse:
- hydra-ui-test: the factory's own browser tester. Runs on the host (via scripts/run-browser-tests.sh, Playwright MCP), logs into the live app, walks the spec's acceptance criteria, and returns a verdict JSON. This is runtime proof that the feature works.
- gate-19 (e2e-coverage): a static gate from part 3. It doesn't run a browser at all; it checks that every spec scenario is linked to a Playwright test via an @e2e annotation. Coverage bookkeeping, not execution.
- The test-* family: human-driven test suites you run by hand before or after the pipeline.
So: gate-19 checks the links exist, hydra-ui-test actually drives the browser inside the factory, and test-* is what you run manually. Different jobs.
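gate-19's bookkeeping comes down to a set difference: scenarios in the spec minus scenarios that some test points at. This minimal sketch assumes a hypothetical annotation format of @e2e followed by a scenario ID somewhere in the test file. The real gate's annotation syntax and scenario IDs may differ. Nothing is executed, which is exactly what separates it from hydra-ui-test.

```python
import re

# Hypothetical format: a comment such as "// @e2e login-success" in a Playwright test file.
E2E_TAG = re.compile(r"@e2e\s+([A-Za-z0-9_.-]+)")


def unlinked_scenarios(scenario_ids: set[str], test_sources: dict[str, str]) -> set[str]:
    """Spec scenarios that no test file links to via @e2e (static check, no browser)."""
    linked = {m.group(1) for text in test_sources.values() for m in E2E_TAG.finditer(text)}
    return scenario_ids - linked
```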
Family 5: Utility & maintenance (~13 skills) — almost all ⌨️ for humans
The remaining ~13 skills are mostly developer comfort and meta work:
| Skill | Does |
|---|---|
| create-pr | Creates a PR from a feature branch: local checks → branch pick → PR body. |
| review-pr | Reviews a GitHub PR (note: manual variant; the factory has its own Juan Claude). |
| report-out | End-of-day report: today's commits + GitHub activity → Dutch Slack notification. |
| clean-env | Fully resets the local Docker dev environment. |
| local-run | Bring up the local Nextcloud dev environment. |
| sync-docs | Sync {app}/docs/ or .github/docs/claude/ with reality in the repo. |
| skill-creator | Wizard for building a new skill (scaffold, frontmatter, evals). |
| feature-counsel | Pre-build spec analysis from eight persona perspectives (sibling of test-counsel). |
| persistence-audit | Audit how an app handles data persistence (object store, sessions, etc.). |
| journeydoc-init / journeydoc-add-story / journeydoc-instrument | Manual instrumentation + extension of Journey docs. |
| verify-global-settings-version | Check whether global-settings/VERSION was bumped after a change. |
Nothing in this family runs in the pipeline. These are your everyday slash commands.
Vendor skills (community)
Next to its own skills, Hydra has vendor skills under hydra/vendor/skills/:
- code-review: community review skill from Anthropic → 🤖 Reviewer container.
- trailofbits: Semgrep-based static-analysis methodology from Trail of Bits → 🤖 Security container.
- owasp: OWASP top 10:2025 + ASVS 5.0 checklists → 🤖 Security container.
So these three do ship inside the factory. They are loaded into Juan's and Clyde's containers for extra coverage on top of our own hydra-gate-* skills. Maintenance sits with the external authors: updates happen by tracking upstream, not by editing the skills yourself.
When do you write a new skill?
The pragmatic test: write a skill if…
- The check / behaviour is repeatable — needed more than once, in more than one place.
- It's mechanically describable — you can instruct it in 1-3 paragraphs without it devolving into "it depends on the context".
- A persona or a human would benefit from it. Don't write a skill because you can.
For a gate that produces false positives (part 3), adjust the existing skill rather than writing a new one. A new class of mistake that you've seen three times does earn its own hydra-gate-* skill.
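The rule of three can be phrased as a count over triaged incidents. This is only a sketch. It assumes someone has already labelled each incident with a mistake category, and the function name and labels are hypothetical.

```python
from collections import Counter


def gate_candidates(incidents: list[str], covered: set[str], threshold: int = 3) -> list[str]:
    """Mistake categories seen at least `threshold` times that no existing gate covers."""
    counts = Counter(category for category in incidents if category not in covered)
    return sorted(category for category, n in counts.items() if n >= threshold)
```

A category that is already covered but keeps showing up points to a gate that needs adjusting, not to a new skill. That is why covered categories are filtered out before counting.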
Test yourself
Four short questions to check whether you've grasped this part. If you're stuck, read the hint. To check yourself, read the answer.
1. What are the two ways a skill can be activated, and how do you control that per skill?
Hint
One way requires a human to type something. The other lets Claude decide for itself based on the skill description. Which frontmatter fields determine which mode is allowed?
Answer
- Manual: you type /<name> in a Claude session. Direct, predictable.
- Automatic: Claude reads the description of all skills and picks one itself when the current conversation matches.

In the skill's frontmatter you set:

- disable-model-invocation: true → manual only. Used for skills with side-effects (/opsx-apply, /create-pr).
- user-invocable: false → automatic only. Used for background knowledge that isn't useful as an action.
- Neither set → both allowed. This is the default for most Hydra skills.
In the Hydra pipeline the containers mostly use auto-activation (Claude picks the right skill itself); a human on the CLI typically types / explicitly.
2. Which skills actually run inside the pipeline containers, and which ones sit in the repo only for humans?
Hint
Each pipeline container (Builder, Reviewer, Security, Applier, Browser UI Tester) has a specific set of skills in its image. What's inside, and which families sit entirely outside?
Answer
Inside the pipeline containers (🤖 factory):
- Builder: all .claude/skills/* baked in, but the standard run is opsx-apply. The other opsx skills are there for context.
- Reviewer: hydra-gates + 13 individual hydra-gate-* + vendor code-review.
- Security: hydra-gates + 13 individual hydra-gate-* + vendor trailofbits + vendor owasp.
- Applier: no skills (pure CLAUDE.md).
- Browser UI Tester: only hydra-ui-test.
Plus: scripts/run-hydra-gates.sh runs all 22 gates mechanically in every container, regardless of which skill files are in the image.
Not in the factory, only for humans on the CLI:
- The entire team-* family (7 skills).
- Almost the entire test-* family (19 skills). The factory has its own hydra-ui-test.
- Most opsx-* skills, except opsx-apply/verify/archive (humans start changes; the pipeline implements them).
- The whole utility family (create-pr, review-pr, report-out, clean-env, local-run, sync-docs, skill-creator, feature-counsel, persistence-audit, journeydoc-*, verify-global-settings-version).
In total: of ~70 skills in the repo, the automated pipeline uses a handful; the rest is your toolkit.
3. When do you adjust an existing gate skill and when do you write a new one?
Hint
One decision is about "the gate doesn't do what we already wanted it to do". The other is about "we've discovered a new category of mistake".
Answer
- Adjust the existing gate when it misfires: it triggers too broadly (a false positive) or too narrowly on something it was already meant to check. Example: hydra-gate-forbidden-patterns matched $builder->add( incorrectly. You tighten the regex; you don't add a second gate.
- Write a new one for a new class of mistake that you've seen three times and that slips through every existing gate. Example: hydra-gate-stub-scan was born when a builder shipped a return null; method that PHPCS let through and that had no test. That was a new category, not a fix to an existing check.
Rule of three: once is chance, twice is coincidence, three times is a pattern that deserves its own gate.
4. What do the "vendor skills" do and why do we keep them separate from our own hydra-gate-*?
Hint
Think about provenance (who wrote them?), which pipeline container they get loaded into, and what happens when you need an external party to update their work.
Answer
Vendor skills (hydra/vendor/skills/) are skills that were not written by us, and that do ship inside the factory:
- code-review (Anthropic community) → Reviewer container (Juan Claude).
- trailofbits (Trail of Bits, Semgrep methodology) → Security container (Clyde).
- owasp (OWASP top 10:2025 + ASVS 5.0) → Security container.
We keep them separate because:
- Maintenance sits with external parties. We update vendor skills by tracking upstream (see vendor/skills/VERSIONS.md), not by editing them ourselves. We change our own gates freely, but we leave vendor skills alone.
- The audit trail stays clear: what's ours, and what's community or external? On a failure, you immediately know which camp is responsible for the fix.
Next step
In part 5 we get practical: starting a real Hydra run on a real app, including the label-prefix trick for parallel dev runs.