OpenSpec tutorial series — Part 4: From scenario to Playwright test

A scenario in a spec is a test you have not run yet. This part shows how a GIVEN/WHEN/THEN scenario becomes a Playwright test, how the @e2e and @spec annotations let you backtrace any line of code or any test to the demand it serves, and how gate-19 turns that into a safe sandbox for spec coding instead of vibe coding.

Tags: Tutorial, OpenSpec, Spec-first, Testing, Playwright, e2e, Tutorial series
14 min read

In Part 1 you learned that a scenario describes behaviour in GIVEN / WHEN / THEN form. In Part 2 you wrote one. This part closes the loop. A scenario is a test you have not run yet, and OpenSpec gives you a mechanical way to prove that every scenario in a spec is exercised by a real browser test. The @e2e annotation and the gate that enforces it are what turn "write a spec" into a safe sandbox for AI. They are also the single biggest thing that separates spec coding from vibe coding.

A scenario is a test you have not run yet

Look again at a scenario from Part 2, this time in our pet-store register:

### Requirement: Search pets by name
Users MUST be able to find pets by typing part of the name.

#### Scenario: Searching narrows the visible list
- GIVEN the pet register contains "Rex", "Bella" and "Max"
- WHEN the user types "be" into the search box
- THEN only "Bella" is shown in the result list

Read that WHEN and THEN again. It is already a test script. Type "be" into the search box, assert that only "Bella" is visible. The only thing missing is the code that drives a browser to do exactly that. A Playwright test is that code:

import { test, expect } from '@playwright/test'

test('Searching narrows the visible list', async ({ page }) => {
  // @e2e pet-store-search::searching-narrows-the-visible-list
  // Navigation to the pet list and seeding of the GIVEN data are omitted here.
  await page.getByRole('searchbox').fill('be')
  await expect(page.getByText('Bella')).toBeVisible()
  await expect(page.getByText('Rex')).toHaveCount(0)
  await expect(page.getByText('Max')).toHaveCount(0)
})

The scenario and the test are two views of one behaviour. One is plain language for the reviewer, one is TypeScript for the browser. The @e2e comment at the top of the test body is the thread that ties them together, and that thread is what a gate can check.

Spec coding rests on a simple promise: from any line of code, or any test, you can trace back to the functional demand it serves. Two annotations make that promise hold, pointing in opposite directions:

| Link | Annotation | Lives in | Answers |
| --- | --- | --- | --- |
| Backward | @spec openspec/... | the code (PHPDoc / JSDoc) | "which requirement does this method implement?" |
| Forward | @e2e <spec>::<slug> | the test (Playwright file) | "which scenario does this test prove?" |

Every method we write carries an @spec tag naming the requirement it is based on. Every test carries an @e2e tag naming the scenario it proves. Put them together and you can walk the loop in either direction:

   Requirement ──@spec──▶ Code ──implements──▶ behaviour
        ▲                                         │
        │                                         ▼
   Scenario ◀──proves── Playwright test ◀──@e2e──┘

Walk it forward and every scenario has a test. Walk it backward and every method has a requirement. The result is 100% backtrace: pick any function in the codebase and you can name the functional demand behind it; pick any demand and you can name the code and the test that satisfy it. Nothing in between is orphaned. No unspecified feature, no untested UI. That is the whole game.

How a scenario becomes a slug

The @e2e annotation references a scenario by a slug, a stable kebab-case identifier the gate derives mechanically from the spec. You do not invent it. You compute it the same way the gate does. There are two spec formats, and each produces a slug deterministically.

Format A, heading scenarios (the common one). Take the #### Scenario: heading, lowercase it, replace each run of non-alphanumeric characters with a hyphen:

#### Scenario: Searching narrows the visible list

This gives the slug searching-narrows-the-visible-list.
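The Format A rule can be sketched as a small function. This is an illustration, not the gate's own code: the name scenarioSlug is hypothetical, and trimming leading and trailing hyphens is an assumption the text above does not confirm. When in doubt, the slug printed in the gate's output is authoritative.

```typescript
// Hypothetical helper mirroring the Format A rule described above.
function scenarioSlug(heading: string): string {
  // Drop the "#### Scenario:" prefix if the full heading line is passed in.
  const title = heading.replace(/^#{4}\s*Scenario:\s*/i, "");
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of non-alphanumerics becomes one hyphen
    .replace(/^-+|-+$/g, "");    // assumption: no leading or trailing hyphen
}

console.log(scenarioSlug("#### Scenario: Searching narrows the visible list"));
// prints "searching-narrows-the-visible-list"
```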

Format B, numbered scenarios under a requirement. Some specs list scenarios as a numbered list under a **Scenarios:** marker. The slug is built from the requirement heading plus the item number, not from the prose:

### REQ-SEARCH-001: Search pets by name

**Scenarios:**

1. **GIVEN** the register has 3 pets **WHEN** the user types "be" **THEN** only Bella shows.

This gives the slug req-search-001-search-pets-by-name-scenario-1.
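A sketch of the Format B rule, under the assumption that the requirement heading is slugified the same way as a Format A heading and the suffix is always -scenario-<n>. The function names are hypothetical. Only the single example above confirms the output shape.

```typescript
// Lowercase, hyphenate non-alphanumeric runs, trim stray hyphens (trimming is an assumption).
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Hypothetical helper for Format B: requirement heading plus the 1-based item number.
function numberedScenarioSlug(requirementHeading: string, itemNumber: number): string {
  const title = requirementHeading.replace(/^#{3}\s*/, ""); // strip the "### " marker
  return `${slugify(title)}-scenario-${itemNumber}`;
}

console.log(numberedScenarioSlug("### REQ-SEARCH-001: Search pets by name", 1));
// prints "req-search-001-search-pets-by-name-scenario-1"
```

Note that the prose of the numbered item never enters the slug, so rewording a scenario does not break its annotation, but reordering the list does.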

The two compliant @e2e forms

Both forms below are accepted, and both may appear anywhere in a test file: a JSDoc docblock, a describe(...) string, a test(...) title, or a plain comment. The gate greps for them. There is no AST parsing.

Long form, canonical path plus anchor:

// @e2e openspec/specs/pet-store-search/spec.md#searching-narrows-the-visible-list

Short form, spec-name::slug:

// @e2e pet-store-search::searching-narrows-the-visible-list

Use whichever you like. Teams usually pick the short form for brevity and the long form when they want a clickable path. Tests live under tests/e2e/**, by convention tests/e2e/spec-coverage/<spec-name>.spec.ts.
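Because the gate matches text rather than parsing code, both forms reduce to the same spec and slug pair. The sketch below shows one way to normalise them with regular expressions. The patterns and the name findE2eRefs are assumptions for illustration (including the guess that spec names and slugs contain only lowercase letters, digits and hyphens), not the gate's actual implementation.

```typescript
interface E2eRef {
  spec: string;
  slug: string;
}

// Hypothetical scanner: collects every @e2e reference in one test file's text.
function findE2eRefs(testSource: string): E2eRef[] {
  const patterns = [
    // Long form: @e2e openspec/specs/<spec-name>/spec.md#<slug>
    /@e2e\s+openspec\/specs\/([a-z0-9-]+)\/spec\.md#([a-z0-9-]+)/g,
    // Short form: @e2e <spec-name>::<slug>
    /@e2e\s+([a-z0-9-]+)::([a-z0-9-]+)/g,
  ];
  const refs: E2eRef[] = [];
  for (const pattern of patterns) {
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(testSource)) !== null) {
      refs.push({ spec: match[1], slug: match[2] });
    }
  }
  return refs;
}

const sample = [
  "// @e2e openspec/specs/pet-store-search/spec.md#searching-narrows-the-visible-list",
  "// @e2e pet-store-search::searching-narrows-the-visible-list",
].join("\n");

// Both lines resolve to the same spec and slug.
console.log(findE2eRefs(sample));
```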

Excluding a scenario, with a reason

Not every scenario is a UI behaviour. "The metrics endpoint returns text/plain" is an HTTP contract, not something a user clicks. For those you do not write a Playwright test. You exclude the scenario, and you say why. There are three scopes, from narrowest to widest.

Scenario level, to excuse one scenario:

#### Scenario: Nightly re-index runs at midnight

@e2e exclude background cron job, covered by PHPUnit integration test

Requirement level, to excuse every scenario under one requirement:

### Requirement: Background sync

@e2e exclude background-only, no UI surface, covered by Newman

Whole-spec level, placed right after ## Purpose, to excuse the entire spec:

## Purpose

@e2e exclude pure-backend API contract, every scenario describes HTTP response shape; covered by Newman
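At every scope the exclusion only counts when a reason follows the marker. A minimal sketch of that check, assuming the reason is simply whatever text follows @e2e exclude on the same line (the function name excludeReason is hypothetical):

```typescript
// Returns the reason text of an "@e2e exclude" line, or null when the line
// carries no exclude marker or the marker is bare.
function excludeReason(line: string): string | null {
  const match = /@e2e\s+exclude\b(.*)$/.exec(line);
  if (match === null) return null;
  const reason = match[1].trim();
  return reason.length > 0 ? reason : null;
}

console.log(excludeReason("@e2e exclude background-only, no UI surface, covered by Newman"));
// prints "background-only, no UI surface, covered by Newman"
console.log(excludeReason("@e2e exclude"));
// prints null
```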

Who tests what: Playwright, Newman, PHPUnit

The exclusion reasons above are not free-form excuses. They map onto a strict division of labour. Each kind of behaviour has exactly one home:

| Behaviour | Tool | Where |
| --- | --- | --- |
| User interaction: clicks, typing, navigation, what is on screen | Playwright (e2e) | tests/e2e/**, tied with @e2e |
| API contract: endpoint shape, status codes, auth, response format | Newman / Postman | tests/integration/*.postman_collection.json |
| Unit logic: algorithms, transforms, internal state | PHPUnit / Jest | tests/Unit/** |
| Background and cron: async jobs, events, DB consistency | PHPUnit integration | lib/BackgroundJob/, lib/Cron/ |

So a valid @e2e exclude reason is a pointer to the right test layer, for example: "covered by Newman"; "unit-level algorithm, covered by Jest"; "DB migration correctness, covered by PHPUnit"; "DI wiring, framework boilerplate". What you may not do is assert an API contract inside a Playwright test just to make gate-19 green. That is the wrong layer, and the testing standard calls it out. Playwright drives the UI. It does not poke the API directly.

Gate-19: the enforcement

None of this would matter if it were a guideline. It is not. It is gate-19 (e2e-coverage), one of Hydra's 22 mechanical gates. Here is exactly what it does:

  1. It scans every spec file (openspec/specs/*/spec.md) added or modified in the current PR. The scan is diff-scoped, per ADR-020, so untouched legacy specs never block you.
  2. For each spec, it lists every scenario (Format A and Format B).
  3. For each scenario it checks one thing: is there a matching @e2e annotation in a test file, or a reason-bearing @e2e exclude?
  4. Every scenario with neither becomes one finding, reported as <spec>::<slug> missing @e2e. The exit code is the number of uncovered scenarios.
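The four steps above can be sketched as one function over plain strings. This is a simplified illustration, not Hydra's implementation: it handles Format A headings and scenario-level excludes only, it assumes the hyphen-trimming slug rule, and the names uncoveredScenarios and slugify are hypothetical. The "Empty query shows every pet" scenario is invented for the example.

```typescript
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Returns one finding per scenario that has neither an @e2e annotation
// in any test source nor a reason-bearing @e2e exclude under its heading.
function uncoveredScenarios(specName: string, specText: string, testSources: string[]): string[] {
  const allTests = testSources.join("\n");
  const lines = specText.split("\n");
  const findings: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const heading = /^####\s+Scenario:\s*(.+)$/.exec(lines[i]);
    if (heading === null) continue;
    const slug = slugify(heading[1]);

    // Scan the scenario body (up to the next heading) for an exclude with a reason.
    let excluded = false;
    for (let j = i + 1; j < lines.length && !/^#{1,6}\s/.test(lines[j]); j++) {
      if (/@e2e\s+exclude\s+\S/.test(lines[j])) excluded = true;
    }

    // Accept either the long or the short annotation form.
    const annotation = new RegExp(
      "@e2e\\s+(?:openspec/specs/" + specName + "/spec\\.md#|" + specName + "::)" +
        slug + "(?![a-z0-9-])"
    );
    if (!annotation.test(allTests) && !excluded) {
      findings.push(`${specName}::${slug} missing @e2e`);
    }
  }
  return findings;
}

const spec = [
  "#### Scenario: Searching narrows the visible list",
  "#### Scenario: Nightly re-index runs at midnight",
  "@e2e exclude background cron job, covered by PHPUnit integration test",
  "#### Scenario: Empty query shows every pet",
].join("\n");
const tests = ["// @e2e pet-store-search::searching-narrows-the-visible-list"];

const findings = uncoveredScenarios("pet-store-search", spec, tests);
console.log(findings);        // one finding, for the empty-query scenario
console.log(findings.length); // 1, which the gate would use as its exit code
```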

Its sibling, gate-16 (spec-coverage), does the same for the backward link. Every public or protected method you add or change must carry @spec openspec/... or @spec exclude <reason>. Between them, the two gates make the backtrace described earlier non-optional. You cannot merge a PR that adds an unspecified method, and you cannot merge a PR that adds an untested UI scenario. The builder, human or AI, has no way around it.

Spec coding versus vibe coding

Now the payoff. Picture two ways to ask an AI to build the pet-search feature.

Vibe coding. "Add a search box to the pet list." The model improvises. It guesses what "search" means, invents an endpoint, maybe skips the auth check, maybe leaves a // TODO: handle empty query, maybe adds a route it forgets to register. It might be great. It might be subtly wrong in a way nobody notices until production. You are trusting vibes, and re-reviewing everything by hand every time.

Spec coding. You write the requirement and the scenario first. Then the AI implements against them, inside a sandbox with walls:

  • The spec fixes what must happen: typing "be" shows only "Bella". Not negotiable. It is a contract with MUST.
  • The ADRs fix how it must be built: data goes through OpenRegister, auth attributes match intent, modals live in their own files. See Part 3.
  • Gate-16 forces every new method to point back at the spec. No orphan code.
  • Gate-19 forces every new scenario to have a real browser test. No untested claims.
  • The other 20 gates stop the AI shipping debug calls, unreachable routes, IDOR holes, fail-open auth, accessibility regressions, and the rest.
  • A code reviewer and a security reviewer then read what is left. See Parts 4 and 5 of the Hydra series.

The AI still writes the code. But it cannot invent a feature that is not specified, cannot leave a UI scenario untested, cannot skip the auth check, cannot leave a stub. Every one of those moves trips a deterministic wall before a human ever looks. That is the sandbox. Not "trust the model", but "let the model move freely inside walls you defined on purpose". Spec coding is vibe coding with the guardrails turned on, and the guardrails are specs, ADRs, gates and tests, all tied together by the two annotations you met in this part.

Troubleshooting

Common scenario-to-test mistakes

When gate-19 reports a scenario as missing @e2e even though the test is annotated, it is almost always a slug mismatch. Re-derive the slug exactly: lowercase the #### Scenario: heading and hyphenate non-alphanumeric runs. Common traps are a trailing period or a colon in the heading, capital letters, or a heading you edited after writing the annotation. Copy the slug from the gate's own output if you can.

If a scenario never shows up in the gate's output at all, check the heading level. It must be #### Scenario: (four hashes). A ### Scenario: (three) is silently skipped by the parser. Same trap as the three-hash mistake from Part 2: wrong level means invisible.

A bare @e2e exclude needs a reason after it. @e2e exclude on its own counts as non-compliant. Add the why: @e2e exclude pure-backend, covered by Newman.

Do not assert an API contract in a Playwright test to satisfy gate-19. API contracts belong in Newman (tests/integration/*.postman_collection.json), not Playwright. Exclude the scenario with @e2e exclude API contract, covered by Newman and write the assertion in the Newman collection. Playwright is UI-only.

Test yourself

1. What are the two backtrace links, and which annotation lives where?

Hint

One points from code to the spec. The other points from a test to the spec. Both are comments.

Answer

The backward link is @spec openspec/..., which lives in the code (PHPDoc/JSDoc) and answers "which requirement does this method implement?". It is enforced by gate-16. The forward link is @e2e <spec>::<slug>, which lives in the Playwright test and answers "which scenario does this test prove?". It is enforced by gate-19. Together they give 100% backtrace from any line of code or any test to the functional demand behind it.

2. Turn this heading into the slug the gate expects: #### Scenario: User adds a Pet to the Wishlist.

Hint

Lowercase everything, replace each run of non-alphanumeric characters with a single hyphen.

Answer

user-adds-a-pet-to-the-wishlist.

The full long-form annotation would be // @e2e openspec/specs/<spec-name>/spec.md#user-adds-a-pet-to-the-wishlist, or short form // @e2e <spec-name>::user-adds-a-pet-to-the-wishlist.

3. A scenario describes "the /api/pets endpoint returns HTTP 401 without auth". How do you satisfy gate-19?

Hint

Is that a thing a user clicks, or an HTTP contract? Which test layer owns it?

Answer

It is an API contract, not a UI interaction, so you exclude it from e2e with a reason and test it in Newman:

#### Scenario: Pets endpoint rejects unauthenticated requests

@e2e exclude API contract, covered by Newman integration test

Then assert the 401 in tests/integration/*.postman_collection.json. Writing it as a Playwright test would be the wrong layer. Playwright is UI-only.

4. In one sentence, why is spec coding safer than vibe coding?

Hint

Think about what the AI cannot do once the walls are up.

Answer

Because the spec (what), the ADRs (how) and the 22 gates (including gate-16 and gate-19) form deterministic walls the AI cannot cross. It cannot ship an unspecified feature, an untested UI scenario, a skipped auth check or a leftover stub, so human review is reserved for genuine judgement instead of catching mechanical mistakes by hand.

Where to next

You now have the full OpenSpec loop, from idea to spec to tested code. Two directions from here: