Application Development · stage 3 of 6

Product

External / Ask gate

Define behavioral specifications and acceptance criteria

Define the behavioral contract that hands the design over to development: the acceptance criteria, executable scenarios, and data contracts that say what the system must do and how its success is judged.

Scope

Behavioral specification: observable behavior, acceptance criteria, and the data shapes that cross boundaries. Not the visual design (that came from upstream), not the implementation (that's development's call).

What to do

Write acceptance criteria from the user's perspective: what they can do and how you'd know it worked.
Make every criterion verifiable — pair it with a concrete scenario or check, not a vague intent.
Cover the behavior the design implies, including the failure and edge paths, and prove the coverage.

What NOT to do

Don't redesign the interface or restate visual decisions — reference the design, don't relitigate it.
Don't choose implementation, frameworks, or data storage; specify the contract, not the mechanism.
Don't write criteria no one can check, and don't leave behavior the design shows unspecified.

How the engine runs this stage

1. Elaborate

collaborative · plan the work, fan out discovery, declare outputs

Inputs consumed

discovery (from Inception)
design-brief (from Design)
design-tokens (from Design)

Discovery fan-out

knowledge artifact

Acceptance Criteria

Prioritized user stories and acceptance criteria produced by the product hat. Defines what "done" looks like from the user's perspective — not how the system implements it.

Content Guide

User stories — "As a [role], I want [action], so that [benefit]" with specific domain entities
Variability brief — dimensions along which behavior varies, confirmed before AC writing
Acceptance criteria — structured as General Rules first, then variant-specific subsections
Prioritization — P0 (must-have for completion) vs P1 (follow-up)

Quality Signals

User stories reference specific domain entities, not generic placeholders
Every criterion is specific enough to write a test for
Edge cases and error paths are covered alongside happy paths
Variability dimensions are explicitly enumerated

knowledge artifact

Behavioral Spec

Gherkin .feature files defining what the system does from the user's perspective. These files drive development — tests are written to verify these behaviors, and the features themselves can be executed by Cucumber-compatible test runners.

Content Guide

Each .feature file should contain:

Feature — descriptive name and summary of the capability
Background — shared preconditions across scenarios (Given steps common to all)
Scenarios — concrete examples covering:
- Happy path — the expected successful flow
- Error scenarios — validation failures, auth errors, not found, server errors
- Edge cases — boundary conditions, concurrent access, empty states, maximum limits
Scenario Outlines — parameterized scenarios for testing across multiple inputs

Quality Signals

Every feature has at least one error scenario, not just the happy path
Scenarios are specific enough to execute as automated tests
Actors are named roles, not generic "user"
Edge cases cover boundaries (zero, one, max, empty, null)
Steps use domain language consistent with acceptance criteria from the product hat

knowledge artifact

Coverage Mapping

Traceability matrix produced by the validator hat mapping every unit success criterion to its corresponding acceptance criteria and specification items. A GAPS FOUND result blocks stage completion until the responsible hat addresses the gap.

Content Guide

Coverage matrix — each success criterion mapped to AC and spec items that cover it
Gap flags — any criterion with no corresponding AC or spec, with the responsible hat identified
Scope creep flags — any AC or spec item that doesn't trace back to a success criterion
Validation decision — APPROVED (no gaps) or GAPS FOUND (blocks stage completion)

Quality Signals

Every success criterion maps to at least one AC or spec item
Every AC item is testable — a concrete test can be described for it
No gaps remain unflagged
Scope creep items are identified but do not block approval
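
The matrix, gap flags, scope creep flags, and validation decision described above can be sketched as a small data structure. This is a minimal illustration, not the validator hat's actual implementation: the `CoverageReport` and `build_coverage` names, and the `SC-1` / `AC-1.1` style identifiers, are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageReport:
    matrix: dict[str, list[str]]  # success criterion -> AC/spec items covering it
    gaps: list[str] = field(default_factory=list)  # criteria with no coverage
    scope_creep: list[str] = field(default_factory=list)  # items tracing to no criterion

    @property
    def decision(self) -> str:
        # Only gaps block approval; scope creep is reported but does not block.
        return "GAPS FOUND" if self.gaps else "APPROVED"


def build_coverage(criteria: list[str], traces: dict[str, list[str]]) -> CoverageReport:
    """criteria: success-criterion ids. traces: AC/spec item -> criterion ids it claims to cover."""
    matrix: dict[str, list[str]] = {c: [] for c in criteria}
    scope_creep = []
    for item, covered in traces.items():
        known = [c for c in covered if c in matrix]
        if not known:
            scope_creep.append(item)  # traces back to no known success criterion
        for c in known:
            matrix[c].append(item)
    gaps = [c for c, items in matrix.items() if not items]
    return CoverageReport(matrix=matrix, gaps=gaps, scope_creep=scope_creep)


# Hypothetical ids, for illustration only.
report = build_coverage(
    criteria=["SC-1", "SC-2"],
    traces={"AC-1.1": ["SC-1"], "my_week.feature:3": ["SC-1"], "AC-9": []},
)
print(report.decision, report.gaps, report.scope_creep)
```

Here `SC-2` has no covering item, so the decision is `GAPS FOUND`; `AC-9` is flagged as scope creep but would not block approval on its own.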

knowledge artifact

Data Contracts

API, database, and event contracts that define the data shapes flowing through the system. This output is the agreement between frontend and backend, between services, and between the system and its persistence layer.

Content Guide

API Endpoints

For each endpoint:

Method and path (e.g., POST /api/v1/users)
Request schema — field names, types, required vs. optional, validation rules
Response schema — field names, types, shape for success and each error case
Error responses — status codes, error body shape, when each error occurs
Authentication — what auth is required, what scopes/roles

Database Models

For each entity:

Entity name and table/collection name
Fields — name, type, nullable, default, constraints
Relationships — foreign keys, join tables, cardinality
Indexes — which fields are indexed and why
Constraints — unique, check, not-null

Event Schemas (if applicable)

For each event:

Event name and topic/channel
Payload schema — field names and types
Producer — what emits this event
Consumers — what listens for this event

Quality Signals

Every field has an explicit type and required/optional designation
Error responses are specified alongside success responses
Example values are provided for non-obvious fields
Naming is consistent across all contracts (same entity name everywhere)
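
As a rough sketch of the first quality signal, the check below flags contract fields that lack an explicit type or a required/optional designation. `FieldSpec` and `incomplete_fields` are hypothetical names introduced for this example; how contracts are actually parsed out of DATA-CONTRACTS.md is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    type: Optional[str]  # e.g. "string", "UUID"; None means the type was left out
    required: Optional[bool]  # None means required/optional was never stated


def incomplete_fields(fields: list[FieldSpec]) -> list[str]:
    """Return names of fields missing an explicit type or required/optional designation."""
    return [f.name for f in fields if not f.type or f.required is None]


# Hypothetical request-body fields for illustration.
fields = [
    FieldSpec("email", "string", True),
    FieldSpec("referral", "string", None),  # designation missing
    FieldSpec("data", None, False),  # type missing
]
incomplete = incomplete_fields(fields)
print(incomplete)
```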

Phase guidance

phase override: ELABORATION

Product Stage — Elaboration

Criteria Guidance

Product criteria are verified by behavioral testing — automated tests (e.g. Cucumber .feature scenarios, integration tests, contract tests) that assert the system behaves as specified.

Good criteria — concrete and verifiable

When generating criteria for this stage, focus on behavioral verification:

Detailed behavioral specs that describe what the system does, not how it is built
Acceptance criteria for every user-facing scenario, each expressible as a Given/When/Then test
Edge cases, error paths, and boundary conditions explicitly covered
Data contracts, validation rules, and state transitions specified with concrete examples
Integration points and external dependency behavior documented (with mock or contract-test specifications)
Behavioral specs precise enough for a developer to implement without follow-up questions

Bad criteria — vague (no clear check)

"Works correctly" — under what conditions? With what input?
"Handles errors" — which errors? What's the expected response?
"Data is validated" — against which schema? What error format?

Bad criteria — product-specific unverifiable

(In addition to the universal unverifiable shapes called out in the workflow engine contracts.)

"Behavior is intuitive" — needs a usability-test pass with a stated success-rate threshold
"Coverage is comprehensive across the user-facing capability list" — needs a structural check counting scenarios against the capability list, not a subjective judgment

Unit `outputs:` — required artifact shape

Every unit MUST declare its produced artifacts as real file paths in the outputs: frontmatter. The advance-hat gate verifies each path exists on disk; freeform descriptions get rejected at write time and at advance time.

For product-stage units, the typical artifact set is:

outputs:
  # Behavioral spec — Gherkin .feature file the specification hat
  # writes to features/. Per the behavioral-spec template, units MUST
  # produce at least one .feature file when they cover user-observable
  # behavior. Reference the file by its actual path, not by name.
  - .haiku/intents/{intent-slug}/features/my_week.feature

  # Acceptance criteria — markdown produced by the product hat for
  # this slice of behavior. Lives at .haiku/intents/{intent-slug}/product/
  # (NOT knowledge/ — that's discovery-stage territory).
  - .haiku/intents/{intent-slug}/product/ACCEPTANCE-CRITERIA.md

  # Data contract — schema/API/DB shape touched by this unit.
  - .haiku/intents/{intent-slug}/product/DATA-CONTRACTS.md

Substitute the {intent-slug} placeholder and the example feature filename with the unit's real intent slug and feature filename. The validator hat's COVERAGE-MAPPING.md is one shared file across the stage; typically only the validator hat's terminal unit lists it as an output.

MUST NOT: write prose like outputs: ["Weekly carryover roll: scheduler trigger, idempotent roll logic"]. That's a completion-criteria description, belongs in the body's ## Completion Criteria section, and the gate now rejects it as unit_outputs_missing (no real path matches).
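
The difference between a real path and a prose description can be illustrated with a small check. This is only a sketch of the idea: the engine's actual gate logic is not shown in this document, and `check_outputs`, the `PATH_LIKE` heuristic, and the `demo` intent slug are assumptions made for the example.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical heuristic: a path has no spaces or punctuation beyond ./{}-_ and ends in an extension.
PATH_LIKE = re.compile(r"^[\w./{}-]+\.\w+$")


def check_outputs(outputs: list[str], root: Path) -> list[str]:
    """Return the entries that are not existing file paths under root."""
    rejected = []
    for entry in outputs:
        if not PATH_LIKE.match(entry) or not (root / entry).is_file():
            rejected.append(entry)
    return rejected


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    feature = root / ".haiku/intents/demo/features/my_week.feature"
    feature.parent.mkdir(parents=True)
    feature.write_text("Feature: My week\n")
    rejected = check_outputs(
        [
            ".haiku/intents/demo/features/my_week.feature",
            "Weekly carryover roll: scheduler trigger, idempotent roll logic",
            ".haiku/intents/demo/product/DATA-CONTRACTS.md",  # path-shaped but never written
        ],
        root,
    )
print(rejected)
```

The prose entry fails the shape check, and the data-contract path fails the existence check because the file was never written; only the feature file passes.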

Outputs produced

output template: Specs

Product Specifications

Behavioral specs and data contracts produced by product units. The specification hat writes .feature files in Gherkin syntax; the product hat writes acceptance criteria documents.

Expected Artifacts

Behavioral specs — .feature files with Gherkin scenarios (Feature/Scenario/Given/When/Then)
Data contracts — API schemas, request/response shapes, field types
Acceptance criteria — testable conditions for each feature, structured by variability dimension

Quality Signals

Every product unit produces at least one spec artifact
Behavioral specs are valid Gherkin syntax executable by a Cucumber-compatible runner
Data contracts include error responses, not just success cases

AC artifact shapes

The structures below are the canonical shapes for acceptance criteria when the variability brief calls for them. Use these directly; don't invent new structures unless the work genuinely doesn't fit one of these. Project overlays at .haiku/studios/software/stages/product/outputs/SPECS.md may add house-specific patterns; prefer the overlay's shapes over the defaults below when one is present.

Variant-based AC structure

1. General Rules
   1. [Things true across ALL variants — component references, default
      states, tabs where nothing appears]
2. [Variant 1 name]
   1. **[Screen / Tab Name]:**
      1. [Component] Placement:
         1. [Specific placement for this variant]
      2. [Other Component]: [show / hide rule]
3. [Variant 2 name]
   1. **[Screen / Tab Name]:**
      1. [Component] Placement:
         1. [Placement if different from Variant 1]
         2. NOTE: This differs from Variant 1 — [explain how].
      2. [Other Component]: Do NOT display

Adding a column to an existing table

1. Add "[Column Name]" Column to [Table Name]
   1. Add a new column to the [Table Name] table
      1. Column Header: [Column Name]
      2. Column Position: Place after the "[Previous Column]" column
   2. Column Data Display
      1. IF [condition]:
         1. Display [data description]
            1. This is the same value described in [Section X](#anchor)
         2. Format: `[format]`
            1. Example: `[example]`
      2. IF [alternate condition]:
         1. Display: `[sentinel value]`

1. Update [Column Name] Column
   1. Update text to Bold
   2. Add question mark tooltip icon
      1. icon: `question`
      2. color: `info`
      3. Selecting tooltip should open [Modal Name]
         1. See [Section X](#anchor)

1. For [action]: Use updated [Modal Name]
   1. See [Section X](#anchor)

Settings card with a toggle that reveals a configuration section

1. Create [Setting Name] Card
   1. Header
      1. Icon
         1. squareicon
         2. icon: `[icon-name]`
         3. color: `[token]`
      2. title: [Setting Title]
   2. Description
      1. text: [Description copy]
   3. Toggle Row
      1. label: [Toggle label]?
      2. Toggle
         1. Default state: OFF (NO)
         2. When toggled ON (YES), show [Configuration Section]
         3. When toggled OFF (NO), hide [Configuration Section]
   4. Highlighted Reminder
      1. icon: `circle-info`
      2. color: `info`
      3. text: [Reminder copy]
      4. Always show
   5. Save Changes Button
      1. text: Save Changes
      2. color when enabled: `[primary-token]`
      3. Keep disabled if no changes made or validation errors exist
      4. When selected, save and show success toast

Variant-based component placement (canonical multi-state shape)

1. General Rules
   1. The [Component Name] (see [Section X](#anchor) for full component AC) is added to [Screen Name]
   2. The component should be collapsed by default in all states
   3. The component should NOT display on the **[Tab Name]** in any state
2. [Variant 1]: [State Name]
   1. **[Tab A]:**
      1. [Component] Placement:
         1. Place below [element above]
         2. Place above [element below]
      2. [Secondary Component] Placement:
         1. Place directly below [Primary Component]
         2. Only display if [condition] (see [Section X](#anchor))
   2. **[Tab B]:**
      1. [Component] Placement:
         1. Place below [element above]
         2. Place above [element below]
3. [Variant 2]: [State Name]
   1. **[Tab A]:**
      1. [Component] Placement:
         1. Same placement as [Variant 1] [Tab A]
      2. [Secondary Component]: Do NOT display
   2. **[Tab B]:**
      1. [Component] Placement:
         1. Place below [different element]
         2. NOTE: This differs from [Variant 1] — [explain the change]
      2. [Secondary Component]: Do NOT display

Cross-reference conventions

Link related sections rather than restating them. Use an anchor link when the anchor is known; otherwise write "See Section X above". Parenthetical form is fine for asides: ([Section VIII.b.1](#anchor)).

Inline code values

Use backticks for values engineers will literally implement: time formats (`HH:MM:SS`, `Xh Xm Xs`), sentinel values (`--`, `YES`, `NO`), color tokens (`primary`, `error`, `success`), icon names, enum values.

When specifying icon + color + behavior together:

1. Icon
   1. squareicon
   2. icon: `mug-hot`
   3. color: `primary`

2. Review

pre-execute · agents audit the planned spec before any code lands

review agent: Completeness

Mandate: The agent MUST verify that the product stage's acceptance criteria, behavioral specs, and data contracts fully cover the intent — every user-facing flow, every error path, every boundary condition, every contract surface. Coverage gaps that slip past this lens become production bugs.

Check

The agent MUST verify each of the following and file feedback for any violation:

Happy + error + edge coverage per flow — Every user-facing flow named in the intent has all three: a documented happy path, at least one error scenario (auth failure, validation failure, permission failure, not-found, conflict), and at least one boundary case (empty list, single item, maximum allowed, zero, off-by-one).
Variant coverage — Every variant identified in the product hat's Variability Brief has either its own AC subsection or an explicit "same as Variant N" note. No variant is silently skipped.
State-visibility completeness — Every state-visibility list has both Show on: and DO NOT show on: entries. Silence is a coverage gap, not a default.
Contract completeness — Every endpoint named in any .feature scenario has a row in DATA-CONTRACTS.md. Every field has an explicit type and required / optional designation. Every error scenario in a .feature has a matching error row in the contract.
Cross-reference integrity — Every See Section X / [Section X](#anchor) reference points to a section that exists.
AC ↔ scenario ↔ contract trace — The validator hat's COVERAGE-MAPPING.md is APPROVED. If it's GAPS FOUND, that's the highest-priority finding to file.
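
The cross-reference integrity check lends itself to a mechanical pass. The sketch below assumes the set of valid anchors is already known (how anchors are derived from headings depends on the Markdown renderer and is not specified here); `broken_references` and the sample anchors are hypothetical.

```python
import re

# Matches Markdown links of the form [Section X](#anchor).
LINK = re.compile(r"\[([^\]]+)\]\(#([^)]+)\)")


def broken_references(ac_text: str, known_anchors: set[str]) -> list[tuple[str, str]]:
    """Return (label, anchor) pairs whose anchor is not among known_anchors."""
    return [
        (label, anchor)
        for label, anchor in LINK.findall(ac_text)
        if anchor not in known_anchors
    ]


# Hypothetical AC excerpt for illustration.
ac_text = """
1. Display the same value described in [Section II](#general-rules)
2. Selecting tooltip should open the modal; see [Section IX](#export-modal)
"""
broken = broken_references(ac_text, known_anchors={"general-rules"})
print(broken)
```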

Common failure modes to look for

A .feature file with only happy-path scenarios (no error, no boundary)
A Background: block that's actually per-scenario preconditions misplaced
A scenario named in implementation language (POST /signup ...) instead of domain language (User submits valid form)
An AC section that uses "etc." or "and so on" — explicit absence (Do NOT display in X) is the contract; silence is ambiguity
A data contract entry like data: object without the inner shape spelled out
A variant referenced in the Variability Brief that has no corresponding AC subsection

review agent: Feasibility

Mandate: The agent MUST challenge whether the specified behavior is implementable as written, within the technical constraints established by upstream design and inception stages. Specs that look complete but require disproportionate effort, conflict with existing schemas, or assume impossible capabilities produce a different failure mode than coverage gaps — they pass review and then stall in development. This lens catches them before they ship downstream.

Check

The agent MUST verify each of the following and file feedback for any violation:

Performance targets are realistic — Response times, throughput, and concurrency claims align with the data model and existing infrastructure. Page loads in < 200ms for a screen that requires three joins across an un-indexed table is infeasible without a stated indexing or caching plan.
No silent breaking schema changes — Every contract change in DATA-CONTRACTS.md is compatible with existing schemas, or is paired with an explicit migration plan in the AC. Renaming a field, narrowing a type, or removing nullability from an existing column is a breaking change and MUST call out the migration approach.
Edge cases have defined behavior, not just intent — "Handle gracefully" is not feasible; it's a placeholder. Every edge case names the specific behavior (a status code, an empty state, a fallback value, a queued retry).
No assumed-impossible capabilities — Specs don't require capabilities that aren't in the inception knowledge or the design output. If the spec assumes a third-party service that wasn't named in inception, file feedback against upstream — the assumption needs to be made explicit before this stage approves.
Auth and permission specs are implementable against the existing identity model — The roles, scopes, and permission shapes in the spec match the system's existing auth model, or the spec calls out the auth-model change explicitly.
Concurrency / ordering / idempotency are specified for any contract that needs them — Any endpoint that mutates state, any event in the contract, and any retry / job mechanism has explicit ordering, idempotency, and concurrency-failure semantics. Silence here becomes race conditions in production.

Common failure modes to look for

A spec that calls for a capability whose cost (in latency, storage, or compute) hasn't been considered
A new endpoint that conflicts with an existing path / verb combination
A .feature scenario whose Given requires data state the database can't actually produce
A data contract that adds a not-null column to an existing table with no backfill or default specified
An error scenario that catches an error the system can't actually throw (e.g., catching a network error in a synchronous local call)
A boundary case ("max 10,000 items") with no statement of how the system behaves at and beyond the boundary

3. Execute

per-unit baton · Product → Specification → Validator

hat 1: Product

Focus: Define behavioral acceptance criteria (AC) from the user's perspective: what users do and see, not how the system implements it. The AC is handed to engineers as the source of truth for behavior, so quality here directly drives implementation quality downstream.

Process

1. Pre-flight — confirm inputs before writing

Before writing AC, present this checklist to the user and confirm everything is in scope:

Designs — links to the visual mockups / specs that show what's being built (one link per screen / state)
Feature context — what the feature does and why, in plain language
Reference AC — any existing AC docs / sections in the same product to match style, avoid duplication, and link as cross-references
Feature flag — the flag name, if applicable, and whether it's enabled in the environment being compared against
Environment to compare against — running app, staging, etc., so "what's already built" vs. "what's net new" can be distinguished
Definition of "exists" — UI present? Behavior implemented? Tests passing? Agree on the bar before classifying anything as "already exists"

If the user can't confirm an item, write the AC scoped to what's confirmed and call out the gap inline — don't invent context.

2. Identify variability BEFORE writing AC

The single biggest source of missed requirements is unmodeled variability — a button that looks the same across screens but behaves differently per user role, device, state, or context. Don't discover variants mid-write by diffing designs; surface them up front.

Present a Variability Brief to the user for confirmation before any AC drafting:

Dimension: what variable creates different behaviors? (user role, device type, state value, feature flag, locale, etc.)
Variants: list every value of that dimension that has any behavior difference
Per variant, what changes? Use a table:

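| Variant | Screens affected | Placement differences | Show / hide differences | Behavior differences |
|---------|------------------|-----------------------|-------------------------|----------------------|
| name    | which screens    | where components go   | what appears / disappears | any logic changes  |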

What stays the same across all variants? (component always collapsed by default, never appears on the X tab, etc.)

Use the brief to decide structure:

If variants share most behavior → write a General Rules section first, then variant-specific subsections that ONLY name the deltas
If variants are mostly different → write each variant as its own top-level section

3. Compare against existing — classify net new vs. modified vs. existing

If the user gave you an environment to compare against (a running app, staging, etc.), do this BEFORE writing any AC:

Navigate to each relevant screen in the comparison environment
Compare against the new designs section-by-section
For every UI element / behavior you'd write AC for, classify it:
- Existing — already there and matches the design. Skip AC or add Already exists — no changes required
- Modified — exists but something is changing. Write AC for the delta only and call out what's changing from current state
- Net new — doesn't exist yet. Write full AC
Present the classification to the user for confirmation before drafting

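| Item                 | Classification                | Notes                        |
|----------------------|-------------------------------|------------------------------|
| component / behavior | Existing / Modified / Net new | what's changing, if modified |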

If the comparison environment doesn't have the feature flag enabled, everything will look net new — don't draw conclusions until the flag state is confirmed. When in doubt, flag it for the user, don't assume.

4. Write the AC

Follow the structure the Variability Brief implied. Match the conventions of the reference AC the user pointed at — numbering scheme, section headers, code formatting, tone. Consistency beats personal preference: if the team writes Section II.4.b, do that; if they write AC-1.4.3.2, do that. Don't impose a new scheme.

The canonical artifact shapes live in plugin/studios/software/stages/product/outputs/SPECS.md. Read those before drafting — they cover variant-based structure, table-column additions, tooltip updates, modal references, settings cards with toggles, multi-variant component placement, cross-reference conventions, and inline code values. Use them directly; engineers benefit from consistency more than from your originality.

Three principles always apply, regardless of shape:

NOTE callouts — anywhere a variant deviates from a prior one, or anywhere implementation needs attention that isn't obvious from the numbered items alone, add an inline NOTE: line that names the difference. Common uses: variant deviation, missing-design fallback, important non-obvious detail, "do NOT" reminders.
State visibility lists — when documenting which states show or hide a component, list the "show" cases first, then explicitly call out the "do not show" cases. Never omit a state — silence is ambiguous to developers. For simpler cases, inline it: [Component]: Do NOT display in [State C] or [State D].
Explicit "Do NOT display" — when a component is hidden in a variant, say so directly. Silence is ambiguous.

5. Self-check before handing off

Before declaring AC complete:

Every variant in the Variability Brief has either its own section or an explicit "same as Variant N" note
Every state in any visibility list has either a "show" or "do not show" entry
Every reference to another AC section uses an anchor link, not a vague "see above"
Every value engineers will literally implement is in backticks
Every numbered item is independently testable — a QA engineer could write a single test that verifies just that item
The document matches the formatting conventions of the reference AC the user pointed at

Anti-patterns (RFC 2119)

The agent MUST present the Variability Brief and the existing-vs-modified-vs-new classification to the user for confirmation before drafting
The agent MUST NOT skip variability identification — variant differences are the #1 source of missed requirements
The agent MUST NOT write implementation details instead of user behavior ("use a Redis cache" vs. "the page loads in under 2 seconds")
The agent MUST NOT omit "do not show / do not display" states — silence is ambiguous; explicit absence is the contract

hat 2: Specification

Focus: Translate the product hat's acceptance criteria into executable behavioral specs (Gherkin .feature files) and complete data contracts (API / DB / event schemas). Gherkin is the spec language — every AC item becomes one or more scenarios with explicit Given preconditions, When actions, and Then outcomes. Data contracts are the agreement frontend ↔ backend ↔ persistence. Precision matters: ambiguity in specs becomes bugs in code.

You produce two artifacts per unit:

One or more .feature files under features/ (Gherkin)
The unit's slice of DATA-CONTRACTS.md (request / response / error shapes, DB models, event payloads)

You do NOT produce acceptance criteria — that's the product hat. You read the product hat's AC and turn each AC item into the corresponding scenario(s) and contract(s).

Process

1. Read your inputs

Read the product hat's AC for this unit (ACCEPTANCE-CRITERIA.md)
Read the unit's own success criteria
Read sibling units' existing .feature files and DATA-CONTRACTS.md to keep naming consistent (a User in one feature must be a User in every other; an API path appearing in two units must use the same path and the same field names)

2. Identify the unit's discipline before choosing format

The right contract format depends on what the unit covers:

Frontend / UI unit — .feature files describe component states, responsive behavior, click flows, and visibility rules; data contracts are limited to the shape of payloads the component consumes / emits. Where AC says "show", .feature says Then I see ....
Backend / API unit — .feature files describe request / response behavior, auth checks, error responses; data contracts are full request schemas, response schemas (success + every error), status codes, and authorization scopes.
Service / data-pipeline unit — .feature files describe inputs in / outputs out, including timing and ordering; data contracts include event payloads, idempotency keys, retry semantics, and ordering guarantees.
DevOps / infra unit — .feature files describe environment-specific configuration and rollback criteria; data contracts include the config schema and environment variables.

Pick the format before you start writing. Mixing them inside one feature file is how scenarios become unreadable.

3. Write the Gherkin

Feature file structure (one per logical capability, not one per unit — a unit may produce multiple .feature files if it covers more than one capability):

Feature: <capability name in domain language>
  <one-line description of what this capability lets the user do and why>

  Background:
    Given <preconditions common to every scenario in this file>
    And <another shared precondition>

  Scenario: <named in user language, not implementation language>
    Given <unique precondition for this scenario>
    When <the single user action>
    Then <the observable outcome>
    And <secondary observable outcome>

  Scenario: <error or edge case>
    Given <precondition that triggers the error path>
    When <user action>
    Then <error response>

Scenario naming rules:

Name scenarios in domain language that matches the AC. User submits valid signup form — yes. POST /signup with valid body returns 201 — no (that's implementation, not behavior).
A reviewer who has never touched the codebase should be able to read the scenario list and understand what the feature does.
One observable behavior per scenario. If you have to use the word "and" in the scenario title, split it.

Background section rules:

Put preconditions in Background only if they apply to every scenario in the file.
Per-scenario preconditions go in the scenario's own Given steps.
If your Background is more than 4 Given lines, the file is probably covering two capabilities — split it into two .feature files.
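
Two of the rules above (no "and" in a scenario title, no more than 4 Given lines in Background) are simple enough to lint. The following is a minimal line-based sketch, not a real Gherkin parser; `lint_feature` is a hypothetical helper, and it assumes `And` lines inside Background count as continued Given lines.

```python
def lint_feature(text: str, max_background_givens: int = 4) -> list[str]:
    """Flag scenario titles containing 'and' and oversized Background blocks."""
    findings = []
    in_background = False
    background_steps = 0
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Background:"):
            in_background = True
        elif line.startswith(("Scenario:", "Scenario Outline:")):
            in_background = False
            title = line.split(":", 1)[1].strip()
            if " and " in f" {title.lower()} ":
                findings.append(f"split scenario: {title}")
        elif in_background and line.startswith(("Given ", "And ")):
            background_steps += 1
    if background_steps > max_background_givens:
        findings.append(f"background has {background_steps} Given lines")
    return findings


# Hypothetical feature text for illustration.
sample = """
Feature: Signup
  Background:
    Given the signup form is open
    And the visitor is logged out
    And the referral banner is hidden
    And the locale is en-US
    And the signup flag is enabled

  Scenario: Visitor submits valid form and receives welcome email
    When I submit the form
    Then I see the welcome screen

  Scenario: Visitor submits valid form
    When I submit the form
    Then I see the welcome screen
"""
findings = lint_feature(sample)
print(findings)
```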

Scenario Outline rules:

Use Scenario Outline with an Examples: table when the same scenario shape applies across multiple inputs (e.g., validation rules for a form across each invalid field). Don't use Scenario Outline to combine genuinely different behaviors into one parameterized scenario — that hides the behavior diversity from the reviewer.

Scenario Outline: Form rejects invalid <field>
  Given the signup form is open
  When I enter <value> in the <field> field
  And I submit the form
  Then the <field> field shows error "<error_message>"

  Examples:
    | field    | value          | error_message              |
    | email    | not-an-email   | Enter a valid email        |
    | password | abc            | At least 8 characters      |
    | zip      | 1234           | Enter a 5-digit ZIP code   |
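To see why an outline suits same-shape inputs, it helps to expand it by hand. The sketch below is an illustration only, assuming a hypothetical `expand_outline` helper and a plain pipe-delimited Examples table; real Gherkin runners do this substitution themselves.

```python
def expand_outline(title: str, steps: list[str], examples: str) -> list[dict]:
    """Expand a Scenario Outline into one concrete scenario per Examples row."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in examples.strip().splitlines()
    ]
    header, data = rows[0], rows[1:]
    scenarios = []
    for values in data:
        binding = dict(zip(header, values))

        def fill(text: str) -> str:
            # Replace each <placeholder> with the value from this row.
            for name, value in binding.items():
                text = text.replace(f"<{name}>", value)
            return text

        scenarios.append({"title": fill(title), "steps": [fill(s) for s in steps]})
    return scenarios


examples = """
| field    | value        | error_message         |
| email    | not-an-email | Enter a valid email   |
| password | abc          | At least 8 characters |
"""
steps = [
    "When I enter <value> in the <field> field",
    'Then the <field> field shows error "<error_message>"',
]
for scenario in expand_outline("Form rejects invalid <field>", steps, examples):
    print(scenario["title"])
```

Every expanded scenario has the same steps with different values. If the expansion would need different steps per row, the behaviors are genuinely different and belong in separate scenarios.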

Error and edge-case coverage:

Every feature MUST include at least one error scenario. Cover, at minimum:

The auth-failure path (if the capability is gated)
The validation-failure path (if the capability accepts input)
The not-found / permission path (if the capability resolves an entity)
The boundary case (empty list, single item, maximum allowed, off-by-one)

A feature with only a happy path is not a complete spec — it's a sales demo.
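For the boundary case, a quick way to enumerate candidate sizes is sketched below. The helper name `boundary_sizes` is hypothetical, and the choice of five candidates (empty, single, one under the limit, the limit, one over) is one reading of the list above, not a fixed rule.

```python
def boundary_sizes(maximum: int) -> list[int]:
    """List sizes worth a scenario: empty, single, at the limit, one off either side."""
    candidates = [0, 1, maximum - 1, maximum, maximum + 1]
    # Drop negatives and duplicates (small limits collapse several candidates).
    return sorted({n for n in candidates if n >= 0})


print(boundary_sizes(10))  # [0, 1, 9, 10, 11]
```

Each size that produces a distinct observable outcome deserves its own scenario; sizes that behave identically can share one.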

Steps shared across files:

If you find yourself writing the same multi-line Given block in two feature files, factor it into a shared step (Given a logged-in <role>). Don't duplicate setup verbatim: once the copies drift apart, the tests drift with them.

4. Write the data contracts

For each API endpoint touched by this unit, append to DATA-CONTRACTS.md:

### POST /api/v1/<resource>

**Auth:** <role / scope required, or "public">

**Request body**

| Field    | Type   | Required | Validation                               | Notes                    |
|----------|--------|----------|------------------------------------------|--------------------------|
| email    | string | yes      | RFC 5322 email                           |                          |
| password | string | yes      | min 8 chars, must include digit + symbol | hashed before storage    |
| referral | string | no       | UUID v4                                  | optional referral source |

**Success response** (`201 Created`)

| Field      | Type    | Notes |
|------------|---------|-------|
| id         | UUID    | new user id |
| email      | string  | echoed |
| created_at | ISO8601 | server time |

**Error responses**

| Status | Code              | When |
|--------|-------------------|------|
| 400    | validation_failed | any required field missing or invalid |
| 409    | email_in_use      | email already registered |
| 429    | rate_limited      | > 5 signups / IP / hour |
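A contract written at this level of detail is precise enough to check mechanically. The sketch below shows how the request-body table maps to a `400 validation_failed` outcome. It is an illustration, not a prescribed implementation: the function names are hypothetical, the email check is a simplified shape test rather than full RFC 5322 parsing, and "symbol" is assumed to mean any non-alphanumeric character.

```python
import re
import uuid

# Simplified shape check; an assumption standing in for RFC 5322 validation.
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _is_uuid4(value) -> bool:
    """True when value is a string that parses as a version 4 UUID."""
    try:
        return isinstance(value, str) and uuid.UUID(value).version == 4
    except ValueError:
        return False


def invalid_fields(body: dict) -> list[str]:
    """Return the names of fields that violate the request-body table."""
    bad = []
    email = body.get("email")
    if not isinstance(email, str) or not EMAIL_SHAPE.match(email):
        bad.append("email")
    password = body.get("password")
    if (
        not isinstance(password, str)
        or len(password) < 8
        or not any(c.isdigit() for c in password)
        or not any(not c.isalnum() for c in password)
    ):
        bad.append("password")
    referral = body.get("referral")  # optional field
    if referral is not None and not _is_uuid4(referral):
        bad.append("referral")
    return bad


def error_response(body: dict):
    """Map a request body to the 400 row of the error table, or None if valid."""
    bad = invalid_fields(body)
    return (400, "validation_failed", bad) if bad else None
```

If a rule in the table cannot be turned into a check like this, the Validation column is probably too vague.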

For each DB entity touched, include: entity name, fields (name / type / nullable / default / constraints), relationships (FK + cardinality), indexes (which fields + why), and constraints (unique / check / not-null).

For each event emitted or consumed: event name + topic, payload schema, producer, consumers, ordering and idempotency semantics.

Required level of completeness:

Every error case from the .feature file appears in the error-response table
Every field has an explicit type and required / optional designation
Example values are provided for non-obvious fields (format-specific strings, sentinel values, units)
Naming is consistent across all contracts in this intent (same entity name everywhere, same field name everywhere)
No field labelled "data: object" without spelling out the object's shape
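Some of the completeness rules above lend themselves to a simple lint. This is a minimal sketch, assuming contract rows have already been parsed into dictionaries with hypothetical keys `field`, `type`, `required`, and `shape`; parsing DATA-CONTRACTS.md itself is left out.

```python
def lint_contract_fields(rows: list[dict]) -> list[str]:
    """Flag rows that miss a type, a required/optional flag, or an object shape."""
    problems = []
    for row in rows:
        name = row.get("field", "<unnamed>")
        if not row.get("type"):
            problems.append(f"{name}: missing type")
        if row.get("required") not in ("yes", "no"):
            problems.append(f"{name}: missing required/optional")
        if row.get("type") == "object" and not row.get("shape"):
            problems.append(f"{name}: object without a spelled-out shape")
    return problems


rows = [
    {"field": "email", "type": "string", "required": "yes"},
    {"field": "data", "type": "object", "required": "yes"},
]
print(lint_contract_fields(rows))
```

Naming consistency and example values still need a human read; the lint only covers the rules that are purely structural.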

5. Cross-check before handing off

Every AC item from the product hat maps to at least one .feature scenario
Every .feature scenario maps back to an AC item (no orphan scenarios)
Every endpoint named in any scenario appears in DATA-CONTRACTS.md
Every error scenario has a corresponding error row in DATA-CONTRACTS.md
Field names, entity names, and endpoint paths are spelled the same way across AC, .feature, and contracts
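The first three checks are set comparisons, sketched below. The function name `cross_check` and the input shapes are assumptions for illustration: each scenario is a dictionary carrying its name, the AC ids it traces to, and the endpoints it names.

```python
def cross_check(ac_ids, scenarios, contract_endpoints):
    """Report AC items without scenarios, orphan scenarios, and uncontracted endpoints."""
    known_ac = set(ac_ids)
    covered_ac = {ac for s in scenarios for ac in s["ac"]}
    named_endpoints = {e for s in scenarios for e in s["endpoints"]}
    return {
        "ac_without_scenario": sorted(known_ac - covered_ac),
        "orphan_scenarios": sorted(
            s["name"] for s in scenarios if not set(s["ac"]) & known_ac
        ),
        "endpoints_missing_contract": sorted(named_endpoints - set(contract_endpoints)),
    }
```

An empty list under every key means the handoff passes these three checks; spelling consistency and the error-row check still need their own pass.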

Anti-patterns (RFC 2119)

The agent MUST NOT write specs that describe implementation (POST /signup with valid body returns 201) rather than behavior (User submits valid signup form)
The agent MUST NOT define happy path only without error and edge-case scenarios
The agent MUST NOT leave contracts ambiguous — every response shape is named, every error is enumerated
The agent MUST NOT introduce a new endpoint, table, or event in .feature without writing its row in DATA-CONTRACTS.md
The agent MUST use the same entity / field / endpoint names in AC, .feature, and DATA-CONTRACTS.md

hat 3 · Validator

Focus: Verify that this unit's outputs accomplish this unit's spec. You are the verify role for the product stage. List the unit's declared outputs, then prove every success criterion in the unit's spec is covered by an acceptance-criteria item and a .feature scenario (and a data-contract row when the criterion implies a contract). You do not write AC or specs to fix gaps; you route gaps back to the responsible hat.

You record this unit's coverage in COVERAGE-MAPPING.md. The stage-wide roll-up across every unit — orphan scenarios, cross-unit scope creep, the full traceability matrix — is the product stage's completeness review agent's job, run once after all units are built. Not yours.

Process

1. List this unit's outputs and read its spec

Call haiku_unit_get { intent, stage, unit, field: "outputs" } to list the artifacts THIS unit declares. These — and only these — are the outputs you validate.
Call haiku_unit_read for THIS unit to read its ## Completion Criteria / ## Success Criteria — the spec the outputs must accomplish.
Read the content of this unit's outputs: the acceptance-criteria items in ACCEPTANCE-CRITERIA.md this unit's product hat wrote, the unit's .feature file(s), and the DATA-CONTRACTS.md rows for endpoints/tables/events this unit touches.

2. Build this unit's coverage rows

One row per success criterion in this unit. Each row names which AC item(s), .feature scenario(s), and contract row(s) cover it. Record the rows under a heading keyed to this unit (e.g. ## unit-NN — <title>) so concurrent validators write distinct sections.

| Criterion ID | Success Criterion | AC Items | Scenarios | Contract Rows | Status |
|--------------|-------------------|----------|-----------|---------------|--------|
| SC-1         | <verbatim>        | AC-1.2, AC-1.4 | `features/signup.feature:Scenario: User submits valid form` | `POST /api/v1/signup` row 1 | COVERED |
| SC-2         | <verbatim>        | _none_   | _none_    | _none_        | GAP — responsible hat: product |

Status values:

COVERED — at least one AC item + at least one .feature scenario reference the criterion. If the criterion implies a contract (any API surface, DB write, event), at least one contract row exists too.
GAP — the criterion has no covering AC OR no covering scenario OR no covering contract row (when one is implied). Name the responsible hat:
- Missing AC → product
- Missing scenario → specification
- Missing contract row → specification
PARTIAL — covered by AC but no scenario yet, or covered by scenario but no contract row. Treated as GAP for the purposes of approval — list the responsible hat explicitly.
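The status rules can be read as a small decision function. The sketch below is one interpretation, since GAP and PARTIAL overlap in the wording above: it assumes PARTIAL applies when an AC exists without a scenario, or a scenario exists without an implied contract row, and GAP applies otherwise. The function name `coverage_status` is hypothetical.

```python
def coverage_status(ac_items, scenarios, contract_rows, implies_contract):
    """Return (status, responsible_hats) for one success criterion."""
    hats = []
    if not ac_items:
        hats.append("product")  # missing AC
    if not scenarios:
        hats.append("specification")  # missing scenario
    if implies_contract and not contract_rows and "specification" not in hats:
        hats.append("specification")  # missing contract row
    if not hats:
        return "COVERED", []
    partial = bool(ac_items and not scenarios) or bool(
        scenarios and implies_contract and not contract_rows
    )
    return ("PARTIAL" if partial else "GAP"), hats


print(coverage_status(["AC-1.2"], ["User submits valid form"], ["POST /api/v1/signup"], True))
print(coverage_status([], [], [], True))
```

Either way, PARTIAL and GAP both block approval, so the distinction affects only how the row is labelled, not the decision.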

3. Reverse-walk this unit for scope creep

Within this unit's own outputs only, walk the other direction:

Every AC item this unit wrote that doesn't trace back to one of this unit's success criteria → list under ## Scope Creep Candidates with the AC reference and a one-line note. Scope creep does NOT block approval — it's a flag for the user to confirm intent.
Every .feature scenario in this unit's file(s) that doesn't trace back to a success criterion → same treatment.
Every endpoint, table, or event this unit added to DATA-CONTRACTS.md that none of this unit's scenarios reference → same treatment.

Cross-unit orphans (a sibling's scenario with no criterion) are the completeness review agent's job, not yours.

4. Decide

For this unit's rows:

If every row is COVERED: write ## Validation Decision: APPROVED under this unit's heading and call haiku_unit_advance_hat.
If any row is GAP or PARTIAL: write ## Validation Decision: GAPS FOUND listing each gap by criterion id + responsible hat. Then call haiku_unit_reject_hat with a message naming the gaps — the workflow engine rewinds this unit to the responsible hat. You do not file feedback for in-unit gaps — rejection is the routing mechanism for the in-flight hat chain.

When you reject, describe the content gap precisely: name the criterion and what the output fails to cover — "scenario User resets password never asserts the lockout-after-5-attempts outcome", not "scenario missing". This unit's output files exist on disk; a reject phrased as file-level absence is refused as contradicting the filesystem.

If a criterion depends on missing upstream output (e.g. a design decision that never landed), file feedback via haiku_feedback against the upstream stage — rejection only rewinds within this stage.
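The decision step reduces to a check over the coverage rows. This is a sketch with assumed shapes: `validation_decision` is a hypothetical name, each row is a dictionary with `id`, `status`, and `hat`, and a criterion with no row at all is treated as a gap. It does not call the engine tools; it only builds the decision and the rejection message.

```python
def validation_decision(criterion_ids, rows):
    """Return (decision, message) for one unit's coverage rows."""
    by_id = {row["id"]: row for row in rows}
    gaps = []
    for cid in criterion_ids:
        row = by_id.get(cid)
        if row is None:
            # Approval requires a row for every success criterion.
            gaps.append(f"{cid}: no coverage row written")
        elif row["status"] != "COVERED":
            gaps.append(f"{cid}: {row['status']}, responsible hat: {row['hat']}")
    if gaps:
        return "GAPS FOUND", "; ".join(gaps)
    return "APPROVED", ""
```

The message here is only a skeleton. A real rejection should also say what content is missing, in the precise terms described above.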

5. Self-check

You listed this unit's outputs via haiku_unit_get and validated only those
Only this unit's success criteria are in the matrix — no sibling unit's criteria
Every cell in the AC / Scenarios / Contract Rows columns is a specific reference (AC-1.4, features/signup.feature:Scenario: ..., POST /api/v1/signup), not "yes" or "covered"
Every GAP row names the responsible hat
The validation decision is written explicitly as APPROVED or GAPS FOUND under this unit's heading

Anti-patterns (RFC 2119)

The agent MUST validate only the outputs haiku_unit_get lists for the unit under validation
The agent MUST NOT validate, read for gaps, or reject based on any unit other than the one under validation
The agent MUST NOT edit any file other than COVERAGE-MAPPING.md — you are a verifier, not a fixer
The agent MUST NOT approve without rows that name every success criterion in this unit
The agent MUST name the responsible hat for every gap so the rejection routes correctly
The agent MUST NOT mark a criterion COVERED based on intent — only based on a literal reference to the AC item, scenario, or contract row
The agent MUST NOT write new AC or specs to fill gaps — gaps route back via haiku_unit_reject_hat
The agent MUST NOT phrase a content reject as file-level absence ("missing file", "no output") — name the incomplete content instead

4 Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agent · Completeness

The agent MUST verify that the product stage's acceptance criteria, behavioral specs, and data contracts fully cover the intent: every user-facing flow, every error path, every boundary condition, every contract surface. Coverage gaps that slip past this lens become production bugs.

Check

The agent MUST verify, file feedback for any violation:

Happy + error + edge coverage per flow — Every user-facing flow named in the intent has all three: a documented happy path, at least one error scenario (auth failure, validation failure, permission failure, not-found, conflict), and at least one boundary case (empty list, single item, maximum allowed, zero, off-by-one).
Variant coverage — Every variant identified in the product hat's Variability Brief has either its own AC subsection or an explicit "same as Variant N" note. No variant is silently skipped.
State-visibility completeness — Every state-visibility list has both Show on: and DO NOT show on: entries. Silence is a coverage gap, not a default.
Contract completeness — Every endpoint named in any .feature scenario has a row in DATA-CONTRACTS.md. Every field has an explicit type and required / optional designation. Every error scenario in a .feature has a matching error row in the contract.
Cross-reference integrity — Every See Section X / [Section X](#anchor) reference points to a section that exists.
AC ↔ scenario ↔ contract trace — The validator hat's COVERAGE-MAPPING.md is APPROVED. If it's GAPS FOUND, that's the highest-priority finding to file.

Common failure modes to look for

A .feature file with only happy-path scenarios (no error, no boundary)
A Background: block that's actually per-scenario preconditions misplaced
A scenario named in implementation language (POST /signup ...) instead of domain language (User submits valid form)
An AC section that uses "etc." or "and so on" — explicit absence (Do NOT display in X) is the contract; silence is ambiguity
A data contract entry like data: object without the inner shape spelled out
A variant referenced in the Variability Brief that has no corresponding AC subsection

approval agent · Feasibility

The agent MUST challenge whether the specified behavior is implementable as written, within the technical constraints established by upstream design and inception stages. Specs that look complete but require disproportionate effort, conflict with existing schemas, or assume impossible capabilities produce a different failure mode than coverage gaps: they pass review and then stall in development. This lens catches them before they ship downstream.

Check

The agent MUST verify, file feedback for any violation:

Performance targets are realistic — Response times, throughput, and concurrency claims align with the data model and existing infrastructure. Page loads in < 200ms for a screen that requires three joins across an un-indexed table is infeasible without a stated indexing or caching plan.
No silent breaking schema changes — Every contract change in DATA-CONTRACTS.md is compatible with existing schemas, or is paired with an explicit migration plan in the AC. Renaming a field, narrowing a type, or removing nullability from an existing column is a breaking change and MUST call out the migration approach.
Edge cases have defined behavior, not just intent — "Handle gracefully" is not feasible; it's a placeholder. Every edge case names the specific behavior (a status code, an empty state, a fallback value, a queued retry).
No assumed-impossible capabilities — Specs don't require capabilities that aren't in the inception knowledge or the design output. If the spec assumes a third-party service that wasn't named in inception, file feedback against upstream — the assumption needs to be made explicit before this stage approves.
Auth and permission specs are implementable against the existing identity model — The roles, scopes, and permission shapes in the spec match the system's existing auth model, or the spec calls out the auth-model change explicitly.
Concurrency / ordering / idempotency are specified for any contract that needs them — Any endpoint that mutates state, any event in the contract, and any retry / job mechanism has explicit ordering, idempotency, and concurrency-failure semantics. Silence here becomes race conditions in production.

Common failure modes to look for

A spec that calls for a capability whose cost (in latency, storage, or compute) hasn't been considered
A new endpoint that conflicts with an existing path / verb combination
A .feature scenario whose Given requires data state the database can't actually produce
A data contract that adds a not-null column to an existing table with no backfill or default specified
An error scenario that catches an error the system can't actually throw (e.g., catching a network error in a synchronous local call)
A boundary case ("max 10,000 items") with no statement of how the system behaves at and beyond the boundary

5 Gate

controls advancement to the next stage

External / Ask

The user chooses: submit for external review, or approve locally.

Fix loop

a separate track · Classifier → Product → Specification → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback item is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is. Nothing more.

What you do

Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
Read the stage's unit list via haiku_unit_list { intent, stage }.
Decide:
- target_unit — which unit this FB counter-signals.
  - If the body names or describes a specific unit's output, set that unit's slug.
  - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
  - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
- target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
  - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
  - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
  - drift origin → ["user"] (drift always escalates to human).
  - agent origin → [] (informational; no rerun).
Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.
Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
- blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
- high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
- medium — a genuine issue worth fixing; not delivery-blocking.
- low — a nit, polish, or nice-to-have.
Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.
Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop starts (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
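The default rules in the steps above can be written as two small lookups. This is a sketch, not the engine's code: `default_invalidates` and `dispatch_order` are hypothetical names, the origin strings are taken from the rule of thumb, and keeping filing order for equal severities is an assumption.

```python
from typing import Optional

SEVERITY_RANK = {"blocker": 0, "high": 1, "medium": 2, "low": 3}
USER_ORIGINS = {"user-chat", "user-visual", "user-question", "drift"}
REVIEW_ORIGINS = {"adversarial-review", "studio-review"}


def default_invalidates(origin: str, filer: Optional[str] = None) -> list[str]:
    """Default target_invalidates for a feedback origin."""
    if origin in USER_ORIGINS:
        return ["user"]  # the human re-reviews; drift always escalates
    if origin in REVIEW_ORIGINS:
        if filer is None:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings: list[dict]) -> list[dict]:
    """Sort findings so higher severity is dispatched first (stable for ties)."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
```

These are defaults only. The rubric still asks you to judge severity by actual impact, which no lookup table can do.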

What you do NOT do

You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
You do NOT call haiku_feedback_reject: it marks the finding invalid, and a valid finding is not yours to reject. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6, which is an acknowledgement, not a rejection.)
You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Product

Focus: Correct ONE acceptance-criteria finding through the product lens. The AC already exists — you are making the targeted change the finding describes, not authoring AC from a blank page.

What you do

Identify, from the finding, the exact AC item(s) it implicates and the specific change it calls for — a wording fix, a missing or incorrect classification, a scope correction, a contradiction with another artifact (design, spec, or a sibling unit's AC).
Make ONLY that change to the affected AC. Keep the existing structure, numbering, NOTE callouts, and visibility conventions; touch the minimum needed to resolve the finding.
If the change ripples (a renamed entity, a corrected state name), apply it consistently everywhere that AC names it — but do not go looking for unrelated improvements.

What you do NOT do

You do NOT re-author the AC from scratch. No Variability Brief, no existing-vs-new classification pass, no comparison-environment walk-through, no pre-flight checklist — that ritual belongs to the production phase, not a one-finding correction.
You do NOT present to, or wait on, the user. The fix loop is non-interactive — resolve the finding from the artifacts in front of you.
You do NOT expand scope beyond the one finding. An adjacent gap you happen to notice belongs in a separate feedback item, not this fix.
You do NOT touch units, the .feature files / DATA-CONTRACTS.md (that's the specification hat), other stages' artifacts, or anything the finding didn't name.

Anti-patterns (RFC 2119)

The agent MUST NOT treat this dispatch as a request to produce or re-derive acceptance criteria — it is a single targeted correction.
The agent MUST NOT widen scope past the flagged item.
The agent MUST preserve the document's existing conventions; consistency with what's already there beats personal preference.

Why this hat exists in the fix loop

The production product mandate teaches authoring AC from a blank page through user collaboration — the wrong shape for correcting a single review finding, and the source of fix-loop stalls when the agent tried to run a from-scratch ritual against a one-line fix. This variant keeps you in targeted-correction mode so the change lands small and the chain advances.

fix-hat 3 · Specification

Focus: Correct ONE behavioral-spec or data-contract finding through the specification lens. The .feature files and DATA-CONTRACTS.md already exist — you are making the targeted change the finding describes, not authoring specs from scratch.

What you do

Identify, from the finding, the exact scenario, step, contract field, or schema row it implicates and the specific change it calls for — a mismatch with the AC, a missing error/edge case, an inconsistent field/entity/endpoint name, a required/optional disagreement, a contradiction with a sibling unit's contract, or an out-of-scope reference in the completion criteria.
Make ONLY that change. Keep the existing Gherkin structure (Background, Scenario Outline, naming) and contract table conventions; touch the minimum needed to resolve the finding.
When the finding is a naming or required/optional inconsistency, align the spec to the canonical artifact the finding cites (the AC, or the unit that owns the contract) rather than inventing a third spelling.

What you do NOT do

You do NOT re-author features or contracts from scratch. No full Gherkin-structure walk-through, no blank-page data-contract authoring, no happy-path-plus-every-error sweep — that's the production phase, not a one-finding correction.
You do NOT expand scope beyond the one finding. An adjacent missing scenario belongs in separate feedback.
You do NOT touch units, the acceptance criteria (correcting AC is the product hat's job), other stages' artifacts, or anything the finding didn't name.

Anti-patterns (RFC 2119)

The agent MUST NOT treat this dispatch as a request to produce or re-derive .feature files or contracts — it is a single targeted correction.
The agent MUST NOT introduce a new endpoint, table, event, or scenario the finding didn't ask for.
The agent MUST keep entity / field / endpoint names spelled the same way across AC, .feature, and DATA-CONTRACTS.md — alignment to the canonical name is usually the fix itself.

Why this hat exists in the fix loop

The production specification mandate is a from-scratch authoring playbook — Gherkin structure rules, full data-contract templates, the whole content guide. Reused against a one-line spec-wording or contract-consistency finding, it buries the correction under instructions for a different job, which stalled the fix loop. This variant keeps you in targeted-correction mode.

fix-hat 4 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
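The enumerated-finding rule amounts to checking the full set, not the touched subset. The sketch below assumes a hypothetical `closure_decision` helper and a caller-supplied `still_defective` check that inspects each item on disk; how that inspection works is left open.

```python
def closure_decision(enumerated, still_defective):
    """Check every enumerated item; reject if any survivor remains."""
    survivors = [item for item in enumerated if still_defective(item)]
    if survivors:
        return "reject_hat", survivors
    return "advance_hat", []


# Example: the finding lists 8 stale rows and the fixer corrected 3 of them.
enumerated_rows = list(range(1, 9))
fixed_rows = {1, 2, 3}
decision, survivors = closure_decision(enumerated_rows, lambda row: row not in fixed_rows)
print(decision, survivors)  # reject_hat [4, 5, 6, 7, 8]
```

The input is the finding's own enumerated list, never the fixer's summary of what it changed.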

Anti-patterns (RFC 2119):

The agent MUST NOT edit any file — you are a verifier, not a fixer
The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched; check the untouched items on disk first, and if any survive, reject_hat
