Product
External / Ask gate
Define behavioral specifications and acceptance criteria
Define the behavioral contract that hands the design over to development: the acceptance criteria, executable scenarios, and data contracts that say what the system must do and how its success is judged.
Scope
Behavioral specification — observable behavior, acceptance criteria, and the data shapes that cross boundaries. Not the visual design (that came in upstream), not the implementation (that's development's call).
What to do
- Write acceptance criteria from the user's perspective: what they can do and how you'd know it worked.
- Make every criterion verifiable — pair it with a concrete scenario or check, not a vague intent.
- Cover the behavior the design implies, including the failure and edge paths, and prove the coverage.
What NOT to do
- Don't redesign the interface or restate visual decisions — reference the design, don't relitigate it.
- Don't choose implementation, frameworks, or data storage; specify the contract, not the mechanism.
- Don't write criteria no one can check, and don't leave behavior the design shows unspecified.
How the engine runs this stage
1. Elaborate
collaborative · plan the work, fan out discovery, declare outputs
Discovery fan-out
knowledge artifact
Acceptance Criteria
Prioritized user stories and acceptance criteria produced by the product hat. Defines what "done" looks like from the user's perspective — not how the system implements it.
Content Guide
- User stories — "As a [role], I want [action], so that [benefit]" with specific domain entities
- Variability brief — dimensions along which behavior varies, confirmed before AC writing
- Acceptance criteria — structured as General Rules first, then variant-specific subsections
- Prioritization — P0 (must-have for completion) vs P1 (follow-up)
Quality Signals
- User stories reference specific domain entities, not generic placeholders
- Every criterion is specific enough to write a test for
- Edge cases and error paths are covered alongside happy paths
- Variability dimensions are explicitly enumerated
knowledge artifact
Behavioral Spec
Gherkin .feature files defining what the system does from the user's perspective. These files drive development — tests are written to verify these behaviors, and the features themselves can be executed by Cucumber-compatible test runners.
Content Guide
Each .feature file should contain:
- Feature — descriptive name and summary of the capability
- Background — shared preconditions across scenarios (Given steps common to all)
- Scenarios — concrete examples covering:
  - Happy path — the expected successful flow
  - Error scenarios — validation failures, auth errors, not found, server errors
  - Edge cases — boundary conditions, concurrent access, empty states, maximum limits
- Scenario Outlines — parameterized scenarios for testing across multiple inputs
Quality Signals
- Every feature has at least one error scenario, not just the happy path
- Scenarios are specific enough to execute as automated tests
- Actors are named roles, not generic "user"
- Edge cases cover boundaries (zero, one, max, empty, null)
- Steps use domain language consistent with acceptance criteria from the product hat
knowledge artifact
Coverage Mapping
Traceability matrix produced by the validator hat mapping every unit success criterion to its corresponding acceptance criteria and specification items. A GAPS FOUND result blocks stage completion until the responsible hat addresses the gap.
Content Guide
- Coverage matrix — each success criterion mapped to AC and spec items that cover it
- Gap flags — any criterion with no corresponding AC or spec, with the responsible hat identified
- Scope creep flags — any AC or spec item that doesn't trace back to a success criterion
- Validation decision — APPROVED (no gaps) or GAPS FOUND (blocks stage completion)
Quality Signals
- Every success criterion maps to at least one AC or spec item
- Every AC item is testable — a concrete test can be described for it
- No gaps remain unflagged
- Scope creep items are identified but do not block approval
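The validation decision described above can be sketched in Python. This is a minimal illustration, not the validator hat's actual implementation: `CoverageReport`, `build_coverage`, and the shape of the `traces` input are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CoverageReport:
    # success criterion -> AC / spec items that cover it
    matrix: dict[str, list[str]]
    # criteria with no covering item (these block the stage)
    gaps: list[str] = field(default_factory=list)
    # AC / spec items that trace to no criterion (reported, non-blocking)
    scope_creep: list[str] = field(default_factory=list)

    @property
    def decision(self) -> str:
        # Only gaps block approval; scope creep is flagged but does not.
        return "GAPS FOUND" if self.gaps else "APPROVED"


def build_coverage(criteria: list[str], traces: dict[str, list[str]]) -> CoverageReport:
    """`traces` maps each AC/spec item to the success criteria it claims to cover."""
    matrix: dict[str, list[str]] = {c: [] for c in criteria}
    creep: list[str] = []
    for item, covered in traces.items():
        hits = [c for c in covered if c in matrix]
        if not hits:
            creep.append(item)
        for c in hits:
            matrix[c].append(item)
    gaps = [c for c, items in matrix.items() if not items]
    return CoverageReport(matrix, gaps, creep)
```

Keeping `scope_creep` separate from `gaps` mirrors the rule that scope creep is identified without blocking approval.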
knowledge artifact
Data Contracts
API, database, and event contracts that define the data shapes flowing through the system. This output is the agreement between frontend and backend, between services, and between the system and its persistence layer.
Content Guide
API Endpoints
For each endpoint:
- Method and path (e.g., `POST /api/v1/users`)
- Request schema — field names, types, required vs. optional, validation rules
- Response schema — field names, types, shape for success and each error case
- Error responses — status codes, error body shape, when each error occurs
- Authentication — what auth is required, what scopes/roles
Database Models
For each entity:
- Entity name and table/collection name
- Fields — name, type, nullable, default, constraints
- Relationships — foreign keys, join tables, cardinality
- Indexes — which fields are indexed and why
- Constraints — unique, check, not-null
Event Schemas (if applicable)
For each event:
- Event name and topic/channel
- Payload schema — field names and types
- Producer — what emits this event
- Consumers — what listens for this event
Quality Signals
- Every field has an explicit type and required/optional designation
- Error responses are specified alongside success responses
- Example values are provided for non-obvious fields
- Naming is consistent across all contracts (same entity name everywhere)
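The first quality signal lends itself to a mechanical check. The sketch below assumes, purely for illustration, that each contract field has been parsed into a dict with `name`, `type`, and a boolean `required` key; that representation and the `field_problems` helper are hypothetical, not part of the contract format itself.

```python
def field_problems(fields: list[dict]) -> list[str]:
    """List every field that lacks an explicit type or required/optional flag."""
    problems = []
    for index, spec in enumerate(fields):
        label = spec.get("name") or f"field #{index}"
        if not spec.get("type"):
            problems.append(f"{label}: missing type")
        # `required` must be stated explicitly; a missing key is not "optional".
        if not isinstance(spec.get("required"), bool):
            problems.append(f"{label}: missing required/optional designation")
    return problems
```

Treating a missing `required` key as a problem, rather than defaulting it to optional, keeps silence from being read as a decision.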
Phase guidance
phase override
ELABORATION
Product Stage — Elaboration
Criteria Guidance
Product criteria are verified by behavioral testing — automated tests (e.g. Cucumber .feature scenarios, integration tests, contract tests) that assert the system behaves as specified.
Good criteria — concrete and verifiable
When generating criteria for this stage, focus on behavioral verification:
- Detailed behavioral specs that describe what the system does, not how it is built
- Acceptance criteria for every user-facing scenario, each expressible as a Given/When/Then test
- Edge cases, error paths, and boundary conditions explicitly covered
- Data contracts, validation rules, and state transitions specified with concrete examples
- Integration points and external dependency behavior documented (with mock or contract-test specifications)
- Behavioral specs precise enough for a developer to implement without follow-up questions
Bad criteria — vague (no clear check)
- "Works correctly" — under what conditions? With what input?
- "Handles errors" — which errors? What's the expected response?
- "Data is validated" — against which schema? What error format?
Bad criteria — product-specific unverifiable
(In addition to the universal unverifiable shapes called out in the workflow engine contracts.)
- "Behavior is intuitive" — needs a usability-test pass with a stated success-rate threshold
- "Coverage is comprehensive across the user-facing capability list" — needs a structural check counting scenarios against the capability list, not a subjective judgment
Unit outputs: — required artifact shape
Every unit MUST declare its produced artifacts as real file paths in the outputs: frontmatter. The advance-hat gate verifies each path exists on disk; freeform descriptions get rejected at write time and at advance time.
For product-stage units, the typical artifact set is:
outputs:
# Behavioral spec — Gherkin .feature file the specification hat
# writes to features/. Per the behavioral-spec template, units MUST
# produce at least one .feature file when they cover user-observable
# behavior. Reference the file by its actual path, not by name.
- .haiku/intents/{intent-slug}/features/my_week.feature
# Acceptance criteria — markdown produced by the product hat for
# this slice of behavior. Lives at .haiku/intents/{intent-slug}/product/
# (NOT knowledge/ — that's discovery-stage territory).
- .haiku/intents/{intent-slug}/product/ACCEPTANCE-CRITERIA.md
# Data contract — schema/API/DB shape touched by this unit.
- .haiku/intents/{intent-slug}/product/DATA-CONTRACTS.md
Replace the `{intent-slug}` placeholder and the example feature filename with the unit's real intent slug and feature filename. The validator hat's COVERAGE-MAPPING.md is one shared file across the stage; typically only the validator hat's terminal unit lists it as an output.
MUST NOT: write prose like outputs: ["Weekly carryover roll: scheduler trigger, idempotent roll logic"]. That's a completion-criteria description, belongs in the body's ## Completion Criteria section, and the gate now rejects it as unit_outputs_missing (no real path matches).
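To see why a prose entry fails, here is a rough sketch of a path-existence check in the spirit of the gate. It is an assumption about the general idea only; the real advance-hat gate's logic is not shown in this document, and `check_outputs` is a hypothetical helper.

```python
from pathlib import Path


def check_outputs(outputs: list[str], root: Path) -> list[str]:
    """Return the declared outputs that do not resolve to a file under `root`."""
    missing = []
    for entry in outputs:
        # A freeform description never matches a real file, so it lands here too.
        if not (root / entry).is_file():
            missing.append(entry)
    return missing
```

Under this sketch, a description such as `"Weekly carryover roll: scheduler trigger, idempotent roll logic"` is reported as missing in exactly the same way as a mistyped path.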
Outputs produced
output template
Specs
Product Specifications
Behavioral specs and data contracts produced by product units. The specification hat writes .feature files in Gherkin syntax; the product hat writes acceptance criteria documents.
Expected Artifacts
- Behavioral specs — `.feature` files with Gherkin scenarios (Feature/Scenario/Given/When/Then)
- Data contracts — API schemas, request/response shapes, field types
- Acceptance criteria — testable conditions for each feature, structured by variability dimension
Quality Signals
- Every product unit produces at least one spec artifact
- Behavioral specs are valid Gherkin syntax executable by a Cucumber-compatible runner
- Data contracts include error responses, not just success cases
AC artifact shapes
The structures below are the canonical shapes for acceptance criteria when the variability brief calls for them. Use these directly; don't invent new structures unless the work genuinely doesn't fit one of these. Project overlays at .haiku/studios/software/stages/product/outputs/SPECS.md may add house-specific patterns; prefer the overlay's shapes over the defaults below when one is present.
Variant-based AC structure
1. General Rules
   1. [Things true across ALL variants — component references, default states, tabs where nothing appears]
2. [Variant 1 name]
   1. **[Screen / Tab Name]:**
      1. [Component] Placement:
         1. [Specific placement for this variant]
      2. [Other Component]: [show / hide rule]
3. [Variant 2 name]
   1. **[Screen / Tab Name]:**
      1. [Component] Placement:
         1. [Placement if different from Variant 1]
         2. NOTE: This differs from Variant 1 — [explain how].
      2. [Other Component]: Do NOT display
Adding a column to an existing table
1. Add "[Column Name]" Column to [Table Name]
   1. Add a new column to the [Table Name] table
      1. Column Header: [Column Name]
      2. Column Position: Place after the "[Previous Column]" column
   2. Column Data Display
      1. IF [condition]:
         1. Display [data description]
            1. This is the same value described in [Section X](#anchor)
         2. Format: `[format]`
            1. Example: `[example]`
      2. IF [alternate condition]:
         1. Display: `[sentinel value]`
Updating an existing column with a tooltip
1. Update [Column Name] Column
   1. Update text to Bold
   2. Add question mark tooltip icon
      1. icon: `question`
      2. color: `info`
   3. Selecting tooltip should open [Modal Name]
      1. See [Section X](#anchor)
Referencing a modal from an action
1. For [action]: Use updated [Modal Name]
   1. See [Section X](#anchor)
Settings card with a toggle that reveals a configuration section
1. Create [Setting Name] Card
   1. Header
      1. Icon
         1. squareicon
         2. icon: `[icon-name]`
         3. color: `[token]`
      2. title: [Setting Title]
   2. Description
      1. text: [Description copy]
   3. Toggle Row
      1. label: [Toggle label]?
      2. Toggle
         1. Default state: OFF (NO)
         2. When toggled ON (YES), show [Configuration Section]
         3. When toggled OFF (NO), hide [Configuration Section]
   4. Highlighted Reminder
      1. icon: `circle-info`
      2. color: `info`
      3. text: [Reminder copy]
      4. Always show
   5. Save Changes Button
      1. text: Save Changes
      2. color when enabled: `[primary-token]`
      3. Keep disabled if no changes made or validation errors exist
      4. When selected, save and show success toast
Variant-based component placement (canonical multi-state shape)
1. General Rules
   1. The [Component Name] (see [Section X](#anchor) for full component AC) is added to [Screen Name]
   2. The component should be collapsed by default in all states
   3. The component should NOT display on the **[Tab Name]** in any state
2. [Variant 1]: [State Name]
   1. **[Tab A]:**
      1. [Component] Placement:
         1. Place below [element above]
         2. Place above [element below]
      2. [Secondary Component] Placement:
         1. Place directly below [Primary Component]
         2. Only display if [condition] (see [Section X](#anchor))
   2. **[Tab B]:**
      1. [Component] Placement:
         1. Place below [element above]
         2. Place above [element below]
3. [Variant 2]: [State Name]
   1. **[Tab A]:**
      1. [Component] Placement:
         1. Same placement as [Variant 1] [Tab A]
      2. [Secondary Component]: Do NOT display
   2. **[Tab B]:**
      1. [Component] Placement:
         1. Place below [different element]
         2. NOTE: This differs from [Variant 1] — [explain the change]
      2. [Secondary Component]: Do NOT display
Cross-reference conventions
Link related sections rather than restating. Anchor when an anchor is known; otherwise use `See Section X above`. Parenthetical form is fine for asides: `([Section VIII.b.1](#anchor))`.
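Anchored references can be checked mechanically. The sketch below assumes anchors are declared with explicit `<a id="...">` tags in the AC document; that is an assumption made for the example (heading-slug rules differ between renderers), and `dangling_anchors` is a hypothetical helper.

```python
import re

# Markdown links of the form [text](#anchor)
LINK = re.compile(r"\[[^\]]+\]\(#([^)]+)\)")
# Assumed anchor declaration style: <a id="anchor">
ANCHOR = re.compile(r'<a id="([^"]+)"')


def dangling_anchors(markdown: str) -> list[str]:
    """Return anchors that are linked to but never declared, sorted by name."""
    defined = set(ANCHOR.findall(markdown))
    return sorted({a for a in LINK.findall(markdown) if a not in defined})
```

Plain `See Section X above` references carry no anchor, so a check like this cannot cover them; they still need a manual read.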
Inline code values
Use backticks for values engineers will literally implement: time formats (`HH:MM:SS`, `Xh Xm Xs`), sentinel values (`--`, `YES`, `NO`), color tokens (`primary`, `error`, `success`), icon names, enum values.
When specifying icon + color + behavior together:
1. Icon
   1. squareicon
   2. icon: `mug-hot`
   3. color: `primary`
2. Review
pre-execute · agents audit the planned spec before any code lands
review agent
Completeness
Mandate: The agent MUST verify that the product stage's acceptance criteria, behavioral specs, and data contracts fully cover the intent — every user-facing flow, every error path, every boundary condition, every contract surface. Coverage gaps that slip past this lens become production bugs.
Check
The agent MUST verify each item below and file feedback for any violation:
- Happy + error + edge coverage per flow — Every user-facing flow named in the intent has all three: a documented happy path, at least one error scenario (auth failure, validation failure, permission failure, not-found, conflict), and at least one boundary case (empty list, single item, maximum allowed, zero, off-by-one).
- Variant coverage — Every variant identified in the product hat's Variability Brief has either its own AC subsection or an explicit "same as Variant N" note. No variant is silently skipped.
- State-visibility completeness — Every state-visibility list has both `Show on:` and `DO NOT show on:` entries. Silence is a coverage gap, not a default.
- Contract completeness — Every endpoint named in any `.feature` scenario has a row in `DATA-CONTRACTS.md`. Every field has an explicit type and required / optional designation. Every error scenario in a `.feature` has a matching error row in the contract.
- Cross-reference integrity — Every `See Section X` / `[Section X](#anchor)` reference points to a section that exists.
- AC ↔ scenario ↔ contract trace — The validator hat's `COVERAGE-MAPPING.md` is `APPROVED`. If it's `GAPS FOUND`, that's the highest-priority finding to file.
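The first half of the contract-completeness check can be approximated with a text scan. The sketch below assumes contract entries are headed `### METHOD /path`, as in the data-contract template, and that feature steps mention endpoints as literal `METHOD /path` text; both are simplifications, and `endpoints_missing_contract` is a hypothetical helper.

```python
import re

METHODS = "GET|POST|PUT|PATCH|DELETE"
# "POST /api/v1/users" anywhere in a feature file
ENDPOINT = re.compile(rf"\b({METHODS})\s+(/\S*)")
# "### POST /api/v1/users" headings in DATA-CONTRACTS.md
CONTRACT_HEADING = re.compile(rf"^###\s+({METHODS})\s+(\S+)", re.MULTILINE)


def endpoints_missing_contract(feature_text: str, contracts_md: str) -> list[tuple[str, str]]:
    """Return (method, path) pairs used in the feature but absent from the contracts."""
    documented = set(CONTRACT_HEADING.findall(contracts_md))
    used = set(ENDPOINT.findall(feature_text))
    return sorted(used - documented)
```

This compares paths literally, so a step that uses a concrete id against a contract written with a path parameter would be reported; a fuller check would need to normalize parameters first.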
Common failure modes to look for
- A `.feature` file with only happy-path scenarios (no error, no boundary)
- A `Background:` block that's actually per-scenario preconditions misplaced
- A scenario named in implementation language (`POST /signup ...`) instead of domain language (`User submits valid form`)
- An AC section that uses "etc." or "and so on" — explicit absence (`Do NOT display in X`) is the contract; silence is ambiguity
- A data contract entry like `data: object` without the inner shape spelled out
- A variant referenced in the Variability Brief that has no corresponding AC subsection
review agent
Feasibility
Mandate: The agent MUST challenge whether the specified behavior is implementable as written, within the technical constraints established by upstream design and inception stages. Specs that look complete but require disproportionate effort, conflict with existing schemas, or assume impossible capabilities produce a different failure mode than coverage gaps — they pass review and then stall in development. This lens catches them before they ship downstream.
Check
The agent MUST verify each item below and file feedback for any violation:
- Performance targets are realistic — Response times, throughput, and concurrency claims align with the data model and existing infrastructure. `Page loads in < 200ms` for a screen that requires three joins across an un-indexed table is infeasible without a stated indexing or caching plan.
- No silent breaking schema changes — Every contract change in `DATA-CONTRACTS.md` is compatible with existing schemas, or is paired with an explicit migration plan in the AC. Renaming a field, narrowing a type, or removing nullability from an existing column is a breaking change and MUST call out the migration approach.
- Edge cases have defined behavior, not just intent — "Handle gracefully" is not feasible; it's a placeholder. Every edge case names the specific behavior (a status code, an empty state, a fallback value, a queued retry).
- No assumed-impossible capabilities — Specs don't require capabilities that aren't in the inception knowledge or the design output. If the spec assumes a third-party service that wasn't named in inception, file feedback against upstream — the assumption needs to be made explicit before this stage approves.
- Auth and permission specs are implementable against the existing identity model — The roles, scopes, and permission shapes in the spec match the system's existing auth model, or the spec calls out the auth-model change explicitly.
- Concurrency / ordering / idempotency are specified for any contract that needs them — Any endpoint that mutates state, any event in the contract, and any retry / job mechanism has explicit ordering, idempotency, and concurrency-failure semantics. Silence here becomes race conditions in production.
Common failure modes to look for
- A spec that calls for a capability whose cost (in latency, storage, or compute) hasn't been considered
- A new endpoint that conflicts with an existing path / verb combination
- A `.feature` scenario whose `Given` requires data state the database can't actually produce
- A data contract that adds a not-null column to an existing table with no backfill or default specified
- An error scenario that catches an error the system can't actually throw (e.g., catching a network error in a synchronous local call)
- A boundary case ("max 10,000 items") with no statement of how the system behaves at and beyond the boundary
3. Execute
per-unit baton · Product → Specification → Validator
hat 1
Product
Focus: Define behavioral acceptance criteria (AC) from the user's perspective — what users do and see, not how the system implements it. AC is what gets handed to engineers as the source of truth for behavior; quality here directly drives implementation quality downstream.
Process
1. Pre-flight — confirm inputs before writing
Before writing AC, present this checklist to the user and confirm everything is in scope:
- Designs — links to the visual mockups / specs that show what's being built (one link per screen / state)
- Feature context — what the feature does and why, in plain language
- Reference AC — any existing AC docs / sections in the same product to match style, avoid duplication, and link as cross-references
- Feature flag — the flag name, if applicable, and whether it's enabled in the environment being compared against
- Environment to compare against — running app, staging, etc., so "what's already built" vs. "what's net new" can be distinguished
- Definition of "exists" — UI present? Behavior implemented? Tests passing? Agree on the bar before classifying anything as "already exists"
If the user can't confirm an item, write the AC scoped to what's confirmed and call out the gap inline — don't invent context.
2. Identify variability BEFORE writing AC
The single biggest source of missed requirements is unmodeled variability — a button that looks the same across screens but behaves differently per user role, device, state, or context. Don't discover variants mid-write by diffing designs; surface them up front.
Present a Variability Brief to the user for confirmation before any AC drafting:
- Dimension: what variable creates different behaviors? (user role, device type, state value, feature flag, locale, etc.)
- Variants: list every value of that dimension that has any behavior difference
- Per variant, what changes? Use a table:
| Variant | Screens affected | Placement differences | Show / hide differences | Behavior differences |
|---|---|---|---|---|
| name | which screens | where components go | what appears / disappears | any logic changes |
- What stays the same across all variants? (component always collapsed by default, never appears on the X tab, etc.)
Use the brief to decide structure:
- If variants share most behavior → write a General Rules section first, then variant-specific subsections that ONLY name the deltas
- If variants are mostly different → write each variant as its own top-level section
3. Compare against existing — classify net new vs. modified vs. existing
If the user gave you an environment to compare against (a running app, staging, etc.), do this BEFORE writing any AC:
- Navigate to each relevant screen in the comparison environment
- Compare against the new designs section-by-section
- For every UI element / behavior you'd write AC for, classify it:
  - Existing — already there and matches the design. Skip AC or add `Already exists — no changes required`
  - Modified — exists but something is changing. Write AC for the delta only and call out what's changing from current state
  - Net new — doesn't exist yet. Write full AC
- Present the classification to the user for confirmation before drafting
| Item | Classification | Notes |
|---|---|---|
| component / behavior | Existing / Modified / Net new | what's changing, if modified |
If the comparison environment doesn't have the feature flag enabled, everything will look net new — don't draw conclusions until the flag state is confirmed. When in doubt, flag it for the user, don't assume.
4. Write the AC
Follow the structure the Variability Brief implied. Match the conventions of the reference AC the user pointed at — numbering scheme, section headers, code formatting, tone. Consistency beats personal preference: if the team writes Section II.4.b, do that; if they write AC-1.4.3.2, do that. Don't impose a new scheme.
The canonical artifact shapes live in plugin/studios/software/stages/product/outputs/SPECS.md. Read those before drafting — they cover variant-based structure, table-column additions, tooltip updates, modal references, settings cards with toggles, multi-variant component placement, cross-reference conventions, and inline code values. Use them directly; engineers benefit from consistency more than from your originality.
Three principles always apply, regardless of shape:
- NOTE callouts — anywhere a variant deviates from a prior one, or anywhere implementation needs attention that isn't obvious from the numbered items alone, add an inline `NOTE:` line that names the difference. Common uses: variant deviation, missing-design fallback, important non-obvious detail, "do NOT" reminders.
- State visibility lists — when documenting which states show or hide a component, list the "show" cases first, then explicitly call out the "do not show" cases. Never omit a state — silence is ambiguous to developers. For simpler cases, inline it: `[Component]: Do NOT display in [State C] or [State D]`.
- Explicit "Do NOT display" — when a component is hidden in a variant, say so directly. Silence is ambiguous.
5. Self-check before handing off
Before declaring AC complete:
- Every variant in the Variability Brief has either its own section or an explicit "same as Variant N" note
- Every state in any visibility list has either a "show" or "do not show" entry
- Every reference to another AC section uses an anchor link, not a vague "see above"
- Every value engineers will literally implement is in backticks
- Every numbered item is independently testable — a QA engineer could write a single test that verifies just that item
- The document matches the formatting conventions of the reference AC the user pointed at
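The state-visibility item in this checklist reduces to set arithmetic. A minimal sketch, assuming the full list of states is known from the Variability Brief; `visibility_problems` and its argument names are hypothetical.

```python
def visibility_problems(
    states: list[str], show_on: list[str], do_not_show_on: list[str]
) -> dict[str, list[str]]:
    """Find states a visibility list leaves silent or lists on both sides."""
    show, hide = set(show_on), set(do_not_show_on)
    return {
        # Neither "show" nor "do not show": silence, which is ambiguous
        "unspecified": sorted(set(states) - show - hide),
        # Listed under both headings: the AC contradicts itself
        "contradictory": sorted(show & hide),
    }
```

An AC section passes this item only when both lists in the result are empty.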
Anti-patterns (RFC 2119)
- The agent MUST present the Variability Brief and the existing-vs-modified-vs-new classification to the user for confirmation before drafting
- The agent MUST NOT skip variability identification — variant differences are the #1 source of missed requirements
- The agent MUST NOT write implementation details instead of user behavior (`"use a Redis cache"` vs. `"the page loads in under 2 seconds"`)
- The agent MUST NOT omit "do not show / do not display" states — silence is ambiguous; explicit absence is the contract
hat 2
Specification
Focus: Translate the product hat's acceptance criteria into executable behavioral specs (Gherkin .feature files) and complete data contracts (API / DB / event schemas). Gherkin is the spec language — every AC item becomes one or more scenarios with explicit Given preconditions, When actions, and Then outcomes. Data contracts are the agreement frontend ↔ backend ↔ persistence. Precision matters: ambiguity in specs becomes bugs in code.
You produce two artifacts per unit:
- One or more `.feature` files under `features/` (Gherkin)
- The unit's slice of `DATA-CONTRACTS.md` (request / response / error shapes, DB models, event payloads)
You do NOT produce acceptance criteria — that's the product hat. You read the product hat's AC and turn each AC item into the corresponding scenario(s) and contract(s).
Process
1. Read your inputs
- Read the product hat's AC for this unit (`ACCEPTANCE-CRITERIA.md`)
- Read the unit's own success criteria
- Read sibling units' existing `.feature` files and `DATA-CONTRACTS.md` to keep naming consistent (a `User` in one feature must be a `User` in every other; an API path appearing in two units must use the same path and the same field names)
2. Identify the unit's discipline before choosing format
The right contract format depends on what the unit covers:
- Frontend / UI unit — `.feature` files describe component states, responsive behavior, click flows, and visibility rules; data contracts are limited to the shape of payloads the component consumes / emits. Where AC says "show", `.feature` says `Then I see ...`.
- Backend / API unit — `.feature` files describe request / response behavior, auth checks, error responses; data contracts are full request schemas, response schemas (success + every error), status codes, and authorization scopes.
- Service / data-pipeline unit — `.feature` files describe inputs in / outputs out, including timing and ordering; data contracts include event payloads, idempotency keys, retry semantics, and ordering guarantees.
- DevOps / infra unit — `.feature` files describe environment-specific configuration and rollback criteria; data contracts include the config schema and environment variables.
Pick the format before you start writing. Mixing them inside one feature file is how scenarios become unreadable.
3. Write the Gherkin
Feature file structure (one per logical capability, not one per unit — a unit may produce multiple .feature files if it covers more than one capability):
Feature: <capability name in domain language>
<one-line description of what this capability lets the user do and why>
Background:
Given <preconditions common to every scenario in this file>
And <another shared precondition>
Scenario: <named in user language, not implementation language>
Given <unique precondition for this scenario>
When <the single user action>
Then <the observable outcome>
And <secondary observable outcome>
Scenario: <error or edge case>
Given <precondition that triggers the error path>
When <user action>
Then <error response>
Scenario naming rules:
- Name scenarios in domain language that matches the AC. `User submits valid signup form` — yes. `POST /signup with valid body returns 201` — no (that's implementation, not behavior).
- A reviewer who has never touched the codebase should be able to read the scenario list and understand what the feature does.
- One observable behavior per scenario. If you have to use the word "and" in the scenario title, split it.
Background section rules:
- Put preconditions in `Background` only if they apply to every scenario in the file.
- Per-scenario preconditions go in the scenario's own `Given` steps.
- If your `Background` is more than 4 `Given` lines, the file is probably covering two capabilities — split it into two `.feature` files.
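The four-line guideline is easy to lint for. The sketch below is a line-based approximation rather than a real Gherkin parser: it counts `Given` steps and their `And` / `But` / `*` continuations inside `Background:`. The function names and the choice to count continuations are assumptions made for the example.

```python
STEP_KEYWORDS = ("Given ", "And ", "But ", "* ")
BLOCK_KEYWORDS = ("Scenario:", "Scenario Outline:", "Example:", "Rule:")


def background_step_count(feature_text: str) -> int:
    """Count precondition steps inside the Background block of one feature file."""
    count = 0
    in_background = False
    for raw in feature_text.splitlines():
        line = raw.strip()
        if line.startswith("Background:"):
            in_background = True
        elif line.startswith(BLOCK_KEYWORDS):
            in_background = False  # Background ends at the next block
        elif in_background and line.startswith(STEP_KEYWORDS):
            count += 1
    return count


def background_too_long(feature_text: str, limit: int = 4) -> bool:
    """True when the Background exceeds the guideline and the file may need splitting."""
    return background_step_count(feature_text) > limit
```

A positive result is a prompt to ask whether the file covers two capabilities, not proof that it does.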
Scenario Outline rules:
Use Scenario Outline with an Examples: table when the same scenario shape applies across multiple inputs (e.g., validation rules for a form across each invalid field). Don't use Scenario Outline to combine genuinely different behaviors into one parameterized scenario — that hides the behavior diversity from the reviewer.
Scenario Outline: Form rejects invalid <field>
Given the signup form is open
When I enter <value> in the <field> field
And I submit the form
Then the <field> field shows error "<error_message>"
Examples:
| field | value | error_message |
| email | not-an-email | Enter a valid email |
| password | abc | At least 8 characters |
| zip | 1234 | Enter a 5-digit ZIP code |
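Conceptually, each `Examples:` row yields one concrete scenario with the `<placeholder>` tokens substituted. The Python sketch below illustrates that expansion for review purposes only; it is not how any particular Cucumber-compatible runner is implemented, and `expand_outline` is a hypothetical helper.

```python
def expand_outline(
    steps: list[str], header: list[str], rows: list[list[str]]
) -> list[list[str]]:
    """Substitute each Examples row into the outline steps, one scenario per row."""
    scenarios = []
    for row in rows:
        values = dict(zip(header, row))
        concrete = []
        for step in steps:
            for name, value in values.items():
                step = step.replace(f"<{name}>", value)
            concrete.append(step)
        scenarios.append(concrete)
    return scenarios
```

Reading the expanded steps row by row is a quick way to confirm that every row describes the same behavior shape. If one row needs a different `Then`, it belongs in its own scenario.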
Error and edge-case coverage:
Every feature MUST include at least one error scenario. Cover, at minimum:
- The auth-failure path (if the capability is gated)
- The validation-failure path (if the capability accepts input)
- The not-found / permission path (if the capability resolves an entity)
- The boundary case (empty list, single item, maximum allowed, off-by-one)
A feature with only a happy path is not a complete spec — it's a sales demo.
Steps shared across files:
If you find yourself writing the same multi-line Given block in two feature files, factor it into a shared step (Given a logged-in <role>). Don't duplicate setup verbatim — when it drifts, the tests drift with it.
4. Write the data contracts
For each API endpoint touched by this unit, append to DATA-CONTRACTS.md:
### POST /api/v1/<resource>
**Auth:** <role / scope required, or "public">
**Request body**
| Field | Type | Required | Validation | Notes |
|------------|---------|----------|---------------------------|-------|
| email | string | yes | RFC 5322 email | |
| password | string | yes | min 8 chars, must include digit + symbol | hashed before storage |
| referral | string | no | UUID v4 | optional referral source |
**Success response** (`201 Created`)
| Field | Type | Notes |
|------------|---------|-------|
| id | UUID | new user id |
| email | string | echoed |
| created_at | ISO8601 | server time |
**Error responses**
| Status | Code | When |
|--------|-------------------|------|
| 400 | validation_failed | any required field missing or invalid |
| 409 | email_in_use | email already registered |
| 429 | rate_limited | > 5 signups / IP / hour |
For each DB entity touched, include: entity name, fields (name / type / nullable / default / constraints), relationships (FK + cardinality), indexes (which fields + why), and constraints (unique / check / not-null).
For each event emitted or consumed: event name + topic, payload schema, producer, consumers, ordering and idempotency semantics.
Required level of completeness:
- Every error case from the `.feature` file appears in the error-response table
- Every field has an explicit type and required / optional designation
- Example values are provided for non-obvious fields (format-specific strings, sentinel values, units)
- Naming is consistent across all contracts in this intent (same entity name everywhere, same field name everywhere)
- No field labelled "data: object" without spelling out the object's shape
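Several of these completeness rules are mechanical once the contract rows are in structured form. The sketch below assumes a hypothetical `ContractField` record (the engine does not define one) and checks three of the rules: explicit type, explicit required / optional designation, and no bare `object`.

```python
from dataclasses import dataclass, field


@dataclass
class ContractField:
    name: str
    type: str = ""       # empty means the type was never stated
    required: str = ""   # "yes" or "no"; empty means unspecified
    shape: list["ContractField"] = field(default_factory=list)  # inner fields of an object


def completeness_problems(fields: list[ContractField]) -> list[str]:
    """Return one message per completeness rule a field breaks."""
    problems = []
    for f in fields:
        if not f.type:
            problems.append(f"{f.name}: missing type")
        if f.required not in ("yes", "no"):
            problems.append(f"{f.name}: missing required/optional designation")
        if f.type == "object" and not f.shape:
            problems.append(f"{f.name}: object without a spelled-out shape")
        # Nested fields are held to the same rules.
        problems.extend(completeness_problems(f.shape))
    return problems
```

Naming consistency and example values still need a human read; this only covers the rules that have a yes/no answer per field.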
5. Cross-check before handing off
- Every AC item from the product hat maps to at least one `.feature` scenario
- Every `.feature` scenario maps back to an AC item (no orphan scenarios)
- Every endpoint named in any scenario appears in `DATA-CONTRACTS.md`
- Every error scenario has a corresponding error row in `DATA-CONTRACTS.md`
- Field names, entity names, and endpoint paths are spelled the same way across AC, `.feature`, and contracts
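The first two checks are a two-way trace, which reduces to set arithmetic once each scenario is tagged with the AC items it covers. A minimal sketch, assuming you maintain that mapping yourself (`trace_gaps` and its inputs are illustrative, not an engine API):

```python
def trace_gaps(ac_ids, scenario_to_ac):
    """Return (AC items with no scenario, scenarios that map to no known AC item)."""
    covered = set().union(*scenario_to_ac.values())
    uncovered_ac = set(ac_ids) - covered
    orphan_scenarios = {
        name for name, acs in scenario_to_ac.items() if not (set(acs) & set(ac_ids))
    }
    return uncovered_ac, orphan_scenarios


uncovered, orphans = trace_gaps(
    {"AC-1.1", "AC-1.2"},
    {
        "User submits valid signup form": {"AC-1.1"},
        "User exports data": set(),
    },
)
```

Both result sets should be empty before handoff: the first lists AC that nothing exercises, the second lists scenarios nobody asked for.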
Anti-patterns (RFC 2119)
- The agent MUST NOT write specs that describe implementation (`POST /signup with valid body returns 201`) rather than behavior (`User submits valid signup form`)
- The agent MUST NOT define happy path only without error and edge-case scenarios
- The agent MUST NOT leave contracts ambiguous: every response shape is named, every error is enumerated
- The agent MUST NOT introduce a new endpoint, table, or event in `.feature` without writing its row in `DATA-CONTRACTS.md`
- The agent MUST use the same entity / field / endpoint names in AC, `.feature`, and `DATA-CONTRACTS.md`
hat 3 · Validator
Focus: Verify that this unit's outputs accomplish this unit's spec. You are the verify role for the product stage. List the unit's declared outputs, then prove every success criterion in the unit's spec is covered by an acceptance-criteria item and a .feature scenario (and a data-contract row when the criterion implies a contract). You do not write AC or specs to fix gaps; you route gaps back to the responsible hat.
You record this unit's coverage in COVERAGE-MAPPING.md. The stage-wide roll-up across every unit — orphan scenarios, cross-unit scope creep, the full traceability matrix — is the product stage's completeness review agent's job, run once after all units are built. Not yours.
Process
1. List this unit's outputs and read its spec
- Call `haiku_unit_get { intent, stage, unit, field: "outputs" }` to list the artifacts THIS unit declares. These, and only these, are the outputs you validate.
- Call `haiku_unit_read` for THIS unit to read its `## Completion Criteria` / `## Success Criteria`, the spec the outputs must accomplish.
- Read the content of this unit's outputs: the acceptance-criteria items in `ACCEPTANCE-CRITERIA.md` this unit's product hat wrote, the unit's `.feature` file(s), and the `DATA-CONTRACTS.md` rows for endpoints/tables/events this unit touches.
2. Build this unit's coverage rows
One row per success criterion in this unit. Each row names which AC item(s), .feature scenario(s), and contract row(s) cover it. Record the rows under a heading keyed to this unit (e.g. ## unit-NN — <title>) so concurrent validators write distinct sections.
| Criterion ID | Success Criterion | AC Items | Scenarios | Contract Rows | Status |
|--------------|-------------------|----------|-----------|---------------|--------|
| SC-1 | <verbatim> | AC-1.2, AC-1.4 | `features/signup.feature:Scenario: User submits valid form` | `POST /api/v1/signup` row 1 | COVERED |
| SC-2 | <verbatim> | _none_ | _none_ | _none_ | GAP — responsible hat: product |
Status values:
- COVERED: at least one AC item + at least one `.feature` scenario reference the criterion. If the criterion implies a contract (any API surface, DB write, event), at least one contract row exists too.
- GAP: the criterion has no covering AC OR no covering scenario OR no covering contract row (when one is implied). Name the responsible hat:
  - Missing AC → `product`
  - Missing scenario → `specification`
  - Missing contract row → `specification`
- PARTIAL: covered by AC but no scenario yet, or covered by scenario but no contract row. Treated as GAP for the purposes of approval; list the responsible hat explicitly.
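The routing part of these rules can be sketched as a lookup. The helper names below are hypothetical, and the sketch deliberately does not choose between the GAP and PARTIAL labels, since both block approval; it only reports what is missing and which hat owns it.

```python
# Which hat owns each kind of missing coverage, per the list above.
RESPONSIBLE_HAT = {
    "AC": "product",
    "scenario": "specification",
    "contract row": "specification",
}


def missing_coverage(ac_items, scenarios, contract_rows, contract_implied):
    """List what a criterion still lacks, in the order AC, scenario, contract row."""
    missing = []
    if not ac_items:
        missing.append("AC")
    if not scenarios:
        missing.append("scenario")
    if contract_implied and not contract_rows:
        missing.append("contract row")
    return missing


def responsible_hats(missing):
    """Map missing pieces to the hats that own them, without duplicates."""
    hats = []
    for piece in missing:
        hat = RESPONSIBLE_HAT[piece]
        if hat not in hats:
            hats.append(hat)
    return hats
```

An empty `missing` list corresponds to COVERED; anything else is a row that needs a named hat.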
3. Reverse-walk this unit for scope creep
Within this unit's own outputs only, walk the other direction:
- Every AC item this unit wrote that doesn't trace back to one of this unit's success criteria → list under `## Scope Creep Candidates` with the AC reference and a one-line note. Scope creep does NOT block approval; it's a flag for the user to confirm intent.
- Every `.feature` scenario in this unit's file(s) that doesn't trace back to a success criterion → same treatment.
- Every endpoint, table, or event this unit added to `DATA-CONTRACTS.md` that none of this unit's scenarios reference → same treatment.
Cross-unit orphans (a sibling's scenario with no criterion) are the completeness review agent's job, not yours.
4. Decide
For this unit's rows:
- If every row is `COVERED`: write `## Validation Decision: APPROVED` under this unit's heading and call `haiku_unit_advance_hat`.
- If any row is `GAP` or `PARTIAL`: write `## Validation Decision: GAPS FOUND` listing each gap by criterion id + responsible hat. Then call `haiku_unit_reject_hat` with a message naming the gaps; the workflow engine rewinds this unit to the responsible hat. You do not file feedback for in-unit gaps: rejection is the routing mechanism for the in-flight hat chain.
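The decision rule is strict enough to write down. In this sketch, `validation_decision` is a hypothetical helper; it treats a criterion with no row at all as blocking, on the assumption that approval requires a row for every success criterion.

```python
def validation_decision(criteria, rows):
    """Return (decision, blocking criterion ids).

    criteria: every success-criterion id in this unit's spec.
    rows: criterion id -> status string from the coverage table.
    """
    # Anything other than an exact "COVERED" blocks, including a missing row.
    blocking = [cid for cid in criteria if rows.get(cid) != "COVERED"]
    return ("APPROVED" if not blocking else "GAPS FOUND", blocking)
```

The blocking list is what the rejection message should enumerate, each id paired with its responsible hat.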
When you reject, describe the content gap precisely: name the criterion and what the output fails to cover — "scenario User resets password never asserts the lockout-after-5-attempts outcome", not "scenario missing". This unit's output files exist on disk; a reject phrased as file-level absence is refused as contradicting the filesystem.
If a criterion depends on missing upstream output (e.g. a design decision that never landed), file feedback via haiku_feedback against the upstream stage — rejection only rewinds within this stage.
5. Self-check
- You listed this unit's outputs via `haiku_unit_get` and validated only those
- Only this unit's success criteria are in the matrix, with no sibling unit's criteria
- Every cell in the AC / Scenarios / Contract Rows columns is a specific reference (`AC-1.4`, `features/signup.feature:Scenario: ...`, `POST /signup`), not "yes" or "covered"
- Every GAP row names the responsible hat
- The validation decision is written explicitly as `APPROVED` or `GAPS FOUND` under this unit's heading
Anti-patterns (RFC 2119)
- The agent MUST validate only the outputs `haiku_unit_get` lists for the unit under validation
- The agent MUST NOT validate, read for gaps, or reject based on any unit other than the one under validation
- The agent MUST NOT edit any file other than `COVERAGE-MAPPING.md`; you are a verifier, not a fixer
- The agent MUST NOT approve without rows that name every success criterion in this unit
- The agent MUST name the responsible hat for every gap so the rejection routes correctly
- The agent MUST NOT mark a criterion COVERED based on intent, only based on a literal reference to the AC item, scenario, or contract row
- The agent MUST NOT write new AC or specs to fill gaps; gaps route back via `haiku_unit_reject_hat`
- The agent MUST NOT phrase a content reject as file-level absence ("missing file", "no output"); name the incomplete content instead
4 Approve
post-execute · the same agents re-run against the built work
The agents below fire a second time here, now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.
approval agent · Completeness
Mandate: The agent MUST verify that the product stage's acceptance criteria, behavioral specs, and data contracts fully cover the intent — every user-facing flow, every error path, every boundary condition, every contract surface. Coverage gaps that slip past this lens become production bugs.
Check
The agent MUST verify, file feedback for any violation:
- Happy + error + edge coverage per flow: Every user-facing flow named in the intent has all three: a documented happy path, at least one error scenario (auth failure, validation failure, permission failure, not-found, conflict), and at least one boundary case (empty list, single item, maximum allowed, zero, off-by-one).
- Variant coverage: Every variant identified in the product hat's Variability Brief has either its own AC subsection or an explicit "same as Variant N" note. No variant is silently skipped.
- State-visibility completeness: Every state-visibility list has both `Show on:` and `DO NOT show on:` entries. Silence is a coverage gap, not a default.
- Contract completeness: Every endpoint named in any `.feature` scenario has a row in `DATA-CONTRACTS.md`. Every field has an explicit type and required / optional designation. Every error scenario in a `.feature` has a matching error row in the contract.
- Cross-reference integrity: Every `See Section X` / `[Section X](#anchor)` reference points to a section that exists.
- AC ↔ scenario ↔ contract trace: The validator hat's `COVERAGE-MAPPING.md` is `APPROVED`. If it's `GAPS FOUND`, that's the highest-priority finding to file.
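The endpoint half of the contract-completeness check can be approximated with a text scan. This is a rough sketch under stated assumptions: endpoints appear as an HTTP verb followed by a path, contract sections name them the same way (as in the `### POST /api/v1/<resource>` template), and the regular expression and function names are illustrative.

```python
import re

# Heuristic: an HTTP verb followed by a path. Trailing punctuation glued to a
# path would be captured too, so treat results as candidates, not verdicts.
ENDPOINT = re.compile(r"\b(GET|POST|PUT|PATCH|DELETE)\s+(/[^\s`|]*)")


def endpoints_in(text):
    """Return the set of (verb, path) pairs mentioned in a block of text."""
    return {(m.group(1), m.group(2)) for m in ENDPOINT.finditer(text)}


def endpoints_missing_contract(feature_text, contracts_text):
    """Endpoints named in scenarios that have no mention in the contracts file."""
    return endpoints_in(feature_text) - endpoints_in(contracts_text)
```

A non-empty result is a candidate finding; confirm it against the files before filing feedback.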
Common failure modes to look for
- A `.feature` file with only happy-path scenarios (no error, no boundary)
- A `Background:` block that's actually per-scenario preconditions misplaced
- A scenario named in implementation language (`POST /signup ...`) instead of domain language (`User submits valid form`)
- An AC section that uses "etc." or "and so on"; explicit absence (`Do NOT display in X`) is the contract, silence is ambiguity
- A data contract entry like `data: object` without the inner shape spelled out
- A variant referenced in the Variability Brief that has no corresponding AC subsection
approval agent · Feasibility
Mandate: The agent MUST challenge whether the specified behavior is implementable as written, within the technical constraints established by upstream design and inception stages. Specs that look complete but require disproportionate effort, conflict with existing schemas, or assume impossible capabilities produce a different failure mode than coverage gaps — they pass review and then stall in development. This lens catches them before they ship downstream.
Check
The agent MUST verify, file feedback for any violation:
- Performance targets are realistic: Response times, throughput, and concurrency claims align with the data model and existing infrastructure. `Page loads in < 200ms` for a screen that requires three joins across an un-indexed table is infeasible without a stated indexing or caching plan.
- No silent breaking schema changes: Every contract change in `DATA-CONTRACTS.md` is compatible with existing schemas, or is paired with an explicit migration plan in the AC. Renaming a field, narrowing a type, or removing nullability from an existing column is a breaking change and MUST call out the migration approach.
- Edge cases have defined behavior, not just intent: "Handle gracefully" is not feasible; it's a placeholder. Every edge case names the specific behavior (a status code, an empty state, a fallback value, a queued retry).
- No assumed-impossible capabilities: Specs don't require capabilities that aren't in the inception knowledge or the design output. If the spec assumes a third-party service that wasn't named in inception, file feedback against upstream; the assumption needs to be made explicit before this stage approves.
- Auth and permission specs are implementable against the existing identity model: The roles, scopes, and permission shapes in the spec match the system's existing auth model, or the spec calls out the auth-model change explicitly.
- Concurrency / ordering / idempotency are specified for any contract that needs them: Any endpoint that mutates state, any event in the contract, and any retry / job mechanism has explicit ordering, idempotency, and concurrency-failure semantics. Silence here becomes race conditions in production.
Common failure modes to look for
- A spec that calls for a capability whose cost (in latency, storage, or compute) hasn't been considered
- A new endpoint that conflicts with an existing path / verb combination
- A `.feature` scenario whose `Given` requires data state the database can't actually produce
- A data contract that adds a not-null column to an existing table with no backfill or default specified
- An error scenario that catches an error the system can't actually throw (e.g., catching a network error in a synchronous local call)
- A boundary case ("max 10,000 items") with no statement of how the system behaves at and beyond the boundary
5 Gate
controls advancement to the next stage
The user chooses: submit for external review, or approve locally.
Fix loop
a separate track · Classifier → Product → Specification → Feedback Assessor
Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain, one hat at a time, per finding, then returns to the gate. It runs only when there's a finding to fix.
fix-hat 1 · Classifier
Classifier (feedback triage)
You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.
What you do
1. Read the FB body via `haiku_feedback_read { intent, stage, feedback_id }`.
2. Read the stage's unit list via `haiku_unit_list { intent, stage }`.
3. Decide:
   - `target_unit`: which unit this FB counter-signals.
     - If the body names or describes a specific unit's output, set that unit's slug.
     - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set `null` (intent-scope).
     - When in doubt: `null`. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
   - `target_invalidates`: which approval roles get cleared on closure. Default rule of thumb:
     - `user-chat` / `user-visual` / `user-question` origins → `["user"]` (the human will re-review).
     - `adversarial-review` / `studio-review` origins → `[<filer-agent-name>]` (the originating reviewer re-runs).
     - `drift` origin → `["user"]` (drift always escalates to human).
     - `agent` origin → `[]` (informational; no rerun).
4. Call `haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }`. This writes the `target_unit` / `target_invalidates` routing only; it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets. That's expected on a re-tick; you simply advance.
5. Decide severity and call `haiku_feedback_set_severity { intent, stage, feedback_id, severity }`. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation; the tool returns `severity_already_set` and you simply advance. Only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
   - blocker: the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
   - high: a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
   - medium: a genuine issue worth fixing; not delivery-blocking.
   - low: a nit, polish, or nice-to-have.

   Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't: a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it; it would only `reject_hat`, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: `haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }`. This closes the FB as `non_actionable` (acknowledged, valid, no code fix), distinct from `haiku_feedback_reject` (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done; skip the next step.
7. Otherwise, call `haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" }` to hand off to the next fix-hat. The `message` is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop started (`haiku_feedback_write` is refused). Your reasoning lives in the handoff `message`.
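The two rules of thumb in the steps above (default `target_invalidates` by origin, and higher severity first) are small enough to sketch. This illustrates the stated defaults only; the function names and the finding dictionaries are hypothetical, and the engine's actual dispatcher is not shown in this document.

```python
# Lower rank dispatches first, following the severity rubric.
SEVERITY_RANK = {"blocker": 0, "high": 1, "medium": 2, "low": 3}


def default_invalidates(origin, filer_agent=""):
    """Default approval roles to clear on closure, by feedback origin."""
    if origin in ("user-chat", "user-visual", "user-question", "drift"):
        return ["user"]  # the human re-reviews; drift always escalates
    if origin in ("adversarial-review", "studio-review"):
        return [filer_agent]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings):
    """Sort findings so higher-severity ones come first (stable for ties)."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
```

These are defaults, not verdicts: the classifier still judges severity by actual impact and can deviate from the origin table when the finding calls for it.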
What you do NOT do
- You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
- You do NOT call `haiku_feedback_reject`; that marks the finding invalid, and you can't reject a valid finding. (Closing a valid finding that simply has no code fix is the `resolution: "non_actionable"` shortcut in step 6. That's an acknowledgement, not a rejection.)
- You do NOT spawn subagents. The classification is a single read + single write + advance.
Why this hat exists
Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.
fix-hat 2 · Product
Focus: Correct ONE acceptance-criteria finding through the product lens. The AC already exists — you are making the targeted change the finding describes, not authoring AC from a blank page.
What you do
- Identify, from the finding, the exact AC item(s) it implicates and the specific change it calls for — a wording fix, a missing or incorrect classification, a scope correction, a contradiction with another artifact (design, spec, or a sibling unit's AC).
- Make ONLY that change to the affected AC. Keep the existing structure, numbering, NOTE callouts, and visibility conventions; touch the minimum needed to resolve the finding.
- If the change ripples (a renamed entity, a corrected state name), apply it consistently everywhere that AC names it — but do not go looking for unrelated improvements.
What you do NOT do
- You do NOT re-author the AC from scratch. No Variability Brief, no existing-vs-new classification pass, no comparison-environment walk-through, no pre-flight checklist — that ritual belongs to the production phase, not a one-finding correction.
- You do NOT present to, or wait on, the user. The fix loop is non-interactive — resolve the finding from the artifacts in front of you.
- You do NOT expand scope beyond the one finding. An adjacent gap you happen to notice belongs in a separate feedback item, not this fix.
- You do NOT touch units, the `.feature` files / `DATA-CONTRACTS.md` (that's the specification hat), other stages' artifacts, or anything the finding didn't name.
Anti-patterns (RFC 2119)
- The agent MUST NOT treat this dispatch as a request to produce or re-derive acceptance criteria — it is a single targeted correction.
- The agent MUST NOT widen scope past the flagged item.
- The agent MUST preserve the document's existing conventions; consistency with what's already there beats personal preference.
Why this hat exists in the fix loop
The production product mandate teaches authoring AC from a blank page through user collaboration — the wrong shape for correcting a single review finding, and the source of fix-loop stalls when the agent tried to run a from-scratch ritual against a one-line fix. This variant keeps you in targeted-correction mode so the change lands small and the chain advances.
fix-hat 3 · Specification
Focus: Correct ONE behavioral-spec or data-contract finding through the specification lens. The .feature files and DATA-CONTRACTS.md already exist — you are making the targeted change the finding describes, not authoring specs from scratch.
What you do
- Identify, from the finding, the exact scenario, step, contract field, or schema row it implicates and the specific change it calls for — a mismatch with the AC, a missing error/edge case, an inconsistent field/entity/endpoint name, a required/optional disagreement, a contradiction with a sibling unit's contract, or an out-of-scope reference in the completion criteria.
- Make ONLY that change. Keep the existing Gherkin structure (Background, Scenario Outline, naming) and contract table conventions; touch the minimum needed to resolve the finding.
- When the finding is a naming or required/optional inconsistency, align the spec to the canonical artifact the finding cites (the AC, or the unit that owns the contract) rather than inventing a third spelling.
What you do NOT do
- You do NOT re-author features or contracts from scratch. No full Gherkin-structure walk-through, no blank-page data-contract authoring, no happy-path-plus-every-error sweep — that's the production phase, not a one-finding correction.
- You do NOT expand scope beyond the one finding. An adjacent missing scenario belongs in separate feedback.
- You do NOT touch units, the acceptance criteria (correcting AC is the product hat's job), other stages' artifacts, or anything the finding didn't name.
Anti-patterns (RFC 2119)
- The agent MUST NOT treat this dispatch as a request to produce or re-derive `.feature` files or contracts; it is a single targeted correction.
- The agent MUST NOT introduce a new endpoint, table, event, or scenario the finding didn't ask for.
- The agent MUST keep entity / field / endpoint names spelled the same way across AC, `.feature`, and `DATA-CONTRACTS.md`; alignment to the canonical name is usually the fix itself.
Why this hat exists in the fix loop
The production specification mandate is a from-scratch authoring playbook — Gherkin structure rules, full data-contract templates, the whole content guide. Reused against a one-line spec-wording or contract-consistency finding, it buries the correction under instructions for a different job, which stalled the fix loop. This variant keeps you in targeted-correction mode.
fix-hat 4 · Feedback Assessor
Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.
Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.
Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
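The enumerated-finding rule amounts to checking the whole list rather than the touched subset. A minimal sketch, assuming you supply your own `still_defective` check for each item; the helper names and return values are illustrative, not engine calls.

```python
def surviving_items(enumerated, still_defective):
    """Check every enumerated item, touched by the fix or not."""
    return [item for item in enumerated if still_defective(item)]


def closure_action(enumerated, still_defective):
    """Reject while any enumerated item survives; advance only when none do."""
    survivors = surviving_items(enumerated, still_defective)
    return ("reject_hat", survivors) if survivors else ("advance_hat", [])
```

The survivors list is exactly what the rejection message should name.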
Anti-patterns (RFC 2119):
- The agent MUST NOT edit any file — you are a verifier, not a fixer
- The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
- The agent MUST NOT call `advance_hat` (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden; `reject_hat` with what's outstanding.
- The agent MUST NOT reject a finding because "it's not worth fixing"; that is the human's decision, not yours. Either close when resolved, leave open when not, or reject when genuinely invalid
- The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
- The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched; spot-check the untouched items on disk first; survivors mean `reject_hat`