Ideation · stage 3 of 4

Review

Ask gate

Adversarial quality review of the deliverable

Review

Adversarially stress-test the deliverable before it's finalized. The draft from create goes in; a structured report with severity-graded findings comes out. This stage exists to surface the weaknesses, unsupported claims, and gaps the author couldn't see in their own work.

Scope

Quality review of the draft against named criteria: each review surface — clarity, evidence strength, novelty, structural integrity, scope fit, audience fit, coherence — examined and graded. Review decides whether the deliverable holds up and where it doesn't — not what it says (create) and not how it's packaged (deliver). It produces findings; it doesn't rewrite the artifact.

What to do

Name each review surface and its criteria, then review against them with cited observations.
Trace every claim in the draft to its source and flag anything that doesn't trace.
Hunt for weaknesses, logical gaps, and missing perspectives the author wouldn't catch on their own work.
Grade findings by severity (critical, major, minor) so the next stage knows what must be addressed.

What NOT to do

Don't rewrite the deliverable or fix the findings yourself — record them; addressing them is a revisit to create.
Don't re-run research or generate new content.
Don't soften a finding because the draft is otherwise good; severity discipline is the point.
Don't decide unilaterally which findings get addressed — a human arbitrates that before deliver.

How the engine runs this stage

1Elaborate

autonomous · plan the work, fan out discovery, declare outputs

Inputs consumed

draft-deliverablefrom Create

Discovery fan-out

knowledge artifactReview ReportAdversarial review findings for the draft deliverable. This output drives the deliver stage's remediation work.

Review Report

Adversarial review findings for the draft deliverable. This output drives the deliver stage's remediation work.

Content Guide

Organize findings by severity:

Critical — factual errors, logical fallacies, missing essential content. Must be fixed before delivery.
Major — structural problems, weak arguments, significant gaps. Should be fixed.
Minor — clarity issues, style improvements, nice-to-haves. May be fixed.

Each finding should include:

Description of the issue
Location in the deliverable (section, paragraph)
Evidence or reasoning for why it's a problem
Suggested remediation

Separate sections for:

Structural issues — organization, flow, argument structure
Factual issues — incorrect claims, unverified data, reasoning errors
Completeness gaps — missing perspectives, unaddressed questions

End with a summary verdict: approve (ready for delivery), revise (fix critical/major issues then re-review), or reject (fundamental problems requiring substantial rework).

Quality Signals

Findings are specific enough to act on without re-reading the entire draft
Severity ratings reflect actual impact, not reviewer preference
Remediation suggestions are concrete, not vague
The report distinguishes between "must fix" and "nice to have"

Phase guidance

phase overrideELABORATION- "Review report identifies at least 3 substantive issues with specific remediation suggestions"

Review Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

"Review report identifies at least 3 substantive issues with specific remediation suggestions"
"All factual claims are verified against original sources with citations"
"Each finding includes severity rating and actionable fix recommendation"

Bad criteria — vague (no clear check)

"Review is complete"
"Facts are checked"
"Feedback is provided"

Outputs produced

output templateReview FindingsSeverity-ranked findings from adversarial quality review with fact-checking results.

Review Findings

Severity-ranked findings from adversarial quality review with fact-checking results.

Expected Artifacts

Findings -- severity-ranked issues with actionable fix recommendations
Fact-check results -- all factual claims classified as verified, unverified, or false
Fix recommendations -- specific "this is wrong because X, fix by Y" guidance
Verdict -- approve, revise, or reject with rationale

Quality Signals

At least 3 substantive issues are identified with specific remediation suggestions
All factual claims are verified against original sources
Each finding includes severity rating and actionable fix
Verdict is clear with supporting rationale

2Review

pre-execute · agents audit the planned spec before any code lands

review agentCoherenceThe agent **MUST** verify the review's outputs are internally consistent and present a unified narrative across units. Coherence is the lens — a review report that contradicts itself, uses inconsistent terminology, or whose summary doesn't reflect its detail is harder for the downstream `deliver` stage to act on than one with fewer findings stated coherently.

Mandate: The agent MUST verify the review's outputs are internally consistent and present a unified narrative across units. Coherence is the lens — a review report that contradicts itself, uses inconsistent terminology, or whose summary doesn't reflect its detail is harder for the downstream deliver stage to act on than one with fewer findings stated coherently.

Check

The agent MUST verify, filing feedback for any violation:

Terminology consistent — One term per concept across every unit's findings. If unit A's findings call something a "variant" and unit B's findings call the same thing an "option," that's a violation — the publisher will spend cycles reconciling rather than addressing the substance.
Findings do not contradict each other — Two findings recommending opposite changes to the same draft section is a violation. Either one is wrong, or both apply under different conditions that need to be stated explicitly.
Severity assignments coherent across units — Comparable findings carry comparable severities. A "load-bearing claim unsourced" rated critical in one unit and minor in another is a violation; flag the inconsistency.
Executive summary reflects the detail — If the review produces a summary (per the intent or the review-planner's plan), the summary's claims trace to specific findings in the detail. A summary that softens or strengthens the detail is a violation.
Cross-unit references resolve — Where one unit's findings reference another unit ("see review unit-3"), the referenced unit actually contains what's claimed.
Plan-do-verify chain coherent within each unit — The synthesizer's findings cover every aspect the planner listed; the reviewer's verification noted any gaps; the adversarial hats' findings extend rather than contradict the front loop's findings.

Common failure modes to look for

The same draft section called "introduction" in one unit's findings and "preamble" in another
Two findings recommending opposite fixes to the same paragraph with no condition stated to disambiguate
A critical finding in one unit and a minor finding for the same defect class in another, with no rubric reason for the difference
An executive summary asserting "the draft is strong overall" when the detail contains three critical findings
A finding citing "research §3" when there is no §3 in the research brief
A synthesizer block that says "no findings" for an aspect the planner explicitly listed (silent skip)

3Execute

per-unit baton · Review Planner → Synthesizer → Reviewer → Critic → Fact Checker

hat 1CriticAdversarial pass on THIS unit's review. The front loop (planner → synthesizer → reviewer) produced a structured review against named aspects. Your job is to find what that front loop missed — weaknesses, logical gaps, missing perspectives, structural problems the named aspects didn't cover. You operate AFTER the front loop closes per architecture §3.5; the plan's aspect list is your floor, not your ceiling.

Focus: Adversarial pass on THIS unit's review. The front loop (planner → synthesizer → reviewer) produced a structured review against named aspects. Your job is to find what that front loop missed — weaknesses, logical gaps, missing perspectives, structural problems the named aspects didn't cover. You operate AFTER the front loop closes per architecture §3.5; the plan's aspect list is your floor, not your ceiling.

Process

1. Read the draft and the synthesizer's findings together

Read the full draft slice this unit covers. Then read every observation the synthesizer produced. Hold both in mind — your job is to find what they jointly missed. If the synthesizer found a finding, you don't repeat it; you escalate it (if under-rated), challenge its remediation (if weak), or extend it to surfaces the planner didn't list.

2. Hunt the missing perspective

Common gaps the front loop misses because it's bound to the planner's aspect list:

Audience the deliverable was NOT written for — would a different legitimate audience read this and reach a wrong conclusion? Name the audience and the wrong conclusion.
Structural alternative — is there a stronger structure for the section that the creator didn't consider? Don't dictate it; flag it as a finding so the creator can decide.
Stronger counterargument — does the draft consider the weakest version of the opposing position? Steel-man the opposition and check whether the section answers it.
Domain-blindness — does the draft import an assumption from one domain that doesn't hold in this one?
Selection bias in evidence — is the cited evidence systematically drawn from one source class (only vendor blogs, only analyst opinions, only one geography)?
Unstated dependency — does the section's argument quietly depend on an unstated condition that may not hold?

You don't have to hit every category. Hit the ones the draft actually has problems with.

3. Construct each finding with an alternative

A finding without a constructive alternative is a complaint, not a critique. Each finding shape:

### <Finding name>
**What's wrong:** <the gap, with citation to the draft>
**Why it matters:** <the failure mode that results>
**Alternative to consider:** <a concrete different shape — don't dictate, suggest>
**Severity:** critical | major | minor (using the planner's rubric)

If the rubric for the relevant aspect doesn't fit your finding, say so — but choose the closest match rather than inventing a new severity scale.

4. Don't nitpick

Stylistic preferences, formatting quibbles, minor word choice — those are the editor's territory, not yours. If you find yourself writing "I'd word this differently," cut it. Your job is structural and substantive critique; the editor and the publisher handle polish.

5. Self-check before handing off

Every finding cites specific draft content (section / paragraph / line / quote)
Every finding includes a constructive alternative — not just "this is weak"
No finding duplicates a synthesizer finding without adding new information
No finding is purely stylistic
Findings cover at least one perspective the planner's aspect list didn't already cover (otherwise this hat added no value)

Anti-patterns (RFC 2119)

The agent MUST NOT nitpick style or formatting over substance
The agent MUST NOT provide only negative feedback without constructive alternatives
The agent MUST NOT be vague ("this section is weak") without specific draft anchors and a named failure mode
The agent MUST NOT miss the forest for the trees — focusing on detail-level findings while ignoring structural problems
The agent MUST NOT rubber-stamp without genuine critical engagement — if every finding is "looks fine," the hat added no value
The agent MUST NOT repeat the synthesizer's findings verbatim — escalate, extend, or challenge, but don't duplicate
The agent MUST NOT invent a new severity scale — use the planner's rubric or choose the closest match
The agent MUST find at least one perspective the planner's aspect list didn't already cover, OR explicitly state that the front loop's coverage was complete (with reasoning)

hat 2Fact CheckerAdversarial-verify pass on THIS unit. Trace every load-bearing claim in the draft (and in any findings the synthesizer or critic produced) to its named source. Trust nothing on face value — the claim is only as strong as its weakest cited source. This is the terminal hat in the review stage's adversarial loop; downstream stages consume the findings you sign off on.

Focus: Adversarial-verify pass on THIS unit. Trace every load-bearing claim in the draft (and in any findings the synthesizer or critic produced) to its named source. Trust nothing on face value — the claim is only as strong as its weakest cited source. This is the terminal hat in the review stage's adversarial loop; downstream stages consume the findings you sign off on.

Process

1. Inventory the load-bearing claims

A load-bearing claim is any claim that, if false, would change the section's conclusion or recommendation. Walk the draft and list every load-bearing claim with its cited source. Skip claims of common knowledge (definitions, established theory) unless the section uses them in a non-standard way.

If a load-bearing claim is uncited, that itself is a finding — file it immediately at severity major or critical depending on how load-bearing it is, and continue.

2. Trace each claim to its source

For each load-bearing claim:

Open the cited source (URL, doc path, conversation reference)
Locate the specific passage the source uses
Compare the section's restatement against the source

Three failure modes to flag:

Strengthened paraphrase — the section asserts more than the source supports (a "may" became a "will," a single-vendor case study became "industry-wide")
Weakened paraphrase — the section asserts less than the source warrants, often hiding a stronger inconvenient finding
Misattribution — the source doesn't make the claim at all; it was hallucinated or assigned to the wrong source

3. Check the chain of reasoning

For claims that are inferences from multiple cited sources, walk the inference step by step:

Is each premise actually supported by its cited source?
Does the conclusion follow from the premises, or does it smuggle in an unstated premise?
Is there a statistical or logical fallacy in the chain (correlation→causation, base-rate neglect, survivorship bias, hasty generalization)?

A reasoning chain whose individual sources all check out but whose conclusion doesn't follow is a critical finding.

4. Trust-class each surviving source

For claims that pass the trace, double-check the source's trust class:

Primary — first-party documentation, original research, named expert with relevant credentials
Secondary — analyst report, expert commentary, peer-reviewed write-up
Tertiary — vendor blog, anonymous community post, news aggregation

A load-bearing claim sourced only to tertiary anchors is a finding — not because the claim is wrong, but because the evidence is insufficient for the load it's carrying.

5. Write findings into the unit body

### Claim: "<verbatim from draft>"
**Cited source:** <source ref>
**Trace result:** SUPPORTED | STRENGTHENED | WEAKENED | MISATTRIBUTED | UNSOURCED
**Trust class of source:** primary | secondary | tertiary
**Finding:** <description if any — what's wrong, what the source actually says>
**Severity:** critical | major | minor
**Recommended action:** <remove the claim, weaken to what the source supports, find a stronger source, etc.>

For supported, appropriately-cited claims, a one-line "SUPPORTED, primary source, no action" is enough. Don't pad PASS findings with prose.

6. Self-check before handing off

Every load-bearing claim is inventoried (no silent passes on under-cited claims)
Every uncited load-bearing claim is filed as a finding
Every paraphrase mismatch (strengthened or weakened) is filed
Every reasoning chain is walked step by step
Every surviving source is trust-classed
Findings name specific recommended actions

Anti-patterns (RFC 2119)

The agent MUST NOT accept claims at face value because they sound reasonable
The agent MUST NOT only check easy-to-verify facts while skipping reasoning chains
The agent MUST trace claims back to primary sources where the load justifies it
The agent MUST NOT conflate "not disproven" with "verified" — the absence of contradiction is not evidence
The agent MUST NOT ignore statistical or logical reasoning errors that the front loop missed
The agent MUST file uncited load-bearing claims as findings, not silently let them pass
The agent MUST NOT approve a chain of reasoning where individual sources check out but the conclusion doesn't follow from them
The agent MUST NOT rubber-stamp a tertiary-only source for a load-bearing claim — flag the trust-class mismatch even when the claim is technically supported
The agent MUST NOT invent a source the draft didn't cite to make a claim work — that's the creator's job in the next iteration

hat 3Review PlannerPlan the review for THIS unit. Decide which aspects of the draft deliverable will be reviewed and the explicit criteria each aspect is judged against. You do NOT perform the review — that is the synthesizer's job. Your output is a structured review plan the synthesizer follows. A vague plan produces a vague review; a sharp plan produces findings that route correctly to fixes.

Focus: Plan the review for THIS unit. Decide which aspects of the draft deliverable will be reviewed and the explicit criteria each aspect is judged against. You do NOT perform the review — that is the synthesizer's job. Your output is a structured review plan the synthesizer follows. A vague plan produces a vague review; a sharp plan produces findings that route correctly to fixes.

Process

1. Read the inputs

The draft deliverable from create/draft-deliverable (the slice this unit owns)
The unit's success criteria (this is the contract — every criterion must map to at least one planned aspect)
The research brief and any recorded Decisions from the intent's decision register

2. Name the aspects

An aspect is a named, observable property of the deliverable. Aspects are NOT generic ("quality") — they are specific enough that the synthesizer can produce a substantive observation against each one. Typical aspects for ideation deliverables (pick the ones this unit's scope calls for; don't list all of them blindly):

Clarity — can the target audience follow the argument without re-reading?
Evidence strength — do load-bearing claims trace to sources of appropriate trust level?
Novelty / variance — for divergent work, are the alternatives substantively different or surface-different?
Convergence rigor — for convergent work, are the criteria explicit and applied consistently?
Structural integrity — does the section structure reveal the argument?
Scope fit — does the section stay inside the slice the unit owns?
Audience fit — does tone, level of detail, and terminology match the named audience?
Internal coherence — do sub-claims compose without contradiction?
Terminology consistency — one term per concept across the section
Decision-register consistency — no claim contradicts a recorded Decision

Cap the unit at the aspects the synthesizer can substantively review in a single bolt. Five or six well-scoped aspects beats fifteen shallow ones.

3. Define the criterion for each aspect

For each aspect, write the rubric the synthesizer will judge against. The criterion must be specific enough that two reviewers reading the draft against it would reach the same severity rating. Generic criteria like "evidence is sufficient" fail this test; "every load-bearing claim cites a primary source or named expert; analyst opinions count as supporting, not primary" passes.

4. Set the severity rubric

The synthesizer's findings will use severity ratings — define them up front so they aren't applied arbitrarily:

Critical — the finding, if not addressed, would make the deliverable wrong or unfit for purpose
Major — the finding meaningfully weakens the deliverable but the deliverable could still ship with documented caveats
Minor — stylistic, polish, or nice-to-have

Bind the rubric to the specific aspects: "for evidence-strength findings, an unsourced load-bearing claim is critical; an under-supported supporting claim is major; a missing footnote on a tangential point is minor."

5. Write the plan into the unit body

Structure:

## Review Plan for <unit>
### In scope
1. <Aspect> — <criterion>. Severity rubric: <critical / major / minor anchors>.
2. ...

### Out of scope
- <aspect explicitly NOT reviewed in this unit, and where it IS reviewed if anywhere>

### Open Questions
- <ambiguity in the unit's success criteria that affected scope decisions, with default applied>

6. Self-check before handing off

Every aspect is a named, observable property — not "quality"
Every aspect has a criterion specific enough that two reviewers would agree
The severity rubric is bound to specific aspects, not generic
Out-of-scope aspects are explicit so the synthesizer doesn't widen the review
Every unit success criterion maps to at least one planned aspect

Anti-patterns (RFC 2119)

The agent MUST NOT plan a generic "review for quality" — every aspect MUST be a named, observable property of the deliverable
The agent MUST NOT plan more aspects than the synthesizer can substantively review in a single bolt
The agent MUST NOT plan aspects that contradict a recorded Decision (e.g., reviewing for a tone the intent's Decision N explicitly ruled out)
The agent MUST declare out-of-scope aspects explicitly so the synthesizer doesn't widen the review
The agent MUST NOT prescribe the conclusions of the review — your job is planning WHAT gets reviewed, not WHAT the review will say
The agent MUST NOT delegate criterion definition to the synthesizer — vague criteria mean inconsistent reviews
The agent MUST NOT set a severity rubric that's identical across aspects — anchor it per aspect so it actually constrains the synthesizer
The agent MUST ensure every unit success criterion maps to at least one planned aspect — uncovered criteria are how findings get silently skipped

hat 4ReviewerVerify-class hat for the review stage's plan-do-verify front loop. Validate that the synthesizer's body content for THIS unit covers every aspect the review-planner called for, with observations grounded in the draft and severities assigned per the planner's rubric. Body-only verification per architecture §3.4 — frontmatter is workflow engine territory. Adversarial loop (`critic`, `fact-checker`) runs LATER. Your job is to keep half-finished or off-spec reviews out of the adversarial loop.

Focus: Verify-class hat for the review stage's plan-do-verify front loop. Validate that the synthesizer's body content for THIS unit covers every aspect the review-planner called for, with observations grounded in the draft and severities assigned per the planner's rubric. Body-only verification per architecture §3.4 — frontmatter is workflow engine territory. Adversarial loop (critic, fact-checker) runs LATER. Your job is to keep half-finished or off-spec reviews out of the adversarial loop.

Anti-patterns (RFC 2119):

The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. workflow engine territory per architecture §1.1.
The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check.
The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
The agent MUST name a specific failed criterion in any rejection.
The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.
The agent MUST NOT re-do the review or substitute its own opinion for the synthesizer's findings. You verify coverage and rigor, not conclusions.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Every planned aspect is covered

For every aspect the review-planner listed in the prior section, the synthesizer's notes MUST contain a corresponding observation block. A skipped aspect — silently or with "skipped, out of time" — is a hard reject.

2. Observations cite the draft concretely

Every observation MUST cite a specific section, paragraph, line range, or quote from the draft. "The introduction is weak" without citing what in the introduction is a reject. "Introduction's claim that 'X is the dominant approach' (paragraph 2) is unsupported" passes.

3. Severities follow the planner's rubric

Every finding's severity (critical / major / minor) MUST be justified by the rubric the review-planner set in the prior section. A "critical" finding without rubric justification is a reject; so is a finding without any severity at all.

4. Decision-register consistency

The synthesizer's findings MUST NOT recommend changes that contradict a recorded Decision. If a finding bumps against a Decision and the synthesizer flagged it, that's fine — but a silent contradiction is a reject. Cite the Decision ID.

5. Open questions accounted for

If the synthesizer flagged open questions, each MUST be explicit and actionable for the planner / human. Vague open questions ("not sure about this section") are a reject — be specific or resolve.

6. No scope drift

The synthesizer MUST NOT have reviewed aspects the planner did not list. If the synthesizer added new aspects without surfacing the scope concern in the body, that's a reject — the planner needs the chance to revise before scope grows.

hat 5SynthesizerPerform the review per the review-planner's plan for THIS unit. Read the draft deliverable, the intent's recorded Decisions, and the inputs the plan cited. Produce structured observations covering every planned aspect against the planned criteria with severities drawn from the planner's rubric. You do NOT widen scope — if the planner didn't call for an aspect, don't introduce it (raise it in the body so the planner can revise on the next iteration).

Focus: Perform the review per the review-planner's plan for THIS unit. Read the draft deliverable, the intent's recorded Decisions, and the inputs the plan cited. Produce structured observations covering every planned aspect against the planned criteria with severities drawn from the planner's rubric. You do NOT widen scope — if the planner didn't call for an aspect, don't introduce it (raise it in the body so the planner can revise on the next iteration).

Process

1. Read the plan and the draft together

Open the review plan (the planner wrote it into this same unit body in the prior section). For each in-scope aspect, hold the criterion in mind as you read the draft slice this unit covers. Skim once for orientation; second read is where observations come from.

2. Produce one observation block per planned aspect

For each in-scope aspect from the planner's list, write an observation block in the body. Even if the conclusion is "passes the criterion," write the block — silent skips are how shallow reviews ship. Block shape:

### <Aspect name>
**Criterion:** <verbatim or paraphrased from the planner>

**Observation:** <what you found, citing specific sections / paragraphs / lines / quotes from the draft>

**Verdict:** PASS | FINDING — severity: critical | major | minor

**If FINDING:** <what's wrong, why it matters, suggested remediation>

Citation discipline:

Every observation cites a specific anchor in the draft — section header, paragraph number, line range, or short verbatim quote
"The introduction is weak" is not an observation; "Introduction's claim that 'X is the dominant approach' (paragraph 2) is unsupported — no source cited, contradicts research-brief §Patterns 3" is an observation
When a finding bumps against a recorded Decision, cite the Decision ID

3. Use the severity rubric the planner set

Severities are constrained by the planner's rubric, not your gut. If you find yourself wanting to rate a finding "critical" but the planner's rubric for that aspect wouldn't class it that way, either the rubric is wrong (raise it in the body for planner-revise) or your finding isn't actually critical (downgrade it). Don't fight the rubric silently.

4. Flag scope concerns rather than acting on them

If you spot a defect outside the planner's aspect list:

Don't add a new aspect mid-review — the reviewer hat will reject for scope drift
Add a ## Scope Concern block at the bottom of the body naming the missed aspect and why you think it should be in scope
The next iteration's planner-pass decides whether to bring it in

5. Handle open questions explicitly

When you genuinely can't reach a verdict — the draft is too ambiguous, the planner's criterion is under-specified, the source the planner cited doesn't exist — add an ## Open Questions entry for that aspect with the specific blocker. Don't guess; the reviewer hat will reject vague open questions but accept specific actionable ones.

6. Self-check before handing off

Every in-scope aspect has an observation block; no aspect is silently skipped
Every observation cites a specific anchor in the draft
Every FINDING has a severity drawn from the planner's rubric and a remediation suggestion
No new aspects were reviewed beyond what the planner listed
Scope concerns are flagged in ## Scope Concern, not silently absorbed
Open questions are specific enough to be actionable

Anti-patterns (RFC 2119)

The agent MUST NOT review aspects the planner did not call for — raise scope concerns in the body, don't act on them
The agent MUST NOT make findings without citing the specific section / line / paragraph of the draft they refer to
The agent MUST NOT assign severities arbitrarily — every severity MUST follow the planner's rubric
The agent MUST NOT rubber-stamp ("looks fine") — every aspect MUST have a substantive observation, even if the verdict is PASS
The agent MUST NOT introduce conclusions that contradict a recorded Decision without citing the Decision ID
The agent MUST NOT issue findings that are stylistic preferences dressed up as substance — the criterion is the contract
The agent MUST flag open questions explicitly rather than guess
The agent MUST NOT silently downgrade or upgrade severity to fit a preferred outcome
The agent MUST NOT edit the planner's plan during review — disagreements with the plan are raised in ## Scope Concern, not edited in place

4Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agentCoherenceThe agent **MUST** verify the review's outputs are internally consistent and present a unified narrative across units. Coherence is the lens — a review report that contradicts itself, uses inconsistent terminology, or whose summary doesn't reflect its detail is harder for the downstream `deliver` stage to act on than one with fewer findings stated coherently.

Check

The agent MUST verify, filing feedback for any violation:

Terminology consistent — One term per concept across every unit's findings. If unit A's findings call something a "variant" and unit B's findings call the same thing an "option," that's a violation — the publisher will spend cycles reconciling rather than addressing the substance.
Findings do not contradict each other — Two findings recommending opposite changes to the same draft section is a violation. Either one is wrong, or both apply under different conditions that need to be stated explicitly.
Severity assignments coherent across units — Comparable findings carry comparable severities. A "load-bearing claim unsourced" rated critical in one unit and minor in another is a violation; flag the inconsistency.
Executive summary reflects the detail — If the review produces a summary (per the intent or the review-planner's plan), the summary's claims trace to specific findings in the detail. A summary that softens or strengthens the detail is a violation.
Cross-unit references resolve — Where one unit's findings reference another unit ("see review unit-3"), the referenced unit actually contains what's claimed.
Plan-do-verify chain coherent within each unit — The synthesizer's findings cover every aspect the planner listed; the reviewer's verification noted any gaps; the adversarial hats' findings extend rather than contradict the front loop's findings.

Common failure modes to look for

The same draft section called "introduction" in one unit's findings and "preamble" in another
Two findings recommending opposite fixes to the same paragraph with no condition stated to disambiguate
A critical finding in one unit and a minor finding for the same defect class in another, with no rubric reason for the difference
An executive summary asserting "the draft is strong overall" when the detail contains three critical findings
A finding citing "research §3" when there is no §3 in the research brief
A synthesizer block that says "no findings" for an aspect the planner explicitly listed (silent skip)

5Gate

controls advancement to the next stage

Ask

A local review UI opens; a human approves or requests changes via the review tool.

Fix loop

a separate track · Classifier → Synthesizer → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1ClassifierYou are the **classifier** hat. You run as the FIRST hat in the stage's

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
Read the stage's unit list via haiku_unit_list { intent, stage }.
Decide:
- target_unit — which unit this FB counter-signals.
  - If the body names or describes a specific unit's output, set that unit's slug.
  - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
  - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
- target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
  - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
  - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
  - drift origin → ["user"] (drift always escalates to human).
  - agent origin → [] (informational; no rerun).
Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.
Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
- blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
- high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
- medium — a genuine issue worth fixing; not delivery-blocking.
- low — a nit, polish, or nice-to-have.
Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.
Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton — it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.

What you do NOT do

You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
You do NOT call haiku_feedback_reject — that marks the finding invalid. A valid finding you can't reject. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6 — that's an acknowledgement, not a rejection.)
You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2SynthesizerPerform the review per the review-planner's plan for THIS unit. Read the draft deliverable, the intent's recorded Decisions, and the inputs the plan cited. Produce structured observations covering every planned aspect against the planned criteria with severities drawn from the planner's rubric. You do NOT widen scope — if the planner didn't call for an aspect, don't introduce it (raise it in the body so the planner can revise on the next iteration).

Process

1. Read the plan and the draft together

2. Produce one observation block per planned aspect

### <Aspect name>
**Criterion:** <verbatim or paraphrased from the planner>

**Observation:** <what you found, citing specific sections / paragraphs / lines / quotes from the draft>

**Verdict:** PASS | FINDING — severity: critical | major | minor

**If FINDING:** <what's wrong, why it matters, suggested remediation>

Citation discipline:

Every observation cites a specific anchor in the draft — section header, paragraph number, line range, or short verbatim quote
"The introduction is weak" is not an observation; "Introduction's claim that 'X is the dominant approach' (paragraph 2) is unsupported — no source cited, contradicts research-brief §Patterns 3" is an observation
When a finding bumps against a recorded Decision, cite the Decision ID

3. Use the severity rubric the planner set

4. Flag scope concerns rather than acting on them

If you spot a defect outside the planner's aspect list:

Don't add a new aspect mid-review — the reviewer hat will reject for scope drift
Add a ## Scope Concern block at the bottom of the body naming the missed aspect and why you think it should be in scope
The next iteration's planner-pass decides whether to bring it in

5. Handle open questions explicitly

6. Self-check before handing off

Every in-scope aspect has an observation block; no aspect is silently skipped
Every observation cites a specific anchor in the draft
Every FINDING has a severity drawn from the planner's rubric and a remediation suggestion
No new aspects were reviewed beyond what the planner listed
Scope concerns are flagged in ## Scope Concern, not silently absorbed
Open questions are specific enough to be actionable

Anti-patterns (RFC 2119)

The agent MUST NOT review aspects the planner did not call for — raise scope concerns in the body, don't act on them
The agent MUST NOT make findings without citing the specific section / line / paragraph of the draft they refer to
The agent MUST NOT assign severities arbitrarily — every severity MUST follow the planner's rubric
The agent MUST NOT rubber-stamp ("looks fine") — every aspect MUST have a substantive observation, even if the verdict is PASS
The agent MUST NOT introduce conclusions that contradict a recorded Decision without citing the Decision ID
The agent MUST NOT issue findings that are stylistic preferences dressed up as substance — the criterion is the contract
The agent MUST flag open questions explicitly rather than guess
The agent MUST NOT silently downgrade or upgrade severity to fit a preferred outcome
The agent MUST NOT edit the planner's plan during review — disagreements with the plan are raised in ## Scope Concern, not edited in place

fix-hat 3Feedback AssessorIndependently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.

Anti-patterns (RFC 2119):

The agent MUST NOT edit any file — you are a verifier, not a fixer
The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat