Finance · stage 1 of 5

Forecast

Ask gate

Research market conditions and develop revenue projections

The opening stage of the finance cycle: ground the period in evidence and project where revenue and costs are headed. Every downstream stage — budget, analysis, reporting, close — anchors to the numbers and assumptions this stage establishes.

Scope

Revenue and cost projection: the data foundation, the drivers, the scenarios, and the assumptions behind them. Forecast decides what we expect to happen and why — not how resources get allocated against it (budget), and not how actuals compare to it later (analysis).

What to do

Build the projection from named evidence — market signals, historical actuals, leading indicators — not from a desired outcome worked backward.
State every assumption explicitly so a later stage can challenge it on its own terms.
Lay out distinct scenarios (base, optimistic, pessimistic) with different driver assumptions, not one number with a confidence band stapled on.
Stress-test the drivers for sensitivity so the budget stage knows which inputs actually move the result.

What NOT to do

Don't allocate budget or set departmental targets — that's the budget stage's job.
Don't compare projections to actuals or attribute variance — there are no actuals yet; that belongs to analysis.
Don't bury an assumption inside a number; an unexamined driver here propagates through the whole cycle.
Don't present a single point estimate as if it were certainty.

How the engine runs this stage

1 · Elaborate

collaborative · plan the work, fan out discovery, declare outputs

Discovery fan-out

knowledge artifact · Forecast Model

Revenue and cost projections with documented assumptions, multiple scenarios, and sensitivity analysis. This output feeds the budget stage for resource allocation decisions.

Content Guide

Structure the forecast model around actionable projections:

Market conditions -- current market state, trends, and leading indicators relevant to the forecast
Revenue projections -- base, optimistic, and pessimistic scenarios with distinct assumption sets
Cost projections -- fixed and variable cost forecasts with driver analysis
Key assumptions -- every material assumption documented with data source and confidence level
Sensitivity analysis -- which assumptions have the greatest impact on projections
Data sources -- all sources cited with reliability assessment and refresh frequency

Quality Signals

Every projection traces to documented assumptions with supporting data
Scenarios use genuinely different assumption sets, not just scaling factors
Sensitivity analysis identifies the 3-5 assumptions that matter most
A reader unfamiliar with the business can understand the forecast logic

Phase guidance

phase override · ELABORATION

Forecast Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

"Forecast model documents all assumptions with supporting data sources and confidence levels"
"Revenue projections cover at least 3 scenarios (base, optimistic, pessimistic) with distinct assumption sets"
"Market condition analysis references at least 5 data points from the last 90 days"

Bad criteria — vague (no clear check)

"Forecast is done"
"Revenue looks reasonable"
"Market conditions are understood"

Outputs produced

output template · Forecast Model

Revenue projections with documented assumptions, multiple scenarios, and market condition analysis.

Expected Artifacts

Forecast model -- documented assumptions with supporting data sources and confidence levels
Revenue scenarios -- at least 3 scenarios (base, optimistic, pessimistic) with distinct assumption sets
Market analysis -- current market conditions with recent data points
Methodology documentation -- how projections trace to underlying data

Quality Signals

All assumptions are documented with supporting data sources
Multiple scenarios have distinct assumption sets, not just percentage variations
Market condition analysis references recent data points
Methodology is validated and data sources are current and reliable
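
A criterion such as "at least 5 data points from the last 90 days" is verifiable because it reduces to a count inside a date window. The sketch below is one possible way to express that check; the function names and the sample dates are hypothetical and not part of the engine.

```python
from datetime import date, timedelta


def recent_data_points(observation_dates, as_of, window_days=90):
    """Return the observation dates inside the trailing window ending at as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return [d for d in observation_dates if cutoff <= d <= as_of]


def meets_recency_criterion(observation_dates, as_of, minimum=5, window_days=90):
    """True when at least `minimum` data points fall inside the window."""
    return len(recent_data_points(observation_dates, as_of, window_days)) >= minimum


# Hypothetical observation dates, for illustration only.
sample_dates = [
    date(2025, 1, 10), date(2025, 2, 3), date(2025, 2, 20),
    date(2025, 3, 1), date(2025, 3, 15), date(2024, 9, 30),
]
print(meets_recency_criterion(sample_dates, as_of=date(2025, 3, 31)))
```

The vague criteria listed earlier ("Market conditions are understood") cannot be reduced to a check like this, which is the practical difference between the two lists.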

2 · Review

pre-execute · agents audit the planned spec before any code lands

review agent · Methodology

Mandate: The agent MUST verify the forecast model rests on a named methodology and explicit assumptions, with scenarios that differ in substance rather than scale. A forecast that fails this lens propagates unmodeled risk into every downstream stage — the budget envelope, the variance baseline, and the close exception tolerance all anchor to it.

Check

The agent MUST verify each of the following and file feedback for any violation:

Methodology named — the model declares which projection methodology it uses (driver-based, top-down × bottom-up reconciliation, or a defensible hybrid) and explains why that methodology fits the slice being projected. Time-series extrapolation may appear as a sanity check but MUST NOT be the primary method for a forward forecast.
Assumptions explicit per driver — every projected number traces to a named driver and a stated assumption (not a formula buried in a spreadsheet cell). A reviewer can read the model body and identify what would have to change for each number to move.
Distinct scenario assumption sets — base / optimistic / pessimistic differ in the underlying assumption set, not just by a scaling factor. A "high case = base × 1.10" is a sensitivity, not a scenario.
Sensitivity output present — for each scenario, the two or three load-bearing assumptions have explicit sensitivity output (e.g., output under win-rate = 18% / 22% / 26%). Sensitivity identifies which assumptions actually matter.
Confidence stated per scenario — qualitative confidence per scenario, anchored to evidence (data stability, comp data availability, structural-change exposure). Missing confidence flags downstream as undefined risk.
Data sources reliable and recent — every assumption cites a data source from the analyst hat's foundation; reliability is stated; refresh date is current relative to the projection horizon.

Common failure modes to look for

A scenario set where optimistic and pessimistic are mechanically symmetric around base — surface signal of a missing risk model
An assumption stated without a data source — typically marks an opinion masquerading as a projection
Sensitivity output that only varies one assumption while holding everything else flat — misses interaction effects on the load-bearing pair
A driver with no plausible leading indicator and no explicit acknowledgment that the slice will lag
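
To illustrate the interaction-effect failure mode, here is a minimal sketch that varies two load-bearing assumptions together rather than one at a time. The revenue model (opportunities × win rate × deal size) and all input values are assumptions chosen for illustration, not taken from any real forecast.

```python
def revenue(opportunities, win_rate, deal_size):
    """Assumed toy model: revenue = opportunities x win rate x average deal size."""
    return opportunities * win_rate * deal_size


def two_way_grid(opportunities, win_rates, deal_sizes):
    """Revenue for every (win_rate, deal_size) combination."""
    return {
        (wr, ds): revenue(opportunities, wr, ds)
        for wr in win_rates
        for ds in deal_sizes
    }


def interaction_effect(opportunities, base, shocked):
    """Joint change minus the sum of the two one-at-a-time changes."""
    (wr0, ds0), (wr1, ds1) = base, shocked
    r_base = revenue(opportunities, wr0, ds0)
    joint = revenue(opportunities, wr1, ds1) - r_base
    solo = (revenue(opportunities, wr1, ds0) - r_base) + (
        revenue(opportunities, wr0, ds1) - r_base
    )
    return joint - solo


# Hypothetical inputs for illustration.
grid = two_way_grid(1000, (0.18, 0.22, 0.26), (45_000, 50_000, 55_000))
gap = interaction_effect(1000, base=(0.22, 50_000), shocked=(0.18, 45_000))
print(len(grid), round(gap))
```

A nonzero `interaction_effect` means that summing one-at-a-time sensitivities misstates the joint move, which is the gap a single-assumption sensitivity table cannot show.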

3 · Execute

per-unit baton · Analyst → Forecaster → Verifier

hat 1 · Analyst

Focus: Build the data foundation the forecaster will project from. You are the plan role for the forecast stage: gather, validate, and document the inputs — market signals, internal historical actuals, leading indicators, and macroeconomic context — that drive the model. Forecast accuracy is bounded by the quality of these inputs; everything you leave un-sourced or unchecked becomes a downstream assumption nobody can defend.

You produce per-unit data-foundation sections inside the unit body. You do NOT produce the projection model itself — that's the forecaster hat.

Process

1. Identify the unit's drivers before pulling data

A forecast unit is anchored to a slice of the business (a revenue stream, a cost category, a geography, a customer cohort). Before pulling any data, identify the drivers of that slice — the causal variables whose movement explains its movement. Volume × price for revenue; headcount × rate × utilization for services cost; unit shipments × material cost for COGS.

Drivers are what the forecaster will project. Your job is to find the data that lets them project each one defensibly.
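
The driver decompositions named above can be written as plain functions, which makes each causal variable an explicit input. This is only a sketch: the function names, the units (for example, `rate` as cost per head at full utilization), and the sample numbers are assumptions.

```python
def revenue(volume, price):
    """Revenue slice: volume x price."""
    return volume * price


def services_cost(headcount, rate, utilization):
    """Services cost slice: headcount x rate x utilization (units assumed)."""
    return headcount * rate * utilization


def cogs(unit_shipments, material_cost_per_unit):
    """COGS slice: unit shipments x material cost per unit."""
    return unit_shipments * material_cost_per_unit


# Hypothetical driver values for one period.
print(revenue(1000, 25.0))
print(services_cost(10, 100_000.0, 0.5))
print(cogs(200, 3.5))
```

Writing the slice this way also lists exactly which series the analyst has to source: one per function argument.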

2. Pull and document each data source

For every driver, name the data source explicitly:

Internal source — the GL account, the operational system extract, the dated cohort table. Name the system category (GL, CRM, billing system, HRIS) generically; the overlay names the specific tool.
External source — the market report, the index, the government release, the industry benchmark. Name the publisher and the publication date.
Refresh frequency — how often is this data updated? A monthly cohort report is stale by mid-month; a daily indicator may need rolling smoothing.
Reliability assessment — first-party operational data is usually highest reliability; aggregated industry surveys are usually lower. State the assessment.

Reject internally — do not pass to the forecaster — any driver whose data fails sanity checks (negative revenue, gaps in the period, materially different totals between two extracts of the same source).
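
The three sanity checks named above (negative revenue, gaps in the period, materially different totals between two extracts) could be sketched as follows. The 1% `tolerance` is an assumed stand-in for "materially different", and the extract shapes and names are hypothetical.

```python
def sanity_failures(extract_a, extract_b, expected_periods, tolerance=0.01):
    """Return a list of reasons to reject a driver's data, empty if it passes.

    extract_a and extract_b map period labels to amounts and are two
    extracts of the same source. `tolerance` is an assumed threshold.
    """
    failures = []
    negatives = sorted(p for p, v in extract_a.items() if v < 0)
    if negatives:
        failures.append(f"negative values in {negatives}")
    missing = [p for p in expected_periods if p not in extract_a]
    if missing:
        failures.append(f"gaps in the period: {missing}")
    total_a, total_b = sum(extract_a.values()), sum(extract_b.values())
    scale = max(abs(total_a), abs(total_b))
    if scale > 0 and abs(total_a - total_b) / scale > tolerance:
        failures.append(f"extract totals differ: {total_a} vs {total_b}")
    return failures


# Hypothetical monthly revenue extracts.
periods = ["2025-01", "2025-02", "2025-03"]
gl_extract = {"2025-01": 120.0, "2025-03": 130.0}
billing_extract = {"2025-01": 120.0, "2025-02": 125.0, "2025-03": 130.0}
print(sanity_failures(gl_extract, billing_extract, periods))
```

An empty list means the driver's data can be passed on; any entry is a reason to hold it back and record the problem.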

3. Identify leading indicators

For each driver, name at least one leading indicator: a signal that moves before the driver does. Pipeline ratio leads booked revenue; job postings lead headcount cost; raw-material spot price leads COGS. Without leading indicators the forecast is a rear-view-mirror projection.

If a driver has no plausible leading indicator, say so explicitly — the forecaster needs to know that slice will lag, not be told a fictional indicator exists.

4. Flag data gaps and quality issues

Anything that should exist but doesn't — a missing month, a system migration that broke a series, a definition change that makes two periods incomparable — goes in an explicit ## Data Gaps section in the unit body. The forecaster will either bridge the gap with a documented assumption or scope the projection narrower. Hidden gaps become silent assumptions.

5. Hand off

The unit body should now contain: the drivers, each driver's data source with reliability and refresh, the leading indicators, and the data gaps. Do not write projections. Do not pick scenarios. That's the next hat.

Anti-patterns (RFC 2119)

The agent MUST NOT use stale data without flagging the refresh date and assessing whether the staleness is material
The agent MUST NOT rely on a single source for any driver — at least one cross-check (a different system, a different cut of the same source, an external benchmark)
The agent MUST NOT present raw extracts without an explicit reliability assessment
The agent MUST NOT ignore macroeconomic factors (interest rates, FX, inflation) that materially affect the industry the unit covers
The agent MUST NOT identify a driver and leave its data sourcing as "TBD"
The agent MUST NOT invent a leading indicator that doesn't actually predict the driver
The agent MUST name the data source category generically (GL, CRM, HRIS) rather than naming a specific vendor product in the plugin default — that belongs in a project overlay
The agent MUST flag data gaps in their own section rather than silently filling them with assumptions
The agent MUST classify each source's reliability so the forecaster can weight assumptions accordingly

hat 2 · Forecaster

Focus: Turn the analyst's data foundation into a projection model with explicit drivers, distinct scenarios, and sensitivity tests. You are the do role for the forecast stage. Your output is the model the rest of the studio rests on — budget envelopes are sized to it, variances are measured against it, reports cite it. Vague assumptions baked into one cell of one scenario become decisions the company makes for the next twelve months.

You produce the projection model in the unit body and contribute the per-unit slice to FORECAST-MODEL.md. You do NOT pull data — that's the analyst hat — and you do NOT verify the model — that's the verifier hat.

Process

1. Pick a methodology and name it

The two dominant methodologies are driver-based (project each driver, multiply / sum into the dependent series — used when drivers are identifiable and reasonably stable) and top-down × bottom-up reconciliation (independently project the same total from a market sizing and from operational unit detail, then reconcile the gap — used when there's tension between strategic ambition and operational capacity). Pick one and state which, and why.
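
For the reconciliation methodology, the quantity to explain is the gap between the two independent totals. A minimal sketch, assuming the top-down figure arrives as a single total and the bottom-up figure as per-unit amounts; the segment names and numbers are hypothetical.

```python
def reconcile(top_down_total, bottom_up_units):
    """Compare a market-sizing total with the sum of operational unit detail."""
    bottom_up_total = sum(bottom_up_units.values())
    gap = top_down_total - bottom_up_total
    gap_share = gap / top_down_total if top_down_total else None
    return {
        "top_down": top_down_total,
        "bottom_up": bottom_up_total,
        "gap": gap,
        "gap_share": gap_share,
    }


# Hypothetical totals for one revenue stream.
result = reconcile(
    2_000_000,
    {"enterprise": 900_000, "mid_market": 700_000, "smb": 250_000},
)
print(result["gap"], result["gap_share"])
```

The code only measures the gap. Deciding which side moves, and on what assumption, remains a documented judgment in the model body.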

Time-series extrapolation (trend, seasonality, simple regression) is acceptable as a sanity check on a driver-based projection but should not be the primary method for a forward forecast — it assumes the future looks like the past.

2. State every assumption explicitly

Every driver projection has an underlying assumption: "win rate holds at 22% based on the trailing four-quarter average", "average deal size grows 4% reflecting price increase Decision N", "headcount ramp follows the approved hiring plan". Write each assumption as a bullet under the driver it informs.

Do not bury assumptions in spreadsheet formulas. The model body should let a reviewer trace from a projected number back to the named driver back to the explicit assumption back to the analyst-sourced data.

3. Build at least three scenarios with distinct assumption sets

The model MUST include base, optimistic, and pessimistic scenarios. The scenarios MUST differ in the assumption set, not just by a scaling factor:

Base — the team's best estimate of what's most likely
Optimistic — what changes if the positive risks materialize (specific named risks, not generic "things go well")
Pessimistic — what changes if the negative risks materialize (specific named risks)

A "high case" that's the base × 1.10 is not a scenario, it's a sensitivity. The point of scenarios is to surface the conditions under which the projection breaks; the point of a sensitivity is to surface which assumption matters most. They're different exercises.
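
One rough way to catch a "scenario" that is really a scaled copy of base is to test whether its projected series is the base series times a single constant. This is a heuristic sketch with hypothetical names and data, and passing it does not prove the assumption sets are genuinely distinct.

```python
import math


def is_scaled_copy(base_series, other_series, rel_tol=1e-9):
    """True when other_series equals base_series times one constant factor."""
    if not base_series or len(base_series) != len(other_series):
        return False
    if any(b == 0 for b in base_series):
        return False  # ratio undefined, so scaling cannot be confirmed
    ratios = [o / b for b, o in zip(base_series, other_series)]
    return all(math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios)


# Hypothetical quarterly projections.
base = [100.0, 110.0, 121.0, 133.0]
high_scaled = [b * 1.10 for b in base]        # base x 1.10: a sensitivity
high_distinct = [105.0, 120.0, 140.0, 150.0]  # different growth path per period

print(is_scaled_copy(base, high_scaled), is_scaled_copy(base, high_distinct))
```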

4. Run sensitivity on the key assumptions

For each scenario, identify the two or three assumptions whose movement most changes the output. Show the output's response to plausible variation on each (e.g., what does base-case revenue look like if win rate is 18% / 22% / 26%?). Sensitivity output goes in its own section so reviewers can see at a glance which assumptions are load-bearing.
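
The win-rate example above can be made concrete with a small one-way sensitivity table. The revenue model (opportunities × win rate × average deal size) and the input values are assumptions for illustration; only the 18% / 22% / 26% win rates come from the text.

```python
def base_case_revenue(opportunities, win_rate, avg_deal_size):
    """Assumed toy model for booked revenue."""
    return opportunities * win_rate * avg_deal_size


def win_rate_sensitivity(opportunities, avg_deal_size, win_rates=(0.18, 0.22, 0.26)):
    """Revenue under each win rate, holding the other drivers fixed."""
    return {
        wr: base_case_revenue(opportunities, wr, avg_deal_size)
        for wr in win_rates
    }


# Hypothetical pipeline size and deal size.
table = win_rate_sensitivity(opportunities=1000, avg_deal_size=50_000)
for wr, rev in table.items():
    print(f"win rate {wr:.0%}: revenue {rev:,.0f}")
```

Repeating the same table for each candidate assumption and comparing the spread of outputs is one simple way to rank which assumptions are load-bearing.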

5. State confidence by scenario

Each scenario gets a confidence statement — qualitative is fine ("medium-high based on stable lead-flow signals", "low-medium because the new product line has no comp data yet"). Confidence drives how downstream stages should treat the scenario: high-confidence base case anchors the budget; low-confidence optimistic case anchors contingency reserve sizing.

6. Self-check before handing off

Methodology named and justified
Every projected number traces to a named driver → named assumption → analyst-sourced data
Three scenarios with distinct assumption sets (not just scaling factors)
Sensitivity output present for the two or three key assumptions per scenario
Confidence stated per scenario

Anti-patterns (RFC 2119)

The agent MUST NOT build a single-point forecast without scenarios
The agent MUST NOT hide assumptions inside formulas — they belong in explicit bullets in the unit body
The agent MUST NOT present scenarios that differ only by a scaling factor — scenarios MUST differ in assumption set
The agent MUST NOT over-fit to historical data when a structural change (new product, M&A, market shift) makes history non-predictive
The agent MUST NOT present projections without sensitivity analysis on the key assumptions
The agent MUST NOT omit a confidence statement per scenario
The agent MUST name the projection methodology and explain why it fits the slice being projected
The agent MUST trace every projected number back through driver, assumption, and source so a reviewer can audit it
The agent MUST reference the FP&A platform / modeling tool category generically — specific product names belong in a project overlay

hat 3 · Verifier

Focus: Validate the per-unit knowledge artifact for the forecast stage of finance. Units here are forecast models: knowledge artifacts that downstream stages consume. Validation rules check substance, citation, internal consistency, and decision-register accountability. They do NOT cover executable verify-commands or DAG validity (workflow engine/build-stage concerns).

Anti-patterns (RFC 2119):

The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. That is workflow engine territory per architecture §1.1.
The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
The agent MUST name a specific failed criterion in any rejection.
The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Artifact answers its topic

The unit's title and first paragraph define the topic. The remaining body MUST deliver substantive content on that topic. Reject placeholders, content-free outlines, or redirects.

2. Sources cited

Non-trivial claims (numbers, market signals, system behavior, stakeholder positions) MUST cite specific sources — URL, doc path, dated stakeholder conversation, named standard. Reject "industry common knowledge" or unsourced numerical claims.

3. Internal consistency

Title, mission, and body must align. Numerical/categorical claims must be consistent across the body. Recommendations must follow from the evidence presented.

4. Decision-register consistency

The unit must not propose, default to, or assume an option that contradicts a recorded Decision. Cite the Decision ID in any rejection.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted with veto-style approval, or flagged for human escalation.

4 · Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agent · Methodology

Mandate: The agent MUST verify the forecast model rests on a named methodology and explicit assumptions, with scenarios that differ in substance rather than scale. A forecast that fails this lens propagates unmodeled risk into every downstream stage: the budget envelope, the variance baseline, and the close exception tolerance all anchor to it.

Check

The agent MUST verify each of the following and file feedback for any violation:

Methodology named — the model declares which projection methodology it uses (driver-based, top-down × bottom-up reconciliation, or a defensible hybrid) and explains why that methodology fits the slice being projected. Time-series extrapolation may appear as a sanity check but MUST NOT be the primary method for a forward forecast.
Assumptions explicit per driver — every projected number traces to a named driver and a stated assumption (not a formula buried in a spreadsheet cell). A reviewer can read the model body and identify what would have to change for each number to move.
Distinct scenario assumption sets — base / optimistic / pessimistic differ in the underlying assumption set, not just by a scaling factor. A "high case = base × 1.10" is a sensitivity, not a scenario.
Sensitivity output present — for each scenario, the two or three load-bearing assumptions have explicit sensitivity output (e.g., output under win-rate = 18% / 22% / 26%). Sensitivity identifies which assumptions actually matter.
Confidence stated per scenario — qualitative confidence per scenario, anchored to evidence (data stability, comp data availability, structural-change exposure). Missing confidence flags downstream as undefined risk.
Data sources reliable and recent — every assumption cites a data source from the analyst hat's foundation; reliability is stated; refresh date is current relative to the projection horizon.

Common failure modes to look for

A scenario set where optimistic and pessimistic are mechanically symmetric around base — surface signal of a missing risk model
An assumption stated without a data source — typically marks an opinion masquerading as a projection
Sensitivity output that only varies one assumption while holding everything else flat — misses interaction effects on the load-bearing pair
A driver with no plausible leading indicator and no explicit acknowledgment that the slice will lag

5 · Gate

controls advancement to the next stage

Ask

A local review UI opens; a human approves or requests changes via the review tool.

Fix loop

a separate track · Classifier → Analyst → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
2. Read the stage's unit list via haiku_unit_list { intent, stage }.
3. Decide:
   - target_unit: which unit this FB counter-signals.
     - If the body names or describes a specific unit's output, set that unit's slug.
     - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
     - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
   - target_invalidates: which approval roles get cleared on closure. Default rule of thumb:
     - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
     - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
     - drift origin → ["user"] (drift always escalates to human).
     - agent origin → [] (informational; no rerun).
4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only; it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets. That's expected on a re-tick; you simply advance.
5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation: the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
   - blocker: the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
   - high: a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
   - medium: a genuine issue worth fixing; not delivery-blocking.
   - low: a nit, polish, or nice-to-have.
   Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't: a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it; it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix), distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done; skip the next step.
7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
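
The default routing and severity rules above can be restated as a small lookup. This is an illustrative sketch of the rule of thumb, not the engine's implementation; the function names and the shape of the finding records are hypothetical.

```python
SEVERITY_RANK = {"blocker": 0, "high": 1, "medium": 2, "low": 3}

USER_ORIGINS = {"user-chat", "user-visual", "user-question", "drift"}
REVIEW_ORIGINS = {"adversarial-review", "studio-review"}


def default_invalidates(origin, filer_agent_name=None):
    """Rule-of-thumb approval roles cleared on closure, by feedback origin."""
    if origin in USER_ORIGINS:
        return ["user"]
    if origin in REVIEW_ORIGINS:
        if not filer_agent_name:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer_agent_name]
    if origin == "agent":
        return []
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings):
    """Sort findings so higher-severity ones come first (stable for ties)."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])


# Hypothetical findings for illustration.
queue = dispatch_order([
    {"id": "a", "severity": "low"},
    {"id": "b", "severity": "blocker"},
    {"id": "c", "severity": "medium"},
])
print([f["id"] for f in queue])
```

The lookup only covers the defaults; the classifier still judges each finding by its actual impact when setting severity.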

What you do NOT do

You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
You do NOT call haiku_feedback_reject; that marks the finding invalid, and a valid finding cannot be rejected. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6; that's an acknowledgement, not a rejection.)
You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Analyst

Focus: Build the data foundation the forecaster will project from. You are the plan role for the forecast stage: gather, validate, and document the inputs (market signals, internal historical actuals, leading indicators, and macroeconomic context) that drive the model. Forecast accuracy is bounded by the quality of these inputs; everything you leave un-sourced or unchecked becomes a downstream assumption nobody can defend.

You produce per-unit data-foundation sections inside the unit body. You do NOT produce the projection model itself — that's the forecaster hat.

Process

1. Identify the unit's drivers before pulling data

Drivers are what the forecaster will project. Your job is to find the data that lets them project each one defensibly.

2. Pull and document each data source

For every driver, name the data source explicitly:

Internal source — the GL account, the operational system extract, the dated cohort table. Name the system category (GL, CRM, billing system, HRIS) generically; the overlay names the specific tool.
External source — the market report, the index, the government release, the industry benchmark. Name the publisher and the publication date.
Refresh frequency — how often is this data updated? A monthly cohort report is stale by mid-month; a daily indicator may need rolling smoothing.
Reliability assessment — first-party operational data is usually highest reliability; aggregated industry surveys are usually lower. State the assessment.

3. Identify leading indicators

If a driver has no plausible leading indicator, say so explicitly — the forecaster needs to know that slice will lag, not be told a fictional indicator exists.

4. Flag data gaps and quality issues

5. Hand off

Anti-patterns (RFC 2119)

The agent MUST NOT use stale data without flagging the refresh date and assessing whether the staleness is material
The agent MUST NOT rely on a single source for any driver — at least one cross-check (a different system, a different cut of the same source, an external benchmark)
The agent MUST NOT present raw extracts without an explicit reliability assessment
The agent MUST NOT ignore macroeconomic factors (interest rates, FX, inflation) that materially affect the industry the unit covers
The agent MUST NOT identify a driver and leave its data sourcing as "TBD"
The agent MUST NOT invent a leading indicator that doesn't actually predict the driver
The agent MUST name the data source category generically (GL, CRM, HRIS) rather than naming a specific vendor product in the plugin default — that belongs in a project overlay
The agent MUST flag data gaps in their own section rather than silently filling them with assumptions
The agent MUST classify each source's reliability so the forecaster can weight assumptions accordingly

fix-hat 3 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
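
The whole-set rule amounts to re-checking every enumerated item, not only the ones the fix touched. A minimal sketch, assuming the caller supplies a `still_defective` predicate that inspects each item on disk; the names, the returned strings, and the row numbers are hypothetical and do not call any engine tool.

```python
def surviving_items(enumerated, still_defective):
    """Items from the finding's enumerated set that are still defective."""
    return [item for item in enumerated if still_defective(item)]


def closure_decision(enumerated, still_defective):
    """Close only when no enumerated item survives; otherwise name the survivors."""
    survivors = surviving_items(enumerated, still_defective)
    if survivors:
        return ("reject_hat", survivors)
    return ("advance_hat", [])


# Hypothetical finding: 8 stale rows enumerated, the fixer corrected rows 1-3.
enumerated_rows = list(range(1, 9))
fixed_rows = {1, 2, 3}
decision = closure_decision(enumerated_rows, lambda row: row not in fixed_rows)
print(decision)
```

The check iterates over the finding's own enumerated set, so it stays inside the one feedback item the assessor was dispatched against.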

Anti-patterns (RFC 2119):

The agent MUST NOT edit any file — you are a verifier, not a fixer
The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat