Product Strategy · stage 3 of 5

Prioritization

Ask gate

Score and rank opportunities using impact/effort frameworks

Turn the opportunity list into a defensible ordering. Every ranking is a trade-off; this stage makes the trade-off explicit, anchors it in the user evidence, and pressure-tests it against the stakeholders who aren't in the room. It's the decision point where trade-offs can no longer be deferred.

Scope

Scoring and ranking opportunities against a chosen framework. Prioritization decides what matters most and why — not what the opportunities are (discovery, user-research) or how the chosen order is sequenced into a plan (roadmap). The output is a ranked, evidence-anchored matrix with its trade-offs named.

What to do

  • Apply a prioritization framework (RICE, ICE, MoSCoW, weighted scoring, or the team's own) consistently, and capture the framework choice, the weights, and the reasoning per score.
  • Anchor every estimate in the user-research signal rather than internal preference.
  • Pressure-test the ranking against absent stakeholders — business, engineering, sales, support — and document the objections it will face.
  • Make the trade-offs explicit so the order is defensible once it leaves this stage.

What NOT to do

  • Don't introduce new opportunities or re-research users — prioritization ranks what the upstream stages found.
  • Don't build the roadmap or sequence delivery; that's the next stage consuming this matrix.
  • Don't apply the framework inconsistently across opportunities, or score from preference instead of evidence.
  • Don't bury the trade-offs — an unstated trade-off resurfaces as a stakeholder objection later.

How the engine runs this stage

1. Elaborate

collaborative · plan the work, fan out discovery, declare outputs

Discovery fan-out

knowledge artifact · Priority Matrix

Scored and ranked opportunities with explicit trade-off documentation. This output feeds the roadmap stage as its primary input for sequencing decisions.

Content Guide

Structure the matrix around defensible decision-making:

  • Scoring framework — dimensions used (impact, effort, strategic alignment, confidence, etc.) with weighting rationale
  • Opportunity scores — each opportunity scored per dimension with reasoning, not just numbers
  • Ranked list — final ordering with narrative explanation of why the top items win
  • Trade-off documentation — what is explicitly deprioritized and the reasoning behind it
  • Stakeholder impact assessment — anticipated reactions, feasibility concerns, and strategic alignment notes per top opportunity
  • Confidence levels — how certain each score is, and what would change it

Quality Signals

  • Scores have reasoning attached, not just numbers in a grid
  • Trade-offs are documented honestly — "we chose X over Y because..."
  • Confidence levels distinguish between data-backed scores and judgment calls
  • The matrix is actionable for roadmap creation, not just an analytical exercise

Phase guidance

phase override · ELABORATION

Prioritization Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

  • "Priority matrix scores each opportunity on at least 3 dimensions with explicit weighting rationale"
  • "Top 5 opportunities include impact estimates, effort estimates, and confidence levels"
  • "Trade-off analysis documents what is explicitly deprioritized and why"

Bad criteria — vague (no clear check)

  • "Priorities are set"
  • "Opportunities are ranked"
  • "Framework is applied"

Outputs produced

output template · Priority Matrix

Scored and ranked opportunities with trade-off documentation.

Expected Artifacts

  • Scored opportunities -- each opportunity scored on at least 3 dimensions with weighting rationale
  • Rankings -- top opportunities with impact estimates, effort estimates, and confidence levels
  • Trade-off analysis -- what is explicitly deprioritized and why
  • Stakeholder validation -- rankings pressure-tested against business constraints

Quality Signals

  • Each score includes rationale and confidence level
  • Rankings are pressure-tested against resource realities and strategic goals
  • Trade-offs are documented -- what we are choosing not to do and why
  • At least 3 scoring dimensions with explicit weighting

2. Review

pre-execute · agents audit the planned spec before any code lands
review agent · Rigor

Mandate: The agent MUST verify the prioritization is defensible — applied consistently, grounded in evidence, free from undisclosed bias, and explicit about trade-offs. Prioritization that survives this lens survives stakeholder pressure later; prioritization that doesn't gets unwound mid-roadmap.

Check

The agent MUST verify, filing feedback for any violation:

  • Framework consistency — The chosen framework (RICE, ICE, MoSCoW, weighted scoring, or another the team uses) is applied to every opportunity in scope with the same rules and weights. Mid-list rule changes or silent re-scoring are findings to file.
  • Evidence per score — Every per-dimension score cites evidence — a user-research insight, a discovery finding, a stakeholder source. "Team intuition" or unsourced estimates are findings to file.
  • Confidence honesty — Low-confidence scores are flagged as such, not buried inside precise-looking numbers. A 7.2 with weak evidence misrepresents the underlying signal.
  • Stakeholder-override discipline — Where a stakeholder preference moved a score against the framework's output, the override is documented with the stakeholder's name, the reason, and the dimension affected.
  • Dependency reflection — Where prioritized items have technical or sequencing dependencies, the priority order respects them or names the trade-off explicitly.
  • Explicit deprioritization — The unit names what was deprioritized and why. Silent omission of deprioritized items is the most common source of post-roadmap stakeholder friction.
  • Trade-off visibility — Every "high priority" item that conflicts with another high-priority item has a named trade-off, not a denial that the conflict exists.

Common failure modes to look for

  • A scoring table where every dimension was applied consistently except for one item where the rule quietly changed
  • High scores on Impact with no citation back to user-research signal
  • Confidence column missing entirely, or every row marked "high confidence" with no reason
  • A ranking that treats interdependent items as if they had no dependencies on each other
  • A top tier where everything is "must" — no real prioritization happened
  • Deprioritization list missing or limited to items nobody wanted anyway, so the visible trade-offs look smaller than they are
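
Some of these failure modes can be spotted mechanically before a closer read. The following is a minimal Python sketch, not the engine's implementation: the `ScoreRow` fields and the `find_gaps` helper are hypothetical names, and the strong / moderate / weak confidence labels are borrowed from the Prioritizer process. It flags rows with no cited evidence and rows whose confidence is missing or unexplained.

```python
from dataclasses import dataclass
from typing import List

ALLOWED_CONFIDENCE = {"strong", "moderate", "weak"}


@dataclass
class ScoreRow:
    """One cell of a scoring table (hypothetical shape, for illustration)."""
    opportunity: str
    dimension: str
    score: float
    evidence: str = ""            # citation; empty means unsourced
    confidence: str = ""          # expected: strong / moderate / weak
    confidence_reason: str = ""


def find_gaps(rows: List[ScoreRow]) -> List[str]:
    """Return one message per row-level gap a reviewer would file."""
    gaps = []
    for row in rows:
        label = f"{row.opportunity}/{row.dimension}"
        evidence = row.evidence.strip().lower()
        if not evidence or evidence == "team intuition":
            gaps.append(f"{label}: no evidence cited")
        if row.confidence not in ALLOWED_CONFIDENCE:
            gaps.append(f"{label}: confidence missing")
        elif not row.confidence_reason.strip():
            gaps.append(f"{label}: confidence given without a reason")
    return gaps


# Hypothetical rows, for illustration only.
rows = [
    ScoreRow("A", "impact", 8, "interview insight 3", "strong", "five interviews agree"),
    ScoreRow("B", "impact", 7.2, "team intuition"),
    ScoreRow("C", "effort", 3, "engineering estimate", "weak"),
]
for gap in find_gaps(rows):
    print(gap)
```

A check like this only catches structural gaps. Whether the cited evidence actually supports the score still needs the reviewer's judgment.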

3. Execute

per-unit baton · Prioritizer → Stakeholder Proxy → Verifier
hat 1 · Prioritizer

Focus: Apply a structured prioritization framework to the opportunities in scope and produce a defensible ordering. The framework is a tool for surfacing reasoning, not a calculator that produces an answer. Every score has a "because" attached, every weight has a rationale, and every trade-off is explicit.

Process

1. Choose the framework before scoring

Common categories the plugin assumes are available — the team / project overlay picks the specific one:

  • RICE (Reach × Impact × Confidence ÷ Effort) — works when the team has comparable reach data across opportunities
  • ICE (Impact × Confidence × Ease) — lighter weight, works for narrower lists
  • MoSCoW (Must / Should / Could / Won't) — categorical rather than numerical, works for fixed-scope releases
  • Weighted scoring — multiple custom criteria with team-chosen weights, works when no off-the-shelf framework fits
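
As a rough illustration of the two numeric formulas above, this Python sketch computes RICE and ICE scores. The function names and sample inputs are hypothetical, and the scales (for example, confidence as a fraction between 0 and 1 for RICE) are assumptions a team would set for itself.

```python
def rice_score(reach: float, impact: float, confidence: float, effort: float) -> float:
    """RICE = Reach x Impact x Confidence / Effort."""
    if effort <= 0:
        raise ValueError("effort must be positive to divide by it")
    return reach * impact * confidence / effort


def ice_score(impact: float, confidence: float, ease: float) -> float:
    """ICE = Impact x Confidence x Ease."""
    return impact * confidence * ease


# Hypothetical inputs, for illustration only.
rice = rice_score(reach=2000, impact=2, confidence=0.5, effort=4)
ice = ice_score(impact=7, confidence=6, ease=5)
print(rice)  # 500.0
print(ice)   # 210
```

The number is only the starting point: each input still needs its own evidence and reasoning recorded alongside it.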

Confirm the framework choice with the user during elaboration. Record:

  • Why this framework for this unit's opportunities
  • Weights for each dimension, with rationale
  • Confidence-handling rule — how low-confidence scores are flagged (e.g., separate column, halved weight, hypothesis tag)

2. Score consistently across the full set

Apply the framework to every opportunity in scope. For each one, capture:

  • Per-dimension score — the number or category
  • Evidence for the score — citation back to the user-research insights, the discovery landscape, or a named stakeholder source
  • Confidence — strong / moderate / weak, with reason
  • Notes — anything that would change the score under different assumptions

Score every opportunity with the same rule. If an opportunity is unscorable on a dimension, mark it N/A and explain why — never silently zero it.
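
The difference between N/A and a silent zero can be sketched in Python as follows. This is one possible handling, not a prescribed one: the `weighted_score` helper is hypothetical, and renormalizing over the scored dimensions is an assumption the team would need to agree on and record.

```python
from typing import Dict, List, Optional, Tuple


def weighted_score(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Tuple[Optional[float], List[str]]:
    """Weighted average over the dimensions that were actually scored.

    Returns (score, na_dimensions). A dimension marked N/A (None) is
    reported back to the caller instead of being counted as zero.
    """
    na_dimensions = [d for d in weights if scores.get(d) is None]
    scored = [d for d in weights if scores.get(d) is not None]
    total_weight = sum(weights[d] for d in scored)
    if not scored or total_weight == 0:
        return None, na_dimensions
    value = sum(scores[d] * weights[d] for d in scored) / total_weight
    return value, na_dimensions


# Hypothetical weights and scores, for illustration only.
weights = {"impact": 0.5, "effort": 0.25, "alignment": 0.25}
scores = {"impact": 6, "effort": None, "alignment": 9}
value, na = weighted_score(scores, weights)
print(value, na)  # 7.0 ['effort']
```

Treating the N/A as zero would have produced 5.25 instead of 7.0, which understates the opportunity without anyone having decided that. The returned list of N/A dimensions is where the written explanation attaches.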

3. Surface trade-offs

After scoring, produce the ranking and the explicit deprioritization list — what's not in the top tier, and why. The deprioritization list is the trade-off made visible. Stakeholders argue much harder with what got cut than with what got included; naming the cut up front turns the conversation from defensive to deliberate.

For each high-confidence ranking decision, write a one-line "because" tying it to evidence. Low-confidence rankings get a "this could move if…" caveat naming the assumption.

4. Update the artifact

Append to the unit body:

  • Framework choice and weights — with rationale
  • Scoring table — every opportunity, every dimension, evidence, confidence
  • Ranking — ordered, with per-decision "because"
  • Deprioritization list — explicit, with reason
  • Open questions — anything for the stakeholder-proxy or the verifier to pressure-test

Anti-patterns (RFC 2119)

  • The agent MUST NOT treat framework scores as objective truth rather than structured judgment
  • The agent MUST NOT rank by a single dimension (impact only, effort only) without balancing factors
  • The agent MUST document the reasoning behind weights and scores
  • The agent MUST NOT hide low-confidence scores behind false precision — a 7.2 with weak evidence is not better than "moderate, low confidence"
  • The agent MUST NOT avoid hard trade-offs by ranking everything as "high priority"
  • The agent MUST NOT apply the framework to a subset of opportunities while leaving others unscored
  • The agent MUST produce an explicit deprioritization list — silence about what got cut is the most common source of stakeholder pushback later
  • The agent MUST cite evidence for every score; "team intuition" is not evidence
hat 2 · Stakeholder Proxy

Focus: Stand in for the stakeholders who will challenge this prioritization once it leaves the stage — business, engineering, sales, support, finance, leadership. Pressure-test the ranking against their constraints, their commitments, and their incentives so the surprises surface here, not in the stakeholder-review session.

Process

1. Enumerate the stakeholder set

Before pressure-testing, name the stakeholder groups who have a real stake in this prioritization. For each, capture:

  • What they care about — their primary success metric or commitment
  • What they constrain — capacity, budget, contractual commitments, regulatory obligations
  • What they have committed to externally — public roadmaps, customer commitments, sales targets

If a group is missing from the user-research signal or the discovery landscape, name it as a gap rather than silently skipping it.

2. Pressure-test the ranking from each perspective

For each stakeholder group, walk the prioritizer's ranking and ask:

  • What in the top tier conflicts with this group's commitments or capacity?
  • What in the deprioritization list does this group have a hard interest in moving up, and what evidence would they bring?
  • What downstream effect does the top tier have on this group's day-to-day workload or revenue?
  • Where does the framework underweight a dimension this group treats as load-bearing?

Document each finding as a stakeholder concern with:

  • Stakeholder group — named, not anonymous
  • Concern — the specific objection, in their language
  • Evidence supporting the concern — capacity numbers, contractual commitments, recent customer escalations
  • Severity — blocker / constraint / consideration
  • Mitigation — at least one option the team could take in response (deferring a different item, scoping down, parallelizing, escalating)

3. Distinguish blockers from constraints

A blocker means the ranking cannot proceed as drafted; the prioritizer or the user must revise. A constraint means the ranking can proceed but the team owes the stakeholder group a named plan to navigate it. A consideration is something to flag in the stakeholder-review session so the group hears it from the team rather than discovering it later.

Mis-classifying constraints as blockers grinds the lifecycle to a halt and trains stakeholders to escalate everything; mis-classifying blockers as considerations ships strategy that breaks on contact with the org.
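
The three-way classification above can be read as a small decision rule. The Python sketch below is illustrative only: the `Severity` enum and `ranking_outcome` function are hypothetical names, and the returned strings paraphrase the definitions given in this section.

```python
from enum import Enum
from typing import Iterable


class Severity(Enum):
    BLOCKER = "blocker"
    CONSTRAINT = "constraint"
    CONSIDERATION = "consideration"


def ranking_outcome(severities: Iterable[Severity]) -> str:
    """Decide what happens to the ranking given the concerns raised."""
    found = set(severities)
    if Severity.BLOCKER in found:
        # Any blocker stops the ranking as drafted.
        return "revise: the ranking cannot proceed as drafted"
    if Severity.CONSTRAINT in found:
        # Constraints let the ranking proceed, but each needs a named plan.
        return "proceed with a named plan for each constraint"
    if Severity.CONSIDERATION in found:
        return "proceed and flag considerations in the stakeholder review"
    return "proceed"


# Hypothetical set of concerns, for illustration only.
print(ranking_outcome([Severity.CONSIDERATION, Severity.CONSTRAINT]))
```

The most severe concern determines the outcome, which is why a mis-classified blocker or constraint changes what the whole stage does next.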

4. Update the artifact

Append to the unit body:

  • Stakeholder map — groups, what they care about, what they constrain
  • Concerns — per group, with evidence, severity, and at least one mitigation
  • Recommended revisions — if any blockers surfaced, the specific changes the prioritizer should make on reject
  • Open questions — anything that needs human escalation before the verifier runs

Anti-patterns (RFC 2119)

  • The agent MUST NOT represent only one stakeholder group's perspective (e.g., only engineering feasibility)
  • The agent MUST NOT accept the prioritization without challenging assumptions about effort or impact
  • The agent MUST NOT present a stakeholder concern as a blocker when it is really a constraint to navigate
  • The agent MUST NOT project personal opinions as stakeholder positions without evidence
  • The agent MUST NOT ignore downstream effects on teams not directly involved in the decision
  • The agent MUST NOT raise a concern without proposing at least one mitigation option
  • The agent MUST classify each concern as blocker / constraint / consideration and defend the classification
  • The agent MUST name the stakeholder group; anonymous concerns are not actionable
hat 3 · Verifier

Focus: Validate the per-unit design/synthesis artifact for the prioritization stage of product-strategy. Units here are prioritization decisions: designed outputs that downstream stages execute against. Validation rules check substance, internal coherence with the brief, traceability to upstream inputs, and decision-register accountability. They are NOT executable verify-commands.

Anti-patterns (RFC 2119):

  • The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. That is workflow engine territory per architecture §1.1.
  • The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
  • The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
  • The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
  • The agent MUST name a specific failed criterion in any rejection.
  • The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Artifact answers its design brief

The unit's title and first paragraph define the design problem. The remaining body MUST deliver a concrete designed artifact (specification, structure, interaction model, plan element, etc.) — not an outline, not a deferral, not a "we'll figure this out later".

2. Trace to upstream inputs

Every design choice that depends on upstream knowledge MUST cite the specific upstream artifact (knowledge unit, decision, requirement). Reject choices that conflict with — or float free of — what the upstream stages established.

3. Internal coherence

Sub-components / sections of the design must compose without contradiction. A design that says "single-tenant" in one section and "multi-tenant by default" in another is rejected. Cite the contradicting paragraphs.

4. Decision-register consistency

The unit must not propose an option contradicting a recorded Decision. Cite the Decision ID.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted, OR flagged (needs human escalation). Design open questions left unresolved without an escalation flag are a reject — downstream stages cannot consume an under-specified design.

4. Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agent · Rigor

The mandate, checks, and common failure modes are identical to those of the Rigor review agent in the Review step above.

5. Gate

controls advancement to the next stage
Ask

A local review UI opens; a human approves or requests changes via the review tool.

Fix loop

a separate track · Classifier → Prioritizer → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

  1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.

  2. Read the stage's unit list via haiku_unit_list { intent, stage }.

  3. Decide:

    • target_unit — which unit this FB counter-signals.
      • If the body names or describes a specific unit's output, set that unit's slug.
      • If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
      • When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
    • target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
      • user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
      • adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
      • drift origin → ["user"] (drift always escalates to human).
      • agent origin → [] (informational; no rerun).
  4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.

  5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.

    • blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
    • high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
    • medium — a genuine issue worth fixing; not delivery-blocking.
    • low — a nit, polish, or nice-to-have.

    Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.

  6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.

  7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton — it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
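
The default routing rule from step 3 and the severity ordering from step 5 can be sketched in Python. This is not the engine's code: `default_invalidates` and `dispatch_order` are hypothetical helpers, and treating the origin names as literal strings is an assumption.

```python
from typing import List, Optional

# Most urgent first, following the rubric in step 5.
SEVERITY_ORDER = ["blocker", "high", "medium", "low"]


def default_invalidates(origin: str, filer_agent: Optional[str] = None) -> List[str]:
    """Rule-of-thumb mapping from feedback origin to approval roles to clear."""
    if origin in {"user-chat", "user-visual", "user-question", "drift"}:
        return ["user"]  # the human re-reviews; drift always escalates to a human
    if origin in {"adversarial-review", "studio-review"}:
        if not filer_agent:
            raise ValueError("review origins need the filing agent's name")
        return [filer_agent]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin!r}")


def dispatch_order(findings: List[dict]) -> List[dict]:
    """Sort findings so higher severity comes first (stable within a level)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


# Hypothetical findings, for illustration only.
queue = dispatch_order([
    {"id": "FB-1", "severity": "low"},
    {"id": "FB-2", "severity": "blocker"},
    {"id": "FB-3", "severity": "medium"},
])
print([f["id"] for f in queue])  # ['FB-2', 'FB-3', 'FB-1']
```

The mapping is only a default. The classifier still overrides it when the finding's body calls for different routing.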

What you do NOT do

  • You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
  • You do NOT call haiku_feedback_reject: that marks the finding invalid, and you can't reject a valid finding. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6; that's an acknowledgement, not a rejection.)
  • You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Prioritizer

The focus, process, and anti-patterns are identical to those of the Prioritizer hat in the Execute step above.
fix-hat 3 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
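
The whole-set rule amounts to a set difference between what the finding enumerated and what the assessor independently verified. The Python sketch below assumes each enumerated item has a stable identifier. `closure_decision` is a hypothetical helper, and its return values only echo the advance / reject_hat vocabulary used here.

```python
from typing import Iterable, List, Tuple


def closure_decision(
    enumerated: Iterable[str],
    verified_resolved: Iterable[str],
) -> Tuple[str, List[str]]:
    """Close only when every enumerated item was independently verified."""
    survivors = sorted(set(enumerated) - set(verified_resolved))
    if survivors:
        # A partial fix on an enumerated finding is still an open finding.
        return "reject_hat", survivors
    return "advance", []


# Hypothetical finding: eight stale rows enumerated, three verified as fixed.
enumerated_rows = [f"row-{i}" for i in range(1, 9)]
verified = ["row-1", "row-2", "row-3"]
decision, open_items = closure_decision(enumerated_rows, verified)
print(decision, open_items)
# reject_hat ['row-4', 'row-5', 'row-6', 'row-7', 'row-8']
```

The key input is `verified_resolved`: it must come from the assessor's own checks on disk, not from the list of items the fixer reports having touched.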

Anti-patterns (RFC 2119):

  • The agent MUST NOT edit any file — you are a verifier, not a fixer
  • The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
  • The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
  • The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
  • The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
  • The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat