Dev Evangelism · stage 1 of 5

Research

Auto gate

Identify target audience, map the topic landscape, analyze competitive content

Research

The opening stage of the dev-evangelism lifecycle: turn a raw evangelism intent into a grounded understanding of who the audience is and what they care about. Every later stage — narrative, create, publish, measure — reads this stage's map to know who it's writing for and about.

Scope

Mapping the audience and the topic landscape: developer segments and skill levels, the topics they engage with, where they gather, and where the team has credible expertise to contribute. Research decides who and what — it does not shape the story (narrative), produce assets (create), or distribute anything (publish).

What to do

Read prior community signals, past content history, and the intent's stated audience hypothesis before forming your own.
Map the developer segments, their skill levels, and how they actually behave on each platform.
Find the trending threads, the underserved gaps, and the competitive content already in the space.
Check honestly where the team has credible expertise to contribute, and where it doesn't.

What NOT to do

Don't draft story arcs, hooks, or takeaways — that's the narrative stage.
Don't produce content assets or demos — that's create.
Don't assert an audience or topic claim you can't ground in a signal.
Don't pick a topic the team can't credibly speak to.

How the engine runs this stage

1Elaborate

collaborative · plan the work, fan out discovery, declare outputs

Discovery fan-out

knowledge artifactAudience LandscapeDeveloper audience research and topic landscape analysis. This output feeds the narrative stage as foundational context for story arc and messaging decisions.

Audience Landscape

Developer audience research and topic landscape analysis. This output feeds the narrative stage as foundational context for story arc and messaging decisions.

Content Guide

Structure the landscape around developer understanding:

Developer segments -- defined with skill levels, technology stacks, pain points, and content format preferences
Topic landscape -- trending themes, underserved areas, and competitive content analysis
Content gaps -- opportunities where the team's expertise fills a genuine need
Community presence -- where each segment is active and receptive (forums, platforms, events)
Sources consulted -- community data, analytics, conference programs, with retrieval dates
Open questions -- what remains unvalidated or requires direct audience feedback

Quality Signals

Developer segments are evidence-based, not assumed from job titles alone
Topic recommendations match audience needs and team credibility
Content gaps are validated against existing competitive content
Community mapping identifies specific platforms and forums, not generic categories

Phase guidance

phase overrideELABORATION- "Audience landscape identifies at least 3 developer segments with skill levels, pain points, and preferred content formats"

Research Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

"Audience landscape identifies at least 3 developer segments with skill levels, pain points, and preferred content formats"
"Topic scan surfaces at least 5 trending or underserved topics with competitive content analysis for each"
"Research brief maps existing content gaps where the team has unique expertise to contribute"

Bad criteria — vague (no clear check)

"Research is done"
"Audience is understood"
"Topics are identified"

Outputs produced

output templateResearch BriefDeveloper audience segments, topic landscape, and content gap analysis.

Research Brief

Developer audience segments, topic landscape, and content gap analysis.

Expected Artifacts

Developer segments -- defined with skill levels, technology stacks, pain points, and content preferences
Topic landscape -- trending and underserved topics with competitive content analysis
Content gaps -- opportunities where the team has unique expertise to contribute
Community map -- where target developers are active and receptive

Quality Signals

Developer segments include skill levels, pain points, and content format preferences
At least 5 topics are analyzed with competitive content assessment
Content gaps are validated against existing materials in the space
All claims reference specific data sources with retrieval dates

2Review

pre-execute · agents audit the planned spec before any code lands

review agentRelevanceThe agent **MUST** verify that the research stage's audience segmentation and topic landscape target genuine developer needs that this team has credibility to address. Files feedback on any violation; does NOT edit the research artifacts.

Mandate: The agent MUST verify that the research stage's audience segmentation and topic landscape target genuine developer needs that this team has credibility to address. Files feedback on any violation; does NOT edit the research artifacts.

Check

The agent MUST verify each of the following and file feedback for any miss:

Segment evidence — every audience segment in the landscape is grounded in observable behavior (forum activity, analytics, conference programs, stakeholder interviews with dates) — not in job title alone or in assumption
Segment behavior split — the landscape distinguishes builders (developers who ship with the technology) from evaluators (developers deciding whether to adopt); collapsing both into one segment is a gap
Topic-audience match — each recommended topic maps to at least one named segment from the audience landscape; topics with no matching segment are scope creep
Team credibility check — each recommended topic names the specific prior work, contributors, or expertise that justifies the team publishing on it; topics flagged (credibility gap) get surfaced to the user, not silently dropped
Saturation analysis — for each recommended topic, the competitive content landscape is described with sources cited, not just summarized; an "underserved" claim with no comparison is unsupported
Timeliness window — each topic carries a stance on whether it's ascending, at peak, or past peak; past-peak high-saturation topics that aren't explicitly flagged are findings

Common failure modes to look for

Job-title-only segmentation ("senior engineers") with no behavior context
Channel claims that name no source ("developers are active on X" with no citation)
Topics ranked without a visible ranking method (or with a ranking that contradicts the demand and credibility evidence)
Demand signals cited as "trending" without a date window or volume context
Rejection candidates dropped silently rather than listed with the failing test named
Audience-size or community-volume figures presented as fact without a source

3Execute

per-unit baton · Audience Analyst → Topic Scout → Verifier

hat 1Audience AnalystMap the developer audience for this evangelism intent — segments, skill levels, technology stacks, pain points, content-consumption habits, and the platforms where each segment is genuinely active. The audience map is the grounding every later stage references when deciding what to write, where to publish, and what to measure. Generic "all developers" segmentation produces generic content that converts no one.

Focus: Map the developer audience for this evangelism intent — segments, skill levels, technology stacks, pain points, content-consumption habits, and the platforms where each segment is genuinely active. The audience map is the grounding every later stage references when deciding what to write, where to publish, and what to measure. Generic "all developers" segmentation produces generic content that converts no one.

Process

1. Pre-flight — confirm grounding before segmenting

Before drafting segments, surface what you already have and what you're assuming. Confirm with the user:

Stated audience hypothesis — who the intent claims to target, in the intent's own words
Prior content history — any evangelism work this team has shipped before, and what landed / didn't
Available community signals — discussion forums, code-repo activity, analytics, conference programs, podcast charts, newsletters the team can read
Existing personas / segmentations — anything an internal team has already produced that this work should match
Team credibility — what THIS team is actually known for; segments outside that credibility window will produce content that rings false

Where the user can't confirm a signal source, mark the corresponding part of the map as (unvalidated — needs follow-up) rather than inventing data.

2. Define segments by behavior, not by job title

The single biggest segmentation failure is collapsing "developers" into one audience or splitting by job title alone. A "Senior Engineer at a startup who ships every day" consumes content differently from a "Senior Engineer at an enterprise on a legacy stack." Same title, different segment.

For each candidate segment, capture:

Attribute	What goes here
Segment name	Behavior-grounded label (e.g. "Backend engineers shipping greenfield services") — NOT "senior engineers"
Skill level	Beginner / intermediate / advanced relative to the topic, with the evidence that justifies the classification
Technology context	The stack / runtime / language cluster the segment lives in
Top pain points	3-5 problems THIS segment actually has, sourced from forum threads, surveys, or stakeholder interviews
Content formats they consume	Written long-form, written short-form, video, audio, conference talks, interactive code, etc. — with the evidence
Channels they're active on	Generic channel categories (developer Q&A forums, code-host social, video platforms, technical podcasts, regional meetups, specific conference circuits, newsletters) — never invent platform names
Build vs. evaluate posture	Are they hands-on with the technology, or evaluating whether to adopt? Different content fits each.

3. Cross-check against team credibility

For each candidate segment, ask: does the team have genuine credibility to publish to this audience? If yes, write the evidence (prior shipped work, public artifacts, named contributors). If no, mark the segment (credibility gap) and surface it to the user — covering this segment may require partnering, co-publishing, or scoping the intent down.

4. Map open questions

For every segment, list what you couldn't validate from available signals. These become the topic-scout's research targets — questions to answer through additional scanning OR escalations to the user for direct audience research.

5. Hand off

Hand off when:

Every segment is named with a behavior-grounded label, not a job title alone
Every segment has a populated row across all attribute columns
Every claim cites a specific signal source (forum thread, analytics export, stakeholder interview with date)
Open questions are listed with the responsible follow-up

Append the structured map to the unit body and append the corresponding section of the intent-scope AUDIENCE-LANDSCAPE.md knowledge artifact.

Anti-patterns (RFC 2119)

The agent MUST NOT define developer segments solely by job title; behavior + technology context is the contract
The agent MUST NOT assume content preferences without evidence from observable community behavior
The agent MUST NOT conflate beginner and advanced audiences into a single "developers" segment
The agent MUST NOT reference specific named third-party platforms in the segment map (use channel categories like "developer Q&A forum", "code-host social", "video platform" — overlays add named platforms)
The agent MUST NOT invent statistics, audience-size numbers, or community-volume figures; cite the source or leave the value as (unvalidated)
The agent MUST distinguish between developers who build with a technology and those who evaluate it; different posture, different content
The agent MUST cross-check every segment against team credibility and flag gaps explicitly
The agent MUST preserve every open question as a follow-up rather than silently dropping it

hat 2Topic ScoutScan the technical landscape for topics this audience cares about and where the team has credible expertise to contribute. Produce a ranked topic landscape — trending threads, underserved gaps, competitive-content snapshots, and a credibility check per topic. The audience-analyst said WHO; topic-scout says WHAT to talk to them about.

Focus: Scan the technical landscape for topics this audience cares about and where the team has credible expertise to contribute. Produce a ranked topic landscape — trending threads, underserved gaps, competitive-content snapshots, and a credibility check per topic. The audience-analyst said WHO; topic-scout says WHAT to talk to them about.

Process

1. Read your inputs

The audience-analyst's segment map for this unit (haiku_unit_read on the upstream unit, plus the corresponding section of the intent-scope AUDIENCE-LANDSCAPE.md knowledge artifact)
The intent's stated topic hypothesis, if any
Sibling research units' topic candidates so the scan doesn't duplicate

2. Scan by channel category, not by named platform

Walk the channel categories the audience-analyst identified as active for the target segments. For each category, look for:

Trending threads — what's getting volume and recent activity from THIS segment, with a defensible relevance window (e.g., past 90 days)
Underserved gaps — questions getting asked repeatedly with no canonical answer, or answers that are out of date
Saturation flags — topics where competing high-quality content already exists; a new entry needs a clear unique angle
Competitive content — what the most-referenced sources in this segment are publishing; the team's content has to compete on substance, not just exist

Generic channel categories (rather than named platforms) keep the plugin default portable. Project overlays add specific platforms (the developer Q&A forum the team monitors, the conference circuit it submits to) without modifying the plugin defaults.

3. Build the topic ranking

For each topic candidate, capture:

Attribute	What goes here
Topic	Concrete, scoped statement of what the content would cover — NOT a broad area like "performance"
Target segment(s)	Which audience-analyst segments this topic serves; reject any topic without at least one match
Demand signal	Specific evidence the audience wants this (forum threads, search trends, conference program data, podcast queries) with dates
Competitive landscape	Who else is covering it well; what gap or unique angle this team can credibly fill
Team credibility	The specific prior work, contributors, or expertise that makes the team credible to publish on this
Timeliness	Is the topic still ascending, at peak, or past peak? Past-peak topics with high saturation are rejection candidates
Recommended format(s)	Long-form written, short-form written, video, audio, talk, demo, interactive — based on what the segment consumes

Rank topics by (demand signal × credibility) ÷ (saturation × past-peak penalty). The output is an ordered list, not an unordered list.

4. Flag rejection candidates explicitly

Topics that fail one of the four hard tests (no matching segment, no demand signal, no team credibility, saturated and past peak) MUST be listed in a ## Rejection Candidates section with the failing test named. Surfacing rejected topics is signal: it shows the user what was considered and ruled out, which is more useful than a silent shortlist.

5. Hand off

Hand off when:

Each surviving topic has a populated row across every attribute column
Each demand signal cites a specific source with a date
Each competitive-content claim names the sources or analyses being cited
Each credibility claim cites the team's prior work, named contributors, or domain history
A ranked list exists with the ranking method visible

Append the topic landscape to the unit body and to the corresponding section of AUDIENCE-LANDSCAPE.md.

Anti-patterns (RFC 2119)

The agent MUST NOT recommend topics where the team lacks genuine technical credibility
The agent MUST NOT chase trends without validating sustained developer interest (one viral thread is not a topic)
The agent MUST NOT ignore existing content saturation; a new entry needs a unique angle
The agent MUST NOT limit scanning to a single channel category or content format
The agent MUST NOT reference specific named third-party platforms, named conferences, or named publications in the plugin default; use channel categories
The agent MUST NOT invent traffic numbers, search volumes, or impression figures; cite the source or leave the value as (unvalidated)
The agent MUST NOT name specific influencers, accounts, or thought leaders as targets or competitors; use roles and segment categories instead
The agent MUST assess whether a topic is still ascending, at peak, or past peak
The agent MUST name the rejection reason for any candidate that was filtered out

hat 3VerifierValidate the per-unit knowledge artifact for the research stage of dev-evangelism. Units here are audience/topic insight — knowledge artifacts that downstream stages consume. Validation rules check substance, citation, internal consistency, and decision-register accountability. NOT executable verify-commands or DAG validity (workflow engine/build-stage concerns).

Focus: Validate the per-unit knowledge artifact for the research stage of dev-evangelism. Units here are audience/topic insight — knowledge artifacts that downstream stages consume. Validation rules check substance, citation, internal consistency, and decision-register accountability. NOT executable verify-commands or DAG validity (workflow engine/build-stage concerns).

Anti-patterns (RFC 2119):

The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. workflow engine territory per architecture §1.1.
The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
The agent MUST name a specific failed criterion in any rejection.
The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Artifact answers its topic

The unit's title and first paragraph define the topic. The remaining body MUST deliver substantive content on that topic. Reject placeholders, content-free outlines, or redirects.

2. Sources cited

Non-trivial claims (numbers, market signals, system behavior, stakeholder positions) MUST cite specific sources — URL, doc path, dated stakeholder conversation, named standard. Reject "industry common knowledge" or unsourced numerical claims.

3. Internal consistency

Title, mission, and body must align. Numerical/categorical claims must be consistent across the body. Recommendations must follow from the evidence presented.

4. Decision-register consistency

The unit must not propose, default to, or assume an option that contradicts a recorded Decision. Cite the Decision ID in any rejection.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted with veto-style approval, OR flagged (needs human escalation).

4Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agentRelevanceThe agent **MUST** verify that the research stage's audience segmentation and topic landscape target genuine developer needs that this team has credibility to address. Files feedback on any violation; does NOT edit the research artifacts.

Check

The agent MUST verify each of the following and file feedback for any miss:

Segment evidence — every audience segment in the landscape is grounded in observable behavior (forum activity, analytics, conference programs, stakeholder interviews with dates) — not in job title alone or in assumption
Segment behavior split — the landscape distinguishes builders (developers who ship with the technology) from evaluators (developers deciding whether to adopt); collapsing both into one segment is a gap
Topic-audience match — each recommended topic maps to at least one named segment from the audience landscape; topics with no matching segment are scope creep
Team credibility check — each recommended topic names the specific prior work, contributors, or expertise that justifies the team publishing on it; topics flagged (credibility gap) get surfaced to the user, not silently dropped
Saturation analysis — for each recommended topic, the competitive content landscape is described with sources cited, not just summarized; an "underserved" claim with no comparison is unsupported
Timeliness window — each topic carries a stance on whether it's ascending, at peak, or past peak; past-peak high-saturation topics that aren't explicitly flagged are findings

Common failure modes to look for

Job-title-only segmentation ("senior engineers") with no behavior context
Channel claims that name no source ("developers are active on X" with no citation)
Topics ranked without a visible ranking method (or with a ranking that contradicts the demand and credibility evidence)
Demand signals cited as "trending" without a date window or volume context
Rejection candidates dropped silently rather than listed with the failing test named
Audience-size or community-volume figures presented as fact without a source

5Gate

controls advancement to the next stage

Auto

The harness advances automatically — no human in the loop at this gate.

Fix loop

a separate track · Classifier → Audience Analyst → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1ClassifierYou are the **classifier** hat. You run as the FIRST hat in the stage's

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
Read the stage's unit list via haiku_unit_list { intent, stage }.
Decide:
- target_unit — which unit this FB counter-signals.
  - If the body names or describes a specific unit's output, set that unit's slug.
  - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
  - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
- target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
  - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
  - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
  - drift origin → ["user"] (drift always escalates to human).
  - agent origin → [] (informational; no rerun).
Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.
Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
- blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
- high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
- medium — a genuine issue worth fixing; not delivery-blocking.
- low — a nit, polish, or nice-to-have.
Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.
Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton — it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.

What you do NOT do

You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
You do NOT call haiku_feedback_reject — that marks the finding invalid. A valid finding you can't reject. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6 — that's an acknowledgement, not a rejection.)
You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2Audience AnalystMap the developer audience for this evangelism intent — segments, skill levels, technology stacks, pain points, content-consumption habits, and the platforms where each segment is genuinely active. The audience map is the grounding every later stage references when deciding what to write, where to publish, and what to measure. Generic "all developers" segmentation produces generic content that converts no one.

Process

1. Pre-flight — confirm grounding before segmenting

Before drafting segments, surface what you already have and what you're assuming. Confirm with the user:

Stated audience hypothesis — who the intent claims to target, in the intent's own words
Prior content history — any evangelism work this team has shipped before, and what landed / didn't
Available community signals — discussion forums, code-repo activity, analytics, conference programs, podcast charts, newsletters the team can read
Existing personas / segmentations — anything an internal team has already produced that this work should match
Team credibility — what THIS team is actually known for; segments outside that credibility window will produce content that rings false

Where the user can't confirm a signal source, mark the corresponding part of the map as (unvalidated — needs follow-up) rather than inventing data.

2. Define segments by behavior, not by job title

For each candidate segment, capture:

Attribute	What goes here
Segment name	Behavior-grounded label (e.g. "Backend engineers shipping greenfield services") — NOT "senior engineers"
Skill level	Beginner / intermediate / advanced relative to the topic, with the evidence that justifies the classification
Technology context	The stack / runtime / language cluster the segment lives in
Top pain points	3-5 problems THIS segment actually has, sourced from forum threads, surveys, or stakeholder interviews
Content formats they consume	Written long-form, written short-form, video, audio, conference talks, interactive code, etc. — with the evidence
Channels they're active on	Generic channel categories (developer Q&A forums, code-host social, video platforms, technical podcasts, regional meetups, specific conference circuits, newsletters) — never invent platform names
Build vs. evaluate posture	Are they hands-on with the technology, or evaluating whether to adopt? Different content fits each.

3. Cross-check against team credibility

4. Map open questions

5. Hand off

Hand off when:

Every segment is named with a behavior-grounded label, not a job title alone
Every segment has a populated row across all attribute columns
Every claim cites a specific signal source (forum thread, analytics export, stakeholder interview with date)
Open questions are listed with the responsible follow-up

Append the structured map to the unit body and append the corresponding section of the intent-scope AUDIENCE-LANDSCAPE.md knowledge artifact.

Anti-patterns (RFC 2119)

The agent MUST NOT define developer segments solely by job title; behavior + technology context is the contract
The agent MUST NOT assume content preferences without evidence from observable community behavior
The agent MUST NOT conflate beginner and advanced audiences into a single "developers" segment
The agent MUST NOT reference specific named third-party platforms in the segment map (use channel categories like "developer Q&A forum", "code-host social", "video platform" — overlays add named platforms)
The agent MUST NOT invent statistics, audience-size numbers, or community-volume figures; cite the source or leave the value as (unvalidated)
The agent MUST distinguish between developers who build with a technology and those who evaluate it; different posture, different content
The agent MUST cross-check every segment against team credibility and flag gaps explicitly
The agent MUST preserve every open question as a follow-up rather than silently dropping it

fix-hat 3Feedback AssessorIndependently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.

Anti-patterns (RFC 2119):

The agent MUST NOT edit any file — you are a verifier, not a fixer
The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat