Dev Evangelism · stage 5 of 5

Measure

Auto gate

Track engagement, gather feedback, identify follow-up opportunities

The terminal stage of the dev-evangelism lifecycle: close the loop. Compare actuals to targets per channel, synthesize the qualitative feedback from community responses, and produce the prioritized follow-ups that seed the next intent.

Scope

Reading impact and recommending what comes next: quantitative deltas joined to qualitative themes. Measure decides what the content actually achieved and what to do about it; it does not produce or republish content (those belong to the create and publish stages). This is where vanity metrics die: impressions and likes with no connection to a meaningful outcome are noise.

What to do

  • Pull engagement per channel, compare actuals to targets, and name the drivers of over- and under-performance.
  • Filter vanity metrics out; keep what connects to a real outcome — signups, doc visits, code-sample copies, invites, recurring readership.
  • Gather community comments and replies, categorize the themes, and preserve representative quotes.
  • Ground every follow-up recommendation in both the numbers and the audience's own words.

What NOT to do

  • Don't produce, edit, or republish content — that's the create and publish stages.
  • Don't report a metric you can't tie to a meaningful outcome.
  • Don't synthesize feedback without keeping the representative quotes that back it.
  • Don't hand off follow-ups that aren't prioritized.

How the engine runs this stage

1 · Elaborate

autonomous · plan the work, fan out discovery, declare outputs

Phase guidance

phase override · ELABORATION

Measure Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

  • "Impact report compares actual engagement metrics against targets with variance analysis per channel"
  • "Feedback synthesis categorizes developer responses into actionable themes with sentiment analysis"
  • "Follow-up recommendations are prioritized by potential impact and effort"

Bad criteria — vague (no clear check)

  • "Metrics are tracked"
  • "Feedback is gathered"
  • "Report is written"

Outputs produced

output template · Impact Report

Engagement analysis, feedback synthesis, and follow-up recommendations.

Expected Artifacts

  • Engagement metrics -- actuals vs. targets with channel-level breakdown
  • Audience analysis -- which developer segments engaged and through what formats
  • Feedback synthesis -- categorized community feedback with sentiment and notable quotes
  • Follow-up recommendations -- prioritized next actions with projected reach and effort

Quality Signals

  • Metrics compare actuals against defined targets with variance analysis
  • Channel breakdown identifies top and bottom performers with specific drivers
  • Feedback categories are backed by representative developer quotes
  • Recommendations connect to specific feedback themes and audience needs
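
The "prioritized by potential impact and effort" requirement can be made checkable by scoring each recommendation and sorting. The sketch below is illustrative only: the `Recommendation` fields, the 1-5 scales, and the impact-over-effort score are assumptions, not something the stage prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    """Hypothetical record for one follow-up recommendation."""
    title: str
    impact: int  # assumed scale: 1 (low) to 5 (high) projected impact
    effort: int  # assumed scale: 1 (low) to 5 (high) estimated effort
    theme: str   # the finding or feedback theme the recommendation rests on


def prioritize(recs: list[Recommendation]) -> list[Recommendation]:
    """Order by impact per unit of effort, highest first; ties go to higher impact."""
    for rec in recs:
        if not rec.theme:
            raise ValueError(f"{rec.title!r} is not tied to a finding or theme")
    return sorted(recs, key=lambda r: (r.impact / r.effort, r.impact), reverse=True)


# Made-up example rows; the theme labels reuse the examples from this page.
recs = [
    Recommendation("Benchmark walkthrough video", impact=5, effort=5,
                   theme="request for benchmark replication"),
    Recommendation("Migration FAQ", impact=4, effort=1,
                   theme="confusion about the migration path"),
    Recommendation("Short clarifying post", impact=2, effort=1,
                   theme="confusion about the migration path"),
]
ranked = prioritize(recs)
for rank, rec in enumerate(ranked, start=1):
    print(rank, rec.title, f"(impact {rec.impact}, effort {rec.effort})")
```

A ratio is only one possible ordering; a team might instead weight reach or strategic fit, as long as every recommendation carries an explicit impact, effort, and theme.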

2 · Review

pre-execute · agents audit the planned spec before any code lands
review agent · Roi

Mandate: The agent MUST verify the impact report compares actuals against the intent's declared targets, names specific drivers for every variance, and produces follow-up recommendations the team can prioritize. Files feedback on any violation; does NOT rewrite the report.

Check

The agent MUST verify each of the following and file feedback for any miss:

  • Actuals vs. targets present — every outcome the intent declared as a target has a row with actual / target / variance / driver; missing rows are findings
  • Source named per number — every reported metric cites the specific instrumentation surface it came from (analytics export, attribution tag, dashboard view, etc.); unsourced numbers are findings
  • Reach / engagement / outcome kept distinct — vanity metrics (impressions, "views") are framed as reach context, engagement as a separate column, outcome (the thing the intent actually wanted) as its own column; conflating the three is a finding
  • Variance driver per significant delta — every significant over- or under-performance names a specific driver (channel-mix, adaptation, timing, topic, format, voice) with cited evidence
  • No fabricated numbers — gaps in instrumentation are marked (missing instrumentation) with the corrective action queued, never invented
  • Feedback backed by verbatim quotes — every theme in the qualitative synthesis has 2+ representative verbatim quotes with source attribution; paraphrase-only themes are findings
  • Misunderstandings called out — feedback that reveals content gaps (audience read X, content meant Y) is surfaced separately with a specific corrective action for the next intent
  • Follow-up recommendations prioritized — each recommendation has a projected impact, an effort estimate, and a connection to a specific finding or feedback theme; unprioritized recommendation lists are findings
  • Single voices labeled — themes with only one supporting quote are labeled as single voices, not promoted to patterns

Common failure modes to look for

  • A report that lists numbers without comparing them to declared targets
  • Vanity metrics presented as success ("100k impressions") with no engagement or outcome connection
  • Causation claimed where only correlation exists (the asset launched, the metric moved, no other variables considered)
  • Themes with no verbatim quotes — paraphrase-only summaries that hide what the audience actually said
  • Recommendations stacked without prioritization, leaving the team to guess what matters
  • Missing instrumentation papered over with estimated numbers
  • A loud single critic promoted to a theme without supporting evidence

3 · Execute

per-unit baton · Analyst → Feedback Synthesizer → Verifier
hat 1 · Analyst

Focus: Read the distribution log + the live analytics, compare actuals to the targets the intent declared, and identify the specific drivers behind over- and under-performance per channel. The analyst's output is what makes the measure stage useful — not a list of numbers, but a list of explanations the team can act on. Numbers without drivers are dashboards; the analyst produces decisions.

Process

1. Read your inputs

  • The intent's stated targets per channel / per segment / per outcome (whatever the elaborate phase of measure captured as the success bar)
  • The publish stage's DISTRIBUTION-LOG.md (every published row with its initial 24-48h snapshot)
  • The live analytics for each channel — engagement, click-throughs, attribution-link traffic, downstream signals (signups, doc visits, code-sample copies, recurring readership)
  • Sibling analyst units' findings for any other channel clusters in this intent

2. Build the actuals-vs-targets table

One row per outcome the intent tracked. Per row capture:

Outcome | Target | Actual | Δ vs. target | Variance driver
<named target from the intent> | <the declared number / range / threshold> | <the measured number> | <percent or absolute delta> | <the why behind the delta>

Hard rules:

  • Every cell is a real number sourced from a real instrumentation surface, with the source named (analytics export name, attribution link ID, dashboard view, etc.); never an estimate
  • Where instrumentation is broken or missing, capture (missing instrumentation) with the corrective action queued for the next intent — DON'T invent a number to fill the cell
  • Variance drivers are specific claims with evidence; "did well" / "underperformed" without a reason is rejected by the verifier
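
The hard rules above can be sketched as a small row type that refuses to render an unsourced number and never fills a missing cell. This is an illustration under assumptions: `OutcomeRow`, its field names, and the single-number, nonzero target are hypothetical (the intent may declare a range or threshold instead), and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = "(missing instrumentation)"


@dataclass(frozen=True)
class OutcomeRow:
    """Hypothetical shape for one row of the actuals-vs-targets table."""
    outcome: str
    target: float            # assumed to be a single nonzero number
    actual: Optional[float]  # None when instrumentation is broken or missing
    source: str              # instrumentation surface the actual came from
    driver: str              # the why behind the delta
    corrective_action: str = ""

    def delta_pct(self) -> Optional[float]:
        """Percent delta vs. target, or None when nothing was measured."""
        if self.actual is None:
            return None
        return (self.actual - self.target) / self.target * 100.0

    def render(self) -> str:
        if self.actual is None:
            # Never invent a number: mark the gap and queue the fix.
            if not self.corrective_action:
                raise ValueError(f"{self.outcome}: missing actual needs a corrective action")
            return f"{self.outcome} | {self.target:g} | {MISSING} | n/a | {self.corrective_action}"
        if not self.source:
            raise ValueError(f"{self.outcome}: actual has no named source")
        return (f"{self.outcome} | {self.target:g} | {self.actual:g} ({self.source}) | "
                f"{self.delta_pct():+.1f}% | {self.driver}")


measured = OutcomeRow("Signups", target=200, actual=250,
                      source="attribution link export",
                      driver="unplanned forum thread carried the load")
unmeasured = OutcomeRow("Code-sample copies", target=50, actual=None,
                        source="", driver="",
                        corrective_action="instrument the copy button before the next intent")
print(measured.render())
print(unmeasured.render())
```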

3. Identify drivers — and what they're attributable to

For each significant variance (positive or negative), name the driver in terms the team can repeat or avoid:

  • Channel-mix driver — performance shifted because the channel mix was different from the planned mix (e.g., the asset took off in a forum we hadn't seeded heavily)
  • Adaptation driver — the platform-specific adaptation produced different results from sibling channels' adaptations
  • Timing driver — the publish window collided with or rode an external event (related launch, holiday, news cycle)
  • Topic driver — the topic resonated differently with the segment than the research stage predicted (positive or negative)
  • Format driver — one format (written vs. video vs. talk) carried disproportionate weight
  • Voice driver — the community-manager's seeding voice landed (or didn't) in specific communities

Each driver claim cites the specific evidence: thread URL with reply count, dashboard view with date range, comment quote that shifted the conversation.

4. Reach vs. engagement vs. outcome

Vanity metrics (impressions, "views") get reported but framed as reach context. Engagement (replies, click-throughs, dwell time) gets its own column. Outcome (the thing the intent actually wanted — signups, adoption signals, follow-up conversations, code-sample copies, conference invites, recurring readership) gets its own column. Confusing these three is the highest-frequency failure of a measure report.

A high-reach / low-engagement / zero-outcome asset is not "successful traffic" — it's a content cost without a result. Name it that way.
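
A minimal sketch of keeping the three columns apart, assuming a flat dictionary of metric names. The metric names and both helper functions are hypothetical, and "high" reach is simplified to "any reach" because the page defines no threshold.

```python
# Assumed metric names; the real list comes from the intent's declared targets.
REACH = {"impressions", "views"}
ENGAGEMENT = {"replies", "click_throughs", "dwell_time_s"}
OUTCOME = {"signups", "doc_visits", "code_sample_copies", "conference_invites"}


def split_metrics(metrics: dict[str, float]) -> dict[str, dict[str, float]]:
    """Sort raw metrics into reach / engagement / outcome columns."""
    columns: dict[str, dict[str, float]] = {"reach": {}, "engagement": {}, "outcome": {}}
    for name, value in metrics.items():
        if name in REACH:
            columns["reach"][name] = value
        elif name in ENGAGEMENT:
            columns["engagement"][name] = value
        elif name in OUTCOME:
            columns["outcome"][name] = value
        else:
            raise KeyError(f"unclassified metric: {name}")
    return columns


def is_cost_without_result(columns: dict[str, dict[str, float]]) -> bool:
    """True when an asset has reach but a measured outcome of zero."""
    if not columns["outcome"]:
        # An unmeasured outcome is a gap, not a zero.
        raise ValueError("no outcome metric measured; mark (missing instrumentation) instead")
    has_reach = sum(columns["reach"].values()) > 0
    no_outcome = sum(columns["outcome"].values()) == 0
    return has_reach and no_outcome


columns = split_metrics({"impressions": 100_000, "replies": 12, "signups": 0})
print(is_cost_without_result(columns))
```

Raising on an unknown metric name forces each new metric to be classified on purpose rather than silently landing in the wrong column.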

5. Pattern-walk across channels and segments

After the per-channel table, walk the patterns:

  • Which segment(s) drove the largest share of outcome?
  • Which channel category produced the strongest outcome per unit of effort?
  • Which format produced the strongest outcome per unit of effort?
  • Where did the channel plan fail (planned channels that produced nothing) and where did unplanned channels carry the load?

Each pattern claim is a single sentence + the evidence it rests on.
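
"Outcome per unit of effort" and comparisons across channels of different size both come down to dividing by the right denominator. The sketch below assumes effort is tracked in hours and that each channel has a known audience size; `ChannelResult`, the channel labels, and the numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelResult:
    """Hypothetical per-channel rollup; all fields are illustrative."""
    channel: str
    outcome: int         # e.g. signups attributed to the channel
    effort_hours: float  # assumed effort unit: hours spent adapting and seeding
    audience: int        # size of the channel's reachable audience


def outcome_per_effort(r: ChannelResult) -> float:
    """Outcome produced per hour of effort."""
    return r.outcome / r.effort_hours


def outcome_per_1k_audience(r: ChannelResult) -> float:
    """Outcome normalized for audience scale (per 1,000 audience members)."""
    return r.outcome * 1000 / r.audience


results = [
    ChannelResult("large social feed", outcome=40, effort_hours=8, audience=80_000),
    ChannelResult("niche forum", outcome=30, effort_hours=2, audience=3_000),
]
best = max(results, key=outcome_per_effort)
for r in results:
    print(r.channel, outcome_per_effort(r), outcome_per_1k_audience(r))
print("strongest outcome per unit of effort:", best.channel)
```

In this made-up data the larger channel wins on raw outcome (40 vs. 30) but loses on both normalized measures, which is the kind of reversal a raw comparison hides.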

6. Hand off

Hand off when:

  • The actuals-vs-targets table has a row per outcome with a real sourced number
  • Every significant variance has a named driver and cited evidence
  • Reach / engagement / outcome are kept distinct in the reporting
  • Pattern-walk findings are captured for the feedback-synthesizer and the impact report

Anti-patterns (RFC 2119)

  • The agent MUST NOT report vanity metrics (impressions, "views") without distinguishing them from engagement and from outcome
  • The agent MUST NOT attribute causation where only correlation exists
  • The agent MUST NOT compare metrics across channels without normalizing for the channel's audience scale
  • The agent MUST NOT ignore underperforming channels without analyzing why
  • The agent MUST NOT invent numbers to fill in missing instrumentation; (missing instrumentation) is the correct cell value with the corrective action queued
  • The agent MUST NOT reference specific named third-party analytics platforms or attribution systems in the plugin default; project overlays handle named platforms
  • The agent MUST NOT name specific influencers or accounts as drivers; describe the role / segment behavior
  • The agent MUST cite the specific instrumentation surface for every number reported
  • The agent MUST name a specific driver for every significant variance, not just "did well" or "underperformed"
  • The agent MUST keep reach, engagement, and outcome distinct in the report
hat 2 · Feedback Synthesizer

Focus: Read the qualitative signal — comments, replies, DMs, support tickets, conference Q&A, follow-up emails — and turn it into themes the team can act on. The analyst handles the numbers; you handle the words. The output is a categorized synthesis with representative quotes preserved verbatim, not a list of paraphrased reactions.

Process

1. Read your inputs

  • The community-manager's response log from the publish stage (every thread, sentiment slice, notable quote, surfaced follow-up)
  • The analyst's actuals-vs-targets table and pattern-walk findings (so qualitative themes can be aligned with quantitative variances)
  • Any direct-channel feedback that came back (DMs, emails, support tickets, conference Q&A, internal Slack mentions, etc.) — the intent's elaborate phase should have named which sources count
  • Sibling feedback-synthesizer units' themes, to keep category names consistent across the intent

2. Gather verbatim before categorizing

Pull every substantive piece of qualitative feedback into a working list. For each:

  • Source (channel name, thread URL, message reference)
  • Verbatim quote (or close paraphrase if the original was long; mark it as paraphrase explicitly)
  • Sentiment slice (supportive / neutral / critical / confused / off-topic)
  • Audience segment, if identifiable from the channel and the message

Don't categorize yet. Premature categorization is how patterns get manufactured — you fit reactions into the categories you expected to find. Capture first.

3. Group into themes

Walk the verbatim list and group reactions into themes — categories that emerge from the data, not categories you brought in. For each theme:

Field | What goes here
Theme | Short noun-phrase label (e.g., "confusion about the migration path", "request for benchmark replication")
Frequency | How many distinct reactions touched this theme; cite the verbatim quotes
Sentiment slice | Supportive / neutral / critical / confused; pick one dominant, name secondary if mixed
Representative quotes | 2-4 verbatim quotes with source attribution that show what the theme actually sounds like
Audience segments | Which segments the theme came from
What the team should hear | The action / lesson / question this theme surfaces for the team

A theme with only one supporting quote is a single voice, not a pattern — call it that. A theme with many quotes from one channel and zero from others is channel-specific, not intent-wide.
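
The single-voice and channel-specific distinctions can be sketched as a labeling pass over the working list. This is a simplification under assumptions: `Reaction` and `label_themes` are hypothetical names, "many quotes from one channel" is reduced to "two or more from exactly one source", and the quotes are placeholders rather than real feedback.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical working-list entry; field names are illustrative."""
    source: str     # channel name or thread reference
    quote: str      # verbatim text
    sentiment: str  # supportive / neutral / critical / confused / off-topic
    theme: str      # label assigned only after the verbatim list is complete


def label_themes(reactions: list[Reaction]) -> dict[str, str]:
    """Label each theme as a single voice, channel-specific, or a pattern."""
    by_theme: dict[str, list[Reaction]] = defaultdict(list)
    for reaction in reactions:
        by_theme[reaction.theme].append(reaction)

    labels: dict[str, str] = {}
    for theme, items in by_theme.items():
        sources = {item.source for item in items}
        if len(items) == 1:
            labels[theme] = "single voice"
        elif len(sources) == 1:
            labels[theme] = "channel-specific"
        else:
            labels[theme] = "pattern"
    return labels


reactions = [
    Reaction("forum thread A", "<verbatim quote 1>", "confused", "confusion about the migration path"),
    Reaction("conference Q&A", "<verbatim quote 2>", "confused", "confusion about the migration path"),
    Reaction("forum thread A", "<verbatim quote 3>", "neutral", "request for benchmark replication"),
    Reaction("newsletter replies", "<verbatim quote 4>", "supportive", "request for a video version"),
    Reaction("newsletter replies", "<verbatim quote 5>", "supportive", "request for a video version"),
]
labels = label_themes(reactions)
for theme, label in labels.items():
    print(f"{theme}: {label}")
```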

4. Surface misunderstandings the content should have prevented

The most valuable subset of feedback is the kind that says "I read this and I think it means X" when the content meant Y. These are content gaps disguised as user confusion. Per misunderstanding:

  • The specific claim or section that was misread
  • The misread interpretation (with verbatim quotes)
  • The correct interpretation (what the content meant)
  • Why the misread happened — was the asset ambiguous, was a piece of context missing, was the framing wrong?
  • A specific corrective action for the next intent (clearer phrasing, additional example, demo extension, FAQ addition)

5. Generate follow-up content seeds

Every theme of meaningful frequency is a candidate seed for the next intent's research stage. Capture each as:

  • Suggested follow-up content (one sentence)
  • Projected segment that would consume it
  • Projected channels best fit to deliver it
  • Demand evidence (the quotes that justify the seed)

These seeds feed the next dev-evangelism intent's research; without them, the loop never closes and the team rewrites the same content again.

6. Hand off

Hand off when:

  • Every captured reaction has a source and a sentiment slice
  • Every theme has 2+ representative verbatim quotes
  • Misunderstandings are called out separately from themes
  • Follow-up seeds are captured for the next intent
  • Single voices are labeled as single voices, not promoted to themes

Anti-patterns (RFC 2119)

  • The agent MUST NOT cherry-pick only positive feedback while ignoring criticism
  • The agent MUST NOT over-index on a single loud voice and promote it to a theme without supporting quotes
  • The agent MUST NOT categorize feedback without preserving representative verbatim quotes with source attribution
  • The agent MUST NOT recommend follow-ups without connecting them to specific feedback themes
  • The agent MUST NOT invent quotes, paraphrases, sentiment labels, or response volume; cite what was observed or omit
  • The agent MUST NOT reference specific named third-party platforms or feedback sources in the plugin default; project overlays add named platforms
  • The agent MUST NOT name specific commenters or accounts; use segment labels and roles
  • The agent MUST flag feedback that reveals misunderstandings the content should have prevented
  • The agent MUST distinguish single voices from patterns explicitly
  • The agent MUST generate follow-up seeds so the next intent's research stage has grounded inputs
hat 3 · Verifier

Focus: Validate the per-unit knowledge artifact for the measure stage of dev-evangelism. Units here are measurement readouts: knowledge artifacts that downstream stages consume. Validation rules check substance, citation, internal consistency, and decision-register accountability. They do NOT cover executable verify-commands or DAG validity (workflow engine/build-stage concerns).

Anti-patterns (RFC 2119):

  • The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose; that is workflow engine territory per architecture §1.1.
  • The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
  • The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
  • The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
  • The agent MUST name a specific failed criterion in any rejection.
  • The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Artifact answers its topic

The unit's title and first paragraph define the topic. The remaining body MUST deliver substantive content on that topic. Reject placeholders, content-free outlines, or redirects.

2. Sources cited

Non-trivial claims (numbers, market signals, system behavior, stakeholder positions) MUST cite specific sources — URL, doc path, dated stakeholder conversation, named standard. Reject "industry common knowledge" or unsourced numerical claims.

3. Internal consistency

Title, mission, and body must align. Numerical/categorical claims must be consistent across the body. Recommendations must follow from the evidence presented.

4. Decision-register consistency

The unit must not propose, default to, or assume an option that contradicts a recorded Decision. Cite the Decision ID in any rejection.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted with veto-style approval, OR flagged (needs human escalation).

4 · Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

approval agent · Roi

Mandate: The agent MUST verify the impact report compares actuals against the intent's declared targets, names specific drivers for every variance, and produces follow-up recommendations the team can prioritize. Files feedback on any violation; does NOT rewrite the report.

Check

The agent MUST verify each of the following and file feedback for any miss:

  • Actuals vs. targets present — every outcome the intent declared as a target has a row with actual / target / variance / driver; missing rows are findings
  • Source named per number — every reported metric cites the specific instrumentation surface it came from (analytics export, attribution tag, dashboard view, etc.); unsourced numbers are findings
  • Reach / engagement / outcome kept distinct — vanity metrics (impressions, "views") are framed as reach context, engagement as a separate column, outcome (the thing the intent actually wanted) as its own column; conflating the three is a finding
  • Variance driver per significant delta — every significant over- or under-performance names a specific driver (channel-mix, adaptation, timing, topic, format, voice) with cited evidence
  • No fabricated numbers — gaps in instrumentation are marked (missing instrumentation) with the corrective action queued, never invented
  • Feedback backed by verbatim quotes — every theme in the qualitative synthesis has 2+ representative verbatim quotes with source attribution; paraphrase-only themes are findings
  • Misunderstandings called out — feedback that reveals content gaps (audience read X, content meant Y) is surfaced separately with a specific corrective action for the next intent
  • Follow-up recommendations prioritized — each recommendation has a projected impact, an effort estimate, and a connection to a specific finding or feedback theme; unprioritized recommendation lists are findings
  • Single voices labeled — themes with only one supporting quote are labeled as single voices, not promoted to patterns

Common failure modes to look for

  • A report that lists numbers without comparing them to declared targets
  • Vanity metrics presented as success ("100k impressions") with no engagement or outcome connection
  • Causation claimed where only correlation exists (the asset launched, the metric moved, no other variables considered)
  • Themes with no verbatim quotes — paraphrase-only summaries that hide what the audience actually said
  • Recommendations stacked without prioritization, leaving the team to guess what matters
  • Missing instrumentation papered over with estimated numbers
  • A loud single critic promoted to a theme without supporting evidence

5 · Gate

controls advancement to the next stage
Auto

The harness advances automatically — no human in the loop at this gate.

Fix loop

a separate track · Classifier → Analyst → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

  1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.

  2. Read the stage's unit list via haiku_unit_list { intent, stage }.

  3. Decide:

    • target_unit — which unit this FB counter-signals.
      • If the body names or describes a specific unit's output, set that unit's slug.
      • If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
      • When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
    • target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
      • user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
      • adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
      • drift origin → ["user"] (drift always escalates to human).
      • agent origin → [] (informational; no rerun).
  4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.

  5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.

    • blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
    • high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
    • medium — a genuine issue worth fixing; not delivery-blocking.
    • low — a nit, polish, or nice-to-have.

    Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.

  6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.

  7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
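
The routing defaults and severity ordering in the steps above can be sketched as two pure functions. This is not the engine's implementation: the real writes go through haiku_feedback_set_targets and haiku_feedback_set_severity, whose internals are not shown here. The function names, the `findings` dictionary shape, and the `FB-…` IDs are hypothetical.

```python
from typing import Optional

# Severity levels from the rubric, most urgent first.
SEVERITY_ORDER = ["blocker", "high", "medium", "low"]


def default_invalidates(origin: str, filer: Optional[str] = None) -> list[str]:
    """Rule-of-thumb mapping from feedback origin to approval roles to clear."""
    if origin in {"user-chat", "user-visual", "user-question", "drift"}:
        return ["user"]  # the human re-reviews; drift always escalates to human
    if origin in {"adversarial-review", "studio-review"}:
        if not filer:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer]   # the originating reviewer re-runs
    if origin == "agent":
        return []        # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings: list[dict]) -> list[dict]:
    """Sort findings so higher-severity ones are dispatched first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


findings = [
    {"id": "FB-2", "severity": "low"},
    {"id": "FB-1", "severity": "blocker"},
    {"id": "FB-3", "severity": "high"},
]
print([f["id"] for f in dispatch_order(findings)])
print(default_invalidates("studio-review", filer="roi"))
```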

What you do NOT do

  • You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
  • You do NOT call haiku_feedback_reject: that marks the finding invalid, and a valid finding must not be rejected. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6; that's an acknowledgement, not a rejection.)
  • You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Analyst

Focus: Read the distribution log + the live analytics, compare actuals to the targets the intent declared, and identify the specific drivers behind over- and under-performance per channel. The analyst's output is what makes the measure stage useful — not a list of numbers, but a list of explanations the team can act on. Numbers without drivers are dashboards; the analyst produces decisions.

Process

1. Read your inputs

  • The intent's stated targets per channel / per segment / per outcome (whatever the elaborate phase of measure captured as the success bar)
  • The publish stage's DISTRIBUTION-LOG.md (every published row with its initial 24-48h snapshot)
  • The live analytics for each channel — engagement, click-throughs, attribution-link traffic, downstream signals (signups, doc visits, code-sample copies, recurring readership)
  • Sibling analyst units' findings for any other channel clusters in this intent

2. Build the actuals-vs-targets table

One row per outcome the intent tracked. Per row capture:

Outcome | Target | Actual | Δ vs. target | Variance driver
<named target from the intent> | <the declared number / range / threshold> | <the measured number> | <percent or absolute delta> | <the why behind the delta>

Hard rules:

  • Every cell is a real number sourced from a real instrumentation surface, with the source named (analytics export name, attribution link ID, dashboard view, etc.); never an estimate
  • Where instrumentation is broken or missing, capture (missing instrumentation) with the corrective action queued for the next intent — DON'T invent a number to fill the cell
  • Variance drivers are specific claims with evidence; "did well" / "underperformed" without a reason is rejected by the verifier

3. Identify drivers — and what they're attributable to

For each significant variance (positive or negative), name the driver in terms the team can repeat or avoid:

  • Channel-mix driver — performance shifted because the channel mix was different from the planned mix (e.g., the asset took off in a forum we hadn't seeded heavily)
  • Adaptation driver — the platform-specific adaptation produced different results from sibling channels' adaptations
  • Timing driver — the publish window collided with or rode an external event (related launch, holiday, news cycle)
  • Topic driver — the topic resonated differently with the segment than the research stage predicted (positive or negative)
  • Format driver — one format (written vs. video vs. talk) carried disproportionate weight
  • Voice driver — the community-manager's seeding voice landed (or didn't) in specific communities

Each driver claim cites the specific evidence: thread URL with reply count, dashboard view with date range, comment quote that shifted the conversation.

4. Reach vs. engagement vs. outcome

Vanity metrics (impressions, "views") get reported but framed as reach context. Engagement (replies, click-throughs, dwell time) gets its own column. Outcome (the thing the intent actually wanted — signups, adoption signals, follow-up conversations, code-sample copies, conference invites, recurring readership) gets its own column. Confusing these three is the highest-frequency failure of a measure report.

A high-reach / low-engagement / zero-outcome asset is not "successful traffic" — it's a content cost without a result. Name it that way.
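One way to keep the three columns from blurring is to classify every metric before it reaches the report. The sketch below assumes hypothetical metric names derived from the examples in this section. It flags an asset with reach but zero outcome, and does not model the "low engagement" part because no threshold is defined here.

```python
# Hypothetical metric names, grouped by the three columns described above.
REACH = {"impressions", "views"}
ENGAGEMENT = {"replies", "click_throughs", "dwell_time_seconds"}
OUTCOME = {
    "signups", "adoption_signals", "follow_up_conversations",
    "code_sample_copies", "conference_invites", "recurring_readers",
}


def split_metrics(metrics: dict) -> dict:
    """Sort raw metrics into reach / engagement / outcome columns."""
    columns = {"reach": {}, "engagement": {}, "outcome": {}}
    for name, value in metrics.items():
        if name in REACH:
            columns["reach"][name] = value
        elif name in ENGAGEMENT:
            columns["engagement"][name] = value
        elif name in OUTCOME:
            columns["outcome"][name] = value
        else:
            # An unclassified metric is rejected rather than guessed into a column.
            raise ValueError(f"unclassified metric: {name}")
    return columns


def is_cost_without_result(columns: dict) -> bool:
    """True when the asset was seen but produced no outcome at all."""
    reach_total = sum(columns["reach"].values())
    outcome_total = sum(columns["outcome"].values())
    return reach_total > 0 and outcome_total == 0
```

Rejecting unknown metric names forces the analyst to decide which column a new metric belongs in instead of letting it drift into the outcome column by default.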

5. Pattern-walk across channels and segments

After the per-channel table, walk the patterns:

  • Which segment(s) drove the largest share of outcome?
  • Which channel category produced the strongest outcome per unit of effort?
  • Which format produced the strongest outcome per unit of effort?
  • Where did the channel plan fail (planned channels that produced nothing) and where did unplanned channels carry the load?

Each pattern claim is a single sentence plus the evidence it rests on.
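The "outcome per unit of effort" questions reduce to a simple ratio once effort is recorded. The sketch below assumes effort is tracked in hours, which this stage does not specify; `rank_by_outcome_per_effort` and the tuple layout are hypothetical.

```python
def rank_by_outcome_per_effort(rows):
    """Rank (label, outcome_count, effort_hours) tuples by outcome per hour.

    `label` can be a channel category or a format; the unit of effort
    (hours) is an assumption made for illustration.
    """
    ranked = []
    for label, outcome_count, effort_hours in rows:
        if effort_hours <= 0:
            raise ValueError(f"{label}: effort must be positive to compute a ratio")
        ranked.append((label, outcome_count / effort_hours))
    # Strongest outcome per unit of effort first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

The ranking only supports a pattern claim; the sentence and the evidence it rests on still have to be written by the analyst.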

6. Hand off

Hand off when:

  • The actuals-vs-targets table has a row per outcome with a real sourced number
  • Every significant variance has a named driver and cited evidence
  • Reach / engagement / outcome are kept distinct in the reporting
  • Pattern-walk findings are captured for the feedback-synthesizer and the impact report

Anti-patterns (RFC 2119)

  • The agent MUST NOT report vanity metrics (impressions, "views") without distinguishing them from engagement and from outcome
  • The agent MUST NOT attribute causation where only correlation exists
  • The agent MUST NOT compare metrics across channels without normalizing for the channel's audience scale
  • The agent MUST NOT ignore underperforming channels without analyzing why
  • The agent MUST NOT invent numbers to fill in missing instrumentation; (missing instrumentation) is the correct cell value with the corrective action queued
  • The agent MUST NOT reference specific named third-party analytics platforms or attribution systems in the plugin default; project overlays handle named platforms
  • The agent MUST NOT name specific influencers or accounts as drivers; describe the role / segment behavior
  • The agent MUST cite the specific instrumentation surface for every number reported
  • The agent MUST name a specific driver for every significant variance, not just "did well" or "underperformed"
  • The agent MUST keep reach, engagement, and outcome distinct in the report
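The normalization rule in the list above can be illustrated with a per-1,000 rate. This is a sketch under one assumption: that each channel's audience scale is available as a single count. The function name and parameters are hypothetical.

```python
def outcome_per_thousand(outcome_count: float, audience_size: float) -> float:
    """Outcome per 1,000 audience members, so channels of different
    scale can be compared on the same footing."""
    if audience_size <= 0:
        raise ValueError("audience size must be positive")
    return outcome_count * 1000 / audience_size
```

With illustrative inputs, a channel with 50 signups from an audience of 10,000 scores 5.0, while one with 20 signups from an audience of 2,000 scores 10.0: the smaller channel performed better once scale is removed.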
fix-hat 3 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.

Anti-patterns (RFC 2119):

  • The agent MUST NOT edit any file — you are a verifier, not a fixer
  • The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
  • The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
  • The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
  • The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
  • The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat
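The closure rules above amount to a decision over two inputs: blockers named in the assessor's own handoff message, and enumerated items that are still defective on disk. The sketch below is illustrative only. `closure_decision` is a hypothetical helper, and the returned strings merely name the decision; they are not calls into the workflow engine.

```python
def closure_decision(enumerated_items, still_defective, handoff_blockers):
    """Return ("advance_hat", []) only when nothing remains open.

    enumerated_items: every item the finding lists, not just the ones the fix touched
    still_defective:  items independently verified on disk as still broken
    handoff_blockers: unresolved blockers named in the assessor's own handoff message
    """
    # Survivors are checked against the whole enumerated set, in the finding's order.
    survivors = [item for item in enumerated_items if item in still_defective]
    outstanding = list(handoff_blockers) + survivors
    if outstanding:
        # A partial fix, or a closure that documents its own blocker, stays open.
        return ("reject_hat", outstanding)
    return ("advance_hat", [])
```

Iterating over the full enumerated set, rather than over the fixer's change list, is what prevents untouched items from shipping under a closed finding.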