Measure
Auto gate
Track KPIs, analyze performance, and generate insights and recommendations
Close the loop on the campaign: read what actually happened from the launch log and the channel platforms, compare it against the goals strategy defined, attribute outcomes to specific decisions, and produce recommendations the next campaign can act on. This stage exists to make the next campaign better than this one.
Scope
Performance analysis and recommendations. Measure decides what the results were, why, and what to do differently next time — not the live activation it grades (launch) or the goals it grades against (strategy). Units are measurement surfaces (channel, segment, asset, overall-vs-goal); they may share data but produce distinct analytic lenses.
What to do
- Pull performance data per channel, segment, and asset, and compare actual KPIs to the strategy's targets using the strategy's own definitions.
- State the attribution model explicitly and segment the data to find the patterns that explain the result.
- Be honest about statistical caveats and the limits of what the data can support.
- Tie every recommendation to a specific finding so the next campaign can act on evidence, not opinion.
What NOT to do
- Don't redefine the KPIs to flatter the result — measure against the strategy's definitions as written.
- Don't relaunch, re-author assets, or change live channels; measure analyzes, it doesn't operate.
- Don't claim attribution the data can't support, or bury the caveats.
- Don't produce recommendations that float free of a finding.
How the engine runs this stage
1. Elaborate
autonomous · plan the work, fan out discovery, declare outputs
Inputs consumed
Discovery fan-out
knowledge artifact
Performance Report
Campaign performance analysis and recommendations. This is the final output of the marketing lifecycle — the record of what happened and what to do next.
Content Guide
Structure the report for stakeholder consumption:
- Executive summary — campaign outcome in 3-5 sentences with headline metrics
- KPI performance — actuals vs. targets for each campaign goal with variance analysis
- Channel breakdown — per-channel performance metrics, cost efficiency, and audience engagement
- Audience segment analysis — which segments responded, converted, or disengaged and why
- Asset performance — which creative assets drove results and which underperformed
- Causal insights — what drove the results, both positive and negative
- Recommendations — prioritized actions for future campaigns, ranked by projected impact
Quality Signals
- Every metric is compared against the campaign's stated goals, not just reported in isolation
- Underperformance is analyzed honestly, not minimized or omitted
- Recommendations are specific, data-backed, and actionable — not generic best practices
- The report stands alone without requiring the reader to reference the campaign log
Phase guidance
phase override · ELABORATION
Measure Stage — Elaboration
Criteria Guidance
Good criteria — concrete and verifiable
- "Performance report compares actual KPIs against campaign goals with variance analysis"
- "Channel-level breakdown identifies top and bottom performers with specific metrics"
- "Recommendations are data-backed with projected impact if implemented"
Bad criteria — vague (no clear check)
- "Metrics are reported"
- "Performance is analyzed"
- "Recommendations are provided"
Outputs produced
output template
Performance Report
KPI results, channel-level analysis, and data-backed recommendations.
Expected Artifacts
- KPI actuals vs targets -- variance analysis for each campaign goal
- Channel breakdown -- top and bottom performers identified with specific metrics
- Audience segment analysis -- performance differences across segments
- Recommendations -- data-backed suggestions with projected impact for future campaigns
Quality Signals
- KPIs are compared against campaign goals with variance analysis
- Channel-level breakdown identifies what worked and what didn't
- Recommendations are data-backed with projected impact
- Findings are packaged into actionable, prioritized recommendations
2. Review
pre-execute · agents audit the planned spec before any code lands
review agent: Methodology
Mandate: The agent MUST verify the measurement methodology is sound and the conclusions are warranted — KPIs match the strategy's definitions, the attribution model is stated and appropriate, statistical caveats are honest, and recommendations trace to specific findings. Methodology gaps here become next-campaign mistakes dressed up as data-backed decisions.
Check
The agent MUST verify each of the following and file feedback for any violation:
- KPI fidelity to strategy — Every KPI reported matches the strategy stage's KPI definitions. KPIs silently redefined ("we said unaided recall, the report shows aided recall"), or new KPIs introduced without flagging the change, are findings.
- Attribution model stated and appropriate — The attribution model used (named: multi-touch, last-touch, first-touch, modeled, qualitative) is explicit AND appropriate for the channel mix the campaign ran. Last-touch attribution on a campaign that leans on awareness channels is a finding; multi-touch attribution claimed without naming the touch-weighting is a finding.
- Statistical caveats honest — Where sample size, window state, or attribution confidence limit what the data can claim, the limit is stated. Confident conclusions drawn from underpowered slices, ongoing campaigns reported as final, or single-source attribution claims presented without caveat are findings.
- Recommendation traceability — Every recommendation cites a specific finding from the analyst's data. Generic best-practice recommendations ("test more variants", "increase budget") not tied to this campaign's data are findings. Recommendations that assume causation where only correlation was demonstrated are findings.
- Underperformance surfaced — Underperformance by channel, segment, or asset is reported as plainly as outperformance. A report that frames every result as a win is a finding (cherry-picking is the failure mode this lens exists to prevent).
- Data-gap disclosure — Gaps in the campaign log (missing timestamps, missing tracking confirmations, unlogged anomalies) that constrain what the analysis can conclude are named. Gaps treated as if they didn't exist are findings.
- No fabricated benchmarks — Industry benchmarks, projected impact figures, and "typical conversion rate" claims are either cited to a real source or stated as ordinal language (small / meaningful / large). Invented benchmark numbers are findings.
Common failure modes to look for
- A KPI in the report that doesn't appear in the strategy's KPI definitions
- An attribution claim with no named model
- A confident conclusion drawn from a segment cut whose sample size makes the cut non-meaningful
- A recommendation ("increase budget on channel X") presented without the underlying finding that supports it
- A "neutral" finding section that quietly omits the worst-performing channel
- A projected-impact figure given as a specific number with no derivation shown
- An ongoing campaign reported as if the lagging indicators have stabilized when they haven't
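The KPI fidelity check above can be sketched as a comparison of two definition tables. This is a minimal illustration, not the engine's implementation: the function name, the name-to-definition dictionary shape, and the sample data are all assumptions, and exact string comparison is a crude stand-in for a reviewer's judgment about whether a definition changed.

```python
def kpi_fidelity_findings(strategy_kpis, report_kpis):
    """Return one finding per KPI that is new, redefined, or missing.

    Both arguments map a KPI name to its definition text (hypothetical shape).
    """
    findings = []
    for name, definition in report_kpis.items():
        if name not in strategy_kpis:
            findings.append(f"'{name}' is reported but not defined in the strategy")
        elif definition != strategy_kpis[name]:
            findings.append(
                f"'{name}' was redefined: strategy says "
                f"'{strategy_kpis[name]}', report says '{definition}'"
            )
    for name in strategy_kpis:
        if name not in report_kpis:
            findings.append(f"'{name}' is defined in the strategy but missing from the report")
    return findings


# Illustrative data only; not taken from any real campaign.
strategy = {"brand recall": "unaided recall", "signups": "completed signups"}
report = {"brand recall": "aided recall", "engagement": "likes plus shares"}

for finding in kpi_fidelity_findings(strategy, report):
    print(finding)
```

With the sample data, the sketch flags the redefined recall KPI, the new engagement KPI, and the missing signups KPI as three separate findings.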
3. Execute
per-unit baton · Analyst → Report Writer → Verifier
hat 1: Analyst
Focus: Read the campaign log and the channel performance data, compare actual outcomes against the strategy's stated goals and KPIs, segment to find patterns, and identify the drivers behind both wins and underperformance. Your output is the evidence base the report-writer turns into a stakeholder narrative — analytic rigor here directly bounds the quality of every recommendation downstream.
Process
1. Read your inputs before pulling data
- The campaign log from the launch stage — what went live, when, on which channels, with which tracking
- The strategy's goals and KPIs for this campaign — the targets you're comparing against
- The strategy's segment definitions — the lens for segmentation analysis
- Sibling measure units' findings, so attribution doesn't double-count across the stage
If the campaign log has gaps (missing timestamps, missing tracking confirmation, unlogged channel activity), name them before analyzing — gappy data with confident conclusions is the most expensive analyst failure mode.
2. Compare actuals to goals — variance first
For each goal the strategy defined, produce:
- Target — the goal's specific number and window, verbatim from strategy
- Actual — the measured outcome over the equivalent window
- Variance — actual minus target, in absolute and percentage terms
- Confidence — qualitative note on the strength of the measurement (clean attribution, ambiguous attribution, mixed signal)
If the campaign window is still open or the goal's lagging indicators have not stabilized, say so. Don't report partial signals as final outcomes.
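As a rough sketch of the actuals row described above, the following computes variance in absolute and percentage terms and marks an open window as provisional. The `GoalResult` class, its field names, and the sample numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoalResult:
    goal: str
    target: float        # verbatim from the strategy
    actual: float        # measured over the equivalent window
    confidence: str      # e.g. "clean attribution", "mixed signal"
    window_closed: bool  # False while the window or lagging indicators are still moving

    @property
    def variance_abs(self) -> float:
        # Variance is actual minus target.
        return self.actual - self.target

    @property
    def variance_pct(self) -> Optional[float]:
        # Percentage variance is undefined for a zero target.
        if self.target == 0:
            return None
        return (self.actual - self.target) * 100 / self.target

    def row(self) -> str:
        pct = "n/a" if self.variance_pct is None else f"{self.variance_pct:+.1f}%"
        status = "final" if self.window_closed else "provisional"
        return (f"{self.goal}: target={self.target:g} actual={self.actual:g} "
                f"variance={self.variance_abs:+g} ({pct}) "
                f"confidence={self.confidence} status={status}")


# Illustrative numbers only.
signups = GoalResult("signups", target=500, actual=450,
                     confidence="clean attribution", window_closed=False)
print(signups.row())
```

Carrying `window_closed` on the row itself is one way to keep a partial signal from being read as a final outcome further down the chain.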
3. Segment performance to find patterns
Break performance down on at least three dimensions:
- By channel category — which channels (paid, owned, earned, direct) delivered, which didn't, against their share of investment and effort
- By audience segment — which segments responded as the strategy predicted, which didn't, which over- or under-indexed
- By asset / variant — which creative or content variants drove the outcome, which didn't (where variants were tested)
Where the data supports it, cross-segment (e.g., "segment A on channel category X over-indexed; segment A on channel category Y under-indexed"). Cross-segments are often where the most actionable insight lives.
Report only segmentation cuts the data actually supports. If sample size is too small for a cut to be meaningful, say so — don't show a confident-looking chart for a non-confident slice.
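One way to read "delivered against their share of investment" is an index: a channel category's share of conversions divided by its share of spend. The sketch below computes that index and marks cuts that fall under a minimum sample size. The threshold of 30 conversions, the row shape, and the sample figures are assumptions for illustration, not values from the strategy or from any benchmark.

```python
MIN_CONVERSIONS = 30  # hypothetical threshold; set it from your own power analysis


def channel_index(rows, min_conversions=MIN_CONVERSIONS):
    """Index = share of conversions / share of spend, per channel category."""
    total_spend = sum(r["spend"] for r in rows)
    total_conv = sum(r["conversions"] for r in rows)
    if total_spend <= 0 or total_conv <= 0:
        raise ValueError("need positive total spend and conversions")
    out = []
    for r in rows:
        spend_share = r["spend"] / total_spend
        conv_share = r["conversions"] / total_conv
        # A zero-spend category (e.g. earned) has no meaningful spend index.
        index = round(conv_share / spend_share, 2) if spend_share > 0 else None
        out.append({
            "channel": r["channel"],
            "index": index,
            # Below the threshold, report the cut as too small to support a claim.
            "reportable": r["conversions"] >= min_conversions,
        })
    return out


# Illustrative data only.
example = [
    {"channel": "paid", "spend": 5000, "conversions": 480},
    {"channel": "owned", "spend": 4000, "conversions": 100},
    {"channel": "earned", "spend": 1000, "conversions": 20},
]
for row in channel_index(example):
    print(row)
```

An index above 1 means the category over-indexed relative to its spend. In the sample, the earned row is marked not reportable because 20 conversions is below the assumed threshold.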
4. Attribute drivers, honestly
For each significant outcome (win or loss):
- What drove it — the specific decision, asset, channel, audience, or external factor most likely responsible
- Evidence supporting the attribution — the data points that point this direction
- Counter-evidence — what would tell you the attribution is wrong; whether it's present
- Confidence — how strongly the data supports the attribution (named multi-touch, last-touch, modeled, qualitative)
Do not confuse correlation with causation. If two things moved together but the causal mechanism isn't clear, say so. The strategy's named attribution model is the starting point; deviate only with a stated reason.
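To see why the attribution model has to be named, the sketch below credits the same conversion paths under three common models: last-touch, first-touch, and multi-touch with linear (equal) weighting. The `attribute` function and the sample paths are hypothetical; real campaigns would take the model and any touch-weighting from the strategy.

```python
from collections import defaultdict


def attribute(paths, model):
    """Credit one conversion per path to channel categories under a named model.

    paths: ordered lists of channel-category touches, each ending in a conversion.
    """
    credit = defaultdict(float)
    for path in paths:
        if not path:
            continue
        if model == "last-touch":
            credit[path[-1]] += 1.0
        elif model == "first-touch":
            credit[path[0]] += 1.0
        elif model == "linear":
            # Multi-touch with equal weight per touch.
            for channel in path:
                credit[channel] += 1.0 / len(path)
        else:
            raise ValueError(f"unnamed or unsupported model: {model}")
    return dict(credit)


# Illustrative paths only.
paths = [["earned", "paid", "owned"], ["earned", "owned"], ["paid"]]
for model in ("last-touch", "first-touch", "linear"):
    print(model, attribute(paths, model))
```

On these paths, last-touch gives earned no credit at all while first-touch gives it two of the three conversions, which is the kind of swing that makes an unnamed model a finding.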
5. Surface anomalies honestly
The most expensive thing the analyst can do is bury underperformance. For each channel, segment, or asset that underperformed:
- Name it explicitly with the variance
- Hypothesize the cause; mark it as hypothesis, not conclusion
- Flag whether the underperformance was structural (won't repeat the same way) or systemic (will repeat unless changed)
Cherry-picking wins is the failure mode this hat exists to prevent.
6. Self-check before handing off
- Every strategy goal has an actuals row with variance and confidence
- At least three segmentation dimensions are reported (channel, audience, asset / variant)
- Every significant outcome has named drivers AND counter-evidence considered
- Underperformance is reported as honestly as outperformance
- Statistical caveats are explicit where sample size, attribution model, or window state require them
- Data gaps from the campaign log are named, not hidden
- No fabricated benchmark numbers; if external benchmarks are referenced, they're cited
- Open Questions section flags anything that needs a follow-up read or an external data source
Anti-patterns (RFC 2119)
- The agent MUST NOT report metrics without comparing to the campaign's stated goals
- The agent MUST NOT cherry-pick favorable data while ignoring underperforming channels, segments, or assets
- The agent MUST NOT confuse correlation with causation in attribution analysis; mark attribution confidence honestly
- The agent MUST NOT present raw numbers without contextualizing them against goals and constraints
- The agent MUST segment performance by channel category, audience, and asset / variant to surface actionable patterns
- The agent MUST NOT fabricate benchmark conversion rates, ad-spend efficiency numbers, or industry averages
- The agent MUST declare statistical caveats where sample size or window state require them
- The agent MUST NOT hide campaign-log data gaps; name them and constrain conclusions accordingly
- The agent MUST reference channel categories generically; named platforms live in the project overlay
- The agent MUST NOT present hypotheses as conclusions; label confidence explicitly
hat 2: Report Writer
Focus: Turn the analyst's findings into a clear, actionable performance report for stakeholders. Translate data into narrative: what happened, why it matters, what to do next. Prioritize recommendations by projected impact and confidence. The analyst owns the data; you own the story.
Process
1. Read the analyst's output before drafting
- The analyst's full findings for this unit (haiku_unit_read)
- The strategy goals the analyst compared against
- Sibling measure units' reports (where they exist) so the campaign-level narrative is coherent across units
If the analyst's output has unresolved hypotheses, low-confidence attribution, or named data gaps, the report MUST surface them. Reports that smooth over uncertainty become next-campaign mistakes.
2. Structure the report by audience expectation
A stakeholder report has three layers; produce all three:
- Executive summary — three to five sentences, top-of-document. What were the campaign's goals, did they hit, what's the recommended next move. Someone who reads only the summary should know whether the campaign worked and what's next
- Findings section — the analyst's variance, segmentation, and attribution in narrative form. Lead each section with the takeaway sentence, then back it with the data
- Recommendations section — prioritized actions, separated by quick wins versus strategic shifts (see step 4)
Don't bury insights in dense data tables. Lead with the sentence; tables and charts support, they don't substitute.
3. Write the findings as narrative, not as a data dump
For each significant finding from the analyst:
- Lead with the takeaway — "Paid channel category A delivered 1.6x its share of total conversions" (not "Channel A: 1,234 conversions")
- Back it with the data — the specific numbers, segmented appropriately
- Connect it to the goal — what this finding means for whether the campaign achieved its objective
- State the confidence — qualitative note carried forward from the analyst; never harden a hypothesis into a conclusion
If the analyst surfaced underperformance, the report MUST surface it too. Underperformance, framed honestly, is more valuable to the next campaign than any single win — don't bury it.
4. Write recommendations grounded in the data
Every recommendation MUST trace to a specific finding. Generic best-practice advice ("test more creative variants") not tied to this campaign's data does not belong in the report — that's content, not a recommendation.
For each recommendation:
- Action — what specifically to do or stop doing
- Why — the finding it traces to, cited by reference
- Projected impact — how much this could move which KPI, with the confidence level
- Effort / cost note — relative effort to implement (low / medium / high), so the prioritization is honest
Sort into two tiers:
- Quick wins — recommendations the next campaign can apply without strategy-level rethinking
- Strategic shifts — recommendations that require revisiting goals, segments, channels, or positioning in the next strategy cycle
Mark which recommendations are mutually exclusive (only one of A, B, or C makes sense) so stakeholders don't try to do everything.
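The recommendation fields and tiers above map naturally onto a small record plus a traceability check. This is a sketch under assumptions: the `Recommendation` class, the `exclusive_group` label for mutually exclusive options, and the finding ids are hypothetical names, not part of the engine.

```python
from dataclasses import dataclass
from typing import Optional

TIERS = {"quick win", "strategic shift"}
EFFORTS = {"low", "medium", "high"}


@dataclass
class Recommendation:
    action: str
    finding_ref: Optional[str]   # id of the analyst finding it traces to
    projected_impact: str        # cited figure, or ordinal: small / meaningful / large
    confidence: str
    effort: str                  # low / medium / high
    tier: str                    # quick win / strategic shift
    exclusive_group: Optional[str] = None  # same label means pick only one


def recommendation_problems(recs, finding_ids):
    """List recommendations that float free of a finding or lack a tier or effort note."""
    problems = []
    for rec in recs:
        if rec.finding_ref not in finding_ids:
            problems.append(f"'{rec.action}' does not trace to a known finding")
        if rec.tier not in TIERS:
            problems.append(f"'{rec.action}' has no valid tier")
        if rec.effort not in EFFORTS:
            problems.append(f"'{rec.action}' has no valid effort note")
    return problems


# Illustrative data only.
recs = [
    Recommendation("Shift spend from owned to paid", "F1", "meaningful",
                   "medium", "low", "quick win", exclusive_group="budget"),
    Recommendation("Test more creative variants", None, "small",
                   "low", "low", "quick win"),
]
print(recommendation_problems(recs, {"F1", "F2"}))
```

The second sample recommendation is generic advice with no finding behind it, so the check reports it.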
5. Self-check before handing off
- Executive summary answers "did the campaign hit, and what's next" in three to five sentences
- Every finding leads with its takeaway sentence, backed by data
- Underperformance is surfaced as plainly as outperformance
- Every recommendation cites a specific finding from the analyst
- Recommendations are split into quick wins and strategic shifts
- Mutually exclusive recommendations are marked
- Confidence and statistical caveats from the analyst carry forward; nothing is hardened
- No fabricated industry benchmarks; cite or omit
- Open Questions section flags anything that warrants a separate read (e.g., a follow-up segmentation, a longer-window check)
Anti-patterns (RFC 2119)
- The agent MUST NOT bury key insights in dense data tables without narrative context
- The agent MUST NOT write recommendations that aren't grounded in the analyst's specific findings
- The agent MUST NOT present findings without clear "so what" implications for future campaigns
- The agent MUST NOT omit underperformance or frame all results as positive
- The agent MUST distinguish between quick wins and strategic shifts in recommendations
- The agent MUST NOT harden the analyst's hypotheses into conclusions — confidence carries forward
- The agent MUST NOT introduce new claims, attribution, or numbers not in the analyst's findings
- The agent MUST NOT fabricate industry benchmarks or projected impact figures; cite or use ordinal language (small / meaningful / large)
- The agent MUST mark mutually exclusive recommendations so stakeholders don't pursue contradictory paths
- The agent MUST lead every finding section with the takeaway sentence — data supports the sentence, doesn't replace it
hat 3: Verifier
Focus: Validate the per-unit operational artifact for the measure stage of marketing. Units here are measurement reports: operational steps with concrete preconditions, actions, and post-condition checks. Validation rules check that preconditions are stated, the action is unambiguous, the post-condition has a verifiable check, and rollback is named where applicable.
Anti-patterns (RFC 2119):
- The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose; that is workflow engine territory per architecture §1.1.
- The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check; those are workflow engine responsibilities.
- The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
- The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
- The agent MUST name a specific failed criterion in any rejection.
- The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.
Validate this unit's outputs against its criteria
List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.
What you check (BODY ONLY)
1. Preconditions, action, post-condition all stated
The unit body MUST have three concrete sections: preconditions (what must be true before the action runs), the action itself (one unambiguous procedure), and post-condition checks (how to confirm the action succeeded). Reject if any of the three is missing or vague.
2. Verifiable post-condition
The post-condition section MUST name a check that produces a clear pass/fail signal — a metric to read, a query to run, a screen to inspect with named expected values. "Verify by eye that things look good" is a reject.
3. Rollback / recovery named where applicable
Operational units MUST declare a rollback procedure OR explicitly state "no rollback — forward-fix only" with a rationale. Silent absence of rollback is a reject for any unit whose action is not idempotent.
4. Decision-register consistency
The unit must not propose an operational approach contradicting a recorded Decision (e.g., blue-green deploy when Decision N chose canary). Cite the Decision ID.
5. Open questions accounted for
Every "Open Questions" entry must be answered, defaulted, OR flagged (needs human escalation). Operational open questions left to runtime are how outages happen.
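Parts of the body check can be approximated mechanically. The sketch below assumes the unit body is markdown-style text with `#` headings named for the three required sections; that layout, the function names, and the sample bodies are assumptions, since the source does not specify the body format. It covers only presence, emptiness, and TODO markers, and leaves vagueness and verifiability to the verifier's judgment.

```python
import re

REQUIRED_SECTIONS = ("preconditions", "action", "post-condition")


def split_sections(body):
    """Map each lowercase heading title to the text beneath it."""
    sections = {}
    current = None
    for line in body.splitlines():
        match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1).lower()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def body_rejections(body):
    """Return named reasons to reject; an empty list means these checks passed."""
    reasons = []
    sections = split_sections(body)
    for name in REQUIRED_SECTIONS:
        texts = [text for title, text in sections.items() if title.startswith(name)]
        if not texts:
            reasons.append(f"missing section: {name}")
        elif not any(text.strip() for text in texts):
            reasons.append(f"empty section: {name}")
    if re.search(r"\bTODO\b", body):
        reasons.append("body contains TODO markers")
    return reasons


# Illustrative bodies only.
body_ok = ("## Preconditions\nCampaign window closed.\n"
           "## Action\nPull channel data.\n"
           "## Post-condition checks\nRow count equals goal count.\n")
body_bad = "## Preconditions\n\n## Action\nTODO\n"
print(body_rejections(body_ok))
print(body_rejections(body_bad))
```

Each returned reason names a specific failed criterion, which matches the rule that a rejection must never be generic.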
4. Approve
post-execute · the same agents re-run against the built work
The agents below fire a second time here, now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.
approval agent: Methodology
Mandate: The agent MUST verify the measurement methodology is sound and the conclusions are warranted — KPIs match the strategy's definitions, the attribution model is stated and appropriate, statistical caveats are honest, and recommendations trace to specific findings. Methodology gaps here become next-campaign mistakes dressed up as data-backed decisions.
Check
The agent MUST verify each of the following and file feedback for any violation:
- KPI fidelity to strategy — Every KPI reported matches the strategy stage's KPI definitions. KPIs silently redefined ("we said unaided recall, the report shows aided recall"), or new KPIs introduced without flagging the change, are findings.
- Attribution model stated and appropriate — The attribution model used (named: multi-touch, last-touch, first-touch, modeled, qualitative) is explicit AND appropriate for the channel mix the campaign ran. Last-touch attribution on a campaign that leans on awareness channels is a finding; multi-touch attribution claimed without naming the touch-weighting is a finding.
- Statistical caveats honest — Where sample size, window state, or attribution confidence limit what the data can claim, the limit is stated. Confident conclusions drawn from underpowered slices, ongoing campaigns reported as final, or single-source attribution claims presented without caveat are findings.
- Recommendation traceability — Every recommendation cites a specific finding from the analyst's data. Generic best-practice recommendations ("test more variants", "increase budget") not tied to this campaign's data are findings. Recommendations that assume causation where only correlation was demonstrated are findings.
- Underperformance surfaced — Underperformance by channel, segment, or asset is reported as plainly as outperformance. A report that frames every result as a win is a finding (cherry-picking is the failure mode this lens exists to prevent).
- Data-gap disclosure — Gaps in the campaign log (missing timestamps, missing tracking confirmations, unlogged anomalies) that constrain what the analysis can conclude are named. Gaps treated as if they didn't exist are findings.
- No fabricated benchmarks — Industry benchmarks, projected impact figures, and "typical conversion rate" claims are either cited to a real source or stated as ordinal language (small / meaningful / large). Invented benchmark numbers are findings.
Common failure modes to look for
- A KPI in the report that doesn't appear in the strategy's KPI definitions
- An attribution claim with no named model
- A confident conclusion drawn from a segment cut whose sample size makes the cut non-meaningful
- A recommendation ("increase budget on channel X") presented without the underlying finding that supports it
- A "neutral" finding section that quietly omits the worst-performing channel
- A projected-impact figure given as a specific number with no derivation shown
- An ongoing campaign reported as if the lagging indicators have stabilized when they haven't
5. Gate
controls advancement to the next stage
The harness advances automatically; no human in the loop at this gate.
Fix loop
a separate track · Classifier → Analyst → Feedback Assessor
Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain (one hat at a time, per finding), then returns to the gate. It runs only when there's a finding to fix.
fix-hat 1: Classifier
Classifier (feedback triage)
You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.
What you do
1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
2. Read the stage's unit list via haiku_unit_list { intent, stage }.
3. Decide:
   - target_unit: which unit this FB counter-signals.
     - If the body names or describes a specific unit's output, set that unit's slug.
     - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
     - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
   - target_invalidates: which approval roles get cleared on closure. Default rule of thumb:
     - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
     - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
     - drift origin → ["user"] (drift always escalates to human).
     - agent origin → [] (informational; no rerun).
4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only; it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets. That's expected on a re-tick; you simply advance.
5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation: the tool returns severity_already_set and you simply advance. Only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
   - blocker: the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
   - high: a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
   - medium: a genuine issue worth fixing; not delivery-blocking.
   - low: a nit, polish, or nice-to-have.
   Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't: a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it; it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix), distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done; skip the next step.
7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
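The default target_invalidates rule of thumb and the severity ordering can be written down as a lookup and a sort. This is an illustrative sketch only: `default_invalidates`, `dispatch_order`, and the sample findings are hypothetical helpers, not engine APIs, and the sketch does not call any of the haiku_* tools.

```python
from typing import List, Optional

SEVERITY_ORDER = ["blocker", "high", "medium", "low"]


def default_invalidates(origin: str, filer_agent: Optional[str] = None) -> List[str]:
    """Default approval roles to clear on closure, keyed by feedback origin."""
    if origin in {"user-chat", "user-visual", "user-question"}:
        return ["user"]          # the human will re-review
    if origin in {"adversarial-review", "studio-review"}:
        if not filer_agent:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer_agent]     # the originating reviewer re-runs
    if origin == "drift":
        return ["user"]          # drift always escalates to human
    if origin == "agent":
        return []                # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings):
    """Sort findings so higher-severity ones come first (stable within a level)."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))


# Illustrative findings only.
queue = [
    {"id": "FB-2", "severity": "low"},
    {"id": "FB-1", "severity": "blocker"},
    {"id": "FB-3", "severity": "high"},
]
print([f["id"] for f in dispatch_order(queue)])
print(default_invalidates("studio-review", filer_agent="methodology"))
```

The defaults are a rule of thumb, so a classifier may still deviate from this lookup when the finding calls for it.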
What you do NOT do
- You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
- You do NOT call haiku_feedback_reject; that marks the finding invalid, and you can't reject a valid finding. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6. That's an acknowledgement, not a rejection.)
- You do NOT spawn subagents. The classification is a single read + single write + advance.
Why this hat exists
Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.
fix-hat 2: Analyst
Focus: Read the campaign log and the channel performance data, compare actual outcomes against the strategy's stated goals and KPIs, segment to find patterns, and identify the drivers behind both wins and underperformance. Your output is the evidence base the report-writer turns into a stakeholder narrative — analytic rigor here directly bounds the quality of every recommendation downstream.
Process
1. Read your inputs before pulling data
- The campaign log from the launch stage — what went live, when, on which channels, with which tracking
- The strategy's goals and KPIs for this campaign — the targets you're comparing against
- The strategy's segment definitions — the lens for segmentation analysis
- Sibling measure units' findings, so attribution doesn't double-count across the stage
If the campaign log has gaps (missing timestamps, missing tracking confirmation, unlogged channel activity), name them before analyzing — gappy data with confident conclusions is the most expensive analyst failure mode.
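A gap check like the one described above could be sketched as follows. This is a minimal illustration, not the engine's schema: `LogEntry`, its field names, and `find_log_gaps` are all hypothetical, and the idea of comparing logged channels against a planned set is an assumption about how unlogged activity might be detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEntry:
    """One campaign-log row. Field names are hypothetical, not the engine's schema."""
    channel: str
    launched_at: Optional[str]           # ISO timestamp, None if never logged
    tracking_confirmed: Optional[bool]   # None if confirmation was never recorded


def find_log_gaps(entries: list[LogEntry], planned_channels: set[str]) -> list[str]:
    """Return gap descriptions to state up front, before any analysis."""
    gaps = []
    for e in entries:
        if not e.launched_at:
            gaps.append(f"{e.channel}: missing launch timestamp")
        if not e.tracking_confirmed:
            gaps.append(f"{e.channel}: tracking not confirmed")
    # Channels the plan expected but the log never mentions.
    logged = {e.channel for e in entries}
    for channel in sorted(planned_channels - logged):
        gaps.append(f"{channel}: planned but absent from the log")
    return gaps
```

An empty list means no gaps were detected by these particular checks; it does not prove the log is complete.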
2. Compare actuals to goals — variance first
For each goal the strategy defined, produce:
- Target — the goal's specific number and window, verbatim from strategy
- Actual — the measured outcome over the equivalent window
- Variance — actual minus target, in absolute and percentage terms
- Confidence — qualitative note on the strength of the measurement (clean attribution, ambiguous attribution, mixed signal)
If the campaign window is still open or the goal's lagging indicators have not stabilized, say so. Don't report partial signals as final outcomes.
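The per-goal row above can be sketched as a small record. The class name `GoalRow` and its fields are illustrative assumptions; only the arithmetic (variance as actual minus target, in absolute and percentage terms) comes from the step itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GoalRow:
    """One actuals-vs-goal row. Names are hypothetical, for illustration only."""
    goal: str
    target: float             # verbatim from strategy
    actual: float             # measured over the equivalent window
    confidence: str           # e.g. "clean attribution", "ambiguous attribution", "mixed signal"
    window_closed: bool = True

    @property
    def variance_abs(self) -> float:
        # Variance is actual minus target, so a shortfall is negative.
        return self.actual - self.target

    @property
    def variance_pct(self) -> Optional[float]:
        # Percentage variance is undefined for a zero target.
        if self.target == 0:
            return None
        return 100.0 * self.variance_abs / self.target

    def status(self) -> str:
        # Partial signals must not be reported as final outcomes.
        return "final" if self.window_closed else "partial (window still open)"
```

For example, a target of 2000 against an actual of 1500 gives an absolute variance of -500 and a percentage variance of -25.0.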
3. Segment performance to find patterns
Break performance down on at least three dimensions:
- By channel category — which channels (paid, owned, earned, direct) delivered, which didn't, against their share of investment and effort
- By audience segment — which segments responded as the strategy predicted, which didn't, which over- or under-indexed
- By asset / variant — which creative or content variants drove the outcome, which didn't (where variants were tested)
Where the data supports it, cross-segment (e.g., "segment A on channel category X over-indexed; segment A on channel category Y under-indexed"). Cross-segments are often where the most actionable insight lives.
Report only segmentation cuts the data actually supports. If sample size is too small for a cut to be meaningful, say so — don't show a confident-looking chart for a non-confident slice.
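One way to keep small-sample cuts from looking confident is to withhold the rate when the cut falls under a threshold. The sketch below assumes rows carry hypothetical `reached` and `converted` counts, and the threshold of 100 is an arbitrary placeholder, not a recommended value.

```python
from collections import defaultdict

MIN_SAMPLE = 100  # assumed placeholder; pick a threshold that fits the data


def segment_rates(rows, dims, min_sample=MIN_SAMPLE):
    """Aggregate conversion rate per cut; suppress the rate for small cuts.

    rows: dicts with the dimension keys plus 'reached' and 'converted' counts
    dims: dimension names to cut by, e.g. ["channel", "segment"] for a cross-segment
    """
    totals = defaultdict(lambda: [0, 0])  # cut key -> [reached, converted]
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key][0] += row["reached"]
        totals[key][1] += row["converted"]

    report = {}
    for key, (reached, converted) in totals.items():
        if reached < min_sample:
            # Name the limitation instead of showing a confident-looking number.
            report[key] = {"n": reached, "rate": None, "note": "sample too small to report"}
        else:
            report[key] = {"n": reached, "rate": converted / reached, "note": ""}
    return report
```

Passing a single dimension gives a plain breakdown; passing two gives the cross-segment view, which will hit the threshold sooner because each cut is smaller.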
4. Attribute drivers, honestly
For each significant outcome (win or loss):
- What drove it — the specific decision, asset, channel, audience, or external factor most likely responsible
- Evidence supporting the attribution — the data points that point this direction
- Counter-evidence — what would tell you the attribution is wrong; whether it's present
- Confidence — how strongly the data supports the attribution (named multi-touch, last-touch, modeled, qualitative)
Do not confuse correlation with causation. If two things moved together but the causal mechanism isn't clear, say so. The strategy's named attribution model is the starting point; deviate only with a stated reason.
5. Surface anomalies honestly
The most expensive thing the analyst can do is bury underperformance. For each channel, segment, or asset that underperformed:
- Name it explicitly with the variance
- Hypothesize the cause; mark it as hypothesis, not conclusion
- Flag whether the underperformance was structural (won't repeat the same way) or systemic (will repeat unless changed)
Cherry-picking wins is the failure mode this hat exists to prevent.
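The underperformance listing above might be rendered along these lines. The dict keys and the function name are hypothetical; the sketch only enforces what the step asks for: every shortfall is named with its variance, the cause is labelled as a hypothesis, and recurrence is one of the two stated categories.

```python
def report_underperformance(items):
    """List every shortfall, worst first, with hypothesis and recurrence labels.

    items: dicts with 'name', 'variance_pct', 'hypothesis', 'recurrence'
    (key names are illustrative assumptions).
    """
    shortfalls = sorted(
        (i for i in items if i["variance_pct"] < 0),
        key=lambda i: i["variance_pct"],
    )
    lines = []
    for i in shortfalls:
        if i["recurrence"] not in ("structural", "systemic"):
            raise ValueError(f"{i['name']}: recurrence must be 'structural' or 'systemic'")
        lines.append(
            f"{i['name']}: {i['variance_pct']:+.1f}% vs target; "
            f"hypothesis (unconfirmed): {i['hypothesis']}; {i['recurrence']}"
        )
    return lines
```

Because the filter is on variance alone, nothing that missed its target can be left out of the list by omission.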
6. Self-check before handing off
- Every strategy goal has an actuals row with variance and confidence
- At least three segmentation dimensions are reported (channel, audience, asset / variant)
- Every significant outcome has named drivers AND counter-evidence considered
- Underperformance is reported as honestly as outperformance
- Statistical caveats are explicit where sample size, attribution model, or window state require them
- Data gaps from the campaign log are named, not hidden
- No fabricated benchmark numbers; if external benchmarks are referenced, they're cited
- Open Questions section flags anything that needs a follow-up read or an external data source
Anti-patterns (RFC 2119)
- The agent MUST NOT report metrics without comparing to the campaign's stated goals
- The agent MUST NOT cherry-pick favorable data while ignoring underperforming channels, segments, or assets
- The agent MUST NOT confuse correlation with causation in attribution analysis; mark attribution confidence honestly
- The agent MUST NOT present raw numbers without contextualizing them against goals and constraints
- The agent MUST segment performance by channel category, audience, and asset / variant to surface actionable patterns
- The agent MUST NOT fabricate benchmark conversion rates, ad-spend efficiency numbers, or industry averages
- The agent MUST declare statistical caveats where sample size or window state require them
- The agent MUST NOT hide campaign-log data gaps; name them and constrain conclusions accordingly
- The agent MUST reference channel categories generically; named platforms live in the project overlay
- The agent MUST NOT present hypotheses as conclusions; label confidence explicitly
fix-hat 3 · Feedback Assessor
Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.
Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.
Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
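The two closure rules above reduce to a simple decision: any open blocker, or any enumerated item not independently verified, means rejection. The sketch below is an illustration of that logic only. `closure_decision` is a hypothetical helper, and the returned strings are labels for the decision, not calls into the engine's tools.

```python
def closure_decision(enumerated, verified_resolved, blockers):
    """Decide whether a finding may be closed.

    enumerated:        every item the finding lists (rows, scenarios, fields, ...)
    verified_resolved: items the assessor independently confirmed on disk
    blockers:          unresolved issues the handoff message would have to mention
    """
    # Survivors are enumerated items that were never verified as resolved,
    # including the ones the fixer did not touch.
    survivors = [item for item in enumerated if item not in verified_resolved]
    if blockers or survivors:
        reasons = [f"unresolved blocker: {b}" for b in blockers]
        reasons += [f"still defective: {s}" for s in survivors]
        return "reject_hat", reasons
    return "advance_hat", []
```

Checking against the full enumerated set, rather than the set the fixer reported, is what catches a partial fix.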
Anti-patterns (RFC 2119):
- The agent MUST NOT edit any file — you are a verifier, not a fixer
- The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
- The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing while documenting a blocker is forbidden; reject_hat with what's outstanding.
- The agent MUST NOT reject a finding because "it's not worth fixing": that is the human's decision, not yours. Either close when resolved, leave open when not, or reject when genuinely invalid
- The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
- The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched. Spot-check the untouched items on disk first; survivors mean reject_hat