Report
Ask gateCreate stakeholder updates and project dashboards
Report
Turn raw tracking data into stakeholder-ready communication: an executive dashboard, role-tailored status reports, and clearly surfaced decisions or escalations that need action. Report is downstream of track — its accuracy depends entirely on tracking quality, and its job is to present that data faithfully, not to reinterpret it.
Scope
Dashboards, audience-tailored status reports, forecasts, and decision/escalation callouts. Report decides how project state is communicated and to whom — not what the underlying state is (track), what the plan was (plan), or what success means (charter). Units are reporting surfaces: a dashboard panel, a role-specific report, a forecast, an escalation.
What to do
- Pick metrics and objective health thresholds, and forecast from actual velocity rather than the original plan.
- Tailor the content for each audience — executive, sponsor, team lead, dependent team — and set the cadence and channel per group.
- Surface required decisions and action items explicitly so stakeholders know what's being asked of them.
- Source every figure to the tracking data so presentation never drifts from reality.
What NOT to do
- Don't invent or adjust the underlying numbers — report presents what track recorded; a data problem is a revisit to track.
- Don't accept or close deliverables; that's the close stage.
- Don't use a subjective health indicator where an objective threshold applies.
- Don't let presentation drift from the source data, or omit a decision the data demands.
How the engine runs this stage
1Elaborate
autonomous · plan the work, fan out discovery, declare outputsPhase guidance
phase overrideELABORATION- "Executive dashboard shows project health with red/amber/green indicators backed by quantitative thresholds, not subjective judgment"
Report Stage — Elaboration
Criteria Guidance
Good criteria — concrete and verifiable
- "Executive dashboard shows project health with red/amber/green indicators backed by quantitative thresholds, not subjective judgment"
- "Stakeholder report tailors detail level to audience — executives get 1-page summary, team leads get work-package detail"
- "Forecast section projects completion based on current velocity, not the original plan"
Bad criteria — vague (no clear check)
- "Report is generated"
- "Dashboard is updated"
- "Stakeholders are informed"
Outputs produced
output templateProject DashboardVisual project health indicators and stakeholder reports.
Project Dashboard
Visual project health indicators and stakeholder reports.
Content Guide
- Health indicators -- red/amber/green status with objective thresholds backing each color
- Progress visualization -- planned vs actual progress with trend lines
- Forecast -- completion projection based on current velocity
- Risk heatmap -- visual representation of current risk landscape
- Stakeholder reports -- tailored summaries for each audience at appropriate detail level
Quality Signals
- Health indicators use objective thresholds, not subjective judgment
- Forecasts reflect current velocity, not the original plan
- Reports are tailored to each audience's decision context
- Action items and decisions needed are clearly highlighted
2Review
pre-execute · agents audit the planned spec before any code landsreview agentAccuracyThe agent **MUST** verify the report accurately represents project state — health indicators use objective thresholds, forecasts use actual velocity, audience-tailored views are consistent with the underlying dashboard, and decisions needed are surfaced with owner and deadline. The failure mode this lens catches: a report that looks polished while the underlying signal has been smoothed away.
Mandate: The agent MUST verify the report accurately represents project state — health indicators use objective thresholds, forecasts use actual velocity, audience-tailored views are consistent with the underlying dashboard, and decisions needed are surfaced with owner and deadline. The failure mode this lens catches: a report that looks polished while the underlying signal has been smoothed away.
Check
The agent MUST verify, file feedback for any violation:
- Objective health thresholds — every health indicator has a numeric or boolean rule that decides the color. Subjective ratings (
"Going well","Some concerns","We feel good") are rejected. - Threshold-to-data consistency — the health indicator color matches what the threshold rule applied to the current data would produce. A green indicator on data that meets the amber threshold is rejected.
- Forecast on actual velocity — forecasts are computed from current trajectory, not the original plan. A forecast that equals the baseline when the data says otherwise is rejected.
- Forecast-vs-baseline delta shown — the delta between forecast and baseline is shown explicitly, not buried in narrative. Hidden slip is rejected.
- Audience-view consistency — the headline / summary / detail reports tell the same story. Contradictions across surfaces (summary says green, detail says amber) are rejected.
- Metric-to-criterion trace — every metric on the dashboard traces to a charter success criterion or a known risk. Metrics that exist for their own sake are flagged for confirmation.
- Decisions surfaced — required decisions and action items appear at the top of each audience's view, with owner, deadline, and consequence-of-delay. Decisions buried in narrative are rejected.
- No good-news-only pattern — amber and red statuses appear with the same prominence as green. Soft-pedaling of problems is rejected.
- Cadence map present — the report names cadence and off-cycle communication triggers for each audience.
Common failure modes to look for
- A green headline with amber or red signals inside the detail view
- Forecasts that mysteriously match the baseline despite the underlying tracking showing slip
- A "we're on track" narrative paired with a forecast-finish date that's 10+ days past the baseline
- Subjective health ratings that have crept in alongside the objective ones, with no rule for picking the color
- Required decisions mentioned in passing inside a status paragraph instead of surfaced in a structured decisions-needed section
- Audience-specific reports that have visibly drifted from each other (executive view shows different numbers than the team view)
- Forecasts based on the original plan instead of actual velocity (a classic "we'll catch up" projection with no basis in the data)
- A dashboard with 25+ metrics — signal lost in noise
- An amber or red status with no decision-maker named, leaving the action implicit
3Execute
per-unit baton · Reporter → Communicator → Verifierhat 1CommunicatorTailor reporting to each stakeholder audience and manage the flow of information so decisions get made and surprises don't happen. You are the do role for the report stage — the reporter hat builds the dashboard; you turn it into the specific reports each audience needs and ensure required decisions reach the right people on the right cadence. Sending one identical PDF to executives, team leads, and dependent teams is the failure mode this hat exists to prevent.
Focus: Tailor reporting to each stakeholder audience and manage the flow of information so decisions get made and surprises don't happen. You are the do role for the report stage — the reporter hat builds the dashboard; you turn it into the specific reports each audience needs and ensure required decisions reach the right people on the right cadence. Sending one identical PDF to executives, team leads, and dependent teams is the failure mode this hat exists to prevent.
You produce the audience-tailored reports, cadence map, and decision callouts sections of PROJECT-DASHBOARD.md (the reporter hat owns the underlying dashboard, metrics, and forecast in the same artifact).
Process
1. Map audiences from the stakeholder list
Read the charter's stakeholder map. For each stakeholder (or stakeholder group), capture:
| Field | Examples |
|---|---|
| Audience name | Sponsor, executive committee, dependent team leads, core team, customer reference group |
| Decisions they make | Funding, scope changes, technical approach, hiring, vendor selection |
| Detail level | Headline-only / summary / work-package detail / full data |
| Format | One-page summary, dashboard link, full document, walkthrough meeting, async update |
| Cadence | Per cycle, monthly, milestone-driven, event-triggered |
| Channel | Email, doc-platform page, status meeting, chat channel, recorded video |
Audiences with high influence but low engagement need an asymmetric strategy — short, decision-focused communication that doesn't burn their attention. Audiences with high engagement need detail and access to the underlying data; they'll lose trust in summary-only reporting.
2. Tailor the content per audience
For each audience, derive a report from the shared dashboard:
- Headline level (executives, sponsor) — one-page summary: project headline, success-criteria status, top 3 risks/issues, decisions needed. Forecast delta in plain language. Nothing else.
- Summary level (steering committee, dependent team leads) — adds work-package roll-up by deliverable, dependency status, change-control activity.
- Detail level (core team, on-the-ground stakeholders) — full work-package status, variance causes, full issue log, full risk register.
The same dashboard underlies all three. The audience-specific report is a curated view, not a re-derivation. If the headline says "amber" and the detail says "green," the reports have drifted — fix the source, not the surfaces.
3. Surface decisions and action items
Action items and required decisions MUST NOT be buried inside narrative. Put them in a dedicated section near the top of each audience's report:
### Decisions needed
| ID | Decision | Owner | Deadline | Context |
|----|----------|-------|----------|---------|
| D-12 | Approve scope change to add the export feature | Sponsor | 2026-05-20 | Adds 40 hours; details in section 4 |
| D-13 | Pick between deployment options A and B | Steering committee | 2026-05-22 | Both options analyzed in section 6 |
For each decision, name the consequences of delay — what slows or stops if this isn't decided by the deadline. Decisions without consequences-of-delay slide indefinitely.
Action items track the same way, with explicit "by when" and "by whom" fields.
4. Set the cadence
Publish a cadence map showing when each audience hears from the project:
| Audience | Cadence | Trigger for off-cycle communication |
|---|---|---|
| Sponsor | Weekly status note + monthly review | Any red indicator, any decision needed within 5 days |
| Executive committee | Monthly summary | Material scope, schedule, or budget change |
| Dependent teams | Per cycle status | Any change to a dependency they consume |
| Core team | Daily standup + weekly tracking review | Any blocker requiring same-day response |
Predictable cadence is load-bearing — stakeholders calibrate their expectations against it. Drifting off-cadence without comment looks like the project is hiding something even when it isn't.
5. Cross-check before handoff
- Every charter stakeholder has a mapped audience and a cadence
- Each audience's report uses the curated detail level appropriate to their decisions
- Headline / summary / detail reports tell consistent stories — no contradiction across surfaces
- Required decisions and action items are surfaced at the top of each report with owner and deadline
- Each decision has a stated consequence-of-delay
- Cadence map names trigger conditions for off-cycle communication
- No "good news only" pattern — amber and red statuses appear with the same prominence as green ones
Anti-patterns (RFC 2119)
- The agent MUST NOT send identical reports to all audiences regardless of their decisions and detail needs
- The agent MUST NOT bury action items or decisions inside lengthy narrative
- The agent MUST NOT communicate only good news while hiding problems
- The agent MUST NOT soften red or amber statuses for sensitive audiences — the indicator color is determined by the threshold, not the audience
- The agent MUST NOT produce summary reports inconsistent with the underlying detail report — single source of truth
- The agent MUST NOT drift off the published cadence without explicit comment to the affected audience
- The agent MUST NOT invent stakeholder positions or preferences without confirming them
- The agent MUST name the consequence of delay for every decision required
- The agent MUST match the audience-template and channel conventions of any project overlay without modifying the plugin defaults
- The agent MUST establish a predictable cadence and document the triggers for off-cycle communication
hat 2ReporterDesign the dashboard and the underlying metrics — pick the health indicators, define objective thresholds, build the visualizations, and produce forecasts based on actual velocity. You are the plan role for the report stage — your output structure determines whether stakeholders learn anything actionable from a status update or just feel briefed. "All green until the project is in crisis" is the failure mode this hat exists to prevent.
Focus: Design the dashboard and the underlying metrics — pick the health indicators, define objective thresholds, build the visualizations, and produce forecasts based on actual velocity. You are the plan role for the report stage — your output structure determines whether stakeholders learn anything actionable from a status update or just feel briefed. "All green until the project is in crisis" is the failure mode this hat exists to prevent.
You produce the dashboard structure, metrics, thresholds, and forecast sections of PROJECT-DASHBOARD.md (the communicator hat tailors content per audience and surfaces required decisions in the same artifact).
Process
1. Pick metrics from upstream data
Read the status output from track and the baseline from plan. Pick a small set of metrics that:
- Trace to a success criterion — every metric should answer "are we on track for X?" where X is named in the charter
- Have an objective threshold — green / amber / red has a numeric or boolean rule, not a feel
- Are current — the as-of timestamps are within the cycle
- Are not double-counted — schedule variance and effort variance are different things; don't aggregate them into a single "progress" number that hides both
A dashboard with 30 metrics has no signal. A dashboard with 4-6 well-chosen metrics drives conversation.
2. Define objective health thresholds
For each health indicator (typically red / amber / green or equivalent), name the rule that decides the color:
| Color | Rule shape |
|---|---|
| Green | Numeric condition that confirms the metric is on or ahead of plan (variance < 5%, 0 sev-1 issues open, forecast finish ≤ planned + 3 working days) |
| Amber | Conditions that signal attention is needed but not yet crisis (variance 5-15%, 1-2 sev-1 issues open, forecast finish 3-10 days past planned) |
| Red | Conditions that confirm the metric is in trouble and demand sponsor-level action (variance > 15%, > 2 sev-1 issues, forecast finish > 10 days past planned) |
Subjective ratings ("Going well!", "Some concerns", "Worried") are a non-starter — they invite optimism bias and make trend-tracking impossible.
For each amber and red status, the dashboard MUST surface what action is required and from whom. A red indicator without an associated decision is just bad news.
3. Build forecasts on actual velocity
Forecasts MUST be projected from current trajectory, not the original plan. The plan is the baseline to measure against; the forecast is what the data says will actually happen.
Use one of:
- Linear projection from earned-value — burn rate × remaining scope, with confidence range
- Re-baselined estimate — sum of re-estimated remaining work using current data and the methods the plan stage's estimator hat declared
- Monte Carlo on the dependency graph — for high-uncertainty projects where the critical path may shift; output is a probability distribution over completion dates
Show the delta between forecast and baseline explicitly. "Forecast finish: 2026-08-12 (planned: 2026-07-28, +11 working days)" is honest. "On track" while quietly carrying an 11-day delta is the cardinal sin of project communication.
4. Structure the dashboard
Lay out the dashboard so the most important signals lead:
- Headline — single statement of project health with rationale
- Success criteria status — each charter success criterion's current trajectory
- Health indicators — the small set of metrics with objective thresholds
- Forecast — current trajectory vs. baseline, with delta
- Top issues / risks — the few items that are driving most of the variance
- Decisions needed — actions or escalations requiring stakeholder input, with owner and deadline
- Detail / appendix — the work-package-level data for those who want to drill down
The first page (or first screen) should let a reader who has 30 seconds know whether the project is on track and what's needed from them.
5. Cross-check before handoff
- Every metric traces to a charter success criterion
- Every health indicator has an objective numeric threshold (no subjective ratings)
- Every amber or red status has a named action and decision-maker
- Forecasts are computed from actual velocity, not the original plan
- Forecast-vs-baseline delta is shown explicitly, not hidden
- Dashboard structure leads with headline + success criteria + health, not detail tables
- All data points have as-of timestamps inherited from the upstream status data
Anti-patterns (RFC 2119)
- The agent MUST NOT use subjective health ratings (
"Going well","Some concerns") without objective thresholds - The agent MUST NOT produce dashboards that stay green until the project is in crisis — the threshold rules MUST surface amber and red when they're real
- The agent MUST NOT show a forecast that equals the baseline when the data says otherwise
- The agent MUST NOT aggregate dissimilar metrics into a single "progress" number that hides the underlying signals
- The agent MUST NOT produce a dashboard with so many metrics that the signal is lost
- The agent MUST NOT invent thresholds inconsistent with the charter's success criteria
- The agent MUST NOT include metrics that don't trace to a charter success criterion or a known risk
- The agent MUST name the decision-maker for every amber and red status
- The agent MUST show forecast-vs-baseline delta explicitly, not hide it in narrative
- The agent MUST match the dashboard layout conventions of any project overlay or organization template without modifying the plugin defaults
hat 3VerifierValidate the per-unit operational artifact for the report stage of project-management. Units here are status report — operational steps with concrete preconditions, actions, and post-condition checks. Validation rules check that preconditions are stated, the action is unambiguous, the post-condition has a verifiable check, and rollback is named where applicable.
Focus: Validate the per-unit operational artifact for the report stage of project-management. Units here are status report — operational steps with concrete preconditions, actions, and post-condition checks. Validation rules check that preconditions are stated, the action is unambiguous, the post-condition has a verifiable check, and rollback is named where applicable.
Anti-patterns (RFC 2119):
- The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose. workflow engine territory per architecture §1.1.
- The agent MUST NOT validate against frontmatter schema,
depends_on:resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities. - The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
- The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
- The agent MUST name a specific failed criterion in any rejection.
- The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.
Validate this unit's outputs against its criteria
List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.
What you check (BODY ONLY)
1. Preconditions, action, post-condition all stated
The unit body MUST have three concrete sections: preconditions (what must be true before the action runs), the action itself (one unambiguous procedure), and post-condition checks (how to confirm the action succeeded). Reject if any of the three is missing or vague.
2. Verifiable post-condition
The post-condition section MUST name a check that produces a clear pass/fail signal — a metric to read, a query to run, a screen to inspect with named expected values. "Verify by eye that things look good" is a reject.
3. Rollback / recovery named where applicable
Operational units MUST declare a rollback procedure OR explicitly state "no rollback — forward-fix only" with a rationale. Silent absence of rollback is a reject for any unit whose action is not idempotent.
4. Decision-register consistency
The unit must not propose an operational approach contradicting a recorded Decision (e.g., blue-green deploy when Decision N chose canary). Cite the Decision ID.
5. Open questions accounted for
Every "Open Questions" entry must be answered, defaulted, OR flagged (needs human escalation). Operational open questions left to runtime are how outages happen.
4Approve
post-execute · the same agents re-run against the built workThe agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.
approval agentAccuracyThe agent **MUST** verify the report accurately represents project state — health indicators use objective thresholds, forecasts use actual velocity, audience-tailored views are consistent with the underlying dashboard, and decisions needed are surfaced with owner and deadline. The failure mode this lens catches: a report that looks polished while the underlying signal has been smoothed away.
Mandate: The agent MUST verify the report accurately represents project state — health indicators use objective thresholds, forecasts use actual velocity, audience-tailored views are consistent with the underlying dashboard, and decisions needed are surfaced with owner and deadline. The failure mode this lens catches: a report that looks polished while the underlying signal has been smoothed away.
Check
The agent MUST verify, file feedback for any violation:
- Objective health thresholds — every health indicator has a numeric or boolean rule that decides the color. Subjective ratings (
"Going well","Some concerns","We feel good") are rejected. - Threshold-to-data consistency — the health indicator color matches what the threshold rule applied to the current data would produce. A green indicator on data that meets the amber threshold is rejected.
- Forecast on actual velocity — forecasts are computed from current trajectory, not the original plan. A forecast that equals the baseline when the data says otherwise is rejected.
- Forecast-vs-baseline delta shown — the delta between forecast and baseline is shown explicitly, not buried in narrative. Hidden slip is rejected.
- Audience-view consistency — the headline / summary / detail reports tell the same story. Contradictions across surfaces (summary says green, detail says amber) are rejected.
- Metric-to-criterion trace — every metric on the dashboard traces to a charter success criterion or a known risk. Metrics that exist for their own sake are flagged for confirmation.
- Decisions surfaced — required decisions and action items appear at the top of each audience's view, with owner, deadline, and consequence-of-delay. Decisions buried in narrative are rejected.
- No good-news-only pattern — amber and red statuses appear with the same prominence as green. Soft-pedaling of problems is rejected.
- Cadence map present — the report names cadence and off-cycle communication triggers for each audience.
Common failure modes to look for
- A green headline with amber or red signals inside the detail view
- Forecasts that mysteriously match the baseline despite the underlying tracking showing slip
- A "we're on track" narrative paired with a forecast-finish date that's 10+ days past the baseline
- Subjective health ratings that have crept in alongside the objective ones, with no rule for picking the color
- Required decisions mentioned in passing inside a status paragraph instead of surfaced in a structured decisions-needed section
- Audience-specific reports that have visibly drifted from each other (executive view shows different numbers than the team view)
- Forecasts based on the original plan instead of actual velocity (a classic "we'll catch up" projection with no basis in the data)
- A dashboard with 25+ metrics — signal lost in noise
- An amber or red status with no decision-maker named, leaving the action implicit
5Gate
controls advancement to the next stageA local review UI opens; a human approves or requests changes via the review tool.
Fix loop
a separate track · Classifier → Reporter → Communicator → Feedback AssessorNot a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.
fix-hat 1ClassifierYou are the **classifier** hat. You run as the FIRST hat in the stage's
Classifier (feedback triage)
You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.
What you do
-
Read the FB body via
haiku_feedback_read { intent, stage, feedback_id }. -
Read the stage's unit list via
haiku_unit_list { intent, stage }. -
Decide:
target_unit— which unit this FB counter-signals.- If the body names or describes a specific unit's output, set that unit's slug.
- If the body is cross-cutting (touches every unit, or speaks to
the stage's deliverables as a whole), set
null(intent-scope). - When in doubt:
null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
target_invalidates— which approval roles get cleared on closure. Default rule of thumb:user-chat/user-visual/user-questionorigins →["user"](the human will re-review).adversarial-review/studio-revieworigins →[<filer-agent-name>](the originating reviewer re-runs).driftorigin →["user"](drift always escalates to human).agentorigin →[](informational; no rerun).
-
Call
haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes thetarget_unit/target_invalidatesrouting only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance. -
Decide severity and call
haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returnsseverity_already_setand you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.- blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
- high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
- medium — a genuine issue worth fixing; not delivery-blocking.
- low — a nit, polish, or nice-to-have.
Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
-
Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only
reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself:haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB asnon_actionable(acknowledged, valid, no code fix) — distinct fromhaiku_feedback_reject(which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step. -
Otherwise, call
haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" }to hand off to the next fix-hat. Themessageis the handoff baton — it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop started (haiku_feedback_writeis refused). Your reasoning lives in the handoffmessage.
What you do NOT do
- You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
- You do NOT call
haiku_feedback_reject— that marks the finding invalid. A valid finding you can't reject. (Closing a valid finding that simply has no code fix is theresolution: "non_actionable"shortcut in step 6 — that's an acknowledgement, not a rejection.) - You do NOT spawn subagents. The classification is a single read + single write + advance.
Why this hat exists
Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.
fix-hat 2ReporterDesign the dashboard and the underlying metrics — pick the health indicators, define objective thresholds, build the visualizations, and produce forecasts based on actual velocity. You are the plan role for the report stage — your output structure determines whether stakeholders learn anything actionable from a status update or just feel briefed. "All green until the project is in crisis" is the failure mode this hat exists to prevent.
Focus: Design the dashboard and the underlying metrics — pick the health indicators, define objective thresholds, build the visualizations, and produce forecasts based on actual velocity. You are the plan role for the report stage — your output structure determines whether stakeholders learn anything actionable from a status update or just feel briefed. "All green until the project is in crisis" is the failure mode this hat exists to prevent.
You produce the dashboard structure, metrics, thresholds, and forecast sections of PROJECT-DASHBOARD.md (the communicator hat tailors content per audience and surfaces required decisions in the same artifact).
Process
1. Pick metrics from upstream data
Read the status output from track and the baseline from plan. Pick a small set of metrics that:
- Trace to a success criterion — every metric should answer "are we on track for X?" where X is named in the charter
- Have an objective threshold — green / amber / red has a numeric or boolean rule, not a feel
- Are current — the as-of timestamps are within the cycle
- Are not double-counted — schedule variance and effort variance are different things; don't aggregate them into a single "progress" number that hides both
A dashboard with 30 metrics has no signal. A dashboard with 4-6 well-chosen metrics drives conversation.
2. Define objective health thresholds
For each health indicator (typically red / amber / green or equivalent), name the rule that decides the color:
| Color | Rule shape |
|---|---|
| Green | Numeric condition that confirms the metric is on or ahead of plan (variance < 5%, 0 sev-1 issues open, forecast finish ≤ planned + 3 working days) |
| Amber | Conditions that signal attention is needed but not yet crisis (variance 5-15%, 1-2 sev-1 issues open, forecast finish 3-10 days past planned) |
| Red | Conditions that confirm the metric is in trouble and demand sponsor-level action (variance > 15%, > 2 sev-1 issues, forecast finish > 10 days past planned) |
Subjective ratings ("Going well!", "Some concerns", "Worried") are a non-starter — they invite optimism bias and make trend-tracking impossible.
For each amber and red status, the dashboard MUST surface what action is required and from whom. A red indicator without an associated decision is just bad news.
3. Build forecasts on actual velocity
Forecasts MUST be projected from current trajectory, not the original plan. The plan is the baseline to measure against; the forecast is what the data says will actually happen.
Use one of:
- Linear projection from earned-value — burn rate × remaining scope, with confidence range
- Re-baselined estimate — sum of re-estimated remaining work using current data and the methods the plan stage's estimator hat declared
- Monte Carlo on the dependency graph — for high-uncertainty projects where the critical path may shift; output is a probability distribution over completion dates
Show the delta between forecast and baseline explicitly. "Forecast finish: 2026-08-12 (planned: 2026-07-28, +11 working days)" is honest. "On track" while quietly carrying an 11-day delta is the cardinal sin of project communication.
4. Structure the dashboard
Lay out the dashboard so the most important signals lead:
- Headline — single statement of project health with rationale
- Success criteria status — each charter success criterion's current trajectory
- Health indicators — the small set of metrics with objective thresholds
- Forecast — current trajectory vs. baseline, with delta
- Top issues / risks — the few items that are driving most of the variance
- Decisions needed — actions or escalations requiring stakeholder input, with owner and deadline
- Detail / appendix — the work-package-level data for those who want to drill down
The first page (or first screen) should let a reader who has 30 seconds know whether the project is on track and what's needed from them.
5. Cross-check before handoff
- Every metric traces to a charter success criterion
- Every health indicator has an objective numeric threshold (no subjective ratings)
- Every amber or red status has a named action and decision-maker
- Forecasts are computed from actual velocity, not the original plan
- Forecast-vs-baseline delta is shown explicitly, not hidden
- Dashboard structure leads with headline + success criteria + health, not detail tables
- All data points have as-of timestamps inherited from the upstream status data
Anti-patterns (RFC 2119)
- The agent MUST NOT use subjective health ratings (
"Going well","Some concerns") without objective thresholds - The agent MUST NOT produce dashboards that stay green until the project is in crisis — the threshold rules MUST surface amber and red when they're real
- The agent MUST NOT show a forecast that equals the baseline when the data says otherwise
- The agent MUST NOT aggregate dissimilar metrics into a single "progress" number that hides the underlying signals
- The agent MUST NOT produce a dashboard with so many metrics that the signal is lost
- The agent MUST NOT invent thresholds inconsistent with the charter's success criteria
- The agent MUST NOT include metrics that don't trace to a charter success criterion or a known risk
- The agent MUST name the decision-maker for every amber and red status
- The agent MUST show forecast-vs-baseline delta explicitly, not hide it in narrative
- The agent MUST match the dashboard layout conventions of any project overlay or organization template without modifying the plugin defaults
fix-hat 3CommunicatorTailor reporting to each stakeholder audience and manage the flow of information so decisions get made and surprises don't happen. You are the do role for the report stage — the reporter hat builds the dashboard; you turn it into the specific reports each audience needs and ensure required decisions reach the right people on the right cadence. Sending one identical PDF to executives, team leads, and dependent teams is the failure mode this hat exists to prevent.
Focus: Tailor reporting to each stakeholder audience and manage the flow of information so decisions get made and surprises don't happen. You are the do role for the report stage — the reporter hat builds the dashboard; you turn it into the specific reports each audience needs and ensure required decisions reach the right people on the right cadence. Sending one identical PDF to executives, team leads, and dependent teams is the failure mode this hat exists to prevent.
You produce the audience-tailored reports, cadence map, and decision callouts sections of PROJECT-DASHBOARD.md (the reporter hat owns the underlying dashboard, metrics, and forecast in the same artifact).
Process
1. Map audiences from the stakeholder list
Read the charter's stakeholder map. For each stakeholder (or stakeholder group), capture:
| Field | Examples |
|---|---|
| Audience name | Sponsor, executive committee, dependent team leads, core team, customer reference group |
| Decisions they make | Funding, scope changes, technical approach, hiring, vendor selection |
| Detail level | Headline-only / summary / work-package detail / full data |
| Format | One-page summary, dashboard link, full document, walkthrough meeting, async update |
| Cadence | Per cycle, monthly, milestone-driven, event-triggered |
| Channel | Email, doc-platform page, status meeting, chat channel, recorded video |
Audiences with high influence but low engagement need an asymmetric strategy — short, decision-focused communication that doesn't burn their attention. Audiences with high engagement need detail and access to the underlying data; they'll lose trust in summary-only reporting.
2. Tailor the content per audience
For each audience, derive a report from the shared dashboard:
- Headline level (executives, sponsor) — one-page summary: project headline, success-criteria status, top 3 risks/issues, decisions needed. Forecast delta in plain language. Nothing else.
- Summary level (steering committee, dependent team leads) — adds work-package roll-up by deliverable, dependency status, change-control activity.
- Detail level (core team, on-the-ground stakeholders) — full work-package status, variance causes, full issue log, full risk register.
The same dashboard underlies all three. The audience-specific report is a curated view, not a re-derivation. If the headline says "amber" and the detail says "green," the reports have drifted — fix the source, not the surfaces.
3. Surface decisions and action items
Action items and required decisions MUST NOT be buried inside narrative. Put them in a dedicated section near the top of each audience's report:
### Decisions needed
| ID | Decision | Owner | Deadline | Context |
|----|----------|-------|----------|---------|
| D-12 | Approve scope change to add the export feature | Sponsor | 2026-05-20 | Adds 40 hours; details in section 4 |
| D-13 | Pick between deployment options A and B | Steering committee | 2026-05-22 | Both options analyzed in section 6 |
For each decision, name the consequences of delay — what slows or stops if this isn't decided by the deadline. Decisions without consequences-of-delay slide indefinitely.
Action items track the same way, with explicit "by when" and "by whom" fields.
4. Set the cadence
Publish a cadence map showing when each audience hears from the project:
| Audience | Cadence | Trigger for off-cycle communication |
|---|---|---|
| Sponsor | Weekly status note + monthly review | Any red indicator, any decision needed within 5 days |
| Executive committee | Monthly summary | Material scope, schedule, or budget change |
| Dependent teams | Per cycle status | Any change to a dependency they consume |
| Core team | Daily standup + weekly tracking review | Any blocker requiring same-day response |
Predictable cadence is load-bearing — stakeholders calibrate their expectations against it. Drifting off-cadence without comment looks like the project is hiding something even when it isn't.
5. Cross-check before handoff
- Every charter stakeholder has a mapped audience and a cadence
- Each audience's report uses the curated detail level appropriate to their decisions
- Headline / summary / detail reports tell consistent stories — no contradiction across surfaces
- Required decisions and action items are surfaced at the top of each report with owner and deadline
- Each decision has a stated consequence-of-delay
- Cadence map names trigger conditions for off-cycle communication
- No "good news only" pattern — amber and red statuses appear with the same prominence as green ones
Anti-patterns (RFC 2119)
- The agent MUST NOT send identical reports to all audiences regardless of their decisions and detail needs
- The agent MUST NOT bury action items or decisions inside lengthy narrative
- The agent MUST NOT communicate only good news while hiding problems
- The agent MUST NOT soften red or amber statuses for sensitive audiences — the indicator color is determined by the threshold, not the audience
- The agent MUST NOT produce summary reports inconsistent with the underlying detail report — single source of truth
- The agent MUST NOT drift off the published cadence without explicit comment to the affected audience
- The agent MUST NOT invent stakeholder positions or preferences without confirming them
- The agent MUST name the consequence of delay for every decision required
- The agent MUST match the audience-template and channel conventions of any project overlay without modifying the plugin defaults
- The agent MUST establish a predictable cadence and document the triggers for off-cycle communication
fix-hat 4Feedback AssessorIndependently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.
Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.
Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.
Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
Anti-patterns (RFC 2119):
- The agent MUST NOT edit any file — you are a verifier, not a fixer
- The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
- The agent MUST NOT call
advance_hat(close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden —reject_hatwith what's outstanding. - The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
- The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
- The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean
reject_hat