Documentation · stage 1 of 5

Audit

Auto gate

Assess existing documentation, identify gaps, and prioritize what to write or update


The opening stage of the documentation lifecycle: take stock of the existing documentation surface and decide what's worth writing or updating, in what order. This is the research stage — its units are knowledge topics ("what's the current state of the API reference?", "which flows lack docs?"), not writing work.

Scope

Inventorying what documentation exists, judging its currency and accuracy, and ranking the gaps against what readers actually need. Audit decides what to write and in what priority — it does not design the structure (outline) or write any content (draft).

What to do

  • Inventory each documentation area and assess every item for currency and accuracy.
  • Identify gaps against real reader needs, not against an idealized table of contents.
  • Rank gaps by user impact, and recommend the doc type each gap calls for.
  • Ground each finding in the actual state of the docs, not in assumptions about them.

What NOT to do

  • Don't design the information architecture or sequence the docs — that's the outline stage.
  • Don't write prose, code samples, or visuals — that's draft.
  • Don't rank a gap without tying it to reader impact.
  • Don't flag an item as stale or accurate without checking it.

How the engine runs this stage

1. Elaborate

autonomous · plan the work, fan out discovery, declare outputs

Phase guidance

Phase override: ELABORATION

Audit Stage — Elaboration

Criteria Guidance

Good criteria — concrete and verifiable

  • "Inventory covers all public APIs with status (documented/outdated/missing) for each"
  • "Gap analysis prioritizes documentation needs by user impact and frequency of support requests"
  • "Each identified gap includes a severity rating and recommended documentation type"

Bad criteria — vague (no clear check)

  • "Audit is complete"
  • "Gaps are identified"
  • "Documentation is reviewed"

Outputs produced


Audit Report

Documentation inventory with coverage status, gap analysis, and prioritized backlog.

Expected Artifacts

  • Documentation inventory -- coverage status (documented, outdated, missing) for each area
  • Gap analysis -- missing, outdated, and incomplete documentation with severity ratings
  • Priority ranking -- documentation needs ranked by user impact and effort
  • Backlog -- ordered list of documentation work to be done

Quality Signals

  • Inventory covers all public-facing areas with explicit status for each
  • Gaps are prioritized by user impact and frequency of support requests
  • Each gap includes a severity rating and recommended documentation type
  • Backlog is actionable with clear scope for each item

2. Review

pre-execute · agents audit the planned spec before any code lands
Review agent: Coverage

Mandate: The agent MUST verify the audit identified the full documentation surface in scope and that priorities reflect real reader impact, not internal preference.

Check

The agent MUST verify each of the following and file feedback for any violation:

  • Surface completeness — Every user-facing surface in the unit's scope (public APIs, supported workflows, onboarding paths, on-call runbook areas) is named in the inventory, not just the artifacts that were easy to find. Orphaned pages, scattered READMEs, and informal docs (wikis, pinned chat threads) count as part of the surface.
  • Audience explicit — Each unit names the audience(s) it inventoried against. An audit with no named audience over-includes and misranks; flag it as a coverage gap.
  • Currency assessments are backed — Every item marked current or stale is backed by a verifiable check (a citation to source-of-truth, a tested example, a dated complaint). Items with no backing are marked unverified or the assessment is rejected.
  • Outdated and inaccurate docs flagged, not just missing — Outdated and inaccurate content is often more harmful than absence. The audit must surface both.
  • Severity / frequency ratings cite evidence — Every priority rating in the gap analysis cites the inventory row or user-impact signal that justifies it. Fabricated or unmotivated rankings get flagged.
  • Audience-driven prioritization — Rankings reflect reader impact, not internal preference (what's easiest to fix, what the loudest stakeholder asked for).
  • Coupling identified — Gaps that depend on each other (a glossary needed before several how-tos can land, a reference rebuild that blocks tutorials) are noted so downstream stages can sequence them together.

Common failure modes to look for

  • An inventory that lists only the official docs site, missing READMEs, wikis, and informal docs that real users rely on
  • An audit scoped to "the docs" with no named audience, producing rankings that prioritize tidy-up over user-blocking gaps
  • A priority list where every item is blocker × frequent because severity wasn't actually assessed
  • A gap labeled "missing" that is actually outdated and live, which is the more dangerous case
  • Recommended doc modes that don't fit the audience's task (a reference where a tutorial belongs)
  • Items marked current based on the artifact's last commit date alone, with no behavioral check against the system
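Two of these failure modes can be screened mechanically before a reviewer reads the prose: ratings with no cited evidence, and a list where every gap lands in the same severity/frequency cell. A minimal sketch, assuming gap rows are dicts with hypothetical `id`, `severity`, `frequency`, `evidence`, and `unverified` keys:

```python
def review_ratings(gaps: list[dict]) -> list[str]:
    """Screen a gap list (hypothetical row shape) for unbacked or uniform ratings."""
    findings = []
    for gap in gaps:
        # A rating needs cited evidence unless it is honestly marked unverified.
        if not gap.get("evidence") and not gap.get("unverified", False):
            findings.append(f"{gap['id']}: rating cites no evidence")
    # Every gap in one cell (e.g. all blocker x frequent) suggests severity
    # was never actually assessed.
    cells = {(g["severity"], g["frequency"]) for g in gaps}
    if len(gaps) > 1 and len(cells) == 1:
        severity, frequency = cells.pop()
        findings.append(
            f"all {len(gaps)} gaps rated {severity} x {frequency}; ratings look unassessed"
        )
    return findings
```

These screens only catch the shape of the problem; judging whether a cited ticket actually supports a rating stays with the reviewer.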

3. Execute

per-unit baton · Auditor → Gap Analyst → Verifier
Hat 1: Auditor

Focus: Inventory the documentation surface for this unit's scope and assess what's there for currency, accuracy, and accessibility. The auditor produces the raw evidence the gap analyst ranks against reader needs — quality of the downstream ranking depends entirely on the inventory being honest and complete.

Process

1. Scope the inventory

Confirm the unit's scope before inventorying. Audits go wrong when "audit the docs" means different things to different stakeholders. For each unit:

  • What surface? A specific docs site / section, a wiki space, the README set in a repo, an API reference, onboarding materials, runbooks for one team.
  • What audience? New users, integrators, on-call engineers, internal contributors. Each audience cares about different content modes (tutorial vs. reference vs. how-to vs. explanation).
  • What's already known to be broken? Capture user-reported issues, support ticket patterns, recent complaints. These are not gaps yet — they're signal that helps prioritize coverage.

2. Walk the surface

Systematically enumerate every existing artifact in scope. Don't sample. Don't trust the navigation — pages can be orphaned. Use search, sitemaps, repo file listings, and direct directory traversal. For each artifact, record:

  • Location — exact path or URL
  • Type — tutorial, how-to, reference, explanation, runbook, ADR, FAQ, glossary, changelog (using the Diátaxis frame where it fits)
  • Last meaningful update — not just last commit; the last change that altered content
  • Owner — who is responsible? Unknown ownership is a finding in itself

3. Assess each artifact

For every item in the inventory, mark its state on three axes:

  • Currency — Does it reflect the current behavior of the system? Test claims against the running product, source of truth, or recent changelog. Mark as current, stale (specifics), or unknown.
  • Accuracy — Are the technical claims correct? Spot-check code samples, command examples, configuration values, API signatures. Mark accurate, inaccurate (specifics), or unverifiable.
  • Accessibility — Heading hierarchy intact? Alt text on diagrams? Code blocks language-tagged? Links not bare URLs? Mark pass, degraded (specifics), or fails.

Stale-but-accurate is different from outright wrong — flag both, but they get prioritized differently downstream.
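The accessibility axis is the one most amenable to a mechanical first pass. A rough sketch for Markdown sources, assuming ATX headings (`#`) and backtick fences; the regexes are heuristics rather than a full Markdown parser, and the function name is hypothetical:

```python
import re


def accessibility_issues(markdown: str) -> list[str]:
    """Flag heading skips, images without alt text, untagged fences, and bare URLs."""
    issues = []
    last_level = 0
    in_fence = False
    for n, line in enumerate(markdown.splitlines(), start=1):
        if line.startswith("```"):
            # An opening fence with nothing after the backticks has no language tag.
            if not in_fence and line.strip() == "```":
                issues.append(f"line {n}: code block has no language tag")
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # headings and URLs inside code samples are not prose
        heading = re.match(r"(#{1,6}) ", line)
        if heading:
            level = len(heading.group(1))
            if last_level and level > last_level + 1:
                issues.append(f"line {n}: heading jumps from h{last_level} to h{level}")
            last_level = level
        if re.search(r"!\[\s*\]\(", line):
            issues.append(f"line {n}: image has empty alt text")
        # A URL not preceded by "(" (as in [text](url)) or "<" is treated as bare.
        if re.search(r"(?<![(<])https?://", line):
            issues.append(f"line {n}: bare URL")
    return issues
```

A "pass" still needs a human skim; this only narrows where to look when marking an artifact degraded or failing.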

4. Find what's missing

Look beyond what exists. For each audience, list the tasks they need to accomplish. For each task, check whether documentation exists. Common missing surfaces:

  • A getting-started path for new users (not buried in the reference)
  • Error reference: every user-visible error mapped to a recovery procedure
  • Troubleshooting / runbook coverage for on-call scenarios
  • Changelog or migration guide for breaking changes
  • Glossary for domain terms

Flag missing items the same way as existing-but-broken ones — they're inputs to the gap analyst, not conclusions.
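Finding what's missing is a set comparison per audience: the tasks an audience needs versus the tasks some existing artifact covers. A minimal sketch; the audience names, task labels, and the practice of tagging each inventory row with the tasks it documents are illustrative assumptions:

```python
def missing_surfaces(audience_tasks: dict[str, set[str]],
                     covered: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each audience to the tasks no inventoried artifact covers.

    audience_tasks: audience -> tasks that audience needs to accomplish
    covered: artifact path -> tasks that artifact documents
    """
    documented = set().union(*covered.values())
    return {
        audience: sorted(tasks - documented)
        for audience, tasks in audience_tasks.items()
        if tasks - documented
    }
```

Each result becomes a missing-surface row handed to the gap analyst, not a conclusion about priority.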

5. Write the inventory artifact

The unit body is structured: scope summary, inventory table, per-artifact assessment notes, and a missing-surface list. Cite specific paths or URLs for every existing item. Cite specific user-impact evidence (ticket counts, support themes, named complaints) for known-broken items where you have it.

Anti-patterns (RFC 2119)

  • The agent MUST NOT sample the documentation surface — coverage means every artifact in scope is named
  • The agent MUST NOT skip areas because they "probably haven't changed" — currency is an assessment, not an assumption
  • The agent MUST NOT assess documentation without checking claims against the actual system, source of truth, or product behavior
  • The agent MUST NOT inventory only what's easy to find via navigation — scattered, orphaned, or informal docs (READMEs, internal wikis, chat threads pinned as docs) count
  • The agent MUST NOT treat all documentation equally regardless of audience or user impact — the inventory carries the signal the gap analyst needs
  • The agent MUST NOT classify Diátaxis mode by guessing — read the artifact and decide based on what mode it actually serves
  • The agent MUST NOT mark an artifact current without a verifiable check; absence of evidence is unknown, not current
  • The agent MUST record ownership (or unknown owner) for every artifact — unowned docs decay fastest
  • The agent MUST name the audience the inventory was scoped against; an audit without a named audience over-includes and misranks
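Two of these rules can be checked per row before the inventory is handed on: a currency verdict needs a verifiable check, and ownership must be recorded even when it is unknown. A minimal sketch, assuming a hypothetical row shape where `currency` is one of `current`, `stale`, or `unknown`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryRow:  # hypothetical row shape
    location: str                             # exact path or URL
    doc_type: str                             # Diátaxis mode or other type
    currency: str                             # "current" | "stale" | "unknown"
    currency_evidence: Optional[str] = None   # citation, tested example, dated complaint
    owner: Optional[str] = None               # None means ownership was never recorded


def row_problems(row: InventoryRow) -> list[str]:
    problems = []
    # Absence of evidence is "unknown", never "current" (or "stale").
    if row.currency in ("current", "stale") and not row.currency_evidence:
        problems.append(f"{row.location}: '{row.currency}' has no verifiable check")
    # Ownership must be recorded, even if the answer is "unknown".
    if row.owner is None:
        problems.append(f"{row.location}: owner not recorded (use 'unknown')")
    return problems
```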
Hat 2: Gap Analyst

Focus: Read the auditor's inventory and turn it into a ranked, actionable gap list. Gaps are not "things missing" alone — they're missing or broken docs weighted by reader impact. The gap analyst produces the prioritized backlog the outline stage consumes.

Process

1. Read the inventory

Read the unit's auditor output end to end. Confirm you have:

  • The list of artifacts that exist, each marked for currency, accuracy, and accessibility
  • The list of missing surfaces against named audiences
  • The known-broken evidence (ticket patterns, complaints, support themes)
  • The named audience(s) the inventory was scoped against

If any of those is missing, reject back to the auditor — ranking without an audience is guesswork.

2. Categorize each gap

Walk the inventory and the missing-surface list. For each item, assign a category:

  • Missing — no documentation exists for a task an identified audience needs
  • Outdated — documentation exists but no longer matches the system; following it produces wrong results
  • Inaccurate — documentation exists and seems current but contains factual errors (wrong API signature, wrong default, wrong steps)
  • Inaccessible — content exists but readers can't find it, can't follow the heading structure, or hit barriers (missing alt text, undocumented prerequisites, broken navigation)
  • Wrong mode — documentation exists but in the wrong Diátaxis mode for the task (a reference where a tutorial belongs; a how-to buried in conceptual prose)
  • Unowned — content exists but no one is accountable; it will decay without intervention

Use the explicit category — don't blur "missing" and "outdated"; the remediation is different.

3. Score each gap by user impact

Two-axis ranking: severity (how bad is the failure for the reader when they hit it) and frequency (how often do they hit it). Don't invent precise numbers — use a small ordinal scale and cite evidence for the placement.

  • Severity

    • blocker — reader cannot complete the task at all; produces real damage (data loss, outage, security exposure, abandoned onboarding)
    • major — reader can complete the task but only with workarounds, support contact, or trial-and-error
    • minor — reader is mildly slowed or confused but recovers without help
  • Frequency

    • frequent — affects most readers in this audience, or surfaces in most uses of the affected flow
    • occasional — affects a real subset (specific path, role, edge case) but not the majority
    • rare — only matters for niche cases

Cite evidence for each placement: ticket counts, named user complaints, onboarding-completion-rate data, frequency of the affected flow in usage. When evidence is absent, mark unverified and note what would confirm the placement — don't fabricate a rating.
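The two ordinal scales and the "mark unverified" rule fit in a small data type. A minimal sketch; the class and field names are hypothetical, and ordering the enums so that a lower value means worse impact is a design choice, not part of the stage contract:

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    BLOCKER = 0
    MAJOR = 1
    MINOR = 2


class Frequency(IntEnum):
    FREQUENT = 0
    OCCASIONAL = 1
    RARE = 2


@dataclass
class Rating:  # hypothetical shape for one gap's placement
    severity: Severity
    frequency: Frequency
    evidence: list[str] = field(default_factory=list)  # ticket counts, complaints, usage data
    would_confirm: str = ""  # what evidence would settle an unverified placement

    @property
    def unverified(self) -> bool:
        # No cited evidence: the placement stands only as a labeled guess.
        return not self.evidence
```

Keeping `would_confirm` next to the rating makes an unverified placement actionable instead of silently trusted.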

4. Rank into a priority list

Combine the two axes. The priority order across the studio runs roughly:

  1. blocker × frequent — fix immediately, often before structural outline work
  2. blocker × occasional and major × frequent
  3. major × occasional and blocker × rare
  4. minor × frequent and major × rare
  5. Everything else, including minor × occasional/rare

Within a tier, prefer items where remediation unlocks other items (e.g., a glossary gap that several other gaps depend on).
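The tier list above can be transcribed as a lookup table, with the coupling rule as a tie-breaker inside a tier. A minimal sketch; since the order is described as rough, the table fixes one reading of it, and the gap shape (`id`, `severity`, `frequency`, optional `needs` listing ids that must land first) is hypothetical:

```python
# One reading of the priority order above: (severity, frequency) -> tier.
TIERS = {
    ("blocker", "frequent"): 1,
    ("blocker", "occasional"): 2,
    ("major", "frequent"): 2,
    ("major", "occasional"): 3,
    ("blocker", "rare"): 3,
    ("minor", "frequent"): 4,
    ("major", "rare"): 4,
    ("minor", "occasional"): 5,
    ("minor", "rare"): 5,
}


def rank(gaps: list[dict]) -> list[dict]:
    """Order gaps by tier, then by how many other gaps each one unblocks."""
    unblocks = {g["id"]: 0 for g in gaps}
    for g in gaps:
        for prerequisite in g.get("needs", []):
            if prerequisite in unblocks:
                unblocks[prerequisite] += 1
    return sorted(
        gaps,
        key=lambda g: (TIERS[(g["severity"], g["frequency"])], -unblocks[g["id"]]),
    )
```

A glossary gap that two how-tos depend on rises to the top of its tier even if it was listed last.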

5. Recommend doc mode and surface per top-tier item

For every gap above the cutoff (top one or two tiers, depending on intent scope), recommend:

  • Doc mode — tutorial, how-to, reference, explanation, runbook, ADR, FAQ, glossary
  • Suggested surface — where in the existing information architecture this likely lives, or that it requires new IA work in the outline stage
  • Coupling — items this gap depends on or that depend on it (so outline can sequence them together)

Recommendation is signal for outline, not a final decision. Keep it terse and route ambiguity to the outline stage rather than over-specifying here.
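Coupling notes give the outline stage enough to sequence dependent gaps if it chooses to. A sketch using Python's standard-library `graphlib` (3.9+), assuming coupling is recorded as a map from each gap id to the gap ids it needs first; the function name is hypothetical, and a cycle is raised as an error so it can be routed to outline rather than silently broken:

```python
from graphlib import CycleError, TopologicalSorter


def sequence(needs: dict[str, set[str]]) -> list[str]:
    """Return gap ids in an order where prerequisites come before dependents."""
    try:
        return list(TopologicalSorter(needs).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes in the cycle; surface it as an open question.
        raise ValueError(f"circular coupling between gaps: {err.args[1]}") from err
```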

6. Write the gap analysis artifact

The unit body contains an audience recap, the gap list grouped by category, the ranked priority list with severity / frequency / evidence per row, a recommended doc mode per top-tier item, and dependency notes. Every claim links back to either an inventory row or named user-impact evidence.

Anti-patterns (RFC 2119)

  • The agent MUST NOT list gaps without ranking them by reader impact — an unranked list pushes prioritization to the next stage
  • The agent MUST NOT prioritize by internal convenience (what's easiest to write, what the team is most familiar with) rather than user impact
  • The agent MUST NOT invent severity or frequency ratings without citing the evidence — unverified is honest; fabricated numbers are damage
  • The agent MUST NOT recommend doc modes without considering the audience's task context — a reference for someone who needs a tutorial fails
  • The agent MUST NOT treat all missing docs as equally urgent; the rank is the deliverable
  • The agent MUST NOT ignore outdated documentation as "good enough" — outdated docs are often worse than absent ones because readers trust them
  • The agent MUST NOT silently collapse multiple categories ("missing or outdated") — name the category, since remediation differs
  • The agent MUST cite the inventory row or user-impact evidence for every ranked item
  • The agent MUST identify item coupling so the outline stage can sequence dependent gaps together
Hat 3: Verifier

Focus: Validate the per-unit knowledge artifact for the audit stage of documentation. Units here are doc-gap findings: knowledge artifacts that downstream stages consume. Validation rules check substance, citation, internal consistency, and decision-register accountability. They do NOT cover executable verify-commands or DAG validity, which are workflow-engine and build-stage concerns.

Anti-patterns (RFC 2119):

  • The agent MUST NOT read or interpret unit frontmatter for any mechanical purpose; that is workflow-engine territory per architecture §1.1.
  • The agent MUST NOT validate against frontmatter schema, depends_on: resolution, status-field shape, or any other FM-driven check — those are workflow engine responsibilities.
  • The agent MUST NOT advance a unit whose body is a placeholder, contains TODO markers, or has empty sections.
  • The agent MUST NOT reject for stylistic preferences. Substantive gaps only.
  • The agent MUST name a specific failed criterion in any rejection.
  • The agent MUST NOT invent rules not in this mandate. Stage scope is the contract.
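The placeholder rule is the one verifier check that reduces to text inspection. A rough sketch, assuming the unit body is Markdown with `#`-style section headings; the marker list and function name are illustrative, and the substantive checks below (topic, sources, consistency, decisions) still need a reader:

```python
import re

# Illustrative placeholder markers; extend to match local conventions.
PLACEHOLDER = re.compile(r"\b(TODO|TBD|FIXME)\b|lorem ipsum", re.IGNORECASE)


def body_blockers(body: str) -> list[str]:
    """Return reasons a unit body is not ready: placeholder markers or empty sections."""
    blockers = []
    if PLACEHOLDER.search(body):
        blockers.append("body contains a placeholder marker")
    # Splitting on a captured heading yields [preamble, h1, body1, h2, body2, ...].
    sections = re.split(r"^(#{1,6} .*)$", body, flags=re.MULTILINE)
    for heading, content in zip(sections[1::2], sections[2::2]):
        if not content.strip():
            blockers.append(f"empty section: {heading.strip()}")
    return blockers
```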

Validate this unit's outputs against its criteria

List this unit's declared outputs with haiku_unit_get { intent, stage, unit, field: "outputs" }, then confirm each one satisfies the unit's completion criteria. The outputs are what you validate; the unit's criteria are the bar. Stay scoped to this one unit — sibling units have their own verify passes.

What you check (BODY ONLY)

1. Artifact answers its topic

The unit's title and first paragraph define the topic. The remaining body MUST deliver substantive content on that topic. Reject placeholders, content-free outlines, or redirects.

2. Sources cited

Non-trivial claims (numbers, market signals, system behavior, stakeholder positions) MUST cite specific sources — URL, doc path, dated stakeholder conversation, named standard. Reject "industry common knowledge" or unsourced numerical claims.

3. Internal consistency

Title, mission, and body must align. Numerical/categorical claims must be consistent across the body. Recommendations must follow from the evidence presented.

4. Decision-register consistency

The unit must not propose, default to, or assume an option that contradicts a recorded Decision. Cite the Decision ID in any rejection.

5. Open questions accounted for

Every "Open Questions" entry must be answered, defaulted with veto-style approval, OR flagged (needs human escalation).
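Accounting for open questions is a per-entry status check. A minimal sketch, assuming each entry is a dict with hypothetical `id`, `status`, and `answer` or `default` keys; "flagged" needs nothing further because it escalates to a human:

```python
ACCOUNTED = ("answered", "defaulted", "flagged")


def unaccounted_questions(questions: list[dict]) -> list[str]:
    """Return problems with Open Questions entries; an empty list means all are accounted for."""
    problems = []
    for q in questions:
        status = q.get("status")
        if status not in ACCOUNTED:
            problems.append(f"{q['id']}: not answered, defaulted, or flagged")
        elif status == "answered" and not q.get("answer"):
            problems.append(f"{q['id']}: marked answered but gives no answer")
        elif status == "defaulted" and not q.get("default"):
            problems.append(f"{q['id']}: defaulted without stating the default")
    return problems
```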

4. Approve

post-execute · the same agents re-run against the built work

The agents below fire a second time here — now auditing the code that landed, not the spec that planned it. Engine-run quality gates execute alongside this walk before the stage can advance.

Approval agent: Coverage

Mandate: The agent MUST verify the audit identified the full documentation surface in scope and that priorities reflect real reader impact, not internal preference.

Check

The agent MUST verify each of the following and file feedback for any violation:

  • Surface completeness — Every user-facing surface in the unit's scope (public APIs, supported workflows, onboarding paths, on-call runbook areas) is named in the inventory, not just the artifacts that were easy to find. Orphaned pages, scattered READMEs, and informal docs (wikis, pinned chat threads) count as part of the surface.
  • Audience explicit — Each unit names the audience(s) it inventoried against. An audit with no named audience over-includes and misranks; flag it as a coverage gap.
  • Currency assessments are backed — Every item marked current or stale is backed by a verifiable check (a citation to source-of-truth, a tested example, a dated complaint). Items with no backing are marked unverified or the assessment is rejected.
  • Outdated and inaccurate docs flagged, not just missing — Outdated and inaccurate content is often more harmful than absence. The audit must surface both.
  • Severity / frequency ratings cite evidence — Every priority rating in the gap analysis cites the inventory row or user-impact signal that justifies it. Fabricated or unmotivated rankings get flagged.
  • Audience-driven prioritization — Rankings reflect reader impact, not internal preference (what's easiest to fix, what the loudest stakeholder asked for).
  • Coupling identified — Gaps that depend on each other (a glossary needed before several how-tos can land, a reference rebuild that blocks tutorials) are noted so downstream stages can sequence them together.

Common failure modes to look for

  • An inventory that lists only the official docs site, missing READMEs, wikis, and informal docs that real users rely on
  • An audit scoped to "the docs" with no named audience, producing rankings that prioritize tidy-up over user-blocking gaps
  • A priority list where every item is blocker × frequent because severity wasn't actually assessed
  • A gap labeled "missing" that is actually outdated and live, which is the more dangerous case
  • Recommended doc modes that don't fit the audience's task (a reference where a tutorial belongs)
  • Items marked current based on the artifact's last commit date alone, with no behavioral check against the system

5. Gate

controls advancement to the next stage
Auto

The harness advances automatically — no human in the loop at this gate.

Fix loop

a separate track · Classifier → Auditor → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

Fix-hat 1: Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

  1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.

  2. Read the stage's unit list via haiku_unit_list { intent, stage }.

  3. Decide:

    • target_unit — which unit this FB counter-signals.
      • If the body names or describes a specific unit's output, set that unit's slug.
      • If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
      • When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
    • target_invalidates — which approval roles get cleared on closure. Default rule of thumb:
      • user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
      • adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
      • drift origin → ["user"] (drift always escalates to human).
      • agent origin → [] (informational; no rerun).
  4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only — it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets — that's expected on a re-tick; you simply advance.

  5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation — the tool returns severity_already_set and you simply advance; only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.

    • blocker — the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
    • high — a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
    • medium — a genuine issue worth fixing; not delivery-blocking.
    • low — a nit, polish, or nice-to-have.

    Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.

  6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't — a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it — it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix) — distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done — skip the next step.

  7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
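The rule-of-thumb defaults in steps 3 and 5 amount to two small lookups. A sketch of that logic only, assuming origin strings spelled as in the list above; the function names are hypothetical, the actual `haiku_feedback_*` tool calls are not modeled, and a real classification still judges each finding on its impact:

```python
from typing import Optional

SEVERITY_ORDER = ["blocker", "high", "medium", "low"]  # dispatch order, first to last


def default_invalidates(origin: str, filer_agent: Optional[str] = None) -> list[str]:
    """Rule-of-thumb target_invalidates for a feedback origin."""
    if origin in ("user-chat", "user-visual", "user-question", "drift"):
        return ["user"]  # a human re-reviews; drift always escalates
    if origin in ("adversarial-review", "studio-review"):
        if filer_agent is None:
            raise ValueError("review-origin feedback needs the filing agent's name")
        return [filer_agent]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings: list[dict]) -> list[dict]:
    """Higher-severity findings are dispatched first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER.index(f["severity"]))
```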

What you do NOT do

  • You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
  • You do NOT call haiku_feedback_reject: that marks the finding invalid, and a valid finding cannot be rejected. (Closing a valid finding that simply has no code fix uses the resolution: "non_actionable" shortcut in step 6; that's an acknowledgement, not a rejection.)
  • You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

Fix-hat 2: Auditor

Focus: Inventory the documentation surface for this unit's scope and assess what's there for currency, accuracy, and accessibility. The auditor produces the raw evidence the gap analyst ranks against reader needs — quality of the downstream ranking depends entirely on the inventory being honest and complete.

Process

1. Scope the inventory

Confirm the unit's scope before inventorying. Audits go wrong when "audit the docs" means different things to different stakeholders. For each unit:

  • What surface? A specific docs site / section, a wiki space, the README set in a repo, an API reference, onboarding materials, runbooks for one team.
  • What audience? New users, integrators, on-call engineers, internal contributors. Each audience cares about different content modes (tutorial vs. reference vs. how-to vs. explanation).
  • What's already known to be broken? Capture user-reported issues, support ticket patterns, recent complaints. These are not gaps yet — they're signal that helps prioritize coverage.

2. Walk the surface

Systematically enumerate every existing artifact in scope. Don't sample. Don't trust the navigation — pages can be orphaned. Use search, sitemaps, repo file listings, and direct directory traversal. For each artifact, record:

  • Location — exact path or URL
  • Type — tutorial, how-to, reference, explanation, runbook, ADR, FAQ, glossary, changelog (using the Diátaxis frame where it fits)
  • Last meaningful update — not just last commit; the last change that altered content
  • Owner — who is responsible? Unknown ownership is a finding in itself

3. Assess each artifact

For every item in the inventory, mark its state on three axes:

  • Currency — Does it reflect the current behavior of the system? Test claims against the running product, source of truth, or recent changelog. Mark as current, stale (specifics), or unknown.
  • Accuracy — Are the technical claims correct? Spot-check code samples, command examples, configuration values, API signatures. Mark accurate, inaccurate (specifics), or unverifiable.
  • Accessibility — Heading hierarchy intact? Alt text on diagrams? Code blocks language-tagged? Links not bare URLs? Mark pass, degraded (specifics), or fails.

Stale-but-accurate is different from outright wrong — flag both, but they get prioritized differently downstream.

4. Find what's missing

Look beyond what exists. For each audience, list the tasks they need to accomplish. For each task, check whether documentation exists. Common missing surfaces:

  • A getting-started path for new users (not buried in the reference)
  • Error reference: every user-visible error mapped to a recovery procedure
  • Troubleshooting / runbook coverage for on-call scenarios
  • Changelog or migration guide for breaking changes
  • Glossary for domain terms

Flag missing items the same way as existing-but-broken ones — they're inputs to the gap analyst, not conclusions.

5. Write the inventory artifact

The unit body is structured: scope summary, inventory table, per-artifact assessment notes, and a missing-surface list. Cite specific paths or URLs for every existing item. Cite specific user-impact evidence (ticket counts, support themes, named complaints) for known-broken items where you have it.

Anti-patterns (RFC 2119)

  • The agent MUST NOT sample the documentation surface — coverage means every artifact in scope is named
  • The agent MUST NOT skip areas because they "probably haven't changed" — currency is an assessment, not an assumption
  • The agent MUST NOT assess documentation without checking claims against the actual system, source of truth, or product behavior
  • The agent MUST NOT inventory only what's easy to find via navigation — scattered, orphaned, or informal docs (READMEs, internal wikis, chat threads pinned as docs) count
  • The agent MUST NOT treat all documentation equally regardless of audience or user impact — the inventory carries the signal the gap analyst needs
  • The agent MUST NOT classify Diátaxis mode by guessing — read the artifact and decide based on what mode it actually serves
  • The agent MUST NOT mark an artifact current without a verifiable check; absence of evidence is unknown, not current
  • The agent MUST record ownership (or unknown owner) for every artifact — unowned docs decay fastest
  • The agent MUST name the audience the inventory was scoped against; an audit without a named audience over-includes and misranks
Fix-hat 3: Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.

Anti-patterns (RFC 2119):

  • The agent MUST NOT edit any file — you are a verifier, not a fixer
  • The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
  • The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
  • The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
  • The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
  • The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat
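The enumerated-finding rule reduces to re-checking every listed item, not only the ones the fix touched. A minimal sketch; `is_still_defective` stands in for whatever on-disk check fits the finding and is a hypothetical callback, as are the returned action names:

```python
from typing import Callable, Iterable


def closure_decision(enumerated: Iterable[str],
                     is_still_defective: Callable[[str], bool]) -> tuple[str, list[str]]:
    """Return ("advance", []) only if every enumerated item passes."""
    # Iterate the finding's full enumerated set, never the fixer's list of
    # touched items: a partial fix on an enumerated finding is an open finding.
    survivors = [item for item in enumerated if is_still_defective(item)]
    if survivors:
        return "reject_hat", survivors  # name the survivors in the rejection
    return "advance", []
```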