Engineering

Documentation Studio

Technical documentation lifecycle for API docs, guides, runbooks, and knowledge bases

5 stages · 15 hats · 7 review agents · Persistence: auto-detected

Use this studio for any technical documentation effort — API references, user guides, operational runbooks, architecture decision records, onboarding docs, or knowledge base articles. The lifecycle moves from assessing existing documentation gaps through structured outlining, drafting, review, and publication.

Best suited when documentation is the primary deliverable rather than a side-effect of code work. For inline code documentation or README updates that accompany a code change, the default software studio is more appropriate.

Cross-cutting principles

Every stage in this studio honors the same writing fundamentals; they show up at different layers, not as one stage's responsibility.

  • Audience first. Identify who the reader is and what task they came to accomplish before structuring or writing. Documentation that fails its audience fails regardless of accuracy.
  • Diátaxis as the orientation frame. Tutorials, how-to guides, reference, and explanation each serve a different reader mode. Decide which mode each piece serves before drafting; mixing modes inside one document is the most common readability failure.
  • Voice and terminology consistency. Pick the voice the existing corpus uses (or define one in the outline stage). Match it. Reuse the same term for the same concept across every document.
  • Examples earn their place. Code blocks, command snippets, screenshots, and diagrams must be tested, current, and labeled with the version they apply to. An untested example is a future bug report.
  • Accessibility is not optional. Heading hierarchy, alt text on images, sufficient color contrast, and semantic structure are part of the artifact's quality, not a polish step.

Project overlays at .haiku/studios/documentation/... may bind these principles to a specific docs platform, static site generator, or wiki, plus house-style conventions (numbering, callouts, voice guide). The plugin defaults stay platform-neutral.

The lifecycle an intent runs

  1. Audit (Auto gate): Assess existing documentation, identify gaps, and prioritize what to write or update. 3 hats · 1 review agent · 3-step fix loop · 1 output
  2. Outline (Ask gate): Structure the documentation with clear information architecture. 3 hats · 1 review agent · 3-step fix loop · 1 output
  3. Draft (Ask gate): Write the documentation content following the approved outline. 3 hats · 3 review agents · 3-step fix loop · 1 output
  4. Review (Ask gate): Review documentation for accuracy, clarity, and completeness. 3 hats · 1 review agent · 3-step fix loop · 1 output
  5. Publish (Auto gate): Format, validate links, and publish the documentation. 3 hats · 1 review agent · 3-step fix loop · 1 output

At intent close

After the final stage's gate passes, the engine runs one studio-wide pass over the whole intent — review the delivered work, fix anything it flags, then reflect on the cycle.

Intent-completion review

studio-wide agents audit the delivered intent
Cross Stage Consistency

Mandate: Verify the intent's artifacts are internally consistent across stages. You are the only reviewer that sees the whole intent at once — your job is to catch seams that per-stage reviewers miss.

Check:

  • The agent MUST verify that each stage's outputs align with what upstream stages specified — no dropped requirements, no silent scope expansion
  • The agent MUST verify that naming is consistent across stages — a concept named one thing upstream should carry the same name downstream
  • The agent MUST verify that stages' declared outputs exist at the paths their unit frontmatter promised
  • The agent MUST verify that the stages collectively deliver the intent's stated goal (read intent.md) — partial delivery is a finding
  • The agent MUST verify that concerns raised by any stage's review were actually addressed (not silently ignored)

Anti-patterns (RFC 2119):

  • The agent MUST NOT re-litigate decisions already approved at each stage's gate
  • The agent MUST NOT propose new features or scope additions
  • The agent MUST NOT flag stylistic preferences — concrete divergence only
Delivery Verifier

Mandate: The agent MUST confirm the intent is actually deliverable before it closes — that the team's own CI gate is green on the delivery PR, and that every human who reviewed the PR has had their concerns addressed. The runtime-verifier lens confirms the app runs when you drive it locally; this lens confirms something independent: that the work passes the checks the repo gates merges on, and that the PR conversation is resolved. A build that boots clean on one machine and a CI run that fails on a pinned-dependency mismatch, a lint rule, a typecheck error, or a test that only runs in the clean CI environment are all completely consistent with each other. "It works on my machine" is not "CI is green." Both gates must hold.

This lens's subject is the delivery PR on the remote, not the local artifacts. When you have provider access — an authenticated VCS CLI (gh for GitHub, glab for GitLab) or a configured provider — you read its checks and its review conversation, reply to and resolve review threads, and file findings for anything that isn't green or isn't addressed; the studio fix-hat loop lands the code, and you re-audit until the PR is clean. You cannot assume that access exists: there may be no remote, no CLI, or a CLI that isn't authenticated. The rule that survives every one of those cases is the same — you never sign off on a delivery you couldn't actually verify. A check you couldn't run is not a check that passed.

Resolve the delivery PR — and what you can prove without a provider

Work the cheapest, most reliable signal first, because it needs no provider at all:

  • Is the work already merged? Ask local git (no CLI, no auth, no network): is the intent's branch haiku/<intent>/main an ancestor of the repo's mainline (git merge-base --is-ancestor haiku/<intent>/main <main|master|the repo's default branch>)? If it's merged, that IS your proof. A host only lets a PR merge once its branch protection is satisfied — CI green, required reviews approved. The merge is the host's own gate firing; you don't need to re-read CI to trust it. Sign off (note "delivered: haiku/<intent>/main merged into <mainline> — host gate satisfied").
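The merge check above can be sketched as a small wrapper around the same git command. This is a minimal sketch, not the engine's implementation: the function name `is_merged` and the injectable `run` parameter are assumptions made for illustration. It relies only on the documented exit codes of `git merge-base --is-ancestor` (0 for ancestor, 1 for not an ancestor, anything else for an error).

```python
import subprocess


def is_merged(branch, mainline, run=subprocess.run):
    """Return True when `branch` is an ancestor of `mainline` in the local repo.

    `run` is injectable (hypothetical design choice) so the logic can be
    exercised without a real repository.
    """
    result = run(["git", "merge-base", "--is-ancestor", branch, mainline])
    if result.returncode == 0:
        return True   # branch is fully contained in mainline: merged
    if result.returncode == 1:
        return False  # branch has commits mainline lacks: not merged
    # Any other exit code (for example an unknown ref) is an error, not a "no".
    raise RuntimeError(f"git merge-base failed with exit code {result.returncode}")
```

Treating an error exit as an exception rather than as "not merged" keeps an unknown branch name from being mistaken for a verified answer.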

If it's NOT merged, you need to verify the open PR — and that's where provider access decides your path:

  • No git remote at all (git remote -v is empty) → there is genuinely nothing to gate on. Terminate clean: "no remote — CI verification not applicable." This is a SKIP.
  • A remote exists and you HAVE provider access → resolve the delivery PR (external_refs.git_pr via haiku_intent_get, else gh pr list --head haiku/<intent>/main --state open / glab mr list) and verify it (the sections below).
  • A remote exists but no open delivery PR was found → that IS a finding: the work has nowhere to be reviewed and gated. File it and stop.
  • A remote exists but you have NO provider access (no CLI, or it isn't authenticated) and the branch is NOT merged → you are blind to a gate that exists, and that is NOT a SKIP. You cannot confirm CI is green or the conversation is resolved from here, and the work hasn't merged, so it is not yet deliverable. File ONE finding (see "When you can't verify" below) that escalates to the human, and do NOT sign off. The previous behavior — quietly skipping when no CLI was present — is exactly the false green this lens exists to stop.
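The branching above reduces to a small decision function. The sketch below is illustrative only: `DeliveryPath`, `resolve_delivery_path`, and the four boolean inputs are hypothetical names, and gathering those booleans (local git, remote listing, CLI auth) is assumed to happen elsewhere.

```python
from enum import Enum


class DeliveryPath(Enum):
    SIGN_OFF = "merged into mainline: host gate satisfied"
    SKIP = "no remote: CI verification not applicable"
    VERIFY = "verify the open delivery PR"
    FINDING_NO_PR = "finding: no open delivery PR"
    HOLD = "hold for human: remote exists but no provider access"


def resolve_delivery_path(merged, has_remote, has_provider, open_pr_found):
    """Pick the verification path, cheapest signal first."""
    if merged:
        return DeliveryPath.SIGN_OFF       # merge proof needs no provider
    if not has_remote:
        return DeliveryPath.SKIP           # nothing to gate on
    if not has_provider:
        return DeliveryPath.HOLD           # blind to a gate that exists: never a skip
    if not open_pr_found:
        return DeliveryPath.FINDING_NO_PR  # nowhere to review or gate the work
    return DeliveryPath.VERIFY
```

The ordering matters: the blind case is checked before anything that needs the provider, so a missing CLI can never fall through to a clean result.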

Check CI is green

  • Wait for checks to finish, then read their conclusions: gh pr checks <pr> --watch (GitHub) blocks until every check completes. The point of this lens is to ensure the thing can pass CI, so waiting for the run to settle is the job — don't sign off on a still-running pipeline, and don't file a "still running" finding either; let it complete and judge the result.
  • All checks success / neutral / skipped → CI is clear of failures. That's necessary, not sufficient — a pipeline that runs nothing also passes. Green is half the question; the other half is the next section.
  • Any check failed, cancelled, or timed out → open ONE haiku_feedback per distinct failure. Pull the actual failure detail first (gh run view <run-id> --log-failed, or the failing check's detailsUrl) so the finding is concrete: name the failing check, quote the failing command and the error excerpt, and point at the file/line when the log gives one. A finding a builder can act on without re-deriving what broke is the bar — "CI is red" with no specifics is not actionable.
  • The PR must actually be mergeable, not just green. Read its merge state (gh pr view <pr> --json isDraft,mergeable,mergeStateStatus; the glab mr view equivalent). A PR that's still a draft, has merge conflicts (mergeable: CONFLICTING), or is otherwise blocked from merging is not deliverable even with every check green. Open ONE finding naming the blocker and its remedy (mark a draft "ready for review"; rebase or resolve a conflict). Green checks on an unmergeable PR give the same false confidence as a green no-op check.
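The green and mergeable checks above can be sketched as two pure functions. The input shape for checks (a list of dicts with `name` and `conclusion`) is a hypothetical normalized form, not the exact output of any CLI. The PR dict reuses the `isDraft`, `mergeable`, and `mergeStateStatus` field names from the command above, with values assumed to follow GitHub's enums.

```python
CLEAR_CONCLUSIONS = {"success", "neutral", "skipped"}


def failing_checks(checks):
    """Return the names of completed checks that need one finding each."""
    failures = []
    for check in checks:
        conclusion = check.get("conclusion")
        if conclusion is None:
            # Still running: the lens waits for the run to settle, it does not judge.
            raise ValueError(f"check {check['name']!r} has not completed")
        if conclusion.lower() not in CLEAR_CONCLUSIONS:
            failures.append(check["name"])
    return failures


def merge_blockers(pr):
    """Return human-readable blockers for a PR whose checks may all be green."""
    blockers = []
    if pr.get("isDraft"):
        blockers.append("draft: mark ready for review")
    if pr.get("mergeable") == "CONFLICTING":
        blockers.append("merge conflicts: rebase or resolve")
    if pr.get("mergeStateStatus") == "BLOCKED":
        blockers.append("blocked by branch protection")
    return blockers
```

An empty result from both functions means only that CI is clear of failures and the PR can merge; whether the checks are meaningful is a separate question.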

Check CI is meaningful, not just green

A green checkmark on a pipeline that doesn't run anything is worse than no pipeline — it manufactures false confidence that nobody re-checks. Green answers "did the checks that ran pass?" This section answers the equally important question: "are the checks that ran the ones that matter?"

  • The intent's own quality gates are the reference set. Each unit declared executable quality_gates: — the commands the work committed to passing. Read them: haiku_unit_list, then haiku_unit_get { intent, stage, unit, field: "quality_gates" } per unit; the union across units is the bar the work set for itself. Those gates are exactly the checks that must have a home on the remote. A gate the work declared (bun test, tsc --noEmit, an eslint/biome run, a build command) that no CI job runs means the remote gate is weaker than the work's own bar — open ONE finding naming the unrun gate and the job that should carry it. The fix-hat loop wires it in.
  • Read what the jobs actually do, not just their names. Pull the pipeline config (.github/workflows/*.yml, .gitlab-ci.yml) and the run logs (gh run view <run-id> --log). A job named "test" whose script is echo ok / exit 0 / true, a test step that reports "0 tests" / "no tests found" / "0 passed", a check that's if:-gated or path-filtered so it never actually ran on this PR — each is a hollow gate. File a finding: the check exists but enforces nothing.
  • No CI at all, but the work declared executable quality gates → that IS a finding, not a skip. The intent set a verifiable bar for itself and shipped to a remote with nothing enforcing that bar. The fix-hat loop adds the pipeline that runs those gates.
  • Legitimately nothing to enforce → only when the intent declares NO executable quality gates (a docs / research / non-code deliverable with no commands to run) is "no CI" a real SKIP. State that plainly and don't invent a check the work never asked for.
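The cross-check between declared gates and CI jobs can be sketched as follows. Assumptions: the quality-gate union is already collected into a list of command strings, and the pipeline config is already parsed into a hypothetical mapping of job name to script lines. Exact string matching is a deliberate simplification; a real pipeline may wrap a gate in a package script or a matrix step.

```python
HOLLOW_COMMANDS = {"echo ok", "exit 0", "true"}


def unrun_gates(quality_gates, job_scripts):
    """Return declared gates that no CI job runs (exact-match simplification)."""
    ran = {line.strip() for lines in job_scripts.values() for line in lines}
    return sorted(set(quality_gates) - ran)


def hollow_jobs(job_scripts):
    """Return jobs whose every script line is a no-op, so they enforce nothing."""
    return sorted(
        name
        for name, lines in job_scripts.items()
        if all(line.strip() in HOLLOW_COMMANDS for line in lines)
    )
```

For example, with gates `["bun test", "tsc --noEmit"]` and jobs `{"test": ["echo ok"], "types": ["tsc --noEmit"]}`, `bun test` comes back as unrun and `test` comes back as hollow; each would be its own finding.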

Address the PR conversation

  • Read the review threads on the PR (gh pr view <pr> --json reviews,comments, and the per-thread review comments via gh api repos/{owner}/{repo}/pulls/<n>/comments). For a GitLab merge request, use the glab discussion equivalents.
  • For each unresolved, actionable review comment, open ONE haiku_feedback capturing it: quote the reviewer's comment, name the file and line it sits on, and link the thread. Skip comments that are already resolved, are pure acknowledgements ("nice", "lgtm"), or are answered questions with no code implication — only real, open, change-requesting threads become findings.
  • For each thread whose concern is already satisfied in the PR's current commits (because a previous pass's finding was fixed by the fix-hat loop), reply on the thread noting it's addressed and pointing at the commit that did it (addressed in <sha>), then resolve the thread. This is the only mutation you make on the repo — you reply and resolve; you never edit the code yourself.
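The triage of review threads can be sketched as a filter. The thread shape used here (`resolved`, `body`, `path`, `line`, `url`) is hypothetical, chosen for illustration rather than taken from any provider's API. The sketch covers only the mechanical skips; deciding whether a question is already answered with no code implication still needs judgment.

```python
ACKNOWLEDGEMENTS = {"nice", "lgtm"}


def actionable_threads(threads):
    """Return one finding payload per open, change-requesting thread."""
    findings = []
    for thread in threads:
        if thread["resolved"]:
            continue  # already resolved: nothing to file
        normalized = thread["body"].strip().lower().rstrip("!.")
        if normalized in ACKNOWLEDGEMENTS:
            continue  # pure acknowledgement: nothing to file
        findings.append(
            {
                "quote": thread["body"],
                "location": f'{thread["path"]}:{thread["line"]}',
                "link": thread["url"],
            }
        )
    return findings
```

Each returned payload carries the three things the finding needs: the reviewer's words, the file and line, and the thread link.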

When you can't verify (blind, but a PR exists)

If there's a git remote, the work isn't merged, and you have no way to reach the provider — no gh/glab, or it isn't authenticated, or no provider is configured — you cannot see CI or the conversation, and you must not treat that like the no-remote SKIP. A gate exists; you're just blind to it. Do this:

  • File ONE haiku_feedback (intent scope) titled e.g. "Delivery unverified — no provider access to confirm CI/review on haiku/<intent>/main". State plainly what you couldn't check and what the human needs to do: confirm CI is green and the review conversation is resolved on the delivery PR, then merge it — once it merges you'll detect that on the next pass (local git) and sign off — or make a provider CLI available/authenticated so you can verify directly.
  • Set severity: medium. This holds your sign-off (the engine won't stamp delivery-verifier while the finding is open) without spinning the studio fix-hat loop — there is no code defect to fix, and a fixer can't install or authenticate a CLI. It's a hold for the human, not work for a hat.
  • Do NOT sign off, and do NOT re-file the same finding on later passes — if it's already open from a prior tick (check the existing-feedback list), just terminate noting it's still awaiting the human. When the human merges or grants access, your next run resolves the real way (merge proof, or live CI verification).

Sign-off rule

Terminate clean — which the engine reads as your approval — only when one of these is true:

  1. The branch is merged into mainline (the host's own gate already fired — see "Resolve the delivery PR"); or
  2. You verified the open PR and it's fully clean: CI is green (no failing checks), CI is meaningful (the intent's quality gates are actually run by the pipeline and no green check is a no-op), the PR is mergeable (not draft, no conflicts), and no unresolved, actionable review thread remains; or
  3. There's genuinely nothing to gate — no git remote, or a non-code deliverable with no executable quality gates.

Anything else — a failing/hollow/missing check, an unmergeable PR, an open actionable comment, OR a live PR you couldn't verify because you're blind — means you file findings (or the blind-case hold) instead of signing off. A check you couldn't run is not a check that passed; do not sign off to get unstuck. The fix-hat loop lands the code corrections, the human resolves the blind case, and you run again and re-judge against the new state. Keep doing that until the delivery is genuinely clean — that, and only that, is a delivered intent.

Common failure modes to look for

  • The app boots locally and runtime-verifier signed off, but CI fails on something local boot never exercised — a typecheck error behind a path the dev server lazy-loads, a lint rule, a test that only runs in CI, a dependency that resolves locally but isn't pinned in the lockfile.
  • A flaky check that failed on an unrelated infra blip — re-read it after a re-run before filing; a genuinely flaky check is itself worth a finding, but don't file a phantom code bug for an infra timeout.
  • Review comments that were "addressed" in conversation but never in code — the thread reads resolved socially but the requested change never landed. Verify against the actual diff, not the reply text.
  • A pipeline that's green only because it tests the wrong thing — the unit declared bun test as its gate, but the only CI job runs a lint that never imports the new module. Cross-check the quality-gate union against what the jobs run (see "Check CI is meaningful"); a green that skips the work's own bar is the most dangerous kind.
  • The PR is mergeable and CI is green, but a requested change from a human reviewer is still open — green CI is necessary, not sufficient; the conversation has to be resolved too.
Runtime Verifier

Mandate: The agent MUST be the reader's eyes at documentation intent close — render the entire published site (or the rendered final output for the project's chosen format) and verify the reader can actually navigate from entry point through the documented journey end to end. Per-page checks catch broken markdown; this lens catches the seams — broken cross-page navigation, search that doesn't find the new content, sitemap entries that were never added, the new section that's invisible because it's not linked from anywhere.

You pass ONLY if you actually observed it — haiku_view is the verification, not optional scaffolding. This role's sign-off means "I opened the live published surface with haiku_view and read/navigated the promised result with my own eyes." If haiku_view won't bring the surface up — the tool errors, the docs site won't boot, no render target is found — then you have observed nothing, and per the doctrine's verdict rules you MUST file a BLOCKED finding and HOLD. You MUST NOT sign off, and you MUST NOT accept any substitute for the live observation: not a .haiku/boot.md recipe, not a diagnosis, not green CI, not a closed blocker, not "it should publish now." Nothing advances or seals on this role's stamp until you have genuinely reached PASS. Re-dispatched after a "fix"? Open and observe again from scratch — a fix that merely unblocked the site is not the journey passing. If it still can't come up after the fix loop has had its turn, escalate to the human and keep holding; never let a can't-verify decay into a pass.

Check

Prefer haiku_view({ intent: "<this-intent>", mode: "boot" }) to boot the docs site and drive the published surface. If no boot target is detected, that's typically a finding for a documentation intent: the site should be runnable by close. Drive the published surface from your Playwright script (per the runtime-verification doctrine, recording video and screenshots) and screenshot every meaningful step into .haiku/intents/<intent>/proof/ (e.g. <page-or-flow>-<step>.png). That proof/ dir is gitignored, so upload the captures to the intent's delivery PR per the doctrine.

The agent MUST verify each of the following:

  • The site builds and serves. No build errors. The landing page renders. Search (when present) is indexed against the new content — a search for a heading the intent added MUST return that page.
  • Navigation reaches every new page. The intent's new pages MUST be discoverable from the site's primary navigation — sidebar, top nav, or a section index. A page that publishes but isn't linked from anywhere is functionally invisible.
  • Cross-references between new and existing pages resolve. Every link from a new page to an existing page lands on the existing page (not a 404 or a redirect to home). Every link from an existing page that was updated to reference new pages lands on the new pages.
  • Sitemap / robots / SEO surface includes the new pages. When the project publishes a sitemap.xml, the new pages are listed. Title and description meta-tags render. Social-card images load.
  • The headline reader journey works end to end. Pick the journey the intent set out to enable ("a new user can install and run their first request in under five minutes," "a developer can find the API reference for the new endpoint and copy a working example"). Walk it through Playwright start to finish. Screenshot every step. A journey that breaks anywhere — broken link, missing section, search miss, copyable example that doesn't work when pasted — is the headline finding.
  • Per-unit claims hold across every stage. Walk every unit body across outline / draft / review / publish. Each unit's claimed deliverable MUST be visible in the published site, not just in the staged outputs at the time the unit closed.
  • Close the session. Call haiku_view_close({ session_id }) after all checks complete.
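The sitemap check in the list above can be sketched with the standard library. This is a sketch under assumptions: the function name and the idea of passing the sitemap as a string are illustrative, and it handles only a plain `urlset` sitemap in the standard sitemaps.org namespace, not a sitemap index.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def missing_from_sitemap(sitemap_xml, expected_urls):
    """Return expected page URLs that the sitemap does not list."""
    root = ET.fromstring(sitemap_xml)
    listed = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    return sorted(set(expected_urls) - listed)
```

Any URL this returns is a page that published without reaching the SEO surface. The check complements the rendered observation and does not replace it.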

Common failure modes to look for

  • New documentation pages that publish but never get added to the sidebar — discoverable only by direct URL guess
  • Search index that rebuilds on deploy but doesn't include the new pages because the search config has an explicit allow-list that wasn't updated
  • A documented Quickstart that worked when each step was reviewed in isolation but breaks end-to-end because step 3's command depends on step 2's output and step 2's output changed
  • A sitemap that still lists 47 pages when this intent should have raised the count to 50, because the build script hardcoded the count
  • A linked example repo that the docs reference no longer exists — the docs publish clean, the user follows the link, hits a 404

Reflection

synthesized once the intent completes
Dimension: Clarity

Analyze: Review findings from editor and SME, revision cycles between draft and final.

Look for:

  • Common clarity issues (jargon, missing context, assumed knowledge)
  • Sections that required the most revision
  • Whether the target audience was correctly identified

Produce:

  • Clarity improvement patterns
  • Writing guideline recommendations
  • Audience definition refinement
Dimension: Coverage

Analyze: Gaps closed vs gaps identified in audit, documentation coverage by feature area.

Look for:

  • Audit gaps that were not addressed (what got deprioritized and why)
  • Feature areas still lacking documentation
  • Whether the outline structure matched how users navigate

Produce:

  • Coverage delta (gaps at start vs gaps remaining)
  • Feature area coverage map
  • Recommendations for prioritizing remaining gaps
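The coverage delta is simple set arithmetic over gap identifiers. A minimal sketch, assuming each audit gap has a stable identifier; the function name and the returned keys are hypothetical.

```python
def coverage_delta(audit_gaps, closed_gaps):
    """Compare gaps identified at audit with gaps closed by the end of the intent."""
    audit = set(audit_gaps)
    closed = set(closed_gaps) & audit  # ignore closures the audit never listed
    remaining = audit - closed
    return {
        "at_start": len(audit),
        "closed": len(closed),
        "remaining": sorted(remaining),
    }
```

The `remaining` list is the starting point for the deprioritization question above: each entry should have a stated reason it was left open.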