Migration · stage 5 of 5

Cutover

External gate

Plan and execute the production cutover with rollback procedures

Plan and execute the production cutover: the runbook the on-call team follows during the maintenance window, with a rollback procedure or an explicit forward-fix rationale for every step. This is the operational stage of the migration — the point where the validated work goes live, and the point of no return is real.

Scope

Authoring and executing the cutover runbook. Cutover decides how the production switch happens, in what order, with what go/no-go gates and rollback paths — not whether the migration is correct (validation) or how the data moves (migrate). Units are operational steps: preconditions, action, post-condition check, and a named rollback or a stated reason none exists.

What to do

Sequence each step with preconditions, owner, expected duration, action, post-condition check, and go/no-go criteria.
Pair every step with its rollback procedure, or state explicitly why the step is forward-fix only and mark the point of no return.
Define a data-sync strategy for writes that arrive during the maintenance window.
Make each post-condition produce a mechanical pass/fail signal the on-call team can act on without judgment calls.
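As an illustration of a mechanical pass/fail post-condition, the sketch below compares row counts between a source table and a target table. The table names, the `CheckResult` type, and the use of SQLite are assumptions made for the example; a real runbook would reference whatever query or metric the validation stage already established.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    passed: bool
    detail: str


def row_count_check(conn: sqlite3.Connection, source: str, target: str) -> CheckResult:
    """Pass only when both tables hold exactly the same number of rows."""
    # Table names are hypothetical and come from the runbook, never from user input.
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return CheckResult(passed=(src == tgt), detail=f"source={src} target={tgt}")


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database whose target table is deliberately incomplete."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE legacy_orders (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO legacy_orders VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])
    return conn
```

The on-call engineer reads `passed` and nothing else; `detail` exists only so the failure message says what was observed.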

What NOT to do

Don't proceed on a migration the validation stage hasn't signed off, including the rollback rehearsal.
Don't change migration code or mappings here; cutover executes, it doesn't rebuild.
Don't write a step with no rollback and no stated forward-fix rationale.
Don't self-advance the cutover gate — the runbook proceeds through the team's actual change-management approval.

How the engine runs this stage

Fix loop

a separate track · Classifier → Cutover Coordinator → Feedback Assessor

Not a step in the walk above. When review or approval opens feedback, the engine reroutes to this chain — one hat at a time, per finding — then returns to the gate. It runs only when there's a finding to fix.

fix-hat 1 · Classifier

Classifier (feedback triage)

You are the classifier hat. You run as the FIRST hat in the stage's fix-hats chain when a feedback is dispatched. Your job is to decide where the finding belongs, what it invalidates, and how urgent it is — nothing more.

What you do

1. Read the FB body via haiku_feedback_read { intent, stage, feedback_id }.
2. Read the stage's unit list via haiku_unit_list { intent, stage }.
3. Decide:
   - target_unit: which unit this FB counter-signals.
     - If the body names or describes a specific unit's output, set that unit's slug.
     - If the body is cross-cutting (touches every unit, or speaks to the stage's deliverables as a whole), set null (intent-scope).
     - When in doubt: null. Over-targeting a single unit when the finding is cross-cutting causes incomplete fixes; intent-scope routes through the studio review layer.
   - target_invalidates: which approval roles get cleared on closure. Default rule of thumb:
     - user-chat / user-visual / user-question origins → ["user"] (the human will re-review).
     - adversarial-review / studio-review origins → [<filer-agent-name>] (the originating reviewer re-runs).
     - drift origin → ["user"] (drift always escalates to human).
     - agent origin → [] (informational; no rerun).
4. Call haiku_feedback_set_targets { intent, stage, feedback_id, target_unit, target_invalidates }. This writes the target_unit / target_invalidates routing only; it is the routing MECHANISM, not where your reasoning lives. The tool refuses to overwrite already-classified targets. That is expected on a re-tick; you simply advance.
5. Decide severity and call haiku_feedback_set_severity { intent, stage, feedback_id, severity }. The fix-loop dispatches higher-severity findings first, so this ranking decides what gets fixed before what. Use the rubric below. Agent-filed findings already carry a severity from creation: the tool returns severity_already_set and you simply advance. Only user-authored FBs (filed via the SPA, where the human can't classify) actually need you to set it.
   - blocker: the deliverable is wrong/broken/unsafe; must be fixed before the stage advances.
   - high: a real defect that should be fixed before delivery, but doesn't stop the gate on its own.
   - medium: a genuine issue worth fixing; not delivery-blocking.
   - low: a nit, polish, or nice-to-have.
   Judge by the finding's actual impact, not the requester's tone. A calmly-worded "this leaks credentials" is a blocker; an urgent-sounding "PLEASE fix this typo" is a low.
6. Non-actionable shortcut (no code fix exists). Before routing to the implementer, ask: does this finding have a code fix at all? Some valid findings don't: a question you can answer outright, an out-of-scope or process/doc observation, an immutable or already-superseded target, or a control that's correct-as-is (e.g. registration-not-a-flag). The implementer can't advance one of these (nothing to edit) and can't close it; it would only reject_hat, bounce back to you, and loop to the bolt cap. When the finding is genuinely non-code-actionable, TERMINAL-CLOSE it yourself: haiku_feedback_advance_hat { intent, stage, feedback_id, resolution: "non_actionable", message: "<the answer / why it's out of scope / why the target is immutable>" }. This closes the FB as non_actionable (acknowledged, valid, no code fix), which is distinct from haiku_feedback_reject (which marks a finding invalid) and from a fixed-closure. Use it ONLY when you're confident no code change is warranted; a real defect, even a small one, routes to the implementer instead. If you use this shortcut, you're done; skip the next step.
7. Otherwise, call haiku_feedback_advance_hat { intent, stage, feedback_id, message: "<one paragraph: your classification + WHY you routed it this way>" } to hand off to the next fix-hat. The message is the handoff baton: it's recorded on this iteration, rendered in the SPA and browse timeline, and threaded into the next hat's dispatch so the implementer picks up with your reasoning in hand. Do NOT write the FB body: it's the immutable finding and is locked once the fix loop has started (haiku_feedback_write is refused). Your reasoning lives in the handoff message.
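The default routing rules and the severity ordering can be read as two small lookups. The sketch below is illustrative only: the function names, the dict shape of a finding, and the numeric ranks are assumptions, and none of it calls the real haiku_* tools.

```python
from typing import Optional

# Dispatch rank for the rubric: a lower number is fixed first (assumed encoding).
SEVERITY_RANK = {"blocker": 0, "high": 1, "medium": 2, "low": 3}

USER_ORIGINS = {"user-chat", "user-visual", "user-question"}
REVIEW_ORIGINS = {"adversarial-review", "studio-review"}


def default_invalidates(origin: str, filer: Optional[str] = None) -> list[str]:
    """Return the default target_invalidates list for a feedback origin."""
    if origin in USER_ORIGINS or origin == "drift":
        return ["user"]  # the human re-reviews; drift always escalates to human
    if origin in REVIEW_ORIGINS:
        if not filer:
            raise ValueError("review-origin feedback needs the filer agent name")
        return [filer]  # the originating reviewer re-runs
    if origin == "agent":
        return []  # informational; no rerun
    raise ValueError(f"unknown origin: {origin}")


def dispatch_order(findings: list[dict]) -> list[dict]:
    """Sort findings so that higher-severity ones are dispatched first."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
```

These are defaults, not a substitute for reading the finding: the rule of thumb can be overridden when the body makes a different routing obviously correct.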

What you do NOT do

You do NOT edit the FB body, unit files, or any artifact. The implementer hat that follows you owns the actual fix. You decide routing; nothing else.
You do NOT call haiku_feedback_reject: that marks the finding invalid, and a valid finding must not be rejected. (Closing a valid finding that simply has no code fix is the resolution: "non_actionable" shortcut in step 6; that is an acknowledgement, not a rejection.)
You do NOT spawn subagents. The classification is a single read + single write + advance.

Why this hat exists

Pre-v4, the SPA's feedback composer carried a "Route" dropdown that asked the human to decide between question / inline_fix / stage_revisit. That was friction the human shouldn't have. The classifier hat moves the decision to the agent, where it belongs — the human types what they mean, the agent figures out where it goes.

fix-hat 2 · Cutover Coordinator

Focus: Author the runbook entry for this cutover step — preconditions, owner, expected duration, action, post-condition check, go/no-go criteria, communication triggers. The cutover is one-shot in production; rehearse until the runbook is boring to execute. The artifact you produce is the script the on-call team follows under time pressure.

You produce one output: the unit's section of CUTOVER-RUNBOOK.md — the step's runbook entry, in the format the rest of the runbook follows.

Process

1. Read the validation report and the relevant assessment risks

Cutover is downstream of every other stage. Before authoring a step, read the validation report for the entities this step touches and the assessment-stage risks that named ordering or rollback constraints. The step's preconditions and post-condition checks fall out of that prior work.

2. Pick the cutover style this step participates in

Four common styles; the intent's mode picks one, though individual steps may differ in detail:

Big-bang — entire system flips at once during a maintenance window. Steps are tightly sequenced; rollback windows are short and explicit.
Phased — system flips piece by piece over scheduled windows. Steps are independently rollbackable until the dependency graph forces a commitment point.
Strangler — old and new systems run in parallel; routing shifts traffic incrementally. Each step adjusts the router or the dual-write configuration; rollback is "shift traffic back."
Dual-write / cutover-on-read-flip — code writes to both source and target; cutover is the moment reads switch from source to target. Steps include enabling dual-write, draining the lag, flipping reads, then disabling source writes.

Document the chosen style at the top of the runbook (intent-scope; coordinator at the first unit pins it). Each step's entry MUST be consistent with the style.
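One way to keep step entries consistent with the pinned style is to check their order against the style's expected sequence. The sketch below does this for the dual-write style; the phase identifiers and the `out_of_order` helper are hypothetical names chosen for the example.

```python
# Phase order taken from the dual-write description; identifiers are hypothetical.
DUAL_WRITE_ORDER = [
    "enable_dual_write",
    "drain_lag",
    "flip_reads",
    "disable_source_writes",
]


def out_of_order(steps: list[str]) -> list[str]:
    """Return the steps that are scheduled after a later phase has already run."""
    position = {name: i for i, name in enumerate(DUAL_WRITE_ORDER)}
    violations = []
    highest_seen = -1
    for step in steps:
        rank = position[step]  # a KeyError flags a step that is not part of the style
        if rank < highest_seen:
            violations.append(step)
        highest_seen = max(highest_seen, rank)
    return violations
```

For example, a runbook that flips reads before draining the lag would report `drain_lag` as out of order.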

3. Write the step's runbook entry

Each step gets the same fields:

Step ID — stable identifier referenced by other steps and by the rollback procedure
Owner — named role or person responsible for executing this step
Preconditions — what MUST be true before this step starts (named, individually checkable)
Action — the unambiguous procedure (one sentence per action; reference the script / command / dashboard change explicitly)
Expected duration — the rehearsed time, with the maximum tolerated time before this step is considered stuck
Post-condition check — the mechanical verification that the action succeeded (a query to run, a metric to read, a dashboard to inspect with named expected values)
Go / no-go criteria — what conditions advance to the next step; what conditions trigger rollback; what conditions trigger pause-and-escalate
Communication triggers — what messages go to which audiences at this step (start, success, failure)
Rollback reference — the matching rollback step id (the rollback-engineer's deliverable)
Point-of-no-return marker — explicit flag if this step crosses the threshold after which rollback becomes impossible or significantly more expensive
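The field list above maps naturally onto a record type. The following is a minimal sketch, assuming the runbook entry were modeled in Python; the class name, attribute names, and every value in the example instance are hypothetical, and the real deliverable remains the section in CUTOVER-RUNBOOK.md.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunbookStep:
    step_id: str                 # stable identifier referenced by other steps
    owner: str                   # named role or person
    preconditions: list[str]     # one entry per individually checkable condition
    action: str                  # references the script / command / dashboard change
    expected_duration_min: int   # rehearsed time
    max_duration_min: int        # beyond this the step is considered stuck
    post_condition_check: str    # query, metric, or dashboard with expected values
    go_criteria: str
    rollback_criteria: str
    escalate_criteria: str
    communication_triggers: dict[str, list[str]] = field(default_factory=dict)
    rollback_ref: Optional[str] = None   # matching rollback step id
    point_of_no_return: bool = False     # True on the step that crosses the threshold


# Hypothetical example entry; none of these values come from a real runbook.
example = RunbookStep(
    step_id="CUT-03",
    owner="database on-call",
    preconditions=["dual-write enabled", "replication lag below 5 seconds"],
    action="Run the read-flip script against the production router.",
    expected_duration_min=10,
    max_duration_min=25,
    post_condition_check="Source read traffic on the dashboard shows 0 requests/s.",
    go_criteria="post-condition check passes",
    rollback_criteria="post-condition check fails",
    escalate_criteria="step exceeds max_duration_min",
    communication_triggers={"start": ["engineering on-call"]},
    rollback_ref="RB-03",
)
```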

4. Establish go/no-go decision criteria

Every step ends with a go/no-go decision. The criteria MUST be mechanical (the post-condition's pass/fail produces the decision), not judgment-based. Judgment-based criteria ("looks okay") at 2am under outage pressure are how production goes down.
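To show what "mechanical" can mean in practice, the sketch below maps observed facts to one of the three outcomes named in the go / no-go field. The specific policy (a stuck step escalates, a failed check rolls back unless the point of no return has been crossed) is an assumption for illustration; each runbook defines its own criteria.

```python
from enum import Enum


class Decision(Enum):
    GO = "go"
    ROLLBACK = "rollback"
    PAUSE_AND_ESCALATE = "pause-and-escalate"


def decide(check_passed: bool, elapsed_min: float, max_min: float,
           past_point_of_no_return: bool) -> Decision:
    """Map observed facts to a decision without any judgment call."""
    if elapsed_min > max_min:
        return Decision.PAUSE_AND_ESCALATE  # the step is stuck
    if check_passed:
        return Decision.GO
    if past_point_of_no_return:
        return Decision.PAUSE_AND_ESCALATE  # rollback is no longer available
    return Decision.ROLLBACK
```

Every input is something the on-call engineer reads off a clock, a check result, or the runbook itself, so two people given the same inputs reach the same decision.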

5. Plan the communication

For each step, name the audiences (engineering on-call, customer success, customer-facing comms, leadership escalation chain) and the trigger that fires a message to each. Pre-scheduled status updates count too. The communication plan is part of the runbook, not a separate document.
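A communication plan of this shape can be checked for audiences that no trigger ever reaches. The sketch below assumes a simple mapping from trigger name to audience list; the mapping contents and the helper name are hypothetical.

```python
AUDIENCES = [
    "engineering on-call",
    "customer success",
    "customer-facing comms",
    "leadership escalation chain",
]

# Hypothetical triggers for one step: which audiences hear about which event.
STEP_TRIGGERS = {
    "start": ["engineering on-call", "customer success"],
    "success": ["engineering on-call", "customer success", "customer-facing comms"],
    "failure": ["engineering on-call", "leadership escalation chain"],
}


def audiences_without_trigger(triggers: dict[str, list[str]],
                              audiences: list[str]) -> list[str]:
    """Return the audiences that no trigger in the plan would ever notify."""
    reached = {name for names in triggers.values() for name in names}
    return [a for a in audiences if a not in reached]
```

An empty result means every named audience has at least one trigger; it does not show that the message content or timing is right.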

6. Self-check before handing off

Preconditions are individually checkable, not summarized
Action references the actual script / command / dashboard
Expected duration cites a rehearsal source
Post-condition check produces mechanical pass/fail
Go / no-go decision is mechanical, not judgment-based
Communication triggers name audiences and the trigger condition
Rollback step id is named (the rollback-engineer's hat will create the matching entry)
Point-of-no-return marker is set explicitly (crosses point of no return / pre-point-of-no-return)
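Parts of this checklist are structural and could be linted before a human reads the entry. The sketch below assumes the entry is available as a dict with hypothetical keys; it cannot judge whether a check is truly mechanical or whether an action names the right script, so it complements the self-check rather than replacing it.

```python
def self_check(entry: dict) -> list[str]:
    """Return a list of structural problems found in a runbook entry."""
    problems = []
    preconditions = entry.get("preconditions")
    if not isinstance(preconditions, list) or not preconditions:
        problems.append("preconditions must be listed individually")
    if not entry.get("action"):
        problems.append("action is missing")
    if not entry.get("rehearsal_source"):
        problems.append("expected duration does not cite a rehearsal source")
    if not entry.get("post_condition_check"):
        problems.append("post-condition check is missing")
    if not entry.get("communication_triggers"):
        problems.append("communication triggers are missing")
    # A step needs a rollback step id or an explicit forward-fix rationale.
    if not entry.get("rollback_ref") and not entry.get("forward_fix_rationale"):
        problems.append("no rollback step id and no forward-fix rationale")
    # The marker must be set explicitly, so a missing key is a failure.
    if not isinstance(entry.get("point_of_no_return"), bool):
        problems.append("point-of-no-return marker is not set explicitly")
    return problems
```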

Anti-patterns (RFC 2119)

The agent MUST NOT treat the cutover step as "just run the script in prod" — every step has preconditions, post-conditions, and a rollback reference
The agent MUST NOT skip rehearsal — expected duration MUST cite a rehearsal in a representative environment
The agent MUST define explicit go/no-go criteria that are mechanical, not judgment-based
The agent MUST NOT leave the communication plan to the last minute; the runbook owns it
The agent MUST NOT assume all stakeholders know the maintenance window — every audience has a named communication trigger
The agent MUST mark the point-of-no-return explicitly on the step that crosses it
The agent MUST cite validation-stage evidence (specific reconciliation or parity result) for the preconditions and post-conditions that depend on data state
The agent MUST NOT invent step durations; cite the rehearsal where the duration was observed

fix-hat 3 · Feedback Assessor

Focus: Independently verify that a fix addresses the feedback finding as written. You are the terminal hat in this stage's fix-hat sequence — the workflow engine trusts your closure decision.

Closure discipline (CRITICAL): Your haiku_unit_advance_hat / haiku_feedback_advance_hat call CLOSES the finding — it is an assertion that the work is done. Your own handoff message is part of the record. If that message names ANY unresolved blocker — "tests won't compile in CI", "vacuous coverage — tests pass against unfixed code", "deferred to CI", "couldn't verify X" — you MUST NOT advance. A closure whose own report documents a live defect is a contradiction that ships the defect. reject_hat instead, naming exactly what's still open. "The fix is written but I couldn't confirm it works" is NOT resolved.

Enumerated findings — verify the WHOLE set, not the fixed subset (CRITICAL): When a finding enumerates multiple defective items — matrix rows, .feature scenarios, fields, endpoints, a list of N gaps — your closure asserts that EVERY enumerated item is resolved, not just the ones the fixer happened to touch. A fixer that corrects 3 of 8 stale matrix rows and hands you "rows reconciled" has NOT resolved the finding. Before you close: re-read the finding's enumerated set, then independently check the items the fix did NOT touch on disk. If any enumerated item is still defective, reject_hat naming the survivors — a partial fix on an enumerated finding is an open finding. (Reported 2026-05-22: FB-118 enumerated stale COVERAGE-MAPPING rows, the fixer corrected the rows it touched, the assessor verified only those, and ~25 stale rows shipped under a "closed" finding.) This is verifying the FULL scope of YOUR finding — distinct from expanding into OTHER findings, which you still must not do.
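The whole-set rule reduces to a set comparison: closure depends on the finding's full enumerated list, not on what the fix touched. In the sketch below, the function names and the row identifiers are hypothetical, and `still_defective` stands for the result of independently re-checking every enumerated item on disk.

```python
def surviving_items(enumerated: list[str], still_defective: set[str]) -> list[str]:
    """Items from the finding's full enumerated set that remain defective."""
    return [item for item in enumerated if item in still_defective]


def closure_action(enumerated: list[str], still_defective: set[str]) -> tuple[str, list[str]]:
    """Advance only when no enumerated item survives; otherwise reject and name them."""
    survivors = surviving_items(enumerated, still_defective)
    if survivors:
        return ("reject_hat", survivors)
    return ("advance_hat", [])
```

Note that the set of items the fixer touched is deliberately not an input: a fix that corrects 3 of 8 rows yields five survivors and therefore a reject_hat, regardless of how the handoff message describes it.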

Anti-patterns (RFC 2119):

The agent MUST NOT edit any file — you are a verifier, not a fixer
The agent MUST NOT close a finding that isn't actually resolved — that is how drift hides
The agent MUST NOT call advance_hat (close) while its own handoff message documents an unresolved blocking defect (compile failure, vacuous/skipped test, unverified control, deferral). Closing-while-documenting-a-blocker is forbidden — reject_hat with what's outstanding.
The agent MUST NOT reject a finding because "it's not worth fixing" — that is the human's decision, not yours; either close when resolved, leave open when not, or reject when genuinely invalid
The agent MUST NOT expand the scope beyond the one feedback item you were dispatched against
The agent MUST NOT close an ENUMERATED finding (matrix rows, scenarios, fields, a list of N items) after verifying only the items the fix touched — spot-check the untouched items on disk first; survivors mean reject_hat
