Mitigate

Ask review

Apply immediate fixes to stop the bleeding — rollbacks, feature flags, scaling

Hats

Review Agents

Review

Ask, await

Unit Types

Hotfix, Rollback, Workaround

Inputs

Investigate

Dependencies

Investigateroot-cause

Hat Sequence

Mitigator

Focus: Apply the fastest safe action to stop user-facing impact — rollback, feature flag, scaling, or hotfix. Speed matters, but so does not making things worse. Every action must be reversible.

Produces: Mitigation log documenting exactly what was done, when, and how to reverse it.

Reads: Root cause from investigation, deployment history, feature flag state, infrastructure configuration.

Anti-patterns (RFC 2119):

The agent MUST NOT apply a fix without a rollback plan for the fix itself
The agent MUST NOT choose a permanent fix when a faster temporary mitigation exists
The agent MUST document the exact commands or config changes applied
The agent MUST NOT make multiple changes simultaneously, making it impossible to attribute which one helped
The agent MUST NOT skip communication — stakeholders need to know what's being done

Verifier

Focus: Confirm the mitigation actually stopped the user-facing impact. Use the same signals that detected the incident — if error rates triggered the alert, error rates should confirm the fix. Trust metrics, not assumptions.

Produces: Verification report confirming impact cessation with before/after metrics and any known side effects of the mitigation.

Reads: Mitigation log, monitoring dashboards, error tracking, the original alerting signals.

Anti-patterns (RFC 2119):

The agent MUST NOT declare "fixed" based on a single data point or gut feeling
The agent MUST NOT use different metrics to verify than the ones that detected the problem
The agent MUST wait long enough for metrics to stabilize before confirming
The agent MUST NOT ignore partial mitigation — impact reduced but not eliminated
The agent MUST check for side effects introduced by the mitigation itself

Review Agents

Safety

Mandate: The agent MUST verify the mitigation stops the bleeding without introducing new risks.

Check:

The agent MUST verify that mitigation addresses the immediate impact, not a side effect
The agent MUST verify that rollback or feature flag changes do not break other functionality
The agent MUST verify that the mitigation is verified working in production, not just deployed
The agent MUST verify that no data loss or corruption results from the mitigation action

Mitigate

Criteria Guidance

Good criteria examples:

"Mitigation action is documented with exact commands or config changes applied"
"Verification confirms user-facing impact has stopped, measured by the same metrics that triggered the incident"
"Rollback plan exists in case the mitigation itself causes regression"

Bad criteria examples:

"Issue is mitigated"
"Fix is applied"
"Things are back to normal"

Completion Signal (RFC 2119)

Mitigation log documents exactly what was done — rollback version, feature flag toggled, scaling action, or hotfix applied — with timestamps. Verifier MUST have confirmed the user-facing impact MUST have stopped using the same signals that detected the incident. A rollback plan for the mitigation itself MUST be documented. Any known side effects of the mitigation are called out.