Mitigate

Ask review

Apply immediate fixes to stop the bleeding — rollbacks, feature flags, scaling

Hats: 2
Review Agents: 1
Review: Ask, await
Unit Types: Hotfix, Rollback, Workaround
Inputs: Investigate

Dependencies: Investigate (root-cause)

Hat Sequence

1

Mitigator

Focus: Apply the fastest safe action to stop user-facing impact: rollback, feature flag, scaling, or hotfix. Speed matters, but not at the cost of making things worse. Every action must be reversible.

Produces: Mitigation log documenting exactly what was done, when, and how to reverse it.

Reads: Root cause from investigation, deployment history, feature flag state, infrastructure configuration.

Anti-patterns (RFC 2119):

  • The agent MUST NOT apply a fix without a rollback plan for the fix itself
  • The agent MUST NOT choose a permanent fix when a faster temporary mitigation exists
  • The agent MUST document the exact commands or config changes applied
  • The agent MUST NOT make multiple changes simultaneously, making it impossible to attribute which one helped
  • The agent MUST NOT skip communication — stakeholders need to know what's being done
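The mitigation log described above could be kept as structured entries rather than free text. The following is a minimal sketch, not a prescribed format: the class names, field names, and action strings are all hypothetical. It rejects any entry that lacks an exact command or a reversal step, and it returns reversal steps in reverse order of application.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MitigationEntry:
    """One change applied during mitigation (hypothetical schema)."""
    action_type: str  # e.g. "rollback", "feature_flag", "scaling", "hotfix"
    command: str      # exact command or config change that was applied
    reversal: str     # exact command or config change that undoes it
    applied_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class MitigationLog:
    """Append-only log that refuses entries without a reversal step."""

    def __init__(self) -> None:
        self.entries = []

    def record(self, entry: MitigationEntry) -> None:
        if not entry.command.strip():
            raise ValueError("exact command or config change is required")
        if not entry.reversal.strip():
            raise ValueError("a rollback plan for the fix itself is required")
        self.entries.append(entry)

    def reversal_plan(self) -> list:
        # Undo the most recent change first.
        return [e.reversal for e in reversed(self.entries)]


log = MitigationLog()
# Placeholder command strings, not real CLI invocations.
log.record(MitigationEntry("feature_flag", "set new_checkout=off", "set new_checkout=on"))
log.record(MitigationEntry("scaling", "scale api to 6 replicas", "scale api to 3 replicas"))
```

Recording one entry per change, each with its own timestamp, also makes it easier to attribute an improvement to a specific action.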

2

Verifier

Focus: Confirm the mitigation actually stopped the user-facing impact. Use the same signals that detected the incident — if error rates triggered the alert, error rates should confirm the fix. Trust metrics, not assumptions.

Produces: Verification report confirming impact cessation with before/after metrics and any known side effects of the mitigation.

Reads: Mitigation log, monitoring dashboards, error tracking, the original alerting signals.

Anti-patterns (RFC 2119):

  • The agent MUST NOT declare "fixed" based on a single data point or gut feeling
  • The agent MUST NOT use different metrics to verify than the ones that detected the problem
  • The agent MUST wait long enough for metrics to stabilize before confirming
  • The agent MUST NOT ignore partial mitigation — impact reduced but not eliminated
  • The agent MUST check for side effects introduced by the mitigation itself
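The verification rules above can be sketched as a small classifier. It is only an illustration under stated assumptions: the function name, the `min_samples` default, and the threshold semantics are hypothetical, and `before` and `after` are assumed to be samples of the same metric that fired the original alert (for example, error rate).

```python
from statistics import mean


def verify_mitigation(before, after, alert_threshold, min_samples=5):
    """Classify impact using the same metric that triggered the alert.

    before / after: samples of that metric taken before and after the
    mitigation. The sample count and threshold are illustrative assumptions.
    """
    if not before:
        raise ValueError("need pre-mitigation samples for comparison")
    if len(after) < min_samples:
        # Never confirm from a single data point or a short window.
        return "insufficient_data"
    recent = after[-min_samples:]
    if all(x < alert_threshold for x in recent):
        return "mitigated"
    if mean(recent) < mean(before):
        # Impact reduced but not eliminated: report it, do not ignore it.
        return "partial"
    return "not_mitigated"
```

Returning a distinct "partial" state keeps a reduced-but-ongoing impact visible instead of folding it into a pass or fail result. Side-effect checks are out of scope for this sketch and would need their own signals.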

Review Agents

Safety

Mandate: The agent MUST verify the mitigation stops the bleeding without introducing new risks.

Check:

  • The agent MUST verify that mitigation addresses the immediate impact, not a side effect
  • The agent MUST verify that rollback or feature flag changes do not break other functionality
  • The agent MUST verify that the mitigation is confirmed working in production, not just deployed
  • The agent MUST verify that no data loss or corruption results from the mitigation action

Mitigate

Criteria Guidance

Good criteria examples:

  • "Mitigation action is documented with exact commands or config changes applied"
  • "Verification confirms user-facing impact has stopped, measured by the same metrics that triggered the incident"
  • "Rollback plan exists in case the mitigation itself causes regression"

Bad criteria examples:

  • "Issue is mitigated"
  • "Fix is applied"
  • "Things are back to normal"

Completion Signal (RFC 2119)

The mitigation log documents exactly what was done (rollback version, feature flag toggled, scaling action, or hotfix applied) with timestamps. The Verifier MUST have confirmed that the user-facing impact has stopped, using the same signals that detected the incident. A rollback plan for the mitigation itself MUST be documented. Any known side effects of the mitigation are called out.
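As a rough illustration, the completion signal can be expressed as a checklist over a stage record. The dictionary keys below (`mitigation_log`, `verified_with_detection_signals`, `rollback_plan`, `known_side_effects`) are hypothetical names chosen for this sketch, not a defined schema.

```python
def unmet_completion_requirements(stage):
    """Return the completion requirements a stage record does not yet meet."""
    unmet = []

    entries = stage.get("mitigation_log", [])
    if not entries or any(
        not e.get("action") or not e.get("timestamp") for e in entries
    ):
        unmet.append("mitigation log with actions and timestamps")

    if not stage.get("verified_with_detection_signals", False):
        unmet.append("verification using the signals that detected the incident")

    if not stage.get("rollback_plan"):
        unmet.append("rollback plan for the mitigation itself")

    # An empty list is acceptable; the key must be present so that
    # "no known side effects" is an explicit statement, not an omission.
    if "known_side_effects" not in stage:
        unmet.append("known side effects called out")

    return unmet
```

An empty return value means every requirement in this sketch is met. Any other result lists what is still missing.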