Triage

Auto review

Assess severity, identify blast radius, and assign ownership

Hats: 2
Review Agents: 1
Review: Auto
Unit Types: Triage, Communication
Inputs: None

Hat Sequence

1

First Responder

Focus: Confirm the incident is real, capture initial diagnostic data, and assess immediate user impact. The first responder provides ground truth — what's actually happening, not what dashboards suggest might be happening.

Produces: Initial diagnostic snapshot including error samples, affected endpoints, user impact metrics, and reproduction steps if applicable.

Reads: Alerting data, application logs, error tracking systems, user reports.

Anti-patterns (RFC 2119):

  • The agent MUST NOT assume the alert is a false positive without verifying
  • The agent MUST NOT start a fix before documenting what's broken
  • The agent MUST capture ephemeral diagnostic data (logs, metrics) that may rotate out
  • The agent MUST NOT report symptoms without measuring actual user impact
  • The agent MUST NOT work in isolation without feeding findings back to the incident commander

2

Incident Commander

Focus: Take ownership of the incident, classify severity, assess blast radius, and coordinate the response. The incident commander is the single point of authority — decisions flow through them to avoid confusion during high-pressure situations.

Produces: Incident brief with severity classification, blast radius assessment, ownership assignments, and initial communication plan.

Reads: Alerting data, monitoring dashboards, initial reports from on-call or support.

Anti-patterns (RFC 2119):

  • The agent MUST NOT jump to root cause analysis before establishing severity and blast radius
  • The agent MUST assign clear ownership for investigation and mitigation
  • The agent MUST communicate status to stakeholders early and often
  • The agent MUST NOT downgrade severity without evidence that impact is contained
  • The agent MUST NOT attempt to fix the issue instead of coordinating the response

Review Agents

Severity Accuracy

Mandate: The agent MUST verify severity classification and blast radius assessment are accurate.

Check:

  • The agent MUST verify that severity level matches the observed impact (users affected, data at risk, revenue impact)
  • The agent MUST verify that blast radius assessment accounts for downstream dependencies, not just the failing component
  • The agent MUST verify that escalation path is appropriate for the severity level
  • The agent MUST verify that severity has not been under-classified to avoid process overhead
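
The downstream-dependency check above can be illustrated with a small sketch. It assumes a reverse dependency map is available (service to the services that depend on it directly); the `DEPENDENTS` map, the service names, and the `blast_radius` helper are all hypothetical and not part of this stage's tooling.

```python
from collections import deque

# Hypothetical map: service -> services that depend on it directly.
DEPENDENTS = {
    "db": ["auth", "orders"],
    "auth": ["web"],
    "orders": ["web", "billing"],
    "web": [],
    "billing": [],
}


def blast_radius(failing, dependents):
    """Return the failing service plus every service that transitively depends on it."""
    seen = {failing}
    queue = deque([failing])
    while queue:
        current = queue.popleft()
        # Services missing from the map are treated as having no dependents.
        for service in dependents.get(current, []):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return seen


if __name__ == "__main__":
    print(sorted(blast_radius("db", DEPENDENTS)))
```

A breadth-first walk is used so that each service is visited once even when several paths lead to it. An assessment that lists only `db` here would miss the four services downstream of it.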

Triage

Criteria Guidance

Good criteria examples:

  • "Incident brief includes severity level (SEV1-4) with justification based on user impact"
  • "Blast radius assessment identifies all affected services, regions, and customer segments"
  • "Communication plan specifies who has been notified and through which channels"

Bad criteria examples:

  • "Severity is assessed"
  • "People are notified"
  • "Incident is triaged"

Completion Signal (RFC 2119)

Incident brief MUST exist with severity classification, blast radius assessment, and ownership assignment. Affected systems and user impact MUST be documented. Initial communication MUST have been sent to stakeholders. The first responder MUST have confirmed the incident is reproducible and captured initial diagnostic data.
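
As a rough illustration, the completion signal can be expressed as a checklist over an incident brief record. This is a minimal sketch: the `IncidentBrief` class, its field names, and the `unmet_requirements` helper are assumptions made for the example, not a schema defined by this stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALID_SEVERITIES = {"SEV1", "SEV2", "SEV3", "SEV4"}


@dataclass
class IncidentBrief:
    # All field names are hypothetical, chosen to mirror the completion signal.
    severity: Optional[str] = None
    blast_radius: List[str] = field(default_factory=list)
    owner: Optional[str] = None
    affected_systems: List[str] = field(default_factory=list)
    user_impact: Optional[str] = None
    stakeholders_notified: bool = False
    reproduced: bool = False
    diagnostics_captured: bool = False


def unmet_requirements(brief):
    """Return the names of completion-signal requirements the brief does not yet meet."""
    checks = {
        "severity classification": brief.severity in VALID_SEVERITIES,
        "blast radius assessment": bool(brief.blast_radius),
        "ownership assignment": bool(brief.owner),
        "affected systems documented": bool(brief.affected_systems),
        "user impact documented": bool(brief.user_impact),
        "initial communication sent": brief.stakeholders_notified,
        "incident confirmed reproducible": brief.reproduced,
        "initial diagnostic data captured": brief.diagnostics_captured,
    }
    return [name for name, ok in checks.items() if not ok]


if __name__ == "__main__":
    draft = IncidentBrief(severity="SEV2", owner="on-call")
    for item in unmet_requirements(draft):
        print("missing:", item)
```

Triage would count as complete only when `unmet_requirements` returns an empty list. Returning the names of the missing items, instead of a single boolean, tells the reviewer what is still outstanding.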