Investigate

Auto review

Root cause analysis, log analysis, and timeline reconstruction

Hats

Review Agents

Review

Auto

Unit Types

Investigation, Analysis

Inputs

Triage

Dependencies

Triage: incident-brief

Hat Sequence

Investigator

Focus: Reconstruct the incident timeline, form and test root cause hypotheses, and distinguish the root cause from contributing factors. Follow the evidence — resist the urge to blame the most recent deploy without proof.

Produces: Root cause analysis with timeline, hypothesis testing results, and contributing factor assessment.

Reads: Incident brief from triage, application logs, deployment history, configuration changes, metrics.

Anti-patterns (RFC 2119):

The agent MUST NOT assume the most recent change is the cause without evidence
The agent MUST NOT stop at the first plausible explanation without testing alternatives
The agent MUST NOT confuse correlation with causation (e.g., "it broke after the deploy" is not proof the deploy caused it)
The agent MUST document ruled-out hypotheses and the evidence that eliminated them
The agent MUST NOT investigate in isolation without sharing findings with the log-analyst
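
One way to satisfy the rule about documenting ruled-out hypotheses is to keep a small structured hypothesis log. The sketch below is illustrative only: the names Hypothesis, Status, and ruled_out_with_evidence are hypothetical and not part of any tool described here, and the example statements are invented for demonstration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    SUPPORTED = "supported"
    RULED_OUT = "ruled_out"


@dataclass
class Hypothesis:
    """One candidate explanation and the evidence gathered for or against it."""
    statement: str
    status: Status = Status.OPEN
    evidence: list[str] = field(default_factory=list)

    def rule_out(self, evidence: str) -> None:
        """Mark the hypothesis as eliminated; evidence is mandatory."""
        if not evidence.strip():
            raise ValueError("a hypothesis cannot be ruled out without evidence")
        self.evidence.append(evidence)
        self.status = Status.RULED_OUT


def ruled_out_with_evidence(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return the hypotheses that were eliminated and have evidence recorded."""
    return [h for h in hypotheses if h.status is Status.RULED_OUT and h.evidence]


# Hypothetical example data, not taken from a real incident.
deploy = Hypothesis("The most recent deploy introduced the fault")
deploy.rule_out("Error rate was already elevated before the deploy started")
pool = Hypothesis("Database connection pool was exhausted")

for h in ruled_out_with_evidence([deploy, pool]):
    print(h.statement, "->", h.evidence)
```

Requiring evidence inside rule_out means a hypothesis cannot be dismissed silently, and the same log can be shared with the log-analyst so the investigation does not proceed in isolation.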

Log Analyst

Focus: Deep-dive into logs, metrics, and traces to find concrete evidence supporting or refuting root cause hypotheses. The log analyst turns raw observability data into structured evidence.

Produces: Evidence report with timestamped log entries, metric correlations, and trace analysis supporting the root cause determination.

Reads: Incident brief from triage, investigator's hypotheses, application logs, APM traces, infrastructure metrics.

Anti-patterns (RFC 2119):

The agent MUST NOT search logs without a hypothesis to test — fishing expeditions waste time during incidents
The agent MUST NOT present raw log output without synthesis or interpretation
The agent MUST NOT ignore logs from adjacent systems that may reveal upstream causes
The agent MUST correlate timestamps across different data sources
The agent MUST NOT treat absence of error logs as evidence of no problem
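
Correlating timestamps across data sources usually starts with normalizing every entry to one time zone and merging the entries into a single ordered timeline. The following is a minimal sketch under assumptions: it supposes each source can be exported as (ISO 8601 timestamp with offset, message) pairs, and the names Event, to_utc, and merge_timeline, the source labels, and the sample entries are all hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Event(NamedTuple):
    at: datetime   # timezone-aware, normalized to UTC
    source: str    # which data source the entry came from
    message: str


def to_utc(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # A naive timestamp cannot be compared safely across systems.
        raise ValueError(f"timestamp has no UTC offset: {raw!r}")
    return parsed.astimezone(timezone.utc)


def merge_timeline(sources: dict[str, list[tuple[str, str]]]) -> list[Event]:
    """Merge entries from all sources into one list ordered by UTC time."""
    events = [
        Event(to_utc(raw), name, message)
        for name, entries in sources.items()
        for raw, message in entries
    ]
    return sorted(events, key=lambda e: e.at)


# Hypothetical entries: the app logs in +02:00, the load balancer in UTC.
timeline = merge_timeline({
    "app": [("2024-05-01T12:00:30+02:00", "connection pool exhausted")],
    "load-balancer": [("2024-05-01T10:00:10+00:00", "5xx rate rising")],
})
for event in timeline:
    print(event.at.isoformat(), event.source, event.message)
```

Rejecting timestamps without an offset is a deliberate choice in this sketch: guessing a time zone can reorder events and suggest a causal sequence that never happened.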

Review Agents

Thoroughness

Mandate: The agent MUST verify the investigation identified the actual root cause, not just the proximate trigger.

Check:

The agent MUST verify that the timeline is complete with no unexplained gaps between events
The agent MUST verify that evidence (logs, metrics, traces) supports the causal chain
The agent MUST verify that alternative hypotheses were considered and ruled out with evidence
The agent MUST verify that contributing factors (deploys, config changes, traffic patterns) are identified
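
The timeline check can be partly mechanized by flagging intervals between consecutive events that exceed a threshold. This is a sketch, not a prescribed method: find_gaps is a hypothetical helper, and the threshold is an assumption the reviewer would choose per incident. A flagged interval is only a candidate; the reviewer still decides whether it is explained.

```python
from datetime import datetime, timedelta


def find_gaps(
    timestamps: list[datetime], max_gap: timedelta
) -> list[tuple[datetime, datetime]]:
    """Return (earlier, later) pairs of consecutive events more than max_gap apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]


# Hypothetical timeline: three events, with a long silence before the last one.
start = datetime(2024, 5, 1, 10, 0)
events = [start, start + timedelta(minutes=2), start + timedelta(minutes=20)]

for earlier, later in find_gaps(events, max_gap=timedelta(minutes=5)):
    print(f"gap needs explanation: {earlier.isoformat()} to {later.isoformat()}")
```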

Investigate

Criteria Guidance

Good criteria examples:

"Timeline reconstructs the incident from first anomaly to detection with timestamps from at least 2 independent sources"
"Root cause hypothesis is supported by log evidence with specific entries cited"
"Contributing factors are distinguished from the root cause with evidence for each"

Bad criteria examples:

"Root cause is found"
"Logs are analyzed"
"Investigation is thorough"

Completion Signal (RFC 2119)

The root cause document MUST exist and include a reconstructed timeline from first anomaly through detection and escalation. The root cause hypothesis MUST be stated with supporting evidence from logs, metrics, or code. Contributing factors MUST be identified separately. The investigator MUST have ruled out at least 2 alternative hypotheses with evidence. The root cause MUST be specific enough to inform a targeted mitigation.
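
The mechanically checkable parts of the completion signal can be expressed as a simple checklist function. The sketch below assumes a hypothetical RootCauseDocument structure; the field names are illustrative and do not reflect a real schema. It also assumes at least one contributing factor is recorded, which the signal itself does not state. Whether the root cause is specific enough to inform a targeted mitigation remains a human judgment and is not checked here.

```python
from dataclasses import dataclass, field


@dataclass
class RootCauseDocument:
    """Hypothetical shape of the root cause document; field names are assumptions."""
    timeline: list[str] = field(default_factory=list)
    root_cause: str = ""
    supporting_evidence: list[str] = field(default_factory=list)
    contributing_factors: list[str] = field(default_factory=list)
    ruled_out: dict[str, str] = field(default_factory=dict)  # hypothesis -> evidence


def unmet_requirements(doc: RootCauseDocument) -> list[str]:
    """List the completion requirements the document does not yet satisfy."""
    problems = []
    if not doc.timeline:
        problems.append("timeline is missing")
    if not doc.root_cause or not doc.supporting_evidence:
        problems.append("root cause hypothesis lacks a statement or supporting evidence")
    if not doc.contributing_factors:
        problems.append("contributing factors are not listed separately")
    # Only hypotheses with recorded evidence count as ruled out.
    evidenced = [h for h, ev in doc.ruled_out.items() if ev.strip()]
    if len(evidenced) < 2:
        problems.append("fewer than 2 alternative hypotheses ruled out with evidence")
    return problems


# An empty document fails every check.
for problem in unmet_requirements(RootCauseDocument()):
    print("unmet:", problem)
```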