Deployment

External review

Deploy pipelines to production with monitoring and alerting

Hats

Review Agents: 1 (+2 included from other stages)

Review

External

Unit Types

Deployment

Inputs

Validation

Dependencies

Validation: validation-report

Hat Sequence

Pipeline Engineer

Focus: Package and deploy the pipeline to the production orchestrator. Configure scheduling, dependency chains, retry policies, and resource allocation. Ensure the pipeline runs reliably on the target infrastructure with proper logging and observability.

Produces: Deployed pipeline with orchestrator configuration (DAG definition, schedule, retries), infrastructure provisioning, and operational logging.

Reads: Validation report, transformation code, extraction jobs, infrastructure requirements from the intent.

Anti-patterns (RFC 2119):

The agent MUST NOT deploy without configuring retries and timeout policies
The agent MUST NOT use hardcoded schedules without considering upstream dependency completion
The agent MUST set resource limits (memory, CPU, parallelism) for pipeline stages
The agent MUST NOT deploy to production without a rollback plan for the first run
The agent MUST NOT skip integration testing of the full DAG in a staging environment
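The rules above can be checked mechanically before a deploy is attempted. The following is a minimal sketch, not the schema of any real orchestrator: `StageConfig`, `DeploymentPlan`, `find_violations`, and every field name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageConfig:
    """Hypothetical per-stage settings (not a real orchestrator schema)."""
    name: str
    retries: Optional[int] = None
    timeout_seconds: Optional[int] = None
    memory_mb: Optional[int] = None
    cpu_cores: Optional[float] = None
    max_parallelism: Optional[int] = None


@dataclass
class DeploymentPlan:
    """Hypothetical summary of what is about to be deployed."""
    stages: list[StageConfig]
    waits_for_upstream: bool = False      # schedule is gated on upstream completion
    rollback_plan: Optional[str] = None   # how to undo the first production run
    staging_dag_run_passed: bool = False  # full DAG was integration-tested in staging


def find_violations(plan: DeploymentPlan) -> list[str]:
    """Return one message per broken rule; an empty list means the plan passes."""
    violations = []
    for stage in plan.stages:
        if stage.retries is None or stage.timeout_seconds is None:
            violations.append(f"{stage.name}: retries and timeout must be configured")
        if None in (stage.memory_mb, stage.cpu_cores, stage.max_parallelism):
            violations.append(f"{stage.name}: resource limits must be set")
    if not plan.waits_for_upstream:
        violations.append("schedule does not consider upstream dependency completion")
    if not plan.rollback_plan:
        violations.append("no rollback plan for the first run")
    if not plan.staging_dag_run_passed:
        violations.append("full DAG not integration-tested in staging")
    return violations
```

A plan would be deployed only when `find_violations(plan)` returns an empty list. Otherwise the messages name what the Pipeline Engineer still has to configure.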

SRE

Focus: Verify operational readiness — monitoring, alerting, runbooks, and incident response paths. Ensure the pipeline meets SLA commitments and that the team can diagnose and recover from failures without the original builder.

Produces: Operational readiness assessment covering monitoring coverage, alert routing, runbook completeness, and SLA compliance verification.

Reads: Pipeline Engineer's deployment, SLA requirements from discovery, validation report.

Anti-patterns (RFC 2119):

The agent MUST NOT approve deployment without verifying alert routing reaches the right on-call channel
The agent MUST NOT accept monitoring that covers only success cases and omits failure and degradation modes
The agent MUST verify that runbooks are actionable by someone unfamiliar with the pipeline internals
The agent MUST NOT ignore data freshness monitoring in favor of only pipeline execution monitoring
The agent MUST NOT treat operational readiness as a checkbox rather than a genuine safety review
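The difference between execution monitoring and data freshness monitoring can be shown with a small check. This is a sketch under stated assumptions: the function name `freshness_status`, its parameters, and the three status strings are hypothetical, and the staleness threshold would come from the SLA, not from this code.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(last_run_succeeded: bool,
                     newest_record_at: datetime,
                     now: datetime,
                     max_staleness: timedelta) -> str:
    """Classify pipeline health using both execution status and data age."""
    if not last_run_succeeded:
        return "failed"
    if now - newest_record_at > max_staleness:
        # The run reported success, but the newest data is older than allowed.
        return "stale"
    return "healthy"


# Example: the last run succeeded, yet the newest record is five hours old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = freshness_status(True, now - timedelta(hours=5), now, timedelta(hours=2))
```

A monitor that looked only at `last_run_succeeded` would report this pipeline as fine. The freshness check is what surfaces the "stale" case.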

Review Agents

Reliability

Mandate: The agent MUST verify the deployed pipeline is resilient and observable in production.

Check:

The agent MUST verify that failure recovery is defined: retry policies, dead-letter queues, alerting
The agent MUST verify that monitoring covers pipeline health, data freshness, and quality metrics
The agent MUST verify that backfill procedures exist for when historical data needs reprocessing
The agent MUST verify that resource sizing accounts for peak volumes, not just average load
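As an illustration of the first check, a retry policy combined with a dead-letter queue might look like the sketch below. It assumes in-memory records and a plain Python list as the dead-letter store. `process_with_retries` and its parameters are hypothetical names, and a real deployment would normally rely on the orchestrator's or message broker's own retry and dead-letter features.

```python
import time
from typing import Any, Callable, Iterable


def process_with_retries(records: Iterable[Any],
                         handler: Callable[[Any], None],
                         max_attempts: int = 3,
                         base_delay: float = 0.5,
                         sleep: Callable[[float], None] = time.sleep) -> list:
    """Run handler on each record, retrying with exponential backoff.

    Records that still fail after max_attempts are returned as
    (record, error) pairs instead of being silently dropped.
    """
    dead_letters = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                handler(record)
                break  # success: move on to the next record
            except Exception as exc:
                if attempt == max_attempts:
                    # Retries exhausted: park the record for later inspection.
                    dead_letters.append((record, repr(exc)))
                else:
                    # Back off before retrying: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** (attempt - 1))
    return dead_letters
```

The returned dead-letter list is the part a reviewer would want wired to alerting, so that exhausted retries raise an alert instead of passing unnoticed.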

Included from other stages

Data Quality (from Transformation)

Coverage (from Validation)

Deployment

Criteria Guidance

Good criteria examples:

"Pipeline DAG is registered in the orchestrator with correct dependencies, retry policies, and SLA-based alerting"
"Monitoring covers pipeline runtime, row counts per stage, data freshness, and error rates with alerts routed to the on-call channel"
"Runbook documents manual recovery steps for the 3 most likely failure modes (source unavailable, schema drift, transformation timeout)"

Bad criteria examples:

"Pipeline is deployed"
"Monitoring is set up"
"Documentation exists"

Completion Signal (RFC 2119)

The pipeline is deployed to the production orchestrator with correct scheduling, dependencies, and retry policies. Monitoring dashboards show pipeline health, data freshness, and row count trends. Alerting is configured for SLA breaches and pipeline failures. A runbook MUST exist with recovery procedures for common failure scenarios. The SRE MUST have verified that the deployment meets operational readiness criteria.