Deployment
External review
Deploy pipelines to production with monitoring and alerting
Dependencies
Hat Sequence
Pipeline Engineer
Focus: Package and deploy the pipeline to the production orchestrator. Configure scheduling, dependency chains, retry policies, and resource allocation. Ensure the pipeline runs reliably on the target infrastructure with proper logging and observability.
Produces: Deployed pipeline with orchestrator configuration (DAG definition, schedule, retries), infrastructure provisioning, and operational logging.
Reads: Validation report, transformation code, extraction jobs, infrastructure requirements from the intent.
Anti-patterns (RFC 2119):
- The agent MUST NOT deploy without configuring retries and timeout policies
- The agent MUST NOT use hardcoded schedules without considering upstream dependency completion
- The agent MUST set resource limits (memory, CPU, parallelism) for pipeline stages
- The agent MUST NOT deploy to production without a rollback plan for the first run
- The agent MUST NOT skip integration testing of the full DAG in a staging environment
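The configuration rules above can be checked mechanically before a deploy. What follows is a minimal sketch under stated assumptions, not the configuration format of any particular orchestrator: `StageConfig`, its fields, and `validate_stage` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StageConfig:
    """Hypothetical per-stage deployment settings (not a real orchestrator API)."""
    name: str
    retries: Optional[int] = None
    timeout_seconds: Optional[int] = None
    memory_mb: Optional[int] = None
    cpu_cores: Optional[float] = None
    max_parallelism: Optional[int] = None
    # Stages that must complete before this one starts.
    # None means "never declared"; an empty list means "explicitly none".
    upstream: Optional[List[str]] = None


def validate_stage(stage: StageConfig) -> List[str]:
    """Return a list of problems; an empty list means the stage passes."""
    problems = []
    if stage.retries is None or stage.retries < 0:
        problems.append(f"{stage.name}: retry policy is not configured")
    if stage.timeout_seconds is None or stage.timeout_seconds <= 0:
        problems.append(f"{stage.name}: timeout is not configured")
    for field_name in ("memory_mb", "cpu_cores", "max_parallelism"):
        value = getattr(stage, field_name)
        if value is None or value <= 0:
            problems.append(f"{stage.name}: resource limit {field_name} is not set")
    if stage.upstream is None:
        problems.append(f"{stage.name}: upstream dependencies are not declared")
    return problems
```

A deploy script could call `validate_stage` for every stage and refuse to proceed if any list is non-empty. The rollback plan and staging integration test are process requirements and are not covered by this sketch.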
SRE
Focus: Verify operational readiness — monitoring, alerting, runbooks, and incident response paths. Ensure the pipeline meets SLA commitments and that the team can diagnose and recover from failures without the original builder.
Produces: Operational readiness assessment covering monitoring coverage, alert routing, runbook completeness, and SLA compliance verification.
Reads: Pipeline engineer's deployment, SLA requirements from discovery, validation report.
Anti-patterns (RFC 2119):
- The agent MUST NOT approve deployment without verifying alert routing reaches the right on-call channel
- The agent MUST NOT accept monitoring that covers only success cases and omits failure and degradation modes
- The agent MUST verify that runbooks are actionable by someone unfamiliar with the pipeline internals
- The agent MUST NOT rely on pipeline execution monitoring alone while ignoring data freshness monitoring
- The agent MUST NOT treat operational readiness as a checkbox rather than a genuine safety review
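The distinction between execution monitoring and data freshness monitoring can be made concrete with a small check. This is an illustrative sketch only: `freshness_status`, `should_alert`, and the idea of reading a "last loaded" timestamp from the target table are assumptions, not part of any specific monitoring tool.

```python
from datetime import datetime, timedelta


def freshness_status(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> str:
    """Classify data as 'fresh' or 'stale' from the newest loaded record's timestamp."""
    age = now - last_loaded_at
    return "stale" if age > max_age else "fresh"


def should_alert(run_succeeded: bool, last_loaded_at: datetime,
                 now: datetime, max_age: timedelta) -> bool:
    """Alert on a failed run OR on stale data, even when the run reported success."""
    is_stale = freshness_status(last_loaded_at, now, max_age) == "stale"
    return (not run_succeeded) or is_stale
```

The point of the second function is that a run can report success while loading nothing new, for example when the source delivered an empty batch. A check based only on run status would stay silent in that case.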
Review Agents
Reliability
Mandate: The agent MUST verify the deployed pipeline is resilient and observable in production.
Check:
- The agent MUST verify that failure recovery is defined: retry policies, dead-letter queues, alerting
- The agent MUST verify that monitoring covers pipeline health, data freshness, and quality metrics
- The agent MUST verify that backfill procedures exist for when historical data needs reprocessing
- The agent MUST verify that resource sizing accounts for peak volumes, not just average load
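As one way to picture the first check, the sketch below combines a retry policy with a dead-letter list. It is a simplified, in-memory illustration under assumed names (`process_with_retries`, `handler`, `dead_letters`); a production pipeline would normally use the retry and dead-letter features of its own orchestrator or message broker.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def process_with_retries(
    records: Iterable[Any],
    handler: Callable[[Any], Any],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Any], List[Tuple[Any, str]]]:
    """Run handler on each record, retrying with exponential backoff.

    Records that still fail after max_attempts are collected in a
    dead-letter list together with the last error, instead of being dropped.
    """
    succeeded: List[Any] = []
    dead_letters: List[Tuple[Any, str]] = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                succeeded.append(handler(record))
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # Retries exhausted: park the record for later inspection.
                    dead_letters.append((record, repr(exc)))
                else:
                    # Delay doubles after each failed attempt.
                    sleep(base_delay_seconds * 2 ** (attempt - 1))
    return succeeded, dead_letters
```

A non-empty dead-letter list is a natural trigger for the alerting named in the same check, and the parked records are the input for a later backfill or manual replay.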
Included from other stages
Deployment
Criteria Guidance
Good criteria examples:
- "Pipeline DAG is registered in the orchestrator with correct dependencies, retry policies, and SLA-based alerting"
- "Monitoring covers pipeline runtime, row counts per stage, data freshness, and error rates with alerts routed to the on-call channel"
- "Runbook documents manual recovery steps for the 3 most likely failure modes (source unavailable, schema drift, transformation timeout)"
Bad criteria examples:
- "Pipeline is deployed"
- "Monitoring is set up"
- "Documentation exists"
Completion Signal (RFC 2119)
Pipeline is deployed to the production orchestrator with correct scheduling, dependencies, and retry policies. Monitoring dashboards show pipeline health, data freshness, and row count trends. Alerting is configured for SLA breaches and pipeline failures. Runbook MUST exist with recovery procedures for common failure scenarios. SRE MUST have verified that the deployment meets operational readiness criteria.