Transformation

Ask review

Transform and model data for the target schema

Hats: 2
Review Agents: 1
Review: Ask
Unit Types: Transformation
Inputs: Extraction

Dependencies

Extraction: staged-data

Hat Sequence

1. Data Modeler

Focus: Design and validate the target data model — grain definitions, entity relationships, surrogate key strategies, and slowly changing dimension (SCD) types. Ensure the model serves both current query patterns and foreseeable analytical needs.

Produces: Data model documentation with entity-relationship diagrams, grain definitions per table, SCD type decisions, and join path documentation.

Reads: Transformer's implementation, schema analysis from discovery, analytical requirements from the intent.

Anti-patterns (RFC 2119):

  • The agent MUST NOT define tables without explicitly stating the grain (one row per what?)
  • The agent MUST NOT use natural keys as primary keys without considering change scenarios
  • The agent MUST NOT over-normalize for OLTP patterns when the target is analytical (OLAP)
  • The agent MUST document SCD strategy per dimension (Type 1 overwrite vs Type 2 history)
  • The agent MUST NOT design the model without understanding the primary query access patterns
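
To make the grain, surrogate key, and SCD points above concrete, here is a minimal sketch of a Type 2 dimension using Python's built-in sqlite3 module. The table `dim_customer`, its columns, and the helper `apply_scd2` are hypothetical names chosen for illustration; a real model would follow whatever the documented grain and SCD decisions specify.

```python
import sqlite3

def apply_scd2(conn, customer_id, city, as_of):
    """Type 2 change: close the current version if the tracked attribute changed, then add a new one."""
    current = conn.execute(
        "SELECT customer_sk, city FROM dim_customer "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (customer_id,),
    ).fetchone()
    if current is not None and current[1] == city:
        return  # nothing changed, so no new version is written
    if current is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ? WHERE customer_sk = ?",
            (as_of, current[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to) "
        "VALUES (?, ?, ?, NULL)",
        (customer_id, city, as_of),
    )

conn = sqlite3.connect(":memory:")
# Grain (assumed for this sketch): one row per customer per version.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY,  -- surrogate key
        customer_id TEXT NOT NULL,        -- natural key from the source
        city        TEXT NOT NULL,        -- tracked attribute (Type 2)
        valid_from  TEXT NOT NULL,
        valid_to    TEXT                  -- NULL marks the current version
    )
""")

apply_scd2(conn, "C1", "Oslo", "2024-01-01")
apply_scd2(conn, "C1", "Oslo", "2024-02-01")    # unchanged: no new row
apply_scd2(conn, "C1", "Bergen", "2024-03-01")  # changed: old row closed, new row added

# Grain check: a customer may have at most one current version.
violations = conn.execute(
    "SELECT customer_id FROM dim_customer WHERE valid_to IS NULL "
    "GROUP BY customer_id HAVING COUNT(*) > 1"
).fetchall()

rows = conn.execute(
    "SELECT customer_sk, customer_id, city, valid_from, valid_to "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
print(rows)
print(violations)
```

Because facts join on `customer_sk` rather than `customer_id`, a change in the source system's natural key or attributes does not rewrite history. A Type 1 dimension would instead overwrite `city` in place and drop the validity columns.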

2. Transformer

Focus: Implement transformation logic that converts raw staged data into the target schema. Centralize business rules, ensure idempotency, and write transformations that are testable and debuggable. Substance over cleverness — readable SQL/code beats terse one-liners.

Produces: Transformation code (SQL, dbt models, Spark jobs, etc.) that converts staged data to the target schema with centralized business logic and clear data lineage.

Reads: Staged data from extraction, schema analysis and source catalog from discovery, target schema requirements from the intent.

Anti-patterns (RFC 2119):

  • The agent MUST NOT scatter business logic across multiple transformations instead of centralizing it
  • The agent MUST NOT write non-idempotent transformations that produce duplicates on re-run
  • The agent MUST NOT use opaque column aliases without documenting semantic meaning
  • The agent MUST NOT perform implicit type coercions without explicit CAST statements
  • The agent MUST NOT build deeply nested subqueries instead of named CTEs or intermediate models
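
As an illustration of the anti-patterns above, the following sketch shows one way a transformation could stay idempotent while using explicit CASTs, a named CTE, and a single definition of a business rule. It runs against an in-memory SQLite database; the tables `stg_orders` and `fct_orders`, the status codes, and `run_transform` are assumptions made for the example, not part of the stage definition.

```python
import sqlite3

# The status mapping lives in one place (a hypothetical business rule).
STATUS_CASE = """CASE raw_status
            WHEN 'S' THEN 'shipped'
            WHEN 'C' THEN 'cancelled'
            ELSE 'unknown'
        END"""

TRANSFORM_SQL = f"""
WITH typed AS (
    -- Explicit CASTs: staged values arrive as text.
    SELECT
        CAST(order_id AS INTEGER) AS order_id,
        CAST(amount   AS REAL)    AS amount_usd,
        {STATUS_CASE}             AS order_status
    FROM stg_orders
)
-- Keyed write: a re-run replaces rows instead of appending duplicates.
INSERT OR REPLACE INTO fct_orders (order_id, amount_usd, order_status)
SELECT order_id, amount_usd, order_status FROM typed
"""

def run_transform(conn):
    """Run the transformation in one transaction."""
    with conn:
        conn.execute(TRANSFORM_SQL)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_orders (order_id TEXT, amount TEXT, raw_status TEXT)")
conn.execute(
    "CREATE TABLE fct_orders ("
    "order_id INTEGER PRIMARY KEY, "
    "amount_usd REAL NOT NULL, "
    "order_status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [("1", "19.90", "S"), ("2", "5", "C")],
)

run_transform(conn)
run_transform(conn)  # second run must not change the result

fact_rows = conn.execute(
    "SELECT order_id, amount_usd, order_status FROM fct_orders ORDER BY order_id"
).fetchall()
print(fact_rows)
```

`INSERT OR REPLACE` is only one way to get idempotency; a MERGE statement, a delete-and-insert per partition, or an incremental dbt model with a unique key serve the same purpose on other platforms.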

Review Agents

Data Quality

Mandate: The agent MUST verify transformations produce correct, consistent output that matches the target schema.

Check:

  • The agent MUST verify that type conversions handle edge cases (nulls, empty strings, timezone differences, encoding)
  • The agent MUST verify that business logic transformations match the documented rules exactly
  • The agent MUST verify that deduplication logic is deterministic and handles all key collision scenarios
  • The agent MUST verify that referential integrity is maintained across related entities
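
Two of the checks above, deterministic deduplication and referential integrity, can be sketched in plain Python. The function names, column names, and sample rows are hypothetical. The sketch assumes timestamps are ISO 8601 strings in a single timezone, so that string comparison matches chronological order, and that `ingest_seq` is unique, so that ties on `updated_at` are always broken the same way.

```python
def dedupe_latest(rows, key, order_by):
    """Keep one row per key: the row with the greatest order_by tuple.

    The result does not depend on input order as long as the order_by
    tuple is unique within each key (hence the tie-breaker column).
    """
    best = {}
    for row in rows:
        k = row[key]
        rank = tuple(row[col] for col in order_by)
        if k not in best or rank > best[k][0]:
            best[k] = (rank, row)
    return [best[k][1] for k in sorted(best)]

def find_orphans(child_rows, fk, parent_keys):
    """Return foreign-key values that have no matching parent."""
    return sorted({row[fk] for row in child_rows if row[fk] not in parent_keys})

orders = [
    {"order_id": 1, "customer_id": "C1", "updated_at": "2024-01-01T00:00:00Z", "ingest_seq": 1},
    {"order_id": 1, "customer_id": "C1", "updated_at": "2024-01-02T00:00:00Z", "ingest_seq": 2},
    # Same timestamp as the previous row: ingest_seq decides the winner.
    {"order_id": 1, "customer_id": "C1", "updated_at": "2024-01-02T00:00:00Z", "ingest_seq": 3},
    {"order_id": 2, "customer_id": "C9", "updated_at": "2024-01-01T00:00:00Z", "ingest_seq": 4},
]

deduped = dedupe_latest(orders, "order_id", ["updated_at", "ingest_seq"])
deduped_reversed = dedupe_latest(list(reversed(orders)), "order_id", ["updated_at", "ingest_seq"])

# Customer "C9" is referenced by an order but missing from the parent set.
orphans = find_orphans(deduped, "customer_id", {"C1"})
print([r["ingest_seq"] for r in deduped], orphans)
```

Running the deduplication on a shuffled or reversed input and comparing results is a cheap way to catch a missing tie-breaker.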

Transformation

Criteria Guidance

Good criteria examples:

  • "Transformation SQL is idempotent — re-running produces the same result without duplicates"
  • "Data model follows the agreed dimensional modeling pattern with surrogate keys and SCD type documented per dimension"
  • "All business logic (e.g., revenue recognition rules, status mappings) is centralized in named CTEs or macros, not scattered across queries"

Bad criteria examples:

  • "Transformations are complete"
  • "Data model looks good"
  • "Business logic is implemented"

Completion Signal (RFC 2119)

Transformation layer converts staged raw data into the target schema. All business rules are implemented and centralized. Data model MUST be documented with entity relationships, grain definitions, and SCD strategies. Transformations are idempotent and produce deterministic output. Data modeler MUST have verified grain consistency and join correctness.