Overview
Data validation is not a one-time cleaning step. It is a continuous quality gate that runs at every ingestion, every transformation, and every load. Without it, bad data enters the pipeline silently, propagates through every downstream system, and surfaces as wrong numbers in a report three weeks later — after decisions have already been made.
Most validation implementations are ad hoc: a few range checks, maybe a null count, no systematic coverage of cross-field constraints or distributional expectations. The result is a pipeline that catches obvious errors but misses the subtle quality degradation that accumulates over time.
The Data Validation Pipeline Design Prompt generates a complete validation architecture: layered checks by severity, cross-field constraint specifications, distributional expectations, alerting thresholds, and a data contract format that makes quality requirements explicit and version-controlled.
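To make "explicit and version-controlled" concrete, here is a minimal sketch of what a data contract could look like. It is an illustration only, not the format the prompt produces: the `orders` dataset, the field names, the `ORDERS_CONTRACT` layout, and the `field_violations` helper are all hypothetical. In practice such a contract would more likely live as a YAML or JSON file in the repository; a Python dict stands in for it here.

```python
# Hypothetical contract for an "orders" dataset. Every name and threshold
# below is an illustrative assumption, not part of the prompt's output.
ORDERS_CONTRACT = {
    "dataset": "orders",
    "version": "1.0.0",  # bump when quality requirements change
    "fields": {
        "order_id": {"type": int, "nullable": False},
        "amount": {"type": float, "nullable": False, "min": 0.0},
        "coupon_code": {"type": str, "nullable": True},
    },
}


def field_violations(row, contract):
    """Return human-readable violations of the contract's field rules for one row."""
    violations = []
    for name, rule in contract["fields"].items():
        if name not in row:
            violations.append(f"{name}: missing")
            continue
        value = row[name]
        if value is None:
            # Nulls are acceptable only where the contract says so.
            if not rule["nullable"]:
                violations.append(f"{name}: null not allowed")
            continue
        if not isinstance(value, rule["type"]):
            violations.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            violations.append(f"{name}: below minimum {rule['min']}")
    return violations
```

Because the contract is plain data, changes to it show up in code review like any other diff, which is the point of keeping quality requirements under version control.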
What you get:
- Layered validation architecture (schema / field / cross-field / distributional)
- Severity tier classification (blocking / warning / informational)
- Cross-field constraint specifications
- Distributional expectation baselines
- Data contract format for team-wide quality agreements
- Alerting and escalation logic
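As a rough illustration of how the four layers and three severity tiers listed above might fit together, the sketch below runs one hypothetical check per layer against a batch of rows. The `Severity`, `Check`, and `run_checks` names, the example orders columns, and the 50% threshold are assumptions made for this example; they are not taken from the prompt's output.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(Enum):
    BLOCKING = "blocking"            # stop the load
    WARNING = "warning"              # load, but alert
    INFORMATIONAL = "informational"  # record only


@dataclass
class Check:
    name: str
    layer: str  # "schema", "field", "cross-field", or "distributional"
    severity: Severity
    predicate: Callable[[list], bool]  # True means the batch passes


def _passes(check, rows):
    # A predicate that raises (e.g. on a missing column) counts as a failure.
    try:
        return bool(check.predicate(rows))
    except Exception:
        return False


def run_checks(rows, checks):
    """Return (ok_to_load, failures); only blocking failures stop the load."""
    failures = [c for c in checks if not _passes(c, rows)]
    ok_to_load = all(c.severity is not Severity.BLOCKING for c in failures)
    return ok_to_load, failures


# Hypothetical checks for an orders batch, one per layer.
# Dates are ISO strings, so plain string comparison orders them correctly.
REQUIRED = {"order_id", "amount", "ordered_at", "shipped_at"}
checks = [
    Check("has_required_columns", "schema", Severity.BLOCKING,
          lambda rows: all(REQUIRED <= r.keys() for r in rows)),
    Check("amount_non_negative", "field", Severity.BLOCKING,
          lambda rows: all(r["amount"] >= 0 for r in rows)),
    Check("shipped_after_ordered", "cross-field", Severity.WARNING,
          lambda rows: all(r["shipped_at"] is None or r["shipped_at"] >= r["ordered_at"]
                           for r in rows)),
    Check("null_ship_rate_at_most_half", "distributional", Severity.INFORMATIONAL,
          lambda rows: sum(r["shipped_at"] is None for r in rows) <= 0.5 * len(rows)),
]

rows = [
    {"order_id": 1, "amount": 20.0, "ordered_at": "2024-01-01", "shipped_at": "2024-01-03"},
    {"order_id": 2, "amount": 5.0, "ordered_at": "2024-01-02", "shipped_at": "2024-01-01"},
]
ok_to_load, failures = run_checks(rows, checks)
```

In this batch the second row ships before it was ordered, so the cross-field check fails. Because that check is only a warning, `ok_to_load` stays true and the failure would be routed to alerting rather than blocking the load.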
Built for: data engineers, analytics engineers, and data platform teams building quality gates into production data pipelines.