Data Cleaning
Data preparation and transformation
Schema Migration & Data Transformation Cleaning Prompt
Design a schema migration and structural transformation plan that maps each source field to its target with explicit field-level transformation rules, conflict resolution, and a rollback strategy, rather than a best-effort migration that silently loses data.
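A minimal Python sketch of what explicit field-level rules can look like, assuming records arrive as dicts. `FIELD_RULES`, `EXPLICITLY_DROPPED`, `migrate_record`, `migrate`, and every field name are hypothetical. Each source field is mapped, deliberately dropped, or reported. Records with problems are quarantined with their original values so they can be fixed and replayed. Conflict resolution and a full rollback mechanism are left out of this sketch.

```python
from typing import Any, Callable

# Hypothetical rules: source field -> (target field, transform function).
FIELD_RULES: dict[str, tuple[str, Callable[[Any], Any]]] = {
    "cust_id": ("customer_id", int),
    "full_nm": ("full_name", str.strip),
    "amt_cents": ("amount", lambda v: int(v) / 100),
}
# Source fields deliberately excluded from the target schema.
EXPLICITLY_DROPPED = {"legacy_flag"}


def migrate_record(src: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Map one source record; return (target_record, problems)."""
    target: dict[str, Any] = {}
    problems: list[str] = []
    for field, value in src.items():
        if field in EXPLICITLY_DROPPED:
            continue
        if field not in FIELD_RULES:
            # An unknown field is reported, never silently discarded.
            problems.append(f"unmapped field: {field}")
            continue
        target_field, transform = FIELD_RULES[field]
        try:
            target[target_field] = transform(value)
        except (TypeError, ValueError) as exc:
            problems.append(f"{field}: {exc}")
    return target, problems


def migrate(batch: list[dict[str, Any]]):
    """Split a batch into migrated records and quarantined originals."""
    migrated, quarantined = [], []
    for src in batch:
        target, problems = migrate_record(src)
        if problems:
            # Keep the untouched source record so it can be fixed and replayed.
            quarantined.append((src, problems))
        else:
            migrated.append(target)
    return migrated, quarantined
```

Making "dropped on purpose" a separate, explicit set is what distinguishes an intentional schema change from silent data loss.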
Data Profiling & Quality Audit Prompt
Run a systematic data profiling audit that produces a complete quality assessment — with per-variable statistics, cross-variable relationship flags, quality scoring, and a prioritized remediation backlog rather than a list of observations.
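A small Python sketch of turning per-variable statistics into a prioritized backlog rather than a list of observations, assuming rows are dicts and that `None` and empty strings mean "missing". `profile`, `remediation_backlog`, and the 5% null-rate threshold are illustrative assumptions; cross-variable checks and a composite quality score are not covered.

```python
from statistics import mean


def profile(rows: list[dict], columns: list[str]) -> dict[str, dict]:
    """Per-column statistics: null rate, distinct count, numeric share, range."""
    report = {}
    n = len(rows)
    for col in columns:
        values = [r.get(col) for r in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v is not None and v != ""]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        stats = {
            "null_rate": 1 - len(present) / n if n else 0.0,
            "distinct": len(set(present)),
            "numeric_share": len(numeric) / len(present) if present else 0.0,
        }
        if numeric:
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        report[col] = stats
    return report


def remediation_backlog(report: dict[str, dict], max_null_rate: float = 0.05):
    """Turn statistics into issues, most severe first."""
    issues = []
    for col, s in report.items():
        if s["null_rate"] > max_null_rate:
            issues.append((s["null_rate"], col, "high null rate"))
        if 0 < s["numeric_share"] < 1:
            # Column mixes numbers with strings such as "n/a".
            issues.append((1 - s["numeric_share"], col, "mixed types"))
    issues.sort(key=lambda issue: issue[0], reverse=True)
    return [(col, problem) for _, col, problem in issues]
```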
Time Series Data Cleaning & Gap Handling Prompt
Design a time series cleaning pipeline that handles irregular intervals, gaps, temporal outliers, and resampling — with methods that preserve the temporal structure your model depends on rather than destroying it with naive interpolation.
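One way to avoid naive interpolation is to fill only short gaps and mark long ones explicitly. The Python sketch below assumes readings sit on a regular grid apart from the gaps and are stored as a `{datetime: float}` dict; `find_gaps`, `fill_short_gaps`, the 1.5x tolerance, and `max_fill` are hypothetical choices, not a standard API.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected: timedelta, tolerance: float = 1.5):
    """Pairs of consecutive timestamps further apart than tolerance * expected."""
    ts = sorted(timestamps)
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > expected * tolerance]


def fill_short_gaps(series: dict, expected: timedelta, max_fill: int) -> dict:
    """Linearly interpolate gaps of up to max_fill missing steps.

    Longer gaps get explicit None placeholders instead of invented values.
    Assumes timestamps sit on a regular grid apart from the gaps.
    """
    ts = sorted(series)
    out = dict(series)
    for a, b in zip(ts, ts[1:]):
        steps = round((b - a) / expected)
        if steps <= 1:
            continue
        fill = steps - 1 <= max_fill
        for k in range(1, steps):
            value = series[a] + (series[b] - series[a]) * k / steps
            out[a + expected * k] = value if fill else None
    return dict(sorted(out.items()))


# Usage: hourly readings with a one-hour gap and a four-hour gap.
t0 = datetime(2024, 1, 1)
hour = timedelta(hours=1)
readings = {t0: 10.0, t0 + hour: 12.0, t0 + 3 * hour: 16.0, t0 + 8 * hour: 0.0}
gaps = find_gaps(readings, hour)
filled = fill_short_gaps(readings, hour, max_fill=2)
```

The long gap stays visible as `None`, so a downstream model or resampler can decide how to treat it instead of learning from a fabricated straight line.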
Text Data Cleaning & NLP Preprocessing Prompt
Design a text cleaning and NLP preprocessing pipeline that preserves the linguistic signal your model needs — with task-specific decisions about what to strip, normalize, and retain rather than a generic cleaning checklist.
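A Python sketch of task-specific switches instead of a fixed checklist. The stopword set is a toy stand-in (many generic stopword lists do include negations), and `clean` with its keyword flags is hypothetical. It shows how a generic recipe deletes "not", which a sentiment model needs.

```python
import re
import unicodedata

# Toy stopword list; many generic lists also contain negations.
GENERIC_STOPWORDS = {"the", "a", "is", "this", "not", "no"}
NEGATIONS = {"not", "no", "never"}


def clean(text: str, *, lowercase: bool = True, keep_negations: bool = True,
          strip_urls: bool = True, keep_exclamations: bool = False) -> list[str]:
    """Tokenize text with task-specific switches instead of one fixed recipe."""
    # Fold compatibility characters, e.g. full-width letters, to canonical forms.
    text = unicodedata.normalize("NFKC", text)
    if strip_urls:
        text = re.sub(r"https?://\S+", " ", text)
    if lowercase:
        text = text.lower()
    # Optionally keep "!" as a token: it carries intensity for sentiment tasks.
    pattern = r"[\w']+|!" if keep_exclamations else r"[\w']+"
    tokens = re.findall(pattern, text)
    stop = GENERIC_STOPWORDS - NEGATIONS if keep_negations else GENERIC_STOPWORDS
    return [t for t in tokens if t not in stop]
```

For "This is NOT good!", the generic setting leaves only `good`, flipping the meaning; the sentiment setting keeps `not` and `!`.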
Categorical Encoding & Normalization Strategy Prompt
Design a categorical encoding and numeric normalization strategy that matches the encoding method to the variable's cardinality, ordinality, and downstream model — not a blanket one-hot encoding applied to everything.
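A Python sketch of selection logic keyed on cardinality, ordinality, and model family. The threshold of 15 levels, the `"linear"`/`"tree"` labels, and all function names are illustrative assumptions. Target encoding is named but not implemented, and numeric normalization is not covered.

```python
from collections import Counter


def choose_encoding(n_levels: int, ordered: bool, model: str,
                    max_onehot: int = 15) -> str:
    """Illustrative heuristic mapping variable traits to an encoding."""
    if ordered:
        return "ordinal"      # integer codes that respect the natural order
    if n_levels <= 2:
        return "binary"       # a single 0/1 column
    if n_levels <= max_onehot:
        return "one-hot"
    # High cardinality: one-hot would explode the feature space.
    return "target (out-of-fold)" if model == "linear" else "frequency"


def ordinal_encode(values: list[str], order: list[str]) -> list[int]:
    rank = {level: i for i, level in enumerate(order)}
    # Unseen levels raise KeyError instead of being silently guessed.
    return [rank[v] for v in values]


def frequency_encode(values: list[str]) -> list[float]:
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]
```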
Data Validation Pipeline Design Prompt
Design a data validation pipeline that catches quality failures at ingestion — with layered checks, severity tiers, alerting logic, and a data contract that makes quality expectations explicit and enforceable.
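A minimal Python sketch of a data contract with severity tiers, assuming row-level checks on dict records. `Check`, `CONTRACT`, `validate`, the field names, and the 10% warning threshold are all hypothetical. "block" failures reject the batch outright, and "warn" failures only raise an alert once their rate crosses the threshold. Only one layer (row checks) is shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Check:
    name: str
    severity: str                        # "block" rejects the batch; "warn" alerts
    passes: Callable[[dict], bool]       # True when a row satisfies the check


# A hypothetical data contract for an orders feed.
CONTRACT = [
    Check("id present", "block", lambda r: r.get("id") is not None),
    Check("amount non-negative", "block",
          lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    Check("country is 2 letters", "warn",
          lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2),
]


def validate(batch: list[dict], contract=CONTRACT, max_warn_rate: float = 0.1) -> dict:
    failures = {c.name: sum(not c.passes(r) for r in batch) for c in contract}
    size = max(len(batch), 1)
    blocking = [c.name for c in contract
                if c.severity == "block" and failures[c.name] > 0]
    # Warnings only alert once their failure rate crosses a threshold.
    alerts = [c.name for c in contract
              if c.severity == "warn" and failures[c.name] / size > max_warn_rate]
    return {"accepted": not blocking, "blocking": blocking,
            "alerts": alerts, "failures": failures}
```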
Duplicate Detection & Deduplication Prompt
Design a deduplication pipeline that catches exact duplicates, near-duplicates, and entity-level duplicates — with matching logic, confidence scoring, and a resolution strategy that doesn't silently discard records.
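A Python sketch of confidence-scored matching with a non-destructive resolution step. It uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in similarity score. `normalize`, `find_duplicates`, `cluster`, and the 0.95/0.8 thresholds are hypothetical. Pairs above `auto` are merged, pairs between the thresholds go to a review queue, and no record is deleted. Comparing every pair is quadratic, so larger datasets would need blocking first; entity-level matching across several fields is not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(s: str) -> str:
    # Case-fold and collapse whitespace so trivial variants compare equal.
    return " ".join(s.lower().split())


def find_duplicates(records: list[dict], key: str,
                    auto: float = 0.95, review: float = 0.8):
    """Score every pair; split into auto-merge and human-review queues."""
    auto_merge, needs_review = [], []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalize(a[key]), normalize(b[key])).ratio()
        if score >= auto:
            auto_merge.append((i, j, round(score, 3)))
        elif score >= review:
            needs_review.append((i, j, round(score, 3)))
    return auto_merge, needs_review


def cluster(n: int, pairs) -> list[int]:
    """Union-find over matched pairs; returns a cluster id per record.

    No record is deleted: each keeps its index and points to its cluster's
    survivor (the lowest index), so merges stay auditable.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)
    return [find(x) for x in range(n)]
```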
Data Type & Format Standardization Prompt
Build a data type and format standardization pipeline that resolves inconsistencies at the source — with parsing rules, validation schemas, and transformation logic that makes downstream analysis deterministic.
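A minimal Python sketch of a parsing rule for dates: a fixed list of accepted formats, a single ISO 8601 output, and a hard failure on anything else. The format list and `standardize_date` are assumptions for illustration. `%b` matches month abbreviations for the current locale, which is English under Python's default C locale.

```python
from datetime import datetime

# Formats accepted at ingestion. Slash-separated day/month formats such as
# 03/05/2024 are deliberately excluded because day and month are ambiguous.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%Y/%m/%d", "%d %b %Y")


def standardize_date(raw: str) -> str:
    """Parse a date string in any accepted format and return ISO 8601."""
    raw = raw.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    # Fail loudly rather than guessing, so downstream results stay deterministic.
    raise ValueError(f"unrecognized date format: {raw!r}")
```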
Outlier Detection & Treatment Framework Prompt
Build an outlier detection and treatment framework that separates data errors, genuine anomalies, and extreme-but-valid observations, because removing all outliers is as wrong as keeping all of them.
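A Python sketch of a three-way split, assuming a known valid range for the variable (here, human ages). `classify_outliers`, the IQR rule with `k=1.5`, and the range are illustrative choices. Values outside the valid range are labeled errors. Plausible values beyond the fences go to review, because statistics alone cannot tell a genuine anomaly from an extreme-but-valid reading.

```python
from statistics import quantiles


def classify_outliers(values: list[float], valid_range: tuple[float, float],
                      k: float = 1.5) -> list[str]:
    """Label each value "error", "review", or "ok".

    "error": violates a hard domain rule, so it cannot be a real observation.
    "review": plausible but beyond the IQR fences; a genuine anomaly or an
    extreme-but-valid value, which statistics alone cannot tell apart.
    """
    lo_valid, hi_valid = valid_range
    # Fit the fences on plausible values only, so errors don't distort them.
    plausible = [v for v in values if lo_valid <= v <= hi_valid]
    q1, _, q3 = quantiles(plausible, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    labels = []
    for v in values:
        if not lo_valid <= v <= hi_valid:
            labels.append("error")
        elif v < lo_fence or v > hi_fence:
            labels.append("review")
        else:
            labels.append("ok")
    return labels
```

With ages `[34, 29, 41, 38, 35, 31, 102, -3, 36, 33]` and a valid range of 0 to 120, `-3` is an error and `102` is flagged for review rather than dropped.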
Missing Data Imputation Strategy Prompt
Design a missing data imputation strategy that matches the method to the mechanism of missingness (MCAR, MAR, or MNAR) rather than applying a blanket mean or median fill, with method selection logic, validation criteria, and a downstream impact assessment.
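A Python sketch of one diagnostic and one matching method, with hypothetical names (`missing_rate_by_group`, `impute_by_group`, and the `dept`/`salary` fields). If the missing rate differs across levels of an observed variable, that is evidence against MCAR, although it cannot distinguish MAR from MNAR. Under a MAR assumption, a conditional fill such as the group median is a simple stand-in for stronger methods like multiple imputation. A missingness indicator records which cells were imputed.

```python
from collections import defaultdict
from statistics import median


def missing_rate_by_group(rows: list[dict], target: str, group: str) -> dict:
    """Share of missing `target` values within each level of `group`."""
    counts = defaultdict(lambda: [0, 0])   # group -> [missing, total]
    for r in rows:
        c = counts[r[group]]
        c[0] += r[target] is None
        c[1] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}


def impute_by_group(rows: list[dict], target: str, group: str) -> list[dict]:
    """Fill gaps with the group median and record where imputation happened."""
    observed = defaultdict(list)
    for r in rows:
        if r[target] is not None:
            observed[r[group]].append(r[target])
    medians = {g: median(v) for g, v in observed.items()}
    out = []
    for r in rows:
        new = dict(r)
        # The indicator lets the model (and later audits) see imputed cells.
        new[f"{target}_was_missing"] = r[target] is None
        if r[target] is None:
            # Stays None when a group has no observed values at all.
            new[target] = medians.get(r[group])
        out.append(new)
    return out
```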