Overview
Outlier treatment is one of the most consequential and least rigorous steps in most data pipelines. The default approach (flag anything beyond 3 standard deviations from the mean, then remove it) destroys legitimate extreme observations, throws away data entry errors that should be corrected rather than removed, and applies a method that assumes normality to data that often isn't normal.
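To see why the 3-standard-deviation rule misbehaves, here is a minimal Python sketch (standard library only) on a small made-up sample. The two large values inflate the very mean and standard deviation they are measured against, so the z-score rule flags nothing. For comparison it uses the modified z-score, a swapped-in robust alternative that replaces the mean and standard deviation with the median and the median absolute deviation (MAD); the 0.6745 constant and the 3.5 cutoff are the conventional values attributed to Iglewicz and Hoaglin, not something this framework prescribes. The function names are illustrative.

```python
import statistics

def zscore_flags(values, threshold=3.0):
    """Classic rule: flag |x - mean| / sd > threshold (implicitly assumes roughly normal data)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs(x - mean) / sd > threshold]

def mad_flags(values, threshold=3.5):
    """Robust rule: modified z-score built from the median and the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        # More than half the values are identical; MAD cannot scale deviations.
        # A real pipeline would fall back to another rule here.
        return []
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]

data = [10, 11, 12, 11, 10, 12, 11, 95, 98]
print(zscore_flags(data))  # [] : the two extremes inflate the SD and hide themselves
print(mad_flags(data))     # [95, 98]
```

Flagging is only detection: the two flagged values still have to be classified before anyone decides what to do with them.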
A defensible outlier framework starts with classification: is this observation an error, a genuine extreme value, or a signal of a different data-generating process? Each classification requires a different treatment. Errors get corrected or removed. Genuine extremes get retained or winsorized depending on the analysis. Different processes may require segmentation.
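The classification-to-treatment mapping above can be sketched as a small routing function. This is a hypothetical helper, assuming each observation already carries a label ("ok", "error", "extreme", or "process") from a documented review step and that the winsorizing bounds were chosen by the analyst. Assigning the labels is the hard, domain-specific part and is not shown. In the toy data, -999 stands in for a sentinel-code entry error.

```python
def apply_treatment(values, labels, lower, upper, winsorize_extremes=True):
    """Route each observation by its (hypothetical) classification label.

    Returns (main, separate, dropped): values kept for the main analysis,
    values set aside as a separate segment, and values removed as errors.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    main, separate, dropped = [], [], []
    for x, label in zip(values, labels):
        if label == "error":
            # Ideally corrected from the source record; dropped here because no correction is available.
            dropped.append(x)
        elif label == "extreme":
            # Genuine extreme: retained, optionally capped at the winsorizing bounds.
            main.append(min(max(x, lower), upper) if winsorize_extremes else x)
        elif label == "process":
            # Different data-generating process: analysed as its own segment.
            separate.append(x)
        elif label == "ok":
            main.append(x)
        else:
            raise ValueError(f"unknown label: {label!r}")
    return main, separate, dropped

values = [10, 12, -999, 250, 40]
labels = ["ok", "ok", "error", "extreme", "process"]
main, separate, dropped = apply_treatment(values, labels, lower=0, upper=100)
print(main, separate, dropped)  # [10, 12, 100] [40] [-999]
```

Whether extremes are capped or kept as-is is a parameter rather than a default, mirroring the point that the right choice depends on the analysis.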
The Outlier Detection & Treatment Framework Prompt generates a complete outlier handling system: detection method selection by variable type and distribution, classification protocol, treatment decision tree, and sensitivity analysis to assess whether outlier decisions changed your conclusions.
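A sensitivity analysis can be as simple as recomputing the same result under each candidate treatment and checking whether it moves. The sketch below is a hypothetical illustration: the mean and median stand in for whatever estimate the real analysis produces, and the flagged indices and winsorizing bounds (10 and 12) are assumed inputs for this toy sample.

```python
import statistics

def sensitivity_report(values, flagged_idx, lower, upper):
    """Recompute the same summaries under each outlier treatment.

    flagged_idx: set of indices the detection step flagged.
    lower, upper: winsorizing bounds.
    """
    scenarios = {
        "retain all": list(values),
        "remove flagged": [x for i, x in enumerate(values) if i not in flagged_idx],
        "winsorize": [min(max(x, lower), upper) for x in values],
    }
    return {
        name: {"mean": statistics.mean(v), "median": statistics.median(v)}
        for name, v in scenarios.items()
    }

data = [10, 11, 12, 11, 10, 12, 11, 95, 98]
report = sensitivity_report(data, flagged_idx={7, 8}, lower=10, upper=12)
for name, summary in report.items():
    print(f"{name:15s} mean={summary['mean']:.2f} median={summary['median']}")
```

Here the mean drops from 30 to about 11 once the two extremes are removed or capped, while the median stays at 11 in every scenario, so a conclusion resting on the mean depends on the outlier decision and should be reported alongside it.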
What you get:
- Detection method selection by distribution and variable type
- Classification protocol (error vs. extreme vs. different process)
- Treatment decision tree with statistical justification
- Multivariate outlier detection for correlated variables
- Sensitivity analysis to test outlier treatment impact
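Multivariate detection matters when variables are correlated: a point can sit inside the normal range of each variable yet break the relationship between them. Below is a minimal sketch of one common approach, the squared Mahalanobis distance compared against a chi-square cutoff, restricted to two variables so the covariance inverse can be written out by hand; the data are invented and the function name is illustrative. Two caveats: the chi-square cutoff assumes roughly multivariate-normal data, and the classical mean and covariance are inflated by outliers just as the 3-SD rule is, so robust estimators such as the Minimum Covariance Determinant are often preferred in practice.

```python
import math
import statistics

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean,
    using the sample covariance matrix (2-D only, to stay dependency-free)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Quadratic form d^T S^-1 d using the closed-form inverse of a 2x2 matrix.
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]

# With 2 degrees of freedom the chi-square quantile has a closed form: -2 ln(1 - p).
CUTOFF_975 = -2 * math.log(1 - 0.975)  # about 7.38

points = [(1, 1), (2, 3), (3, 2), (4, 4), (5, 6), (6, 5),
          (7, 7), (8, 9), (9, 8), (10, 10), (3, 9)]
d2 = mahalanobis_sq_2d(points)
flagged = [p for p, d in zip(points, d2) if d > CUTOFF_975]
print(flagged)  # [(3, 9)]
```

Neither coordinate of (3, 9) is unusual on its own; it is flagged only because it contradicts the strong positive x-y trend in the rest of the sample.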
Built for: data analysts, data scientists, and ML engineers who need defensible outlier decisions documented and reproducible.