1. Deletion Methods

a. Listwise (Complete Case) Deletion

  • How it works: Remove entire rows (cases) with any missing values.
  • When to use:

Data is MCAR (Missing Completely at Random).

Missingness is minimal (a common rule of thumb is under 5% of rows).

  • Pros: Simple, no computational overhead.
  • Cons:

Loss of sample size → reduced statistical power.

Biased results if data is not MCAR.
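As a minimal sketch of listwise deletion, assuming pandas and a small made-up table (the column names and values are illustrative only), checking the share of missing values before dropping rows shows how much of the sample is lost:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "age":    [25, 32, np.nan, 41, 38],
    "income": [50_000, np.nan, 61_000, 72_000, 58_000],
    "score":  [0.7, 0.4, 0.9, np.nan, 0.6],
})

# Share of missing values per column, to judge how much would be lost.
missing_share = df.isna().mean()

# Listwise deletion: keep only rows with no missing value in any column.
complete_cases = df.dropna()

print(missing_share)
print(f"rows kept: {len(complete_cases)} of {len(df)}")
```

Each column here is only 20% missing, yet most rows are dropped, because a single gap anywhere in a row removes the whole row.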

b. Pairwise Deletion

  • How it works: Use all available data for each calculation (e.g., correlation uses all non-missing pairs).
  • When to use:

Missingness is MCAR (estimates are generally biased under MAR or MNAR).

Different variables have different missing patterns.

  • Pros: Retains more data than listwise deletion.
  • Cons:

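Each statistic is based on a different subset of rows, so sample sizes are inconsistent across analyses and a correlation matrix built this way may not be positive definite.

Biased if missingness is not MCAR.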
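A short sketch of pairwise deletion, assuming pandas, whose `DataFrame.corr` excludes missing values pair by pair. The data are made up for illustration; the second matrix counts how many rows each correlation actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a different missingness pattern per column.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, np.nan, 6.2, 8.1, np.nan, 12.3],
    "z": [np.nan, 1.0, 0.5, 2.5, 2.0, 3.5],
})

# Pairwise deletion: each correlation uses every row where both columns are observed.
pairwise_corr = df.corr()

# Number of rows behind each pairwise correlation.
observed = df.notna().astype(int)
pairwise_n = observed.T @ observed

# Listwise deletion for comparison: one common subset of rows for all pairs.
listwise_corr = df.dropna().corr()

print(pairwise_n)
print(pairwise_corr.round(3))
print(listwise_corr.round(3))
```

Reporting `pairwise_n` alongside the correlations makes the differing sample sizes explicit.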

2. Imputation Methods

a. Mean/Median/Mode Imputation

  • How it works: Replace missing values with the mean (numeric), median (skewed data), or mode (categorical).
  • When to use:

Small missingness in MCAR data.

Quick baseline approach.

  • Pros: Simple, fast.
  • Cons:

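Underestimates variance, because every imputed value sits at the center of the observed distribution.

Distorts relationships between variables (correlations are pulled toward zero).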
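A minimal sketch of the three fills, assuming pandas and a made-up table (column names are illustrative). The final print compares the variance before and after mean imputation to show the shrinkage noted above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a roughly symmetric column, a skewed column, a categorical column.
df = pd.DataFrame({
    "height": [170.0, 165.0, np.nan, 180.0, 175.0],
    "income": [30.0, 32.0, 35.0, np.nan, 400.0],
    "city":   ["Oslo", "Oslo", None, "Bergen", "Oslo"],
})

imputed = df.copy()
imputed["height"] = df["height"].fillna(df["height"].mean())      # mean for numeric data
imputed["income"] = df["income"].fillna(df["income"].median())    # median for skewed data
imputed["city"] = df["city"].fillna(df["city"].mode().iloc[0])    # mode for categorical data

# Variance of the observed values vs. variance after mean imputation.
print(df["height"].var(), imputed["height"].var())
```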

b. Regression Imputation

  • How it works: Predict missing values using linear/logistic regression on other variables.
  • When to use:

Missingness is MAR (predictable from other variables).

Strong correlations exist between variables.

  • Pros: Usually more accurate than mean imputation when the predictors are informative.
  • Cons:

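Overstates the strength of relationships: imputed values fall exactly on the regression line, so correlations and R-squared are inflated.

Doesn't account for imputation uncertainty, so standard errors come out too small. Stochastic regression imputation, which adds random residual noise to each prediction, reduces this problem.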
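The following sketch illustrates regression imputation with NumPy and pandas on simulated data (the variables `x` and `y`, the true slope, and the missingness rate are all assumptions made for the example). It also shows a stochastic variant that adds residual noise, which is one common way to soften the cons listed above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated data: y depends linearly on x, and 50 of 200 y values are removed.
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=1.0, size=200)
df = pd.DataFrame({"x": x, "y": y})
df.loc[rng.choice(200, size=50, replace=False), "y"] = np.nan

obs = df["y"].notna()

# Fit y ~ x on the complete cases only.
slope, intercept = np.polyfit(df.loc[obs, "x"], df.loc[obs, "y"], deg=1)
predicted = (intercept + slope * df.loc[~obs, "x"]).to_numpy()

# Deterministic regression imputation: fill with the fitted value.
deterministic = df["y"].copy()
deterministic.loc[~obs] = predicted

# Stochastic variant: add noise with the residual standard deviation.
residuals = df.loc[obs, "y"] - (intercept + slope * df.loc[obs, "x"])
noise = rng.normal(scale=residuals.std(ddof=2), size=int((~obs).sum()))
stochastic = df["y"].copy()
stochastic.loc[~obs] = predicted + noise

# Correlation with x: complete cases vs. each imputed version.
print(df.loc[obs, ["x", "y"]].corr().iloc[0, 1])
print(np.corrcoef(df["x"], deterministic)[0, 1])
print(np.corrcoef(df["x"], stochastic)[0, 1])
```

The deterministic fill places every imputed point on the fitted line, so its correlation with `x` is higher than in the complete cases.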

c. K-Nearest Neighbors (KNN) Imputation

  • How it works: Replace each missing value with the average of that variable over the *k* most similar rows, where similarity is measured on the variables that are observed.
  • When to use:

Rows have a meaningful similarity structure (e.g., natural clusters) and features are on comparable scales.

Missingness is MAR.

  • Pros: Captures local patterns.
  • Cons:

Computationally expensive for large datasets.

Sensitive to the choice of k, the distance metric, and feature scaling.
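A small sketch using scikit-learn's `KNNImputer`, assuming scikit-learn is installed. The array is made up: two tight clusters of three rows each, with one gap per cluster. Features are already on similar scales here; in practice they would usually be standardized first.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical data: rows 0-2 form one cluster, rows 3-5 another.
X = np.array([
    [1.0, 2.0, 3.0],
    [1.1, np.nan, 3.2],
    [0.9, 2.1, 2.9],
    [8.0, 9.0, 10.0],
    [8.2, 9.1, np.nan],
    [7.9, 8.8, 9.8],
])

# Each gap is filled with the mean of that feature over the 2 nearest rows,
# with distances computed on the features both rows have observed.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

With `n_neighbors=2` each gap is filled from its own cluster. A larger k would start pulling in rows from the other cluster, which is the sensitivity to k mentioned above.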

d. Multiple Imputation (MICE — Multiple Imputation by Chained Equations)

  • How it works:
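  1. Create m completed datasets: each variable with missing values is imputed in turn from a model conditioned on the other variables, with random draws so that the datasets differ.
  2. Analyze each dataset separately with the same analysis model.
  3. Pool the estimates and their standard errors (Rubin's rules).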
  • When to use:

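Missingness is MAR (standard MICE is not valid under MNAR unless the missingness mechanism is modeled explicitly).

Valid standard errors and confidence intervals are needed, not just filled-in values.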

  • Pros:

Accounts for imputation uncertainty.

Widely regarded as a standard approach for MAR data.

  • Cons:

Complex to implement.

Computationally intensive.
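The three steps can be sketched with scikit-learn's experimental `IterativeImputer`, which is one chained-equations style implementation (an assumption here, not the only option). Running it several times with `sample_posterior=True` and different seeds gives the multiple datasets. The data are simulated, the analysis is simply the mean of `y`, and the pooling follows Rubin's rules.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(42)
n = 300

# Simulated MAR data: y is more likely to be missing when x is large.
x = rng.normal(size=n)
y = 1.5 * x + rng.normal(size=n)
X = np.column_stack([x, y])
X[rng.random(n) < 1 / (1 + np.exp(-x)), 1] = np.nan

m = 5  # number of imputed datasets
estimates, variances = [], []
for seed in range(m):
    # Step 1: one completed dataset per seed, with draws from the predictive distribution.
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    completed = imputer.fit_transform(X)
    # Step 2: the same analysis on each dataset (here: mean of y and its sampling variance).
    y_i = completed[:, 1]
    estimates.append(y_i.mean())
    variances.append(y_i.var(ddof=1) / n)

# Step 3: Rubin's rules.
q_bar = np.mean(estimates)            # pooled estimate
within = np.mean(variances)           # average within-imputation variance
between = np.var(estimates, ddof=1)   # between-imputation variance
total_var = within + (1 + 1 / m) * between

print(q_bar, np.sqrt(total_var))
```

The between-imputation term is what a single imputation leaves out; it is the part of the standard error that reflects uncertainty about the missing values.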

e. Time-Series Imputation (Forward/Backward Fill, Interpolation)

  • How it works:

Forward fill: Use the last observed value.

Backward fill: Use the next observed value.

Linear interpolation: Estimate missing values on a straight line between the two neighboring observed points.

  • When to use:

Time-series or longitudinal data.

Gaps are short relative to how quickly the series changes, and missingness does not depend on the unobserved values.

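  • Pros: Uses the time ordering of the data; interpolation preserves local trends.
  • Cons:

Can introduce artificial patterns (e.g., flat segments from forward fill, understated variability).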

Not suitable for non-sequential data.
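A brief sketch of the three fills with pandas on a made-up daily series (dates and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two gaps.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([10.0, np.nan, np.nan, 16.0, np.nan, 20.0], index=idx)

forward = s.ffill()                       # carry the last observed value forward
backward = s.bfill()                      # carry the next observed value backward
linear = s.interpolate(method="linear")   # straight line between neighboring observations

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward, "linear": linear}))
```

Forward and backward fill produce flat steps, while interpolation follows the local slope. Which one is reasonable depends on how the underlying quantity behaves between observations.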

3. Model-Based Methods

a. Maximum Likelihood (Expectation-Maximization — EM Algorithm)

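  • How it works: Alternates between computing the expected contribution of the missing values given the current parameter estimates (E-step) and re-estimating the parameters from those expectations (M-step) until convergence. The goal is maximum likelihood estimates of the model parameters rather than filled-in values.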
  • When to use:

Missingness is MAR.

Data follows a known distribution (e.g., normal).

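  • Pros: Statistically efficient under a correct model, and yields a single set of estimates without generating multiple datasets.
  • Cons: Requires distributional assumptions; standard errors are not a direct by-product of EM and need extra computation.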
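As a sketch under simplifying assumptions, the EM iteration can be written out for a bivariate normal model in which `x` is fully observed and `y` is partly missing (MAR, depending only on `x`). The data are simulated and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data: true mean of y is 1.0, true slope on x is 2.0.
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
miss = rng.random(n) < 1 / (1 + np.exp(-2 * x))   # MAR: depends only on x
y_obs = np.where(miss, np.nan, y)

# x is fully observed, so its mean and variance are estimated directly.
mu_x, s_xx = x.mean(), x.var()

# Starting values for the y-related parameters, from the complete cases.
mu_y, s_yy, s_xy = np.nanmean(y_obs), np.nanvar(y_obs), 0.0

for _ in range(200):
    # E-step: expected y and y^2 for missing entries, given x and current parameters.
    beta = s_xy / s_xx
    cond_mean = mu_y + beta * (x - mu_x)
    cond_var = s_yy - beta * s_xy
    ey = np.where(miss, cond_mean, y_obs)
    ey2 = np.where(miss, cond_mean**2 + cond_var, y_obs**2)
    # M-step: update parameters from the expected sufficient statistics.
    mu_y = ey.mean()
    s_xy = np.mean(x * ey) - mu_x * mu_y
    s_yy = ey2.mean() - mu_y**2

print("complete-case mean of y:", np.nanmean(y_obs))
print("EM estimate of mean of y:", mu_y)
print("EM estimate of slope:", s_xy / s_xx)
```

Because large `x` values are more likely to lose their `y`, the complete-case mean is pulled downward, while the EM estimate uses the relationship with `x` to correct for it.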

b. Bayesian Imputation

  • How it works: Treats missing values as unknown quantities and samples them, together with the model parameters, from their joint posterior distribution.
  • When to use:

Small datasets, or MNAR missingness when the missingness mechanism can be modeled explicitly.

High uncertainty in missing values.

  • Pros: Handles uncertainty rigorously.
  • Cons: Computationally demanding (typically MCMC) and sensitive to prior and model choices.

4. Advanced Techniques for MNAR Data

a. Selection Models

  • How it works: Models the missingness mechanism explicitly (e.g., Heckman correction).
  • When to use: When missingness depends on the missing values (MNAR).
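  • Pros: Can correct for selection bias when the missingness model is correctly specified.
  • Cons: Requires strong assumptions that usually cannot be verified from the observed data.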

b. Pattern-Mixture Models

  • How it works: Groups cases by missingness pattern, models the data within each pattern, and combines the pattern-specific estimates.
  • When to use: When subgroups have different missing data mechanisms.
  • Pros: Flexible for MNAR.
  • Cons: Hard to interpret; some parameters are not identified by the data and must be fixed by assumption.

c. Sensitivity Analysis

  • How it works: Tests how results vary under different MNAR assumptions.

  • When to use: When MNAR is suspected but unverifiable.

  • Pros: Quantifies bias risk.
  • Cons: Doesn't "fix" missingness.
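One simple form of sensitivity analysis is a delta adjustment: impute under a convenient assumption, then shift only the imputed values by a range of offsets and watch how the estimate moves. The sketch below assumes pandas, simulated data, and an illustrative set of `delta` values; the mean fill is used only to keep the example short.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Simulated outcome with 60 of 200 values removed.
y = pd.Series(rng.normal(loc=50.0, scale=10.0, size=200))
y.iloc[rng.choice(200, size=60, replace=False)] = np.nan
missing = y.isna()

# Baseline fill (mean imputation, for illustration only).
baseline = y.fillna(y.mean())

# Shift the imputed values by delta to mimic different MNAR scenarios,
# e.g. "the unobserved values are on average 5 units lower than assumed".
results = {}
for delta in [-10.0, -5.0, 0.0, 5.0, 10.0]:
    shifted = baseline.copy()
    shifted[missing] += delta
    results[delta] = shifted.mean()

for delta, estimate in results.items():
    print(f"delta={delta:+.0f}  estimated mean={estimate:.2f}")
```

If the conclusion holds across the plausible range of `delta`, it is robust to that kind of MNAR departure; if it flips, the result depends on an assumption the data cannot check.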

Note:

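MCAR: Deletion or simple imputation is acceptable, especially when little data is missing.

MAR: Use multiple imputation (MICE) or maximum likelihood; regression or KNN imputation are simpler single-imputation options.

MNAR: Requires explicit modeling of the missingness (selection models, pattern-mixture models, Bayesian models) together with sensitivity analysis.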

Always report missingness patterns and justify your method.