Take your statistical modeling to the next level with Statsmodels — Python's best-kept secret for interpretable machine learning, econometrics, and time series.


Statsmodels (Statistical Modeling)

Purpose: In-depth statistical tests & models

📂 Mock Dataset (dataset.csv)

ID,Age,Income,Education_Level,Spending_Score,Gender,City_Type,Year,Marketing_Spend,Online_Shopping_Score
1,24,42000,16,65,Male,Urban,2020,15000,80
2,35,52000,17,72,Female,Urban,2021,18000,85
3,44,58000,18,80,Male,Suburban,2020,14000,70
4,29,60000,18,85,Female,Urban,2021,22000,90
5,41,49000,16,70,Female,Rural,2020,10000,60
6,38,71000,19,88,Male,Suburban,2022,25000,95
7,26,43000,16,60,Female,Rural,2021,12000,55
8,31,50000,17,74,Male,Urban,2022,19000,78

🧠 Table of Contents

  1. Importing and Setup
  2. Advanced Multiple Linear Regression (with Interaction Terms)
  3. Generalized Linear Models (GLM) — Poisson
  4. Logistic & Probit Models (Binary Classification)
  5. Two-Way ANOVA
  6. ARIMA + Exogenous Variables (ARIMAX)
  7. Fixed Effects Panel Regression
  8. Model Diagnostics: Residuals, Normality, VIF
  9. Statistical Assumptions Testing
  10. Custom Hypothesis Testing with t_test()

1. Importing and Setup


2. Multiple Linear Regression (with Interaction Terms)

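One way to fit such a model with the formula API is sketched below. The choice of `Spending_Score` as the target and `Income` as an extra control is an assumption made for illustration; only the `Education_Level * Gender` term comes from the tips that follow.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Columns copied from the mock dataset above.
df = pd.DataFrame({
    "Spending_Score": [65, 72, 80, 85, 70, 88, 60, 74],
    "Income": [42000, 52000, 58000, 60000, 49000, 71000, 43000, 50000],
    "Education_Level": [16, 17, 18, 18, 16, 19, 16, 17],
    "Gender": ["Male", "Female", "Male", "Female",
               "Female", "Male", "Female", "Male"],
})

# In a formula, `a * b` expands to a + b + a:b (main effects plus interaction).
# Assumed specification: Spending_Score as target, Income as a control.
results = smf.ols(
    "Spending_Score ~ Income + Education_Level * Gender", data=df
).fit()

print(results.summary())
print("Adjusted R-squared:", round(results.rsquared_adj, 3))
```

The interaction coefficient appears in the summary as `Education_Level:Gender[T.Male]`, with `Female` as the reference level because levels are ordered alphabetically.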

Interpretation Tips:

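  • In a formula, Education_Level * Gender expands to both main effects plus their interaction (Education_Level + Gender + Education_Level:Gender).
  • Use each coefficient's p-value to judge whether its effect is distinguishable from zero.
  • Compare models on adjusted R², which penalizes extra predictors; plain R² never decreases when terms are added.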

3. Generalized Linear Models (Poisson for Count Data)


📌 Use Poisson regression when your target is a non-negative count (e.g., number of items bought). Other GLM families cover other kinds of non-normal targets.

4. Logistic & Probit Models

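The sketch below fits both models on the same formula so their output can be compared side by side. The mock dataset has no binary outcome, so `High_Spender` (1 when `Spending_Score` is at least 74) is a hypothetical target invented for this example, as is the rescaled `Marketing_Spend_k` column.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Columns copied from the mock dataset above.
df = pd.DataFrame({
    "Spending_Score": [65, 72, 80, 85, 70, 88, 60, 74],
    "Marketing_Spend": [15000, 18000, 14000, 22000, 10000, 25000, 12000, 19000],
})

# Hypothetical binary target: 1 for the four highest spending scores.
df["High_Spender"] = (df["Spending_Score"] >= 74).astype(int)
df["Marketing_Spend_k"] = df["Marketing_Spend"] / 1000  # spend in thousands

formula = "High_Spender ~ Marketing_Spend_k"

# Same data and formula, two link functions: logistic CDF vs. normal CDF.
logit_results = smf.logit(formula, data=df).fit(disp=False)
probit_results = smf.probit(formula, data=df).fit(disp=False)

print(logit_results.summary())
print(probit_results.summary())

# Predicted probability of High_Spender for each row under each model.
print(pd.DataFrame({
    "logit": logit_results.predict(df),
    "probit": probit_results.predict(df),
}))
```

The two models usually agree on sign and significance; their coefficients differ in scale because the link functions differ, so compare predicted probabilities rather than raw coefficients.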

5. Two-Way ANOVA (Gender + City_Type)


📚 Use this when comparing group means across two categorical factors (here, Gender and City_Type).

6. ARIMAX (Time Series + Exogenous Variable)


7. Panel Regression (Fixed Effects)

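import pandas as pd
from linearmodels.panel import PanelOLS

# PanelOLS needs an (entity, time) MultiIndex and repeated observations per entity.
# The mock dataset has one row per ID, so entity effects would absorb all variation;
# run this on a panel where each ID appears in several years.
df = pd.read_csv('dataset.csv').set_index(['ID', 'Year'])
panel_data = df[['Spending_Score', 'Marketing_Spend', 'Income']]

model = PanelOLS.from_formula(
    'Spending_Score ~ Marketing_Spend + Income + EntityEffects', data=panel_data
)
results = model.fit()
print(results.summary)  # summary is a property in linearmodels, so no parentheses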

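Use this for longitudinal data, where the same individuals or entities are observed over several periods. Note that PanelOLS comes from the separate linearmodels package, not from Statsmodels itself.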

8. Model Diagnostics (QQ, Residuals, VIF)


9. Statistical Assumptions

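Two commonly checked OLS assumptions, constant residual variance and uncorrelated residuals, can be tested as sketched below. Which tests the original walkthrough used is not known; Breusch-Pagan and Durbin-Watson are chosen here as standard options, and the model specification is an assumption.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Columns copied from the mock dataset above.
df = pd.DataFrame({
    "Spending_Score": [65, 72, 80, 85, 70, 88, 60, 74],
    "Age": [24, 35, 44, 29, 41, 38, 26, 31],
    "Income": [42000, 52000, 58000, 60000, 49000, 71000, 43000, 50000],
})

# Assumed specification, used only to produce residuals to test.
results = smf.ols("Spending_Score ~ Age + Income", data=df).fit()

# Homoskedasticity: Breusch-Pagan (H0: residual variance is constant).
# The second argument is the design matrix, which must include the constant.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    results.resid, results.model.exog
)
print("Breusch-Pagan LM p-value:", lm_pvalue)

# Independence: Durbin-Watson ranges from 0 to 4; values near 2 suggest
# no first-order autocorrelation in the residuals.
dw = durbin_watson(results.resid)
print("Durbin-Watson:", dw)
```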

10. Custom Hypothesis Tests (t_test())

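As a sketch of `t_test()`, the example below tests whether two slopes are equal, first with a constraint string and then with the equivalent contrast vector. The model and the hypothesis are assumptions chosen for illustration; both predictors are rescaled to thousands (hypothetical `_k` columns) so that comparing their coefficients makes sense.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Columns copied from the mock dataset above.
df = pd.DataFrame({
    "Spending_Score": [65, 72, 80, 85, 70, 88, 60, 74],
    "Income": [42000, 52000, 58000, 60000, 49000, 71000, 43000, 50000],
    "Marketing_Spend": [15000, 18000, 14000, 22000, 10000, 25000, 12000, 19000],
})
df["Income_k"] = df["Income"] / 1000
df["Marketing_Spend_k"] = df["Marketing_Spend"] / 1000

results = smf.ols(
    "Spending_Score ~ Income_k + Marketing_Spend_k", data=df
).fit()

# String form. H0: the Income_k and Marketing_Spend_k slopes are equal.
t_equal = results.t_test("Income_k = Marketing_Spend_k")
print(t_equal)

# Contrast-vector form of the same hypothesis, in parameter order
# (Intercept, Income_k, Marketing_Spend_k): 0*b0 + 1*b1 - 1*b2 = 0.
t_equal_vec = results.t_test([0, 1, -1])
print(t_equal_vec)
```

The string form is easier to read; the vector form is convenient when constraints are built programmatically.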

✅ Final Thoughts

Statsmodels is what bridges the gap between "black-box ML" and "real-world interpretability". It's the go-to library for anyone serious about:

  • Statistical rigor
  • Explaining why things happen
  • Forecasting with confidence
  • Building models that hold up under peer review or executive scrutiny