AutoML Regression Model for Shoe Dataset
Model Summary
This model was trained using AutoGluon Tabular (v1.4.0) on the dataset maryzhang/hw1-24679-tabular-dataset.
The task is regression, predicting the actual measured shoe length (mm) from shoe attributes.
- Best Model: CatBoost_r177_BAG_L1 (bagged ensemble of CatBoost models)
- Test R² Score: 0.8904 (≈ 89% variance explained)
- Validation R² Score: 0.8049
- Pearson correlation: 0.9473
- RMSE: 1.80 mm
- MAE: 1.10 mm
- Median AE: 0.68 mm
With an MAE of 1.10 mm and an RMSE of 1.80 mm, predictions typically fall within about 1–2 mm of the actual measured length.
Leaderboard (Top 5 Models)
| Rank | Model | Test R² | Val R² | Pred Time (s) | Fit Time (s) |
|---|---|---|---|---|---|
| 1 | CatBoost_r177_BAG_L1 | 0.8994 | 0.8049 | 0.0293 | 27.14 |
| 2 | LightGBMLarge_BAG_L2 | 0.8971 | 0.7995 | 0.7011 | 238.93 |
| 3 | CatBoost_BAG_L2 | 0.8939 | 0.8405 | 0.6155 | 276.40 |
| 4 | CatBoost_r9_BAG_L1 | 0.8917 | 0.7889 | 0.0606 | 53.87 |
| 5 | WeightedEnsemble_L3 | 0.8904 | 0.8500 | 0.9871 | 333.68 |
Dataset
- Source: maryzhang/hw1-24679-tabular-dataset
- Size: 338 samples (30 original, 308 augmented)
- Features:
- US size (numeric)
- Shoe size (mm) (numeric)
- Type of shoe (categorical)
- Shoe color (categorical)
- Shoe brand (categorical)
- Target: Actual measured shoe length (mm)
- Splits: 80% training, 20% testing (random_state=42)
Preprocessing
- Converted Hugging Face dataset to Pandas DataFrame
- Random 80/20 train/test split with a fixed random seed (random_state=42)
- AutoGluon handled categorical encoding, normalization, and feature selection automatically
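The conversion and split described above could look roughly like the following sketch. It assumes that the 338 rows come from pooling every split of the Hub dataset and that the split was made with scikit-learn's `train_test_split`; this card states neither detail. `split_sizes` and `load_and_split` are hypothetical helper names used only for illustration.

```python
import math

DATASET_ID = "maryzhang/hw1-24679-tabular-dataset"


def split_sizes(n_samples: int, test_fraction: float = 0.2) -> tuple[int, int]:
    """Return (n_train, n_test), rounding the test set up as scikit-learn does."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


def load_and_split(test_fraction: float = 0.2, seed: int = 42):
    """Download the dataset, convert it to pandas, and split it (needs network)."""
    # Imported lazily so the size helper above works without these packages.
    from datasets import concatenate_datasets, load_dataset
    from sklearn.model_selection import train_test_split

    dataset_dict = load_dataset(DATASET_ID)
    # Assumption: all available splits are pooled before the 80/20 split.
    full = concatenate_datasets(list(dataset_dict.values()))
    df = full.to_pandas()
    return train_test_split(df, test_size=test_fraction, random_state=seed)


# Row counts implied by an 80/20 split of 338 samples.
print(split_sizes(338))
```

Under these assumptions, the 338 samples divide into 270 training rows and 68 test rows, so the reported test metrics rest on a fairly small hold-out set.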
Training Setup
- Framework: AutoGluon Tabular v1.4.0
- Search Strategy: Bagged/stacked ensembles with model selection (presets="best")
- Time Budget: 1200 seconds (20 minutes)
- Evaluation Metric: R²
- Hyperparameter Search: Automated by AutoGluon (CatBoost, LightGBM, ensemble stacking)
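The setup listed above might translate into an AutoGluon call along the lines of this minimal sketch. The label column name is taken from the Dataset section and is assumed to match the DataFrame column exactly. `train_and_rank` is a hypothetical helper, not the author's original training script.

```python
# Assumed to be the exact target column name in the DataFrame.
LABEL = "Actual measured shoe length (mm)"

PREDICTOR_KWARGS = {
    "label": LABEL,
    "problem_type": "regression",
    "eval_metric": "r2",
}
FIT_KWARGS = {
    "presets": "best",   # preset name as given in this card
    "time_limit": 1200,  # seconds, i.e. 20 minutes
}


def train_and_rank(train_df, test_df):
    """Fit a TabularPredictor and return it with a test-set leaderboard."""
    # Imported lazily so the configuration above can be inspected
    # without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(**PREDICTOR_KWARGS).fit(train_df, **FIT_KWARGS)
    # Passing held-out data adds test scores alongside validation scores.
    leaderboard = predictor.leaderboard(test_df)
    return predictor, leaderboard
```

The leaderboard returned this way should contain per-model test and validation scores along with prediction and fit times, which are the quantities reported in the leaderboard table.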
Metrics
- R²: 0.8904 (test)
- RMSE: 1.80 mm
- MAE: 1.10 mm
- Median AE: 0.68 mm
- Uncertainty: Variability assessed across multiple base models in ensemble. Bagging reduces variance; expected error ±2 mm for most predictions.
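For reference, the metrics listed above can be computed from paired true and predicted lengths as in the sketch below. It uses the standard definitions and only the Python standard library. `regression_report` is a hypothetical helper name, and the four example values are made up for illustration rather than taken from the dataset.

```python
import math
from statistics import mean, median


def regression_report(y_true, y_pred):
    """Compute R², RMSE, MAE, median absolute error, and Pearson r."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mean_true, mean_pred = mean(y_true), mean(y_pred)

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": mean(abs_errors),
        "median_ae": median(abs_errors),
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
    }


# Made-up shoe lengths in mm, for illustration only.
y_true = [250.0, 260.0, 270.0, 280.0]
y_pred = [251.0, 259.0, 272.0, 280.0]
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

Pearson's r measures linear association only, so its square can exceed R² when predictions are biased or mis-scaled. With the reported values, 0.9473² ≈ 0.897, slightly above the test R² of 0.8904.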
Intended Use
- Educational: Demonstrates AutoML regression in CMU course 24-679
- Limitations:
- Small dataset size (338 samples) → not robust for production use
- Augmented data may not reflect real-world variability
- Not suitable for medical or industrial applications
Ethical Considerations
- Predictions should not be used to recommend or prescribe footwear sizes in clinical or consumer contexts.
- Dataset augmentation could introduce biases not present in real measurements.
License
- Dataset: MIT License
- Model: MIT License
Hardware / Compute
- Training: Google Colab (CPU runtime)
- Time: ~20 minutes wall-clock time
- RAM: <8 GB used
AI Usage Disclosure
- Model training and hyperparameter search used AutoML (AutoGluon).
- Model card text and documentation partially generated with AI assistance (ChatGPT).
Acknowledgments
- Dataset by Mary Zhang (CMU 24-679)
- Model training and documentation by Yash Sakhale