---
license: mit
language:
- en
metrics:
- r_squared
- accuracy
- mae
- mse
- f1
- recall
tags:
- machine-learning
- algorithms
- tabular-data
- knn
- python
- weighted-knn
- data-science
- preprocessing
---

SmartKNN is a weighted, interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values and normalizes inputs internally, while keeping a simple scikit-learn-style API. In the author's benchmarks it outperformed classical KNN on most of the tested datasets, most consistently in regression.

# Model Details

## Model Description

SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping and feature filtering internally, and exposes the learned feature importances for transparency and explainability.
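
To make the neighbour-selection step concrete, here is a minimal NumPy sketch of KNN with a weighted Euclidean distance. It is not SmartKNN's implementation: it assumes the inputs are already normalized and the feature weights `w` are given, and it assumes neighbours are aggregated by a plain mean (regression) or a majority vote (classification). `weighted_distances` and `knn_predict` are hypothetical names.

```python
import numpy as np
from collections import Counter


def weighted_distances(X_train, x, w):
    """Weighted Euclidean distance from query x to every training row:
    d(x, z) = sqrt(sum_j w_j * (x_j - z_j) ** 2)
    """
    diff = X_train - x  # broadcast the query against every stored row
    return np.sqrt((w * diff ** 2).sum(axis=1))


def knn_predict(X_train, y_train, x, w, k=5, task="regression"):
    """Predict a single query with weighted-distance KNN (hypothetical helper)."""
    d = weighted_distances(X_train, x, w)
    idx = np.argsort(d)[:k]  # indices of the k closest training rows
    neighbours = y_train[idx]
    if task == "regression":
        return float(neighbours.mean())  # average of the neighbours' targets
    return Counter(neighbours.tolist()).most_common(1)[0][0]  # majority vote


# Toy data: feature 0 is informative, feature 1 is noise
X_train = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 0.2], [1.1, 4.0]])
y_reg = np.array([1.0, 1.2, 3.0, 3.2])
y_cls = np.array(["a", "a", "b", "b"])
x = np.array([0.05, 4.0])
w = np.array([1.0, 0.0])  # hypothetical learned weights: ignore feature 1

print(knn_predict(X_train, y_reg, x, w, k=2))                         # 1.1
print(knn_predict(X_train, y_cls, x, w, k=2, task="classification"))  # a
print(knn_predict(X_train, y_reg, x, np.ones(2), k=2))                # 2.1 (uniform weights)
```

With `w = [1, 0]` the noisy second feature is ignored, so the two rows closest on feature 0 are selected. With uniform weights that noise pulls row 3 into the neighbourhood and the regression prediction shifts from 1.1 to 2.1, which is the failure mode learned weights are meant to avoid.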

- Developed by: Jashwanth Thatipamula
- Model type: Weighted KNN for tabular ML
- License: MIT
- Language(s): Not language-dependent (numerical tabular ML)
- Finetuned from model: Not applicable (original algorithm)

## Model Sources

- Repository: https://github.com/thatipamula-jashwanth/smart-knn
- Paper (DOI): https://doi.org/10.5281/zenodo.17713746
- Demo: Coming soon

# Uses

## Direct Use

- Regression on tabular datasets
- Classification on tabular datasets
- Interpretable ML where feature importance matters
- Real-world ML pipelines with missing values and noisy features

## Downstream Use

- Research on distance-metric learning
- Explainable ML baselines
- AutoML components for tabular data

## Out-of-Scope Use

- NLP, image or audio modelling
- Deep learning / GPU models
- Raw categorical datasets without encoding

# Bias, Risks, and Limitations

- Instance-based prediction can be slower than tree-based models on large datasets
- Poor performance on categorical-only datasets that have not been numerically encoded
- The full training set must be stored for inference

## Recommendations

Users should numerically encode categorical features before fitting SmartKNN.
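
One common way to follow this recommendation is one-hot encoding with pandas' `get_dummies` before calling `fit`. This is a minimal sketch: the column names and values are hypothetical, and one-hot encoding is only one option (ordinal or other encodings may suit some columns better).

```python
import pandas as pd

# Hypothetical dataset mixing a numeric and a categorical column
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["Paris", "Lyon", "Paris"],
    "target": [0, 1, 0],
})

X = df.drop(columns="target")
y = df["target"]

# Replace the categorical "city" column with numeric 0/1 indicator columns
X_encoded = pd.get_dummies(X, columns=["city"], dtype=float)
print(X_encoded.columns.tolist())  # ['age', 'city_Lyon', 'city_Paris']
```

Any data later passed to `predict` needs the same columns; one way is to encode it the same way and then call `reindex(columns=X_encoded.columns, fill_value=0)` so unseen or missing categories line up.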
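
# How to Get Started with the Model

```bash
pip install smart-knn
```

```python
import pandas as pd
from smart_knn import SmartKNN

# Load a tabular dataset whose label column is named "target"
df = pd.read_csv("data.csv")
X = df.drop(columns="target")
y = df["target"]

# Fit SmartKNN using the 5 nearest neighbours
model = SmartKNN(k=5)
model.fit(X, y)

# Predict for the first row; X.iloc[[0]] keeps it a one-row DataFrame (2-D),
# the input shape scikit-learn-style predict methods expect
sample = X.iloc[[0]]
pred = model.predict(sample)
print(pred)
```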

# Training Details

## Training Data

SmartKNN is not pretrained and does not ship with training data; users train it on their own dataset.

## Preprocessing

The following steps are performed automatically:

- Normalization
- NaN / Inf cleaning
- Median imputation
- Outlier clipping
- Feature filtering via learned weights
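
The first four steps can be approximated with a NumPy sketch. Every specific here is an assumption rather than SmartKNN's documented behaviour: the step order, treating ±Inf as missing, the 1st/99th-percentile clipping bounds and z-score normalization. `fit_preprocessor` and `transform` are hypothetical helper names; feature filtering is left out because it depends on the learned weights.

```python
import numpy as np


def fit_preprocessor(X, lower_q=1.0, upper_q=99.0):
    """Learn per-feature statistics from training data (hypothetical helper).

    The percentile bounds and z-score scaling are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)   # treat +/-Inf like missing values
    median = np.nanmedian(X, axis=0)          # per-feature median for imputation
    X = np.where(np.isnan(X), median, X)      # impute before computing clip bounds
    lo = np.percentile(X, lower_q, axis=0)    # per-feature clipping bounds
    hi = np.percentile(X, upper_q, axis=0)
    X = np.clip(X, lo, hi)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # avoid dividing by zero for constant features
    return {"median": median, "lo": lo, "hi": hi, "mean": mean, "std": std}


def transform(X, stats):
    """Apply the same cleaning, imputation, clipping and scaling to any data."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)
    X = np.where(np.isnan(X), stats["median"], X)
    X = np.clip(X, stats["lo"], stats["hi"])
    return (X - stats["mean"]) / stats["std"]


X_train = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.inf, 1000.0],
])
stats = fit_preprocessor(X_train)
Z = transform(X_train, stats)
print(stats["median"])  # [ 2. 30.]
print(np.isfinite(Z).all())
```

The statistics are learned on training data only and reused for queries, so stored rows and new samples share one scale; distance-based methods depend on that consistency.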

## Training Hyperparameters

- `k`: number of neighbours
- `weight_threshold`: features whose learned importance falls below this value are dropped
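
The effect of `weight_threshold` can be sketched as a boolean mask over the learned weights. `filter_features` is a hypothetical helper, the weights and threshold are made-up values, and since this card does not say whether surviving weights are renormalized afterwards, the sketch leaves them unchanged.

```python
import numpy as np


def filter_features(X, weights, weight_threshold):
    """Drop features whose learned weight is below `weight_threshold`.

    Returns the reduced matrix, the kept weights and the boolean mask,
    so the same mask can be applied to query data later.
    """
    weights = np.asarray(weights, dtype=float)
    keep = weights >= weight_threshold          # boolean mask over feature columns
    return np.asarray(X)[:, keep], weights[keep], keep


X = np.arange(12, dtype=float).reshape(3, 4)   # 3 rows, 4 features
weights = np.array([0.50, 0.02, 0.40, 0.08])   # hypothetical learned weights
X_kept, w_kept, mask = filter_features(X, weights, weight_threshold=0.05)
print(mask)            # [ True False  True  True]
print(X_kept.shape)    # (3, 3)
```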

# Evaluation

## Testing Data

Evaluated on 35 regression and 20 classification public tabular datasets.

## Metrics

- Regression: R², MSE
- Classification: Accuracy
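
These metrics follow their standard definitions. A short NumPy sketch makes them concrete; the helper names and toy inputs are illustrative only, and scikit-learn's `mean_squared_error`, `r2_score` and `accuracy_score` compute the same quantities.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: mean of (y - y_hat)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))           # 0.5
print(accuracy(["a", "b", "b", "a"], ["a", "b", "a", "a"]))  # 0.75
```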

## Results

- Regression: SmartKNN outperformed classical KNN on more than 90% of the datasets
- Classification: SmartKNN outperformed classical KNN on 60% of the datasets

## Summary

Across these benchmarks, SmartKNN outperformed classical KNN most consistently on regression tasks and less often on classification tasks, while adding interpretable feature importances, handling noisy and missing inputs internally, and preserving KNN's simplicity.

# Environmental Impact

SmartKNN requires no GPU and has minimal energy usage.

- Hardware Type: CPU
- Hours used: Minimal
- Carbon Emitted: Negligible

# Technical Specifications

## Model Architecture and Objective

- Instance-based learner
- Weighted Euclidean distance metric
- Feature weights learned by combining MSE-based scores, mutual information (MI) and Random Forest importances
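
The idea of merging several importance signals into one weight per feature can be sketched as follows. How SmartKNN actually computes and mixes its MSE, MI and Random Forest scores is not specified in this card, so both the per-feature MSE score (MSE reduction of a one-feature linear fit) and the normalize-then-average combination are assumptions. The MI and Random Forest scores are hard-coded stand-ins to keep the example dependency-free; `mse_scores` and `combine_scores` are hypothetical names.

```python
import numpy as np


def mse_scores(X, y):
    """Assumed per-feature score: fractional MSE reduction of a 1-D least-squares
    fit y ~ a + b * x_j compared with always predicting the mean of y."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    base = np.mean((y - y.mean()) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        A = np.column_stack([np.ones(len(y)), X[:, j]])  # design matrix [1, x_j]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse_j = np.mean((y - A @ coef) ** 2)
        scores[j] = max(0.0, 1.0 - mse_j / base) if base > 0 else 0.0
    return scores


def combine_scores(*score_vectors):
    """Normalize each score vector to sum to 1, then average them (assumed scheme)."""
    normed = []
    for s in score_vectors:
        s = np.clip(np.asarray(s, float), 0.0, None)  # importances should be non-negative
        total = s.sum()
        normed.append(s / total if total > 0 else np.full_like(s, 1.0 / len(s)))
    return np.mean(normed, axis=0)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)  # feature 2 is noise

s_mse = mse_scores(X, y)
# Hypothetical stand-ins; in practice these could come from scikit-learn's
# mutual_info_regression and RandomForestRegressor.feature_importances_
s_mi = np.array([0.90, 0.10, 0.00])
s_rf = np.array([0.85, 0.12, 0.03])

weights = combine_scores(s_mse, s_mi, s_rf)
print(weights.round(3))
```

Normalizing each signal before averaging keeps one score family from dominating just because of its scale; the resulting weights sum to 1 and can feed the weighted distance and `weight_threshold` filtering described above.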

## Compute Infrastructure

- Runs efficiently on CPU systems
- Implemented using NumPy

# Citation

```bibtex
@software{smartknn2025,
  author    = {Jashwanth Thatipamula},
  title     = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17713746},
  url       = {https://doi.org/10.5281/zenodo.17713746}
}
```

# Model Card Authors

Jashwanth Thatipamula

# Model Card Contact

Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn