	---
	license: mit
	language:
	- en
	metrics:
	- r_squared
	- accuracy
	- mae
	- mse
	- f1
	- recall
	tags:
	- machine-learning
	- algorithms
	- tabular-data
	- knn
	- python
	- weighted-knn
	- data-science
	- preprocessing
	---


	SmartKNN is a weighted and interpretable extension of classical K-Nearest Neighbours (KNN), designed for real-world tabular machine learning. It automatically learns feature importance, filters weak features, handles missing values, and normalizes inputs internally. In the author's benchmarks it outperformed classical KNN on over 90% of regression datasets and 60% of classification datasets, while keeping a simple scikit-learn-style API.


	# Model Details


	Model Description
	SmartKNN improves classical KNN by learning feature weights and applying a weighted Euclidean distance for neighbour selection. It performs normalization, NaN/Inf cleaning, median imputation, outlier clipping, and feature filtering internally. It exposes feature importance for transparency and explainability.
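The neighbour-selection step can be sketched in NumPy as follows. This assumes the usual weighted Euclidean form d_w(x, x') = sqrt(sum_j w_j * (x_j - x'_j)^2) and plain averaging of neighbour targets for regression (classification would take a majority vote instead). The helper names are hypothetical and not part of the smart_knn API, and the library may differ in details such as distance-weighted voting.

```python
import numpy as np


def weighted_euclidean(X_train: np.ndarray, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Distance from query x to every training row, scaling each feature by weight w_j."""
    diff = X_train - x                      # shape (n_samples, n_features)
    return np.sqrt((w * diff ** 2).sum(axis=1))


def knn_predict_regression(X_train, y_train, x, w, k=5):
    """Hypothetical helper: average the targets of the k nearest rows under d_w."""
    d = weighted_euclidean(X_train, x, w)
    nearest = np.argsort(d)[:k]             # indices of the k smallest distances
    return y_train[nearest].mean()


X_train = np.array([[0.0, 10.0],
                    [1.0, -10.0],
                    [5.0, 0.0]])
y_train = np.array([1.0, 2.0, 3.0])
w = np.array([1.0, 0.0])                    # second feature treated as irrelevant (weight 0)
query = np.array([0.0, 0.0])

print(weighted_euclidean(X_train, query, w))                        # [0. 1. 5.]
print(knn_predict_regression(X_train, y_train, query, w, k=2))      # 1.5
print(knn_predict_regression(X_train, y_train, query, np.ones(2), k=2))  # 2.0 (unweighted)
```

With uniform weights the noisy second feature dominates the distance and pulls in a different neighbour; zeroing its weight changes which rows are selected, which is the effect feature weighting is meant to have.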

	Developed by: Jashwanth Thatipamula
	Model type: Weighted KNN for tabular ML
	License: MIT
	Language(s): Not language-dependent (numerical tabular ML)
	Finetuned from model: Not applicable (original algorithm)

	Model Sources
	Repository: https://github.com/thatipamula-jashwanth/smart-knn
	Paper (DOI): https://doi.org/10.5281/zenodo.17713746
	Demo: Coming soon


	# Uses


	Direct Use
	• Regression on tabular datasets
	• Classification on tabular datasets
	• Interpretable ML where feature importance matters
	• Real-world ML pipelines with missing values and noisy features

	Downstream Use
	• Research on distance-metric learning
	• Explainable ML baselines
	• AutoML components for tabular data

	Out-of-Scope Use
	• NLP, image or audio modelling
	• Deep learning / GPU models
	• Raw categorical datasets without encoding


	# Bias, Risks, and Limitations

	• Instance-based prediction can be slower than tree-based models on large datasets, since each query is compared against the stored training rows
	• Poor performance on purely categorical datasets unless the features are encoded
	• The full training set must be kept in memory for inference
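A back-of-the-envelope estimate makes the storage and speed limitations concrete. It assumes a dense float64 (or float32) training matrix and a brute-force scan over every stored row with no index structure; SmartKNN's actual neighbour search may be organized differently. The helper names are hypothetical.

```python
import numpy as np


def training_set_bytes(n_samples: int, n_features: int, dtype=np.float64) -> int:
    """Memory needed to keep a dense training matrix resident for inference."""
    return n_samples * n_features * np.dtype(dtype).itemsize


def brute_force_ops_per_query(n_samples: int, n_features: int) -> int:
    """Feature-wise difference terms computed when scanning every stored row once."""
    return n_samples * n_features


n, d = 1_000_000, 50
print(training_set_bytes(n, d) / 1e6, "MB")              # 400.0 MB as float64
print(training_set_bytes(n, d, np.float32) / 1e6, "MB")  # 200.0 MB as float32
print(brute_force_ops_per_query(n, d))                   # 50000000 per prediction
```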

	Recommendations
	Users should numerically encode categorical features before fitting SmartKNN.
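One way to do this with pandas alone is one-hot encoding via `pd.get_dummies`. The toy columns below are hypothetical, and other encodings (ordinal, target) are also possible; the card does not say which works best with SmartKNN.

```python
import pandas as pd

df = pd.DataFrame({
    "size_m2": [50.0, 72.5, 31.0],
    "city": ["Paris", "Lyon", "Paris"],   # raw categorical column
    "target": [210.0, 250.0, 150.0],
})

X = df.drop("target", axis=1)
y = df["target"]

# One-hot encode the categorical column so every feature is numeric.
X_encoded = pd.get_dummies(X, columns=["city"], dtype=float)
print(X_encoded.columns.tolist())   # ['size_m2', 'city_Lyon', 'city_Paris']
```

New data must be encoded with the same columns at prediction time, for example with `X_new_encoded.reindex(columns=X_encoded.columns, fill_value=0)`, so unseen or missing categories do not shift the feature layout.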


	# How to Get Started with the Model


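	```bash
	pip install smart-knn
	```

	```python
	import pandas as pd
	from smart_knn import SmartKNN

	# Load a tabular dataset that has a "target" column
	df = pd.read_csv("data.csv")
	X = df.drop("target", axis=1)
	y = df["target"]

	model = SmartKNN(k=5)
	model.fit(X, y)

	# Predict on a one-row DataFrame (2-D input, as in scikit-learn)
	sample = X.iloc[[0]]
	pred = model.predict(sample)
	print(pred)
	```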


	# Training Details


	Training Data
	SmartKNN is not pretrained and does not ship with training data; users train on their own dataset.

	Preprocessing
	Performed automatically:
	• Normalization
	• NaN / Inf cleaning
	• Median imputation
	• Outlier clipping
	• Feature filtering via learned weights
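The steps above can be sketched in NumPy as a fit/transform pair, so that statistics learned on training data are reused on new rows. The order of the steps, clipping at the 1st/99th percentiles, and z-score normalization are all assumptions for illustration; SmartKNN's internal rules may differ. The helper names are hypothetical.

```python
import numpy as np


def preprocess_fit(X, clip_pct=(1.0, 99.0)):
    """Learn per-feature medians, clip bounds and scaling stats from training data."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)          # NaN / Inf cleaning: treat inf as missing
    medians = np.nanmedian(X, axis=0)
    X = np.where(np.isnan(X), medians, X)           # median imputation
    lo, hi = np.percentile(X, clip_pct, axis=0)     # assumed percentile-based clip bounds
    X = np.clip(X, lo, hi)                          # outlier clipping
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0                             # avoid division by zero for constant features
    return {"medians": medians, "lo": lo, "hi": hi, "mean": mean, "std": std}


def preprocess_transform(X, stats):
    """Apply the training-set statistics to (possibly new) rows."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)
    X = np.where(np.isnan(X), stats["medians"], X)
    X = np.clip(X, stats["lo"], stats["hi"])
    return (X - stats["mean"]) / stats["std"]       # assumed z-score normalization


X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [3.0, np.inf],
                    [4.0, 1000.0]])
stats = preprocess_fit(X_train)
Z = preprocess_transform(X_train, stats)
Z_new = preprocess_transform(np.array([[np.nan, np.inf]]), stats)
```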

	Training Hyperparameters
	• k: number of neighbours used for each prediction
	• weight_threshold: features whose learned importance falls below this value are dropped
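The effect of `weight_threshold` can be sketched as a boolean mask over the learned weights. The helper `filter_features` and the example weights are hypothetical, and whether SmartKNN renormalizes the surviving weights is not stated here, so this sketch leaves them unchanged.

```python
import numpy as np


def filter_features(X, weights, weight_threshold):
    """Drop columns whose learned weight is below weight_threshold."""
    keep = weights >= weight_threshold          # boolean mask over features
    if not keep.any():
        raise ValueError("weight_threshold removes every feature")
    return X[:, keep], weights[keep], np.flatnonzero(keep)


X = np.arange(12, dtype=float).reshape(3, 4)
weights = np.array([0.50, 0.02, 0.30, 0.18])   # hypothetical learned weights
X_kept, w_kept, kept_idx = filter_features(X, weights, weight_threshold=0.05)
print(kept_idx)       # [0 2 3]
print(X_kept.shape)   # (3, 3)
```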


	# Evaluation

	Testing Data
	Evaluated across 35 regression and 20 classification public tabular datasets.

	Metrics
	Regression: R², MSE
	Classification: Accuracy
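For reference, these metrics follow their standard definitions. A NumPy sketch is below; scikit-learn's `mean_squared_error`, `r2_score` and `accuracy_score` compute the same quantities.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred) -> float:
    """1 - SS_res / SS_tot: share of target variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def accuracy(y_true, y_pred) -> float:
    """Fraction of exactly matching labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


print(mse([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))        # about 0.667
print(r_squared([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))  # 0.75
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))          # 0.75
```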

	Results
	• Regression: SmartKNN outperformed classical KNN on over 90% of the datasets
	• Classification: SmartKNN outperformed classical KNN on 60% of the datasets

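	Summary
	In these benchmarks, SmartKNN improved on classical KNN on most regression datasets and a majority of classification datasets, while adding robustness to noisy features and interpretability through learned feature weights and preserving KNN's simplicity.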


	# Environmental Impact


	SmartKNN requires no GPU and has minimal energy usage.
	Hardware Type: CPU
	Hours used: Minimal
	Carbon Emitted: Negligible

	# Technical Specifications


	Model Architecture and Objective
	• Instance-based learner
	• Weighted Euclidean distance metric
	• Feature weights learned by combining MSE-based, mutual information (MI), and Random Forest importance scores
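The card names the three importance signals but not how they are combined. The sketch below shows one plausible scheme, and every detail of it is an assumption rather than SmartKNN's documented method: the "MSE-based" score is taken as the variance reduction of a one-feature least-squares fit, each signal is normalized to sum to 1, and the three are averaged with equal weight. It uses scikit-learn's `mutual_info_regression` and `RandomForestRegressor.feature_importances_`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression


def univariate_mse_score(X, y):
    """Assumed MSE signal: 1 - MSE_j / Var(y) for a 1-D least-squares fit on feature j."""
    scores = []
    for j in range(X.shape[1]):
        A = np.column_stack([X[:, j], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse_j = np.mean((y - A @ coef) ** 2)
        scores.append(max(0.0, 1.0 - mse_j / np.var(y)))
    return np.array(scores)


def normalize(s):
    """Clip negatives and rescale so the scores sum to 1 (uniform if all are zero)."""
    s = np.clip(s, 0, None)
    total = s.sum()
    return s / total if total > 0 else np.full_like(s, 1.0 / len(s))


def learn_feature_weights(X, y, seed=0):
    """Hypothetical combination: equal-weight average of three normalized signals."""
    mse_s = univariate_mse_score(X, y)
    mi_s = mutual_info_regression(X, y, random_state=seed)
    rf = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X, y)
    return normalize(normalize(mse_s) + normalize(mi_s) + normalize(rf.feature_importances_))


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)  # feature 2 is pure noise
w = learn_feature_weights(X, y)
print(np.round(w, 3))
```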

	Compute Infrastructure
	• Runs efficiently on CPU systems
	• Implemented using NumPy


	# Citation


	@software{smartknn2025,
	author = {Jashwanth Thatipamula},
	title = {SmartKNN: An Interpretable Weighted Distance Framework for K-Nearest Neighbours},
	year = {2025},
	publisher = {Zenodo},
	doi = {10.5281/zenodo.17713746},
	url = {https://doi.org/10.5281/zenodo.17713746}
	}


	# Model Card Authors

	Jashwanth Thatipamula

	Model Card Contact
	Contact via GitHub issues: https://github.com/thatipamula-jashwanth/smart-knn