Telemetry Knowledge-Graph Classifier
Zero-latency OS state classifier with knowledge-graph-based customizable inference and continuous online self-learning.
Classifies real-time user availability states from OS telemetry:
| State | Description | Accuracy |
|---|---|---|
| 🎯 Deep Focus | Single productive app, low switching, sustained work | 98.9% |
| 🎮 Gaming | High CPU, fullscreen, game process, high network | 100% |
| 📹 Meeting | Communication app active, calendar-matched, video I/O | 100% |
| 🟢 Available | Frequent switching, browser, mixed activities | 100% |
Overall: 99.56% on continuous telemetry stream (5000 samples, unseen data)
Architecture
Dual-Model Ensemble
- Tier 1 — River AdaptiveRandomForest (10 HoeffdingTrees, depth=8): Instant online learner. Updates on every labeled sample in <1ms via
learn_one(). Built-in ADWIN drift detection. - Tier 2 — LightGBM (100 trees, depth=6): Accuracy anchor. Retrained every 30 min on a 2000-sample reservoir-sampled replay buffer.
- Combiner: Adaptive weighted average. Weights shift based on which model has been more accurate recently.
Knowledge Graph (Customizable Inference)
A directed graph capturing relationships between processes, time blocks, and user states. Queried at inference time to produce Bayesian priors:
P_final(state) ∝ P_model(state|x)^0.7 · P_KG_prior(state|ctx)^0.3
Users can customize inference at runtime:
- Add hard override rules (
discord.exe → Meeting, priority=20) - Add process→state edges with weights
- Add time-based priors (
9pm-12am non-work → Gaming) - Submit corrections that instantly update both models
- Query what the system knows about any process or state
Feature Engineering (107 dimensions)
- Welford online normalization (O(1) memory per feature)
- Multi-scale EMAs (30s / 3min / 30min smoothing)
- Rolling statistics (1min and 5min windows)
- Feature hashing (16 buckets for process names — no vocabulary explosion)
- Cyclic temporal encoding (sin/cos for hour and day)
- Rate-of-change features (first and second derivatives)
Usage
Quick Start
from huggingface_hub import hf_hub_download
import pickle, json
# Download artifacts
model_dir = hf_hub_download("Niansuh1/telemetry-kg-classifier", filename="arf_model.pkl",
local_dir="./model_artifacts")
# ... load all artifacts
Training
pip install river lightgbm networkx psutil mmh3
python train.py --samples 30000 --output ./model_artifacts
Inference with Customization
# Interactive CLI
python serve.py --mode cli
# JSON API
python serve.py --mode json
# Demo
python serve.py --mode demo
Knowledge Graph Customization Examples
# "Discord is always work meetings for me"
ensemble.kg.add_override_rule(
rule_id="discord_meeting",
condition={"process": "discord.exe"},
target_state="Meeting",
priority=20,
)
# "I game every evening"
ensemble.kg.add_rule(
condition={"hour_range": [21, 24], "is_work_hours": False},
target_state="Gaming",
priority=5,
)
# Correct a wrong prediction (instant model update)
ensemble.learn(features, "Meeting", raw_telemetry, is_correction=True)
Resource Footprint
| Component | RAM | CPU per cycle (3s) |
|---|---|---|
| Feature engineering | 1 MB | 0.5ms |
| River ARF model | 20 MB | 1ms |
| LightGBM model | 1 MB | 0.02ms |
| Knowledge Graph | 1 MB | 0.1ms |
| Total | ~25 MB | ~2ms (0.07% CPU) |
Training Details
- Data: 30,000 synthetic telemetry samples with Markov chain state transitions
- Guaranteed class balance: 2000+ samples per state minimum
- Online learning: ARF updates every sample, LightGBM retrains every 3000 samples
- Concept drift: ADWIN detector on prediction error stream
- Experience replay: 2000-sample reservoir-sampled buffer for LightGBM
References
- River ML: Montiel et al., "River: machine learning for streaming data in Python", JMLR 2021
- LightGBM: Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", NeurIPS 2017
- ADWIN: Bifet & Gavalda, "Learning from Time-Changing Data with Adaptive Windowing", SDM 2007
- Feature Hashing: Weinberger et al., "Feature Hashing for Large Scale Multitask Learning", ICML 2009
- Welford: B.P. Welford, "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics 1962
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern