Telemetry Knowledge-Graph Classifier

Zero-latency OS state classifier with knowledge-graph-based customizable inference and continuous online self-learning.

Classifies real-time user availability states from OS telemetry:

State	Description	Accuracy
🎯 Deep Focus	Single productive app, low switching, sustained work	98.9%
🎮 Gaming	High CPU, fullscreen, game process, high network	100%
📹 Meeting	Communication app active, calendar-matched, video I/O	100%
🟢 Available	Frequent switching, browser, mixed activities	100%

Overall: 99.56% on continuous telemetry stream (5000 samples, unseen data)

Architecture

Dual-Model Ensemble

Tier 1 — River AdaptiveRandomForest (10 HoeffdingTrees, depth=8): Instant online learner. Updates on every labeled sample in <1ms via learn_one(). Built-in ADWIN drift detection.
Tier 2 — LightGBM (100 trees, depth=6): Accuracy anchor. Retrained every 30 min on a 2000-sample reservoir-sampled replay buffer.
Combiner: Adaptive weighted average. Weights shift based on which model has been more accurate recently.

Knowledge Graph (Customizable Inference)

A directed graph capturing relationships between processes, time blocks, and user states. Queried at inference time to produce Bayesian priors:

P_final(state) ∝ P_model(state|x)^0.7 · P_KG_prior(state|ctx)^0.3

Users can customize inference at runtime:

Add hard override rules (discord.exe → Meeting, priority=20)
Add process→state edges with weights
Add time-based priors (9pm-12am non-work → Gaming)
Submit corrections that instantly update both models
Query what the system knows about any process or state

Feature Engineering (107 dimensions)

Welford online normalization (O(1) memory per feature)
Multi-scale EMAs (30s / 3min / 30min smoothing)
Rolling statistics (1min and 5min windows)
Feature hashing (16 buckets for process names — no vocabulary explosion)
Cyclic temporal encoding (sin/cos for hour and day)
Rate-of-change features (first and second derivatives)

Usage

Quick Start

from huggingface_hub import hf_hub_download
import pickle, json

# Download artifacts
model_dir = hf_hub_download("Niansuh1/telemetry-kg-classifier", filename="arf_model.pkl",
                             local_dir="./model_artifacts")
# ... load all artifacts

Training

pip install river lightgbm networkx psutil mmh3
python train.py --samples 30000 --output ./model_artifacts

Inference with Customization

# Interactive CLI
python serve.py --mode cli

# JSON API
python serve.py --mode json

# Demo
python serve.py --mode demo

Knowledge Graph Customization Examples

# "Discord is always work meetings for me"
ensemble.kg.add_override_rule(
    rule_id="discord_meeting",
    condition={"process": "discord.exe"},
    target_state="Meeting",
    priority=20,
)

# "I game every evening"
ensemble.kg.add_rule(
    condition={"hour_range": [21, 24], "is_work_hours": False},
    target_state="Gaming",
    priority=5,
)

# Correct a wrong prediction (instant model update)
ensemble.learn(features, "Meeting", raw_telemetry, is_correction=True)

Resource Footprint

Component	RAM	CPU per cycle (3s)
Feature engineering	1 MB	0.5ms
River ARF model	20 MB	1ms
LightGBM model	1 MB	0.02ms
Knowledge Graph	1 MB	0.1ms
Total	~25 MB	~2ms (0.07% CPU)

Training Details

Data: 30,000 synthetic telemetry samples with Markov chain state transitions
Guaranteed class balance: 2000+ samples per state minimum
Online learning: ARF updates every sample, LightGBM retrains every 3000 samples
Concept drift: ADWIN detector on prediction error stream
Experience replay: 2000-sample reservoir-sampled buffer for LightGBM

References

River ML: Montiel et al., "River: machine learning for streaming data in Python", JMLR 2021
LightGBM: Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", NeurIPS 2017
ADWIN: Bifet & Gavalda, "Learning from Time-Changing Data with Adaptive Windowing", SDM 2007
Feature Hashing: Weinberger et al., "Feature Hashing for Large Scale Multitask Learning", ICML 2009
Welford: B.P. Welford, "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics 1962

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Downloads last month: -; Downloads are not tracked for this model. How to track