Telemetry Knowledge-Graph Classifier

Low-latency (~2 ms per cycle) OS state classifier with knowledge-graph-based customizable inference and continuous online self-learning.

Classifies real-time user availability states from OS telemetry:

| State | Description | Accuracy |
|---|---|---|
| 🎯 Deep Focus | Single productive app, low switching, sustained work | 98.9% |
| 🎮 Gaming | High CPU, fullscreen, game process, high network | 100% |
| 📹 Meeting | Communication app active, calendar-matched, video I/O | 100% |
| 🟢 Available | Frequent switching, browser, mixed activities | 100% |

Overall accuracy: 99.56% on a continuous telemetry stream (5,000 unseen samples)

Architecture

Dual-Model Ensemble

  • Tier 1 — River AdaptiveRandomForest (10 HoeffdingTrees, depth=8): Instant online learner. Updates on every labeled sample in <1ms via learn_one(). Built-in ADWIN drift detection.
  • Tier 2 — LightGBM (100 trees, depth=6): Accuracy anchor. Retrained every 30 min on a 2000-sample reservoir-sampled replay buffer.
  • Combiner: an adaptive weighted average whose weights shift toward whichever model has been more accurate recently.
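The combiner is described only at a high level. Below is a minimal sketch of one way to implement it, assuming each model's weight is proportional to its top-1 accuracy over a sliding window of recent labeled samples. The `AdaptiveCombiner` class and the window size are hypothetical, not taken from this repository.

```python
from collections import deque


class AdaptiveCombiner:
    """Hypothetical sketch: blend two models' class probabilities, weighting
    each model by its top-1 accuracy over the last `window` labeled samples."""

    def __init__(self, window: int = 200):
        # 1 = model's top prediction was correct on that sample, 0 = wrong
        self.hits = {"arf": deque(maxlen=window), "lgbm": deque(maxlen=window)}

    def weights(self) -> dict:
        # Recent accuracy per model; 0.5 when there is no history yet
        acc = {name: (sum(h) / len(h)) if h else 0.5 for name, h in self.hits.items()}
        total = sum(acc.values())
        if total == 0:
            # Both models recently wrong on everything: fall back to equal weights
            return {name: 0.5 for name in acc}
        return {name: a / total for name, a in acc.items()}

    def combine(self, p_arf: dict, p_lgbm: dict) -> dict:
        # Weighted average over the union of states seen by either model
        w = self.weights()
        states = set(p_arf) | set(p_lgbm)
        return {s: w["arf"] * p_arf.get(s, 0.0) + w["lgbm"] * p_lgbm.get(s, 0.0) for s in states}

    def update(self, p_arf: dict, p_lgbm: dict, true_state: str) -> None:
        # Record whether each model's argmax matched the label
        self.hits["arf"].append(int(max(p_arf, key=p_arf.get) == true_state))
        self.hits["lgbm"].append(int(max(p_lgbm, key=p_lgbm.get) == true_state))
```

A short window makes the weights react quickly after concept drift; a longer one trades responsiveness for stability.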

Knowledge Graph (Customizable Inference)

A directed graph capturing relationships between processes, time blocks, and user states. Queried at inference time to produce Bayesian priors:

P_final(state) ∝ P_model(state|x)^0.7 · P_KG_prior(state|ctx)^0.3
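The formula is a weighted geometric mean (a product of experts) followed by renormalization. A minimal sketch, assuming both distributions are dictionaries keyed by state name; `blend` is a hypothetical helper, and computing in log space is a choice made here for numerical stability rather than something the card specifies.

```python
import math


def blend(p_model: dict, p_prior: dict, alpha: float = 0.7, eps: float = 1e-12) -> dict:
    """P_final ∝ P_model^alpha * P_prior^(1 - alpha), renormalized over p_model's states."""
    # Log space avoids underflow; eps guards log(0). States missing from the
    # prior effectively receive zero prior mass.
    logits = {
        s: alpha * math.log(p_model[s] + eps) + (1 - alpha) * math.log(p_prior.get(s, 0.0) + eps)
        for s in p_model
    }
    # Subtract the max before exponentiating, then normalize to sum to 1
    m = max(logits.values())
    unnorm = {s: math.exp(v - m) for s, v in logits.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}
```

Because the exponents sum to 1, a uniform prior leaves the model's ranking unchanged, while a sharp prior can move the top prediction toward the state it favors.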

Users can customize inference at runtime:

  • Add hard override rules (discord.exe → Meeting, priority=20)
  • Add process→state edges with weights
  • Add time-based priors (e.g. 9 pm to midnight outside work hours → Gaming)
  • Submit corrections that instantly update both models
  • Query what the system knows about any process or state
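The card does not show how rules and edges become a prior. A toy sketch under stated assumptions: hard overrides win outright (highest priority first); otherwise process→state edge weights and matching time priors are summed with additive smoothing and normalized. `ToyKG` and all of its methods are hypothetical plain-Python stand-ins, not the repository's knowledge-graph API.

```python
STATES = ("Deep Focus", "Gaming", "Meeting", "Available")


class ToyKG:
    """Hypothetical stand-in for the knowledge graph's prior query."""

    def __init__(self, smoothing: float = 1.0):
        self.smoothing = smoothing  # pseudo-count added to every state
        self.overrides = []         # (priority, process, state)
        self.edges = {}             # process -> {state: weight}
        self.time_priors = []       # (start_hour, end_hour, is_work_hours, state, weight)

    def add_override(self, process, state, priority):
        self.overrides.append((priority, process, state))

    def add_edge(self, process, state, weight):
        bucket = self.edges.setdefault(process, {})
        bucket[state] = bucket.get(state, 0.0) + weight

    def add_time_prior(self, start_hour, end_hour, is_work_hours, state, weight):
        self.time_priors.append((start_hour, end_hour, is_work_hours, state, weight))

    def prior(self, process, hour, is_work_hours):
        # 1) A matching hard override wins outright (highest priority first)
        matches = [r for r in self.overrides if r[1] == process]
        if matches:
            _, _, state = max(matches, key=lambda r: r[0])
            return {s: 1.0 if s == state else 0.0 for s in STATES}
        # 2) Otherwise accumulate soft evidence on top of a smoothed uniform base
        score = {s: self.smoothing for s in STATES}
        for state, w in self.edges.get(process, {}).items():
            score[state] += w
        for start, end, work, state, w in self.time_priors:
            if start <= hour < end and work == is_work_hours:
                score[state] += w
        total = sum(score.values())
        return {s: v / total for s, v in score.items()}
```

Usage: after `kg.add_edge("code.exe", "Deep Focus", 6.0)`, `kg.prior("code.exe", 10, True)` gives Deep Focus 7 / 10 = 0.7 of the prior mass, with 0.1 for each other state.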

Feature Engineering (107 dimensions)

  • Welford online normalization (O(1) memory per feature)
  • Multi-scale EMAs (30s / 3min / 30min smoothing)
  • Rolling statistics (1min and 5min windows)
  • Feature hashing (16 buckets for process names — no vocabulary explosion)
  • Cyclic temporal encoding (sin/cos for hour and day)
  • Rate-of-change features (first and second derivatives)
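Most of these techniques fit in a few lines each. A minimal stdlib sketch, assuming a 3 s sampling interval (from the resource table) and reading the EMA windows as time constants; `Welford`, `ema_alpha`, `ema_update`, `hash_process`, `cyclic`, and `rate_of_change` are hypothetical helpers, and `hashlib.blake2b` stands in for the `mmh3` hash listed in the install command.

```python
import hashlib
import math


class Welford:
    """Running mean and variance in O(1) memory per feature (Welford, 1962)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # second factor uses the updated mean

    def std(self) -> float:
        # Sample standard deviation; 0 until at least two values are seen
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def normalize(self, x: float) -> float:
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0


def ema_alpha(time_constant_s: float, dt_s: float = 3.0) -> float:
    # Smoothing factor for an EMA with this time constant, sampled every dt_s
    return 1.0 - math.exp(-dt_s / time_constant_s)


def ema_update(prev: float, x: float, alpha: float) -> float:
    return prev + alpha * (x - prev)


def hash_process(name: str, buckets: int = 16) -> list:
    # Stable hash of the lowercased name -> one-hot vector over `buckets` slots
    digest = hashlib.blake2b(name.lower().encode(), digest_size=4).digest()
    vec = [0.0] * buckets
    vec[int.from_bytes(digest, "little") % buckets] = 1.0
    return vec


def cyclic(value: float, period: float) -> tuple:
    # Map a periodic quantity onto the unit circle so 23:00 sits next to 00:00
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def rate_of_change(x_prev2: float, x_prev: float, x_now: float, dt_s: float = 3.0) -> tuple:
    # First and second finite differences, per second and per second squared
    d1 = (x_now - x_prev) / dt_s
    d2 = (x_now - 2 * x_prev + x_prev2) / dt_s ** 2
    return d1, d2
```

Hashing keeps the process-name features at a fixed width no matter how many distinct executables appear, at the cost of occasional bucket collisions.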

Usage

Quick Start

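from huggingface_hub import hf_hub_download
import pickle, json

# Download an artifact; hf_hub_download returns the local path of the file
arf_path = hf_hub_download("Niansuh1/telemetry-kg-classifier", filename="arf_model.pkl",
                           local_dir="./model_artifacts")

# Unpickling can execute code: only load artifacts from a source you trust
with open(arf_path, "rb") as f:
    arf_model = pickle.load(f)

# ... load the remaining artifacts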

Training

pip install river lightgbm networkx psutil mmh3
python train.py --samples 30000 --output ./model_artifacts

Inference with Customization

# Interactive CLI
python serve.py --mode cli

# JSON API
python serve.py --mode json

# Demo
python serve.py --mode demo

Knowledge Graph Customization Examples

# Assumes `ensemble` is an already-loaded ensemble instance
# "Discord is always work meetings for me"
ensemble.kg.add_override_rule(
    rule_id="discord_meeting",
    condition={"process": "discord.exe"},
    target_state="Meeting",
    priority=20,
)

# "I game every evening"
ensemble.kg.add_rule(
    condition={"hour_range": [21, 24], "is_work_hours": False},
    target_state="Gaming",
    priority=5,
)

# Correct a wrong prediction (instant model update)
ensemble.learn(features, "Meeting", raw_telemetry, is_correction=True)

Resource Footprint

| Component | RAM | CPU time per cycle (3 s) |
|---|---|---|
| Feature engineering | 1 MB | 0.5 ms |
| River ARF model | 20 MB | 1 ms |
| LightGBM model | 1 MB | 0.02 ms |
| Knowledge Graph | 1 MB | 0.1 ms |
| Total | ~25 MB | ~2 ms (0.07% CPU) |
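The CPU figures can be checked with quick arithmetic, assuming the percentage is relative to a single core and one inference cycle runs every 3 s:

```python
# Per-component CPU time per inference cycle, in milliseconds (from the table)
cpu_ms = {
    "Feature engineering": 0.5,
    "River ARF model": 1.0,
    "LightGBM model": 0.02,
    "Knowledge Graph": 0.1,
}
cycle_ms = 3000.0  # one telemetry cycle every 3 s

total_ms = sum(cpu_ms.values())          # exact sum of the listed components
cpu_fraction = total_ms / cycle_ms       # share of one core
rounded_fraction = 2.0 / cycle_ms        # the table's rounded ~2 ms total
print(f"{total_ms:.2f} ms per cycle -> {cpu_fraction:.3%} of one core")
print(f"~2 ms per cycle -> {rounded_fraction:.2%} of one core")
```

The listed components sum to about 1.6 ms (roughly 0.05% of a core); the rounded ~2 ms total gives the 0.07% quoted in the table.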

Training Details

  • Data: 30,000 synthetic telemetry samples with Markov chain state transitions
  • Guaranteed class balance: 2000+ samples per state minimum
  • Online learning: ARF updates every sample, LightGBM retrains every 3000 samples
  • Concept drift: ADWIN detector on prediction error stream
  • Experience replay: 2000-sample reservoir-sampled buffer for LightGBM
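The training bullets mention a Markov-chain data generator and a reservoir-sampled replay buffer. A minimal stdlib sketch of both, assuming a first-order chain over the four states; the transition probabilities, `markov_labels`, and `ReservoirBuffer` are hypothetical illustrations, not values or code from this repository. The buffer uses Vitter's Algorithm R, one standard way to keep a uniform sample of a stream.

```python
import random

STATES = ["Deep Focus", "Gaming", "Meeting", "Available"]

# Hypothetical transition matrix (rows sum to 1); high self-transition
# probabilities make states persist for many consecutive samples
TRANSITIONS = {
    "Deep Focus": [0.90, 0.02, 0.03, 0.05],
    "Gaming":     [0.02, 0.93, 0.01, 0.04],
    "Meeting":    [0.03, 0.01, 0.92, 0.04],
    "Available":  [0.08, 0.05, 0.05, 0.82],
}


def markov_labels(n: int, start: str = "Available", seed: int = 0):
    """Yield n state labels from a first-order Markov chain."""
    rng = random.Random(seed)
    state = start
    for _ in range(n):
        yield state
        state = rng.choices(STATES, weights=TRANSITIONS[state])[0]


class ReservoirBuffer:
    """Uniform sample of size k over an unbounded stream (Algorithm R)."""

    def __init__(self, k: int = 2000, seed: int = 0):
        self.k, self.seen, self.items = k, 0, []
        self.rng = random.Random(seed)

    def add(self, item) -> None:
        self.seen += 1
        if len(self.items) < self.k:
            self.items.append(item)
        else:
            # Keep the new item with probability k / seen, evicting a random slot
            j = self.rng.randrange(self.seen)
            if j < self.k:
                self.items[j] = item
```

Every sample seen so far has the same k / seen chance of being in the buffer, so periodic LightGBM retrains see old and recent behavior in proportion rather than only the latest window.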

References

  • River ML: Montiel et al., "River: machine learning for streaming data in Python", JMLR 2021
  • LightGBM: Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", NeurIPS 2017
  • ADWIN: Bifet & Gavaldà, "Learning from Time-Changing Data with Adaptive Windowing", SDM 2007
  • Feature Hashing: Weinberger et al., "Feature Hashing for Large Scale Multitask Learning", ICML 2009
  • Welford: B.P. Welford, "Note on a Method for Calculating Corrected Sums of Squares and Products", Technometrics 1962

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
