LAnoBERT checkpoints (BGL / HDFS / Thunderbird)

BERT encoders trained from scratch with a custom vocabulary and a masked language modeling (MLM) objective on normal system logs only (no next-sentence prediction), following the LAnoBERT log anomaly detection method. The repository holds one checkpoint per dataset, each stored in its own subfolder.

| subfolder | dataset | vocab size | batch size | AUROC / best-F1 (error_mean) |
|---|---|---|---|---|
| bgl | BGL | 1000 | 32 | 1.000 / 1.000 |
| hdfs | HDFS | 200 | 32 | 0.997 / 0.969 |
| thunderbird | Thunderbird | 10000 | 32 | 1.000 / 1.000 |

Usage

from transformers import AutoModelForMaskedLM, AutoTokenizer

sub = "bgl"  # or "hdfs" / "thunderbird"
tok = AutoTokenizer.from_pretrained("yukyung/LAnoBERT", subfolder=sub)
model = AutoModelForMaskedLM.from_pretrained("yukyung/LAnoBERT", subfolder=sub)

Scoring

Anomaly score = mean per-word cross-entropy over a log line (error_mean). Because the score is an average rather than a sum, it adapts to line length, and it is balanced across datasets. See the code repository for the full inference pipeline.
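The averaging step can be illustrated with a minimal sketch. It assumes each word is masked in turn and the model is asked for the probability of the original word at that position; the masking schedule, the `error_mean` function signature, the `prob_of_original` callable, and the toy word list are all hypothetical stand-ins, not the repository's actual pipeline. In practice the callable would wrap the softmax output of one of the checkpoints above.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface: (masked_tokens, position, original_token) -> probability
ProbFn = Callable[[Sequence[str], int, str], float]


def error_mean(tokens: Sequence[str], prob_of_original: ProbFn,
               mask_token: str = "[MASK]") -> float:
    """Mean per-word cross-entropy of a log line (higher = more anomalous)."""
    if not tokens:
        raise ValueError("cannot score an empty log line")
    total = 0.0
    for i, original in enumerate(tokens):
        masked = list(tokens)
        masked[i] = mask_token  # assumption: one position is masked at a time
        p = prob_of_original(masked, i, original)
        total += -math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return total / len(tokens)  # dividing by length makes the score length-adaptive


# Toy stand-in for a trained model: words seen in "normal" logs get high probability.
NORMAL_WORDS = {"instruction", "cache", "parity", "error", "corrected"}


def toy_prob(masked: Sequence[str], position: int, original: str) -> float:
    return 0.9 if original in NORMAL_WORDS else 0.01


normal_score = error_mean("instruction cache parity error corrected".split(), toy_prob)
odd_score = error_mean("kernel panic cache error".split(), toy_prob)
print(normal_score < odd_score)
```

A line is then flagged as anomalous when its score exceeds a threshold chosen on held-out data; the threshold selection is not shown here.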

Citation

@article{lee2023lanobert,
  title   = {LAnoBERT: System log anomaly detection based on BERT masked language model},
  author  = {Lee, Yukyung and Kim, Jina and Kang, Pilsung},
  journal = {Applied Soft Computing},
  volume  = {146},
  pages   = {110689},
  year    = {2023},
  issn    = {1568-4946},
  doi     = {10.1016/j.asoc.2023.110689}
}