LAnoBERT checkpoints (BGL / HDFS / Thunderbird)
From-scratch, custom-vocabulary BERT encoders trained with a masked-language-modeling objective on normal system logs only (no next-sentence prediction), following the LAnoBERT log anomaly detection method. One checkpoint per dataset, stored as a subfolder of this repo.
- Code: https://github.com/yukyunglee/LAnoBERT
- Paper: Yukyung Lee, Jina Kim, Pilsung Kang. LAnoBERT: System log anomaly detection based on BERT masked language model. Applied Soft Computing, Vol. 146, 2023, 110689. https://doi.org/10.1016/j.asoc.2023.110689
| subfolder | dataset | vocab | batch | AUROC / best-F1 (error_mean) |
|---|---|---|---|---|
| bgl | BGL | 1000 | 32 | 1.000 / 1.000 |
| hdfs | HDFS | 200 | 32 | 0.997 / 0.969 |
| thunderbird | Thunderbird | 10000 | 32 | 1.000 / 1.000 |
Usage
from transformers import AutoModelForMaskedLM, AutoTokenizer
sub = "bgl" # or "hdfs" / "thunderbird"
tok = AutoTokenizer.from_pretrained("yukyung/LAnoBERT", subfolder=sub)
model = AutoModelForMaskedLM.from_pretrained("yukyung/LAnoBERT", subfolder=sub)
Scoring
The anomaly score of a log line is the mean per-word cross-entropy over that
line (error_mean). Because it is an average rather than a sum, the score is
length-adaptive, and it is balanced across datasets. See the code repository
for the full inference pipeline.
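To make the error_mean definition concrete, here is a minimal sketch of one way to compute it with PyTorch. It is not the repository's inference pipeline. It assumes that each non-special token is masked one at a time and that the cross-entropy is averaged per tokenizer token (the card says per word, which may differ if the tokenizer splits words). It also assumes the log line has already been preprocessed the way the training data was. The function names `error_mean` and `score_line` are hypothetical.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def error_mean(model, input_ids, mask_token_id, skip_ids=()):
    """Mean masked-token cross-entropy for one tokenized log line.

    input_ids: 1-D LongTensor of token ids for a single log line.
    skip_ids: ids (e.g. [CLS], [SEP], [PAD]) that are never masked or scored.
    Assumption: one token is masked at a time; the repo may do this differently.
    """
    skip = set(skip_ids)
    positions = [i for i, t in enumerate(input_ids.tolist()) if t not in skip]
    if not positions:
        return float("nan")  # nothing to score

    device = input_ids.device
    rows = torch.arange(len(positions), device=device)
    cols = torch.tensor(positions, device=device)

    # One copy of the line per scored position, each with a single token masked.
    batch = input_ids.repeat(len(positions), 1)
    batch[rows, cols] = mask_token_id

    logits = model(input_ids=batch).logits   # (n_positions, seq_len, vocab)
    masked_logits = logits[rows, cols]       # predictions at the masked slots
    targets = input_ids[cols]                # original ids at those slots
    return F.cross_entropy(masked_logits, targets, reduction="mean").item()


def score_line(model, tok, line):
    """Tokenize one log line and return its error_mean score (higher = more anomalous)."""
    model.eval()
    ids = tok(line, return_tensors="pt", truncation=True)["input_ids"][0]
    return error_mean(model, ids, tok.mask_token_id, skip_ids=tok.all_special_ids)
```

With `tok` and `model` loaded as in the Usage section, `score_line(model, tok, line)` returns a single float per log line. Turning scores into labels requires a threshold, which this sketch does not choose.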
Citation
@article{lee2023lanobert,
title = {LAnoBERT: System log anomaly detection based on BERT masked language model},
author = {Lee, Yukyung and Kim, Jina and Kang, Pilsung},
journal = {Applied Soft Computing},
volume = {146},
pages = {110689},
year = {2023},
issn = {1568-4946},
doi = {10.1016/j.asoc.2023.110689}
}