Towards Generalizable Context-aware Anomaly Detection: A Large-scale Benchmark in Cloud Environments Paper • 2508.01844 • Published Aug 3, 2025
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models Paper • 2310.15147 • Published Oct 23, 2023
Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent Paper • 2402.13717 • Published Feb 21, 2024
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning Paper • 2403.02333 • Published Mar 4, 2024
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models Paper • 2410.07331 • Published Oct 9, 2024