AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems Dec 23, 2025
Developing Safe and Responsible Large Language Models -- A Comprehensive Framework Paper • 2404.01399 • Published Apr 1, 2024
Azimuth: Systematic Error Analysis for Text Classification Paper • 2212.08216 • Published Dec 16, 2022
M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models Paper • 2406.16783 • Published Jun 24, 2024
Prompting with Phonemes: Enhancing LLM Multilinguality for non-Latin Script Languages Paper • 2411.02398 • Published Nov 4, 2024
DNA Bench: When Silence is Smarter -- Benchmarking Over-Reasoning in Reasoning LLMs Paper • 2503.15793 • Published Mar 20, 2025
Layer-Wise Quantization: A Pragmatic and Effective Method for Quantizing LLMs Beyond Integer Bit-Levels Paper • 2406.17415 • Published Jun 25, 2024
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models Paper • 2503.01781 • Published Mar 3, 2025
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs Paper • 2509.08031 • Published Sep 9, 2025
RECODE-H: A Benchmark for Research Code Development with Interactive Human Feedback Paper • 2510.06186 • Published Oct 7, 2025
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published 7 days ago
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents Paper • 2605.13841 • Published 6 days ago