From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors Paper • 2605.31042 • Published 6 days ago • 17
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models Paper • 2605.20873 • Published 15 days ago • 44
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published Apr 20 • 85
MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning Paper • 2603.03379 • Published Mar 3 • 32
DeepImageSearch Collection Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories • 3 items • Updated Mar 7
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories Paper • 2602.10809 • Published Feb 11 • 59
LawThinker: A Deep Research Legal Agent in Dynamic Environments Paper • 2602.12056 • Published Feb 12 • 35