LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning Paper • 2602.07075 • Published 7 days ago • 17
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning Paper • 2602.07075 • Published 7 days ago • 17
CellForge: Agentic Design of Virtual Cell Models Paper • 2508.02276 • Published Aug 4, 2025 • 39
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8, 2025 • 76
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8, 2025 • 76
LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published Mar 12, 2025 • 13
LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published Mar 12, 2025 • 13 • 2
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10, 2025 • 16
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10, 2025 • 16 • 3
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3, 2025 • 30
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21, 2025 • 84
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 75
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 75
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 152
ChatCell: Facilitating Single-Cell Analysis with Natural Language Paper • 2402.08303 • Published Feb 13, 2024 • 14
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6, 2024 • 2
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents Paper • 2311.11797 • Published Nov 20, 2023 • 2
QTSumm: A New Benchmark for Query-Focused Table Summarization Paper • 2305.14303 • Published May 23, 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity Paper • 2310.07521 • Published Oct 11, 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 11