Forecasting Downstream Performance of LLMs With Proxy Metrics Paper • 2605.18607 • Published 7 days ago • 10
RiT: Vanilla Diffusion Transformers Suffice in Representation Space Paper • 2605.21981 • Published 4 days ago • 9
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published 13 days ago • 60
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure Paper • 2604.11045 • Published Apr 13 • 26
FORGE:Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 95
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views Paper • 2603.27183 • Published Mar 28 • 20
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98 • 5
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published Mar 13 • 149
Value Drifts: Tracing Value Alignment During LLM Post-Training Paper • 2510.26707 • Published Oct 30, 2025 • 13