EvoClaw-Bench

community

https://evo-claw.com/

AI & ML interests

Evaluating AI Agents on Continuous Tasks

Recent Activity

hyd2apse updated a dataset 2 days ago

EvoClaw-Bench/EvoClaw-data

zimplex authored a paper 11 days ago

Haystack Engineering: Context Engineering for Heterogeneous and Agentic Long-Context Evaluation

zimplex authored a paper 11 days ago

Rethinking Memory Mechanisms of Foundation Agents in the Second Half: A Survey

Organization Card

Evaluate AI on Continuous Tasks

EvoClaw is a general-purpose evaluation harness for AI agents on continuous tasks, where milestones build on each other, dependencies interleave, and context accumulates over a long session. Unlike one-shot benchmarks, EvoClaw challenges agents to complete ordered sequences of tasks within a persistent environment, enabling fine-grained, per-milestone analysis.
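The idea of scoring an agent per milestone inside one persistent environment can be illustrated with a minimal Python sketch. It is not EvoClaw's actual API: the names `Milestone`, `run_session`, and `toy_agent`, and the use of a plain dictionary as the environment, are all hypothetical and chosen only to show how later milestones depend on state left behind by earlier ones.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Env = Dict[str, Any]  # hypothetical stand-in for a persistent environment


@dataclass
class Milestone:
    """One step in an ordered task sequence (illustrative, not EvoClaw's schema)."""
    name: str
    check: Callable[[Env], bool]  # returns True if the milestone is satisfied


def run_session(milestones: List[Milestone],
                agent_step: Callable[[str, Env], None]) -> Dict[str, bool]:
    """Run milestones in order against ONE shared environment.

    The environment is never reset between milestones, so state (and
    mistakes) from earlier steps carry over to later ones.
    """
    env: Env = {}
    results: Dict[str, bool] = {}
    for m in milestones:
        agent_step(m.name, env)         # the agent mutates the shared env
        results[m.name] = m.check(env)  # score this milestone right away
    return results


def toy_agent(task: str, env: Env) -> None:
    """A hard-coded 'agent' used only to exercise the loop."""
    if task == "create_config":
        env["config"] = {}
    elif task == "enable_debug":
        # Builds on the state created by the previous milestone.
        env.setdefault("config", {})["debug"] = True


milestones = [
    Milestone("create_config", lambda env: "config" in env),
    Milestone("enable_debug",
              lambda env: env.get("config", {}).get("debug") is True),
]

results = run_session(milestones, toy_agent)
print(results)
```

Because each milestone is checked immediately after the agent acts, the result is a per-milestone pass/fail record rather than a single end-of-session score, which is the kind of fine-grained analysis the description refers to.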

models 0

None public yet

datasets 2

EvoClaw-Bench/EvoClaw-data

Updated 2 days ago • 1.06k • 4

EvoClaw-Bench/EvoClaw-log

Updated about 1 month ago • 242 • 1

Team members 8
