arxiv:2605.02503

DataClawBench: An Agent Benchmark for Exploratory Real-World Financial Data Analysis

Published on May 17

Authors:

Abstract

Exploratory financial data analysis benchmark challenges LLM agents' reliability in real-world scenarios with limited prior guidance and native data noise.

AI-generated summary

Autonomous data analysis agents are increasingly expected to conduct exploratory analysis over underexplored data environments. This burden is especially salient in complex financial analytics, where relevant evidence is rarely pre-specified. However, existing benchmarks typically evaluate such agents in prior-guided settings, providing selected data sources, explicit data schemas, or cleaned data, thereby understating the exploratory burden. We introduce DataClawBench, a benchmark for exploratory real-world financial data analysis under limited prior guidance. DataClawBench contains approximately 2.06 million real-world records across enterprise, industry, and policy domains, with native data noise preserved. It further includes 492 cross-domain tasks derived from think-tank consulting scenarios, each annotated with intermediate milestones that diagnose exploration and reasoning failures beyond outcome accuracy. A systematic evaluation of eight advanced LLMs under the OpenClaw agent reveals that exploratory data analysis breaks agent reliability: more exploration does not reliably translate into task-relevant progress or correct final answers.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2605.02503

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.02503 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.02503 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.