Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents Paper • 2410.13886 • Published Oct 11, 2024
HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help? Paper • 2604.09408 • Published 15 days ago
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences? Paper • 2604.10718 • Published Apr 12