Helix the Robot
Helix Helix
arrow_backAll datasets

Swe Bench Pro

Hugging Face

Dataset Summary SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks. Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf See the related evaluation Github: https://github.com/scaleapi/SWE-bench_Pro-os Dataset Structure We follow SWE-Bench Verified (https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified) in terms of dataset structure, with several… See the full description on the dataset page: https://huggingface.co/datasets/ScaleAI/SWE-bench_Pro.

cloud_downloadScaleAI/SWE-bench_Pro
boltOpen in Helix

Interesting queries to try

Login to Helix

Don't have an account? Sign up here

Sign Up for Helix

Already have an account? Login here