GitHub Code Clean

Hugging Face

The GitHub Code Clean dataset is a more heavily filtered version of the codeparrot/github-code dataset. It consists of 115M code files from GitHub in 32 programming languages with 60 file extensions, totaling almost 1 TB of text data.
