
Openwebtext

Hugging Face

Dataset Card for "openwebtext"

Dataset Summary
An open-source replication of OpenAI's WebText dataset, which was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.

Supported Tasks and Leaderboards
More Information Needed

Languages
More Information Needed

Dataset Structure
Data Instances
plain_text
Size of downloaded dataset files: 13.51 GB
Size of the…

See the full description on the dataset page: https://huggingface.co/datasets/Skylion007/openwebtext.

File: openwebtext.parquet (500 rows)


Columns

  • text (text)
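
As a rough illustration of the schema listed above, the sketch below treats each row as a record with a single `text` field and computes a simple summary. The sample rows, the `summarize` helper, and the assumption that `text` holds plain strings are hypothetical and are not taken from the dataset itself.

```python
from typing import Dict, Iterable, Union

# Hypothetical rows shaped like the listing's schema: one column named "text".
# These strings are placeholders, not real dataset content.
sample_rows = [
    {"text": "First example document."},
    {"text": "A second, slightly longer example document."},
]


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, Union[int, float]]:
    """Return the row count and the mean character length of the text column."""
    lengths = [len(row["text"]) for row in rows]
    count = len(lengths)
    # Guard against an empty input so the mean is well defined.
    mean_chars = sum(lengths) / count if count else 0.0
    return {"rows": count, "mean_chars": mean_chars}


if __name__ == "__main__":
    print(summarize(sample_rows))
```

With the Parquet file on disk, a reader such as `pandas.read_parquet("openwebtext.parquet")` (which needs pandas plus a Parquet engine such as pyarrow) should produce a frame whose `text` column could be fed to the same helper, for example via `df.to_dict("records")`.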
