Wikitext

Hugging Face

Dataset Card for "wikitext"

Dataset Summary

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License.

Compared to the preprocessed version of Penn Treebank (PTB), WikiText-2 is over 2 times larger and WikiText-103 is over 110 times larger. The WikiText dataset also features a far larger…

See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/wikitext

File: salesforce--wikitext--wikitext-103-raw-v1.parquet (500 rows)
Source: Salesforce/wikitext
Columns

  • text (type: text)