Helix the Robot

TxT360

Hugging Face

TxT360: A Top-Quality LLM Pre-training Dataset Requires the Perfect Blend

Changelog

v1.1: Added new data sources: TxT360_BestOfWeb, TxT360_QA, europarl-aligned, and wikipedia_extended.

Details of v1.1 additions

TxT360_BestOfWeb: This is a filtered version of the TxT360 dataset, created using the ProX document filtering model. The model is similar to the FineWeb-Edu classifier, but also assigns an additional format score that…

See the full description on the dataset page: https://huggingface.co/datasets/LLM360/TxT360.

Rows: 500
Source: LLM360/TxT360

Columns

  • text (text)
  • meta (mixed)
  • subset (categorical)
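To make the column list concrete, here is a minimal Python sketch of a row with these three columns, plus helpers that tally and filter rows by `subset`. The `TxT360Row` type, the `count_by_subset` and `filter_subset` helpers, and the sample rows are all hypothetical. `meta` is modeled as a plain dict only because its type is listed as "mixed", and the sample `subset` values borrow source names from the changelog purely as placeholders, so the real labels may differ.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, TypedDict


class TxT360Row(TypedDict):
    # Hypothetical row type mirroring the three listed columns.
    text: str             # "text" column: the document body
    meta: Dict[str, Any]  # "mixed" column, modeled as a free-form dict (assumption)
    subset: str           # categorical column naming the row's subset


def count_by_subset(rows: Iterable[TxT360Row]) -> Counter:
    # Count how many rows carry each categorical `subset` label.
    return Counter(row["subset"] for row in rows)


def filter_subset(rows: Iterable[TxT360Row], name: str) -> List[TxT360Row]:
    # Keep only the rows whose `subset` label equals `name`.
    return [row for row in rows if row["subset"] == name]


# Placeholder rows; the subset labels are illustrative only.
rows: List[TxT360Row] = [
    {"text": "First document ...", "meta": {}, "subset": "wikipedia_extended"},
    {"text": "Second document ...", "meta": {}, "subset": "europarl-aligned"},
    {"text": "Third document ...", "meta": {}, "subset": "wikipedia_extended"},
]

counts = count_by_subset(rows)
print(counts.most_common())  # [('wikipedia_extended', 2), ('europarl-aligned', 1)]
print(len(filter_subset(rows, "europarl-aligned")))  # 1
```

Because `subset` is categorical, a `Counter` is a natural way to check how a sample such as the 500 rows above is split; the same helpers should apply unchanged to rows from the real dataset, assuming they arrive as dicts with these keys.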
