Wikipedia
Hugging Face
Wikipedia dataset containing cleaned articles in all languages. The datasets are built from the Wikipedia dumps (https://dumps.wikimedia.org/), with one split per language. Each example contains the full text of one Wikipedia article, cleaned to strip markup and unwanted sections (references, etc.).
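To illustrate what this kind of cleaning involves, here is a minimal sketch that assumes raw MediaWiki wikitext as input. The `UNWANTED_SECTIONS` set and the `clean_article` function are hypothetical. The actual dataset build uses its own parser and rules. This sketch only handles non-nested templates, simple internal links, and bold/italic quote markup. When it drops a section, it also drops that section's subsections.

```python
import re

# Hypothetical set of section titles treated as "unwanted"; the real
# dataset build may drop a different set.
UNWANTED_SECTIONS = {"references", "external links", "see also", "further reading", "notes"}

# A heading line such as "== History ==" (2 to 6 '=' on both sides).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# [[Target]] -> Target, [[Target|label]] -> label
LINK_RE = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]")
# Non-nested templates such as {{citation needed}} are removed entirely.
TEMPLATE_RE = re.compile(r"\{\{[^{}]*\}\}")
# '' (italic) and ''' (bold) quote markup.
EMPHASIS_RE = re.compile(r"'{2,}")


def clean_article(wikitext: str) -> str:
    """Strip simple wiki markup and drop unwanted sections (illustrative only)."""
    kept = []
    skip_level = None  # heading depth of the section currently being dropped
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            if skip_level is not None and level > skip_level:
                continue  # subsection of a dropped section
            if title.lower() in UNWANTED_SECTIONS:
                skip_level = level  # start dropping this section
                continue
            skip_level = None  # a sibling or higher heading ends the drop
            kept.append(title)
            continue
        if skip_level is not None:
            continue  # body text inside a dropped section
        line = TEMPLATE_RE.sub("", line)
        line = LINK_RE.sub(r"\1", line)
        line = EMPHASIS_RE.sub("", line)
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    sample = (
        "'''Paris''' is the capital of [[France]].{{citation needed}}\n"
        "== History ==\n"
        "Founded by the [[Parisii|Parisii tribe]].\n"
        "== References ==\n"
        "=== Notes ===\n"
        "Some ref.\n"
        "== Geography ==\n"
        "On the Seine."
    )
    print(clean_article(sample))
```

A line-by-line pass that tracks heading depth keeps the sketch short. Real wikitext, with nested templates, tables, and tags, needs a proper parser instead of regular expressions.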