C4
Hugging Face
A colossal, cleaned version of Common Crawl's web crawl corpus, based on the Common Crawl dataset: "https://commoncrawl.org". This is AllenAI's processed version of Google's C4 dataset.
This dataset hasn't been imported yet, so it can't be charted here. You can browse it on Hugging Face.