Name: Reddit
Creator: Helix
Keywords: annotations_creators:no-annotation, dataset, hugging face, language_creators:crowdsourced, multilinguality:monolingual, social, source_datasets:original, task_categories:summarization


This corpus contains preprocessed posts from the Reddit dataset. It consists of 3,848,330 posts, with an average length of 270 words for the content and 28 words for the summary. The features are strings: author, body, normalizedBody, content, summary, subreddit, subreddit_id. For summarization, content is used as the document and summary is used as the target summary.
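As a rough illustration of how the content and summary fields pair up for summarization, here is a minimal Python sketch. The field names come from the description above; the helper names (to_summarization_pair, word_count) and the sample record are hypothetical and invented for this example, not taken from the dataset or from any library.

```python
from typing import Dict, Tuple

# Field names as listed in the dataset description; every value is a string.
FIELDS = (
    "author",
    "body",
    "normalizedBody",
    "content",
    "summary",
    "subreddit",
    "subreddit_id",
)


def to_summarization_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Return (document, summary): content is the document, summary the target."""
    return record["content"], record["summary"]


def word_count(text: str) -> int:
    """Count whitespace-separated words, a simple proxy for post length."""
    return len(text.split())


# Invented sample record for illustration only (assumption, not real data).
sample = {
    "author": "example_user",
    "body": "I spent all weekend fixing my bike. TL;DR: bike works now.",
    "normalizedBody": "I spent all weekend fixing my bike. TL;DR: bike works now.",
    "content": "I spent all weekend fixing my bike.",
    "summary": "bike works now.",
    "subreddit": "example_subreddit",
    "subreddit_id": "t5_example",
}

document, summary = to_summarization_pair(sample)
print("document words:", word_count(document))
print("summary words:", word_count(summary))
```

Averaging word_count over all posts is one way the 270-word and 28-word figures could be checked, assuming words are split on whitespace.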

This dataset hasn't been imported yet, so it can't be charted here. You can browse it on Hugging Face.

Interesting queries to try

top 10 rows of Reddit with summary statistics
counts grouped by the most common field in Reddit
summary charts for Reddit
