Reddit Finance 43 250K
reddit_finance_43_250k is a collection of 250k post/comment pairs from 43 financial, investing, and crypto subreddits. Posts must be text-only, have a length of 250 characters, and have a positive score. Each subreddit is narrowed down to the 70th quantile before being merged with its top 3 comments and then with the other subreddits. Further score-based methods are used to select the top 250k post/comment pairs. The code to recreate the dataset is here:… See the full description on the dataset page: https://huggingface.co/datasets/winddude/reddit_finance_43_250k.
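The selection steps described above can be sketched roughly as follows. This is not the dataset author's code, only a minimal illustration under stated assumptions: the 250-character rule is read as a minimum length, "narrowed down to the 70th quantile" is read as keeping posts whose score is at or above the subreddit's 70th percentile, and the helper names (`percentile`, `select_pairs`), the `score` and `post_id` fields, and all toy rows are hypothetical. The final score-based selection of the top 250k pairs is not covered.

```python
from collections import defaultdict


def percentile(scores, pct):
    """Nearest-rank percentile of a non-empty list (integer math avoids float rounding)."""
    ordered = sorted(scores)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100)
    return ordered[rank - 1]


def select_pairs(posts, comments, min_chars=250, pct=70, top_n=3):
    """Filter posts, apply a per-subreddit score cutoff, attach top comments."""
    # Assumption: keep posts with at least `min_chars` characters and a positive score.
    eligible = [p for p in posts if len(p["selftext"]) >= min_chars and p["score"] > 0]

    by_sub = defaultdict(list)
    for post in eligible:
        by_sub[post["subreddit"]].append(post)

    comments_by_post = defaultdict(list)
    for comment in comments:
        comments_by_post[comment["post_id"]].append(comment)

    pairs = []
    for subreddit, sub_posts in by_sub.items():
        # Assumption: the cutoff is computed separately for each subreddit.
        cutoff = percentile([p["score"] for p in sub_posts], pct)
        for post in sub_posts:
            if post["score"] < cutoff:
                continue
            ranked = sorted(comments_by_post[post["id"]],
                            key=lambda c: c["score"], reverse=True)
            for comment in ranked[:top_n]:
                pairs.append({
                    "id": post["id"],
                    "subreddit": subreddit,
                    "selftext": post["selftext"],
                    "comment_id": comment["id"],
                    "body": comment["body"],
                })
    return pairs


# Toy data; every value is invented for illustration.
long_text = "x" * 300
posts = [
    {"id": "a1", "subreddit": "investing", "selftext": long_text, "score": 10},
    {"id": "a2", "subreddit": "investing", "selftext": long_text, "score": 20},
    {"id": "a3", "subreddit": "investing", "selftext": long_text, "score": 30},
    {"id": "a4", "subreddit": "investing", "selftext": long_text, "score": 40},
    {"id": "a5", "subreddit": "investing", "selftext": "too short", "score": 99},
    {"id": "a6", "subreddit": "investing", "selftext": long_text, "score": 0},
    {"id": "b1", "subreddit": "stocks", "selftext": long_text, "score": 7},
]
comments = [
    {"id": "c1", "post_id": "a4", "body": "first", "score": 5},
    {"id": "c2", "post_id": "a4", "body": "second", "score": 1},
    {"id": "c3", "post_id": "a4", "body": "third", "score": 9},
    {"id": "c4", "post_id": "a4", "body": "fourth", "score": 3},
    {"id": "c5", "post_id": "a3", "body": "fifth", "score": 2},
    {"id": "c6", "post_id": "a1", "body": "sixth", "score": 50},
    {"id": "c7", "post_id": "b1", "body": "seventh", "score": 4},
]

pairs = select_pairs(posts, comments)
for pair in pairs:
    print(pair["subreddit"], pair["id"], pair["comment_id"])
```

Each output row pairs one post with one of its comments, which mirrors the flat post/comment layout of the columns listed further down.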
Interesting queries to try
Columns
- id text
- title text
- selftext text
- z_score numeric
- normalized_score numeric
- subreddit text
- body text
- comment_normalized_score numeric
- combined_score numeric
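
As a starting point for queries against these columns, the sketch below builds an in-memory SQLite table with the same column names and types and runs one aggregate query. The table name `reddit_finance` and every row are invented for illustration, and the score values are arbitrary placeholders; how the real score columns are computed is not described here.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Column names and types follow the list above; the table name is hypothetical.
conn.execute("""
    CREATE TABLE reddit_finance (
        id TEXT,
        title TEXT,
        selftext TEXT,
        z_score NUMERIC,
        normalized_score NUMERIC,
        subreddit TEXT,
        body TEXT,
        comment_normalized_score NUMERIC,
        combined_score NUMERIC
    )
""")

# Made-up rows: one row per post/comment pair, so a post id can repeat.
rows = [
    ("p1", "Title one", "Post text one", 1.2, 0.8, "investing", "Comment A", 0.5, 0.65),
    ("p1", "Title one", "Post text one", 1.2, 0.8, "investing", "Comment B", 0.9, 0.85),
    ("p2", "Title two", "Post text two", 0.4, 0.3, "stocks", "Comment C", 0.7, 0.5),
]
conn.executemany(
    "INSERT INTO reddit_finance VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
)

# Count pairs per subreddit and report the highest combined_score in each.
query = """
    SELECT subreddit, COUNT(*) AS pairs, MAX(combined_score) AS best
    FROM reddit_finance
    GROUP BY subreddit
    ORDER BY best DESC
"""
result = conn.execute(query).fetchall()
for subreddit, pair_count, best in result:
    print(subreddit, pair_count, best)
```

The same SELECT statement should carry over to any SQL interface that exposes the dataset, once the table name is adjusted to match.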