Trec

Hugging Face

The Text REtrieval Conference (TREC) Question Classification dataset contains 5,500 labeled questions in the training set and another 500 in the test set. The dataset has 6 coarse class labels and 50 fine class labels. The average sentence length is 10, and the vocabulary size is 8,700. The data were collected from four sources: 4,500 English questions published by USC (Hovy et al., 2001), about 500 manually constructed questions for a few rare classes, 894 TREC 8 and TREC 9 questions, and 500 questions from TREC 10, which serve as the test set. All questions were labeled manually.
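
To make the two label levels concrete, here is a minimal sketch that splits one labeled line into its coarse class, fine class, and question text. It assumes the `COARSE:fine question text` line layout used by the original TREC question classification label files; the helper name `parse_trec_line`, the `LabeledQuestion` type, and the sample question are hypothetical and do not come from this page or from any library.

```python
from typing import NamedTuple

# The six coarse classes of the TREC question classification task.
COARSE_LABELS = ("ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM")


class LabeledQuestion(NamedTuple):
    coarse: str  # one of the 6 coarse classes, e.g. "LOC"
    fine: str    # fine class within the coarse class, e.g. "city"
    text: str    # the question itself


def parse_trec_line(line: str) -> LabeledQuestion:
    """Split a 'COARSE:fine question text' line (assumed format) into parts."""
    # The label is everything before the first space; the rest is the question.
    label, _, text = line.strip().partition(" ")
    coarse, sep, fine = label.partition(":")
    if not sep or coarse not in COARSE_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return LabeledQuestion(coarse, fine, text)


# Hypothetical sample line, written for illustration only.
example = parse_trec_line("LOC:city What is the capital of France ?")
print(example.coarse, example.fine, example.text)
```

Keeping the coarse and fine parts as separate fields lets the same records serve either the 6-way or the 50-way classification task.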
