Arxiv Classification

Name: Arxiv Classification
Creator: Helix
Keywords: dataset, hugging face, language:en, research, size_categories:10K<n<100K, task_categories:text-classification, task_ids:multi-class-classification, task_ids:topic-classification

Hugging Face

Arxiv Classification: a classification of arXiv papers (11 classes). This dataset is intended for long-context classification (all documents have more than 4k tokens). Copied from "Long Document Classification From Local Word Glimpses via Recurrent Attention Learning" @ARTICLE{8675939, author={He, Jun and Wang, Liqun and Liu, Liu and Feng, Jiao and Wu, Hao}, journal={IEEE Access}, title={Long Document Classification From Local Word Glimpses via Recurrent Attention Learning}, year={2019}… See the full description on the dataset page: https://huggingface.co/datasets/ccdv/arxiv-classification.
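
The description states that every document exceeds 4k tokens. A minimal sketch of how that could be checked on loaded rows, assuming whitespace-separated words as a rough stand-in for tokens (the source does not say which tokenizer the figure refers to). The helper names and the 4,000 threshold are illustrative assumptions, and the sample inputs are synthetic, not rows from the dataset:

```python
def count_whitespace_tokens(text: str) -> int:
    """Rough token count: number of whitespace-separated pieces."""
    return len(text.split())


def short_documents(texts, min_tokens=4000):
    """Return indices of texts with at most `min_tokens` whitespace tokens."""
    return [
        i for i, t in enumerate(texts)
        if count_whitespace_tokens(t) <= min_tokens
    ]


# Tiny synthetic inputs (assumption: stand-ins for the `text` column).
samples = ["word " * 5000, "too short"]
flagged = short_documents(samples)
print(flagged)
```

A subword tokenizer would usually report more tokens than this word count, so documents passing this check would likely pass a tokenizer-based one as well.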

ccdv--arxiv-classification.parquet · 500 rows · ccdv/arxiv-classification

Data preview

500 rows · 2 columns · showing first 12

#	text (text)	label (integer)
1	Constrained Submodular Maximization via a Non-symmetric Technique arXiv:1611.03253v1 [cs.DS] 10 Nov 2016 Niv Buchbinder∗ Moran Feldman† …	8
2	Self Organizing Maps Whose Topologies Can Be Learned With Adaptive Binary Search Trees Using Conditional Rotations arXiv:1506.02750v1 [cs.…	9
3	Robust Satisfaction of Temporal Logic Specifications via Reinforcement Learning arXiv:1510.06460v1 [cs.SY] 22 Oct 2015 Austin Jones1 , De…	3
4	BATCHED QR AND SVD ALGORITHMS ON GPUS WITH APPLICATIONS IN HIERARCHICAL MATRIX COMPRESSION arXiv:1707.05141v1 [cs.MS] 13 Jul 2017 WAJIH H…	8
5	Analytical and simplified models for dynamic analysis of short skew bridges under moving loads arXiv:1704.07285v2 [cs.CE] 12 Feb 2018 K. …	5
6	Efficient PAC Learning from the Crowd arXiv:1703.07432v2 [cs.LG] 13 Apr 2017 Pranjal Awasthi∗ Avrim Blum† Nika Haghtalab‡ Yishay Manso…	8
7	Automated Identification of Trampoline Skills Using Computer Vision Extracted Pose Estimation Paul W. Connolly, Guenole C. Silvestre and Ch…	1
8	Parsing methods streamlined arXiv:1309.7584v1 [cs.FL] 29 Sep 2013 Luca Breveglieri Stefano Crespi Reghizzi Angelo Morzenti Dipartiment…	6
9	I/O-Efficient Similarity Join⋆ Rasmus Pagh, Ninh Pham, Francesco Silvestri⋆⋆ , and Morten Stöckel⋆ ⋆ ⋆ arXiv:1507.00552v2 [cs.DS] 28 Mar …	8
10	arXiv:1207.0612v2 [math.AC] 21 Jan 2013 COMPLETION BY DERIVED DOUBLE CENTRALIZER MARCO PORTA, LIRAN SHAUL AND AMNON YEKUTIELI Abstract. Le…	0
11	NDT: Neual Decision Tree Towards Fully Functioned Neural Graph Han Xiao 1 arXiv:1712.05934v1 [cs.NE] 16 Dec 2017 Abstract Though traditi…	9
12	Alignment Elimination from Adams’ Grammars Härmel Nestra1 1 Institute of Computer Science, University of Tartu, J. Liivi 2, 50409 Tartu, E…	6

Auto-generated charts

Arxiv Classification: 500 rows by 2 columns. These exploratory charts are generated automatically from the data.

Rows: 500

Columns: 2

Numeric columns: 1

Charts

Distribution of label

Histogram of label values.
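
A histogram of `label` amounts to counting how often each class id occurs. A minimal sketch using only the 12 labels visible in the data preview, not the full 500 rows, so the counts say nothing about the overall class distribution:

```python
from collections import Counter

# Labels of the 12 preview rows, in the order shown in the data preview.
preview_labels = [8, 9, 3, 8, 5, 8, 1, 6, 8, 0, 9, 6]

label_counts = Counter(preview_labels)

# Print one text bar per class id, sorted by id.
for label in sorted(label_counts):
    print(f"{label:>2} | {'#' * label_counts[label]}")
```

Running the same count over the full `label` column would give the distribution that the chart summarizes.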

Columns

text: text
label: numeric