
Arxiver

Hugging Face

Arxiver Dataset

Arxiver consists of 63,357 arXiv papers converted to multi-markdown (.mmd) format. Our dataset covers papers published between January 2023 and October 2023 and includes the original arXiv article IDs, titles, abstracts, authors, publication dates, URLs, and the corresponding markdown files. We hope our dataset will be useful for applications such as semantic search, domain-specific language modeling, question answering, and summarization.

Curation

The Arxiver dataset is… See the full description on the dataset page: https://huggingface.co/datasets/neuralwork/arxiver.

File: neuralwork--arxiver.parquet (500 rows), from neuralwork/arxiver

Columns

  • id text
  • title text
  • abstract text
  • authors text
  • published_date datetime
  • link text
  • markdown text
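
The columns above lend themselves to simple SQL-style exploration. The sketch below is only an illustration, not the queries offered on the original page: it builds an in-memory SQLite table with the same column names, fills it with made-up placeholder rows, and runs three example queries. The table name `arxiver`, the helper function names, and every row value are assumptions; `published_date` is stored as ISO 8601 text because SQLite has no native datetime type.

```python
import sqlite3

# Hypothetical table name; column names follow the Columns list.
# published_date is kept as ISO 8601 text (SQLite has no datetime type).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE arxiver (
        id TEXT,
        title TEXT,
        abstract TEXT,
        authors TEXT,
        published_date TEXT,
        link TEXT,
        markdown TEXT
    )
    """
)

# Made-up placeholder rows, NOT real records from the dataset.
rows = [
    ("0000.00001", "Example Paper A", "An abstract about semantic search.",
     "A. Author", "2023-01-15T00:00:00", "https://example.org/a",
     "# Example A\n\nBody text."),
    ("0000.00002", "Example Paper B", "An abstract about summarization.",
     "B. Author, C. Author", "2023-01-20T00:00:00", "https://example.org/b",
     "# Example B\n\nLonger body text here."),
    ("0000.00003", "Example Paper C", "An abstract about question answering.",
     "D. Author", "2023-10-02T00:00:00", "https://example.org/c",
     "# Example C\n\nShort."),
]
conn.executemany("INSERT INTO arxiver VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def papers_per_month(conn):
    """Count papers by publication month (YYYY-MM), oldest month first."""
    return conn.execute(
        "SELECT strftime('%Y-%m', published_date) AS month, COUNT(*) "
        "FROM arxiver GROUP BY month ORDER BY month"
    ).fetchall()


def search_abstracts(conn, keyword):
    """Return (id, title) for papers whose abstract contains the keyword."""
    return conn.execute(
        "SELECT id, title FROM arxiver WHERE abstract LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    ).fetchall()


def longest_papers(conn, limit=5):
    """Return (id, character count) for the longest markdown bodies."""
    return conn.execute(
        "SELECT id, LENGTH(markdown) AS n_chars FROM arxiver "
        "ORDER BY n_chars DESC LIMIT ?",
        (limit,),
    ).fetchall()
```

Calling `papers_per_month(conn)`, `search_abstracts(conn, "summarization")`, or `longest_papers(conn, 1)` runs each query against the placeholder rows. The same SQL should carry over to the real parquet file in any engine that exposes these columns, though the date function may differ outside SQLite.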
