Arxiver research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
Arxiver consists of 63,357 arXiv papers converted to multi-markdown (.mmd) format. Our dataset includes original arXiv article IDs, titles, abstracts, authors, publication dates, URLs, and the corresponding markdown files for papers published between January 2023 and October 2023. We hope our dataset will be useful for applications such as semantic search, domain-specific language modeling, question answering, and summarization.
Curation: The Arxiver dataset is… See the full description on the dataset page: https://huggingface.co/datasets/neuralwork/arxiver.
Preview Rows
| # | id | title | abstract | authors | published_date | link | markdown |
|---|---|---|---|---|---|---|---|
| 1 | 2305.00379 | Image Completion via Dual-path Cooperative Filtering | Given the recent advances with image-generating algorithms, deep image completion methods have made significant progress. However, state-of… | Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger | 2023-04-30T03:54:53Z | http://arxiv.org/abs/2305.00379v1 | # Image Completion via Dual-Path Cooperative Filtering ###### Abstract Given the recent advances with image-generating algorithms, deep i… |
| 2 | 2307.16362 | High Sensitivity Beamformed Observations of the Crab Pulsar's Radio Emission | We analyzed four epochs of beamformed EVN data of the Crab Pulsar at 1658.49 MHz. With the high sensitivity resulting from resolving out th… | Rebecca Lin, Marten H. van Kerkwijk | 2023-07-31T01:36:55Z | http://arxiv.org/abs/2307.16362v2 | # High Sensitivity Beamformed Observations of the Crab Pulsar's Radio Emission ###### Abstract We analyzed four epochs of beamformed EVN … |
| 3 | 2301.07687 | Maybe, Maybe Not: A Survey on Uncertainty in Visualization | Understanding and evaluating uncertainty play a key role in decision-making. When a viewer studies a visualization that demands inference, … | Krisha Mehta | 2022-12-14T00:07:06Z | http://arxiv.org/abs/2301.07687v1 | # Maybe, Maybe Not: A Survey on Uncertainty in Visualization ###### Abstract Understanding and evaluating uncertainty play a key role in … |
| 4 | 2309.09088 | Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition | Vocoder models have recently achieved substantial progress in generating authentic audio comparable to human quality while significantly re… | Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland | 2023-09-16T20:04:16Z | http://arxiv.org/abs/2309.09088v2 | # Enhancing Gan-Based Vocoders with Contrastive Learning Under Data-Limited Condition ###### Abstract Vocoder models have recently achiev… |
| 5 | 2307.16404 | Nonvolatile Magneto-Thermal Switching in MgB2 | Ongoing research explores thermal switching materials to control heat flow. Specifically, there has been interest in magneto-thermal switch… | Hiroto Arima, Yoshikazu Mizuguchi | 2023-07-31T04:59:19Z | http://arxiv.org/abs/2307.16404v1 | # Nonvolatile Magneto-Thermal Switching in MgB\({}_{2}\) ###### Abstract Ongoing research explores thermal switching materials to control… |
| 6 | 2307.16410 | HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution | Scene text image super-resolution (STISR) is an important pre-processing technique for text recognition from low-resolution scene images. N… | Minyi Zhao, Yi Xu, Bingjia Li, Jie Wang, Jihong Guan, Shuigeng Zhou | 2023-07-31T05:32:57Z | http://arxiv.org/abs/2307.16410v1 | # HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution ###### Abstract Scene text image super-resolutio… |
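The preview rows allow a quick consistency check between the id and published_date columns. The sketch below assumes the new-style arXiv identifier layout (YYMM.number) and uses three (id, published_date) pairs copied from the table above. The helper names `id_month`, `published_month`, and `month_gap` are hypothetical and are not part of the dataset tooling.

```python
from datetime import datetime

# (id, published_date) pairs copied from preview rows 1, 3, and 4.
ROWS = [
    ("2305.00379", "2023-04-30T03:54:53Z"),
    ("2301.07687", "2022-12-14T00:07:06Z"),
    ("2309.09088", "2023-09-16T20:04:16Z"),
]

def id_month(arxiv_id: str) -> tuple[int, int]:
    """Return (year, month) encoded in a new-style arXiv id such as 2305.00379."""
    prefix = arxiv_id.split(".")[0]
    return 2000 + int(prefix[:2]), int(prefix[2:4])

def published_month(stamp: str) -> tuple[int, int]:
    """Return (year, month) from an ISO 8601 UTC timestamp ending in 'Z'."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.year, parsed.month

def month_gap(arxiv_id: str, stamp: str) -> int:
    """Months by which the id prefix is later than published_date."""
    id_year, id_mon = id_month(arxiv_id)
    pub_year, pub_mon = published_month(stamp)
    return (id_year - pub_year) * 12 + (id_mon - pub_mon)

gaps = {arxiv_id: month_gap(arxiv_id, stamp) for arxiv_id, stamp in ROWS}
```

For these three rows the gaps are 1, 1, and 0 months, so the id prefix and published_date do not always fall in the same month. Row 3 in particular carries a January 2023 id with a December 2022 published_date, which is worth keeping in mind when filtering by date.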
Data Dictionary
- id: text
- title: text
- abstract: text
- authors: text
- published_date: datetime
- link: text
- markdown: text
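The dictionary above can be turned into a small row check. This is a minimal sketch, assuming rows arrive as plain Python dicts of strings and that published_date uses the ISO 8601 form seen in the preview rows; `SCHEMA` and `coerce_row` are illustrative names, not part of the report's own pipeline.

```python
from datetime import datetime, timezone

# Field names and types as listed in the data dictionary.
SCHEMA = {
    "id": "text",
    "title": "text",
    "abstract": "text",
    "authors": "text",
    "published_date": "datetime",
    "link": "text",
    "markdown": "text",
}

def coerce_row(raw: dict) -> dict:
    """Check that a row has exactly the schema fields and parse the datetime field."""
    missing = set(SCHEMA) - set(raw)
    extra = set(raw) - set(SCHEMA)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    row = {}
    for name, kind in SCHEMA.items():
        value = raw[name]
        if kind == "datetime":
            # Preview rows use ISO 8601 with a trailing 'Z' (UTC).
            value = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(
                tzinfo=timezone.utc
            )
        elif not isinstance(value, str):
            raise TypeError(f"{name} should be text, got {type(value).__name__}")
        row[name] = value
    return row

# Preview row 5, with the long text fields shortened for illustration.
example = coerce_row({
    "id": "2307.16404",
    "title": "Nonvolatile Magneto-Thermal Switching in MgB2",
    "abstract": "Ongoing research explores thermal switching materials to control heat flow.",
    "authors": "Hiroto Arima, Yoshikazu Mizuguchi",
    "published_date": "2023-07-31T04:59:19Z",
    "link": "http://arxiv.org/abs/2307.16404v1",
    "markdown": "# Nonvolatile Magneto-Thermal Switching in MgB\\({}_{2}\\)",
})
```

Rejecting unexpected or missing fields up front keeps later steps from silently working on a partial schema.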
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from the columns actually present in the data.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
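The profiling and chart steps above can be sketched roughly as follows. This is not the report generator's actual code: the kind rules, the `max_categories` threshold, and the function names are assumptions chosen for illustration, location detection is left out, and the chart is built as a plain dict in the Plotly figure layout (`data` plus `layout`) so the example needs no third-party imports.

```python
from collections import Counter
from datetime import datetime

def _is_timestamp(value) -> bool:
    """True for strings shaped like 2023-07-31T04:59:19Z."""
    if not isinstance(value, str):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False

def infer_kind(values, max_categories=20) -> str:
    """Guess a coarse field kind: numeric, time, categorical, text, or empty."""
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "empty"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(_is_timestamp(v) for v in present):
        return "time"
    # Repeated values within a small vocabulary suggest a category.
    distinct = len(set(present))
    if distinct <= max_categories and distinct < len(present):
        return "categorical"
    return "text"

def chart_spec(column, values):
    """Return a Plotly-style figure dict, or None when no chart is a safe fit."""
    present = [v for v in values if v not in (None, "")]
    kind = infer_kind(values)
    if kind in ("numeric", "time"):
        trace = {"type": "histogram", "x": present}
    elif kind == "categorical":
        counts = Counter(present)
        trace = {"type": "bar", "x": list(counts), "y": list(counts.values())}
    else:
        # Free text and empty columns get no chart rather than a misleading one.
        return None
    return {"data": [trace], "layout": {"title": {"text": f"Distribution of {column}"}}}

# Sample values copied from the preview rows.
dates = ["2023-04-30T03:54:53Z", "2023-07-31T01:36:55Z", "2022-12-14T00:07:06Z"]
authors = ["Krisha Mehta", "Rebecca Lin, Marten H. van Kerkwijk"]

date_chart = chart_spec("published_date", dates)
author_chart = chart_spec("authors", authors)
```

Under these rules published_date profiles as a time field and yields a histogram, while authors is treated as free text and yields no chart, which is one reading of what a "defensive" specification could mean here.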
This report is intentionally reproducible. It relies only on the local catalog metadata and the generated chart specifications, and it draws no conclusions beyond what the dataset itself supports.