Ai Arxiv Chunked
Hugging Face dataset: jamescalam/ai-arxiv-chunked
Data preview
500 rows · 15 columns · showing first 12

| # | doi (text) | chunk-id (text) | chunk (text) | id (text) | title (text) | summary (text) | source (text) | authors (text) | categories (text) | comment (text) | journal_ref (text) | primary_category (text) | published (text) | updated (text) | references (text) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1910.01108 | 0 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Victor SANH, Lysandre DEBUT, Julien CHAUMOND, Thomas WOLF Hug… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 2 | 1910.01108 | 1 | loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train an… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 3 | 1910.01108 | 2 | in real-time has the potential to enable novel and interesting language processing applications, the growing computational and memory requi… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 4 | 1910.01108 | 3 | through distillation via the supervision of a bigger Transformer language model can achieve similar performance on a variety of downstream … | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 5 | 1910.01108 | 4 | generalization capabilities of the model and how well it will perform on the test set3. Training loss The student is trained with a distill… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 6 | 1910.01108 | 5 | and teacher hidden states vectors. 3 DistilBERT: a distilled version of BERT Student architecture In the present work, the student - Distil… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 7 | 1910.01108 | 6 | 3E.g. BERT-base’s predictions for a masked token in " I think this is the beginning of a beautiful [MASK] " comprise two high probability t… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 8 | 1910.01108 | 7 | performance on downstream tasks. Comparison on downstream tasks: IMDb (test accuracy) and SQuAD 1.1 (EM/F1 on dev set). D: with a second st… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 9 | 1910.01108 | 8 | examples per batch) using dynamic masking and without the next sentence prediction objective. Data and compute power We train DistilBERT on… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 10 | 1910.01108 | 9 | et al. [2018]) encoder followed by two BiLSTMs.4 The results on each of the 9 tasks are showed on Table 1 along with the macro-score (avera… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 11 | 1910.01108 | 10 | We also studied whether we could add another step of distillation during the adaptation phase by fine-tuning DistilBERT on SQuAD using a BER… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
| 12 | 1910.01108 | 11 | Size and inference speed To further investigate the speed-up/size trade-off of DistilBERT, we compare (in Table 3) the number of parameters… | 1910.01108 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large… | http://arxiv.org/pdf/1910.01108 | ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf'] | ['cs.CL'] | February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted … | | cs.CL | 20191002 | 20200301 | [{'id': '1910.01108'}] |
Auto-generated charts
Ai Arxiv Chunked: 500 rows by 15 columns. These exploratory charts are generated automatically from the data.
- Rows: 500
- Columns: 15
- Categorical columns: 9
- Date range: 2015-06-22 → 2022-05-19
Charts
doi by record count
Most common doi values across records.
Columns
- doi: categorical
- chunk-id: text
- chunk: text
- id: categorical
- title: categorical
- summary: categorical
- source: categorical
- authors: mixed
- categories: mixed
- comment: categorical
- journal_ref: categorical
- primary_category: categorical
- published: categorical
- updated: datetime
- references: mixed
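
Several columns in the preview hold structured values stored as plain text: `published` and `updated` look like `YYYYMMDD` strings, and `authors` and `categories` look like space-separated lists of single-quoted items. The sketch below shows one way to turn those strings into native Python values. It assumes the fields arrive exactly as rendered in the preview, which may not match how the dataset stores them. The helper names `parse_yyyymmdd` and `parse_quoted_list` are hypothetical, and the `row` dictionary is copied by hand from the first preview row rather than loaded from the dataset.

```python
import re
from datetime import date, datetime


def parse_yyyymmdd(value: str) -> date:
    """Parse a compact date string such as '20191002' (assumed YYYYMMDD)."""
    return datetime.strptime(value, "%Y%m%d").date()


def parse_quoted_list(value: str) -> list:
    """Extract single-quoted items from a string like "['a' 'b']".

    Assumption: items never contain a single quote themselves; a name with
    an apostrophe would be split incorrectly by this simple pattern.
    """
    return re.findall(r"'([^']*)'", value)


# Values copied from the first row of the data preview (not loaded live).
row = {
    "doi": "1910.01108",
    "chunk-id": "0",
    "authors": "['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf']",
    "categories": "['cs.CL']",
    "published": "20191002",
    "updated": "20200301",
}

published = parse_yyyymmdd(row["published"])
updated = parse_yyyymmdd(row["updated"])
authors = parse_quoted_list(row["authors"])
categories = parse_quoted_list(row["categories"])

print(published.isoformat(), updated.isoformat())
print(authors)
print(categories)
```

Parsing the dates into `date` objects makes range filters and sorting straightforward, which is harder to do reliably while `published` is still treated as a categorical string.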