Ai Arxiv Chunked

Hugging Face dataset: jamescalam/ai-arxiv-chunked

File: jamescalam--ai-arxiv-chunked.parquet · 500 rows · Source: jamescalam/ai-arxiv-chunked

Data preview

500 rows · 15 columns · showing first 12
Columns (each displayed as text): doi, chunk-id, chunk, id, title, summary, source, authors, categories, comment, journal_ref, primary_category, published, updated, references

All 12 previewed rows share these values:

  • doi: 1910.01108
  • id: 1910.01108
  • title: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  • summary: As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large…
  • source: http://arxiv.org/pdf/1910.01108
  • authors: ['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf']
  • categories: ['cs.CL']
  • comment: February 2020 - Revision: fix bug in evaluation metrics, updated metrics, argumentation unchanged. 5 pages, 1 figure, 4 tables. Accepted …
  • journal_ref: (empty)
  • primary_category: cs.CL
  • published: 20191002
  • updated: 20200301
  • references: [{'id': '1910.01108'}]

The rows differ only in chunk-id and chunk:

# | chunk-id | chunk
1 | 0 | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter Victor SANH, Lysandre DEBUT, Julien CHAUMOND, Thomas WOLF Hug…
2 | 1 | loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train an…
3 | 2 | in real-time has the potential to enable novel and interesting language processing applications, the growing computational and memory requi…
4 | 3 | through distillation via the supervision of a bigger Transformer language model can achieve similar performance on a variety of downstream …
5 | 4 | generalization capabilities of the model and how well it will perform on the test set3. Training loss The student is trained with a distill…
6 | 5 | and teacher hidden states vectors. 3 DistilBERT: a distilled version of BERT Student architecture In the present work, the student - Distil…
7 | 6 | 3E.g. BERT-base’s predictions for a masked token in " I think this is the beginning of a beautiful [MASK] " comprise two high probability t…
8 | 7 | performance on downstream tasks. Comparison on downstream tasks: IMDb (test accuracy) and SQuAD 1.1 (EM/F1 on dev set). D: with a second st…
9 | 8 | examples per batch) using dynamic masking and without the next sentence prediction objective. Data and compute power We train DistilBERT on…
10 | 9 | et al. [2018]) encoder followed by two BiLSTMs.4 The results on each of the 9 tasks are showed on Table 1 along with the macro-score (avera…
11 | 10 | We also studied whether we could add another step of distillation during the adaptation phase by fine-tuning DistilBERT on SQuAD using a BER…
12 | 11 | Size and inference speed To further investigate the speed-up/size trade-off of DistilBERT, we compare (in Table 3) the number of parameters…
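
In the preview, authors appear as a bracketed, space-separated list of quoted names, and published/updated appear as eight-digit YYYYMMDD strings. A minimal sketch of turning those strings into usable values, assuming every row follows the formats seen in the first 12 rows; the helper names parse_authors and parse_compact_date are hypothetical, not part of Helix or the dataset:

```python
import re
from datetime import date, datetime


def parse_authors(raw: str) -> list[str]:
    """Extract the single-quoted names from a string like "['A' 'B']"."""
    return re.findall(r"'([^']*)'", raw)


def parse_compact_date(raw: str) -> date:
    """Convert an eight-digit YYYYMMDD string into a date."""
    return datetime.strptime(raw, "%Y%m%d").date()


# Values copied from the previewed rows.
authors = parse_authors(
    "['Victor Sanh' 'Lysandre Debut' 'Julien Chaumond' 'Thomas Wolf']"
)
published = parse_compact_date("20191002")
updated = parse_compact_date("20200301")

print(authors)
print(published.isoformat(), updated.isoformat())
```

The regular expression only handles names wrapped in single quotes; a name containing an apostrophe would likely be quoted differently and need extra handling.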

Auto-generated charts

Ai Arxiv Chunked: 500 rows by 15 columns. These exploratory charts are generated automatically from the data; open the dataset in Helix to ask your own questions.

Rows: 500
Columns: 15
Categorical columns: 9
Date range: 2015-06-22 → 2022-05-19

Charts

doi by record count

Most common doi values across records.
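
The "doi by record count" chart ranks doi values by the number of rows that carry them. A minimal sketch of that tally, assuming the rows are available as dictionaries keyed by column name. The function name doi_record_counts is hypothetical, the sample rows are illustrative only (the second DOI is a made-up placeholder, not a value from the dataset), and this is not necessarily how Helix computes the chart:

```python
from collections import Counter

# Illustrative rows; only the "doi" key matters for the tally.
rows = [
    {"doi": "1910.01108", "chunk-id": "0"},
    {"doi": "1910.01108", "chunk-id": "1"},
    {"doi": "1910.01108", "chunk-id": "2"},
    {"doi": "0000.00000", "chunk-id": "0"},  # placeholder DOI, not real data
]


def doi_record_counts(rows, top_n=10):
    """Return (doi, record count) pairs, most frequent first."""
    counts = Counter(row["doi"] for row in rows)
    return counts.most_common(top_n)


top = doi_record_counts(rows)
print(top)
```

Because each paper is split into many chunks, a paper with a longer text contributes more rows, so the ranking reflects chunk counts rather than the number of distinct papers.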

Columns

  • doi categorical
  • chunk-id text
  • chunk text
  • id categorical
  • title categorical
  • summary categorical
  • source categorical
  • authors mixed
  • categories mixed
  • comment categorical
  • journal_ref categorical
  • primary_category categorical
  • published categorical
  • updated datetime
  • references mixed
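
The column list can be cross-checked against the summary figures (15 columns, 9 of them categorical). A small sketch that tallies the type labels exactly as listed above; the dictionary is transcribed by hand from this page, and the labels are Helix's own, whose inference rules are not documented here:

```python
from collections import Counter

# Column name -> type label, transcribed from the list above.
column_types = {
    "doi": "categorical",
    "chunk-id": "text",
    "chunk": "text",
    "id": "categorical",
    "title": "categorical",
    "summary": "categorical",
    "source": "categorical",
    "authors": "mixed",
    "categories": "mixed",
    "comment": "categorical",
    "journal_ref": "categorical",
    "primary_category": "categorical",
    "published": "categorical",
    "updated": "datetime",
    "references": "mixed",
}

type_counts = Counter(column_types.values())
print(len(column_types), dict(type_counts))
```

Note that published is labelled categorical while updated is labelled datetime, even though both appear as eight-digit date strings in the preview.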
