Youtube Transcriptions

Hugging Face

The YouTube transcriptions dataset contains technical tutorials (currently from James Briggs, Daniel Bourke, and AI Coffee Break) transcribed using OpenAI's Whisper (large). Each row holds a roughly sentence-length chunk of text alongside the video URL and timestamp. Because each item contains only a short chunk of text, most use cases will likely require merging multiple rows to create more substantial chunks. If you need to do that, this code snippet will… See the full description on the dataset page: https://huggingface.co/datasets/jamescalam/youtube-transcriptions.
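The snippet mentioned in the description is truncated here, so the following is not the dataset author's code. It is a minimal sketch of one way to merge rows into larger, overlapping chunks, assuming the rows belong to a single video and are already sorted by start. The function name merge_rows and the window and stride parameters are hypothetical; the sample rows are copied from the data preview on this page.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_rows(rows: List[Row], window: int = 6, stride: int = 3) -> List[Row]:
    """Merge consecutive rows into larger chunks (hypothetical helper).

    Assumes all rows come from one video and are sorted by `start`.
    `window` is the number of rows per chunk; `stride` is how many rows
    to advance between chunks, so stride < window gives overlapping chunks.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    merged: List[Row] = []
    for i in range(0, len(rows), stride):
        group = rows[i:i + window]
        merged.append({
            "id": group[0]["id"],                       # reuse the first row's id
            "url": group[0]["url"],
            "text": " ".join(str(r["text"]) for r in group),
            "start": group[0]["start"],                 # chunk starts with its first row
            "end": group[-1]["end"],                    # and ends with its last row
        })
    return merged


# First four rows of the preview, reduced to the fields used above.
URL = "https://youtu.be/35Pdoyi6ZoQ"
rows = [
    {"id": "35Pdoyi6ZoQ-t0.0", "url": URL, "text": "Hi, welcome to the video.",
     "start": 0.0, "end": 9.36},
    {"id": "35Pdoyi6ZoQ-t3.0", "url": URL,
     "text": "So this is the fourth video in a Transformers",
     "start": 3.0, "end": 11.56},
    {"id": "35Pdoyi6ZoQ-t9.36", "url": URL, "text": "from Scratch mini series.",
     "start": 9.36, "end": 15.84},
    {"id": "35Pdoyi6ZoQ-t11.56", "url": URL,
     "text": "So if you haven't been following along,",
     "start": 11.56, "end": 18.48},
]

chunks = merge_rows(rows, window=3, stride=2)
for chunk in chunks:
    print(chunk["start"], chunk["end"], chunk["text"])
```

For the full dataset, which covers several videos, the rows would first need to be grouped by video_id so that a chunk never spans two videos.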

File: jamescalam--youtube-transcriptions.parquet · 500 rows · Source: jamescalam/youtube-transcriptions
Data preview

500 rows · 9 columns · showing first 12
# | title (text) | published (text) | url (text) | video_id (text) | channel_id (text) | id (text) | text (text) | start (float) | end (float)
1 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t0.0 | Hi, welcome to the video. | 0 | 9.36
2 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t3.0 | So this is the fourth video in a Transformers | 3 | 11.56
3 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t9.36 | from Scratch mini series. | 9.36 | 15.84
4 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t11.56 | So if you haven't been following along, | 11.56 | 18.48
5 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t15.84 | we've essentially covered what you can see on the screen. | 15.84 | 20.6
6 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t18.48 | So we got some data. | 18.48 | 23.72
7 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t20.6 | We built a tokenizer with it. | 20.6 | 25.76
8 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t23.72 | And then we've set up our input pipeline | 23.72 | 28.48
9 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t25.76 | ready to begin actually training our model, which | 25.76 | 32.36
10 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t28.48 | is what we're going to cover in this video. | 28.48 | 35.96
11 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t32.36 | So let's move over to the code. | 32.36 | 39.56
12 | Training and Testing an Italian BERT - Transformers From Scratch #4 | 2021-07-06 13:00:03 UTC | https://youtu.be/35Pdoyi6ZoQ | 35Pdoyi6ZoQ | UCv83tO5cePwHMt1952IVVHw | 35Pdoyi6ZoQ-t35.96 | And we see here that we have essentially everything | 35.96 | 40.48
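Since every row carries both a video URL and a start time, a row can be turned into a link that opens the video at that moment. The sketch below assumes the url column always holds a youtu.be-style link, as in the preview rows, and uses YouTube's t query parameter (start time in whole seconds). The helper name timestamp_url is hypothetical.

```python
def timestamp_url(url: str, start: float) -> str:
    """Return `url` with a start-time parameter appended (hypothetical helper).

    The start time is truncated to whole seconds so that playback begins
    at or just before the beginning of the transcribed chunk.
    """
    seconds = int(start)
    # Use '&' if the URL already has a query string, otherwise start one.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}t={seconds}"


# Row 3 of the preview: start = 9.36
link = timestamp_url("https://youtu.be/35Pdoyi6ZoQ", 9.36)
print(link)
```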

Auto-generated charts

Youtube Transcriptions: 500 rows by 9 columns. These exploratory charts are generated automatically from the data. Open the dataset in Helix to ask your own questions.

Rows: 500
Columns: 9
Numeric cols: 2
Categorical cols: 5

Charts

start vs end

Relationship between start and end.

Distribution of start

Histogram of start values.

Columns

  • title categorical
  • published categorical
  • url categorical
  • video_id categorical
  • channel_id categorical
  • id text
  • text text
  • start numeric
  • end numeric