YouTube Transcriptions

Hugging Face

The YouTube transcriptions dataset contains technical tutorials (currently from James Briggs, Daniel Bourke, and AI Coffee Break) transcribed using OpenAI's Whisper (large). Each row represents roughly a sentence-length chunk of text alongside the video URL and timestamp. Because each row holds only a short chunk of text, most use cases will likely need to merge multiple rows to create more substantial chunks. If you need to do that, this code snippet will… See the full description on the dataset page: https://huggingface.co/datasets/jamescalam/youtube-transcriptions.
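
The snippet referred to in the description is cut off here, so what follows is only a minimal sketch of one way to merge rows, not the dataset author's code. It assumes the rows are available as a list of dictionaries keyed by the column names listed on this page (title, url, video_id, text, start, end). The function name merge_rows and the window and stride parameters are hypothetical, chosen for illustration.

```python
from itertools import groupby


def merge_rows(rows, window=6, stride=3):
    """Merge consecutive sentence-length rows of each video into larger chunks.

    Hypothetical helper: `window` is the number of rows per merged chunk and
    `stride` is how many rows to advance between chunks. A stride smaller than
    the window makes neighbouring chunks overlap.
    """
    # Sort so that rows of the same video are adjacent and in time order;
    # groupby only groups adjacent items.
    ordered = sorted(rows, key=lambda r: (r["video_id"], r["start"]))
    merged = []
    for video_id, group in groupby(ordered, key=lambda r: r["video_id"]):
        group = list(group)
        for i in range(0, len(group), stride):
            batch = group[i:i + window]
            merged.append({
                "video_id": video_id,
                "title": batch[0]["title"],
                "url": batch[0]["url"],
                "text": " ".join(r["text"].strip() for r in batch),
                "start": batch[0]["start"],  # start of the first merged row
                "end": batch[-1]["end"],     # end of the last merged row
            })
            # Stop once a chunk reaches the last row of the video, so no
            # trailing chunk is fully contained in the previous one.
            if i + window >= len(group):
                break
    return merged
```

Chunks never cross video boundaries, and each merged chunk keeps the start of its first row and the end of its last row, so it can still be linked back to a position in the video.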

File: jamescalam--youtube-transcriptions.parquet (500 rows), from jamescalam/youtube-transcriptions
Columns

  • title categorical
  • published categorical
  • url categorical
  • video_id categorical
  • channel_id categorical
  • id text
  • text text
  • start numeric
  • end numeric
