DBpedia 14 research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
Dataset Card for DBpedia14. Dataset Summary: The DBpedia ontology classification dataset is constructed by picking 14 non-overlapping classes from DBpedia 2014. They are listed in classes.txt. From each of these 14 ontology classes, we randomly choose 40,000 training samples and 5,000 testing samples. Therefore, the total size of the training dataset is 560,000 and that of the testing dataset is 70,000. There are 3 columns in the dataset (the same for the train and test splits), corresponding to… See the full description on the dataset page: https://huggingface.co/datasets/fancyzhx/dbpedia_14.
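The split totals quoted in the summary follow directly from the per-class sample counts. A minimal check of that arithmetic, with constant names chosen here purely for illustration:

```python
# Split sizes implied by the dataset summary: 14 classes with a fixed
# number of samples drawn per class. Names are illustrative only.
NUM_CLASSES = 14
TRAIN_PER_CLASS = 40_000
TEST_PER_CLASS = 5_000

train_total = NUM_CLASSES * TRAIN_PER_CLASS
test_total = NUM_CLASSES * TEST_PER_CLASS

print(train_total)  # 560000
print(test_total)   # 70000
```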
Research Context
DBpedia 14: 500 rows by 3 columns. These exploratory charts are generated automatically from the data. Open the dataset in Helix to ask your own questions.
Data Profile
Chart Evidence
These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
Follow-Up Queries
Preview Rows
| # | label (integer) | title (text) | content (text) |
|---|---|---|---|
| 1 | 0 | E. D. Abbott Ltd | Abbott of Farnham E D Abbott Limited was a British coachbuilding business based in Farnham Surrey trading under that name from 1929. A maj… |
| 2 | 0 | Schwan-Stabilo | Schwan-STABILO is a German maker of pens for writing colouring and cosmetics as well as markers and highlighters for office use. It is the… |
| 3 | 0 | Q-workshop | Q-workshop is a Polish company located in Poznań that specializes in designand production of polyhedral dice and dice accessories for use … |
| 4 | 0 | Marvell Software Solutions Israel | Marvell Software Solutions Israel known as RADLAN Computer Communications Limited before 2007 is a wholly owned subsidiary of Marvell Tech… |
| 5 | 0 | Bergan Mercy Medical Center | Bergan Mercy Medical Center is a hospital located in Omaha Nebraska. It is part of the Alegent Health System. |
| 6 | 0 | The Unsigned Guide | The Unsigned Guide is an online contacts directory and careers guide for the UK music industry. Founded in 2003 and first published as a p… |
Data Dictionary
- label (numeric)
- title (text)
- content (text)
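One way to make this dictionary concrete is a small typed record. The sketch below is illustrative only: the `Row` class is hypothetical, and the label range of 0 to 13 is an assumption drawn from the 14 classes in the summary and the label value 0 seen in the preview rows.

```python
from dataclasses import dataclass

# Assumption: labels are integers 0..13, one per ontology class.
NUM_CLASSES = 14


@dataclass(frozen=True)
class Row:
    """Hypothetical record type mirroring the three columns above."""
    label: int
    title: str
    content: str

    def __post_init__(self):
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.label, int) or isinstance(self.label, bool):
            raise TypeError("label must be an integer")
        if not 0 <= self.label < NUM_CLASSES:
            raise ValueError(f"label must be in 0..{NUM_CLASSES - 1}")


row = Row(
    label=0,
    title="Bergan Mercy Medical Center",
    content="Bergan Mercy Medical Center is a hospital located in Omaha Nebraska. "
            "It is part of the Alegent Health System.",
)
```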
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
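The profiling and chart steps above can be sketched roughly as follows. This is not the report generator's actual code: `infer_kind`, `profile`, and `label_count_chart` are hypothetical helpers, only the numeric and text kinds are handled (time and location inference are omitted), and "defensive" is read here as returning no chart when the requested column is missing. The chart is a plain dictionary in the `data`/`layout` shape that Plotly figures use. The two sample rows come from the preview table, with the second row's content cut to its first sentence.

```python
from collections import Counter


def infer_kind(values):
    """Classify a column as 'numeric' or 'text' from its sample values."""
    non_null = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if non_null and all(is_number(v) for v in non_null):
        return "numeric"
    return "text"


def profile(rows):
    """Map each column name to its inferred kind."""
    columns = list(rows[0].keys()) if rows else []
    return {c: infer_kind([r.get(c) for r in rows]) for c in columns}


def label_count_chart(rows, column="label"):
    """Build a bar-chart spec of row counts per value, or None if unusable."""
    if not rows or any(column not in r for r in rows):
        return None  # defensive: no chart rather than a broken one
    counts = Counter(r[column] for r in rows)
    keys = sorted(counts)
    return {
        "data": [{"type": "bar", "x": keys, "y": [counts[k] for k in keys]}],
        "layout": {"title": {"text": f"Rows per {column}"}},
    }


rows = [
    {"label": 0, "title": "Schwan-Stabilo",
     "content": "Schwan-STABILO is a German maker of pens for writing colouring "
                "and cosmetics as well as markers and highlighters for office use."},
    {"label": 0, "title": "Bergan Mercy Medical Center",
     "content": "Bergan Mercy Medical Center is a hospital located in Omaha Nebraska. "
                "It is part of the Alegent Health System."},
]

print(profile(rows))
print(label_count_chart(rows))
```

If Plotly is installed, a dictionary of this shape can be passed to `plotly.graph_objects.Figure` to render it.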
This report is intentionally reproducible. It uses the local catalog metadata and generated chart specifications rather than claiming external conclusions beyond the dataset.