Dataset research report
SQL Create Context research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
Overview: This dataset builds on WikiSQL and Spider. It contains 78,577 examples, each pairing a natural language question with SQL CREATE TABLE statements and a SQL query that answers the question using the CREATE statements as context. It was built with text-to-SQL LLMs in mind and is intended to prevent the hallucination of column and table names often seen in models trained on text-to-SQL datasets. The CREATE TABLE statement can often be copied and pasted from different DBMSs and provides table names, column… See the full description on the dataset page: https://huggingface.co/datasets/b-mc2/sql-create-context.
Preview Rows
| # | answertext | questiontext | contexttext |
|---|---|---|---|
| 1 | SELECT COUNT(*) FROM head WHERE age > 56 | How many heads of the departments are older than 56 ? | CREATE TABLE head (age INTEGER) |
| 2 | SELECT name, born_state, age FROM head ORDER BY age | List the name, born state and age of the heads of departments ordered by age. | CREATE TABLE head (name VARCHAR, born_state VARCHAR, age VARCHAR) |
| 3 | SELECT creation, name, budget_in_billions FROM department | List the creation year, name and budget of each department. | CREATE TABLE department (creation VARCHAR, name VARCHAR, budget_in_billions VARCHAR) |
| 4 | SELECT MAX(budget_in_billions), MIN(budget_in_billions) FROM department | What are the maximum and minimum budget of the departments? | CREATE TABLE department (budget_in_billions INTEGER) |
| 5 | SELECT AVG(num_employees) FROM department WHERE ranking BETWEEN 10 AND 15 | What is the average number of employees of the departments whose rank is between 10 and 15? | CREATE TABLE department (num_employees INTEGER, ranking INTEGER) |
| 6 | SELECT name FROM head WHERE born_state <> 'California' | What are the names of the heads who are born outside the California state? | CREATE TABLE head (name VARCHAR, born_state VARCHAR) |
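The preview rows suggest a simple sanity check: each answer should execute against the schema given in its context. The sketch below is one way to test that, assuming SQLite as a stand-in engine (the dataset does not name a specific DBMS, so some rows in the full dataset may use syntax SQLite rejects). The helper name `answer_runs_against_context` is hypothetical and not part of the dataset or of Helix.

```python
import sqlite3

# Two (answer, context) pairs copied from the preview rows above.
PREVIEW = [
    ("SELECT COUNT(*) FROM head WHERE age > 56",
     "CREATE TABLE head (age INTEGER)"),
    ("SELECT name FROM head WHERE born_state <> 'California'",
     "CREATE TABLE head (name VARCHAR, born_state VARCHAR)"),
]


def answer_runs_against_context(answer: str, context: str) -> bool:
    """Return True if `answer` executes against the schema in `context`.

    Assumption: SQLite is an acceptable proxy for the intended dialect.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # The context may contain one or more CREATE TABLE statements.
        conn.executescript(context)
        # Tables are empty, so this only checks that names and syntax resolve.
        conn.execute(answer).fetchall()
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    for answer, context in PREVIEW:
        print(answer_runs_against_context(answer, context), "|", answer)
```

Because the tables are created empty, the check only confirms that table and column names in the answer exist in the context; it says nothing about whether the query answers the question correctly.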
Data Dictionary
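- answertext: the SQL query that answers the question.
- questiontext: the natural language question.
- contexttext: the SQL CREATE TABLE statement(s) supplied as context for the query.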
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
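The field-inference step above could look roughly like the following. This is a minimal sketch, not the report generator's actual logic: the function name `infer_kind`, the category labels, and the "repeated values" rule for categorical columns are all assumptions made for illustration, and location detection is left out entirely.

```python
from datetime import datetime


def infer_kind(values: list[str]) -> str:
    """Classify a column as 'numeric', 'time', 'categorical', or 'text'.

    Hypothetical heuristic: every non-empty value must parse for a column
    to count as numeric or time.
    """
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "text"

    def parses(fn) -> bool:
        # True only if fn accepts every non-empty value.
        try:
            for v in non_empty:
                fn(v)
            return True
        except ValueError:
            return False

    if parses(float):
        return "numeric"
    if parses(datetime.fromisoformat):
        return "time"
    # Assumed rule: at most half as many distinct values as rows
    # suggests a categorical column.
    if len(set(non_empty)) <= len(non_empty) // 2:
        return "categorical"
    return "text"
```

Under this heuristic, the three columns in the preview rows would all come out as free text, since every previewed value is distinct and none parses as a number or date.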
This report is intentionally reproducible. It relies on the local catalog metadata and generated chart specifications, and it draws no conclusions beyond what the dataset itself supports.