Synthetic Text To Sql
Hugging FaceImage generated by DALL-E. See prompt for more details synthetic_text_to_sql gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes: 105,851 records partitioned into 100,000 train and 5,851 test records ~23M total tokens, including ~12M SQL tokens Coverage across 100 distinct… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.
Ask a question about this data
Type any question in plain English — Helix builds the chart with AI. Sign in to run it and save your charts.
Data preview
500 rows · 11 columns · showing first 12| # | id integer | domain text | domain_description text | sql_complexity text | sql_complexity_description text | sql_task_type text | sql_task_type_description text | sql_prompt text | sql_context text | sql text | sql_explanation text |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5097 | forestry | Comprehensive data on sustainable forest management, timber production, wildlife habitat, and carbon sequestration in forestry. | single join | only one join (specify inner, outer, cross) | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total volume of timber sold by each salesperson, sorted by salesperson? | CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'J… | SELECT salesperson_id, name, SUM(volume) as total_volume FROM timber_sales JOIN salesperson ON timber_sales.salesperson_id = salesperson.sa… | Joins timber_sales and salesperson tables, groups sales by salesperson, calculates total volume sold by each salesperson, and orders the re… |
| 2 | 5098 | defense industry | Defense contract data, military equipment maintenance, threat intelligence metrics, and veteran employment stats. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | List all the unique equipment types and their corresponding total maintenance frequency from the equipment_maintenance table. | CREATE TABLE equipment_maintenance (equipment_type VARCHAR(255), maintenance_frequency INT); | SELECT equipment_type, SUM(maintenance_frequency) AS total_maintenance_frequency FROM equipment_maintenance GROUP BY equipment_type; | This query groups the equipment_maintenance table by equipment_type and calculates the sum of maintenance_frequency for each group, then re… |
| 3 | 5099 | marine biology | Comprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | How many marine species are found in the Southern Ocean? | CREATE TABLE marine_species (name VARCHAR(50), common_name VARCHAR(50), location VARCHAR(50)); | SELECT COUNT(*) FROM marine_species WHERE location = 'Southern Ocean'; | This query counts the number of marine species found in the Southern Ocean by using the COUNT() function with a WHERE clause. |
| 4 | 5100 | financial services | Detailed financial data including investment strategies, risk management, fraud detection, customer analytics, and regulatory compliance. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total trade value and average price for each trader and stock in the trade_history table? | CREATE TABLE trade_history (id INT, trader_id INT, stock VARCHAR(255), price DECIMAL(5,2), quantity INT, trade_time TIMESTAMP); | SELECT trader_id, stock, SUM(price * quantity) as total_trade_value, AVG(price) as avg_price FROM trade_history GROUP BY trader_id, stock; | This query calculates the total trade value and average price for each trader and stock by grouping the trade_history table by the trader_i… |
| 5 | 5101 | energy | Energy market data covering renewable energy sources, energy storage, carbon pricing, and energy efficiency. | window functions | window functions (e.g., ROW_NUMBER, LEAD, LAG, RANk, NTILE, PERCENT_RANK, etc.) with partitioning and ordering | analytics and reporting | generating reports, dashboards, and analytical insights | Find the energy efficiency upgrades with the highest cost and their types. | CREATE TABLE upgrades (id INT, cost FLOAT, type TEXT); INSERT INTO upgrades (id, cost, type) VALUES (1, 500, 'Insulation'), (2, 1000, 'HVAC… | SELECT type, cost FROM (SELECT type, cost, ROW_NUMBER() OVER (ORDER BY cost DESC) as rn FROM upgrades) sub WHERE rn = 1; | The SQL query uses the ROW_NUMBER function to rank the energy efficiency upgrades based on their cost in descending order. It then filters … |
| 6 | 5102 | defense operations | Defense data on military innovation, peacekeeping operations, defense diplomacy, and humanitarian assistance. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total spending on humanitarian assistance by the European Union in the last 3 years? | CREATE SCHEMA if not exists defense; CREATE TABLE if not exists eu_humanitarian_assistance (id INT PRIMARY KEY, year INT, spending INT); IN… | SELECT SUM(spending) FROM defense.eu_humanitarian_assistance WHERE year BETWEEN 2019 AND 2021; | This SQL query calculates the total spending on humanitarian assistance by the European Union in the last 3 years. It uses the SUM() aggreg… |
| 7 | 5103 | aquaculture | Aquatic farming data, fish stock management, ocean health metrics, and sustainable seafood trends. | single join | only one join (specify inner, outer, cross) | analytics and reporting | generating reports, dashboards, and analytical insights | What is the average water temperature for each fish species in February? | CREATE TABLE SpeciesWaterTemp (SpeciesID int, Date date, WaterTemp float); INSERT INTO SpeciesWaterTemp (SpeciesID, Date, WaterTemp) VALUES… | SELECT SpeciesName, AVG(WaterTemp) as AvgTemp FROM SpeciesWaterTemp INNER JOIN FishSpecies ON SpeciesWaterTemp.SpeciesID = FishSpecies.Spec… | This query calculates the average water temperature for each fish species in February. It filters the records where the month of the date i… |
| 8 | 5104 | nonprofit operations | Donation records, program outcomes, volunteer engagement metrics, budget reports, and community impact assessments. | basic SQL | basic SQL with a simple select statement | data manipulation | inserting, updating, or deleting records | Delete a program's outcome data | CREATE TABLE Program_Outcomes (id INT, program_id INT, outcome_type VARCHAR, value INT, outcome_date DATE); INSERT INTO Program_Outcomes (i… | DELETE FROM Program_Outcomes WHERE program_id = 1002; | This query removes the record with a program_id of 1002 from the Program_Outcomes table. |
| 9 | 5105 | public transportation | Extensive data on route planning, fare collection, vehicle maintenance, and accessibility in public transportation. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | Find the total fare collected from passengers on 'Green Line' buses | CREATE TABLE bus_routes (route_name VARCHAR(50), fare FLOAT); INSERT INTO bus_routes (route_name, fare) VALUES ('Green Line', 1.50), ('Red … | SELECT SUM(fare) FROM bus_routes WHERE route_name = 'Green Line'; | This SQL query calculates the total fare collected from passengers on the 'Green Line' by summing the 'fare' column values where the 'route… |
| 10 | 5106 | real estate | Real estate data on inclusive housing policies, sustainable urbanism, property co-ownership, and housing affordability. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | What is the average property size in inclusive housing areas? | CREATE TABLE Inclusive_Housing (Property_ID INT, Inclusive VARCHAR(10), Property_Size INT); INSERT INTO Inclusive_Housing (Property_ID, Inc… | SELECT AVG(Property_Size) FROM Inclusive_Housing WHERE Inclusive = 'Yes'; | The SQL query calculates the average property size in inclusive housing areas by using the AVG function on the Property_Size column, and fi… |
| 11 | 5107 | rural health | Detailed records on healthcare access, disease prevalence, and resource allocation in rural health. | single join | only one join (specify inner, outer, cross) | analytics and reporting | generating reports, dashboards, and analytical insights | What is the average income of farmers in each district in India? | CREATE TABLE farmers_india (id INT, name VARCHAR(255), district_id INT, age INT, income INT); INSERT INTO farmers_india (id, name, district… | SELECT d.name, AVG(f.income) FROM farmers_india f JOIN districts_india d ON f.district_id = d.id GROUP BY d.name; | The SQL query joins the farmers_india and districts_india tables, and then groups the average income of farmers by district. |
| 12 | 5108 | marine biology | Comprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology. | subqueries | subqueries, including correlated and nested subqueries | analytics and reporting | generating reports, dashboards, and analytical insights | What is the average sea surface temperature in the Pacific Ocean per year? | CREATE TABLE Ocean(id INT, region VARCHAR(20), temperature DECIMAL(5,2)); INSERT INTO Ocean(id, region, temperature) VALUES (1, 'Pacific', … | SELECT AVG(temperature) FROM (SELECT temperature FROM Ocean WHERE region = 'Pacific' GROUP BY temperature, YEAR(STR_TO_DATE(region, '%Y')))… | This query calculates the average sea surface temperature in the Pacific Ocean per year. It first groups the data by temperature and year (… |
Auto-generated charts
Synthetic Text To Sql: 500 rows by 11 columns. These exploratory charts are generated automatically from the data - open the dataset in Helix to ask your own questions.
Charts
Total id by sql_complexity
Top sql_complexity values ranked by summed id.
Distribution of id
Histogram of id values.
id by sql_complexity
Spread of id across sql_complexity groups.
Interesting queries to try
Columns
- id numeric
- domain text
- domain_description text
- sql_complexity categorical
- sql_complexity_description categorical
- sql_task_type categorical
- sql_task_type_description categorical
- sql_prompt text
- sql_context text
- sql text
- sql_explanation text