Helix the Robot
Helix
arrow_backAll datasets

Synthetic Text To Sql

Hugging Face

Image generated by DALL-E. See prompt for more details synthetic_text_to_sql gretelai/synthetic_text_to_sql is a rich dataset of high quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes: 105,851 records partitioned into 100,000 train and 5,851 test records ~23M total tokens, including ~12M SQL tokens Coverage across 100 distinct… See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.

descriptiongretelai--synthetic-text-to-sql.parquet view_list500 rows cloud_downloadgretelai/synthetic_text_to_sql
boltOpen in Helix

Ask a question about this data

Type any question in plain English — Helix builds the chart with AI. Sign in to run it and save your charts.

auto_awesome

Data preview

500 rows · 11 columns · showing first 12
# id integer domain text domain_description text sql_complexity text sql_complexity_description text sql_task_type text sql_task_type_description text sql_prompt text sql_context text sql text sql_explanation text
1 5097forestryComprehensive data on sustainable forest management, timber production, wildlife habitat, and carbon sequestration in forestry.single joinonly one join (specify inner, outer, cross)analytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the total volume of timber sold by each salesperson, sorted by salesperson?CREATE TABLE salesperson (salesperson_id INT, name TEXT, region TEXT); INSERT INTO salesperson (salesperson_id, name, region) VALUES (1, 'J…SELECT salesperson_id, name, SUM(volume) as total_volume FROM timber_sales JOIN salesperson ON timber_sales.salesperson_id = salesperson.sa…Joins timber_sales and salesperson tables, groups sales by salesperson, calculates total volume sold by each salesperson, and orders the re…
2 5098defense industryDefense contract data, military equipment maintenance, threat intelligence metrics, and veteran employment stats.aggregationaggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clauseanalytics and reportinggenerating reports, dashboards, and analytical insightsList all the unique equipment types and their corresponding total maintenance frequency from the equipment_maintenance table.CREATE TABLE equipment_maintenance (equipment_type VARCHAR(255), maintenance_frequency INT);SELECT equipment_type, SUM(maintenance_frequency) AS total_maintenance_frequency FROM equipment_maintenance GROUP BY equipment_type;This query groups the equipment_maintenance table by equipment_type and calculates the sum of maintenance_frequency for each group, then re…
3 5099marine biologyComprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology.basic SQLbasic SQL with a simple select statementanalytics and reportinggenerating reports, dashboards, and analytical insightsHow many marine species are found in the Southern Ocean?CREATE TABLE marine_species (name VARCHAR(50), common_name VARCHAR(50), location VARCHAR(50));SELECT COUNT(*) FROM marine_species WHERE location = 'Southern Ocean';This query counts the number of marine species found in the Southern Ocean by using the COUNT() function with a WHERE clause.
4 5100financial servicesDetailed financial data including investment strategies, risk management, fraud detection, customer analytics, and regulatory compliance.aggregationaggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clauseanalytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the total trade value and average price for each trader and stock in the trade_history table?CREATE TABLE trade_history (id INT, trader_id INT, stock VARCHAR(255), price DECIMAL(5,2), quantity INT, trade_time TIMESTAMP);SELECT trader_id, stock, SUM(price * quantity) as total_trade_value, AVG(price) as avg_price FROM trade_history GROUP BY trader_id, stock;This query calculates the total trade value and average price for each trader and stock by grouping the trade_history table by the trader_i…
5 5101energyEnergy market data covering renewable energy sources, energy storage, carbon pricing, and energy efficiency.window functionswindow functions (e.g., ROW_NUMBER, LEAD, LAG, RANk, NTILE, PERCENT_RANK, etc.) with partitioning and orderinganalytics and reportinggenerating reports, dashboards, and analytical insightsFind the energy efficiency upgrades with the highest cost and their types.CREATE TABLE upgrades (id INT, cost FLOAT, type TEXT); INSERT INTO upgrades (id, cost, type) VALUES (1, 500, 'Insulation'), (2, 1000, 'HVAC…SELECT type, cost FROM (SELECT type, cost, ROW_NUMBER() OVER (ORDER BY cost DESC) as rn FROM upgrades) sub WHERE rn = 1;The SQL query uses the ROW_NUMBER function to rank the energy efficiency upgrades based on their cost in descending order. It then filters …
6 5102defense operationsDefense data on military innovation, peacekeeping operations, defense diplomacy, and humanitarian assistance.basic SQLbasic SQL with a simple select statementanalytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the total spending on humanitarian assistance by the European Union in the last 3 years?CREATE SCHEMA if not exists defense; CREATE TABLE if not exists eu_humanitarian_assistance (id INT PRIMARY KEY, year INT, spending INT); IN…SELECT SUM(spending) FROM defense.eu_humanitarian_assistance WHERE year BETWEEN 2019 AND 2021;This SQL query calculates the total spending on humanitarian assistance by the European Union in the last 3 years. It uses the SUM() aggreg…
7 5103aquacultureAquatic farming data, fish stock management, ocean health metrics, and sustainable seafood trends.single joinonly one join (specify inner, outer, cross)analytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the average water temperature for each fish species in February?CREATE TABLE SpeciesWaterTemp (SpeciesID int, Date date, WaterTemp float); INSERT INTO SpeciesWaterTemp (SpeciesID, Date, WaterTemp) VALUES…SELECT SpeciesName, AVG(WaterTemp) as AvgTemp FROM SpeciesWaterTemp INNER JOIN FishSpecies ON SpeciesWaterTemp.SpeciesID = FishSpecies.Spec…This query calculates the average water temperature for each fish species in February. It filters the records where the month of the date i…
8 5104nonprofit operationsDonation records, program outcomes, volunteer engagement metrics, budget reports, and community impact assessments.basic SQLbasic SQL with a simple select statementdata manipulationinserting, updating, or deleting recordsDelete a program's outcome dataCREATE TABLE Program_Outcomes (id INT, program_id INT, outcome_type VARCHAR, value INT, outcome_date DATE); INSERT INTO Program_Outcomes (i…DELETE FROM Program_Outcomes WHERE program_id = 1002;This query removes the record with a program_id of 1002 from the Program_Outcomes table.
9 5105public transportationExtensive data on route planning, fare collection, vehicle maintenance, and accessibility in public transportation.basic SQLbasic SQL with a simple select statementanalytics and reportinggenerating reports, dashboards, and analytical insightsFind the total fare collected from passengers on 'Green Line' busesCREATE TABLE bus_routes (route_name VARCHAR(50), fare FLOAT); INSERT INTO bus_routes (route_name, fare) VALUES ('Green Line', 1.50), ('Red …SELECT SUM(fare) FROM bus_routes WHERE route_name = 'Green Line';This SQL query calculates the total fare collected from passengers on the 'Green Line' by summing the 'fare' column values where the 'route…
10 5106real estateReal estate data on inclusive housing policies, sustainable urbanism, property co-ownership, and housing affordability.basic SQLbasic SQL with a simple select statementanalytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the average property size in inclusive housing areas?CREATE TABLE Inclusive_Housing (Property_ID INT, Inclusive VARCHAR(10), Property_Size INT); INSERT INTO Inclusive_Housing (Property_ID, Inc…SELECT AVG(Property_Size) FROM Inclusive_Housing WHERE Inclusive = 'Yes';The SQL query calculates the average property size in inclusive housing areas by using the AVG function on the Property_Size column, and fi…
11 5107rural healthDetailed records on healthcare access, disease prevalence, and resource allocation in rural health.single joinonly one join (specify inner, outer, cross)analytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the average income of farmers in each district in India?CREATE TABLE farmers_india (id INT, name VARCHAR(255), district_id INT, age INT, income INT); INSERT INTO farmers_india (id, name, district…SELECT d.name, AVG(f.income) FROM farmers_india f JOIN districts_india d ON f.district_id = d.id GROUP BY d.name;The SQL query joins the farmers_india and districts_india tables, and then groups the average income of farmers by district.
12 5108marine biologyComprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology.subqueriessubqueries, including correlated and nested subqueriesanalytics and reportinggenerating reports, dashboards, and analytical insightsWhat is the average sea surface temperature in the Pacific Ocean per year?CREATE TABLE Ocean(id INT, region VARCHAR(20), temperature DECIMAL(5,2)); INSERT INTO Ocean(id, region, temperature) VALUES (1, 'Pacific', …SELECT AVG(temperature) FROM (SELECT temperature FROM Ocean WHERE region = 'Pacific' GROUP BY temperature, YEAR(STR_TO_DATE(region, '%Y')))…This query calculates the average sea surface temperature in the Pacific Ocean per year. It first groups the data by temperature and year (…

Auto-generated charts

Synthetic Text To Sql: 500 rows by 11 columns. These exploratory charts are generated automatically from the data - open the dataset in Helix to ask your own questions.

Rows500
Columns11
Numeric cols1
Categorical cols4

Charts

Total id by sql_complexity

Top sql_complexity values ranked by summed id.

Distribution of id

Histogram of id values.

id by sql_complexity

Spread of id across sql_complexity groups.

Interesting queries to try

Columns

  • id numeric
  • domain text
  • domain_description text
  • sql_complexity categorical
  • sql_complexity_description categorical
  • sql_task_type categorical
  • sql_task_type_description categorical
  • sql_prompt text
  • sql_context text
  • sql text
  • sql_explanation text

Login to Helix

Don't have an account? Sign up here

Sign Up for Helix

Already have an account? Login here