Synthetic Text To Sql research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
gretelai/synthetic_text_to_sql is a rich dataset of high-quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator, and released under Apache 2.0. Please see our release blogpost for more details. The dataset includes:
- 105,851 records partitioned into 100,000 train and 5,851 test records
- ~23M total tokens, including ~12M SQL tokens
- Coverage across 100 distinct…
See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.
Research Context
Synthetic Text To Sql: 500 rows by 11 columns. These exploratory charts are generated automatically from the data. Open the dataset in Helix to ask your own questions.
Data Profile
Chart Evidence
These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
Total id by sql_complexity
Top sql_complexity values ranked by summed id.
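As a rough illustration of what this chart computes, the sketch below sums id per sql_complexity value and ranks the totals, using only the six preview rows shown in this report. The function name, the skip-on-bad-row behavior, and the plain-dict chart layout are assumptions made for illustration; they are not taken from Helix or from the report's actual chart generator.

```python
from collections import defaultdict

# Six preview rows from this report, reduced to (id, sql_complexity).
preview_rows = [
    (5097, "single join"),
    (5098, "aggregation"),
    (5099, "basic SQL"),
    (5100, "aggregation"),
    (5101, "window functions"),
    (5102, "basic SQL"),
]


def total_id_by_complexity(rows):
    """Sum id per sql_complexity value, ranked from largest to smallest total."""
    totals = defaultdict(int)
    for row_id, complexity in rows:
        # Assumed defensive step: skip rows with a missing category
        # or a non-integer id instead of failing.
        if complexity is None or not isinstance(row_id, int):
            continue
        totals[complexity] += row_id
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = total_id_by_complexity(preview_rows)

# Plain-dict bar chart in the general shape of a Plotly figure (data + layout).
chart_spec = {
    "data": [
        {
            "type": "bar",
            "x": [name for name, _ in ranked],
            "y": [total for _, total in ranked],
        }
    ],
    "layout": {"title": {"text": "Total id by sql_complexity"}},
}
```

Because id looks like a row identifier, a summed id mostly reflects how many rows fall in each category and which ids they happen to carry. A per-category row count may be easier to interpret when adjusting this chart.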
Follow-Up Queries
Preview Rows
| # | id (integer) | domain (text) | domain_description (text) | sql_complexity (text) | sql_complexity_description (text) | sql_task_type (text) | sql_task_type_description (text) | sql_prompt (text) |
|---|---|---|---|---|---|---|---|---|
| 1 | 5097 | forestry | Comprehensive data on sustainable forest management, timber production, wildlife habitat, and carbon sequestration in forestry. | single join | only one join (specify inner, outer, cross) | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total volume of timber sold by each salesperson, sorted by salesperson? |
| 2 | 5098 | defense industry | Defense contract data, military equipment maintenance, threat intelligence metrics, and veteran employment stats. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | List all the unique equipment types and their corresponding total maintenance frequency from the equipment_maintenance table. |
| 3 | 5099 | marine biology | Comprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | How many marine species are found in the Southern Ocean? |
| 4 | 5100 | financial services | Detailed financial data including investment strategies, risk management, fraud detection, customer analytics, and regulatory compliance. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total trade value and average price for each trader and stock in the trade_history table? |
| 5 | 5101 | energy | Energy market data covering renewable energy sources, energy storage, carbon pricing, and energy efficiency. | window functions | window functions (e.g., ROW_NUMBER, LEAD, LAG, RANk, NTILE, PERCENT_RANK, etc.) with partitioning and ordering | analytics and reporting | generating reports, dashboards, and analytical insights | Find the energy efficiency upgrades with the highest cost and their types. |
| 6 | 5102 | defense operations | Defense data on military innovation, peacekeeping operations, defense diplomacy, and humanitarian assistance. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total spending on humanitarian assistance by the European Union in the last 3 years? |
Data Dictionary
- id: numeric
- domain: text
- domain_description: text
- sql_complexity: categorical
- sql_complexity_description: categorical
- sql_task_type: categorical
- sql_task_type_description: categorical
- sql_prompt: text
- sql_context: text
- sql: text
- sql_explanation: text
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
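The field-inference step in the list above can be sketched roughly as follows. This is a hypothetical heuristic, not the rule the report generator actually applies: the function names, the max_categories threshold, and the "repeated values mean categorical" test are all assumptions, and time and location detection are left out.

```python
def _is_number(value):
    """Return True if value is numeric or a string that parses as a number."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_field_kind(values, max_categories=20):
    """Classify a column as 'numeric', 'categorical', or 'text'.

    Hypothetical heuristic: every present value numeric -> numeric;
    repeated values with few distinct levels -> categorical; else text.
    """
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "text"  # nothing to infer from, so fall back to the loosest kind
    if all(_is_number(v) for v in present):
        return "numeric"
    distinct = len(set(present))
    if distinct < len(present) and distinct <= max_categories:
        return "categorical"
    return "text"


# Three columns taken from the six preview rows in this report.
columns = {
    "id": [5097, 5098, 5099, 5100, 5101, 5102],
    "domain": [
        "forestry", "defense industry", "marine biology",
        "financial services", "energy", "defense operations",
    ],
    "sql_complexity": [
        "single join", "aggregation", "basic SQL",
        "aggregation", "window functions", "basic SQL",
    ],
}

profile = {name: infer_field_kind(vals) for name, vals in columns.items()}
```

On these three preview columns the heuristic happens to agree with the Data Dictionary entries (id numeric, domain text, sql_complexity categorical), but with only six rows that agreement is fragile; a larger sample would likely need a different threshold.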
This report is intentionally reproducible. It uses the local catalog metadata and generated chart specifications rather than claiming external conclusions beyond the dataset.