
Dataset research report

Synthetic Text To Sql research report

A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.

Source: HF description | File: gretelai--synthetic-text-to-sql.parquet | Rows: 500

Executive Summary

gretelai/synthetic_text_to_sql is a rich dataset of high-quality synthetic Text-to-SQL samples, designed and generated using Gretel Navigator and released under Apache 2.0. See Gretel's release blog post for more details. The dataset includes:

  • 105,851 records, partitioned into 100,000 train and 5,851 test records
  • ~23M total tokens, including ~12M SQL tokens
  • Coverage across 100 distinct…

See the full description on the dataset page: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql.

Finding 1: The catalog copy contains 500 rows, a subset of the 105,851 records in the full dataset.
Finding 2: The catalog exposes 11 documented or inferred columns.
Finding 3: Helix has 5 ready query prompts for this dataset.
Finding 4: This report includes 3 generated chart views.

Research Context

Synthetic Text To Sql: 500 rows by 11 columns. The exploratory charts in this report are generated automatically from the data; open the dataset in Helix to ask your own questions.

Data Profile

Rows: 500
Columns: 11
Numeric columns: 1
Categorical columns: 4

Chart Evidence

These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
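As a rough illustration of what a "defensive" chart specification could look like, the sketch below builds a Plotly-style bar chart as a plain dictionary (the `data`/`layout` shape Plotly accepts) from category counts. It caps the number of bars and returns an annotated empty figure instead of failing when a column has no values. The function name `bar_spec`, the `max_categories` cap, and the "Other" bucket are assumptions for illustration, not Helix's actual chart generator.

```python
import json
from collections import Counter


def bar_spec(column: str, values: list, max_categories: int = 10) -> dict:
    """Build a Plotly-style figure dict counting how often each category appears.

    Hypothetical helper: it caps the number of bars and tolerates empty input,
    so a report page always gets something it can render.
    """
    cleaned = [v for v in values if v]  # drop None and empty strings
    if not cleaned:
        # No usable values: return an empty figure with a note instead of raising.
        return {
            "data": [],
            "layout": {
                "title": {"text": f"{column}: no values"},
                "annotations": [{"text": "No data", "showarrow": False}],
            },
        }
    counts = Counter(cleaned).most_common()  # sorted by count, descending
    top = counts[:max_categories]
    rest = sum(n for _, n in counts[max_categories:])
    if rest:
        # Fold the long tail into a single bar so wide columns stay readable.
        top.append(("Other", rest))
    return {
        "data": [{"type": "bar", "x": [k for k, _ in top], "y": [n for _, n in top]}],
        "layout": {
            "title": {"text": f"Rows by {column}"},
            "xaxis": {"title": {"text": column}},
        },
    }


# sql_complexity values of the six preview rows in this report.
complexity = ["single join", "aggregation", "basic SQL",
              "aggregation", "window functions", "basic SQL"]
spec = bar_spec("sql_complexity", complexity)
print(json.dumps(spec, indent=2))
```

Because the figure is plain JSON-serializable data, it can be stored next to the report and rendered later, for example by passing it to `plotly.graph_objects.Figure`.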

Follow-Up Queries

Preview Rows

# | id (integer) | domain (text) | domain_description (text) | sql_complexity (text) | sql_complexity_description (text) | sql_task_type (text) | sql_task_type_description (text) | sql_prompt (text)
1 | 5097 | forestry | Comprehensive data on sustainable forest management, timber production, wildlife habitat, and carbon sequestration in forestry. | single join | only one join (specify inner, outer, cross) | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total volume of timber sold by each salesperson, sorted by salesperson?
2 | 5098 | defense industry | Defense contract data, military equipment maintenance, threat intelligence metrics, and veteran employment stats. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | List all the unique equipment types and their corresponding total maintenance frequency from the equipment_maintenance table.
3 | 5099 | marine biology | Comprehensive data on marine species, oceanography, conservation efforts, and climate change impacts in marine biology. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | How many marine species are found in the Southern Ocean?
4 | 5100 | financial services | Detailed financial data including investment strategies, risk management, fraud detection, customer analytics, and regulatory compliance. | aggregation | aggregation functions (COUNT, SUM, AVG, MIN, MAX, etc.), and HAVING clause | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total trade value and average price for each trader and stock in the trade_history table?
5 | 5101 | energy | Energy market data covering renewable energy sources, energy storage, carbon pricing, and energy efficiency. | window functions | window functions (e.g., ROW_NUMBER, LEAD, LAG, RANk, NTILE, PERCENT_RANK, etc.) with partitioning and ordering | analytics and reporting | generating reports, dashboards, and analytical insights | Find the energy efficiency upgrades with the highest cost and their types.
6 | 5102 | defense operations | Defense data on military innovation, peacekeeping operations, defense diplomacy, and humanitarian assistance. | basic SQL | basic SQL with a simple select statement | analytics and reporting | generating reports, dashboards, and analytical insights | What is the total spending on humanitarian assistance by the European Union in the last 3 years?
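Since the dataset is itself about Text-to-SQL, one natural way to explore the preview is with SQL. The sketch below loads three columns of the six preview rows into an in-memory SQLite database using Python's standard `sqlite3` module and counts rows per complexity level. The table name `preview` is hypothetical, and this is plain SQLite, not Helix's own query syntax.

```python
import sqlite3

# (id, domain, sql_complexity) for the six preview rows above.
ROWS = [
    (5097, "forestry", "single join"),
    (5098, "defense industry", "aggregation"),
    (5099, "marine biology", "basic SQL"),
    (5100, "financial services", "aggregation"),
    (5101, "energy", "window functions"),
    (5102, "defense operations", "basic SQL"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preview (id INTEGER PRIMARY KEY, domain TEXT, sql_complexity TEXT)")
conn.executemany("INSERT INTO preview VALUES (?, ?, ?)", ROWS)

# Count rows per complexity level; ties are broken alphabetically for a stable order.
query = """
    SELECT sql_complexity, COUNT(*) AS n
    FROM preview
    GROUP BY sql_complexity
    ORDER BY n DESC, sql_complexity
"""
result = conn.execute(query).fetchall()
for level, n in result:
    print(f"{level}: {n}")
conn.close()
```

On the preview this yields 2 rows each for aggregation and basic SQL and 1 each for single join and window functions; six rows are far too few to say anything about the distribution in the full dataset.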

Data Dictionary

  • id numeric
  • domain text
  • domain_description text
  • sql_complexity categorical
  • sql_complexity_description categorical
  • sql_task_type categorical
  • sql_task_type_description categorical
  • sql_prompt text
  • sql_context text
  • sql text
  • sql_explanation text
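The counts in the Data Profile (11 columns, 1 numeric, 4 categorical) can be cross-checked against this dictionary by tallying the listed types; the remaining 6 columns are free text, which the profile does not list separately. A minimal check, with the mapping copied from the list above:

```python
from collections import Counter

# Column -> type, copied from the data dictionary above.
DICTIONARY = {
    "id": "numeric",
    "domain": "text",
    "domain_description": "text",
    "sql_complexity": "categorical",
    "sql_complexity_description": "categorical",
    "sql_task_type": "categorical",
    "sql_task_type_description": "categorical",
    "sql_prompt": "text",
    "sql_context": "text",
    "sql": "text",
    "sql_explanation": "text",
}

# Count how many columns carry each type label.
profile = Counter(DICTIONARY.values())
print("Columns:", len(DICTIONARY))
print("Numeric columns:", profile["numeric"])
print("Categorical columns:", profile["categorical"])
print("Text columns:", profile["text"])
```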

Method And Limits

  • Load the catalog entry and preview rows from the processed dataset file.
  • Infer numeric, categorical, time, and location fields from the actual column values.
  • Generate a small set of defensive Plotly chart specifications from that profile.
  • Expose each chart idea as a query link so the report can be rerun or exported in Helix.
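The report does not say how the inference step works. A minimal sketch of one common heuristic follows, assuming a column is numeric when every non-empty value parses as a number, and categorical when it has few distinct values relative to its row count. The function name `infer_type` and the thresholds `max_distinct` and `max_ratio` are hypothetical, and time and location detection is left out.

```python
def infer_type(values: list, max_distinct: int = 20, max_ratio: float = 0.5) -> str:
    """Label a column 'numeric', 'categorical', or 'text' (hypothetical heuristic)."""
    present = [v for v in values if v is not None and v.strip()]
    if not present:
        return "text"  # nothing to inspect; fall back to free text
    try:
        for v in present:
            float(v)  # raises ValueError on non-numeric strings
        return "numeric"
    except ValueError:
        pass
    distinct = len(set(present))
    # A few labels repeated across many rows suggests a categorical field.
    if distinct <= max_distinct and distinct / len(present) <= max_ratio:
        return "categorical"
    return "text"


# Columns taken from the six preview rows in this report.
ids = ["5097", "5098", "5099", "5100", "5101", "5102"]
task_types = ["analytics and reporting"] * 6
domains = ["forestry", "defense industry", "marine biology",
           "financial services", "energy", "defense operations"]
for name, col in [("id", ids), ("sql_task_type", task_types), ("domain", domains)]:
    print(name, infer_type(col))
```

On a sample this small the ratio test is fragile: sql_complexity has 4 distinct labels across the 6 preview rows (a ratio of about 0.67), so this heuristic would call it text, whereas the data dictionary, built from all 500 rows, lists it as categorical.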

This report is intentionally reproducible: it relies only on the local catalog metadata and the generated chart specifications, and draws no conclusions beyond what the dataset itself supports.
