Query generation pipeline
Helix builds data transformations and charts by composing deterministic planners with large language models. The pipeline below is what powers the /graph endpoint.
Intent capture
- Requests start with structured metadata: tenant id, schema fingerprints and relevant plugin capabilities.
- We run a lexical parser that extracts verbs, measures, filters and time windows. This primes our prompt builder with the important business entities.
- A safety layer strips secrets and enforces an allow-list of SQL verbs so no destructive commands get generated.
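The safety layer above can be sketched as a small pre-filter. This is a minimal illustration, not Helix's actual implementation; the allow-list contents, the secret pattern, and both function names are assumptions.

```python
import re

# Hypothetical allow-list; the real policy is tenant-configurable.
ALLOWED_VERBS = {"SELECT", "WITH", "EXPLAIN"}

# Toy secret matcher; production scrubbing would be far more thorough.
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|password|token)\s*[:=]\s*\S+")

def sanitize_request(text: str) -> str:
    """Strip obvious secrets before the text reaches the prompt builder."""
    return SECRET_PATTERN.sub("[REDACTED]", text)

def verb_allowed(sql: str) -> bool:
    """Reject any statement whose leading verb is not on the allow-list."""
    stripped = sql.lstrip()
    first = stripped.split(None, 1)[0].upper() if stripped else ""
    return first in ALLOWED_VERBS
```

Because only the leading verb is checked, a real implementation would also walk the parsed statement tree; the point here is the shape of the gate, not its completeness.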
Plan synthesis
Helix does not ask a model for raw SQL. Instead, we ask for a chain of high-level actions that reference dataset aliases. Example output:
{
  "dataset": "warehouse.orders",
  "steps": [
    {"op": "filter", "expr": "status in ['shipped', 'processing']"},
    {"op": "derive", "name": "gross_margin", "expr": "(revenue - cost) / revenue"},
    {"op": "aggregate", "group": ["region"], "metrics": {"orders": "count()", "margin": "avg(gross_margin)"}}
  ],
  "visual": {"type": "bar", "x": "region", "y": "margin", "series": "orders"}
}
The planner validates every step against our schema registry. When something is ambiguous, we fall back to fuzzy column matching and emit warnings that surface in the UI.
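The fuzzy-matching fallback might look like the sketch below, using stdlib `difflib`. The registry shape, the 0.6 cutoff, and the function name are all assumptions made for illustration.

```python
import difflib

# Toy stand-in for the schema registry.
SCHEMA = {"warehouse.orders": {"status", "revenue", "cost", "region"}}

def validate_columns(dataset, columns):
    """Resolve plan columns against the registry, fuzzy-matching near misses.

    Returns (resolved_columns, warnings); raises if no plausible match exists.
    """
    known = sorted(SCHEMA[dataset])
    resolved, warnings = [], []
    for col in columns:
        if col in SCHEMA[dataset]:
            resolved.append(col)
            continue
        match = difflib.get_close_matches(col, known, n=1, cutoff=0.6)
        if not match:
            raise ValueError(f"unknown column: {col!r} in {dataset}")
        warnings.append(f"column {col!r} not found; assuming {match[0]!r}")
        resolved.append(match[0])
    return resolved, warnings
```

A typo like `regin` resolves to `region` with a warning the UI can surface, while a completely unknown name fails fast instead of reaching the SQL builder.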
SQL realization
Action plans are compiled into SQL using our query builder. It applies tenant-specific policies (row level security, soft deletes) and injects feature flags provided by plugins.
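A drastically simplified compiler from the plan JSON to a single SELECT is sketched below, assuming the three `op` kinds from the example plan. It omits the tenant policies and plugin flags the real builder injects, and performs no quoting or expression validation.

```python
def compile_plan(plan: dict) -> str:
    """Naive plan-to-SQL compiler: derives in an inner query, aggregates outside.

    Expressions are passed through verbatim here; the real builder would
    parse and policy-check them rather than trust plan output.
    """
    where, derives, group, metrics = [], [], [], {}
    for step in plan["steps"]:
        if step["op"] == "filter":
            where.append(step["expr"])
        elif step["op"] == "derive":
            derives.append(f'{step["expr"]} AS {step["name"]}')
        elif step["op"] == "aggregate":
            group, metrics = step["group"], step["metrics"]
    select = group + [f"{expr} AS {name}" for name, expr in metrics.items()]
    inner = ", ".join(["*"] + derives)
    sql = f"SELECT {', '.join(select)} FROM (SELECT {inner} FROM {plan['dataset']}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += ") t"
    if group:
        sql += " GROUP BY " + ", ".join(group)
    return sql
```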
Before executing we run the SQL through a static analyzer that rejects anti-patterns (Cartesian products, unbounded cross joins) and estimates resource usage for the autoscaler.
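As a flavor of what the static analyzer rejects, here is a toy regex-based linter for two of the anti-patterns named above. The real analyzer works on a parsed query tree, not regexes; the patterns and messages here are illustrative only.

```python
import re

# Crude textual heuristics; a production analyzer inspects the AST.
ANTI_PATTERNS = [
    (re.compile(r"(?i)\bcross\s+join\b(?!.*\bwhere\b)", re.S),
     "unbounded cross join"),
    (re.compile(r"(?i)\bjoin\b(?!.*\bon\b)", re.S),
     "join without an ON clause (possible Cartesian product)"),
]

def lint_sql(sql: str) -> list:
    """Return a list of anti-pattern messages found in the statement."""
    return [msg for pattern, msg in ANTI_PATTERNS if pattern.search(sql)]
```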
Execution sandbox
Plans execute inside a sandboxed runner with retry semantics. All queries go through PgBouncer or the relevant driver pool and execute under a budget derived from the prompt risk score.
When execution fails, we capture the traceback, sample offending rows, and feed that back into the rewriter so the next attempt is grounded in real feedback instead of hallucinated fixes.
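The retry-with-feedback loop can be expressed as a small driver. `execute` and `rewrite` are hypothetical stand-ins for the sandbox runner and the LLM rewriter; the attempt cap is an assumption.

```python
def run_with_feedback(plan, execute, rewrite, max_attempts=3):
    """Run a plan, feeding real failure details back to the rewriter.

    `execute` raises on failure; `rewrite(plan, feedback)` returns a
    revised plan grounded in the captured error, not a blind retry.
    """
    feedback = None
    for _ in range(max_attempts):
        try:
            return execute(plan)
        except Exception as exc:
            feedback = str(exc)          # traceback / sampled rows in practice
            plan = rewrite(plan, feedback)
    raise RuntimeError(f"plan failed after {max_attempts} attempts: {feedback}")
```

The key design point is that the rewriter sees the actual error string (and, in the real system, sampled rows), so each retry is conditioned on observed failure rather than on the model's guess about what went wrong.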
Chart synthesis
Once the data is materialized, we hand the schema, summary stats, and user goal to a plotting prompt. The LLM proposes layout adjustments, which are diffed against our library of Plotly defaults.
Post-processing enforces accessibility and branding: color palettes, labels, and axis ordering are normalized server-side.
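A minimal sketch of that server-side normalization pass, operating on a plain dict rather than a real Plotly figure; the palette and the label-casing rule are invented for illustration.

```python
# Hypothetical brand palette; the real one is tenant-configurable.
BRAND_PALETTE = ["#1f6feb", "#2da44e", "#d29922"]

def normalize_chart(spec: dict) -> dict:
    """Force brand colors and human-readable axis labels onto an LLM-proposed spec."""
    out = dict(spec)                      # never mutate the model's proposal
    out["colors"] = list(BRAND_PALETTE)   # override any palette the LLM picked
    for axis in ("x", "y"):
        label = out.get(axis)
        if isinstance(label, str):
            out[axis] = label.replace("_", " ").title()
    return out
```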
Audit trail
Every step emits structured logs. Operators can replay the plan, the generated SQL and the prompt/response pairs to debug issues or certify compliance.
We retain anonymized plans for offline evaluation so the judge system can score precision, runtime and user satisfaction proxies.
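The per-step structured logs might be emitted like this; the event fields and function name are assumptions, but a shared trace id tying plan, SQL, and prompt/response pairs together is the essential property for replay.

```python
import json
import time
import uuid

def emit_audit_event(step, payload, trace_id=None):
    """Emit one JSON log line per pipeline step, keyed by a shared trace id.

    Replaying a run means filtering the log stream by trace_id and
    re-applying each step's payload in order.
    """
    event = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "ts": time.time(),
        "step": step,
        "payload": payload,
    }
    return json.dumps(event, sort_keys=True)
```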