From Question to Actionable Insight: The Helix Pipeline
The goal of business intelligence isn't charts; it's decisions. This post explores how Helix transforms natural language questions into actionable business insights through a pipeline designed for clarity and decision support.
What Makes an Insight "Actionable"?
A visualization becomes actionable when it:
- Answers a specific business question
- Highlights what's important (not just what exists)
- Provides context for interpretation
- Suggests a clear next step
Our pipeline is designed to produce insights with these properties, not just charts.
The Pipeline Stages
Stage 1: Question Decomposition
Complex business questions often contain multiple sub-questions:
Input: "How are we performing against targets this quarter, and which regions are struggling?"
Decomposed:
- Q1: What is current quarter performance vs target?
- Q2: What is performance by region?
- Q3: Which regions are below target?
```python
def decompose_question(query):
    """Break complex questions into answerable sub-queries."""
    prompt = f"""
    Decompose this business question into specific, answerable sub-questions:

    Question: {query}

    For each sub-question, identify:
    - The metric being asked about
    - The segmentation dimension (if any)
    - The comparison or threshold (if any)
    """
    return llm.generate(prompt)
```
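The decomposition output can be carried through the rest of the pipeline as structured records rather than free text. As a minimal sketch (the `SubQuestion` type and the hand-written decomposition below are illustrative, not the actual Helix data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SubQuestion:
    """One answerable piece of a decomposed business question."""
    metric: str                # the metric being asked about
    dimension: Optional[str]   # segmentation dimension, if any
    comparison: Optional[str]  # comparison or threshold, if any

# Hypothetical decomposition of the example question above
subs = [
    SubQuestion("revenue", None, "vs target, current quarter"),
    SubQuestion("revenue", "region", None),
    SubQuestion("revenue", "region", "below target"),
]
```

Downstream stages can then match `metric` and `dimension` against the schema without re-parsing prose.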
Stage 2: Semantic Data Matching
Natural language terms must map to actual data columns. "Performance" might mean revenue, units sold, or conversion rate depending on context.
```python
def match_semantics(terms, schema, context):
    """Map business terms to data columns using semantic similarity."""
    embeddings = embed_terms(terms)
    column_embeddings = embed_schema(schema)

    matches = {}
    for term, emb in zip(terms, embeddings):
        # Find semantically similar columns
        scores = cosine_similarity(emb, column_embeddings)
        # Apply context boost (user's previous queries, domain)
        scores = apply_context_weights(scores, context)
        matches[term] = schema.columns[scores.argmax()]
    return matches
```
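The core of the matching step is cosine similarity between term and column embeddings. A self-contained toy version, with hand-made 3-d vectors standing in for a real embedding model and no context weighting:

```python
import numpy as np

def cosine_similarity(vec, matrix):
    # Cosine similarity of one vector against each row of a matrix
    return (matrix @ vec) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(vec))

# Toy "embeddings" standing in for a real embedding model
term_vecs = {"performance": np.array([0.9, 0.1, 0.0])}
columns = ["revenue", "units_sold", "signup_date"]
col_vecs = np.array([
    [0.8, 0.2, 0.1],   # revenue
    [0.6, 0.7, 0.0],   # units_sold
    [0.0, 0.1, 0.9],   # signup_date
])

matches = {term: columns[int(np.argmax(cosine_similarity(v, col_vecs)))]
           for term, v in term_vecs.items()}
print(matches)  # {'performance': 'revenue'}
```

With real embeddings, "performance" lands nearest whichever column the surrounding context (prior queries, domain) pulls it toward; that is what the context-weighting step adjusts.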
Stage 3: Query Synthesis
With semantic mappings established, we generate the data retrieval query:
```sql
-- Generated from: "Which regions are below target?"
SELECT
    region,
    SUM(revenue) AS actual,
    SUM(target) AS target,
    (SUM(revenue) - SUM(target)) / SUM(target) * 100 AS variance_pct
FROM sales_data
WHERE quarter = 'Q4-2024'
GROUP BY region
HAVING SUM(revenue) < SUM(target)
ORDER BY variance_pct ASC
```
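One way to picture this stage is as rendering a query template from the semantic mapping produced earlier. A sketch (the template and mapping keys are illustrative; a production version would validate column names against the schema and bind values as query parameters rather than interpolating strings):

```python
def synthesize_query(mapping, quarter):
    """Render a variance query from a term -> column mapping (illustrative template)."""
    metric, target = mapping["performance"], mapping["target"]
    # NOTE: f-string interpolation is for illustration only; real query
    # synthesis should use bound parameters to avoid SQL injection.
    return f"""
SELECT region,
       SUM({metric}) AS actual,
       SUM({target}) AS target,
       (SUM({metric}) - SUM({target})) / SUM({target}) * 100 AS variance_pct
FROM sales_data
WHERE quarter = '{quarter}'
GROUP BY region
HAVING SUM({metric}) < SUM({target})
ORDER BY variance_pct ASC""".strip()

sql = synthesize_query({"performance": "revenue", "target": "target"}, "Q4-2024")
```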
Stage 4: Insight Extraction
Raw data becomes insight through automated analysis:
```python
def extract_insights(df, query_context):
    """Identify the key takeaways from the data."""
    insights = []

    # Statistical insights
    if has_outliers(df):
        insights.append(describe_outliers(df))

    # Trend insights
    if has_time_dimension(df):
        insights.append(describe_trend(df))

    # Comparison insights
    if has_target_column(df):
        above = df[df['actual'] >= df['target']]
        below = df[df['actual'] < df['target']]
        insights.append(f"{len(above)} regions on track, {len(below)} below target")

    # Rank insights (guard against queries without a variance column)
    if 'variance_pct' in df.columns:
        top = df.nlargest(3, 'variance_pct')
        bottom = df.nsmallest(3, 'variance_pct')
        insights.append(f"Top performers: {list(top['region'])}")
        insights.append(f"Needs attention: {list(bottom['region'])}")

    return insights
```
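The comparison branch is easy to exercise on a toy frame. A self-contained run with made-up regional numbers:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["North", "South", "East", "West"],
    "actual": [120, 80, 95, 110],
    "target": [100, 100, 100, 100],
})
df["variance_pct"] = (df["actual"] - df["target"]) / df["target"] * 100

above = df[df["actual"] >= df["target"]]
below = df[df["actual"] < df["target"]]
print(f"{len(above)} regions on track, {len(below)} below target")
# 2 regions on track, 2 below target
```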
Stage 5: Visualization with Context
The final visualization includes context that makes it actionable:
```python
def build_actionable_chart(df, insights, metadata):
    fig = build_base_chart(df, metadata['chart_type'])

    # Add reference lines for targets/thresholds
    if metadata.get('target'):
        fig.add_hline(y=metadata['target'], line_dash="dash",
                      annotation_text="Target")

    # Highlight underperformers
    colors = ['red' if x < 0 else 'green' for x in df['variance_pct']]
    fig.update_traces(marker_color=colors)

    # Add insight annotation
    fig.add_annotation(
        text=insights[0],  # Most important insight
        xref="paper", yref="paper",
        x=0.5, y=1.1,
        showarrow=False,
    )

    # Action-oriented title
    fig.update_layout(
        title=f"{metadata['metric']}: {len(df[df['variance_pct'] < 0])} regions need attention"
    )
    return fig
```
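The highlight step can also be graded rather than binary. A slightly richer variant of the red/green rule above, as a plain function (the `-5%` warning threshold is a made-up example, not a Helix default):

```python
def variance_colors(variances, warn_threshold=-5.0):
    """Map variance-vs-target percentages to traffic-light colors (illustrative)."""
    colors = []
    for v in variances:
        if v < warn_threshold:
            colors.append("red")      # well below target: needs attention
        elif v < 0:
            colors.append("orange")   # slightly below target
        else:
            colors.append("green")    # at or above target
    return colors

print(variance_colors([-12.0, -2.5, 4.0]))  # ['red', 'orange', 'green']
```

The resulting list drops straight into `marker_color` the same way the binary version does.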
Making Insights Trustworthy
Actionable insights require trust. We build confidence through:
Data Lineage
Every visualization shows its source: which tables, what filters, when last updated.
Uncertainty Quantification
When data is incomplete or estimates are involved, we show confidence intervals.
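For a metric averaged over a sample, the simplest interval to show is a normal-approximation confidence interval. A minimal sketch, assuming the sample is large enough for the approximation to hold (the function name and data are illustrative):

```python
import math

def mean_with_ci(values, z=1.96):
    """Mean and ~95% normal-approximation confidence interval (illustrative)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance
    half = z * math.sqrt(var / n)                         # half-width of the interval
    return mean, (mean - half, mean + half)

mean, (lo, hi) = mean_with_ci([102, 98, 110, 95, 105])
```

The chart can then render `lo`/`hi` as an error band instead of presenting `mean` as an exact value.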
Explanation Generation
Users can ask "Why is this chart showing X?" and get a natural language explanation of the query and transformations.
Example: End-to-End
User asks: "Which products should we discount next month?"
Pipeline produces:
- Identifies inventory and sales data sources
- Queries: products with high inventory, low recent sales, approaching expiry
- Generates scatter plot: inventory level vs sales velocity
- Highlights quadrant: high inventory + low velocity
- Annotates: "12 products are overstocked with declining demand"
- Lists specific SKUs with recommended discount percentages
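The quadrant highlight in this example reduces to a two-threshold filter. A toy version (SKUs, numbers, and thresholds are all made up for illustration):

```python
def overstock_candidates(products, inv_threshold, velocity_threshold):
    """Pick SKUs in the 'high inventory, low sales velocity' quadrant (illustrative)."""
    return [p["sku"] for p in products
            if p["inventory"] > inv_threshold and p["velocity"] < velocity_threshold]

products = [
    {"sku": "A-101", "inventory": 900, "velocity": 2.0},
    {"sku": "B-202", "inventory": 150, "velocity": 9.5},
    {"sku": "C-303", "inventory": 700, "velocity": 1.1},
]
print(overstock_candidates(products, inv_threshold=500, velocity_threshold=3.0))
# ['A-101', 'C-303']
```

In the real pipeline the thresholds would come from the query context (e.g. percentiles of the inventory and velocity distributions) rather than fixed constants.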
Continuous Improvement
The pipeline improves through:
- User feedback: Thumbs up/down on insights
- Query refinement: User edits become training data
- A/B testing: Compare insight quality across model versions
- LLM judge: Automated evaluation of insight relevance and clarity
For technical details on evaluation, see our post on LLM Judge and Regression Harness.