CodeAlpaca 20K research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
This dataset divides the original CodeAlpaca dataset into train and test splits.
Finding 1: The dataset has 500 rows available in the catalog.
Finding 2: The catalog exposes 2 documented or inferred columns.
Finding 3: Helix has 4 ready query prompts for this dataset.
Finding 4: This report still exposes schema, preview rows, and query prompts even when charts cannot be precomputed.
Follow-Up Queries
Preview Rows
| # | prompt text | completion text |
|---|---|---|
| 1 | Create a Java class which sorts the given array of numbers. [9, 2, 4, 3, 6, 1] | class ArraySort { void sort(int arr[]) { int n = arr.length; // One by one move boundary of unsorted subarray … |
| 2 | Create an algorithm in JavaScript to find the closest pair of points from a given set of points. let points = [[2, 0], [4, 4], [-2, 2], [-3… | // algorithm to find closest pair of points // uses divide and conquer approach const closestPair = (points) => { // base case: two or… |
| 3 | Change the following loop so that it prints out the numbers from 10 to 1 in descending order. for i in range(1, 11): print(i) | for i in range(10, 0, -1): print(i) |
| 4 | Write an SQL query that returns the 5 most popular books in the library. SELECT * FROM libraryBook | SELECT bookID, bookTitle, COUNT(*) AS Count FROM libraryBook GROUP BY bookID, bookTitle ORDER BY Count DESC LIMIT 5; |
| 5 | Train an AI model to predict whether an email is spam or not. | import nltk from sklearn.feature_extraction.text import CountVectorizer from sklearn.naive_bayes import MultinomialNB # loading data data … |
| 6 | Suggest a solution to the following problem: Write a program to sort a list of strings in alphabetical order. list_of_strings = ['banana', … | list_of_strings.sort() # Sorts the list in alphabetical order |
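The preview flattens each completion onto a single line, so restoring the line breaks and running a short one is a quick sanity check. Below is a minimal sketch for row 3, assuming the completion was originally a two-line loop. The `printed` list is a hypothetical addition used only to capture the values; it is not part of the dataset.

```python
# Completion from preview row 3, with line breaks restored (assumed layout).
printed = []  # hypothetical helper list, records what the loop prints

for i in range(10, 0, -1):
    print(i)
    printed.append(i)
```

Running it prints the integers from 10 down to 1, one per line, which matches what the row's prompt asks for.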
Data Dictionary
- prompt text
- completion text
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
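The first three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the report generator's actual code: the column names `prompt` and `completion`, the helpers `infer_kind`, `profile`, and `length_histogram_spec`, and the 40-character threshold are all hypothetical. Only numeric, categorical, and free-text detection is shown; time and location inference and the Helix query links are left out because the report does not describe how they work. The chart specification is a plain dictionary in the shape of a Plotly figure (`data` and `layout`), so the sketch does not need Plotly installed.

```python
from typing import Any, Optional

# Two preview rows copied from the table; the keys are assumed column names.
PREVIEW_ROWS = [
    {
        "prompt": "Change the following loop so that it prints out the numbers "
                  "from 10 to 1 in descending order. for i in range(1, 11): print(i)",
        "completion": "for i in range(10, 0, -1): print(i)",
    },
    {
        "prompt": "Write an SQL query that returns the 5 most popular books in "
                  "the library. SELECT * FROM libraryBook",
        "completion": "SELECT bookID, bookTitle, COUNT(*) AS Count FROM libraryBook "
                      "GROUP BY bookID, bookTitle ORDER BY Count DESC LIMIT 5;",
    },
]


def infer_kind(values: list[Any]) -> str:
    """Classify one column as numeric, categorical, text, empty, or unknown."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, str) for v in present):
        # Repeated short strings look like categories; the rest is free text.
        repeated = len(set(present)) < len(present)
        short = max(len(v) for v in present) <= 40  # hypothetical threshold
        return "categorical" if repeated and short else "text"
    return "unknown"


def profile(rows: list[dict[str, Any]]) -> dict[str, str]:
    """Map every column seen in the rows to its inferred kind."""
    columns = sorted({key for row in rows for key in row})
    return {col: infer_kind([row.get(col) for row in rows]) for col in columns}


def length_histogram_spec(rows: list[dict[str, Any]], column: str) -> Optional[dict]:
    """Build a Plotly-style histogram of text lengths, or None if nothing to plot."""
    lengths = [len(row[column]) for row in rows if isinstance(row.get(column), str)]
    if not lengths:
        return None  # defensive: skip the chart instead of emitting an empty one
    return {
        "data": [{"type": "histogram", "x": lengths, "name": column}],
        "layout": {
            "title": {"text": f"Character length of {column}"},
            "xaxis": {"title": {"text": "characters"}},
        },
    }


kinds = profile(PREVIEW_ROWS)
specs = []
for col, kind in kinds.items():
    if kind == "text":
        spec = length_histogram_spec(PREVIEW_ROWS, col)
        if spec is not None:
            specs.append(spec)
```

Returning `None` for an unplottable column is one way to read "defensive" here: the caller drops the chart and the rest of the report (schema, preview rows, query prompts) is still produced.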
This report is intentionally reproducible. It relies only on the local catalog metadata and the generated chart specifications, and it draws no conclusions beyond what the dataset itself shows.