Dataset research report
MathInstruct research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
🦣 MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
MathInstruct is a curated instruction-tuning dataset that is lightweight yet generalizable. It is compiled from 13 math rationale datasets, six of which are newly curated by this work. It focuses on the hybrid use of chain-of-thought (CoT) and program-of-thought (PoT) rationales and covers a broad range of mathematical fields. Project Page:… See the full description on the dataset page: https://huggingface.co/datasets/TIGER-Lab/MathInstruct.
Research Context
MathInstruct: 500 rows by 3 columns. These exploratory charts are generated automatically from the data; open the dataset in Helix to ask your own questions.
Data Profile
Chart Evidence
These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
Follow-Up Queries
Preview Rows
| # | source | output | instruction |
|---|---|---|---|
| 1 | data/CoT/aqua_rat.json | Let's think about the multi-choice question. 6.52 × 10^5 ly / (3.26 ly/parsec) = 2 x 10^5 persec The answer is A. | The distance between two stars is 6.52 × 10^5 light years. What is the distance between the two stars in parsecs? (1 parsec = 3.26 light ye… |
| 2 | data/CoT/aqua_rat.json | Let's solve the multi-choice question step by step. According to the above the # of permutations of 6 letters COMMON out of which 2 O's and… | How many ways can the letters in the word COMMON be arranged? Answer Choices: (A) 6 (B) 30 (C) 90 (D) 120 (E) 180 |
| 3 | data/PoT/aqua_rat_filtered.json | answers = ['A', 'B', 'C', 'D', 'E'] # If the best marksman had scored 92 points, the total score would have been 84 * 6 = 504 # But he actu… | A team of six entered for a shooting competition. The best marks man scored 85 points. If he had scored 92 points, the average scores for. … |
| 4 | data/CoT/gsm_rft.json | The second patient needs 6+5 = 11 sessions 25-11-6 = 8 sessions The answer is 8 | A psychiatrist has 4 patients that need 25 sessions in total. One of the patients needs 6 sessions. Another patient needs 5 more than that.… |
| 5 | data/PoT/aqua_rat_filtered.json | radius = 22.4 resolutions = 500 # calculate the circumference of the wheel circumference = 2 * 3.14 * radius # calculate the distance cover… | The radius of a wheel is 22.4 cm. What is the distance covered by the wheel in making 500 resolutions? Answer Choices: (A) 187 m (B) 704 m … |
| 6 | data/CoT/college_math.json | By the Fundamental Theorem of Finite Abelian Groups, any group of order 35, which is the product of two distinct prime numbers (5 and 7), m… | Let G be a group of order 35. What can be said about G? Answer Choices: (A) G must be abelian. (B) G must be cyclic. (C) G must be a direc… |
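The source paths in the preview rows appear to encode both the rationale style (CoT or PoT) and the originating subset. A minimal sketch of how that could be tallied, assuming every path follows the `data/<style>/<subset>.json` pattern seen in these six rows; the helper name `split_source` is hypothetical and not part of the report:

```python
from collections import Counter
from pathlib import PurePosixPath

# Source values copied from the six preview rows above.
sources = [
    "data/CoT/aqua_rat.json",
    "data/CoT/aqua_rat.json",
    "data/PoT/aqua_rat_filtered.json",
    "data/CoT/gsm_rft.json",
    "data/PoT/aqua_rat_filtered.json",
    "data/CoT/college_math.json",
]


def split_source(path: str) -> tuple[str, str]:
    """Return (rationale_style, subset_name) for a path like data/CoT/aqua_rat.json.

    Assumes the parent directory names the style and the file stem names the subset.
    """
    p = PurePosixPath(path)
    return p.parent.name, p.stem


# Count rows per rationale style and per subset.
style_counts = Counter(split_source(s)[0] for s in sources)
subset_counts = Counter(split_source(s)[1] for s in sources)

print(style_counts)  # Counter({'CoT': 4, 'PoT': 2})
print(subset_counts)
```

On the preview alone this gives four CoT rows and two PoT rows. The same tally over all 500 rows would show how the sample balances the two rationale styles.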
Data Dictionary
- source: text
- output: text
- instruction: text
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
This report is intentionally reproducible. It uses the local catalog metadata and generated chart specifications rather than claiming external conclusions beyond the dataset.
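The profiling and chart-specification steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the report's actual pipeline: the heuristic in `infer_kind`, the helper `bar_spec`, the `max_categories` threshold, and the short column keys `source`, `output`, and `instruction` are all hypothetical, and time and location detection are left out. The chart specification is a plain dictionary in the `data`/`layout` shape that Plotly figures use.

```python
from collections import Counter

# Four preview rows copied from the table above (truncations kept as shown).
rows = [
    {"source": "data/CoT/aqua_rat.json",
     "output": "Let's think about the multi-choice question. 6.52 × 10^5 ly / (3.26 ly/parsec) = 2 x 10^5 persec The answer is A.",
     "instruction": "The distance between two stars is 6.52 × 10^5 light years. What is the distance between the two stars in parsecs? (1 parsec = 3.26 light ye…"},
    {"source": "data/CoT/aqua_rat.json",
     "output": "Let's solve the multi-choice question step by step. According to the above the # of permutations of 6 letters COMMON out of which 2 O's and…",
     "instruction": "How many ways can the letters in the word COMMON be arranged? Answer Choices: (A) 6 (B) 30 (C) 90 (D) 120 (E) 180"},
    {"source": "data/PoT/aqua_rat_filtered.json",
     "output": "answers = ['A', 'B', 'C', 'D', 'E'] # If the best marksman had scored 92 points, the total score would have been 84 * 6 = 504 # But he actu…",
     "instruction": "A team of six entered for a shooting competition. The best marks man scored 85 points. If he had scored 92 points, the average scores for. …"},
    {"source": "data/CoT/gsm_rft.json",
     "output": "The second patient needs 6+5 = 11 sessions 25-11-6 = 8 sessions The answer is 8",
     "instruction": "A psychiatrist has 4 patients that need 25 sessions in total. One of the patients needs 6 sessions. Another patient needs 5 more than that.…"},
]


def infer_kind(values, max_categories=20):
    """Classify a column as 'numeric', 'categorical', or 'text' (assumed heuristic)."""
    non_null = [v for v in values if v is not None]
    if non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    ):
        return "numeric"
    distinct = len(set(non_null))
    # Repeated values in a low-cardinality column suggest a categorical field.
    if distinct < len(non_null) and distinct <= max_categories:
        return "categorical"
    return "text"


def bar_spec(column):
    """Build a Plotly-style bar chart specification counting rows per value."""
    counts = Counter(r[column] for r in rows)
    return {
        "data": [{"type": "bar", "x": list(counts.keys()), "y": list(counts.values())}],
        "layout": {"title": {"text": f"Rows per {column}"}},
    }


# Profile every column, then emit one chart spec per categorical column.
profile = {col: infer_kind([r[col] for r in rows]) for col in rows[0]}
specs = [bar_spec(col) for col, kind in profile.items() if kind == "categorical"]

print(profile)
print(specs)
```

Under this heuristic only the source column qualifies as categorical, so a single bar chart of rows per source file is produced. The two free-text columns would need derived features, such as text length, before they could be charted.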