MBPP research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
From the dataset card for Mostly Basic Python Problems (mbpp): the benchmark consists of around 1,000 crowd-sourced Python programming problems, designed to be solvable by entry-level programmers, covering programming fundamentals, standard library functionality, and so on. Each problem consists of a task description, a code solution, and 3 automated test cases. As described in the paper, a subset of the data has been hand-verified by the dataset authors. Released here as part of… See the full description on the dataset page: https://huggingface.co/datasets/google-research-datasets/mbpp.
Research Context
MBPP: 374 rows by 6 columns. These exploratory charts are generated automatically from the data. Open the dataset in Helix to ask your own questions.
Data Profile
Chart Evidence
These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
Total task_id by test_setup_code
Top test_setup_code values ranked by summed task_id.
task_id by test_setup_code
Spread of task_id across test_setup_code groups.
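The two chart descriptions above amount to a grouped aggregation: a per-group sum of task_id and a per-group spread. The sketch below shows that aggregation with the standard library only. The helper name `summarize` is hypothetical, and the rows are the six preview rows with the test_setup_code cell copied as the preview table displays it, not the full 374-row dataset. Because task_id is an identifier, the summed value mostly reflects how many rows fall into each group.

```python
from collections import defaultdict

# Illustrative rows only: task_id values from the preview table, with the
# test_setup_code cell copied as the preview displays it.
rows = [
    {"task_id": 601, "test_setup_code": "[]"},
    {"task_id": 602, "test_setup_code": "[]"},
    {"task_id": 603, "test_setup_code": "[]"},
    {"task_id": 604, "test_setup_code": "[]"},
    {"task_id": 605, "test_setup_code": "[]"},
    {"task_id": 606, "test_setup_code": "[]"},
]


def summarize(rows, group_key, value_key):
    """Group rows by group_key and report total, min, max, and count of value_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: {
            "total": sum(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for key, values in groups.items()
    }


summary = summarize(rows, "test_setup_code", "task_id")

# Rank groups by summed task_id, largest first, as in the first chart.
ranked = sorted(summary.items(), key=lambda item: item[1]["total"], reverse=True)
for group, stats in ranked:
    print(group, stats)
```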
Follow-Up Queries
Preview Rows
| # | task_id (integer) | text (text) | code (text) | test_list (text) | test_setup_code (text) | challenge_test_list (text) |
|---|---|---|---|---|---|---|
| 1 | 601 | Write a function to find the longest chain which can be formed from the given set of pairs. | class Pair(object): def __init__(self, a, b): self.a = a self.b = b def max_chain_length(arr, n): max = 0 mcl = [1 for i … | ['assert max_chain_length([Pair(5, 24), Pair(15, 25),Pair(27, 40), Pair(50, 60)], 4) == 3' 'assert max_chain_length([Pair(1, 2), Pair(3, 4… | [] | |
| 2 | 602 | Write a python function to find the first repeated character in a given string. | def first_repeated_char(str1): for index,c in enumerate(str1): if str1[:index+1].count(c) > 1: return c return "None" | ['assert first_repeated_char("abcabc") == "a"' 'assert first_repeated_char("abc") == "None"' 'assert first_repeated_char("123123") == "1"… | [] | |
| 3 | 603 | Write a function to get a lucid number smaller than or equal to n. | def get_ludic(n): ludics = [] for i in range(1, n + 1): ludics.append(i) index = 1 while(index != len(ludics)): first_ludic =… | ['assert get_ludic(10) == [1, 2, 3, 5, 7]' 'assert get_ludic(25) == [1, 2, 3, 5, 7, 11, 13, 17, 23, 25]' 'assert get_ludic(45) == [1, 2, … | [] | |
| 4 | 604 | Write a function to reverse words in a given string. | def reverse_words(s): return ' '.join(reversed(s.split())) | ['assert reverse_words("python program")==("program python")' 'assert reverse_words("java language")==("language java")' 'assert reverse_… | [] | |
| 5 | 605 | Write a function to check if the given integer is a prime number. | def prime_num(num): if num >=1: for i in range(2, num//2): if (num % i) == 0: return False else: … | ['assert prime_num(13)==True' 'assert prime_num(7)==True' 'assert prime_num(-1010)==False'] | [] | |
| 6 | 606 | Write a function to convert degrees to radians. | import math def radian_degree(degree): radian = degree*(math.pi/180) return radian | ['assert radian_degree(90)==1.5707963267948966' 'assert radian_degree(60)==1.0471975511965976' 'assert radian_degree(120)==2.094395102393… | [] | |
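
Each preview row pairs a code solution with assert statements in test_list. A minimal sketch of how such a record could be checked follows, using row 604 because its code is fully visible in the preview. The helper `run_record` is hypothetical and is not part of the dataset or of Helix. The third assertion for row 604 is truncated in the preview, so only the two visible ones are used.

```python
def run_record(code, tests, setup=""):
    """Execute a solution, then report which assert statements pass.

    Caution: exec runs arbitrary code. Only use this on code you trust,
    or run it inside a sandbox.
    """
    namespace = {}
    exec(setup, namespace)  # optional test_setup_code (may be empty)
    exec(code, namespace)   # defines the solution function(s)
    results = []
    for test in tests:
        try:
            exec(test, namespace)
            results.append(True)
        except AssertionError:
            results.append(False)
    return results


# Row 604 from the preview, re-indented so that it parses as Python.
code_604 = "def reverse_words(s):\n    return ' '.join(reversed(s.split()))"
tests_604 = [
    'assert reverse_words("python program")==("program python")',
    'assert reverse_words("java language")==("language java")',
]

outcome = run_record(code_604, tests_604)
print(outcome)
```

The preview flattens each code cell onto one line, so any record taken from the table needs its line breaks and indentation restored before it can be executed.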
Data Dictionary
- task_id numeric
- text text
- code text
- test_list mixed
- test_setup_code categorical
- challenge_test_list mixed
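The report does not state the rules behind these labels. As an illustration only, the sketch below assigns similar labels with a simple heuristic: all-number columns are numeric, string columns with few distinct values are categorical, other string columns are text, and anything else is mixed. The function `infer_kind` and the 0.5 distinct-value ratio are assumptions, and the sample values are placeholders shaped like the preview rows.

```python
def infer_kind(values, categorical_ratio=0.5):
    """Label a column as numeric, categorical, text, mixed, or empty.

    The threshold is an assumed heuristic, not the report's actual rule.
    """
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    # bool is a subclass of int in Python, so exclude it from numeric.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, str) for v in present):
        distinct = len(set(present))
        if distinct / len(present) <= categorical_ratio:
            return "categorical"
        return "text"
    return "mixed"


# Placeholder columns shaped like the preview rows.
sample = {
    "task_id": [601, 602, 603],
    "text": [
        "Write a function to reverse words in a given string.",
        "Write a function to check if the given integer is a prime number.",
        "Write a function to convert degrees to radians.",
    ],
    "test_setup_code": ["[]", "[]", "[]"],
    "test_list": [
        ["assert prime_num(13)==True"],
        ["assert prime_num(7)==True"],
        ["assert prime_num(-1010)==False"],
    ],
}

profile = {name: infer_kind(column) for name, column in sample.items()}
print(profile)
```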
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from real columns.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
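The chart-specification step can be pictured as building a plain dictionary in Plotly's figure JSON layout (a list of traces under `data`, plus a `layout`). The sketch below is one possible reading of "defensive": it validates its inputs before emitting a spec. The function `bar_chart_spec` is hypothetical, and the single total shown is the sum of the six preview task_id values, not a figure taken from the full dataset.

```python
import json


def bar_chart_spec(categories, totals, x_title, y_title):
    """Build a Plotly-style bar chart spec as a plain dict, validating inputs first."""
    categories = list(categories)
    totals = list(totals)
    if not categories:
        raise ValueError("at least one category is required")
    if len(categories) != len(totals):
        raise ValueError("categories and totals must have the same length")
    return {
        "data": [{"type": "bar", "x": categories, "y": totals}],
        "layout": {
            "title": {"text": f"Total {y_title} by {x_title}"},
            "xaxis": {"title": {"text": x_title}},
            "yaxis": {"title": {"text": y_title}},
        },
    }


# 3621 is the sum of task_id over the six preview rows only.
spec = bar_chart_spec(["[]"], [3621], "test_setup_code", "task_id")
print(json.dumps(spec, indent=2))
```

Keeping the spec as a JSON-serializable dictionary means it can be stored next to the query that produced it and re-rendered later without rerunning the aggregation.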
This report is intentionally reproducible. It relies on the local catalog metadata and the generated chart specifications, and it draws no conclusions beyond what the dataset itself shows.