Spambase research report
A reproducible data report with schema notes, generated chart evidence, suggested follow-up questions, and export-ready Helix queries.
Executive Summary
Spambase
The Spambase dataset from the UCI ML repository. Is the given mail spam?

Configurations and tasks
| Configuration | Task | Description |
|---|---|---|
| spambase | Binary classification | Is the mail spam? |

Usage

    from datasets import load_dataset

    dataset = load_dataset("mstz/spambase")["train"]
Research Context
Spambase: 500 rows by 58 columns. These exploratory charts are generated automatically from the data. Open the dataset in Helix to ask your own questions.
Data Profile
Chart Evidence
These views are generated from the dataset profile. Each chart is paired with a Helix query so it can be opened, adjusted, and exported.
word_freq_make vs word_freq_address
Relationship between word_freq_make and word_freq_address.
Distribution of word_freq_make
Histogram of word_freq_make values.
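As a rough illustration of what a histogram view computes, the sketch below bins values into equal-width intervals using only the standard library. The function name `histogram` and the choice of three bins are assumptions made for this example, and the input is just the six `word_freq_make` values from the preview rows, not the full 500-row column.

```python
from typing import List, Tuple


def histogram(values: List[float], bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (left_edge, right_edge, count) tuples for equal-width bins."""
    if not values or bins < 1:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single degenerate bin holds everything.
        return [(lo, hi, len(values))]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, counts[i]) for i in range(bins)]


# word_freq_make values taken from the six preview rows only.
preview_make = [0, 0.21, 0.06, 0, 0, 0]
result = histogram(preview_make, bins=3)
for left, right, count in result:
    print(f"[{left:.2f}, {right:.2f}): {count}")
```

Most preview values are zero, so the first bin dominates. The full column may look different.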
Correlation of numeric columns
Pearson correlation between numeric columns.
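For readers who want to check a single cell of such a correlation view by hand, here is a minimal standard-library sketch of the Pearson coefficient. The helper name `pearson` is an assumption for this example, and the inputs are only the six preview rows of `word_freq_make` and `word_freq_address`, so the value it produces says nothing reliable about the full dataset.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Values copied from the six preview rows only.
make = [0, 0.21, 0.06, 0, 0, 0]
address = [0.64, 0.28, 0, 0, 0, 0]
r = pearson(make, address)
print(f"pearson(word_freq_make, word_freq_address) on preview rows: {r:.3f}")
```

Returning NaN for constant columns is a design choice in this sketch. It keeps a correlation matrix computable even when a column never varies in the sample.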
Follow-Up Queries
Preview Rows
| # | word_freq_make (float) | word_freq_address (float) | word_freq_all (float) | word_freq_3d (float) | word_freq_our (float) | word_freq_over (float) | word_freq_remove (float) | word_freq_internet (float) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0.64 | 0.64 | 0 | 0.32 | 0 | 0 | 0 |
| 2 | 0.21 | 0.28 | 0.5 | 0 | 0.14 | 0.28 | 0.21 | 0.07 |
| 3 | 0.06 | 0 | 0.71 | 0 | 1.23 | 0.19 | 0.19 | 0.12 |
| 4 | 0 | 0 | 0 | 0 | 0.63 | 0 | 0.31 | 0.63 |
| 5 | 0 | 0 | 0 | 0 | 0.63 | 0 | 0.31 | 0.63 |
| 6 | 0 | 0 | 0 | 0 | 1.85 | 0 | 0 | 1.85 |
Data Dictionary
- word_freq_make numeric
- word_freq_address numeric
- word_freq_all numeric
- word_freq_3d numeric
- word_freq_our numeric
- word_freq_over numeric
- word_freq_remove numeric
- word_freq_internet numeric
- word_freq_order numeric
- word_freq_mail numeric
- word_freq_receive numeric
- word_freq_will numeric
- word_freq_people numeric
- word_freq_report numeric
- word_freq_addresses numeric
- word_freq_free numeric
- word_freq_business numeric
- word_freq_email numeric
- word_freq_you numeric
- word_freq_credit numeric
- word_freq_your numeric
- word_freq_font numeric
- word_freq_000 numeric
- word_freq_money numeric
- word_freq_hp numeric
- word_freq_hpl numeric
- word_freq_george numeric
- word_freq_650 numeric
- word_freq_lab numeric
- word_freq_labs numeric
- word_freq_telnet numeric
- word_freq_857 numeric
- word_freq_data numeric
- word_freq_415 numeric
- word_freq_85 numeric
- word_freq_technology numeric
- word_freq_1999 numeric
- word_freq_parts numeric
- word_freq_pm numeric
- word_freq_direct numeric
- word_freq_cs numeric
- word_freq_meeting numeric
- word_freq_original numeric
- word_freq_project numeric
- word_freq_re numeric
- word_freq_edu numeric
- word_freq_table numeric
- word_freq_conference numeric
- char_freq_; numeric
- char_freq_( numeric
- char_freq_[ numeric
- char_freq_! numeric
- char_freq_$ numeric
- char_freq_# numeric
- capital_run_length_average numeric
- capital_run_length_longest numeric
- capital_run_length_total numeric
- is_spam numeric (binary classification target: is the mail spam?)
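The 58 columns listed above fall into a few naming families. The sketch below groups them by prefix and counts each family. The `family` helper and the family labels (including calling `is_spam` the "label") are assumptions made for this example. The column names themselves are copied from the dictionary.

```python
from collections import Counter

# Word and character suffixes copied from the data dictionary.
WORDS = (
    "make address all 3d our over remove internet order mail receive will "
    "people report addresses free business email you credit your font 000 "
    "money hp hpl george 650 lab labs telnet 857 data 415 85 technology "
    "1999 parts pm direct cs meeting original project re edu table conference"
).split()
CHARS = [";", "(", "[", "!", "$", "#"]

columns = (
    [f"word_freq_{w}" for w in WORDS]
    + [f"char_freq_{c}" for c in CHARS]
    + [
        "capital_run_length_average",
        "capital_run_length_longest",
        "capital_run_length_total",
        "is_spam",
    ]
)


def family(name: str) -> str:
    """Map a column name to its naming family (hypothetical labels)."""
    for prefix in ("word_freq_", "char_freq_", "capital_run_length_"):
        if name.startswith(prefix):
            return prefix.rstrip("_")
    return "label" if name == "is_spam" else "other"


counts = Counter(family(c) for c in columns)
for fam, n in counts.items():
    print(f"{fam}: {n}")
```

Separating the target column from the feature families this way makes it easy to select only feature columns before modeling.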
Method And Limits
- Load the catalog entry and preview rows from the processed dataset file.
- Infer numeric, categorical, time, and location fields from the columns actually present in the data.
- Generate a small set of defensive Plotly chart specifications from that profile.
- Expose each chart idea as a query link so the report can be rerun or exported in Helix.
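The steps above can be sketched roughly as follows. This is not Helix's actual implementation: `infer_kind`, `histogram_spec`, and `build_charts` are hypothetical names, the type inference covers only numeric versus categorical fields (time and location detection is left out), and the chart specification is a plain dictionary in Plotly's figure JSON layout (`data` and `layout` keys), so the Plotly library is not required to run it.

```python
import json
from typing import Any, Dict, List


def infer_kind(values: List[Any]) -> str:
    """Classify a column as 'numeric' or 'categorical' from its non-null values."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    # Empty or mixed columns fall back to categorical (the defensive default).
    return "categorical"


def histogram_spec(column: str, values: List[float]) -> Dict[str, Any]:
    """Build a Plotly-style figure dictionary for a histogram of one column."""
    return {
        "data": [{"type": "histogram", "x": values, "name": column}],
        "layout": {
            "title": {"text": f"Distribution of {column}"},
            "xaxis": {"title": {"text": column}},
        },
    }


def build_charts(table: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Emit one histogram spec per numeric column; skip everything else."""
    return [
        histogram_spec(name, values)
        for name, values in table.items()
        if infer_kind(values) == "numeric"
    ]


# Two columns from the preview rows plus a made-up text column for contrast.
preview = {
    "word_freq_make": [0, 0.21, 0.06, 0, 0, 0],
    "word_freq_our": [0.32, 0.14, 1.23, 0.63, 0.63, 1.85],
    "note": ["a", "b", "a", "b", "a", "b"],  # hypothetical, not in the dataset
}
specs = build_charts(preview)
print(json.dumps(specs[0]["layout"], indent=2))
```

Keeping the specification as plain JSON-serializable data means each chart can be stored, rerun, or exported without re-reading the dataset profile.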
This report is intentionally reproducible. It relies on the local catalog metadata and the generated chart specifications, and it makes no claims beyond what the dataset itself shows.