ForgeFrame and Statistical Analysis¶
Summary¶
ForgeFrame is a statistics toolkit that wraps a BareFrame and provides histogram analysis inspired by CERN ROOT's histogramming interface.
Key design: ForgeFrame outputs row-oriented results (one row per bin) that write directly to Google Sheets and integrate with charts.
const df = gaslamp.BareFrame.fromColumns({ score: [45, 72, 85, 91, 60, 88] });
// Create 5 equal-width bins from 50 to 100
const result = gaslamp.ForgeFrame.from(df).h1('score', {
nbins: 5,
xlow: 50,
xup: 100,
});
// Output: one row per bin (ready for sheets and charts)
// bin | count | relative | cumulative_relative
// underflow | 1 | 0.1667 | 0.1667
// [50, 60) | 1 | 0.1667 | 0.3333
// [60, 70) | 1 | 0.1667 | 0.5000
// [70, 80) | 1 | 0.1667 | 0.6667
// [80, 90) | 2 | 0.3333 | 1.0000
// [90, 100) | 0 | 0.0000 | 1.0000
// overflow | 0 | 0.0000 | 1.0000
// Write directly to a sheet for charting
result.toSheet(SpreadsheetApp.getActiveSheet());
ForgeFrame returns plain BareFrame results—no schema validation. Each bin is a row, ready for Google Sheets visualization.
Philosophy: Histograms as Sheets, Ready for Charts¶
In standard Google Apps Script, creating charts requires manual work:
// Standard GAS: manual histogram binning into an object
const scores = [45, 72, 85, 91, 60, 88];
const bins = { underflow: 0, bin1: 0, bin2: 0, ..., overflow: 0 };
for (const score of scores) {
if (score < 50) bins.underflow++;
else if (score < 60) bins.bin1++;
else if (score < 70) bins.bin2++;
// ... more boilerplate ...
}
// Then manually write to sheet row by row
const sheet = SpreadsheetApp.getActiveSheet();
sheet.getRange(1, 1).setValue('Bin');
sheet.getRange(1, 2).setValue('Count');
sheet.getRange(2, 1).setValue('underflow');
sheet.getRange(2, 2).setValue(bins.underflow);
// ... tedious manual cell-by-cell filling ...
// Finally, create a chart
Problems:
- Tedious manual conversion to sheet format
- Easy to make off-by-one mistakes
- Hard to compute relative frequency and cumulative statistics
- Can't reuse or transform histogram results
- Difficult to integrate with Google Sheets charting
The Solution: Histograms as Row-Oriented DataFrames¶
ForgeFrame produces histograms as a BareFrame where each bin is a row. This design means:
- Direct sheet output:
.toSheet()writes histogram ready for charts - Reusable results: Filter, sort, or transform the histogram
- Integrated statistics: Relative frequency and cumulative columns computed automatically
- Chart-friendly: Rows map directly to chart data series
Create a histogram the same way you filter or group:
// 1D histogram
gaslamp.ForgeFrame.from(df).h1('column', { nbins: 10, xlow: 0, xup: 100 });
// 2D histogram (joint distribution)
gaslamp.ForgeFrame.from(df).h2('age', 'salary', {
xnbins: 10, xlow: 0, xup: 100,
ynbins: 5, ylow: 0, yup: 500000,
});
Both return a BareFrame with:
- Bin labels
- Counts
- Relative frequencies
- Cumulative statistics
Then use standard DataFrame operations to extract insights:
const hist = gaslamp.ForgeFrame.from(df).h1('score', { nbins: 5, xlow: 0, xup: 100 });
// Find the most common bin
const peak = hist.filter(row => row.get('count') > 0)
.sort((a, b) => {
const aCount = row.get('count');
const bCount = row.get('count');
return bCount - aCount;
})
.slice(0, 1);
How It Works: Binning and Counting¶
1D Histograms: h1(column, options)¶
Divides values into equal-width bins and counts how many fall into each.
Bin convention (from CERN ROOT):
[xlow, xup) — half-open interval
├─ [xlow, xlow+width)
├─ [xlow+width, xlow+2*width)
├─ ... (nbins total)
└─ [xup-width, xup)
Values outside:
- Below xlow: go to
underflowbin - At or above xup: go to
overflowbin
Example: 5 bins from 50 to 100
[50, 60), [60, 70), [70, 80), [80, 90), [90, 100)
Value 45 → underflow Value 75 → [70, 80) Value 100 → overflow
Output columns:
| Column | Description |
|---|---|
bin |
Bin label: "underflow", "[xlow, xhigh)", "overflow" |
count |
Number of values in this bin |
relative |
count / total (as decimal) |
cumulative_relative |
Sum of relative frequency up to this bin |
2D Histograms: h2(x_column, y_column, options)¶
Analyzes joint distribution of two variables using a 2D grid.
Example: Age (x: 0–100) vs Salary (y: 0–500K)
xnbins: 10 (0–100 split into 10 bins, 10K each)
ynbins: 5 (0–500K split into 5 bins, 100K each)
Creates 10 × 5 = 50 bins (plus underflow/overflow on each axis).
Output columns:
| Column | Description |
|---|---|
x_bin |
X-axis bin label |
y_bin |
Y-axis bin label |
count |
Number of (x, y) pairs in this bin |
relative |
count / total |
cumulative_relative |
Sum of relative frequency |
Understanding Relative Frequencies¶
relative = count / total_count
cumulative_relative = sum of relative from first bin to current
Example: 6 values, 5 bins
underflow 1 0.1667 0.1667
[50, 60) 1 0.1667 0.3333
[60, 70) 1 0.1667 0.5000
[70, 80) 1 0.1667 0.6667
[80, 90) 2 0.3333 1.0000
overflow 0 0.0000 1.0000
Cumulative frequency shows percentiles: 50% of values are in first 3 bins.
Design Insight: Why Separate from BareFrame?¶
You might ask: "Why not just add h1() to BareFrame?"
Because:
-
Focused responsibility: BareFrame does general transformations. ForgeFrame specializes in statistical analysis.
-
Optional import: Users who don't need histograms don't pay for the code.
-
Clear intent:
.h1()onForgeFrameexplicitly signals "I'm doing statistical analysis," not "I'm transforming data." -
One job, done well: ForgeFrame doesn't try to do every statistical operation. Just histograms, done right.
Precision and Rounding¶
By default, relative and cumulative_relative columns are rounded to 4 decimal places.
Customize with ForgeFrameOptions:
// Round to 2 decimal places
const forge = gaslamp.ForgeFrame.from(df, { precision: 2 });
const hist = forge.h1('score', { nbins: 10, xlow: 0, xup: 100 });
// relative column: 0.17, 0.33, ... (not 0.1667, 0.3333)
Precision affects readability without changing underlying counts.
Use Cases¶
1. Understand Data Distribution¶
const ages = gaslamp.BareFrame.fromSheet(sheet);
const hist = gaslamp.ForgeFrame.from(ages).h1('age', { nbins: 10, xlow: 0, xup: 100 });
console.log(hist.toString()); // Quick visual of age distribution
2. Find Outliers and Anomalies¶
const hist = gaslamp.ForgeFrame.from(df).h1('metric', { nbins: 50, xlow: 0, xup: 1000 });
// Check underflow/overflow
const anomalies = hist.filter(row =>
row.get('bin') === 'underflow' || row.get('bin') === 'overflow'
);
3. Compare Distributions Across Groups¶
const sales = df.filter(row => row.get('category') === 'Sales');
const eng = df.filter(row => row.get('category') === 'Engineering');
const salesHist = gaslamp.ForgeFrame.from(sales).h1('salary', { nbins: 10, xlow: 0, xup: 200000 });
const engHist = gaslamp.ForgeFrame.from(eng).h1('salary', { nbins: 10, xlow: 0, xup: 200000 });
// Compare shapes
console.log('Sales:', salesHist.filter(row => row.get('relative') > 0.2).length);
console.log('Eng:', engHist.filter(row => row.get('relative') > 0.2).length);
4. Analyze Relationships (2D)¶
const hist2d = gaslamp.ForgeFrame.from(employees).h2('age', 'salary', {
xnbins: 10, xlow: 20, xup: 70,
ynbins: 5, ylow: 30000, yup: 150000,
});
// Find the (age, salary) bin with the most employees
const modal = hist2d.filter(row => row.get('count') > 0)
.sort((a, b) => {
const bCount = row.get('count');
const aCount = row.get('count');
return bCount - aCount;
})
.slice(0, 1);
Limitations and Future Directions¶
Current: ForgeFrame provides 1D and 2D histograms with equal-width bins.
Not included (potential future features):
- Variable-width bins (e.g., log scale)
- Statistical tests (χ², KS test)
- Kernel density estimation
- N-dimensional histograms (n > 2)
The philosophy is: do one thing well, keep the API simple.
If you need more advanced statistics, use ForgeFrame for exploratory binning, then apply external statistics libraries to the result.
Design Pattern: Row-Oriented for Sheets¶
ForgeFrame outputs row-oriented histograms (one row per bin), not column-oriented summaries. Why?
Column-oriented (not used here):
{
"bins": ["underflow", "[50, 60)", "[60, 70)"],
"counts": [1, 2, 3],
"relative": [0.1, 0.2, 0.3]
}
Problems: Can't write directly to sheets, hard to integrate with charts.
Row-oriented (ForgeFrame's approach):
bin | count | relative
underflow | 1 | 0.1667
[50, 60) | 2 | 0.3333
[60, 70) | 3 | 0.5000
Advantages:
- Write to sheets:
.toSheet()puts one bin per row - Create charts: Google Sheets reads rows directly as data series
- Transform results: Use BareFrame operations on histogram rows
- Analyze bins: Filter, sort, or manipulate individual bins
Example: Create a histogram and chart it
const hist = gaslamp.ForgeFrame.from(df).h1('score', { nbins: 10, xlow: 0, xup: 100 });
// Write to sheet
const sheet = SpreadsheetApp.getActiveSheet();
hist.toSheet(sheet);
// Sheet now has:
// Row 1: headers (bin, count, relative, cumulative_relative)
// Row 2: underflow data
// Row 3: [0, 10) data
// ... one row per bin ...
// Now create a chart in Google Sheets by selecting the data
// The chart reads rows directly—no additional formatting needed
Design Pattern: Inspection Without Mutation¶
ForgeFrame is read-only. It doesn't modify the original BareFrame:
const df = gaslamp.BareFrame.fromColumns({ values: [1, 2, 3] });
const hist = gaslamp.ForgeFrame.from(df).h1('values', { nbins: 3, xlow: 0, xup: 3 });
// df is unchanged
console.log(df.length); // still 3 rows
console.log(hist.length); // histogram result: 7 rows (3 bins + under + overflow + computed stats)
This is safe and predictable. You can call .h1() without worrying about side effects.
Related¶
- BareFrame-First Design — Understanding BareFrame, the foundation
- Data Orientation and Conversion — How histograms internally organize data
- FlameFrame and Validation — Optional: validate histogram results before further use