Skip to content

ForgeFrame and Statistical Analysis

Summary

ForgeFrame is a statistics toolkit that wraps a BareFrame and provides histogram analysis inspired by CERN ROOT's histogramming interface.

Key design: ForgeFrame outputs row-oriented results (one row per bin) that write directly to Google Sheets and integrate with charts.

JavaScript
const df = gaslamp.BareFrame.fromColumns({ score: [45, 72, 85, 91, 60, 88] });

// Create 5 equal-width bins from 50 to 100
const result = gaslamp.ForgeFrame.from(df).h1('score', {
  nbins: 5,
  xlow: 50,
  xup: 100,
});

// Output: one row per bin (ready for sheets and charts)
// bin       | count | relative | cumulative_relative
// underflow | 1     | 0.1667   | 0.1667
// [50, 60)  | 1     | 0.1667   | 0.3333
// [60, 70)  | 1     | 0.1667   | 0.5000
// [70, 80)  | 1     | 0.1667   | 0.6667
// [80, 90)  | 2     | 0.3333   | 1.0000
// [90, 100) | 0     | 0.0000   | 1.0000
// overflow  | 0     | 0.0000   | 1.0000

// Write directly to a sheet for charting
result.toSheet(SpreadsheetApp.getActiveSheet());

ForgeFrame returns plain BareFrame results—no schema validation. Each bin is a row, ready for Google Sheets visualization.

Philosophy: Histograms as Sheets, Ready for Charts

In standard Google Apps Script, creating charts requires manual work:

JavaScript
// Standard GAS: manual histogram binning into an object
const scores = [45, 72, 85, 91, 60, 88];
const bins = { underflow: 0, bin1: 0, bin2: 0, ..., overflow: 0 };

for (const score of scores) {
  if (score < 50) bins.underflow++;
  else if (score < 60) bins.bin1++;
  else if (score < 70) bins.bin2++;
  // ... more boilerplate ...
}

// Then manually write to sheet row by row
const sheet = SpreadsheetApp.getActiveSheet();
sheet.getRange(1, 1).setValue('Bin');
sheet.getRange(1, 2).setValue('Count');
sheet.getRange(2, 1).setValue('underflow');
sheet.getRange(2, 2).setValue(bins.underflow);
// ... tedious manual cell-by-cell filling ...

// Finally, create a chart

Problems:

  • Tedious manual conversion to sheet format
  • Easy to make off-by-one mistakes
  • Hard to compute relative frequency and cumulative statistics
  • Can't reuse or transform histogram results
  • Difficult to integrate with Google Sheets charting

The Solution: Histograms as Row-Oriented DataFrames

ForgeFrame produces histograms as a BareFrame where each bin is a row. This design means:

  1. Direct sheet output: .toSheet() writes histogram ready for charts
  2. Reusable results: Filter, sort, or transform the histogram
  3. Integrated statistics: Relative frequency and cumulative columns computed automatically
  4. Chart-friendly: Rows map directly to chart data series

Create a histogram the same way you filter or group:

JavaScript
// 1D histogram
gaslamp.ForgeFrame.from(df).h1('column', { nbins: 10, xlow: 0, xup: 100 });

// 2D histogram (joint distribution)
gaslamp.ForgeFrame.from(df).h2('age', 'salary', {
  xnbins: 10, xlow: 0, xup: 100,
  ynbins: 5, ylow: 0, yup: 500000,
});

Both return a BareFrame with:

  • Bin labels
  • Counts
  • Relative frequencies
  • Cumulative statistics

Then use standard DataFrame operations to extract insights:

JavaScript
const hist = gaslamp.ForgeFrame.from(df).h1('score', { nbins: 5, xlow: 0, xup: 100 });

// Find the most common bin
const peak = hist.filter(row => row.get('count') > 0)
  .sort((a, b) => {
    const aCount = row.get('count');
    const bCount = row.get('count');
    return bCount - aCount;
  })
  .slice(0, 1);

How It Works: Binning and Counting

1D Histograms: h1(column, options)

Divides values into equal-width bins and counts how many fall into each.

Bin convention (from CERN ROOT):

Text Only
[xlow, xup) — half-open interval
├─ [xlow, xlow+width)
├─ [xlow+width, xlow+2*width)
├─ ... (nbins total)
└─ [xup-width, xup)

Values outside:

  • Below xlow: go to underflow bin
  • At or above xup: go to overflow bin

Example: 5 bins from 50 to 100

Text Only
[50, 60), [60, 70), [70, 80), [80, 90), [90, 100)

Value 45 → underflow Value 75 → [70, 80) Value 100 → overflow

Output columns:

Column Description
bin Bin label: "underflow", "[xlow, xhigh)", "overflow"
count Number of values in this bin
relative count / total (as decimal)
cumulative_relative Sum of relative frequency up to this bin

2D Histograms: h2(x_column, y_column, options)

Analyzes joint distribution of two variables using a 2D grid.

Example: Age (x: 0–100) vs Salary (y: 0–500K)

Text Only
xnbins: 10 (0–100 split into 10 bins, 10K each)
ynbins: 5  (0–500K split into 5 bins, 100K each)

Creates 10 × 5 = 50 bins (plus underflow/overflow on each axis).

Output columns:

Column Description
x_bin X-axis bin label
y_bin Y-axis bin label
count Number of (x, y) pairs in this bin
relative count / total
cumulative_relative Sum of relative frequency

Understanding Relative Frequencies

Text Only
relative = count / total_count
cumulative_relative = sum of relative from first bin to current

Example: 6 values, 5 bins

Text Only
underflow    1    0.1667    0.1667
[50, 60)     1    0.1667    0.3333
[60, 70)     1    0.1667    0.5000
[70, 80)     1    0.1667    0.6667
[80, 90)     2    0.3333    1.0000
overflow     0    0.0000    1.0000

Cumulative frequency shows percentiles: 50% of values are in first 3 bins.

Design Insight: Why Separate from BareFrame?

You might ask: "Why not just add h1() to BareFrame?"

Because:

  1. Focused responsibility: BareFrame does general transformations. ForgeFrame specializes in statistical analysis.

  2. Optional import: Users who don't need histograms don't pay for the code.

  3. Clear intent: .h1() on ForgeFrame explicitly signals "I'm doing statistical analysis," not "I'm transforming data."

  4. One job, done well: ForgeFrame doesn't try to do every statistical operation. Just histograms, done right.

Precision and Rounding

By default, relative and cumulative_relative columns are rounded to 4 decimal places.

Customize with ForgeFrameOptions:

JavaScript
// Round to 2 decimal places
const forge = gaslamp.ForgeFrame.from(df, { precision: 2 });
const hist = forge.h1('score', { nbins: 10, xlow: 0, xup: 100 });

// relative column: 0.17, 0.33, ... (not 0.1667, 0.3333)

Precision affects readability without changing underlying counts.

Use Cases

1. Understand Data Distribution

JavaScript
const ages = gaslamp.BareFrame.fromSheet(sheet);
const hist = gaslamp.ForgeFrame.from(ages).h1('age', { nbins: 10, xlow: 0, xup: 100 });

console.log(hist.toString()); // Quick visual of age distribution

2. Find Outliers and Anomalies

JavaScript
const hist = gaslamp.ForgeFrame.from(df).h1('metric', { nbins: 50, xlow: 0, xup: 1000 });

// Check underflow/overflow
const anomalies = hist.filter(row =>
  row.get('bin') === 'underflow' || row.get('bin') === 'overflow'
);

3. Compare Distributions Across Groups

JavaScript
const sales = df.filter(row => row.get('category') === 'Sales');
const eng = df.filter(row => row.get('category') === 'Engineering');

const salesHist = gaslamp.ForgeFrame.from(sales).h1('salary', { nbins: 10, xlow: 0, xup: 200000 });
const engHist = gaslamp.ForgeFrame.from(eng).h1('salary', { nbins: 10, xlow: 0, xup: 200000 });

// Compare shapes
console.log('Sales:', salesHist.filter(row => row.get('relative') > 0.2).length);
console.log('Eng:', engHist.filter(row => row.get('relative') > 0.2).length);

4. Analyze Relationships (2D)

JavaScript
const hist2d = gaslamp.ForgeFrame.from(employees).h2('age', 'salary', {
  xnbins: 10, xlow: 20, xup: 70,
  ynbins: 5, ylow: 30000, yup: 150000,
});

// Find the (age, salary) bin with the most employees
const modal = hist2d.filter(row => row.get('count') > 0)
  .sort((a, b) => {
    const bCount = row.get('count');
    const aCount = row.get('count');
    return bCount - aCount;
  })
  .slice(0, 1);

Limitations and Future Directions

Current: ForgeFrame provides 1D and 2D histograms with equal-width bins.

Not included (potential future features):

  • Variable-width bins (e.g., log scale)
  • Statistical tests (χ², KS test)
  • Kernel density estimation
  • N-dimensional histograms (n > 2)

The philosophy is: do one thing well, keep the API simple.

If you need more advanced statistics, use ForgeFrame for exploratory binning, then apply external statistics libraries to the result.

Design Pattern: Row-Oriented for Sheets

ForgeFrame outputs row-oriented histograms (one row per bin), not column-oriented summaries. Why?

Column-oriented (not used here):

JSON
{
  "bins": ["underflow", "[50, 60)", "[60, 70)"],
  "counts": [1, 2, 3],
  "relative": [0.1, 0.2, 0.3]
}

Problems: Can't write directly to sheets, hard to integrate with charts.

Row-oriented (ForgeFrame's approach):

Text Only
bin       | count | relative
underflow | 1     | 0.1667
[50, 60)  | 2     | 0.3333
[60, 70)  | 3     | 0.5000

Advantages:

  • Write to sheets: .toSheet() puts one bin per row
  • Create charts: Google Sheets reads rows directly as data series
  • Transform results: Use BareFrame operations on histogram rows
  • Analyze bins: Filter, sort, or manipulate individual bins

Example: Create a histogram and chart it

JavaScript
const hist = gaslamp.ForgeFrame.from(df).h1('score', { nbins: 10, xlow: 0, xup: 100 });

// Write to sheet
const sheet = SpreadsheetApp.getActiveSheet();
hist.toSheet(sheet);

// Sheet now has:
// Row 1: headers (bin, count, relative, cumulative_relative)
// Row 2: underflow data
// Row 3: [0, 10) data
// ... one row per bin ...

// Now create a chart in Google Sheets by selecting the data
// The chart reads rows directly—no additional formatting needed

Design Pattern: Inspection Without Mutation

ForgeFrame is read-only. It doesn't modify the original BareFrame:

JavaScript
const df = gaslamp.BareFrame.fromColumns({ values: [1, 2, 3] });
const hist = gaslamp.ForgeFrame.from(df).h1('values', { nbins: 3, xlow: 0, xup: 3 });

// df is unchanged
console.log(df.length); // still 3 rows
console.log(hist.length); // histogram result: 7 rows (3 bins + under + overflow + computed stats)

This is safe and predictable. You can call .h1() without worrying about side effects.