Skip to content

GroupedFrame and Aggregation

Summary

GroupedFrame is a temporary wrapper that represents grouped data. It provides aggregation methods with a declarative, multi-column API: specify which columns to aggregate in a single call.

Unlike BareFrame (which is immutable and reusable), GroupedFrame exists only for the duration of aggregation. It's not meant to be stored.

JavaScript
const grouped = df.groupBy(['department']);

// Declarative: specify columns to aggregate explicitly
const counts = grouped.count();              // one row per group
const sums = grouped.sum(['salary', 'bonus']); // sum multiple columns at once
const means = grouped.mean(['sales', 'commission']);

// Custom aggregations
const report = grouped.agg({
  salary: (vals) => Math.max(...vals),
  bonus: (vals) => Math.min(...vals),
});

Design principle: gaslamp uses explicit column lists (sum(['col1', 'col2'])) rather than pandas/polars' separate selection step. This is more declarative—you say exactly what you want aggregated in one readable call.

All aggregation methods return a BareFrame with one row per group.

Philosophy: Grouping + Aggregation as a Two-Step Operation

In standard Google Apps Script, computing group summaries is tedious:

JavaScript
// Standard GAS: manual grouping and aggregation
const data = [
  { dept: 'eng', salary: 80000 },
  { dept: 'eng', salary: 90000 },
  { dept: 'hr', salary: 70000 },
];

// Build a map manually
const groups = {};
for (const row of data) {
  if (!groups[row.dept]) groups[row.dept] = [];
  groups[row.dept].push(row);
}

// Aggregate each group manually
const results = [];
for (const [dept, rows] of Object.entries(groups)) {
  const totalSalary = rows.reduce((sum, r) => sum + r.salary, 0);
  results.push({ dept, totalSalary });
}

Problems:

  • Repetitive and error-prone
  • Different aggregations require completely different loops
  • Hard to compose (e.g., grouping by multiple columns)
  • No reusable patterns

The Solution: Separate Grouping from Aggregation

gaslamp separates the two concerns:

  1. Grouping (BareFrame.groupBy()) — Creates a GroupedFrame
  2. Aggregation (GroupedFrame.count(), .sum(), .agg()) — Returns a BareFrame
JavaScript
const grouped = df.groupBy(['department']);      // Step 1: group
const summary = grouped.sum(['salary']);         // Step 2: aggregate

The separation is clean:

  • Grouping logic is in one place
  • Each aggregation is a simple, focused method
  • Results are always a BareFrame (predictable)

How It Works: The Grouping Process

Step 1: Define Group Keys

When you call df.groupBy(['department']), gaslamp:

  1. Reads all values in the 'department' column
  2. Creates a composite key for each unique value (or combination of values if multiple columns)
  3. Stores the row indices that belong to each group

Example:

JavaScript
const df = gaslamp.BareFrame.fromColumns({
  dept: ['eng', 'eng', 'hr', 'hr', 'eng'],
  salary: [80000, 90000, 70000, 75000, 95000],
});

const grouped = df.groupBy(['dept']);
// Internally:
// 'eng' → [0, 1, 4]    (row indices)
// 'hr'  → [2, 3]       (row indices)

Step 2: Aggregate Within Groups

Each aggregation method operates on the row indices:

count() — Row count per group

JavaScript
grouped.count();
// dept | count
// eng  | 3
// hr   | 2

first() and last() — First/last row of each group

JavaScript
grouped.first();
// dept | salary
// eng  | 80000
// hr   | 70000

sum(), mean(), min(), max() — Numeric aggregations

JavaScript
grouped.sum(['salary']);
// dept | salary
// eng  | 265000
// hr   | 145000

grouped.mean(['salary']);
// dept | salary
// eng  | 88333.33
// hr   | 72500

agg() — Custom aggregations

JavaScript
grouped.agg({
  salary: (values) => Math.max(...values),
  count: (values) => values.length,
});
// dept | salary | count
// eng  | 95000  | 3
// hr   | 75000  | 2

Multiple Group Keys

You can group by multiple columns:

JavaScript
df.groupBy(['dept', 'level'])
  .count();
// dept | level | count
// eng  | 1     | 2
// eng  | 2     | 1
// hr   | 1     | 1
// hr   | 2     | 1

Composite keys are formed from all grouping columns. Two rows are in the same group only if ALL key columns match.

API Design: Declarative vs. Implicit Selection

gaslamp: Explicit Multi-Column Aggregation

JavaScript
// gaslamp: declare columns explicitly
grouped.sum(['salary', 'bonus', 'commission']);
// Result: one row per group with all three columns summed

Advantages:

  • Declarative: Clear what's being aggregated
  • Single call: Specify multiple columns at once
  • Readable: No intermediate selection steps
  • GAS-friendly: Direct mapping to sheet columns

pandas/polars: Implicit Column Selection

Python
# pandas: select columns, then aggregate
grouped[['salary', 'bonus', 'commission']].sum()

gaslamp's approach is more intuitive because:

  1. You specify exactly what you want in the result
  2. No intermediate "select columns, then aggregate" pattern
  3. The intent is obvious at a glance
  4. Feels natural in Google Sheets workflows

Design Insight: Why GroupedFrame Is Temporary

GroupedFrame is not a DataFrame. It's a temporary wrapper:

JavaScript
const grouped = df.groupBy(['dept']); // GroupedFrame
const result = grouped.sum(['salary']); // BareFrame

// You can't:
// grouped.filter(...) ← No, not a DataFrame
// grouped.select(...) ← No, not a DataFrame

// You must:
result.filter(...) // ← Use the result (BareFrame)

Why?

  • Clear intent: GroupedFrame explicitly means "aggregation step"
  • No ambiguity: You can't accidentally operate on grouped data
  • Temporary: GroupedFrame is created and discarded immediately
  • Type safety: Different methods for grouped vs. ungrouped data

If you need to transform the result, use BareFrame methods:

JavaScript
const summary = df.groupBy(['dept']).sum(['salary']);
const filtered = summary.filter(row => row.get('salary') > 100000);
const final = filtered.select(['dept', 'salary']);

Rolling Window Aggregations

Some aggregation methods support rolling windows—aggregating over recent rows within each group.

JavaScript
df.groupBy(['dept']).sum(['salary'], { window: 2 });

This aggregates the last 2 rows of each group. Window size is specified per aggregation, not global.

When to use:

  • Time-series data grouped by category
  • Computing moving averages within groups
  • Detecting trends within each group

Aggregation Semantics

Order Preservation

Groups appear in the order they first occur in the data:

JavaScript
const df = gaslamp.BareFrame.fromColumns({
  dept: ['hr', 'eng', 'hr', 'eng'],
  salary: [70000, 80000, 75000, 90000],
});

df.groupBy(['dept']).count();
// dept | count
// hr   | 2      ← first group (appears first in data)
// eng  | 2

Null Handling

Rows with null in a group key are treated as a separate group:

JavaScript
const df = gaslamp.BareFrame.fromColumns({
  dept: ['eng', null, 'eng', null],
  salary: [80000, 70000, 90000, 75000],
});

df.groupBy(['dept']).count();
// dept | count
// eng  | 2
// null | 2

Empty Groups

Only groups that actually appear in the data are returned. No empty groups are created.

Comparison: first() vs last() vs Custom agg()

Method Returns Use Case
first() All columns, first row per group Get representative row, historical baseline
last() All columns, last row per group Get most recent row, current state
agg() User-specified columns Custom logic, multiple aggregations

All return a BareFrame with one row per group.

Extending Aggregations

The architecture makes it easy to add new aggregations:

JavaScript
// Current: count, first, last, sum, mean, min, max, agg
// Future: median, mode, stddev, etc.

// Could add:
grouped.median(['salary']);
grouped.stddev(['salary']);
grouped.custom({ ... });

Each new aggregation is a simple method that iterates over groups and returns a BareFrame.