GroupedFrame and Aggregation¶

Summary¶

GroupedFrame is a temporary wrapper that represents grouped data. It provides aggregation methods with a declarative, multi-column API: specify which columns to aggregate in a single call.

Unlike BareFrame (which is immutable and reusable), GroupedFrame exists only for the duration of aggregation. It's not meant to be stored.

JavaScript

const grouped = df.groupBy(['department']);

// Declarative: specify columns to aggregate explicitly
const counts = grouped.count();              // one row per group
const sums = grouped.sum(['salary', 'bonus']); // sum multiple columns at once
const means = grouped.mean(['sales', 'commission']);

// Custom aggregations
const report = grouped.agg({
  salary: (vals) => Math.max(...vals),
  bonus: (vals) => Math.min(...vals),
});

Design principle: gaslamp uses explicit column lists (sum(['col1', 'col2'])) rather than pandas/polars' separate selection step. This is more declarative—you say exactly what you want aggregated in one readable call.

All aggregation methods return a BareFrame with one row per group.

Philosophy: Grouping + Aggregation as a Two-Step Operation¶

In standard Google Apps Script, computing group summaries is tedious:

JavaScript

// Standard GAS: manual grouping and aggregation
const data = [
  { dept: 'eng', salary: 80000 },
  { dept: 'eng', salary: 90000 },
  { dept: 'hr', salary: 70000 },
];

// Build a map manually
const groups = {};
for (const row of data) {
  if (!groups[row.dept]) groups[row.dept] = [];
  groups[row.dept].push(row);
}

// Aggregate each group manually
const results = [];
for (const [dept, rows] of Object.entries(groups)) {
  const totalSalary = rows.reduce((sum, r) => sum + r.salary, 0);
  results.push({ dept, totalSalary });
}

Problems:

Repetitive and error-prone
Different aggregations require completely different loops
Hard to compose (e.g., grouping by multiple columns)
No reusable patterns

The Solution: Separate Grouping from Aggregation¶

gaslamp separates the two concerns:

Grouping (BareFrame.groupBy()) — Creates a GroupedFrame
Aggregation (GroupedFrame.count(), .sum(), .agg()) — Returns a BareFrame

JavaScript

const grouped = df.groupBy(['department']);      // Step 1: group
const summary = grouped.sum(['salary']);         // Step 2: aggregate

The separation is clean:

Grouping logic is in one place
Each aggregation is a simple, focused method
Results are always a BareFrame (predictable)

How It Works: The Grouping Process¶

Step 1: Define Group Keys¶

When you call df.groupBy(['department']), gaslamp:

Reads all values in the 'department' column
Creates a composite key for each unique value (or combination of values if multiple columns)
Stores the row indices that belong to each group

Example:

JavaScript

const df = gaslamp.BareFrame.fromColumns({
  dept: ['eng', 'eng', 'hr', 'hr', 'eng'],
  salary: [80000, 90000, 70000, 75000, 95000],
});

const grouped = df.groupBy(['dept']);
// Internally:
// 'eng' → [0, 1, 4]    (row indices)
// 'hr'  → [2, 3]       (row indices)

Step 2: Aggregate Within Groups¶

Each aggregation method operates on the row indices:

count() — Row count per group

JavaScript

grouped.count();
// dept | count
// eng  | 3
// hr   | 2

first() and last() — First/last row of each group

JavaScript

grouped.first();
// dept | salary
// eng  | 80000
// hr   | 70000

sum(), mean(), min(), max() — Numeric aggregations

JavaScript

grouped.sum(['salary']);
// dept | salary
// eng  | 265000
// hr   | 145000

grouped.mean(['salary']);
// dept | salary
// eng  | 88333.33
// hr   | 72500

agg() — Custom aggregations

JavaScript

grouped.agg({
  salary: (values) => Math.max(...values),
  count: (values) => values.length,
});
// dept | salary | count
// eng  | 95000  | 3
// hr   | 75000  | 2

Multiple Group Keys¶

You can group by multiple columns:

JavaScript

df.groupBy(['dept', 'level'])
  .count();
// dept | level | count
// eng  | 1     | 2
// eng  | 2     | 1
// hr   | 1     | 1
// hr   | 2     | 1

Composite keys are formed from all grouping columns. Two rows are in the same group only if ALL key columns match.

API Design: Declarative vs. Implicit Selection¶

gaslamp: Explicit Multi-Column Aggregation¶

JavaScript

// gaslamp: declare columns explicitly
grouped.sum(['salary', 'bonus', 'commission']);
// Result: one row per group with all three columns summed

Advantages:

Declarative: Clear what's being aggregated
Single call: Specify multiple columns at once
Readable: No intermediate selection steps
GAS-friendly: Direct mapping to sheet columns

pandas/polars: Implicit Column Selection¶

Python

# pandas: select columns, then aggregate
grouped[['salary', 'bonus', 'commission']].sum()

gaslamp's approach is more intuitive because:

You specify exactly what you want in the result
No intermediate "select columns, then aggregate" pattern
The intent is obvious at a glance
Feels natural in Google Sheets workflows

Design Insight: Why GroupedFrame Is Temporary¶

GroupedFrame is not a DataFrame. It's a temporary wrapper:

JavaScript

const grouped = df.groupBy(['dept']); // GroupedFrame
const result = grouped.sum(['salary']); // BareFrame

// You can't:
// grouped.filter(...) ← No, not a DataFrame
// grouped.select(...) ← No, not a DataFrame

// You must:
result.filter(...) // ← Use the result (BareFrame)

Why?

Clear intent: GroupedFrame explicitly means "aggregation step"
No ambiguity: You can't accidentally operate on grouped data
Temporary: GroupedFrame is created and discarded immediately
Type safety: Different methods for grouped vs. ungrouped data

If you need to transform the result, use BareFrame methods:

JavaScript

const summary = df.groupBy(['dept']).sum(['salary']);
const filtered = summary.filter(row => row.get('salary') > 100000);
const final = filtered.select(['dept', 'salary']);

Rolling Window Aggregations¶

Some aggregation methods support rolling windows—aggregating over recent rows within each group.

JavaScript

df.groupBy(['dept']).sum(['salary'], { window: 2 });

This aggregates the last 2 rows of each group. Window size is specified per aggregation, not global.

When to use:

Time-series data grouped by category
Computing moving averages within groups
Detecting trends within each group

Aggregation Semantics¶

Order Preservation¶

Groups appear in the order they first occur in the data:

JavaScript

const df = gaslamp.BareFrame.fromColumns({
  dept: ['hr', 'eng', 'hr', 'eng'],
  salary: [70000, 80000, 75000, 90000],
});

df.groupBy(['dept']).count();
// dept | count
// hr   | 2      ← first group (appears first in data)
// eng  | 2

Null Handling¶

Rows with null in a group key are treated as a separate group:

JavaScript

const df = gaslamp.BareFrame.fromColumns({
  dept: ['eng', null, 'eng', null],
  salary: [80000, 70000, 90000, 75000],
});

df.groupBy(['dept']).count();
// dept | count
// eng  | 2
// null | 2

Empty Groups¶

Only groups that actually appear in the data are returned. No empty groups are created.

Comparison: `first()` vs `last()` vs Custom `agg()`¶

Method	Returns	Use Case
`first()`	All columns, first row per group	Get representative row, historical baseline
`last()`	All columns, last row per group	Get most recent row, current state
`agg()`	User-specified columns	Custom logic, multiple aggregations

All return a BareFrame with one row per group.

Extending Aggregations¶

The architecture makes it easy to add new aggregations:

JavaScript

// Current: count, first, last, sum, mean, min, max, agg
// Future: median, mode, stddev, etc.

// Could add:
grouped.median(['salary']);
grouped.stddev(['salary']);
grouped.custom({ ... });

Each new aggregation is a simple method that iterates over groups and returns a BareFrame.

BareFrame-First Design — Understanding BareFrame's role
Data Orientation and Conversion — How grouped data is organized internally
Expressions and Predicates — Filtering grouped results after aggregation

GroupedFrame and Aggregation¶

Summary¶

Philosophy: Grouping + Aggregation as a Two-Step Operation¶

The Solution: Separate Grouping from Aggregation¶

How It Works: The Grouping Process¶

Step 1: Define Group Keys¶

Step 2: Aggregate Within Groups¶

Multiple Group Keys¶

API Design: Declarative vs. Implicit Selection¶

gaslamp: Explicit Multi-Column Aggregation¶

pandas/polars: Implicit Column Selection¶

Design Insight: Why GroupedFrame Is Temporary¶

Rolling Window Aggregations¶

Aggregation Semantics¶

Order Preservation¶

Null Handling¶

Empty Groups¶

Comparison: first() vs last() vs Custom agg()¶

Extending Aggregations¶

Related¶

Comparison: `first()` vs `last()` vs Custom `agg()`¶