GroupedFrame and Aggregation¶
Summary¶
GroupedFrame is a temporary wrapper that represents grouped data.
It provides aggregation methods with a declarative, multi-column API: specify which columns to aggregate in a single call.
Unlike BareFrame (which is immutable and reusable), GroupedFrame exists only for the duration of aggregation. It's not meant to be stored.
const grouped = df.groupBy(['department']);
// Declarative: specify columns to aggregate explicitly
const counts = grouped.count(); // one row per group
const sums = grouped.sum(['salary', 'bonus']); // sum multiple columns at once
const means = grouped.mean(['sales', 'commission']);
// Custom aggregations
const report = grouped.agg({
salary: (vals) => Math.max(...vals),
bonus: (vals) => Math.min(...vals),
});
Design principle: gaslamp uses explicit column lists (sum(['col1', 'col2'])) rather than pandas/polars' separate selection step.
This is more declarative—you say exactly what you want aggregated in one readable call.
All aggregation methods return a BareFrame with one row per group.
Philosophy: Grouping + Aggregation as a Two-Step Operation¶
In standard Google Apps Script, computing group summaries is tedious:
// Standard GAS: manual grouping and aggregation
const data = [
{ dept: 'eng', salary: 80000 },
{ dept: 'eng', salary: 90000 },
{ dept: 'hr', salary: 70000 },
];
// Build a map manually
const groups = {};
for (const row of data) {
if (!groups[row.dept]) groups[row.dept] = [];
groups[row.dept].push(row);
}
// Aggregate each group manually
const results = [];
for (const [dept, rows] of Object.entries(groups)) {
const totalSalary = rows.reduce((sum, r) => sum + r.salary, 0);
results.push({ dept, totalSalary });
}
Problems:
- Repetitive and error-prone
- Different aggregations require completely different loops
- Hard to compose (e.g., grouping by multiple columns)
- No reusable patterns
The Solution: Separate Grouping from Aggregation¶
gaslamp separates the two concerns:
- Grouping (
BareFrame.groupBy()) — Creates aGroupedFrame - Aggregation (
GroupedFrame.count(),.sum(),.agg()) — Returns aBareFrame
const grouped = df.groupBy(['department']); // Step 1: group
const summary = grouped.sum(['salary']); // Step 2: aggregate
The separation is clean:
- Grouping logic is in one place
- Each aggregation is a simple, focused method
- Results are always a
BareFrame(predictable)
How It Works: The Grouping Process¶
Step 1: Define Group Keys¶
When you call df.groupBy(['department']), gaslamp:
- Reads all values in the 'department' column
- Creates a composite key for each unique value (or combination of values if multiple columns)
- Stores the row indices that belong to each group
Example:
const df = gaslamp.BareFrame.fromColumns({
dept: ['eng', 'eng', 'hr', 'hr', 'eng'],
salary: [80000, 90000, 70000, 75000, 95000],
});
const grouped = df.groupBy(['dept']);
// Internally:
// 'eng' → [0, 1, 4] (row indices)
// 'hr' → [2, 3] (row indices)
Step 2: Aggregate Within Groups¶
Each aggregation method operates on the row indices:
count() — Row count per group
grouped.count();
// dept | count
// eng | 3
// hr | 2
first() and last() — First/last row of each group
grouped.first();
// dept | salary
// eng | 80000
// hr | 70000
sum(), mean(), min(), max() — Numeric aggregations
grouped.sum(['salary']);
// dept | salary
// eng | 265000
// hr | 145000
grouped.mean(['salary']);
// dept | salary
// eng | 88333.33
// hr | 72500
agg() — Custom aggregations
grouped.agg({
salary: (values) => Math.max(...values),
count: (values) => values.length,
});
// dept | salary | count
// eng | 95000 | 3
// hr | 75000 | 2
Multiple Group Keys¶
You can group by multiple columns:
df.groupBy(['dept', 'level'])
.count();
// dept | level | count
// eng | 1 | 2
// eng | 2 | 1
// hr | 1 | 1
// hr | 2 | 1
Composite keys are formed from all grouping columns. Two rows are in the same group only if ALL key columns match.
API Design: Declarative vs. Implicit Selection¶
gaslamp: Explicit Multi-Column Aggregation¶
// gaslamp: declare columns explicitly
grouped.sum(['salary', 'bonus', 'commission']);
// Result: one row per group with all three columns summed
Advantages:
- Declarative: Clear what's being aggregated
- Single call: Specify multiple columns at once
- Readable: No intermediate selection steps
- GAS-friendly: Direct mapping to sheet columns
pandas/polars: Implicit Column Selection¶
# pandas: select columns, then aggregate
grouped[['salary', 'bonus', 'commission']].sum()
gaslamp's approach is more intuitive because:
- You specify exactly what you want in the result
- No intermediate "select columns, then aggregate" pattern
- The intent is obvious at a glance
- Feels natural in Google Sheets workflows
Design Insight: Why GroupedFrame Is Temporary¶
GroupedFrame is not a DataFrame.
It's a temporary wrapper:
const grouped = df.groupBy(['dept']); // GroupedFrame
const result = grouped.sum(['salary']); // BareFrame
// You can't:
// grouped.filter(...) ← No, not a DataFrame
// grouped.select(...) ← No, not a DataFrame
// You must:
result.filter(...) // ← Use the result (BareFrame)
Why?
- Clear intent:
GroupedFrameexplicitly means "aggregation step" - No ambiguity: You can't accidentally operate on grouped data
- Temporary: GroupedFrame is created and discarded immediately
- Type safety: Different methods for grouped vs. ungrouped data
If you need to transform the result, use BareFrame methods:
const summary = df.groupBy(['dept']).sum(['salary']);
const filtered = summary.filter(row => row.get('salary') > 100000);
const final = filtered.select(['dept', 'salary']);
Rolling Window Aggregations¶
Some aggregation methods support rolling windows—aggregating over recent rows within each group.
df.groupBy(['dept']).sum(['salary'], { window: 2 });
This aggregates the last 2 rows of each group. Window size is specified per aggregation, not global.
When to use:
- Time-series data grouped by category
- Computing moving averages within groups
- Detecting trends within each group
Aggregation Semantics¶
Order Preservation¶
Groups appear in the order they first occur in the data:
const df = gaslamp.BareFrame.fromColumns({
dept: ['hr', 'eng', 'hr', 'eng'],
salary: [70000, 80000, 75000, 90000],
});
df.groupBy(['dept']).count();
// dept | count
// hr | 2 ← first group (appears first in data)
// eng | 2
Null Handling¶
Rows with null in a group key are treated as a separate group:
const df = gaslamp.BareFrame.fromColumns({
dept: ['eng', null, 'eng', null],
salary: [80000, 70000, 90000, 75000],
});
df.groupBy(['dept']).count();
// dept | count
// eng | 2
// null | 2
Empty Groups¶
Only groups that actually appear in the data are returned. No empty groups are created.
Comparison: first() vs last() vs Custom agg()¶
| Method | Returns | Use Case |
|---|---|---|
first() |
All columns, first row per group | Get representative row, historical baseline |
last() |
All columns, last row per group | Get most recent row, current state |
agg() |
User-specified columns | Custom logic, multiple aggregations |
All return a BareFrame with one row per group.
Extending Aggregations¶
The architecture makes it easy to add new aggregations:
// Current: count, first, last, sum, mean, min, max, agg
// Future: median, mode, stddev, etc.
// Could add:
grouped.median(['salary']);
grouped.stddev(['salary']);
grouped.custom({ ... });
Each new aggregation is a simple method that iterates over groups and returns a BareFrame.
Related¶
- BareFrame-First Design — Understanding BareFrame's role
- Data Orientation and Conversion — How grouped data is organized internally
- Expressions and Predicates — Filtering grouped results after aggregation