LazyFrame and Deferred Evaluation¶
Summary¶
LazyFrame delays filter operations until you explicitly ask for results by calling .collect().
This lets gaslamp apply optimizations like combining multiple filters and short-circuit evaluation.
const result = gaslamp.LazyFrame.from(df)
.filter(gaslamp.Expression.col('age').ge(18))
.filter(gaslamp.Expression.col('status').eq('active'))
.collect(); // the queued filters are fused into one condition and executed here
Instead of evaluating filters one by one, LazyFrame fuses them into a single combined condition, skipping unnecessary work.
Philosophy: Delay Decisions Until Necessary¶
In standard Google Apps Script, array filtering is eager: each Array.prototype.filter call runs immediately and returns a new array:
// Standard GAS: each filter runs immediately
const step1 = data.filter(row => row.age >= 18); // scans all rows once
const step2 = step1.filter(row => row.status === 'active'); // scans filtered rows again
This works, but it's inefficient:
- Multiple passes over the data
- Every step allocates an intermediate array that is discarded once the next step has run
- No opportunity for optimization
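To make the cost concrete, here is a small plain-JavaScript sketch. It does not use gaslamp, and the sample rows are invented for illustration. It runs the same two conditions as an eager chain and as a single combined pass, and shows that the results match.

```javascript
// Plain JavaScript, no gaslamp required. The sample rows are made up.
const rows = [
  { age: 15, status: 'active' },
  { age: 22, status: 'inactive' },
  { age: 30, status: 'active' },
  { age: 41, status: 'active' },
];

// Eager chain: two passes, and `adults` exists only to feed the second step.
const adults = rows.filter(row => row.age >= 18);
const eagerResult = adults.filter(row => row.status === 'active');

// Single pass: one combined condition, no intermediate array.
const singlePassResult = rows.filter(
  row => row.age >= 18 && row.status === 'active'
);
```

Both `eagerResult` and `singlePassResult` contain the rows with ages 30 and 41; only the eager version builds the three-element `adults` array along the way.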
The Solution: Queue and Fuse¶
LazyFrame defers evaluation and combines filters:
// LazyFrame: filters are queued and combined
const result = gaslamp.LazyFrame.from(data)
.filter(gaslamp.Expression.col('age').ge(18))
.filter(gaslamp.Expression.col('status').eq('active'))
.collect(); // Now both filters are combined and evaluated once
What happens internally:
- .filter() queues the expression (no evaluation yet)
- .filter() again adds another expression to the queue
- .collect() fuses all expressions into one combined condition
- The combined condition is evaluated once over all rows
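The steps above can be sketched in plain JavaScript. `TinyLazyFrame` and `counted` are hypothetical names, and plain functions stand in for gaslamp Expressions; this illustrates the queue-then-fuse idea under those assumptions and is not gaslamp's implementation.

```javascript
// Hypothetical stand-in for illustration; not gaslamp's LazyFrame.
class TinyLazyFrame {
  constructor(rows, predicates = []) {
    this.rows = rows;
    this.predicates = predicates; // queued, not yet evaluated
  }

  // Queue a predicate and return a new frame; no rows are touched here.
  filter(predicate) {
    return new TinyLazyFrame(this.rows, [...this.predicates, predicate]);
  }

  // Fuse the queue into one condition and scan the rows once.
  collect() {
    const fused = row => this.predicates.every(p => p(row));
    return this.rows.filter(fused);
  }
}

// Wrapper that counts how often any predicate is actually invoked.
let calls = 0;
const counted = fn => row => { calls += 1; return fn(row); };

const people = [
  { age: 15, status: 'active' },
  { age: 22, status: 'inactive' },
  { age: 30, status: 'active' },
];

const pending = new TinyLazyFrame(people)
  .filter(counted(row => row.age >= 18))
  .filter(counted(row => row.status === 'active'));

const callsBeforeCollect = calls;    // 0: nothing has run yet
const collected = pending.collect(); // [{ age: 30, status: 'active' }]
```

After `collect()`, `calls` is 5 rather than 6: the first row fails the age check, so its status check never runs.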
How It Works: Predicate Fusion¶
The Key Insight: AND Everything Together¶
When you chain multiple filters, they represent AND conditions:
// "rows where age >= 18 AND status == 'active' AND country == 'USA'"
gaslamp.LazyFrame.from(df)
.filter(gaslamp.Expression.col('age').ge(18))
.filter(gaslamp.Expression.col('status').eq('active'))
.filter(gaslamp.Expression.col('country').eq('USA'))
.collect();
LazyFrame fuses these into:
const fused = gaslamp.Expression.col('age').ge(18)
.and(gaslamp.Expression.col('status').eq('active'))
.and(gaslamp.Expression.col('country').eq('USA'));
LazyFrame then evaluates the fused expression once against all rows.
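As a rough sketch of what fusing means, the following plain JavaScript folds a list of predicate functions into one AND-ed predicate. The helpers `and` and `fuseAll` are hypothetical and operate on plain functions, not on gaslamp Expressions.

```javascript
// Hypothetical helpers for illustration; gaslamp's Expression API is not used.
const and = (left, right) => row => left(row) && right(row);

// Fold a list of predicates into one; an empty list keeps every row.
const fuseAll = predicates => predicates.reduce(and, () => true);

const isAdult = row => row.age >= 18;
const isActive = row => row.status === 'active';
const inUsa = row => row.country === 'USA';

const fusedPredicate = fuseAll([isAdult, isActive, inUsa]);

const sample = [
  { age: 25, status: 'active', country: 'USA' },
  { age: 25, status: 'active', country: 'Canada' },
  { age: 16, status: 'active', country: 'USA' },
];

// One scan applies all three conditions; only the first row matches.
const matches = sample.filter(fusedPredicate);
```

Starting the fold from a predicate that always returns true means an empty filter chain simply keeps every row.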
Short-Circuit Evaluation¶
When the combined condition is evaluated for a row, evaluation of that row stops at the first clause that fails:
// For each row:
// if (age >= 18) AND (status === 'active') AND (country === 'USA')
// if age < 18: skip status and country checks, reject row immediately
// if age >= 18 but status != 'active': skip country check, reject row
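This saves work, and it makes filter order matter: put the conditions that reject the most rows (and are cheapest to check) first, so later clauses run on as few rows as possible.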
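The effect of ordering can be measured with a small plain-JavaScript experiment. The dataset and the counting helper below are invented for illustration and do not involve gaslamp; the sketch counts how many individual clause checks run for each ordering of the same two conditions.

```javascript
// Plain JavaScript; the data and the counting helper are illustrative only.

// 100 rows: ages 0..99, and only every 10th row is 'active'.
const dataset = Array.from({ length: 100 }, (_, i) => ({
  age: i,
  status: i % 10 === 0 ? 'active' : 'inactive',
}));

// Run `firstFn && secondFn` over the dataset and count clause evaluations.
const countChecks = (firstFn, secondFn) => {
  let checks = 0;
  const first = row => { checks += 1; return firstFn(row); };
  const second = row => { checks += 1; return secondFn(row); };
  const kept = dataset.filter(row => first(row) && second(row));
  return { kept: kept.length, checks };
};

const ageCheck = row => row.age >= 18;              // passes 82 of 100 rows
const statusCheck = row => row.status === 'active'; // passes 10 of 100 rows

const ageFirst = countChecks(ageCheck, statusCheck);    // { kept: 8, checks: 182 }
const statusFirst = countChecks(statusCheck, ageCheck); // { kept: 8, checks: 110 }
```

Both orderings keep the same 8 rows, but putting the more selective status check first cuts the number of clause evaluations from 182 to 110.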
Lazy vs Eager: When to Use Each¶
Use BareFrame.filter() (Eager) When:¶
- You have a single, simple filter
- You want immediate evaluation (no deferral overhead)
- You're already using BareFrame operations
const adults = df.filter(row => row.get('age') >= 18);
Use LazyFrame (Deferred) When:¶
- You're chaining multiple Expression-based filters
- You want optimizations applied
- You prefer the .filter().filter().collect() style
const result = gaslamp.LazyFrame.from(df)
.filter(gaslamp.Expression.col('age').ge(18))
.filter(gaslamp.Expression.col('status').eq('active'))
.collect();
The Benefits of Deferral¶
1. Single-Pass Evaluation¶
Fusing multiple filters into one means the data is scanned once, not multiple times.
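// x, y, z are Expression instances
// Without LazyFrame: 3 passes, each over the previous result
const a = df.filter(x.toFunction());
const b = a.filter(y.toFunction());
const eagerResult = b.filter(z.toFunction());
// With LazyFrame: 1 pass
const lazyResult = gaslamp.LazyFrame.from(df)
.filter(x)
.filter(y)
.filter(z)
.collect();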
2. Future Optimization Opportunities¶
Because the whole filter chain is known before anything runs, deferral leaves room for further optimizations such as:
- Filter reordering: Move cheap, high-selectivity filters first (faster rejection)
- Predicate pushdown: Apply filters before other operations (smaller intermediate results)
- Index usage: Identify candidates for indexed lookups
Today, LazyFrame applies fusion. The optimizations listed above are possible future work, not current behavior.
3. Expressive Chaining¶
The .filter().filter().collect() style is natural and readable:
const report = gaslamp.LazyFrame.from(df)
.filter(gaslamp.Expression.col('year').eq(2024))
.filter(gaslamp.Expression.col('department').eq('Sales'))
.filter(gaslamp.Expression.col('revenue').gt(100000))
.collect();
Compare to manual combination:
const combined = gaslamp.Expression.col('year').eq(2024)
.and(gaslamp.Expression.col('department').eq('Sales'))
.and(gaslamp.Expression.col('revenue').gt(100000));
const report = df.filter(combined.toFunction());
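Both versions select the same rows. With LazyFrame, adding or removing a condition means adding or removing a single .filter() line, with no .and() chain or toFunction() call to maintain.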
Architecture: Immutability and Chaining¶
LazyFrame is immutable. Each .filter() returns a new LazyFrame:
const lazy1 = gaslamp.LazyFrame.from(df);
const lazy2 = lazy1.filter(gaslamp.Expression.col('age').ge(18));
const lazy3 = lazy2.filter(gaslamp.Expression.col('status').eq('active'));
// lazy1, lazy2, lazy3 are all different instances
// None share state
This enables:
- Reuse: Build conditions once, use many times
- Safety: No hidden mutations
- Composability: Build complex filters step-by-step
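A minimal sketch of how this kind of immutability can be achieved, and why it makes reuse safe. `MiniLazyFrame` is a hypothetical miniature built on plain predicate functions and frozen objects; it is not gaslamp's LazyFrame, and the sample rows are invented.

```javascript
// Hypothetical miniature for illustration; not gaslamp's LazyFrame.
class MiniLazyFrame {
  constructor(rows, predicates = []) {
    this.rows = rows;
    this.predicates = Object.freeze([...predicates]);
    Object.freeze(this); // instances cannot be modified after construction
  }

  filter(predicate) {
    // Copy the queue instead of pushing onto it, so `this` never changes.
    return new MiniLazyFrame(this.rows, [...this.predicates, predicate]);
  }

  collect() {
    return this.rows.filter(row => this.predicates.every(p => p(row)));
  }
}

const staff = [
  { name: 'Ada', age: 36, status: 'active' },
  { name: 'Ben', age: 17, status: 'active' },
  { name: 'Cy', age: 52, status: 'inactive' },
];

const adultsOnly = new MiniLazyFrame(staff).filter(r => r.age >= 18);

// Two branches share the same base without affecting it or each other.
const activeAdults = adultsOnly.filter(r => r.status === 'active');
const inactiveAdults = adultsOnly.filter(r => r.status === 'inactive');

const names = frame => frame.collect().map(r => r.name);
```

Here `names(adultsOnly)` yields Ada and Cy, `names(activeAdults)` yields only Ada, and `names(inactiveAdults)` yields only Cy; building the two branches leaves `adultsOnly` with its single queued predicate.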
The Limitation: Expression-Only¶
LazyFrame accepts only Expression instances for filtering.
This design choice:
- Enables optimization (gaslamp understands Expression structure)
- Keeps it simple (no mixing Expression and custom predicates)
- Forces you to think in terms of schema-aware conditions
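To illustrate why a structured expression is easier to optimize than an arbitrary function, here is a hypothetical miniature expression builder. The names `col`, `makeExpr`, `evaluate`, and `columnsUsed` are assumptions made for this sketch, and rows are plain objects; gaslamp's real Expression internals and row type may differ.

```javascript
// Hypothetical miniature; gaslamp's real Expression internals may differ.
const OPS = {
  ge: (a, b) => a >= b,
  eq: (a, b) => a === b,
};

// An expression is plain data (`node`) plus a way to compile it to a predicate.
const makeExpr = node => ({
  node,
  and: other => makeExpr({ op: 'and', left: node, right: other.node }),
  toFunction: () => row => evaluate(node, row),
});

const col = name => ({
  ge: value => makeExpr({ op: 'ge', column: name, value }),
  eq: value => makeExpr({ op: 'eq', column: name, value }),
});

// Walk the tree to decide whether a row (a plain object here) matches.
function evaluate(node, row) {
  if (node.op === 'and') {
    return evaluate(node.left, row) && evaluate(node.right, row);
  }
  return OPS[node.op](row[node.column], node.value);
}

// Because the tree is data, it can be inspected without running it.
function columnsUsed(node) {
  return node.op === 'and'
    ? [...columnsUsed(node.left), ...columnsUsed(node.right)]
    : [node.column];
}

const expr = col('age').ge(18).and(col('status').eq('active'));
const usedColumns = columnsUsed(expr.node); // ['age', 'status']
const predicate = expr.toFunction();
```

An opaque callback such as `row => customCheck(row)` offers nothing comparable to `columnsUsed`: a library can call it, but cannot see which columns it reads or safely combine it with other conditions.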
If you need a complex custom filter, use BareFrame.filter():
// Custom logic: use BareFrame
const custom = df.filter(row => someComplexLogic(row));
// Then, if needed, wrap in LazyFrame for more filters
const result = gaslamp.LazyFrame.from(custom)
.filter(gaslamp.Expression.col('age').ge(18))
.collect();
When to Use LazyFrame in Your Design¶
Good use case:
// Multiple Expression-based filters on the same source
const result = gaslamp.LazyFrame.from(rawData)
.filter(byYear)
.filter(byDepartment)
.filter(byRevenue)
.collect();
Less ideal:
// Single filter: LazyFrame adds overhead with no benefit
gaslamp.LazyFrame.from(df).filter(expr).collect();
// Just use: df.filter(expr.toFunction());
Not suitable:
// Mix of Expression and custom logic: use BareFrame
gaslamp.LazyFrame.from(df)
.filter(gaslamp.Expression.col('age').ge(18))
.filter(row => customCheck(row)) // ← LazyFrame doesn't support this
Related¶
- Expressions and Predicates — Understanding Expression builders
- BareFrame-First Design — When to use eager filtering vs lazy
- Data Orientation and Conversion — Understanding row structure