LazyFrame and Deferred Evaluation

Summary

LazyFrame delays filter operations until you explicitly ask for results. This lets gaslamp apply optimizations like combining multiple filters and short-circuit evaluation.

JavaScript
const result = gaslamp.LazyFrame.from(df)
  .filter(gaslamp.Expression.col('age').ge(18))
  .filter(gaslamp.Expression.col('status').eq('active'))
  .collect(); // filters are fused here, then evaluated in one pass

Instead of evaluating filters one by one, LazyFrame fuses them into a single combined condition, skipping unnecessary work.

Philosophy: Delay Decisions Until Necessary

In standard Google Apps Script, array filtering with Array.prototype.filter is eager (each call executes immediately and returns a new array):

JavaScript
// Standard GAS: each filter runs immediately
const step1 = data.filter(row => row.age >= 18);      // scans all rows once
const step2 = step1.filter(row => row.status === 'active'); // scans filtered rows again

This works, but it's inefficient:

  • Multiple passes over the data
  • Every step allocates an intermediate array that is used once and discarded
  • No opportunity for optimization

The Solution: Queue and Fuse

LazyFrame defers evaluation and combines filters:

JavaScript
// LazyFrame: filters are queued and combined
const result = gaslamp.LazyFrame.from(data)
  .filter(gaslamp.Expression.col('age').ge(18))
  .filter(gaslamp.Expression.col('status').eq('active'))
  .collect(); // Now both filters are combined and evaluated once

What happens internally:

  1. .filter() queues the expression (no evaluation yet)
  2. .filter() again adds another expression to the queue
  3. .collect() fuses all expressions into one combined condition
  4. The combined condition is evaluated once over all rows
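The four steps above can be sketched in plain JavaScript. This is only an illustration: SketchLazyFrame is a hypothetical stand-in, it takes ordinary functions instead of Expression instances, and it is not gaslamp's actual implementation.

```javascript
// Hypothetical stand-in for a lazy frame; not gaslamp's real code.
class SketchLazyFrame {
  constructor(rows, predicates = []) {
    this.rows = rows;
    this.predicates = predicates; // queued, not yet evaluated
  }

  // Steps 1 and 2: queue the predicate and return a new frame. No row is touched.
  filter(predicate) {
    return new SketchLazyFrame(this.rows, [...this.predicates, predicate]);
  }

  // Steps 3 and 4: fuse the queue into one condition, then scan the rows once.
  collect() {
    const fused = row => this.predicates.every(predicate => predicate(row));
    return this.rows.filter(fused);
  }
}

const people = [
  { name: 'Ana', age: 34, status: 'active' },
  { name: 'Ben', age: 15, status: 'active' },
  { name: 'Cy', age: 52, status: 'inactive' },
];

// Count predicate calls to show when evaluation really happens.
let calls = 0;
const isAdult = row => { calls += 1; return row.age >= 18; };
const isActive = row => { calls += 1; return row.status === 'active'; };

const queued = new SketchLazyFrame(people).filter(isAdult).filter(isActive);
const callsBeforeCollect = calls; // nothing has been evaluated at this point

const activeAdults = queued.collect(); // evaluation happens here
```

In this sketch no predicate runs until collect() is called, which is the behavior the steps describe.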

How It Works: Predicate Fusion

The Key Insight: AND Everything Together

When you chain multiple filters, they represent AND conditions:

JavaScript
// "rows where age >= 18 AND status == 'active' AND country == 'USA'"
gaslamp.LazyFrame.from(df)
  .filter(gaslamp.Expression.col('age').ge(18))
  .filter(gaslamp.Expression.col('status').eq('active'))
  .filter(gaslamp.Expression.col('country').eq('USA'))
  .collect();

LazyFrame fuses these into:

JavaScript
const fused = gaslamp.Expression.col('age').ge(18)
  .and(gaslamp.Expression.col('status').eq('active'))
  .and(gaslamp.Expression.col('country').eq('USA'));

Then evaluates it once against all rows.
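One way to picture the fusion step is as a fold over the queued expressions using .and(). The sketch below assumes a tiny hypothetical SketchExpr class with only ge, eq, and, and toFunction, and assumes rows are plain objects; gaslamp's real Expression and row types may differ.

```javascript
// Hypothetical miniature of an expression type; not gaslamp's Expression.
class SketchExpr {
  constructor(test) {
    this.test = test; // (row) => boolean
  }

  static col(name) {
    return {
      ge: value => new SketchExpr(row => row[name] >= value),
      eq: value => new SketchExpr(row => row[name] === value),
    };
  }

  // Combine two expressions; && stops at the first false clause.
  and(other) {
    return new SketchExpr(row => this.test(row) && other.test(row));
  }

  toFunction() {
    return this.test;
  }
}

// Fuse a queue of expressions into one. An empty queue keeps every row.
function fuse(expressions) {
  const keepAll = new SketchExpr(() => true);
  return expressions.reduce((acc, expr) => acc.and(expr), keepAll);
}

const queue = [
  SketchExpr.col('age').ge(18),
  SketchExpr.col('status').eq('active'),
  SketchExpr.col('country').eq('USA'),
];

const rows = [
  { name: 'Ana', age: 34, status: 'active', country: 'USA' },
  { name: 'Ben', age: 15, status: 'active', country: 'USA' },
  { name: 'Cy', age: 52, status: 'inactive', country: 'USA' },
  { name: 'Di', age: 41, status: 'active', country: 'CA' },
];

// One pass over the rows with the fused condition.
const matches = rows.filter(fuse(queue).toFunction());
const matchNames = matches.map(row => row.name);
```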

Short-Circuit Evaluation

When evaluating a combined condition, evaluation stops as soon as any clause fails:

JavaScript
// For each row:
// if (age >= 18) AND (status === 'active') AND (country === 'USA')
//   if age < 18: skip status and country checks, reject row immediately
//   if age >= 18 but status != 'active': skip country check, reject row

This saves work, and the savings grow when the most selective conditions (those that reject the most rows) come first, because every later clause then runs on fewer rows.
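The effect of clause order can be measured with a small counter. The following is a sketch in plain JavaScript with made-up data and a hypothetical countChecks helper; it does not use gaslamp and only demonstrates how short-circuiting interacts with ordering.

```javascript
// Hypothetical helper: run the clauses as a short-circuit AND and count clause checks.
function countChecks(rows, clauses) {
  let checks = 0;
  const kept = rows.filter(row =>
    // every() stops at the first clause that returns false
    clauses.every(clause => { checks += 1; return clause(row); })
  );
  return { kept: kept.length, checks };
}

// 100 synthetic rows: ages 0..99, countries alternating.
const rows = Array.from({ length: 100 }, (_, i) => ({
  age: i,
  country: i % 2 === 0 ? 'USA' : 'CA',
}));

const isSenior = row => row.age >= 90;      // passes 10 of 100 rows
const isUSA = row => row.country === 'USA'; // passes 50 of 100 rows

const selectiveFirst = countChecks(rows, [isSenior, isUSA]);
const selectiveLast = countChecks(rows, [isUSA, isSenior]);
```

Both orderings keep the same rows, but selectiveFirst performs 110 clause checks while selectiveLast performs 150.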

Lazy vs Eager: When to Use Each

Use BareFrame.filter() (Eager) When:

  • You have a single, simple filter
  • You want immediate evaluation (no deferral overhead)
  • You're already using BareFrame operations

JavaScript
const adults = df.filter(row => row.get('age') >= 18);

Use LazyFrame (Deferred) When:

  • You're chaining multiple Expression-based filters
  • You want optimizations applied
  • You prefer the .filter().filter().collect() style

JavaScript
const result = gaslamp.LazyFrame.from(df)
  .filter(gaslamp.Expression.col('age').ge(18))
  .filter(gaslamp.Expression.col('status').eq('active'))
  .collect();

The Benefits of Deferral

1. Single-Pass Evaluation

Fusing multiple filters into one means the data is scanned once, not multiple times.

JavaScript
// x, y, z are Expression instances

// Without LazyFrame: 3 passes over the data
const a = df.filter(x.toFunction());
const b = a.filter(y.toFunction());
const eagerResult = b.filter(z.toFunction());

// With LazyFrame: 1 pass
const lazyResult = gaslamp.LazyFrame.from(df)
  .filter(x)
  .filter(y)
  .filter(z)
  .collect();
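To make the pass count concrete, here is a plain-array sketch with synthetic data. It does not call gaslamp; the predicates x, y, and z are arbitrary stand-ins chosen so the row counts are easy to check.

```javascript
// 1000 synthetic rows with ids 0..999.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i }));

// Arbitrary stand-in predicates.
const x = row => row.id % 2 === 0;
const y = row => row.id % 3 === 0;
const z = row => row.id % 5 === 0;

// Chained: three passes, and two intermediate arrays (a and b) that are thrown away.
const a = rows.filter(x);
const b = a.filter(y);
const chained = b.filter(z);
const intermediateRows = a.length + b.length;

// Fused: one pass, no intermediate arrays.
const fusedResult = rows.filter(row => x(row) && y(row) && z(row));
```

With this data the chained version materializes 667 intermediate rows (500 in a, 167 in b) before producing the same 34 rows that the fused version produces directly.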

2. Future Optimization Opportunities

Because evaluation is deferred, gaslamp sees the whole filter chain before running it, which leaves room for optimizations such as:

  • Filter reordering: Move cheap, high-selectivity filters first (faster rejection)
  • Predicate pushdown: Apply filters before other operations (smaller intermediate results)
  • Index usage: Identify candidates for indexed lookups

Today, LazyFrame applies fusion. Tomorrow, it could apply more sophisticated optimizations.
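As an illustration of what filter reordering could look like, the sketch below orders predicates by how often they pass on a sample of rows. This is purely hypothetical: orderBySelectivity is not a gaslamp function, and the source does not say how (or whether) gaslamp will implement reordering. It also only accounts for selectivity, not for how expensive each predicate is.

```javascript
// Hypothetical helper: put the predicates that pass least often first.
function orderBySelectivity(predicates, sample) {
  return predicates
    .map(predicate => ({
      predicate,
      // Fraction of sampled rows that pass; guard against an empty sample.
      passRate: sample.filter(predicate).length / Math.max(sample.length, 1),
    }))
    .sort((left, right) => left.passRate - right.passRate)
    .map(entry => entry.predicate);
}

// 200 synthetic rows.
const rows = Array.from({ length: 200 }, (_, i) => ({
  revenue: i * 1000,
  department: i % 4 === 0 ? 'Sales' : 'Ops',
}));

const isBigRevenue = row => row.revenue > 100000; // passes 99 of 200 rows
const isSales = row => row.department === 'Sales'; // passes 50 of 200 rows

const asWritten = [isBigRevenue, isSales];
const reordered = orderBySelectivity(asWritten, rows);

// For side-effect-free predicates, AND is order-independent, so the result is unchanged.
const keep = predicates => rows.filter(row => predicates.every(p => p(row)));
const before = keep(asWritten);
const after = keep(reordered);
```

Reordering is only safe when predicates have no side effects and cannot throw on rows that an earlier clause would have rejected.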

3. Expressive Chaining

The .filter().filter().collect() style is natural and readable:

JavaScript
const report = gaslamp.LazyFrame.from(df)
  .filter(gaslamp.Expression.col('year').eq(2024))
  .filter(gaslamp.Expression.col('department').eq('Sales'))
  .filter(gaslamp.Expression.col('revenue').gt(100000))
  .collect();

Compare to manual combination:

JavaScript
const combined = gaslamp.Expression.col('year').eq(2024)
  .and(gaslamp.Expression.col('department').eq('Sales'))
  .and(gaslamp.Expression.col('revenue').gt(100000));
const report = df.filter(combined.toFunction());

The LazyFrame version reads top to bottom, and adding a condition is one more .filter() call rather than an edit to a nested expression.

Architecture: Immutability and Chaining

LazyFrame is immutable. Each .filter() returns a new LazyFrame:

JavaScript
const lazy1 = gaslamp.LazyFrame.from(df);
const lazy2 = lazy1.filter(gaslamp.Expression.col('age').ge(18));
const lazy3 = lazy2.filter(gaslamp.Expression.col('status').eq('active'));

// lazy1, lazy2, lazy3 are all different instances
// None share state

This enables:

  • Reuse: Build conditions once, use many times
  • Safety: No hidden mutations
  • Composability: Build complex filters step-by-step
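The reuse point can be illustrated by branching two queries off a shared base. SketchFrame below is a hypothetical plain-JavaScript model that takes ordinary functions; it shows the pattern immutability makes safe, not gaslamp's own code.

```javascript
// Hypothetical immutable frame: filter() never modifies the receiver.
class SketchFrame {
  constructor(rows, queue = []) {
    this.rows = rows;
    this.queue = Object.freeze([...queue]); // frozen copy, so no shared mutable state
  }

  filter(predicate) {
    return new SketchFrame(this.rows, [...this.queue, predicate]);
  }

  collect() {
    return this.rows.filter(row => this.queue.every(predicate => predicate(row)));
  }
}

const staff = [
  { name: 'Ana', age: 34, status: 'active' },
  { name: 'Ben', age: 15, status: 'active' },
  { name: 'Cy', age: 52, status: 'inactive' },
  { name: 'Di', age: 41, status: 'active' },
];

// Build the shared condition once...
const adults = new SketchFrame(staff).filter(row => row.age >= 18);

// ...then branch from it twice. Neither branch affects the other or the base.
const activeAdults = adults.filter(row => row.status === 'active').collect();
const inactiveAdults = adults.filter(row => row.status === 'inactive').collect();

const activeNames = activeAdults.map(row => row.name);
const inactiveNames = inactiveAdults.map(row => row.name);
```

After both branches are built, adults still holds exactly one queued condition.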

The Limitation: Expression-Only

LazyFrame accepts only Expression instances for filtering.

This design choice:

  • Enables optimization (gaslamp understands Expression structure)
  • Keeps it simple (no mixing Expression and custom predicates)
  • Forces you to think in terms of schema-aware conditions
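The first point is easier to see with a condition represented as data. The sketch below uses a hypothetical object shape ({ op, column, value }) and hypothetical helpers columnsUsed and compile; gaslamp's internal representation of Expression is not described here and may look different.

```javascript
// Hypothetical data representation of a condition.
const ageCheck = { op: 'ge', column: 'age', value: 18 };
const statusCheck = { op: 'eq', column: 'status', value: 'active' };
const both = { op: 'and', left: ageCheck, right: statusCheck };

// Because the condition is data, a library can inspect it before running it.
function columnsUsed(node) {
  if (node.op === 'and') {
    return [...new Set([...columnsUsed(node.left), ...columnsUsed(node.right)])];
  }
  return [node.column];
}

// Turning the data into a row predicate is a separate, later step.
function compile(node) {
  switch (node.op) {
    case 'and': {
      const left = compile(node.left);
      const right = compile(node.right);
      return row => left(row) && right(row);
    }
    case 'ge':
      return row => row[node.column] >= node.value;
    case 'eq':
      return row => row[node.column] === node.value;
    default:
      throw new Error(`Unknown op: ${node.op}`);
  }
}

const columns = columnsUsed(both);
const predicate = compile(both);

// An arrow function computes the same answer, but its structure is opaque:
// there is no reliable way to ask it which columns it reads.
const opaque = row => row.age >= 18 && row.status === 'active';
```

A structured condition can be analyzed, reordered, or merged before evaluation, while a custom function can only be called.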

If you need a complex custom filter, use BareFrame.filter():

JavaScript
// Custom logic: use BareFrame
const custom = df.filter(row => someComplexLogic(row));

// Then, if needed, wrap in LazyFrame for more filters
const result = gaslamp.LazyFrame.from(custom)
  .filter(gaslamp.Expression.col('age').ge(18))
  .collect();

When to Use LazyFrame in Your Design

Good use case:

JavaScript
// Multiple Expression-based filters on the same source
const result = gaslamp.LazyFrame.from(rawData)
  .filter(byYear)
  .filter(byDepartment)
  .filter(byRevenue)
  .collect();

Less ideal:

JavaScript
// Single filter: LazyFrame adds overhead with no benefit
gaslamp.LazyFrame.from(df).filter(expr).collect();
// Just use: df.filter(expr.toFunction());

Not suitable:

JavaScript
// Mix of Expression and custom logic: use BareFrame
gaslamp.LazyFrame.from(df)
  .filter(gaslamp.Expression.col('age').ge(18))
  .filter(row => customCheck(row)) // ← LazyFrame doesn't support this