Skip to content

Outlier Detection & Filtering

Identify and report statistical outliers in numeric data.

User Story

You have sales data with amounts that you suspect contain data entry errors or unusual transactions. You need to:

  • Validate all numeric data
  • Calculate statistical measures (mean, standard deviation) per product category
  • Identify outliers (values beyond 2 standard deviations from mean)
  • Flag outliers with deviation metrics for review
  • Export clean data (without outliers) and outlier report separately
  • Understand which products have the most anomalies

With gaslamp

JavaScript
function cookbook_outlierDetection() {
  // Step 1: Define schema for sales data
  const schema = {
    product:  gaslamp.FlameGuards.isString,
    category: gaslamp.FlameGuards.isString,
    amount:   gaslamp.FlameGuards.isNumber,
  };

  const sourceSheet = SpreadsheetApp.getActiveSheet();
  const workbook    = SpreadsheetApp.getActiveSpreadsheet();

  // Step 2: Load and validate sales data
  const df = gaslamp.BareFrame.fromSheet(sourceSheet);
  const { passed, failed } = gaslamp.FlameFrame.from(df, schema);

  if (failed.length > 0) {
    console.error(failed.length + ' validation error(s) found:');
    console.error(failed.display());
    return;
  }

  console.log(`Loaded ${passed.length} valid transactions`);

  // Step 3: Calculate mean and std dev per category
  const categoryStats = passed
    .groupBy(['category'])
    .sum(['amount']);

  // Build stats map: mean and std dev per category
  const statsMap = {};
  passed.groupBy(['category']).toRows().forEach((row) => {
    const category = row.get('category');
    const amounts = passed
      .filter((r) => r.get('category') === category)
      .toRows()
      .map((r) => r.get('amount'));

    // Calculate mean
    const mean = amounts.reduce((a, b) => a + b, 0) / amounts.length;

    // Calculate standard deviation
    const variance = amounts.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / amounts.length;
    const stdDev = Math.sqrt(variance);

    statsMap[category] = { mean, stdDev };
  });

  // Step 4: Mark outliers based on z-score (> 2 std devs from mean)
  const withOutlierInfo = passed.withColumn('z_score', (row) => {
    const category = row.get('category');
    const amount = row.get('amount');
    const { mean, stdDev } = statsMap[category];

    if (stdDev === 0) return 0; // No variance in this category
    return Math.abs((amount - mean) / stdDev);
  }).withColumn('is_outlier', (row) => {
    return row.get('z_score') > 2;
  }).withColumn('category_mean', (row) => {
    return statsMap[row.get('category')].mean;
  }).withColumn('category_stddev', (row) => {
    return statsMap[row.get('category')].stdDev;
  });

  // Step 5: Separate outliers and normal data
  const outliers = withOutlierInfo.filter((row) => row.get('is_outlier'));
  const cleanData = withOutlierInfo.filter((row) => !row.get('is_outlier'));

  console.log(`Found ${outliers.length} outliers out of ${passed.length} transactions`);
  console.log(`${cleanData.length} transactions are within normal range`);

  // Step 6: Export clean data (remove statistical helper columns)
  const cleanDataExport = cleanData.drop(['z_score', 'is_outlier', 'category_mean', 'category_stddev']);

  const cleanSheet = workbook.getSheetByName('Clean Data')
    ?? workbook.insertSheet('Clean Data');
  cleanDataExport.toSheet(cleanSheet);

  // Step 7: Export outlier report for review
  const outlierReport = outliers.select(['product', 'category', 'amount', 'z_score', 'category_mean', 'category_stddev']);

  const reportSheet = workbook.getSheetByName('Outlier Report')
    ?? workbook.insertSheet('Outlier Report');
  outlierReport.toSheet(reportSheet);

  // Step 8: Summary statistics by category
  const summaryData = gaslamp.BareFrame.fromArrays([
    ['Category', 'Mean', 'Std Dev', 'Total Records', 'Outlier Count'],
    ...Object.entries(statsMap).map(([category, stats]) => {
      const categoryTotal = passed.filter((r) => r.get('category') === category).length;
      const categoryOutliers = outliers.filter((r) => r.get('category') === category).length;
      return [category, stats.mean.toFixed(2), stats.stdDev.toFixed(2), categoryTotal, categoryOutliers];
    }),
  ]);

  const summarySheet = workbook.getSheetByName('Category Summary')
    ?? workbook.insertSheet('Category Summary');
  summaryData.toSheet(summarySheet);

  console.log('Outlier analysis exported');
}

Key gaslamp features used:

  • Schema validation with FlameFrame.from() — ensure numeric values before statistics
  • .groupBy() for aggregation — calculate statistics per category
  • .toRows() conversion — access individual values for std dev calculation
  • .filter() with custom logic — identify outliers based on z-score
  • Multiple .withColumn() additions — add z-score, outlier flag, and category stats
  • Separate filtering — split into clean data and outliers
  • .drop() and .select() — manage output columns for different views
  • .fromArrays() with computed data — create summary statistics sheet
  • Reusable DataFrame — generate multiple views from validated source

Without gaslamp

JavaScript
function cookbook_outlierDetection_vanilla() {
  const sourceSheet = SpreadsheetApp.getActiveSheet();
  const workbook    = SpreadsheetApp.getActiveSpreadsheet();

  // Step 1: Load and validate sales data
  const range = sourceSheet.getDataRange();
  const values = range.getValues();
  const headers = values[0];
  const rows = values.slice(1);

  const productIdx = headers.indexOf('product');
  const categoryIdx = headers.indexOf('category');
  const amountIdx = headers.indexOf('amount');

  const validated = [];
  for (const row of rows) {
    if (typeof row[productIdx] !== 'string' || typeof row[categoryIdx] !== 'string' || typeof row[amountIdx] !== 'number') {
      console.warn('Type error in row:', row);
      continue;
    }
    validated.push(row);
  }

  if (validated.length === 0) {
    console.warn('Cannot analyze: no valid rows');
    return;
  }

  console.log(`Loaded ${validated.length} valid transactions`);

  // Step 2: Manually calculate mean and std dev per category
  const categoryGroups = {};
  for (const row of validated) {
    const category = row[categoryIdx];
    if (!categoryGroups[category]) {
      categoryGroups[category] = [];
    }
    categoryGroups[category].push(row[amountIdx]);
  }

  const statsMap = {};
  for (const [category, amounts] of Object.entries(categoryGroups)) {
    const mean = amounts.reduce((a, b) => a + b, 0) / amounts.length;
    const variance = amounts.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / amounts.length;
    const stdDev = Math.sqrt(variance);
    statsMap[category] = { mean, stdDev };
  }

  // Step 3: Manually mark outliers and calculate z-scores
  const outliers = [];
  const cleanData = [];

  for (const row of validated) {
    const category = row[categoryIdx];
    const amount = row[amountIdx];
    const { mean, stdDev } = statsMap[category];

    let zScore = 0;
    if (stdDev !== 0) {
      zScore = Math.abs((amount - mean) / stdDev);
    }

    const isOutlier = zScore > 2;

    if (isOutlier) {
      outliers.push([...row, zScore, mean, stdDev]);
    } else {
      cleanData.push(row);
    }
  }

  console.log(`Found ${outliers.length} outliers out of ${validated.length} transactions`);
  console.log(`${cleanData.length} transactions are within normal range`);

  // Step 4: Write clean data to sheet
  const cleanSheet = workbook.getSheetByName('Clean Data')
    ?? workbook.insertSheet('Clean Data');

  if (cleanData.length > 0) {
    cleanSheet.getRange(1, 1, 1, headers.length).setValues([headers]);
    cleanSheet.getRange(2, 1, cleanData.length, headers.length).setValues(cleanData);
  }

  // Step 5: Write outlier report
  const reportSheet = workbook.getSheetByName('Outlier Report')
    ?? workbook.insertSheet('Outlier Report');

  const reportHeaders = [...headers, 'Z-Score', 'Category Mean', 'Category StdDev'];
  if (outliers.length > 0) {
    reportSheet.getRange(1, 1, 1, reportHeaders.length).setValues([reportHeaders]);
    reportSheet.getRange(2, 1, outliers.length, reportHeaders.length).setValues(outliers);
  }

  // Step 6: Write category summary
  const summarySheet = workbook.getSheetByName('Category Summary')
    ?? workbook.insertSheet('Category Summary');

  const summaryHeaders = ['Category', 'Mean', 'Std Dev', 'Total Records', 'Outlier Count'];
  const summaryData = [];

  for (const [category, stats] of Object.entries(statsMap)) {
    const categoryTotal = categoryGroups[category].length;
    const categoryOutliers = outliers.filter((row) => row[categoryIdx] === category).length;
    summaryData.push([category, stats.mean.toFixed(2), stats.stdDev.toFixed(2), categoryTotal, categoryOutliers]);
  }

  if (summaryData.length > 0) {
    summarySheet.getRange(1, 1, 1, summaryHeaders.length).setValues([summaryHeaders]);
    summarySheet.getRange(2, 1, summaryData.length, summaryHeaders.length).setValues(summaryData);
  }

  console.log('Outlier analysis exported');
}

Without gaslamp, you need to:

  • Manually validate types for each row
  • Manually group data by category into separate arrays
  • Manually calculate mean and std deviation for each category with reduce loops
  • Manually iterate through all data again to mark outliers
  • Manually calculate z-scores per row with conditional std dev check
  • Manually separate rows into outliers and clean data with conditional push
  • Manually construct output arrays with z-score and stats columns
  • Manually filter outliers by category for summary statistics
  • Manually construct headers by spreading original headers + computed columns
  • Write headers and data separately for each output sheet
  • Duplicate category grouping logic (once for stats, again for filtering)

Key Techniques

  • Calculate statistics per group: Use .groupBy() to organize data, then compute mean/std dev
  • Z-score for outlier detection: Mark values > 2 std deviations from mean as outliers
  • Add computed columns: Use .withColumn() to add z-score, outlier flag, and category stats
  • Separate findings: Export clean data and outliers to different sheets
  • Summary by category: Create a breakdown showing anomalies per category
  • Handle edge cases: Check for zero std dev (no variance) before dividing
  • Clean output: Use .drop() to remove helper columns from final export

See Filter Guide for more filtering patterns and groupBy Guide for advanced aggregation.