Outlier Detection & Filtering¶
Identify and report statistical outliers in numeric data.
User Story¶
You have sales data with amounts that you suspect contain data entry errors or unusual transactions. You need to:
- Validate all numeric data
- Calculate statistical measures (mean, standard deviation) per product category
- Identify outliers (values beyond 2 standard deviations from mean)
- Flag outliers with deviation metrics for review
- Export clean data (without outliers) and outlier report separately
- Understand which products have the most anomalies
With gaslamp¶
JavaScript
function cookbook_outlierDetection() {
// Step 1: Define schema for sales data
const schema = {
product: gaslamp.FlameGuards.isString,
category: gaslamp.FlameGuards.isString,
amount: gaslamp.FlameGuards.isNumber,
};
const sourceSheet = SpreadsheetApp.getActiveSheet();
const workbook = SpreadsheetApp.getActiveSpreadsheet();
// Step 2: Load and validate sales data
const df = gaslamp.BareFrame.fromSheet(sourceSheet);
const { passed, failed } = gaslamp.FlameFrame.from(df, schema);
if (failed.length > 0) {
console.error(failed.length + ' validation error(s) found:');
console.error(failed.display());
return;
}
console.log(`Loaded ${passed.length} valid transactions`);
// Step 3: Calculate mean and std dev per category
const categoryStats = passed
.groupBy(['category'])
.sum(['amount']);
// Build stats map: mean and std dev per category
const statsMap = {};
passed.groupBy(['category']).toRows().forEach((row) => {
const category = row.get('category');
const amounts = passed
.filter((r) => r.get('category') === category)
.toRows()
.map((r) => r.get('amount'));
// Calculate mean
const mean = amounts.reduce((a, b) => a + b, 0) / amounts.length;
// Calculate standard deviation
const variance = amounts.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / amounts.length;
const stdDev = Math.sqrt(variance);
statsMap[category] = { mean, stdDev };
});
// Step 4: Mark outliers based on z-score (> 2 std devs from mean)
const withOutlierInfo = passed.withColumn('z_score', (row) => {
const category = row.get('category');
const amount = row.get('amount');
const { mean, stdDev } = statsMap[category];
if (stdDev === 0) return 0; // No variance in this category
return Math.abs((amount - mean) / stdDev);
}).withColumn('is_outlier', (row) => {
return row.get('z_score') > 2;
}).withColumn('category_mean', (row) => {
return statsMap[row.get('category')].mean;
}).withColumn('category_stddev', (row) => {
return statsMap[row.get('category')].stdDev;
});
// Step 5: Separate outliers and normal data
const outliers = withOutlierInfo.filter((row) => row.get('is_outlier'));
const cleanData = withOutlierInfo.filter((row) => !row.get('is_outlier'));
console.log(`Found ${outliers.length} outliers out of ${passed.length} transactions`);
console.log(`${cleanData.length} transactions are within normal range`);
// Step 6: Export clean data (remove statistical helper columns)
const cleanDataExport = cleanData.drop(['z_score', 'is_outlier', 'category_mean', 'category_stddev']);
const cleanSheet = workbook.getSheetByName('Clean Data')
?? workbook.insertSheet('Clean Data');
cleanDataExport.toSheet(cleanSheet);
// Step 7: Export outlier report for review
const outlierReport = outliers.select(['product', 'category', 'amount', 'z_score', 'category_mean', 'category_stddev']);
const reportSheet = workbook.getSheetByName('Outlier Report')
?? workbook.insertSheet('Outlier Report');
outlierReport.toSheet(reportSheet);
// Step 8: Summary statistics by category
const summaryData = gaslamp.BareFrame.fromArrays([
['Category', 'Mean', 'Std Dev', 'Total Records', 'Outlier Count'],
...Object.entries(statsMap).map(([category, stats]) => {
const categoryTotal = passed.filter((r) => r.get('category') === category).length;
const categoryOutliers = outliers.filter((r) => r.get('category') === category).length;
return [category, stats.mean.toFixed(2), stats.stdDev.toFixed(2), categoryTotal, categoryOutliers];
}),
]);
const summarySheet = workbook.getSheetByName('Category Summary')
?? workbook.insertSheet('Category Summary');
summaryData.toSheet(summarySheet);
console.log('Outlier analysis exported');
}
Key gaslamp features used:
- Schema validation with
FlameFrame.from()— ensure numeric values before statistics .groupBy()for aggregation — calculate statistics per category.toRows()conversion — access individual values for std dev calculation.filter()with custom logic — identify outliers based on z-score- Multiple
.withColumn()additions — add z-score, outlier flag, and category stats - Separate filtering — split into clean data and outliers
.drop()and.select()— manage output columns for different views.fromArrays()with computed data — create summary statistics sheet- Reusable DataFrame — generate multiple views from validated source
Without gaslamp¶
JavaScript
function cookbook_outlierDetection_vanilla() {
const sourceSheet = SpreadsheetApp.getActiveSheet();
const workbook = SpreadsheetApp.getActiveSpreadsheet();
// Step 1: Load and validate sales data
const range = sourceSheet.getDataRange();
const values = range.getValues();
const headers = values[0];
const rows = values.slice(1);
const productIdx = headers.indexOf('product');
const categoryIdx = headers.indexOf('category');
const amountIdx = headers.indexOf('amount');
const validated = [];
for (const row of rows) {
if (typeof row[productIdx] !== 'string' || typeof row[categoryIdx] !== 'string' || typeof row[amountIdx] !== 'number') {
console.warn('Type error in row:', row);
continue;
}
validated.push(row);
}
if (validated.length === 0) {
console.warn('Cannot analyze: no valid rows');
return;
}
console.log(`Loaded ${validated.length} valid transactions`);
// Step 2: Manually calculate mean and std dev per category
const categoryGroups = {};
for (const row of validated) {
const category = row[categoryIdx];
if (!categoryGroups[category]) {
categoryGroups[category] = [];
}
categoryGroups[category].push(row[amountIdx]);
}
const statsMap = {};
for (const [category, amounts] of Object.entries(categoryGroups)) {
const mean = amounts.reduce((a, b) => a + b, 0) / amounts.length;
const variance = amounts.reduce((sum, val) => sum + Math.pow(val - mean, 2), 0) / amounts.length;
const stdDev = Math.sqrt(variance);
statsMap[category] = { mean, stdDev };
}
// Step 3: Manually mark outliers and calculate z-scores
const outliers = [];
const cleanData = [];
for (const row of validated) {
const category = row[categoryIdx];
const amount = row[amountIdx];
const { mean, stdDev } = statsMap[category];
let zScore = 0;
if (stdDev !== 0) {
zScore = Math.abs((amount - mean) / stdDev);
}
const isOutlier = zScore > 2;
if (isOutlier) {
outliers.push([...row, zScore, mean, stdDev]);
} else {
cleanData.push(row);
}
}
console.log(`Found ${outliers.length} outliers out of ${validated.length} transactions`);
console.log(`${cleanData.length} transactions are within normal range`);
// Step 4: Write clean data to sheet
const cleanSheet = workbook.getSheetByName('Clean Data')
?? workbook.insertSheet('Clean Data');
if (cleanData.length > 0) {
cleanSheet.getRange(1, 1, 1, headers.length).setValues([headers]);
cleanSheet.getRange(2, 1, cleanData.length, headers.length).setValues(cleanData);
}
// Step 5: Write outlier report
const reportSheet = workbook.getSheetByName('Outlier Report')
?? workbook.insertSheet('Outlier Report');
const reportHeaders = [...headers, 'Z-Score', 'Category Mean', 'Category StdDev'];
if (outliers.length > 0) {
reportSheet.getRange(1, 1, 1, reportHeaders.length).setValues([reportHeaders]);
reportSheet.getRange(2, 1, outliers.length, reportHeaders.length).setValues(outliers);
}
// Step 6: Write category summary
const summarySheet = workbook.getSheetByName('Category Summary')
?? workbook.insertSheet('Category Summary');
const summaryHeaders = ['Category', 'Mean', 'Std Dev', 'Total Records', 'Outlier Count'];
const summaryData = [];
for (const [category, stats] of Object.entries(statsMap)) {
const categoryTotal = categoryGroups[category].length;
const categoryOutliers = outliers.filter((row) => row[categoryIdx] === category).length;
summaryData.push([category, stats.mean.toFixed(2), stats.stdDev.toFixed(2), categoryTotal, categoryOutliers]);
}
if (summaryData.length > 0) {
summarySheet.getRange(1, 1, 1, summaryHeaders.length).setValues([summaryHeaders]);
summarySheet.getRange(2, 1, summaryData.length, summaryHeaders.length).setValues(summaryData);
}
console.log('Outlier analysis exported');
}
Without gaslamp, you need to:
- Manually validate types for each row
- Manually group data by category into separate arrays
- Manually calculate mean and std deviation for each category with reduce loops
- Manually iterate through all data again to mark outliers
- Manually calculate z-scores per row with conditional std dev check
- Manually separate rows into outliers and clean data with conditional push
- Manually construct output arrays with z-score and stats columns
- Manually filter outliers by category for summary statistics
- Manually construct headers by spreading original headers + computed columns
- Write headers and data separately for each output sheet
- Duplicate category grouping logic (once for stats, again for filtering)
Key Techniques¶
- Calculate statistics per group: Use
.groupBy()to organize data, then compute mean/std dev - Z-score for outlier detection: Mark values > 2 std deviations from mean as outliers
- Add computed columns: Use
.withColumn()to add z-score, outlier flag, and category stats - Separate findings: Export clean data and outliers to different sheets
- Summary by category: Create a breakdown showing anomalies per category
- Handle edge cases: Check for zero std dev (no variance) before dividing
- Clean output: Use
.drop()to remove helper columns from final export
See Filter Guide for more filtering patterns and groupBy Guide for advanced aggregation.