Data Cleanup¶
Clean and standardize messy data from different sources.
User Story¶
You've created a Google Form to collect customer information including names, emails, and subscription preferences. Form responses are automatically saved to a Google Sheet, but the data is messy:
- Names have inconsistent capitalization and whitespace
- Some emails are in different cases
- Subscription preference values don't match the expected options ("yes" or "no")
- A few rows have empty or invalid data
You need to validate the data against a schema (including the enum constraint for preferences), standardize it (lowercase, trimmed), and remove invalid rows before processing.
With gaslamp¶
JavaScript
function cookbook_dataCleanup() {
// Step 1: Define what valid data looks like
// Use FlameGuards for type validation and FlameWright for constraints like enum
const schema = {
name: gaslamp.FlameGuards.isString,
email: gaslamp.FlameGuards.isString,
subscription: gaslamp.FlameWright.enumOf(['yes', 'no']),
};
// Step 2: Load data from the active sheet (typically Form Responses)
const sheet = SpreadsheetApp.getActiveSheet();
const df = gaslamp.BareFrame.fromSheet(sheet);
// Step 3: Validate data against the schema
// Returns { passed, failed } — use passed for valid rows, failed for errors
const { passed, failed } = gaslamp.FlameFrame.from(df, schema);
// Step 4: If there are validation errors, show them with suggestions
if (failed.length > 0) {
// Display error rows as a formatted table with invalid_message column
console.error(failed.length + ' validation error(s) found:');
console.error(failed.display(50, 30));
console.error('→ Fix these rows in the Form Responses sheet and run again');
return;
}
// Step 5: Normalize the data first
// Standardize format: trim whitespace, convert to lowercase
// Data is already validated as strings, so toString() is not needed
const normalized = passed
.withColumn('name', (row) => row.get('name').trim().toLowerCase())
.withColumn('email', (row) => row.get('email').toLowerCase())
.withColumn('subscription', (row) => row.get('subscription').toLowerCase());
// Step 6: Filter out incomplete or invalid rows
// Use Expression for readable, composable filter conditions
const hasValidName = gaslamp.Expression.col('name').ne('');
const hasValidEmail = gaslamp.Expression.col('email').contains('@');
const isValid = hasValidName.and(hasValidEmail);
const standardized = normalized.filter(isValid.toFunction());
// Step 7: Report how many rows were cleaned
console.log(`Cleaned ${df.length} -> ${standardized.length} rows`);
// Step 8: Write the cleaned data to a new sheet
const workbook = SpreadsheetApp.getActiveSpreadsheet();
const cleanSheet = workbook.getSheetByName('Cleaned Data')
?? workbook.insertSheet('Cleaned Data');
standardized.toSheet(cleanSheet);
console.log('Cleaned data exported to new sheet');
}
Key gaslamp features used:
- Schema validation with
FlameFrame.from()— catch type errors early - Chainable
.filter()and.withColumn()— readable data transformations .toSheet()— write DataFrame directly to Google Sheet
Without gaslamp¶
JavaScript
function cookbook_dataCleanup_vanilla() {
// Step 1: Fetch all data from the sheet as a 2D array
const sheet = SpreadsheetApp.getActiveSheet();
const range = sheet.getDataRange();
const values = range.getValues();
const headers = values[0];
const rows = values.slice(1);
// Step 2: Find the column indexes manually
const nameIdx = headers.indexOf('name');
const emailIdx = headers.indexOf('email');
const subscriptionIdx = headers.indexOf('subscription');
// Step 3: Define valid enum values for subscription
const validSubscriptions = ['yes', 'no'];
// Step 4: Manually validate types and constraints, separate valid rows
const passed = [];
const failed = [];
for (const row of rows) {
const name = row[nameIdx];
const email = row[emailIdx];
const subscription = row[subscriptionIdx];
// Manual type checking and validation (no schema validation)
if (typeof name !== 'string' || typeof email !== 'string' || typeof subscription !== 'string') {
failed.push(row);
continue;
}
// Manual enum validation
if (!validSubscriptions.includes(subscription.toLowerCase())) {
failed.push(row);
continue;
}
passed.push(row);
}
if (failed.length > 0) {
console.error('Validation errors:', failed);
return;
}
// Step 5: Normalize valid rows (only selected columns, headers lost)
const normalized = passed.map((row) => [
row[nameIdx].trim().toLowerCase(),
row[emailIdx].toLowerCase(),
row[subscriptionIdx].toLowerCase(),
]);
// Step 6: Filter rows with business rules
const standardized = normalized.filter((row) => {
const [name, email, subscription] = row;
return name !== '' && email.includes('@');
});
// Step 7: Report how many rows were cleaned
console.log(`Cleaned ${rows.length} -> ${standardized.length} rows`);
// Step 8: Write the cleaned data to a new sheet
const workbook = SpreadsheetApp.getActiveSpreadsheet();
const cleanSheet = workbook.getSheetByName('Cleaned Data')
?? workbook.insertSheet('Cleaned Data');
if (standardized.length > 0) {
// Must manually calculate range dimensions and write data
// Note: headers are NOT written by vanilla code (unlike gaslamp's toSheet())
cleanSheet.getRange(1, 1, standardized.length, 3).setValues(standardized);
}
console.log('Cleaned data exported to new sheet (without headers)');
}
Without gaslamp, you need to:
- Manually index columns and track their positions
- Manually type-check each value with
typeof - Manually define and validate enum constraints (e.g.,
validSubscriptionsarray and.includes()checks) - Manually loop through rows and accumulate results with separate logic for each constraint
- Select and transform only specific columns (losing the full row structure)
- Manually write headers and data separately (no automatic header preservation)
- Manually call
setValues()with explicit range dimensions - Track column count and data size manually
Key Techniques¶
- Schema validation first: Always check types before processing
- Normalize before filtering: Transform data to a consistent format, then apply business rules
- Use Expression for filter conditions: Separate and name conditions for clarity (e.g.,
hasValidName,hasValidEmail) - Write to a separate sheet: Keep the original data intact as a backup
See BareFrame Guide for more transformation patterns.