redact
Replace sensitive data with anonymized values for safe sharing, development, or compliance.
Alias: rd (e.g., sql-splitter rd dump.sql --hash "*.email" -o safe.sql)
When to Use This
Section titled “When to Use This”- GDPR/CCPA compliance - Remove or anonymize PII before sharing dumps with external parties
- Vendor data sharing - Provide realistic data to contractors or consultants without exposing customer information
- Development environment seeding - Create anonymized copies of production data for local development
- Testing data pipelines - Validate ETL processes with structurally identical but safe data
- Compliance audits - Demonstrate PII handling practices with sanitized sample exports
Use sample first if you also need to reduce the dataset size.
How It Works
Section titled “How It Works”Redaction runs as a streaming single-pass operation:
- Parse statements - Reads each SQL statement (INSERT, COPY) from the input
- Match columns - Compares column names against your patterns (e.g.,
*.emailmatchesusers.email,orders.customer_email) - Apply strategy - Transforms matching values using the specified strategy (null, hash, fake, mask, etc.)
- Write output - Emits transformed statements immediately (no buffering of entire file)
Memory characteristics: Constant ~10MB regardless of file size. Only the current statement is held in memory.
Hashing behavior: The hash strategy uses SHA-256 and is deterministic—the same input value always produces the same hash. This preserves foreign key relationships: if users.email and audit_log.user_email both contain alice@example.com, they both hash to the same value.
Fake data with --seed: When using fake strategy, the same --seed value produces identical fake data across runs. The fake value is derived from the original value’s hash, so alice@example.com always becomes the same fake email (e.g., jessica.smith@example.com) with a given seed.
Exit codes:
0- Success1- Validation errors or--strictmode warnings2- Runtime error (file not found, parse failure)
sql-splitter redact <INPUT> [OPTIONS]Examples
Section titled “Examples”Basic Redaction with Inline Patterns
Section titled “Basic Redaction with Inline Patterns”# Hash emails, null SSNs, fake namessql-splitter redact dump.sql -o safe.sql \ --null "*.ssn" \ --hash "*.email" \ --fake "*.name"
# Mask credit cards (show last 4 digits)sql-splitter redact dump.sql -o safe.sql \ --mask "****-****-****-XXXX=*.credit_card"
# Replace passwords with a constant valuesql-splitter redact dump.sql -o safe.sql \ --constant "*.password=REDACTED"Using Configuration Files
Section titled “Using Configuration Files”# Use YAML config for complex rulessql-splitter redact dump.sql -o safe.sql --config redact.yaml
# Auto-generate config by analyzing column namessql-splitter redact dump.sql --generate-config -o redact.yaml
# Validate config without processingsql-splitter redact dump.sql --config redact.yaml --validateReproducible Redaction
Section titled “Reproducible Redaction”# Same seed = identical fake data across runssql-splitter redact dump.sql -o safe.sql --fake "*.name" --seed 42
# Useful for CI: consistent test fixturessql-splitter redact prod.sql -o fixtures/test-data.sql \ --config redact.yaml --seed 12345Targeting Specific Tables
Section titled “Targeting Specific Tables”# Only redact the users and orders tablessql-splitter redact dump.sql -o safe.sql \ --hash "*.email" --tables users,orders
# Redact everything except audit_logssql-splitter redact dump.sql -o safe.sql \ --hash "*.email" --exclude audit_logsOptions
Section titled “Options”| Flag | Short | Description | Default |
|---|---|---|---|
--output | -o | Output SQL file | stdout |
--dialect | -d | SQL dialect | auto-detect |
--config | -c | YAML config file | - |
--generate-config | Analyze input and generate YAML config | false | |
--null | Columns to set to NULL (glob patterns) | - | |
--hash | Columns to hash with SHA256 | - | |
--fake | Columns to replace with fake data | - | |
--mask | Columns to mask (format: pattern=column) | - | |
--constant | Column=value pairs | - | |
--seed | Random seed for reproducibility | random | |
--locale | Locale for fake data | en | |
--tables | -t | Only redact specific tables | all |
--exclude | -x | Exclude specific tables | none |
--strict | Fail on warnings | false | |
--progress | -p | Show progress bar | false |
--dry-run | Preview without writing | false | |
--json | Output as JSON | false | |
--validate | Validate config only | false |
Redaction Strategies
Section titled “Redaction Strategies”| Strategy | Description | Use Case |
|---|---|---|
null | Replace with NULL | Remove data entirely |
hash | SHA256 hash (deterministic) | Preserve FK relationships |
fake | Generate realistic fake data | Realistic test data |
mask | Partial masking | Credit cards, SSNs |
constant | Fixed value | Placeholder values |
shuffle | Redistribute values | Preserve distribution |
skip | No redaction | Passthrough |
Column Pattern Matching
Section titled “Column Pattern Matching”Patterns use table.column format with * as a wildcard:
| Pattern | Matches |
|---|---|
*.email | email column in any table (users.email, orders.contact_email) |
users.email | Only the email column in the users table |
users.* | All columns in the users table |
*.ssn | Any column named ssn in any table |
audit_*.created_by | created_by in tables starting with audit_ |
Patterns are matched in order—first matching rule wins.
Mask Patterns
Section titled “Mask Patterns”# Keep last 4 digits of credit cardsql-splitter redact dump.sql -o safe.sql --mask "****-****-****-XXXX=*.credit_card"Pattern symbols:
*- Replace with asteriskX- Keep original character#- Random digit
Fake Data Generators
Section titled “Fake Data Generators”25+ generators available:
| Generator | Example Output |
|---|---|
email | jessica.smith@example.com |
name | Robert Johnson |
first_name | Sarah |
last_name | Williams |
phone | +1 (555) 234-5678 |
address | 123 Oak Street, Springfield, IL |
city | Portland |
company | Acme Corporation |
ip | 192.168.1.100 |
uuid | 550e8400-e29b-41d4-... |
date | 1985-07-23 |
credit_card | 4532015112830366 |
ssn | 123-45-6789 |
Auto-Generating Configuration
Section titled “Auto-Generating Configuration”The --generate-config flag analyzes column names to detect common PII patterns:
sql-splitter redact dump.sql --generate-config -o redact.yamlDetected patterns include:
- Email:
*email*,*e_mail* - Phone:
*phone*,*mobile*,*tel* - Names:
*name*,*first_name*,*last_name* - Financial:
*ssn*,*social_security*,*credit_card*,*card_number* - Addresses:
*address*,*street*,*zip*,*postal* - Auth:
*password*,*secret*,*token*
Review and adjust the generated config before use—automated detection may miss custom column names or flag non-sensitive columns.
YAML Config
Section titled “YAML Config”seed: 12345locale: en
defaults: strategy: skip
rules: - column: "*.ssn" strategy: null - column: "*.email" strategy: hash - column: "*.name" strategy: fake generator: name - column: "*.credit_card" strategy: mask pattern: "****-****-****-XXXX"
skip_tables: - schema_migrationsComposing with Other Tools
Section titled “Composing with Other Tools”Sample Then Redact
Section titled “Sample Then Redact”Reduce dataset size before anonymizing for faster processing:
sql-splitter sample prod.sql -o sampled.sql --percent 10 --preserve-relationssql-splitter redact sampled.sql -o dev.sql --config redact.yamlValidate After Redaction
Section titled “Validate After Redaction”Ensure redacted output is valid SQL:
sql-splitter redact dump.sql -o safe.sql --config redact.yamlsql-splitter validate safe.sql --strictFull Dev Environment Pipeline
Section titled “Full Dev Environment Pipeline”Create a complete anonymized dev dataset:
# 1. Sample 5% with FK preservationsql-splitter sample prod.sql -o sampled.sql \ --percent 5 --preserve-relations --seed 42
# 2. Redact PIIsql-splitter redact sampled.sql -o dev.sql \ --config redact.yaml --seed 42
# 3. Validate and reorder for clean importsql-splitter validate dev.sql --strict && \ sql-splitter order dev.sql -o dev-ordered.sqlConvert and Redact for Different Environments
Section titled “Convert and Redact for Different Environments”# Redact MySQL dump, convert to PostgreSQL for testingsql-splitter redact mysql.sql -o safe.sql --config redact.yamlsql-splitter convert safe.sql -o postgres.sql --to postgresTroubleshooting
Section titled “Troubleshooting”Hashed values break foreign key relationships
Section titled “Hashed values break foreign key relationships”Symptom: FK errors when importing redacted dump; parent/child relationships are broken.
Cause: Only one side of a relationship was hashed, or different patterns matched parent and child columns.
Solutions:
- Ensure both sides use the same pattern:
--hash "*.user_id"matches in all tables - Use explicit patterns for both:
--hash "users.id" --hash "orders.user_id" - Verify with:
sql-splitter validate redacted.sql
Fake data repeats or is inconsistent across runs
Section titled “Fake data repeats or is inconsistent across runs”Symptom: Different fake values each time you run redaction, or unexpected repetition.
Cause: No --seed specified (random seed each run), or same original values produce same fake values (by design).
Solutions:
- Use
--seedfor reproducible output:--seed 42 - Same original value → same fake value is intentional (preserves consistency)
- For truly random fake data each run, omit
--seed
Column not being redacted
Section titled “Column not being redacted”Symptom: Sensitive data appears in output despite having a matching rule.
Cause: Pattern doesn’t match the actual table.column name, or an earlier rule with skip strategy matched first.
Solutions:
- Check exact column names:
sql-splitter analyze dump.sqlshows all table/column names - Use
--dry-runto preview what would be redacted - Remember patterns match in order—more specific patterns should come before
*.columnwildcards - Verify pattern syntax:
users.emailnotusers->email
Config validation errors
Section titled “Config validation errors”Symptom: “Invalid config” or “Unknown strategy” errors.
Cause: YAML syntax errors, typos in strategy names, or invalid pattern format.
Solutions:
- Validate before running:
--validateflag - Check strategy spelling:
null,hash,fake,mask,constant,shuffle,skip - Ensure patterns are quoted in YAML:
column: "*.email"notcolumn: *.email - Use
--generate-configas a starting template
”Unknown column” warnings
Section titled “”Unknown column” warnings”Symptom: Warnings about rules that didn’t match any columns.
Cause: Pattern doesn’t match any columns in the dump (typo or columns don’t exist).
Solutions:
- Run
sql-splitter analyze dump.sqlto see actual column names - Use
--strictto fail on unmatched rules (catches config errors early) - Check for schema changes since config was created
See Also
Section titled “See Also”sample- Create reduced datasets before redactingvalidate- Verify redacted output integrityanalyze- List tables and columns for pattern planningconvert- Convert redacted dumps to different dialects- Redact Config Reference - Full YAML configuration options
- JSON Output Schema - Schema for
--jsonoutput - Memory & Streaming - Detailed memory profiles per command
- Glossary - PII, hashing, and streaming terminology