validate

Validate SQL dump files for syntax errors, encoding issues, and referential integrity.

Alias: val (e.g., sql-splitter val dump.sql --strict)

Common use cases:

  • Pre-restore verification - Check dumps before importing to catch issues early
  • CI/CD quality gates - Block deployments if backups are malformed
  • Backup integrity monitoring - Verify automated backups haven’t been corrupted
  • Encoding issue detection - Find invalid UTF-8 sequences that cause import failures
  • Referential integrity auditing - Detect orphaned foreign key references and duplicate primary keys

Use diff instead if you need to compare two dumps for differences.

Validation runs in two passes:

  1. Schema pass - Parses the entire dump, validates SQL syntax, checks UTF-8 encoding, and builds a table registry. This pass is fast and streams through the file with constant memory (~10MB).

  2. Data integrity pass (optional) - If PK/FK checks are enabled, a second pass tracks primary key values and verifies foreign key references. This uses a PkHashSet that stores 64-bit hashes of seen values—memory grows with row count (roughly 8 bytes per unique PK value).

Memory considerations: For a table with 10 million rows, expect ~80MB memory for PK tracking. Use --no-fk-checks for large dumps to skip the data integrity pass entirely, or --max-rows-per-table to limit tracking per table.
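
A back-of-the-envelope estimate of that cost, using the ~8 bytes per unique primary key value noted above (a rough sketch; real usage also includes hash-set overhead):

```sh
# Estimate PK-tracking memory from an expected row count.
ROWS=10000000                          # 10 million rows
echo "$(( ROWS * 8 / 1000000 )) MB"    # prints "80 MB"
```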

Exit codes:

  • 0 - Valid (no errors; warnings may exist)
  • 1 - Invalid (errors found, or warnings with --strict)
  • 2 - Runtime error (file not found, parse failure)
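
A minimal sketch of acting on these exit codes in a shell script (how you react to each code is up to your pipeline; the messages here are illustrative):

```sh
sql-splitter validate dump.sql --strict
case $? in
  0) echo "dump is valid" ;;
  1) echo "validation failed (errors, or warnings under --strict)" >&2; exit 1 ;;
  2) echo "runtime error (e.g. file not found or parse failure)" >&2; exit 2 ;;
esac
```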

```sh
sql-splitter validate <INPUT> [OPTIONS]
```

```sh
# Validate a single dump
sql-splitter validate dump.sql

# Validate with progress bar (useful for large files)
sql-splitter validate large-dump.sql --progress

# Validate a PostgreSQL pg_dump file
sql-splitter validate pg_dump.sql --dialect=postgres
```

```sh
# Strict mode - treat warnings as errors
sql-splitter validate dump.sql --strict

# JSON output for parsing in CI
sql-splitter validate dump.sql --json

# Fail fast on first error (for batch validation)
sql-splitter validate "backups/*.sql" --fail-fast --strict
```

```sh
# Skip PK/FK checks entirely (fastest, lowest memory)
sql-splitter validate dump.sql --no-fk-checks

# Limit rows checked per table (middle ground)
sql-splitter validate dump.sql --max-rows-per-table 100000

# No limit on rows (full check, highest memory)
sql-splitter validate dump.sql --no-limit
```

| Flag | Short | Description | Default |
|------|-------|-------------|---------|
| --dialect | -d | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --strict | | Treat warnings as errors (exit 1) | false |
| --json | | Output results as JSON | false |
| --max-rows-per-table | | Max rows per table for PK/FK checks (0 = unlimited) | 1,000,000 |
| --no-limit | | Disable row limit for PK/FK checks | false |
| --no-fk-checks | | Skip PK/FK data integrity checks | false |
| --progress | -p | Show progress bar | false |
| --fail-fast | | Stop on first error (for glob patterns) | false |
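
These options compose. For example, a hypothetical CI invocation for a large PostgreSQL dump might combine a dialect override, a row cap, strict mode, and JSON output:

```sh
sql-splitter validate pg_dump.sql \
  --dialect=postgres \
  --max-rows-per-table 500000 \
  --strict --json > validation.json
```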

| Check | What It Detects | Memory Impact |
|-------|-----------------|---------------|
| SQL syntax | Malformed statements, unclosed strings, invalid keywords | None (streaming) |
| Encoding | Invalid UTF-8 byte sequences, mixed encodings | None (streaming) |
| DDL/DML consistency | INSERT/COPY into non-existent tables, column count mismatches | Minimal (table registry) |
| Duplicate PK | Multiple rows with the same primary key value | ~8 bytes per row |
| FK referential integrity | Foreign key values referencing non-existent parent rows | ~8 bytes per FK reference |

The first three checks run during the schema pass. PK/FK checks require the data integrity pass and can be disabled with --no-fk-checks.
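
With --json, the per-check status is reported in the checks object (see the JSON output example below), which makes it easy to confirm which checks actually ran:

```sh
# Show the status of each check (requires jq).
sql-splitter validate dump.sql --json | jq '.checks'
```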

A successful run with warnings looks like this:

```
✓ dump.sql is valid
Warnings:
- Table 'logs' has no primary key
- Column 'users.email' allows NULL but has unique constraint
```

```sh
sql-splitter validate dump.sql --json
```

```json
{
  "dialect": "mysql",
  "issues": [
    {
      "severity": "warning",
      "code": "W002",
      "message": "Table 'logs' has no primary key",
      "location": { "table": "logs", "statement_index": 3 }
    }
  ],
  "summary": {
    "errors": 0,
    "warnings": 1,
    "info": 0,
    "tables_scanned": 5,
    "statements_scanned": 150
  },
  "checks": {
    "syntax": "ok",
    "encoding": "ok",
    "ddl_dml_consistency": "ok",
    "pk_duplicates": "ok",
    "fk_integrity": "ok"
  }
}
```

```sh
# Simple CI gate
sql-splitter validate dump.sql --strict || exit 1

# Parse JSON with jq (check for errors)
sql-splitter validate dump.sql --json | jq 'select(.summary.errors > 0)'

# Multi-file validation returns aggregate with results array
sql-splitter validate "*.sql" --json | jq '.results[] | select(.summary.errors > 0)'
```

Catch issues before investing time in splitting large dumps:

```sh
sql-splitter validate dump.sql --strict && \
  sql-splitter split dump.sql -o tables/
```

Ensure dialect conversion produces valid output:

```sh
sql-splitter convert mysql.sql -o postgres.sql --to postgres
sql-splitter validate postgres.sql --dialect=postgres --strict
```

Check integrity and gather statistics for each backup in a single loop:

```sh
for dump in backups/*.sql; do
  echo "=== $dump ==="
  sql-splitter validate "$dump" --no-fk-checks && \
    sql-splitter analyze "$dump"
done
```

Full validation pipeline before database restore:

```sh
sql-splitter validate dump.sql --strict --json > validation.json
if [ $? -eq 0 ]; then
  sql-splitter order dump.sql -o ordered.sql
  mysql mydb < ordered.sql
fi
```

Symptom: Process killed or “memory allocation failed” error on large dumps.

Cause: PK/FK integrity checks track hashes of all primary key values, which grows with row count.

Solutions:

  1. Skip integrity checks entirely: --no-fk-checks
  2. Limit rows checked per table: --max-rows-per-table 500000
  3. Validate tables individually after splitting: sql-splitter split then validate each file
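
A minimal sketch of solution 3, splitting first and then validating each table file on its own (the tables/ directory name is illustrative):

```sh
sql-splitter split dump.sql -o tables/
for f in tables/*.sql; do
  sql-splitter validate "$f" || echo "validation failed: $f" >&2
done
```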

Symptom: Validation reports missing FK references, but data imports successfully.

Cause: The dump may define foreign keys before the referenced parent rows appear (common with deferred constraints or out-of-order dumps).

Solutions:

  1. Reorder the dump first: sql-splitter order dump.sql -o ordered.sql
  2. Skip FK checks: --no-fk-checks
  3. If using PostgreSQL with deferred FKs, this is expected behavior—the database handles it at commit time
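
For solution 1, reordering and then re-validating looks like this:

```sh
sql-splitter order dump.sql -o ordered.sql
sql-splitter validate ordered.sql
```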

Symptom: UTF-8 encoding errors on files that open fine in editors.

Cause: The file may use Latin-1, Windows-1252, or mixed encodings.

Solutions:

  1. Convert to UTF-8: iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql
  2. Check the original mysqldump charset: ensure --default-character-set=utf8mb4
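
A typical fix-and-recheck sequence for solution 1 (WINDOWS-1252 is just an example source encoding; substitute whatever the file actually uses):

```sh
iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql
sql-splitter validate dump-utf8.sql
```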

Symptom: No validation errors, but mysql or psql reports syntax errors.

Cause: sql-splitter validates common SQL patterns but may not catch all dialect-specific edge cases.

Solutions:

  1. Ensure correct dialect: --dialect=postgres for pg_dump files
  2. Check for dialect-specific syntax not yet supported
  3. Report the issue with a minimal reproduction

Symptom: Many warnings about tables without PKs.

Cause: PK duplicate detection requires knowing the primary key. Tables without PKs can only have basic syntax/encoding checks.

Solutions:

  1. This is informational—data integrity for these tables is skipped
  2. Use --strict only if you want to enforce PKs on all tables
  3. Consider adding PKs to these tables in your schema
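
To audit these warnings programmatically, filter the JSON output by issue code; W002 is the code shown for the no-primary-key warning in the JSON example above:

```sh
# List tables reported without a primary key (requires jq).
sql-splitter validate dump.sql --json \
  | jq -r '.issues[] | select(.code == "W002") | .location.table'
```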