validate

Validate SQL dump files for syntax errors, encoding issues, and referential integrity.

Alias: val (e.g., sql-splitter val dump.sql --strict)

Common use cases:

  • Pre-restore verification - Check dumps before importing to catch issues early
  • CI/CD quality gates - Block deployments if backups are malformed
  • Backup integrity monitoring - Verify automated backups haven’t been corrupted
  • Encoding issue detection - Find invalid UTF-8 sequences that cause import failures
  • Referential integrity auditing - Detect orphaned foreign key references and duplicate primary keys

Use diff instead if you need to compare two dumps for differences.

Validation runs in two passes:

  1. Schema pass - Parses the entire dump, validates SQL syntax, checks UTF-8 encoding, and builds a table registry. This pass is fast and streams through the file with constant memory (~10MB).

  2. Data integrity pass (optional) - If PK/FK checks are enabled, a second pass tracks primary key values and verifies foreign key references. This uses a PkHashSet that stores 64-bit hashes of seen values—memory grows with row count (roughly 8 bytes per unique PK value).

Memory considerations: For a table with 10 million rows, expect ~80MB memory for PK tracking. Use --no-fk-checks for large dumps to skip the data integrity pass entirely, or --max-rows-per-table to limit tracking per table.
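
A back-of-the-envelope estimate of that cost, using the ~8 bytes per unique primary key value noted above (a rough sketch; real usage also includes hash-set overhead):

```sh
# Estimate PK-tracking memory from an expected row count.
ROWS=10000000                          # 10 million rows
echo "$(( ROWS * 8 / 1000000 )) MB"    # prints "80 MB"
```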

Exit codes:

  • 0 - Valid (no errors; warnings may exist)
  • 1 - Invalid (errors found, or warnings with --strict)
  • 2 - Runtime error (file not found, parse failure)
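
A minimal sketch of acting on these exit codes in a shell script (how you react to each code is up to your pipeline; the messages here are illustrative):

```sh
sql-splitter validate dump.sql --strict
case $? in
  0) echo "dump is valid" ;;
  1) echo "validation failed (errors, or warnings under --strict)" >&2; exit 1 ;;
  2) echo "runtime error (e.g. file not found or parse failure)" >&2; exit 2 ;;
esac
```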

```sh
sql-splitter validate <INPUT> [OPTIONS]
```

```sh
# Validate a single dump
sql-splitter validate dump.sql

# Validate with progress bar (useful for large files)
sql-splitter validate large-dump.sql --progress

# Validate a PostgreSQL pg_dump file
sql-splitter validate pg_dump.sql --dialect=postgres
```

```sh
# Strict mode - treat warnings as errors
sql-splitter validate dump.sql --strict

# JSON output for parsing in CI
sql-splitter validate dump.sql --json

# Fail fast on first error (for batch validation)
sql-splitter validate "backups/*.sql" --fail-fast --strict
```

```sh
# Skip PK/FK checks entirely (fastest, lowest memory)
sql-splitter validate dump.sql --no-fk-checks

# Limit rows checked per table (middle ground)
sql-splitter validate dump.sql --max-rows-per-table 100000

# No limit on rows (full check, highest memory)
sql-splitter validate dump.sql --no-limit
```

| Flag | Short | Description | Default |
|------|-------|-------------|---------|
| --dialect | -d | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --strict | | Treat warnings as errors (exit 1) | false |
| --json | | Output results as JSON | false |
| --max-rows-per-table | | Max rows per table for PK/FK checks (0 = unlimited) | 1,000,000 |
| --no-limit | | Disable row limit for PK/FK checks | false |
| --no-fk-checks | | Skip PK/FK data integrity checks | false |
| --progress | -p | Show progress bar | false |
| --fail-fast | | Stop on first error (for glob patterns) | false |
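
These options compose. For example, a hypothetical CI invocation for a large PostgreSQL dump might combine a dialect override, a row cap, strict mode, and JSON output:

```sh
sql-splitter validate pg_dump.sql \
  --dialect=postgres \
  --max-rows-per-table 500000 \
  --strict --json > validation.json
```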

| Check | What It Detects | Memory Impact |
|-------|-----------------|---------------|
| SQL syntax | Malformed statements, unclosed strings, invalid keywords | None (streaming) |
| Encoding | Invalid UTF-8 byte sequences, mixed encodings | None (streaming) |
| DDL/DML consistency | INSERT/COPY into non-existent tables, column count mismatches | Minimal (table registry) |
| Duplicate PK | Multiple rows with the same primary key value | ~8 bytes per row |
| FK referential integrity | Foreign key values referencing non-existent parent rows | ~8 bytes per FK reference |

The first three checks run during the schema pass. PK/FK checks require the data integrity pass and can be disabled with --no-fk-checks.
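
With --json, the per-check status is reported in the checks object (see the JSON output example below), which makes it easy to confirm which checks actually ran:

```sh
# Show the status of each check (requires jq).
sql-splitter validate dump.sql --json | jq '.checks'
```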

A successful run with warnings looks like this:

```
✓ dump.sql is valid
Warnings:
- Table 'logs' has no primary key
- Column 'users.email' allows NULL but has unique constraint
```

```sh
sql-splitter validate dump.sql --json
```

```json
{
  "dialect": "mysql",
  "issues": [
    {
      "severity": "warning",
      "code": "W002",
      "message": "Table 'logs' has no primary key",
      "location": { "table": "logs", "statement_index": 3 }
    }
  ],
  "summary": {
    "errors": 0,
    "warnings": 1,
    "info": 0,
    "tables_scanned": 5,
    "statements_scanned": 150
  },
  "checks": {
    "syntax": "ok",
    "encoding": "ok",
    "ddl_dml_consistency": "ok",
    "pk_duplicates": "ok",
    "fk_integrity": "ok"
  }
}
```

```sh
# Simple CI gate
sql-splitter validate dump.sql --strict || exit 1

# Parse JSON with jq (check for errors)
sql-splitter validate dump.sql --json | jq 'select(.summary.errors > 0)'

# Multi-file validation returns aggregate with results array
sql-splitter validate "*.sql" --json | jq '.results[] | select(.summary.errors > 0)'
```

Catch issues before investing time in splitting large dumps:

```sh
sql-splitter validate dump.sql --strict && \
  sql-splitter split dump.sql -o tables/
```

Ensure dialect conversion produces valid output:

```sh
sql-splitter convert mysql.sql -o postgres.sql --to postgres
sql-splitter validate postgres.sql --dialect=postgres --strict
```

Check integrity and gather statistics for each backup in a single loop:

```sh
for dump in backups/*.sql; do
  echo "=== $dump ==="
  sql-splitter validate "$dump" --no-fk-checks && \
    sql-splitter analyze "$dump"
done
```

Full validation pipeline before database restore:

```sh
sql-splitter validate dump.sql --strict --json > validation.json
if [ $? -eq 0 ]; then
  sql-splitter order dump.sql -o ordered.sql
  mysql mydb < ordered.sql
fi
```

Symptom: Process killed or “memory allocation failed” error on large dumps.

Cause: PK/FK integrity checks track hashes of all primary key values, which grows with row count.

Solutions:

  1. Skip integrity checks entirely: --no-fk-checks
  2. Limit rows checked per table: --max-rows-per-table 500000
  3. Validate tables individually after splitting: sql-splitter split then validate each file
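
A minimal sketch of solution 3, splitting first and then validating each table file on its own (the tables/ directory name is illustrative):

```sh
sql-splitter split dump.sql -o tables/
for f in tables/*.sql; do
  sql-splitter validate "$f" || echo "validation failed: $f" >&2
done
```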

Symptom: Validation reports missing FK references, but data imports successfully.

Cause: The dump may define foreign keys before the referenced parent rows appear (common with deferred constraints or out-of-order dumps).

Solutions:

  1. Reorder the dump first: sql-splitter order dump.sql -o ordered.sql
  2. Skip FK checks: --no-fk-checks
  3. If using PostgreSQL with deferred FKs, this is expected behavior—the database handles it at commit time
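
For solution 1, reordering and then re-validating looks like this:

```sh
sql-splitter order dump.sql -o ordered.sql
sql-splitter validate ordered.sql
```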

Symptom: UTF-8 encoding errors on files that open fine in editors.

Cause: The file may use Latin-1, Windows-1252, or mixed encodings.

Solutions:

  1. Convert to UTF-8: iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql
  2. Check the original mysqldump charset: ensure --default-character-set=utf8mb4
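
A typical fix-and-recheck sequence for solution 1 (WINDOWS-1252 is just an example source encoding; substitute whatever the file actually uses):

```sh
iconv -f WINDOWS-1252 -t UTF-8 dump.sql > dump-utf8.sql
sql-splitter validate dump-utf8.sql
```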

Symptom: No validation errors, but mysql or psql reports syntax errors.

Cause: sql-splitter validates common SQL patterns but may not catch all dialect-specific edge cases.

Solutions:

  1. Ensure correct dialect: --dialect=postgres for pg_dump files
  2. Check for dialect-specific syntax not yet supported
  3. Report the issue with a minimal reproduction

Symptom: Many warnings about tables without PKs.

Cause: PK duplicate detection requires knowing the primary key. Tables without PKs can only have basic syntax/encoding checks.

Solutions:

  1. This is informational—data integrity for these tables is skipped
  2. Use --strict only if you want to enforce PKs on all tables
  3. Consider adding PKs to these tables in your schema
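
To audit these warnings programmatically, filter the JSON output by issue code; W002 is the code shown for the no-primary-key warning in the JSON example above:

```sh
# List tables reported without a primary key (requires jq).
sql-splitter validate dump.sql --json \
  | jq -r '.issues[] | select(.code == "W002") | .location.table'
```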