diff

Compare two SQL dump files to detect schema and data differences.

Alias: df (e.g., sql-splitter df old.sql new.sql)

When to Use This

Pre-deployment verification - Compare production dump with staging to catch unexpected changes
Migration validation - Verify that a migration produces the expected schema changes
Schema drift detection - Detect when databases have diverged from each other
Generate migration scripts - Use --format sql to create migration DDL automatically
Data auditing - Track row additions, deletions, and modifications between snapshots

Use validate if you only need to check a single dump’s integrity without comparison.

How It Works

The diff command loads both dumps, extracts schema metadata (tables, columns, indexes, foreign keys), and compares them structurally. For data comparison, it uses primary keys to match rows efficiently - this means tables without primary keys require special handling (--allow-no-pk).

Exit codes:

0 - No differences found
1 - Differences detected (useful for CI gates)
2 - Error during comparison

Usage

sql-splitter diff <OLD> <NEW> [OPTIONS]

Examples

Basic Comparison

# Compare two dumps (schema + data)
sql-splitter diff old.sql new.sql

# Schema-only comparison (faster, no data loading)
sql-splitter diff old.sql new.sql --schema-only

# Data-only comparison (assumes schema is identical)
sql-splitter diff old.sql new.sql --data-only

Generating Migration Scripts

The --format sql option generates runnable DDL to transform the old schema into the new:

sql-splitter diff old.sql new.sql --format sql -o migration.sql

This creates ALTER TABLE, CREATE TABLE, and DROP TABLE statements. Review before running in production.

CI/CD Integration

# Fail CI if any differences exist
sql-splitter diff production.sql staging.sql --schema-only || exit 1

# Export diff as JSON for further processing
sql-splitter diff old.sql new.sql --format json -o diff.json

Filtering and Noise Reduction

# Compare specific tables only
sql-splitter diff old.sql new.sql --tables users,orders

# Ignore auto-updated timestamp columns
sql-splitter diff old.sql new.sql --ignore-columns "*.updated_at,*.created_at"

# Ignore column order differences (focus on types/constraints)
sql-splitter diff old.sql new.sql --ignore-order

Tables Without Primary Keys

By default, tables without PKs are skipped for data comparison. Override this behavior:

# Use all columns as a composite key (slower but works)
sql-splitter diff old.sql new.sql --allow-no-pk

# Or specify a custom key for specific tables
sql-splitter diff old.sql new.sql --primary-key logs:timestamp+message

Options

Flag	Short	Description	Default
`--output`	`-o`	Output file	stdout
`--dialect`	`-d`	SQL dialect	auto-detect
`--schema-only`		Compare schema only	false
`--data-only`		Compare data only	false
`--format`	`-f`	Output format: `text`, `json`, `sql`	`text`
`--tables`	`-t`	Only compare these tables	all
`--exclude`	`-e`	Exclude these tables	none
`--max-pk-entries`		Max PK entries to track	10,000,000
`--verbose`	`-v`	Show sample PK values	false
`--primary-key`		Override PK (format: `table:col1+col2`)	auto
`--ignore-order`		Ignore column order differences	false
`--ignore-columns`		Ignore columns matching glob patterns	none
`--allow-no-pk`		Compare tables without PK using all columns	false
`--progress`	`-p`	Show progress bar	false

What Diff Detects

Schema changes:

Tables added/removed
Columns added/removed/modified (type, nullability)
Primary key changes
Foreign key changes
Index changes (CREATE INDEX, inline INDEX/KEY)

Data changes:

Rows added/removed/modified
Uses PK-based comparison for efficiency

Output Formats

Text (default)

Schema Changes:
  + Table 'audit_logs' added
  - Table 'temp_data' removed
  ~ Table 'users' modified:
    + Column 'phone' added (VARCHAR(20))
    ~ Column 'email' changed: VARCHAR(100) → VARCHAR(255)

Data Changes:
  users: +5 added, -2 removed, ~10 modified
  orders: +100 added

JSON

{
  "schema": {
    "tables_added": ["audit_logs"],
    "tables_removed": ["temp_data"],
    "tables_modified": {
      "users": {
        "columns_added": [{ "name": "phone", "type": "VARCHAR(20)" }],
        "columns_modified": [
          { "name": "email", "old": "VARCHAR(100)", "new": "VARCHAR(255)" }
        ]
      }
    }
  },
  "data": {
    "users": { "added": 5, "removed": 2, "modified": 10 },
    "orders": { "added": 100, "removed": 0, "modified": 0 }
  }
}

SQL Migration

-- Migration script generated by sql-splitter diff

ALTER TABLE users ADD COLUMN phone VARCHAR(20);
ALTER TABLE users MODIFY COLUMN email VARCHAR(255);

CREATE TABLE audit_logs (
  id INT PRIMARY KEY,
  ...
);

DROP TABLE temp_data;

Troubleshooting

Diff reports many changes but I only reformatted

Statement ordering and formatting differences can cause noise. Normalize both dumps first:

sql-splitter order old.sql -o old-sorted.sql
sql-splitter order new.sql -o new-sorted.sql
sql-splitter diff old-sorted.sql new-sorted.sql

See order for details on topological sorting and handling circular dependencies.

Diff is slow on large dumps

Use --schema-only if you only need DDL comparison
Use --tables to limit scope to specific tables
Use --max-pk-entries to limit row tracking for data comparison

Many “Table has no primary key” warnings

Tables without primary keys can’t be efficiently compared. Options:

Skip data comparison for those tables: Default behavior, just note the warnings
Use --allow-no-pk: Compare using all columns (slower, may have false positives)
Specify custom keys: --primary-key logs:timestamp+source_id

Exit code is 1 but no output

The exit code indicates differences were found. Use --verbose to see details:

sql-splitter diff old.sql new.sql --verbose