
analyze

Read-only analysis of SQL dump files to show table statistics.

Alias: an (e.g., sql-splitter an dump.sql)

Common uses:

  • Quick inspection before splitting — understand what’s in a dump before running split
  • Finding the largest tables — identify which tables consume the most space in your dump
  • Estimating backup sizes — get byte-level breakdowns without extracting files
  • Comparing dump sizes over time — track table growth by analyzing dumps from different dates
  • Scripted reporting — use --json output for automated monitoring and dashboards

The analyze command performs a single streaming pass through the dump file, counting statements and rows without loading the entire file into memory.

What counts as a row: Each value tuple in an INSERT statement. A multi-row insert like INSERT INTO users VALUES (1, 'alice'), (2, 'bob') counts as 2 rows.

What counts as a statement: Any complete SQL statement associated with a table—CREATE TABLE, INSERT, ALTER TABLE, CREATE INDEX, DROP TABLE, and PostgreSQL COPY blocks.

What “Size” measures: The total bytes of raw SQL statements attributed to each table in the dump file. This includes the statement text itself (e.g., the full INSERT INTO... line).
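To see these definitions in action, you can build a tiny dump and analyze it. This is a minimal sketch; the file name and table contents are illustrative, and the expected counts follow from the definitions above rather than from captured output:

# Create a minimal dump: one CREATE TABLE plus one multi-row INSERT (illustrative file name).
cat > tiny.sql <<'SQL'
CREATE TABLE users (id INT, name VARCHAR(50));
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
SQL

# Per the definitions above, users should be reported with 2 rows and 2 statements,
# and its size should cover the bytes of both statements.
sql-splitter analyze tiny.sql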

Memory usage: ~7MB peak for typical dumps. The streaming architecture keeps memory constant regardless of file size.
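If you want to verify peak memory on your own dumps, GNU time can report it. This sketch is Linux-specific; the -v flag and the "Maximum resident set size" line come from GNU time, not from sql-splitter:

# Print the peak resident set size observed while analyzing a large dump (GNU time, Linux).
/usr/bin/time -v sql-splitter analyze large-dump.sql 2>&1 | grep "Maximum resident set size"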

Exit codes:

  • 0 — Analysis completed successfully
  • 1 — Error reading file or parsing SQL
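A minimal sketch of acting on the exit code in a backup script; the file paths are illustrative:

# Abort the surrounding script if the dump cannot be read or parsed (exit code 1).
sql-splitter analyze /backups/nightly.sql --json > /tmp/analyze-report.json
status=$?
if [ "$status" -ne 0 ]; then
  echo "analyze failed with exit code $status" >&2
  exit "$status"
fi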
Usage:

sql-splitter analyze <INPUT> [OPTIONS]
Flag          Short  Description                                  Default
--dialect     -d     SQL dialect: mysql, postgres, sqlite, mssql  auto-detect
--progress    -p     Show progress bar                            false
--fail-fast          Stop on first error (for glob patterns)      false
--json               Output results as JSON                       false
Analyze a single dump:

sql-splitter analyze database.sql

For dumps over 100MB, enable the progress bar to track completion:

sql-splitter analyze large-dump.sql --progress

Analyze multiple dumps using glob patterns:

sql-splitter analyze "backups/**/*.sql" --fail-fast
By default, results are printed as a human-readable table:

sql-splitter analyze dump.sql
┌─────────────┬──────┬───────────┬────────────┐
│ Table       │ Rows │ Size (KB) │ Statements │
├─────────────┼──────┼───────────┼────────────┤
│ users       │  150 │      12.3 │          2 │
│ orders      │  500 │      45.2 │          2 │
│ products    │  100 │       8.1 │          2 │
│ order_items │ 1200 │      89.4 │          2 │
└─────────────┴──────┴───────────┴────────────┘
Summary:
Tables: 4
Total rows: 1950
Total size: 155.0 KB
Dialect: mysql

With --json, the same analysis is emitted as structured JSON:

sql-splitter analyze dump.sql --json
{
  "input": "dump.sql",
  "dialect": "mysql",
  "tables": [
    { "name": "users", "rows": 150, "bytes": 12595, "statements": 2 },
    { "name": "orders", "rows": 500, "bytes": 46284, "statements": 2 }
  ],
  "summary": {
    "total_tables": 4,
    "total_rows": 1950,
    "total_bytes": 158720,
    "elapsed_ms": 25
  }
}

Use jq to filter the JSON output:

sql-splitter analyze dump.sql --json | jq '.tables[] | select(.bytes > 1000000)'

List the five largest tables:

sql-splitter analyze dump.sql --json | jq -r '.tables | sort_by(-.bytes) | .[0:5] | .[] | "\(.name): \(.bytes / 1024 | floor) KB"'

Track dump sizes over time:

#!/bin/bash
DATE=$(date +%Y-%m-%d)
sql-splitter analyze /backups/daily.sql --json | jq --arg date "$DATE" '{date: $date, total_bytes: .summary.total_bytes}' >> /var/log/dump-sizes.jsonl

Compare the table statistics of two dumps:

diff <(sql-splitter analyze old.sql --json | jq '.tables | sort_by(.name)') \
<(sql-splitter analyze new.sql --json | jq '.tables | sort_by(.name)')

Multi-row INSERT statements count each value tuple separately. An insert like INSERT INTO t VALUES (1), (2), (3) counts as 3 rows, not 1. This matches the actual number of records being inserted.

The “Size” column only counts bytes from statements attributed to specific tables. Global content such as SET statements, comments, and file headers is not included in any table’s size, so the sum of table sizes will typically be less than the total file size.
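To see how much of the file is attributed to tables versus global content, you can compare the summary total against the file size on disk. A sketch using the JSON fields shown above; the file name is illustrative:

# Bytes attributed to tables vs. the raw size of the dump file on disk.
table_bytes=$(sql-splitter analyze dump.sql --json | jq '.summary.total_bytes')
file_bytes=$(wc -c < dump.sql)
echo "attributed to tables: ${table_bytes} bytes, file on disk: ${file_bytes} bytes"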

A typical mysqldump output includes one CREATE TABLE and one INSERT statement per table, resulting in 2 statements. If you see higher counts, your dump may include ALTER TABLE, CREATE INDEX, or multiple INSERT batches.
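A quick way to spot such tables, assuming the JSON schema shown above:

# Tables with more than the typical CREATE TABLE + single INSERT pair.
sql-splitter analyze dump.sql --json | jq '.tables[] | select(.statements > 2)'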

If analysis seems slow, enable --progress to confirm the file is being read. For files over 1GB, expect roughly 1-2 seconds per 100MB. If it is significantly slower, check disk I/O, or decompress the file first if it is compressed.
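For example, if the dump is gzip-compressed, one approach is to decompress it and analyze the plain SQL file; the .gz file name is illustrative, and -k (keep the original) requires a reasonably recent gzip:

# Decompress a gzipped dump, then analyze the plain SQL with a progress bar.
gunzip -k large-dump.sql.gz
sql-splitter analyze large-dump.sql --progress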

Use --dialect to explicitly specify the SQL dialect when auto-detection fails:

sql-splitter analyze dump.sql --dialect postgres
See also:

  • split — Split a dump into per-table files
  • validate — Check dump integrity
  • sample — Extract a percentage of rows for testing
  • JSON Output Schema — Schema for --json output
  • Glossary — Definitions for statement, row, and other terms