# analyze

Read-only analysis of SQL dump files to show table statistics.

Alias: `an` (e.g., `sql-splitter an dump.sql`)
## When to Use This

- Quick inspection before splitting — understand what’s in a dump before running `split`
- Finding the largest tables — identify which tables consume the most space in your dump
- Estimating backup sizes — get byte-level breakdowns without extracting files
- Comparing dump sizes over time — track table growth by analyzing dumps from different dates
- Scripted reporting — use `--json` output for automated monitoring and dashboards
## How It Works

The `analyze` command performs a single streaming pass through the dump file, counting statements and rows without loading the entire file into memory.
**What counts as a row:** Each value tuple in an `INSERT` statement. A multi-row insert like `INSERT INTO users VALUES (1, 'alice'), (2, 'bob')` counts as 2 rows.

**What counts as a statement:** Any complete SQL statement associated with a table—`CREATE TABLE`, `INSERT`, `ALTER TABLE`, `CREATE INDEX`, `DROP TABLE`, and PostgreSQL `COPY` blocks.

**What “Size” measures:** The total bytes of raw SQL statements attributed to each table in the dump file. This includes the statement text itself (e.g., the full `INSERT INTO ...` line).

**Memory usage:** ~7MB peak for typical dumps. The streaming architecture keeps memory constant regardless of file size.
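If you want to verify the memory behavior on your own dumps, GNU time reports peak resident memory. A minimal check, assuming GNU `time` is installed at `/usr/bin/time` (the `-v` flag is GNU-specific and not available in the BSD/macOS `time`):

```bash
# Observe peak memory while analyzing a large dump (GNU time, -v output goes to stderr)
/usr/bin/time -v sql-splitter analyze large-dump.sql 2>&1 | grep "Maximum resident set size"
```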
**Exit codes:**

- `0` — Analysis completed successfully
- `1` — Error reading file or parsing SQL
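In scripts, these exit codes can gate later steps. A minimal sketch — the wrapper logic and file names are illustrative, not part of the tool:

```bash
#!/bin/bash
# Hypothetical pre-flight check: only continue if analysis succeeds (exit code 0)
if sql-splitter analyze dump.sql --json > stats.json; then
  echo "analysis ok, proceeding"
else
  echo "analysis failed, aborting" >&2
  exit 1
fi
```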
```
sql-splitter analyze <INPUT> [OPTIONS]
```

## Options

| Flag | Short | Description | Default |
|---|---|---|---|
| `--dialect` | `-d` | SQL dialect: `mysql`, `postgres`, `sqlite`, `mssql` | auto-detect |
| `--progress` | `-p` | Show progress bar | false |
| `--fail-fast` | | Stop on first error (for glob patterns) | false |
| `--json` | | Output results as JSON | false |
## Examples

### Basic Analysis

```bash
sql-splitter analyze database.sql
```

### Large Files with Progress
Section titled “Large Files with Progress”For dumps over 100MB, enable the progress bar to track completion:
```bash
sql-splitter analyze large-dump.sql --progress
```

### Batch Analysis
Section titled “Batch Analysis”Analyze multiple dumps using glob patterns:
sql-splitter analyze "backups/**/*.sql" --fail-fastJSON Output for Scripting
Section titled “JSON Output for Scripting”sql-splitter analyze dump.sql --jsonOutput
Section titled “Output”┌─────────────┬──────┬───────────┬─────────────┐│ Table │ Rows │ Size (KB) │ Statements │├─────────────┼──────┼───────────┼─────────────┤│ users │ 150 │ 12.3 │ 2 ││ orders │ 500 │ 45.2 │ 2 ││ products │ 100 │ 8.1 │ 2 ││ order_items │ 1200 │ 89.4 │ 2 │└─────────────┴──────┴───────────┴─────────────┘
Summary: Tables: 4 Total rows: 1950 Total size: 155.0 KB Dialect: mysqlJSON Output
Section titled “JSON Output”{ "input": "dump.sql", "dialect": "mysql", "tables": [ { "name": "users", "rows": 150, "bytes": 12595, "statements": 2 }, { "name": "orders", "rows": 500, "bytes": 46284, "statements": 2 } ], "summary": { "total_tables": 4, "total_rows": 1950, "total_bytes": 158720, "elapsed_ms": 25 }}Composing with Other Tools
### Find Tables Over a Certain Size
Section titled “Find Tables Over a Certain Size”Use jq to filter the JSON output:
```bash
sql-splitter analyze dump.sql --json | jq '.tables[] | select(.bytes > 1000000)'
```

### Get the Top 5 Largest Tables

```bash
sql-splitter analyze dump.sql --json | jq -r '.tables | sort_by(-.bytes) | .[0:5] | .[] | "\(.name): \(.bytes / 1024 | floor) KB"'
```

### Monitoring Script
Section titled “Monitoring Script”Track dump sizes over time:
```bash
#!/bin/bash
DATE=$(date +%Y-%m-%d)
sql-splitter analyze /backups/daily.sql --json \
  | jq --arg date "$DATE" '{date: $date, total_bytes: .summary.total_bytes}' \
  >> /var/log/dump-sizes.jsonl
```

### Compare Two Dumps

```bash
diff <(sql-splitter analyze old.sql --json | jq '.tables | sort_by(.name)') \
     <(sql-splitter analyze new.sql --json | jq '.tables | sort_by(.name)')
```

## Troubleshooting

### Row count seems wrong
Section titled “Row count seems wrong”Multi-row INSERT statements count each value tuple separately. An insert like INSERT INTO t VALUES (1), (2), (3) counts as 3 rows, not 1. This matches the actual number of records being inserted.
### Size doesn’t match the file size
Section titled “Size doesn’t match the file size”The “Size” column only counts bytes from statements attributed to specific tables. Global statements like SET commands, comments, and file headers are not included in any table’s size. The sum of table sizes will typically be less than the total file size.
### Statement count is always 2 per table

A typical `mysqldump` output includes one `CREATE TABLE` and one `INSERT` statement per table, resulting in 2 statements. If you see higher counts, your dump may include `ALTER TABLE`, `CREATE INDEX`, or multiple `INSERT` batches.
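If in doubt, grepping the dump for one table’s statements gives a quick cross-check. A rough sketch, assuming a MySQL-style dump where each statement starts on its own line (`users` is a placeholder table name):

```bash
# Rough cross-check: count CREATE TABLE and INSERT statements for one table
grep -c '^CREATE TABLE `users`' dump.sql
grep -c '^INSERT INTO `users`' dump.sql
```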
### Analysis is slow on large files

Enable `--progress` to see if the file is being read. For files over 1GB, expect ~1-2 seconds per 100MB. If significantly slower, check disk I/O or try decompressing the file first if it’s compressed.
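For gzip-compressed dumps, decompressing to a temporary file before analysis is the simplest approach. A sketch with illustrative file names (whether sql-splitter can read compressed input directly is not assumed here):

```bash
# Decompress a gzipped dump to a temp file, then analyze the plain SQL
gunzip -c daily.sql.gz > /tmp/daily.sql
sql-splitter analyze /tmp/daily.sql --progress
```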
### “Unknown dialect” errors

Use `--dialect` to explicitly specify the SQL dialect when auto-detection fails:

```bash
sql-splitter analyze dump.sql --dialect postgres
```

## See Also

- `split` — Split a dump into per-table files
- `validate` — Check dump integrity
- `sample` — Extract a percentage of rows for testing
- JSON Output Schema — Schema for `--json` output
- Glossary — Definitions for statement, row, and other terms