Core Concepts
This page covers the core concepts you need to understand when using sql-splitter. For detailed definitions of database terminology used throughout the documentation, see the Glossary.
SQL Dialects
Section titled “SQL Dialects”sql-splitter supports four SQL dialects:
| Dialect | Tool | Features |
|---|---|---|
| MySQL | mysqldump | Default dialect, backtick quoting, INSERT INTO |
| PostgreSQL | pg_dump | COPY FROM stdin, $$ strings, SET commands |
| SQLite | .dump | INTEGER PRIMARY KEY, .schema commands |
| MSSQL | sqlcmd / SSMS | GO batch separators, IDENTITY, nvarchar |
Auto-Detection
Section titled “Auto-Detection”sql-splitter automatically detects the dialect by analyzing:
- Keywords:
AUTO_INCREMENT(MySQL),SERIAL(PostgreSQL),GO(MSSQL) - Quoting: Backticks (MySQL), double quotes (PostgreSQL/SQLite), brackets (MSSQL)
- Session commands:
SET @(MySQL),SET client_encoding(PostgreSQL)
Override with --dialect:
sql-splitter split dump.sql --dialect=postgresStreaming Architecture
Section titled “Streaming Architecture”sql-splitter uses a streaming architecture that:
- Reads in chunks: Fixed-size buffer reads, not full file loads
- Processes line-by-line: Statements are parsed as they’re read
- Writes immediately: Output files are written in real-time
This means:
- Constant memory: ~50MB regardless of file size
- No file size limit: Process 10GB+ dumps
- Fast startup: No waiting to load the entire file
Compression
Section titled “Compression”sql-splitter automatically detects and decompresses:
| Extension | Format | Notes |
|---|---|---|
.gz | gzip | Most common |
.bz2 | bzip2 | High compression |
.xz | xz/LZMA | Very high compression |
.zst | zstd | Fast compression |
Pass compressed files directly:
sql-splitter split backup.sql.gz -o tables/sql-splitter analyze dump.sql.zstsql-splitter convert mysql.sql.bz2 --to postgresGlob Patterns
Section titled “Glob Patterns”Most commands accept glob patterns for batch operations:
# All SQL files in current directorysql-splitter validate "*.sql"
# Recursive searchsql-splitter analyze "backups/**/*.sql"
# Specific patternssql-splitter split "dumps/*.sql.gz" -o output/Use --fail-fast to stop on first error:
sql-splitter validate "*.sql" --fail-fastJSON Output
Section titled “JSON Output”All commands support --json for machine-readable output:
sql-splitter analyze dump.sql --json | jq '.tables[].name'sql-splitter validate dump.sql --json | jq '.errors'JSON output includes:
- Input/output file paths
- Processing statistics
- Per-table breakdowns
- Warnings and errors
Exit Codes
Section titled “Exit Codes”| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (syntax, I/O, validation failure) |
| 2 | Invalid arguments |
Use in CI scripts:
sql-splitter validate dump.sql --strict || exit 1Statement Types
Section titled “Statement Types”sql-splitter recognizes these SQL statement types:
DDL (Schema):
CREATE TABLECREATE INDEXALTER TABLEDROP TABLE
DML (Data):
INSERT INTOCOPY ... FROM stdin(PostgreSQL)
Control:
- Session settings (
SET,USE) - Comments
- Transaction markers