# sql-splitter

> High-performance CLI tool for splitting large SQL dump files into individual table files. Written in Rust, it achieves 400+ MB/s throughput with constant ~50 MB memory usage regardless of file size, and is 1.25x faster than the Go version on 10GB files.

sql-splitter reads SQL dump files and routes statements to separate output files based on table name. It handles CREATE TABLE, INSERT INTO, CREATE INDEX, ALTER TABLE, and DROP TABLE statements.

## Key Features

- Written in Rust with zero-cost abstractions and no garbage collection
- Streaming architecture handles files larger than available RAM
- Adaptive buffer sizing based on file size (64KB optimal for CPU cache)
- Zero-copy parsing using `&[u8]` slices in the hot path
- Fast hashing with `ahash::AHashMap` instead of the default SipHash
- Pre-compiled static regexes via `once_cell::Lazy`

## Installation

```bash
# Using cargo
cargo install --git https://github.com/helgesverre/sql-splitter

# Or build from source
git clone https://github.com/helgesverre/sql-splitter.git
cd sql-splitter
cargo build --release

# Optimized build for best performance
RUSTFLAGS="-C target-cpu=native" cargo build --release
```

## Commands

### split

Split a SQL file into individual table files.

```bash
sql-splitter split database.sql --output=tables
sql-splitter split database.sql --tables=users,posts  # filter specific tables
sql-splitter split database.sql --dry-run             # preview without writing
sql-splitter split database.sql --progress            # show progress
```

Flags:

- `--output, -o`: Output directory (default: "output")
- `--tables, -t`: Filter to specific tables (comma-separated)
- `--dry-run`: Preview what would be split without writing
- `--progress, -p`: Show progress during processing
- `--verbose, -v`: Verbose output

### analyze

Analyze a SQL file and display table statistics.

```bash
sql-splitter analyze database.sql
sql-splitter analyze database.sql --progress
```

## Supported Statement Types

- CREATE TABLE
- INSERT INTO
- CREATE INDEX
- ALTER TABLE
- DROP TABLE

Other statements (SELECT, UPDATE, DELETE) are skipped.
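As a rough illustration of how statements could be routed by table name using the techniques listed under Key Features (a pre-compiled `once_cell` regex over the `regex` crate's bytes API, zero-copy `&[u8]` slices), here is a minimal sketch. The pattern, function name, and simplified identifier rules are assumptions for illustration, not the project's actual parser; it assumes `regex` and `once_cell` as dependencies:

```rust
// Illustrative sketch only -- not the project's actual code.
use once_cell::sync::Lazy;
use regex::bytes::Regex;

// Compiled once on first use. Matches the table name for most of the
// statement types listed above; CREATE INDEX (which names its table
// after ON) and quoted-identifier edge cases are omitted for brevity.
static TABLE_NAME: Lazy<Regex> = Lazy::new(|| {
    Regex::new(
        r"(?i)^\s*(?:CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|INSERT\s+INTO|DROP\s+TABLE(?:\s+IF\s+EXISTS)?|ALTER\s+TABLE)\s+`?([A-Za-z0-9_]+)`?",
    )
    .unwrap()
});

/// Zero-copy: borrows the table name directly out of the statement bytes.
fn table_name(stmt: &[u8]) -> Option<&[u8]> {
    TABLE_NAME
        .captures(stmt)
        .and_then(|caps| caps.get(1))
        .map(|m| m.as_bytes())
}

fn main() {
    let stmt = b"INSERT INTO `users` VALUES (1, 'a');";
    assert_eq!(table_name(stmt), Some(&b"users"[..]));
}
```

A borrowed `&[u8]` name like this can then key into an `ahash::AHashMap` of per-table writers without allocating per statement.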
## Performance

Benchmarks on Apple M2 Max:

- Parser throughput: 400-500 MB/s
- vs Go version: 1.25x faster on 10GB files
- Memory usage: ~50 MB constant
- Cold start: ~5ms

## Documentation

- [README](https://github.com/helgesverre/sql-splitter/blob/main/README.md): Full documentation with architecture details
- [AGENTS.md](https://github.com/helgesverre/sql-splitter/blob/main/AGENTS.md): AI assistant guidance for working with the codebase

## Source Code

- [src/cmd/](https://github.com/helgesverre/sql-splitter/tree/main/src/cmd): CLI commands (split, analyze)
- [src/parser/](https://github.com/helgesverre/sql-splitter/tree/main/src/parser): Streaming SQL parser with `fill_buf` + `consume` pattern
- [src/writer/](https://github.com/helgesverre/sql-splitter/tree/main/src/writer): Buffered file writers with WriterPool
- [src/splitter/](https://github.com/helgesverre/sql-splitter/tree/main/src/splitter): Split orchestration
- [src/analyzer/](https://github.com/helgesverre/sql-splitter/tree/main/src/analyzer): Statistical analysis

## Architecture

```
BufReader (fill_buf) → Parser (Streaming) → WriterPool (BufWriter) → Table Files
    64KB Buffer          Statement Buffer     256KB Buffers per table
```

A minimal sketch of the `fill_buf` + `consume` read loop at the front of this pipeline appears at the end of this page.

Key implementation details:

- Language: Rust 2021 edition
- CLI Framework: clap v4 with derive macros
- Regex: `regex` crate with bytes API
- HashMap: `ahash::AHashMap` for performance
- Buffer management: `std::io::{BufReader, BufWriter}`

## Optional

- [CHANGELOG.md](https://github.com/helgesverre/sql-splitter/blob/main/CHANGELOG.md): Version history
- [LICENSE](https://github.com/helgesverre/sql-splitter/blob/main/LICENSE): MIT License
- [Makefile](https://github.com/helgesverre/sql-splitter/blob/main/Makefile): Build commands
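For reference, the sketch mentioned in the Architecture section: a minimal illustration of the `fill_buf` + `consume` streaming pattern using only `std::io`. This is not the project's actual code; statement scanning and the writer pool are elided, and the file name is a placeholder:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Stream a file with fill_buf/consume: process the reader's borrowed
/// internal buffer in place, then tell the reader how many bytes were
/// handled. Memory stays bounded by the buffer size regardless of file size.
fn stream(path: &str) -> io::Result<u64> {
    // 64 KB read buffer, as in the pipeline diagram above.
    let mut reader = BufReader::with_capacity(64 * 1024, File::open(path)?);
    let mut total = 0u64;
    loop {
        let buf = reader.fill_buf()?; // borrow the internal buffer, no extra copy
        if buf.is_empty() {
            break; // EOF
        }
        // A real splitter would scan `buf` for statement boundaries here
        // and hand complete statements to the per-table writers.
        let n = buf.len();
        total += n as u64;
        reader.consume(n); // advance past the processed bytes
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    println!("{} bytes streamed", stream("database.sql")?);
    Ok(())
}
```

Because the loop only ever touches the reader's own buffer and never accumulates the file, its footprint stays near the buffer size, consistent with the constant-memory claim above.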