
query

Execute SQL queries directly on dump files without restoring to a database, powered by DuckDB.

Alias: qy (e.g., sql-splitter qy dump.sql "SELECT COUNT(*) FROM users")

Use query for:

  • Quick data exploration - Answer questions about a dump without spinning up a database
  • Data validation - Verify row counts, check for nulls, or validate business rules before restoring
  • Extract subsets - Pull specific records to CSV/JSON for analysis or reporting
  • Schema investigation - Explore table structures and relationships interactively
  • CI pipelines - Automate data quality checks on dump files

Use analyze instead if you only need basic statistics (row counts, table sizes) - it’s faster and doesn’t require loading data.
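For example, a statistics-only pass with analyze might look like this (the --json flag matches the analyze usage shown later on this page):

sql-splitter analyze dump.sql --json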

The command parses your SQL dump, imports the schema and data into an in-memory DuckDB database, then executes your query.
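If you want visibility into the import and query steps, the --timing and --progress flags (listed in the options table below) can be added, for example:

sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --timing --progress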

Disk mode (--disk): For dumps >2GB, use --disk to store data in a temporary file instead of RAM. The temp file is stored in your system’s temp directory and automatically deleted when the command exits:

| OS      | Temp file location |
|---------|--------------------|
| macOS   | /var/folders/.../sql-splitter-<pid>.duckdb (via $TMPDIR) |
| Linux   | /tmp/sql-splitter-<pid>.duckdb |
| Windows | %TEMP%\sql-splitter-<pid>.duckdb (e.g., C:\Users\<user>\AppData\Local\Temp\...) |

Caching (--cache): Saves the imported database persistently for fast repeated queries. Cache files are stored in your platform’s cache directory and persist until you run --clear-cache:

| OS      | Cache directory |
|---------|-----------------|
| macOS   | ~/Library/Caches/sql-splitter/duckdb/ |
| Linux   | ~/.cache/sql-splitter/duckdb/ |
| Windows | %LOCALAPPDATA%\sql-splitter\duckdb\ (e.g., C:\Users\<user>\AppData\Local\sql-splitter\duckdb\) |

Cache files are named <hash>.duckdb where the hash is based on the dump file’s path, size, and modification time. If the source dump changes, a new cache is created automatically.
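On Linux, for example, you can inspect that directory directly (macOS and Windows paths differ per the table above; the file name shown is an illustrative hash):

ls ~/.cache/sql-splitter/duckdb/   # e.g. 3fa4b2c9d1e87a05.duckdb (illustrative hash)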

Usage:

sql-splitter query <INPUT> [QUERY] [OPTIONS]

Examples:
# Count rows in a table
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
# Filter and inspect data
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100"
# Join across tables
sql-splitter query dump.sql "SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name"

Use -f to choose the output format and -o to write to a file:

# Export to CSV for spreadsheet analysis
sql-splitter query dump.sql "SELECT * FROM users" -o users.csv -f csv
# Export to JSON for downstream tooling
sql-splitter query dump.sql "SELECT * FROM orders WHERE status = 'pending'" -f json
# Stream JSON Lines for large result sets (one JSON object per line)
sql-splitter query dump.sql "SELECT * FROM events" -f jsonl

When you don’t know what you’re looking for yet, use the REPL:

sql-splitter query dump.sql --interactive

Performance options:
# Large dumps: use disk storage instead of RAM
sql-splitter query huge.sql "SELECT ..." --disk
# Repeated queries: cache the imported database (400x faster on subsequent runs)
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --cache
sql-splitter query dump.sql "SELECT * FROM users WHERE active = 1" --cache # Uses cache
# Import only tables you need (faster import, less memory)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
| Flag           | Short | Description                                              | Default     |
|----------------|-------|----------------------------------------------------------|-------------|
| --format       | -f    | Output format: table, json, jsonl, csv, tsv              | table       |
| --output       | -o    | Write output to file                                     | stdout      |
| --dialect      | -d    | SQL dialect                                              | auto-detect |
| --interactive  | -i    | Start interactive REPL                                   | false       |
| --disk         |       | Force disk-based storage (auto-enabled for large dumps)  | false       |
| --cache        |       | Cache imported database for faster repeated queries      | false       |
| --tables       |       | Only import specific tables (comma-separated)            | all         |
| --memory-limit |       | DuckDB memory limit (e.g., "4GB")                        | -           |
| --timing       |       | Show query execution time                                | false       |
| --progress     |       | Show import progress                                     | false       |
| --list-cache   |       | List cached databases                                    | -           |
| --clear-cache  |       | Clear all cached databases                               | -           |

Note: Use --format json or --format jsonl for JSON output (not --json).

Sample output in each format:

Table (default):

┌────┬───────┬───────────────────┐
│ id │ name  │ email             │
├────┼───────┼───────────────────┤
│ 1  │ Alice │ alice@example.com │
│ 2  │ Bob   │ bob@example.com   │
└────┴───────┴───────────────────┘

JSON (--format json):

[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" }
]

CSV (--format csv):

id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
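With --format jsonl, the same rows would come out roughly as one JSON object per line (convenient for streaming into line-oriented tools):

{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}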
Interactive mode:

sql-splitter query dump.sql --interactive

REPL commands:

| Command                | Description                    |
|------------------------|--------------------------------|
| .tables                | List all tables                |
| .schema [table]        | Show schema                    |
| .describe <table>      | Describe table columns         |
| .format <fmt>          | Set output format              |
| .count <table>         | Count rows                     |
| .sample <table> [n]    | Show sample rows (default: 10) |
| .export <file> <query> | Export query to file           |
| .exit                  | Exit REPL                      |
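A short exploration session might look like this (commands only, output omitted; table and file names are illustrative and exact argument quoting may vary):

.tables
.schema users
.sample users 5
.export users.csv SELECT * FROM users
.exit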

Cache the imported database for faster repeated queries:

# First query imports and caches (~30s for large dump)
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --cache
# Subsequent queries use cache (~0.1s - 400x faster)
sql-splitter query dump.sql "SELECT * FROM users WHERE active = 1" --cache

Manage cache:

# List cached databases (shows path, size, source file)
sql-splitter query --list-cache
# Clear all caches (reclaims disk space)
sql-splitter query --clear-cache

The cache is automatically invalidated when the source dump file changes (based on file size and modification time). To manually clear space, delete files from the cache directory (see How It Works for paths).
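On Linux, for example, manually clearing space amounts to removing the files under the cache directory (roughly what --clear-cache does for you):

rm -rf ~/.cache/sql-splitter/duckdb/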

For dumps larger than available RAM (automatically enabled for files >2GB):

sql-splitter query huge.sql "SELECT ..." --disk

Disk mode stores data in a temporary DuckDB file instead of memory. The temp file is automatically deleted when the command exits. If the command crashes or is killed, you may need to manually clean up orphaned files from your temp directory (see How It Works for paths).
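On macOS and Linux, a cleanup one-liner for those orphaned files (matching the naming pattern from the table above) would be something like:

rm -f "${TMPDIR:-/tmp}"/sql-splitter-*.duckdb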

Combine query with shell tools for powerful pipelines:

# Extract user emails and process with jq
sql-splitter query dump.sql "SELECT email FROM users" -f json | jq -r '.[].email'
# Find tables with orphaned foreign keys
sql-splitter query dump.sql "SELECT * FROM orders WHERE user_id NOT IN (SELECT id FROM users)" -f csv
# Quick row counts for all tables (analyze is faster than query for this)
sql-splitter analyze dump.sql --json | jq '.tables | to_entries[] | "\(.key): \(.value.rows)"'
# Export and compress in one step
sql-splitter query dump.sql "SELECT * FROM large_table" -f csv | gzip > export.csv.gz

Troubleshooting a failing query:

  • Check table names - Use .tables in interactive mode to list available tables
  • Case sensitivity - DuckDB is case-insensitive for identifiers, but string comparisons are case-sensitive
  • Dialect mismatch - If auto-detection fails, try --dialect mysql or --dialect postgres

If you run out of memory:

  • Use --disk for dumps larger than available RAM
  • Use --tables to import only the tables you need
  • Consider --cache if you’ll run multiple queries
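Putting those tips together, a sketch for a very large dump (table name and memory limit are illustrative; each flag is described in the options table above):

sql-splitter query huge.sql "SELECT COUNT(*) FROM orders" --disk --tables orders --memory-limit 4GB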

If a table isn't found, the dump may use schema-qualified names. Try:

sql-splitter query dump.sql ".tables" # List all tables
sql-splitter query dump.sql "SELECT * FROM public.users" # Use schema prefix
If cached results look stale:
# List cached databases to see what's stored
sql-splitter query --list-cache
# Clear cache if data has changed
sql-splitter query --clear-cache