query
Execute SQL queries directly on dump files without restoring to a database, powered by DuckDB.
Alias: qy (e.g., sql-splitter qy dump.sql "SELECT COUNT(*) FROM users")
When to Use This
- Quick data exploration - Answer questions about a dump without spinning up a database
- Data validation - Verify row counts, check for nulls, or validate business rules before restoring
- Extract subsets - Pull specific records to CSV/JSON for analysis or reporting
- Schema investigation - Explore table structures and relationships interactively
- CI pipelines - Automate data quality checks on dump files
Use analyze instead if you only need basic statistics (row counts, table sizes) - it’s faster and doesn’t require loading data.
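As a point of comparison, a statistics-only pass with analyze looks like this (the --json form is the one used later in Composing with Other Tools; other analyze flags aren't covered on this page):

```bash
# Row counts and table sizes without importing any data
sql-splitter analyze dump.sql

# Machine-readable output for scripting
sql-splitter analyze dump.sql --json
```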
How It Works
The command parses your SQL dump, imports the schema and data into an in-memory DuckDB database, then executes your query.
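To see those phases separately, the --progress and --timing flags from the options table below can be combined (the exact output wording is up to the tool; this only shows the invocation):

```bash
# Report import progress, then show how long the query itself took
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --progress --timing
```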
Disk mode (--disk): For dumps >2GB, data is stored in a temporary file instead of RAM; this is enabled automatically for such files, or you can force it with --disk. The temp file lives in your system’s temp directory and is automatically deleted when the command exits:
| OS | Temp file location |
|---|---|
| macOS | /var/folders/.../sql-splitter-<pid>.duckdb (via $TMPDIR) |
| Linux | /tmp/sql-splitter-<pid>.duckdb |
| Windows | %TEMP%\sql-splitter-<pid>.duckdb (e.g., C:\Users\<user>\AppData\Local\Temp\...) |
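A minimal disk-mode run, optionally capped with --memory-limit (the 4GB value is only an example; whether the value needs quoting depends on your shell):

```bash
# Store imported data in a temporary .duckdb file instead of RAM
sql-splitter query huge.sql "SELECT COUNT(*) FROM orders" --disk

# Additionally cap DuckDB's memory use
sql-splitter query huge.sql "SELECT COUNT(*) FROM orders" --disk --memory-limit 4GB
```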
Caching (--cache): Saves the imported database persistently for fast repeated queries. Cache files are stored in your platform’s cache directory and persist until you run --clear-cache:
| OS | Cache directory |
|---|---|
| macOS | ~/Library/Caches/sql-splitter/duckdb/ |
| Linux | ~/.cache/sql-splitter/duckdb/ |
| Windows | %LOCALAPPDATA%\sql-splitter\duckdb\ (e.g., C:\Users\<user>\AppData\Local\sql-splitter\duckdb\) |
Cache files are named <hash>.duckdb where the hash is based on the dump file’s path, size, and modification time. If the source dump changes, a new cache is created automatically.
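As an illustration only - the exact hash algorithm and key format are internal to sql-splitter - the invalidation idea can be mimicked with a shell one-liner (GNU stat syntax; macOS uses stat -f %z and stat -f %m):

```bash
# Hypothetical sketch: combine the dump's path, size, and mtime into a key.
# Any change to the file produces a different key, i.e. a different cache entry.
key=$(printf '%s|%s|%s' "$(realpath dump.sql)" "$(stat -c %s dump.sql)" "$(stat -c %Y dump.sql)" \
      | sha256sum | cut -d' ' -f1)
echo "$key.duckdb"
```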
sql-splitter query <INPUT> [QUERY] [OPTIONS]
Examples
Basic Queries
Section titled “Basic Queries”# Count rows in a tablesql-splitter query dump.sql "SELECT COUNT(*) FROM users"
# Filter and inspect data
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100"
# Join across tables
sql-splitter query dump.sql "SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name"
Exporting Results
Use -f to choose the output format and -o to write to a file:
# Export to CSV for spreadsheet analysis
sql-splitter query dump.sql "SELECT * FROM users" -o users.csv -f csv
# Export to JSON for downstream tooling
sql-splitter query dump.sql "SELECT * FROM orders WHERE status = 'pending'" -f json
# Stream JSON Lines for large result sets (one JSON object per line)
sql-splitter query dump.sql "SELECT * FROM events" -f jsonl
Interactive Exploration
When you don’t know what you’re looking for yet, use the REPL:
sql-splitter query dump.sql --interactive
Performance Optimization
Section titled “Performance Optimization”# Large dumps: use disk storage instead of RAMsql-splitter query huge.sql "SELECT ..." --disk
# Repeated queries: cache the imported database (400x faster on subsequent runs)
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --cache
sql-splitter query dump.sql "SELECT * FROM users WHERE active = 1" --cache  # Uses cache
# Import only tables you need (faster import, less memory)
sql-splitter query dump.sql "SELECT * FROM orders" --tables orders,users
Options
| Flag | Short | Description | Default |
|---|---|---|---|
| --format | -f | Output format: table, json, jsonl, csv, tsv | table |
| --output | -o | Write output to file | stdout |
| --dialect | -d | SQL dialect | auto-detect |
| --interactive | -i | Start interactive REPL | false |
| --disk | | Force disk-based storage (auto-enabled for large dumps) | false |
| --cache | | Cache imported database for faster repeated queries | false |
| --tables | | Only import specific tables (comma-separated) | all |
| --memory-limit | | DuckDB memory limit (e.g., “4GB”) | - |
| --timing | | Show query execution time | false |
| --progress | | Show import progress | false |
| --list-cache | | List cached databases | - |
| --clear-cache | | Clear all cached databases | - |
Note: Use --format json or --format jsonl for JSON output (not --json).
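Several of these flags compose in a single call; for example (file names here are placeholders):

```bash
# Force the MySQL dialect, write CSV to a file, and report query time
sql-splitter query dump.sql "SELECT * FROM orders" -d mysql -f csv -o orders.csv --timing
```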
Output Formats
Table (default)
┌────┬─────────┬─────────────────────┐
│ id │ name    │ email               │
├────┼─────────┼─────────────────────┤
│ 1  │ Alice   │ alice@example.com   │
│ 2  │ Bob     │ bob@example.com     │
└────┴─────────┴─────────────────────┘
JSON
[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" }
]
CSV
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
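JSON Lines (-f jsonl) output isn't shown above; with the same sample data it emits roughly one object per line (exact spacing and key order may differ):

```bash
sql-splitter query dump.sql "SELECT * FROM users" -f jsonl
# {"id":1,"name":"Alice","email":"alice@example.com"}
# {"id":2,"name":"Bob","email":"bob@example.com"}
```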
Interactive REPL
sql-splitter query dump.sql --interactive
REPL commands:
| Command | Description |
|---|---|
| .tables | List all tables |
| .schema [table] | Show schema |
| .describe <table> | Describe table columns |
| .format <fmt> | Set output format |
| .count <table> | Count rows |
| .sample <table> [n] | Show sample rows (default: 10) |
| .export <file> <query> | Export query to file |
| .exit | Exit REPL |
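A short illustrative session using the commands above (the prompt shown is a stand-in; the actual prompt text may differ):

```bash
sql-splitter query dump.sql --interactive
# > .tables
# > .sample orders 5
# > SELECT status, COUNT(*) FROM orders GROUP BY status;
# > .export pending.csv SELECT * FROM orders WHERE status = 'pending'
# > .exit
```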
Caching
Cache the imported database for faster repeated queries:
# First query imports and caches (~30s for large dump)
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --cache
# Subsequent queries use cache (~0.1s - 400x faster)
sql-splitter query dump.sql "SELECT * FROM users WHERE active = 1" --cache
Manage cache:
# List cached databases (shows path, size, source file)
sql-splitter query --list-cache
# Clear all caches (reclaims disk space)
sql-splitter query --clear-cache
The cache is automatically invalidated when the source dump file changes (based on file size and modification time). To manually clear space, delete files from the cache directory (see How It Works for paths).
Large Files
For dumps larger than available RAM (automatically enabled for files >2GB):
sql-splitter query huge.sql "SELECT ..." --disk
Disk mode stores data in a temporary DuckDB file instead of memory. The temp file is automatically deleted when the command exits. If the command crashes or is killed, you may need to manually clean up orphaned files from your temp directory (see How It Works for paths).
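On Linux, for example, orphaned files from an interrupted run can be removed like this (adjust the path for your OS using the table in How It Works):

```bash
# Clean up leftover disk-mode databases after a crash or kill
rm -f /tmp/sql-splitter-*.duckdb
```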
Composing with Other Tools
Combine query with shell tools for powerful pipelines:
# Extract user emails and process with jq
sql-splitter query dump.sql "SELECT email FROM users" -f json | jq -r '.[].email'
# Find rows with orphaned foreign keys
sql-splitter query dump.sql "SELECT * FROM orders WHERE user_id NOT IN (SELECT id FROM users)" -f csv
# Quick row counts for all tables (combine with analyze for speed)
sql-splitter analyze dump.sql --json | jq '.tables | to_entries[] | "\(.key): \(.value.rows)"'
# Export and compress in one step
sql-splitter query dump.sql "SELECT * FROM large_table" -f csv | gzip > export.csv.gz
Troubleshooting
Query returns zero results
- Check table names - Use .tables in interactive mode to list available tables
- Case sensitivity - DuckDB is case-insensitive for identifiers, but string comparisons are case-sensitive (see the example after this list)
- Dialect mismatch - If auto-detection fails, try --dialect mysql or --dialect postgres
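For case-insensitive string matching, lower() or ILIKE work in DuckDB, for example:

```bash
# Normalize case on both sides of the comparison
sql-splitter query dump.sql "SELECT * FROM users WHERE lower(email) = lower('Alice@Example.com')"

# Or use ILIKE for case-insensitive pattern matching
sql-splitter query dump.sql "SELECT * FROM users WHERE name ILIKE 'alice%'"
```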
Import is slow or runs out of memory
- Use --disk for dumps larger than available RAM
- Use --tables to import only the tables you need
- Consider --cache if you’ll run multiple queries (see the sketch after this list)
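A sketch combining these (whether --disk and --cache can be used together isn't covered on this page, so they are shown separately):

```bash
# One-off query on a huge dump: disk-backed import of only the tables touched
sql-splitter query huge.sql "SELECT COUNT(*) FROM orders" --disk --tables orders --progress

# Repeated queries on the same dump: cache the (smaller) import instead
sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --tables orders --cache
```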
“Table not found” error
The dump may use schema-qualified names. Try:
sql-splitter query dump.sql ".tables"                     # List all tables
sql-splitter query dump.sql "SELECT * FROM public.users"  # Use schema prefix
Cache issues
Section titled “Cache issues”# List cached databases to see what's storedsql-splitter query --list-cache
# Clear cache if data has changed
sql-splitter query --clear-cache
See Also
- analyze - Quick statistics without full import
- DuckDB SQL Reference