# Analytics Queries
Run SQL analytics directly on dump files using the embedded DuckDB engine.
## Quick Start
```bash
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
```

No database server needed. The dump is parsed and loaded into DuckDB for querying.
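Any SQL that DuckDB accepts can go in the query string. A sketch of a slightly larger query, assuming the dump contains the `orders` table used in the examples below:

```bash
# Revenue per user, sorted highest first
sql-splitter query dump.sql \
  "SELECT user_id, SUM(total) AS revenue FROM orders GROUP BY user_id ORDER BY revenue DESC"
```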
## Interactive Mode
```bash
sql-splitter query dump.sql --interactive
```

REPL commands:
```
sql-splitter> .tables
users
orders
products
```
```
sql-splitter> .schema users
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255)
);
```
```
sql-splitter> SELECT COUNT(*) FROM users;
┌──────────┐
│ count(*) │
├──────────┤
│     1500 │
└──────────┘
```
```
sql-splitter> .sample orders 5
┌────┬─────────┬────────┐
│ id │ user_id │ total  │
├────┼─────────┼────────┤
│  1 │      42 │  99.99 │
│  2 │      17 │ 149.50 │
...
```
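Arbitrary SQL also works at the prompt. A sketch, assuming the tables listed above:

```
sql-splitter> SELECT u.name, COUNT(*) AS n FROM orders o JOIN users u ON o.user_id = u.id GROUP BY u.name;
```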
```
sql-splitter> .exit
```
## Output Formats

### Table (default)
```bash
sql-splitter query dump.sql "SELECT * FROM users LIMIT 5"
```

### JSON

```bash
sql-splitter query dump.sql "SELECT * FROM users" -f json
```

```json
[
  { "id": 1, "name": "Alice", "email": "alice@example.com" },
  { "id": 2, "name": "Bob", "email": "bob@example.com" }
]
```

### CSV

```bash
sql-splitter query dump.sql "SELECT * FROM users" -f csv -o users.csv
```

### JSON Lines
```bash
sql-splitter query dump.sql "SELECT * FROM users" -f jsonl
```
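JSON Lines pairs well with line-oriented tools. A sketch, assuming `jq` is installed and that results go to stdout when `-o` is omitted:

```bash
# One JSON object per row; pull out a single field with jq
sql-splitter query dump.sql "SELECT * FROM users" -f jsonl | jq -r '.email'
```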
## Common Analytics

### Table sizes
```bash
sql-splitter query dump.sql "
  SELECT 'users' as tbl, COUNT(*) as rows FROM users
  UNION ALL SELECT 'orders', COUNT(*) FROM orders
  UNION ALL SELECT 'products', COUNT(*) FROM products"
```
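If the embedded engine exposes DuckDB's standard catalog views (an assumption this page does not confirm), the table list does not have to be enumerated by hand:

```bash
# Hypothetical: relies on information_schema being queryable through the tool
sql-splitter query dump.sql "SELECT table_name FROM information_schema.tables"
```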
Section titled “Data quality checks”# Find duplicatessql-splitter query dump.sql " SELECT email, COUNT(*) as cnt FROM users GROUP BY email HAVING cnt > 1"
```bash
# Find orphan records
sql-splitter query dump.sql "
  SELECT o.*
  FROM orders o
  LEFT JOIN users u ON o.user_id = u.id
  WHERE u.id IS NULL"
```
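A NULL census is another common check. A sketch against the sample schema above:

```bash
# COUNT(col) skips NULLs, so the difference is the number of missing emails
sql-splitter query dump.sql "
  SELECT COUNT(*) - COUNT(email) AS missing_email FROM users"
```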
Section titled “Export subsets”# Export active users to CSVsql-splitter query dump.sql \ "SELECT * FROM users WHERE active = 1" \ -f csv -o active_users.csv
```bash
# Export high-value orders to JSON
sql-splitter query dump.sql \
  "SELECT * FROM orders WHERE total > 1000" \
  -f json -o high_value.json
```
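Exports are not limited to single tables. A sketch joining the sample tables before writing CSV:

```bash
# Export each order together with the buyer's email
sql-splitter query dump.sql \
  "SELECT o.id, o.total, u.email FROM orders o JOIN users u ON o.user_id = u.id" \
  -f csv -o orders_with_emails.csv
```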
## Caching for Speed

The first query imports the dump (slow):
sql-splitter query dump.sql "SELECT COUNT(*) FROM users" --cache# Importing dump... done (5.2s)# 1500Subsequent queries use cache (fast):
sql-splitter query dump.sql "SELECT * FROM users WHERE active = 1" --cache# Using cached database# ...Manage cache:
```bash
# List cached databases
sql-splitter query --list-cache
```
```bash
# Clear cache
sql-splitter query --clear-cache
```
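A typical workflow pays the import cost once and then iterates. A sketch using only the flags shown above:

```bash
# Warm the cache once, then each follow-up query reuses it
sql-splitter query dump.sql "SELECT COUNT(*) FROM orders" --cache
sql-splitter query dump.sql "SELECT user_id, SUM(total) FROM orders GROUP BY user_id" --cache
sql-splitter query dump.sql "SELECT MAX(created_at) FROM orders" --cache
```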
## Large Files

For dumps larger than available RAM:
```bash
sql-splitter query huge.sql "SELECT ..." --disk
```

This stores data on disk instead of in memory.
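A concrete sketch, assuming `huge.sql` contains the same `orders` table as the earlier examples:

```bash
# Data is staged on disk rather than in RAM, so the dump can exceed memory
sql-splitter query huge.sql \
  "SELECT user_id, SUM(total) FROM orders GROUP BY user_id" --disk
```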
## DuckDB SQL Features
DuckDB supports modern SQL features:
```sql
-- Window functions
SELECT id, name,
  SUM(total) OVER (PARTITION BY user_id ORDER BY created_at) as running_total
FROM orders;
```
```sql
-- CTEs
WITH active_users AS (
  SELECT * FROM users WHERE active = 1
)
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM active_users);
```
```sql
-- QUALIFY
SELECT * FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) = 1;
```

See the DuckDB SQL Reference for full documentation.
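Other DuckDB conveniences, such as `GROUP BY ALL`, should work as well, assuming queries are passed through to the engine unmodified (an assumption, not confirmed by this page):

```sql
-- GROUP BY ALL groups by every non-aggregated column in the SELECT list
SELECT user_id, COUNT(*) AS n, SUM(total) AS revenue
FROM orders
GROUP BY ALL;
```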