[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-953":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":13,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":21,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":41,"readmeContent":42,"aiSummary":43,"trendingCount":15,"starSnapshotCount":15,"syncStatus":44,"lastSyncTime":45,"discoverSource":46},953,"slothdb","SouravRoy-ETL\u002Fslothdb","SouravRoy-ETL","An experimental embedded SQL engine in C++20. Query Parquet, CSV, JSON, Arrow, Avro, SQLite, and Excel files directly with SQL, in-process. Early-stage.","https:\u002F\u002Fslothdb.org",null,"C++",519,5,4,0,198,6.33,"MIT License",false,"main",true,[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40],"analytics","arrow","avro","columnar-database","cplusplus","cpp","csv","database","dataframe","duckdb","embedded-database","natural-language-sql","olap","parquet","python","query-engine","sql","wasm","2026-06-12 02:00:21","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"assets\u002Fhero.svg\" alt=\"SlothDB\" width=\"100%\">\n\n\u003Ch3>Run analytics faster.\u003C\u002Fh3>\n\n\u003Cp>SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch. 
\u003Cb>Up to 5x faster\u003C\u002Fb> where it counts.\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FXJWyGmX5G\">\n    \u003Cimg src=\"assets\u002Fdiscord-cta.svg\" alt=\"Join the SlothDB Discord\" width=\"340\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n[![PyPI](https:\u002F\u002Fimg.shields.io\u002Fpypi\u002Fv\u002Fslothdb?color=3775A9&logo=pypi&logoColor=white&cacheSeconds=60)](https:\u002F\u002Fpypi.org\u002Fproject\u002Fslothdb\u002F)\n[![npm](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fv\u002F@slothdb\u002Fwasm?color=CB3837&logo=npm&label=npm)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@slothdb\u002Fwasm)\n[![PyPI downloads](https:\u002F\u002Fstatic.pepy.tech\u002Fbadge\u002Fslothdb)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fslothdb)\n[![npm downloads](https:\u002F\u002Fimg.shields.io\u002Fnpm\u002Fdt\u002F@slothdb\u002Fwasm?label=npm%20downloads&color=CB3837)](https:\u002F\u002Fwww.npmjs.com\u002Fpackage\u002F@slothdb\u002Fwasm)\n[![CI](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg)](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Factions\u002Fworkflows\u002Fci.yml)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-blue)](LICENSE)\n[![Stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSouravRoy-ETL\u002Fslothdb?style=social)](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb)\n[![PeerPush](https:\u002F\u002Fpeerpush.net\u002Fp\u002Fslothdb\u002Fbadge.png)](https:\u002F\u002Fpeerpush.net\u002Fp\u002Fslothdb)\n\n[Website](https:\u002F\u002Fslothdb.org) · [**Playground**](https:\u002F\u002Fslothdb.org\u002Fplayground\u002F) · [**Discord**](https:\u002F\u002Fdiscord.gg\u002FXJWyGmX5G) · [Blog](https:\u002F\u002Fslothdb.org\u002Fblog\u002Fcompiling-a-database-to-wasm.html) · [Docs](docs\u002FDOCUMENTATION.md) · [Benchmarks](#performance) · 
[Python](docs\u002FDOCUMENTATION.md#6-python-api) · [SQL Guide](docs\u002FDOCUMENTATION.md#4-sql-guide)\n\n\u003Cbr>\n\n\u003Cimg src=\"assets\u002Fdemo.svg\" alt=\"SlothDB 60-second demo - side-by-side timing vs DuckDB\" width=\"90%\">\n\n\u003C\u002Fdiv>\n\n---\n\n## Ask in any language. Get SQL.\n\nType `.ask` at the `slothdb>` prompt. A rules parser handles catalog questions and common English shapes in under 10 ms with no model. Anything else falls through to a local Qwen2.5-Coder (0.5B for simple, 1.5B for analytic; lazy-downloaded on first use under `-DSLOTHDB_ASK_MODEL=ON`), which speaks 29 natural languages: English, Chinese, Spanish, French, German, Japanese, Korean, Russian, Arabic, Portuguese, Italian, Hindi, and more. Every generated statement is shown before it runs. Nothing leaves the machine. Set `SLOTHDB_ASK_CONFIRM=1` to add a `[Y\u002Fn]` prompt before each run.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fask-demo.svg\" alt=\".ask: rules-first, router, two local Qwens, [Y\u002Fn] gate\" width=\"100%\">\n\u003C\u002Fdiv>\n\n| tier | what | cost | covers |\n|---|---|--:|---|\n| 1 | **Rules parser** (default) | sub-10 ms, no model | catalog, COUNT\u002FSUM\u002FAVG\u002FGROUP BY\u002FTOP-N, file-source |\n| 2 | **Local Qwen 2.5-Coder 0.5B Q4_K_M** | ~200 ms, ~310 MB | open-ended SELECT\u002FGROUP BY\u002Ffilter |\n| 3 | **Local Qwen 2.5-Coder 1.5B Q4_K_M** | ~500 ms, ~986 MB | window functions, ranking within groups, LAG\u002FLEAD, joins |\n\nBoth model tiers download lazily in parallel on first `.ask` (total ~1.3 GB). Router is a pure function of the question: no LLM call involved in routing. Cumulative \u002F running \u002F moving aggregates refuse cleanly (engine gap, not model gap). Full spec, router signals, refusal policy: [docs\u002FASK.md](docs\u002FASK.md).\n\n## Try it in 60 seconds\n\n**In your browser** - no install, no account: **[slothdb.org\u002Fplayground](https:\u002F\u002Fslothdb.org\u002Fplayground\u002F)**. 
Full SlothDB compiled to WebAssembly, with a pre-loaded 1,000-row demo CSV + matching Parquet to compare format performance. Files you add stay on your machine.\n\n**In Python** - CPython 3.8+ on Linux \u002F macOS \u002F Windows (see [latest release](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Freleases\u002Flatest) for published wheel tags; falls back to source build if no wheel matches):\n\n```bash\npip install slothdb\npython -c \"import slothdb; slothdb.demo()\"\n```\n\nThat generates a 100 000-row CSV, runs three queries, and prints the side-by-side with DuckDB shown above. No files to find, no setup.\n\n```python\n# Your own files, same API. pandas round-trip in two lines:\nimport slothdb, pandas as pd\ndb = slothdb.connect()\ndf = db.sql(\"SELECT region, SUM(revenue) AS rev FROM 'sales.parquet' GROUP BY region\").fetchdf()\n```\n\n**In Node\u002FJS** - `npm install @slothdb\u002Fwasm`:\n\n```js\nimport { SlothDB } from '@slothdb\u002Fwasm';\nconst db = await SlothDB.create();\nconst { columns, rows } = db.query(\"SELECT 1 AS n\");\n```\n\n**In the shell** - [download](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Freleases\u002Flatest) or build from source; then `slothdb analytics.slothdb` for a persistent single-file DB.\n\n---\n\n## What's new in 0.2.5\n\n- **Nested aggregates work everywhere.** `ROUND(AVG(x))`, `AVG(x) + 1`, `SUM(x) \u002F COUNT(*)`, `CAST(SUM(y) AS DOUBLE)` and other shapes that wrap an aggregate inside a scalar function or arithmetic used to throw \"Function execution for: AVG\". The planner now walks the whole expression tree and hoists every aggregate it finds, no matter how deep, so you can write the SELECT list the way you'd write it in DuckDB or Postgres.\n- **`ORDER BY` by aggregate alias works.** `SELECT region, COUNT(*) AS cnt FROM ... GROUP BY region ORDER BY cnt DESC` used to default the sort to column 0 and silently sort by region. 
`PhysicalOrderBy` now precomputes per-row order keys via the expression executor whenever any clause isn't a plain column ref. Both the full-sort and top-N heap paths use the precomputed keys.\n- **Arithmetic type promotion fixed.** A pre-existing bug surfaced by the hoist: `AVG(x) + 1` would lose the `+1` and `AVG(x) \u002F COUNT(*)` would return `inf`, both because the typed arithmetic kernel reinterpreted operand bytes instead of converting them. Now coerces both operands to the result type with typed fast paths (int to double, int to bigint, float to double).\n- **`bench\u002F` directory at the repo root.** All 43 ClickBench queries verbatim from `ClickHouse\u002FClickBench` plus a 16-query mixed suite, behind a generic Python runner. Reproducible side-by-side timing against DuckDB. See [bench\u002FREADME.md](bench\u002FREADME.md).\n\n408 tests, 131,537 assertions, green on Windows \u002F Linux \u002F macOS.\n\n## What's new in 0.2.3\n\n- **GZIP and ZSTD Parquet decode.** miniz handles codec=2 with a hand-rolled RFC 1952 header peel (the gzip wrapper miniz refuses by default). Vendored libzstd 1.5.6, decompression-only subset, adds about 50 KB to the binary and unblocks codec=6 files written by Spark \u002F pyarrow \u002F parquet-mr defaults.\n- **`read_parquet` glob in the browser.** `SELECT * FROM '\u002Fdata\u002Fshard_*.parquet'` fans out across every match in the playground's MEMFS instead of failing with \"Cannot open Parquet\". Same path the native CLI has had for a while.\n- **Single-threaded WASM stops crashing on GROUP BY.** `std::thread` spawn is now routed through a `HWThreads()` helper that returns 1 under `__EMSCRIPTEN__ && !__EMSCRIPTEN_PTHREADS__` and runs the worker inline.\n- **HTTPS Parquet from the browser playground.** Tiny Cloudflare Worker proxy ([cloudflare\u002Fcors-proxy\u002F](cloudflare\u002Fcors-proxy\u002F), 60 lines, free-tier-friendly) routes around CORS for buckets that don't set `Access-Control-Allow-Origin`. 
`FROM 'https:\u002F\u002Fhost\u002Ffile.parquet'` works in the playground for any host.\n- **Discord server.** [discord.gg\u002FXJWyGmX5G](https:\u002F\u002Fdiscord.gg\u002FXJWyGmX5G). Bug reports, weird query plans, perf threads, anything.\n\n### Previously in 0.2.0 - 0.2.2\n\n- **Top-N pushdown for `ORDER BY ... LIMIT N` on Parquet.** Bounded-heap operator instead of full sort then truncate. 10M-row `ORDER BY q DESC LIMIT 10`: 420,344 ms to 119 ms.\n- **Predicate pushdown via row-group stats.** Filters that don't intersect a row group's min\u002Fmax skip the whole group. Selective queries on big files get the I\u002FO reduction without an index.\n- **HTTP Range requests for Parquet.** Reader pulls the footer, then only the bytes for the row groups it actually needs. No full-file download for most queries.\n- **Typed batch C API.** New `slothdb_column_int32_buffer \u002F int64_buffer \u002F double_buffer \u002F varchar_buffer \u002F validity_buffer`. The Python wrapper reads one buffer per column instead of two ctypes calls per cell. SELECT 2 columns x 10M rows: 46 s to 16 s.\n- **Direct `string_t` emit + lazy `QueryResult`.** Result chunks stay alive until the user pulls them; `fetchnumpy()` and `fetchdf()` skip the per-cell `Value` boxing entirely.\n- **`.ask` natural-language sub-REPL.** Rules parser in every build (sub-10 ms, no model). Optional local Qwen2.5-Coder GGUF fallback under `-DSLOTHDB_ASK_MODEL=ON`: 0.5B and 1.5B tiers picked by a deterministic keyword router. 29 languages. Generated SQL is shown before it runs. [docs\u002FASK.md](docs\u002FASK.md).\n- **JOIN hot path.** 138 ms vs 540 ms on the 5-query warm batch (1M x 1K) - typed int64 hash, parallel CSV pre-parse, build-side projection pushdown, COUNT(\\*)-over-JOIN fused into the aggregate.\n- **`CREATE LIVE VIEW`** with incremental CSV append. 
Useful when a dashboard tails a log that keeps growing.\n- **Edge build** (`-DSLOTHDB_EDGE=ON`) for sub-MB WASM bundles under Cloudflare Workers' 1 MB cap.\n\nPer-commit history with bench deltas in [CHANGELOG.md](CHANGELOG.md).\n\n---\n\n## Why SlothDB?\n\nSlothDB is an **embedded analytical database in C++20**. You link it into your application (or run the shell) and point SQL at files on disk. No server process, no import step, no \"load the extension first.\" That's the same model as DuckDB and SQLite, but the defaults are different.\n\n```sql\n-- No CREATE TABLE. No COPY FROM. Just point at the file.\nSELECT department, COUNT(*), AVG(salary)\nFROM 'employees.parquet'\nWHERE hire_year >= 2020\nGROUP BY department\nORDER BY AVG(salary) DESC;\n\n-- Local, HTTP(S), or public S3 - same SQL.\nSELECT region, SUM(revenue) FROM 'https:\u002F\u002Fhost\u002Fdata.csv' GROUP BY region;\nSELECT * FROM 's3:\u002F\u002Fpublic-bucket\u002Fevents.parquet';\n```\n\n### If you're already using DuckDB\n\nDuckDB is great and a lot of users should keep using it. SlothDB overlaps on the embedded-columnar-SQL story but makes different default choices. Four of those choices are what we've heard asked for most:\n\n- **Live views over growing files.** `CREATE LIVE VIEW` caches a query result and, for the common `SELECT * FROM 'file.csv'` shape on CSV\u002FTSV, appends only the new bytes on the next query. 
Useful when a dashboard tails a log that keeps growing.\n- **Smaller WASM for edge workers.** The `-DSLOTHDB_EDGE=ON` build (CSV\u002FJSON\u002FParquet only) targets Cloudflare Workers' 1 MB script budget; the full WASM bundle is around 1.3 MB.\n- **Everything in core, no extensions.** HTTP(S), S3 (anonymous public reads), Avro, Excel, Arrow, and SQLite read through the same core binary - no separate install\u002Fload step.\n- **Stable C ABI + numeric error codes.** `ErrorCode::TABLE_NOT_FOUND = 2000` does not shift between releases; bindings built against 0.1.x keep working.\n\nSame idea as DuckDB otherwise: embedded, columnar, vectorized, query files directly. Head-to-head on our bench:\n\n| | SlothDB | DuckDB |\n|---|---|---|\n| 5-query warm JOIN batch (1 M × 1 K) | **138 ms** | 540 ms |\n| Live-refresh view on a growing CSV | `CREATE LIVE VIEW`, incremental append | re-execute the query |\n| File-format readers in the core binary | 7 (CSV, Parquet, JSON, Avro, Excel, Arrow, SQLite) | 3 in core; Avro\u002FExcel\u002FSQLite via extensions |\n| Remote file read from SQL | built in (HTTP(S), public S3) | `httpfs` extension |\n| WASM bundle size | 1.3 MB full \u002F sub-1 MB edge | ~18 MB |\n| Extension ABI | stable C ABI, numeric error codes | internal C++ API |\n| `VARCHAR(n)` length | enforced at INSERT | not enforced |\n| Binary size (CLI) | ~1-2 MB (Windows MSVC Release; Linux static ~2-4 MB) | ~50 MB (latest release tarball) |\n| License | MIT | MIT |\n\nNumbers are from our machine; reproduce with the demo below. The big gaps are architectural: native Avro decoder, built-in `httpfs`-equivalent, smaller WASM.\n\n### If you're using ClickHouse today\n\nClickHouse wins at petabyte-scale distributed analytics - SlothDB isn't trying to replace it there. 
But if your workload fits on one machine (Python notebooks, desktop analytics, embedded BI, single-node dashboards), you're paying ClickHouse-server operational cost for work that doesn't need a cluster:\n\n| | SlothDB | clickhouse-local | ClickHouse server |\n|---|---|---|---|\n| Deployment | 1-2 MB binary, embedded | ~500 MB binary | server + Keeper + config |\n| Cold start | \u003C 10 ms on our bench | seconds (varies) | tens of seconds (varies) |\n| Ops overhead | none | none | daemon, ports, upgrades |\n| Embed in a desktop app | yes, one binary | awkward | no |\n| Cluster \u002F distributed query | no | no | yes |\n\nFor single-node SQL over local files, a ClickHouse server is operational overkill; `clickhouse-local` is closer in spirit but still a ~500 MB binary with slower cold start. SlothDB targets this narrow slice - single-node, embedded, file-first.\n\n### If you're using SQLite today for analytics\n\nSQLite is row-oriented and tuned for transactional workloads. Aggregate queries over wide tables read every column of every row even when you only need two. SlothDB is columnar and vectorized, so column-selective aggregates are typically several× faster - the exact speedup depends on row count and column width, so reproduce with your own data. You can keep your existing SQLite file and read from it directly with `sqlite_scan('app.db', 'users')`.\n\n### What SlothDB does not do (honest list)\n\n- **No distributed query execution.** One-node embedded engine. Use ClickHouse if you outgrow one machine.\n- **No MVCC \u002F multi-writer transactions.** Single writer, crash-safe checkpoint. OLTP workloads are a poor fit.\n- **No secondary indexes yet.** Scan-based execution. 
Zone-map pruning helps on sorted data, but there's no B-tree \u002F hash index for point lookups.\n- **Window-function coverage is partial.** Plain OVER \u002F PARTITION BY works; `ROWS BETWEEN ...` frames and `SUM OVER (ORDER BY)` cumulative shapes have known gaps (tracked in [docs\u002FROADMAP.md](docs\u002FROADMAP.md)).\n- **Authenticated S3 and TLS cert pinning are not implemented.** `s3:\u002F\u002F` works for anonymous public-bucket reads only.\n- **UPDATE \u002F DELETE are not parallelized.** They rewrite affected chunks serially - fine for thousands of rows, slow for millions.\n- **Young codebase.** 403 tests, five benchmark formats green, but corners of SQL will still surprise you. File an issue with a repro and we'll fix it.\n\n---\n\n## Quickstart\n\n**60-second tour** (no files to find - it generates and queries synthetic data, and prints a side-by-side with DuckDB if you have it installed):\n\n```bash\npip install slothdb\npython -c \"import slothdb; slothdb.demo()\"\n```\n\n```\nQuery                            SlothDB     DuckDB    Speedup\n--------------------------------------------------------------\nCOUNT(*)                          3.1 ms    17.0 ms     5.48x\nSUM(revenue) WHERE year>=2023    10.6 ms    17.7 ms     1.67x\nGROUP BY region                  10.0 ms    19.1 ms     1.91x\n```\n\n**Query your own files** - one-shot or interactive shell:\n\n```bash\npip install slothdb                                                              # Python\ncurl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002FSouravRoy-ETL\u002Fslothdb\u002Fmain\u002Finstall.sh | bash   # Linux \u002F macOS CLI\n# Windows: download slothdb.exe from https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Freleases\u002Flatest\n```\n\n```bash\nslothdb -c \"SELECT region, SUM(revenue) FROM 'sales.csv' GROUP BY region ORDER BY 2 DESC;\"\nslothdb                              # interactive, in-memory\nslothdb analytics.slothdb            # interactive, 
persistent\n```\n\n```python\nimport slothdb\ndb = slothdb.connect()\ndf = db.sql(\"SELECT * FROM 'employees.csv' WHERE salary > 100000\").fetchdf()\n```\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>More install methods (Debian, Fedora, Arch, Homebrew, build from source)\u003C\u002Fb>\u003C\u002Fsummary>\n\n| Platform | Command |\n|----------|---------|\n| Ubuntu \u002F Debian | grab the latest `.deb` from [releases](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Freleases\u002Flatest), then `sudo dpkg -i slothdb_*.deb` |\n| Fedora \u002F RHEL | grab the latest `.rpm` from [releases](https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb\u002Freleases\u002Flatest), then `sudo rpm -i slothdb-*.rpm` (or build from [spec](packaging\u002Frpm\u002Fslothdb.spec)) |\n| Arch Linux | `makepkg -si` ([PKGBUILD](packaging\u002Farch\u002FPKGBUILD)) |\n| macOS (Homebrew) | `brew install --build-from-source packaging\u002Fhomebrew\u002Fslothdb.rb` |\n| Build from source | See [below](#build-from-source) |\n\n\u003C\u002Fdetails>\n\n## Performance\n\n> 1 M-row datasets · warm cache · 5-run median on one workstation · DuckDB 1.1.3 · Ryzen 7 5800X, 32 GB DDR4, Windows 11, MSVC Release. Numbers will differ on other hardware - reproduce with `pip install slothdb && python -c \"import slothdb; slothdb.demo()\"`, which runs the 5-query batch side-by-side with whatever DuckDB version is installed.\n\n### 1. JOIN - CPU-bound, new in 0.1.6\n\n```\nSELECT COUNT(*) FROM big b JOIN sm s ON b.k = s.k   -- 1 M × 1 K\n\nSlothDB  85 ms        DuckDB  212 ms        2.5× faster\n```\n\nPure hash-join hot path, no I\u002FO ambiguity. Typed int64 hash path, parallel CSV pre-parse, build-side projection pushdown, `COUNT(*)`-over-JOIN fused into the aggregate. Landed in 0.1.6.\n\n### 2. 
End-to-end batch - five queries in one shell invocation\n\n```\nscan + aggregate + GROUP BY + filter + JOIN\n\nSlothDB total  138 ms        DuckDB total  540 ms        3.9× faster\n```\n\nMixed workload. Startup cost is part of the denominator - that's honest: it's what someone running `slothdb -c \"...\"` actually pays. Not a microbench.\n\n### 3. Avro - native decode beats an extension path\n\n```\nSUM(revenue) on 1 M-row .avro           SlothDB  140 ms   DuckDB  760 ms   5.43×\nGROUP BY region on 1 M-row .avro        SlothDB  170 ms   DuckDB  800 ms   4.71×\n```\n\nNative typed Avro decoder in the core binary vs. an extension-based reader. Architectural difference, not a micro-optimization.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Full 16-query suite across CSV \u002F Parquet \u002F JSON \u002F Avro \u002F Excel\u003C\u002Fb>\u003C\u002Fsummary>\n\n| Format | Query | SlothDB | DuckDB | Speedup |\n|---|---|--:|--:|:-:|\n| CSV | `COUNT(*)` (parser throughput) | 33 ms | 170 ms | 5.08× |\n| CSV | `SUM(revenue)` | 106 ms | 177 ms | 1.67× |\n| CSV | `GROUP BY region` | 100 ms | 191 ms | 1.91× |\n| CSV | `GROUP BY product, year` | 117 ms | 198 ms | 1.70× |\n| CSV | `WHERE year>=2023 AND qty>100 GROUP BY region` | 107 ms | 194 ms | 1.81× |\n| CSV | `big × small JOIN COUNT(*)` (1 M × 1 K) | 85 ms | 212 ms | 2.49× |\n| Parquet | `COUNT(*)` | 12 ms | 34 ms | 2.83× |\n| Parquet | `SUM(revenue)` | 46 ms | 48 ms | 1.04× (tie, within noise) |\n| Parquet | `GROUP BY region` | 76 ms | 88 ms | 1.16× |\n| Parquet | `GROUP BY product, year` | 146 ms | 173 ms | 1.18× |\n| Parquet | `WHERE + GROUP BY` | 157 ms | 198 ms | 1.26× |\n| JSON | `SUM(revenue)` | 242 ms | 314 ms | 1.30× |\n| JSON | `GROUP BY region` | 284 ms | 324 ms | 1.14× |\n| Avro | `SUM(revenue)` | 140 ms | 760 ms | 5.43× |\n| Avro | `GROUP BY region` | 170 ms | 800 ms | 4.71× |\n| Excel | `GROUP BY region` | 2500 ms | 3560 ms | 1.41× |\n\nMedian speedup: 1.70×. 
Range: 1.04× - 5.43×.\n\n\u003C\u002Fdetails>\n\nCaveats worth knowing: Parquet aggregates are within ~20 % of DuckDB on most queries - both engines saturate the columnar fast path there, so don't expect 3× on Parquet. The big gaps come from SlothDB's native decoders (Avro, CSV `COUNT(*)`) and the 0.1.6 JOIN hot path. We have not submitted to ClickBench yet - on the roadmap.\n\nThe architectural decisions behind the numbers (typed columnar decode, per-worker buffer reuse, fused scan+aggregate, zero-copy VARCHAR, vectorized filter, parallel CSV aggregate, typed int64 JOIN hash path) are in [CHANGELOG.md](CHANGELOG.md) with a commit per optimization.\n\n## Query Any File with SQL\n\nNo import step. No schema definition. Just query:\n\n```sql\n-- CSV\nSELECT * FROM 'sales.csv';\nSELECT region, SUM(revenue) FROM read_csv('data\u002F*.csv') GROUP BY region;\n\n-- Parquet (fastest - columnar, compressed, filter pushdown)\nSELECT * FROM read_parquet('events.parquet') WHERE event_date > '2024-01-01';\n\n-- JSON \u002F NDJSON\nSELECT status, COUNT(*) FROM 'api_logs.json' GROUP BY status;\n\n-- Excel\nSELECT * FROM read_xlsx('quarterly_report.xlsx');\n\n-- Avro, Arrow IPC, SQLite - all built-in, no extensions\nSELECT * FROM read_avro('events.avro');\nSELECT * FROM sqlite_scan('app.db', 'users');\n```\n\n**Create views on files - always returns fresh data:**\n\n```sql\nCREATE VIEW sales AS SELECT * FROM read_csv('sales.csv');\nCREATE VIEW events AS SELECT * FROM read_parquet('events.parquet');\nCREATE VIEW report AS SELECT * FROM read_xlsx('report.xlsx');\n\n-- Query views like tables - re-reads the file each time\nSELECT region, SUM(revenue) FROM sales GROUP BY region;\n```\n\n**Export results to any format:**\n\n```sql\nCOPY (SELECT * FROM 'big.csv' WHERE year >= 2024) TO 'filtered.parquet' WITH (FORMAT PARQUET);\n```\n\n> **[Full file format guide](docs\u002FDOCUMENTATION.md#2-query-your-files)** - CSV, Parquet, JSON, Excel, Arrow, Avro, SQLite, virtual views\n\n## 
Persistent Database\n\n```bash\nslothdb analytics.slothdb    # creates or opens a .slothdb file\n```\n\n```sql\nCREATE TABLE sales AS SELECT * FROM read_csv('sales_2024.csv');\nCREATE TABLE events AS SELECT * FROM read_parquet('events.parquet');\n\n-- Next session, tables are still here\nSELECT region, SUM(revenue) FROM sales GROUP BY region;\n```\n\n> **[Working with large datasets](docs\u002FDOCUMENTATION.md#3-working-with-large-datasets)** - when to query directly vs. import vs. convert to Parquet\n\n## Python\n\n```python\nimport slothdb\n\ndb = slothdb.connect()                    # in-memory\ndb = slothdb.connect(\"analytics.slothdb\") # persistent\n\n# Query files directly\nresult = db.sql(\"SELECT * FROM 'employees.csv' WHERE salary > 100000\")\ndf = result.fetchdf()  # pandas DataFrame\n\n# Window functions, CTEs, QUALIFY - full SQL\nresult = db.sql(\"\"\"\n    SELECT name, department, salary\n    FROM 'employees.parquet'\n    QUALIFY ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) = 1\n\"\"\")\n```\n\n> **[Full Python API reference](docs\u002FDOCUMENTATION.md#6-python-api)** - connect, query, results, pandas integration, context manager\n\n## C\u002FC++\n\n```c\n#include \"slothdb\u002Fapi\u002Fslothdb.h\"\n\nslothdb_database *db;\nslothdb_connection *conn;\nslothdb_result *result;\n\nslothdb_open(\"analytics.slothdb\", &db);\nslothdb_connect(db, &conn);\nslothdb_query(conn, \"SELECT region, SUM(revenue) FROM read_csv('sales.csv') GROUP BY region\", &result);\n\nfor (uint64_t r = 0; r \u003C slothdb_row_count(result); r++)\n    printf(\"%s: %s\\n\", slothdb_value_varchar(result, r, 0), slothdb_value_varchar(result, r, 1));\n\nslothdb_free_result(result);\nslothdb_disconnect(conn);\nslothdb_close(db);\n```\n\n> **[Full C\u002FC++ API reference](docs\u002FDOCUMENTATION.md#7-cc-api)** - lifecycle, queries, results, error handling, CMake integration, RAII wrapper\n\n## Features\n\n| Category | Details |\n|----------|---------|\n| **SQL** | 
130+ features - JOINs, CTEs (recursive), window functions, QUALIFY, MERGE, subqueries, set operations |\n| **Live file views** | `CREATE LIVE VIEW` caches the result and auto-refreshes on file change. Incremental CSV append path parses only new bytes on log-tail workloads |\n| **Shell `.ask`** | Natural-language -> SQL in the CLI. Rules parser (default, ~50 KB, no network) handles COUNT \u002F SUM \u002F AVG \u002F GROUP BY \u002F TOP-N \u002F year filters \u002F load-file \u002F count-rows-in-file \u002F create-view-from-file. Builds with `-DSLOTHDB_ASK_MODEL=ON` fall back to a local Qwen2.5-Coder GGUF for open-ended NL. Every generated SQL is shown before it runs; set `SLOTHDB_ASK_CONFIRM=1` to require a `[Y\u002Fn]` keypress per statement. |\n| **Metadata** | `DESCRIBE \u003Cquery>`, `DESCRIBE \u003Ctable>`, `PRAGMA table_info('t')`, `PRAGMA database_list` - BI-tool introspection out of the box |\n| **Type constraints** | `VARCHAR(n)` length enforced on INSERT (no silent truncation) |\n| **File I\u002FO** | CSV, Parquet, JSON, Arrow, Avro, Excel, SQLite - all built-in with auto-detection, glob patterns, virtual views |\n| **Remote files** | `https:\u002F\u002F` and public-bucket `s3:\u002F\u002F` URLs work directly in any SQL path |\n| **Functions** | 70+ functions - string, math, date\u002Ftime (including `DATE_TRUNC` with WEEK\u002FQUARTER\u002FDECADE + `MONTHNAME` \u002F `DAYNAME` \u002F `LAST_DAY` \u002F `MAKE_DATE`), aggregate, regex, trigonometric |\n| **Performance** | Vectorized columnar engine (2,048 values\u002Fbatch), morsel-driven parallelism, fused scan+aggregate, typed int64 JOIN hash path, parallel CSV pre-parse, zero-copy VARCHAR |\n| **Build flavours** | Default full build (~1-4 MB native binary depending on platform and linking) or `-DSLOTHDB_EDGE=ON` for sub-MB WASM bundles that fit under Cloudflare Workers' 1 MB cap |\n| **Storage** | Single-file `.slothdb` persistence, RLE\u002Fdictionary\u002Fbitpacking compression, zone maps |\n| 
**Optimizer** | Constant folding, filter pushdown, TopN optimization |\n| **APIs** | CLI shell, Python (with pandas), C\u002FC++ (stable ABI) |\n| **Reliability** | 403 tests, 131,513 assertions, bounds-checked parsing, DoS limits |\n\n## Documentation\n\n| | |\n|-|-|\n| **[Full Documentation](docs\u002FDOCUMENTATION.md)** | Complete guide - install, file queries, SQL, Python, C\u002FC++, extensions |\n| [Query Your Files](docs\u002FDOCUMENTATION.md#2-query-your-files) | CSV, Parquet, JSON, Excel, Arrow, Avro, SQLite |\n| [Large Datasets](docs\u002FDOCUMENTATION.md#3-working-with-large-datasets) | Import strategies, Parquet conversion, persistence |\n| [SQL Guide](docs\u002FDOCUMENTATION.md#4-sql-guide) | Joins, window functions, CTEs, QUALIFY, MERGE |\n| [All Functions](docs\u002FDOCUMENTATION.md#5-all-functions) | 70+ built-in functions with examples |\n| [Python API](docs\u002FDOCUMENTATION.md#6-python-api) | Connect, query, pandas, context manager |\n| [C\u002FC++ API](docs\u002FDOCUMENTATION.md#7-cc-api) | Lifecycle, queries, results, CMake, RAII |\n| [SQL Quick Reference](docs\u002FSQL_REFERENCE.md) | One-page cheat sheet |\n| [Extension API](include\u002Fslothdb\u002Fextension\u002Fextension_api.h) | Build custom extensions |\n\n## Build from Source\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\u002Fslothdb.git\ncd slothdb\ncmake -B build -DSLOTHDB_BUILD_SHELL=ON -DCMAKE_BUILD_TYPE=Release\ncmake --build build --config Release\n.\u002Fbuild\u002Fsrc\u002Fslothdb          # Linux\u002FmacOS\nbuild\\src\\Release\\slothdb.exe  # Windows\n```\n\n**Run tests:**\n\n```bash\ncmake -B build -DSLOTHDB_BUILD_SHELL=ON -DSLOTHDB_BUILD_TESTS=ON\ncmake --build build --config Release\nctest --test-dir build -C Release    # 403 tests\n```\n\n| Build Option | Description |\n|-------------|-------------|\n| `-DSLOTHDB_BUILD_SHELL=ON` | Build CLI shell |\n| `-DSLOTHDB_BUILD_TESTS=ON` | Build test suite |\n| `-DSLOTHDB_SANITIZERS=ON` | Enable 
ASan\u002FUBSan |\n| `-DSLOTHDB_EDGE=ON` | Edge \u002F WASM minimal build - strips Excel \u002F Avro \u002F Arrow IPC \u002F SQLite readers. Target: sub-1 MB WASM for Cloudflare Workers. See [docs\u002FEDGE_BUILD.md](docs\u002FEDGE_BUILD.md) |\n\n## Community\n\nThere's a Discord: **[discord.gg\u002FXJWyGmX5G](https:\u002F\u002Fdiscord.gg\u002FXJWyGmX5G)**. Bug reports, install help, weird query plans, \"is this slower than it should be\", feature ideas - any of it. I'm the maintainer, I read everything. GitHub issues are still the canonical tracker; the server is for the questions that come before you file one.\n\n## Contributing\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for build instructions and contribution guidelines.\n\n## License\n\n[MIT](LICENSE) - use it however you want.\n\n---\n\n\u003Cdiv align=\"center\">\n\n\u003Csub>Built with C++20 · Zero dependencies · \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FSouravRoy-ETL\">@SouravRoy-ETL\u003C\u002Fa>\u003C\u002Fsub>\n\n\u003C\u002Fdiv>\n","SlothDB is an embedded SQL database that runs on laptops, on servers, and even in the browser. Built from scratch, it is up to 5x faster than comparable products on key operations. SlothDB supports multiple data formats such as Arrow, Avro, CSV, and Parquet, and provides Python and WASM interfaces to suit different development environments. The project also integrates natural-language processing, letting users generate complex SQL queries from simple English instructions, which suits scenarios that need fast analysis and querying of large volumes of data, such as business-intelligence reporting and online data-analysis services.",2,"2026-06-11 02:40:31","CREATED_QUERY"]