[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-3456":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":13,"lastSyncTime":25,"discoverSource":26},3456,"wacrawl","steipete\u002Fwacrawl","steipete","🧾 WhatsApp archaeology with encrypted receipts",null,"Go",143,19,2,0,8,27,3.9,"MIT License",false,"main",[],"2026-06-12 02:00:50","# 🧾 wacrawl\n\n![wacrawl banner](docs\u002Fassets\u002Freadme-banner.jpg)\n\nWhatsApp archaeology with encrypted receipts.\n\nRead-only local archive and search for the macOS WhatsApp Desktop app.\n\n`wacrawl` copies WhatsApp Desktop's local SQLite databases into a temporary\nsnapshot, imports the useful chat data into its own SQLite archive, and gives\nyou scriptable commands for status, chat listing, message listing, and full-text\nsearch.\n\nIt is for local inspection. It does not send messages, decrypt backups, talk to\nWhatsApp Web, or write back into WhatsApp's app container.\n\n## Install\n\nHomebrew is the easiest path. 
Install directly from my tap:\n\n```bash\nbrew install steipete\u002Ftap\u002Fwacrawl\n```\n\nAfter that, upgrades stay simple:\n\n```bash\nbrew update\nbrew upgrade steipete\u002Ftap\u002Fwacrawl\n```\n\nOr from source:\n\n```bash\ngo install github.com\u002Fsteipete\u002Fwacrawl\u002Fcmd\u002Fwacrawl@latest\n```\n\nCheck the installed binary:\n\n```bash\nwacrawl --version\n```\n\n## Quick Start\n\nFirst, check whether `wacrawl` can see the local WhatsApp Desktop data:\n\n```bash\nwacrawl doctor\n```\n\nSync a fresh local archive:\n\n```bash\nwacrawl sync\n# Or run the same import and also copy referenced media files:\nwacrawl import --copy-media\n```\n\nInspect what was imported. Read commands sync automatically by default, so\n`status`, `chats`, `messages`, and `search` refresh the archive before reading\nwhen it is stale and the local WhatsApp Desktop source is newer:\n\n```bash\nwacrawl status\nwacrawl chats --limit 20\nwacrawl unread --limit 20\nwacrawl messages --limit 20\n```\n\nSearch message text:\n\n```bash\nwacrawl search \"release notes\"\n```\n\nUse JSON for scripts:\n\n```bash\nwacrawl --json search \"invoice\" --from-them --after 2026-01-01\n```\n\n## What It Reads\n\nOn macOS, WhatsApp Desktop stores app data in:\n\n```text\n~\u002FLibrary\u002FGroup Containers\u002Fgroup.net.whatsapp.WhatsApp.shared\n```\n\n`wacrawl` currently imports from:\n\n```text\nChatStorage.sqlite\nContactsV2.sqlite\nMessage\u002FMedia\u002F\n```\n\nIt writes its own archive to:\n\n```text\n~\u002F.wacrawl\u002Fwacrawl.db\n```\n\nOverride either path when needed:\n\n```bash\nwacrawl --source \"$HOME\u002FLibrary\u002FGroup Containers\u002Fgroup.net.whatsapp.WhatsApp.shared\" doctor\nwacrawl --db \u002Ftmp\u002Fwacrawl.db import\n```\n\n## Safety\n\n- Opens WhatsApp data read-only.\n- Copies SQLite database, WAL, and SHM files into a temp snapshot before import.\n- Replaces only the `wacrawl` archive database.\n- Does not modify WhatsApp databases, settings, contacts, chats, or media.\n- Does not use the WhatsApp network protocol.\n- Does not upload 
data during normal archive\u002Fsearch commands. `backup push`\n  uploads only age-encrypted backup shards when you explicitly run it.\n\nThe archive can contain private message data. Keep `~\u002F.wacrawl\u002Fwacrawl.db`\nlocal and out of commits, backups, and shared logs unless that is intentional.\n\n## Commands\n\n### `doctor`\n\nInspect the source path and database shape:\n\n```bash\nwacrawl doctor\nwacrawl --json doctor\n```\n\nReports source availability, discovered database files, row counts, message date\nrange, and importer schema notes.\n\n### `import`\n\nSnapshot WhatsApp Desktop data and replace the local archive in one transaction:\n\n```bash\nwacrawl import\nwacrawl import --copy-media\n```\n\n`sync` is the same command with a clearer name:\n\n```bash\nwacrawl sync\nwacrawl sync --copy-media\n```\n\nImports:\n\n- chats\n- contacts\n- groups\n- group participants\n- messages\n- media metadata and local media paths\n\nBy default, media paths continue to point at WhatsApp Desktop's app container.\nPass `--copy-media` to copy referenced media files into `media\u002F` next to the\narchive database and rewrite copied message media paths to that archive copy.\nMissing media files are counted in the import output and do not fail the import.\n\n### `status`\n\nShow archive counts and import metadata:\n\n```bash\nwacrawl status\n```\n\nIncludes chat, unread-chat, unread-message, contact, group, participant,\nmessage, media-message, oldest, newest, last-import, and source fields.\n\nBy default, `status` first syncs the archive when the last sync is older than\n`--sync-max-age` and the WhatsApp Desktop source has newer data.\n\n### `chats`\n\nList chats ordered by newest message:\n\n```bash\nwacrawl chats\nwacrawl chats --limit 100\nwacrawl chats --unread\n```\n\nUnread state comes from WhatsApp Desktop's per-chat unread counter. 
Message\nrows do not expose a reliable incoming per-message \"read by me\" flag.\n\n### `unread`\n\nList only chats with unread messages:\n\n```bash\nwacrawl unread\nwacrawl unread --limit 100\n```\n\n### `messages`\n\nList archived messages:\n\n```bash\nwacrawl messages\nwacrawl messages --chat 1234567890@s.whatsapp.net\nwacrawl messages --after 2026-01-01 --from-them\nwacrawl messages --has-media --json\n```\n\nFilters:\n\n```text\n--chat JID       Restrict to one chat.\n--sender JID     Restrict to one sender.\n--limit N        Max rows. Default: 50.\n--after DATE     RFC3339 timestamp or YYYY-MM-DD.\n--before DATE    RFC3339 timestamp or YYYY-MM-DD.\n--from-me        Only outgoing messages.\n--from-them      Only incoming messages.\n--has-media      Only messages with media metadata.\n--asc            Oldest first.\n```\n\n### `search`\n\nSearch the archive with SQLite FTS5:\n\n```bash\nwacrawl search \"launch\"\nwacrawl search \"invoice\" --from-them --after 2026-01-01\nwacrawl --json search \"restaurant\"\n```\n\nSearch uses message text, chat name, sender name, and media title fields. It\naccepts the same filters as `messages`.\n\n## Sync Behavior\n\n`wacrawl` keeps normal reads fresh without a daemon or background service.\nBefore `status`, `chats`, `messages`, or `search`, it checks the archive's\nlast import time. 
If the archive is stale, it inspects the WhatsApp Desktop\nsource and imports a fresh snapshot only when the source is ahead.\n\nThe default policy is:\n\n```text\n--sync auto\n--sync-max-age 15m\n```\n\nSync modes:\n\n```text\n--sync auto     Sync before reads when the archive is stale and source is ahead.\n--sync always   Force a sync before every read command.\n--sync never    Read only the existing archive.\n```\n\nExamples:\n\n```bash\nwacrawl search \"release notes\"\nwacrawl --sync always status\nwacrawl --sync never --json messages --limit 10\nwacrawl --sync-max-age 1h chats\n```\n\nIf the WhatsApp Desktop source is unavailable and the archive already has data,\n`--sync auto` warns on stderr and continues with the existing archive.\n`--sync always` treats an unavailable source as an error.\n\n## Encrypted Git Backup\n\n`wacrawl` can back up the archive to a Git repository using age-encrypted JSONL\nshards. This is meant for a private repository such as\n`https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl`, but the message data is encrypted\nbefore Git sees it.\n\nThe backup repo contains:\n\n```text\nREADME.md\nmanifest.json\ndata\u002Fchats.jsonl.gz.age\ndata\u002Fcontacts.jsonl.gz.age\ndata\u002Fgroups.jsonl.gz.age\ndata\u002Fgroup_participants.jsonl.gz.age\ndata\u002Fmessages\u002FYYYY\u002FMM.jsonl.gz.age\n```\n\n`manifest.json` is intentionally cleartext so a machine can inspect backup\nfreshness, public age recipients, counts, shard paths, encrypted byte sizes, and\nplaintext hashes without decrypting message contents. It does not contain\nmessage text, chat names, contacts, participant IDs, or media metadata. 
Those\nfields live inside the `*.jsonl.gz.age` shards.\n\n### Command Cheat Sheet\n\nUse these most of the time:\n\n```bash\n# First-time setup on a machine.\nwacrawl backup init \\\n  --repo ~\u002FProjects\u002Fbackup-wacrawl \\\n  --remote https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git\n\n# Refresh WhatsApp data if needed, encrypt, commit, and push to GitHub.\nwacrawl backup push\n\n# Pull the Git backup, decrypt, verify, and import into the local archive.\nwacrawl backup pull\n\n# Inspect the backup manifest without decrypting message data.\nwacrawl backup status\n```\n\nUseful safety variants:\n\n```bash\n# Force a fresh WhatsApp import before writing the backup.\nwacrawl --sync always backup push\n\n# Write and commit locally, but do not push to GitHub.\nwacrawl backup push --no-push\n\n# Restore into a throwaway database for testing.\nwacrawl --db \u002Ftmp\u002Fwacrawl-restore-test.db backup pull\nwacrawl --db \u002Ftmp\u002Fwacrawl-restore-test.db --sync never status\n```\n\nYou should not need to run `git` manually for normal use. `backup push` handles\nthe backup repo pull\u002Frebase, commit, and push. `backup pull` handles the backup\nrepo pull\u002Frebase before decrypting.\n\n### Encryption and Security Model\n\nBackups use the Go `filippo.io\u002Fage` library with X25519 age identities. There\nis no backup password. Each machine has an age identity file, usually:\n\n```text\n~\u002F.wacrawl\u002Fage.key\n```\n\nThat file contains an `AGE-SECRET-KEY-...` private identity and is written with\n0600 permissions. Its matching public recipient starts with `age1...` and is\nsafe to place in `~\u002F.wacrawl\u002Fbackup.json`, `manifest.json`, or docs.\n\nFor each shard, `wacrawl backup push`:\n\n1. Exports rows from the local archive as deterministic JSONL.\n2. Gzip-compresses the JSONL with a fixed gzip timestamp.\n3. Encrypts the compressed bytes with age for every configured recipient.\n4. 
Writes only the encrypted `*.jsonl.gz.age` shard to Git.\n5. Writes `manifest.json` with cleartext metadata used for status, diffing, and restore verification.\n\n`wacrawl backup pull` does the reverse: it pulls\u002Frebases the backup repo,\nchecks manifest shard paths, decrypts each shard with the local age identity,\nverifies the shard hash, validates cross-table references, and imports the\nsnapshot into the configured archive database in one transaction.\n\nWhat the backup protects:\n\n- A GitHub read-only compromise or accidental clone does not reveal message text,\n  contacts, chat names, participant IDs, or media metadata.\n- Each encrypted shard can be decrypted by any listed age recipient, so multiple\n  machines can share one backup without sharing one private key.\n- Age provides encrypted-file integrity; corrupted or wrong-key shards fail to\n  decrypt, and `wacrawl` also checks manifest hashes after decrypting.\n\nWhat remains visible in Git:\n\n- `manifest.json` is cleartext.\n- The manifest reveals export time, public recipients, table names, row counts,\n  shard paths, encrypted byte sizes, and plaintext shard hashes.\n- Message shard paths reveal activity by year and month, for example\n  `data\u002Fmessages\u002F2026\u002F04.jsonl.gz.age`.\n- Git history reveals backup cadence and which encrypted shards changed.\n\nImportant limits:\n\n- This is not end-to-end provenance. 
Someone who can push to the backup repo can\n  replace the backup with different data encrypted to your public recipient.\n  Use normal GitHub access control and review unexpected backup commits.\n- If `~\u002F.wacrawl\u002Fage.key` is lost and no other configured recipient exists, the\n  encrypted backup cannot be restored.\n- If an age identity is compromised, remove its public recipient, run\n  `wacrawl backup push` to re-encrypt current shards, and consider rewriting or\n  deleting old Git history because older commits may still be decryptable with\n  the compromised key.\n- X25519 age recipients are not post-quantum. They are a practical modern\n  default, but not a post-quantum archival guarantee.\n- The local archive database `~\u002F.wacrawl\u002Fwacrawl.db` and the WhatsApp Desktop\n  source data remain plaintext on the machine. Protect the machine and local\n  backups accordingly.\n\n### Initial Setup\n\nInitialize the backup repository and local age identity:\n\n```bash\nwacrawl backup init \\\n  --repo ~\u002FProjects\u002Fbackup-wacrawl \\\n  --remote https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git\n```\n\nThis writes `~\u002F.wacrawl\u002Fbackup.json`, creates `~\u002F.wacrawl\u002Fage.key` if needed,\nclones or initializes the local backup checkout, and prints the public age\nrecipient.\n\nThe generated config looks like this:\n\n```json\n{\n  \"repo\": \"~\u002FProjects\u002Fbackup-wacrawl\",\n  \"remote\": \"https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git\",\n  \"identity\": \"~\u002F.wacrawl\u002Fage.key\",\n  \"recipients\": [\"age1...\"]\n}\n```\n\nKeep `~\u002F.wacrawl\u002Fage.key` private. 
The public `age1...` recipient can be stored\nin `backup.json`; the `AGE-SECRET-KEY-...` identity must stay local or in a\npassword manager.\n\n### Push\n\nPush an encrypted backup:\n\n```bash\nwacrawl backup push\n```\n\n`backup push` first pulls\u002Frebases the configured backup checkout, then uses the\nnormal read-time sync policy. With the default `--sync auto --sync-max-age 15m`,\nit refreshes the local archive only when the archive is stale and the WhatsApp\nDesktop source has newer data. Then it exports stable JSONL, gzip-compresses each\nshard, encrypts each shard for every configured recipient, updates\n`manifest.json`, removes stale encrypted shards, commits, and pushes the backup\nrepo.\n\nRe-running `backup push` without archive changes leaves Git clean. The command\nprints the repo path, whether anything changed, whether the backup is encrypted,\nthe shard count, and the message count.\n\nUse `--no-push` for local test runs that commit into the backup checkout but do\nnot push to the remote:\n\n```bash\nwacrawl backup push --no-push\n```\n\n### Restore\n\nRestore from the backup repo:\n\n```bash\nwacrawl backup pull\n```\n\n`backup pull` pulls\u002Frebases the configured backup repo, decrypts every shard with\nthe local age identity, verifies each plaintext shard hash from the manifest,\nvalidates cross-table references, and replaces the configured `wacrawl` archive\ndatabase in one import transaction.\n\nTo test a restore without touching your real archive:\n\n```bash\nwacrawl --db \u002Ftmp\u002Fwacrawl-restore-test.db backup pull\nwacrawl --db \u002Ftmp\u002Fwacrawl-restore-test.db --sync never status\n```\n\n### Status\n\nInspect backup metadata:\n\n```bash\nwacrawl backup status\n```\n\nThis reports encryption status, shard count, message count, export timestamp,\nand repo path. It reads `manifest.json`; it does not need to decrypt shards.\n\n### Multiple Machines\n\nEach machine that should restore needs its own age identity. 
On the new machine:\n\n```bash\nwacrawl backup init \\\n  --repo ~\u002FProjects\u002Fbackup-wacrawl \\\n  --remote https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git\n```\n\nCopy the printed public recipient (`age1...`) into the `recipients` list in\n`~\u002F.wacrawl\u002Fbackup.json` on a machine that can already decrypt the backup, then\nrun:\n\n```bash\nwacrawl backup push\n```\n\nAfter that push, newly written shards are encrypted for all configured\nrecipients. If you added a recipient after data already existed, run a normal\n`wacrawl backup push`; unchanged plaintext shards are re-encrypted when the\nmanifest\u002Fconfig changes.\n\nFor personal setup, storing a copy of `~\u002F.wacrawl\u002Fage.key` in 1Password is a\ngood recovery path. Do not commit the identity file. Do not paste the\n`AGE-SECRET-KEY-...` value into issues, logs, docs, or chat.\n\n### Flags\n\nUseful flags:\n\n```text\n--config PATH        Backup config path. Default: ~\u002F.wacrawl\u002Fbackup.json\n--repo PATH          Local backup Git checkout.\n--remote URL         Backup Git remote.\n--identity PATH      Local age identity. Default: ~\u002F.wacrawl\u002Fage.key\n--recipient AGE      Public age recipient. 
Repeat for multiple machines.\n--no-push            Commit locally but do not push.\n```\n\n### Recovery Checklist\n\nOn a new Mac:\n\n```bash\nbrew install steipete\u002Ftap\u002Fwacrawl\ngit clone https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git ~\u002FProjects\u002Fbackup-wacrawl\nmkdir -p ~\u002F.wacrawl\n```\n\nThen restore `~\u002F.wacrawl\u002Fage.key` from your password manager and create\n`~\u002F.wacrawl\u002Fbackup.json` pointing at the clone:\n\n```json\n{\n  \"repo\": \"~\u002FProjects\u002Fbackup-wacrawl\",\n  \"remote\": \"https:\u002F\u002Fgithub.com\u002Fsteipete\u002Fbackup-wacrawl.git\",\n  \"identity\": \"~\u002F.wacrawl\u002Fage.key\",\n  \"recipients\": [\"age1...\"]\n}\n```\n\nFinally:\n\n```bash\nwacrawl backup pull\nwacrawl --sync never status\n```\n\nIf decryption fails, the local `identity` most likely does not match any recipient\nused for the encrypted shards; a corrupted shard also fails to decrypt. If Git push\nfails, check your GitHub permissions for the backup repository; the archive data\nis already encrypted before the push.\n\n## Global Flags\n\n```text\n--db PATH               Archive database path. Default: ~\u002F.wacrawl\u002Fwacrawl.db\n--source PATH           WhatsApp Desktop source path.\n--sync MODE             Read-time sync policy: auto, always, or never. Default: auto.\n--sync-max-age DURATION Staleness window for --sync auto. Default: 15m.\n--json                  Emit JSON instead of human-readable output.\n--version               Print the CLI version.\n```\n\nImport\u002Fsync flags:\n\n```text\n--copy-media            Copy referenced media during import\u002Fsync.\n```\n\n## Data Format Notes\n\nWhatsApp Desktop uses CoreData-style SQLite tables. 
The importer currently knows\nabout:\n\n```text\nZWACHATSESSION\nZWAMESSAGE\nZWAMEDIAITEM\nZWAGROUPINFO\nZWAGROUPMEMBER\n```\n\nImportant details:\n\n- WhatsApp timestamps are seconds since `2001-01-01T00:00:00Z`.\n- `ZWAMESSAGE.Z_PK` is used as the source row identity.\n- `ZSTANZAID` is not unique enough for archive identity.\n- Group senders are resolved through `ZWAMESSAGE.ZGROUPMEMBER`.\n- Media is joined through both `ZWAMESSAGE.ZMEDIAITEM` and\n  `ZWAMEDIAITEM.ZMESSAGE`.\n- WhatsApp's own search database uses a custom `wa_tokenizer`; `wacrawl` builds\n  a portable FTS5 index instead.\n\n## Development\n\nRequires Go 1.26 or newer.\n\n```bash\nmake check\n```\n\nRuns:\n\n```bash\ngolangci-lint run .\u002F...\n.\u002Fscripts\u002Fcoverage.sh 85.0\ngo build -o bin\u002Fwacrawl .\u002Fcmd\u002Fwacrawl\n```\n\nExtra release-parity checks:\n\n```bash\ngo test -count=1 -race .\u002F...\ngoreleaser release --snapshot --clean --skip=publish\n```\n\nCoverage must stay at or above 85%.\n\n## Release\n\nReleases are tag-driven through GoReleaser.\n\n```bash\ngit tag -a v0.2.0 -m \"Release 0.2.0\"\ngit push origin main --tags\n```\n\nCI publishes GitHub release artifacts for:\n\n```text\ndarwin\u002Famd64\ndarwin\u002Farm64\nlinux\u002Famd64\nlinux\u002Farm64\nwindows\u002Famd64\nwindows\u002Farm64\n```\n\nThe Homebrew formula lives in:\n\n```text\n~\u002FProjects\u002Fhomebrew-tap\u002FFormula\u002Fwacrawl.rb\n```\n\n## License\n\nMIT. See `LICENSE`.\n","wacrawl is a local archive and search tool for the WhatsApp Desktop app on macOS. It copies WhatsApp's local SQLite databases into a temporary snapshot, imports the useful chat data into its own SQLite archive, and provides a command-line interface for viewing status, listing chats, querying messages, and full-text search. The project is written in Go and suits cases where you need to inspect WhatsApp chat history locally without working directly on the original databases or the network protocol. In addition, wacrawl only reads data and does not modify anything inside the original app, which keeps user data safe.","2026-06-06 02:55:19","CREATED_QUERY"]