[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80807":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":12,"stars30d":12,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":9,"rankLanguage":9,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":9,"pushedAt":9,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":12,"lastSyncTime":24,"discoverSource":25},80807,"WAInsight","akhil-dara\u002FWAInsight","akhil-dara","Forensic analysis of already-acquired WhatsApp Android databases - browse every chat exactly like the WhatsApp home screen, plus 30 forensic pages: media recovery, visual-hash search, contact + group reports, offline export bundles. Read-only by construction.",null,"Python",40,2,1,0,42.63,"MIT License",false,"main",true,[],"2026-06-11 04:07:15","\u003Cdiv align=\"center\">\n\n\u003Cimg src=\"logo.png\" alt=\"WAInsight logo\" width=\"120\"\u002F>\n\n# WAInsight - WhatsApp Forensic Analysis Suite\n\n### A forensic analysis suite for already-acquired WhatsApp Android databases\n\n> **WAInsight does not extract anything from a phone.**  Phone acquisition is a separate step done with whatever forensic acquisition tool the analyst already uses.  
WAInsight starts where that step ends — point it at the *folder of files* that came out of the acquisition (`msgstore.db`, `wa.db`, the `Media\u002F` directory, the `Avatars\u002F` directory) and it does the rest.\n\nWhat it does, in one paragraph: ingests those files into a normalised, **read-only** case database, then opens a 30-page desktop UI where every conversation is **fully browseable just like the WhatsApp home screen** — chat list with avatars \u002F unread \u002F pinned \u002F muted \u002F archived, click any chat, scroll the timeline (bubbles, edits, revokes, replies, reactions, receipts, forwarded flags), with click-to-jump search, calendar filtering, mention chips, pinned-message strip, and a forensic-info side panel on every bubble.  On top of that browsing surface sit 30 forensic pages: media gallery, perceptual visual search, media recovery, ghost \u002F edit \u002F revoke browsers, calls page, locations, polls, links, contact + group reports, offline export bundles, and the folder-shaped Media Dashboard.\n\nBuilt for digital forensics teams, law-enforcement examiners, and incident responders.  
**Source `msgstore.db` is opened with `?mode=ro&immutable=1`** — WAInsight never writes to evidence.\n\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg)](#license) [![Python 3.10+](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10+-blue.svg)](#tech-stack) [![PySide6](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FQt-PySide6-41cd52.svg)](#tech-stack) [![Status: active](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fstatus-active-brightgreen.svg)]()\n\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fakhil-dara\u002FWAInsight?style=social)](https:\u002F\u002Fgithub.com\u002Fakhil-dara\u002FWAInsight\u002Fstargazers) [![GitHub forks](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fforks\u002Fakhil-dara\u002FWAInsight?style=social)](https:\u002F\u002Fgithub.com\u002Fakhil-dara\u002FWAInsight\u002Fnetwork\u002Fmembers) [![Visitors](https:\u002F\u002Fvisitor-badge.laobi.icu\u002Fbadge?page_id=akhil-dara.WAInsight&left_text=Visitors)](https:\u002F\u002Fgithub.com\u002Fakhil-dara\u002FWAInsight)\n\n### Support development\n\n[![Sponsor akhil-dara](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FSponsor-akhil--dara-EA4AAA?logo=githubsponsors&logoColor=white&style=for-the-badge)](https:\u002F\u002Fgithub.com\u002Fsponsors\u002Fakhil-dara)\n\n\n\u003C\u002Fdiv>\n\n---\n\n## Table of contents\n\n- [What WAInsight does](#what-wainsight-does)\n- [Screenshots](#screenshots)\n- [Quick start](#quick-start)\n- [Highlights](#highlights)\n- [Feature catalog](#feature-catalog)\n- [Pages](#pages)\n- [Reports](#reports)\n- [Offline HTML & dashboard exports](#offline-html--dashboard-exports)\n- [Architecture](#architecture)\n- [Forensic integrity](#forensic-integrity)\n- [Tech stack](#tech-stack)\n- [Repository layout](#repository-layout)\n- [Roadmap](#roadmap)\n- [License](#license)\n\n---\n\n## What WAInsight does\n\nYou point WAInsight at a folder containing **already-acquired** WhatsApp Android files (the tool does 
not pull anything from a phone — that's the acquisition step, done separately with whatever tool the analyst already uses).  Specifically it expects:\n\n- `msgstore.db` (chats, messages, media, calls, polls, mentions, …)\n- `wa.db` (saved contacts, business \u002F verified state, avatars)\n- `Media\u002F` and `Avatars\u002F` directories\n\n…it ingests those files in **29 sequential stages**, normalises everything into a single `analysis.db` with **47 indexed tables**, and presents a desktop UI where the analyst can:\n\n- **browse every chat exactly like opening WhatsApp itself** — home-screen-style conversation list, click any chat, scroll the timeline of bubbles, see edits \u002F revokes \u002F reactions \u002F replies \u002F receipts \u002F forwarded badges inline\n- triage **200k+ media files** through a folder-shaped offline dashboard with cascading filters\n- recover deleted-from-device media via CDN re-download or hash-linking\n- find every chat where the same SHA‑256 was shared (cross-chat propagation)\n- run perceptual-hash visual search to find similar images across the whole case\n- pivot between contacts, groups, calls, links, locations, polls, status updates, scheduled events\n- export a single offline HTML bundle the case officer can hand off (no Python, no server, just `index.html`)\n- generate landscape-A4 PDF \u002F HTML forensic reports per group or per contact\n\nEverything happens locally. No telemetry, no internet calls except optional CDN media re-download (and that requires the analyst's explicit click).\n\n---\n\n## Screenshots\n\n> All screenshots are taken against a real case with PII blurred. Click any image to view full size. 
Source files live under [`docs\u002Fscreenshots\u002F`](docs\u002Fscreenshots\u002F).\n\n### Hero\n\n![WAInsight chat viewer](docs\u002Fscreenshots\u002F04_chat_viewer.png)\n\n*Chat viewer rendered inside QWebEngine: bubbles, edit pencil pill, reply badges, forensic ℹ button per message, sender avatars, status ticks.*\n\n### Getting in\n\n| | |\n|---|---|\n| ![Case picker](docs\u002Fscreenshots\u002F01_case_picker.png) | ![Dashboard](docs\u002Fscreenshots\u002F02_dashboard.png) |\n| **Case picker** — discover existing `.wfacase` packages or ingest a new extraction. | **Dashboard** — case-wide totals + activity heatmap as soon as a case is loaded. |\n| ![Conversations](docs\u002Fscreenshots\u002F03_conversations.png) | ![Calendar heatmap filter](docs\u002Fscreenshots\u002F06_calendar_heatmap_filter.png) |\n| **Conversations list** — WhatsApp-style with avatars, unread badges, pinned \u002F muted \u002F archived markers, search. | **Calendar heatmap filter** — every day shows its message count like an airline-fare grid; click + drag to filter. |\n\n### Reading a chat\n\n| | |\n|---|---|\n| ![Forensic info panel](docs\u002Fscreenshots\u002F05_forensic_info.png) | ![Edit history popup](docs\u002Fscreenshots\u002F08_edit_history_popup.png) |\n| **Forensic info panel** — every bubble's ℹ button opens this panel: msgstore source IDs, origination flags, SQL provenance, raw JID. | **Edit history popup** — every revision of an edited message side-by-side, with pre-edit text fully visible. |\n| ![Replies sidebar](docs\u002Fscreenshots\u002F07_replies_sidebar.png) | ![Receipt details](docs\u002Fscreenshots\u002F09_receipt_details.png) |\n| **Replies sidebar** — click a reply chain badge → see every reply to that message + a \"Go to original\" button. | **Receipt details** — click any tick → per-recipient delivered\u002Fread\u002Fplayed timeline with millisecond lag. 
|\n| ![Image right-click menu](docs\u002Fscreenshots\u002F10_chat_image_context_menu.png) | ![Find Copies popup](docs\u002Fscreenshots\u002F11_find_copies_popup.png) |\n| **Right-click context menu on a media bubble** — Find Copies (exact SHA-256), Find Similar Images (perceptual), copy IDs \u002F file path \u002F key, open file location. | **Find Copies popup** — every chat that ever shared the same SHA-256, with \"Go to chat\" buttons that jump to the exact message. |\n| ![View-once download](docs\u002Fscreenshots\u002F12_view_once_download.png) | |\n| **View-once recovery** — voice notes \u002F images marked \"view-once\" stay downloadable from the bubble even after the on-device file expired (uses CDN URL + media_key from msgstore). | |\n\n### Browse & search\n\n| | |\n|---|---|\n| ![Media gallery cascading](docs\u002Fscreenshots\u002F13_media_gallery_cascading.png) | ![Media recovery dashboard](docs\u002Fscreenshots\u002F14_media_recovery_dashboard.png) |\n| **Media Gallery** — cascading filters (sender × conversation × date × type × status) on a fast thumbnail grid. | **Media Recovery** — per-conversation breakdown of On-Disk \u002F Downloadable \u002F Expired \u002F Missing-No-Key media, one-click bulk re-download. |\n| ![Image similarity page](docs\u002Fscreenshots\u002F15_image_similarity_page.png) | ![Image similarity results](docs\u002Fscreenshots\u002F16_image_similarity_results.png) |\n| **Image Similarity** — drop a screenshot, browse, or paste from clipboard; pick Exact (SHA-256) or Visual (pHash) match mode. | **Match results** — Exact \u002F Near-Exact \u002F Near-Duplicate \u002F Template-Match tiers across the whole case (89 651 indexed images here). |\n| ![Calls page](docs\u002Fscreenshots\u002F17_calls_page.png) | ![Locations page](docs\u002Fscreenshots\u002F18_locations_page.png) |\n| **Calls page** — call records with type \u002F direction \u002F result filters; the calendar popup shows per-day call count badges. 
| **Locations** — every static + live location share, with start\u002Ffinal coordinates and live-share durations. |\n| ![Polls page](docs\u002Fscreenshots\u002F19_polls_page.png) | |\n| **Polls** — every poll with options + vote tallies; click a row to see the per-option breakdown chart. | |\n\n### Group + contact intelligence\n\n| | |\n|---|---|\n| ![Group Info — owner banner](docs\u002Fscreenshots\u002F24_group_info_owner_banner.png) | ![Group members + former](docs\u002Fscreenshots\u002F26_group_members_former.png) |\n| **Group Info — device-owner banner** — clearly states whether the case-owner is admin \u002F member \u002F removed in this group, plus the decoded `chat.participation_status` source. | **Group members + former** — current roster with role \u002F message \u002F media counts; Former Members section sources from `group_past_participant`, `group_member.is_current=0`, AND message-only inference. |\n| ![Group edit history](docs\u002Fscreenshots\u002F25_group_edit_history.png) | |\n| **Group Edit History** — every name \u002F DP \u002F description \u002F settings change with diff view, sortable by type. | |\n| ![Device history](docs\u002Fscreenshots\u002F22_device_history.png) | ![Contact report](docs\u002Fscreenshots\u002F23_contact_report.png) |\n| **Per-contact device sessions** — every device this contact has used (Primary Android, iPhone, 14 Web\u002FDesktop companions here), first\u002Flast seen, message split, confidence score. | **Contact Activity Report** (HTML) — full forensic identity + per-group activity + 1-on-1 timeline + reactions + groups in common. |\n\n### Reports\n\n| | |\n|---|---|\n| ![Group report dialog](docs\u002Fscreenshots\u002F27_group_report_dialog.png) | ![Top contributors snippet](docs\u002Fscreenshots\u002F28_report_top_contributors.png) |\n| **Group Report dialog** — pick output format (HTML \u002F PDF), restrict to a date range, tick exactly which sections to include. 
| **Top Contributors snippet** — example section from a generated report: ranked bar chart with full JID per contributor, owner-aware. |\n\n### Tagged-messages bundle export\n\n| | |\n|---|---|\n| ![Tagged export dialog](docs\u002Fscreenshots\u002F20_tagged_export_dialog.png) | ![Tagged export viewer](docs\u002Fscreenshots\u002F21_tagged_export_viewer.png) |\n| **Tagged export dialog** — three modes: full conversations \u002F tagged only \u002F tagged ± N day buffer. Bundles the offline HTML viewer + (optionally) the actual media files; output is a single ZIP. | **Tagged export viewer (browser)** — the exported `index.html` opened offline in Chrome: chat list + the tagged conversation; compaction markers count messages hidden between tagged ones. |\n\n---\n\n## Quick start\n\n```bash\n# 1. Clone + install dependencies (Python 3.10+ required)\ngit clone https:\u002F\u002Fgithub.com\u002Fakhil-dara\u002FWAInsight.git\ncd WAInsight\npython -m pip install -r requirements.txt\n\n# 2. Launch the GUI\npython wainsight.py\n```\n\nOn first run the launch screen asks you to either **create a new case** (point at a folder containing `msgstore.db` + friends — the ingester runs all 29 stages with progress events) or **open an existing `.wfacase` package** if one already exists.\n\n> **Forensic note:** WAInsight opens `msgstore.db` with `?mode=ro&immutable=1`. The original evidence file is never modified. 
Every ingestion stage is logged to `chain_of_custody.jsonl` inside the `.wfacase` package, with SHA-256 hashes of every source database read.\n\n### System requirements\n\n- **OS:** Tested on Windows 11\n- **Python:** 3.10 or newer\n- **RAM:** 8 GB minimum, 16 GB recommended for cases with > 1 M messages\n- **Disk:** roughly 2× the size of the source `Media\u002F` folder (for the read-only mirror + thumbnails + indexes)\n- **Optional:** `ffmpeg` on `PATH` for video thumbnails in the Media Dashboard, `pymupdf` (`pip install pymupdf`) for PDF first-page thumbnails\n\n---\n\n## Highlights\n\n**Read-only by construction.** Source `msgstore.db` is mounted with SQLite's `?mode=ro&immutable=1` flag, the case folder is the only writeable surface, and every operation is journaled to `chain_of_custody.jsonl` with timing + source hashes.\n\n**29-stage ingestion pipeline.** Messages → media → calls → mentions → albums → reactions → polls → links → locations → status → revokes → edits → group metadata → past participants → admin events → vcards → comments → pin events → contact resolution → key-id platform classification → device sessions → orphan media discovery → hash-link auto-pass → HD\u002FSD twin linking → motion-photo association → FTS5 indexing → daily\u002Fhourly stats rollup → owner identification → source DB hashing.\n\n**Owner-aware everywhere.** WhatsApp messages from the device owner have `sender_id IS NULL` — every report and every page that joins to `contact` injects the device-owner identity from `case_metadata` so owner activity never surfaces as \"Unknown\" or blank rows.\n\n**Per-message platform attribution — visible while you browse.** Every bubble carries an inline tag — **Android · iPhone · Web\u002FDesktop · Companion #N** — derived at ingest time from the WhatsApp `key_id` length + prefix patterns + companion-device key ID lookups. You see the device that sent each individual message right next to the timestamp, not buried in a separate page. 
Same column drives the Group Report's *Device Platform Usage* breakdown, the contact's *Device Sessions* tab (Primary vs companions, with first\u002Flast seen, message split, confidence score), and the Calls page's call-side classification. Crucial when a case rests on whether a given WhatsApp message came from the suspect's phone or one of their linked Web sessions.\n\n**Folder-shaped Media Dashboard.** A single output folder with `index.html`, sharded AVIF thumbnails (`thumbs\u002Faa\u002Fbb\u002F\u003Csha>.avif`, ~3 KB each), chunked metadata (`data\u002Fmeta_NNN.js`), and a vendored `app.js` that runs a bitset crossfilter + virtual list + IntersectionObserver in the browser. Handles **200k media rows** at file:\u002F\u002F with sub-millisecond facet filtering. Cascading filters by conversation × sender × MIME × extension × status × date, with per-day histogram (flight-fare style), CSV \u002F XLSX \u002F HTML exports, and \"find every chat that shared this hash\".\n\n**Offline HTML viewer bundle.** Hand off a single ZIP — opens from `file:\u002F\u002F`, no Python, no server. WhatsApp-Web-style chat list, full message rendering, FTS5-equivalent search, tagged-messages sidebar, compaction markers between non-adjacent included messages.\n\n**Cross-Contact Analysis.** Pick 2 + contacts → instantly see what they share: groups they're all in, calls between them, file SHA-256 hashes any of them have shared in common, cross @-mentions, every conversation any of them appears in. Owner is a first-class pickable contact.\n\n**Perceptual visual search.** Drop a query image → three confidence tiers of matches across the whole case (pHash + dHash + edge-map). Catches re-shares of the same content even after recompression.\n\n**Media recovery.** Missing-on-device media with valid CDN URL + decrypt key gets one-click re-download (`pycryptodome` for the AES-CBC). 
Hash-linked recovery: a missing message that shares a SHA-256 with a present one is auto-resolved to the sibling's bytes (and tagged `recovery_method='hash_linked'` so it's never confused with a real local copy).\n\n**Forensic info panel on every bubble.** ℹ button on any message → side panel with msgstore source IDs, every SQL row that contributed to the rendered bubble, origination flags decoded, per-device receipt timeline.\n\n---\n\n## Feature catalog\n\nEvery item below was personally designed and built into the tool — this is the deliberate feature set, not a wishlist.  Grouped by what the analyst is actually trying to do.\n\n### Core browsing\n\n|   | Feature |\n|---|---|\n| 1 | **Robust LID + JID parsing** — phone, group, broadcast, newsletter, bot, device suffixes, agent variants, LID privacy-restricted addressing, the lot. |\n| 2 | **WhatsApp-style conversation home** — avatars, last-message preview, time, unread badge, pinned \u002F muted \u002F archived \u002F locked markers. |\n| 3 | **Group name visible inline** in the chat header AND in the in-chat sender label — you always know which group a bubble lives in. |\n| 4 | **Pinned-messages strip** with WhatsApp-style prev\u002Fnext browser → click any pin to jump straight to the pinned message. |\n| 5 | **Forensic ℹ button on every bubble** — opens a side panel with msgstore source IDs, every SQL row that fed the bubble, origination flags decoded, per-device receipt timeline. |\n\n### Message timeline integrity\n\n|   | Feature |\n|---|---|\n| 6 | **Mentions parsed** for every conversation, rendered as click-to-profile chips — clicks open the contact detail page. |\n| 7 | **Ghost-message reconstruction** — deleted-for-everyone messages recovered from `message_quoted_text` and rendered inline next to the revoked bubble. |\n| 8 | **Edit history** per message — pencil pill on every edited bubble opens a side-by-side revision timeline (built from FTS index + quoted-text reconstruction). 
|\n| 9 | **Reply chains** — every quoted message gets a \"↰ N replies\" badge; click → sidebar listing every reply + a \"Go to original\" button (cross-conversation jumps too). |\n| 10 | **60+ system events decoded** — group \u002F security \u002F admin \u002F calls \u002F privacy \u002F business \u002F ephemeral \u002F disappearing-settings — all rendered as readable text instead of opaque type codes. |\n| 11 | **Per-message receipts** — every bubble shows delivered + read ticks; click any tick → per-recipient timeline with delivery \u002F read \u002F played millisecond lag. |\n| 12 | **Forwarded-flag indicator** on every forwarded bubble. |\n\n### Search\n\n|   | Feature |\n|---|---|\n| 13 | **FTS5 global search** with sender \u002F conversation \u002F date \u002F ghost filters; results panel as a sidebar inside the chat with click-to-jump highlights. |\n| 14 | **Calendar filter with per-day message counts** — every cell shows that day's volume, flight-fare style; click + drag to filter. |\n\n### Media analysis\n\n|   | Feature |\n|---|---|\n| 15 | **Media Gallery** with cascading checkbox filters — sender × conversation × date × type × status — over a fast thumbnail grid. |\n| 16 | **One-click + bulk download \u002F decrypt of missing media** — driven by CDN URL + `media_key` + expiry timestamp; AES-CBC decrypt via pycryptodome. **View-once media** (images \u002F voice notes) re-downloadable from the bubble even after the on-device file expired. |\n| 17 | **Re-downloaded media is flagged** — bubble shows a \"Downloaded ✓ (recovered)\" badge so the analyst can tell original-on-device bytes apart from CDN-recovered ones. |\n| 18 | **Hash-link auto-rescue** — if a message's media is missing locally but another message's media has the same SHA-256 on disk, the missing row resolves to the sibling's file and is tagged `recovery_method='hash_linked'` (never confused with a real local copy). 
|\n| 19 | **Thumbnail-only fallback** — when even the bytes are gone, the WhatsApp `thumbnail_blob` is rendered with a \"Thumbnail only\" status pill. |\n| 20 | **HD \u002F SD twin pair surfaced** — every bubble shows both copies with file sizes, \"↗ HD #X\" \u002F \"↘ SD #Y\" cross-jumps, and a \"Download HD\" CTA when only the SD bytes are local. |\n| 21 | **Motion \u002F Live photos** — still parent shows a \"▶ Live\" badge that plays the 1-2 s motion clip on click. |\n| 22 | **Cross-chat share chain** — right-click any media → SHA-256 + encrypted-hash matches across every chat in the case, sorted chronologically, with go-to-message buttons.  Says where the bytes were *first seen*, not just where they were forwarded. |\n| 23 | **Cross-chat share badge in the gallery** — every tile labels how many other chats hold the same SHA-256, click → jump list. |\n| 24 | **Perceptual visual-hash search** — drop a screenshot or pick from a chat: returns Exact \u002F Near-Exact \u002F Near-Duplicate \u002F Template-Match tiers across the whole case.  *Example workflow:* select a PhonePe payment screenshot → find every PhonePe screenshot anyone has ever shared.  Or: pick a camera original from `DCIM\u002F` → find which chats received it. |\n| 25 | **Orphaned-media browser** — files in `Media\u002F` with no surviving message row (cleared chats \u002F reinstall \u002F lost data) plus auto-rescue back-fill against surviving message hashes. |\n\n### Identity & devices\n\n|   | Feature |\n|---|---|\n| 26 | **Per-message device platform attribution** — every bubble carries an inline tag (Android · iPhone · Web\u002FDesktop · Companion #N) derived at ingest from `key_id` length + prefix patterns + companion key-ID lookups, with a confidence score. |\n| 27 | **Per-contact device sessions** in the contact detail page — Primary vs companions (Web\u002FDesktop, linked Android, etc.), first-seen \u002F last-seen, personal vs group message split, confidence per session. 
|\n| 28 | **Unified contact registry** merged from 5 sources — `jid_map`, `wa_contacts`, `lid_display_name`, group labels, mention names — so every JID resolves to a single canonical identity. |\n\n### Calls\n\n|   | Feature |\n|---|---|\n| 29 | **Calls page** with filters by 1-on-1 \u002F multi-person \u002F voice \u002F video \u002F answered \u002F declined \u002F missed; per-day count badges in the calendar picker. |\n| 30 | **Synthetic voice-chat \u002F orphan-call reconstruction** — calls that have no `message` row in their conversation get virtual rows reconstructed so they render in every participant's chat timeline. |\n| 31 | **Group voice chats appear inside the group chat** — even when WhatsApp didn't write a `message` row for them, the call still shows up at its real position in the group's timeline. |\n\n### Groups\n\n|   | Feature |\n|---|---|\n| 32 | **Past participants reconstructed from 3 sources** — `group_past_participant`, `group_member.is_current=0`, AND message-presence inference (catches members WhatsApp's own roster purged after a long enough gap). |\n| 33 | **Owner can-post banner** on every Group Info page — explicit Yes \u002F No with the underlying source row (`chat.participation_status`, group admin flags) so the analyst sees *why*. |\n\n### Reports & exports\n\n|   | Feature |\n|---|---|\n| 34 | **Per-contact forensic report** (HTML or PDF) with full identity, devices, stats, calls, groups in common, mentions, reactions, media & links — choosable sections + save location. |\n| 35 | **Offline ZIP chat export** — WhatsApp-Web-style conversational viewer, opens from `file:\u002F\u002F`, no Python \u002F server, with global cross-conversation search. 
|\n| 36 | **Media Forensics Dashboard** — folder-shaped offline artifact (sharded AVIF thumbnails, chunked metadata, vendored UI engine) that scales to 200k+ media rows; cascading filters, per-day histogram, in-browser CSV \u002F XLSX \u002F HTML export, \"find every chat that shared this hash\" popup. |\n\n---\n\n## Pages\n\nThe sidebar groups 30 pages into **Overview**, **Forensics**, and **More**.\n\n### Overview\n| Page | What it does |\n|---|---|\n| **Dashboard** | Case-wide rollup: totals, top contacts (owner-aware), hourly heatmap, day-of-week breakdown |\n| **Conversations** | Home-style chat list with avatars, unread, pinned\u002Fmuted\u002Farchived markers, search, calendar date filter |\n| **Status Updates** | Status posts with author, view count, reply chain |\n| **Contacts** | Full roster with platform tags, business markers, message counts; per-contact detail page with devices + report button |\n| **Media Gallery** | Cascading-filter thumbnail grid (sender × conversation × date × type × status); right-click → find similar \u002F find shared |\n| **Documents** | Files-only browser with extension rail, risky-extension flagging, find-shared popup, right-click context menu |\n| **Calls** | Call records with type \u002F direction \u002F result filters, search, **per-day count badges in the calendar picker** |\n| **Scheduled Events** | WhatsApp Events: title, time, participants, response counts |\n| **Search** | Global FTS5 with sender \u002F date \u002F conversation \u002F ghost filters + click-to-jump |\n| **Analytics** | Avg\u002Fday, peak day, busiest hour, top contacts (owner included), hourly heatmap, day-of-week bars |\n\n### Forensics\n| Page | What it does |\n|---|---|\n| **Cross-Contact Analysis** | Pick 2 + contacts, see shared groups, calls between them, files in common, cross @-mentions, common conversations |\n| **Ghost Messages** | Deleted-for-everyone messages recovered from `message_quoted_text` with go-to-message |\n| **Edit History** | 
Every edited message + every revision, click-to-jump |\n| **Revoked Messages** | All revoked messages with revocation actor + timestamp |\n| **System Events** | 60+ decoded event types (group, security, admin, calls, privacy, business, ephemeral) |\n| **Media Recovery** | Missing-but-downloadable media with status pills + one-click CDN re-download |\n| **Image Similarity** | Drop a query image → 3-tier matches (pHash \u002F dHash \u002F edge-map) across the whole case |\n| **Orphaned Media** | Files in `Media\u002F` with no surviving message row + auto-rescue back-fill |\n| **Starred Messages** | WhatsApp-starred messages with bundle \u002F CSV \u002F HTML export |\n| **Tagged Messages** | Investigator-applied tags with notes + bundle export (full \u002F tagged-only \u002F tagged + buffer modes) |\n\n### More\n| Page | What it does |\n|---|---|\n| **Locations** | Static + live locations with start \u002F final coordinates, map preview thumbnails, Google Maps links |\n| **Links** | Forensic links browser: domain rail, risky-only filter, top-domain bar chart, sender\u002Fconv\u002Fdate filters, find-shared popup, CSV\u002FHTML export |\n| **Polls** | Poll questions, options, vote tallies, voter identity |\n| **Export** | Offline HTML viewer bundle generator |\n\n---\n\n## Reports\n\n### Group Forensic Report\n\nPer-group landscape-A4 PDF or HTML, generated from any Group Info page via the **Report** button. 
Includes:\n\n- **Case & Evidence Provenance** banner: case id, examiner, source database paths + SHA‑256 hashes, ingestion timestamp\n- **Group Identity**: name, JID, chat_id, conversation_id, type, addressing mode (LID \u002F phone), creator, first\u002Flast message\n- **Device Owner & Send Policy**: owner role in this group + decoded send\u002Fedit\u002Fmembership rules from `wa_group_admin_settings`\n- **Summary** cards: messages \u002F members \u002F admins \u002F media \u002F links \u002F forwards\n- **Group Edit History** with profile-picture diff\n- **Current Members** (compact landscape table): DP · stacked Identity (name + phone + JID + LID) · Role · Msgs · Media · Links · Mentions · stacked Activity (joined \u002F first \u002F last). **Owner sorts first** with amber-highlighted row.\n- **Top Contributors** + **Top Forwarders** (with category breakdown)\n- **Device Platforms** (Android\u002FiPhone\u002FWeb split per member, owner-aware)\n- **Mentions Network** (most-mentioned, most-active mentioners, edge list — all owner-aware)\n- **Activity** (hourly bars + daily mini-chart)\n- **Calls** with category badges + per-call duration + result\n- **Locations** with live-location START + FINAL coordinate cells (Google Maps links)\n- **Media & Links**: 60+-entry message-type taxonomy (Type 64 \u002F 82 \u002F 90 \u002F 92 \u002F 112 \u002F 116 etc. mapped to readable labels) + top link domains\n- **Bot Activity** (Meta AI etc., with per-bot top-summoner ranking)\n- **Former Members**: 3-source resolution (`group_past_participant` ∪ `group_member.is_current=0` ∪ message-only inference) — never silently empty\n\n### Contact Forensic Report\n\nPer-contact PDF or HTML via the contact detail page. Section picker dialog lets the analyst toggle: identity \u002F overall stats \u002F activity patterns \u002F per-group activity \u002F 1-on-1 summary \u002F calls \u002F groups in common \u002F mentions \u002F reactions \u002F media & links. 
**Format selector** (HTML \u002F PDF) and **Save location** picker built into the same dialog. Owner-aware mention rows.\n\n### Media Forensics Dashboard\n\nA separate folder-shaped offline artifact (see [Offline HTML & dashboard exports](#offline-html--dashboard-exports)) that scales to **228k+ media rows** in a single case.\n\n---\n\n## Offline HTML & dashboard exports\n\nThree distinct offline-handoff formats:\n\n### 1. Viewer Bundle (`Export` page)\n\nA single ZIP containing:\n- `index.html` — opens from `file:\u002F\u002F`, no Python, no server\n- WhatsApp-Web-style chat list\n- Full message rendering (incl. ghost \u002F edits \u002F revokes \u002F reactions \u002F quotes \u002F forwarded badges)\n- FTS5-equivalent search across all included messages\n- Tagged-messages sidebar tab when bundle was made from the Tagged Messages page\n- Per-conversation Ctrl+F search bar\n- Compaction markers showing how many messages were collapsed between non-adjacent included messages\n\n### 2. Media Dashboard\n\nA folder with:\n```\noutput_dir\u002F\n  index.html                ← opens in any modern browser at file:\u002F\u002F\n  vendor\u002F  app.css, app.js  ← bundled UI engine, no CDN\n  data\u002F    manifest.js, meta_000.js … meta_NNN.js\n  thumbs\u002F  aa\u002Fbb\u002F\u003Csha>.avif (sharded by hash prefix, deduped)\n```\n\nCascading filters (conversation × sender × MIME × extension × status × date), per-day histogram, top-domains chart, virtual list with IntersectionObserver-driven thumb loading, in-browser CSV \u002F XLSX \u002F HTML exports, \"find every chat that shared this hash\" popup. **Thumbnails:** AVIF when PIL has the plugin (≈3 KB \u002F thumb at 224 px), JPEG fallback. **Disk-priority:** when the original file is on disk we re-render from it for near-original quality; PDF first pages come from the WhatsApp blob (no PyMuPDF dependency required).\n\n### 3. 
Group \u002F Contact PDF reports\n\nLandscape-A4 PDFs rendered through `QWebEngineView.printToPdf` with an off-screen 1400×1800 viewport so wide tables compute proper column widths before printing.\n\n---\n\n## Architecture\n\n```\n┌──────────────────────── .wfacase package ────────────────────────┐\n│                                                                  │\n│  case.json            chain_of_custody.jsonl                     │\n│  analysis.db          analysis.db-shm   analysis.db-wal          │\n│  sources\u002F             read-only copy of msgstore.db, wa.db, …    │\n│  media\u002F               resolved on-disk media tree                │\n│  _gallery_thumbcache.db   per-case L2 thumbnail cache            │\n│  exports\u002F             HTML viewer bundles, dashboards, PDFs      │\n│                                                                  │\n└──────────────────────────────────────────────────────────────────┘\n            ▲                                       ▲\n            │ written by                            │ read-only by\n            │                                       │\n┌───────────┴──────────────┐         ┌──────────────┴────────────┐\n│  backend\u002F                 │         │  gui\u002F  (PySide6)          │\n│   • Orchestrator          │         │   • 30-page navigation    │\n│   • 29 sequential stages  │         │   • Chat viewer = QWebEng │\n│   • progress events       │         │     + QWebChannel bridge  │\n│  CLI: python run_ingest.py│         │  Entry: python wainsight.py│\n└───────────────────────────┘         └───────────────────────────┘\n```\n\n### Two-process model\n\nThe ingester (`backend\u002Frun_ingest.py`) and the viewer (`gui\u002Fmain.py`) are fully decoupled:\n\n- The ingester only writes to `analysis.db` and emits JSON progress events to stdout. It can run headlessly on a server.\n- The viewer only reads from `analysis.db` (`?immutable=1` + `PRAGMA query_only=1`) and the case directory. 
Multiple viewers can open the same case simultaneously.\n\n### Chat viewer\n\nFor chats with > 5 000 messages the viewer switches to a **windowed-flat virtual scroller** (Chrome \u002F QtWebEngine). It keeps a sliding window of 500 fully-rendered messages around the viewport centre and uses sharded `\u003Cscript>`-loaded tiles (100 messages each) plus `data-global-idx` anchors so jumping to message #47162 in a 47k-message chat is O(1). Tombstones for not-yet-loaded windows render as skeleton-shimmer placeholders so scrolling never shows blank gaps.\n\n### Offline-rendering for huge media cases\n\nThe Media Dashboard is **folder-shaped, never one giant `.html`** so V8 string limits (1 GiB) and renderer-process memory caps (4 GiB) don't bite. The folder layout (sharded thumbs, chunked metadata, vendored UI engine) is the established pattern for offline forensic artifacts that need to scale into the hundreds of thousands of rows while still opening from `file:\u002F\u002F` on any modern browser.\n\n---\n\n## Forensic integrity\n\n| Property | How it's enforced |\n|---|---|\n| Source `msgstore.db` is never written | `sqlite3.connect(\"file:msgstore.db?mode=ro&immutable=1\", uri=True)` everywhere |\n| Source files are SHA-256 hashed at ingest | `_stage_hash` writes `source_hash_\u003Cfilename>` rows into `case_metadata` |\n| Every ingest action is journaled | `chain_of_custody.jsonl` — one JSON object per stage with timing + status |\n| Recovered media is tagged | `media.recovery_method` = `original` \u002F `downloaded` \u002F `hash_linked` \u002F `hash_linked_after_delete` \u002F `orphan_recovered` (12-state taxonomy preserved in every report and the Media Dashboard) |\n| Owner identity is explicit | Stored in `case_metadata` as `device_owner_name \u002F phone \u002F jid \u002F lid_jid` and threaded through every report + page section so owner messages never surface as \"Unknown\" |\n| Original IDs preserved | `message.source_msg_id`, `media.source_media_row_id`, 
`contact.source_jid_row_id`, etc. — every analysis row links back to its msgstore.db \u002F wa.db origin row |\n| Timestamps double-encoded | Every report shows local time + UTC in brackets so the case timezone is unambiguous |\n\n---\n\n## Tech stack\n\n| Layer | Tool |\n|---|---|\n| GUI framework | **PySide6** (Qt 6 official Python bindings) |\n| Theming | **qt-material** + custom QSS for light + dark parity |\n| Chat rendering | **QWebEngineView** (Chromium) + `QWebChannel` bridge to Python |\n| Data layer | **SQLite** with FTS5 + custom `analysis.db` schema (47 tables) |\n| Image processing | **Pillow** (with built-in AVIF \u002F WebP \u002F JPEG-XL plugins on Pillow 12+) |\n| Crypto | **pycryptodome** for view-once \u002F encrypted attachment decrypt |\n| Spreadsheet export | **openpyxl** |\n| Protobuf parsing | **protobuf** (used for some msgstore inner blobs) |\n| Optional: video thumbnails | **ffmpeg** on PATH |\n| Optional: PDF thumbnails | **PyMuPDF** (`pip install pymupdf`) |\n| Tested OS | Windows 10 \u002F 11 (primary) |\n\n---\n\n## Repository layout\n\n```\nWAInsight\u002F\n├── wainsight.py                     # GUI launcher\n├── requirements.txt\n├── README.md                        # ← you are here\n├── LICENSE\n│\n├── backend\u002F                         # Pure-Python: no Qt imports\n│   ├── run_ingest.py                # Headless ingest CLI\n│   └── app\u002F\n│       ├── ingestion\u002F               # 29 stage modules + orchestrator\n│       │   ├── orchestrator.py\n│       │   ├── message_ingester.py\n│       │   ├── media_ingester.py\n│       │   ├── call_ingester.py\n│       │   ├── revoke_ingester.py\n│       │   ├── edit_ingester.py\n│       │   ├── contact_resolver.py\n│       │   ├── orphaned_media_ingester.py\n│       │   ├── keyid_classifier.py\n│       │   └── …\n│       ├── db\u002Fschema.py             # 47-table analysis schema\n│       ├── reports\u002F                 # Report generators\n│       │   ├── group_report.py\n│       
│   ├── contact_report.py\n│       │   ├── media_report.py      # Folder-shaped dashboard generator\n│       │   └── dashboard_assets\u002F    # index.html + app.css + app.js\n│       └── export\u002F                  # Offline HTML viewer bundle\n│           ├── viewer_bundle_exporter.py\n│           └── viewer_assets\u002F       # index.html \u002F viewer.js \u002F viewer.css\n│\n├── gui\u002F                             # PySide6 only\n│   ├── main.py\n│   └── app\u002F\n│       ├── views\u002F\n│       │   ├── pages\u002F               # 30 page modules\n│       │   │   ├── chat_viewer_page.py\n│       │   │   ├── group_info_page.py\n│       │   │   ├── contact_detail_page.py\n│       │   │   ├── media_gallery_page.py\n│       │   │   ├── documents_page.py\n│       │   │   ├── calls_page.py\n│       │   │   ├── links_page.py\n│       │   │   ├── cross_contact_page.py\n│       │   │   └── …\n│       │   ├── widgets\u002F             # Chat renderer JS, calendar heatmap, etc.\n│       │   │   ├── chat_renderer.js   # Windowed-flat virtual scroller\n│       │   │   ├── chat_styles.css\n│       │   │   ├── chat_web_view.py   # QWebEngineView host\n│       │   │   ├── chat_bridge.py     # QWebChannel bridge\n│       │   │   └── calendar_heatmap.py\n│       │   └── dialogs\u002F             # Report \u002F export \u002F tag dialogs\n│       ├── services\u002F                # Database, ThemeManager, MediaCrypto, ImageSimilarity\n│       └── resources\u002Fthemes\u002F        # light.qss, dark.qss\n│\n└── shared\u002F                          # Used by both backend + gui\n    ├── system_event_formatter.py    # 60+ system event types → human text\n    └── forensic_provenance.py       # Bubble's ℹ side-panel data builder\n```\n\n---\n\n## Roadmap\n\nCurrently planned (no firm dates):\n\n- **WhatsApp Business** support — the Business app uses a similar but distinct schema (extra columns for catalogue, orders, quick-replies, labels, business-profile metadata).  
The plan is to add a parallel ingester so that cases mixing personal + Business accounts can be analysed in the same UI.\n- Optional GPU acceleration for the perceptual-hash search on very large cases\n\nPull requests are welcome. The codebase is heavily commented in the doc-string-driven style — most files start with a multi-paragraph \"why this exists\" header so newcomers can find their feet quickly.\n\n---\n\n## Acknowledgements\n\nThe schema research, the 29-stage ingestion pipeline, and the 30 analysis pages here are my own work — built up over many months of reverse-engineering `msgstore.db` + `wa.db`, the `Media\u002F` layout, WhatsApp's `key_id` patterns and the device-companion key-ID space, then iterating against real cases.\n\nPart of that work was usefully **cross-checked** against the published research of:\n\n> **Francisco Arenaz Benito** — *Análisis forense de la aplicación WhatsApp en sistemas Android e iOS*\n> Ediciones Universidad de Salamanca · Ágora Policial · ISBN **978-84-1091-202-1**\n> [eusal.es \u002F 978-84-1091-202-1](https:\u002F\u002Feusal.es\u002Fproducto\u002Fanalisis-forense-de-la-aplicacion-whatsapp-en-sistemas-android-e-ios\u002F)\n\nHis book confirmed several hypotheses I'd already formed and saved time on validation — credit and thanks where due.  Anyone serious about WhatsApp forensics on Android should read it.\n\nThe other accelerator during development was my own companion tool, open-sourced separately:\n\n> **SQLite GUI Analyzer** — [github.com\u002Fakhil-dara\u002Fsqlite-gui-analyzer](https:\u002F\u002Fgithub.com\u002Fakhil-dara\u002Fsqlite-gui-analyzer)\n\nThat GUI is what made the schema-mapping work tractable.  My actual workflow when reverse-engineering a WhatsApp table looked like:\n\n1. Open `msgstore.db` in the analyzer.\n2. Search a known value globally — e.g. paste a poll's `_id`, a message's `key_id`, a JID, a SHA-256 — and let the analyzer scan **every column of every table** for matches.\n3. 
Double-click each hit to open that row in its own panel; line up several panels **side-by-side** so I can see the same value lighting up in three or four related tables at once.\n4. Right-click → **Copy schema** for each table I'm comparing, paste into a scratch buffer, annotate.\n5. Pick an exact-match search on the next ID, validate the foreign-key relationship directly against my own evidence DB instead of re-reading PRAGMA output.\n\nThat click-search-validate loop is faster than reading the schema text on its own and is what surfaced things like the `message_quoted_text` ghost-recovery path, the album parent\u002Fchild linkage, the `chat.participation_status` owner-role encoding, and the HD\u002FSD twin association — all things you'd otherwise miss if you only looked at the static schema.\n\nIf you're trying to ingest a different app's SQLite store, that's the tool I'd recommend starting with.\n\n### Per-message platform attribution — separate research\n\nThe Android \u002F iPhone \u002F Web \u002F companion-device tag that ships on every message bubble is a **separate piece of empirical research**, *not* something the SQLite GUI Analyzer surfaced.  
I built the classifier the hard way:\n\n- Started by noticing a recurring prefix pattern in the `key_id` column on every message I'd sent from my own Android handset.\n- Asked friends on iPhone to send me chat exports + collected JIDs of friends who messaged me from iPhones — different `key_id` shape, with its own consistent prefix length \u002F charset.\n- Repeated the exercise for **WhatsApp Web \u002F Desktop** sessions and for linked-companion devices (the secondary Android \u002F iPad \u002F Web sessions WhatsApp lets you attach).\n- Cross-referenced enough samples to write a robust classifier with a confidence score per message — that's what powers the inline **Android · iPhone · Web\u002FDesktop · Companion #N** tag on every bubble + the per-contact *Device Sessions* table on the contact detail page.\n\nAcknowledging this honestly: this part wasn't accelerated by any tool — it was patient sample collection from real users on real devices, then iterating until the rule set held up against new data.\n\n---\n\n## License\n\n[MIT](LICENSE) — see the LICENSE file for the full text.\n\nWAInsight is provided **as-is** for legitimate digital-forensic and incident-response work. Use of this tool against extractions you do not have legal authority to analyse is your responsibility. The authors disclaim liability for misuse.\n\n---\n\n\u003Cdiv align=\"center\">\n\n**WAInsight** — built with care for the forensic community.\n\nFound a bug?  Open an issue.\n\n\u003C\u002Fdiv>\n","WAInsight is a forensic analysis tool for already-acquired WhatsApp Android databases. It imports these database files into a read-only, normalised case database and provides a 30-page desktop user interface in which users can browse every chat just as on the WhatsApp home screen, including the chat list, message bubbles, edit history, and revoked messages. The tool also offers a range of forensic pages, such as media recovery, visual-hash search, contact and group reports, and offline export bundles. WAInsight is intended for digital forensics teams, law-enforcement examiners, and incident responders, helping them analyse WhatsApp data in depth without modifying the evidence. It is built on Python 3.10+ and PySide6, ensuring high performance and cross-platform compatibility.","2026-06-11 04:02:25","CREATED_QUERY"]