[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83335":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":10,"trendingCount":15,"starSnapshotCount":15,"syncStatus":33,"lastSyncTime":34,"discoverSource":35},83335,"textsnap","kouhxp\u002Ftextsnap","kouhxp","Snap any image, screenshot, or webpage into plaintext. No GPU. No cloud. One command.","",null,"Python",94,1,63,0,6,31,21,0.9,"MIT License",false,"main",true,[25,26,27,28,29,30],"image2text","ocr","onnx","paddlepaddle","screenshot","transcription","2026-06-12 02:04:33","# textsnap\n\n> **Snap any image, screenshot, or webpage into plaintext. No GPU. No cloud. One command.**\n\n![textsnap demo](demo-textsnap.jpg)\n\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.9+-blue)\n![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green)\n![Platforms](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fplatforms-macOS%20%7C%20Linux%20%7C%20Windows-lightgrey)\n\n```\ntextsnap screenshot.png\n```\n\nThat's it. You get a `.txt` file in `.\u002Ftextsnaps\u002F`, recognized on your CPU, from a screenshot, a photo, an image URL, or even a webpage.\n\n---\n\n## Why textsnap\n\n- ⚡ **Runs on CPU.** A 0.9B PaddleOCR-VL-1.5 vision-language model, quantized to q4 ONNX, parses full pages on a plain laptop. No CUDA. No M-series-only tricks. 
Plain old cores, pinned to your physical-core count.\n- 🖼 **Images, screenshots, URLs, webpages.** Point it at a local file, a direct image URL, or a full article URL — it isolates the main content and OCRs the most prominent image. Or OCR straight from your clipboard with no argument at all — and get the text put *back* on your clipboard, ready to paste.\n- 📴 **Offline after first run.** ~890 MB of ONNX downloads once to your cache and stays there. No API keys. No quotas. Your images never leave your machine.\n- 🎒 **Portable.** Drop the model files next to the script and the whole folder becomes a self-contained, copy-anywhere tool — no install, no download, no flags.\n- 🪶 **One file.** The whole tool is a single Python module. Dependencies install themselves on first run if missing.\n- 📝 **Markdown or plaintext.** Default output is the model's native markdown (tables, headings, structure preserved). Add `--plaintext` to flatten it.\n\n---\n\n## Quickstart\n\n```\n# Install\npip install textsnap\n\n# Snap something\ntextsnap screenshot.png\ntextsnap https:\u002F\u002Fexample.com\u002Farticle --plaintext\ntextsnap photo.jpg -o ~\u002Fnotes\u002Freceipt.txt\n```\n\nThe first run downloads the model (~890 MB). Every run after is offline.\n\n---\n\n## What it handles\n\n| Source            | Example                                  |\n| ----------------- | ---------------------------------------- |\n| Clipboard         | `textsnap` *(no argument)*               |\n| Local image file  | `textsnap path\u002Fto\u002Fimg.png`               |\n| Direct image URL  | `textsnap https:\u002F\u002Fexample.com\u002Fx.png`     |\n| Webpage URL       | `textsnap https:\u002F\u002Fexample.com\u002Farticle`   |\n\nLocal files cover anything Pillow can decode: `.png`, `.jpg`, `.jpeg`, `.webp`, `.bmp`, `.gif`, `.tiff`, and friends. 
For webpage URLs, textsnap uses readability to isolate the main content, then picks the most prominent image on the page and OCRs that.\n\n---\n\n## Clipboard in, clipboard out\n\nRun `textsnap` with **no argument** and it reads the image currently on your clipboard. The recognized text is then copied **straight back to the clipboard**, so a screenshot-to-text round trip is just: snap → `textsnap` → paste.\n\nThe `.txt` file is still written as well (and its path still printed to stdout), so nothing about scripting changes — the clipboard copy is a pure convenience layered on top.\n\nClipboard-out uses your platform's native tool — `pbcopy` (macOS), `clip` (Windows), or `wl-copy` \u002F `xclip` \u002F `xsel` (Linux) — so it needs no extra Python package. If none of those is installed, textsnap simply skips the clipboard copy; the `.txt` file is always there regardless. (Run with `-v` to see whether the copy succeeded.)\n\n---\n\n## Portable mode\n\nBy default textsnap downloads its model files to an OS cache directory (`~\u002F.cache\u002Ftextsnap\u002F`). But if it finds the model files **sitting next to the script**, it uses those directly — no download, no `--model-dir` flag, no setup at all.\n\n\"Next to the script\" means a layout like:\n\n```\ntextsnap\u002F\n├── textsnap.py\n├── onnx\u002F\n│   ├── vision_encoder_q4.onnx\n│   ├── decoder_q4.onnx\n│   └── embedding.onnx\n└── tokenizer.json\n```\n\nDrop those files in, and you can copy the entire `textsnap\u002F` folder to any machine — a USB stick, an air-gapped box, a fresh laptop — and run it immediately, fully offline, with zero install steps.\n\nModel-directory resolution order:\n\n1. `--model-dir DIR` — if you pass it explicitly, it always wins.\n2. **Portable** — model files found next to the script.\n3. 
**OS cache** — `~\u002F.cache\u002Ftextsnap\u002F`, downloading on first run if needed.\n\n> Like `--model-dir`, portable-mode files are **not** SHA-256 verified — files you placed there yourself are trusted by definition. Integrity verification applies to files textsnap *downloads*. See [Security](#security).\n\n---\n\n## Install\n\n```\npip install textsnap\n```\n\nInstalls two equivalent commands on your `PATH`: **`textsnap`** (canonical) and **`ocr`** (alias, for when the name slips your mind).\n\nTo install from a local source checkout instead:\n\n```\npip install .\n```\n\nFor a reproducible install with exact pinned dependency versions:\n\n```\npip install -r requirements-lock.txt\npip install .\n```\n\n> **Clipboard note.** Reading images *from* the clipboard relies on Pillow's `ImageGrab`; on Linux you may need `xclip` or `wl-clipboard` installed. Writing recognized text *back* to the clipboard uses `pbcopy` \u002F `clip` \u002F `wl-copy` \u002F `xclip` \u002F `xsel`. macOS and Windows work out of the box.\n\n---\n\n## Usage\n\n```\n# Clipboard (no argument) — text is also copied back to the clipboard\ntextsnap\n\n# Local image file\ntextsnap path\u002Fto\u002Fscreenshot.png\n\n# Direct image URL\ntextsnap \"https:\u002F\u002Fexample.com\u002Fdiagram.png\"\n\n# Webpage — OCRs the most prominent image on the page\ntextsnap \"https:\u002F\u002Fexample.com\u002Farticle\"\n\n# Flatten the model's markdown to plain text\ntextsnap input.png --plaintext\n\n# Custom output path\ntextsnap input.png -o .\u002Fout\u002Fextracted.txt\n\n# Raise the token cap for very dense pages\ntextsnap dense-page.png --max-tokens 4096\n\n# Trade accuracy for speed by shrinking the image budget\ntextsnap input.png --max-pixels 250000\n\n# Use a local model directory instead of downloading\ntextsnap input.png --model-dir ~\u002Fmodels\u002Fpaddleocr-vl\n```\n\n---\n\n## Output\n\nPlaintext, UTF-8. 
Default location is `.\u002Ftextsnaps\u002F` (created if missing) under the current working directory; override with `-o`. The filename is derived from the image filename stem (`receipt_ocr.txt`), or from the webpage slug for URL inputs.\n\ntextsnap is quiet by default, Unix-style: the **only** thing printed to stdout is the path to the file it wrote, so it composes cleanly —\n\n```\nOUT=$(textsnap receipt.png)   # capture the path\ntextsnap receipt.png | xargs cat   # print the recognized text\n```\n\nWhen the input is the clipboard, the recognized text is *also* placed on the clipboard — see [Clipboard in, clipboard out](#clipboard-in-clipboard-out).\n\nPass `-v` to send progress diagnostics (input type, image size, decode speed, token counts) to **stderr**; stdout stays just the path either way.\n\nDefault file output is the model's **native markdown** — it preserves tables, headings, and document structure:\n\n```\n# Quarterly Report\n\n| Region | Revenue |\n| ------ | ------- |\n| EMEA   | $1.2M   |\n| APAC   | $0.9M   |\n```\n\nWith **`--plaintext`**, markdown is flattened to bare text:\n\n```\nQuarterly Report\n\nRegion Revenue\nEMEA $1.2M\nAPAC $0.9M\n```\n\n---\n\n## Flags\n\n| Flag                  | Description                                                          |\n| --------------------- | -------------------------------------------------------------------- |\n| `-o`, `--output`      | Output `.txt` path. Default: `.\u002Ftextsnaps\u002F\u003Cname>_ocr.txt`.           |\n| `-v`, `--verbose`     | Print progress diagnostics to stderr. Off by default.                |\n| `--plaintext`         | Flatten the model's native markdown to plain text.                   |\n| `--model-dir`         | Use ONNX\u002Fconfig files from this directory. Overrides portable mode and the OS cache. |\n| `--max-tokens`        | Cap generated tokens. Default `2048`. Raise it for very dense pages. |\n| `--max-pixels`        | Image pixel budget fed to the vision encoder. 
Default is the model's maximum. Lower trades accuracy for speed; too low makes the model hallucinate. The image is only ever shrunk, never enlarged. |\n| `--no-verify`         | Skip SHA-256 verification of downloaded model files (not advised).   |\n| `--generate-checksums`| Download the pinned model files, write a fresh manifest, and exit.   |\n\nAn environment variable, `TEXTSNAP_DECODE_THREADS`, overrides the decoder's intra-op thread count if you want to tune CPU decode for a specific machine. Left unset, textsnap picks a sensible default based on your physical core count.\n\n---\n\n## Security\n\ntextsnap auto-downloads ~890 MB of model weights from the Hugging Face Hub on first run, so it treats those files as untrusted until proven otherwise:\n\n- **Pinned model revision.** Downloads are pinned to a specific repo revision, so a moved or retagged `main` can't silently swap the weights.\n- **SHA-256 verification.** Every downloaded file is hashed and checked against known-good digests before it's loaded. A mismatch aborts the run with a clear error rather than executing unverified weights. Digests live in [`model_checksums.sha256`](model_checksums.sha256) and are also embedded in the script as a fallback, so verification works whether you install from source or from a wheel.\n- **Pinned dependencies.** [`requirements-lock.txt`](requirements-lock.txt) pins exact dependency versions for reproducible installs; the file documents how to add per-wheel `--hash` entries with `pip-compile --generate-hashes` for full supply-chain pinning.\n\nVerification applies to files textsnap **downloads**. 
Model files you supply yourself — via `--model-dir` or [portable mode](#portable-mode) — are trusted as-is and not re-hashed; you are responsible for their provenance.\n\nRegenerate the checksum manifest after a deliberate model-revision bump:\n\n```\ntextsnap --generate-checksums\n```\n\nTo bypass verification (for local experimentation with a modified model), pass `--no-verify`.\n\n---\n\n## How it works\n\n1. **Load.** From the clipboard, a local file, a direct image URL, or — for a webpage URL — the most prominent image inside the page's main content (readability + a prominence heuristic).\n2. **Preprocess.** The image is run through PaddleOCR-VL's Qwen2-VL-style smart-resize and patchify, producing the pixel-value tensor and grid the vision encoder expects. Smart-resize bounds the image to the model's pixel budget (tunable with `--max-pixels`) and snaps it to the patch grid — textsnap does not pre-shrink beyond that, since starving the encoder of resolution makes the model hallucinate rather than degrade gracefully.\n3. **Recognize.** Three ONNX components run on CPU: a vision encoder (q4), a token-embedding model (fp32), and an autoregressive decoder (q4) whose KV cache is bound via ONNX Runtime IOBinding so it isn't copied on every step. Decoding is greedy, guarded against runaway repetition by an n-gram block (it refuses to re-emit an n-gram it has already produced) plus a loop detector that trims any cycle that slips through.\n4. **Format.** Native markdown by default; `--plaintext` reduces it to bare text.\n\nNo image is sent anywhere. 
No state is kept between runs except the cached model.\n\n---\n\n## Model & cache\n\nThe PaddleOCR-VL-1.5 ONNX components are downloaded on first run to `~\u002F.cache\u002Ftextsnap\u002F`:\n\n- `onnx\u002Fvision_encoder_q4.onnx` — vision encoder + spatial-merge projector\n- `onnx\u002Fdecoder_q4.onnx` — autoregressive decoder\n- `onnx\u002Fembedding.onnx` — token embeddings (fp32; no q4 variant exists)\n- `tokenizer.json`, `config.json`\n\nTogether ~890 MB. To use your own copy, either point `--model-dir` at a directory containing the same `onnx\u002F` files plus `tokenizer.json` and `config.json`, or place those files next to the script for [portable mode](#portable-mode).\n\n---\n\n## Notes & limits\n\n- **First run is the slow one** — it downloads ~890 MB. After that, textsnap is fully offline.\n- **CPU decode is sequential.** Dense, full-page documents take longer than a short screenshot. textsnap pins thread counts to your physical cores, and with `-v` it prints a live tokens\u002Fsec readout to stderr, so you can see that a slow run is still working and has not hung.\n- **`--max-tokens` caps the output.** Very dense pages can hit the default 2048-token cap and truncate; raise it if the tail of a page is missing.\n- **`--max-pixels` is a speed\u002Faccuracy dial.** Lowering it speeds up the vision encoder but feeds the model a coarser image; set it too low and recognition quality drops sharply. 
The default (the model's full budget) is the safe choice.\n- **Webpage inputs OCR one image** — the most prominent one in the main content, not the whole rendered page.\n- **Greedy decoding** can occasionally loop on repetitive layouts; an n-gram block prevents most loops outright and a detector trims any that remain.\n\n---\n\n## License\n\nMIT for this project — see [LICENSE](LICENSE).\n\nThe model is **PaddleOCR-VL-1.5**, distributed under Apache-2.0 by PaddlePaddle; textsnap pulls the ONNX export from [`onnx-community\u002FPaddleOCR-VL-1.5-ONNX`](https:\u002F\u002Fhuggingface.co\u002Fonnx-community\u002FPaddleOCR-VL-1.5-ONNX). See the [original model card](https:\u002F\u002Fhuggingface.co\u002FPaddlePaddle\u002FPaddleOCR-VL-1.5) for model terms. Powered by [onnxruntime](https:\u002F\u002Fonnxruntime.ai\u002F) and [huggingface_hub](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fhuggingface_hub).\n",2,"2026-06-11 04:10:57","CREATED_QUERY"]