[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-83240":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":10,"trendingCount":15,"starSnapshotCount":15,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},83240,"papernews","marcj\u002Fpapernews","marcj","news and articles nicely rendered as calm pdf for readers","",null,"Python",133,15,3,0,4,48,26,3.61,"MIT License",false,"main",true,[25,26,27],"news","paper","remarkable","2026-06-12 02:04:32","# papernews\n\n![papernews on a reMarkable, next to a cup of coffee](assets\u002Fhero.jpg)\n\nEvery news site looks different. Hacker News, MacRumors, Quanta, my\nfavourite ML blog, my favourite math blog — each one its own layout, fonts,\ncolors, ads. To read anything I had to wade through somebody's design\nchoices first and focus past the visual noise.\n\nI much prefer reading the way a LaTeX paper or an old magazine looks: quiet\ntypography, generous margins, no color, nothing competing for attention.\n\n**papernews** is the fix. A script pulls all those feeds, has Claude clean\nup, translate to English, and rewrite the article bodies — the **full\ntext**, not just summaries — and renders the result into one consistently\ntypeset LaTeX PDF. Every article is *in* the PDF; you read entirely\noffline, no clicking through, no opening tabs.\n\nA side benefit I didn't expect to like but very much do: one place to read\nthe day's news instead of five tabs being refreshed all day. 
One or two\nissues per day, no more.\n\nDesigned for an e-ink reader like the reMarkable, but it works just as well\nin any browser's PDF viewer.\n\n**👉 [See `sample-2026-06-04.pdf` for a real day's output.](sample-2026-06-04.pdf)**\n\n## Status\n\nHobby project; works. Things will move. Expect rough edges.\n\n## How to use\n\nYou need: a machine that can run Docker (your laptop, a NAS, a $5\u002Fmo VPS,\nanything), an [Anthropic API key](https:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fkeys),\nand ~2 GB of disk for the image.\n\n```bash\n# 1) Pull\ngit clone https:\u002F\u002Fgithub.com\u002Fmarcj\u002Fpapernews\ncd papernews\n\n# 2) Configure your key\ncp .env.example .env\n$EDITOR .env             # paste ANTHROPIC_API_KEY=sk-ant-...\n\n# 3) Pick your sources\n$EDITOR sources.toml     # add\u002Fremove RSS\u002FHN entries, set per-source limits\n\n# 4) (Optional) Tweak the look\n$EDITOR papernews\u002Ftemplate.tex.j2\n\n# 5) Build + run\ndocker compose up --build -d\n\n# Open http:\u002F\u002Flocalhost:8000\n# First PDF builds on demand and is cached. Background ingest runs every 4h.\n```\n\nEverything you'd normally want to change is in **two files**:\n\n- **`sources.toml`** — which feeds, how many items per feed, in what order.\n  Two source kinds today: `kind = \"hn\"` (Hacker News, top-by-points via the\n  Algolia API) and `kind = \"rss\"` (any Atom\u002FRSS feed via feedparser).\n- **`papernews\u002Ftemplate.tex.j2`** — the LaTeX template. Page size, fonts,\n  colors, layout, what goes on the cover, everything. Edit, restart the\n  container, refresh `\u002Fdigest.pdf`.\n\nOptional but useful:\n\n- **`papernews\u002Fsummarize.py`** + **`papernews\u002Frewrite.py`** — the Claude\n  system prompts. Change `_MODEL` to `claude-sonnet-4-6` for fancier\n  rewrites at ~10× the cost; adjust `_SYSTEM` to change the editorial voice\n  (e.g. 
disable the auto-translate-to-English rule).\n- **`papernews\u002Fwiki.py`** — what goes into the World news block and the\n  Quote-of-the-day source.\n\n### Getting the PDF onto a reMarkable\n\nA few different ways, no special script needed:\n\n- **Manual** — open `http:\u002F\u002Fyour-machine:8000\u002Fdigest.pdf` in a browser on\n  your phone\u002Flaptop and upload it to your reMarkable from there (drag-and-\n  drop on `my.remarkable.com`, or the reMarkable mobile app, or the USB Web\n  Interface at `http:\u002F\u002F10.11.99.1` while connected by USB).\n- **[`rmapi`](https:\u002F\u002Fgithub.com\u002Fddvk\u002Frmapi)** — a third-party CLI that\n  pushes files to your reMarkable cloud account. Pair once, then:\n  ```bash\n  curl -s http:\u002F\u002Fyour-machine:8000\u002Fdigest.pdf -o today.pdf\n  rmapi put today.pdf \u002FPapernews\n  ```\n  Stick that two-liner in cron on the host and the device picks it up on\n  next sync automatically.\n- **[Remailable](https:\u002F\u002Fgithub.com\u002Fremailable\u002Fremailable)** — a third-party\n  email-to-reMarkable bridge ([remailable.getneutrality.org](https:\u002F\u002Fremailable.getneutrality.org)).\n  You email the PDF as an attachment to your assigned address and it appears\n  on the device. Useful if your papernews host can `mail`\u002F`mutt` but can't\n  reach the reMarkable directly. 
(reMarkable has no first-party\n  email-to-device; do not believe earlier versions of this README that\n  implied otherwise.)\n\nNo native push is built in because everyone's setup is different and you\nprobably don't want me poking your reMarkable cloud account with your token.\n\n## Quick start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fmarcj\u002Fpapernews\ncd papernews\ncp .env.example .env\n# paste your ANTHROPIC_API_KEY into .env (get one at\n# https:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fkeys)\ndocker compose up --build\n```\n\nThen visit `http:\u002F\u002Flocalhost:8000` for a landing page with a preview image and a\nlink to `\u002Fdigest.pdf`. The first PDF builds on demand, takes ~1–2 minutes the\nfirst time, and is then cached until new content arrives.\n\nState lives in `.\u002Fdata\u002Fstate.db` (bind-mounted from the host) so it survives\ncontainer restarts.\n\n## What it produces\n\nA 100–200 page PDF with:\n\n- **Cover page**: title + date + article count, quote of the day from\n  Wikiquote, a \"World news\" block (5 tech headlines + 2 Western items from\n  Wikipedia's Current Events portal, each compressed to a single sentence).\n- **Contents**: every article grouped by source, with dot-leaders to its\n  publication date.\n- **\"Did you know…\"** trivia nuggets from Wikipedia's Main Page.\n- **The articles themselves**, set in two-column Latin Modern with proper\n  paragraph indents, hyphenation, microtypography. Math (`$x = y$`,\n  `$$\\int f$$`, `\\(...\\)`, `\\[...\\]`) is rendered as real LaTeX math. Code\n  blocks (fenced or inline) come through in monospace.\n- All non-English source content (heise, etc.) is translated to English\n  during the rewrite step. 
You can disable that in the prompt if you don't\n  want it.\n\n### Cover page\n\n[📄 See the full sample PDF →](sample-2026-06-04.pdf)\n\n[![Cover page: title, quote of the day, world news, table of contents](assets\u002Fcover.png)](sample-2026-06-04.pdf)\n\n### Article body\n\n[📄 See the full sample PDF →](sample-2026-06-04.pdf)\n\n[![A typical two-column article page, set in Latin Modern](assets\u002Farticle.png)](sample-2026-06-04.pdf)\n\n## Architecture\n\n```\n                   sources.toml\n                       │\n            ┌──────────┴──────────┐\n            │                     │\n            ▼                     ▼\n       ┌────────┐            ┌────────┐\n       │ gather │            │ wiki\u002F  │\n       │  HN +  │            │ news + │\n       │  RSS   │            │  QOTD  │\n       └───┬────┘            └───┬────┘\n           ▼                     │\n       ┌────────┐                │\n       │extract │                │\n       │ (traf- │                │\n       │  ilatura)               │\n       └───┬────┘                │\n           ▼                     │\n       ┌─────────┐               │\n       │summarize│ ─── Claude    │\n       └───┬─────┘               │\n           ▼                     │\n       ┌─────────┐               │\n       │ rewrite │ ─── Claude    │\n       └───┬─────┘               │\n           ▼                     ▼\n       SQLite store (state.db)   in-memory\n           │                     │\n           └──────────┬──────────┘\n                     ▼\n              ┌──────────┐\n              │  render  │ ── xelatex\n              └────┬─────┘\n                   ▼\n             archive\u002Fcache\u002F\u003Chash>.pdf\n```\n\nFour stages, each idempotent and resumable:\n\n1. **gather** — pulls new items from each source, runs `trafilatura` to\n   extract the article body, stores the raw text. Pure I\u002FO — no LLM cost.\n2. 
**summarize** — batches up to 8 articles per Claude call and produces a\n   ≤40-word two-sentence summary for each (used as the lede in the front\n   matter and in the contents listing).\n3. **rewrite** — batches up to 8 articles per Claude call (streamed because\n   the output is long) and produces a clean, properly-paragraphed,\n   translated-to-English version of each article body for the renderer.\n   Preserves code fences and `$math$` exactly.\n4. **render** — pulls the latest N articles per source from the store,\n   plus fresh world news + quote + DYK, and runs them through a Jinja\n   template into xelatex → PDF. Results are cached by a hash of \"what's in\n   the store\" + \"what's in sources.toml\". Same content + same config → same\n   cached PDF served instantly.\n\nA background `APScheduler` job runs steps 1–3 every 4 hours (configurable).\nThe render step is on-demand; the first hit to `\u002Fdigest.pdf` after an ingest\nbuilds the PDF and caches it.\n\n## HTTP endpoints\n\n| route          | what it does                                            |\n|----------------|---------------------------------------------------------|\n| `GET \u002F`        | minimal landing page, cover preview + Read PDF link     |\n| `GET \u002Fdigest.pdf` | the current edition (built on demand, then cached)   |\n| `GET \u002Fpreview.png` | page 1 rasterized at 180 DPI                        |\n| `GET \u002Fsources` | JSON list of configured sources + latest `fetched_at`   |\n| `GET \u002Fhealthz` | liveness probe (returns `ok`)                           |\n| `POST \u002Fingest` | manual kick of the gather → summarize → rewrite cycle   |\n\n## Configuring sources\n\nSources live in [`sources.toml`](sources.toml) — that's the exact file used\nto produce [the sample PDF](sample-2026-06-04.pdf). 
Open it, copy a block,\nedit, restart the container, refresh `\u002Fdigest.pdf`.\n\nThe order of `[[source]]` blocks in the file is the order they'll appear in\nthe PDF — sources at the top come first. World news, quote of the day, and\nthe \"Did you know…\" nuggets are not configured here — they're cover\ndecorations, fetched fresh on every render.\n\n### `kind = \"hn\"` — Hacker News via the Algolia search API\n\nRanks stories by points within a time window. No URL needed; the API is\nhardcoded.\n\n| field          | type | default | meaning |\n|----------------|------|---------|---------|\n| `name`         | string | required | display label (also the contents-page heading) |\n| `kind`         | string | required | must be `\"hn\"` |\n| `limit`        | int  | `10`     | how many top stories to keep |\n| `since_hours`  | int  | `48`     | only consider stories submitted in the last N hours |\n| `min_points`   | int  | `50`     | story must have at least this many points to qualify |\n\n```toml\n[[source]]\nname        = \"Hacker News\"\nkind        = \"hn\"\nlimit       = 10\nsince_hours = 48\nmin_points  = 100\n```\n\n### `kind = \"rss\"` — any Atom\u002FRSS feed\n\nParsed with [feedparser](https:\u002F\u002Ffeedparser.readthedocs.io\u002F), so it accepts\nRSS 0.9\u002F1.0\u002F2.0 and Atom 1.0 — every blog and most news sites work.\n\n| field   | type   | default  | meaning |\n|---------|--------|----------|---------|\n| `name`  | string | required | display label (also the contents-page heading) |\n| `kind`  | string | required | must be `\"rss\"` |\n| `url`   | string | required | feed URL |\n| `limit` | int    | `20`     | take at most N most-recent items |\n\n```toml\n[[source]]\nname  = \"Quanta Magazine\"\nkind  = \"rss\"\nurl   = \"https:\u002F\u002Fwww.quantamagazine.org\u002Ffeed\u002F\"\nlimit = 8\n```\n\n### Per-source ordering and limits in practice\n\nThe `limit` is applied **twice**, on purpose:\n\n- At **fetch** time: gather doesn't pull more 
than `limit` items from the\n  feed (saves bandwidth and trafilatura time).\n- At **render** time: even if the store accumulates more than `limit` items\n  for a source across multiple ingests (it will — items don't get deleted),\n  only the latest `limit` per source make it into a given PDF.\n\nSo if you want Quanta to have at most 8 articles in the issue, regardless of\nhow many they've published this week → set `limit = 8`. If you want Hacker\nNews to show only the top 5 by points in the last 24h → set `limit = 5,\nsince_hours = 24`.\n\n> **On the totals.** Adding up every `limit` in `sources.toml` gives you the\n> maximum article count per issue. Aim for **30–60 articles** for a\n> comfortable 30–60 minute read. Claude's summaries are dense; volume isn't\n> quality. An empty section on a slow day is cleaner than padding.\n\n## Scheduling ingests\n\nTwo modes; pick whichever fits your routine. Set the env var in `.env`.\n\n### Every N hours (default)\n\n```bash\n# .env\nINGEST_INTERVAL_SECONDS=14400   # 4 hours (the default)\n```\n\n### Cron-style fixed times — \"morning and evening edition\"\n\n```bash\n# .env\nINGEST_SCHEDULE=07:00,18:00     # comma-separated HH:MM\nINGEST_TIMEZONE=Europe\u002FLondon   # any IANA tz; default UTC\n```\n\nIf both are set, `INGEST_SCHEDULE` wins. The render is still on-demand —\nhitting `\u002Fdigest.pdf` between scheduled runs gives you the cached PDF\ninstantly.\n\nYou can also kick a manual ingest any time:\n\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:8000\u002Fingest\n```\n\n## Delivery — push the PDF wherever you want\n\nA built-in hook fires after every successful ingest. Point\n`POST_INGEST_HOOK` at any executable on the container's filesystem (drop\nthe script into your `.\u002Fdata\u002Fhooks\u002F` directory so it survives rebuilds via\nthe bind mount). 
The hook receives the freshly-built PDF path as its first\nargument.\n\n```bash\n# .env\nPOST_INGEST_HOOK=\u002Fdata\u002Fhooks\u002Fpush-to-remarkable.sh\nPOST_INGEST_HOOK_TIMEOUT=300    # optional; default 300s\n```\n\nHook failures are non-fatal — a broken hook logs an error but doesn't\ncrash the ingest loop.\n\n### Sample: push to a reMarkable 2 over WiFi\n\nDrop this in `.\u002Fdata\u002Fhooks\u002Fpush-to-remarkable.sh` and `chmod +x` it:\n\n```bash\n#!\u002Fusr\u002Fbin\u002Fenv bash\n# Push the latest issue to a reMarkable 2 via SSH.\n# Usage: push-to-remarkable.sh \u003Cpdf-path>\nset -euo pipefail\n\nPDF=\"$1\"\nREMARKABLE=\"root@10.11.99.1\"            # adjust to your device's IP\nSSH_KEY=\u002Fdata\u002Fhooks\u002Fremarkable_id_ed25519\n\nscp -i \"$SSH_KEY\" -o StrictHostKeyChecking=accept-new \\\n    \"$PDF\" \"$REMARKABLE:\u002Fhome\u002Froot\u002Fpapernews.pdf\"\n\n# Refresh the UI so the file appears immediately.\nssh -i \"$SSH_KEY\" \"$REMARKABLE\" 'systemctl restart xochitl'\n```\n\nGenerate a passwordless key (`ssh-keygen -t ed25519 -f\ndata\u002Fhooks\u002Fremarkable_id_ed25519 -N \"\"`), add the `.pub` to the\nreMarkable's `\u002Fhome\u002Froot\u002F.ssh\u002Fauthorized_keys` once, and from then on\nevery ingest pushes the new paper to your device.\n\nThe same pattern works for Kindle (`scp` over USB networking), a network\nprinter (`lp -d papernews \"$PDF\"`), an email (`mutt -a \"$PDF\"`), or\nanything else you can script.\n\n## Tests\n\nModest, no-network unittest suite for the web\u002Fscheduling\u002Fhook behaviour:\n\n```bash\npython -m unittest discover -s tests\n```\n\n## Local development\n\nYou don't have to use Docker — the CLI works directly:\n\n```bash\npython3 -m venv .venv\n.venv\u002Fbin\u002Fpip install -e .\nexport ANTHROPIC_API_KEY=sk-ant-...\n\n.venv\u002Fbin\u002Fpython -m papernews gather       # fetch + extract\n.venv\u002Fbin\u002Fpython -m papernews summarize    # claude pass 1 (batched)\n.venv\u002Fbin\u002Fpython -m 
papernews rewrite      # claude pass 2 (batched, streamed)\n.venv\u002Fbin\u002Fpython -m papernews render       # xelatex → PDF\n# or all of the above in sequence:\n.venv\u002Fbin\u002Fpython -m papernews build\n```\n\nRequirements: Python 3.11+, `xelatex` (TeX Live with `texlive-xetex`,\n`texlive-latex-extra`, `lmodern`), `pdftoppm` (poppler).\n\n## Customizing the typography\n\nEverything visual lives in one file: [`papernews\u002Ftemplate.tex.j2`](papernews\u002Ftemplate.tex.j2).\n\n- Page size: `paperwidth=157mm, paperheight=210mm` (tuned for reMarkable Pro)\n- Body font: Latin Modern Roman 10pt\n- Two-column body for any article over 2000 characters; single-column\n  otherwise\n- First-line paragraph indent instead of vertical `\\parskip` (classic\n  magazine convention)\n- Microtype protrusion + expansion\n- Letter-spacing on small-caps source labels via fontspec's `LetterSpace`\n\nCustomize whatever you like — the Jinja delimiters are LaTeX-safe\n(`((* ... *))` for blocks, `((( ... )))` for variables) so your `{`, `}` and\n`\\` don't fight each other.\n\n## Cost\n\nRoughly per ingest cycle, with Claude Haiku 4.5 (default model):\n\n- ~50 articles\n- Summarize: 6 batched calls (~8 articles each)\n- Rewrite: 6 batched calls, streamed\n- World-news compress: 1 call\n\nOrder-of-magnitude: a few cents to a few tens of cents per cycle depending on\narticle lengths. At 6 cycles\u002Fday that's well under $1\u002Fday. 
Going to Sonnet or\nOpus multiplies the bill ~10–30×.\n\nSet a spend cap at\nhttps:\u002F\u002Fconsole.anthropic.com\u002Fsettings\u002Fbilling → Spend limits — the run-loop\ncan't surprise you above whatever you set.\n\n## Privacy\n\n- All data lives on your machine (`.\u002Fdata\u002Fstate.db` + `.\u002Fdata\u002Farchive\u002Fcache\u002F`).\n- Article text is sent to the Anthropic API for summarization and rewriting.\n  That's the only outbound destination for content (besides fetching the\n  feeds themselves).\n- No analytics, no telemetry, no third-party scripts in the landing page.\n\n## Project layout\n\n```\npapernews\u002F\n├── papernews\u002F\n│   ├── fetch.py          # HN Algolia + RSS feedparser\n│   ├── extract.py        # trafilatura\n│   ├── summarize.py      # Anthropic SDK, batched\n│   ├── rewrite.py        # Anthropic SDK, batched + streamed\n│   ├── wiki.py           # World news \u002F Quote \u002F DYK \u002F tech feeds\n│   ├── store.py          # SQLite article store + queries\n│   ├── render.py         # Jinja + xelatex\n│   ├── preview.py        # PDF → PNG via pdftoppm\n│   ├── cache.py          # On-disk cache by content hash\n│   ├── cli.py            # papernews command\n│   ├── web.py            # Flask + APScheduler\n│   └── template.tex.j2   # the magazine\n├── sources.toml          # configured feeds\n├── pyproject.toml\n├── Dockerfile\n├── docker-compose.yml\n└── data\u002F                 # gitignored — your SQLite + cached PDFs\n```\n\n## Contributing\n\nOpen an issue first if you're planning something non-trivial — happy to talk\nabout direction. The codebase is small enough that you can read it end to\nend in an hour.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n## Why \"papernews\"\n\nWorking name; happy to take suggestions. The vibe is: an old-fashioned daily\npaper, not a feed. You read it once, then you put it down.\n",2,"2026-06-11 04:10:31","CREATED_QUERY"]