[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80529":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},80529,"books-for-bots","prime-radiant-inc\u002Fbooks-for-bots","prime-radiant-inc","Rust CLI that converts EPUBs into a single YAML-headed Markdown file with per-chapter byte and line offsets, giving LLM agents a navigation API for token-efficient reading.",null,"Rust",72,4,64,1,0,3,7,9,49.3,false,"main",true,[],"2026-06-12 04:01:29","# books-for-bots\n\nConvert an EPUB into a single YAML-headed Markdown file optimized for token-efficient reading by LLM agents.\n\n## Why\n\nEPUB is a great format for humans and a terrible format for agents.\n\nA typical agent reading a book has to (a) unpack the zip, (b) parse the OPF manifest, (c) follow the spine, (d) parse each XHTML document, (e) chase cross-document references, and (f) do all of that without exhausting its context window. Most don't even try; they paste a chapter at a time from the clipboard.\n\n`books-for-bots` flips that. Each book becomes one plain GFM Markdown file with a YAML frontmatter listing every chapter and its **absolute byte and line offsets**. An agent can:\n\n```\nRead book.md --offset 412 --limit 200\n```\n\n…and land directly on Chapter 1's heading, no parsing, no traversal. 
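\n\nA minimal consumer-side sketch, assuming a Rust reader that uses only the standard library: open the converted file, seek to a chapter's `byte` value from the frontmatter, and read a fixed-size window. The `read_at` helper and its parameters are hypothetical and not part of `books-for-bots`, which only performs the conversion.\n\n```rust\nuse std::fs::File;\nuse std::io::{Read, Seek, SeekFrom};\n\n\u002F\u002F\u002F Hypothetical helper (not part of books-for-bots): read up to `len` bytes\n\u002F\u002F\u002F of a converted book, starting at an absolute `byte` offset from the frontmatter.\nfn read_at(path: &str, byte: u64, len: u64) -> std::io::Result\u003CString> {\n    let mut file = File::open(path)?;\n    \u002F\u002F Jump straight to the chapter heading; nothing before it is read or parsed.\n    file.seek(SeekFrom::Start(byte))?;\n    let mut buf = Vec::new();\n    \u002F\u002F `take` caps the read at `len` bytes (fewer if the file ends first).\n    file.take(len).read_to_end(&mut buf)?;\n    \u002F\u002F A fixed-size window can end mid-character, so decode lossily.\n    Ok(String::from_utf8_lossy(&buf).into_owned())\n}\n```\n\nBecause the offsets are absolute (they already account for the frontmatter), the reader never needs to know how long the frontmatter is; the `dd` check in the Example section performs the same seek from the shell.\n\n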
The frontmatter is the chapter-level navigation API.\n\n## What you get\n\n```\noutput\u002F\u003Cslug>\u002F\u003Cslug>.md\noutput\u002F\u003Cslug>\u002Fimages\u002F\u003Cbasename>\n```\n\nThe Markdown file opens with frontmatter that looks like this (see [`examples\u002Falice\u002F`](examples\u002Falice\u002F) for the full output):\n\n```yaml\n---\ntitle: \"Alice's Adventures in Wonderland\"\nauthors: [Lewis Carroll]\npublished: 2008-06-27\nlanguage: en\nsource_file: examples\u002Falice\u002Falice-pg11-images.epub\nchapters:\n  - title: CHAPTER I. Down the Rabbit-Hole\n    line:         94\n    byte:       2866\n  - title: CHAPTER II. The Pool of Tears\n    line:        158\n    byte:      14524\n  - title: CHAPTER III. A Caucus-Race and a Long Tale\n    line:        216\n    byte:      25801\n  - title: CHAPTER IV. The Rabbit Sends in a Little Bill\n    line:        312\n    byte:      35010\n  ...\n---\n\n## CHAPTER I. Down the Rabbit-Hole\n\nAlice was beginning to get very tired of sitting by her sister...\n```\n\nBelow the frontmatter is plain GFM: `## Chapter Heading`, `**bold**`, `*italic*`, `[link](href)`, `[^footnote]`, GFM pipe tables, fenced code blocks. No embedded HTML except where GFM requires it (table cell `\u003Cbr>`).\n\nNumeric offsets are leading-padded to a fixed 10-character field so the frontmatter byte size is invariant. YAML plain-scalar parsing strips that padding, so consumers parse them as integers. Padding is leading (not trailing) because trailing whitespace gets eaten by editors and pre-commit hooks.\n\n## Use\n\n```sh\ncargo install --path .\nbooks-for-bots my-book.epub --output-dir output\n```\n\nPass `--force` to overwrite an existing output directory.\n\nThat's the entire CLI surface. 
No flags for \"include or skip footnotes,\" no flags for \"merge or split chapters.\" The tool does one specific thing.\n\n## Example\n\n[`examples\u002Falice\u002F`](examples\u002Falice\u002F) contains Project Gutenberg's _Alice's Adventures in Wonderland_ (PG #11, public domain) and the converted output. 1832 lines of Markdown, 12 chapters with clean offsets, every chapter's heading reachable via the frontmatter byte position.\n\nTo regenerate it:\n\n```sh\nbooks-for-bots examples\u002Falice\u002Falice-pg11-images.epub --output-dir examples\u002Falice\u002Foutput --force\n```\n\nThen verify the offsets work:\n\n```sh\n# Land on Chapter VII's heading using the byte offset from the frontmatter:\ndd if=examples\u002Falice\u002Foutput\u002Falice-s-adventures-in-wonderland-lewis-carroll\u002Falice-s-adventures-in-wonderland-lewis-carroll.md \\\n   bs=1 skip=76563 count=80 2>\u002Fdev\u002Fnull\n# → ## CHAPTER VII. A Mad Tea-Party\n#\n#   There was a table set out under a tree...\n```\n\n## Design principles\n\n1. **Deterministic.** Same input → byte-identical output. No timestamps, no random ordering, no pretty-printing variability. Two runs always agree.\n2. **No agentic judgment.** Every transform is a fixed rule. The tool doesn't decide which images are \"decorative,\" doesn't guess whether a paragraph is \"important,\" doesn't summarize. If it's in the spine, it's in the output.\n3. **Faithful to source.** All text is preserved. Whitespace is collapsed where browsers would collapse it, preserved where they would preserve it (`\u003Cpre>`). Smart quotes, em-dashes, accented characters — all intact.\n4. **One file per book.** Books are immutable. Treat the converted Markdown as immutable too. Offsets are stable seek targets.\n\n## How it's built\n\nA five-stage Rust pipeline:\n\n1. **`load`** wraps the [`epub`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fepub) crate. 
Returns a typed `Book` with metadata, ordered spine documents, and image bytes keyed by manifest path.\n2. **`extract`** uses [`scraper`](https:\u002F\u002Fcrates.io\u002Fcrates\u002Fscraper) (which is built on `html5ever`) to parse each spine document into a real DOM, then walks the DOM into a `Block`\u002F`Inline` IR. Recognizes paragraphs, headings, lists, tables, blockquotes, fenced code, images, footnote references and definitions, and anchors. Treats `\u003Cdiv>`\u002F`\u003Cspan>`\u002F`\u003Csection>` as transparent. Drops empty elements.\n3. **`assemble`** stitches spine documents into a chapter sequence. Resolves chapter titles by priority (TOC label → first H1\u002FH2 → HTML `\u003Ctitle>` → filename). Namespaces footnote IDs per chapter (`[^cN-id]`). Rewrites cross-chapter links to GFM auto-slugs of their target chapter. Drops \"running header\" spine docs (Calibre\u002Fprint-layout artifacts that just contain `\u003Ch1>Book Title\u003C\u002Fh1>`).\n4. **`render`** serializes the `Block` tree into a GFM Markdown `String`, recording the body-relative byte and line position of each chapter heading. Hand-written serializer; no `pulldown-cmark` round-trip. Handles GFM pipe escaping, backtick-fence widening, and heading-text whitespace collapse.\n5. **`write`** computes the YAML frontmatter (with leading-padded offsets so its size is invariant), concatenates frontmatter + body, writes the Markdown file, and copies referenced images to `images\u002F`.\n\nThe whole binary is around 2,000 lines of Rust. Statically linked, no runtime dependencies.\n\n## Real-world quirks\n\nThe tool runs cleanly across a wide range of EPUBs. A few patterns are worth knowing:\n\n- **XHTML self-closing `\u003Cscript \u002F>`** in `\u003Chead>` would otherwise break HTML5 parsing. 
The `extract` module strips `\u003Chead>` before parsing.\n- **Calibre-split spine docs** (one spine document per print page) leave behind \"running header\" pages whose only content is `\u003Ch1>Book Title\u003C\u002Fh1>`. Those get dropped.\n- **Embedded TOC chapters** in the source EPUB get faithfully transmitted but their internal links collapse to chapter-level slugs (the YAML frontmatter is the navigation API; the embedded TOC is decorative).\n- **Footnote markup variations**: `\u003Csup>\u003Ca>` wrappers, `epub:type=\"noteref\"`, plain `\u003Ca href=\"other.html#fnX\">N\u003C\u002Fa>` with short marker text — all detected and emitted as `[^id]` references. Calibre's `#filepos…` fragment style is supported.\n- **Source artifacts** (pirate-site watermarks, leftover XML escapes, page-break HTML comments) are transmitted as-is. Garbage in, garbage out — the tool is honest about what's in the book.\n\n## Build and test\n\n```sh\ncargo build --release\ncargo test\n```\n\nTests are entirely synthetic. The fixture builder under `tests\u002Fcommon\u002F` constructs in-memory EPUBs at test time using `epub-builder`. No `.epub` files of any kind are committed (except the public-domain Alice in `examples\u002F`).\n\n## Documentation\n\n- [`docs\u002Fspecs\u002F2026-05-01-design.md`](docs\u002Fspecs\u002F2026-05-01-design.md) — design spec\n- [`docs\u002Fplans\u002F2026-05-01-implementation.md`](docs\u002Fplans\u002F2026-05-01-implementation.md) — TDD implementation plan\n\n## Credits\n\nThe example book is _Alice's Adventures in Wonderland_ by Lewis Carroll, [Project Gutenberg eBook #11](https:\u002F\u002Fwww.gutenberg.org\u002Febooks\u002F11). Public domain. 
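Project Gutenberg's terms of use are included with the source EPUB.\n\n## License\n\nMIT.\n","`books-for-bots` is a command-line tool written in Rust that converts EPUB files into Markdown files with YAML frontmatter, optimized for token-efficient reading by large language model agents. Its core feature is producing a single Markdown file whose opening YAML frontmatter lists the absolute byte and line offsets of every chapter, so an agent can jump straight to a given chapter without parsing the whole document structure. The tool is especially well suited to large-scale natural language processing applications or automated reading systems that need to process e-book content efficiently. Conversion takes a single simple CLI command, and the output is clean and easy to process further.",2,"2026-06-11 04:01:06","CREATED_QUERY"]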