[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81729":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":15,"starSnapshotCount":15,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},81729,"crawlkit","openclaw\u002Fcrawlkit","openclaw","Shared Go infrastructure for local-first crawler archives.","",null,"Go",44,5,33,0,3,8,11,9,2.33,"MIT License",false,"main",true,[26,27,28],"crawler","go","infra","2026-06-12 02:04:18","# 🧱 crawlkit\n\n![crawlkit banner](docs\u002Fassets\u002Freadme-banner.jpg)\n\nShared Go infrastructure for local-first crawler archives.\n\n`crawlkit` is not a universal Slack, Discord, Notion, or GitHub crawler. It is\nthe reusable foundation beneath those tools: SQLite hygiene, TOML config\ndefaults, portable JSONL\u002FGzip packing, git-backed snapshot sharing, sync state,\nCLI output helpers, control\u002Fstatus metadata, a shared terminal explorer, and\nsafe desktop-cache snapshot utilities.\n\n## Install\n\n```bash\ngo get github.com\u002Fopenclaw\u002Fcrawlkit@latest\ngo install github.com\u002Fopenclaw\u002Fcrawlkit\u002Fcmd\u002Fcrawlctl@latest\n```\n\nGo packages are published by tagging this repository. There is no separate\npackage registry step. 
See `docs\u002Fpublishing.md` for the release commands.\nSee `docs\u002Fboundary.md` for the crawlkit-versus-app ownership boundary and\n`docs\u002Fremote-contract.md` for the Worker\u002Fclient split.\n\n## Packages\n\n- `config`: standard TOML config paths, opt-in platform-native runtime dirs,\n  migration-safe legacy path fallback, and token diagnostics.\n- `store`: SQLite open\u002Fread-only\u002Ftransaction\u002Fquery helpers.\n- `snapshot`: `manifest.json` plus JSONL\u002FGzip table snapshot export, file fingerprints, full import, and planned incremental shard import.\n- `backup`: age-encrypted JSONL\u002FGzip shards, backup manifests, recipient\u002Fidentity helpers, and shard restore verification.\n- `mirror`: clone\u002Finit\u002Fpull\u002Fcommit\u002Fpush helpers for private snapshot repos.\n- `state`: generic crawler cursor and freshness records.\n- `embed`: reusable OpenAI-compatible, Ollama, and llama.cpp embedding providers plus local probe diagnostics.\n- `vector`: float32 vector encoding, dimension validation, cosine scoring, top-k helpers, and reciprocal-rank fusion.\n- `releasecheck`: GitHub release checks, 24-hour cache handling, scripted-output\n  suppression, and stderr update notice formatting for crawl app CLIs.\n- `remote`: provider-neutral HTTP client, config, query, ingest, auth, status,\n  and protocol contract metadata for Worker-fronted remote archives such as\n  Cloudflare D1.\n- `output`: text\u002Fjson\u002Flog output helpers.\n- `control`: crawl app metadata, command manifests, status payloads, and\n  database inventory for launchers and automation.\n- `scheduler`: crawl app discovery, job config, single-process run locking,\n  JSONL run history, log paths, and launchd\u002Fsystemd\u002FWindows\u002Fcron schedule\n  rendering for controller CLIs.\n- `tui`: shared terminal archive explorer with gitcrawl-style responsive panes, entity\u002Fmember\u002Fdetail lanes, compact sortable headers, mouse selection, floating right-click 
actions, sorting\u002Ffiltering, and local\u002Fremote source status.\n- `cache`: safe read-only local cache snapshot helpers.\n\n## crawlctl\n\n`crawlctl` is the shared controller for keeping local crawl archives warm.\nIt discovers installed crawl apps through `metadata --json`, falls back to\ntemporary legacy adapters for older apps, runs configured jobs with a lock, and\nrecords one JSONL run record per command.\n\n```bash\ncrawlctl init --repo openclaw\u002Fopenclaw\ncrawlctl run\ncrawlctl status\ncrawlctl logs gitcrawl --tail 80\ncrawlctl install --dry-run\n```\n\nNative install backends:\n\n- macOS: `launchd`\n- Linux: `systemd --user`\n- Windows: Task Scheduler\n- portable fallback: cron line rendering\n\n## Downstream apps\n\n- `gitcrawl`, `discrawl`, `notcrawl`, `wacrawl`, `telecrawl`, and `slacrawl`\n  consume `crawlkit` on `main`.\n- The apps keep provider schemas, auth, desktop\u002FAPI parsing, privacy filters,\n  and user-facing CLI contracts. `crawlkit` owns only the reusable mechanics.\n\n## Safety\n\nLibrary tests use temporary directories. They do not touch app runtime stores\nsuch as `~\u002F.config\u002Fgitcrawl`, `~\u002F.slacrawl`, `~\u002F.discrawl`, or `~\u002F.notcrawl`.\n","crawlkit is shared Go infrastructure for local-first crawler archives. It provides SQLite database management, TOML config defaults, JSONL\u002FGzip file packing, Git-based snapshot sharing, sync state management, and CLI output helpers, helping developers build efficient and scalable data-scraping tools. The project is particularly suited to scenarios that require storing and processing data locally, such as personal or enterprise information archiving and data analysis. Its modular design lets crawlkit integrate easily into a variety of crawler applications while remaining maintainable and safe.",2,"2026-06-11 04:06:09","CREATED_QUERY"]