[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11306":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":36,"discoverSource":37},11306,"servo-fetch","konippi\u002Fservo-fetch","konippi","A self-contained browser engine that fetches, renders, and extracts web content as Markdown, JSON, or screenshots — no Chromium, no API key, no setup.","",null,"Rust",124,11,21,7,0,2,9,54,6,3.24,"Apache License 2.0",false,"main",[26,27,28,29,30,31,32],"agent-skills","cli","fetch","mcp","rust","servo","web-scraping","2026-06-12 02:02:30","\u003Cdiv align=\"center\">\n  \u003Ch1 align=\"center\">servo-fetch\u003C\u002Fh1>\n  \u003Cp align=\"center\">A self-contained browser engine that fetches, renders, and extracts web content as Markdown, JSON, or screenshots — no Chromium, no API key, no setup.\u003C\u002Fp>\n  \u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fkonippi\u002Fservo-fetch\u002Factions\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fkonippi\u002Fservo-fetch\u002Fworkflows\u002FCI\u002Fbadge.svg\" alt=\"CI\">\u003C\u002Fa>\n    \u003Ca href=\"https:\u002F\u002Fcrates.io\u002Fcrates\u002Fservo-fetch\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fcrates\u002Fv\u002Fservo-fetch.svg\" alt=\"crates.io\">\u003C\u002Fa>\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FRust-1.86.0-blue?color=fc8d62&logo=rust\" alt=\"MSRV\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT%2FApache--2.0-blue.svg\" alt=\"MIT OR Apache-2.0\">\n  \u003C\u002Fp>\n  \u003Cimg src=\"assets\u002Fdemo.gif\" alt=\"servo-fetch demo\" width=\"900\">\n\u003C\u002Fdiv>\n\nservo-fetch embeds the [Servo](https:\u002F\u002Fservo.org\u002F) browser engine. It executes JavaScript, computes CSS layout,\ncaptures screenshots with a software renderer, and extracts clean content — available as a CLI, a Rust library,\nand a Python SDK.\n\n```bash\n# CLI\nservo-fetch \"https:\u002F\u002Fexample.com\"                        # clean Markdown\nservo-fetch \"https:\u002F\u002Fexample.com\" --screenshot page.png  # PNG screenshot\n```\n\n```rust\n\u002F\u002F Rust\nlet md = servo_fetch::markdown(\"https:\u002F\u002Fexample.com\")?;\n```\n\n```python\n# Python\npage = servo_fetch.fetch(\"https:\u002F\u002Fexample.com\")\nprint(page.markdown)\n```\n\n## Why servo-fetch\n\n- **Zero dependencies** — single binary, no Chromium, no API key\n- **Real JS execution** — SpiderMonkey runs JavaScript, parallel CSS engine computes layout\n- **Layout-aware extraction** — strips navbars, sidebars, footers by actual rendered position\n- **Schema-driven JSON** — declarative CSS-selector schema pulls structured data\n- **Parallel batch fetch** — multiple URLs fetched concurrently\n- **Site crawling** — BFS link traversal with robots.txt, same-site scope, and rate limiting\n- **URL discovery** — sitemap-based URL mapping without rendering (fast, lightweight)\n- **Screenshots without GPU** — software renderer captures PNG\u002Ffull-page screenshots anywhere\n- **Accessibility tree** — AccessKit integration with roles, names, and bounding boxes\n\n## Performance and quality\n\nApple M3 Pro, versus Playwright (the typical AI-agent stack):\n\n| Benchmark           | servo-fetch | playwright:optimized |\n| ------------------- | ----------: | -------------------: |\n| Time — static-small |     ~231 ms |              ~645 ms |\n| Time — spa-heavy    |     ~331 ms |              ~798 ms |\n| Memory (peak RSS)   |    51–64 MB |           300–328 MB |\n\nExtraction quality: mean word-F1 0.823 vs Readability's 0.723 across\nseven page-type fixtures, with `without[]` boilerplate removal at 96.4%\nvs 82.1%. Direct-binary engine peers (chrome-headless-shell, Lightpanda,\ncurl) are opt-in.\n\nMethodology, three-axis breakdown, per-fixture F1, and raw JSON:\n[`benchmarks\u002FREADME.md`](benchmarks\u002FREADME.md) +\n[`benchmarks\u002Fresults\u002F`](benchmarks\u002Fresults\u002F).\n\n## Install\n\n| Interface | Install | Docs |\n|-----------|---------|------|\n| **CLI** | `curl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002Fkonippi\u002Fservo-fetch\u002Fmain\u002Finstall.sh \\| sh` | [CLI docs](crates\u002Fservo-fetch-cli\u002FREADME.md) |\n| **Rust** | `cargo add servo-fetch` | [Library docs](crates\u002Fservo-fetch\u002FREADME.md) |\n| **Python** | `pip install servo-fetch` | [Python docs](bindings\u002Fpython\u002FREADME.md) |\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>CLI install alternatives\u003C\u002Fb>\u003C\u002Fsummary>\n\n```bash\ncargo binstall servo-fetch-cli   # prebuilt binary\ncargo install servo-fetch-cli    # build from source\n```\n\nOr download from [GitHub Releases](https:\u002F\u002Fgithub.com\u002Fkonippi\u002Fservo-fetch\u002Freleases).\n\n**Linux** — install runtime deps and use `xvfb-run` on headless servers:\n\n```bash\nsudo apt install -y libegl1 libfontconfig1 libfreetype6\nxvfb-run --auto-servernum servo-fetch \"https:\u002F\u002Fexample.com\"\n```\n\n**Windows** — keep `servo-fetch.exe`, `libEGL.dll`, and `libGLESv2.dll` in the same directory.\n\n**macOS** — no extra setup needed.\n\n\u003C\u002Fdetails>\n\n## Quick Start\n\n### CLI\n\n```bash\nservo-fetch \"https:\u002F\u002Fexample.com\"                        # Markdown (default)\nservo-fetch \"https:\u002F\u002Fexample.com\" --json                 # Structured JSON\nservo-fetch \"https:\u002F\u002Fexample.com\" --screenshot page.png  # PNG screenshot\nservo-fetch \"https:\u002F\u002Fexample.com\" --js \"document.title\"  # Run JavaScript\nservo-fetch \"https:\u002F\u002Fexample.com\" --schema schema.json   # Schema-driven JSON\nservo-fetch URL1 URL2 URL3                               # Parallel batch\nservo-fetch crawl \"https:\u002F\u002Fdocs.example.com\" --limit 20  # Crawl a site\nservo-fetch map \"https:\u002F\u002Fexample.com\"                    # Discover URLs via sitemap\nservo-fetch mcp                                          # MCP server (stdio)\nservo-fetch serve                                        # HTTP API server\n```\n\nFull CLI reference → [`servo-fetch-cli`](crates\u002Fservo-fetch-cli\u002FREADME.md)\n\n### Rust\n\n```bash\ncargo add servo-fetch\n```\n\n```rust\n\u002F\u002F URL → Markdown in one line\nlet md = servo_fetch::markdown(\"https:\u002F\u002Fexample.com\")?;\n\n\u002F\u002F Fetch with options\nuse servo_fetch::{fetch, FetchOptions};\nuse std::time::Duration;\n\nlet page = fetch(FetchOptions::new(\"https:\u002F\u002Fexample.com\").timeout(Duration::from_secs(60)))?;\nprintln!(\"{}\", page.html);\nlet md = page.markdown()?;\n\n\u002F\u002F Crawl a site\nservo_fetch::crawl_each(\n    servo_fetch::CrawlOptions::new(\"https:\u002F\u002Fdocs.example.com\")\n        .limit(100)\n        .user_agent(\"MyBot\u002F1.0\"),\n    |result| match &result.outcome {\n        Ok(page) => println!(\"{}: {} chars\", result.url, page.content.len()),\n        Err(e) => eprintln!(\"{}: {e}\", result.url),\n    },\n)?;\n\n\u002F\u002F Discover URLs via sitemap (no rendering)\nlet urls = servo_fetch::map(\n    servo_fetch::MapOptions::new(\"https:\u002F\u002Fexample.com\").limit(1000),\n)?;\nfor u in &urls {\n    println!(\"{}\", u.url);\n}\n```\n\nFull API reference → [`servo-fetch`](crates\u002Fservo-fetch\u002FREADME.md)\n\n### Python\n\n```bash\npip install servo-fetch\n```\n\n```python\nimport servo_fetch\n\npage = servo_fetch.fetch(\"https:\u002F\u002Fexample.com\")\nprint(page.markdown)\n\n# Schema extraction\nfrom servo_fetch import Schema, Field\nschema = Schema(\n    base_selector=\".product\",\n    fields=[\n        Field(name=\"title\", selector=\"h2\", type=\"text\"),\n        Field(name=\"price\", selector=\".price\", type=\"text\"),\n    ],\n)\npage = servo_fetch.fetch(\"https:\u002F\u002Fshop.example.com\", schema=schema)\nprint(page.extracted)\n```\n\nFull API reference → [`bindings\u002Fpython`](bindings\u002Fpython\u002FREADME.md)\n\n## MCP Server\n\nBuilt-in [Model Context Protocol](https:\u002F\u002Fmodelcontextprotocol.io\u002F) server with six tools: `fetch`,\n`batch_fetch`, `crawl`, `map`, `screenshot`, and `execute_js`.\n\n```json\n{\n  \"mcpServers\": {\n    \"servo-fetch\": {\n      \"command\": \"servo-fetch\",\n      \"args\": [\"mcp\"]\n    }\n  }\n}\n```\n\nStreamable HTTP: `servo-fetch mcp --port 8080`\n\nFull MCP tool reference → [`servo-fetch-cli` README](crates\u002Fservo-fetch-cli\u002FREADME.md)\n\n## HTTP API\n\nREST endpoints for containerized deployments and HTTP clients:\n\n```bash\nservo-fetch serve                            # 127.0.0.1:3000\nservo-fetch serve --host 0.0.0.0 --port 80   # expose to network\n\ncurl -X POST http:\u002F\u002F127.0.0.1:3000\u002Fv1\u002Ffetch \\\n  -H 'content-type: application\u002Fjson' \\\n  -d '{\"url\":\"https:\u002F\u002Fexample.com\"}'\n```\n\nEndpoints: `GET \u002Fhealth`, `GET \u002Fversion`, `POST \u002Fv1\u002Ffetch`, `POST \u002Fv1\u002Fbatch_fetch`, `POST \u002Fv1\u002Fscreenshot`, `POST \u002Fv1\u002Fexecute_js`, `POST \u002Fv1\u002Fcrawl`, `POST \u002Fv1\u002Fmap`.\n\nFull HTTP API reference → [`servo-fetch-cli` README](crates\u002Fservo-fetch-cli\u002FREADME.md#http-api-server)\n\n## Docker\n\nMulti-arch image on GitHub Container Registry (`linux\u002Famd64`, `linux\u002Farm64`):\n\n```bash\ndocker run --rm -p 3000:3000 ghcr.io\u002Fkonippi\u002Fservo-fetch:latest\ncurl -X POST http:\u002F\u002F127.0.0.1:3000\u002Fv1\u002Ffetch \\\n  -H 'content-type: application\u002Fjson' \\\n  -d '{\"url\":\"https:\u002F\u002Fexample.com\"}'\n```\n\nRuns as non-root (UID 1001). Images are signed with [cosign](https:\u002F\u002Fgithub.com\u002Fsigstore\u002Fcosign) (keyless) and published with SLSA provenance and SBOM attestations.\n\n## Agent Skills\n\nservo-fetch ships with an [Agent Skills](https:\u002F\u002Fagentskills.io\u002F) package for AI coding agents:\n\n```bash\nnpx skills add https:\u002F\u002Fgithub.com\u002Fkonippi\u002Fservo-fetch\u002Ftree\u002Fmain\u002Fskills\u002Fservo-fetch\n```\n\n## Security\n\nservo-fetch blocks all private and reserved IP ranges ([RFC 6890](https:\u002F\u002Fdatatracker.ietf.org\u002Fdoc\u002Fhtml\u002Frfc6890)),\nstrips credentials from URLs, disables HTTP redirects to prevent SSRF bypass, and sanitizes all output against\nterminal escape injection ([CVE-2021-42574](https:\u002F\u002Fwww.cve.org\u002FCVERecord?id=CVE-2021-42574)).\nSee [SECURITY.md](.\u002FSECURITY.md) for details.\n\n## Limitations\n\n- Sites behind login walls or CAPTCHAs are not supported.\n\n## Contributing\n\nSee [CONTRIBUTING.md](.\u002FCONTRIBUTING.md) for development setup, commit conventions, and PR guidelines.\n\n## License\n\n[MIT](.\u002FLICENSE-MIT) OR [Apache-2.0](.\u002FLICENSE-APACHE)\n","servo-fetch 是一个自包含的浏览器引擎，用于抓取、渲染和提取网页内容为Markdown、JSON或截图。其核心功能包括执行JavaScript、计算CSS布局、使用软件渲染器捕获截图以及基于实际渲染位置提取干净的内容。该工具无需Chromium支持，也不需要API密钥或额外设置，提供了CLI、Rust库和Python SDK三种接口。它适用于需要快速且轻量级地处理网页内容的场景，如自动化数据抓取、内容分析及生成报告等。特别适合那些对环境依赖性要求低但又需要高质量页面解析的应用场合。","2026-06-11 03:31:37","CREATED_QUERY"]