[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81110":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},81110,"bpc-fetch","Sophomoresty\u002Fbpc-fetch","Sophomoresty","Bypass paywall sites — search, discover, and batch fetch articles as Markdown. 936 sites supported.",null,"JavaScript",166,48,37,0,32,129,5,5.07,"MIT License",false,"main",true,[],"2026-06-12 02:04:11","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Flogo.png\" width=\"128\" height=\"128\" alt=\"bpc-fetch logo\">\n\u003C\u002Fp>\n\n\u003Ch1 align=\"center\">bpc-fetch\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n  Bypass paywall sites — search, discover, and batch fetch articles as Markdown.\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"README_CN.md\">中文文档\u003C\u002Fa> •\n  \u003Ca href=\"#supported-sites\">936 Sites\u003C\u002Fa> •\n  \u003Ca href=\"#features\">Features\u003C\u002Fa> •\n  \u003Ca href=\"#installation\">Install\u003C\u002Fa> •\n  \u003Ca href=\"#usage\">Usage\u003C\u002Fa> •\n  \u003Ca href=\"#credits\">Credits\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Flinux.do\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLINUX%20DO-Community-blue?style=flat-square\" alt=\"LINUX DO Community\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n## What is this?\n\nA command-line tool that fetches 
full-text articles from 936 paywalled news sites and saves them as clean Markdown with images. It replicates the bypass logic of the [Bypass Paywalls Clean](https:\u002F\u002Fgitflic.ru\u002Fproject\u002Fmagnolia1234\u002Fbypass-paywalls-chrome-clean) browser extension, but runs headlessly — no browser extension needed.\n\n## Supported Sites\n\n**936 sites** across 40+ countries. Highlights:\n\n| Category | Sites |\n|----------|-------|\n| Financial | The Economist, Financial Times, Bloomberg, WSJ, Reuters, Forbes, Business Insider |\n| US News | New York Times, Washington Post, LA Times, Chicago Tribune, Politico |\n| UK\u002FEU | The Telegraph, The Times, Der Spiegel, Le Monde, El País, Corriere della Sera |\n| Tech\u002FScience | Wired, The Atlantic, Nature, Science, Scientific American, MIT Tech Review |\n| Magazines | The New Yorker, Vanity Fair, Vogue, National Geographic, Esquire |\n| German | 76 sites (FAZ, Handelsblatt, Süddeutsche Zeitung...) |\n| French | 69 sites (Le Figaro, Libération, Les Echos...) |\n| More | Netherlands 30, Italy 28, Spain 26, Belgium 22, Australia 39... 
|\n\nRun `bpc-fetch sites` to see the full list.\n\n## Features\n\n- **Full bypass coverage** — Replicates all BPC extension strategies: custom User-Agent, Googlebot\u002FBingbot spoofing, referer manipulation, Playwright JS interception, archive.org fallback\n- **Auto fallback chain** — Each URL tries the optimal strategy, degrades gracefully until content is retrieved\n- **Article discovery** — Find recent articles via RSS, sitemap, or browser-rendered homepage\n- **Cross-site crawl** — Search + time filter + batch download in one command\n- **Agent-friendly** — JSON stdout, stderr progress, `next_command` hints in every response\n- **Windows exe** — Single-file distribution via PyInstaller, auto-downloads Chromium on first run\n\n## Installation\n\n### pip (recommended)\n\n```bash\npip install bpc-fetch\nplaywright install chromium\n```\n\n### From source\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fuser\u002Fbpc-fetch.git\ncd bpc-fetch\npip install -e .\nplaywright install chromium\n```\n\n### Windows exe\n\nDownload `bpc-fetch.exe` from [Releases](https:\u002F\u002Fgithub.com\u002Fuser\u002Fbpc-fetch\u002Freleases), then:\n\n```\nbpc-fetch.exe install-browser\nbpc-fetch.exe doctor\n```\n\n## Usage\n\n```bash\n# Check setup\nbpc-fetch doctor\n\n# List supported sites\nbpc-fetch sites --filter economist\n\n# Discover today's articles from a site\nbpc-fetch discover economist.com --since today\n\n# Fetch a single article\nbpc-fetch fetch \"https:\u002F\u002Fwww.economist.com\u002Fleaders\u002F2024\u002F01\u002F01\u002Fexample\" --out-dir .\u002Farticles\n\n# Batch fetch from URL list\nbpc-fetch batch --file urls.txt --out-dir .\u002Farticles\n\n# Cross-site crawl: keyword + time range\nbpc-fetch crawl \"AI regulation\" --sites economist.com,ft.com --since 7d --out-dir .\u002Fai-articles\n```\n\n### Output format\n\n```\narticle-title\u002F\n├── article-title.md      # YAML frontmatter + full text + image refs\n└── images\u002F\n    ├── 
img_000_abc1.jpg\n    └── img_001_def2.png\n```\n\n### Agent integration\n\nAll commands output JSON. Use `--compact` for minimal output:\n\n```bash\nbpc-fetch discover ft.com --since today --compact\n# → {\"ok\": true, \"domain\": \"ft.com\", \"count\": 15, \"articles\": [...], \"next_command\": \"bpc-fetch batch ...\"}\n```\n\n## Bypass Strategies\n\n| Strategy | Sites | Method |\n|----------|-------|--------|\n| `ua:custom` | 7 | Custom User-Agent string (Liskov, Google-InspectionTool, etc.) |\n| `ua:googlebot` | 85 | Googlebot User-Agent |\n| `ua:facebookbot` | 5 | Facebook crawler UA |\n| `referer:google` | 2 | Google referer header |\n| `block_js` | 425 | Playwright blocks paywall scripts via `Page.route()` |\n| `archive` | 274 | Fetch from archive.org\u002Farchive.is |\n| `cookies` | 138 | Access without tracking cookies |\n\n## Building Windows exe\n\n```bash\npip install pyinstaller\npython build\u002Fbuild_win.py\n# Output: dist\u002Fbpc-fetch.exe\n```\n\n## Credits\n\nThis tool is built on top of the bypass logic from:\n\n- **[Bypass Paywalls Clean](https:\u002F\u002Fgitflic.ru\u002Fproject\u002Fmagnolia1234\u002Fbypass-paywalls-chrome-clean)** by [magnolia1234](https:\u002F\u002Fgitflic.ru\u002Fuser\u002Fmagnolia1234) — the original browser extension that provides the site database and bypass strategies. 
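All credit for the paywall bypass research goes to the BPC project maintainers.\n- **[GenericAgent](https:\u002F\u002Fgithub.com\u002Flsdefine\u002FGenericAgent)** — Core development of this project relied on the AI capabilities provided by GA.\n- **[LINUX DO](https:\u002F\u002Flinux.do)** — Community support and feedback.\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\nThe `data\u002Fsites.js` file is from the Bypass Paywalls Clean project (MIT License).\n","bpc-fetch is a command-line tool that fetches full-text articles from 936 paywalled news sites and saves them as clean Markdown. Its core features include bypassing paywall restrictions by simulating browser behavior (custom User-Agent, Googlebot\u002FBingbot spoofing, referer manipulation, and so on), with an automatic fallback strategy to ensure content is retrieved. It also provides cross-site crawling: it discovers recent articles via RSS, sitemaps, or homepages and downloads them in batches. The tool suits researchers, journalists, and individual users who need access to large amounts of restricted content, especially when they want to save and analyze information in a structured format.",2,"2026-06-11 04:03:33","CREATED_QUERY"]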