[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-887":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},887,"insane-search","fivetaku\u002Finsane-search","fivetaku","Auto-bypass for blocked websites in Claude Code — Phase 0→3 adaptive scheduler, no API keys",null,"Python",723,122,609,1,0,6,51,114,18,10.27,"MIT License",false,"main",true,[],"2026-06-12 02:00:20","English | [한국어](README.ko.md)\n\n# insane-search\n\n> **The scraper that's too stubborn to quit.**\n\n`403`. WAF. CAPTCHA. Empty SPA. Login wall. When every normal tool taps out, insane-search is just getting started. Five probe phases. Auto-installs TLS impersonation. Discovers hidden APIs through a real browser. Tries everything — and for every site that claimed to be \"blocked,\" something always works.\n\nNo API keys. No signup. No config. Install, and watch Claude Code stop giving up.\n\n[Quick Start](#quick-start) • [How it works](#how-it-works) • [What's in the index](#whats-in-the-index) • [References](#references) • [Dependencies](#dependencies)\n\n---\n\n## Quick Start\n\n### 1. Add the marketplace\n\n```\n\u002Fplugin marketplace add https:\u002F\u002Fgithub.com\u002Ffivetaku\u002Fgptaku_plugins.git\n```\n\n### 2. Install the plugin\n\n```\n\u002Fplugin install insane-search\n```\n\n### 3. Restart Claude Code\n\nThat's it. No config, no API keys, no env vars.\n\n### 4. 
Start asking\n\nJust talk normally. Blocked sites will be unblocked automatically.\n\n```\n\"Show me what's trending on r\u002FLocalLLaMA\"\n\"What did @openclaw post on X recently?\"\n\"Search X for posts about insane-search\"\n\"Summarize this YouTube video\"\n\"Search Coupang for under ₩100,000 keyboards\"\n\"Read this Naver blog post for me\"\n\"네이버에서 클로드코드 관련 뉴스 찾아줘\"\n\"Find LinkedIn articles about Claude Code plugins\"\n```\n\n---\n\n## Why insane-search?\n\n- **It doesn't know the word \"blocked\"** — No pre-judged \"this site can't be accessed\" labels. Every site gets the full chain. Coupang? Coupang falls. LinkedIn? Full article body extracted. Yozm? Chrome UA and done\n- **Identity spoofing built in** — Phase 2 doesn't just swap TLS fingerprints. It builds a full browser identity: homepage cookie warming, referrer chains, locale-matched headers. Sites like fmkorea (HTTP 430) and LinkedIn (login wall) fall to this alone\n- **Intent routing** — \"Fetch this URL\" and \"Search X for this keyword\" are different problems. insane-search routes keywords through WebSearch or Naver Search first, gets URLs, then fetches content. Two-stage pipeline, automatic\n- **Installs its own weapons** — Missing `curl_cffi` for TLS fingerprint bypass? Installs it. Missing `feedparser`? Installs it. Missing `yt-dlp`? Installs it. You don't even notice\n- **5 probe phases, not 1** — WebFetch → Jina → curl UA\u002FURL variants → TLS impersonation with identity spoofing → real browser. Each phase escalates only when the previous hits a wall\n- **Finds hidden APIs** — Phase 3 doesn't just render the page. It watches the browser's network traffic, catches the actual JSON API the site uses internally, and hands it back for reuse\n- **Zero setup friction** — No API keys, no OAuth, no developer portals. Everything runs on public endpoints and auto-installable libraries\n\n---\n\n## How it works\n\nWhen Claude Code needs to fetch a URL, insane-search runs a 4-phase adaptive scheduler. 
Each phase only runs if the previous phase failed or detected specific blocking signals.\n\n```\nPhase 0: Special endpoint index\n  ↓ not in index or failed\nPhase 1: Lightweight probes (parallel)\n  • WebFetch + Jina Reader\n  • curl with Chrome \u002F mobile \u002F Googlebot UAs\n  • URL variants: m.{domain}, .json, \u002Frss, \u002Ffeed\n  • Sidecar: AMP cache, archive.today, Wayback (low-trust)\n  ↓ 403\u002F429\u002FWAF headers\u002Fchallenge body detected\nPhase 2: TLS impersonation + identity spoofing\n  • curl_cffi with safari → chrome → firefox\n  • Identity spoofing: homepage cookie warming → referrer chain → locale headers\n  • Behavioral challenge detection (Akamai _abck) → skip to Phase 3\n  • Auto-installs if missing: pip install curl_cffi\n  ↓ TLS bypass failed or JS challenge detected\nPhase 3: Full browser\n  • Playwright MCP (browser_navigate → snapshot → evaluate)\n  • Also discovers hidden APIs via network_requests\n  ↓ login\u002Fpaywall detected\nExit: \"authentication required\" — no amount of phases will fix this\n```\n\n**Core principle**: don't pre-exclude any method. Don't skip a method because a dependency is missing — install it and try. Don't skip because a site is \"known to be hard\" — the site changes, and the method might work now.\n\nEvery HTML response is also scanned for OGP tags and JSON-LD structured data — so even partial responses yield titles, summaries, prices, or profile info.\n\n---\n\n## What's in the index\n\nOnly special endpoints that the generic chain can't discover on its own. 
Everything else — Naver blogs, Coupang, LinkedIn, Medium, Korean news sites, Substack, most forums — is handled by the adaptive scheduler without explicit entries.\n\n### Platform-specific APIs\n\n| Platform | Method | Reference |\n|----------|--------|-----------|\n| X\u002FTwitter | syndication (timeline) + oEmbed (single tweet) + **WebSearch keyword search** | `twitter.md` |\n| Reddit | URL + `.json` + Mobile UA | `json-api.md` |\n| Bluesky | AT Protocol (`public.api.bsky.app\u002Fxrpc\u002F...`) | `public-api.md` |\n| Mastodon | Per-instance public API | `public-api.md` |\n| Hacker News | Firebase API + **Algolia Search** (`hn.algolia.com\u002Fapi\u002Fv1\u002Fsearch`) | `json-api.md` |\n| Stack Overflow | SE API v2.3 | `public-api.md` |\n| Lobste.rs \u002F V2EX \u002F dev.to | Public JSON APIs | `json-api.md` |\n\n### Media (CLI tool required)\n\n| Platform | Method | Reference |\n|----------|--------|-----------|\n| YouTube \u002F Vimeo \u002F Twitch \u002F TikTok \u002F SoundCloud + 1,853 others | `yt-dlp --dump-json` | `media.md` |\n\n### Academic & registry\n\n| Platform | Method | Reference |\n|----------|--------|-----------|\n| arXiv | Atom API | `public-api.md` |\n| CrossRef | REST API | `public-api.md` |\n| Wikipedia | REST API | `json-api.md` |\n| OpenLibrary | JSON API | `public-api.md` |\n| GitHub | `gh` CLI \u002F REST API | `public-api.md` |\n| npm \u002F PyPI | Registry API | `json-api.md` |\n| Wayback Machine | CDX API | `public-api.md` |\n\n### Korea-specific\n\n| Platform | Method | Reference |\n|----------|--------|-----------|\n| Naver Search | curl_cffi identity spoofing + `search.naver.com` (통합\u002F블로그\u002F뉴스) | `naver.md` |\n| Naver Finance (stock prices) | `api.finance.naver.com\u002FsiseJson.naver` (unofficial, no auth) | `naver.md` |\n\n**Everything else flows through Phase 1~3 automatically** — including Coupang (curl_cffi safari), LinkedIn (identity spoofing → JSON-LD full article body), fmkorea (identity spoofing), Medium (Jina), 
most Korean forums (Jina or curl), and any site with `\u002Frss` or `\u002Ffeed` endpoints.\n\n---\n\n## References\n\nThe skill is organized as a set of reference files, each covering one class of techniques.\n\n| File | Covers |\n|------|--------|\n| `fallback.md` | Phase 0→3 adaptive scheduler, escalation signals, response validation |\n| `jina.md` | Jina Reader (no-key reader at `r.jina.ai`) |\n| `json-api.md` | Public JSON APIs (Reddit, HN, dev.to, Wikipedia, npm, PyPI, etc.) |\n| `public-api.md` | Bluesky, Mastodon, Stack Exchange, arXiv, CrossRef, OpenLibrary, GitHub, Wayback |\n| `media.md` | yt-dlp usage for 1,858 media sites |\n| `twitter.md` | Twitter Syndication API + oEmbed + WebSearch keyword search |\n| `naver.md` | Naver Search (curl_cffi identity spoofing), blog mobile URLs, Finance JSON API |\n| `rss.md` | Korean news RSS (9 outlets), Google News RSS, feedparser, SearXNG |\n| `tls-impersonate.md` | curl_cffi multi-target + identity spoofing (cookie warming, referrer chain) + behavioral challenge detection |\n| `playwright.md` | Playwright MCP full toolkit (snapshot, evaluate, network_requests) |\n| `cache-archive.md` | Google AMP cache, archive.today, Wayback Machine |\n| `metadata.md` | OGP, JSON-LD, Schema.org, Next.js RSC payload extraction |\n\n---\n\n## Dependencies\n\n**Required:** Claude Code only.\n\n**Auto-installed when needed** (the skill installs these transparently on first use):\n\n```bash\npip install curl_cffi    # TLS impersonation for WAF-blocked sites\npip install feedparser   # RSS\u002FAtom parsing\npip install yt-dlp       # 1,858 media sites\n```\n\n**Optional, improves coverage:**\n\n```bash\nbrew install gh                      # GitHub (faster than REST API)\nclaude mcp add playwright -- npx @playwright\u002Fmcp@latest   # JS-rendered sites\n```\n\nIf a dependency is missing, the skill doesn't skip the method — it installs the dependency and tries.\n\n---\n\n## What insane-search is not\n\n- **Not a scraper** — It's a 
method-selection layer. It uses public APIs and standard techniques\n- **Not API-key based** — Everything uses no-auth public endpoints or URL transformations\n- **Not a hand-maintained answer key** — The index is minimal (~15 groups). Everything else is discovered by the adaptive scheduler\n- **Not bias-forming** — There's no \"access denied\" list. If a site can be reached, the chain will find the way\n\n---\n\n## Usage\n\nThere are no commands. Just talk normally. The skill triggers automatically when a URL is blocked or when accessing platforms that need special handling.\n\n```\n\"What's on the front page of Hacker News right now?\"\n→ Firebase API → top stories with scores and comments\n\n\"Find AI papers published this week on arXiv\"\n→ arXiv Atom API with date filter\n\n\"Scrape Coupang for laptop deals under $1000\"\n→ Phase 2: curl_cffi safari → JSON-LD ItemList\n\n\"Summarize this Medium article\"\n→ Phase 1: Jina Reader → clean markdown\n\n\"Check what people are saying about Claude Code on Reddit\"\n→ Reddit JSON API with Mobile UA → posts + top comments\n\n\"Search X for insane-search\"\n→ Intent routing: keyword search → WebSearch(site:x.com) → oEmbed → full tweets\n\n\"네이버에서 클로드코드 뉴스 찾아줘\"\n→ Naver Search (identity spoofing) → news tab → article URLs → Jina Reader\n\n\"Find LinkedIn articles about AI agents\"\n→ WebSearch(site:linkedin.com) → identity spoofing → JSON-LD articleBody\n```\n\n---\n\n## License\n\nMIT\n\n---\n\n\u003Cdiv align=\"center\">\n\n**If it's on the web, insane-search is getting in.**\n\n\u003C\u002Fdiv>\n","insane-search is a tool that automatically bypasses blocked websites, designed for Claude Code. Its core features include a five-phase adaptive scheduler, automatic installation of TLS impersonation, and hidden-API discovery through a real browser, with no API keys or configuration required. The project is written in Python and is particularly suited to scenarios that require access to restricted or blocked content, such as getting past CAPTCHAs, login walls, and other traditional barriers. Whatever protections the target site puts in place, insane-search can always find a way to retrieve the needed information.",2,"2026-06-11 02:40:02","CREATED_QUERY"]