[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-2069":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":13,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":13,"lastSyncTime":26,"discoverSource":27},2069,"webmcp","AuthBits\u002Fwebmcp","AuthBits","A lightweight, prompt-driven MCP web research server for high-quality LLM powered information extraction.",null,"Python",140,18,2,1,0,9,6,3.84,"MIT License",false,"main",[],"2026-06-12 02:00:36","# webmcp\n\n`webmcp` is an MCP server for web search and content extraction. LLM agents can use it to:\n\n- search the web with DuckDuckGo (default) or SearXNG (optional)\n- fetch and clean page content from one or more URLs\n- send cleaned content to a local LLM for structured extraction\n\n## Features\n\n- `search_web(query, limit=10)` returns web results (title, URL, description)\n- `extract(urls, prompt=None, schema=None, use_browser=True)` extracts data from pages\n- browser-based fetching with Playwright for JavaScript-heavy sites\n- lightweight HTTP fetching mode for faster\u002Fsimple pages\n- persistent tool-call logging to `tool_calls.log.json`\n- configurable search provider: DDG by default, optional SearXNG\n\n## Critical Requirement\n\nFor the main researcher llama.cpp server, include `--webui-mcp-proxy` in launch parameters. Without this flag, this workflow will not function correctly.\n\n## Prompting And Tested Setup\n\nFor best results, use `research_prompt.txt` as your system prompt. 
This prompt is a core part of the intended workflow and quality; it is effectively half of how this repository is meant to function.\n\nTested setup:\n\n- Main researcher LLM: `Qwen3.5:27b-Q3_K_M.gguf` via llama.cpp on an RTX 4090, context length 200,000, about 40 tok\u002Fs.\n- Extract tool LLM: `Qwen3.5:9b-Q4_K_M.gguf` via llama.cpp on a GTX 1080 Ti, context length 32,768, about 40 tok\u002Fs.\n- This workflow has been tested with the llama.cpp WebUI specifically, and has not been validated with other MCP clients yet.\n\n## Requirements\n\n- Python 3.10+\n- A local OpenAI-compatible LLM endpoint (for example, llama.cpp, LM Studio, vLLM, ollama, etc)\n\n## Configuration\n\nThe app reads LLM settings from environment variables and supports a local `.env` file.\n\n1. Copy `.env.example` to `.env`\n2. Set values:\n\n```env\nLLM_URL=http:\u002F\u002Flocalhost:1234\nLLM_MODEL=your-model-name\nSEARCH_PROVIDER=ddg\n# Optional when SEARCH_PROVIDER=searxng\nSEARXNG_URL=http:\u002F\u002Flocalhost:8080\n```\n\n`LLM_URL` and `LLM_MODEL` are required at startup.\n`SEARCH_PROVIDER` defaults to `ddg`. 
Set it to `searxng` to replace DDG, and provide `SEARXNG_URL`.\n\n## Search Providers\n\n`search_web` supports two providers:\n\n- `ddg` (default): uses DuckDuckGo via `ddgs`\n- `searxng`: uses your SearXNG instance\n\nSearXNG notes:\n\n- Set `SEARCH_PROVIDER=searxng`\n- Set `SEARXNG_URL` to your instance base URL (for example, `http:\u002F\u002F192.168.0.55:8888`)\n- `webmcp` calls `\u003CSEARXNG_URL>\u002Fsearch` with `format=json`\n\n## Install\n\nInstall dependencies from the pinned requirements file:\n\n```bash\npip install -r requirements.txt\npython -m playwright install chromium\n```\n\n## Run\n\n```bash\npython app.py\n```\n\nServer starts on:\n\n- `http:\u002F\u002F0.0.0.0:8642`\n\n## MCP Usage Notes\n\n- `extract(..., use_browser=True)` is best for dynamic pages that require JS rendering.\n- `extract(..., use_browser=False)` is faster for static pages.\n- If extraction quality is poor, the LLM should provide a more specific `prompt` and\u002For a stricter `schema`.\n\n## TODO\n\n- Revisit JS page rendering and extraction strategy. Right now, roughly 25-30% of pages return little or no usable content even when fetched successfully.\n- Improve anti-bot handling for page fetches. Many targets still return 400-range errors, so investigate stronger browser mimicry (Playwright\u002FChromium behavior, headers, fingerprinting, and potentially user-agent\u002Fprofile rotation).\n\n## License\n\nMIT. See `LICENSE`.","webmcp is a lightweight, prompt-driven MCP web research server for high-quality, LLM-powered information extraction. Its core features are web search with DuckDuckGo (default) or SearXNG (optional), fetching and cleaning page content from one or more URLs, and sending the cleaned content to a local LLM for structured extraction. Technical highlights include Playwright-based browser rendering for JavaScript-heavy sites, a lightweight HTTP fetch mode for simple pages, and persistent tool-call logging. It suits research scenarios that need to extract specific information from the web efficiently and accurately, such as market analysis and news aggregation.","2026-06-11 02:47:54","CREATED_QUERY"]