[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80649":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":13,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":9,"rankLanguage":9,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":13,"lastSyncTime":25,"discoverSource":26},80649,"benchmark-platform","wgpsec\u002Fbenchmark-platform","wgpsec","浑象 AI agent CTF range competition platform",null,"Python",64,7,2,1,0,15,45.21,"MIT License",false,"master",[],"2026-06-12 04:01:29","[中文文档](README.zh-CN.md) | English\n\n# Benchmark Platform\n\nA CTF challenge platform for security capability evaluation. 
Dynamically manages challenge instances via Docker Compose, with both a Web UI and REST API interface.\n\n## Features\n\n- Challenge lifecycle management (start \u002F stop \u002F health check)\n- **Dynamic Flag injection** — each instance gets a unique `flag{uuid}` at runtime, never baked into images\n- **Challenge Store** — browse, download, and import challenges from GitHub Releases with China mirror acceleration\n- **Hot-reload** — newly downloaded challenges are discoverable without server restart\n- Multi-flag support (multiple attack paths per challenge)\n- Difficulty tiering with Level Gate (progressive unlock)\n- Real-time status display (running \u002F unhealthy \u002F stopped)\n- Submission history & scoring (persisted across restarts)\n- Hint system (with point deduction)\n- Image pre-build \u002F cache page (avoid cold-start delays, **selective build support**)\n- Team management & multi-team scoring\n- Runtime directory isolation (configurable via Web UI)\n- **MCP Server** — Streamable HTTP endpoint for AI Agent integration (Claude Code, LangChain, openai-agents, etc.)\n- **Web UI Authentication** — cookie-based login with admin\u002Fobserver roles; admin token auto-generated or user-defined\n- Apple Silicon (ARM64) compatibility\n\n## Screenshots\n\n| Dashboard | Challenges |\n|-----------|-----------|\n| ![Dashboard](.github\u002Fscreenshots\u002Fdashboard.png) | ![Challenges](.github\u002Fscreenshots\u002Fchallenges.png) |\n\n| Challenge Store | Prebuild |\n|----------------|----------|\n| ![Store](.github\u002Fscreenshots\u002Fstore.png) | ![Prebuild](.github\u002Fscreenshots\u002Fprebuild.png) |\n\n## Quick Start\n\n### Requirements\n\n- Python >= 3.10\n- Docker & Docker Compose\n- sshpass (deployment scripts only)\n\n### Install\n\n```bash\npython3 -m venv venv\n\nsource venv\u002Fbin\u002Factivate\n\npip install -e .\n```\n\n### Prepare Challenge Data\n\nStart the platform and navigate to **\"靶场管理\" (Challenge Store)** in the Web UI sidebar to 
browse and download challenges.\n\nAlternatively, set up manually:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Fbenchmark-challenges \u002Ftmp\u002Fbenchmarks\nmkdir -p challenges\ncp -r \u002Ftmp\u002Fbenchmarks\u002Fxbow challenges\u002Fxbow\ncp -r \u002Ftmp\u002Fbenchmarks\u002Fcustom challenges\u002Fcustom\nrm -rf \u002Ftmp\u002Fbenchmarks\n```\n\n### Run\n\n```bash\npython3 -m benchmark_platform.server \\\n  --benchmark-folder .\u002Fchallenges \\\n  --port 8088 \\\n  --public-accessible-host localhost\n```\n\nOptions:\n\n| Flag | Description | Default |\n|------|-------------|---------|\n| `--benchmark-folder` | Challenge directory (can be repeated) | required |\n| `--benchmark-id` \u002F `-i` | Load only specific IDs | all |\n| `--challenges-dir` | Root directory for store downloads | `.\u002Fchallenges` |\n| `--admin-token` | Admin token for Web UI login (random if omitted) | random |\n| `--host` | Host to bind to | `0.0.0.0` |\n| `--port` | Server port | 8088 |\n| `--public-accessible-host` | Public hostname for challenges | localhost |\n| `--no-level-gate` | Disable level-based unlock | false |\n\nThe `--admin-token` can also be set via the `ADMIN_TOKEN` environment variable. On startup the server prints the active admin token to the console.\n\nTo hide public repository links, the `powered by wgpsec` footer, and the import\u002Fauthoring guidance in the Challenge Store, start the server with:\n\n```bash\nBENCHMARK_PLATFORM_UI_PROFILE=hide_branding python3 -m benchmark_platform.server \\\n  --benchmark-folder .\u002Fchallenges \\\n  --port 8088 \\\n  --public-accessible-host localhost\n```\n\nAvailable values:\n\n- `open_source`: default UI with public links and import capabilities\n- `hide_branding`: hides public branding and the store import\u002Fauthoring entry points\n\n### Access\n\nOpen `http:\u002F\u002Flocalhost:8088` in your browser. 
You will be redirected to the login page — enter the admin token printed in the console to get full access. Team observers log in with their own Agent-Token and see a read-only scoreboard.\n\n## Project Structure\n\n```\nbenchmark_platform\u002F\n├── server.py              # FastAPI entry, API routes\n├── base.py                # Core models (Challenge, FlagState, etc.)\n├── mcp_server.py          # MCP Server (5 tools via Streamable HTTP)\n├── auth.py                # Agent-Token authentication\n├── db.py                  # SQLite persistence (teams, progress, settings)\n├── models\u002F\n│   └── benchmark.py       # Benchmark JSON schema\n├── utils\u002F\n│   ├── challenge.py       # ChallengeManager (instance lifecycle, dynamic flag injection)\n│   └── logger.py          # Structured logging\n├── web\u002F\n│   ├── routes.py          # Web UI page & HTMX partial routes\n│   ├── auth_middleware.py # Cookie-based session auth (admin\u002Fobserver roles)\n│   ├── context.py         # Template context builders\n│   ├── prebuild_manager.py # Image pre-build manager\n│   ├── submission_store.py # Submission persistence\n│   ├── store.py           # Challenge store (GitHub Releases download)\n│   └── templates\u002F         # Jinja2 templates\n└── static\u002F\n    └── css\u002Fapp.css\n\nscripts\u002F                   # Deployment helper scripts\nchallenges\u002F                # Challenge source code (git ignored)\nruntime\u002F                   # Running instance copies (git ignored)\ntests\u002F                     # Tests\n```\n\n## API\n\n### Web UI\n\n| Route | Description |\n|-------|-------------|\n| `GET \u002Fweb\u002Flogin` | Login page |\n| `GET \u002Fweb\u002Fscoreboard` | Observer scoreboard (read-only) |\n| `GET \u002Fweb\u002Fdashboard` | Dashboard |\n| `GET \u002Fweb\u002Fchallenges` | Challenge list |\n| `GET \u002Fweb\u002Fhistory` | Submission history |\n| `GET \u002Fweb\u002Fstatus` | Instance status |\n| `GET \u002Fweb\u002Fstore` | Challenge 
store (download \u002F import) |\n| `GET \u002Fweb\u002Fprebuild` | Image pre-build |\n| `GET \u002Fweb\u002Fteams` | Team management |\n| `GET \u002Fweb\u002Fsettings` | Platform settings |\n\n### REST API\n\nAll endpoints require `Agent-Token` header. Endpoints marked with 🔒 require the admin (default team) token.\n\n| Route | Description |\n|-------|-------------|\n| `GET \u002Fapi\u002Fchallenges` | List all challenges |\n| `POST \u002Fapi\u002Fstart_challenge` | Start instance `{code}` |\n| `POST \u002Fapi\u002Fstop_challenge` | Stop instance `{code}` |\n| `POST \u002Fapi\u002Fsubmit` | Submit flag `{code, flag}` |\n| `POST \u002Fapi\u002Fhint` | Get hint `{code}` |\n| `GET \u002Fapi\u002Fchallenges\u002F{code}\u002Fprogress` | Query flag progress |\n| `POST \u002Fapi\u002Fstop_all` | 🔒 Stop all instances |\n| `POST \u002Fapi\u002Fchallenges\u002Freload` | 🔒 Hot-reload newly downloaded challenges |\n| `POST \u002Fapi\u002Fstart_level` | 🔒 Start all challenges at a level |\n| `POST \u002Fapi\u002Fstop_level` | 🔒 Stop all challenges at a level |\n| `GET \u002Fapi\u002Finstance_statuses` | 🔒 Batch query instance statuses |\n\n### Store API (🔒 Admin only)\n\n| Route | Description |\n|-------|-------------|\n| `GET \u002Fapi\u002Fstore\u002Fmanifest` | Fetch remote challenge manifest |\n| `POST \u002Fapi\u002Fstore\u002Fdownload` | Download a challenge by ID |\n| `POST \u002Fapi\u002Fstore\u002Fdownload-all` | Download all challenges in a category |\n| `POST \u002Fapi\u002Fstore\u002Fdelete` | Delete a downloaded challenge |\n| `POST \u002Fapi\u002Fstore\u002Fimport` | Import a local zip file |\n\n### Prebuild API (🔒 Admin only)\n\n| Route | Description |\n|-------|-------------|\n| `POST \u002Fapi\u002Fprebuild\u002Fstart` | Start image pre-build (supports selective `{codes: [...]}`) |\n| `POST \u002Fapi\u002Fprebuild\u002Fstop` | Stop pre-build |\n| `GET \u002Fapi\u002Fprebuild\u002Fstatus` | Query pre-build progress |\n| `POST 
\u002Fapi\u002Fprebuild\u002Fremove` | Remove a pre-built image |\n| `POST \u002Fapi\u002Fprebuild\u002Fremove_batch` | Remove selected pre-built images `{codes: [...]}` |\n| `POST \u002Fapi\u002Fprebuild\u002Fremove_all` | Remove all pre-built images |\n\n### MCP Server\n\nThe platform exposes an MCP (Model Context Protocol) endpoint at `\u002Fmcp\u002F` using Streamable HTTP transport, allowing AI agents to interact with challenges directly.\n\n**Tools:**\n\n| Tool | Description | Parameters |\n|------|-------------|------------|\n| `list_challenges` | Get challenge list with team progress | — |\n| `start_challenge` | Start a challenge instance | `code` |\n| `stop_challenge` | Stop a running instance | `code` |\n| `submit_flag` | Submit flag answer | `code`, `flag` |\n| `view_hint` | View challenge hint (10% score penalty) | `code` |\n\n**Authentication:** `Authorization: Bearer \u003Cagent_token>` header.\n\n**Claude Code example:**\n\n```bash\nclaude mcp add benchmark-platform \\\n  --transport http \\\n  --header \"Authorization: Bearer \u003CYOUR_TOKEN>\" \\\n  http:\u002F\u002F\u003CSERVER_HOST>:8088\u002Fmcp\u002F\n```\n\n**JSON config (Cursor, Cline, Windsurf, etc.):**\n\n```json\n{\n  \"mcpServers\": {\n    \"benchmark-platform\": {\n      \"url\": \"http:\u002F\u002F\u003CSERVER_HOST>:8088\u002Fmcp\u002F\",\n      \"headers\": {\n        \"Authorization\": \"Bearer \u003CYOUR_TOKEN>\"\n      }\n    }\n  }\n}\n```\n\n## Agent Integration\n\nThe platform supports two integration methods for AI agents:\n\n1. **MCP (recommended)** — Connect via Streamable HTTP at `\u002Fmcp\u002F`; AI agents call tools directly without writing HTTP code. See [MCP Server](#mcp-server) section above.\n\n2. **REST API** — Standard HTTP endpoints at `\u002Fapi\u002F*`. Authenticate with an `Agent-Token: \u003Ctoken>` header. 
See [REST API](#rest-api) section above.\n\nFor complete API\u002FMCP protocol documentation and integration examples (LangChain, openai-agents, Python native), see the [Tsec-Hackathon documentation](https:\u002F\u002Fgithub.com\u002FYeti-791\u002FTsec-Hackathon\u002Ftree\u002Fmain\u002F%E7%AC%AC%E4%BA%8C%E5%B1%8A%E6%99%BA%E8%83%BD%E6%B8%97%E9%80%8F%E9%BB%91%E5%AE%A2%E6%9D%BE).\n\n## Challenge Format\n\nEach challenge is a directory containing:\n\n```\nXBEN-001-24\u002F\n├── docker-compose.yml    # Required (services read FLAG from environment)\n├── benchmark.json        # Metadata (name, description, level, points)\n├── benchmark.yaml        # Optional, multi-flag definitions\n├── .env                  # FLAG placeholder (replaced at runtime with dynamic flag)\n└── app\u002F mysql\u002F ...       # Application code\n```\n\nThe platform injects a unique `flag{uuid}` into each instance at startup — challenge source code should read the flag from the `FLAG` environment variable rather than hardcoding it.\n\n> **Contributing challenges:** This repo is the platform only. Challenge source code lives in [benchmark-challenges](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Fbenchmark-challenges). 
To add or modify challenges, submit PRs there.\n\n## Tech Stack\n\n- **Backend**: FastAPI + Uvicorn\n- **Frontend**: Jinja2 + HTMX + Alpine.js + Tailwind CSS (CDN)\n- **Containers**: Docker Compose\n- **Logging**: Structured JSONL\n\n## References\n\n- [xbow-engineering\u002Fvalidation-benchmarks](https:\u002F\u002Fgithub.com\u002Fxbow-engineering\u002Fvalidation-benchmarks)\n- [Neuro-Sploit\u002Fxbow-validation-benchmarks](https:\u002F\u002Fgithub.com\u002FNeuro-Sploit\u002Fxbow-validation-benchmarks)\n- [Yeti-791\u002FTsec-Hackathon](https:\u002F\u002Fgithub.com\u002FYeti-791\u002FTsec-Hackathon)\n- [Cyberdefense\u002FGOAD](https:\u002F\u002Fgithub.com\u002FOrange-Cyberdefense\u002FGOAD)\n- [dockur\u002Fwindows](https:\u002F\u002Fgithub.com\u002Fdockur\u002Fwindows)\n- [pensarai\u002Fargus-validation-benchmarks](https:\u002F\u002Fgithub.com\u002Fpensarai\u002Fargus-validation-benchmarks)\n\n## WgpSec Agentic Ecosystem\n\nbenchmark-platform is the evaluation layer of the **WgpSec Agentic Ecosystem** — measuring how well AI agents perform in real offensive security scenarios.\n\n```\n┌───────────────────── WgpSec Agentic Ecosystem ─────────────────────┐\n│                                                                     │\n│  Knowledge ➜ Service ➜ Execution ➜ Evaluation                      │\n│                                                                     │\n│  AboutSecurity ──▶ context1337 ──▶ tchkiller ──▶ benchmark-platform │\n│  (Knowledge Base)  (MCP Server)    (Pentest Agent)  (this repo)    │\n│                                         ▲                           │\n│                                    PoJun (General Solver)           │\n│                                                                     │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n| Project | Role |\n|---------|------|\n| [AboutSecurity](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002FAboutSecurity) | Structured pentest knowledge base (Skills, 
Dic, Payload, Vuln) |\n| [context1337](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Fcontext1337) | MCP Server — turns AboutSecurity into a searchable API for AI agents |\n| [tchkiller](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Ftchkiller) | Autonomous pentest agent with multi-round decision-making and team collaboration |\n| [benchmark-platform](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Fbenchmark-platform) | CTF challenge platform for evaluating agent offensive capabilities |\n| [benchmark-challenges](https:\u002F\u002Fgithub.com\u002Fwgpsec\u002Fbenchmark-challenges) | Challenge data repository — packed & distributed via GitHub Releases |\n| PoJun | General-purpose AI problem-solving engine (private) |\n\n## FAQ\n\n### \"all predefined address pools have been fully subnetted\"\n\nWhen starting many challenges simultaneously, Docker may run out of network address space. This is because Docker allocates a `\u002F16` subnet per network by default, which limits the total number of networks.\n\n**Fix:** Add `default-address-pools` to your Docker daemon config (Docker Desktop → Settings → Docker Engine):\n\n```json\n{\n  \"default-address-pools\": [\n    {\n      \"base\": \"172.17.0.0\u002F12\",\n      \"size\": 24\n    }\n  ]\n}\n```\n\nClick **Apply & Restart**. This allocates a `\u002F24` subnet per network (254 usable addresses each, more than enough for a challenge), allowing 4000+ concurrent networks.\n\n## License\n\n[MIT](LICENSE)\n","This project is an AI agent CTF range competition platform for security capability evaluation. Its core features include dynamic management of challenge instances (start, stop, and health check via Docker Compose), dynamic injection of unique flags for stronger security, and a challenge store for browsing, downloading, and importing challenges from GitHub Releases, with China mirror acceleration for downloads. It also offers hot-reload of new challenges, multi-flag support, difficulty-tiered unlocking, real-time status display, and submission history with scoring. In particular, the platform provides an MCP server endpoint for integrating AI agent tools, plus cookie-based Web UI authentication. The project suits cybersecurity education, CTF competition training, and security skills testing.","2026-06-06 04:02:36","CREATED_QUERY"]