[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80971":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":25,"discoverSource":26},80971,"HarnessX","Darwin-Agent\u002FHarnessX","Darwin-Agent","HarnessX is a harness foundry: forge any number of agent harnesses from reusable processors and bundles, pair each with any model, and evolve them through training.",null,"Python",33,1,31,0,2,0.9,"MIT License",false,"main",true,[],"2026-06-12 02:04:09","\u003Ctable align=\"center\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\u003Ctr>\u003Ctd align=\"center\" valign=\"middle\">\n\u003Cimg src=\"docs\u002Fassets\u002Fharnessx_logo.png\" alt=\"HarnessX Logo\" width=\"72\"\u002F>\n\u003C\u002Ftd>\u003Ctd align=\"left\" valign=\"middle\">\n\u003Cpicture>\n  \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"docs\u002Fassets\u002Fharnessx_wordmark_dark.png\"\u002F>\n  \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"docs\u002Fassets\u002Fharnessx_wordmark.png\"\u002F>\n  \u003Cimg alt=\"HarnessX\" src=\"docs\u002Fassets\u002Fharnessx_wordmark.png\" height=\"48\"\u002F>\n\u003C\u002Fpicture>\n\u003Cbr\u002F>\n\u003Cb>Compose. &nbsp; Adapt. 
&nbsp; Evolve.\u003C\u002Fb>\n\u003C\u002Ftd>\u003C\u002Ftr>\u003C\u002Ftable>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Compose the Harness, define the Agent.\u003Cbr\u002F>\n  From zero-code to full customization — one core, X entry points.\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"LICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-22c55e?style=flat\"\u002F>\u003C\u002Fa>\n  \u003Cimg alt=\"Python\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.11+-3b82f6?style=flat&logo=python&logoColor=white\"\u002F>\n  \u003Cimg alt=\"Version\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fversion-0.1.0-a855f7?style=flat\"\u002F>\n  \u003Cimg alt=\"Status\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FStatus-Beta-f59e0b?style=flat\"\u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#overview\">Overview\u003C\u002Fa> •\n  \u003Ca href=\"#architecture\">Architecture\u003C\u002Fa> •\n  \u003Ca href=\"#quick-start\">Quick Start\u003C\u002Fa> •\n  \u003Ca href=\"#benchmarks\">Benchmarks\u003C\u002Fa> •\n  \u003Ca href=\"#roadmap\">Roadmap\u003C\u002Fa> •\n  \u003Ca href=\"README_zh.md\">中文文档\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n\u003Ca id=\"overview\">\u003C\u002Fa>\n## 🔭 Overview\n\n\n> **The harness — not just the model — determines agent performance.** The same base model produces dramatically different results depending on how context is managed, how tools are orchestrated, how errors are recovered, and how evaluation signals feed back.\n\nHarnessX is a **harness foundry**: forge any number of agent harnesses from reusable processors and bundles, pair each with any model, and evolve them through training — all without rewriting the agent.\n\nMost frameworks solved model swapping. 
**Behavior swapping** remains expensive — switching from a coding agent to a research agent, adding memory or guardrails, means rewriting the agent.\n\nHarnessX solves this with one clean separation:\n\n```python\nagent = model.agentic(harness)\n```\n\n- `ModelConfig` — provider routing, fallback, per-role model assignment\n- `HarnessConfig` — the full behavior pipeline (tools, memory, processors, trace, sandbox)\n\nThe **X** in Harness**X** stands for e**X**tensible Behavior Composition — compose, adapt, and evolve harnesses without rewriting the agent:\n\n🧩 **Compose** — 9-dimension behavior pipeline; any behavior = Processor, combine with `|` operator.\n\n⚙️ **Adapt** — Harness observes performance and auto-searches optimal harness configurations.\n\n🚀 **Evolve** — every run produces reward-annotated trajectories that feed SFT \u002F RL training.\n\n---\n\n\u003Ca id=\"architecture\">\u003C\u002Fa>\n## 🏗️ Architecture\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fassets\u002Fharnessx_architecture.png\" alt=\"HarnessX Architecture\" width=\"800\"\u002F>\n\u003C\u002Fp>\n\n→ See **[docs\u002Farchitecture.md](docs\u002Farchitecture.md)** for the full 9-dimension behavior pipeline, processor hook points, and composition API.\n\n---\n\n\u003Ca id=\"quick-start\">\u003C\u002Fa>\n\n## 🚀 Quick Start\n\n\u003Cdetails>\n\u003Csummary>Click to expand\u003C\u002Fsummary>\n\n### Install\n\n**One-click install** (interactive — asks before installing uv, Node.js, and optional IM Gateway):\n\n```bash\ncurl -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FDarwin-Agent\u002FHarnessX\u002Fmain\u002Fscripts\u002Finstall.sh | bash\n```\n\n**Non-interactive — install everything without prompts:**\n\n```bash\ncurl -sSf https:\u002F\u002Fraw.githubusercontent.com\u002FDarwin-Agent\u002FHarnessX\u002Fmain\u002Fscripts\u002Finstall.sh | bash -s -- --all\n```\n\nBoth commands install uv, Python 3.12, harnessx, and (with Node.js available) the Harness Lab frontend.\nAfter 
installation, reload your shell or run `source ~\u002F.bashrc` (or `~\u002F.zshrc` on macOS).\n\n\u003Cdetails>\n\u003Csummary>Manual install with \u003Ca href=\"https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F\">uv\u003C\u002Fa>\u003C\u002Fsummary>\n\n```bash\nuv python install 3.12\nuv venv --python 3.12 .venv\nsource .venv\u002Fbin\u002Factivate\nuv pip install -e .\n# Build frontend (required for hx lab)\ncd frontend && npm install && npm run build && cd ..\n```\n\n\u003C\u002Fdetails>\n\n### CLI\n\n```bash\nexport ANTHROPIC_API_KEY=sk-...\n\nhx \"Research 2026 AI agent trends and write a structured report\"\nhx -p \"Write a Python fizzbuzz\"     # non-interactive, print and exit\nhx -c path\u002Fto\u002Fconfig.yaml           # load a YAML config\nhx --resume \u003Crun_id>                # resume a previous session\nhx lab                              # open the Lab UI at localhost:8000\n```\n\n### IM Gateway\n\nConnect your agent to Feishu, Telegram, Slack, Discord, or DingTalk with a single service.\nThe gateway ships with a built-in React console for managing channels, sessions, and workspaces.\n\n```bash\nhx-gateway start   # start the gateway (configured in ~\u002F.harnessx\u002Fgateway.yaml)\n```\n\n→ See **[gateway\u002FREADME.md](gateway\u002FREADME.md)** for setup, channel configuration, and architecture.\n\n### Python SDK\n\n\u003Cdetails>\n\u003Csummary>Minimal runnable example\u003C\u002Fsummary>\n\n```python\nimport asyncio\nfrom harnessx import BaseTask, HarnessConfig\nfrom harnessx.core.model_config import ModelConfig\nfrom harnessx.providers.anthropic_provider import AnthropicProvider\n\nasync def main():\n    model = ModelConfig(main=AnthropicProvider(\"claude-sonnet-4-6\"))\n    harness = model.agentic(HarnessConfig())\n    result = await harness.run(BaseTask(description=\"What is 2 + 2?\"))\n    print(result.final_output)\n\nasyncio.run(main())\n```\n\n\u003C\u002Fdetails>\n\n\u003C\u002Fdetails>\n\n---\n\n\u003Ca 
id=\"benchmarks\">\u003C\u002Fa>\n## 📊 Benchmarks\n\nHarnessX provides two evolution loops that systematically improve agent performance on any benchmark:\n\n- **Harness Evolution** — a meta-harness analyzes trajectories and automatically searches for better processor combinations, prompt strategies, and tool configurations, *without changing the model*.\n- **Model Evolution** — reward-annotated trajectories from harness runs feed RL fine-tuning (via [VERL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl)), improving the model itself.\n\nThe two loops compose: evolve the harness first, then evolve the model on top. Below are results on the [GAIA](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fgaia-benchmark\u002FGAIA) benchmark. See [`benchmarks\u002FREADME.md`](benchmarks\u002FREADME.md) for additional benchmarks and adapter details.\n\n### Harness Evolution (Qwen 3.5 9B)\n\nStarting from a default harness (R0, 33%), the meta-harness discovers better configurations round by round — reaching **47%** by R3, a **+14pp gain** with zero model changes. → Reproduce: [`recipe\u002Fgaia_evolver\u002F`](recipe\u002Fgaia_evolver\u002F)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fassets\u002FHarness_Evolution_Config.png\" alt=\"Harness Evolution Config — Round 0 to Round 3\" width=\"800\"\u002F>\n\u003C\u002Fp>\n\n### Harness Evolution (GPT-5)\n\nThe same approach scales to frontier models. Overall GAIA accuracy rises from 62% to **84%** after evolution, with gains across all five domains. 
→ Reproduce: [`recipe\u002Fgaia_evolver\u002F`](recipe\u002Fgaia_evolver\u002F)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fassets\u002FHarness_Evolution.png\" alt=\"Harness Evolution — Per-Domain Accuracy\" width=\"700\"\u002F>\n\u003C\u002Fp>\n\n### Model-Harness Co-Evolution (Qwen 3.5 9B)\n\nWhen the two loops run together, the gains compound: harness evolution lifts the baseline from 33.97% to 41.67%; model evolution pushes it further to **55.77%** — a **+64% relative improvement**, all on a 9B model. → Reproduce: [`recipe\u002Fverl_harnessX\u002F`](recipe\u002Fverl_harnessX\u002F)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fassets\u002FHarnessX_Model_Co_evolution.png\" alt=\"Model-Harness Co-Evolution\" width=\"700\"\u002F>\n\u003C\u002Fp>\n\n---\n\n\u003Ca id=\"structure\">\u003C\u002Fa>\n## 📁 Project Structure\n\n```\nHarnessX\u002F\n├── harnessx\u002F                  # 🧠 Core framework\n│   ├── core\u002F                  #    Harness, Builder, RunLoop, State, Events, Trajectory\n│   ├── processors\u002F            #    7 categories × multiple processors\n│   │   ├── context\u002F           #    📝 System prompt, history, user wrapper\n│   │   ├── control\u002F           #    🛡️ 13 safety & reliability processors\n│   │   ├── evaluation\u002F        #    📊 LLM judge, PRM, self-verify\n│   │   ├── memory\u002F            #    🧠 Extraction, retrieval, 5 strategies\n│   │   ├── multi_model\u002F       #    🔗 Model routing\n│   │   ├── observability\u002F     #    🔭 OTel, checkpoints, metrics\n│   │   └── tools\u002F             #    🔧 Skill loader, schema adapter, filters\n│   ├── providers\u002F             # 🔌 6 model backends + agentic mixin\n│   ├── plugins\u002F               # 🧩 Plugin base, discovery, builtins, dimensions\n│   │   └── dimensions\u002F\n│   │       └── light_memory\u002F  # 🧠 Light-Memory (self-developed)\n│   ├── tools\u002F                 # ⚒️ Tool registry, builtins\n│   ├── sandbox\u002F               # 📦 
Local, Docker, E2B\n│   ├── tracing\u002F               # 📡 Journal, OTel, null tracer\n│   ├── rl\u002F                    # 🧬 RLConfigSpec, TaskBuilder\n│   ├── bundles\u002F               # 📦 Pre-composed capability bundles\n│   ├── api\u002F                   # 🌐 FastAPI + SSE for Lab UI\n│   └── cli.py                 # ⌨️ CLI entry point (hx)\n├── benchmarks\u002F                # 📊 4 integrated + 3 ongoing benchmarks\n├── recipe\u002F                    # 🧪 slime (RL training recipe)\n├── examples\u002F                  # 📖 coding \u002F research \u002F assistant \u002F custom_processor\n├── extensions\u002F                # 🔌 Skills (docx, pdf, pptx, xlsx)\n├── frontend\u002F                  # 🖥️ Lab UI (React + TypeScript + Tailwind)\n└── tests\u002F                     # ✅ Unit, integration, E2E\n```\n\n---\n\n\u003Ca id=\"roadmap\">\u003C\u002Fa>\n## 🗺️ Roadmap\n\n> For detailed design notes and motivation behind planned items, see [ROADMAP](docs\u002FROADMAP.md).\n\n| Phase | Focus | Status |\n|:-----:|-------|:------:|\n| **1** | Core: 9-dimension behavior pipeline, 13 processors, multi-provider, SFT\u002FRL bridge, 4 benchmarks, Lab UI | ![current](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-current-22c55e?style=flat-square) |\n| **2** | Meta-opt: Bayesian Optimization, Meta-Harness, auto config search | ![in progress](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-in%20progress-f59e0b?style=flat-square) |\n| **3** | Self-evolution: closed-loop training, HarnessHUB community marketplace | ![planned](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-planned-8b5cf6?style=flat-square) |\n| **4** | Memory: multimodal backends, third-party integrations (VERL, SuperMemory, OpenVKing) | ![planned](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F-planned-8b5cf6?style=flat-square) |\n\n### In-Repo Implementations\n\n- [x] **[Light-Memory](docs\u002Ffeats\u002Flight-memory.md)** — file-based memory with time-decay, daily compression, git versioning 
(`harnessx\u002Fplugins\u002Fdimensions\u002Flight_memory\u002F`)\n- [x] **Slime RL recipe** — SGLang rollout adapter + token annotation + GRPO training pipeline (`recipe\u002Fslime\u002F`)\n- [x] **MetaHarness** — agent observes its own trajectories and proposes harness config changes; observer harness + meta-agent + sandboxed promotion loop\n- [ ] **LoCoMo benchmark** — long-context memory evaluation: session recall, cross-turn consistency, compaction fidelity\n- [ ] **Bayesian Optimization** — surrogate model search over the ~10^6-configuration harness space\n- [ ] **HarnessHUB** — community platform to publish, version, and pull `HarnessConfig` bundles (`hx pull coding-agent@v1.2`; Lab UI panel; private registries)\n- [ ] **Multimodal Memory** — CLIP-based image\u002Fvideo memory backend via the plugin system\n- [ ] **Harness Memory Evolution** — closed loop: trajectories → RL fine-tuning → better model → better harness; population-level mutation + data flywheel\n\n### Third-Party Integrations *(opt-in, live in `recipe\u002F`)*\n\n- [x] **[VERL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl)** — connect HarnessX rollouts to distributed PPO \u002F GRPO training loops\n- [ ] **[MemPalace](https:\u002F\u002Fgithub.com\u002Fmem-palace\u002Fmem-palace)** — structured episodic memory backend\n- [ ] **[SuperMemory](https:\u002F\u002Fsupermemory.ai)** — cloud-backed semantic memory via the plugin system\n- [ ] **[OpenVKing](https:\u002F\u002Fgithub.com\u002Fopenvking)** — vector-knowledge-graph memory for entity-rich domains\n- [ ] **Memory quality metrics** — retrieval precision \u002F recall surfaced through HarnessJournal\n- [ ] **Data synthesis pipeline** — controlled SFT \u002F preference-dataset generation with diversity constraints\n\n---\n\n## 🤝 Contributing\n\nHarnessX is **fully open-source** under the MIT License. 
Contributions are welcome for:\n\n- 🧩 **New processors** — behavior modules for unexplored dimensions\n- 🧠 **New memory backends** — via the plugin system\n- 📊 **New benchmark adapters** — `benchmarks\u002F` pattern\n- 🧪 **RL training recipes** — `recipe\u002F`\n- 🖥️ **Lab UI improvements**\n\nPlease read [CONTRIBUTING.md](CONTRIBUTING.md) first.\n\n---\n\n```bibtex\n@software{harnessx2026,\n  title   = {HarnessX: A Composable, Self-Evolving Agent Harness Foundry},\n  author  = {Darwin Agent Team},\n  year    = {2026},\n  url     = {https:\u002F\u002Fgithub.com\u002FDarwin-Agent\u002FHarnessX},\n  license = {MIT},\n}\n```\n\n---\n\n\u003Cdiv align=\"center\">\n  \u003Cstrong>HARNESS\u003C\u002Fstrong>\u003Cstrong>X\u003C\u002Fstrong> — \u003Cem>Compose. Adapt. Evolve.\u003C\u002Fem>\n  \u003Cbr\u002F>\n  \u003Csub>Built with care by the \u003Cstrong>Darwin Agent Team\u003C\u002Fstrong>\u003C\u002Fsub>\n\u003C\u002Fdiv>\n","HarnessX is a tool for building and optimizing agent harnesses. It lets users create any number of agent harnesses from reusable processors and bundles and pair each with any model. Its core features include a 9-dimension behavior pipeline, automatic search for optimal harness configurations, and continual evolution of agents through training. HarnessX is written in Python and supports flexible configuration, from zero-code to full customization. It is especially suited to cases that require quickly adapting agent behavior to different task scenarios, such as switching from a coding assistant to a research assistant or adding memory, without rewriting the whole agent.","2026-06-11 04:03:02","CREATED_QUERY"]