[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84123":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":10,"trendingCount":16,"starSnapshotCount":16,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},84123,"AgentHarness","ApodexAI\u002FAgentHarness","ApodexAI","Evaluation harness for Apodex-1.0 on public deep-research benchmarks.","https:\u002F\u002Fwww.apodex.com",null,"Python",120,7,57,1,0,16,34,66,83.11,"Apache License 2.0",false,"main",[],"2026-06-12 04:01:42","\u003Cdiv align=\"center\">\n  \u003Cpicture>\n    \u003Cimg src=\".\u002Fassets\u002Fapodex_logo.png\" width=\"30%\" alt=\"Apodex-1.0\">\n  \u003C\u002Fpicture>\n\u003C\u002Fdiv>\n\u003Chr>\n\u003Cdiv align=\"center\" style=\"line-height:1\">\n\u003Ca href=\"https:\u002F\u002Fwww.apodex.ai\" target=\"_blank\">\u003Cimg alt=\"Research\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤖%20Online Service-Apodex 1.0-ff6b6b?color=1783ff&logoColor=white\"\u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.apodex.com\u002F\" target=\"_blank\">\u003Cimg alt=\"Homepage\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHomepage-Apodex AI-white?logoColor=white\"\u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fplatform.apodex.ai\" target=\"_blank\">\u003Cimg alt=\"API\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FAPI-Apodex 1.0-1783ff?color=1783ff&logoColor=white\"\u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\" 
style=\"line-height: 1;\">\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fapodex\" target=\"_blank\">\u003Cimg alt=\"Hugging Face\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Apodex AI-ffc107?color=ffc107&logoColor=white\"\u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FApodexAI\" target=\"_blank\">\u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-Apodex AI-white?logo=github&logoColor=white\"\u002F>\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fdiscord.gg\u002FTDJA59TCng\" target=\"_blank\">\u003Cimg alt=\"Discord\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-Apodex AI-white?logo=discord&logoColor=white\"\u002F>\u003C\u002Fa>\n\u003Ca href=\"LICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache_2.0-blue?color=blue\"\u002F>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\u003Cp align=\"center\">\n\u003Cb>📰\u003Ca href=\"https:\u002F\u002Fwww.apodex.com\u002Fblog\u002Fapodex-1.0\">Tech Blog\u003C\u002Fa>\u003C\u002Fb> |  \u003Cb>📄\u003Ca href=\"http:\u002F\u002Fwww.apodex.com\u002Fpdf\u002F20260608\">Tech Report\u003C\u002Fa>\u003C\u002Fb>\n\u003C\u002Fp>\n\n---\n\n# AgentHarness\n\n**Evaluation harness for [Apodex-1.0](https:\u002F\u002Fhuggingface.co\u002Fapodex\u002FApodex) on public deep-research benchmarks.**\n\nAgentHarness is the open-source evaluation harness used to reproduce the public benchmark results for **Apodex-1.0** in a standard **ReAct setup**. Apodex-1.0 is a verification-centric model for deep research developed by the Apodex team. 
This repository focuses on the public, standard ReAct evaluation setup reported in the paper.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fapodex1.0_bench.png\" alt=\"Apodex-1.0 results across deep-research benchmarks\" width=\"800\"\u002F>\n\u003C\u002Fp>\n\n---\n\n## 📊 Performance\n\nOpen-source Apodex-1.0 variants on the four-benchmark deep-research suite:\n\n| Model               | BrowseComp | BrowseComp-ZH | HLE-Text | DeepSearchQA |\n| ------------------- | ---------- | ------------- | -------- | ------------ |\n| Apodex-1.0-mini     | 71.5       | 80.6          | 46.8     | 82.2         |\n| Apodex-1.0-4B-SFT   | 48.8       | 63.5          | 32.9     | 69.9         |\n| Apodex-1.0-2B-SFT   | 27.9       | 35.0          | 18.2     | 49.9         |\n| Apodex-1.0-0.8B-SFT | 13.9       | 10.7          | 11.2     | 25.8         |\n\n---\n\n## ⚡ Quick Start on the harness\n\n### 1. Install dependencies\n\n```bash\nuv sync --python 3.12\n```\n\n### 2. Serve the model (SGLang)\n\n```bash\npython3 -m sglang.launch_server \\\n  --model-path apodex\u002FApodex-1.0-35B-A3B \\\n  --tp 8 \\\n  --host 0.0.0.0 \\\n  --port 1234 \\\n  --context-length 262144 \\\n  --tool-call-parser qwen3_coder \\\n  --reasoning-parser qwen3\n```\n\nFor smaller variants and other serving options, see the [Hugging Face model card](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fapodex\u002Fapodex-1).\n\n### 3. Configure environment variables\n\n```bash\ncp .env.example .env\n```\n\nFill in the required keys in `.env`: `OPENAI_BASE_URL` \u002F `OPENAI_API_KEY` \u002F `OPENAI_MODEL` point at the agent model (your SGLang endpoint from step 2 or any OpenAI-compatible service); `SERPER_API_KEY` \u002F `JINA_API_KEY` \u002F `E2B_API_KEY` enable web search, web fetch, and the code sandbox respectively.\n\n### 4. 
Download the benchmark datasets\n\n```bash\nwget https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fapodex\u002FDeep-Research-Benchmarks\u002Fresolve\u002Fmain\u002Fdeep_research_benchmarks_260607.zip\nunzip -P 'apodex*()_2026' deep_research_benchmarks_260607.zip\nrm deep_research_benchmarks_260607.zip\n```\n\nThe single quotes around the password are required because it contains shell metacharacters (`*`, `(`, `)`).\n\n> **HLE is not included.** Its license forbids redistributing the answers. To run `hle_text`, accept the license on [`cais\u002Fhle`](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fcais\u002Fhle) and place the JSONL at `benchmarks\u002Fdatasets\u002FHLE-text\u002Fstandardized_data.jsonl`.\n\n### 5. Run a smoke test\n\n```bash\nuv run python -m benchmarks.runner.run_subprocess \\\n  --benchmark browsecomp \\\n  --pipeline react_base \\\n  --profile default \\\n  --limit 1 \\\n  --concurrency 1 \\\n  --out .\u002Ftmp\u002Fsmoke\n```\n\n### 6. Run a full benchmark\n\n```bash\nuv run python -m benchmarks.runner.run_subprocess \\\n  --benchmark browsecomp \\\n  --pipeline react_base \\\n  --profile default \\\n  --runs 5 \\\n  --concurrency 30 \\\n  --out .\u002Fbc-runs\n```\n\n### 7. 
Check progress and aggregate accuracy\n\n```bash\nuv run python -m benchmarks.runner.check_progress .\u002Fbc-runs\n```\n\nEach question runs in its own subprocess, which makes runs easier to reproduce and debug:\n\n* isolated execution per question\n* no asyncio saturation\n* individual hangs can be `SIGKILL`'d\n* failed samples can be rerun independently\n\n---\n\n## ✅ Supported Benchmarks\n\nBrowseComp, BrowseComp-ZH, xbench-DeepResearch, Humanity's Last Exam (text-only), SuperChem, FrontierScience-Research, FrontierScience-Olympiad, DeepSearchQA, WideSearch\n\nSee [`benchmarks\u002FREADME.md`](benchmarks\u002FREADME.md) for dataset layout, judge configuration, and how to add a new benchmark.\n\n---\n\n## ⭐ Star History\n\n\u003Ca href=\"https:\u002F\u002Fstar-history.com\u002F#ApodexAI\u002FAgentHarness&Date\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=ApodexAI\u002FAgentHarness&type=Date&theme=dark\" \u002F>\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=ApodexAI\u002FAgentHarness&type=Date\" \u002F>\n    \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=ApodexAI\u002FAgentHarness&type=Date\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n---\n\n## 📚 Citation\n\n```bibtex\n@techreport{apodex2026,\n  title  = {Apodex-1.0: A Verification-Centric Agent Team for Discoverative Intelligence},\n  author = {Apodex Team},\n  year   = {2026}\n}\n```\n\n---\n\n## 📄 License\n\nApache 2.0 — see [LICENSE](.\u002FLICENSE).\n",2,"2026-06-11 04:12:20","CREATED_QUERY"]