[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74047":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},74047,"autoagent","kevinrgu\u002Fautoagent","kevinrgu","autonomous harness engineering","",null,"Python",4486,499,29,3,0,12,21,76,36,95.2,false,"main",true,[],"2026-06-12 04:01:12","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fwww.thirdlayer.inc\">\n    \u003Cimg src=\"https:\u002F\u002Fwww.thirdlayer.inc\u002Fthirdlayer-logo.svg\" alt=\"thirdlayer\" width=\"200\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cblockquote>\n\u003Cp>We're launching a product around self-configuring agents soon. \u003Ca href=\"https:\u002F\u002Fform.typeform.com\u002Fto\u002FZQbnbO09\">Sign up here.\u003C\u002Fa>\u003Cbr>We're hiring engineers. If this work interests you, reach out to \u003Ca href=\"mailto:hello@thirdlayer.inc\">hello@thirdlayer.inc\u003C\u002Fa> with your GitHub link.\u003C\u002Fp>\n\u003C\u002Fblockquote>\n\n# AutoAgent\n\n> Like autoresearch but for agent engineering. Give an AI agent a task, let it build and iterate on an agent harness autonomously overnight. 
It modifies the system prompt, tools, agent configuration, and orchestration, runs the benchmark, checks the score, keeps or discards the change, and repeats.\n\n![teaser](progress.png)\n\nThe core idea is the same: you're not touching the harness Python files like you normally would as an engineer. Instead, you program `program.md`, the Markdown file that provides context to the meta-agent and defines the agent-engineering loop.\n\n## How it works\n\nThe repo has a few files and directories that matter:\n\n- **`agent.py`** -- the entire harness under test in a single file. It contains\n  config, tool definitions, agent registry, routing\u002Forchestration, and the\n  Harbor adapter boundary. The adapter section is explicitly marked as fixed;\n  the rest is the primary edit surface for the meta-agent.\n- **`program.md`** -- instructions for the meta-agent + the directive (what\n  kind of agent to build). **This file is edited by the human**.\n- **`tasks\u002F`** -- evaluation tasks in\n  [harbor](https:\u002F\u002Fgithub.com\u002Flaude-institute\u002Fharbor) format. In a clean\n  baseline branch, benchmark payloads may be omitted and added in\n  benchmark-specific branches.\n- **`.agent\u002F`** -- optional workspace artifacts for reusable instructions,\n  notes, prompts, or skills.\n\nThe metric is total **score** produced by the benchmark's task test suites. The\nmeta-agent hill-climbs on this score.\n\n## Quick start\n\n**Requirements:** Docker, Python 3.10+, [uv](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F), and\nwhatever model-provider credentials your current `agent.py` harness requires.\n\n```bash\n# 1. Install uv (if you don't have it)\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n# 2. Install dependencies\nuv sync\n\n# 3. Set up the environment variables required by your current agent\u002Fruntime\n# Example:\ncat > .env \u003C\u003C 'EOF'\nOPENAI_API_KEY=...\nEOF\n\n# 4. 
Build base image\ndocker build -f Dockerfile.base -t autoagent-base .\n\n# 5. Add tasks to tasks\u002F (see Task format section below)\n\n# 6. Run a single benchmark task\nrm -rf jobs; mkdir -p jobs && uv run harbor run -p tasks\u002F --task-name \"\u003Ctask-name>\" -l 1 -n 1 --agent-import-path agent:AutoAgent -o jobs --job-name latest > run.log 2>&1\n\n# 7. Run all tasks in parallel (-n = concurrency, default 4)\nrm -rf jobs; mkdir -p jobs && uv run harbor run -p tasks\u002F -n 100 --agent-import-path agent:AutoAgent -o jobs --job-name latest > run.log 2>&1\n```\n\n## Running the meta-agent\n\nPoint your coding agent at the repo and prompt:\n\n```\nRead program.md and let's kick off a new experiment!\n```\n\nThe meta-agent will read the directive, inspect the current harness, run the\nbenchmark, diagnose failures, modify `agent.py`, and iterate.\n\n## Project structure\n\n```text\nagent.py                       -- single-file harness under test\n  editable harness section     -- prompt, registries, tools, routing\n  fixed adapter section        -- Harbor integration + trajectory serialization\nprogram.md                     -- meta-agent instructions + directive\nDockerfile.base                -- base image\n.agent\u002F                        -- optional agent workspace artifacts\ntasks\u002F                         -- benchmark tasks, typically added in benchmark-specific branches\njobs\u002F                          -- Harbor job outputs\nresults.tsv                    -- experiment log (created by meta-agent, gitignored)\nrun.log                        -- latest run output\n```\n\n## Task format\n\nThe repo ships without tasks. 
Add your own to `tasks\u002F` following [Harbor's task format](https:\u002F\u002Fharborframework.com\u002Fdocs\u002Ftasks):\n\n```text\ntasks\u002Fmy-task\u002F\n  task.toml           -- config (timeouts, metadata)\n  instruction.md      -- prompt sent to the agent\n  tests\u002F\n    test.sh           -- entry point, writes \u002Flogs\u002Freward.txt\n    test.py           -- verification (deterministic or LLM-as-judge)\n  environment\u002F\n    Dockerfile        -- task container (FROM autoagent-base)\n  files\u002F              -- reference files mounted into container\n```\n\nTests write a score (0.0-1.0) to the verifier logs. The meta-agent hill-climbs\non this. See the [Harbor docs](https:\u002F\u002Fharborframework.com\u002Fdocs) for full details on writing and porting tasks.\n\n## Design choices\n\n- **Program the meta-agent, not the harness directly.** The human steers the\n  loop through `program.md`, while the meta-agent edits `agent.py`.\n- **Single-file, registry-driven harness.** The implementation lives in one\n  file for simplicity, but agent and tool registration stay structured so the\n  harness can still evolve cleanly.\n- **Docker isolation.** The agent runs in a container. It can't damage the host.\n- **Score-driven.** Every experiment produces a numeric score. Keep if better,\n  discard if not. Same loop as autoresearch.\n- **Harbor-compatible tasks.** Tasks use the same format as harbor benchmarks,\n  so the same harness can be evaluated on different datasets.\n\n## Cleanup\n\nDocker images and containers accumulate across runs. 
Clean up regularly:\n\n```bash\n# Harbor's cached task images + task cache\nuv run harbor cache clean -f\n\n# Full Docker nuke (all unused images, build cache, etc.)\ndocker system prune -a -f\n\n# Lighter: just dead containers\ndocker container prune -f\n```\n\nIf Docker becomes unresponsive (for example after many concurrent runs), restart\nDocker Desktop (the command below is for macOS):\n\n```bash\nkillall Docker && open -a Docker\n```\n\n## Improving performance with skills\n\nYou can equip the agent with [Agent Skills for Context Engineering](https:\u002F\u002Fgithub.com\u002Fmuratcankoylan\u002FAgent-Skills-for-Context-Engineering) and [context7](https:\u002F\u002Fgithub.com\u002Fupstash\u002Fcontext7) skills to improve performance.\n\n## License\n\nMIT\n\n","AutoAgent is a tool for autonomous agent engineering that automatically builds and iterates on AI agents. Its core function is to optimize agent performance by modifying the system prompt, tools, agent configuration, and orchestration, then evaluating the result with a benchmark and iterating until the score stops improving. It is implemented in Python, and the user only edits the `program.md` file to supply context and build instructions, without working on the Python harness code directly. It suits scenarios that call for automatically generating or optimizing agents for specific tasks, such as automated research and complex problem solving.",2,"2026-06-11 03:48:33","high_star"]