[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-928":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":28,"discoverSource":29},928,"HALO","context-labs\u002FHALO","context-labs","Hierarchal Agent Loop Optimizer",null,"TypeScript",833,62,5,2,0,18,29,280,54,9.4,false,"main",true,[],"2026-06-12 02:00:20","\u003C!-- \u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcontext-labs\u002Fuwu\">\n    \u003Cimg src=\"https:\u002F\u002Fem-content.zobj.net\u002Fthumbs\u002F240\u002Fapple\u002F354\u002Fsmiling-face-with-halo_1f607.png\" alt=\"😇\" width=\"100\" height=\"100\" style=\"vertical-align:middle;\">\u003C\u002Fspan>\n  \u003C\u002Fa>\n  \u003Cbr>\n  \u003Ch1>HALO\u003C\u002Fh1>\n\u003C\u002Fp> -->\n\n\u003Ch1 align=\"center\">\n  \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcontext-labs\u002Fuwu\">\u003Cimg src=\"https:\u002F\u002Fem-content.zobj.net\u002Fthumbs\u002F240\u002Fapple\u002F354\u002Fsmiling-face-with-halo_1f607.png\" alt=\"😇\" width=\"150\" style=\"border-radius:8px;\">\u003C\u002Fa>\n   \u003Cbr>\n  HALO\n  \u003Cbr>\n\u003C\u002Fh1>\n\n\u003Ch4 align=\"center\">✨ RLM-based Automatic Agent Optimization Loop ✨\u003C\u002Fh4>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Finference_net\">\n    \u003Cimg alt=\"X (formerly Twitter)\" 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-@inference.net-1DA1F2?style=flat&logo=x&logoColor=white\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT\">\n    \u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg\" \u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fcontext-labs\u002Fhalo\">\n    \u003Cimg alt=\"GitHub\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fcontext-labs\u002Fhalo?style=social\" \u002F>\n  \u003C\u002Fa>\n\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#what-is-this\">What is this?\u003C\u002Fa> •\n  \u003Ca href=\"#install\">Install\u003C\u002Fa> •\n  \u003Ca href=\"#why-an-rlm\">Why RLM?\u003C\u002Fa> •\n  \u003Ca href=\"#benchmarks\">Benchmarks\u003C\u002Fa> •\n  \u003Ca href=\"#development\">Development\u003C\u002Fa> •\n  \u003Ca href=\"#contributing\">Contributing\u003C\u002Fa>\n\u003C\u002Fp>\n\n## What is this?\n\nHALO (Hierarchical Agent Loop Optimization) is a methodology for building recursively self-improving agent harnesses using [RLMs](https:\u002F\u002Fgithub.com\u002Falexzhang13\u002Frlm). This repository contains:\n\n- Information on the HALO methodology.\n- A Python package that implements the core HALO-RLM engine. [View on PyPI](https:\u002F\u002Fpypi.org\u002Fproject\u002Fhalo-engine\u002F)\n- A demo project that shows how to build HALO loops for your agents using the Python package. [View demo](\u002Fdemo\u002Fopenai-agents-sdk-demo\u002F)\n- Benchmarking examples applying HALO to popular agent benchmarks. (View [AppWorld](#appworld)).\n\n## HALO Loop\n\nThe core HALO loop is surprisingly simple:\n\n1. Collect execution traces from your agent harness. HALO uses OpenTelemetry-compatible tracing.\n2. Feed traces into the HALO-RLM engine.\n3. 
The engine decomposes the traces to understand common failure modes across harness executions and produces a report with its findings.\n4. This report is fed into a coding agent like Cursor or Claude Code to generate and apply a set of changes to your harness.\n5. The harness is then re-deployed, more traces are gathered, and the cycle repeats.\n\nHALO is great at finding issues in production agent deployments. We find high-traffic environments tend to generate more data with higher variance across executions, creating the type of issues that HALO is great at identifying.\n\n### Why an RLM?\n\nA general-purpose harness like Claude Code is the wrong tool for trace analysis. This isn’t because the model isn’t smart, but because traces can get extremely long, and you need a specialized toolkit in order to make observations about systemic agentic behavior. We noticed in our testing that harnesses like CC would often overfit to an error present in one or a few traces rather than generalize to harness-level problems. This led us to create a specialized form of an RLM.\n\n\u003Cimg src=\".\u002Fassets\u002F\u002Fhalo-rlm.png\" alt=\"rlm\"  style=\"border-radius:8px;\" width=\"600\">\n\n## Get Started\n\n### Install\n\nInstall the HALO engine + CLI from PyPI:\n\n```bash\npip install halo-engine\n\n# Verify installation\nhalo --help\n```\n\n### Usage\n\n1. [Integrate Tracing](docs\u002Fintegrations\u002Fopenai-agents-sdk.md)\n2. Collect traces by running your agent\n3. 
Run the HALO engine, see the [CLI](\u002Fhalo_cli\u002FREADME.md) docs for more info\n\n```bash\nexport OPENAI_API_KEY=...\n\nhalo path_to_your_traces.jsonl -p \"Diagnose errors you find and suggest fixes\"\n```\n\nWe have provided a [simple demo](\u002Fdemo\u002Fopenai-agents-sdk-demo\u002F) and an [AppWorld](#appworld) demo.\n\n## Benchmarks\n\nHALO is consistently capable of driving improvements on benchmarks, solely by optimizing the harness.\n\n### AppWorld\n\nWe applied HALO to the [AppWorld](https:\u002F\u002Fappworld.dev\u002F) benchmark, a set of agentic tasks that assess the LLM’s ability to use multi-app services like Spotify, Venmo, file systems, and phone contacts. We tested HALO’s ability to improve harnesses for both Gemini 3 Flash and Sonnet 4.6. We iterated on the harness using the `dev` split, and then used the `test_normal` split as a proxy to verify that improvements did not come from overfitting.\n\nThe feedback from HALO Engine surfaced failures in the harnesses such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt edit. HALO’s claims were independently verified from the source trace files with the findings holding up under scrutiny.\n\n\u003Cimg src=\".\u002Fassets\u002F\u002Fhalo-app-world-sgc.png\" alt=\"app-world-sgc\"  style=\"border-radius:8px;\">\n\u003C!-- \n  Note: Table cell styling is still limited in GitHub Markdown rendering,\n  and border-radius is not supported, but background color and padding usually work.\n  If this does not display as desired, you will need to update the image asset itself\n  to include padding and a black background.\n-->\nThe peak improvements over baseline were substantial for both models. For Gemini 3 Flash, dev SGC went from 36.8% to 52.6% (+15.8 points) and test_normal SGC went from 37.5% to 48.2% (+10.7 points). 
For Sonnet 4.6, dev SGC went from 73.7% to 89.5% (+15.8 points) and test_normal SGC went from 62.5% to 73.2% (+10.7 points).\n\n## Development\n\nLocal development against this repo uses [`uv`](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) for dependency management and [`go-task`](https:\u002F\u002Ftaskfile.dev\u002F) as the task runner.\n\n### Setup\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fcontext-labs\u002FHALO\ncd HALO\ntask env:setup\n```\n\n`task env:setup` installs `uv` (if missing), syncs the venv from `uv.lock`, and configures the repo's git hooks. After that, the `halo` CLI is available via `uv run halo ...` (or activate `.venv\u002F`).\n\n### Common tasks\n\nRun `task --list` for the full list. The ones you'll use most:\n\n| Task                    | What it does                                                                    |\n| ----------------------- | ------------------------------------------------------------------------------- |\n| `task check`            | Run all pre-commit checks: pinned-versions, lint, format, typecheck, unit tests |\n| `task check:fix`        | Same, but auto-fix lint\u002Fformat issues                                           |\n| `task test:unit`        | Unit tests under `tests\u002Funit\u002F`                                                  |\n| `task test:integration` | Integration tests under `tests\u002Fintegration\u002F`                                    |\n\n## License\n\n[MIT](LICENSE)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a pull request.\n","HALO (Hierarchical Agent Loop Optimization) is an RLM-based automatic agent optimization loop. It collects agent execution traces using OpenTelemetry-compatible tracing and feeds them into the HALO-RLM engine, which analyzes them, identifies common failure modes, and produces a report. The report then guides a coding agent such as Cursor or Claude Code to generate and apply a set of changes that optimize the agent. The process repeats until the best results are reached. The project is well suited to finding and fixing issues in production agent deployments in high-traffic environments, especially when there is a large volume of highly varied execution data.","2026-06-11 02:40:19","CREATED_QUERY"]