[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80032":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":29,"discoverSource":30},80032,"Life-Harness","Tianshi-Xu\u002FLife-Harness","Tianshi-Xu","Official implementation of \"Life-Harness\"",null,"Python",173,13,3,1,0,2,12,104,7,3.44,"MIT License",false,"main",true,[],"2026-06-12 02:03:57","\u003Cdiv align=\"center\">\n\n# Life-Harness\n\n### Adapting the interface, not the model, for deterministic LLM agents\n\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.22166-b31b1b)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22166)\n[![Benchmarks](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fbenchmarks-7-blue)](#benchmarks)\n[![Model Backbones](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fmodel%20backbones-18-green)](#results)\n[![Settings Improved](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fimproved-116%2F126-orange)](#results)\n[![Training Free](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftraining%20free-yes-lightgrey)](#why-life-harness)\n\n\u003C\u002Fdiv>\n\n## News\n\n- **2026\u002F05\u002F24**: Released the paper and codebase. 
The second version of the\n  paper has also been submitted to arXiv, and the code release includes the\n  evolution prompts used to build the harness.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Foverview.png\" width=\"92%\" alt=\"Life-Harness overview\" \u002F>\n\u003C\u002Fp>\n\n**Life-Harness** is the code release for **\"Adapting the Interface, Not the Model:\nRuntime Harness Adaptation for Deterministic LLM Agents.\"** It targets a practical\nquestion: when a frozen LLM agent repeatedly fails in a deterministic environment,\ncan we improve the runtime harness around the agent instead of retraining the\nmodel or modifying the environment?\n\nThe answer is yes. Life-Harness turns recurring failures into reusable runtime\ninterventions across action realization, environment contracts, trajectory\nregulation, and procedural skills. The model remains frozen; the benchmark\nenvironment remains intact; only the harness interface adapts.\n\n| Benchmarks | Model backbones | Settings improved | Avg. relative gain | Training-free |\n| ---: | ---: | ---: | ---: | ---: |\n| 7 | 18 | 116 \u002F 126 | 88.5% | Yes |\n\n## Why Life-Harness\n\n| What changes? | What stays fixed? 
| Why it matters |\n| --- | --- | --- |\n| Runtime harness behavior | LLM weights | No finetuning or model-specific training pipeline |\n| Prompted environment interface | Benchmark environment | Keeps deterministic evaluation comparable |\n\n## Results\n\nAcross **7 deterministic agent benchmarks** and **18 model backbones**,\nLife-Harness improves **116 \u002F 126** model-environment settings, with an\n**88.5% average relative improvement** reported in the paper.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fresult.png\" width=\"68%\" alt=\"Life-Harness result heatmap\" \u002F>\n\u003C\u002Fp>\n\n## Method\n\nLife-Harness evolves a small set of runtime layers from observed failures, then\nreuses those layers during evaluation.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fmethod.png\" width=\"100%\" alt=\"Life-Harness method overview\" \u002F>\n\u003C\u002Fp>\n\n| Harness flag | Paper layer | Runtime role |\n| --- | --- | --- |\n| `h2` | Action Realization Layer | Helps convert model decisions into executable environment actions. |\n| `h3` | Environment Contract Layer | Makes task and environment constraints explicit at runtime. |\n| `h4` | Trajectory Regulation Layer | Regulates multi-step interaction traces to avoid repeated failure patterns. |\n| `h5` | Procedural Skill Layer | Reuses procedural knowledge distilled from recurring successful recoveries. 
|\n\nWhen the harness is disabled, these layers are not applied.\n\n## Benchmarks\n\nThis repository keeps the two benchmark families in separate folders because\ntheir environments and dependencies are intentionally different.\n\n| Suite | Environments | Start here |\n| --- | --- | --- |\n| AgentBench-style harness | ALFWorld, DBBench, OS, WebShop | [AgentBench\u002FREADME.md](AgentBench\u002FREADME.md) |\n| tau-bench-style harness | Airline, Retail, Telecom | [TauBench\u002FREADME.md](TauBench\u002FREADME.md) |\n\n```text\nLife-Harness\u002F\n  AgentBench\u002F      # Docker-based AgentBench-style tasks\n  TauBench\u002F        # uv-based tau-bench-style tasks\n  assets\u002F          # README figures\n```\n\n## Quick Start\n\nClone the repository, then enter the benchmark suite you want to run:\n\n```bash\ncd Life-Harness\n\n# tau-bench-style tasks: Airline, Retail, Telecom\ncd TauBench\n\n# AgentBench-style tasks: ALFWorld, DBBench, OS, WebShop\ncd ..\u002FAgentBench\n```\n\nEach subfolder README contains its own environment setup, evaluation commands,\nand harness switches. API keys and provider URLs should be configured locally\nthrough environment variables or `.env` files; do not commit them.\n\n## Citation\n\nIf you use this repository, please cite the paper:\n\n```bibtex\n@article{xu2026adapting,\n  title = {Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents},\n  author = {Xu, Tianshi and others},\n  journal = {arXiv},\n  year = {2026},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22166},\n  urldate = {2026-05-22}\n}\n```\n","Life-Harness is a project that improves the performance of large language model (LLM) agents in deterministic environments by adapting the interface rather than the model itself. Its core idea is to learn from recurring failures and turn them into reusable runtime interventions covering action realization, environment contracts, trajectory regulation, and procedural skills, without retraining the model or modifying the environment. The project is written in Python and supports multiple model backbones. Across 7 benchmarks and 18 model backbones, it reports an 88.5% average relative performance improvement and improves 116 of 126 model-environment settings. Life-Harness suits scenarios where a frozen LLM agent needs to perform better on specific tasks, particularly research and applications that want a solution that leaves the original model weights unchanged.","2026-06-11 03:58:57","CREATED_QUERY"]