[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74214":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},74214,"codex-autoresearch","leo-lilinxiao\u002Fcodex-autoresearch","leo-lilinxiao","Codex Autoresearch Skill — A self-directed iterative system for Codex that continuously cycles through: modify, verify, retain or discard, and repeat indefinitely. Inspired by Karpathy’s autoresearch concept.",null,"Python",1853,113,29,1,0,8,52,156,24,19.17,"MIT License",false,"main",true,[],"2026-06-12 02:03:23","\u003Cp align=\"center\">\n  \u003Cimg src=\"image\u002Fbanner.png\" width=\"700\" alt=\"Codex Autoresearch\">\n\u003C\u002Fp>\n\n\u003Ch2 align=\"center\">\u003Cb>Aim. Iterate. 
Arrive.\u003C\u002Fb>\u003C\u002Fh2>\n\n\u003Cp align=\"center\">\n  \u003Ci>Autonomous goal-driven experimentation for Codex.\u003C\u002Fi>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fdevelopers.openai.com\u002Fcodex\u002Fskills\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCodex-Skill-blue?logo=openai&logoColor=white\" alt=\"Codex Skill\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fleo-lilinxiao\u002Fcodex-autoresearch\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fleo-lilinxiao\u002Fcodex-autoresearch?style=social\" alt=\"GitHub Stars\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green.svg\" alt=\"MIT License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cb>English\u003C\u002Fb> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_ZH.md\">🇨🇳 中文\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_JA.md\">🇯🇵 日本語\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_KO.md\">🇰🇷 한국어\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_FR.md\">🇫🇷 Français\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_DE.md\">🇩🇪 Deutsch\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_ES.md\">🇪🇸 Español\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_PT.md\">🇧🇷 Português\u003C\u002Fa> ·\n  \u003Ca href=\"docs\u002Fi18n\u002FREADME_RU.md\">🇷🇺 Русский\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\nThe idea: tell Codex what you want to improve, then walk away. It modifies your code, verifies the result, keeps or discards, and repeats. 
You come back to a log of experiments and a better codebase.\n\nInspired by [Karpathy's autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch), generalized beyond ML to anything you can verify mechanically: test coverage, type errors, latency, lint warnings, security findings, release readiness — if a command can tell whether it improved, the loop can iterate on it.\n\n## Quick Start\n\n> [!IMPORTANT]\n> Start Codex with Goals, hooks, and Full Access enabled:\n>\n> ```bash\n> codex --enable goals --enable hooks --dangerously-bypass-approvals-and-sandbox\n> ```\n>\n> Use this before starting autoresearch for the smoothest foreground and background experience.\n\n```text\n# Install in Codex (recommended)\n$skill-installer install https:\u002F\u002Fgithub.com\u002Fleo-lilinxiao\u002Fcodex-autoresearch\n```\n\nRestart Codex, open your project, and go:\n\n```\nYou:   $codex-autoresearch\n       I want to get rid of all the `any` types in my TypeScript code\n\nCodex: I found 47 `any` occurrences across src\u002F**\u002F*.ts.\n       Results directory: .\u002Fautoresearch-results\u002F\n       Metric: `any` count (current: 47), direction: lower\n       Verify: grep count + tsc --noEmit as guard\n       Run mode: foreground or background?\n\nYou:   Background, go. Run overnight.\n\nCodex: Starting background run — baseline: 47. Iterating.\n```\n\nEach improvement stacks. Each failure reverts. Everything is logged.\n\nFor background runs, start Codex from a trusted **Full Access** session. If Codex is restricted to workspace-only sandboxing, use foreground mode or restart with Full Access before choosing background.\n\nSee [INSTALL.md](docs\u002FINSTALL.md) for manual copy, symlink, and user-scope options. 
See [GUIDE.md](docs\u002FGUIDE.md) for the full manual.\n\n## How It Works\n\n```\nYou say one sentence  →  Codex scans & confirms  →  You say \"go\"\n                                                        |\n                                         +--------------+--------------+\n                                         |                             |\n                                    foreground                    background\n                                  (current session)            (detached, overnight)\n                                         |                             |\n                                         +--------------+--------------+\n                                                        |\n                                                        v\n                                              +-------------------+\n                                              |    The Loop       |\n                                              |                   |\n                                              |  modify one thing |\n                                              |  git commit       |\n                                              |  run verify       |\n                                              |  improved? keep   |\n                                              |  worse? revert    |\n                                              |  log the result   |\n                                              |  repeat           |\n                                              +-------------------+\n```\n\nThat's it. You pick one: foreground keeps the loop in your current session, background hands it off to a detached process so you can sleep. 
Same loop either way, but they don't run at the same time.\n\n## What You Say vs What Happens\n\n| You say | What happens |\n|---------|-------------|\n| \"Improve my test coverage\" | Iterates until target or interrupted |\n| \"Fix the 12 failing tests\" | Repairs one by one until zero remain |\n| \"Why is the API returning 503?\" | Hunts root cause with falsifiable hypotheses |\n| \"Is this code secure?\" | STRIDE + OWASP audit, every finding backed by code evidence |\n| \"Ship it\" | Verifies readiness, generates checklist, gates release |\n| \"I want to optimize but don't know what\" | Analyzes repo, suggests metrics, generates config |\n\nBehind the scenes, Codex maps your sentence to one of 7 modes (loop, plan, debug, fix, security, ship, exec). You never need to pick one.\n\n## What Codex Figures Out\n\nYou don't write config. Codex infers everything from your sentence and your repo:\n\n| What it needs | How it gets it | Example |\n|--------------|----------------|---------|\n| Goal | Your sentence | \"get rid of all any types\" |\n| Scope | Scans repo structure | `src\u002F**\u002F*.ts` |\n| Metric | Proposes based on goal + tooling | any count (current: 47) |\n| Direction | Infers from \"improve\" \u002F \"reduce\" \u002F \"eliminate\" | lower |\n| Verify | Matches to repo tooling | `grep` count + `tsc --noEmit` |\n| Guard | Suggests a baseline-passing regression check | `npm test` |\n\nBefore starting, Codex always shows what it found and asks you to confirm. Then you choose foreground or background and say \"go.\"\nBy default, the Results directory stays in the launch context: if you started Codex inside a git repo, that repo root is the default workspace root; if you started outside a git repo, the current launch directory is the default workspace root. Codex should not silently widen that to a parent directory unless you explicitly confirm a broader multi-repo workspace. 
The confirmation summary should always show the chosen Results directory before launch.\n\n## When It Gets Stuck\n\nInstead of blind retrying, the loop escalates:\n\n| Trigger | Action |\n|---------|--------|\n| 3 consecutive failures | **REFINE** — adjust within current strategy |\n| 5 consecutive failures | **PIVOT** — try a fundamentally different approach |\n| 2 PIVOTs without progress | **Web search** — look for external solutions |\n| 3 PIVOTs without progress | **Stop** — report that human input is needed |\n\nOne success resets all counters.\n\n## Results Log\n\nEvery iteration is recorded in the workspace Results directory at `autoresearch-results\u002Fresults.tsv`:\n\n```\niteration  commit   metric  delta   status    description\n0          a1b2c3d  47      0       baseline  initial any count\n1          b2c3d4e  41      -6      keep      replace any in auth module\n2          -        49      +8      discard   generic wrapper introduced new anys\n3          d4e5f6g  38      -3      keep      type-narrow API response handlers\n```\n\nFailed experiments revert from git but stay in the log. The log is the real audit trail, while `autoresearch-results\u002Fstate.json` is the resume snapshot.\n\n## More Features\n\nThese are covered in detail in [GUIDE.md](docs\u002FGUIDE.md):\n\n- **Cross-run learning** — lessons from past runs bias future hypothesis generation\n- **Parallel experiments** — test up to 3 hypotheses simultaneously via git worktrees\n- **Session resume** — interrupted runs pick up from the last consistent state\n- **CI\u002FCD mode** (`exec`) — non-interactive, JSON output, for automation pipelines\n- **Dual-gate verification** — separate verify (did it improve?) and guard (did anything break?)\n- **Session hooks** — auto-installed; keep Codex on track across session boundaries\n\n## FAQ\n\n**It only makes small incremental changes. Can it try bigger ideas?**\nBy default the loop favors small, verifiable steps — that's by design. 
But it can go bigger: describe a larger hypothesis in your prompt (e.g., \"try replacing the attention mechanism with linear attention and run the full eval\"), and it will treat that as a single experiment to verify. The loop is best when the human sets the research direction and the agent does the heavy execution and analysis.\n\n**Is this more for engineering optimization than research?**\nIt's strongest when the goal and metric are clear — push coverage up, push errors down, push latency lower. For open-ended research where the direction itself is uncertain, use `plan` mode first to explore, then switch to `loop` once you know what to measure. Think of it as a human-AI collaboration: you provide judgment, it provides iteration speed.\n\n**How do I stop it?** Foreground: interrupt Codex. Background: `$codex-autoresearch` then ask to stop.\n\n**Can it resume after interruption?** Yes. It resumes from `autoresearch-results\u002Fstate.json` automatically.\n\n**How do I use it in CI?** `Mode: exec` with `codex exec`. All config upfront, JSON output, exit codes 0\u002F1\u002F2.\n\n## Documentation\n\n| Doc | What it covers |\n|-----|---------------|\n| [INSTALL.md](docs\u002FINSTALL.md) | All installation methods, skill discovery paths, hooks setup |\n| [GUIDE.md](docs\u002FGUIDE.md) | Full operator's manual: modes, config fields, safety model, advanced usage |\n| [EXAMPLES.md](docs\u002FEXAMPLES.md) | Recipes by domain: coverage, performance, types, security, etc. |\n\n## Acknowledgments\n\nBuilt on ideas from [Karpathy's autoresearch](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch). 
The Codex skills platform is by [OpenAI](https:\u002F\u002Fopenai.com).\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=leo-lilinxiao%2Fcodex-autoresearch&type=timeline&legend=top-left\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=leo-lilinxiao\u002Fcodex-autoresearch&type=timeline&theme=dark&legend=top-left\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=leo-lilinxiao\u002Fcodex-autoresearch&type=timeline&legend=top-left\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=leo-lilinxiao\u002Fcodex-autoresearch&type=timeline&legend=top-left\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","Codex Autoresearch is a self-directed iterative system for Codex that continuously modifies code, verifies the result, keeps or discards the change, and repeats indefinitely. Its core features are an automated code improvement and verification workflow that supports many mechanically verifiable optimization targets, such as test coverage, type errors, latency, lint warnings, security findings, and release readiness. It is written in Python and is highly flexible and extensible. It suits development scenarios that need continuous codebase optimization with less manual intervention, especially software engineering projects where a command-line tool can automatically detect whether a change improved things.",2,"2026-06-11 03:49:32","high_star"]