[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75137":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":10,"languages":10,"totalLinesOfCode":10,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":41,"readmeContent":42,"aiSummary":43,"trendingCount":15,"starSnapshotCount":15,"syncStatus":44,"lastSyncTime":45,"discoverSource":46},75137,"awesome-autoresearch","alvinreal\u002Fawesome-autoresearch","alvinreal","A curated list of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch.","https:\u002F\u002Fmoltfounders.com\u002Fautoresearch",null,2181,161,11,6,0,23,83,370,69,28.63,"Other",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38,39,40],"agentic-systems","ai-agents","ai-research","ai-tools","autonomous-agents","autoresearch","awesome-list","claude-code","experiment-loops","karpathy","karpathy-inspired","llm-agents","research-agents","scientific-discovery","self-improving-systems","2026-06-12 02:03:33","\u003Cdiv align=\"center\">\n\n# 🔬 Awesome Autoresearch\n\n**A curated, high-signal index of autonomous improvement loops, research agents, and descendants inspired by** [**karpathy\u002Fautoresearch**](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch).\n\n[![Awesome](https:\u002F\u002Fawesome.re\u002Fbadge.svg)](https:\u002F\u002Fawesome.re)\n[![PRs Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg?style=flat-square)](.\u002FCONTRIBUTING.md)\n[![License: 
CC0-1.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-CC0--1.0-blue.svg?style=flat-square)](.\u002FLICENSE)\n\n\u003Csub>by **Boring Dystopia Development**\u003C\u002Fsub>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fboringdystopia.ai\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fboringdystopia.ai-111111?style=for-the-badge&logo=vercel&logoColor=white\" alt=\"boringdystopia.ai\" \u002F>\n  \u003C\u002Fa>&nbsp;\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Falvinunreal\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FX-@alvinunreal-000000?style=for-the-badge&logo=x&logoColor=white\" alt=\"X @alvinunreal\" \u002F>\n  \u003C\u002Fa>&nbsp;\n  \u003Ca href=\"https:\u002F\u002Ft.me\u002Fboringdystopiadevelopment\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTelegram-Join%20channel-2CA5E0?style=for-the-badge&logo=telegram&logoColor=white\" alt=\"Telegram Join channel\" \u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n## Contents\n\n- [🛠️ General-purpose descendants](#️-general-purpose-descendants)\n- [🔬 Research-agent systems](#-research-agent-systems)\n- [💻 Platform ports and hardware forks](#-platform-ports-and-hardware-forks)\n- [🎯 Domain-specific adaptations](#-domain-specific-adaptations)\n- [📊 Evaluation & benchmarks](#-evaluation--benchmarks)\n- [📈 Notable use cases and writeups](#-notable-use-cases-and-writeups)\n- [📚 Related resources](#-related-resources)\n- [📄 License](#-license)\n\n## 🛠️ General-purpose descendants\n\n- [kayba-ai\u002Frecursive-improve](https:\u002F\u002Fgithub.com\u002Fkayba-ai\u002Frecursive-improve) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkayba-ai\u002Frecursive-improve?style=social) - Recursive self-improvement framework where agents capture execution traces, analyze failure patterns, and apply targeted fixes with keep-or-revert evaluation.\n- 
[vukrosic\u002Fauto-research](https:\u002F\u002Fgithub.com\u002Fvukrosic\u002Fauto-research) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fvukrosic\u002Fauto-research?style=social) - Docs-only control plane for an open autonomous AI research lab — file-based operating model for human direction and agent execution.\n- [uditgoenka\u002Fautoresearch](https:\u002F\u002Fgithub.com\u002Fuditgoenka\u002Fautoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fuditgoenka\u002Fautoresearch?style=social) - Claude Code skill that generalizes autoresearch into a reusable loop for software, docs, security, shipping, debugging, and other measurable goals.\n- [leo-lilinxiao\u002Fcodex-autoresearch](https:\u002F\u002Fgithub.com\u002Fleo-lilinxiao\u002Fcodex-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fleo-lilinxiao\u002Fcodex-autoresearch?style=social) - Codex-native autoresearch skill with resume support, lessons across runs, optional parallel experiments, and mode-specific workflows.\n- [supratikpm\u002Fgemini-autoresearch](https:\u002F\u002Fgithub.com\u002Fsupratikpm\u002Fgemini-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsupratikpm\u002Fgemini-autoresearch?style=social) - Gemini CLI skill that generalises autoresearch to any measurable goal. Gemini-native: uses Google Search grounding as a live verification source inside the loop, true headless overnight mode via --yolo --prompt, and 1M token context. 
Also works in Antigravity IDE via .agents\u002Fskills\u002F.\n- [davebcn87\u002Fpi-autoresearch](https:\u002F\u002Fgithub.com\u002Fdavebcn87\u002Fpi-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdavebcn87\u002Fpi-autoresearch?style=social) - `pi` extension plus dashboard for persistent experiment loops, live metrics, confidence tracking, and resumable autoresearch sessions.\n- [drivelineresearch\u002Fautoresearch-claude-code](https:\u002F\u002Fgithub.com\u002Fdrivelineresearch\u002Fautoresearch-claude-code) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdrivelineresearch\u002Fautoresearch-claude-code?style=social) - Claude Code plugin\u002Fskill port of `pi-autoresearch`, with a clean experiment-loop workflow and a concrete biomechanics case study.\n- [greyhaven-ai\u002Fautocontext](https:\u002F\u002Fgithub.com\u002Fgreyhaven-ai\u002Fautocontext) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgreyhaven-ai\u002Fautocontext?style=social) - Closed-loop control plane for repeated agent improvement, with evaluation, persistent knowledge, staged validation, and optional distillation into cheaper local runtimes.\n- [jmilinovich\u002Fgoal-md](https:\u002F\u002Fgithub.com\u002Fjmilinovich\u002Fgoal-md) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjmilinovich\u002Fgoal-md?style=social) - Generalizes autoresearch into a `GOAL.md` pattern for repos where the agent must first construct a measurable fitness function before it can optimize.\n- [james-s-tayler\u002Flazy-developer](https:\u002F\u002Fgithub.com\u002Fjames-s-tayler\u002Flazy-developer) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjames-s-tayler\u002Flazy-developer?style=social) - Claude Code skill that orchestrates autoresearch across a prioritized sequence of optimization goals (coverage, test speed, build speed, complexity, LOC, performance) using 
GOAL.md as the engine. Supports standalone and Ralph Mode multi-instance execution.\n- [mutable-state-inc\u002Fautoresearch-at-home](https:\u002F\u002Fgithub.com\u002Fmutable-state-inc\u002Fautoresearch-at-home) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmutable-state-inc\u002Fautoresearch-at-home?style=social) - Collaborative fork of upstream autoresearch that adds experiment claiming, shared best-config syncing, hypothesis exchange, and swarm-style coordination across many single-GPU agents.\n- [zkarimi22\u002Fautoresearch-anything](https:\u002F\u002Fgithub.com\u002Fzkarimi22\u002Fautoresearch-anything) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fzkarimi22\u002Fautoresearch-anything?style=social) - Generalizes autoresearch to **any measurable metric** — system prompts, API performance, landing pages, test suites, config tuning, SQL queries. \"If you can measure it, you can optimize it.\"\n- [Entrpi\u002Fautoresearch-everywhere](https:\u002F\u002Fgithub.com\u002FEntrpi\u002Fautoresearch-everywhere) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEntrpi\u002Fautoresearch-everywhere?style=social) - Cross-platform expansion that auto-detects hardware config and starts the loop. The \"glue and generalization\" half of autoresearch.\n- [ShengranHu\u002FADAS](https:\u002F\u002Fgithub.com\u002FShengranHu\u002FADAS) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FShengranHu\u002FADAS?style=social) - **Automated Design of Agentic Systems** — ICLR 2025. 
Meta-agents that invent novel agent architectures by programming them in code.\n- [MaximeRobeyns\u002Fself_improving_coding_agent](https:\u002F\u002Fgithub.com\u002FMaximeRobeyns\u002Fself_improving_coding_agent) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMaximeRobeyns\u002Fself_improving_coding_agent?style=social) - **SICA**: Self-Improving Coding Agent that edits its own codebase. ICLR 2025 Workshop paper demonstrating scaffold-level self-improvement on coding benchmarks.\n- [peterskoett\u002Fself-improving-agent](https:\u002F\u002Fgithub.com\u002Fpeterskoett\u002Fself-improving-agent) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fpeterskoett\u002Fself-improving-agent?style=social) - Alternative self-improving agent architecture with reflection and meta-learning cycles.\n- [metauto-ai\u002FHGM](https:\u002F\u002Fgithub.com\u002Fmetauto-ai\u002FHGM) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmetauto-ai\u002FHGM?style=social) - **Huxley-Gödel Machine** for coding agents — applies self-improvement to SWE-bench performance via meta-level optimization.\n- [gepa-ai\u002Fgepa](https:\u002F\u002Fgithub.com\u002Fgepa-ai\u002Fgepa) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgepa-ai\u002Fgepa?style=social) - **GEPA (Genetic-Pareto)** — ICLR 2026 Oral. Reflective prompt evolution that outperforms RL (GRPO) on benchmarks. 
Optimizes any textual parameters against any metric using natural language reflection.\n- [sentient-agi\u002FEvoSkill](https:\u002F\u002Fgithub.com\u002Fsentient-agi\u002FEvoSkill) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsentient-agi\u002FEvoSkill?style=social) - Automated skill discovery for coding agents: evolves reusable skills and prompts from failed trajectories against benchmarks, with support for Claude Code, Codex CLI, OpenCode, OpenHands, and Goose.\n- [MrTsepa\u002Fautoevolve](https:\u002F\u002Fgithub.com\u002FMrTsepa\u002Fautoevolve) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMrTsepa\u002Fautoevolve?style=social) - GEPA-inspired autoresearch for self-play: mutate code strategies, evaluate head-to-head, rate with Elo\u002FBradley-Terry, branch from the Pareto front. Agent reads match traces to target mutations. Works as a Claude Code skill.\n- [HKUDS\u002FClawTeam](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FClawTeam) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHKUDS\u002FClawTeam?style=social) - Agent swarm intelligence for autoresearch — spawns parallel GPU research directions, distributes work across agents, aggregates results.\n- [Orchestra-Research\u002FAI-Research-SKILLs](https:\u002F\u002Fgithub.com\u002FOrchestra-Research\u002FAI-Research-SKILLs) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOrchestra-Research\u002FAI-Research-SKILLs?style=social) - Comprehensive skill library including autoresearch orchestration with two-loop architecture (inner optimization + outer synthesis).\n- [WecoAI\u002Faideml](https:\u002F\u002Fgithub.com\u002FWecoAI\u002Faideml) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWecoAI\u002Faideml?style=social) - **AIDE**: Tree-search ML engineering agent that autonomously improves model performance via iterative code generation and evaluation.\n- 
[weco.ai](https:\u002F\u002Fweco.ai) - **Weco**: Cloud platform for AIDE with observability, experiment tracking, and managed runs — brings the autoresearch loop into production.\n\n## 🔬 Research-agent systems\n\n- [aiming-lab\u002FAutoResearchClaw](https:\u002F\u002Fgithub.com\u002Faiming-lab\u002FAutoResearchClaw) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Faiming-lab\u002FAutoResearchClaw?style=social) - End-to-end research pipeline that turns a topic into literature review, experiments, analysis, peer review, and paper drafts; broader than autoresearch, but clearly in the same lineage.\n- [OpenRaiser\u002FNanoResearch](https:\u002F\u002Fgithub.com\u002FOpenRaiser\u002FNanoResearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOpenRaiser\u002FNanoResearch?style=social) - End-to-end autonomous research engine that plans experiments, generates code, runs jobs locally or on SLURM, analyzes real results, and writes papers grounded in those outputs.\n- [kaust-ark\u002FARK](https:\u002F\u002Fgithub.com\u002Fkaust-ark\u002FARK) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fkaust-ark\u002FARK?style=social) - **ARK (Automatic Research Kit)**: idea + venue → paper pipeline orchestrating 6 agents — proposal analysis, literature search, Slurm experiments, LaTeX drafting, iterative peer review. 
Controlled via CLI, web dashboard, or Telegram.\n- [wanshuiyin\u002FAuto-claude-code-research-in-sleep](https:\u002F\u002Fgithub.com\u002Fwanshuiyin\u002FAuto-claude-code-research-in-sleep) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fwanshuiyin\u002FAuto-claude-code-research-in-sleep?style=social) - Markdown-first research workflows for Claude Code and other agents, centered on autonomous literature review, experiments, paper iteration, and cross-model critique.\n- [skyllwt\u002FOmegaWiki](https:\u002F\u002Fgithub.com\u002Fskyllwt\u002FOmegaWiki) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fskyllwt\u002FOmegaWiki?style=social) - Wiki-centric full-lifecycle research platform built on Claude Code, realizing Karpathy's LLM-Wiki vision. 20+ skills cover the full loop: ingest → ideate → novelty check → experiment design \u002F run \u002F eval → paper writing. Research state lives in a structured knowledge wiki with an interactive graph.\n- [Sibyl-Research-Team\u002FAutoResearch-SibylSystem](https:\u002F\u002Fgithub.com\u002FSibyl-Research-Team\u002FAutoResearch-SibylSystem) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSibyl-Research-Team\u002FAutoResearch-SibylSystem?style=social) - Fully autonomous AI scientist built on Claude Code, with explicit AutoResearch lineage, multi-agent research iteration, GPU experiment execution, and a self-evolving outer loop.\n- [eimenhmdt\u002Fautoresearcher](https:\u002F\u002Fgithub.com\u002Feimenhmdt\u002Fautoresearcher) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Feimenhmdt\u002Fautoresearcher?style=social) - Early open-source package for automating scientific workflows, currently centered on literature-review generation with an ambition toward broader autonomous research.\n- [hyperspaceai\u002Fagi](https:\u002F\u002Fgithub.com\u002Fhyperspaceai\u002Fagi) ![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhyperspaceai\u002Fagi?style=social) - Distributed, peer-to-peer research network where autonomous agents run experiments, gossip findings, maintain CRDT leaderboards, and archive results to GitHub across multiple research domains.\n- [Human-Agent-Society\u002FCORAL](https:\u002F\u002Fgithub.com\u002FHuman-Agent-Society\u002FCORAL) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHuman-Agent-Society\u002FCORAL?style=social) - **CORAL**: Autonomous multi-agent evolution for open-ended discovery ([arXiv:2604.01658](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.01658)). Long-running agents with shared persistent memory, asynchronous execution, and heartbeat-based interventions; SOTA on 10 math\u002Falgorithmic\u002Fsystems tasks.\n- [SakanaAI\u002FAI-Scientist](https:\u002F\u002Fgithub.com\u002FSakanaAI\u002FAI-Scientist) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSakanaAI\u002FAI-Scientist?style=social) - **The AI Scientist**: First comprehensive system for fully automatic scientific discovery. From idea generation to paper writing with minimal human supervision.\n- [SakanaAI\u002FAI-Scientist-v2](https:\u002F\u002Fgithub.com\u002FSakanaAI\u002FAI-Scientist-v2) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSakanaAI\u002FAI-Scientist-v2?style=social) - Workshop-level automated scientific discovery via agentic tree search. Removes template dependency from v1, generalizes across research domains.\n- [AweAI-Team\u002FAiScientist](https:\u002F\u002Fgithub.com\u002FAweAI-Team\u002FAiScientist) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAweAI-Team\u002FAiScientist?style=social) - **AiScientist**: long-horizon ML research lab with hierarchical orchestration and File-as-Bus coordination — workspace files act as the durable system of record. 
Drives autonomous paper-reproduction (PaperBench) and competition-style MLE-Bench iteration loops under fixed compute\u002Ftime budgets. ([arXiv 2604.13018](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.13018))\n- [HKUDS\u002FAI-Researcher](https:\u002F\u002Fgithub.com\u002FHKUDS\u002FAI-Researcher) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHKUDS\u002FAI-Researcher?style=social) - NeurIPS 2025 paper. Full end-to-end research automation: hypothesis → experiments → manuscript → peer review. Production version at [novix.science](https:\u002F\u002Fnovix.science\u002Fchat).\n- [openags\u002FAuto-Research](https:\u002F\u002Fgithub.com\u002Fopenags\u002FAuto-Research) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenags\u002FAuto-Research?style=social) - **OpenAGS**: Orchestrates a team of AI agents across the full research lifecycle — lit review, hypothesis generation, experiments, manuscript writing, and peer review.\n- [SamuelSchmidgall\u002FAgentLaboratory](https:\u002F\u002Fgithub.com\u002FSamuelSchmidgall\u002FAgentLaboratory) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FSamuelSchmidgall\u002FAgentLaboratory?style=social) - End-to-end autonomous research workflow: idea → literature review → experiments → report. Supports both autonomous and co-pilot modes.\n- [AgentRxiv](https:\u002F\u002Fagentrxiv.github.io\u002F) - Collaborative autonomous research framework where agent laboratories share a preprint server to build on each other's work iteratively.\n- [JinheonBaek\u002FResearchAgent](https:\u002F\u002Fgithub.com\u002FJinheonBaek\u002FResearchAgent) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FJinheonBaek\u002FResearchAgent?style=social) - Iterative research idea generation over scientific literature with LLMs. 
Multi-agent review and feedback loops.\n- [du-nlp-lab\u002FMLR-Copilot](https:\u002F\u002Fgithub.com\u002Fdu-nlp-lab\u002FMLR-Copilot) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fdu-nlp-lab\u002FMLR-Copilot?style=social) - Autonomous ML research framework — generates ideas, implements experiments, analyzes results.\n- [MASWorks\u002FML-Agent](https:\u002F\u002Fgithub.com\u002FMASWorks\u002FML-Agent) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FMASWorks\u002FML-Agent?style=social) - Reinforcing LLM agents for autonomous ML engineering. Learns from trial and error to improve model performance.\n- [PouriaRouzrokh\u002FLatteReview](https:\u002F\u002Fgithub.com\u002FPouriaRouzrokh\u002FLatteReview) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FPouriaRouzrokh\u002FLatteReview?style=social) - Low-code Python package for **automated systematic literature reviews** via AI-powered agents.\n- [LitLLM\u002FLitLLM](https:\u002F\u002Fgithub.com\u002FLitLLM\u002FLitLLM) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FLitLLM\u002FLitLLM?style=social) - AI-powered literature review assistant using RAG for accurate, well-structured related-work sections in academic writing.\n- [Agent Laboratory](https:\u002F\u002Fagentlaboratory.github.io\u002F) - Three-phase research pipeline: Literature Review → Experimentation → Report Writing, with specialized agents for each phase.\n- [WecoAI\u002Faideml](https:\u002F\u002Fgithub.com\u002FWecoAI\u002Faideml) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWecoAI\u002Faideml?style=social) - **AIDE**: AI-Driven Exploration — tree-search-based ML engineering agent that automates experiment design, code generation, and evaluation. 
Treats ML engineering as code optimization against any metric.\n\n## 💻 Platform ports and hardware forks\n\n- [gianfrancopiana\u002Fopenclaw-autoresearch](https:\u002F\u002Fgithub.com\u002Fgianfrancopiana\u002Fopenclaw-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgianfrancopiana\u002Fopenclaw-autoresearch?style=social) - OpenClaw port of pi-autoresearch; autonomous experiment loop for any optimization target with statistical confidence scoring.\n- [miolini\u002Fautoresearch-macos](https:\u002F\u002Fgithub.com\u002Fmiolini\u002Fautoresearch-macos) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmiolini\u002Fautoresearch-macos?style=social) - Widely adopted macOS fork that adapts upstream autoresearch for Apple Silicon \u002F MPS while preserving the original loop shape.\n- [trevin-creator\u002Fautoresearch-mlx](https:\u002F\u002Fgithub.com\u002Ftrevin-creator\u002Fautoresearch-mlx) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftrevin-creator\u002Fautoresearch-mlx?style=social) - MLX-native Apple Silicon port that keeps the upstream fixed-budget `val_bpb` loop while removing the PyTorch\u002FCUDA dependency entirely.\n- [jsegov\u002Fautoresearch-win-rtx](https:\u002F\u002Fgithub.com\u002Fjsegov\u002Fautoresearch-win-rtx) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjsegov\u002Fautoresearch-win-rtx?style=social) - Windows-native RTX fork focused on consumer NVIDIA GPUs, with explicit VRAM floors and a practical desktop setup path.\n- [iii-hq\u002Fn-autoresearch](https:\u002F\u002Fgithub.com\u002Fiii-hq\u002Fn-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fiii-hq\u002Fn-autoresearch?style=social) - Multi-GPU autoresearch infrastructure with structured experiment tracking, adaptive search strategy, crash recovery, and queryable orchestration around the classic `train.py` loop.\n- 
[lucasgelfond\u002Fautoresearch-webgpu](https:\u002F\u002Fgithub.com\u002Flucasgelfond\u002Fautoresearch-webgpu) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Flucasgelfond\u002Fautoresearch-webgpu?style=social) - Browser\u002FWebGPU port that lets agents generate training code, run experiments in-browser, and feed results back into the loop without a Python setup.\n- [tonitangpotato\u002Fautoresearch-engram](https:\u002F\u002Fgithub.com\u002Ftonitangpotato\u002Fautoresearch-engram) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftonitangpotato\u002Fautoresearch-engram?style=social) - Fork with **persistent cognitive memory** — frequency-weighted retrieval of cross-session knowledge for improved experiment continuity.\n- **Colab\u002FKaggle T4 port** - Adapts autoresearch for free T4 GPUs (Google Colab \u002F Kaggle) with zero cost and zero local setup. Key changes: Flash Attention 3 → PyTorch SDPA, removes H100-only kernel dependency. ([upstream issue #208](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002Fautoresearch\u002Fissues\u002F208))\n- [ArmanJR-Lab\u002Fautoautoresearch](https:\u002F\u002Fgithub.com\u002FArmanJR-Lab\u002Fautoautoresearch) - Jetson AGX Orin port with a **director** — a Go binary that acts as a \"creative director\" injecting novelty (arxiv papers + DeepSeek Reasoner) into the loop to escape local minima. 
Includes multi-experiment comparison (baseline vs director-guided) with detailed stall analysis.\n\n## 🎯 Domain-specific adaptations\n\n- [mattprusak\u002Fautoresearch-genealogy](https:\u002F\u002Fgithub.com\u002Fmattprusak\u002Fautoresearch-genealogy) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmattprusak\u002Fautoresearch-genealogy?style=social) - Applies the autoresearch pattern to genealogy, using structured prompts, archive guides, source checks, and vault workflows to iteratively expand and verify family-history research.\n- [ArchishmanSengupta\u002Fautovoiceevals](https:\u002F\u002Fgithub.com\u002FArchishmanSengupta\u002Fautovoiceevals) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FArchishmanSengupta\u002Fautovoiceevals?style=social) - Uses adversarial callers plus keep-or-revert prompt edits to harden voice AI agents across Vapi, Smallest AI, and ElevenLabs.\n- [chrisworsey55\u002Fatlas-gic](https:\u002F\u002Fgithub.com\u002Fchrisworsey55\u002Fatlas-gic) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fchrisworsey55\u002Fatlas-gic?style=social) - Applies the autoresearch keep-or-revert loop to trading agents, optimizing prompts and portfolio orchestration against rolling Sharpe ratio instead of model loss.\n- [RightNow-AI\u002Fautokernel](https:\u002F\u002Fgithub.com\u002FRightNow-AI\u002Fautokernel) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRightNow-AI\u002Fautokernel?style=social) - Applies the autoresearch loop to GPU kernel optimization: profile bottlenecks, edit one kernel, benchmark, keep or revert, repeat.\n- [Agent-Analytics\u002Fautoresearch-growth](https:\u002F\u002Fgithub.com\u002FAgent-Analytics\u002Fautoresearch-growth) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAgent-Analytics\u002Fautoresearch-growth?style=social) - Applies autoresearch to landing-page positioning and A\u002FB test 
candidates, using analytics snapshots and measured experiment results to seed subsequent rounds.\n- [Rkcr7\u002Fautoresearch-sudoku](https:\u002F\u002Fgithub.com\u002FRkcr7\u002Fautoresearch-sudoku) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FRkcr7\u002Fautoresearch-sudoku?style=social) - Enhanced autoresearch workflow where an AI agent iteratively rewrites and benchmarks a Rust sudoku solver, ultimately beating leading human-built solvers on hard benchmark sets.\n- [jeongph\u002Fautospec](https:\u002F\u002Fgithub.com\u002Fjeongph\u002Fautospec) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fjeongph\u002Fautospec?style=social) - Reads natural-language business rules and autonomously builds a Spring Boot service with tests via the keep-or-revert loop. Evaluates with Gradle build + JUnit XML. 119-line skeleton to 950 lines in 5 cycles.\n\n## 📊 Evaluation & benchmarks\n\n- [snap-stanford\u002FMLAgentBench](https:\u002F\u002Fgithub.com\u002Fsnap-stanford\u002FMLAgentBench) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsnap-stanford\u002FMLAgentBench?style=social) - Benchmark suite for evaluating AI agents on ML experimentation tasks. 13 tasks from CIFAR-10 to BabyLM.\n- [openai\u002Fmle-bench](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fmle-bench) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenai\u002Fmle-bench?style=social) - OpenAI's benchmark for measuring how well AI agents perform at ML engineering.\n- [chchenhui\u002Fmlrbench](https:\u002F\u002Fgithub.com\u002Fchchenhui\u002Fmlrbench) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fchchenhui\u002Fmlrbench?style=social) - MLR-Bench: Evaluating AI agents on open-ended ML research. 
201 tasks from NeurIPS\u002FICLR\u002FICML workshops.\n- [gersteinlab\u002FML-Bench](https:\u002F\u002Fgithub.com\u002Fgersteinlab\u002FML-Bench) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fgersteinlab\u002FML-Bench?style=social) - Evaluates LLMs and agents for ML tasks on repository-level code.\n- [THUDM\u002FAgentBench](https:\u002F\u002Fgithub.com\u002FTHUDM\u002FAgentBench) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTHUDM\u002FAgentBench?style=social) - Comprehensive benchmark for LLM-as-Agent evaluation across 8 distinct environments. ICLR 2024.\n\n## 📈 Notable use cases and writeups\n\n- **Shopify Liquid optimization** - Tobi Lütke shared an autoresearch-style optimization run on Shopify's Liquid engine, with public traces showing major parse\u002Frender speedups and allocation reductions. ([tweet](https:\u002F\u002Fx.com\u002Ftobi\u002Fstatus\u002F2032212531846971413), [PR with traces](https:\u002F\u002Fgithub.com\u002FShopify\u002Fliquid\u002Fpull\u002F2056))\n- **Driveline baseball biomechanics** - Public autoresearch-style experiment loop for pitch-velocity prediction from biomechanics data, with large reported gains in model quality. ([tweet](https:\u002F\u002Fx.com\u002Fdrivelinekyle\u002Fstatus\u002F2032242254035992610))\n- **Tennis XGBoost prediction + reward hacking writeup** - Nick Oak documents an autoresearch-inspired loop for tennis match prediction, including where the optimization setup went wrong. 
([blog](https:\u002F\u002Fnickoak.com\u002Fposts\u002Ftennis-xgboost-autoresearch\u002F) · [repo](https:\u002F\u002Fgithub.com\u002Fbuildoak\u002Ftennis-xgboost-autoresearch) · [gamed branch](https:\u002F\u002Fgithub.com\u002Fbuildoak\u002Ftennis-xgboost-autoresearch\u002Ftree\u002Farchived\u002Fgamed-iterations))\n- **Vesuvius Challenge ink detection swarm** - Multi-agent experimental loop applied to ancient-scroll ink detection, with a strong writeup on cross-scroll generalization improvements. ([blog](https:\u002F\u002Fscrollprize.substack.com\u002Fp\u002Fwe-are-cooking))\n- **Earth system model optimization** - Hybrid workflow where an LLM proposes equation structures and a search process tunes parameters, showing how the autoresearch pattern extends into scientific modeling. ([tweet](https:\u002F\u002Fx.com\u002Fdevparagiri\u002Fstatus\u002F2035075626273739068), [blog](https:\u002F\u002Fparagiri.com\u002Fblog\u002F2026\u002Fautoresearch-earth-system-models\u002F))\n- **The Agentic Researcher** - Paper: \"A Practical Guide to AI-Assisted Research in Mathematics and Machine Learning.\" Cites autoresearch as the canonical example of automated ML experiment pipelines. ([arxiv 2603.15914](https:\u002F\u002Farxiv.org\u002Fhtml\u002F2603.15914))\n- **Scaling Autoresearch to GPU Clusters** - SkyPilot blog on running autoresearch on H100\u002FH200 clusters with cloud orchestration. ([SkyPilot Blog](https:\u002F\u002Fblog.skypilot.co\u002Fscaling-autoresearch\u002F))\n- **Self-Improving Coding Agents** - Addy Osmani's practical guide to setting up self-improving agent loops with Claude Code. ([article](https:\u002F\u002Faddyosmani.com\u002Fblog\u002Fself-improving-agents\u002F))\n- **autoresearch@home: Distributed AI Research** - SETI@home model applied to autoresearch — contribute GPU time to collective model optimization. 
([Ensue Blog](https:\u002F\u002Fensue.dev\u002Fblog\u002Fautoresearch-at-home\u002F))\n- **Claude Code + AutoResearch for Self-Improving Skills** - MindStudio guide to building self-improving AI skills using Claude Code with autoresearch patterns. ([article](https:\u002F\u002Fwww.mindstudio.ai\u002Fblog\u002Fclaude-code-autoresearch-self-improving-skills))\n- **100 ML Experiments Overnight** - Particula technical breakdown with domain-agnostic fork applications. ([article](https:\u002F\u002Fparticula.tech\u002Fblog\u002Fkarpathy-autoresearch-autonomous-ml-experiments))\n- **PM's Guide to Autoresearch** - Product manager's guide covering setup, community forks, and real-world applications. ([article](https:\u002F\u002Fwww.news.aakashg.com\u002Fp\u002Fautoresearch-guide-for-pms))\n- **Autoresearch 101 Builder's Playbook** - Substack deep-dive on applying autoresearch patterns to prompts, agents, and workflows with concrete examples. ([article](https:\u002F\u002Fsidsaladi.substack.com\u002Fp\u002Fautoresearch-101-builders-playbook))\n- **Kingy AI Technical Breakdown** - Detailed technical walkthrough of the autoresearch loop architecture, mutation operators, and fitness function design. ([article](https:\u002F\u002Fkingy.ai\u002Fai\u002Fautoresearch-karpathys-minimal-agent-loop-for-autonomous-llm-experimentation\u002F))\n- **Fortune Feature** - Business and industry context on why autoresearch matters for the future of autonomous AI agents. 
([article](https:\u002F\u002Ffortune.com\u002F2026\u002F03\u002F17\u002Fandrej-karpathy-loop-autonomous-ai-agents-future\u002F))\n\n## 📚 Related resources\n\nCurated lists and paper collections for AI agents, autonomous systems, and automated research:\n\n- [ai-agents-2030\u002Fawesome-deep-research-agent](https:\u002F\u002Fgithub.com\u002Fai-agents-2030\u002Fawesome-deep-research-agent) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fai-agents-2030\u002Fawesome-deep-research-agent?style=social) - Curated list of deep research agent papers and systems.\n- [YoungDubbyDu\u002FLLM-Agent-Optimization](https:\u002F\u002Fgithub.com\u002FYoungDubbyDu\u002FLLM-Agent-Optimization) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FYoungDubbyDu\u002FLLM-Agent-Optimization?style=social) - Papers on LLM agent optimization methods.\n- [VoltAgent\u002Fawesome-ai-agent-papers](https:\u002F\u002Fgithub.com\u002FVoltAgent\u002Fawesome-ai-agent-papers) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FVoltAgent\u002Fawesome-ai-agent-papers?style=social) - Curated AI agent papers from 2026 — agent engineering, memory, evaluation, workflows, and autonomous systems.\n- [masamasa59\u002Fai-agent-papers](https:\u002F\u002Fgithub.com\u002Fmasamasa59\u002Fai-agent-papers) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fmasamasa59\u002Fai-agent-papers?style=social) - AI agent research papers updated biweekly via automated arxiv search with curated selection.\n- [tmgthb\u002FAutonomous-Agents](https:\u002F\u002Fgithub.com\u002Ftmgthb\u002FAutonomous-Agents) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftmgthb\u002FAutonomous-Agents?style=social) - Autonomous agents research papers, updated daily.\n- [HKUST-KnowComp\u002FAwesome-LLM-Scientific-Discovery](https:\u002F\u002Fgithub.com\u002FHKUST-KnowComp\u002FAwesome-LLM-Scientific-Discovery) ![GitHub 
stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FHKUST-KnowComp\u002FAwesome-LLM-Scientific-Discovery?style=social) - EMNLP 2025 survey on LLMs in scientific discovery.\n- [openags\u002FAwesome-AI-Scientist-Papers](https:\u002F\u002Fgithub.com\u002Fopenags\u002FAwesome-AI-Scientist-Papers) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fopenags\u002FAwesome-AI-Scientist-Papers?style=social) - Collection of AI Scientist \u002F Robot Scientist papers.\n- [agenticscience.github.io](https:\u002F\u002Fagenticscience.github.io\u002F) - Survey: \"From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery.\"\n- [dspy.ai\u002FGEPA](https:\u002F\u002Fdspy.ai\u002Fapi\u002Foptimizers\u002FGEPA\u002Foverview\u002F) - DSPy integration of GEPA reflective prompt optimizer for compound AI systems.\n- [OpenAI Cookbook: Self-Evolving Agents](https:\u002F\u002Fdevelopers.openai.com\u002Fcookbook\u002Fexamples\u002Fpartners\u002Fself_evolving_agents\u002Fautonomous_agent_retraining) - Cookbook for autonomous agent retraining using GEPA-style reflective evolution.\n- [WecoAI\u002Fawesome-autoresearch](https:\u002F\u002Fgithub.com\u002FWecoAI\u002Fawesome-autoresearch) ![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FWecoAI\u002Fawesome-autoresearch?style=social) - Curated list of AutoResearch use cases with verifiable traces and progress charts, organized by domain (LLM training, GPU kernels, voice agents, trading, etc.).\n\n\u003Cdiv align=\"center\">\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?type=date&repos=alvinunreal%2Fawesome-autoresearch\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=alvinunreal%2Fawesome-autoresearch&type=date&theme=dark&legend=top-left\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" 
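srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=alvinunreal%2Fawesome-autoresearch&type=date&legend=top-left\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fimage?repos=alvinunreal%2Fawesome-autoresearch&type=date&legend=top-left\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## 📄 License\n\nThis list is released under [CC0-1.0](.\u002FLICENSE).\n","awesome-autoresearch is a curated list of autonomous improvement loops, research agents, and autoresearch-style systems inspired by Karpathy's autoresearch project. It catalogs general-purpose and domain-specific autonomous agent systems that improve themselves and pursue goals through iterative optimization. Coverage spans applications from software development to scientific research, including recursive self-improvement frameworks, file-based operating modes, and cross-run experience learning. It suits researchers, developers, and enthusiasts who want to build or understand self-learning, self-improving AI agents, and it shows strong potential for advancing scientific discovery and improving software quality.",2,"2026-06-11 03:52:28","high_star"]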