[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-76122":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":14,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},76122,"autogo","ericjang\u002Fautogo","ericjang","Autoresearch for Go",null,"Python",210,27,160,3,0,1,4,41,52.44,"MIT License",false,"main",[],"2026-06-12 04:01:20","# AutoGo\n\nA minimal codebase for building a strong Go-playing AI from scratch and, more importantly, for studying how to automate the AI researcher driving the project. AutoGo is less about mastering Go than about exercising an autonomous-research workflow on a domain where data is cheap and signal is fast.\n\nCurrently, the best model I have trained plays OK but has some bugs: it does not understand life\u002Fdeath properly because it was trained on Tromp-Taylor scoring rules. I am working on fixing this.\n\n \u003C!-- that I'm working on fixing `kata1-zhizi-b40c768nbt-fdx6c`.  -->\n\n\u003C!-- ![learning progress](progress.png) -->\n\n[Web Tutorial](https:\u002F\u002Fevjang.com\u002F2026\u002F04\u002F28\u002Fautogo.html)\n\n[Play the AutoGo AI](https:\u002F\u002Fautogo.evjang.com)\n\n## Why Go?\n\nAlphaGo and MCTS are so 2016. Why build a research codebase around Go, as opposed to more recent models like reasoning LLMs, VLMs, Diffusion, etc?\n\nThis repo is not really about Go. It is about automating the Go researcher. The same skillsets should transfer to many other AI research domains. 
From Dario Amodei's [Machines of Loving Grace](https:\u002F\u002Fdarioamodei.com\u002Fessay\u002Fmachines-of-loving-grace): \n\n*If our core hypothesis about AI progress is correct, then the right way to think of AI is not as a method of data analysis, but as a virtual biologist who performs all the tasks biologists do, including designing and running experiments in the real world (by controlling lab robots or simply telling humans which experiments to run – as a Principal Investigator would to their graduate students), inventing new biological methods or measurement techniques, and so on. It is by speeding up the whole research process that AI can truly accelerate biology. I want to repeat this because it’s the most common misconception that comes up when I talk about AI’s ability to transform biology: I am not talking about AI as merely a tool to analyze data. In line with the definition of powerful AI at the beginning of this essay, I’m talking about using AI to perform, direct, and improve upon nearly everything biologists do.*\n\nAs to why Go is a particularly good environment for \"automated researcher\", it mainly comes down to being a (relatively) computationally lightweight environment that still requires the core competencies of AI researchers.\n\n1. Training policy and value networks in Go are fundamentally about minimizing perplexity, like in LLMs. Unlike model-free RL algorithms specialized for single-player games (ALE benchmark, Mujoco), AlphaGo favors fairly simple training methods (supervised learning) + scaling up the system engineering. A similar taste around simple algorithms + performant distributed systems is employed in frontier labs.\n2. Because Go data is easy to generate and the universe of Go games is very large, I suspect that Go is actually a good fit for studying scaling laws. 
If Go turns out to not be appropriate for studying scaling laws, we learn something about why [scaling laws are hard for robotics](https:\u002F\u002Fx.com\u002Fericjang11\u002Fstatus\u002F2011611913424421149?s=20). Preliminary experiments in this codebase suggest that similar to reasoning LLMs, neural nets trained on Go exhibit both train-time and test-time scaling law properties.\n3. Techniques that help train Go networks faster will likely translate to LLMs, as well as action + value prediction for robotics. Evaluating new deep learning techniques in Go offers a de-correlated signal from LLM applications.\n4. AlphaGo as a system shares many similar elements to a robotics stack: logging, data collection, replay buffers, distributed RL, simulated evaluation - but runs many orders of magnitude faster and removes a lot of pesky details about implementing robotics systems: slowness + complexity + the crushing weight of maintaining real-world datasets. It actually has a \"little bit of everything\" when it comes to exposure to core deep learning topics.\n5. I find it deeply profound that simply querying a function approximator for value can be an arbitrarily accurate replacement for simulation. It is a miracle that macroscale effects can be predicted accurately without microscale simulation. Extrapolating this principle, I wonder if long-standing questions of computational hardness (P = NP?) are even the right ones to be asking. Perhaps we should be asking if \"P almost NP?\"\n6. Self-play, Nash equilibria, mixed strategies, and recursive self-improvement are top-of-mind for frontier labs. Go is a lightweight yet rich environment for studying those dynamics.\n\nInterested in buying RL environments and data for autonomous game-playing RL research? Please [get in touch](https:\u002F\u002Fforms.gle\u002FGqwLJ9r3SMVcygB46).\n\n\n## Workflow\n\nInstead of running code yourself, you ask Claude to run your experiments. 
The human researcher provides interactive feedback, which Claude will use to assist in its interpretation of the data.\n\nThere are a few skills in this repository that aid running experiments:\n\n- `autoresearch` : autonomously optimize a metric. Good for hyperparameter tuning (e.g. minimize validation loss) and performance optimization (e.g. maximize moves\u002Fsec).\n- `experiment` : one-off experiment useful for conducting analysis\n\n\n### Architecture\n\nA driver process inside a dev container dispatches jobs to a fleet of GPU worker hosts over SSH. Each job is a one-shot `docker run --rm` of the worker image (`ghcr.io\u002F\u003Cowner>\u002Falphago-worker:latest`). Worker nodes are listed in `cluster.toml`; bring them up with `infra\u002Fcluster.py add \u003Cuser@host>`. Exactly one worker (with `roles = [\"train\", \"collect\"]`) shares the controller's NFS mount, so checkpoints and game data are visible without explicit shipping. The remaining collect-only workers don't share NFS — `infra.remote_exec` rsyncs each `push_files` (e.g. the checkpoint) over before `docker run` and rsyncs each `pull_dirs` (e.g. 
the NPZ output dir) back when the job exits.\n\nSee `infra\u002Fcluster.py` (cluster bringup, `add`\u002F`ping`\u002F`build`\u002F`pull`\u002F`status` subcommands) and `infra\u002Fremote_exec.py` (the SSH dispatcher used by experiment drivers).\n\n```\n┌──────────────────────┐\n│  Code Editing Client │\n│  (Laptop \u002F VSCode)   │\n│                      │\n│  thin client,        │\n│  SSH remote into     │\n│  dev container       │\n└──────────┬───────────┘\n           │ SSH\n┌──────────▼───────────┐\n│  Dev Container (CPU) │\n│  + \u002Fnfs (host mount) │\n│                      │\n│  development,        │\n│  dispatch jobs via   │\n│  SSH + docker run    │\n└──────────┬───────────┘\n           │  Multi-host cluster  (cluster.toml; one job = one `docker run --rm` per GPU)\n           ▼\n┌────────────────┐ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐\n│ RTX 6000 Ada   │ │ RTX 6000 Ada   │ │ RTX PRO 6000B  │ │ RTX PRO 6000B  │\n│ train + collect│ │ collect        │ │ collect        │ │ collect        │\n│ \u002Fnfs (host)    │ │ no \u002Fnfs        │ │ no \u002Fnfs        │ │ no \u002Fnfs        │\n│ ┌────────────┐ │ │ ┌────────────┐ │ │ ┌────────────┐ │ │ ┌────────────┐ │\n│ │ worker ctr │ │ │ │ worker ctr │ │ │ │ worker ctr │ │ │ │ worker ctr │ │\n│ └────────────┘ │ │ └────────────┘ │ │ └────────────┘ │ │ └────────────┘ │\n└────────────────┘ └────────────────┘ └────────────────┘ └────────────────┘\n```\n\n## Setup\n\nThe dev container provides a fully configured GPU-enabled environment with all dependencies pre-built.\n\n### Host machine prerequisites\n\nThe dev container mounts `~\u002F.ssh` and `~\u002F.claude` from the host. The container's `dev` user runs as UID 1000, so the host user must also be UID 1000 for file ownership to match across the bind mount. 
If you are logged in as root (UID 0), create a non-root user first:\n\nsudo chmod o+x \u002Fdata\n\n```bash\n# Create a user with UID 1000 (skip if your host user is already UID 1000)\nsudo useradd -m -u 1000 -s \u002Fbin\u002Fbash dev\n# Grant passwordless sudo\necho \"dev ALL=(ALL) NOPASSWD:ALL\" | sudo tee \u002Fetc\u002Fsudoers.d\u002Fdev\n\n# Copy your SSH keys to the new user\nsudo cp -r ~\u002F.ssh \u002Fhome\u002Fdev\u002F.ssh\nsudo chown -R dev:dev \u002Fhome\u002Fdev\u002F.ssh\nsudo chmod 700 \u002Fhome\u002Fdev\u002F.ssh\nsudo chmod 600 \u002Fhome\u002Fdev\u002F.ssh\u002F*\nsudo chmod 644 \u002Fhome\u002Fdev\u002F.ssh\u002F*.pub 2>\u002Fdev\u002Fnull\n\n# Copy Claude CLI config if it exists\nsudo cp -r ~\u002F.claude \u002Fhome\u002Fdev\u002F.claude 2>\u002Fdev\u002Fnull\nsudo chown -R dev:dev \u002Fhome\u002Fdev\u002F.claude 2>\u002Fdev\u002Fnull\nsudo cp ~\u002F.claude.json \u002Fhome\u002Fdev\u002F.claude.json 2>\u002Fdev\u002Fnull\nsudo chown dev:dev \u002Fhome\u002Fdev\u002F.claude.json 2>\u002Fdev\u002Fnull\n\n# Create files that the container bind-mounts expect\nsudo -u dev touch \u002Fhome\u002Fdev\u002F.bash_history\nsudo mkdir -p \u002Fhome\u002Fdev\u002F.docker\nsudo chown -R dev:dev \u002Fhome\u002Fdev\u002F.docker\n\n# Add dev to docker group so it can run containers\nsudo usermod -aG docker dev\n```\n\nThen switch to the new user and launch the dev container from there:\n\n```bash\nsu - dev\n```\n\n> **Why not just chown to 1000?** Changing `~\u002F.ssh` ownership to UID 1000 under root breaks root's own SSH access, since SSH requires key files to be owned by the connecting user. A dedicated UID 1000 host user avoids this tug-of-war entirely.\n\n### Clone and launch\n\n1. Clone with submodules:\n```bash\ngit clone --recursive \u003Cyour-fork-url> AutoGo\ncd AutoGo\n```\n\n2. Open in VS Code and select **\"Reopen in Container\"** when prompted (or run `Dev Containers: Reopen in Container` from the command palette).\n\n3. 
The container will automatically:\n   - Initialize git submodules\n   - Install Python dependencies via `uv sync`\n   - Build the C++ pybind11 extension\n\n4. Verify the setup:\n```bash\nuv run -m pytest tests\u002F\nuv run python -c \"import alpha_go_cpp; print(alpha_go_cpp.__version__)\"\nnvidia-smi\n```\n\n### Launch from terminal\n\nBuild and enter the container shell directly:\n\n\n```bash\ndocker build -f .devcontainer\u002FDockerfile -t learnalphago-dev .\ndocker run -it --gpus all --shm-size=8g \\\n  -v \"$PWD:\u002Fworkspace\" \\\n  -v \"$HOME\u002F.ssh:\u002Fhome\u002Fdev\u002F.ssh:ro\" \\\n  -p 8265:8265 -p 8090:8090 \\\n  learnalphago-dev bash\n```\n\nThen inside the container:\n```bash\ngit submodule update --init && uv sync\nclaude\n```\n## Development Setup\n\nVSCode: enable Terminal: Send Keybindings To Shell so that you can use ctrl-e and ctrl-g within the Claude REPL.\n\n## Cluster Operations\n\nWorker nodes are listed in `cluster.toml`; the worker image is `ghcr.io\u002F\u003Cowner>\u002Falphago-worker:latest` (override via `cluster.toml`'s top-level `image` key).\n\n\n### Build and push the worker image\n\nThe first step is to set up authentication to the GitHub Container Registry. 
Create a `.secrets` file like so: \n\n```bash\nGHCR_TOKEN=\u003Cyour_github_token>\nGHCR_USER=\u003Cyour_github_username>\n```\n\nThen run\n\n```bash\nuv run infra\u002Fcluster.py build\n```\n\nwhich builds `Dockerfile.worker` (on top of the base devcontainer image) and pushes it. `cluster.py build\u002Fadd` automatically loads the login information from `.secrets` to authenticate pushes to and pulls from the registry.\n\nRun `uv run infra\u002Fcluster.py pull` afterward to roll the new tag onto the fleet, since `docker run` does not use `--pull=always`.\n\n### Add a new worker\n\n```bash\n.\u002Finfra\u002Fcluster.py add user@host          # baremetal or container host\n.\u002Finfra\u002Fcluster.py add --ssh-port 2222 user@host\n```\n\nInstalls Docker + the NVIDIA toolkit, logs into GHCR, pulls the image, seeds `\u002Fnfs`, and appends `[nodes.\"\u003Cip>\"]` to `cluster.toml`.\n\n### Inspect the fleet\n\n```bash\n.\u002Finfra\u002Fcluster.py ping     # SSH-echo every node, ✓\u002F✗ per host\n.\u002Finfra\u002Fcluster.py status   # per-GPU lease state from \u002Fnfs\u002Fcluster_leases\n```\n\n### Run an end-to-end iteration of Go training\n\nThe selfplay-only loop in `experiments\u002F2026-04-26_22-32-train-fromscratch\u002F` dispatches collect jobs across the whole cluster and trains on the gathered data:\n\n```bash\nEXP=experiments\u002F2026-04-26_22-32-train-fromscratch\nbash $EXP\u002Frun_iteration.sh 0 5\n```\n\n### Auto-Research\n\nYou can use the `autoresearch` and `experiment` skills to run new experiments. \n\n![nn arch tuning](experiments\u002F2026-04-28_00-38-fastlearn\u002Ffigures\u002FphaseA_progress.png)\n\n![self-play tuning](experiments\u002F2026-04-28_00-38-fastlearn\u002Ffigures\u002FphaseB_progress.png)\n\n## Infra Advice\n\n- Having Claude \"run the training loop by hand\", stopping to flag any iteration that was going unstable, was very useful for catching unstable training early. 
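\n- It's very helpful to start by alternating synchronously between train and collect jobs before attempting to maximize throughput with async RL and simultaneous training + data collection. This helps a lot with catching stability issues in training, which are much harder to diagnose \u002F backtrack in async mode. Once you get a synchronous baseline working, you can look into speeding things up with async.\n- I wasted a lot of time wrangling distributed job orchestration frameworks. Falling back to docker exec calls over SSH ended up working best and being agent-friendly.\n\n\n## Acknowledgements\n\nThanks to Vincent Weisser of [Prime Intellect](https:\u002F\u002Fapp.primeintellect.ai\u002F) for donating the GPU credits for me to run this project! ","AutoGo is a minimal codebase for building a strong Go-playing AI from scratch and, more importantly, for studying how to automate the AI researcher driving the project. Its core functionality includes training policy and value networks with a perplexity-minimization approach similar to that used in large language models (LLMs), and it emphasizes combining simple algorithms with high-performance distributed systems. Technically, AutoGo focuses on exercising an autonomous-research workflow in a domain where data is cheap to generate and the feedback signal is fast. Although the current best model has some flaws in its understanding of life-and-death rules, its overall play is acceptable. The project is particularly suited to researchers who want to explore automated AI research workflows in a relatively computationally lightweight environment, especially those studying how simple algorithms and system scaling can improve AI capability.",2,"2026-06-11 03:54:34","CREATED_QUERY"]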