[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79905":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":19,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":15,"starSnapshotCount":15,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},79905,"experiments-autonomous-speedrunning","PrimeIntellect-ai\u002Fexperiments-autonomous-speedrunning","PrimeIntellect-ai","autonomous nanogpt optimizer speedrun","",null,"Python",99,8,91,0,1,5,2.86,false,"main",[],"2026-06-12 02:03:55","\u003Cp align=\"center\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F40c36e38-c5bd-4c5a-9cb3-f7b902cd155d\">\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F6414bc9b-126b-41ca-9307-9e982430cde8\">\n    \u003Cimg alt=\"Prime Intellect\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F6414bc9b-126b-41ca-9307-9e982430cde8\" width=\"312\" style=\"max-width: 100%;\">\n  \u003C\u002Fpicture>\n\u003C\u002Fp>\n\n\u003Ch3 align=\"center\">\nAutonomous Speedrunning Experiment\n\u003C\u002Fh3>\n\nRaw archive of autonomous agents (Claude Code \u002F Opus 4.7 and Codex \u002F GPT 5.5) competing on the `track_3_optimization` benchmark from [modded-nanogpt](https:\u002F\u002Fgithub.com\u002FKellerJordan\u002Fmodded-nanogpt): reach validation loss 3.28 in as few training steps as possible. 
Only the optimizer, schedules, initialization, and a small set of hyperparameters can change.\n\nBlog post: **TODO: link**\n\nThe reference Muon optimizer reaches the target in 3500 steps. The best public record at the start was 3225. By v2, both agents pass 3035. By v3, Claude reaches 2930 and Codex 2950.\n\nRecord configs from these runs are submitted as PRs against the [`modded-nanogpt` fork](https:\u002F\u002Fgithub.com\u002Feliebak\u002Fmodded-nanogpt\u002Fpulls).\n\n## What's in here\n\nHarness files the agents were given, the plans and threads they wrote, ~10k run logs, and all generated variants. Everything the blog references lives here.\n\n| Folder | What it is |\n|---|---|\n| `v1\u002F` | First wave. Beat the Muon baseline at 3500 steps. |\n| `novelty\u002F` | Novelty-constrained wave. Every idea must pass a novelty check (not just a known method or hyperparameter tweak). |\n| `v2\u002F` | Second wave, starting from the best v1\u002Fnovelty results. Pushing toward 3000. |\n| `v3\u002F` | Third wave, starting from v2 frontier and public PRs. Under-3000 and under-2900 search. |\n\nEach wave:\n\n```text\n\u003Cwave>\u002F\n  claude-code\u002F\n    AGENTS.md\n    goal.md\n    plan.md\n    scratchpad\u002F\n  codex\u002F\n    AGENTS.md\n    goal.md\n    plan.md\n    scratchpad\u002F\n```\n\nThe two agents ran independently and pursued different strategies. 
Compare their `plan.md`, `THREAD.md`, `runs.jsonl`, and `variants\u002F` rather than assuming the folders are duplicates.\n\n## Scratchpad contents\n\n- `THREAD.md`: chronological event log and reasoning trail\n- `runs.jsonl`: run ledger\n- `runs\u002F*.log`: training logs\n- `variants\u002F*.py`: generated training scripts and candidate recipes\n- `sbatch-stubs\u002F*.sh` (or top-level `run_*.sh`): launch scripts\n- `sweeps\u002F\u003Cname>\u002F`: grouped hyperparameter sweeps\n- `ideas\u002F*.md`, `papers\u002F*.md`, `picklist.md`, `audits.md`: literature notes, idea writeups, novelty checks, candidate triage\n\n## Anatomy of a run\n\nA \"run\" is one launch of one candidate. Inside a scratchpad it lives across four files:\n\n1. `variants\u002F\u003Cname>.py` — the candidate training script the agent generated.\n2. `sbatch-stubs\u002F\u003Cname>.sh` (or `run_*.sh`) — the launcher that submitted it.\n3. `runs\u002F\u003Cname>.log` — the training log produced.\n4. A row in `runs.jsonl` — parsed metrics (`step_to_3_28`, `final_val_loss`, `total_steps`, optimizer\u002FHP fields) plus the path back to the source record.\n\nMatch a `runs.jsonl` row to its log and variant by name\u002Fuuid.\n\n## Aggregated run export\n\n`data\u002Fruns_self_contained\u002F` is the flat, cross-wave view of every run, useful if you want to filter all ~10k runs without walking individual scratchpads.\n\nTop-level files:\n\n- `manifest.json` — export policy, counts, and schema fields.\n- `runs.jsonl` — one JSON object per run.\n- `runs.csv` — flat table for quick filtering.\n- `dropped_runs.jsonl` — inventory rows omitted from the export.\n\nPer-run files under `agents\u002F\u003Cagent>\u002Fruns\u002F\u003Cexport_id>\u002F`:\n\n- `metadata.json` — structured metadata: `final_val_loss`, `min_val_loss`, `final_step`, `train_steps`, `step_to_3_28`, `num_val_points`, `train_time_s`, `step_avg_ms`.\n- `train.log` — copied training log.\n- `launched_script.py` — copied train\u002Fconfig script 
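(present when resolvable).\n- `source_snapshot.py` — exact logged source snapshot (present when emitted).\n- `console.log` \u002F `launch_stub.sh` — console log and sbatch launcher when available.\n\nCounts: **10,428** runs exported out of 10,485 inventory rows (57 dropped for missing `config_path`).\n\n| agent | runs |\n|---|---:|\n| `cc_v1` | 605 |\n| `codex_v1` | 2,165 |\n| `cc_novelty` | 81 |\n| `codex_novelty` | 254 |\n| `cc_v2` | 459 |\n| `codex_v2` | 2,729 |\n| `cc_v3` | 1,059 |\n| `codex_v3` | 3,076 |\n","This project has autonomous agents (Claude Code \u002F Opus 4.7 and Codex \u002F GPT 5.5) compete on the `track_3_optimization` benchmark to reach validation loss 3.28 in as few training steps as possible. Its core functionality covers tuning the optimizer, schedules, initialization, and a small set of hyperparameters. Its technical distinction is the use of advanced language models to generate and evaluate different optimization strategies. It suits cases that call for efficient training of deep learning models, especially under resource or time constraints. The project files record the detailed experimental process from the first version to the last, including the agent-written plans, run logs, and all variant configurations, giving researchers rich reference material.",2,"2026-06-11 03:58:29","CREATED_QUERY"]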