[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72038":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":8,"pushedAt":8,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":14,"starSnapshotCount":14,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72038,"TinyRecursiveModels","SamsungSAILMontreal\u002FTinyRecursiveModels","SamsungSAILMontreal",null,"Python",6526,1029,101,36,0,5,9,30,15,40.04,"MIT License",true,false,"main",[],"2026-06-12 02:02:57","**Update: Due to many automatically generated and irrelevant issues submitted to this repo (which have since been deleted) and our limited capacity to properly maintain this repo, we have had to temporarily archive (make read-only) this and several other repos.**\n\n\n# Less is More: Recursive Reasoning with Tiny Networks\n\nThis is the codebase for the paper: \"Less is More: Recursive Reasoning with Tiny Networks\". TRM is a recursive reasoning approach that achieves amazing scores of 45% on ARC-AGI-1 and 8% on ARC-AGI-2 using a tiny 7M-parameter neural network.\n\n[Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04871)\n\n### Motivation\n\nTiny Recursion Model (TRM) is a recursive reasoning model that achieves amazing scores of 45% on ARC-AGI-1 and 8% on ARC-AGI-2 with a tiny 7M-parameter neural network. The idea that one must rely on massive foundation models trained for millions of dollars by some big corporation in order to succeed on hard tasks is a trap. 
Currently, there is too much focus on exploiting LLMs rather than devising and expanding new research directions. With recursive reasoning, it turns out that “less is more”: you don’t always need to crank up model size in order for a model to reason and solve hard problems. A tiny model pretrained from scratch, recursing on itself and updating its answers over time, can achieve a lot without breaking the bank.\n\nThis work came about after I learned about the recent, innovative Hierarchical Reasoning Model (HRM). I was amazed that an approach using small models could do so well on hard tasks like the ARC-AGI competition (reaching 40% accuracy when normally only Large Language Models could compete). But I kept thinking that it was too complicated and relied too much on biological arguments about the human brain, and that this recursive reasoning process could be greatly simplified and improved. Tiny Recursion Model (TRM) simplifies recursive reasoning to its core essence, which ultimately has nothing to do with the human brain and requires neither a mathematical (fixed-point) theorem nor any hierarchy.\n\n### How TRM works\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002FAlexiaJM.github.io\u002Fassets\u002Fimages\u002FTRM_fig.png\" alt=\"TRM\"  style=\"width: 30%;\">\n\u003C\u002Fp>\n\nTiny Recursion Model (TRM) recursively improves its predicted answer y with a tiny network. It starts with the embedded input question x, an initial embedded answer y, and a latent z. For up to K improvement steps, it tries to improve its answer y. It does so by i) recursively updating its latent z n times given the question x, the current answer y, and the current latent z (recursive reasoning), and then ii) updating its answer y given the current answer y and the current latent z. 
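The improvement loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's implementation: the names `trm_recursion`, `toy_latent` and `toy_answer` are hypothetical, the two update callables stand in for the tiny network, embeddings are plain lists of floats, and the up-to-K improvement steps are run as a fixed K steps with no learned halting and no training.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def trm_recursion(
    x: Vec,
    y: Vec,
    z: Vec,
    update_latent: Callable[[Vec, Vec, Vec], Vec],
    update_answer: Callable[[Vec, Vec], Vec],
    n: int,
    K: int,
) -> Tuple[Vec, Vec]:
    # Illustrative control flow only: fixed K steps, no learned halting, no training.
    for _ in range(K):
        # (i) Recursive reasoning: update the latent z n times given x, y and z.
        for _ in range(n):
            z = update_latent(x, y, z)
        # (ii) Improve the answer y given only the current answer y and latent z.
        y = update_answer(y, z)
    return y, z


# Hypothetical toy stand-ins for the tiny network (not the TRM architecture).
def toy_latent(x: Vec, y: Vec, z: Vec) -> Vec:
    # Pull z halfway toward the remaining error x - y.
    return [0.5 * zi + 0.5 * (xi - yi) for xi, yi, zi in zip(x, y, z)]


def toy_answer(y: Vec, z: Vec) -> Vec:
    # Apply the correction stored in the latent to the answer.
    return [yi + zi for yi, zi in zip(y, z)]


if __name__ == '__main__':
    x, y0, z0 = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
    for K in (1, 2, 4):
        y, _ = trm_recursion(x, y0, z0, toy_latent, toy_answer, n=3, K=K)
        print(K, y)
```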
This recursive process allows the model to progressively improve its answer (potentially addressing any errors from its previous answer) in an extremely parameter-efficient manner while minimizing overfitting.\n\n### Requirements\n\nInstallation should take a few minutes. For the smallest experiments on Sudoku-Extreme (pretrain_mlp_t_sudoku), you need a single GPU with enough memory; on 1 L40S (48 GB of memory), training takes around 18 hours to finish. If you run into issues due to library versions, here are the requirements with the exact versions used: [specific_requirements.txt](https:\u002F\u002Fgithub.com\u002FSamsungSAILMontreal\u002FTinyRecursiveModels\u002Fblob\u002Fmain\u002Fspecific_requirements.txt).\n\n- Python 3.10 (or similar)\n- CUDA 12.6.0 (or similar)\n\n```bash\npip install --upgrade pip wheel setuptools\npip install --pre --upgrade torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fnightly\u002Fcu126 # install torch based on your CUDA version\npip install -r requirements.txt # install requirements\npip install --no-cache-dir --no-build-isolation adam-atan2\nwandb login YOUR-LOGIN # log in if you want the logger to sync results to your Weights & Biases account (https:\u002F\u002Fwandb.ai\u002F)\n```\n\n### Dataset Preparation\n\n```bash\n# ARC-AGI-1\npython -m dataset.build_arc_dataset \\\n  --input-file-prefix kaggle\u002Fcombined\u002Farc-agi \\\n  --output-dir data\u002Farc1concept-aug-1000 \\\n  --subsets training evaluation concept \\\n  --test-set-name evaluation\n\n# ARC-AGI-2\npython -m dataset.build_arc_dataset \\\n  --input-file-prefix kaggle\u002Fcombined\u002Farc-agi \\\n  --output-dir data\u002Farc2concept-aug-1000 \\\n  --subsets training2 evaluation2 concept \\\n  --test-set-name evaluation2\n\n## Note: Do not train on both ARC-AGI-1 and ARC-AGI-2 and then evaluate on both: the ARC-AGI-2 training data contains some ARC-AGI-1 evaluation data, so the ARC-AGI-1 evaluation would be contaminated\n\n# Sudoku-Extreme\npython dataset\u002Fbuild_sudoku_dataset.py 
--output-dir data\u002Fsudoku-extreme-1k-aug-1000  --subsample-size 1000 --num-aug 1000  # 1000 examples, 1000 augments\n\n# Maze-Hard\npython dataset\u002Fbuild_maze_dataset.py # 1000 examples, 8 augments\n```\n\n## Experiments\n\n### Sudoku-Extreme (assuming 1 L40S GPU):\n\n```bash\nrun_name=\"pretrain_mlp_t_sudoku\"\npython pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Fsudoku-extreme-1k-aug-1000]\" \\\nevaluators=\"[]\" \\\nepochs=50000 eval_interval=5000 \\\nlr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 \\\narch.mlp_t=True arch.pos_encodings=none \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=6 \\\n+run_name=${run_name} ema=True\n```\n\nExpected: Around 87% exact accuracy (±2%)\n\n```bash\nrun_name=\"pretrain_att_sudoku\"\npython pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Fsudoku-extreme-1k-aug-1000]\" \\\nevaluators=\"[]\" \\\nepochs=50000 eval_interval=5000 \\\nlr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=6 \\\n+run_name=${run_name} ema=True\n```\n\nExpected: Around 75% exact accuracy (±2%)\n\n*Runtime:* \u003C 20 hours\n\n### Maze-Hard (assuming 4 L40S GPUs):\n\n```bash\nrun_name=\"pretrain_att_maze30x30\"\ntorchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Fmaze-30x30-hard-1k]\" \\\nevaluators=\"[]\" \\\nepochs=50000 eval_interval=5000 \\\nlr=1e-4 puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=4 \\\n+run_name=${run_name} ema=True\n```\n\n*Runtime:* \u003C 24 hours\n\nYou can also run Maze-Hard on a single L40S GPU by reducing the batch size (here `global_batch_size=128`), with no noticeable loss in performance:\n\n```bash\nrun_name=\"pretrain_att_maze30x30_1gpu\"\npython pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Fmaze-30x30-hard-1k]\" \\\nevaluators=\"[]\" \\\nepochs=50000 eval_interval=5000 \\\nlr=1e-4 
puzzle_emb_lr=1e-4 weight_decay=1.0 puzzle_emb_weight_decay=1.0 global_batch_size=128 \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=4 \\\n+run_name=${run_name} ema=True\n```\n\n*Runtime:* \u003C 24 hours\n\n\n### ARC-AGI-1 (assuming 4 H100 GPUs):\n\n```bash\nrun_name=\"pretrain_att_arc1concept_4\"\ntorchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Farc1concept-aug-1000]\" \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=4 \\\n+run_name=${run_name} ema=True\n```\n\n*Runtime:* ~3 days\n\n### ARC-AGI-2 (assuming 4 H100 GPUs):\n\n```bash\nrun_name=\"pretrain_att_arc2concept_4\"\ntorchrun --nproc-per-node 4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 pretrain.py \\\narch=trm \\\ndata_paths=\"[data\u002Farc2concept-aug-1000]\" \\\narch.L_layers=2 \\\narch.H_cycles=3 arch.L_cycles=4 \\\n+run_name=${run_name} ema=True\n```\n\n*Runtime:* ~3 days\n\n\n## Reference\n\nIf you find our work useful, please consider citing:\n\n```bibtex\n@misc{jolicoeurmartineau2025morerecursivereasoningtiny,\n      title={Less is More: Recursive Reasoning with Tiny Networks}, \n      author={Alexia Jolicoeur-Martineau},\n      year={2025},\n      eprint={2510.04871},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.04871}, \n}\n```\n\nand the Hierarchical Reasoning Model (HRM):\n\n```bibtex\n@misc{wang2025hierarchicalreasoningmodel,\n      title={Hierarchical Reasoning Model}, \n      author={Guan Wang and Jin Li and Yuhao Sun and Xing Chen and Changling Liu and Yue Wu and Meng Lu and Sen Song and Yasin Abbasi Yadkori},\n      year={2025},\n      eprint={2506.21734},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.21734}, \n}\n```\n\nThis code is based on the Hierarchical Reasoning Model 
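[code](https:\u002F\u002Fgithub.com\u002Fsapientinc\u002FHRM) and the Hierarchical Reasoning Model Analysis [code](https:\u002F\u002Fgithub.com\u002Farcprize\u002Fhierarchical-reasoning-model-analysis).\n","This project implements a recursive reasoning model called the Tiny Recursive Model (TRM), which uses a small neural network with only 7 million parameters and achieves scores of 45% and 8% on ARC-AGI-1 and ARC-AGI-2, respectively. Its core idea is to progressively improve predictions by recursively updating a latent variable z and the answer y; the whole process does not rely on complex hierarchical structures or fixed-point theorems, embodying a “less is more” design philosophy. It suits scenarios that require solving complex reasoning tasks with efficient use of computational resources, such as problem solving in AI competitions, and is especially suitable for researchers or developers who want to avoid high training costs while still pursuing high-performance solutions.",2,"2026-06-11 03:40:04","high_star"]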