[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72294":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":22,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":28,"lastSyncTime":29,"discoverSource":30},72294,"bdh","pathwaycom\u002Fbdh","pathwaycom","BDH (Dragon Hatchling) – Architecture and Code","",null,"Python",3391,221,36,5,0,1,9,3,62.44,"MIT License",false,"main",[],"2026-06-12 04:01:04","# BDH (Dragon Hatchling)\n\n## **Bridging the Gap Between Transformers and the Brain**\n\n**BDH (Dragon Hatchling)** is a biologically inspired large language model architecture that connects principles of deep learning with the foundations of neuroscience. Developed by researchers at [Pathway](https:\u002F\u002Fpathway.com), BDH provides a theoretical and practical framework for understanding the emergence of reasoning and generalization in artificial systems.\n\nThis repository contains the official implementation from the paper:\n> *A. Kosowski, P. Uznański, J. Chorowski, Z. Stamirowska, M. Bartoszkiewicz.*\n> [_The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain_](https:\u002F\u002Fdoi.org\u002F10.48550\u002FarXiv.2509.26507), arXiv (2025).\n\n\n## Overview\n\nBDH represents a **scale-free, locally interacting network of neurons** capable of intrinsic reasoning dynamics. 
BDH scales like a Transformer on performance benchmarks—yet retains full interpretability and theoretical grounding in the fine-grained dynamics of neuron interactions.\n\n**Key properties:**\n\n- **Scale-free network topology** mimicking biological connectivity\n- **Locally interacting neuron particles** with excitatory\u002Finhibitory dynamics\n- **Hebbian working memory** based on synaptic plasticity, displaying monosemanticity\n- **GPU-friendly state-space formulation** for efficient implementation\n- **Interpretable activations** that are sparse and positive\n\nBDH formalizes a bridge between **neural computation and machine-based language understanding**. It shows how **macro reasoning behavior** in large AI models emerges from **micro-level neuron dynamics**, guided by principles of graph theory and local computation.\n\nEmpirically, BDH matches **GPT-2–scale Transformers** across language and translation tasks at equivalent parameter scales (10M–1B).\n\n\n***\n\n## Architecture\n\n\u003Cimg src=\"figs\u002Farchitecture.png\" width=\"600\"\u002F>\n\n***\n\n## Relation to Transformers\n\n\u003Cimg src=\"figs\u002Fvocab.png\" width=\"600\"\u002F>\n\nBDH and the Transformer share attention-inspired computation; however, BDH’s graph-based architecture makes its attention **emerge naturally from neuron-level interactions**, reflecting attention as seen in biological systems.\n\n***\n\n## Scaling Laws\n\n\u003Cimg src=\"figs\u002Fbdh_scaling.png\" width=\"600\"\u002F>\n\nBDH follows **Transformer-like scaling laws**, maintaining parameter efficiency while achieving interpretability at any scale.\n\n***\n\n## Latest research update: Sudoku Benchmark\n\nNote: The Sudoku Extreme result refers to Pathway’s internal BDH implementation, not to the current open-source repository. 
This repository contains the implementation of the baseline variant as described in our [public paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.26507) and does not reproduce the 97.4% benchmark result out of the box. See the dedicated Extreme Sudoku research blog post for additional benchmark context and the reported results.\n\nOn Sudoku Extreme, BDH reaches 97.4% accuracy across roughly 250,000 difficult puzzles, without chain-of-thought, solution backtracking, or external tool use, while leading LLMs struggle to perform on the benchmark at all.\n\nLanguage is not enough for intelligence. Transformers process information token by token with limited internal state, which makes search-heavy, non-linguistic reasoning tasks like Sudoku awkward. BDH uses a larger latent reasoning space with intrinsic memory that supports learning and adaptation during use.\n\nWe believe that the future of AI will belong to systems that can reason natively across domains, that can hold multiple possibilities in a rich latent space, and that can converge on solutions without needing to verbalize every step. BDH is our answer to that challenge. It is designed to be a universal reasoning system that can speak our language without being trapped inside it. And yes, it solves Sudoku.\n\nRead more: [Post-transformers: Sudoku Bench](https:\u002F\u002Fpathway.com\u002Fresearch\u002Fbeyond-transformers-sudoku-bench)\n\n### Performance Comparison\n\n| Model | Sudoku Extreme Accuracy | Relative Cost |\n|------|------------------------|--------------|\n| Pathway BDH | 97.4% | 10× lower, No chain-of-thought |\n| Leading LLMs (O3-mini, DeepSeek R1, Claude 3.7 8K) | ~0% | High (chain-of-thought) |\n\n*Table 1: Performance comparison on extreme Sudoku benchmarks (~250,000 difficult puzzles).*  \n*Source: Pathway internal data and https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.21734 for the Leading LLMs’ accuracy score. 
Pathway’s approach reflects top-1 accuracy and does not rely on chain-of-thought nor solution backtracking.*\n\n\n## Installation and Training\n\n```bash\n# install dependencies\npip install -r requirements.txt\n\n# train BDH on a toy dataset\npython train.py\n```\n\n\u003C!--For visualization and interpretability analysis, explore the example notebooks in `notebooks\u002F`.-->\n\n\n\n## Learn and Discuss\n\n- Watch the *SuperDataScience podcast* [▶️ *Dragon Hatchling: The Missing Link Between Transformers and the Brain*](https:\u002F\u002Fwww.youtube.com\u002Fwatch?v=mfV44-mtg7c) (72 min.) featuring Adrian Kosowski in conversation with Jon Krohn, unpacking BDH’s neuron-level architecture and sparse reasoning dynamics.\n\n- Read about BDH in\n[*Forbes*](https:\u002F\u002Fwww.forbes.com\u002Fsites\u002Fvictordey\u002F2025\u002F10\u002F08\u002Fcan-ai-learn-and-evolve-like-a-brain-pathways-bold-research-thinks-so\u002F),\n[*Semafor*](https:\u002F\u002Fwww.semafor.com\u002Farticle\u002F10\u002F01\u002F2025\u002Fnew-ai-research-claims-to-be-getting-closer-to-modeling-human-brain),\n[*The Turing Post*](https:\u002F\u002Fwww.turingpost.com\u002Fp\u002Ffod-121-300-million-to-start-a-big-promise-for-science#the-freshest-research-papers-catego),\n[*Quantum Zeitgeist*](https:\u002F\u002Fquantumzeitgeist.com\u002Fpalo-alto-ai-firm-pathway-unveils-post-transformer-architecture-for-autonomous-ai\u002F),\n[*Golem*](https:\u002F\u002Fwww.golem.de\u002Fnews\u002Fneue-ki-architektur-was-ist-baby-dragon-hatchling-2510-201047-2.html),\nand elsewhere in the media.\n\n- Discuss and share the BDH paper on:\n[*Hugging Face Papers*](https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2509.26507), \n[*Alphaxiv*](https:\u002F\u002Falphaxiv.org\u002Fabs\u002F2509.26507),\nand [*EmergentMind*](https:\u002F\u002Femergentmind.com\u002Fpapers\u002F2509.26507).\n\n## Community Projects\n\n- [adamskrodzki\u002Fbdh](https:\u002F\u002Fgithub.com\u002Fadamskrodzki\u002Fbdh): dynamic vocabulary, stateful 
attention\n- [mosure\u002Fburn_dragon_hatchling](https:\u002F\u002Fgithub.com\u002Fmosure\u002Fburn_dragon_hatchling): Burn port\n- [severian42\u002Fbdh](https:\u002F\u002Fgithub.com\u002Fseverian42\u002Fbdh): MLX port\n- [Git-Faisal\u002Fbdh](https:\u002F\u002Fgithub.com\u002FGit-Faisal\u002Fbdh)\n- [GrahLnn\u002Fbdh](https:\u002F\u002Fgithub.com\u002FGrahLnn\u002Fbdh)\n\n## Acknowledgements\nWe thank Andrej Karpathy for the [nanoGPT](https:\u002F\u002Fgithub.com\u002Fkarpathy\u002FnanoGPT\u002F) code and the tiny Shakespeare dataset used in this demonstration.\n\nBDH research stands at the intersection of **AI architecture**, **biological learning models**, and **theoretical computer science**, an effort to map the *equations of reasoning* between artificial and biological intelligence.\n","Baby Dragon Hatchling (BDH) is a biologically inspired large language model architecture that aims to connect the principles of deep learning with the foundations of neuroscience. Its core features include a scale-free network topology, locally interacting neuron particles, Hebbian working memory based on synaptic plasticity, and a GPU-friendly state-space formulation, which together let BDH remain highly interpretable while maintaining efficient performance. It is particularly suited to research on how reasoning and generalization emerge in artificial systems, and to applications that require a deep understanding of a model's internal workings.",2,"2026-06-11 03:41:14","high_star"]