[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72194":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},72194,"transformer-debugger","openai\u002Ftransformer-debugger","openai",null,"Python",4116,239,26,9,0,2,3,6,29.14,"MIT License",false,"main",true,[],"2026-06-12 02:02:59","# Transformer Debugger\n\nTransformer Debugger (TDB) is a tool developed by OpenAI's [Superalignment\nteam](https:\u002F\u002Fopenai.com\u002Fblog\u002Fintroducing-superalignment) with the goal of\nsupporting investigations into specific behaviors of small language models. The tool combines\n[automated interpretability](https:\u002F\u002Fopenai.com\u002Fresearch\u002Flanguage-models-can-explain-neurons-in-language-models)\ntechniques with [sparse autoencoders](https:\u002F\u002Ftransformer-circuits.pub\u002F2023\u002Fmonosemantic-features).\n\nTDB enables rapid exploration before needing to write code, with the ability to intervene in the\nforward pass and see how it affects a particular behavior. 
It can be used to answer questions like,\n\"Why does the model output token A instead of token B for this prompt?\" or \"Why does attention head\nH attend to token T for this prompt?\" It does so by identifying specific components (neurons,\nattention heads, autoencoder latents) that contribute to the behavior, showing automatically\ngenerated explanations of what causes those components to activate most strongly, and tracing\nconnections between components to help discover circuits.\n\nThese videos give an overview of TDB and show how it can be used to investigate [indirect object\nidentification in GPT-2 small](https:\u002F\u002Farxiv.org\u002Fabs\u002F2211.00593):\n\n- [Introduction](https:\u002F\u002Fwww.loom.com\u002Fshare\u002F721244075f12439496db5d53439d2f84?sid=8445200e-c49e-4028-8b8e-3ea8d361dec0)\n- [Neuron viewer pages](https:\u002F\u002Fwww.loom.com\u002Fshare\u002F21b601b8494b40c49b8dc7bfd1dc6829?sid=ee23c00a-9ede-4249-b9d7-c2ba15993556)\n- [Example: Investigating name mover heads, part 1](https:\u002F\u002Fwww.loom.com\u002Fshare\u002F3478057cec484a1b85471585fef10811?sid=b9c3be4b-7117-405a-8d31-0f9e541dcfb6)\n- [Example: Investigating name mover heads, part 2](https:\u002F\u002Fwww.loom.com\u002Fshare\u002F6bd8c6bde84b42a98f9a26a969d4a3ad?sid=4a09ac29-58a2-433e-b55d-762414d9a7fa)\n\n## What's in the release?\n\n- [Neuron viewer](neuron_viewer\u002FREADME.md): A React app that hosts TDB as well as pages with information about individual model components (MLP neurons, attention heads and autoencoder latents for both).\n- [Activation server](neuron_explainer\u002Factivation_server\u002FREADME.md): A backend server that performs inference on a subject model to provide data for TDB. 
It also reads and serves data from public Azure buckets.\n- [Models](neuron_explainer\u002Fmodels\u002FREADME.md): A simple inference library for GPT-2 models and their autoencoders, with hooks to grab activations.\n- [Collated activation datasets](datasets.md): top-activating dataset examples for MLP neurons, attention heads and autoencoder latents.\n\n## Setup\n\nFollow these steps to install the repo. You'll first need Python\u002Fpip, as well as Node\u002Fnpm.\n\nThough optional, we recommend you use a virtual environment or equivalent:\n\n```sh\n# If you're already in a venv, deactivate it.\ndeactivate\n# Create a new venv.\npython -m venv ~\u002F.virtualenvs\u002Ftransformer-debugger\n# Activate the new venv.\nsource ~\u002F.virtualenvs\u002Ftransformer-debugger\u002Fbin\u002Factivate\n```\n\nOnce your environment is set up, run the following commands:\n```sh\ngit clone git@github.com:openai\u002Ftransformer-debugger.git\ncd transformer-debugger\n\n# Install neuron_explainer\npip install -e .\n\n# Set up the pre-commit hooks.\npre-commit install\n\n# Install neuron_viewer.\ncd neuron_viewer\nnpm install\ncd ..\n```\n\nTo run the TDB app, you'll then need to follow the instructions to set up the [activation server backend](neuron_explainer\u002Factivation_server\u002FREADME.md) and [neuron viewer frontend](neuron_viewer\u002FREADME.md).\n\n## Making changes\n\nTo validate changes:\n\n- Run `pytest`\n- Run `mypy --config=mypy.ini .`\n- Run the activation server and neuron viewer, and confirm that basic functionality like TDB and the neuron\n  viewer pages still works\n\n\n## Links\n\n- [Terminology](terminology.md)\n\n## How to cite\n\nPlease cite as:\n\n```\nMossing, et al., “Transformer Debugger”, GitHub, 2024.\n```\n\nBibTeX citation:\n\n```\n@misc{mossing2024tdb,\n  title={Transformer Debugger},\n  author={Mossing, Dan and Bills, Steven and Tillman, Henk and Dupré la Tour, Tom and Cammarata, Nick and Gao, Leo and Achiam, Joshua and Yeh, Catherine and Leike, 
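Jan and Wu, Jeff and Saunders, William},\n  year={2024},\n  publisher={GitHub},\n  howpublished={\\url{https:\u002F\u002Fgithub.com\u002Fopenai\u002Ftransformer-debugger}},\n}\n```\n","Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team to support investigations into specific behaviors of small language models. It combines automated interpretability techniques with sparse autoencoders, letting users explore rapidly without writing code and intervene in the forward pass to observe the effect on a particular behavior. TDB identifies the specific components that influence model behavior (such as neurons, attention heads, and autoencoder latents) and shows what causes those components to activate and how they connect to one another, helping researchers discover potential circuits. It suits research scenarios that require understanding or debugging the internal mechanisms of language models in depth, and is especially useful when exploring why a model chooses one output over another.","2026-06-11 03:40:49","high_star"]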