[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72404":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},72404,"circuit-tracer","decoderesearch\u002Fcircuit-tracer","decoderesearch",null,"Python",2817,328,21,17,0,5,20,56,15,82.15,"MIT License",false,"main",[],"2026-06-12 04:01:05","# circuit-tracer\n\nThis library implements tools for finding circuits using features from (cross-layer) MLP transcoders, as originally introduced by [Ameisen et al. (2025)](https:\u002F\u002Ftransformer-circuits.pub\u002F2025\u002Fattribution-graphs\u002Fmethods.html) and [Lindsey et al. (2025)](https:\u002F\u002Ftransformer-circuits.pub\u002F2025\u002Fattribution-graphs\u002Fbiology.html).\n\nOur library performs three main tasks. \n1. Given a model with pre-trained transcoders, it finds the circuit \u002F attribution graph; i.e., it computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit.\n2. Given an attribution graph, it visualizes this graph and allows you to annotate these features.\n3. Enables interventions on a model's transcoder features using the insights gained from the attribution graph; i.e. 
you can set features to arbitrary values, and observe how model output changes.\n\n## Getting Started\nOne quick way to start is to try our [tutorial notebook](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fcircuit_tracing_tutorial.ipynb)! \n\nYou can also find circuits and visualize them in one of three ways:\n1. Use `circuit-tracer` on [Neuronpedia](https:\u002F\u002Fwww.neuronpedia.org\u002Fgemma-2-2b\u002Fgraph?slug=gemma-fact-dallas-austin&pinnedIds=27_22605_10%2C20_15589_10%2CE_26865_9%2C21_5943_10%2C23_12237_10%2C20_15589_9%2C16_25_9%2C14_2268_9%2C18_8959_10%2C4_13154_9%2C7_6861_9%2C19_1445_10%2CE_2329_7%2CE_6037_4%2C0_13727_7%2C6_4012_7%2C17_7178_10%2C15_4494_4%2C6_4662_4%2C4_7671_4%2C3_13984_4%2C1_1000_4%2C19_7477_9%2C18_6101_10%2C16_4298_10%2C7_691_10&supernodes=%5B%5B%22state%22%2C%226_4012_7%22%2C%220_13727_7%22%5D%2C%5B%22preposition+followed+by+place+name%22%2C%2219_1445_10%22%2C%2218_6101_10%22%5D%2C%5B%22Texas%22%2C%2220_15589_10%22%2C%2220_15589_9%22%2C%2219_7477_9%22%2C%2216_25_9%22%2C%224_13154_9%22%2C%2214_2268_9%22%2C%227_6861_9%22%5D%2C%5B%22capital+%2F+capital+cities%22%2C%2215_4494_4%22%2C%226_4662_4%22%2C%224_7671_4%22%2C%223_13984_4%22%2C%221_1000_4%22%2C%2221_5943_10%22%2C%2217_7178_10%22%2C%227_691_10%22%2C%2216_4298_10%22%5D%5D&pruningThreshold=0.6&clickedId=21_5943_10&densityThreshold=0.99) - no installation required! Just click on `+ New Graph` to create your own, or use the drop-down menu to select an existing graph.\n2. Run `circuit-tracer` via a Python script or Jupyter notebook. Start with our [tutorial notebook](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fcircuit_tracing_tutorial.ipynb). This will work on Colab with the GPU resources provided for free by default - just click on the Colab badge! Check out the **Demos** section below for more tutorials. 
You can also run these demo notebooks locally, with your own compute.\n3. Run `circuit-tracer` via the command-line interface. This can only be done with your own compute. For more on how to do that, see **Command-Line Interface**. \n\nWorking with Gemma-2 (2B) is possible with relatively limited GPU resources; Colab GPUs have 15GB of RAM. More GPU RAM will allow you to do less offloading, and to use a larger batch size. \n\nCurrently, you can intervene on models with respect to the transcoder features you discover in your graphs either by using `circuit-tracer` in a script or notebook, or on Neuronpedia for Gemma-2 (2B). To perform interventions on Neuronpedia, ensure at least one node is pinned, then click \"Steer\" in the subgraph.\n\n### Installation\nTo install this library, clone it and run the command `pip install .` in its directory.\n\n### Demos\nWe include some demos showing how to use our library in the `demos` folder. The main demo is [`demos\u002Fcircuit_tracing_tutorial.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fcircuit_tracing_tutorial.ipynb), which replicates two of the findings from [this paper](https:\u002F\u002Ftransformer-circuits.pub\u002F2025\u002Fattribution-graphs\u002Fbiology.html) using Gemma 2 (2B). All demos except for the Llama demo can be run on Colab.\n\nWe also make three simple demos of attribution and intervention available, for those who want to learn more about how to use the library:\n- [`demos\u002Fattribute_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fattribute_demo.ipynb): Demonstrates how to find circuits and visualize them. 
\n- [`demos\u002Fattribution_targets_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fattribution_targets_demo.ipynb): Demonstrates how to find circuits by specifying attribution targets, i.e. specific logits (or related quantities) that you wish to attribute from. \n- [`demos\u002Fintervention_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fintervention_demo.ipynb): Demonstrates how to perform interventions on models. \n\nWe finally provide demos that dig deeper into specific, pre-computed and pre-annotated attribution graphs, performing interventions to demonstrate the correctness of the annotated graph:\n- [`demos\u002Fgemma_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fgemma_demo.ipynb): Explores graphs from Gemma 2 (2B).\n- [`demos\u002Fgemma_it_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fgemma_it_demo.ipynb): Explores graphs from instruction-tuned Gemma 2 (2B), using transcoders from the base model.\n- [`demos\u002Fllama_demo.ipynb`](https:\u002F\u002Fgithub.com\u002Fsafety-research\u002Fcircuit-tracer\u002Fblob\u002Fmain\u002Fdemos\u002Fllama_demo.ipynb): Explores graphs from Llama 3.2 (1B). Not supported on Colab.\n\nWe also provide a number of annotated attribution graphs for both models, which can be found at the top of their two demo notebooks.\n\n## Usage\n### Available Transcoders\n\n**On HuggingFace**\n\nThe following transcoders are available for use with `circuit-tracer`; this means that the transcoder weights and features are both available (so features will load properly when you run the visualization server). You can use the HuggingFace repo name (e.g. 
`mntss\u002Fgemma-scope-transcoders`) as the `transcoders` argument of `ReplacementModel.from_pretrained`, or as the argument of `--transcoder_set` in the CLI. \n- Gemma-2 (2B): [PLTs](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Fgemma-scope-transcoders) (originally from [GemmaScope](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-scope)) and CLTs with 2 feature counts: [426K](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Fclt-gemma-2-2b-426k) and [2.5M](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Fclt-gemma-2-2b-2.5M)\n- Llama-3.2 (1B): [PLTs](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Ftranscoder-Llama-3.2-1B) and [CLTs](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Fclt-llama-3.2-1b-524k)\n- Qwen-3 PLTs: for Qwen-3 [0.6B](https:\u002F\u002Fhuggingface.co\u002Fmwhanna\u002Fqwen3-0.6b-transcoders-lowl0), [1.7B](https:\u002F\u002Fhuggingface.co\u002Fmwhanna\u002Fqwen3-1.7b-transcoders-lowl0), [4B](https:\u002F\u002Fhuggingface.co\u002Fmwhanna\u002Fqwen3-4b-transcoders), [8B](https:\u002F\u002Fhuggingface.co\u002Fmwhanna\u002Fqwen3-8b-transcoders), and [14B](https:\u002F\u002Fhuggingface.co\u002Fmwhanna\u002Fqwen3-14b-transcoders-lowl0)\n- [GPT-OSS (20B) CLT](https:\u002F\u002Fhuggingface.co\u002Fmntss\u002Fclt-131k)\n- Gemma-3 PLTs (originally from [GemmaScope-2](https:\u002F\u002Fhuggingface.co\u002Fgoogle\u002Fgemma-scope-2)) can be found [here for models of size 270M, 1B, 4B, 12B, and 27B, PT and IT](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002Fmwhanna\u002Fgemma-scope-2-transcoders-circuit-tracer). These require using the `nnsight` backend.\n- Llama 3.1 (8B) Instruct: [TopK PLTs](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fcrv-8b-instruct-transcoders)\n\n**Locally Saved Transcoders**\n\nLocally saved transcoders can be loaded by using `ReplacementModel.from_pretrained` and including the full root path (not relative path) as the `transcoders` argument (e.g. `\u002Fpath\u002Fto\u002Flocal_transcoders\u002F`). 
To enable feature visualizations in this case, you must direct the visualization server to the local feature data by setting the optional `features_dir` argument in `serve` to the same directory; for example: \n\n`serve(data_dir=graph_path, port=port, features_dir='\u002Fpath\u002Fto\u002Flocal_transcoders\u002F')`\n\n### Choosing a Backend\nBy default, `circuit-tracer` creates a `ReplacementModel` that inherits from the `TransformerLens` `HookedTransformer` class. However, `TransformerLens` does not support all HuggingFace models; it only supports those implemented in `TransformerLens`. \n\nCreating a `ReplacementModel` with `backend='nnsight'` will create an `nnsight`-backed `ReplacementModel` that inherits from its `LanguageModel` class; this supports most HuggingFace models. That is, you can create an `nnsight` `ReplacementModel` using `ReplacementModel.from_pretrained(model_name, backend='nnsight')`. Note, however, that the `nnsight` backend is still experimental: it is slower and less memory-efficient, and may not provide all of the functionality of the `TransformerLens` version.\n\n### Caching\nIn order to use the `lazy_decoder` and `lazy_encoder` options on transcoders, they must be stored in `circuit-tracer`-compatible format. While many transcoders have been uploaded in that format to HuggingFace, this requires large amounts of storage. `circuit-tracer` now supports instead creating a local cache of models, by calling e.g.\n\n```python\nfrom circuit_tracer.utils.caching import save_transcoders_to_cache\n\nhf_ref = \"mwhanna\u002Fgemma-scope-2-27b-pt\u002Ftranscoder_all\u002Fwidth_262k_l0_small\"\ncache_dir = '~\u002F.cache\u002F'\nsave_transcoders_to_cache(hf_ref, cache_dir=cache_dir)\n```\n\nYou can also empty the cache using `circuit_tracer.utils.caching.empty_cache`.\n\n## Command-Line Interface\n\nThe unified CLI performs the complete 3-step process for finding and visualizing circuits:\n\n### 3-Step Process\n1. 
**Attribution**: Runs the attribution algorithm to find the circuit\u002Fattribution graph, computing direct effects between transcoder features, error nodes, tokens, and output logits.\n2. **Graph File Creation**: Prunes the attribution graph to remove low-effect nodes and edges, then converts it to JSON format suitable for visualization.\n3. **Local Server**: Starts a local web server to visualize and interact with the graph in your browser.\n\n### Basic Usage\nTo find a circuit, create the graph files, and start up a local server, use the command:\n\n```\ncircuit-tracer attribute --prompt [prompt] --transcoder_set [transcoder_set] --slug [slug] --graph_file_dir [graph_file_dir] --server\n```\n\nThe command prints the address where the server is running (something like `localhost:[port]`). If you run this command on a remote machine, make sure to enable port forwarding, so you can see the graphs locally!\n\n### Mandatory Arguments\n**Attribution**\n- `--prompt` (`-p`): The input prompt to analyze\n- `--transcoder_set` (`-t`): The set of transcoders to use for attribution. Options:\n  - HuggingFace repository ID (e.g., `mntss\u002Fgemma-scope-transcoders`, `username\u002Frepo-name@revision`)\n  - Convenience shortcuts: `gemma` (GemmaScope transcoders) or `llama` (ReLU transcoders)\n  - Path to locally saved transcoders: `\u002Fpath\u002Fto\u002Flocal_transcoders\u002F`\n\n**Graph File Creation**\n\nThese are required if you want to run a local web server for visualization:\n- `--slug`: A name\u002Fidentifier for your analysis run\n- `--graph_file_dir`: Directory path where JSON graph files will be saved\n\nYou can also save the raw attribution graph (to be loaded and used in Python later):\n- `--graph_output_path` (`-o`): Path to save the raw attribution graph (`.pt` file)\n\nYou must set `--slug` and `--graph_file_dir`, or `--graph_output_path`, or both! 
Otherwise the CLI will output nothing.\n\n**Local Server**\n- `--server`: Start a local web server for graph visualization\n\n### Optional Arguments\n\n**Attribution Parameters:**\n- `--model` (`-m`): Model architecture (auto-inferred for `gemma` and `llama` presets)\n- `--max_n_logits` (default: 10): Maximum number of logit nodes to attribute from\n- `--desired_logit_prob` (default: 0.95): Cumulative probability threshold for top logits\n- `--batch_size` (default: 256): Batch size for backward passes\n- `--max_feature_nodes`: Maximum number of feature nodes (defaults to 7500)\n- `--dtype`: Datatype in which to load the model \u002F transcoders (allowed: `float32\u002Ffp32`, `float16\u002Ffp16`, `bfloat16\u002Fbf16`)\n- `--offload`: Memory optimization option (`cpu`, `disk`, or `None`)\n- `--verbose`: Display detailed progress information\n\n**Graph Pruning Parameters:**\n- `--node_threshold` (default: 0.8): Keeps minimum nodes with cumulative influence ≥ threshold\n- `--edge_threshold` (default: 0.98): Keeps minimum edges with cumulative influence ≥ threshold\n\n**Server Parameters:**\n- `--port` (default: 8041): Port for the local server\n- `--features_dir` (default: None): Path to the directory containing feature files for local server, if using local transcoders\n\n### Examples\n\n**Complete workflow with visualization:**\n```\ncircuit-tracer attribute \\\n  --prompt \"The International Advanced Security Group (IAS\" \\\n  --transcoder_set gemma \\\n  --slug gemma-demo \\\n  --graph_file_dir .\u002Fgraph_files \\\n  --server\n```\n\n**Attribution only (save raw graph):**\n```\ncircuit-tracer attribute \\\n  --prompt \"The capital of France is\" \\\n  --transcoder_set llama \\\n  --graph_output_path france_capital.pt\n```\n\n### Graph Annotation\nWhen using the `--server` option, your browser will open to a local visualization interface. 
The interface is the same as in [the original papers](https:\u002F\u002Ftransformer-circuits.pub\u002F2025\u002Fattribution-graphs\u002Fmethods.html) (frontend available [here](https:\u002F\u002Fgithub.com\u002Fanthropics\u002Fattribution-graphs-frontend)).\n- **Select a node**: Click on a node.\n- **Pin \u002F unpin a node to subgraph pane**: Ctrl+click\u002FCommand+click the node.\n- **Annotate a node**: Click on the \"Edit\" button on the right side of the window while a node is selected to edit its annotation.\n- **Group nodes**: Hold G and click on nodes to group them together into a supernode. Hold G and click on the x next to a supernode to ungroup its nodes.\n- **Annotate supernode \u002F node group**: Click on the label below the supernode to edit the supernode annotation.\n\n## Cite\nYou can cite this library as follows:\n```\n@misc{circuit-tracer,\n  author = {Hanna, Michael and Piotrowski, Mateusz and Lindsey, Jack and Ameisen, Emmanuel},\n  title = {circuit-tracer},\n  howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fdecoderesearch\u002Fcircuit-tracer}},\n  note = {The first two authors contributed equally and are listed alphabetically.},\n  year = {2025}\n}\n```\nor cite the paper [here](https:\u002F\u002Faclanthology.org\u002F2025.blackboxnlp-1.14\u002F).\n","circuit-tracer is a Python library for finding circuits using MLP transcoders. Building on the work of Ameisen et al. and Lindsey et al., it provides three main functions: first, given a pre-trained model, it computes the direct effect of each non-zero transcoder feature, error node, and input token on every other non-zero transcoder feature and output logit; second, it visualizes these effects and lets users annotate them; third, it uses insights from the attribution graph to run intervention experiments on the model's transcoder features and observe how the model's output changes. It suits research settings that require understanding or debugging the internal workings of neural networks, such as model interpretability research in natural language processing. It is open-sourced under the MIT License and has an active community.",2,"2026-06-11 03:41:55","high_star"]