[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84141":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":9,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},84141,"model_brain_surgery","goldenplums2003\u002Fmodel_brain_surgery","goldenplums2003","Perform vector ablation to local LLM models",null,"Python",58,12,55,0,1,3,5,47.14,"MIT License",false,"main",true,[],"2026-06-12 04:01:42","# Model Prefrontal Lobotomy\n\n## English\n\nModel Prefrontal Lobotomy is a lightweight local workflow for activation capture and weight ablation on Hugging Face Transformers models. It is designed as a small research sandbox before moving the same idea to larger checkpoints.\n\nThe default project layout uses `Qwen3-1.7B` as the source model and writes the edited checkpoint to `surgery-output\u002F`.\n\n### Layout\n\n```text\nmodel_brain_surgery\u002F\n  brain_surgery.py        # Capture activations, compute a direction, edit weights, save the model\n  chat_qwen.py            # Minimal chat REPL; loads surgery-output\u002F by default\n  download_qwen.py        # Download Qwen3-1.7B into the project directory\n  harmful_prompts.py      # Target prompt set\n  harmless_prompts.py     # Harmless control prompt set\n  requirements.txt        # Python dependencies\n  Qwen3-1.7B\u002F             # Original Transformers checkpoint\n  surgery-output\u002F         # Edited checkpoint\n```\n\n### Setup\n\nPython 3.10+ is recommended. 
On Apple Silicon Macs, Python 3.11 or 3.12 is preferred.\n\n```bash\ncd model_brain_surgery\n\nconda create -n model-surgery python=3.12 -y\nconda activate model-surgery\n\npip install -U pip\npip install -r requirements.txt\n```\n\nCheck whether MPS is available:\n\n```bash\npython - \u003C\u003C'PY'\nimport torch\nprint(\"torch:\", torch.__version__)\nprint(\"mps built:\", torch.backends.mps.is_built())\nprint(\"mps available:\", torch.backends.mps.is_available())\nPY\n```\n\nIf `mps available` is `True`, the scripts prefer the Apple GPU. Otherwise they fall back to CUDA, then CPU.\n\n### Model\n\nModel checkpoints are too large for ordinary GitHub commits. Keep the code and prompt files in Git, but download the model locally before running the experiment.\n\nThe default model path is:\n\n```text\n.\u002FQwen3-1.7B\n```\n\nDownload the model into the current project directory:\n\n```bash\npython download_qwen.py\n```\n\nIf `Qwen3-1.7B\u002F` already exists, you can skip the download and run the scripts directly.\n\nUse a Transformers \u002F safetensors checkpoint for this workflow, not GGUF.\n\n### Prompt Files\n\n`brain_surgery.py` always reads these two files from the project directory:\n\n```text\nharmful_prompts.py    -> must define harmful_prompts: list[str]\nharmless_prompts.py   -> must define harmless_prompts: list[str]\n```\n\nAll prompts in both files are used. 
Edit these files directly to change the experiment target or control set.\n\n### Run Surgery\n\nDefault run:\n\n```bash\npython brain_surgery.py\n```\n\nDefaults:\n\n```text\nmodel:           .\u002FQwen3-1.7B\nlayers:          8-18\nablation scale:  1.0\noutput:          .\u002Fsurgery-output\ndevice:          auto -> mps, cuda, then cpu\ndtype:           float16\n```\n\nCommon examples:\n\n```bash\n# Smoke test only; do not save the edited checkpoint.\npython brain_surgery.py --layers 2,3 --ablation-scale 0.1 --skip-save\n\n# A moderate middle-layer edit.\npython brain_surgery.py --layers 8-12 --ablation-scale 0.3 --output .\u002Fsurgery-l8-12-s03\n\n# Use the default layer range and save to surgery-output\u002F.\npython brain_surgery.py\n```\n\n`--max-new-tokens` only controls the length of the pre\u002Fpost test generation. It does not affect activation capture or weight editing.\n\n### Chat\n\n`chat_qwen.py` loads this checkpoint by default:\n\n```text\n.\u002Fsurgery-output\n```\n\nRun:\n\n```bash\npython chat_qwen.py\n```\n\nCommands:\n\n```text\n\u002Fclear   clear conversation history\n\u002Fexit    quit\n```\n\nTo chat with the original model instead, set `MODEL_DIR` in `chat_qwen.py` to:\n\n```python\nMODEL_DIR = SCRIPT_DIR \u002F \"Qwen3-1.7B\"\n```\n\n### Method\n\nThe script collects last-token hidden states from selected transformer layers for the target prompts and harmless control prompts. It then computes the difference between the two mean activations and treats that vector as the direction to reduce.\n\nFor each selected layer, the script edits `mlp.down_proj.weight` so the layer emits less of that direction back into the residual stream. 
This is a direct weight edit, not training: there is no optimizer, no gradient step, and no dataset loop beyond the activation collection pass.\n\n### Notes\n\n- This project is intended for local research workflow validation.\n- Make sure prompts, outputs, and downstream usage comply with safety, ethical, and legal requirements.\n- A large `--ablation-scale` or a wide layer range can noticeably damage general model quality.\n- Start with `--skip-save` and a small layer range before saving an edited checkpoint.\n- GGUF \u002F LM Studio models are not suitable for this direct weight-editing workflow; use Transformers \u002F safetensors checkpoints.\n\n---\n\n## 模型前额叶切除手术\n\n模型的前额叶切除手术是一个轻量的本地实验项目，用于在 Hugging Face Transformers 格式的模型上做激活捕获和权重消融。它适合先在小模型上验证流程，再把同样思路迁移到更大的 checkpoint。\n\n当前项目默认使用 `Qwen3-1.7B` 作为原始模型，并把手术后的模型保存到 `surgery-output\u002F`。\n\n### 目录结构\n\n```text\nmodel_brain_surgery\u002F\n  brain_surgery.py        # 抓激活、计算方向、修改权重、保存模型\n  chat_qwen.py            # 最小聊天 REPL；默认加载 surgery-output\u002F\n  download_qwen.py        # 将 Qwen3-1.7B 下载到当前项目目录\n  harmful_prompts.py      # 目标 prompt 集合\n  harmless_prompts.py     # 无害对照 prompt 集合\n  requirements.txt        # Python 依赖\n  Qwen3-1.7B\u002F             # 原始 Transformers 模型\n  surgery-output\u002F         # 手术后的模型\n```\n\n### 环境安装\n\n推荐使用 Python 3.10+。Apple Silicon Mac 建议使用 Python 3.11 或 3.12。\n\n```bash\ncd model_brain_surgery\n\nconda create -n model-surgery python=3.12 -y\nconda activate model-surgery\n\npip install -U pip\npip install -r requirements.txt\n```\n\n检查 MPS 是否可用：\n\n```bash\npython - \u003C\u003C'PY'\nimport torch\nprint(\"torch:\", torch.__version__)\nprint(\"mps built:\", torch.backends.mps.is_built())\nprint(\"mps available:\", torch.backends.mps.is_available())\nPY\n```\n\n如果 `mps available` 是 `True`，脚本会优先使用 Apple GPU。否则会依次回退到 CUDA 和 CPU。\n\n### 模型准备\n\n模型权重文件过大，不适合直接提交到普通 GitHub 仓库。建议 GitHub 中只保存代码和 prompt 
文件，运行实验前再在本地下载模型。\n\n默认模型路径是：\n\n```text\n.\u002FQwen3-1.7B\n```\n\n将模型下载到当前项目目录：\n\n```bash\npython download_qwen.py\n```\n\n如果 `Qwen3-1.7B\u002F` 已经存在，可以直接运行后续脚本。\n\n这里需要使用 Transformers \u002F safetensors 格式模型，不是 GGUF。\n\n### Prompt 文件\n\n`brain_surgery.py` 会固定读取项目目录下的两个文件：\n\n```text\nharmful_prompts.py    -> 必须定义 harmful_prompts: list[str]\nharmless_prompts.py   -> 必须定义 harmless_prompts: list[str]\n```\n\n运行时会使用两个文件中的全部 prompt。若要改变实验目标或对照集合，直接编辑这两个文件。\n\n### 运行手术\n\n默认运行：\n\n```bash\npython brain_surgery.py\n```\n\n默认参数：\n\n```text\nmodel:           .\u002FQwen3-1.7B\nlayers:          8-18\nablation scale:  1.0\noutput:          .\u002Fsurgery-output\ndevice:          auto -> mps, cuda, then cpu\ndtype:           float16\n```\n\n常用示例：\n\n```bash\n# 只做流程验证，不保存模型\npython brain_surgery.py --layers 2,3 --ablation-scale 0.1 --skip-save\n\n# 中层轻量手术\npython brain_surgery.py --layers 8-12 --ablation-scale 0.3 --output .\u002Fsurgery-l8-12-s03\n\n# 使用默认层范围并保存到 surgery-output\u002F\npython brain_surgery.py\n```\n\n`--max-new-tokens` 只控制术前\u002F术后测试回答的最长生成长度，不影响激活捕获或权重修改。\n\n### 聊天测试\n\n`chat_qwen.py` 默认加载：\n\n```text\n.\u002Fsurgery-output\n```\n\n运行：\n\n```bash\npython chat_qwen.py\n```\n\n命令：\n\n```text\n\u002Fclear   清空上下文\n\u002Fexit    退出\n```\n\n如果想和原始模型聊天，把 `chat_qwen.py` 中的 `MODEL_DIR` 改为：\n\n```python\nMODEL_DIR = SCRIPT_DIR \u002F \"Qwen3-1.7B\"\n```\n\n### 方法简述\n\n脚本会分别在目标 prompts 和无害对照 prompts 上收集指定 Transformer 层的最后 token hidden state。随后计算两组平均激活之差，并把这个差值向量视为需要削弱的方向。\n\n对于每个被选中的层，脚本会修改 `mlp.down_proj.weight`，让该层更少地把这个方向写回 residual stream。这是一次直接权重编辑，不是训练：没有优化器，没有梯度更新，也没有训练循环，只有激活收集和一次性矩阵修改。\n\n### 注意事项\n\n- 本项目用于本地科研流程验证。\n- 请确保 prompt 集合、模型输出和后续使用符合安全、伦理和法律要求。\n- 过大的 `--ablation-scale` 或过宽的层范围可能明显损伤模型通用能力。\n- 建议先用 `--skip-save` 和少量层做 smoke test，再正式保存编辑后的模型。\n- GGUF \u002F LM Studio 模型不适合直接运行本项目里的权重手术；请使用 Transformers \u002F safetensors 权重。\n",2,"2026-06-11 04:12:22","CREATED_QUERY"]