[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1581":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":8,"rankLanguage":8,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},1581,"In-Place-TTT","ByteDance-Seed\u002FIn-Place-TTT","ByteDance-Seed",null,"Python",231,24,1,5,0,3,6,36,9,53.79,"Apache License 2.0",false,"main",[],"2026-06-12 04:00:10","\u003Cdiv align=\"center\">\n 👋 Hi, everyone!\n  \u003Cbr>\n  We are \u003Cb>ByteDance Seed team.\u003C\u002Fb>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  You can get to know us better through the following channels👇\n  \u003Cbr>\n  \u003Ca href=\"https:\u002F\u002Fseed.bytedance.com\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-%231e37ff?style=for-the-badge&logo=bytedance&logoColor=white\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5793e67c-79bb-4a59-811a-fcc7ed510bd4\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-07C160?style=for-the-badge&logo=wechat&logoColor=white\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwww.xiaohongshu.com\u002Fuser\u002Fprofile\u002F668e7e15000000000303157d?xsec_token=ABl2-aqekpytY6A8TuxjrwnZskU-6BsMRE_ufQQaSAvjc%3D&xsec_source=pc_search\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FXiaohongshu-%23FF2442?style=for-the-badge&logo=xiaohongshu&logoColor=white\">\u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fwww.zhihu.com\u002Forg\u002Fdou-bao-da-mo-xing-tuan-dui\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fzhihu-%230084FF?style=for-the-badge&logo=zhihu&logoColor=white\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n![seed logo](https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fc42e675e-497c-4508-8bb9-093ad4d1f216)\n\n# In-Place Test-Time Training\n\n**Seamlessly Endowing LLMs with Test-Time Training Ability**\n\nGuhao Feng\\*, Shengjie Luo\\*, Kai Hua, Ge Zhang, Wenhao Huang, Di He, Tianle Cai\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fopenreview.net\u002Fforum?id=dTWfCLSoyl\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FICLR%202026-Oral-b31b1b?style=for-the-badge\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.06169\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-PDF-blue?style=for-the-badge&logo=adobeacrobatreader\">\u003C\u002Fa>\n  \u003Ca href=\"#license\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-Apache%202.0-green?style=for-the-badge\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nIn-Place TTT is a drop-in test-time training method for Transformer LLMs. 
This repository provides the training, checkpoint conversion, inference, and evaluation stack built on VeOmni, together with recommended configs for Qwen3-8B and LLaMA-3.1-8B.\n\n## News\n\n[2026\u002F03] The codebase is open-sourced.\n\u003Cbr>\n[2026\u002F02] In-Place TTT is accepted to ICLR 2026 as an Oral presentation.\n\n## Table of Contents\n\n- [In-Place Test-Time Training](#in-place-test-time-training)\n  - [News](#news)\n  - [Table of Contents](#table-of-contents)\n  - [Introduction](#introduction)\n  - [Getting Started](#getting-started)\n    - [Environment Setup](#environment-setup)\n    - [Data Preparation](#data-preparation)\n    - [Recommended Config](#recommended-config)\n    - [Training](#training)\n    - [Checkpoint Conversion](#checkpoint-conversion)\n    - [Evaluation](#evaluation)\n  - [Features](#features)\n  - [License](#license)\n  - [Citation](#citation)\n  - [About ByteDance Seed Team](#about-bytedance-seed-team)\n\n## Introduction\n\nCurrent large language models follow a static \"train then deploy\" paradigm. Once deployed, model weights are frozen and cannot adapt to new information encountered during inference. This limits long-context reasoning, where useful information arrives progressively and the model would benefit from updating itself as it reads.\n\n**In-Place Test-Time Training (In-Place TTT)** addresses this by updating a subset of model parameters, the MLP down-projection fast weights, during inference. Unlike prior TTT approaches that require architectural side modules or external memory, In-Place TTT stays inside the standard Transformer block and remains compatible with off-the-shelf autoregressive LLMs.\n\nThe method is centered around three ideas:\n\n1. **Architectural compatibility.** Fast weights live in the existing MLP down-projection matrix, so no extra attention heads or memory modules are introduced.\n2. 
**LM-aligned objective.** The fast-weight update is aligned with next-token prediction instead of a generic reconstruction target.\n3. **Chunk-wise update.** Long sequences are split into chunks so updates can be computed efficiently and scaled to long contexts.\n\n![In-Place TTT Method Overview](assets\u002Fpipeline.png)\n\nAs used in this repo, the end-to-end workflow is:\n\n1. Provide your own VeOmni-compatible processed dataset and base model assets.\n2. Launch continual pretraining with VeOmni through `train.sh` and `tasks\u002Ftrain_torch.py`.\n3. Export DCP checkpoints into HuggingFace format with `scripts\u002Fmerge_dcp_to_hf.py`.\n4. Run TTT-aware inference and RULER evaluation with `inference_model\u002F`, `eval.sh`, and `eval_config\u002F`.\n\nThe repository includes recommended training configs for Qwen3-8B and LLaMA-3.1-8B, checkpoint conversion utilities, and a full RULER evaluation pipeline via OpenCompass from 4K to 256K context lengths.\n\n## Getting Started\n\n### Environment Setup\n\n**Step 1.** Install PyTorch and FlashAttention:\n\n```bash\npip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\nwget https:\u002F\u002Fgithub.com\u002FDao-AILab\u002Fflash-attention\u002Freleases\u002Fdownload\u002Fv2.8.3\u002Fflash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl\npip3 install flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl\nrm flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl\n```\n\n**Step 2.** Install VeOmni from the validated commit:\n\n```bash\npip3 install \"veomni @ git+https:\u002F\u002Fgithub.com\u002FByteDance-Seed\u002FVeOmni.git@9b91e164bea9e17f17ed490aab5e076c2335ca25\"\n```\n\n**Step 3.** Install the remaining dependencies:\n\n```bash\npip3 install liger-kernel\npip3 install byted-wandb torchdata blobfile datasets diffusers tiktoken timm\npip3 install transformers==4.57.3\npip3 install 
opt_einsum einops\n\npip3 uninstall -y byted-wandb wandb\npip3 install byted-wandb\n```\n\n**Step 4.** Optionally verify the installed VeOmni source:\n\n```bash\npython3 - \u003C\u003C'PY'\nimport json, pathlib, veomni\np = pathlib.Path(veomni.__file__).resolve().parents[1] \u002F \"veomni-0.1.0.dist-info\" \u002F \"direct_url.json\"\nprint(\"veomni file:\", veomni.__file__)\nprint(\"direct_url:\", json.loads(p.read_text()) if p.exists() else \"not found\")\nPY\n```\n\n### Data Preparation\n\nThis repository no longer ships data-processing scripts. Provide your own processed dataset through `data.train_path`.\n\nThe recommended configs assume:\n\n- `data.data_type=plaintext`\n- `data.datasets_type=iterable`\n- `data.text_keys=content_split`\n\nFor dataset argument definitions and supported loading modes, refer to the official VeOmni docs:\n\n- [Arguments API Reference](https:\u002F\u002Fveomni.readthedocs.io\u002Fen\u002Flatest\u002Fusage\u002Farguments.html)\n- [Basic Modules \u002F Dataset & DataLoader](https:\u002F\u002Fveomni.readthedocs.io\u002Fen\u002Flatest\u002Fusage\u002Fbasic_modules.html)\n\nExample:\n\n```bash\nbash train.sh tasks\u002Ftrain_torch.py configs\u002Fpretrain\u002Fqwen3_longct.yaml \\\n  --data.train_path \u002Fpath\u002Fto\u002Fyour_data \\\n  --train.output_dir \u002Fpath\u002Fto\u002Fyour_output_dir\n```\n\n### Recommended Config\n\nBelow is the recommended model config pattern used in the provided Qwen and LLaMA examples.\n\n> **Tip:** Set `ttt_target: input_embed` for from-scratch pretraining, or `ttt_target: hidden_states` for continual training.\n\n```yaml\nmodel:\n  model_path: \u002Fpath\u002Fto\u002Fyour_base_model\n  foundation:\n    ttt_layers: [0, 6, 12, 18, 24, 30, 36]\n    ttt_mode: true\n    ttt_proj: true\n    ttt_lr: 3\n    ttt_chunk: 4096\n\ndata:\n  train_path: \u002Fpath\u002Fto\u002Fyour_data\n  train_size: 20000000000\n  dataloader_type: native\n  datasets_type: iterable\n  data_type: plaintext\n  max_seq_len: 65536\n 
 text_keys: content_split\n  drop_last: true\n\ntrain:\n  output_dir: \u002Fpath\u002Fto\u002Fyour_output_dir\n  data_parallel_mode: fsdp2\n  global_batch_size: 64\n  micro_batch_size: 1\n  optimizer: adamw\n  lr: 5.0e-6\n  lr_warmup_ratio: 0.02\n  lr_decay_style: cosine\n  lr_decay_ratio: 0.90\n  weight_decay: 0.1\n  max_grad_norm: 1.0\n  max_steps: 5000\n  enable_mixed_precision: true\n  enable_gradient_checkpointing: true\n  enable_full_shard: true\n  init_device: meta\n  ckpt_manager: dcp\n  save_steps: 500\n  save_hf_weights: true\n  use_wandb: true\n```\n\nThe corresponding recommended config files are:\n\n- `configs\u002Fpretrain\u002Fqwen3_longct.yaml`\n- `configs\u002Fpretrain\u002Fllama3_longct.yaml`\n\n### Training\n\nQuick smoke run:\n\n```bash\nbash train.sh tasks\u002Ftrain_torch.py configs\u002Fpretrain\u002Fqwen3_longct.yaml \\\n  --train.output_dir \u002Fpath\u002Fto\u002Fyour_output_dir \\\n  --train.max_steps 1 \\\n  --train.use_wandb false\n```\n\nRecommended Qwen config override:\n\n```bash\nbash train.sh tasks\u002Ftrain_torch.py configs\u002Fpretrain\u002Fqwen3_longct.yaml \\\n  --train.wandb_project your_wandb_project \\\n  --train.wandb_name your_run_name \\\n  --train.output_dir \u002Fpath\u002Fto\u002Fyour_output_dir \\\n  --model.foundation '{\"ttt_layers\":[0,6,12,18,24,30,36],\"ttt_mode\":true,\"ttt_proj\":true,\"ttt_lr\":3,\"ttt_chunk\":4096}'\n```\n\nRecommended LLaMA config override:\n\n```bash\nbash train.sh tasks\u002Ftrain_torch.py configs\u002Fpretrain\u002Fllama3_longct.yaml \\\n  --train.wandb_project your_wandb_project \\\n  --train.wandb_name your_run_name \\\n  --train.output_dir \u002Fpath\u002Fto\u002Fyour_output_dir \\\n  --model.foundation '{\"ttt_layers\":[0,6,12,18,24,30,36],\"ttt_mode\":true,\"ttt_proj\":true,\"ttt_lr\":3,\"ttt_chunk\":4096}'\n```\n\n### Checkpoint Conversion\n\nConvert VeOmni DCP checkpoints into HuggingFace format:\n\n```bash\npython scripts\u002Fmerge_dcp_to_hf.py \\\n  --load-dir 
\u002Fpath\u002Fto\u002Fyour_checkpoint_dir\n\npython scripts\u002Fmerge_dcp_to_hf.py \\\n  --load-dir \u002Fpath\u002Fto\u002Fyour_checkpoint_dir \\\n  --save-dir \u002Fpath\u002Fto\u002Fyour_hf_checkpoint_dir \\\n  --model-assets-dir \u002Fpath\u002Fto\u002Fyour_base_model \\\n  --shard-size 5000000000\n```\n\n### Evaluation\n\nRun the default RULER evaluation sweep:\n\n```bash\nbash eval.sh\n```\n\nSingle-config smoke run:\n\n```bash\nCUDA_VISIBLE_DEVICES=0 python3 -c \\\n  \"import inference_model; from opencompass.cli.main import main; import sys; sys.argv=['opencompass','eval_config\u002Fruler_4k.py','--debug']; main()\"\n```\n\nTo evaluate your own checkpoints, update `eval_config\u002Fmodels.py` with your model name and HuggingFace checkpoint path.\n\n## Features\n\n- **Drop-in TTT for standard Transformers.** In-Place TTT updates the MLP down-projection fast weights without introducing extra architectural side modules.\n- **LM-aligned fast-weight updates.** The optimization target is derived for autoregressive language modeling instead of a generic reconstruction objective.\n- **Long-context continual pretraining stack.** The repo includes recommended Qwen3-8B and LLaMA-3.1-8B configs built on VeOmni and FSDP2.\n- **Checkpoint export path.** `scripts\u002Fmerge_dcp_to_hf.py` converts VeOmni DCP checkpoints into HuggingFace format.\n- **TTT-aware inference and evaluation.** `inference_model\u002F`, `eval.sh`, and `eval_config\u002F` cover inference and RULER evaluation through OpenCompass.\n- **Long-context coverage.** The evaluation setup spans 4K, 8K, 16K, 32K, 64K, 128K, and includes a 256K config.\n\n## License\n\nThis project is licensed under the [Apache License 2.0](.\u002FLICENSE).\n\n## Citation\n\nIf you find this work useful for your research and applications, feel free to give us a star or cite us using:\n\n```bibtex\n@inproceedings{feng2026inplace,\n  title     = {In-Place Test-Time Training},\n  author    = {Feng, Guhao and Luo, Shengjie and 
Hua, Kai and Zhang, Ge and Huang, Wenhao and He, Di and Cai, Tianle},\n  booktitle = {International Conference on Learning Representations (ICLR)},\n  year      = {2026},\n  note      = {Oral Presentation},\n  url       = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.06169}\n}\n```\n\n## Contact\n\nIf you have any questions about this project, please feel free to contact us:\n\n> Guhao Feng (fenguhao@stu.pku.edu.cn)  \n> Shengjie Luo (shengjieluo@bytedance.com)\n\n## About [ByteDance Seed Team](https:\u002F\u002Fseed.bytedance.com\u002F)\n\nFounded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.\n","In-Place TTT is a drop-in test-time training method for Transformer large language models. The project provides a training, checkpoint conversion, inference, and evaluation stack built on VeOmni, with recommended configs for Qwen3-8B and LLaMA-3.1-8B. Its core capability is letting a model adapt itself at test time without changing the original model architecture, so it can better absorb new information and improve long-context reasoning. The technique suits applications that need continual learning and adaptation to new data, such as dynamic content generation and dialogue systems. The project is written in Python and released under the Apache License 2.0.",2,"2026-06-11 02:44:47","CREATED_QUERY"]