[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81048":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},81048,"delta-Mem","MindLab-Research\u002Fdelta-Mem","MindLab-Research","Repo of Paper: delta-Mem: Efficient Online Memory for Large Language Models",null,"Python",41,4,28,1,0,5,13,15,2.1,false,"main",[],"2026-06-12 02:04:10","\u003Ch1 align=\"center\">\n    δ-mem: Efficient Online Memory for Large Language Models\n\u003C\u002Fh1>\n\n\u003Cp align=\"center\">\n\u003Ca href=\"https:\u002F\u002Fcreativecommons.org\u002Flicenses\u002Fby\u002F4.0\u002F\">\u003Cimg alt=\"License: CC-BY-4.0\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-CC_BY_4.0-brightgreen.svg\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.12357\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-B31B1B?style=flat-square&logo=arxiv&logoColor=white\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdeclare-lab\u002Fdelta-mem_qwen3_4b-instruct\">\u003Cimg alt=\"Huggingface\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F🤗_Huggingface-Model-ff9800.svg\">\u003C\u002Fa>\n\u003C\u002Fp>\n\nδ-mem introduces a compact Online State of Associative Memory alongside a frozen full-attention backbone. 
When a new token or interaction segment arrives, the model projects the current information into a low-dimensional memory space and writes it into the state through delta-rule learning.\n\nThis repository contains the main δ-mem implementation, training scripts, evaluation scripts, and an interactive chat demo. The current public release focuses on Qwen3-4B\u002F8B and SmolLM3-3B experiments with three write strategies: TSW, SSW, and MSW.\n\n## Why δ-mem?\n\nIn long-term agent scenarios, what is truly needed is a more efficient memory mechanism. Such a mechanism should not endlessly increase the context burden like full-text retrieval, nor should it behave like static parametric memory that becomes fixed after training. Instead, it should be able to update dynamically during interaction and directly influence the model’s internal computation during inference. Motivated by this, we propose **δ-mem**, a lightweight online memory mechanism for large language models.\n\n\n## Released Model\n\n| Model | Base model | Adapter | Hugging Face |\n| --- | --- | --- | --- |\n| δ-mem Qwen3-4B Instruct TSW | `Qwen\u002FQwen3-4B-Instruct-2507` | rank-8 Q\u002FO TSW, write length 8192 | [`declare-lab\u002Fdelta-mem_qwen3_4b-instruct`](https:\u002F\u002Fhuggingface.co\u002Fdeclare-lab\u002Fdelta-mem_qwen3_4b-instruct) |\n\n## What Is In This Repository?\n\n```text\nDelta-Mem\u002F\n├── data\u002F\n│   └── locomo10.json                     # local LoCoMo sample file used by scripts\n├── deltamem\u002F\n│   ├── core\u002F                             # Delta-Mem modules, config, adapter save\u002Fload\n│   ├── demo\u002F                             # interactive chat demo\n│   ├── eval\u002F                             # LoCoMo, HotpotQA, IFEval, GPQA, MemoryAgentBench\n│   ├── kernels\u002F                          # affine scan kernel wrapper\n│   ├── runtime\u002F                          # chat\u002Fsession runtime\n│   ├── tests\u002F                            # regression 
tests\n│   ├── tools\u002F                            # TPS and inspection tools\n│   └── train\u002F                            # SFT training code\n├── scripts\u002F\n│   ├── setup_uv_env.sh\n│   ├── run_qasper_multimodel_write8192_train_and_benchmark_suite.sh\n│   ├── run_qasper_multimodel_write8192_benchmark_suite.sh\n│   ├── run_qasper_multimodel_write8192_*_qwen3_8b.sh\n│   ├── run_qasper_multimodel_write8192_*_smollm3_3b.sh\n│   └── run_generation_tps_benchmark.sh\n└── deepspeed_zero2.json\n```\n\n## Environment Setup\n\n### System Requirements\n\nRecommended setup:\n\n| Component | Recommendation |\n| --- | --- |\n| Python | 3.10 or newer |\n| GPU | NVIDIA GPU for training\u002Fevaluation |\n| CUDA\u002FPyTorch | A CUDA-enabled PyTorch build matching your driver |\n| Package manager | `uv` |\n\nThe training scripts are designed for bf16 GPU runs and use FlashAttention and DeepSpeed. CPU-only usage is not the target path for this release.\n\n### One-Command Setup\n\nClone the repository and run the setup script:\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fdeclare-lab\u002Fdelta-Mem.git\ncd delta-Mem\nbash scripts\u002Fsetup_uv_env.sh\n```\n\nThe script creates a fresh `.venv\u002F`, installs `requirements.txt`, installs FlashAttention with `--no-build-isolation`, and prints a short import\u002FCUDA diagnostic at the end.\n\nIf `uv` is not installed:\n\n```bash\npython -m pip install uv\n```\n\nActivate the environment:\n\n```bash\nsource .venv\u002Fbin\u002Factivate\n```\n\n### Setup Options\n\nUse a specific Python executable:\n\n```bash\nPYTHON_BIN=python3.11 bash scripts\u002Fsetup_uv_env.sh\n```\n\nKeep an existing `.venv\u002F` instead of recreating it:\n\n```bash\nKEEP_VENV=1 bash scripts\u002Fsetup_uv_env.sh\n```\n\nSkip FlashAttention reinstall if your cluster already provides a working build:\n\n```bash\nINSTALL_FLASH_ATTN=0 bash scripts\u002Fsetup_uv_env.sh\n```\n\n### Manual Setup\n\nIf you prefer to manage the environment 
yourself:\n\n```bash\npython -m pip install uv\nuv venv --python python3.11 .venv\nsource .venv\u002Fbin\u002Factivate\nuv pip install --upgrade pip setuptools wheel\nuv pip install -r requirements.txt\nuv pip install --no-build-isolation flash-attn\n```\n\nIf PyTorch needs to be installed from a specific CUDA index, install it before the requirements, for example:\n\n```bash\nuv pip install torch --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu124\nuv pip install -r requirements.txt\n```\n\n### Verify The Environment\n\nRun:\n\n```bash\npython - \u003C\u003C'PY'\nimport torch, transformers, datasets, accelerate, deepspeed, flash_attn, peft\nprint(\"torch:\", torch.__version__)\nprint(\"cuda:\", torch.cuda.is_available())\nprint(\"transformers:\", transformers.__version__)\nprint(\"datasets:\", datasets.__version__)\nprint(\"deepspeed:\", deepspeed.__version__)\nprint(\"flash_attn:\", flash_attn.__file__)\nprint(\"peft:\", peft.__version__)\nPY\n```\n\nThen run the local checks:\n\n```bash\nPYTHONPATH=. python -m compileall -q deltamem\nPYTHONPATH=. 
python -m pytest -q deltamem\u002Ftests\n```\n\n## Path Configuration\n\nThe experiment scripts intentionally use placeholder paths under `\u002Froot\u002F...`:\n\n```text\n\u002Froot\u002Fhuggingface\n\u002Froot\u002Fmodels\n\u002Froot\u002Fdata\n\u002Froot\u002Foutputs\n\u002Froot\u002Fexternal\u002FMemoryAgentBench\n```\n\nBefore running training or evaluation, either edit the script variables or override them from the shell:\n\n```bash\nBASE_MODEL_PATH=\u002Fpath\u002Fto\u002FQwen3-4B-Instruct-2507 \\\nTSW_ADAPTER_DIR=\u002Fpath\u002Fto\u002Fdelta-mem-adapter \\\nSUITE_ROOT=\u002Fpath\u002Fto\u002Fresults \\\nbash scripts\u002Frun_qasper_multimodel_write8192_benchmark_suite.sh\n```\n\n## Use The Released Adapter\n\nDownload the adapter from Hugging Face:\n\n```bash\nhuggingface-cli download declare-lab\u002Fdelta-mem_qwen3_4b-instruct \\\n  --local-dir .\u002Fdelta-mem_qwen3_4b-instruct\n```\n\nMinimal loading example:\n\n```python\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nfrom deltamem.core import HFDeltaMemConfig, attach_delta_mem, load_delta_mem_adapter\n\nbase_model = \"Qwen\u002FQwen3-4B-Instruct-2507\"\nadapter_dir = \".\u002Fdelta-mem_qwen3_4b-instruct\"\n\ntokenizer = AutoTokenizer.from_pretrained(base_model)\nmodel = AutoModelForCausalLM.from_pretrained(\n    base_model,\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n)\n\nconfig = HFDeltaMemConfig.from_pretrained(adapter_dir)\nattach_delta_mem(model, config)\nload_delta_mem_adapter(model, adapter_dir)\nmodel.eval()\n```\n\nδ-mem adapters are not standard PEFT LoRA adapters and are not merged into the base model with `merge_and_unload()`. 
The runtime memory read\u002Fwrite path is part of the model execution.\n\n## Chat Demo\n\nRun the default shell wrapper:\n\n```bash\nbash deltamem\u002Fdemo\u002Frun_chat_demo.sh\n```\n\nTypical override:\n\n```bash\nMODEL_PATH=\u002Fpath\u002Fto\u002FQwen3-4B-Instruct-2507 \\\nADAPTER_DIR=\u002Fpath\u002Fto\u002Fdelta-mem_qwen3_4b-instruct \\\nbash deltamem\u002Fdemo\u002Frun_chat_demo.sh\n```\n\nRun the base model without δ-mem:\n\n```bash\nMODE=base MODEL_PATH=\u002Fpath\u002Fto\u002FQwen3-4B-Instruct-2507 \\\nbash deltamem\u002Fdemo\u002Frun_chat_demo.sh\n```\n\n## Training\n\nThe main Qwen3-4B training script trains SSW, TSW, and MSW variants by default:\n\n```bash\nbash scripts\u002Frun_qasper_multimodel_write8192_train_and_benchmark_suite.sh\n```\n\nRun only TSW:\n\n```bash\nTRAIN_VARIANTS_STRING=\"TSW_rank8_qasper_write8192\" \\\nBENCHMARK_VARIANTS_STRING=\"TSW_rank8_qasper_write8192\" \\\nbash scripts\u002Frun_qasper_multimodel_write8192_train_and_benchmark_suite.sh\n```\n\nModel-specific scripts:\n\n```bash\nbash scripts\u002Frun_qasper_multimodel_write8192_train_and_benchmark_suite_qwen3_8b.sh\nbash scripts\u002Frun_qasper_multimodel_write8192_train_and_benchmark_suite_smollm3_3b.sh\n```\n\n## Evaluation\n\nThe main benchmark suite covers:\n\n| Benchmark | Entry |\n| --- | --- |\n| LoCoMo | `deltamem.eval.locomo_delta` |\n| HotpotQA | `deltamem.eval.benchmark_compare --tasks hotpotqa` |\n| IFEval | `deltamem.eval.benchmark_compare --tasks ifeval` |\n| GPQA Diamond | `deltamem.eval.benchmark_compare --tasks gpqa_diamond` |\n| MemoryAgentBench | `deltamem.eval.benchmark_compare --tasks memory_agent_bench` |\n\nRun the bundled Qwen3-4B benchmark suite:\n\n```bash\nbash scripts\u002Frun_qasper_multimodel_write8192_benchmark_suite.sh\n```\n\nRun only the TSW adapter and skip base-model evaluation:\n\n```bash\nBENCHMARK_VARIANTS_STRING=\"TSW_rank8_qasper_write8192\" \\\nEVAL_TASKS_STRING=\"locomo hotpotqa gpqa_diamond ifeval memory_agent_bench\" \\\nbash 
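scripts\u002Frun_qasper_multimodel_write8192_benchmark_suite.sh\n```\n\n## Citation\nIf you find our work useful, please cite:\n\n```bibtex\n@misc{lei2026deltamemefficientonlinememory,\n      title={$\\delta$-mem: Efficient Online Memory for Large Language Models}, \n      author={Jingdi Lei and Di Zhang and Junxian Li and Weida Wang and Kaixuan Fan and Xiang Liu and Qihan Liu and Xiaoteng Ma and Baian Chen and Soujanya Poria},\n      year={2026},\n      eprint={2605.12357},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.12357}, \n}\n```\n","δ-mem is an efficient online memory mechanism designed for large language models. It projects the current information into a low-dimensional memory space and updates the state through delta-rule learning, enabling dynamic memory updates. The project is written in Python and provides a complete implementation, including training scripts, evaluation scripts, and an interactive chat demo. δ-mem is particularly suited to scenarios that require long-term conversational coherence, such as customer-service bots or personal assistants, where it can reduce the context burden while keeping the model's internal computation flexible. The current public release is based mainly on Qwen3-4B\u002F8B and SmolLM3-3B experiments and supports three write strategies: TSW, SSW, and MSW.",2,"2026-06-11 04:03:19","CREATED_QUERY"]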