[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72477":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":14,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72477,"Memento","Memento-Teams\u002FMemento","Memento-Teams","Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs","https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.16153",null,"Python",2455,284,33,13,0,6,14,18,29.36,"MIT License",false,"main",true,[],"2026-06-12 02:03:03","# Memento: Fine-tuning LLM Agents **without** Fine-tuning LLMs\n\n> A memory-based, continual-learning framework that helps LLM agents improve from experience **without** updating model weights.\n\n\u003Cp align=\"center\">\n  \u003Cb>Planner–Executor Architecture\u003C\u002Fb> • \u003Cb>Case-Based Reasoning\u003C\u002Fb> • \u003Cb>MCP Tooling\u003C\u002Fb> • \u003Cb>Memory-Augmented Learning\u003C\u002Fb>\n\u003C\u002Fp>\n\n---\n\n\u003Ctable>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"Figure\u002Ff1_val_test.jpg\" width=\"90%\"\u002F>\n      \u003Cbr\u002F>\n      \u003Csub>\u003Cb>Memento vs. 
Baselines on GAIA validation and test sets.\u003C\u002Fb>\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"Figure\u002Ff1_tasks.jpg\" width=\"90%\"\u002F>\n      \u003Cbr\u002F>\n      \u003Csub>\u003Cb>Ablation study of Memento across benchmarks.\u003C\u002Fb>\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n  \u003Ctr>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"Figure\u002Ff1_iteration.jpg\" width=\"90%\"\u002F>\n      \u003Cbr\u002F>\n      \u003Csub>\u003Cb>Continual learning curves across memory designs.\u003C\u002Fb>\u003C\u002Fsub>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\" width=\"50%\">\n      \u003Cimg src=\"Figure\u002Ff1_ood.jpg\" width=\"90%\"\u002F>\n      \u003Cbr\u002F>\n      \u003Csub>\u003Cb>Memento’s accuracy improvement on OOD datasets.\u003C\u002Fb>\u003C\u002Fsub>\n    \u003C\u002Ftd>\n  \u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## 📰 News\n- [2025.10.05] We’re excited to announce that our parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉\n- [2025.09.05] We’ve added support for deploying a local LLM as the executor using vLLM; please see client\u002Fagent_local_server.py. 🎉\n- [2025.09.03] We’ve set up a WeChat group to make it easier to collaborate and exchange ideas on this project. You are welcome to join the group to share your thoughts, ask questions, or contribute your ideas! 🔥 🔥 🔥 [Join our WeChat Group Now!](Figure\u002Fwechat.jpg)\n- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉\n- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 [Join our Discord!](https:\u002F\u002Fdiscord.gg\u002Fy4FP2EDXyX)\n- [2025.08.27] Thanks for your interest in our work! 
We’ll release our CBR code next week and our Parametric Memory code next month. We’ll keep posting updates as development continues.\n- [2025.08.27] We added a new Crawler MCP in ```server\u002Fai_crawler.py``` for web crawling and query-aware content compression to reduce token cost.\n- [2025.08.26] We added the SerpAPI (https:\u002F\u002Fserpapi.com\u002Fsearch-api) MCP tool so that you can skip the search Docker setup and speed up development.\n\n## 🔥 Key Features\n\n- **No LLM weight updates.** Memento reframes continual learning as **memory-based online reinforcement learning** over a **memory-augmented MDP**. A neural **case-selection policy** guides actions; experiences are stored and reused via efficient Read\u002FWrite operations.\n- **Two-stage planner–executor loop.** A CBR-driven **Planner** decomposes tasks and retrieves relevant cases; an **Executor** runs each subtask as an MCP client, orchestrating tools and writing back outcomes.\n- **Comprehensive tool ecosystem.** Built-in support for web search, document processing, code execution, image\u002Fvideo analysis, and more through a unified MCP interface.\n- **Strong benchmark performance.** Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.\n\n---\n\n## 🧠 Core Concept\n\n**Learn from experiences, not gradients.** Memento logs successful and failed trajectories into a **Case Bank** and **retrieves by value** to steer planning and execution, enabling low-cost, transferable, and online continual learning.\n\n---\n\n## 🏗️ Architecture\n\n### Core Components\n\n- **Meta-Planner**: Breaks down high-level queries into executable subtasks using GPT-4.1\n- **Executor**: Executes individual subtasks using o3 or other models via MCP tools\n- **Case Memory**: Stores final-step tuples **(s_T, a_T, r_T)** for experience replay\n- **MCP Tool Layer**: Unified interface for external tools and services\n\n### Tool Ecosystem\n\n- **Web Research**: Live search and controlled crawling via SearxNG\n- 
**Document Processing**: Multi-format support (PDF, Office, images, audio, video)\n- **Code Execution**: Sandboxed Python workspace with security controls\n- **Data Analysis**: Excel processing, mathematical computations\n- **Media Analysis**: Image captioning, video narration, audio transcription\n\n---\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.11+\n- OpenAI API key (or compatible API endpoint)\n- SearxNG instance for web search\n- FFmpeg (system-level binary required for video processing)\n- PyTorch 2.0+ with CUDA support (for Parametric Memory)\n\n📖 **For detailed installation instructions, see [INSTALL.md](INSTALL.md)**\n\n### Installation\n\n#### Method 1: Using uv (Recommended - Fast & Modern)\n\n```bash\n# Clone repository\ngit clone https:\u002F\u002Fgithub.com\u002FAgent-on-the-Fly\u002FMemento\ncd Memento\n\n# Install uv if not already installed\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n# Sync dependencies and create virtual environment automatically\nuv sync\n\n# Activate the virtual environment\nsource .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n```\n\n#### Method 2: Using pip with requirements.txt\n\n```bash\n# Clone repository\ngit clone https:\u002F\u002Fgithub.com\u002FAgent-on-the-Fly\u002FMemento\ncd Memento\n\n# Create and activate virtual environment\npython -m venv .venv\nsource .venv\u002Fbin\u002Factivate  # On Windows: .venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n#### PyTorch Installation\n\n**For GPU support (Recommended for Parametric Memory):**\n\n```bash\n# CUDA 11.8\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu118\n\n# CUDA 12.1\npip install torch torchvision torchaudio --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n\n# CPU only\npip install torch torchvision torchaudio --index-url 
https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcpu\n```\n\nFor more PyTorch installation options, visit: https:\u002F\u002Fpytorch.org\u002Fget-started\u002Flocally\u002F\n\n\n### System Dependencies Installation\n\n#### FFmpeg Installation (Required)\n\n**FFmpeg is required for video processing functionality.** The `ffmpeg-python` package in our dependencies requires a system-level FFmpeg binary.\n\n**Windows:**\n```bash\n# Option 1: Using Conda (Recommended for isolated environment)\nconda install -c conda-forge ffmpeg\n\n# Option 2: Download from official website\n# Visit https:\u002F\u002Fffmpeg.org\u002Fdownload.html and add to PATH\n```\n\n**macOS:**\n```bash\n# Using Homebrew\nbrew install ffmpeg\n```\n\n**Linux:**\n```bash\n# Debian\u002FUbuntu\nsudo apt-get update && sudo apt-get install ffmpeg\n\n```\n\n#### Web Scraping & Search Setup\n\n```bash\n# Install and setup crawl4ai\ncrawl4ai-setup\ncrawl4ai-doctor\n\n# Install playwright browsers\nplaywright install\n```\n\n### Environment Variables Configuration\n\nAfter creating the `.env` file, you need to configure the following API keys and service endpoints:\n\n```bash\n#===========================================\n# OpenAI API Configuration\n#===========================================\nUSE_AZURE_OPENAI=False\n\nOPENAI_API_KEY=your_openai_api_key_here\nOPENAI_BASE_URL=https:\u002F\u002Fapi.openai.com\u002Fv1  # or your custom endpoint\n\nAZURE_OPENAI_API_KEY=your_azure_openai_api_key_here\nAZURE_OPENAI_API_VERSION=your_azure_openai_api_version_here\nAZURE_OPENAI_ENDPOINT=your_azure_openai_endpoint_here\n\n#===========================================\n# Tools & Services API\n#===========================================\n# Chunkr API (https:\u002F\u002Fchunkr.ai\u002F)\nCHUNKR_API_KEY=your_chunkr_api_key_here\n\n# Jina API\nJINA_API_KEY=your_jina_api_key_here\n\n# ASSEMBLYAI API\nASSEMBLYAI_API_KEY=your_assemblyai_api_key_here\n```\n\n**Note**: Replace `your_*_api_key_here` with your actual API keys. 
Some services are optional depending on which tools you plan to use.\n\n\n### SearxNG Setup\n\nFor web search capabilities, set up SearxNG.\nYou can follow https:\u002F\u002Fgithub.com\u002Fsearxng\u002Fsearxng-docker\u002F to set up Docker, then use the settings provided in this repository.\n\n```bash\n# In a new terminal\ncd .\u002FMemento\u002Fsearxng-docker\ndocker compose up -d\n```\n\n\n### Basic Usage\n\n#### Interactive Mode\n\n```bash\npython client\u002Fagent.py\n```\n\n#### Parametric Memory Mode (Advanced - With Memory Retriever)\n\n**Parametric Memory** enables the agent to learn from past experiences using a trained neural retriever model.\n\n**Step 1: Train the Memory Retriever**\n\nFirst, train the retriever model on the initial training data:\n\n```bash\ncd memory\n\n# Train the retriever model\npython train_memory_retriever.py \\\n  --train training_data.jsonl \\\n  --output_dir .\u002Fckpts\u002Fretriever \\\n  --use_plan \\\n  --val_ratio 0.1 \\\n  --batch_size 32 \\\n  --lr 2e-5 \\\n  --epochs 10 \\\n  --save_best\n```\n\n**Step 2: Configure Environment Variables**\n\nAdd the following to your `.env` file:\n\n```bash\n# Memory Configuration\nMEMORY_JSONL_PATH=..\u002Fmemory\u002Fmemory.jsonl\nTRAINING_DATA_PATH=..\u002Fmemory\u002Ftraining_data.jsonl\nRETRIEVER_MODEL_PATH=..\u002Fmemory\u002Fckpts\u002Fretriever\u002Fbest.pt\nMEMORY_TOP_K=8\nMEMORY_MAX_POS_EXAMPLES=8\nMEMORY_MAX_NEG_EXAMPLES=8\n```\n\n**Step 3: Run Parametric Memory Agent**\n\n```bash\ncd client\n\npython parametric_memory.py\n```\n\n---\n\n## 🔧 Configuration\n\n### Model Selection\n\n- **Planner Model**: Defaults to `gpt-4.1` for task decomposition\n- **Executor Model**: Defaults to `o3` for task execution\n- **Custom Models**: Support for any OpenAI-compatible API\n\n### Tool Configuration\n\n- **Search**: Configure SearxNG instance URL\n- **Code Execution**: Customize import whitelist and security settings\n- **Document Processing**: Set cache directories and processing limits\n\n---\n\n## 📊 
Performance\n\n### Benchmark Results\n\n- **GAIA**: 87.88% (Val, Pass@3 Top-1) and **79.40%** (Test)\n- **DeepResearcher**: **66.6% F1 \u002F 80.4% PM**, with **+4.7–9.6** absolute gains on OOD datasets\n- **SimpleQA**: **95.0%**\n- **HLE**: **24.4% PM** (close to GPT-5 at 25.32%)\n\n### Key Insights\n\n- **Small, high-quality memory works best**: Retrieval **K=4** yields peak F1\u002FPM\n- **Planning + CBR consistently improves performance**\n- **Concise, structured planning outperforms verbose deliberation**\n\n---\n\n## 🛠️ Development\n\n### Project Structure\n\n```\nMemento\u002F\n├── client\u002F                      # Main agent implementation\n│   ├── agent.py                # Hierarchical client with planner–executor\n│   ├── no_parametric_cbr.py    # Non-parametric case-based reasoning\n│   ├── parametric_memory.py    # Parametric memory with neural retriever\n│   ├── run_parametric.sh       # Convenience script for parametric mode\n│   └── PARAMETRIC_MEMORY_GUIDE.md  # Detailed parametric memory guide\n├── server\u002F                      # MCP tool servers\n│   ├── code_agent.py           # Code execution & workspace management\n│   ├── search_tool.py          # Web search via SearxNG\n│   ├── serp_search.py          # SERP-based search tool\n│   ├── documents_tool.py       # Multi-format document processing\n│   ├── image_tool.py           # Image analysis & captioning\n│   ├── video_tool.py           # Video processing & narration\n│   ├── excel_tool.py           # Spreadsheet processing\n│   ├── math_tool.py            # Mathematical computations\n│   ├── craw_page.py            # Web page crawling\n│   └── ai_crawler.py           # Query-aware compression crawler\n├── interpreters\u002F                # Code execution backends\n│   ├── docker_interpreter.py\n│   ├── e2b_interpreter.py\n│   ├── internal_python_interpreter.py\n│   └── subprocess_interpreter.py\n├── memory\u002F                      # Memory components \u002F data\n│   ├── 
parametric_memory.py        # Case retriever for inference\n│   ├── train_memory_retriever.py  # Retriever training script\n│   ├── np_memory.py            # Non-parametric memory utilities\n│   ├── retrain.sh              # Convenience script for retraining\n│   ├── memory.jsonl            # Memory pool (cases with labels)\n│   ├── training_data.jsonl     # Training data for retriever\n│   └── ckpts\u002F                  # Model checkpoints\n│       └── retriever\u002F\n│           ├── best.pt         # Best performing model\n│           └── last.pt         # Last epoch model\n├── data\u002F                        # Sample data \u002F cases\n├── searxng-docker\u002F              # SearxNG Docker setup\n├── Figure\u002F                      # Figures for README\u002Fpaper\n├── README.md\n├── requirements.txt\n├── pyproject.toml\n└── LICENSE\n```\n\n### Adding New Tools\n\n1. Create a new FastMCP server in the `server\u002F` directory\n2. Implement your tool functions with proper error handling\n3. Register the tool with the MCP protocol\n4. 
Update the client's server list in `agent.py`\n\n### Custom Interpreters\n\nExtend the `interpreters\u002F` module to add new execution backends:\n\n```python\nfrom interpreters.base import BaseInterpreter\n\nclass CustomInterpreter(BaseInterpreter):\n    async def execute(self, code: str) -> str:\n        # Your custom execution logic\n        pass\n```\n\n---\n\n## 📋 TODO\n\n### Completed Features ✅\n\n- [x] **Add Case Bank Reasoning**: Implemented parametric memory-based case retrieval with neural retriever\n- [x] **Continual Learning Pipeline**: Automated training data collection and model retraining\n\n### Upcoming Features & Improvements\n\n- [ ] **Add User Personal Memory Mechanism**: Implement user-preference search\n- [ ] **Refine Tools & Add More Tools**: Enhance existing tools and expand the tool ecosystem\n- [ ] **Test More New Benchmarks**: Evaluate performance on additional benchmark datasets\n- [ ] **Memory Compression**: Implement efficient memory pruning and compression strategies\n- [ ] **Multi-modal Memory**: Extend memory to support images, videos, and other modalities\n\n---\n\n### Limitations\n\n- **Long-horizon tasks**: GAIA Level-3 remains challenging due to compounding errors\n- **Frontier knowledge**: HLE performance limited by tooling alone\n- **Open-source coverage**: Limited executor validation in fully open pipelines\n\n---\n\n## 🙏 Acknowledgement\n\n* Some of the code in the toolkits and interpreters is adapted from [Camel-AI](https:\u002F\u002Fgithub.com\u002Fcamel-ai\u002Fcamel).\n\n---\n\n## 📚 Citation\n\nIf Memento helps your work, please cite:\n\n```bibtex\n@article{zhou2025mementofinetuningllmagents,\n      title={Memento: Fine-tuning LLM Agents without Fine-tuning LLMs},\n      author={Huichi Zhou and Yihang Chen and Siyuan Guo and Xue Yan and Kin Hei Lee and Zihan Wang and Ka Yiu Lee and Guchun Zhang and Kun Shao and Linyi Yang and Jun Wang},\n      journal={arXiv preprint arXiv: 2508.16153},\n      
url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.16153},\n      year={2025}\n}\n\n@article{huang2025deep,\n  title={Deep Research Agents: A Systematic Examination And Roadmap},\n  author={Huang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and others},\n  journal={arXiv preprint arXiv:2506.18096},\n  year={2025}\n}\n```\n\nFor a broader overview, please check out our survey: [GitHub](https:\u002F\u002Fgithub.com\u002Fai-agents-2030\u002Fawesome-deep-research-agent)\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions! Please see our contributing guidelines for:\n\n- Bug reports and feature requests\n- Code contributions and pull requests\n- Documentation improvements\n- Tool and interpreter extensions\n\n---\n\n## Star History\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=Agent-on-the-Fly\u002FMemento&type=Date)](https:\u002F\u002Fwww.star-history.com\u002F#Agent-on-the-Fly\u002FMemento&Date)\n","Memento is a memory-based continual-learning framework designed to help LLM agents improve from experience without updating model weights. Its core features include memory-augmented learning without parameter updates, a two-stage planner-executor loop, and case-based reasoning. It stores and reuses experience through a memory-augmented Markov decision process (MDP) and efficient read and write operations, so that LLM agents can perform online reinforcement learning without adjusting their own weights. It also supports integrations such as SerpAPI and local vLLM deployment, which further improves development efficiency. It suits applications that need long-term learning and are sensitive to compute cost, such as dialogue systems and personalized recommendation.",2,"2026-06-11 03:42:13","high_star"]