[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72138":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},72138,"Search-R1","PeterGriffinJin\u002FSearch-R1","PeterGriffinJin","Search-R1: An Efficient, Scalable RL Training Framework for Reasoning & Search Engine Calling interleaved LLM based on veRL","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516",null,"Python",4904,439,19,26,0,22,71,229,66,29.93,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:02:59","# Search-R1: Train your LLMs to reason and call a search engine with reinforcement learning\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fraw.githubusercontent.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fmain\u002Fpublic\u002Flogo.png\" alt=\"logo\" width=\"300\"\u002F>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.09516\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper1-blue?style=for-the-badge\" alt=\"Button1\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper2-green?style=for-the-badge\" alt=\"Button2\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca 
href=\"https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FResources-orange?style=for-the-badge\" alt=\"Button3\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTweet-red?style=for-the-badge\" alt=\"Button4\"\u002F>\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLogs-purple?style=for-the-badge\" alt=\"Button5\"\u002F>\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\u003C!-- \u003Cstrong>Search-R1\u003C\u002Fstrong> is a reinforcement learning framework for \u003Cem>training reasoning and searching (tool-call) interleaved LLMs\u003C\u002Fem>.  -->\n\u003C!-- We built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl). -->\n**Search-R1** is a reinforcement learning framework designed for training **reasoning-and-searching interleaved LLMs**—language models that learn to reason and make tool calls (e.g., to search engines) in a coordinated manner.\n\n\u003C!-- It can be seen as an extension of \u003Cstrong>DeepSeek-R1(-Zero)\u003C\u002Fstrong> with interleaved search engine calling and an opensource RL training-based solution for \u003Cstrong>OpenAI DeepResearch\u003C\u002Fstrong>. -->\nBuilt upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl), Search-R1 extends the ideas of **DeepSeek-R1(-Zero)** by incorporating interleaved search engine access and provides a fully open-source RL training pipeline. 
It serves as an alternative and open solution to **OpenAI DeepResearch**, enabling research and development in tool-augmented LLM reasoning.\n\n\u003C!-- Through RL (rule-based outcome reward), the 3B **base** LLM (both Qwen2.5-3b-base and Llama3.2-3b-base) develops reasoning and search engine calling abilities all on its own. -->\n\nWe support different RL methods (e.g., PPO, GRPO, reinforce), different LLMs (e.g., llama3, Qwen2.5, etc.) and different search engines (e.g., local sparse\u002Fdense retrievers and online search engines).\n\nPaper: [link1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516), [link2](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117); Model and data: [link](https:\u002F\u002Fhuggingface.co\u002Fcollections\u002FPeterJinGo\u002Fsearch-r1-67d1a021202731cb065740f5); Twitter thread: [link](https:\u002F\u002Fx.com\u002FBowenJin13\u002Fstatus\u002F1895544294473109889); Full experiment log: [prelim](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open); [v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train); [v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2); [v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3). Details about these logs and methods can be found [here](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fexperiment_log.md).\n\n\n![single-turn](public\u002Fmain.png)\n\n## News\n\n- [2025.10] Search-R1 is featured by Thinking Machines Lab's first product [Tinker](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook)! Details: [Document](https:\u002F\u002Fgithub.com\u002Fthinking-machines-lab\u002Ftinker-cookbook\u002Ftree\u002Fmain\u002Ftinker_cookbook\u002Frecipes\u002Ftool_use\u002Fsearch).\n- [2025.7] Search-R1 is supported by [SkyRL](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)! 
Detailed instructions: [code](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL\u002Ftree\u002Fmain\u002Fskyrl-train\u002Fexamples\u002Fsearch), [Document](https:\u002F\u002Fnovasky-ai.notion.site\u002Fskyrl-searchr1).\n- [2025.6] Search-R1 is now integrated into the latest version of veRL and can take advantage of its most up-to-date features! Detailed instructions: [veRL](https:\u002F\u002Fverl.readthedocs.io\u002Fen\u002Flatest\u002Fsglang_multiturn\u002Fsearch_tool_example.html), [English Document](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like.md), [Chinese Document](https:\u002F\u002Fgithub.com\u002Fzhaochenyang20\u002FAwesome-ML-SYS-Tutorial\u002Fblob\u002Fmain\u002Frlhf\u002Fverl\u002Fmulti-turn\u002Ftool_examples\u002Fverl-multiturn-searchR1-like_ZH.md).\n- [2025.5] The second [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.15117) conducting detailed empirical studies is published with logs: [v0.3](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.3). 
\n- [2025.4] We support [multinode](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fmultinode.md) training for 30B+ LLMs!\n- [2025.4] We support [different search engines](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Fblob\u002Fmain\u002Fdocs\u002Fretriever.md) including sparse local retrievers, dense local retrievers with ANN indexing, and online search engines!\n- [2025.3] The first Search-R1 [paper](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2503.09516) is published with the logs: [v0.1](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-nq_hotpotqa_train); [v0.2](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-v0.2).\n- [2025.2] We open-sourced the Search-R1 codebase with [preliminary results](https:\u002F\u002Fwandb.ai\u002Fpeterjin\u002FSearch-R1-open).\n\n## Links\n\n- [Installation](#installation)\n- [Quick start](#quick-start)\n- [Preliminary results](#preliminary-results)\n- [Inference](#inference)\n- [Use your own dataset](#use-your-own-dataset)\n- [Use your own search engine](#use-your-own-search-engine)\n- [Features](#features)\n- [Acknowledge](#acknowledge)\n- [Citations](#citations)\n\n## Installation\n\n### Search-r1 environment\n```bash\nconda create -n searchr1 python=3.9\nconda activate searchr1\n# install torch [or you can skip this step and let vllm install the correct version for you]\npip install torch==2.4.0 --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu121\n# install vllm\npip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1\n\n# verl\npip install -e .\n\n# flash attention 2\npip3 install flash-attn --no-build-isolation\npip install wandb\n```\n\n### Retriever environment (optional)\nIf you would like to call a local retriever as the search engine, you can install the environment as follows. 
(We recommend using a separate environment.)\n```bash\nconda create -n retriever python=3.10\nconda activate retriever\n\n# we recommend installing torch with conda for faiss-gpu\nconda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia\npip install transformers datasets pyserini\n\n## install the gpu version faiss to guarantee efficient RL rollout\nconda install -c pytorch -c nvidia faiss-gpu=1.8.0\n\n## API function\npip install uvicorn fastapi\n```\n\n\n## Quick start\n\nTrain a reasoning + search LLM on the NQ dataset with e5 as the retriever and Wikipedia as the corpus.\n\n(1) Download the index and corpus.\n```bash\nsave_path=\u002Fthe\u002Fpath\u002Fto\u002Fsave\npython scripts\u002Fdownload.py --save_path $save_path\ncat $save_path\u002Fpart_* > $save_path\u002Fe5_Flat.index\ngzip -d $save_path\u002Fwiki-18.jsonl.gz\n```\n\n(2) Process the NQ dataset.\n```bash\npython scripts\u002Fdata_process\u002Fnq_search.py\n```\n\n(3) Launch a local retrieval server.\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(4) Run RL training (PPO) with Llama-3.2-3b-base.\n```bash\nconda activate searchr1\nbash train_ppo.sh\n```\n\n## Preliminary results\n\n(1) The base model (llama3.2-3b-base) learns to call the search engine and achieves improved performance.\n\n![llama-3b](public\u002Fllama32-3b.png)\n\n\n(2) The base model (Qwen2.5-7b-base) can learn to conduct multi-turn search engine calling and reasoning with RL.\n\n![multi-turn](public\u002Fmulti-turn.png)\n\n## Inference\n#### You can play with the trained Search-R1 model with your own question.\n(1) Launch a local retrieval server.\n```bash\nconda activate retriever\nbash retrieval_launch.sh\n```\n\n(2) Run inference.\n```bash\nconda activate searchr1\npython infer.py\n```\nYou can modify the ```question``` on line 7 to something you're interested in.\n\n## Use your own dataset\n\n### QA data\nFor each question-answer sample, it should be a dictionary 
containing the desired content as below:\n\n```\ndata = {\n        \"data_source\": data_source,\n        \"prompt\": [{\n            \"role\": \"user\",\n            \"content\": question,\n        }],\n        \"ability\": \"fact-reasoning\",\n        \"reward_model\": {\n            \"style\": \"rule\",\n            \"ground_truth\": solution\n        },\n        \"extra_info\": {\n            'split': split,\n            'index': idx,\n        }\n    }\n```\n\nYou can refer to ```scripts\u002Fdata_process\u002Fnq_search.py``` for a concrete data processing example.\n\n### Corpora\n\nIt is recommended to make your corpus a jsonl file, where each line (a dictionary with \"id\" key and \"contents\" key) corresponds to one passage. You can refer to ```example\u002Fcorpus.jsonl``` for an example.\n\nThe \"id\" key corresponds to the passage id, while the \"contents\" key corresponds to the passage content ('\"' + title + '\"\\n' + text).\nFor example:\n```\n{\"id\": \"0\", \"contents\": \"Evan Morris Evan L. Morris (January 26, 1977 \\u2013 July 9, 2015) was a lobbyist for Genentech and its parent corporation Roche in Washington.\"}\n...\n{\"id\": \"100\", \"contents\": \"Three years later, when the United States Exploring Expedition to little-known portions of the globe was organised under Charles Wilkes, Hale was recommended, while yet an undergraduate.\"}\n...\n```\n\n**Index your corpora (optional).**\nIf you would like to use a local retriever as the search engine, you can index your own corpus by:\n```\nbash search_r1\u002Fsearch\u002Fbuild_index.sh\n```\nYou can change ```retriever_name``` and ```retriever_model``` to any off-the-shelf retriever you are interested in.\n\n## Use your own search engine\n\nOur codebase supports local sparse retrievers (e.g., BM25), local dense retrievers (both flat indexing with GPUs and ANN indexing with CPUs) and online search engines (e.g., Google, Bing, etc.). 
More details can be found [here](https:\u002F\u002Fgithub.com\u002FPeterGriffinJin\u002FSearch-R1\u002Ftree\u002Fmain\u002Fdocs\u002Fretriever.md).\n\nThe main philosophy is to launch a local or remote search engine server separately from the main RL training pipeline. \n\nThe LLM can call the search engine by calling the search API (e.g., \"http:\u002F\u002F127.0.0.1:8000\u002Fretrieve\").\n\nYou can refer to ```search_r1\u002Fsearch\u002Fretriever_server.py``` for an example of launching a local retriever server.\n\n## Features\n- Support local sparse retrievers (e.g., BM25). ✔️\n- Support local dense retrievers (both flat indexing and ANN indexing) ✔️\n- Support google search \u002F bing search \u002F brave search API and others. ✔️\n- Support off-the-shelf neural rerankers. ✔️\n- Support different RL methods (e.g., PPO, GRPO, reinforce). ✔️\n- Support different LLMs (e.g., llama3, Qwen2.5, etc). ✔️\n\n## Acknowledge\n\nThe concept of Search-R1 is inspired by [Deepseek-R1](https:\u002F\u002Fgithub.com\u002Fdeepseek-ai\u002FDeepSeek-R1) and [TinyZero](https:\u002F\u002Fgithub.com\u002FJiayi-Pan\u002FTinyZero\u002Ftree\u002Fmain).\nIts implementation is built upon [veRL](https:\u002F\u002Fgithub.com\u002Fvolcengine\u002Fverl) and [RAGEN](https:\u002F\u002Fgithub.com\u002FZihanWang314\u002FRAGEN\u002Ftree\u002Fmain). \nWe sincerely appreciate the efforts of these teams for their contributions to open-source research and development.\n\n## Awesome work powered or inspired by Search-R1\n\n- [DeepResearcher](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher): Scaling Deep Research via Reinforcement Learning in Real-world Environments. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FGAIR-NLP\u002FDeepResearcher)](https:\u002F\u002Fgithub.com\u002FGAIR-NLP\u002FDeepResearcher)\n- [Multimodal-Search-R1](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1): Incentivizing LMMs to Search. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)](https:\u002F\u002Fgithub.com\u002FEvolvingLMMs-Lab\u002Fmultimodal-search-r1)\n- [OTC](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2504.14870): Optimal Tool Calls via Reinforcement Learning.\n- [ZeroSearch](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch): Incentivize the Search Capability of LLMs without Searching. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FZeroSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FZeroSearch)\n- [IKEA](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1): Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fhzy312\u002Fknowledge-r1)](https:\u002F\u002Fgithub.com\u002Fhzy312\u002Fknowledge-r1)\n- [Scent of Knowledge](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.09316): Optimizing Search-Enhanced Reasoning with Information Foraging.\n- [AutoRefine](https:\u002F\u002Fwww.arxiv.org\u002Fpdf\u002F2505.11277): Search and Refine During Think. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fsyr-cn\u002FAutoRefine)](https:\u002F\u002Fgithub.com\u002Fsyr-cn\u002FAutoRefine)\n- [O^2-Searcher](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.16582): A Searching-based Agent Model for Open-Domain Open-Ended Question Answering. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAcade-Mate\u002FO2-Searcher)](https:\u002F\u002Fgithub.com\u002FAcade-Mate\u002FO2-Searcher)\n- [MaskSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.20285): A Universal Pre-Training Framework to Enhance Agentic Search Capability. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FMaskSearch)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FMaskSearch)\n- [VRAG-RL](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.22019): Vision-Perception-Based RAG for Visually Rich Information Understanding. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAlibaba-NLP\u002FVRAG)](https:\u002F\u002Fgithub.com\u002FAlibaba-NLP\u002FVRAG)\n- [R1-Code-Interpreter](https:\u002F\u002Farxiv.org\u002Fabs\u002F2505.21668): Training LLMs to Reason with Code via SFT and RL. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fyongchao98\u002FR1-Code-Interpreter)](https:\u002F\u002Fgithub.com\u002Fyongchao98\u002FR1-Code-Interpreter)\n- [R-Search](https:\u002F\u002Farxiv.org\u002Fabs\u002F2506.04185): Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FQingFei1\u002FR-Search)](https:\u002F\u002Fgithub.com\u002FQingFei1\u002FR-Search)\n- [StepSearch](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2505.15107): Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FZillwang\u002FStepSearch)](https:\u002F\u002Fgithub.com\u002FZillwang\u002FStepSearch)\n- [SimpleTIR](https:\u002F\u002Fsimpletir.notion.site\u002Freport): Stable End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fltzheng\u002FSimpleTIR)](https:\u002F\u002Fgithub.com\u002Fltzheng\u002FSimpleTIR)\n- [Router-R1](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2506.09033): Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fulab-uiuc\u002FRouter-R1)](https:\u002F\u002Fgithub.com\u002Fulab-uiuc\u002FRouter-R1)\n- [SkyRL](https:\u002F\u002Fskyrl.readthedocs.io\u002Fen\u002Flatest\u002F): A Modular Full-stack RL Library for LLMs. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FNovaSky-AI\u002FSkyRL)](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)\n- [ASearcher](https:\u002F\u002Farxiv.org\u002Fabs\u002F2508.07976): Large-Scale RL for Search Agents. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FinclusionAI\u002FASearcher)](https:\u002F\u002Fgithub.com\u002FinclusionAI\u002FASearcher)\n- [ParallelSearch](https:\u002F\u002Fwww.arxiv.org\u002Fabs\u002F2508.09303): Decompose Query and Search Sub-queries in Parallel with RL. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTree-Shu-Zhao\u002FParallelSearch)](https:\u002F\u002Fgithub.com\u002FTree-Shu-Zhao\u002FParallelSearch)\n- [AutoTIR](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2507.21836): Autonomous Tools Integrated Reasoning via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fweiyifan1023\u002FAutoTIR)](https:\u002F\u002Fgithub.com\u002Fweiyifan1023\u002FAutoTIR)\n- [verl-tool](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2509.01055): A version of verl to support diverse tool use. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FTIGER-AI-Lab\u002Fverl-tool)](https:\u002F\u002Fgithub.com\u002FTIGER-AI-Lab\u002Fverl-tool)\n- [Tree-GRPO](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.21240): Tree Search for LLM Agent Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FAMAP-ML\u002FTree-GRPO)](https:\u002F\u002Fgithub.com\u002FAMAP-ML\u002FTree-GRPO)\n- [EviNote-RAG](https:\u002F\u002Farxiv.org\u002Fabs\u002F2509.00877): Enhancing RAG Models via Answer-Supportive Evidence Notes. 
[![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FDa1yuqin\u002FEviNoteRAG)](https:\u002F\u002Fgithub.com\u002FDa1yuqin\u002FEviNoteRAG)\n- [GlobalRAG](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2510.20548v1): Enhancing Global Reasoning in Multi-hop Question Answering via Reinforcement Learning. [![[code]](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FCarnegieBin\u002FGlobalRAG)](https:\u002F\u002Fgithub.com\u002FCarnegieBin\u002FGlobalRAG)\n\n\n\n\n\n## Citations\n\n```bibtex\n@article{jin2025search,\n  title={Search-r1: Training llms to reason and leverage search engines with reinforcement learning},\n  author={Jin, Bowen and Zeng, Hansi and Yue, Zhenrui and Yoon, Jinsung and Arik, Sercan and Wang, Dong and Zamani, Hamed and Han, Jiawei},\n  journal={arXiv preprint arXiv:2503.09516},\n  year={2025}\n}\n```\n\n```bibtex\n@article{jin2025empirical,\n  title={An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents},\n  author={Jin, Bowen and Yoon, Jinsung and Kargupta, Priyanka and Arik, Sercan O and Han, Jiawei},\n  journal={arXiv preprint arXiv:2505.15117},\n  year={2025}\n}\n```\n","Search-R1 is a reinforcement learning framework for training language models that can reason and call search engines. The project supports multiple reinforcement learning methods (e.g., PPO, GRPO, Reinforce), different large language models (e.g., Llama3, Qwen2.5), and different search engines (including local sparse\u002Fdense retrievers and online search engines). By combining these techniques, Search-R1 enables base models to develop complex reasoning abilities and effective tool-calling skills on their own. The framework suits applications that need to augment language models to perform complex tasks or access external information, such as knowledge question-answering systems and intelligent assistants. In addition, Search-R1 provides a fully open-source training pipeline, giving researchers and developers a powerful experimentation platform.",2,"2026-06-11 03:40:32","high_star"]