[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11659":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},11659,"DCI-Agent-Lite","DCI-Agent\u002FDCI-Agent-Lite","DCI-Agent","Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction","https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.05242",null,"Python",334,45,3,0,8,29,222,24,4.99,"MIT License",false,"main",true,[],"2026-06-12 02:02:33","\u003Ca name=\"readme-top\">\u003C\u002Fa>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Fbanner.svg\" alt=\"DCI-Agent-Lite\" height=\"120\">\n\u003C\u002Fp>\n\n\u003Ch2 align=\"center\">\n  Deep Research on Your Personal Knowledge Base\n\u003C\u002Fh2>\n\n\u003Cdiv align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fpapers\u002F2605.05242\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-B31B1B?style=for-the-badge&logo=arXiv&logoColor=white\" alt=\"arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002Fzhuofengli96475\u002Fstatus\u002F2052784645398303198\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-000000?style=for-the-badge&logo=X&logoColor=white\" alt=\"Twitter\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FDCI-Agent\">\u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white\" alt=\"Hugging Face\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDCI-Agent\u002Fdemo\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDemo-F97316.svg?style=for-the-badge&logo=gradio&logoColor=white\" alt=\"Demo\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDCI-Agent\u002Feval-logs\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FEval%20Logs-755BB4?style=for-the-badge&logo=google-sheets&logoColor=white\" alt=\"Eval Logs\">\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n---\n\n## 💥 Introduction\n\n**DCI** is a **direct corpus interaction paradigm** for agentic search. Instead of querying a fixed semantic retriever or retrieval API, the agent **searches the raw corpus directly with terminal tools**. This lets the agent freely compose search primitives and interact with the corpus as an open research environment. It also substantially simplifies the overall retrieval system. \n\n**DCI-Agent-Lite** is the **minimal open implementation** of this paradigm, built on [Pi](https:\u002F\u002Fgithub.com\u002Fbadlogic\u002Fpi-mono\u002Ftree\u002Fmain\u002Fpackages\u002Fcoding-agent) with **bash tools** and **lightweight context management** for **long-horizon deep research**. 
With `GPT-5.4-nano`, it achieves an impressive **62.9%** accuracy on BrowseComp-Plus, **surpassing** agentic search agents powered by `GPT-5.2`, `Claude-Sonnet-4.6`, `Qwen3.5-122B`, and `GLM-4.7`.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Fteaser.png\" alt=\"DCI-Agent-Lite teaser\" width=\"100%\" style=\"max-width: 850px; border-radius: 8px; box-shadow: 0 4px 10px rgba(0,0,0,0.1);\">\n\u003C\u002Fdiv>\n\n\u003Cbr>\n\n## 🏆 Main Results\n\nDCI-Agent-Lite outperforms top-performing baselines across 13 benchmarks spanning agentic search, knowledge-intensive QA, and IR-ranking tasks.\n\n- **Table 1 -** Agentic search results.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Fbcp.png\" alt=\"Agentic search results\" width=\"65%\" style=\"max-width: 850px; border-radius: 8px; box-shadow: 0 4px 10px rgba(0,0,0,0.1);\">\n\u003C\u002Fdiv>\n\n\n- **Table 2 -** Knowledge-intensive QA results.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Ftable_qa.png\" alt=\"Knowledge-intensive QA results\" width=\"85%\" style=\"max-width: 850px; border-radius: 8px; box-shadow: 0 4px 10px rgba(0,0,0,0.1);\">\n\u003C\u002Fdiv>\n\n- **Table 3 -** IR ranking results.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Ftable_ir.png\" alt=\"IR ranking results\" width=\"85%\" style=\"max-width: 850px; border-radius: 8px; box-shadow: 0 4px 10px rgba(0,0,0,0.1);\">\n\u003C\u002Fdiv>\n\n\n## 🌟 Key Features\n- 🔒 **Your private deep-research assistant**: Point DCI-Agent-Lite at a local corpus and start immediately. It searches, inspects, cross-checks, and answers from your own knowledge base without sending documents to a hosted retrieval service.\n- ⚡ **High-resolution, zero-index retrieval**: No embeddings, vector databases, or offline index builds. 
The agent searches raw files directly with terminal commands like `rg`, `find`, and `sed`, so it can start immediately and maintain fine-grained control over the knowledge base.\n- 🛠️ **Minimal harness, long-horizon power**: Built on [Pi](https:\u002F\u002Fgithub.com\u002Fbadlogic\u002Fpi-mono\u002Ftree\u002Fmain\u002Fpackages\u002Fcoding-agent) with only bash tools and lightweight context management, DCI-Agent-Lite is small enough to hack and strong enough for serious deep research runs.\n- 🚀 **Remarkable agentic-search performance**: DCI-Agent-Lite with GPT-5.4-nano beats top baselines across 13 benchmarks, spanning BrowseComp-Plus, knowledge-intensive QA, and IR ranking.\n\n---\n\n## 📑 Table of Contents\n\n- [⚙️ Setup](#setup)\n- [⚡ Quick Start](#quick-start)\n- [🚀 Running Experiments](#running-experiments)\n- [🎯 Benchmark Evaluation](#benchmark-evaluation)\n- [🏗️ Repository Layout](#repository-layout)\n- [🙏 Acknowledgements](#acknowledgements)\n- [📚 Citation](#citation)\n\n---\n\n\u003Ca name=\"setup\">\u003C\u002Fa>\n## ⚙️ Setup\n\n### One-Click Install\n\n**Unix \u002F macOS**\n\n```bash\nbash setup.sh\n```\n\n\u003Cdetails>\n\u003Csummary>Manual Steps\u003C\u002Fsummary>\n\nSee [`assets\u002Fdocs\u002Fsetup.md`](assets\u002Fdocs\u002Fsetup.md) for detailed prerequisites, repo build instructions, API-key configuration, and vLLM provider setup.\n\nQuick manual path:\n\n```bash\n# 1. Install uv + ripgrep, then sync Python deps\nuv sync\n\n# 2. Clone and build Pi\ngit clone https:\u002F\u002Fgithub.com\u002Fjdf-prog\u002Fpi-mono.git pi-mono\ncd pi-mono && git checkout codex\u002Fcontext-management-ablation && npm install && npm run build && cd ..\n\n# 3. Configure API keys (copy template, edit .env, auto-loaded by setup.sh)\ncp .env.template .env\n# edit .env, then re-run setup.sh or source it manually\n\n# 4. 
Download datasets (auto-downloaded by setup.sh, or run manually)\n#    Corpus: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDCI-Agent\u002Fcorpus\nuv run python scripts\u002Fdownload_corpus.py\n\n#    Benchmark datasets: https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FDCI-Agent\u002Fdci-bench\nuv run python scripts\u002Fdownload_dci_bench.py\n```\n\n\u003C\u002Fdetails>\n\n### Configuration\n\nCopy the template to `.env`, then fill in the variables you need. To get DCI running, set at least one of `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`:\n\n```bash\ncp .env.template .env\n```\n\nCommon variables:\n\n- `OPENAI_API_KEY` for OpenAI model runs and benchmark judging by default.\n- `ANTHROPIC_API_KEY` for Anthropic model runs.\n\n\u003Ca name=\"quick-start\">\u003C\u002Fa>\n## ⚡ Quick Start\n\n**Prerequisites**: Install dependencies and configure an OpenAI API key (see [Setup](#setup)).\n\nThe example below illustrates DCI-Agent-Lite in action: the deep research agent searches the corpus, inspects relevant documents, and produces evidence-grounded answers entirely within the given Wikipedia corpus.\n\n1. **Open the DCI-Agent-Lite TUI**:\n\n```bash\n# load keys from .env if not already in environment\nset -a; source .env 2>\u002Fdev\u002Fnull; set +a\n\nuv run dci-agent-lite --terminal \\\n  --provider openai \\\n  --model gpt-5.4-nano \\\n  --cwd \"corpus\u002Fwiki_corpus\" \\\n  --extra-arg=\"--thinking high\"\n```\n\n2. **Run your first task**. In the TUI, type:\n\n```text\nAnswer the following question using only wiki_dump.jsonl in the current directory. Do not use web search. Use rg instead of grep for fast searching. Question: In which street did the Great Fire of London originate?\n```\n\n3. (Optional) **Run programmatically from the CLI**. 
Remove the `--terminal` flag and pass your task as the final argument:\n\n```bash\nset -a; source .env 2>\u002Fdev\u002Fnull; set +a\n\nuv run dci-agent-lite \\\n  --provider openai \\\n  --model gpt-5.4-nano \\\n  --cwd \"corpus\u002Fwiki_corpus\" \\\n  --extra-arg=\"--thinking high\" \\\n  \"Answer the following question using only wiki_dump.jsonl in the current directory. Do not use web search. Use rg instead of grep for fast searching. Question: In which street did the Great Fire of London originate?\"\n```\n\nProgrammatic runs save artifacts under `outputs\u002Fruns\u002F\u003Ctimestamp>\u002F`. The final answer is in `final.txt`, the original question is in `question.txt`, and the full trajectory is in `conversation_full.json`. To choose a specific location, pass `--output-dir path\u002Fto\u002Frun`. \n\nMore runnable examples for OpenAI, Anthropic and vLLM are available in [`scripts\u002Fexamples\u002F`](scripts\u002Fexamples\u002F) as `dci_basic_*.sh`. See the [setup guide](assets\u002Fdocs\u002Fsetup.md#5-optional-configure-a-local-vllm-provider) for vLLM configuration.\n\n\n## 🚀 Context Management Strategies\n\nDCI-Agent-Lite includes a lightweight runtime context-management layer for long-horizon deep research runs.\n\nIt uses three simple strategies:\n\n- **Truncation** shortens large tool results in each turn.\n- **Compaction** keeps recent turns and replaces older tool results with placeholders.\n- **Summarization** summarizes older history when the context gets crowded.\n\n\u003Cdetails>\n\u003Csummary>\u003Cstrong>Context management illustration\u003C\u002Fstrong>\u003C\u002Fsummary>\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"assets\u002Fimgs\u002Fcontext_management.png\" alt=\"Context management strategies: truncation, compaction, and summarization\" width=\"100%\" style=\"max-width: 900px; border-radius: 8px; box-shadow: 0 4px 10px rgba(0,0,0,0.1);\">\n\u003C\u002Fdiv>\n\n\u003C\u002Fdetails>\n\nThe runtime levels move from no context 
management to more aggressive compression:\n\n| Level | Behavior |\n|-------|----------|\n| `level0` | No context management. |\n| `level1` | Light truncation. |\n| `level2` | Stronger truncation. |\n| `level3` | Truncation and compaction. |\n| `level4` | Truncation, compaction, and summarization. |\n\nPass a level through Pi with `--extra-arg`:\n\n```bash\nset -a; source .env 2>\u002Fdev\u002Fnull; set +a\n\nuv run dci-agent-lite \\\n  --provider openai \\\n  --model gpt-5.4-nano \\\n  --cwd \"corpus\u002Fwiki_corpus\" \\\n  --extra-arg=\"--thinking high\" \\\n  --extra-arg=\"--context-management-level level4\" \\\n  \"Answer the following question using only wiki_dump.jsonl in the current directory. Do not use web search. Use rg instead of grep for fast searching. Question: In which street did the Great Fire of London originate?\"\n```\n\n\n\u003Ca name=\"running-experiments\">\u003C\u002Fa>\n## 🎯 Benchmark DCI-Agent-Lite \n\nWe benchmark DCI-Agent-Lite on the following benchmark suites using OpenAI `gpt-5.4-nano` with `--thinking high`, context management set to `level3`, and a maximum turn budget of 300.\n\n| Data | Data Size | Retrieval Corpus | Corpus Size | Avg. Corpus Len. 
(words) | Corpus Path |\n|------|-----------|------------------|-------------|--------------------------|-------------|\n| BrowseComp-Plus | 830 | BrowseComp-Plus | 100,195 docs | 5,179 | `corpus\u002Fbc_plus_docs\u002F` |\n| BRIGHT-Biology | 103 | BRIGHT-Biology | 57,359 docs | 48 | `corpus\u002Fbright_corpus\u002Fbiology\u002F` |\n| BRIGHT-Earth Science | 116 | BRIGHT-Earth Science | 121,249 docs | 28 | `corpus\u002Fbright_corpus\u002Fearth_science\u002F` |\n| BRIGHT-Economics | 103 | BRIGHT-Economics | 50,220 docs | 52 | `corpus\u002Fbright_corpus\u002Feconomics\u002F` |\n| BRIGHT-Robotics | 101 | BRIGHT-Robotics | 61,961 docs | 25 | `corpus\u002Fbright_corpus\u002Frobotics\u002F` |\n| NQ, TriviaQA, Bamboogle, HotpotQA, 2WikiMultiHopQA, MuSiQue | 50 each \u002F 300 total | Wikipedia-18 | 21,015,324 docs | 100 | `corpus\u002Fwiki_corpus\u002F` |\n\n\n### Agentic Search (BrowseComp-Plus)\n\n```bash\nbash scripts\u002Fbcplus_eval\u002Frun_bcplus_eval_openai.sh\n```\n\n### Knowledge-Intensive QA\n\n```bash\nbash scripts\u002Fqa\u002Frun_hotpotqa_dev_sample50.sh\nbash scripts\u002Fqa\u002Frun_musique_dev_sample50.sh\nbash scripts\u002Fqa\u002Frun_nq_test_sample50.sh\nbash scripts\u002Fqa\u002Frun_triviaqa_test_sample50.sh\nbash scripts\u002Fqa\u002Frun_2wikimultihopqa_dev_sample50.sh\nbash scripts\u002Fqa\u002Frun_bamboogle_test_sample50.sh\n```\n\n### IR Ranking\n\n```bash\n# BRIGHT\nbash scripts\u002Fbright\u002Frun_bio.sh\nbash scripts\u002Fbright\u002Frun_earth_science.sh\nbash scripts\u002Fbright\u002Frun_economics.sh\nbash scripts\u002Fbright\u002Frun_robotics.sh\n```\n\n## 🤝 Core Contributors\n\n\u003Ctable>\n\u003Ctr>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fzhuofeng-li.github.io\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FZhuofeng-Li.png\" width=\"75px;\" alt=\"Zhuofeng Li\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Zhuofeng Li\u003C\u002Fb>\u003C\u002Fsub>\n        
\u003C\u002Fa>\n    \u003C\u002Ftd>\n        \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fjdf-prog\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fjdf-prog.png\" width=\"75px;\" alt=\"Dongfu Jiang\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Dongfu Jiang\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fisaacghx.github.io\u002Fabout\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FIsaacGHX.png\" width=\"75px;\" alt=\"Haoxiang Zhang\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Haoxiang Zhang\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n        \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fcongwei1230.github.io\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fcongwei1230.github.io\u002Fimages\u002Fprofile.png\" width=\"75px;\" alt=\"Cong Wei\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Cong Wei\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Flupantech.github.io\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Flupantech.png\" width=\"75px;\" alt=\"Pan Lu\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Pan Lu\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ferenup\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Ferenup.png\" width=\"75px;\" alt=\"Ping Nie\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Ping Nie\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n## 🎓 
Advisors\n\n\u003Ctable>\n\u003Ctr>\n        \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fyejinc.github.io\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fyejinc.github.io\u002Fprofile-uw-2022.jpeg\" width=\"70px;\" alt=\"Yejin Choi\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Yejin Choi\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n        \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fwww.james-zou.com\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fstatic.wixstatic.com\u002Fmedia\u002F0f3e8f_cfa7e327b97745ddb8c4a66454b5eb3e~mv2.jpg\u002Fv1\u002Ffill\u002Fw_199,h_279,al_c,q_80,usm_0.66_1.00_0.01,enc_avif,quality_auto\u002F46824428A5822_ForWeb.jpg\" width=\"60px;\" alt=\"James Zou\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>James Zou\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n     \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fhanj.cs.illinois.edu\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fencrypted-tbn0.gstatic.com\u002Fimages?q=tbn:ANd9GcTYsR82-ravgyteSRLka-SC5A9EwwlJ0opdlVb_4PHVAFHzHu_dmYjegv43Z7gF2MC2k2euJEAA3y4GXrZ-m-h_7F9QWtwd8ITdgD6WMsdMEsmuzb_K&s=10&ec=121643094\" width=\"80px;\" alt=\"Jiawei Han\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Jiawei Han\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fwenhuchen\">\n            \u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fwenhuchen.png\" width=\"75px;\" alt=\"Wenhu Chen\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Wenhu Chen\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n    \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fcs.uwaterloo.ca\u002F~jimmylin\u002F\">\n            \u003Cimg 
src=\"https:\u002F\u002Fgithub.com\u002Flintool.png\" width=\"75px;\" alt=\"Jimmy Lin\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Jimmy Lin\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n        \u003Ctd align=\"center\">\n        \u003Ca href=\"https:\u002F\u002Fyuzhimanhua.github.io\u002F\">\n            \u003Cimg src=\"https:\u002F\u002Fyuzhimanhua.github.io\u002Fprofile_pic.jpg\" width=\"75px;\" alt=\"Yu Zhang\"\u002F>\n            \u003Cbr \u002F>\n            \u003Csub>\u003Cb>Yu Zhang\u003C\u002Fb>\u003C\u002Fsub>\n        \u003C\u002Fa>\n    \u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n\u003Ca name=\"acknowledgements\">\u003C\u002Fa>\n## 🙏 Acknowledgements\n\n\u003C!-- TODO: fill in acknowledgements -->\n\n---\n\n\u003Ca name=\"citation\">\u003C\u002Fa>\n## 📚 Citation\n\n```bibtex\n@article{li2026beyond,\n  title={Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction},\n  author={Li, Zhuofeng and Zhang, Haoxiang and Wei, Cong and Lu, Pan and Nie, Ping and Lu, Yi and Bai, Yuyang and Feng, Shangbin and Zhu, Hangxiao and Zhong, Ming and Zhang, Yuyu and Xie, Jianwen and Choi, Yejin and Zou, James and Han, Jiawei and Chen, Wenhu and Lin, Jimmy and Jiang, Dongfu and Zhang, Yu},\n  journal={arXiv preprint arXiv:2605.05242},\n  year={2026}\n}\n```\n\n\u003Cp align=\"right\">\u003Ca href=\"#readme-top\">↑ Back to Top ↑\u003C\u002Fa>\u003C\u002Fp>\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fwww.star-history.com\u002F?repos=DCI-Agent%2FDCI-Agent-Lite&type=date&legend=top-left\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=DCI-Agent\u002FDCI-Agent-Lite&type=date&theme=dark&legend=top-left\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" 
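srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=DCI-Agent\u002FDCI-Agent-Lite&type=date&legend=top-left\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fchart?repos=DCI-Agent\u002FDCI-Agent-Lite&type=date&legend=top-left\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n","DCI-Agent-Lite is a direct-corpus-interaction search agent that searches the raw corpus directly with terminal tools, enabling more flexible search operations and a simpler retrieval system. The project is written in Python, is built on the Pi framework, and uses bash tools and lightweight context management, making it particularly suited to long-horizon deep research. Its core strength is the ability to freely compose search primitives and interact with the corpus as an open research environment, which lets DCI-Agent-Lite perform well across multiple benchmarks, including but not limited to agentic search, knowledge-intensive QA, and IR ranking tasks. Experiments show that on the BrowseComp-Plus dataset, with the GPT-5.4-nano model, the agent reaches 62.9% accuracy, surpassing agentic search agents based on GPT-5.2, Claude-Sonnet-4.6, and other models.",2,"2026-06-11 03:32:13","CREATED_QUERY"]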