[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10681":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":17,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":33,"readmeContent":34,"aiSummary":35,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":36,"discoverSource":37},10681,"bRAG-langchain","bragai\u002FbRAG-langchain","bragai","Everything you need to know to build your own RAG application","https:\u002F\u002Fbragai.dev",null,"Jupyter Notebook",4112,498,42,2,0,4,18,30.09,"Other",false,"main",true,[25,26,27,28,29,30,31,32],"agentic-rag","ai","chatbot","llm","machine-learning","python","rag","retrieval-augmented-generation","2026-06-12 02:02:25","# Retrieval-Augmented Generation (RAG) Project\n\n#### 🔜 Check out [bragai.dev](https:\u002F\u002Fbragai.dev) (launching soon)\n\n---------------------\n\nThis repository contains a comprehensive exploration of Retrieval-Augmented Generation (RAG) for various applications.\nEach notebook provides a detailed, hands-on guide to setting up and experimenting with RAG from an introductory level to advanced implementations, including multi-querying and custom RAG builds.\n\n![rag_detail_v2](assets\u002Fimg\u002Frag-architecture.png)\n\n## Project Structure\n\nIf you want to jump straight in, open `full_basic_rag.ipynb`, which provides boilerplate starter code for a fully customizable RAG chatbot.\n\nMake sure to run your files in a virtual environment (see the `Getting Started` section).\n\nThe following notebooks can be found under the directory 
`notebooks\u002F`.\n\n### [1]\\_rag_setup_overview.ipynb\n\nThis introductory notebook provides an overview of RAG architecture and its foundational setup.\nThe notebook walks through: \n- **Environment Setup**: Configuring the environment, installing necessary libraries, and API setups.\n- **Initial Data Loading**: Basic document loaders and data preprocessing methods.\n- **Embedding Generation**: Generating embeddings using various models, including OpenAI's embeddings.\n- **Vector Store**: Setting up a vector store (ChromaDB\u002FPinecone) for efficient similarity search.\n- **Basic RAG Pipeline**: Creating a simple retrieval and generation pipeline to serve as a baseline.\n\n### [2]\\_rag_with_multi_query.ipynb\n\nBuilding on the basics, this notebook introduces multi-querying techniques in the RAG pipeline, exploring: \n- **Multi-Query Setup**: Configuring multiple queries to diversify retrieval.\n- **Advanced Embedding Techniques**: Utilizing multiple embedding models to refine retrieval.\n- **Pipeline with Multi-Querying**: Implementing multi-query handling to improve relevance in response generation.\n- **Comparison & Analysis**: Comparing results with single-query pipelines and analyzing performance improvements.\n\n### [3]_rag_routing_and_query_construction.ipynb\n\nThis notebook delves deeper into customizing a RAG pipeline.\nIt covers: \n- **Logical Routing:** Implements function-based routing for classifying user queries to appropriate data sources based on programming languages.\n- **Semantic Routing:** Uses embeddings and cosine similarity to direct questions to either a math or physics prompt, optimizing response accuracy.\n- **Query Structuring for Metadata Filters:** Defines structured search schema for YouTube tutorial metadata, enabling advanced filtering (e.g., by view count, publication date).\n- **Structured Search Prompting:** Leverages LLM prompts to generate database queries for retrieving relevant content based on user input.\n- 
**Integration with Vector Stores:** Links structured queries to vector stores for efficient data retrieval.\n\n\n### [4]_rag_indexing_and_advanced_retrieval.ipynb\n\nContinuing from the previous customization, this notebook explores:\n- **Preface on Document Chunking:** Points to external resources for document chunking techniques.\n- **Multi-representation Indexing:** Sets up a multi-vector indexing structure for handling documents with different embeddings and representations.\n- **In-Memory Storage for Summaries:** Uses InMemoryByteStore for storing document summaries alongside parent documents, enabling efficient retrieval.\n- **MultiVectorRetriever Setup:** Integrates multiple vector representations to retrieve relevant documents based on user queries.\n- **RAPTOR Implementation:** Explores RAPTOR, an advanced indexing and retrieval model, linking to in-depth resources.\n- **ColBERT Integration:** Demonstrates ColBERT-based token-level vector indexing and retrieval, which captures contextual meaning at a fine-grained level.\n- **Wikipedia Example with ColBERT:** Retrieves information about Hayao Miyazaki using the ColBERT retrieval model for demonstration.\n\n### [5]_rag_retrieval_and_reranking.ipynb\n\nThis final notebook brings together the RAG system components, with a focus on scalability and optimization: \n- **Document Loading and Splitting:** Loads and chunks documents for indexing, preparing them for vector storage.\n- **Multi-query Generation with RAG-Fusion:** Uses a prompt-based approach to generate multiple search queries from a single input question.\n- **Reciprocal Rank Fusion (RRF):** Implements RRF for re-ranking multiple retrieval lists, merging results for improved relevance.\n- **Retriever and RAG Chain Setup:** Constructs a retrieval chain for answering queries, using fused rankings and RAG chains to pull contextually relevant information.\n- **Cohere Re-Ranking:** Demonstrates re-ranking with Cohere’s model for additional contextual 
compression and refinement.\n- **CRAG and Self-RAG Retrieval:** Explores advanced retrieval approaches like CRAG and Self-RAG, with links to examples.\n- **Exploration of Long-Context Impact:** Links to resources explaining the impact of long-context retrieval on RAG models.\n\n## Getting Started\n\n### Pre-requisites\n\nEnsure **Python 3.11.11** (preferred) is installed on your system. Follow the platform-specific instructions below to install it if not already installed.\n\n#### macOS\n1. Install [Homebrew](https:\u002F\u002Fbrew.sh\u002F) if not already installed:\n   ```bash\n   \u002Fbin\u002Fbash -c \"$(curl -fsSL https:\u002F\u002Fraw.githubusercontent.com\u002FHomebrew\u002Finstall\u002FHEAD\u002Finstall.sh)\"\n   ```\n2. Install Python 3.11.11:\n   ```bash\n   brew install python@3.11\n   ```\n3. Verify installation:\n   ```bash\n   python3.11 --version\n   ```\n\n#### Linux\n1. Update your package manager:\n   ```bash\n   sudo apt update\n   ```\n2. Install Python 3.11.11:\n   ```bash\n   sudo apt install python3.11 python3.11-venv\n   ```\n3. Verify installation:\n   ```bash\n   python3.11 --version\n   ```\n\n#### Windows\n1. Download the Python 3.11.11 installer from [Python.org](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F).\n2. Run the installer and ensure you check the box **\"Add Python to PATH\"**.\n3. Verify installation:\n   ```cmd\n   python --version\n   ```\n---\n\n### Installation Instructions\n\n#### 1. Clone the Repository\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FbRAGAI\u002FbRAG-langchain.git\ncd bRAG-langchain\n```\n\n#### 2. Create a Virtual Environment\nUse Python 3.11.11 to create a virtual environment:\n```bash\npython3.11 -m venv venv\n```\n\nActivate the virtual environment:\n- **macOS\u002FLinux**:\n  ```bash\n  source venv\u002Fbin\u002Factivate\n  ```\n- **Windows**:\n  ```cmd\n  venv\\Scripts\\activate\n  ```\n\n#### 3. 
Verify and Fix Python Version\nIf the virtual environment defaults to a different Python version (e.g., Python 3.13):\n1. Verify the current Python version inside the virtual environment:\n   ```bash\n   python --version\n   ```\n2. Use Python 3.11 explicitly within the virtual environment:\n   ```bash\n   python3.11\n   ```\n3. Ensure the `python` command uses Python 3.11 by creating a symbolic link:\n   ```bash\n   ln -sf $(which python3.11) $(dirname $(which python))\u002Fpython\n   ```\n4. Verify the fix:\n   ```bash\n   python --version\n   ```\n\n#### 4. Install Dependencies\nInstall the required packages:\n```bash\npip install -r requirements.txt\n```\n\n---\n\n### Additional Steps\n\n#### 5. Run the Notebooks\nBegin with `[1]_rag_setup_overview.ipynb` to get familiar with the setup process. Proceed sequentially through the other notebooks:\n\n- `[1]_rag_setup_overview.ipynb`\n- `[2]_rag_with_multi_query.ipynb`\n- `[3]_rag_routing_and_query_construction.ipynb`\n- `[4]_rag_indexing_and_advanced_retrieval.ipynb`\n- `[5]_rag_retrieval_and_reranking.ipynb`\n\n#### 6. Set Up Environment Variables\n1. Duplicate the `.env.example` file in the root directory and rename it to `.env`.\n2. 
Add the following keys (replace with your actual values):\n\n   ```env\n   # LLM Model - Get key at https:\u002F\u002Fplatform.openai.com\u002Fapi-keys\n   OPENAI_API_KEY=\"your-api-key\"\n\n   # LangSmith - Get key at https:\u002F\u002Fsmith.langchain.com\n   LANGCHAIN_TRACING_V2=true\n   LANGCHAIN_ENDPOINT=\"https:\u002F\u002Fapi.smith.langchain.com\"\n   LANGCHAIN_API_KEY=\"your-api-key\"\n   LANGCHAIN_PROJECT=\"your-project-name\"\n\n   # Pinecone Vector Database - Get key at https:\u002F\u002Fapp.pinecone.io\n   PINECONE_INDEX_NAME=\"your-project-index\"\n   PINECONE_API_HOST=\"your-host-url\"\n   PINECONE_API_KEY=\"your-api-key\"\n\n   # Cohere - Get key at https:\u002F\u002Fdashboard.cohere.com\u002Fapi-keys\n   COHERE_API_KEY=your-api-key\n   ```\n\n---\n\nYou're now ready to use the project!\n\n## Usage\n\nAfter setting up the environment and running the notebooks in sequence, you can:\n\n1.  **Experiment with Retrieval-Augmented Generation**:\n    Use the foundational setup in `[1]_rag_setup_overview.ipynb` to understand the basics of RAG.\n\n2.  **Implement Multi-Querying**:\n    Learn how to improve response relevance by introducing multi-querying techniques in `[2]_rag_with_multi_query.ipynb`.\n\n## Star History\n\n\u003Ca href=\"https:\u002F\u002Fstar-history.com\u002F#bragai\u002Fbrag-langchain&Date\">\n \u003Cpicture>\n   \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=bragai\u002Fbrag-langchain&type=Date&theme=dark\" \u002F>\n   \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=bragai\u002Fbrag-langchain&type=Date\" \u002F>\n   \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=bragai\u002Fbrag-langchain&type=Date\" \u002F>\n \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n## Contact\nDo you have questions or want to collaborate? 
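Please open an issue or email Taha Ababou at taha@bragai.dev\n\n`If this project helps you, consider buying me a coffee ☕. Your support helps me keep contributing to the open-source community!`\n\u003Cp>\n    \u003Ca href=\"https:\u002F\u002Fbuymeacoffee.com\u002Fbragai\" target=\"_blank\" rel=\"noopener noreferrer\">\n        \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fsponsor-30363D?style=for-the-badge&logo=GitHub-Sponsors&logoColor=#white\" \u002F>\n    \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cbr>\n\n    The notebooks and visual diagrams were inspired by Lance Martin's LangChain Tutorial.\n\n    \n","This project is a comprehensive guide to building retrieval-augmented generation (RAG) applications. Through a series of Jupyter notebooks, it offers hands-on tutorials on RAG techniques from basic to advanced, covering core functionality such as environment configuration, data loading, embedding generation, vector store setup, and the creation of basic and advanced RAG pipelines. The project places particular emphasis on multi-query handling, logical and semantic routing, structured search prompting, and integration with vector databases. It is suited to developers and researchers who want to understand RAG in depth and build customized RAG systems hands-on, especially when they need to improve the information retrieval and generation capabilities of chatbots or other AI assistants.","2026-06-11 03:29:42","top_topic"]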