[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82302":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":35,"discoverSource":36},82302,"Epstein_Files_RAG","AbhisumatK\u002FEpstein_Files_RAG","AbhisumatK","An open-source RAG platform to explore the unsealed Jeffrey Epstein court documents.","",null,"Python",42,13,2,0,1,5,8,3,3.44,"MIT License",false,"main",true,[26,27,28,29,30,31],"epstein","epstein-documents","epstein-files","rag","rag-chatbot","rag-pipeline","2026-06-12 02:04:25","# \u003Cimg src=\"logo.png\" width=\"50\" align=\"center\"> Epstein Files RAG Explorer 🔍\n\nAn open-source Retrieval-Augmented Generation (RAG) platform to explore and analyze the unsealed Jeffrey Epstein court documents. Built with LangChain, ChromaDB, and Streamlit.\n\n![Screenshot](pic.png)\n\n## 🚀 Features\n- **Open Stack**: Fully open-source tools and models.\n- **Local & Fast**: Support for local execution via Ollama or high-speed cloud inference via Groq\u002FOpenRouter.\n- **Automated Ingestion**: Easily download and index curated parquet data from Hugging Face.\n- **Strict Guardrails**: Designed to stay strictly within the context of the investigative documents.\n\n---\n\n## 🛠️ Setup Instructions\n\n### 1. Prerequisites\n- **Python 3.10+** (Recommend using a virtual environment).\n- **Ollama** (Optional): If you want to run LLMs completely locally. 
Download at [ollama.com](https:\u002F\u002Follama.com\u002F).\n- **Windows Users**: If you encounter DLL initialization errors with TensorFlow\u002FTransformers, ensure you follow the installation steps below precisely, as the `requirements.txt` includes critical fixes for `torch` and `protobuf`.\n\n### 2. Installation\nClone the repository and install dependencies:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FAbhisumatK\u002FEpstein_Files_RAG\ncd Epstein_Files_RAG\n\n# Optional: create a virtual environment\npython -m venv venv\n.\\venv\\Scripts\\activate  # On Windows\nsource venv\u002Fbin\u002Factivate  # On macOS\u002FLinux\n\n# Install dependencies\npip install -r requirements.txt\n```\n\n### 3. Environment Configuration\nCopy `.env.example` to `.env` and configure your providers:\n```bash\ncp .env.example .env\n```\nFill in your API keys in `.env`:\n- **Groq API**: Get yours at [console.groq.com](https:\u002F\u002Fconsole.groq.com\u002F).\n- **OpenRouter API**: Get yours at [openrouter.ai](https:\u002F\u002Fopenrouter.ai\u002F).\n- **Ollama**: No key needed; just ensure it's running.\n\n### 4. Data Ingestion\nThe Epstein dataset is massive (>200 GB). By default, the ingestion script downloads only the first **0.5 GB** chunk for testing.\n```bash\npython ingest.py\n```\n- **Estimated Time**: ~3-5 minutes for the first chunk (depending on your bandwidth).\n- **How to Tweak**: Open `ingest.py` and change `num_files=1` to a higher number (e.g., `num_files=10` for ~5 GB) to index more data.\n\n### 5. Launch the Application\nStart the Streamlit dashboard:\n```bash\nstreamlit run app.py\n```\n\n---\n\n## 📊 Dataset Info\n- **Source**: [Nikity\u002FEpstein-Files](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FNikity\u002FEpstein-Files) on Hugging Face.\n- **Format**: Apache Parquet files containing extracted text from investigative files.\n- **Note**: The 0.5 GB limit (one Parquet file) is used to ensure quick setup and low memory usage. 
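The full dataset contains hundreds of thousands of documents.\n\n## 🛡️ Guardrails\nThis application includes specialized system prompts to ensure the assistant stays strictly within the investigative context. It will refuse out-of-scope requests (like general knowledge or unrelated tasks) to maintain the integrity of the analysis.\n\n## 📄 License\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n","Epstein Files RAG Explorer is an open-source Retrieval-Augmented Generation (RAG) platform for exploring and analyzing the unsealed Jeffrey Epstein court documents. Built on LangChain, ChromaDB, and Streamlit, it uses fully open-source tools and models and supports fast local execution or high-speed cloud inference via Groq\u002FOpenRouter. Its automated ingestion downloads and indexes curated Parquet data from Hugging Face, and strict guardrails keep the assistant within the scope of the investigative documents. It suits work that requires in-depth research and analysis of large collections of legal documents, such as legal research and investigative journalism.","2026-06-11 04:08:18","CREATED_QUERY"]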