[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72232":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":14,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":39,"readmeContent":40,"aiSummary":41,"trendingCount":16,"starSnapshotCount":16,"syncStatus":42,"lastSyncTime":43,"discoverSource":44},72232,"docetl","ucbepic\u002Fdocetl","ucbepic","A system for agentic LLM-powered data processing and ETL","https:\u002F\u002Fdocetl.org",null,"Python",3792,403,31,30,0,37,44,93,29.82,"MIT License",false,"main",true,[26,27,28,29,30,31,32,33,34,35,36,37,38],"agents","data","data-pipelines","document-analysis","document-processing","elt","etl","llm","python","semantic-data","unstructured-data","unstructured-data-analysis","workflow","2026-06-12 02:03:00","# 📜 DocETL: Powering Complex Document Processing Pipelines\n\n[![Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWebsite-docetl.org-blue)](https:\u002F\u002Fdocetl.org)\n[![Documentation](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDocumentation-docs-green)](https:\u002F\u002Fucbepic.github.io\u002Fdocetl)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F1285485891095236608?label=Discord&logo=discord)](https:\u002F\u002Fdiscord.gg\u002FfHp7B2X3xx)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-arXiv-red)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2410.12189)\n\n![DocETL Figure](docs\u002Fassets\u002Freadmefig.png)\n\nDocETL is a tool for creating and executing data processing pipelines, especially suited for complex document processing tasks. 
It offers:\n\n1. An interactive UI playground for iterative prompt engineering and pipeline development\n2. A Python package for running production pipelines from the command line or Python code\n\n> 💡 **Need Help Writing Your Pipeline?**  \n> You can use **Claude Code** (recommended) to help you write your pipeline—see the quickstart: https:\u002F\u002Fucbepic.github.io\u002Fdocetl\u002Fquickstart-claude-code\u002F  \n> If you’d rather use ChatGPT or the Claude app, see [docetl.org\u002Fllms.txt](https:\u002F\u002Fdocetl.org\u002Fllms.txt) for a big prompt you can copy\u002Fpaste before describing your task.\n\n\n### 🌟 Community Projects\n\n- [Conversation Generator](https:\u002F\u002Fgithub.com\u002FPassionFruits-net\u002Fdocetl-conversation)\n- [Text-to-speech](https:\u002F\u002Fgithub.com\u002FPassionFruits-net\u002Fdocetl-speaker)\n- [YouTube Transcript Topic Extraction](https:\u002F\u002Fgithub.com\u002Frajib76\u002Fdocetl_examples)\n\n### 📚 Educational Resources\n\n- [UI\u002FUX Thoughts](https:\u002F\u002Fx.com\u002Fsh_reya\u002Fstatus\u002F1846235904664273201)\n- [Using Gleaning to Improve Output Quality](https:\u002F\u002Fx.com\u002Fsh_reya\u002Fstatus\u002F1843354256335876262)\n- [Deep Dive on Resolve Operator](https:\u002F\u002Fx.com\u002Fsh_reya\u002Fstatus\u002F1840796824636121288)\n\n\n## 🚀 Getting Started\n\nThere are two main ways to use DocETL:\n\n### 1. 🎮 DocWrangler, the Interactive UI Playground (Recommended for Development)\n\n[DocWrangler](https:\u002F\u002Fdocetl.org\u002Fplayground) helps you iteratively develop your pipeline:\n- Experiment with different prompts and see results in real-time\n- Build your pipeline step by step\n- Export your finalized pipeline configuration for production use\n\n![DocWrangler](docs\u002Fassets\u002Ftutorial\u002Fone-operation.png)\n\nDocWrangler is hosted at [docetl.org\u002Fplayground](https:\u002F\u002Fdocetl.org\u002Fplayground). 
To run the playground locally, you can either:\n- Use Docker (recommended for quick start): `make docker`\n- Set up the development environment manually\n\nSee the [Playground Setup Guide](https:\u002F\u002Fucbepic.github.io\u002Fdocetl\u002Fplayground\u002F) for detailed instructions.\n\n### 2. 📦 Python Package (For Production Use)\n\nIf you want to use DocETL as a Python package:\n\n#### Prerequisites\n- Python 3.10 or later\n- OpenAI API key\n\n```bash\npip install docetl\n```\n\nCreate a `.env` file in your project directory:\n```bash\nOPENAI_API_KEY=your_api_key_here  # Required for LLM operations (or the key for the LLM of your choice)\n```\n\n> ⚠️ **Important: Two Different .env Files**\n> - **Root `.env`**: Used by the backend Python server that executes DocETL pipelines\n> - **`website\u002F.env.local`**: Used by the frontend TypeScript code in DocWrangler (UI features like improve prompt and chatbot)\n\nTo see examples of how to use DocETL, check out the [tutorial](https:\u002F\u002Fucbepic.github.io\u002Fdocetl\u002Ftutorial\u002F).\n\n### 🎮 DocWrangler Setup\n\nTo run DocWrangler locally, you have two options:\n\n#### Option A: Using Docker (Recommended for Quick Start)\n\nThe easiest way to get the DocWrangler playground running:\n\n1. 
Create the required environment files:\n\nCreate `.env` in the root directory (for the backend Python server that executes pipelines):\n```bash\nOPENAI_API_KEY=your_api_key_here  # Used by DocETL pipeline execution engine\n# BACKEND configuration\nBACKEND_ALLOW_ORIGINS=http:\u002F\u002Flocalhost:3000,http:\u002F\u002F127.0.0.1:3000\nBACKEND_HOST=localhost\nBACKEND_PORT=8000\nBACKEND_RELOAD=True\n\n# FRONTEND configuration\nFRONTEND_HOST=0.0.0.0\nFRONTEND_PORT=3000\n\n# Host port mapping for docker-compose (if not set, defaults are used in docker-compose.yml)\nFRONTEND_DOCKER_COMPOSE_PORT=3031\nBACKEND_DOCKER_COMPOSE_PORT=8081\n\n# Supported text file encodings\nTEXT_FILE_ENCODINGS=utf-8,latin1,cp1252,iso-8859-1\n```\n\nCreate `.env.local` in the `website` directory (for DocWrangler UI features like improve prompt and chatbot):\n```bash\nOPENAI_API_KEY=sk-xxx  # Used by TypeScript features: improve prompt, chatbot, etc.\nOPENAI_API_BASE=https:\u002F\u002Fapi.openai.com\u002Fv1\nMODEL_NAME=gpt-4o-mini  # Model used by the UI assistant\n\nNEXT_PUBLIC_BACKEND_HOST=localhost\nNEXT_PUBLIC_BACKEND_PORT=8000\nNEXT_PUBLIC_HOSTED_DOCWRANGLER=false\n```\n\n2. Run Docker:\n```bash\nmake docker\n```\n\nThis will:\n- Create a Docker volume for persistent data\n- Build the DocETL image\n- Run the container with the UI accessible at http:\u002F\u002Flocalhost:3000\n\nTo clean up Docker resources (note that this will delete the Docker volume):\n```bash\nmake docker-clean\n```\n\n##### AWS Bedrock\n\nThis framework supports integration with AWS Bedrock. To enable:\n\n1. Configure AWS credentials:\n```bash\naws configure\n```\n\n2. Test your AWS credentials:\n```bash\nmake test-aws\n```\n\n3. 
Run with AWS support:\n```bash\nAWS_PROFILE=your-profile AWS_REGION=your-region make docker\n```\n\nOr using Docker Compose:\n```bash\nAWS_PROFILE=your-profile AWS_REGION=your-region docker compose --profile aws up\n```\n\nEnvironment variables:\n- `AWS_PROFILE`: Your AWS CLI profile (default: 'default')\n- `AWS_REGION`: AWS region (default: 'us-west-2')\n\nBedrock models are prefixed with `bedrock`. See the LiteLLM [docs](https:\u002F\u002Fdocs.litellm.ai\u002Fdocs\u002Fproviders\u002Fbedrock#supported-aws-bedrock-models) for more details.\n\n#### Option B: Manual Setup (Development)\n\nFor development or if you prefer not to use Docker:\n\n1. Clone the repository:\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fucbepic\u002Fdocetl.git\ncd docetl\n```\n\n2. Set up environment variables in `.env` in the root\u002Ftop-level directory (for the backend Python server):\n```bash\nOPENAI_API_KEY=your_api_key_here  # Used by DocETL pipeline execution engine\n# BACKEND configuration\nBACKEND_ALLOW_ORIGINS=http:\u002F\u002Flocalhost:3000,http:\u002F\u002F127.0.0.1:3000\nBACKEND_HOST=localhost\nBACKEND_PORT=8000\nBACKEND_RELOAD=True\n\n# FRONTEND configuration\nFRONTEND_HOST=0.0.0.0\nFRONTEND_PORT=3000\n\n# Host port mapping for docker-compose (if not set, defaults are used in docker-compose.yml)\nFRONTEND_DOCKER_COMPOSE_PORT=3031\nBACKEND_DOCKER_COMPOSE_PORT=8081\n\n# Supported text file encodings\nTEXT_FILE_ENCODINGS=utf-8,latin1,cp1252,iso-8859-1\n```\n\nThen create a `.env.local` file in the `website` directory (for DocWrangler UI features):\n```bash\nOPENAI_API_KEY=sk-xxx  # Used by TypeScript features: improve prompt, chatbot, etc.\nOPENAI_API_BASE=https:\u002F\u002Fapi.openai.com\u002Fv1\nMODEL_NAME=gpt-4o-mini  # Model used by the UI assistant\n\nNEXT_PUBLIC_BACKEND_HOST=localhost\nNEXT_PUBLIC_BACKEND_PORT=8000\nNEXT_PUBLIC_HOSTED_DOCWRANGLER=false\n```\n\n3. 
Install dependencies:\n```bash\nmake install      # Install Python deps with uv and set up pre-commit\nmake install-ui   # Install UI dependencies\n```\n\nIf you prefer using uv directly instead of Make:\n```bash\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\nuv sync --all-groups --all-extras\n```\n\n\n\n4. Start the development server:\n```bash\nmake run-ui-dev\n```\n\n5. Visit http:\u002F\u002Flocalhost:3000\u002Fplayground to access the interactive UI.\n\n### 🛠️ Development Setup\n\nIf you're planning to contribute or modify DocETL, you can verify your setup by running the test suite:\n\n```bash\nmake tests-basic  # Runs basic test suite (costs \u003C $0.01 with OpenAI)\n```\n\nFor detailed documentation and tutorials, visit our [documentation](https:\u002F\u002Fucbepic.github.io\u002Fdocetl).\n","DocETL is a system for creating and executing data processing pipelines, especially suited to complex document processing tasks. Its core features include an interactive UI playground that supports iterative prompt engineering and pipeline development, and a Python package that lets users run production pipelines from the command line or Python code. Technically, DocETL uses large language models (LLMs) to enhance data processing and supports extracting semantic information from unstructured data. The tool is well suited to applications that require automated analysis and processing of large volumes of complex documents, such as legal document review and organizing scientific literature.",2,"2026-06-11 03:40:57","high_star"]