[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-10680":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":47,"readmeContent":48,"aiSummary":49,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":50,"discoverSource":51},10680,"DeepAnalyze","ruc-datalab\u002FDeepAnalyze","ruc-datalab","DeepAnalyze is the first agentic LLM for autonomous data science. 🎈Your AI data analyst: automatically analyzes large volumes of data and generates professional analysis reports in one click!","https:\u002F\u002Fruc-deepanalyze.github.io",null,"Python",4215,682,47,15,0,2,20,79,12,30.5,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46],"agent","agentic","agentic-ai","ai","ai-scientist","chatbot","data","data-analysis","data-engineering","data-science","data-visualization","database","deep-research","jupyter","llm","open-source","python","python-programming","qwen","science","2026-06-12 02:02:25","\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\"assets\u002Flogo.png\" alt=\"DeepAnalyze\" style=\"width: 60%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n# DeepAnalyze: Agentic Large Language Models for Autonomous Data 
Science\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2510.16872-b31b1b.svg?logo=arXiv)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16872)\n[![homepage](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%8C%90%20Homepage%20-DeepAnalyze%20Cases-blue.svg)](https:\u002F\u002Fruc-deepanalyze.github.io\u002F)\n[![model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Huggingface%20-DeepAnalyze--8B-orange.svg)](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B)\n[![data](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%93%9A%20Datasets%20-DataScience--Instruct--500K-darkgreen.svg)](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRUC-DataLab\u002FDataScience-Instruct-500K)\n[![star](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fruc-datalab\u002FDeepAnalyze?style=social&label=Code+Stars)](https:\u002F\u002Fgithub.com\u002Fruc-datalab\u002FDeepAnalyze)\n![Badge](https:\u002F\u002Fhitscounter.dev\u002Fapi\u002Fhit?url=https%3A%2F%2Fgithub.com%2Fruc-datalab%2FDeepAnalyze&label=Visitors&icon=graph-up&color=%23dc3545&message=&style=flat&tz=UTC)  [![wechat](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeChat-%E5%8A%A0%E5%85%A5DeepAnalyze%E4%BA%A4%E6%B5%81%E8%AE%A8%E8%AE%BA%E7%BE%A4-black?logo=wechat&logoColor=07C160)](.\u002Fassets\u002Fwechat.jpg) \n\n[![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@Brian%20Roemmele-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002FBrianRoemmele\u002Fstatus\u002F1981015483823571352) [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@Dr%20Singularity-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002FDr_Singularity\u002Fstatus\u002F1981010771338498241) [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@Gorden%20Sun-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002FGorden_Sun\u002Fstatus\u002F1980573407386423408) 
[![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@AIGCLINK-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002Faigclink\u002Fstatus\u002F1980554517126246642) [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@Python%20Developer-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002FPython_Dv\u002Fstatus\u002F1980667557318377871) [![twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F@meng%20shao-gray?logo=x&logoColor=white&labelColor=black)](https:\u002F\u002Fx.com\u002Fshao__meng\u002Fstatus\u002F1980623242114314531) \n\n\n> **Authors**: **[Shaolei Zhang](https:\u002F\u002Fzhangshaolei1998.github.io\u002F), [Ju Fan*](http:\u002F\u002Fiir.ruc.edu.cn\u002F~fanj\u002F), [Meihao Fan](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=9RTm2qoAAAAJ), [Guoliang Li](https:\u002F\u002Fdbgroup.cs.tsinghua.edu.cn\u002Fligl\u002F), [Xiaoyong Du](http:\u002F\u002Finfo.ruc.edu.cn\u002Fjsky\u002Fszdw\u002Fajxjgcx\u002Fjsjkxyjsx1\u002Fjs2\u002F7374b0a3f58045fc9543703ccea2eb9c.htm)**\n>\n> Renmin University of China, Tsinghua University\n\n\n**DeepAnalyze** is the first agentic LLM for autonomous data science. 
It can autonomously complete a wide range of data-centric tasks without human intervention, supporting:\n- 🛠 **Entire data science pipeline**: Automatically perform any data science tasks such as data preparation, analysis, modeling, visualization, and report generation.\n- 🔍 **Open-ended data research**: Conduct deep research on diverse data sources, including structured data (Databases, CSV, Excel), semi-structured data (JSON, XML, YAML), and unstructured data (TXT, Markdown), and finally produce analyst-grade research reports.\n- 📊 **Fully open-source**: The [model](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B), [code](https:\u002F\u002Fgithub.com\u002Fruc-datalab\u002FDeepAnalyze), [training data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRUC-DataLab\u002FDataScience-Instruct-500K), and [demo](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B) of DeepAnalyze are all open-sourced, allowing you to deploy or extend your own data analysis assistant.\n\n\u003Cp align=\"center\" width=\"100%\">\n\u003Cimg src=\".\u002Fassets\u002Fdeepanalyze.jpg\" alt=\"deepanalyze\" style=\"width: 70%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\n\n## 🔥 News\n\n- **[2026.03.16]**: Update DeepAnalyze **WebUI v2**, featuring a smoother UI, support for the **HeyWhale API**, and support for **Docker-based sandboxed code execution**. 
See the [README](.\u002Fdemo\u002Fchat_v2\u002FREADME.md) for more details.\n\n- **[2026.01.31]**: 🎉🎉🎉DeepAnalyze served as the official agent supporting the **[2026年(第19届)中国大学生计算机设计大赛大数据主题赛 (2026 (19th) China Collegiate Computer Design Contest – Big Data Track)](https:\u002F\u002Fjsjds.dhu.edu.cn\u002F2025\u002F0322\u002Fc20379a371447\u002Fpage.htm)**.\n\n- **[2025.12.28] ANNOUNCEMENT: DeepAnalyze API Keys Are Now Available 🎉🎉🎉**  You can now apply for your API key via this [Google Form](https:\u002F\u002Fforms.gle\u002FYxVkCzczqq8jeciw9) or this [Feishu Form](https:\u002F\u002Fheywhale.feishu.cn\u002Fshare\u002Fbase\u002FshrcnnBRgO0x2qhx40yq4m1HxUg). For full details and usage instructions, please refer to the [Guide](.\u002Fdocs\u002FDeepAnalyze_API_Key_Usage_Guide.md) or the [Feishu Wiki](https:\u002F\u002Fheywhale.feishu.cn\u002Fwiki\u002FTcVmw314liwCiKkxnttc2CnInfg).\n\n\n- **[2025.11.13]**: DeepAnalyze now supports OpenAI-style API endpoints and is accessible through the Command Line Terminal UI. Thanks to the contributor [@LIUyizheSDU](https:\u002F\u002Fgithub.com\u002FLIUyizheSDU\u002F).\n\n- **[2025.11.08]**: DeepAnalyze is now accessible through the JupyterUI, built on [jupyter-mcp-server](https:\u002F\u002Fgithub.com\u002Fdatalayer\u002Fjupyter-mcp-server). Thanks to the contributor [@ChengJiale150](https:\u002F\u002Fgithub.com\u002FChengJiale150).\n\n- **[2025.10.28]**: We welcome all contributions, including improving DeepAnalyze and sharing use cases (see [`CONTRIBUTION.md`](CONTRIBUTION.md)). 
All merged PRs will be listed as contributors.\n\n- **[2025.10.27]**: DeepAnalyze has attracted widespread attention, gaining **1K+** GitHub stars and **200K+** Twitter views within a week.\n\n- **[2025.10.21]**: DeepAnalyze's [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16872), [code](https:\u002F\u002Fgithub.com\u002Fruc-datalab\u002FDeepAnalyze), [model](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B), [training data](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRUC-DataLab\u002FDataScience-Instruct-500K) are released!\n\n## 🖥 Demo\n\n### WebUI\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F04184975-7ee7-4ae0-8761-7a7550c5c8fe\n\u003Cp align=\"center\" width=\"100%\">\nUpload the data, DeepAnalyze can perform data-oriented deep research 🔍 and any data-centric tasks 🛠\n\u003C\u002Fp>\n\n- Clone this repo and download [DeepAnalyze-8B](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B).\n- Deploy DeepAnalyze-8B via vllm: `vllm serve DeepAnalyze-8B`\n- Run these scripts to launch the API and interface, and then interact through the browser (http:\u002F\u002Flocalhost:4000):\n    ```bash\n    cd demo\u002Fchat\u002Ffrontend\n    npm install\n    cd ..\n    bash start.sh\n    \n    # stop the api and interface\n    bash stop.sh\n    \n    # stop vllm if needed\n    ```\n- If you want to deploy under a specific IP, please replace localhost with your IP address in [.\u002Fdemo\u002Fchat\u002Fbackend.py](.\u002Fdemo\u002Fchat\u002Fbackend.py) and [.\u002Fdemo\u002Fchat\u002Ffrontend\u002Flib\u002Fconfig.ts](.\u002Fdemo\u002Fchat\u002Ffrontend\u002Flib\u002Fconfig.ts)\n\n### WebUI v2\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F2dd1d2aa-6fb9-4202-bc8d-cbe874844725\n\u003Cp align=\"center\" width=\"100%\">\nUpload the data, DeepAnalyze can perform data-oriented deep research 🔍 and any data-centric tasks 🛠\n\u003C\u002Fp> \n\n- A more streamlined UI\n- Added support 
for HeyWhale API keys\n- Added support for a Docker-based sandbox code execution environment.\n- Usage is the same as for the WebUI.\n\n    ```bash\n    cd demo\u002Fchat_v2\u002Ffrontend  \n    npm install\n    cd ..\n    cp .env.example .env \n    bash start.sh\n    # stop the api and interface\n    bash stop.sh\n    \n    # stop vllm if needed\n    ```\n\n### JupyterUI\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fa2335f45-be0e-4787-a4c1-e93192891c5f\n\u003Cp align=\"center\" width=\"100%\">\nFamiliar with Jupyter Notebook? Try DeepAnalyze through the JupyterUI!\n\u003C\u002Fp>\n\n- This demo uses JupyterLab as the frontend: it creates a new notebook, converts `\u003CAnalyze|Understand|Answer>` to Markdown cells, converts `\u003CCode>` to code cells, and executes them as `\u003CExecute>`.\n- See [demo\u002Fjupyter](.\u002Fdemo\u002Fjupyter) for details and to try it out!\n- 👏Thanks a lot to the contributor [@ChengJiale150](https:\u002F\u002Fgithub.com\u002FChengJiale150).\n\n### CLI\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F018acae5-b979-4143-ae1e-5b74da453c1d\n\u003Cp align=\"center\" width=\"100%\">\nTry DeepAnalyze through the command-line interface\n\u003C\u002Fp>\n\n- Deploy DeepAnalyze-8B via vllm: `vllm serve DeepAnalyze-8B`\n\n- Start the API server and launch the CLI interface:\n    ```bash\n    cd API\n    python start_server.py  # In one terminal\n    \n    cd demo\u002Fcli\n    python api_cli.py       # In another terminal (English)\n    # or\n    python api_cli_ZH.py    # In another terminal (Chinese)\n    ```\n    \n- The CLI provides a polished, Rich-based interface with file upload support and real-time streaming responses.\n\n- Both English and Chinese interfaces are supported.\n\n    \n\n> [!TIP]\n>\n> Clone this repository to deploy DeepAnalyze locally as your data analyst and complete data science tasks without any predefined workflow or closed-source APIs.\n>\n> 🔥 The UI of the demo is an initial version. 
You are welcome to develop it further, and we will list you as a contributor.\n\n## 🚀 Quick Start\n\n### 🔑 **Use the DeepAnalyze API**\n\n**API keys are now available!**\n\nTo request your key, please fill out one of the following application forms:\n*   **[Primary Form (Google)](https:\u002F\u002Fforms.gle\u002FYxVkCzczqq8jeciw9)**\n*   **[Alternative Form (Feishu)](https:\u002F\u002Fheywhale.feishu.cn\u002Fshare\u002Fbase\u002FshrcnnBRgO0x2qhx40yq4m1HxUg)**\n\n**📚 For comprehensive usage instructions, please refer to the API guide:**\n\n*   **[Documentation](.\u002Fdocs\u002FDeepAnalyze_API_Key_Usage_Guide.md)**\n*   **[Feishu Wiki](https:\u002F\u002Fheywhale.feishu.cn\u002Fwiki\u002FTcVmw314liwCiKkxnttc2CnInfg)**\n\n\n\n### Model Download\n\nDownload the model from [RUC-DataLab\u002FDeepAnalyze-8B · Hugging Face](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B) or [DeepAnalyze-8B · ModelScope](https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FRUC-DataLab\u002FDeepAnalyze-8B\u002Fsummary).\n\n#### 📊 Recommended Parameters by GPU Memory\n\n| GPU Memory | Model Type | Recommended max-model-len | Use FP8 KV Cache |\n|------------|------------|--------------------------|-----------------------|\n| **16GB** | 8-bit Quantized | 8192 | ✓ |\n| **16GB** | 4-bit Quantized | 49152 | ✓ |\n| **24GB** | Original Model | 16384 | ✓ |\n| **24GB** | 8-bit Quantized | 98304 | ✓ |\n| **24GB** | 4-bit Quantized | 131072 | ✓ |\n| **40GB** | Original Model | 131072 | ✓ |\n| **40GB** | 8-bit Quantized | 131072 |  |\n| **80GB** | Original Model | 131072 |  |\n\nTo obtain a quantized model, you can use `.\u002Fquantize.py`.\n\n#### 🚀 vLLM Launch Command Template\n\n##### General Command Template\n```bash\npython -m vllm.entrypoints.openai.api_server \\\n  --model \u003Cmodel_path> \\\n  --served-model-name DeepAnalyze-8B \\\n  --max-model-len \u003Cselect_from_table_above> \\\n  --gpu-memory-utilization 0.95 \\\n  --port 8000 \\\n  \u003Cadd_fp8_if_required> \\\n  
--trust-remote-code\n```\n\n##### Command Examples by Scenario\n\n**Scenario 1: 16GB GPU Memory Users (Recommended: 4-bit Quantized Version)**\n\n```bash\npython -m vllm.entrypoints.openai.api_server \\\n  --model \u002Fpath\u002Fto\u002Fdeepanalyze\u002F4bit \\\n  --served-model-name DeepAnalyze-8B \\\n  --max-model-len 49152 \\\n  --gpu-memory-utilization 0.95 \\\n  --port 8000 \\\n  --kv-cache-dtype fp8 \\\n  --trust-remote-code\n```\n\n**Scenario 2: 24GB GPU Memory Users (For Maximum Context Length)**\n\n```bash\npython -m vllm.entrypoints.openai.api_server \\\n  --model \u002Fpath\u002Fto\u002Fdeepanalyze\u002F4bit \\\n  --served-model-name DeepAnalyze-8B \\\n  --max-model-len 131072 \\\n  --gpu-memory-utilization 0.95 \\\n  --port 8000 \\\n  --kv-cache-dtype fp8 \\\n  --trust-remote-code\n```\n\n**Scenario 3: 80GB GPU Memory Users (Best Performance)**\n\n```bash\npython -m vllm.entrypoints.openai.api_server \\\n  --model \u002Fpath\u002Fto\u002Foriginal\u002Fmodel \\\n  --served-model-name DeepAnalyze-8B \\\n  --max-model-len 131072 \\\n  --gpu-memory-utilization 0.95 \\\n  --port 8000 \\\n  --trust-remote-code\n```\n\n#### Quick Selection Guide\n\n- **Limited Memory (\u003C24GB)**: Use 4-bit Quantized Version + FP8 KV Cache\n- **Balanced Configuration (24-40GB)**: Choose model type based on requirements\n- **Sufficient Memory (≥40GB)**: Use Original Model for best precision\n\nAfter launching, the API service can be accessed via `http:\u002F\u002Flocalhost:8000\u002Fv1\u002Fcompletions`.\n\n### Requirements\n\n- Install packages: `torch`, `transformers`, `vllm>=0.8.5`\n    ```bash\n    conda create -n deepanalyze python=3.12 -y\n    conda activate deepanalyze\n    pip install -r requirements.txt\n    \n    # For training\n    (cd .\u002Fdeepanalyze\u002Fms-swift\u002F && pip install -e .)\n    (cd .\u002Fdeepanalyze\u002FSkyRL\u002F && pip install -e .)\n    ```\n- [`requirements.txt`](requirements.txt) lists the minimal dependencies required for DeepAnalyze 
inference.\nFor training, please refer to [`.\u002Fdeepanalyze\u002Fms-swift\u002Frequirements.txt`](.\u002Fdeepanalyze\u002Fms-swift\u002Frequirements.txt) and [`.\u002Fdeepanalyze\u002FSkyRL\u002Fpyproject.toml`](.\u002Fdeepanalyze\u002FSkyRL\u002Fpyproject.toml)\n- We recommend separating the inference and training environments to avoid dependency conflicts.\n\n### Command Interaction\n\n- Deploy DeepAnalyze-8B via vllm: `vllm serve DeepAnalyze-8B`\n\n- Run these scripts for any data science tasks:\n  - You can specify **any data science tasks**, including specific data tasks and open-ended data research.\n  - You can specify **any number of data sources**, and DeepAnalyze will automatically explore them.\n  - You can specify **any type of data sources**, e.g., structured data (Databases, CSV, Excel), semi-structured data (JSON, XML, YAML), and unstructured data (TXT, Markdown)\n\n  ```python\n  from deepanalyze import DeepAnalyzeVLLM\n  \n  prompt = \"\"\"# Instruction\n  Generate a data science report.\n  \n  # Data\n  File 1: {\"name\": \"bool.xlsx\", \"size\": \"4.8KB\"}\n  File 2: {\"name\": \"person.csv\", \"size\": \"10.6KB\"}\n  File 3: {\"name\": \"disabled.xlsx\", \"size\": \"5.6KB\"}\n  File 4: {\"name\": \"enlist.csv\", \"size\": \"6.7KB\"}\n  File 5: {\"name\": \"filed_for_bankrupcy.csv\", \"size\": \"1.0KB\"}\n  File 6: {\"name\": \"longest_absense_from_school.xlsx\", \"size\": \"16.0KB\"}\n  File 7: {\"name\": \"male.xlsx\", \"size\": \"8.8KB\"}\n  File 8: {\"name\": \"no_payment_due.xlsx\", \"size\": \"15.6KB\"}\n  File 9: {\"name\": \"unemployed.xlsx\", \"size\": \"5.6KB\"}\n  File 10: {\"name\": \"enrolled.csv\", \"size\": \"20.4KB\"}\"\"\"\n  \n  workspace = \"\u002Fhome\u002Fu2023000922\u002Fzhangshaolei\u002Fdeepanalyze_public\u002FDeepAnalyze\u002Fexample\u002Fanalysis_on_student_loan\u002F\"\n  \n  deepanalyze = DeepAnalyzeVLLM(\n      \"\u002Ffs\u002Ffast\u002Fu2023000922\u002Fzhangshaolei\u002Fcheckpoints\u002Fdeepanalyze-8b\u002F\"\n  
)\n  answer = deepanalyze.generate(prompt, workspace=workspace)\n  print(answer[\"reasoning\"])\n  ```\n  You should get a deep research report, which can be rendered as a PDF:\n  ```text\n  # Comprehensive Analysis of Student Enrollment Patterns and Institutional Transfers\n  \n  ## Introduction and Research Context\n  \n  The analysis of student enrollment patterns represents a critical area of educational research with significant implications for institutional planning, resource allocation, and student support services. This comprehensive study examines a comprehensive dataset encompassing 1,194 enrollment records across six educational institutions, merged with supplementary demographic, financial, and employment status data. The research employs advanced analytical techniques including network analysis, predictive modeling, and temporal pattern recognition to uncover both macro-level institutional trends and micro-level student mobility patterns. The dataset's longitudinal nature, spanning fifteen months of enrollment records, provides unique insights into the complex dynamics of student pathways through higher education systems.\n  \n  Our methodological approach combines quantitative analysis of enrollment durations, transfer probabilities, and financial indicators with qualitative ...\n  \n  The research contributes to the growing body of literature on student mobility by providing empirical evidence of institutional transfer networks and their relationship to student outcomes...\n  .....\n  ```\n  \u003Cp align=\"center\" width=\"100%\">\n    \u003Cimg src=\".\u002Fassets\u002Freport.png\" alt=\"deepanalyze\" style=\"width: 100%; min-width: 300px; display: block; margin: auto;\">\n  \u003C\u002Fp>\n\n  > For more examples and task completion details, please refer to [DeepAnalyze's homepage](https:\u002F\u002Fruc-deepanalyze.github.io\u002F).\n\n### API\n- You can build an OpenAI-style API using this script (note to change `MODEL_PATH = 
\"DeepAnalyze-8B\"` in [API\u002Fconfig.py](API\u002Fconfig.py) to your vllm model name):\n\n  ```\n  python API\u002Fstart_server.py\n  ```\n\n- API usage :\n\n  ```\n  FILE_RESPONSE=$(curl -s -X POST \"http:\u002F\u002Flocalhost:8200\u002Fv1\u002Ffiles\" \\\n      -F \"file=@data.csv\" \\\n      -F \"purpose=file-extract\")\n  \n  FILE_ID=$(echo $FILE_RESPONSE | jq -r '.id')\n  \n  curl -X POST http:\u002F\u002Flocalhost:8200\u002Fv1\u002Fchat\u002Fcompletions \\\n       -H \"Content-Type: application\u002Fjson\" \\\n       -d \"{\n          \\\"model\\\": \\\"DeepAnalyze-8B\\\",\n          \\\"messages\\\": [\n            {\n              \\\"role\\\": \\\"user\\\",\n              \\\"content\\\": \\\"Generate a data science report.\\\",\n              \\\"file_ids\\\": [\\\"$FILE_ID\\\"]\n            }\n          ]\n        }\"\n  # wait for a while\n  ```\n  \n- Refer to API\u002FREADME.md for details.\n\n## 🎈 Develop Your Own DeepAnalyze\n\n### 1. Download Model and Training Data\n- Download [DeepSeek-R1-0528-Qwen3-8B](https:\u002F\u002Fhuggingface.co\u002Fdeepseek-ai\u002FDeepSeek-R1-0528-Qwen3-8B). Or you can directly finetune based on [DeepAnalyze-8B](https:\u002F\u002Fhuggingface.co\u002FRUC-DataLab\u002FDeepAnalyze-8B).\n\n  - If you use DeepSeek-R1-0528-Qwen3-8B as the base model, you should add the special tokens, using:\n\n    ```shell\n    MODEL_PATH=path_to_DeepSeek-R1-0528-Qwen3-8B\n    SAVE_PATH=path_to_save_DeepSeek-R1-0528-Qwen3-8B-addvocab\n    \n    python deepanalyze\u002Fadd_vocab.py \\\n      --model_path \"$MODEL_PATH\" \\\n      --save_path \"$SAVE_PATH\" \\\n      --add_tags\n    ```\n\n- Download training data [DataScience-Instruct-500K](https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002FRUC-DataLab\u002FDataScience-Instruct-500K).\n  \n  - unzip `DataScience-Instruct-500K\u002FRL\u002Fdata.zip`\n\n\n### 2. 
Curriculum-based Agentic Training\n- Single-ability Fine-tuning: [.\u002Fscripts\u002Fsingle.sh](.\u002Fscripts\u002Fsingle.sh)\n- Multi-ability Agentic Training (cold start): [.\u002Fscripts\u002Fmulti_coldstart.sh](.\u002Fscripts\u002Fmulti_coldstart.sh)\n- Multi-ability Agentic Training (RL): [.\u002Fscripts\u002Fmulti_rl.sh](.\u002Fscripts\u002Fmulti_rl.sh)\n\n### 3. Evaluation\n- We have unified the evaluation of most existing data science benchmarks using vLLM (with more being continuously added...). You can follow the instructions in [.\u002Fplayground](.\u002Fplayground) to quickly evaluate DeepAnalyze or your own agent.\n\n\n## 👏 Contribution\n> We welcome all forms of contributions, and authors of merged PRs will be listed as contributors. \n### Contribution on Code and Model\n\n- We welcome all forms of contributions to DeepAnalyze's code, model and UI, such as Docker packaging, DeepAnalyze model conversion and quantization, and submitting DeepAnalyze workflows based on closed-source LLMs. \n- You can submit a pull request directly.\n- Please refer to the [Developer Guides](https:\u002F\u002Fmatchbench.github.io\u002Fmd_file\u002FDeveloperGuides.html) for contribution guidelines.\n\n### Contribution on Case Study\n\n- We also especially encourage you to share your use cases and feedback when using DeepAnalyze; these are extremely valuable for helping us improve DeepAnalyze.\n- You can place your use cases in a new folder under [`.\u002Fexample\u002F`](.\u002Fexample\u002F). We recommend following the folder structure of [`.\u002Fexample\u002Fanalysis_on_student_loan\u002F`](.\u002Fexample\u002Fanalysis_on_student_loan\u002F), which includes three parts:\n    - `data\u002F`: stores the uploaded files\n    - `prompt.txt`: input instructions\n    - `README.md`: documentation. 
We suggest including the input, DeepAnalyze’s output, outputs from other closed-source LLMs (optional), and your evaluation\u002Fcomments of the case.\n- DeepAnalyze only has 8B parameters, so we also welcome examples where DeepAnalyze performs slightly worse than the closed-source LLMs — this will help us improve DeepAnalyze.\n\n## 🤝 Acknowledgement\n\n- **Training Frameworks:** [ms-swift](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002Fms-swift), [SkyRL](https:\u002F\u002Fgithub.com\u002FNovaSky-AI\u002FSkyRL)\n\n- **Sources of Training Data:** [Reasoning-Table](https:\u002F\u002Fgithub.com\u002FMJinXiang\u002FReasoning-Table), [Spider](https:\u002F\u002Fyale-lily.github.io\u002Fspider), [BIRD](https:\u002F\u002Fbird-bench.github.io\u002F), [DABStep](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Fdabstep)\n\n - **API Key & Related Services: HeyWhale Community** .\n\n   **HeyWhale Community (www.heywhale.com) is a world-leading Chinese hands-on AI community. By providing massive data resources, practical cases, learning materials, and a wide range of AI training activities, it brings together nearly one million AI practitioners and enthusiasts to share insights, exchange ideas, collaborate, and rapidly advance their skills through practice.**\n\n\n## 🖋 Citation\n\nIf this repository is useful for you, please cite as:\n\n```\n@misc{deepanalyze,\n      title={DeepAnalyze: Agentic Large Language Models for Autonomous Data Science}, \n      author={Shaolei Zhang and Ju Fan and Meihao Fan and Guoliang Li and Xiaoyong Du},\n      year={2025},\n      eprint={2510.16872},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2510.16872}, \n}\n```\n\nIf you have any questions, please feel free to submit an issue or contact `zhangshaolei98@ruc.edu.cn`.\n\n## 🌟 Misc\n\nWelcome to join the [DeepAnalyze WeChat group](.\u002Fassets\u002Fwechat.jpg), chat and share ideas with others!\n\n\u003Cp align=\"left\" 
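width=\"100%\">\n\u003Cimg src=\".\u002Fassets\u002Fwechat2.jpg\" alt=\"DeepAnalyze\" style=\"width: 35%; min-width: 300px; display: block; margin: auto;\">\n\u003C\u002Fp>\n\nIf you like DeepAnalyze, give it a GitHub Star ⭐. \n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=ruc-datalab\u002FDeepAnalyze&type=date&legend=top-left)](https:\u002F\u002Fwww.star-history.com\u002F#ruc-datalab\u002FDeepAnalyze&type=date&legend=top-left)\n","DeepAnalyze is an agentic large language model designed for autonomous data science. It automatically completes the entire data science pipeline, from data preparation, analysis, and modeling to visualization and report generation, without human intervention. Written in Python, the project offers strong data processing and intelligent analysis capabilities, supports a variety of data science tasks, and can generate professional data analysis reports in one click. DeepAnalyze suits scenarios that require efficiently processing large volumes of data and automatically producing analysis results, such as business intelligence and scientific research data analysis.","2026-06-11 03:29:42","top_topic"]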