[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80109":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":34,"readmeContent":35,"aiSummary":36,"trendingCount":15,"starSnapshotCount":15,"syncStatus":13,"lastSyncTime":37,"discoverSource":38},80109,"EDAgent","Varn1t\u002FEDAgent","Varn1t","Multi-agent exploratory data analysis system with autonomous insights, visualization, preprocessing, and reporting workflows.","",null,"Python",71,2,15,0,1,16,40.53,false,"main",true,[23,24,25,26,27,28,29,30,31,32,33],"agentic-ai","ai","automation","data-analysis","eda","llm","machine-learning","multi-agent","pandas","python","visualization","2026-06-12 04:01:26","\u003Cdiv align=\"center\">\n\n# EDAgent\n\nYour personal Exploratory Data Analyst + AI Agent\n\n**Drop in a CSV. Get a complete EDA — automatically.**\n\u003Cimg width=\"1878\" height=\"867\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F5919cf06-af14-452a-90ad-ba3caaf27906\" \u002F>\n\n\nAn agentic, LLM-powered Exploratory Data Analysis pipeline built with LangGraph and Ollama.  
\nNine specialized AI agents analyze your dataset, then generate a polished HTML report — all running **100% locally**.\n\n![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-blue?style=flat-square&logo=python)\n![LangGraph](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLangGraph-agentic-blueviolet?style=flat-square)\n![Ollama](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FOllama-local%20LLM-orange?style=flat-square)\n\n\u003C\u002Fdiv>\n\n---\n\n## What it does\n\nYou give it a CSV. It spins up a **9-stage LangGraph pipeline** where each node is an AI agent that analyzes a different aspect of your data, writes a summary, and passes its findings to the next stage. At the end, you get:\n\n- An **interactive Streamlit dashboard** with tabbed results and live progress\n- A **rich, color-coded terminal output** (if run via CLI)\n- A **self-contained `report.html`** — dark-themed, browser-ready, heatmap embedded inline\n- A **`correlation_heatmap.png`** saved to `output\u002F`\n\u003Cimg width=\"560\" height=\"605\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F96cc9812-567a-40b4-9086-2119cd6626de\" \u002F>\n\n---\n\n## Pipeline\n\n```\nschema → quality → stats → outliers → correlation → importance → synthesis → model_rec → feature_eng\n```\n\nEach node runs a Python analysis tool first, then passes the raw result to the LLM to reason over and summarize in plain English.\n\n**Architecture Note:** All Python computations (correlation matrices, outlier IQR bounds, descriptive stats) run on the **full dataset** to ensure statistical accuracy. 
However, the data passed to the LLM is intentionally capped (e.g., only the top 20 strongest correlations or top 15 outlier features) to prevent context window overload and hallucination.\n\n| # | Agent | What it analyzes |\n|---|---|---|\n| 1 | **Schema** | Shape, column types, null counts per column |\n| 2 | **Quality** | Duplicates, missing value %, columns with nulls |\n| 3 | **Statistics** | Descriptive stats, skewness, categorical value counts |\n| 4 | **Outliers** | IQR-based detection — count, bounds, example values |\n| 5 | **Correlation** | Pearson matrix, multicollinearity flags, heatmap |\n| 6 | **Feature Importance** | Target-aware correlation ranking (powered by robust multi-tiered target detection), falling back to variance (numeric) and entropy (categorical) ranking |\n| 7 | **Synthesis** | Full EDA narrative — overview, issues, patterns, recommendations |\n| 8 | **Model Recommendation** | Infers problem type, recommends models, flags uncertainty, suggests metrics |\n| 9 | **Feature Engineering** | Suggests concrete new features: log transforms, bins, interactions, encodings |\n\n**Note:** Feature engineering code is executed in a restricted sandbox (`__builtins__` stripped, only `pd`, `np`, and `df` exposed) with a 2-attempt reflection loop that feeds errors back to the LLM for self-correction. The pipeline uses robust multi-tiered target detection to completely hide the target variable from the LLM prompt and sandbox to prevent data leakage, and aggressively checks newly generated features to reject trivial copies (correlation > 0.999) of existing features.\n\n\u003Cimg width=\"482\" height=\"283\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002Fb716f1f9-e3d6-4cd3-b621-0e239cf0036f\" \u002F>\n\u003Cimg width=\"1070\" height=\"352\" alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F722b1a1c-5b75-4838-bedd-9df220b3f19d\" \u002F>\n\n\n---\n\n## Quickstart\n\n### 1. 
Prerequisites\n\nInstall [Ollama](https:\u002F\u002Follama.com) and pull the model:\n```bash\nollama pull llama3.2\n```\n\n### 2. Install dependencies\n```bash\npip install -r requirements.txt\n```\n\n### 3. Run the Dashboard\n```bash\nstreamlit run app.py\n```\nThis will open the EDAgent web dashboard in your browser. Just drag and drop your CSV into the upload area!\n\n#### Running in CLI (Alternative)\nIf you prefer the terminal, you can run the pipeline directly:\n```bash\n# On your own dataset\npython pipeline.py your_dataset.csv\n\n# With built-in test data\npython pipeline.py\n```\n\n---\n\n## Example terminal output\n\n```\n┌──────────────────────────────────────────────────────────────────┐\n│ EDAgent Pipeline                                                 │\n│ Dataset: Teen_Mental_Health_Dataset.csv  Rows: 1200  Cols: 13   │\n└──────────────────────────────────────────────────────────────────┘\n\n  [ schema ]       Running schema agent...\n  [ quality ]      Running quality agent...\n  [ stats ]        Running stats agent...\n  [ outliers ]     Running outlier agent...\n  [ correlation ]  Running correlation agent...\n  [ importance ]   Running feature importance agent...\n  [ synthesis ]    Running synthesis agent...\n  [ model-rec ]    Running model recommendation agent...\n  [ feature-eng ]  Running feature engineering agent...\n\n┌─────────────────────────────────────────────────────────────────┐\n│ Done!                                                           
│\n│ HTML Report → output\u002Freport.html                                │\n│ Heatmap     → output\u002Fcorrelation_heatmap.png                    │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n---\n\n## Tech Stack\n\n| Tool | Role |\n|---|---|\n| [Streamlit](https:\u002F\u002Fstreamlit.io\u002F) | Interactive web dashboard |\n| [LangGraph](https:\u002F\u002Fgithub.com\u002Flangchain-ai\u002Flanggraph) | Agent orchestration & state management |\n| [LangChain + Ollama](https:\u002F\u002Fpython.langchain.com\u002F) | Local LLM inference |\n| [Llama 3.2](https:\u002F\u002Follama.com\u002Flibrary\u002Fllama3.2) | The underlying language model |\n| [pandas](https:\u002F\u002Fpandas.pydata.org\u002F) | Data analysis |\n| [seaborn \u002F matplotlib](https:\u002F\u002Fseaborn.pydata.org\u002F) | Correlation heatmap |\n| [scipy](https:\u002F\u002Fscipy.org\u002F) | Entropy calculation for feature importance |\n| [rich](https:\u002F\u002Frich.readthedocs.io\u002F) | Terminal UI |\n\n---\n\n## Project Structure\n\n```\nEDAgent\u002F\n├── app.py               # Streamlit web dashboard\n├── pipeline.py          # Full LangGraph pipeline (all 9 agents)\n├── requirements.txt\n├── README.md\n└── output\u002F              # Auto-generated, git-ignored\n    ├── report.html\n    └── correlation_heatmap.png\n```\n\n---\n\n## Why local?\n\nNo API keys. No data sent to the cloud. Everything runs on your machine via Ollama. Swap `llama3.2` for any other Ollama-compatible model by changing one line in `pipeline.py`:\n\n```python\nllm = ChatOllama(model=\"your-model-here\")\n```\n\n---\n\nCreated by Varnit :)\n","EDAgent is a multi-agent exploratory data analysis system that automatically provides data insights, visualization, preprocessing, and reporting workflows. Its core functionality is to analyze a CSV file with nine specialized AI agents and generate an interactive Streamlit dashboard, rich terminal output, and a self-contained HTML report. It is written in Python 3.10+ and combines LangGraph and Ollama into an LLM-driven pipeline that runs entirely locally. Each node performs a specific analysis task (such as data quality checks, descriptive statistics, or outlier detection) and passes its results to the next stage. At the end, users receive a detailed EDA report along with model recommendations. It suits situations that call for quick, high-quality data insights, especially when first exploring and preparing a dataset.","2026-06-11 03:59:16","CREATED_QUERY"]