[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80147":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":12,"stars30d":14,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":12,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":15,"fork":15,"defaultBranch":16,"hasWiki":17,"hasPages":15,"topics":18,"createdAt":9,"pushedAt":9,"updatedAt":19,"readmeContent":20,"aiSummary":21,"trendingCount":12,"starSnapshotCount":12,"syncStatus":22,"lastSyncTime":23,"discoverSource":24},80147,"truck-delay-prediction","Keerthik1622\u002Ftruck-delay-prediction","Keerthik1622","End-to-end Machine Learning pipeline for Truck Delay Prediction using XGBoost, Flask API, MLflow, and Lightning AI deployment.",null,"Python",53,0,52,1,false,"main",true,[],"2026-06-12 02:03:58","# 🚚 Truck Delay Prediction — End-to-End ML Pipeline\n\n> A production-grade machine learning pipeline that predicts truck shipment delays,\n> built for deployment on Lightning.ai with a Flask REST API.\n\n---\n\n## 🏗️ Architecture\n\n```\nMySQL DB ──┐\n           ├─→ ETL Pipeline ─→ Feature Engineering ─→ Model Training ─→ Flask API\nPostgres ──┘                                           (RF \u002F XGB \u002F LGBM)\n```\n\n## 📁 Project Structure\n\n```\ntruck_delay_ml\u002F\n├── config.yaml                  # Central config (no hardcoded values)\n├── run_pipeline.py              # One command: ETL + Training\n├── requirements.txt\n├── .env.example                 # Secret template (never commit .env!)\n│\n├── ml_pipeline\u002F\n│   ├── etl\u002F\n│   │   ├── db_connector.py      # MySQL + PostgreSQL connections + mock data\n│   │   ├── extractor.py         # Extract & merge from both DBs\n│   │   ├── transformer.py       # Feature engineering & cleaning\n│   │   └── 
loader.py            # Save\u002Fload parquet files\n│   ├── modeling\u002F\n│   │   └── trainer.py           # Multi-model training + MLflow tracking\n│   └── utils\u002F\n│       ├── config_loader.py     # YAML + env var loader\n│       └── logger.py            # Rotating file + console logger\n│\n├── deployment\u002F\n│   └── flask_app.py             # REST API with \u002Fpredict and \u002Fpredict\u002Fbatch\n│\n└── tests\u002F\n    └── test_pipeline.py         # pytest unit tests\n```\n\n## 🚀 Quick Start on Lightning.ai\n\n### 1. Clone & setup\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FYOUR_USERNAME\u002Ftruck_delay_ml.git\ncd truck_delay_ml\npip install -r requirements.txt\n```\n\n### 2. Configure environment\n\n```bash\ncp .env.example .env\n# Edit .env with your DB credentials\n# Or set MOCK_DATA=true to skip DB and use synthetic data\n```\n\n### 3. Run the full pipeline\n\n```bash\n# With mock data (no database needed):\npython run_pipeline.py --mock\n\n# With real databases:\npython run_pipeline.py\n```\n\n### 4. Start the Flask API\n\n```bash\npython deployment\u002Fflask_app.py\n```\n\n### 5. 
Test the API\n\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:5000\u002Fpredict \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\n    \"distance_km\": 850,\n    \"truck_type\": \"Large\",\n    \"truck_age_years\": 9,\n    \"driver_experience\": 2,\n    \"cargo_weight_kg\": 15000,\n    \"weather_condition\": \"Rain\",\n    \"route_type\": \"Rural\",\n    \"traffic_index\": 0.85,\n    \"road_quality\": \"Poor\",\n    \"num_stops\": 4\n  }'\n```\n\nExpected response:\n```json\n{\n  \"prediction\": 1,\n  \"label\": \"Delayed\",\n  \"probability\": 0.8231,\n  \"confidence\": \"82.3%\",\n  \"risk_level\": \"High\"\n}\n```\n\n## 🧪 Run Tests\n\n```bash\npytest tests\u002F -v\n```\n\n## 📊 API Endpoints\n\n| Method | Endpoint | Description |\n|--------|----------|-------------|\n| GET    | `\u002F`               | Health check |\n| POST   | `\u002Fpredict`        | Single prediction |\n| POST   | `\u002Fpredict\u002Fbatch`  | Batch predictions (max 1000) |\n| GET    | `\u002Fmodel\u002Finfo`     | Feature list & model type |\n| POST   | `\u002Freload`         | Hot-reload model after retraining |\n\n## 🔬 Models Compared\n\n| Model | CV F1 | Notes |\n|-------|-------|-------|\n| Random Forest | ~0.84 | Robust, good baseline |\n| XGBoost | ~0.86 | Fast, handles missing values |\n| **LightGBM** | **~0.87** | **Best — used in production** |\n\n## ✨ Key Features\n\n- **No hardcoded values** — everything in `config.yaml`\n- **MLflow experiment tracking** — compare all runs visually\n- **Mock data mode** — test the full pipeline without any database\n- **Production Flask API** — `\u002Fpredict` and `\u002Fpredict\u002Fbatch` endpoints\n- **Automatic logging** — predictions logged to CSV for monitoring\n- **Unit tests** — `pytest` coverage for all pipeline stages\n\n## 🛠️ Tech Stack\n\n`Python` · `scikit-learn` · `XGBoost` · `LightGBM` · `MLflow` · `Flask` · `SQLAlchemy` · `pandas` · `pytest`\n",
"This project is an end-to-end machine learning pipeline for truck delay prediction that uses an XGBoost model and is deployed with a Flask API, MLflow, and Lightning AI. Its core functions include extracting data from MySQL and PostgreSQL databases, feature engineering, and model training (supporting multiple models such as RF\u002FXGB\u002FLGBM), and it serves single or batch predictions through a Flask REST API. The system has a modular project structure that is easy to maintain and extend, uses a YAML configuration file to avoid hardcoded values, and manages sensitive information through environment variables. It suits transit-time estimation in the logistics industry, helping companies optimize delivery plans and improve customer satisfaction.",2,"2026-06-11 03:59:26","CREATED_QUERY"]