[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-84322":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":9,"totalLinesOfCode":9,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":9,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":20,"hasPages":20,"topics":22,"createdAt":9,"pushedAt":9,"updatedAt":33,"readmeContent":34,"aiSummary":9,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},84322,"Hands-On-AI-Engineering","Sumanth077\u002FHands-On-AI-Engineering","Sumanth077","A curated collection of practical AI projects implementing OCR systems, RAG, AI agents, and other AI use cases.",null,"https:\u002F\u002Fgithub.com\u002FSumanth077\u002FHands-On-AI-Engineering","Python",1955,547,15,4,0,55,165,98.72,false,"main",[23,24,25,26,27,28,29,30,31,32],"agents","ai","llms","ocr","python","rag","ai-agents","ai-engineering","generative-ai","mcp","2026-06-12 04:01:43","\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Faiengineering.beehiiv.com\u002F\">\n    \u003Cimg src=\"assets\u002Ftheaiengineering_logo.jpeg\" alt=\"Hands-On AI Engineering Banner\" width=\"150\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\u003Cdiv align=\"center\">\n\n# 🚀 Hands-On AI Engineering\n\n[![License: MIT](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-yellow.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FMIT)\n[![PRs Welcome](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRs-welcome-brightgreen.svg)](CONTRIBUTING.md)\n\n\u003C\u002Fdiv>\n\nA curated collection of practical, production-ready AI projects across multiple modalities, including language models, multimodal models, OCR systems, RAG pipelines, and AI agents. 
Each project is designed to help you learn, experiment, and build real-world AI applications.\n\n## 📋 Table of Contents\n\n- [🎯 Why This Repository?](#-why-this-repository)\n- [🗂️ Project Categories](#️-project-categories)\n- [🚀 Getting Started](#-getting-started)\n- [🤝 Contributing](#-contributing)\n- [📜 License](#-license)\n\n---\n\n## 🎯 Why This Repository?\n\n- **Learn by Doing**: Each project includes complete code, setup instructions, and documentation\n- **Production-Ready**: Projects follow best practices and are ready to be adapted for real-world use\n- **Diverse Use Cases**: From RAG systems to multi-agent workflows and specialized applications\n- **Multiple Model Providers**: Projects use OpenAI, Anthropic, Google, and open-source models\n- **Active Community**: Regular updates and new project additions\n\n---\n\n## 🗂️ Project Categories\n\n### 🤖 AI Agents\n\nIntelligent AI agents for various automation tasks.\n\n- [**Multi-Agent Financial Analyst**](.\u002Fai_agents\u002Fmulti_agent_financial_analyst) — Team of specialized agents for comprehensive financial analysis.\n- [**FinAgent**](.\u002Fai_agents\u002Ffinagent) — Financial assistant agent for stock market analysis and insights.\n- [**Daily AI News Digest**](.\u002Fai_agents\u002Fdaily-news-digest) — Automated daily digest from 92 Karpathy-curated tech blogs delivered to Telegram every morning. 
MiniMax M2.7 scores articles from the last 24 hours and surfaces the 3 most significant stories.\n- [**Agentic Form Filler**](.\u002Fai_agents\u002Fagentic-form-filler) — Agentic form-filling agent using Landing AI for layout parsing and MiniMax M2.7 for multi-turn data gathering.\n- [**AI Travel Planning Agent**](.\u002Fai_agents\u002Fai_travel_planning_agent) — Multi-agent travel planner that turns a single natural language request into a complete trip plan with flights, hotels, and a day-by-day itinerary.\n- [**Competitive Intelligence Agent**](.\u002Fai_agents\u002Fcompetitive_intelligence_agent) — Generates strategic sales battlecards by analyzing competitors through the lens of your own business context.\n- [**Multi-Agent Research Assistant (AG2)**](.\u002Fai_agents\u002Fmulti_agent_research_assistant_ag2) — Multi-agent research pipeline using AG2 where three specialists collaborate to research any topic and produce a structured report.\n- [**Self-Reflective Agentic RAG**](.\u002Fai_agents\u002Fagentic_rag_system) — LangGraph RAG system that grades retrieved context, rewrites the query if needed, and generates an answer only once the context passes validation.\n- [**Agentic SQL Search**](.\u002Fai_agents\u002Fagentic_sql_search) — Natural language to SQL agent powered by Gemma 4 that writes, executes, and explains queries against an e-commerce database.\n- [**Stock Portfolio Analyst**](.\u002Fai_agents\u002Fstock_portfolio_analyst) — Portfolio analysis agent built with Agno and DeepSeek-V4-Flash. Fetches live market data via YFinance and generates a report covering P&L, concentration risk, and rebalancing recommendations.\n- [**Eagle Eye**](.\u002Fai_agents\u002Feagle_eye) — GitHub PR review agent using OpenClaw and Telegram. 
Fetches diffs via GitHub MCP, performs structured code review with severity ratings, and posts feedback after user approval.\n- [**CartMate — AI Customer Support Agent**](.\u002Fai_agents\u002Fai_customer_support_agent) — Memory-powered e-commerce support agent built with Mem0 and Mistral Small 4 that remembers customers and picks up conversations where they left off.\n- [**Multi-Agent Coding Assistant**](.\u002Fai_agents\u002Fmulti_agent_coding_assistant) — Four-stage coding pipeline powered by Mistral Small 4 and LangChain. A Planner, Coder, and Reviewer agent collaborate to produce a polished final implementation.\n- [**Startup Analyst**](.\u002Fai_agents\u002Fstartup_analyst) — Startup due-diligence agent powered by MiniMax M2.5. Scrapes a company's site with Firecrawl and produces an investment-grade report covering market position, financials, team, and risks.\n- [**Research Team**](.\u002Fai_agents\u002Fresearch_team) — Multi-agent research system powered by MiniMax M2.5. Seek searches the web, Scout navigates internal documents, and a team leader synthesises findings into a structured report.\n- [**GitHub Intelligence Agent**](.\u002Fai_agents\u002Fgithub_intelligence_agent) — GitHub research agent powered by Gemini 3 Flash and GitHub's official MCP server. Ask anything about repos, contributors, issues, or codebases.\n- [**Smolagents Code Agent**](.\u002Fai_agents\u002Fsmolagents_code_agent) — Agentic task runner powered by Mistral Small 4 and HuggingFace smolagents. Writes and executes Python code at each step using DuckDuckGo and Wikipedia.\n- [**Agent Discovery Agent**](.\u002Fai_agents\u002Fagent_discovery_agent) — Searches and compares AI agents across NANDA, MCP, Virtuals Protocol, A2A, and ERC-8004 through a single natural language interface. Powered by Gemini 3 Flash.\n- [**Cal Scheduling Agent**](.\u002Fai_agents\u002Fcal_scheduling_agent) — Conversational scheduling assistant that manages Cal.com appointments through natural language. 
Book, reschedule, cancel, and check availability with automatic timezone handling.\n- [**Hacker News Newsletter Agent**](.\u002Fai_agents\u002Fhacker_news_newsletter_agent) — Fetches the 10 latest Hacker News stories, scrapes full article content with Trafilatura, generates a structured HTML newsletter with Gemma 4, and delivers it to your inbox via Gmail SMTP.\n- [**Hotel Finder Agent**](.\u002Fai_agents\u002Fhotel_finder_agent) — Conversational hotel search agent powered by qwen3.6-flash via Orq.ai and the Trivago MCP Server. Search by location, dates, guest count, price range, star rating, and amenities.\n- [**Marketing Strategy Agent**](.\u002Fai_agents\u002Fmarketing_strategy_agent) — Multi-agent marketing campaign generator. A Market Analyst (with Serper web search), Strategy Officer, and Creative Director run sequentially to produce market research, a full strategy, and creative campaign content. Powered by deepseek-v4-flash via Orq.ai.\n- [**Brand Monitor**](.\u002Fai_agents\u002Fbrand_monitor_agent) — Monitors brand mentions across Web, YouTube, Twitter\u002FX, and LinkedIn in a single run. Scrapingdog collects platform data and DeepSeek V4 Flash produces a structured intelligence brief per channel.\n- [**AI Debate Agent**](.\u002Fai_agents\u002Fai_debate_agent) - Two LLM debaters argue opposing sides of any topic you choose. A judge scores each turn and declares a winner.\n- [**Browser Automation Agent**](.\u002Fai_agents\u002Fbrowser_automation_agent) - Takes a natural language instruction and autonomously navigates the web to complete it using browser-use.\n- [**Documentation QnA Agent**](.\u002Fai_agents\u002Fdocumentation_qna_agent) - Chat with any documentation by URL. 
Uses Fetch MCP and DeepSeek V4 Flash on NVIDIA NIM.\n- [**Job Posting Agent**](.\u002Fai_agents\u002Fjob_posting_agent) - Generates tailored job postings from a company name and role using DeepSeek V4 Flash on NVIDIA NIM.\n- [**LangChain Data Agent**](.\u002Fai_agents\u002Flangchain_data_agent) - Query the Chinook SQLite database in plain English through a conversational Streamlit chat interface.\n- [**Travel Planner Agent**](.\u002Fai_agents\u002Ftravel_planner_agent) - AI trip planning assistant covering weather, budget, packing lists, and day-by-day itineraries from a single request.\n- [**Personal Finance Agent**](.\u002Fai_agents\u002Fpersonal_finance_agent) - Upload a bank statement CSV, auto-categorize transactions, and ask natural language questions about your spending. Powered by a LangChain tool-calling agent backed by Orq.ai with SQLite persistence.\n\n### 📸 OCR\n\nExtracting structure and meaning from visual data and documents.\n\n- [**Image-to-Structured-Data Extractor**](.\u002FOCR\u002Fimage_to_structured_data) — Converts images into validated, structured JSON using Mistral Large 3 and Instructor.\n- [**LaTeX Formula OCR**](.\u002FOCR\u002Flatex_formula_ocr) — Extracts math formulas from images and PDFs into LaTeX using a local vision-language model.\n- [**Medical Prescription Digitizer**](.\u002FOCR\u002Fmedical_prescription_digitizer) — Digitizes handwritten or printed prescriptions into structured output using Mistral Large 3, with real-time drug name validation against RxNorm.\n\n\n### 🎧 Audio\n\nProjects for audio understanding and analysis.\n\n- [**Music Explorer**](.\u002Faudio\u002Fmusic_explorer) — Chat with any audio file or YouTube video using Gemini 3 Flash. 
Ask for transcriptions, emotion analysis, instrument identification, and timestamp-aware breakdowns.\n- [**Multilingual Audio Translator**](.\u002Faudio\u002Fmultilingual_audio_translator) — Upload or record audio in any language, get it transcribed with faster-whisper, translated via Gemini, and played back as synthesized speech using Kokoro TTS.\n\n### 🎬 Multimodal\n\nProjects combining vision, video, and language models.\n\n- [**GLM-OCR Pro**](.\u002Fmultimodal\u002Fglm_ocr_pro) — Structured document extraction using GLM-OCR via Ollama, transforming images and PDFs into formatted Markdown locally.\n- [**Video Understanding Agent**](.\u002Fmultimodal\u002Fvideo_understanding_agent) — Summarizes YouTube videos into chapters, key takeaways, and action items using Gemini Flash.\n- [**Multimodal Weather App**](.\u002Fmultimodal\u002Fmultimodal_weather_app) — Upload a map image and get live weather. Mistral Small 4 identifies the city via vision, then fetches real-time conditions through native tool calling.\n- [**Multimodal RAG**](.\u002Fmultimodal\u002Fmultimodal_rag) — RAG system that ingests text, URLs, PDFs, images, audio, and video into a shared ChromaDB index. Gemini Embedding 2 handles retrieval and Gemini 3 Flash generates grounded answers, passing actual file URIs for media sources.\n- [**Image Question Answering**](.\u002Fmultimodal\u002Fimage_question_answering) — Upload a PDF, select a page, and ask visual questions answered by Gemma 4 with thinking mode. 
PyMuPDF renders each page to a full-resolution image for grounded reasoning over charts, tables, and figures.\n- [**Medical Document Parser**](.\u002Fmultimodal\u002Fmedical_document_parser) - Extracts a structured clinical profile from medical PDFs and images using Gemma 4 vision.\n\n### 📚 RAG Applications\n\nRetrieval-Augmented Generation systems for knowledge-enhanced AI applications.\n\n- [**Agentic RAG with O3-Mini & DuckDuckGo**](.\u002Frag_apps\u002Fagentic_rag_with_o3_mini_and_duckduckgo) — RAG system using O3-Mini with DuckDuckGo for real-time web search.\n- [**Agentic RAG with Qwen & FireCrawl**](.\u002Frag_apps\u002Fagentic_rag_with_qwen_and_firecrawl) — RAG system using Qwen and FireCrawl for web scraping and retrieval.\n- [**Vision RAG**](.\u002Frag_apps\u002Fvision_rag) — Multimodal RAG system for processing and querying visual content.\n- [**Clinical RAG with ADE**](.\u002Frag_apps\u002Fclinical_rag_with_ade) — High-precision clinical RAG using LandingAI ADE for visual-first document parsing and Mistral Large for grounded reasoning.\n- [**YouTube Transcript RAG**](.\u002Frag_apps\u002Fyoutube_transcript_rag) — Chat with any YouTube video using Whisper transcription, ChromaDB retrieval, and Mistral Small 4, with timestamp-linked answers.\n- [**GraphRAG Knowledge System**](.\u002Frag_apps\u002Fgraphrag_knowledge_system) — Builds a local knowledge graph from uploaded documents using Mistral Small 4 and NetworkX, supporting both entity-level and thematic queries.\n- [**Hybrid RAG System**](.\u002Frag_apps\u002Fhybrid_rag_system) — Indexes documents into a knowledge graph and a vector store in parallel. Mistral Small 4 answers questions with fused context from both retrieval paths.\n- [**HyDE RAG**](.\u002Frag_apps\u002Fhyde_rag) — RAG pipeline using Hypothetical Document Embeddings. 
Gemini 3 Flash generates hypothetical answers, Gemini Embedding 2 embeds and averages them, and the result retrieves more precise chunks from ChromaDB.\n- [**Rock Music RAG**](.\u002Frag_apps\u002Frock_music_rag) — Custom rock music knowledge base built from Wikipedia. Add any band, ask questions across all of them, and get sourced answers powered by BM25 retrieval and Gemma 4.\n- [**RAG Agent with Database Routing**](.\u002Frag_apps\u002Frag_agent_with_database_routing) — Routes queries across three specialized Qdrant databases (products, support, financial) using an Agno router agent. Falls back to a LangGraph ReAct web search agent when no relevant documents are found.\n- [**Reasoning RAG**](.\u002Frag_apps\u002Freasoning_rag) - Ask questions against any web source and get cited answers with a live, step-by-step reasoning trace via Gradio.\n\n---\n\n## 🤝 Contributing\n\nWe welcome contributions! Whether you're adding new projects, improving existing ones, or fixing bugs, your help makes this repository better for everyone.\n\n### How to Contribute\n\n1. **Read the guidelines**: Check [CONTRIBUTING.md](CONTRIBUTING.md) for detailed instructions\n2. **Create an issue**: Propose your project or improvement\n3. **Follow the structure**: Use the appropriate category folder\n4. **Submit a PR**: One project per pull request\n\n### Project Structure Requirements\n\n- Each project must be in its own folder within the appropriate category\n- Must include a comprehensive `README.md` (use our [template](.github\u002FREADME_TEMPLATE.md))\n- Must include `requirements.txt` or `pyproject.toml`\n- Must include `.env.example` for required API keys\n- Follow snake_case naming convention\n\n---\n\n## 📜 License\n\nThis repository is licensed under the **MIT License**. 
See the [LICENSE](.\u002FLICENSE) file for details.\n\n---\n\n## 🙏 Acknowledgments\n\nThank you to all contributors who have helped build this collection of AI engineering projects!\n\n---\n\n\u003Cdiv align=\"center\">\n\n**Built with ❤️ by the [AI Engineering Community](https:\u002F\u002Faiengineering.beehiiv.com\u002F)**\n\nFor sponsorship or collaboration inquiries, reach the maintainer at [sumanth@devable.ai](mailto:sumanth@devable.ai).\n\n[⬆ Back to Top](#-hands-on-ai-engineering)\n\n\u003C\u002Fdiv>\n",2,"2026-06-11 04:12:48","trending"]