[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72430":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":35,"lastSyncTime":36,"discoverSource":37},72430,"AI-Video-Transcriber","wendy7756\u002FAI-Video-Transcriber","wendy7756"," Transcribe and summarize videos and podcasts using AI. Open-source, multi-platform, and supports multiple languages.","https:\u002F\u002Fsipsip.ai",null,"Python",2744,363,17,3,0,24,47,110,72,109.68,"Apache License 2.0",false,"main",true,[27,28,29,30,31],"aitool","tiktok","transcribe","videototext","youtube","2026-06-12 04:01:05","\u003Cdiv align=\"center\">\n\n# AI Video Transcriber\n\nEnglish | [中文](README_ZH.md)\n\nAn AI-powered tool to transcribe and summarize videos and podcasts — paste a URL from YouTube, TikTok, Bilibili, Apple Podcasts, SoundCloud, and 30+ platforms, **or upload a local file** (audio, video, or plain text).\n\n![Interface](en_video.png)\n\n\u003C\u002Fdiv>\n\n## ✨ Features\n\n- 🎥 **Multi-Platform Support**: Works with YouTube, TikTok, Bilibili, Apple Podcasts, SoundCloud, and 30+ more\n- 📁 **Local File Upload**: Drag-and-drop or pick a file — supported formats include `.txt` (treated as transcript text), `.mp3`, `.mp4`, `.m4a`, `.wav`, `.webm`, `.mkv`, `.ogg`, `.flac`. 
Media is normalized with FFmpeg for Whisper; the same optimize → translate → summarize pipeline runs as for URLs\n- ⚡ **Subtitle-First Architecture**: For platforms with native subtitles (e.g. YouTube), transcripts are extracted instantly — no audio download needed. Whisper is only used as a fallback, making the whole pipeline dramatically faster.\n- 🗣️ **Intelligent Transcription**: High-accuracy speech-to-text using Faster-Whisper when subtitles aren't available\n- 🤖 **AI Text Optimization**: Automatic typo correction, sentence completion, and intelligent paragraphing\n- 🌍 **Multi-Language Summaries**: Generate intelligent summaries in multiple languages\n- 🔧 **Bring Your Own Model**: Configure any OpenAI-compatible API endpoint (OpenAI, OpenRouter, local LLM, etc.) directly in the UI — enter your API Base URL and API Key, then click **Fetch** to auto-discover all available models and select the one you want\n- ⚙️ **Conditional Translation**: Auto-translates the transcript when the summary language differs from the source language\n- 📱 **Mobile-Friendly**: Responsive interface that works on mobile devices\n\n[![Star History Chart](https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=wendy7756\u002FAI-Video-Transcriber&type=Date)](https:\u002F\u002Fstar-history.com\u002F#wendy7756\u002FAI-Video-Transcriber&Date)\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- Python 3.8+\n- FFmpeg (required for yt-dlp audio extraction and for normalizing uploaded media)\n- An API key from any OpenAI-compatible provider (OpenAI, OpenRouter, etc.) 
— configured directly in the UI, no server-side env var needed\n\n### Installation\n\n#### Method 1: Automatic Installation\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fwendy7756\u002FAI-Video-Transcriber.git\ncd AI-Video-Transcriber\n\n# Run installation script\nchmod +x install.sh\n.\u002Finstall.sh\n```\n\n#### Method 2: Docker\n\n```bash\n# Clone the repository\ngit clone https:\u002F\u002Fgithub.com\u002Fwendy7756\u002FAI-Video-Transcriber.git\ncd AI-Video-Transcriber\n\n# Using Docker Compose (easiest)\ncp .env.example .env\n# Edit .env file if you want server-side defaults (optional)\ndocker-compose up -d\n\n# Or using Docker directly\ndocker build -t ai-video-transcriber .\ndocker run -p 8000:8000 --env-file .env ai-video-transcriber\n```\n\nThe image uses **Python 3.12** (Debian Bookworm), upgrades `pip`\u002F`setuptools`\u002F`wheel`, then installs from `requirements.txt` — same version constraints as a fresh local venv on a current Python.\n\n#### Method 3: Manual Installation\n\n1. **Install Python Dependencies**\n```bash\n# A virtualenv is strongly recommended (Homebrew and Debian Pythons are externally managed under PEP 668)\npython3 -m venv venv\nsource venv\u002Fbin\u002Factivate\npython -m pip install --upgrade pip\npip install -r requirements.txt\n```\n\n2. **Install FFmpeg**\n```bash\n# macOS\nbrew install ffmpeg\n\n# Ubuntu\u002FDebian\nsudo apt update && sudo apt install ffmpeg\n\n# CentOS\u002FRHEL (ffmpeg is not in the base repos; enable RPM Fusion first)\nsudo yum install ffmpeg\n```\n\n3. 
**Configure Environment Variables** *(optional)*\n```bash\n# If you prefer server-side defaults, set these — otherwise configure via the UI\nexport OPENAI_API_KEY=\"your_api_key_here\"\nexport OPENAI_BASE_URL=\"https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1\"  # any OpenAI-compatible endpoint\n```\n\n### Start the Service\n\n```bash\npython3 start.py\n```\n\nAfter the service starts, open your browser and visit `http:\u002F\u002Flocalhost:8000`\n\n#### Production Mode (Recommended for long videos)\n\nTo avoid SSE disconnections during long processing, start in production mode (hot-reload disabled):\n\n```bash\npython3 start.py --prod\n```\n\nThis keeps the SSE connection stable throughout long tasks (30–60+ min).\n\n#### Run with explicit env (example)\n\n```bash\nsource venv\u002Fbin\u002Factivate\nexport OPENAI_API_KEY=your_api_key_here         # optional: server-side default\n# export OPENAI_BASE_URL=https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1  # optional: server-side default\npython3 start.py --prod\n```\n\n## 📖 Usage Guide\n\n1. **Choose input — URL or file**\n   - **Video \u002F podcast URL**: Paste a link from YouTube, Bilibili, or any other supported platform into the input field\n   - **Local file**: Drag a file onto the dashed upload area (or click to browse). Same **Transcribe** button starts the job; uploads use the same API route as URLs (`POST \u002Fapi\u002Fprocess-video` with multipart `file`), which helps when a reverse proxy only allows that path\n2. **Select Summary Language**: Choose the output language from the dropdown next to the input area\n3. **(Optional) Configure AI Model**: Click **AI Settings** to expand the panel\n   - Enter your **API Base URL** (e.g. `https:\u002F\u002Fopenrouter.ai\u002Fapi\u002Fv1`) and **API Key**\n   - Click **Fetch** to auto-load all models from that provider\n   - Select the model you want — or leave blank to use the server default\n4. **Start Processing**: Click the **Transcribe** button. 
For **URL** jobs, the progress bar shows which mode is active:\n   - **⚡ Subtitle** (green) — native subtitles found, transcript extracted in seconds\n   - **🎙 Whisper** (amber) — no subtitles available, downloading audio for transcription\n   For **local uploads**, media is normalized with FFmpeg then transcribed with Whisper; plain **`.txt`** files skip download\u002FWhisper and go straight into the text pipeline (optimize → summary, and translation when languages differ).\n5. **View Results**: Review the optimized transcript and AI summary\n   - If transcript language ≠ selected summary language, a **Translation** tab appears automatically\n6. **Download Files**: Save Markdown-formatted files (Transcript \u002F Translation \u002F Summary)\n\n## 🛠️ Technical Architecture\n\n### Backend Stack\n- **FastAPI**: Modern Python web framework\n- **yt-dlp**: Video downloading and processing\n- **FFmpeg**: Audio extraction and local upload normalization (mono 16 kHz for Whisper)\n- **Faster-Whisper**: Efficient speech transcription\n- **OpenAI API**: Intelligent text summarization\n\n### Frontend Stack\n- **HTML5 + CSS3**: Responsive interface design\n- **JavaScript (ES6+)**: Modern frontend interactions\n- **Marked.js**: Markdown rendering\n- **Font Awesome**: Icon library\n\n### Project Structure\n```\nAI-Video-Transcriber\u002F\n├── backend\u002F                 # Backend code\n│   ├── main.py             # FastAPI main application\n│   ├── video_processor.py  # Video processing module\n│   ├── transcriber.py      # Transcription module\n│   ├── summarizer.py       # Summary module\n│   ├── translator.py       # Translation module\n│   └── llm_sanitize.py     # Post-process LLM outputs (strip boilerplate)\n├── static\u002F                 # Frontend files\n│   ├── index.html          # Main page\n│   └── app.js              # Frontend logic\n├── temp\u002F                   # Temporary files directory\n├── Dockerfile              # Docker image configuration\n├── 
docker-compose.yml      # Docker Compose configuration\n├── .dockerignore           # Docker ignore rules\n├── .env.example            # Environment variables template\n├── requirements.txt        # Python dependencies\n├── start.py               # Startup script\n└── README.md              # Project documentation\n```\n\n## ⚙️ Configuration Options\n\n### Environment Variables\n\n| Variable | Description | Default | Required |\n|----------|-------------|---------|----------|\n| `OPENAI_API_KEY` | API key (server-side default) | - | No — can be set in UI instead |\n| `HOST` | Server address | `0.0.0.0` | No |\n| `PORT` | Server port | `8000` | No |\n| `WHISPER_MODEL_SIZE` | Whisper model size | `base` | No |\n| `UPLOAD_MAX_MB` | Maximum upload size for local files (MB) | `200` | No |\n\nAn optional dedicated endpoint `POST \u002Fapi\u002Fprocess-upload` exists with the same behavior as sending `file` to `\u002Fapi\u002Fprocess-video`.\n\n### Whisper Model Size Options\n\n| Model | Parameters | English-only | Multilingual | Speed | Memory Usage |\n|-------|------------|--------------|--------------|-------|--------------|\n| tiny | 39 M | ✓ | ✓ | Fast | Low |\n| base | 74 M | ✓ | ✓ | Medium | Low |\n| small | 244 M | ✓ | ✓ | Medium | Medium |\n| medium | 769 M | ✓ | ✓ | Slow | Medium |\n| large | 1550 M | ✗ | ✓ | Very Slow | High |\n\n## 🔧 FAQ\n\n### Q: Why is transcription slow?\nA: Transcription speed depends on video length, Whisper model size, and hardware performance. Try using smaller models (like tiny or base) to improve speed.\n\n### Q: Which video platforms are supported?\nA: All platforms supported by yt-dlp, including but not limited to: YouTube, TikTok, Facebook, Instagram, Twitter, Bilibili, Youku, iQiyi, Tencent Video, etc.\n\n### Q: What local file types and size limits apply?\nA: Allowed extensions include `.txt`, `.mp3`, `.mp4`, `.m4a`, `.wav`, `.webm`, `.mkv`, `.ogg`, `.flac`. 
Default max size is **200 MB** per file; override with the `UPLOAD_MAX_MB` environment variable on the server.\n\n### Q: What if the AI optimization features are unavailable?\nA: AI features require an API key from any OpenAI-compatible provider (OpenAI, OpenRouter, etc.). You can enter it directly in the **AI Settings** panel in the UI — no server restart needed. Alternatively, set `OPENAI_API_KEY` as an environment variable for a server-side default.\n\n### Q: I get HTTP 500 errors when starting\u002Fusing the service. Why?\nA: In most cases this is an environment configuration issue rather than a code bug. Please check:\n- Ensure a virtualenv is activated: `source venv\u002Fbin\u002Factivate`\n- Install deps inside the venv: `pip install -r requirements.txt`\n- Configure your API key in the **AI Settings** panel, or set `OPENAI_API_KEY` as an env var\n- Install FFmpeg: `brew install ffmpeg` (macOS) \u002F `sudo apt install ffmpeg` (Debian\u002FUbuntu)\n- If port 8000 is occupied, stop the old process or change `PORT`\n\n### Q: How to handle long videos?\nA: The system can process videos of any length, but processing time will increase accordingly. 
For very long videos, consider using smaller Whisper models.\n\n### Q: How to use Docker for deployment?\nA: Docker provides the easiest deployment method:\n\n**Prerequisites:**\n- Install Docker Desktop from https:\u002F\u002Fwww.docker.com\u002Fproducts\u002Fdocker-desktop\u002F\n- Ensure Docker service is running\n\n**Quick Start:**\n```bash\n# Clone and setup\ngit clone https:\u002F\u002Fgithub.com\u002Fwendy7756\u002FAI-Video-Transcriber.git\ncd AI-Video-Transcriber\ncp .env.example .env\n# Edit .env file to set server-side defaults (optional)\n\n# Start with Docker Compose (recommended)\ndocker-compose up -d\n\n# Or build and run manually\ndocker build -t ai-video-transcriber .\ndocker run -p 8000:8000 --env-file .env ai-video-transcriber\n```\n\n**Common Docker Issues:**\n- **Port conflict**: Change port mapping `-p 8001:8000` if 8000 is occupied\n- **Permission denied**: Ensure Docker Desktop is running and you have proper permissions\n- **Build fails**: Check disk space (need ~2GB free) and network connection\n- **Container won't start**: Check Docker logs with `docker logs \u003Ccontainer_id>`\n\n**Docker Commands:**\n```bash\n# View running containers\ndocker ps\n\n# Check container logs\ndocker logs ai-video-transcriber-ai-video-transcriber-1\n\n# Stop service\ndocker-compose down\n\n# Rebuild after changes\ndocker-compose build --no-cache\n```\n\n### Q: What are the memory requirements?\nA: Memory usage varies depending on the deployment method and workload:\n\n**Docker Deployment:**\n- **Base memory**: ~128MB for idle container\n- **During processing**: 500MB - 2GB depending on video length and Whisper model\n- **Docker image size**: ~1.6GB disk space required\n- **Recommended**: 4GB+ RAM for smooth operation\n\n**Traditional Deployment:**\n- **Base memory**: ~50-100MB for FastAPI server\n- **Whisper models memory usage**:\n  - `tiny`: ~150MB\n  - `base`: ~250MB  \n  - `small`: ~750MB\n  - `medium`: ~1.5GB\n  - `large`: ~3GB\n- **Peak usage**: Base + 
Model + Video processing (~500MB additional)\n\n**Memory Optimization Tips:**\n```bash\n# Use smaller Whisper model to reduce memory usage\nWHISPER_MODEL_SIZE=tiny  # or base\n\n# For Docker, limit container memory if needed\ndocker run -m 1g -p 8000:8000 --env-file .env ai-video-transcriber\n\n# Monitor memory usage\ndocker stats ai-video-transcriber-ai-video-transcriber-1\n```\n\n### Q: Network connection errors or timeouts?\nA: If you encounter network-related errors during video downloading or API calls, try these solutions:\n\n**Common Network Issues:**\n- Video download fails with \"Unable to extract\" or timeout errors\n- OpenAI API calls return connection timeout or DNS resolution failures\n- Docker image pull fails or is extremely slow\n\n**Solutions:**\n1. **Switch VPN\u002FProxy**: Try connecting to a different VPN server or switch your proxy settings\n2. **Check Network Stability**: Ensure your internet connection is stable\n3. **Retry After Network Change**: Wait 30-60 seconds after changing network settings before retrying\n4. **Use Alternative Endpoints**: If using custom OpenAI endpoints, verify they're accessible from your network\n5. 
**Docker Network Issues**: Restart Docker Desktop if container networking fails\n\n**Quick Network Test:**\n```bash\n# Test video platform access\ncurl -I https:\u002F\u002Fwww.youtube.com\u002F\n\n# Test your AI provider endpoint\ncurl -I https:\u002F\u002Fopenrouter.ai\n\n# Test Docker Hub access\ndocker pull hello-world\n```\n\n## 🎯 Supported Languages\n\n### Transcription\n- Supports 100+ languages through Whisper\n- Automatic language detection\n- High accuracy for major languages\n\n### Summary Generation\n- English\n- Chinese (Simplified)\n- Japanese\n- Korean\n- Spanish\n- French\n- German\n- Portuguese\n- Russian\n- Arabic\n- And more...\n\n## 📈 Performance Tips\n\n- **Hardware Requirements**:\n  - Minimum: 4GB RAM, dual-core CPU\n  - Recommended: 8GB RAM, quad-core CPU\n  - Ideal: 16GB RAM, multi-core CPU, SSD storage\n\n- **Processing Time Estimates**:\n\n  | Video Length | Subtitle Mode | Whisper Mode | Notes |\n  |-------------|---------------|--------------|-------|\n  | 1 minute | ~5s | 30s–1 min | Subtitle mode needs no audio download |\n  | 5 minutes | ~10s | 2–5 min | YouTube auto-captions trigger subtitle mode |\n  | 15 minutes | ~15s | 5–15 min | Most YouTube videos support subtitle mode |\n  | 30+ minutes | ~20s | 15–60 min | Podcast\u002Faudio-only always uses Whisper |\n\n## 🤝 Contributing\n\nWe welcome Issues and Pull Requests!\n\n1. Fork the project\n2. Create a feature branch (`git checkout -b feature\u002FAmazingFeature`)\n3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the branch (`git push origin feature\u002FAmazingFeature`)\n5. 
Open a Pull Request\n\n\n## Acknowledgments\n\n- [yt-dlp](https:\u002F\u002Fgithub.com\u002Fyt-dlp\u002Fyt-dlp) - Powerful video downloading tool\n- [Faster-Whisper](https:\u002F\u002Fgithub.com\u002Fguillaumekln\u002Ffaster-whisper) - Efficient Whisper implementation\n- [FastAPI](https:\u002F\u002Ffastapi.tiangolo.com\u002F) - Modern Python web framework\n- [OpenAI](https:\u002F\u002Fopenai.com\u002F) - Intelligent text processing API\n\n## 📞 Contact\n\nFor questions or suggestions, please submit an Issue or contact Wendy.\n\n---\n\n## 🚀 Try the Full Product — sipsip.ai\n\nThis tool is the open-source part of **[sipsip.ai](https:\u002F\u002Fsipsip.ai)**.\n\nThe full product goes further:\n- 📧 **Daily email briefs** — follow your favorite creators and get an AI-curated digest in your inbox every morning\n- ⚡ Transcribe & summarize any video or podcast on demand\n- 🌐 Multi-language support across all features\n\n**Free to start** — no credit card required.\n\n➡️ [sipsip.ai](https:\u002F\u002Fsipsip.ai)\n\n---\n\n## ⭐ Star History\n\nIf you find this project helpful, please consider giving it a star!\n","AI Video Transcriber is an open-source tool that uses AI to transcribe and summarize videos and podcasts, with support for multiple platforms and languages. Its core features include fetching content from YouTube, TikTok, and 30+ other platforms or processing uploaded local files, a Subtitle-First architecture that speeds up processing, and high-accuracy speech-to-text via Faster-Whisper; it also offers intelligent text optimization, multi-language summary generation, and custom model configuration. The project is well suited to scenarios that need multimedia content turned quickly and accurately into readable text for analysis, such as market research, content-creator assistance, or organizing educational materials.",2,"2026-06-11 03:42:01","high_star"]