[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80707":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":21,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":27,"discoverSource":28},80707,"IMG-Dataset-Refiner","NyxAwroo\u002FIMG-Dataset-Refiner","NyxAwroo","Advanced local tool for intelligent preparation and selection with graphics for training datasets for Loras (Flux\u002FQwen\u002FZ-image\u002FSDXL\u002F etc...)","",null,"Python",54,5,4,0,2,9,10,6,2.33,false,"main",[],"2026-06-12 02:04:05","# IMG Dataset Refiner v4.3 Pro\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"logotype\u002Flogo.jpg\" alt=\"IMG Dataset Refiner logo\" width=\"240\">\n  \n  **✨[Sponsor this project](https:\u002F\u002Fwww.paypal.com\u002Fpaypalme\u002FNyxAwroo)**\n\n\n  \u003Cbr>\n\n  ![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-3776AB?style=for-the-badge&logo=python&logoColor=white)\n  ![Gradio](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGradio-4%2B%20%2F%206%20ready-F97316?style=for-the-badge)\n  ![Windows](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWindows-one--click%20install-0078D6?style=for-the-badge&logo=windows&logoColor=white)\n  ![Local AI](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLocal%20AI-Ollama%20%7C%20LM%20Studio-16A34A?style=for-the-badge)\n  ![Languages](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLanguages-FR%20%7C%20EN%20%7C%20custom-8B5CF6?style=for-the-badge)\n\n  
\u003Cp>\u003Cstrong>Desktop-style dataset manager for LoRA, Flux, SDXL and Stable Diffusion training workflows.\u003C\u002Fstrong>\u003C\u002Fp>\n\n  \u003Cp>\n    \u003Ca href=\"#preview\">Preview\u003C\u002Fa> |\n    \u003Ca href=\"#features\">Features\u003C\u002Fa> |\n    \u003Ca href=\"#installation\">Installation\u003C\u002Fa> |\n    \u003Ca href=\"#workflow\">Workflow\u003C\u002Fa> |\n    \u003Ca href=\"#ai-backends\">AI Backends\u003C\u002Fa> |\n    \u003Ca href=\"#developer-notes\">Developer Notes\u003C\u002Fa>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n\n## Overview\n\n**IMG Dataset Refiner** is a local Gradio application for building clean, balanced and training-ready image datasets. It helps you load folders of images and `.txt` captions, edit captions quickly, batch-apply keywords, pre-process images, detect duplicates, analyze dataset bias, and export curated subsets with balancing rules.\n\nIt is built for creators training:\n\n- character, style, object and concept **LoRAs**;\n- **Flux**, **SDXL**, Stable Diffusion and related image models;\n- datasets requiring caption cleanup, translation, balancing and visual QA.\n\nThe app runs locally and can work with local AI servers such as **Ollama** and **LM Studio**, while also supporting OpenAI-compatible endpoints and cloud providers when configured.\n\n---\n\n## Preview\n\n\n### Main Workspace\n\n![Main workspace preview](https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner\u002Fblob\u002Fmain\u002Fscreenshots%20demo\u002Fv4.3\u002F1.jpeg?raw=true)\n\n\n### AI Assistant\n\n![AI assistant preview](https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner\u002Fblob\u002Fmain\u002Fscreenshots%20demo\u002Fv4.3\u002F4.jpeg?raw=true)\n\n\n### Export & Recipe\n\n![Export recipe preview](https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner\u002Fblob\u002Fmain\u002Fscreenshots%20demo\u002Fv4.3\u002F5.jpeg?raw=true)\n\n\n### Advanced Analytics\n\n![Advanced analytics 
preview](https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner\u002Fblob\u002Fmain\u002Fscreenshots%20demo\u002Fv4.3\u002F3.jpeg?raw=true)\n\n\n---\n\n## Features\n\n### Dataset Loading\n\n- Load datasets from local paths on any drive.\n- Drag and drop folders or files into the app.\n- Fallback folder-signature search when the browser hides absolute paths.\n- Persistent favorites for frequently used datasets.\n- Natural sorting: `img1`, `img2`, `img10`.\n- Gradio `allowed_paths` support for external local folders.\n\n### Caption Editing\n\n- Fast visual gallery with persistent column preference.\n- Image viewer with caption editor.\n- Keyboard navigation with arrows and `PageUp` \u002F `PageDown`.\n- `Ctrl+S` save shortcut.\n- Live word and token counters.\n- Highlight tracked tags from the global recipe.\n- Live translation preview and full-caption translation.\n\n### Batch Editing\n\n- Clean comma spacing.\n- Remove duplicate tags.\n- Apply keyword library items to a selection or the whole dataset.\n- Add, remove or replace caption fragments at scale.\n- Undo last batch operation.\n\n### Custom Word Library\n\n- Right-side mass-batch word library.\n- Click-to-select custom HTML items.\n- JavaScript bridge for reliable Gradio\u002FSvelte synchronization.\n- Add, remove and clear library entries.\n\n### AI Assistance\n\n- Local LLM\u002FVLM actions through Ollama or LM Studio.\n- OpenAI-compatible endpoint support.\n- Cloud support for Anthropic Claude and Google Gemini.\n- Auto-captioning \u002F OCR with vision models.\n- Reality check \u002F hallucination cleanup.\n- Concept isolator for LoRA subject separation.\n- Tag sorting and standardization.\n- Custom prompt templates.\n- Dataset profiling report with compact tag statistics.\n- AI-generated global recipe from existing captions.\n\n### LM Studio Controls\n\n![LM Studio API location 
preview](https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner\u002Fblob\u002Fmain\u002Fscreenshots%20demo\u002FLM%20Studio%20-%20how%20find%20API.png?raw=true)\n\n- Refresh available LM Studio models.\n- Favorite VLM and LLM model selectors.\n- Shared model selector for using one model as both VLM and LLM.\n- Load and unload model buttons.\n- Persistent AI settings in `ai_settings.json`.\n\n### Image Pre-processing\n\n- Visual duplicate detection with `imagehash`.\n- Smart face crop via OpenCV.\n- Center square crop.\n- Batch resize to common training resolutions.\n- WebP \u002F JPEG output.\n- Transparent-background flattening for PNGs.\n- Batch renaming for images and captions.\n\n### Analytics\n\n- General tag frequency table.\n- CivitAI \u002F Markdown statistics export.\n- Top-20 recipe fill.\n- Orphan tag detection.\n- Co-occurrence heatmap.\n- Resolution distribution plot.\n- Exclusion matrix.\n- Contradiction hunter.\n- In-app explanations for reading analytics outputs.\n\n### Export\n\n- Classic filtering, auto-balancing and priority strategies.\n- Greedy export algorithm for balanced subsets.\n- Simulation before export.\n- Versioned export folders based on the dataset name.\n- Custom export suffix with `-Sx` \u002F `-S1`, `-S2`, `-S3` behavior.\n\n---\n\n## Installation\n\n### Windows One-click Install\n\n1. Download or clone the repository.\n2. Double-click `install.bat`.\n3. Double-click `start.bat`.\n4. The browser opens automatically.\n\n### Manual Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FNyxAwroo\u002FIMG-Dataset-Refiner.git\ncd IMG-Dataset-Refiner\npython -m venv venv\n\n# Windows\nvenv\\Scripts\\activate\n\n# macOS \u002F Linux\nsource venv\u002Fbin\u002Factivate\n\npip install -r requirements.txt\npython lora_manager.py\n```\n\n---\n\n## Workflow\n\n1. **Load a dataset**\n   - Paste a folder path, use Browse, drag and drop, or pick a favorite.\n\n2. 
**Inspect and edit captions**\n   - Use the gallery, viewer and caption editor.\n   - Adjust gallery columns if needed; the setting is saved.\n\n3. **Clean in batch**\n   - Remove duplicate tags, normalize commas and apply word-library actions.\n\n4. **Use AI selectively**\n   - Generate captions, clean hallucinations, standardize tags or build a global recipe.\n\n5. **Analyze**\n   - Check tag frequencies, heatmaps, contradictions and resolution distribution.\n\n6. **Simulate and export**\n   - Tune the recipe table, simulate the result, then export a versioned clean dataset.\n\n---\n\n## AI Backends\n\n| Backend | Use case | Notes |\n| --- | --- | --- |\n| Ollama | Local LLM\u002FVLM workflows | Default local backend, no API key required. |\n| LM Studio \u002F OpenAI-compatible | Local GGUF or remote compatible APIs | Uses `\u002Fv1\u002Fchat\u002Fcompletions`. Model IDs must match the server. |\n| Anthropic Claude | Cloud text and vision workflows | Requires an API key. |\n| Google Gemini | Cloud text and vision workflows | Requires an API key. 
|\n\nAI settings are stored in `ai_settings.json`.\n\n---\n\n## Project Structure\n\n```text\nIMG Dataset Refiner\u002F\n├── lora_manager.py              # Main application: UI, backend logic, custom JS bridge\n├── requirements.txt             # Python dependencies\n├── install.bat                  # Windows installer\n├── start.bat                    # Windows launcher\n├── Changelog.md                 # Release notes\n├── Prompt_system.md             # Future-development handoff prompt\n├── readme.md                    # GitHub documentation\n├── SUGGESTIONS.md               # Optional improvement backlog\n├── lora_recipes.json            # Saved global\u002Fexport recipes\n├── ai_settings.json             # Persistent AI backend\u002Fmodel settings\n├── ui_settings.json             # Persistent UI preferences, including gallery columns\n├── languages\u002F\n│   ├── fr.json                  # French UI strings\n│   └── en.json                  # English UI strings\n├── logotype\u002F\n│   └── logo.jpg                 # Logo used in README\n└── screenshots demo\u002F            # Place GitHub preview screenshots here\n```\n\nGenerated files such as `favorites.json`, `ai_recipes.json` and export folders may appear after using the app.\n\n---\n\n## Languages\n\nThe UI is fully driven by JSON language files.\n\nTo add a language:\n\n1. Copy `languages\u002Ffr.json` or `languages\u002Fen.json`.\n2. Rename it, for example `de.json`, `es.json`, `it.json`.\n3. Translate values while preserving keys.\n4. Put it in `languages\u002F`.\n5. Restart the app.\n\nYou can also import a language JSON file directly from the in-app settings panel.\n\n---\n\n## Developer Notes\n\nThis project relies on a sensitive Gradio + JavaScript bridge. 
Before editing `lora_manager.py`, read `Prompt_system.md`.\n\nImportant stability rules:\n\n- Do not pass `custom_js` through `launch(js=...)`.\n- Keep the custom JavaScript injection through `app.load(..., js=custom_js)`.\n- Do not combine `custom_js` and frontend component outputs in the same `app.load` event.\n- Do not update `Gallery` from `app.load`; it can trigger Gradio\u002FSvelte `flush` loops and freeze tabs.\n- Keep hidden bridge components present in the DOM; Gradio may destroy components marked only as `visible=False`.\n- Keep global JavaScript listeners narrow and guarded.\n- Test tab switching after any JS, gallery, dataframe or `app.load` change.\n\n---\n\n## Requirements\n\nMain dependencies include:\n\n- `gradio`\n- `pandas`\n- `plotly`\n- `Pillow`\n- `requests`\n- `deep-translator`\n- `imagehash`\n- `opencv-python`\n\nSee `requirements.txt` for the exact install list.\n\n---\n\n### 💛 Support the project\n\nIMG Dataset Refiner is a free, open project developed on personal time. If it helps your dataset workflow, you can support its development with a donation.\n\n**Donation link:** [PayPal](https:\u002F\u002Fwww.paypal.com\u002Fpaypalme\u002FNyxAwroo)\n\nDonations help fund development time, testing, documentation and future improvements. Huge thanks to anyone who contributes 🙏\n\n---\n\n## Credits\n\nMade by NyxAwroo\n","IMG Dataset Refiner is an advanced local tool for preparing and selecting image datasets, suited to training models such as LoRA, Flux, SDXL and Stable Diffusion. Built on Python and Gradio, the project provides a desktop-style dataset management solution that supports quickly loading images and text captions, batch-applying keywords, pre-processing images, detecting duplicates, analyzing dataset bias, and exporting filtered and balanced data subsets. It can also work with local AI servers such as Ollama and LM Studio, and is compatible with OpenAI-standard endpoints. The tool is particularly suited to dataset creators who need caption cleanup, translation, balancing and visual quality checks.","2026-06-11 04:01:43","CREATED_QUERY"]