[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80733":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":15,"stars30d":15,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":16,"rankGlobal":10,"rankLanguage":10,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":10,"pushedAt":10,"updatedAt":39,"readmeContent":40,"aiSummary":41,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":42,"discoverSource":43},80733,"text-encoding-guard","haodehaode378\u002Ftext-encoding-guard","haodehaode378","检测并修复 AI 编码助手导致的中文乱码（mojibake）。支持 GitHub Action 和 Claude Code hook。Detect and repair Chinese text encoding corruption caused by AI coding assistants.","https:\u002F\u002Fgithub.com\u002Fhaodehaode378\u002Ftext-encoding-guard",null,"Python",42,3,2,0,41.81,"MIT License",false,"main",true,[22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38],"ai","chinese","ci","claude-code","cli","code-quality","copilot","cursor","encoding","gbk","github-action","github-actions","lint","mojibake","python","unicode","utf-8","2026-06-12 04:01:29","# AI Text Encoding Guard\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fimages\u002Fhero-banner-v2.png\" alt=\"AI Text Encoding Guard - Detect and fix Chinese mojibake\" width=\"100%\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fhaodehaode378\u002Ftext-encoding-guard\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fhaodehaode378\u002Ftext-encoding-guard\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\" alt=\"CI\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-blue.svg\" alt=\"License: MIT\">\u003C\u002Fa>\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fdependencies-zero-brightgreen\" alt=\"Zero Dependencies\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10+-blue\" alt=\"Python 3.10+\">\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"#中文\">中文\u003C\u002Fa> &nbsp;|&nbsp; \u003Ca href=\"#english\">English\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\n# 中文\n\n**AI 编码助手把中文搞成乱码了？自动检测，一键修复。**\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"docs\u002Fimages\u002Fworkflow.png\" alt=\"检测 → 分析 → 修复 → 防护\" width=\"100%\">\n\u003C\u002Fp>\n\n## 问题是什么\n\n当 AI 编码助手（Claude、Cursor、Copilot、Codex）编辑含中文的文件时，编码损坏会悄然发生。UTF-8 字节被误读为 GBK — 等你发现时用户已经看到乱码了。\n\n```\n 修复前（乱码）                  修复后（正常）\n ─────────────────────           ─────────────────────\n \u002F\u002F 鐢ㄦ埛鐧诲綍鎴愬姛             \u002F\u002F 用户登录成功\n \u002F\u002F 鏁版嵁搴撹繛鎺ュけ璐�           \u002F\u002F 数据库连接失败\n \u002F\u002F 璁㈠崟鍒涘缓鎴愬姛              \u002F\u002F 订单创建成功\n```\n\nHTML 标签损坏也能捕获：\n\n```html\n\u003C!-- AI 编辑后 → 标签损坏 -->\n\u003Cp>鐢ㄦ埛涓�績\u003C\u002Fp>\n?\u002Fdiv>              \u003C!-- ← \u003C\u002Fdiv> 的 \u003C 被吃掉了 -->\n\n\u003C!-- 修复后 → 完整恢复 -->\n\u003Cp>用户中心\u003C\u002Fp>\n\u003C\u002Fdiv>\n```\n\n### 谁会中招？\n\n| 场景 | 中招概率 | 后果 |\n|:-----|:--------:|:-----|\n| Claude Code 编辑含中文的 Vue\u002FReact 组件 | **极高** | UI 文案全变乱码，用户直接看到 |\n| Cursor 批量重构中文注释 | **高** | 代码可读性归零，新人看不懂 |\n| Copilot 生成中文文档 | **中** | README \u002F CHANGELOG 变天书 |\n| CI\u002FCD 自动化脚本处理中文文件 | **中** | 静默损坏，上线后才发现 |\n\n## 安装\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhaodehaode378\u002Ftext-encoding-guard.git\ncd text-encoding-guard\npip install -e .\n```\n\n## 快速开始\n\n```bash\n# 扫描项目\ncheck-mojibake --root .\u002Fsrc\n\n# 自动修复（创建 .bak 备份）\ncheck-mojibake --root .\u002Fsrc --fix-gbk\n\n# JSON 输出\ncheck-mojibake --root .\u002Fsrc --json\n\n# CI 卡点：有乱码就失败\ncheck-mojibake --root . --fail-on-find\n```\n\n\u003Cdetails>\n\u003Csummary>不安装直接用\u003C\u002Fsummary>\n\n```bash\npython scripts\u002Fcheck_mojibake.py --root .\u002Fsrc\n# 或\npython -m check_mojibake --root .\u002Fsrc\n```\n\n\u003C\u002Fdetails>\n\n## 工作原理\n\n### 七大检测维度\n\n| 乱码类型 | 产生原因 | 检测信号 | 权重 |\n|:---------|:---------|:---------|:----:|\n| **口字码** | UTF-8 读 GBK | Unicode 替换字符 `�` | `+12` |\n| **破标签** | AI 编辑吞尖括号 | 损坏的 HTML 闭合标签 | `+10` |\n| **古文码** | GBK 读 UTF-8 | 18 个已知乱码码点 | `+2` |\n| **锟拷码** | UTF-8→GBK→UTF-8 双重转换 | `锟斤拷` 模式匹配 | `+8` |\n| **烫屯码** | VC 调试未初始化内存 | `烫烫烫` \u002F `屯屯屯` 重复 | `+6` |\n| **问句码** | 双重转换 | 中文后连续 `??` | `+8` |\n| **符号码** | ISO8859-1 读 UTF-8 | 拉丁扩展字符密集出现 | `+2` |\n\n> 文件总分 > 0 即标记为可疑。分数越高，乱码越严重。\n\n### 修复机制\n\n`--fix-gbk` 不是盲目替换，遵循**三重安全门**：\n\n```\n ① 尝试 GB18030 编码 → UTF-8 解码（逆向还原）\n    也尝试反方向：UTF-8 编码 → GB18030 解码\n         ↓\n ② 修复后分数必须 ≤ 原分数的 1\u002F3\n         ↓\n ③ 绝对改善值必须 ≥ 8 分\n         ↓\n   ┌─ 全部通过 → 写入修复 + 创建 .bak.mojibake 备份\n   └─ 任一失败 → 跳过，标记为需人工检查\n```\n\n## 集成方式\n\n### GitHub Actions — 一行搞定\n\n```yaml\nname: Encoding Guard\non: [push, pull_request]\njobs:\n  check:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions\u002Fcheckout@v4\n      - uses: haodehaode378\u002Ftext-encoding-guard@v1\n```\n\nPR 有乱码自动拦截。\n\n\u003Cdetails>\n\u003Csummary>可选参数\u003C\u002Fsummary>\n\n```yaml\n- uses: haodehaode378\u002Ftext-encoding-guard@v1\n  with:\n    root: '.\u002Fsrc'              # 扫描目录（默认 .）\n    fix-gbk: 'true'            # 自动修复（默认 false）\n    ext: '.sql,.cfg'           # 额外扩展名\n```\n\n\u003C\u002Fdetails>\n\n### Claude Code — 每次编辑自动触发\n\n**第一步** — 添加 PostToolUse hook：\n\n```json\n{\n  \"hooks\": {\n    \"PostToolUse\": [\n      {\n        \"matcher\": \"Write|Edit\",\n        \"command\": \"python scripts\u002Fcheck_mojibake.py --root .\",\n        \"description\": \"Check for Chinese mojibake after file edits\"\n      }\n    ]\n  }\n}\n```\n\n**第二步**（可选）— 安装 skill：\n\n```bash\ncp -r .claude\u002Fskills\u002Ftext-encoding-guard\u002F \u002Fpath\u002Fto\u002Fyour\u002Fproject\u002F.claude\u002Fskills\u002F\n```\n\n## CLI 参考\n\n| 参数 | 说明 | 示例 |\n|:-----|:-----|:-----|\n| `--root PATH` | 要扫描的根目录（必填） | `--root .\u002Fsrc` |\n| `--json` | JSON 格式输出 | `--json` |\n| `--fail-on-find` | 发现可疑文件时退出码 2 | `--fail-on-find` |\n| `--fix-gbk` | 尝试自动 GBK→UTF-8 修复 | `--fix-gbk` |\n| `--ext` | 添加额外扩展名（可多次） | `--ext .sql --ext .cfg` |\n| `--verbose` | 显示详细诊断信息 | `-v` |\n| `--quiet` | 静默模式 | `-q` |\n\n### 扫描范围\n\n**默认 20 种扩展名：**\n`.py` `.md` `.txt` `.json` `.yaml` `.yml` `.toml` `.ini` `.js` `.ts` `.tsx` `.jsx` `.vue` `.html` `.css` `.scss` `.sh` `.bat` `.ps1` `.xml`\n\n**自动跳过：**\n`.git` `node_modules` `dist` `build` `__pycache__` `.venv` `venv` `target` `.idea` `.vscode` `.claude`\n\n## 为什么用这个工具？\n\n| 对比项 | 本工具 | 手动检查 | file\u002Fuchardet |\n|:-------|:------:|:--------:|:-------------:|\n| 精确检测乱码 | :white_check_mark: 评分制 | :x: 肉眼看 | :x: 只检测编码 |\n| 自动修复 | :white_check_mark: 保守+备份 | :x: | :x: |\n| CI 集成 | :white_check_mark: 退出码 2 | :x: | :x: |\n| AI 助手触发 | :white_check_mark: hooks | :x: | :x: |\n| 零依赖 | :white_check_mark: 纯 stdlib | N\u002FA | 需要安装 |\n| 双向修复 | :white_check_mark: UTF-8↔GBK | :x: | :x: |\n\n## 开发\n\n```bash\n# 安装开发依赖\npip install -e \".[test]\"\n\n# 运行测试\npytest\n\n# 运行测试（含覆盖率）\npytest --cov=src\u002Fcheck_mojibake --cov-report=term-missing\n```\n\n## 许可证\n\n[MIT](LICENSE)\n\n---\n\n\u003Ca id=\"english\">\u003C\u002Fa>\n\n# English\n\n**AI agents corrupting your Chinese text? Detect it, fix it, ship it.**\n\n## The Problem\n\nWhen AI coding assistants (Claude, Cursor, Copilot, Codex) edit files with Chinese text, encoding corruption silently creeps in. UTF-8 bytes get misread as GBK — and you don't notice until users see garbage.\n\n```\n Before (corrupted)              After (recovered)\n ─────────────────────           ─────────────────────\n \u002F\u002F 鐢ㄦ埛鐧诲綍鎴愬姛             \u002F\u002F 用户登录成功\n \u002F\u002F 鏁版嵁搴撹繛鎺ュけ璐�           \u002F\u002F 数据库连接失败\n \u002F\u002F 璁㈠崟鍒涘缓鎴愬姛              \u002F\u002F 订单创建成功\n```\n\nBroken HTML tags are also caught:\n\n```html\n\u003C!-- After AI edit — broken tags -->\n\u003Cp>鐢ㄦ埛涓�績\u003C\u002Fp>\n?\u002Fdiv>              \u003C!-- ← \u003C\u002Fdiv> lost its \u003C -->\n\n\u003C!-- After fix — fully recovered -->\n\u003Cp>用户中心\u003C\u002Fp>\n\u003C\u002Fdiv>\n```\n\n### Who Gets Hit?\n\n| Scenario | Risk | Impact |\n|:---------|:----:|:-------|\n| Claude Code editing Vue\u002FReact with Chinese | **Very High** | UI text becomes garbled, users see it |\n| Cursor batch-refactoring Chinese comments | **High** | Code readability drops to zero |\n| Copilot generating Chinese docs | **Medium** | README \u002F CHANGELOG become unreadable |\n| CI\u002FCD scripts processing Chinese files | **Medium** | Silent corruption, discovered after deploy |\n\n## Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fhaodehaode378\u002Ftext-encoding-guard.git\ncd text-encoding-guard\npip install -e .\n```\n\n## Quick Start\n\n```bash\n# Scan a project\ncheck-mojibake --root .\u002Fsrc\n\n# Auto-fix (creates .bak backup)\ncheck-mojibake --root .\u002Fsrc --fix-gbk\n\n# JSON output\ncheck-mojibake --root .\u002Fsrc --json\n\n# CI gate: fail on findings\ncheck-mojibake --root . --fail-on-find\n```\n\n\u003Cdetails>\n\u003Csummary>Run without installation\u003C\u002Fsummary>\n\n```bash\npython scripts\u002Fcheck_mojibake.py --root .\u002Fsrc\n# or\npython -m check_mojibake --root .\u002Fsrc\n```\n\n\u003C\u002Fdetails>\n\n## How It Works\n\n### 7 Detection Types\n\n| Type | Cause | Signal | Weight |\n|:-----|:------|:-------|:------:|\n| **Box chars** | UTF-8 read as GBK | Unicode replacement char `�` | `+12` |\n| **Broken tags** | AI swallows angle brackets | Corrupted HTML closing tags | `+10` |\n| **Ancient text** | GBK read as UTF-8 | 18 known mojibake codepoints | `+2` |\n| **Kun-Kao** | UTF-8→GBK→UTF-8 double conversion | `锟斤拷` pattern match | `+8` |\n| **Tang-Tun** | VC debug uninitialized memory | `烫烫烫` \u002F `屯屯屯` repeats | `+6` |\n| **Question code** | Double conversion | Consecutive `??` after Chinese | `+8` |\n| **Symbol code** | ISO8859-1 read as UTF-8 | Dense Latin Extended characters | `+2` |\n\n> Any file with score > 0 is flagged. Higher score = worse corruption.\n\n### Fix Mechanism\n\n`--fix-gbk` is not a blind replace — it follows a **triple safety gate**:\n\n```\n ① Try GB18030 encode → UTF-8 decode (reverse recovery)\n    Also try the other direction: UTF-8 encode → GB18030 decode\n         ↓\n ② Fixed score must be ≤ 1\u002F3 of original score\n         ↓\n ③ Absolute improvement must be ≥ 8 points\n         ↓\n   ┌─ All pass → Write fix + create .bak.mojibake backup\n   └─ Any fail → Skip, mark for manual review\n```\n\n## Integration\n\n### GitHub Actions — One Line\n\n```yaml\nname: Encoding Guard\non: [push, pull_request]\njobs:\n  check:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions\u002Fcheckout@v4\n      - uses: haodehaode378\u002Ftext-encoding-guard@v1\n```\n\nPRs with mojibake get blocked automatically.\n\n\u003Cdetails>\n\u003Csummary>Optional parameters\u003C\u002Fsummary>\n\n```yaml\n- uses: haodehaode378\u002Ftext-encoding-guard@v1\n  with:\n    root: '.\u002Fsrc'              # Scan directory (default .)\n    fix-gbk: 'true'            # Auto-fix (default false)\n    ext: '.sql,.cfg'           # Extra extensions\n```\n\n\u003C\u002Fdetails>\n\n### Claude Code — Auto-run on Every Edit\n\n**Step 1** — Add a PostToolUse hook:\n\n```json\n{\n  \"hooks\": {\n    \"PostToolUse\": [\n      {\n        \"matcher\": \"Write|Edit\",\n        \"command\": \"python scripts\u002Fcheck_mojibake.py --root .\",\n        \"description\": \"Check for Chinese mojibake after file edits\"\n      }\n    ]\n  }\n}\n```\n\n**Step 2** (optional) — Install the skill:\n\n```bash\ncp -r .claude\u002Fskills\u002Ftext-encoding-guard\u002F \u002Fpath\u002Fto\u002Fyour\u002Fproject\u002F.claude\u002Fskills\u002F\n```\n\n## CLI Reference\n\n| Flag | Description | Example |\n|:-----|:------------|:--------|\n| `--root PATH` | Root directory to scan (required) | `--root .\u002Fsrc` |\n| `--json` | JSON output | `--json` |\n| `--fail-on-find` | Exit code 2 when suspicious files found | `--fail-on-find` |\n| `--fix-gbk` | Attempt auto GBK→UTF-8 fix | `--fix-gbk` |\n| `--ext` | Add extra extensions (repeatable) | `--ext .sql --ext .cfg` |\n| `--verbose` | Show detailed diagnostics | `-v` |\n| `--quiet` | Suppress output | `-q` |\n\n### Scan Scope\n\n**Default 20 extensions:**\n`.py` `.md` `.txt` `.json` `.yaml` `.yml` `.toml` `.ini` `.js` `.ts` `.tsx` `.jsx` `.vue` `.html` `.css` `.scss` `.sh` `.bat` `.ps1` `.xml`\n\n**Auto-skipped:**\n`.git` `node_modules` `dist` `build` `__pycache__` `.venv` `venv` `target` `.idea` `.vscode` `.claude`\n\n## Why This Tool?\n\n| Feature | This Tool | Manual Check | file\u002Fuchardet |\n|:--------|:---------:|:------------:|:-------------:|\n| Precise mojibake detection | :white_check_mark: Scoring | :x: Eyeball | :x: Encoding only |\n| Auto-fix | :white_check_mark: Conservative + backup | :x: | :x: |\n| CI integration | :white_check_mark: Exit code 2 | :x: | :x: |\n| AI assistant trigger | :white_check_mark: Hooks | :x: | :x: |\n| Zero dependencies | :white_check_mark: Pure stdlib | N\u002FA | Requires install |\n| Bidirectional fix | :white_check_mark: UTF-8↔GBK | :x: | :x: |\n\n## Development\n\n```bash\n# Install dev dependencies\npip install -e \".[test]\"\n\n# Run tests\npytest\n\n# Run tests with coverage\npytest --cov=src\u002Fcheck_mojibake --cov-report=term-missing\n```\n\n## License\n\n[MIT](LICENSE)\n","AI Text Encoding Guard 是一个用于检测并修复由 AI 编码助手导致的中文乱码问题的工具。它支持 GitHub Action 和 Claude Code hook，能够自动识别和纠正 UTF-8 与 GBK 编码错误，以及 HTML 标签损坏等问题。该工具使用 Python 开发，无需额外依赖即可运行，并且提供了命令行接口（CLI）以方便集成到持续集成\u002F持续部署（CI\u002FCD）流程中。特别适用于涉及中文文本处理的项目，如 Vue\u002FReact 组件开发、代码注释重构、文档生成等场景，有助于提高代码质量和用户体验。","2026-06-11 04:01:48","CREATED_QUERY"]