[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-11576":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":16,"rankGlobal":9,"rankLanguage":9,"license":17,"archived":18,"fork":18,"defaultBranch":19,"hasWiki":20,"hasPages":18,"topics":21,"createdAt":9,"pushedAt":9,"updatedAt":22,"readmeContent":23,"aiSummary":24,"trendingCount":14,"starSnapshotCount":14,"syncStatus":25,"lastSyncTime":26,"discoverSource":27},11576,"uap-release-analyzer","ckpxgfnksd-max\u002Fuap-release-analyzer","ckpxgfnksd-max","A Claude skill for analyzing tranches of declassified UAP\u002FUFO documents (war.gov PURSUE, FBI Vault, NARA, AARO). Inventory + text extraction + entity surfacing + standardized 11-section REPORT.md.",null,"Python",152,24,1,0,8,44.99,"MIT License",false,"main",true,[],"2026-06-12 04:00:55","# uap-release-analyzer\n\nA Claude Code \u002F Claude.ai skill that turns a folder of declassified UAP\u002FUFO documents — war.gov \"PURSUE\" releases, FBI Vault tranches, NARA boxes, AARO publications — into a structured analytic report.\n\n## What it does\n\nRun it against a release directory (e.g. 
`~\u002FDocuments\u002FUFO\u002Frelease_01\u002F`) and it produces:\n\n- `inventory.csv` — one row per file: agency (inferred from filename prefix), document type, page count, size\n- `text\u002F*.txt` — text extracted via pdfplumber; scanned files with no text layer (often the majority) yield empty files, which are flagged\n- `analytics\u002F`\n  - `top_terms.csv`, `terms_by_agency.csv` — token frequencies\n  - `entities.json` — locations, agencies, phenomena vocabulary, year clusters, names appearing in 5+ files\n  - `per_file_digest.csv` — top terms \u002F locations \u002F redactions \u002F 2-sentence summary per file\n  - `cross_doc.json` — redaction patterns, agency totals, scanned-vs-text split\n- `REPORT.md` — 11-section human-readable analytic writeup\n\nThe four scripts are idempotent and incremental — re-running on the same folder skips work that's already done.\n\n## Installation\n\n```bash\n# Inside Claude Code (per-user skills directory)\ngit clone https:\u002F\u002Fgithub.com\u002Fckpxgfnksd-max\u002Fuap-release-analyzer.git \\\n  ~\u002F.claude\u002Fskills\u002Fuap-release-analyzer\n```\n\nOr package via `skill-creator`:\n\n```bash\npython -m scripts.package_skill \u002Fpath\u002Fto\u002Fuap-release-analyzer\n# produces uap-release-analyzer.skill — install via Claude Code UI\n```\n\nDependencies: `pdfplumber`, `pypdf`. 
Install via `pip install pdfplumber pypdf`.\n\n## Layout\n\n```\nuap-release-analyzer\u002F\n├── SKILL.md              # frontmatter + workflow\n├── scripts\u002F\n│   ├── inventory.py\n│   ├── extract_text.py\n│   ├── analyze.py\n│   ├── build_report.py\n│   └── run_all.py        # convenience: run the four in order\n├── references\u002F\n│   ├── agency_vocab.md   # filename-prefix → agency rules\n│   ├── foia_codes.md     # FOIA exemptions and classification stamps\n│   └── war_gov_quirks.md # how war.gov\u002FUFO\u002F is structured + scraping notes\n├── evals\u002Fevals.json      # 4 test cases used to iterate the skill\n├── ARTICLE.md            # development notes (English)\n├── ARTICLE_CN.md         # 中文版开发笔记\n└── LICENSE.txt\n```\n\n## Usage\n\n```bash\n# One-shot: full pipeline\npython scripts\u002Frun_all.py ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F\n\n# Or step-by-step (inventory and extract are the slow parts; both are idempotent)\npython scripts\u002Finventory.py    ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F\npython scripts\u002Fextract_text.py ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F        # all files\npython scripts\u002Fextract_text.py ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F 0 25   # chunked\npython scripts\u002Fanalyze.py      ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F\npython scripts\u002Fbuild_report.py ~\u002FDocuments\u002FUFO\u002Frelease_01\u002F\n```\n\n## Example dataset\n\nThe May 2026 war.gov \"PURSUE\" release this skill was tuned against is mirrored at [`ckpxgfnksd-max\u002Fuap-release-01`](https:\u002F\u002Fgithub.com\u002Fckpxgfnksd-max\u002Fuap-release-01) (Git LFS, **~3.7 GB \u002F 160 files**: 118 PDFs, 28 MP4 videos, 14 images). 
Clone it as your `release_01\u002F` to reproduce the eval scoreboard:\n\n```bash\ngit lfs install   # one-time\ngit clone https:\u002F\u002Fgithub.com\u002Fckpxgfnksd-max\u002Fuap-release-01.git ~\u002FDocuments\u002FUFO\u002Frelease_01\npython scripts\u002Frun_all.py ~\u002FDocuments\u002FUFO\u002Frelease_01\n```\n\nOr fetch only the buckets you care about:\n\n```bash\nGIT_LFS_SKIP_SMUDGE=1 git clone https:\u002F\u002Fgithub.com\u002Fckpxgfnksd-max\u002Fuap-release-01.git ~\u002FDocuments\u002FUFO\u002Frelease_01\ncd ~\u002FDocuments\u002FUFO\u002Frelease_01\n\ngit lfs pull --include \"dow-uap-*.pdf\"     # text-bearing DOW mission report PDFs\ngit lfs pull --include \"dow-uap-pr*.mp4\"   # 27 DOW Unresolved-Report videos (1.3 GB)\ngit lfs pull --include \"65_hs1*\"           # heavy FBI scanned sections\n```\n\nThe 28 videos in the corpus aren't analyzed by this skill — `analyze.py` only reads PDF text. They're mirrored for completeness so future video-aware analysis (e.g., scene-classification or transcript extraction) has a stable input set. The skill flags any non-PDF file as `(image file — vision analysis required)` or similar in `per_file_digest.csv` and skips it for text analytics.\n\n## Eval scoreboard (iteration-1)\n\n| Eval | with skill | baseline | Δ |\n|---|---|---|---|\n| Full-tranche walkthrough | 100% | 60% | +40 |\n| Single-file summary | 100% | 100% | 0 |\n| Scanned-tranche honest caveats | 100% | 88% | +12 |\n| Fresh-tranche bootstrap | 88% | 50% | +38 |\n| **Mean** | **97%** | **74%** | **+23** |\n\nSee `ARTICLE.md` for the build story and the bugs the eval surfaced.\n\n## Honest caveats\n\n- Entity extraction is keyword-list + regex, not full NER. Year mentions ≠ incident dates.\n- Scanned PDFs (no text layer) produce 0-char `.txt` files by design — the analyzer treats them as \"OCR needed\" rather than running OCR (multi-hour). 
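Run Tesseract as a follow-up if you need that content searchable.\n- The agency vocabulary is tuned to the May 2026 war.gov tranche. New tranches may introduce new filename prefixes; add them to `references\u002Fagency_vocab.md` and to `PREFIX_RULES` in `scripts\u002Finventory.py`.\n\n## License\n\nMIT. See `LICENSE.txt`.\n","uap-release-analyzer is a Claude skill for analyzing declassified UAP\u002FUFO documents (such as files released through war.gov PURSUE, FBI Vault, NARA, and AARO). Its core functions are inventory generation, text extraction, entity recognition, and generation of a standardized 11-section report. The project is written in Python and relies on the pdfplumber and pypdf libraries to extract information from PDF documents; it can identify and record important entities such as locations and agency names. The tool is especially suited to researchers and enthusiasts who need to organize and run a preliminary analysis on large volumes of declassified UFO-related material. With its automated pipeline, users can quickly obtain key insights about a given document set and work more efficiently.",2,"2026-06-11 03:32:08","CREATED_QUERY"]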