[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80674":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":39,"readmeContent":40,"aiSummary":41,"trendingCount":14,"starSnapshotCount":14,"syncStatus":12,"lastSyncTime":42,"discoverSource":43},80674,"phantomstars","tg12\u002Fphantomstars","tg12","Automated detection and tracking of fake engagement on GitHub — daily CI, zero infrastructure",null,"Python",56,2,48,0,4,8,1,44.23,"Other",false,"main",true,[24,25,26,27,28,29,30,31,32,33,34,35,36,37,38],"astroturfing","automation","bot-detection","fake-engagement","fake-stars","github","github-actions","github-trending","infosec","osint","python","security","spam-detection","sybil-detection","threat-intelligence","2026-06-12 04:01:29","\u003Cp align=\"center\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fphantomstars-v0.1.0-blueviolet?style=for-the-badge\" alt=\"phantomstars\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.13-blue?style=for-the-badge\" alt=\"Python 3.13\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-Apache--2.0-green?style=for-the-badge\" alt=\"Apache 2.0\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCI-GitHub%20Actions-orange?style=for-the-badge\" alt=\"GitHub Actions\">\n  \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fruns-daily-brightgreen?style=for-the-badge\" alt=\"Daily\">\n\u003C\u002Fp>\n\n\u003Ch1 
align=\"center\">phantomstars\u003C\u002Fh1>\n\u003Cp align=\"center\">\u003Cstrong>Automated detection and tracking of fake engagement on GitHub\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Cp align=\"center\">\n  A \u003Ca href=\"https:\u002F\u002Flabs.jamessawyer.co.uk\u002F\">JS Labs\u003C\u002Fa> project &mdash;\n  part of the \u003Ca href=\"https:\u002F\u002Flabs.jamessawyer.co.uk\u002Fai-slop-intelligence-dashboards\u002F\">AI Slop Intelligence\u003C\u002Fa> initiative.\u003Cbr>\n  Runs every day. Scores every suspicious account. Detects coordinated bot campaigns.\u003Cbr>\n  Files issues directly on compromised repos so maintainers can act.\n\u003C\u002Fp>\n\n---\n\n\u003Cp align=\"center\">\u003Cstrong>Support this project\u003C\u002Fstrong>\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ccode>BTC\u003C\u002Fcode> &nbsp; \u003Ccode>3QjWqhQbHdHgWeYHTpmorP8Pe1wgDjJy54\u003C\u002Fcode>\u003Cbr>\n  \u003Ccode>ETH\u003C\u002Fcode> &nbsp; \u003Ccode>0x5851e6145F4773d1585b8686095FB16E368a4dA1\u003C\u002Fcode>\u003Cbr>\n  \u003Ccode>ZEC\u003C\u002Fcode> &nbsp; \u003Ccode>t1KSR5YkNPbjqRSCoLKo5AddFWdm9Kzxh1B\u003C\u002Fcode>\n\u003C\u002Fp>\n\n---\n\n## Why this exists\n\nGitHub stars are a trust signal. They are how developers decide what to evaluate, what to depend on, and what to recommend. That signal is being systematically corrupted.\n\nDuring the AI boom of 2024-2026, an industry of bot farms emerged to manufacture credibility for low-quality, often malicious repositories. A project with 800 stars in 48 hours reads as legitimate to a developer scanning search results. That's the point. The goal of fake engagement isn't the stars themselves; it's the social proof those stars produce, and the downstream decisions that social proof influences.\n\nThe pattern is identifiable. Accounts created the same week, no bio, no followers, no original repositories, starring the same 15 repos within a 2-hour window. 
Not one campaign, but dozens running simultaneously, every day, across thousands of accounts. The data shows repos where 185 out of 185 engagers are bots. A 100% fakeness ratio. Entire trending placements built on nothing.\n\n**phantomstars** was built because this problem is tractable. The signal-to-noise ratio in GitHub's public API is, for now, still high enough that coordinated campaigns leave clear fingerprints. This project reads those fingerprints, publishes the raw data, and notifies affected repository maintainers directly.\n\nThis is part of the broader [AI Slop Intelligence](https:\u002F\u002Flabs.jamessawyer.co.uk\u002Fai-slop-intelligence-dashboards\u002F) work at [JS Labs](https:\u002F\u002Flabs.jamessawyer.co.uk\u002F), ongoing research into the mechanics and measurable effects of low-quality AI-generated content flooding developer ecosystems. Fake engagement isn't a peripheral issue. It's the distribution mechanism that gets slop in front of real users.\n\n---\n\n## What it does\n\n**phantomstars** runs a daily GitHub Actions job that:\n\n1. Scrapes the [GitHub Trending](https:\u002F\u002Fgithub.com\u002Ftrending) page for repos gaining stars today\n2. Queries the GitHub Search API for repos created in the last **7 days** with sudden star activity (the wider window catches multi-day campaigns missed by 24h-only scans)\n3. Seeds additional candidate repos from recent Reddit posts in `r\u002Fosinttools` and `r\u002Fcoolgithubprojects` by extracting GitHub repo links from the last **2 days**\n4. Pulls recent engagement events (stars, forks) via the Events API (last 24 hours per repo)\n5. Fetches the full profile of every engaging account via GraphQL: **account creation date**, follower\u002Ffollowing counts, bio, repo history\n6. Scores every account against a composite heuristics model: account age, profile completeness, repository patterns, and activity history\n7. 
Detects **coordinated campaigns** using timestamp clustering and union-find: clusters of suspicious accounts that engaged within a 3-hour window\n8. Applies the false-positive allowlist before ledger writes, repo-level ratios, dashboards, and notifications so every visible metric uses the same population\n9. Appends all suspects to an append-only JSONL ledger committed back to this repo\n10. Publishes a per-repo intelligence feed showing which repos are being targeted, which discovery sources found them, and whether the Events API window was complete or capped\n11. **Files GitHub issues directly on targeted repos** so maintainers see the campaign data in their own issue tracker\n12. Writes a formatted scan report to the GitHub Actions job summary\n\nNo servers. No databases. No infrastructure bill.\n\n---\n\n## Frequently asked questions\n\n### Does it notify the targeted repo?\n\n**Yes.** When a repo's fakeness ratio exceeds 40% or a coordinated campaign is detected, phantomstars opens an issue directly on that repository. The issue contains the full suspect table, campaign membership, composite scores, and account creation dates: everything a maintainer needs to investigate and report to GitHub.\n\nIf issues are disabled on a targeted repo, the notification is skipped silently and recorded in the scan log.\n\n### Can I request a check for one specific repository?\n\nYes.\n\n- For a normal one-off check, submit a repo in `owner\u002Frepo` form and run a targeted scan.\n- For a lifetime audit request, use the one-off lifetime mode. 
It is separate from the daily scan.\n\nWhy the split:\n\n- The normal scan model is designed for recent public engagement and low operator cost.\n- A lifetime audit can involve tens of thousands of stars and thousands of forks on larger repos.\n- That is feasible for one-off investigation, but it is too expensive and too slow for the default daily path.\n- Lifetime requests therefore run only in explicit one-off mode with guardrails.\n\n### Can I report a false positive?\n\nYes. If your account appears in `data\u002Fsuspects.jsonl` and you believe the classification is incorrect, [open a false positive issue](..\u002F..\u002Fissues\u002Fnew?template=false_positive.yml) using the provided template. Reports are reviewed manually before any allowlist addition. The allowlist is stored in `data\u002Fallowlist.txt`; accounts listed there are excluded from all future scans and from the suspects ledger.\n\n### What is the campaign ID?\n\nA campaign ID (e.g. `c-a3f9b2e1`) is a **deterministic 8-character hex fingerprint** derived from the SHA-256 hash of the sorted set of member logins in that campaign. The same group of accounts will produce the same campaign ID across independent scan runs, enabling longitudinal tracking. It is not a repo name, a username, or any external identifier.\n\n**Stability:** the ID is stable as long as the campaign's member set is unchanged. If bots are added or suspended between scans, the ID changes because the membership changed. This is expected and reflects real-world drift in bot farm composition.\n\n### Does it check account creation dates?\n\nYes. Every account's creation date is fetched from the GitHub GraphQL API (`createdAt` field) and stored in each suspect record as `account_created_at`. It's also the primary input to the account age score, the strongest single signal for fake accounts. 
Accounts created within 2 days of engaging score 1.0 on age alone.\n\n### How confident is it?\n\nIndividual scores carry meaningful false positive rates. A new developer with a sparse profile legitimately scores 0.75+. The tool accounts for this by requiring campaign-level evidence before filing issues; a single suspicious account is not enough. A coordinated cluster of 40+ accounts, all created the same week, all scoring 0.75+, all engaging within 90 minutes, is a different matter. That's where confidence becomes actionable.\n\nThe data is always probabilistic. The issue bodies say so explicitly. The goal is to give maintainers the signal and the raw evidence to make their own judgement.\n\n---\n\n## Live dashboard\n\n\u003C!-- STATS:START -->\n| Date | Scanned | Likely Fake | Suspicious | Campaigns | New Fakes (24h) |\n|------|---------|-------------|------------|-----------|-----------------|\n| 2026-05-30 | 2576 | 530 | 2046 | 20 | 356 |\n| 2026-05-29 | 2838 | 733 | 2105 | 42 | 369 |\n| 2026-05-28 | 2748 | 694 | 2054 | 39 | 396 |\n| 2026-05-27 | 2193 | 560 | 1633 | 32 | 491 |\n| 2026-05-26 | 1930 | 236 | 1694 | 43 | 190 |\n| 2026-05-25 | 1526 | 214 | 1312 | 32 | 158 |\n| 2026-05-24 | 2170 | 358 | 1812 | 39 | 265 |\n| 2026-05-23 | 2548 | 426 | 2122 | 43 | 317 |\n| 2026-05-22 | 2318 | 340 | 1978 | 47 | 247 |\n| 2026-05-21 | 1981 | 348 | 1633 | 25 | 277 |\n| 2026-05-20 | 1613 | 268 | 1345 | 23 | 163 |\n| 2026-05-19 | 5463 | 630 | 4121 | 67 | 442 |\n| 2026-05-18 | 8838 | 670 | 7950 | 128 | 340 |\n| 2026-05-17 | 8015 | 831 | 5709 | 82 | 831 |\n\u003C!-- STATS:END -->\n\n---\n\n## Today's most-targeted repos\n\n\u003C!-- REPO_STATS:START -->\n| Repo | Engagers | Likely Fake | Known Fake % | Fakeness % | Campaigns | Coverage | Sources |\n|------|----------|-------------|--------------|------------|-----------|----------|---------|\n| yuyefeiyu\u002Fyt-downloader | 175 | 69 | 44.6% | 39.4% | 1 | complete | github_search_recent |\n| rasoir0591\u002FCrosshair-X | 179 | 
68 | 44.1% | 38.0% | 1 | complete | github_search_recent |\n| Allanlv5324F\u002FLossless-Scaling-Github | 180 | 67 | 43.3% | 37.2% | 1 | complete | github_search_recent |\n| bonus-2026\u002Fcrypto-casino-bonus | 289 | 59 | 21.5% | 20.4% | 1 | complete | github_search_recent |\n| 2aronS\u002FDuel-Agents | 289 | 51 | 14.5% | 17.6% | 1 | capped | github_search_recent |\n| ace-trump-tech\u002FDeltaForce-OBS-Locker | 152 | 50 | 2.6% | 32.9% | 1 | complete | github_search_recent |\n| risedownlabs\u002Fpolymarket-weather-bot | 111 | 49 | 65.8% | 44.1% | 1 | complete | github_search_recent |\n| TYOPxyz\u002Fsolana-pumpfun-bundler | 107 | 48 | 66.4% | 44.9% | 1 | complete | github_search_recent |\n| defi-ape\u002Fpolymarket-kalshi-arbitrage-bot | 105 | 42 | 66.7% | 40.0% | 1 | complete | github_search_recent |\n| DigitalPlatDev\u002FFreeDomain | 296 | 36 | 0.3% | 12.2% | 1 | complete | github_trending |\n| anthropic-claude-code-ai\u002Ffree-claude-code-ai-desktop-app | 117 | 35 | 26.5% | 29.9% | 1 | complete | github_search_recent |\n| Jadoox3\u002FMina-The-Hollower-Release | 120 | 35 | 24.2% | 29.2% | 1 | complete | github_search_recent |\n| malk190\u002FRomestead-Game-Release | 119 | 34 | 25.2% | 28.6% | 1 | complete | github_search_recent |\n| tor-browsers\u002Ftor-browser | 123 | 34 | 22.8% | 27.6% | 1 | complete | github_search_recent |\n| Beam-NG-Drive\u002FBeamMP | 119 | 32 | 24.4% | 26.9% | 1 | complete | github_search_recent |\n| PolyMomentum-Labs\u002F.github | 77 | 30 | 63.6% | 39.0% | 1 | capped | github_search_recent |\n| Stellarwolf001\u002Fforza-horizon-6-spotify-radio | 94 | 23 | 14.9% | 24.5% | 1 | complete | github_search_recent |\n| zhristophe\u002FClaude-Mythos-AI-Anthropic-App | 97 | 23 | 15.5% | 23.7% | 1 | complete | github_search_recent |\n| Dharyen\u002Fryujinx-emu | 92 | 22 | 16.3% | 23.9% | 1 | complete | github_search_recent |\n| openfi-dao\u002Fkalshi-trading-bot | 67 | 20 | 71.6% | 29.9% | 1 | complete | github_search_recent |\n| 
metavault-fi\u002Fsolana-trading-bot | 48 | 19 | 68.8% | 39.6% | 1 | capped | github_search_recent |\n| veryyoldman\u002FGenspark-AI | 60 | 19 | 20.0% | 31.7% | 1 | complete | github_search_recent |\n| Bartates\u002Flunar-client-minecraft | 60 | 18 | 23.3% | 30.0% | 1 | complete | github_search_recent |\n| Szili1994\u002Fcreate-aeronautics-minecraft-mod | 62 | 18 | 21.0% | 29.0% | 1 | complete | github_search_recent |\n| Noahmusahdevs\u002Froblox-account-manager | 64 | 18 | 25.0% | 28.1% | 1 | complete | github_search_recent |\n\u003C!-- REPO_STATS:END -->\n\n---\n\n## Scoring model\n\nEach account receives a composite suspicion score (0.0 = clean, 1.0 = likely fake) from four signals:\n\n| Signal | Weight | Measurement |\n|--------|--------|-------------|\n| Account age | 35% | `\u003C 2 days` → 1.00 &middot; `\u003C 7 days` → 0.90 &middot; `\u003C 30 days` → 0.55 &middot; `\u003C 90 days` → 0.20 &middot; older → 0.00 |\n| Profile completeness | 30% | Points for: no bio (+0.25), no location (+0.15), no company (+0.10), zero followers (+0.30), zero following (+0.10), bot-pattern username (+0.20) |\n| Repository pattern | 25% | Zero repos → 0.90 &middot; all repos are forks → 0.80 &middot; >85% fork ratio → 0.55 |\n| Activity history | 10% | Accounts >14 days old with zero repos + zero social graph → 0.80 (ghost accounts). Zero repos only → 0.60. All-forks + no social graph → 0.50 |\n\n**Classification thresholds:**\n\n| Score | Classification |\n|-------|---------------|\n| &ge; 0.75 | `likely_fake` |\n| &ge; 0.45 | `suspicious` |\n| \u003C 0.45 | `clean` (not stored) |\n\n### Campaign detection\n\nA **campaign** is a group of &ge; 4 suspicious accounts that all engaged with the same repo within a 3-hour window. 
The algorithm uses union-find to build connected components; accounts that co-engaged within the window are merged, and any component above the minimum size is flagged as a coordinated campaign.\n\nCampaign IDs are stable SHA-256 fingerprints of the sorted member set. The same campaign detected on consecutive days will have the same ID as long as membership is unchanged.\n\n**Why campaigns are the real signal:** Individual scores have meaningful false positive rates. A new developer with a sparse profile can score 0.80 alone. Forty accounts all scoring 0.75+, created within the same week, all starring the same repo within 90 minutes, is not a coincidence. The campaign signal is where the data becomes actionable: the difference between a suspicious data point and evidence of a coordinated operation.\n\n---\n\n## Data format\n\nAll findings are committed to [`data\u002Fsuspects.jsonl`](data\u002Fsuspects.jsonl) and [`data\u002Frepos.jsonl`](data\u002Frepos.jsonl), one JSON record per line, append-only. 
The GitHub Actions job summary (visible in the Actions UI after each run) provides a formatted per-scan report.\n\n**suspects.jsonl** — one record per flagged account per scan:\n```json\n{\n  \"login\": \"user98432\",\n  \"account_age_score\": 0.9,\n  \"profile_score\": 0.8,\n  \"repo_pattern_score\": 0.8,\n  \"activity_score\": 0.85,\n  \"composite\": 0.842,\n  \"classification\": \"likely_fake\",\n  \"campaign_id\": \"c-a3f9b2e1\",\n  \"scan_date\": \"2026-05-17\",\n  \"account_created_at\": \"2026-05-15\",\n  \"target_repos\": [\"owner\u002Frepo-a\", \"owner\u002Frepo-b\"]\n}\n```\n\n**repos.jsonl** — one record per targeted repo per scan:\n```json\n{\n  \"full_name\": \"owner\u002Fsuspicious-repo\",\n  \"total_scanned\": 87,\n  \"likely_fake\": 62,\n  \"suspicious\": 18,\n  \"known_likely_fake\": 27,\n  \"known_likely_fake_ratio\": 0.310,\n  \"repeat_offenders\": 11,\n  \"allowlisted_excluded\": 3,\n  \"fakeness_ratio\": 0.713,\n  \"classification\": \"likely_fake\",\n  \"campaign_count\": 3,\n  \"discovery_sources\": [\"github_search_recent\", \"reddit_osinttools\"],\n  \"event_sample_complete\": false,\n  \"scan_date\": \"2026-05-17\"\n}\n```\n\n**Query examples:**\n\n```bash\n# All likely_fake accounts from today\njq 'select(.scan_date == \"2026-05-17\" and .classification == \"likely_fake\") | .login' data\u002Fsuspects.jsonl\n\n# Accounts created in the last 3 days that were flagged\njq 'select(.account_created_at >= \"2026-05-14\") | [.login, .account_created_at, .classification] | @tsv' -r data\u002Fsuspects.jsonl\n\n# Which repos were targeted today, sorted by fakeness ratio\njq 'select(.scan_date == \"2026-05-17\") | [.full_name, .fakeness_ratio, .likely_fake] | @tsv' -r data\u002Frepos.jsonl | sort -t$'\\t' -k2 -rn\n\n# Repos with the highest recycled-bot share from previously seen likely_fake accounts\njq 'select(.scan_date == \"2026-05-17\") | [.full_name, .known_likely_fake_ratio, .repeat_offenders] | @tsv' -r data\u002Frepos.jsonl | sort 
-t$'\\t' -k2 -rn\n\n# All members of a specific campaign\njq 'select(.campaign_id == \"c-a3f9b2e1\") | [.login, .account_created_at, .composite] | @tsv' -r data\u002Fsuspects.jsonl\n\n# Repos a specific account targeted\njq 'select(.login == \"user98432\") | .target_repos[]' data\u002Fsuspects.jsonl\n\n# High-confidence repos: fakeness ratio above 60%\njq 'select(.fakeness_ratio >= 0.6) | [.full_name, .fakeness_ratio, .campaign_count] | @tsv' -r data\u002Frepos.jsonl | sort -t$'\\t' -k2 -rn\n```\n\n---\n\n## Setup\n\n### 1. Fork this repo\n\nYour fork owns the data. Results are committed back to `data\u002Fsuspects.jsonl` and `data\u002Frepos.jsonl` on your fork after every daily run.\n\n### 2. Add a GitHub PAT secret\n\nCreate a **classic** Personal Access Token with scopes:\n- `public_repo`: read public repo events and stargazers, create issues on public repos\n- `read:user`: fetch user profiles via GraphQL\n\n**Settings &rarr; Secrets and variables &rarr; Actions &rarr; New repository secret** &rarr; name it `GH_TOKEN`.\n\n> The default `GITHUB_TOKEN` has restricted rate limits and cannot call the user GraphQL endpoint at full capacity. A PAT is required.\n\n### 3. Enable Actions\n\n**Actions &rarr; Enable GitHub Actions** on your fork. The workflow runs at **07:00 UK time daily** using the `Europe\u002FLondon` clock:\n- **06:00 UTC** during British Summer Time\n- **07:00 UTC** during Greenwich Mean Time\n\nNo extra scheduling environment variable is required. GitHub Actions cron is UTC-only, so the workflow triggers at both UTC hours and only proceeds when the local London time is 07:00. Manual trigger available via **Actions &rarr; Daily Phantom Stars Scan &rarr; Run workflow**.\n\nAfter each run, the formatted scan report is visible in **Actions &rarr; [run] &rarr; Summary**.\n\n### 4. 
Run locally\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FYOUR_USERNAME\u002Fphantomstars.git\ncd phantomstars\npython -m venv venv && source venv\u002Fbin\u002Factivate\npip install -e .\nGH_TOKEN=ghp_your_token python -m phantomstars.main\n```\n\nFor an ad hoc local run after setup:\n\n```bash\nGH_TOKEN=ghp_your_token python -m phantomstars.main\n```\n\nTo scan one repository instead of the normal discovery set:\n\n```bash\nPHANTOMSTARS_TARGET_REPO=owner\u002Frepo GH_TOKEN=ghp_your_token python -m phantomstars.main\n```\n\n### One-off requests\n\nUsers can request a one-off repo check in two ways:\n\n1. Open the `Repo Check Request` issue template and provide the target repo plus requested depth.\n2. Use **Actions -> Daily Phantom Stars Scan -> Run workflow** and optionally set:\n   - `target_repo`: `owner\u002Frepo`\n   - `request_depth`: `recent` or `lifetime-request`\n\nCurrent behavior:\n\n- `recent`: runs the targeted recent-engagement scan immediately.\n- `lifetime-request`: runs a targeted lifetime scan across historical stars and forks for that repo only.\n- The daily scheduled scan remains unchanged and continues to use the recent-engagement method.\n\nGuardrails for lifetime mode:\n\n- only available for explicit one-off targeted requests\n- capped by configured repository-size limits before the scan starts\n- slower and more API-intensive than the daily scan\n\n---\n\n## Project structure\n\n```\nphantomstars\u002F\n├── .github\u002F\n│   ├── workflows\u002Fdaily-scan.yml       # Runs daily at 07:00 Europe\u002FLondon\n│   └── ISSUE_TEMPLATE\u002Ffalse_positive.yml\n├── src\u002Fphantomstars\u002F\n│   ├── config.py                      # All constants, no argparse, no env parsing\n│   ├── models.py                      # Frozen dataclasses\n│   ├── github_client.py               # REST + GraphQL, tenacity retries, rate-limit aware\n│   ├── heuristics.py                  # Per-user composite scoring engine\n│   ├── campaigns.py                
   # Timestamp clustering + union-find\n│   ├── storage.py                     # JSONL append + query helpers\n│   ├── reporter.py                    # README dashboard injector\n│   ├── notifier.py                    # GitHub Issues notifier (files on targeted repos)\n│   └── main.py                        # Orchestration entry point\n├── tests\u002F\n│   ├── conftest.py\n│   ├── test_heuristics.py\n│   └── test_campaigns.py\n├── data\u002F\n│   ├── suspects.jsonl                 # Append-only account findings ledger\n│   ├── repos.jsonl                    # Append-only per-repo intelligence\n│   └── allowlist.txt                  # Accounts excluded from future scans\n└── pyproject.toml\n```\n\n---\n\n## Limitations and known failure modes\n\n- **Events API cap:** maximum 300 recent events per repo. Repos with thousands of stars in a day have partial coverage.\n- **Coverage flag:** repos that hit the 300-event cap are marked as `capped` in reports and dashboards; ratios on those repos are conservative samples, not full-day counts.\n- **Search index lag:** GitHub's search index is eventually consistent. Repos created seconds before the scan boundary may be missed.\n- **Heuristic drift:** Bot operators adapt. Score weights may require periodic tuning; adjust constants in `config.py`.\n- **Individual false positives:** A new developer with a sparse profile scores 0.75+ in isolation. Campaign membership is the high-confidence signal.\n- **Campaign ID drift:** If a bot farm's membership changes between scans (bots suspended, new bots added), the campaign ID changes. This reflects actual campaign evolution, not a bug.\n- **Rate limits:** 5,000 API requests\u002Fhour on an authenticated PAT. Well within limits for standard trending page sizes.\n- **Issues disabled:** Some targeted repos disable issues. 
Notifications for those repos are skipped silently.\n\n---\n\n## False positive process\n\nIf your account appears in `data\u002Fsuspects.jsonl` and you believe it is incorrectly classified:\n\n1. Find your entry: `jq 'select(.login == \"YOUR_LOGIN\")' data\u002Fsuspects.jsonl`\n2. [Open a false positive issue](..\u002F..\u002Fissues\u002Fnew?template=false_positive.yml) with your login, classification, scan date, and explanation\n3. Reports are reviewed manually. Verified false positives are added to `data\u002Fallowlist.txt` and excluded from all future scans, repo ratios, and issue notifications.\n\nNote: opening an issue does not modify or remove any existing data. The suspects ledger is append-only. The allowlist only affects future scans.\n\n---\n\n## Contributing\n\n```bash\npip install -e \".[dev]\"\npython -m black .\npython -m ruff check .\npython -m mypy src\npython -m pytest\n```\n\nAll four must pass before a PR.\n\n---\n\n## Disclaimer\n\nThis tool performs read-only analysis of public GitHub data using the official GitHub API. Where issues are filed on targeted repositories, they contain probabilistic findings and are clearly labelled as automated. Findings are indicators, not accusations. False positives exist and are expected.\n\nBuilt with AI as a coding partner, in response to an ecosystem problem created in part by AI.\n\n---\n\n## License\n\nApache 2.0. 
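See [LICENSE](LICENSE)\n\n---\n\n## Author\n\nBuilt by **tg12** &middot; [GitHub](https:\u002F\u002Fgithub.com\u002Ftg12)\n\nA **[JS Labs](https:\u002F\u002Flabs.jamessawyer.co.uk\u002F)** project &middot; [AI Slop Intelligence Dashboards](https:\u002F\u002Flabs.jamessawyer.co.uk\u002Fai-slop-intelligence-dashboards\u002F)\n","phantomstars is a project for the automated detection and tracking of fake engagement on GitHub. It runs as a daily continuous integration (CI) job, needs no additional infrastructure, and is implemented in Python. The project identifies and scores suspicious accounts, detects coordinated bot activity, and opens issues directly in affected repositories so that maintainers can take action. Its core features are automated fake-star detection, tracking, and notification. It is particularly useful for keeping the open-source ecosystem healthy and for preventing low-quality or malicious repositories from misleading developers with fabricated social proof. phantomstars is also part of the AI Slop Intelligence initiative at JS Labs, which aims to study and counter the negative effects of low-quality AI-generated content.","2026-06-11 04:01:35","CREATED_QUERY"]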