[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-81842":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":28,"discoverSource":29},81842,"osv-bloom","endevco\u002Fosv-bloom","endevco","Bloom filter of malicious npm package buckets from OSV. Rebuilt every 10min.","https:\u002F\u002Fgithub.com\u002Fendevco\u002Faube",null,"Rust",30,3,26,0,2,4,6,47.21,"MIT License",false,"main",true,[],"2026-06-12 04:01:35","# osv-bloom\n\nA small, cron-refreshed bloom filter of `(npm package name, semver major bucket)` pairs drawn from OSV `MAL-*` advisories — the malicious-package archive at \u003Chttps:\u002F\u002Fosv-vulnerabilities.storage.googleapis.com\u002Fnpm\u002Fall.zip>.\n\nBuilt so package managers (initially [aube](https:\u002F\u002Fgithub.com\u002Fendevco\u002Faube)) can probe every lockfile entry on every install for ~free, then escalate to OSV's live `\u002Fv1\u002Fquerybatch` only on a bloom hit. False-positive rate is `0.1%`, so a typical lockfile of ~1000 packages will trigger zero or one live-API call per install in steady state.\n\n## Consume\n\nServed via GitHub Pages — no binary artifacts live in git. 
The\n[`refresh` workflow](.\u002F.github\u002Fworkflows\u002Frefresh.yml) rebuilds the\nfilter every 10 minutes and re-deploys the site.\n\n- `filter.bin` — the bloom filter itself\n- `manifest.json` — params, timestamps, digests\n\nURLs (CDN-cached, `If-None-Match` for change detection):\n\n```\nhttps:\u002F\u002Fendevco.github.io\u002Fosv-bloom\u002Ffilter.bin\nhttps:\u002F\u002Fendevco.github.io\u002Fosv-bloom\u002Fmanifest.json\n```\n\nRust consumers can depend on the reader crate directly:\n\n```toml\n[dependencies]\nosv-bloom = { git = \"https:\u002F\u002Fgithub.com\u002Fendevco\u002Fosv-bloom\" }\n```\n\n```rust\nuse osv_bloom::{Bloom, bucket};\n\nlet bytes = std::fs::read(\"filter.bin\")?;\nlet bloom = Bloom::decode(&bytes)?;\n\nif bloom.contains(\"evil-pkg\", &bucket(1, 0)) {\n    \u002F\u002F probable hit — escalate to OSV live API for the exact (name, version)\n}\n```\n\n## Refresh cadence\n\nGitHub Actions cron runs every 10 minutes. The workflow re-downloads `all.zip`, rebuilds the entry set, and re-deploys the Pages site. Most ticks redeploy a byte-identical filter; clients short-circuit via `manifest.set_digest_sha256` so the bloom is only re-downloaded when the underlying entry set actually changed.\n\n## Detection latency\n\nosv-bloom is a **post-disclosure** defence. The filter only contains packages OSV has already published as `MAL-*`. Observed lag between a malicious npm publish and the corresponding `MAL-*` entry landing in OSV's `all.zip` is on the order of hours to ~1 day (e.g. ~24 h for the [TanStack 2026-05-11 incident](https:\u002F\u002Ftanstack.com\u002Fblog\u002Fnpm-supply-chain-compromise-postmortem)). 
Within that window osv-bloom returns clean — same as querying OSV's live API would.\n\nThe 10-minute refresh cadence keeps the published filter in lockstep with whatever OSV currently exposes; it does not shorten OSV's own ingestion latency.\n\nFor staleness monitoring, consumers can compare `manifest.newest_mal_modified` (the max `modified` timestamp across all consumed advisories) against `built_at_unix` — if `newest_mal_modified` stops advancing while `built_at` keeps ticking, the upstream OSV feed is the bottleneck, not this filter.\n\n## Key encoding\n\nFor each `affected[]` in a `MAL-*.json`:\n\n1. Skip if `package.ecosystem != \"npm\"`.\n2. If `affected.versions[]` is populated (typical for malicious uploads), parse each as semver and emit one bucket per version.\n3. Else walk `affected.ranges[].events[]`:\n   - `introduced: \"0\"` → emit the wildcard bucket `\"*\"` (matches any version of this package).\n   - `introduced: \"\u003Csemver>\"` → emit that version's bucket.\n   - `fixed` \u002F `last_affected` → emit that bucket too (defensive).\n4. If nothing parsed, emit `\"*\"`.\n\nBucket encoding:\n\n| version            | bucket |\n|--------------------|--------|\n| `1.2.3`            | `\"1\"`  |\n| `3.7.0`            | `\"3\"`  |\n| `0.3.7`            | `\"0.3\"` |\n| `0.0.1`            | `\"0.0\"` |\n| _any version_      | `\"*\"`  |\n\nPre-1.0 packages bucket by `0.\u003Cminor>` because semver allows breaking changes between minors below 1.0 — bucketing by `0` alone would false-positive every install of any 0.x package that ever had a vuln.\n\n## Wire format (v1)\n\nLittle-endian. 
64-byte header + bitset.\n\n```text\noffset  size  field\n0       4     magic = b\"OSVB\"\n4       4     format_version (u32) = 1\n8       8     m  (u64) — bit count\n16      4     k  (u32) — hash count\n20      4     n  (u32) — entries inserted\n24      8     built_at_unix_seconds (u64)\n32      32    seed (BLAKE3 keyed-hash key)\n64      ceil(m\u002F8)  bitset (LE bit order: bit i of byte j is mask `1 \u003C\u003C (i % 8)`, byte j = i \u002F 8)\n```\n\nHashing: keyed BLAKE3 over `name || 0x00 || bucket`. The 32-byte digest is split into `h1 = u64::from_le_bytes(d[0..8])` and `h2 = u64::from_le_bytes(d[8..16])`. Bit indices are `(h1 + i*h2) mod m` for `i in 0..k` ([Kirsch–Mitzenmacher double hashing](https:\u002F\u002Fwww.eecs.harvard.edu\u002F~michaelm\u002Fpostscripts\u002Frsa2008.pdf)).\n\nThe seed is deterministic and public (`blake3::hash(b\"osv-bloom v1 deterministic seed\")`); bloom hashing is not a cryptographic operation. If the seed ever needs to change, bump `format_version` — every deployed client has to refetch anyway.\n\n## Output is deterministic\n\nFor a given input set the bitset bytes are byte-identical across runs (constant seed + sorted-deduped entry list). Only the `built_at_unix_seconds` field inside the 64-byte header changes every run, so clients use `manifest.set_digest_sha256` — computed over the sorted entry set, timestamp-free — to decide whether to re-download.\n\n## Sizing\n\nAt the current OSV state (~212K MAL-* advisories, ~216K unique `(name, bucket)` pairs):\n\n- `m` ≈ 3.1M bits\n- `k` = 10\n- Filter size: ~380 KB\n\nFilter size scales linearly with entry count. 
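Headroom is fine — even a 1M-entry world is ~1.8 MB.\n\n## Build locally\n\n```sh\ncargo run --release -p osv-bloom-build -- --out-dir dist\n```\n\nTakes ~30s on a typical laptop, mostly downloading the 200 MB OSV zip.\n\n## License\n\nMIT.\n","osv-bloom is a bloom filter of malicious npm packages, sourced from OSV's `MAL-*` advisories and refreshed every 10 minutes. Built in Rust, it provides a compact data structure that checks a project's dependencies at very low cost, with a false-positive rate of about 0.1%. On a bloom hit during installation, it triggers a query to OSV's live API to get exact information. It suits scenarios that need a fast first-pass screen of npm package safety, such as integration into package managers for automatic detection, which reduces direct calls to external services, improving performance and lowering cost.","2026-06-11 04:06:54","CREATED_QUERY"]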