[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79964":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":38,"readmeContent":39,"aiSummary":40,"trendingCount":16,"starSnapshotCount":16,"syncStatus":41,"lastSyncTime":42,"discoverSource":43},79964,"voidaccess","KatrielMoses\u002Fvoidaccess","KatrielMoses","Self-hosted dark web OSINT platform. Automated threat intelligence from query to graph in 13 steps. Free alternative to Recorded Future, DarkOwl, and Flare.","",null,"Python",231,38,5,6,0,30,151,153,90,94.77,"MIT License",false,"main",true,[27,28,29,30,31,32,33,34,35,36,37],"ai","cybersecurity","dark-web","darknet","opensource","osint","osint-tool","security","self-hosted","threat-intelligence","tor","2026-06-12 04:01:26","\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fpublic\u002Flogo_circle.png\" width=\"160\" alt=\"VoidAccess Logo\">\n  \u003Ch1>VoidAccess\u003C\u002Fh1>\n  \u003Cp>\u003Cstrong>A self-hosted OSINT platform for dark web threat intelligence.\u003C\u002Fstrong>\u003C\u002Fp>\n  \u003Cp>Automate the entire investigation workflow from query refinement to relationship mapping in 13 autonomous pipeline steps.\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n\n## The OSINT Powerhouse\n\nCommercial threat intelligence platforms often charge prohibitive annual fees for capabilities that can be run on private hardware. 
**VoidAccess** democratizes high-end dark web intelligence by providing an automated, end-to-end workflow:\n\n- **Query Refinement**: Intelligent search term optimization using LLMs.\n- **Multilingual Search**: Deep-web fan-out across English, Russian, and Chinese engines.\n- **Entity Extraction**: Autonomous identification of wallets, IOCs, PGP keys, and more.\n- **Relationship Mapping**: Dynamic graph generation from extracted data co-occurrence.\n- **Structured Export**: STIX 2.1, MISP, Sigma, and CSV support.\n\n---\n\n## Visual Walkthrough\n\n### 1. Intuitive Dashboard\nStart investigations with a clean, dark-themed interface designed for high-stakes research.\n![Homepage](.\u002Fpublic\u002Fhomepage.png)\n\n### 2. Intelligent Scoping\nRefine queries and select investigation depth with precision.\n![Topic Selection](.\u002Fpublic\u002Ftopic_selection.png)\n\n### 3. Real-time Pipeline Tracking\nMonitor the 13-step autonomous pipeline as it crawls and extracts intelligence.\n![Loading](.\u002Fpublic\u002Floading.png)\n\n### 4. Interactive Graph Intelligence\nExplore connections between entities, onion sites, and threat actors in a dynamic, high-contrast graph.\n![Node Selection](.\u002Fpublic\u002Fnode_selection.png)\n\n### 5. Comprehensive Intel Reports\nGet structured summaries and actionable artifacts once the scan completes.\n![Scan Completed](.\u002Fpublic\u002Fscan_completed.png)\n\n---\n\n## How It Works (The 13-Step Pipeline)\n\nVoidAccess handles the complexity of dark web research through a rigorous sequence:\n\n1. **LLM Query Refinement**: Optimizes search terms for .onion engine indexing.\n2. **Parallel Collection**: Queries 16+ Tor search engines simultaneously with paste sites (Pastebin, dpaste, paste.ee), GitHub, GitLab, and curated RSS security feeds.\n3. **Intelligence Filtering**: LLM filters noise, keeping only relevant intelligence pages.\n4. 
**Multi-Source Enrichment**: Pulls from AlienVault OTX, abuse.ch, ransomware.live, CISA KEV, Shodan, GreyNoise, AbuseIPDB, Feodo Tracker, C2IntelFeeds, and more — running in parallel with collection.\n5. **Recursive .onion Discovery**: Discovers hidden links via seed URL crawling.\n6. **Vector Cache Check**: Avoids redundant scraping for recently visited pages (24h TTL).\n7. **Tor-Routed Scraping**: Safely fetches page content with a 1MB safety cap.\n8. **Persistence**: Stores new content in the local vector cache.\n9. **Intelligence Merging**: Combines scraped and enriched data for processing.\n10. **Advanced Extraction**: Regex, NER, and LLM-based entity identification.\n11. **Historical Cross-Referencing**: Validates data against seed datasets.\n12. **Graph Construction**: Builds relationship nodes based on co-occurrence.\n13. **Final Intelligence Summary**: LLM generates a structured technical briefing.\n\n---\n\n## What It Extracts\n\nThe extraction pipeline identifies these entity types:\n\n| Category | Examples |\n|---|---|\n| **Cryptocurrency** | Bitcoin, Ethereum, Monero wallet addresses |\n| **Network Indicators** | IPv4 addresses, .onion URLs, domains, email addresses, PGP keys |\n| **File Indicators** | MD5, SHA1, SHA256 hashes |\n| **Vulnerabilities** | CVE numbers, MITRE ATT&CK techniques |\n| **Threat Actors** | Actor handles, malware families, ransomware group names |\n| **Paste Sites** | Pastebin, Ghostbin, Rentry, and similar links |\n| **People\u002FOrgs** | Named persons, organization names, locations |\n\nParallel collection sources (run alongside Tor search):\n\n- **Paste sites** — Pastebin, dpaste, paste.ee, Rentry\n- **GitHub** — code search and repository READMEs\n- **GitLab** — code search and project pages\n- **RSS security feeds** — 20 curated feeds (Krebs, BleepingComputer, Talos, Mandiant, CrowdStrike, Unit 42, CISA, and more)\n- **Curated .onion seed catalogue** — 31 vetted seeds across 8 categories, scored per query\n\nEnrichment and 
quality sources (19 total):\n\n- **AlienVault OTX** — threat pulses and malware families\n- **MalwareBazaar** — malware samples and signatures\n- **ThreatFox** — recent IOC feed\n- **URLhaus** — malicious URL database\n- **ransomware.live** — ransomware group tracking and leak-site seeds\n- **CISA KEV** — known exploited vulnerabilities catalog\n- **Shodan InternetDB** — passive vulnerability signatures\n- **VirusTotal** — file hash AV detection ratio (API key required)\n- **GreyNoise** — suppresses known benign scanner IPs from results (API key required)\n- **AbuseIPDB** — community IP abuse reports; 1,000 checks\u002Fday free\n- **Feodo Tracker + C2IntelFeeds** — confirmed C2 IPs for 6 major frameworks; no key required\n- **crt.sh** — certificate transparency logs; subdomain enumeration; free\n- **URLScan.io** — live domain scan data and malicious verdicts\n- **Wayback Machine** — historical domain snapshots for taken-down infrastructure\n- **Hybrid Analysis** — behavioral sandbox verdict and AV detection ratio for file hashes\n- **HaveIBeenPwned** — breach history for email addresses (paid API key)\n- **EmailRep** — email reputation scoring and disposable detection\n- **CIRCL PDNS + RDAP** — passive DNS history and WHOIS registration data; free\n- **BlockCypher + Etherscan** — blockchain wallet balance and transaction graph\n\nExport formats:\n\n- **STIX 2.1** — bundles with indicators, threat actors, malware objects\n- **MISP JSON** — events with galaxies for direct import\n- **Sigma rules** — auto-generated detection rules from extracted IOCs\n- **CSV** — flat entity dumps for spreadsheet analysis\n\n---\n\n## LLM & Enrichment Ecosystem\n\n### Supported LLM Providers\n\n| Provider | Models | Notes |\n|---|---|---|\n| **OpenRouter** | DeepSeek, Llama 3.3, Claude Haiku | Recommended default; free models available |\n| **Groq** | Llama 3.3, Llama 3.1 | Fast inference; free tier |\n| **OpenAI** | GPT-4o Mini | API key required |\n| **Anthropic** | Claude Haiku | 
Haiku is the tested default; other models work via manual override. |\n| **Google Gemini** | Gemini 1.5 Flash, 2.5 Pro | Free tier via AI Studio |\n| **Ollama** | Any local model | Air-gapped; no API key needed |\n\nThe default is **DeepSeek via OpenRouter** — fast and strong on technical security content. With free-tier LLMs (Groq free, OpenRouter free models, or Ollama) the cost is **$0**. With paid models like DeepSeek via OpenRouter it is **under $0.50 per investigation**. For fully air-gapped deployments, Ollama runs entirely locally.\n\n### Optional Enrichment API Keys\n\nAll enrichment sources that require a key degrade gracefully when the key is absent — they are skipped without failing the investigation. Keys marked \"free\" require registration but have no cost.\n\n| Key | What it does | Free | Sign up |\n|---|---|---|---|\n| `OTX_API_KEY` | AlienVault OTX threat pulses | Yes | [otx.alienvault.com](https:\u002F\u002Fotx.alienvault.com) |\n| `VT_API_KEY` | VirusTotal file hash AV detections | Yes (4 req\u002Fmin) | [virustotal.com](https:\u002F\u002Fwww.virustotal.com) |\n| `ABUSECH_API_KEY` | MalwareBazaar, ThreatFox, URLhaus rate limits | Yes | [abuse.ch](https:\u002F\u002Fabuse.ch) |\n| `ABUSEIPDB_API_KEY` | Community IP abuse reports; 1,000 checks\u002Fday | Yes | [abuseipdb.com\u002Fregister](https:\u002F\u002Fwww.abuseipdb.com\u002Fregister) |\n| `GREYNOISE_API_KEY` | Suppresses known scanner\u002Fresearcher IPs | Free tier available | [greynoise.io\u002Fpricing](https:\u002F\u002Fwww.greynoise.io\u002Fpricing) |\n| `URLSCAN_API_KEY` | Higher rate limits for URLScan.io domain scans | Yes (public results without key) | [urlscan.io\u002Fuser\u002Fsignup](https:\u002F\u002Furlscan.io\u002Fuser\u002Fsignup) |\n| `HYBRID_ANALYSIS_API_KEY` | Behavioral sandbox analysis for file hashes | Yes | [hybrid-analysis.com\u002Fsignup](https:\u002F\u002Fwww.hybrid-analysis.com\u002Fsignup) |\n| `HIBP_API_KEY` | Email breach history — the most valuable email 
enrichment | No ($3.50\u002Fmonth) | [haveibeenpwned.com\u002FAPI\u002FKey](https:\u002F\u002Fhaveibeenpwned.com\u002FAPI\u002FKey) |\n| `EMAILREP_API_KEY` | Email reputation scoring; increases rate limits | Yes (reduced rate without key) | [emailrep.io\u002Fkey](https:\u002F\u002Femailrep.io\u002Fkey) |\n| `SECURITYTRAILS_API_KEY` | Richer DNS history for domains | Yes (50 queries\u002Fmonth) | [securitytrails.com\u002Fcorp\u002Fapi](https:\u002F\u002Fsecuritytrails.com\u002Fcorp\u002Fapi) |\n| `GITHUB_TOKEN` | Raises GitHub scraping from 10 to 30 req\u002Fmin | Yes | [github.com\u002Fsettings\u002Ftokens](https:\u002F\u002Fgithub.com\u002Fsettings\u002Ftokens) |\n| `GITLAB_TOKEN` | Raises GitLab scraping from 15 to 60 req\u002Fmin | Yes | [gitlab.com\u002F-\u002Fprofile\u002Fpersonal_access_tokens](https:\u002F\u002Fgitlab.com\u002F-\u002Fprofile\u002Fpersonal_access_tokens) |\n| `BLOCKCYPHER_TOKEN` | BTC\u002FETH wallet balance and transaction graph | Yes | [blockcypher.com](https:\u002F\u002Fwww.blockcypher.com) |\n| `ETHERSCAN_API_KEY` | ETH wallet lookups | Yes | [etherscan.io\u002Fapis](https:\u002F\u002Fetherscan.io\u002Fapis) |\n\n---\n\n## Cost Comparison\n\n| Platform | Annual Cost | Self-Hosted | Open Source |\n|---|---|---|---|\n| Recorded Future | ~$25,000 | No | No |\n| DarkOwl | ~$15,000 | No | No |\n| Flare | ~$8,000 | No | No |\n| **VoidAccess** | **Free** | **Yes** | **Yes** |\n\nFree with Groq, OpenRouter free models, or Ollama. 
Under $0.50 per investigation with paid models like DeepSeek.\n\n---\n\n## What's New in v1.3\n\n- **10 new enrichment sources**: GreyNoise (scanner suppression), AbuseIPDB, Feodo Tracker, C2IntelFeeds, crt.sh, URLScan.io, Wayback Machine, Hybrid Analysis, HaveIBeenPwned, EmailRep\n- **4 new clearnet collection sources**: paste sites, GitHub code search, GitLab code search, and 20 curated RSS security feeds\n- **Curated .onion seed list** — 31 seeds across 8 categories, relevance-scored per query\n- **CIRCL passive DNS + RDAP WHOIS** — infrastructure cluster detection for IPs and domains\n- **Investigation cancellation** — cancel a running pipeline at any checkpoint; partial results are preserved\n- **Sources panel** — per-investigation breakdown of which sources ran and what each returned\n- **Infrastructure clusters panel** — groups IPs and domains sharing ASN, CIDR block, or WHOIS registrant\n- **Entity quality badges** — C2, Malicious, Breached, Disposable, Archived, Taken Down, AV ratio\n- **GreyNoise suppression** — known benign scanner IPs are filtered from entity results automatically\n- **MALWARE_FAMILY auto-creation** from confirmed family names returned by hash enrichment\n\n---\n\n## Quick Start\n\n### Prerequisites\n- Docker and Docker Compose\n- Python 3 (recommended — used by setup.sh for secret generation; Linux\u002FmacOS fall back to \u002Fdev\u002Furandom if absent, Windows setup.bat may require it)\n- One LLM API key — or Ollama for fully local operation (free)\n\n**Free LLM options (no credit card required):**\n- [Groq](https:\u002F\u002Fconsole.groq.com) — fast, free tier, Llama 3.3 70B\n- [OpenRouter](https:\u002F\u002Fopenrouter.ai) — free models including DeepSeek and Llama 3.3\n- [Google AI Studio](https:\u002F\u002Faistudio.google.com) — Gemini free tier\n- [Ollama](https:\u002F\u002Follama.ai) — fully local, no internet required\n\n### Installation\n\n**macOS \u002F Linux \u002F WSL:**\n```bash\nbash setup.sh\n```\n\n**Windows 
(native):**\n```bat\nsetup.bat\n```\n\nThe interactive wizard creates `.env`, generates `JWT_SECRET` and `POSTGRES_PASSWORD`, prompts for your LLM provider (one of: Groq, OpenRouter, Anthropic, OpenAI, Google Gemini, or Ollama), optionally collects threat-intel keys (`OTX_API_KEY`, `VT_API_KEY`), optionally enables Redis, sets the admin password, and starts the Docker stack.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fpublic\u002Fsetup_gif.gif\" width=\"100%\" alt=\"Setup walkthrough\">\n\u003C\u002Fdiv>\n\n### Starting and Stopping\n\n**macOS \u002F Linux \u002F WSL:**\n```bash\n.\u002Fstart.sh    # build and start all services\n.\u002Fstop.sh     # stop all services\n```\n\n**Windows (native):**\n```bat\nstart.bat     :: build and start all services\nstop.bat      :: stop all services\n```\n\nOnce running, open **http:\u002F\u002Flocalhost:3001** in your browser.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\".\u002Fpublic\u002Fstart_gif.gif\" width=\"100%\" alt=\"Starting VoidAccess\">\n\u003C\u002Fdiv>\n\n### Getting a JWT (API access)\n\n`setup.sh` creates a default admin account at `admin@voidaccess.tech` with the password you provided during the wizard.\n\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:8000\u002Fauth\u002Flogin \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"email\": \"admin@voidaccess.tech\", \"password\": \"yourpassword\"}'\n```\n\nUse the returned token in an `Authorization: Bearer \u003Ctoken>` header for API requests.\n\n### Running your first investigation (API)\n\n```bash\ncurl -X POST http:\u002F\u002Flocalhost:8000\u002Finvestigations \\\n  -H \"Authorization: Bearer \u003Cyour_jwt>\" \\\n  -H \"Content-Type: application\u002Fjson\" \\\n  -d '{\"query\": \"LockBit ransomware infrastructure 2024\"}'\n```\n\nThe investigation starts in `pending`, moves to `processing`, and completes in 3–5 minutes with a summary, extracted entities, relationship graph, and export-ready artifacts.\n\n---\n\n## 
Architecture\n\nFour Docker services:\n\n| Service | Technology | Port |\n|---|---|---|\n| **postgres** | PostgreSQL 16 | 5433 |\n| **tor** | Tor SOCKS5 proxy | 9050 |\n| **fastapi** | Python 3.11, FastAPI, SQLAlchemy | 8000 |\n| **nextjs** | Next.js 14, TypeScript, Tailwind | 3001 |\n\nThe FastAPI backend runs a 13-step pipeline triggered by `POST \u002Finvestigations`. Every external call has `try\u002Fexcept` with graceful fallback — the pipeline never hard-crashes. API docs are available at **http:\u002F\u002Flocalhost:8000\u002Fdocs** when running.\n\n### Source Tree\n\n```\nvoidaccess\u002F\n├── analysis\u002F      # Temporal patterns, OPSEC failure detection, anomaly scoring\n├── api\u002F           # FastAPI routes; investigation pipeline orchestrator\n├── auth\u002F          # JWT authentication and user management\n├── crawler\u002F       # Recursive .onion link discovery spider\n├── db\u002F            # SQLAlchemy ORM models and Alembic migrations\n├── docs\u002F          # Contributing, security, and usage policy documents\n├── export\u002F        # STIX 2.1, MISP, Sigma, and CSV artifact generation\n├── extractor\u002F     # Regex → NER → LLM entity extraction pipeline\n├── fingerprint\u002F   # Stylometry vectors and actor style profiling\n├── graph\u002F         # NetworkX MultiDiGraph builder and pyvis visualization\n├── i18n\u002F          # Language detection, translation, multilingual query expansion\n├── infra\u002F         # Docker Compose, Tor config, Postgres init\n├── monitor\u002F       # APScheduler watches, change diffing, Telegram\u002FSMTP alerts\n├── public\u002F        # Logo, walkthrough screenshots, demo media\n├── scraper\u002F       # Async aiohttp and Playwright scrapers over Tor\n├── scripts\u002F       # Seed imports and operational utilities\n├── search\u002F        # 16+ .onion search engine fan-out with circuit breaker\n├── sources\u002F       # DarkSearch, Telegram, paste sites, threat-intel feeds\n├── tests\u002F         
# Pytest suite (one test file per module)\n├── utils\u002F         # Async helpers, content safety, encryption, defang\n├── vector\u002F        # ChromaDB cache with sentence-transformer embeddings\n├── voidaccess\u002F    # LangChain LLM wrappers and provider registry\n└── web\u002F           # Next.js 14 + TypeScript + Tailwind frontend\n```\n\n> **Note on `voidaccess\u002Fvoidaccess\u002F`** — the nested directory holds the core LLM utilities (`llm.py`, `llm_utils.py`) and is imported at runtime by the API routes (`from voidaccess.llm import ...`). The nested naming reflects the original package structure from the project's pre-API baseline.\n\n---\n\n## Troubleshooting\n\n**Services won't start:**\n```bash\ndocker compose -f infra\u002Fdocker-compose.yml --project-directory . ps\ndocker compose -f infra\u002Fdocker-compose.yml --project-directory . logs -f\n```\n\n**Port conflicts** (3001 or 8000 already in use):\n- macOS\u002FLinux: `lsof -i :3001` to find what's using it\n- Windows: `netstat -ano | findstr :3001`\n\n**Tor not connecting:** The Tor service takes 30–60 seconds to bootstrap on first start. Check health with `.\u002Fcheck_health.sh`. This script verifies Tor proxy connectivity, LLM provider reachability, and dark web search engine availability.\n\n**No .env file:** Run `bash setup.sh` (macOS\u002FLinux\u002FWSL) or `setup.bat` (Windows) before starting.\n\n**Docker build takes a long time:** First build downloads ~3GB of layers. Subsequent builds use the Docker layer cache and are much faster.\n\n---\n\n## Content Safety\n\nEvery investigation runs through mandatory content safety filters before results reach the UI or appear in the graph. CSAM, gore, snuff content, and other prohibited material are blocked at the query stage, URL validation, content scanning, and post-extraction entity filtering. 
These filters are mandatory and cannot be disabled.\n\n---\n\n## Acceptable Use\n\nVoidAccess is for authorized security research, threat intelligence gathering, and law enforcement purposes only. Users are responsible for ensuring compliance with all local laws and ethical standards. See [docs\u002FUSAGE_POLICY.md](docs\u002FUSAGE_POLICY.md) for the full policy.\n\n---\n\n## Contributing\n\nContributions are welcome. See [docs\u002FCONTRIBUTING.md](docs\u002FCONTRIBUTING.md) for setup instructions, code standards, and the PR process. Please read [docs\u002FCODE_OF_CONDUCT.md](docs\u002FCODE_OF_CONDUCT.md) before participating.\n\nTo report a security vulnerability, see [docs\u002FSECURITY.md](docs\u002FSECURITY.md).\n\n---\n\n## License\n\nMIT License. See [LICENSE](LICENSE) for details.\n","VoidAccess is a self-hosted dark web open-source intelligence (OSINT) platform that automates the entire threat intelligence workflow, from query to relationship graph generation. Its core features include LLM-based query refinement, multilingual deep-web search, automatic entity extraction, and dynamic generation of entity relationship graphs. The platform also supports export in formats such as STIX 2.1 and MISP for further analysis and use. VoidAccess suits security researchers and organizations that need to research and monitor dark web threat intelligence in depth, offering a free alternative to commercial services such as Recorded Future and DarkOwl.",2,"2026-06-11 03:58:43","CREATED_QUERY"]