[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80418":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":29,"readmeContent":30,"aiSummary":31,"trendingCount":16,"starSnapshotCount":16,"syncStatus":32,"lastSyncTime":33,"discoverSource":34},80418,"bigset","tinyfish-io\u002Fbigset","tinyfish-io","What if you had all the data in the world?","",null,"TypeScript",1321,143,18,7,0,24,442,1264,160,19.48,"GNU Affero General Public License v3.0",false,"main",true,[5,27,28],"open-source","tinyfish","2026-06-12 02:04:02","\u003Cp align=\"center\">\n  \u003Cimg src=\"assets\u002Fbanner.svg\" alt=\"BigSet\" width=\"100%\" \u002F>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cstrong>Live, queryable datasets that update automatically.\u003C\u002Fstrong>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset\u002Fstargazers\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Ftinyfish-io\u002Fbigset?style=flat\" alt=\"GitHub Stars\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-AGPL--3.0-blue\" alt=\"License\" \u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset\u002Fissues\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fissues\u002Ftinyfish-io\u002Fbigset\" alt=\"Issues\" 
\u002F>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fx.com\u002FTiny_Fish\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Ftwitter\u002Ffollow\u002FTiny_Fish?style=flat\" alt=\"Follow TinyFish\" \u002F>\u003C\u002Fa>\n\u003C\u002Fp>\n\n---\n\nThink of it like a spreadsheet that fills itself in — you describe the dataset you want (YC companies currently hiring, insurance quotes in your area, restaurants serving a specific brand), and BigSet builds it, keeps it fresh, and lets you query it with SQL.\n\nBuilt on [TinyFish](https:\u002F\u002Ftinyfish.ai) APIs.\n\n## ✨ Why BigSet?\n\nAt the end of the day, the only thing that matters is data. Every decision, every agent, every product — it all comes down to having the right data at the right time.\n\nSo what if you could just… ask for it? Describe the dataset you want — in plain English — and have it built, structured, and kept fresh automatically. No scrapers to maintain. No pipelines to babysit. No waking up to broken cron jobs because some site changed a div.\n\nYou describe it. BigSet collects it. Your agents query it with SQL. It stays up to date on your schedule — every 30 minutes, every hour, whatever you need. And if something breaks, a healer agent patches it before you even notice.\n\nAny dataset. Any source. Always fresh. That's the idea.\n\n---\n\n## 🚀 Quick Start\n\n**Prerequisites:** [Docker](https:\u002F\u002Fdocs.docker.com\u002Fget-docker\u002F), [Make](https:\u002F\u002Fwww.gnu.org\u002Fsoftware\u002Fmake\u002F), and a free [Clerk](https:\u002F\u002Fdashboard.clerk.com) account\n\n### 1. Clone and set up Clerk\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset.git\ncd bigset\n```\n\nCreate a Clerk application at [dashboard.clerk.com](https:\u002F\u002Fdashboard.clerk.com), then go to **JWT Templates** and enable the **Convex** template.\n\n### 2. 
Configure env\n\n```bash\ncp .env.example .env\n# Fill in NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY, CLERK_SECRET_KEY,\n# CLERK_JWT_ISSUER_DOMAIN, OPENROUTER_API_KEY, and optional service keys.\n```\n\n> **Required for the create-dataset wizard:** set `OPENROUTER_API_KEY` (used by the schema-inference pipeline). Get one at [openrouter.ai](https:\u002F\u002Fopenrouter.ai). Without it the wizard's \"Generate Schema\" step will fail.\n\n> **Optional:** to enable [PostHog](https:\u002F\u002Fposthog.com) product analytics + session replay + error tracking, set `NEXT_PUBLIC_POSTHOG_KEY` and `NEXT_PUBLIC_POSTHOG_HOST`. Leave blank to disable cleanly (the app no-ops every event).\n\n### 3. Start everything\n\n```bash\nmake dev\n```\n\nThis starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically.\n`make dev` checks that root `.env` contains real Clerk and OpenRouter values before it starts Docker.\nOnce it's up:\n\n- App: http:\u002F\u002Flocalhost:3500\n- Convex dashboard: http:\u002F\u002Flocalhost:6791\n- [Mastra Studio](https:\u002F\u002Fmastra.ai) (workflow inspector): http:\u002F\u002Flocalhost:4111\n\n### 4. Generate Convex admin key (first time only)\n\n```bash\ndocker compose exec convex .\u002Fgenerate_admin_key.sh\n```\n\nPaste the output into `.env` as `CONVEX_SELF_HOSTED_ADMIN_KEY`, then re-run `make dev`.\n\n### 5. Load curated public datasets\n\nThe landing page and the dashboard's \"Curated\" section read from a set of 9 system-owned datasets. Load them with:\n\n```bash\nmake seed-public-datasets\n```\n\nThe script is **idempotent** — rerunning it skips datasets that already exist (matched by a stable `seedKey`, so renaming a curated dataset never creates a duplicate). To add a 10th curated dataset, append it to `PUBLIC_DATASETS` in [frontend\u002Fconvex\u002FpublicSeed.ts](frontend\u002Fconvex\u002FpublicSeed.ts) with a fresh `seedKey` and rerun the command. 
To replace existing curated content in place, pass `force: true`:\n\n```bash\ncd frontend\nnode ..\u002Fscripts\u002Fwith-root-env.mjs npx convex run publicSeed:seedPublicDatasets '{\"force\":true}'\n```\n\nOpen [localhost:3500](http:\u002F\u002Flocalhost:3500) and click **Get started** to sign in.\n\n> **Note:** root `.env` is the only local env file. If you edit Convex functions in `frontend\u002Fconvex\u002F`, run `make convex-push` to deploy the changes.\n\n> **Free tier:** each signed-in account gets **2,500 row operations per calendar month** (resets on the 1st, UTC). The header shows a live usage badge; system-owned curated datasets bypass the quota.\n\n---\n\n## 🛠 Tech Stack\n\n| Layer | Tech |\n|-------|------|\n| Frontend | Next.js 16, React 19, Tailwind 4 |\n| Backend | Fastify, TypeScript (agent runner) |\n| Auth | [Clerk](https:\u002F\u002Fclerk.com) |\n| Database | [Convex](https:\u002F\u002Fconvex.dev) (self-hosted) |\n| Data Collection | [TinyFish](https:\u002F\u002Ftinyfish.ai) APIs (Search, Fetch, Browser) |\n| AI orchestration | [Mastra](https:\u002F\u002Fmastra.ai) workflows + [Vercel AI SDK](https:\u002F\u002Fsdk.vercel.ai) + [OpenRouter](https:\u002F\u002Fopenrouter.ai) → Claude Sonnet (schema inference + populate agent) |\n| Table view | [TanStack Table](https:\u002F\u002Ftanstack.com\u002Ftable) + [react-window](https:\u002F\u002Fgithub.com\u002Fbvaughn\u002Freact-window) virtualization |\n| Exports | CSV (built-in) + XLSX ([SheetJS](https:\u002F\u002Fsheetjs.com), dynamic-imported) |\n| Analytics | [PostHog](https:\u002F\u002Fposthog.com) — events, session replay, error tracking (optional) |\n\n## 📁 Project Structure\n\n```text\nbigset\u002F\n├── frontend\u002F            Next.js 16 — UI + Convex schema & functions\n│   ├── convex\u002F          Convex functions, schema, authz + quota helpers\n├── backend\u002F             Fastify + Mastra — schema inference + populate agent\n│   ├── src\u002Fpipeline\u002F    Pure pipelines: schema 
inference + populate context\n│   ├── src\u002Fmastra\u002F      Mastra workflows, agents, and tools (Studio at :4111 in dev)\n│   ├── src\u002Femail\u002F       Transactional email (Resend) — sends \"dataset ready\" notifications\n│   └── src\u002Fanalytics\u002F   Server-side PostHog wrapper for backend-only events\n├── scripts\u002F             One-off scripts (e.g. verify-authz.sh)\n├── .env                 Local env for frontend, backend, Convex CLI, and Docker (not committed)\n├── docker-compose.dev.yml\n└── Makefile\n```\n\n---\n\n## 🏗 Building in Public\n\nBigSet is a work in progress. We're building in the open because the best ideas come from the people who actually want to use the thing.\n\nWe'd love your feedback, ideas, or help building — come say hi:\n\n- 🐦 **Twitter:** [@Tiny_Fish](https:\u002F\u002Fx.com\u002FTiny_Fish) for project updates\n- 🗣 **Twitter:** [@not_simantak](https:\u002F\u002Fx.com\u002Fnot_simantak) for the unfiltered version\n- 🐛 **GitHub Issues:** [Report bugs or request features](https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset\u002Fissues)\n\n## 🤝 Contributing\n\nContributions are very welcome — whether it's code, feedback, or just telling us what datasets you'd want to build.\n\n1. Fork the repo\n2. Create a branch (`git checkout -b my-feature`)\n3. Make your changes\n4. Run `bash scripts\u002Fverify-authz.sh` to confirm the authorization layer still holds\n5. Open a PR\n\nIf you're not sure where to start, [open an issue](https:\u002F\u002Fgithub.com\u002Ftinyfish-io\u002Fbigset\u002Fissues) or come say hi.\n\n## 📄 License\n\n[AGPL-3.0](LICENSE)\n","BigSet is a tool for live datasets that update automatically and can be queried. Users describe the dataset they want in natural language (for example, YC companies currently hiring or local insurance quotes); BigSet then builds the dataset, keeps it up to date, and supports SQL queries against it. The project is built on the TinyFish APIs and written in TypeScript. BigSet suits cases that need continuously fresh data without manually maintaining scrapers or data pipelines, and is especially useful for data analysis and market research. With a small amount of configuration, users can start the service and begin using it quickly.",2,"2026-06-11 04:00:40","CREATED_QUERY"]