[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-82399":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},82399,"poly_data","warproxxx\u002Fpoly_data","warproxxx","Polymarket Data Retriever that fetches, processes, and structures Polymarket data including markets, order events and trades.","",null,"Python",2097,410,23,10,0,34,76,82,102,29.84,"GNU General Public License v3.0",false,"main",true,[],"2026-06-12 02:04:25","# Polymarket Data (v2)\n\n[![License: GPL-3.0](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-GPL--3.0-blue.svg)](https:\u002F\u002Fopensource.org\u002Flicenses\u002FGPL-3.0)\n[![GitHub stars](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002Fwarproxxx\u002Fpoly_data)](https:\u002F\u002Fgithub.com\u002Fwarproxxx\u002Fpoly_data\u002Fstargazers)\n[![GitHub last commit](https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flast-commit\u002Fwarproxxx\u002Fpoly_data)](https:\u002F\u002Fgithub.com\u002Fwarproxxx\u002Fpoly_data\u002Fcommits\u002Fmain)\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-blue.svg)](https:\u002F\u002Fwww.python.org\u002F)\n\nA pipeline for fetching, processing, and analyzing Polymarket v2 trading data. 
Reads order events directly from the Polymarket **CTF Exchange V2** contract on Polygon via JSON-RPC, joins them with market metadata from the Polymarket Gamma API, and writes structured trades to CSV.\n\n## ⚠️ v1 → v2 migration\n\nPolymarket migrated to a new set of CTF Exchange contracts on **2026-04-28** and stopped supporting their old subgraph indexer. The old pipeline in this repo (Goldsky subgraph + GraphQL polling) **no longer returns complete data**, so it has been removed.\n\nThe previous version is preserved at the [`v1-final`](https:\u002F\u002Fgithub.com\u002Fwarproxxx\u002Fpoly_data\u002Ftree\u002Fv1-final) tag if you need it for historical analysis. **For any new work, use this v2 version.**\n\nThe v1 retriever relied on Goldsky's very lenient stack for free data, but Goldsky now serves data only through its turbo pipeline, which is expensive and complex. Other third-party options add heavy dependencies too, so this version reads the data directly on-chain.\n\n## Configuration\n\nAll tuning is done via environment variables.\n\n| Variable | Default | What it does |\n|---|---|---|\n| `POLYGON_RPC_URL` | `https:\u002F\u002Fpolygon-bor-rpc.publicnode.com` | Polygon JSON-RPC endpoint. The public default works but is slow and times out under sustained backfill. The free tier of QuickNode or Alchemy is much more reliable. A paid tier is recommended for serious use. |\n| `POLYGON_MAX_BLOCK_RANGE` | `5` | Blocks per `eth_getLogs` query. The default is safe for free RPC tiers; on a paid plan, set it to `500` or `1000` to backfill much faster. If you set it higher than your RPC allows, the run stops with an error telling you to lower it. |\n| `PROCESS_CHUNK_SIZE` | `0` | When `>0`, streams `data\u002ForderFilled.csv` in chunks so that machines with limited memory can process the data. Use `500000` if processing runs out of memory on your machine. 
|\n\nSet them in a `.env` file:\n\n```bash\nexport POLYGON_RPC_URL=\"https:\u002F\u002Fyour-endpoint-here\"\nexport POLYGON_MAX_BLOCK_RANGE=1000   # paid RPC plan? bump from 5 to 500 or 1000\nexport PROCESS_CHUNK_SIZE=500000   # only if RAM is tight\n```\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Installation](#installation)\n- [Polygon RPC setup](#polygon-rpc-setup)\n- [Quick Start](#quick-start)\n- [Project Structure](#project-structure)\n- [Data Files](#data-files)\n- [Pipeline Stages](#pipeline-stages)\n- [Resumable & Incremental](#resumable--incremental)\n- [Troubleshooting](#troubleshooting)\n- [Analysis](#analysis)\n- [License](#license)\n\n## Overview\n\n`update.py` runs three stages:\n\n1. **Markets** — fetches all Polymarket markets (closed + active) via the Gamma **keyset** API (`\u002Fmarkets\u002Fkeyset`). Resumable from a saved cursor; subsequent runs only fetch newly created markets.\n2. **Chain** — reads `OrderFilled` events from the CTF Exchange V2 contract (`0xE111180000d2663C0091e4f400237545B87B996B`) on Polygon via direct JSON-RPC. Resumable from the last scanned block.\n3. **Process** — joins order events with market metadata to produce labeled trades with price, USD amount, and BUY\u002FSELL direction.\n\nStages 1 and 2 run in **parallel** (different APIs, zero contention), so total wall time is `max(markets, chain)` rather than the sum.\n\n## Installation\n\nThis project uses [UV](https:\u002F\u002Fdocs.astral.sh\u002Fuv\u002F) for fast package management.\n\n```bash\n# macOS \u002F Linux\ncurl -LsSf https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.sh | sh\n\n# Windows\npowershell -c \"irm https:\u002F\u002Fastral.sh\u002Fuv\u002Finstall.ps1 | iex\"\n\n# Or with pip\npip install uv\n```\n\nThen install dependencies:\n\n```bash\nuv sync\n```\n\n## Polygon RPC setup\n\nThe v1 retriever relied on Goldsky's very lenient stack for free data, but Goldsky now serves data only through its turbo pipeline. It is expensive and adds a heavy dependency. 
Other third-party options add heavy dependencies too, so this version reads the data directly on-chain. That requires an RPC URL; if none is set, it defaults to `https:\u002F\u002Fpolygon-bor-rpc.publicnode.com`.\n\nFor faster retrieval, get a node from QuickNode or Alchemy on one of their paid tiers. If you have no idea what that is, you can sign up [here](https:\u002F\u002Fquicknode.com\u002Fsignup?via=daniel-s).\n\n## Quick Start\n\n```bash\nuv run python update.py\n```\n\nThat's it. This runs markets + chain in parallel, then processes trades. **The first run is the long one** — the initial markets fetch takes about an hour, and the initial chain backfill from v2 genesis (~April 2026) takes several hours on a free RPC. Subsequent runs take seconds.\n\nTo run any stage individually:\n\n```bash\nuv run python -m update_utils.update_markets\nuv run python -m update_utils.update_chain\nuv run python -m update_utils.process_live\n```\n\n## Project Structure\n\n```\npoly_data\u002F\n├── update.py                  # orchestrator: markets + chain in parallel, then process\n├── update_utils\u002F\n│   ├── update_markets.py      # Polymarket Gamma keyset API → markets.csv\n│   ├── update_chain.py        # Polygon RPC OrderFilled events → data\u002ForderFilled.csv\n│   └── process_live.py        # join orders ↔ markets → processed\u002Ftrades.csv\n├── poly_utils\u002F\n│   └── utils.py               # market loader, missing-token backfill\n├── data\u002F                      # all generated data + resume state (gitignored)\n│   ├── markets.csv            # all markets, all fields preserved\n│   ├── missing_markets.csv    # markets backfilled per-token from trades\n│   ├── markets_*_part.csv     # per-pass keyset output (closed\u002Factive)\n│   ├── markets_*_state.json   # keyset cursor state for incremental resume\n│   ├── orderFilled.csv        # raw order events from chain\n│   └── cursor_state.json      # last scanned block\n└── processed\u002F                 # user-facing output (gitignored)\n    └── 
trades.csv             # labeled trades for analysis\n```\n\n## Data Files\n\n### `data\u002Fmarkets.csv`\nAll markets returned by the Gamma keyset API. **All API fields are preserved as-is**; nested objects (`outcomes`, `clobTokenIds`, `events`, etc.) are stored as JSON strings. The exact column set is determined by the first batch the API returns and persisted in `markets_*_state.json` so resumed runs stay consistent.\n\nKey fields used downstream: `id`, `question`, `slug`, `conditionId`, `clobTokenIds` (JSON array — first element = `token1`, second = `token2`), `closedTime`, `volume`.\n\n### `data\u002ForderFilled.csv`\nRaw `OrderFilled` events decoded from the chain. Schema:\n\n| Column | Notes |\n|---|---|\n| `timestamp` | Unix seconds (from block timestamp) |\n| `maker` | Maker address, lowercase |\n| `makerAssetId` | `\"0\"` if maker is paying USDC; otherwise the CTF token ID |\n| `makerAmountFilled` | Raw integer (6 decimals — USDC and CTF tokens both use 6) |\n| `taker` | Taker address, lowercase |\n| `takerAssetId` | `\"0\"` if taker is paying USDC; otherwise the CTF token ID |\n| `takerAmountFilled` | Raw integer (6 decimals) |\n| `transactionHash` | Polygon transaction hash |\n\nThe v2 `OrderFilled` event natively carries a single `tokenId` + `side` (BUY=0, SELL=1) referring to the maker's order. The reader maps that back to the v1-compatible maker\u002Ftaker\u002Fasset schema above so downstream code stays simple. 
The CTF Exchange contract address `0xe111180000d2663c0091e4f400237545b87b996b` may appear as `taker` for some events — this is the contract acting as an intermediary for CTF mint\u002Fburn flows during cross-side matches; treat it as a sub-event rather than a counterparty.\n\n### `processed\u002Ftrades.csv`\nLabeled trades for analysis:\n\n| Column | Notes |\n|---|---|\n| `timestamp` | datetime |\n| `market_id` | from markets.csv (null if market wasn't found) |\n| `maker`, `taker` | addresses |\n| `nonusdc_side` | `\"token1\"` or `\"token2\"` |\n| `maker_direction`, `taker_direction` | `\"BUY\"` \u002F `\"SELL\"` |\n| `price` | USDC per outcome token (0–1) |\n| `usd_amount` | trade size in USD |\n| `token_amount` | outcome tokens transferred |\n| `transactionHash` | Polygon transaction hash |\n\n## Pipeline Stages\n\n### 1. `update_markets` — Polymarket Gamma keyset API\n\nPages through `\u002Fmarkets\u002Fkeyset` with `closed=true` then `closed=false`. Saves a cursor per pass; on subsequent runs, resumes from the cursor and only pulls newly created markets.\n\nOutputs `markets_closed_part.csv` + `markets_active_part.csv` (kept across runs as the source of truth) and merges them into `markets.csv` at the end of each run.\n\n### 2. `update_chain` — Polygon RPC `OrderFilled` events\n\nCalls `eth_getLogs` on the CTF Exchange V2 contract in windows of up to `POLYGON_MAX_BLOCK_RANGE` blocks (auto-halves on errors). Decodes events with the ABI, looks up block timestamps (cached per chunk), and appends to `data\u002ForderFilled.csv`. Saves the last scanned block to `data\u002Fcursor_state.json`.\n\n### 3. 
`process_live`\n\nReads `data\u002ForderFilled.csv`, finds the resume point in `processed\u002Ftrades.csv`, joins new orders against `get_markets()` (which parses `clobTokenIds` into `token1`\u002F`token2`), computes price\u002FUSD\u002Fdirection, and appends to `processed\u002Ftrades.csv`.\n\nIf any trade references a token ID not in `markets.csv`, it's backfilled into `missing_markets.csv` via a per-token Gamma API call before the join. For memory-bounded processing on large datasets, set `PROCESS_CHUNK_SIZE` — see [Configuration](#configuration).\n\n## Analysis\n\n```python\nimport polars as pl\nfrom poly_utils.utils import get_markets\n\nmarkets_df = get_markets()           # parses clobTokenIds → token1\u002Ftoken2\ntrades_df = pl.read_csv(\"processed\u002Ftrades.csv\", try_parse_dates=True)\n\n# Filter trades for a specific user (filter on `maker` — see note below)\nUSERS = {\n    'domah': '0x9d84ce0306f8551e02efef1680475fc0f1dc1344',\n    '50pence': '0x3cf3e8d5427aed066a7a5926980600f6c3cf87b3',\n}\ntrader_df = trades_df.filter(pl.col(\"maker\") == USERS['domah'])\n```\n\n**Note on user filtering**: Polymarket emits `OrderFilled` from the maker's perspective at the contract level. When you want a user's trades from their side, filter on `maker`, not `taker`.\n\n## License\n\nGPL-3.0 — see [LICENSE](LICENSE).\n","poly_data is a tool for fetching, processing, and structuring market data from the Polymarket platform, including market information, order events, and trades. It reads order events from the Polymarket CTF Exchange V2 contract on the Polygon network via JSON-RPC, combines them with market metadata from the Polymarket Gamma API, and writes structured trade records to CSV files. The tool is written in Python and supports environment variables for tuning runtime parameters such as the RPC URL and block range, which helps optimize data retrieval. It suits applications that analyze or monitor trading activity on Polymarket, and is especially useful for researchers and developers who want to get the latest complete data directly from the chain.",2,"2026-06-11 04:08:30","high_star"]