[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-79916":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":5,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":15,"stars30d":16,"stars90d":14,"forks30d":14,"starsTrendScore":17,"compositeScore":18,"rankGlobal":8,"rankLanguage":8,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":8,"pushedAt":8,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":12,"lastSyncTime":27,"discoverSource":28},79916,"ai-quant-researcher","zostaff\u002Fai-quant-researcher","zostaff",null,"Python",144,48,2,1,0,10,54,4,55.47,"Other",false,"main",true,[],"2026-06-12 04:01:25","# ai-quant-lab\n\n[![Python](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.11+-blue.svg)](https:\u002F\u002Fwww.python.org\u002Fdownloads\u002F)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Flicense-MIT-green.svg)](LICENSE)\n[![Tests](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Ftests-111%20passing-brightgreen.svg)](tests\u002F)\n[![Claude](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FClaude-Anthropic_SDK-7AD6F8.svg)](https:\u002F\u002Fdocs.anthropic.com)\n\n> AI-powered quant research engine. 
Claude generates strategies; the system validates and kills the bad ones.\n\n---\n\n## Quick start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fyourname\u002Fai-quant-lab\ncd ai-quant-lab\npython3 -m venv .venv && source .venv\u002Fbin\u002Factivate\npip install -e \".[dev]\"\ncp .env.example .env                     # fill in ANTHROPIC_API_KEY (not needed for examples 1-5 and 7-9)\npython examples\u002F05_deflated_sharpe_demo.py   # see the multiple-testing kill in action\npython -m ai_quant_lab.run --iterations 10 --target 2   # full research loop (needs the key)\n```\n\nThat's it. Examples 1-5 run without an API key on synthetic data, and examples\n7-9 don't need one either. Only example 6 needs a key; it exercises the\nClaude-powered loop.\n\n---\n\n## Architecture\n\n![Agent loop](figures\u002F02_agent_loop.png)\n\nThe whole thing is six modules and a CLI. See [docs\u002FARCHITECTURE.md](docs\u002FARCHITECTURE.md).\n\n---\n\n## The core loop\n\n```python\nfrom ai_quant_lab.agents.memory import ResearchMemory\nfrom ai_quant_lab.backtest import BacktestConfig\nfrom ai_quant_lab.orchestrator.loop import LoopConfig, run_research_loop\n\nwith ResearchMemory(\".\u002Fmemory.db\") as memory:\n    artifacts, survivors = run_research_loop(\n        price_data,  # your own price history (not defined in this snippet)\n        LoopConfig(\n            market_description=\"Daily bars on a single liquid US equity, 10 years.\",\n            iterations=50,\n            target_survivors=3,\n            backtest_config=BacktestConfig(cost_bps=8.0),\n        ),\n        memory=memory,\n    )\n```\n\nThat's the entire user-facing API. Everything tunable goes through env vars\nor `LoopConfig`. The full implementation is in\n[`ai_quant_lab\u002Forchestrator\u002Floop.py`](ai_quant_lab\u002Forchestrator\u002Floop.py) —\nfewer than 200 lines.\n\n---\n\n## Why this exists\n\nMost LLM trading agent frameworks optimize for the demo: a single\nend-to-end run that produces a plausible-looking backtest. 
They skip the\nunsexy part: knowing whether the backtest means anything.\n\nThe honest answer for most strategies is \"no, it doesn't.\" When you let\nClaude propose, code, and backtest a thousand variants, picking the best one\nis exactly the recipe for the classic multiple-testing trap. The reported\nSharpe is meaningless. The strategy will not work in production. This is the\nproblem ai-quant-lab is built around.\n\nThe fix is statistical rigor as a hard gate: every accepted strategy passes\nan adversarial critic, a deflated Sharpe test parameterized by the honest\ntrial count, and a correlation check against the existing survivor set. No\noverrides. Generation is cheap; validation is expensive; the architecture\nputs the expensive part where it belongs.\n\n---\n\n## What's different\n\n| Feature                            | ai-quant-lab          | TradingAgents | AgentQuant | QuantEvolve |\n|------------------------------------|-----------------------|---------------|------------|-------------|\n| Deflated Sharpe gate               | ✅ Hard gate, empirical trial variance | ❌  | ❌         | ❌          |\n| Purged CV                          | ✅ Combinatorial       | ❌            | ❌         | ❌          |\n| Leakage detector                   | ✅ Correlation + structural (truncation) | ❌ | ❌ | ❌      |\n| Walk-forward                       | ✅ With purge          | ❌            | ❌         | ✅ Basic    |\n| Cross-sectional portfolio engine   | ✅ Long-short, dollar-neutral | ❌      | ❌         | ❌          |\n| Cross-sectional features           | ✅ Rank, z-score, factor \u002F industry neutralize | ❌ | ❌ | ❌    |\n| Meta-labeling                      | ✅ Logistic classifier, act\u002Fskip filter | ❌ | ❌    | ❌          |\n| Factor attribution                 | ✅ PCA + Fama-French OLS | ❌          | ❌         | ❌          |\n| PCA concentration gate             | ✅ Catches stealthy duplication | ❌  | ❌         | ❌          |\n| Realistic execution                
| ✅ Slippage, partial fills, participation cap | ❌ | ❌ | ❌ |\n| TCA framework                      | ✅ Calibrates costs from fills | ❌    | ❌         | ❌          |\n| Intraday-aware bar engine          | ✅ BarSchedule auto-annualization | ❌ | ❌      | ❌          |\n| OHLCV features                     | ✅ Parkinson, Garman-Klass, VWAP-dev | ❌ | ❌ | ❌      |\n| Adversarial critic                 | ✅ Per-market templates (equities\u002Fcrypto\u002Ffutures\u002Foptions\u002Ffx) | ❌ | ✅ Reflect | ❌ |\n| Memory \u002F trial counting            | ✅ SQLite, persists returns | ❌       | ✅ SQLite  | ✅ Feature map |\n| Claude-native                      | ✅ Anthropic SDK + prompt caching | Generic | Gemini | Gemini |\n| Production kill-switch             | ✅ Drawdown \u002F loss \u002F Sharpe collapse | ❌ | ❌    | ❌          |\n| Paper trading diagnostics          | ✅                     | ❌            | ❌         | ❌          |\n| Lines of code                      | ~5000                  | ~5000+        | ~3000+     | ~2000+      |\n| Dependencies                       | 5                      | 15+           | 10+        | 10+         |\n| Works without API key              | ✅ (examples 1-5, 7-9) | ❌            | ❌         | ❌          |\n| Continuous integration             | ✅ (Python 3.11+3.12, ruff) | ❌       | ❌         | ❌          |\n\n---\n\n## The three eras of quant research\n\n![Three eras](figures\u002F01_three_eras.png)\n\nQuant has gone through three big shifts. **Classical** (1980-2005) ran on\nhuman intuition and slow hand-coded backtests; one researcher could test\nmaybe ten strategies a week. **Factor \u002F ML** (2000-2020) industrialized this\nwith libraries, cloud, and falling compute costs; throughput climbed but the\nedge per strategy shrank as the obvious factors got arbitraged.\n\nThe **AI-assisted era** changes the throughput equation by another two\norders of magnitude. 
One person with an Anthropic key can have Claude\npropose, code, and backtest a thousand strategies a week. That means the\nbinding constraint is no longer ideas; it's the validation gate.\n\n---\n\n## Time compression\n\n![Time compression](figures\u002F03_time_compression.png)\n\nThe classical research cycle was bottlenecked by coding. A working\nbacktest took a week; iterating on the spec took another week. Now both\ncollapse to minutes. The bottleneck moves to the only step that humans\nneed to be honest about: **knowing whether the result means anything.**\n\n---\n\n## The multiple-testing tax\n\n![Multiple testing](figures\u002F04_multiple_testing.png)\n\nRun a thousand random strategies on a no-edge tape. The expected maximum\nannualized Sharpe is roughly **1.5** (the exact figure depends on sample\nlength; longer backtests shrink it). That's the Sharpe of a \"good\" strategy,\nwhich is why naïve reports of \"we tested a thousand things and the best one\nhas a Sharpe of 1.4\" are reporting noise.\n\nThe deflated Sharpe ratio penalizes this. See `examples\u002F05_deflated_sharpe_demo.py`\nfor a live demo: pick the best of 1000 random strategies on GBM, run it\nthrough `deflated_sharpe(returns, n_trials=1000)`, and watch the p-value go to\n0.62. The strategy doesn't pass the gate. It shouldn't.\n\n---\n\n## Where leakage hides\n\n![Leakage examples](figures\u002F08_leakage_examples.png)\n\nThe most common forms of leakage are not exotic:\n\n1. **Centered rolling windows.** `series.rolling(21, center=True).mean()`\n   includes ten future bars in the value at time t. Any feature built on\n   this peeks at the future.\n2. **Forgotten shifts.** `momentum(price, 21)` without a `.shift(1)` uses\n   today's close in today's signal. A signal computed from today's close\n   can't be traded until tomorrow's open.\n3. **Forward-looking labels used as features.** Triple-barrier labels are\n   built to look forward — that's their job as a _target_. 
Using them as a\n   _feature_ is the cleanest possible leak.\n\nThe leakage detector (`features\u002Fleakage_detector.py`) catches all three\nshapes by comparing future and past correlations of every feature against\nthe target. See [`examples\u002F03_leakage_demo.py`](examples\u002F03_leakage_demo.py).\n\n---\n\n## Walk-forward, the right way\n\n![Walk-forward](figures\u002F07_walkforward_correct_vs_wrong.png)\n\nShuffled k-fold CV is fine for IID data. Time series are not IID, and\nshuffling lets a model train on bars that come after the ones it is tested\non. Run walk-forward (`validation\u002Fwalk_forward.py`) and set a `purge` gap if\nyour labels look forward more than one bar. The OOS curve is the only one that\nmatters.\n\n```python\nfrom ai_quant_lab.validation.walk_forward import walk_forward_evaluate\n\n# With daily bars: ~2 years (504) to train, ~6 months (126) to test, 5-bar purge.\n# price_data and strategy come from your own setup.\nout = walk_forward_evaluate(\n    price_data, strategy,\n    train_size=504, test_size=126, purge=5, mode=\"rolling\",\n)\nprint(out[\"metrics\"][\"sharpe_ratio\"])\n```\n\nFor a robustness check across overlapping splits, use\n`combinatorial_purged_cv` — every bar appears in multiple test sets, which\ngives you a distribution of OOS Sharpes rather than a single point.\n\n---\n\n## The three gates\n\nA strategy must pass all three:\n\n1. **Critic.** Adversarial LLM review BEFORE the backtest. Catches the\n   obviously broken ideas. One LLM call.\n2. **Deflated Sharpe.** P-value below `settings.dsr_pvalue_max` (default\n   5%). Parameterized by the honest `n_trials` from `ResearchMemory`. No\n   override.\n3. **Correlation.** Maximum |correlation| with already-accepted strategies\n   below `settings.max_correlation` (default 0.6). Keeps the survivor set\n   diverse.\n\nImplementation in [`orchestrator\u002Fgates.py`](ai_quant_lab\u002Forchestrator\u002Fgates.py).\nFull statistical write-up in [`docs\u002FVALIDATION.md`](docs\u002FVALIDATION.md).\n\n---\n\n## Sections from the article\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Why generation is cheap and validation is expensive\u003C\u002Fb>\u003C\u002Fsummary>\n\nThe asymmetry is the whole point. Claude can propose a thousand strategies a\nday at ~$10. 
Each requires hundreds of bars of data and seconds of compute to\nvalidate. The throughput equation says: spend cycles validating, not\ngenerating.\n\nConcretely, ai-quant-lab is structured so the cheap step (LLM call) gates\nthe expensive step (full validation). The critic is one cheap call. The\nsandbox is fast. The deflated Sharpe test is a closed-form formula. The\ncostly part — purged CV across many folds — only runs on strategies that have\nalready cleared the cheaper checks.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>The honest n_trials problem\u003C\u002Fb>\u003C\u002Fsummary>\n\nEvery parameter sweep counts. Every variant counts. Every \"let me just try\none more thing\" counts. The deflated Sharpe gate is only as honest as the\ntrial counter feeding it.\n\n`ResearchMemory` exists to make this counter tamper-resistant. Every\nhypothesis ever proposed — accepted, rejected, killed by the critic — gets a\nrow in SQLite. The gate reads `memory.n_trials()` and there's no API to\nzero it out.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Why prompt caching matters\u003C\u002Fb>\u003C\u002Fsummary>\n\nThe system prompts of HypothesisAgent, CriticAgent, and CodeAgent are\ninvariant across every iteration of the loop. Marking them with\n`cache_control: {\"type\": \"ephemeral\"}` means every later call within the\ncache's 5-minute lifetime (refreshed on each hit) reads those prompts from\ncache at roughly a tenth of the normal input-token price. On a 50-iteration\nloop, that's roughly a 10× cost reduction on the system-prompt tokens without\nchanging a single behavior.\n\nSee [`agents\u002Fbase.py`](ai_quant_lab\u002Fagents\u002Fbase.py).\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>The sandbox: catching accidents, not adversaries\u003C\u002Fb>\u003C\u002Fsummary>\n\nThe sandbox AST-walks generated code and rejects any import outside `numpy`,\n`pandas`, `math`, and `ai_quant_lab.features.library`. The runtime\nnamespace has a whitelist of builtins — no `open`, no `__import__`, no\n`eval`. 
A SIGALRM enforces a wall-clock timeout.\n\nThis is not a security boundary: the code Claude generates runs on your\nmachine, and Claude can write anything your shell can run. The sandbox stops\na hallucinating Claude from writing a strategy that accidentally runs\n`import os; os.system(\"rm -rf ~\")`. It does not stop a determined adversary.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Live diagnostics and the kill switch\u003C\u002Fb>\u003C\u002Fsummary>\n\nOnce a strategy clears the gates, it gets a paper-trade slot. The\n`LiveDiagnostic` compares a rolling window of live returns to the backtest\ndistribution. The `KillSwitch` trips on hard rules (drawdown, daily loss,\nrolling Sharpe collapse) and stays tripped until manually reset.\n\nThere is no \"the strategy was about to recover\" override. That's how\nblow-ups happen.\n\n```python\nfrom ai_quant_lab.production import KillSwitch, LiveDiagnostic\nfrom ai_quant_lab.production.kill_switch import drawdown_trigger, sharpe_collapse_trigger\n\n# `accepted` stands in for a strategy that has cleared all three gates.\ndiag = LiveDiagnostic(backtest_returns=accepted.returns, window_days=60)\nkill = KillSwitch(triggers=[drawdown_trigger(0.10), sharpe_collapse_trigger(-0.5, 60)])\n```\n\u003C\u002Fdetails>\n\n---\n\n## Examples\n\n| File | What it shows | Needs API key? |\n|------|---------------|----------------|\n| [`01_backtest_basics.py`](examples\u002F01_backtest_basics.py) | Vectorized backtest on synthetic GBM, headline metrics. | No |\n| [`02_feature_pipeline.py`](examples\u002F02_feature_pipeline.py) | Building leakage-proof features via FeaturePipeline. | No |\n| [`03_leakage_demo.py`](examples\u002F03_leakage_demo.py) | Side-by-side: forward reference, centered rolling, clean. | No |\n| [`04_walk_forward.py`](examples\u002F04_walk_forward.py) | Walk-forward on a momentum strategy with diagnostics. | No |\n| [`05_deflated_sharpe_demo.py`](examples\u002F05_deflated_sharpe_demo.py) | 1000 random strategies, DSR kills the lucky best. 
| No |\n| [`06_full_research_loop.py`](examples\u002F06_full_research_loop.py) | End-to-end loop: Claude proposes → validates → accepts\u002Frejects. | Yes |\n| [`07_cross_sectional_momentum.py`](examples\u002F07_cross_sectional_momentum.py) | Three iterations of cross-sectional momentum, with DSR each step. | No |\n| [`08_paper_trading_sim.py`](examples\u002F08_paper_trading_sim.py) | Simulated paper trading with daily diagnostics and a kill switch. | No |\n| [`09_cross_sectional_portfolio.py`](examples\u002F09_cross_sectional_portfolio.py) | Long-short basket + PCA gate finds stealthy duplication missed by pairwise corr. | No |\n\n---\n\n## Where AI still fails\n\n1. **Survivorship.** Claude proposes strategies on the universe you give it.\n   If your universe is \"S&P 500 today,\" every strategy is biased toward\n   what survived. Fix at the data layer, not in code.\n2. **Cost shocks.** Realistic frictions kill 80% of paper edges. The\n   default `cost_bps=8` is a generous baseline for liquid markets; small-cap\n   and illiquid markets need much more. Tune for your venue.\n3. **Regime shifts.** Walk-forward is honest within the regimes present in\n   the data. If 2020 isn't in your sample, you have no claim about how the\n   strategy handles a 2020-style shock.\n4. **The trial counter.** It only counts what's in `ResearchMemory`.\n   Strategies you tested in a notebook and didn't save count too — but\n   nobody records those. Be paranoid about your own scaffolding, not just\n   about Claude's.\n5. **Implementation gaps.** A backtest assumes you can trade your\n   target positions at the closing price. Reality has order books,\n   queues, fill probabilities, and partial fills. The diagnostic layer catches\n   live drift, but it can't substitute for an actual execution venue\n   integration. That's not in scope here.\n\n---\n\n## Build slow. Validate hard. Trade small.\n\nIf the gate kills your favorite strategy, the strategy is the problem, not\nthe gate. Add more data. Count your trials honestly. 
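Or accept that the\nedge probably wasn't there.\n\n---\n\n## Docs\n\n- [docs\u002FARCHITECTURE.md](docs\u002FARCHITECTURE.md) — system design.\n- [docs\u002FAGENTS.md](docs\u002FAGENTS.md) — agent prompts and behavior.\n- [docs\u002FVALIDATION.md](docs\u002FVALIDATION.md) — statistical methodology.\n- [docs\u002FCONTRIBUTING.md](docs\u002FCONTRIBUTING.md) — how to extend the system.\n\n## License\n\nMIT. See [LICENSE](LICENSE). Not investment advice. You can lose money trading.\n","ai-quant-researcher is an AI-powered quant research engine that uses Claude to generate trading strategies and then filters for the effective ones through rigorous statistical validation. Its core features are automatic strategy generation with Anthropic's Claude, validation of those strategies against the multiple-testing trap (for example, a deflated Sharpe ratio test), and a correlation check against the existing set of surviving strategies. The project is written in Python, with a concise user interface and a highly modular architecture, and parameters can be tuned through environment variables or a configuration object. It suits quant researchers and teams who want to explore new trading ideas while keeping their strategies robust.","2026-06-11 03:58:31","CREATED_QUERY"]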