[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-750":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":19,"rankGlobal":10,"rankLanguage":10,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":21,"hasPages":23,"topics":24,"createdAt":10,"pushedAt":10,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":28,"discoverSource":29},750,"oransim","OranAi-Ltd\u002Foransim","OranAi-Ltd","Causal Digital Twin for Marketing at Scale · Predict any marketing decision before you spend a dollar.","https:\u002F\u002Foran.cn",null,"Python",1172,152,62,2,0,15,98,19.55,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:00:18","\u003Cdiv align=\"center\">\n\u003Cimg src=\"assets\u002Fwordmark.svg\" alt=\"Oransim\" width=\"640\"\u002F>\n\n### Predict your next campaign's ROI before spending a dollar.\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Fblob\u002Fmain\u002FLICENSE\">\u003Cimg alt=\"License\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Flicense\u002FOranAi-Ltd\u002Foransim?color=blue\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Freleases\">\u003Cimg alt=\"Release\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fv\u002Ftag\u002FOranAi-Ltd\u002Foransim?label=release&color=blue\">\u003C\u002Fa>\n  \u003Ca href=\"#\">\u003Cimg alt=\"Python\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fpython-3.10%2B-blue\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Factions\u002Fworkflows\u002Fci.yml\">\u003Cimg 
alt=\"CI\" src=\"https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Factions\u002Fworkflows\u002Fci.yml\u002Fbadge.svg\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Fstargazers\">\u003Cimg alt=\"Stars\" src=\"https:\u002F\u002Fimg.shields.io\u002Fgithub\u002Fstars\u002FOranAi-Ltd\u002Foransim?style=social\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Foran.cn\">\u003Cimg alt=\"Website\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fwebsite-oran.cn-FF6B35\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp>\n  \u003Cstrong>🇬🇧 English\u003C\u002Fstrong> · \u003Ca href=\"README.zh-CN.md\">🇨🇳 中文\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp>\u003Cem>Causal simulation for enterprise growth teams.\u003Cbr\u002F>Audit the engine, license the data.\u003C\u002Fem>\u003C\u002Fp>\n\u003C\u002Fdiv>\n\n---\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"assets\u002Fscreenshots\u002Fhero.png\" alt=\"Oransim hero · 60-second prediction with counterfactual reasoning over an agent-based society\" width=\"100%\"\u002F>\n\u003C\u002Fp>\n\n**For enterprise CMOs** — predict your next campaign's ROI before spending: **4.3M+ indexed 小红书 notes · 2.1M+ creators (达人) across 15 verticals · 100,000+ surveyed consumer panel**, refreshed daily via licensed platform APIs. Counterfactual reasoning engine running on a **1M+ virtual consumer society** with LLM-backed soul personas reading your actual creatives. Transparent causal logic, open-sourced so you can audit it before licensing data access.\n\n*The OSS repo you're reading is the same causal engine running on a 21k-note demo corpus — try it, audit the mechanism end-to-end, then explore the live Enterprise data panel at [datacenter.oran.cn](https:\u002F\u002Fdatacenter.oran.cn\u002F) or contact `cto@orannai.com` for licensed access.*\n\n---\n\n## Who we are\n\n**OranAI Ltd. 
(橙果视界（深圳）科技有限公司)** — a Shenzhen-based AI marketing company founded May 2024, closed a **multi-million-dollar angel+ round** led by [Cloud Angels Fund, with participation from Leaguer Venture Capital and Jinshajiang United Capital](https:\u002F\u002F36kr.com\u002Fp\u002F3442645125141897). We co-operate the [Tencent Cloud × OranAI AIGC Design Lab](https:\u002F\u002Fcaijing.chinadaily.com.cn\u002Fa\u002F202412\u002F26\u002FWS676d01b5a310b59111daaff3.html), run our in-house multimodal matrix (**Oran-VL 7B** \u002F **Oran-XVL 72B**) behind four products — **PhotoG** (creative agent) · **DataG** (insight engine) · **VoyaAI** (strategy co-pilot) · **[DataCenter](https:\u002F\u002Fdatacenter.oran.cn\u002F)** (real-time creator + note panel explorer) — and serve **70+ enterprise clients** across beauty, FMCG, consumer electronics, and DTC outbound — including [Timekettle and Hyundai Motor (Pharos IV Best Prize)](https:\u002F\u002Fm.tech.china.com\u002Farticles\u002F20260117\u002F202601171798695.html), with 2025 revenue crossing **RMB 20M**.\n\n**Oransim is the causal engine inside that stack.** When a CMO using OranAI asks *\"what if we swapped KOL A for B on day 3 of this campaign?\"* — the `do()`-operator, the per-arm counterfactual heads, and the 14-day Hawkes rollout that answer the question all live in this repository. 
We open-sourced it under Apache-2.0 so enterprise buyers can audit the reasoning end-to-end — **trust the engine, then license the data panel.**\n\n\u003Csub>As featured in: [PR Newswire](https:\u002F\u002Fwww.prnewswire.com\u002Fnews-releases\u002Foranai-raises-multi-million-dollar-angel-funding-to-lead-ai-content-marketing-through-its-ai-agent-photog-302548911.html) · [亿邦动力](https:\u002F\u002Fwww.ebrun.com\u002F20250520\u002F579947.shtml) · [新浪科技](https:\u002F\u002Ffinance.sina.com.cn\u002Ftech\u002Froll\u002F2024-11-26\u002Fdoc-incxkhus4289659.shtml) · [腾讯新闻](https:\u002F\u002Fnews.qq.com\u002Frain\u002Fa\u002F20250714A07JHO00) · [DoNews](https:\u002F\u002Fwww.donews.com\u002Fnews\u002Fdetail\u002F5\u002F3670706.html)\u003C\u002Fsub>\n\n---\n\n## What it solves\n\nThree campaign decisions that break traditional tools but collapse to one Oransim workflow:\n\n### 1. Pre-launch \n> *\"I have 4 creative videos × 3 KOL shortlists × 2 budget tiers — which combination has the highest ROI?\"*\n\nTraditional approach: A\u002FB test for 2 weeks, burn ¥500k to learn. **Oransim**: 60-second simulation on ¥0, rank all 24 combinations with P35\u002FP65 confidence bands, pick top 3 to actually test.\n\n### 2. Mid-campaign \n> *\"Day 3 CTR is below target. Can I swap out 2 KOLs and reallocate budget to 3 others — and how much ROI shifts?\"*\n\nTraditional approach: data team rebuilds a dashboard overnight. **Oransim**: `do(kol=swap_A_for_B, day=3)` counterfactual rollout in 30 seconds — shows the 14-day path diff with the intervention applied.\n\n### 3. Post-mortem \n> *\"This campaign underperformed. If we'd spent on 小红书 instead of 抖音, what would we have gotten?\"*\n\nTraditional approach: retrospective analysis, ambiguous conclusion. **Oransim**: load actuals + `do(platform_alloc={xhs: 1.0})`, get the counterfactual ROI curve over the same agent population — confident attribution of what would have happened.\n\nAll three run on the same engine. 
Below is how it's built and why you can trust it.\n\n---\n\n## Why current tools can't answer these three questions\n\nEvery marketing intelligence tool answers part of the question. None answers all three campaign decisions above on the same data:\n\n| The 3 CMO questions | What existing tools do | What's missing |\n|---|---|---|\n| **Pre-launch ROI ranking** for 24 creative × KOL × budget combinations | Classical **Marketing Mix Modelers** fit the total revenue curve — one number per period | Can't tell you *which combination*: MMM is a total, not a per-arm counterfactual |\n| **Mid-campaign intervention** — what if I swap a KOL on day 3? | **Customer Data Platforms** report what already happened — click funnel, cohort retention | Can't roll forward under a `do()` — CDPs are observational, not causal |\n| **Post-mortem counterfactual** — what if we'd spent on 小红书 instead of 抖音? | **Black-box predictors** (AutoML, LLM \"predict ROI\") output a number with no derivation | Can't audit the reasoning — SHAP plots ≠ a causal graph |\n\nOransim sits in the gap: **per-arm counterfactuals** (pre-launch ranking) · **temporal `do()`-rollout** (mid-campaign swap) · **transparent causal graph** (post-mortem audit). One engine, three decisions.\n\n---\n\n## Why you can trust it — three signals, pick what your stakeholders care about\n\n### 🔬 Mechanism · audit the engine yourself\n\nThe OSS repo you're reading is the **full causal engine**, not a marketing demo. Clone it, run it on your scenarios, and trace any prediction back through the 64-node causal graph to the agent decision and the budget-curve calculation that produced it. 
No \"trust us, it's ML\" — every prediction is decomposable.\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim.git && cd oransim\npip install -e '.[dev]' && python -m uvicorn oransim.api:app --port 8001 &\ncurl http:\u002F\u002Flocalhost:8001\u002Fapi\u002Fgraph\u002Finspect   # the causal graph, in JSON\n```\n\n### 📊 Data · what Enterprise licenses get you beyond the OSS demo\n\nThe OSS ships a 21k-note reference corpus — enough to validate the mechanism, not enough to power production campaigns. Enterprise Edition runs on a continuously refreshed licensed panel, explorable live at **[datacenter.oran.cn](https:\u002F\u002Fdatacenter.oran.cn\u002F)**:\n\n| Asset | Scale | Source |\n|---|---|---|\n| 小红书 notes | **4,300,000+**, daily refresh | Licensed platform APIs + in-house crawlers |\n| Creators (达人) | **2,100,000+** across 15 verticals — 美妆 · 护肤 · 穿搭 · 3C · 食饮 · 母婴 · 家居 · 汽车 · 汽车后市场 · 健身 · 理财 · 奢品 · 宠物 · 医美 · 旅行 · spanning KOL (top + mid tier), KOC (waist, 1k–50k fans), and long-tail creators | Platform signal + fan-profile metadata |\n| Consumer panel | **100,000+** verified 小红书 users, surveyed monthly | Opt-in recruitment |\n\n*Browse the live panel at **[datacenter.oran.cn](https:\u002F\u002Fdatacenter.oran.cn\u002F)** · contact [`cto@orannai.com`](mailto:cto@orannai.com?subject=Oransim%20Enterprise%20Data%20Access) for licensed integration.*\n\n### 📚 Research · 12-year tech lineage behind every layer\n\nOransim isn't a \"vibes LLM\" — every layer traces to 2010–2024 peer-reviewed literature:\n\n\u003Cdetails>\n\u003Csummary>Architecture + research lineage (click to expand)\u003C\u002Fsummary>\n\n- **Per-arm counterfactual heads** — TARNet (Shalit ICML 2017) · Dragonnet (Shi NeurIPS 2019)\n- **Representation balancing** — HSIC (Gretton 2005) · adversarial-IPTW · BCAUSS · CaT (Melnychuk ICML 2022)\n- **In-context amortization** — CInA (Arik & Pfister NeurIPS 2023)\n- **Causal Neural Hawkes Process** — Mei & Eisner NeurIPS 2017 + 
Zuo ICML 2020 + Geng NeurIPS 2022 counterfactual TPP\n- **Budget curves** — Hill saturation (Dubé & Manchanda 2005) + frequency fatigue (Naik & Raman 2003)\n- **SCM** — Pearl 3-step (abduction → action → prediction), 64 nodes \u002F 117 edges, discourse + cascade mediators (Sunstein 2017 · Bikhchandani 1992)\n- **Agent population** — IPF \u002F Deming-Stephan 1940 baseline\n\nSee `backend\u002Foransim\u002F{world_model,diffusion,causal}\u002F` — every file has inline citations.\n\u003C\u002Fdetails>\n\n---\n\n## 🚀 Quickstart (60 seconds)\n\n```bash\n# 1. Clone and install\ngit clone https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim.git\ncd oransim\npip install -e '.[dev]'\n\n# 2. Run backend (mock mode — no API key required)\nLLM_MODE=mock python -m uvicorn oransim.api:app --port 8001 &\n\n# 3. Run frontend\npython -m http.server 8090 --directory frontend\n\n# 4. Open http:\u002F\u002Flocalhost:8090 → click \"⚡ 极速\" → \"🚀 Predict\"\n```\n\n> 📌 **What you're running on** — the Quickstart path consumes `data\u002Fsynthetic\u002F` (2k scenarios \u002F 500 notes \u002F 100 event streams) and `data\u002Fmodels\u002Fworld_model_demo.pkl` (LightGBM trained on the synthetic corpus). This is a **demo dataset calibrated to public-report means** — it's deterministic, reproducible, and good enough to exercise every code path, but it is **not real traffic**. To plug in your own data (CSV \u002F JSONL \u002F OpenAPI \u002F DB), jump to [📦 Data](#-data--synthetic-by-default-bring-your-own-data-ready).\n\nMock mode returns deterministic stubs — good for CI \u002F first look, but every LLM-driven feature (soul personas, group-chat, comment-section discourse, LLM calibration of KPIs) falls back to templates. 
**To unlock the real pipeline, switch to api mode:**\n\n```bash\nLLM_MODE=api \\\nLLM_API_KEY=sk-xxxxx \\\nLLM_MODEL=gpt-5.4 \\\npython -m uvicorn oransim.api:app --port 8001 &\n```\n\nPick the native request format with `LLM_PROVIDER` — defaults to `openai` (also covers DeepSeek \u002F vLLM \u002F any OpenAI-compat gateway):\n\n\u003Cdetails>\n\u003Csummary>Per-provider recommended config (click)\u003C\u002Fsummary>\n\n| `LLM_PROVIDER` | `LLM_BASE_URL` | `LLM_MODEL` example | Key env |\n|---|---|---|---|\n| `openai` *(default)* | `https:\u002F\u002Fapi.openai.com\u002Fv1` | `gpt-5.4` · `gpt-4o-mini` | `OPENAI_API_KEY` or `LLM_API_KEY` |\n| `openai` (DeepSeek) | `https:\u002F\u002Fapi.deepseek.com\u002Fv1` | `deepseek-chat` | `LLM_API_KEY` |\n| `openai` (vLLM local) | `http:\u002F\u002Flocalhost:8000\u002Fv1` | any served model | `LLM_API_KEY=local` |\n| `anthropic` | `https:\u002F\u002Fapi.anthropic.com` (default) | `claude-sonnet-4-6` | `ANTHROPIC_API_KEY` or `LLM_API_KEY` |\n| `gemini` | Google default | `gemini-2.5-pro` · `gemini-2.5-flash` | `GEMINI_API_KEY` \u002F `GOOGLE_API_KEY` \u002F `LLM_API_KEY` |\n| `qwen` | `https:\u002F\u002Fdashscope.aliyuncs.com\u002Fapi\u002Fv1` (default) | `qwen-plus` · `qwen-turbo` | `DASHSCOPE_API_KEY` \u002F `QWEN_API_KEY` \u002F `LLM_API_KEY` |\n\nFull reference in [`.env.example`](.env.example); extended retry \u002F fallback-chain options in [`docs\u002Fen\u002Fquickstart.md`](docs\u002Fen\u002Fquickstart.md).\n\n\u003C\u002Fdetails>\n\nThe frontend shows a yellow banner at the top whenever the backend is still in mock (or has no key set) — click ✕ to dismiss for the session.\n\n> **Running right now · what's real vs aspirational**\n> - ✅ **Working today** — full backend (`POST \u002Fapi\u002Fpredict` · `\u002Fapi\u002Fadapters` · `\u002Fapi\u002Fsandbox\u002F*`, split across `api_routers\u002F` since api.py 1730-line god-file refactor) · full frontend (hero · 9 tabs · cascade animation · modular `js\u002F*.js`) · LightGBM 
quantile baseline pkl shipped · 5 platform adapters (XHS v1 legacy + TikTok agent-level w\u002F FYP RL + IG \u002F YouTube Shorts \u002F Douyin MVP) · learned amortized abduction (pure-numpy MLP q(U|O)) · multi-LLM providers (OpenAI-compat · Anthropic · Gemini · Qwen).\n> - 🟡 **Code-complete, weights pending** — Causal Transformer world model + Causal Neural Hawkes diffusion — architecture + training loop + inference + thinning sampler all shipped; pretrained weights land with OrancBench v0.5.\n> - 📋 **Roadmap-only** — Twitter \u002F Bilibili \u002F LinkedIn adapters · multi-modal embedders (image\u002Fvideo\u002Faudio stubs only today) · Ray cluster · hosted demo.\n\n---\n\n## 🎬 See It In Action\n\n\u003Ctable>\n\u003Ctr>\n\u003Ctd width=\"50%\" valign=\"top\">\n\n**Three-panel working UI** — left: creative + budget + sliders · center: KPI \u002F Agent pool \u002F AI group-chat tabs (+「更多 ›」dropdown for deep analysis) · right: per-persona LLM reactions.\n\n\u003Cimg src=\"assets\u002Fscreenshots\u002Fmain-three-col.png\" alt=\"Three-panel prediction UI\" width=\"100%\"\u002F>\n\n\u003C\u002Ftd>\n\u003Ctd width=\"50%\" valign=\"top\">\n\n**Opinion propagation through an agent-based society** — drop in a piece of ad copy and watch color-coded opinion waves (green=click \u002F purple=high intent \u002F red=skip \u002F blue=curious) ripple outward from KOL seeds, cascading to their followers in real time.\n\n\u003Cimg src=\"assets\u002Fscreenshots\u002Fsociety-100m.png\" alt=\"Opinion propagation over the agent population\" width=\"100%\"\u002F>\n\n\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftable>\n\n---\n\n## 📦 Data · synthetic by default, bring-your-own-data ready\n\nOransim separates **framework** (the engine: world model, SCM, Hawkes, souls, platforms) from **data** (what flows through it). 
The OSS ships a small synthetic dataset so every code path works out-of-the-box; every path is replaceable with your own source.\n\n### What ships in the box\n\n| File | What it is | How it's used |\n|---|---|---|\n| `data\u002Fsynthetic\u002Fnotes_v3.json` | 500 synthetic notes · 10 niches | caption \u002F tag \u002F fans \u002F engagement priors |\n| `data\u002Fsynthetic\u002Fscenarios_v0_1.jsonl` | 2k scenarios · synthetic campaigns | world-model training + held-out eval |\n| `data\u002Fsynthetic\u002Fevent_streams_v0_1.jsonl` | 100 synthetic Hawkes event streams | diffusion forecaster fit |\n| `data\u002Fsynthetic\u002Fniche_priors_calibrated.json` | Per-niche CTR \u002F CVR prior means | fallback when world model has no signal |\n| `data\u002Fmodels\u002Fworld_model_demo.pkl` | LightGBM quantile baseline (~3 MB) | shipped weights — retrain with `backend\u002Fscripts\u002Fgen_synthetic_data.py` |\n| `data\u002Fniches.json` | **Niche registry** (10 entries) | single source of truth for niche keys, display labels, CTR priors, synonyms |\n\n> ⚠️ **This is a demo dataset, not truth.** The synthetic generator calibrates to public-report means (CTR \u002F engagement ranges from widely cited industry reports) but doesn't reflect any specific platform's real traffic or your particular audience. For marketing decisions you need either (a) your own data via a `DataProvider` (below), or (b) the Enterprise Edition's production-proven real-panel data — see [Enterprise](#-oranai-enterprise-edition).\n\n### Plugging in your data — three paths\n\nOransim's `DataProvider` interface lives in `oransim\u002Fplatforms\u002Fproviders\u002F`. 
Pick the path that matches where your data lives:\n\n| Provider | When to use | Contract | Reference |\n|---|---|---|---|\n| `CSVProvider` | batch export from BI \u002F data warehouse | one CSV per table (`notes.csv`, `kols.csv`) | [`docs\u002Fen\u002Fplatforms\u002Fwriting-a-provider.md`](docs\u002Fen\u002Fplatforms\u002Fwriting-a-provider.md) |\n| `JSONLProvider` | streamed events (Kafka tailed to file) | one JSON object per line | same doc |\n| `OpenAPIProvider` | live REST \u002F GraphQL | implement 4 read endpoints | same doc |\n| *Write-your-own* | PostgreSQL \u002F ClickHouse \u002F Snowflake \u002F BigQuery | subclass `DataProvider` | same doc |\n\n**Minimum schema** your source needs to expose:\n\n```yaml\nnotes:\n  - note_id, caption, niche, platform, publish_time,\n    author_fans_count, read_count, like_count, collection_count, comment_count\nkols:\n  - anchor_id, nick, niche, platform, fan_count,\n    interaction_rate, ad_price_cny\n```\n\nField names can be remapped via `provider.field_map`; the exact shape is documented in [`writing-a-provider.md`](docs\u002Fen\u002Fplatforms\u002Fwriting-a-provider.md).\n\n### Adding a new niche\n\nIf your data covers niches beyond the 10 shipping demo niches (e.g. automotive, healthcare, toys-and-collectibles), **edit `data\u002Fniches.json`** — add one entry per niche:\n\n```json\n{\n  \"key\": \"auto\",\n  \"zh\": \"汽车\",\n  \"en\": \"Automotive\",\n  \"synonyms\": [\"新能源车\", \"试驾\", \"特斯拉\", \"SUV\"],\n  \"ctr_prior\": {\"mu\": 0.024, \"sigma\": 0.010, \"n\": 860},\n  \"bias_caption\": \"汽车 试驾 新能源车 改装\",\n  \"female_ratio\": 30\n}\n```\n\nThat's the only edit. The registry is loaded at import time by `oransim.config.niches` and every niche-aware component (KOL library, caption→category detection, CTR priors, structured schema outputs, soul prompt rendering) reads from there — no scattered hardcoded tables to hunt down. 
Point at a custom path with `ORAN_NICHES_PATH=\u002Fsrv\u002Fmy_niches.json` if you'd rather not edit the in-repo file.\n\n### Telling when you're on demo data vs real data\n\nThe frontend shows a persistent banner whenever:\n- `LLM_MODE=mock` (no LLM key) — templates instead of real LLM\n- No custom `DataProvider` registered — reading `data\u002Fsynthetic\u002F`\n\nBoth conditions clear once you supply a real key and register a real provider. The system-status panel also exposes the `data_source` field of `GET \u002Fapi\u002Fhealth` for observability.\n\n---\n\n## 🏗️ Architecture\n\n\u003Cdiv align=\"center\">\n\u003Cimg src=\"assets\u002Farchitecture.svg\" alt=\"Oransim architecture diagram\" width=\"100%\"\u002F>\n\u003C\u002Fdiv>\n\nA typical prediction request flows: **Creative + Budget** → **PlatformAdapter** (pulls data via pluggable **DataProvider**) → **World Model** (factual + counterfactual predictions) + **Agent Layer** (POP_SIZE-scalable IPF + LLM personas) → **Causal Engine** (64-node causal graph + `do()` counterfactuals) → **Diffusion** (14-day intervention-aware rollout) → **Prediction JSON** (14–19 schemas).\n\n**What runs where:**\n\n| Surface | Default (ships today) | Research-grade (opt-in) |\n|---|---|---|\n| World model | LightGBM quantile baseline (`data\u002Fmodels\u002Fworld_model_demo.pkl`) + hand-coded structural formula | `CausalTransformerWorldModel` (CaT \u002F TARNet \u002F Dragonnet \u002F CInA) — train locally, or swap in via `POST \u002Fapi\u002Fv2\u002Fworld_model\u002Fpredict?model=causal_transformer` |\n| Diffusion | Parametric exponential-kernel Hawkes (Hawkes 1971) | `CausalNeuralHawkesProcess` (Mei & Eisner + Zuo et al. + Geng et al.) 
— same opt-in pattern: `POST \u002Fapi\u002Fv2\u002Fdiffusion\u002Fforecast?model=causal_neural_hawkes` |\n| Agents | `StatisticalAgents` (vectorised, CPU) | `SoulAgentPool` LLM personas (enable via `use_llm=true` on `\u002Fapi\u002Fpredict`) |\n| Sandbox | Budget-only slider uses a Hill-saturation + frequency-fatigue closed form (`mode: \"fast_approx\"` in the response) so the slider is responsive. Non-budget edits (creative \u002F alloc \u002F KOL) trigger a real model re-run (`mode: \"counterfactual\"` or `\"full_rerun\"`). | — |\n\n*The registry is the extension point. Default `\u002Fapi\u002Fpredict` uses the baseline stack because it's what ships with weights today; `\u002Fapi\u002Fv2\u002F*` is how you A\u002FB swap in the research stack once you've trained it. Both routes share the same SCM \u002F agent \u002F Hawkes plumbing.*\n\nTwo-axis extensibility:\n- **Platform** axis — XHS (legacy, v1 live) + TikTok \u002F Instagram \u002F YouTube Shorts \u002F Douyin (MVP on synthetic); Twitter \u002F Bilibili \u002F LinkedIn on roadmap\n- **Data Provider** axis — pluggable per platform (Synthetic \u002F CSV \u002F JSON \u002F OpenAPI \u002F your own)\n\nSee [`docs\u002Fen\u002Farchitecture.md`](docs\u002Fen\u002Farchitecture.md) for the full design.\n\n---\n\n## 🌐 Platform Adapter Matrix\n\n| Platform             | Region   | Status  | Data Provider                       | World Model          | Milestone |\n|----------------------|----------|---------|-------------------------------------|----------------------|-----------|\n| 🔴 XHS \u002F RedNote     | Greater China | ✅ v1   | Synthetic \u002F CSV \u002F JSON \u002F OpenAPI | Causal Transformer + LightGBM baseline | — |\n| ⚫ TikTok            | Global   | 🟢 MVP  | Synthetic                        | LightGBM baseline    | v0.5 (real panels) |\n| 🟣 Instagram Reels   | Global   | 🟢 MVP  | Synthetic                        | LightGBM baseline    | v0.5 (real panels) |\n| 🔴 YouTube Shorts    | Global   | 🟢 MVP  | 
Synthetic                        | LightGBM baseline    | v0.5 (real panels) |\n| 🔵 Douyin            | Greater China | 🟢 MVP | Synthetic                        | LightGBM baseline    | v0.5 (real panels) |\n| ⚪ Twitter \u002F X       | Global   | 📋 planned | —                             | —                    | v0.5 |\n| 📺 Bilibili          | Greater China | 📋 planned | —                        | —                    | v1.0 |\n| ✒️ LinkedIn          | Global   | 📋 planned | —                             | —                    | v1.0 |\n\n> *What \"MVP\" actually means here*: XHS is the canonical v1 adapter with real data-provider paths (CSV \u002F JSON \u002F OpenAPI). TikTok \u002F IG \u002F YouTube Shorts \u002F Douyin ship as **config-differentiated wrappers** over the same `PlatformAdapter` interface (each has distinct CPM \u002F CTR \u002F CVR \u002F duration priors — see `backend\u002Foransim\u002Fplatforms\u002F{platform}\u002Fadapter.py`), all driven by the synthetic LightGBM baseline. They pass shape tests end-to-end but don't yet have platform-specific DataProviders hooked up; that's what \"v0.5 (real panels)\" means in the milestone column.\n\n**Want another platform?** Open an [Adapter Request](https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Fissues\u002Fnew?template=adapter_request.yml) — we prioritize based on community demand.\n\n---\n\n## 📊 What You Get — 14 to 19 Schemas\n\nA single `\u002Fapi\u002Fpredict` call returns structured outputs across these schemas:\n\n1. **total_kpis** — aggregate impressions \u002F clicks \u002F conversions \u002F cost \u002F revenue \u002F CTR \u002F CVR \u002F ROI with P35\u002FP50\u002FP65 bands\n2. **per_platform** — KPIs broken down per platform adapter\n3. **per_kol** — KOL-level attribution\n4. **diffusion_curve** — 14-day daily impression\u002Fengagement forecast (Causal Neural Hawkes; parametric Hawkes as baseline)\n5. 
**cate** — Conditional Average Treatment Effect across agent demographics\n6. **counterfactual** — \"What if\" branching: alternative creative \u002F budget \u002F KOL\n7. **soul_feedback** — 10 LLM persona reactions in natural language\n8. **group_chat** — simulated group conversation dynamics (Sunstein 2017 polarization)\n9. **discourse** — second-wave mediator impact estimation\n10. **final_report** — LLM-generated executive summary\n11. **verdict** — top-line recommendation (greenlight \u002F optimize \u002F kill)\n12. **kol_optimizer** — optimal KOL mix given objective\n13. **kol_content_match** — creative × KOL compatibility scoring\n14. **tag_lift** — incremental performance from tag\u002Ftargeting choices\n15. **mediator_impact** — path analysis from discourse\u002Fgroup_chat to funnel\n16. **brand_memory** — longitudinal brand preference updates\n17. **sandbox_snapshot** — serialized session state for \"undo \u002F redo\"\n18. **audit_trace** — explainability — which agents, which paths, which weights\n19. **benchmark** — performance against OrancBench\n\nSee [`docs\u002Fen\u002Fschemas\u002F`](docs\u002Fen\u002Fschemas\u002F) for JSON schema definitions.\n\n---\n\n## 🧠 Under the Hood\n\n\u003Cdetails id=\"causal-graph\">\n\u003Csummary>\u003Cb>Causal Graph\u003C\u002Fb> — 64 nodes, 117 edges\u003C\u002Fsummary>\n\nHand-designed by domain experts covering the marketing funnel: impression → awareness → consideration → conversion → repeat purchase → brand memory, with mediators for group discourse (Sunstein 2017) and information cascades (Bikhchandani et al. 1992).\n\nThe graph includes long-term feedback loops (e.g. `repeat_purchase → brand_equity → ecpm_bid → next-cycle impression_dist`). This is intentional — it reflects real marketing physics, not a modeling artifact. Strict Pearl-style abduction on cycles is undefined; our `do()` evaluation uses the cyclic-SCM generalization of Bongers et al. 
2021 ([Foundations of Structural Causal Models with Cycles and Latent Variables](https:\u002F\u002Farxiv.org\u002Fabs\u002F1611.06221)), treating the 25-node feedback SCC as a fixed-point solve rather than a topological forward pass.\n\nThe 3-step evaluation in code:\n1. **Abduction** — at the agent layer, re-use the sampled noise from baseline; at the graph layer, per-node residuals are frozen\n2. **Action** — apply `do()` intervention (supported nodes listed in `\u002Fapi\u002Fdag`'s `intervenable: true` set)\n3. **Prediction** — topologically sort the acyclic condensation, solve each SCC by numerical iteration (2–3 passes empirically converge on the shipped graph)\n\nA time-unrolled DAG projection IS available in the OSS release via `oransim.causal.scm.dag_dict_unrolled(n_steps=K)` — each original node becomes `N_t0, N_t1, ..., N_t{K-1}`; feedback edges cross time (`src_ti → dst_t{i+1}`), non-feedback edges replicate within each slice. At `n_steps=2` the shipped graph's 64 nodes + 117 edges (cyclic) unroll to 128 nodes + 220 edges (strict DAG, 14 feedback edges detected automatically via DFS back-edge analysis). Downstream modules that need strict acyclicity (CausalDAG-Transformer attention on a true DAG, textbook Pearl three-step abduction) can consume the unrolled view. The cyclic native graph + SCC condensation remains the default because it keeps the node count small and matches the shipped Transformer's 7-token input layout.\n\nA full equilibrium-solver with fixed-point guarantees for the cyclic native graph is an Enterprise Edition upgrade; the OSS release offers the unrolled-DAG path as the acyclic alternative.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Agent Population\u003C\u002Fb> — POP_SIZE-scalable IPF-calibrated virtual consumers\u003C\u002Fsummary>\n\nGenerated via Iterative Proportional Fitting (IPF \u002F Deming-Stephan 1940) against real Chinese demographic distributions (age × gender × region × income × platform). 
Each agent carries:\n- Demographics + psychographics\n- Platform-specific engagement priors\n- Niche\u002Fcategory affinity vectors\n- Time-of-day activity curves\n- Social graph embeddings\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Soul Agents\u003C\u002Fb> — LLM personas for qualitative feedback\u003C\u002Fsummary>\n\nThe top-K most salient agents for a scenario are upgraded to LLM-backed personas (`SOUL_POOL_N` configurable; default 100 for demo, scalable via Ray in the Enterprise Edition). Default model: `gpt-5.4`. Each persona:\n- Generates a persona card from its demographic vector\n- Evaluates the creative (reaction \u002F emotional response \u002F intent)\n- Optionally participates in simulated group chats (Sunstein 2017 group polarization)\n- Feeds second-wave mediators back into the causal graph\n\n**Two modes, explicit trade-off**:\n\n- **Template mode** (`use_llm=False`, default) — click decision is a Bernoulli draw against the statistical `click_prob` (+40% niche-match lift); the persona picks a consistent template ``reason`` \u002F ``comment`` \u002F ``feel``. Zero LLM cost, deterministic given seed, used for CATE \u002F ROI numerical reproducibility.\n- **LLM-decider mode** (`use_llm=True`, Park et al. 2023 Generative Agents style) — a real LLM gets the full persona card + creative + KOL context and returns a structured JSON (`will_click`, `reason`, `comment`, `feel`, `purchase_intent_7d`). **The LLM's ``will_click`` is the agent's decision** (not overridden by Bernoulli); the statistical `click_prob` is available as a prior in the prompt. Response tagged `source: \"llm\"`. 
Trade-off: adds non-determinism per persona; for strict reproducibility stay in template mode or pin `LLM_TEMPERATURE=0`.\n\nCost controlled via:\n- In-flight request coalescing (leader\u002Ffollower dedup pattern)\n- Persona card caching\n- Configurable `SOUL_POOL_N`\n\u003C\u002Fdetails>\n\n\u003Cdetails id=\"causal-transformer-world-model\">\n\u003Csummary>\u003Cb>Causal Transformer World Model\u003C\u002Fb> — primary (research-grade)\u003C\u002Fsummary>\n\nA 6-layer × 256-dim causal Transformer that ingests heterogeneous campaign features and predicts three quantile levels (P35\u002FP50\u002FP65) for each funnel KPI. Architecture lifts ideas from the recent causal-Transformer literature:\n\n- **Token-type factorization** (CaT, Melnychuk et al. ICML 2022) — inputs split into *Covariate* (platform, demographic, time), *Treatment* (creative embedding, budget, KOL), and *Outcome* (KPIs) tokens with distinct type embeddings\n- **DAG-aware attention** (CausalDAG-Transformer) — attention mask derived from the 64-node causal graph restricts each token to attend to topological ancestors; per-head learnable gate on the bias. Because the shipped graph is cyclic (see §[Causal Graph](#causal-graph)), ancestry is defined on the graph's **SCC condensation**: within a feedback SCC all nodes are mutually ancestral, across SCCs the standard DAG ancestor relation applies (Bongers 2021 §3.2). Reference implementation shipped in `CausalTransformerWorldModel.set_dag_from_edges()` and toggleable via `dag_attention_bias=True`. The OSS release defaults to the LightGBM baseline path; **pretrained CT checkpoints with DAG attention enabled ship with the Enterprise Edition** (see §[OranAI Enterprise Edition](#enterprise)).\n- **Per-arm counterfactual heads** (TARNet, Shalit et al. ICML 2017 \u002F Dragonnet, Shi et al. 
NeurIPS 2019) — one quantile head per discrete treatment arm enables `predict_factual` vs `predict_counterfactual(do(T=t'))` with a single forward pass\n- **Representation balancing** (BCAUSS + CaT) — HSIC (Gretton et al. 2005) or adversarial-IPTW loss decorrelates the learned representation from treatment assignment, reducing bias in counterfactual predictions\n- **In-context amortization** (CInA, Arik & Pfister NeurIPS 2023, optional) — model can condition on a context set of prior campaigns for amortized zero-shot causal inference\n\nCore component: `oransim.world_model.CausalTransformerWorldModel`. Training loop, counterfactual rollout, and save\u002Fload are shipped today; pretrained weights land with OrancBench v0.5.\n\n```python\nfrom oransim.world_model import get_world_model, CausalTransformerWMConfig\n\nwm = get_world_model(\"causal_transformer\", config=CausalTransformerWMConfig(\n    dag_attention_bias=True,\n    balancing_loss=\"hsic\",\n    use_counterfactual_head=True,\n))\npred = wm.predict(features)                         # factual\ncf = wm.counterfactual(features, arm_idx=2)         # do(T = arm 2)\n```\n\n*Requires* `pip install 'oransim[ml]'` (brings in PyTorch). Falls back gracefully to LightGBM if torch is unavailable.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Universal Embedding Bus (UEB)\u003C\u002Fb> — text-only today, multi-modal hooks for v0.5\u003C\u002Fsummary>\n\nEvery data source (creative copy, KOL bio, user comment, fan-profile tabular record, platform event stream) flows through a shared `Embedder` ABC that produces a fixed-dim vector. Downstream modules (world_model \u002F agent \u002F causal) never see modality-specific code — the registry is modality-generic.\n\n**Shipped today (v0.2)**:\n- `RealTextEmbedder` — OpenAI-compatible `text-embedding-3-small` via the same gateway as `soul_llm` (one key for everything). 
Falls back to a deterministic hash embedder if the API is unavailable.\n- `TabularEmbedder`, `CategoricalEmbedder`, `TimeSeriesEmbedder`, `GeoEmbedder`, `EventEmbedder` — non-learned baselines.\n\n**Stubs for v0.5** (raise `NotImplementedError` pointing to ROADMAP.md#v05 if called):\n- `ImageEmbedderStub` — planned backends: CLIP \u002F Qwen-VL \u002F SigLIP \u002F ImageBind\n- `VideoEmbedderStub` — planned backends: I-JEPA v2 \u002F TimeSformer \u002F VideoMAE v2 \u002F Qwen-VL video\n- `AudioEmbedderStub` — planned backends: Whisper-v3 encoder \u002F CLAP \u002F AudioMAE\n\nDropping a real implementation in is a ~50-line `Embedder` subclass with no downstream changes. See `backend\u002Foransim\u002Fruntime\u002Fembedding_bus.py`.\n\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>LightGBM Quantile World Model\u003C\u002Fb> — fast baseline\u003C\u002Fsummary>\n\nThree quantile regressors (P35, P50, P65) per KPI. Sub-millisecond inference, zero GPU requirement. Refs: Ke et al. 2017 (LightGBM), Koenker 2005 (Quantile Regression).\n\n**Shipped pkl** (`data\u002Fmodels\u002Fworld_model_demo.pkl`, `feature_version: demo_v2`, ~3 MB) consumes **23 features**: 7 tabular (`platform_id`, `niche_idx`, `budget`, `budget_bucket`, `kol_tier_idx`, `kol_fan_count`, `kol_engagement_rate`) + 16 PCA-reduced text-embedding dimensions. The embedding input is a deterministic caption per scenario (`\"春季 {niche} 新品种草 · {tier} KOL · {budget_bucket}\"`) passed through `RealTextEmbedder` — same embedder the rest of the stack uses (UEB, soul-agent persona matching, `kol_content_match`, `search_elasticity`). When `OPENAI_API_KEY` is set, it hits `text-embedding-3-small`; without a key, it falls back to the deterministic SHA-256 hash embedder so training \u002F inference is still reproducible offline. PCA components ship inside the pkl and are applied at inference time via `POST \u002Fapi\u002Fv2\u002Fworld_model\u002Fpredict?model=lightgbm_quantile`. 
R² on the 200 held-out from 2,000 synthetic scenarios: impressions 0.88 · clicks 0.79 · conversions 0.71 · revenue 0.75.\n\nThe Causal Transformer path consumes the full-dim creative embedding natively (without PCA) once weights land with OrancBench v0.5; the demo LightGBM pkl is the CPU-only fallback until then.\n\n```python\nwm = get_world_model(\"lightgbm_quantile\")\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Budget Model\u003C\u002Fb> — Hill saturation + frequency fatigue\u003C\u002Fsummary>\n\nInstead of naive linear budget scaling:\n\n$$\\text{effective\\_impr\\_ratio}(x) = \\frac{(1+K) \\cdot x}{K + x}$$\n\nMichaelis-Menten \u002F Hill saturation (Dubé & Manchanda 2005), combined with frequency fatigue (Naik & Raman 2003) on CTR\u002FCVR:\n\n$$\\text{ctr\\_decay}(r) = \\max(0.5, 1.0 - 0.08 \\cdot \\max(0, \\log_2 r))$$\n\nThis captures diminishing returns, an optimal budget point, and realistic campaign dynamics.\n\u003C\u002Fdetails>\n\n\u003Cdetails id=\"causal-neural-hawkes-process\">\n\u003Csummary>\u003Cb>Causal Neural Hawkes Process\u003C\u002Fb> — primary diffusion forecaster\u003C\u002Fsummary>\n\nTransformer-parameterized neural temporal point process for 14-day cascading engagement forecasting, with first-class support for counterfactual rollouts under `do()` interventions.\n\nArchitectural references:\n\n- **Mei & Eisner (NeurIPS 2017)** — *The Neural Hawkes Process* — continuous-time neural intensity function, foundation of the field\n- **Zuo et al. (ICML 2020)** — *Transformer Hawkes Process* — self-attention encoder replacing the original CT-LSTM; directly the backbone of this implementation\n- **Shchur et al. (ICLR 2020)** — *Intensity-Free Learning of TPPs* — closed-form inter-event-time head for fast sampling\n- **Chen et al. (ICLR 2021)** — *Neural Spatio-Temporal Point Processes* — Monte Carlo estimator for the log-likelihood compensator\n- **Geng et al. 
(NeurIPS 2022)** — *Counterfactual Temporal Point Processes* — the intervention semantics for marked point processes\n- **Noorbakhsh & Rodriguez (2022)** — *Counterfactual Temporal Point Processes* — formalizes `do()` queries on event streams\n\nExplicit treatment\u002Fcontrol event typing (`organic` vs `paid_boost`) and an intervention-aware intensity decoder enable queries like \"what if we had stopped boosting on day 3\" via a counterfactual rollout loop.\n\nCore component: `oransim.diffusion.CausalNeuralHawkesProcess`. Architecture, training loop (NLL with MC compensator), forecast sampler (Ogata thinning), and counterfactual rollout are shipped today; pretrained weights land with OrancBench v0.5.\n\n```python\nfrom oransim.diffusion import get_diffusion_model\n\nnh = get_diffusion_model(\"causal_neural_hawkes\")\n# Define the seed events once so both rollouts start from the same history\nseed_events = [(0, \"impression\"), (12, \"like\")]\nfactual = nh.forecast(seed_events=seed_events)\ncf = nh.counterfactual_forecast(\n    seed_events,\n    intervention={\"mute_at_min\": 4320}  # stop boosting 3 days in (4320 min)\n)\n```\n\n*Requires* `pip install 'oransim[ml]'`.\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Parametric Hawkes\u003C\u002Fb> — classical baseline\u003C\u002Fsummary>\n\nExponential-kernel multivariate Hawkes process (Hawkes 1971). Closed-form intensity and log-likelihood; Ogata (1981) thinning sampler. It serves as the zero-dependency fallback and as the baseline against which the Causal Neural Hawkes is evaluated on OrancBench.\n\n```python\nfrom oransim.diffusion import get_diffusion_model\nph = get_diffusion_model(\"parametric_hawkes\")\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Sandbox\u003C\u002Fb> — incremental recomputation for \"what if\"\u003C\u002Fsummary>\n\nScenario sessions persist state so users can iterate: \"change budget from 100k to 150k, how does ROI move?\" Incremental recomputation avoids redoing the full agent simulation when only budget changes. The agent pool is cached; counterfactual evaluation uses union-semantics CATE over reached vs. 
unreached populations.\n\u003C\u002Fdetails>\n\n---\n\n## 📈 Benchmarks\n\nPhase 1 benchmarks are based on the shipped synthetic corpus (**2,000 scenarios + 100 event streams + 50 OrancBench tasks** — reproducible from the files under [`data\u002Fsynthetic\u002F`](data\u002Fsynthetic\u002F) and [`data\u002Fbenchmarks\u002F`](data\u002Fbenchmarks\u002F)). See [`data\u002Fmodels\u002Fdata_card.md`](data\u002Fmodels\u002Fdata_card.md) for the data-generating process. The R² numbers below were computed on a 10% held-out split (200 of those 2,000 scenarios); larger-corpus numbers land with OrancBench v0.5.\n\n| Metric | R² (synthetic) | Baseline (linear) | Notes |\n|--------|---------------|-------------------|-------|\n| `second_wave_click`     | 0.30 | 0.18 | PRS quantile median |\n| `first_wave_conversion` | 0.33 | 0.21 | PRS quantile median |\n| `cascade_lift`          | 0.39 | 0.25 | Second-wave mediator |\n| `roi_point_estimate`    | 0.33 | 0.19 | Single-shot regression |\n| `retention_7d`          | 0.29 | 0.17 | Longitudinal |\n\n> ⚠️ **Honest reproducibility framing** — this is a **closed-loop evaluation**: the same synthetic data generator (`backend\u002Fscripts\u002Fgen_synthetic_data.py`) produces both training and held-out splits, and we evaluate our own model on our own generative process. This measures **\"does the model fit our generative assumptions\"**, not external validity. For real marketing-decision accuracy you need either (a) an independent real-panel benchmark (Enterprise Edition uses proprietary real-world data) or (b) a public benchmark with out-of-distribution campaigns — the OrancBench v0.5 plan (see ROADMAP.md) is our attempt at the latter.\n\nSee [`docs\u002Fen\u002Fbenchmarks\u002F`](docs\u002Fen\u002Fbenchmarks\u002F) for the full protocol.\n\n---\n\n## 🗺️ Roadmap — Highlights\n\nSee [ROADMAP.md](ROADMAP.md) for the full 3-horizon × 8-theme plan. 
Teasers:\n\n**v0.2 (Q3 2026) — shipping pretrained weights**\n- 📦 Trained Causal Transformer + Causal Neural Hawkes checkpoints on an expanded synthetic corpus (targeting ~100k scenarios for OrancBench v0.5)\n- TikTok + Douyin adapter MVPs\n- Docker Compose · MkDocs · CI\n\n**v0.5 (Q4 2026 – Q1 2027)**\n- 🎯 **Cross-platform transfer learning** — pretrain on XHS, fine-tune on TikTok\n- ✅ **Multi-LLM-format adapters** — native Anthropic Messages, Gemini, and Qwen DashScope shipped in v0.2; Bedrock Converse and native streaming remain roadmap items\n- 🎯 **10k soul agents on Ray cluster**\n- ✅ Instagram \u002F YouTube Shorts \u002F Douyin adapters MVP\n\n**v1.0+ (2027)**\n- 🎯 **Causal Foundation Model** — pretrain on 10M+ campaigns\n- 🎯 **Closed-loop AI media buying** — real-time optimization with safety constraints\n- 🎯 **Differential privacy + Federated learning** — for brand-proprietary training\n- 15+ platforms, multi-modal creative understanding, vertical sub-benchmarks\n\n---\n\n## 🏢 OranAI Enterprise Edition\n\nThe OSS code documented above is the **causal engine**. Both editions run on the same Apache-2.0 code — the differences below span **8 dimensions**: data, pretrained weights, algorithms, learning loop, governance, integrations, team product, runtime. 
Audit the engine in this repo, then license the production stack.\n\n> **💼 Business contact** — [`cto@orannai.com`](mailto:cto@orannai.com?subject=Oransim%20Business%20Inquiry) · pricing · data licensing · pilot · on-prem deployment · typical reply \u003C 24h · or browse the live panel at **[datacenter.oran.cn](https:\u002F\u002Fdatacenter.oran.cn\u002F)** first.\n\n### Capability matrix\n\n#### 📊 Data · real-world panel\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Data panel** | 21k demo 小红书 notes + 3k KOLs | **4.3M+ notes · 2.1M+ 达人 (KOL + KOC + long-tail) · 100k+ consumer panel**, daily refresh · live at [datacenter.oran.cn](https:\u002F\u002Fdatacenter.oran.cn\u002F) `[licensed platform APIs · ClickHouse]` |\n| **Vertical calibration** | Generic priors | **10+ verticals** each calibrated — beauty · 3C · auto · luxury · medical aesthetics · … `[per-vertical fan_profile pkl + CPM–conversion curve fits]` |\n| **Competitor panel** | — | Competitor KOL rosters + historical CPM\u002FCVR data from live campaigns `[public disclosures + third-party licensed feeds]` |\n\n#### 🧠 Models · pretrained weights\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **World-model checkpoints** | All 3 models ship with `pretrained_url: \"coming_soon\"` · falls back to LightGBM baseline | **Pretrained CausalTransformer + CausalNeuralHawkes** with DAG-attention enabled `[trained on 10M+ real impressions · DAG mask derived from the 64-node SCM]` |\n| **LLM soul agents** | Text LLM via your API key | Full multimodal — reads your actual creatives (image + video + audio) `[proprietary multimodal backbone · details under NDA]` |\n| **Client-specific fine-tuning** | Shared generic baseline | Fine-tuned on **your real campaign data** · monthly incremental updates |\n\n#### 🧮 Algorithms · solvers & posteriors\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Counterfactual posterior** | Sample-reuse + closed-form Bayesian shrink + pure-numpy MLP amortizer | 
**Normalizing-flow learned posterior** · proper Pearl Step-1 abduction on cyclic graphs `[sbi NPE \u002F SNPE]` |\n| **Cyclic SCM equilibrium** | Time-unrolled DAG (acyclic approximation) + linear-SCC Banach fixed-point (requires ρ \u003C 1) | **Non-linear equilibrium solver** with contraction guarantees on arbitrary cyclic SCMs `[Bongers 2021 §5 + damped Picard + spectral-radius monitoring]` |\n| **Synthetic population** | IPF marginal matching (1-way marginals → 8-dim joint · ignores pairwise) | **Bayesian-net \u002F diffusion joint synthesizer** · preserves pairwise + higher-order structure `[bnlearn · TabDDPM · both return HTTP 501 in OSS]` |\n| **KOL matching** | Heuristic cosine (creative embed × KOL interest vector) | **Learned cross-attention encoder** · creative tokens × KOL-persona tokens `[transformer cross-attention · trained on real CPM-conversion labels]` |\n| **Tag \u002F trend extraction** | jieba tokenizer on 21k synthetic notes (static) | **Real-panel index** · daily refresh from live platform feeds `[Kafka → ClickHouse]` |\n\n#### 🔁 Learning loop\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Incremental learning from actuals** | Static model · manual retrain | Post-campaign actuals auto-stream back into the training set |\n| **Cross-campaign brand memory** | Per-request brand memory only | 12-month continuous brand-equity tracking · avoids re-targeting the same cohort |\n\n#### 🧭 Governance\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Audit trail** | Local logs | Tamper-evident signed audit chain per prediction (input + model version + data snapshot, fully replayable) |\n| **Approval workflow** | — | Strategy → budget → go-live multi-stage approval |\n| **Rollback \u002F version control** | — | Model-version + data-version + campaign-version binding · one-click rollback |\n| **Compliance** | — | SOC 2 \u002F ISO 27001 path · GDPR · 中国《个人信息保护法》 |\n\n#### 🔗 Integrations\n\n| | Oransim OSS | OranAI Enterprise 
|\n|---|---|---|\n| **Martech connectors** | — | 巨量引擎 \u002F 磁力引擎 \u002F 小红书千帆 \u002F 腾讯广告 \u002F Google Ads \u002F Meta Ads · official APIs |\n| **CRM \u002F CDP bidirectional sync** | — | Salesforce · SAP CDP · Adobe AEP · customer-owned CDP |\n| **SSO \u002F RBAC** | — | SAML 2.0 · OIDC · role-based permissions |\n\n#### 👥 Team product\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Multi-tenant isolation** | Single-tenant, local | Strict tenant isolation · competitor data physically segregated |\n| **Collaboration** | CLI | Planner \u002F buyer \u002F approver multi-role workflow · Lark \u002F Slack webhooks |\n| **Saved scenario library** | No persistence | Scenario catalog + decision-chain traceability |\n\n#### ⚙️ Runtime\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Agent runtime** | Single-process Python · 100k agents (`SOUL_POOL_N ≤ 1000` LLM personas) | **Distributed Ray actor pool** · 1M+ agents · 10k+ LLM personas in parallel `[Ray 2.x + vLLM batched inference]` |\n| **Shared state** | Process-local singletons + multi-worker startup WARNING | **Redis-backed shared state** · sandbox \u002F brand-memory \u002F UEB consistent across workers `[Redis 7 + asyncio client]` |\n\n#### 🎧 Managed service\n\n| | Oransim OSS | OranAI Enterprise |\n|---|---|---|\n| **Deployment** | Local \u002F your cloud | Hosted · on-prem · hybrid · 99.9% SLA · sub-second · global acceleration |\n| **Onboarding** | Self-serve docs | White-glove — custom adapter dev · integration · training |\n| **Model updates** | Community cadence | Managed — zero-downtime refresh as platforms evolve |\n\n### Typical pilot (2 weeks, ¥0 commitment)\n\n1. **Day 1–3 · Scope call** — we pick 2–3 of your active campaigns as test scenarios\n2. **Day 4–10 · Simulation** — you give us creative + KOL shortlist + historical KPIs → we run counterfactual simulation → present ranked recommendations\n3. 
**Day 11–14 · In-market validation** — you execute one recommendation in market → we compare our pre-launch prediction vs actuals → calibration report\n\n**Exit criteria**: our pre-launch P35\u002FP65 bands contain the actual KPI **≥ 80% of the time**. If not, pilot ends, no charge. If yes, we talk pricing.\n\n### Contact\n\nAll inbound → [`cto@orannai.com`](mailto:cto@orannai.com) · typical reply \u003C 24h. Tag the subject so we route it right:\n\n- **[Business]** — pricing · demo · data licensing · API integration · on-prem deployment\n- **[Pilot]** — book the 2-week pilot described above\n- **[Investor]** \u002F **[Partner]** — investors \u002F strategic partners\n- **[Press]** — media inquiries\n\n---\n\n## 🤝 Contributing\n\nWe love contributions — platform adapters, world-model improvements, docs, benchmarks, translations, bug fixes.\n\n- **Start here**: [CONTRIBUTING.md](CONTRIBUTING.md)\n- **Sign off commits** per [DCO](CONTRIBUTING.md#developer-certificate-of-origin-dco): `git commit -s`\n- **Good first issues**: [see labels](https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Fissues?q=is%3Aissue+label%3A%22good+first+issue%22)\n- **Platform adapter requests**: [file here](https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim\u002Fissues\u002Fnew?template=adapter_request.yml)\n\nBy contributing, you agree your contribution is licensed under Apache-2.0. No CLA required.\n\n---\n\n## 📚 Citation\n\nIf you use Oransim in research, please cite:\n\n```bibtex\n@software{oransim2026,\n  author       = {{OranAI Ltd. 
and Oransim contributors}},\n  title        = {Oransim: Causal Simulation for Enterprise Growth Teams},\n  version      = {0.2.0-alpha},\n  date         = {2026-04-18},\n  url          = {https:\u002F\u002Fgithub.com\u002FOranAi-Ltd\u002Foransim},\n  organization = {OranAI Ltd.}\n}\n```\n\nSee [CITATION.cff](CITATION.cff) for `cffconvert`-compatible metadata.\n\n---\n\n## 📜 License\n\nApache License 2.0 — see [LICENSE](LICENSE) and [NOTICE](NOTICE).\n\n`Copyright (c) 2026 OranAI Ltd. (橙果视界（深圳）科技有限公司) and Oransim contributors.`\n\nThird-party dependencies retain their original licenses. We are not affiliated with Xiaohongshu, ByteDance, Meta, Google, or any other platform mentioned in this repository.\n\n---\n\n## 💫 Team\n\nBuilt by **[OranAI Ltd.](https:\u002F\u002Foran.cn)** (橙果视界（深圳）科技有限公司). See §[Who we are](#who-we-are) above for company context.\n\n### Core Maintainer\n\n**Fakong Yin (尹法空)** · CTO & Core Architect, OranAI Ltd. · [`cto@orannai.com`](mailto:cto@orannai.com) · [@OranAi-Ltd](https:\u002F\u002Fgithub.com\u002FOranAi-Ltd)\n\nSole author of this repository's causal engine — 64-node Pearl SCM, per-arm counterfactual world model, causal neural Hawkes diffusion layer, Universal Embedding Bus, 8-router FastAPI backend, 5 platform adapters (XHS · TikTok · Douyin · Instagram Reels · YouTube Shorts), the LightGBM quantile baseline pipeline, and the 9-tab production frontend. End-to-end range across marketing strategy · ad-tech product · causal ML \u002F RL \u002F agent-based simulation · backend + data infrastructure — rare for a single engineer.\n\nGit log speaks for itself: `git log --author=\"Fakong Yin\" --oneline | wc -l`.\n\n**Open roles** — we're hiring researchers (Causal ML · RL · agent-based simulation) and engineers (platform · data · infra). 
Reach out at [`cto@orannai.com`](mailto:cto@orannai.com).\n\nContributors are listed in [`CONTRIBUTORS.md`](CONTRIBUTORS.md) (auto-generated).\n\n---\n\n## ⭐ Star History\n\n\u003Ca href=\"https:\u002F\u002Fstar-history.com\u002F#OranAi-Ltd\u002Foransim&Date\">\n  \u003Cpicture>\n    \u003Csource media=\"(prefers-color-scheme: dark)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=OranAi-Ltd\u002Foransim&type=Date&theme=dark\" \u002F>\n    \u003Csource media=\"(prefers-color-scheme: light)\" srcset=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=OranAi-Ltd\u002Foransim&type=Date\" \u002F>\n    \u003Cimg alt=\"Star History Chart\" src=\"https:\u002F\u002Fapi.star-history.com\u002Fsvg?repos=OranAi-Ltd\u002Foransim&type=Date\" \u002F>\n  \u003C\u002Fpicture>\n\u003C\u002Fa>\n\n---\n\n\u003Cdiv align=\"center\">\nBuilt with ☕ in Shenzhen by \u003Ca href=\"https:\u002F\u002Foran.cn\">OranAI\u003C\u002Fa>. If Oransim helps your work, please ⭐ star the repo — it powers our open-source commitment.\n\u003C\u002Fdiv>\n","Oransim is a causal digital twin platform for marketing at scale that predicts the effect of any marketing decision before money is spent. It runs agent-based social simulation and a counterfactual reasoning engine in an environment of more than 1 million virtual consumers, using LLM-backed personas to read and evaluate real creative content. The platform provides transparent causal logic and is open source, so users can audit it before licensing data access. It is aimed at enterprise marketing departments, especially those that want to estimate campaign return on investment (ROI) accurately before execution.","2026-06-11 02:39:03","CREATED_QUERY"]