[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80744":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":10,"openIssues":12,"contributorsCount":12,"subscribersCount":12,"size":12,"stars1d":12,"stars7d":12,"stars30d":12,"stars90d":12,"forks30d":12,"starsTrendScore":12,"compositeScore":13,"rankGlobal":8,"rankLanguage":8,"license":8,"archived":14,"fork":14,"defaultBranch":15,"hasWiki":16,"hasPages":14,"topics":17,"createdAt":8,"pushedAt":8,"updatedAt":18,"readmeContent":19,"aiSummary":20,"trendingCount":12,"starSnapshotCount":12,"syncStatus":21,"lastSyncTime":22,"discoverSource":23},80744,"futuresim","OpenForecaster\u002Ffuturesim","OpenForecaster",null,"Python",41,5,0,36.33,false,"main",true,[],"2026-06-12 04:01:29","# Futuresim\n\nMulti-agent forecasting simulator where LLM agents predict on free-form questions and are scored against each other.\n\n## Quick Start\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FOpenForecaster\u002Ffuturesim.git\ncd futuresim\n\nuv sync\nsource .venv\u002Fbin\u002Factivate\n\ncp .env.example .env\n# Edit .env: set OPENROUTER_API_KEY and, if needed, storage paths.\n\npython scripts\u002Frun_forecast_sim.py --config configs\u002Fshared\u002Fdefault_sim.yaml\n```\n- OpenRouter configs require `OPENROUTER_API_KEY`.\n- The default search-enabled configs use LanceDB. Use\n  `configs\u002Fshared\u002Fdefault_nosearch_sim.yaml` to run without retrieval.\n\n## Environment\n\n`scripts\u002Frun_forecast_sim.py` loads `.env` from the repo root automatically.\nShell exports override `.env` values. Inside `.env`, `${FSIM_REPO_DIR}` expands\nto this checkout. The `FSIM_*` prefix is kept for compatibility with existing\nconfigs. 
`pathing.py` is the small helper that loads `.env`, expands config\nplaceholders, and errors if a required placeholder is still unresolved.\n\nCommon variables:\n\n| Variable | Required? | Use |\n|----------|-----------|-----|\n| `OPENROUTER_API_KEY` | yes for OpenRouter configs | Agent and answer-matcher API calls |\n| `FSIM_OUTPUT_BASE` | no | Simulation output root |\n| `FSIM_DATASET_PATH` | no | Hugging Face dataset id or local dataset path |\n| `FSIM_DATASET_CACHE` | no | Hugging Face dataset cache directory |\n| `FSIM_ARTIFACT_BASE` | no | Parent directory for downloaded public artifacts |\n| `FSIM_SEARCH_DB` | for bundled LanceDB search | LanceDB artifact path |\n| `FSIM_ARTICLES_BASE` | for MinimalHarness article browsing | Dated article JSONL tree |\n| `FSIM_EMBEDDING_MODEL` | for search | Embedding model used by the LanceDB index |\n| `FSIM_MATCHER_MODEL` | no | OpenRouter\u002FvLLM model used for answer matching |\n| `FSIM_SIM_MATCHER_CACHE_DIR` | no | Optional shared matcher-cache directory |\n\nFor local overrides, edit `.env`; for one-off runs, prefix the command:\n\n```bash\nFSIM_OUTPUT_BASE=\u002Fscratch\u002F$USER\u002Ffuturesim-runs \\\npython scripts\u002Frun_forecast_sim.py --config configs\u002Fshared\u002Fdefault_sim.yaml\n```\n\n## Data And Search\n\nOpenForesight questions load from Hugging Face by default:\n`nikhilchandak\u002FOpenForesight`. The default config uses the\n`aljazeera2026Q1` split.\n\nThe simulator itself does not require a search backend. 
The bundled\nsearch-enabled configs use LanceDB through `agents\u002Fsearch_tools`; download the\nprebuilt artifact for those runs:\n\n```bash\nsource .venv\u002Fbin\u002Factivate\nexport FSIM_SEARCH_DB=${FSIM_SEARCH_DB:-$(pwd)\u002Fartifacts\u002Fforecast-news-embeddings}\n\nhf download shash42\u002Fforecast-news-embeddings \\\n  --repo-type dataset \\\n  --local-dir \"$FSIM_SEARCH_DB\" \\\n  --max-workers 8\n\npython scripts\u002Fcheck_search_readiness.py --db-path \"$FSIM_SEARCH_DB\"\n```\n\nSet `FSIM_SEARCH_DB` in `.env` to keep this artifact outside the repo.\n\nThe browsable article corpus is a separate dated tree:\n\n```bash\nexport FSIM_ARTICLES_BASE=${FSIM_ARTICLES_BASE:-$(pwd)\u002Fartifacts\u002Fforecast-news}\n\nhf download shash42\u002Fforecast-news \\\n  --repo-type dataset \\\n  --local-dir \"$FSIM_ARTICLES_BASE\" \\\n  --include '2025\u002F12\u002F**' \\\n  --include '2026\u002F**' \\\n  --max-workers 8\n```\n\n`FSIM_SEARCH_DB` is read by the default runner to construct the bundled\nLanceDB search tool. `articles_base` (`FSIM_ARTICLES_BASE`) is used only by MinimalHarness\nruns, which expose the existing `articles\u002FYYYY\u002FMM\u002FDD\u002Farticles.jsonl` files inside\nthe agent workspace. The current Hugging Face corpus covers articles through 2026-03-31.\n\n### Custom Question Sets\n\nThe simulator requires a smaller schema than the full OpenForesight column set.\nUse `--dataset custom --dataset_path \u003Cfile-or-dir>` with CSV, JSONL, JSON, or\nParquet. 
A directory may contain `test.jsonl`, `test.parquet`, or\n`test-*.parquet` style split files.\n\nRequired columns:\n\n| Column | Meaning | Accepted aliases |\n|--------|---------|------------------|\n| `qid` | Stable question id | `question_id`, `id` |\n| `title` | Forecast question shown to agents | `question_title`, `question` |\n| `resolution_date` | Date when the question resolves | `close_time`, `resolve_time` |\n| `ground_truth_answer` | Resolved answer used for scoring | `ground_truth`, `answer`, `resolution`, `resolved_to` |\n\nOptional columns:\n\n| Column | Default | Use |\n|--------|---------|-----|\n| `background` | empty | Context shown to agents |\n| `resolution_criteria` | empty | Resolution rules shown to agents |\n| `answer_type` | `freeform` | Prompt hint such as `binary`, `mcq`, `numeric`, or `freeform` |\n| `options` | empty | JSON\u002Flist of allowed options for enumerated questions |\n| `source_split` | CLI `--split` | Split-specific metrics, especially `test_daily_metrics.csv` |\n| `prompt` | empty | Optional upstream prompt text retained for compatible scaffolds |\n\nOpenForesight-specific article columns such as `url`, `article_maintext`,\n`article_publish_date`, and `prompt_without_retrieval` are not required by the\nsimulator. For example, a ForecastBench-style source should first be converted\nby joining its questions and resolutions into the columns above, then run with:\n\n```bash\npython scripts\u002Frun_forecast_sim.py \\\n  --dataset custom \\\n  --dataset_path \u002Fpath\u002Fto\u002Fquestions.jsonl \\\n  --split test\n```\n\n### Custom News Corpora\n\nQuestion sets and news corpora are independent. The environment advances time,\nexposes questions, and scores forecasts; agents decide what retrieval tools to\nuse. In the public runner, leaving `search_db` empty disables retrieval. 
When\n`search_db`\u002F`FSIM_SEARCH_DB` is set, `scripts\u002Frun_forecast_sim.py` constructs\nthe bundled LanceDB tool and passes it into the agents.\n\nTo swap in another corpus while using the bundled LanceDB tool, build a table\nnamed `articles` with these fields:\n\n| Field | Meaning |\n|-------|---------|\n| `chunk_id` | Unique id for this retrieved chunk |\n| `article_id` | Source document id |\n| `chunk_index` | Chunk number within the document |\n| `title` | Article\u002Fdocument title |\n| `source` | Publisher or corpus source |\n| `date` | Timestamp used for no-future-leakage filtering |\n| `date_publish` | Optional publish timestamp, also leakage-filtered when present |\n| `content` | Text searched and returned to agents |\n| `url` | Optional source URL |\n| `vector` | Embedding vector, required for semantic\u002Fhybrid search |\n\nKeyword search needs only `content` plus a full-text search (FTS) index. Semantic and\nhybrid search also need vectors built with the same embedding model named by\n`FSIM_EMBEDDING_MODEL`.\n\nTo use a different retrieval backend, implement the `BaseSearchTool` contract in\n`agents\u002Fsearch_tools\u002Fbase.py` and wire it into your agent or runner. Agents\nconsume only these search-result fields: `article_id`, `title`, `source`, `date`,\noptional `date_publish`, `snippet`, `score`, and optional `url`.\n\nFor MinimalHarness article browsing, set `articles_base`\u002F`FSIM_ARTICLES_BASE` to\na dated JSONL tree: `YYYY\u002FMM\u002FDD\u002Farticles.jsonl`. 
Rows should provide `title`,\n`source`, `date`, and `content`; `date_publish`, `url`, `id`, and `date_modify`\nare optional but useful.\n\n## Directory Structure\n\n| Directory | Description |\n|-----------|-------------|\n| `agents\u002F` | Agent implementations (BasicAgent, AllQAgent) |\n| `environment\u002F` | Simulation environment, scoring, data loading |\n| `scripts\u002F` | CLI scripts for running simulations |\n| `configs\u002F` | YAML configuration files |\n\n## Key Commands\n\n### Run Simulation\n```bash\n# Default shared simulation\npython scripts\u002Frun_forecast_sim.py --config configs\u002Fshared\u002Fdefault_sim.yaml\n\n# Shared variant without search\npython scripts\u002Frun_forecast_sim.py --config configs\u002Fshared\u002Fdefault_nosearch_sim.yaml\n```\n\nShared answer-matching cache:\n- Without a shared cache, simulation runs fall back to a per-run `matcher_cache.json`.\n- If `FSIM_SIM_MATCHER_CACHE_DIR` is set, `split: \"test\"` runs automatically reuse `\u003Ccache_dir>\u002F\u003Cmatcher_slug>.json` and merge new entries back only when the run exits.\n- For non-`test` runs, opt in with top-level YAML:\n  `matcher_cache: {enabled: true, path: null}`\n- Set `matcher_cache.path` to pin a specific JSON file, or `matcher_cache.enabled: false` to force the per-run cache.\n- Point `FSIM_SIM_MATCHER_CACHE_DIR` at a writable shared directory if multiple runs should reuse matcher results.\n\n### Scaffold Names\n\nScaffold selection is explicit.\n\n- `basic`, `allQ`, and `allqd` are the base chat-tools scaffolds.\n- `qwenbasic` and `qwenallq` are thin Qwen-named compatibility wrappers over the shared chat-tools loop.\n- `minimalHarness` runs external CLI backends such as Codex, Claude Code, and OpenCode.\n- Qwen scaffolds intentionally do not replay hidden thinking from earlier turns; only final assistant content and tool calls are fed back into history.\n- The scaffold is never switched automatically based on the model name.\n\nSet the scaffold in the config under 
`defaults.scaffold`.\n\n### Resume \u002F Restart\n```bash\n# Resume from the last day\npython scripts\u002Frun_forecast_sim.py --resume \u002Fpath\u002Fto\u002Foutput_dir\n\n# Restart from a specific day (preserves predictions made before that day)\npython scripts\u002Frun_forecast_sim.py \\\n    --restart_from \u002Fpath\u002Fto\u002Foriginal\u002Frun \\\n    --restart_from_day 2025-04-05\n```\n\n## Documentation\n\n- **[agents\u002Fsearch_tools\u002FREADME.md](agents\u002Fsearch_tools\u002FREADME.md)** — Search tool contract used by agents\n- **[agents\u002FallQAgent\u002FREADME.md](agents\u002FallQAgent\u002FREADME.md)** — AllQ scaffold notes and token-budget fields\n- **[agents\u002FminimalHarnessAgent\u002FREADME.md](agents\u002FminimalHarnessAgent\u002FREADME.md)** — External CLI harness notes\n\n## Output\n\nSimulation results are saved to `FSIM_OUTPUT_BASE\u002F\u003Csim_name>\u002F\u003Ctimestamp>\u002F`:\n- `config.json` — Run configuration\n- `actions.jsonl` — All predictions and resolutions\n- `daily_metrics.csv` — One cumulative metrics row per wakeup session, including the daily submission count and the average TV shift relative to the previous submission\n- `test_daily_metrics.csv` — Same metrics, filtered to questions whose `source_split` is `test`\n- `agents\u002F\u003Cagent_id>\u002Fmodel_raw_warmup.jsonl` — Warmup raw logs written by the agent scaffold, grouped by question id and logging only per-turn input deltas\n- `agents\u002F\u003Cagent_id>\u002Fmodel_raw_daily.jsonl` — Post-warmup raw logs written by the agent scaffold, logging only per-turn input deltas\n- `agents\u002F\u003Cagent_id>\u002F` — Per-agent logs and memory\n\n## OpenForesight Notes\n\n- `timegap_days` changes the simulator from daily wakeups to one session every `N` days. 
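BasicAgent-style prompts mention the last and next wakeup dates during normal sessions, and metrics for active questions are evaluated through the end of that wakeup interval.\n- OpenForesight configs can prepend a window from the `train` split ahead of the main `split` with:\n  - `prepend_train_resolution_start`\n  - `prepend_train_resolution_end`\n  - `subsample_per_month`\n- Each OpenForesight question carries a `source_split` tag at load time so split-specific metrics can be logged without a separate loader path.\n","Futuresim is a multi-agent forecasting simulator in which large language model (LLM) agents make predictions on free-form questions and are scored against each other. The project is written in Python, uses LLMs (such as models called through the OpenRouter API) to generate forecasts, and supports LanceDB as a search backend to strengthen information retrieval. It suits research settings or practical applications that need to evaluate the performance of different forecasting models, especially on complex, open-ended questions. By configuring environment variables and dataset paths, users can easily customize the parameters of a simulation experiment and test the effectiveness and accuracy of forecasting models in different contexts.",2,"2026-06-11 04:01:51","CREATED_QUERY"]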