[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80721":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},80721,"PRISM","positionprivacy\u002FPRISM","positionprivacy","Repo for PRISM (Programmatic Reasoning In Spatial Modalities)","",null,"Python",83,9,8,0,19,40,53.5,false,"main",true,[],"2026-06-12 04:01:29","\u003Ch1 align=\"center\">\n  PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning\n\u003C\u002Fh1>\n\n\n\n\n\u003Cp align=\"center\">\n  \u003Cimg alt=\"PRISM 10,372 samples\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPRISM-10%2C372%20samples-1F6FEB?style=flat-square\">\n  \u003Cimg alt=\"Bilingual EN and ZH\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLanguage-EN%20%2F%20ZH-5B4B8A?style=flat-square\">\n  \u003Cimg alt=\"Python 3.10+\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPython-3.10%2B-3776AB?style=flat-square&logo=python&logoColor=white\">\n  \u003Cimg alt=\"Manim CE 0.19.0\" src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FManim%20CE-0.19.0-00B16A?style=flat-square\">\n\u003C\u002Fp>\n\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"figs\u002FPRISM.pdf\">\n    \u003Cimg src=\"figs\u002FPRISM.png\" alt=\"PRISM overview figure\" width=\"960\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\n\n## ✨ Overview\n\nPRISM is a benchmark for studying programmatic spatial-temporal reasoning in Manim 
code generation. This repository bundles both the released dataset and an evaluation-first toolkit for rendering, deterministic spatial audit, PADVC \u002F TD scoring, and text-expansion analysis.\n\nThe released PRISM dataset is bundled under `data\u002F`. It contains 10,372 examples in total, with 5,199 English prompts and 5,173 Chinese prompts.\n\n\u003Cp align=\"center\">\n  \u003Ca href=\"figs\u002FPRISM.pdf\">\n    \u003Cimg src=\"figs\u002Fbenchmark_statistics.png\" alt=\"PRISM benchmark statistics\" width=\"960\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n> [!NOTE]\n> PRISM bundles both the released bilingual benchmark and the full evaluation toolkit. If you only want to score an existing Manim output directory, you can skip the optional generation components entirely.\n\n\n## 🧪 Evaluation\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\"figs\u002Feval.png\" alt=\"PRISM evaluation funnel\" width=\"300\">\n\u003C\u002Fp>\n\nPRISM evaluates outputs through a progressive funnel:\n\n- `Execution`: code-level reliability, including API misuse, syntax errors, and LaTeX failures.\n- `Spatial`: geometric reasoning quality, including overlap, leak, and out-of-bounds issues.\n- `PADVC & TD`: visual richness and temporal density once basic correctness is satisfied.\n\n> [!WARNING]\n> `PADVC_center` and `TD_center` are not plug-and-play defaults. 
To obtain meaningful comparisons, prepare the required local OCR \u002F embedding assets and fit reference parameters on your own curated reference set first.\n\n## 🧭 Workflow\n\n```text\nGenerated Manim scripts (.py)\n        |\n        v\nrender_directory.py  ->  videos\u002F*.mp4\n        |\n        v\naudit_batch.py       ->  audit\u002Fresults.json\n        |\n        +--> score_padvc.py  ->  padvc_final\u002Fpadvc_scores.jsonl\n        |\n        +--> score_td.py     ->  td_final\u002Ftd_center_scores.jsonl\n        |\n        +--> compute_text_expansion.py  ->  info_expand_final\u002F\n```\n\nThe main end-to-end entrypoint is:\n\n```bash\nscripts\u002Frun_evaluation_pipeline.sh \\\n  your_model_run\u002Fcleaned_scripts \\\n  results\u002Feval_your_model \\\n  your_model_run\u002Ftask_manifest.json \\\n  data\u002Fyour_prompts.jsonl \\\n  results\u002Freference_padvc\u002Fpadvc_norm_params.json \\\n  results\u002Freference_td\u002Ftd_center_params.json \\\n  your_model_name\n```\n\n## 📦 Repository Layout\n\n| Path | Purpose |\n| --- | --- |\n| `scripts\u002F` | Command-line tools for evaluation, metric fitting, and optional generation |\n| `manim_bench\u002Fllm_call\u002F` | Minimal LLM client wrapper used by the optional generation flow |\n| `docs\u002F` | Technical notes for data format, metrics, and spatial audit semantics |\n| `examples\u002F` | Toy inputs and example configs for smoke tests |\n| `data\u002F` | Released PRISM dataset plus local dataset workspace |\n| `results\u002F` | Default local workspace for generated outputs and summaries |\n\n## 🚀 Quickstart\n\n### 1. Install Python dependencies\n\n```bash\npython3 -m venv .venv\nsource .venv\u002Fbin\u002Factivate\npython -m pip install --upgrade pip\npython -m pip install -r requirements.txt\n```\n\n### 2. Check the runtime\n\n```bash\npython scripts\u002Fcheck_environment.py\n```\n\n### 3. 
Prepare OCR and embedding models for PADVC\n\nDefault OCR backend:\n\n- `rapidocr-onnxruntime`\n- optional fallback: `paddleocr`\n\nEmbedding models used by `scripts\u002Fpadvc.py`:\n\n- Chinese: `shibing624\u002Ftext2vec-base-chinese`\n- English \u002F multilingual: `sentence-transformers\u002Fparaphrase-multilingual-MiniLM-L12-v2`\n\nThe PADVC pipeline defaults to offline Hugging Face mode:\n\n```bash\nexport PADVC_HF_CACHE=\u002Fpath\u002Fto\u002Fhuggingface\u002Fhub\n# or point directly to local snapshots\nexport PADVC_ZH_MODEL=\u002Fpath\u002Fto\u002Ftext2vec-base-chinese\nexport PADVC_EN_MODEL=\u002Fpath\u002Fto\u002Fparaphrase-multilingual-MiniLM-L12-v2\n\nexport PADVC_OCR_BACKEND=rapidocr\nexport PADVC_OCR_CACHE_DIR=results\u002Focr_cache\nexport PADVC_DEBUG=0\n```\n\n### 4. Evaluate an existing model run\n\nExpected inputs:\n\n- `your_model_run\u002Fcleaned_scripts\u002F*.py`\n- `your_model_run\u002Ftask_manifest.json`\n- a prompt JSONL such as `data\u002Fyour_prompts.jsonl`\n- fitted reference parameters for PADVC and TD\n\nThen run:\n\n```bash\nscripts\u002Frun_evaluation_pipeline.sh \\\n  your_model_run\u002Fcleaned_scripts \\\n  results\u002Feval_your_model \\\n  your_model_run\u002Ftask_manifest.json \\\n  data\u002Fyour_prompts.jsonl \\\n  results\u002Freference_padvc\u002Fpadvc_norm_params.json \\\n  results\u002Freference_td\u002Ftd_center_params.json \\\n  your_model_name\n```\n\nIf your outputs were not produced by `scripts\u002Fgenerate_code.py`, prepare the minimal manifest and prompt JSONL formats described in [`docs\u002Fdata_format.md`](docs\u002Fdata_format.md).\n\n## 🧪 Fit Reference Parameters\n\n`PADVC_center` and `TD_center` require reference statistics fitted on your own curated reference set:\n\n```bash\npython scripts\u002Ffit_reference_padvc.py \\\n  --dataset-jsonl data\u002Fyour_reference_dataset.jsonl \\\n  --video-root results\u002Freference_videos \\\n  --output-dir results\u002Freference_padvc\n\npython 
scripts\u002Ffit_reference_td.py \\\n  --dataset-jsonl results\u002Freference_padvc\u002Fpadvc_reference_raw_scores.jsonl \\\n  --output-dir results\u002Freference_td\n```\n\nThe example files under `examples\u002Fparams\u002F` are placeholders for smoke tests only.\n\n## 🤖 Optional Generation\n\nIf you also want to generate model outputs inside this repository:\n\n```bash\ncp manim_bench\u002Fllm_call\u002Fconfig.example.json manim_bench\u002Fllm_call\u002Fconfig.json\n```\n\nYou can also choose a different config path:\n\n```bash\nexport MANIM_BENCH_LLM_CONFIG=\u002Fpath\u002Fto\u002Fconfig.json\n```\n\nThen run:\n\n```bash\npython scripts\u002Fgenerate_code.py \\\n  --input-jsonl examples\u002Fsample_prompts.jsonl \\\n  --instruction-field instruction \\\n  --model your-model-name \\\n  --workers 2 \\\n  --temperature 0.7 \\\n  --output-dir results\u002Fexample_generation\n```\n\n## 🛠️ System Requirements\n\nRecommended environment:\n\n- Linux or macOS\n- Python 3.10+\n- Manim Community Edition 0.19.0\n- FFmpeg\n- Cairo \u002F Pango \u002F `pkg-config`\n- LaTeX toolchain for `Tex` and `MathTex`\n- CJK-capable fonts if you render Chinese text\n\n\u003Cdetails>\n\u003Csummary>Platform package examples\u003C\u002Fsummary>\n\nUbuntu:\n\n```bash\nsudo apt-get update\nsudo apt-get install -y \\\n  ffmpeg pkg-config libcairo2-dev libpango1.0-dev \\\n  texlive texlive-latex-extra texlive-fonts-recommended \\\n  texlive-xetex dvisvgm ghostscript \\\n  fonts-noto-cjk fontconfig\n```\n\nmacOS:\n\n```bash\nbrew install ffmpeg cairo pango pkg-config mactex-no-gui font-noto-sans-cjk\n```\n\n\u003C\u002Fdetails>\n\n## 📚 Documentation\n\n- [`docs\u002Fdata_format.md`](docs\u002Fdata_format.md): expected JSON \u002F JSONL layouts\n- [`docs\u002Fspatial_audit.md`](docs\u002Fspatial_audit.md): spatial-audit semantics\n- [`docs\u002Fmetrics.md`](docs\u002Fmetrics.md): PADVC, TD, and text-expansion overview\n- [`docs\u002Fcode_error_taxonomy.md`](docs\u002Fcode_error_taxonomy.md): 
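code-failure categories\n- [`scripts\u002FREADME.md`](scripts\u002FREADME.md): script inventory and command examples\n","PRISM is a benchmark for studying programmatic spatial-temporal reasoning in Manim code generation. The project provides a dataset of 10,372 samples (5,199 English prompts and 5,173 Chinese prompts) together with a comprehensive evaluation toolkit that supports rendering, deterministic spatial audit, PADVC\u002FTD scoring, and text-expansion analysis. Its core capabilities include execution-level reliability checks, assessment of geometric-reasoning quality, and evaluation of visual richness and temporal density. It is built on Python 3.10+ and uses Manim CE 0.19.0 for animation rendering. It is suited to research settings that need multi-dimensional quality assessment of automatically generated or hand-written Manim scripts.",2,"2026-06-11 04:01:46","CREATED_QUERY"]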