# EmbodiedWorld-200K

Reference code for **EmbodiedWorld-200K**, the large-scale open-world
embodied-planning dataset. This repository releases the **data
construction pipeline** and the **evaluation toolkit**, so that:

1. **Anyone can reproduce or extend the dataset.** The pipeline turns
   raw gameplay clips and 6-DoF camera-pose trajectories into the
   canonical `(o₀, ℓ, a₁:T)` triplet format, with all hyperparameters
   matching the paper.
2. **Anyone can score new methods on EmbodiedWorld-200K under the
   same protocol** that we used to report the numbers in our tables.

The dataset itself, the baseline checkpoints, and the trained EWA model
are hosted separately on the project page:
<https://xiaokunfeng.github.io/EmbodiedWorld-200K/>

## Installation

The code targets **Python ≥ 3.10** and is intentionally lean. The core
dependencies (NumPy, Pillow) are sufficient to run Steps 1 and 2 of the
pipeline and the entire evaluation toolkit on CPU. The Step 3 VLM
annotation modules additionally require a GPU stack.

```bash
# 1. Clone and enter the release directory
git clone <THIS_REPO_URL>
cd code_release

# 2. Core deps (Steps 1+2 + evaluation; CPU-only is fine)
pip install -r requirements.txt

# 3. (Optional) heavy stack for Step 3 VLM annotation
pip install torch transformers vllm decord qwen-vl-utils
```

`vllm` requires a recent CUDA-capable GPU. We tested with
`Qwen3.5-27B`, `vllm>=0.6`, and `transformers>=4.45`.

## Quick start

### A. Build the dataset (Steps 1+2, CPU)

Start with a directory of raw sample manifests. There is one JSON file per gameplay clip, and each file points to a video and its 6-DoF camera-pose JSON. See
[`data_pipeline/examples/example_input.json`](data_pipeline/examples/example_input.json)
for an example. Then run:

```bash
python -m data_pipeline.run_pipeline \
    --input_dir  /path/to/raw_samples/ \
    --output_dir /path/to/labeled_out/
```

Each output JSON inherits all input fields and adds a `segments` block.
This block contains every navigation-coherent segment together with its
variable-length W/A/S/D action streams. The defaults match the paper:
`trans_unit=0.05`, `rot_unit_deg=5.0`, `min_segment_len=60`,
`angle_threshold_deg=90`.

### B. Add VLM-based instruction annotation (Step 3, GPU + vLLM)

```bash
python -m data_pipeline.run_pipeline \
    --input_dir  /path/to/raw_samples/ \
    --output_dir /path/to/labeled_out/ \
    --run_step3_detailed \
    --run_step3_goal \
    --vlm_model_path Qwen/Qwen3.5-27B \
    --gpu_nums 4
```

Each annotation step can also be run on its own. The scripts
`detailed_movement.py`, `direction_consistency.py`, and
`goal_navigation.py` in `data_pipeline/instruction_annotation/` each
ship with their own CLI. This is useful for chunked or multi-machine
deployment.

### C. Evaluate predictions

Start with a flat-list evaluation JSON written by your inference loop. See
[`evaluation/examples/example_eval_input.json`](evaluation/examples/example_eval_input.json)
for an example. Then run:

```bash
python -m evaluation.eval --eval-json my_eval_dump.json
```

The console output reports the five paper metrics (**TM**, **DirAcc**,
**nDTW**, **SR**, **NE**) plus complementary diagnostics. If the eval JSON
carries the `move_type_bucket` metadata field, the output also includes a
per-bucket breakdown. Use `--csv per_sample.csv` to dump per-sample numbers.
Use `--report-json` to write an aggregate JSON summary next to the input.

## Package vs. script use

The `data_pipeline/` and `evaluation/` directories are also importable
as Python packages:

```python
from data_pipeline import segment_trajectory, discretize_segment_by_magnitude
from evaluation import eval_one, aggregate, load_json_str
```

This makes it easy to plug the algorithms into your own training loop.
You can also change individual hyperparameters without going through the CLI.

## Acknowledgements

The pipeline builds on the following community efforts: the
[OGameData](https://huggingface.co/datasets/...) gameplay-video
repository, the [VIPE](https://github.com/...) 6-DoF pose estimator,
and the [Qwen3.5](https://huggingface.co/Qwen) family of
vision-language models. Please cite these works when using their
components.
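The segmentation defaults listed under Quick start A (`angle_threshold_deg=90`, `min_segment_len=60`) can be illustrated with a minimal sketch. It rests on assumptions and is not the repository's `segment_trajectory`:

- The trajectory is treated as a sequence of ground-plane camera positions.
- A cut is made wherever the heading of consecutive steps turns by more than `angle_threshold_deg`.
- Only pieces of at least `min_segment_len` frames are kept.

`step_heading`, `angle_diff_deg`, and `split_by_heading` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence, Tuple

Position = Tuple[float, float]  # hypothetical (x, z) ground-plane camera position


def step_heading(p: Position, q: Position, eps: float = 1e-9) -> Optional[float]:
    """Heading of the step p -> q in degrees, or None if the camera barely moved."""
    dx, dz = q[0] - p[0], q[1] - p[1]
    if math.hypot(dx, dz) < eps:
        return None
    return math.degrees(math.atan2(dz, dx))


def angle_diff_deg(a: float, b: float) -> float:
    """Absolute difference between two headings, wrapped into [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def split_by_heading(
    positions: Sequence[Position],
    angle_threshold_deg: float = 90.0,
    min_segment_len: int = 60,
) -> List[Tuple[int, int]]:
    """Hypothetical rule: return half-open [start, end) frame ranges of
    direction-coherent pieces, cutting at sharp turns."""
    cuts = [0]
    prev: Optional[float] = None
    for i in range(len(positions) - 1):
        cur = step_heading(positions[i], positions[i + 1])
        if cur is None:
            continue  # stationary steps neither cut nor update the heading
        if prev is not None and angle_diff_deg(prev, cur) > angle_threshold_deg:
            cuts.append(i)  # the turning frame i starts a new segment
        prev = cur
    cuts.append(len(positions))
    # Drop pieces that are too short to form a segment.
    return [(s, e) for s, e in zip(cuts, cuts[1:]) if e - s >= min_segment_len]


# A straight run of 100 frames followed by a U-turn back along the same line.
out = [(0.1 * i, 0.0) for i in range(100)]
back = [(out[-1][0] - 0.1 * j, 0.0) for j in range(1, 101)]
print(split_by_heading(out + back))  # [(0, 99), (99, 200)]
```

Skipping stationary steps keeps a camera that pauses mid-segment from producing a spurious cut. This is a design choice of the sketch, not a documented behaviour of the pipeline.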
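The role of `trans_unit=0.05` and `rot_unit_deg=5.0` in producing variable-length W/A/S/D streams can be sketched as magnitude-to-count rounding. This is a hypothetical reading, not the repository's `discretize_segment_by_magnitude`. It assumes the following:

- Each motion step is already expressed in the camera frame as forward displacement, rightward displacement, and yaw in degrees.
- Forward/backward motion maps to `W`/`S` and right/left motion maps to `D`/`A`.
- Rotation uses `YAW_R`/`YAW_L`, which are placeholder token names.

```python
from typing import List


def to_unit_actions(
    forward: float,
    right: float,
    yaw_deg: float,
    trans_unit: float = 0.05,
    rot_unit_deg: float = 5.0,
) -> List[str]:
    """Hypothetical magnitude-based discretization of one motion step.

    Each signed displacement becomes round(|delta| / unit) repeated key
    presses; the sign selects the key. Positive yaw is assumed to mean
    turning right (a convention chosen here, not taken from the repo).
    """
    actions: List[str] = []
    # (signed delta, unit size, token if positive, token if negative)
    axes = [
        (forward, trans_unit, "W", "S"),
        (right, trans_unit, "D", "A"),
        (yaw_deg, rot_unit_deg, "YAW_R", "YAW_L"),  # placeholder rotation tokens
    ]
    for delta, unit, pos_key, neg_key in axes:
        count = round(abs(delta) / unit)
        actions.extend([pos_key if delta > 0 else neg_key] * count)
    return actions


# 0.2 forward, 0.1 to the left, 10 degrees to the right
print(to_unit_actions(0.2, -0.1, 10.0))
# ['W', 'W', 'W', 'W', 'A', 'A', 'YAW_R', 'YAW_R']
```

Per-step rounding discards any motion smaller than half a unit. When per-frame motion is tiny, accumulating the residual across steps before rounding is one alternative.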
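Quick start C names three trajectory metrics: nDTW, NE, and SR. The sketch below uses the formulation common in vision-and-language navigation:

- nDTW = exp(-DTW(R, Q) / (|R| * d_th)), where R is the reference path, Q is the predicted path, and d_th is a success distance.
- NE is the distance between the final positions of the two paths.
- SR counts an episode as a success when NE ≤ d_th.

Whether `evaluation.eval` uses exactly these definitions is an assumption, as are which coordinates it compares and what d_th it applies. TM and DirAcc are not sketched because their definitions are not given in this README.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # any-dimensional position; coordinates are an assumption


def dtw(reference: Sequence[Point], query: Sequence[Point]) -> float:
    """Classic O(|R| * |Q|) dynamic time warping with Euclidean point cost."""
    n, m = len(reference), len(query)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(reference[i - 1], query[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def ndtw(reference: Sequence[Point], query: Sequence[Point], success_dist: float) -> float:
    """Normalized DTW in (0, 1]; 1.0 means the paths coincide."""
    return math.exp(-dtw(reference, query) / (len(reference) * success_dist))


def navigation_error(reference: Sequence[Point], query: Sequence[Point]) -> float:
    """Euclidean distance between the final predicted and final reference positions."""
    return math.dist(reference[-1], query[-1])


def success(reference: Sequence[Point], query: Sequence[Point], success_dist: float) -> bool:
    """An episode succeeds if it stops within success_dist of the goal."""
    return navigation_error(reference, query) <= success_dist


reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.0), (2.0, 0.0)]
print(round(ndtw(reference, predicted, success_dist=1.0), 4))  # 0.7165
print(navigation_error(reference, predicted))  # 0.0
```

The example shows that nDTW and SR capture different failure modes. The predicted path reaches the goal exactly (NE = 0), yet nDTW is below 1 because it skips the intermediate waypoint.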