[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-77197":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":10,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":22,"hasPages":24,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":15,"lastSyncTime":29,"discoverSource":30},77197,"HumanNet","DAGroup-PKU\u002FHumanNet","DAGroup-PKU","HumanNet: Scaling Human-centric Video Learning to One Million Hours","",null,"Python",198,1,7,2,0,4,6,71,12,0.9,false,"main",true,[],"2026-06-12 02:03:42","\u003Cdiv align=\"center\">\n\n## HumanNet: Human-Centric Video Learning and Embodied AI Resources\n\n**DAGroup & SimpleSilicon Innovation Team**\n\nPeking University\n\n\u003C\u002Fdiv>\n\n## 🔥 News\n* `[2026.05.18]` 🔥 We release **StableVLA**. Congratulations on its acceptance to **ICML 2026**! It is a vision-language-action model for robust robot policy learning. See [Docs](.\u002Fdocs\u002Fstablevla.md) | [Code](.\u002Fsrc\u002Fmodel\u002FStableVLA\u002F) | [Project](https:\u002F\u002Fdagroup-pku.github.io\u002FStableVLA\u002F) | [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18287) | [Checkpoint](https:\u002F\u002Fhuggingface.co\u002FDAGroup-PKU\u002FStableVLA\u002Ftree\u002Fmain).\n* `[Next Month]` 🔥 We are preparing the open-source release of the HumanNet corpus, the curation pipeline, and the post-training validation code. 
Stay tuned!\n* `[2026.05.11]` 🔥 The **HumanNet** technical report and project page have been released: [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.06747) | [Project](https:\u002F\u002Fdagroup-pku.github.io\u002FHumanNet\u002F).\n\n## 📑 Todo List\n- [x] Release the **HumanNet** technical report on arXiv. ✅\n- [x] Release **StableVLA** model code and documentation. ✅\n- [ ] Release a HumanNet preview subset on Hugging Face for early access.\n- [ ] Release the full one-million-hour HumanNet corpus with metadata and annotations.\n- [ ] Release the trained checkpoints initialized from HumanNet.\n\n\n## 📣 Overview\n![teaser](.\u002Fassets\u002Fteaser.png)\nThis repository is maintained as a growing research hub for human-centric video data, embodied learning models, and validation code. It currently centers on **HumanNet**, a one-million-hour human-centric video corpus, and will also host related models, training recipes, evaluation protocols, and release notes.\n\nThe initial core release is **HumanNet**, a scalable infrastructure for fine-grained activity understanding, motion-aware video learning, and embodied pretraining. 
HumanNet pairs first-person and third-person footage with caption labels, motion annotations, and hand and body signals, organized by a multi-axis taxonomy and produced by a curation pipeline that treats human-centric filtering, viewpoint characterization, quality control, and privacy review as first-class design choices.\n\n\n## 🎥 Demo\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F52eaa410-0ec4-4f89-81e8-d2ecf9bb351c\n\n\n## 📚 Dataset Family\n\n| Dataset | Status | Documentation | Resources |\n|---|---|---|---|\n| **HumanNet** | Documentation available | [Docs](.\u002Fdocs\u002Fhumandata.md) | [src\u002Fdataset\u002Fhumandata](.\u002Fsrc\u002Fdataset\u002Fhumandata\u002F) |\n| **Rovid-X** | Placeholder available | Coming soon | [src\u002Fdataset\u002Frovid-x](.\u002Fsrc\u002Fdataset\u002Frovid-x\u002F) |\n\n## 🤖 Model Family\n\n| Model | Status | Documentation | Code |\n|---|---|---|---|\n| **StableVLA** | Code and docs available | [Docs](.\u002Fdocs\u002Fstablevla.md) | [src\u002Fmodel\u002FStableVLA](.\u002Fsrc\u002Fmodel\u002FStableVLA\u002F) |\n\n## 🗂️ Repository Map\n\n```text\nHumanNet\u002F\n├── README.md                 # Repository entry point\n├── docs\u002F                     # Component-level documentation and release notes\n│   ├── humandata.md          # HumanNet dataset documentation\n│   └── stablevla.md          # StableVLA documentation\n├── assets\u002F                   # Figures used by the repository README\n└── src\u002F\n    ├── dataset\u002F\n    │   ├── humandata\u002F        # HumanNet dataset resources\n    │   └── rovid-x\u002F          # ROViD-X dataset resources\n    └── model\u002F\n        └── StableVLA\u002F        # StableVLA source code, training scripts, and model README\n```\n\n\n\n\n## 🔧 Usage\n*Coming soon.*\n\n```bash\n# Download a HumanNet subset (placeholder)\n# If you are in mainland China, run this first: export HF_ENDPOINT=https:\u002F\u002Fhf-mirror.com\n# pip install -U 
\"huggingface_hub[cli]\"\nhuggingface-cli download DAGroup-PKU\u002FHumanNet\n```\n\n\n## 🙏 Acknowledgement\nWe gratefully acknowledge **SimpleSilicon Innovation** for providing funding and resource support, and **Astribot** for providing real-robot platforms and deployment experiment support.\n\n## 📧 Ethics Concerns\nThe videos referenced in this repository are sourced from public domains and intended solely to showcase the capabilities of this research. Human-centric video raises non-trivial privacy, consent, and dual-use concerns; any release will follow license review, redaction, restricted-content filtering, access controls where necessary, and clear documentation of what is included or excluded.\n\n* The service is a research preview. Please contact us if you find any potential violations.\n\n## ✏️ Citation\n\nIf you find our work useful in your research, please consider giving a star :star: and citation :pencil:.\n\n### BibTeX\n```bibtex\n@article{deng2026humannet,\n  title={HumanNet: Scaling Human-centric Video Learning to One Million Hours},\n  author={Deng, Yufan and Zhou, Daquan},\n  journal={arXiv preprint arXiv:2605.06747},\n  year={2026}\n}\n\n@misc{fu2026stablevlarobustvisionlanguageactionmodels,\n      title={StableVLA: Towards Robust Vision-Language-Action Models without Extra Data}, \n      author={Yiyang Fu and Chubin Zhang and Shukai Gong and Yufan Deng and Kaiwei Sun and Qiyang Min and Qibin Hou and Yansong Tang and Jianan Wang and Daquan Zhou},\n      year={2026},\n      eprint={2605.18287},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.18287}, \n}\n\n```\n","HumanNet项目专注于构建一个以人类为中心的视频学习库，规模达到一百万小时。该项目的核心功能包括细粒度活动理解、运动感知视频学习以及具身预训练，通过结合第一人称和第三人称视角的视频片段与标注、动作注释及手部和身体信号来实现。这些数据按照多轴分类法组织，并通过一个精心设计的管理流程处理，确保内容的质量与隐私保护。此外，HumanNet还发布了名为StableVLA的模型，这是一种用于强化机器人策略学习的视觉-语言-行动模型。此项目非常适合于需要大规模人类行为分析的应用场景，如机器人技术、虚拟现实等领域的研究与发展。","2026-06-11 03:55:10","CREATED_QUERY"]