[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80068":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":17,"rankGlobal":10,"rankLanguage":10,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":19,"hasPages":21,"topics":22,"createdAt":10,"pushedAt":10,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":15,"starSnapshotCount":15,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},80068,"HERMESV2","H-EmbodVis\u002FHERMESV2","H-EmbodVis","HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation","https:\u002F\u002Fh-embodvis.github.io\u002FHERMESV2\u002F",null,"Python",64,10,4,0,3,44.92,"Apache License 2.0",false,"main",true,[],"2026-06-13 04:01:23","\u003Cdiv align=\"center\">\n\n\u003Ch3>HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation\u003C\u002Fh3>\n\n\u003Ca href=\"https:\u002F\u002Flmd0311.github.io\u002F\">Xin Zhou\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fdk-liang.github.io\u002F\">Dingkang Liang\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=PVMQa-IAAAAJ&amp;hl=en\">Xiwu Chen\u003C\u002Fa>\u003Csup>2\u003C\u002Fsup>,\nFeiyang Tan\u003Csup>2\u003C\u002Fsup>,\nDingyuan Zhang\u003Csup>1\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=4uE10I0AAAAJ&amp;hl=en\">Hengshuang Zhao\u003C\u002Fa>\u003Csup>3\u003C\u002Fsup>,\n\u003Ca href=\"https:\u002F\u002Fscholar.google.com\u002Fcitations?user=UeltiQ4AAAAJ&amp;hl=en\">Xiang 
Bai\u003C\u002Fa>\u003Csup>1\u003C\u002Fsup>\n\n\u003Cp>\n\u003Csup>1\u003C\u002Fsup>Huazhong University of Science and Technology, \u003Csup>2\u003C\u002Fsup>Mach Drive, \u003Csup>3\u003C\u002Fsup>The University of Hong Kong\n\u003C\u002Fp>\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.28196\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHERMES++-arXiv-b31b1b?logo=arxiv\" alt=\"HERMES++ arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fh-embodvis.github.io\u002FHERMESV2\u002F\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHERMES++-Project_Page-2c7a3f?logo=githubpages\" alt=\"HERMES++ Project Page\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FH-EmbodVis\u002FHERMESV2\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHERMES++-Weights-orange?logo=huggingface\" alt=\"HERMES++ Weights\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2501.14729\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHERMES_(ICCV25)-arXiv-b31b1b?logo=arxiv\" alt=\"HERMES Conference arXiv\">\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FLMD0311\u002FHERMES\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHERMES-Conference_Code_(ICCV25)-181717?logo=github\" alt=\"HERMES Conference Code\">\u003C\u002Fa>\n  \u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode%20License-Apache_2.0-green.svg\" alt=\"License\">\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003C\u002Fdiv>\n\n## Abstract\n\nDriving world models serve as a pivotal technology for autonomous driving by simulating environmental dynamics. However, existing approaches predominantly focus on future scene generation, often overlooking comprehensive 3D scene understanding. 
Conversely, while Large Language Models (LLMs) demonstrate impressive reasoning capabilities, they lack the capacity to predict future geometric evolution, creating a significant disparity between semantic interpretation and physical simulation. To bridge this gap, we propose HERMES++, a unified driving world model that integrates 3D scene understanding and future geometry prediction within a single framework. Our approach addresses the distinct requirements of these tasks through synergistic designs. First, a BEV representation consolidates multi-view spatial information into a structure compatible with LLMs. Second, we introduce LLM-enhanced world queries to facilitate knowledge transfer from the understanding branch. Third, a Current-to-Future Link is designed to bridge the temporal gap, conditioning geometric evolution on semantic context. Finally, to enforce structural integrity, we employ a Joint Geometric Optimization strategy that integrates explicit geometric constraints with implicit latent regularization to align internal representations with geometry-aware priors. Extensive evaluations on multiple benchmarks validate the effectiveness of our method. 
HERMES++ achieves strong performance, outperforming specialist approaches in both future point cloud prediction and 3D scene understanding tasks.\n\n## TL;DR\n\n- **Unified driving world model:** jointly supports 3D scene understanding and future geometry prediction.\n- **BEV representation for LLMs:** compresses multi-view visual inputs into spatially consistent BEV tokens.\n- **LLM-enhanced world queries:** transfer semantic and world knowledge from language reasoning to future generation.\n- **Current-to-Future Link:** bridges current scene understanding and future geometric evolution.\n- **Textual Injection:** uses text embeddings as conditioning signals for future scene generation.\n- **Joint Geometric Optimization:** aligns latent features with geometry-aware priors through explicit and implicit constraints.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fintro.png\" width=\"85%\" alt=\"HERMES++ overview\">\n\u003C\u002Fdiv>\n\n## Updates\n\n- **2026.04.30:** Released the extended [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2604.28196) and code.\n- **2025.06.26:** The HERMES conference version was accepted to ICCV 2025.\n- **2025.01.24:** The HERMES paper and demo were released.\n\n## Method Overview\n\nHERMES++ unifies understanding and generation around a shared BEV representation:\n\n1. Multi-view images are encoded and projected into BEV space.\n2. BEV features are compressed into LLM-compatible visual tokens.\n3. The LLM performs scene understanding and enriches world queries with semantic knowledge.\n4. The Current-to-Future Link generates future latent representations conditioned on current BEV features, textual semantics, and future ego-motion.\n5. 
A future geometry decoder predicts future point clouds, optimized with Joint Geometric Optimization.\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fpipeline.png\" width=\"85%\" alt=\"HERMES++ pipeline\">\n\u003C\u002Fdiv>\n\n## Main Results\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fmain_results.png\" width=\"85%\" alt=\"HERMES++ main results\">\n\u003C\u002Fdiv>\n\n## Qualitative Results\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fqualitative_examples.png\" width=\"85%\" alt=\"HERMES++ qualitative examples\">\n\u003C\u002Fdiv>\n\n## Demo\n\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fgifs\u002Fhermespp_demo_1.gif\" width=\"85%\" alt=\"HERMES++ Demo 1\">\n  \u003Cbr>\n  \u003Cem>Demo 1\u003C\u002Fem>\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fgifs\u002Fhermespp_demo_2.gif\" width=\"85%\" alt=\"HERMES++ Demo 2\">\n  \u003Cbr>\n  \u003Cem>Demo 2\u003C\u002Fem>\n\u003C\u002Fdiv>\n\u003Cdiv align=\"center\">\n  \u003Cimg src=\"figures\u002Fgifs\u002Fhermespp_demo_3.gif\" width=\"85%\" alt=\"HERMES++ Demo 3\">\n  \u003Cbr>\n  \u003Cem>Demo 3\u003C\u002Fem>\n\u003C\u002Fdiv>\n\n\n## Getting Started\n\nWe provide separate setup, data, and usage documents:\n\n- [Environment Setup](docs\u002FEnvironment.md)\n- [Data and Weights Preparation](docs\u002FData.md)\n- [Usage Guide](docs\u002FUsage.md)\n\nAfter preparing the environment and data, train or evaluate with the configs in [`projects\u002Fconfigs\u002Fhermes`](projects\u002Fconfigs\u002Fhermes).\n\n## To Do\n\n- [x] Release demo.\n- [x] Release checkpoints.\n- [x] Release training code.\n- [x] Release processed datasets.\n\n## Acknowledgement\n\nThis project builds on HERMES, BEVFormer v2, InternVL, UniPAD, OmniDrive, DriveMonkey, and related open-source autonomous driving research. 
We thank the authors of these projects for their contributions to the community.\n\n## Citation\n\nIf this repository is useful for your research, please consider citing these papers.\n\n```bibtex\n@article{zhou2026hermespp,\n  title={HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation},\n  author={Zhou, Xin and Liang, Dingkang and Chen, Xiwu and Tan, Feiyang and Zhang, Dingyuan and Zhao, Hengshuang and Bai, Xiang},\n  journal={arXiv preprint arXiv:2604.28196},\n  year={2026}\n}\n@inproceedings{zhou2025hermes,\n  title={HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation},\n  author={Zhou, Xin and Liang, Dingkang and Tu, Sifan and Chen, Xiwu and Ding, Yikang and Zhang, Dingyuan and Tan, Feiyang and Zhao, Hengshuang and Bai, Xiang},\n  booktitle={Proceedings of the IEEE\u002FCVF International Conference on Computer Vision},\n  year={2025}\n}\n```\n","HERMES++ is a unified driving world model for 3D scene understanding and generation. Its core features include a BEV (bird's-eye view) representation that consolidates multi-view spatial information, LLM (large language model)-enhanced world queries that facilitate knowledge transfer, a Current-to-Future Link that bridges the temporal gap, and a Joint Geometric Optimization strategy that enforces structural integrity. Together, these designs allow HERMES++ to simulate and predict environmental dynamics within a single framework. The project is particularly suited to scene understanding and future state prediction in autonomous driving, and supports improving the perception and decision quality of autonomous driving systems.",2,"2026-06-11 03:59:06","CREATED_QUERY"]