[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80159":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":13,"stars7d":13,"stars30d":16,"stars90d":15,"forks30d":15,"starsTrendScore":17,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":14,"lastSyncTime":27,"discoverSource":28},80159,"NoPo4D","bralani\u002FNoPo4D","bralani","Official implementation of No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos","https:\u002F\u002Fbralani.github.io\u002Fnopo4d_html\u002F",null,"Python",57,1,2,0,6,3,0.9,"MIT License",false,"main",true,[],"2026-06-12 02:03:58","# NoPo4D: No Pose, No Problem in 4D\n\n### Feed-Forward Dynamic 4D Gaussian Splatting from Unposed Multi-View Videos\n\n\u003Cdiv align=\"center\">\n\n[![Project Website](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FNoPo4D-Website-4CAF50?logo=googlechrome&logoColor=white)](https:\u002F\u002Fbralani.github.io\u002Fnopo4d_html\u002F)\n[![Paper](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-Paper-b31b1b?logo=arxiv&logoColor=b31b1b)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22190)\n[![Hugging Face Model](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Model-blue)](https:\u002F\u002Fhuggingface.co\u002Fbralani01\u002Fnopo4d)\n\n\u003C\u002Fdiv>\n\n![NoPo4D Teaser](assets\u002Fimages\u002Fteaser.png)\n\nThis work presents **NoPo4D**, the first feed-forward system that jointly addresses dynamic content, multi-view input, and unknown camera poses in a single pass. 
In pursuit of pose-free 4D reconstruction, NoPo4D yields two key insights:\n\n* 💎 A **decomposed velocity representation** splits Gaussian motion into per-pixel image-plane shifts and depth changes. This allows direct supervision from 2D optical flow, obviating the need for complex 3D motion ground truth or differentiable rendering.\n* ✨ A **bidirectional motion encoder** paired with **view-dependent opacity** effectively aggregates cross-view features and mitigates cross-timestep Gaussian misalignments.\n\n🏆 NoPo4D consistently outperforms prior feed-forward baselines across four multi-view dynamic benchmarks (ExoRecon, Immersive Light Field, Kubric, and N3DV). With an optional post-optimization stage, it surpasses per-scene optimization methods while running orders of magnitude faster.\n\n## News\n\n- [x] Release inference code\n- [x] Release pretrained checkpoint\n- [ ] Release training code\n\n## Installation\n\nRequires Python ≥ 3.10 and a CUDA-capable GPU.\n\n```bash\ngit clone --recurse-submodules https:\u002F\u002Fgithub.com\u002Fbralani\u002FNoPo4D.git\ncd NoPo4D\npip install \"torch>=2\" torchvision --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu{YOUR_CUDA_VERSION}  # we used cu121\npip install xformers\npip install -e .  
# NoPo4D\n```\n\nInstall the Depth Anything 3 backbone:\n\n```bash\ncd src\u002Fmodel\u002Fencoder\u002Fbackbone\u002FDepth-Anything-3\npip install -e .\n```\n\nOptionally, install `torch-scatter` to enable Gaussian voxelization (see [pytorch_scatter](https:\u002F\u002Fgithub.com\u002Frusty1s\u002Fpytorch_scatter)):\n\n```bash\npip install torch-scatter -f https:\u002F\u002Fdata.pyg.org\u002Fwhl\u002Ftorch-2.1.0+${CUDA}.html\n```\n\n## Quick Start\n\n### Command-line inference\n\nRun inference on a folder of images with the provided script:\n\n```bash\npython src\u002Finference.py \\\n    --image_dir assets\u002Fexamples \\\n    --num_cameras 4 \\\n    --output_dir output \\\n    --render_timestamps 10\n```\n\n`assets\u002Fexamples` contains 16 images across 4 cameras and 4 frames, named in camera-major order (`cam0_t0.png`, …, `cam3_t3.png`). The script renders from the same camera viewpoints predicted by the encoder, replaying each camera's scene at `--render_timestamps` evenly-spaced timestamps in [0, 1]. Rendered views are saved to `output\u002Fimages\u002F` and optical flow visualisations to `output\u002Foptical_flow\u002F`.\n\n### Python API\n\n```python\nimport torch\nfrom src.model.nopo4d import NoPo4D\n\n# Load pretrained model from Hugging Face\nmodel = NoPo4D.from_pretrained(\"bralani01\u002Fnopo4d\")\nmodel = model.to(\"cuda\").eval()\n\n# images:     (B, V, 3, H, W)  — camera-major order\n#             V = num_cameras * num_frames\n#             e.g. 
2 cameras x 3 frames -> [cam0_t0, cam0_t1, cam0_t2, cam1_t0, cam1_t1, cam1_t2]\n# timestamps: (B, V) in [0, 1] — same layout as images; pass None for static scenes\n\n# Run the Encoder\nencoder_output = model(\n    images=images,\n    timestamps=timestamps,\n    num_cameras=num_cameras,\n)\n# encoder_output.gaussians     — 4D Gaussian primitives\n# encoder_output.camera_pose   — predicted extrinsics \u002F intrinsics\n# encoder_output.depth         — per-view depth maps\n# encoder_output.optical_flow  — per-view forward \u002F backward flow\n\n# Render novel views\nrender_output = model.render(\n    gaussians=encoder_output.gaussians,\n    extrinsics=target_extrinsics,    # (B, V, 4, 4)  c2w matrices\n    intrinsics=target_intrinsics,    # (B, V, 3, 3)  normalised intrinsics\n    image_shape=(H, W),\n    timestamps=target_timestamps,    # (B, V) or None\n)\n# render_output.color: (B, V, 3, H, W)\n# render_output.depth: (B, V, H, W)\n```\n\n## Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@misc{balice2026poseproblem4dfeedforward,\n      title={No Pose, No Problem in 4D: Feed-Forward Dynamic Gaussians from Unposed Multi-View Videos},\n      author={Matteo Balice and Yanik Kunzi and Chenyangguang Zhang and Matteo Matteucci and Marc Pollefeys and Sungwhan Hong},\n      year={2026},\n      eprint={2605.22190},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.22190},\n}\n```\n\n## Acknowledgement\n\nWe thank the authors of these excellent works:\n- [Depth Anything 3](https:\u002F\u002Fgithub.com\u002FDepthAnything\u002FDepth-Anything-V3) — backbone ViT\n- [gsplat](https:\u002F\u002Fgithub.com\u002Fnerfstudio-project\u002Fgsplat) — CUDA Gaussian splatting backend\n- [AnySplat](https:\u002F\u002Fgithub.com\u002FInternRobotics\u002FAnySplat) — feed-forward Gaussian splatting framework\n","NoPo4D 
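is a feed-forward system that generates dynamic 4D Gaussians from unposed multi-view videos. Its core techniques include a decomposed velocity representation, which splits Gaussian motion into per-pixel image-plane shifts and depth changes so that it can be supervised directly with 2D optical flow, without complex 3D motion ground truth or differentiable rendering; and a bidirectional motion encoder combined with view-dependent opacity, which aggregates cross-view features and reduces cross-timestep Gaussian misalignment. It suits applications that reconstruct dynamic scenes from multi-view video sequences, such as virtual reality, augmented reality, and film visual effects. The open-source code is written in Python, and a pretrained model is provided so users can get started quickly.","2026-06-11 03:59:28","CREATED_QUERY"]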