[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71163":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":25,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":34,"discoverSource":35},71163,"Wonder3D","xxlong0\u002FWonder3D","xxlong0","Single Image to 3D using Cross-Domain Diffusion for 3D Generation","https:\u002F\u002Fwww.xxlong.site\u002FWonder3D\u002F",null,"Python",5382,438,51,154,0,2,3,14,6,70.83,"MIT License",false,"main",true,[27,28,29,30],"3d-aigc","3d-generation","3dgeneration","single-image-to-3d","2026-06-12 04:00:59","**中文版本 [中文](README_zh.md)**\n# Wonder3D\nSingle Image to 3D using Cross-Domain Diffusion (CVPR 2024 Highlight)\n## [Paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2310.15008) | [Project page](https:\u002F\u002Fwww.xxlong.site\u002FWonder3D\u002F) | [Hugging Face Demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fflamehaze1115\u002FWonder3D-demo) | [Colab from @camenduru](https:\u002F\u002Fgithub.com\u002Fcamenduru\u002FWonder3D-colab)\n\n![](assets\u002Ffig_teaser.png)\n\nWonder3D reconstructs highly-detailed textured meshes from a single-view image in only 2 ∼ 3 minutes. 
Wonder3D first generates consistent multi-view normal maps with corresponding color images via a cross-domain diffusion model, and then leverages a novel normal fusion method to achieve fast and high-quality reconstruction.\n\n## News\n- 2024.12.22 We have extended Wonder3D to a more advanced version, [Wonder3D++](https:\u002F\u002Fgithub.com\u002Fxxlong0\u002FWonder3D\u002Ftree\u002FWonder3D_Plus)!\n- 2024.08.29 **\u003Cspan style=\"color:red\">Fixed an issue in '\u002Fmvdiffusion\u002Fpipelines\u002Fpipeline_mvdiffusion_image' where cross-domain attention did not work correctly during classifier-free guidance (CFG) inference, causing misalignment between the RGB and normal generation results.\u003C\u002Fspan>** To address this issue, the RGB and normal domain inputs must be placed in the first and second halves of the batch, respectively, before they are fed into the model. This differs from the typical CFG layout, which puts the unconditional and conditional inputs in the first and second halves of the batch. The results before and after the bug fix are shown below:\n\u003Cdiv align=\"center\">\n \u003Cimg width=\"600\" src=\"assets\u002Fbug_fixed.png\">\n\u003C\u002Fdiv>\n\n- Fixed a severe training bug. The \"zero_init_camera_projection\" in 'configs\u002Ftrain\u002Fstage1-mix-6views-lvis.yaml' should be False. 
Otherwise, the domain control and pose control will have no effect during training.\n- 2024.03.19 Check out our new model [GeoWizard](https:\u002F\u002Fgithub.com\u002Ffuxiao0719\u002FGeoWizard), which jointly produces high-fidelity depth and normal maps from single images.\n- 2024.05.24 We release a large 3D native diffusion model [CraftsMan3D](https:\u002F\u002Fgithub.com\u002Fwyysf-98\u002FCraftsMan) that is trained directly on 3D representations and is therefore capable of producing complex structures.\n- 2024.05.29 We release a more powerful MV cross-domain diffusion model [Era3D](https:\u002F\u002Fgithub.com\u002FpengHTYX\u002FEra3D) that jointly produces 512x512 color images and normal maps. More importantly, Era3D can automatically estimate the focal length and elevation degree of the input image, thereby avoiding geometry distortions.\n\n## Usage\n```python\n\n# First clone the repo, and use the commands in the repo\n\nimport torch\nimport requests\nfrom PIL import Image\nimport numpy as np\nfrom torchvision.utils import make_grid, save_image\nfrom diffusers import DiffusionPipeline  # only tested on diffusers[torch]==0.19.3, may have conflicts with newer versions of diffusers\n\ndef load_wonder3d_pipeline():\n\n    pipeline = DiffusionPipeline.from_pretrained(\n        'flamehaze1115\u002Fwonder3d-v1.0',  # or use local checkpoint '.\u002Fckpts'\n        custom_pipeline='flamehaze1115\u002Fwonder3d-pipeline',\n        torch_dtype=torch.float16\n    )\n\n    # enable xformers\n    pipeline.unet.enable_xformers_memory_efficient_attention()\n\n    if torch.cuda.is_available():\n        pipeline.to('cuda:0')\n    return pipeline\n\npipeline = load_wonder3d_pipeline()\n\n# Download an example image.\ncond = Image.open(requests.get(\"https:\u002F\u002Fd.skis.ltd\u002Fnrp\u002Fsample-data\u002Flysol.png\", stream=True).raw)\n\n# The object should be located in the center and resized to 80% of image height.\n# Keep only the RGB channels (drops the alpha channel if present).\ncond = Image.fromarray(np.array(cond)[:, :, :3])\n\n# Run the 
pipeline!\nimages = pipeline(cond, num_inference_steps=20, output_type='pt', guidance_scale=1.0).images\n\n# 12 output images laid out 6 per row, giving 2 rows.\nresult = make_grid(images, nrow=6, padding=0, value_range=(0, 1))\n\nsave_image(result, 'result.png')\n```\n\n## Collaborations\nOur overarching mission is to enhance the speed, affordability, and quality of 3D AIGC, making the creation of 3D content accessible to all. While significant progress has been achieved in recent years, we acknowledge there is still a substantial journey ahead. We enthusiastically invite you to engage in discussions and explore potential collaborations in any capacity. \u003Cspan style=\"color:red\">**If you're interested in connecting or partnering with us, please don't hesitate to reach out via email (xxlong@connect.hku.hk)**\u003C\u002Fspan> .\n\n## News\n\n- 2024.02 We release the training code. You are welcome to train Wonder3D on your own data.\n- 2023.10 We release the inference model and code.\n\n\n### Preparation for inference\n\n#### Linux System Setup.\n```bash\nconda create -n wonder3d\nconda activate wonder3d\npip install -r requirements.txt\npip install git+https:\u002F\u002Fgithub.com\u002FNVlabs\u002Ftiny-cuda-nn\u002F#subdirectory=bindings\u002Ftorch\n```\n#### Windows System Setup.\n\nPlease switch to branch `main-windows` to see details of the Windows setup.\n\n#### Docker Setup\nSee [docker\u002FREADME.MD](docker\u002FREADME.md)\n\n### Training\nHere we provide two training scripts `train_mvdiffusion_image.py` and `train_mvdiffusion_joint.py`. 
\n\nThe training has two stages: 1) first train the multi-view attentions by randomly taking the normal or color flag; 2) add cross-domain attention modules into the SD model, and only optimize the newly added parameters.\n\nYou need to set `root_dir` in the config files `configs\u002Ftrain\u002Fstage1-mix-6views-lvis.yaml` and `configs\u002Ftrain\u002Fstage2-joint-6views-lvis.yaml` to the directory that contains your data.\n\n```\n# stage 1:\naccelerate launch --config_file 8gpu.yaml train_mvdiffusion_image.py --config configs\u002Ftrain\u002Fstage1-mix-6views-lvis.yaml\n\n# stage 2\naccelerate launch --config_file 8gpu.yaml train_mvdiffusion_joint.py --config configs\u002Ftrain\u002Fstage2-joint-6views-lvis.yaml\n```\n\n### Prepare the training data\nSee [render_codes\u002FREADME.md](render_codes\u002FREADME.md).\n\n### Inference\n1. Optional. If you have trouble connecting to Hugging Face, make sure you have downloaded the following models.\nDownload the [checkpoints](https:\u002F\u002Fconnecthkuhk-my.sharepoint.com\u002F:f:\u002Fg\u002Fpersonal\u002Fxxlong_connect_hku_hk\u002FEj7fMT1PwXtKvsELTvDuzuMBebQXEkmf2IwhSjBWtKAJiA) and put them into the root folder.\n\nIf you are in mainland China, you may download via [aliyun](https:\u002F\u002Fwww.alipan.com\u002Fs\u002FT4rLUNAVq6V).\n\n```bash\nWonder3D\n|-- ckpts\n    |-- unet\n    |-- scheduler\n    |-- vae\n    ...\n```\nThen modify the file .\u002Fconfigs\u002Fmvdiffusion-joint-ortho-6views.yaml, set `pretrained_model_name_or_path=\".\u002Fckpts\"`\n\n2. Download the [SAM](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fabhishek\u002FStableSAM\u002Fblob\u002Fmain\u002Fsam_vit_h_4b8939.pth) model. Put it in the ``sam_pt`` folder.\n```\nWonder3D\n|-- sam_pt\n    |-- sam_vit_h_4b8939.pth\n```\n3. Predict the foreground mask as the alpha channel. We use [Clipdrop](https:\u002F\u002Fclipdrop.co\u002Fremove-background) to segment the foreground object interactively. 
\nYou may also use `rembg` to remove the backgrounds.\n```python\n# !pip install rembg\nimport rembg\n# result is assumed to be a PIL image of your input\nresult = rembg.remove(result)\nresult.show()\n```\n4. Run Wonder3D to produce multiview-consistent normal maps and color images. Then you can check the results in the folder `.\u002Foutputs`. (We use `rembg` to remove the backgrounds of the results, but the segmentations are not always perfect. Consider using [Clipdrop](https:\u002F\u002Fclipdrop.co\u002Fremove-background) to get masks for the generated normal maps and color images, since the quality of the masks significantly influences the quality of the reconstructed mesh.) \n```bash\naccelerate launch --config_file 1gpu.yaml test_mvdiffusion_seq.py \\\n            --config configs\u002Fmvdiffusion-joint-ortho-6views.yaml validation_dataset.root_dir={your_data_path} \\\n            validation_dataset.filepaths=['your_img_file'] save_dir={your_save_path}\n```\n\nSee example:\n\n```bash\naccelerate launch --config_file 1gpu.yaml test_mvdiffusion_seq.py \\\n            --config configs\u002Fmvdiffusion-joint-ortho-6views.yaml validation_dataset.root_dir=.\u002Fexample_images \\\n            validation_dataset.filepaths=['owl.png'] save_dir=.\u002Foutputs\n```\n\n#### Interactive inference: run your local gradio demo. (Only generates normals and colors, without reconstruction)\n```bash\npython gradio_app_mv.py   # generate multi-view normals and colors\n```\n\n5. 
Mesh Extraction\n\n#### Instant-NSR Mesh Extraction\n\n```bash\ncd .\u002Finstant-nsr-pl\npython launch.py --config configs\u002Fneuralangelo-ortho-wmask.yaml --gpu 0 --train dataset.root_dir=..\u002F{your_save_path}\u002Fcropsize-{crop_size}-cfg{guidance_scale:.1f}\u002F dataset.scene={scene}\n```\n\nSee example:\n\n```bash\ncd .\u002Finstant-nsr-pl\npython launch.py --config configs\u002Fneuralangelo-ortho-wmask.yaml --gpu 0 --train dataset.root_dir=..\u002Foutputs\u002Fcropsize-192-cfg1.0\u002F dataset.scene=owl\n```\n\nOur generated normals and color images are defined in orthographic views, so the reconstructed mesh is also in orthographic camera space. If you use MeshLab to view the meshes, you can click `Toggle Orthographic Camera` in the `View` tab.\n\n#### Interactive inference: run your local gradio demo. (First generates normals and colors, and then does the reconstruction. No need to run gradio_app_mv.py first.)\n```bash\npython gradio_app_recon.py   \n```\n\n#### NeuS-based Mesh Extraction\n\nSince there are many complaints about the Windows setup of instant-nsr-pl, we also provide a NeuS-based reconstruction, which may avoid the dependency problems. \n\nNeuS consumes less GPU memory and favors smooth surfaces without parameter tuning. However, NeuS takes more time and its texture may be less sharp. If you are not sensitive to time, we recommend NeuS for optimization due to its robustness.\n\n```bash\ncd .\u002FNeuS\nbash run.sh output_folder_path scene_name \n```\n\n## Common questions\nQ: Tips to get better results.\n1. Wonder3D is sensitive to the facing direction of input images. In our experiments, front-facing images consistently lead to good reconstructions.\n2. Limited by resources, the current implementation only supports a limited number of views (6 views) and low resolution (256x256). Every image is first resized to 256x256 for generation, so images that still keep clear and sharp features after this downsampling will lead to good results.\n3. 
Images with occlusions will cause worse reconstructions, since 6 views cannot cover the complete object. Images with fewer occlusions lead to better results.\n4. Increase the optimization steps in instant-nsr-pl: change `trainer.max_steps: 3000` in `instant-nsr-pl\u002Fconfigs\u002Fneuralangelo-ortho-wmask.yaml` to a larger value such as `trainer.max_steps: 10000`. Longer optimization leads to better texture.\n\nQ: What are the elevation and azimuth degrees of the generated views?\n\nA: Unlike prior works such as Zero123, SyncDreamer and One2345, which adopt an object world system, our views are defined in the camera system of the input image. The six views lie in the plane with 0 elevation degree in the camera system of the input image. Therefore we don't need to estimate an elevation degree for the input image. The azimuth degrees of the six views are 0, 45, 90, 180, -90, -45 respectively.\n\nQ: What is the focal length of the generated views?\n\nA: We assume the input images are captured by an orthographic camera, so the generated views are also in orthographic space. This design enables our model to keep strong generalization on unreal images, but it may sometimes suffer from focal lens distortions on real-captured images.\n\n## Details about the camera system and camera poses\n![](assets\u002Fcoordinate.png)\nIn practice, the target object is assumed to be placed along the gravity direction.\n1) **Canonical coordinate system.** Some prior works (e.g. MVDream and SyncDreamer) adopt a shared canonical system for all objects, whose axis $Z_c$ shares the same direction with gravity (a). 
\n2) **Input-view related system.** Wonder3D adopts an independent coordinate system for each object, defined relative to the input view.\nIts $Z_v$ and $X_v$ axes are aligned with the UV dimensions of the 2D input image space, and its $Y_v$ axis is perpendicular to the 2D image plane and passes through the center of the ROI (Region of Interest) (b).\n3) **Camera poses.** Wonder3D outputs 6 views $\\{v_i, i=0,...,5\\}$ that are sampled on the $X_vOY_v$ plane of the input-view related system at a fixed radius, where the front view $v_0$ is initialized as the input view and the other views are sampled at pre-defined azimuth degrees (see (b)).\n\n## Acknowledgement\nWe have borrowed code extensively from the following repositories. Many thanks to the authors for sharing their code.\n- [stable diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion)\n- [zero123](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123)\n- [NeuS](https:\u002F\u002Fgithub.com\u002FTotoro97\u002FNeuS)\n- [SyncDreamer](https:\u002F\u002Fgithub.com\u002Fliuyuan-pal\u002FSyncDreamer)\n- [instant-nsr-pl](https:\u002F\u002Fgithub.com\u002Fbennyguo\u002Finstant-nsr-pl)\n\n## License\nWonder3D is released under the MIT license; it is free to use with an acknowledgement. If you have any questions about the usage of Wonder3D, please contact us.\n\n## Citation\nIf you find this repository useful in your project, please cite the following work. 
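:)\n```\n@article{long2023wonder3d,\n  title={Wonder3D: Single Image to 3D using Cross-Domain Diffusion},\n  author={Long, Xiaoxiao and Guo, Yuan-Chen and Lin, Cheng and Liu, Yuan and Dou, Zhiyang and Liu, Lingjie and Ma, Yuexin and Zhang, Song-Hai and Habermann, Marc and Theobalt, Christian and others},\n  journal={arXiv preprint arXiv:2310.15008},\n  year={2023}\n}\n```\n","Wonder3D is a project that generates high-quality 3D models from a single image. It uses a cross-domain diffusion model to generate multi-view normal maps and their corresponding color images, then applies a novel normal fusion method to achieve fast, high-fidelity 3D mesh reconstruction in 2 to 3 minutes. The project is written in Python, implements the technique described in the CVPR 2024 paper, and has demos available on Hugging Face and Google Colab. It suits applications that need to quickly create detailed 3D models from single-view images, such as game design, virtual reality content production, and digital art creation.","2026-06-11 03:36:22","high_star"]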