[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71019":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":17,"stars30d":18,"stars90d":16,"forks30d":16,"starsTrendScore":19,"compositeScore":20,"rankGlobal":10,"rankLanguage":10,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":22,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":32,"readmeContent":33,"aiSummary":34,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":35,"discoverSource":36},71019,"stable-dreamfusion","ashawkey\u002Fstable-dreamfusion","ashawkey","Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.","",null,"Python",8839,772,124,191,0,2,13,6,70.96,"Apache License 2.0",false,"main",true,[26,27,28,29,30,31],"dreamfusion","gui","image-to-3d","nerf","stable-diffusion","text-to-3d","2026-06-12 04:00:58","# Stable-Dreamfusion\n\nA pytorch implementation of the text-to-3D model **Dreamfusion**, powered by the [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) text-to-2D model.\n\n**ADVERTISEMENT: Please check out [threestudio](https:\u002F\u002Fgithub.com\u002Fthreestudio-project\u002Fthreestudio) for recent improvements and better implementation in 3D content generation!**\n\n**NEWS (2023.6.12)**:\n\n* Support of [Perp-Neg](https:\u002F\u002Fperp-neg.github.io\u002F) to alleviate multi-head problem in Text-to-3D.\n* Support of Perp-Neg for both [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) and 
[DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF).\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F236712982-9f93bd32-83bf-423a-bb7c-f73df7ece2e3.mp4\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F25863658\u002F232403162-51b69000-a242-4b8c-9cd9-4242b09863fa.mp4\n\n### [Update Logs](assets\u002Fupdate_logs.md)\n\n### Colab notebooks:\n* Instant-NGP backbone (`-O`): [![Instant-NGP Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1MXT3yfOFvO0ooKEfiUUvTKwUkrrlCHpF?usp=sharing)\n\n* Vanilla NeRF backbone (`-O2`): [![Vanilla Backbone](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1mvfxG-S_n_gZafWoattku7rLJ2kPoImL?usp=sharing)\n\n# Important Notice\nThis project is a **work-in-progress**, and contains lots of differences from the paper. **The current generation quality cannot match the results from the original paper, and many prompts still fail badly!**\n\n## Notable differences from the paper\n* Since the Imagen model is not publicly available, we use [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) to replace it (implementation from [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers)). Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. 
Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time cost in training.\n* We use the [multi-resolution grid encoder](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Finstant-ngp\u002F) to implement the NeRF backbone (implementation from [torch-ngp](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Ftorch-ngp)), which enables much faster rendering (~10FPS at 800x800).\n* We use the [Adan](https:\u002F\u002Fgithub.com\u002Fsail-sg\u002FAdan) optimizer as default.\n\n# Install\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion.git\ncd stable-dreamfusion\n```\n\n### Optional: create a python virtual environment\n\nTo avoid python package conflicts, we recommend using a virtual environment, e.g.: using conda or venv:\n\n```bash\npython -m venv venv_stable-dreamfusion\nsource venv_stable-dreamfusion\u002Fbin\u002Factivate # you need to repeat this step for every new terminal\n```\n\n### Install with pip\n\n```bash\npip install -r requirements.txt\n```\n\n### Download pre-trained models\n\nTo use image-conditioned 3D generation, you need to download some pretrained checkpoints manually:\n* [Zero-1-to-3](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123) for diffusion backend.\n    We use `zero123-xl.ckpt` by default, and it is hard-coded in `guidance\u002Fzero123_utils.py`.\n    ```bash\n    cd pretrained\u002Fzero123\n    wget https:\u002F\u002Fzero123.cs.columbia.edu\u002Fassets\u002Fzero123-xl.ckpt\n    ```\n* [Omnidata](https:\u002F\u002Fgithub.com\u002FEPFL-VILAB\u002Fomnidata\u002Ftree\u002Fmain\u002Fomnidata_tools\u002Ftorch) for depth and normal prediction.\n    These ckpts are hardcoded in `preprocess_image.py`.\n    ```bash\n    mkdir pretrained\u002Fomnidata\n    cd pretrained\u002Fomnidata\n    # assume gdown is installed\n    gdown '1Jrh-bRnJEjyMCS7f-WsaFlccfPjJPPHI&confirm=t' # omnidata_dpt_depth_v2.ckpt\n    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' # 
omnidata_dpt_normal_v2.ckpt\n    ```\n\nTo use [DeepFloyd-IF](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF), you need to accept the usage conditions on [Hugging Face](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0) and log in with `huggingface-cli login` on the command line.\n\nFor DMTet, we port the pre-generated `32\u002F64\u002F128` resolution tetrahedron grids under `tets`.\nThe 256 resolution one can be found [here](https:\u002F\u002Fdrive.google.com\u002Ffile\u002Fd\u002F1lgvEKNdsbW5RS4gVxJbgBS4Ac92moGSa\u002Fview?usp=sharing).\n\n### Build extension (optional)\nBy default, we use [`load`](https:\u002F\u002Fpytorch.org\u002Fdocs\u002Fstable\u002Fcpp_extension.html#torch.utils.cpp_extension.load) to build the extension at runtime.\nWe also provide a `setup.py` to build each extension:\n```bash\ncd stable-dreamfusion\n\n# install all extension modules\nbash scripts\u002Finstall_ext.sh\n\n# if you want to install manually, here is an example:\npip install .\u002Fraymarching # install to python path (you still need the raymarching\u002F folder, since this only installs the built extension.)\n```\n\n### Taichi backend (optional)\nUse the [Taichi](https:\u002F\u002Fgithub.com\u002Ftaichi-dev\u002Ftaichi) backend for Instant-NGP. It achieves performance comparable to the CUDA implementation, and **no CUDA** build is required. Install Taichi with pip:\n```bash\npip install -i https:\u002F\u002Fpypi.taichi.graphics\u002Fsimple\u002F taichi-nightly\n```\n\n### Troubleshooting\n* We assume the latest version of all dependencies. If you run into problems with a specific dependency, please try upgrading it first (e.g., `pip install -U diffusers`). 
If the problem persists, [reporting a bug](https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002Fnew?assignees=&labels=bug&template=bug_report.yaml&title=%3Ctitle%3E) will be appreciated!\n* `[F glutil.cpp:338] eglInitialize() failed Aborted (core dumped)`: this usually indicates a problem with the OpenGL installation. Try reinstalling the NVIDIA driver, or use nvidia-docker as suggested in https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion\u002Fissues\u002F131 if you are using a headless server.\n* `TypeError: xxx_forward(): incompatible function arguments`: this happens when we update the CUDA source and you used `setup.py` to install the extensions earlier. Try reinstalling the corresponding extension (e.g., `pip install .\u002Fgridencoder`).\n\n### Tested environments\n* Ubuntu 22 with torch 1.12 & CUDA 11.6 on a V100.\n\n# Usage\n\nThe first run will take some time to compile the CUDA extensions.\n\n```bash\n#### stable-dreamfusion setting\n\n### Instant-NGP NeRF Backbone\n# + faster rendering speed\n# + less GPU memory (~16G)\n# - need to build CUDA extensions (a CUDA-free Taichi backend is available)\n\n## train with text prompt (with the default settings)\n# `-O` equals `--cuda_ray --fp16`\n# `--cuda_ray` enables instant-ngp-like occupancy grid based acceleration.\npython main.py --text \"a hamburger\" --workspace trial -O\n\n# reduce stable-diffusion memory usage with `--vram_O`\n# enable various vram savings (https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16).\npython main.py --text \"a hamburger\" --workspace trial -O --vram_O\n\n# You can collect arguments in a file. You can override arguments by specifying them after `--file`. 
Note that quoted strings can't be loaded from .args files...\npython main.py --file scripts\u002Fres64.args --workspace trial_awesome_hamburger --text \"a photo of an awesome hamburger\"\n\n# use CUDA-free Taichi backend with `--backbone grid_taichi`\npython3 main.py --text \"a hamburger\" --workspace trial -O --backbone grid_taichi\n\n# choose stable-diffusion version (support 1.5, 2.0 and 2.1, default is 2.1 now)\npython main.py --text \"a hamburger\" --workspace trial -O --sd_version 1.5\n\n# use a custom stable-diffusion checkpoint from hugging face:\npython main.py --text \"a hamburger\" --workspace trial -O --hf_key andite\u002Fanything-v4.0\n\n# use DeepFloyd-IF for guidance (experimental):\npython main.py --text \"a hamburger\" --workspace trial -O --IF\npython main.py --text \"a hamburger\" --workspace trial -O --IF --vram_O # requires ~24G GPU memory\n\n# we also support negative text prompt now:\npython main.py --text \"a rose\" --negative \"red\" --workspace trial -O\n\n## after the training is finished:\n# test (exporting 360 degree video)\npython main.py --workspace trial -O --test\n# also save a mesh (with obj, mtl, and png texture)\npython main.py --workspace trial -O --test --save_mesh\n# test with a GUI (free view control!)\npython main.py --workspace trial -O --test --gui\n\n### Vanilla NeRF backbone\n# + pure pytorch, no need to build extensions!\n# - slow rendering speed\n# - more GPU memory\n\n## train\n# `-O2` equals `--backbone vanilla`\npython main.py --text \"a hotdog\" --workspace trial2 -O2\n\n# if CUDA OOM, try to reduce NeRF sampling steps (--num_steps and --upsample_steps)\npython main.py --text \"a hotdog\" --workspace trial2 -O2 --num_steps 64 --upsample_steps 0\n\n## test\npython main.py --workspace trial2 -O2 --test\npython main.py --workspace trial2 -O2 --test --save_mesh\npython main.py --workspace trial2 -O2 --test --gui # not recommended, FPS will be low.\n\n### DMTet finetuning\n\n## use --dmtet and --init_with \u003Cnerf 
checkpoint> to finetune the mesh at higher resolution\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --init_with trial\u002Fcheckpoints\u002Fdf.pth\n\n## init dmtet with a mesh to generate texture\n# requires cubvh: pip install git+https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fcubvh\n# remove --lock_geo to also finetune geometry, but results may be worse.\npython main.py -O --text \"a white bunny with red eyes\" --workspace trial_dmtet_mesh --dmtet --iters 5000 --init_with .\u002Fdata\u002Fbunny.obj --lock_geo\n\n## test & export the mesh\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --save_mesh\n\n## gui to visualize dmtet\npython main.py -O --text \"a hamburger\" --workspace trial_dmtet --dmtet --iters 5000 --test --gui\n\n### Image-conditioned 3D Generation\n\n## preprocess input image\n# note: the results of image-to-3D depend on zero-1-to-3's capability. For best performance, the input image should contain a single front-facing object and have a square aspect ratio with \u003C1024 pixel resolution. 
Check the examples under .\u002Fdata.\n# this exports `\u003Cimage>_rgba.png`, `\u003Cimage>_depth.png`, and `\u003Cimage>_normal.png` to the directory containing the input image.\npython preprocess_image.py \u003Cimage>.png\npython preprocess_image.py \u003Cimage>.png --border_ratio 0.4 # increase border_ratio if the center object appears too large and the results are unsatisfactory.\n\n## zero123 train\n# pass in the processed \u003Cimage>_rgba.png with --image and do NOT pass in --text to enable the zero-1-to-3 backend.\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000\n\n# if the image is not exactly front-view (elevation = 0), adjust default_polar (we use polar from 0 to 180 to represent elevation from 90 to -90)\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 --default_polar 80\n\n# by default we leverage monocular depth estimation to aid image-to-3d, but if you find the depth estimation is inaccurate and harms the results, turn it off with:\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image --iters 5000 --lambda_depth 0\n\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --init_with trial_image\u002Fcheckpoints\u002Fdf.pth\n\n## zero123 with multiple images\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000\n\n## render \u003Cnum> images per batch (default 1)\npython main.py -O --image_config config\u002F\u003Cconfig>.csv --workspace trial_image --iters 5000 --batch_size 4\n\n# providing both --text and --image enables the stable-diffusion backend (similar to make-it-3d)\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text --iters 5000\n\npython main.py -O --image hamburger_rgba.png --text \"a DSLR photo of a delicious hamburger\" --workspace trial_image_text_dmtet --dmtet --init_with 
trial_image_text\u002Fcheckpoints\u002Fdf.pth\n\n## test \u002F visualize\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --save_mesh\npython main.py -O --image \u003Cimage>_rgba.png --workspace trial_image_dmtet --dmtet --test --gui\n\n### Debugging\n\n# Guidance images can be saved for debugging purposes. These get saved in trial_hamburger\u002Fguidance.\n# Warning: this slows down training considerably and consumes lots of disk space!\npython main.py --text \"a hamburger\" --workspace trial_hamburger -O --vram_O --save_guidance --save_guidance_interval 5 # save every 5 steps\n```\n\nFor example commands, check [`scripts`](.\u002Fscripts).\n\nFor advanced tips and other development notes, check [Advanced Tips](.\u002Fassets\u002Fadvanced.md).\n\n# Evaluation\n\nReproduce the paper's CLIP R-precision evaluation.\n\nAfter running the test step described in Usage, a validation set containing projections from different angles is generated. Compute the R-precision (R=1) between the prompt and these images:\n\n```bash\npython r_precision.py --text \"a snake is flying in the sky\" --workspace snake_HQ --latest ep0100 --mode depth --clip clip-ViT-B-16\n```\n\n# Acknowledgement\n\nThis work is based on a growing list of amazing research works and open-source projects. Thanks a lot to all the authors for sharing!\n\n* [DreamFusion: Text-to-3D using 2D Diffusion](https:\u002F\u002Fdreamfusion3d.github.io\u002F)\n    ```\n    @article{poole2022dreamfusion,\n        author = {Poole, Ben and Jain, Ajay and Barron, Jonathan T. 
and Mildenhall, Ben},\n        title = {DreamFusion: Text-to-3D using 2D Diffusion},\n        journal = {arXiv},\n        year = {2022},\n    }\n    ```\n\n* [Magic3D: High-Resolution Text-to-3D Content Creation](https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fmagic3d\u002F)\n   ```\n   @inproceedings{lin2023magic3d,\n      title={Magic3D: High-Resolution Text-to-3D Content Creation},\n      author={Lin, Chen-Hsuan and Gao, Jun and Tang, Luming and Takikawa, Towaki and Zeng, Xiaohui and Huang, Xun and Kreis, Karsten and Fidler, Sanja and Liu, Ming-Yu and Lin, Tsung-Yi},\n      booktitle={IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},\n      year={2023}\n    }\n   ```\n\n* [Zero-1-to-3: Zero-shot One Image to 3D Object](https:\u002F\u002Fgithub.com\u002Fcvlab-columbia\u002Fzero123)\n    ```\n    @misc{liu2023zero1to3,\n        title={Zero-1-to-3: Zero-shot One Image to 3D Object},\n        author={Ruoshi Liu and Rundi Wu and Basile Van Hoorick and Pavel Tokmakov and Sergey Zakharov and Carl Vondrick},\n        year={2023},\n        eprint={2303.11328},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n    ```\n    \n* [Perp-Neg: Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond](https:\u002F\u002Fperp-neg.github.io\u002F)\n    ```\n    @article{armandpour2023re,\n      title={Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond},\n      author={Armandpour, Mohammadreza and Zheng, Huangjie and Sadeghian, Ali and Sadeghian, Amir and Zhou, Mingyuan},\n      journal={arXiv preprint arXiv:2304.04968},\n      year={2023}\n    }\n    ```\n    \n* [RealFusion: 360° Reconstruction of Any Object from a Single Image](https:\u002F\u002Fgithub.com\u002Flukemelas\u002Frealfusion)\n    ```\n    @inproceedings{melaskyriazi2023realfusion,\n        author = {Melas-Kyriazi, Luke and Rupprecht, Christian and Laina, 
Iro and Vedaldi, Andrea},\n        title = {RealFusion: 360 Reconstruction of Any Object from a Single Image},\n        booktitle={CVPR},\n        year = {2023},\n        url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2302.10663},\n    }\n    ```\n\n* [Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation](https:\u002F\u002Ffantasia3d.github.io\u002F)\n    ```\n    @article{chen2023fantasia3d,\n        title={Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation},\n        author={Rui Chen and Yongwei Chen and Ningxin Jiao and Kui Jia},\n        journal={arXiv preprint arXiv:2303.13873},\n        year={2023}\n    }\n    ```\n\n* [Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior](https:\u002F\u002Fmake-it-3d.github.io\u002F)\n    ```\n    @article{tang2023make,\n        title={Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior},\n        author={Tang, Junshu and Wang, Tengfei and Zhang, Bo and Zhang, Ting and Yi, Ran and Ma, Lizhuang and Chen, Dong},\n        journal={arXiv preprint arXiv:2303.14184},\n        year={2023}\n    }\n    ```\n\n* [Stable Diffusion](https:\u002F\u002Fgithub.com\u002FCompVis\u002Fstable-diffusion) and the [diffusers](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers) library.\n\n    ```\n    @misc{rombach2021highresolution,\n        title={High-Resolution Image Synthesis with Latent Diffusion Models},\n        author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n        year={2021},\n        eprint={2112.10752},\n        archivePrefix={arXiv},\n        primaryClass={cs.CV}\n    }\n\n    @misc{von-platen-etal-2022-diffusers,\n        author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},\n        title = {Diffusers: State-of-the-art diffusion 
models},\n        year = {2022},\n        publisher = {GitHub},\n        journal = {GitHub repository},\n        howpublished = {\\url{https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers}}\n    }\n    ```\n\n* The GUI is developed with [DearPyGui](https:\u002F\u002Fgithub.com\u002Fhoffstadt\u002FDearPyGui).\n\n* Puppy image from: https:\u002F\u002Fwww.pexels.com\u002Fphoto\u002Fhigh-angle-photo-of-a-corgi-looking-upwards-2664417\u002F\n\n* Anya images from: https:\u002F\u002Fwww.goodsmile.info\u002Fen\u002Fproduct\u002F13301\u002FPOP+UP+PARADE+Anya+Forger.html\n\n# Citation\n\nIf you find this work useful, a citation will be appreciated via:\n```\n@misc{stable-dreamfusion,\n    Author = {Jiaxiang Tang},\n    Year = {2022},\n    Note = {https:\u002F\u002Fgithub.com\u002Fashawkey\u002Fstable-dreamfusion},\n    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}\n}\n```\n","Stable-Dreamfusion is a project that performs text-to-3D and image-to-3D generation based on NeRF and diffusion models, with support for mesh export. It draws on the Stable Diffusion model and is implemented in PyTorch to generate high-quality 3D content from text or image input. It adds Perp-Neg to address the multi-head problem and offers two rendering backbones, Instant-NGP and vanilla NeRF, to suit different performance needs. It also integrates the Adan optimizer, which further improves training efficiency. Although the project is still under development and its actual generation quality falls somewhat short of the results shown in the paper, Stable-Dreamfusion is well suited for researchers, developers, and creative practitioners interested in 3D content creation to explore.","2026-06-11 03:35:31","high_star"]