[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72029":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":17,"stars30d":18,"stars90d":15,"forks30d":15,"starsTrendScore":19,"compositeScore":20,"rankGlobal":9,"rankLanguage":9,"license":21,"archived":22,"fork":22,"defaultBranch":23,"hasWiki":24,"hasPages":24,"topics":25,"createdAt":9,"pushedAt":9,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":15,"starSnapshotCount":15,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},72029,"TRELLIS.2","microsoft\u002FTRELLIS.2","microsoft","Native and Compact Structured Latents for 3D Generation",null,"Python",8272,1008,58,117,0,93,217,580,279,40.01,"MIT License",false,"main",true,[],"2026-06-12 02:02:57","![](assets\u002Fteaser.webp)\n\n# Native and Compact Structured Latents for 3D Generation\n\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2512.14692\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-Arxiv-b31b1b.svg\" alt=\"Paper\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FTRELLIS.2-4B\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Model-yellow\" alt=\"Hugging Face\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmicrosoft\u002FTRELLIS.2\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHugging%20Face-Demo-blueviolet\">\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fmicrosoft.github.io\u002FTRELLIS.2\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Website-blue\" alt=\"Project Page\">\u003C\u002Fa>\n\u003Ca href=\"LICENSE\">\u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLicense-MIT-green\" 
alt=\"License\">\u003C\u002Fa>\n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F63b43a7e-acc7-4c81-a900-6da450527d8f\n\n*(Compressed version due to GitHub size limits. See the full-quality video on our project page!)*\n\n**TRELLIS.2** is a state-of-the-art large 3D generative model (4B parameters) designed for high-fidelity **image-to-3D** generation. It leverages a novel \"field-free\" sparse voxel structure termed **O-Voxel** to reconstruct and generate arbitrary 3D assets with complex topologies, sharp features, and full PBR materials.\n\n\n## ✨ Features\n\n### 1. High Quality, Resolution & Efficiency\nOur 4B-parameter model generates high-resolution fully textured assets with exceptional fidelity and efficiency using vanilla DiTs. It utilizes a Sparse 3D VAE with 16× spatial downsampling to encode assets into a compact latent space.\n\n| Resolution | Total Time* | Breakdown (Shape + Mat) |\n| :--- | :--- | :--- |\n| **512³** | **~3s** | 2s + 1s |\n| **1024³** | **~17s** | 10s + 7s |\n| **1536³** | **~60s** | 35s + 25s |\n\n\u003Csmall>*Tested on NVIDIA H100 GPU.\u003C\u002Fsmall>\n\n### 2. Arbitrary Topology Handling\nThe **O-Voxel** representation breaks the limits of iso-surface fields. It robustly handles complex structures without lossy conversion:\n*   ✅ **Open Surfaces** (e.g., clothing, leaves)\n*   ✅ **Non-manifold Geometry**\n*   ✅ **Internal Enclosed Structures**\n\n### 3. Rich Texture Modeling\nBeyond basic colors, TRELLIS.2 models arbitrary surface attributes including **Base Color, Roughness, Metallic, and Opacity**, enabling photorealistic rendering and transparency support.\n\n### 4. 
Minimalist Processing\nData processing is streamlined for instant conversions that are fully **rendering-free** and **optimization-free**.\n*   **\u003C 10s** (Single CPU): Textured Mesh → O-Voxel\n*   **\u003C 100ms** (CUDA): O-Voxel → Textured Mesh\n\n\n## 🗺️ Roadmap\n\n- [x] Paper release\n- [x] Release image-to-3D inference code\n- [x] Release pretrained checkpoints (4B)\n- [x] Hugging Face Spaces demo\n- [x] Release shape-conditioned texture generation inference code\n- [x] Release training code\n\n\n## 🛠️ Installation\n\n### Prerequisites\n- **System**: The code is currently tested only on **Linux**.\n- **Hardware**: An NVIDIA GPU with at least 24GB of memory is necessary. The code has been verified on NVIDIA A100 and H100 GPUs.  \n- **Software**:   \n  - The [CUDA Toolkit](https:\u002F\u002Fdeveloper.nvidia.com\u002Fcuda-toolkit-archive) is needed to compile certain packages. Recommended version is 12.4.  \n  - [Conda](https:\u002F\u002Fdocs.anaconda.com\u002Fminiconda\u002Finstall\u002F#quick-command-line-install) is recommended for managing dependencies.  \n  - Python version 3.8 or higher is required. \n\n### Installation Steps\n1. Clone the repo:\n    ```sh\n    git clone -b main https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2.git --recursive\n    cd TRELLIS.2\n    ```\n\n2. Install the dependencies:\n    \n    **Before running the following command, there are some things to note:**\n    - By adding `--new-env`, a new conda environment named `trellis2` will be created. If you want to use an existing conda environment, please remove this flag.\n    - By default, the `trellis2` environment will use PyTorch 2.6.0 with CUDA 12.4. If you want to use a different version of CUDA, you can remove the `--new-env` flag and manually install the required dependencies. 
Refer to [PyTorch](https:\u002F\u002Fpytorch.org\u002Fget-started\u002Fprevious-versions\u002F) for the installation command.\n    - If you have multiple CUDA Toolkit versions installed, `CUDA_HOME` should be set to the correct version before running the command. For example, if you have CUDA Toolkit 12.4 and 13.0 installed, you can run `export CUDA_HOME=\u002Fusr\u002Flocal\u002Fcuda-12.4` before running the command.\n    - By default, the code uses the `flash-attn` backend for attention. For GPUs that do not support `flash-attn` (e.g., NVIDIA V100), you can install `xformers` manually and set the `ATTN_BACKEND` environment variable to `xformers` before running the code. See the [Minimal Example](#minimal-example) for more details.\n    - The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time.\n    - If you encounter any issues during the installation, feel free to open an issue or contact us.\n    \n    Create a new conda environment named `trellis2` and install the dependencies:\n    ```sh\n    . .\u002Fsetup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm\n    ```\n    The detailed usage of `setup.sh` can be found by running `. 
.\u002Fsetup.sh --help`.\n    ```sh\n    Usage: setup.sh [OPTIONS]\n    Options:\n        -h, --help              Display this help message\n        --new-env               Create a new conda environment\n        --basic                 Install basic dependencies\n        --flash-attn            Install flash-attention\n        --cumesh                Install cumesh\n        --o-voxel               Install o-voxel\n        --flexgemm              Install flexgemm\n        --nvdiffrast            Install nvdiffrast\n        --nvdiffrec             Install nvdiffrec\n    ```\n\n## 📦 Pretrained Weights\n\nThe pretrained model **TRELLIS.2-4B** is available on Hugging Face. Please refer to the model card there for more details.\n\n| Model | Parameters | Resolution | Link |\n| :--- | :--- | :--- | :--- |\n| **TRELLIS.2-4B** | 4 Billion | 512³ - 1536³ | [Hugging Face](https:\u002F\u002Fhuggingface.co\u002Fmicrosoft\u002FTRELLIS.2-4B) |\n\n\n## 🚀 Usage\n\n### 1. Image to 3D Generation\n\n#### Minimal Example\n\nHere is an [example](example.py) of how to use the pretrained models for 3D asset generation.\n\n```python\nimport os\nos.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'\nos.environ[\"PYTORCH_CUDA_ALLOC_CONF\"] = \"expandable_segments:True\"  # Can save GPU memory\nimport cv2\nimport imageio\nfrom PIL import Image\nimport torch\nfrom trellis2.pipelines import Trellis2ImageTo3DPipeline\nfrom trellis2.utils import render_utils\nfrom trellis2.renderers import EnvMap\nimport o_voxel\n\n# 1. Setup Environment Map\nenvmap = EnvMap(torch.tensor(\n    cv2.cvtColor(cv2.imread('assets\u002Fhdri\u002Fforest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),\n    dtype=torch.float32, device='cuda'\n))\n\n# 2. Load Pipeline\npipeline = Trellis2ImageTo3DPipeline.from_pretrained(\"microsoft\u002FTRELLIS.2-4B\")\npipeline.cuda()\n\n# 3. 
Load Image & Run\nimage = Image.open(\"assets\u002Fexample_image\u002FT.png\")\nmesh = pipeline.run(image)[0]\nmesh.simplify(16777216) # nvdiffrast limit\n\n# 4. Render Video\nvideo = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))\nimageio.mimsave(\"sample.mp4\", video, fps=15)\n\n# 5. Export to GLB\nglb = o_voxel.postprocess.to_glb(\n    vertices            =   mesh.vertices,\n    faces               =   mesh.faces,\n    attr_volume         =   mesh.attrs,\n    coords              =   mesh.coords,\n    attr_layout         =   mesh.layout,\n    voxel_size          =   mesh.voxel_size,\n    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],\n    decimation_target   =   1000000,\n    texture_size        =   4096,\n    remesh              =   True,\n    remesh_band         =   1,\n    remesh_project      =   0,\n    verbose             =   True\n)\nglb.export(\"sample.glb\", extension_webp=True)\n```\n\nUpon execution, the script generates the following files:\n - `sample.mp4`: A video visualizing the generated 3D asset with PBR materials and environmental lighting.\n - `sample.glb`: The extracted PBR-ready 3D asset in GLB format.\n\n**Note:** The `.glb` file is exported in `OPAQUE` mode by default. Although the alpha channel is preserved within the texture map, it is not active initially. To enable transparency, import the asset into your 3D software and manually connect the texture's alpha channel to the material's opacity or alpha input.\n\n#### Web Demo\n\n[app.py](app.py) provides a simple web demo for image-to-3D asset generation. You can run the demo with the following command:\n```sh\npython app.py\n```\n\nThen, you can access the demo at the address shown in the terminal.\n\n### 2. PBR Texture Generation\n\nPlease refer to [example_texturing.py](example_texturing.py) for an example of how to generate PBR textures for a given 3D shape. 
Also, you can use the [app_texturing.py](app_texturing.py) to run a web demo for PBR texture generation.\n\n\n## 🏋️ Training\n\nWe provide the full training codebase, enabling users to train **TRELLIS.2** from scratch or fine-tune it on custom datasets.\n\n### 1. Data Preparation\n\nBefore training, raw 3D assets must be converted into the **O-Voxel** representation. This process includes mesh conversion, compact structured latent generation, and metadata preparation.\n\n> 📂 **Please refer to [data_toolkit\u002FREADME.md](data_toolkit\u002FREADME.md) for detailed instructions on data preprocessing and dataset organization.**\n\n### 2. Running Training\n\nTraining is managed through the `train.py` script, which accepts multiple command-line arguments to configure experiments:\n\n* `--config`: Path to the experiment configuration file.\n* `--output_dir`: Directory for training outputs.\n* `--load_dir`: Directory to load checkpoints from (defaults to `output_dir`).\n* `--ckpt`: Checkpoint step to resume from (defaults to the latest).\n* `--data_dir`: Dataset path or a JSON string specifying dataset locations.\n* `--auto_retry`: Number of automatic retries upon failure.\n* `--tryrun`: Perform a dry run without actual training.\n* `--profile`: Enable training profiling.\n* `--num_nodes`: Number of nodes for distributed training.\n* `--node_rank`: Rank of the current node.\n* `--num_gpus`: Number of GPUs per node (defaults to all available GPUs).\n* `--master_addr`: Master node address for distributed training.\n* `--master_port`: Port for distributed training communication.\n\n\n### SC-VAE Training\n\n\nTo train the shape SC-VAE, run:\n\n```sh\npython train.py \\\n  --config configs\u002Fscvae\u002Fshape_vae_next_dc_f16c32_fp16.json \\\n  --output_dir results\u002Fshape_vae_next_dc_f16c32_fp16 \\\n  --data_dir \"{\\\"ObjaverseXL_sketchfab\\\": {\\\"base\\\": \\\"datasets\u002FObjaverseXL_sketchfab\\\", \\\"mesh_dump\\\": 
\\\"datasets\u002FObjaverseXL_sketchfab\u002Fmesh_dumps\\\", \\\"dual_grid\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fdual_grid_256\\\", \\\"asset_stats\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fasset_stats\\\"}}\"\n```\n\nThis command trains the shape SC-VAE on the **Objaverse-XL** dataset using the `shape_vae_next_dc_f16c32_fp16.json` configuration. Training outputs will be saved to `results\u002Fshape_vae_next_dc_f16c32_fp16`.\n\nThe dataset is specified as a JSON string, where each dataset entry includes:\n\n* `base`: Root directory of the dataset.\n* `mesh_dump`: Directory containing preprocessed mesh dumps.\n* `dual_grid`: Directory with precomputed dual-grid representations.\n* `asset_stats`: Directory containing precomputed asset statistics.\n\nTo fine-tune the model at a higher resolution, use the `shape_vae_next_dc_f16c32_fp16_ft_512.json` configuration. Remember to update the `finetune_ckpt` field and adjust the dataset paths accordingly.\n\n\nTo train the texture SC-VAE, run:\n\n```sh\npython train.py \\\n  --config configs\u002Fscvae\u002Ftex_vae_next_dc_f16c32_fp16.json \\\n  --output_dir results\u002Ftex_vae_next_dc_f16c32_fp16 \\\n  --data_dir \"{\\\"ObjaverseXL_sketchfab\\\": {\\\"base\\\": \\\"datasets\u002FObjaverseXL_sketchfab\\\", \\\"pbr_dump\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_dumps\\\", \\\"pbr_voxel\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_voxels_256\\\", \\\"asset_stats\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fasset_stats\\\"}}\"\n```\n\n\n### Flow Model Training\n\nTo train the sparse structure flow model, run:\n\n```sh\npython train.py \\\n  --config configs\u002Fgen\u002Fss_flow_img_dit_1_3B_64_bf16.json \\\n  --output_dir results\u002Fss_flow_img_dit_1_3B_64_bf16 \\\n  --data_dir \"{\\\"ObjaverseXL_sketchfab\\\": {\\\"base\\\": \\\"datasets\u002FObjaverseXL_sketchfab\\\", \\\"ss_latent\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fss_latents\u002Fss_enc_conv3d_16l8_fp16_64\\\", 
\\\"render_cond\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\\\"}}\"\n```\n\nThis command trains the sparse-structure flow model on the **Objaverse-XL** dataset using the specified configuration file. Outputs are saved to `results\u002Fss_flow_img_dit_1_3B_64_bf16`.\n\nThe dataset configuration includes:\n\n* `base`: Root dataset directory.\n* `ss_latent`: Directory containing precomputed sparse-structure latents.\n* `render_cond`: Directory containing conditional rendering images.\n\n\nThe second- and third-stage flow models for shape and texture generation can be trained using the following configurations:\n\n* Shape flow: `slat_flow_img2shape_dit_1_3B_512_bf16.json`\n* Texture flow: `slat_flow_imgshape2tex_dit_1_3B_512_bf16.json`\n\nExample commands:\n\n```sh\n# Shape flow model\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_img2shape_dit_1_3B_512_bf16.json \\\n  --output_dir results\u002Fslat_flow_img2shape_dit_1_3B_512_bf16 \\\n  --data_dir \"{\\\"ObjaverseXL_sketchfab\\\": {\\\"base\\\": \\\"datasets\u002FObjaverseXL_sketchfab\\\", \\\"shape_latent\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_512\\\", \\\"render_cond\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\\\"}}\"\n\n# Texture flow model\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_imgshape2tex_dit_1_3B_512_bf16.json \\\n  --output_dir results\u002Fslat_flow_imgshape2tex_dit_1_3B_512_bf16 \\\n  --data_dir \"{\\\"ObjaverseXL_sketchfab\\\": {\\\"base\\\": \\\"datasets\u002FObjaverseXL_sketchfab\\\", \\\"shape_latent\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_512\\\", \\\"pbr_latent\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_latents\u002Ftex_enc_next_dc_f16c32_fp16_512\\\", \\\"render_cond\\\": \\\"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\\\"}}\"\n```\n\nHigher-resolution fine-tuning can be performed by 
updating the `finetune_ckpt` field in the following configuration files and adjusting the dataset paths accordingly:\n\n* `slat_flow_img2shape_dit_1_3B_512_bf16_ft1024.json`\n* `slat_flow_imgshape2tex_dit_1_3B_512_bf16_ft1024.json`\n\n\n## 🧩 Related Packages\n\nTRELLIS.2 is built upon several specialized high-performance packages developed by our team:\n\n*   **[O-Voxel](o-voxel):** \n    Core library handling the logic for converting between textured meshes and the O-Voxel representation, ensuring instant bidirectional transformation.\n*   **[FlexGEMM](https:\u002F\u002Fgithub.com\u002FJeffreyXiang\u002FFlexGEMM):** \n    Efficient sparse convolution implementation based on Triton, enabling rapid processing of sparse voxel structures.\n*   **[CuMesh](https:\u002F\u002Fgithub.com\u002FJeffreyXiang\u002FCuMesh):** \n    CUDA-accelerated mesh utilities used for high-speed post-processing, remeshing, decimation, and UV-unwrapping.\n\n\n## ⚖️ License\n\nThis model and code are released under the **[MIT License](LICENSE)**.\n\nPlease note that certain dependencies operate under separate license terms:\n\n- [**nvdiffrast**](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fnvdiffrast): Utilized for rendering generated 3D assets. This package is governed by its own [License](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fnvdiffrast\u002Fblob\u002Fmain\u002FLICENSE.txt).\n\n- [**nvdiffrec**](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fnvdiffrec): Implements the split-sum renderer for PBR materials. 
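This package is governed by its own [License](https:\u002F\u002Fgithub.com\u002FNVlabs\u002Fnvdiffrec\u002Fblob\u002Fmain\u002FLICENSE.txt).\n\n## 📚 Citation\n\nIf you find this model useful for your research, please cite our work:\n\n```bibtex\n@article{\n    xiang2025trellis2,\n    title={Native and Compact Structured Latents for 3D Generation},\n    author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},\n    journal={Tech report},\n    year={2025}\n}\n```\n","TRELLIS.2 is an advanced large 3D generative model designed for high-fidelity image-to-3D conversion. The project uses a novel \"field-free\" sparse voxel structure called O-Voxel to reconstruct and generate arbitrary 3D assets with complex topologies, sharp features, and full PBR materials. Its core technical features include high quality, high resolution, and efficiency (for example, generating a 512³ 3D asset takes only about 3 seconds on an NVIDIA H100 GPU); the ability to handle arbitrary topologies (such as open surfaces, non-manifold geometry, and internal enclosed structures); and rich texture modeling (supporting base color, roughness, metallic, and opacity). TRELLIS.2 is suited to scenarios that require fast, high-quality 3D content generation, such as game development, virtual reality, and architectural design.",2,"2026-06-11 03:40:01","high_star"]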