[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-75711":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":26,"readmeContent":27,"aiSummary":28,"trendingCount":16,"starSnapshotCount":16,"syncStatus":29,"lastSyncTime":30,"discoverSource":31},75711,"Pixal3D","TencentARC\u002FPixal3D","TencentARC","[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images","https:\u002F\u002Fldyang694.github.io\u002Fprojects\u002Fpixal3d\u002F",null,"Python",1730,154,21,14,0,20,61,1496,60,104.57,"Other",false,"master",[],"2026-06-12 04:01:18","\n\u003Cdiv align=\"center\">\n\n# Pixal3D: Pixel-Aligned 3D Generation from Images\n\n\u003Ch3>SIGGRAPH 2026\u003C\u002Fh3>\n\n\u003Csmall>[Dong-Yang Li](https:\u002F\u002Fldyang694.github.io\u002F)¹ · [Wang Zhao](https:\u002F\u002Fthuzhaowang.github.io\u002F)²* · [Yuxin Chen](https:\u002F\u002Forcid.org\u002F0000-0002-7854-1072)² · [Wenbo Hu](https:\u002F\u002Fwbhu.github.io\u002F)² · [Meng-Hao Guo](https:\u002F\u002Fmenghaoguo.github.io\u002F)¹ · [Fang-Lue Zhang](https:\u002F\u002Ffanglue.github.io\u002F)³ · [Ying Shan](https:\u002F\u002Fwww.linkedin.com\u002Fin\u002FYingShanProfile)² · [Shi-Min Hu](https:\u002F\u002Fcg.cs.tsinghua.edu.cn\u002Fshimin.htm)¹✉\u003C\u002Fsmall>\n\n¹Tsinghua University (BNRist) &nbsp;&nbsp; ²Tencent ARC Lab &nbsp;&nbsp; ³Victoria University of Wellington\n\n*Project lead &nbsp;&nbsp; ✉Corresponding author\n\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n  \u003Ca 
href=\"https:\u002F\u002Fldyang694.github.io\u002Fprojects\u002Fpixal3d\u002F\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject%20Page-333399.svg?logo=googlehome height=22px>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FPixal3D\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Demo-276cb4.svg height=22px>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002FTencentARC\u002FPixal3D\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Models-d96902.svg height=22px>\u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.10922\">\u003Cimg src=https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FArxiv-b5212f.svg?logo=arxiv height=22px>\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\u003Cdiv align=\"center\">\n    \u003Cimg src=\"assets\u002Fteaser.png\" alt=\"Teaser image of Pixal3D\"\u002F>\n\u003C\u002Fdiv>\n\n**Pixal3D** generates high-fidelity 3D assets from a single image. Unlike previous methods that loosely inject image features via attention, Pixal3D explicitly lifts pixel features into 3D through back-projection, establishing direct pixel-to-3D correspondences. This enables near-reconstruction-level fidelity with detailed geometry and PBR textures.\n\n---\n\n## ✨ News\n\n- **May 2026**: Release training code and data preparation toolkit. 🔧\n- **May 2026**: Release the improved version based on [Trellis.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2) backbone. 💪\n- **May 2026**: Release inference code and online demo. 🤗\n- **Apr 2026**: Our paper is accepted to SIGGRAPH 2026! 🎉\n\n## 📌 Branches\n\n| Branch | Description |\n|--------|-------------|\n| `main` | **Latest version** — improved implementation based on [Trellis.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2) backbone with better performance. 
|\n| `paper` | **Paper version** — original implementation based on [Direct3D-S2](https:\u002F\u002Fgithub.com\u002FDreamTechAI\u002FDirect3D-S2), corresponding to results reported in our SIGGRAPH 2026 paper. |\n\n> If you want to reproduce the results in our paper, please switch to the `paper` branch.\n\n## 🎮 Try It Online\n\nYou can try Pixal3D directly in your browser without any installation via our Hugging Face Gradio demo:\n\n👉 [**Launch Demo**](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FTencentARC\u002FPixal3D)\n\n## 🚀 Getting Started\n\n### Installation\n\n#### Step 1: Follow TRELLIS.2 Installation\n\nPlease first follow the installation guide of [TRELLIS.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2) to set up the base environment.\n\n#### Step 2: Install Additional Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n#### Step 3: Install natten\n\n```bash\nNATTEN_CUDA_ARCH=\"xx\" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation\n```\n\nPlease replace `xx` with the CUDA architecture and the number of build workers suitable for your machine.\n\n#### Step 4: Install utils3d\n\n```bash\npip install https:\u002F\u002Fgithub.com\u002FLDYang694\u002FStorages\u002Freleases\u002Fdownload\u002F20260430\u002Futils3d-0.0.2-py3-none-any.whl\n```\n\n> **Note**: `requirements-hfdemo.txt` is for the Hugging Face Spaces demo (H-series GPU architecture) and may not be compatible with other architectures.\n\n### Usage\n\n#### Inference\n\nGenerate a GLB mesh from a single image:\n\n```bash\npython inference.py --image assets\u002Fimages\u002F0_img.png --output .\u002Foutput.glb\n```\n\n**Low-VRAM mode** (reduces peak VRAM by loading models on-demand):\n\n```bash\npython inference.py --image assets\u002Fimages\u002F0_img.png --output .\u002Foutput.glb --low_vram\n```\n\nBy default, the pipeline resolution is **1536** (standard mode) or **1024** (low-VRAM mode). 
You can override this with `--resolution`:\n\n```bash\n# Force 1536 even in low-VRAM mode\npython inference.py --image assets\u002Fimages\u002F0_img.png --output .\u002Foutput.glb --low_vram --resolution 1536\n\n# Force 1024 in standard mode\npython inference.py --image assets\u002Fimages\u002F0_img.png --output .\u002Foutput.glb --resolution 1024\n```\n\n> **Tip**: If you don't have `flash_attn` installed, you can use PyTorch's built-in SDPA backend instead:\n> ```bash\n> ATTN_BACKEND=sdpa python inference.py --image assets\u002Fimages\u002F0_img.png --output .\u002Foutput.glb --low_vram\n> ```\n\n### Web Demo\n\nWe provide a Gradio web demo for Pixal3D, which allows you to generate 3D meshes from images interactively.\n\n```bash\npython app.py\n```\n\nLow-VRAM mode is also available for the web demo. The frontend default resolution automatically switches to 1024 in low-VRAM mode (1536 otherwise), but it can be changed manually in the UI.\n\n```bash\npython app.py --low_vram\n# or via environment variable:\nLOW_VRAM=1 python app.py\n```\n\n## 🔧 Training\n\nWe provide the full training codebase for reproducing Pixal3D from scratch.\n\n### Data Preparation\n\nPrepare view-aligned O-Voxel data and rendered condition images by following the data toolkit instructions:\n\n> 📂 **[data_toolkit\u002FREADME.md](data_toolkit\u002FREADME.md)**\n\n### Overview\n\nPixal3D is trained as a three-stage cascade, with each stage progressively increasing resolution:\n\n| Stage | Model | Resolutions | Config Prefix |\n|-------|-------|-------------|---------------|\n| 1 | Sparse Structure | 32 → 64 | `ss_flow_img_dit_*_proj_finetune` |\n| 2 | Shape | 256 → 512 → 1024 | `slat_flow_img2shape_*_proj_finetune` |\n| 3 | Texture | 256 → 512 → 1024 | `slat_flow_imgshape2tex_*_proj_finetune` |\n\nAll stages use **pixel-aligned projection conditioning** and **view-aligned latents** (2 views by default). 
Within each stage, start from the lowest resolution and progressively fine-tune to higher resolutions by setting `finetune_ckpt` in the config.\n\n### Quick Start\n\n```sh\npython train.py \\\n  --config \u003CCONFIG_JSON> \\\n  --output_dir \u003COUTPUT_DIR> \\\n  --data_dir '\u003CDATA_DIR_JSON>'\n```\n\n`--data_dir` is a JSON string describing the dataset layout. Different stages require different keys:\n\n| Stage | Required keys |\n|-------|---------------|\n| Sparse Structure | `base`, `ss_latent`, `render_cond` |\n| Shape | `base`, `shape_latent`, `render_cond` |\n| Texture | `base`, `shape_latent`, `pbr_latent`, `render_cond` |\n\n### Example: Training All Three Stages\n\nBelow we show the full training sequence using ObjaverseXL as an example. Each higher-resolution step requires updating `finetune_ckpt` in its config JSON to point to the previous checkpoint.\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Stage 1: Sparse Structure (32 → 64)\u003C\u002Fb>\u003C\u002Fsummary>\n\n```sh\n# Resolution 32\npython train.py \\\n  --config configs\u002Fgen\u002Fss_flow_img_dit_1_3B_32_bf16_proj_finetune.json \\\n  --output_dir results\u002Fss_32 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"ss_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fss_latents\u002Fss_enc_conv3d_16l8_fp16_64_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n\n# Resolution 64 (set finetune_ckpt → results\u002Fss_32 checkpoint)\npython train.py \\\n  --config configs\u002Fgen\u002Fss_flow_img_dit_1_3B_32_bf16_proj_finetune_ft64.json \\\n  --output_dir results\u002Fss_ft64 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"ss_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fss_latents\u002Fss_enc_conv3d_16l8_fp16_64_view\", \"render_cond\": 
\"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Stage 2: Shape (256 → 512 → 1024)\u003C\u002Fb>\u003C\u002Fsummary>\n\n```sh\n# Resolution 256\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune.json \\\n  --output_dir results\u002Fshape_256 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_256_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n\n# Resolution 512\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_img2shape_dit_1_3B_256_bf16_proj_finetune_ft512.json \\\n  --output_dir results\u002Fshape_ft512 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_512_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n\n# Resolution 1024\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_img2shape_dit_1_3B_512_bf16_proj_finetune_ft1024.json \\\n  --output_dir results\u002Fshape_ft1024 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_1024_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n```\n\u003C\u002Fdetails>\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>Stage 3: Texture (256 → 512 → 1024)\u003C\u002Fb>\u003C\u002Fsummary>\n\n```sh\n# Resolution 256\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_imgshape2tex_dit_1_3B_256_bf16_proj_finetune.json \\\n  --output_dir results\u002Ftex_256 \\\n  --data_dir 
'{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_256_view\", \"pbr_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_latents\u002Ftex_enc_next_dc_f16c32_fp16_256_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n\n# Resolution 512\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune.json \\\n  --output_dir results\u002Ftex_512 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_512_view\", \"pbr_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_latents\u002Ftex_enc_next_dc_f16c32_fp16_512_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n\n# Resolution 1024\npython train.py \\\n  --config configs\u002Fgen\u002Fslat_flow_imgshape2tex_dit_1_3B_512_bf16_proj_finetune_ft1024.json \\\n  --output_dir results\u002Ftex_ft1024 \\\n  --data_dir '{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets\u002FObjaverseXL_sketchfab\", \"shape_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fshape_latents\u002Fshape_enc_next_dc_f16c32_fp16_1024_view\", \"pbr_latent\": \"datasets\u002FObjaverseXL_sketchfab\u002Fpbr_latents\u002Ftex_enc_next_dc_f16c32_fp16_1024_view\", \"render_cond\": \"datasets\u002FObjaverseXL_sketchfab\u002Frenders_cond\"}}'\n```\n\u003C\u002Fdetails>\n\n### Additional Options\n\n\u003Cdetails>\n\u003Csummary>\u003Cb>All command-line arguments\u003C\u002Fb>\u003C\u002Fsummary>\n\n| Argument | Description | Default |\n|----------|-------------|---------|\n| `--config` | Config JSON path | *required* |\n| `--output_dir` | Output directory | *required* |\n| `--data_dir` | Dataset JSON string | `.\u002Fdata\u002F` |\n| `--load_dir` | 
Checkpoint load directory | `output_dir` |\n| `--ckpt` | Resume from step | `latest` |\n| `--auto_retry` | Retries on failure | `3` |\n| `--tryrun` | Dry run | `false` |\n| `--profile` | Profiling | `false` |\n| `--num_nodes` | Number of nodes | `1` |\n| `--node_rank` | Current node rank | `0` |\n| `--num_gpus` | GPUs per node | all |\n| `--master_addr` | Master address | `localhost` |\n| `--master_port` | Master port | `12666` |\n| `--use_wandb` | Enable W&B logging | `false` |\n| `--wandb_project` | W&B project | `trellis2-training` |\n| `--wandb_name` | W&B run name | basename of `output_dir` |\n| `--wandb_id` | W&B run ID (resume) | — |\n\n\u003C\u002Fdetails>\n\n## 🤗 Acknowledgements\n\nThis project builds heavily on [Trellis.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2) and [Direct3D-S2](https:\u002F\u002Fgithub.com\u002FDreamTechAI\u002FDirect3D-S2). We sincerely thank the authors for their outstanding work on scalable 3D generation, which serves as the foundation of our codebase and model architecture.\n\nWe also thank the following repos for their great contributions:\n\n- [Direct3D-S2](https:\u002F\u002Fgithub.com\u002FDreamTechAI\u002FDirect3D-S2)\n- [Trellis](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS)\n- [Trellis.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2)\n\n## 📄 Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@article{li2026pixal3d,\n    title={Pixal3D: Pixel-Aligned 3D Generation from Images},\n    author={Li, Dong-Yang and Zhao, Wang and Chen, Yuxin and Hu, Wenbo and Guo, Meng-Hao and Zhang, Fang-Lue and Shan, Ying and Hu, Shi-Min},\n    journal={arXiv preprint arXiv:2605.10922},\n    year={2026}\n}\n```\n\n","Pixal3D is a project that generates high-fidelity 3D assets from a single image. It lifts pixel features directly into 3D space through back-projection, establishing direct pixel-to-3D correspondences and achieving near-reconstruction-level geometric detail and PBR texture accuracy. The project offers an improved version built on the Trellis.2 backbone with better performance, and it provides an online demo. It suits applications that need high-quality 3D content generation, such as game development, virtual reality, and film production.",2,"2026-06-11 03:53:08","CREATED_QUERY"]