[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-1091":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":13,"stars7d":12,"stars30d":15,"stars90d":14,"forks30d":14,"starsTrendScore":16,"compositeScore":17,"rankGlobal":8,"rankLanguage":8,"license":18,"archived":19,"fork":19,"defaultBranch":20,"hasWiki":21,"hasPages":19,"topics":22,"createdAt":8,"pushedAt":8,"updatedAt":23,"readmeContent":24,"aiSummary":25,"trendingCount":14,"starSnapshotCount":14,"syncStatus":26,"lastSyncTime":27,"discoverSource":28},1091,"trellis-mac","shivampkumar\u002Ftrellis-mac","shivampkumar",null,"Python",404,26,6,4,0,33,12,55.59,"MIT License",false,"main",true,[],"2026-06-12 04:00:07","# TRELLIS.2 for Apple Silicon\n\nRun [TRELLIS.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS) image-to-3D generation natively on Mac.\n\nThis is a port of Microsoft's TRELLIS.2 — a state-of-the-art image-to-3D model — from CUDA-only to Apple Silicon via PyTorch MPS. No NVIDIA GPU required.\n\n## Results\n\nGenerates **400K+ vertex meshes with baked PBR textures** from a single image in **~5 minutes 13 seconds on M4 Pro** (24GB, cold start, weights cached, cool machine, pipeline type `512`). 
About 3m 20s of that is actual generation and baking; the remaining ~1m 45s is pipeline load that happens once per Python process.\n\nOutput is a GLB with base-color, metallic, and roughness textures — ready for use in 3D applications.\n\n### Example\n\n**Input image** &rarr; **Generated 3D mesh** (~400K vertices, ~800K triangles) with Metal-baked PBR textures:\n\n\u003Cp>\n\u003Cimg src=\"assets\u002Fshoe_input.png\" width=\"180\">\n\u003Cimg src=\"assets\u002Fshoe_front.png\" width=\"220\">\n\u003Cimg src=\"assets\u002Fshoe_3q.png\" width=\"260\">\n\u003Cimg src=\"assets\u002Fshoe_side.png\" width=\"220\">\n\u003C\u002Fp>\n\n## Requirements\n\n- macOS on Apple Silicon (M1 or later)\n- Python 3.11+\n- 24GB+ unified memory recommended (the 4B model is large)\n- ~15GB disk space for model weights (downloaded on first run)\n\n## Quick Start\n\n```bash\n# Clone this repo\ngit clone https:\u002F\u002Fgithub.com\u002Fshivampkumar\u002Ftrellis-mac.git\ncd trellis-mac\n\n# (Recommended) Download the Xcode Metal Toolchain so setup can build the\n# Metal-accelerated texture baker. 
Without this, setup falls back to a pure\n# Python KDTree baker (slower, slightly lower quality).\nxcodebuild -downloadComponent MetalToolchain\n\n# Log into HuggingFace (needed for gated model weights)\nhf auth login\n\n# Request access to these gated models (usually instant approval):\n#   https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-pretrain-lvd1689m\n#   https:\u002F\u002Fhuggingface.co\u002Fbriaai\u002FRMBG-2.0\n\n# Run setup (creates venv, installs deps, clones & patches TRELLIS.2,\n# builds Metal backends if the toolchain is available)\nbash setup.sh\n\n# Activate the environment\nsource .venv\u002Fbin\u002Factivate\n\n# Generate a 3D model from an image\npython generate.py path\u002Fto\u002Fimage.png\n```\n\nTo skip the Metal build (for example on older hardware or to speed up setup):\n\n```bash\nSKIP_METAL=1 bash setup.sh\n```\n\n`setup.sh` now pre-clones Git dependencies into `deps\u002F` so all network I\u002FO happens up front.  \nIf setup looks inconsistent or you are unsure about local clone state, remove `deps\u002F` and run setup again:\n\n```bash\nrm -rf deps\nbash setup.sh\n```\n\nOutput files are saved to the current directory (or use `--output` to specify a path).\n\n## Usage\n\n```bash\n# Basic usage\npython generate.py photo.png\n\n# With options\npython generate.py photo.png --seed 123 --output my_model --pipeline-type 512\n\n# All options\npython generate.py --help\n```\n\n| Option | Default | Description |\n|--------|---------|-------------|\n| `--seed` | 42 | Random seed for generation |\n| `--output` | `output_3d` | Output filename (without extension) |\n| `--pipeline-type` | `512` | Pipeline resolution: `512`, `1024`, `1024_cascade` |\n| `--texture-size` | `1024` | PBR texture resolution: `512`, `1024`, `2048` |\n| `--no-texture` | — | Skip texture baking, export geometry only |\n\n## What Was Ported\n\nTRELLIS.2 depends on several CUDA-only libraries. 
This port replaces each of them with a backend that runs on Apple Silicon:\n\n| Original (CUDA) | Replacement | Purpose |\n|---|---|---|\n| `flex_gemm` | `mtlgemm` (Pedro Naugusto's Metal port) with `backends\u002Fconv_none.py` fallback | Sparse 3D convolution. The Metal port is the default now; the pure-PyTorch gather-scatter path is the fallback for machines without the Metal Toolchain. |\n| `o_voxel._C` hashmap | `backends\u002Fmesh_extract.py` | Mesh extraction from dual voxel grid (pure Python) |\n| `flash_attn` | PyTorch SDPA | Scaled dot-product attention for sparse transformers (padded, not fused — room for improvement) |\n| `cumesh` | Skipped during decode | Called on meshes large enough to crash the Metal port; replaced with `fast_simplification` before baking |\n| `nvdiffrast` | `mtldiffrast` (Metal) with pure-Python fallback | Differentiable rasterization for texture baking |\n\nAdditionally, all hardcoded `.cuda()` calls throughout the codebase were patched to use the active device instead.\n\n### Technical Details\n\n**Sparse 3D Convolution** (`backends\u002Fconv_none.py`): Implements submanifold sparse convolution by building a spatial hash of active voxels, gathering neighbor features for each kernel position, applying weights via matrix multiplication, and scatter-adding results back. Neighbor maps are cached per-tensor to avoid redundant computation.\n\n**Mesh Extraction** (`backends\u002Fmesh_extract.py`): Reimplements `flexible_dual_grid_to_mesh` using Python dictionaries instead of CUDA hashmap operations. Builds a coordinate-to-index lookup table, finds connected voxels for each edge, and triangulates quads using normal alignment heuristics.\n\n**Attention** (patched `full_attn.py`): Adds an SDPA backend to the sparse attention module. 
Pads variable-length sequences into batches, runs `torch.nn.functional.scaled_dot_product_attention`, then unpads results.\n\n**Texture Baking**: By default we use the Metal stack released by [@pedronaugusto](https:\u002F\u002Fgithub.com\u002Fpedronaugusto) — [`mtldiffrast`](https:\u002F\u002Fgithub.com\u002Fpedronaugusto\u002Fmtldiffrast), [`mtlbvh`](https:\u002F\u002Fgithub.com\u002Fpedronaugusto\u002Fmtlbvh), [`mtlmesh`](https:\u002F\u002Fgithub.com\u002Fpedronaugusto\u002Fmtlmesh), and his CPU fork of [`o_voxel`](https:\u002F\u002Fgithub.com\u002Fpedronaugusto\u002Ftrellis2-apple) — which exposes `o_voxel.postprocess.to_glb`. We pre-simplify the decoder mesh to ~200K faces with `fast_simplification` before handing it to the Metal BVH (the BVH builder is unstable on 800K+ face inputs). If the Metal toolchain is unavailable, we fall back to `backends\u002Ftexture_baker.py`: xatlas UV unwrap, then a scipy cKDTree + inverse-distance weighting over the sparse voxel grid at native 512 resolution.\n\n## Performance\n\nBenchmarks on M4 Pro (24GB), pipeline type `512`, full Metal stack installed, weights cached, `SPARSE_CONV_BACKEND=flex_gemm` (default since Pedro Naugusto's zero-copy MPS fix). Numbers below are from a fresh-install end-to-end run (`\u002Fusr\u002Fbin\u002Ftime -h python generate.py shoe_input.png`). These assume a **cool machine**. 
M4 Pro throttles aggressively under sustained load, and I've measured the same run taking 6–10× longer when the CPU had already been pinned for an hour before I started.\n\n| Stage | Time |\n|-------|------|\n| Pipeline load (first call per process) | 103s |\n| Sparse structure sampling (12 steps) | 80s |\n| Shape SLat sampling (12 steps) | 22s |\n| Texture SLat sampling (12 steps) | 12s |\n| Shape SLat decoder (VAE forward) | ~20s |\n| Tex SLat decoder (VAE forward) | ~7s |\n| `flexible_dual_grid_to_mesh` (pure Python) | ~8s |\n| `fast_simplification` (858K → 200K faces) | ~1s |\n| Texture bake (Metal, 1024²) | ~15s |\n| **Total wall-clock (cold start)** | **5m 13s** |\n| Generation + bake only (excluding pipeline load) | 3m 20s |\n\nThe shape and texture decoder VAEs got 2.5–2.9× faster after [Pedro Naugusto fixed four bugs in `mtlgemm`](https:\u002F\u002Fgithub.com\u002Fshivampkumar\u002Ftrellis-mac\u002Fissues\u002F1#issuecomment-thread) (zero-copy MPS, real fp16\u002Fbf16 kernels, real masked implicit GEMM, no per-call `waitUntilCompleted`). Before his fix, the decoder path was ~38s; now it's ~27s. Sampling steps that also touch sparse conv saw smaller wins — those paths are dominated by attention, which is still SDPA-padded on MPS.\n\nMemory usage peaks at around 18GB unified memory during generation.\n\nFirst-ever run adds ~15GB of HuggingFace weight downloads (TRELLIS.2, DINOv3, RMBG-2.0) — network-bound, not included above. The pipeline load time is dominated by deserializing those weights from disk; if you batch multiple images in one Python process you pay load once.\n\nWith `SKIP_METAL=1` (pure-Python KDTree baker) the texture bake takes ~15s instead of ~11s and coverage near UV chart boundaries is slightly softer. 
Without the `mtlgemm` Python package specifically, the Metal baker falls back to a `torch.nn.functional.grid_sample` call that can leave mild ring artifacts on curved surfaces; installing `mtlgemm` (done automatically by `setup.sh`) gets rid of them.\n\n## Limitations\n\n- **Hole filling disabled**: Decode-time hole filling requires `cumesh`. The Metal port segfaults on decoder-sized meshes, so we skip this step. Output meshes may have small holes.\n- **Sparse attention is not fused**: The SDPA-padded wrapper works but is the single largest remaining bottleneck (~80s of a 5m 13s run, the sparse structure sampling phase). A fused Metal attention kernel would be a meaningful performance win.\n- **Pre-simplified before texture bake**: The mesh is decimated from ~800K to ~200K faces before Metal BVH construction to avoid builder instability. If you need the full-resolution mesh, export it via the OBJ output (which is written before simplification).\n- **No training support**: Inference only.\n\n### On `mtlgemm` \u002F `flex_gemm` and thermal throttling\n\n`setup.sh` installs `mtlgemm` as part of the Metal stack. It's used both for the sparse conv diffusion path (since Pedro Naugusto's zero-copy fix) and for the texture baker's `grid_sample_3d`. Without it, `generate.py` falls back to `conv_none.py` for diffusion and monkey-patches `o_voxel.postprocess._grid_sample_3d` with a `torch.nn.functional.grid_sample` call for the bake. The fallback path works but is slower and leaves mild ring artifacts on curved surfaces.\n\nOne thing that burned me during testing: after the M4 Pro had been doing heavy compute for a few hours, the same pipeline slowed from ~3.5 min generation to ~36 min purely from thermal throttling — nothing in the code path changed. 
If you see unusually slow runs, let the machine cool for a few minutes and retry before blaming the code.\n\n## License\n\nThe porting code in this repository (backends, patches, scripts) is released under the MIT License.\n\nUpstream model weights are subject to their own licenses:\n- **TRELLIS.2**: [MIT License](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2\u002Fblob\u002Fmain\u002FLICENSE)\n- **DINOv3**: [Meta custom license](https:\u002F\u002Fhuggingface.co\u002Ffacebook\u002Fdinov3-vitl16-pretrain-lvd1689m\u002Fblob\u002Fmain\u002FLICENSE.md) (gated, review before commercial use)\n- **RMBG-2.0**: [CC BY-NC 4.0](https:\u002F\u002Fhuggingface.co\u002Fbriaai\u002FRMBG-2.0) (non-commercial; commercial use requires a license from BRIA)\n\n## Credits\n\n- [TRELLIS.2](https:\u002F\u002Fgithub.com\u002Fmicrosoft\u002FTRELLIS.2) by Microsoft Research — the original model and codebase\n- [DINOv3](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fdinov3) by Meta — image feature extraction\n- [RMBG-2.0](https:\u002F\u002Fgithub.com\u002FBria-AI\u002FRMBG-2.0) by BRIA AI — background removal\n- [@pedronaugusto](https:\u002F\u002Fgithub.com\u002Fpedronaugusto) — `mtldiffrast`, `mtlbvh`, `mtlmesh`, and the CPU fork of `o_voxel` that together provide the Metal texture-baking path used by this repo\n","TRELLIS.2 for Apple Silicon is an image-to-3D project optimized for Macs with Apple Silicon chips. It is based on Microsoft's TRELLIS.2 model, ported from CUDA to Apple hardware via PyTorch MPS, and requires no NVIDIA GPU. Its core capability is generating high-quality 3D meshes with more than 400K vertices and PBR textures from a single image, in roughly 5 minutes on an M4 Pro. The generated model is exported as a GLB with base-color, metallic, and roughness maps, suitable for 3D application work such as game design and virtual reality content creation. The project is suited to users on recent macOS systems with at least 24GB of unified memory.",2,"2026-06-11 02:41:33","CREATED_QUERY"]