[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74132":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":25,"hasPages":23,"topics":26,"createdAt":10,"pushedAt":10,"updatedAt":27,"readmeContent":28,"aiSummary":29,"trendingCount":16,"starSnapshotCount":16,"syncStatus":30,"lastSyncTime":31,"discoverSource":32},74132,"see-through","shitagaki-lab\u002Fsee-through","shitagaki-lab","\"Single-image Layer Decomposition for Anime Characters\" (SIGGRAPH 2026, Conditionally Accepted)","",null,"Python",2777,251,27,9,0,56,126,322,168,29.2,"Apache License 2.0",false,"main",true,[],"2026-06-12 02:03:22","\nSee-through: Single-image Layer Decomposition for Anime Characters\n---\n\n\u003Ca href='https:\u002F\u002Farxiv.org\u002Fabs\u002F2602.03749'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2602.03749-b31b1b.svg'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F24yearsold\u002Fsee-through-demo'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Space-PSD%20Inference%20Demo-blue'>\u003C\u002Fa>\n\u003Ca href='https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002Fljsabc\u002FSee-Through'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Demo%2F在线演示-624aff.svg'>\u003C\u002Fa>\n\n\n_**[Jian Lin](https:\u002F\u002Fgithub.com\u002FdmMaze)\u003Csup>1\u003C\u002Fsup>, [Chengze Li](https:\u002F\u002Fmoeka.me)\u003Csup>1*\u003C\u002Fsup>, [Haoyun Qin](https:\u002F\u002Fhaoyunqin.com\u002F)\u003Csup>2,3,4\u003C\u002Fsup>, Kwun Wang Chan\u003Csup>1\u003C\u002Fsup>, 
[Yanghua Jin](https:\u002F\u002Fgithub.com\u002FAixile)\u003Csup>3\u003C\u002Fsup>, [Hanyuan Liu](https:\u002F\u002Fgithub.com\u002Fhyliu)\u003Csup>1\u003C\u002Fsup>, Stephen Chun Wang Choy\u003Csup>1\u003C\u002Fsup>, Xueting Liu\u003Csup>1\u003C\u002Fsup>**_\n\n\u003Csup>1\u003C\u002Fsup>Saint Francis University &emsp; \u003Csup>2\u003C\u002Fsup>University of Pennsylvania &emsp; \u003Csup>3\u003C\u002Fsup>Spellbrush &emsp; \u003Csup>4\u003C\u002Fsup>Shitagaki Lab\n\n\u003Csup>*\u003C\u002Fsup>Corresponding author\n\nConditionally accepted to appear in *ACM SIGGRAPH 2026 Conference Proceedings*.\n\n\u003C\u002Fdiv>\n\n---\n\n> **Notice:** This is an open-source research project. We have not set up any paid service for this tool. If you encounter a website charging for this functionality, it is not from us. Use at your own risk.\n>\n> **声明：** 本项目为开源研究项目，我们未开设任何付费服务。如遇到以此功能收费的网站，均与我们无关，请注意甄别。\n\n## TL;DR\n\nWe introduce a framework that automates the transformation of static anime illustrations into manipulatable **2.5D models**. Our approach decomposes a single image into fully inpainted, semantically distinct layers with inferred drawing orders — up to **23 layers** including hair, face, eyes, clothing, accessories, and more.\n\n![Our Representative Image](common\u002Fassets\u002Frepresentative.jpg)\n\n\n\u003Cdiv align=\"center\">\n  \n\nhttps:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F023d271f-d8d7-4f6b-9083-96e714fb93e0\n\n\n  \u003Cbr>\n  \u003Cem>This is our trailer video. Click to play.\u003C\u002Fem>\n\u003C\u002Fdiv>\n\n## Environment Setup\n\n```bash\n# 1. Create environment\nconda create -n see_through python=3.12 -y\nconda activate see_through\n\n# 2. Install PyTorch (CUDA 12.8)\n# aarch64 users: the pinned versions below may not be available; use torch>=2.9.0 instead\npip install torch==2.8.0+cu128 torchvision==0.23.0+cu128 torchaudio==2.8.0+cu128 \\\n  --index-url https:\u002F\u002Fdownload.pytorch.org\u002Fwhl\u002Fcu128\n\n# 3. 
Install dependencies (includes common utilities and annotators)\npip install -r requirements.txt\n\n# 4. Create assets symlink (you can also copy assets to the root if you prefer)\nln -sf common\u002Fassets assets\n```\n\n**Optional annotator tiers** (install as needed):\n\n| Tier | Command | What it adds |\n|------|---------|-------------|\n| Body parsing | `pip install --no-build-isolation -r requirements-inference-annotators.txt` | detectron2 for body attribute tagging |\n| SAM2 | `pip install --no-build-isolation -r requirements-inference-sam2.txt` | SAM2 for language-guided segmentation |\n| Instance seg | `pip install -r requirements-inference-mmdet.txt` | mmcv\u002Fmmdet for anime instance segmentation |\n\n> **Note:** Always run scripts from the repository root as the working directory.\n\n## Scripts & Models\n\n### Models\n\n| Model | HuggingFace Repo | Description |\n|-------|-----------------|-------------|\n| LayerDiff 3D | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002Flayerdifforg\u002Fseethroughv0.0.2_layerdiff3d'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-layerdiff3d-blue'>\u003C\u002Fa> | Diffusion-based transparent layer generation (SDXL) |\n| Marigold Depth | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002F24yearsold\u002Fseethroughv0.0.1_marigold'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-marigold__depth-blue'>\u003C\u002Fa> | Pseudo-depth estimation fine-tuned for anime |\n| SAM Body Parsing | \u003Ca href='https:\u002F\u002Fhuggingface.co\u002F24yearsold\u002Fl2d_sam_iter2'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97-sam__body__parsing-blue'>\u003C\u002Fa> | Semantic body part segmentation|\n\n### Inference Scripts\n\n| Script | Purpose |\n|--------|---------|\n| `inference\u002Fscripts\u002Finference_psd.py` | **Main pipeline** — end-to-end layer decomposition → PSD output |\n| `inference\u002Fscripts\u002Fsyn_data.py` | Synthetic 
training data generation utilities |\n\n> For the other inference\u002Fdata-parsing scripts, refer to the [codebase](.\u002Finference\u002Fscripts\u002F) and check their docstrings for details.\n\n### Demo\n\n| Notebook | Description |\n|----------|-------------|\n| `inference\u002Fdemo\u002Fbodypartseg_sam.ipynb` | Interactive body part segmentation demo with visualization (19 parts) |\n\n> For the complete list of body tag definitions, refer to [scrap_model.py](.\u002Fcommon\u002Flive2d\u002Fscrap_model.py).\n\n### Online Demo\n\nWe have prepared [a HuggingFace Space](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002F24yearsold\u002Fsee-through-demo) running on ZeroGPU. Registered HuggingFace users should be able to run 1-2 PSD extractions per day (approximately 2-3 minutes each, at 1280 resolution).\n\nFor users in Mainland China, we also provide a [ModelScope demo](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002Fljsabc\u002FSee-Through). It is currently free and supports a slightly higher resolution than the HuggingFace demo. We will continue to maintain both demos so that they remain accessible to users worldwide.\n\n中国大陆用户可以使用[魔搭社区 ModelScope 在线演示](https:\u002F\u002Fmodelscope.cn\u002Fstudios\u002Fljsabc\u002FSee-Through)，目前完全免费，并且可以使用更高一点的分辨率。\n\n\u003Cimg alt=\"image\" src=\"https:\u002F\u002Fgithub.com\u002Fuser-attachments\u002Fassets\u002F3f98f47b-e98b-4628-9859-8772cda69f93\" \u002F>\n\n(Copyright [Tohoku Zunko Project](https:\u002F\u002Fzunko.jp\u002F)).\n\n\n## Usage\n\n### Layer Decomposition (main pipeline)\n\n`inference_psd.py` runs the full See-through pipeline: it applies the **LayerDiff 3D** model\nfor transparent layer generation and the fine-tuned **Marigold** model for pseudo-depth\ninference, then stratifies the character into up to **23 semantic layers** and exports a\nlayered PSD file. Note that the head and body are separated in two consecutive stages, which\nmay take longer than the runtime reported for the original model in the paper. 
\n\n```bash\n# Decompose a single image into a layered PSD\npython inference\u002Fscripts\u002Finference_psd.py \\\n  --srcp assets\u002Ftest_image.png \\\n  --save_to_psd\n\n# Process a directory of images\npython inference\u002Fscripts\u002Finference_psd.py \\\n  --srcp path\u002Fto\u002Fimage_folder\u002F \\\n  --save_to_psd\n```\n\nOutput is saved to `workspace\u002Flayerdiff_output\u002F` by default. Each result includes:\n- A layered `.psd` file with semantically separated layers\n- Intermediate depth maps and segmentation masks\n\n> **Note:** This uses our most recent model with 23-layer body part separation (V3).\n\nOnce layer splitting has finished, you can further process the PSD with `inference\u002Fscripts\u002Fheuristic_partseg.py` for depth-based or left-right stratification.\n\n```bash\n# Split based on depth\npython inference\u002Fscripts\u002Fheuristic_partseg.py seg_wdepth --srcp workspace\u002Ftest_samples_output\u002FPV_0047_A0020.psd --target_tags handwear\n\n# Left-right split\npython inference\u002Fscripts\u002Fheuristic_partseg.py seg_wlr --srcp workspace\u002Ftest_samples_output\u002FPV_0047_A0020_wdepth.psd --target_tags handwear-1\n```\n\n### Low-VRAM Users\n\nThe default pipeline runs at bf16 precision and requires approximately 12-16 GB of VRAM at 1280 resolution.\n\n**12 GB GPUs**: Enable group offload to reduce peak VRAM to ~10 GB at 1280 resolution:\n\n```bash\npython inference\u002Fscripts\u002Finference_psd.py \\\n  --srcp assets\u002Ftest_image.png \\\n  --save_to_psd \\\n  --group_offload\n```\n\n**8 GB GPUs**: Use the NF4 quantized pipeline, which loads 4-bit quantized model weights. 
This achieves ~8 GB peak VRAM at 1280 resolution; VRAM can be reduced further by lowering the resolution (group offload is enabled by default in this script):\n\n```bash\n# Install bitsandbytes (one-time)\npip install -r requirements-inference-bnb.txt\n\n# Run with NF4 quantization (default: group_offload on, depth resolution 720)\npython inference\u002Fscripts\u002Finference_psd_quantized.py \\\n  --srcp assets\u002Ftest_image.png \\\n  --save_to_psd\n\n# For even lower VRAM, reduce layerdiff resolution to 1024\npython inference\u002Fscripts\u002Finference_psd_quantized.py \\\n  --srcp assets\u002Ftest_image.png \\\n  --save_to_psd \\\n  --resolution 1024\n```\n\nThe quantized models are hosted on HuggingFace and downloaded automatically on first run. Quality is close to that of the full-precision model (PSNR ~30 dB, SSIM ~0.96 vs. the bf16 baseline).\n\n> **Note:** Group offload trades speed for VRAM savings (roughly 1.5x slower). NF4 quantization adds minimal speed overhead while reducing the memory used by model weights.\n\n**8 GB GPUs (alternative)**: The block swap pipeline achieves ~8 GB peak VRAM at 1280 resolution while keeping bf16 precision:\n\n```bash\npython inference\u002Fscripts\u002Finference_psd_blockswap.py \\\n  --srcp assets\u002Ftest_image.png \\\n  --save_to_psd\n```\n\n### Preparing the dataset for training (e.g., Live2D Parsing)\n\nWe provide a separate repo for preparing the dataset used to train the Live2D parsing model. Please refer to [CubismPartExtr](https:\u002F\u002Fgithub.com\u002Fshitagaki-lab\u002FCubismPartExtr) for how to download the sample model files and prepare your workspace folder.\n\nAfter that, refer to `README_datapipeline.md` for instructions on running the data parsing scripts that prepare the dataset for inspection and training.\n\n### User Interface\n\nOnce your data is prepared, you can move on to the user interfaces. 
Refer to the [UI Readme](ui\u002FREADME.md) for instructions on launching the UI.\n\n> The UI currently requires the `workspace\u002Fdatasets\u002F` folder at the repository root, as it contains the sample data for demonstration. We will work on making this more flexible in the future.\n\n> We recommend installing the `mmdet` tier dependencies to ensure the UI launches successfully.\n\n\n### Training\n\nTraining scripts for all models (LayerDiff, Marigold depth, VAE, body part segmentation)\nare available in [`training\u002F`](training\u002FREADME.md), along with configs and data pipeline\nutilities. Our training was conducted on 8x NVIDIA H200 GPUs.\n\n\n## Community Support\n\nWe welcome community contributions and third-party integrations!\n\nIf you build tools, extensions, or workflows on top of this project, please let us know by opening an issue or pull request; we would be happy to feature your work here.\n\n- [ComfyUI-See-through](https:\u002F\u002Fgithub.com\u002Fjtydhr88\u002FComfyUI-See-through) by [@jtydhr88](https:\u002F\u002Fgithub.com\u002Fjtydhr88) — Integration for ComfyUI, with a node-based workflow and in-browser PSD export. Thank you for the amazing work!\n- [PachiPakuGen](https:\u002F\u002Fgithub.com\u002Fkazuya-bros\u002FPachiPakuGen) by [@kazuya-bros](https:\u002F\u002Fgithub.com\u002Fkazuya-bros) — Desktop tool that takes See-Through's decomposed PSD output and generates animation materials (eye blinks, lip-sync mouth shapes) for [SpriTalk](https:\u002F\u002Fkazuyabros.booth.pm\u002Fitems\u002F8102679), a talking-character animation tool. Visit their [Booth](https:\u002F\u002Fkazuyabros.booth.pm\u002Fitems\u002F8102679) for the tool and demo videos!\n- [StretchyStudio](https:\u002F\u002Fgithub.com\u002FMangoLion\u002Fstretchystudio) — Free, in-browser 2D puppet animation tool that auto-rigs See-through's decomposed PSD layers, closing the gap between decomposition and a fully animatable character. 
Drop our PSD output directly into it and it just works. Check out their [Reddit thread](https:\u002F\u002Fwww.reddit.com\u002Fr\u002FStableDiffusion\u002Fcomments\u002F1sjj7ta\u002Ffree_opensource_tool_to_instantly_rig_and_animate\u002F) and [live editor](https:\u002F\u002Feditor.stretchy.studio).\n\nWe are also looking for help with internationalization (i18n) of this project; contributions are greatly appreciated.\n\n\n## Discussion: Is this Image-to-Live2D?\n\nWe don't think so, at least not yet.\n\nWhile we produce 2.5D layer decompositions from a single image,\nthe full Image-to-Live2D pipeline requires significantly more:\n\n1. **Finer artistic decomposition.** Live2D models demand layers designed with specific\n   deformation behaviors in mind. Our automatic decomposition prioritizes semantic\n   correctness, but a Live2D artist would make different artistic choices about how\n   to split layers for natural-looking motion.\n\n2. **Rigging.** After decomposition, a Live2D model needs a deformation mesh, physics\n   parameters, and motion curves. This rigging process is arguably the most critical\n   (and labor-intensive) step, and it is not covered by this project.\n\n3. **Artistic intent.** Professional Live2D works are crafted holistically: the layer\n   structure, inpainting style, and rigging are designed together. Automating one step\n   in isolation cannot replicate this.\n\nThat said, we believe our decomposition can serve as a useful **starting point** for\nLive2D artists by eliminating some of the most tedious parts of the workflow, such as manual segmentation\nand inpainting of occluded regions.\n\n## Changelog\n\n**2026-04-14**\n- Released training scripts, configs, and data pipeline for all models (LayerDiff, Marigold depth, VAE, body part segmentation). 
This is the V3 model, trained with 23 body-part tags.\n\n**2026-04-02**\n- Multiple memory optimizations; added suggestions for low-VRAM users (group offload, NF4 quantization).\n\n## Acknowledgements\n\nThis work is funded and substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. UGC\u002FFDS11\u002FE02\u002F23).\n\nWe would like to thank the following people for their help and support:\n\n+ [Dingkun Yan](https:\u002F\u002Fscholar.google.com\u002Fcitations?user=dM0hOpIAAAAJ&hl=en) and [Xinrui Wang](https:\u002F\u002Fsystemerrorwang.github.io\u002F) for their inspiration and support on the project.\n+ [USTC Student ACG Club \"LEO\"](https:\u002F\u002Fspace.bilibili.com\u002F7021308) for kindly providing the sample Live2D model files used for the demonstrations in the paper.\n\nThis is an open-source research project.\nWe thank the authors of the following projects that made this work possible:\n\n- [LayerDiffuse](https:\u002F\u002Fgithub.com\u002Flllyasviel\u002FLayerDiffuse_DiffusersCLI) — Transparent image layer diffusion (Lvmin Zhang is always a legend)\n- [Marigold](https:\u002F\u002Fgithub.com\u002Fprs-eth\u002FMarigold) — Diffusion-based monocular depth estimation\n- [Segment Anything (SAM)](https:\u002F\u002Fgithub.com\u002Ffacebookresearch\u002Fsegment-anything) — Foundation model for segmentation\n- [Grounding DINO](https:\u002F\u002Fgithub.com\u002FIDEA-Research\u002FGroundingDINO) — Open-set object detection\n- [LaMa](https:\u002F\u002Fgithub.com\u002Fadvimman\u002Flama) — Large mask inpainting\n- [AnimeInstanceSegmentation](https:\u002F\u002Fgithub.com\u002FdreMaz\u002FAnimeInstanceSegmentation) — Anime-specific instance segmentation\n\n## Citation\n\nIf you find this work useful, please cite:\n\n```bibtex\n@article{lin2026seethrough,\n  title={See-through: Single-image Layer Decomposition for Anime Characters},\n  author={Lin, Jian and Li, Chengze and Qin, Haoyun 
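and Chan, Kwun Wang and Jin, Yanghua and Liu, Hanyuan and Choy, Stephen Chun Wang and Liu, Xueting},\n  journal={arXiv preprint arXiv:2602.03749},\n  year={2026}\n}\n```\n","See-through is a framework that automatically converts static anime illustrations into manipulable 2.5D models. From a single image, it can decompose a character into up to 23 fully inpainted, semantically distinct layers, including hair, face, eyes, clothing, and accessories, and infer the drawing order of these layers. It is developed in Python and relies on deep learning libraries such as PyTorch to carry out its image processing tasks. It is particularly suited to anime character design, animation production, and other scenarios that call for giving 2D artwork a three-dimensional treatment. The project has been conditionally accepted for publication at SIGGRAPH 2026.",2,"2026-06-11 03:48:58","high_star"]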