[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-78249":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":14,"forks30d":14,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":9,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":14,"starSnapshotCount":14,"syncStatus":15,"lastSyncTime":27,"discoverSource":28},78249,"T2I-L2P","TencentYoutuResearch\u002FT2I-L2P","TencentYoutuResearch","Code for \"L2P: Unlocking Latent Potential for Pixel Generation\"",null,"Python",169,11,1,0,2,7,119,8,3.24,false,"main",true,[],"2026-06-12 02:03:46","\u003Cdiv align=\"center\">\n\n# L2P: Unlocking Latent Potential for Pixel Generation\n\n\u003Cp>\n  \u003Ca href=\"https:\u002F\u002Fnju-pcalab.github.io\u002Fprojects\u002FL2P\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject-Page-1f72ff?style=for-the-badge&logo=githubpages&logoColor=white\" alt=\"Project Page\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.12013\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv-2605.12013-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white\" alt=\"arXiv\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fdatasets\u002Fzhen-nan\u002FL2P-dataset\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDataset-L2P-ffcc4d?style=for-the-badge&logo=huggingface&logoColor=white\" alt=\"Dataset\">\n  \u003C\u002Fa>\n  \u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmultimodalart\u002Fz-image-6b-pixel-space\">\n    \u003Cimg 
src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHF-Space-ff9d00?style=for-the-badge&logo=huggingface&logoColor=white\" alt=\"HF Space\">\n  \u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cp>\n  \u003Cem>An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cdiv align=\"center\">\n\u003Csub>⭐ If L2P helps your research or product, please consider giving the repo a star ⭐\u003C\u002Fsub>\n\u003C\u002Fdiv>\n\n\n\u003C\u002Fdiv>\n\n---\n\n## 📰 News\n\n- **\\[2026.05.12\\]** Technical report released.\n- **\\[2026.05.22\\]** 1K-resolution training code, inference code, weights, and dataset released.\n- **\\[2026.05.23\\]** Online [demo](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmultimodalart\u002Fz-image-6b-pixel-space) released. (Thanks to [multimodalart](https:\u002F\u002Fhuggingface.co\u002Fmultimodalart) for the support!)\n\n---\n\n## 🗺️ Roadmap\n\n| Status | Item |\n| :---: | :--- |\n| ✅ | 1K inference code & weights |\n| ✅ | Training code |\n| 🛠️ | 4K\u002F8K\u002F10K UHR generation |\n| 🛠️ | Compatibility with more LDM models |\n\n---\n\n## 📦 Installation\n\n```bash\ngit clone https:\u002F\u002Fgithub.com\u002FTencentYoutuResearch\u002FT2I-L2P.git\ncd T2I-L2P\npip install -e .\n```\n\n\n---\n\n\n\n## 🎨 Inference\n\nCheckpoint:\n\n| Model | Params | HuggingFace |\n|-------|--------|-------------|\n| L2P-z-image (1K resolution) | 6B | [🤗](https:\u002F\u002Fhuggingface.co\u002Fzhen-nan\u002FL2P) |\n\n```python\nimport torch\nfrom diffsynth.pipelines.z_image_L2P import ZImagePipeline, ModelConfig\n\nmain_model_path = \"\u002Fpath\u002Fmodel-1k-merge.safetensors\"\n\ntext_encoder_paths = [\n    \"\u002Fpath\u002FZ-Image-Turbo\u002Ftext_encoder\u002Fmodel-00001-of-00003.safetensors\",\n    \"\u002Fpath\u002FZ-Image-Turbo\u002Ftext_encoder\u002Fmodel-00002-of-00003.safetensors\",\n    
\"\u002Fpath\u002FZ-Image-Turbo\u002Ftext_encoder\u002Fmodel-00003-of-00003.safetensors\",\n]\n\ntokenizer_path = \"\u002Fpath\u002FZ-Image-Turbo\u002Ftokenizer\"\n\npipe = ZImagePipeline.from_pretrained(\n    torch_dtype=torch.bfloat16,\n    device=\"cuda\",\n    model_configs=[\n        ModelConfig(path=[main_model_path]),\n        ModelConfig(path=text_encoder_paths),\n    ],\n    tokenizer_config=ModelConfig(path=tokenizer_path),\n)\n\nprompt = \"an origami pig on fire in the middle of a dark room with a pentagram on the floor\"\n\nimage = pipe(\n    prompt=prompt,\n    seed=42,\n    rand_device=\"cuda\",\n    num_inference_steps=30,\n    cfg_scale=2.0,\n    height=1024,\n    width=1024,\n)\n\nimage.save(\"example.png\")\n```\n\n### Gradio Demo\n\nFirst, install gradio:\n\n```bash\npip install gradio\n```\n\nLaunch a multi-GPU web UI:\n\n```bash\npython app.py\n```\n\nThe demo auto-detects free GPUs, dispatches each request to an idle device, and exposes a Gradio interface at `http:\u002F\u002F0.0.0.0:23231`.\n\n---\n\n## 🏋️ Training\n\nThe full training pipeline consists of four steps:\n**(1)** prepare the Z-Image base weights → **(2)** convert them into a pixel-space initialization → **(3)** launch training → **(4)** merge the trained delta back with the pixel-init weights for inference.\n\n### Step 1 · Prepare Z-Image weights\n\nDownload the official **Z-Image-Turbo** checkpoint from Hugging Face:\n\n- 🤗 [Tongyi-MAI\u002FZ-Image-Turbo](https:\u002F\u002Fhuggingface.co\u002FTongyi-MAI\u002FZ-Image-Turbo)\n\n\n\n### Step 2 · Offline weight conversion (latent → pixel init)\n\nConvert the latent-space DiT weights into a **pixel-space initialization** that L2P can fine-tune from:\n\n```bash\npython examples\u002Fz_image\u002FL2P_convert_weight.py \\\n  --latent_ckpt_files \\\n    \u002Fpath\u002Fto\u002FZ-Image-Turbo\u002Ftransformer\u002Fdiffusion_pytorch_model-00001-of-00003.safetensors \\\n    
\u002Fpath\u002Fto\u002FZ-Image-Turbo\u002Ftransformer\u002Fdiffusion_pytorch_model-00002-of-00003.safetensors \\\n    \u002Fpath\u002Fto\u002FZ-Image-Turbo\u002Ftransformer\u002Fdiffusion_pytorch_model-00003-of-00003.safetensors \\\n  --output_path .\u002Fpretrain_weight\u002FZ-Image-Pixel-Init\u002Fdiffusion_pytorch_model.safetensors\n```\n\n### Step 3 · Launch training\n\n**Standard training** :\n\n```bash\nbash train_run.sh\n```\n\n**Low-VRAM training** (single GPU \u003C 24 GB VRAM):\n\n```bash\nbash train_run_low_VRAM.sh\n```\n\n#### Dataset format\n\nProvide a directory of images plus a CSV metadata file:\n\n```\ndata\u002F\n├── images\u002F                # raw image folder\n└── metadata.csv           # columns: file_name, text, ...\n```\n\n### Step 4 · Offline weight merge (for inference)\n\n```bash\npython merge_weights.py \\\n  --file_a .\u002Fmodels\u002Ftrain\u002FL2P_Standard\u002Fstep-xxx.safetensors \\\n  --file_b .\u002Fpretrain_weight\u002FZ-Image-Pixel-Init\u002Fdiffusion_pytorch_model.safetensors \\\n  --file_out .\u002Fmodels\u002Ftrain\u002FL2P_Standard\u002Fmodel-merge.safetensors\n```\n\n- `--file_a`: trained checkpoint from Step 3\n- `--file_b`: pixel-init weights from Step 2\n- `--file_out`: merged single-file weight\n---\n\n## 📜 Citation\n\nIf you find this work useful, please consider citing:\n\n```bibtex\n@article{chen2026l2p,\n  title   = {L2P: Unlocking Latent Potential for Pixel Generation},\n  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and\n             Chen, Jiawei and Zeng, Zhuoqi and Zhang, Wei and Wang, Chengjie and\n             Yang, Jian and Tai, Ying},\n  journal = {arXiv preprint arXiv:2605.12013},\n  year    = {2026}\n}\n\n@article{chen2025dip,\n  title   = {DiP: Taming Diffusion Models in Pixel Space},\n  author  = {Chen, Zhennan and Zhu, Junwei and Chen, Xu and Zhang, Jiangning and\n             Hu, Xiaobin and Zhao, Hanzhen and Wang, Chengjie and Yang, Jian and\n             Tai, Ying},\n 
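 journal = {arXiv preprint arXiv:2511.18822},\n  year    = {2025}\n}\n```\n\n---\n\n## 🙏 Acknowledgements\n\nL2P is built upon the excellent open-source work of\n[**DiffSynth-Studio**](https:\u002F\u002Fgithub.com\u002Fmodelscope\u002FDiffSynth-Studio) and\n[**Z-Image**](https:\u002F\u002Fgithub.com\u002FTongyi-MAI\u002FZ-Image).\n\n","L2P aims to deliver high-quality, end-to-end pixel-space diffusion generation through an efficient transfer paradigm while minimizing computational overhead and data requirements. Its core features are 1K-resolution image generation with released inference code and weights; 4K\u002F8K\u002F10K ultra-high-resolution generation is under development. L2P is written in Python and uses diffusion models to unlock latent potential for pixel generation, with compatibility for more latent diffusion models (LDMs) planned. The project is well suited to research and product development that need efficient image synthesis, such as artistic creation and virtual-reality content generation.","2026-06-11 03:56:41","CREATED_QUERY"]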