[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-80125":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":16,"stars7d":13,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":18,"compositeScore":19,"rankGlobal":9,"rankLanguage":9,"license":20,"archived":21,"fork":21,"defaultBranch":22,"hasWiki":23,"hasPages":21,"topics":24,"createdAt":9,"pushedAt":9,"updatedAt":25,"readmeContent":26,"aiSummary":27,"trendingCount":15,"starSnapshotCount":15,"syncStatus":16,"lastSyncTime":28,"discoverSource":29},80125,"PAE","ZhengrongYue\u002FPAE","ZhengrongYue","Official Implementation of \"What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion\"",null,"Python",61,5,3,1,0,2,7,6,48.53,"MIT License",false,"main",true,[],"2026-06-12 04:01:26","\u003Cdiv align=\"center\">\n\u003Ch1>What Matters for Diffusion-Friendly Latent Manifold? 
Prior-Aligned Autoencoders for Latent Diffusion\u003C\u002Fh1>\n\u003Cdiv align=\"center\">\n\n\u003Ca href=\"https:\u002F\u002Fzhengrongyue.github.io\u002Fpae.github.io\u002F\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FProject_Page-0055b3?logo=githubpages&logoColor=white\" alt=\"Project Page\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fhuggingface.co\u002Fyuezhengrong\u002FPAE-collections\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FHuggingface-Model-PAE\" alt=\"Huggingface\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fwww.modelscope.cn\u002Fmodels\u002FZhengrongYue\u002FPAE-Collections\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FModelScope-Models-purple\" alt=\"ModelScope\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FZhengrongYue\u002FPAE\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FGitHub-ZhengrongYue%2FPAE-black?logo=github&logoColor=white\" alt=\"GitHub\">\n\u003C\u002Fa>\n\u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fpdf\u002F2605.07915\">\n    \u003Cimg src=\"https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPaper-2605.07915-b31b1b?logo=arxiv&logoColor=white\" alt=\"arXiv Paper\">\n\u003C\u002Fa>\n\u003C\u002Fdiv>\n\n\n\u003C\u002Fdiv>\n\nThis project presents **PAE** (Prior-Aligned AutoEncoder), a tokenizer framework that explicitly shapes a **diffusion-friendly latent manifold** for latent diffusion models. Instead of relying solely on reconstruction fidelity or passively inheriting pretrained representations, PAE identifies and optimizes three key properties of a diffusion-friendly latent space — **spatial structure coherence**, **local manifold continuity**, and **global manifold semantics** — through targeted prior-alignment regularizations. On ImageNet 256×256, PAE achieves a new **state-of-the-art gFID of 1.03** with up to **13× faster convergence** than RAE under the same LightningDiT setup. 
\u003Cbr>\u003Cbr>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fpae_teaser.png\" width=\"90%\">\n  \u003Cbr>\n  \u003Cem>\n  Prior alignment constructs a diffusion-friendly latent manifold. Left: Compared with reconstruction-oriented counterparts, the prior-aligned latent manifold is more structurally coherent, locally continuous, and semantically organized. Right: PAE yields faster convergence, better generation quality, and robust few-step sampling performance.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fpae_gen_vis.png\" width=\"90%\">\n  \u003Cbr>\n  \u003Cem>Class-conditional samples generated by PAE with LightningDiT-XL\u002F1 on ImageNet 256×256.\u003C\u002Fem>\n\u003C\u002Fp>\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fpae_sota.png\" width=\"90%\">\n  \u003Cbr>\n  \u003Cem>At only 80 epochs, PAE achieves a gFID of 1.27. This performance already outperforms many baselines trained for 800 epochs, such as FAE (gFID 1.29) and AlignTok (gFID 1.37). With extended training, PAE sets a new SOTA gFID of 1.03 on ImageNet 256×256.\u003C\u002Fem>\n\u003C\u002Fp>\n\n## 🔥 Updates\n\n* **[2026.05.09]** 🚀 🚀 🚀 We release **PAE**. Code and pretrained models are now available!\n* **[2026.05.10]** 🛠️ **Scale-PAE** is currently in progress. 
Once it is complete, we will release the full dynamic-resolution PAE training code.\n\n## ✨ Highlights\n\n- 🎯 **New Perspective**: We study what makes a latent manifold diffusion-friendly, identifying three key properties: spatial structure coherence, local manifold continuity, and global manifold semantics.\n- 🏗️ **Explicit Manifold Shaping**: PAE turns these properties into explicit training objectives via three prior-alignment regularizations (SSR, MCR, SCR), rather than leaving them to emerge indirectly.\n- ⚡ **13× Faster Convergence**: PAE reaches performance comparable to RAE with up to 13× fewer training epochs under the same LightningDiT setup.\n- 🏆 **State-of-the-Art**: Achieves gFID **1.03** on ImageNet 256×256, the best result among all compared methods.\n- 🔄 **Encoder-Agnostic**: Compatible with multiple VFM backbones including DINOv2, SigLIP2, DINOv3, and MAE.\n\n\n## 🏛️ Architecture\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fassets\u002Fpae_framework.png\" width=\"90%\">\n  \u003Cbr>\n  \u003Cem>Overview of the PAE framework. A frozen VFM provides stable representation features. DAM injects pixel detail while preserving the VFM as the dominant semantic source. Three prior-alignment objectives explicitly shape the latent manifold.\u003C\u002Fem>\n\u003C\u002Fp>\n\n## ❤️ Acknowledgement\nOur work builds upon the foundations laid by many excellent projects in the field. We would like to thank the authors of [LightningDiT](https:\u002F\u002Fgithub.com\u002Fhustvl\u002FLightningDiT), [RAE](https:\u002F\u002Fgithub.com\u002Fbytetriper\u002FRAE), [GAE](https:\u002F\u002Fgithub.com\u002Fsii-research\u002FGAE), and [ADM](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion). We are grateful for their contributions to the community.\n\n## 📘 Citation\nIf you find this work helpful, please consider citing it as follows.\n```\n@misc{yue2026mattersdiffusionfriendlylatentmanifold,\n      title={What Matters for Diffusion-Friendly Latent Manifold? 
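Prior-Aligned Autoencoders for Latent Diffusion}, \n      author={Zhengrong Yue and Taihang Hu and Mengting Chen and Haiyu Zhang and Zihao Pan and Tao Liu and Zikang Wang and Jinsong Lan and Xiaoyong Zhu and Bo Zheng and Yali Wang},\n      year={2026},\n      eprint={2605.07915},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV},\n      url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2605.07915}, \n}\n```\n","PAE (Prior-Aligned AutoEncoder) is a framework for building a diffusion-friendly latent manifold for latent diffusion models. Through targeted prior-alignment regularizations, it optimizes three key properties of the latent space: spatial structure coherence, local manifold continuity, and global manifold semantics. This improves generation quality and convergence speed without relying solely on reconstruction fidelity or pretrained representations. Experiments on ImageNet 256×256 show that PAE reaches a gFID of 1.03 and converges up to 13× faster than the comparable method RAE. The project suits applications that need efficient generation of high-quality images, such as image synthesis and style transfer.","2026-06-11 03:59:21","CREATED_QUERY"]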