[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-74052":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":17,"stars7d":18,"stars30d":19,"stars90d":16,"forks30d":16,"starsTrendScore":20,"compositeScore":21,"rankGlobal":10,"rankLanguage":10,"license":22,"archived":23,"fork":23,"defaultBranch":24,"hasWiki":23,"hasPages":23,"topics":25,"createdAt":10,"pushedAt":10,"updatedAt":38,"readmeContent":39,"aiSummary":40,"trendingCount":16,"starSnapshotCount":16,"syncStatus":41,"lastSyncTime":42,"discoverSource":43},74052,"VAR","FoundationVision\u002FVAR","FoundationVision","[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of \"Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction\". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!","",null,"Jupyter Notebook",8698,568,100,57,0,3,5,16,9,39.27,"MIT License",false,"main",[26,27,28,29,30,31,32,33,34,35,36,37],"auto-regressive-model","autoregressive-models","diffusion-models","generative-ai","generative-model","gpt","gpt-2","image-generation","large-language-models","neurips","transformers","vision-transformer","2026-06-12 02:03:21","# VAR: a new visual generation method elevates GPT-style models beyond diffusion🚀 & Scaling laws observed📈\n\n\u003Cdiv align=\"center\">\n\n[![demo platform](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FPlay%20with%20VAR%21-VAR%20demo%20platform-lightblue)](https:\u002F\u002Fopensource.bytedance.com\u002Fgmpt\u002Ft2i\u002Finvite)&nbsp;\n[![arXiv](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FarXiv%20paper-2404.02905-b31b1b.svg)](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02905)&nbsp;\n[![huggingface 
weights](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Weights-FoundationVision\u002Fvar-yellow)](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar)&nbsp;\n[![SOTA](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FState%20of%20the%20Art-Image%20Generation%20on%20ImageNet%20%28AR%29-32B1B4?logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPHN2ZyB3aWR0aD0iNjA2IiBoZWlnaHQ9IjYwNiIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIiB4bWxuczp4bGluaz0iaHR0cDovL3d3dy53My5vcmcvMTk5OS94bGluayIgb3ZlcmZsb3c9ImhpZGRlbiI%2BPGRlZnM%2BPGNsaXBQYXRoIGlkPSJjbGlwMCI%2BPHJlY3QgeD0iLTEiIHk9Ii0xIiB3aWR0aD0iNjA2IiBoZWlnaHQ9IjYwNiIvPjwvY2xpcFBhdGg%2BPC9kZWZzPjxnIGNsaXAtcGF0aD0idXJsKCNjbGlwMCkiIHRyYW5zZm9ybT0idHJhbnNsYXRlKDEgMSkiPjxyZWN0IHg9IjUyOSIgeT0iNjYiIHdpZHRoPSI1NiIgaGVpZ2h0PSI0NzMiIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSIxOSIgeT0iNjYiIHdpZHRoPSI1NyIgaGVpZ2h0PSI0NzMiIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSIyNzQiIHk9IjE1MSIgd2lkdGg9IjU3IiBoZWlnaHQ9IjMwMiIgZmlsbD0iIzQ0RjJGNiIvPjxyZWN0IHg9IjEwNCIgeT0iMTUxIiB3aWR0aD0iNTciIGhlaWdodD0iMzAyIiBmaWxsPSIjNDRGMkY2Ii8%2BPHJlY3QgeD0iNDQ0IiB5PSIxNTEiIHdpZHRoPSI1NyIgaGVpZ2h0PSIzMDIiIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSIzNTkiIHk9IjE3MCIgd2lkdGg9IjU2IiBoZWlnaHQ9IjI2NCIgZmlsbD0iIzQ0RjJGNiIvPjxyZWN0IHg9IjE4OCIgeT0iMTcwIiB3aWR0aD0iNTciIGhlaWdodD0iMjY0IiBmaWxsPSIjNDRGMkY2Ii8%2BPHJlY3QgeD0iNzYiIHk9IjY2IiB3aWR0aD0iNDciIGhlaWdodD0iNTciIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSI0ODIiIHk9IjY2IiB3aWR0aD0iNDciIGhlaWdodD0iNTciIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSI3NiIgeT0iNDgyIiB3aWR0aD0iNDciIGhlaWdodD0iNTciIGZpbGw9IiM0NEYyRjYiLz48cmVjdCB4PSI0ODIiIHk9IjQ4MiIgd2lkdGg9IjQ3IiBoZWlnaHQ9IjU3IiBmaWxsPSIjNDRGMkY2Ii8%2BPC9nPjwvc3ZnPg%3D%3D)](https:\u002F\u002Fpaperswithcode.com\u002Fsota\u002Fimage-generation-on-imagenet-256x256?tag_filter=485&p=visual-autoregressive-modeling-scalable-image)\n\n\n\u003C\u002Fdiv>\n\u003Cp align=\"center\" style=\"font-size: larger;\">\n  \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02905\">Visual Autoregressive Modeling: 
Scalable Image Generation via Next-Scale Prediction\u003C\u002Fa>\n\u003C\u002Fp>\n\n\u003Cdiv>\n  \u003Cp align=\"center\" style=\"font-size: larger;\">\n    \u003Cstrong>NeurIPS 2024 Best Paper\u003C\u002Fstrong>\n  \u003C\u002Fp>\n\u003C\u002Fdiv>\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002F9850df90-20b1-4f29-8592-e3526d16d755\" width=95%>\n\u003Cp>\n\n\u003Cbr>\n\n## News\n* **2025-11:** We Release our Text-to-Video generation model **InfinityStar** based on VAR & Infinity, please check [Infinity⭐️](https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FInfinityStar).\n* **2025-11:** 🎉 InfinityStar is accepted as **NeurIPS 2025 Oral.**\n* **2025-04:** 🎉 Infinity is accepted as **CVPR 2025 Oral.**\n* **2024-12:** 🏆 VAR received **NeurIPS 2024 Best Paper Award**.\n* **2024-12:** 🔥 We Release our Text-to-Image research based on VAR, please check [Infinity](https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FInfinity).\n* **2024-09:** VAR is accepted as **NeurIPS 2024 Oral** Presentation.\n* **2024-04:** [Visual AutoRegressive modeling](https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR) is released.\n\n## 🕹️ Try and Play with VAR!\n\n~~We provide a [demo website](https:\u002F\u002Fvar.vision\u002Fdemo) for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!~~\n\nWe provide a [demo website](https:\u002F\u002Fopensource.bytedance.com\u002Fgmpt\u002Ft2i\u002Finvite) for you to play with VAR Text-to-Image and generate images interactively. 
Enjoy the fun of visual autoregressive modeling!\n\nWe also provide [demo_sample.ipynb](demo_sample.ipynb) for you to see more technical details about VAR.\n\n[\u002F\u002F]: # (\u003Cp align=\"center\">)\n[\u002F\u002F]: # (\u003Cimg src=\"https:\u002F\u002Fuser-images.githubusercontent.com\u002F39692511\u002F226376648-3f28a1a6-275d-4f88-8f3e-cd1219882488.png\" width=50%)\n[\u002F\u002F]: # (\u003Cp>)\n\n\n## What's New?\n\n### 🔥 Introducing VAR: a new paradigm in autoregressive visual generation✨:\n\nVisual Autoregressive Modeling (VAR) redefines the autoregressive learning on images as coarse-to-fine \"next-scale prediction\" or \"next-resolution prediction\", diverging from the standard raster-scan \"next-token prediction\".\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002F3e12655c-37dc-4528-b923-ec6c4cfef178\" width=93%>\n\u003Cp>\n\n### 🔥 For the first time, GPT-style autoregressive models surpass diffusion models🚀:\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002Fcc30b043-fa4e-4d01-a9b1-e50650d5675d\" width=55%>\n\u003Cp>\n\n\n### 🔥 Discovering power-law Scaling Laws in VAR transformers📈:\n\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002Fc35fb56e-896e-4e4b-9fb9-7a1c38513804\" width=85%>\n\u003Cp>\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002F91d7b92c-8fc3-44d9-8fb4-73d6cdb8ec1e\" width=85%>\n\u003Cp>\n\n\n### 🔥 Zero-shot generalizability🛠️:\n\n\u003Cp align=\"center\">\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FVAR\u002Fassets\u002F39692511\u002Fa54a4e52-6793-4130-bae2-9e459a08e96a\" width=70%>\n\u003Cp>\n\n#### For a deep dive into our analyses, discussions, and evaluations, check 
out our [paper](https:\u002F\u002Farxiv.org\u002Fabs\u002F2404.02905).\n\n\n## VAR zoo\nWe provide VAR models for you to play with, which are on \u003Ca href='https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar'>\u003Cimg src='https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Huggingface-FoundationVision\u002Fvar-yellow'>\u003C\u002Fa> or can be downloaded from the following links:\n\n|   model    | reso. |   FID    | rel. cost | #params | HF weights🤗                                                                        |\n|:----------:|:-----:|:--------:|:---------:|:-------:|:------------------------------------------------------------------------------------|\n|  VAR-d16   |  256  |   3.55   |    0.4    |  310M   | [var_d16.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d16.pth) |\n|  VAR-d20   |  256  |   2.95   |    0.5    |  600M   | [var_d20.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d20.pth) |\n|  VAR-d24   |  256  |   2.33   |    0.6    |  1.0B   | [var_d24.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d24.pth) |\n|  VAR-d30   |  256  |   1.97   |     1     |  2.0B   | [var_d30.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d30.pth) |\n| VAR-d30-re |  256  | **1.80** |     1     |  2.0B   | [var_d30.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d30.pth) |\n| VAR-d36    |  512  | **2.63** |     -     |  2.3B   | [var_d36.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvar_d36.pth) |\n\nYou can load these models to generate images via the codes in [demo_sample.ipynb](demo_sample.ipynb). 
Note: you need to download [vae_ch160v4096z32.pth](https:\u002F\u002Fhuggingface.co\u002FFoundationVision\u002Fvar\u002Fresolve\u002Fmain\u002Fvae_ch160v4096z32.pth) first.\n\n\n## Installation\n\n1. Install `torch>=2.0.0`.\n2. Install other pip packages via `pip3 install -r requirements.txt`.\n3. Prepare the [ImageNet](http:\u002F\u002Fimage-net.org\u002F) dataset\n    \u003Cdetails>\n    \u003Csummary> Assume ImageNet is in `\u002Fpath\u002Fto\u002Fimagenet`. It should be laid out like this:\u003C\u002Fsummary>\n\n    ```\n    \u002Fpath\u002Fto\u002Fimagenet\u002F:\n        train\u002F:\n            n01440764: \n                many_images.JPEG ...\n            n01443537:\n                many_images.JPEG ...\n        val\u002F:\n            n01440764:\n                ILSVRC2012_val_00000293.JPEG ...\n            n01443537:\n                ILSVRC2012_val_00000236.JPEG ...\n    ```\n   **NOTE: The arg `--data_path=\u002Fpath\u002Fto\u002Fimagenet` should be passed to the training script.**\n    \u003C\u002Fdetails>\n\n4. (Optional) install and compile `flash-attn` and `xformers` for faster attention computation. Our code will automatically use them if installed. See [models\u002Fbasic_var.py#L15-L30](models\u002Fbasic_var.py#L15-L30).\n\n\n## Training Scripts\n\nTo train VAR-{d16, d20, d24, d30, d36-s} on ImageNet 256x256 or 512x512, run the corresponding command below:\n```shell\n# d16, 256x256\ntorchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \\\n  --depth=16 --bs=768 --ep=200 --fp16=1 --alng=1e-3 --wpe=0.1\n# d20, 256x256\ntorchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \\\n  --depth=20 --bs=768 --ep=250 --fp16=1 --alng=1e-3 --wpe=0.1\n# d24, 256x256\ntorchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... 
train.py \\\n  --depth=24 --bs=768 --ep=350 --tblr=8e-5 --fp16=1 --alng=1e-4 --wpe=0.01\n# d30, 256x256\ntorchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \\\n  --depth=30 --bs=1024 --ep=350 --tblr=8e-5 --fp16=1 --alng=1e-5 --wpe=0.01 --twde=0.08\n# d36-s, 512x512 (-s means saln=1, shared AdaLN)\ntorchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \\\n  --depth=36 --saln=1 --pn=512 --bs=768 --ep=350 --tblr=8e-5 --fp16=1 --alng=5e-6 --wpe=0.01 --twde=0.08\n```\nA folder named `local_output` will be created to save the checkpoints and logs.\nYou can monitor the training process by checking the logs in `local_output\u002Flog.txt` and `local_output\u002Fstdout.txt`, or using `tensorboard --logdir=local_output\u002F`.\n\nIf your experiment is interrupted, just rerun the command, and the training will **automatically resume** from the last checkpoint in `local_output\u002Fckpt*.pth` (see [utils\u002Fmisc.py#L344-L357](utils\u002Fmisc.py#L344-L357)).\n\n## Sampling & Zero-shot Inference\n\nFor FID evaluation, use `var.autoregressive_infer_cfg(..., cfg=1.5, top_p=0.96, top_k=900, more_smooth=False)` to sample 50,000 images (50 per class) and save them as PNG (not JPEG) files in a folder. 
Pack them into a `.npz` file via `create_npz_from_sample_folder(sample_folder)` in [utils\u002Fmisc.py#L360](utils\u002Fmisc.py#L360).\nThen use [OpenAI's FID evaluation toolkit](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion\u002Ftree\u002Fmain\u002Fevaluations) and the reference ground-truth npz file for [256x256](https:\u002F\u002Fopenaipublic.blob.core.windows.net\u002Fdiffusion\u002Fjul-2021\u002Fref_batches\u002Fimagenet\u002F256\u002FVIRTUAL_imagenet256_labeled.npz) or [512x512](https:\u002F\u002Fopenaipublic.blob.core.windows.net\u002Fdiffusion\u002Fjul-2021\u002Fref_batches\u002Fimagenet\u002F512\u002FVIRTUAL_imagenet512.npz) to evaluate FID, IS, precision, and recall.\n\nNote that a relatively small `cfg=1.5` is used as a trade-off between image quality and diversity. You can adjust it to `cfg=5.0`, or sample with `autoregressive_infer_cfg(..., more_smooth=True)` for **better visual quality**.\nWe'll provide the sampling script later.\n\n\n## Third-party Usage and Research\n\n***In this paragraph, we cross-link third-party repositories or research that use VAR and report results. 
You can let us know by raising an issue***\n\n(`Note please report accuracy numbers and provide trained models in your new repository to facilitate others to get sense of correctness and model behavior`)\n\n| **Time**     | **Research**                                                                                                                  | **Link**                                                           |\n|--------------|-------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|\n| [5\u002F12\u002F2025]  | [ICML 2025]Continuous Visual Autoregressive Generation via Score Maximization                                                 | https:\u002F\u002Fgithub.com\u002Fshaochenze\u002FEAR                                  |\n| [5\u002F8\u002F2025]   | Generative Autoregressive Transformers for Model-Agnostic Federated MRI Reconstruction                                        | https:\u002F\u002Fgithub.com\u002Ficon-lab\u002FFedGAT                                 |\n| [4\u002F7\u002F2025]   | FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning                                                       | https:\u002F\u002Fgithub.com\u002Fcsguoh\u002FFastVAR                                  |\n| [4\u002F3\u002F2025]   | VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning    | https:\u002F\u002Fgithub.com\u002FVARGPT-family\u002FVARGPT-v1.1                       |\n| [3\u002F31\u002F2025]  | Training-Free Text-Guided Image Editing with Visual Autoregressive Model                                                      | https:\u002F\u002Fgithub.com\u002Fwyf0912\u002FAREdit                                  |\n| [3\u002F17\u002F2025]  | Next-Scale Autoregressive Models are Zero-Shot Single-Image Object View Synthesizers                           
               | https:\u002F\u002Fgithub.com\u002FShiran-Yuan\u002FArchonView                          |\n| [3\u002F14\u002F2025]  | Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking                                          | https:\u002F\u002Farxiv.org\u002Fabs\u002F2503.11324                                   |\n| [3\u002F3\u002F2025]   | [ICML 2025]Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator  | https:\u002F\u002Fresearch.nvidia.com\u002Flabs\u002Fdir\u002Fddo\u002F                          |\n| [2\u002F28\u002F2025]  | Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction                                                      | https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.20784                                   |\n| [2\u002F27\u002F2025]  | FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction                                                  | https:\u002F\u002Fgithub.com\u002Fjiaosiyu1999\u002FFlexVAR                            |\n| [2\u002F17\u002F2025]  | MARS: Mesh AutoRegressive Model for 3D Shape Detailization                                                                    | https:\u002F\u002Farxiv.org\u002Fabs\u002F2502.11390                                   |\n| [1\u002F31\u002F2025]  | [ICML 2025]Visual Autoregressive Modeling for Image Super-Resolution                                                          | https:\u002F\u002Fgithub.com\u002Fquyp2000\u002FVARSR                                  |\n| [1\u002F21\u002F2025]  | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model                       | https:\u002F\u002Fgithub.com\u002FVARGPT-family\u002FVARGPT                            |\n| [1\u002F26\u002F2025]  | [ICML 2025]Visual Generation Without Guidance                                                                                 | 
https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FGFT                                      |\n| [12\u002F30\u002F2024] | Next Token Prediction Towards Multimodal Intelligence                                                                         | https:\u002F\u002Fgithub.com\u002FLMM101\u002FAwesome-Multimodal-Next-Token-Prediction |\n| [12\u002F30\u002F2024] | Varformer: Adapting VAR’s Generative Prior for Image Restoration                                                              | https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.21063                                   |\n| [12\u002F22\u002F2024] | [ICLR 2025]Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching                         | https:\u002F\u002Fgithub.com\u002Fimagination-research\u002Fdistilled-decoding         |\n| [12\u002F19\u002F2024] | FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching                                                        | https:\u002F\u002Fgithub.com\u002FOliverRensu\u002FFlowAR                              |\n| [12\u002F13\u002F2024] | 3D representation in 512-Byte: Variational tokenizer is the key for autoregressive 3D generation                              | https:\u002F\u002Fgithub.com\u002Fsparse-mvs-2\u002FVAT                                |\n| [12\u002F9\u002F2024]  | CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction                                                 | https:\u002F\u002Fcarp-robot.github.io\u002F                                      |\n| [12\u002F5\u002F2024]  | [CVPR 2025]Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis                            | https:\u002F\u002Fgithub.com\u002FFoundationVision\u002FInfinity                       |\n| [12\u002F5\u002F2024]  | [CVPR 2025]Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis                                              | 
https:\u002F\u002Fgithub.com\u002Fyandex-research\u002Fswitti                          |\n| [12\u002F4\u002F2024]  | [CVPR 2025]TokenFlow🚀: Unified Image Tokenizer for Multimodal Understanding and Generation                                   | https:\u002F\u002Fgithub.com\u002FByteFlow-AI\u002FTokenFlow                           |\n| [12\u002F3\u002F2024]  | XQ-GAN🚀: An Open-source Image Tokenization Framework for Autoregressive Generation                                           | https:\u002F\u002Fgithub.com\u002Flxa9867\u002FImageFolder                             |\n| [11\u002F28\u002F2024] | [CVPR 2025]CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient                                       | https:\u002F\u002Fgithub.com\u002Fczg1225\u002FCoDe                                    |\n| [11\u002F28\u002F2024] | [CVPR 2025]Scalable Autoregressive Monocular Depth Estimation                                                                 | https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.11361                                   |\n| [11\u002F27\u002F2024] | [CVPR 2025]SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE                              | https:\u002F\u002Fgithub.com\u002Fcyw-3d\u002FSAR3D                                    |\n| [11\u002F26\u002F2024] | LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization                                | https:\u002F\u002Farxiv.org\u002Fabs\u002F2411.17178                                   |\n| [11\u002F15\u002F2024] | M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation                                         | https:\u002F\u002Fgithub.com\u002FOliverRensu\u002FMVAR                                |\n| [10\u002F14\u002F2024] | [ICLR 2025]HART: Efficient Visual Generation with Hybrid Autoregressive Transformer                                           | 
https:\u002F\u002Fgithub.com\u002Fmit-han-lab\u002Fhart                                |\n| [10\u002F12\u002F2024] | [ICLR 2025 Oral]Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment                                 | https:\u002F\u002Fgithub.com\u002Fthu-ml\u002FCCA                                      |\n| [10\u002F3\u002F2024]  | [ICLR 2025]ImageFolder🚀: Autoregressive Image Generation with Folded Tokens                                                  | https:\u002F\u002Fgithub.com\u002Flxa9867\u002FImageFolder                             |\n| [07\u002F25\u002F2024] | ControlVAR: Exploring Controllable Visual Autoregressive Modeling                                                             | https:\u002F\u002Fgithub.com\u002Flxa9867\u002FControlVAR                              |\n| [07\u002F3\u002F2024]  | VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling                                                        | https:\u002F\u002Fgithub.com\u002Fdaixiangzi\u002FVAR-CLIP                             |\n| [06\u002F16\u002F2024] | STAR: Scale-wise Text-to-image generation via Auto-Regressive representations                                                 | https:\u002F\u002Farxiv.org\u002Fabs\u002F2406.10797                                   |\n\n\n## License\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n\n## Citation\nIf our work assists your research, feel free to give us a star ⭐ or cite us using:\n```\n@Article{VAR,\n      title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction}, \n      author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},\n      year={2024},\n      eprint={2404.02905},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n\n```\n@misc{Infinity,\n    title={Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis}, \n    author={Jian Han and 
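Jinlai Liu and Yi Jiang and Bin Yan and Yuqi Zhang and Zehuan Yuan and Bingyue Peng and Xiaobing Liu},\n    year={2024},\n    eprint={2412.04431},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV},\n    url={https:\u002F\u002Farxiv.org\u002Fabs\u002F2412.04431}, \n}\n```\n","FoundationVision\u002FVAR is a project focused on autoregressive image generation that achieves scalable image generation through next-scale prediction. It provides an extremely simple, user-friendly codebase with state-of-the-art capabilities. With GPT-style models it surpasses diffusion models, makes notable progress in visual generation, and observes scaling laws in the generation process. VAR is particularly suited to applications that need high-quality image generation, such as creative design and virtual reality content creation. It is written in Jupyter Notebook and is easy to pick up and extend.",2,"2026-06-11 03:48:36","high_star"]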