[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-72775":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":9,"language":10,"languages":9,"totalLinesOfCode":9,"stars":11,"forks":12,"watchers":13,"openIssues":14,"contributorsCount":15,"subscribersCount":15,"size":15,"stars1d":15,"stars7d":16,"stars30d":17,"stars90d":15,"forks30d":15,"starsTrendScore":15,"compositeScore":18,"rankGlobal":9,"rankLanguage":9,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":9,"pushedAt":9,"updatedAt":24,"readmeContent":25,"aiSummary":26,"trendingCount":15,"starSnapshotCount":15,"syncStatus":27,"lastSyncTime":28,"discoverSource":29},72775,"latent-diffusion","CompVis\u002Flatent-diffusion","CompVis","High-Resolution Image Synthesis with Latent Diffusion Models",null,"Jupyter Notebook",14061,1729,95,272,0,11,42,44.71,"MIT License",false,"main",true,[],"2026-06-12 02:03:07","# Latent Diffusion Models\n[arXiv](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.10752) | [BibTeX](#bibtex)\n\n\u003Cp align=\"center\">\n\u003Cimg src=assets\u002Fresults.gif \u002F>\n\u003C\u002Fp>\n\n\n\n[**High-Resolution Image Synthesis with Latent Diffusion Models**](https:\u002F\u002Farxiv.org\u002Fabs\u002F2112.10752)\u003Cbr\u002F>\n[Robin Rombach](https:\u002F\u002Fgithub.com\u002Frromb)\\*,\n[Andreas Blattmann](https:\u002F\u002Fgithub.com\u002Fablattmann)\\*,\n[Dominik Lorenz](https:\u002F\u002Fgithub.com\u002Fqp-qp)\\,\n[Patrick Esser](https:\u002F\u002Fgithub.com\u002Fpesser),\n[Björn Ommer](https:\u002F\u002Fhci.iwr.uni-heidelberg.de\u002FStaff\u002Fbommer)\u003Cbr\u002F>\n\\* equal contribution\n\n\u003Cp align=\"center\">\n\u003Cimg src=assets\u002Fmodelfigure.png \u002F>\n\u003C\u002Fp>\n\n## News\n\n### July 2022\n- Inference code and model weights to run our [retrieval-augmented diffusion models](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.11824) are now available. 
See [this section](#retrieval-augmented-diffusion-models).\n### April 2022\n- Thanks to [Katherine Crowson](https:\u002F\u002Fgithub.com\u002Fcrowsonkb), classifier-free guidance received a ~2x speedup and the [PLMS sampler](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.09778) is available. See also [this PR](https:\u002F\u002Fgithub.com\u002FCompVis\u002Flatent-diffusion\u002Fpull\u002F51).\n\n- Our 1.45B [latent diffusion LAION model](#text-to-image) was integrated into [Hugging Face Spaces 🤗](https:\u002F\u002Fhuggingface.co\u002Fspaces) using [Gradio](https:\u002F\u002Fgithub.com\u002Fgradio-app\u002Fgradio). Try out the Web Demo: [![Hugging Face Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fmultimodalart\u002Flatentdiffusion)\n\n- More pre-trained LDMs are available: \n  - A 1.45B [model](#text-to-image) trained on the [LAION-400M](https:\u002F\u002Farxiv.org\u002Fabs\u002F2111.02114) database.\n  - A class-conditional model on ImageNet, achieving an FID of 3.6 when using [classifier-free guidance](https:\u002F\u002Fopenreview.net\u002Fpdf?id=qw8AKxfYbI). Available via a [colab notebook](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FCompVis\u002Flatent-diffusion\u002Fblob\u002Fmain\u002Fscripts\u002Flatent_imagenet_diffusion.ipynb) [![][colab]][colab-cin].\n  \n## Requirements\nA suitable [conda](https:\u002F\u002Fconda.io\u002F) environment named `ldm` can be created\nand activated with:\n\n```\nconda env create -f environment.yaml\nconda activate ldm\n```\n\n# Pretrained Models\nA list of all available checkpoints is provided in our [model zoo](#model-zoo).\nIf you use any of these models in your work, we are always happy to receive a [citation](#bibtex).\n\n## Retrieval Augmented Diffusion Models\n![rdm-figure](assets\u002Frdm-preview.jpg)\nWe include inference code to run our retrieval-augmented diffusion models (RDMs) as 
described in [https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.11824](https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.11824).\n\n\nTo get started, install the additional required Python packages into your `ldm` environment:\n```shell script\npip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0\npip install git+https:\u002F\u002Fgithub.com\u002Farogozhnikov\u002Feinops.git\n```\nand download the trained weights (preliminary checkpoints):\n\n```bash\nmkdir -p models\u002Frdm\u002Frdm768x768\u002F\nwget -O models\u002Frdm\u002Frdm768x768\u002Fmodel.ckpt https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Frdm\u002Fmodel.ckpt\n```\nAs these models are conditioned on a set of CLIP image embeddings, our RDMs support different inference modes, \nwhich are described below.\n\n#### RDM with text-prompt only (no explicit retrieval needed)\nSince CLIP offers a shared image\u002Ftext feature space, and RDMs learn to cover a neighborhood of a given\nexample during training, we can directly take a CLIP text embedding of a given prompt and condition on it.\nRun this mode via\n```\npython scripts\u002Fknn2img.py  --prompt \"a happy bear reading a newspaper, oil on canvas\"\n```\n\n#### RDM with text-to-image retrieval\n\nTo run an RDM conditioned on a text prompt and, additionally, on images retrieved for this prompt, you will also need to download the corresponding retrieval database. \nWe provide two distinct databases extracted from the [OpenImages](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html) and [ArtBench](https:\u002F\u002Fgithub.com\u002Fliaopeiyuan\u002Fartbench) datasets. \nSwapping the databases gives the model different capabilities, as visualized below, although the learned weights are the same in both cases. 
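
As a rough illustration of the retrieval step described above, the sketch below performs exact nearest-neighbor search by cosine similarity over L2-normalized embeddings. It is not the repository's implementation (which works with CLIP embeddings and ScaNN search indices); random vectors stand in for CLIP text and image embeddings, and `l2_normalize` and `retrieve_neighbors` are hypothetical helper names.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale vectors to unit length so that a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def retrieve_neighbors(query: np.ndarray, database: np.ndarray, k: int) -> np.ndarray:
    # Exact (brute-force) k-nearest-neighbor search by cosine similarity.
    # query:    shape (d,), e.g. a CLIP text embedding of the prompt (assumed)
    # database: shape (n, d), e.g. CLIP image embeddings of the retrieval images (assumed)
    # Returns the indices of the k most similar entries, most similar first.
    if not 1 <= k <= database.shape[0]:
        raise ValueError('k must be between 1 and the number of database entries')
    sims = l2_normalize(database) @ l2_normalize(query)
    top = np.argpartition(-sims, k - 1)[:k]  # the k best entries, in arbitrary order
    return top[np.argsort(-sims[top])]  # order them by descending similarity


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    db = rng.normal(size=(1000, 512))  # stand-in for a database of image embeddings
    q = db[42] + 0.01 * rng.normal(size=512)  # query: a slightly perturbed copy of entry 42
    idx = retrieve_neighbors(q, db, k=4)
    neighbors = db[idx]  # the (k, d) set of embeddings a model could be conditioned on
    print(idx[0], neighbors.shape)  # prints: 42 (4, 512)
```

Exact search like this compares the query against every database entry; approximate indices such as ScaNN trade a small amount of accuracy for much faster lookups on large databases.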
\n\nDownload the retrieval databases, which contain the retrieval datasets ([OpenImages](https:\u002F\u002Fstorage.googleapis.com\u002Fopenimages\u002Fweb\u002Findex.html), ~11 GB, and [ArtBench](https:\u002F\u002Fgithub.com\u002Fliaopeiyuan\u002Fartbench), ~82 MB) compressed into CLIP image embeddings:\n```bash\nmkdir -p data\u002Frdm\u002Fretrieval_databases\nwget -O data\u002Frdm\u002Fretrieval_databases\u002Fartbench.zip https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Frdm\u002Fartbench_databases.zip\nwget -O data\u002Frdm\u002Fretrieval_databases\u002Fopenimages.zip https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Frdm\u002Fopenimages_database.zip\nunzip data\u002Frdm\u002Fretrieval_databases\u002Fartbench.zip -d data\u002Frdm\u002Fretrieval_databases\u002F\nunzip data\u002Frdm\u002Fretrieval_databases\u002Fopenimages.zip -d data\u002Frdm\u002Fretrieval_databases\u002F\n```\nWe also provide trained [ScaNN](https:\u002F\u002Fgithub.com\u002Fgoogle-research\u002Fgoogle-research\u002Ftree\u002Fmaster\u002Fscann) search indices for ArtBench. Download and extract via\n```bash\nmkdir -p data\u002Frdm\u002Fsearchers\nwget -O data\u002Frdm\u002Fsearchers\u002Fartbench.zip https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Frdm\u002Fartbench_searchers.zip\nunzip data\u002Frdm\u002Fsearchers\u002Fartbench.zip -d data\u002Frdm\u002Fsearchers\n```\n\nSince the index for OpenImages is large (~21 GB), we provide a script to create and save it for use during sampling. Note, however,\nthat sampling with the OpenImages database will not be possible without this index. Run the script via\n```bash\npython scripts\u002Ftrain_searcher.py\n```\n\nRetrieval-based text-guided sampling with visual nearest neighbors can be started via \n```\npython scripts\u002Fknn2img.py  --prompt \"a happy pineapple\" --use_neighbors --knn \u003Cnumber_of_neighbors> \n```\nNote that the maximum supported number of neighbors is 20. 
\nThe database can be selected via the command-line parameter `--database`, which can be one of `[openimages, artbench-art_nouveau, artbench-baroque, artbench-expressionism, artbench-impressionism, artbench-post_impressionism, artbench-realism, artbench-renaissance, artbench-romanticism, artbench-surrealism, artbench-ukiyo_e]`.\nBefore using `--database openimages`, the above script (`scripts\u002Ftrain_searcher.py`) must be run.\nDue to their relatively small size, the ArtBench databases are best suited for creating more abstract concepts and do not work well for detailed text control. \n\n\n#### Coming Soon\n- better models\n- more resolutions\n- image-to-image retrieval\n\n## Text-to-Image\n![text2img-figure](assets\u002Ftxt2img-preview.png) \n\n\nDownload the pre-trained weights (5.7GB)\n```\nmkdir -p models\u002Fldm\u002Ftext2img-large\u002F\nwget -O models\u002Fldm\u002Ftext2img-large\u002Fmodel.ckpt https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fnitro\u002Ftxt2img-f8-large\u002Fmodel.ckpt\n```\nand sample with\n```\npython scripts\u002Ftxt2img.py --prompt \"a virus monster is playing guitar, oil on canvas\" --ddim_eta 0.0 --n_samples 4 --n_iter 4 --scale 5.0  --ddim_steps 50\n```\nThis will save each sample individually as well as a grid of size `n_iter` x `n_samples` at the specified output location (default: `outputs\u002Ftxt2img-samples`).\nQuality, sampling speed and diversity are best controlled via the `scale`, `ddim_steps` and `ddim_eta` arguments.\nAs a rule of thumb, higher values of `scale` produce better samples at the cost of reduced output diversity.   \nFurthermore, increasing `ddim_steps` generally also gives higher-quality samples, but returns diminish for values > 250.\nFast sampling (i.e. low values of `ddim_steps`) while retaining good quality can be achieved by using `--ddim_eta 0.0`.  \nFaster sampling (i.e. 
even lower values of `ddim_steps`) while retaining good quality can be achieved by using `--ddim_eta 0.0` and `--plms` (see [Pseudo Numerical Methods for Diffusion Models on Manifolds](https:\u002F\u002Farxiv.org\u002Fabs\u002F2202.09778)).\n\n#### Beyond 256²\n\nFor certain inputs, simply running the model in a convolutional fashion on larger features than it was trained on\ncan sometimes produce interesting results. To try it out, tune the `H` and `W` arguments (which are integer-divided\nby 8 to compute the corresponding latent size), e.g. run\n\n```\npython scripts\u002Ftxt2img.py --prompt \"a sunset behind a mountain range, vector image\" --ddim_eta 1.0 --n_samples 1 --n_iter 1 --H 384 --W 1024 --scale 5.0  \n```\nto create a sample of size 384x1024. Note, however, that controllability is reduced compared to the 256x256 setting. \n\nThe example below was generated using the above command. \n![text2img-figure-conv](assets\u002Ftxt2img-convsample.png)\n\n\n\n## Inpainting\n![inpainting](assets\u002Finpainting.png)\n\nDownload the pre-trained weights\n```\nmkdir -p models\u002Fldm\u002Finpainting_big\u002F\nwget -O models\u002Fldm\u002Finpainting_big\u002Flast.ckpt https:\u002F\u002Fheibox.uni-heidelberg.de\u002Ff\u002F4d9ac7ea40c64582b7c9\u002F?dl=1\n```\n\nand sample with\n```\npython scripts\u002Finpaint.py --indir data\u002Finpainting_examples\u002F --outdir outputs\u002Finpainting_results\n```\n`indir` should contain images `*.png` and masks `\u003Cimage_fname>_mask.png` like\nthe examples provided in `data\u002Finpainting_examples`.\n\n## Class-Conditional ImageNet\n\nAvailable via a [notebook](scripts\u002Flatent_imagenet_diffusion.ipynb) [![][colab]][colab-cin].\n![class-conditional](assets\u002Fbirdhouse.png)\n\n[colab]: \u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg>\n[colab-cin]: 
\u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002FCompVis\u002Flatent-diffusion\u002Fblob\u002Fmain\u002Fscripts\u002Flatent_imagenet_diffusion.ipynb>\n\n\n## Unconditional Models\n\nWe also provide a script for sampling from unconditional LDMs (e.g. LSUN, FFHQ, ...). Start it via\n\n```shell script\nCUDA_VISIBLE_DEVICES=\u003CGPU_ID> python scripts\u002Fsample_diffusion.py -r models\u002Fldm\u002F\u003Cmodel_spec>\u002Fmodel.ckpt -l \u003Clogdir> -n \u003C#samples> --batch_size \u003Cbatch_size> -c \u003C#ddim_steps> -e \u003Ceta> \n```\n\n# Train your own LDMs\n\n## Data preparation\n\n### Faces \nFor downloading the CelebA-HQ and FFHQ datasets, proceed as described in the [taming-transformers](https:\u002F\u002Fgithub.com\u002FCompVis\u002Ftaming-transformers#celeba-hq) \nrepository.\n\n### LSUN \n\nThe LSUN datasets can be conveniently downloaded via the script available [here](https:\u002F\u002Fgithub.com\u002Ffyu\u002Flsun).\nWe performed a custom split into training and validation images, and provide the corresponding filenames\nat [https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flsun.zip](https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flsun.zip). \nAfter downloading, extract them to `.\u002Fdata\u002Flsun`. The beds\u002Fcats\u002Fchurches subsets should\nalso be placed\u002Fsymlinked at `.\u002Fdata\u002Flsun\u002Fbedrooms`\u002F`.\u002Fdata\u002Flsun\u002Fcats`\u002F`.\u002Fdata\u002Flsun\u002Fchurches`, respectively.\n\n### ImageNet\nThe code will try to download (through [Academic\nTorrents](http:\u002F\u002Facademictorrents.com\u002F)) and prepare ImageNet the first time it\nis used. However, since ImageNet is quite large, this requires a lot of disk\nspace and time. 
If you already have ImageNet on your disk, you can speed things\nup by putting the data into\n`${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_{split}\u002Fdata\u002F` (which defaults to\n`~\u002F.cache\u002Fautoencoders\u002Fdata\u002FILSVRC2012_{split}\u002Fdata\u002F`), where `{split}` is one\nof `train`\u002F`validation`. It should have the following structure:\n\n```\n${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_{split}\u002Fdata\u002F\n├── n01440764\n│   ├── n01440764_10026.JPEG\n│   ├── n01440764_10027.JPEG\n│   ├── ...\n├── n01443537\n│   ├── n01443537_10007.JPEG\n│   ├── n01443537_10014.JPEG\n│   ├── ...\n├── ...\n```\n\nIf you haven't extracted the data, you can also place\n`ILSVRC2012_img_train.tar`\u002F`ILSVRC2012_img_val.tar` (or symlinks to them) into\n`${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_train\u002F` \u002F\n`${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_validation\u002F`, which will then be\nextracted into the above structure without downloading it again.  Note that this\nwill only happen if neither a folder\n`${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_{split}\u002Fdata\u002F` nor a file\n`${XDG_CACHE}\u002Fautoencoders\u002Fdata\u002FILSVRC2012_{split}\u002F.ready` exists. 
Remove them\nif you want to force running the dataset preparation again.\n\n\n## Model Training\n\nLogs and checkpoints for trained models are saved to `logs\u002F\u003CSTART_DATE_AND_TIME>_\u003Cconfig_spec>`.\n\n### Training autoencoder models\n\nConfigs for training a KL-regularized autoencoder on ImageNet are provided at `configs\u002Fautoencoder`.\nTraining can be started by running\n```\nCUDA_VISIBLE_DEVICES=\u003CGPU_ID> python main.py --base configs\u002Fautoencoder\u002F\u003Cconfig_spec>.yaml -t --gpus 0,    \n```\nwhere `config_spec` is one of {`autoencoder_kl_8x8x64`(f=32, d=64), `autoencoder_kl_16x16x16`(f=16, d=16), \n`autoencoder_kl_32x32x4`(f=8, d=4), `autoencoder_kl_64x64x3`(f=4, d=3)}.\n\nFor training VQ-regularized models, see the [taming-transformers](https:\u002F\u002Fgithub.com\u002FCompVis\u002Ftaming-transformers) \nrepository.\n\n### Training LDMs \n\nIn ``configs\u002Flatent-diffusion\u002F`` we provide configs for training LDMs on the LSUN-, CelebA-HQ, FFHQ and ImageNet datasets. \nTraining can be started by running\n\n```shell script\nCUDA_VISIBLE_DEVICES=\u003CGPU_ID> python main.py --base configs\u002Flatent-diffusion\u002F\u003Cconfig_spec>.yaml -t --gpus 0,\n``` \n\nwhere ``\u003Cconfig_spec>`` is one of {`celebahq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),`ffhq-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),\n`lsun_bedrooms-ldm-vq-4`(f=4, VQ-reg. autoencoder, spatial size 64x64x3),\n`lsun_churches-ldm-vq-4`(f=8, KL-reg. autoencoder, spatial size 32x32x4),`cin-ldm-vq-8`(f=8, VQ-reg. 
autoencoder, spatial size 32x32x4)}.\n\n# Model Zoo \n\n## Pretrained Autoencoding Models\n![rec2](assets\u002Freconstruction2.png)\n\nAll models were trained until convergence (no further substantial improvement in rFID).\n\n| Model | rFID vs val | train steps | PSNR | PSIM | Link | Comments |\n|---|---|---|---|---|---|---|\n| f=4, VQ (Z=8192, d=3) | 0.58 | 533066 | 27.43 +\u002F- 4.26 | 0.53 +\u002F- 0.21 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fvq-f4.zip | |\n| f=4, VQ (Z=8192, d=3) | 1.06 | 658131 | 25.21 +\u002F- 4.17 | 0.72 +\u002F- 0.26 | https:\u002F\u002Fheibox.uni-heidelberg.de\u002Ff\u002F9c6681f64bb94338a069\u002F?dl=1 | no attention |\n| f=8, VQ (Z=16384, d=4) | 1.14 | 971043 | 23.07 +\u002F- 3.99 | 1.17 +\u002F- 0.36 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fvq-f8.zip | |\n| f=8, VQ (Z=256, d=4) | 1.49 | 1608649 | 22.35 +\u002F- 3.81 | 1.26 +\u002F- 0.37 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fvq-f8-n256.zip | |\n| f=16, VQ (Z=16384, d=8) | 5.15 | 1101166 | 20.83 +\u002F- 3.61 | 1.73 +\u002F- 0.43 | https:\u002F\u002Fheibox.uni-heidelberg.de\u002Ff\u002F0e42b04e2e904890a9b6\u002F?dl=1 | |\n| | | | | | | |\n| f=4, KL | 0.27 | 176991 | 27.53 +\u002F- 4.54 | 0.55 +\u002F- 0.24 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fkl-f4.zip | |\n| f=8, KL | 0.90 | 246803 | 24.19 +\u002F- 4.19 | 1.02 +\u002F- 0.35 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fkl-f8.zip | |\n| f=16, KL (d=16) | 0.87 | 442998 | 24.08 +\u002F- 4.22 | 1.07 +\u002F- 0.36 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fkl-f16.zip | |\n| f=32, KL (d=64) | 2.04 | 406763 | 22.27 +\u002F- 3.93 | 1.41 +\u002F- 0.40 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fkl-f32.zip | |\n\n### Get the models\n\nRunning the following script downloads and extracts all available pretrained autoencoding models.
\n```shell script\nbash scripts\u002Fdownload_first_stages.sh\n```\n\nThe first stage models can then be found in `models\u002Ffirst_stage_models\u002F\u003Cmodel_spec>`.\n\n\n\n## Pretrained LDMs\n| Dataset | Task | Model | FID | IS | Prec | Recall | Link | Comments |\n|---|---|---|---|---|---|---|---|---|\n| CelebA-HQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 5.11 (5.11) | 3.29 | 0.72 | 0.49 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fceleba.zip | |\n| FFHQ | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 4.98 (4.98) | 4.50 (4.50) | 0.73 | 0.50 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fffhq.zip | |\n| LSUN-Churches | Unconditional Image Synthesis | LDM-KL-8 (400 DDIM steps, eta=0) | 4.02 (4.02) | 2.72 | 0.64 | 0.52 | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Flsun_churches.zip | |\n| LSUN-Bedrooms | Unconditional Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=1) | 2.95 (3.0) | 2.22 (2.23) | 0.66 | 0.48 | 
https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Flsun_bedrooms.zip | |\n| ImageNet | Class-conditional Image Synthesis | LDM-VQ-8 (200 DDIM steps, eta=1) | 7.77(7.76)* \u002F15.82** | 201.56(209.52)* \u002F78.82** | 0.84* \u002F 0.65** | 0.35* \u002F 0.63** | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fcin.zip | *: w\u002F guiding, classifier_scale 10  **: w\u002Fo guiding, scores in brackets calculated with the script provided by [ADM](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion) |\n| Conceptual Captions | Text-conditional Image Synthesis | LDM-VQ-f4 (100 DDIM steps, eta=0) | 16.79 | 13.89 | N\u002FA | N\u002FA | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Ftext2img.zip | finetuned from LAION |\n| OpenImages | Super-resolution | LDM-VQ-4 | N\u002FA | N\u002FA | N\u002FA | N\u002FA | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fsr_bsr.zip | BSR image degradation |\n| OpenImages | Layout-to-Image Synthesis | LDM-VQ-4 (200 DDIM steps, eta=0) | 32.02 | 15.92 | N\u002FA | N\u002FA | https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Flayout2img_model.zip | |\n| Landscapes | Semantic Image Synthesis | LDM-VQ-4 | N\u002FA | N\u002FA | N\u002FA | N\u002FA | 
https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fsemantic_synthesis256.zip                                    |                                                 |\n| Landscapes       |  Semantic Image Synthesis   | LDM-VQ-4  | N\u002FA             | N\u002FA               | N\u002FA    | N\u002FA    |           https:\u002F\u002Fommer-lab.com\u002Ffiles\u002Flatent-diffusion\u002Fsemantic_synthesis.zip                                    |             finetuned on resolution 512x512                                     |\n\n\n### Get the models\n\nThe LDMs listed above can jointly be downloaded and extracted via\n\n```shell script\nbash scripts\u002Fdownload_models.sh\n```\n\nThe models can then be found in `models\u002Fldm\u002F\u003Cmodel_spec>`.\n\n\n\n## Coming Soon...\n\n* More inference scripts for conditional LDMs.\n* In the meantime, you can play with our colab notebook https:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1xqzUi2iXQXDqXBHQGP9Mqt2YrYW6cx-J?usp=sharing\n\n## Comments \n\n- Our codebase for the diffusion models builds heavily on [OpenAI's ADM codebase](https:\u002F\u002Fgithub.com\u002Fopenai\u002Fguided-diffusion)\nand [https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdenoising-diffusion-pytorch). \nThanks for open-sourcing!\n\n- The implementation of the transformer encoder is from [x-transformers](https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fx-transformers) by [lucidrains](https:\u002F\u002Fgithub.com\u002Flucidrains?tab=repositories). 
\n\n\n## BibTeX\n\n```\n@misc{rombach2021highresolution,\n      title={High-Resolution Image Synthesis with Latent Diffusion Models}, \n      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},\n      year={2021},\n      eprint={2112.10752},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n\n@misc{https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2204.11824,\n  doi = {10.48550\u002FARXIV.2204.11824},\n  url = {https:\u002F\u002Farxiv.org\u002Fabs\u002F2204.11824},\n  author = {Blattmann, Andreas and Rombach, Robin and Oktay, Kaan and Ommer, Björn},\n  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences, FOS: Computer and information sciences},\n  title = {Retrieval-Augmented Diffusion Models},\n  publisher = {arXiv},\n  year = {2022},  \n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n\n\n```\n\n\n","This project implements high-resolution image synthesis with latent diffusion models. Its core features include creating high-quality images with advanced generative techniques, supporting text-to-image generation, conditional image generation and more, and it introduces retrieval-augmented diffusion models to improve generation quality. Technically, it is developed and presented with Jupyter Notebook, which makes the experimental results easy to understand and reproduce; it also provides pretrained models and detailed usage guides so that users can get started quickly. It is suited to application scenarios that require high-quality image generation, such as creative design, virtual reality content creation, and data visualization in scientific research.",2,"2026-06-11 03:43:32","high_star"]