[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-71052":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":8,"htmlUrl":8,"language":9,"languages":8,"totalLinesOfCode":8,"stars":10,"forks":11,"watchers":12,"openIssues":13,"contributorsCount":14,"subscribersCount":14,"size":14,"stars1d":14,"stars7d":14,"stars30d":14,"stars90d":14,"forks30d":14,"starsTrendScore":14,"compositeScore":15,"rankGlobal":8,"rankLanguage":8,"license":16,"archived":17,"fork":17,"defaultBranch":18,"hasWiki":19,"hasPages":17,"topics":20,"createdAt":8,"pushedAt":8,"updatedAt":21,"readmeContent":22,"aiSummary":23,"trendingCount":14,"starSnapshotCount":14,"syncStatus":24,"lastSyncTime":25,"discoverSource":26},71052,"IF","deep-floyd\u002FIF","deep-floyd",null,"Python",7814,527,82,90,0,39.17,"Other",false,"develop",true,[],"2026-06-12 02:02:47","[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FCode_License-Modified_MIT-blue.svg)](LICENSE)\n[![License](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FWeights_License-DeepFloyd_IF-orange.svg)](LICENSE-MODEL)\n[![Downloads](https:\u002F\u002Fpepy.tech\u002Fbadge\u002Fdeepfloyd_if)](https:\u002F\u002Fpepy.tech\u002Fproject\u002Fdeepfloyd_if)\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FDiscord-%237289DA.svg?logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.gg\u002Fumz62Mgr)\n[![Twitter](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FTwitter-%231DA1F2.svg?logo=twitter&logoColor=white)](https:\u002F\u002Ftwitter.com\u002Fdeepfloydai)\n[![Linktree](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002FLinktree-%2339E09B.svg?logo=linktree&logoColor=white)](http:\u002F\u002Flinktr.ee\u002Fdeepfloyd)\n\n# IF by [DeepFloyd Lab](https:\u002F\u002Fdeepfloyd.ai) at [StabilityAI](https:\u002F\u002Fstability.ai\u002F)\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fpics\u002Fnabla.jpg\" width=\"100%\">\n\u003C\u002Fp>\n\nWe introduce DeepFloyd IF, a novel state-of-the-art open-source 
text-to-image model with a high degree of photorealism and language understanding. DeepFloyd IF is a modular model composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates a 64x64 px image from a text prompt and two super-resolution models, each designed to generate images of increasing resolution: 256x256 px and 1024x1024 px. All stages of the model utilize a frozen text encoder based on the T5 transformer to extract text embeddings, which are then fed into a UNet architecture enhanced with cross-attention and attention pooling. The result is a highly efficient model that outperforms current state-of-the-art models, achieving a zero-shot FID score of 6.66 on the COCO dataset. Our work underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models and points to a promising future for text-to-image synthesis.\n\n\u003Cp align=\"center\">\n  \u003Cimg src=\".\u002Fpics\u002Fdeepfloyd_if_scheme.jpg\" width=\"100%\">\n\u003C\u002Fp>\n\n*Inspired by* [*Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding*](https:\u002F\u002Farxiv.org\u002Fpdf\u002F2205.11487.pdf)\n\n## Minimum requirements to use all IF models:\n- 16GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module)\n- 24GB vRAM for IF-I-XL (4.3B text to 64x64 base module) & IF-II-L (1.2B to 256x256 upscaler module) & Stable x4 (to 1024x1024 upscaler)\n- `xformers` installed and the env variable `FORCE_MEM_EFFICIENT_ATTN=1` set\n\n\n## Quick Start\n[![Open In Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fhuggingface\u002Fnotebooks\u002Fblob\u002Fmain\u002Fdiffusers\u002Fdeepfloyd_if_free_tier_google_colab.ipynb)\n[![Hugging Face 
Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDeepFloyd\u002FIF)\n\n```shell\npip install deepfloyd_if==1.0.2rc0\npip install xformers==0.0.16\npip install git+https:\u002F\u002Fgithub.com\u002Fopenai\u002FCLIP.git --no-deps\n```\n\n## Local notebooks\n[![Jupyter Notebook](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002Fjupyter_notebook-%23FF7A01.svg?logo=jupyter&logoColor=white)](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-notebooks\u002Fblob\u002Fmain\u002Fpipes-DeepFloyd-IF-v1.0.ipynb)\n[![Kaggle](https:\u002F\u002Fkaggle.com\u002Fstatic\u002Fimages\u002Fopen-in-kaggle.svg)](https:\u002F\u002Fwww.kaggle.com\u002Fcode\u002Fshonenkov\u002Fdeepfloyd-if-4-3b-generator-of-pictures)\n\nThe Dream, Style Transfer, Super Resolution and Inpainting modes are available in a Jupyter Notebook [here](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-notebooks\u002Fblob\u002Fmain\u002Fpipes-DeepFloyd-IF-v1.0.ipynb).\n\n\n\n## Integration with 🤗 Diffusers\n\nIF is also integrated with the 🤗 Hugging Face [Diffusers library](https:\u002F\u002Fgithub.com\u002Fhuggingface\u002Fdiffusers\u002F).\n\nDiffusers runs each stage individually, allowing the user to customize the image generation process and to inspect intermediate results easily.\n\n### Example\n\nBefore you can use IF, you need to accept its usage conditions. To do so:\n1. Make sure to have a [Hugging Face account](https:\u002F\u002Fhuggingface.co\u002Fjoin) and be logged in\n2. Accept the license on the model card of [DeepFloyd\u002FIF-I-XL-v1.0](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0)\n3. Make sure to log in locally. 
Install `huggingface_hub`\n```sh\npip install huggingface_hub --upgrade\n```\n\nrun the login function in a Python shell\n\n```py\nfrom huggingface_hub import login\n\nlogin()\n```\n\nand enter your [Hugging Face Hub access token](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fhub\u002Fsecurity-tokens#what-are-user-access-tokens).\n\nNext we install `diffusers` and dependencies:\n\n```sh\npip install diffusers accelerate transformers safetensors\n```\n\nAnd we can now run the model locally.\n\nBy default `diffusers` makes use of [model cpu offloading](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Foptimization\u002Ffp16#model-offloading-for-fast-inference-and-memory-savings) to run the whole IF pipeline with as little as 14 GB of VRAM.\n\nIf you are using `torch>=2.0.0`, make sure to **delete all** `enable_xformers_memory_efficient_attention()`\nfunctions.\n\n```py\nfrom diffusers import DiffusionPipeline\nfrom diffusers.utils import pt_to_pil\nimport torch\n\n# stage 1\nstage_1 = DiffusionPipeline.from_pretrained(\"DeepFloyd\u002FIF-I-XL-v1.0\", variant=\"fp16\", torch_dtype=torch.float16)\nstage_1.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0\nstage_1.enable_model_cpu_offload()\n\n# stage 2\nstage_2 = DiffusionPipeline.from_pretrained(\n    \"DeepFloyd\u002FIF-II-L-v1.0\", text_encoder=None, variant=\"fp16\", torch_dtype=torch.float16\n)\nstage_2.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 2.0.0\nstage_2.enable_model_cpu_offload()\n\n# stage 3\nsafety_modules = {\"feature_extractor\": stage_1.feature_extractor, \"safety_checker\": stage_1.safety_checker, \"watermarker\": stage_1.watermarker}\nstage_3 = DiffusionPipeline.from_pretrained(\"stabilityai\u002Fstable-diffusion-x4-upscaler\", **safety_modules, torch_dtype=torch.float16)\nstage_3.enable_xformers_memory_efficient_attention()  # remove line if torch.__version__ >= 
2.0.0\nstage_3.enable_model_cpu_offload()\n\nprompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says \"very deep learning\"'\n\n# text embeds\nprompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)\n\ngenerator = torch.manual_seed(0)\n\n# stage 1\nimage = stage_1(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type=\"pt\").images\npt_to_pil(image)[0].save(\".\u002Fif_stage_I.png\")\n\n# stage 2\nimage = stage_2(\n    image=image, prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_embeds, generator=generator, output_type=\"pt\"\n).images\npt_to_pil(image)[0].save(\".\u002Fif_stage_II.png\")\n\n# stage 3\nimage = stage_3(prompt=prompt, image=image, generator=generator, noise_level=100).images\nimage[0].save(\".\u002Fif_stage_III.png\")\n```\n\n There are multiple ways to speed up the inference time and lower the memory consumption even more with `diffusers`. 
To do so, please have a look at the Diffusers docs:\n\n- 🚀 [Optimizing for inference time](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Fif#optimizing-for-speed)\n- ⚙️ [Optimizing for low memory during inference](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fapi\u002Fpipelines\u002Fif#optimizing-for-memory)\n\nFor more detailed information about how to use IF, please have a look at [the IF blog post](https:\u002F\u002Fhuggingface.co\u002Fblog\u002Fif) and [the documentation](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fmain\u002Fen\u002Fapi\u002Fpipelines\u002Fif) 📖.\n\nDiffusers dreambooth scripts also support fine-tuning 🎨 [IF](https:\u002F\u002Fhuggingface.co\u002Fdocs\u002Fdiffusers\u002Fmain\u002Fen\u002Ftraining\u002Fdreambooth#if).\nWith parameter-efficient fine-tuning, you can add new concepts to IF with a single GPU and ~28 GB VRAM.\n\n## Run the code locally\n\n### Loading the models into VRAM\n\n```python\nfrom deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII\nfrom deepfloyd_if.modules.t5 import T5Embedder\n\ndevice = 'cuda:0'\nif_I = IFStageI('IF-I-XL-v1.0', device=device)\nif_II = IFStageII('IF-II-L-v1.0', device=device)\nif_III = StableStageIII('stable-diffusion-x4-upscaler', device=device)\nt5 = T5Embedder(device=\"cpu\")\n```\n\n### I. 
Dream\nDream is the text-to-image mode of the IF model\n\n```python\nfrom deepfloyd_if.pipelines import dream\n\nprompt = 'ultra close-up color photo portrait of rainbow owl with deer horns in the woods'\ncount = 4\n\nresult = dream(\n    t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,\n    prompt=[prompt]*count,\n    seed=42,\n    if_I_kwargs={\n        \"guidance_scale\": 7.0,\n        \"sample_timestep_respacing\": \"smart100\",\n    },\n    if_II_kwargs={\n        \"guidance_scale\": 4.0,\n        \"sample_timestep_respacing\": \"smart50\",\n    },\n    if_III_kwargs={\n        \"guidance_scale\": 9.0,\n        \"noise_level\": 20,\n        \"sample_timestep_respacing\": \"75\",\n    },\n)\n\nif_III.show(result['III'], size=14)\n```\n![](.\u002Fpics\u002Fdream-III.jpg)\n\n### II. Zero-shot Image-to-Image Translation\n\n![](.\u002Fpics\u002Fimg_to_img_scheme.jpeg)\n\nIn Style Transfer mode, the output of your prompt comes out in the style of the `support_pil_img`\n```python\nfrom deepfloyd_if.pipelines import style_transfer\n\nresult = style_transfer(\n    t5=t5, if_I=if_I, if_II=if_II,\n    support_pil_img=raw_pil_image,\n    style_prompt=[\n        'in style of professional origami',\n        'in style of oil art, Tate modern',\n        'in style of plastic building bricks',\n        'in style of classic anime from 1990',\n    ],\n    seed=42,\n    if_I_kwargs={\n        \"guidance_scale\": 10.0,\n        \"sample_timestep_respacing\": \"10,10,10,10,10,10,10,10,0,0\",\n        'support_noise_less_qsample_steps': 5,\n    },\n    if_II_kwargs={\n        \"guidance_scale\": 4.0,\n        \"sample_timestep_respacing\": 'smart50',\n        \"support_noise_less_qsample_steps\": 5,\n    },\n)\nif_I.show(result['II'], 1, 20)\n```\n\n![Alternative Text](.\u002Fpics\u002Fdeep_floyd_if_image_2_image.gif)\n\n\n### III. 
Super Resolution\nFor super-resolution, users can run `IF-II` and `IF-III` or 'Stable x4' on an image that was not necessarily generated by IF (two cascades):\n\n```python\nfrom deepfloyd_if.pipelines import super_resolution\n\nmiddle_res = super_resolution(\n    t5,\n    if_III=if_II,\n    prompt=['woman with a blue headscarf and a blue sweaterp, detailed picture, 4k dslr, best quality'],\n    support_pil_img=raw_pil_image,\n    img_scale=4.,\n    img_size=64,\n    if_III_kwargs={\n        'sample_timestep_respacing': 'smart100',\n        'aug_level': 0.5,\n        'guidance_scale': 6.0,\n    },\n)\nhigh_res = super_resolution(\n    t5,\n    if_III=if_III,\n    prompt=[''],\n    support_pil_img=middle_res['III'][0],\n    img_scale=4.,\n    img_size=256,\n    if_III_kwargs={\n        \"guidance_scale\": 9.0,\n        \"noise_level\": 20,\n        \"sample_timestep_respacing\": \"75\",\n    },\n)\nshow_superres(raw_pil_image, high_res['III'][0])\n```\n\n![](.\u002Fpics\u002Fif_as_upscaler.jpg)\n\n\n### IV. 
Zero-shot Inpainting\n\n```python\nfrom deepfloyd_if.pipelines import inpainting\n\nresult = inpainting(\n    t5=t5, if_I=if_I,\n    if_II=if_II,\n    if_III=if_III,\n    support_pil_img=raw_pil_image,\n    inpainting_mask=inpainting_mask,\n    prompt=[\n        'oil art, a man in a hat',\n    ],\n    seed=42,\n    if_I_kwargs={\n        \"guidance_scale\": 7.0,\n        \"sample_timestep_respacing\": \"10,10,10,10,10,0,0,0,0,0\",\n        'support_noise_less_qsample_steps': 0,\n    },\n    if_II_kwargs={\n        \"guidance_scale\": 4.0,\n        'aug_level': 0.0,\n        \"sample_timestep_respacing\": '100',\n    },\n    if_III_kwargs={\n        \"guidance_scale\": 9.0,\n        \"noise_level\": 20,\n        \"sample_timestep_respacing\": \"75\",\n    },\n)\nif_I.show(result['I'], 2, 3)\nif_I.show(result['II'], 2, 6)\nif_I.show(result['III'], 2, 14)\n```\n![](.\u002Fpics\u002Fdeep_floyd_if_inpainting.gif)\n\n### 🤗 Model Zoo 🤗\nThe link to download the weights as well as the model cards will be available soon on each model of the model zoo\n\n#### Original\n\n| Name                                                      | Cascade | Params | FID  | Batch size | Steps |\n|:----------------------------------------------------------|:-------:|:------:|:----:|:----------:|:-----:|\n| [IF-I-M](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-M-v1.0)    |    I    |  400M  | 8.86 |    3072    | 2.5M  |\n| [IF-I-L](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-L-v1.0)    |    I    |  900M  | 8.06 |    3200    | 3.0M  |\n| [IF-I-XL](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-XL-v1.0)* |    I    |  4.3B  | 6.66 |    3072    | 2.42M |\n| [IF-II-M](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-II-M-v1.0)  |   II    |  450M  |  -   |    1536    | 2.5M  |\n| [IF-II-L](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-II-L-v1.0)* |   II    |  1.2B  |  -   |    1536    | 2.5M  |\n| IF-III-L* _(soon)_                                
        |   III   |  700M  |  -   |    3072    | 1.25M |\n\n *best modules\n\n### Quantitative Evaluation\n\n`FID = 6.66`\n\n![](.\u002Fpics\u002Ffid30k_if.jpg)\n\n## License\n\nThe code in this repository is released under the bespoke license (see added [point two](https:\u002F\u002Fgithub.com\u002Fdeep-floyd\u002FIF\u002Fblob\u002Fmain\u002FLICENSE#L13)).\n\nThe weights will be available soon via [the DeepFloyd organization at Hugging Face](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd) and have their own LICENSE.\n\n**Disclaimer:** *The initial release of the IF model is under a restricted research-purposes-only license temporarily to gather feedback, and after that we intend to release a fully open-source model in line with other Stability AI models.*\n\n## Limitations and Biases\n\nThe models available in this codebase have known limitations and biases. Please refer to [the model card](https:\u002F\u002Fhuggingface.co\u002FDeepFloyd\u002FIF-I-L-v1.0) for more information.\n\n\n## 🎓 DeepFloyd IF creators:\n\n- Alex Shonenkov [GitHub](https:\u002F\u002Fgithub.com\u002Fshonenkov) | [Linktr](https:\u002F\u002Flinktr.ee\u002FshonenkovAI)\n- Misha Konstantinov [GitHub](https:\u002F\u002Fgithub.com\u002Fzeroshot-ai) | [Twitter](https:\u002F\u002Ftwitter.com\u002F_bra_ket)\n- Daria Bakshandaeva [GitHub](https:\u002F\u002Fgithub.com\u002FGugutse) | [Twitter](https:\u002F\u002Ftwitter.com\u002F_gugutse_)\n- Christoph Schuhmann [GitHub](https:\u002F\u002Fgithub.com\u002Fchristophschuhmann) | [Twitter](https:\u002F\u002Ftwitter.com\u002Flaion_ai)\n- Ksenia Ivanova [GitHub](https:\u002F\u002Fgithub.com\u002Fivksu) | [Twitter](https:\u002F\u002Ftwitter.com\u002Fsusiaiv)\n- Nadiia Klokova [GitHub](https:\u002F\u002Fgithub.com\u002Fvauimpuls) | [Twitter](https:\u002F\u002Ftwitter.com\u002Fvauimpuls)\n\n\n## 📄 Research Paper (Soon)\n\n## Acknowledgements\n\nSpecial thanks to [StabilityAI](http:\u002F\u002Fstability.ai) and its CEO [Emad 
Mostaque](https:\u002F\u002Ftwitter.com\u002Femostaque) for invaluable support, providing GPU compute and infrastructure to train the models (our gratitude goes to [Richard Vencu](https:\u002F\u002Fgithub.com\u002Frvencu)); thanks to [LAION](https:\u002F\u002Flaion.ai) and [Christoph Schuhmann](https:\u002F\u002Fgithub.com\u002Fchristophschuhmann) in particular for contributions to the project and well-prepared datasets; thanks to [Huggingface](https:\u002F\u002Fhuggingface.co) teams for optimizing models' speed and memory consumption during inference, creating demos and giving cool advice!\n\n## 🚀 External Contributors 🚀\n- The Biggest Thanks [@Apolinário](https:\u002F\u002Fgithub.com\u002Fapolinario), for ideas, consultations, help and support on all stages to make IF available in open-source; for writing a lot of documentation and instructions; for creating a friendly atmosphere in difficult moments 🦉;\n- Thanks, [@patrickvonplaten](https:\u002F\u002Fgithub.com\u002Fpatrickvonplaten), for improving loading time of unet models by 80%;\nfor integrating Stable-Diffusion-x4 as a native pipeline 💪;\n- Thanks, [@williamberman](https:\u002F\u002Fgithub.com\u002Fwilliamberman) and [@patrickvonplaten](https:\u002F\u002Fgithub.com\u002Fpatrickvonplaten) for diffusers integration 🙌;\n- Thanks, [@hysts](https:\u002F\u002Fgithub.com\u002Fhysts) and [@Apolinário](https:\u002F\u002Fgithub.com\u002Fapolinario) for creating [the best gradio demo with IF](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002FDeepFloyd\u002FIF) 🚀;\n- Thanks, [@Dango233](https:\u002F\u002Fgithub.com\u002FDango233), for adapting IF with xformers memory efficient attention 💪;\n","DeepFloyd\u002FIF is an advanced open-source text-to-image generation model that produces highly photorealistic images from text prompts. Its core components are a frozen text encoder and three cascaded pixel diffusion modules that generate images at 64x64, 256x256 and 1024x1024 resolution, respectively. These modules use text embeddings extracted with a T5 transformer and synthesize images efficiently through an enhanced UNet architecture. DeepFloyd IF is particularly well suited to applications that require high-quality image generation, such as creative design and virtual content creation. Its zero-shot FID score of 6.66 on the COCO dataset demonstrates its leading position among comparable current models.",2,"2026-06-11 03:35:40","high_star"]