[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9838":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":28,"readmeContent":29,"aiSummary":30,"trendingCount":16,"starSnapshotCount":16,"syncStatus":31,"lastSyncTime":32,"discoverSource":33},9838,"min-dalle","kuprel\u002Fmin-dalle","kuprel","min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch","",null,"Python",3494,251,23,22,0,1,59.3,"MIT License",false,"main",true,[24,25,26,27],"artificial-intelligence","deep-learning","pytorch","text-to-image","2026-06-12 04:00:47","# min(DALL·E)\n\n[\u003Cimg src=\"https:\u002F\u002Fdevin.ai\u002Fassets\u002Fdeepwiki-badge.png\" alt=\"Ask DeepWiki.com\" height=\"20\"\u002F>](https:\u002F\u002Fdeepwiki.com\u002Fkuprel\u002Fmin-dalle)\n&nbsp;\n[![Colab](https:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg)](https:\u002F\u002Fcolab.research.google.com\u002Fgithub\u002Fkuprel\u002Fmin-dalle\u002Fblob\u002Fmain\u002Fmin_dalle.ipynb)\n&nbsp;\n[![Hugging Face 
Spaces](https:\u002F\u002Fimg.shields.io\u002Fbadge\u002F%F0%9F%A4%97%20Hugging%20Face-Spaces%20Demo-blue)](https:\u002F\u002Fhuggingface.co\u002Fspaces\u002Fkuprel\u002Fmin-dalle)\n&nbsp;\n[![Replicate](https:\u002F\u002Freplicate.com\u002Fkuprel\u002Fmin-dalle\u002Fbadge)](https:\u002F\u002Freplicate.com\u002Fkuprel\u002Fmin-dalle)\n&nbsp;\n[![Discord](https:\u002F\u002Fimg.shields.io\u002Fdiscord\u002F823813159592001537?color=5865F2&logo=discord&logoColor=white)](https:\u002F\u002Fdiscord.com\u002Fchannels\u002F823813159592001537\u002F912729332311556136)\n\n[YouTube Walk-through](https:\u002F\u002Fyoutu.be\u002Fx_8uHX5KngE) by The AI Epiphany\n\nThis is a fast, minimal port of Boris Dayma's [DALL·E Mini](https:\u002F\u002Fgithub.com\u002Fborisdayma\u002Fdalle-mini) (with mega weights).  It has been stripped down for inference and converted to PyTorch.  The only third party dependencies are numpy, requests, pillow and torch.\n\nTo generate a 3x3 grid of DALL·E Mega images it takes:\n- 55 sec with a T4 in Colab\n- 33 sec with a P100 in Colab\n- 15 sec with an A10G on Hugging Face\n\nHere's a more detailed breakdown of performance on an A100. 
Credit to [@technobird22](https:\u002F\u002Fgithub.com\u002Ftechnobird22) and his [NeoGen](https:\u002F\u002Fgithub.com\u002Ftechnobird22\u002FNeoGen) discord bot for the graph.\n\u003Cbr \u002F>\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fkuprel\u002Fmin-dalle\u002Fraw\u002Fmain\u002Fperformance.png\" alt=\"min-dalle\" width=\"450\"\u002F>\n\u003Cbr \u002F>\n\nThe flax model and code for converting it to torch can be found [here](https:\u002F\u002Fgithub.com\u002Fkuprel\u002Fmin-dalle-flax).\n\n## Install\n\n```bash\n$ pip install min-dalle\n```  \n\n## Usage\n\nLoad the model parameters once and reuse the model to generate multiple images.\n\n```python\nimport torch\nfrom min_dalle import MinDalle\n\nmodel = MinDalle(\n    models_root='.\u002Fpretrained',\n    dtype=torch.float32,\n    device='cuda',\n    is_mega=True, \n    is_reusable=True\n)\n```\n\nThe required models will be downloaded to `models_root` if they are not already there.  Set the `dtype` to `torch.float16` to save GPU memory.  If you have an Ampere architecture GPU you can use `torch.bfloat16`.  Set the `device` to either \"cuda\" or \"cpu\".  Once everything has finished initializing, call `generate_image` with some text as many times as you want.  Use a positive `seed` for reproducible results.  Higher values for `supercondition_factor` result in better agreement with the text but a narrower variety of generated images.  Every image token is sampled from the `top_k` most probable tokens.  The largest logit is subtracted from the logits to avoid infs.  The logits are then divided by the `temperature`.  
If `is_seamless` is true, the image grid will be tiled in token space rather than in pixel space.\n\n```python\nimage = model.generate_image(\n    text='Nuclear explosion broccoli',\n    seed=-1,\n    grid_size=4,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=32,\n    is_verbose=False\n)\n\ndisplay(image)\n```\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fkuprel\u002Fmin-dalle\u002Fraw\u002Fmain\u002Fexamples\u002Fnuclear_broccoli.jpg\" alt=\"min-dalle\" width=\"400\"\u002F>\n\nCredit to [@hardmaru](https:\u002F\u002Ftwitter.com\u002Fhardmaru) for the [example](https:\u002F\u002Ftwitter.com\u002Fhardmaru\u002Fstatus\u002F1544354119527596034)\n\n\n### Saving Individual Images\nThe images can also be generated as a `FloatTensor` in case you want to process them manually.\n\n```python\nimages = model.generate_images(\n    text='Nuclear explosion broccoli',\n    seed=-1,\n    grid_size=3,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=16,\n    is_verbose=False\n)\n```\n\nTo get an image into PIL format you will have to first move the images to the CPU and convert the tensor to a numpy array.\n```python\nimages = images.to('cpu').numpy()\n```\nThen image $i$ can be converted to a PIL.Image and saved:\n```python\nfrom PIL import Image\n\nimage = Image.fromarray(images[i])\nimage.save('image_{}.png'.format(i))\n```\n\n### Progressive Outputs\n\nIf the model is being used interactively (e.g. in a notebook) `generate_image_stream` can be used to generate a stream of images as the model is decoding.  The detokenizer adds a slight delay for each image.  Set `progressive_outputs` to `True` to enable this.  
An example is implemented in the Colab notebook.\n\n```python\nimage_stream = model.generate_image_stream(\n    text='Dali painting of WALL·E',\n    seed=-1,\n    grid_size=3,\n    progressive_outputs=True,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=16,\n    is_verbose=False\n)\n\nfor image in image_stream:\n    display(image)\n```\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fkuprel\u002Fmin-dalle\u002Fraw\u002Fmain\u002Fexamples\u002Fdali_walle_animated.gif\" alt=\"min-dalle\" width=\"300\"\u002F>\n\n### Command Line\n\nUse `image_from_text.py` to generate images from the command line.\n\n```bash\n$ python image_from_text.py --text='artificial intelligence' --no-mega\n```\n\u003Cimg src=\"https:\u002F\u002Fgithub.com\u002Fkuprel\u002Fmin-dalle\u002Fraw\u002Fmain\u002Fexamples\u002Fartificial_intelligence.jpg\" alt=\"min-dalle\" width=\"200\"\u002F>\n","min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch. It achieves efficient text-to-image generation by stripping out redundant parts and converting the model to PyTorch, and its only third-party dependencies are numpy, requests, pillow and torch. The project is particularly suited to applications that need to generate high-quality images quickly, such as creative design and content creation, and it can generate a 3x3 grid of images from a text description in a short time (for example, about 15 seconds on an A10G GPU). Users can also customize generation by adjusting parameters such as the seed, temperature and supercondition factor to get results that better match their needs.",2,"2026-06-11 03:24:59","top_topic"]