[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"project-9808":3},{"id":4,"name":5,"fullName":6,"owner":7,"repo":5,"description":8,"homepage":9,"htmlUrl":10,"language":11,"languages":10,"totalLinesOfCode":10,"stars":12,"forks":13,"watchers":14,"openIssues":15,"contributorsCount":16,"subscribersCount":16,"size":16,"stars1d":16,"stars7d":16,"stars30d":17,"stars90d":16,"forks30d":16,"starsTrendScore":16,"compositeScore":18,"rankGlobal":10,"rankLanguage":10,"license":19,"archived":20,"fork":20,"defaultBranch":21,"hasWiki":22,"hasPages":20,"topics":23,"createdAt":10,"pushedAt":10,"updatedAt":31,"readmeContent":32,"aiSummary":33,"trendingCount":16,"starSnapshotCount":16,"syncStatus":17,"lastSyncTime":34,"discoverSource":35},9808,"deep-daze","lucidrains\u002Fdeep-daze","lucidrains","Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https:\u002F\u002Ftwitter.com\u002Fadvadnoun","",null,"Python",4322,311,72,89,0,2,59.68,"MIT License",false,"main",true,[24,25,26,27,28,29,30],"artificial-intelligence","deep-learning","implicit-neural-representation","multi-modality","siren","text-to-image","transformers","2026-06-12 04:00:46","## Deep Daze\n\n\u003Cimg src=\".\u002Fsamples\u002FMist_over_green_hills.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*mist over green hills*\n\n\u003Cimg src=\".\u002Fsamples\u002FShattered_plates_on_the_grass.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*shattered plates on the grass*\n\n\u003Cimg src=\".\u002Fsamples\u002FCosmic_love_and_attention.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*cosmic love and attention*\n\n\u003Cimg src=\".\u002Fsamples\u002FA_time_traveler_in_the_crowd.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*a time traveler in the crowd*\n\n\u003Cimg src=\".\u002Fsamples\u002FLife_during_the_plague.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*life during the plague*\n\n\u003Cimg 
src=\".\u002Fsamples\u002FMeditative_peace_in_a_sunlit_forest.jpg\" width=\"256px\">\u003C\u002Fimg>\n\n*meditative peace in a sunlit forest*\n\n\u003Cimg src=\".\u002Fsamples\u002FA_man_painting_a_completely_red_image.png\" width=\"256px\">\u003C\u002Fimg>\n\n*a man painting a completely red image*\n\n\u003Cimg src=\".\u002Fsamples\u002FA_psychedelic_experience_on_LSD.png\" width=\"256px\">\u003C\u002Fimg>\n\n*a psychedelic experience on LSD*\n\n## What is this?\n\nSimple command line tool for text to image generation using OpenAI's \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002FCLIP\">CLIP\u003C\u002Fa> and \u003Ca href=\"https:\u002F\u002Farxiv.org\u002Fabs\u002F2006.09661\">Siren\u003C\u002Fa>. Credit goes to \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fadvadnoun\">Ryan Murdock\u003C\u002Fa> for the discovery of this technique (and for coming up with the great name)!\n\nOriginal notebook [![Open In Colab][colab-badge]][colab-notebook]\n\nNew simplified notebook [![Open In Colab][colab-badge]][colab-notebook-2]\n\n[colab-notebook]: \u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1FoHdqoqKntliaQKnMoNs3yn5EALqWtvP>\n[colab-notebook-2]: \u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1_YOHdORb0Fg1Q7vWZ_KlrtFe9Ur3pmVj?usp=sharing>\n[colab-badge]: \u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fassets\u002Fcolab-badge.svg>\n\nThis will require that you have an Nvidia GPU or AMD GPU\n- Recommended: 16GB VRAM\n- Minimum Requirements: 4GB VRAM (Using VERY LOW settings, see usage instructions below) \n\n## Install\n\n```bash\n$ pip install deep-daze\n```  \n\n### Windows Install\n\n\u003Cimg src=\".\u002Finstruction_images\u002FWindows\u002FStep_1_DD_Win.png\" width=\"480px\">\u003C\u002Fimg>\n\nPresuming Python is installed: \n- Open command prompt and navigate to the directory of your current version of Python\n```bash\n  pip install deep-daze\n```\n\n## Examples\n\n```bash\n$ imagine \"a house in the 
forest\"\n```\nFor Windows:\n\n\u003Cimg src=\".\u002Finstruction_images\u002FWindows\u002FStep_2_DD_Win.png\" width=\"480px\">\u003C\u002Fimg>\n\n- Open command prompt as administrator\n```bash\n  imagine \"a house in the forest\"\n```\n\nThat's it.\n\n\nIf you have enough memory, you can get better quality by adding a `--deeper` flag\n\n```bash\n$ imagine \"shattered plates on the ground\" --deeper\n```\n\n### Advanced\n\nIn true deep learning fashion, more layers will yield better results. The default is `16`, but it can be increased to `32` depending on your resources.\n\n```bash\n$ imagine \"stranger in strange lands\" --num-layers 32\n```\n\n## Usage\n\n### CLI\n```bash\nNAME\n    imagine\n\nSYNOPSIS\n    imagine TEXT \u003Cflags>\n\nPOSITIONAL ARGUMENTS\n    TEXT\n        (required) A phrase less than 77 tokens which you would like to visualize.\n\nFLAGS\n    --img=IMAGE_PATH\n        Default: None\n        Path to png\u002Fjpg image or PIL image to optimize on\n    --encoding=ENCODING\n        Default: None\n        User-created custom CLIP encoding. If used, replaces any text or image that was used.\n    --create_story=CREATE_STORY\n        Default: False\n        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.\n    --story_start_words=STORY_START_WORDS\n        Default: 5\n        Only used if create_story is True. How many words to optimize on for the first epoch.\n    --story_words_per_epoch=STORY_WORDS_PER_EPOCH\n        Default: 5\n        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.\n    --story_separator=STORY_SEPARATOR\n        Default: None\n        Only used if create_story is True. Defines a separator like '.' that splits the text into groups for each epoch. 
Separator needs to be in the text otherwise it will be ignored\n    --lower_bound_cutout=LOWER_BOUND_CUTOUT\n        Default: 0.1\n        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.\n    --upper_bound_cutout=UPPER_BOUND_CUTOUT\n        Default: 1.0\n        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.\n    --saturate_bound=SATURATE_BOUND\n        Default: False\n        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.\n    --learning_rate=LEARNING_RATE\n        Default: 1e-05\n        The learning rate of the neural net.\n    --num_layers=NUM_LAYERS\n        Default: 16\n        The number of hidden layers to use in the Siren neural net.\n    --batch_size=BATCH_SIZE\n        Default: 4\n        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.\n    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY\n        Default: 4\n        Calculate a weighted loss of n samples for each iteration. 
Increasing this can help increase accuracy with lower batch sizes.\n    --epochs=EPOCHS\n        Default: 20\n        The number of epochs to run.\n    --iterations=ITERATIONS\n        Default: 1050\n        The number of times to calculate and backpropagate loss in a given epoch.\n    --save_every=SAVE_EVERY\n        Default: 100\n        Generate an image every time iterations is a multiple of this number.\n    --image_width=IMAGE_WIDTH\n        Default: 512\n        The desired resolution of the image.\n    --deeper=DEEPER\n        Default: False\n        Uses a Siren neural net with 32 hidden layers.\n    --overwrite=OVERWRITE\n        Default: False\n        Whether or not to overwrite existing generated images of the same name.\n    --save_progress=SAVE_PROGRESS\n        Default: False\n        Whether or not to save images generated before training Siren is complete.\n    --seed=SEED\n        Type: Optional[]\n        Default: None\n        A seed to be used for deterministic runs.\n    --open_folder=OPEN_FOLDER\n        Default: True\n        Whether or not to open a folder showing your generated images.\n    --save_date_time=SAVE_DATE_TIME\n        Default: False\n        Save files with a timestamp prepended e.g. `%y%m%d-%H%M%S-my_phrase_here`\n    --start_image_path=START_IMAGE_PATH\n        Default: None\n        The generator is trained first on a starting image before being steered towards the textual input\n    --start_image_train_iters=START_IMAGE_TRAIN_ITERS\n        Default: 50\n        The number of steps for the initial training on the starting image\n    --theta_initial=THETA_INITIAL\n        Default: 30.0\n        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.\n    --theta_hidden=THETA_HIDDEN\n        Default: 30.0\n        Hyperparameter describing the frequency of the color space. 
Only applies to the hidden layers of the network.\n    --save_gif=SAVE_GIF\n        Default: False\n        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.\n```\n\n### Priming\n\nTechnique first devised and shared by \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fquasimondo\">Mario Klingemann\u003C\u002Fa>, it allows you to prime the generator network with a starting image, before being steered towards the text.\n\nSimply specify the path to the image you wish to use, and optionally the number of initial training steps.\n\n```bash\n$ imagine 'a clear night sky filled with stars' --start_image_path .\u002Fcloudy-night-sky.jpg\n```\n\nPrimed starting image\n\n\u003Cimg src=\".\u002Fsamples\u002Fprime-orig.jpg\" width=\"256px\">\u003C\u002Fimg>\n\nThen trained with the prompt `A pizza with green pepper.`\n\n\u003Cimg src=\".\u002Fsamples\u002Fprime-trained.png\" width=\"256px\">\u003C\u002Fimg>\n\n\n### Optimize for the interpretation of an image\n\nWe can also feed in an image as an optimization goal, instead of only priming the generator network. 
Deepdaze will then render its own interpretation of that image:\n```bash\n$ imagine --img samples\u002FAutumn_1875_Frederic_Edwin_Church.jpg\n```\nOriginal image:\n\n\u003Cimg src=\".\u002Fsamples\u002FAutumn_1875_Frederic_Edwin_Church_original.jpg\" width=\"256px\">\u003C\u002Fimg>\n\nThe network's interpretation:  \n\n\u003Cimg src=\".\u002Fsamples\u002FAutumn_1875_Frederic_Edwin_Church.jpg\" width=\"256px\">\u003C\u002Fimg>\n\nOriginal image:\n\n\u003Cimg src=\".\u002Fsamples\u002Fhot-dog.jpg\" width=\"256px\">\u003C\u002Fimg>\n\nThe network's interpretation:  \n\n\u003Cimg src=\".\u002Fsamples\u002Fhot-dog_imagined.png\" width=\"256px\">\u003C\u002Fimg>\n\n#### Optimize for text and image combined\n\n```bash\n$ imagine \"A psychedelic experience.\" --img samples\u002Fhot-dog.jpg\n```\nThe network's interpretation:  \n\u003Cimg src=\".\u002Fsamples\u002Fpsychedelic_hot_dog.png\" width=\"256px\">\u003C\u002Fimg>\n\n\n### New: Create a story\nThe regular mode for texts only allows 77 tokens. If you want to visualize a full story\u002Fparagraph\u002Fsong\u002Fpoem, set `create_story` to `True`.\n\nGiven the poem “Stopping by Woods On a Snowy Evening” by Robert Frost - \n\"Whose woods these are I think I know. His house is in the village though; He will not see me stopping here To watch his woods fill up with snow. My little horse must think it queer To stop without a farmhouse near Between the woods and frozen lake The darkest evening of the year. He gives his harness bells a shake To ask if there is some mistake. The only other sound’s the sweep Of easy wind and downy flake. 
The woods are lovely, dark and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep.\".\n\nWe get:\n\nhttps:\u002F\u002Fuser-images.githubusercontent.com\u002F19983153\u002F109539633-d671ef80-7ac1-11eb-8d8c-380332d7c868.mp4\n\n\n\n### Python\n#### Invoke `deep_daze.Imagine` in Python\n```python\nfrom deep_daze import Imagine\n\nimagine = Imagine(\n    text = 'cosmic love and attention',\n    num_layers = 24,\n)\nimagine()\n```\n\n#### Save progress every fourth iteration\nSave images in the format insert_text_here.00001.png, insert_text_here.00002.png, ...up to `(total_iterations // save_every)`\n```python\nimagine = Imagine(\n    text=text,\n    save_every=4,\n    save_progress=True\n)\n```\n\n#### Prepend current timestamp on each image.\nCreates files with both the timestamp and the sequence number.\n\ne.g. 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...\n```python\nimagine = Imagine(\n    text=text,\n    save_every=4,\n    save_progress=True,\n    save_date_time=True,\n)\n```\n\n#### High GPU memory usage\nIf you have at least 16 GiB of VRAM available, you should be able to run these settings with some wiggle room.\n```python\nimagine = Imagine(\n    text=text,\n    num_layers=42,\n    batch_size=64,\n    gradient_accumulate_every=1,\n)\n```\n\n#### Average GPU memory usage\n```python\nimagine = Imagine(\n    text=text,\n    num_layers=24,\n    batch_size=16,\n    gradient_accumulate_every=2\n)\n```\n\n#### Very low GPU memory usage (less than 4 GiB)\nIf you are desperate to run this on a card with less than 8 GiB VRAM, you can lower the image_width.\n```python\nimagine = Imagine(\n    text=text,\n    image_width=256,\n    num_layers=16,\n    batch_size=1,\n    gradient_accumulate_every=16 # Increase gradient_accumulate_every to correct for loss in low batch sizes\n)\n```\n\n### VRAM and speed benchmarks:\nThese experiments were conducted with an RTX 2060 Super and 
a Ryzen 7 3700X. We first list the parameters (bs = batch size), then the memory usage and, in some cases, the training iterations per second:\n\nFor an image resolution of 512: \n* bs 1,  num_layers 22: 7.96 GB\n* bs 2,  num_layers 20: 7.5 GB\n* bs 16, num_layers 16: 6.5 GB\n\nFor an image resolution of 256:\n* bs 8, num_layers 48: 5.3 GB\n* bs 16, num_layers 48: 5.46 GB - 2.0 it\u002Fs\n* bs 32, num_layers 48: 5.92 GB - 1.67 it\u002Fs\n* bs 8, num_layers 44: 5 GB - 2.39 it\u002Fs\n* bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it\u002Fs\n* bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it\u002Fs\n* bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it\u002Fs\n    \n@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.\n\n\n## Where is this going?\n\nThis is just a teaser. We will be able to generate images, sound, anything at will, with natural language. The holodeck is about to become real in our lifetimes.\n\nPlease join replication efforts for DALL-E for \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fdalle-pytorch\">PyTorch\u003C\u002Fa> or \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FEleutherAI\u002FDALLE-mtf\">Mesh TensorFlow\u003C\u002Fa> if you are interested in furthering this technology.\n\n## Alternatives\n\n\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Flucidrains\u002Fbig-sleep\">Big Sleep\u003C\u002Fa> - CLIP and the generator from BigGAN\n\n## Citations\n\n```bibtex\n@misc{unpublished2021clip,\n    title  = {CLIP: Connecting Text and Images},\n    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},\n    year   = {2021}\n}\n```\n\n```bibtex\n@misc{sitzmann2020implicit,\n    title   = {Implicit Neural Representations with Periodic Activation Functions},\n    author  = {Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. 
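Lindell and Gordon Wetzstein},\n    year    = {2020},\n    eprint  = {2006.09661},\n    archivePrefix = {arXiv},\n    primaryClass = {cs.CV}\n}\n```\n\n[colab-notebook]: \u003Chttps:\u002F\u002Fcolab.research.google.com\u002Fdrive\u002F1FoHdqoqKntliaQKnMoNs3yn5EALqWtvP>\n","Deep Daze is a command-line text-to-image generation tool built on OpenAI's CLIP and Siren (an implicit neural representation network). Its core function is turning natural-language descriptions into high-quality images, and it supports improving image quality by adding layers and using more VRAM. The project suits scenarios that need visual content created quickly from text descriptions, such as art creation and design concept visualization. Users can start generating with a single command, for example `imagine \"a house in the forest\"`, and can adjust the output by adding parameters. It requires a GPU with at least 4GB of VRAM.","2026-06-11 03:24:51","top_topic"]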