Since CLIP is essentially an interface between representations of text and image data, clever hacking can allow anyone to create their own pseudo-DALL-E. The first implementation was Big Sleep by Ryan Murdock, which combined CLIP with an image-generating GAN named BigGAN. The core CLIP-guided training was improved and translated to a Colab Notebook by Katherine Crowson and others in a special Discord server. Then open source worked its magic: the GAN base was changed to VQGAN, a newer model architecture by Patrick Esser, Robin Rombach, and Björn Ommer that allows more coherent image generation. Twitter accounts that leverage VQGAN + CLIP with user-submitted prompts have gone viral and received mainstream press.
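The guidance loop at the heart of these notebooks can be sketched with toy stand-ins. This is a minimal sketch, not the real pipeline: a random projection plays the role of CLIP's image encoder, a random vector plays the role of a CLIP text embedding, and the latent stands in for VQGAN's output. In the real system, backpropagation through CLIP and VQGAN replaces the hand-written gradient here; all names and sizes below are hypothetical.

```python
import numpy as np

def clip_style_guidance(steps=300, lr=0.5, seed=0):
    """Toy CLIP-style guidance: gradient-ascend a latent so its 'image
    embedding' matches a target 'text embedding' (random stand-ins)."""
    rng = np.random.default_rng(seed)
    d_latent, d_embed = 32, 16
    W = rng.standard_normal((d_embed, d_latent))   # stand-in image encoder
    text = rng.standard_normal(d_embed)            # stand-in text embedding
    z = rng.standard_normal(d_latent)              # latent being optimized

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    start = cosine(W @ z, text)
    for _ in range(steps):
        img = W @ z
        s = cosine(img, text)
        # Analytic gradient of cosine similarity w.r.t. the image embedding;
        # it points from the current embedding toward the text embedding.
        grad = (text / (np.linalg.norm(img) * np.linalg.norm(text))
                - s * img / (np.linalg.norm(img) ** 2))
        z += lr * (W.T @ grad)                     # ascend the similarity
    return start, cosine(W @ z, text)

if __name__ == "__main__":
    before, after = clip_style_guidance()
    print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

Swapping the random projection for CLIP's image encoder and the latent for a VQGAN code is, in essence, the "clever hacking" described above: the same loop, driven by real encoders.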