One of the latest applications of artificial intelligence (AI), and one that has been developing at a rapid pace, is text-to-image generators. Feed these AI algorithms any text you like, and they will generate pictures that match your description. The algorithms can be asked to create images in a range of styles, from oil paintings to CGI renders and even photographs, and the generated images may be strange, beautiful, or both!
How does an AI know what to create when asked to generate an image? Much of the magic happens with generative adversarial networks (GANs). A GAN pits two neural networks against one another: a generator that synthesizes images or data, and a discriminator that scores how plausible the results are. The discriminator's feedback loops back to the generator, pushing it to produce incrementally more convincing output.
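The tug-of-war described above can be sketched in miniature. The toy below is not a real image GAN: both "networks" are single affine maps, the "real data" is just numbers drawn from a bell curve, and all constants (learning rate, target distribution) are made up for the demo. It only illustrates the feedback loop: the discriminator learns to tell real samples from fakes, and the generator uses the discriminator's gradient to drift its fakes toward the real distribution.

```python
# Minimal sketch of the GAN feedback loop, using only the standard library.
# Real data: samples from N(4, 0.5). The generator starts producing samples
# near 0 and must learn to move them toward the real distribution.
import math
import random

random.seed(0)

REAL_MEAN, REAL_STD = 4.0, 0.5   # the "real data" distribution (made up)

def sigmoid(x):
    if x > 60:
        return 1.0
    if x < -60:
        return 0.0
    return 1.0 / (1.0 + math.exp(-x))

# Generator: noise z ~ N(0,1) -> fake sample g(z) = w*z + b
g_w, g_b = 1.0, 0.0
# Discriminator: "how real does x look?" D(x) = sigmoid(a*x + c)
d_a, d_c = 0.0, 0.0

LR, STEPS, BATCH = 0.03, 3000, 16

for _ in range(STEPS):
    # --- discriminator step: push D(real) up and D(fake) down ---
    grad_a = grad_c = 0.0
    for _ in range(BATCH):
        x_real = random.gauss(REAL_MEAN, REAL_STD)
        x_fake = g_w * random.gauss(0, 1) + g_b
        p_real = sigmoid(d_a * x_real + d_c)
        p_fake = sigmoid(d_a * x_fake + d_c)
        grad_a += -(1 - p_real) * x_real + p_fake * x_fake
        grad_c += -(1 - p_real) + p_fake
    d_a -= LR * grad_a / BATCH
    d_c -= LR * grad_c / BATCH

    # --- generator step: move fakes toward where D says "real" ---
    grad_w = grad_b = 0.0
    for _ in range(BATCH):
        z = random.gauss(0, 1)
        x_fake = g_w * z + g_b
        p_fake = sigmoid(d_a * x_fake + d_c)
        dloss = -(1 - p_fake) * d_a   # gradient of -log D(fake) w.r.t. x_fake
        grad_w += dloss * z
        grad_b += dloss
    g_w -= LR * grad_w / BATCH
    g_b -= LR * grad_b / BATCH

fake_mean = sum(g_w * random.gauss(0, 1) + g_b for _ in range(1000)) / 1000
print(f"generator samples now average ~{fake_mean:.2f} (real data averages {REAL_MEAN})")
```

After training, the generator's samples have drifted from around 0 toward the real data's mean, purely because chasing a higher discriminator score points it in that direction; this is the same dynamic, scaled up enormously, that lets image GANs learn to fool their discriminator.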
GANs have attracted a lot of coverage for their potential to enable unsettling and dystopian use cases, such as deepfake videos, fabricated yet believable human faces, and models trained on biased datasets that inadvertently encode racism. But they also have positive applications: upscaling low-resolution imagery, stylizing photographs, and repairing damaged artworks (even speculatively reconstructing entire lost sections of masterpieces).
While a few other text-to-image systems exist (e.g., VQGAN+CLIP, Imagen, and Craiyon), the latest version of DALL-E is arguably the best at generating coherent images. DALL-E seems to have a good understanding of the world and the relationships between objects.
Developed by OpenAI, DALL-E is an AI program trained to generate images from text descriptions. DALL-E was first unveiled in January 2021, and its name was inspired by the Pixar robot WALL-E and the Surrealist painter Salvador Dalí.
The tool is currently considered one of the most advanced AI systems for generating images. Type a description, and DALL-E quickly produces professional-looking art or hyper-realistic photographs. DALL-E's model was trained on a vast dataset of images paired with text captions, and it draws on those learned associations to generate new images. The art it creates is not a mishmash of existing images; rather, DALL-E synthesizes a unique image from its sophisticated model, making connections between concepts in ways that loosely mimic the human brain.
DALL-E is now available in beta, but you must first join the waitlist and receive an invite. In its first year of operation, DALL-E was used only by a relatively small, vetted group of testers, mostly researchers, academics, journalists, and artists. In April 2022, however, OpenAI began providing access to more users by making DALL-E available in beta to some of those on the waitlist. According to OpenAI, invites are not processed purely on a first-come, first-served basis; instead, individuals are prioritized according to criteria that suit OpenAI's learning goals and that help distribute compute load across time zones.
OpenAI is sending out 1,000 invites per week, with a goal of onboarding one million users. As of mid-July 2022, 100,516 people had been invited to try DALL-E.
Artists and creative professionals have already been using DALL-E to create art quickly and easily. On the OpenAI blog, you can see curated selections of works produced with DALL-E and introductions to some of the artists who have been using the system. Better yet, OpenAI's Instagram account, which showcases creative works by users, is a gallery of strange and beautiful art, with a newly generated artwork displayed every day.
As the number of DALL-E users grows, we can expect to see more of DALL-E’s generated artworks wherever we go on the Internet.
Here are some of the current features available to users of the DALL-E platform:
Every DALL-E user will receive 50 free credits during their first month of use and 15 free credits every subsequent month. Each credit can be used for one DALL-E prompt. An original image generation prompt returns four images, an edit prompt returns three images, and a variation prompt also returns three images.
In this first phase of the beta, users can buy additional credits in 115-credit increments for $15 on top of their free monthly credits. Users get full usage rights to commercialize images created with DALL-E, including the right to reprint, sell, and merchandise.
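The credit arithmetic above is easy to sanity-check. The credit amounts and images-per-prompt figures below come straight from the article; the helper functions themselves are purely illustrative bookkeeping, not any official OpenAI API.

```python
# Quick arithmetic check of the DALL-E beta credit scheme described above.
IMAGES_PER_PROMPT = {"generation": 4, "edit": 3, "variation": 3}

def images_from_credits(credits, prompt_type="generation"):
    """Each credit pays for one prompt; return the total images produced."""
    return credits * IMAGES_PER_PROMPT[prompt_type]

def paid_cost_per_image(prompt_type="generation"):
    """Per-image cost when buying the $15 top-up of 115 credits."""
    return 15.0 / 115 / IMAGES_PER_PROMPT[prompt_type]

print(images_from_credits(50))                # 200 images in the first free month
print(images_from_credits(15, "variation"))   # 45 images in a later free month
print(round(paid_cost_per_image(), 3))        # ~$0.033 per generated image
```

So the first month's 50 free credits yield up to 200 generated images, and purchased credits work out to roughly three cents per image on generation prompts.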
OpenAI is the company behind DALL-E. Founded in 2015 as a non-profit research company, it aims to develop and direct AI in ways that benefit humanity as a whole. Its co-founders include Sam Altman and Elon Musk (who has since stepped away from its leadership). The company intends to collaborate freely with other research organizations and individuals for the betterment of AI research.
OpenAI wasn’t the first to openly declare it was pursuing artificial general intelligence (AGI); the pioneering AI research lab DeepMind, acquired by Google in 2014, had been working toward AGI since its founding in 2010. Google is also home to the team behind Imagen, discussed below.
OpenAI has developed other AI projects in the past and released them to the public. GPT-3, the latest version of OpenAI’s language model, can generate stories, articles, and poetry based on simple descriptions.
Craiyon, an open-source alternative to DALL-E
Craiyon, formerly DALL-E mini, is an open-source alternative to DALL-E that can also draw images from any text prompt. Its original name was designed to suggest that it is a ‘mini’ version of DALL-E, and although it doesn’t have the same performance as DALL-E, it still boasts some incredible results. Craiyon’s algorithm was trained by looking at millions of images from the Internet with their associated captions. Over time, it learnt how to draw an image from a text prompt.
Imagen from Google
Another text-to-image generator is Google’s Imagen. In benchmarks run by Google comparing Imagen’s output with that of other models, including VQ-GAN+CLIP and DALL-E 2, human raters preferred Imagen in side-by-side comparisons, both for sample quality and for image-text alignment. However, Imagen is still in a research-only phase and is not currently available to the public.
VQ-GAN and CLIP are actually two separate machine learning algorithms that can be used together to generate images from a text prompt. VQ-GAN is a GAN that is good at synthesizing images resembling those it was trained on, and CLIP is a neural network that scores how well a caption (or prompt) matches an image. Combining the two (CLIP repeatedly steering VQ-GAN’s output toward images that better match the prompt) is what enables this style of AI image generation.
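CLIP's role in that pairing can be sketched very simply: embed the image and the caption into the same vector space, then score the match with cosine similarity. The real CLIP learns its embeddings with large neural networks; the vectors below are made up by hand purely to show how the scoring step works.

```python
# Toy sketch of CLIP-style image/caption matching via cosine similarity.
# The "embeddings" here are hand-invented 3-D vectors; real CLIP embeddings
# are learned, high-dimensional vectors produced by neural networks.
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = perfect match."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

image_embedding = [0.9, 0.1, 0.3]                 # pretend: an image of a cat
captions = {
    "a photo of a cat": [0.8, 0.2, 0.4],
    "a photo of a car": [0.1, 0.9, 0.2],
}

# In a VQ-GAN+CLIP loop, scores like these steer generation: candidate images
# whose embeddings best match the prompt are kept and refined, step by step.
best = max(captions, key=lambda c: cosine_similarity(image_embedding, captions[c]))
print(best)  # "a photo of a cat"
```

The cat caption scores far higher than the car caption against the cat image's embedding, and that differential score is exactly the signal VQ-GAN+CLIP uses to nudge the generator toward the prompt.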
Text-to-image models certainly have fantastic creative potential and will transform many industries that use art; however, they also have a range of troubling applications. These systems can generate almost any image a user requests, and it is entirely plausible that bad-faith actors could misuse generated images for fake news, hoaxes, and harassment. The technology will inevitably move beyond its current phase of permissioned access and become widely available to the public, and as it develops, so too will society's response to these challenges.
*The opinions reflected in this article are the sole opinions of the author and do not reflect any official positions or claims by Acer Inc.
About Ashley Buckwell: Ashley is a technology writer who is interested in computers and software development. He is also a fintech researcher and is fascinated with emerging trends in DeFi, blockchain, and bitcoin. He has been writing, editing, and creating content for the ESL industry in Asia for eight years, with a special focus on interactive, digital learning.