A new breed of AI programs has taken the internet by storm. They generate high-quality images from a text description.
DALL-E, an image generator created by commercial AI research lab OpenAI, is one of the most popular examples. There are many similar tools available, including Craiyon (formerly DALL-E Mini), which does not require registration and has unlimited tries.
Google has recently announced a text-to-image model named Imagen that appears to push the boundaries of caption-conditional image generation. According to Google, it reaches a level of photorealism that is unprecedented in machine learning, and in the company's own evaluations its results come close to real photographs and human-drawn art.
The model uses a frozen Transformer text encoder to turn the caption into a sequence of embeddings, then feeds that sequence to a U-Net diffusion model that generates a low-resolution image.
That low-resolution image is then fed into further diffusion models that upscale it to 1024 by 1024 pixels. These super-resolution stages use noise conditioning augmentation to improve quality and reduce artifacts. The model achieves state-of-the-art performance on the COCO benchmark, with a zero-shot FID score of 7.27 (lower is better). It also performs well on DrawBench, a new benchmark of demanding prompts that test compositionality, cardinality, spatial relations, long-form text, rare words, and other challenging cases. On this benchmark, human raters preferred it over DALL-E 2 and VQ-GAN+CLIP.
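The cascade described above can be pictured as a chain of stages, each handing a larger image to the next. The sketch below only traces image sizes through that chain; it does not run any diffusion model. The intermediate sizes of 64 and 256 pixels come from Google's Imagen paper rather than from this article, and the names `Stage`, `IMAGEN_STAGES`, and `trace_cascade` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    in_size: Optional[int]  # None: the stage starts from the text embeddings only
    out_size: int           # side length, in pixels, of the square image it produces


# Assumed stage sizes (64 -> 256 -> 1024), taken from the Imagen paper.
IMAGEN_STAGES = [
    Stage("base text-to-image diffusion", None, 64),
    Stage("super-resolution diffusion 1", 64, 256),
    Stage("super-resolution diffusion 2", 256, 1024),
]


def trace_cascade(stages: List[Stage]) -> List[int]:
    """Check that each stage accepts the previous stage's output size
    and return the image sizes produced along the way."""
    sizes = []
    current = None
    for stage in stages:
        if stage.in_size != current:
            raise ValueError(f"{stage.name} expects {stage.in_size}, got {current}")
        current = stage.out_size
        sizes.append(current)
    return sizes


print(trace_cascade(IMAGEN_STAGES))  # [64, 256, 1024]
```

Each super-resolution stage here enlarges the image by a factor of four per side, which is why two of them are enough to go from a small base image to the final 1024-pixel output.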
Like other text-to-image generators, DALL-E 2 is a generative model: a type of machine learning model that creates complex output instead of predicting or classifying input data. It is trained on a massive set of images and text descriptions scraped from the Internet, which allows it to produce a variety of results from text prompts.
One of its most promising features is its ability to keep the images it generates semantically consistent with the prompt, as in a koala dunking a basketball or an astronaut riding a horse. This is a welcome improvement over its predecessor, the original DALL-E, which often produced random and unrelated images such as a girl with headphones or an empty room.
When using DALL-E 2, be sure to read the content policy carefully: it restricts images of recognizable people and outlines ownership rights. Also, remember that you only have a limited number of free credits: 50 in your first month and 15 each month thereafter.
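To see how slowly the free allowance grows, here is a small sketch of the arithmetic. It counts the total credits granted over a number of months, using the 50 and 15 figures above. It does not model spending, and it makes no assumption about whether unused credits carry over. The function name `free_credits_granted` is hypothetical.

```python
def free_credits_granted(months: int) -> int:
    """Total free credits granted after `months` months:
    50 in the first month, 15 in each month after that."""
    if months < 0:
        raise ValueError("months must be non-negative")
    if months == 0:
        return 0
    return 50 + 15 * (months - 1)


print(free_credits_granted(1))   # 50
print(free_credits_granted(3))   # 80
print(free_credits_granted(12))  # 215
```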
Midjourney, an independent research lab’s eponymous AI program, has made waves in the world of artificial intelligence. The generative AI creates images from natural language descriptions, called prompts, much like OpenAI’s DALL-E and Google’s Imagen.
While Midjourney is less proficient at adapting actual art styles, it does excel at creating environments, particularly fantasy and dystopian sci-fi scenes with dramatic lighting that look like rendered concept art from a video game. Its unique style has attracted attention from AI enthusiasts and artists alike.
To use the program, access one of the designated bot channels for newcomers on the Discord server and enter a creative prompt in public chat with the /imagine command. Once the bot processes your prompt, it will present four visual representations of it in a grid. You can then upscale a generated image or create variations of it using the U and V buttons shown beneath the grid. Your first request starts a free trial, which gives you around 25 “Jobs” (a Job is any action the bot performs). You can also upgrade from the free trial to use the bot through private messages for an additional $20 per month.
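Because every action counts as a Job, the trial allowance can run out faster than expected. The sketch below is a plain counter that illustrates this and does not talk to Discord or Midjourney. The class name `TrialBudget` and the action labels are hypothetical, and the starting value of 25 is the approximate figure quoted above.

```python
class TrialBudget:
    """Counts down trial Jobs; every bot action costs exactly one Job."""

    def __init__(self, jobs: int = 25):  # 25 is approximate, per the text above
        self.remaining = jobs
        self.history = []

    def run(self, action: str) -> int:
        """Record one action and return the number of Jobs left."""
        if self.remaining <= 0:
            raise RuntimeError("free trial exhausted")
        self.remaining -= 1
        self.history.append(action)
        return self.remaining


budget = TrialBudget()
budget.run("/imagine")   # one grid of four images
budget.run("upscale")    # U button on one image
budget.run("upscale")    # U button on another image
budget.run("variation")  # V button on one image
print(budget.remaining)  # 21
```

In this example a single prompt with two upscales and one variation uses four Jobs, so a trial of about 25 Jobs covers only a handful of prompts explored this way.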
Bing Image Creator
Bing Image Creator is Microsoft’s latest generative AI tool. It uses OpenAI’s DALL-E model to turn text prompts into images. The feature is available starting Tuesday in Bing and Edge. It’s integrated into the chat experience, initially rolling out in Creative mode.
You can provide a prompt, add context like location or activity, and choose an art style for the AI to generate an image for you. You can then choose one of the four generated images and view a larger version to share, save to a collection, or download.
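The three ingredients mentioned above (a subject, some context, and an art style) can be combined into a single line of text before it is typed into the chat. The helper below is only one possible convention for doing that. Bing Image Creator accepts free-form text, and neither the function `build_prompt` nor the comma-separated format comes from Microsoft.

```python
from typing import Optional


def build_prompt(subject: str,
                 context: Optional[str] = None,
                 style: Optional[str] = None) -> str:
    """Join a subject, optional context, and optional art style
    into one comma-separated prompt string."""
    parts = [subject.strip()]
    if context:
        parts.append(context.strip())
    if style:
        parts.append(f"{style.strip()} style")
    return ", ".join(parts)


print(build_prompt("a lighthouse", "on a rocky coast at dusk", "watercolor"))
# a lighthouse, on a rocky coast at dusk, watercolor style
```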
As with any AI-powered tool, there’s the potential for misuse and misappropriation of these generated images. Microsoft says it’s working with OpenAI to curb this possibility and is implementing safeguards and additional protections that will help limit the creation of harmful or unsafe images. In addition to its creative potential, the feature is a great way to generate eye-catching images for your content and social media.