

This AI module can create stunning images out of any text input

"an illustration of a baby daikon radish in a tutu walking a dog" "a lovestruck cup of boba" "a snail made of harp"

Mihai Andrei
January 8, 2021 @ 6:11 pm


A few months ago, researchers unveiled GPT-3, the most advanced text-writing AI developed so far. The results were impressive: not only could the AI produce its own texts and mimic a given style, but it could even write bits of simple code. Now, scientists at OpenAI, which developed GPT-3, have added a new module to the mix.

“an armchair in the shape of an avocado”. Credit: OpenAI

Called DALL·E, a portmanteau of the artist Salvador Dalí and Pixar’s WALL·E, the module takes a text prompt describing an object with multiple characteristics, analyzes it, and then creates a picture of what it understands.

Take the example above. “An armchair in the shape of an avocado” is fairly descriptive, but it can also be interpreted in several slightly different ways, and the AI does just that. Sometimes it struggles to grasp the meaning, but if you phrase the prompt in more than one way, it usually gets the job done, the researchers note in a blog post.

“We find that DALL·E can map the textures of various plants, animals, and other objects onto three-dimensional solids. As in the preceding visual, we find that repeating the caption with alternative phrasing improves the consistency of the results.”

Details about the module’s architecture have been scarce, but what we do know is that its operating principle is the same as that of the text-based GPT-3. If the user gives the text AI a prompt such as “Tell me a story about a white cat who jumps on a house”, it will produce a story of that nature. Entering the same prompt a second time won’t reproduce the same output, but a different version of the story. The image module works the same way: the user can get multiple variations of the same input, not just one. Remarkably, the AI can even transfer human activities and characteristics to objects, such as a radish walking a dog or a lovestruck cup of boba.

“an illustration of a baby daikon radish in a tutu walking a dog”. Credit: OpenAI.
“a lovestruck cup of boba”. Image credits: OpenAI.

“We find it interesting how DALL·E adapts human body parts onto animals,” the researchers note. “For example, when asked to draw a daikon radish blowing its nose, sipping a latte, or riding a unicycle, DALL·E often draws the kerchief, hands, and feet in plausible locations.”

Perhaps the most striking thing about these images is how plausible they look. They are not just dull representations of objects; the adaptations and novelties in the images seem to show a kind of creativity. There’s also an almost human ambiguity to the way the AI interprets its input. For instance, here are some images it produced when asked for “a collection of glasses sitting on a table”.

Image credits: OpenAI.

The system draws on a large body of information gathered from internet pages. Each part of the text is treated separately to work out what it should look like. For instance, for the image above, it would draw on thousands of photos of glasses, then thousands of photos of tables, and then combine the two. Sometimes it settles on eyeglasses; other times, on drinking glasses, or a mixture of both.

DALL·E also appears capable of combining concepts that don’t (or are unlikely to) exist together, transferring traits from one to the other. This is apparent in the avocado-shaped armchair images, but it is even more striking in the “snail made of harp” ones.

The algorithm also has the ability to apply some optical distortion to scenes, such as “fisheye lens view” and “a spherical panorama,” its creators note.

DALL·E is also capable of reproducing and adapting real places or objects. When prompted to draw famous landmarks or traditional food, it

At this point, it’s not entirely clear what the module could be used for. Fashion and design come to mind as potential applications, though this is likely just scratching the surface of what it can do. Until further details are released, take a moment to relax with this collage of capybaras looking at the sunset, painted in different styles.

Image credits: OpenAI
