
Better than Photoshop: AI synthesizes and edits complex images from a text description — and they’re mind-bogglingly good

I'd like a corgi wearing a red bowtie and a purple party hat, please.

by Tibi Puiu
December 21, 2021
in Future, News

Text-to-image synthesis generates images from natural language descriptions. You imagine some scenery or action, describe it in text, and the AI generates the image for you from scratch. The image is unique and can be thought of as a window into machine ‘creativity’, if you can call it that. The field is still in its infancy, and while earlier models were buggy and not all that impressive, the state of the art recently showcased by researchers at OpenAI is simply stunning. Frankly, it’s also a bit scary considering the abuse potential of deepfakes.

Imagine “a surrealist dream-like oil painting by Salvador Dali of a cat playing checkers”, “a futuristic city in synthwave style”, or “a corgi wearing a red bowtie and purple party hat”. What would these pictures look like? Perhaps if you were an artist, you could make them yourself. But the AI models developed at OpenAI, an AI research company co-founded by Elon Musk and other prominent tech figures, can generate photorealistic images almost instantly.

The images featured below speak for themselves.

“We observe that our model can produce photorealistic images with shadows and reflections, can compose multiple concepts in the correct way, and can produce artistic renderings of novel concepts,” the researchers wrote in a paper posted on the preprint server arXiv.

To achieve photorealism from free-form text prompts, the researchers used guided diffusion models. Diffusion models work by corrupting the training data with progressively added Gaussian noise, slowly wiping out details until the data becomes pure noise, and then training a neural network to reverse this corruption process. Their advantage over other image synthesis models lies in their high sample quality: the resulting images or audio files are ones that human judges find hard to distinguish from real examples.
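For the technically curious, here is a minimal sketch of that forward “noising” process in Python with NumPy. The noise schedule, step count, and toy image below are illustrative assumptions, not the values OpenAI used; the point is simply how an image is gradually destroyed so that a network can learn to undo the damage.

```python
import numpy as np

def forward_diffusion(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Progressively corrupt an image x0 with Gaussian noise (illustrative).

    Returns the sequence of increasingly noisy versions; by the last step the
    signal is essentially pure noise. A neural network is then trained to
    reverse this corruption one step at a time.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)   # noise schedule (assumed)
    alphas_cumprod = np.cumprod(1.0 - betas)                # fraction of signal surviving at each step

    noisy = []
    for t in range(num_steps):
        noise = np.random.randn(*x0.shape)
        # Standard closed form for the t-th corrupted sample:
        # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
        x_t = np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1.0 - alphas_cumprod[t]) * noise
        noisy.append(x_t)
    return noisy

# Toy example: corrupt a random 64x64 "image"
x0 = np.random.rand(64, 64, 3)
steps = forward_diffusion(x0)
print(len(steps), steps[-1].std())  # by the final step, the sample is close to pure Gaussian noise
```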

The computer scientists at OpenAI first trained a 3.5 billion parameter diffusion model that contains a text encoder to condition the image content on natural language descriptions. Next, they compared two distinct techniques for guiding diffusion models towards text prompts: CLIP guidance and classifier-free guidance. Using a combination of automated and human evaluations, the study found classifier-free guidance yields the highest-quality images.
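To make that distinction concrete, below is a rough, hypothetical sketch of a classifier-free guidance step in PyTorch. The `model`, its call signature, and the guidance scale are placeholders rather than OpenAI’s actual code; what matters is how the conditional and unconditional noise estimates are combined to pull the sample toward the text prompt.

```python
import torch

def classifier_free_guidance(model, x_t, t, text_embedding, guidance_scale=3.0):
    """One denoising step with classifier-free guidance (schematic).

    The same network predicts the noise twice: once conditioned on the text
    embedding and once with the condition dropped (a "null" embedding it saw
    during training). Extrapolating from the unconditional prediction toward
    the conditional one pushes the sample to agree more strongly with the text.
    """
    null_embedding = torch.zeros_like(text_embedding)   # stand-in for the "no text" condition
    eps_cond = model(x_t, t, text_embedding)             # conditional noise estimate
    eps_uncond = model(x_t, t, null_embedding)           # unconditional noise estimate
    # Guided estimate: move past the conditional prediction by guidance_scale
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy usage with a dummy "model" that ignores most of its inputs
dummy = lambda x, t, c: x * 0.0 + c.mean()
x = torch.randn(1, 3, 64, 64)
emb = torch.randn(1, 512)
print(classifier_free_guidance(dummy, x, torch.tensor([10]), emb).shape)
```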

While these diffusion models are perfectly capable of synthesizing high-quality images from scratch, producing convincing images from very complex descriptions can be challenging. This is why the present model was equipped with editing capabilities in addition to “zero-shot generation”. Given a text description and an existing image, the model can repaint a selected region so that it matches the prompt. Edits match the style and lighting of the surrounding content, so it all feels like an automated Photoshop. This hybrid system is known as GLIDE, or Guided Language to Image Diffusion for Generation and Editing.
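The editing workflow can be pictured as a masked denoising loop. The sketch below uses placeholder function names and assumes that pixels outside the user-selected mask are pinned to the source image at every step; it is a simplified illustration, not GLIDE’s actual implementation.

```python
import torch

def inpaint(denoise_step, image, mask, text_embedding, num_steps=100):
    """Schematic text-guided inpainting loop (not GLIDE's exact code).

    `mask` is 1 where the model may repaint and 0 where the original image must
    be preserved. At each reverse-diffusion step the known region is overwritten
    with the original pixels (a fuller implementation would noise them to match
    step t), so the generated content blends with the surrounding image.
    """
    x = torch.randn_like(image)                    # start from pure noise
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t, text_embedding)     # one guided denoising step
        x = mask * x + (1.0 - mask) * image        # keep unmasked pixels pinned to the source
    return x

# Toy usage with a dummy denoiser that just shrinks the noise a little
dummy_step = lambda x, t, c: 0.99 * x
img = torch.zeros(1, 3, 64, 64)
msk = torch.zeros(1, 1, 64, 64)
msk[..., 16:48, 16:48] = 1.0                       # repaint only the center region
out = inpaint(dummy_step, img, msk, text_embedding=None)
print(out.shape)
```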


For instance, given an existing photo of a girl hugging a dog and a text description like “a girl hugging a corgi on a pedestal”, GLIDE masks out the original canine and paints a corgi in its place.

Besides inpainting, the diffusion model is able to produce its own illustrations in various styles, such as the style of a particular artist, like Van Gogh, or the style of a specific painting. GLIDE can also compose concepts like a bowtie and birthday hat on a corgi, all while binding attributes, such as color or size, to these objects. Users can also make convincing edits to existing images with a simple text command.

Of course, GLIDE is not perfect. The examples posted above are success stories, but the study had its fair share of failures. Prompts that describe highly unusual objects or scenarios, such as a car with triangular wheels, do not produce satisfying results. Diffusion models are only as good as their training data, so imagination is still very much in the human domain — for now at least.

The code for GLIDE has been released on GitHub.

Tags: artificial intelligence, diffusion model

Tibi Puiu

Tibi is a science journalist and co-founder of ZME Science. He writes mainly about emerging tech, physics, climate, and space. In his spare time, Tibi likes to make weird music on his computer and groom felines. He has a B.Sc in mechanical engineering and an M.Sc in renewable energy systems.

