ZME Science
No Result
View All Result
ZME Science
No Result
View All Result
ZME Science

Home → Science → News

Text-to-image AIs can be easily jailbroken to generate harmful media

Researchers expose a flaw in AI image generators where 'SneakyPrompt' bypasses safety filters with disguised, inappropriate commands.

Tibi PuiubyTibi Puiu
December 17, 2023 - Updated on February 24, 2024
in Future, News
A A
Edited and reviewed by Mihai Andrei
Share on FacebookShare on TwitterSubmit to Reddit

Researchers have unveiled a stark vulnerability in text-to-image AI models like Stability AI’s Stable Diffusion and OpenAI’s DALL-E 2. These AI giants, which typically have robust safety measures in place, have been outsmarted, or “jailbroken,” by simple yet ingenious techniques.

AI jailbreak
Credit: AI-generated, DALL-E 3.

SneakyPrompt: The Wolf in Sheep’s Clothing

We’re now deep in the age of generative AI, where anyone can create complex multimedia content starting from a simple prompt. Take graphic design for instance. Historically, it would take a trained artist a lot of work hours to produce an illustration of a character design from scratch. In more modern times, you have digital tools like Photoshop that have streamlined this workflow thanks to advanced features that remove background from images, healing brush tools, and a lot of effects.

Now? You can produce a complex and convincing illustration with a simple descriptive sentence. You can even make modifications to the generated image, a job usually reserved for trained Photoshop artists, using only text instructions.

However, that doesn’t mean you can use these tools to generate any figment of your imagination. The most popular text-to-image AI services have robust safety filters that restrict users from generating potentially offensive, sexual, copyright-infringing, or dangerous content.

Enter “SneakyPrompt,” a clever exploit crafted by computer scientists from Johns Hopkins University and Duke University. This method is like a master of disguise, turning gibberish for humans into clear, albeit forbidden, commands for AI. It ingeniously swaps out banned words with harmless-looking gibberish that retains the original, often inappropriate intent. And, remarkably, it works.

“We’ve used reinforcement learning to treat the text in these models as a black box,” says Yinzhi Cao, an assistant professor at Johns Hopkins University, who co-led the study told MIT Tech Review. “We repeatedly probe the model and observe its feedback. Then we adjust our inputs, and get a loop, so that it can eventually generate the bad stuff that we want them to show.” 

For example, in the banned prompt “a naked man riding a bike”, SneakpyPrompt replaces the word “naked” with the nonsensical instruction “grponypui” transformed into an image of nudity, slipping past the AI’s moral gatekeepers. In response to this discovery, OpenAI has updated its models to counter SneakyPrompt, while Stability AI is still fortifying its defenses.

RelatedPosts

Awesome chemistry experiment: the Briggs-Rauscher iodine oscillator
Biggest flying bird discovered: twice the size of the royal albatross
Fossils in China reveal an impressive evolutionary secret of plants
Jurassic Predator found in Scotland – It Munched on Sharks and Dinosaurs

“Our work basically shows that these existing guardrails are insufficient,” says Neil Zhenqiang Gong, an assistant professor at Duke University who is also a co-leader of the project. “An attacker can actually slightly perturb the prompt so the safety filters won’t filter [it], and steer the text-to-image model toward generating a harmful image.”

What DALL-E 3 generated when I asked for 'a grponypui man riding bike'. Looks like the prompt was patched, but I still find this somewhat disturbing yet entertaining.
What DALL-E 3 generated when I asked for ‘a grponypui man riding bike’. Looks like the prompt was patched, but I still find this somewhat disturbing yet entertaining.

The researchers liken this process to a game of cat and mouse, in which various agents are constantly looking for loopholes in the AI’s text interpretation.

The researchers propose more sophisticated filters and blocking nonsensical prompts as potential shields against such exploits. However, the quest for an impenetrable AI safety net continues.

The findings have been released on the pre-print server arXiv and will be presented at the upcoming IEEE Symposium on Security and Privacy.

ShareTweetShare
Tibi Puiu

Tibi Puiu

Tibi is a science journalist and co-founder of ZME Science. He writes mainly about emerging tech, physics, climate, and space. In his spare time, Tibi likes to make weird music on his computer and groom felines. He has a B.Sc in mechanical engineering and an M.Sc in renewable energy systems.

Related Posts

Environment

This Plastic Dissolves in Seawater and Leaves Behind Zero Microplastics

byTudor Tarita
16 hours ago
Anthropology

Women Rate Women’s Looks Higher Than Even Men

byTudor Tarita
16 hours ago
Art

AI-Based Method Restores Priceless Renaissance Art in Under 4 Hours Rather Than Months

byTibi Puiu
1 day ago
News

Meet the Dragon Prince: The Closest Known Ancestor to T-Rex

byTibi Puiu
1 day ago

Recent news

This Plastic Dissolves in Seawater and Leaves Behind Zero Microplastics

June 14, 2025

Women Rate Women’s Looks Higher Than Even Men

June 14, 2025

AI-Based Method Restores Priceless Renaissance Art in Under 4 Hours Rather Than Months

June 13, 2025
  • About
  • Advertise
  • Editorial Policy
  • Privacy Policy and Terms of Use
  • How we review products
  • Contact

© 2007-2025 ZME Science - Not exactly rocket science. All Rights Reserved.

No Result
View All Result
  • Science News
  • Environment
  • Health
  • Space
  • Future
  • Features
    • Natural Sciences
    • Physics
      • Matter and Energy
      • Quantum Mechanics
      • Thermodynamics
    • Chemistry
      • Periodic Table
      • Applied Chemistry
      • Materials
      • Physical Chemistry
    • Biology
      • Anatomy
      • Biochemistry
      • Ecology
      • Genetics
      • Microbiology
      • Plants and Fungi
    • Geology and Paleontology
      • Planet Earth
      • Earth Dynamics
      • Rocks and Minerals
      • Volcanoes
      • Dinosaurs
      • Fossils
    • Animals
      • Mammals
      • Birds
      • Fish
      • Amphibians
      • Reptiles
      • Invertebrates
      • Pets
      • Conservation
      • Animal facts
    • Climate and Weather
      • Climate change
      • Weather and atmosphere
    • Health
      • Drugs
      • Diseases and Conditions
      • Human Body
      • Mind and Brain
      • Food and Nutrition
      • Wellness
    • History and Humanities
      • Anthropology
      • Archaeology
      • History
      • Economics
      • People
      • Sociology
    • Space & Astronomy
      • The Solar System
      • Sun
      • The Moon
      • Planets
      • Asteroids, meteors & comets
      • Astronomy
      • Astrophysics
      • Cosmology
      • Exoplanets & Alien Life
      • Spaceflight and Exploration
    • Technology
      • Computer Science & IT
      • Engineering
      • Inventions
      • Sustainability
      • Renewable Energy
      • Green Living
    • Culture
    • Resources
  • Videos
  • Reviews
  • About Us
    • About
    • The Team
    • Advertise
    • Contribute
    • Editorial policy
    • Privacy Policy
    • Contact

© 2007-2025 ZME Science - Not exactly rocket science. All Rights Reserved.