ZME Science

This AI Can Zoom Into a Photo 256 Times And The Results Look Insane

Chain-of-Zoom could help AI "see" up to 256 times more clearly.

by Tibi Puiu
June 4, 2025
in Future, News
Edited and reviewed by Zoe Gordon
Illustration of AI zoom
Credit: KAIST AI.

On a computer screen, the blurry photo of a flag begins to sharpen. Wrinkles emerge on its surface, creases fluttering in a phantom wind. Zoom in again, and threads begin to appear. Again — and there’s a hint of fray at the edge. In this digital sleight of hand, you’re not watching pixels merely stretch or smear. You’re watching artificial intelligence recreate what a better camera might have seen.

This is the promise of Chain-of-Zoom, or CoZ, a new AI framework developed by researchers at KAIST's Kim Jaechul Graduate School of AI in South Korea. The approach aims to solve one of the thorniest problems in modern image enhancement: how to zoom in dramatically on a low-resolution image while still keeping the details sharp and believable.

Apparently, the best way to do it is not to zoom in all at once.

Move Over, CSI

Traditional single-image super-resolution (SISR) systems do their best to guess what’s missing when they’re asked to upscale an image. Many rely on generative models trained to create plausible high-resolution versions of low-resolution photos. It’s a kind of educated guesswork that fills in the blanks with the pixels most likely to belong there. But these models are only as good as their training allows, and they tend to fall apart when pushed beyond familiar limits.

“State-of-the-art models excel at their trained scale factors yet fail when asked to enlarge images far beyond that range,” the KAIST team writes in a paper posted to the preprint server arXiv.

Chain-of-Zoom sidesteps this limitation by breaking the zooming process into manageable steps. Instead of stretching an image 256 times in one go — a leap that would cause the AI to blur or hallucinate details — CoZ builds a staircase. Each step is a small, calculated zoom, built upon the last.
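The staircase can be made concrete with a little arithmetic. The sketch below assumes every step enlarges the image by a fixed 4× factor, which is consistent with the magnification levels reported for CoZ (4×, 16×, 64×, 256×). The function name `zoom_schedule` is illustrative and not taken from the CoZ code.

```python
def zoom_schedule(target: int, step: int = 4) -> list[int]:
    """Return the cumulative magnification reached after each zoom step.

    Assumption: every step multiplies the magnification by the same `step`.
    """
    if step < 2:
        raise ValueError("step must be at least 2")
    levels = []
    scale = 1
    while scale < target:
        scale *= step
        levels.append(scale)
    if scale != target:
        raise ValueError(f"{target} is not a power of {step}")
    return levels


print(zoom_schedule(256))  # [4, 16, 64, 256]
```

Under this assumption, a 256× zoom is four 4× steps rather than one giant leap.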

At every rung of this ladder, CoZ uses an existing super-resolution model — like a well-trained diffusion model — to refine the image. But it doesn’t stop there. A Vision-Language Model (VLM) joins the process, generating descriptive prompts that help the AI imagine what should appear in the next, higher-resolution version.

“The second image is a zoom-in of the first image. Based on this knowledge, what is in the second image?” That’s one of the actual prompts used during training. The VLM’s job is to respond with a handful of meaningful words: “leaf veins,” “fur texture,” “brick wall,” and so on. These prompts guide the next zoom step, like verbal cues handed to an artist sketching in more detail.
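The loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the image is a plain grid of numbers, `nearest_upscale` stands in for a real super-resolution model (it ignores the prompt), `describe_stub` stands in for the VLM, and the crop-then-upscale structure is one reading of how each zoom step proceeds. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def center_crop(img: Image, factor: int) -> Image:
    """Keep the central 1/factor of the image along each axis."""
    h, w = len(img), len(img[0])
    ch, cw = h // factor, w // factor
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def nearest_upscale(img: Image, factor: int, prompt: str = "") -> Image:
    """Placeholder for an SR model: repeats pixels and ignores the prompt."""
    return [[px for px in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def describe_stub(wide: Image, zoomed: Image) -> str:
    """Placeholder for the VLM, which would answer: the second image is a
    zoom-in of the first, so what is in the second image?"""
    return "fine detail"


def chain_of_zoom(
    img: Image,
    steps: int,
    sr_model: Callable[[Image, int, str], Image] = nearest_upscale,
    describe: Callable[[Image, Image], str] = describe_stub,
    factor: int = 4,
) -> Tuple[Image, List[str]]:
    """Zoom in `steps` times, each step building on the previous result."""
    current, prompts = img, []
    for _ in range(steps):
        crop = center_crop(current, factor)       # region to zoom into
        prompt = describe(current, crop)          # words guiding this step
        current = sr_model(crop, factor, prompt)  # enlarge the crop
        prompts.append(prompt)
    return current, prompts


# Usage: two 2x steps on an 8x8 toy image keep the output at 8x8.
toy = [[r * 8 + c for c in range(8)] for r in range(8)]
zoomed, used_prompts = chain_of_zoom(toy, steps=2, factor=2)
print(len(zoomed), len(zoomed[0]), used_prompts)
```

Swapping in a real diffusion-based SR model and a real VLM would change only the two callables, not the loop, which is why the underlying model needs no retraining.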

Between Pixels and Words

Example results of the AI zoom using different prompts and approaches
Significance of proposed multi-scale-aware prompts: (a) Null prompt: coarse structure is retained, but high-frequency details are smoothed out. (b) DAPE prompt: inserting text from a degradation-aware prompt extractor (DAPE) helps, yet the images lack intricate detail at large magnifications. (c) VLM-generated prompts (ours): multi-scale prompts extracted by a VLM steer the SR backbone to synthesize realistic textures and crisp details. Credit: KAIST AI.

This interplay between images and language is what sets CoZ apart. As you keep zooming in, the original image loses fidelity — visual clues fade, context disappears. That’s when words matter most.

But generating the right prompts isn’t easy. Off-the-shelf VLMs can repeat themselves, invent odd phrases, or misinterpret blurry input. To keep the process grounded and efficient, the researchers turned to reinforcement learning from human feedback (RLHF). They trained their prompt-generating model to align with human preferences using a technique called Generalized Reward Policy Optimization, or GRPO.

Example of Chain-of-Zoom AI results
Qualitative results for performing CoZ with the open-source OSEDiff (leveraging Stable Diffusion v2.1 as the diffusion backbone). The GRPO fine-tuned VLM is used as the prompt extractor. Credit: KAIST AI.

Three kinds of feedback guided the learning process:

  • A critic VLM scored prompts for how well they matched the images.
  • A blacklist penalized confusing phrases like “first image” or “second image.”
  • A repetition filter discouraged generic or repetitive text.
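One way to picture how these three signals might combine into a single training reward is sketched below. The formulas and weights are assumptions made for illustration, not the paper's: the critic score is passed in as a plain number because a real critic VLM is out of scope here, the blacklist holds only the two phrases quoted above, and the repetition term covers repeated words but not generic wording.

```python
from collections import Counter

# Only the two phrases quoted in the article; a real list may be longer.
BLACKLIST = ("first image", "second image")


def blacklist_penalty(prompt: str) -> float:
    """Count how many blacklisted phrases appear in the prompt."""
    text = prompt.lower()
    return float(sum(phrase in text for phrase in BLACKLIST))


def repetition_penalty(prompt: str) -> float:
    """Fraction of words that repeat an earlier word in the prompt."""
    words = prompt.lower().split()
    if not words:
        return 0.0
    repeats = sum(count - 1 for count in Counter(words).values())
    return repeats / len(words)


def total_reward(
    prompt: str,
    critic_score: float,
    w_blacklist: float = 1.0,  # assumed weight
    w_repeat: float = 1.0,     # assumed weight
) -> float:
    """Critic score minus weighted penalties (illustrative combination)."""
    return (
        critic_score
        - w_blacklist * blacklist_penalty(prompt)
        - w_repeat * repetition_penalty(prompt)
    )


print(total_reward("leaf veins", critic_score=0.8))            # 0.8
print(total_reward("wall wall wall brick", critic_score=0.8))  # lower
```

A clean, specific prompt keeps its full critic score, while repetitive or blacklisted text is pushed down, which is the behavior the three feedback signals are meant to encourage.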

As training progressed, the prompts became cleaner, more specific, and more useful. Words like “crab claw” replaced vague guesses like “ant leg.” The final model consistently guided the super-resolution engine toward images that were both detailed and believable — even when zooming in 256 times.

Real-World Potential

Extreme super-resolution of photorealistic images by CoZ up to 64× magnification. Credit: KAIST AI.

In side-by-side comparisons with other methods — including nearest-neighbor upscaling and one-step super-resolution — CoZ produced images that stood out for their clarity and texture. Its outputs were evaluated using several no-reference quality metrics, like NIQE and CLIPIQA. Across four magnification levels (4×, 16×, 64×, 256×), CoZ consistently outperformed alternatives, especially at higher scales.

Extreme super-resolution of photorealistic images by CoZ up to 256× magnification. Credit: KAIST AI.

But beyond numbers, the promise of Chain-of-Zoom lies in its flexibility.

CoZ doesn’t require retraining the underlying super-resolution model. That makes it more accessible to developers and researchers who already rely on models like Stable Diffusion. It also opens the door to applications that need fast, high-fidelity zoom without massive computational cost.

All of this may transform how we approach super-resolution.

Potential uses span several fields, including:

  • Medical imaging, where enhanced detail could aid diagnosis.
  • Surveillance footage, helping investigators make out distant license plates or facial features.
  • Cultural preservation, restoring old photos with unprecedented clarity.
  • Scientific visualization, especially in fields like microscopy or astronomy.

In one demonstration, CoZ enhanced a photo of leaves until the individual veins were visible — features that weren’t discernible in the original low-resolution image. In another, it revealed the fine weave of a textile.

While these examples are compelling, they also hint at a double-edged sword. Once you zoom in far enough, you’re no longer viewing the original picture but a synthetic reconstruction. In other words, the fine detail in the enhanced image was generated by the model rather than captured by the camera, although it may very closely resemble the original subject of the photo.

That doesn’t make this model any less useful, but these limitations need to be clearly understood.

These limitations carry risks. Technologies like Chain-of-Zoom, while not inherently deceptive, could be used to manipulate visual data or generate misleading content from blurry sources.

The authors acknowledge this in their paper: “High-fidelity generation from low-resolution inputs may raise concern regarding misinformation or unauthorized reconstruction of sensitive visual data.”

In a world already grappling with deepfakes and visual disinformation, the ability to “see more” isn’t always a blessing. The solution, as always, lies in transparent development and responsible use.

A New Lens on Vision

For now, Chain-of-Zoom represents an elegant solution to a deeply practical problem. It doesn’t reinvent the wheel — it just changes how the wheel turns.

Instead of stretching images beyond their breaking point, CoZ asks: what if we take it slow, one zoom at a time?

The result is not just clearer images. It’s a clearer path forward.

Tags: AI, stable diffusion, upscaling

Tibi Puiu

Tibi is a science journalist and co-founder of ZME Science. He writes mainly about emerging tech, physics, climate, and space. In his spare time, Tibi likes to make weird music on his computer and groom felines. He has a B.Sc in mechanical engineering and an M.Sc in renewable energy systems.

© 2007-2025 ZME Science - Not exactly rocket science. All Rights Reserved.
