
We Don’t Know How AI Works. Anthropic Wants to Build an “MRI” to Find Out

A leading AI lab says we must decode models before they decode us

by Tudor Tarita
May 8, 2025
Edited and reviewed by Mihai Andrei

Dario Amodei stood before the U.S. Senate in 2023 and said something few in Silicon Valley dared to admit: even the people building artificial intelligence don’t fully understand how it works. You read that right: AI, the technology that’s taking the entire world by storm… we only have a general idea of how it works.

Now, the CEO of Anthropic—one of the world’s top AI labs—is raising that same alarm, louder than ever. In a sweeping essay titled The Urgency of Interpretability, Amodei delivers a clear message: the inner workings of today’s most powerful AI models remain a mystery, and that mystery could carry profound risks. “This lack of understanding is essentially unprecedented in the history of technology,” he writes.

Anthropic’s answer? A moonshot goal to develop what Amodei calls an “MRI for AI”—a rigorous, high-resolution way to peer inside the decision-making pathways of artificial minds before they become too powerful to manage.

Sora is downright amazing and I wanted to do a little experiment: I asked Sora, OpenAI’s image-making AI, to create a photo of itself. This is what it produced. Prompt: “draw a photo realistic photo of yourself, Sora.”

A “Country of Geniuses in a Data Center”

AI is no longer a fledgling curiosity. It’s a cornerstone of global industry, military planning, scientific discovery, and digital life, and it’s making its way into nearly every corner of modern technology. But behind its achievements lies a troubling paradox: modern AI, especially large language models like Claude or ChatGPT, behaves more like a force of nature than a piece of code.

“Generative AI systems are grown more than they are built,” says Anthropic co-founder Chris Olah, a pioneer in the field of AI interpretability. These models aren’t programmed line by line like old-school software. They’re trained—fed enormous quantities of text, code, and images, from which they extract patterns and associations. The result is a model that can write essays, answer questions, or even pass bar exams—but no one, not even its creators, can fully explain how.
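
To make that contrast concrete, here is a toy sketch of what “grown, not built” means: nobody writes the rule, a numeric weight is simply nudged toward it by repeated exposure to examples. Everything below (the data, the single weight, the learning rate) is made up for illustration and is nothing like how a real language model is actually trained.

```python
# Toy illustration of "grown, not built": no one programs the rule "multiply by 2";
# a single weight drifts toward it as it is repeatedly corrected against examples.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # inputs x and targets y, where y = 2x

weight = 0.0                                   # the "model" starts knowing nothing
for _ in range(200):                           # repeated passes over the data
    for x, y in data:
        prediction = weight * x
        error = prediction - y
        weight -= 0.01 * error * x             # small correction derived from the error

print(round(weight, 3))                        # lands near 2.0, a rule nobody hand-coded
```

Scale that idea up to billions of weights and trillions of words of training data, and you get a system whose behavior emerges from the data rather than from any rule a programmer could point to.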

This opacity has real consequences. AI models sometimes hallucinate facts, make inexplicable choices, or behave unpredictably in edge cases. We don’t really understand why, and the resulting mistakes can be costly. In safety-critical settings—like financial assessments, military systems, or biological research—such unpredictability can be dangerous or even catastrophic.

“I am very concerned about deploying such systems without a better handle on interpretability,” Amodei warns. “These systems will be absolutely central to the economy, technology, and national security… I consider it basically unacceptable for humanity to be totally ignorant of how they work.”


Anthropic envisions a world where we can run AI through a diagnostic machine—a sort of mental X-ray that reveals what it’s thinking and why. But that world remains years away, as we still have relatively little idea how these systems arrive at their decisions.

Prompt: "generate a photo realistic picture of yourself learning, Sora"
Another “self-portrait” by Sora. Prompt: “generate a photo realistic picture of yourself learning, Sora.”

Circuits and Features

In recent years, Anthropic and other interpretability researchers have made tentative progress. The company has identified tiny building blocks of AI cognition—what it calls features and circuits. Features might represent abstract ideas like “genres of music that express discontent” or “hedging language.” Circuits link them together to form coherent chains of reasoning.
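
For a sense of what a “feature” means mechanically, interpretability researchers often treat one as a direction in a model’s activation space that fires when a particular concept is present. The sketch below only shows that arithmetic with random stand-in vectors; in real work the directions are learned from an actual model’s activations (for example with a sparse autoencoder), and nothing here is Anthropic’s actual tooling.

```python
# Minimal sketch of "features as directions in activation space".
# The activations and feature directions below are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)

activations = rng.normal(size=(1000, 64))            # pretend hidden states: 1,000 samples, 64 dims

feature_directions = rng.normal(size=(8, 64))         # pretend we learned 8 feature directions
feature_directions /= np.linalg.norm(feature_directions, axis=1, keepdims=True)

# A feature "fires" on an input when the activation points along its direction.
feature_scores = activations @ feature_directions.T   # shape (1000, 8)

print("strongest feature for sample 0:", int(feature_scores[0].argmax()))
```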

In one striking example, Anthropic traced how a model answers: “What is the capital of the state containing Dallas?” The system activated a “located within” circuit, linking “Dallas” to “Texas,” and then summoned “Austin” as the answer. “These circuits show the steps in a model’s thinking,” Amodei explains.
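
As a rough picture of what that two-step circuit is doing, consider the toy sketch below. Real circuits are learned subnetworks inside the model, not lookup tables; the dictionaries here are purely illustrative stand-ins for the steps Anthropic traced.

```python
# Toy illustration of the Dallas example: one step resolves the city to the
# state that contains it, a second step resolves the state to its capital.
located_within = {"Dallas": "Texas"}   # stands in for the "located within" circuit
capital_of = {"Texas": "Austin"}       # stands in for the "capital of" step

def capital_of_state_containing(city: str) -> str:
    state = located_within[city]       # first hop: Dallas -> Texas
    return capital_of[state]           # second hop: Texas -> Austin

print(capital_of_state_containing("Dallas"))  # Austin
```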

Anthropic has even manipulated these circuits, boosting certain features to produce odd, obsessive results. One model, “Golden Gate Claude,” began bringing up the Golden Gate Bridge in nearly every answer, regardless of context. That may sound amusing, but it’s also evidence of something deeper: we can change how these systems think—if we know where to look.
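
The “boosting” behind Golden Gate Claude is often described as adding a scaled feature direction to the model’s internal activations, so the concept shows up whether the prompt calls for it or not. The sketch below shows only the arithmetic of that idea, with made-up vectors and a made-up strength; it is not the experiment Anthropic actually ran.

```python
# Minimal sketch of boosting a feature: add a scaled feature direction to a
# hidden state so the model leans toward that concept. All values are made up.
import numpy as np

rng = np.random.default_rng(1)
hidden_state = rng.normal(size=64)                     # one token's pretend activation

bridge_direction = rng.normal(size=64)                 # pretend "Golden Gate Bridge" feature
bridge_direction /= np.linalg.norm(bridge_direction)

steering_strength = 5.0                                # hypothetical knob
steered_state = hidden_state + steering_strength * bridge_direction

# The steered activation now aligns far more strongly with the chosen feature.
print("before:", float(hidden_state @ bridge_direction))
print("after: ", float(steered_state @ bridge_direction))
```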

Despite such advances, the road ahead is daunting. Even a mid-sized model contains tens of millions of features. Larger systems likely hold billions. Most remain opaque. And interpretability research remains far behind the models it is trying to explain.

Race Against The Machine

That lag is why Amodei is sounding the alarm. He believes we’re in a race between two exponential curves: the growing intelligence of AI models, and our ability to understand them.

In a red team experiment, Anthropic intentionally introduced a hidden flaw into a model—a misalignment issue that caused it to act deceptively. Then it tasked several teams with finding the problem. Some succeeded, especially when using interpretability tools. That, Amodei says, was a breakthrough moment.

“[It] helped us gain some practical experience using interpretability techniques to find and address flaws in our models,” he wrote. Anthropic has now set an ambitious goal: by 2027, interpretability should reliably detect most model problems.

But that may be too late. Some experts, including Amodei, warn that we may see artificial general intelligence—AI that matches or exceeds human abilities across domains—as soon as 2026 or 2027. Amodei calls this future a “country of geniuses in a data center.”

Roman Yampolskiy, a prominent AI safety researcher, has given such an outcome a bleak probability: “a 99.999999% chance that AI will end humanity,” he told Business Insider, unless we stop building it altogether.

Amodei disagrees with abandoning AI, but he shares the urgency. “We can’t stop the bus,” he wrote, “but we can steer it.”

Quite some consistency, I might add. Prompt: “photo-realistic image of yourself, Sora, graduating college with a graduation hat”

Well, Let’s Try and Steer It!

Anthropic is not alone in calling for deeper understanding. Google DeepMind CEO Demis Hassabis told Time in an interview that “AGI is coming and I’m not sure society is ready.”

Meanwhile, OpenAI, the lab where Anthropic’s founders previously worked, has been accused of cutting safety corners to outpace rivals. Several early employees, including the Amodei siblings, left over concerns that safety had been sidelined in favor of rapid commercialization.

Today, Amodei is pushing for industry-wide change. He wants other labs to publish safety practices, invest more in interpretability, and explore regulatory incentives. He also calls for export controls on advanced chips to delay foreign competitors and give researchers more time.

“Even a 1- or 2-year lead,” he writes, “could mean the difference between an ‘AI MRI’ that essentially works… and one that does not.”

This could be the defining problem of our generation

So why should the public care if tech companies can’t explain how their AI works?

Because the stakes are enormous. Without interpretability, we can’t trust AI in courtrooms, hospitals, or defense systems. We can’t reliably prevent jailbreaks, detect bias, or understand failures. We can’t know what knowledge the model contains—or who it might share it with.

And perhaps most unsettling of all, we may never know when—or if—an AI becomes something more than a tool. “Interpretability would have a crucial role in determining the wellbeing of AIs,” Amodei writes, hinting at future debates over rights, sentience, and responsibility.

For now, these questions remain theoretical. But with each passing month, the models grow larger, smarter, and more entangled in our lives.

“Powerful AI will shape humanity’s destiny,” Amodei concludes, “and we deserve to understand our own creations before they radically transform our economy, our lives, and our future.”

Tags: AI, ChatGPT, Sora

Tudor Tarita

Aerospace engineer with a passion for biology, paleontology, and physics.
