

Researchers encode data in DNA hundreds of times faster than before — with panda pics

Two images were stored in and retrieved from DNA sequences faster than ever before. This could be a game-changer for our data storage.

Mihai Andrei
October 24, 2024 @ 6:23 pm


AI image of DNA used for storage
DNA could soon become a reliable medium of storage. AI-generated image.

DNA can hold a staggering amount of information. Not only is it the blueprint for all life on Earth, but a single gram of DNA can store the equivalent of 215 million gigabytes of data. That’s enough to hold every digital book, song, and movie ever created. Gram for gram, DNA can store up to a billion times more data than silicon-based storage.

The traditional method of storing data in DNA involves encoding binary information (the ones and zeros of computing) into sequences of the four nucleotide bases, adenine (A), thymine (T), guanine (G) and cytosine (C), and then synthesizing these sequences chemically. This method is promising, but high costs and slow writing speeds hamper it. A new study sidesteps these limitations with a method that encodes data without synthesizing new DNA sequences.
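For contrast with the new approach, here is a minimal Python sketch of what the traditional idea of mapping binary data to bases can look like. The two-bits-per-base table and the function names are illustrative assumptions, not the scheme used in any particular study; real encodings add constraints (such as avoiding long runs of the same base) and error correction.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(sequence: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


sequence = encode_bytes(b"Hi")
print(sequence)                # CAGACGGC
print(decode_bases(sequence))  # b'Hi'
```

In the traditional workflow, every base in a sequence like this would then have to be chemically synthesized, which is the slow and costly step.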

The research team, led by Cheng Zhang, developed a method that uses epigenetic modifications to encode data. Epigenetic modifications are chemical changes to DNA that do not alter its sequence but can influence its function. One common type is DNA methylation, in which methyl groups are added to cytosine bases in the DNA sequence.

“It’s encouraging to see that epigenetic principles from biochemistry textbooks and taught in my classroom can be applied seamlessly to DNA data storage applications to solve some of the unmet challenges in this field,” says corresponding author Hao Yan.

How it all works

The team’s approach essentially “prints” data onto DNA using these methylation marks as binary data bits, or “epi-bits.” By using a library of prefabricated DNA templates and short DNA strands known as bricks, the researchers could guide where methyl groups are placed on the DNA, allowing them to encode complex information without having to synthesize new DNA molecules from scratch.

One of the most remarkable features of this new approach is its ability to write data in parallel. Traditional DNA synthesis is a serial process — each nucleotide must be added one at a time, which is time-consuming and costly. However, the new system allows the researchers to add multiple epi-bits of information simultaneously, increasing the speed and scalability of data storage.
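The idea of epi-bits written in parallel can be pictured with a small toy model in Python. Everything here is a hypothetical simplification: the `Brick` class, the site indices, and the function names are assumptions made for illustration, and the simulation says nothing about the actual chemistry.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Brick:
    """A prefabricated strand that targets one site (hypothetical model)."""
    position: int     # which site on the template this brick binds to
    methylated: bool  # True writes a 1, False writes a 0


def select_bricks(bits: str) -> List[Brick]:
    """Pick one brick per bit position from the prefabricated library."""
    return [Brick(position=i, methylated=(bit == "1")) for i, bit in enumerate(bits)]


def write_in_one_reaction(site_count: int, bricks: List[Brick]) -> List[int]:
    """Each brick sets only its own site, so the order of bricks does not matter."""
    marks = [0] * site_count
    for brick in bricks:
        marks[brick.position] = 1 if brick.methylated else 0
    return marks


def read_epi_bits(marks: List[int]) -> str:
    """Reading back is just listing which sites carry a methyl mark."""
    return "".join(str(mark) for mark in marks)


bricks = select_bricks("10110010")
marks = write_in_one_reaction(8, bricks)
print(read_epi_bits(marks))  # 10110010
```

Because each brick only affects its own position, all bricks can be mixed in together rather than added one after another, which is the property that makes parallel writing possible in this toy picture.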

Think of writing a letter by hand: you form each character one at a time, which is slow. A printer, by contrast, lays down an entire row at once, which is much faster.

“This new approach demonstrates how one can harness molecular mechanisms for innovative data solutions, bridging the fields of biology and digital information,” says Laura Na Liu, a co-author of the new study.

Coding panda pics into DNA

Images of tigers showing the results of different DNA data storage methods
Recovered tiger images from samples 1 to 4 with stepwise improved writing-reading pipelines. 

The team tested their approach by storing an image of a panda and a rubbing in the shape of a tiger from ancient China. They then retrieved them with a DNA sequencer.

In their experiments, the researchers stored approximately 275,000 bits of information using their new system (about 34 kilobytes). They achieved this by employing a set of 700 DNA “movable types” (i.e., pre-made short DNA sequences) and five universal DNA templates. This allowed them to write 350 bits of data in a single reaction, a significant improvement over traditional methods. The approach was also reliable, with high fidelity and error rates below 3%.
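To put these figures in more familiar terms, here is a quick back-of-the-envelope calculation in Python. The inputs are the numbers quoted above; the derived values are rough estimates, not results reported by the researchers. In particular, the reaction count assumes every reaction carries a full 350 bits of payload and ignores any overhead from error correction.

```python
import math

# Figures quoted in the article.
TOTAL_BITS = 275_000
BITS_PER_REACTION = 350
MOVABLE_TYPES = 700

# 8 bits per byte, 1,000 bytes per kilobyte.
kilobytes = TOTAL_BITS / 8 / 1000

# Rough lower bound on the number of reactions (assumes full 350-bit loads).
min_reactions = math.ceil(TOTAL_BITS / BITS_PER_REACTION)

# Ratio of movable types to bits written per reaction.
types_per_bit = MOVABLE_TYPES / BITS_PER_REACTION

print(f"{kilobytes:.1f} kB")  # 34.4 kB
print(min_reactions)          # 786
print(types_per_bit)          # 2.0
```

The ratio of two movable types per bit would be consistent with having one pre-made strand for each bit value at every position, although that reading is an assumption rather than something stated here.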

The DNA coding scheme for the image of a panda
 Compression and error correction coding scheme for panda image (i), and a schematic of the retrieved epi-bits on sequencing reads along with the restored image (ii).

To ensure that the data stored using epigenetic modifications could be accurately read, the researchers used high-throughput nanopore sequencing, a technology that reads DNA sequences by passing them through a tiny pore and detecting changes in electrical current.

The researchers also demonstrated another aspect of their technology: its accessibility. They conducted a pilot experiment called “iDNAdrive,” in which 60 student volunteers with no professional biolab experience successfully encoded their own data into DNA using a simple kit. This suggests that the system is not only scalable but also user-friendly.

This marks a significant departure from existing DNA data storage methods, which can only be carried out in a lab. In this distributed system, users could “write” data to DNA in their own homes and then retrieve it later through sequencing.

Big promise, big challenges

This research highlights the incredible potential of DNA as a medium for storing vast amounts of data in a compact, stable, and durable form. The innovative use of epigenetic modifications to encode data provides a new way to overcome the limitations of traditional DNA synthesis methods.

DNA is much more stable than silicon and other traditional storage media. Properly stored, DNA can last for thousands of years, making it ideal for archival purposes, such as preserving cultural artifacts, historical records, or scientific data. This method’s potential for distributed data storage could revolutionize personal data privacy and security. Instead of relying on cloud storage or data centers, individuals could store their most sensitive information in DNA, which could be kept in a secure location and accessed only when needed.

However, there are also enormous challenges ahead. For starters, only very small amounts of information were stored, and the error rates, while relatively low (<3%), are not acceptable for data we work with routinely.

Another challenge is the speed of data retrieval. Although nanopore sequencing allows for high-throughput reading of DNA, it is still slower than the reading speeds of conventional digital storage devices. Advances in sequencing technology will be crucial to making DNA data storage competitive with silicon-based systems.

The study “Parallel molecular data storage by printing epigenetic bits on DNA” was published in Nature.
