Researchers encode data in DNA hundreds of times faster than before

DNA could soon become a reliable medium of storage. (AI-generated image.)

DNA can hold a staggering amount of information. Not only is it the blueprint for all life on Earth, but a single gram of DNA can store the equivalent of 215 million gigabytes of data. That’s enough to hold every digital book, song, and movie ever created. Gram for gram, DNA can store up to a billion times more data than silicon-based storage.

The traditional method of storing data in DNA involves encoding binary information (the ones and zeros of computing) into sequences of nucleotide bases, namely adenine (A), thymine (T), guanine (G) and cytosine (C), and then synthesizing these sequences chemically. This method is promising, but high costs and slow writing speeds hold it back. The new study sidesteps these limitations with a method that encodes data without synthesizing new DNA sequences.
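The conventional scheme can be sketched in Python by packing two bits into each base. The mapping used here (00 to A, 01 to C, 10 to G, 11 to T) is an illustrative assumption. Real systems use their own codes, add constraints such as avoiding long runs of the same base, and layer error correction on top.

```python
# Illustrative 2-bits-per-base mapping (an assumption for this sketch;
# published schemes differ and usually add constraints and error correction).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)  # CAGACGGC
print(decode(strand))  # b'Hi'
```

In a synthesis-based system, every base in a string like this has to be built chemically. That per-base synthesis is the expensive, slow step the new method avoids.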

The research team, led by Cheng Zhang, instead uses epigenetic modifications to encode data. Epigenetic modifications are chemical changes to DNA that do not alter its sequence but can influence its function. One common type of epigenetic modification is DNA methylation, in which methyl groups are added to cytosine bases in the DNA sequence.
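The key idea is that information can live in methyl marks while the base sequence stays untouched. The Python toy model below illustrates it. It assumes that each bit sits on a cytosine in a CpG context (a C followed by a G) and that 1 means "methylated". Both choices, the template, and the function names are hypothetical and are not taken from the study.

```python
import re


def cpg_sites(template: str) -> list[int]:
    """Positions of cytosines in a CpG context (a C immediately followed by a G)."""
    return [m.start() for m in re.finditer("CG", template)]


def write_marks(template: str, bits: list[int]) -> set[int]:
    """Return the set of cytosine positions to methylate (bit 1 = methylated).

    The template string itself is never modified; the data lives only in
    which positions carry a methyl group.
    """
    sites = cpg_sites(template)
    if len(bits) != len(sites):
        raise ValueError(f"template has {len(sites)} sites, got {len(bits)} bits")
    return {pos for pos, bit in zip(sites, bits) if bit == 1}


def read_marks(template: str, methylated: set[int]) -> list[int]:
    """Recover the bit string by checking each CpG site for a methyl mark."""
    return [1 if pos in methylated else 0 for pos in cpg_sites(template)]


template = "ACGTTCGAACGG"  # hypothetical template with three CpG sites
marks = write_marks(template, [1, 0, 1])
print(sorted(marks))  # [1, 9]
print(read_marks(template, marks))  # [1, 0, 1]
```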

“It’s encouraging to see that epigenetic principles from biochemistry textbooks and taught in my classroom can be applied seamlessly to DNA data storage applications to solve some of the unmet challenges in this field,” says corresponding author Hao Yan.

How it all works

The team’s approach essentially “prints” data onto DNA using these methylation marks as binary data bits, or “epi-bits.” By using a library of prefabricated DNA templates and short DNA strands known as bricks, the researchers could guide where methyl groups are placed on the DNA, allowing them to encode complex information without having to synthesize new DNA molecules from scratch.
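The "movable type" idea can be expressed as a toy Python model. Purely for illustration, it assumes the library holds two pre-made brick variants for every bit position: one that leads to a methyl mark and one that does not. Writing a message then reduces to picking bricks. The `Brick` class and helper names are hypothetical, and the model does not cover the chemistry of how a brick transfers a mark to the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brick:
    """A hypothetical pre-made short strand covering one bit position."""
    position: int
    methylated: bool


def build_library(n_positions: int) -> dict[tuple[int, bool], Brick]:
    """Prefabricate both variants (marked and unmarked) for every position, once."""
    return {
        (pos, flag): Brick(pos, flag)
        for pos in range(n_positions)
        for flag in (False, True)
    }


def compose_print_job(library: dict[tuple[int, bool], Brick], bits: list[int]) -> list[Brick]:
    """Pick one brick per position; the whole selection goes into a single reaction."""
    return [library[(pos, bit == 1)] for pos, bit in enumerate(bits)]


library = build_library(8)
job = compose_print_job(library, [1, 0, 1, 1, 0, 0, 1, 0])
print(len(library), len(job))  # 16 8
print([int(brick.methylated) for brick in job])  # [1, 0, 1, 1, 0, 0, 1, 0]
```

Because the library is built once and reused, each new message needs only selection and pooling. That is what makes the approach resemble printing rather than synthesis.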

One of the most remarkable features of this new approach is its ability to write data in parallel. Traditional DNA synthesis is a serial process — each nucleotide must be added one at a time, which is time-consuming and costly. However, the new system allows the researchers to add multiple epi-bits of information simultaneously, increasing the speed and scalability of data storage.

Think of writing a letter by hand: you form every character one at a time, which is slow. A printer, by contrast, lays down an entire row at once, which is much faster.
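The difference in scaling can be put into rough numbers. The Python sketch below counts steps only. It assumes two bits per base for serial synthesis and borrows the 350 bits per reaction reported for the new system. It deliberately ignores two things: a printing reaction takes longer than a single synthesis cycle, and commercial synthesizers grow many strands side by side. The output therefore shows the trend, not real timings.

```python
import math


def serial_steps(n_bits: int, bits_per_base: int = 2) -> int:
    """Synthesis: one coupling cycle per nucleotide, added one after another."""
    return math.ceil(n_bits / bits_per_base)


def parallel_steps(n_bits: int, bits_per_reaction: int = 350) -> int:
    """Printing: all bits covered by one reaction are written at the same time."""
    return math.ceil(n_bits / bits_per_reaction)


for n_bits in (1_000, 100_000):
    print(n_bits, serial_steps(n_bits), parallel_steps(n_bits))
```

For 100,000 bits, this gives 50,000 serial steps versus 286 reactions.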

“This new approach demonstrates how one can harness molecular mechanisms for innovative data solutions, bridging the fields of biology and digital information,” says Laura Na Liu, a co-author of the new study.

Coding panda pics into DNA

Recovered tiger images from samples 1 to 4, produced with stepwise improved writing and reading pipelines.

The team tested their approach by storing an image of a panda and a rubbing of a tiger from ancient China, then reading both back with a DNA sequencer.

In their experiments, the researchers stored approximately 275,000 bits of information with their new system (roughly 34 kilobytes). They achieved this with a set of 700 DNA “movable types” (i.e., pre-made short DNA sequences) and five universal DNA templates. This allowed them to write 350 bits of data in a single reaction, whereas conventional synthesis builds each strand one nucleotide at a time. The approach was also reliable, with high fidelity and error rates below 3%.
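A few lines of Python check the unit conversion behind the 275,000-bit figure and show what a 3% ceiling on errors means at that scale. The function name is illustrative, and 1 kilobyte is taken as 1,000 bytes. Reading the 3% figure as a per-bit error rate is an assumption.

```python
def bits_to_kilobytes(n_bits: int) -> float:
    """Convert bits to kilobytes (8 bits per byte, 1,000 bytes per kilobyte)."""
    return n_bits / 8 / 1000


stored_bits = 275_000
print(bits_to_kilobytes(stored_bits))  # 34.375, i.e. about 34 KB

# Upper bound on misread bits if up to 3% of bits were wrong (integer math).
max_wrong_bits = stored_bits * 3 // 100
print(max_wrong_bits)  # 8250
```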

Compression and error-correction coding scheme for the panda image (i), and a schematic of the epi-bits retrieved from sequencing reads, along with the restored image (ii).

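To make sure the data stored as epigenetic modifications could be read accurately, the researchers used high-throughput nanopore sequencing. This technology reads DNA by passing it through a tiny pore and detecting changes in electrical current. A methylated cytosine disturbs the current differently from an unmethylated one, so the sequencer can detect the epi-bits directly, without first converting them into a change in sequence.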
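Each DNA molecule is usually read many times, so turning raw reads into epi-bits is essentially a voting problem. The Python sketch below assumes the sequencing pipeline reports, for every read, a probability that a given site is methylated. It converts each probability into a vote and keeps the majority. The input numbers and function names are invented for illustration, and the study's actual calling procedure may differ.

```python
def call_site(probabilities: list[float], threshold: float = 0.5) -> int:
    """Turn each read's methylation probability into a vote, then take the majority.

    Ties resolve to 0 (unmethylated) in this sketch.
    """
    votes = sum(1 for p in probabilities if p >= threshold)
    return 1 if votes * 2 > len(probabilities) else 0


def decode_epi_bits(per_site_probs: list[list[float]]) -> list[int]:
    """Call one bit per site from that site's per-read probabilities."""
    return [call_site(probs) for probs in per_site_probs]


# Hypothetical per-read probabilities for three sites, five reads each.
observed = [
    [0.9, 0.8, 0.2, 0.95, 0.7],   # 4 of 5 reads above threshold -> 1
    [0.1, 0.3, 0.05, 0.6, 0.2],   # 1 of 5 reads above threshold -> 0
    [0.85, 0.4, 0.9, 0.75, 0.1],  # 3 of 5 reads above threshold -> 1
]
print(decode_epi_bits(observed))  # [1, 0, 1]
```

Reading each site several times lets occasional misreads be outvoted, which is one reason high-throughput sequencing suits this kind of readout.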

The research also demonstrated a novel aspect of the technology: its accessibility. In a pilot experiment called “iDNAdrive,” 60 student volunteers with no professional biolab experience successfully encoded their own data into DNA using a simple kit. This suggests the system is not only scalable but also user-friendly.

This marks a significant departure from current DNA data storage methods, which have so far been confined to specialized laboratories. In a distributed system like this, users could “write” data to DNA in their own homes and then retrieve it later through sequencing.

Big promise, big challenges

This research highlights the incredible potential of DNA as a medium for storing vast amounts of data in a compact, stable, and durable form. The innovative use of epigenetic modifications to encode data provides a new way to overcome the limitations of traditional DNA synthesis methods.

DNA is much more stable than silicon and other traditional storage media. Properly stored, DNA can last for thousands of years, making it ideal for archival purposes, such as preserving cultural artifacts, historical records, or scientific data. This method’s potential for distributed data storage could revolutionize personal data privacy and security. Instead of relying on cloud storage or data centers, individuals could store their most sensitive information in DNA, which could be kept in a secure location and accessed only when needed.

However, enormous challenges remain. For starters, only small amounts of information were stored. The error rate of up to 3% is also relatively low, but it is still too high for the data we work with routinely.
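A quick calculation shows why a few percent is a problem and why redundancy helps. This Python sketch treats the reported 3% ceiling as an independent per-bit error rate, which is a simplifying assumption. It uses a simple triple-repetition code as a stand-in; this is not the error-correction scheme used in the study.

```python
def prob_block_error_free(n_bits: int, bit_error: float) -> float:
    """Chance that every bit in a block is read correctly, assuming independent errors."""
    return (1 - bit_error) ** n_bits


def triple_repetition_error(bit_error: float) -> float:
    """Residual error if each bit is stored three times and decoded by majority vote.

    A decoded bit is wrong only when at least two of its three copies flip:
    3 p^2 (1 - p) + p^3 = 3 p^2 - 2 p^3.
    """
    p = bit_error
    return 3 * p**2 - 2 * p**3


p = 0.03  # upper bound on the error rate reported in the study
print(prob_block_error_free(8 * 1024, p))  # vanishingly small for a 1 KiB block
print(triple_repetition_error(p))          # about 0.0026
```

Without error correction, even a modest 1 KiB file would almost never come back intact. Storing each bit three times cuts the per-bit error to roughly 0.26%, at the cost of tripling the amount of DNA.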

Another challenge is the speed of data retrieval. Although nanopore sequencing allows for high-throughput reading of DNA, it is still slower than the reading speeds of conventional digital storage devices. Advances in sequencing technology will be crucial to making DNA data storage competitive with silicon-based systems.

The study “Parallel molecular data storage by printing epigenetic bits on DNA” was published in Nature.