Researchers have encoded a full book in DNA, the largest amount of information yet stored in the biological medium.
The data encoded is the digital version of the book, made up of more than 50,000 words, 11 images and one computer program. The overall size of the data is around 0.7 megabytes, report the scientists, led by George Church of Harvard Medical School. For their work, the researchers have used only off-the-shelf technology.
In their article, published online by Science magazine, the scientists argue that DNA has unique advantages for data storage. They calculate that their method has by far the highest data density of any medium to date, beating flash media or even quantum holography by orders of magnitude. This is partly because DNA is three-dimensional, while other storage techniques are restricted to two dimensions.
Yet the main advantage of DNA storage may be durability. DNA can survive millennia unharmed, as demonstrated by the sequencing of genetic information from ancient fossils. At the same time, the tools and techniques necessary for reading out the information will be present in future generations, because they are ubiquitous in nature, the scientists write.
The main disadvantage at this time is expense. The authors admit that the cost and time needed to encode the information make it largely impractical at the moment, except for highly specific applications, like century-scale archiving.
But they point out that the cost of DNA synthesis and sequencing has been dropping by a factor larger than five each year, a much faster decline than that of electronic media, albeit from a much higher starting point. The scientists conclude that DNA is becoming an increasingly practical storage medium, at a time when digital information is accumulating at an exponential rate.
To encode the book, “Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves,” co-written by Church, the researchers split its data into pieces. They then synthesized short DNA fragments of around 160 nucleotides, the DNA equivalent of bits. Each fragment carries part of the book, information about its position in the book, and the sequences needed to read and replicate the fragment.
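The fragment layout described above can be illustrated with a minimal Python sketch. It is not the researchers' actual protocol: the one-bit-per-nucleotide mapping, the 19-bit address, the 96-bit payload and the two placeholder flanking sequences are all assumptions chosen so that each fragment comes out at roughly 160 nucleotides.

```python
import random

# Assumed mapping, not taken from the article: one bit per nucleotide,
# with two interchangeable bases for each bit value.
BASES_FOR_BIT = {"0": "AC", "1": "GT"}
BIT_FOR_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}

# Hypothetical layout parameters, chosen for illustration only.
ADDRESS_BITS = 19   # position of the fragment within the data
PAYLOAD_BITS = 96   # part of the book carried by one fragment
PRIMER_LEFT = "ACGTACGTACGTACGTACGTAC"   # placeholder flanking sequence (22 nt)
PRIMER_RIGHT = "TGCATGCATGCATGCATGCATG"  # placeholder flanking sequence (22 nt)


def bits_to_dna(bits, rng):
    """Map each bit to one of its two allowed bases, picked at random."""
    return "".join(rng.choice(BASES_FOR_BIT[b]) for b in bits)


def dna_to_bits(dna):
    """Map each base back to the bit it stands for."""
    return "".join(BIT_FOR_BASE[base] for base in dna)


def encode(data, seed=0):
    """Split bytes into addressed fragments flanked by the placeholder sequences."""
    rng = random.Random(seed)
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the stream divides into whole payload blocks.
    bits += "0" * (-len(bits) % PAYLOAD_BITS)
    fragments = []
    for index in range(len(bits) // PAYLOAD_BITS):
        payload = bits[index * PAYLOAD_BITS:(index + 1) * PAYLOAD_BITS]
        address = f"{index:0{ADDRESS_BITS}b}"
        core = bits_to_dna(address + payload, rng)
        fragments.append(PRIMER_LEFT + core + PRIMER_RIGHT)
    return fragments


def decode(fragments, length):
    """Reassemble `length` bytes from fragments given in any order."""
    blocks = {}
    for fragment in fragments:
        core = fragment[len(PRIMER_LEFT):len(fragment) - len(PRIMER_RIGHT)]
        bits = dna_to_bits(core)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    stream = "".join(blocks[i] for i in sorted(blocks))
    return bytes(int(stream[i:i + 8], 2) for i in range(0, length * 8, 8))
```

Because every fragment carries its own address, the fragments can be read back in any order and still be reassembled, which is what makes a pool of short, independent pieces workable.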
In the process, the scientists created 70 billion copies of the book. When the information was read back, the data was recovered with only 10 errors overall.
The first demonstration of encoding information into DNA dates back to 1988. Until now, the largest amount of data encoded in nucleic acid was only 7,920 bits, around one-700th of what Church’s team has accomplished. The authors report a number of improvements over previous methods that made this feat possible, including a more flexible method of encoding data, the use of shorter DNA pieces that are easier to handle, and next-generation technologies for synthesis and sequencing.
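The "one-700th" comparison can be checked with a few lines of arithmetic. This is only a rough check, assuming that one megabyte means 1,000,000 bytes and taking the rounded figure of 0.7 megabytes at face value.

```python
previous_record_bits = 7_920      # earlier record, as stated above
book_bytes = 700_000              # about 0.7 megabytes, assuming 1 MB = 10**6 bytes
book_bits = book_bytes * 8        # 5,600,000 bits

ratio = book_bits / previous_record_bits
print(round(ratio))               # roughly 700 times the earlier record
```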
For the future, the researchers propose improvements in compression and accuracy, to make the storage denser and less error-prone.