YouTube has been a force to be reckoned with ever since it launched in 2005, so it may come as a surprise that all of it could, in principle, fit on a teaspoon. Researchers have shown that it is theoretically possible to store 10 petabytes (10 million gigabytes) of data in a single gram of DNA, which means that all of YouTube could fit in that teaspoon!
The study comes from a team of researchers at the Technion – Israel Institute of Technology in Haifa and the Interdisciplinary Center (IDC) Herzliya, also in Israel. It examines whether DNA can serve as a practical data storage medium. Storage is becoming ever more important as more data moves to the cloud. Server farms have been the conventional solution, but they raise environmental concerns because they consume large amounts of electricity.
DNA already stores the highly complex code of human life, which makes it a natural candidate for data storage. Actually pulling this off, however, is no easy feat. Encoding information in DNA requires a chain of links known as nucleotides. There are four of them, the building blocks of DNA, marked with the letters A, C, G, and T. Binary sequences of 0s and 1s are translated into strings of these four letters.
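With four letters, each nucleotide can carry exactly two bits. The sketch below shows one straightforward translation, assuming a hypothetical fixed mapping (00→A, 01→C, 10→G, 11→T); the article does not say which mapping the team used, and the names `encode` and `decode` are illustrative only.

```python
# Hypothetical 2-bit mapping; not the team's published scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a nucleotide string, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to two bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # "H" = 01001000 -> CAGA, "i" = 01101001 -> CGGC, so CAGACGGC
    assert decode(strand) == b"Hi"
```

Practical schemes usually add constraints on top of a mapping like this (for example, avoiding long runs of the same base) along with error-correcting codes; this sketch ignores both.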
DNA molecules carrying these sequences are then produced in a process known as synthesis. To read the data back, the molecules go through sequencing, which produces an output representing the original nucleotide sequence. The team tackled several limitations of this process, and in a press statement described their progress as:
(1) increasing the number of letters used to encode the information (beyond the original 4 letters); (2) significantly reducing the number of synthesis rounds required to store information on DNA; (3) improving the error correction mechanism used.
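The article does not describe the team's error-correction mechanism, but a simple stand-in shows why reading many copies of the same molecule helps: errors in individual reads can be outvoted. The sketch below is a per-position majority vote, assuming substitution errors only (no insertions or deletions), so all reads have the same length and line up position by position. The function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across equal-length reads.

    Assumes substitution errors only, so reads align position by position.
    Ties go to the base encountered first, per Counter.most_common ordering.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    # zip(*reads) yields one column (tuple of bases) per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    # Three noisy reads of ACGT, each with at most one substitution.
    print(consensus(["ACGT", "ACGA", "ACTT"]))  # ACGT
```

Real pipelines typically rely on proper error-correcting codes rather than plain voting, since voting alone cannot recover a position where most reads are wrong.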
‘The current synthesis and sequencing processes are inherently redundant because each molecule is produced in large numbers and is read in multiple copies during sequencing,’ says Professor Zohar Yakhini of the Technion in the press statement. ‘The method we developed leverages this redundancy to increase the effective number of letters well over the original four letters, making it possible for us to encode and write each unit of information in fewer cycles of synthesis.’
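One way to picture how redundancy can raise the effective number of letters: because every position is synthesized across many molecules, a position can hold a *mixture* of bases (say 50% A, 25% C, 25% G), and counting many reads reveals the proportions. The sketch below is hypothetical; the mixture resolution `k`, the read count, the simulated reads, and the nearest-letter decoding are all assumptions rather than details from the article. With `k = 4`, there are 35 possible mixtures, or about 5.1 bits per position instead of 2.

```python
import itertools
import math
import random
from collections import Counter

BASES = "ACGT"


def composite_alphabet(k: int) -> list[tuple[int, ...]]:
    """Every mixture of A, C, G, T whose proportions are multiples of 1/k.

    A letter is a tuple of counts (a, c, g, t) summing to k; the four pure
    bases, such as (k, 0, 0, 0), are included.
    """
    return [
        counts
        for counts in itertools.product(range(k + 1), repeat=4)
        if sum(counts) == k
    ]


def read_position(letter: tuple[int, ...], k: int, n_reads: int,
                  rng: random.Random) -> list[str]:
    """Simulate sequencing one position: each read draws a base in the letter's proportions."""
    return rng.choices(BASES, weights=[c / k for c in letter], k=n_reads)


def decode_position(reads: list[str], k: int,
                    alphabet: list[tuple[int, ...]]) -> tuple[int, ...]:
    """Estimate base proportions from the reads and snap to the nearest composite letter."""
    counts = Counter(reads)
    observed = [counts[b] / len(reads) for b in BASES]
    return min(
        alphabet,
        key=lambda letter: sum((o - c / k) ** 2 for o, c in zip(observed, letter)),
    )


if __name__ == "__main__":
    k = 4  # hypothetical resolution: proportions in steps of 25%
    alphabet = composite_alphabet(k)
    print(f"{len(alphabet)} letters, {math.log2(len(alphabet)):.2f} bits per position")
    rng = random.Random(0)
    letter = (2, 1, 1, 0)  # 50% A, 25% C, 25% G, 0% T
    reads = read_position(letter, k, n_reads=1000, rng=rng)
    print(decode_position(reads, k, alphabet) == letter)
```

The trade-off is that a mixed letter can only be read by counting many reads at each position, so finer mixtures (larger `k`) need deeper sequencing to tell neighboring letters apart.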
The research team reduced the number of synthesis rounds required per unit of information by 20%. Given how complex this kind of work is, even modest gains matter, and the approach could eventually lead to a reduction of as much as 75%.
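To get a feel for these percentages, assume (as a simplification not stated in the article) that each synthesis cycle writes one position and that the savings come entirely from packing more bits into each position. Cycles then scale as 1 / (bits per cycle), so a reduction r relative to the standard 2 bits per cycle requires 2 / (1 − r) bits per cycle: about 2.5 bits for a 20% reduction and 8 bits for 75%. The helper names below are hypothetical.

```python
import math

# Four standard letters give log2(4) = 2 bits per synthesis cycle.
STANDARD_BITS_PER_CYCLE = 2.0


def cycles_needed(n_bits: int, bits_per_cycle: float) -> int:
    """Synthesis cycles to write n_bits, assuming one position per cycle."""
    return math.ceil(n_bits / bits_per_cycle)


def implied_bits_per_cycle(reduction: float) -> float:
    """Bits per cycle needed to cut cycles by `reduction` versus the 2-bit baseline.

    Since cycles scale as 1 / bits_per_cycle: 1 - reduction = 2 / bits_per_cycle.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return STANDARD_BITS_PER_CYCLE / (1.0 - reduction)


if __name__ == "__main__":
    for r in (0.20, 0.75):
        print(f"{r:.0%} fewer cycles needs {implied_bits_per_cycle(r):.1f} bits per cycle")
    print(cycles_needed(1000, 2.0), cycles_needed(1000, 2.5))
```

The real relationship is likely messier, since encoding overhead and error correction also consume cycles; the sketch only shows the idealized scaling.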
Professor Roee Amit, who heads a synthetic biology lab at the Technion, said: ‘In this work, we have implemented a DNA-based storage system that encodes information with synthesis efficiency that is significantly better than the standard approach. The study included the actual implementation of the new coding technique for storing large-volume information on DNA molecules and reconstructing it for testing the process.’