Genealogy in Data Science
By Aditi Bhat
One of the most fascinating and fastest-moving aspects of data is the way it is stored. In the last few years, typical storage capacities have grown from megabytes to terabytes, and cloud storage has grown steadily alongside them. Whichever service you choose, be it Dropbox or Google Drive, files no longer need to live only on your own hardware.
However, one thing that hasn’t changed is how those files are ultimately stored. Whether they sit in the cloud or on local hardware, they are stored electronically. The number of hard drives keeps growing, and server farms are becoming ubiquitous. In this era of exploding Big Data, if we keep storing data the way we do today, we will run out of space and power, not to mention the risk of losing data as hardware degrades over time.
The most revolutionary data storage method yet:
Luckily, specialized Big Data Analytics courses and increased interest in Data Science and Data Science training have uncovered a solution: encoding data into DNA. Essentially, DNA is nature's way of storing information about living things: how they grow, how they look, and even, in part, what their personalities are like. What if we could use this encoding method to, say, store a movie instead? That’s exactly what scientists have done, opening up a whole new world of opportunities for Data Science and Data Analytics techniques.
Scientists have translated the 1s and 0s that encode data electronically into the four nucleotide bases A, G, C, and T, two bits per base: for example, 00=A, 01=G, 10=C, and 11=T. Once the translation is complete, the data can be densely packed into DNA, with plenty of redundancy to ensure that even if a few pieces are lost or cannot be decoded, there is enough information left to piece together the whole.
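To make the translation concrete, here is a minimal Python sketch that uses the example mapping from the paragraph above (00=A, 01=G, 10=C, 11=T). The function names `encode` and `decode` are made up for illustration, and real DNA storage schemes add further constraints and redundancy that this toy version leaves out.

```python
# Example mapping from the article: two bits per nucleotide base.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn raw bytes into a string of bases (hypothetical helper)."""
    # Write every byte as 8 binary digits, then read them two at a time.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # GACAGCCG
print(decode(strand))  # b'Hi'
```

Each byte becomes exactly four bases, so even this naive mapping shows how compactly binary data fits into a four-letter alphabet.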
Benefits of DNAs being used for data storage:
Better data compression
DNA is so dense that it can pack data roughly a thousand times more tightly than the storage methods used today. The best part is that this density can be achieved without overheating hardware, which is the current limiting factor.
Error-free storage and retrieval
With DNA encoding, it’s easy to store, maintain, and retrieve multiple copies of information reliably. The encoding algorithm used today is similar to the one used to stream video: the data is split into many small pieces with built-in redundancy, so even if a few pieces go missing, the gaps can be detected and the lost information reconstructed.
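The idea of recovering a lost piece from redundant information can be sketched in a few lines of Python. This is not the algorithm the scientists used: it is a much simpler stand-in (a single XOR parity chunk) chosen only to illustrate the principle, and the names `add_parity` and `recover` are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: List[bytes]) -> List[bytes]:
    """Append one parity chunk: the XOR of all equal-length data chunks."""
    return chunks + [reduce(xor_bytes, chunks)]


def recover(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one lost piece (marked None) and return the data chunks."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("this simple scheme can repair only one lost piece")
    pieces = list(pieces)
    if missing:
        # XOR-ing all surviving pieces reproduces the lost one.
        present = [piece for piece in pieces if piece is not None]
        pieces[missing[0]] = reduce(xor_bytes, present)
    return pieces[:-1]  # drop the parity chunk


stored = add_parity([b"ACGT", b"TTGA", b"CCAG"])
damaged = [stored[0], None, stored[2], stored[3]]  # the second piece is lost
print(recover(damaged))  # [b'ACGT', b'TTGA', b'CCAG']
```

A single parity chunk can repair only one loss. Production schemes spread far more redundancy across the data so that many missing pieces can be tolerated.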
Long-lasting storage
DNA doesn’t degrade easily; at room temperature, DNA can be stored for up to four thousand years, and it can last for many more millennia if kept in a cold, dark place. Because DNA is fundamental to life itself, the ability to read it is unlikely to become obsolete. Humans thousands of years in the future will still be able to read DNA the way it’s encoded now.
Currently, the only drawback is that DNA storage is still a few years away from being a viable option. Reading from DNA is still too slow for everyday use, and the technology is still too expensive for industry. Hopefully, the next generation, with advanced Data Science training and even more Big Data to store, will be able to laugh at our server farms while they decide which movie to watch from their strand of DNA.
What are other fascinating data storage methods that you know of? Tell us in the comments!