Want MEEELLION-year data storage? Use DNA, of course

Storing a year’s info would need … just four grams of the stuff!


Swiss boffins have used their weird wizardry to devise a way to store data for a million years using DNA, or so they claim.

Perhaps DNA should adopt a new acronymic meaning: Digital Nucleotide Archive.

The researchers were led by boss boffin Robert Grass (Die Haupt Boffiner – think that’s the translation, Ed), a lecturer at ETH Zurich’s Department of Chemistry and Applied Biosciences.

They wrote information to DNA strands, adding error-correction (ECC) data, encapsulated the strands in nanometre-scale glass spheres, and then read the data back after simulating a million-year wait (more on that later). So, how does DNA data storage work?

DNA and storage recap

In an August 2012 Phys.org article, researchers said they had stored data using DNA, with each nucleotide theoretically capable of storing two bits. A nucleotide is an organic molecule that forms the basic sub-unit of a nucleic acid.

DNA, or deoxyribonucleic acid, is the molecule that carries the genetic instructions for building living organisms. It is generally made of two biopolymer strands coiled around each other in a double helix.

Each strand is built from nucleotides joined in a chain. Each nucleotide contains one of four nucleobases: adenine (A), cytosine (C), guanine (G) or thymine (T). The order of these bases along the strand encodes genetic information.

In DNA data storage, data bits are mapped to these bases. Conceivably, we can think of this as a quaternary (base-4) numeral system, just as binary is base-2 and decimal is base-10. Since there are four bases, each one can represent two bits.
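As an illustration of the quaternary view, here is a minimal Python sketch that packs two bits into each base. The mapping 00→A, 01→C, 10→G, 11→T and the function names are assumptions chosen for this example; it is not the scheme used by either research team.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode_quaternary(data: bytes) -> str:
    """Turn bytes into a DNA string, two bits per nucleotide (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode_quaternary(dna: str) -> bytes:
    """Reverse the mapping: each base yields two bits, every eight bits form a byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode_quaternary(b"Hi")
print(strand)  # CAGACGGC
```

"H" is 01001000 in binary, which splits into 01 00 10 00 and so becomes CAGA; "i" (01101001) becomes CGGC.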

Actually, the researchers did not use quaternary encoding. Instead they used binary, with 0 written as either A or C and 1 as either G or T.
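The one-bit-per-base scheme can be sketched in Python as follows. Because each bit can be written as either of two bases, the encoder has a free choice; in this sketch it picks whichever base differs from the previous one, so no base ever repeats. That selection rule and the function names are assumptions for this example, not necessarily what the 2012 researchers did.

```python
def encode_binary(data: bytes) -> str:
    """Encode one bit per base: 0 -> A or C, 1 -> G or T.

    Assumed rule for this sketch: of the two candidate bases, use the first
    unless it would repeat the previous base, in which case use the second.
    """
    strand = []
    for byte in data:
        for bit in f"{byte:08b}":
            first, second = ("A", "C") if bit == "0" else ("G", "T")
            prev = strand[-1] if strand else None
            strand.append(second if prev == first else first)
    return "".join(strand)

def decode_binary(dna: str) -> bytes:
    """Either base of a pair decodes to the same bit."""
    bits = "".join("0" if base in "AC" else "1" for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(encode_binary(b"H"))  # AGACGACA
```

Spending a whole base on one bit halves the density compared with the quaternary approach, but the spare choice lets the encoder steer the sequence, here away from runs of identical bases, which are generally considered error-prone to synthesise and sequence.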

With this scheme “5.27 million 0s and 1s ... 5.27 megabits were then sequenced into sections of nucleotides 96 bits long using one DNA nucleotide for one bit. [So] each block also contained a 19-bit address to encode the block’s place in the overall sequence”.
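The address matters because a pool of DNA strands has no inherent order. The Python sketch below follows the block layout in the quote: it cuts a bit string into 96-bit payloads, prefixes each with a 19-bit address, and uses the addresses to restore the original order after the blocks are shuffled. The function names and the zero-padding of the last block are assumptions for illustration, and the sketch stays at the level of bits, leaving out the mapping to bases.

```python
import math

DATA_BITS = 96     # payload bits per block, from the quoted scheme
ADDRESS_BITS = 19  # address bits per block, from the quoted scheme

def make_blocks(bits: str) -> list[str]:
    """Split a bit string into 96-bit payloads, each prefixed with a 19-bit block address."""
    blocks = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        assert index < 2 ** ADDRESS_BITS, "too many blocks for a 19-bit address"
        # Padding the final short block with zeros is an assumption of this sketch.
        payload = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        blocks.append(address + payload)
    return blocks

def reassemble(blocks: list[str]) -> str:
    """Strands come back in no particular order, so sort by address before joining payloads."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    return "".join(block[ADDRESS_BITS:] for block in ordered)

# Roughly 5.27 million bits need this many 96-bit blocks:
print(math.ceil(5_270_000 / DATA_BITS), "blocks vs", 2 ** ADDRESS_BITS, "addresses")
# 54896 blocks vs 524288 addresses
```

So a 19-bit address leaves plenty of headroom for the roughly 54,900 blocks needed, with each block carrying 96 + 19 = 115 bits.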

The Phys.org article claims that DNA storage density is one million gigabits per cubic millimetre (1 Pbit/mm³) and that “four grams of DNA could theoretically store all the digital data created annually [in 2012]”.

The million-year deal

Since DNA is essentially an organic chemical, it is affected over time by chemical reactions with its environment, and that corrupts the information. The Grass team solved this in two ways. First, they added Reed-Solomon codes to the data for error checking and correction (ECC).

Second, they encased the DNA in 150nm-diameter silica glass spheres. They likened this to the way DNA can be read from fossilized bones hundreds of years old.
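To make the Reed-Solomon idea concrete, here is a minimal Python sketch of its simplest form: treat k message symbols as polynomial coefficients over the prime field GF(257), evaluate the polynomial at n > k points, and rebuild the message from any k surviving points by Lagrange interpolation. This version only recovers erasures (symbols known to be lost), not errors at unknown positions, and the field, n, k and function names are assumptions for illustration; the Grass team's actual code parameters are not described here.

```python
P = 257  # prime modulus; GF(257) can hold any byte value 0..255

def rs_encode(message: list[int], n: int) -> list[int]:
    """Evaluate the message polynomial (message[i] is the x**i coefficient) at x = 0..n-1."""
    assert len(message) <= n <= P
    return [sum(c * pow(x, i, P) for i, c in enumerate(message)) % P for x in range(n)]

def rs_recover(points: list[tuple[int, int]], k: int) -> list[int]:
    """Rebuild the k coefficients from any k surviving (x, y) pairs via Lagrange interpolation."""
    pts = points[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis = [1]  # coefficients of prod_{m != j} (x - x_m), lowest degree first
        denom = 1    # prod_{m != j} (x_j - x_m)
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xm).
            basis = [((basis[i - 1] if i > 0 else 0) - xm * (basis[i] if i < len(basis) else 0)) % P
                     for i in range(len(basis) + 1)]
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, P - 2, P) % P  # divide by denom via Fermat's little theorem
        for i in range(k):
            coeffs[i] = (coeffs[i] + scale * basis[i]) % P
    return coeffs

message = list(b"DNA!")           # k = 4 symbols
codeword = rs_encode(message, 8)  # n = 8 symbols, so any 4 may be lost
survivors = [(x, codeword[x]) for x in (1, 3, 6, 7)]  # symbols 0, 2, 4 and 5 erased
print(bytes(rs_recover(survivors, len(message))))     # b'DNA!'
```

The trade-off is the usual one for ECC: storing n symbols instead of k costs n - k extra symbols, and in return up to n - k lost symbols can be tolerated.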

They simulated centuries of waiting around by heating the spheres to 60–70°C for a month, because heat speeds up any chemical reactions that could happen, and then read the data.
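The heating trick relies on chemical reactions running faster at higher temperatures. A common way to quantify this in accelerated-ageing tests is the Arrhenius equation, where the acceleration factor is exp((Ea/R)(1/T_storage − 1/T_test)) with temperatures in kelvin. The Python sketch below assumes that model; the activation energy (155 kJ/mol) and the 10°C storage temperature are hypothetical values chosen for illustration, not figures reported by the Grass team.

```python
import math

GAS_CONSTANT = 8.314  # J/(mol*K)

def acceleration_factor(ea_j_per_mol: float, t_storage_c: float, t_test_c: float) -> float:
    """Arrhenius acceleration factor: how much faster decay runs at t_test than at t_storage."""
    t_storage = t_storage_c + 273.15  # Celsius to kelvin
    t_test = t_test_c + 273.15
    return math.exp(ea_j_per_mol / GAS_CONSTANT * (1 / t_storage - 1 / t_test))

# Hypothetical inputs for illustration only: the activation energy and the
# storage temperature are assumptions, not figures from the Grass team.
EA = 155_000.0  # J/mol
af = acceleration_factor(EA, t_storage_c=10.0, t_test_c=70.0)
equivalent_years = af / 12  # one month of heating, expressed in years of storage
print(f"acceleration factor ~{af:,.0f}; one month at 70C ~ {equivalent_years:,.0f} years at 10C")
```

The result is very sensitive to the assumed activation energy, which is why any extrapolation from a month in an oven to a million years in storage rests heavily on that figure.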

Unfortunately, reading the data means destroying the capsules carrying it. The spheres were opened up using a fluoride solution and the DNA was then fed into a sequencer for decoding: a classic WORO (write once, read once) situation.

The Grass team think “DNA-encoded information can survive over a million years".

It’s possible that DNA data encoding could provide more reliable, extremely long-term data storage than any current digital medium, such as optical disks, holographic disks or solid state storage.

DNA sequencing is likely to remain in use over such an extended period, whereas Blu-ray disk reader technology, for example, could simply vanish, as punched card readers have. ®

Biting the hand that feeds IT © 1998–2018