Ditch disk and bin tape: Harvard boffins have cracked DNA storage
Gather ye floppies while ye may
I recently decided to clean out my home office; I’d had enough of the 56K modems lying around, and needed the space. But what I didn’t expect was to find a museum of data storage concentrated in such a small space.
I suspected at the time I wouldn’t need the 5.25" 720k floppy disks to upgrade to VMS v5.1 again, but thought, 'Who knows, maybe I should keep them' – so I did, along with the 2000ish 1.44MB floppy disks and random associated hard disks. Now when I Google floppy disks the first thing that appears is an explanation of what a floppy disk is, or rather was.
Next I moved on to some more recent technology: surely I wouldn’t have to worry about throwing out USB memory sticks, would I? Having counted somewhere around 100 of the things lying around the house, I decided that I didn’t really need 10 64MB sticks cluttering up the place. After all, my shiny new 64GB version is 1,000x bigger.
This got me thinking about the state of the data storage market, and the changes going on. While the capacity of floppy disks rose slowly and fairly consistently, we have seen some spectacular changes in the rest of the storage marketplace. We got used to disk capacities doubling every two years, then every 18 months. Then the 2GB drives became 200GB, then 400GB, and suddenly the 1TB drive had landed.
It was at this time we started to expect development to slow down – after all, as a wise Star Trek engineer once said, “You cannae change the laws of physics, Captain.” Well, you know what, Scotty? What we thought we knew about storage has changed... and 2TB drives appeared. Now 3TB drives are not uncommon in data centres and 4TB monsters are available on Amazon.
Surely disk drives have to stop evolving sometime? Well, yes and no. They may stop evolving in their current form, but the need to store more and more data, and to hold it for longer and longer, goes on unabated. Hmmm, what do we do now?
Well, change the form of course. When it comes to storing information, hard drives don’t hold a candle to DNA. Our genetic code packs billions of gigabytes into a single gram. A mere milligram of the molecule could encode the complete text of every book in the British Library and have plenty of room to spare. All of this has been mostly theoretical, until now. In a new study, researchers stored an entire genetics textbook in less than a picogram of DNA — one trillionth of a gram — an advance that could revolutionise our ability to store data.
Initially there may seem to be some problems around using DNA to store data. First of all, cells die – not a good way to lose your valuable information. They also naturally replicate, introducing changes over time that can alter the data (and while we accepted this on a floppy disk it’s unthinkable now).
To get around this challenge, the research team at Harvard created a DNA information-archiving system that uses no cells at all. Instead, an inkjet printer embeds short fragments of chemically synthesised DNA onto the surface of a tiny glass chip. To encode a digital file, researchers divide it into tiny blocks of data and convert these data not into the 1s and 0s of typical digital storage media, but rather into DNA’s four-letter alphabet of As, Cs, Gs, and Ts. Each DNA fragment also contains a digital “barcode” that records its location in the original file.
Reading the data requires a DNA sequencer and a computer to reassemble all of the fragments in order and convert them back into digital format. The computer also corrects for errors; each block of data is replicated thousands of times so that any chance glitch can be identified and fixed by comparing it to the other copies.
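For the curious, the scheme described above (blocks, a four-letter alphabet, a positional barcode, and error correction by comparing copies) can be sketched in a few lines of Python. This is a toy illustration, not the Harvard team's actual encoding: the two-bits-per-base mapping, the 4-byte block size, the 16-bit barcode width and the function names `encode` and `decode` are all assumptions made for the example. It also only repairs substitution errors in the payload and trusts the barcode to be read correctly.

```python
from collections import Counter

# Hypothetical mapping of two bits to one base; chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

BLOCK_BYTES = 4   # payload bytes per fragment (assumed)
INDEX_BITS = 16   # width of the positional "barcode" in bits (assumed)


def bits_to_dna(bits: str) -> str:
    """Convert an even-length bit string into bases, two bits per base."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str) -> str:
    """Convert a string of bases back into a bit string."""
    return "".join(BASE_TO_BITS[base] for base in dna)


def encode(data: bytes) -> list[str]:
    """Split data into blocks; each fragment is barcode bases + payload bases."""
    fragments = []
    for index, start in enumerate(range(0, len(data), BLOCK_BYTES)):
        block = data[start:start + BLOCK_BYTES]
        barcode = format(index, f"0{INDEX_BITS}b")
        payload = "".join(format(byte, "08b") for byte in block)
        fragments.append(bits_to_dna(barcode + payload))
    return fragments


def decode(reads: list[str]) -> bytes:
    """Group reads by barcode, majority-vote each base, reassemble in order."""
    barcode_len = INDEX_BITS // 2
    groups: dict[int, list[str]] = {}
    for read in reads:
        index = int(dna_to_bits(read[:barcode_len]), 2)
        groups.setdefault(index, []).append(read[barcode_len:])

    out = bytearray()
    for index in sorted(groups):
        copies = groups[index]
        # Take the most common base at each position across all copies.
        consensus = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*copies)
        )
        bits = dna_to_bits(consensus)
        out.extend(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return bytes(out)
```

Feeding `decode` several shuffled copies of each fragment from `encode`, with the odd base corrupted in a minority of copies, should return the original bytes, which is the "compare it to the other copies" idea in miniature.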
By using these methods they managed to encode a complete book, about 5.27 megabits (well under 1MB) in size, across tens of thousands of short DNA fragments rather than one long strand. Obviously this comes at a price beyond the reach of customers for now, but at the rate the data storage market moves, who knows how we will upgrade our storage capacity in the future? It is estimated that a double DNA strand could encode 10 exabytes of data, or roughly 11,529,215,046,100 MB – that’s quite a lot of floppy disks.
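For anyone who wants to check the sums, here is a quick back-of-the-envelope calculation. It assumes, as the MB figure above seems to imply, that an exabyte is counted as 2^60 bytes and a megabyte as 10^6 bytes; that is an inference from the number, not something the estimate states. The floppy size is the usual 1,474,560 bytes of a "1.44MB" disk, and the variable names are just for illustration.

```python
# Assumed unit definitions (inferred, not stated in the original estimate).
EXBIBYTE = 2**60             # bytes in a binary "exabyte"
MEGABYTE = 10**6             # bytes in a decimal megabyte
FLOPPY_BYTES = 1440 * 1024   # a "1.44MB" floppy holds 1,474,560 bytes

capacity_bytes = 10 * EXBIBYTE
capacity_mb = capacity_bytes // MEGABYTE
floppies = capacity_bytes // FLOPPY_BYTES

print(f"{capacity_mb:,} MB")
print(f"{floppies:,} floppy disks")
```

Under those assumptions the MB figure matches the one quoted to within rounding, and the floppy count comes out at a little under 8 trillion disks.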
So, now when you hear us data guys talking about “Big Data” and not being scared by the volume element, maybe you’ll understand why.
In a few years' time when you need to add an exabyte or two to your data capacity, don’t worry – I’ve an armful right here. ®