DNA as storage? Old and boring. Boffins now chaining monomers

Take a mass spectrometer and synthesise some 'very tiny disks are coming' hype

Monomer-manipulating researchers writing in Nature Communications this month say they have read multi-byte sequences using mass spectrometry and bit-storing monomers.

The research paper, titled "Mass spectrometry sequencing of long digital polymers facilitated by programmed inter-byte fragmentation", claims it sets the stage for data storage on a scale 100 times smaller than that of current hard drives.

The research moves on from DNA and biological techniques as a storage medium to synthetic, more easily handled polymers.

How? Would chemists please excuse themselves while we stumble our way through the complex science involved.

The basic storage element is a synthetic monomer, a molecule that links with others to form a chain called a polymer. In DNA the monomers are nucleotides, which occur naturally. The synthesised monomers here were designed to be sequenced (analysed) using moderate-resolution mass spectrometry (MS).

Data is "written" by automated phosphoramidite chemistry, which synthesises the monomers into readable digital sequences: poly(phosphodiester) chains.

Monomers are grouped into sets of eight, forming an oligomer and thus a byte. Clearly monomers are polygamous and not monogamous.

MS instruments, which are readily available, ionise samples. This is accomplished with techniques such as electron bombardment and produces a signature of the chemical structure: a chart showing the mass-to-charge ratios of the resulting ions. The chart values (spectra) can be used to identify the monomers.

Each monomer has a phosphate group and a binary one or zero value associated with that. In detail a monomer contains either a propyl phosphate (binary 0) or a 2,2-dimethylpropyl phosphate (binary 1) synthon (synthetic building block).
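For the non-chemists, the bit-to-monomer mapping just described can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the most-significant-bit-first ordering is an assumption, since the article does not say which way round the bits are written.

```python
# Symbolic labels for the two synthons named in the article.
SYNTHON = {
    "0": "propyl phosphate",
    "1": "2,2-dimethylpropyl phosphate",
}

def text_to_bits(text: str) -> list[str]:
    """Return one 8-bit string per ASCII character (MSB first, an assumption)."""
    return [format(b, "08b") for b in text.encode("ascii")]

def bits_to_synthons(bits: str) -> list[str]:
    """Map each bit of one byte to the synthon that would encode it."""
    return [SYNTHON[bit] for bit in bits]

encoded = text_to_bits("Sequence")
print(encoded[0])                       # 'S' is ASCII 83: 01010011
print(bits_to_synthons(encoded[0])[1])  # second bit is 1: 2,2-dimethylpropyl phosphate
```

Eight such synthons in a row make one byte, so the word 'Sequence' needs 64 of them.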

Each set of eight monomers is separated from the next group by a molecular separator, a weak alkoxyamine group. Further, each byte of the sequence is labelled with an identification tag, a mass tag, which allows the byte's position in the sequence to be detected.

This tag is a natural (A, T, G, or C) or non-natural (B, I, F) nucleotide.

The monomers are then grouped in chains to form a polymer that is stable at room temperatures.

Reading the data involves breaking the 8-bit monomer sequences (bytes) apart at the separator points, producing a mass of bytes whose sequence position can be detected. Then the bytes are individually sequenced, meaning the constituent monomers are analysed, and these have a known position in the byte. So each byte's digital value is identified and the byte string can be reconstructed to reveal the digital data.
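The read-back logic can be mimicked in software. The sketch below is a toy model, not the researchers' method: it skips the chemistry and the spectra entirely, and the mapping of tags to byte positions (including leaving the first byte untagged) is a hypothetical choice made for the example.

```python
import random

# Hypothetical tag-to-position mapping. The article lists seven tags
# (A, T, G, C, B, I, F) but not which byte each one marks; None stands
# for an untagged byte, assumed here to be the first one.
TAG_ORDER = [None, "A", "T", "G", "C", "B", "I", "F"]

def fragment(text):
    """Simulate cleavage at the separators: one (tag, bits) pair per byte, in random order."""
    data = text.encode("ascii")
    if len(data) > len(TAG_ORDER):
        raise ValueError("only eight byte positions are modelled")
    fragments = [(TAG_ORDER[i], format(b, "08b")) for i, b in enumerate(data)]
    random.shuffle(fragments)  # fragments carry no inherent order once the chain is broken
    return fragments

def reconstruct(fragments):
    """Use each fragment's tag to restore byte order, then decode the bits."""
    ordered = sorted(fragments, key=lambda frag: TAG_ORDER.index(frag[0]))
    return bytes(int(bits, 2) for _, bits in ordered).decode("ascii")

print(reconstruct(fragment("Sequence")))  # Sequence
```

The point of the tags is visible in `reconstruct`: without them, the shuffled bytes could not be put back in order.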

MS spectrum of an 8-byte digital polymer containing the ASCII-encoded word 'Sequence'.

The researchers read a 78-element polymer chain, composed of 64 bits, seven tags, and seven spacers. They say full sequence coverage can be obtained in a single measurement performed in a moderate resolution mass spectrometer.
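The 78-element figure is easy to check. The helper below is a hypothetical illustration that assumes one tag and one spacer per boundary between bytes, which is consistent with the seven tags and seven spacers quoted for eight bytes.

```python
def chain_length(n_bytes: int, bits_per_byte: int = 8) -> int:
    """Count chain elements: coded bits plus one tag and one spacer per byte boundary."""
    if n_bytes < 1:
        raise ValueError("need at least one byte")
    boundaries = n_bytes - 1  # assumption: tags and spacers sit only between bytes
    return n_bytes * bits_per_byte + 2 * boundaries

print(chain_length(8))  # 64 bits + 7 tags + 7 spacers = 78
```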

In practice, byte value identification and sequence reconstruction mean the output spectra are viewed and analysed manually, and this takes several hours. The researchers suggest it should be possible to reduce the time needed to a few milliseconds by developing software to perform this task.

Assume this could be done and synthetic polymer storage devices could be read at disk drive or slower tape drive speeds, depending on the read mechanism's characteristics. Even so, we are a huge long way away from thinking about the design of such things.

Chemists may now return.

Problems

The researchers note that synthesis of very long polymer chains containing several hundred coded bits is problematic.

They state: "Polymer-based memory devices will most probably rely on libraries of coded chains, as already done in the field of DNA storage. In such libraries, individual chains containing about 100 coded residues and a short localization address sequence are typically used and permit to store large quantities of information."

That reminds us of a deduplication hash signature library.

The development of organized and accessible digital polymer libraries will be necessary for longer polymer data coding research to be conducted.

Claiming that polymer storage can be a hundredth of the size of a disk drive is far from impressive. Are we talking about a 3.5-inch or 2.5-inch disk drive form factor, or something else?

A small USB stick easily meets that criterion when set against a 3.5-inch disk drive. If the researchers mean that polymer storage can be one hundredth the size of a disk drive bit then that is more interesting, but not by much. What size bit does the writer have in mind? Disk drive bits have shrunk markedly over the past few years, and by more than 100x in physical size.

Does the writer mean a monomer is one hundredth the size of a disk drive bit? That's marginally more interesting again but, once more, compared to what size disk drive bit? On its own this unquantified disk drive size-related claim is risible.

The reading process (MS sequencing) appears to be destructive, with polymers disaggregated into bytes (oligos) and the bytes disaggregated into bits (monomers). Unless it involves taking a polymer sample, the researchers have invented WORO storage: write once, read once, which is hardly useful.

If it involves taking a sample, how many samples can be taken before the original material is exhausted? The implication here is that the polymer storage has a limited number of read cycles: WORF storage, or write once, read few, which is far from ideal and no better than existing technology.

Millisecond-class read speed is also far from being impressive, unless we could get tape capacity, reliability and longevity at such a disk access speed. With an 8-byte string read in a few hours in this research we are a massive distance away from that.

All-in-all, the impression given is that the researchers are sophisticated chemists intoxicated with the idea of synthetic monomer chain digital storage but fairly unversed in digital storage technologies. This could lead them into a digital synthetic monomer storage curiosity shop – chemically interesting but digitally useless. ®

Bootnote The research was carried out by Abdelaziz Al Ouahabi, Jean-Arthur Amalian, Laurence Charles & Jean-François Lutz through the Institut Charles Sadron (CNRS) in Strasbourg and the Institute of Radical Chemistry (CNRS / Aix-Marseille University). It was published in Nature Communications on October 17 as article 967 (2017), entitled "Mass spectrometry sequencing of long digital polymers facilitated by programmed inter-byte fragmentation."

