Read an article the other day in Scientific American (“Punch card” DNA …) which was reporting on a Nature Magazine Article (DNA punch cards for storing data… ). The articles discussed a new approach to storing (and encoding) data into DNA sequences.
We have talked about DNA storage over the years (most recently, see our Random access DNA object storage post) so it’s been under study for almost a decade.
In prior research on DNA storage, scientists encoded data directly into the nucleotides used to store genetic information. As you may recall, there are two complementary nucleotides A-T (adenine-thymine) and G-C (guanine-cytosine) that constitute the genetic code in a DNA strand. One could use one of these pairs to encode a 1 bit and the other for a 0 bit and just lay them out along a DNA strand.
The main problem with nucleotide encoding of data in DNA is that it’s slow to write and read and very error prone (storing data in DNA separate nucleotides is a lossy data storage). Researchers have now come up with a better way.
Using DNA nicks to store bits
One could encode information in DNA by utilizing the topology of a DNA strand. Each DNA strand is actually made up of a sugar phosphate back bone with a nucleotide (A, C, G or T) hanging off of it, and then a hydrogen bond to its nucleotide complement (T, G, C or A, respectively) which is attached to another sugar phosphate backbone.
It appears that one can deform the sugar phosphate back bone at certain positions and retain an intact DNA strand. It’s in this deformation that the researchers are encoding bits and they call this a “DNA nick”.
Writing DNA nick data
The researchers have taken a standard DNA strand (E-coli), and identified unique sites on it that they can nick to encode data. They have identified multiple (mostly unique) sites for nick data along this DNA, the scientists call “registers” but we would call sectors or segments. Each DNA sector can contain a certain amount of nick data, say 5 to 10 bits. The selected DNA strand has enough unique sectors to record 80 bits (10 bytes) of data. Not quite a punch card (80 bytes of data) but it’s early yet.
Each register or sector is made up of 450 base (nucleotide) pairs. As DNA has two separate strands connected together, the researchers can increase DNA nick storage density by writing both strands creating a sort of two sided punch card. They use this other or alternate (“anti-sense”) side of the DNA strand nicks for the value “2”. We would have thought they would have used the absent of a nick in this alternate strand as being “3” but they seem to just use it as another way to indicate “0” .
The researchers found an enzyme they could use to nick a specific position on a DNA strand called the PfAgo (Pyrococcus furiosus Argonaute) enzyme. The enzyme can de designed to nick distinct locations and register (sectors) along the DNA strand. They designed 1024 (2**10) versions of this enzyme to create all possible 10 bit data patterns for each sector on the DNA strand.
Writing DNA nick data is done via adding the proper enzyme combinations to a solution with the DNA strand. All sector writes are done in parallel and it takes about 40 minutes.
Also the same PfAgo enzyme sequence is able to write (nick) multiple DNA strands without additional effort. So we can replicate the data as many times as there are DNA strands in the solution, or replicating the DNA nick data for disaster recovery.
Reading DNA nick data
Reading the DNA nick data is a bit more complicated.
In Figure 1 the read process starts by by denaturing (splitting dual strands into single strands dsDNA) and then splitting the single strands (ssDNA) up based on register or sector length which are then sequenced. The specific register (sector) sequences are identified in the sequence data and can then be read/decoded and placed in the 80 bit string. The current read process is destructive of the DNA strand (read once).
There was no information on the read time but my guess is it takes hours to perform. Another (faster) approach uses a “two-dimensional (2D) solid-state nanopore membrane” that can read the nick information directly from a DNA string without dsDNA-ssDNA steps. Also this approach is non-destructive, so the same DNA strand could be read multiple times.
Other storage characteristics of nicked DNA
Given the register nature of the nicked DNA data organization, it appears that data can be read and written randomly, rather than sequentially. So nicked DNA storage is by definition, a random access device.
Although not discussed in the paper, it appears as if the DNA nicked data can be modified. That is the same DNA string could have its data be modified (written multiple times).
The researcher claim that nicked DNA storage is so reliable that there is no need for error correction. I’m skeptical but it does appear to be more reliable than previous generations of DNA storage encoding. However, there is a possibility that during destructive read out we could lose a register or two. Yes one would know that the register bits are lost which is good. But some level of ECC could be used to reconstruct any lost register bits, with some reduction in data density.
The one significant advantage of DNA storage has always been its exceptional data density or bits stored per volume. Nicked storage reduces this volumetric density significantly, i.e, 10 bits per 450 (+ some additional DNA base pairs required for register spacing) base pairs or so nicked DNA storage reduces DNA storage volumetric density by at least a factor of 45X. Current DNA storage is capable of storing 215M GB per gram or 215 PB/gram. Reducing this by let’s say 100X, would still be a significant storage density at ~2PB/gram.
- From wikipedia article on Punched card
- From wikipedia article on DNA
- Fig 1 from the Nature Magazine Article DNA punchcards for storing data on native DNA sequences via enzymatic nicking
- Fig 2 from the Nature Magazine Article DNA punchcards for storing data on native DNA sequences via enzymatic nicking
- Fig 3 from the Nature Magazine Article DNA punchcards for storing data on native DNA sequences via enzymatic nicking