Computational (DNA) storage – end of evolution part 4

We were at a recent Storage Field Day (SFD26) where there was a presentation on DNA storage, a new SNIA technical affiliate. The talk there was on how far DNA storage has come and how it can now easily store GBs of data. But I was perusing PNAS archives the other day and ran across an interesting paper, "Parallel molecular computation on digital data stored in DNA", which is essentially DNA computational storage.

Computational storage devices are storage devices (SSDs or HDDs) with computational cores that can be devoted to outside compute activities. Recently, these devices have taken over much of the hyperscaler grunt work of video/audio transcoding and data encryption, both of which are computationally and data intensive activities.

DNA strand storage and computers

The article above discusses the use of DNA “strand displacement” interactions as micro-code instructions to enable computation on DNA strand storage. Using DNA strands for storage reduces information density from the theoretical 2 bits per nucleotide of current DNA storage, which uses nucleotides to encode bits, to about 0.03 bits per nucleotide. But as DNA information density (using nucleotides) is some 6 orders of magnitude greater than current optical or magnetic storage, this shouldn’t be a concern.

In DNA strand storage, a bit is represented by 5 to 7 nucleotides, which they call a domain. These are grouped into 4 or 5 bit cells, with one or more cells arranged in a DNA strand register, which is stored on a DNA plasmid.

They used a common DNA plasmid (M13mp18, 7.2k bases long) for their storage ring (which had many registers on it). M13mp18 is capable of storing several hundred bits, but for their research they used it to store 9 DNA strand registers.
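As a rough sanity check on those numbers, here's a back-of-the-envelope sketch in Python. It only uses the figures quoted above (2 and 0.03 bits per nucleotide, a 7.2k base plasmid); the function and variable names are ours, not the paper's.

```python
# Figures quoted in the text above; the arithmetic is only a rough check.
BITS_PER_NT_DIRECT = 2.0    # theoretical nucleotide encoding
BITS_PER_NT_STRAND = 0.03   # DNA strand storage encoding
PLASMID_BASES = 7_200       # M13mp18, roughly 7.2k bases


def capacity_bits(bases: int, bits_per_nt: float) -> float:
    """Bits that fit on a scaffold of `bases` nucleotides."""
    return bases * bits_per_nt


strand_bits = capacity_bits(PLASMID_BASES, BITS_PER_NT_STRAND)
direct_bits = capacity_bits(PLASMID_BASES, BITS_PER_NT_DIRECT)
density_penalty = BITS_PER_NT_DIRECT / BITS_PER_NT_STRAND

print(f"strand encoding: ~{strand_bits:.0f} bits per plasmid")
print(f"direct encoding: ~{direct_bits:.0f} bits per plasmid")
print(f"density penalty: ~{density_penalty:.0f}x")
```

At 0.03 bits per nucleotide the plasmid works out to roughly 216 bits, which lines up with the "several hundred bits" figure, and a density penalty of roughly 67x is small next to DNA's 6 orders of magnitude advantage.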

The article discusses the (wet) chemical computational methods necessary to realize DNA strand registers and the programming that uses that storage.

The problem with current DNA storage devices is that readout is destructive and time consuming. With current DNA storage, data has to be read out, computation then occurs electronically, and new DNA has to be synthesized with any results that need to be stored.

With a computational DNA strand storage device, all this could be done in a single test tube, with no need to do any work outside the test tube.

How DNA strand computer works

Their figure shows a multi-cell DNA strand register, with nicks or mismatched nucleotides representing the value of 0 or 1. They use these strands, nicks and toeholds (attachment points) on DNA strands to represent data. They attach magnetic beads to the DNA strands for manipulation.

DNA strand displacement interactions or the micro-code instructions they have defined include

  • Attachment, where an instruction can be used to attach a cell of information to a register strand.
  • Displacement, where an instruction can be used to displace an information cell in a register strand.
  • Detachment, where an instruction can be used to detach a cell present in a register strand from the register.

Instructions are introduced, one at a time, as separate DNA strands, into the test tube holding the DNA strand registers. DNA strand data can be replicated 1000s or millions of times in a test tube and the instructions could be replicated as well allowing them to operate on all the DNA strands in the tube.

This creates a SIMD (single instruction stream operating on multiple data elements) computational device based on DNA strand storage, which they call SIMDDNA. Note: GPUs and CPUs with vector instructions are also SIMD devices.
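Here's a toy software model of that SIMD idea, assuming a register can be treated as a simple list of bits. The `write_bit` and `flip_if` instructions are hypothetical stand-ins we made up for illustration; they are not the paper's strand displacement instructions.

```python
from typing import Callable, List

Register = List[int]                       # toy stand-in for one DNA strand register
Instruction = Callable[[Register], Register]


def write_bit(index: int, value: int) -> Instruction:
    """Hypothetical instruction: overwrite the bit at `index` with `value`."""
    def apply(reg: Register) -> Register:
        out = list(reg)
        out[index] = value
        return out
    return apply


def flip_if(index: int, expected: int) -> Instruction:
    """Hypothetical instruction: flip the bit at `index` only if it holds `expected`."""
    def apply(reg: Register) -> Register:
        out = list(reg)
        if out[index] == expected:
            out[index] ^= 1
        return out
    return apply


def run_simd(tube: List[Register], program: List[Instruction]) -> List[Register]:
    """Apply each instruction, one at a time, to every register in the tube."""
    for instruction in program:            # one instruction is added per step
        tube = [instruction(reg) for reg in tube]  # all registers see the same instruction
    return tube


tube = [[0, 0, 1], [1, 0, 1], [1, 1, 0]]
program = [write_bit(0, 1), flip_if(2, 1)]
result = run_simd(tube, program)
print(result)  # [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
```

The point of the sketch is the loop structure: the program is a sequence, but each step touches every register at once, which is what the test tube does chemically.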

Using these microcoded DNA strand instructions and DNA strand register storage, they have implemented a bit counter and a Rule 110 program (Rule 110 is a cellular automaton, sort of like Conway's Game of Life). Rule 110 is Turing complete and, as such, can, with enough time and memory, simulate any program calculation. Later in the paper they discuss their implementation of a random access device where they go in and retrieve a piece of data and erase it.
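For readers who haven't run into Rule 110, a minimal software sketch follows. This is ordinary Python, not the paper's DNA strand program, and treating cells beyond the register edges as 0 is our assumption.

```python
from typing import List

RULE = 110  # binary 01101110: bit k is the output for neighborhood value k


def rule110_step(cells: List[int]) -> List[int]:
    """One Rule 110 update; cells beyond the edges are treated as 0."""
    padded = [0] + cells + [0]
    out = []
    for i in range(1, len(padded) - 1):
        # Encode (left, center, right) as a number from 0 to 7
        neighborhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        out.append((RULE >> neighborhood) & 1)
    return out


state = [0, 0, 0, 0, 1]
history = [state]
for _ in range(3):
    state = rule110_step(state)
    history.append(state)

for row in history:
    print("".join("#" if bit else "." for bit in row))
```

Each new cell value depends only on the cell and its two neighbors, which is why the update maps well onto a fixed sequence of instructions applied across a whole register.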

Program for bit counting: the information inside the solid blue boundary shows the instructions, and the information inside the dotted boundary shows their impact on the strand data.

The process seems to flow as follows: they add magnetic beads to each register strand, add one instruction at a time to the test tube, wait for it to complete, wash out the waste products and then add the next. When all instructions have been executed, the DNA strand computation is done and, if needed, the result can be read out (destructively) or perhaps passed off to the next program for processing. An instruction can take anywhere from 2 to 10 minutes to complete (it’s early yet in the technology).
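To get a feel for what 2 to 10 minutes per instruction means, here's a rough estimate in Python. The instruction count of 7 is a made-up example, not a figure from the paper, and wash time between instructions is ignored.

```python
from typing import Tuple

MIN_MINUTES_PER_INSTRUCTION = 2    # lower bound quoted in the text
MAX_MINUTES_PER_INSTRUCTION = 10   # upper bound quoted in the text


def estimate_runtime_minutes(n_instructions: int) -> Tuple[int, int]:
    """Best and worst case runtime when instructions run strictly one after another."""
    low = n_instructions * MIN_MINUTES_PER_INSTRUCTION
    high = n_instructions * MAX_MINUTES_PER_INSTRUCTION
    return low, high


# Hypothetical program length, chosen only for illustration
low, high = estimate_runtime_minutes(7)
print(f"7 instructions: {low} to {high} minutes")
```

The runtime depends on program length, not on how many register strands are in the tube, since every strand is processed by the same instruction at the same time.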

They also indicated that the instruction bath added to the test tube need not contain all the same instructions, which means that it could create a MIMD (multiple instruction streams operating on multiple data elements) computational device.

The results of the DNA strand computations weren’t 100% accurate; they report accuracy of 70-80% at the moment. And when DNA data strands are re-used for subsequent programs, their accuracy goes down.

There are other approaches to DNA computation and storage which we discuss in parts-1, -2 and -3 in our End of Evolution series. And if you want to learn more about current DNA storage please check out the SFD26 SNIA videos or listen to our GBoS podcast with Dr. J Metz.

Where does evolution fit in

Evolution seems to operate on mutation of DNA and natural selection, or selection of the fittest. Over time this allows good mutations to accumulate and bad mutations to die off.

There’s a mechanism in digital computing called ECC (error correcting codes) which, for example, adds additional “guard” bits to every 64-128 bit word of data in a computer memory and, using the guard bits, is able to detect 2 or more bit errors (mutations) and correct 1 or 2 bit errors.
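To make the idea concrete, here is a minimal sketch of a textbook SECDED (single error correct, double error detect) code: Hamming(7,4) plus an overall parity bit, protecting 4 data bits. Real memory ECC works on much wider words, and nothing here comes from the paper; it's only meant to show how guard bits locate and fix a flipped bit.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as 8 bits: Hamming(7,4) plus one overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4                      # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4                      # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4                      # checks positions 4, 5, 6, 7
    word = [p1, p2, d1, p4, d2, d3, d4]    # positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def decode(word: List[int]) -> Tuple[List[int], str]:
    """Return (data bits, status); corrects 1 flipped bit, detects 2."""
    w = list(word[:7])
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4        # 1-based position of a single error
    overall = 0
    for bit in word:
        overall ^= bit                     # 0 when overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one error and fix it.
        if syndrome != 0:
            w[syndrome - 1] ^= 1
        # syndrome == 0 here means only the overall parity bit flipped.
        status = "corrected"
    else:
        # Parity checks fail but overall parity holds: two bits flipped.
        status = "double error detected"
    return [w[2], w[4], w[5], w[6]], status


codeword = encode([1, 0, 1, 1])
one_flip = list(codeword)
one_flip[5] ^= 1                           # a single "mutation"
two_flips = list(codeword)
two_flips[0] ^= 1
two_flips[5] ^= 1                          # two "mutations"

print(decode(codeword))
print(decode(one_flip))
print(decode(two_flips))
```

With one flipped bit the original data comes back intact; with two, the decoder can only report that something is wrong, which is the detect versus correct distinction.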

If one were to create an ECC algorithm for human DNA strands, say by encoding DNA guard bits in junk DNA and an ECC algorithm in a DNA (strand) computer, and inject this into a newborn, the algorithm could periodically check the accuracy of the DNA information in every cell of a human body and correct it if there were any mutations. Thus ending human evolution.

We seem a ways off from doing any of this, but I could see something like ECC being applied to a computational DNA strand storage device in a matter of years. Getting this sort of functionality into a human cell may take a decade or two, and getting it to the point where it could do this over a lifetime, maybe another decade after that.

Comments?

Photo Credit(s):

  • Section B from Figure 2 in the paper
  • Figure 1 from the paper
  • Section A from Figure 2 in the paper
  • Section C from Figure 2 in the paper
  • Section A from Figure 3 in the paper
