Computational (DNA) storage – end of evolution part 4

We were at a recent Storage Field Day (SFD26) where there was a presentation on DNA storage by a new SNIA technical affiliate. The talk covered how far DNA storage has come and showed that it can now readily store GBs of data. But I was perusing the PNAS archives the other day and ran across an interesting paper, Parallel molecular computation on digital data stored in DNA, which is essentially DNA computational storage.

Computational storage refers to storage devices (SSDs or HDDs) with computational cores that can be devoted to outside compute activities. Recently, such devices have taken over much of the hyperscalers' grunt work for video/audio transcoding and data encryption, both of which are computationally and data intensive activities.

DNA strand storage and computers

The article above discusses the use of DNA “strand displacement” interactions as micro-code instructions to enable computation on DNA strand storage. Using DNA strands for storage reduces information density from the roughly 2 bits per nucleotide (theoretical) of conventional nucleotide encoding down to about 0.03 bits per nucleotide. But as nucleotide-encoded DNA is some 6 orders of magnitude denser than current optical or magnetic storage, this shouldn't be much of a concern.

In DNA strand storage, a bit is represented by a run of 5 to 7 nucleotides, which they call a domain. Domains are grouped into 4 or 5 bit cells, and one or more cells are arranged into a DNA strand register, which is stored on a DNA plasmid.
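To make that hierarchy concrete, here's a minimal Python sketch of bits mapping onto domains, cells and registers. The domain length, cell width and random sequences are illustrative assumptions, not the paper's actual encoding.

```python
import random

# Illustrative parameters only (assumptions, not the paper's exact encoding):
# each bit is a "domain" of 5-7 nucleotides, 4-5 domains make up a cell,
# and one or more cells make up a register stored on a plasmid.
DOMAIN_LEN = 6          # nucleotides per bit-domain (paper: 5 to 7)
CELL_BITS = 5           # domains per cell (paper: 4 or 5)
NUCLEOTIDES = "ACGT"

def random_domain(length=DOMAIN_LEN):
    """Stand-in nucleotide sequence representing one bit-domain."""
    return "".join(random.choice(NUCLEOTIDES) for _ in range(length))

def encode_register(bits):
    """Group bit-domains into cells of CELL_BITS each, forming one register."""
    domains = [(bit, random_domain()) for bit in bits]
    return [domains[i:i + CELL_BITS] for i in range(0, len(domains), CELL_BITS)]

if __name__ == "__main__":
    register = encode_register([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
    nt_used = sum(len(seq) for cell in register for _, seq in cell)
    # Counts only domain nucleotides; real registers carry extra overhead,
    # which is presumably why the paper's figure is lower (~0.03 bits/nucleotide).
    print(f"{len(register)} cells, {nt_used} nucleotides for 10 bits "
          f"(~{10 / nt_used:.2f} bits/nucleotide in this toy encoding)")
```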

They used a common DNA plasmid (M13mp18, 7.2k bases long) for their storage ring (which had many registers on it). M13mp18 is capable of storing several hundred bits, but for their research they used it to store 9 DNA strand registers.

The article discusses the (wet) chemical computational methods necessary to realize DNA strand registers and the programming that uses that storage.

The problem with current DNA storage devices is that readout is destructive and time consuming. With current DNA storage, data has to be read out, computation occurs electronically, and then new DNA has to be re-synthesized with any results that need to be stored.

With a computational DNA strand storage device, all this could be done in a single test tube, with no need to do any work outside the test tube.

How the DNA strand computer works

Their figure shows a multi-cell DNA strand register, with nicks or mismatched nucleotides representing the values 0 or 1. They use these strands, nicks and toeholds (attachment points) on DNA strands to represent data, and they attach magnetic beads to the DNA strands for manipulation.

The DNA strand displacement interactions, or micro-code instructions, they have defined include:

  • Attachment, where an instruction can be used to attach a cell of information to a register strand.
  • Displacement, where an instruction can be used to displace an information cell in a register strand.
  • Detachment, where an instruction can be used to detach a cell present in a register strand from the register.

Instructions are introduced, one at a time, as separate DNA strands into the test tube holding the DNA strand registers. DNA strand data can be replicated thousands or millions of times in a test tube, and the instruction strands can be replicated as well, allowing them to operate on all the DNA strands in the tube.

This creates a SIMD (single instruction stream operating on multiple data elements) computational device based on DNA strand storage, which they call SIMDDNA. Note: GPUs and CPUs with vector instructions are also SIMD devices.
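As a rough mental model (Python standing in for chemistry, not the paper's actual reactions), think of each instruction strand as a function applied to every copy of a register in the tube. The attach/displace/detach semantics below are simplified stand-ins for the strand displacement interactions described above.

```python
# Toy SIMD-on-DNA model: every instruction is applied to every register copy
# in the "tube". Registers are just bit lists here; the real instructions are
# strand-displacement reactions, not Python functions.

def attach(register, pos, value):
    """Attach a cell of information at an empty position."""
    if register[pos] is None:
        register[pos] = value
    return register

def displace(register, pos, value):
    """Displace (overwrite) the cell at a position."""
    register[pos] = value
    return register

def detach(register, pos):
    """Detach the cell at a position, leaving it empty."""
    register[pos] = None
    return register

def run_program(tube, program):
    """Apply each instruction, one at a time, to every register in the tube."""
    for op, *args in program:
        tube = [op(reg, *args) for reg in tube]   # SIMD: one instruction, all strands
    return tube

if __name__ == "__main__":
    tube = [[1, 0, 1, None] for _ in range(1000)]    # thousands of identical copies
    program = [(attach, 3, 1), (displace, 0, 0), (detach, 2)]
    tube = run_program(tube, program)
    print(tube[0])   # every copy ends in the same state: [0, 0, None, 1]
```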

Using these micro-coded DNA strand instructions and DNA strand register storage, they have implemented a bit counter and a Rule 110 cellular automaton (a one-dimensional cousin of Conway's Game of Life). Rule 110 is Turing complete and, as such, can with enough time and memory simulate any computation. Later in the paper they discuss their implementation of a random access device, where they go in, retrieve a piece of data and erase it.
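Rule 110 itself is simple to state in conventional code, which gives a feel for what the wet-lab implementation has to reproduce: each cell's next value is a fixed function of itself and its two neighbors. Here's a standard reference implementation (nothing DNA-specific about it).

```python
def rule110_step(cells):
    """One update of the Rule 110 elementary cellular automaton.

    Each new cell value depends on its (left, center, right) neighborhood;
    Rule 110 maps the 8 possible neighborhoods to 0 or 1 per its rule table
    (binary 01101110 = decimal 110). Boundaries wrap around for simplicity.
    """
    table = {(1,1,1): 0, (1,1,0): 1, (1,0,1): 1, (1,0,0): 0,
             (0,1,1): 1, (0,1,0): 1, (0,0,1): 1, (0,0,0): 0}
    n = len(cells)
    return [table[(cells[(i-1) % n], cells[i], cells[(i+1) % n])] for i in range(n)]

if __name__ == "__main__":
    row = [0] * 15 + [1]          # single 1 at the right edge
    for _ in range(10):
        print("".join("#" if c else "." for c in row))
        row = rule110_step(row)
```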

Program for bit counting: information in the solid blue boundary shows the instructions and information in the dotted boundary shows the impact on the strand data.

The process seems to flow as follows: they add magnetic beads to each register strand, add one instruction at a time to the test tube, wait for it to complete, wash out the waste products and then add the next. When all instructions have been executed, the DNA strand computation is done and, if needed, the result can be read out (destructively), or perhaps passed off to the next program for processing. An instruction can take anywhere from 2 to 10 minutes to complete (it's early yet in the technology).

They also indicated that the instruction bath added to the test tube need not contain all the same instructions, which means it could create a MIMD (multiple instruction streams operating on multiple data elements) computational device.

The results of the DNA strand computations weren't 100% accurate; they report roughly 70-80% accuracy at the moment. And when DNA data strands are re-used for subsequent programs, their accuracy goes down.

There are other approaches to DNA computation and storage which we discuss in parts 1, 2 and 3 of our End of Evolution series. And if you want to learn more about current DNA storage, please check out the SFD26 SNIA videos or listen to our GBoS podcast with Dr. J Metz.

Where does evolution fit in

Evolution seems to operate on DNA mutation plus natural selection, or survival of the fittest. Over time, this allows good mutations to accumulate and bad mutations to die off.

There’s a mechanism in digital computing called ECC (error correcting codes) which, for example, add additional “guard” bits to every 64-128 bit word of data in a computer memory and using the guard bits, is able to detect 2 or more bit errors (mutations) and correct 1 or 2 bit errors.

If one were to create an ECC algorithm for human DNA, say by encoding DNA guard bits in junk DNA along with an ECC algorithm in a DNA (strand) computer, and inject this into a newborn, the algorithm could periodically check the accuracy of the DNA information in every cell of a human body and correct any mutations it found. Thus ending human evolution.

We seem a ways off from doing any of this, but I could see something like ECC being applied to a computational DNA strand storage device in a matter of years. Getting this sort of functionality into a human cell may take a decade or two, and getting it to the point where it could work over a lifetime maybe another decade after that.

Comments?

Photo Credit(s):

  • Section B from Figure 2 in the paper
  • Figure 1 from the paper
  • Section A from Figure 2 in the paper
  • Section C from Figure 2 in the paper
  • Section A from Figure 3 in the paper

Where should IoT data be processed – part 1

I was at Flash Memory Summit 2019 (FMS2019) this week and there was a lot of talk about computational storage (see our GBoS podcast with Scott Shadley, NGD Systems). There was also a lot of discussion about IoT and the need for data processing at the edge (or in near-edge computing centers/edge clouds).

At the show, I was talking with Tom Leyden of Excelero and he mentioned there was a real need for some insight on how to determine where IoT data should be processed.

For our discussion let’s assume a multi-layered IoT architecture, with 1000s of sensors at the edge, 100s of near-edge processing/multiplexing stations, and 1 to 3 core data center or cloud regions. Data comes in from the sensors, is sent to near-edge processing/multiplexing and then to the core data center/cloud.

Data size

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

When deciding where to process data, one key aspect is the size of the data. Typically this is in GB or TB but, given today's world, it can be PB as well. This lone parameter has multiple impacts and affects many other considerations, such as the cost and time to transfer the data, the cost of data storage, the amount of time to process the data, etc. All of these sub-factors depend on the size of the data to be processed.

Data size can be the largest single determinant of where to process the data. If we are talking about GBs of data, it could probably be processed anywhere from the sensor edge, to the near-edge station, to the core. But if we are talking about TBs, the processing requirements and time go up substantially; that capacity is unlikely to be available at the sensor edge and may not be available at the near-edge station. And PBs take this up to a whole other level and may require processing only at the core, due to the infrastructure requirements.
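As a rough illustration, the size-based triage might look like the sketch below. The thresholds are placeholders of my own, not recommendations; the real cut-offs depend on what compute and bandwidth each layer actually has.

```python
def tier_for_data_size(size_gb):
    """Pick the lowest layer plausibly able to process a dataset of this size.

    Thresholds are illustrative placeholders only; actual values depend on the
    compute, storage and network bandwidth available at each layer.
    """
    if size_gb < 10:            # a few GB: a capable sensor/gateway may cope
        return "sensor edge"
    if size_gb < 10_000:        # up to ~10 TB: near-edge station territory
        return "near-edge station"
    return "core data center / cloud"   # PB scale needs core infrastructure

for size in (2, 500, 50_000, 3_000_000):
    print(f"{size:>9} GB -> {tier_for_data_size(size)}")
```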

Processing criticality

Human or machine safety may depend on quick processing of sensor data, e.g. in a self-driving car, on a factory floor, for flood gauges, etc. In these cases, some amount of data processing (sufficient to ensure human/machine safety) needs to be done at the lowest point in the hierarchy that has the processing power to perform this activity.

This could be in the self-driving car or in the factory automation that controls a mechanism. Similar situations would probably apply to robots and autopilots. Anywhere an IoT sensor array is used to control something that could jeopardize human life or machine safety, safety-level processing needs to be done at the lowest level in the hierarchy.

If processing doesn’t involve safety, then it could potentially be done at the near-edge stations or at the core. .

Processing time and infrastructure requirements

Although we talked about this in data size above, infrastructure requirements must also play a part in where data is processed. Yes sensors are getting more intelligent and the same goes for near-edge stations. But if you’re processing the data multiple times, say for deep learning, it’s probably better to do this where there’s a bunch of GPUs and some way of keeping the data pipeline running efficiently. The same applies to any data analytics that distributes workloads and data across a gaggle of CPU cores, storage devices, network nodes, etc.

There’s also an efficiency component to this. Computational storage is all about how some workloads can better be accomplished at the storage layer. But the concept applies throughout the hierarchy. Given the infrastructure requirements to process the data, there’s probably one place where it makes the most sense to do this. If it takes a 100 CPU cores to process the data in a timely fashion, it’s probably not going to be done at the sensor level.

Data information funnel

We make the assumption that raw data comes in through sensors, and more processed data is sent to higher layers. This would mean at a minimum, some sort of data compression/compaction would need to be done at each layer below the core.
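One simple way to realize that funnel is for each layer to forward summaries rather than raw samples. Here's a hedged sketch; the per-window min/mean/max aggregation is just one arbitrary choice of summary.

```python
from statistics import mean

def funnel(raw_readings, window=60):
    """Reduce a raw sensor stream to per-window aggregates before forwarding.

    Each window of raw samples is collapsed to (min, mean, max), shrinking
    the data sent upstream by roughly a factor of window/3.
    """
    summaries = []
    for i in range(0, len(raw_readings), window):
        chunk = raw_readings[i:i + window]
        summaries.append((min(chunk), mean(chunk), max(chunk)))
    return summaries

readings = [20.0 + 0.01 * i for i in range(600)]   # 10 minutes of 1 Hz samples
print(len(readings), "raw samples ->", len(funnel(readings)), "summary tuples")
```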

We were at a conference a while back where they talked about updating deep learning neural networks. It's possible that each near-edge station could perform a mini deep learning training cycle and share what it learned with the core periodically, which could then send this information back down to the lowest level to be used (see our Swarm Intelligence @ #HPEDiscover post).

All this means that there's a minimal level of data processing that needs to go on throughout the hierarchy.

Pipe availability


The availability of a networking access point may also have some bearing on where data is processed. For example, a self-driving car could generate TBs of data a day, but access to a high-speed, inexpensive data pipe to send that data may be limited to a service bay and/or a garage connection.

So some processing may need to be done between access point connections, and this will need to take place at the lower levels. That way, there's no need to send the full data stream while the car is out on the road; it can be sent whenever the car is attached to an access point.

Compliance/archive requirements

Any sensor data probably needs to be stored for a long time and as such will need access to a long-term archive. Depending on the extent of this data, that may help dictate where processing is done. That is, if all the raw data needs to be held anyway, then maybe the processing of that data can be deferred until it's already at the core and on its way to archive.

However, any safety-oriented data processing needs to be done at the lowest level and may need to be reprocessed higher up in the hierarchy, to ensure proper safety decisions were made. And needless to say, all this data would need to be retained.

~~~~

I started this post with 40 or more factors but that was overkill. In the above, I tried to summarize the 6 critical factors which I would use to determine where IoT data should be processed.

My intent, in a part 2 to this post, is to work through some examples. If there's any one example that you feel may be instructive, please let me know.

Also, if there’s other factors that you would use to determine where to process IoT data let me know.

IBM using PCM to implement better AI – round 6

Saw a recent article that discussed IBM’s research into new computing architectures that are inspired by brain computational techniques (see A new brain inspired architecture … ). The article reports on research done by IBM R&D into using Phase Change Memory (PCM) technology to implement various versions of computer architectures for AI (see Tutorial: Brain inspired computation using PCM, in the AIP Journal of Applied Physics).


As you may recall, we have been reporting on IBM Research into different computing architectures to support AI processing for quite a while now (see: Parts 1, 2, 3, 4, & 5). In our last post, More power efficient deep learning through IBM and PCM, we reported on a unique hybrid PCM-silicon solution for deep learning computation.

Readers should also be familiar with PCM, as it's been discussed at length in a number of our posts (see The end of NAND is near, maybe; The future of data storage is MRAM; and New chip architectures with CPU, storage & sensors …). MRAM, ReRAM and today's 3D XPoint are related non-volatile memory technologies, though each relies on different physics (3D XPoint is widely believed to be a form of PCM).

In the current research, IBM discusses three different approaches to supporting AI using PCM devices. All three stem from the physical characteristics of PCM.

(Some) PCM physics

FIG. 2. (a) Phase-change memory is based on the rapid and reversible phase transition of certain types of materials between crystalline and amorphous phases by the application of suitable electrical pulses. (b) Transmission electron micrograph of a mushroom-type PCM device in a RESET state. It can be seen that the bottom electrode is blocked by the amorphous phase.

It turns out that PCM devices have many characteristics that lend themselves to specialized computation. PCM devices crystallize and melt in order to change state, and the properties associated with melting and crystallization of the PCM media cell can be used to support unique forms of computation. Some of these PCM characteristics include (a toy model of the last two appears after the list):

  • Analog, not digital, memory – PCM devices are, at their core, analog memory devices. They don't record just a 0 or 1 (really a high-resistance or low-resistance state), but rather a continuum of values between those two.
  • Accumulation capability – each PCM cell accumulates a level of activation, which means a cell can be more or less likely to change state depending on prior activity.
  • Noise – PCM cells are not perfect recorders of state-change signals; rather they have well-characterized, random noise which impacts the state level attained and can be used to introduce randomness into processing.
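Here's the toy model promised above, covering the accumulation and noise behaviors. The update size and noise magnitude are invented for illustration; they are not measurements from the paper.

```python
import random

class PCMCell:
    """Toy analog PCM cell: conductance accumulates with pulses and is noisy.

    The pulse strength and noise level are made-up illustrative values, not
    device measurements.
    """
    def __init__(self):
        self.conductance = 0.0                 # analog state, not just 0/1

    def pulse(self, strength=0.05):
        """Apply a programming pulse: the state accumulates, with random noise."""
        noise = random.gauss(0.0, 0.01)
        self.conductance = min(1.0, max(0.0, self.conductance + strength + noise))
        return self.conductance

cell = PCMCell()
levels = [round(cell.pulse(), 3) for _ in range(10)]
print(levels)    # a noisy, gradually accumulating analog trajectory
```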

The other major advantage of PCM devices is that they use a lot less power than a GPU/CPU to do this work.

Three ways to use PCM for AI learning

FIG. 4. “In-memory computing,” computation is performed in place by exploiting the physical attributes of memory devices organized as a “computational memory” unit. For example, if data A is stored in a computational memory unit and if we would like to perform f(A), then it is not required to bring A to the processing unit. This saves energy and time that would have to be spent in the case of conventional computing system and memory unit. Adapted from Ref. 19.

The Applied Physics article describes three ways to use PCM devices in AI learning. These three include:

  1. Computational storage – uses the analog capabilities of PCM to perform arithmetic and learning computations, in a sort of combined compute-and-storage device.
  2. AI co-processor – uses PCM devices in an “all PCM nodes connected to all other PCM nodes” configuration to perform neural network learning. In an AI co-processor there would be multiple, fully connected PCM modules, each emulating a neural network layer.
  3. Spiking neural networks – uses PCM activation-accumulation characteristics and inherent randomness to mimic biological spiking neuron activation.
FIG. 11. A proposed chip architecture for a co-processor for deep learning based on PCM arrays. Adapted from Ref. 28.

It’s the last approach that intrigues me.

Spiking neural nets (SNN)

FIG. 12. (a) Schematic illustration of a synaptic connection and the corresponding pre- and post-synaptic neurons. The synaptic connection strengthens or weakens based on the spike activity of these neurons; a process referred to as synaptic plasticity. (b) A well-known plasticity mechanism is spike-time-dependent plasticity (STDP), leading to weight changes that depend on the relative timing between the pre- and post-synaptic neuronal spike activities. Adapted from Ref. 31.

Biological neurons accumulate charge from all input (connected) neurons and, when they reach some input threshold, generate an output signal or spike. That spike then starts the same process in neurons downstream from it.

Biological neurons also exhibit randomness in their threshold-spiking process.

Emulating spiking neurons in today's neural nets takes computation, and adding randomness takes even more.

But with PCM SNNs, both the spiking process and its randomness come from device physics. Using PCM to create SNNs seems a logical progression.
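A minimal accumulate-and-fire sketch shows the behavior PCM would provide “for free”: charge builds up across inputs and a spike fires when a noisy threshold is crossed. In a PCM device the accumulation and randomness come from the physics; here they're simulated explicitly with made-up parameters.

```python
import random

class SpikingNeuron:
    """Toy integrate-and-fire neuron with a noisy threshold.

    Threshold, noise and input weights are illustrative values only; a PCM
    implementation would get the accumulation and randomness from the device.
    """
    def __init__(self, threshold=1.0, noise=0.1):
        self.threshold = threshold
        self.noise = noise
        self.charge = 0.0

    def receive(self, weight):
        """Accumulate input; fire (and reset) when the noisy threshold is hit."""
        self.charge += weight
        if self.charge >= self.threshold + random.gauss(0.0, self.noise):
            self.charge = 0.0
            return True     # spike sent downstream
        return False

neuron = SpikingNeuron()
spikes = [neuron.receive(0.3) for _ in range(20)]
print(spikes.count(True), "spikes out of 20 inputs")
```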

PCM as storage, as memory, as compute or all the above

In the storage business, we look at Optane (see our 3D XPoint post) SSDs as blazingly fast storage. Intel has also announced that they will use 3D XPoint in a memory form factor, which should provide slower (than DRAM) but much larger memory devices.

But using PCM for compute, is a radical departure from the von Neumann computer architectures we know and love today. HPE has been discussing another new computing architecture with their memristor technology, but only in prototype form.

It seems IBM is also prototyping hardware down this path.

Welcome to the next computing revolution.

Photo & Caption Credit(s): Photo and caption from Figure 2 in AIP Journal of Applied Physics article

Photo and caption from Figure 4 in AIP Journal of Applied Physics article

Photo and caption from Figure 11 in AIP Journal of Applied Physics article

Photo and caption from Figure 12 in AIP Journal of Applied Physics article