Photonic computing seeing the light of day

Read three articles this past week or so that shed some light on photonic computing, that is, doing computation with light rather than electronic signals. We discuss the first two directly below and save the last for the end.

The first, a TechCrunch article, Bill Gates, Neo, [&] Gigafund backing Luminous in photonic computing, was about a new startup that raised $9M in seed funding to focus on AI computing using photonics rather than electronics. Luminous is working on an AI accelerator chip.

The second article was from MIT News, Chip design drastically reduces energy needed to compute with light, which discusses a paper in Physical Review X (Large-Scale Optical Neural Networks Based on Photoelectric Multiplication), again focused on AI DL acceleration using photonics.

AI accelerators are cropping up everywhere (see my posts on Intel DL Boost, New Graphcore GC2 chips, and AI processing at the edge). So a photonic AI accelerator should be relatively easy to integrate into these environments.

Luminous from Princeton

Neuromorphic architecture with level-tuned neurons. The internal state of a primary neuron is used to enable a set of level-tuned neurons. Credit: Pantazi et al. ©2016 IOP Publishing

Luminous Computing is based on research done over the last decade at Princeton University, and seems to have as its goal shipping a single chip to replace 3,000 TPUs. We’ve talked about Google TPUs before (see our Google releases new Cloud TPU and TPU and hardware vs. software innovation – round 3 posts). We’ve also discussed photonics before (see our Photonic or Optical FPGAs on the horizon post).

Luminous Computing’s CTO, Mitchell Nahmias, has co-authored a number of papers, articles, and books on photonics with others from Princeton University, perhaps the most impressive being Principles of Neuromorphic Photonics. If they are following this approach, it seems likely they are creating a new photonic neuromorphic chip.

There is minimal information on their web site, so I am assuming they’re targeting a neuromorphic chip based on the CTO’s history of research publications. I could be completely wrong about this assumption but will continue with it for now.

Neuromorphic AI is in contrast to standard neural network deep learning approaches to AI. We have discussed neuromorphic computing in a number of posts (e.g., see IBM introduces the SYNAPSE chip from 2011 or University of Manchester fires up the largest neuromorphic chip from 2018; for more, just search for “Neuromorphic”). Neuromorphic approaches to AI, although they showed early promise, have been losing the technological arms race, as standard AI DL on GPUs, implementing neural networks, has vastly more adherents in both academia and industry.

On the other hand, simulating actual neurons, as in neuromorphic computing, has the potential to be a significant breakthrough. But only if it can be orders of magnitude faster, cheaper or somehow better than standard AI DL neural network processing.

One of the news reports mentioned that Luminous Computing has working silicon. Assuming their approach has significant AI training advantages, will it also require a neuromorphic chip to perform AI inferencing?

Photonics from MIT

The MIT group states that, based on their simulation of the photonic chip, they can perform AI DL neural network training with much less energy. In their paper, they mention that their matrix multiplication is performed passively, by optical interference alone. In this case, both the input and the multiply-accumulate (MAC) function are done using photonics.

AI DL neural network training takes multiple iterations of matrix multiply-accumulate (MAC) operations. A standard CPU takes about 20 pJ/MAC (pico-Joule = 10**-12 Joules) and a GPU about 1 pJ/MAC, whereas in simulation they were able to show their photonic chip should be able to perform a MAC in about 10 fJ (femto-Joule = 10**-15 Joules). This would put the MIT photonic chip design’s power consumption per MAC at ~2000X better than CPUs and ~100X better than GPUs.

The paper goes on to say that, theoretically, they could operate at 50 to 100 zJ/MAC (zepto-Joule = 10**-21 Joules), which would add several more orders of magnitude to their power advantage.
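To put those per-MAC figures in perspective, here’s a quick back-of-the-envelope Python sketch. The energy-per-MAC numbers are the ones quoted above; the 10**15 MAC training-run size is my own made-up illustrative number, not from the paper.

```python
# Back-of-the-envelope energy comparison using the per-MAC figures quoted above.
# The training-run size (1e15 MACs) is a hypothetical number for illustration.

ENERGY_PER_MAC = {                      # joules per multiply-accumulate
    "CPU":                    20e-12,   # ~20 pJ/MAC
    "GPU":                     1e-12,   # ~1 pJ/MAC
    "photonic (simulated)":   10e-15,   # ~10 fJ/MAC
    "photonic (theoretical)": 100e-21,  # ~100 zJ/MAC
}

TRAINING_MACS = 1e15   # hypothetical training run: 10**15 MAC operations

for device, e_mac in ENERGY_PER_MAC.items():
    total_joules = e_mac * TRAINING_MACS
    ratio_vs_cpu = ENERGY_PER_MAC["CPU"] / e_mac
    print(f"{device:>24}: {total_joules:9.3e} J total "
          f"({ratio_vs_cpu:,.0f}x better than CPU)")
```

Run as written, this shows the simulated photonic design at roughly 2,000X the CPU and 100X the GPU energy efficiency, matching the ratios above.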

In addition, MIT’s photonic chip design is being constructed using “free [space] optical components” and should be able to scale much better than “nano photonic implementations”. Nano-photonics requires waveguides and optical couplers, whereas free-space photonics uses laser beams traveling through space.

Their paper claims that nano-photonics can only support ~1,000-node neural networks, but with their free-space photonic solution, they believe they can support a 1,000,000-node neural network. I’m not sure how this would work in a chip design, with laser beams being routed via micro-mirrors and other optical components around a chip.

Nothing I could find indicates that the MIT team has working silicon or that they have spun out a startup with seed funding.

Energy use for AI DL modeling

There have been many news reports noting that training an AI DL neural network is 5X more carbon polluting than driving a car. This statement is based on the third article, a single report, Energy and policy considerations for deep learning in NLP.

One can argue with the numbers. It’s also worth noting that the car has historically consumed gas, a non-renewable energy source, while the GPU could theoretically use solar, wind or hydro renewable power for its energy needs. Moreover, one trains an AI DL (NLP) model once with all the data, may subsequently retrain it as new data comes in, but then uses it a gazillion times. Most adults (at least in the US) have a car and drive it often, multiplying automobile energy impacts by 100s of millions in the US alone.

But the fact is that AI DL training, which iterates multiple times over 1000s to 1,000,000s of data items, takes energy and lots of it. How that energy is produced matters to global warming. And anything that has the potential to reduce this energy consumption should be welcomed.

~~~~

Photonics may be ready for prime time, but only if it can significantly improve AI training.

Photo Credit(s): Logo from Luminous Computing website.

Figure from Neuromorphic computing mimics important brain feature Phys.org article

Screen grab of Figure 1 from the Large-Scale Optical Neural Networks Based on Photoelectric Multiplication paper.

Screen grab of Figure 1 from the Energy and Policy Considerations for Deep Learning in NLP paper.

All that AI DL training data comes from us

Read a couple of articles the past few weeks that highlighted something not many of us are aware of: most of the data used to train AI deep learning (DL) models comes from us.

That is, through our ignorance or tacit acceptance of licenses for apps that we use every day, and just by walking around and interacting with the world.

The Atlantic article, The AI supply chain runs on ignorance, talks about Ever, a picture sharing app (like Flickr), where users opted in to its facial recognition software to tag people in pictures. Ever also used that (machine- or person-tagged) data to train its facial recognition software, which it sells to government agencies throughout the world.

The second article, in Engadget, Colorado College students were secretly used to train AI facial recognition (software), talks about a group using a telephoto security camera that was pointed at a high traffic area on campus. The data obtained was used to help train an AI DL model to identify facial characteristics from far away.

The article went on to say that gathering photos from people in public places is not against the law. The study was also cleared by the school. The database was not released until after the students graduated but it did have information about the time and date the photos were taken.

But that’s nothing…

The same thing applies to video sharing and photo animation models, podcasting and text-to-speech models, blogging and written word generation models, etc. All this data is just lying around the web, freely available for any AI DL data engineer to grab and use to train their models. The article that included the image below talks about a new dataset of millions of webpages.

From an OpenAI paper on better language models showing the accuracy of some AI DL models “trained on a new dataset of millions of webpages called WebText.”

Google photo search is scanning the web and has access to any photo posted to use for training data. Facebook, IG, and others have millions of photos that people post online every day, many of which are tagged with information identifying people in the photos. I’m sure somewhere there’s a clause in a license agreement that says your photos, when posted on our app, no longer belong to you alone.

As security cameras become more pervasive, camera data will readily be used to train even more advanced facial recognition models without your say so, approval or even awareness that it is happening. And this is in the first world, where data privacy and identity security protections are paramount; imagine how the rest of the world’s data will be used.

With AI DL models, it’s all about the data. Yes much of it is messy and has to be cleaned up, massaged and sometimes annotated to be useful for DL training. But the origins of that training data are typically not disclosed to the AI data engineers nor the people that created it.

We all thought China would have a lead in AI DL because of their unfettered access to data, but the west has its own way to gain unconstrained access to vast amounts of data. And we are living through it today.

Yes, AI DL models have the potential to drastically help the world, humanity and government do good things better. But a dark side to AI DL models also exists, helping bad actors, organizations and even some government agencies do evil.

Caveat usor (May the user beware)

~~~~

Comments?

Photo Credit(s): “Still Watching You” by jhcrow is licensed under CC BY-NC 2.0 

“Computational Photography Homework 1 Results” by kscottz is licensed under CC BY-NC 2.0 

From Language models are unsupervised multi-task learners OpenAI research paper

ZooArchNet.org, a new collaboration for zoological-archeological data

Read an article the other day about a new collaborative data platform, ZooArchNet, for archeological and zoological data (data about animals and the history of humankind).

The collaboration was started at the Florida Museum at the University of Florida. They intend to construct a database that would allow researchers to track the history of animals and how humans have interacted with them over time.

The problem is there’s a lot of historical animal specimen information available in various locations/sites around the world, and similarly, there’s a lot of data about the history of humanity, but there’s little that cross links the two. And by missing those cross links, researchers aren’t seeing the big picture: that humankind and animal-kind have co-existed since the dawn of time and have impacted each other throughout their history.

However, if there was a site where one could trace the history of animal and human life, across time, in a region, one could develop a better understanding of how they interact and impact one another.

Humankind interacting with animalkind

In the article, they discuss a number of examples where animals have been impacted by humankind over time. For example, the Mexican turkey was originally domesticated for its feathers during the Mayan, Aztec and other civilizations of Central America, but over time it became prized for use as food. While this was occurring, its range expanded considerably throughout North (and South) America.

It’s the understanding of habitat range over time and how humankind helped or hindered this range that’s best served by linking zoological and archeological data sets that exist in research libraries throughout the world.

How it works

One problem in cross linking such data is that it often exists in different formats and uses different metadata to describe it.

A key early decision was to use a standard metadata format, the Darwin Core (DwC), an outgrowth of the Dublin Core that is more focused on biodiversity data.

With this in place, the next problem was to translate specimen metadata into DwC and extract the actual data (or URIs) that described the specimen for harvesting. Once all that was accomplished, they could migrate the specimen or archeological data and host/cross-link it in their ZooArchNet database.
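As a concrete, hypothetical illustration of that translation step, the sketch below maps a made-up museum specimen record onto a few Darwin Core terms. The source field names and values are invented; the DwC terms on the right (scientificName, eventDate, decimalLatitude, etc.) are real terms from the published standard.

```python
# Hypothetical museum specimen record -> Darwin Core (DwC) terms.
# The source field names/values are made up; the DwC terms come from the standard.

specimen = {
    "species": "Meleagris gallopavo",   # turkey
    "collected_on": "1932-06-14",
    "lat": 19.43,
    "lon": -99.13,
    "catalog_no": "FLMNH-12345",        # invented catalogue number
}

FIELD_TO_DWC = {
    "species":      "scientificName",
    "collected_on": "eventDate",
    "lat":          "decimalLatitude",
    "lon":          "decimalLongitude",
    "catalog_no":   "catalogNumber",
}

# Build the DwC-keyed record that could then be harvested and cross-linked.
dwc_record = {dwc_term: specimen[src] for src, dwc_term in FIELD_TO_DWC.items()}
print(dwc_record)
```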

For example, the researchers at the Florida Museum used the Open Context database to provide archeological information and the Global Biodiversity Information Facility (GBIF) to supply biological diversity information; the two were then linked and cross indexed in the ZooArchNet database.

Once the data was available and located in Google Cloud storage, researchers could use Google BigQuery data analytics as well as other apps, like (Google) indexers, to create more data-rich views and searchable indices for their ZooArchNet and VertNet web portals.

ZooArchNet is just starting. Most of the information currently available is about the few examples chosen to demonstrate the technology. As with anything like this, there’s a certain amount of crowd sourcing needed to make it worthwhile. Its popularity will be a prime determinant of its usefulness over time. But anything that helps the world understand the true history of humanity’s impact on the life of this planet is worthwhile.

~~~~

Comments?

Photo Credit(s): “turkey bird” by watts photos1 is licensed under CC BY 2.0 

Workflow from ZooArchNet: Connecting zooarchaeological specimens to the biodiversity and archaeology data networks article

Darwin Core overview from Darwin Core: An Evolving Community-Developed Biodiversity Data Standard article

Need memory, Intel’s Optane DC PM to the rescue

I attended Intel’s Data Centric Innovation Conference Tech Field Day eXclusive (TFDx) last April. A couple of items Intel presented at the show piqued my interest, one of which was DL Boost (see my Intel’s new DL Boost for AI inferencing blog post) and the other was Optane DC PM (data center persistent memory). This post is about Optane DC PM.

As you may already know, Optane SSDs have been on the market for at least a year or so now and have not gained much market traction as of yet. I and others attribute this to the high price differential between Optane SSDs and NVMe flash SSDs, but others say it’s a matter of production yields – probably a little of both.

But Optane, as announced, always had another form factor (if that’s the right term): persistent memory that could dramatically increase the size of server memory to support new memory intensive applications at a lower price than DRAM.

I was at the Nutanix .NEXT conference last week and saw a 4-socket server from DELL that had 6TB of DRAM in it (and four 44-core CPUs). I didn’t ask the price, but when I mentioned I wanted one for my home office, they said it could easily heat my house. So the other problem with a lot of DRAM is power consumption.

Optane DC PM (data center persistent memory) is intended to solve both the high cost and high power consumption problems of DRAM.

How does it work in a server

The newer Intel motherboards support up to 12 slots of memory per socket, and up to 6 of these can be Optane DC PM (512GB DIMMs), or 3TB per socket. Optane DC PM is accessed via L1-L2 caching just like any other memory. Apparently with a dual socket system you can have up to 11 Optane DC PM DIMMs on the motherboard.

L1-L2 cache access times are on the order of a nanosecond or less, DRAM access is on the order of 10s of nanoseconds (10**-9 seconds) and flash is on the order of 100 microseconds (100*10**-6 seconds). So there’s a vast access-time gulf between DRAM and flash that could be exploited with the right technology – enter Optane DC PM.

The only detailed info I could find on Optane DC PM access times was in a research paper (see Basic performance of Intel Optane DC PMM), which said Optane DC PM access times are ~350 nanoseconds, roughly between DRAM and flash. At the show, the development team indicated that Optane DC PM supports about 3GB/sec of bandwidth per module (DIMM).

There are two ways to use Optane DC PM:

  1. Memory mode – in Memory mode, the data in Optane DC PM is thrown away during a power cycle. A block of DRAM is used as a cache, or rather as a virtual memory window, with the Optane DC PM acting as a paging store. Data is brought into the DRAM cache when accessed using its (virtual) DRAM address and, when no longer used, it gets evicted (destaged) back out to Optane DC PM. When power is cycled, the data in Optane DC PM is cleared out. Optane DC PM supports AES XTS-256 bit encryption and can easily be cleared by throwing away encryption keys during a power cycle.
  2. App Direct mode – in App Direct mode, Optane DC PM is accessed directly using application APIs and its data persists across power cycles. For App Direct mode, Optane DC PM is still AES 256 encrypted, but here the encryption key is maintained across power cycles; it is locked on power up and you need to use a pass phrase to unlock it. In this mode, applications are responsible for flushing (L1-L2) caches to Optane to retain all data written through L1-L2 to the Optane DC PM (see the sketch after this list). There’s a GitHub Persistent Memory Development Kit (PMDK) library that supports the App Direct mode APIs applications need to use.
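To give a feel for the App Direct load/store-plus-flush pattern, here’s a minimal sketch. Production applications would normally use the PMDK C libraries mentioned above; this just illustrates the idea in Python by memory-mapping a file on a (hypothetical) DAX-mounted persistent memory filesystem. The mount point and file name are assumptions.

```python
# Minimal App Direct-style sketch: load/store access plus an explicit flush.
# Real applications would use the PMDK libraries; this only illustrates the
# pattern with a memory-mapped file on an assumed DAX-mounted pmem filesystem.
import mmap
import os

PMEM_FILE = "/mnt/pmem0/appdirect.dat"   # hypothetical DAX mount point and file
SIZE = 4096

# Create and size the backing file (on real pmem this maps straight to the DIMMs).
fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR)
os.ftruncate(fd, SIZE)

buf = mmap.mmap(fd, SIZE)    # load/store access through ordinary page mappings
buf[0:5] = b"HELLO"          # plain store instructions into the mapped region
buf.flush()                  # force the update out of CPU caches/page cache to media
buf.close()
os.close(fd)
```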

Both modes use DDR-T (transactional DDR4), a new memory bus protocol for Optane DC PM access. In the DDR-T protocol, access to the memory bus is requested by a CPU and is granted by an Optane DC PM DIMM. All Optane DC PM DIMMs on a system can be accessed in parallel.

You can use RDMA to replicate (App Direct?) Optane DC PM data from one system to another. In order to support Memory and App Direct modes, Optane DC PM required CPU, BIOS and (application) software changes.

Most of the Optane DC PM support and cryptography logic is implemented in hardware. Optane DC PM has an address indirection table (AIT) to support 3D XPoint wear leveling; it’s maintained in DRAM but flushed to Optane during power loss. Transfers to 3D XPoint media are in 256-byte cache lines, but the memory bus operates in 64-byte cache lines, so there’s a (DRAM) buffer between the media and the memory bus.

Optane also supports a high availability (spare DIMM) failure mode. In this scenario, if one Optane DC PM DIMM fails, the system can swap a spare Optane DC PM DIMM into that address space and continue to operate; if a 2nd Optane DC PM DIMM fails, the system fails. I’m not sure what happens to the data on the original Optane DC PM DIMM during a failure. It seems to me data would be lost, but it could depend on the failure mode.

In Memory mode, the expected ratio between DRAM size and Optane DC PM size should be 32GB DRAM per 6TB of Optane DC PM. At the TFDx event, the Optane DC PM team had some performance charts showing different DRAM cache miss rates. Intel also announced new CPU monitoring statistics to track applications/workloads impacting DRAM/Optane DC PM in Memory mode and to track Optane DC PM health.

Optane DC PM usage modes are established by the BIOS. It’s flexible enough that usage modes can be defined on a region by region basis. I’m not exactly sure what a region is, but it could be an address range spanning MB(?) of Optane DC PM. With both modes in operation on a system, data can be moved from Memory mode Optane to App Direct mode Optane or vice versa.

Intel expects the lifetime of an Optane DC PM DIMM to be 200-370PB of data writes. Optane DC PMs have a 5 year warranty. Given its bandwidth (3GB/sec), 200PB of data writes would last ~2 years, but that’s at a 100% duty cycle, writing 3GB of data every second of every day. So, 5 years seems a reasonable guarantee at a more realistic ~40% duty cycle.
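That endurance arithmetic is easy to check. A quick sketch (my own calculation, using the low end of the quoted 200-370PB range and the ~3GB/sec per-DIMM bandwidth above):

```python
# Rough endurance check for a single Optane DC PM DIMM, from the figures above.
PETABYTE = 1e15
GIGABYTE = 1e9
SECONDS_PER_YEAR = 365 * 24 * 3600

write_limit_bytes = 200 * PETABYTE   # low end of the quoted 200-370PB write limit
write_bandwidth = 3 * GIGABYTE       # ~3GB/sec per DIMM

for duty_cycle in (1.0, 0.4):        # 100% vs. a more realistic ~40% duty cycle
    years = write_limit_bytes / (write_bandwidth * duty_cycle) / SECONDS_PER_YEAR
    print(f"duty cycle {duty_cycle:.0%}: ~{years:.1f} years to hit 200PB written")
```

This comes out at roughly 2 years at 100% duty cycle and a bit over 5 years at 40%, consistent with the 5-year warranty above.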

What applications use Optane DC PM

The one of interest to most people seems to be SAP HANA. According to the development team, SAP HANA could use App Direct mode for main database storage and use DRAM for its delta column store. Cassandra could also use Optane in App Direct mode in a similar fashion.

Also, something like Redis could use Optane DC PM to store the values in its key-value store and use DRAM to store the keys.

But any application could take advantage of the extra memory made available with Optane DC PM DIMMs in Memory mode today. Of course any use of Optane DC PM would require the right levels of Intel Xeon CPUs (Cascade Lake), BIOSes and motherboards.

~~~~

Interested in learning more? TFDx videos of the event are available on the website noted previously. These TFDx bloggers also have posts specifically on Optane DC PM:

The coolest thing since sliced bread – Optane by Matt Leib, (@MBLeib)

Intel’s crossover point: A 3D spork by Stephen Foskett (@SFoskett)

Intel answering SAP HANA’s tough questions by Keith Townsend (@CTOAdvisor)

Comments?

Polarized laser light speeds up data center networks

binary data flow

Read an article the other day, Polarizing the data center, from IEEE Spectrum, on new optical technology that has the potential to boost data center networking speeds ~7X beyond what they are today. The research was released in a Nature article, Ultrafast spin lasers (paywall), but a previous version of the paper was released on PLOS (Ultrafast spin lasers) and is freely available.

It’s still a lab demonstration at this point, but if it does make it into the data center, it has the potential to remove local networking as a bottleneck for application workloads, at least for the foreseeable future.

The new technology is based on polarizing (right or left circular) laser light and using polarization to encode ones and zeros. Today’s optical transceivers use on-off or brightness-level signaling to encode data, which requires a lot of power (and, by definition, cooling) to work. On the other hand, polarizing laser light takes ~7% of the power (and cooling) of the old style of on-and-off laser light.

How it works

I’m not sure I understand all the physics, but it appears that if you can control the carrier spin within a semiconductor Vertical-Cavity Surface-Emitting Laser (VCSEL), it transmutes carrier spin into photon polarization and thereby emits polarized laser light. With appropriate sensors, this laser light polarization can be detected and decoded.

In addition, due to some physical constraints, modulating (encoding) laser intensity will never be faster than modulating (encoding) carrier spin. This has something to do with cycling the laser on and off vs. the polarization process. As such, one should be able to transmit more information with polarized laser light than with intensity-modulated laser light.

Moreover, polarization can be done at room temperature. Apparently, VCSELs operating today typically hit 70C in normal high speed operations, vs. ~21C for VCSELs using polarization.

Lab results

In the lab they are using (I believe) mechanical bending in combination with a pulsed laser to create the spin carriers in the VCSELs that polarize the laser light. This is just for demonstration purposes. It’s unclear whether this approach will be usable in a data center application of the technology.

In their lab experiments they were able to demonstrate VCSEL polarization cycles (how quickly they could change polarization) in the 5 ps (picosecond, trillionths of a second) range. This resulted in transmitting something on the order of 214GHz of polarized light cycles. Somewhere in the PLOS article they mention transmitting a random bit string using the technology, not just cycling through 1s and 0s over and over again.
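A quick sanity check on those numbers (my own arithmetic, not from the paper): a 5 ps polarization cycle corresponds to roughly 200 GHz, which is in the same ballpark as the ~214GHz figure reported.

```python
# Sanity check: a ~5 ps polarization cycle is roughly a 200 GHz modulation rate,
# close to the ~214 GHz figure reported; 1 THz is the goal mentioned below.
cycle_time_s = 5e-12                 # ~5 picoseconds per polarization flip
demonstrated_hz = 1.0 / cycle_time_s
print(f"~{demonstrated_hz / 1e9:.0f} GHz demonstrated polarization rate")

target_hz = 1e12                     # the 1 THz target discussed below
print(f"1 THz target is ~{target_hz / demonstrated_hz:.0f}x the demonstrated rate")
```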

The researchers believe that moving from mechanical bending to photonic crystal or strained quantum well-based VCSELs will allow them to go from signaling at 214GHz to 1THz, or ~28X what can be done with laser intensity signaling today.

I don’t know whether the technology will get out of the lab anytime soon, but 1THz (~1Tbps) seems like something most IT organizations would want, especially if the price is similar to today’s technology.

The research mentioned this would be more suitable for data center networking rather than long range data transfers. Not sure why but it could be because 1) it’s still relatively experimental and 2) they have yet to determine distance degradation parameters.

Of course, normal (on-off) signaling technology using VCSELs is not standing still. There’s always the potential to move beyond current physical constraints and boost a technology’s capabilities. Just witness the superparamagnetic barrier in magnetic disk over the years – that physical barrier has moved multiple times during my career.

However, nearly an order of magnitude of speed and more than an order of magnitude of power/cooling improvement are hard to come by with mature technology. I see polarized optical fiber networking in data centers of the future.

~~~~

Comments?

Photo Credit(s):

Intel’s new DL Boost for DL AI inferencing

I was at a Tech Field Day Extra event held with Intel’s Data Centric Innovation Conference last week in San Francisco. It was a lavish affair with many industry analysts in attendance besides the TFDx crew.

At the event Intel announced a number of new products, including the availability of their next generation scalable Xeon processor chips, new Optane DC PM (DIMM) hardware and software, new Ethernet (800 series) NIC cards, a new (10nm) FPGA line and DL Boost (deep learning inferencing) functionality.

I was most interested in the DL Boost and Optane DC PM solutions. For this post I focus on DL Boost.

DL Boost for DL inferencing on Xeon

Intel’s DL Boost technology provides a new 8-bit integer precision (INT 8) matrix multiply & summation instruction which can be used to speed up DL inferencing operations. As those who have been following along with my AI-DL-machine learning (ML) blog posts (latest being Learning Machine Learning part 3) probably know, deep learning is machine learning that processes data to create a neural network made up of a number of layers and a number of nodes, each of which carries a floating point weight used to transform inputs into outputs.

All DL AI projects involve at least two phases: model training and model inferencing (prediction, classification, AI result, etc.). Although both of these activities involve matrix calculations, model training involves a lot more of these compute intensive operations than inferencing. In fact, while training is typically done on GPUs or other special purpose compute hardware (TPUs, IPUs, etc.), inferencing can typically be done on standard off the shelf CPUs.

Historically, inferencing used floating point matrix multiplication and summation functionality, taking input from sensors, logs, photos, etc. and running the model logic to create an output.

Intel believes (with industry analyst agreement) that over time, 50% or more of the DL AI workload is going to involve inferencing. Hence, the focus on this end of the AI workload, at least for now.

For example, speech recognition AI can take a long time to process audio recordings and use learning to train a recognition model. But, once trained, you can use that recognition model in anything from smart speakers, to speech-to-text dictation machines, to voice response systems, etc. In all of these, the recognition model is passed a voice recording (or voice in real time) and processes it to create a text version of the speech.

But all of this has historically been done in floating point (FP) 32 (bit precision) or FP 16. Google’s TPU is capable of doing this with less precision, but to my knowledge, up to this point, it’s always been floating point.

What is DL Boost

What Intel has done with DL Boost is to create a new x86 instruction which can perform an integer (INT) 8 (bit precision) matrix multiplication and summation in fewer cycles than before. Intel believes that if customers modify their trained AI neural network models to move from FP 32 (or 16) to INT 8, they could perform inferencing much faster on Xeon Cascade Lake CPUs than they could before, and not have to rely on GPUs for this activity at all.

Yes, this does require hand optimization of the trained AI neural network. Some of this may be automated, but not all. Intel claims the precision loss, if done properly, is less than a few percent and its impact on AI inferencing correctness is negligible.
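To make the FP-to-INT 8 conversion concrete, here is a deliberately simplified NumPy sketch of symmetric post-training quantization of a single weight matrix. Nothing here uses the actual DL Boost instruction or Intel’s tooling; it only shows where the “few percent” precision loss comes from.

```python
# Simplified symmetric INT8 quantization of one FP32 weight matrix.
# Illustrative only; this is not Intel's DL Boost instruction or tooling.
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.5, size=(128, 64)).astype(np.float32)
activations = rng.normal(0.0, 1.0, size=(1, 128)).astype(np.float32)

# Quantize weights: map the FP32 value range onto signed 8-bit integers.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# FP32 reference result vs. the INT8-weight matmul rescaled back to FP32.
reference = activations @ weights_fp32
quantized = (activations @ weights_int8.astype(np.float32)) * scale

rel_err = np.abs(quantized - reference).max() / np.abs(reference).max()
print(f"max relative error from INT8 weights: {rel_err:.4%}")
```

Real INT8 inferencing pipelines also quantize the activations and usually calibrate scales per layer, which is where the hand tuning discussed above comes in.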

At the moment, for all the DL modeling I have done, I have never looked at a trained model’s weights, leaving this to TensorFlow/Keras to manage for me. But I’m not creating production level DL AI systems (yet). So, I don’t know what it would take to modify my AI models to use INT 8 nor what level of degradation in correctness would ensue. I also don’t have Cascade Lake Xeon CPUs available.

Some potential problems here:

  1. Manual activity to hand tune the INT 8 neural network is not going to be that popular, except for those organizations where inferencing requires GPUs.
  2. Most production DL AI models, undergo some form of personalization for a user or implementation instance which would require a further FP to INT conversion for each user/implementation.
  3. Most production DL AI models also undergo periodic retraining to fine tune the model with the latest data that has been accumulated. This would also require further FP to INT conversion after each training cycle.

In the end, there’s an advantage for production AI inferencing with models that don’t require substantial retraining/personalization, as they don’t change that often. And there’s a definite cost advantage to using DL Boost INT 8 for those AI inferencing workloads that must use GPUs today to perform in real time or under other performance constraints.

But hand converting neural networks reminds me of writing assembly code for modules that impact performance. That is normally reserved for a select few modules or functions that execute a lot. However, DL models are much more monolithic and, by definition, less modular. Identifying which models (or model layers) within a production DL AI solution are performance sensitive and hand optimizing them to work on CPUs rather than GPUs seems like a hard task.

It would be better, from my perspective, to create a single FP 16 matrix multiplication instruction. Alternatively, create software that would automatically convert any DL AI model (or model layer) from FP to INT 8. That way, DL Boost optimization would be just another step in the model training process and could be automatically checked to see A) whether it loses too much accuracy and B) whether CPU inferencing is worthwhile.

~~~~

Comments?

For data that never rests, NetApp NDAS

NetApp co-founder Dave Hitz announced he was becoming a NetApp Founder Emeritus at the Storage Field Day (SFD18) show. He gave a great session about what he and his Hitz Foundation have been doing (for one example, see our Archeology meets big data post). He also discussed at length what he felt the storage world (and NetApp) must do to address the opportunities of the new cloud world. But this post isn’t about Dave, it’s about NetApp Data Availability Services, NDAS.

NetApp NDAS, currently in beta but (hopefully) GAing later this year, is an AWS Marketplace data orchestration solution that manages primary to secondary to S3 movement for ONTAP data. Essentially, NetApp Data Availability Services extends ONTAP data lifecycle management to the AWS cloud. But it’s more than just a way to archive ONTAP data.

NDAS orchestrates SnapMirror services across ONTAP systems and AWS. And once your ONTAP data is in S3, it supplies access to that data for authorized AWS applications and services. That way one can use ONTAP data for data analytics, to train AI models, and to do just about anything you can do with AWS applications today. By using NDAS, customers can extract more value from their ONTAP data.

NDAS is not just copying data to S3; it also copies ONTAP metadata, catalogues and other information that provides context for that data. By copying ONTAP catalog information, customers and authorized end users can have file level access to ONTAP data residing in S3 objects.

NDAS today only supports copying data from secondary ONTAP systems to S3, but a future enhancement will expand this to copy primary ONTAP data to S3.

How does NDAS work

NDAS provisions (your) EC2 instances and middleware to read the data from the secondary systems and copy it to S3 buckets you provide. After initial configuration to point at your ONTAP secondary storage systems, NDAS will autodiscover all the data available that can be copied to the cloud.

NDAS will then start cataloguing your ONTAP data. The NDAS EC2 instances support the NDAS copy, view and Google-like search processes.

NDAS search presents a simplified file system view into your ONTAP data copied to S3. That way customers can identify data that could be used for AI training or data analytics running in the cloud.

There’s extensive security to ensure that NDAS is properly authorized to access your ONTAP data. Normal S3 security options also apply, such as having the data encrypted on S3. NDAS data is automatically encrypted in flight.

Moreover, NDAS S3 bucket data can be replicated across AWS regions. Serverless/Lambda functionality is also fully supported from NDAS S3 buckets.

What can it do with the data

AWS applications can access the data directly through NDAS APIs. Or customers can manually extract data they want to further process, using the NDAS GUI to identify and copy data of interest. NDAS essentially creates a small app layer that allows users to view and access the ONTAP data in S3 as a file system.
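NetApp hasn’t published the NDAS API details I’d need to show that path directly, but since the copies land in an S3 bucket you supply, any standard AWS SDK access works against that bucket. A minimal boto3 sketch (the bucket name and prefix below are assumptions, not NDAS conventions):

```python
# List objects in the S3 bucket that NDAS copies ONTAP data into, so a cloud
# application can pick out data of interest. Bucket name and prefix are made up.
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket="my-ndas-bucket", Prefix="ontap/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```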

One can have different NDAS AMIs operating in different regions for faster access or to support GDPR compliance requirements. Alternatively, a customer could have one NDAS AMI accessing all their secondary ONTAP instances.

NDAS is intended to provide a data analyst or IT generalist access to ONTAP data. This way, AI training and big data analytics applications, which run easily in the cloud, can have access to ONTAP data, and customers can more effectively utilize data that IT has been storing and maintaining since time began.

One NDAS beta customer is an MLB team. Over time they have instrumented their stadiums to generate lots of data about pitch speed, rotation, ball location as it crosses the plate, etc. The problem is that all this data is siloed in the onprem or IoT systems that generated it. But the customer wants to use the data to improve players, coaches and the viewer experience, and all that needs tools, applications and software that’s just not available to run in the data center. With NDAS, all this data is now available to cloud applications.

NDAS is supported by any ONTAP 9.5 or later (FAS, AFF, Cloud ONTAP, ONTAP Select) secondary storage system. ONTAP 9.5 software contains all the services required to support NDAS. This includes the copy-to-cloud APIs as well as the NDAS proxy, which supplies the secure interface to NDAS operating in the cloud.

NetApp’s NDAS sessions are pretty informative. Anyone interested in finding out more should check out the videos available on the Tech Field Day website; Dave’s session is also worth a view.

For more information on Dave’s session and NDAS check out:

NetApp, Cloudier than ever by Enrico Signoretti (@ESignoretti)

NetApp and the space in between by Dan Frith (@PenguinPunk)

~~~~

Comments?

DNA IT, the next revolution

I’ve been writing about DNA computing and storage for quite a while now (see DNA computing and the end of natural evolution; DNA storage and the end of evolution part 2; & Random access DNA object storage system). But in the last few months there’s been a flurry of activity in this space that seems worthy of note.

DNA programing language

First up, A logic programming language for computational nucleic acid devices, a research article in the journal ACS Synthetic Biology. The research describes a new approach to programming DNA computers that’s uniquely designed to mimic molecular algorithmic capabilities for DNA devices.

The language uses logical statements and predicates (it reminds me of Prolog). Indeed, the language was modeled after Prolog, with equational and molecular extensions to represent DNA functionality. As with Prolog, output is a function of declarative predicate logic rather than the control flow and assignment of normal programming languages. Logic programming takes a different mindset and demands an understanding of formal logic.
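For readers unfamiliar with the declarative style, here’s a tiny Python sketch of Prolog-like facts plus one rule, evaluated by naive matching. It’s purely illustrative and has nothing to do with the molecular extensions in the paper.

```python
# Tiny Prolog-flavored example: facts plus one declarative rule.
# Illustrative only; this is not the nucleic-acid language described in the paper.
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def grandparents(facts):
    """grandparent(X, Z) :- parent(X, Y), parent(Y, Z)."""
    return {("grandparent", x, z)
            for (p1, x, y1) in facts if p1 == "parent"
            for (p2, y2, z) in facts if p2 == "parent" and y1 == y2}

print(grandparents(facts))   # {('grandparent', 'alice', 'carol')}
```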

The article talks about applications of DNA computing for in vitro (chemical/protein) manufacturing, diagnostic devices and therapeutic devices (operating inside living cells).

DNA storage device

Next up, a recent article in Scientific Reports, Demonstration of end-to-end automation of DNA data storage.

The intent here is to create a fully automated data storage device that uses DNA as its recording media. The current device (seen in the bottom right above) is a lab prototype that fits on a bench, costs $10K and can store 5 bytes of data with error correction.

The system has three hardware modules: synthesis (writing), storage and sequencing (reading). It also includes encoding and decoding software that translates bits to nucleic acid bases and adds error correction. They need to add more bases to be compatible with the sequencing (reading) process.

The limits to storage capacity may have something to do with the size of the storage vessel as well as the length of the DNA strand that can be synthesized/sequenced. Error correction is based on a 6-base hash code (less than a byte for the 5-byte payload). The system’s write-to-read-back time is ~21 hrs.
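As a toy illustration of the encoding step, here is one simple 2-bits-per-base mapping in Python. This is not the scheme or error correction code the paper actually uses; it just shows the bits-to-bases translation described above.

```python
# Toy bits -> nucleotide encoding at 2 bits per base. This is NOT the paper's
# codec; it only illustrates the translation step described above.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"HELLO")       # the paper's 5-byte payload -> 20 bases
print(strand, decode(strand))   # round-trips back to b'HELLO'
```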

The device creates many copies of the DNA (data) strand. The 5-byte (“HELLO”) string took 4 micrograms of liquid and yielded 3,469 DNA strands, 1,973 of which aligned properly to their adapter sequence. Of those properly aligned DNA strands, 30 had extractable payload regions, of which 1 was correct; the other 29 were corrupted.

This is a very poor error rate. For comparison, LTO-7/8 has a BER (bit error rate) of 1 in 10**19 bits, and enterprise disk has a BER of 1 in 10**15 bits. This DNA storage device yielded 1 good strand in 3,469, i.e., ~99.97% of the strands written could not be read back.

To get a better understanding of the error rate, they stored a 100-base (~12-byte) data payload. Of the 25,592 strands created, 286 aligned properly; of those, 251 were corrupted, 11 had invalid hashes, 8 were corrupted but correctable (valid hashes, invalid data) and 16 were perfect reads. So 25,592 strands yielded 24 usable reads, or roughly a 1,000:1 loss rate (not entirely correct, because the correctable strands actually had bit errors, but we can give them that).
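For what it’s worth, here’s the yield arithmetic behind those two experiments, computed from the strand counts above (my own calculation, nothing more):

```python
# Strand-level yield from the two experiments described above.
experiments = {
    "5-byte 'HELLO' payload": (3469, 1),        # strands created, perfect reads
    "100-base payload":       (25592, 16 + 8),  # 16 perfect + 8 correctable reads
}

for name, (strands, usable) in experiments.items():
    print(f"{name}: {usable}/{strands} usable strands "
          f"(~1 in {strands / usable:,.0f})")
```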

DNA computer architecture

Last up, an IEEE Spectrum article discussing Caltech research, DNA computer shows programmable chemical machines are possible, reporting on an article in Nature, Diverse and robust molecular algorithms using reprogrammable DNA self-assembly (paywall). This DNA computer system is made of just DNA and salt water. It computes algorithms on 6 bits of input and uses DNA logic gates.

The Caltech team created 2-input, 2-output boolean gates out of DNA sequences; five of these gates are connected to form a computational layer. It supports 6 input and 6 output bits. But you can stack multiple computational layers on top of one another, where the output of one layer is fed in as input to the layer on top of it.
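As an electronics-style analogy for that architecture, here’s a plain Python simulation of layered 2-input/2-output gates. It has nothing to do with the actual DNA chemistry or self-assembly, and to keep the toy simple it uses three gates per 6-bit layer rather than the paper’s five-gate tile wiring.

```python
# Toy simulation of a layered 2-in/2-out gate architecture, as an analogy for
# the DNA tile computer described above (the real tile wiring differs).
from typing import Callable, List, Tuple

Gate = Callable[[int, int], Tuple[int, int]]

def half_adder(a: int, b: int) -> Tuple[int, int]:
    return a ^ b, a & b          # (sum, carry)

def swap(a: int, b: int) -> Tuple[int, int]:
    return b, a

def apply_layer(bits: List[int], gates: List[Gate]) -> List[int]:
    out: List[int] = []
    for gate, i in zip(gates, range(0, len(bits), 2)):
        out.extend(gate(bits[i], bits[i + 1]))   # each gate eats 2 bits, emits 2
    return out

layers = [[half_adder, swap, half_adder],        # layer 1
          [swap, half_adder, swap]]              # layer 2 consumes layer 1's output
bits = [1, 0, 1, 1, 0, 1]                        # 6-bit input word
for layer in layers:
    bits = apply_layer(bits, layer)
print(bits)
```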

One key is that the DNA computer self-assembles the computational layers. They use a seed layer as a starter DNA strand; the input (mixed inside a vial) is attached to this seed layer and then the computational layers are attached one by one until the output is generated.

Each computational layer is made up of DNA computational tiles that attach together, sort of like a circuit. They were able to create a 355-instruction set for their DNA computer. In comparison, the IBM 360 had a one-byte op code (at most 256 instructions).

They have a compiler that allows researchers to write a software algorithm and translates that code into DNA circuit tiles, computational layers and, ultimately, a DNA computer.

According to the article, it takes 1-2 hours to grow the computational DNA crystal and another day or so for the computation to complete.

It’s an interesting approach to DNA computation, but it’s unclear if they have any branching mechanisms in their “instruction set”. And 6-bit input/output seems a bit limiting. However, by creating boolean gates with DNA, they could in principle recreate any type of electronic computer that exists today.

~~~~

Put it all together and someday you could have a DNA compute server and storage.

One thing that’s missing is a (packet switched or token ring) network for transferring data between cells (and maybe into and out of DNA storage). They could probably use some sort of vascular (network) system, with a way to transfer data from inside a cell onto the network and into another cell.

That way they could gang a number of DNA compute servers (cells) together and maybe create a cellular automata machine.

The future of computation looks wetter now.

Photo Credit(s):