Internet of Tires

Read an article a couple of weeks back (An internet of tires?… IEEE Spectrum) and can’t seem to get it out of my head. Pirelli, a European tire manufacturer, was demonstrating a smart tire or, as they call it, their new Cyber Tyre.

The Cyber Tyre includes accelerometer(s) in its rubber that can be used to sense pavement/road surface conditions. Cyber Tyre can communicate surface conditions to the car and, using the car’s 5G, to other cars (of the same make) to warn them of problems with surface adhesion (hydroplaning, ice, other traction issues).

Presumably the accelerometers in the Cyber Tyre measure acceleration changes of individual tires as they rotate. Any rapid acceleration change could potentially be used to determine whether the car has lost traction and why.
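How Pirelli actually does this isn't described, but the basic idea of flagging a rapid acceleration change can be sketched in a few lines. This is a minimal sketch only: the function name, the threshold and the sample readings are all made up for illustration.

```python
def find_sudden_changes(samples, threshold):
    """Return indices where acceleration jumps by more than `threshold`
    between consecutive samples (a crude stand-in for traction-loss detection)."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Hypothetical per-rotation acceleration readings (arbitrary units).
readings = [1.0, 1.1, 0.9, 1.0, 4.2, 1.0, 1.1]
print(find_sudden_changes(readings, threshold=2.0))  # [4, 5]
```

A real system would presumably compare all four tires and look at the shape of the signal, not just one jump, to work out why traction was lost.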

They tested the new tires out at a (1/3rd mile) test track on top of a Fiat factory, using Audi A8 automobiles and 5G. It’s unclear why this had to wait for 5G, but it’s possible that with 5G, the Cyber Tyre and the car could log and transmit such information back to the manufacturer of the car or tire.

Accelerometers have become dirt cheap over the last decade as smart phones have taken off. So, it was only a matter of time before they found use in new and interesting applications and the Cyber Tyre is just the latest.

Internet of Vehicles

Presumably the car, with Cyber Tyres on it, communicates road hazard information to other cars using 5G and vehicle to vehicle (V2V) communication protocols or perhaps to municipal or state authorities. This way highway signage could display hazardous conditions ahead.

Audi has a website devoted to Car to X communications. Certain Audi vehicles (A4, A5 & Q7) are equipped with cellular communications, cameras and other sensors that identify (recognize) signage, hazards, and other information and communicate this data to other Audi vehicles. This way, owning an Audi would plug you into this information flow.

Pirelli’s Cyber Car Concept

Prior to the Cyber Tyre, Pirelli introduced a Cyber Car concept that is supposedly rolling out this year. This version has tyres with real time pressure, temperature, (static) vertical load and a Tyre ID. Pirelli has been working with car manufacturers to roll out Cyber Car functionality.

The Tyre ID seems to be a file that can include anything that the tyre or automobile manufacturer wants. It sort of reminds me of blockchain data blocks that could be used to validate tyre manufacturing provenance.

The vertical load sensor seems more important to car and tire manufacturers than consumers. But for electric car owners, knowing car weight could help determine current battery load and thereby more precisely know how much charge is left in a battery.

Pirelli uses a proprietary algorithm to determine tread wear. This makes use of the other tyre sensors to predict wear and perhaps uses an AI DL algorithm to do this.

~~~

ABS has been around for decades now and tire pressure sensors for over 10 years or so. My latest car has enough sensors to pretty much drive itself on the highway but not quite park itself as of yet. So it was only a matter of time before something like smart tires would show up.

But given their integration with car electronics systems, it would seem that this would only make sense for new cars that included a full set of Cyber Tyres. That is, until all tire AND car manufacturers agree on a standard protocol to communicate such information. When that happens, consumers could choose any tire manufacturer and obtain similar if not the same functionality from them.

I suppose someone had to be first to identify just what could be done with the electronics available today. Pirelli just happens to be it for now in the tire industry.

I just don’t want to have to upgrade tires every 24 months. And, if I have to wait a long time for my car to boot up and establish communications with my tires, I may just take a (dumb) bike.


Data analysis of history

Read an article the other day in The Guardian (History as a giant data set: how analyzing the past could save the future), which talks about this new discipline called cliodynamics (see wikipedia cliodynamics article). There was a Nature article (in 2012), Human Cycles: History as Science, which described cliodynamics in a bit more detail.

Cliodynamics uses mathematical systems theory on historical data to predict what will happen in the future for society. According to The Guardian and Nature articles, the originator of cliodynamics, Peter Turchin, predicted in 2010 that the world would change dramatically for the worse over the coming decade, with violence peaking in 2020.

What is cliodynamics

Cliodynamics depends on vast databases of historical data that has been amassed over the last decade or so. For instance, the Seshat Global History Databank (started in 2011, has 3 datasets: moralizing gods, axial age history [8th to 3rd cent. BCE], & social complexity), International Institute of Social History (est. 1935, in 2013 re-organized their collection to focus on data, has 33 dataverses ranging from data on apprenticeships, prices and wage history, strike history of various countries and time periods, etc. ), and Google NGRAM viewer (started in 2010, provides keyword statistics on Google BOOKs).

Cliodynamics uses the information from databases like the above to devise a mathematical model of the history of the world. From their mathematical model, cliodynamics researchers have discerned patterns or cycles in human endeavors that have persisted over centuries.

Cliodynamic cycles

Two cycles of interest come to mind:

  • Secular cycle – this plays out over 2-3 centuries and starts out with a new egalitarian society that has low levels of inequality, where the supply and demand for labor are roughly equal. Over time, as population grows, the supply of labor outstrips demand and inequality increases. Elites then start to battle one another, and the resulting war and political instability leads to a new, more equal society, re-starting the cycle.
  • Fathers and sons cycle – this plays out over 50 years and starts when the (fathers) generation responds violently to social injustice and the next (sons) generation resigns itself to injustice (or hopefully resolves it) until the next (fathers) generation sees injustice again and erupts violently, re-starting the cycle.

It’s this last cycle that Turchin predicted to peak again in 2020, the last one peaking in 1970 and the ones before that peaking in 1920 and 1870.

We’ve seen such theories before. In the 19th and 20th centuries there were plenty of historical theorists. Probably the most prominent was Marx but there were others as well.

The problem with cliodynamics, good data

Sparsity and accuracy of data have always been a problem with historical study. Much information is lost through natural or manmade disasters and much of what’s left is biased. Nonetheless, more and more historical data is being amassed every day, most of it quantitative and suitable for analysis.

Historical data, where available, can be assessed scientifically and analyzed using current tools such as data analytics, machine learning, & deep learning to ascertain trends and make predictions. And the more data available, the more accurate these analyses and predictions can become. Cliodynamics pre-dates many of these tools, but that’s no excuse for not taking advantage of them.

~~~~

As for 2020, AI, automation and globalization have led and will lead to more job disruption. Inequality is also on the rise, at least throughout much of the west. And then there’s Brexit, USA elections and general mid-east turmoil that all seem to be on the horizon.

Stay tuned, 2020 seems only months away.

Photo Credits:

From Key Historic Figures of WW1 article, Mansell/Getty Images, (c) ThoughtCo

Anti War March (1968 Chicago) By David Wilson , CC BY 2.0, Link

Eleven times Americans have marched on Washington, (1920, Washington DC) (c) Smithsonian Magazine

Cambrian Explosion of AI DL app’s in industry and the world

I was at the NetApp Insight conference last week and recorded a podcast (see: GreyBeards Podcast) on what NetApp is doing in the AI DL (Deep Learning) space. On the podcast, we talked about a number of verticals that were deploying AI DL right now and using it to improve outcomes.

It was only in 2012 that AI DL broke out and pretty much conquered the speech recognition contest by improving recognition accuracy by leaps and bounds. Prior to that, improvements had been very small and incremental at best. Here we are, just 7 years later, and AI DL models are proliferating across industry and every other sector of the world economy.

DL applications in the real world

At the show, we talked about AI DL models being used in healthcare (radiological image analysis, cell counts for infection assessments), automotive (self driving cars), financial services (fraud detection), and retail (predicting how make up would look on someone).

And early this year, at HPE Discover, they discussed a new technique to share training data but still keep it private. In this case, they use block chain technology to publish and share a DL neural network’s model weights and other hyper parameters, trained for some real world purpose.

Customers download and use the model in their day to day activities but record the data that their model analyzes and its predictions. They use this data to update (re-train) their DL neural net. They then publish their new neural net model weights and other parameters to all the other customers. Each customer of the model does the same, updating (re-training) their DL neural net.

At some point an owner or global model arbitrator takes all these individual model updates, aggregates the neural net weights into a new neural net model and publishes the new model. And then the process starts over again. In this way, training data is never revealed and is kept secure and private, but DL model updates that result from re-training the model with secured private data would be available to any customer.
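The talk didn't spell out how the arbitrator aggregates weights. One common approach is a weighted average of each customer's weights, and that's the assumption in this minimal sketch; the function name, the flat weight lists and the sample counts are all hypothetical.

```python
def aggregate_weights(updates):
    """Average per-customer weight vectors, weighted by how many
    samples each customer re-trained on.

    `updates` is a list of (weights, sample_count) pairs, where
    `weights` is a flat list of floats of equal length.
    """
    total = sum(count for _, count in updates)
    size = len(updates[0][0])
    merged = [0.0] * size
    for weights, count in updates:
        for i, w in enumerate(weights):
            merged[i] += w * count / total
    return merged


# Two hypothetical customers; only weights and counts are shared, never data.
customer_a = ([0.0, 2.0], 100)
customer_b = ([1.0, 4.0], 300)
print(aggregate_weights([customer_a, customer_b]))  # [0.75, 3.5]
```

Note that only the weights and a sample count leave each customer's site, which is the whole point of the scheme.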

Recently, there’s been a slew of articles across many different organizations that show how AI DL is being adopted to work in different areas:

And that’s just a sample of the last few weeks of papers of AI DL activity.

Next Steps

All it takes is data, that can be quantified and classified. With data and classifications in hand, anyone can train a DL model that performs that classification. It doesn’t require GPU farms, decent CPUs are up to the task for TB of data.

But if you want better prediction/classification accuracy, you will need more data, which means longer AI DL training runs. So at some point, maybe at >100TB of data, or if you use AI DL training a lot, you may want that GPU farm.

The Deep Learning with Python book (my favorite) has a number of examples, such as sentiment analysis of text, median real estate pricing predictions and generating text that looks like an author’s work, with maybe a dozen more that one can use to understand AI DL technology. But it’s not rocket science; I believe any qualified programmer could do it, with some serious study.

So the real question is what are you doing with your data to make use of AI DL models now?

I suppose the other question ought to be, how can you collect more data and classification information, to train more AI DL models?

~~~~

It’s great to be in the storage business.


Where should IoT data be processed – part 1

I was at FlashMemorySummit 2019 (FMS2019) this week and there was a lot of talk about computational storage (see our GBoS podcast with Scott Shadley, NGD Systems). There was also a lot of discussion about IoT and the need for data processing done at the edge (or in near-edge computing centers/edge clouds).

At the show, I was talking with Tom Leyden of Excelero and he mentioned there was a real need for some insight on how to determine where IoT data should be processed.

For our discussion let’s assume a multi-layered IoT architecture, with 1000s of sensors at the edge, 100s of near-edge processing/multiplexing stations, and 1 to 3 core data center or cloud regions. Data comes in from the sensors, is sent to near-edge processing/multiplexing and then to the core data center/cloud.

Data size

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

When deciding where to process data, one key aspect is the size of the data. This is typically in GB or TB but, given today’s world, can be PB as well. This lone parameter has multiple impacts and can affect many other considerations, such as the cost and time to transfer the data, cost of data storage, amount of time to process the data, etc. All of these sub-factors depend on the size of the data to be processed.

Data size can be the largest single determinant of where to process the data. If we are talking about GB of data, it could probably be processed anywhere from the sensor edge, to near-edge station, to core. But if we are talking about TB the processing requirements and time go up substantially and are unlikely to be available at the sensor edge, and may not be available at the near-edge station. And PB take this up to a whole other level and may require processing only at the core due to the infrastructure requirements.
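To get a feel for how size alone changes the picture, here's a back-of-the-envelope sketch of transfer time. The 1 Gbit/s uplink is an assumption made purely for illustration, and the calculation ignores protocol overhead, latency and contention.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized time to move `size_bytes` over a link, ignoring
    protocol overhead, latency and contention."""
    return size_bytes * 8 / link_bits_per_second


GB, TB, PB = 10**9, 10**12, 10**15
link = 10**9  # assumed 1 Gbit/s uplink, for illustration only

for label, size in [("1 GB", GB), ("1 TB", TB), ("1 PB", PB)]:
    hours = transfer_seconds(size, link) / 3600
    print(f"{label}: {hours:,.2f} hours")
```

On that assumed link, a GB moves in about 8 seconds, a TB takes a bit over 2 hours and a PB takes about 93 days, which is why PB-scale data tends to be processed wherever it already sits.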

Processing criticality

Human or machine safety may depend on quick processing of sensor data, e.g. in a self-driving car, on a factory floor, in flood gauges, etc. In these cases, some amount of data processing (sufficient to ensure human/machine safety) needs to be done at the lowest point in the hierarchy that has the processing power to perform this activity.

This could be in the self-driving car or factory automation that controls a mechanism. Similar situations would probably apply for any robots and auto pilots. Anywhere an IoT sensor array is used to control an entity that could jeopardize the life of human(s) or the safety of machines, safety level processing would need to be done at the lowest level in the hierarchy.

If processing doesn’t involve safety, then it could potentially be done at the near-edge stations or at the core.

Processing time and infrastructure requirements

Although we talked about this in data size above, infrastructure requirements must also play a part in where data is processed. Yes sensors are getting more intelligent and the same goes for near-edge stations. But if you’re processing the data multiple times, say for deep learning, it’s probably better to do this where there’s a bunch of GPUs and some way of keeping the data pipeline running efficiently. The same applies to any data analytics that distributes workloads and data across a gaggle of CPU cores, storage devices, network nodes, etc.

There’s also an efficiency component to this. Computational storage is all about how some workloads can better be accomplished at the storage layer. But the concept applies throughout the hierarchy. Given the infrastructure requirements to process the data, there’s probably one place where it makes the most sense to do this. If it takes 100 CPU cores to process the data in a timely fashion, it’s probably not going to be done at the sensor level.

Data information funnel

We make the assumption that raw data comes in through sensors, and more processed data is sent to higher layers. This would mean at a minimum, some sort of data compression/compaction would need to be done at each layer below the core.
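As a minimal sketch of that kind of compaction, a lower layer could forward a small summary record per window instead of every raw reading. The field names and the sample window are hypothetical; real compaction would depend on what the higher layers need.

```python
def summarize(readings):
    """Collapse a window of raw sensor readings into a small summary
    record that a lower layer could forward instead of the raw data."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": sum(readings) / len(readings),
    }


# Hypothetical temperature samples from one sensor over one window.
window = [20.0, 21.0, 19.0, 24.0]
print(summarize(window))
# {'count': 4, 'min': 19.0, 'max': 24.0, 'mean': 21.0}
```

Keeping the count in the summary means a near-edge station can still combine summaries from many sensors into a correct overall mean.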

We were at a conference a while back where they talked about updating deep learning neural networks. It’s possible that each near-edge station could perform a mini-deep learning training cycle and share its learning with the core periodically, which could then send this information back down to the lowest level to be used (see our Swarm Intelligence @ #HPEDiscover post).

All this means that there’s a minimal level of processing of the data that needs to go on throughout the hierarchy between access point connections.

Pipe availability


The availability of a networking access point may also have some bearing on where data is processed. For example, a self driving car could generate TB of data a day, but access to a high speed, inexpensive data pipe to send that data may be limited to a service bay and/or a garage connection.

So some processing may need to be done between access point connections. This will need to take place at lower levels. That way, there would be no need to send the data while the car is out on the road but rather it could be sent whenever it’s attached to an access point.
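One way to picture this is a simple store-and-forward buffer: records pile up while the car is on the road and are handed over when a pipe becomes available. This is only a sketch; the class name, the buffer limit and the drop-oldest policy are assumptions, not anything a car vendor has described.

```python
from collections import deque


class StoreAndForward:
    """Buffer processed records while offline and hand them over
    in order once an access point is available."""

    def __init__(self, max_records):
        # Oldest records are dropped first if the buffer fills up.
        self.buffer = deque(maxlen=max_records)

    def record(self, item):
        self.buffer.append(item)

    def flush(self, send):
        """Call `send(item)` for every buffered record, oldest first."""
        sent = 0
        while self.buffer:
            send(self.buffer.popleft())
            sent += 1
        return sent


car = StoreAndForward(max_records=3)
for trip_summary in ["trip-1", "trip-2", "trip-3", "trip-4"]:
    car.record(trip_summary)

uploaded = []
car.flush(uploaded.append)  # e.g. when parked at the garage
print(uploaded)  # ['trip-2', 'trip-3', 'trip-4']
```

The bounded buffer is the reason processing has to happen at the lower level: only what fits, after compaction, survives until the next connection.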

Compliance/archive requirements

Any sensor data probably needs to be stored for a long time and as such will need access to a long term archive. Depending on the extent of this data, it may help dictate where processing is done. That is, if all the raw data needs to be held, then maybe the processing of that data can be deferred until it’s already at the core and on its way to archive.

However, any safety oriented data processing needs to be done at the lowest level and may need to be reprocessed higher up in the hierarchy. This would be done to ensure proper safety decisions were made. And needless to say, all this data would need to be held.
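Pulling a few of these factors together, here's a rough sketch of a placement rule. It only covers three of the factors (safety, data size and infrastructure), and every tier limit in it is an invented number for illustration.

```python
def choose_tier(safety_critical, size_bytes, cores_needed, tiers):
    """Pick the lowest tier able to do the work.

    `tiers` is ordered edge -> core; each entry is
    (name, max_bytes, cores_available). All limits are illustrative.
    """
    if safety_critical:
        return tiers[0][0]  # safety processing stays at the lowest level
    for name, max_bytes, cores in tiers:
        if size_bytes <= max_bytes and cores_needed <= cores:
            return name
    return tiers[-1][0]  # fall back to the core


GB, TB = 10**9, 10**12
tiers = [
    ("sensor edge", 10 * GB, 4),
    ("near-edge station", 10 * TB, 64),
    ("core", float("inf"), 10_000),
]

print(choose_tier(True, 50 * TB, 100, tiers))   # sensor edge
print(choose_tier(False, 5 * GB, 2, tiers))     # sensor edge
print(choose_tier(False, 2 * TB, 32, tiers))    # near-edge station
print(choose_tier(False, 2 * TB, 100, tiers))   # core
```

Pipe availability, the information funnel and archive requirements would add further conditions, but the shape of the decision stays the same: start at the bottom and move up only when you have to.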

~~~~

I started this post with 40 or more factors but that was overkill. In the above, I tried to summarize the 6 critical factors which I would use to determine where IoT data should be processed.

My intent is to work through some examples in a part 2 to this post. If there’s any one example that you feel may be instructive, please let me know.

Also, if there are other factors that you would use to determine where to process IoT data, let me know.

Improving floating point

Read a post this week on Reddit pointing to an article from The Next Platform (New approach could sink floating point computation). It was all about changing the IEEE floating point format to something better called posits, which were designed by noted computer architect John Gustafson, et al. (see their paper Beating floating point at its own game: Posit arithmetic, for more info).

The problems with standard floating point have been known since it was first defined by the IEEE in 1985. As you may recall, an IEEE 754 floating point number has three parts: a sign, an exponent and a mantissa (fraction or significand part). Both the exponent and mantissa can be negative.
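To make the three parts concrete, here's a small sketch that pulls them out of a single precision number using Python's standard struct module. The function name is made up for illustration. Note that the exponent is stored with a bias (127 for single precision), which is how negative exponents are represented, and the sign bit applies to the number as a whole.

```python
import struct


def float32_fields(value):
    """Split a number, stored as IEEE 754 single precision, into its
    sign bit, biased exponent field and 23-bit fraction field."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


sign, exponent, fraction = float32_fields(-6.5)
# -6.5 = -1.625 * 2**2, so the stored exponent is 2 + 127 (the bias).
print(sign, exponent - 127, hex(fraction))  # 1 2 0x500000
```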

IEEE defined floating point numbers

The IEEE 754 standard defines the following formats (see Floating-point arithmetic, for more info):

  • Half precision floating point, (added in 2008), has 1 sign bit (for the significand or mantissa), 5 exponent bits (indicating 2**-14 to 2**+15) and 10 significand bits for a total of 16 bits.
  • Single precision floating point, has 1 sign bit, 8 exponent bits (indicating 2**-126 to 2**+127) and 23 significand bits for a total of 32 bits.
  • Double precision floating point, has 1 sign bit, 11 exponent bits (2**-1022 to 2**+1023) and 52 significand bits.
  • Quadruple precision floating point, has 1 sign bit, 15 exponent bits (2**-16,382 to 2**+16,383) and 112 significand bits.
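The exponent range of each format follows from its number of exponent bits. As a sketch, for normal (non-subnormal) numbers and using the standard's bias convention, it can be computed like this; the function name is hypothetical.

```python
def normal_exponent_range(exponent_bits):
    """Smallest and largest power-of-two exponent of a normal number
    for an IEEE 754 binary format with `exponent_bits` exponent bits."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Exponent fields of all zeros and all ones are reserved
    # (zero/subnormals and infinity/NaN respectively).
    return 1 - bias, bias


for name, bits in [("half", 5), ("single", 8), ("double", 11), ("quadruple", 15)]:
    print(name, normal_exponent_range(bits))
```

Subnormal numbers extend the low end a little further, at the cost of precision.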

I believe Half precision was introduced to help speed up AI deep learning training and inferencing.

Some problems with the IEEE standard: it supports both -0 and +0, which have different representations, as well as -∞ and +∞, and it can represent a number of unique Not-a-Numbers (NaNs), which are illegal floating point numbers. So when performing IEEE standard floating point arithmetic, one needs to check whether a result is a NaN, which would make it an illegal result, and must be wary when comparing special values: -0 and +0 compare as equal despite their different bit patterns, while, sigh, a NaN is not equal to anything, not even itself.
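Python floats are IEEE 754 doubles on typical platforms, so these special values can be poked at directly. A quick sketch of how they behave under comparison:

```python
import math

neg_zero, pos_zero = -0.0, 0.0
print(neg_zero == pos_zero)               # True: equal when compared
print(math.copysign(1.0, neg_zero))       # -1.0: but the sign bit differs

nan = float("nan")
print(nan == nan)                         # False: NaN never equals anything
print(math.isnan(nan))                    # True: the reliable way to test

print(float("-inf") < float("inf"))       # True
```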

Posits to the rescue

It’s all a bit technical (read the paper to find out) but posits don’t support -0 and +0, just 0 and there’s no -∞ or +∞ in posits either, just ∞. Posits also allow for a variable number of exponent bits (which are encoded into Regime scale factor bits [whose value is determined by a useed factor] and Exponent scale factor bits) which means that the number of significand bits can also vary.

So, with a 32 bit, single precision posit, the number range represented can be quite a bit larger than single precision floating point. Indeed, with the approach put forward by Gustafson, a single 32 bit posit has more numeric range than a single precision IEEE 754 float and about half as much range as a double precision IEEE floating point number, but only uses 32 bits.
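For the curious, a posit's range can be worked out from its size and its number of exponent bits (es). In the posit paper, useed = 2**(2**es) and the largest posit is useed**(nbits - 2). The sketch below applies that formula; the es values tried are assumptions for illustration, since es is a design parameter.

```python
def posit_max_exponent(nbits, es):
    """Power-of-two exponent of the largest posit (maxpos).

    useed = 2**(2**es) and maxpos = useed**(nbits - 2); the smallest
    positive posit is 1/maxpos, so the range is symmetric.
    """
    return (2 ** es) * (nbits - 2)


# es is a design parameter; the values below are assumptions for illustration.
for es in (2, 3):
    e = posit_max_exponent(32, es)
    print(f"32-bit posit, es={es}: 2**-{e} .. 2**{e}")

print("IEEE 754 single (normal numbers): 2**-126 .. 2**127")
```

So the range advantage depends on es: with es=3 a 32 bit posit reaches 2**240, well past single precision, while with es=2 it tops out at 2**120.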

Presently, there are no commercial hardware implementations of posits, but there’s a lot of interest, mostly because the same number of bits can represent a lot more numeric range than equivalently sized IEEE 754 floats. And for HPC environments, AI deep learning applications, scientific computing, etc., having more numeric range (or precision) in less space means they can jam more data in the same storage, transfer more data over the same networking bandwidth and save more numbers in limited amounts of DRAM.

Although commercial implementations do not exist, there have been some FPGA simulations of posit floating point arithmetic. Those simulations have shown it to be more energy efficient than IEEE 754 floating point arithmetic for the same number of bits. So, you need to add better energy efficiency to the advantages of posit arithmetic.

Is it any wonder that HPC/big science (weather prediction, Square Kilometer Array, energy simulations, etc.) and many AI hardware accelerator chip designers are examining posits as a potential way to boost precision, reduce storage/memory footprint and reduce energy consumption?

~~~~

Yet, standards have a way of persisting. Just look at how long the QWERTY keyboard has lasted. It was originally designed in the 1870’s to slow down typing and reduce jamming, when typewriters were mechanical devices. But ever since 1934, when the DVORAK keyboard was patented, there have been much better layouts for keyboards. And there’s no arguing that the DVORAK keyboard is better for typing on non-mechanical typewriters. Yet today, I know of no computer vendor that ships DVORAK labeled keyboards. Once a standard becomes set, it’s very hard to dislodge.

Comments?


All that AI DL training data comes from us

Read a couple of articles the past few weeks that highlighted something that not many of us are aware of: most of the data used to train AI deep learning (DL) models comes from us.

That is, through our ignorance or tacit acceptance of licenses for apps that we use every day and for just walking around/interacting with the world.

The article in Atlantic, The AI supply chain runs on ignorance, talks about Ever, a picture sharing app (like Flickr), where users opted in to its facial recognition software to tag people in pictures. Ever also used that (tagged by machine or person) data to train its facial recognition software which it sells to government agencies throughout the world.

The second article, in Engadget, Colorado College students were secretly used to train AI facial recognition (software), talks about a group using a telephoto security camera that was pointed at a high traffic area on campus. The data obtained was used to help train an AI DL model to identify facial characteristics from far away.

The article went on to say that gathering photos from people in public places is not against the law. The study was also cleared by the school. The database was not released until after the students graduated but it did have information about the time and date the photos were taken.

But that’s nothing…

The same thing applies to video sharing and photo animation models, podcasting and text speaking models, blogging and written word generation models, etc. All this data is just lying around the web, freely available for any AI DL data engineer to grab and use to train their models. The article which included the image below talks about a new dataset of millions of webpages.

From an OpenAI paper on better language models showing the accuracy of some AI DL models “trained on a new dataset of millions of webpages called WebText.”

Google photo search is scanning the web and has access to any photo posted to use for training data. Facebook, IG, and others have millions of photos that people are posting online every day, many of which are tagged with information identifying people in the photos. I’m sure somewhere there’s a clause in a license agreement that says your photos, when posted on our app, no longer belong to you alone.

As security cameras become more pervasive, camera data will readily be used to train even more advanced facial recognition models without your say so, approval or even appreciation that it is happening. And this is in the first world, with data privacy and identity security protections paramount. Imagine how the rest of the world’s data will be used.

With AI DL models, it’s all about the data. Yes much of it is messy and has to be cleaned up, massaged and sometimes annotated to be useful for DL training. But the origins of that training data are typically not disclosed to the AI data engineers nor the people that created it.

We all thought China would have a lead in AI DL because of their unfettered access to data, but the west has its own way to gain unconstrained access to vast amounts of data. And we are living through it today.

Yes, AI DL models have the potential to drastically help the world, humanity and government do good things better. But a dark side to AI DL models also exists, helping bad actors, organizations and even some government agencies do evil.

Caveat usor (May the user beware)

~~~~

Comments?

Photo Credit(s): “Still Watching You” by jhcrow is licensed under CC BY-NC 2.0 

“Computational Photography Homework 1 Results.” by kscottz is licensed under CC BY-NC 2.0 

From Language models are unsupervised multi-task learners OpenAI research paper

ZooArchNet.org, a new collaboration for zoological-archeological data

Read an article the other day about a new collaboration data platform, the ZooArchNet, for archeological and zoological data (data about animals and the history of humankind).

The collaboration was started at the Florida Museum at the University of Florida. They intend to construct a database that would allow researchers to track the history of animals and how humans have interacted with them over time.

The problem is there’s a lot of historical animal specimen information available in various locations/sites around the world and similarly, there’s a lot of data about the history of humanity, but there’s little that cross links the two. And by missing those cross links, researchers aren’t seeing the big picture, that humankind and animal-kind have co-existed since the dawn of time and have impacted each other throughout their history.

However, if there was a site where one could trace the history of animal and human life, across time, in a region, one could develop a better understanding of how they interact and impact one another.

Humankind interacting with animalkind

In the article, they discuss a number of examples where animals have been impacted by humankind over time. For example, originally the Mexican Turkey was domesticated for its feathers during Mayan, Aztec and other civilizations of Central America, but over time it became prized for use as food. While this was occurring, its range expanded considerably throughout North (and South) America.

It’s the understanding of habitat range over time and how humankind helped or hindered this range that’s best served by linking zoological and archeological data sets that exist in research libraries throughout the world.

How it works

One problem in cross linking such data is that it often exists in different formats and uses different metadata to describe it.

A key, early decision was to use a standard metadata format, the Darwin Core (DwC), an outgrowth of the Dublin Core that is more focused on zoological data.

With this in place, the next problem was to translate specimen metadata into the DwC and extract the actual data (or URIs) that described the specimen for harvesting. Once all that was accomplished, they could migrate the specimen data or archeological data and host it/cross link it in their ZooArchNet database.
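The translation step might look something like the sketch below. The source field names and the sample record are hypothetical, and ZooArchNet's actual pipeline isn't described in the article; the target names (scientificName, catalogNumber, eventDate, decimalLatitude, decimalLongitude) are standard Darwin Core terms.

```python
# Hypothetical mapping from one collection's field names to Darwin Core terms.
FIELD_TO_DWC = {
    "species": "scientificName",
    "cat_no": "catalogNumber",
    "collected": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(record):
    """Rename known fields to Darwin Core terms; unknown fields are
    dropped here, though a real pipeline would likely keep or log them."""
    return {
        FIELD_TO_DWC[key]: value
        for key, value in record.items()
        if key in FIELD_TO_DWC
    }


specimen = {"species": "Meleagris gallopavo", "cat_no": "X-001",
            "lat": 19.4, "lon": -99.1, "shelf": "B12"}
print(to_darwin_core(specimen))
```

Once every source speaks the same vocabulary, cross linking zoological and archeological records becomes a matter of joining on shared terms.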

For example, the researchers at Florida Museum used the Open Context database to provide archeological information and the Global Biodiversity Information Facility (GBIF) to supply biological diversity information, and together the two were linked and cross indexed in the ZooArchNet database.

Once the data was available and located in Google Cloud storage, researchers could use Google BigQuery data analytics as well as other apps like (Google) indexers to create more data rich views and searchable indices for their ZooArchNet and VertNet web portals.

ZooArchNet is just starting. Most of the information currently available is about the few examples chosen to demonstrate the technology. As with anything like this, there’s a certain amount of crowd sourcing needed to make it worthwhile. Its popularity will be a prime determinant of its usefulness over time. But anything that helps the world understand the true history of humanity’s impact on the life of this planet is worthwhile.

~~~~

Comments?

Photo Credit(s): “turkey bird” by watts photos1 is licensed under CC BY 2.0 

Workflow from ZooArchNet: Connecting zooarchaeological specimens to the biodiversity and archaeology data networks article

Darwin Core overview from Darwin Core: An Evolving Community-Developed Biodiversity Data Standard article

Need memory, Intel’s Optane DC PM to the rescue

I attended Intel’s DataCentric Innovation Conference Tech Field Day eXclusive (TFDx) last April. There were a couple of items Intel presented at the show that piqued my interest, one of which was DL Boost (see my Intel’s new DL Boost for AI inferencing blog post) and the other was Optane DC PM (data center persistent memory). This post is about Optane DC PM.

As you already know, Optane SSDs have been on the market now for at least a year or so and have not gained much market traction as of yet. I and others attribute this to the high price differential between Optane SSDs and NVMe Flash SSDs but others may say it’s a matter of production yields – probably a little of both.

But Optane, as announced, always had another form factor (if that’s the right term), as persistent memory that could dramatically increase the size of server memory to support new memory intensive applications at a lower price than DRAM.

I was at Nutanix .NEXT conference last week and saw a 4 socket server from DELL that had 6TB of DRAM in it (and 4-44 core CPUs). I didn’t ask the price but when I mentioned I wanted one for my home office, they said it could easily heat my house. So the other problem with a lot of DRAM is power consumption.

Optane DC PM (data center persistent memory) is intended to solve both the high cost and high power consumption problems of DRAM.

How does it work in a server

The newer Intel motherboards support up to 12 slots of memory per socket. And up to 6 of these can be Optane DC PM (512GB DIMM) or 3TB per socket. Optane DC PM is accessed via L1-L2 caching just like any other memory. Apparently with a dual socket system you can have up to 11 Optane DC PM DIMMs on the motherboard.

L1-L2 cache access times are on the order of picoseconds (10**[-12] seconds), DRAM is on the order of nanoseconds (10**[-9] seconds) and flash is on the order of 100 microseconds (100*10**[-6] seconds). So there’s a vast access time gulf between DRAM and Flash that could be exploited with the right technology – enter Optane DC PM.

The only detailed info I could find on Optane DC PM access times was in a research paper (see Basic performance of Intel Optane DC PMM research paper) and it said Optane DC PM access times are ~350 nanoseconds, which falls between DRAM and Flash. At the show the development team indicated that Optane DC PM supports about 3GB/sec of bandwidth per module (DIMM).
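Putting the per-DIMM numbers together gives a rough per-socket picture. This is simple arithmetic on the figures quoted above, and it assumes bandwidth adds up linearly across DIMMs, which is optimistic.

```python
DIMM_GB = 512            # largest Optane DC PM module mentioned above
DIMMS_PER_SOCKET = 6
GB_PER_SEC_PER_DIMM = 3  # approximate figure quoted by the development team

capacity_gb = DIMM_GB * DIMMS_PER_SOCKET
# Assumes bandwidth adds up linearly across DIMMs, which is optimistic.
bandwidth_gb_s = GB_PER_SEC_PER_DIMM * DIMMS_PER_SOCKET
full_scan_seconds = capacity_gb / bandwidth_gb_s

print(capacity_gb, "GB per socket")                        # 3072 GB per socket
print(bandwidth_gb_s, "GB/sec per socket")                 # 18 GB/sec per socket
print(round(full_scan_seconds), "seconds to scan it all")  # 171 seconds to scan it all
```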

There are two ways to use Optane DC PM:

  1. Memory mode – in Memory mode, the data in Optane DC PM is thrown away during a power cycle. You must use a block of DRAM as a cache, or rather a virtual memory block, with the Optane DC PM acting as a paging store. Data is brought into the DRAM cache when accessed using its (virtual) DRAM address and when no longer used, it gets evicted (destaged) back out to Optane DC PM. When power is cycled the data in Optane DC PM is cleared out. Optane DC PM supports AES-XTS 256-bit encryption and can easily be cleared by throwing away encryption keys during a power cycle.
  2. App Direct mode – in App Direct mode, Optane DC PM is accessed directly using application APIs and its data persists across power cycles. For App Direct mode, Optane DC PM is still AES 256-bit encrypted but here the encryption key is maintained across power cycles, is locked on power up, and you need to use a pass phrase to unlock it. In this mode, applications are responsible for flushing (L1-L2) caches to Optane to retain all data written through L1-L2 to the Optane DC PM. There’s a Persistent Memory Development Kit (PMDK) library on GitHub that supports the App Direct mode API that applications need to use.
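The "application is responsible for flushing" part of App Direct mode can be sketched with an ordinary memory-mapped file. To be clear, this is not the PMDK API (PMDK's libpmem has calls such as pmem_persist for this); it's a stand-in using Python's mmap module, where a temp file plays the role of a file on a persistent-memory-backed filesystem and flush() plays the role of the cache flush. The function names are hypothetical.

```python
import mmap
import os
import tempfile

def persist_record(path, offset, payload):
    """Store payload at offset in a memory-mapped file, then flush it."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as pm:
            pm[offset:offset + len(payload)] = payload  # plain memory store
            pm.flush()  # the step the application is responsible for

def read_record(path, offset, length):
    """Re-open the file and read the bytes back, as if after a restart."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Stand-in for a file on a persistent-memory-backed filesystem (assumption).
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)
os.close(fd)

persist_record(path, 128, b"hello optane")
result = read_record(path, 128, 12)
print(result)
os.remove(path)
```

The point is the ordering: the store alone isn't enough, and the data only counts as durable once the flush has completed.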

Both modes use DDR-T (transactional DDR4), a new memory bus protocol for Optane DC PM access. In the DDR-T protocol, access to the memory bus is requested by a CPU and is granted by an Optane DC PM DIMM. All Optane DC PM DIMMs on a system can be accessed in parallel.

You can use RDMA to replicate (App Direct?) Optane DC PM data from one system to another. In order to support Memory and App Direct mode, Optane DC PM required CPU, BIOS and (application) software changes.

Most of the Optane DC PM support and cryptographic logic is implemented in hardware. Optane DC PM has an address indirection table (AIT) to support 3D XPoint wear leveling; it is maintained in DRAM but flushed to Optane during power loss. Transfers to 3D XPoint media are in 256 byte cache lines but the memory bus operates in 64 byte cache lines, so there’s a (DRAM) buffer between media and memory bus.
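The 64 byte vs. 256 byte mismatch is why that buffer matters. Here's a toy model, assuming the buffer perfectly combines all writes in a batch that fall in the same 256 byte media line (how the real DIMM buffer behaves is not something I know, and the function names are made up):

```python
BUS_LINE = 64      # bytes per memory-bus cache line (from the text)
MEDIA_LINE = 256   # bytes per 3D XPoint media transfer (from the text)

def media_lines_touched(addresses):
    """Return the set of 256-byte media lines hit by 64-byte writes."""
    return {addr // MEDIA_LINE for addr in addresses}

def write_amplification(addresses):
    """Media bytes written per bus byte written for one batch of writes."""
    bus_bytes = len({a // BUS_LINE for a in addresses}) * BUS_LINE
    media_bytes = len(media_lines_touched(addresses)) * MEDIA_LINE
    return media_bytes / bus_bytes

sequential = [i * BUS_LINE for i in range(8)]  # 8 adjacent cache lines
scattered = [i * 4096 for i in range(8)]       # 8 cache lines, 4KB apart

print(write_amplification(sequential))  # 1.0: 2 media lines for 512 bytes
print(write_amplification(scattered))   # 4.0: 8 media lines for 512 bytes
```

In this model sequential writes cost nothing extra, while scattered 64 byte writes cost 4X the media traffic.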

Optane also supports a high availability (or up to two device failure) mode. In this scenario, if one Optane DC PM DIMM fails, the system can swap a spare Optane DC PM DIMM into that address space and continue to operate. If a 2nd Optane DC PM DIMM fails then the system fails. Not sure what happens to the data on the original Optane DC PM DIMM during a failure. It seems to me data would be lost, but it could depend on its failure mode.

In Memory mode, the expected ratio between DRAM size and Optane DC PM size should be 32GB DRAM/6TB Optane DC PM. At the TFDx event, the Optane DC PM team had some performance charts showing different DRAM cache miss rates. Intel also announced new CPU monitoring statistics to track application/workloads impacting DRAM/Optane DC PM in Memory mode and to track Optane DC PM health.

Optane DC PM usage modes are established by the BIOS. It’s flexible enough to have the Optane DC PM usage modes be defined on a region by region basis. Not exactly sure what a region is, but it could be an address range spanning MB(?) of Optane DC PM. With both modes in operation on a system, data can be moved from Memory mode Optane to App Direct mode Optane or vice versa.

Intel expects the lifetime of an Optane DC PM DIMM to be 200-370PB of data writes. Optane DC PMs have a 5 year warranty. Given its bandwidth (3GB/sec), 200PB of data writes should last ~2 years but that’s at 100% duty cycle, writing 3GB of data, every second of every day. So, 5 years should be a reasonable guarantee using a more realistic ~40% duty cycle.
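For anyone who wants to check the endurance arithmetic, here's a quick sketch. It assumes decimal units (1PB = 10^15 bytes, 1GB = 10^9 bytes) and a 365 day year; the function name is mine.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def lifetime_years(endurance_pb, bandwidth_gb_s, duty_cycle):
    """Years until the write endurance is used up at a given duty cycle."""
    total_bytes = endurance_pb * 1e15
    bytes_per_second = bandwidth_gb_s * 1e9 * duty_cycle
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR

full = lifetime_years(200, 3, 1.0)       # 200PB, writing flat out
realistic = lifetime_years(200, 3, 0.4)  # 200PB at ~40% duty cycle
best = lifetime_years(370, 3, 0.4)       # 370PB at ~40% duty cycle

print(f"{full:.1f} years")       # 2.1 years
print(f"{realistic:.1f} years")  # 5.3 years
print(f"{best:.1f} years")       # 9.8 years
```

So the low end of the endurance range just clears the 5 year warranty at a 40% duty cycle, and the high end has plenty of headroom.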

What applications use Optane DC PM

The one of interest to most people seems to be SAP HANA. According to the development team, SAP HANA could use App Direct mode for main database storage and use DRAM for its delta column store. Cassandra could also use Optane in App Direct mode in a similar fashion.

Also, something like Redis, a key-value store, could use Optane DC PM to store Values and use DRAM to store Keys.
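The keys-in-DRAM, values-in-Optane split might look something like the sketch below. This is not how Redis does it; TieredKV is a hypothetical class, a Python dict stands in for the DRAM key index and a memory-mapped temp file stands in for an App Direct region. Note the index here is not persisted, so a real design would need a way to rebuild it after a restart.

```python
import mmap
import os
import tempfile

class TieredKV:
    """Keys and offsets in a dict (DRAM); value bytes in a mapped file."""

    def __init__(self, path, capacity):
        with open(path, "wb") as f:
            f.truncate(capacity)          # size the value region
        self._file = open(path, "r+b")
        self._values = mmap.mmap(self._file.fileno(), capacity)
        self._index = {}                  # key -> (offset, length), in DRAM
        self._next = 0                    # append-only allocation pointer

    def put(self, key, value):
        end = self._next + len(value)
        if end > len(self._values):
            raise MemoryError("value region full")
        self._values[self._next:end] = value
        self._values.flush()              # application-driven flush
        self._index[key] = (self._next, len(value))
        self._next = end

    def get(self, key):
        offset, length = self._index[key]
        return bytes(self._values[offset:offset + length])

    def close(self):
        self._values.close()
        self._file.close()

# Usage: a temp file stands in for persistent memory (assumption).
workdir = tempfile.mkdtemp()
kv = TieredKV(os.path.join(workdir, "values.pm"), 1 << 20)
kv.put("user:1", b"Ada Lovelace")
kv.put("user:2", b"Alan Turing")
first = kv.get("user:1")
second = kv.get("user:2")
print(first, second)
kv.close()
```

The appeal is that the small, hot part (keys and offsets) stays in fast DRAM while the bulk of the bytes (values) lives in the larger, cheaper tier.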

But any application could take advantage of the extra memory made available with Optane DC PM DIMMs in Memory mode today. Of course any use of Optane DC PM would require the right levels of Intel Xeon CPUs (Cascade Lake), BIOSes and motherboards.

~~~~

Interested in learning more? TFDx videos of the event are available on the website noted previously. These TFDx bloggers also have posts specifically on Optane DC PM.

The coolest thing since sliced bread – Optane by Matt Leib, (@MBLeib)

Intel’s crossover point: A 3D spork by Stephen Foskett (@SFoskett)

Intel answering SAP HANA’s tough questions by Keith Townsend (@CTOAdvisor)

Comments?