Storage that provides 100% performance at 99% full

A couple of weeks back we were talking with Qumulo at Storage Field Day 20 (SFD20) and they mentioned that they were able to provide 100% performance at 99% full (you can watch their SFD20 video session here). I was a bit incredulous of this claim, seeing as how every other modern storage system's performance degrades long before it gets to 99% of capacity.

So I asked them to explain how this was possible. But before we get to that a little background on modern storage systems would be warranted.

The perils of log structured file systems

Most modern storage systems use a log structured file system. When they write data, they append it to a sequential log and use a virtual addressing scheme to record where the data for each address is located, creating a (data) log of written blocks.

However, when data is overwritten, it leaves gaps in these data logs. These gaps need to be somehow recycled (squeezed out) in order to be able to be consumed as storage capacity. This recycling process is commonly called “garbage collection”.

Garbage collection does its work by reading heavily gapped logs and re-writing the old, but still current, data into a new log. This frees up those gaps to be reused. But garbage collection of this kind requires reading and re-writing logs just to free up space.

Now as log structured file systems get (70-80-90%) full, they need to spend more and more system time and effort (=performance) garbage collecting. This takes system (IO) performance away from normal host IO activity, which is why I didn't believe that Qumulo could offer 100% IO performance at 99% full.

But there has always been another way to supply storage virtualization (and snapshotting) besides log files. Yes, it may involve more metadata (table) management, but what it costs in extra metadata, it gives back by requiring no garbage collection.

How Qumulo does without garbage collection

Qumulo uses a scaled block store as the back end of their file and object cluster store. And yes, it's still a virtualized block store, BUT it's not a log structured file store.

It seems that there's a virtual-to-physical mapping table that Qumulo uses to determine the physical address of any virtual block in the file system. And files are allocated to virtual blocks directly through the use of B-tree metadata. These B-trees indicate which virtual blocks are in use by a file and its snapshots.

If a host overwrites a data block, the old block can be freed (if it's not being used by a snapshot) and placed on a freed block list, and a new block is allocated in its place. The file's allocated-blocks B-tree is updated to reflect the new block and that's it.
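To make the contrast with garbage collection concrete, here's a minimal sketch (my own illustration, not Qumulo's code; FreeList and FileBlockMap are hypothetical names) of what an overwrite looks like when free space is tracked with a free block list and a per-file block map standing in for the B-tree:

```python
# Hypothetical sketch of an overwrite with a free block list and a per-file
# virtual-to-physical block map (a stand-in for Qumulo's B-tree metadata).
# Not Qumulo code; it just shows why no garbage collection is needed.

class FreeList:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # all physical blocks start out free

    def allocate(self):
        return self.free.pop()                # grab any free physical block

    def release(self, pblock):
        self.free.append(pblock)              # a freed block is reusable immediately


class FileBlockMap:
    """Maps a file's virtual block numbers to physical blocks."""
    def __init__(self):
        self.map = {}                         # virtual block -> physical block

    def overwrite(self, vblock, freelist, in_snapshot=False):
        old = self.map.get(vblock)
        new = freelist.allocate()             # the write lands in a new physical block
        self.map[vblock] = new                # update the map, and that's it
        if old is not None and not in_snapshot:
            freelist.release(old)             # old block goes straight back to the free list


freelist = FreeList(num_blocks=1024)
f = FileBlockMap()
f.overwrite(0, freelist)                      # initial write of virtual block 0
f.overwrite(0, freelist)                      # overwrite: no log, no GC, just a remap
print(len(freelist.free))                     # 1023: the old block is already free again
```

The freed block is immediately available for reuse; there is never a gapped log that has to be read and rewritten later.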

For snapshots, Qumulo uses a process they call "write-out-of-place" when data that a snapshot points to is overwritten. Again, it appears that snapshots are some extra metadata associated with a file's B-tree that defines the data in the snapshot.

The problem comes in when a file is deleted. If it's a big enough file (TB-PB?), there could be millions to billions of blocks that have to be freed up. This would take entirely too long for a delete command, so it's done in the background. Qumulo calls this "reclaim delete". So deleting a big file unlinks its block B-tree from the directory and puts it on a reclaim delete work queue to free up its blocks later. Similarly, when a big snapshot is deleted, Qumulo performs a background process called "reclaim snapshot" for the snapshot's unique blocks.
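Here's a similarly hypothetical sketch of the reclaim idea: the delete itself only unlinks the file and queues its blocks, while a background pass frees them a batch at a time, interleaved with host IO. The names and batch size are my own illustration, not Qumulo's implementation.

```python
from collections import deque

# Hypothetical "reclaim delete" style background worker. A delete only unlinks
# the file and enqueues its block list; blocks are freed later, in batches,
# concurrently with normal IO.

free_blocks = []                              # shared free block list
reclaim_queue = deque()                       # files waiting to be reclaimed

def delete_file(directory, name):
    blocks = directory.pop(name)              # unlink from the namespace immediately
    reclaim_queue.append(blocks)              # defer freeing the (possibly billions of) blocks

def reclaim_pass(batch_size=1000):
    """One background pass: free at most batch_size blocks, then yield to host IO."""
    freed = 0
    while reclaim_queue and freed < batch_size:
        blocks = reclaim_queue[0]
        while blocks and freed < batch_size:
            free_blocks.append(blocks.pop())
            freed += 1
        if not blocks:                        # this whole file has been reclaimed
            reclaim_queue.popleft()
    return freed

directory = {"bigfile": list(range(5000))}    # a file owning 5000 physical blocks
delete_file(directory, "bigfile")             # returns immediately
while reclaim_pass():                         # background reclaim, batch by batch
    pass
print(len(free_blocks))                       # 5000
```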

As can be seen (it's very hard to see given the coloration of the chart) from this screen shot of Qumulo's session at SFD20, reclaim delete and reclaim snapshot are being done concurrently (in the background) with normal system IO. What's interesting to note here is that reclaim IO (delete and snapshot) is going on all the time during the customer's actual work. Why the write throughput drops significantly during the 27-29 of July is hard to understand. But in the one case where it's most serious (middle of July 28), reclaim IO also drops significantly. If reclaim IO were impacting write performance, I would have expected it to go higher when write throughput went lower. But that's not the case. From what I can see in the above, reclaim IO has no impact on read or write throughput at this customer.

So essentially, by using a backing block store that does no garbage collection (not using a log structured file system), Qumulo is able to offer 100% system IO performance at 99% full – woah.

AI ML DL hardware performance results from MLPerf

Read an article a couple of weeks back from IEEE Spectrum, New Records for AI Training which discussed recent MLPerf v0.7 performance results. The article mentioned that MLPerf performance on its benchmarks has increased by ~2.7X in the last year alone.

The MLPerf organization was started back in 2018 to supply machine learning workload performance results, somewhat like what SPEC and TPC did for NFS and transaction processing. The MLPerf organization documented their philosophy in a paper.

As far as I can tell, MLPerf is the only benchmark currently available to show hardware system performance on AI training and inferencing. Below we report on MLPerf training results.

MLPerf also reports on both closed and open division benchmark results. Closed division submissions all use the same software algorithms for each workload. This way one can compare workload performance across different hardware systems. Open division results can make use of any algorithm to achieve the desired results on the problem set. We report on MLPerf closed division results below.

Current MLPerf v0.7 (open and closed division) training results are available online (on GitHub) and are summarized in a training results page on their web site.

MLPerf v0.7 workload changes

The MLPerf team added a few new workloads and upped the game of another benchmark for v0.7:

  • Recommendation DLRM: a replacement for the recommendation workload used in MLPerf v0.6; it comes from Facebook and provides more parallelism in training for recommendations.
  • Wikipedia BERT: an addition to what was used in MLPerf v0.6; it is a new natural language processing (NLP) frontend, trained on Wikipedia, that is used with other language processing capabilities.
  • Go MiniGo: an enhancement of the MLPerf v0.6 MiniGo accuracy requirements; it uses reinforcement learning to learn to play Go. For v0.7, they now use a full sized, 19X19 Go board and upped the win rate requirement to 50%.

MiniGo Results

A couple of items of note for the MiniGo results. There are essentially 3 different architectures represented in the above: NVIDIA DGX series (DGX A100, DGX-2H, DGX-1), Google TPUs (V4 and V3) and Intel (8 server nodes with Cooper Lake-6 CPUs).

Google TPUs are considered internal and are only available to Google, its hardware partners or on GCP. Although MLPerf includes GCP TPU system results for other workloads, there were none submitted for MiniGo.

The Intel system is a preview of their latest gen Cooper Lake chips, which may not be commercially available yet. On the other hand, all NVIDIA systems are commercially available and can be deployed in your data center today.

As one can see in the above, NVIDIA systems swept the first 3 positions on our Top 10 MiniGo chart. A DGX A100 came in at #1, reaching a 50% win rate at MiniGo in a mere 17 seconds using 448 CPUs and 1792 A100 GPUs. Coming in at #2 at 30 seconds was another DGX A100 using 64 CPUs and 256 A100 GPUs. And at #3 at 35 seconds was a DGX-2H using 64 CPUs and 512 V100 GPUs.

Next at #4 at 151 seconds was a Google TPU system with 64 TPUv4 accelerators (it's unclear how many CPUs, if any, were used; the results show 0). Note, an 8-node Intel server cluster with 32 CPUs (4/node) using the latest gen Cooper Lake (-6) CPU came in at #7, taking 409 seconds to achieve the training results.

There are 6 other MLPerf workloads including DLRM and BERT mentioned above. Each of these deserves its own discussion of top ten results. Alas, they will need to wait for another time and I will cover all of them in future posts.

~~~~

Nowadays, with much of IT turning to AI ML DL to provide critical services, it’s more important than ever to understand what can and can’t be done with available hardware. The fact that one can train a model to play decent Go in 17 seconds on a large DGX A100 cluster and under 7 minutes on an 8-node, leading edge, Intel server cluster is pretty impressive.

Despite MLPerf's best efforts, it's still tough to compare ML performance across systems when there's so much diversity in the underlying hardware, especially in GPU, TPU and CPU counts. IMHO, it would be very useful to have a single GPU, TPU or CPU system submission requirement for each workload. That way one could compare how well each hardware element performs the workload in isolation.

Nonetheless, the MLPerf suite of benchmarks provides a great first step in understanding what today’s hardware can accomplish in ML training (and inferencing).

Comments?

OFA DNNs, cutting the carbon out of AI

Read an article (Reducing the carbon footprint of AI… in Science Daily) the other day about a new approach to reducing the energy demands of AI deep neural net (DNN) training and inferencing. The article was reporting on a similar piece in MIT News, but both were discussing a technique originally outlined in an ICLR 2020 (Int. Conf. on Learning Representations) paper, Once-for-all: Train one network & specialize it for efficient deployment.

The problem stems from the amount of energy it takes to train a DNN and use it for inferencing. In most cases, training and (more importantly) inferencing can take place on many different computational environments, from IoT devices, to cars, to HPC super clusters and everything in between. In order to create DNN inferencing algorithms for use in all these environments, one would have to train a different DNN for each. Moreover, if you're doing image recognition applications, resolution levels matter, and each resolution level would represent a whole additional set of DNNs that would need to be trained.

The authors of the paper suggest there's a better approach: train one large OFA (once-for-all) DNN that covers the finest resolution and largest neural net required, in such a way that smaller sub-nets can be extracted and deployed for less weighty computational and lower resolution deployments.

The authors contend the OFA approach takes less overall computation (and energy) to create and deploy than training multiple times for each possible resolution and deployment environment. It does take more energy to train than training a few (4-7 judging by the chart) DNNs, but that can be amortized over a vastly larger set of deployments.

OFA DNN explained

Essentially the approach is to train one large (OFA) DNN, with sub-nets that can be used by themselves. The OFA DNN sub-nets have been optimized for different deployment dimensions such as DNN model width, depth and kernel size as well as resolution levels.

While DNN width is purely the number of numeric weights in each layer, and DNN depth is the number of layers, kernel size is not as well known. Kernels were introduced in convolutional neural networks (ConvNets) to identify the number of features that are to be recognized. For example, in human faces these could be mouths, noses, eyes, etc. All these dimensions + resolution levels are used to identify all possible deployment options for an OFA DNN.

OFA secrets

One key to OFA's success is that any model (sub-network) selected actually shares the weights of all of its larger brethren. That way all the (sub-network) models can be represented by the same DNN, and you just select the dimensions of interest for your application. If you were to create each and every DNN separately, the number would be on the order of 10**19 DNNs for the example cited in the paper, with depth using {2,3,4} layers, width using {3,4,6} and kernel sizes, over 25 different resolution levels.
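To see where a number like 10**19 comes from, here's a back-of-the-envelope count. The depth {2,3,4} and width {3,4,6} choices come from the text above; the 5-unit backbone and the 3 kernel-size options per layer are assumptions I've added (per my reading of the paper) just to illustrate the combinatorics:

```python
# Back-of-the-envelope count of OFA sub-networks. The depth and width choices
# come from the post; the 5-unit backbone and 3 kernel-size options per layer
# are assumptions added for illustration. The point: the count hits ~10**19.

depth_choices = [2, 3, 4]     # layers per unit
width_choices = [3, 4, 6]     # width (expansion) options per layer
kernel_choices = [3, 5, 7]    # kernel size options per layer (assumed)
num_units = 5                 # assumed number of units in the backbone
num_resolutions = 25          # input resolution levels

per_layer = len(width_choices) * len(kernel_choices)       # 9 combinations per layer
per_unit = sum(per_layer ** d for d in depth_choices)      # 81 + 729 + 6561 = 7371
architectures = per_unit ** num_units                      # ~2 x 10**19 sub-networks

print(f"~{architectures:.1e} architectural variants,")
print(f"each usable at any of {num_resolutions} resolution levels")
```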

In order to do something like OFA, one would seemingly need to train for each different objective (once for each different resolution, depth, width and kernel size). But rather than doing that, OFA uses an approach which attempts to shrink all dimensions at the same time and then fine tunes that subset's NN weights for accuracy. They call this approach progressive shrinking.

Progressive shrinking, training for different dimensions

Essentially they train first with the largest value for each dimension (the complete DNN) and then in subsequent training epochs reduce one or more dimensions required for the various deployments and just train that subset. But these subsequent training passes always use the pre-trained larger DNN weights. As they gradually pick off and train for every possible deployment dimension, the process modifies just those weights in that configuration. This way the weights of the largest DNN are optimized for all the smaller dimensions required. And as a result, one can extract a (defined) subnet with the dimensions needed for your inferencing deployments.

They use a couple of tricks when training the subsets. For example, when training for smaller kernel sizes, they use the center-most kernels and transform their weights using a transformation matrix to improve accuracy with fewer kernels. When training for smaller depths, they use the first layers in the DNN and ignore any layers lower in the model. When training for smaller widths, they sort each layer's channels by highest weight, thus ensuring they retain those parameters that provide the most sensitivity.
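As a rough illustration of the width trick, here's a minimal numpy sketch that keeps the channels carrying the most weight magnitude, so a narrower sub-net reuses the full net's most important parameters. This is my simplification; the real method also fine-tunes the surviving weights:

```python
import numpy as np

# Minimal sketch of width shrinking: keep the output channels whose weights
# carry the most magnitude, so the sub-network reuses the "most important"
# parameters of the full network. Purely illustrative.

def shrink_width(layer_weights, keep):
    """layer_weights: (out_channels, in_channels) array; keep: channels to retain."""
    importance = np.abs(layer_weights).sum(axis=1)        # L1 norm per output channel
    top = np.argsort(importance)[::-1][:keep]             # indices of the strongest channels
    return layer_weights[np.sort(top)]                    # shrunken layer shares full-net weights

rng = np.random.default_rng(0)
full_layer = rng.normal(size=(6, 4))                      # full width: 6 output channels
small_layer = shrink_width(full_layer, keep=3)            # sub-net width: 3 channels
print(small_layer.shape)                                  # (3, 4)
```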

It's sort of like multiple video encodings in a single file. Rather than having a separate file for every video encoding format (MPEG-2, MPEG-4, HEVC, etc.), you have one file with all encoding formats embedded within it. If, for example, you needed MPEG-4, you could just extract those elements of the video file representing that encoding level.

OFA DNN results

In order to do OFA, one must identify, ahead of time, all the potential inferencing deployments (depth, width, kernel sizes) and resolution levels to support. But in the end, you have a one size fits all trained DNN whose sub-nets can be selected and deployed for any of the pre-specified deployments.

The authors have shown (see table and figure above) that OFA beats (in energy consumed and accuracy level) other State of the Art (SOTA) and Neural (network) Architectural Search (NAS) approaches to training multiple DNNs.

The report goes on to discuss how OFA could be optimized to support different latency (inferencing response time) requirements as well as diverse hardware architectures (CPU, GPU, FPGA, etc.).

~~~~

When I first heard of OFA DNN, I thought we were on the road to artificial general intelligence, but this is much more specialized than that. It's unclear to me how many AI DNNs have enough different deployment environments to warrant the use of OFA, but with the proliferation of AI DNNs for IoT, automobiles, robots, etc., there will come a time soon when OFA DNNs and their competition will become much more important.

Comments?

Breaking IoT security

Read an article the other day (Researchers exploit low entropy of IoT devices to break RSA certificates) about researchers cracking IoT device security and breaking their public key encryption keys. The report focused on PKI and RSA certificates in IoT devices. The article mentioned the research paper describing the attack in more detail.

safe 'n green by Robert S. Donovan (cc) (from flickr)

RSA certificates publish a public key and the digital signature of the certificate and identify the device that owns the certificate.

What the researchers were able to show was that ~250K keys in IoT device RSA certificates were insecure. They were able to compromise the 250K RSA certificates using a single Microsoft Azure VM and about $3K of computer time.

It turns out that if two RSA certificate public keys share a factor, it's much easier to determine the greatest common divisor (GCD) of the two public keys than it is to factor either one of them. And once you have the GCD of the two keys, it's relatively trivial to determine the other factor in each public key. And that's just what they did.
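A toy example with absurdly small primes shows why a shared factor is fatal. Real RSA moduli are thousands of bits long, but the gcd computation stays cheap at any size:

```python
from math import gcd

# Toy illustration of the shared-factor weakness, with tiny primes.

p_shared, q1, q2 = 10007, 10009, 10037   # two devices reuse the prime p_shared
n1 = p_shared * q1                        # public modulus of device 1
n2 = p_shared * q2                        # public modulus of device 2

common = gcd(n1, n2)                      # fast, even for huge numbers
print(common == p_shared)                 # True: the shared prime falls out
print(n1 // common, n2 // common)         # and the other factors follow trivially
```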

Public key infrastructure (PKI) encryption depends on asymmetric cryptography: a "public" key is used to encrypt messages (or to encrypt a one time key used in later encryption of messages) and a "private" key is used to decrypt the message (or keys) and sign digital certificates. There are certificate authorities and a number of other elements used in PKI, but the asymmetric cryptography at its heart rests on the difficulty of factoring large numbers, and the prime factors of those large numbers need to be truly random.

True randomness is hard

The problem starts with generating truly random numbers in a digital computer. Digital algorithms typically depend on a computer performing the same set of instructions, in the same way and sequence, so as to get the same answer every time we run the algorithm.

But if you want random numbers, this predictability of always coming up with the same answer each time results in non-random numbers (or rather, random numbers that are the same each time you run the algorithm). So to get around this, most random number generators can make use of a (random) seed which is used as an input to the algorithm to generate random numbers.
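A quick demonstration of that determinism: feed the same seed in and you get the same "random" output, which is exactly what happens when thousands of devices boot with little or no entropy. (Python's random module is used here only to show the behavior; it isn't a cryptographic generator.)

```python
import random

# Same seed in, same "random" numbers out.

def key_material(seed, n=3):
    rng = random.Random(seed)                  # PRNG fully determined by the seed
    return [rng.getrandbits(32) for _ in range(n)]

print(key_material(42))
print(key_material(42))                        # identical list: zero new entropy
print(key_material(42) == key_material(42))    # True, every time
```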

However, this seed needs to be a random number itself. But to create a random number, it needs to be generated not with instructions but with something outside the digital computer. One approach (noted below) is to use a human typing keys to generate a random number to be used as a seed.

The researchers exploited the fact that most IoT devices don't use a random (enough) seed for their PKI key generation. And they were able to use the GCD trick to figure out the factors of their public keys.

But the lack of true randomness (or entropy) is the real problem. Somehow, these devices need to have a cheap and effective way to generate a random seed. Until this can be found, they will be subject to these sorts of attacks.

… but not impossible to obtain

I remember in times past, when tasked to create a public key-private key pair, I had to type some random characters. The public key encryption algorithm used the inter-character time intervals of my typing to generate a random seed that was then used to generate the key pair. I believe the two factors used to generate the keys also need to be prime numbers.

Perhaps a better approach would be to assign IoT devices keys from a centralized key distributor. That way the randomness could be controlled by the (key) distributor.

There are other approaches that depend on the sensors available to an IoT device. If the device has a camera or mic, taking raw data from the camera or sound sensor and doing a numerical transform on it may suffice. Strain gauges, liquid levels, temperature, humidity, wind speed, etc.: all of these devices have something which senses the world around them and many of these are, at their base, analog sensors. Reading and converting some portion of these analog signals from raw analog to a digital random seed could be a very effective way to generate true(r) randomness.
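Here's a hypothetical sketch of that idea: gather the noisy low-order bits of many raw sensor samples and hash them down into a seed. The read_adc() function below is a stand-in for reading real hardware:

```python
import hashlib
import random   # stand-in only; a real device would read its sensor hardware

# Hypothetical sketch: collect the noisy low-order bits of many raw analog
# samples (mic, strain gauge, temperature ADC, ...) and hash them down into a
# 256-bit seed.

def read_adc():
    return random.randint(0, 4095)                # pretend 12-bit ADC sample

def harvest_seed(samples=4096):
    noise = bytearray()
    for _ in range(samples):
        noise.append(read_adc() & 0xFF)            # keep only the noisier low bits
    return hashlib.sha256(bytes(noise)).digest()   # whiten into a fixed-size seed

print(harvest_seed().hex())
```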

~~~~

The paper has much more information about the attack and their results if you're interested. They said that ~50% of the compromised devices were from a large network supplier. Such suppliers probably also have a vast majority of devices deployed. Still, it's troubling nonetheless.

Until changes are made to IoT devices, they will continue to be insecure. Not as much of a problem when they are read only sensors but when the information they sense is used by robots or other automation to make decisions about actions, then having insecure IoT becomes a safety issue.

This is not the first time such an attack was attempted and each time, it’s been very successful. That alone should be cause for alarm. But IoT and similar devices are hard to patch in the field and their continuing insecurity may be more of a result of the difficulty of updating a large install base than anything else.

Your Rx on blockchain

We have been discussing blockchain technology for a while now (e.g., see our posts Etherium enters the enterprise, Blockchains go mainstream, and our podcast Discussing blockchains with Donna Dillenberger). And we were at VMworld 2019 where there was a brief mention of Project Concord and VMware's blockchain use for supply chain management.

But recently there was an article in Science Daily about the use of blockchain technology to improve prescriptions. It was a summary of a research paper on Cryptopharmaceuticals (paper behind paywall).

How cryptopharmaceuticals could work

Essentially the intent is to use a medical blockchain to fight counterfeit drugs. They have a proof of concept iOS & Android MedBlockChain app that shows how it could work, but it's just a sample of some of its functionality, and doesn't use an external blockchain.

The MedBlockChain app would create a platform and have at least two sides to it.

  • One side would be used by pharmaceutical manufacturers, which would use the blockchain to add or check in the medications they manufacture, in an immutable fashion. So the blockchain would essentially have an unfalsifiable record of each pill, or batch of pills, ever manufactured by pharmaceutical companies around the world.
  • The other side would be used by a person taking a medication. Here they could check out or use the app to see if the medication they are taking was manufactured by a certified supplier of the drug. Presumably there would be a QR code or something similar that could be read off the medicine package or pill itself. The app would scan the QR code and then use MedBlockChain to look up the provenance of the medication to see if it's valid or a fraudulent copy (a minimal sketch of this two-sided flow follows the list).
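Since the internals of the MedBlockChain app aren't published, here's only a toy sketch of the two-sided check-in/verify flow using a simple hash chain; all of the names are hypothetical, and a real system would sit on a distributed ledger with digital signatures:

```python
import hashlib
import json

# Toy illustration of the two-sided flow: manufacturers append immutable
# batch records, consumers verify a scanned code against the chain.

chain = []   # list of {"record": ..., "prev": ..., "hash": ...}

def _digest(record, prev):
    return hashlib.sha256((json.dumps(record, sort_keys=True) + prev).encode()).hexdigest()

def check_in(manufacturer, batch_id, drug):
    """Manufacturer side: append a batch record to the chain."""
    prev = chain[-1]["hash"] if chain else "genesis"
    record = {"mfg": manufacturer, "batch": batch_id, "drug": drug}
    chain.append({"record": record, "prev": prev, "hash": _digest(record, prev)})

def verify(batch_id):
    """Consumer side: does the scanned batch id appear in an untampered chain?"""
    prev = "genesis"
    for block in chain:
        if block["prev"] != prev or block["hash"] != _digest(block["record"], prev):
            return False                       # chain has been tampered with
        prev = block["hash"]
    return any(b["record"]["batch"] == batch_id for b in chain)

check_in("AcmePharma", "LOT-001", "examplecillin")
print(verify("LOT-001"))    # True: provenance found
print(verify("LOT-999"))    # False: no record, possibly counterfeit
```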

The example MedBlockChain app also has more medical information that could be made available on the blockchain, such as test results, body measurements, vitals, etc. These could all be stored immutably in the MedBlockChain and provided to medical practitioners. How such (HIPAA controlled) personal medical information would be properly secured and only supplied in plaintext to appropriate personnel is another matter.

~~~~

Cryptopharmaceuticals and the MedBlockChain remind me of IBM's blockchain providing diamond provenance and other supply chain services, only in this case applied to medications. Diamond provenance makes sense because of diamonds' high cost, but drugs seem a harder market to me.

I was going to say that such a market may not exist in first world countries. But then I saw a Wikipedia article on counterfeit medicines (bad steroids and cancer medicines with no active ingredients). It appears that counterfeit/fraudulent medications are a problem wherever you may live.

Then of course, the price of medications seems to be going up. So maybe, it could start as a provenance tool for expensive medications and build a market from there.

How to convince manufacturers and the buying public to use the blockchain is another matter. It's sort of a chicken and egg thing. You need the manufacturers to use it for the medications, pills or batches they manufacture. Doing so adds overhead, time and additional expense, and they would need to add a QR code or something similar to every pill, pen or other drug delivery device.

Then maybe you could get consumers and medical practitioners administering drugs to start using it to validate expensive meds. Starting with expensive medications could potentially build the infrastructure, consumer/medical practitioner and pharmaceutical company buy in that would kick start the MedBlockChain. Once started there, it could work its way down to more widely used medications.

Comments?

Where should IoT data be processed – part 1

I was at FlashMemorySummit 2019 (FMS2019) this week and there was a lot of talk about computational storage (see our GBoS podcast with Scott Shadley, NGD Systems). There was also a lot of discussion about IoT and the need for data processing done at the edge (or in near-edge computing centers/edge clouds).

At the show, I was talking with Tom Leyden of Excelero and he mentioned there was a real need for some insight on how to determine where IoT data should be processed.

For our discussion let’s assume a multi-layered IoT architecture, with 1000s of sensors at the edge, 100s of near-edge processing/multiplexing stations, and 1 to 3 core data center or cloud regions. Data comes in from the sensors, is sent to near-edge processing/multiplexing and then to the core data center/cloud.

Data size

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

When deciding where to process data, one key aspect is the size of the data. This is typically measured in GB or TB but, in today's world, can be PB as well. This lone parameter has multiple impacts and can affect many other considerations, such as the cost and time to transfer the data, cost of data storage, amount of time to process the data, etc. All of these sub-factors depend on the size of the data to be processed.

Data size can be the largest single determinant of where to process the data. If we are talking about GB of data, it could probably be processed anywhere from the sensor edge, to near-edge station, to core. But if we are talking about TB the processing requirements and time go up substantially and are unlikely to be available at the sensor edge, and may not be available at the near-edge station. And PB take this up to a whole other level and may require processing only at the core due to the infrastructure requirements.
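As a rough illustration of using data size as the first filter, here's a tiny decision helper; the GB/TB thresholds are arbitrary assumptions, and the factors that follow can override its answer:

```python
# Rough decision helper for the data-size factor alone. The thresholds are
# illustrative assumptions, not hard rules; safety, infrastructure, pipe and
# compliance considerations (below) can all override this.

def placement_by_size(data_bytes):
    GB, TB = 10**9, 10**12
    if data_bytes < 100 * GB:
        return "sensor edge or near-edge"      # small enough to handle locally
    if data_bytes < 10 * TB:
        return "near-edge station"             # needs real infrastructure
    return "core data center / cloud"          # PB-scale: core only

print(placement_by_size(5 * 10**9))            # sensor edge or near-edge
print(placement_by_size(2 * 10**12))           # near-edge station
print(placement_by_size(3 * 10**15))           # core data center / cloud
```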

Processing criticality

Human or machine safety may depend on quick processing of sensor data, e.g. in a self-driving car, on a factory floor, for flood gauges, etc. In these cases, some amount of data processing (sufficient to ensure human/machine safety) needs to be done at the lowest point in the hierarchy that has the processing power to perform this activity.

This could be in the self-driving car or in the factory automation that controls a mechanism. Similar situations would probably apply for any robots and auto pilots. Anywhere an IoT sensor array is used to control an entity that could jeopardize the life of humans or the safety of machines, safety level processing needs to be done at the lowest level in the hierarchy.

If processing doesn't involve safety, then it could potentially be done at the near-edge stations or at the core.

Processing time and infrastructure requirements

Although we talked about this in data size above, infrastructure requirements must also play a part in where data is processed. Yes sensors are getting more intelligent and the same goes for near-edge stations. But if you’re processing the data multiple times, say for deep learning, it’s probably better to do this where there’s a bunch of GPUs and some way of keeping the data pipeline running efficiently. The same applies to any data analytics that distributes workloads and data across a gaggle of CPU cores, storage devices, network nodes, etc.

There’s also an efficiency component to this. Computational storage is all about how some workloads can better be accomplished at the storage layer. But the concept applies throughout the hierarchy. Given the infrastructure requirements to process the data, there’s probably one place where it makes the most sense to do this. If it takes a 100 CPU cores to process the data in a timely fashion, it’s probably not going to be done at the sensor level.

Data information funnel

We make the assumption that raw data comes in through sensors, and more processed data is sent to higher layers. This would mean at a minimum, some sort of data compression/compaction would need to be done at each layer below the core.

We were at a conference a while back where they talked about updating deep learning neural networks. It's possible that each near-edge station could perform a mini deep learning training cycle and share their learning with the core periodically, which could then send this information back down to the lowest level to be used (see our Swarm Intelligence @ #HPEDiscover post).
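One common way to implement this kind of sharing is federated-style weight averaging; the numpy sketch below is purely illustrative (not what any particular vendor ships): each station does a local training step, only the weights travel up, and the core averages them and sends the result back down.

```python
import numpy as np

# Minimal sketch of federated-style averaging between near-edge stations and
# the core. Real systems weight by sample counts, compress updates, handle
# stragglers, etc.; this just shows the data flow.

def local_training_step(weights, local_gradient, lr=0.1):
    return weights - lr * local_gradient       # one mini training cycle at a station

def core_aggregate(station_weights):
    return np.mean(station_weights, axis=0)    # core averages the learned weights

global_weights = np.zeros(4)
station_grads = [np.array([1.0, 0.0, 0.5, 0.0]),
                 np.array([0.0, 1.0, 0.5, 0.0]),
                 np.array([0.0, 0.0, 0.5, 1.0])]

updated = [local_training_step(global_weights, g) for g in station_grads]
global_weights = core_aggregate(updated)       # sent back down to every station
print(global_weights)                          # averaged learning from all stations
```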

All this means that there’s a minimal level of processing of the data that needs to go on throughout the hierarchy between access point connections.

Pipe availability

The availability of a networking access point may also have some bearing on where data is processed. For example, a self driving car could generate TB of data a day, but access to a high speed, inexpensive data pipe to send that data may be limited to a service bay and/or a garage connection.

So some processing may need to be done between access point connections. This will need to take place at lower levels. That way, there would be no need to send the data while the car is out on the road but rather it could be sent whenever it’s attached to an access point.

Compliance/archive requirements

Any sensor data probably needs to be stored for a long time and as such will need access to a long term archive. Depending on the extent of this data, it may help dictate where processing is done. That is, if all the raw data needs to be held, then maybe the processing of that data can be deferred until it's already at the core and on its way to archive.

However, any safety oriented data processing needs to be done at the lowest level and may need to be reprocessed higher up in the hierarchy. This would be done to ensure proper safety decisions were made. And needless to say, all this data would need to be held.

~~~~

I started this post with 40 or more factors but that was overkill. In the above, I tried to summarize the 6 critical factors which I would use to determine where IoT data should be processed.

My intent is, in a part 2 to this post, to work through some examples. If there's any one example that you feel may be instructive, please let me know.

Also, if there’s other factors that you would use to determine where to process IoT data let me know.

Clouds an existential threat – part 2

Recall that in part 1, we discussed most of the threats posed by clouds to both hardware and software IT vendors. In that post we talked about some of the more common ways that vendors are trying to head off this threat (for now).

In this post we want to talk about some uncommon ways to deal with the coming cloud apocalypse.

But first, just to put the cloud threat in perspective, the IT TAM is estimated, by one major consulting firm, to be ~$3.8T in 2019, with a growth rate of 3.7% Y/Y. The same number for public cloud spending is ~$214B in 2019, growing by 17.5% Y/Y. If both growth rates continue (a BIG if), public cloud services spend will constitute essentially all (~98.7%) of IT TAM ~24 years from now. Now nobody would predict those growth rates will continue, but it's pretty evident the growth trends are going the wrong way for (non-public cloud) IT vendors.
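The crossover arithmetic is easy to check by compounding both figures at their stated rates, assuming (a big assumption) that the 2019 numbers and growth rates hold:

```python
# Quick check of the crossover arithmetic: grow both figures at their stated
# rates and see when public cloud spend reaches ~98.7% of total IT TAM.

it_tam, cloud = 3800.0, 214.0      # $B in 2019
years = 0
while cloud < 0.987 * it_tam:
    it_tam *= 1.037                # IT TAM grows 3.7% per year
    cloud *= 1.175                 # public cloud grows 17.5% per year
    years += 1
print(years)                       # 23 with these inputs, in line with the ~24 year estimate
```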

There are probably an infinite number of ways to deal with the cloud. But outside of the common ones we discussed in part 1, only a dozen or so seem feasible to me and even less are fairly viable for present IT vendors.

  • Move to the edge and IoT.
  • Make data center as easy and cheap to use as the cloud
  • Focus on low-latency, high data throughput, and high performing work and applications
  • Move 100% into services
  • Move into robotics

The edge has legs

Probably the first one we should point out would be to start selling hardware and software to support the edge. Speaking in financial terms, the IoT/edge market is estimated to be $754B in 2019, growing at over a 15.4% CAGR.

So we are talking about serious money. At the moment the edge is a very diverse environment of cameras, sensors and moveable devices. And everybody seems to be in on the act: big industrial firms, small startups and everyone in between. Given this diversity it's hard to see how IT vendors could make a decent return here. But given its great diversity, one could say it's ripe for consolidation.

And the edge could use some reference architectures, where there are devices at the extreme edge, concentrators at the edge, higher level concentrators at nodes, and more at the core. So there's a look and feel to it that seems like RO/BO (remote office/branch office) central core hub and spoke architectures, only on steroids, with leaf proliferation that can't be stopped. And all that data coming in has to be classified, acted upon and understood.

There are a few vendors going into the edge/IoT in a small way, but no one vendor personifies this approach more than Hitachi Vantara. The Hitachi family of companies has a long and varied history in OT (operational technology) or industrial technology. And over the last many years, HDS and now Hitachi Vantara have been pivoting their organization to focus more on IoT and edge solutions, and seem to have made IoT, OT and the edge a central part of their overall strategy.

Hitachi Vantara has the advantage of being into industrial technology in a big way, so the products they create already operate in factories, rail yards, ship yards and other industrial sites around the world. So, adding IoT and edge capabilities to their portfolio is a natural extension of this expertise.

However, Hitachi Vantara seems to be focusing on the software side of the edge. This may be an artifact of Hitachi family of companies dynamics, but it seems to be leaving some potential sales on the table.

There are plenty of other big industrial suppliers in this IoT/edge field, but none seem to have the IT end of the market that Hitachi Vantara can claim. Some sort of combination of a large IT vendor and a large industrial firm could potentially do the same.

So there’s plenty of money to be made with IoT/Edge hardware and software, one just has to go after it in a big way and there’s lots of competition. But all the competition seems to be on the same playing field (unlike the public cloud playing field).

Getting to “data center as a cloud”

There are a number of reasons why customers migrate work to the cloud: ease of use, ease of storage, ease of scale, access to myriad applications, access to multi-regional data centers, and an OPex (pay as you go) financial model, to name just a few.

There’s nothing that says much of this couldn’t be provided at the data center. It’s mostly just a lot of open source software and a lot of common hardware. IT vendors can do this sort of work if they put their vast resources to go after it.

From the pure software side, there are a couple of companies trying to do this, namely VMware and Nutanix, but (IBM) RedHat, (Dell) Pivotal, HPE SimpliVity and others are also going after this approach.

Hardware wise, CI and HCI seem to be rudimentary steps towards common hardware that's easy to deploy, operate and support. But these baby steps aren't enough. And delivery-to-deployment in weeks is never going to get them there. If Amazon can deliver books, mattresses, bicycles, etc. in a couple of days, IT vendors should be able to do the same with some select set of common hardware and have it automatically deployable in seconds to minutes once powered on.

And operating these systems has to be drastically simplified. On any public cloud there's really no tuning required, almost minimal configuration, and then it's just load your data and go. Yes, there's a marketplace to select (virtual) hardware, (virtual) storage hardware, (virtual) networking hardware, (virtual server) O/S and (virtual?) open source applications.

Yes, there's a lot of software behind all that virtualization. And it's fundamentally different than today's virtualized systems. It's made to operate only on commodity hardware and only with open source software.

The OPex financial model is less of a problem today. I find many vendors are already offering their hardware (and some software) on an OPex, pay as you go model. More of this needs to be made available, but the IT vendors see this and are already aggressively moving in this direction.

The clouds are not standing still, what with Azure Stack, AWS and GCP all starting to provide versions of their stacks on prem in the enterprise. This looks to be a strategic battleground between the clouds and IT vendors.

Making everything IT can do in the cloud available in the data center, with common hardware and software and with the speed and ease of deployment, operations and support (maintenance), should be on every IT vendor's to do list.

Unfortunately, this is not going to stop the public cloud completely, but it has the potential to slow the growth rate. But time is short, momentum has moved to the public cloud and I don’t (yet) see the urgency of the IT vendors to make this transition happen today.

Focus on low-latency, high data throughput and high performance work

This is somewhat unfair as all the IT vendors are already involved in these markets in a big way. But, there are some trends here, that indicate this low-latency market will be even more important over time.

For example, more and more of commercial IT is starting to take advantage of big data and AI to profit from all their data. And big science is starting to migrate to IT, where massive data flows and data analysis tools are becoming important to the data center. If anything, the emergence of IoT and the edge will increase data flows that need to be analyzed, understood, and ultimately dealt with.

DNA genomics may be relegated to big pharma/medical, but 3D visualization is becoming so mainstream that I can do it on my desktop. These sorts of things were relegated to HPC/big science just a decade or so ago. What tools exist in HPC today that the IT data center of the future will deem a necessary part of their application workload?

Is this a sizable TAM? Probably not today. In all honesty it's buried somewhere in the IT TAM above. But it can be a growing niche where IT vendors can stake out a defensive position that the cloud may have a tough time dislodging.

I say the cloud "may have trouble dislodging" because nothing says that the entire data flow/work flow couldn't migrate to the cloud, if the responsiveness were available there. But, if anything, (guaranteed) responsiveness is one of the few Achilles' heels of the public cloud. Security may be the other one.

We see IBM, Intel, and a few others taking this space seriously. But all IT vendors need to see where they can do better here.

Focus on services

This is not really out-of-the-box thinking. Some (old) IT vendors have been moving into services for over 50 years now; others are just seeing there's money to be made here. Just about every IT vendor has deployment & support services, and most hardware has break-fix services.

But standalone IT services are more specialized and in the coming cloud apocalypse, services will revolve around implementing cloud applications and functionality or migrating work from the cloud or (rarely in the future) back to on prem.

The TAM for services is buried in the total IT spend, but industry analysts estimate that the total worldwide TAM for IT services will be about $1.0T in 2019, growing at a 2.6% CAGR.

So services are already a significant portion of IT spend today. And will probably not be impacted by the move to the cloud. I’d say that because implementing applications and services will still exist as long as the cloud exists. Yes it may get simpler (better frameworks, containerization, systemization), but it won’t ever go away completely.

Robots, the endgame

Ok, laugh now. I understand this is a big ask to think that robot spending could supplement and maybe someday surpass IT spending. But we all have to think long term. What is a self driving car but a robotic data center on wheels, generating TB of data every day it's driven?

Robots over the next century will invade every space, becoming ever present and ever necessary to modern world functioning. They will have sophisticated onboard computing, motors, servos, sensors and onboard and backend processing requirements. The real low-latency workload of the future will be in the (computing) minds of robots.

Even if the data center moves entirely to the cloud, all robotic computation will never reside there because A) it’s too real time and B) it needs to operate well even disconnected from the Internet.

Is all this going to happen in the next 10 or 20 years? Maybe not, but 30 to 50 years out this world will have a multitude of robots operating within it.

Who’s going to develop, manufacture, support and sustain these mobile computing data centers on wheels, legs, slithering and flying bodies?

I would say the IT vendors of today are uniquely positioned to dominate this market. Here too, the industry is very fragmented today. There are a few industrial robotic companies and just about every major auto manufacturer is going after self driving cars. And there are many bit players today. So it's ripe for disruption and consolidation.

Yet, none of the major IT vendors seem to be going after this. Ok Amazon (hardware & software) and Microsoft (software) have done work in this arena. If anything this should tell IT vendors that they need to start working here as well.

But alas, none have taken up the mantle. In the meantime, robot startups are biting the dust left and right, trying to gain market traction.

~~~~

That seems to be about it for the major viable out of the box approaches to the public cloud threat. I have a few other ideas but none seem as useful as the above.

Let me know what you think.

StorPool, fast storage for fast times

At Storage Field Day 18 (SFD18) a couple of weeks ago, we heard from a new company, StorPool, that provides ultra fast software defined storage for MSPs and other cloud providers. You can watch the videos of their sessions here.

Didn't know what to make of them at first, but when they started demoing their performance, we all woke up. They ran an all-read and a mixed read-write IO workload that almost blew away any other proprietary/non-proprietary storage I've seen before.

[Updated 12Mar2019] What they were trying to achieve was to match the performance of a Windows Server 2019 Hyper-V benchmark which hit 13.8M IOPS using 12 nodes of 384GB DRAM, 1.5TB Optane DC persistent memory, 32TB (4X8TB) NVMe SSDs and Mellanox 25Gbps RDMA ethernet, with each VM running on the server that had its VHDX file stored.

Their demo showed a 70:30 R:W random 4KB mixed workload achieving 1M IOPS with a read latency of 140 microsec. and write latency of 100 microsec. (end to end at the VM level). [Updated 12Mar2019] They were able to match the performance of a published benchmark without the 1.5TB of Optane memory, without the 25Gbps RDMA Ethernet and without having the VMs and their storage running on the same nodes. They were able to show this performance running StorPool, KVM and CentOS 7 across 12 nodes running both VMs and storage services.

They also showed information on a pgbench benchmark (pgbench is PostgreSQL's built-in benchmarking tool), which I was not familiar with. The chart had response times on the horizontal axis and TPS performance on the vertical axis.

What’s even more amazing is that even with the great performance they still offer reasonable data services such as CoW snapshots, asynchronous replication (with changed block tracking), thin provisioning, end to end data integrity, and iSCSI support.

Their target market is mostly MSPs and large customers moving to private cloud configurations. They mentioned deep support for OpenNebula, [updated 12Mar2019] OpenStack, OnApp and Kubernetes, which means each virtual disk is a volume/LUN. They support VMware and Windows Server/Hyper-V through iSCSI.

~~~~

The fact that they use a proprietary protocol is not that great, but if they can generate the IOPS and response times they showed here with snapshots, thin provisioning and async replication, I'm ok with it. [Updated 12Mar2019] The fact that they were able to match the performance of the more expensive system with standard Ethernet, no Optane memory and all VMs running remote made a significant impression on me.

Want to learn more, check out these other discussions on StorPool (and other SFD18 vendors):

SFD18 – as intense as it gets by Max Mortillaro (@DarkAvenger), and

Podcast #3 review the SFD18 presenters by Chris M. Evans (@ChrisMEvans) and Matt Leib (@MBLeib).

[Updated 12Mar2019: Boyan Krosnov sent me an email indicating that there were some mistakes in the post, which were corrected via the updates above. Editors]