AWS Data Exchange vs Data Banks – part 2

I saw that AWS announced a new Data Exchange service at their AWS Pi Day 2023 event. This is a fully managed service, available on the AWS Marketplace, for monetizing data.

In a prior post on a topic I called data banks (Data banks, data deposits & data withdrawals…), I talked about the need to have some sort of automated support for personal data that would allow us to monetize it.

The hope then (4.5yrs ago) was that social media, search and other web services would supply all the data they have on us back to us and we could then sell it to others that wanted to use it.

In that post, I called the data the social media gave back to us data deposits, the place where that data was held and sold a data bank, and the sale of that data a data withdrawal. (I know talking about bank deposits and withdrawals is probably not a great idea right now, but this was back a ways.)

AWS Data Exchange

1918 Farm Auction by dok1 (cc) (from Flickr)

With AWS Data Exchange, data owners can sell their data to data consumers. And it’s a completely AWS managed service. Presumably, one creates an S3 bucket with the data to be sold, determines a price for the data and a period during which clients can access it, and registers this with AWS. The AWS Data Exchange will then support any number of clients purchasing that data.

Presumably (although unstated in the service announcement), you’d be required to update and curate the data to ensure it’s correct and current. But other than that, once the data is on S3 and the offer is in place, you could just sit back and take the cash coming in.

I see the AWS Data Exchange service as a step on the path of data monetization for anyone. Yes, it’s got to be on S3, and yes, it’s via AWS Marketplace, which means that AWS gets a cut of any sale, but it’s certainly a step towards a freer data marketplace.

Changes I would like to see in the AWS Data Exchange service

Putting aside the need to have more than just AWS offer such a service (and I heartily request that all cloud service providers make a data exchange or something similar a fully supported offering of their respective storage services), this is not quite the complete data economy or ecosystem that I had envisioned in September of 2018.

If we just focus on the use (data withdrawal) side of a data economy, which is the main thing AWS Data Exchange seems to support, there are quite a few missing features IMHO:

  • Data use restrictions – We don’t want customers to obtain a copy of our data. We would very much like to restrict them to reading it and having plain text access to the data only during the period they have paid to access it. Once that period expires, all copies of the data need to be destroyed programmatically, cryptographically or in some other permanent/verifiable fashion. This can’t be done through license restrictions alone, which seems to be AWS Data Exchange’s current approach. I’m not sure what a viable alternative might be, but some sort of time-dependent or temporal encryption key that could be expired would be one step. Customers would, however, need to install some sort of data exchange service, supporting encrypted access/use, on the servers that use the data.
  • Data traceability – Yes, clients who purchase access should be able to use the data for whatever they want. But there should be some way to trace where our data ended up or what it was used for. If it’s to help train a NN, then I would like to see some sort of provenance or certificate applied to that NN, in a standardized structure, to indicate that it made use of our data as part of its training. Similarly, if it’s part of an online display tool, somewhere in the footnotes of the UI there would be a data origins certificate list with some way to point back to our data as the source of the information presented. Ditto for any application that made use of the data. AWS Data Exchange does nothing to support this. In reality, something like this would need standards bodies to create certificates and additional structures for NNs, standard application packages, online services, etc. that would retain and provide proof of data origins via certificates.
  • Data locality – There are some jurisdictions around the world which restrict where data generated within their boundaries can be sent, processed or used. I take it that AWS Data Exchange deals with these restrictions by either not offering data under jurisdictional restrictions for sale outside governmental boundaries or gating purchase of the data outside valid jurisdictions. But given VPNs and similar services, this seems to be less effective. If there’s some sort of temporal key encryption service to make use of our data, then it would seem reasonable to add some sort of regional key encryption to it.
  • Data auditability – There needs to be some way to ensure that our data is not used outside the organizations that have actually paid for it. And if there’s some sort of data certificate saying that the application or service that used the data has access to that data, this mechanism needs to be mandated, supported, and validated. In reality, something like this would need a whole re-thinking of how data is used in society. Financial auditing took centuries to take hold and become an effective (sometimes?) tool to monitor against financial abuse. Data auditing would need many of the same sorts of functionality, i.e. Certified Data Auditors, a Data Accounting Standards Board (DASB) which defines standardized reports as to how an entity is supposed to track and report on data usage, governmental regulations which require public (and private?) companies to report on the origins of the data they use on a yearly/quarterly basis, etc.
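To make the temporal key idea in the first bullet a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Subscription` record, the `KeyService` class and the key derivation are my own illustrations and are not part of AWS Data Exchange. It only shows a key that is handed out while a paid period is active and withheld afterwards.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Subscription:
    """Hypothetical record of one customer's paid access period."""
    customer_id: str
    dataset_id: str
    starts: datetime
    expires: datetime


class KeyService:
    """Hypothetical key service run by the data owner (not an AWS API)."""

    def __init__(self, master_secret: bytes):
        self._master_secret = master_secret

    def release_key(self, sub: Subscription, now: datetime) -> bytes:
        """Return the dataset key only while the subscription is active."""
        if not (sub.starts <= now < sub.expires):
            raise PermissionError("subscription is not active; key withheld")
        # Derive a per-customer, per-dataset, per-period key from the master secret.
        label = f"{sub.customer_id}|{sub.dataset_id}|{sub.expires.isoformat()}"
        return hmac.new(self._master_secret, label.encode(), hashlib.sha256).digest()


service = KeyService(master_secret=b"owner-held secret (illustrative)")
start = datetime(2023, 4, 1, tzinfo=timezone.utc)
sub = Subscription("customer-42", "dataset-7", start, start + timedelta(days=90))

# Inside the paid period the key is released.
key_during = service.release_key(sub, now=start + timedelta(days=10))

# After the period the key is withheld.
try:
    service.release_key(sub, now=start + timedelta(days=91))
    released_after_expiry = True
except PermissionError:
    released_after_expiry = False
```

Note that this alone does nothing about plain text copies a customer made while the key was valid, which is why the client-side service mentioned above would still be needed.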

Probably much more that could be added here but this should suffice for now.

Other changes to AWS Data Exchange processes

The AWS Pi Day 2023 announcement didn’t really describe the supplier end of how the service works. How one registers a bucket for sale was not described. I’d certainly want some sort of steganography service to tag the data being sold with the identity of those who purchased it. That way there might be some possibility of tracking who released any data exchange data into the wild.
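As a toy illustration of tagging purchased data with the buyer’s identity, the Python sketch below hides a numeric buyer ID in text using zero-width characters. This is my own simplified example, not anything AWS offers. The function names are made up, and a mark like this is trivially stripped, so a real service would need a far more robust watermark.

```python
from typing import Optional

ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def embed_buyer_id(text: str, buyer_id: int, bits: int = 32) -> str:
    """Hide buyer_id as invisible characters after the first word of text."""
    if not 0 <= buyer_id < 2 ** bits:
        raise ValueError("buyer_id does not fit in the requested number of bits")
    # Most significant bit first.
    mark = "".join(ONE if (buyer_id >> i) & 1 else ZERO for i in reversed(range(bits)))
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract_buyer_id(text: str) -> Optional[int]:
    """Recover the hidden buyer ID, or None if no mark is present."""
    marks = [ch for ch in text if ch in (ZERO, ONE)]
    if not marks:
        return None
    value = 0
    for ch in marks:
        value = (value << 1) | (1 if ch == ONE else 0)
    return value


original = "quarterly sales figures for region 9"
marked = embed_buyer_id(original, buyer_id=1234)
recovered = extract_buyer_id(marked)
```

Each purchaser would receive a differently marked copy, so a leaked copy could at least hint at which sale it came from.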

Also, how the data exchange data access is billed seems a bit archaic. As far as I can determine, one gets unlimited access to data for some defined period (N months) for some specific amount ($s). And once that period expires, customers have to pay up or cease accessing the S3 data. I’d prefer to see at least a GB/month sort of cost structure. That way, if a customer copies all the data, they pay for that privilege, and if they want to reread the data multiple times, they get to pay for that data access too. Presumably this would require some sort of solution to the data use restrictions above to enforce.
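A quick Python sketch of the difference between the two billing models follows. All prices, the dataset size and the function names are made-up assumptions for illustration only.

```python
from typing import Sequence


def flat_fee_cost(months: int, fee_per_period: float, period_months: int) -> float:
    """Unlimited access: pay one fee for every (possibly partial) period used."""
    periods = -(-months // period_months)  # ceiling division
    return periods * fee_per_period


def metered_cost(gb_read_per_month: Sequence[float], price_per_gb: float) -> float:
    """GB/month style: pay for every GB actually read, month by month."""
    return sum(gb_read_per_month) * price_per_gb


DATASET_GB = 500.0    # assumed dataset size
PRICE_PER_GB = 0.10   # assumed metered price, in $
FLAT_FEE = 1200.0     # assumed fee for a 12 month period, in $

# Customer A copies the whole dataset once and never reads it again.
copy_once = metered_cost([DATASET_GB] + [0.0] * 11, PRICE_PER_GB)

# Customer B rereads the whole dataset every month for a year.
reread_monthly = metered_cost([DATASET_GB] * 12, PRICE_PER_GB)

# Under the flat model both customers pay the same amount.
flat = flat_fee_cost(months=12, fee_per_period=FLAT_FEE, period_months=12)
```

With these assumed numbers the one-time copier pays far less than the monthly rereader under metering, which shows why metering only makes sense if copying can be restricted.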

Data banks, deposits, withdrawals and Initial Data Offerings (IDOs)

The earlier post talks about an expanded data ecosystem or economy. And I won’t revisit all that here but one thing that I believe may be worth re-examining is Initial Data Offerings or IDOs.

As described in the earlier post, IDOs were a mechanism for data users to request permanent access to our data but, instead of paying a one time fee for it, they would offer data equity in the service in exchange.

Not unlike VC, each data provider would be supplied some % (data?) ownership in the service, and over time data ownership gets diluted at further data raises. But at some point, when the service is profitable, data ownership units could be purchased outright, so that the service could exit its private data use stage and go public (data use).
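The dilution mechanics can be sketched with a bit of Python. The unit counts and provider names below are invented, and the function is just one simple way to model data raises. It is not a worked-out IDO design.

```python
from typing import Dict, List


def ownership_after_raises(
    initial_units: Dict[str, int], raises: List[Dict[str, int]]
) -> Dict[str, float]:
    """Return each provider's fractional ownership after all data raises.

    Each raise issues new ownership units to the providers named in it,
    which dilutes everyone who held units before that raise.
    """
    units = dict(initial_units)
    for new_units in raises:
        for provider, count in new_units.items():
            units[provider] = units.get(provider, 0) + count
    total = sum(units.values())
    return {provider: count / total for provider, count in units.items()}


# Two founding data providers each hold 100 units (50% each).
founders = {"alice": 100, "bob": 100}

# Two later data raises bring in new providers with newly issued units.
raises = [{"carol": 200}, {"dave": 400}]

shares = ownership_after_raises(founders, raises)
```

Here the founders go from 50% each to 12.5% each after two raises, even though they never gave up a single unit.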

Yeah, this all sounds complex, and AWS Data Exchange just sells data once and you have access to it for some period, establishing data usage rights. But I think that in order to compensate users for their data there needs to be something like IDOs that provides data ownership shares in some service that can be transferred (sold) to others.

I didn’t flesh any of that out in the original post but I still think it’s the only way to truly compensate individuals (and corporations) for the (free) use of the data that web, AI and other systems are using to create their services.

~~~~

I wrote the older post in 2018 because I saw the potential for our data to be used by others to create/train services that generate lots of money for those organizations, but without our knowledge or outright consent and without compensating us for the data we have (inadvertently or advertently) created over our life span.

As an example, Getty Images is suing Stability AI (maker of Stable Diffusion), alleging that its copyrighted materials were used for free to train an AI NN. If one looks underneath the covers of ChatGPT, many image processing/facial recognition services, and many other NNs, much of the data used in training them was obtained by scraping web pages that weren’t originally intended to supply this sort of data to others.

For example, it wouldn’t surprise me to find out that the text of RayOnStorage posts has been scraped from the web and used to train some large language model like ChatGPT.

Do I receive any payment or ownership equity in any of these services – NO. I write these blog posts partially as a means of marketing my other consulting services but also because I have an abiding interest in the subject under discussion. I’m happy for humanity to read these and welcome comments on them by humans. But I’m not happy to have LLMs or other NNs use my text to train their models.

On the other hand, I’d gladly sell access to RayOnStorage posts text if they offered me a high but fair price for their use of it for some time period say one year… 🙂

Comments?

The Hollowing out of enterprise IT

We had a relatively long discussion yesterday, amongst a bunch of independent analysts and one topic that came up was my thesis that enterprise IT is being hollowed out by two forces pulling in opposite directions on their apps. Those forces are the cloud and the edge.

Western part of the abandoned Packard Automotive Plant in Detroit, Michigan. by Albert Duce

Cloud sirens

The siren call of the cloud for business units, developers and modern apps has been present for a long time now. And their call is more omnipresent than Odysseus ever had to deal with.

The cloud’s allure is primarily low-cost, instant infrastructure that just works, a software solution/tool box that’s overflowing, locations close to most major metropolitan areas, and the extreme ease of starting up.

If your app ever hopes to scale to meet customer demand, where else can you go? If your data can literally come in from anywhere, it usually lands on the cloud. And if you have need for modern solutions, tools, frameworks or just about anything the software world can create, there’s nowhere else with more of this than the cloud.

Pre-cloud, all those apps would have run in the enterprise or wouldn’t have run at all. And all that data would have been funneled back into the enterprise.

Not today. The cloud has it all, and its siren call is getting louder every day, ever ready to satisfy every IT desire anyone could possibly have, except for the edge.

The Edge, last bastion for onsite infrastructure

The edge sort of emerged over the last decade or so kind of in stealth mode. Yes there were always pockets of edge, with unique compute or storage needs. For example, video surveillance has been around forever but the real acceleration of edge deployments started over the last decade or so as compute and storage prices came down drastically.

These days, the data being generated is staggering and the compute requirements that go along with all that data are all over the place, from a few ARM/RISC-V cores to a server farm.

For instance, CERN’s LHC creates a PB of data every second of operation (see IEEE Spectrum article, ML shaking up particle physics too). But they don’t store all that. So they use extensive compute (and ML) to try to only store interesting events.

Seismic ships roam the seas taking images of underground structures, generating gobs of data, some of which is processed on ship and the rest elsewhere. A friend of mine creates RPi enabled devices that measure tank liquid levels deployed in the field.

More recently, smart cars are like a data center on tires, rolling across roads around the world generating more data than you can even imagine. 5G towers are data centers on top of buildings, in farmland, and in cell towers dotting the highways of today. All off the beaten path, and all places where no data center has ever gone before.

In olden days there would have been much less processing done at the edge and more in an enterprise data center. But nowadays, with the advent of relatively cheap computing and storage, data can be pre-processed, compressed and tagged at the edge, and then sent elsewhere for further processing (mostly done in the cloud of course).

IT Vendors at the crossroads

And what does the hollowing out of the enterprise data centers mean for IT server and storage vendors? Mostly, danger lies ahead. Enterprise IT hardware spend will stop growing, if it hasn’t already, and over time, shrink dramatically. It may be hard to see this today, but it’s only a matter of time.

Certainly, all these vendors can become more cloud like, on prem, offering compute and storage as a service, with various payment options to make it easier to consume. And for storage vendors, they can take advantage of their installed base by providing software versions of their systems running in the cloud that allows for easier migration and onboarding to the cloud. The server vendors have no such option. I see all the above as more of a defensive, delaying or holding action.

This is not to say the enterprise data centers will go away. Just like mainframe and tape before them, on prem data centers will exist forever, but will be relegated to smaller and smaller niche markets that won’t grow anymore. But only as long as vendor(s) continue to upgrade technology AND there’s profit to be made.

It’s just that the astronomical growth that’s been happening ever since the middle of last century won’t happen in enterprise hardware anymore.

Long term life for enterprise vendors will be hard(er)

Over the long haul, some server vendors may be able to pivot to the edge. But the diversity of compute hardware there will make it difficult to generate enough volumes to make a decent profit there. However, it’s not to say that there will be 0 profits there, just less. So, when I see a Dell or HPE server, under the hood of my next smart car or inside the guts of my next drone, then and only then, will I see a path forward (or sustained revenue growth) for these guys.

For enterprise storage vendors, their future prospects look bleak in comparison. Despite the data generation and growth at the edge, I don’t see much of a role for them there. The enterprise class features and functionality they have spent decades creating and nurturing aren’t valued as much in the cloud, nor are they presently needed at the edge. Maybe I’m missing something here, but I just don’t see a long term play for them in the cloud or edge.

~~~~

For the record, all this is conjecture on my part. But I have always believed that if you follow where new apps are being created, there you will find a market ready to explode. And where the apps are no longer being created, there you will see a market in the throes of a slow death.


CTERA, Cloud NAS on steroids

We attended SFD22 last week and one of the presenters was CTERA (for more information, please see the SFD22 videos of their session), discussing their enterprise class, cloud NAS solution.

We’ve heard a lot about cloud NAS systems lately (see/listen to our GreyBeards on Storage podcast with LucidLink from last month). Cloud NAS systems provide a NAS (SMB, NFS, and S3 object storage) front-end system that uses the cloud or on prem object storage to hold customer data, which is accessed through the use of (virtual or hardware) caching appliances.

These differ from file synch and share in that Cloud NAS systems

  • Don’t copy lots or all customer data to user devices, the only data that resides locally is metadata and the user’s or site’s working set (of files).
  • Do cache working set data locally to provide faster access
  • Do provide NFS, SMB and S3 access along with user drive, mobile app, API and web based access to customer data.
  • Do provide multiple options to host user data in multiple clouds or on prem
  • Do allow for some levels of collaboration on the same files

Although admittedly, the boundary lines between synch and share and Cloud NAS are starting to blur.
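The caching behavior in the list above (only the working set lives locally, everything else stays in object storage) might be sketched in Python as follows. The `EdgeFilerCache` class is hypothetical and the least-recently-used eviction policy is my assumption. Nothing here is taken from any vendor’s actual implementation.

```python
from collections import OrderedDict
from typing import Dict, List


class EdgeFilerCache:
    """Hypothetical edge filer: caches a working set in front of an object store."""

    def __init__(self, object_store: Dict[str, bytes], capacity: int):
        self._object_store = object_store  # stand-in for cloud/on prem object storage
        self._capacity = capacity          # max number of files kept locally
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._cache:
            # Cache hit: serve locally and mark as most recently used.
            self._cache.move_to_end(path)
            self.hits += 1
            return self._cache[path]
        # Cache miss: fetch from the object store (the source of truth).
        data = self._object_store[path]
        self.misses += 1
        self._cache[path] = data
        if len(self._cache) > self._capacity:
            # Evict the least recently used file (assumed policy).
            self._cache.popitem(last=False)
        return data

    def cached_paths(self) -> List[str]:
        return list(self._cache)


store = {"a.doc": b"alpha", "b.doc": b"bravo", "c.doc": b"charlie"}
filer = EdgeFilerCache(store, capacity=2)
for name in ["a.doc", "b.doc", "a.doc", "c.doc"]:
    filer.read(name)
```

After those four reads only `a.doc` and `c.doc` remain local, while `b.doc` is still safe in the object store and would simply be fetched again on its next access.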

CTERA is a software defined solution. But they also offer a whole gaggle of hardware options for edge filers, ranging from a smart phone sized, 1TB flash cache for home office users to a multi-RU media edge server with 128TB of hybrid disk-SSD storage for 8K video editing.

They have HC100 edge filers, X-Series HCI edge servers, branch in a box, edge and Media edge filers. These latter systems have specialized support for MacOS and Adobe suite systems. For their HCI edge systems they support Nutanix, SimpliVity, HyperFlex and VxRail systems.

CTERA edge filers/servers can be clustered together to provide higher performance and HA. This way customers can scale-out their filers to supply whatever levels of IO performance they need. And CTERA allows customers to segregate (file workloads/directories) to be serviced by specific edge filer devices to minimize noisy neighbor performance problems.

CTERA supports a number of ways to access cloud NAS data:

  • Through (virtual or real) edge filers which present NFS, SMB or S3 access protocols
  • Through the use of CTERA Drive on MacOS or Windows desktop/laptop devices
  • Through a mobile device app for iOS or Android
  • Through their web portal
  • Through their API

CTERA uses an HA, dual redundant Portal service, which is a cloud (or on prem) service that provides the CTERA metadata database, edge filer/server management and other services, such as web access, cloud drive end points, mobile apps, API, etc.

CTERA uses S3 or Azure compatible object storage for its backend, source of truth repository to hold customer file data. CTERA currently supports 36 on-prem and in cloud object storage services. Customers can have their data in multiple object storage repositories. Customer files are mapped one to one to objects.

CTERA offers global dedupe, virus scanning, policy based scheduled snapshots and end to end encryption of customer data. Encryption keys can be held in the Portals or in a KMIP service that’s connected to the Portals.

CTERA has impressive data security support. Besides the end-to-end data encryption mentioned above, they also support dark sites and zero-trust authentication, and are DISA (Defense Information Systems Agency) certified.

Customer data can also be pinned to edge filers. Moreover, specific customer (directory/sub-directory) data can be hosted on specific buckets so that data can:

  • Stay within specified geographies,
  • Support multi-cloud services to eliminate vendor lock-in

CTERA file locking is what I would call hybrid. They offer strict consistency for file locking within sites but eventual consistency for file locking across sites. There are performance tradeoffs for strict consistency, so by using a hybrid approach, they offer most of what the world needs from file locking without incurring the performance overhead of strict consistency across sites. For another way to support hybrid file locking consistency, check out LucidLink’s approach (see the GreyBeards podcast with LucidLink above).

At the end of their session Aron Brand got up and took us into a deep dive on select portions of their system software. One thing I noticed is that the portal is NOT in the data path. Once the edge filers want to access a file, the Portal provides the credential verification and points the filer(s) to the appropriate object and the filers take off from there.

CTERA’s customer list is very impressive. It seems that many (50 of WW F500) large enterprises are customers of theirs. Some of the more prominent include GE, McDonalds, US Navy, and the US Air Force.

Oh, and besides supporting potentially 1000s of sites and 100K users in the same name space, they also have intrinsic support for multi-tenancy and offer cloud data migration services. For example, one can use Portal services to migrate cloud data from one cloud object storage provider to another.

They also mentioned they are working on supplying K8S container access to CTERA’s global file system data.

There’s a lot to like in CTERA. We hadn’t heard of them before but they seem focused on enterprises with lots of sites, boatloads of users and massive amounts of data. It seems like our kind of storage system.

Comments?

Internet of Tires

Read an article a couple of weeks back (An internet of tires?… IEEE Spectrum) and can’t seem to get it out of my head. Pirelli, a European tire manufacturer, was demonstrating a smart tire or, as they call it, their new Cyber Tyre.

The Cyber Tyre includes accelerometer(s) in its rubber, that can be used to sense the pavement/road surface conditions. Cyber Tyre can communicate surface conditions to the car and using the car’s 5G, to other cars (of same make) to tell them of problems with surface adhesion (hydroplaning, ice, other traction issues).

Presumably the accelerometers in the Cyber Tyre measure acceleration changes of individual tires as they rotate. Any rapid acceleration change could potentially be used to determine whether the car has lost traction and why.
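Detecting a rapid acceleration change can be as simple as comparing consecutive samples against a threshold. The Python sketch below is only my guess at the general idea. The readings and the threshold are invented, and this is certainly not Pirelli’s actual algorithm.

```python
from typing import List, Sequence


def find_rapid_changes(samples: Sequence[float], threshold: float) -> List[int]:
    """Return the indices where acceleration jumps by more than threshold
    relative to the previous sample."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Invented accelerometer readings (m/s^2) from one tire: steady grip,
# then a sudden drop (e.g. hitting ice), then recovery.
readings = [9.8, 9.9, 9.7, 9.8, 4.1, 4.3, 9.6, 9.8]

events = find_rapid_changes(readings, threshold=3.0)
```

With these made-up readings, `events` flags sample 4 (the sudden drop) and sample 6 (the recovery). Working out why traction was lost would take much more than this, such as comparing all four tires and combining other sensor data.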

They tested the new tires out at a (1/3rd mile) test track on top of a Fiat factory, using Audi A8 automobiles and 5G. Unclear why this had to wait for 5G but it’s possible that using 5G, the Cyber Tyre and the car could possibly log and transmit such information back to the manufacturer of the car or tire.

Accelerometers have become dirt cheap over the last decade as smart phones have taken off. So, it was only a matter of time before they found use in new and interesting applications and the Cyber Tyre is just the latest.

Internet of Vehicles

Presumably the car, with Cyber Tyres on it, communicates road hazard information to other cars using 5G and vehicle to vehicle (V2V) communication protocols or perhaps to municipal or state authorities. This way highway signage could display hazardous conditions ahead.

Audi has a website devoted to Car to X communications which has embedded certain Audi vehicles (A4, A5 & Q7), with cellular communications, cameras and other sensors used to identify (recognize) signage, hazards, and other information and communicate this data to other Audi vehicles. This way owning an Audi, would plug you into this information flow.

Pirelli’s Cyber Car Concept

Prior to the Cyber Tyre, Pirelli introduced a Cyber Car concept that is supposedly rolling out this year. This version has tyres with real time pressure, temperature, (static) vertical load and a Tyre ID. Pirelli has been working with car manufacturers to roll out Cyber Car functionality.

The Tyre ID seems to be a file that can include anything that the tyre or automobile manufacturer wants. It sort of reminds me of blockchain data blocks that could be used to validate tyre manufacturing provenance.

The vertical load sensor seems more important to car and tire manufacturers than consumers. But for electric car owners, knowing car weight could help determine current battery load and thereby more precisely know how much charge is left in a battery.

Pirelli uses a proprietary algorithm to determine tread wear. This makes use of the other tyre sensors to predict wear and perhaps uses an AI DL algorithm to do this.

~~~

ABS has been around for decades now and tire pressure sensors for over 10 years or so. My latest car has enough sensors to pretty much drive itself on the highway but not quite park itself as of yet. So it was only a matter of time before something like smart tires would show up.

But given their integration with car electronics systems, it would seem that this would only make sense for new cars that included a full set of Cyber Tyres. That is, until all tire AND car manufacturers agree to come up with a standard protocol to communicate such information. When that happens, consumers could choose any tire manufacturer and obtain similar if not the same functionality from them.

I suppose someone had to be first to identify just what could be done with the electronics available today. Pirelli just happens to be it for now in the tire industry.

I just don’t want to have to upgrade tires every 24 months. And, if I have to wait a long time for my car to boot up and establish communications with my tires, I may just take a (dumb) bike.


Supercomputing 2019 (SC19) conference

I was at SC19 last week and as always there was lots to see on the expo floor and at the show in general. Two expo booths that I thought were especially interesting were:

  • Zapata Computing systems – a quantum computing programming for hire outfit and
  • Cerebras – a new AI wafer scale accelerator chip that sported 400K+ cores in a single package.

Zapata Computing, quantum coding for hire

We’ve been on a sort of quantum thread this past month or so (e.g., see our Quantum computing – part 2 and part 1, The race for quantum supremacy posts). Zapata Computing was at the edge of the exhibit floor in a small booth, pretty much just one guy (Michael Warren) and their booth with some handouts. They must have had something on the booth about quantum computing, because I stopped by.

Warren said they have ~20 PhDs from around the world working for them and provide quantum coding for hire. Zapata works with organizations to either get them up to speed on quantum programming or write quantum programs themselves under contract for clients and help run them on quantum computers.

Zapata’s quantum algorithms are designed to run on any type of quantum computer such as ion trap, superconducting qubit, quantum annealers, etc. They also work with Microsoft Azure Quantum, IBM Q, Rigetti, and Honeywell systems to run quantum programs for customers. Notably missing from this list was Google. Honeywell is new to me but seems active in quantum computing.

Zapata has their own Orquestra quantum toolkit. We have discussed quantum software development kits like IBM Q Qiskit previously, but Microsoft has their own, QDK, and Rigetti has Forest SDK. So, presumably, Orquestra front ends these other development kits. Couldn’t find anything on Honeywell but it’s likely they have their own development kit as well or make use of others.

In talking to Warren at the show, I learned that Zapata is working to come up with a quantum computing cloud, which can be used to run quantum code on any of these quantum computers with the click of a button. Warren made it sound like this was coming out soon.

Some of the Zapata Computing quantum programs they have developed for clients include: logistic simulations, materials design, chemistry simulations, etc.

Warren didn’t mention the cost of running on quantum computers but he said that some companies are more forthright with pricing than others. It seemed Rigetti had a published price list to use their systems but others seemed to want to negotiate price on a per use basis.

It seems only a matter of time before quantum computing becomes just like GPUs. Just another computational accelerator that works well for some workloads but not others. Zapata Computing and Orquestra are just steps along this path.

Cerebras

AI accelerator chips have also been a hot topic for us (see our posts on Google TPU, GraphCore’s system, and Mythic’s and Syntiant’s AI accelerators). But none, with the possible exception of GraphCore, has taken this on to quite the same level as Cerebras.

Cerebras offers a wafer scale chip that is embedded into their CS-1 system. The chip has 400K cores, 18GB of (very fast) SRAM (memory), 100Pb/sec (peta-bits or 10**15 bits per second) of bandwidth and draws ~20kW. Their CS-1 system fits in a standard rack taking up 15U of space.

The on-chip fabric is called SWARM which supports a 2D mesh. The SWARM mesh is entirely configurable, to support optimal neural network connectivity. I assume this means that any core can talk directly (with 0 hops) to any other core on the chip through a configuration setup.

The high speed on chip SRAM supports up to 9PB/sec of memory bandwidth and can be accessed in a single clock cycle. They call the cores Sparse Linear Algebra Compute (SLAC) cores and say that they are optimized to support ML-DL computations, which we assume means floating point arithmetic.
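To get a feel for those numbers, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1GB = 10**9 bytes, 1PB = 10**15 bytes), exactly 400K cores, and an even split of SRAM and bandwidth across cores. None of this was stated by Cerebras.

```python
CORES = 400_000                 # "400K cores" from the figures quoted above
SRAM_BYTES = 18 * 10**9         # 18GB of on-chip SRAM (decimal units assumed)
MEM_BW_BYTES_PER_S = 9 * 10**15 # 9PB/sec of memory bandwidth (decimal units assumed)

# Even split across cores (an assumption, not a vendor statement).
sram_per_core_kb = SRAM_BYTES / CORES / 10**3
mem_bw_per_core_gb_s = MEM_BW_BYTES_PER_S / CORES / 10**9
```

Under those assumptions each core gets about 45KB of SRAM and about 22.5GB/sec of memory bandwidth. So each core has only a small local memory, but a very fast one.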

Although you can’t really see the (wafer scale) chip in the picture above, it’s located in the section between the copper plate and the copper heat sink and starts at the copper line between the two. CS-1 consumes a lot of power and much of its design is to provide proper cooling. One can view some of that on the left side of the picture above.

As for software, Cerebras CS-1 supports TensorFlow and PyTorch as well as standard C++. Their Cerebras Software Platform stack, consists of two layers: the Cerebras Intermediate Representation and Cerebras Graph Compiler (CGC) that feeds their Cerebras Wafer Scale Engine (WSE). The CGC maps neural network nodes to cores on the WSE and probably configures SWARM to provide NN core to NN core connectivity.

It’s great to see hardware innovation again. There was a time when everyone thought that software alone was going to kill off hardware innovation. But the fact is that both need to innovate to take computing forward. Cerebras didn’t tell me any PetaFlop rate for their system, but my guess is it would beat out the 2PFlop GraphCore2 (GC2) system, though it’s only a matter of time before GC3 comes out. That being said, what could be beyond wafer scale integration?

~~~~

I enjoy going to SC19 for all the leading edge technology on display. They have some very interesting cooling solutions that I don’t ever see anywhere else. And the student competition is fun. Teams of students running HPC workloads around the clock, on donated equipment, from Monday evening until Wednesday evening. With (by SC19) spurious fault injection to see how they and their systems react to the faults to continue to perform the work needed.

For every SC conference, they create an SCinet to support the show. This year it supported Tb/sec of bandwidth and the WiFi for the floor and conference. All the equipment and time that goes into creating SCinet is donated.

Unfortunately, I didn’t get a chance to go to keynotes or plenary sessions. I did attend one workshop on container use in HPC and it was completely beyond me. Next year’s SC20 will be in Atlanta.


Clouds an existential threat – part 2

Recall that in part 1, we discussed most of the threats posed by clouds to both hardware and software IT vendors. In that post we talked about some of the more common ways that vendors are trying to head off this threat (for now).

In this post we want to talk about some uncommon ways to deal with the coming cloud apocalypse.

But first, just to put the cloud threat in perspective, the IT TAM is estimated, by one major consulting firm, to be ~$3.8T in 2019 with a growth rate of 3.7% Y/Y. The same number for public cloud spending is ~$214B in 2019, growing by 17.5% Y/Y. If both growth rates continue (a BIG if), public cloud services spend will constitute all (~98.7%) of IT TAM in ~24 years from now. No, nobody would predict those growth rates will continue, but it’s pretty evident the growth trends are going the wrong way for (non-public cloud) IT vendors.
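For anyone who wants to play with those numbers, the projection is just two compound growth curves. The short Python sketch below uses the figures quoted above and assumes both rates stay constant with 2019 as year zero. The variable and function names are mine.

```python
import math

IT_TAM_2019 = 3.8e12   # ~$3.8T total IT spend
IT_GROWTH = 0.037      # 3.7% Y/Y
CLOUD_2019 = 214e9     # ~$214B public cloud spend
CLOUD_GROWTH = 0.175   # 17.5% Y/Y


def cloud_share(years: float) -> float:
    """Public cloud spend as a fraction of IT TAM after a number of years."""
    cloud = CLOUD_2019 * (1 + CLOUD_GROWTH) ** years
    tam = IT_TAM_2019 * (1 + IT_GROWTH) ** years
    return cloud / tam


# Solve cloud_share(n) == 1 for n:
#   n = ln(TAM / cloud) / ln((1 + cloud growth) / (1 + TAM growth))
years_to_parity = math.log(IT_TAM_2019 / CLOUD_2019) / math.log(
    (1 + CLOUD_GROWTH) / (1 + IT_GROWTH)
)
```

With these inputs the cloud starts at under 6% of IT TAM, and `years_to_parity` comes out at roughly 23 years. That is in the same ballpark as the ~24 years quoted above, and the exact figure shifts with the base year and rounding one assumes.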

There are probably an infinite number of ways to deal with the cloud. But outside of the common ones we discussed in part 1, only a dozen or so seem feasible to me and even fewer are viable for present IT vendors.

  • Move to the edge and IoT.
  • Make the data center as easy and cheap to use as the cloud
  • Focus on low-latency, high data throughput, and high performing work and applications
  • Move 100% into services
  • Move into robotics

The edge has legs

Probably the first one we should point out would be to start selling hardware and software to support the edge. Speaking in financial terms, the IoT/Edge market is estimated to be $754B in 2019, and growing at over a 15.4% CAGR.

So we are talking about serious money. At the moment the edge is a very diverse environment, ranging from cameras to sensors to moveable devices. And everybody seems to be in on the act: big industrial firms, small startups and everyone in between. Given this diversity it’s hard to see how IT vendors could make a decent return here. But given that same diversity, one could say it’s ripe for consolidation.

And the edge could use some reference architectures where there are devices at the extreme edge, concentrators at the edge, higher-level concentrators at nodes and more at the core, etc. So there’s a look and feel to it that seems like Ro/Bo, central core hub and spoke architectures, only on steroids, with leaf proliferation that can’t be stopped. And all that data coming in has to be classified, acted upon and understood.

There are a few vendors going into Edge/IoT in a small way, but no one vendor personifies this approach more than Hitachi Vantara. The Hitachi family of companies has a long and varied history in OT (operational technology) or industrial technology. And over the last many years, HDS and now Hitachi Vantara have been pivoting their organization to focus more on IoT and edge solutions and seem to have made IoT, OT and the edge a central part of their overall strategy.

Hitachi Vantara has the advantage of being into industrial technology in a big way, so the products they create already operate in factories, rail yards, ship yards and other industrial sites around the world. So, adding IoT and edge capabilities to their portfolio is a natural extension of this expertise.

However, Hitachi Vantara seems to be focusing on the software side of the edge. This may be an artifact of the Hitachi family of companies’ dynamics. But it seems to be leaving some potential sales on the table.

There are plenty of other big industrial suppliers in this IoT/edge field but none seem to have the IT end of the market that Hitachi Vantara can lay claim to. Some sort of combination of a large IT vendor and a large industrial firm could potentially do the same.

So there’s plenty of money to be made with IoT/Edge hardware and software. One just has to go after it in a big way, and there’s lots of competition. But all the competition seems to be on the same playing field (unlike the public cloud playing field).

Getting to “data center as a cloud”

There are a number of reasons why customers migrate work to the cloud: ease of use, ease of storage, ease of scale, access to myriad applications, access to multi-regional data centers and an OpEx financial model, to name just a few.

There’s nothing that says much of this couldn’t be provided at the data center. It’s mostly just a lot of open source software and a lot of common hardware. IT vendors can do this sort of work if they put their vast resources behind it.

From the pure software side, there are a couple of companies trying to do this, namely VMware and Nutanix, but (IBM) RedHat, (Dell) Pivotal, HPE Simplivity and others are also going after this approach.

Hardware-wise, CI and HCI seem to be rudimentary steps towards common hardware that’s easy to deploy, operate and support. But these baby steps aren’t enough. And delivery to deployment in weeks is never going to get them there. If Amazon can deliver books, mattresses, bicycles, etc. in a couple of days, IT vendors should be able to do the same with some select set of common hardware and have it automatically deployable in seconds to minutes once powered on.

And operating these systems has to be drastically simplified. On any public cloud there’s really no tuning required and almost no configuration, and then it’s just load your data and go. Yes, there’s a marketplace to select (virtual) hardware, (virtual) storage hardware, (virtual) networking hardware, (virtual server) O/S and (virtual?) open source applications.

Yes, there’s lots of software behind all that virtualization. And it’s fundamentally different than today’s virtualized systems. It’s made to operate only on commodity hardware and only with open source software.

The OpEx financial model is less of a problem. Today, I find many vendors are offering their hardware (and some software) on an OpEx, pay as you go model. More of this needs to be made available but the IT vendors see this, and are already aggressively moving in this direction.

The clouds are not standing still, what with Azure Stack, AWS and GCP all starting to provide versions of their stack on prem in the enterprise. This looks to be a strategic battleground between the clouds and IT vendors.

Making everything IT can do in the cloud available in the data center, with common hardware and software and with the same speed and ease of deployment, operations and support (maintenance), should be on every IT vendor’s to-do list.

Unfortunately, this is not going to stop the public cloud completely, but it has the potential to slow the growth rate. But time is short, momentum has moved to the public cloud and I don’t (yet) see the urgency of the IT vendors to make this transition happen today.

Focus on low-latency, high data throughput and high performance work

This is somewhat unfair as all the IT vendors are already involved in these markets in a big way. But there are some trends here that indicate this low-latency market will be even more important over time.

For example, more and more of commercial IT is starting to take advantage of big data and AI to profit from all their data. And big science is starting to migrate to IT, where massive data flows and data analysis tools are becoming important to the data center. If anything, the emergence of IoT and the edge will increase data flows that need to be analyzed, understood, and ultimately dealt with.

DNA genomics may be relegated to big pharma/medical but 3D visualization is becoming so mainstream that I can do it on my desktop. These sorts of things were relegated to HPC/big science just a decade or so ago. What tools exist in HPC today that the IT data center of the future will deem a necessary part of their application workload?

Is this a sizable TAM? Probably not today. In all honesty it’s buried somewhere in the IT TAM above. But it can be a growing niche, where IT vendors can stake out a defensive position that the cloud may have a tough time dislodging.

I say the cloud “may have a tough time dislodging” because nothing says that the entire data flow/work flow couldn’t migrate to the cloud, if the responsiveness was available there. But, if anything, (guaranteed) responsiveness is one of the few Achilles’ heels of the public cloud. Security may be the other one.

We see IBM, Intel, and a few others taking this space seriously. But all IT vendors need to see where they can do better here.

Focus on services

This is not really out-of-the-box thinking. Some (old) IT vendors have been moving into services for over 50 years now; others are just seeing there’s money to be made here. Just about every IT vendor has deployment & support services. Most hardware has break-fix services.

But standalone IT services are more specialized, and in the coming cloud apocalypse, services will revolve around implementing cloud applications and functionality or migrating work to the cloud or (rarely in the future) back to on prem.

TAM for services is buried in the total IT spend but industry analysts estimate that total worldwide TAM for IT services will be about $1.0T in 2019, growing at a 2.6% CAGR.

So services are already a significant portion of IT spend today, and will probably not be impacted by the move to the cloud. I say that because the need to implement applications and services will exist as long as the cloud exists. Yes, it may get simpler (better frameworks, containerization, systemization), but it won’t ever go away completely.

Robots, the endgame

Ok, laugh now. I understand it’s a big ask to think that robot spending could supplement and maybe someday surpass IT spending. But we all have to think long term. What is a self driving car but a robotic data center on wheels, generating TB of data every day it’s driven?

Robots over the next century will invade every space, become ever present and ever necessary to modern world functioning. They will have sophisticated onboard computing, motors, servos and sensors, and both on board and backend processing requirements. The real low-latency workload of the future will be in the (computing) minds of robots.

Even if the data center moves entirely to the cloud, all robotic computation will never reside there because A) it’s too real time and B) it needs to operate well even disconnected from the Internet.

Is all this going to happen in the next 10 or 20 years? Maybe not, but 30 to 50 years out this world will have a multitude of robots operating within it.

Who’s going to develop, manufacture, support and sustain these mobile computing data centers on wheels, legs, slithering and flying bodies?

I would say IT vendors of today are uniquely positioned to dominate this market. Here too, the industry is very fragmented today. There are a few industrial robotic companies and just about every major auto manufacturer is going after self driving cars. And there are many bit players today. So it’s ripe for disruption and consolidation.

Yet, none of the major IT vendors seem to be going after this. Ok Amazon (hardware & software) and Microsoft (software) have done work in this arena. If anything this should tell IT vendors that they need to start working here as well.

But alas, none have taken up the mantle. In the meantime, robot startups are biting the dust left and right, trying to gain market traction.

~~~~

That seems to be about it for the major viable out of the box approaches to the public cloud threat. I have a few other ideas but none seem as useful as the above.

Let me know what you think.


Clouds, an existential threat to vendors – part 1

Was at a conference last month where there was discussion of the “cloudless” future. This is so wrong. Clouds are a threat to every IT hardware and software vendor out there and that threat is not going away.

The hardware side is easy to see.

Clouds threat to IT hardware vendors

On the storage side, the big hyperscalers have adopted software defined storage from the get-go. Smaller ones are migrating that way as well, and it’s even impacting data centers as the big virtualization software vendors release more and more functionality in SwDefStorage.

And on the networking side, the clouds were early adopters of OpenFlow software defined networking. OpenFlow gear still requires specialized hardware but mostly it’s just a server with PCIe accelerator cards that perform high speed switching. Ditto the prior paragraph here, as the virtualization vendors are also moving their networking to SwDefNetworking.

Luckily for servers there’s no such thing as a SwDefServer, yet. But server offerings are under just as big a threat from the cloud. Hyperscalers are sophisticated enough to design their own server hardware and have it manufactured to spec. The smaller ones can make use of whitebox servers. Both of them, at the volumes they consume servers, can force a race to the bottom on pricing.

So server vendors are being relegated to the data center for the most part. And as data center servers become more powerful, virtualized environments need fewer of them.

The threat to IT software vendors

Make no mistake about it, software is under just as much threat as hardware. AWS and Oracle are probably the best example of how this works. Oracle was once a profitable niche market on AWS. Today, Oracle is not even available on AWS marketplace anymore.

This sort of dynamic can happen to any solution where acceptable open source alternatives exist. With the cloud’s sophistication and volumes they can easily take the sting out of using open source by providing ease of deployment, use and maintenance along with performance scalability. That makes running open source on clouds as easy as any packaged solution.

Internet Splat Map by jurvetson (cc) (from flickr)

Granted, the cloud may not offer the support or hand-holding one obtains with packaged software. But open source can be very responsive to bugs/security exposures. Cloud providers can take the time to make their open source solutions bullet proof. And with 1000s to 10,000s of users running them at scale, it should be easy enough to find any high profile bugs.

Even all those software vendors that make software that executes only on the cloud, to make it run better, be more secure or to add some unique functionality, are at risk. All these vendors ultimately will suffer “death from marketplace success”. As they become successful, and cloud vendors know inherently how successful they are, they become more interesting to the cloud. Over time the more successful solutions will attract functionally-equivalent, open source alternatives from the cloud provider that will push them out of the cloud’s marketplace.

Dealing with the threat to hardware vendors

Hardware vendors have four grand strategies to address the cloud threat.

  1. Stick head in sand, hope it goes away (or at least takes a long time to kill them off). There are still some major vendors with this mindset. Yes, slowly but surely they are coming around to see the light but they think they have a long enough runway to hold on until something better comes along.
  2. Co-opt the cloud by providing unique, hardware capabilities in their cloud environment. There are a few hardware vendors that have adopted this strategy. This buys them more time as they can depend on current data center revenues and over time augment this with cloud revenues.
  3. Join the race to the bottom to become a supplier to clouds. Most hardware vendors started out in a highly competitive environment, but over time they have lost their way (found a higher profitability niche). But lurking in their past somewhere, there’s a competitiveness streak that’s dying to come out. The race to the bottom may never be as profitable as data centers but there’s significant revenue to be had here.
  4. Co-opt the cloud by providing services that span multiple clouds. Not exactly creating a hybrid cloud but rather providing a multi-cloud hardware service. Hardware functionality that can be accessed from multiple clouds can enjoy some advantages of the cloud but at the same time generate data center like revenues.

I may be missing some grand hardware vendor strategies but as I’ve talked with hardware vendors over time these seem to be the main ones moving ahead.

I’ve tried a couple of times to talk to vendors in the #1 mindset above about the futility of their approach. Mostly, I get ignored or at best politely brushed off as being alarmist. Their main hope is that the data center continues on in the present environment and that they can retain their dominance there.

Maybe they have a point. The 1960s mainframe environment still exists today. And IBM still remains dominant there, and generates profits there. But it just doesn’t matter that much to IT anymore. IT has moved on.

Richard (Dick) Nafzger with Apollo data tape by Goddard Photo and Video (cc) (from flickr)

Something similar will happen to IT’s data center. Yes it will still exist forever, and perhaps some vendors can continue to profit there.

But the vast majority of IT workloads will be moving to the cloud over time, relegating this to a smaller (proportionally) niche market. They’ve been saying tape is dead since 1967, but it’s still alive, it’s just moved from being a large market to a smaller one (proportionally).

The #2 mindset vendors have a clearer view of what’s happening with the cloud. They are moving select hardware functionality out to the cloud as soon as they are able. Some are even placing their hardware in cloud provider availability zones (data centers) to support this. We all hope they enjoy lasting success doing this.

But ultimately they too shall suffer the same fate as software vendors above, due to the cloud’s death by marketplace success. The more successful they become, the higher the likelihood that the cloud providers will go after them with their own functionally-equivalent, software defined solution.

I’m not privy to the contracts between hardware vendors and cloud providers, but perhaps this latter transition, to outright competition, can be forestalled by terms that make the cloud providers reluctant to compete with them. But hardware success can only lead to more cloud interest and no contract can protect against every contingency.

Those vendors adopting the #3 mindset have to return to their competitive roots. Doing this will never be as profitable as today’s data center. So the transition will be painful, but they need to do this soon, while they still have some profits coming from data center sales. The sooner they can deploy these $s to fix supply chains and manufacturing quality/production, and drastically slim down marketing and sales, the faster they can start supplying the clouds with appropriate hardware. Profitability will suffer early on and it may never fully recover.

The #4 mindset applies equally well to software vendors as well as hardware vendors but the hardware group seems to be doing this already. Many storage vendors have multi-cloud solutions with hardware positioned in cloud-adjacent facilities that can be accessed from multiple clouds. Such services have to be consumable like any cloud service. But once in place they have a unique value proposition, the ability to move work and data from one cloud to another.

But the only thing stopping cloud providers from doing something similar is that they don’t want to help any current user move to a different cloud. Again, depending on how successful this multi-cloud approach becomes, there’s nothing prohibiting the cloud providers from providing similar functionality.

Dealing with the threat to software vendors

Software vendors see 4 grand strategies to deal with the cloud threat:

  1. If you can’t beat them, join them, and create your own cloud. IBM exemplifies this best but one can see this with Microsoft, Oracle, SAP and others. If they can create their own cloud, they can start to compete with cloud providers on an equal footing. Yes, they will be smaller, but they can enjoy many of the same benefits of bigger clouds, just not as much.
  2. Offer their software services/stack on the cloud providers. This is similar to the hardware vendors’ #2 mindset. Yet this has suffered from death by marketplace success since its inception.
  3. Co-opt the cloud by providing services that fuse the data center and the cloud environments, thus creating hybrid cloud solutions that span the data center-cloud environment, which seem to have a longer runway. But this lasts only as long as the data center is a significant market.
  4. Co-opt the cloud by providing services that span cloud provider vendors. Multi-cloud solutions are more apparent for hardware, but nothing prohibits a software vendor from offering services that span clouds.

I may be missing a few grand strategies here but these seem to be the major ones software vendors are using to deal with the cloud. And just like hardware vendors above, much of the success of these strategies (at least #2, 3 & 4) depends on flying under the radar of cloud providers. Limiting your success may give you some time to eke out a decent revenue/profitability stream, while the cloud provider kills off the more successful solutions ahead of you. But you’re all living on borrowed time.

The most interesting one is #1. Yes, economies of scale will matter, which will make their long term viability a concern. But at least they can be on the same playing field. Most of these companies have sizable treasure chests and if they invest serious money in their own clouds, they may have a chance to survive.

Cloud providers are taking their time

The other thing that’s prolonging the data center’s, and correspondingly vendors’, existence is cloud providers’ expenses. With all their hardware volumes, use of white box or own designed hardware and open source software, does it make any sense that IT could provide matching services in data centers by themselves?

But something is chewing up their revenues. Maybe it’s marketing, customer acquisition, software/hardware development or support expenses. I tend to think it’s trying to keep pace with customer growth. They end up having to anticipate this growth ahead of time and position hardware, software and services before the customers exist to use them.

I don’t think there’s anything more mysterious to their lack of profitability than that. They all want all the customers they can get. They all have significant growth and they are all charging a premium for their service. However, I may be wrong.

But how long can such hyper-growth last? At some point, as more and more IT organizations move to the cloud, this growth will slow, prices will start to come down and it will set off a vicious cycle: more cloud success brings more volumes and less overhead, which should lower prices, which brings more cloud success.

More cloud success brings less volumes for hardware and software vendors, more overhead and ultimately higher prices.

None of the above solutions seem that attractive to hardware or software vendors but I see only a few ways forward for all of them.

In part 2, I’ll discuss some out of the box strategies that move beyond the data center and the cloud and that may be just what hardware and software vendors need to take the cloud on.

Comments?

For data that never rests, NetApp NDAS

NetApp co-founder Dave Hitz announced he was becoming a NetApp Founder Emeritus at the Storage Field Day (SFD18) show. He gave a great session about what he and his Hitz foundation have been doing (for one example see our Archeology meets big data post). He also discussed at length what he felt the storage world (and NetApp) must do to address the opportunities of the new cloud world. But this post isn’t about Dave, it’s about NetApp Data Availability Service, NDAS.

NetApp NDAS, currently in Beta but GAing (hopefully) later this year, is an AWS marketplace data orchestration solution that manages primary to secondary to S3 movement for ONTAP data. Essentially, NetApp Data Availability Services extends ONTAP data lifecycle management to AWS cloud. But it’s more than just a way to archive ONTAP data.

NDAS orchestrates Snapmirror services across ONTAP systems and AWS. But once your ONTAP data is in S3 it supplies access to that data for authorized AWS applications and services. That way one can use their ONTAP data to provide data analytics, train AI models, and do just about anything you can do with AWS applications today. By using NDAS, customers can extract more value from their ONTAP data.

NDAS is not just copying data to S3 but is also copying ONTAP metadata, catalogues and other information that provides context for that data. By copying ONTAP catalog information, customers and authorized end users can have file level access to ONTAP data residing in S3 objects.

NDAS today, only supports copying data from secondary ONTAP systems to S3. But a future enhancement will expand this to copy primary ONTAP data to S3.

How does NDAS work

NDAS provisions (your) EC2 instances and middleware to read the data from the secondary systems and copy it to S3 buckets which you provide. After initial configuration to point to your ONTAP secondary storage systems, NDAS will autodiscover all the data available that can be copied to the cloud.

NDAS will start cataloguing your ONTAP data. NDAS EC2 instances support the NDAS copy, view and Google-like search processes.

NDAS search presents a simplified file system view into your ONTAP data copied to S3. That way customers can identify data that could be used for AI training or data analytics that run in the cloud to access the data.
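To make the idea of a catalog-backed file system view concrete, here is a toy sketch. It is not NDAS’s API or catalog format, neither of which was described in the session; every name below (`CatalogEntry`, `search`, the bucket and paths) is hypothetical. It only illustrates how copied catalog metadata could let a user search by file path and find the S3 object that holds the data.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical catalog record: a source file and where its copy lives in S3."""
    file_path: str    # path as it appeared on the source ONTAP volume
    bucket: str       # S3 bucket holding the copy
    object_key: str   # S3 object key holding the file's data
    size_bytes: int


def search(catalog, pattern):
    """Return the catalog entries whose file path matches a shell-style pattern."""
    return [entry for entry in catalog if fnmatchcase(entry.file_path, pattern)]


# Made-up sample catalog, purely for illustration.
catalog = [
    CatalogEntry("/vol/stadium/2018/pitches_game1.csv", "my-ndas-bucket", "obj/0001", 52_000),
    CatalogEntry("/vol/stadium/2018/pitches_game2.csv", "my-ndas-bucket", "obj/0002", 48_500),
    CatalogEntry("/vol/finance/budget.xlsx", "my-ndas-bucket", "obj/0003", 12_300),
]

if __name__ == "__main__":
    for hit in search(catalog, "/vol/stadium/*.csv"):
        print(hit.file_path, "->", f"s3://{hit.bucket}/{hit.object_key}")
```

The point is only that the file-level view comes from the metadata, not from the S3 object names themselves, which is why copying the catalog along with the data matters.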

There’s extensive security to ensure that NDAS is properly authorized to access your ONTAP data. Normal S3 security options also apply, such as having the data encrypted on S3. NDAS data is automatically encrypted in flight.

Moreover, NDAS S3 bucket data can be replicated across AWS regions. Also, serverless/Lambda functionality is fully supported from NDAS S3 buckets.

What can it do with the data

AWS applications can access the data directly through NDAS APIs. Or customers can manually extract data they want to further process, using the NDAS GUI to identify and copy data of interest. NDAS essentially creates a small app layer that allows users to view and access the ONTAP data in S3 as a file system.

One can have different NDAS AMIs operating in different regions for faster access or to support GDPR compliance requirements. Alternatively, a customer could have one NDAS AMI accessing all their secondary ONTAP instances.

NDAS is intended to provide a data analyst or IT generalist access to ONTAP data. This way AI training and big data analytics applications which run easily in the cloud, can have access to ONTAP data. In this way, customers can more effectively utilize data that IT has been storing and maintaining, since time began.

One NDAS beta customer is an MLB team. They have over time instrumented their stadiums to generate lots of data about pitch speed, rotation, ball location as it crosses the plate, etc. The problem is that all this data is siloed in the on-prem or IoT systems that generated it. But the customer wants to use the data to improve players, coaches and the viewer experience. And all that needs tools, applications and software that are just not available to run in the data center. But with NDAS all this data is now available to cloud applications.

NDAS is supported by any ONTAP 9.5 or later (FAS, AFF, Cloud ONTAP, ONTAP Select) secondary storage system. ONTAP 9.5 software contains all the services required to support NDAS. This includes the copy-to-cloud APIs, as well as the NDAS proxy, which supplies the secure interface to NDAS operating in the cloud.

NetApp’s NDAS sessions are pretty informative. Anyone interested in finding out more should check out the videos available on the TechFieldDay website. Dave’s session is also worth a view.

For more information on Dave’s session and NDAS check out:

NetApp, Cloudier than ever by Enrico Signoretti (@ESignoretti)

NetApp and the space in between by Dan Frith (@PenguinPunk)

~~~~

Comments?