Hammerspace and the Open Flash Platform at #AIIFD3

Was at AI Infrastructure Field Day 3 (AIIFD3) last week in CA and Hammerspace presented. (videos here). Molly and Floyd talked about their solution and some of their recent MLCommon’s performance results but Kurt discussed the Open Flash Platform (OFP) Consortium, announced last July which they and partners have been working on..

OFP currently has 6 partners ranging from Hammerspace (storage software supplier), SK Hynix (NAND and SSDs) and Linux Foundation among others and includes end users (Las Alamos National Labs), computational storage (ScaleFlux) and AI solution providers (Xsight).

As I understand it, the OFP is pushing to become a standard adopted by the Open Compute Project (OCP).

OFP is an attempt to redefine NAS as we know it. Hammerspace has been on this journey for a long time with their software only solution but technology is now at a place where it’s time to tackle hardware changes to NAS that would enable even better performance and throughput for AI and other data intensive workloads.

Some of the technology changes driving the need for a different approach to NAS storage:

  • NAND capacities are going through the roof, accessing all that capacity in an effective and performant way, requires a re-architecturing of the storage stack
  • Compute is becoming more widespread and ubiquitous. Every thing seems to have more and more compute capability that it’s causing a rethink as to how to take advantage of all this ubiquitous compute to better address IT (and AI) performance needs
  • AI bandwidth and performance requirements are extreme and are only becoming more so. .
  • Power has become a limiting factor in many AI deployments.

Hammerspace has addressed much of this from a software perspective with their Linux standards efforts to implement Parallel File System and Flex Files in the Linux kernel and in NFS standards as NFSv4.2. PFS and FlexFiles allows Hammerspace to offer very high file bandwidth and data mobility that can’t be supplied any other way.

So it’s time to see what can be done in hardware to make this even better. Enter OFP.

OFP, NAS storage reborn

The idea is to come up with a new packaging of an NFS (v3) server that’s all storage with high amounts of networking and enough compute to serve the storage. Effectively they are putting a DPU (computational intensive networking card) with 1-800Gbps Ethernet connection in front of a train (or toboggan) of NVMe SSDs and calling this a sled.

Their first version using U.2 NVMe SSDs, offers 1PB of capacity with 800Gbps of networking in a 3.5″ X 1.75″ form factor. They would load a NFS v3 Linux based storage server in the DPU and have it run that along with the Networking stack (and more) on the DPU and have access to all this storage capacity in what essentially is a NFSv3 (relatively dumb storage) storage sled.

Package 6 of these together with a couple of power supplies and now you have 6PB raw capacity in 1RU, with 4.8Tbps of bandwidth, consuming .6 kW of power (presumably this is power consumption at idle).

You will no doubt note that the sled, as configured above, does not allow for hot (or even cold) drive replacement. So when drives fail, the NFSv3 code would need to recover from them and take them out of service. So that over time the sled could still be used even though some SSDs have failed.

In the future, moving from U.2 SSDs to E2(E) NVMe SSDs in the storage sled quadruples the capacity while staying in the same power envelope and supplying the same bandwidth. Again the SSDs are not intended to be (hot or cold) swappable, so drive failure would need to be handled by software. With E2(E) SSDs in a sled and 6 of these in a 1RU, one would have 24PB of storage capacity.

Presumably, OFP Sleds could be hot swappable when enough SSDs in a sled fails.

And of course QLC capacities are not standing still so another doubling of these capacities could easily be possible within the next couple of years (imagine 48PB in a single RU, boggles the mind).

The NAS software one runs in the OFP SLED could be any NFSv3 server software but Hammerspace has their own, called DSX. And when you combine DSX servers with lots of capacity and lots of networking bandwidth, Hammerspace’s NFSv4.2 PFS and FlexFiles can really fly.

And with the power and space efficiency as well as extreme bandwidth available, it could be a winning formula for the AI environments, in contrast to scale-out NAS which is the current alternative.

~~~~

But it seems to me any organization (hypervisors are you listening) with intense storage capacity and storage bandwidth needs would be very interested in the OFP for their own environment.

Comments?

AI benchmark for Storage, MLpERF Storage

MLperf released their first round of storage benchmark submissions early this month. There’s plenty of interest how much storage is required to keep GPUs busy for AI work. As a result, MLperf has been busy at work with storage vendors to create a benchmark suitable to compare storage systems under a “simulated” AI workload.

For the v0.5 version ,they have released two simulated DNN training workloads one for image segmentation (3D-Unet [146 MB/sample]) and the other for BERT NLP (2.5 KB/sample).

The GPU being simulated is a NVIDIA V100. What they showing with their benchmark is a compute system (with GPUs) reading data directly from a storage system.

By using simulated (GPU) compute, the benchmark doesn’t need physical GPU hardware to run. However, the veracity of the benchmark is somewhat harder to depend on.

But, if one considers, the reported benchmark metric, # supported V100s, as a relative number across the storage submissions, one is on more solid footing. Using it as a real number of V100s that could be physically supported is perhaps invalid.

The other constraint from the benchmark was keeping the simulated (V100) GPUs at 90% busy. MLperf storage benchmark reports, number of samples/second,MBPS metrics as well as # simulated (V100) GPUs supported (@90% utilization).

In the bar chart we show the top 10 # of simulated V100 GPUs for image segmentation storage submissions, DDN AI400X2 had 5 submissions in this category.

The interesting comparison is probably between DDN’s #1 and #3 submission.

  • The #1 submission had smaller amount of data (24X3.5TB = 64TB of flash), used 200Gbps InfiniBand, with 16 compute nodes and supported 160 simulated V100s.
  • The #3 submission had more data (24X13.9TB=259TB of flash),used 400Gbps InfiniBand with 1 compute node and supports only 40 simulated V100s

It’s not clear why the same storage, with less flash storage, and slower interfaces would support 4X the simulated GPUs than the same storage, with more flash storage and faster interfaces.

I can only conclude that the number of compute nodes makes a significant difference in simulated GPUs supported.

One can see a similar example of this phenomenon with Nutanix #2 and #6 submissions above. Here the exact same storage was used for two submissions, one with 5 compute nodes and the other with just 1 but the one with more compute nodes supported 5X the # of simulated V100 GPUs.

Lucky for us, the #3-#10 submissions in the above chart, all used one compute node and as such are more directly comparable.

So, if we take #3-#5 in the chart above, as the top 3 submissions (using 1 compute node), we can see that the #3 DDN AI400X2 could support 40 simulated V100s, the #4 Weka IO storage cluster could support 20 simulated V100s and the #5 Micron NVMe SSD could support 17 simulated V100s.

The Micron SSD used an NVMe (PCIe Gen4) interface while the other two storage systems used 400Gbps InfiniBand and 100Gbps Ethernet, respectively. This tells us that interface speed, while it may matter at some point, doesn’t play a significant role in determining the # simulated V100s.

Both the DDN AI4000X2 and Weka IO storage systems are sophisticated storage systems that support many protocols for file access. Presumably the Micron SSD local storage was directly mapped to a Linux file system.

The only other MLperf storage benchmark that had submissions was for BERT, a natural language model.

In the chart, we show the # of simulated V100 GPUs on the vertical axis. We see the same impact here of having multiple compute nodes in the #1 DDN solution supporting 160 simulated V100s. But in this case, all the remaining systems, used 1 compute node.

Comparing the #2-4 BERT submissions, both the #2 and #4 are DDN AI400X2 storage systems. The #2 system had faster interfaces and more data storage than the #4 system and supported 40 simulated GPUs vs the other only supporting 10 simulated V100s.

Once again, Weka IO storage system came in at #3 (2nd place in the 1 compute node systems) and supported 24 simulated V100s.

A couple of suggestions for MLperf:

  • There should be different classes of submissions one class for only 1 compute node and the other for any number of compute nodes.
  • I would up level the simulated GPU configurations to A100 rather than V100s, which would only be one generation behind best in class GPUs.
  • I would include a standard definition for a compute node. I believe these were all the same, but if the number of compute nodes can have a bearing on the number of V100s supported, the compute node hardware/software should be locked down across submissions.
  • We assume that the protocol used to access the storage oven InfiniBand or Ethernet was standard NFS protocols and not something like GPUDirect storage or other RDMA variants. As the GPUs were simulated this is probably correct but if not, it should be specfied
  • I would describe the storage configurations with more detail, especially for software defined storage systems. Storage nodes for these systems can vary significantly in storage as well as compute cores/memory sizes which can have a significant bearing on storage throughput.

To their credit this is MLperfs first report on their new Storage benchmark and I like what I see here. With the information provided, one can at least start to see some true comparisons of storage systems under AI workloads.

In addition to the new MLperf storage benchmark, MLperf released new inferencing benchmarks which included updates to older benchmark NN models as well as a brand new GPT-J inferencing benchmark. I’ll report on these next time.

~~~~

Comments?

CTERA, Cloud NAS on steroids

We attended SFD22 last week and one of the presenters was CTERA, (for more information please see SFD22 videos of their session) discussing their enterprise class, cloud NAS solution.

We’ve heard a lot about cloud NAS systems lately (see our/listen to our GreyBeards on Storage podcast with LucidLink from last month). Cloud NAS systems provide a NAS (SMB, NFS, and S3 object storage) front-end system that uses the cloud or onprem object storage to hold customer data which is accessed through the use of (virtual or hardware) caching appliances.

These differ from file synch and share in that Cloud NAS systems

  • Don’t copy lots or all customer data to user devices, the only data that resides locally is metadata and the user’s or site’s working set (of files).
  • Do cache working set data locally to provide faster access
  • Do provide NFS, SMB and S3 access along with user drive, mobile app, API and web based access to customer data.
  • Do provide multiple options to host user data in multiple clouds or on prem
  • Do allow for some levels of collaboration on the same files

Although admittedly, the boundary lines between synch and share and Cloud NAS are starting to blur.

CTERA is a software defined solution. But, they also offer a whole gaggle of hardware options for edge filers, ranging from smart phone sized, 1TB flash cache for home office user to a multi-RU media edge server with 128TB of hybrid disk-SSD solution for 8K video editing.

They have HC100 edge filers, X-Series HCI edge servers, branch in a box, edge and Media edge filers. These later systems have specialized support for MacOS and Adobe suite systems. For their HCI edge systems they support Nutanix, Simplicity, HyperFlex and VxRail systems.

CTERA edge filers/servers can be clustered together to provide higher performance and HA. This way customers can scale-out their filers to supply whatever levels of IO performance they need. And CTERA allows customers to segregate (file workloads/directories) to be serviced by specific edge filer devices to minimize noisy neighbor performance problems.

CTERA supports a number of ways to access cloud NAS data:

  • Through (virtual or real) edge filers which present NFS, SMB or S3 access protocols
  • Through the use of CTERA Drive on MacOS or Windows desktop/laptop devices
  • Through a mobile device app for IOS or Android
  • Through their web portal
  • Through their API

CTERA uses a, HA, dual redundant, Portal service which is a cloud (or on prem) service that provides CTERA metadata database, edge filer/server management and other services, such as web access, cloud drive end points, mobile apps, API, etc.

CTERA uses S3 or Azure compatible object storage for its backend, source of truth repository to hold customer file data. CTERA currently supports 36 on-prem and in cloud object storage services. Customers can have their data in multiple object storage repositories. Customer files are mapped one to one to objects.

CTERA offers global dedupe, virus scanning, policy based scheduled snapshots and end to end encryption of customer data. Encryption keys can be held in the Portals or in a KMIP service that’s connected to the Portals.

CTERA has impressive data security support. As mentioned above end-to-end data encryption but they also support dark sites, zero-trust authentication and are DISA (Defense Information Systems Agency) certified.

Customer data can also be pinned to edge filers, Moreover, specific customer (director/sub-directorydirectories) data can be hosted on specific buckets so that data can:

  • Stay within specified geographies,
  • Support multi-cloud services to eliminate vendor lock-in

CTERA file locking is what I would call hybrid. They offer strict consistency for file locking within sites but eventual consistency for file locking across sites. There are performance tradeoffs for strict consistency, so by using a hybrid approach, they offer most of what the world needs from file locking without incurring the performance overhead of strict consistency across sites. For another way to do support hybrid file locking consistency check out LucidLink’s approach (see the GreyBeards podcast with LucidLink above).

At the end of their session Aron Brand got up and took us into a deep dive on select portions of their system software. One thing I noticed is that the portal is NOT in the data path. Once the edge filers want to access a file, the Portal provides the credential verification and points the filer(s) to the appropriate object and the filers take off from there.

CTERA’s customer list is very impressive. It seems that many (50 of WW F500) large enterprises are customers of theirs. Some of the more prominent include GE, McDonalds, US Navy, and the US Air Force.

Oh and besides supporting potentially 1000s of sites, 100K users in the same name space, and they also have intrinsic support for multi-tenancy and offer cloud data migration services. For example, one can use Portal services to migrate cloud data from one cloud object storage provider to another.

They also mentioned they are working on supplying K8S container access to CTERA’s global file system data.

There’s a lot to like in CTERA. We hadn’t heard of them before but they seem focused on enterprise’s with lots of sites, boatloads of users and massive amounts of data. It seems like our kind of storage system.

Comments?

Storageless data!?

I (virtually) attended SFD21 earlier this year and a company called Hammerspace presented discussing their vision for storageless data (see videos of their session at SFD21).

We’ve talked them before but now they have something to offer the enterprise – data mobility or storageless data.

The white board after David Flynn’s session at SFD8

In essence, customers want to be able to run their workloads wherever it makes the most sense, on prem, in private cloud, and in the public cloud among other places. Historically, it’s been relatively painless to transfer an application’s binary from one to another data center, to a managed service provider or to the public cloud.

And with VMware Cloud Foundation, Kubernetes, Docker and Linux operating everywhere, the runtime environment and other OS services that applications depend on are pretty much available in any of those locations. So now customers have 2 out of 3, what’s left?

It’s all about the Data

Data can take a very long time to move around a data center, let alone across the web between locations. MBs and even GBs of data may be relatively painless to move, but TBs of data can be take days, and moving PBs of data is suicidal.

For instance, when we signed up for a globally accessible file synch and share storage service, I probably had 75GB or so of data I wanted managed. It took literally several days of time to upload this. Yes, I didn’t have data center class internet access, but even that might have only sped this up 2-5X. Ok, now try this with 1TB or more and it’s pretty much going to take days, and you can easily multiple that by 10 to do a PB or more. And that’s if it happens to continue to perform the transfer without disruption.

So what’s Hammerspace storageless data got to do with any of this.

Hammerspace’s idea

It’s been sort of a ground truth of storage, since I’ve been in the industry (40+ years now), that not all random IO data is accessed at the same frequency. That is, some data is accessed a lot and other data accessed hardly at all. That’s why DRAM caching of data can be so important to a host or storage system.

Similarly for sequential access, if you can get the first blocks of data to the host and then stream the rest in time, a storage system can appear to read fast.

Now I won’t go into all the tricks of doing good data caching, (the secret sauce to every vendor’s enterprise storage), but if you can appear to cache data well, you don’t actually need to transfer all the data associated with an application to a location it’s running in, you can appear as if all the data is there, when actually only some of it is present.

Essentially, Hammerspace creates a global file system for your data, across any locations you wish to use it, with great caching, optimized data transfer and with real storage behind it. Servers running your applications mount a Hammerspace file system/share that stitches together all the file storage behind it, across all the locations it’s operating in.

An application request goes to Hammerspace and if the data is not present there, Hammerspace goes and fetches and caches blocks of data as fast as it can. This will let the application start performing IO while the rest of the data is being cached and if allowed, moved to the new location.

Storage can be not managed by Hammerspace, read-write managed by Hammerspace or read-only managed by Hammerspace. For customers who want the whole Hammerspace storageless data functionality they would use read-write mode. For those who just want to access data elsewhere read-only would suffice. Customers who want to continue to access data directly but want read access globally, would use the read-only mode.

Once read-write storage is assigned to Hammerspace grabs all the file metadata information on the storage system. Once this process completes, customers no longer access this file data directly, but rather must access it through Hammerspace. At that point, this data is essentially storageless and can be accessed wherever Hammerspace services are available.

How does Hammerspace do it

Behind the scenes is a lot of technology. Some of which is discussed in the SFD21 sessions (see video’s above). Hammerspace is not in the data path but rather in the control path of data access. But it does orchestrate data movement, and it does route data IO requests from an application to where the data (currently) resides.

Hammerspace also supports Service Level Objectives (SLOs) for performance, geolocation, security, data protection options,, etc. These can be used to keep data in particular regions, to encrypt data (using KMIP), ensure high performance, high data availability, etc.

Hammerspace can manage data across 32 separate sites. It takes a couple of hours to deploy. per site. Each site has a Hammerspace metadata service with standalone access to all data within that site. For example, standalone access could be used, in the event of a network loss.

At the moment, they support eventual consistency and don’t support a global lock service. Rather, Hammerspace uses a conflict resolution service in the event data is overwritten by two or more applications. For any file that was being updated in two or more locations, that file would be flagged as in conflict, Hammerspace would provide snapshots of the various versions of the file(s) and it would require some sort of manual intervention to resolve the conflict. Each location would have (temporary) access to the data it had written directly, but at some point the conflict would need resolution.

They also support NFS and SMB file access for the front end and use object storage services for backend data. Data is copied on demand to the local site’s storage when accessed based on the SLO policies in effect for it. During data movement it is copied up, temporarily into objects on AWS, Microsoft Azure, or GCP, and then copied down to the location it’s being moved to. I believe this temporary object data is encrypted and compressed. Hammerspace support KMIP key providers.

Pricing for Hammerspace is on a managed capacity basis. But anyone can use Hammerspace for up to 10TB for free. Hammerspace is available in AWS marketplace for configuration there.

~~~~

Well it’s been a long time coming, but it appears to be here. Any customers wanting hybrid-cloud operations or global access to their data would be remiss to not check out Hammerspace.

[Edited after posting, The Eds.]

Storage that provides 100% performance at 99% full

A couple of weeks back we were talking with Qumulo at Storage Field Day 20 (SFD20) and they made mention that they were able to provide 100% performance at 99% full. Please see their video session during SFD20 (which can be seen here). I was a bit incredulous of this seeing as how every other modern storage system performance degrades long before they get to 99% capacity.

So I asked them to explain how this was possible. But before we get to that a little background on modern storage systems would be warranted.

The perils of log structured file systems

Most modern storage systems use a log structured file system where when they write data they write it to a sequential log and use a virtual addressing scheme to show where the data is located for that address, creating a (data) log of written blocks.

However, when data is overwritten, it leaves gaps in these data logs. These gaps need to be somehow recycled (squeezed out) in order to be able to be consumed as storage capacity. This recycling process is commonly called “garbage collection”.

Garbage collection does its work by reading heavily gapped log files and re-writing the old, but still current, data into a new log. This frees up those gaps to be reused. But garbage collection like this takes reading and writing of logs to free up space.

Now as log structured file systems get (70-80-90%) full, they need to spend more and more system time and effort (=performance) garbage collecting . This takes system (IO) performance away from normal host IO activity. Which is why I didn’t believe that Qumulo could offer 100% IO performance at 99% full.

But there was always another way to supply storage virtualization (read snapshotting) besides log files. Yes it might involve more metadata (table) management, but what it takes in more metadata, it gives back by requiring no garbage collection.

How Qumulo does without garbage collection

Qumulo has a scaled block store for a back end of their file and object cluster store. And yes it’s still a virtualized block store BUT it’s not a log structured file store.

It seems that there’s a virtual-to-physical mapping table that is used by Qumulo to determine the physical address of any virtual block in the file system. And files are allocated to virtual blocks directly through the use of B-tree metadata. These B-trees indicate which virtual blocks are in use by a file and its snapshots

If a host overwrites a data block. The block can be freed (if not being used in a snapshot) and placed on a freed block list and a new block is allocated in its place. The file’s allocated blocks b-tree is updated to reflect the new block and that’s it.

For snapshots, Qumulo uses something they call “write-out-of-place” process when data that a snapshot points to is overwritten. Again, it appears as if snapshots are some extra metadata associated with a file’s B-tree that defines the data in the snapshot.

The problem comes in when a file is deleted. If it’s a big enough file (TB-PB?), there could be millions to billions of blocks that have to be freed up. This would take entirely too long for a delete command, so this is done in the background. Qumulo calls this “reclaim delete“. So a delete of a big file unlinks the block B-tree from the directory and puts it on this reclaim delete work queue to free up these blocks later. Similarly, when a big snapshot is deleted, Qumulo performs a background process called “reclaim snapshot” for snapshot unique blocks.

As can be seen (it’s very hard to see given the coloration of the chart) from this screen shot of Qumulo’s session at SFD20, reclaim delete and reclaim snapshot are being done concurrently (in the background) with normal system IO. What’s interesting to note here is that reclaim IO (delete and snapshots) are going on all the time during the customers actual work. Why the write throughput drops significantly doing the the 27-29 of July is hard to understand. But the one case where it’s most serious (middle of July 28) reclaim IO also drops significantly. If reclaim IO were impacting write performance I would have expected it to have gone higher when write throughput went lower. But that’s not the case. From what I can see in the above reclaim IO has no impact on read or write throughput at this customer.

So essentially, by using a backing block store that does no garbage collection (not using a log structured file system), Qumulo is able to offer 100% system IO performance at 99% full – woah.

Can we back up a PB?

Tradition says no way. IT backup history says not on your life. Common sense would say never in a million years.

Most organizations with PB of data or more, depend on remote replication to protect against data center outage or massive loss of data. This of course costs ~2X your original data center. And for some organizations one copy is not enough, so ~3X .

I don’t know what a PB scale data storage costs these days but I can’t believe it’s under a couple Million $ USD in hw and sw costs and probably at least another Million or so in OpEx/year. Multiply that by 2 or 3X and you’re now talking real money.

How could backup help?

Well for one you wouldn’t need replicas, so that would cut your hw & sw acquisition costs by a factor of 2 or 3. But backup storage is not free either. So you’d probably need to add back 30-50% of the original data center in hw & sw costs for backups.

You certainly wouldn’t need as many admins. And power for backup storage should also be substantially less. So maybe your OpEx would only be 1.5X in total for the original PB and its backups.

But what could possibly back up a PB of data?

We were talking with Igneous at Cloud Field Day 8 (CFD8, see their video here)  a couple of weeks back and they said they could and do backup PBs of data for customers today. A while back, e also talked with them on a GreyBeards on Storage podcast.

The problems with backing up a PB seem insurmountable. First you have to be able to scan a PB of data. This means looking into multiple file systems on many different hardware platforms, across potentially multiple data centers, and that’s just to get a baseline of what all needs to be backed up.

Then at some point you actually have to store all that data on backup storage. So, to gain some cost advantage, you’d want to compress and deduplicate a PB of data, so that the first full backup wouldn’t take a full PB of backup storage.

Then of course you have to transfer a PB of data to your backup storage, in something that wouldn’t take months to perform. And that just gets you the first full backup.

Next, comes the daily scan of what’s changed. This has to re-scan your PB of data to find that 100TB or so, that’s changed over the last 24 hrs. Sometime after that scan completes, then all that 100TB or so of changed data needs to be compressed, deduped and transferred again to backup storage

And if that’s not enough, you have to do it all over again, every day, from now on, almost forever. And data continues to grow. So 1PB today is likely to be 2PB of more in 12 months (it’s great to be in the storage business). 

So those are the challenges. How can it be done, effectively, day in and day out, enough so that IT can depend on their data being backed up.

Igneous to the rescue…

First, Igneous came out of stealth a while back (listen to our podcast) with a couple of unique capabilities needed for massive data repository discovery and analysis. That is they built a unique engine to scan and index PB scale data repositories. This was so they couldd provide administrators better visibility into their PB scale data repositories. But this isn’t about that product, it’s about backup. 

But some of the capabilities they needed to support that product helped them perform backups as well. For instance, their scan needed to handle PBs of data. They came up with AdaptiveSCAN, which didn’t use standard NFS and SMB data transfer protocols to gain access to file metadata. To open a file on NFS or SMB takes quite a lot of NFS or SMB transactions. But to access metadata only, one doesn’t have to use all those NFS and SMB capabilities, it can be done with much less overhead even when using NFS or SMB.

Of course having a way to scan Billions of files was a major accomplishment, but then where do you put all that metadata. And how can you access it effectively to support backup up a PB data repository. So they needed some serious data indexing capabilities and so came up with InfiniteINDEX

Now a trillion item index, seems a bit much, even for PB scale repositories. But my guess is they have eyes on taking their PB scale backups and going after even bigger fish,. That is offering backups for EB scale data repository. And that might just take a trillion item index

Next, there’s moving PB or even TB of data quickly is no small trick. As the development team at Igneous mostly came from unstructured data providers, they also understood and have access to APIs for most storage vendors (NetApp, Dell-EMC Isilon, Pure FlashBlade, Qumulo, etc.). As such, where available, they utilized those native vendor storage API calls to help them move data rather than having to Open an NFS or SMB file and Read it. 

Of course, even doing all that, moving 100TBs of data around or scanning PB sized data repositories is going to take a lot of processing and IO bandwidth to do in a reasonable period of time. 

So another capability they developed is massive parallelism. That is being able to distribute scan, indexing or data movement work, out to multiple systems. In that fashion it can be accomplished in significantly less wall clock time. 

Well with all that, they pretty much had the guts of a backup application system for PB data repositories but they still didn’t have the glue to put it all together. But recently they announced just that a Igneous’s DataProtect, a full scale backup application for PB of data. 

I suppose I haven’t done justice to all of what they have developed or talked about at their session, so I would suggest viewing their talk at CFD8 and listening to our GBoS podcast to learn more. They did demo their product at CFD8 but I believe it was a canned demo.

I didn’t think I’d see the day when some vendor would offer backup services for PBs of data let alone be shooting for more, but there you have it. Igneous means to take your PB scale data repositories and make them as easy to operate as TB scale data repositories. They call that democratizing data.

Comments?

See these other CFD8 bloggers write ups on Igneous.

CFD8  – Igneous Follow Up  by Nate Avery (@Nathaniel_Avery)

Picture credit(s): All from screen saves during Igneous’s session at CFD8

For data that never rests, NetApp NDAS

NetApp co-founder, Dave Hitz announced he was becoming a NetApp Founder Emeritus at the Storage Field Day (SFD18) show. He gave a great session about what he and his Hitz foundation’s been doing (for one example see our Archeology meets big data, post). He also discussed at length where he felt the storage world (and NetApp) must do to address the opportunities of the new cloud world. But this post isn’t about Dave, it’s about NetApp Data Availability Service, NDAS.

NetApp NDAS, currently in Beta but GAing (hopefully) later this year, is an AWS marketplace data orchestration solution that manages primary to secondary to S3 movement for ONTAP data. Essentially, NetApp Data Availability Services extends ONTAP data lifecycle management to AWS cloud. But it’s more than just a way to archive ONTAP data.

NDAS orchestrates Snapmirror services across ONTAP systems and AWS. But once your ONTAP data is in S3 it supplies access to that data for authorized AWS applications and services. That way one can use their ONTAP data to provide data analytics, train AI models, and do just about anything you can do with AWS applications today. By using NDAS, customers can extract more value from their ONTAP data.

NDAS is not just copying data to S3 but is also copying ONTAP metadata, catalogues and other information that provides context for that data. By copying ONTAP catalog information, customers and authorized end users can have file level access to ONTAP data residing in S3 objects.

NDAS today, only supports copying data from secondary ONTAP systems to S3. But a future enhancement will expand this to copy primary ONTAP data to S3.

How does NDAS work

NDAS provisions (your) EC2 instances, and middleware to read the data from the secondary systems and copy it to S3 buckets which you provide. NDAS after initial configuration to point to your ONTAP secondary storage systems, will autodiscover all the data available that can be copied to the cloud.

NDAS will start cataloguing your ONTAP data. NDAS EC2 instances support the NDAS copy, view and a Google-like search processes.

NDAS search presents a simplified file system view into your ONTAP data copied to S3. That way customers can identify data that could be used for AI training or data analytics that run in the cloud to access the data.

There’s extensive security to insure that NDAS is properly authorized to access your ONTAP data. Normal S3 security options also apply such as to have the data be encrypted on S3. NDAS data is automatically encrypted in flight.

Moreover, NDAS S3 bucket data can be replicated across AWS regions . Also serverless/lambda funationality are fully supported from or NDAS S3 buckets. .

What can it do with the data

AWS applications can access the data directly through NDAS APIs. Or customers can manually extract data they want to further process using the NDAS GUI to identify and copy data of interests. NDAS essentially creates a small app layer that allows users to view and access the ONTAP data in S3 as a file system.

One can have different NDAS AMIs operating in different regions for faster access or to support GDPR compliance requirements. Alternatively, a customer could have one NDAS AMI accessing all their secondary ONTAP instances.

NDAS is intended to provide a data analyst or IT generalist access to ONTAP data. This way AI training and big data analytics applications which run easily in the cloud, can have access to ONTAP data. In this way, customers can more effectively utilize data that IT has been storing and maintaining, since time began.

One NDAS beta customer is a MLB team. They have over time instrumented their stadiums to generate lot’s of data about pitch speed, rotation, ball location as it crosses the plate, etc.   The problem with all this data is siloed in onprem or IOT systems that generated it. But the customer wants to use the data to improve players, coaches and the viewer experience. And all that needs tools, applications and software that’s just not available to run in the data center. But with NDAS all this data is now available to cloud applications.

NDAS is supported by any ONTAP 9.5 or later (FAS, AFF, Cloud ONTAP, ONTAPselect) secondary storage system. ONTAP 9.5 software contains all the services required to support NDAS. This includes the copy-to-cloud APIs, as well as the NDAS proxy, which supplies the secure interface to NDAS operating in the cloud.

NetApp’s NDAS sessions are pretty informative. Anyone interested in finding out more should checkout the videos available on TechFieldDay website and Dave’s session is also worth a view.

For more information on Dave’s session and NDAS check out:

NetApp, Cloudier than ever by Enrico Signoretti (@ESignoretti)

NetApp and the space in between by Dan Frith (@PenguinPunk)

~~~~

Comments?

NetApp’s new NVMeoF/FC AFF & Cloud Data Volumes for every cloud

We attended a NetApp analyst event in their CA HQ last week and they had some interesting announcements as well other information to share. 1st up new faster ONTAP storage.

NVMeoF AFF

NetApp announced this week that their latest generation AFF (All Flash FAS) systems will support FC NVMeoF. We asked if this was just for NVMe SSDs or did it apply to all AFF media. The answer was it’s just another host interface which the customer can license for NVMe SSDs (available only on AFF F800) or SAS SSDs (A700S, A700, and A300). The only AFF not supporting the new host interface is their lowend AFF A220.

As for which NVMeoF, they only support FC at the moment, and it’s our belief that the FC NVMeoF spec is most well defined these days and the FC switch hardware (Brocade-Broadcom since Gen 5, now shipping Gen 6, Cisco not sure) already has NVMeoF support.

NetApp also mentioned support for 100GbE (A800 & A700S only) and 32Gbs FC hardware (all AFF systems but A220). So, presumably they offer NVMeoF for both 32Gbps and 16Gbps FC.

No word on when this will be available for Ethernet FCoE or iSCSI (iNVMe?) but with all the major storage vendors bar one, moving to NVMe SSDs it’s only a matter of time before they also support Ethernet NVMeoF.

As for AFF NVMeoF performance, the answer wasn’t entirely satisfactory. The indication was that the interface reduced response time by 10 usecs or so for NVMe SSDs over SAS SSDs. But I didn’t see any other performance information to substantiate that.

We did see on their AFF datasheet that with NVMe SSDs and NVMeoF FC, the AFF A800 response time was sub 200usec with throughput of 300GB/s (in a 24 node cluster, 12 HA pairs). This means they add only about 100usec for ONTAP data services, a decent trade off from our perspective. Later in their datasheet they say the A800 is capable of 1.3M IOPS and sub-500usec latencies. Unsure why they quoted both numbers.

Cloud Data Volumes

NetApp is taking storage to the cloud. They just announced that NetApp Cloud Data Volumes will be available as a native service under Google Cloud Platform (GCP). NetApp Cloud Data Volume is a storage-as-a-service offering that provides on demand ONTAP file services in the cloud.

For GCP,  both Google and NetApp will be offering the service. Dianne Green, GCP VP said Cloud Data Volumes are a bit like Kubernetes, disruption without disrupting. Customers can easily migrate their onprem file based applications to the cloud without having to worry about the performance of their data or data protection for that matter.

Getting the data there is another matter, but NetApp has other services like CloudSync and someday (maybe for Cloud Data Volumes), SnapMirror, which can help customers move data to and from the cloud.

Currently Cloud Data Volumes are in public preview as an Microsoft Azure Enterprise NFS (and SMB) service. It’s also in beta (I think) in AWS marketplace. And availability on GCP is still restricted. There’s a lot of emphasis at NetApp events on Cloud Data Volumes given its current status on public cloud providers but we think they are trying to gain some experience before they roll it out to the rest of the world.

However,  Jean English, NetApp CMO mentioned that NetApp’s Cloud Data Service business unit has over 1800 customers and currently supports a multi-PB storage footprint in various clouds. Note, this is not just Cloud Data Volumes but comprises all NetApp Cloud Data Services, which includes ONTAP Cloud, NPS, CloudSync, AltaVault, etc. Nonetheless, it’s an impressive indicator of just how far they have come in applying their storage magic to the public cloud in a short time. The hyperscalers (read public cloud providers) say NetApp is 2 or more years ahead of all the other competition and from what we can see, it’s true.

One of the key differentiators between NetApp Cloud Data Volumes and ONTAP Cloud is performance SLAs. Cloud Data Volume customers can select and purchase a specified performance SLA. We believe it comes at three levels and is normally purchased on a pay as you go, consumption based, service offering. However, it’s also available to be billed periodically, other purchase options may be available as well.

When asked what storage was behind the service, the only thing NetApp would confirm was that it was ONTAP storage, present in public cloud data centers in various regions. So Cloud Data Volumes is available in only specific regions but I would expect that to expand over time.

Data Visualization Center

They also christened their new Data Visualization Center (DVC) and we had a multi-course meal at the Bistro at the center. The DVC had a wrap around, 1.5 floor tall screen which showed some of NetApp customer success stories. Inside the screen was a more immersive setting and there was plenty of VR equipment in work spaces alongside customer conference rooms.

Full Disclosure: NetApp paid for all our travel, hotel and food during the analyst event and gave us all Google Home Minis as going away presents and NetApp is a long time customer of my firm.