58: GreyBeards talk HCI with Adam Carter, Chief Architect NetApp SolidFire #NetAppHCI

Sponsored by: NetApp

In this episode we talk with Adam Carter (@yoadamcarter), Chief Architect, NetApp SolidFire & HCI (Hyper Converged Infrastructure) solutions. Howard talked with Adam at TFD16 and I have known Adam since before the acquisition. Adam did a tour de force session on HCI architectures at TFD16 and we would encourage you to view the videos of his session.

This is the third time NetApp has been on our show (see our podcast with Lee Caswell and Dave Wright and our podcast with Andy Banta) but this is the first sponsored podcast from NetApp. Adam has been Chief Architect for SolidFire for as long as I have known him.

NetApp has FAS/AFF series storage, E-Series storage and SolidFire storage. Their HCI solution is based on their SolidFire storage system.

NetApp SolidFire HCI Appliance

 

NetApp’s HCI solution is built around a 2U 4-server configuration where 3 of the nodes are new, denser SolidFire storage nodes and the 4th node is a VMware ESXi host. That is, they have a real, fully functional SolidFire AFA SAN storage system built directly into their HCI solution.

There’s probably a case to be made that this isn’t technically an HCI system from an industry perspective and looks more like a well architected CI (converged infrastructure) solution. However, they do support VMs running on their system, it’s all packaged together as one complete system, and they offer end-to-end (one throat to choke) support over the complete system.

In addition, they spent a lot of effort improving SolidFire’s already great VMware software integration to offer a full management stack that fully supports both the vSphere environment and the embedded SolidFire AFA SAN storage system.

By using a full SolidFire storage system in their solution, NetApp gave up on the low-end (<$30K-$50K) portion of the HCI market. But to supply the high IO performance, multi-tenancy, and QoS services of current SolidFire storage systems, they felt they had to embed a full SAN storage system.

With other HCI solutions, the storage activity must contend with other VMs and kernel processing on the server. In these solutions, the storage system doesn’t control CPU/core/thread allocation and, as such, can’t guarantee the IO service levels that SolidFire is known for.

Also, by configuring their system with a real AFA SAN system, new additional ESXi servers can be added to the complex without needing to purchase additional storage software licenses. Further, customers can add bare metal servers to this environment and there’s still plenty of IO performance to go around. On the other hand, if a customer truly needs more storage performance/capacity, they can always add an additional, standalone SolidFire storage node to the cluster.

The podcast runs ~23 minutes. Adam was very easy to talk with and had deep technical knowledge of their new solution, industry HCI solutions and SolidFire storage. It was a great pleasure for Howard and me to talk with him again. Listen to the podcast to learn more.

Adam Carter, Chief Architect, NetApp SolidFire

Adam Carter is the Chief Product Architect for SolidFire and HCI at NetApp. Adam is an expert in next generation data center infrastructure and storage virtualization.

Adam has led product management at LeftHand Networks, HP, VMware, SolidFire, and NetApp bringing revolutionary products to market. Adam pioneered the industry’s first Virtual Storage Appliance (VSA) product at LeftHand Networks and helped establish VMware’s VSA certification category.

Adam brings deep product knowledge and broad experience in the software defined data center ecosystem.

56: GreyBeards talk high performance file storage with Liran Zvibel, CEO & Co-Founder, WekaIO

This month we talk high performance, cluster file systems with Liran Zvibel (@liranzvibel), CEO and Co-Founder of WekaIO, a new software defined, scale-out file system. I first heard of WekaIO when it showed up on SPEC sfs2014 with a new SWBUILD benchmark submission. They had a 60 node EC2-AWS cluster running the benchmark and achieved, at the time, the highest SWBUILD number (500) of any solution.

At the moment, WekaIO are targeting HPC and Media&Entertainment verticals for their solution and it is sold on an annual capacity subscription basis.

By the way, a Wekabyte is 2**100 bytes of storage or ~ 1 trillion exabytes (2**60).
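
As a quick check of that arithmetic, here is a minimal sketch that uses the binary (power-of-two) sizes quoted above. The constant names are ours, chosen for illustration only.

```python
# Sizes in bytes, using the power-of-two definitions quoted in the text.
EXABYTE = 2**60
WEKABYTE = 2**100

# How many exabytes fit in one Wekabyte: 2**100 / 2**60 = 2**40.
exabytes_per_wekabyte = WEKABYTE // EXABYTE

print(exabytes_per_wekabyte)         # 1099511627776
print(exabytes_per_wekabyte / 1e12)  # roughly 1.1 (trillion)
```

So "~1 trillion exabytes" is accurate to within about 10%.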

High performance file storage

The challenge with HPC file systems is that they need to handle a large number of files and large amounts of storage, with high throughput access to all this data. Where WekaIO comes into the picture is that they do all that plus can support high file IOPS. That is, they can open, read or write a high number of relatively small files at an impressive speed, with low latency. Such small-file workloads are becoming more common with AI-machine learning and life sciences/genomic microscopy image processing.

Most file system developers will tell you that they can supply high throughput OR high file IOPS but doing both is a real challenge. WekaIO is able to do both while at the same time supporting billions of files per directory and trillions of files in a file system.

WekaIO has support for up to 64K cluster nodes and has tested up to 4000 cluster nodes. WekaIO announced last year an OEM agreement with HPE and is starting to build out bigger clusters.

Media & Entertainment file storage requirements are mostly just high throughput with large (media) file sizes. Here WekaIO has more competition from other cluster file systems but their ability to support extra-large data repositories with great throughput is another advantage.

WekaIO cluster file system

WekaIO is a software defined storage solution. Whereas many HPC cluster file systems have separate metadata and storage nodes, WekaIO’s cluster nodes are combined metadata and storage nodes. So as one scales capacity (by adding nodes), one not only scales large file throughput (via more IO parallelism) but also scales small file IOPS (via more metadata processing capabilities). There’s also some secret sauce to their metadata sharding (if that’s the right word) that allows WekaIO to support more metadata activity as the cluster grows.

One secret to WekaIO’s ability to support both high throughput and high file IOPS lies in their performance load balancing across the cluster. Apparently, WekaIO can be configured to constantly monitor all cluster nodes for performance and can balance all file IO activity (data transfers and metadata services) across the cluster, to ensure that no one node is overburdened with IO.

Liran says that performance load balancing was one reason they were so successful with their EC2 AWS SPEC sfs2014 SWBUILD benchmark. One problem with AWS EC2 nodes is a lot of unpredictability in node performance. When running EC2 instances, “noisy neighbors” impact node performance. With WekaIO’s performance load balancing running on AWS EC2 node instances, they can just redirect IO activity around slower nodes to faster nodes that can handle the work, in real time.

WekaIO performance load balancing is a configurable option. The other alternative is for WekaIO to “cryptographically” spread the workload across all the nodes in a cluster.
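
To make the contrast between the two placement options concrete, here is a minimal sketch. It is not WekaIO’s algorithm, which wasn’t described on the podcast. The function names, node names and latency figures are all hypothetical. We assume that "cryptographic" spreading means picking a node from a hash of some file identifier, and that performance balancing means preferring whichever node currently reports the lowest latency.

```python
import hashlib


def hash_spread(file_id: str, nodes: list[str]) -> str:
    """Static placement: pick a node from a cryptographic hash of the file id."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def performance_balanced(latency_ms: dict[str, float]) -> str:
    """Dynamic placement: pick the node with the lowest observed latency."""
    return min(latency_ms, key=latency_ms.get)


nodes = ["node-a", "node-b", "node-c"]
# Hypothetical measurements; node-b is suffering from a noisy neighbor.
observed = {"node-a": 0.4, "node-b": 3.2, "node-c": 0.6}

static_choice = hash_spread("/projects/build/main.o", nodes)
dynamic_choice = performance_balanced(observed)
print(static_choice, dynamic_choice)
```

Hash spreading gives an even distribution without any monitoring, but it will keep sending work to a slow node. The latency-aware choice steers around it, at the cost of constantly measuring every node.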

WekaIO uses a host driver for POSIX access to the cluster. WekaIO’s frontend also natively supports (without host driver) NFSv3, SMB3.1, HDFS and AWS S3 protocols.

WekaIO also offers configurable file system data protection that can span 100s of failure domains (racks), supporting from 4 to 16 data stripes with 2 to 4 parity stripes. Liran said this was erasure-code-like but wouldn’t specifically state what they are doing differently.
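
To get a feel for what those stripe widths mean for capacity, here is a minimal sketch. It assumes a generic N data + M parity layout in which all stripes are the same size. Liran didn’t say how WekaIO’s scheme actually works, so treat this as ordinary erasure-coding arithmetic, not a description of their implementation. The function name is ours.

```python
def usable_fraction(data_stripes: int, parity_stripes: int) -> float:
    """Fraction of raw capacity left for data in an N+M stripe layout."""
    if not 4 <= data_stripes <= 16:
        raise ValueError("data stripes must be between 4 and 16")
    if not 2 <= parity_stripes <= 4:
        raise ValueError("parity stripes must be between 2 and 4")
    return data_stripes / (data_stripes + parity_stripes)


for data, parity in [(4, 2), (16, 2), (16, 4)]:
    pct = 100 * usable_fraction(data, parity)
    print(f"{data}+{parity}: {pct:.1f}% usable, tolerates {parity} failures")
```

Under these assumptions, wider stripes give back more usable capacity (about 67% for 4+2 versus about 89% for 16+2), while more parity stripes trade capacity for the ability to survive more simultaneous failures.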

They also support high performance storage and inactive storage with automated tiering of inactive data to object storage through policy management.

WekaIO creates a global name space across the cluster, which can be sub-divided into one to thousands of file systems.

Snapshotting, cloning & moving work

WekaIO also has file system snapshots (read-only) and clones (read-write) using a redirect-on-write methodology. After the first snapshot/clone, subsequent snapshots/clones are only differential copies.
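
For readers unfamiliar with redirect-on-write, here is a minimal sketch of the general technique. It is not WekaIO’s code. The class and method names are hypothetical, and for simplicity each snapshot copies the whole block map instead of keeping only the differences.

```python
class RowVolume:
    """Toy redirect-on-write volume: writes never overwrite data in place."""

    def __init__(self):
        self.blocks = {}    # physical block id -> data
        self.live_map = {}  # logical block number -> physical block id
        self.next_id = 0

    def write(self, lbn, data):
        # Redirect the write to a freshly allocated physical block, so any
        # snapshot still pointing at the old block is left untouched.
        pid = self.next_id
        self.next_id += 1
        self.blocks[pid] = data
        self.live_map[lbn] = pid

    def snapshot(self):
        # A snapshot is just a copy of the block map; no data is copied.
        return dict(self.live_map)

    def read(self, lbn, block_map=None):
        # Read through the live map by default, or through a snapshot's map.
        mapping = self.live_map if block_map is None else block_map
        return self.blocks[mapping[lbn]]


vol = RowVolume()
vol.write(0, b"version 1")
snap = vol.snapshot()
vol.write(0, b"version 2")

print(vol.read(0))        # b'version 2'
print(vol.read(0, snap))  # b'version 1'
```

The snapshot costs only metadata at creation time, and the old data is preserved simply because new writes go elsewhere. A writable clone would be a second live map seeded from the snapshot.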

Another feature Howard and I thought was interesting was their DR-as-a-Service-like capability. That is, you use an onprem WekaIO cluster to clone a file system/directory and tier that clone to an S3 storage object. Then an AWS EC2 WekaIO cluster imports the S3 object(s) and re-constitutes that file system/directory in the cloud. Once on AWS, work can occur in the cloud and the process can be reversed to move any updates back to the onprem cluster.

This way if you had work needing more compute than available onprem, you could move the data and workload to AWS, do the work there and then move the data back down to onprem again.

WekaIO’s RtOS, network stack, & NVMeoF

WekaIO runs under Linux as a user space application. WekaIO has implemented their own Realtime O/S (RtOS) and high performance network stack that runs in user space.

With their own network stack they have also implemented NVMeoF support for (non-RDMA) Ethernet as well as InfiniBand networks. This is probably another reason they can have such low latency file IO operations.

The podcast runs ~42 minutes. Liran has been around data storage systems for 20 years and as a result was very knowledgeable and interesting to talk with. Liran almost qualifies as a Greybeard, if not for the fact that he was clean shaven ;/. Listen to the podcast to learn more.

Liran Zvibel, CEO and Co-Founder, WekaIO

As Co-Founder and CEO, Mr. Liran Zvibel guides long term vision and strategy at WekaIO. Prior to creating the opportunity at WekaIO, he ran engineering at social startup and Fortune 100 organizations including Fusic, where he managed product definition, design and development for a portfolio of rich social media applications.

 

Liran also held principal architectural responsibilities for the hardware platform, clustering infrastructure and overall systems integration for XIV Storage System, acquired by IBM in 2007.

Mr. Zvibel holds a BSc. in Mathematics and Computer Science from Tel Aviv University.

54: GreyBeards talk scale-out secondary storage with Jonathan Howard, Dir. Tech. Alliances at Commvault

This month we talk scale-out secondary storage with Jonathan Howard, Director of Technical Alliances at Commvault. Both Howard and I attended Commvault GO2017 for Tech Field Day, this past month in Washington DC. We had an interesting overview of their Hyperscale secondary storage solution and Jonathan was the one answering most of our questions, so we thought he would make a good guest for our podcast.

Commvault has been providing data protection solutions for a long time, using anyone’s secondary storage, but recently they have released a software defined, scale-out secondary storage solution that runs their software with a clustered file system.

Hyperscale secondary storage

They call their solution Hyperscale secondary storage, and it’s available both as a hardware-software appliance and as a software-only configuration on compatible off the shelf commercial hardware. Hyperscale uses the Red Hat Gluster cluster file system and, together with the Commvault Data Platform, provides a highly scalable secondary storage cluster that can meet anyone’s secondary storage needs while providing high availability and high throughput performance.

Commvault’s Hyperscale secondary storage system operates onprem in customer data centers. Hyperscale uses flash storage for system metadata but most secondary storage resides on local server disk.

Combined with Commvault Data Platform

With the sophistication of Commvault Data Platform one can have all the capabilities of a standalone Commvault environment with software defined storage. This allows just about any RTO/RPO needed by today’s enterprise and includes Live Sync secondary storage replication, Onprem IntelliSnap for on storage snapshot management, Live Mount for instant recovery using secondary storage directly to boot your VMs without having to wait for data recovery, and all the other recovery sophistication available from Commvault.

Hyperscale storage is capable of doing up to 5 Live Mount recoveries simultaneously per node without a problem but more are possible depending on performance requirements.

We also talked about Commvault’s cloud secondary storage solution which can make use of AWS S3 storage to hold backups.

Commvault’s organic growth

Most of the other data protection companies have come about through mergers, acquisitions or spinoffs. Commvault has continued along, enhancing their solution while basing everything on an underlying centralized metadata database. So their codebase was grown from the bottom up and supports pretty much any and all data protection requirements.

The podcast runs ~50 minutes. Jonathan was very knowledgeable about the technology and was great to talk with. Listen to the podcast to learn more.

Jonathan Howard, Director, Technical and Engineering Alliances, Commvault

Jonathan Howard is a Director, Technology & Engineering Alliances for Commvault. A 20-year veteran of the IT industry, Jonathan has worked at Commvault for the past 8 years in various field, product management, and now alliance facing roles.

In his present role with Alliances, Jonathan works with business and technology leaders to design and create numerous joint solutions that have empowered Commvault alliance partners to create and deliver their own new customer solutions.

51: GreyBeards talk hyper convergence with Lee Caswell, VP Product, Storage & Availability BU, VMware

Sponsored by:

VMware

In this episode we talk with Lee Caswell (@LeeCaswell), Vice President of Product, Storage and Availability Business Unit, VMware.  This is the second time Lee’s been on our show, the previous one back in April of last year when he was with his prior employer. Lee’s been at VMware for a little over a year now and has helped lead some significant changes in their HCI offering, vSAN.

VMware vSAN/HCI business

Many customers struggle to modernize their data centers with funding being the primary issue. This is very similar to what happened in the early 2000s as customers started virtualizing servers and consolidating storage. But today, there’s a new option, server based/software defined storage like VMware’s vSAN, which can be deployed for little expense and grown incrementally as needed. VMware’s vSAN customer base is currently growing by 150% CAGR, and VMware is adding over 100 new vSAN customers a week.

Many companies say they offer HCI, but few have adopted the software-only business model this entails. The transition from a hardware-software, appliance-based business model to a software-only business model is difficult and means a move from a high revenue-lower margin business to a lower revenue-higher margin business. VMware, from its very beginnings, has built a sustainable software-only business model that extends to vSAN today.

The software business model means that VMware can partner easily with a wide variety of server OEM partners to supply vSAN ReadyNodes that are pre-certified and jointly supported in the field. There are currently 14 server partners for vSAN ReadyNodes. In addition, VMware has co-designed the VxRail HCI Appliance with Dell EMC, which adds integrated life-cycle management as well as Dell EMC data protection software licenses.

As a result, customers can adopt vSAN as a build or a buy option for on-prem use and can also leverage vSAN in the cloud from a variety of cloud providers, including AWS very soon. It’s the software-only business model that sets the stage for this common data management across the hybrid cloud.

VMware vSAN software defined storage (SDS)

The advent of Intel Xeon processors and plentiful, relatively cheap SSD storage has made vSAN an easy storage solution for most virtualized data centers today. SSDs removed any performance concerns that customers had with hybrid HCI configurations. And with Intel’s latest Xeon Scalable processors, there’s more than enough power to handle both application compute and storage compute workloads.

From Lee’s perspective, there’s still a place for traditional SAN storage, but he sees it more for cold storage that is scaled independently from servers or for bare metal/non-virtualized storage environments. But for everyone else using virtualized data centers, they really need to give vSAN a look.

Storage vendors shifting sales

It used to be that major storage vendor sales teams would lead with hardware appliance storage solutions and then move to HCI when pushed. The problem was that a typical SAN storage sale takes 9 months to complete and then 3 years of limited additional sales.

To address this, some vendors have taken the approach where they lead with HCI and only move to legacy storage when it’s a better fit. With VMware vSAN, it’s a quicker sales cycle than legacy storage because HCI costs less up front and there’s no need to buy the final storage configuration with the first purchase. VMware vSAN HCI can grow as the customer’s application needs dictate, generating additional incremental sales over time.

VMware vSAN in AWS

Recently, VMware has announced VMware Cloud in AWS. What this means is that you can have vSAN storage operating in an AWS cloud just like you would on-prem. In this case, workloads could migrate from cloud to on-prem and back again with almost no changes. How the data gets from on-prem to cloud is another question.

Also the pricing model for VMware Cloud in AWS moves to a consumption based model, where you pay for just what you use on a monthly basis. This way VMware Cloud in AWS and vSAN is billed monthly, consistent with other AWS offerings.

VMware vs. Microsoft on cloud

There’s a subtle difference in how Microsoft and VMware are adopting cloud. VMware came from an infrastructure platform and is now implementing their infrastructure on cloud. Microsoft started as a development platform and is taking their cloud development platform/stack and bringing it to on-prem.

It’s really two different philosophies in action. We now see VMware doing more for the development community with vSphere Integrated Containers (VIC), Docker Containers, Kubernetes, and Pivotal Cloud Foundry. Meanwhile Microsoft is looking to implement the Azure stack for on-prem environments, and they are focusing more on infrastructure. In the end, enterprises will have terrific choices as the software defined data center frees up customer dollars and management time.

The podcast runs ~25 minutes. Lee is a very knowledgeable individual and although he doesn’t qualify as a Greybeard (just yet), he has been in and around the data center and flash storage environments throughout most of his career. From his diverse history, Lee has developed a very businesslike perspective on data center and storage technologies and it’s always a pleasure talking with him. Listen to the podcast to learn more.

Lee Caswell, V.P. of Product, Storage & Availability Business Unit, VMware

Lee Caswell leads the VMware storage marketing team driving vSAN products, partnerships, and integrations. Lee joined VMware in 2016 and has extensive experience in executive leadership within the storage, flash and virtualization markets.

Prior to VMware, Lee was vice president of Marketing at NetApp and vice president of Solution Marketing at Fusion-IO (now SanDisk). Lee was a founding member of Pivot3, a company widely considered to be the founder of hyper-converged systems, where he served as the CEO and CMO. Earlier in his career, Lee held marketing leadership positions at Adaptec, and SEEQ Technology, a pioneer in non-volatile memory. He started his career at General Electric in Corporate Consulting.

Lee holds a bachelor of arts degree in economics from Carleton College and a master of business administration degree from Dartmouth College. Lee is a New York native and has lived in northern California for many years. He and his wife live in Palo Alto and have two children. In his spare time Lee enjoys cycling, playing guitar, and hiking the local hills.

49: Greybeards talk open convergence with Brian Biles, CEO and Co-founder of Datrium

Sponsored By:

In this episode we talk with Brian Biles, CEO and Co-founder of Datrium. We last talked with Brian and Datrium in May of 2016 and at that time we called it deconstructed storage. These days, Datrium offers a converged infrastructure (C/I) solution, which they call “open convergence”.

Datrium C/I

Datrium’s C/I solution stores persistent data off server onto data nodes and uses onboard flash for a local, host read-write IO cache. They also use host CPU resources to perform other services such as compression, local deduplication and data services.

In contrast to hyper converged infrastructure solutions available on the market today, customer data is never split across host nodes. That is, data residing on a host has only been created and accessed by that host.

Datrium uses on host SSD storage/flash as a fast access layer for data accessed by the host. As data is (re-)written, it’s compressed and locally deduplicated before being persisted (written) down to a data node.

A data node is a relatively light weight dual controller/HA storage solution with 12 high capacity disk drives. Data node storage is global to all hosts running Datrium storage services in the cluster. Besides acting as a permanent repository for data written by the cluster of hosts, it also performs global deduplication of data across all hosts.
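
The division of labor described above (hosts cache, compress and locally dedupe, while the data node persists and globally dedupes) can be sketched roughly as follows. This is a toy model under our own assumptions, not Datrium’s implementation. The class names are hypothetical, SHA-256 fingerprints and zlib compression are stand-ins for whatever Datrium actually uses, and the cache here is a plain dictionary.

```python
import hashlib
import zlib


class DataNode:
    """Shared persistent store; keeps one copy per unique fingerprint."""

    def __init__(self):
        self.store = {}  # fingerprint -> compressed block

    def persist(self, fingerprint, payload):
        # Global dedupe: skip blocks that some host has already written.
        if fingerprint in self.store:
            return False
        self.store[fingerprint] = payload
        return True


class Host:
    """Compute host with a local cache and local dedupe."""

    def __init__(self, data_node):
        self.data_node = data_node
        self.cache = {}  # local flash cache: fingerprint -> raw block

    def write(self, block):
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in self.cache:
            # Local dedupe: only new blocks are compressed and sent down.
            self.cache[fingerprint] = block
            self.data_node.persist(fingerprint, zlib.compress(block))
        return fingerprint


node = DataNode()
host1, host2 = Host(node), Host(node)

block_a = b"A" * 4096
block_b = b"B" * 4096
fp_a = host1.write(block_a)
host1.write(block_a)  # duplicate on the same host: caught locally
host2.write(block_a)  # duplicate from another host: caught on the data node
host2.write(block_b)

print(len(node.store))  # 2 unique blocks persisted
```

Four writes result in only two stored blocks, and the host CPUs do the hashing and compression work, which is why adding hosts adds IO horsepower.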

The nice thing about their approach to CI is that it’s easily scalable: if you need more IO performance, just add more hosts or more SSDs/flash to servers already connected in the cluster. And if a host fails it doesn’t impact cluster IO or data access for any other host.

Datrium originally came out supporting VMware virtualization and acts as an NFS datastore for VMDKs.

Recent enhancements

In July, Datrium released new support for Red Hat and KVM virtualization alongside VMware vSphere. They also added Docker persistent volume support to Datrium. Now you can have mixed KVM, VMware and Docker container environments, all accessing the same persistent storage.

KVM offered an opportunity to grow the user base and support Red Hat enterprise accounts. Red Hat is a popular software development environment in non-traditional data centers. Also, much of the public cloud is KVM based, which provides a great way to someday support Datrium storage services in public cloud environments.

One challenge with Docker support is that there are just a whole lot more Docker volumes than VMDKs in vSphere. So Datrium added sophisticated volume directory search capabilities and naming convention options for storage policy management. Customers can define a naming convention for application/container volumes and use it to define group storage policies, which will then apply to any volume that matches the naming convention. This is a lot easier than having to do policy management at a volume level with 100s, 1000s or even 10,000s of distinct volume IDs.
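
The idea of attaching policies to naming conventions instead of individual volumes can be illustrated with a short sketch. The patterns, policy fields and function name below are hypothetical and are not Datrium’s actual syntax. We simply assume shell-style wildcards evaluated in order, with the first match winning.

```python
from fnmatch import fnmatchcase

# Hypothetical group policies keyed by a volume naming convention.
POLICIES = [
    ("prod-db-*", {"snapshot_every_min": 15, "replicate": True}),
    ("dev-*", {"snapshot_every_min": 240, "replicate": False}),
]
DEFAULT_POLICY = {"snapshot_every_min": 1440, "replicate": False}


def policy_for(volume_name):
    """Return the first policy whose pattern matches the volume name."""
    for pattern, policy in POLICIES:
        if fnmatchcase(volume_name, pattern):
            return policy
    return DEFAULT_POLICY


for name in ["prod-db-orders-01", "dev-web-7", "scratch"]:
    print(name, policy_for(name))
```

With this approach, a newly created container volume picks up the right policy just by following the naming convention, so nobody has to touch per-volume settings.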

Docker is being used today to develop most cloud based applications. And many development organizations have adopted Docker containers for their development and application deployment environments. Many shops do development under Docker and production on vSphere. So now these shops can use Datrium to access development as well as production data.

More recently, Datrium also scaled the number of data nodes available in a cluster. Previously you could only have one data node using 12 drives, or about 29TB of raw protected capacity, which when deduped and compressed gave you an effective capacity of ~100TB. But with this latest release, Datrium now supports up to 10 data nodes in a cluster for a total of 1PB of effective capacity for your storage needs.
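
Those capacity numbers imply a data reduction ratio that’s easy to work out. The sketch below only restates the figures quoted above. The constant names are ours, and the ratio is what the quoted numbers imply, not a rate Datrium guarantees.

```python
# Figures quoted in the text for a single data node.
PROTECTED_TB_PER_NODE = 29    # capacity before dedupe and compression
EFFECTIVE_TB_PER_NODE = 100   # capacity after dedupe and compression
MAX_DATA_NODES = 10

# Implied data reduction ratio, roughly 3.4 to 1.
reduction_ratio = EFFECTIVE_TB_PER_NODE / PROTECTED_TB_PER_NODE

# Ten data nodes at ~100TB effective each is ~1000TB, or about 1PB.
cluster_effective_tb = EFFECTIVE_TB_PER_NODE * MAX_DATA_NODES

print(f"{reduction_ratio:.1f}:1 reduction, {cluster_effective_tb} TB effective")
```

Actual effective capacity will depend on how well a given data set dedupes and compresses.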

The podcast runs ~25 minutes. Brian is very knowledgeable about the storage industry, has been successful at many other data storage companies and is always a great guest to have on our show. Listen to the podcast to learn more.

Brian Biles, Datrium CEO & Co-founder

Prior to Datrium, Brian was Founder and VP of Product Mgmt. at EMC Backup Recovery Systems Division. Prior to that he was Founder, VP of Product Mgmt. and Business Development for Data Domain (acquired by EMC in 2009).