56: GreyBeards talk high performance file storage with Liran Zvibel, CEO & Co-Founder, WekaIO

This month we talk high-performance, cluster file systems with Liran Zvibel (@liranzvibel), CEO and Co-Founder of WekaIO, a new software-defined, scale-out file system. I first heard of WekaIO when it showed up on SPEC sfs2014 with a new SWBUILD benchmark submission. They had a 60-node AWS EC2 cluster running the benchmark and achieved, at the time, the highest SWBUILD number (500) of any solution.

At the moment, WekaIO is targeting the HPC and Media & Entertainment verticals for its solution, which is sold on an annual capacity subscription basis.

By the way, a Wekabyte is 2**100 bytes of storage, or ~1 trillion exabytes (an exabyte being 2**60 bytes).
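If you want to check that conversion, the arithmetic is straightforward (a quick Python sanity check, nothing WekaIO-specific):

```python
# Wekabyte (2**100 bytes) expressed in exabytes (2**60 bytes).
wekabyte = 2 ** 100
exabyte = 2 ** 60
print(wekabyte // exabyte)  # 1099511627776, i.e. roughly 1.1 trillion exabytes
```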

High performance file storage

The challenge with HPC file systems is that they need to handle a large number of files and large amounts of storage, with high-throughput access to all that data. Where WekaIO comes into the picture is that they do all that and can also support high file IOPS. That is, they can open, read or write a large number of relatively small files at impressive speed, with low latency. Such workloads are becoming more common in AI/machine learning and life sciences/genomic microscopy image processing.

Most file system developers will tell you that they can supply high throughput OR high file IOPS, but doing both is a real challenge. WekaIO is able to do both while at the same time supporting billions of files per directory and trillions of files in a file system.

WekaIO has support for up to 64K cluster nodes and has tested up to 4,000 cluster nodes. Last year WekaIO announced an OEM agreement with HPE and is starting to build out bigger clusters.

Media & Entertainment file storage requirements are mostly just high throughput with large (media) file sizes. Here WekaIO has more competition from other cluster file systems, but its ability to support extra-large data repositories with great throughput is still an advantage.

WekaIO cluster file system

WekaIO is a software-defined storage solution. Whereas many HPC cluster file systems have separate metadata and storage nodes, WekaIO's cluster nodes are combined metadata and storage nodes. So as one scales capacity (by adding nodes), one not only scales large file throughput (via more IO parallelism) but also scales small file IOPS (via more metadata processing capability). There's also some secret sauce to their metadata sharding (if that's the right word) that allows WekaIO to support more metadata activity as the cluster grows.

One secret to WekaIO's ability to support both high throughput and high file IOPS lies in their performance load balancing across the cluster. Apparently, WekaIO can be configured to constantly monitor all cluster nodes for performance and to balance all file IO activity (data transfers and metadata services) across the cluster, to ensure that no one node is overburdened with IO.

Liran says that performance load balancing was one reason they were so successful with their AWS EC2 SPEC sfs2014 SWBUILD benchmark. One problem with AWS EC2 nodes is a lot of unpredictability in node performance. When running EC2 instances, "noisy neighbors" impact node performance. With WekaIO's performance load balancing running on AWS EC2 instances, they can just redirect IO activity around slower nodes to faster nodes that can handle the work, in real time.
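To make the idea concrete, here's a minimal sketch of that kind of latency-aware IO routing. It assumes the cluster exposes per-node latency stats; the node names, numbers and routing function are hypothetical illustrations, not WekaIO's actual mechanism.

```python
import random

# Hypothetical per-node latency measurements (ms); node-2 is the "noisy neighbor".
node_latency_ms = {"node-1": 0.4, "node-2": 2.7, "node-3": 0.5, "node-4": 0.45}

def pick_node_for_io(stats, slow_factor=3.0):
    """Route new IO away from nodes whose latency is well above the cluster median,
    and weight the remaining choices toward the fastest nodes."""
    latencies = sorted(stats.values())
    median = latencies[len(latencies) // 2]
    healthy = [n for n, lat in stats.items() if lat <= slow_factor * median]
    weights = [1.0 / stats[n] for n in healthy]
    return random.choices(healthy, weights=weights, k=1)[0]

print([pick_node_for_io(node_latency_ms) for _ in range(5)])  # node-2 is never selected
```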

WekaIO performance load balancing is a configurable option. The alternative is for WekaIO to "cryptographically" spread the workload across all the nodes in a cluster.

WekaIO uses a host driver for POSIX access to the cluster. WekaIO's frontend also natively supports (without the host driver) the NFSv3, SMB 3.1, HDFS and AWS S3 protocols.

WekaIO also offers configurable file system data protection that can span 100s of failure domains (racks), supporting from 4 to 16 data stripes with 2 to 4 parity stripes. Liran said this was erasure-code-like but wouldn't state specifically what they are doing differently.
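The capacity cost of those stripe options is simple arithmetic. The sketch below just computes raw-to-usable ratios for a few data+parity widths from the range Liran mentioned; it says nothing about how WekaIO actually lays out or encodes stripes.

```python
# Raw capacity consumed per unit of usable capacity for data+parity stripe widths.
def raw_per_usable(data_stripes, parity_stripes):
    return (data_stripes + parity_stripes) / data_stripes

for data in (4, 8, 16):
    for parity in (2, 4):
        print(f"{data}+{parity}: {raw_per_usable(data, parity):.2f}x raw per usable")
# 4+2 costs 1.50x, 16+2 only 1.12x -- wider stripes are cheaper in capacity,
# but each write must then span more failure domains.
```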

They also support both high-performance storage and inactive storage, with policy-managed, automated tiering of inactive data to object storage.

WekaIO creates a global namespace across the cluster, which can be sub-divided into anywhere from one to thousands of file systems.

Snapshotting, cloning & moving work

WekaIO also has file system snapshots (read-only) and clones (read-write) using a redirect-on-write methodology. After the first snapshot/clone, subsequent snapshots/clones are only differential copies.
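As a toy illustration of redirect-on-write (not WekaIO's actual implementation), the live file system keeps a map of pointers to blocks; a snapshot freezes a copy of that map, and later writes land in new blocks so the frozen map still points at the old data:

```python
# Minimal redirect-on-write model: snapshots freeze the pointer map, writes after
# a snapshot go to new block versions, so old and new views share unchanged blocks.
class RoWVolume:
    def __init__(self):
        self.live = {}        # logical block -> current data version
        self.snapshots = []   # each snapshot is a frozen copy of the pointer map

    def write(self, block_no, data):
        self.live[block_no] = data              # redirect the live map to new data

    def snapshot(self):
        self.snapshots.append(dict(self.live))  # freeze pointers; no data is copied

vol = RoWVolume()
vol.write(0, "v1")
vol.write(1, "unchanged")
vol.snapshot()
vol.write(0, "v2")                       # only block 0 diverges from the snapshot
print(vol.snapshots[0][0], vol.live[0])  # v1 v2
```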

Another feature Howard and I thought was interesting was their DR-as-a-Service-like capability. That is, using an on-prem WekaIO cluster to clone a file system/directory and tiering that clone to S3 object storage. An AWS EC2 WekaIO cluster can then import the object(s) and re-constitute that file system/directory in the cloud. Once on AWS, work can occur in the cloud and the process can be reversed to move any updates back to the on-prem cluster.

This way, if you had work needing more compute than was available on prem, you could move the data and workload to AWS, do the work there and then move the data back on prem again.
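Sketched as a workflow, the round trip looks something like the following. Every helper here is a stand-in that just logs the step; these are not WekaIO API calls or CLI commands, just an illustration of the sequence described above.

```python
def step(msg):
    print(f"[draas] {msg}")

def clone_filesystem(path):
    step(f"clone on-prem file system {path}")
    return f"{path}-clone"

def tier_to_s3(src, bucket):
    step(f"tier {src} to s3://{bucket}")

def import_from_s3(bucket):
    step(f"EC2 WekaIO cluster imports objects from s3://{bucket}")
    return "cloud-fs"

def run_cloud_jobs(fs):
    step(f"run burst compute against {fs}")

def merge_back(path, bucket):
    step(f"tier cloud results to s3://{bucket} and merge back into on-prem {path}")

def burst_to_cloud(path, bucket):
    snap = clone_filesystem(path)      # 1. clone the file system/directory on prem
    tier_to_s3(snap, bucket)           # 2. tier the clone to S3 objects
    cloud_fs = import_from_s3(bucket)  # 3. re-constitute it in an AWS WekaIO cluster
    run_cloud_jobs(cloud_fs)           # 4. do the work in the cloud
    merge_back(path, bucket)           # 5. reverse the process to bring updates home

burst_to_cloud("/weka/project-data", "draas-staging-bucket")
```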

WekaIO’s RtOS, network stack, & NVMeoF

WekaIO runs under Linux as a user-space application. WekaIO has implemented its own realtime O/S (RtOS) and a high-performance network stack, both of which run in user space.

With their own network stack, they have also implemented NVMe-oF support for (non-RDMA) Ethernet as well as InfiniBand networks. This is probably another reason they can deliver such low-latency file IO operations.

The podcast runs ~42 minutes. Liran has been around data storage systems for 20 years and as a result was very knowledgeable and interesting to talk with. Liran almost qualifies as a Greybeard, if not for the fact that he was clean shaven ;/. Listen to the podcast to learn more.

Liran Zvibel, CEO and Co-Founder, WekaIO

As Co-Founder and CEO, Mr. Liran Zvibel guides long-term vision and strategy at WekaIO. Prior to creating the opportunity at WekaIO, he ran engineering at a social startup and at Fortune 100 organizations, including Fusic, where he managed product definition, design and development for a portfolio of rich social media applications.

Liran also held principal architectural responsibilities for the hardware platform, clustering infrastructure and overall systems integration for the XIV Storage System, acquired by IBM in 2007.

Mr. Zvibel holds a BSc. in Mathematics and Computer Science from Tel Aviv University.

53: GreyBeards talk MAMR and future disk with Lenny Sharp, Sr. Dir. Product Management, WDC

This month we talk new disk technology with Lenny Sharp, Senior Director of Product Management, responsible for enterprise disk at Western Digital Corp. (WDC). WDC recently announced that their future disk offerings will be based on a new disk recording technology called MAMR, or microwave-assisted magnetic recording.

Over the last decade or so the disk industry has been investing in HAMR, or heat-assisted magnetic recording, as the next recording innovation. So MAMR is a significant departure, but one that appears well worth it.

WDC is arguably the leading supplier of HDDs and one of the leading SSD suppliers to the industry today. Any departure from the industry technology roadmap by WDC is big news.

WDC is banking on MAMR technology to continue to offer capacity disk (for big data) at prices that are 10X below the price of flash storage for the foreseeable future. If they and the rest of the disk industry can deliver on that promise, then there should be a substantial market for capacity disk for the next decade or so.

What's MAMR?

HAMR uses lasers to heat up the media spot being recorded. This boost in energy helps reduce the magnetic threshold of the grains inside the media, allowing them to be written or change state. Once that energy is removed, the data state on the media persists and can be read multiple times without error.

MAMR uses microwaves to add similar energy to the spot being written on the disk media. MAMR doesn't actually heat up the spot with microwaves, but it does add electromagnetic energy to the spot being written, which has the same effect of reducing the threshold for writing the media. I wrote a recent blog post about MAMR describing the technology in more detail.

HAMR heats the media spot to somewhere between 400C and 700C, which potentially reduces disk reliability. MAMR, because it doesn't heat the disk any more than normal operations do, should not impact disk reliability.

Also, MAMR can use pretty much the same disk substrate used in enterprise disks today and can be fabricated on much the same manufacturing lines used for today's PMR (perpendicular magnetic recording) heads.

Disk densities

MAMR should allow the industry to get to ~4.5Tb/sqin. Current PMR technology will probably max out at 1.0 to 1.3Tb/sqin. PMR density growth has flattened out (to 6-7% per year) recently, but MAMR should put the disk industry back on a 15%-per-year density growth curve. The new MAMR disks will be sampling to enterprise customers in 2018 and in production by 2019.
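A quick compound-growth check shows why the recording technology matters so much. Starting from an assumed ~1.1Tb/sqin midpoint of today's PMR range, a decade of 7%/year growth barely doubles density, while 15%/year lands right around the ~4.5Tb/sqin MAMR target:

```python
# Areal density after 10 years of compound growth, from an assumed 1.1 Tb/sqin start.
start_density = 1.1  # Tb/sqin (midpoint of the 1.0-1.3 PMR range above)
for annual_rate in (0.07, 0.15):
    after_decade = start_density * (1 + annual_rate) ** 10
    print(f"{annual_rate:.0%}/yr for 10 years -> ~{after_decade:.1f} Tb/sqin")
# ~2.2 Tb/sqin at 7%/yr vs ~4.5 Tb/sqin at 15%/yr
```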

As for how far MAMR will take disk, WDC said we can expect a 40TB disk device (using multiple platters) by 2025 and Lenny said perhaps double that eventually.

We ended our discussion with Lenny on WDC's and other disk vendors' moves beyond the device level. Over time, IT's use of disk has changed, and the disk vendors seem to believe the best way to address this transition is to look beyond disk/SSD devices and toward manufacturing storage shelves and potentially even systems!? We'll need to wait for the dust to settle on these moves.

The podcast runs ~45 minutes. Lenny was very knowledgeable about current and future disk technology and seems to have been around the disk industry forever. He's got an insider's view of disk technology, IT's use of disk and storage market dynamics. Both Howard and I enjoyed our time with him. Listen to the podcast to learn more.

Lenny Sharp, Sr. Dir. Product Management, WDC

Lenny Sharp serves as Western Digital's Sr. Director of Enterprise HDD product line management and planning. He has over 30 years of experience in high technology and storage. Sharp joined HGST in 2009, initially responsible for enterprise SSD.
He has also managed client HDD and spent four years in Japan, working closely with the development team and APAC customers.
Previously, he was responsible for managing systems, software, storage and semiconductors for companies including Dell, Philips, Western Digital and Maxtor (since acquired by Seagate).

33: GreyBeards talk HPC storage with Frederic Van Haren, founder HighFens & former Sr. Director of HPC at Nuance

In episode 33 we talk with Frederic Van Haren (@fvha), founder of HighFens, Inc. (@HighFens), a new HPC consultancy, and former Senior Director of HPC at Nuance Communications. Howard and I got a chance to talk with Frederic at a recent HPE storage deep dive event. I met up with him again during SFD10, where he was talking on behalf of Kaminario, and he was also at the HPE Discover conference last week.

Nuance is the backend speech recognition engine for a number of popular service offerings. Nuance looks very similar to a lot of other hyper-scale customers and ultimately, we feel, may be the way of the future for all IT over the coming decades. Nuance's data storage journey over Frederic's tenure with the company holds many lessons for all of us in the storage industry.

Nuance currently has ~6PB usable (~16PB raw) of speech wave files as well as uncountable text and other files, all inside IBM Spectrum Scale (GPFS). They have both lots of big files and lots of small files. These days, Spectrum Scale is processing 2-3M files/second. They have doubled capacity in each of the last 9 years, and today handle a billion new files a month. GPFS stripes data across storage and provides data protection, migration, snapshotting and storage tiering across a diverse mix of storage. At the end of the podcast we discussed some open source alternatives to Spectrum Scale, but at the time Nuance started down this path, GPFS was found to be the only thing that could do the job. This proved to be a great solution, as they have completely swapped out the underlying storage at least 3 times and all their users were none the wiser.
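Doubling every year for nine years compounds quickly. A rough back-of-the-envelope (my arithmetic, not a figure from the podcast) shows what that growth implies about where Nuance started:

```python
# Nine consecutive years of doubling is a 2**9 = 512x increase in capacity.
growth_factor = 2 ** 9
usable_today_tb = 6 * 1000  # ~6PB usable today, expressed in TB
print(growth_factor)                    # 512
print(usable_today_tb / growth_factor)  # ~11.7TB usable nine years ago
```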

The first storage Frederic talked about was Coraid (no longer in business) and their ATA-over-Ethernet storage solution. This used SuperMicro chassis with 24 SATA drives per shelf, and they bought 40 shelves. Over time this grew to 1000s of SATA drives and was easily scalable but hard to manage, as it was pretty dumb storage. In fact, they had to deploy video cameras, focused on the drive shelves, to detect when drives failed!

Over time, Nuance came to the realization that they had to do something more manageable and brought in HPE MSA storage to replace the Coraid storage. The MSA was a great solution for them: it had 96 SAS drives, was able to support both faster "SCRATCH" storage using fast 300GB/15K RPM SAS drives and slower "STATIC" storage using 760GB/7.2K RPM SATA drives, and was much more manageable than the Coraid solution.

Although the MSA storage worked great, after a while Nuance's sprawling FC environment, which was doubling yearly, caused them to rethink their storage once again. This led them to swap out all their HPE MSA storage for HPE 3PAR, to consolidate their FC network and storage footprint.

For metadata, Nuance uses a 76-node Hadoop cluster for sophisticated search queries, as doing an LS on the GPFS file system would take days. Their file metadata is essentially a textual, row-by-row database, and they use queries over the Hadoop cluster to determine things like which files have American English, spoken by females, with an 8KHz recording. Not sure when, but eventually Nuance deployed HPE Vertica SQL over Hadoop for their metadata engine and dropped the average query time from 12 minutes to 73 seconds(!!).
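The kind of metadata query being described is conceptually just a filter over rows of file attributes. Here's a toy version run over an in-memory list rather than Vertica/Hadoop; the field names and values are made up for the example:

```python
# Toy metadata rows: one dict per speech file, with illustrative attribute names.
rows = [
    {"file": "a.wav", "language": "en-US", "speaker_gender": "F", "sample_rate_hz": 8000},
    {"file": "b.wav", "language": "en-GB", "speaker_gender": "M", "sample_rate_hz": 16000},
    {"file": "c.wav", "language": "en-US", "speaker_gender": "F", "sample_rate_hz": 16000},
]

# "Which files have American English, spoken by females, with an 8KHz recording?"
matches = [
    r["file"] for r in rows
    if r["language"] == "en-US"
    and r["speaker_gender"] == "F"
    and r["sample_rate_hz"] == 8000
]
print(matches)  # ['a.wav']
```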

Nuance, because of their extreme growth and their openness to storage innovation, has become a favorite for storage startups and major vendors to run Proofs of Concept (PoC) on new storage offerings. One PoC Nuance did was for Kaminario storage. There is a standard metric that says a CPU core requires so many IOPS, so as CPU cores increase, you need to supply more IOPS. They went with Kaminario for their test-dev environment and more performance-intensive storage. Nuance appreciates Kaminario's reliability, high availability and highly predictable performance. (See the SFD10 video feed for Frederic's session.)

We talked a bit about how speech recognition's hidden Markov model statistical approach was heavily dependent on CPU cores. Previously, if you wanted to do a recognition task, you assigned it to one core and waited until it was done, a serial process dependent on the number of CPU cores you had available. This turned out to be quite a problem, as you had to scale CPU cores if you wanted to do more concurrent speech recognition activities. Then came GPUs, and you could do speech recognition work on a GPU core. With the new GPU cards, instead of a server having ~16 CPU cores, you could have a server with multiple graphics cards holding 3000+ GPU cores. This scaled a lot easier. Machine learning and deep neural nets have the potential to parallelize this work, so that it will scale even better.

In the end, HPC trials, tribulations and ways of doing business are starting to become mainstream. I was recently talking to one vendor who said that most HPC groups start out in isolation to support one application, but over time they either subsume corporate IT, get absorbed into corporate IT, or continue on as a standalone group (while waiting for one of the other two to happen).

The podcast runs ~41 minutes and covers a lot of ground about one HPC organization's evolution of their storage environment over time, what was driving some of that evolution and the tools they chose to master it. Listen to the podcast to learn more.

Frederic Van Haren, founder HighFens, Inc.

Frederic Van Haren is the Chief Technology Officer @HighFens and is known for his insights into the HPC and storage industries. He has over 20 years of experience in high tech, providing technical leadership and strategic direction in the telecom and speech markets. Frederic spent the last decade at Nuance Communications building large HPC environments from the ground up. He is frequently invited to speak at events to provide his insights on the HPC and storage markets. He has played leading roles as president of a variety of technology user groups promoting the use of innovative technology. As an engineer, he enjoys working with the engineering teams of technology vendors, providing feedback on new and upcoming products.

Frederic lives in Massachusetts, USA, but grew up in the northern part of Belgium, where he received his Master's in Electrical Engineering, Electronics and Automation.

27: GreyBeards talk HPC storage with Molly Rector, CMO & EVP, DDN

In our 27th episode we talk with Molly Rector (@MollyRector), CMO & EVP of Product Management/Worldwide Marketing for DDN. Howard and I have known Molly since her days at Spectra Logic. Molly is also on the boards of SNIA and the Active Archive Alliance (AAA), so she's very active in the storage industry, on multiple dimensions, and a very busy lady.

We (or maybe just I) didn't know that DDN has a 20-year history in storage and in servicing high performance computing (HPC) customers. It turns out that more enterprise IT organizations are starting to take on workloads that look like HPC activity.

In HPC there are 1000s of compute cores crunching on PBs of data. For Oil & Gas companies, it's seismic and wellhead analysis; with bio-informatics it's genomic/proteomic analysis; and with financial services, it's economic modeling/backtesting of trading strategies. For today's enterprises, such as retailers, it's customer activity analytics; for manufacturers, it's machine sensor/log analysis; and for banks/financial institutions, it's credit/financial viability assessments. Enterprise IT might not have 1000s of cores at their disposal just yet, but it's not far off. Molly thinks one way to help enterprise IT is to provide a SuperComputer-as-a-Service (ScaaS?) offering, where top-10 supercomputers can be rented out by the hour, sort of like a supercomputing compute/data cloud.

We start early on talking about the DDN WOS object store, which can handle archiving to cloud or backend tape libraries. Later we discuss DDN ExaScaler and GridScaler, which are NAS appliances for Lustre and massively scale-out, parallel file system storage, respectively.

Another key supercomputing storage requirement is predictable performance. Aside from sophisticated QoS offerings across their products, DDN also offers the IME solution, a "bump in the cable" caching system that can optimize large and small file IO activity for backend DDN NAS scalers. DDN IME is stateless and can be removed from the data path while still allowing IT access to all their data.

While we were discussing DDN storage interfaces, Molly mentioned they were working on Omni-Path Fabric support. Intel's new Omni-Path Fabric is intended to replace rack-scale PCIe networks for HPC.

This month's edition is not too technical and runs just over 45 minutes. We only got to SNIA and AAA at the tail end, and just for a minute or two. Molly's always fun to talk to, with enough technical smarts to keep Howard and me at bay, at least for a while :). Listen to the podcast to learn more.

Molly Rector, CMO and EVP Product Management & Worldwide Marketing, DDN

With 15 years of experience working in the HPC, Media and Entertainment, and Enterprise IT industries running global marketing programs, Molly Rector serves as DDN’s Chief Marketing Officer (CMO) responsible for product management and worldwide marketing. Rector’s role includes providing customer and market input into the company’s product roadmap, raising the Corporate brand visibility outside traditional markets, expanding the partner ecosystem and driving the end-to-end customer experience from definition to delivery.

Rector is a founding member and currently serves as Chairman of the Board for the Active Archive Alliance. She is also Vice Chairman of the Board of the Storage Networking Industry Association (SNIA) and Vice Chairman of its Analytics and Big Data committee. Prior to joining DDN, Rector was responsible for product management and worldwide marketing as CMO at Spectra Logic. During her tenure at Spectra Logic, the company grew revenues consistently by double digits year-over-year, while also maintaining profitability. Rector holds certifications as a CommVault Certified System Administrator, Veritas Certified Data Protection Administrator, and Oracle Certified Enterprise DBA: Backup and Recovery. She earned a Bachelor of Science degree in biology and chemistry.

15: GreyBeards talk object storage with Russ Kennedy, Sr. VP Prod. Strategy & Cust. Solutions, Cleversafe

In our 15th podcast we talk object storage with Russ Kennedy, Senior V.P. of Product Strategy and Customer Solutions, Cleversafe. Cleversafe is a 10-year-old company selling scale-out, object storage solutions with a number of interesting characteristics. Howard and I had the chance to talk with Cleversafe at SFD4 (we suggest you view the video if you want to learn more), just about a year ago. But we have both known Russ for a number of years and Ray has done work for Cleversafe in the past.

We haven't talked about object storage in the past, so this podcast goes over some foundational information about it. Object storage is starting to become more mainstream and general purpose as more interfaces become available and as the amount of data being stored grows out of sight. Object storage has a flat namespace, rich metadata, and relatively rudimentary native storage access methods. But on top of this one can build sophisticated, PB-scale storage environments that can handle high amounts of data throughput, spread that data across multiple sites, and provide highly fault-tolerant/highly available storage. Object storage will never replace OLTP block-oriented storage, but for environments with massive unstructured data repositories, it's probably the best solution out there today.

Cleversafe has some unique characteristics, namely the ability to split object storage elements over multiple, disparate locations and use erasure coding to supply data availability in the event of storage, server, or site failures. Some other object storage systems use 2- or 3-way replication to protect against data loss. But Russ makes the apt comment that when you are talking about PBs of data, replication can cause your storage costs to go up quite fast. Someone mentioned that there are Cleversafe customers achieving 15-9s data availability using erasure coding with only 150% of the original capacity. This is significantly more reliability than could be obtained by dual or even triple redundancy alone. However, I always find that the weak link in data reliability discussions such as these is the software that implements the solution, not the data integrity architecture of the system.
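The capacity argument is easy to see with a little arithmetic. The dispersal width below (any 10 of 15 slices) is an assumed example chosen to match the 150% figure, not Cleversafe's actual configuration:

```python
# Raw capacity needed per 100 units of user data: replication vs. dispersal/erasure coding.
def replication_raw_pct(copies):
    return copies * 100

def dispersal_raw_pct(data_slices, total_slices):
    return total_slices / data_slices * 100

print(replication_raw_pct(3))     # 300% for 3-way replication
print(dispersal_raw_pct(10, 15))  # 150% for a 10-of-15 spread (tolerates 5 lost slices)
```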

Currently, Cleversafe has many multi-PB installations, some of which span continents and others of which are looking to breach an EB (10**18 bytes of storage) of object data. We asked what these customers look like and Russ said lots of Accessors® (stateless on- and off-ramps for object data) and lots more Slicestors® (servers holding the stateful storage).

One of the significant barriers to broader object storage adoption has always been the unique, native object storage access protocols. But these days, it turns out that Amazon's S3 protocol has become the de facto standard for object storage, and this is helping accelerate object storage adoption. In the podcast we discuss how, historically, de facto standards have been a successful way to introduce new storage access protocols. Cleversafe offers its native RESTful access protocol, S3 and a smattering of others, but you can also use partner solutions if you need standard file access to the object store.

Cleversafe also offers HDFS as another access protocol. With Cleversafe HDFS, Hadoop can access all of its data from the Cleversafe object repository. In addition, you can run Hadoop MapReduce on its Slicestor nodes, if you want. Apparently, moving PBs of data to analyze it and then deleting it is an expensive and very time-consuming proposition, and of course native HDFS uses triple redundancy…

In the podcast, we get into object storage, some of Cleversafe’s advanced functionality, access protocol evolution and more. Listen to the podcast to learn more…

This month's episode comes in at a little more than 47 minutes.

Russ Kennedy, Sr. VP Product Strategy & Customer Solutions, Cleversafe

Russ Kennedy brings more than 20 years of experience in the storage industry to Cleversafe as the company's Senior Vice President of Product Strategy and Customer Solutions. Having rolled up his sleeves working on automated tape libraries, Russ is still attracted to the technological challenges that have shaped the industry, and particularly to the innovative approach that Cleversafe delivers to storage.

Russ joined the company initially in 2007 and left in 2009, staying on in an advisory role. In 2011, Russ rejoined the company seeing a clear opportunity to solve the storage needs surrounding the exponential growth of big data and the unique impact that Cleversafe delivers over traditional systems.

Previously, Russ served as the Vice President of Competitive Intelligence at CA Technologies, and was the Senior Director of Engineering and Product Management at Thin Identity Corporation. Russ has an MBA from the University of Colorado at Denver and a bachelor’s degree in Computer Science from Colorado State University.