89: Keith & Ray show at Pure//Accelerate 2019

There were plenty of announcements at Pure//Accelerate in Austin this past week, and we were given a preview of them at a StorageFieldDay Exclusive (SFDx) the day before they were announced.

First up is Pure’s DirectMemory. Pure has added Optane SSDs to FlashArray//X to serve as a read cache for customer data. As you may know, Pure already has an NVRAM write cache. With DirectMemory, customers can add 3TB or 6TB of Optane storage to a FlashArray//X70 or //X90 storage system. It looks almost plug and play: you take out one or two flash modules, plug in the Optane SSD(s), and off it goes. DirectMemory went GA at the show.
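To make the read cache idea concrete, here is a generic least-recently-used (LRU) read-through cache sitting in front of a slower backing store. This is a hypothetical illustration only: Pure hasn't said what replacement policy DirectMemory uses, and the `ReadCache` class and its `backend_read` callback are names invented for the sketch.

```python
from collections import OrderedDict
from typing import Callable


class ReadCache:
    """Hypothetical LRU read-through cache in front of a slower store.
    Not Pure's DirectMemory algorithm, just the general technique."""

    def __init__(self, capacity_blocks: int, backend_read: Callable[[int], bytes]):
        self.capacity = capacity_blocks
        self.backend_read = backend_read   # slow path, e.g. reading from flash
        self.blocks = OrderedDict()        # block address -> cached data, LRU first
        self.hits = 0
        self.misses = 0

    def read(self, lba: int) -> bytes:
        if lba in self.blocks:
            self.blocks.move_to_end(lba)   # mark as most recently used
            self.hits += 1
            return self.blocks[lba]
        self.misses += 1
        data = self.backend_read(lba)      # miss: fetch from the backing store
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def invalidate(self, lba: int) -> None:
        # A write to this address makes any cached copy stale, so drop it.
        self.blocks.pop(lba, None)
```

A read cache only needs invalidation on writes, since writes themselves go through a separate write path (in Pure's case, the NVRAM write cache mentioned above).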

Pure also announced FlashArray//C at Accelerate, a new capacity-optimized storage solution. They have redesigned their flash module to support higher capacity flash and supply higher capacity storage (targeted at QLC flash, but it will initially ship with TLC). FlashArray//C supplies ~5PB of effective (~1.4PB raw) capacity in 9U. Although FlashArray//C offers cheaper storage on a $/GB basis, it is also much slower (response time latency on the order of 2-4 msec) than FlashArray//X storage. Pure, like other vendors we have talked with, is trying to drive disk technology out of the enterprise. We had some interesting discussions with Pure (and others) on this topic at the reception. Just remember, tape is still alive and well in the enterprise AND the cloud, 52 years after being pronounced dead.
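The effective and raw capacity figures above imply a data reduction ratio. A quick back-of-the-envelope calculation, using only the approximate numbers quoted (so the result is approximate too):

```python
def data_reduction_ratio(effective_pb: float, raw_pb: float) -> float:
    """Effective (post data reduction) capacity divided by raw capacity."""
    return effective_pb / raw_pb


ratio = data_reduction_ratio(effective_pb=5.0, raw_pb=1.4)
print(f"implied data reduction: {ratio:.1f}:1")            # 3.6:1
print(f"effective PB per rack unit (9U): {5.0 / 9:.2f}")   # 0.56
```

In other words, the ~5PB effective number assumes roughly 3.6:1 data reduction on the ~1.4PB of raw flash.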

Pure had announced CloudBlockStore (CBS) previously, but it is now GA through partners or on the AWS Marketplace. Give them kudos: they have taken a different approach to Pure storage in the cloud. With CBS, they have effectively re-architected and re-implemented Pure FlashArray using AWS EC2, EBS (io1) and S3 storage, and ended up with highly available, software-defined iSCSI block storage. It will be interesting to see how well it’s adopted. The picture is of me explaining the CBS architecture to @DVellante.

For Pure’s FlashBlade storage, they have doubled the number of blades in a cluster (or namespace), from 75 to 150 FlashBlades. Each FlashBlade contains both storage and compute (almost computational storage), so bandwidth should increase with the added blades. No one at Pure would go on record with specific performance numbers because the larger configuration is still undergoing testing.

Finally, FlashArray//X will offer full NFS and SMB file support, which comes from a recent acquisition (Compuverde). Pure plans to differentiate FlashArray//X file storage from FlashBlade this way: FlashArray//X file is for customers with mostly block storage requirements who also need a small amount of file storage, and FlashBlade is for everyone else who needs file.

The podcast is ~23 minutes. Keith is a longtime friend and co-host of our GreyBeards On Storage podcast. He’s always got an interesting perspective on how new technology can benefit the data center today. Listen to the podcast to learn more.


Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

88: A GreyBeard talks DataPlatform with Jon Hildebrand, Principal Technologist, Cohesity at VMworld 2019

Sponsored by:

This is another sponsored GreyBeards on Storage podcast, and it was recorded at VMworld 2019. I talked with Jon Hildebrand (@snoopJ123), Principal Technologist at Cohesity. Jon’s been a longtime friend from TechFieldDay days and has been working at Cohesity for ~14 months now. In that short time, Jon’s seen a lot of changes in Cohesity functionality.

Indeed, they just announced general availability of Cohesity 6.4, which he called a “major release”. One of the first things we talked about in the 6.4 release was CyberScan, Powered by Tenable, a new capability that scans backup data for vulnerabilities and risk posture. This way customers can assess their data to see if it’s been infected, potentially long before ransomware or other cyber threats can cripple their systems.

Another feature in 6.4 is new runbook automation, the Cohesity Runbook application, which can be used, for instance, to stand up a physical clone of customer data and applications in the cloud or elsewhere. This way customers can have a fully operational copy of their applications running in the cloud, automatically supplied by Cohesity Runbook. Besides DR and DR testing, such capabilities could be used to fire up a Test/Dev environment of your production applications on public cloud infrastructure.

The last 6.4 feature that Jon and I discussed supports archiving data from primary NAS/filer storage systems by moving that data out to Cohesity NAS. A stub or symlink to the data is retained on the primary NAS system. By doing that, customers still have access to all the metadata and can access the data anytime they want, while freeing up primary storage capacity and offloading most of the IO processing needed to access the data.

Cohesity NAS provides the capacity and the processing power to support the IO and data that has been archived. With the new feature, Cohesity DataPlatform acts as an archive or tier of storage behind the primary NAS server. By doing so, customers should be able to delay tech refresh cycles, which should save them time and money. 
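Cohesity didn’t describe its stub format on the podcast, so the following is only a generic, hypothetical sketch of stub-based archiving on a POSIX filesystem: files not modified within an age threshold are moved from a primary directory to an archive directory, and a symlink is left at the original path so applications still find the data. The function name `archive_cold_files` and the age-based selection policy are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def archive_cold_files(primary: Path, archive: Path, max_age_days: float) -> list:
    """Hypothetical tiering sketch: move files not modified in max_age_days
    from primary to archive, leaving a symlink stub at the original path."""
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for path in sorted(primary.rglob("*")):   # list first, since we modify the tree
        if path.is_symlink() or not path.is_file():
            continue                           # skip existing stubs and directories
        if path.stat().st_mtime >= cutoff:
            continue                           # still warm: leave it on primary
        target = archive / path.relative_to(primary)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))    # data now lives on the archive tier
        path.symlink_to(target)                # stub: reads follow the link
        moved.append(path)
    return moved
```

Because reads follow the symlink, applications keep using the original path, while the capacity and the IO for the archived file move to the archive tier.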

When I asked Jon if there were any last items he wanted to discuss, he mentioned the Cohesity Truck. Apparently John, Chris and others at Cohesity have stood up a complete data center inside a semi-trailer. Jon said if they can’t bring customers to the Executive Briefing Center (EBC), then they can bring the EBC to the customers. Jon said the truck is touring the USA, and you can arrange a visit by going to Cohesity.com/tour.

The podcast is a little under 20 minutes. Jon is an old friend from TechFieldDays and seems to be taking to Cohesity very well. I’ve always respected Jon’s knowledge of the customer environment and his technical acumen. Listen to the podcast to learn more.

Jon Hildebrand, Principal Technologist, Cohesity. 

Principal Technologist @ Cohesity | Public Speaker | Blogger | Purveyor of PowerShell | VMware vExpert | Cisco Champion

87: Matt & Ray show at VMworld 2019

Matt and Ray were both at VMworld 2019 in San Francisco this past week, and we did an impromptu podcast on recent news at the show.

VMware announced a number of new projects, and just prior to the show they announced their intent to acquire Pivotal and Carbon Black. Pat’s keynote on the first day covered a number of new products and features, but he also spent time discussing how VMware is going to incorporate these acquisitions.

One thing that caught a lot of attention was “The Tanzu Portfolio”, which was all about how VMware is adopting Kubernetes as an integral and native part of vSphere moving forward. Project Pacific was their working name for integrating Kubernetes as a native feature of vSphere. And Tanzu Mission Control is a new multi-cloud/hybrid cloud management solution for Kubernetes clusters wherever they run.

VMware has had a rather lengthy history with container support, from Project Photon, to VIC, to running PKS on top of vSphere. But with Project Pacific, Kubernetes is now being brought under the covers of vSphere, and any ESXi cluster becomes a Kubernetes cluster.

We also talked a little bit about Carbon Black and its endpoint security. Neither of us are security experts, but Matt mentioned another company he talked with at the show that based their product on workload profiling to determine when something has gone amiss.

It’s Ray’s belief that Carbon Black does much the same profiling, only for endpoint devices: desktops, laptops, and mobile devices (maybe not thin clients).

Pat also talked a bit about IoT and edge processing at the show, and VMware has a push to support more forms of edge computing.

Ray mentioned he talked at the show with HiveCell, who had a standalone Arm server about the size of a big book that can be stood up just about anywhere there’s power and Ethernet.

Unfortunately there’s some background noise on the podcast, and it happens to be a short one, at just over 16.5 minutes. This podcast represents a departure for us, as the Greybeards have never done a live recording at a conference before. We plan to do more of these, so we hope you enjoy it. Please let us know what you think about it and if there’s anything we could do to improve our live recording shows. There’s more on the recording, so listen to the podcast to learn more.

Matt Leib

Matt Leib (@MBLeib), one of our co-hosts, has been blogging in the storage space for over 10 years, with work experience on both the engineering and the presales/product marketing sides. His blog is at Virtually Tied to My Desktop and he’s on LinkedIn.

86: Greybeards talk FMS19 wrap up and flash trends with Jim Handy, General Director, Objective Analysis

This is our annual Flash Memory Summit podcast with Jim Handy, General Director, Objective Analysis. It’s the 5th time we have had Jim on our show. Jim is also an avid blogger writing about memory and SSD at TheMemoryGuy and TheSSDGuy, respectively.

NAND market trends

Jim started off our discussion with the significant price drop in the NAND market over the last two years. He said that prices ($/GB) dropped 60% last year and are projected to drop about 30% this year.

The problem is overproduction, and as vendors are prohibited from dropping prices below cost, prices tend to flatten out at production cost. NAND pricing will remain there until supplies start tightening again. Jim doesn’t see that happening until 2021.

He says that although these NAND price drops don’t end up reducing SSD prices, they do allow us to buy more SSD storage for the same price. So where NAND maybe cost $10K/GB back earlier this century, now it’s around $0.05/GB.
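Successive percentage drops compound multiplicatively rather than adding up. A small sketch using the two figures Jim quoted (60% last year, a projected ~30% this year) shows how much of the starting $/GB remains and how much more capacity the same budget buys:

```python
def remaining_price(drops):
    """Fraction of the starting $/GB left after successive fractional price drops."""
    price = 1.0
    for drop in drops:
        price *= 1.0 - drop
    return price


left = remaining_price([0.60, 0.30])   # 60% last year, ~30% projected this year
print(f"price left: {left:.0%}")                  # 28%
print(f"GB per dollar vs. start: {1 / left:.1f}x")  # 3.6x
```

So the two drops together take $/GB down by roughly 72% (not 90%), which means the same SSD budget buys about 3.6 times the capacity.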

Jim also mentioned that Chinese NAND fabs should start coming online in 2021 too. China has been spending lots of money trying to get its own NAND manufacturing running. Jim said the reason is that China spends more $s on chips than it does on oil.

Computational storage, a bright spot

At the show, computational storage (for more hear our GBoS podcast with Scott Shadley, NGD Systems) was hot again this year. Jim took a shot at defining computational storage and talked about the proliferation of ARM cores in SSDs. Keith mentioned that Moore’s law is making the incremental cost of adding more cores close to zero.

Jim said Samsung already has 6 ARM cores in its SSDs, but most other vendors use 3 cores. I met with NetInt at the show, who are focused on computational storage for video transcoding. Keith doesn’t think this would be a good fit, because it takes a lot of computation. But as it’s easily distributed (out to a gaggle of SSDs) and data intensive, it might work OK. Jim also mentioned that while adding cores may be cheap, increasing memory (DRAM) is not.

According to Jim, hyperscalers are starting to buy computational storage technology. He’s not sure if they are just trying it out or have some real work running on the technology.

SCM news

We talked about Toshiba’s new XL Flash and SSDs. Jim said this is just SLC NAND (expensive $/GB and high endurance) with increased parallelism and reduced-latency data paths. Samsung’s Z-NAND is similar. Toshiba claims XL Flash SSDs are another storage class memory (SCM, see our 3DX blog post). Toshiba is pricing XL Flash SSDs at about 10X the $/GB price of 3D TLC NAND, or roughly the same as Optane SSDs.

We next turned to Optane DC PM, which Intel is selling at a loss. But as it works only with Cascade Lake CPUs, it can help increase CPU adoption. So Intel can absorb Optane DC PM losses by selling more (highly profitable) Cascade Lake systems.

Keith mentioned that SAP HANA now works with Cascade Lake-Optane DC PM. This is driving up demand for the new DC PM and new CPUs. Keith said that with the larger in-memory databases DC PM makes possible, HANA is able to do more work, increasing Cascade Lake-Optane DC PM-SAP HANA adoption.

Micron also manufactures 3DX. Jim said they are in an enviable position: as they supply the chips (at cost) to Intel, they know chip volumes and can see what Intel is charging for the technology. So if, at some point, the technology has a runway to become profitable, Micron can easily enter as the sole second source for it.

Other NAND news

How high can 3D TLC NAND go? Jim said most 3D NAND sold on the market is 64 layers high, but suppliers are already shipping more layers than that. All NAND suppliers, bar one, have said their next generation 3D TLC NAND will be over 100 layers. Some years back one vendor said the technology could go up to 500 layers. This year Samsung said they see the technology going to 800 layers.

We’ve heard of SLC, MLC, TLC and QLC, but at the show there was talk of PLC, or penta-level cell (five bits per cell), NAND technology. If they can make the technology successful, PLC should reduce manufacturing costs by another ~10% ($/GB).
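Each extra bit per cell doubles the number of charge levels a cell must reliably distinguish, while the raw bits-per-cell gain shrinks with every step. A short sketch of that arithmetic (it says nothing about any vendor’s actual cost model, and the ~10% $/GB figure quoted above is well below the raw 25% bits-per-cell gain from QLC to PLC):

```python
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4, "PLC": 5}  # bits per cell


def voltage_states(bits_per_cell: int) -> int:
    # A cell storing n bits must distinguish 2**n charge levels
    return 2 ** bits_per_cell


def density_gain(bits_per_cell: int) -> float:
    # Extra bits per cell relative to the previous generation (one fewer bit)
    return bits_per_cell / (bits_per_cell - 1) - 1


for name, bits in CELL_TYPES.items():
    line = f"{name}: {bits} bit(s)/cell, {voltage_states(bits)} levels"
    if bits > 1:
        line += f", +{density_gain(bits):.0%} bits/cell vs. previous"
    print(line)
```

Going from QLC to PLC means squeezing 32 charge levels into a cell instead of 16, for only 25% more bits per cell.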

We discussed a lot more that was highlighted at the show, including PCIe fabric/composable infrastructure, zoned (NVMe) namespaces (a redux of SMR disks) and the ongoing success of the show. We had a brief discussion on when, if ever, NAND costs will be less than disk ($/GB).

The podcast is a little under 40 minutes. Jim is an old friend, who is extremely knowledgeable about NAND & DRAM technology as well as semiconductor markets in general. Jim’s always been a kick to talk with. Listen to the podcast to learn more.

Jim Handy, General Director, Objective Analysis

Jim Handy of Objective Analysis has over 35 years in the electronics industry including 20 years as a leading semiconductor and SSD industry analyst. Early in his career he held marketing and design positions at leading semiconductor suppliers including Intel, National Semiconductor, and Infineon.

A frequent presenter at trade shows, Mr. Handy is known for his technical depth, accurate forecasts, widespread industry presence and volume of publication.

He has written hundreds of market reports, articles for trade journals, and white papers, and is frequently interviewed and quoted in the electronics trade press and other media. 

He posts blogs at www.TheMemoryGuy.com, and www.TheSSDguy.com

85: GreyBeards talk NVMe NAS with Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data Inc.

As most of you know, Howard Marks was a founding co-host of the GreyBeards on Storage podcast and has since joined VAST Data, an NVMe file and object storage vendor headquartered in NY with R&D out of Israel. We first met with VAST at StorageFieldDay18 (SFD18, video presentation). Howard announced his employment at that event. VAST was a bit circumspect at their SFD18 session, but Howard seems to be more talkative, so on the podcast we learn a lot more about their solution.

VAST Data is essentially a scale-out NFS and S3 object storage solution built from stateless VAST Data storage servers and JBoF drive enclosures with Optane and NVMe QLC SSDs. Storage servers and JBoFs can be scaled independently. They don’t support tiering or DRAM caching of data, but instead seem to use the Optane SSDs as a write buffer for the QLC SSDs.

At the SFD18 event their spokesperson said that they were going to kill off disk storage media. (Ed’s note: Disk shipments fell 18% y/y in 1Q 2019, with enterprise disk shipments at 11.5M units, desktop at 24.5M units and laptops at 37M units).

The hardware

The VAST Data storage servers come in a 2U, 4-server configuration and run the interface protocols (NFS & S3), data reduction (see below), data reformatting/buffering, etc. They are stateless servers, with all the metadata and other control state maintained on the JBoF Optane drives.

Each JBoF drive enclosure has 12 Optane SSDs and 44 U.2 QLC (no DRAM/no supercap) SSDs. This means there are no write buffers on the QLC SSDs that can lose data when power failures occur. The interface to the JBoF is NVMeoF, over either RDMA (RoCE) Ethernet or InfiniBand (customer selected). Their JBoFs are highly available, with dual fabric modules that each support two 100Gbps Ethernet/InfiniBand ports, for 4 ports per JBoF.

Minimum starting capacity is 500TB, and they claim support for up to exabytes, although how much has actually been tested is an open question. They also support billions of objects/files.

Guaranteed better data reduction

They have a rather unique, multi-level data reduction scheme. At the start, data is split into variable-length chunks, using heuristics to determine the chunk size that fits best. (Ed note: it’s unclear which step comes first in the sequence below, so it’s presented in (our view of) logical order.)

  • 1st level computes a similarity hash (56 bit not SHA1), which is used to determine a similarity level with any other currently stored data chunk in the system.
  • 2nd level uses a ZSTD compression algorithm. If a similarity is found, the new data chunk is compressed with the ZSTD compression algorithm and a reference dictionary used by the earlier, similar data chunk. If no existing chunk is similar to this one, the algorithm identifies a semi-unique reference dictionary that optimizes the compression of this data chunk. This semi-unique dictionary is stored as metadata.
  • 3rd level: if it turns out to be a complete duplicate data chunk, then the dedupe count for the original data chunk is incremented, a pointer to the original unique data is saved and the new data is discarded. If it is not a complete duplicate of other data, the system computes a delta from the closest “similar” block, stores just the delta bytes along with a pointer to the original similar block, and increments a delta block counter.

So data is chunked, compressed with an optimized dictionary, and then delta-diffed or deduped. All data reduction is done post data write (after the client is ACKed), and data is presumably re-hydrated after being read from SSD media. VAST Data guarantees better data reduction for your stored data than any other storage solution.
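The pipeline above can be approximated with standard-library pieces. This is a toy sketch under several loudly stated assumptions, not VAST’s implementation: a bottom-k MinHash-style sketch over 8-byte shingles (hashed with 56-bit BLAKE2b) stands in for their similarity hash, zlib’s preset dictionary (`zdict`) stands in for ZSTD with a reference dictionary, and compressing a chunk against its similar neighbor doubles as the delta step. The exact-duplicate check runs first because it is cheapest; `ReducingStore` and `similarity_sketch` are hypothetical names.

```python
import hashlib
import zlib

SHINGLE = 8  # bytes per shingle: a hypothetical parameter


def similarity_sketch(chunk: bytes, k: int = 4) -> tuple:
    """Bottom-k sketch: the k smallest 56-bit shingle hashes. A stand-in
    for VAST's similarity hash, whose details aren't public here."""
    hashes = {
        int.from_bytes(hashlib.blake2b(chunk[i:i + SHINGLE], digest_size=7).digest(), "big")
        for i in range(max(1, len(chunk) - SHINGLE + 1))
    }
    return tuple(sorted(hashes)[:k])


class ReducingStore:
    def __init__(self):
        self.by_digest = {}  # SHA-256 digest -> chunk id (exact-duplicate index)
        self.by_sketch = {}  # sketch value -> chunk id (similarity index)
        self.chunks = {}     # chunk id -> (reference chunk id or None, compressed bytes)
        self.refcount = {}   # chunk id -> number of writes that map to it

    def write(self, chunk: bytes) -> int:
        digest = hashlib.sha256(chunk).digest()
        if digest in self.by_digest:        # complete duplicate: count it, store nothing
            cid = self.by_digest[digest]
            self.refcount[cid] += 1
            return cid
        sketch = similarity_sketch(chunk)
        ref = next((self.by_sketch[v] for v in sketch if v in self.by_sketch), None)
        # Compress against the similar chunk used as a preset dictionary, which
        # behaves like a delta; with no similar chunk, use plain compression.
        comp = zlib.compressobj(zdict=self.read(ref)) if ref is not None else zlib.compressobj()
        cid = len(self.chunks)
        self.chunks[cid] = (ref, comp.compress(chunk) + comp.flush())
        self.refcount[cid] = 1
        self.by_digest[digest] = cid
        for v in sketch:
            self.by_sketch.setdefault(v, cid)
        return cid

    def read(self, cid: int) -> bytes:
        ref, data = self.chunks[cid]
        dec = zlib.decompressobj(zdict=self.read(ref)) if ref is not None else zlib.decompressobj()
        return dec.decompress(data) + dec.flush()
```

Note that zlib only looks back 32KB into a preset dictionary, which is one reason this stays a toy; a real system would use a codec built for dictionary compression at its chunk sizes.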

New data protection

They also supply a unique Locally Decodable Erasure Coding scheme with 4 parity(-like) blocks and anywhere from 36 (single enclosure, leaving 4 spare U.2 SSDs) to 150 data blocks per stripe, all of which support up to 4 device failures per stripe.

The locally decodable erasure coding scheme allows for rebuilds without having to read all remaining data blocks in a stripe. In this scheme, once you read the 4 parity(-like) blocks, one has all the information calculated from up to ¾ of the remaining drives in the stripe, so the system only has to read the remaining ¼ of the drives in the stripe to reconstruct one, two, three, or four failing drives. Given their data stripe width, this cuts down on the amount of data needing to be read considerably. Still, with 150 data drives in a stripe, the system has to read 38 drives’ worth of QLC SSD data to rebuild a data drive.
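The rebuild arithmetic described above can be checked in a few lines. This models only the read counts as described on the podcast (read the 4 parity blocks plus roughly ¼ of the data drives), not the coding math itself, and compares them with a conventional k+m code, which must read k surviving blocks to rebuild. The constant names are ours.

```python
import math

PARITY_BLOCKS = 4  # parity(-like) blocks per stripe
SPARE_DRIVES = 4   # spares held back in a single 44-SSD enclosure


def data_drives_to_read(data_blocks: int) -> int:
    """Per the description: parity already encodes ~3/4 of the stripe,
    so only ~1/4 of the data drives must be read for a rebuild."""
    return math.ceil(data_blocks / 4)


for data in (36, 150):
    print(f"{data}+{PARITY_BLOCKS}: read {data_drives_to_read(data)} data drives "
          f"(plus {PARITY_BLOCKS} parity) vs. {data} for a conventional k+m code; "
          f"parity overhead {PARITY_BLOCKS / (data + PARITY_BLOCKS):.1%}")
```

The 36-wide stripe reads 9 data drives and the 150-wide stripe reads 38, matching the figure above, while parity overhead falls from 10.0% to about 2.6% as the stripe widens.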

In addition to all the above, VAST Data also reblocks the data into much larger segments (it writes 1MB segments to the QLC drives) and uses a heat map along with other heuristics to separate actively written data from less actively written data, thus reducing garbage collection and write amplification.
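Here is a hypothetical sketch of that idea. Incoming block writes are grouped into 1MB segments (1 MiB here) by “temperature”, with a block counted as hot once its address has been rewritten. Segments then hold blocks with similar lifetimes, so fewer segments end up half-invalid and needing garbage collection. VAST’s actual heat map and heuristics weren’t described, so `SegmentWriter` and the rewrite-count threshold are assumptions.

```python
from collections import Counter, defaultdict

SEGMENT_BYTES = 1 << 20  # 1 MiB segments, approximating the 1MB segments above


class SegmentWriter:
    """Hypothetical sketch: pack block writes into large segments by
    temperature so hot and cold data don't share a segment."""

    def __init__(self, hot_threshold: int = 2):
        self.hot_threshold = hot_threshold  # writes to an address before it counts as hot
        self.heat = Counter()               # logical block address -> write count
        self.open = defaultdict(list)       # temperature -> blocks buffered so far
        self.open_bytes = Counter()         # temperature -> bytes buffered so far
        self.sealed = []                    # (temperature, blocks) segments written to QLC

    def write(self, lba: int, data: bytes) -> None:
        self.heat[lba] += 1
        temp = "hot" if self.heat[lba] >= self.hot_threshold else "cold"
        self.open[temp].append((lba, data))
        self.open_bytes[temp] += len(data)
        if self.open_bytes[temp] >= SEGMENT_BYTES:
            # Segment is full: write it out as one large sequential unit.
            self.sealed.append((temp, self.open.pop(temp)))
            self.open_bytes[temp] = 0
```

In a design like VAST’s, the buffering would happen on the Optane SSDs, so only full, temperature-sorted segments reach the QLC drives.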

The podcast is a long one, running over ~43 minutes. Howard has always been great to talk with and, if anything, being a vendor has intensified this tendency. Listen to the podcast to learn more.

Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data, Inc.

Howard Marks brings over forty years of experience as a technology architect for hire and industry observer to his role as VAST Data’s Technologist Extraordinary and Plenipotentiary. In this role, Howard demystifies VAST’s technologies for customers and customer requirements for VAST’s engineers.

Before joining VAST, Howard ran DeepStorage, an industry test lab and analyst firm. An award-winning speaker, he has appeared at events on three continents including Comdex, Interop and VMworld.

Howard is the author of several books (all gratefully out of print) and hundreds of articles since Bill Machrone taught him journalism at PC Magazine in the 1980s.

Listeners may also remember that Howard was a founding co-host of the Greybeards-on-Storage Podcast.