095: GreyBeards talk file sync&share with S. Azam Ali, VP Customer Success at CentreStack

We haven't talked with a file sync and share vendor in a while and Matt was interested in the technology. He had been talking with CentreStack and found that they had been making some inroads in the enterprise. So we contacted S. Azam Ali, VP of Customer Success at CentreStack, and asked if he wanted to talk about their product on our podcast.

File sync and share is part collaboration tool, part productivity tool. With file sync & share, many users share the same files across many different environments and endpoint devices. It's especially popular with road warriors who need access, while on the road, to files that reside in corporate data centers. With this technology, files updated anywhere become available to all.

Most file sync & share systems require you to use their storage. But CentreStack just provides sync and share access to the NFS and SMB storage that's already in the data center.

CentreStack doesn't use VPNs to access data, as many other vendors do. With CentreStack, one just logs into a website (with AD credentials) and has immediate browser access to files.

CentreStack uses a gateway VM that runs in the corporate data center and is configured to share files, directories, and shares. We asked whether the gateway was in the data path and Azam said no. However, the gateway does register for file system notifications (e.g., when files are updated outside CentreStack, it gets notified).
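
For a sense of what registering for file system notifications can look like, here is a minimal sketch using the Python watchdog library; the watched path and the handler actions are our own placeholders, not CentreStack's implementation.

    # Hypothetical sketch: watch a shared directory for changes made outside the
    # gateway, using the Python "watchdog" library (not CentreStack's code).
    import time
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    class ShareWatcher(FileSystemEventHandler):
        def on_created(self, event):
            if not event.is_directory:
                print(f"new file outside the gateway: {event.src_path}")

        def on_modified(self, event):
            if not event.is_directory:
                # a real gateway would refresh its metadata and notify clients here
                print(f"file changed outside the gateway: {event.src_path}")

    observer = Observer()
    observer.schedule(ShareWatcher(), path="/exports/shared", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()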

CentreStack does maintain metadata on the files, directories, and shares that are under its control. Presumably, once an admin sets it up, it goes out, accesses the file systems that hold shared files, and populates its metadata for those files.
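
A minimal sketch of that initial metadata population might look like the following; the share path and the attributes collected are assumptions on our part.

    # Hypothetical sketch: walk a newly registered share and collect basic
    # per-file metadata (size and modification time), assuming a mounted path.
    import os

    def scan_share(root):
        catalog = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                catalog[path] = {"size": st.st_size, "mtime": st.st_mtime}
        return catalog

    metadata = scan_share("/exports/shared")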

CentreStack works with any NFS and SMB file system, as well as NAS servers that support those two protocols. It's unclear whether customers can have more than one gateway server in a data center supporting sync and share, but Azam did say that it wasn't unusual for customers with multiple data centers to have a gateway in each, to support that data center's sync & share requirements.

They use client software on endpoint devices, which presents the shared files as an external drive (on a Mac), presumably a cloud drive for Windows PCs, and similar services (in an app) for other systems (iOS and Android phones, iPads, etc.). We believe Azam said Linux support was coming soon.

The client software can be configured in cache mode or offline mode:

  • Cache mode – the admin can configure how much space to use on the endpoint device and the software will cache the most recently used files in that space for faster access
  • Offline mode – the software moves all files that the endpoint login can access down to the device.

In cache mode, when users open a file that's not in the most-recently-used cache, there will be some delay as the system retrieves the data over the internet and copies it to the endpoint device. It's unclear what the delay might be, but it's probably a function of internet speed and load on the gateway, with possibly some overhead for the NFS/SMB/NAS system to supply the data. If there's not enough space to hold the file, the oldest non-open file is erased from the cache.
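
A minimal sketch of that cache-mode eviction behavior, under our own assumptions about how the space budget and LRU ordering are tracked:

    # Hypothetical sketch of cache-mode eviction: stay within a configured space
    # budget and evict the least recently used, non-open file to make room.
    from collections import OrderedDict

    class EndpointCache:
        def __init__(self, budget_bytes):
            self.budget = budget_bytes
            self.used = 0
            self.files = OrderedDict()   # path -> size, kept in LRU order
            self.open_files = set()

        def touch(self, path):
            self.files.move_to_end(path)           # mark as most recently used

        def add(self, path, size):
            while self.used + size > self.budget:
                self._evict_oldest()
            self.files[path] = size
            self.used += size

        def _evict_oldest(self):
            for path, size in self.files.items():  # iterates oldest first
                if path not in self.open_files:
                    del self.files[path]
                    self.used -= size
                    return
            raise RuntimeError("cache is full of open files")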

In both modes, CentreStack supports cross-domain locking. That is, if one client has a file open for update, all other systems/endpoints may only access the file in read-only mode. After the file is closed, it can then be opened for update by other users.
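
As a toy illustration of that locking behavior (our own sketch, not CentreStack's protocol), a gateway-side lock table might work like this:

    # Hypothetical sketch of cross-domain locking: the first endpoint to open a
    # file for update holds the write lock; everyone else gets read-only access
    # until the file is closed.
    class LockTable:
        def __init__(self):
            self.writers = {}  # path -> endpoint currently holding the write lock

        def open(self, path, endpoint, for_update):
            holder = self.writers.get(path)
            if for_update and holder is None:
                self.writers[path] = endpoint
                return "read-write"
            return "read-write" if holder == endpoint else "read-only"

        def close(self, path, endpoint):
            if self.writers.get(path) == endpoint:
                del self.writers[path]

    locks = LockTable()
    assert locks.open("/shared/plan.docx", "laptop-1", for_update=True) == "read-write"
    assert locks.open("/shared/plan.docx", "phone-2", for_update=True) == "read-only"
    locks.close("/shared/plan.docx", "laptop-1")
    assert locks.open("/shared/plan.docx", "phone-2", for_update=True) == "read-write"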

When CentreStack clients are used to update files, the data is stored back in the original file systems with versioning. This way, if the data is corrupted, admins can easily revert to a known good version.
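
A minimal sketch of the write-back-with-versioning idea (the .vN naming scheme is our own placeholder):

    # Hypothetical sketch: before overwriting the original file, preserve the
    # prior contents as a numbered version so admins can revert to a good copy.
    import os
    import shutil

    def write_back(path, new_bytes):
        if os.path.exists(path):
            n = 1
            while os.path.exists(f"{path}.v{n}"):
                n += 1
            shutil.copy2(path, f"{path}.v{n}")   # keep the prior version
        with open(path, "wb") as f:
            f.write(new_bytes)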

CentreStack also offers a cloud backup and DR service. Gateway admins can request that sync & share files be backed up to cloud storage (AWS S3, Azure Blob and Wasabi). When CentreStack backs up file data to the cloud, it also includes metadata about the files so they can be reconstituted anywhere.
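
Since both AWS S3 and Wasabi speak the S3 API, a rough sketch of backing a file up along with its metadata could use boto3 like this; the bucket name, endpoint and metadata keys are our own placeholders:

    # Hypothetical sketch: back a file up to S3-compatible object storage
    # (AWS S3 or Wasabi) along with metadata describing the original file.
    import os
    import boto3

    def backup_file(path, bucket, endpoint_url=None):
        s3 = boto3.client("s3", endpoint_url=endpoint_url)  # e.g. a Wasabi endpoint
        st = os.stat(path)
        s3.upload_file(
            path, bucket, path.lstrip("/"),
            ExtraArgs={"Metadata": {
                "orig-path": path,
                "size": str(st.st_size),
                "mtime": str(st.st_mtime),
            }},
        )

    # backup_file("/exports/shared/plan.docx", "sync-backup",
    #             endpoint_url="https://s3.wasabisys.com")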

A CentreStack cloud gateway VM can be activated in the cloud to supply access to backed-up files. It's unclear whether the CentreStack cloud backup has to be restored to block or file storage first, or whether the gateway just accesses the data on cloud storage directly. But customers using CentreStack cloud DR would need to run the client software on the systems whose applications access these files.

Wasabi seemed an odd solution to have on their list of supported cloud storage providers, but Azam said that for their market, the economics of Wasabi storage were hard to ignore. See our previous podcast with David Friend, Co-Founder & CEO, Wasabi, to learn more about Wasabi.

CentreStack is licensed on a per-user basis, not storage capacity, bucking industry trends. But since they don't actually own the storage, that makes sense. For CentreStack cloud backup, customers also have to supply the cloud storage.

They also offer a 30-day free trial on their website with unlimited users. We assume this uses CentreStack's cloud gateway and that customers bring their own cloud storage to support it.

The podcast runs about 35 minutes. Azam was a bit more marketing than we are used to, but he warmed up once we started asking questions. Listen to the podcast to learn more.


S. Azam Ali, VP of Customer Success, CentreStack

S. Azam Ali is VP of Customer Success at CentreStack and an executive with extensive experience in managing global teams including sales, support and consulting services.

Azam's channel experience includes onboarding new partners and creating marketing and training collateral for them. Azam is an executive with a passion for customer success and for establishing long-term relationships and partnerships.

Azam is also an advisor to startups as well as established technology companies.

094: GreyBeards talk shedding light on data with Scott Baker, Dir. Content & Data Intelligence at Hitachi Vantara

Sponsored By:

At the Hitachi NEXT 2019 conference last month, there was a lot of talk about new data services from Hitachi. Keith and I thought it would be a good time to sit down and talk with Scott Baker (@Kraken-Scuba), Director of Content and Data Intelligence at Hitachi Vantara, about what's going on with data operations these days and how customers are shedding more light on their data.

Information supply chain

Something Scott said in his opening remarks caught my attention when he mentioned customer information supply chains. The information supply chain is similar to a manufacturing supply chain, but it's all about data. Just as in manufacturing supply chains, where parts and services come from anywhere and are used to create products/services for customers, information supply chains are about the data used in an organization's operations. Information supply chain data is A) sourced from many places (or applications); B) added to by supply chain processing (or other applications); and C) ultimately used by the organization to supply a product/service to customers.

But after the product/service is supplied, the similarity between manufacturing and information supply chains breaks down. With the information supply chain, data is effectively indestructible, infinitely reusable and can live forever. Who throws data away anymore?

The problem most organizations have with information supply chains is that once the product/service is supplied, the data is often put away, never to be seen again, or as Scott puts it, it goes dark.

This is where Hitachi Content Intelligence (HCI) comes in. HCI is designed to take (unstructured or structured) data and analyze it (using natural language and other processing tools) to surround it with information and other metadata, so that it becomes more visible and useful to the organization for the life of its existence.

Customers can also use HCI to extract and blend data streams together, automating the creation of an information-rich data repository. The data repository can readily be searched to re-discover or uncover attributes about the data that weren't visible before.

Scott also mentioned the Hitachi Pentaho Platform, which can be used to make real-time decisions from structured data. Pentaho information can also be fed into HCI to provide more intelligence for your structured data.

But HCI can also be used to analyze other database data as well. For instance, database blob and text elements can be fed to and analyzed by HCI. HCI analysis can include natural language processing and other functionality to tag the data by adding key:value information, all of which can be supplied back to the database or Pentaho to add further value to structured data.
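
As a toy illustration (not the HCI API), tagging a text element with key:value metadata that could be fed back to a database might look like this; the stop-word list and tag names are arbitrary, and simple keyword frequency stands in for real natural language processing:

    # Toy illustration of key:value tagging on a text blob; keyword frequency
    # here is only a stand-in for HCI's natural language processing.
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

    def tag_text(text, top_n=5):
        words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
        tags = {f"keyword_{i + 1}": word
                for i, (word, _count) in enumerate(Counter(words).most_common(top_n))}
        tags["word_count"] = str(len(words))
        return tags

    print(tag_text("Customer reported latency on the storage array after the upgrade"))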

Customers can also use HCI to read and transform database tables into XML files. XML files can be stored in object stores as objects or in file systems. The XML data could easily be textually indexed and searched by various tools to better understand the structured data.
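
A minimal sketch of the table-to-XML idea, using sqlite3 and ElementTree as stand-ins for whatever database and tooling a customer actually runs (the database file and table name are placeholders):

    # Hypothetical sketch: dump a database table to XML so it can be stored as an
    # object or file and textually indexed later.
    import sqlite3
    import xml.etree.ElementTree as ET

    def table_to_xml(db_path, table):
        conn = sqlite3.connect(db_path)
        cur = conn.execute(f"SELECT * FROM {table}")
        columns = [d[0] for d in cur.description]
        root = ET.Element(table)
        for row in cur:
            rec = ET.SubElement(root, "row")
            for col, val in zip(columns, row):
                ET.SubElement(rec, col).text = "" if val is None else str(val)
        conn.close()
        return ET.tostring(root, encoding="unicode")

    # xml_text = table_to_xml("orders.db", "orders")  # then index/search the XML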

We also talked about Hadoop data that can be offloaded to Hitachi Content Platform (HCP) object storage with a stub left behind. Once data is in HCP, HCI can be triggered to index and add more metadata, which can then later be used to decide when to move data back to Hadoop for further analysis.

Finally, Keith mentioned that he just got back from KubeCon and there was an increasing cry for data being used with containerized applications. Scott mentioned HCP for Cloud Scale, the newest member of the HCP object store family, focused on scale-out capabilities to provide highly consistent object storage performance for customers that need it. Customers running containerized workloads use scale-out capabilities to respond to user demand, and now they have on-premises object storage that can scale with them as needs change.

The podcast ran ~24 minutes. Scott was very knowledgeable about data workflows, pipelines and the need for better discovery tools. We had a great time discussing information supply chains and how Hitachi can help customers optimize their data pipelines. Listen to the podcast to learn more.


Scott Baker, Director of Content and Data Intelligence at Hitachi Vantara

Scott Baker is, and has been, an active member of the information technology, data analytics, data management, and data protection disciplines for longer than he is willing to admit.

In his present role at Hitachi, Scott is the Senior Director of the Content and Data Intelligence organization focused on Hitachi’s Digital Transformation, Data Management, Data Governance, Data Mobility, Data Protection and Data Analytics solutions which includes Hitachi Content Platform (HCP), HCP Anywhere, HCP Gateway, Hitachi Content Intelligence, and Hitachi Data Protection Solutions.

Scott is a VMware Certified Professional, recognized as a subject matter expert, industry speaker, and author. Scott has been a panelist on topics related to storage, cloud, information governance, data security, infrastructure standardization, and social media topics. His educational background includes an MBA, Master’s & Bachelor’s in Computer Science.

When he’s not working, Scott is an avid scuba diver, underwater photographer, and PADI Scuba Instructor. He has a passion for public speaking, whiteboarding, teaching, and traveling the world.

89: Keith & Ray show at Pure//Accelerate 2019

There were plenty of announcements at Pure//Accelerate in Austin this past week and we were given a preview of them at a StorageFieldDay Exclusive (SFDx), the day before the announcement.

First up is Pure's DirectMemory. They have added Optane SSDs to FlashArray//X to be used as a read cache for customer data. As you may know, Pure already has an NVRAM write cache. With DirectMemory, customers can have 3TB or 6TB of Optane storage in a FlashArray//X70 or //X90. It almost looks plug and play: you take out one or two flash modules, plug in the Optane SSD(s), and off it goes. DirectMemory went GA at the show.

Pure also announced FlashArray//C at Accelerate. This is a new capacity-optimized storage solution. They have re-designed their flash module to support higher capacity flash and supply higher capacity storage (targeted for QLC flash, but it will originally ship with TLC). FlashArray//C supplies ~5PB of effective (~1.4PB raw) capacity in 9U. Although FlashArray//C offers cheaper storage on a $/GB basis, it is also much slower (response-time latency on the order of 2-4 msec) than FlashArray//X storage. Pure, like other vendors we have talked with, is trying to drive disk technology out of the enterprise. We had some interesting discussions with Pure (and others) on this topic at the reception. Just remember, tape is still alive and well in the enterprise AND cloud, 52 years after being pronounced dead.
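
For a sense of the data reduction implied by those effective vs. raw numbers (our arithmetic, not a Pure-quoted ratio):

    \[
      \frac{\text{effective capacity}}{\text{raw capacity}} \approx
      \frac{5\,\text{PB}}{1.4\,\text{PB}} \approx 3.6{:}1\ \text{data reduction}
    \]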

Pure had announced CloudBlockStore (CBS) previously, but it is now GA through partners or on the AWS marketplace. Give them kudos, as they have taken a different approach to Pure storage in the cloud. With CBS, they have effectively re-architected and re-implemented Pure FlashArray using AWS EC2, IO1, EBS and S3 storage, and ended up with highly available (iSCSI) block software-defined storage. It will be interesting to see how well it's adopted. The picture shows me explaining the CBS architecture to @DVellante.

For Pure's FlashBlade storage, they have doubled the number of blades in a cluster (or namespace), from 75 to 150 FlashBlades. Each FlashBlade contains storage and compute (almost computational storage), so one should see an increase in bandwidth with the added blades. No one at Pure would go on record with specific numbers on any performance improvement because it's still undergoing testing.

Finally, FlashArray//X will offer full NFS and SMB file support. This comes from a recent acquisition (Compuverde). They plan to differentiate between file on FlashArray//X and FlashBlade by saying that FlashArray//X file is for those customers with mostly block storage requirements but also a small amount of file storage, while FlashBlade is for everyone else that needs file storage.

The podcast is ~23 minutes. Keith is a long time friend and co-host of our GreyBeards On Storage podcast. He’s always got an interesting perspective on how new technology can benefit the data center today. Listen to the podcast to learn more.


Keith Townsend, The CTO Advisor

Keith Townsend (@CTOAdvisor) is an IT thought leader who has written articles for many industry publications, interviewed many industry heavyweights, worked with Silicon Valley startups, and engineered cloud infrastructure for large government organizations. Keith is the co-founder of The CTO Advisor, blogs at Virtualized Geek, and can be found on LinkedIn.

88: A GreyBeard talks DataPlatform with Jon Hildebrand, Principal Technologist, Cohesity at VMworld 2019

Sponsored by:

This is another sponsored GreyBeards on Storage podcast and it was recorded at VMworld 2019. I talked with Jon Hildebrand (@snoopJ123), Principal Technologist at Cohesity. Jon's been a long time friend from TechFieldDay days and has been working with Cohesity for ~14 months now. For such a short time, Jon's seen a lot of changes in Cohesity functionality.

Indeed, they just announced general availability of Cohesity 6.4, which he called a "major release". One of the first things we talked about in the 6.4 release was CyberScan, Powered by Tenable, a new capability that takes backup data and scans it for vulnerabilities and risk postures. This way customers can assess their data to see if it's been infected, potentially long before ransomware or other cyber threats can cripple their systems.

One of the other features in 6.4 was a new runbook automation capability, called the Cohesity Runbook application, that can be used, for instance, to stand up a physical clone of customer data and applications in the cloud or elsewhere. This way customers can have a fully operational copy of their applications running in the cloud, automatically supplied by Cohesity Runbook. Besides the great use of this facility for DR and DR testing, such capabilities could be used to fire up a test/dev environment of your production applications on public cloud infrastructure.

The last feature of 6.4 that Jon and I discussed supports archiving data from a primary NAS/filer storage system and moving that data out to Cohesity NAS. A stub or symlink to the data is retained on the primary NAS system. By doing that, customers still have access to all the metadata and can access the data anytime they want, but it frees up primary storage capacity and offloads most of the IO processing needed to access the data.
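
A minimal sketch of the archive-with-stub idea, assuming both the primary NAS and the archive target are visible as mounted paths (the paths here are our own placeholders, not Cohesity's mechanism):

    # Hypothetical sketch: move a cold file from the primary NAS to an archive
    # mount and leave a symlink behind so clients still reach it via the old path.
    import os
    import shutil

    def archive_with_stub(primary_path, archive_root):
        archive_path = os.path.join(archive_root, primary_path.lstrip("/"))
        os.makedirs(os.path.dirname(archive_path), exist_ok=True)
        shutil.move(primary_path, archive_path)   # data now lives on the archive
        os.symlink(archive_path, primary_path)    # stub left on the primary NAS

    # archive_with_stub("/primary/projects/2018/report.pdf", "/mnt/cohesity_archive")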

Cohesity NAS provides the capacity and the processing power to support the IO and data that has been archived. With the new feature, Cohesity DataPlatform acts as an archive or tier of storage behind the primary NAS server. By doing so, customers should be able to delay tech refresh cycles, which should save them time and money. 

When I asked Jon if there were any last items he wanted to discuss, he mentioned the Cohesity Truck. Apparently Jon, Chris and others at Cohesity have stood up a complete data center inside a semi-trailer. Jon said if we can't bring customers to the Executive Briefing Center (EBC), then we can bring the EBC to the customers. Jon said the truck is touring the USA and you can arrange a visit by going to Cohesity.com/tour.

The podcast is a little under ~20 minutes. Jon is an old friend from TechFieldDays and seems to be taking to Cohesity very well. I’ve always respected Jon’s knowledge of the customer environment and his technical acumen. Listen to the podcast to learn more.

Jon Hildebrand, Principal Technologist, Cohesity. 

Principal Technologist @ Cohesity | Public Speaker | Blogger | Purveyor of PowerShell | VMware vExpert | Cisco Champion

85: GreyBeards talk NVMe NAS with Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data Inc.

As most of you know, Howard Marks was a founding co-host of the GreyBeards on Storage podcast and has since joined VAST Data, an NVMe file and object storage vendor headquartered in NY with R&D out of Israel. We first met with VAST at StorageFieldDay18 (SFD18, video presentation). Howard announced his employment at that event. VAST was a bit circumspect at their SFD18 session, but Howard seems to be more talkative, so on the podcast we learn a lot more about their solution.

VAST Data is essentially a scale-out, NFS file and S3 object store solution, with both stateless VAST Data storage servers and JBoF drive enclosures holding Optane and NVMe QLC SSDs. Storage servers and JBoFs can be scaled independently. They don't support tiering or DRAM caching of data, but instead seem to use the Optane SSDs as a write buffer for the QLC SSDs.

At the SFD18 event their spokesperson said that they were going to kill off disk storage media. (Ed’s note: Disk shipments fell 18% y/y in 1Q 2019, with enterprise disk shipments at 11.5M units, desktop at 24.5M units and laptops at 37M units).

The hardware

The VAST Data storage servers come in a 2U/4-server configuration and run the interface protocols (NFS & S3), data reduction (see below), data reformatting/buffering, etc. They are stateless servers, with all the metadata and other control state maintained on the JBoF Optane drives.

Each JBoF drive enclosure has 12 Optane SSDs and 44 U.2 QLC SSDs (no DRAM/no super cap). This means there are no write buffers on the QLC SSDs that could lose data when power failures occur. The interface to the JBoF is NVMeoF, over either RDMA-RoCE Ethernet or InfiniBand (customer selected). Their JBoFs are highly available, with dual fabric modules that each support two 100Gbps Ethernet/InfiniBand ports, four per JBoF.

Minimum starting capacity is 500TB and they claim support up to exabytes, although how much has actually been tested is an open question. They also support billions of objects/files.

Guaranteed better data reduction

They have a rather unique, multi-level data reduction scheme. At the start, data is chunked into variable-length chunks. They use heuristics to determine the chunk size that fits best. (Ed. note: it's unclear which step comes first in the sequence below, so it's presented in (our view of) logical order; a rough sketch of the overall flow follows the list.)

  • 1st level computes a similarity hash (56-bit, not SHA-1), which is used to determine a similarity level with any other currently stored data chunk in the system.
  • 2nd level uses the ZSTD compression algorithm. If a similarity is found, the new data chunk is compressed with ZSTD and the reference dictionary used by the earlier, similar data chunk. If no existing chunk is similar to this one, the algorithm identifies a semi-unique reference dictionary that optimizes the compression of this data chunk. This semi-unique dictionary is stored as metadata.
  • 3rd level: if the chunk turns out to be a complete duplicate, then the dedupe count for the original data chunk is incremented, a pointer is saved to the original unique data, and the new data is discarded. If it's not a complete duplicate of other data, the system computes a delta from the closest "similar" block, stores just the delta bytes with a pointer to the original similar block, and increments a delta-block counter.
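
Here is a rough sketch of how we read that flow; it is emphatically not VAST's code, the similarity hash is a stand-in, and delta encoding against the similar chunk is left out to keep things short.

    # Rough sketch of the multi-level reduction flow (our reading): full dedupe on
    # a strong hash, otherwise zstd compression, reusing a raw-content dictionary
    # built from the closest "similar" chunk when one exists. The raw chunk is
    # kept here only so it can serve as a dictionary for later, similar chunks.
    import hashlib
    import zstandard as zstd

    chunk_store = {}  # sha256 -> {"data": compressed, "raw": original, "refs": count}
    sim_index = {}    # 56-bit similarity hash -> sha256 of a representative chunk

    def similarity_hash(chunk):
        # stand-in for the 56-bit similarity hash: hash a coarsened view of the
        # chunk so nearly identical chunks tend to collide
        coarse = bytes(b & 0xF0 for b in chunk)
        return hashlib.blake2b(coarse, digest_size=7).digest()

    def ingest(chunk):
        strong = hashlib.sha256(chunk).digest()
        if strong in chunk_store:                    # 3rd level: complete duplicate
            chunk_store[strong]["refs"] += 1
            return "deduped"
        sim = similarity_hash(chunk)                 # 1st level: similarity lookup
        if sim in sim_index:                         # 2nd level: reuse a dictionary
            ref = chunk_store[sim_index[sim]]["raw"]
            d = zstd.ZstdCompressionDict(ref, dict_type=zstd.DICT_TYPE_RAWCONTENT)
            compressed = zstd.ZstdCompressor(dict_data=d).compress(chunk)
        else:                                        # no similar chunk yet
            compressed = zstd.ZstdCompressor().compress(chunk)
            sim_index[sim] = strong
        chunk_store[strong] = {"data": compressed, "raw": chunk, "refs": 1}
        return "compressed"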

So data is chunked, compressed with an optimized dictionary, and either delta-diffed or deduped. All data reduction is done post data write (after the client is ACKed) and, presumably, data is rehydrated after being read from SSD media. VAST Data guarantees better data reduction for your stored data than any other storage solution.

New data protection

They also supply a unique Locally Decodable Erasure Coding scheme, with 4 parity(-like) blocks and anywhere from 36 (a single enclosure, leaving 4 spare U.2 SSDs) to 150 data blocks per stripe, all of which support up to 4 device failures per stripe.

The locally decodable erasure coding scheme allows for rebuilds without having to read all remaining data blocks in a stripe. In this scheme, once you read the 4 parity(-like) blocks, you have all the information calculated from up to ¾ of the remaining drives in the stripe, so the system only has to read the remaining ¼ of the drives in the stripe to reconstruct one, two, three, or four failing drives. Given their data stripe width, this cuts down considerably on the amount of data that needs to be read. Still, with 150 data drives in a stripe, the system has to read 38 drives' worth of QLC SSD data to rebuild a data drive.
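
That 38 appears to come from reading roughly a quarter of the data drives in the widest stripe (our arithmetic):

    \[
      \text{drives read per rebuild} \approx \tfrac{1}{4} \times 150\ \text{data drives}
      = 37.5 \approx 38 \quad (\text{plus the 4 parity-like blocks})
    \]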

In addition to all the above, VAST Data also re-blocks the data into much larger segments (it writes 1MB segments to the QLC drives) and uses a heat map along with other heuristics to separate actively written data from less actively written data, thus reducing garbage collection and write amplification.
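
As a toy sketch of that heat-based segregation idea (our own simplification, not VAST's algorithm), incoming writes could be routed into separate 1MB segments by rewrite frequency:

    # Hypothetical sketch: route writes into separate 1MB segments for "hot"
    # (frequently rewritten) and "cold" data, so a segment tends to age out all
    # at once and garbage collection/write amplification is reduced.
    from collections import Counter, defaultdict

    SEGMENT_SIZE = 1 << 20            # 1MB segments written to the QLC SSDs
    write_counts = Counter()          # crude heat map: rewrites per logical block
    open_segments = defaultdict(bytearray)
    flushed_segments = []

    def write(block_id, data, hot_threshold=3):
        write_counts[block_id] += 1
        temperature = "hot" if write_counts[block_id] >= hot_threshold else "cold"
        seg = open_segments[temperature]
        seg.extend(data)
        if len(seg) >= SEGMENT_SIZE:  # full 1MB segment: flush it to flash
            flushed_segments.append((temperature, bytes(seg[:SEGMENT_SIZE])))
            open_segments[temperature] = bytearray(seg[SEGMENT_SIZE:])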

The podcast is a long one, running over ~43 minutes. Howard has always been great to talk with and, if anything, now being a vendor has intensified this tendency. Listen to the podcast to learn more.

Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data, Inc.

Howard Marks brings over forty years of experience as a technology architect for hire and industry observer to his role as VAST Data's Technologist Extraordinary and Plenipotentiary. In this role, Howard demystifies VAST's technologies for customers and customer requirements for VAST's engineers.

Before joining VAST, Howard ran DeepStorage, an industry test lab and analyst firm. An award-winning speaker, he has appeared at events on three continents including Comdex, Interop and VMworld.

Howard is the author of several books (all gratefully out of print) and hundreds of articles since Bill Machrone taught him journalism at PC Magazine in the 1980s.

Listeners may also remember that Howard was a founding co-Host of the Greybeards-on-Storage Podcast.