88: A GreyBeard talks DataPlatform with Jon Hildebrand, Principal Technologist, Cohesity at VMworld 2019

Sponsored by:

This is another sponsored GreyBeards on Storage podcast and it was recorded at Vmworld 2019. I talked with Jon Hildebrand (@snoopJ123), Principal Technologist at Cohesity. Jon’s been a long time friend from TechFieldDay days and has been working with Cohesity for ~14 months now. For such a short time, Jon’s seen a lot of changes in Cohesity functionality

Indeed, they just announced general availability of Cohesity 6.4 which he called a “major release”. One of the first things we talked about in the 6.4 release, was CyberScan, Powered by Tenable, which is a new capability that uses backup data and scans it for vulnerabilities and risk postures. This way customers can assess their data to see if it’s been infected, potentially long before ransomware or other cyber threats can cripple your systems.

One of the other features in 6.4 was a new run book automation, called the Cohesity Runbook application, that can be used for instance to standup a physical clone of customer data and applications in the cloud or elsewhere. This way customers can have a fully operational copy of their applications running in the cloud, automatically supplied by Cohesity Runbook. Besides the great use of this facility for DR, and DR testing, such capabilities could be used to fire up a Test/Dev environment of your production applications on public cloud infrastructure.

The last feature of 6.4 that Jon and I discussed, supports archiving data from a primary NAS/filer storage systems and move that data out to Cohesity NAS. A stub or SymLink to the data is retained on the primary NAS system. By doing that, customers still have access to all the metadata and can access the data anytime they want, but frees up primary storage capacity and most of the IO processing to access the data.

Cohesity NAS provides the capacity and the processing power to support the IO and data that has been archived. With the new feature, Cohesity DataPlatform acts as an archive or tier of storage behind the primary NAS server. By doing so, customers should be able to delay tech refresh cycles, which should save them time and money. 

When I asked Jon if there were any last items he wanted to discuss he mentioned the Cohesity Truck. Apparently John, Chris and others at Cohesity have stood up a complete data center inside a semi-trailer. Jon said if we can’t bring customers to the Executive Briefing Center (EBC), then we can bring the EBC to the customers. Jon said the truck is touring the USA and you can arrange a visit by going to Cohesity.com/tour.

The podcast is a little under ~20 minutes. Jon is an old friend from TechFieldDays and seems to be taking to Cohesity very well. I’ve always respected Jon’s knowledge of the customer environment and his technical acumen. Listen to the podcast to learn more.

Jon Hildebrand, Principal Technologist, Cohesity. 

Principal Technologist @ Cohesity | Public Speaker | Blogger | Purveyor of PowerShell | VMware vExpert | Cisco Champion

76: GreyBeards talk backup content, GDPR and cyber security with Jim McGann, VP Mkt & Bus. Dev., Index Engines

In this episode we talkindexing old backups, GDPR and CyberSense, a new approach to cyber security, with Jim McGann, VP Marketing and Business Development, Index Engines.

Jim’s an old industry hand that’s been around backups, e-discovery and security almost since the beginning. Index Engines solution to cyber security, CyberSense, is also offered by Dell EMC and Jim presented at a TFDx event this past October hosted by Dell EMC (See Dell EMC-Index Engines TFDx session on CyberSense).

It seems Howard’s been using Index Engines for a long time but keeping them a trade secret. In one of his prior consulting engagements he used Index Engines technology to locate a a multi-million dollar email for one customer.

Universal backup data scan and indexing tool

Index Engines has long history as a tool to index and understand old backup tapes and files. Index Engines did all the work to understand the format and content of NetBackup, Dell EMC Networker, IBM TSM (now Spectrum Protect), Microsoft Exchange backups, database vendor backups and other backup files. Using this knowledge they are able to read just about anyone’s backup tapes or files and tell customers what’s on them.

But it’s not just a backup catalog tool, Index Engines can also crack open backup files and index the content of the data. In this way customers can search backup data, with Google like search terms. This is used day in and day out, for E-discovery and the occasional consulting engagement.

Index Engines technology is also useful for companies complying with GDPR and similar legislation. When any user can request information about them be purged from corporate data, being able to scan, index and search backups is great feature.

In addition to backup file scanning, Index Engines has a multi-PB, indexing solution which can be used to perform the same, Google-like searching on a data center’s file storage. Once again, Index Engines has done the development work to implement their own, highly parallelized metadata and content search engine, demonstratively falter than any open source (Lucene) search solution available today.

CyberSense

All that’s old news, what Jim presented at a TFDx event was their new CyberSense solution. CyberSense was designed to help organizations detect and head off ransomware, cyber assaults and other data corruption attacks.

CyberSense computes a data entropy (randomness) score as well as ~39 other characteristics for every file in backups or online in a custmer’s data center. It then uses that information to detect when a cyber attack is taking place and determine the extent of the corruption. With current and previous entropy and other characteristics on every data file, CyberSense can flag files that look like they have been corrupted and warn customers that a cyber attack is in process before it corrupts all of customers data files.

One typical corruption is to change file extensions. CyberSense cracks open file contents and can determine if it’s an office or other standard document type and then check to see if its extension matches its content. Another common corruption is to encrypt files. Such files necessarily have an increased entropy and can be automatically detected by CyberSense

When CyberSense has detected some anomaly, it can determine who last accessed the file and what executable was used to modify it. In this way CyberSecurity can be used to provide forensics on who, what, when and where about a corrupted file, so that IT can shut the corruption activity down before it’s gone to far.

CyberSense can be configured to periodically scan files online as well as just examine backup data (offline) during or after it’s backed up. Their partnership with Dell EMC is to do just that with Data Domain and Dell EMC backup software.

Index Engines proprietary indexing functionality has been optimized for parallel execution and for reduced index size. Jim mentioned that their content indexes average about 5% of the full storage capacity and that they can index content at a TB/hour.

Index Engines is a software only offering but they also offer services for customers that want a turn key solution. They also are available through a number of partners, Dell EMC being one.

The podcast runs ~44 minutes. Jim’s been around backups, storage and indexing forever. And seems to have good knowledge on data compliance regimes and current security threats impacting customers, across the world today . Listen to our podcast to learn more.

Jim McGann, VP Marketing and Business Development, Index Engines

Jim has extensive experience with the eDiscovery and Information Management in the Fortune 2000 sector. Before joining Index Engines in 2004, he worked for leading software firms, including Information Builders and the French based engineering software provider Dassault Systemes.

In recent years he has worked for technology based start-ups that provided financial services and information management solutions. Prior to Index Engines, Jim was responsible for the business development of Scopeware at Mirror Worlds Technologies, the knowledge management software firm founded by Dr. David Gelernter of Yale University. Jim graduated from Villanova University with a degree in Mechanical Engineering.

Jim is a frequent writer and speaker on the topics of big data, backup tape remediation, electronic discovery and records management.

71: GreyBeards talk DP appliances with Sharad Rastogi, Sr. VP & Ranga Rajagopalan, Sr. Dir., Dell EMC DPD

Sponsored by:

In this episode we talk data protection appliances with Sharad Rastogi (@sharadrastogi), Senior VP Product Management,  and Ranga Rajagopalan, Senior Director, Data Protection Appliances Product Management, Dell EMC Data Protection Division (DPD). Howard attended Ranga’s TFDx session (see TFDx videos here) on their new Integrated Data Protection Appliance (IDPA) the DP4400 at VMworld last month in Las Vegas.

This is the first time we have had anyone from Dell EMC DPD on our show. Ranga and Sharad were both knowledgeable about the data protection industry, industry trends and talked at length about the new IDPA DP4400.

Dell EMC IDPA DP4400

The IDPA DP4400 is the latest member of the Dell EMC IDPA product family.  All IDPA products package secondary storage, backup software and other solutions/services to make for a quick and easy deployment of a complete backup solution in your data center.  IDPA solutions include protection storage and software, search and analytics, system management — plus cloud readiness with cloud disaster recovery and long-term retention — in one 2U appliance. So there’s no need to buy any other secondary storage or backup software to provide data protection for your data center.

The IDPA DP4400 grows in place  from 24 to 96TB of usable capacity and at an average 55:1 dedupe ratio, it could support over 5PB of backup storage on the appliance. The full capacity always ships with the appliance. Customers can select how much or little they get to use by just purchasing a software license key.

In addition to the on appliance capacity, the IDPA DP4400 can use up to 192TB of cloud storage for a native Cloud tier. Cloud tiering takes place after a specified, appliance residency interval, after which backup data is moved from the appliance to the cloud. IDPA Cloud Tier works with AWS, Azure, IBM Cloud Object Storage, Ceph and Dell EMC Elastic Cloud Storage. With the 192TB of cloud and 96TB of on appliance usable storage, together with a 55:1 dedupe ratio, a single IDPA DP4400 can support over 14PB of logical backup data.

Furthermore, IDPA supports Cloud DR. With Cloud DR, backed up VMs are copied to the public cloud (AWS) on a scheduled basis. In case of a disaster, there is an orchestrated failover with the VMs spun up in the cloud. The cloud workloads can then easily be failed back on site once the disaster is resolved.

The IDPA DP4400 also comes with native DD Boost™ support. This means Oracle, SQL server and other applications that already support DD Boost can also use the appliance to backup and restore their application data. DD Boost customers can make use of native application services such as Oracle RAC to manage their database backups/restores with the appliance.

Dell EMC also offers their Future-Proof Loyalty Program guarantees for the IDPA DP4400, including a Data Protection Deduplication guarantee, which, if best practices are followed, Dell EMC will guarantee the appliance dedupe ratio for backup data. Additional guarantees from the Dell EMC Future-Proof Program for IDPA DP4400 include a 3-Year Satisfaction guarantee, a Clear Price guarantee which guarantees predictable pricing for future maintenance and service as well as a Cloud Enabled guarantee. These are just a few of the Dell EMC guarantees provided for the IDPA DP4400.

The podcast runs ~16 minutes. Ranga and Sharad were both very knowlegdeable on DP industry, DP trends and the new IDPA DP4400.  Listen to the podcast to learn more.

Sharad Rostogi, Senior V.P. Product Management, Dell EMC Data Protection Division

Sharad Rastogi is a global technology executive with strong track record of transforming businesses and increasing shareholder value across a broad range of leadership roles, in both high growth and turnaround situations.

As SVP of Product Management at Dell EMC, Sharad is responsible for all products for the $3B Data Protection business.  He oversees a diverse portfolio, and is currently developing next generation integrated appliances, software and cloud based data protection solutions. In the past, Sharad has held senior roles in general management, products, marketing, corporate development and strategy at leading companies including Cisco, JDSU, Avid and Bain.

Sharad holds an MBA from the Wharton School at the University of Pennsylvania, an MS in engineering from the Boston University and a B.Tech in engineering from the Indian Institute of Technology in New Delhi.

He is an advisor to Boston University, College of Engineering, and a Board member at Edventure More – a non-profit providing holistic education. Sharad is a world traveler, always seeking new adventures and experiences

Rangaraaj (Ranga) Rajagopalan, Senior Director Data Protection Appliances Product Management, Dell EMC Data Protection Division

Ranga Rajagopalan is Senior Director of Product Management for Data Protection Appliances at Dell EMC. Ranga is responsible for driving the product strategy and vision for Data Protection Appliances, setting and delivering the multi-year roadmap for Data Domain and Integrated Data Protection Appliance.

Ranga has 15 years of experience in data protection, business continuity and disaster recovery, in both India and USA. Prior to Dell EMC, Ranga managed the Veritas Cluster Server and Veritas Resiliency Platform products for Veritas Technologies.

64: GreyBeards discuss cloud data protection with Chris Wahl, Chief Technologist, Rubrik

Sponsored by:

In this episode we talk with Chris Wahl, Chief Technologist, Rubrik. This is our second time having Chris on our show. The last time was about three years ago (see our Chris on agentless backup podcast). Talking with Chris again was great and there’s been plenty of news since we last spoke with him.

Rubrik now has three products the Rubrik Cloud Data Protection suite (onprem, virtual or in the [AWS & Azure] cloud), the Rubrik Datos IO (recent acquisition) for NoSql database with semantic dedupe and Rubrik Polaris GPS, a SaaS monitoring/trending/management solution for your data protection environment. Polaris GPS monitors and watches data protection trends for you, to insure all your data protection SLAs are being met. But we didn’t spend much time on Polaris.

Datos IO was designed from the start to backup new databases based on NoSQL technologies and provides, a semantic based deduplication capability, that’s unique in the industry . We talked with Datos IO before their acquisition by Rubrik (see our podcast with Tarun on 3rd generation data protection).

Cloud Data Protection

As for their Cloud Data Protection suite, one major differentiator is that all their functionality is available via RESTful APIs. Their GUI is completely built off their APIs. This means any customer could use their set of APIs to integrate Rubrik data protection with any application/workload on the planet.

Chris mentioned that Rubrik has 40+ specific application/system integrations that provide “strictly consistent” data protection. We assume this means application consistent backups and recovery but goes beyond mere applications.

With the Cloud Data Protection solution, data resides on the appliance for only a short (customer specifiable) period and then is migrated off to cloud or onprem object storage. The object storage could be any onprem S3 compatible storage, in the AWS or Azure cloud. It’s completely automatic. The data migrated to object storage is self-defining, meaning that metadata and data are all available in one spot and can be restored anywhere there’s a Rubrik Cloud Data Protection suite operating.

The Cloud Data Protection appliance also supports onboard search and analytics to search backup/recovery metadata/catalogs. As such, there’s no need to purchase other tools to uncover which backup files exist. Their solution also uses data deduplication to reduce the data stored.

Data stored is also encrypted by customer keys and use HTTPS to transfer data. So, data is secured at rest, secured in flight and deduped. Cloud Data Protection also offers data mobility. That is it can move your VMs and data from onprem to the cloud and use Rubrik in the cloud to rehydrade the data and translate your VMs to run in AWS or Azure and it works in reverse, translating AWS/Azure compute instances into VMs.

Rubrik’s major differentiator is simplicity. Traditionally, customers had been conditioned to thinking data protection took hours to maintain, fix and keep running. But with Rubrik Cloud Data Protection, a customer just points it to an application and selects an SLA, and Rubrik takes over from there.

The secret behind Rubrik’s simplicity is Cerebro. Cerebro is where they have put all the smarts to understand a data center’s infrastructure, applications/VMs, protected data and requested SLAs and just makes it work

The podcast runs ~27 minutes. Chris was great to talk with again and given how long it’s been since we last talked, he had much to discuss. Rubrik seems like an easy solution to adopt and if their growth is any indicator, customers agree. Listen to the podcast to learn more.

Chris Wahl, Chief Technologist, Rubrik

Chris Wahl, author of the award winning Wahl Network blog and host of the Datanauts Podcast, focuses on creating content that revolves around virtualization, automation, infrastructure, and evangelizing products and services that benefit the technology community.

In addition to co-authoring “Networking for VMware Administrators” for VMware Press, he has published hundreds of articles and was voted the “Favorite Independent Blogger” by vSphere-Land three years in a row (2013 – 2015). Chris also travels globally to speak at industry events, provide subject matter expertise, and offer perspectives to startups and investors as a technical adviser.

54: GreyBeards talk scale-out secondary storage with Jonathan Howard, Dir. Tech. Alliances at Commvault

This month we talk scale-out secondary storage with Jonathan Howard,  Director of Technical Alliances at Commvault.  Both Howard and I attended Commvault GO2017 for Tech Field Day, this past month in Washington DC. We had an interesting overview of their Hyperscale secondary storage solution and Jonathan was the one answering most of our questions, so we thought he would make an good guest for our podcast.

Commvault has been providing data protection solutions for a long time, using anyone’s secondary storag, but recently they have released a software defined, scale-out secondary storage solution that runs their software with a clustered file system.

Hyperscale secondary storage

They call their solution, Hyperscale secondary storage and it’s available in both an hardware-software appliance as well as software only configuration on compatible off the shelf commercial hardware. Hyperscale uses the Red Hat Gluster cluster file system and together with the Commvault Data Platform provides a highly scaleable, secondary storage cluster that can meet anyone’s secondary storage needs while providing high availability and high throughput performance.

Commvault’s Hyperscale secondary storage system operates onprem in customer data centers. Hyperscale uses flash storage for system metadata but most secondary storage resides on local server disk.

Combined with Commvault Data Platform

With the sophistication of Commvault Data Platform one can have all the capabilities of a standalone Commvault environment with software defined storage. This allows just about any RTO/RPO needed by today’s enterprise and includes Live Sync secondary storage replication,  Onprem IntelliSnap for on storage snapshot management, Live Mount for instant recovery using secondary storage directly  to boot your VMs without having to wait for data recovery.  , and all the other recovery sophistication available from Commvault.

Hyperscale storage is capable of doing up to 5 Live Mount recoveries simultaneously per node without a problem but more are possible depending on performance requirements.

We also talked about Commvault’s cloud secondary storage solution which can make use of AWS S3 storage to hold backups.

Commvault’s organic growth

Most of the other data protection companies have came about through mergers, acquisitions or spinoffs. Commvault has continued along, enhancing their solution while bashing everything on an underlying centralized metadata database.  So their codebase was grown from the bottom up and supports pretty much any and all data protection requirements.

The podcast runs ~50 minutes. Jonathan was very knowledgeable about the technology and was great to talk with. Listen to the podcast to learn more.

Jonathan Howard, Director, Technical and Engineering Alliances, Commvault

Jonathan Howard is a Director, Technology & Engineering Alliances for Commvault. A 20-year veteran of the IT industry, Jonathan has worked at Commvault for the past 8 years in various field, product management, and now alliance facing roles.

In his present role with Alliances, Jonathan works with business and technology leaders to design and create numerous joint solutions that have empowered Commvault alliance partners to create and deliver their own new customer solutions.