71: GreyBeards talk DP appliances with Sharad Rastogi, Sr. VP & Ranga Rajagopalan, Sr. Dir., Dell EMC DPD

Sponsored by:

In this episode we talk data protection appliances with Sharad Rastogi (@sharadrastogi), Senior VP Product Management,  and Ranga Rajagopalan, Senior Director, Data Protection Appliances Product Management, Dell EMC Data Protection Division (DPD). Howard attended Ranga’s TFDx session (see TFDx videos here) on their new Integrated Data Protection Appliance (IDPA), the DP4400, at VMworld last month in Las Vegas.

This is the first time we have had anyone from Dell EMC DPD on our show. Ranga and Sharad were both knowledgeable about the data protection industry and its trends, and talked at length about the new IDPA DP4400.

Dell EMC IDPA DP4400

The IDPA DP4400 is the latest member of the Dell EMC IDPA product family.  All IDPA products package secondary storage, backup software and other solutions/services to make for a quick and easy deployment of a complete backup solution in your data center.  IDPA solutions include protection storage and software, search and analytics, system management — plus cloud readiness with cloud disaster recovery and long-term retention — in one 2U appliance. So there’s no need to buy any other secondary storage or backup software to provide data protection for your data center.

The IDPA DP4400 grows in place from 24TB to 96TB of usable capacity, and at an average 55:1 dedupe ratio it can support over 5PB of backup data on the appliance. The full capacity always ships with the appliance; customers select how much of it they use simply by purchasing a software license key.

In addition to the on-appliance capacity, the IDPA DP4400 can use up to 192TB of cloud storage for a native Cloud Tier. Cloud tiering takes place after a specified appliance residency interval, after which backup data is moved from the appliance to the cloud. IDPA Cloud Tier works with AWS, Azure, IBM Cloud Object Storage, Ceph and Dell EMC Elastic Cloud Storage. With the 192TB of cloud and 96TB of on-appliance usable storage, together with a 55:1 dedupe ratio, a single IDPA DP4400 can support over 14PB of logical backup data.
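As a quick back-of-the-envelope check of those numbers (our arithmetic, assuming the stated 55:1 average dedupe ratio applies uniformly across appliance and cloud capacity):

```python
# Re-deriving the DP4400 capacity claims from the stated 55:1 dedupe ratio.
DEDUPE_RATIO = 55
ON_APPLIANCE_TB = 96      # max usable on-appliance capacity
CLOUD_TIER_TB = 192       # max native Cloud Tier capacity

print(ON_APPLIANCE_TB * DEDUPE_RATIO / 1000)                     # ~5.3 PB protected on the appliance
print((ON_APPLIANCE_TB + CLOUD_TIER_TB) * DEDUPE_RATIO / 1000)   # ~15.8 PB, i.e. "over 14PB" of logical backup data
```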

Furthermore, IDPA supports Cloud DR. With Cloud DR, backed up VMs are copied to the public cloud (AWS) on a scheduled basis. In case of a disaster, there is an orchestrated failover with the VMs spun up in the cloud. The cloud workloads can then easily be failed back on site once the disaster is resolved.

The IDPA DP4400 also comes with native DD Boost™ support. This means Oracle, SQL Server and other applications that already support DD Boost can also use the appliance to back up and restore their application data. DD Boost customers can make use of native application services such as Oracle RAC to manage their database backups/restores with the appliance.

Dell EMC also offers their Future-Proof Loyalty Program guarantees for the IDPA DP4400, including a Data Protection Deduplication guarantee, under which, provided best practices are followed, Dell EMC guarantees the appliance's dedupe ratio for backup data. Additional Future-Proof Program guarantees for the IDPA DP4400 include a 3-Year Satisfaction guarantee, a Clear Price guarantee of predictable pricing for future maintenance and service, and a Cloud Enabled guarantee. These are just a few of the guarantees Dell EMC provides for the IDPA DP4400.

The podcast runs ~16 minutes. Ranga and Sharad were both very knowledgeable about the DP industry, DP trends and the new IDPA DP4400. Listen to the podcast to learn more.

Sharad Rastogi, Senior V.P. Product Management, Dell EMC Data Protection Division

Sharad Rastogi is a global technology executive with a strong track record of transforming businesses and increasing shareholder value across a broad range of leadership roles, in both high-growth and turnaround situations.

As SVP of Product Management at Dell EMC, Sharad is responsible for all products for the $3B Data Protection business. He oversees a diverse portfolio and is currently developing next-generation integrated appliances, software and cloud-based data protection solutions. In the past, Sharad has held senior roles in general management, products, marketing, corporate development and strategy at leading companies including Cisco, JDSU, Avid and Bain.

Sharad holds an MBA from the Wharton School at the University of Pennsylvania, an MS in engineering from Boston University and a B.Tech in engineering from the Indian Institute of Technology in New Delhi.

He is an advisor to the Boston University College of Engineering and a board member at Edventure More, a non-profit providing holistic education. Sharad is a world traveler, always seeking new adventures and experiences.

Rangaraaj (Ranga) Rajagopalan, Senior Director Data Protection Appliances Product Management, Dell EMC Data Protection Division

Ranga Rajagopalan is Senior Director of Product Management for Data Protection Appliances at Dell EMC. Ranga is responsible for driving the product strategy and vision for Data Protection Appliances, setting and delivering the multi-year roadmap for Data Domain and Integrated Data Protection Appliance.

Ranga has 15 years of experience in data protection, business continuity and disaster recovery, in both India and the USA. Prior to Dell EMC, Ranga managed the Veritas Cluster Server and Veritas Resiliency Platform products for Veritas Technologies.

64: GreyBeards discuss cloud data protection with Chris Wahl, Chief Technologist, Rubrik

Sponsored by:

In this episode we talk with Chris Wahl, Chief Technologist, Rubrik. This is our second time having Chris on our show. The last time was about three years ago (see our Chris on agentless backup podcast). Talking with Chris again was great and there’s been plenty of news since we last spoke with him.

Rubrik now has three products: the Rubrik Cloud Data Protection suite (onprem, virtual or in the [AWS & Azure] cloud); Rubrik Datos IO (a recent acquisition) for NoSQL databases, with semantic dedupe; and Rubrik Polaris GPS, a SaaS monitoring/trending/management solution for your data protection environment. Polaris GPS monitors and watches data protection trends for you, to ensure all your data protection SLAs are being met. But we didn’t spend much time on Polaris.

Datos IO was designed from the start to back up new databases based on NoSQL technologies and provides a semantic-based deduplication capability that’s unique in the industry. We talked with Datos IO before their acquisition by Rubrik (see our podcast with Tarun on 3rd generation data protection).

Cloud Data Protection

As for their Cloud Data Protection suite, one major differentiator is that all their functionality is available via RESTful APIs. Their GUI is built entirely on top of those APIs. This means any customer could use the same APIs to integrate Rubrik data protection with any application/workload on the planet.
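To give a flavor of what that API-first approach makes possible, here's a rough sketch of protecting a VM from a script. The endpoint paths, payload fields and SLA name below are hypothetical placeholders for illustration, not Rubrik's documented API; the point is simply that anything the GUI can do, a script can do over REST.

```python
# Hypothetical sketch: driving an API-first data protection platform from a script.
# Endpoints and field names are placeholders, not Rubrik's documented API.
import requests

BASE = "https://rubrik.example.com/api/v1"          # hypothetical appliance address
session = requests.Session()
session.headers["Authorization"] = "Bearer <api-token>"

# Look up the VM to protect (hypothetical endpoint and fields).
vm = session.get(f"{BASE}/vmware/vm", params={"name": "prod-db-01"}).json()["data"][0]

# Assign it an SLA domain: the "point at a workload, pick an SLA" model.
session.patch(f"{BASE}/vmware/vm/{vm['id']}", json={"slaDomainName": "Gold"})

# Or kick off an on-demand snapshot outside the normal SLA schedule.
session.post(f"{BASE}/vmware/vm/{vm['id']}/snapshot")
```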

Chris mentioned that Rubrik has 40+ specific application/system integrations that provide “strictly consistent” data protection. We assume this means application-consistent backup and recovery, but it goes beyond mere applications.

With the Cloud Data Protection solution, data resides on the appliance for only a short (customer-specifiable) period and is then migrated off to cloud or onprem object storage. The object storage could be any onprem S3-compatible storage, or object storage in the AWS or Azure cloud. It’s completely automatic. The data migrated to object storage is self-defining, meaning that metadata and data are all available in one spot and can be restored anywhere there’s a Rubrik Cloud Data Protection suite operating.

The Cloud Data Protection appliance also supports onboard search and analytics to search backup/recovery metadata/catalogs. As such, there’s no need to purchase other tools to uncover which backup files exist. Their solution also uses data deduplication to reduce the data stored.

Stored data is also encrypted with customer keys, and data transfers use HTTPS. So, data is secured at rest, secured in flight and deduped. Cloud Data Protection also offers data mobility. That is, it can move your VMs and data from onprem to the cloud, use Rubrik in the cloud to rehydrate the data, and translate your VMs to run in AWS or Azure. It also works in reverse, translating AWS/Azure compute instances back into VMs.

Rubrik’s major differentiator is simplicity. Traditionally, customers have been conditioned to think data protection takes hours to maintain, fix and keep running. But with Rubrik Cloud Data Protection, a customer just points it at an application and selects an SLA, and Rubrik takes over from there.

The secret behind Rubrik’s simplicity is Cerebro. Cerebro is where they have put all the smarts to understand a data center’s infrastructure, applications/VMs, protected data and requested SLAs, and to just make it all work.

The podcast runs ~27 minutes. Chris was great to talk with again and given how long it’s been since we last talked, he had much to discuss. Rubrik seems like an easy solution to adopt and if their growth is any indicator, customers agree. Listen to the podcast to learn more.

Chris Wahl, Chief Technologist, Rubrik

Chris Wahl, author of the award winning Wahl Network blog and host of the Datanauts Podcast, focuses on creating content that revolves around virtualization, automation, infrastructure, and evangelizing products and services that benefit the technology community.

In addition to co-authoring “Networking for VMware Administrators” for VMware Press, he has published hundreds of articles and was voted the “Favorite Independent Blogger” by vSphere-Land three years in a row (2013 – 2015). Chris also travels globally to speak at industry events, provide subject matter expertise, and offer perspectives to startups and investors as a technical adviser.

54: GreyBeards talk scale-out secondary storage with Jonathan Howard, Dir. Tech. Alliances at Commvault

This month we talk scale-out secondary storage with Jonathan Howard, Director of Technical Alliances at Commvault. Both Howard and I attended Commvault GO 2017 for Tech Field Day this past month in Washington DC. We had an interesting overview of their Hyperscale secondary storage solution, and Jonathan was the one answering most of our questions, so we thought he would make a good guest for our podcast.

Commvault has been providing data protection solutions for a long time, using anyone’s secondary storage, but recently they released a software-defined, scale-out secondary storage solution that runs their software on top of a clustered file system.

Hyperscale secondary storage

They call their solution Hyperscale secondary storage, and it’s available both as a hardware-software appliance and as a software-only configuration on compatible off-the-shelf commercial hardware. Hyperscale uses the Red Hat Gluster cluster file system, and together with the Commvault Data Platform it provides a highly scalable secondary storage cluster that can meet anyone’s secondary storage needs while providing high availability and high-throughput performance.

Commvault’s Hyperscale secondary storage system operates onprem in customer data centers. Hyperscale uses flash storage for system metadata but most secondary storage resides on local server disk.

Combined with Commvault Data Platform

With the sophistication of the Commvault Data Platform, one can have all the capabilities of a standalone Commvault environment on software-defined storage. This allows just about any RTO/RPO needed by today’s enterprise and includes Live Sync secondary storage replication, onprem IntelliSnap for on-storage snapshot management, Live Mount for instant recovery (booting your VMs directly from secondary storage without having to wait for data recovery), and all the other recovery sophistication available from Commvault.

Hyperscale storage is capable of doing up to 5 Live Mount recoveries simultaneously per node without a problem but more are possible depending on performance requirements.

We also talked about Commvault’s cloud secondary storage solution which can make use of AWS S3 storage to hold backups.

Commvault’s organic growth

Most of the other data protection companies have come about through mergers, acquisitions or spinoffs. Commvault has continued along, enhancing their solution while basing everything on an underlying centralized metadata database. So their codebase has grown from the bottom up and supports pretty much any and all data protection requirements.

The podcast runs ~50 minutes. Jonathan was very knowledgeable about the technology and was great to talk with. Listen to the podcast to learn more.

Jonathan Howard, Director, Technical and Engineering Alliances, Commvault

Jonathan Howard is a Director, Technology & Engineering Alliances for Commvault. A 20-year veteran of the IT industry, Jonathan has worked at Commvault for the past 8 years in various field, product management, and now alliance-facing roles.

In his present role with Alliances, Jonathan works with business and technology leaders to design and create numerous joint solutions that have empowered Commvault alliance partners to create and deliver their own new customer solutions.

45: Greybeards talk desktop cloud backup/storage & disk reliability with Andy Klein, Director Marketing, Backblaze

In this episode, we talk with Andy Klein, Dir of Marketing for Backblaze, which backs up desktops and computers to the cloud and also offers cloud storage.

Backblaze has a unique consumer data protection solution where customers pay a flat fee to back up their desktops and then may pay a separate fee for a large recovery. On their website, they have a counter indicating they have restored almost 22.5B files. Desktop/computer backup costs $50/year. To restore files, if the restore is under 500GB you can download a ZIP file at no charge; if it’s larger, you can have a USB flash stick or hard drive shipped via FedEx, but that will cost you.

They also offer a cloud storage service called B2 (not AWS S3 compatible) which costs $5/TB/month. Backblaze just celebrated their tenth anniversary last April.

Early on, Backblaze figured out the only way they were going to succeed was to use consumer-class disk drives, engineer their own hardware and write their own software to manage them.

Backblaze openness

Backblaze has always been a surprisingly open company. Their Storage Pod hardware (6th generation now) has been open sourced from the start and holds 60 drives for 480TB raw capacity.

A couple of years back, when there was a natural disaster in SE Asia, disk drive manufacturing was severely impacted and their cost per GB for disk drives almost doubled overnight. Considering they were buying about 50PB of drives during that period, it was going to cost them ~$1M extra. But you could still purchase drives, in limited quantities, at select discount outlets. So, they convinced all their friends and family to go out and buy consumer drives for them (see their drive farming post[s] for more info).

Howard said that Gen 1 of their Storage Pod hardware used rubber bands to surround and hold disk drives and as a result, it looked like junk. The rubber bands were there to dampen drive rotational vibration because they were inserted vertically. At the time, most if not all of the storage industry used horizontally inserted drives.  Nowadays just about every vendor has a high density, vertically inserted drive tray but we believe Backblaze was the first to use this approach in volume.

Hard drive reliability at Backblaze

These days Backblaze has over 300PB of storage and they have been monitoring their disk drive SMART (error) logs since the start. Sometime during 2013 they decided to keep the log data rather than recycling the space. Since they had the data and were calculating drive reliability anyway, they thought that the industry and consumers would appreciate seeing their reliability info. In December of 2014 Backblaze published their hard drive reliability report using Annualized Failure Rates (AFR) they calculated from the many thousands of disk drives they ran every day. They had not released Q2 2017 hard drive stats yet, but their Q1 2017 hard drive stats post has been out now for about 3 months.

Most drive vendors report disk reliability as Mean Time Between Failures (MTBF), the average operating time expected between failures across a large population of drives. AFR is an alternative reliability metric: the percentage of drives expected to fail in one year of operation. The two are roughly equivalent (for MTBF in hours, AFR ≈ 8766/MTBF, expressed as a fraction), but AFR is more useful because it tells users directly what percent of their drives they can expect to fail over the next twelve months.
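A quick worked example of that conversion (using the exact exponential form, 1 - exp(-8766/MTBF), which agrees closely with the simple approximation whenever MTBF is much larger than a year):

```python
# MTBF <-> AFR conversion, per the relationship described above.
import math

HOURS_PER_YEAR = 8766   # 365.25 days x 24 hours

def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized Failure Rate (fraction of drives failing per year) from MTBF in hours."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

print(f"{afr_from_mtbf(1_000_000):.2%}")      # a 1,000,000-hour MTBF works out to ~0.87% AFR
print(f"{HOURS_PER_YEAR / 0.02:,.0f} hours")  # conversely, a 2% AFR implies roughly a 438,300-hour MTBF
```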

Drive costs matter, but performance matters more

It seemed to the Greybeards that SMR (shingled magnetic recording, read my RoS post for more info) disks would be a great fit for Backblaze’s application. But Andy said their engineering team looked at SMR disks and found the 2nd write (overwrite of a zone) had terrible performance. As Backblaze often has customers who delete files or drop the service, they reuse existing space all the time and SMR disks would hurt performance too much.

We also talked a bit about their current data protection scheme. The new scheme is a Reed-Solomon (RS) solution with data written to 17 Storage Pods and parity written to 3 Storage Pods across a 20 Storage Pod group called a Vault. This way they can handle 3 Storage Pod failures across a Vault without losing customer data.
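Just to put numbers on that layout (our arithmetic, not Backblaze's code):

```python
# The arithmetic of a 17+3 Reed-Solomon Vault: any 17 of the 20 Storage Pods
# are enough to reconstruct customer data.
DATA_PODS, PARITY_PODS = 17, 3
TOTAL_PODS = DATA_PODS + PARITY_PODS

# Capacity overhead is only 20/17 (~18% extra), versus 200% extra for keeping three
# full copies, while still surviving up to 3 simultaneous Storage Pod failures per Vault.
print(f"overhead: {TOTAL_PODS / DATA_PODS:.2f}x raw capacity per byte of data")
print(f"tolerates {PARITY_PODS} of {TOTAL_PODS} Storage Pods failing")
```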

Besides disk reliability and performance, Backblaze is also interested in finding the best $/GB for drives they purchase. Andy said nowadays the consumer disk pricing (at Backblaze’s volumes) generally falls between ~$0.04/GB and ~$0.025/GB, with newer generation disks starting out at the higher price and as the manufacturing lines mature, fall to the lower price. Currently, Backblaze is buying 8TB disk drives.
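At those prices, the 8TB drives they are buying now work out to roughly $200 (at ~$0.025/GB) to $320 (at ~$0.04/GB) per drive.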

The podcast runs ~45 minutes.  Andy was great to talk with and was extremely knowledgeable about disk drives, reliability statistics and “big” storage environments.  Listen to the podcast to learn more.

Andy Klein, Director of marketing at Backblaze

Mr. Klein has 25 years of experience in cloud storage, computer security, and network security.

Prior to Backblaze he worked at Symantec, Checkpoint, PGP, and PeopleSoft, as well as startups throughout Silicon Valley.

He has presented at the Federal Trade Commission, RSA, the Commonwealth Club, Interop, and other computer security and cloud storage events.

44: Greybeards talk 3rd platform backup with Tarun Thakur, CEO Datos IO

Sponsored By:

In this episode, we talk with a new vendor that’s created a new method to back up database information. Our guest for this podcast is Tarun Thakur, CEO of Datos IO. Datos IO was started in 2014 with the express purpose of providing a better way to back up and recover databases in the cloud. They started with NoSQL, cloud-based databases such as MongoDB and Cassandra.

The problem with backing up NoSQL databases, and any SQL database for that matter, is that they are big files that always have some changes in them. So, for most typical backup systems, databases are always flagged as changed files that need to be backed up, and each incremental backs up the whole database file even if only a row has changed. All this results in a tremendous waste of storage.

Deduplication can help, but there are problems deduplicating databases. Many databases store their data compressed, and deduplication based on fixed-length blocks doesn’t work well for variable-length, compressed data (see my RayOnStorage Poor deduplication … post).

Also, variable-length deduplication algorithms usually rely on known start-of-record triggers to determine where a chunk of data begins. Some databases do not use such start-of-row or start-of-table indicators, which throws off variable-length deduplication algorithms.
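Here's a toy illustration of the problem (our sketch of generic fixed-block dedupe, not any vendor's engine): identical backups dedupe perfectly, but insert a single row, or compress the data, and fixed-block hashing finds almost nothing in common between the two backups.

```python
# Toy demo: why fixed-block dedupe breaks down on shifted or compressed database data.
import hashlib, zlib

def block_hashes(data: bytes, block: int = 4096) -> set:
    """Hash fixed-size blocks, the way simple fixed-block dedupe does."""
    return {hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)}

rows = b"".join(b"row-%06d;some-column-data;" % i for i in range(100_000))

# Two identical backups dedupe perfectly...
print(len(block_hashes(rows) & block_hashes(rows)))      # every block matches

# ...but add one row at the front and every block boundary shifts, so almost nothing matches.
shifted = b"row-new000;some-column-data;" + rows
print(len(block_hashes(rows) & block_hashes(shifted)))   # near zero

# Compression is worse still: one small change ripples through the whole compressed stream.
print(len(block_hashes(zlib.compress(rows)) & block_hashes(zlib.compress(shifted))))  # near zero
```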

So, with traditional backup systems, most databases don’t deduplicate very well and are backed up in full all the time, resulting in lots of wasted storage space.

How’s Datos IO different?

Datos IO identifies and backs up only changed data, not changed (database) files. Their Datos IO RecoverX product extracts rows from a database, identifies whether each row’s data has changed, and then backs up just the changed data.

As more customers create applications for the cloud, backups become a critical component of cloud operations. Most cloud-based applications are developed from the start using NoSQL databases.

Traditional backup packages don’t work well with NoSQL cloud databases, if they work at all. And data center customers are reluctant to move their expensive, enterprise backup packages to the cloud, even if they could operate effectively there.

Datos IO saw backing up NoSQL MongoDB and Cassandra databases in the cloud as a major new opportunity, if it could be done properly.

How does Datos IO back up changed data?

Essentially, RecoverX takes a point-in-time snapshot of the database and then reads each table row by row, comparing (hashes of) each row’s data with the row data previously backed up; if a row has changed, its new data is added to the current backup. This provides a semantic deduplication of database data.
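To make that concrete, here's a minimal sketch of row-level (semantic) deduplication: hash each row in the snapshot, compare against the hashes kept from the last backup, and back up only the rows that changed. This is our illustration of the general technique, with made-up row structures, not Datos IO's actual code.

```python
# Minimal sketch of row-level (semantic) dedupe for database backups.
import hashlib
import json

def row_fingerprint(row: dict) -> str:
    """Stable hash of a row's contents (column order normalized)."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def incremental_backup(snapshot_rows: dict, previous_hashes: dict):
    """Return only new/changed rows since the last backup, plus the updated hash catalog.

    snapshot_rows:   {primary_key: row_dict} from a point-in-time snapshot
    previous_hashes: {primary_key: fingerprint} kept from the prior backup
    """
    changed, new_hashes = {}, {}
    for key, row in snapshot_rows.items():
        fp = row_fingerprint(row)
        new_hashes[key] = fp
        if previous_hashes.get(key) != fp:
            changed[key] = row          # only changed or new rows get backed up
    return changed, new_hashes

# First backup: every row is "changed". Second backup: only the edited row.
snap1 = {1: {"name": "alice", "qty": 3}, 2: {"name": "bob", "qty": 5}}
delta, catalog = incremental_backup(snap1, {})
snap2 = {1: {"name": "alice", "qty": 4}, 2: {"name": "bob", "qty": 5}}
delta, catalog = incremental_backup(snap2, catalog)
print(delta)   # {1: {'name': 'alice', 'qty': 4}}
```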

Furthermore, because RecoverX is looking at the data rather than files, compressed data works just as well as uncompressed. Datos IO uses standardized database APIs to extract the row data; that way they remain compatible with each release of the database software.

RecoverX backups reside in S3 objects on the public cloud.

New in RecoverX Version 2

Many customers liked their approach so much that they wanted RecoverX to do this for regular SQL databases as well. Major customers are not just developing new applications for the cloud; they also want to do enterprise application development, test and QA in the cloud, and those applications almost always use SQL databases.

So, Datos IO RecoverX Version 2 now supports migration and cloud backups for standardized SQL databases. They are starting with MySQL, with plans to support other SQL databases used by the enterprise. Migration occurs by backing up the data center MySQL databases to the cloud and then recovering them there.

They have also added backup and recovery support for Apache Hadoop HDFS from Cloudera and Hortonworks. Another change: Datos IO originally offered only a 3-node solution, but Version 2 now supports up to a 5-node cluster.

They have also added more backup management and policy support. Admins can now change backup policies to add or subtract tables/databases on the fly, even while backups are taking place.

The podcast runs ~30 minutes. Tarun has been in the storage industry for a number of years from microcoding storage control logic to managing major industry development organizations. He has an in depth knowledge of storage and backup systems that’s hard to come by and was a pleasure to talk with.  Listen to the podcast to learn more.

Tarun Thakur, CEO, Datos IO

Tarun Thakur is co-founder and CEO of Datos IO, where he leads overall business strategy, day-to-day execution, and product efforts. Prior to founding Datos IO he held senior product and technical roles at several leading technology companies.

Most recently, he worked at Data Domain (EMC), where he led and delivered multiple product releases and new products. Prior to EMC, Thakur was at Veritas (Symantec), where he was responsible for inbound and outbound product management for their clustered NAS appliance products.

Prior to that, he worked at the IBM Almaden Research Center where he focused on distributed systems technology. Thakur started his career at Seagate, where he developed advanced storage architecture products.

Thakur has more than 10 patents granted from USPTO and holds an MBA from Duke University.