Rubrik has been around since January 2014 and just GA’d in April of last year. They recently presented at TechFieldDay 10 (TFD10, videos here) with Chris Wahl, Technical Evangelist, Arvin “Nitro” Nithrakashyap, Co-Founder and Bipul Sinha, Co-Founder, in attendance.
I have known Chris Wahl since November of 2013, from our time together on Storage Field Day 4 (SFD4). Howard and I (the “Greybeards”) also interviewed Chris Wahl for Rubrik on a Greybeards on Storage podcast.
VMware backups suck
Rubrik is out to change that and has taken a different tack to doing backup for VMware. Most backup packages are horrendous to configure, requiring a multitude of jobs to be scheduled, backup proxies to be configured, backup servers, catalog service/search services, etc. All that just to back up a VMware snapshot.
With typical VMware backups it’s not just the configuration/setup that is toilsome. There are also multiple single points of failure: the backup server that orchestrates the whole thing, the backup storage that holds the data and, sometimes, the backup database (catalog/metadata) server that holds information about what’s been backed up where and when.
VMware backup converged infrastructure
Rubrik has taken a converged infrastructure approach to backup. In their offering, backup storage, backup server and catalog server are all together in one multi-node cluster. Rubrik systems come with one or more 2U appliances (called “Brik”s), each with 4 server nodes and storage, that can be scaled up or down (minimum 3 nodes, I think) to provide backup, recovery and archive services for VMware VMs.
They have some pretty slick interfacing with VMware vSphere to help the backup process along:
- Bring in VMware infrastructure and VM configuration information into the appliance.
- Use VMware snapshotting to create backup snapshots, which are then backed up.
- Use vADP (VMware vStorage APIs for Data Protection) services, which provide changed block tracking, to make backup data ingestion more efficient.
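To make the changed block tracking idea concrete, here is a toy model in Python. It is only a sketch of the general technique: the `TrackedDisk` class, the `restore` helper and the tiny block size are all made up for illustration and are not the VMware or Rubrik API.

```python
# Toy model of changed block tracking (CBT); not the VMware or Rubrik API.
BLOCK_SIZE = 4  # bytes per block, deliberately tiny for illustration


class TrackedDisk:
    """A virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        # Before the first backup every block counts as changed.
        self.changed = set(range(num_blocks))

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def backup(self):
        """Return only the changed blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


def restore(backups, num_blocks):
    """Rebuild a disk image by replaying a full backup and its incrementals in order."""
    image = [bytes(BLOCK_SIZE)] * num_blocks
    for delta in backups:
        for index, data in delta.items():
            image[index] = data
    return image


disk = TrackedDisk(num_blocks=8)
disk.write(0, b"boot")
full = disk.backup()          # first backup carries all 8 blocks
disk.write(5, b"data")
incremental = disk.backup()   # second backup carries only block 5
restored = restore([full, incremental], num_blocks=8)
```

The point is that the second backup moves one block instead of eight, which is where the ingestion efficiency comes from.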
Globally deduped, compressed scale-out backup appliance
The backup appliance supplies onboard, globally deduplicated/locally compressed backup storage, providing ~85% data reduction in a normal environment. So in a 1-Brik, 4-node cluster with three 4TB drives per node, Rubrik provides ~15TB usable or ~100TB effective capacity for backup data. As they say, your mileage (dedupe/compression ratio) may vary…
Backup data is triply mirrored across the backup cluster. Metadata (backup catalogs) is replicated across all nodes in a cluster. In addition to the node storage, backup data can also be archived to offsite storage such as Amazon S3, IBM Cleversafe and Scality object storage services, as well as any NFS archive storage.
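The capacity numbers above hang together if you work through them. The following is a back-of-the-envelope check in Python using only the figures quoted here. The variable names are mine, and the post does not say where the gap between 16TB and the quoted ~15TB goes (file system and metadata overhead is a guess).

```python
NODES_PER_BRIK = 4
DRIVES_PER_NODE = 3
DRIVE_TB = 4
MIRROR_COPIES = 3        # backup data is triply mirrored
DATA_REDUCTION = 0.85    # ~85% dedupe/compression savings

raw_tb = NODES_PER_BRIK * DRIVES_PER_NODE * DRIVE_TB   # 48 TB of raw disk
after_mirroring_tb = raw_tb / MIRROR_COPIES            # 16 TB after three copies

usable_tb = 15  # quoted figure, slightly under 16 TB
# Effective capacity is how much pre-reduction data fits in the usable space.
effective_tb = usable_tb / (1 - DATA_REDUCTION)

print(f"raw {raw_tb} TB, mirrored {after_mirroring_tb:.0f} TB, "
      f"effective ~{effective_tb:.0f} TB")
```

With a lower reduction ratio the effective number drops quickly: at 70% savings the same 15TB would hold only about 50TB of backup data.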
The backup cluster, in total, creates an Atlas file system. This is a web-scale, master-less (no single point of failure) NFS file system that supports intelligent data striping across disks in the cluster.
Each node in a cluster can be configured with three 4TB or 8TB disk drives and one 400GB SSD for backup data and metadata storage. Rubrik deduplication occurs at the sub-VM/sub-VMDK level, and backup data is ingested first to SSDs and then deduped/compressed before it goes out to disk.
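For readers who haven’t seen sub-file deduplication before, here is a minimal Python sketch of the general idea: chunks are fingerprinted, each unique chunk is stored once across all backups, and stored chunks are compressed. This is not Rubrik’s actual algorithm. The fixed 4KB chunk size, the SHA-256 fingerprint and the `DedupeStore` class are all assumptions chosen for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; the post does not state one


class DedupeStore:
    """Content-addressed chunk store shared by every backup (global dedupe)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> compressed chunk

    def ingest(self, data):
        """Split data into chunks, store unseen ones compressed, return the recipe."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = zlib.compress(chunk)
            recipe.append(fingerprint)
        return recipe

    def rehydrate(self, recipe):
        """Rebuild the original bytes from a list of fingerprints."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


store = DedupeStore()
vm_a = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE   # two VMs that share "A" chunks
vm_b = b"A" * CHUNK_SIZE * 2 + b"C" * CHUNK_SIZE
recipe_a = store.ingest(vm_a)
recipe_b = store.ingest(vm_b)
```

Seven chunks go in but only three unique ones are kept, and a restore just walks the recipe and decompresses.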
The backup nodes provide a shared-nothing, scale out architecture to support VMware backups.
All nodes in the backup cluster can participate in any and all backup data ingestion, allowing a lot of VM backups to occur in parallel. Each 4-node Brik can sustain up to 1.2GB/sec of data backup transfers.
All nodes support a distributed task queue for their work activities that independently determines which tasks to execute and can execute multiple tasks in parallel across the cluster.
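A rough way to picture this is a pool of workers that each pull the next task from a shared queue, with no coordinator handing out work. The Python sketch below uses threads and an in-process `queue.Queue` as a stand-in for a real distributed queue. The `run_cluster` function and the VM names are hypothetical, and nothing here reflects how Rubrik actually implements it.

```python
import queue
import threading


def run_cluster(vm_names, node_count):
    """Simulate nodes that independently claim backup tasks from a shared queue."""
    tasks = queue.Queue()
    for vm in vm_names:
        tasks.put(vm)

    done = {}                 # VM name -> id of the node that backed it up
    lock = threading.Lock()

    def node(node_id):
        while True:
            try:
                vm = tasks.get_nowait()   # each node claims work for itself
            except queue.Empty:
                return                    # nothing left to do
            with lock:
                done[vm] = node_id

    workers = [threading.Thread(target=node, args=(i,)) for i in range(node_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return done


result = run_cluster([f"vm-{i}" for i in range(20)], node_count=4)
```

Which node ends up with which VM varies from run to run, but every task is claimed exactly once and losing one worker would not strand the rest of the queue.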
At the moment, their biggest backup cluster has 20 nodes, but there doesn’t seem to be a physical limit other than what’s been validated in their labs. Having a scalable cluster like Rubrik for backup services means there’s no hardware single point of failure in your backup environment.
Besides the archive storage option, Rubrik also supports unidirectional, bi-directional and hub & spoke asynchronous replication options for backup data and metadata storage. This is especially useful to help reduce RTO for DR.
Declarative vs. Imperative configuration
In contrast to the imperative configuration of most backup software, Rubrik supports what they call a declarative mode of backup configuration. Rather than going through the time-consuming process of establishing backup schedules on a per VM/VMDK basis, you just create backup classes (service level agreements, SLAs) or use one that’s already defined and assign a VM to a backup SLA class. Once that’s done, the Rubrik cluster will figure out when and how to get the VM’s backups done.
Rubrik uses VSS providers to supply application consistent VMware snapshots for Windows Server VMs.
Local retention and archive characteristics are also established via backup SLAs. Thus, with archive specified for a VM’s SLA, its backup data will age off the local cluster and be moved to archive storage over time, freeing up the cluster’s effective capacity for more VM backup data.
Rubrik backup SLA policies can support RPO (recovery point objectives), availability duration (retention), when to archive (long term retention) and replication schedule (DR).
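To show what declarative means in practice, here is a small Python sketch. You state the outcome you want (RPO, retention, when to archive) and the system derives the schedule and data placement. Everything in it is hypothetical: the `BackupSLA` fields, the helper functions, the “gold” policy values and the VM names are invented for illustration and are not Rubrik’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class BackupSLA:
    """Hypothetical SLA class: states the desired outcome, not a job schedule."""
    name: str
    rpo: timedelta                              # maximum age of the newest backup
    retention: timedelta                        # how long a backup stays available
    archive_after: Optional[timedelta] = None   # when a backup leaves the local cluster


def next_backup_due(sla, last_backup):
    """The system, not the admin, works out when the next backup must run."""
    return last_backup + sla.rpo


def placement(sla, taken_at, now):
    """Decide where a backup taken at `taken_at` should live at time `now`."""
    age = now - taken_at
    if age >= sla.retention:
        return "expired"
    if sla.archive_after is not None and age >= sla.archive_after:
        return "archive"
    return "local"


gold = BackupSLA("gold", rpo=timedelta(hours=4), retention=timedelta(days=365),
                 archive_after=timedelta(days=30))

# The only per-VM configuration is the assignment to an SLA class.
assignments = {"sql-prod-01": gold, "web-01": gold}

now = datetime(2016, 3, 1, 12, 0)
due = next_backup_due(assignments["web-01"], last_backup=datetime(2016, 3, 1, 9, 0))
```

Changing the policy for a hundred VMs then means editing one SLA object rather than a hundred job schedules.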
Restores are simple and near immediate
Backup data is indexed at the file level after ingestion. This allows for individual file restores.
Rubrik also supplies instant VM recovery, where you can mount a VMDK on the Rubrik cluster and have your VMs access the recovered data directly. These near-zero time restores are rehydrated/decompressed to SSD storage for quick access. Each Brik can sustain up to 30K IOPS, so using near-zero restores shouldn’t cause any performance impact to your VMs. Near-zero restores work for archived data as well but take longer to gain access to the data.
If you would prefer to access the restored data on its primary storage, you can always Storage vMotion the data.
Well that about wraps it up. Every influencer at TFD10 received a bobblehead as a gift from Rubrik, nice touch, although I have more hair…
For more information here are some other posts on Rubrik from the TFD10 team:
- TFD10 preview: Rubrik by Chris M. Evans (@chrismevans)
- Rubrik is like Apple Time Machine but for the datacenter by Enrico Signoretti (@esignoretti)