Disaster recovery from VMware to AWS using Dell EMC Avamar & Data Domain

I was at Dell EMC World 2017 last week and although most of the news was on Dell’s new 14th generation servers and Dell-EMC integration progress, Wednesday’s keynote was devoted to storage and other non-server infrastructure news.

There was plenty of non-server news, but one item that caught my attention was new functionality from the Dell EMC Data Protection Division that used Avamar and Data Domain to provide disaster recovery for VMware VMs directly to AWS.

Data Domain (AWS) Cloud DR

Dell EMC Data Domain Cloud DR (DDCDR) is a new capability that enables DD to back up to AWS S3 object storage and, when needed, restart the protected virtual machines within AWS.

DDCDR requires that a customer with Avamar backup and Data Domain (DD) storage install an OVA, which deploys an “add-on” to their on-prem Avamar/DD system, and install a lightweight VM (Cloud DR server) utility in their AWS domain.

Once the OVA is installed, it reads the changed data, then segments, encrypts, and compresses the backup data before sending it, along with the backup metadata, to AWS S3 objects. Avamar/DD policies can be established to control how many daily backup copies are saved to S3 object storage. There’s no need for Data Domain or Avamar to run in AWS.
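
For those who like to see the moving parts, here’s a minimal Python sketch of what such a segment/compress/encrypt/upload pipeline against S3 might look like. This is my own illustration, not Dell EMC’s implementation; the bucket name, segment size, and use of S3 server-side encryption are all assumptions.

```python
# Hypothetical sketch of a DDCDR-style backup pipeline to S3; not Dell EMC's
# actual implementation. Assumes boto3 credentials are already configured.
import hashlib
import json
import zlib

import boto3

s3 = boto3.client("s3")
BUCKET = "ddcdr-backups"          # illustrative bucket name
SEGMENT_SIZE = 4 * 1024 * 1024    # 4MB segments (an assumption)

def backup_changed_data(vm_id: str, changed_bytes: bytes) -> list:
    """Segment, fingerprint, compress, and upload changed VM data to S3."""
    manifest = []
    for off in range(0, len(changed_bytes), SEGMENT_SIZE):
        seg = changed_bytes[off:off + SEGMENT_SIZE]
        fp = hashlib.sha256(seg).hexdigest()   # fingerprint identifies the segment
        body = zlib.compress(seg)              # compress before sending
        # S3 server-side encryption stands in here for the client-side
        # encryption the text describes.
        s3.put_object(Bucket=BUCKET, Key=f"{vm_id}/segments/{fp}",
                      Body=body, ServerSideEncryption="AES256")
        manifest.append({"offset": off, "fingerprint": fp})
    # The manifest plays the role of the backup metadata: it lets a restore
    # reassemble the image from its segments.
    s3.put_object(Bucket=BUCKET, Key=f"{vm_id}/manifest.json",
                  Body=json.dumps(manifest).encode())
    return manifest
```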

When there’s a problem at the primary data center, an admin can click a button on the Avamar GUI and have the Cloud DR server uncompress, decrypt, rehydrate, and restore the backup data onto EBS volumes, translate the VMware VM image to an AMI image, and then restart the AMI on an AWS virtual server (EC2) with its data on EBS volume storage. The Cloud DR server uses the backup metadata to select an AWS EC2 instance with the CPU and RAM needed to run the application. Once this completes, the VM is running standalone in an AWS EC2 instance. Presumably, you have to have EC2 and EBS resources available under your AWS domain to be able to restart the application and restore its data.
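
In AWS terms, the restore path roughly corresponds to a VM import plus an instance launch. Here’s a hedged boto3 sketch of that flow; the instance-type table and the metadata fields are assumptions of mine, not the product’s actual sizing logic.

```python
# Hypothetical sketch of the DR restore path using AWS VM Import/Export;
# names, metadata fields, and the sizing table are illustrative only.
import time

import boto3

ec2 = boto3.client("ec2")

# Assumed mapping from backup metadata (max vCPUs, max RAM GB) to instance type.
SIZING = [(2, 8, "m5.large"), (4, 16, "m5.xlarge"), (8, 32, "m5.2xlarge")]

def pick_instance_type(cpus: int, ram_gb: int) -> str:
    for max_cpu, max_ram, itype in SIZING:
        if cpus <= max_cpu and ram_gb <= max_ram:
            return itype
    return "m5.4xlarge"   # fallback for bigger VMs

def restore_vm(bucket: str, vmdk_key: str, meta: dict) -> str:
    # Convert the restored VMware image into an AMI (backed by EBS snapshots).
    task = ec2.import_image(DiskContainers=[{
        "Format": "VMDK",
        "UserBucket": {"S3Bucket": bucket, "S3Key": vmdk_key},
    }])
    task_id = task["ImportTaskId"]
    while True:   # poll until the import completes
        t = ec2.describe_import_image_tasks(
            ImportTaskIds=[task_id])["ImportImageTasks"][0]
        if t.get("Status") == "completed":
            ami_id = t["ImageId"]
            break
        time.sleep(30)
    # Launch the AMI on an instance sized from the backup metadata.
    resp = ec2.run_instances(ImageId=ami_id, MinCount=1, MaxCount=1,
                             InstanceType=pick_instance_type(
                                 meta["cpus"], meta["ram_gb"]))
    return resp["Instances"][0]["InstanceId"]
```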

For simplicity, the user can control almost all of the required DDCDR functionality from the Avamar GUI alone. But in case of a site outage, the user can initiate the application DR from a portal supplied by the Cloud DR server utility.

There you have it: simplified, easy-to-use (AWS) Cloud DR for your VM applications, all through Dell EMC Avamar, Data Domain storage, and DDCDR. At the moment it only works with the AWS cloud, but it’s likely to be available for other public clouds in the near future.

~~~~

There was much more infrastructure news at Dell EMC World 2017. I’ll discuss more details on their new storage offerings in my upcoming Storage Intelligence newsletter, due out at the end of this month. If you’re interested in receiving your own copy of my newsletter, check out the signup button in the upper right of this page.

Comments?

[Edits were made for readability and technical accuracy after this post was published. Ed]

EMC World 2015 Day 2 & 3 news

Some additional news from EMC World 2015 this week:

EMC announced directed availability for DSSD, their rack-scale shared flash storage solution using a PCIe3 (switched) fabric with 36 dual-ported flash modules, which hold 512 NAND chips for 144TB of NAND flash storage. On the stage floor they had a demonstration pitting a 40-node Hadoop cluster with DAS against a 15-node Hadoop cluster using the DSSD, both running Hive and working on the same query. By the time the 40-node/DAS solution got to about 2% of query completion, the 15-node/DSSD-based cluster had finished the query without breaking a sweat. They then ran an even more complex query and it took no time at all.

They also simulated a copy of a 4TB file (~32K-128K IOs) from memory to memory and it took literally seconds. They then copied it to SSD, which took considerably longer (I didn’t catch how long, but much longer than memory). Finally, they showed the same file copy to DSSD, and it again took only seconds, looking just a smidgen slower than the memory-to-memory copy.

They said the PCIe fabric (no indication what the driver was) provided so much parallelism to the dual-ported flash storage that the system was almost able to complete the 4TB copy at memory-to-memory speeds. It was all pretty impressive, albeit a simulation of the real thing.
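
To put “memory-to-memory speeds” in perspective, some back-of-envelope arithmetic; the elapsed times below are my guesses, since exact numbers weren’t given:

```python
# Implied throughput for a 4TB copy at various assumed elapsed times.
TB = 1e12
for seconds in (5, 10, 30):
    print(f"{seconds:>2}s -> {4 * TB / seconds / 1e9:,.0f} GB/s")
# 5s -> 800 GB/s, 10s -> 400 GB/s, 30s -> 133 GB/s
```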

EMC indicated that they designed the flash modules themselves and expect to double the capacity of the DSSD to 288TB shortly. They showed the controller board, which had a mezzanine board over part of it; together the two had 12 major chips on them, which I assume had something to do with the PCIe fabric. They said there were two controllers in the system for high availability and that the 144TB DSSD was deployed in 5U of space.

I can see how this would play well for real-time analytics, high frequency trading, and HPC environments, but there’s more to shared storage than just speed. Cost wasn’t mentioned and neither was the software driver, but with the ease with which it worked on the Hive query, I can only assume that at some level it must look something like a DAS device but with memory access times… NVMe anyone?

Project CoprHD was announced, which open sources EMC’s ViPR Controller software. Many ViPR customers were asking EMC to open source the ViPR Controller; apparently they’re listening. Hopefully this will enable some participation from non-EMC storage vendors, allowing their storage to be brought under the management of ViPR Controller. I believe the intent is to have an EMC hardened/supported version of ViPR Controller coexist with the open source Project CoprHD version, which anyone can download and modify for themselves.

A non-production, downloadable version of ScaleIO was also announced. The test/dev version is a free download with unlimited capacity, full functionality, and no time limit, but only for non-production use. Another of the demos onstage this morning was Chad configuring storage across a ScaleIO cluster and using its QoS services to limit the impact of a specific workload. There was talk that ScaleIO was previously available as a free download, but it took a bunch of effort to find. They have removed all these prior hindrances and soon, if not today, it will be freely available to anyone. ScaleIO runs on VMware and other hypervisors (maybe bare metal as well). So if you wanted to get your feet wet with software defined storage, this sounds like the perfect opportunity.

ECS is being added to EMC’s Data Lake foundation. I’m not exactly sure what all the components in the Data Lake solution are, but previously the only Data Lake storage was Isilon based. This week EMC added Elastic Cloud Storage to the picture. Recall that Elastic Cloud Storage comes as either a software-only or hardware appliance deployment and provides object storage.

I missed Project Liberty before, but it’s a software-only, virtual VNX appliance. I assume this is intended for ROBO deployments or very low-end business environments. Presumably it runs on VMware and has some sort of storage limitations. It seems more and more EMC products are coming out in virtual appliance versions.

Project Falcon was also announced: a virtual Data Domain appliance, a software-only solution targeted at ROBO environments and other small enterprises. The intent is to have an onramp for Data Domain backup storage. I assume it runs under VMware.

Project Caspian is rolling out CloudScaling orchestration/automation for OpenStack deployments. On the big stage today, Chad and Jeremy demonstrated Project Caspian on a VCE VxRACK, deploying racks of servers under OpenStack control. They were able, within a couple of clicks, to define and deploy OpenStack on bare metal hardware and deploy applications to the OpenStack servers. They had a monitoring screen which showed OpenStack server activity (transactions) in real time, showed an overcommit of the rack, and showed how easy it was to add a new rack with more servers. All this seemed to take but a few clicks. The intent is not to create another OpenStack distribution but to provide an orchestration/automation/monitoring layer of software on top of OpenStack to “industrialize OpenStack” for enterprise users. Looked pretty impressive to me.

I would have to say the DSSD box was most impressive. It would have been interesting to get an up-close look at the box with some more specifications, but they didn’t have one on the Expo floor.

EMC Data Domain products enter the archive market

(c) 2011 Silverton Consulting, Inc., All Rights Reserved

In another assault on the tape market, EMC announced today a new Data Domain 860 Archiver appliance. This new system supports both short-term and long-term retention of backup data. This attacks one of the last bastions of significant tape use – long-term data archives.

Historically, a cheap form of archive has been the long-term retention of full backup tapes. If one needed to keep data around for 5 years, one would keep all their full backup tape sets offsite, in a vault somewhere, for 5 years, then rotate the tapes (bring them back into scratch use) after the 5 years elapsed. One problem with this: tape technology advances to a new generation every 2-3 years, so a 5-year-old tape cartridge would be at least one generation back before it could be re-used. But current tape drives always read two generations back and write at least one generation back, so this use would still be feasible. I would say that many tape users did something like this to create a “pseudo-archive”.
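
A toy calculation shows why the generation cadence matters to this scheme; the read-2-back/write-1-back compatibility rule follows LTO convention, and the cadences are illustrative:

```python
# Check whether a cartridge of a given age is still usable on a current drive,
# assuming drives read 2 generations back and write 1 generation back.
def cartridge_usable(age_years: float, gen_interval_years: float):
    gens_back = int(age_years // gen_interval_years)
    return gens_back <= 2, gens_back <= 1   # (readable, re-writable)

print(cartridge_usable(5.0, 3.0))  # (True, True): one generation back
print(cartridge_usable(5.0, 2.0))  # (True, False): still readable, but a
                                   # current drive can no longer rewrite it
```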

On the other hand, there exist many specific archive point products focused on one or a few application arenas, such as email, records, or database archives, which extract specific data items and place them into an archive. These generally did not apply outside one or a few application domains but were used to support stringent compliance requirements. The advantage of these application-based archive systems is that the data was actually removed from primary storage, taken out of any data protection activities, and placed permanently in “archive storage” only. Such data would be subject to strict retention policies and, as such, would be inviolate (couldn’t be modified) and could not be deleted until formally expired.

Enter the Data Domain 860 Archiver. This system supports up to 24 disk shelves, each of which can be dedicated to either short- or long-term data retention. Backup file data is moved within the appliance by automated policy from short- to long-term storage. Up to 4 disk shelves can be dedicated to short-term storage, with the remainder considered long-term archive units.

When a long-term archive unit (disk shelf) fills up with backup data, it is “sealed”, i.e., it is given all the metadata required to reconstruct its file system and deduplication domain and thus does not require any other disk shelf to access its data. In this way one creates a standalone unit that contains everything needed to recover the data, not unlike a full backup tape set, which can be used in a standalone fashion to restore data.
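
Conceptually, sealing amounts to writing a self-describing manifest into the unit itself. Here’s a toy sketch of the idea; it is purely illustrative and not how DD actually lays out data:

```python
# Toy illustration of "sealing": the archive unit carries its own file index
# and dedup index, so restores need no other shelf. Not DD's actual format.
import json
import os

def seal_archive_unit(unit_dir: str, file_index: dict, dedup_index: dict) -> None:
    """file_index: path -> ordered list of segment IDs;
    dedup_index: segment ID -> (offset, length) within this unit."""
    manifest = {"files": file_index, "segments": dedup_index}
    with open(os.path.join(unit_dir, "MANIFEST.json"), "w") as f:
        json.dump(manifest, f)
    # After this point the unit is treated as read-only: everything needed
    # to reconstruct its file system and dedup domain lives inside it.
```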

Today, the Data Domain 860 Archiver only supports file and DD Boost data access. As such, the backup software is responsible for deleting data that has expired. Such data will then be deleted from any backups taken, and as policy automation copies the backups to long-term archive units, it will be gone from there as well.

While Data Domain’s Archiver can’t remove data from ongoing backup streams the way application-based archive products can, it does deliver essentially what can be achieved from tape-based archives today.

One can also replicate base Data Domain or Archiver appliances to an Archiver unit to achieve offsite data archives.

—-

Full disclosure: I currently work with EMC on projects specific to other products but am not currently working on anything associated with this product.

Tape, your move…

EMC NetWorker 7.6 SP1 surfaces

Photo of DD880 appliance (from EMC.com)

This week EMC released NetWorker 7.6 SP1 with new Boost support for Data Domain (DD) appliances, which allows NetWorker’s storage node (media server) and the DD appliance to jointly provide deduplication services. Earlier this year DD announced the new Boost functionality, which at the time only worked with Symantec’s OST interface. But with this latest service pack (SP1), NetWorker also offers this feature, and EMC takes another step in integrating DD systems and functionality across their product portfolio.

DD Boost integration with NetWorker

DD Boost functionality resides on the NetWorker storage node, which transfers data to backend storage. Boost offloads onto the storage node the cutting up of data into segments, the fingerprinting of those segments, and the passing of the fingerprints to DD. Thereafter, NetWorker only passes unique data between the storage node and the DD appliance.
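
The exchange looks something like the sketch below: the storage node fingerprints segments locally, asks the appliance which fingerprints it already has, and ships only the rest. The interfaces here are assumptions of mine; the real Boost protocol is proprietary.

```python
# Minimal sketch of a Boost-style distributed dedup exchange; the `server`
# object and its methods are assumed interfaces, not the actual Boost API.
import hashlib

def boost_send(data: bytes, server, segment_size: int = 64 * 1024) -> None:
    segments = [data[i:i + segment_size]
                for i in range(0, len(data), segment_size)]
    fps = [hashlib.sha1(s).hexdigest() for s in segments]
    # Only fingerprints cross the wire here, not data.
    missing = set(server.filter_unknown(fps))
    for fp, seg in zip(fps, segments):
        if fp in missing:
            server.store_segment(fp, seg)   # only unique data is sent
    server.commit_file(fps)   # the file is an ordered list of fingerprints
```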

Doing this reduces the processing workload on the DD appliance, uses less network bandwidth, and reduces the processing requirements on the NetWorker storage node itself. While this latter reduction may surprise some, realize that the storage node primarily moves data, and with DD Boost it moves less data, consuming less processing power. All in all, NetWorker with DD Boost shows a SIGNIFICANT improvement in data ingest performance/throughput versus NetWorker using DD in NFS mode.

DD cloning controlled by NetWorker

Also, the latest SP incorporates DD management integration, such that an admin can control Data Domain replication from the NetWorker management console alone. Thus, the operator no longer needs to use the DD management interface to schedule, monitor, and terminate DD replication services.

Additionally, NetWorker can now be aware of all DD replicas and, as such, can establish separate retention periods for each replica, all from the NetWorker management interface. Another advantage is that tape clones of DD data can now be completely managed from the NetWorker management console.

Furthermore, one can now configure new DD appliances as a NetWorker resource using new configuration wizards. NetWorker also supports monitoring and alerting on DD appliances through the NetWorker management console, which includes capacity utilization and dedupe rates.

Other enhancements made to NetWorker

  • NetWorker Cloning – scheduling of clones no longer requires CLI scripts and can now be managed within the GUI as well.  NetWorker cloning is the process which replicates save sets to other storage media.
  • NetWorker Checkpoint/Restart – resuming backups from known good points after a failure. Checkpoint/Restart can be used for very large save sets which cannot complete within a backup window.

New capacity based licensing for NetWorker

It seems like everyone is simplifying their licensing (see CommVault’s Simpana 9 release). With this version of NetWorker, EMC now supports a capacity-based licensing option in addition to their current component- and feature-based licensing. With all the features of the NetWorker product, component-based licensing has become complex and cumbersome to use. The new Capacity License Option charges on the amount of data being protected, and all NetWorker features are included at no additional charge.

The new licensing option is available worldwide, with no tiers of capacity-based licensing for feature use, i.e., one level of capacity-based licensing. Capacity-based licensing can be more cost effective for those using advanced NetWorker features, should be easier to track, and will be easier to install. Anyone under current maintenance can convert to the new licensing model, but it requires this release of NetWorker software.
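
As a quick illustration of doing the math, here is a hypothetical break-even comparison; all prices are invented for the example and are not EMC’s actual list prices:

```python
# Hypothetical break-even comparison of component- vs capacity-based licensing.
def component_cost(clients: int, storage_nodes: int, paid_features: int) -> int:
    return clients * 150 + storage_nodes * 2000 + paid_features * 5000

def capacity_cost(tb_protected: float, price_per_tb: int = 1200) -> float:
    return tb_protected * price_per_tb   # every feature included

# e.g., 200 clients, 4 storage nodes, 3 paid features vs. 40TB protected:
print(component_cost(200, 4, 3))   # 53000
print(capacity_cost(40))           # 48000.0 -- capacity wins in this case
```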

—-

NetWorker 7.6 SP1 is not a full release but is substantial nonetheless, not the least of which is the DD Boost and management integration being rolled out. Also, I believe the new licensing option may appeal to a majority of their customer base, but one has to do the math. There are probably some other enhancements I missed here, but these seem the most substantial.

What do you think?

EMC’s Data Domain ROI

I am trying to put EMC’s price for Data Domain (DDup) into perspective but am having difficulty. According to an InfoWorld article on EMC acquisitions ’03-’06 and some other research, this $2.4B is more money (not inflation adjusted) than anything in EMC’s previous acquisition history. The only thing that comes close was the RSA acquisition for $2.1B in ’06.

VMware only cost EMC $625M and has been, by all accounts, very successful, having been spun out of EMC in an IPO; it currently shows a market cap of ~$10.2B. Documentum cost $1.7B and Legato only cost $1.3B, both of which are still within EMC.

Something has happened here; in a recession, valuations are supposed to become more realistic, not less. At Data Domain’s TTM revenues ($300.5M), this will take over 7 years to break even on a straight-line view. If one considers WACC (weighted average cost of capital), it looks much worse. Looking at DDup’s earnings makes it look worse still.
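
The rough arithmetic behind that claim, treating DDup’s TTM revenue as a level annual cash flow, with the 10% WACC being my own assumption:

```python
# Simple vs. discounted payback on the Data Domain purchase price.
PRICE = 2.4e9           # acquisition price
TTM_REVENUE = 300.5e6   # Data Domain trailing-twelve-month revenue

print(PRICE / TTM_REVENUE)   # ~8.0 years on a straight-line view

wacc, pv, years = 0.10, 0.0, 0
while pv < PRICE and years < 100:
    years += 1
    pv += TTM_REVENUE / (1 + wacc) ** years
print(years)   # ~17 years once each year's revenue is discounted at 10%
```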

Other than firing up EMC’s marketing and sales engine to sell more DDup products, what else can EMC do to gain a better return on its DDup acquisition? (not in order)

  • Move EMC’s current Disk Libraries to DDup technology and let go of the Quantum/FalconStor OEM agreements, and/or abandon the current DL product line and substitute DDup
  • Incorporate DDup technology into Legato Networker for target deduplication applications
  • Incorporate DDup technology into Mozy and Atmos
  • Incorporate DDup technology into Documentum
  • Incorporate DDup technology into Centera and Celerra

Can EMC, by selling DDup products and doing all this to better its technology, double the earnings and savings derived from DDup products and technology? Maybe. But the incorporation of DDup into Centera and Celerra could just as easily decrease EMC profits from the storage capacity lost, depending on the relative price differences.

I figure the Disk Library, Legato, and Mozy integrations would be first on anyone’s list. Atmos next, and Celerra-Centera last.

As for what to add to DDup’s product line: possible additions are at the top end and the bottom end. DDup has been moving up-market of late, and integration with EMC DL might just help take it there. Down-market, there is a potential market of small businesses that might want to use DDup technology at the right price point.

Not sure if the money paid for DDup still makes sense, but at least it begins to look better…

EMC Better At Acquisitions?

I was talking with an EMCer the other day about the Data Domain deal and he said that EMC does very well with acquisitions. Just about every EMC product line other than Symmetrix (and possibly Celerra, Invista, PowerPath and maybe others) came from an acquisition in EMC’s past.

The list goes something like this: Clariion from Data General, Centera from FilePool, Control Center from BMC, NetWorker from Legato, and RainFinity, Avamar, Documentum, and RSA all from companies of the same name. There are other examples as well, but these should suffice for now. One almost starts to forget all these separate companies that existed prior to EMC’s acquisitions. Over time, EMC manages to succeed in advancing and integrating the various technologies and products into their portfolio.

On the other extreme is Sun. They have an almost perfect record of acquiring companies and burying the technology away. Often the technology does emerge after a gestation period in another Sun product somewhere else, but just as often it simply fades away, never to be seen again.

Today’s companies have to do acquisitions well. They can no longer afford the luxury of acquiring companies and then seeing their investment die away. Those days are long gone.

What makes EMC so successful while others do so poorly? One thing I have learned is that EMC leaves a new acquisition pretty much alone for 12 months or so. During that time, presumably, they are assessing the current management team for EMC cultural fit and determining the best way to sell, advance, and integrate the acquired technology into the rest of EMC’s product and services portfolio.

The other thing I have noticed is that EMC’s most recent acquisitions have retained at least portions of their original brand names. NetWorker, RainFinity, Documentum, and RSA are examples here.

I don’t know what it is about retaining a brand name, but 1) it makes it harder to let it fade away because it’s so visible, 2) employees who have a personal interest in the brand fight to keep it alive and advancing, and 3) the customer base and its loyalty are retained better.

Just pieces of the puzzle but no doubt there is more to this than is visible externally.

How well NetApp will do as an acquirer is another question. I know they have acquired Spinnaker, Alacritus, Decru, Topio, and Onaro over the past five years. Most of these products are still being sold. Rumors point to Spinnaker technology being merged into NetApp’s mainline product soon. All in all, I would have to say that although NetApp has retained the product names for most of these products (Onaro’s SANScreen, Decru’s DataFort, and others), they haven’t necessarily done a good job keeping the brand names alive.

What NetApp will do with Data Domain, however, is another matter entirely. First, the price being paid is much higher than for any previous acquisition. Second, the market share that Data Domain currently enjoys is much larger than that of any previous acquisition. Finally, it’s crucial to NetApp’s future revenue growth to do this one right. Given all that, I truly believe they will do a much better job of retaining Data Domain’s brand and product names, thereby keeping the product alive and well for the foreseeable future.

Rgds,
Ray

Data Domain bidding war

It’s unclear to me what EMC would want with Data Domain (DD) other than to lock up deduplication technology across the enterprise. EMC has Avamar for source dedupe, has DL for target dedupe, has Celerra dedupe, and the only ones missing are V-Max, Symm, and Clariion dedupe.

My guess is that EMC sees Data Domain’s market share as the primary target. It doesn’t take a lot of imagination to figure that once Data Domain is a part of EMC, EMC’s Disk Library (DL) offerings will move over to DD technology, which probably leaves the FalconStor/Quantum technology used in DL today on the outside.

EMC’s $100M loan to Quantum last month probably was just insurance to keep a business partner afloat until something better came along or they could make it on their own. The DD deal would leave the Quantum partnership supporting EMC with just Quantum’s tape offerings.

Quantum’s deduplication technology doesn’t have nearly the market share that DD has in the enterprise, but they have won a number of OEM deals, not the least of which is EMC’s, and they were looking to expand. But if EMC buys DD, this OEM agreement will end soon.

I wonder, if DD is worth $1.8B in cash, what Sepaton could be worth. They seem to be the only pure-play dedupe appliance vendor left standing out there.

Not sure whether NetApp will up their bid, but they always seem to enjoy competing with EMC. It’s also unclear how much of this bid is EMC wanting DD versus EMC just wanting to hurt NetApp; either way, DD stockholders win out in the end.

Data Domain and NetApp

Data Domain has been a longtime partner of NetApp’s, which is probably one reason that NetApp finally decided to make them an offer. Another reason why it’s right to do this now is that in today’s economy, NetApp could get the best price.

The final reason that NetApp and Data Domain should hook up is that there are not many other major storage vendors left that don’t already have a dedicated deduplication appliance or two. If Sun were still around, it might make sense for them to think about buying Data Domain, but they are out of the picture until Oracle figures out what to do with their storage business. EMC has bought Avamar and invested significantly in Quantum. IBM has purchased Diligent and Symantec has PureDisk. HP already has a deduplication product. The only major vendor without dedupe today is HDS.

Data Domain had a lot going for them. They practically defined the target deduplication appliance market. Diligent (now with IBM), Quantum, Sepaton, and others notwithstanding, Data Domain had the largest market share out there and was continuing to experience rapid growth. The fact that Data Domain both supported NAS as well as VTL access modes coupled with their excellent market share made them a prime acquisition on many fronts.

NetApp, of course, has their own deduplication technology, which has been very successful in supporting primary storage in virtual server environments and has also been used to support secondary storage. No doubt, over time these two technologies could conceivably merge into one. But don’t hold your breath: some companies have way more than two distinct deduplication technologies in use across their various products, and NetApp may not feel it’s worthwhile to combine the two in the near future, given their two different markets.

All in all, consolidation is a necessary evil today.