EMC NetWorker 7.6 SP1 surfaces

Photo of DD880 appliance (from EMC.com)

This week EMC released NetWorker 7.6 SP1 with new Boost support for Data Domain (DD) appliances, which allows NetWorker's storage node (media server) and the DD appliance to jointly provide deduplication services.  Earlier this year EMC Data Domain announced the new Boost functionality, which at the time only worked with Symantec's OST interface. With this latest service pack (SP1), NetWorker also offers this feature, and EMC takes another step toward integrating DD systems and functionality across its product portfolio.

DD Boost integration with NetWorker

DD Boost functionality resides on the NetWorker storage node, which transfers data to backend storage.  Boost offloads to the storage node the work of cutting data into segments, fingerprinting those segments, and passing the fingerprints to the DD appliance.  Thereafter, NetWorker passes only unique data between the storage node and the DD appliance.

Doing this reduces the processing workload on the DD appliance, uses less network bandwidth, and also reduces processing requirements on the NetWorker storage node itself.  While this latter reduction may surprise some, realize the storage node primarily moves data, and with DD Boost it moves less data, consuming less processing power. All in all, NetWorker with DD Boost shows a SIGNIFICANT improvement in data ingest performance/throughput versus NetWorker using DD in NFS mode.
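The Boost hand-off described above can be sketched as follows. This is a minimal illustration, assuming simple fixed-size segments and a shared fingerprint set standing in for the appliance's index; DD's actual variable-size segmenting and wire protocol are proprietary:

```python
import hashlib

SEGMENT_SIZE = 8 * 1024  # illustrative fixed size; DD uses variable-size segments

def segments(data):
    """Cut the backup stream into segments (storage-node side)."""
    for i in range(0, len(data), SEGMENT_SIZE):
        yield data[i:i + SEGMENT_SIZE]

def fingerprint(seg):
    """Fingerprint a segment with a cryptographic hash."""
    return hashlib.sha1(seg).hexdigest()

def boost_send(data, appliance_index):
    """Send fingerprints first; only segments the appliance lacks cross the wire."""
    sent = []
    for seg in segments(data):
        fp = fingerprint(seg)
        if fp not in appliance_index:   # appliance replies "needed"
            appliance_index.add(fp)
            sent.append(seg)            # only unique data is transferred
    return sent

# A repeated stream transfers far less the second time around
index = set()
payload = b"A" * SEGMENT_SIZE + b"B" * SEGMENT_SIZE
first = boost_send(payload, index)
second = boost_send(payload, index)
print(len(first), len(second))  # 2 0
```

The second pass moves no data at all, which is the source of both the bandwidth savings and the reduced storage-node processing load.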

DD cloning controlled by NetWorker

Also, the latest SP incorporates DD management integration, such that an admin can control Data Domain replication from the NetWorker management console alone.  Thus, the operator no longer needs to use the DD management interface to schedule, monitor, and terminate DD replication services.

Additionally, NetWorker can now be aware of all DD replicas and as such, can establish separate retention periods for each replica all from the NetWorker management interface.  Another advantage is that now tape clones of DD data can be completely managed from the NetWorker management console.
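The per-replica retention idea above amounts to a separate expiration clock for each copy. A minimal sketch, with hypothetical replica names and retention periods of my own choosing:

```python
from datetime import datetime, timedelta

# Hypothetical per-replica retention policy, keyed by replica location
retention = {
    "primary-dd": timedelta(days=30),      # short retention on the primary
    "dr-site-dd": timedelta(days=90),      # longer at the DR site
    "tape-clone": timedelta(days=365 * 7), # longest on the tape clone
}

def expired(replica, created, now):
    """True when a replica's save set has outlived its retention period."""
    return now - created > retention[replica]

created = datetime(2010, 1, 1)
check = datetime(2010, 3, 1)   # 59 days later
print(expired("primary-dd", created, check))  # True  (past 30 days)
print(expired("dr-site-dd", created, check))  # False (within 90 days)
```

The point is that one save set's replicas age out independently, all governed from one console.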

Furthermore, one can now configure new DD appliances as a NetWorker resource using new configuration wizards.  NetWorker also supports monitoring and alerting on DD appliances through the NetWorker management console which includes capacity utilization and dedupe rates.

Other enhancements made to NetWorker

  • NetWorker Cloning – scheduling of clones no longer requires CLI scripts and can now be managed within the GUI as well.  NetWorker cloning is the process which replicates save sets to other storage media.
  • NetWorker Checkpoint/Restart – resumes backups from known good points after a failure. Checkpoint/Restart can be used for very large save sets which cannot complete within a backup window.

New capacity based licensing for NetWorker

It seems like everyone is simplifying their licensing (see CommVault’s Simpana 9 release). With this version of NetWorker, EMC now supports a capacity-based licensing option in addition to their current component- and feature-based licensing.  With all the features of the NetWorker product, component-based licensing has become more complex and cumbersome to use.  The new Capacity License Option charges based on the amount of data being protected, and all NetWorker features are included at no additional charge.

The new licensing option is available worldwide, with no tiers of capacity-based licensing for feature use, i.e., one level of capacity-based licensing.  Capacity-based licensing can be more cost effective for those using advanced NetWorker features, should be easier to track, and will be easier to install.  Anyone under current maintenance can convert to the new licensing model, but it requires this release of NetWorker software.

—-

NetWorker 7.6 SP1 is not a full release but is substantial nonetheless, not least for the DD Boost and management integration being rolled out.  Also, I believe the new licensing option may appeal to a majority of their customer base, but one has to do the math.  There are probably other enhancements I missed here, but these seem the most substantial.

What do you think?

CommVault’s Simpana 9 release

CommVault announced a new release of their data protection product today – Simpana® 9.  The new software provides significantly enhanced support for VM backup, new source-level deduplication capabilities, and other enhanced facilities.

Simpana 9 starts by defining 3 tiers of data protection based on their Snapshot Protection Client (SPC):

  • Recovery tier – using SPC, application-consistent hardware snapshots can be taken via storage array interfaces to provide content-aware, granular-level recovery.  Simpana 9 SPC now supports EMC, NetApp, HDS, Dell, HP, and IBM (including LSI) storage snapshot capabilities.  Automation supplied with Simpana 9 allows the user to schedule hardware snapshots at various intervals throughout the day so that they can be used to recover data without delay.
  • Protection tier – using mounted snapshot(s) provided by SPC above, Simpana 9 can create an extract or physical backup set copy to any disk type (DAS, SAN, NAS) providing a daily backup for retention purposes. This data can be deduplicated and encrypted for increased storage utilization and data security.
  • Compliance tier – selective backup jobs can then be sent to cloud storage and/or archive appliances such as HDS HCP or Dell DX for long term retention and compliance, preserving CommVault’s deduplication and encryption.  Alternatively, compliance data can be sent to the cloud.  CommVault’s previous cloud storage support included Amazon S3, Microsoft Azure, Rackspace, Iron Mountain, and Nirvanix; with Simpana 9, they now add EMC Atmos providers and Mezeo to the mix.

Simpana 9 VM backup support

Simpana 9 also introduces a SnapProtect-enabled Virtual Server Agent (VSA) to speed up virtual machine datastore backups.  With VSA’s support for storage hardware snapshot backups and VMware facilities to provide application-consistent backups, virtual server environments can now scale to 1000s of VMs without concern for backup’s processing and IO impact on ongoing activity.  VSA snapshots can be mounted afterwards on a proxy server where, using VMware services, file-level content can be extracted, which CommVault can then deduplicate, encrypt, and offload to other media, allowing granular content recovery.

In addition, Simpana 9 supports auto-discovery of virtual machines with auto-assignment of data protection policies.  As such, VM guests can be automatically placed into an appropriate, pre-defined data protection regimen without the need for operator intervention after VM creation.

Also, with all the meta-data content cataloguing, Simpana 9 now supplies a lightweight, file-oriented Storage Resource Manager capability via the CommVault management interface.  Such services can provide detailed file-level analytics for VM data without the need for VM guest agents.

Simpana 9 new deduplication support

CommVault’s 1st-gen deduplication with Simpana 7 was at the object level.  With Simpana 8, deduplication occurred at the block level, providing content-aware variable block sizes and adding software data encryption support for disk or tape backup sets.  With today’s release, Simpana 9 shifts some deduplication processing out to the source (the client), increasing backup data throughput by reducing data transfer. All this sounds similar to EMC’s Data Domain Boost capability introduced earlier this year.

Such a change takes advantage of CommVault’s intelligent Data Agent (iDA) running in the clients to provide pre-deduplication hashing and list creation, rather than doing all this at CommVault’s Media Agent node, reducing the data to be transferred.  Further, CommVault’s data deduplication can be applied across a number of clients for a global deduplication service that spans remote clients as well as central data center repositories.
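The iDA-style hand-off above can be sketched in a few lines. This is a hypothetical illustration, with a plain dict standing in for the Media Agent's global dedupe store; CommVault's actual hashing and signaling are not public:

```python
import hashlib

def client_backup(blocks, store):
    """Client side (iDA-style): hash each block locally, then transfer only
    the blocks the media agent's dedupe store does not already hold."""
    sent = 0
    for block in blocks:
        h = hashlib.sha256(block).hexdigest()  # pre-dedupe hashing at the source
        if h not in store:                     # agent asks only for new blocks
            store[h] = block                   # unique block crosses the network
            sent += 1
    return sent, len(blocks)

# One store shared by many clients yields a global deduplication service
store = {}
sent, total = client_backup([b"mail", b"db", b"mail"], store)
print(f"transferred {sent} of {total} blocks")  # duplicate "mail" block skipped
```

Because every client hashes against the same store, a block backed up once from a remote office never travels again from the data center, or vice versa.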

Simpana 9 new non-CommVault backup reporting and migration capabilities

Simpana 9 provides a new data collector for NetBackup versions 6.0, 6.5, and 7.0 and TSM 6.1, which allows CommVault to discover other backup services in the environment, extract backup policies, client configurations, job histories, etc., and report on these foreign backup processes.  In addition, once their data collector is in place, Simpana 9 also supports automated procedures that can roll out and convert all these other backup services to CommVault data protection over a weekend, vastly simplifying migration from non-CommVault to Simpana 9 data protection.

Simpana 9 new software licensing

CommVault is also changing their software licensing approach to include more options for capacity based licensing. Previously, CommVault supported limited capacity based licensing but mostly used CommVault architectural component level licensing.  Now, they have expanded the capacity licensing offerings and both licensing modes are available so the customer can select whichever approach proves best for them.  With CommVault’s capacity-based licensing, usage can be tracked on the fly to show when customers may need to purchase a larger capacity license.

There are probably other enhancements I missed here, as Simpana 9 was a significant changeover from Simpana 8. Nonetheless, this version’s best feature was their enhanced approach to VM backups, allowing more VMs to run on a single server without concern for backup overhead.  The fact that they do source-level pre-deduplication processing just adds icing to the cake.

What do you think?

Cloud storage, CDP & deduplication

Strange Clouds by michaelroper (cc) (from Flickr)

Somebody needs to create a system that encompasses continuous data protection, deduplication and cloud storage.  Many vendors have various parts of such a solution but none to my knowledge has put it all together.

Why CDP, deduplication and cloud storage?

We have written about cloud problems in the past (eventual data consistency and what’s holding back the cloud); despite all that, backup is a killer app for cloud storage.  Many of us would like to keep backup data around for a very long time, but storage costs govern how long data can be retained.  Cloud storage, with its low cost/GB/month, can help minimize such concerns.

We have also blogged about dedupe in the past (describing dedupe) and have written in industry press and our own StorInt dispatches on dedupe product introductions/enhancements.  Deduplication can reduce storage footprint and works especially well for backup which often saves the same data over and over again.  By combining deduplication with cloud storage we can reduce the data transfers and data stored on the cloud, minimizing costs even more.

CDP is more troublesome and yet still worthy of discussion.  Continuous data protection has always been sort of a step child in the backup business.  As a technologist, I understand its limitations (application consistency) and understand why it has been unable to take off effectively (false starts).   But, in theory, at some point CDP will work, at some point CDP will use the cloud, at some point CDP will embrace deduplication, and when that happens it could be the start of an ideal backup environment.

Deduplicating CDP using cloud storage

Let me describe the CDP-Cloud-Deduplication appliance that I envision.  Whether through O/S, Hypervisor or storage (sub-)system agents, the system traps all writes (forks the write) and sends the data and meta-data in real time to another appliance.  Once in the CDP appliance, the data can be deduplicated and any unique data plus meta data can be packaged up, buffered, and deposited in the cloud.  All this happens in an ongoing fashion throughout the day.

Sometime later, a restore is requested. The appliance looks up the appropriate mapping for the data being restored, issues requests to read the data from the cloud and reconstitutes (un-deduplicates) the data before copying it to the restoration location.
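The forked-write, dedupe, and restore flow I'm envisioning could look something like this. Everything here is hypothetical by construction (the appliance doesn't exist); a dict stands in for the cloud object store, and buffering/batching is omitted:

```python
import hashlib
import time

class CDPAppliance:
    """Sketch of the envisioned appliance: trap each forked write, dedupe it,
    and deposit unique data plus metadata in the cloud."""

    def __init__(self, cloud):
        self.cloud = cloud      # stand-in for a cloud object store
        self.journal = []       # time-ordered metadata stream

    def on_write(self, volume, offset, data):
        """Called for every forked write from the agent, in real time."""
        fp = hashlib.sha1(data).hexdigest()
        if fp not in self.cloud:            # only unique data goes to the cloud
            self.cloud[fp] = data
        self.journal.append((time.time(), volume, offset, fp))

    def restore(self, volume, as_of):
        """Replay the journal up to a point in time, un-deduplicating on read."""
        image = {}
        for ts, vol, off, fp in self.journal:
            if vol == volume and ts <= as_of:
                image[off] = self.cloud[fp]
        return image

cloud = {}
cdp = CDPAppliance(cloud)
cdp.on_write("vol1", 0, b"block-A")
cdp.on_write("vol1", 512, b"block-A")   # duplicate write: metadata only
print(len(cloud))                        # 1 unique segment stored
```

The journal is what makes it "continuous": any point in time is just a prefix of the metadata stream, while the cloud holds each unique segment exactly once.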

Problems?

The problems with this solution include:

  • Application consistency
  • Data backup timeframes
  • Appliance throughput
  • Cloud storage throughput

By tying the appliance to a storage (sub-)system one may be able to get around some of these problems.

One could configure the appliance throughput to match the typical write workload of the storage.  This could provide an upper limit as to when the data is at least duplicated in the appliance but not necessarily backed up (pseudo backup timeframe).

As for throughput, if we could somehow understand the average write and deduplication rates, we could configure the appliance and cloud storage pipes accordingly.  In this fashion, we could match appliance throughput to the deduplicated write workload (appliance and cloud storage throughput).
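The sizing above is back-of-envelope arithmetic. A sketch with wholly hypothetical figures:

```python
# Match appliance and cloud pipes to the deduplicated write workload.
# All numbers below are illustrative assumptions, not measurements.
avg_write_rate_mbs = 100   # average host write workload, MB/s
dedupe_ratio = 10          # assume 10:1 reduction for backup-like data

appliance_ingest_mbs = avg_write_rate_mbs            # must absorb raw writes
cloud_pipe_mbs = avg_write_rate_mbs / dedupe_ratio   # only unique data goes out

print(appliance_ingest_mbs, cloud_pipe_mbs)  # 100 10.0
```

That is, the appliance must keep up with the full write rate, but the cloud pipe only needs to carry the post-dedupe residue, an order of magnitude less under these assumptions.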

Application consistency is a more substantial concern.  For example, copying every write to a file doesn’t mean one can recover the file.  The problem is that at some point the file is actually closed, and that’s the only time it is in an application-consistent state.  Recovering to a point before or after this leaves a partially updated, potentially corrupted file, of little use to anyone without major effort to transform it into a valid and consistent file image.

To provide application consistency, one needs to somehow understand when files are closed or applications quiesced.  Application consistency needs would argue for some sort of O/S or hypervisor agent rather than a storage (sub-)system interface.  Such an approach could be more cognizant of file closure or application quiesce, allowing a sync point to be inserted in the meta-data stream for the captured data.

Most backup software has long mastered application consistency through the use of application and/or O/S APIs/other facilities to synchronize backups to when the application or user community is quiesced.  CDP must take advantage of the same facilities.
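The sync-point idea amounts to interleaving consistency markers with the forked writes and restoring only to a marker. A minimal sketch, with hypothetical record shapes and application names:

```python
import time

journal = []   # mixed, time-ordered stream of forked writes and sync markers

def record_write(volume, offset, fingerprint):
    journal.append(("write", time.time(), volume, offset, fingerprint))

def record_sync_point(app):
    """Inserted when the O/S or hypervisor agent observes a file close or
    application quiesce, marking an application-consistent point in time."""
    journal.append(("sync", time.time(), app))

def last_consistent_time(app, before):
    """Pick the restore target: the most recent quiesce at or before `before`."""
    times = [entry[1] for entry in journal
             if entry[0] == "sync" and entry[2] == app and entry[1] <= before]
    return max(times) if times else None

record_write("vol1", 0, "fp-abc")
record_sync_point("mail-db")          # agent saw the mail database quiesce
record_write("vol1", 512, "fp-def")   # writes after the marker are ignored on restore
target = last_consistent_time("mail-db", time.time())
```

Restoring to `target` rather than to an arbitrary timestamp is exactly the discipline traditional backup gets from application/O/S quiesce APIs.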

Seems simple enough, tie cloud storage behind a CDP appliance that supports deduplication.  Something like this could be packaged up in a cloud storage gateway or similar appliance.  Such a system could be an ideal application for cloud storage and would make backups transparent and very efficient.

What do you think?