vSphere 5 storage enhancements

Doyle on a Riemann Sphere by fdecomite (cc) (from Flickr)

Twitter was all abuzz yesterday about the recent VMware vSphere 5 announcement. Although there were quite a few changes that came out, the ones of most interest to me were all in the data storage arena:

  • DAS storage appliance – VMware vSphere 5 now has a virtual machine that can take server DAS and offer a shared storage service to other VMs.  The storage appliance is only available for the Essentials+ and below licensing options and is restricted to a three physical ESX server environment.
  • Host based replication service – vSphere 5 now offers a software only replication option to support disaster recovery.  The host-based replication service is not considered high-bandwidth and will not compete with storage or other hardware replication products but can be used to support heterogeneous storage replication.
  • Storage DRS – For vSphere 5 Enterprise edition and above, once storage pools have been defined, Storage DRS can migrate VMs to other storage within a pool to automatically load balance IO activity.
  • Storage performance guarantees – For Enterprise edition and above, vSphere 5 can provide a QoS capability for IO activity, allowing designated, high-priority VMs preferential access to IO queues so that they perform better in a noisy, mixed environment.
  • IO performance improvements – VMware claims a 4X improvement in storage throughput with vSphere 5.
  • Linked clones – VMware now offers a storage option that can chain two read-writeable copies of a VMDK together and only store the changes needed for the second copy enabling quicker and more efficient storage provisioning for similar VMs.

DAS appliances have been around for a while now but have never been really popular. However, for smaller shops, this might be just the thing to help them start down the virtualization path.  Similarly, the VMware host-based data replication is a low-end capability that might help these customers virtualize, although this may be a bit more sophisticated than most SMB data centers need.

Storage performance guarantees, DRS, and automatic provisioning seem to be targeted at the higher end shops with vast storage farms to manage.  Such shops would like to automate (as much as possible) some of the performance management, provisioning and service management that they currently do manually, to ease VMware storage admins' workloads.

Linked clones and IO performance improvements will benefit all shops. IO improvements should enable bigger, more mission-critical applications to be virtualized, while linked clones will help all customers deploy lots of similar VMs quickly and more efficiently.

——–

The big complaint on Twitter yesterday was on VMware’s licensing change. Apparently vSphere is licensed on a vRAM basis (the amount of virtual memory assigned to all VMs in a shop).  How this will impact customer costs is subject to debate but each vSphere processor license gets a certain amount of vRAM available to it (from 24GB to 48GB of vRAM per slot, depending on license level).
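To put the vRAM model in concrete terms, here's a back-of-the-envelope sketch in Python. The 48GB-per-license entitlement and the VM mix below are hypothetical examples only; the actual entitlement depends on the vSphere 5 edition purchased.

```python
# Back-of-the-envelope vRAM licensing sketch. The 48GB entitlement and the
# VM sizes are hypothetical; actual entitlements range from 24GB to 48GB of
# vRAM per CPU license depending on edition.
import math

def licenses_needed(vm_vram_gb, vram_per_license_gb, physical_cpus):
    # At least one license per physical CPU, plus enough pooled vRAM
    # entitlement to cover all configured VM memory.
    by_vram = math.ceil(sum(vm_vram_gb) / vram_per_license_gb)
    return max(physical_cpus, by_vram)

# Hypothetical shop: 40 VMs at 6GB vRAM each across 8 CPU sockets.
print(licenses_needed([6] * 40, 48, 8))            # -> 8, CPU count dominates
# Add a handful of 96GB monster VMs and vRAM becomes the constraint:
print(licenses_needed([6] * 40 + [96] * 4, 48, 8))  # -> 13, vRAM dominates
```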

There’s been lot’s of talk about VASA and VAAI capabilities that are being rolled out by storage vendors but that will need to wait until another post.

Comments?

EMCWorld day 2

Day 2 saw releases for new VMAX and VPLEX capabilities hinted at yesterday in Joe's keynote. Namely,

VMAX announcements

VMAX now supports

  • Native FCoE with 10GbE support – VMAX now directly supports FCoE, as well as 10GbE iSCSI and SRDF
  • Enhanced Federated Live Migration – now supports other multi-pathing software; specifically, it adds MPIO in addition to PowerPath, with more multi-pathing solutions soon to come
  • Support for RSA’s external key management (RSA DPM) for their internal VMAX data security/encryption capability.

It was mentioned more than once that the latest Enginuity release, 5875, is being adopted at almost 4x the rate of the prior generation code.  The latest release came out earlier this year and provided a number of key enhancements to VMAX capabilities, not the least of which was FAST VP, sub-LUN migration across up to 3 storage tiers.

Another item of interest was that FAST VP is driving a lot of flash sales.  It seems it's leading to another level of flash adoption. According to EMC, almost 80-90% of customers can get by with 3% of their capacity in flash and still gain all the benefits of flash performance at significantly less cost.

VPLEX announcements

VPLEX announcements included:

  • VPLEX Geo – a new asynchronous VPLEX cluster-to-cluster communications methodology which can have the alternate active VPLEX cluster up to 50msec latency away
  • VPLEX Witness – a virtual machine which provides adjudication between the two VPLEX clusters in case they have some sort of communications breakdown.  Witness can run anywhere with access to both VPLEX clusters and is intended to sit outside the two fault domains where the VPLEX clusters reside.
  • VPLEX new hardware – now using the latest Intel microprocessors
  • VPLEX now supports NetApp ALUA storage – the latest generation of NetApp storage.
  • VPLEX now supports thin-to-thin volume migration – previously VPLEX had to re-inflate thinly provisioned volumes, but with this release there is no need to re-inflate prior to migration.

VPLEX Geo

The new Geo product, in conjunction with VMware and Hyper-V, allows for quick migration of VMs across distances that support up to 50msec of latency.  There are some current limitations with respect to the specific VMware VM migration types that can be supported, but Microsoft Hyper-V Live Migration support is readily available at the full 50msec latency.  Note, we are not talking about distance here but latency as the limiting factor for how far apart the VPLEX clusters can be.

Recall that VPLEX has three distinct use cases:

  • Infrastructure availability which provides fault tolerance for your storage and system infrastructure
  • Application and data mobility which means that applications can move from data center to data center and still access the same data/LUNs from both sites.  VPLEX maintains cache and storage coherency across the two clusters automatically.
  • Distributed data collaboration which means that data can be shared and accessed across vast distances. I have discussed this extensively in my Data-at-a-Distance (DaaD) post, VPLEX surfaces at EMCWorld.

Geo is the third product version for VPLEX: VPLEX Local supports virtualization within a data center; VPLEX Metro supports two VPLEX clusters up to 10msec of latency apart, generally metropolitan-wide distances; and Geo moves to asynchronous cache coherence technologies. Finally, coming sometime later is VPLEX Global, which eliminates the restriction of two VPLEX clusters or data centers and can support 3-way or more VPLEX clusters.

Along with Geo, EMC showed some new partnerships, such as with Silver Peak, Ciena and others, used to reduce bandwidth requirements and cost for the Geo asynchronous solution.  Also announced at the show were some new VPLEX partnerships with Quantum StorNext and others which address DaaD solutions.

Other announcements today

  • Cloud tiering appliance – The new appliance is a renewed RainFinity solution which provides policy based migration to and from the cloud for unstructured data. Presumably the user identifies file aging criteria which can be used to trigger cloud migration for Atmos supported cloud storage.  Also the new appliance can support archiving file data to the Data Domain Archiver product.
  • Google enterprise search connector to VNX – showing a Google Search Appliance (GSA) indexing VNX-stored data, bringing enterprise-class, scalable search capabilities to VNX storage.

A bunch of other announcements today at EMCWorld but these seemed most important to me.

Comments?

Top 10 storage technologies over the last decade

Aurora's Perception or I Schrive When I See Technology by Wonderlane (cc) (from Flickr)

Some of these technologies were in development prior to 2000, some were available in other domains but not in storage, and some were in a few subsystems but had yet to become popular as they are today.  In no particular order here are my top 10 storage technologies for the decade:

  1. NAND based SSDs – DRAM and other technology solid state drives (SSDs) were available last century but over the last decade NAND Flash based devices have dominated SSD technology and have altered the storage industry forever more.  Today, it’s nigh impossible to find enterprise class storage that doesn’t support NAND SSDs.
  2. GMR heads – Giant Magneto Resistance disk heads have become commonplace over the last decade and have allowed disk drive manufacturers to double data density every 18-24 months.  Now GMR heads are starting to transition over to tape storage and will enable that technology to increase data density dramatically.
  3. Data deduplication – Deduplication technologies emerged over the last decade as a complement to higher density disk drives and as a means to back up data more efficiently.  Deduplication technology can be found in many different forms today, ranging from file and block storage systems and backup storage systems to backup software only solutions.
  4. Thin provisioning – No one would argue that thin provisioning emerged last century but it took the last decade to really find its place in the storage pantheon.  One almost cannot find a data center class storage device that does not support thin provisioning today.
  5. Scale-out storage – Last century if you wanted to get higher IOPS from a storage subsystem you could add cache or disk drives but at some point you hit a subsystem performance wall.  With scale-out storage, one can now add more processing elements to a storage system cluster without having to replace the controller to obtain more IO processing power.  The link reference talks about the use of commodity hardware to provide added performance but scale-out storage can also be done with non-commodity hardware (see Hitachi’s VSP vs. VMAX).
  6. Storage virtualization – Server virtualization has taken off as the dominant data center paradigm over the last decade, and a counterpart to this in storage has become more viable as well.  Storage virtualization was originally used to migrate data from old subsystems to new storage but today can be used to manage and migrate data over PBs of physical storage, dynamically optimizing data placement for cost and/or performance.
  7. LTO tape – When IBM dominated IT in the mid-to-late last century, the tape format du jour always matched IBM's tape technology.  As the decade dawned, IBM was no longer the dominant player and tape technology was starting to diverge into a babble of differing formats.  As a result, IBM, Quantum, and HP put their technology together and created a standard tape format, called LTO, which has become the new dominant tape format for the data center.
  8. Cloud storage – It's unclear just when over the last decade cloud storage emerged, but it seemed to be a supplement to cloud computing, which also appeared this past decade.  Storage service providers had existed earlier but, due to bandwidth limitations and storage costs, didn't survive the dotcom bubble. Over this past decade both bandwidth and storage costs have come down considerably, and cloud storage has now become a viable technological solution to many data center issues.
  9. iSCSI – SCSI has taken on many forms over the last couple of decades, but iSCSI has altered the dominant block storage paradigm from a single, pure FC-based SAN to a plurality of technologies.  Nowadays, SMB shops can have block storage without the cost and complexity of FC SANs, over the LAN networking technology they already use.
  10. FCoE – One could argue that this technology is still maturing today, but once again SCSI has opened up another way to access storage. FCoE has the potential to offer all the robustness and performance of FC SANs over data center Ethernet hardware, simplifying and unifying data center networking onto one technology.

No doubt others would differ on their top 10 storage technologies over the last decade, but I strove to find technologies that significantly changed data storage between 2000 and today.  These 10 seemed to me to fit the bill better than most.

Comments?

One platform to rule them all – Compellent&EqualLogic&Exanet from Dell

Compellent drive enclosure (c) 2010 Compellent (from Compellent.com)

Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystem. If there's any company out there that can take advantage of commodity hardware, it's probably Dell. (Of course, commodity hardware always loses in the end, but that's another story.)

Similarly, Dell’s EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services.  It doesn’t take a big leap of imagination to have one storage system that combines the functionality of EqualLogic’s iSCSI and Compellent’s FC storage capabilities.  Of course there are others already doing this including Compellent themselves which have their own iSCSI support already built into their FC storage system.

Which way to integrate?

Does EqualLogic survive such a merger?  I think so.  It's easy to imagine that EqualLogic may have the bigger market share today. If that's so, the right thing might be to merge Compellent FC functionality into EqualLogic.  If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both.  It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-to-iSCSI approach may be the optimum path for Dell because of the popularity of their EqualLogic storage.

What about NAS?

The only thing missing from this storage system is NAS.  Of course, the Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end.  Dell's EqualLogic does much the same to offer NAS protocols for their iSCSI system.  Neither of these is a bad solution, but they are not a fully integrated NAS offering such as is available from NetApp and others.

However, there is a little discussed part of this, the Dell-Exanet acquisition which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the marketplace.  It's been a while since I have heard about Exanet, and nothing since the acquisition earlier this year.  Does it make sense to back-end a clustered NAS solution with FC storage – probably.

—-

Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed.  But if I look at where Dell is weakest (barring their OEM agreement with EMC), it's in the high-end storage space.  Compellent probably didn't have much of a footprint there, possibly due to their limited distribution and support channel.  A Dell acquisition could easily eliminate these problems and open up this space without having to do much other than start marketing, selling and supporting Compellent.

In the end, a storage solution supporting clustered NAS, FC, and iSCSI that combined functionality equivalent to Exanet, Compellent and EqualLogic based on commodity hardware (ouch!) could make a formidable competitor to what’s out there today if done properly. Whether Dell could actually pull this off and in a timely manner even if they purchase Compellent, is another question.

Comments?

Data compression lives on

Macroblocking: demolish the eerie ▼oid by Rosa Menkman (cc) (from Flickr)

Last week NetApp announced the availability of data compression on many of their unified storage platforms, which include block and file storage.  Earlier this year EMC announced data compression for LUNs on CLARiiON and Celerra.  I must commend both of them for re-integrating data compression back into primary storage systems, missing since IBM and Sun stopped marketing RVA and SVA.

Data compression algorithms

Essentially data compression is an algorithm that eliminates redundancy in data streams.  Data compression can be “lossy” or “loss-less”.  Data compression in storage subsystems is typically loss-less, which means that the original data can be reconstructed without any loss of information.  One sees lossy algorithms in video/audio data compression, where some loss of information is acceptable because it doesn't noticeably impact video/audio fidelity.

One simple example of loss-less data compression is Run-Length Encoding, which substitutes a trigger, count, and character sequence for any character repeated more than 4 times in a block of data.  This compresses well any text strings with lots of blanks, numerical data with lots of 0's and initial format data written with a repeating character.
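Here's a minimal run-length encoding sketch to make the idea concrete. The trigger-byte format is purely illustrative, not the scheme any particular storage system uses.

```python
# Minimal run-length encoding sketch (illustrative format only): runs of 4
# or more identical bytes become <trigger><count><byte>; everything else
# passes through. A real implementation would also escape literal trigger bytes.
TRIGGER = 0xFF

def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        run = j - i
        if run >= 4:
            out += bytes([TRIGGER, run, data[i]])   # collapse the run to 3 bytes
        else:
            out += data[i:j]                        # short runs pass through
        i = j
    return bytes(out)

# 56 bytes in, 10 bytes out: the 40 blanks and 12 zeros each collapse
# into a single 3-byte trigger sequence.
print(rle_encode(b"ABCD" + b" " * 40 + b"0" * 12))
```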

There are other, more sophisticated compression algorithms like Huffman Coding, which identify the most frequent bytes in a block of data and replace them with shorter bit patterns. For example, if ~50% of the characters in a text file are the letters "a", "e", "i", "o", "t", and "n" (see Wikipedia, Frequency Analysis), then these characters can take up much less space if we encode them in 4 or fewer bits rather than the 8 bits in a byte.
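A small sketch of Huffman code construction shows how the frequent bytes end up with short codes. This is the textbook algorithm, not any vendor's implementation.

```python
# Textbook Huffman code construction: repeatedly merge the two least-frequent
# nodes; symbols under the lighter node get a '0' prefix, the heavier a '1'.
import heapq
from collections import Counter

def huffman_codes(data: bytes) -> dict:
    heap = [[freq, n, [sym, ""]]
            for n, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], n] + lo[2:] + hi[2:])
        n += 1
    return {sym: code for sym, code in heap[0][2:]}

codes = huffman_codes(b"a text sample in which e, t, a and spaces dominate")
# Frequent bytes (like ' ' and 'e') get 2-4 bit codes, rare bytes get longer
# codes -- versus the flat 8 bits per byte of the uncompressed text.
for sym, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(chr(sym), code)
```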

I am certain that both EMC and NetApp are using much more sophisticated algorithms than either of these, and it wouldn't surprise me to learn they are using something like the open source zlib (gzip) or bzip2 algorithms (see my Poor deduplication with Oracle RMAN compressed backups post for an explanation), which use Huffman Coding and add even more sophistication.  Data compression algorithms like these could offer something like 50% compression, i.e., your data could be stored in 50% less space.
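If you want a rough feel for what a zlib- or bzip2-class algorithm achieves on your own data, Python's standard modules give a quick estimate. The file path below is just an example; substitute something representative of your data.

```python
# Quick estimate of zlib (DEFLATE) and bzip2 compression on a local file.
# Actual ratios depend heavily on the data; the path here is just an example.
import bz2
import os
import zlib

path = "/usr/share/dict/words" if os.path.exists("/usr/share/dict/words") else __file__
data = open(path, "rb").read()

for name, compressed in (("zlib", zlib.compress(data, 9)),
                         ("bzip2", bz2.compress(data, 9))):
    saved = 100 * (1 - len(compressed) / len(data))
    print(f"{name}: {len(data)} -> {len(compressed)} bytes ({saved:.0f}% smaller)")
```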

Data compression is often confused with Data Deduplication but it’s not the same. Deduplication looks for duplicate data across different data blocks and files while data compression is strictly only examining the data stream within a block or file and doesn’t depend on any other data.

Storage system data compression

In the past, data compression was relegated to a separate appliance, tape storage systems, and/or host software.  By integrating these algorithms into their main storage engines, both NetApp and EMC are taking advantage of the recent processor speed increases being embedded into their systems to offer offline functionality for online data.

Historically, compression algorithms such as these were implemented in hardware, but nowadays they can easily be done in software, relegated to run during off-peak IO time or executed as the lowest priority task in the storage system. As such, there can be no guarantee as to when your data will finally be compressed, but it will be compressed eventually.

Data compression like this is great for data that isn’t modified frequently.  It takes some processing time to compress data and as such, would need to be repeated after every modification of a compressed block or file.  So if the data isn’t modified that much, compression’s processing cost could be amortized over longer data lifetimes.

Further, data compression must be undone at read time, i.e., the data needs to be de-compressed and handed off to the IO requesting it.  De-compression is a much faster algorithm than compression because in the case of something like Huffman Coding the character dictionary is already known and as such, it’s just a matter of table lookup and bit field isolation. It would be convenient if this data were sitting in the system DRAM someplace but lacking that, moving it from cache to DRAM could be done quickly enough, processed there, and then moved back before final transfer to the requesting IO.

As such, data compression may impact response time for compressed data reads.  How much of an impact is TBD.

Data writes will not be impacted at all because the compression activity is done much later. Whether the data stays in cache until compressed or is brought back in at some later time is another algorithm question which may impact cache hit rates/compression performance but this doesn’t have to be a serious impediment.

NetApp is able to offer this capability for both block and file storage because of its WAFL backend data structure, which essentially allows it to create variable length blocks for file and block data.  EMC only offers this for LUN data (block storage) as of yet, but it's probably just a matter of time before it's available for other data as well.

Any questions?

Commodity hardware always loses

Herman Miller's Embody Chair by johncantrell (cc) (from Flickr)
A recent post by Stephen Foskett has revisited a blog discussion that Chuck Hollis and I had on commodity vs. special purpose hardware.  It's clear to me that commodity hardware is a losing proposition for the storage industry and for storage users as a whole.  Not sure why everybody else disagrees with me about this.

It’s all about delivering value to the end user.  If one can deliver equivalent value with commodity hardware than possible with special purpose hardware then obviously commodity hardware wins – no question about it.

But, and it’s a big BUT, when some company invests in special purpose hardware, they have an opportunity to deliver better value to their customers.  Yes it’s going to be more expensive on a per unit basis but that doesn’t mean it can’t deliver commensurate benefits to offset that cost disadvantage.

Supercar Run 23 by VOD Cars (cc) (from Flickr)

Look around and one sees special purpose hardware everywhere. For example, just check out Apple's iPad, iPhone, and iPod, to name a few.  None of these would be possible without special, non-commodity hardware.  Yes, if one disassembles these products, you may find some commodity chips, but I venture that the majority of the componentry is special purpose, one-off designs that aren't readily purchasable from any chip vendor.  And the benefits they bring, aside from the coolness factor, are significant miniaturization with advanced functionality.  The popularity of these products proves my point entirely – value sells, and special purpose hardware adds significant value.

One may argue that the storage industry doesn't need such radical miniaturization.  I disagree of course, but even so, there are other more pressing concerns worthy of hardware specialization, such as reduced power and cooling, increased data density and higher IO performance, to name just a few.  Can some of this be delivered with SBB and other mass-produced hardware designs? Perhaps.  But I believe that with judicious selection of special purpose hardware, the storage value delivered along these dimensions can be 10 times more than what can be done with commodity hardware.

Cuba Gallery: France / Paris / Louvre / architecture / people / buildings / design / style / photography by Cuba Gallery (cc) (from Flickr)

Special purpose HW cost and development disadvantages denied

The other advantage to commodity hardware is the belief that it's just easier to develop and deliver functionality in software than in hardware.  (I disagree – software functionality can be much harder to deliver than hardware functionality – but that's maybe a subject for a different post.)  But hardware development is becoming more software-like every day.  Most hardware engineers do as much coding as any software engineer I know, and then some.

Then there’s the cost of special purpose hardware but ASIC manufacturing is getting more commodity like every day.   Several hardware design shops exist that sell off the shelf processor and other logic one can readily incorporate into an ASIC and Fabs can be found that will manufacture any ASIC design at a moderate price with reasonable volumes.  And, if one doesn’t need the cost advantage of ASICs, use FPGAs and CPLDs to develop special purpose hardware with programmable logic.  This will cut engineering and development lead-times considerably but will cost commensurably more than ASICs.

Do we ever stop innovating?

Probably the hardest argument to counteract is that over time, commodity hardware becomes more proficient at providing the same value as special purpose hardware.  Although this may be true, products don’t have to stand still.  One can continue to innovate and always increase the market delivered value for any product.

If there comes a time when further product innovation is not valued by the market, then and only then does commodity hardware win.  However, chairs, cars, and buildings have all been around for many years, decades, even centuries now, and innovation continues to deliver added value.  I can't see where the data storage business will be any different a century or two from now…

Poor deduplication with Oracle RMAN compressed backups

Oracle offices by Steve Parker (cc) (from Flickr)

I was talking with one large enterprise customer today and he was lamenting how poorly Oracle RMAN compressed backupsets dedupe. Apparently, non-compressed RMAN backupsets generate anywhere from 20 to 40:1 deduplication ratios, but when they use RMAN backupset compression, their deduplication ratios drop down to 2:1.  Given that RMAN compression probably only adds another 2:1 compression ratio, the overall data reduction becomes something like 4:1.
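A quick back-of-the-envelope comparison, using the ratios above and a hypothetical 10TB nightly backupset, shows why this matters on the dedupe appliance side:

```python
# Back-of-the-envelope comparison using the ratios above; the 10TB backupset
# size and the 30:1 mid-point dedupe ratio are hypothetical illustrations.
backup_tb = 10

# Uncompressed backupset: appliance dedupes at 20-40:1, call it 30:1.
uncompressed_stored_tb = backup_tb / 30          # ~0.33 TB on the appliance

# Compressed backupset: ~2:1 from RMAN compression, then only 2:1 dedupe.
compressed_stored_tb = (backup_tb / 2) / 2       # 2.5 TB on the appliance

print(uncompressed_stored_tb, compressed_stored_tb)
# Compressing first ends up consuming roughly 7-8x more appliance capacity.
```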

RMAN compression

It turns out Oracle RMAN supports two different compression algorithms: zlib (or gzip) and bzip2.  I assume the default is zlib; if you want, one can specify bzip2 for even higher compression rates, with commensurately slower or more processor-intensive compression activity.

  • Zlib is pretty standard repeating-string elimination followed by Huffman coding, which uses shorter bit strings to represent more frequent characters and longer bit strings to represent less frequent characters.
  • Bzip2 also uses Huffman coding but only after a number of other transforms, such as run-length encoding (changing duplicated characters to a count:character sequence), the Burrows–Wheeler transform (which rearranges the data stream so that repeating characters come together), a move-to-front transform (which recodes symbols so that recently seen characters become small, frequently repeated values), another run-length encoding step, Huffman encoding, followed by another couple of steps to decrease the data length even more…

The net of all this is that a block of data that is bzip2 encoded may look significantly different if even one character is changed.  Similarly, even zlib compressed data will look different with a single character insertion, though perhaps not as much.  This will depend on the character and where it's inserted, but even if the new character doesn't change the Huffman encoding tree, adding a few bits to a data stream will necessarily alter its byte groupings significantly downstream from that insertion. (See Huffman coding to learn more.)
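A quick way to see this for yourself is to compress two nearly identical streams with zlib and compare the outputs byte for byte. The sample data below is made up, but any data shows the same effect.

```python
# Insert one byte near the front of a stream and see how early the zlib
# outputs diverge, even though the underlying data is nearly identical.
import zlib

base = b"".join(b"record %06d: some database row payload\n" % i
                for i in range(50000))
edited = base[:500] + b"X" + base[500:]          # single inserted character

a, b = zlib.compress(base), zlib.compress(edited)
diverge = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y),
               min(len(a), len(b)))
print(f"compressed streams agree for only {diverge} of {len(a)} bytes")
# A dedupe appliance chunking these two compressed streams will find very
# little in common, even though the source data is 99.99% identical.
```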

Deduplicating RMAN compressed backupsets

Sub-block level deduplication often depends on seeing the same sequence of data that may be skewed or shifted by one to N bytes between two data blocks.  But as discussed above, with bzip2 or zlib (or any Huffman encoded) compression algorithm, the sequence of bytes looks distinctly different downstream from any character insertion.
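For what it's worth, here's a toy sketch of why sub-block dedupe copes with shifted data in the uncompressed case: content-defined chunking uses a windowed rolling hash to pick chunk boundaries from the data itself, so a one-byte insert only disturbs the chunk(s) around it. Real dedupe engines use stronger fingerprints and minimum/maximum chunk sizes; this is just to illustrate the principle.

```python
# Toy content-defined chunking with a windowed rolling hash. Boundaries are
# chosen from the local data content, so a one-byte insert only changes the
# chunk(s) around the insert; the rest still match by hash.
import hashlib

B, W, MOD, MASK = 257, 48, (1 << 31) - 1, 0x0FFF   # base, window, modulus, ~4KB avg chunks

def chunk_digests(data: bytes):
    digests, start, h, bw = [], 0, 0, pow(B, W, MOD)
    for i, byte in enumerate(data):
        h = (h * B + byte) % MOD                    # add new byte to window hash
        if i >= W:
            h = (h - data[i - W] * bw) % MOD        # drop the byte leaving the window
        if i >= W and (h & MASK) == MASK:           # content-defined boundary
            digests.append(hashlib.sha1(data[start:i + 1]).hexdigest())
            start = i + 1
    digests.append(hashlib.sha1(data[start:]).hexdigest())
    return digests

base = b"".join(b"row %07d: some database page payload\n" % i for i in range(20000))
edited = base[:5000] + b"X" + base[5000:]           # single-byte insert
a, b = set(chunk_digests(base)), set(chunk_digests(edited))
print(f"{len(a & b)} of {len(a)} chunks unchanged after a one-byte insert")
```

Run the same experiment on the zlib-compressed versions of those two streams and almost nothing matches, which is exactly the problem with compressed backupsets.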

One way to obtain decent deduplication rates from RMAN compressed backupsets would be to decompress the data at the dedupe appliance and then run the deduplication algorithm on it – but dedupe appliance ingestion rates would suffer accordingly.  Another approach is to not use RMAN compressed backupsets at all, but the advantages of compression are very appealing, such as less network bandwidth, faster backups (because they are not transferring as much data), and quicker restores.

Oracle RMAN OST

On the other hand, what might work is some form of Data Domain OST/Boost like support from Oracle RMAN which would partially deduplicate the data at the RMAN server and then send the deduplicated stream to the dedupe appliance.  This would provide less network bandwidth and faster backups but may not do anything for restores.  Perhaps a tradeoff worth investigating.

As for the likelihood that Oracle would make such services available to deduplication vendors, I would have said this was unlikely, but ultimately the customers have a say here.  It's unclear why Symantec created OST, but it turned out to be a money maker for them, and something similar could be supported by Oracle.  Once an Oracle RMAN OST-like capability was in place, it shouldn't take much to provide Boost functionality on top of it.  (Although EMC Data Domain is so far the only dedupe vendor with Boost capability, whether for OST or for their own NetWorker version.)

—-

When I first started this post I thought that if the dedupe vendors just understood the format of the RMAN compressed backupsets they would be able to have the same dedupe ratios as seen for normal RMAN backupsets.  As I investigated the compression algorithms being used I became convinced that it’s a computationally “hard” problem to extract duplicate data from RMAN compressed backupsets and ultimately would probably not be worth it.

So, if you use RMAN backupset compression, you probably ought to avoid deduplicating this data for now.

Anything I missed here?

CommVault’s Simpana 9 release

CommVault announced a new release of their data protection product today – Simpana® 9.  The new software provides significantly enhanced support for VM backup, new source-level deduplication capabilities and other enhanced facilities.

Simpana 9 starts by defining 3 tiers of data protection based on their Snapshot Protection Client (SPC):

  • Recovery tier – using SPC, application-consistent hardware snapshots can be taken via storage interfaces to provide content-aware, granular-level recovery.  Simpana 9 SPC now supports EMC, NetApp, HDS, Dell, HP, and IBM (including LSI) storage snapshot capabilities.  Automation supplied with Simpana 9 allows the user to schedule hardware snapshots at various intervals throughout the day such that they can be used to recover data without delay.
  • Protection tier – using mounted snapshot(s) provided by SPC above, Simpana 9 can create an extract or physical backup set copy to any disk type (DAS, SAN, NAS) providing a daily backup for retention purposes. This data can be deduplicated and encrypted for increased storage utilization and data security.
  • Compliance tier – selective backup jobs can then be sent to cloud storage and/or archive appliances such as HDS HCP or Dell DX for long-term retention and compliance, preserving CommVault's deduplication and encryption.  Alternatively, compliance data can be sent to the cloud.  CommVault's previous cloud storage support included Amazon S3, Microsoft Azure, Rackspace, Iron Mountain and Nirvanix; with Simpana 9, they now add EMC Atmos providers and Mezeo to the mix.

Simpana 9 VM backup support

Simpana 9 also introduces a SnapProtect-enabled Virtual Server Agent (VSA) to speed up virtual machine datastore backups.  With VSA's support for storage hardware snapshot backups and VMware facilities that provide application-consistent backups, virtual server environments can now scale to 1000s of VMs without concern for backup's processing and IO impact on ongoing activity.  VSA snapshots can be mounted afterwards to a proxy server which, using VMware services, extracts file-level content that CommVault can then deduplicate, encrypt and offload to other media, allowing granular content recovery.

In addition, Simpana 9 supports auto-discovery of virtual machines with auto-assignment of data protection policies.  As such, VM guests can be automatically placed into an appropriate, pre-defined data protection regimen without the need for operator intervention after VM creation.

Also, with all the meta-data content cataloguing, Simpana 9 now supplies a lightweight, file-oriented Storage Resource Manager capability via the CommVault management interface.  Such services can provide detailed file-level analytics for VM data without the need for VM guest agents.

Simpana 9 new deduplication support

CommVault’s 1st gen deduplication with Simpana 7 was at the object level.  With Simpana 8 deduplication occured at the block level providing content aware variable block sizes and added software data encryption support for disk or tape backup sets.  With today’s release, Simpana 9 shifts some deduplication processing out to the source (the client) increasing backup data throughput by reducing data transfer. All this sounds similar to EMC’s Data Domain Boost capability introduced earlier this year .

Such a change takes advantage of CommVault's intelligent Data Agent (iDA) running in the clients to provide pre-deduplication hashing and list creation, rather than doing all this at CommVault's Media Agent node, reducing the data to be transferred.  Further, CommVault's data deduplication can be applied across a number of clients for a global deduplication service that spans remote clients as well as central data center repositories.
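The general source-side pre-dedupe idea looks something like the sketch below. To be clear, this is not CommVault's actual iDA/Media Agent protocol, just the generic pattern: hash chunks at the client, ask the repository which hashes it already holds, and send only the missing chunks.

```python
# Generic source-side dedupe pattern (illustrative only, not CommVault's
# protocol): the client hashes chunks, the repository reports which hashes
# it lacks, and only those chunks cross the wire.
import hashlib

class Repository:
    def __init__(self):
        self.store = {}                              # hash -> chunk bytes
    def missing(self, hashes):
        return [h for h in hashes if h not in self.store]
    def put(self, chunks_by_hash):
        self.store.update(chunks_by_hash)

def backup(client_data: bytes, repo: Repository, chunk_size=128 * 1024):
    chunks = [client_data[i:i + chunk_size]
              for i in range(0, len(client_data), chunk_size)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]   # done at the source
    need = set(repo.missing(hashes))                           # one round trip
    repo.put({h: c for h, c in zip(hashes, chunks) if h in need})
    return hashes, len(need)                                   # recipe + chunks sent

repo = Repository()
data = b"mostly unchanged data " * 100000
backup(data, repo)                                  # first backup sends everything
_, sent = backup(data + b"new tail", repo)          # second backup sends almost nothing
print(f"second backup sent only {sent} new chunk(s)")
```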

Simpana 9 new non-CommVault backup reporting and migration capabilities

Simpana 9 provides a new data collector for NetBackup versions 6.0, 6.5, and 7.0 and TSM 6.1, which allows CommVault to discover other backup services in the environment, extract backup policies, client configurations, job histories, etc., and report on these foreign backup processes.  In addition, once their data collector is in place, Simpana 9 also supports automated procedures that can roll out and convert all these other backup services to CommVault data protection over a weekend, vastly simplifying migration from non-CommVault to Simpana 9 data protection.

Simpana 9 new software licensing

CommVault is also changing their software licensing approach to include more options for capacity based licensing. Previously, CommVault supported limited capacity based licensing but mostly used CommVault architectural component level licensing.  Now, they have expanded the capacity licensing offerings and both licensing modes are available so the customer can select whichever approach proves best for them.  With CommVault’s capacity-based licensing, usage can be tracked on the fly to show when customers may need to purchase a larger capacity license.

There are probably other enhancements I missed here, as Simpana 9 was a significant changeover from Simpana 8. Nonetheless, this version's best feature was the enhanced approach to VM backups, allowing more VMs to run on a single server without concern for backup overhead.  The fact that they do source-level pre-deduplication processing just adds icing to the cake.

What do you think?