EMCWorld day 2

Day 2 saw releases of new VMAX and VPLEX capabilities hinted at in Joe's keynote yesterday. Namely,

VMAX announcements

VMAX now supports

  • Native FCoE with 10GbE support – VMAX now directly supports FCoE as well as 10GbE iSCSI and SRDF
  • Enhanced Federated Live Migration – now supports multi-pathing software beyond PowerPath, specifically adding MPIO support, with more multi-pathing solutions soon to come
  • Support for RSA's external key management (RSA DPM) for VMAX's internal data security/encryption capability.

It was mentioned more than once that the latest Enginuity release, 5875, is being adopted at almost 4x the rate of the prior generation code. The release came out earlier this year and provided a number of key enhancements to VMAX capabilities, not the least of which was FAST VP, sub-LUN data migration across up to three storage tiers.

Another item of interest was that FAST VP is driving a lot of flash sales; it seems to be leading to another level of flash adoption. According to EMC, almost 80-90% of customers can get by with just 3% of their capacity in flash and still gain all the benefits of flash performance at significantly less cost.

VPLEX announcements

VPLEX announcements included:

  • VPLEX Geo – a new asynchronous VPLEX cluster-to-cluster communications methodology which can have the alternate active VPLEX cluster up to 50msec latency away
  • VPLEX Witness – a virtual machine which provides adjudication between the two VPLEX clusters in case they suffer some sort of communications breakdown. Witness can run anywhere with access to both VPLEX clusters and is intended to sit outside the two fault domains where the VPLEX clusters reside.
  • New VPLEX hardware – based on the latest Intel microprocessors.
  • VPLEX now supports NetApp ALUA storage – the latest generation of NetApp storage.
  • VPLEX now supports thin-to-thin volume migration – previously VPLEX had to re-inflate thinly provisioned volumes before migrating them, but with this release there is no need to re-inflate prior to migration.

VPLEX Geo

The new Geo product, in conjunction with VMware and Hyper-V, allows for quick migration of VMs across distances that support up to 50msec of latency. There are some current limitations with respect to the specific VMware VM migration types that can be supported, but Microsoft Hyper-V Live Migration support is readily available at the full 50msec latency. Note, we are not talking about distance here but latency as the limiting factor to how far apart the VPLEX clusters can be.
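To make the latency-vs-distance point concrete, here is a rough back-of-the-envelope calculation, a sketch of my own assuming the 50msec figure is a round-trip budget and that signals travel through fiber at roughly 200,000 km/s:

```python
# Back-of-the-envelope: how far apart could two VPLEX Geo clusters be?
# Assumptions (mine, not EMC's): 50 msec is the round-trip budget and
# light in fiber covers roughly 200,000 km/s (~2/3 the speed of light).

FIBER_KM_PER_SEC = 200_000     # approximate signal speed in fiber
RTT_BUDGET_SEC = 0.050         # 50 msec round-trip latency budget

# Propagation alone uses 2 * distance / FIBER_KM_PER_SEC of the budget;
# switches, routers and protocol overhead eat into it further, so this is
# an upper bound on fiber-path separation, not a guarantee.
max_separation_km = RTT_BUDGET_SEC * FIBER_KM_PER_SEC / 2
print(f"Max fiber-path separation ≈ {max_separation_km:,.0f} km")  # ≈ 5,000 km
```

In practice, WAN routing and equipment delays cut that number down considerably, which is why latency, not straight-line distance, is the spec that matters.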

Recall that VPLEX has three distinct use cases:

  • Infrastructure availability, which provides fault tolerance for your storage and system infrastructure
  • Application and data mobility, which means that applications can move from data center to data center and still access the same data/LUNs from both sites. VPLEX maintains cache and storage coherency across the two clusters automatically.
  • Distributed data collaboration, which means that data can be shared and accessed across vast distances. I have discussed this extensively in my Data-at-a-Distance (DaaD) post, VPLEX surfaces at EMCWorld.

Geo is the third product version for VPLEX: VPLEX Local supports virtualization within a data center; VPLEX Metro supports two VPLEX clusters up to 10msec of latency apart, which is generally up to metropolitan-wide distances; and Geo moves to asynchronous cache coherence technologies. Finally, coming sometime later is VPLEX Global, which eliminates the restriction to two VPLEX clusters or data centers and can support 3-way or more VPLEX clusters.

Along with Geo, EMC showed some new partnerships, such as with SilverPeak, Ciena and others, used to reduce bandwidth requirements and cost for the Geo asynchronous solution. Also announced at the show were some new VPLEX partnerships with Quantum StorNext and others which address DaaD solutions.

Other announcements today

  • Cloud tiering appliance – The new appliance is a renewed Rainfinity solution which provides policy-based migration of unstructured data to and from the cloud. Presumably the user identifies file-aging criteria which can be used to trigger migration to Atmos-supported cloud storage (a rough sketch of such a policy follows this list). Also, the new appliance can support archiving file data to the Data Domain Archiver product.
  • Google enterprise search connector to VNX – Showing a Google Search Appliance (GSA) indexing VNX-stored data, thus bringing enterprise-class, scalable search capabilities to VNX storage.
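EMC didn't detail the policy engine, but a file-aging policy of the kind the cloud tiering appliance presumably applies is easy to picture. Here is a minimal sketch of my own; the paths, threshold and migrate() stub are hypothetical, not the appliance's actual interface:

```python
# Hypothetical file-aging policy sketch; paths, thresholds and the migrate()
# stub are illustrative, not EMC's implementation.
import os
import time

AGE_THRESHOLD_DAYS = 180         # assumed policy: not accessed in ~6 months
SCAN_ROOT = "/mnt/vnx_share"     # hypothetical NAS mount point

def candidates(root, age_days=AGE_THRESHOLD_DAYS):
    """Yield files whose last access time is older than the threshold."""
    cutoff = time.time() - age_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getatime(path) < cutoff:
                yield path

def migrate(path):
    # Placeholder: push the file to the cloud tier (e.g. Atmos) or to an
    # archive target, leaving a stub behind so reads can be redirected.
    print(f"would migrate {path}")

for f in candidates(SCAN_ROOT):
    migrate(f)
```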

There were a bunch of other announcements today at EMCWorld, but these seemed the most important to me.

Comments?

VMware disaster recovery

Thunderstorms over Alexandria, VA by mehul.antani (cc) (from Flickr)

I did an article a while ago for TechTarget on Virtual (machine) Disaster Recovery and discussed what was then the latest version of VMware Site Recovery Manager (SRM), v1.0, and some of its capabilities.

Well, it's been a couple of years since that came out and I thought it would be an appropriate time to discuss some updates to that product and other facilities that bear on virtual machine disaster recovery today.

SRM to the rescue

Recall that VMware’s SRM is essentially a run book automation tool for system failover. Using SRM, an administrator defines the physical and logical mapping between a primary site (protected site in SRM parlance) configuration of virtual machines, networking, and data stores and a secondary site (recovery site to SRM) configuration.

Once this mapping is complete, the administrator then creates recovery scripts (recovery plans to SRM) which take the recovery site in a step-by-step fashion from an “inactive” to an “active” state. With the recovery scripts in hand, data replication can be activated and monitoring (using storage replication adaptors, SRAs to SRM) can begin. Once all that is ready and operating, SRM can provide one-button failover to the recovery site.
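SRM itself is driven through vCenter rather than scripts you write by hand, but the run book idea is easy to picture. Here is a minimal, hypothetical sketch; the step functions are placeholders of mine, not VMware's API:

```python
# Hypothetical run-book sketch of what an SRM-style recovery plan automates;
# the step functions below are placeholders, not VMware's API.

def promote_replicas():
    print("promote replicated LUNs / NFS exports at the recovery site")

def mount_datastores():
    print("rescan and mount data stores on recovery-site hosts")

def remap_networks():
    print("remap VM port groups onto recovery-site networks")

def power_on_vms():
    print("power on protected VMs in priority order")

RECOVERY_PLAN = [promote_replicas, mount_datastores, remap_networks, power_on_vms]

def failover(plan=RECOVERY_PLAN):
    """Take the recovery site from 'inactive' to 'active', step by step."""
    for step in plan:
        step()      # SRM adds pauses, prompts and per-step error handling

failover()          # the 'one button'
```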

SRM v4.1 supports the following:

  • NFS data stores can now be protected as well as iSCSI and FC LUN data stores. Recall that a VM data store, such as VMFS (essentially a virtual machine device or drive letter), can be hosted on LUNs or as NFS files. NFS data stores have recently become more popular with the proliferation of virtual machines under vSphere 4.1.
  • Raw device mapping (RDM) LUNs can now be protected. Recall that RDM is another way to access devices directly for performance-sensitive VMs, eliminating the need to use a data store and the hypervisor IO overhead.
  • Shared recovery sites are now supported. As such, one recovery site can now support multiple protected sites. In this way a single secondary site can support failover from multiple primary sites.
  • Role-based access security is now supported for recovery scripts and other SRM administration activities. In this way fine-grained security roles can be defined that protect against unauthorized use of SRM capabilities.
  • Recovery site alerting is now supported. SRM now fully monitors recovery site activity and can report on and alert operations staff when problems occur which may impact failover to the recovery site.
  • SRM test and actual failover can now be initiated and monitored directly from vCenter Server. This provides the vCenter administrator significant control over SRM activities.
  • SRM automated testing can now use storage snapshots. One advantage of SRM is the ability to automate DR testing, which can be done onsite using local equipment. Snapshots eliminate the need for storage replication in local DR tests.

There have been many other minor enhancements to SRM since v1.0, but these seem the major ones to me.

The only things lacking seem to be some form of automated failback and 3-way failover. I'll talk about 3-way failover later.

But without automated failback, the site administrator must reconfigure the two sites and reverse the designation of protected and recovery sites, re-mirror the data in the opposite direction and recreate recovery scripts to automate bringing the primary site back up.

However, failback is likely not to be as time sensitive as failover and could very well be a scheduled activity, taking place over a much longer time period. Once the sites are reconfigured, the failback itself can be driven by SRM or done in a more manual fashion.

Other DR capabilities

At last year’s EMCWorld, VPLEX was announced, which provided for a federation of data centers or, as I called it at the time, Data-at-a-Distance (DaaD). DaaD together with VMware’s VMotion could provide a level of disaster avoidance previously unattainable (see my post VPLEX surfaces at EMCWorld).

No doubt cluster services from Microsoft Cluster Server (MSCS), Symantec Veritas Cluster Server (VCS) and others have also been updated. In some (mainframe) cluster services, N-way or cascaded failover is starting to be supported. For example, a 3-way DR scenario has a primary site synchronously replicated to a secondary site, which is in turn asynchronously replicated to a third site. If the region where the primary and secondary sites reside is impacted by a disaster, the tertiary site can be brought online. Such capabilities are not yet available for virtual machine DR, but it's only a matter of time.
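To make the cascade concrete, here is a toy model of my own (site names and lag figures are illustrative, not any vendor's product) of picking a failover target in a primary → secondary (sync) → tertiary (async) chain:

```python
# Toy model of a cascaded 3-way DR topology: primary -> secondary is
# synchronous (no data loss), secondary -> tertiary is asynchronous
# (some replication lag). Names and RPO figures are illustrative only.

SITES = {
    "primary":   {"up": True, "rpo_seconds": 0},    # production copy
    "secondary": {"up": True, "rpo_seconds": 0},    # synchronous replica
    "tertiary":  {"up": True, "rpo_seconds": 300},  # async replica, ~5 min behind
}

def failover_target(sites):
    """Pick the surviving replica with the smallest potential data loss."""
    survivors = {n: s for n, s in sites.items() if s["up"] and n != "primary"}
    if not survivors:
        raise RuntimeError("no surviving replica to fail over to")
    return min(survivors, key=lambda n: survivors[n]["rpo_seconds"])

# A regional disaster takes out both the primary and the secondary:
SITES["primary"]["up"] = False
SITES["secondary"]["up"] = False
print(failover_target(SITES))   # -> 'tertiary', accepting ~5 minutes of data loss
```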

—–

Disaster recovery technologies are not standing still and VMware SRM is no exception. I am sure a couple of years from now SRM will be even more capable and other storage vendors will provide DaaD capabilities to rival VPLEX.   What the cluster services folks will be doing by that time I can’t even imagine.

Comments?

 

Caching DaaD for federated data centers

Internet Splat Map by jurvetson (cc) (from flickr)

Today I attended a webinar where Pat Gelsinger, President of Information Infrastructure at EMC, discussed their concept for a new product based on the Yotta Yotta technology EMC acquired a few years back. Yotta Yotta's product was a distributed, coherent caching appliance that had FC front-end ports, an InfiniBand internal appliance network and both FC and WAN backend links.

What one did with Yotta Yotta nodes was place them in front of your block storage, connect them together via InfiniBand locally and via a WAN technology of your choice remotely, and then any data behind the appliances could be accessed from any attached location. They also provided very quick transfer of bulk data between remote nodes. So their technology allowed for very rapid data transmission over standard WAN interfaces/distances and provided a distributed cache, across those very same distances, to the data behind the appliances.

I like caching appliances as much as anyone, but they were prominent only in the late '70s and early '80s, mostly because caching was hard to do with the storage subsystems of the day, and they went away a long time ago. Nowadays you can barely purchase a lone disk drive without a cache in it. So what's different?

Introducing DaaD

Today we have SSDs and much cheaper processing power. I wrote about new caching appliances like DataRam's XcelaSAN in a Cache appliances rise from the dead post I did after last year's SNW. But EMC is going after a slightly broader domain – the world. The caching appliance that EMC is discussing is really intended to support distributed data access, or as I like to call it, Data-at-a-Distance (DaaD).

How can this work?  Data is stored on subsystems at various locations around the world. A DaaD appliance is inserted in front of each of these and connected over the WAN. Some or all of that data is then re-configured (at block or, more likely, LUN level) to be accessible at distance from each DaaD data center. As each data center reads and writes data from/to its remote brethren, some portion of that data is cached locally in the DaaD appliance and the rest is only available by going to the remote site (with considerably higher latency).

This works moderately well for well-behaved, read-intensive workloads where 80% of the IO is to 20% of the data (most of which is cached locally). But block writes present a particularly nasty problem, as any data write has to be propagated to all cached copies before it can be acknowledged.

It’s possible write propagation could be done by invalidating the data in cache (so any subsequent read would need to re-access the data from the original host). Nevertheless, to even know which DaaD nodes have a cached copy of a particular block, one needs to maintain a dictionary of all globally identifiable blocks held in any DaaD cache node at every moment in time. Any such table would change often and would need to be updated very carefully: deadlock-free, atomically and with non-failable transactions – therein lies one of the technological hurdles. Doing this quickly, without impacting performance, is another hurdle.
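The directory problem is easier to see in miniature. Here is a toy sketch of my own that tracks which sites cache each block and invalidates copies on write; it deliberately ignores locking, failures and atomicity, which are exactly the hard parts the real product has to solve:

```python
# Toy sketch of the global block directory a DaaD-style cache would need:
# it records which sites hold a cached copy of each block and invalidates
# those copies before a write is acknowledged. A real system must make this
# distributed, atomic and failure-tolerant -- the hurdles described above.
from collections import defaultdict

class BlockDirectory:
    def __init__(self):
        self.holders = defaultdict(set)   # block id -> sites caching a copy

    def record_read(self, block, site):
        """A site read the block and now holds a cached copy."""
        self.holders[block].add(site)

    def invalidate_on_write(self, block, writer):
        """Before acknowledging a write, drop every other cached copy."""
        stale = self.holders[block] - {writer}
        self.holders[block] = {writer}
        return stale                      # sites that must discard the block

directory = BlockDirectory()
directory.record_read("lun7:blk42", "london")
directory.record_read("lun7:blk42", "tokyo")
print(directory.invalidate_on_write("lun7:blk42", "boston"))
# -> {'london', 'tokyo'} (set order may vary); each must re-fetch on next read
```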

Simple enough: EMC takes Yotta Yotta's technology, updates it for today's processors, networking, and storage, and releases it as a data center federation enabler. So what can one do with a federated data center? Well, that's another question; it involves VMotion and must be a subject for a future post …