Storage replication – Silverton Consulting

NetApp brought three of its customer innovation award winners on stage for a panel discussion moderated by Dave Hitz. All three had interesting deployments of NetApp storage systems:

Andrew Henderson from ING DIRECT talked about their need to deploy copies of the bank's IT environment for test, development, optimization and security testing. The first time they tried, the process took 12 weeks and produced only a single copy. They wanted to speed this up and be able to deploy 10 or more copies if necessary. Andrew turned to Microsoft Hyper-V, System Center and NetApp FlexClones and transformed the process so that it now generates a copy of the entire bank's IT services in under 10 minutes. Since the new capabilities have been in place, they have created over 400 copies of the bank (he called these bank-in-a-box) for various purposes.
Teresa Wahlert from Iowa Workforce Development Agency was up next and talked about their VDI implementation. Budget cuts in Iowa forced the agency to shut down a number of physical offices. But with VDI, VMware and NetApp storage, Workforce was able to disperse its services to over 3000 locations, including prisons, libraries and other venues where it had no presence before. They put out a general call for all the tired, dying PCs in Iowa government and used these to host VDI services. Now Workforce services are available 7x24 at these locations, pretty amazing for government work. Apparently they had tried VDI before and their previous storage couldn't handle it. They moved to NetApp with FlashCache and it worked just fine. That's when they rolled out VDI services to their customers and businesses. With NetApp they were able to implement VDI, reduce storage costs (via deduplication and other storage efficiency features) and increase department services.
Jeff Bell at Mercy Healthcare talked about the difficulties of rolling out electronic health records (EHR) and the challenges of integrating ~30 hospitals and ~400 medical clinics. They started with EHR fairly early, in 2006-2007, well before the latest governmental push. He mentioned Joplin, MO and last year's category 5 tornado, which nearly wiped out their hospital there. He said that within 2 hours of the disaster, Mercy Healthcare was printing out the EHR for the 183 patients who were in the hospital at the time and had to be moved to other care facilities. The promise of EHR is that the information travels with the patient, can be recovered in the event of a disaster and is immediately available. It seems that, at least at Mercy Healthcare, EHR is living up to its promise. In addition, they just built a new data center, as they were running out of space, power and cooling at the old one. They installed new NetApp storage there and for the first few months had to run heaters to keep the data center livable, because the new power/cooling load was so far below what they had experienced previously. Looking back on what they had accomplished, Jeff was not so sure they would build a new data center again. With new cloud offerings coming out, and the reduced power/cooling and increased density of NetApp storage, they could almost get by without another data center at all.

That’s about it from the customer session.

NetApp execs spent the rest of the day on innovation, mostly at NetApp but also in the IT industry in general.

There was lots of discussion of the new release of Data ONTAP 8.1.1 and its latest cluster mode features. NetApp positioned it as completing the transition to data/storage as infrastructure that IT has been pushing for the last decade or so, following in the grand tradition of what IBM did for computing infrastructure with the System/360 and what Cisco and others did for networking infrastructure in the mid-80s.

Comments?

Moving a VM from one data center to another

In all the blog posts/tweets about VMworld this week, I didn't see much about long distance Vmotion. At Cisco's booth there was a presentation on how they partnered with VMware to perform Vmotion over a (simulated) distance of 200 miles.

I can't recall when I first heard about this capability, but many of us have heard about it before. What was new was that Cisco wasn't the only one talking about it. I met with a company called NetEx whose product, HyperIP, was being used to perform long distance Vmotion between sites over 2000 miles apart, and they had at least three sites actually running their systems this way. Now, I am sure you won't find NetEx on VMware's long HCL list, but what they have managed to do is impressive.

As I understand it, they have an optimized appliance (also available as a virtual [VM] appliance) that terminates the TCP session (used by Vmotion) at the primary site and then transfers the data payload using their own UDP protocol to the target appliance, which re-constitutes (?) the TCP session and sends it back up the stack as if everything were local. According to NetEx CEO Craig Gust, their product typically achieves a data payload of around 90%, compared to around 30% for standard TCP/IP, which automatically gives them a 3X advantage (although he claimed a 6X speed or distance advantage, and I can't seem to follow the logic).
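To make the "terminate TCP, ship over UDP, re-constitute TCP" idea a bit more concrete, here is a minimal in-memory sketch of what the two appliances have to do with the byte stream. It is only an illustration under stated assumptions: NetEx's actual protocol is not public, so the header layout (`HEADER`), the `MAX_PAYLOAD` size and the `segment`/`reassemble` helpers are all hypothetical, and there are no real sockets, acknowledgements or retransmissions here.

```python
import os
import random
import struct

# Hypothetical header: sequence number, total datagram count, payload length.
# NetEx's real wire format is not public; this only illustrates the idea.
HEADER = struct.Struct("!IIH")
MAX_PAYLOAD = 1400  # assumed payload bytes per UDP datagram


def segment(stream: bytes) -> list[bytes]:
    """Sending appliance: cut the terminated TCP byte stream into numbered datagrams."""
    offsets = range(0, len(stream), MAX_PAYLOAD)
    total = len(offsets)
    datagrams = []
    for seq, offset in enumerate(offsets):
        chunk = stream[offset:offset + MAX_PAYLOAD]
        datagrams.append(HEADER.pack(seq, total, len(chunk)) + chunk)
    return datagrams


def reassemble(datagrams: list[bytes]) -> bytes:
    """Receiving appliance: put payloads back in order so they can be
    handed to a local TCP session as one contiguous byte stream."""
    chunks: dict[int, bytes] = {}
    total = 0
    for dgram in datagrams:
        seq, total, length = HEADER.unpack_from(dgram)
        chunks[seq] = dgram[HEADER.size:HEADER.size + length]
    missing = [seq for seq in range(total) if seq not in chunks]
    if missing:
        # A real protocol would request retransmission instead of failing.
        raise ValueError(f"missing datagrams: {missing}")
    return b"".join(chunks[seq] for seq in range(total))


if __name__ == "__main__":
    payload = os.urandom(10_000)    # stand-in for VM memory pages
    in_flight = segment(payload)
    random.shuffle(in_flight)       # UDP may deliver datagrams out of order
    print(reassemble(in_flight) == payload)  # True
```

The point of the sketch is that the far-end TCP stack never sees the WAN: it only receives a contiguous, correctly ordered byte stream from the local appliance, which is presumably why the migration can look local to VMware.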

How all this works with vCenter, DRS and HA I can only speculate, but my guess is that this long distance Vmotion actually appears to VMware as a local Vmotion. This way DRS and/or HA can control it all. How the networking is set up to support this is beyond me.

Nevertheless, all of this proves that it's no longer just one high-end networking company coming away with a proof of concept: at least two companies can do it, and one of them has customers doing it today.

The Storage problem

In any event, accessing the storage at the remote site is another problem. It's one thing to transfer server memory and state information over 10-1000 miles; it's quite another to transfer TBs of stored data over the same distance. The Cisco team suggested some alternatives to handle the storage side of long distance Vmotion:

Let the storage stay in the original location. The VM at the remote site would then access its storage across the network.
Move the storage via long distance Storage Vmotion. The problem is that transferring TBs of data, even at 90% data payload over an 800Mb/s link, would take hours. And 800Mb/s networking isn't cheap.
Replicate the storage via active-passive replication. Here the storage subsystem(s) concurrently replicate the data from the primary site to the secondary site.
Replicate the storage via active-active replication, where the primary and secondary sites replicate data to one another and any write at either location is replicated to the other.
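The "would take hours" claim for long distance Storage Vmotion is easy to check with a little arithmetic. The sketch below assumes decimal terabytes, treats 800Mb/s as the raw link rate, and uses the 90% vs 30% payload figures quoted earlier; `transfer_hours` is a hypothetical helper, not part of any VMware or NetEx tool.

```python
def transfer_hours(terabytes: float, link_mbps: float, payload_fraction: float) -> float:
    """Hours to move `terabytes` (decimal TB) over a link of `link_mbps`
    megabits/s when only `payload_fraction` of the bits on the wire are data."""
    bits = terabytes * 1e12 * 8
    effective_bps = link_mbps * 1e6 * payload_fraction
    return bits / effective_bps / 3600


if __name__ == "__main__":
    for payload in (0.30, 0.90):
        print(f"1 TB at 800 Mb/s, {payload:.0%} payload: "
              f"{transfer_hours(1, 800, payload):.1f} h")
    # 1 TB at 800 Mb/s, 30% payload: 9.3 h
    # 1 TB at 800 Mb/s, 90% payload: 3.1 h
```

Even at 90% payload, each TB costs about 3 hours, so a multi-TB datastore quickly turns into most of a day. The ratio between the two lines is exactly the 3X advantage mentioned above.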

Now, I have to admit that active-active replication, where the same LUN or file system is replicated in both directions and updated at both locations simultaneously, seems like unobtainium to me, but I can be convinced otherwise. Nevertheless, the other approaches exist today and effectively deal with the issue, albeit with commensurate increases in expense.
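A toy model helps show why active-active is so much harder than active-passive. In the sketch below, a `Site` is just a dict of block numbers to data, and both helpers are hypothetical: active-passive has a single writer, so the copies always agree, while a naive active-active scheme with no locking or ordering lets two simultaneous writes to the same block cross on the WAN and leave the sites disagreeing. Real products would need some form of distributed locking or write ordering to avoid this; the sketch does not model any particular vendor's approach.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A toy storage site: block number -> data."""
    name: str
    blocks: dict[int, str] = field(default_factory=dict)
    writable: bool = True


def active_passive_write(primary: Site, secondary: Site, block: int, data: str) -> None:
    """Active-passive: only the primary accepts writes and forwards each one
    to the read-only secondary (synchronously, for simplicity)."""
    if not primary.writable or secondary.writable:
        raise PermissionError("need a writable primary and a read-only secondary")
    primary.blocks[block] = data
    secondary.blocks[block] = data


def active_active_concurrent(a: Site, b: Site, block: int, data_a: str, data_b: str) -> None:
    """Naive active-active with no coordination: both sites apply a local write
    at the same moment, then each replays the other's write when it arrives."""
    a.blocks[block] = data_a  # local write at site A
    b.blocks[block] = data_b  # simultaneous local write at site B
    a.blocks[block] = data_b  # B's replicated write lands at A
    b.blocks[block] = data_a  # A's replicated write lands at B


if __name__ == "__main__":
    primary, secondary = Site("primary"), Site("secondary", writable=False)
    active_passive_write(primary, secondary, 7, "v1")
    print(primary.blocks == secondary.blocks)  # True

    a, b = Site("A"), Site("B")
    active_active_concurrent(a, b, 7, "from A", "from B")
    print(a.blocks[7], "|", b.blocks[7])  # from B | from A
```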

The Networking problem

So now that we have the storage problem solved, what about the networking problem? When a VM is Vmotioned to another ESX server, it keeps its IP address so as to retain all its current network connections. Cisco has techniques that extend the VLAN (or subnet) from the primary site to the secondary site, leaving the VM with the same IP address it had at the primary site. Cisco has a couple of different ways to extend the VLAN, optimized for HA, load balancing, scalability, or protocol isolation and broadcast avoidance (all of which are described further in their white paper on the subject). Cisco did mention that their VLAN extension technology currently does not support sites more than 500 miles apart.
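The reason VLAN extension matters can be shown with a simple addressing check: a VM can keep its IP after the move only if its subnet is also present at the target site. The sketch below uses Python's standard `ipaddress` module; the addresses and the `keeps_ip_after_move` helper are hypothetical, and it says nothing about how Cisco actually extends the VLAN.

```python
import ipaddress

# Hypothetical addressing plan: the VM lives on 10.1.20.0/24 at the primary site.
vm_ip = ipaddress.ip_address("10.1.20.57")


def keeps_ip_after_move(ip, target_site_subnets) -> bool:
    """True if the VM's subnet (VLAN) is present at the target site,
    so it can keep its IP address and existing connections."""
    return any(ip in ipaddress.ip_network(net) for net in target_site_subnets)


if __name__ == "__main__":
    secondary_local_only = ["10.2.0.0/16"]
    secondary_with_extended_vlan = ["10.2.0.0/16", "10.1.20.0/24"]
    print(keeps_ip_after_move(vm_ip, secondary_local_only))          # False
    print(keeps_ip_after_move(vm_ip, secondary_with_extended_vlan))  # True
```

Without the extended subnet, the VM would have to be re-addressed at the secondary site, breaking every open connection, which is exactly what Vmotion is supposed to avoid.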

Presumably NetEx's product solves all this by leaving the IP addresses/TCP ports at the primary site and just transferring the data to the secondary site. In any event, multiple solutions to the networking problem exist as well.

Now that long distance Vmotion can be accomplished, is it a DR tool, a mobility tool, a load balancing tool, or all of the above? That will need to wait for another post.

NetApp Analyst Summit Customer Panel – how to survive a category 5 tornado

VMworld and long distance Vmotion