Disaster recovery from VMware to AWS using Dell EMC Avamar & Data Domain

I was at Dell EMC World 2017 last week and although most of the news was on Dell’s new 14th generation servers and Dell-EMC integration progress, Wednesday’s keynote was devoted to storage and other non-server infrastructure news.

There was plenty of non-server news but one item that caught my attention was new functionality from Dell EMC Data Protection Division that used Avamar and Data Domain to provide disaster recovery for VMware VMs directly to AWS.

Data Domain (AWS) Cloud DR

Dell EMC Data Domain Cloud DR (DDCDR) is a new capability that enables DD to back up to AWS S3 object storage and, when needed, restart the protected virtual machines within AWS.

DDCDR requires that a customer with Avamar backup and Data Domain (DD) storage install an OVA, which deploys an “add-on” to their on-prem Avamar/DD system, and deploy a lightweight VM (Cloud DR server) utility in their AWS domain.

Once the OVA is installed, it reads the changed data, segments, encrypts, and compresses the backup data, and then sends this along with the backup metadata to AWS S3 objects. Avamar/DD policies can be established to control how many daily backup copies are saved to S3 object storage. There’s no need for Data Domain or Avamar to run in AWS.
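
Conceptually, the protection path is just chunk, compress, encrypt, and ship to S3, plus a metadata object describing what was sent. Here’s a minimal sketch of that flow (my own illustration, not Dell EMC’s implementation; the bucket name, chunk size, and key handling are all assumptions):

```python
# Illustrative only -- not Dell EMC's DDCDR code. Shows the general
# segment -> compress -> encrypt -> ship-to-S3 flow described above.
import zlib
import boto3
from cryptography.fernet import Fernet  # symmetric (AES-based) encryption

s3 = boto3.client("s3")
BUCKET = "example-cloud-dr-bucket"      # hypothetical bucket name
CHUNK_SIZE = 4 * 1024 * 1024            # 4MB segments (an assumption)
cipher = Fernet(Fernet.generate_key())  # in practice the key stays on-prem

def protect_backup(path, backup_id):
    """Segment, compress, encrypt and ship one backup image to S3."""
    seq = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            blob = cipher.encrypt(zlib.compress(chunk))
            s3.put_object(Bucket=BUCKET,
                          Key=f"{backup_id}/segment-{seq:08d}",
                          Body=blob)
            seq += 1
    # backup metadata (segment count, VM sizing, etc.) travels alongside
    s3.put_object(Bucket=BUCKET, Key=f"{backup_id}/metadata",
                  Body=f'{{"segments": {seq}}}'.encode())
```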

When there’s a problem at the primary data center, an admin can click an Avamar GUI button and have the Cloud DR server uncompress, decrypt, rehydrate and restore the backup data onto EBS volumes, translate the VMware VM image into an AMI, and then restart that AMI on an AWS virtual server (EC2 instance) with its data on EBS volume storage. The Cloud DR server uses the backup metadata to select an AWS EC2 instance type with the CPU and RAM needed to run the application. Once this completes, the VM is running standalone in an AWS EC2 instance. Presumably, you have to have EC2 and EBS resources available under your AWS account to be able to restart the application and restore its data.
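
Dell EMC hasn’t published the Cloud DR server’s internals, but the restore flow described above can be approximated with AWS’s own VM Import and EC2 APIs. A rough sketch, with a hypothetical sizing table standing in for the backup-metadata lookup:

```python
# Rough sketch of the restore side: import a restored VMDK as an AMI, then
# launch an EC2 instance type matching the VM's CPU/RAM metadata.
# Illustrative only -- not the Cloud DR server's actual code.
import time
import boto3

ec2 = boto3.client("ec2")

# hypothetical sizing table: (max vCPUs, max RAM GB, instance type)
SIZES = [(2, 8, "m5.large"), (4, 16, "m5.xlarge"), (8, 32, "m5.2xlarge")]

def pick_instance_type(vcpus, ram_gb):
    # choose the smallest type that satisfies the backup metadata
    for cpus, ram, itype in SIZES:
        if vcpus <= cpus and ram_gb <= ram:
            return itype
    return "m5.4xlarge"   # fallback for larger VMs

def restore_vm(bucket, vmdk_key, vcpus, ram_gb):
    # translate the VMware disk image into an AMI via AWS VM Import
    task = ec2.import_image(DiskContainers=[{
        "Format": "vmdk",
        "UserBucket": {"S3Bucket": bucket, "S3Key": vmdk_key},
    }])
    while True:   # poll until the import task yields an AMI
        status = ec2.describe_import_image_tasks(
            ImportTaskIds=[task["ImportTaskId"]])["ImportImageTasks"][0]
        if status.get("ImageId"):
            break
        time.sleep(30)
    ec2.run_instances(ImageId=status["ImageId"],
                      InstanceType=pick_instance_type(vcpus, ram_gb),
                      MinCount=1, MaxCount=1)
```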

For simplicity, the user can control almost all of the required DDCDR functionality from the Avamar GUI alone. But in case of a site outage, the user can initiate the application DR from a portal supplied by the Cloud DR server utility.

There you have it: simplified, easy-to-use (AWS) Cloud DR for your VM applications, all through Dell EMC Avamar, Data Domain storage and DDCDR. At the moment it only works with the AWS cloud, but it’s likely to be available for other public clouds in the near future.

~~~~

There was much more infrastructure news at Dell EMC World 2017. I’ll discuss more details on their new storage offerings in my upcoming Storage Intelligence newsletter, due out at the end of this month. If you’re interested in receiving your own copy of my newsletter, check out the signup button in the upper right of this page.

Comments?

[Edits were made for readability and technical accuracy after this post was published. Ed]

Existential threats

Not sure why but lately I have been hearing a lot about existential events. These are events that threaten the existence of humanity itself.

Massive Solar Storm

A couple of days ago I read about the Carrington Event, a massive geomagnetic solar storm in 1859. Apparently it wreaked havoc with the communications infrastructure of the time (telegraphs). Researchers have been able to identify other similar events in Earth’s history by analyzing ice cores from Greenland, which indicate that events of this magnitude occur roughly once every 500 years and that smaller events typically occur multiple times per century.

It’s unclear to me what a solar storm of the magnitude of the Carrington Event would do to the world as we know it today, but we are much more dependent on electronic communications, radio, electric power, etc. If such an event were to take out 50% of our electro-magnetic infrastructure, such as frying power transformers, radio transceivers, magnetic storage/motors/turbines, etc., civilization as we know it would be brought back to the mid-1800s but with a 21st century population.

This would last until we could rebuild all the lost infrastructure, at tremendous cost. During this time we would be dependent on animal-human-water power, paper-optical based communications/storage, and animal-wind transport.

It appears that any optical based communication/computer systems would remain intact but powering them would be problematic without working transformers and generators.

One article (couldn’t locate this) stated that the odds of another Carrington Event happening by 2022 are 12%. But the ice core research seems to indicate that they should be higher than this. By my reckoning, it’s been ~155 years since the last event, which means we are ~1/3rd of the way through the next 500 years, so I would expect the probability of a similar event happening to be ~1/3 at this point, rising slightly every year until it happens again.
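
To put numbers on that reckoning, here’s a quick back-of-the-envelope comparison (my own arithmetic, assuming a 1-in-500-year average recurrence): the naive “fraction of the interval elapsed” figure versus a memoryless model where every year carries the same small hazard.

```python
# Back-of-the-envelope comparison of two readings of "once every 500 years".
# Assumes the 1859 Carrington event, a 500-year average recurrence, and (for
# the second model) statistically independent years -- all assumptions.
years_since = 2014 - 1859          # ~155 years at the time of writing
mean_interval = 500.0

# Naive reckoning from the post: fraction of the average interval elapsed.
naive = years_since / mean_interval

# Memoryless model: probability of at least one event in the elapsed years.
annual_rate = 1.0 / mean_interval
memoryless = 1.0 - (1.0 - annual_rate) ** years_since

print(f"naive fraction elapsed: {naive:.0%}")       # ~31%
print(f"memoryless estimate:    {memoryless:.0%}")  # ~27%
```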

Superintelligence

I picked up a book called Superintelligence: Paths, Dangers, Strategies by Nick Bostrom last week and started reading it last night. It’s about the dangers of AI gaining the ability to improve itself and thereafter becoming not just equivalent to human-level machine intelligence (HLMI) but greatly exceeding HLMI at a super-HLMI level (superintelligent). This means some superintelligent entity that would have more intelligence than our entire current population of humans, by many orders of magnitude.

Bostrom discusses the takeoff processes that would lead to superintelligence and some of the ways we could hope to control it. But his belief is that trying to install any of these controls after it has reached HLMI would be fruitless.

I haven’t finished the book, but what I have read so far has certainly scared me.

Bostrom presents three scenarios for a superintelligence takeoff: slow takeoff, fast takeoff and medium takeoff. He believes that in a slow takeoff scenario there may be many opportunities to control the emerging superintelligence. In a moderate or medium takeoff, we would know that something is wrong but would have only limited opportunity to control it. In a fast takeoff (literally 18 months from HLMI to superintelligence in one scenario Bostrom presents), the likelihood of controlling it after it starts is non-existent.

The latter half of Bostrom’s book discusses potential control mechanisms and other ways to moderate the impacts of superintelligence. So far I don’t see much hope for mankind in the controls he has proposed. But I am only halfway through the book and hope to see more substantial mechanisms in the second half.

In the end, any superintelligence could substantially alter the resources of the world, and the impact this would have on humanity is essentially unpredictable. But by looking at recent history, one can see how other species have fared as humanity has altered the resources of the earth. Humanity’s rise has led to massive die-offs of any species that happened to lie in the way of human progress.

The first part of Bostrom’s book discusses some estimates as to when the world will reach AI with HLMI. Most experts believe that we will see HLMI with a 90% probability by the year 2075 and a 50% probability by the year 2050. As for the duration of the takeoff to superintelligence, the expert opinions are mixed, and he believes that they greatly underestimate the speed of takeoff.

Humanity’s risks

The search for extra-terrestrial intelligence has so far found nothing. One of the parameters in the odds of a successful search is the number of inhabitable planets in the universe. But another parameter is the ability of a technological civilization to survive long enough to be noticed – the likelihood of a civilization surviving any existential risk that comes up.

Superintelligence and massive solar storms represent just two such risks but there are a multitude of others that can be identified today, and tomorrow’s technological advances will no doubt give rise to more.

Existential risks like these are ever-present and appear to be growing as our technological prowess grows. My only problem is that today the study of existential risks seems, at best, ad hoc and, at worst, outright disregarded.

I believe the best policy is to recognize known existential risks, and have some intelligent debate on how probable they are and how we could potentially check them. There really needs to be some systematic study of existential risks around the world, bringing academics and technologists together to understand and to mitigate them. The threats to humanity are real; we can continue to ignore them, study the few that gain human interest, or actively seek out and mitigate all of them we can.

Comments?

Photo Credit(s): C3-class Solar Flare Erupts on Sept. 8, 2010 [Detail] by NASA Goddard’s space flight center photo stream

DR preparedness in real time

As many may have seen, there has been serious flooding throughout the Front Range of Colorado. At the moment the flooding hasn’t impacted our homes or offices, but there’s nothing like a recent, nearby disaster to focus one’s thoughts on how prepared we are to handle a similar situation.

What we did when serious flooding became a possibility

As I thought about what I should be doing last night, with flooding in nearby counties, I moved my computers, printer, and some other stuff from the basement office to an upstairs area in case of basement flooding. I also moved my “Time Machine” backup disk upstairs, which holds the iMac’s backups (hourly for the last 24 hrs, daily for the past month, and weekly backups for as many weeks as can be held on a 2TB disk drive). I have often depended on Time Machine backups to recover files I inadvertently overwrote, so it’s good to have around.

I also charged up all our mobiles, laptops & iPads and made sure software and email were as up-to-date as possible. I packed my laptop & iPad, my most recent monthly and weekly backups, and some other recent work printouts into my backpack and left it upstairs, ready to go at a moment’s notice.

The next day post-mortem

This morning, with less panic and more time to think, I realized the printer was probably the least of my concerns, but the internet and telecommunications gear (phones & headset) should probably have been moved upstairs as well.

Although we have multiple mobile phones, (AT&T) reception is poor in the office and home. It would have been pretty difficult to conduct business here with the mobile alone if we needed to.  I use a cable provider for business phones but also have a land line for our home. So I (technically) have triple backup for telecom, although to use the mobile effectively, we would have had to leave the office.

Internet access

Internet is another matter though. We also use cable for internet and the modem that supplies office internet connects to a cable close to where it enters the house/office. All this is downstairs, in the basement. The modem is powered using basement plugs (although it does have a battery as well) and there’s a hard ethernet link between the cable modem and an Airport Express base station (also downstairs) which provides WiFi to the house and LAN for the house iMacs/PCs.

Considering what I could do to make this a bit more flood tolerant, I should probably have moved the cable modem and Airport Express upstairs, connecting them to the TV cable and powering them from upstairs outlets. The Airport Express WiFi would have provided sufficient Internet access to work, and with the modem upstairs, connecting an ethernet cable to a desktop would also have been a possibility.

I do have the hotspot/tethering option for my mobile phone but, as discussed above, reception is not that great. As such, it may not have sufficed for the household, let alone a work computer.

Internet is available at our local library and at many nearby coffee shops.  So, worst case was to take my laptop and head to a coffee shop/library that still had power/WiFi and camp out all day, for potentially multiple days.

I could probably do better with Internet access. With the WiFi and tethering capabilities available on cellular iPads these days, if I purchased one for the office with a suitable data plan, I could use the iPad as another hotspot, independent of my mobile. Of course, I would probably go with a different carrier so that reception issues could be minimized (hoping that where one [AT&T] is poor the other [Verizon?] carrier would be fine).

Data availability

Data access outside of the Time Machine disk and the various hard drive backups was another item I considered this morning. I have monthly hard-drive backups, normally kept in a safety deposit box at a local bank.

The bank is in the same flood/fire plain that I am in, but they tell me it’s floodproof, fireproof and earthquake proof. Call me paranoid, but I didn’t see any fire suppression equipment visible in the vault. The vault door, although a large mass of steel and other metals, didn’t seem to have waterproof seals surrounding it. As for earthquakes, concrete walls and a steel door don’t make it earthquake proof. But then again, I am paranoid; it would probably survive much better than anything in our home/office.

Also, I keep weekly encrypted backups in the house, alternating between two hard disk drives and keep the most recent upstairs. So between the weeklies, monthlies, and Time Machine I have three distinct tiers of data backups. Of course, the latest monthly was sitting in the house waiting to be moved to the safety deposit box – not good news.

I also have a (manual) copy of work data on the laptop, current to the last hard drive backup (also at home). So of my three tiers of backups, every single current copy was in the home/office.

I could do better. Looking at Dropbox and Box, at about $100/year/100GB for Dropbox (Box is ~40% cheaper), I could keep our important work and home data on cloud storage and have access to it from any Internet-accessible location (including from mobile devices) with the proper security credentials. Not sure how long it would take to seed this backup; we have about 20GB of family and work office documents and probably another 120GB or so of photos that I would want to keep around, or about 140GB of info. This could provide 5-way redundancy, with Time Machine, weekly hard drive and monthly hard drive backups, now Box/Dropbox as an (office and home) fourth backup, and the laptop being a fifth (office only) backup. Seems like cheap insurance at the moment.

The other thing that Box/Dropbox would do for me is provide a sync service with my laptop, so that files changed on either device would sync to the cloud and then be copied to all other devices. This would replace my current fourth tier of (work) backups with a more current cloud backup. It would also eliminate the manual copy process performed during every backup to keep my laptop up to date.

I have some data security concerns with using cloud storage, but according to Dropbox they use Amazon S3 for their storage and AES-256 data encryption so that others can’t read your data. They use SSL to transfer data to the cloud.

Where all the keys are held is another matter, and with all the hullabaloo over the NSA, anything on the internet can be provided to the gov’t with a proper request. But the same could be said for my home computer and all my backups.
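
One hedge against the key-custody question is to encrypt files locally before they ever land in the sync folder, so the provider only ever holds ciphertext. A minimal sketch, assuming the Python cryptography package and a key file that never leaves the house (paths and filenames are illustrative):

```python
# Minimal client-side encryption before dropping a file into a sync folder.
# The key file stays local; Dropbox (or any provider) only ever sees ciphertext.
from pathlib import Path
from cryptography.fernet import Fernet

KEY_FILE = Path.home() / ".backup_key"            # never synced to the cloud
SYNC_DIR = Path.home() / "Dropbox" / "encrypted"  # provider-synced folder

def load_or_create_key():
    if KEY_FILE.exists():
        return KEY_FILE.read_bytes()
    key = Fernet.generate_key()
    KEY_FILE.write_bytes(key)
    return key

def encrypt_into_sync_folder(src):
    """Encrypt src locally and place the ciphertext where the sync client sees it."""
    cipher = Fernet(load_or_create_key())
    SYNC_DIR.mkdir(parents=True, exist_ok=True)
    dest = SYNC_DIR / (Path(src).name + ".enc")
    dest.write_bytes(cipher.encrypt(Path(src).read_bytes()))
    return dest
```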

There are plenty of other solutions here, Google drive and Microsoft’s SkyDrive to name just a few. But from what I have heard Dropbox is best, especially if you have a large number of files.

The major downsides (besides the cost) are that when you power up your system it can take longer while Dropbox scans for out-of-sync files, and the time it takes to seed your Dropbox account. This is all dependent on your internet access, but according to a trusted source Dropbox seeding starts with the smallest files and works up to the larger ones over time. So there is a good likelihood your office files (outside of PPT) might make it to the cloud sooner than your larger media, databases, and other large files. I figure we have ~140GB to be copied to the cloud. I promise to update the post with the time it took to copy this data to the cloud.
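
For what it’s worth, the seeding time is easy to bound once you assume an upload speed. A quick estimate for ~140GB (my arithmetic; the upload rates are assumptions, and protocol overhead would stretch these further):

```python
# Rough seeding-time estimate for ~140GB of data at assumed upload speeds.
data_gb = 140
for mbps in (1, 5, 10, 25):                # upload speed in megabits/sec
    seconds = data_gb * 8 * 1000 / mbps    # GB -> gigabits -> megabits -> sec
    print(f"{mbps:>2} Mbps: ~{seconds / 86400:.1f} days")
# 1 Mbps ~13 days, 5 Mbps ~2.6 days, 10 Mbps ~1.3 days, 25 Mbps ~0.5 days
```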

Power and other emergency preparedness

Power is yet another concern. I have not taken the leap to purchase a generator for the home/office, but I now think this unwise. Although power has gotten a lot more reliable in our home/office over the years, there’s still a real possibility of a disruption. The areas with serious flooding all around us are having power blackouts this morning, and there’s no telling when their power might come back on. So a generator purchase is definitely in my future.

Listening to the news today, there was talk of emergency personnel notifying people that they had 30 minutes to evacuate their houses. So, next time there is a flood/fire warning in the area I think I will take some time to pack up more than my laptop, perhaps some other stuff like clothing and medicines that will help us survive and continue to work.

Food and water are also serious considerations. In Florida, for hurricane preparedness, they suggest filling up your bathtubs with water or having 1 gallon of water per person per day set aside in case of emergency – didn’t do this last night but should have. Florida’s family emergency preparedness plan also suggests enough water for 5-7 days.

I think we have enough dry food around the house to sustain us for a number of days (maybe not 7 though). If we consider what’s in the freezer and fridge, that probably goes up to a couple of weeks or so, assuming we can keep it cold.

Cooking food is another concern. We have propane and camp stoves, which would provide a rudimentary ability to cook outdoors if necessary, as well as an old charcoal grill and a bag of charcoal in our car-camping stuff. That should suffice for a couple of days but probably not a week.

As for important documents, they are in that safety deposit box in our flood plain (may need to rethink that). Wills and other stuff are also in the hands of appropriate family members and lawyers, so that’s taken care of.

Another item on their list of things to have for a hurricane is flashlights and fresh batteries. These are all available in our camping stuff but would be difficult to access at a moment’s notice. So a couple of rechargeable flashlights that were easier to access might be a reasonable investment. The Florida plan further suggests you have a battery-operated radio. I happen to have an old one upstairs with the batteries removed – just need to make sure to have some fresh batteries around someplace.

They don’t mention gassing up your car. But we do that as a matter of course anytime harsh weather is forecast.

I think this is about it for now. There’s probably other stuff I didn’t think of. I have a few fresh fire extinguishers around the home/office but no pumps. May need to add that to the list…

~~~~

Comments?

Photo Credits: September 12 [2013], around 4:30pm [Water in Smiley Creek – Boulder Flood]

Enterprise file synch

Strange Clouds by michaelroper (cc) (from Flickr)

Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services, each having a slightly different version of the requirements. The one that comes most readily to mind was Egnyte, which supported file synchronization across a hybrid cloud (public cloud and network storage) and which we discussed in our Fall SNWUSA wrap-up post last year.

The problem with BYOD

With bring your own device (BYOD), corporate end users are quickly abandoning any pretense of IT control and turning to consumer-class file synchronization services to help sync files across the desktop, laptop and mobile devices they haul around. But the problem with these solutions, such as DropBox, Box, OxygenCloud and others, is that they are really outside of IT’s control.

Which is why there’s a real need today for enterprise-class file synchronization solutions that exhibit the ease of use and setup available from consumer file sync systems but offer IT security, compliance and control over the data that’s being moved into the cloud and across corporate and end user devices.

EMC Syncplicity and EMC on premises storage

Last week EMC announced an enterprise version of their recently acquired Syncplicity software that supports on-premises storage on Isilon or on Atmos, EMC’s own cloud storage offering.

In previous versions of Syncplicity, storage was based in the cloud, using Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on-premises storage to host user file synchronization services that can span mobile devices, laptops and end user desktops.

New Syncplicity users must download desktop client software to support file synchronization, or mobile apps for mobile device synchronization. After that it’s a simple matter of identifying which, if any, directories and/or files are to be synchronized with the cloud and/or shared with others.
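
Under the covers, every sync client is doing some variant of the same loop: fingerprint the selected files, notice what changed, and push only the changes. A toy sketch of that idea (my own illustration using S3 as the backend, not Syncplicity’s client or protocol; the bucket name is an assumption):

```python
# Toy change-detection loop: hash the selected files, upload only what changed.
# Illustration only -- not Syncplicity's client; bucket name is hypothetical.
import hashlib
from pathlib import Path
import boto3

s3 = boto3.client("s3")
BUCKET = "example-sync-bucket"
state = {}   # path -> last-seen content hash (a real client persists this)

def sync_directory(root):
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if state.get(str(path)) != digest:          # new or modified file
            s3.upload_file(str(path), BUCKET, str(path.relative_to(root)))
            state[str(path)] = digest
```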

However, with the Business (read: enterprise) edition one also gets the Security and Compliance console, which supports access controls to define users and devices that can synchronize or share data, enforcement of data retention policies, remote wipe of corporate data, and native support for single sign-on services. In addition, one also gets centralized user and group management services to grant, change, or revoke user and group access to data. One also obtains enterprise security with AES-256 data-at-rest encryption, separate key manager data centers and data storage data centers, quadruple replication of data for high disaster fault tolerance, and SAS 70 Type II compliant data centers.

If the client wants to use on-premises storage, they would also need to deploy a virtual appliance (VM) somewhere in the data center to act as the gateway for file synchronization service requests. The file sync server would also presumably need access to the on-premises storage, and it’s unclear whether the virtual appliance is in-band or out-of-band (see the discussion of Egnyte’s solution options below).

Egnyte’s solution

Egnyte comes as a software-only solution, building a file server in the cloud for end user storage. It also includes an Egnyte app for mobile hardware and the ever-present web file browser. Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.

One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on-premises storage, but you cannot have just on-premises storage. Syncplicity only offers one or the other for file data, i.e., file synchronization data can be either in the cloud or on local on-premises storage but cannot be in both locations.

The other major difference is that Egnyte operates with just about anybody’s NAS storage, such as EMC, IBM, and HDS, for the on-premises file storage. It operates as an in-band, software appliance solution that traps file activity going to your on-premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.

But for NetApp storage only (today), they utilize ONTAP APIs to offer an out-of-band file synchronization solution. This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in its current directory locations.

Egnyte promises enterprise-class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers. There’s no mention of separate key security in their literature.

As for cloud backend storage, Egnyte has its own public cloud or supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp StorageGRID and HP Public Cloud.

There’s more to Egnyte’s solution than just file synchronization and sharing, but that’s the subject of today’s post. Perhaps we can cover the rest at more length in a future post if there’s interest.

File synchronization, cloud storage’s killer app?

The nice thing about these capabilities is that now IT staff can regain control over what is and isn’t synced and shared across multiple devices. Up until now all this was happening outside the data center and external to IT control.

From Egnyte’s perspective, they are seeing more and more enterprises wanting data both on premises, for performance and compliance, and in cloud storage, for ubiquitous access. They feel it’s both a shareability demand between an enterprise’s far-flung team members and potentially client/customer personnel, as well as a need to access, edit and propagate silo’d corporate information using the new mobile devices that everyone has these days.

In any event, enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage. Up to this point cloud gateways made sense for SME backup or disaster recovery solutions but, IMO, didn’t really take off beyond that space. But if you can package a robust and secure file sharing and synchronization solution around cloud storage, then you just might have something that enterprise customers are clamoring for.

~~~~

Comments?

Oracle (finally) releases StorageTek VSM6

[Full disclosure: I helped develop the underlying hardware for VSM 1-3 and also way back, worked on HSC for StorageTek libraries.]

Virtual Storage Manager System 6 (VSM6) is here. I’m not exactly sure when VSM5 or VSM5E were released, but it seems like an awfully long time ago in Internet years. The new VSM6 migrates the platform to Solaris software and hardware while expanding capacity and improving performance.

What’s VSM?

Oracle StorageTek VSM is a virtual tape system for mainframe, System z environments. It provides a multi-tiered storage system which includes both physical disk and (optional) tape storage for the long term, big data requirements of z/OS applications.

VSM6 emulates up to 256 virtual IBM tape transports but actually moves data to and from VSM Virtual Tape Storage Subsystem (VTSS) disk storage and backend real tape transports housed in automated tape libraries.  As VSM data ages, it can be migrated out to physical tape such as a StorageTek SL8500 Modular [Tape] Library system that is attached behind the VSM6 VTSS or system controller.

VSM6 offers a number of replication solutions for DR to keep data in multiple sites in synch and to copy data to offsite locations.  In addition, real tape channel extension can be used to extend the VSM storage to span onsite and offsite repositories.

One can cluster together up to 256 VSM VTSSs into a tapeplex, which is then managed under a single pane of glass as one large data repository using HSC software.

What’s new with VSM6?

The new VSM6 hardware increases volatile cache to 128GB, up from 32GB (in VSM5). Non-volatile cache goes up as well, now supporting up to ~440MB, up from 256MB in the previous version. Power, cooling and weight all seem to have gone up as well (the wrong direction??) vis-a-vis VSM5.

The new VSM6 removes the ESCON option of previous generations and moves to 8 FICON and 8 GbE Virtual Library Extension (VLE) links. FICON channels are used for both host access (frontend) and real tape drive access (backend). VLE was introduced in VSM5 and offers a ZFS-based commodity disk tier behind the VSM VTSS for storing data that requires longer residency on disk. Also, VSM supports a tapeless or disk-only solution for high performance requirements.

System capacity moves from 90TB (gosh, that was a while ago) to now support up to 1.2PB of data. I believe much of this comes from supporting the new T10000C tape cartridge and drive (5TB uncompressed). With the ability to cluster more VSM systems into the tapeplex, system capacity can now reach over 300PB.
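
That 300PB figure looks like straight multiplication of the per-VTSS capacity by the maximum tapeplex size; a quick sanity check (my arithmetic, not Oracle’s):

```python
# Sanity check on the quoted capacity figures (my arithmetic, not Oracle's).
per_vtss_pb = 1.2            # usable capacity per VSM6 VTSS, per the announcement
max_vtss_per_tapeplex = 256  # VTSSs that can be clustered under one HSC tapeplex
print(per_vtss_pb * max_vtss_per_tapeplex, "PB")   # 307.2 PB -> "over 300PB"
```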

Somewhere along the way VSM started supporting triple redundancy for the VTSS disk storage, which provides better availability than RAID 6. Not sure why they thought this was important, but it does deal with increasing disk failures.

Oracle stated that VSM6 supports up to 1.5GB/sec of throughput. Presumably this is landing data on disk or transferring data to backend tape, but not both. There doesn’t appear to be any standard benchmarking for these sorts of systems, so I will take their word for it.

Why would anyone want one?

Well, it turns out plenty of mainframe systems use tape for a number of things, such as data backup, HSM, and big data batch applications. Once you get past the sunk costs for tape transports, automation, cartridges and VSMs, VSM storage can be a pretty competitive data storage solution for the mainframe environment.

The fact that most mainframe environments grew up with tape and have long ago invested in transports, automation and new cartridges probably makes VSM6 an even better buy. But tape is also making a comeback in open systems, with LTO-5 and now LTO-6 coming out, and with Oracle’s 5TB T10000C cartridge and IBM’s 4TB 3592 JC cartridge.

Not to mention Linear Tape File System (LTFS) as a new tape format that provides a file system for tape data which has brought renewed interest in all sorts of tape storage applications.

Competition not standing still

EMC introduced their Disk Library for mainframe 6000 (DLm6000) product, which supports two different backends to deal with the diversity of tape use in the mainframe environment. Moreover, IBM has continuously enhanced their TS7700 Virtual Tape Server, but I would have to say it doesn’t come close to these capacities.

Lately, when I have talked with long-time StorageTek mainframe tape customers, they have all said the same thing: when is VSM6 coming out, and when will Oracle get their act in gear and start supporting us again? Hopefully this signals a new emphasis on this market. Although who is losing and who is winning in the mainframe tape market is the subject of much debate, there is no doubt that the lack of any update to VSM has hurt Oracle’s StorageTek tape business.

Something tells me that Oracle may have fixed this problem.  We hope that we start to see some more timely VSM enhancements in the future, for their sake and especially for their customers.

~~~~

Comments?

~~~~

Image credit: Interior of StorageTek tape library at NERSC (2) by Derrick Coetzee

VMworld first thoughts kickoff session

[Edited for readability. RLL] The drummer band was great at the start, but we couldn’t tell if it was real or lip-synced. It turned out that each of the big VMWORLD letters had a digital drum pad on it, which meant it was live, in real time.

Paul got a standing ovation as he left the stage, introducing Pat, the new CEO. With Paul on the stage, there was much discussion of how far VMware has come over the last four years. IDC stats probably say it better than most: in 2008 about 25% of Intel x86 apps were virtualized and in 2012 it’s about 60%, and Gartner says that VMware has about 80% of that activity.

Pat got up on stage and it was like nothing had changed. VMware is still going down the path they believe is best for the world: a virtual data center that spans private, on-premises equipment and external cloud service providers’ equipment.

There was much ink on the software-defined data center, which takes the vSphere world view and incorporates networking, more storage, and more infrastructure into the already present virtualized management paradigm.

It’s a bit murky as to what’s changed, what’s acquired functionality and what’s new development but suffice it to say that VMware has been busy once again this year.

A single “monster VM” (it has its own Facebook page) now supports up to 64 vCPUs, 1TB of RAM, and can sustain more than a million IOPS. It seems that this should be enough for most mission critical apps out there today. There was no statement on the latency of those IOPS, but with a million IOs a second and 64 vCPUs we are probably talking flash somewhere in the storage hierarchy.

Pat mentioned that the vRAM concept is now officially dead. The pricing model is now based on physical CPUs and sockets; it no longer has a VM or vRAM component to it. This seemed to get lots of applause.

There are now so many components to the vCloud Suite that it’s almost hard to keep track of them all: vCloud Director, vCloud Orchestrator, vFabric Application Director, vCenter Operations Manager, and of course vSphere. That’s not counting relatively recent acquisitions DynamicOps, a cloud dashboard, and Nicira SDN services, and I am probably missing some of them.

In addition to all that, VMware has been working on Serengeti, which is a layer added to vSphere to virtualize Hadoop clusters. In the demo they spun up and down a Hadoop cluster with MapReduce operating to process log files. (I want one of these for my home office environment.)

They showed another demo of the vCloud Suite in action, spinning up a cloud data center and deploying applications to it in real time. Literally, it took ~5 minutes from starting it up until they were deploying applications to it. It was a bit hard to follow, as it went deep into the WAN-like networking configuration of load balancing, firewalls and other edge security and workload characteristics, but it all seemed pretty straightforward and configured an actual cloud in minutes.

I missed the last part about Socialcast, but apparently it builds a social network around VMs? [Need to listen better next time]

More to follow…

NetApp Analyst Summit Customer Panel – how to survive a category 5 tornado

NetApp had three of their customer innovation winners come up on stage for a panel discussion, with Dave Hitz moderating. All three had interesting deployments of NetApp storage systems:

  • Andrew Henderson from ING DIRECT talked about their need to deploy copies of the bank’s IT environment for test, development, optimization and security testing. This process took 12 weeks to accomplish the first time they tried it and only created a single copy. They wanted to speed this up and be able to deploy 10 or more copies if necessary. Andrew looked at Microsoft Hyper-V, System Center and NetApp FlexClones and transformed this process so that it now generates a copy of the entire bank’s IT services in under 10 minutes. And since the new capabilities have been in place, they have created over 400 copies of the bank (he called these bank-in-a-box) for various purposes.
  • Teresa Wahlert from the Iowa Workforce Development Agency was up next and talked about their VDI implementation. Iowa cut their budget, which forced them to shut down a number of physical offices. But with VDI, VMware and NetApp storage, Workforce was able to disperse their services to over 3000 locations, now in prisons, libraries, and other venues where they had no presence before. They put out a general call for all the tired, dying PCs in Iowa government and used these to host VDI services. Now Workforce services are available 7X24 at these locations, pretty amazing for government work. Apparently they had tried VDI before and their previous storage couldn’t handle it. They moved to NetApp with FlashCache and it worked just fine. That’s when they rolled out VDI services to their customers and businesses. With NetApp they were able to implement VDI, reduce storage costs (via deduplication and other storage efficiency features) and increase department services.
  • Jeff Bell at Mercy Healthcare talked about the difficulties of rolling out electronic health records (EHR) and their challenges integrating ~30 hospitals and ~400 medical clinics. They started with EHR fairly early, 2006-2007, well before the latest governmental push. He mentioned Joplin, MO and last year’s category 5 tornado, which about wiped out their hospital there. He said that within 2 hours after the disaster, Mercy Healthcare was printing out the EHRs for the 183 patients present in the hospital at the time who had to be moved to other care facilities. The promise of EHR is that the information travels with the patient, can be recovered in the event of a disaster and is immediately available. It seems that at least at Mercy Healthcare, EHR is living up to its promise. In addition, they just built a new data center as they were running out of space, power and cooling at the old one. They installed new NetApp storage there and for the first few months had to run heaters to keep the data center livable because the new power/cooling load was so far below what they had experienced previously. Looking back on what they had accomplished, Jeff was not so sure they would build a new data center again. With new cloud offerings coming out and the reduced power/cooling and increased density of NetApp storage, they could almost get by without another data center at all.

That’s about it from the customer session.

NetApp execs spent the rest of the day on innovation, mostly at NetApp but also in the IT industry in general.

There was lots of discussion on the new release of Data ONTAP 8.1.1 with its latest cluster-mode features. NetApp positioned it as fulfilling the transition to data/storage as an infrastructure that IT has been pushing for the last decade or so, following in the grand tradition of what IBM did for computing infrastructure with the 360 and what Cisco and others did for networking infrastructure in the mid-80s.

Comments?

VMware disaster recovery

Thunderstorms over Alexandria, VA by mehul.antani (cc) (from Flickr)

I did an article a while ago for TechTarget on virtual (machine) disaster recovery, which discussed what was then the latest version of VMware Site Recovery Manager (SRM) v1.0 and some of its capabilities.

Well, it’s been a couple of years since that came out and I thought it would be an appropriate time to discuss some updates to that product and other facilities that bear on virtual machine disaster recovery today.

SRM to the rescue

Recall that VMware’s SRM is essentially a run-book automation tool for system failover. Using SRM, an administrator defines the physical and logical mapping between a primary site (protected site in SRM parlance) configuration of virtual machines, networking, and data stores and a secondary site (recovery site to SRM) configuration.

Once this mapping is complete, the administrator then creates recovery scripts (recovery plans to SRM) which take the recovery site, in a step-by-step fashion, from an “inactive” to an “active” state. With the recovery scripts in hand, data replication can then be activated and monitoring (using storage replication adaptors, SRAs to SRM) can begin. Once all that is ready and operating, SRM can provide one-button failover to the recovery site.
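
To make the run-book idea concrete, here’s the bare-bones shape of what a recovery plan automates: an ordered list of steps executed at failover time, halting on the first failure. This is a generic illustration of the sequencing, not SRM’s API; every step name here is a placeholder.

```python
# Bare-bones shape of a recovery plan: an ordered run book executed at failover.
# Generic placeholders only -- SRM's recovery plans, SRAs and vCenter
# integration do the equivalent work in a real deployment.
RECOVERY_PLAN = [
    "stop_replication",         # quiesce/break mirroring toward the recovery site
    "promote_replica_luns",     # make the replicated datastores writable
    "rescan_datastores",        # let recovery-site hosts discover the storage
    "reconfigure_networks",     # remap VM port groups to recovery-site networks
    "power_on_priority_1_vms",  # databases and core services first
    "power_on_priority_2_vms",  # application and web tiers next
    "run_post_failover_checks", # verify services respond before declaring success
]

def execute_failover(actions):
    """Run each step in order, stopping on the first failure.

    `actions` maps step names to callables returning True/False (hypothetical).
    """
    for step in RECOVERY_PLAN:
        ok = actions[step]()
        print(("OK   " if ok else "FAIL ") + step)
        if not ok:
            raise RuntimeError(f"failover halted at step: {step}")
```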

SRM v4.1 supports the following:

  • NFS data stores can now be protected as well as iSCSI and FC LUN data stores.  Recall that a VMFS  (essentially a virtual machine device or drive letter) or a VM data store can be hosted on LUNs or as NFS files.  NFS data stores have recently become more popular with the proliferation of virtual machines under vSphere 4.1.
  • Raw device mode (RDM) LUNs can now be protected. Recall that RDM is another way to access devices directly for performance-sensitive VMs, eliminating the need to use a data store and the associated hypervisor IO overhead.
  • Shared recovery sites are now supported. As such, one recovery site can now support multiple protected sites.  In this way a single secondary site can support failover from multiple primary sites.
  • Role based access security is now supported for recovery scripts and other SRM administration activities. In this way fine grained security roles can be defined that allow protection over unauthorized use of SRM capabilities.
  • Recovery site alerting is now supported. SRM now fully monitors recovery site activity and can report on and alert operations staff when problems occur which may impact failover to the recovery site.
  • SRM test and actual failover can now be initiated and monitored directly from vCenter Server. This provides the vCenter administrator significant control over SRM activities.
  • SRM automated testing can now use storage snapshots. One advantage of SRM is the ability to automate DR testing, which can be done onsite using local equipment. Snapshots eliminate the need for storage replication in local DR tests.

There were many other minor enhancements to SRM since v1.0 but these seem the major ones to me.

The only things lacking seem to be some form of automated failback and three-way failover. I’ll talk about three-way failover later.

But without automated failback, the site administrator must reconfigure the two sites and reverse the designation of protected and recovery sites, re-mirror the data in the opposite direction and recreate recovery scripts to automate bringing the primary site back up.

However, failback is likely not to be as time sensitive as failover and could very well be a scheduled activity, taking place over a much longer time period. This can, of course, all be handled by SRM (once the sites have been reconfigured) or be done in a more manual fashion.

Other DR capabilities

At last year’s EMC World, VPLEX was announced, which provided for a federation of data centers or, as I called it at the time, Data-at-a-Distance (DaaD). DaaD together with VMware’s VMotion could provide a level of disaster avoidance (see my post on VPLEX surfaces at EMC World) previously unattainable.

No doubt cluster services from Microsoft Cluster Server (MSCS), Symantec Veritas Cluster Services (VCS) and others have also been updated. In some (mainframe) cluster services, N-way or cascaded failover is starting to be supported. For example, in a 3-way DR scenario a primary site is synchronously replicated to a secondary site, which is asynchronously replicated to a third site. If the region where the primary and secondary sites reside is impacted by a disaster, the tertiary site can be brought online. Such capabilities are not yet available for virtual machine DR, but it’s only a matter of time.
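
The cascaded topology is easy to picture in a few lines: the primary mirrors writes synchronously to the secondary, which trickles them asynchronously to the tertiary, and the (possibly slightly behind) tertiary is what gets brought online if the whole region is lost. A toy model of my own, not any vendor’s replication engine:

```python
# Toy model of cascaded (3-way) replication: writes are mirrored synchronously
# to the secondary, which forwards them asynchronously to the tertiary.
# Purely illustrative -- not any vendor's replication implementation.
from collections import deque

primary, secondary, tertiary = {}, {}, {}
async_queue = deque()                 # secondary -> tertiary replication lag

def write(key, value):
    primary[key] = value
    secondary[key] = value            # synchronous mirror: ack after both copies
    async_queue.append((key, value))  # tertiary copy happens later

def drain_async_replication():
    while async_queue:                # runs periodically in the background
        key, value = async_queue.popleft()
        tertiary[key] = value

def failover_site():
    # if the primary/secondary region is lost, the tertiary (which may be
    # slightly behind) is what gets brought online
    return tertiary
```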

—–

Disaster recovery technologies are not standing still, and VMware SRM is no exception. I am sure that a couple of years from now SRM will be even more capable, and other storage vendors will provide DaaD capabilities to rival VPLEX. What the cluster services folks will be doing by that time I can’t even imagine.

Comments?