NetApp Analyst Summit Customer Panel – how to survive a category 5 tornado

NetApp had three of their customer innovation winners come up on stage for a panel discussion with Dave Hitz moderating the discussion. All three had interesting deployments of NetApp storage systems:

  • Andrew Henderson from ING DIRECT talked about their need to deploy copies of the bank’s IT environment for test, development, optimization and security testing. This process took 12 weeks the first time they tried and created only a single copy. They wanted to speed this up and be able to deploy 10 or more copies if necessary. Andrew looked at Microsoft Hyper-V, System Center and NetApp FlexClones and transformed this process so that it now generates a copy of the entire bank’s IT services in under 10 minutes. Since the new capabilities have been in place they have created over 400 copies of the bank (he called these bank-in-a-box) for various purposes.
  • Teresa Wahlert from Iowa Workforce Development Agency was up next and talked about their VDI implementation. Iowa cut their budget, which forced them to shut down a number of physical offices. But with VDI, VMware and NetApp storage, Workforce was able to disperse its services to over 3000 locations, now in prisons, libraries, and other venues where they had no presence before. They put out a general call for all the tired, dying PCs in Iowa government and used these to host VDI services. Now Workforce services are up 7x24 at these locations, pretty amazing for government work. Apparently they had tried VDI before and their previous storage couldn’t handle it. They moved to NetApp with FlashCache and it worked just fine. That’s when they rolled out VDI services to their customers and businesses. With NetApp they were able to implement VDI, reduce storage costs (via deduplication and other storage efficiency features) and increase department services.
  • Jeff Bell at Mercy Healthcare talked about the difficulties of rolling out electronic health records (EHR) and the challenges of integrating ~30 hospitals and ~400 medical clinics. They started with EHR fairly early, in 2006-2007, well before the latest governmental push. He mentioned Joplin MO and last year’s category 5 tornado, which about wiped out their hospital there. He said that within 2 hours of the disaster, Mercy Healthcare was printing out the EHR for the 183 patients present in the hospital at the time, who had to be moved to other care facilities. The promise of EHR is that the information travels with the patient, can be recovered in the event of a disaster and is immediately available. It seems that at least at Mercy Healthcare, EHR is living up to its promise. In addition, they just built a new data center as they were running out of space, power and cooling at the old one. They installed new NetApp storage there and for the first few months had to run heaters to keep the data center livable because the new power/cooling load was so far below what they had experienced previously. Looking back on what they had accomplished, Jeff was not so sure they would build a new data center again. With new cloud offerings coming out and the reduced power/cooling and increased density of NetApp storage, they could almost get by without another data center at all.

That’s about it from the customer session.

NetApp execs spent the rest of the day on innovation, mostly at NetApp but also in the IT industry in general.

There was lots of discussion on the new release of Data ONTAP 8.1.1 with its latest cluster mode features.  NetApp positioned it as completing the transition to data/storage as an infrastructure that IT has been pushing for the last decade or so, following in the grand tradition of what IBM did for computing infrastructure with the 360 and what Cisco and others did for networking infrastructure in the mid-80’s.

Comments?

VMware disaster recovery

Thunderstorms over Alexandria, VA by mehul.antani (cc) (from Flickr)

I did an article a while ago for TechTarget on Virtual (machine) Disaster Recovery and discussed what was then the latest version of VMware Site Recovery Manager (SRM), v1.0, and some of its capabilities.

Well, it’s been a couple of years since that came out and I thought it would be an appropriate time to discuss some updates to that product and other facilities that bear on virtual machine disaster recovery today.

SRM to the rescue

Recall that VMware’s SRM is essentially a run book automation tool for system failover.  Using SRM, an administrator defines the physical and logical mapping between a primary site (protected site in SRM parlance) configuration of virtual machines, networking, and data stores and a secondary site (recovery site to SRM) configuration.

Once this mapping is complete, the administrator then creates recovery scripts (recovery plans to SRM) which take the recovery site in a step-by-step fashion from an “inactive” to an “active” state.  With the recovery scripts in hand, data replication can then be activated and monitoring (using storage replication adapters, SRAs to SRM) can begin.  Once all that is ready and operating, SRM can provide one-button failover to the recovery site.

SRM v4.1 supports the following:

  • NFS data stores can now be protected as well as iSCSI and FC LUN data stores.  Recall that a VM data store (essentially where a virtual machine’s devices or drive letters live) can be hosted on LUNs formatted with VMFS or as NFS files.  NFS data stores have recently become more popular with the proliferation of virtual machines under vSphere 4.1.
  • Raw device mapping (RDM) LUNs can now be protected. Recall that RDM is another way to access devices directly for performance sensitive VMs, eliminating the need to use a data store and the associated hypervisor IO overhead.
  • Shared recovery sites are now supported. As such, one recovery site can now support multiple protected sites.  In this way a single secondary site can support failover from multiple primary sites.
  • Role based access security is now supported for recovery scripts and other SRM administration activities. In this way fine grained security roles can be defined that protect against unauthorized use of SRM capabilities.
  • Recovery site alerting is now supported. SRM now fully monitors recovery site activity and can report on and alert operations staff when problems occur which may impact failover to the recovery site.
  • SRM test and actual failover can now be initiated and monitored directly from vCenter Server. This provides the vCenter administrator significant control over SRM activities.
  • SRM automated testing can now use storage snapshots.  One advantage of SRM is the ability to automate DR testing, which can be done onsite using local equipment. Snapshots eliminate the need for storage replication in local DR tests.

There were many other minor enhancements to SRM since v1.0 but these seem the major ones to me.

The only things lacking seem to be some form of automated failback and three way failover.  I’ll talk about 3-way failover later.

But without automated failback, the site administrator must reconfigure the two sites and reverse the designation of protected and recovery sites, re-mirror the data in the opposite direction and recreate recovery scripts to automate bringing the primary site back up.

However, failback is likely not to be as time sensitive as failover and could very well be a scheduled activity, taking place over a much longer time period. This can, of course, all be handled automatically by SRM or be done in a more manual fashion.

Other DR capabilities

At last year’s EMCWorld, VPLEX was announced, which provided for a federation of data centers or, as I called it at the time, Data-at-a-Distance (DaaD).  DaaD together with VMware’s vMotion could provide a level of disaster avoidance (see my post on VPLEX surfaces at EMCWorld) previously unattainable.

No doubt cluster services from Microsoft Cluster Server (MSCS), Symantec Veritas Cluster Server (VCS) and others have also been updated.  In some (mainframe) cluster services, N-way or cascaded failover is starting to be supported.  For example, a 3-way DR scenario has a primary site synchronously replicated to a secondary site, which is asynchronously replicated to a third site.  If the region containing the primary and secondary sites is impacted by a disaster, the tertiary site can be brought online. Such capabilities are not yet available for virtual machine DR but it’s only a matter of time.
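The cascaded 3-way scenario can be sketched as a small site-selection routine. This is only an illustration of the idea: the site names, region labels and the `pick_failover_site` and `sites_in_region` helpers are all hypothetical, and no cluster product's API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    replication: str  # how the site receives data: "source", "sync" or "async"


# Hypothetical cascade: primary -> (sync) secondary -> (async) tertiary
SITES = [
    Site("primary", "east", "source"),
    Site("secondary", "east", "sync"),
    Site("tertiary", "west", "async"),
]


def sites_in_region(sites, region):
    """Names of all sites that a region-wide disaster would take out."""
    return {site.name for site in sites if site.region == region}


def pick_failover_site(sites, failed_site_names):
    """Return the first surviving site in cascade order, or None if all are down."""
    for site in sites:
        if site.name not in failed_site_names:
            return site
    return None
```

Losing only the primary selects the synchronous secondary, while a regional event that takes out everything in `"east"` falls through to the asynchronously replicated tertiary site, which may be somewhat behind on data.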

—–

Disaster recovery technologies are not standing still and VMware SRM is no exception. I am sure a couple of years from now SRM will be even more capable and other storage vendors will provide DaaD capabilities to rival VPLEX.   What the cluster services folks will be doing by that time I can’t even imagine.

Comments?

 

SNIA CDMI plugfest for cloud storage and cloud data services

Plug by Samuel M. Livingston (cc) (from Flickr)

Was invited to the SNIA tech center to witness the CDMI (Cloud Data Management Interface) plugfest that was going on down in Colorado Springs.

It was somewhat subdued. I always imagine racks of servers, with people crawling all over them with logic analyzers, laptops and other electronic probing equipment.  But alas, software plugfests are generally just a bunch of people with laptops, ethernet/wifi connections all sitting around a big conference table.

The team was working to define an errata sheet for CDMI v1.0 to be completed prior to ISO submission for official standardization.

What’s CDMI?

CDMI is an interface standard for clients talking to cloud storage servers and provides a standardized way to access all such services.  With CDMI you can create a cloud storage container, define its attributes, and deposit and retrieve data objects within that container.  Mezeo had announced support for CDMI v1.0 a couple of weeks ago at SNW in Santa Clara.
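To make the create/deposit/retrieve cycle concrete, here is a minimal sketch that only describes the HTTP requests involved and sends nothing. The header name and media types are written from memory of the CDMI v1.0 specification and should be checked against it. The container name, object name and helper functions are hypothetical.

```python
import json

CDMI_VERSION = "1.0"  # the spec version discussed in this post


def cdmi_request(method, path, media_type, body=None):
    """Describe a CDMI HTTP request as a plain dict (nothing is sent)."""
    headers = {
        "X-CDMI-Specification-Version": CDMI_VERSION,  # assumed header name
        "Accept": media_type,
    }
    if body is not None:
        headers["Content-Type"] = media_type
    return {
        "method": method,
        "path": path,
        "headers": headers,
        "body": json.dumps(body) if body is not None else None,
    }


def create_container(name, metadata=None):
    """PUT to a path ending in '/' creates a container with optional attributes."""
    return cdmi_request("PUT", f"/{name}/", "application/cdmi-container",
                        {"metadata": metadata or {}})


def put_object(container, name, value, mimetype="text/plain"):
    """PUT a data object into an existing container."""
    return cdmi_request("PUT", f"/{container}/{name}", "application/cdmi-object",
                        {"mimetype": mimetype, "metadata": {}, "value": value})


def get_object(container, name):
    """GET the same path to retrieve the object."""
    return cdmi_request("GET", f"/{container}/{name}", "application/cdmi-object")
```

Because everything rides on ordinary HTTP verbs and JSON bodies, any HTTP client library could carry these requests to a real server.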

CDMI provides for attributes to be defined at the cloud storage server, container or data object level such as: standard redundancy degree (number of mirrors, RAID protection), immediate redundancy (synchronous), infrastructure redundancy (across same storage or different storage), data dispersion (physical distance between replicas), geographical constraints (where it can be stored), retention hold (how soon it can be deleted/modified), encryption, data hashing (having the server provide a hash used to validate end-to-end data integrity), latency and throughput characteristics, sanitization level (secure erasure), RPO, and RTO.

A CDMI client is free to implement compression and/or deduplication as well as other storage efficiency characteristics on top of CDMI server characteristics.  Probably something I am missing here but seems pretty complete at first glance.

SNIA has defined reference implementations of a CDMI v1.0 server [and I think client] which can be downloaded from their CDMI website.  [After filling out the “information on me” page, SNIA sent me an email with the download information but I could only recognize the CDMI server in the download information, not the client (although it could have been there). The CDMI v1.0 specification is freely available as well.] The reference implementations can be used to test your own CDMI clients if you wish. They are Java-based and apparently run on Linux systems but shouldn’t be too hard to run elsewhere (one CDMI server at the plugfest was running on a Mac laptop).

Plugfest participants

There were a number of people from both big and small organizations at SNIA’s plugfest.

Mark Carlson from Oracle was there and seemed to be leading the activity. He said I was free to attend but couldn’t say anything about what was and wasn’t working.  Didn’t have the heart to tell him, I couldn’t tell what was working or not from my limited time there. But everything seemed to be working just fine.

Carlson said that SNIA’s CDMI reference implementations had been downloaded 164 times with the majority of the downloads coming from China, USA, and India, in that order. But he said there were people in just about every geo looking at it.  He also said this was the first annual CDMI plugfest although they had CDMI v0.8 running at other shows (e.g., SNIA SDC) before.

David Slik, from NetApp’s Vancouver Technology Center, was there showing off his demo CDMI Ajax client and laptop CDMI server.  He was able to use the Ajax client to access all the CDMI capabilities of the cloud data object he was presenting and displayed the binary contents of an object.  Then he showed me that the exact same data object (file) could be easily accessed by just typing the proper URL into any browser; it turned out the binary was a GIF file.

The other thing that Slik showed me was a display of a cloud data object which was created via a “Cron job” referencing a satellite image website and depositing the data directly into cloud storage, entirely at the server level.  Slik said that CDMI also specifies a cloud storage to cloud storage protocol which could be used to move cloud data from one cloud storage provider to another without having to retrieve the data back to the user.  Such a capability would be ideal to export user data from one cloud provider and import the data to another cloud storage provider using their high speed backbone rather than having to transmit the data to and from the user’s client.

Slik was also instrumental in the SNIA XAM interface standards for archive storage.  He said that CDMI is much more lightweight than XAM, as there is no requirement for a runtime library whatsoever and it depends only on HTTP standards as the underlying protocol.  From his viewpoint CDMI is almost XAM 2.0.

Gary Mazzaferro from AlloyCloud was talking like CDMI would eventually take over not just cloud storage management but local data management as well.  He called CDMI a strategic standard that could potentially be implemented in OSs, hypervisors and even embedded systems to provide a standardized interface for all data management – cloud or local storage.  When I asked what happens in this future with SMI-S he said they would co-exist as independent but cooperative management schemes for local storage.

Not sure how far this goes.  I asked if he envisioned a bootable CDMI driver? He said yes, a BIOS CDMI driver is something that will come once CDMI is more widely adopted.

Other people I talked with at the plugfest consider CDMI as the new web file services protocol akin to NFS as the LAN file services protocol.  In comparison, they see Amazon S3 as similar to CIFS (SMB1 & SMB2) in that it’s a proprietary cloud storage protocol but will also be widely adopted and available.

There were a few people from startups at the plugfest, working on various client and server implementations.  Not sure they wanted to be identified nor for me to mention what they were working on. Suffice it to say the potential for CDMI is pretty hot at the moment as is cloud storage in general.

But what about cloud data consistency?

I had to ask about how the CDMI standard deals with eventual consistency – it doesn’t.  The crowd chimed in that relaxed consistency is inherent in any distributed service.  You really have three characteristics, Consistency, Availability and Partition tolerance (CAP), for any distributed service.  You can elect to have any two of these, but must give up the third.  Sort of like the Heisenberg uncertainty principle applied to data.

They all said that consistency is mainly a CDMI client issue outside the purview of the standard, associated with server SLAs, replication characteristics and other data attributes.   As such, CDMI does not define any specification for eventual consistency.

Slik did say, though, that the standard guarantees that if you modify an object and then request a copy of it from the same location during the same internet session, you get back the version you last modified.  Seems like long odds in my experience.   Unclear how CDMI, with relaxed consistency, can ever take the place of primary storage in the data center but maybe it’s not intended to.

—–

Nonetheless, what I saw was impressive: cloud storage from multiple vendors all being accessed from the same client, using the same protocols.  And if that wasn’t simple enough for you, just use your browser.

If CDMI can become popular it certainly has the potential to be the new web file system.

Comments?

 

Correlated risk

Aerial view of damage to Wakuya, Japan following earthquake. by Official U.S. Navy... (cc) (from Flickr)

What’s the chance that

  • an earthquake  at sea could knock out primary power and generate a tsunami which would also knock out backup generators for nuclear power plant emergency cooling equipment (1 in 40 yrs),
  • an overextended speculative market segment would collapse and cause widespread ruin that would take down both equity and bond markets and force 100s of financial institutions to go under (1 in 77 yrs),
  • a hurricane occurs that destroys flood barriers which then flood your home, office and the place you store your backups (?)

All these represent correlated risks that, prior to the actual event, were deemed very improbable.  But high improbability doesn’t mean it will never happen.

Correlated risk defined

A correlated risk is the risk of any subsequent disaster or event occurring after a primary event or catastrophe has occurred. In the case of natural disasters, any event that is generated as a consequence of an originating event’s occurrence is a correlated event and, as such, has a correlated risk.

I once worked for a major company that kept their disaster recovery backups in a basement, underground on the same campus as their headquarters.  This seemed risky, as any event which took out the campus could potentially damage this basement as well and all the associated backup tapes.

How to understand your correlated risk

It seems to me to be pretty straightforward to understand correlated risk within the framework of a business continuity or disaster recovery plan (BC or DR plan).  One lists in one column all possible primary accidents, calamities, disasters, etc., man-made or natural, and in another column other possible accidents, calamities, disasters, etc. that are generated because of the prime event.

One then recurses on this process to generate all possible correlated events associated with the primary or previous correlated event until you exhaust all possible chains of catastrophes associated with the primary disaster. Then in a third column, list the potential scope (distance or area impacted) and outcomes (what damage could be expected) of all those activities in the first two columns.  In a fourth column, one lists the best guess probability of the events and/or the correlated event(s) occurring.

In the end, you should have an exhaustive list of things you should be preparing for.  Now one ranks the events in probability order and tackles them from highest to lowest probability.  There is some cutoff point that everyone reaches depending on their risk tolerance; at some point dealing with the multiple disasters that could potentially occur becomes too costly.  For instance, a nuclear plant probably needs a much lower risk tolerance (that is, it must prepare for far less probable events) than your average corporate environment.
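The recursion and ranking described above can be sketched in a few lines. The events, the yearly probabilities and the conditional probabilities below are hypothetical placeholders rather than researched figures. Multiplying probabilities along a chain is a simplifying assumption of this sketch.

```python
# Hypothetical yearly probabilities of primary events (column one).
PRIMARY = {"building fire": 0.01, "flood": 0.01}

# Column two: event -> list of (correlated event, probability given that event).
CORRELATED = {
    "flood": [("loss of power", 0.5)],
    "loss of power": [("loss of communications", 0.8)],
}


def chains(event, prob, path=()):
    """Yield every chain of events starting at `event`, with its probability."""
    path = path + (event,)
    yield path, prob
    for nxt, cond in CORRELATED.get(event, []):
        if nxt not in path:  # guard against cycles in the table
            yield from chains(nxt, prob * cond, path)


def ranked_chains(cutoff=0.0):
    """All chains at or above the cutoff probability, most probable first."""
    rows = [row for event, p in PRIMARY.items() for row in chains(event, p)]
    rows = [row for row in rows if row[1] >= cutoff]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Raising `cutoff` plays the role of the risk tolerance cutoff: chains less probable than the cutoff simply drop off the list of things to prepare for.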

With that in place you have a start on a BC and/or DR plan.  Now all you need to determine is your risk tolerance level and how to handle primary and correlated risks that fall within that level.

A correlated risk analysis

Take Silverton Consulting as an example.  I take daily incremental backups stored on a local hard disk, take weekly “partial backups” (critical business files only) to removable media but also stored locally in the office, and take monthly full backups stored in a safety deposit box located in a vault in the basement of a bank within five miles of the office.

If I just look at natural events:

  • My first and most likely natural event is building fire – in this case the scope of the event would be limited to the building, which would take out both the local hard disk incrementals and weekly partial backups but the safety deposit box of monthly fulls would still be accessible.
  • A possible correlated event as well as another primary event could be wild fire – in this case, potentially both the office and the bank could be consumed and all backups would be lost.  The fact that the bank is 5 miles away, has its own fire suppression system, and has my backups located in its basement just reduces the probability of a wild fire impacting both locations but doesn’t eliminate it.
  • Another possible correlated event to any wild fire would be loss of power, transport, and communication services – the fact that the bank is only 5 miles away indicates that if the primary office loses these services, it’s highly probable that the bank would lose them as well.  Access to the bank vault backups under these circumstances would be delayed at best, at least until such services could be restored.  Had I been using a cloud provider backup service (which I am considering), I couldn’t access my data until communication services were restored or until I had moved far enough away to regain access to these services.  With the roads/other transport being out this would take some time.
  • Next most likely natural event is flood. Our location is within a 100-year flood plain, so a serious flood that would take out the office is possible about once every 100 yrs.  I would like to say that our bank is outside our flood plain, but I just don’t know yet.  But I promise to find out.
  • A correlated event to a flood is a loss of power, transport and communication services. The scope and consequences of this catastrophe are similar to that discussed above.
  • Next most likely natural event is tornado, …
  • Next most likely natural event is earthquake, …
  • Next most likely natural event is volcanic eruption, …

… and the list goes on.  Of course these are just natural disasters, one would need to consider man-made catastrophes as well.
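The scope analysis above boils down to asking which backup copies sit outside the area an event can reach. A minimal sketch follows. The backup locations come from the description above, with the bank taken at the full five miles, while the event scope radii are hypothetical numbers chosen only to illustrate the comparison.

```python
# (backup copy, where it is kept, distance from the office in miles)
BACKUPS = [
    ("daily incremental", "office hard disk", 0.0),
    ("weekly partial", "office removable media", 0.0),
    ("monthly full", "bank safety deposit box", 5.0),  # "within five miles"
]

# Hypothetical scope radius of each event, in miles from the office.
EVENT_SCOPE_MILES = {
    "building fire": 0.1,
    "wild fire": 10.0,
}


def surviving_backups(event):
    """Names of the backup copies kept outside the event's scope."""
    radius = EVENT_SCOPE_MILES[event]
    return [name for name, _where, miles in BACKUPS if miles > radius]
```

With these assumed radii a building fire leaves only the monthly fulls, and a wild fire large enough to reach the bank leaves nothing, which is the argument for a copy kept much farther away.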

In any event, all these have a distinct, non-zero probability.  One can come up with some calculation of the probability of such primary and correlated events through research and/or other means.

For instance, I get a fortnightly email from the University of Colorado’s Natural Hazards Center which occasionally provides some insight into these probabilities. Potentially, your corporation’s insurance companies can provide some guidance on these probabilities as well.

What is risk tolerance?

But at some point, only the company can determine its risk tolerance.  I believe risk tolerance to be some combination of the money one is willing to invest and one’s ability to invest it in mitigating risks.   For example, let’s say my company makes $10M a year in revenues.  Given the importance of IT to my corporation’s activities, a reasonable risk tolerance in $ terms might be somewhere between 0.1% and 1.0% of revenues, or $10K to $100K.   I must say I am probably spending more than that percentage of SCI revenues in my current DR activities, such as they are, but I include weekly and monthly backups with these costs (most would not include these activities in pure DR spending).
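The arithmetic behind that range is simple enough to put in a helper. This is just a sketch of the calculation in the example. The `risk_budget` name and the default percentages are illustrative choices, not a recommended rule.

```python
def risk_budget(annual_revenue, low_pct=0.1, high_pct=1.0):
    """Return the (low, high) yearly spend implied by a percentage-of-revenue band."""
    return (annual_revenue * low_pct / 100, annual_revenue * high_pct / 100)


low, high = risk_budget(10_000_000)  # the $10M example above
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Plugging in your own revenue and percentage band gives a quick first cut at the budget ceiling for handling primary and correlated risks.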

—-

So as the disaster in Japan continues, let us pray that it works out well in the end for all parties.  But also let’s use this time to re-examine our risk tolerance and disaster recovery plans with respect to correlated risks.  Hopefully, we will all do better next time.

Comments?

Deskchecking BC/DR plans

Hurricane Ike - 2008/09/12 - 21:26 UTC by CoreBurn (cc) (from Flickr)

Quite a lot of Twitter traffic/tweetchat this Wednesday on DR/BC, all documented on #sanchat sponsored by Compellent. In that discussion I mentioned a presentation I did a couple of years ago for StorageDecisions/Chicago on Successful Disaster Recovery Testing where I discussed some of the techniques companies use to provide disaster recovery and how they validated these activities.

For those shops with the luxury of having an owned or contracted for “hot-site” or “warm-site”, DR testing should be an ongoing and periodic activity. In that presentation I suggested testing DR plans at least once a year but more often if feasible. In this case a test is a “simulated disaster declaration” where operations are temporarily moved to an alternate site.  I know of one European organization which tested their DR plans every week but they owned the hot-site and their normal operations were split across the two sites.

For organizations that have “cold-sites” or no sites, the choices for DR testing are much more limited. In these situations, I recommended a way to deskcheck or walkthru a BC/DR plan, which didn’t involve any hardware testing. This is like a code or design inspection but applied to a BC/DR plan.

How to perform a BC/DR plan deskcheck/walkthru

In a BC/DR plan deskcheck there are a few roles, namely a leader, a BC/DR plan owner, a recorder,  and participants.  The BC/DR deskcheck process looks something like:

  1. Before the deskcheck, the leader identifies walkthru team members from operations, servers, storage, networking, voice, web, applications, etc.; circulates the current BC/DR plan to all team members; and establishes the meeting date-times.
  2. The leader decides which failure scenario will be used to test the DR/BC plan.  This can be driven by the highest probability or use some form of equivalence testing. (In equivalence testing one collapses the potential failure scenarios into a select set which have similar impacts.)
  3. In the pre-deskcheck meeting,  the leader discusses the roles of the team members and identifies the failure scenario to be tested.  IT staff and other participants are to determine the correctness of the DR/BC plan “from their perspective”.  Every team member is supposed to read the BC/DR plan before the deskcheck/walkthru meeting to identify problems with it ahead of time.
  4. At the deskcheck/walkthru meeting, the leader starts the session by describing the failure scenario and states what, if any, data center, telecom, and transport facilities are available, the state of the alternate site, and the current whereabouts of IT staff, establishing the preconditions for the BC/DR simulation.  Team members should concur with this analysis or come to consensus on the scenario’s impact on facilities, telecom, transport and staffing.
  5. Next, the owner of the plan describes the first or next step in detail, identifying all actions taken and their impact on the alternate site. Participants then determine whether the step performs the actions as stated.  Also:
    1. Participants discuss the duration for the step to complete to place everything on the same time track. For instance at
      1. T0: it’s 7pm on a Wednesday, a fire-flood-building collapse occurs, knocks out the main data center, all online portals are down, all application users are offline, …, luckily operations personnel are evacuated and their injuries are slight.
      2. T1: Head of operations is contacted and declares a disaster; activates the disaster site; calls up the DR team to get on a conference call ASAP, …
      3. T2: Head of operations requests backups be sent to the alternate site; personnel are contacted and told to travel to the DR site; contracts for servers, storage and other facilities at the DR site are activated; …
    2. The recorder pays particular attention to any problems brought up during the discussion, ties them to the plan step, identifies the originator of the issue, and notes its impact.  Don’t try to solve the problems, just record them and their impact.
    3. The leader or their designee maintains an official plan timeline in real time. This timeline can be kept on a whiteboard or an Excel/Visio chart displayed for all to see.  Timeline documentation can be kept as a formal record of the walkthru along with the problem list and the BC/DR plan.
    4. This step is iterated for every step in the BC/DR plan until the plan is completed.
  6. At the end, the recorder lists all the problems encountered and provides a copy to the plan owner.
  7. The team decides if another deskcheck review is warranted on this failure scenario (depends on the number and severity of the problems identified).
  8. When the owner of the plan has resolved all the issues, he or she reissues the plan to everyone that was at the meeting.
  9. If another deskcheck is warranted, the leader issues another meeting call.
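The recorder’s problem list and the leader’s timeline from the steps above can be kept in something as simple as the following sketch. The class and field names are hypothetical. The "another review is warranted when any major problem was found" rule is only one possible reading of step 7.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    step: int            # index of the plan step the problem is tied to
    originator: str      # who raised it
    description: str
    impact: str
    severity: str = "minor"  # "major" or "minor"


@dataclass
class Deskcheck:
    scenario: str
    timeline: list = field(default_factory=list)  # (label, start minute, step text)
    problems: list = field(default_factory=list)
    elapsed: int = 0     # minutes since the disaster (T0)

    def walk_step(self, text, duration_min):
        """Add the next plan step to the timeline and advance the clock."""
        label = f"T{len(self.timeline)}"
        self.timeline.append((label, self.elapsed, text))
        self.elapsed += duration_min
        return len(self.timeline) - 1

    def record(self, step, originator, description, impact, severity="minor"):
        """Record a problem and its impact; no attempt is made to solve it."""
        self.problems.append(Problem(step, originator, description, impact, severity))

    def needs_another_review(self, max_major=0):
        """Assumed rule: re-review if more than max_major major problems were found."""
        majors = sum(p.severity == "major" for p in self.problems)
        return majors > max_major
```

At the end of the session, `problems` is what the recorder hands to the plan owner and `timeline` is the formal record of how long the plan would take to execute.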

This can take anywhere from half a day to a couple of days. BUT deskchecking your BC/DR plan can be significantly less costly than any actual test.  Nevertheless, a deskcheck cannot replace an actual BC/DR plan simulation test on real hardware/software.

Some other hints from code and design inspections

  • For code or design inspections, a checklist of high probability errors is used to identify and familiarize everyone with these errors.  Checklists can focus participant review to look for most probable errors. The leader can discuss these most likely errors at the pre-deskcheck meeting.
  • Also, problems are given severities, like major or minor problems.  For example,  a BC/DR plan “minor” problem might be an inadequate duration estimate for an activity.  A “major” problem might be a mission critical app not coming up after a disaster.

So that’s what a BC/DR plan deskcheck would look like. If you did a BC/DR plan deskcheck once a quarter you would probably be doing better than most.  And if on top of that you did a yearly full scale DR simulation on real hardware, you would be considered well prepared in my view.  What do you think?

Problems solved, introduced and left unsolved by cloud storage

Cloud whisps (sic) by turtlemom4bacon (cc) (from flickr)

When I first heard about cloud storage I wondered just what exactly it was trying to solve. There are many storage problems within the IT shop nowadays; cloud storage can solve a few of them but introduces more and leaves a few unsolved.

Storage problems solved by cloud storage

  • Dynamic capacity – local storage capacity is fixed once purchased/leased. Cloud storage provides an almost infinite amount of storage for your data. One pays for this storage in GB or TB per month increments, with added storage services (multi-site replication, high availability, etc.) at extra charge. Such capacity can be reduced or expanded at a moment’s notice.
  • Offsite DR – disaster recovery for many small shops is often non-existent or rudimentary at best. Using cloud storage, data can be copied to the cloud and accessed anywhere via the internet. Such data copies can easily support rudimentary DR for a primary data center outage.
  • Access anywhere – storage is typically local to the IT shop and can normally only be accessed at that location. Cloud storage can be accessed from any internet access point. Applications that are designed to operate all over the world can easily take advantage of such storage.
  • Data replication – data should be replicated for high availability. Cloud storage providers can replicate your data to multiple sites so that if one site goes down other sites can still provide service.

Storage problems introduced by the cloud

  • Variable access times – local storage access times vary between 1 and 100 milliseconds. However, accessing cloud storage can take from hundreds of milliseconds to minutes depending on network connectivity. Many applications cannot endure such variable access times.
  • Different access protocols – local storage supports fairly standard access protocols like FC, iSCSI, NFS, and/or CIFS/SMB. Barring the few (but lately increasing) cloud providers that provide NFS access, most cloud storage requires rewriting applications to use new protocols such as REST to store and access cloud file data.
  • Governance over data – local storage is by definition all located inside one data center. Many countries do not allow personal and/or financial data to be stored outside the country of origin. Some cloud storage providers will not guarantee that data stored in the cloud stays within a single country and its jurisdiction.

Storage problems not solved by the cloud:

  • Data backups – data protection via some form of backup is essential. Nothing says that cloud storage providers cannot provide backup of data in the cloud but few if any provide such service. See my Are backups needed in the cloud post.
  • Data security – data security remains an ongoing problem for the local data center; moving the data to the cloud just makes security more difficult. Many cloud storage providers provide rudimentary security for stored data but none seem to have integrated strong authentication and encryption services that might provide true data security.
  • Energy consumption – today’s storage consumes power and cooling. Although cloud storage can be more efficient than onsite storage, this does not eliminate the environmental cost of storage.
  • Data longevity – data stored in the cloud can just as easily go obsolete as data stored locally.

Probably some I have missed here but these are a good start.