Every month (or so) we do a more detailed analysis of a chart that appears in our free monthly newsletter. One such analysis, done earlier in the year, documented the correlation between IOPS and drive counts in SPC-1 results.
Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystem. If there’s any company out there that can take advantage of commodity hardware, it’s probably Dell. (Of course commodity hardware always loses in the end, but that’s another story.)
Similarly, Dell’s EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services. It doesn’t take a big leap of imagination to have one storage system that combines the functionality of EqualLogic’s iSCSI and Compellent’s FC storage capabilities. Of course there are others already doing this, including Compellent themselves, who already have iSCSI support built into their FC storage system.
Which way to integrate?
Does EqualLogic survive such a merger? I think so. It’s easy to imagine that EqualLogic may have the bigger market share today. If that’s so, the right thing might be to merge Compellent FC functionality into EqualLogic. If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both. It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-into-iSCSI approach may be the optimal path for Dell because of the popularity of their EqualLogic storage.
What about NAS?
The only thing missing from this storage system is NAS. Of course Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end. Dell’s EqualLogic does much the same to offer NAS protocols for their iSCSI system. Neither of these is a bad solution, but they are not the fully integrated NAS offerings available from NetApp and others.
However, there is a little-discussed piece of the puzzle: the Dell-Exanet acquisition, which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the marketplace. It’s been a while since I have heard anything about Exanet, and nothing since the acquisition. Does it make sense to back-end a clustered NAS solution with FC storage? Probably.
Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed. If I look at where Dell is weakest (barring their OEM agreement with EMC), it’s in the high-end storage space. Compellent probably didn’t have much of a footprint there, possibly due to their limited distribution and support channels. A Dell acquisition could easily eliminate these problems and open up this space without Dell having to do much other than start marketing, selling and supporting Compellent.
In the end, a storage solution supporting clustered NAS, FC, and iSCSI, combining functionality equivalent to Exanet, Compellent and EqualLogic on commodity hardware (ouch!), could make a formidable competitor to what’s out there today if done properly. Whether Dell could actually pull this off, and in a timely manner, even if they purchase Compellent is another question.
There was quite a lot of Twitter traffic in this Wednesday’s tweetchat on DR/BC, all documented under #sanchat and sponsored by Compellent. In that discussion I mentioned a presentation I did a couple of years ago for StorageDecisions/Chicago on Successful Disaster Recovery Testing, where I discussed some of the techniques companies use to provide disaster recovery and how they validate these activities.
For those shops with the luxury of an owned or contracted-for “hot-site” or “warm-site”, DR testing should be an ongoing and periodic activity. In that presentation I suggested testing DR plans at least once a year, but more often if feasible. In this case a test is a “simulated disaster declaration” where operations are temporarily moved to an alternate site. I know of one European organization which tested their DR plans every week, but they owned the hot-site and their normal operations were split across the two sites.
For organizations that have “cold-sites” or no sites, the choices for DR testing are much more limited. In these situations, I recommended a way to deskcheck or walk through a BC/DR plan which doesn’t involve any hardware testing. This is like a code or design inspection, but applied to a BC/DR plan.
How to perform a BC/DR plan deskcheck/walkthru
In a BC/DR plan deskcheck there are a few roles, namely a leader, a BC/DR plan owner, a recorder, and participants. The BC/DR deskcheck process looks something like this:
Before the deskcheck, the leader identifies walkthru team members from operations, servers, storage, networking, voice, web, applications, etc.; circulates the current BC/DR plan to all team members; and establishes the meeting date-times.
The leader decides which failure scenario will be used to test the DR/BC plan. This can be driven by the highest probability or use some form of equivalence testing. (In equivalence testing one collapses the potential failure scenarios into a select set which have similar impacts.)
In the pre-deskcheck meeting, the leader discusses the roles of the team members and identifies the failure scenario to be tested. IT staff and other participants are to determine the correctness of the DR/BC plan “from their perspective”. Every team member is supposed to read the BC/DR plan before the deskcheck/walkthru meeting to identify problems with it ahead of time.
At the deskcheck/walkthru meeting, the leader starts the session by describing the failure scenario and stating what data center, telecom, and transport facilities (if any) are available, the state of the alternate site, and the current whereabouts of IT staff, establishing the preconditions for the BC/DR simulation. Team members should concur with this analysis or come to consensus on the scenario’s impact on facilities, telecom, transport and staffing.
Next, the owner of the plan describes the first (or next) step in detail, identifying all actions taken and their impact on the alternate site. Participants then determine whether the step performs the actions as stated. Participants also discuss how long the step takes to complete, to place everything on the same time track. For instance:
T0: it’s 7pm on a Wednesday, a fire-flood-building collapse occurs, knocks out the main data center, all online portals are down, all application users are offline, …, luckily operations personnel are evacuated and their injuries are slight.
T1: Head of operations is contacted and declares a disaster; activates the disaster site; calls up the DR team to get on a conference call ASAP, …
T2: Head of operations requests backups be sent to the alternate site; personnel are contacted and told to travel to the DR site; contracts for servers, storage and other facilities at the DR site are activated; …
The recorder pays particular attention to any problems brought up during the discussion, ties them to the plan step, identifies the originator of each issue, and documents its impact. Don’t try to solve the problems; just record them and their impact.
The leader or their designee maintains an official plan timeline in real time. This timeline can be kept on a whiteboard or in an Excel/Visio chart displayed for all to see. The timeline documentation can be kept as a formal record of the walkthru, along with the problem list and the BC/DR plan.
These steps are iterated for every step in the BC/DR plan until the plan is completed.
At the end, the recorder lists all the problems encountered and provides a copy to the plan owner.
The team decides whether another deskcheck review is warranted for this failure scenario (depending on the number and severity of the problems identified).
When the owner of the plan has resolved all the issues, he or she reissues the plan to everyone who attended the meeting.
If another deskcheck is warranted, the leader issues another meeting call.
The whole process can take anywhere from half a day to a couple of days. But deskchecking your BC/DR plan can be significantly less costly than an actual test. Nevertheless, a deskcheck cannot replace an actual BC/DR plan simulation test on real hardware/software.
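To make the bookkeeping concrete, here is a minimal sketch of the recorder’s and leader’s two artifacts during a deskcheck: the official timeline and the problem log. All names (Deskcheck, PlanStep, etc.) are illustrative inventions, not from any real DR tool.

```python
# Hypothetical sketch of deskcheck bookkeeping: an accumulated timeline
# plus a problem log tied to plan steps. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    duration_hrs: float          # duration participants agree the step takes

@dataclass
class Problem:
    step_no: int                 # plan step the problem is tied to
    originator: str              # who raised the issue
    severity: str                # "major" or "minor"
    impact: str

@dataclass
class Deskcheck:
    scenario: str
    steps: list = field(default_factory=list)
    problems: list = field(default_factory=list)
    elapsed_hrs: float = 0.0     # the official timeline, kept in real time

    def walk_step(self, step: PlanStep):
        """Advance the official timeline by one plan step."""
        self.steps.append(step)
        self.elapsed_hrs += step.duration_hrs

    def record_problem(self, originator, severity, impact):
        """Record (don't solve) a problem, tied to the current step."""
        self.problems.append(Problem(len(self.steps), originator, severity, impact))

    def report(self):
        """Final problem list and timeline handed to the plan owner."""
        return {"scenario": self.scenario,
                "total_hrs": self.elapsed_hrs,
                "problems": self.problems}

dc = Deskcheck("7pm Wednesday fire knocks out the main data center")
dc.walk_step(PlanStep("Head of ops declares disaster, activates DR site", 1.0))
dc.walk_step(PlanStep("Backups shipped, personnel travel to DR site", 8.0))
dc.record_problem("storage admin", "major", "offsite backups only shipped weekly")
print(dc.report()["total_hrs"])   # 9.0
```

The point of keeping duration on every step is that the simulated clock (T0, T1, T2, …) falls out of the walkthru for free, and the problem log leaves the deskcheck already tied to specific plan steps.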
Some other hints from code and design inspections
In code or design inspections, a checklist of high-probability errors is used to familiarize everyone with these errors and to focus participant review on finding the most likely ones. The leader can discuss these most likely errors at the pre-deskcheck meeting.
Also, problems are given severities, such as major or minor. For example, a BC/DR plan “minor” problem might be an inadequate duration estimate for an activity; a “major” problem might be a mission-critical app not coming up after a disaster.
So that’s what a BC/DR plan deskcheck looks like. If you deskchecked your BC/DR plan once a quarter, you are probably doing better than most. And if, on top of that, you did a yearly full-scale DR simulation on real hardware, you would be considered well prepared in my view. What do you think?
HDS announced support today for their thin provisioning feature (called Dynamic Provisioning) in their mid-range storage subsystem family, the AMS. Expanding the range of subsystems that support thin provisioning can only help the customer in the long run.
It’s not clear whether you can add Dynamic Provisioning to an already-installed AMS subsystem or if it’s only available on a fresh installation. Also, no pricing was announced for this feature. In the past, HDS charged double the price per GB for storage in a thinly provisioned pool.
As you may recall, thin provisioning is a little like a room with a bunch of inflatable castles inside. Each castle starts with its initial inflation amount. As demand dictates, each castle can independently inflate to whatever level is needed to support the current workload, up to that castle’s limit and the overall limit imposed by the room the castles inhabit. In this analogy, the castles are LUN storage volumes, the room the castles are located in is the physical storage pool for the thinly provisioned volumes, and the air inside the castles is the physical disk space consumed by the thinly provisioned volumes.
In contrast, hard provisioning is like building permanent castles (LUNs) in stone: any change to the size of a structure would require major renovation and/or possible destruction of the original castle (deletion of the LUN).
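The castle analogy can be restated as a toy model. This is my assumed semantics for thin provisioning generally, not HDS’s actual Dynamic Provisioning implementation: LUNs present a large virtual size but consume physical pool capacity only as data is written.

```python
# Toy model of thin provisioning (assumed generic semantics, not HDS's
# actual Dynamic Provisioning internals).
class ThinPool:
    def __init__(self, physical_gb):
        self.physical_gb = physical_gb   # the "room" in the analogy
        self.consumed_gb = 0             # air pumped into all castles so far

    def allocate(self, gb):
        if self.consumed_gb + gb > self.physical_gb:
            raise RuntimeError("pool exhausted -- add physical disk")
        self.consumed_gb += gb

class ThinLUN:
    def __init__(self, pool, virtual_gb):
        self.pool = pool
        self.virtual_gb = virtual_gb     # the castle's maximum inflation
        self.written_gb = 0

    def write(self, gb):
        """Writing new data inflates the LUN, drawing pool space on demand."""
        if self.written_gb + gb > self.virtual_gb:
            raise RuntimeError("LUN full")
        self.pool.allocate(gb)
        self.written_gb += gb

# Two 500GB (virtual) LUNs over-provisioned on a 600GB physical pool:
pool = ThinPool(600)
lun_a, lun_b = ThinLUN(pool, 500), ThinLUN(pool, 500)
lun_a.write(100)
lun_b.write(250)
print(pool.consumed_gb)   # 350 -- only data actually written consumes the pool
```

Note the over-commitment: 1000GB of virtual capacity sits on a 600GB pool, which is exactly why thin provisioning saves money, and exactly why the pool can be exhausted if every castle tries to inflate fully at once.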
When HDS first came out with dynamic provisioning it was only available for USP-V internal storage, later they released the functionality for USP-V external storage. This announcement seems to complete the roll out to all their SAN storage subsystems.
HDS also announced today a new service called the Storage Reclamation Service that helps:
1) Assess whether thin provisioning will work well in your environment
2) Provide tools and support to identify candidate LUNs for thin provisioning, and
3) Configure new thinly provisioned LUNs and migrate your data over to the thinly provisioned storage.
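HDS didn’t detail how step 2 works, but one plausible heuristic for identifying candidate LUNs (my assumption, not their actual tool) is to flag LUNs whose written data is a small fraction of their allocated capacity, since those reclaim the most space when migrated to a thin pool:

```python
# Hypothetical candidate-LUN heuristic: flag under-utilized LUNs and rank
# them by how much allocated-but-unused capacity a thin pool would reclaim.
def thin_candidates(luns, max_utilization=0.5):
    """Return (name, reclaimable_gb) for under-utilized LUNs, biggest first.

    luns: iterable of (name, allocated_gb, used_gb) tuples.
    """
    out = []
    for name, allocated_gb, used_gb in luns:
        if used_gb / allocated_gb <= max_utilization:
            out.append((name, allocated_gb - used_gb))
    return sorted(out, key=lambda t: -t[1])

luns = [("db01", 1000, 900),    # 90% used -- not a candidate
        ("web01", 500, 100),    # 20% used -- good candidate
        ("logs", 2000, 600)]    # 30% used -- best candidate
print(thin_candidates(luns))    # [('logs', 1400), ('web01', 400)]
```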
Other products that support SAN storage thin provisioning include 3PAR, Compellent, EMC DMX, IBM SVC, NetApp and PillarData.