SNIA’s new SSD performance test specification

Western Digital's Silicon Edge Blue SSD SATA drive (from their website)

A couple of weeks ago SNIA released a new version of its SSSI (SSD) performance test specification for public comment. I'm not sure whether this is the first version out for public comment, but I discussed a prior version in a presentation I did for SNW last October and have blogged before about some of the mystery of measuring SSD performance.  The current version looks a lot more polished than what I had to deal with last year, but the essence of the performance testing remains the same:

  • Purge test – using a vendor approved process, purge (erase) all the data on the drive.
  • Preconditioning test – write 2X the capacity of the drive using 128KiB blocksizes, writing sequentially through the whole device’s usable address space.
  • Steady state testing – varying blocksizes, varying read-write ratios and varying block number ranges, looped until the device’s performance reaches steady state.

The steady state testing runs a random I/O mix for a one minute duration at whatever the currently specified blocksize, R:W ratio and block number range happen to be.  Also, according to the specification, the measurements for steady state are taken once a 4KiB blocksize, 100% write workload settles down.  This steady state determination must hold over a number of rounds (4?) before the other performance test runs are considered to be at “steady state”.
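To make the test flow concrete, here is a minimal Python sketch of the purge, precondition and steady state sequence as I read it. The simulated run_workload() helper, the five-round ±10% stability check and the blocksize/read-write mix matrix are all my own illustrative stand-ins, not the spec's exact parameters.

```python
import random

def run_workload(blocksize_kib, write_pct, duration_secs=60):
    """Stand-in for a real I/O generator; returns measured IOPS for one round.
    Here we just simulate a device hovering around a settled value."""
    return random.gauss(20000, 500)

def is_steady(samples, window=5, tolerance=0.10):
    """Illustrative stability check: the last `window` rounds all fall within
    +/- tolerance of their own average (the spec's actual criteria differ)."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    avg = sum(recent) / window
    return all(abs(s - avg) <= tolerance * avg for s in recent)

def pts_run():
    # 1. Purge: vendor-specific erase of the whole device (not modeled here).
    # 2. Precondition: write 2X the usable capacity sequentially in 128KiB blocks.
    # 3. Steady state determination: loop the 4KiB, 100% write round until stable.
    iops_per_round = []
    for round_no in range(1, 26):
        iops_per_round.append(run_workload(blocksize_kib=4, write_pct=100))
        if is_steady(iops_per_round):
            print(f"steady state reached after {round_no} rounds")
            break
    # 4. Only then run the full matrix of blocksizes and R:W mixes for the report.
    for bs_kib in (0.5, 4, 8, 16, 32, 64, 128):
        for write_pct in (0, 5, 35, 50, 65, 95, 100):
            run_workload(blocksize_kib=bs_kib, write_pct=write_pct)

if __name__ == "__main__":
    pts_run()
```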

SNIA’s SSSI performance test benefits

Let's start by saying no performance test is perfect.  I can always find fault in any performance test, even my own.  Nevertheless, the new SSSI performance test goes a long way towards fixing some intrinsic problems with SSD performance measurement.  Specifically,

  • The need to discriminate between fresh out of the box (FOB) performance and ongoing drive performance.  The preconditioning test is obviously a compromise in attempting to do this, but writing double the full capacity of a drive will take a long time and should cause every NAND cell in the user space to be overwritten.  Writing the capacity just once would not be enough to overwrite all the device's write buffers.  Even three times the device's capacity might still show some variance in performance, and it would take correspondingly longer.
  • The need to show steady state SSD performance versus some peak value.  SSDs are notorious for showing differing performance over time. Partially this is due to FOB performance (see above) but mostly this is due to the complexity of managing NAND erasure and programming overhead.

The steady state performance problem is not nearly as much of an issue with hard disk drives, but even there, with defect skipping, drive performance will degrade over time (though over a much longer time than for SSDs).  My main quibble with the test specification is how they elect to determine steady state – 4KiB blocks with a 100% write workload seems a bit oversimplified.

Is some proportion of read IO needed to define SSD “steady state” performance?

[Most of the original version of this post centered on the need for some write component in steady state determination.  This was all due to my misreading the SNIA spec.  I now realize that the current spec calls for a 100% WRITE workload with 4KiB blocksizes to settle down to determine steady state.   While this may be overkill, it certainly is consistent with my original feelings that some proportion of write activity needs to be a prime determinant of SSD steady state.]

My main concern with how the test determines SSD steady state performance is the lack of any read activity. My other worry with this approach is that the blocksize seems a bit too small, but that is minor in comparison.

Let's start with the fact that SSDs are by nature asymmetrical devices.  By that I mean their write performance differs substantially from their read performance due to the underlying nature of the NAND technology.  And much of what distinguishes an enterprise SSD from a commercial drive is the sophistication of its write processing, so write activity certainly belongs at the heart of any steady state determination.

But using 100% writes to test for steady state may be too much.

In addition, it is hard for me to imagine any commercial or enterprise class device in service not having some high portion of ongoing read IO activity.  I can easily be convinced that a normal R:W ratio for an SSD device is somewhere between 90:10 and 50:50.  But I have a difficult time seeing an SSD R:W ratio of 0:100 as realistic.  And I feel any viable interpretation of device steady state performance needs to be based on realistic workloads.

In SNIA's defense, they had to pick some reproducible way to measure steady state, and some devices may have had difficulty reaching steady state under a 100% write workload.  However, most other benchmarks have some sort of cut-off that can be used to invalidate results, and reaching steady state is one current criterion for SNIA's SSSI performance test.  I just think some mix of read and write activity would be a better measure of SSD stability.

As for the 4KiB block size, it's purely a question of what's the most probable blocksize in SSD usage, which may differ between enterprise and consumer applications.  But 4KiB seems a bit behind the times, especially with today's 128GB and larger drives…

What do you think, should SSD steady state determination use some mix of read and write activity or not?

[Thanks to Eden Kim and his team at SSSI for pointing out my spec reading error.]

Deskchecking BC/DR plans

Hurricane Ike - 2008/09/12 - 21:26 UTC by CoreBurn (cc) (from Flickr)

Quite a lot of twitter traffic/tweetchat this Wednesday on DR/BC, all documented on #sanchat sponsored by Compellent. In that discussion I mentioned a presentation I did a couple of years ago for StorageDecisions/Chicago on Successful Disaster Recovery Testing, where I discussed some of the techniques companies use to provide disaster recovery and how they validate these activities.

For those shops with the luxury of having an owned or contracted-for “hot-site” or “warm-site”, DR testing should be an ongoing and periodic activity. In that presentation I suggested testing DR plans at least once a year, and more often if feasible. In this case a test is a “simulated disaster declaration” where operations are temporarily moved to an alternate site.  I know of one European organization which tested their DR plans every week, but they owned the hot-site and their normal operations were split across the two sites.

For organizations that have “cold-sites” or no sites, the choices for DR testing are much more limited. In these situations, I recommend a way to deskcheck or walkthru a BC/DR plan which doesn’t involve any hardware testing. This is like a code or design inspection but applied to a BC/DR plan.

How to perform a BC/DR plan deskcheck/walkthru

In a BC/DR plan deskcheck there are a few roles, namely a leader, a BC/DR plan owner, a recorder,  and participants.  The BC/DR deskcheck process looks something like:

  1. Before the deskcheck, the leader identifies walkthru team members from operations, servers, storage, networking, voice, web, applications, etc.; circulates the current BC/DR plan to all team members; and establishes the meeting date-times.
  2. The leader decides which failure scenario will be used to test the DR/BC plan.  This can be driven by the highest probability or use some form of equivalence testing. (In equivalence testing one collapses the potential failure scenarios into a select set which have similar impacts.)
  3. In the pre-deskcheck meeting,  the leader discusses the roles of the team members and identifies the failure scenario to be tested.  IT staff and other participants are to determine the correctness of the DR/BC plan “from their perspective”.  Every team member is supposed to read the BC/DR plan before the deskcheck/walkthru meeting to identify problems with it ahead of time.
  4. At the deskcheck/walkthru meeting, the leader starts the session by describing the failure scenario and states what, if any, data center, telecom and transport facilities are available, the state of the alternate site, and the current whereabouts of IT staff, establishing the preconditions for the BC/DR simulation.  Team members should concur with this analysis or come to consensus on the scenario’s impact on facilities, telecom, transport and staffing.
  5. Next, the owner of the plan describes the first (or next) step in detail, identifying all actions taken and their impact on the alternate site. Participants then determine whether the step performs the actions as stated or not.  Also,
    1. Participants discuss the duration for the step to complete, to place everything on the same time track. For instance:
      1. T0: it’s 7pm on a Wednesday, a fire-flood-building collapse occurs that knocks out the main data center, all online portals are down, all application users are offline, …, luckily operations personnel are evacuated and their injuries are slight.
      2. T1: Head of operations is contacted and declares a disaster; activates the disaster site; calls up the DR team to get on a conference call ASAP, …
      3. T2: Head of operations requests backups be sent to the alternate site; personnel are contacted and told to travel to the DR site; contracts for servers, storage and other facilities at the DR site are activated; …
    2. The recorder pays particular attention to any problems brought up during the discussion, ties them to the plan step, identifies the originator of the issue, and notes its impact.  Don’t try to solve the problems, just record them and their impact.
    3. The leader or their designee maintains an official plan timeline in real time. This timeline can be kept on a whiteboard or an (excel/visio chart) display for all to see.  Timeline documentation can be kept as a formal record of the walkthru along with the problem list and the BC/DR plan (a minimal record-keeping sketch follows this list).
    4. This step is iterated for every step in the BC/DR plan until the plan is completed.
  6. At the end, the recorder lists all the problems encountered and provides a copy to the plan owner.
  7. The team decides whether another deskcheck review is warranted for this failure scenario (depending on the number and severity of the problems identified).
  8. When the owner of the plan has resolved all the issues, he or she reissues the plan to everyone that was at the meeting.
  9. If another deskcheck is warranted, the leader issues another meeting call.
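For teams that want the timeline and problem list in something more durable than a whiteboard, here is a minimal record-keeping sketch in Python. The field names and severity values are my own assumptions, not part of any formal deskcheck standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TimelineEntry:
    label: str            # e.g. "T1"
    elapsed_hours: float  # time since the disaster declaration
    description: str      # what happens at this step

@dataclass
class Problem:
    step_label: str       # which plan step the issue was raised against
    originator: str       # who raised it
    severity: str         # "major" or "minor" (see the hints below)
    impact: str

@dataclass
class DeskcheckRecord:
    scenario: str
    timeline: List[TimelineEntry] = field(default_factory=list)
    problems: List[Problem] = field(default_factory=list)

    def summary(self) -> str:
        majors = sum(1 for p in self.problems if p.severity == "major")
        return (f"{self.scenario}: {len(self.timeline)} timeline steps, "
                f"{len(self.problems)} problems ({majors} major)")

# Example mirroring the T0/T1/T2 walkthru above
record = DeskcheckRecord(scenario="7pm Wednesday fire/flood/building collapse")
record.timeline.append(TimelineEntry("T0", 0.0, "Main data center lost, portals down"))
record.timeline.append(TimelineEntry("T1", 1.0, "Disaster declared, DR team on conference call"))
record.problems.append(Problem("T1", "storage admin", "major",
                               "Offsite backup catalog location not documented"))
print(record.summary())
```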

This can take anywhere from half a day to a couple of days. BUT deskchecking your BC/DR plan can be significantly less costly than any actual test.  Nevertheless, a deskcheck cannot replace an actual BC/DR plan simulation test on real hardware/software.

Some other hints from code and design inspections

  • For code or design inspections, a checklist of high probability errors is used to identify and familiarize everyone with those errors.  Checklists can focus participant review on looking for the most probable errors. The leader can discuss these most likely errors at the pre-deskcheck meeting.
  • Also, problems are given severities, like major or minor problems.  For example,  a BC/DR plan “minor” problem might be an inadequate duration estimate for an activity.  A “major” problem might be a mission critical app not coming up after a disaster.

So that's what a BC/DR plan deskcheck would look like. If you did a BC/DR plan deskcheck once a quarter you are probably doing better than most.  And if, on top of that, you did a yearly full scale DR simulation on real hardware, you would be considered well prepared in my view.  What do you think?

PC-as-a-Service (PCaaS) using VDI

IBM PC Computer by Mess of Pottage (cc) (from Flickr)

Last year at VMworld, VMware was saying that 2010 was the year for VDI (virtual desktop infrastructure); last week NetApp said that most large NY banks they talked with were looking at implementing VDI; and prior to that, HP StorageWorks announced a new VDI reference platform that could support ~1600 VDI images.  It seems that VDI is gaining some serious interest.

While VDI works well for large organizations, there doesn’t seem to be any similar solution for consumers. The typical consumer today runs downlevel OS’s, anti-virus, office applications, etc., and has neither the time nor the inclination to update such software.  These consumers would be considerably better served by something like PCaaS, if such a thing existed.

PCaaS

Essentially, PCaaS would be a VDI-like service offering, using standard VDI tools or something similar with a lightweight kernel, making use of locally attached resources (printers, usb sticks, scanners, etc.) but running applications that were hosted elsewhere.  PCaaS could provide all the latest O/S and applications along with enterprise class reliability, support and backup/restore services.

Broadband

One potential problem with PCaaS is the need for reliable broadband to the home. Just like other cloud services, without broadband, none of this will work.

Possibly this could be circumvented if a PCaaS viewer/browser application were available (like VMware’s Viewer). With this in place, PCaaS could be supplied at any internet enabled location supporting browser access.  Such a browser based service may not support the same rich menu of local resources as a normal PCaaS client, but it would probably suffice when needed. The other nice thing about a viewer is that smart phones, iPads and other always-on, web-enabled devices supporting standard browsers could provide PCaaS services anywhere mobile data or WiFi is available.

PCaaS business model

As for businesses that could bring PC-as-a-Service to life, I see many potential providers:

  • Any current PC hardware vendor/supplier may want to supply PCaaS, as it may defer/reduce hardware purchases or, rather, move such purchasing from the consumer to companies.
  • Many SMB hosting providers could easily offer such a service.
  • Many local IT support services could deliver better and potentially less expensive services to their customers by offering PCaaS.
  • Any web hosting company would have the networking, server infrastructure and technical know-how to easily provide PCaaS.

This list ignores any new entrants that would see this as a significant opportunity.

Google, Microsoft and others seem to be taking small steps to do this in a piecemeal fashion, with cloud enabled office/email applications. However, in my view what the consumer really wants is a complete PC, not just some select group of office applications.

As described above, PCaaS would bring enterprise level IT desktop services to the consumer marketplace. Any substantive PCaaS business would also free up the untold numbers of technically astute individuals providing unpaid, on-call support to millions, perhaps billions, of technically challenged consumers.

Now if someone would just come out with Mac-as-a-Service, I could retire from supporting my family’s Apple desktops & laptops…

Building a green data center

Diversity in the Ecological Soup by jurvetson (cc) (from Flickr)

At NetApp's Analyst Days last week, David Robbins, CTO of Information Technology, reported on a new, highly efficient Global Dynamic Lab (GDL) data center which they built in Raleigh, North Carolina.  NetApp predicts this new data center will have a power usage effectiveness (PUE) ratio of 1.2.  Most data centers today do well if they can attain a PUE of 2.0.

Recall that PUE is the ratio of all power required by the data center (including such things as IT power, chillers, fans, UPS, transformers, humidifiers, lights, etc.) over just IT power (for racks, storage, servers, and networking gear).  A PUE of 2 says that as much power is used to power and cool the rest of the data center as is used by the IT equipment itself.  An EPA report on Server and Data Center Efficiency said that data centers could reach a PUE of 1.4 if they used the state of the art techniques outlined in the report.  A PUE of 1.2 is a dramatic improvement in data center power efficiency and cuts non-IT power in half versus even that state of the art figure.
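To put numbers on that, here is the back-of-the-envelope arithmetic (the 1MW IT load is just an assumed figure for illustration):

```python
# PUE = total facility power / IT equipment power
it_load_kw = 1000    # assume a 1MW IT load for illustration

for pue in (2.0, 1.4, 1.2):
    total_kw = it_load_kw * pue
    overhead_kw = total_kw - it_load_kw   # chillers, fans, UPS, lights, etc.
    print(f"PUE {pue}: total {total_kw:.0f} kW, non-IT overhead {overhead_kw:.0f} kW")

# PUE 2.0 carries 1000 kW of overhead, PUE 1.4 carries 400 kW, and PUE 1.2 only
# 200 kW, i.e., half the overhead of the EPA's state of the art figure.
```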

There were many innovations used by NetApp to reach the power effectiveness at GDL. The most important ones were:

  • Cooling at higher temperatures which allowed for the use of ambient air
  • Cold-room, warm aisle layout which allowed finer control over cooling delivery to the racks
  • Top-down cooling which used physics to reduce fan load.

GDL was designed to accommodate the higher rack power densities coming from today’s technology. GDL supports an average of 12kW per rack and can handle a peak load of 42kW per rack.  In addition, GDL uses 52U tall racks which helps reduce data center foot print.  Such high powered, high density racks require rethinking data center cooling.

Cooling at higher temperatures

Probably the most significant factor that improved PUE was planning for the use of much warmer air temperatures.  By using warmer air (70-80F/21.1-26.7C), much of the cooling could be based on ambient air rather than chilled air.  NetApp estimates that they can use ambient air alone 75% of the year in Raleigh, a fairly warm and humid location.  As such, GDL chiller use is reduced significantly, which generates significant energy savings from the number 2 power consumer in most data centers.

Also, NetApp is able to use ambient air for partial cooling for much of the rest of the year, in conjunction with chillers.  Air handlers were purchased that could use outside air, chillers or a combination of the two.  GDL chillers also operate more efficiently at the higher temperatures, reducing power requirements yet again.

Given the ~20-25F/11-14C temperature rise across typical IT equipment, one potential problem is that the warm aisles can exceed 100F/37.8C, which is about the upper limit for human comfort. Fortunately, by detecting lighting use in the hot aisles, GDL can increase cold room cooling to bring temperatures in adjacent hot aisles down to a more comfortable level when humans are present.

One other significant advantage to using warmer temperatures is that warmer air is easier to move than colder air.  This provides savings by allowing lower powered fans to cool the data center.

Cold rooms-warm aisles

GDL built cold rooms at the front side of the racks and a relatively open warm aisle on the other side.  Such a design provides uniform cooling from the top to the bottom of a rack.  With a more open air design, hot air often accumulates and is trapped at the top of the rack, which requires more cooling to compensate.  By sealing the cold room, GDL ensures more uniform cooling of the rack and thus more efficient use of cooling.

Another advantage provided by cold rooms-warm aisles is that cooling activity can be regulated by pressure differentials between the two aisles rather than by flow control or spot temperature sensors.  Such regulation allows GDL to match the air supply to rack requirements and thereby avoid the excess cooling required by more open designs using flow or temperature sensors.
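As I understand the idea, the regulation loop looks something like the sketch below. The setpoint, gain and pressure readings are purely illustrative and not NetApp's actual controls.

```python
def adjust_fan_speed(current_speed_pct, cold_room_pa, warm_aisle_pa,
                     target_delta_pa=5.0, gain=2.0):
    """Toy proportional controller: keep the cold room slightly pressurized
    relative to the warm aisle so the air supplied matches what the racks draw."""
    error = target_delta_pa - (cold_room_pa - warm_aisle_pa)
    new_speed = current_speed_pct + gain * error
    return max(20.0, min(100.0, new_speed))   # clamp to the fans' working range

# If the racks pull more air, cold room pressure drops and the fans ramp up;
# if demand falls, the fans slow down instead of over-cooling the room the way
# a flow- or temperature-sensor design might.
speed = 50.0
for cold_pa, warm_pa in [(3.0, 0.0), (5.5, 0.0), (4.8, 0.0)]:
    speed = adjust_fan_speed(speed, cold_pa, warm_pa)
    print(f"fan speed now {speed:.1f}%")
```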

Top down cooling

I run into this every day at my office: cool air is dense and flows downward, hot air is light and flows upward.  NetApp designed GDL with air handlers on top of the computer room rather than elsewhere.  This eliminates much of the ductwork, which often reduces air flow efficiency and requires increased fan power to compensate.  Also, by piping the cooling in from above, physics helps get that cold air to the racked equipment that needs it.  As for the hot aisles, warm air naturally rises to the air return above the aisles and can then be vented to the outside, mixed with outside ambient air, or chilled before it’s returned to the cold room.

For normal data centers cooled from below, fan power must be increased to move the cool air up to the top of the rack.  GDL’s top down cooling reduces fan power requirements substantially compared with below-the-floor cooling.

—-

There were other approaches which helped GDL reduce power use, such as using hot air for office heating, but these seemed to be the main ones.  Much of this was presented at NetApp’s Analyst Days last week.  Robbins has written a white paper which goes into much more detail on GDL’s PUE savings and other benefits that accrued to NetApp when they built this data center.

One nice surprise was the capital cost savings generated by GDL’s power efficient data center design.  This is also detailed in the white paper, but at the time this post was published the paper was not yet available.

Now that summer’s here in the north, I think I want a cold room-warm aisle for my office…

More on data growth from NetApp analyst days customers

Installing a power line at the Tram. Pat, Allan and Chris by bossco (cc) (from Flickr)

Some customers at NetApp’s Analyst Days were discussing their deployments of NetApp storage with Dave Hitz, the new storage efficiency czar, and others, but I was more interested in their comments on storage growth issues. Jonathan Bartes of Virginia Farm Bureau mentioned that the “natural growth rate of unstructured data” seemed to be about 20% per year, but some of the other customers had even higher growth rates.

Tucson Electric Power

Christopher Jeffry Rima from Tucson Electric Power is dealing with a 70% CAGR in data growth. What’s driving this is primarily regulation (power companies are heavily regulated utilities in the USA), high resolution imagery/GIS data, and power management/smart metering. It turns out imagery has increased in resolution by about 10X in a matter of years, and they use such images as work plan overlays for field work to fix, upgrade or retire equipment. It seems they have hi-res images of all the power equipment and lines in their jurisdiction, which are updated periodically via fly overs.

The other thing that’s driving their data growth is smart metering and demand power management. I have talked about smart metering data appetite before. But demand management was new to me.

Rima said that demand management is similar to smart metering but adds real time modeling of demand and capacity plus bi-directional transmissions to request that consumers shed demand when required. Smart meters and real time generation data feed the load management model used to predict peak demand over the next time period, which is then used to determine whether or not to shed demand.   It turns out that at ~60% utilization the power grid is much more cost effective than at 80%, due to the need to turn on gas generators which cost more than coal. In any case, when their prediction model shows utilization will top ~60-70% they start shedding load.
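A toy version of that decision logic might look like the following. The 65% threshold, the capacity figure and the forecast values are my assumptions based on Rima's description, not TEP's actual model.

```python
def should_shed_load(predicted_demand_mw, generation_capacity_mw, shed_threshold=0.65):
    """Curtail demand when predicted utilization tops the ~60-70% range,
    before more expensive gas generation has to come online."""
    utilization = predicted_demand_mw / generation_capacity_mw
    return utilization > shed_threshold, utilization

# Smart-meter and real-time generation feeds would drive the forecast;
# here we just supply a few made-up predictions for the next interval.
capacity_mw = 2000.0
for forecast_mw in (1100.0, 1350.0, 1500.0):
    shed, util = should_shed_load(forecast_mw, capacity_mw)
    action = "request consumers to shed demand" if shed else "no action"
    print(f"forecast {forecast_mw:.0f} MW ({util:.0%} utilization): {action}")
```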

Arup

Another customer, Neil Clover from Arup (a construction/engineering firm), talked about 3D building/site modeling and fire simulation flow dynamics modeling. Clover lamented that it’s not unusual to have a TB of data show up out of nowhere for a project they just took on.

incendio en el edificio 04 by donrenexito (cc) (from Flickr)

Clover said the fire flow modeling’s increasing resolution and multiple iterations under varying conditions are generating lots of data. The 3D models are also causing serious data growth and need to be maintained across the design, build and operate cycle of buildings.  A TB of data showing up on your data center storage with no advance notice – incredible.  All this and more is pushing Clover’s data growth to around 70% per year.

University Hospitals Leuven, Belgium

The day before at the analyst meeting, Reinoud Reynders from University Hospitals Leuven, Belgium mentioned some key drivers of data growth at their hospital: digital pathology studies that generate about 100GB each and are done about 100 times a day, and DNA studies that generate about 1TB of data each and take about a week to create.  This is higher than I predicted, almost 16X higher.  However, Reynders said the DNA studies are still pretty expensive at $15K USD each, but he forecasts costs decreasing dramatically over the coming years with a commensurate increase in volume.

But the more critical current issue might be the digital pathology exams at ~10TB per day.  The saving grace for pathology exams is that such studies can be archived when completed rather than kept online. Reynders also mentioned that digital radiology and imaging studies are creating massive amounts of data as well, and unfortunately this data must be kept online because it is re-referenced often and with no predictability.
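The arithmetic behind those volumes is worth spelling out. The one-study-per-week DNA rate and the year-round pathology schedule are my own rough assumptions from the figures quoted above.

```python
# Digital pathology: ~100GB per study, ~100 studies per day
pathology_tb_per_day = 100 * 100 / 1000            # 10 TB/day
pathology_pb_per_year = pathology_tb_per_day * 365 / 1000

# DNA studies: ~1TB each, roughly one per week today (cost-limited at ~$15K each)
dna_tb_per_year_today = 1 * 52

print(f"pathology: {pathology_tb_per_day:.0f} TB/day, "
      f"~{pathology_pb_per_year:.1f} PB/year if it all stayed online")
print(f"DNA studies today: ~{dna_tb_per_year_today} TB/year; a big drop in cost "
      "per study could push that toward PB/year volumes")
```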

While data growth was an understated concern during much of the conference sessions, how customers dealt with such (ab?)normal growth using NetApp storage and ONTAP functionality was the main topic of their presentations.  An explanation of this NetApp functionality and how effective it was at managing data growth will need to await another day.

SPC-1&-1/E results IOPS/Drive – chart of the month

Top IOPS(tm) per drive for SPC-1 & -1/E results as of 27May2010

The chart shown here reflects information from a SCI StorInt(tm) dispatch on the latest Storage Performance Council benchmark results and depicts the top IO operations per second per installed drive for SPC-1 and SPC-1/E submissions.  This particular storage performance metric is one of the harder ones to game.  For example, adding more drives to perform better does nothing for this view.

The recent SPC-1 submissions were from Huawei Symantec’s Oceanspace S2600 and S5600, Fujitsu Eternus DX400 and DX8400, and the latest IBM DS8700 with EasyTier, SSD and SATA drives. Of these results, the only one to show up on this chart was the low-end Huawei Symantec S2600.  It used only 48 drives and attained ~17K IOPS as measured by SPC-1.
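The metric itself is simple division, which is exactly why it is hard to game. Using the rounded S2600 figures quoted above:

```python
def iops_per_drive(total_iops, drive_count):
    return total_iops / drive_count

# Huawei Symantec S2600: ~17K SPC-1 IOPS on 48 drives
print(f"S2600: {iops_per_drive(17_000, 48):.0f} IOPS/drive")

# Adding drives raises total IOPS but, at best, leaves this ratio flat;
# doubling the spindle count with the same per-drive yield changes nothing.
print(f"doubled config: {iops_per_drive(2 * 17_000, 2 * 48):.0f} IOPS/drive")
```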

Other changes to this chart included the addition of Xiotech’s Emprise 5000 SPC-1/E  runs with both 146GB and 600GB drives.  We added the SPC-1/E results because they execute the exact same set of tests and generate the same performance summaries.

It’s very surprising to see the first use of 600GB drives in an SPC-1/E benchmark show up so well here, and the very respectable #2 result from the 146GB drive version indicates excellent per-drive performance yields.  The only other non-146GB drive result was for the Fujitsu DX80, which used 300GB drives.

Also, as readers of our storage performance dispatches may recall, the Sun (now Oracle) J4400 array provided no RAID support for its benchmark run.  We view this as an unusable configuration, although its advantage vis-a-vis IOPS/drive is probably debatable.

A couple of other caveats to this comparison:

  • We do not include pure SSD configurations as they would easily dominate this metric.
  • We do not include benchmarks that use 73GB drives as they would offer a slight advantage and such small drives are difficult to purchase nowadays.

We are somewhat in a quandary about showing mixed drive (capacity) configurations.  In fact, an earlier version of this chart, without the two Xiotech SPC-1/E results, showed the IBM DS8700 EasyTier configuration with SSDs and rotating SATA disks.  In that version the DS8700 came in at a rough tie with the then 7th place Fujitsu ETERNUS2000 subsystem.  For the time being, we have decided not to include mixed drive configurations in this comparison, but would welcome any feedback on this decision.

As always, we appreciate any comments on our performance analysis. Also if you are interested in receiving your own free copy of our newsletter with the full SPC performance report in it please subscribe to our newsletter.  The full report will be made available on the dispatches section of our website in a couple of weeks.

Punctuated equilibrium for business success

Finch evolution (from http://rst.gsfc.nasa.gov/Sect20/A12d.html)

Coming out of the deep recession of 2007-2009, I am struck by how closely business success during a recession resembles what ecologists call punctuated equilibrium.  As Wikipedia defines it, punctuated equilibria “… is a model for discontinuous tempos of change (in) the process of speciation and the deployment of species in geological time.”

This looks to me just like a strategic inflection point.  That is, punctuated equilibrium is a dramatic, discontinuous change in a market or an environment which brings about great opportunity for gain or loss.  Such opportunities can significantly increase business market share if addressed properly. But if handled wrong, species and/or market share can vanish with surprising speed.

Galapagos Finches

I first heard of punctuated equilibrium from the Pulitzer prize-winning book The Beak of the Finch by Jonathan Weiner, which documented a study done by two ecologists on Galapagos island finches over the course of a decade or so.  Year after year they went back and mapped out the lives and times of various species of finches on the island.  After a while they came to the conclusion that they were not going to see any real change in the finches during their study; the successful species were holding their own and the unsuccessful species were barely hanging on.  But then something unusual occurred.

As I recall, there was a great drought on the islands which left the more usual soft-skinned nut finch food unavailable.  During this disaster, a segment of finches that hadn’t been doing all that well on the islands but had a more powerful beak was able to rapidly gain population, and there was evidence that finch speciation was actually taking place.  It turns out this powerful beak, which was a liability in normal times, was better able to break open the harder nuts that were relatively more plentiful during the drought but normally untouched.

Recessionary Business Success

Similar to the finches, certain business characteristics that might be considered disadvantageous in better times can reap significant gains during a recession.  Specifically,

  • Diverse product & service portfolio – multiple products and services that appeal to different customer segments/verticals/sizes can help by selling to differing businesses, some of which may be suffering and some of which may do OK during a recession.
  • Diverse regional revenue sources – multiple revenue streams coming from first world, developing and third world localities around the world can help by selling to regions that are less impacted by any economic catastrophe.
  • Cash savings – a sizable savings account can help a company continue to spend on the right activities, letting it emerge from recession much stronger than competitors forced to cut spending to conserve cash.
  • Marketing discipline – understanding how marketing directly influences revenue can help companies better identify and invest in those activities that maximize revenue per marketing dollar.
  • Development discipline – understanding how to develop products that deliver customer value can help companies better identify and invest in those activities that generate more revenue per R&D dollar.

There are probably other characteristics I have missed, but these will suffice. For example, consider cash savings: a large cash hoard is probably a poor investment when times are good.  Also, diverse product and regional revenue streams may be considered unfocused and distracting when money is flooding in from main product lines sold in first world regions.  But when times are tough in most areas around the globe or in most business verticals, having diverse revenue sources that span the whole globe and/or all business segments can be the difference between life and death.

The two obvious exceptions here are marketing and development discipline.  It’s hard for me to see a potential downside to doing these well.  Both obviously require time, effort and resources to excel at, but the payoffs are present in good times and bad.

I am often amazed by the differences in how companies react to adversity.  Recession is just another, more pressing example of this.   Recessions, like industry transformations, are facts of life today; failing to plan for them is a critical leadership defect that can threaten long-term business survival.

More cloud storage gateways come out

Strange Clouds by michaelroper (cc) (from Flickr)

Multiple cloud storage gateways have either been announced or are coming out in the next quarter or so. We have talked before about Nasuni’s file cloud storage gateway appliance, but now that more are out, one can better appreciate the cloud gateway space.

StorSimple

Last week I was talking with StorSimple, which just introduced a cloud storage gateway that provides an iSCSI block protocol interface to cloud storage with onsite data caching.  Their appliance offers a cloud storage cache residing on disk and/or optional flash storage (SSDs), providing iSCSI storage speeds for highly active working set data residing in the cache and cloud storage speeds for non-working set data.

Data is deduplicated to minimize storage space requirements.  In addition, data sent to the cloud is compressed and encrypted. Both deduplication and compression can reduce WAN bandwidth requirements considerably.  The appliance also offers snapshots and “cloud clones”.  Cloud clones are complete offsite (cloud) copies of a LUN which can then be kept in sync with the gateway LUNs by copying daily change logs and applying them.
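StorSimple has not published its internals, but conceptually the data path to the cloud looks something like the sketch below. The fixed 64KiB chunk size, the SHA-256 fingerprints and the encryption placeholder are my assumptions for illustration, not StorSimple's actual design.

```python
import hashlib
import zlib

CHUNK_SIZE = 64 * 1024      # assumed fixed-size chunking, for illustration only
fingerprint_index = {}      # fingerprint -> cloud object name (the dedupe store)

def chunks_to_cloud(data, upload):
    """Dedupe, compress (and, in a real gateway, encrypt) chunks before upload.
    Returns the chunk references the LUN metadata would record for this write."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in fingerprint_index:          # only new chunks cross the WAN
            payload = zlib.compress(chunk)       # compression cuts WAN bandwidth
            # payload = encrypt(payload)         # encryption would happen here
            object_name = f"chunk-{fp}"
            upload(object_name, payload)
            fingerprint_index[fp] = object_name
        refs.append(fingerprint_index[fp])
    return refs

# Example: the same 128KiB buffer written twice uploads its chunks only once.
uploaded = []
buffer = b"x" * CHUNK_SIZE + b"y" * CHUNK_SIZE
chunks_to_cloud(buffer, lambda name, blob: uploaded.append(name))
chunks_to_cloud(buffer, lambda name, blob: uploaded.append(name))
print(f"{len(uploaded)} chunks actually uploaded")   # 2, not 4
```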

StorSimple works with Microsoft’s Azure, AT&T, EMC Atmos, Iron Mountain and Amazon’s S3 cloud storage providers.   A single appliance can support multiple cloud storage providers, segregated on a LUN basis, although how cross-LUN deduplication works across multiple cloud storage providers was not discussed.

The product can be purchased as a hardware appliance with a few hundred GB of NAND/flash storage and up to 150TB of SATA storage.  It can also be purchased as a virtual appliance at lower cost, but with much lower performance.

Cirtas

In addition to StorSimple, I have talked with Cirtas, which has yet to completely emerge from stealth, but what’s apparent from their website is that the Cirtas appliance provides “storage protocols” to server systems and can store data directly on storage subsystems or on cloud storage.

“Storage protocols” could mean any block storage protocol, i.e., FC and/or iSCSI, but alternatively it might mean file protocols; I can’t be certain.  Having access to independent, standalone storage arrays may mean that clients can use their own storage as a ‘cloud data cache’.  It’s unclear how Cirtas talks to its onsite backend storage, but presumably this is FC and/or iSCSI as well.  And somehow some of this data is stored out on the cloud.

So from our perspective it looks somewhat similar to StorSimple, with the exception that Cirtas uses external storage subsystems for its cloud data cache vs. StorSimple’s internal storage.  Few other details were publicly available as this post went out.

Panzura

Although I have not talked directly with Panzura, they seem to offer a unique form of cloud storage gateway, one that is specific to certain applications.  For example, the Panzura SharePoint appliance actually “runs” part of the SharePoint application (according to their website) and, as such, can better ascertain which data should be local versus stored in the cloud.  It seems to have access both to cloud storage and to local independent storage appliances.

In addition to a SharePoint appliance they offer a “backup/DR” target that apparently supports NDMP, VTL, iSCSI, and NFS/CIFS protocols to store (backup) data on the cloud. In this version they show no local storage behind their appliance, from which I assume that backup data is only stored in the cloud.

Finally, they offer a “file sharing” appliance used to share files across multiple sites, where files reside both locally and in the cloud.  It appears that cloud copies of shared files are locked/WORM-like, but I can’t be certain.  Having not talked to Panzura, much of their product remains unclear to me.

In summary

We now have both a file access and at least one iSCSI block protocol cloud storage gateway currently available and publicly announced, i.e., Nasuni and StorSimple.  Cirtas, which is in the process of coming out, will support some “storage protocol” access to cloud storage, and Panzura offers it all (SharePoint direct, iSCSI, CIFS, NFS, VTL & NDMP cloud storage access protocols).  There are other gateways focused only on backup data, but I reserve the term cloud storage gateway for those that provide some sort of general purpose storage or file protocol access.

However, since last week’s discussion of eventual consistency, I have become a bit more concerned about cloud storage gateways and their capabilities.  This deserves some serious discussion at the cloud storage provider level but, most assuredly, at the gateway level.  We need some sort of generic statement that gateways guarantee immediate consistency for data at the gateway level even though most cloud storage providers only support “eventual consistency”.  Barring that, using cloud storage for anything that is updated frequently would be unwise.
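The guarantee I have in mind looks like the sketch below: the gateway answers reads from its own cache for any block it has acknowledged, so applications see immediate consistency even though the cloud copy behind it is only eventually consistent. This is my illustration of the property, not any particular vendor's design.

```python
class ConsistentGateway:
    """Write-back gateway sketch: acknowledge writes locally, flush to the cloud
    asynchronously, and never serve a block from the cloud while a newer local
    copy is still pending. Illustrative only."""

    def __init__(self, cloud_get, cloud_put):
        self.cache = {}       # block -> latest locally acknowledged data
        self.dirty = set()    # blocks not yet durable in the cloud
        self.cloud_get = cloud_get
        self.cloud_put = cloud_put

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)
        return "ack"          # immediate consistency from the client's point of view

    def read(self, block):
        if block in self.cache:        # covers every block this gateway has written
            return self.cache[block]
        return self.cloud_get(block)   # only never-written blocks risk staleness

    def flush(self):
        for block in list(self.dirty):
            self.cloud_put(block, self.cache[block])
            self.dirty.discard(block)

# An in-memory dict stands in for the (eventually consistent) cloud back end.
cloud = {}
gw = ConsistentGateway(cloud.get, cloud.__setitem__)
gw.write(7, b"new data")
assert gw.read(7) == b"new data"   # read-your-writes even before flush()
gw.flush()
```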

If anyone knows of another cloud storage gateway, I would appreciate a heads up.  In any case, the technology is still young, and I would say this isn’t the last gateway to come out, but it feels like these provide coverage for just about any file or block protocol one might use to access cloud storage.