Storage utilization is broke

Storage virtualization is a great concept, but it has some serious limitations. One that comes to mind is the way we measure storage utilization.

Just look at what made VMware so successful.  As far as I can see, it was mainly the fact that x86 servers were vastly underutilized, sometimes achieving only single-digit utilization with normal applications.  This meant you could potentially run 10 or more of these applications on a single server.  Then dual-core, quad-core, and eight-core processor chips started showing up, which made non-virtualized systems seem even more of a waste.

We need a better way to measure storage utilization in order to show customers one of the ways storage virtualization can help.

Although the storage industry talks a lot about storage utilization, it really means capacity utilization.  There is no sense, no measurement, no idea of what the performance utilization of a storage system is.

One storage startup, Nexgen (a hybrid SSD-disk storage system), is looking at performance utilization, but from the standpoint of partitioning out system performance to different applications, not as a better way to measure system utilization.

Historical problems with storage performance utilization

I think one problem may be that it’s much harder to measure storage performance utilization.  With a server processor it’s relatively easy to measure idle time: one just needs some place in the O/S to start clocking idle time whenever the server has nothing else to do.

But it’s not so easy in storage systems. Yes, there are still plenty of idle loops, but they can be used to wait until a device delivers or accepts some data.  In that case the storage system is not “technically” idle.  On the other hand, when a storage system is actually waiting for work, this is “true” idle time.

Potential performance utilization metrics

From a storage performance utilization perspective, I see at least three different metrics:

  • Idle IO time – this is probably the closest to what standard server utilization looks like.  It could be accumulated during intervals when no IO is active on the system. Its complement, Busy IO time, would be accumulated every time IO activity is present in the storage (from the storage server[s] perspective); a minimal sketch of this bookkeeping appears after the list.  The sad fact is that plenty of storage systems measure something akin to Idle IO time, but seldom report on it in any sophisticated manner.
  • Idle IOP time – this could be based on some theoretical IOPS rate the system could achieve in its present configuration; anytime it was below that level it would accumulate Idle IOP time. It doesn’t have to be 100% of its rated IOPS performance; it could be targeted at 75%, 50% or even 25% of its configuration-dependent theoretical maximum. But whenever IOPS dropped below this rate, the system would start counting Idle IOP time.  Its complement, Busy IOP time, would be counted anytime the system exceeded the targeted IOPS rate.
  • Idle Throughput time – this could be based on some theoretical data transfer rate the system, in its current configuration, was capable of sustaining; anytime it was transferring less than this rate it would accumulate Idle Throughput time.  Again, this doesn’t have to be the maximum throughput for the storage system, but it needs to be some representative, sustainable level. Its counterpart, Busy Throughput time, would be accumulated anytime the system reached the targeted throughput level.
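As a minimal sketch of the Idle IO time bookkeeping in the first bullet above (the class and method names are purely illustrative, not drawn from any real storage system), a controller could apportion wall-clock time between idle and busy whenever an IO starts or completes:

```python
import time

class IdleIOTimer:
    """Illustrative sketch: accumulate wall-clock time with no IO outstanding
    (Idle IO time) vs. time with one or more IOs in flight (Busy IO time)."""

    def __init__(self):
        self.outstanding = 0                  # IOs currently in flight
        self.idle_time = 0.0                  # seconds with no IO active
        self.busy_time = 0.0                  # seconds with IO activity present
        self._last_event = time.monotonic()

    def _account(self):
        # Attribute the time since the last event to idle or busy,
        # depending on whether any IO was outstanding during that span.
        now = time.monotonic()
        elapsed = now - self._last_event
        if self.outstanding == 0:
            self.idle_time += elapsed
        else:
            self.busy_time += elapsed
        self._last_event = now

    def io_start(self):
        self._account()                       # close out the preceding span
        self.outstanding += 1

    def io_complete(self):
        self._account()                       # the span just ended was busy
        self.outstanding -= 1

    def utilization(self):
        self._account()                       # fold in time since last event
        total = self.idle_time + self.busy_time
        return self.busy_time / total if total else 0.0
```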

Either of the last two measures could be a continuous value rather than an all-or-nothing quantity. For example, if the targeted IOPS rate was 100K and the system sustained 50K IOPS for some time interval, then the Idle IOPS time would be the time interval times 50% (1 - 50K IOPS achieved/100K targeted IOPS).

To calculate storage performance utilization, one would take the Idle IO, Idle IOP and/or Idle Throughput time over a wall clock interval (say 15 minutes) and average this across multiple time periods.  Storage systems could chart these values to show end-users the periodicity of their activity.
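As a rough sketch of that arithmetic (the target rate, interval length and samples below are made-up numbers for illustration only), assuming achieved IOPS is sampled once per 15-minute interval:

```python
# Pro-rated Idle IOP time and average utilization, as described above.
# All numbers are illustrative, not taken from any real system.

INTERVAL_SECS = 15 * 60        # 15-minute wall-clock interval
TARGET_IOPS = 100_000          # configuration-dependent target (e.g., 75% of max)

def idle_iop_time(achieved_iops, target_iops=TARGET_IOPS, interval=INTERVAL_SECS):
    """Idle IOP time for one interval: the interval pro-rated by how far
    the achieved IOPS rate fell short of the targeted rate."""
    shortfall = max(0.0, 1.0 - achieved_iops / target_iops)
    return interval * shortfall

# Achieved IOPS over four consecutive 15-minute intervals (one hour)
samples = [50_000, 20_000, 80_000, 5_000]

idle_times = [idle_iop_time(s) for s in samples]
total_time = INTERVAL_SECS * len(samples)
utilization = 1.0 - sum(idle_times) / total_time

print("Idle IOP time per interval (secs):", idle_times)
print(f"Average IOPS utilization over the hour: {utilization:.0%}")   # ~39%
```

Charting the per-interval values over a day or week would show the periodicity mentioned above.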

This way, on a 15-minute basis, we could understand the busyness of a storage system.  And if we found that the storage was running at 5% IOPS or Throughput utilization most of the time, then implementing storage virtualization on that storage system would make a lot of sense.

Problems with proposed metrics

One problem with the foregoing is that IOPS and throughput rates vary tremendously depending on storage system configuration as well as the type of workload the system is encountering; e.g., 256KB blocks vs. 512-byte blocks can have a significant bearing on the IOPS and throughput rates attainable by any storage system.

There are solutions to these issues, but they all require more work in development, testing and performance modeling.   This may argue for the simpler Idle IO time metric, but I prefer the other measures as they provide more accurate and continuous data.
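There is no single right answer here, but as a purely illustrative sketch (the ceilings below are made-up numbers, not any vendor's specifications), the targeted IOPS rate could itself be derived from the average block size observed during the interval, so that small-block and large-block workloads are judged against different targets:

```python
# Purely illustrative: derive a workload-aware target IOPS rate from an assumed
# small-block IOPS ceiling and an assumed sustainable throughput ceiling.

MAX_SMALL_BLOCK_IOPS = 200_000        # assumed ceiling for small-block IO
MAX_THROUGHPUT_BPS = 2 * 1024**3      # assumed sustainable ceiling: 2 GiB/s

def target_iops(avg_block_bytes):
    """Target IOPS for an interval, limited by whichever ceiling
    (small-block IOPS or large-block throughput) binds first."""
    throughput_limited = MAX_THROUGHPUT_BPS / avg_block_bytes
    return min(MAX_SMALL_BLOCK_IOPS, throughput_limited)

print(target_iops(512))          # 200000 -- 512-byte blocks are IOPS-bound
print(target_iops(256 * 1024))   # 8192.0 -- 256KB blocks are throughput-bound
```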

~~~~

I believe metrics such as the above would be a great start to supplying the information that IT staff need to understand how storage virtualization would be beneficial to an organization. 

There are other problems with the storage virtualization capabilities present today, but these must be subjects for future posts.

Image: Biblioteca José Vasconcelos / Vasconcelos Library by * CliNKer *

Server virtualization vs. storage virtualization

Functional Fusion? by Cain Novocaine (cc) (from Flickr)

One can only be perplexed by the seemingly overwhelming adoption of server virtualization and contrast that with the ho-hum, almost underwhelming adoption of storage virtualization.  Why is there such a significant difference?

I think the problem is partly due to the lack of a common understanding of storage performance utilization.

Why server virtualization succeeded

One significant driver of server virtualization was the precipitous drop in server utilization that occurred over the last decade when running single applications on a physical server.  It was not unusual to see real processor utilization of less than 10%, and consequently it was easy to envision executing 5-10 applications on a single server. And what’s more, each new generation of server kept getting more powerful, handling double the MIPS every 18 months or so, driven by Moore’s law.

The other factor was that application workloads weren’t increasing that much. Yes, new applications would come online, but they seldom consumed an inordinate number of MIPS and were often similar to what was already present. So application processing growth, while not flatlining, was expanding at a relatively slow pace.

Why storage virtualization has failed

Data, on the other hand, continues its never-ending exponential growth, doubling every 3-5 years or less. And having more data almost always requires more storage hardware to support the IOPS needed to access it.

In the past, storage IOPS rates were intrinsically tied to the number of disk heads available to service the load.  Although disk performance grew, it wasn’t doubling every 18 months, and real per-disk performance, measured as IOPS per GB, was actually going down over time.

This drove a proliferation of disk spindles, and with them storage subsystems, in the data center. Storage virtualization couldn’t reduce the number of spindles required to support the workload.

Thus, if you look at storage performance from the perspective of the percentage of IOPS one could support per disk, most sophisticated systems were running anywhere from 75% to 150% (exceeding 100% thanks to DRAM caching).
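To make the spindle arithmetic concrete, here is a back-of-the-envelope sketch using rough, illustrative disk figures (not measurements of any particular product). As capacities grew much faster than per-spindle IOPS, IOPS per GB fell, and spindle counts became dictated by performance rather than capacity:

```python
# Back-of-the-envelope illustration of the IOPS-per-GB squeeze.
# All figures are rough, illustrative numbers, not drive specifications.

drives = {
    # name: (capacity_gb, iops_per_spindle)
    "older 36GB 15K RPM drive": (36, 180),
    "newer 2TB 7.2K RPM drive": (2000, 80),
}

dataset_gb = 10_000        # hypothetical 10TB data set
workload_iops = 10_000     # hypothetical workload of 1 IOPS per GB of data

for name, (cap_gb, iops) in drives.items():
    print(f"{name}: {iops / cap_gb:.2f} IOPS/GB")
    for_capacity = dataset_gb / cap_gb        # spindles needed to hold the data
    for_performance = workload_iops / iops    # spindles needed to serve the IOPS
    print(f"  spindles for capacity: {for_capacity:.0f}, "
          f"for IOPS: {for_performance:.0f}, "
          f"required: {max(for_capacity, for_performance):.0f}")
```

With the newer, bigger drives the spindle count is dictated by IOPS rather than capacity, which is why consolidating capacity through virtualization couldn’t shrink the number of spindles.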

Paradigm shift ahead

But SSDs can change this dynamic considerably.  A typical SSD can sustain 10-100K IOPS, and there is some likelihood that this will increase with each generation that comes out, while application requirements will not increase as fast.  Hence, there is a high likelihood that normal data center utilization of SSD storage performance will start to drop below 50%. When that happens, storage virtualization may start to make a lot more sense.

Maybe when (SSD) data storage starts moving more in line with Moore’s law, storage virtualization will become a more dominant paradigm for data center storage use.

Any bets on who the VMware of storage virtualization will be?

Comments?