The promise of software defined storage

Data hypervisor, software defined storage, data plane, control plane
(c) 2012 Silverton Consulting, Inc. All rights reserved

Not sure why, but all the hype around software defined storage seems to be reaching a crescendo.  Possibly it's due to conference season coming up, but it started earlier this year.  I attended an SNW analyst session on software defined storage that had on its panel technical people from HDS, IBM, DataCore and VMware.  It seems the distinction between storage virtualization and software defined storage gets slimmer every time we talk about it.  I have written before about software defined storage (see my Data Hypervisor post).

Server, networking and storage virtualization today

Server virtualization makes an awful lot of sense, has made lots of money, and has arguably been around for decades now, especially in mainframe systems.  Servers have so much power today that dedicating one to a single workload just doesn’t make any sense anymore.

Network virtualization from OpenFlow and others also makes a lot of sense (see OpenFlow the next wave in networking and OpenFlow part 2, Cisco’s response posts). Here we aren’t necessarily boosting network utilization as much as changing resource allocation to deal with altered traffic flows.  That, and the fact that provisioning, monitoring and other management characteristics can now be under programmatic control from the user, makes these systems very appealing, especially to organizations that exhibit varying network activity over time.

Storage virtualization has been around for a long time too and essentially places a storage system abstraction layer on top of a group of other, heterogeneous storage systems. This provides a number of capabilities, such as allowing data to be migrated from one storage system to another without host knowledge or intervention.  Other storage virtualization features include centralized management, common storage features, different storage personalities (protocols), etc. But just being able to migrate data from one storage system to another without host intervention or knowledge provides an awful lot of value, especially to large data centers which refresh technology frequently.

Software defined storage compared to server virtualization

Software defined storage seems to imply some ability to marry storage virtualization services to RESTful and other APIs which would allow programmatic storage provisioning, monitoring and management.  This would allow data centers to manage and control their storage without involving storage administrators in day-to-day activities.
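To make the idea concrete, here is a minimal sketch of what programmatic provisioning through a RESTful API might look like. The endpoint path, field names and QoS parameter are all hypothetical illustrations, not any particular vendor's API:

```python
import json

def make_provision_request(name, size_gb, tier, iops_limit=None):
    """Build the JSON body a management tool might POST to a hypothetical
    /api/v1/volumes endpoint to provision a volume without admin involvement."""
    body = {
        "volume": {
            "name": name,
            "size_gb": size_gb,
            "tier": tier,            # e.g. "flash" or "nl-sas" (assumed names)
            "thin_provisioned": True,
        }
    }
    if iops_limit is not None:
        # Optional QoS cap, illustrating how performance could be
        # provisioned programmatically alongside capacity.
        body["volume"]["qos"] = {"iops_limit": iops_limit}
    return json.dumps(body)

req = make_provision_request("app01-data", 500, "flash", iops_limit=20000)
print(req)
```

The point is less the payload itself than that capacity, tier and performance all become parameters an orchestration layer can set, rather than tickets for a storage administrator.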

When I compare this to server virtualization, the above described capabilities really don’t increase storage utilization much.  Yes, by automating provisioning or even running thin provisioning one can potentially boost storage capacity utilization, but you really haven’t increased IO utilization much by doing this.

Looking under the covers of most storage systems, one might find that CPU cores are pretty idle, but data paths and storage devices are typically running flat out.  One problem is that today’s enterprise storage subsystems are already highly shared across applications and users, so there is really no barrier to sharing these resources as widely as possible.   As such, storage system IOPS and/or bandwidth utilization is already pretty high.   I would say a typical enterprise application environment’s storage subsystem utilization usually runs above 30%, reaching 50% or more during peak time periods. Increasing IOPS utilization much beyond that risks seriously impacting peak performance periods.

Now if somehow one could migrate data around a complex, moving it to lower performing storage when there’s no need for high performance and to higher performing storage when there is, then that could help increase performance utilization considerably.   But many storage systems already do this internally through automated storage tiering, and some can even do this across storage systems using storage virtualization.

But the underlying problem here is that it takes a lot of time, resources and effort to move TBs of data around a data center, especially when it’s doing other work.  So other than something akin to storage tiering across storage systems, we are unlikely to see much increase in storage performance utilization with a gaggle of multiple storage systems.  I suppose in the future moving TBs of data may take much less time and resources than today, but then the problem becomes moving PBs of data around.
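A quick back-of-the-envelope calculation shows the scale of the problem; the 1 GB/s sustained copy rate is an assumed round number, not a measured figure:

```python
def migration_hours(data_tb, rate_gb_per_s):
    """Hours to move data_tb terabytes at a sustained copy rate of
    rate_gb_per_s gigabytes per second (1 TB taken as 1000 GB)."""
    seconds = (data_tb * 1000) / rate_gb_per_s
    return seconds / 3600

# 10 TB at a sustained 1 GB/s is most of a working morning...
print(round(migration_hours(10, 1.0), 1))    # ~2.8 hours
# ...and a PB at the same rate takes over a week and a half.
print(round(migration_hours(1000, 1.0)))     # ~278 hours
```

And that assumes the source system, target system and network can all sustain that rate while also serving production IO, which in practice they usually can't.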

Software defined storage compared to network virtualization

When I compare the above capabilities to network virtualization it doesn’t look very similar.   There’s really no way to change the storage performance to optimize it for one direction (or application) at this instant and then move storage performance around to another application a couple of hours later.  Yes, again automated storage tiering can do this, and yes some of these systems can tier across storage systems using storage virtualization but in general barring storage tiering there’s nothing like this available today.  

Maybe if, inside a storage system, the data paths could somehow be programmatically reconfigured to offer, say, more internal bandwidth to the Device-to-Cache path vs. the Cache-to-Frontend path. Changing or reconfiguring data path resources like this could certainly optimize the internal performance of a storage system, and this would be a worthwhile feature of any software defined storage.  Knowing which is more important to one application and less important to all the others will take some smarts across the storage system and host O/S, but it’s certainly feasible.  So, with RESTful interfaces, APIs or application hints, data paths could be reconfigured on demand to support applications that are all vying for IO activity.
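As a toy sketch of that idea, the class below reallocates a fixed internal bandwidth budget between the two paths based on an application hint. The path names, the 100-unit budget and the hint vocabulary are all invented for illustration; no real array exposes this interface today:

```python
class DataPaths:
    """Hypothetical internal data-path scheduler for a storage system."""
    TOTAL_BW = 100  # arbitrary bandwidth units shared by the two paths

    def __init__(self):
        # Start with an even split between the two internal paths.
        self.alloc = {"device_to_cache": 50, "cache_to_frontend": 50}

    def apply_hint(self, hint):
        """Shift bandwidth toward the path the workload hint favors."""
        if hint == "sequential_read":     # prefetch-heavy: feed the cache
            self.alloc = {"device_to_cache": 70, "cache_to_frontend": 30}
        elif hint == "cache_friendly":    # mostly cache hits: feed the host
            self.alloc = {"device_to_cache": 30, "cache_to_frontend": 70}
        else:                             # unknown workload: even split
            self.alloc = {"device_to_cache": 50, "cache_to_frontend": 50}
        # The budget is fixed; hints only move it around.
        assert sum(self.alloc.values()) == self.TOTAL_BW
        return self.alloc

paths = DataPaths()
print(paths.apply_hint("sequential_read"))
```

The hard part, as noted above, isn't the reallocation mechanism but deciding which application deserves the bandwidth at any moment.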

With these sorts of capabilities software defined storage starts to look a little more like software defined networking.

Software defined storage on its own

But in the end we always reach a fundamental limit of IO capabilities in today’s storage systems: the devices. Yes, you can have 2000 or more devices in high-end storage today, and yes, you can have all-flash arrays. However, most storage systems are configured to keep whatever devices they have pretty busy as much of the time as possible.

Until we create some sort of storage device that can provide more performance than most applications can ever use, even when they are shared via a storage system, software defined storage capabilities will be limited.  Today’s SSDs have certainly boosted performance considerably but this just means that most applications that warrant all flash arrays are performing faster.  It just so happens that some applications can take all the performance you throw at them and still want more.

I suppose if SSD costs were to come down to match NL-SAS storage prices while still maintaining a 100X faster IOPS rate, then maybe a storage system built on such devices could be more “software defined” than others.  And maybe that’s where everyone is headed, believing NAND/SSD price trends will drive costs down so much that everyone can have all the IOPS performance they will ever need out of a single storage system.

Yet, this still just looks like the shared storage we have today, only more of it. So we return to our roots and see that software defined storage is just another way to add more storage sharing. Storage virtualization is nice, newer more programmable storage systems are even better, but faster, cheaper storage devices are best of all.

So what we really need is much cheaper SSDs to realize the full promise of software defined storage.   In the meantime, opening up APIs and providing RESTful interfaces for programmatic provisioning, monitoring, managing and tuning of storage system data paths and other performance characteristics is all we can hope for.

Comments?

Storage utilization is broke

Storage virtualization is a great concept but it has some serious limitations. One which comes to mind is the way we measure storage utilization.

Just look at what made VMware so successful.  As far as I can see, it was mainly due to the fact that x86 servers were vastly underutilized, sometimes only achieving single digit utilization with normal applications.  This meant you could potentially run 10 or more of these workloads on a single server.  Then dual core, quad core, and eight core processor chips started showing up, which just made non-virtualized systems seem even more of a waste.

We need a better way to measure storage utilization in order to show customers one of the ways where storage virtualization can help.

Although the storage industry talks a lot about storage utilization, it really means capacity utilization.  There is no sense, no measurement, no idea of what the performance utilization of a storage system is.

There is one storage startup, NexGen (a hybrid SSD-disk storage system), that is looking at performance utilization, but from the standpoint of partitioning out system performance to different applications, not as a better way to measure system utilization.

Historical problems with storage performance utilization

I think one problem may be that it’s much harder to measure storage performance utilization.  With a server processor it’s relatively easy to measure idle time: one just needs some place in the O/S to start clocking idle time whenever the server has nothing else to do.

But it’s not so easy in storage systems. Yes, there are still plenty of idle loops, but they can be used to wait until a device delivers or accepts some data.  In that case the storage system is not “technically” idle.  On the other hand, when a storage system is actually waiting for work, this is “true” idle time.

Potential performance utilization metrics

From a storage performance utilization perspective, I see at least three different metrics:

  • Idle IO time – this is probably closest to what standard server utilization looks like.  It could be accumulated during intervals when no IO is active on the system. Its complement, Busy IO time, would be accumulated every time IO activity is present in the storage (from the storage server[s] perspective).  The sad fact is that plenty of storage systems measure something akin to Idle IO time, but seldom report on it in any sophisticated manner.
  • Idle IOP time – this could be based on some theoretical IOPS rate the system could achieve in its present configuration; anytime it was below that level it would accumulate Idle IOP time. It doesn’t have to be 100% of its rated IOPS performance; it could be targeted at 75%, 50% or even 25% of its configuration dependent theoretical maximum. But whenever IOPS dropped below this rate, the system would start counting Idle IOP time.  Its complement, Busy IOP time, would be counted anytime the system exceeded that targeted IOPS rate.
  • Idle Throughput time – this could be based on some theoretical data transfer rate the system, in its current configuration, was capable of sustaining; anytime it was less than this rate it would accumulate Idle Throughput time.  Again this doesn’t have to be the maximum throughput for the storage system, but it needs to be some representative sustainable level. Its counterpart, Busy Throughput time, would be accumulated anytime the system reached the targeted throughput level.

Either of the last two measures could be a continuous value rather than some absolute quantity. For example, if the targeted IOPS rate was 100K and the system sustained 50K IOPS for some time interval, then the Idle IOP time would be the time interval times 50%, the unused fraction of the targeted rate (1 − 50K IOPS achieved/100K targeted IOPS).

To calculate storage performance utilization one would take the Idle IO, IOPS and/or Throughput time over a wall clock interval (say 15 minutes) and average this across multiple time periods.  Storage systems could chart these values to show end-users the periodicity of their activity.
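The continuous Idle IOP metric and the averaging step can be sketched as follows; the 100K IOPS target and the sample values are made-up numbers, and each sample is taken as the average IOPS achieved over one 15-minute wall-clock interval:

```python
TARGET_IOPS = 100_000   # assumed configuration-dependent target rate
INTERVAL_MIN = 15       # wall-clock measurement interval in minutes

def idle_iop_minutes(achieved_iops):
    """Idle IOP time for one interval: the interval length times the
    fraction of the targeted IOPS rate left unused."""
    unused_fraction = max(0.0, 1.0 - achieved_iops / TARGET_IOPS)
    return INTERVAL_MIN * unused_fraction

def iops_utilization(samples):
    """Average IOPS performance utilization across several intervals."""
    busy = sum(INTERVAL_MIN - idle_iop_minutes(s) for s in samples)
    return busy / (INTERVAL_MIN * len(samples))

# The 50K-IOPS example from the text: half the target leaves half the
# interval counted as Idle IOP time.
print(idle_iop_minutes(50_000))                       # 7.5 minutes idle
print(iops_utilization([50_000, 25_000, 100_000]))    # ≈ 0.583
```

Charting `iops_utilization` per interval over a day would show the periodicity of activity the text describes, and a system sitting at 5% would be an obvious consolidation candidate.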

This way on a 15 minute basis we could understand the busy-ness of a storage system.  And if we found that the storage was running at 5% IOPS or Throughput utilization most of the time, then implementing storage virtualization on that storage system would make a lot of sense.

Problems with proposed metrics

One problem with the foregoing is that IOPS and Throughput rates vary tremendously depending on storage system configuration as well as the type of workload the system is encountering, e.g., 256KB blocks vs. 512 byte blocks can have a significant bearing on the IOPS and throughput rates attainable by any storage system.

There are solutions to these issues, but they all require more work in development, testing and performance modeling.   This may argue for the simpler Idle IO time metric, but I prefer the other measures as they provide more accurate and continuous data.

~~~~

I believe metrics such as the above would be a great start to supplying the information that IT staff need to understand how storage virtualization would be beneficial to an organization. 

There are other problems with the current storage virtualization capabilities present today but these must be subjects for future posts.

Image: Biblioteca José Vasconcelos / Vasconcelos Library by * CliNKer *