Data hypervisor

(c) 2012 Silverton Consulting, Inc. All rights reserved

With all this talk of software defined networking and server virtualization, where does storage virtualization stand?  I blogged about some problems with storage virtualization a week or so ago in my post Storage utilization is broke, and this post takes that discussion to the next level.  Also, I was at a financial analyst conference this week in Vail where I heard Mark Lewis of Tekrocket, formerly of EMC, discuss the need for a data hypervisor to provide software defined storage.

I now believe that what we really need for true storage virtualization is a renewed focus on data hypervisor functionality.  The data hypervisor would need both a control plane and a data plane in order to function properly.  Ideally, the control plane would set up the interfaces and routing for the data plane hardware, and the server and/or backend storage would be none the wiser.

DMs everywhere

I envision a scenario where a customer's application data is packaged with a data hypervisor which runs on commodity data switch hardware with data plane and control plane software running on it, in effect creating (virtual) data machines, or DMs.

All enterprise, and nowadays most midrange, storage systems provide most of the functionality of a storage control plane, such as defining units of storage, setting up physical-to-logical storage mapping, and incorporating monitoring and management of the physical storage layer.  So control planes are pervasive in today's storage, but they are proprietary.

In addition, most storage systems have data plane functionality, which operates to connect a host IO request to the actual data residing in backend storage or internal cache.  But again, although data planes are everywhere in storage today, they are all proprietary to a specific vendor's storage system.
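To make that split concrete, here's a minimal sketch, in Python, of how a separated control plane and data plane might hand off routing.  Every class and method name here is hypothetical, invented purely for illustration; no vendor's actual interfaces are implied.

```python
# Minimal sketch of a separated storage control plane and data plane.
# Every name here is hypothetical; no vendor interface is implied.

class Backend:
    """Stand-in for a physical backend device (JBOD, flash, DAS...)."""
    def __init__(self):
        self.blocks = {}

    def read(self, offset):
        return self.blocks.get(offset)

    def write(self, offset, data):
        self.blocks[offset] = data


class ControlPlane:
    """Defines units of storage and hands a routing table to the data plane."""
    def __init__(self):
        self.routes = {}                     # logical volume -> backend device

    def define_volume(self, volume, backend):
        self.routes[volume] = backend        # physical-to-logical mapping

    def routing_table(self):
        return dict(self.routes)             # consumed by the data plane


class DataPlane:
    """Directs host IO to the proper backend using control plane routes."""
    def __init__(self, routing_table):
        self.routes = routing_table

    def handle_io(self, volume, offset, data=None):
        backend = self.routes[volume]        # route set up by the control plane
        if data is None:
            return backend.read(offset)      # host read
        return backend.write(offset, data)   # host write


# The host and the backend are none the wiser about the layer in between.
cp = ControlPlane()
cp.define_volume("vol0", Backend())
dp = DataPlane(cp.routing_table())
dp.handle_io("vol0", 0, b"hello")
assert dp.handle_io("vol0", 0) == b"hello"
```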

Data switch needed

But in order to utilize a data hypervisor and create a more general purpose control plane layer, we need a more generic data plane layer that operates on commodity hardware.  This is different from today's SAN storage switches or DCB switches, but similar in some ways.

The function of the data switch/data plane layer would be to take routing instructions from the control plane layer and direct each server IO request to the proper storage unit.  Somewhere in this world view, probably at the data plane level, it would introduce data protection services like RAID or other erasure coding schemes, point-in-time copy/clone services, replication services, and the other advanced storage features needed by enterprise storage today.
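As a taste of what one such data protection service involves, here's a minimal sketch of RAID-4/5 style XOR parity, the simplest of the erasure coding schemes; a production data plane would use far more sophisticated codes, and the block values below are made up for illustration.

```python
from functools import reduce

def xor_parity(stripes: list[bytes]) -> bytes:
    """Compute a RAID-4/5 style parity block over equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*stripes))

def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Reconstruct one lost block by XORing the survivors with the parity."""
    return xor_parity(surviving + [parity])

# Example: three data blocks plus one parity block; one data block is lost.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
p = xor_parity([d0, d1, d2])
assert rebuild([d0, d2], p) == d1   # d1 recovered after its device fails
```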

It would also need to provide some automated storage movement across and within tiers of physical storage, and it would connect server storage interfaces at the front end to storage interfaces at the backend.  Not unlike SAN or DCB switches, but with much more advanced functionality.
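A minimal sketch of the tier-movement decision, assuming a simple access-frequency policy; the thresholds and tier names are made up for illustration, and real systems would weigh many more signals.

```python
# Hypothetical access-frequency tiering policy; thresholds are illustrative only.
TIERS = ["flash", "sas_disk", "sata_disk"]   # fastest to slowest

def target_tier(io_per_hour: int) -> str:
    if io_per_hour > 1000:
        return "flash"        # hot data moves to the fastest tier
    if io_per_hour > 50:
        return "sas_disk"
    return "sata_disk"        # cold data migrates down
```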

Ideally, data switch storage interfaces could attach to dedicated JBOD and flash arrays as well as to systems using DAS storage.  In addition, it would be nice if the data switch could talk to real storage arrays on SAN, IP/SAN, or NFS and CIFS/SMB storage systems.

The other thing one would like out of a data switch is support for a universal translator that would map one protocol to another, such as iSCSI to SAS, NFS to FC, or FC to NFS, or any other combination, depending on the needs of the server and the storage in the configuration.
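Here's a minimal sketch of what such a translation layer might look like, assuming each protocol pair gets a small adapter function; the adapter names and request fields are invented for illustration and gloss over the real work of protocol conversion.

```python
# Hypothetical protocol translation table; every name here is illustrative only.

def iscsi_to_sas(request: dict) -> dict:
    # Re-express a block request in SAS terms (sketch, not a real codec)
    return {"proto": "sas", "lun": request["lun"], "lba": request["lba"],
            "length": request["length"]}

def nfs_to_fc(request: dict) -> dict:
    # A file-level request must first be resolved to block extents (elided here)
    return {"proto": "fc", "wwn": request["export"],
            "lba": request["offset"] // 512, "length": request["count"]}

TRANSLATORS = {
    ("iscsi", "sas"): iscsi_to_sas,
    ("nfs", "fc"): nfs_to_fc,
    # ... any other combination the configuration requires
}

def translate(src: str, dst: str, request: dict) -> dict:
    return TRANSLATORS[(src, dst)](request)

# Example: an iSCSI read request redirected to a SAS backend
req = {"lun": 3, "lba": 2048, "length": 4096}
print(translate("iscsi", "sas", req))
```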

Now, if the data switch were built on top of commodity x86 hardware and software, with the data switch as just a specialized application, that would create the underpinnings for a true data hypervisor with a control plane and a data plane that could be independent and use anybody's storage.

Data hypervisor

Assuming all this were available, we would have true storage virtualization.  With these capabilities, storage could be repurposed on the fly, added to, subtracted from, and in general be a fungible commodity, not unlike server processing MIPs under VMware or Hyper-V.

Application data would then need to be packaged into a data machine which would offer all the host services required to support host data access.  The data hypervisor would handle the linkages required to interface with the control and data layers.
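To make the packaging idea concrete, here's a minimal sketch of what a DM descriptor might contain, assuming the data hypervisor consumes something like it; every field name and value here is hypothetical.

```python
# Hypothetical descriptor for a (virtual) data machine (DM); all fields invented.
dm_descriptor = {
    "name": "orders-db-dm",
    "host_services": ["iscsi"],              # protocols offered to the host
    "capacity_gb": 500,
    "protection": {"scheme": "raid5", "replicas": 1},
    "tier_policy": "auto",                   # let the control plane move data
    "backend_pools": ["jbod-pool-1", "flash-pool-a"],
}
# The data hypervisor would read this and wire up the control and data planes.
```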

Applications could be configured to utilize available storage with ease, and storage could grow, shrink, or move to accommodate the required workload just as easily as VMs can be deployed today.

How we get there

Aside from the VMware, Citrix, and Microsoft thrusts toward virtual storage, there are plenty of storage virtualization solutions that can control most backend enterprise SAN storage.  However, the problem with these solutions is that, in general, they execute only on a specific vendor's hardware and don't necessarily talk to DAS or JBOD storage.

In addition, not all of the current generation of storage virtualization solutions are unified.  That is, most of them today only talk FC, FCoE, or iSCSI and don't support NFS or CIFS/SMB.

These don't appear to be insurmountable obstacles, and with proper allocation of R&D funding, they could all be solved.

More problematic, however, is that none of these solutions operates on commodity hardware or commodity software.

The hardware is probably the easiest to deal with.  Today many enterprise storage systems are built on top of x86 processor storage controllers, albeit sometimes incorporating specialized packaging for redundancy and high availability.

The harder problem may be commodity software.  Although the genesis of a few storage virtualization systems may lie in BSD or other "commodity" operating systems, they have been modified over the years to the point where they no longer resemble anything that can run on standard off-the-shelf operating systems.

Then again, some storage virtualization systems started out with special home-grown hardware and software.  As such, converting these over to something more commodity-oriented would be a major transition.

But the challenge is how to get there from here, and whether anyone would want to take this on.  The other problem is that the value add storage vendors currently supply would be somewhat eroded, not unlike what happened to proprietary Unix systems with the advent of VMware.

But this will not take place overnight, and the company that takes this on and makes a go of it could have a significant software monopoly that would be hard to crack.

Perhaps it will take a startup to do this, but I believe the main enterprise storage vendors are best positioned to take it on.

Comments?

Storage utilization is broke

Storage virtualization is a great concept, but it has some serious limitations.  One that comes to mind is the way we measure storage utilization.

Just look at what made VMware so successful.  As far as I can see, it was mainly due to the fact that x86 servers were vastly underutilized, sometimes only achieving single-digit utilization with normal applications.  This meant you could potentially run 10 or more of these applications on a single server.  Then dual-core, quad-core, and eight-core processor chips started showing up, which made non-virtualized systems seem even more of a waste.

We need a better way to measure storage utilization in order to show customers one of the ways storage virtualization can help.

Although the storage industry talks a lot about storage utilization, it really means capacity utilization.  There is no sense, no measurement, no idea of what the performance utilization of a storage system is.

There is one storage startup, Nexgen, a hybrid SSD-disk storage system vendor, that is looking at performance utilization, but from the standpoint of partitioning out system performance to different applications, not as a better way to measure system utilization.

Historical problems with storage performance utilization

I think one problem may be that it's much harder to measure storage performance utilization.  With a server processor, it's relatively easy to measure idle time: one just needs some place in the O/S to start clocking idle time whenever the server has nothing else to do.

But it's not so easy in storage systems.  Yes, there are still plenty of idle loops, but they can be used to wait until a device delivers or accepts some data, in which case the storage system is not "technically" idle.  On the other hand, when a storage system is actually waiting for work, that is "true" idle time.
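Here's a minimal sketch of that distinction, assuming the storage O/S can tell a device wait apart from an empty work queue; the state names are invented for illustration.

```python
import time

# Hypothetical idle-time accounting that separates device waits from true idle.
class IdleClock:
    def __init__(self):
        self.true_idle = 0.0     # no work anywhere in the system
        self.device_wait = 0.0   # IO in flight, waiting on a device
        self._since = time.monotonic()
        self._state = "busy"

    def set_state(self, state: str):
        # Close out the previous state and begin clocking the new one
        now = time.monotonic()
        elapsed = now - self._since
        if self._state == "true_idle":
            self.true_idle += elapsed        # counts toward Idle IO time
        elif self._state == "device_wait":
            self.device_wait += elapsed      # "technically" not idle
        self._since, self._state = now, state
```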

Potential performance utilization metrics

From a storage performance utilization perspective, I see at least three different metrics:

  • Idle IO time – this is probably the closest to what standard server utilization looks like.  It could be accumulated during intervals when no IO is active on the system.  Its complement, Busy IO time, would be accumulated every time IO activity is present in the storage (from the storage server[s] perspective).  The sad fact is that plenty of storage systems measure something akin to Idle IO time, but seldom report on it in any sophisticated manner.
  • Idle IOPS time – this could be measured against some theoretical IOPS rate the system could achieve in its present configuration; anytime activity fell below that level, the system would accumulate Idle IOPS time.  The target doesn't have to be 100% of the rated IOPS performance; it could be set at 75%, 50%, or even 25% of the configuration-dependent theoretical maximum.  But whenever IOPS dropped below this rate, the system would start counting Idle IOPS time.  Its complement, Busy IOPS time, would be counted anytime the system exceeded the targeted IOPS rate.
  • Idle Throughput time – this could be measured against some theoretical data transfer rate that the system, in its current configuration, is capable of sustaining; anytime throughput fell below this rate, the system would accumulate Idle Throughput time.  Again, this doesn't have to be the system's maximum throughput, but it needs to be some representative, sustainable level.  Its counterpart, Busy Throughput time, would be accumulated anytime the system reached the targeted throughput level.

Either of the last two measures could be a continuous value rather than an absolute quantity. For example, if the targeted IOPS rate was 100K and the system delivered 50K IOPS for some time interval, then the Idle IOPS time for that interval would be the interval length times the unused fraction, in this case 50% (1 - 50K achieved/100K targeted).

To calculate storage performance utilization, one would take the Idle IO, IOPS, and/or Throughput time over a wall clock interval (say 15 minutes) and average it across multiple time periods.  Storage systems could chart these values to show end users the periodicity of their activity.

This way, on a 15-minute basis, we could understand the busy-ness of a storage system.  And if we found that the storage was running at 5% IOPS or throughput utilization most of the time, then implementing storage virtualization on that storage system would make a lot of sense.
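Here's a minimal sketch of the proposed calculation, assuming achieved IOPS are sampled per interval against a configuration-dependent target; the 100K target and the sample numbers are made up for illustration.

```python
# Hypothetical continuous Idle IOPS metric; target and samples are invented.
TARGET_IOPS = 100_000            # configuration-dependent theoretical rate
INTERVAL_MIN = 15                # wall clock measurement interval, in minutes

def idle_iops_minutes(achieved_iops: float) -> float:
    """Idle IOPS time for one interval: interval length times the unused fraction."""
    unused = max(0.0, 1.0 - achieved_iops / TARGET_IOPS)
    return INTERVAL_MIN * unused

def iops_utilization(samples: list[float]) -> float:
    """Average Busy IOPS fraction across many 15-minute intervals."""
    busy = sum(INTERVAL_MIN - idle_iops_minutes(s) for s in samples)
    return busy / (INTERVAL_MIN * len(samples))

# Example: a mostly quiet system with one busier interval
samples = [5_000, 5_000, 50_000, 5_000]      # achieved IOPS per interval
print(f"{iops_utilization(samples):.0%}")    # ~16%, a virtualization candidate
```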

Problems with proposed metrics

One problem with the foregoing is that IOPS and throughput rates vary tremendously depending on storage system configuration as well as on the type of workload the system encounters, e.g., 256KB blocks vs. 512-byte blocks can have a significant bearing on the IOPS and throughput rates attainable by any storage system.

There are solutions to these issues, but they all require more work in development, testing, and performance modeling.  This may argue for the simpler Idle IO time metric, but I prefer the other measures, as they provide more accurate and continuous data.

~~~~

I believe metrics such as the above would be a great start toward supplying the information IT staff need to understand how storage virtualization would benefit an organization.

There are other problems with today's storage virtualization capabilities, but those must be subjects for future posts.

Image: Biblioteca José Vasconcelos / Vasconcelos Library by * CliNKer *