With all this talk of software defined networking and server virtualization, where does storage virtualization stand? I blogged about some problems with storage virtualization a week or so ago in my post Storage Utilization is broke, and this post takes it to the next level. I was also at a financial analyst conference this week in Vail, where I heard Mark Lewis of Tekrocket (formerly of EMC) discuss the need for a data hypervisor to provide software defined storage.
I now believe that what we really need for true storage virtualization is a renewed focus on data hypervisor functionality. The data hypervisor would need both a control plane and a data plane to function properly. Ideally, the control plane would set up the interface and routing for the data plane hardware, and the server and/or backend storage would be none the wiser.
I envision a scenario where a customer's application data is packaged with a data hypervisor, which runs on commodity data switch hardware with data plane and control plane software running on it, in effect creating (virtual) data machines, or DMs.
All enterprise and, nowadays, most midrange storage systems provide most of the functionality of a storage control plane: defining units of storage, setting up physical-to-logical storage mapping, and incorporating monitoring and management of the physical storage layer. So control planes are pervasive in today's storage, but they are proprietary.
In addition, most storage systems have data plane functionality, which connects a host IO request to the actual data residing in backend storage or internal cache. But again, although data planes are everywhere in storage today, they are all proprietary to a specific vendor's storage system.
Data switch needed
But in order to utilize a data hypervisor and create a more general purpose control plane layer, we need a more generic data plane layer that operates on commodity hardware. This is different from today's SAN storage switches or DCB switches, but similar in some ways.
The data switch/data plane layer would take routing instructions from the control plane layer and direct each server IO request to the proper storage unit. Somewhere in this world view, probably at the data plane level, it would introduce data protection services such as RAID or other erasure coding schemes, point-in-time copy/clone services, replication services, and the other advanced storage features enterprise storage needs today.
It would also need to provide some automated storage movement across and within tiers of physical storage, and it would connect server storage interfaces at the front end to storage interfaces at the backend. Not unlike SAN or DCB switches, but with much more advanced functionality.
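To make the control plane/data plane split concrete, here is a minimal sketch of the idea described above: the control plane programs a routing table, and the data plane consults it to direct each server IO to a backend storage unit. All class and method names are hypothetical, not from any real product.

```python
# Hypothetical sketch of a control-plane/data-plane split for a data switch.
# All names are illustrative, not from any real product or API.

class ControlPlane:
    """Programs routing: maps a logical volume to a backend storage unit."""
    def __init__(self):
        self.routes = {}  # volume_id -> backend storage target

    def set_route(self, volume_id, backend):
        self.routes[volume_id] = backend

class DataPlane:
    """Forwards each IO request per the control plane's routing table."""
    def __init__(self, control_plane):
        self.control = control_plane

    def handle_io(self, volume_id, op, lba, data=None):
        backend = self.control.routes.get(volume_id)
        if backend is None:
            raise LookupError(f"no route for volume {volume_id}")
        # In a real switch this is where RAID/erasure coding, snapshots,
        # and replication services could be layered in before forwarding.
        return (backend, op, lba, data)

cp = ControlPlane()
cp.set_route("vol1", "jbod-shelf-3")
dp = DataPlane(cp)
print(dp.handle_io("vol1", "read", 4096))
# -> ('jbod-shelf-3', 'read', 4096, None)
```

The point of the separation is that `ControlPlane` could run anywhere (even as ordinary software on a server), while `DataPlane` runs on the commodity switch hardware in the IO path.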
Ideally, data switch storage interfaces could attach to dedicated JBOD and flash arrays as well as systems using DAS storage. In addition, it would be nice if the data switch could talk to real storage arrays over SAN, IP/SAN, or NFS and CIFS/SMB storage systems.
The other thing one would like out of a data switch is support for a universal translator that maps one protocol to another, such as iSCSI to SAS, NFS to FC, or FC to NFS (or any other combination), depending on the needs of the server and the storage in the configuration.
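One way such a universal translator might be organized is as a registry of conversion functions keyed by (front-end protocol, back-end protocol) pairs. The sketch below is purely illustrative; the protocol fields and command names are simplified stand-ins, not real wire formats.

```python
# Hypothetical universal-translator sketch: a registry of conversion
# functions keyed by (front-end protocol, back-end protocol). The request
# and command fields are simplified stand-ins, not real wire formats.

TRANSLATORS = {}

def register(frontend, backend):
    def wrap(fn):
        TRANSLATORS[(frontend, backend)] = fn
        return fn
    return wrap

@register("iscsi", "sas")
def iscsi_to_sas(request):
    # Map an iSCSI block request onto a SAS command (illustrative only).
    return {"sas_cmd": "READ_16", "lba": request["lba"], "len": request["len"]}

@register("nfs", "fc")
def nfs_to_fc(request):
    # A file read becomes a block read via a notional file-to-LBA mapping.
    return {"fc_cmd": "SCSI_READ", "lba": request["offset"] // 512,
            "len": request["count"] // 512}

def translate(frontend, backend, request):
    fn = TRANSLATORS.get((frontend, backend))
    if fn is None:
        raise NotImplementedError(f"{frontend}->{backend} not supported")
    return fn(request)

print(translate("iscsi", "sas", {"lba": 100, "len": 8}))
# -> {'sas_cmd': 'READ_16', 'lba': 100, 'len': 8}
```

New front-end/back-end combinations could then be added to the switch without touching the routing core, which is the property a universal translator needs.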
Now, if the data switch were built on top of commodity x86 hardware and software, with the data switch as just a specialized application, that would create the underpinnings for a true data hypervisor, with a control plane and data plane that could be independent and use anybody's storage.
Assuming all this were available then we would have true storage virtualization. With these capabilities, storage could be repurposed on the fly, added to, subtracted from, and in general be a fungible commodity not unlike server processing MIPs under VMware or Hyper-V.
Application data would then need to be packaged into a data machine, which would offer all the host services required to support host data access. The data hypervisor would handle the linkages required to interface with the control and data layers.
Applications could be configured to utilize available storage with ease, and storage could grow, shrink, or move to accommodate the required workload just as easily as VMs can be deployed today.
How we get there
Aside from the VMware, Citrix, and Microsoft thrusts toward virtual storage, there are plenty of storage virtualization solutions that can control most backend enterprise SAN storage. However, the problem with these solutions is that, in general, they execute only on a specific vendor's hardware and don't necessarily talk to DAS or JBOD storage.
In addition, not all of the current generation of storage virtualization solutions are unified. That is, most of them today only talk FC, FCoE, or iSCSI and don't support NFS or CIFS/SMB.
These don’t appear to be insurmountable obstacles and with proper allocation of R&D funding, could all be solved.
More problematic, however, is that none of these solutions operate on commodity hardware or commodity software.
The hardware is probably the easiest to deal with. Today, many enterprise storage systems are built on top of x86 processor storage controllers, albeit sometimes with specialized packaging for redundancy and high availability.
The harder problem may be commodity software. Although the genesis of a few storage virtualization systems might be BSD or other "commodity" operating systems, they have been modified over the years to the point where they no longer resemble anything that can run on standard off-the-shelf operating systems.
Then again, some storage virtualization systems started out with special, home-grown hardware and software. Converting these over to something more commodity oriented would be a major transition.
But the challenge is how to get there from here, and whether anyone would want to take this on. The other problem is that the value-add that storage vendors currently supply would be somewhat eroded, not unlike what happened to proprietary Unix systems with the advent of VMware.
But this will not take place overnight, and the company that takes this on and makes a go of it could have a significant software monopoly that would be hard to crack.
Perhaps it will take a startup to do this, but I believe the main enterprise storage vendors are best positioned to take it on.
21 thoughts on “Data hypervisor”
So how is it different from DataCore/Virsto?
Not sure about DataCore, but Virsto operates only in conjunction with server virtualization, and the data hypervisor here stands outside server virtualization. Ray Lucchesi, Ray@SilvertonConsulting
Virsto, in a nutshell, is an appliance that provides storage consolidation (data plane) and a management interface (control plane).
It is limited to compute virtualization environments because it is most relevant in those environments, and it is a good entry point.
There are several storage virtualization solutions available (some pure software, some mostly software with a little bit of flash); the problem is they can't break the traditional thinking that "storage = hardware".
I work at Virsto, so I'll get that out there up front.
In describing what we do, using the server hypervisor as an analogy for the "storage hypervisor" is very apt. Server hypervisors did two basic things: they virtualized CPU/memory resources to improve flexibility, and they increased the ability to utilize a sorely underutilized resource.
That's very similar to what we do at Virsto. We virtualize underlying heterogeneous storage, providing storage objects that look like VMDKs or VHDs to the hypervisor, and allowing storage operations to be performed with very high performance at the VM (instead of the LUN) level (failover, replication, snapshots, etc.). This is similar to the ease of use benefit that NFS provides, but we're doing it for block-based storage.
The second thing we do – and what we see as setting ourselves apart from pure storage virtualization plays – is that we significantly increase the ability to utilize a sorely underutilized resource… your storage. With the strategic placement of a log architecture into the virtual storage layer, we can take storage devices that today may only be producing 30-50 IOPS (from the point of view of application IOPS) and have them operating (from the point of view of guest VMs) at 400 – 500 IOPS. This 10x speedup varies depending on what your core storage technology is (7.2K, 10K, 15K) but it gives you a huge performance boost without requiring that you purchase any additional hardware. This is very different from a caching approach, which not only requires that you buy faster hardware to create the caching tier to get the speedup, but also achieves its speedup with generally 70 – 90% less capacity (log vs cache).
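In principle, a log architecture like the one described above turns small random writes into large sequential appends, with an index tracking where each logical block currently lives. A toy illustration of that principle (this is not Virsto's code; the names are hypothetical):

```python
# Toy illustration of a log-structured write layer: random writes are
# appended sequentially to a log, and an index remembers where each
# logical block now lives. Not any vendor's implementation; just the idea.

class WriteLog:
    def __init__(self):
        self.log = []    # append-only sequential log of block contents
        self.index = {}  # logical block address -> position in the log

    def write(self, lba, data):
        self.index[lba] = len(self.log)  # remap the block to the log tail
        self.log.append(data)            # sequential append, no disk seek

    def read(self, lba):
        return self.log[self.index[lba]]

wl = WriteLog()
for lba in (900, 17, 4021):   # "random" writes to scattered addresses...
    wl.write(lba, f"block-{lba}")
print(wl.log)                 # ...land sequentially in the log
print(wl.read(17))
```

Sequential appends are what spinning disks do best, which is why this kind of layer can multiply the effective write IOPS of the same physical media; the trade-off is the log-cleaning and index overhead a real implementation must manage.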
Eric, Thanks for your comment. I am familiar with the Virsto architecture, and although it's very close to what I was looking at here, the requirements go beyond a pure VMware vSphere environment and/or pure DAS environment. I believe that storage virtualization, in order to be more relevant to the data center of the future, has to take on a more software defined architecture, not unlike what the VMware hypervisor has done over the years to support VMs operating under its environment with just about anyone's servers, storage, and networking gear. When storage virtualization can do that, then we will really have a universal storage solution. Ray
What would be missing from the IBM SVC in your point of view?
It runs on commodity x86 hardware, provides advanced functions, runs with most (non-IBM) storage systems, and even provides connectivity for iSCSI and FCoE hosts to SAN storage.
Arne, Thanks for the question. Although the SVC provides similar functionality, it doesn't execute on commodity software, and it doesn't seem as dynamic as what I was thinking of here. And you can't just purchase an SVC software license and run it on your own hardware. Also, file services are missing from the SVC, although Storwize V7000 Unified seems like a step down that path. Ray
This post prompted me to make a couple of observations to see if I appreciate your intent more fully. This is in two parts.
At a high level of abstraction, a storage system is implemented as a special purpose computer. The special purpose is to move data from memory to magnetic disk very efficiently. The data arrives in memory because an application requests this using a storage protocol that causes the memory of its computer to move that data to the memory of the storage system.
All computing involves layers of abstraction. For example, in storage systems, the address of the data on magnetic disk is highly abstracted from the address an application uses to store its data. For a virtual machine architecture, abstraction layers are critical decisions. They have far-reaching implications for software architecture. For example, bad abstraction choices, such as hard coding an IP address into a business application, damage the ability for resources to be fungible to a workload. I suggest that a problem with storage systems is the lack of an open, well architected set of abstractions suitable for software development.
End Part 1 …
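The layered address abstraction mentioned above can be made concrete: each layer remaps an address without the layer above knowing how. A hypothetical two-layer sketch (block sizes, extent sizes, and the volume map are made-up illustrative values):

```python
# Hypothetical sketch of stacked storage address abstractions: an
# application's file offset passes through a filesystem map and then a
# volume placement map, each layer ignorant of the one below.
# All sizes and mappings here are illustrative, not from a real system.

BLOCK = 4096   # assumed filesystem block size
EXTENT = 1024  # assumed extent size, in blocks

def file_to_lba(file_offset):
    """Filesystem layer: file byte offset -> logical block address."""
    return file_offset // BLOCK

def lba_to_physical(lba, vol_map):
    """Volume layer: logical block -> (disk, physical block)."""
    disk, base = vol_map[lba // EXTENT]
    return disk, base + lba % EXTENT

# Extent 0 lives on disk-A starting at block 50000; extent 1 on disk-B.
vol_map = {0: ("disk-A", 50000), 1: ("disk-B", 0)}

lba = file_to_lba(8192)              # offset 8 KiB -> LBA 2
print(lba_to_physical(lba, vol_map)) # -> ('disk-A', 50002)
```

If the abstraction at each boundary were public and agreed-upon, any layer could be swapped independently, which is exactly the property the server-hardware abstraction gave hypervisors.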
BRCDbreams, Thanks for your comment. I believe you are on the right track. But one other thing worth highlighting is that abstractions can be public or private. In today's storage systems they are essentially private, and that's one of the major problems. The other problem is that even if they were public, not having some set of abstractions that's widely agreed to and adhered to is yet another serious problem. The advantage that server virtualization has here is that the hardware (abstraction) layer has been well defined and public for a long time. This allowed multiple server hypervisors to emerge and relatively painlessly create a new abstraction layer (the hardware abstraction for VMs). Ray
Begin Part 2 …
As is the case with networking, storage suffers from an architecture that has poor programming abstractions. See Scott Shenker, et al., on how this might be addressed in networking via OpenFlow and what has become known as software defined networking (SDN). Note the emphasis on "software" as in programming and its modern architecture of abstractions. Similarly in storage systems, a more "programmatic" set of abstractions is required for better resource utilization under workload variation.
End Part 2 …
BRCDbreams, Thanks again for your comment. I believe you are essentially correct, although "poor" implies a value statement. My problem today with storage is that the abstraction layers are all private and proprietary rather than public and open. I believe a broader discussion and some shared abstraction layers could open up the storage side of IT to better alternatives. And yes, I believe that such abstractions can and should be implementable in software. Ray
Start Part 3…
Unlike application state, with its relatively small compiled image that can be moved rather quickly from one compute node to another, an application's data is many orders of magnitude larger. Therefore, unlike live VMotion in VMware, storage systems aren't likely to move data from disk to disk as a means of improving IO performance. I believe this has important implications for the architecture of the storage system data plane that are unique to storage. It suggests a pool of controllers with CPU and memory that allows IO processing to migrate from one controller node to another as IOPS fluctuate for an application. But the data pool itself is static, with replication of data providing redundancy for data protection, not a means of migrating data to another node for better IO performance.
As always, I find your posts thought provoking.
BRCDbreams, Right again. Yes, data is more difficult to move and as such tends to be more statically allocated. But that doesn't mean it can't be moved around the backend of a storage environment. Today's enterprise-class storage systems have automated storage tiering that moves chunks of data across tiers, from SSD to high-performing disks to high-capacity disks. They can do this dynamically or on a scheduled basis. It's not too much of a stretch to imagine such facilities doing this for all of an application's data. I guess I see both the need for backend automated storage tiering (driven by the control/data plane requirements) and cross-storage-machine data movement, done for HA, performance, workload balancing, etc., not unlike why customers move VMs around today. In this case, whether the actual storage that holds the data moves to some other physical storage system is for the control plane to decide. So I would say both the DM and its associated storage can and should be movable. Ray
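The automated tiering mentioned in the reply above usually boils down to promoting hot chunks and demoting cold ones based on recent access counts. A toy sketch of that policy (tier names and thresholds are illustrative, not from any product):

```python
# Toy automated-tiering sketch: data chunks are promoted or demoted
# between tiers based on recent access counts. The tier names and
# hot/cold thresholds are illustrative, not from any real array.

TIERS = ["ssd", "fast-disk", "capacity-disk"]  # fastest to slowest

def retier(chunk_tier, access_counts, hot=100, cold=10):
    """Return a new chunk->tier map: hot chunks move up one tier,
    cold chunks move down one tier, the rest stay put."""
    new = {}
    for chunk, tier in chunk_tier.items():
        i = TIERS.index(tier)
        hits = access_counts.get(chunk, 0)
        if hits >= hot and i > 0:
            i -= 1  # promote toward SSD
        elif hits <= cold and i < len(TIERS) - 1:
            i += 1  # demote toward capacity disk
        new[chunk] = TIERS[i]
    return new

chunks = {"c1": "fast-disk", "c2": "ssd", "c3": "capacity-disk"}
counts = {"c1": 500, "c2": 3, "c3": 50}
print(retier(chunks, counts))
# -> {'c1': 'ssd', 'c2': 'fast-disk', 'c3': 'capacity-disk'}
```

Whether a real system runs this dynamically or on a schedule, the control plane owns the policy (the thresholds) while the data plane carries out the chunk moves, which mirrors the split this post argues for.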
Thought provoking post, thanks for sharing.
Data hypervisor seems like a path to an 'Information hypervisor' of sorts. While 'data' abstracts physical concepts such as structured, unstructured, accessed via protocol A/B/…, located on LUN/VMDK/…, stored on JBOD, array, DAS, NAS…, 'information' abstracts business concepts such as mission-critical, compliance-related, rapid/low change level, quick/rare business access level, business-confidential, scratch, etc.
If you look at business function of marketing today – I can see them using 'information hypervisor' of sorts some day – which could be based on your 'data hypervisor' plumbing. Imagine aggregating, analyzing and making sense of 'information' which streams in from social networking 'data hypervisor' or internal business data hypervisors or some marketing research firm's public data hypervisor that could be plugged / unplugged into the 'information hypervisor' as a paid/free choice.
Shaloo, Thanks for your comment. You seem to be taking this hypervisor concept in a new direction. An information hypervisor would seem to abstract business domain resources into something that can be hosted in different environments. At some level this starts looking like business analytics, data warehouses, and OLTP databases all combined into one, where information can be hosted and flow to any of these entities as needed. The essential information element remains immutable, but transcodeable to support any and all of these services. Interesting… Ray
I like the Information Hypervisor (IHV) idea: going beyond protocols and into indexing and searchability of the data, hashing it for faster deduplication, replication, and data mining. Virus scans could move from the edge to the core, and organizations could find information anywhere, independent of protocol type.
There was a really cool technology we worked on at VERITAS called VxMS, a mapping service that could take any block device and analyze its internal filesystem structure so you could do file-level restores from block-level backups and snapshots. Baking in that kind of tech would make for a powerful IHV solution.
Steve, Thanks again for your comments. Yes, there is the mapping layer for the data hypervisor, but it only gets much more complex for the information hypervisor, because now you need SQL, HDFS, and probably a half-dozen or more other ones as well. Ray
I like your article. What you outline here is somewhat akin to where we're headed at OS NEXUS with QuantaStor. QuantaStor runs on commodity hardware and as a virtual machine under all major hypervisors. We chose to build on top of Linux for ease of support and maintenance but have focused the design of the product towards making life easy for the virtualization admin.
Where we're unique is in the storage management layer design which incorporates scale-out grid management, and multi-tenancy.
Looking ahead, the enterprise and managed hosting companies will need a way to boost efficiency by another order of magnitude. As you point out, that's going to require a solution that uses commodity hardware top to bottom; I think you're spot on.
Steve, Thanks for your comment. As mentioned, there are probably at least a half-dozen solutions on the market today that approximate the data hypervisor. Some of these use proprietary hardware, some use proprietary hypervisors, and some use proprietary software. I believe that ultimately there will be a place for all of these solutions, but as solutions go up in "proprietariness", they must deliver more functionality and thus more benefits to the customer. The fact that your solution is implemented to support any server hypervisor bodes well. However, as indicated in the post, there is definitely a place for a commodity hardware/commodity software solution that provides the services of a true data hypervisor. Startups seem better positioned to create these sorts of systems, but in the end, this threatens major storage vendor business models and as such needs to be adopted by them or risk marginalization. Good luck with your product. Ray
Thanks Ray. I think commodity software + commodity hardware is almost there today, so it will be interesting to see the data hypervisor merge into the mainstream over the next few years. With Xen/KVM and OpenStack Swift, Gluster, Btrfs, SCST/LIO, etc., there's a real formation of critical mass there. I think the challenge is really in bringing all those complex technologies together under one umbrella, and I think that'll take a long time for open source to produce.
It's interesting: it seems OSS is great at solving some of the really nasty low-level technical challenges like hypervisors, filesystems, and kernels, but is generally not so good at tackling the management complexity problems (certainly with a few exceptions).
Steve, Thanks for the comment, and I couldn't agree more. It seems OSS has a blind side when it comes to management and simplicity. Regards, Ray