Caching DaaD for federated data centers

Internet Splat Map by jurvetson (cc) (from flickr)

Today, I attended a webinar where Pat Gelsinger, President of Information Infrastructure at EMC, discussed their concept for a new product based on the Yotta Yotta technology EMC acquired a few years back.  Yotta Yotta’s product was a distributed, coherent caching appliance with FC front-end ports, an internal InfiniBand network between appliances, and both FC and WAN back-end links.

What one did with Yotta Yotta nodes was place them in front of your block storage, connect them together locally via InfiniBand and remotely via a WAN technology of your choice (at the time), and then access any data behind the appliances from any attached location.  They also transferred bulk data between remote nodes very quickly.  So their technology provided very rapid data transmission over standard WAN interfaces and distances, plus a distributed cache, spanning those same distances, to the data behind the appliances.

I like caching appliances as much as anyone, but they were prominent only in the late ’70s and early ’80s, mostly because caching was hard to do with the storage subsystems of the day, and they went away a long time ago.  Nowadays you can barely purchase a lone disk drive without a cache in it.  So what’s different?

Introducing DaaD

Today we have SSDs and much cheaper processing power.  I wrote about new caching appliances like DataRam‘s XcelaSAN in a Cache appliances rise from the dead post I did after last year’s SNW.  But EMC’s going after a slightly broader domain – the world.  The caching appliance EMC is discussing is really intended to support distributed data access, or as I like to call it, Data-at-a-Distance (DaaD).

How can this work?  Data is stored on subsystems at various locations around the world.  A DaaD appliance is inserted in front of each of these and connected over the WAN.  Some or all of that data is then re-configured (at the block or, more likely, LUN level) to be accessible at a distance from each DaaD data center.  As each data center reads and writes data from/to its remote brethren, some portion of that data is cached locally in the DaaD appliance; the rest is only available by going to the remote site (with considerably higher latency).
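To make the read path concrete, here’s a minimal sketch in Python of how a DaaD node might serve reads: a local cache hit returns quickly, while a miss pays the WAN round trip to the owning site.  The class, latency figures, and dict-backed stores are all hypothetical illustrations of the general idea, not anything EMC or Yotta Yotta has described.

```python
import time

REMOTE_LATENCY = 0.080   # ~80ms WAN round trip (illustrative guess)
LOCAL_LATENCY = 0.0005   # ~0.5ms local cache hit (illustrative guess)

class DaaDNode:
    """One hypothetical DaaD appliance sitting in front of a site's storage."""

    def __init__(self, site, backing_stores):
        self.site = site
        self.backing_stores = backing_stores  # site -> {block_id: data}
        self.cache = {}                       # (owner_site, block_id) -> data

    def read(self, owner_site, block_id):
        key = (owner_site, block_id)
        if key in self.cache:                 # local hit: fast path
            time.sleep(LOCAL_LATENCY)
            return self.cache[key]
        time.sleep(REMOTE_LATENCY)            # miss: go across the WAN
        data = self.backing_stores[owner_site][block_id]
        self.cache[key] = data                # keep a local copy for next time
        return data

# First read pays the WAN round trip; repeats are served from the local cache.
stores = {"NY": {("lun1", 42): b"payload"}, "LA": {}}
la_node = DaaDNode("LA", stores)
la_node.read("NY", ("lun1", 42))   # slow: fetched from NY
la_node.read("NY", ("lun1", 42))   # fast: LA's cached copy
```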

This works moderately well for well-behaved, read-intensive workloads where 80% of the I/O goes to 20% of the data (most of which is cached locally).  But block writes present a particularly nasty problem, since any write has to be propagated to all cached copies before it can be acknowledged.

It’s possible write propagation could be done by invalidating the data in cache (so any subsequent read would need to re-access the data from the owning site).  Nevertheless, to even know which DaaD nodes hold a cached copy of a particular block, one needs to maintain a dictionary of all globally identifiable blocks held in any DaaD cache node at every moment in time.  Any such table would change often and would need to be updated very carefully: deadlock-free, atomically, and with non-failable transactions – therein lies one of the technological hurdles.  Doing this quickly, without impacting performance, is another.
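Here’s a minimal sketch of that directory-plus-invalidation idea, assuming a single global dictionary mapping each block to the set of nodes caching it.  The names, the single-process lock standing in for a distributed transaction, and the synchronous invalidation loop are all my assumptions for illustration; a real implementation would need distributed, fault-tolerant coordination across the WAN.

```python
import threading

class CacheNode:
    """A hypothetical DaaD node's local cache of remote blocks."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = {}

    def invalidate(self, block_id):
        # Drop the stale copy; the next read must re-fetch from the owner.
        self.cache.pop(block_id, None)

class BlockDirectory:
    """Global table of which nodes hold a cached copy of each block."""

    def __init__(self, nodes):
        self.nodes = nodes              # node_id -> CacheNode
        self.holders = {}               # block_id -> set of caching node ids
        self.lock = threading.Lock()    # stand-in for a distributed lock

    def note_cached(self, block_id, node_id):
        # Record that node_id now holds a cached copy of this block.
        with self.lock:
            self.holders.setdefault(block_id, set()).add(node_id)

    def write(self, block_id, writer_id):
        # Every remote cached copy must be invalidated before the write
        # can be acknowledged back to the writing host.
        with self.lock:
            for node_id in self.holders.get(block_id, set()) - {writer_id}:
                self.nodes[node_id].invalidate(block_id)
            self.holders[block_id] = {writer_id}
        return "ack"                    # only safe to acknowledge now
```

Even in this toy version, the write can’t be acked until every invalidation in the loop completes, which is exactly where the latency and failure-handling hurdles show up at WAN distances.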

So, simple enough: EMC takes Yotta Yotta’s technology, updates it for today’s processors, networking, and storage, and releases it as a data center federation enabler.  As for what one can do with a federated data center, well, that’s another question – it involves VMotion, and must be a subject for a future post …

10 thoughts on “Caching DaaD for federated data centers”

  1. This post is great; I was curious whatever happened to Yotta Yotta. Having been an end user of the Yotta Yotta framework implemented at AOL a few years ago, I saw the benefit and potential for virtual environments that spanned entire data centers, with storage being a commodity no longer limited to the local SAN fabric but truly live replication across a geographical void. Most importantly, with VLANs extended across multiple data centers and proper replication of firewall devices, you now have the ability to virtually migrate your running systems to another geographical location without downtime, using the VMware client to do it. I know the overhead and cost for this type of setup is not taken lightly, but reality states that instead of simply having an active-passive fail-over site, I now have the ability to extend my peak capacity to my fail-over site, making the investment in geographical redundancy faster to justify its ROI and lessening TCO over time.

    1. Thanks for the comment. EMC didn't say when they were coming out with something. Also, VMware at last year's VMworld demoed VMotion at a distance using some special sauce from Cisco (which I think extended the LAN/VLAN). It was interesting but not yet ready for prime time.

      If EMC is right, and Cisco can deal with a LAN extending across data centers, then VMware and others might just be able to support VMotion over distances. And this is not just a DR solution, it's really a load-balancing solution as well.

  2. Very good overview, Ray. Federated Storage and Virtual Storage Pools are coming whether we like it or not. Data at a Distance? Latency at a Distance (LaaD) is more like it; this will quickly become the biggest challenge (opportunity?) for storage vendors as they promote Federated Pools around the globe. It will be interesting to watch this technology evolve. Thanks, DrDedupe
