Yesterday, twitterland was buzzing about EMC’s latest enhancement to their Atmos Cloud Storage platform called GeoProtect. This new capability improves cloud data protection by supporting erasure code data protection rather than just pure object replication.
Erasure coding has been used in storage for over a decade; common algorithms include Reed-Solomon, Cauchy Reed-Solomon, and EVENODD coding. All of these algorithms split customer data into data instances and parity (encoding) instances, so that some number of data or parity instances can be erased (or lost) while the customer data can still be supplied. For example, an R-S encoding scheme we used in the past (called RAID 6+) had 13 data fragments and 2 parity fragments. Such an encoding scheme could tolerate the simultaneous failure of any two drives and still supply (reconstruct) customer data.
But how does RAID differ from something like GeoProtect?
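To make the 13 + 2 example concrete, here is a minimal Python sketch of a Reed-Solomon-style erasure code. It is an illustration only, not the RAID 6+ or GeoProtect implementation. The names `encode`, `decode` and `_interpolate` are hypothetical, each fragment holds a single symbol, and the arithmetic is done modulo the prime 257 rather than in the GF(2^8) field that production codes typically use (so a parity symbol may take the value 256 and not fit in a byte). It needs Python 3.8 or later for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through [(xi, yi), ...], modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, m):
    """Return the k data symbols followed by m parity symbols (k + m <= 257)."""
    k = len(data)
    points = list(enumerate(data))  # data symbol i sits at x = i
    return list(data) + [_interpolate(points, x) for x in range(k, k + m)]


def decode(fragments, k):
    """Rebuild the k data symbols from any k surviving fragments.

    fragments maps fragment index -> symbol; lost fragments are simply absent.
    """
    if len(fragments) < k:
        raise ValueError("too many fragments lost")
    points = list(fragments.items())[:k]
    return [_interpolate(points, x) for x in range(k)]


data = list(b"hello, world!")   # 13 data symbols
stored = encode(data, m=2)      # 13 data + 2 parity fragments
# Simulate losing one data fragment and one parity fragment.
survivors = {i: s for i, s in enumerate(stored) if i not in (3, 14)}
assert decode(survivors, k=len(data)) == data
```

Because any 13 points determine the degree-12 polynomial, it does not matter which two of the 15 fragments go missing; changing `m` changes how many losses can be tolerated.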
- RAID is typically within a storage array and not across storage arrays
- RAID is typically limited to a small number of alternative configurations of data disks and parity disks which cannot be altered in the field, and
- Currently, RAID typically doesn’t support more than two disk failures while still being able to recover customer data (see Are RAIDs days numbered?)
As I understand it, GeoProtect currently supports only two different encoding schemes, which tolerate different numbers of data instance failures while still protecting customer data. And with GeoProtect you are protecting data across Atmos nodes, and potentially across different geographic locations, not just within storage arrays. Also, with Atmos this is all policy driven: data that comes into the system can use any object replication policy or either of the two GeoProtect policies supported today.
The nice thing about R-S encoding, though, is that it doesn’t have to be fixed to two encoding schemes. And as it’s all software, new coding schemes could easily be released over time, possibly someday becoming something a user could dial up or down at will.
But this would seem much more like what Cleversafe has been offering in their SliceStor product. With Cleversafe, the user can specify exactly how much redundancy they want to support and the system takes care of everything else. In addition, Cleversafe has implemented a more fine-grained approach (with many more fragments), and data and parity are intermingled in each stored fragment.
It’s not a big stretch for Atmos to go from two GeoProtect configurations to four or more. It’s unclear to me what the right number would be, but once you get past three or so, it might be easier to just code a generic R-S routine that can handle any configuration the customer wants. I may be oversimplifying the mathematics here, though.
Nonetheless, I wouldn’t be surprised if future versions of Atmos allowed the way data is protected to change over time through policy management. Specifically, while data is being frequently accessed, one could use object replication or less compressed encoding to speed up access; once access frequency diminishes (or time passes), the data can then be protected with more storage-efficient encoding schemes, which would reduce the data footprint in the cloud while still offering similar resiliency to data loss.
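As a rough illustration of that trade-off, the Python sketch below compares the raw storage footprint of three full copies with that of a 13 + 2 encoding, both of which survive two lost instances. Everything in it is an assumption made for illustration: the function names, the 30-day threshold, and the choice of schemes are hypothetical and are not Atmos policies or GeoProtect configurations.

```python
def raw_footprint(logical_bytes, data_fragments, parity_fragments):
    """Bytes actually stored for a (data + parity) protection scheme.

    Keeping c full copies is modelled as 1 data fragment plus c - 1 extra copies.
    """
    return logical_bytes * (data_fragments + parity_fragments) / data_fragments


# Illustrative schemes as (data_fragments, parity_fragments).
# Both survive the loss of any two instances.
THREE_COPIES = (1, 2)    # object replication with three full copies
ENCODED_13_2 = (13, 2)   # a 13 data + 2 parity R-S layout


def pick_scheme(days_since_last_access, cold_after_days=30):
    """Hypothetical policy: replicate hot objects, erasure-code cold ones."""
    if days_since_last_access < cold_after_days:
        return THREE_COPIES
    return ENCODED_13_2


for age_days in (1, 90):
    scheme = pick_scheme(age_days)
    print(age_days, scheme, raw_footprint(1_000_000, *scheme))
```

Under these assumptions a hot object costs three times its logical size, while the same object once cold costs 15/13 of it, or roughly 1.15 times, for the same two-failure tolerance. The price is that reads of the encoded form may need to gather and reassemble fragments.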
Full disclosure: I have worked for Cleversafe in the past, and although I am currently working with EMC, I have had no work from EMC’s Atmos team.