We were talking with Ursheet Parikh at StorSimple today about their new cloud gateway product (to be covered in a future post) when, at the end of the talk, he described some IP they have to handle cloud storage's "eventual consistency." Dumbfounded, I asked him to clarify, having never heard the term before.
Apparently, eventual data consistency is what you get from most cloud storage providers. With eventual consistency, the provider does not guarantee that when you read back a recently updated object you will get the latest copy.
In contrast, "immediate consistency" means that if you update an object, the cloud storage provider guarantees the latest version will be supplied for any and all subsequent read-backs. To my mind, all storage before cloud storage guaranteed immediate consistency; anything less was considered a data integrity failure.
To explain: cloud storage providers replicate multiple copies of every object throughout their environment, and all of those copies must be updated. As such, they cannot guarantee that a read will return the updated version rather than one of the downlevel copies. Yikes!
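To make the behavior concrete, here is a toy simulation of the replicated-copies problem described above. This is my own illustrative sketch, not any provider's actual architecture: writes land on one replica, replication lags, and a read served by a random replica may return a downlevel copy.

```python
import random

class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and propagates lazily,
    so a read may return a stale ("downlevel") version of the object."""

    def __init__(self, replicas=3):
        self.replicas = [{} for _ in range(replicas)]

    def put(self, key, value):
        # The write hits a single replica immediately...
        self.replicas[0][key] = value

    def propagate(self):
        # ...and reaches the others only when replication catches up.
        for r in self.replicas[1:]:
            r.update(self.replicas[0])

    def get(self, key):
        # A read may be served by any replica, current or not.
        return random.choice(self.replicas).get(key)

store = EventuallyConsistentStore()
store.put("obj", "v1")
store.propagate()                # all replicas now hold v1
store.put("obj", "v2")          # the update has not propagated yet
seen = {store.get("obj") for _ in range(100)}
# Until propagate() runs again, reads can return either "v1" or "v2".
```

An "immediate consistency" store would be one where `get` never answers until every replica (or an authoritative one) reflects the latest `put`.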
What does this mean for your cloud storage?
First, Microsoft’s Azure cloud storage is the only provider that guarantees immediate consistency, though to do so it has placed some restrictions on object size. That means all the other cloud storage providers guarantee only eventual consistency.
Second, cloud storage with only an eventual consistency guarantee should not be used for data that’s updated frequently and then read back. It’s probably OK for archive or backup storage (which isn’t restored for a while), BUT it’s not OK for “normal” file or block data that is updated frequently and then read back with the expectation of seeing those updates.
According to Ursheet, the cloud storage providers have been completely up-front about their consistency levels, and as such his product, StorSimple, has been specifically designed to accommodate variable levels of consistency. We would need to ask the other providers how they handle cloud storage consistency to understand whether they have tried to deal with this as well.
However, from my perspective eventual consistency is scary. It appears that cloud storage has redefined what we mean by storage, or at the very least has weakened its data integrity guarantees. Moreover, this seriously limits the usability of raw cloud storage to archive-like, infrequently updated data.
And I thought cloud storage was going to take over the data center – not like this…
12 thoughts on “Eventual data consistency and cloud storage”
Ray — terrific post. Very few people focus on this issue. It is important to understand that eventual consistency is not a bug in the cloud but rather an inherent property of highly scalable distributed systems. Vogels has posted some very informative articles at All Things Distributed.
The bottom line is that there is no way to solve the problem in the cloud without compromising the scalability of the cluster or creating potential locks in the system.
The problem is that applications, and file systems in particular, need immediate consistency. The special use cases that can live without it are archiving and some applications (really other forms of archiving, like PACS) where the storage subsystem is assumed to be WORM (i.e., tape or optical media). This behavior kept use of the cloud limited to WORM-like applications until recently, when gateway products like StorSimple and the Nasuni Filer solved the problem at the edge.
Our approach is to create consistent snapshots in the file system and store those deduplicated snapshots in the cloud as WORM objects. We have also written about this very issue on our blog:
Thanks for the comment. I guess I wasn't aware of this attribute of the cloud until I talked with Ursheet. The fact that at least one cloud storage provider can supply "immediate consistency" tells me it's not necessarily inherent in the architecture of cloud systems that scale. It seems to me that some form of metadata directory, replicated and updated across all cloud data sites/instances, would suffice to show which objects are current and which are not.
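The metadata-directory idea above can be sketched in a few lines. This is a hypothetical design of my own for illustration, not any vendor's implementation: an authoritative directory records the latest version number of each object, so a reader can recognize and skip a downlevel copy.

```python
class VersionedRead:
    """Sketch: a replicated metadata directory tells readers which
    version of an object is current, so stale replicas are rejected."""

    def __init__(self):
        self.directory = {}         # authoritative: key -> latest version
        self.replicas = [{}, {}]    # each replica: key -> (version, value)

    def put(self, key, value):
        v = self.directory.get(key, 0) + 1
        self.directory[key] = v
        self.replicas[0][key] = (v, value)   # only one replica updated so far

    def sync(self):
        # Background replication eventually brings the other replica current.
        self.replicas[1].update(self.replicas[0])

    def get(self, key):
        # Consult the directory first, then skip any downlevel copy.
        want = self.directory[key]
        for r in self.replicas:
            entry = r.get(key)
            if entry and entry[0] == want:
                return entry[1]
        raise LookupError("no up-to-date replica reachable")

s = VersionedRead()
s.put("obj", "v1"); s.sync()
s.put("obj", "v2")               # second replica still holds v1
latest = s.get("obj")            # the directory steers the read to v2
```

Of course, keeping that directory itself consistent across sites is exactly the hard part the commenter above alludes to, which is why providers trade it away for scalability.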
In any event, all this is just another reason to have a cloud storage gateway to deal with this on behalf of the user community.
Nice write-up. The eventual consistency issue comes up not just for immediate reads after writes but also for writes after purges, which in that case could result in complete data loss. This is where we need products like StorSimple and Nasuni to let enterprises capture the operational simplicity, cost benefits, on-demand capacity, and other compelling value propositions cloud infrastructures do provide. Cloud storage as it stands today should be viewed as yet another tier of storage for storage controllers like StorSimple, rather than an end-all solution.
Good catch, hadn't considered the purge scenario – complete data loss from downlevel objects. Double Yikes!!
I’m not sure what the fuss is here. TwinStrata’s product delivers immediate consistency through its intelligent caching functionality.
Don’t all of the hybrid cloud gateway products do that? Why is this a special feature?
GregR, I was unaware of TwinStrata’s product or capabilities. Perhaps sometime we can discuss just how TwinStrata guarantees immediate consistency.
Sure, I’d be happy to. In a nutshell, every write first goes to local cache, and is then replicated to the cloud. Any reads of that block will get satisfied from cache, not the cloud. The cache will always have the latest write that was issued unless it’s been flushed for some reason, and by then the cloud will have achieved consistency on its own. We also have a variable cache policy, so it’s possible to have a cache that will never flush.
I can’t believe this is unique to us though (as much as I’d like to). Any hybrid cloud architecture will behave this way, though maybe not using exactly the same methodology (the net effect will be the same).
Anyway, feel free to contact me email@example.com.