I was talking with another cloud storage gateway provider today and asked whether they do any sort of backup for data sent to the cloud. Their answer disturbed me – they said they depend on the backend cloud storage provider's replication services to provide data protection – sigh. Curtis and I have written about this before (see my Does Cloud Storage need Backup? post and Replication is not backup by W. Curtis Preston).
Cloud replication is not backup
Replication does a nice job of covering a data center or hardware failure that leaves data at one site inaccessible, because it allows access to a replica of the data from another site. As far as I am concerned there’s nothing better than replication for these sorts of DR purposes, but it does nothing for someone deleting the wrong file: the deletion is faithfully replicated to every copy. (I once ran an “rm * *” command on a shared Unix directory – it wasn’t pretty).
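To make the distinction concrete, here is a toy sketch of a replicated store. The ReplicatedStore class, the site names and the file name are all made up for illustration and do not model any particular vendor's service; the only assumption is that writes and deletes are applied to every site.

```python
class ReplicatedStore:
    """Toy model (hypothetical): every write and delete is applied to all sites."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}
        self.down = set()

    def put(self, key, data):
        for objects in self.sites.values():
            objects[key] = data

    def delete(self, key):
        # The delete is replicated just as faithfully as the write was.
        for objects in self.sites.values():
            objects.pop(key, None)

    def fail_site(self, name):
        self.down.add(name)

    def get(self, key):
        # Read from the first site that is still up and holds the object.
        for name, objects in self.sites.items():
            if name not in self.down and key in objects:
                return objects[key]
        return None


store = ReplicatedStore(["east", "west"])
store.put("report.doc", b"quarterly numbers")

# A site failure is covered: the other replica still serves the data.
store.fail_site("east")
survives_site_failure = store.get("report.doc")

# A mistaken delete is not covered: it removes every replica.
store.delete("report.doc")
survives_delete = store.get("report.doc")
```

In this sketch survives_site_failure still holds the data, while survives_delete comes back empty, which is exactly the gap that replication leaves open.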
Some (backend) cloud storage vendors delay the deletion of blobs/containers as one solution to this problem. The data “stays around” for “sometime” after being deleted and can be restored via special request to the cloud storage vendor. The only problem is that “sometime” is an ill-defined, nebulous concept which is not guaranteed or specified in any way. Also, depending on the “fullness” of the cloud storage, this time frame may be much shorter or longer. End-user data protection cannot depend on such a wishy-washy arrangement.
Other solutions to data protection for cloud storage
One way is to keep a local backup of any data located in cloud storage. But this kind of defeats the purpose of cloud storage, since the data ends up stored both locally (as backups) and remotely. I suppose the backup data could be sent to another cloud storage provider, but someone, somewhere, would still need to support some sort of versioning in order to keep multiple iterations of the data around, e.g., 90 days’ worth of backups. Sounds like a backup package front-ending cloud storage to me…
Another approach is to have the gateway provider supply some sort of backup internally, using the very same cloud storage to hold multiple versions of the data. As long as the user can specify how many days or versions of backups are kept, this works great: cloud replication supports availability in the face of hardware failures, and multiple versions support availability in the face of finger checks and logical corruption.
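As a rough sketch of what such gateway-side versioning might look like, the following keeps every write (and every delete) as a new version and prunes by a user-specified count and age. VersionedStore, max_versions and max_age_days are hypothetical names invented here, not any gateway product's API, and an in-memory dict stands in for the cloud storage backend.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Version:
    written_at: datetime
    data: Optional[bytes]  # None marks a deletion


class VersionedStore:
    """Hypothetical gateway-side version store with a retention policy."""

    def __init__(self, max_versions: int, max_age_days: int):
        self.max_versions = max_versions  # should be at least 2 to undo a delete
        self.max_age = timedelta(days=max_age_days)
        self.history = {}  # key -> list of Version, oldest first

    def put(self, key, data, now):
        self._append(key, Version(now, data), now)

    def delete(self, key, now):
        # A delete only adds a marker; older versions remain restorable.
        self._append(key, Version(now, None), now)

    def _append(self, key, version, now):
        versions = self.history.setdefault(key, [])
        versions.append(version)
        cutoff = now - self.max_age
        older = [v for v in versions if v.written_at < cutoff]
        recent = [v for v in versions if v.written_at >= cutoff]
        # Keep the newest pre-cutoff version so that any point in time
        # inside the retention window can still be restored.
        kept = older[-1:] + recent
        self.history[key] = kept[-self.max_versions:]

    def get(self, key, as_of=None):
        # Return the data as it stood at `as_of` (or the latest state).
        for v in reversed(self.history.get(key, [])):
            if as_of is None or v.written_at <= as_of:
                return v.data
        return None


day0 = datetime(2024, 1, 1)
store = VersionedStore(max_versions=5, max_age_days=90)
store.put("report.doc", b"draft", day0)
store.put("report.doc", b"final", day0 + timedelta(days=1))
store.delete("report.doc", day0 + timedelta(days=2))

current = store.get("report.doc")
restored = store.get("report.doc", as_of=day0 + timedelta(days=1))
```

Here the finger-check delete leaves current empty, but restored still returns the version written the day before. A real gateway would persist each version as its own object in the backend cloud storage, so the provider's replication would protect the versions themselves against hardware failure.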
This problem can be solved in many ways, but just using cloud replication is not one of them.
Listen up folks: whenever you think about putting data in the cloud, you need to ask about backups, among other things. If they say “we only offer the data replication provided by the cloud storage backend” – go somewhere else. Trust me, there are solutions out there that really back up cloud data.