I was talking with a cloud storage vendor the other day, and they made an interesting comment: cloud storage doesn’t need to be backed up?! They told me that they, like most cloud storage providers, replicate customer file data so that there are always at least two copies of customer data residing in the cloud at different sites, zones, or locations. But does having multiple copies of file data eliminate the need to back up data?
Most people back up data to prevent data loss from hardware/software/system failures and from finger checks (user error). Nowadays, I back up my business data nightly to an external hard disk. Once a week I add some family data and back all of it up to external removable media, and once a month I take a full backup of all user data on my family’s Macs (photos, music, family stuff, etc.) to external removable media, which is then stored offsite.
Over my professional life (30+ years), I have lost personal data to a hardware/software/system failure maybe a dozen times. These events have become much rarer in recent years (thank you, drive vendors). But about once a month I screw something up and delete or overwrite a file I need to keep around. Most often I restore it from the external hard drive, but occasionally I have to retrieve the file from the removable media.
I am probably not an exception when it comes to finger checks. People make mistakes. How cloud storage providers handle restoring file data deleted through user error will be a significant determinant of service quality for most novice users and all professional ones.
Now, as I see it, there are a few ways cloud storage providers can deal with this problem:
- Support data backup (NDMP or something similar) that takes a copy of the data out of the cloud and manages it elsewhere. This approach has worked for the IT industry for over 50 years now and still appeals to many of us.
- Never “really” delete file data. By this I mean always keeping replicated copies of all data that is ever written to the cloud. How a customer accesses such “not really deleted” data is open to debate, but suffice it to say some form of file versioning might work.
- “Delay” file deletion. Don’t delete a file when the user requests it; rather, wait until some external event, interval, or management policy kicks in to “actually” delete the file from the cloud. Again, some form of versioning may be required to access “delay deleted” data.
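To make the last two options a little more concrete, here is a minimal in-memory sketch of a versioned store with delayed deletion. It is illustrative only and does not represent any provider’s actual design: the class and method names (`DelayedDeleteStore`, `purge`, `restore`) are hypothetical, and time is modeled as a plain number (say, days) rather than a real clock. Every write keeps the previous version, a user’s delete only marks the file, and a separate `purge` pass does the “actual” deletion once the retention interval has passed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch, not a real provider API. Time is a plain number (e.g. days).

@dataclass
class Version:
    data: bytes
    written_at: float

@dataclass
class FileEntry:
    versions: list = field(default_factory=list)  # oldest first
    deleted_at: Optional[float] = None            # None means the file is live

class DelayedDeleteStore:
    def __init__(self, retention: float):
        self.retention = retention  # how long a "deleted" file stays recoverable
        self.files: dict[str, FileEntry] = {}

    def write(self, name: str, data: bytes, now: float) -> None:
        entry = self.files.setdefault(name, FileEntry())
        entry.versions.append(Version(data, now))  # never overwrite in place
        entry.deleted_at = None                    # writing revives a deleted file

    def read(self, name: str) -> bytes:
        entry = self.files.get(name)
        if entry is None or entry.deleted_at is not None:
            raise FileNotFoundError(name)
        return entry.versions[-1].data

    def delete(self, name: str, now: float) -> None:
        # The user's delete only marks the file; no space is freed yet.
        entry = self.files.get(name)
        if entry is None or entry.deleted_at is not None:
            raise FileNotFoundError(name)
        entry.deleted_at = now

    def restore(self, name: str, now: float, version: int = -1) -> bytes:
        # Undo a finger check: clear the delete mark and, if an older version
        # was chosen, copy it forward so it becomes the current version.
        entry = self.files.get(name)
        if entry is None:
            raise FileNotFoundError(name)  # already purged: too late to restore
        entry.deleted_at = None
        chosen = entry.versions[version]
        if chosen is not entry.versions[-1]:
            entry.versions.append(Version(chosen.data, now))
        return chosen.data

    def purge(self, now: float) -> list[str]:
        # The "actual" deletion, run on an interval or triggered by policy.
        expired = [n for n, e in self.files.items()
                   if e.deleted_at is not None and now - e.deleted_at >= self.retention]
        for n in expired:
            del self.files[n]
        return expired
```

With `DelayedDeleteStore(retention=30)`, a file deleted on day 2 can still be restored on day 10, but a purge on day 32 or later removes it for good. Setting `retention` to infinity (`float("inf")`) turns this into the “never really delete” option, which is exactly where the storage bill starts to grow.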
Never deleting a file is probably the easiest solution to the problem, but the cloud storage bill would quickly grow out of control. Delaying file deletion is probably a better compromise, but deciding which event, interval, or policy should trigger “actually deleting data” to free up storage space is crucial.
Luckily, most people realize fairly quickly when they have made a finger check (although they may be reluctant to admit it). So waiting a week, month, or quarter before actually deleting file data should solve this problem. Mainframers may recall generation datasets (files), where one specified the number of generations (versions) of a file to keep and, when this limit was exceeded, the oldest version was deleted. Using a space threshold to trigger deletion of old file versions may also work; e.g., whenever the cloud reaches 60% of capacity, it starts deleting old file versions. Any or all of these could be applied to different classes of data by management policy.
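The two retention triggers just described can be sketched as simple selection functions over a list of stored versions. This is an assumption-laden illustration, not how any mainframe or cloud system actually implements them: the names `StoredVersion`, `apply_generation_limit`, and `apply_space_threshold` are hypothetical. The generation limit keeps only the newest N versions of each file, loosely in the spirit of generation datasets. For the space threshold, I assume it only ever evicts older versions (never the current one of a file), oldest first, and stops once usage drops below the threshold.

```python
from dataclasses import dataclass

# Hypothetical sketch of two retention policies for old file versions.

@dataclass
class StoredVersion:
    name: str          # file name
    written_at: float  # when this version was written
    size: int          # bytes used by this version

def apply_generation_limit(versions: list[StoredVersion], limit: int) -> list[StoredVersion]:
    """Return the versions to delete so that each file keeps at most `limit` newest."""
    by_name: dict[str, list[StoredVersion]] = {}
    for v in versions:
        by_name.setdefault(v.name, []).append(v)
    evicted = []
    for vs in by_name.values():
        vs.sort(key=lambda v: v.written_at)           # oldest first
        evicted.extend(vs[:max(0, len(vs) - limit)])  # everything beyond the limit
    return evicted

def apply_space_threshold(versions: list[StoredVersion], capacity: int,
                          threshold_pct: int = 60) -> list[StoredVersion]:
    """Return old versions to delete, oldest first, until usage is below threshold_pct."""
    used = sum(v.size for v in versions)
    # Find the current (newest) version of each file; those are never evicted.
    newest: dict[str, StoredVersion] = {}
    for v in versions:
        if v.name not in newest or v.written_at > newest[v.name].written_at:
            newest[v.name] = v
    candidates = sorted((v for v in versions if newest[v.name] is not v),
                        key=lambda v: v.written_at)
    evicted = []
    for v in candidates:
        if used * 100 < threshold_pct * capacity:  # integer math avoids float rounding
            break
        evicted.append(v)
        used -= v.size
    return evicted
```

Both functions only select what to delete and leave the actual deletion to the caller, so a management policy could run one, the other, or both against different classes of data.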
Of course, all of this is pretty much what a sophisticated backup package does today. Backup software retains old file data for a defined timeframe, typically on media or storage other than where the data normally resides. Backup storage space/media can be reclaimed periodically, such as by reusing backup media every quarter or by retaining only a quarter’s worth of data in a VTL. Backup software removes the management of file versioning from the storage vendor and places it in the hands of the backup vendor. In any case, many of the same policies for dealing with deleted file versions discussed above can apply.
Nonetheless, in my view, cloud storage providers must do something to support restoration of deleted file data. File replication is a necessary and effective way to deal with hardware/software/system failures, but user error is much more likely. Not supplying some method to restore files when mistakes happen is unthinkable.