Does cloud storage need backup?

I was talking with a cloud storage vendor the other day and they made an interesting comment: cloud storage doesn’t need backup. They told me that they, like most cloud storage providers, replicate customer file data so that there are always at least two copies of customer data residing in the cloud at different sites, zones or locations. But does having multiple copies of file data eliminate the need to back up data?

Most people back up data to prevent data loss from hardware/software/system failures and from finger checks (user error). Nowadays, I back up my business stuff to an external hard disk nightly, add some family stuff to this and back all of it up once a week to external removable media, and once a month take a full backup of all user data on my family Macs (photos, music, family stuff, etc.) to external removable media which is then saved offsite.

Over my professional existence (30+ years) I have lost personal data to a hardware/software/system failure maybe a dozen times. These events have gotten much rarer in recent years (thank you drive vendors). But about once a month I screw something up and delete or overwrite a file I need to keep around. Most often I restore from the hard drive but occasionally I use the removable media to retrieve the file.

I am probably not an exception with respect to finger checks. People make mistakes. How cloud storage providers handle restoring deleted file data for user error will be a significant determinant of service quality for most novice and all professional users.

Now in my mind there are a couple of ways cloud storage providers can deal with this problem.

  • Support data backup, NDMP, or something similar which takes a copy of the data off the cloud and manages it elsewhere. This approach has worked for the IT industry for over 50 years now and still appeals to many of us.
  • Never “really” delete file data. By this I mean that you always keep replicated copies of all data that is ever written to the cloud. How a customer accesses such “not really deleted” data is open to debate but suffice it to say some form of file versioning might work.
  • “Delay” file deletion. Don’t delete a file when the user requests it, but rather wait until some external event, interval, or management policy kicks in to “actually” delete the file from the cloud. Again some form of versioning may be required to access “delay deleted” data.
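
The “delay” option above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular provider does it: the `DelayedDeleteStore` class, its method names, and the 30-day grace period are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class StoredFile:
    data: bytes
    deleted_at: Optional[datetime] = None  # None means the file is still live


@dataclass
class DelayedDeleteStore:
    """Hypothetical store that marks files as deleted instead of removing them."""
    grace: timedelta = timedelta(days=30)  # assumed policy interval
    files: Dict[str, StoredFile] = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        self.files[name] = StoredFile(data)

    def delete(self, name: str, now: datetime) -> None:
        # Only mark the file; the space is reclaimed later by reap().
        self.files[name].deleted_at = now

    def restore(self, name: str) -> bool:
        # Undo a finger check, as long as reap() has not removed the file yet.
        f = self.files.get(name)
        if f is None or f.deleted_at is None:
            return False
        f.deleted_at = None
        return True

    def reap(self, now: datetime) -> List[str]:
        """Actually remove files whose grace period has expired."""
        expired = [n for n, f in self.files.items()
                   if f.deleted_at is not None and now - f.deleted_at >= self.grace]
        for n in expired:
            del self.files[n]
        return expired
```

A user who deletes a file by mistake can call `restore()` any time before `reap()` runs past the grace period; after that the data is really gone.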

Never deleting a file is probably the easiest solution to this problem but the cloud storage bill would quickly grow out of control. Delaying file deletion is probably a better compromise but deciding which event, interval, or policy to use to trigger “actually deleting data” to free up storage space is crucial.

Luckily most people realize fairly quickly when they have made a finger check (although they may be reluctant to admit it). So waiting a week, month, or quarter before actually deleting file data would solve this problem. Mainframers may recall generation datasets (files) where one specified the number of generations (versions) of a file and when this limit was exceeded, the oldest version would be deleted. Also, using some space threshold trigger to delete old file versions may work, e.g., whenever the cloud gets to be 60% of capacity it starts deleting old file versions. Any or all of these could be applied to different classes of data by management policy.
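
To make the three triggers concrete, here is a rough Python sketch of each one: a generation limit, an age interval, and a space threshold. The `Version` record and the function names are made up for illustration, and the capacity check counts only old versions rather than live data, which is a simplification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Version:
    name: str
    created_day: int  # day number on which this version was written
    size: int         # size in bytes


def prune_by_generations(versions: List[Version], keep: int) -> List[Version]:
    """Keep only the newest `keep` versions, generation-dataset style."""
    ordered = sorted(versions, key=lambda v: v.created_day)
    return ordered[-keep:] if keep > 0 else []


def prune_by_age(versions: List[Version], today: int,
                 max_age_days: int) -> List[Version]:
    """Keep versions written within the last `max_age_days` days."""
    return [v for v in versions if today - v.created_day <= max_age_days]


def prune_by_capacity(versions: List[Version], capacity: int,
                      threshold: float = 0.60) -> List[Version]:
    """Drop the oldest versions until usage is at or below the threshold."""
    ordered = sorted(versions, key=lambda v: v.created_day)
    used = sum(v.size for v in ordered)
    while ordered and used > threshold * capacity:
        used -= ordered.pop(0).size  # oldest version goes first
    return ordered
```

A management policy could then pick one of these functions, or chain them, per class of data.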

Of course all of this is pretty much what a sophisticated backup package does today. Backup software keeps old file data around for a defined timeframe, typically on some other media or storage than where the data is normally stored. Backup storage space/media can be reclaimed on a periodic basis, such as reusing backup media every quarter or only retaining a quarter’s worth of data in a VTL. Backup software removes the management of file versioning from the storage vendor and places it in the hands of the backup vendor. In any case, many of the same policies for dealing with deleted file versions discussed above can apply.

Nonetheless, in my view cloud storage providers must do something to support restoration of deleted file data. File replication is a necessary and great solution to deal with hardware/software/system failures but user error is much more likely. Not supplying some method to restore files when mistakes happen is unthinkable.

11 thoughts on “Does cloud storage need backup?”

  1. In recent years, my data loss is nearly always manifest through data corruption of some kind. Sometimes it’s a finger check, and sometimes it’s a software failure. The lack of backup capability in cloud storage is an Achilles Heel. Eventually it will become an issue, especially for enterprise users.

    I agree that one solution is to delay the actual removal of deleted files. I use one service that retains deleted files for 30 days after they’re marked for deletion. The user also has the option to purge a file, which really removes it immediately.

    My personal policy is to never put anything exclusively in the cloud. I also maintain a copy locally.

    1. Jim, Thanks for your comment and validation. I do believe cloud storage can be used for primary storage but backup considerations become more severe when this is done.

  2. I find the whole concept of protecting cloud data curious. My take is that the cloud vendors will use this uncertainty as a profit center. They will (and some already do) charge different prices for different retentions. Don’t care about anything but yesterday’s data, the price is X. Oh you want yesterday’s and the past 15 days, now you pay X+Y. 30 days? X+Y+Z.

    By using the above model, the provider pushes the retention decisions to the customer. The provider’s only responsibility is to maintain the appropriate number of copies to meet the SLA.

    The final question relates to medium. What if a customer wants the data recoverable for 90 days? 1 year? 10 years? 30 years? I would suggest that different storage media would be appropriate for the different retentions, including such options as primary disk, deduplicated VTL storage and physical tape.

  3. Interesting take and why not. Other service providers have offered varying levels of retention services for years. My only concern was that some cloud vendors were not considering this. Of course some cloud storage vendors only support secondary or tertiary storage and maybe for that service, backup via replication is sufficient.

  4. You totally hit the nail on the head. I’ve been dealing with Rackspace and love their cloud solution but am amazed they don’t offer variable backups. Most often it will be due to user error but that doesn’t excuse them for not offering a solution. Crazy.

    Thanks for bringing some attention to this issue.

  5. I think the real question for cloud providers is whether they are to be considered primary-secondary storage or some sort of alternative to reference storage (for never updated data). It’s the confusion over this that sets them up for problems. If they are to be considered primary-secondary storage they need to either defer deletion or offer integrated backup. Just providing replication/mirroring does not suffice for primary-secondary storage…

  6. Of course they want to be considered primary storage, with marketing that pitches unlimited scalability, not to mention the fact that they get paid more for more data. Compressed backed up data generates less revenue. User-generated primary storage can grow super-fast. Then mix in the reason the cloud is so hot: small and micro businesses are creating web-apps that have the potential to blow up overnight, and the cloud is the perfect solution, especially on a shoe-string budget. I think we’re simply seeing marketing and demand out in front of engineering. Backup/restores will soon become the new differentiators for the first companies to offer them.

  7. The cost of keeping multiple copies or snapshots of data with another provider can be greatly minimized using compression techniques. By storing data in a form that is compressed and preferably accessible (queryable in the case of structured data) you can significantly reduce your cloud storage and upload costs. At RainStor we recently launched a SaaS data escrow service here, which although not positioned as a backup service, can be used to access old versions or deleted data should your SaaS provider be unable or unwilling to restore the data you need.
