Securing data in the cloud

Who says there are no clouds today by akakumo (cc) (from Flickr)
Who says there are no clouds today by akakumo (cc) (from Flickr)

We have posted previously about the need for backup in cloud storage. So today I would like to start a discussion on securing “data-at-rest” within the cloud.

Depositing data into the cloud seems a little like a chinese laundry to me – you deposit data in the cloud and receive a ticket or token used to retrieve the data. Today’s cloud data security depends entirely on this token.

Threats to the token

If one only looks at external security threats, two issues to token use seem apparent

  • Brute force cracking of any token is possible. I envision a set of cloud storage users wherein they use their current storage tokens as seeds to identify other alternate tokens. Such an attack could easily generate token synonyms which may or may not be valid. Detecting a brute force attack could be easily accomplished, but distributing this attack across 1000s of compromised PCs would be much harder to detect.
  • Tokens could be intercepted in the clear. Cloud data often may need to be accessed in locations outside the data center of origin. This would require sending tokens to others. These data tokens could inadvertently be sent in the clear and as such, intercepted.

Probably other external exposures beyond these two exist as well but these will suffice.

Securing cloud data-at-rest

Given the potential external and internal threats to data tokens, securing such data can eliminate any data loss from token exposure. I see at least three approaches to securing data in the cloud.

  • The data dispersal approach – Cleversafe’s product splits a data stream into byte segments and disburses these segments across storage locations. Their approach is oriented around Reed Solomon logic and results in no one location having any recognizable portion of the data. This requires multiple sites, or multiple systems at one site to implement but is essentially securing data through segmenting it. The advantages of this approach is that its fast and automatic but the disadvantage is that it only is supported via Cleversafe.
  • The software data encryption approach – there are plenty of software packages out such as GnuPG (GNU Privacy Guard) or PGP which can be used to encrypt your data prior to sending it to the cloud. It’s sort of brute force, software only approach but its advantage is that it can be used with any cloud storage provider. Its disadvantages are that it’s slow, processor intensive, and key management is sporadic.
  • The hardware data encryption approach – there are also plenty of data encryption appliances and/or hardware options out there which can be used to encrypt data. Some of these are available at the FC switch level, some are standalone appliances, and some exist at the storage subsystem level. The problem with most of these is that they only apply to FC storage and are not readily useable by Cloud Storage (unless the provider uses FC storage as its backing store). The advantages are that it’s fast and key management is generally built into the product.

One disadvantage to any of the encryption approaches is that now one needs the encryption keys and the token to access the data. Yet one more thing to protect.

Nothing says that hardware data encryption couldn’t also work for data flowing to the cloud but they would have to support IP plus the cloud specific REST interface. Such support would depend on cloud storage provider market share, but perhaps some cloud vendor could fund a security appliance vendor to support their interface directly, providing a cloud data security option.

The software approach suffers from performance problems but supports anybody’s cloud storage. It might be useful if Cloud storage providers started offering hooks into GnuPG or PGP to directly encrypt cloud data. However, most REST interfaces require some programming to use and it’s not too much of a stretch to program in encryption into this.

I like the data dispersal approach, but most argue that security is not guaranteed as reverse engineering the dispersal algorithm allows one to reconstruct the data stream. But the other more serious problem is that it only applies to Cleversafe storage, perhaps the dispersal algorithm should be open sourced (which it already is) and/or standardized.

There are possibly other approaches which I have missed here but these can easily be used to secure cloud data-at-rest. Possibly adding more security around the data token could also help alleviate this concern. Thoughts?

Post disclosure: I am not currently working with Cleversafe, any data security appliance provider, or cloud storage provider.