Cloud Storage Gateways Surface

Who says there are no clouds today by akakumo (cc) (from Flickr)
Who says there are no clouds today by akakumo (cc) (from Flickr)

One problem holding back general purpose cloud storage has been the lack of a “standard” way to get data in and out of the cloud.  Most cloud storage providers supply a REST interface, an object file interface or other proprietary ways to use their facilities.  The problem with this is that they all require some form of code development on the part of the cloud storage customer in order to make use of these interfaces.

It would be much easier if cloud storage could just talk iSCSI, FCoE, FC, NFS, CIFS, FTP,  etc. access protocols.  Then any data center could use the cloud with a NIC/HBA/CNA and just configure the cloud storage as a bunch of LUNs or file systems/mount points/shares.  Probably FCoE or FC might be difficult to use due to timeouts or other QoS (quality of service) issues but iSCSI and the file level protocols should be able to support cloud storage access without such concerns.

So which cloud storage support these protocols today?  Nirvanix supports CloudNAS used to access their facilities via NFS, CIFS and FTP,  ParaScale supports NFS and FTP, while Amazon S3 and Rackspace CloudFiles do not seem to support any of these interfaces.  There are probably other general purpose cloud storage providers I am missing here but these will suffice for now.   Wouldn’t it be better if some independent vendor supplied one way to talk to all of these storage environments.

How can gateways help?

For one example, Nasuni recently emerged from stealth mode, releasing a beta version of a cloud storage gateway that supports file access to a number of providers. Currently, Nasuni supports CIFS file protocol as a front end for Amazon S3, IronMountain ASP, Nirvanix, and coming soon Rackspace CloudFile.

However, Nasuni is more than just a file protocol converter for cloud storage.  It also supplies a data cache, file snapshot services, data compression/encryption, and other cloud storage management tools. Specifically,

  • Cloud data cache – their gateway maintains a disk cache of frequently accessed data that can be accessed directly without having to go out to the cloud storage.  File data is chunked by the gateway and flushed out of cache to the backend provider. How such a disk cache is maintained coherently across multiple gateway nodes was not discussed.
  • File snapshot services – their gateway supports a point-in-time copy of file date used for backup and other purposes.  The snapshot is created on a time schedule and provides an incremental backup of cloud file data.  Presumably these snapshot chunks are also stored in the cloud.
  • Data compression/encryption services – their gateway compresses file chunks and then encrypts it before sending them to the cloud.  Encryption keys can optionally be maintained by the customer or be automatically maintained by the gateway
  • Cloud storage management services – the gateway configures the cloud storage services needed to define volumes, monitors cloud and network performance and provides a single bill for all cloud storage used by the customer.

By chunking the files and caching them, data read from the cloud should be accessible much faster than normal cloud file access.  Also by providing a form of snapshot, cloud data should be easier to backup and subsequently restore. Although Nasuni’s website didn’t provide much information on the snapshot service, such capabilities have been around for a long time and found very useful in other storage systems.

Nasuni is provided as a software only solution. Once installed and activated on your server hardware, it’s billed for as a service and ultimately is charged for on top of any cloud storage you use.  You sign up for supported cloud storage providers through Nasuni’s service portal.

How well all this works is open for discussion.  We have discussed caching appliances before both from EMC and others.  Two issues have emerged from our discussions, how well caching coherence is maintained across nodes is non-trivial and the economics of a caching appliance are subject to some debate.  However, cloud gateways are more than just caching appliances and as a way of advancing cloud storage adoption, such gateways can only help.

Full disclosure: I currently do no business with Nasuni.

Securing data in the cloud

Who says there are no clouds today by akakumo (cc) (from Flickr)
Who says there are no clouds today by akakumo (cc) (from Flickr)

We have posted previously about the need for backup in cloud storage. So today I would like to start a discussion on securing “data-at-rest” within the cloud.

Depositing data into the cloud seems a little like a chinese laundry to me – you deposit data in the cloud and receive a ticket or token used to retrieve the data. Today’s cloud data security depends entirely on this token.

Threats to the token

If one only looks at external security threats, two issues to token use seem apparent

  • Brute force cracking of any token is possible. I envision a set of cloud storage users wherein they use their current storage tokens as seeds to identify other alternate tokens. Such an attack could easily generate token synonyms which may or may not be valid. Detecting a brute force attack could be easily accomplished, but distributing this attack across 1000s of compromised PCs would be much harder to detect.
  • Tokens could be intercepted in the clear. Cloud data often may need to be accessed in locations outside the data center of origin. This would require sending tokens to others. These data tokens could inadvertently be sent in the clear and as such, intercepted.

Probably other external exposures beyond these two exist as well but these will suffice.

Securing cloud data-at-rest

Given the potential external and internal threats to data tokens, securing such data can eliminate any data loss from token exposure. I see at least three approaches to securing data in the cloud.

  • The data dispersal approach – Cleversafe’s product splits a data stream into byte segments and disburses these segments across storage locations. Their approach is oriented around Reed Solomon logic and results in no one location having any recognizable portion of the data. This requires multiple sites, or multiple systems at one site to implement but is essentially securing data through segmenting it. The advantages of this approach is that its fast and automatic but the disadvantage is that it only is supported via Cleversafe.
  • The software data encryption approach – there are plenty of software packages out such as GnuPG (GNU Privacy Guard) or PGP which can be used to encrypt your data prior to sending it to the cloud. It’s sort of brute force, software only approach but its advantage is that it can be used with any cloud storage provider. Its disadvantages are that it’s slow, processor intensive, and key management is sporadic.
  • The hardware data encryption approach – there are also plenty of data encryption appliances and/or hardware options out there which can be used to encrypt data. Some of these are available at the FC switch level, some are standalone appliances, and some exist at the storage subsystem level. The problem with most of these is that they only apply to FC storage and are not readily useable by Cloud Storage (unless the provider uses FC storage as its backing store). The advantages are that it’s fast and key management is generally built into the product.

One disadvantage to any of the encryption approaches is that now one needs the encryption keys and the token to access the data. Yet one more thing to protect.

Nothing says that hardware data encryption couldn’t also work for data flowing to the cloud but they would have to support IP plus the cloud specific REST interface. Such support would depend on cloud storage provider market share, but perhaps some cloud vendor could fund a security appliance vendor to support their interface directly, providing a cloud data security option.

The software approach suffers from performance problems but supports anybody’s cloud storage. It might be useful if Cloud storage providers started offering hooks into GnuPG or PGP to directly encrypt cloud data. However, most REST interfaces require some programming to use and it’s not too much of a stretch to program in encryption into this.

I like the data dispersal approach, but most argue that security is not guaranteed as reverse engineering the dispersal algorithm allows one to reconstruct the data stream. But the other more serious problem is that it only applies to Cleversafe storage, perhaps the dispersal algorithm should be open sourced (which it already is) and/or standardized.

There are possibly other approaches which I have missed here but these can easily be used to secure cloud data-at-rest. Possibly adding more security around the data token could also help alleviate this concern. Thoughts?

Post disclosure: I am not currently working with Cleversafe, any data security appliance provider, or cloud storage provider.