Who’s the next winner in data storage?

Strange Clouds by michaelroper (cc) (from Flickr)

“The future is already here – just not evenly distributed.” – W. Gibson

It starts, as it always does, outside the enterprise data center: in the lines of business, in the development teams, in the small business organizations that don’t know any better but still have an unquenchable need for data storage.

It’s essentially an Innovator’s Dilemma situation. The upstarts are coming into the market at the low-end, low-margin side of the business that the major vendors don’t seem to care about, don’t service very well, and are ignoring at their peril.

Yes, it doesn’t offer all the data services that the big guns (EMC, Dell, HDS, IBM, and NetApp) have. It doesn’t offer the data availability and reliability that enterprise data centers have come to demand from their storage. And it doesn’t have the performance of major enterprise data storage systems.

But what it does offer is lower CapEx, unlimited scalability, and data storage that is much easier to manage and adopt, albeit using a new protocol. It does have some inherent, hard-to-get-around problems, not the least of which are speed of data ingest/egress, highly variable latency and eventual consistency. There are other problems that are more easily solvable, with work, but the three listed above are intrinsic to the solution and need to be dealt with systematically.

And the winner is …

It has to be the cloud storage providers, and the big elephant in the room has to be Amazon. I know there’s a lot of hype surrounding AWS S3 and EC2, but you must admit they are growing, doubling year over year. Yes, they are starting from a much lower capacity point and yes, they are essentially providing “rentable” data storage space with limited or even non-existent storage services. But they are opening up whole new ways to consume storage that never existed before. And therein lies their advantage, and their threat to the major storage players today, unless those players act to counter this upstart.

On AWS’s EC2 website there must be four dozen different applications that can be fired up in a matter of a click or two. When I checked out S3, you need only sign up and identify a bucket name to start depositing data (files, objects). After that, you are charged monthly for the storage used, data transfer out (data in is free), and the number of HTTP GETs, PUTs and other requests performed. The first 5GB is free and comes with a judicious amount of GETs, PUTs and outbound data transfer bandwidth.
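
For a sense of how simple that consume-by-API model is, here’s a minimal sketch using today’s boto3 SDK (the bucket name is a hypothetical placeholder, and credentials are assumed to come from your AWS configuration):

```python
import boto3

s3 = boto3.client("s3")  # credentials come from your AWS environment/config

bucket = "my-example-bucket"  # hypothetical name; S3 bucket names are globally unique
s3.create_bucket(Bucket=bucket)  # outside us-east-1 a CreateBucketConfiguration is also required

# Each PUT/GET below is one of the per-request charges mentioned above.
s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello, cloud storage")
data = s3.get_object(Bucket=bucket, Key="hello.txt")["Body"].read()
print(data)
```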

… but how can they attack the enterprise?

Aside from the three systemic weaknesses identified above, for enterprise customers they seem to lack enterprise security, advanced data services and high availability storage. Yes, NetApp’s Amazon Direct addresses some of these issues by placing enterprise-owned, secured and highly available storage where it can be accessed by EC2 applications. But to really take over and make a dent in enterprise storage sales, Amazon needs something with enterprise-class data services, availability and security, plus an on-premises storage gateway that uses and consumes cloud storage, i.e., a cloud storage gateway. That way they can meet or exceed enterprise latency and services requirements at something that approximates S3 storage costs.

We have talked about cloud storage gateways before, but none offer this level of storage service. An enterprise-class S3 gateway would need to support all storage protocols, especially block (FC, FCoE, & iSCSI) and file (NFS & CIFS/SMB). It would need enterprise data services, such as read-writeable snapshots, thin provisioning, data deduplication/compression, and data mirroring/replication (synch and asynch). It would need to support standard management configuration capabilities, like VMware vCenter, Microsoft System Center, and SMI-S. It would need to mask the inherent variable latency of cloud storage through memory, SSD and hard disk data caching/tiering. It would need to conceal the eventual consistency nature of cloud storage (see link above). And it would need to provide iron-clad data security for cloud storage.
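
To make the latency-masking point concrete, here’s a toy sketch of the memory/SSD/disk read path such a gateway would need. The structure and names are my own illustration, not any vendor’s design, and eviction/demotion between tiers is omitted:

```python
class TieredCache:
    """Toy gateway read path: RAM -> SSD -> HDD -> cloud.

    Hot blocks are promoted toward RAM so that only cold misses
    pay the cloud's high, variable latency.
    """

    def __init__(self, cloud_get):
        self.tiers = [{}, {}, {}]   # stand-ins for RAM, SSD and HDD caches
        self.cloud_get = cloud_get  # callable: block_id -> bytes (slow, variable)

    def read(self, block_id):
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                data = tier.pop(block_id) if level > 0 else tier[block_id]
                self.tiers[0][block_id] = data   # promote on hit
                return data
        data = self.cloud_get(block_id)          # cold miss pays cloud latency
        self.tiers[0][block_id] = data
        return data
```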

It would also need to be enterprise hardened, highly available and highly reliable. That means dual-redundant, highly serviceable hardware FRUs, concurrent code load, and multiple controllers with multiple, independent, high-speed links to the internet. Today’s highly available data storage requires multi-path storage networks, multiple independent power sources and resilient cooling, so adding multiple independent, high-speed internet links to use Amazon S3 in the enterprise is not out of the question. In addition to the highly available and serviceable storage gateway capabilities described above, it would need to supply high data integrity and reliability.

Who could build such a gateway?

I would say any of the major and some of the minor data storage players could easily do an S3 gateway if they desired. There are a couple of gateway startups (see link above) that have made a stab at it but none have it quite down pat or to the extent needed by the enterprise.

However, the problem with standalone gateways from other, non-Amazon vendors is that they can easily support other cloud storage platforms, and most do. This is great for gateway suppliers but bad for Amazon’s market share.

So, I believe Amazon has to invest in its own storage gateway if they want to go after the enterprise. Of course, when they create an enterprise cloud storage gateway they will piss off all the other gateway providers and will signal their intention to target the enterprise storage market.

So who is the next winner in data storage? I have to believe it’s going to be, and already is, Amazon. Even if they don’t go after the enterprise, which I feel is the major prize, they have already carved out an unbreachable market share in a new way to implement and use storage. But when (not if) they go after the enterprise, they will threaten every major storage player.

Yes but what about others?

Arguably, Microsoft Azure is in a better position than Amazon to go after the enterprise. Since their acquisition of StorSimple last year, they already have a gateway that, with help, could be just what they need to provide enterprise-class storage services using Azure. And they already have access to the enterprise, and already have the services, distribution and go-to-market capabilities that address enterprise needs and requirements. Maybe they have it all, but they are not yet at the scale of Amazon. Could they go after this? Certainly, but will they?

Google is the other major unknown. They certainly have the capability to go after enterprise cloud storage if they want. They already have Google Cloud Storage, which is priced under Amazon’s S3 and provides similar services as far as I can tell. But they have even farther to go to get to the scale of Amazon. And they have less of the marketing, selling and service capabilities that are required to be an enterprise player. So I think they are the least likely of the big three cloud providers to be successful here.

There are many other players in cloud services that could make a play for enterprise cloud storage and emerge out of the pack, namely Rackspace, Savvis, Terremark and others. I suppose DropBox, Box and the other file sharing/collaboration providers might also be able to take a shot at it, if they wanted. But I am not sure any of them have enterprise storage on their radar just yet.

And I wouldn’t leave out the current major storage, networking and server players as they all could potentially go after enterprise cloud storage if they wanted to. And some are partly there already.

Comments?


Enterprise file synch

Strange Clouds by michaelroper (cc) (from Flickr)

Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services, each with a slightly different take on the requirements. The one that comes most readily to mind is Egnyte, which supported file synchronization across a hybrid cloud (public cloud and network storage) and which we discussed in our Fall SNWUSA wrap-up post last year.

The problem with BYOD

With bring your own device (BYOD), corporate end users are quickly abandoning any pretense of IT control and turning to consumer-class file synchronization services to help synch files across the desktops, laptops and mobile devices they haul around. But the problem with solutions such as Dropbox, Box, OxygenCloud and others is that they are really outside of IT’s control.

Which is why there’s a real need today for enterprise-class file synchronization solutions that exhibit the ease of use and setup available from consumer file synch systems but offer IT security, compliance and control over the data that’s being moved into the cloud and across corporate and end user devices.

EMC Syncplicity and EMC on premises storage

Last week EMC announced an enterprise version of their recently acquired Syncplicity software that supports on-premises Isilon or Atmos storage, Atmos being EMC’s own cloud storage offering.

In previous versions of Syncplicity, storage was based in the cloud, using Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on-premises storage to host user file synchronization services that can span mobile devices, laptops and end user desktops.

New Syncplicity users must download desktop client software to support file synchronization, or mobile apps for mobile device synchronization. After that it’s a simple matter of identifying which, if any, directories and/or files are to be synchronized with the cloud and/or shared with others.

However, with the Business (read enterprise) edition one also gets the Security and Compliance console, which supports access control to define the users and devices that can synchronize or share data, enforce data retention policies, remotely wipe corporate data, and provide native support for single sign-on services. In addition, one gets centralized user and group management services to grant, change and revoke user and group access to data. One also obtains enterprise security with AES-256 data-at-rest encryption, separate key manager and data storage data centers, quadruple replication of data for high disaster fault tolerance, and SAS70 Type II compliant data centers.
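
As an aside, keeping key managers in separate data centers from the stored data is the crux of data-at-rest security: whoever steals the ciphertext doesn’t get the keys. A minimal sketch of the idea in Python (the locally generated key below stands in for a fetch from an imaginary key service; this is my illustration, not Syncplicity’s actual design):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# In a real deployment the key would be fetched from a key manager in a
# *separate* data center; generating it locally here is just for illustration.
key = AESGCM.generate_key(bit_length=256)

def encrypt_for_cloud(plaintext: bytes) -> bytes:
    nonce = os.urandom(12)                        # unique per object
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_from_cloud(blob: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)
```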

If the client wants to use on-premises storage, they would also need to deploy a VM virtual appliance somewhere in the data center to act as the gateway for file synchronization service requests. The file synch server would also presumably need access to the on-premises storage, and it’s unclear whether the virtual appliance is in-band or out-of-band (see the discussion of Egnyte’s solution options below).

Egnyte’s solution

Egnyte comes as a software-only solution, building a file server in the cloud for end user storage. It also includes an Egnyte app for mobile hardware and the ever-present web file browser. Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.

One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on-premises storage, but you cannot have just on-premises storage. Syncplicity offers one or the other, i.e., file synchronization data can reside either in the cloud or on local on-premises storage, but cannot be in both locations.

The other major difference is that Egnyte operates with just about anybody’s NAS storage, such as EMC, IBM, and HDS, for the on-premises file storage. It operates as an in-band, software appliance solution that traps file activity going to your on-premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.

But for NetApp storage only (today), they utilize ONTAP APIs to offer an out-of-band file synchronization solution. This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in its current directory locations.

Egnyte promises enterprise class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers.  No mention of separate key security in their literature.

As for cloud backend storage, Egnyte has its own public cloud and supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp Storage Grid and HP Public Cloud.

There’s more to Egnyte’s solution than just file synchronization and sharing, but that’s the subject of today’s post. Perhaps we can cover the rest at more length in a future post if there’s interest.

File synchronization, cloud storage’s killer app?

The nice thing about these capabilities is that IT staff can now regain control over what is and isn’t synched and shared across multiple devices. Up until now all this was happening outside the data center and external to IT control.

From Egnyte’s perspective, they are seeing more and more enterprises wanting data both on premises, for performance and compliance, and in cloud storage, for ubiquitous access. They feel it’s driven both by a shareability demand between an enterprise’s far-flung team members (and potentially client/customer personnel) and by a need to access, edit and propagate siloed corporate information using the new mobile devices that everyone has these days.

In any event, enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage. Up to this point cloud gateways made sense for SME backup or disaster recovery solutions but, IMO, didn’t really take off beyond that space. But if you can package a robust and secure file sharing and synchronization solution around cloud storage then you just might have something that enterprise customers are clamoring for.

~~~~

Comments?

IT as a service on the Cloud is not the end

Prison Planet by AZRainman (cc) (from Flickr)

[Long post] I read another intriguing post by David Vellante at Wikibon today about the emergence of IT shops becoming service organizations to their industries, using the cloud to host these services. I am not in complete agreement with Dave, but he certainly paints a convincing picture.

His main points are:

  • Cloud storage and cloud computing are emerging as a favorite platform for IT-as-a-service.
  • Specialization and economies of scale will generate an IT-as-a-service capability for any organization’s information processing needs.

I would have to say another tenet of his overall discussion is that IT matters, a lot, and I couldn’t agree more.

Cloud reality

For some reason I have been talking a lot about cloud storage these past couple of weeks, in multiple distinct venues. On the one hand, I was talking with a VAR the other day and they were extremely excited about the opportunity in cloud storage. It seems getting SMB customers to sign up for a slice of storage is easy, and once they have that, getting them to use more becomes a habit they can’t get rid of.

I thought maybe the enterprise level would be immune to such inducements, but no. Another cloud storage gateway vendor I talked with recently, StorSimple, was touting the great success they were having displacing tier 2 storage in the enterprise.

Lately, I heard that some small businesses/startups have decided to abandon their own IT infrastructure altogether and depend entirely on cloud offerings from Amazon, RackSpace and others for all they need.  They argue that such infrastructure, for all its current faults, will have less downtime than anything they could create on their own within a limited budget.

So, cloud seems to be taking off, everywhere I look.

Vertical support for IT as a service

Dave offers plenty of evidence in his lengthy post that a number of sophisticated IT organizations are taking their internal services external and becoming IT-as-a-service profit centers. It’s hard to disagree with this one as well.

But, it’s not the end of IT organizations

However, where I disagree with Dave is that he sees this as a winning solution, taking over all internal IT activities.  In his view, either your IT group becomes an external service profit center or it’s destined to be replaced by someone else’s service offering(s).

I don’t believe this. To say that IT as a service will displace 50+ years of technology development in the enterprise is just overstatement.

Dave talks about WINTEL displacing mainframes, the two monopolies created in IT. But the fact remains, WINTEL has not eliminated mainframes. Mainframes still exist and, arguably, are still expanding throughout the world.

Dave states that the introduction of WINTEL reduced the switching cost of mainframes, and that the internet, and the cloud that follows it, have reduced those costs yet again. I agree. But that doesn’t mean the switching cost is zero.

Ask anyone whether SalesForce.com’s switching costs inhibit them from changing services, and more than likely they will say yes. Switching costs have come down, but they are still a viable barrier to change.

Cloud computing and storage generate similar switching costs, not to mention the time it takes to transfer TBs of data over a WAN. Whether a cloud service uses the AWS interface, OpenStack, Azure or any of the other REST/SOAP cloud storage/cloud computing protocols constitutes a formidable barrier to change. It would be great if OpenStack were to take over, but it hasn’t yet, and most likely won’t in the long run, mainly because the entrenched suppliers don’t want to help their competition.

IT matters, a lot to my organization

What I see happening is not that much different from what Dave sees; it’s only a matter of degree. Some IT shops will become service organizations to their verticals, but there will remain a large proportion of IT shops that believe:

  • That their technology is a differentiator.
  • That their technology is not something they want their competition using.
  • That their technology is too important to their corporate advantage to sell to others.

How much of this is reality vs. fiction is another matter.

Nonetheless, I firmly believe that a majority of the IT shops that exist today will not convert to using IT as a service. Some of this is due to sunk costs, but a lot will be due to the belief that they are truly better than the service.

That’s not to say that new organizations just starting out won’t be more interested in utilizing IT as a service. For these entities, service offerings are going to be an appealing alternative.

However, a small portion of these startups may just as likely conclude that they can do better themselves, or believe it’s more important to develop their own IT services to help them get ahead. Similarly, how much of this is make-believe is TBD.

In the end, I believe IT as a service will take its place alongside IT-developed services and IT-outsourced development as yet another capability that any company can deploy to provide information processing for their organization.

The real problem

In my view, the real problem with IT-developed services today is development disease. Most organizations would like increased functionality, and want it ASAP, but they just can’t develop working functionality fast enough. I call this combination of slow functionality development, missing critical features and lots of bugs “development disease.” It’s everywhere today and has never really gone away.

Some of this is due to poor IT infrastructure, some is due to the inability to use new development frameworks, and some is due to a lack of skills. If IT had some pill they could take to help them develop business processing faster, consuming fewer resources, with far fewer bugs and fuller functionality, they would never consider IT as a service.

That’s where the new frameworks of Ruby on Rails, SpringForce and the like are exciting. Their promise is faster functionality with fewer failures. When that happens, organizations will move away from IT as a service in droves, and back to internally developed capabilities.

But, we’re not there yet.

~~~~

Comments?

Cloud storage replication does not suffice for backups – revisited

Free Whipped Cream Clouds on True Blue Sky by Pink Sherbet Photography (cc) (from Flickr)

I was talking with another cloud storage gateway provider today and I asked them if they do any sort of backup for data sent to the cloud. The answer disturbed me: they depend on the backend cloud storage provider’s replication services to provide data protection – sigh. Curtis and I have written about this before (see my Does Cloud Storage need Backup? post and Replication is not backup by W. Curtis Preston).

Cloud replication is not backup

Cloud replication is not data protection for anything but hardware failures! Much more common than hardware failures are mistakes by end users who inadvertently delete, overwrite or corrupt files, or systems that corrupt files – any of which would just be replicated, error and all, throughout the cloud storage multi-verse. (In fact, cloud storage itself can lead to corruption; see Eventual data consistency and cloud storage.)

Replication does a nice job of covering a data center or hardware failure that leaves data at one site inaccessible, by allowing access to a replica of the data from another site. As far as I am concerned there’s nothing better than replication for these sorts of DR purposes, but it does nothing for someone deleting the wrong file. (I once did an “rm * *” command on a shared Unix directory – it wasn’t pretty.)

Some cloud storage (backend) vendors delay the deletion of blobs/containers until sometime later as one solution to this problem. By doing this, the data “stays around” for “some time” after being deleted and can be restored via special request to the cloud storage vendor. The only problem is that “some time” is an ill-defined, nebulous concept which is not guaranteed or specified in any way. Also, depending on the “fullness” of the cloud storage, this time frame may be much shorter or longer. End user data protection cannot depend on such a wishy-washy arrangement.
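
What end users actually need is a contractual version of that delayed delete. A toy sketch of tombstone-style soft deletion with an explicit retention window (my own illustration, not any provider’s implementation):

```python
import time

RETENTION_SECONDS = 90 * 24 * 3600       # an explicit, guaranteed window

objects = {}      # key -> bytes, the live store
tombstones = {}   # key -> deletion timestamp

def soft_delete(key):
    tombstones[key] = time.time()        # hide the object, keep the bytes

def restore(key):
    tombstones.pop(key, None)            # trivial within the retention window

def purge_expired():
    now = time.time()
    for key, deleted_at in list(tombstones.items()):
        if now - deleted_at > RETENTION_SECONDS:
            objects.pop(key, None)       # only now is the data truly gone
            del tombstones[key]
```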

Other solutions to data protection for cloud storage

One way is to keep a local backup of any data located in cloud storage. But this kind of defeats the purpose of cloud storage, with the cloud data being stored both locally (as backups) and remotely. I suppose the backup data could be sent to another cloud storage provider, but someone, somewhere would need to support some sort of versioning to keep multiple iterations of the data around, e.g., 90 days worth of backups. Sounds like a backup package front-ending cloud storage to me…

Another approach is for the gateway provider to supply some sort of backup internally, using the very same cloud storage to hold various versions of the data. As long as the user can specify how many days or versions of backups are held, this works great: cloud replication supports availability in the face of hardware failures, and multiple versions support availability in the face of finger checks/logical corruptions.
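
A sketch of how a gateway might keep, say, the last 90 versions of each object in the same cloud it already writes to (the object naming scheme and the put/delete callables are assumptions of mine for illustration):

```python
from collections import defaultdict, deque

MAX_VERSIONS = 90                      # user-specified retention depth

versions = defaultdict(deque)          # key -> deque of versioned object names

def backup_version(key, data, timestamp, cloud_put, cloud_delete):
    name = f"{key}@{timestamp}"        # versioned name in the same cloud store
    cloud_put(name, data)
    versions[key].append(name)
    if len(versions[key]) > MAX_VERSIONS:
        cloud_delete(versions[key].popleft())   # prune the oldest version
```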

This problem can be solved in many ways, but just using cloud replication is not one of them.

Listen up folks: whenever you think about putting data in the cloud, you need to ask about backups, among other things. If they say they only offer the data replication provided by the cloud storage backend – go somewhere else. Trust me, there are solutions out there that really back up cloud data.

Cirtas surfaces

Cirtas system (from www.Cirtas.com)

Yesterday, Cirtas came out of stealth mode and into the limelight with their new Bluejet cloud storage controller hardware system. Cirtas joins a number of other products offering cloud storage to the enterprise by supplying a more standard interface, which we have discussed before (see Cloud storage gateways surface).

With Cirtas, the interface to the backend cloud storage is supplied as iSCSI, similar to StorSimple‘s product, which we reviewed previously (see More cloud storage gateways …). However, StorSimple is focused on Microsoft environments only and select applications, namely SharePoint, Exchange and Microsoft file services. Cirtas seems aimed at the more general purpose application environment that uses iSCSI storage protocols. The only other iSCSI cloud storage gateway providers appear to be TwinStrata and Panzura, though the information on Panzura’s website is sketchy.

In addition, Cirtas, StorSimple (and Panzura) provide hardware appliances, whereas most of the other cloud storage gateways (Nasuni, Gladinet, TwinStrata) come only as software packages. Gladinet, though, appears to be targeted at the home office environment.

Cirtas’s Bluejet controller includes onboard RAM cache, SSD flash drives and SAS drives (5TB total) that are used to provide higher performing cloud storage access. Bluejet also supports space-efficient snapshots, data encryption, thin provisioning, data deduplication, and data compression. The Cirtas team comes out of the WAN optimization space, so they have incorporated some of these data saving technologies into their product to reduce bandwidth requirements and cloud storage demand.

Cirtas currently supports Amazon S3 and Iron Mountain cloud storage, but more are on the way. They also recently completed their Series A round of funding, which included NEA and Amazon.

Cirtas says they can match local storage performance but have no benchmarks to prove this out. With iSCSI there aren’t many benchmark options, but one could use iSCSI to support Microsoft Exchange and submit something to the Exchange Solution Reviewed Program (ESRP), which might show off this capability.

Nonetheless, cloud storage can be considerably cheaper than primary storage (on a $/GB basis), and no doubt even with the ~$70K Cirtas Bluejet cloud storage controller, Cirtas retains a significant cost advantage. With the appliance purchase, you get a basic storage key which allows you to store up to 20TB of data on (through) the appliance; if you have more data to store, additional storage keys can be purchased separately. This 20TB license does not include the cloud storage costs for storing data on the cloud, nor the bandwidth costs to upload and/or access the data on the cloud.
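
To see why the economics can still work despite a ~$70K appliance, here’s a back-of-the-envelope comparison. Only the appliance price and 20TB license come from the announcement; every other figure below is an assumption of mine for illustration, not a quoted price:

```python
appliance = 70_000               # ~$70K Bluejet controller (from the announcement)
tb = 20                          # capacity covered by the included storage key
months = 36

cloud_per_gb_month = 0.10        # assumed S3-class pricing; actual tiers vary
primary_per_gb = 10.00           # assumed enterprise primary storage acquisition cost
# cloud bandwidth/request charges excluded, as are primary storage maintenance costs

gateway_3yr = appliance + tb * 1024 * cloud_per_gb_month * months
primary_buy = tb * 1024 * primary_per_gb

print(f"gateway, 3 years: ${gateway_3yr:,.0f}")   # $143,728 under these assumptions
print(f"primary storage:  ${primary_buy:,.0f}")   # $204,800 under these assumptions
```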

It seems interest in cloud storage gateways/controllers is heating up. With the addition of Cirtas, I count at least four that target the enterprise space, and when Panzura releases a product that will add another.

Anything I missed?

More cloud storage gateways come out

Strange Clouds by michaelroper (cc) (from Flickr)

Multiple cloud storage gateways have either been announced or are coming out in the next quarter or so. We have talked before about Nasuni’s file cloud storage gateway appliance, but now that more are out, one can better appreciate the cloud gateway space.

StorSimple

Last week I was talking with StorSimple, which just introduced their cloud storage gateway providing an iSCSI block protocol interface to cloud storage with onsite data caching. Their appliance offers a cloud storage cache residing on disk and/or optional flash storage (SSDs), providing iSCSI storage speeds for highly active working-set data residing in the cache and cloud storage speeds for non-working-set data.

Data is deduplicated to minimize storage space requirements. In addition, data sent to the cloud is compressed and encrypted. Both deduplication and compression can reduce WAN bandwidth requirements considerably. The appliance also offers snapshots and “cloud clones”. Cloud clones are complete offsite (cloud) copies of a LUN which can then be kept in synch with the gateway LUNs by copying daily change logs and applying them.
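
Conceptually, the daily cloud-clone catch-up is just log replay. A minimal sketch of the idea (the data structures are my illustration, not StorSimple’s implementation):

```python
def apply_change_log(clone_lun, change_log):
    """Replay one day's writes against the offsite cloud clone of a LUN.

    clone_lun:  dict of block_number -> bytes (the cloud copy)
    change_log: ordered list of (block_number, new_bytes) captured on the gateway
    """
    for block_number, new_bytes in change_log:
        clone_lun[block_number] = new_bytes   # last write wins, in log order
    return clone_lun
```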

StorSimple works with Microsoft’s Azure, AT&T, EMC Atmos, Iron Mountain and Amazon’s S3 cloud storage providers. A single appliance can support multiple cloud storage providers, segregated on a LUN basis, although how cross-LUN deduplication works across multiple cloud storage providers was not discussed.

The product can be purchased as a hardware appliance with a few hundred GB of NAND/flash storage up to 150TB of SATA storage. It can also be purchased as a virtual appliance, at lower cost but also much lower performance.

Cirtas

In addition to StorSimple, I have talked with Cirtas, which has yet to completely emerge from stealth. What’s apparent from their website is that the Cirtas appliance provides “storage protocols” to server systems and can store data directly on storage subsystems or on cloud storage.

“Storage protocols” could mean any block storage protocol, i.e., FC and/or iSCSI, but alternatively it might mean file protocols – I can’t be certain. Having access to independent, standalone storage arrays may mean that clients can use their own storage as a “cloud data cache”. It’s unclear how Cirtas talks to its onsite backend storage, but presumably this is FC and/or iSCSI as well. And somehow some of this data is stored out on the cloud.

So from our perspective it looks somewhat similar to StorSimple, with the exception that Cirtas uses external storage subsystems for its cloud data cache versus StorSimple’s internal storage. Few other details were publicly available as this post went out.

Panzura

Although I have not talked directly with Panzura, they seem to offer a unique form of cloud storage gateway, one that is specific to certain applications. For example, the Panzura SharePoint appliance actually “runs” part of the SharePoint application (according to their website) and as such can better ascertain which data should be local versus stored in the cloud. It seems to have access both to cloud storage and to local, independent storage appliances.

In addition to a SharePoint appliance, they offer a “backup/DR” target that apparently supports NDMP, VTL, iSCSI, and NFS/CIFS protocols to store (backup) data on the cloud. In this version they show no local storage behind the appliance, from which I assume that backup data is stored only in the cloud.

Finally, they offer a “file sharing” appliance used to share files across multiple sites, where files reside both locally and in the cloud. It appears that cloud copies of shared files are locked/WORM-like, but I can’t be certain. Having not talked to Panzura before, much of their product remains unclear.

In summary

We now have both file access and at least one iSCSI block protocol cloud storage gateway currently available and publicly announced, i.e., Nasuni and StorSimple. Cirtas, which is in the process of coming out, will support “storage protocol” access to cloud storage, and Panzura offers it all (SharePoint direct, iSCSI, CIFS, NFS, VTL & NDMP cloud storage access protocols). There are other gateways focused solely on backup data, but I reserve the term cloud storage gateway for those that provide some sort of general purpose storage or file protocol access.

However, since last week’s discussion of eventual consistency, I have become a bit more concerned about cloud storage gateways and their capabilities. This deserves some serious discussion at the cloud storage provider level but, most assuredly, at the gateway level. We need some sort of generic statement guaranteeing immediate consistency for data at the gateway level, even though most cloud storage providers only support “eventual consistency”. Barring that, using cloud storage for anything that is updated frequently would be unwise.
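
Mechanically, such a guarantee amounts to read-your-writes at the gateway: serve every read from the local copy of any write the cloud hasn’t settled yet. A toy sketch of the idea (my own, not any vendor’s design):

```python
class ConsistentGateway:
    """Mask a backend's eventual consistency with a local write journal."""

    def __init__(self, cloud):
        self.cloud = cloud     # backend exposing get(key) / put(key, data)
        self.pending = {}      # key -> latest locally acknowledged write

    def write(self, key, data):
        self.pending[key] = data     # acknowledge from durable local storage
        self.cloud.put(key, data)    # push to cloud (would be async in practice)
        # entries would be dropped from self.pending only once the backend
        # is known to return the new data consistently

    def read(self, key):
        if key in self.pending:      # local copy always wins over stale replicas
            return self.pending[key]
        return self.cloud.get(key)
```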

If anyone knows of another cloud storage gateway, I would appreciate a heads up. In any case, the technology is still young, and I would say this isn’t the last gateway to come out, but it feels like these provide coverage for just about any file or block protocol one might use to access cloud storage.

Cloud Storage Gateways Surface

Who says there are no clouds today by akakumo (cc) (from Flickr)

One problem holding back general purpose cloud storage has been the lack of a “standard” way to get data in and out of the cloud. Most cloud storage providers supply a REST interface, an object file interface or other proprietary ways to use their facilities. The problem is that these all require some form of code development on the part of the cloud storage customer.
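
For a sense of what that code development means: even the simplest object upload is provider-specific HTTP, along the lines of the sketch below (the endpoint and auth header are placeholders of mine, not any real provider’s API):

```python
import requests

# One provider's way of storing a file; every provider's URL scheme,
# authentication and error model differs, hence the integration burden.
resp = requests.put(
    "https://storage.example-provider.com/v1/my-container/report.pdf",  # hypothetical
    headers={"X-Auth-Token": "REPLACE_WITH_SESSION_TOKEN"},             # provider-specific auth
    data=open("report.pdf", "rb"),
)
resp.raise_for_status()
```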

It would be much easier if cloud storage could just talk iSCSI, FCoE, FC, NFS, CIFS, FTP, etc. access protocols. Then any data center could use the cloud with a NIC/HBA/CNA and just configure the cloud storage as a bunch of LUNs or file systems/mount points/shares. FCoE and FC would probably be difficult to use due to timeouts or other QoS (quality of service) issues, but iSCSI and the file-level protocols should be able to support cloud storage access without such concerns.

So which cloud storage providers support these protocols today? Nirvanix supports CloudNAS, used to access their facilities via NFS, CIFS and FTP; ParaScale supports NFS and FTP; while Amazon S3 and Rackspace Cloud Files do not seem to support any of these interfaces. There are probably other general purpose cloud storage providers I am missing here, but these will suffice for now. Wouldn’t it be better if some independent vendor supplied one way to talk to all of these storage environments?

How can gateways help?

For one example, Nasuni recently emerged from stealth mode, releasing a beta version of a cloud storage gateway that supports file access to a number of providers. Currently, Nasuni supports the CIFS file protocol as a front end for Amazon S3, Iron Mountain ASP, Nirvanix and, coming soon, Rackspace Cloud Files.

However, Nasuni is more than just a file protocol converter for cloud storage.  It also supplies a data cache, file snapshot services, data compression/encryption, and other cloud storage management tools. Specifically,

  • Cloud data cache – their gateway maintains a disk cache of frequently accessed data that can be accessed directly without having to go out to cloud storage. File data is chunked by the gateway and flushed out of the cache to the backend provider. How such a disk cache is kept coherent across multiple gateway nodes was not discussed.
  • File snapshot services – their gateway supports a point-in-time copy of file data used for backup and other purposes. The snapshot is created on a time schedule and provides an incremental backup of cloud file data. Presumably these snapshot chunks are also stored in the cloud.
  • Data compression/encryption services – their gateway compresses file chunks and then encrypts them before sending them to the cloud (a minimal sketch of such a pipeline appears after this list). Encryption keys can optionally be maintained by the customer or automatically maintained by the gateway.
  • Cloud storage management services – the gateway configures the cloud storage services needed to define volumes, monitors cloud and network performance and provides a single bill for all cloud storage used by the customer.
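
Here’s the chunk/compress/encrypt pipeline sketched in a few lines. The chunk size, and the choice of zlib and AES-GCM, are assumptions of mine; Nasuni hasn’t published these details:

```python
import os
import zlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

CHUNK_SIZE = 4 * 1024 * 1024          # assumed 4MB chunks

def chunks_for_cloud(path, key):
    """Yield compressed, encrypted chunks of a file, ready for upload."""
    aead = AESGCM(key)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            nonce = os.urandom(12)    # unique nonce per chunk
            yield nonce + aead.encrypt(nonce, zlib.compress(chunk), None)
```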

By chunking the files and caching them, data read from the cloud should be accessible much faster than with normal cloud file access. Also, by providing a form of snapshot, cloud data should be easier to back up and subsequently restore. Although Nasuni’s website didn’t provide much information on the snapshot service, such capabilities have been around for a long time and have proven very useful in other storage systems.

Nasuni is provided as a software-only solution. Once installed and activated on your server hardware, it’s billed for as a service and ultimately charged on top of any cloud storage you use. You sign up for supported cloud storage providers through Nasuni’s service portal.

How well all this works is open for discussion. We have discussed caching appliances before, both from EMC and others. Two issues have emerged from those discussions: maintaining cache coherence across nodes is non-trivial, and the economics of a caching appliance are subject to some debate. However, cloud gateways are more than just caching appliances, and as a way of advancing cloud storage adoption, such gateways can only help.

Full disclosure: I currently do no business with Nasuni.