Why SO/CFS, Why Now

Why all the interest in Scale-out/Cluster File Systems (SO/CFS) and why now?

Why now is probably easiest to answer: valuations are down. NetApp is migrating GX to their main platform, IBM continues to breathe life into GPFS, HP buys IBRIX, and now LSI buys ONStor. It seems every day brings some new activity with scale-out/cluster file system products. Interest seems to be based on the perception that SO/CFS would make a good storage backbone/infrastructure for Cloud Computing. But this takes some discussion…

What can one do with an SO/CFS?

  • As I see it, SO/CFS provides a way to quickly scale out and scale up NAS system performance. This doesn’t mean that file data can be in multiple locations/sites or that files can be supplied across the WAN, but file performance can be scaled independently of file storage.
  • What seems even more appealing is the amount of data and the size of the file systems supported by SO/CFS systems. It seems like PBs of storage can be supported and served up as millions of files. Now that sounds like something useful to Cloud environments, if one could front-end it with some Cloud-enabled services.

So why aren’t they taking off? Low valuations signal to me that they aren’t doing well. I think few end-users today need to support millions of files, PBs of data, or the performance these products could sustain. Currently, their main market is high performance computing (HPC) labs, but there are only so many physics/genomics labs out there that need this much data/performance.

That’s where the cloud enters the picture. The cloud’s promise is that it can aggregate everybody’s computing and storage demand into a service offering where 1,000s of users can log in from the internet and do their work. With 1,000s of users, each with 1,000s of files, we now start to talk in the million-file range.

Ok, so if the cloud market is coming, then maybe SO/CFS has some lasting/broad appeal. One can see preliminary cloud services emerging today, especially in backup services such as Mozy or Norton Online Backup (see Norton Online Backup), but not many cloud services exist today with general purpose/generic capabilities, Amazon notwithstanding. If the Cloud market takes time to develop, then buying into SO/CFS technology while it’s relatively cheap and early in its adoption cycle makes sense.

There are many ways to supply cloud storage. Some companies have developed their own brand new solutions here; EMC/Atmos and DataDirect Networks/WOS (see DataDirect Networks WOS) seem most prominent. Many others exist, toiling away to address this very same market. Which of these solutions will survive/succeed in the Cloud market is an open question that will take years to answer.

DataDirect Networks WOS cloud storage

DataDirect Networks (DDN) announced this week a new product offering private cloud services. Apparently the new Web Object Scaler (WOS) is a storage appliance that can be clustered together across multiple sites and offers a single global file name space across all the sites. Also, the WOS cloud supports policy-based file replication and distribution across sites for redundancy and/or load balancing purposes.

DDN’s press release said a WOS cloud can service up to 1 million random file reads per second. They did not indicate the number of nodes required to sustain this level of performance, and they didn’t identify the protocol that was used to do this. The press release implied low-latency file access but didn’t define what they meant here. 1M file reads/sec doesn’t necessarily mean they are all read quickly. Also, there appears to be more work for a file write than a file read, and no statement on file ingest rate is provided.
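To see why throughput alone says little about latency, Little's law helps: the average number of requests in flight equals throughput times average latency. Here is a minimal sketch of that arithmetic. The latency values are purely hypothetical (DDN published none), and the function name is my own.

```python
def required_concurrency(reads_per_sec: float, mean_latency_sec: float) -> float:
    """Little's law: mean requests in flight = throughput * mean latency."""
    return reads_per_sec * mean_latency_sec


# 1M reads/sec is the press-release figure; the latencies are assumed for illustration.
for latency_ms in (1, 10, 100):
    in_flight = required_concurrency(1_000_000, latency_ms / 1000)
    print(f"{latency_ms} ms mean latency -> about {in_flight:,.0f} reads in flight")
```

In other words, the same 1M reads/sec could come from a modest number of fast reads or a very large number of slow ones outstanding at once, so the throughput number by itself doesn't pin down how quickly any single file comes back.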

There are many systems out there touting a global name space. However, not many say their global name space spans multiple sites. I suppose cloud storage would need to support such a facility to keep file names straight across sites. Nonetheless, such name space services would imply more overhead during file creation/deletion to keep everything straight, plus metadata duplication/replication/redundancy to support this.

There are many questions on how this all works together with NFS or CIFS, but it’s entirely possible that WOS doesn’t support either file access protocol and just depends on HTTP GET and POST, or similar web services, to access files. Moreover, assuming WOS supports NFS or CIFS protocols, I often wonder why these sorts of announcements aren’t paired with a SPECsfs(r) 2008 benchmark report, which could validate any performance claim at least at the NFS or CIFS protocol levels.

I talked to one media person a couple of weeks ago, and they said cloud storage is getting boring. There are a lot of projects (e.g., Atmos from EMC) out there targeting future cloud storage. I hope for their sake that boring doesn’t mean no market exists for cloud storage.