Why SO/CFS, Why Now

Why all the interest in Scale-out/Cluster File Systems (SO/CFS) and why now?

Why now is probably the easiest to answer: valuations are down. NetApp is migrating GX to its main platform, IBM continues to breathe life into GPFS, HP buys IBRIX, and now LSI buys ONStor. It seems every day brings some new activity with scale-out/cluster file system products. Interest seems to be based on the perception that SO/CFS would make a good storage backbone/infrastructure for Cloud Computing. But this takes some discussion…

What can one do with an SO/CFS?

  • As I see it, SO/CFS provides a way to quickly scale out and scale up NAS system performance. This doesn’t mean that file data can be in multiple locations/sites or that files can be supplied across the WAN, but it does mean file performance can be scaled independently of file storage.
  • What seems even more appealing is the amount of data/size of the file systems supported by SO/CFS systems. It seems like PBs of storage can be supported and served up as millions of files. Now that sounds like something useful to Cloud environments, if one could front-end it with some Cloud-enabled services.

So why aren’t they taking off? Low valuations signal to me that they aren’t doing well. I think few end users today need to support millions of files, PBs of data, or the performance these products could sustain. Currently, their main market is high performance computing (HPC) labs, but there are only so many physics/genomics labs out there that need this much data/performance.

That’s where the cloud enters the picture. The cloud’s promise is that it can aggregate everybody’s computing and storage demand into a service offering where 1,000s of users can log in from the internet and do their work. With 1,000s of users, each with 1,000s of files, we now start to talk in the million-file range.

OK, so if the cloud market is coming, then maybe SO/CFS has some lasting/broad appeal. One can see preliminary cloud services emerging today, especially in backup services such as Mozy or Norton Online Backup (see Norton Online Backup), but not many cloud services exist today with general purpose/generic capabilities, Amazon notwithstanding. If the Cloud market takes time to develop, then buying into SO/CFS technology while it’s relatively cheap and early in its adoption cycle makes sense.

There are many ways to supply cloud storage. Some companies have developed their own brand new solutions here; EMC/Atmos and DataDirect Networks/WOS (see DataDirect Network WOS) seem most prominent. Many others exist, toiling away to address this very same market. Which of these solutions will survive/succeed in the Cloud market is an open question that will take years to answer.

HDS High Availability Manager (HAM)

What does HAM look like to the open systems end user? We need to break this question up into two parts – one part for USP-V internal storage and the other part for external storage.

It appears that for internal storage you first need data replication services, such as asynchronous or synchronous replication, between the two USP-V storage subsystems. But here you still need some shared external storage used as a quorum disk. Once all this is set up under HAM, the two subsystems can automatically fail over access to the replicated internal and shared external storage from one USP-V to the other.

For external storage, it appears that this storage must be shared between the two USP-V systems, and whenever the primary one fails, the secondary one can take over (failover) data storage responsibilities for the failing USP-V frontend.
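To make the failover idea above a bit more concrete, here is a toy model in Python. It is only a sketch of generic quorum-based arbitration between two subsystems; it does not reflect how HDS actually implements HAM, and the names `Subsystem`, `sees_quorum`, and `choose_active` are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subsystem:
    """Hypothetical stand-in for one storage subsystem in the pair."""
    name: str
    healthy: bool = True
    sees_quorum: bool = True  # assumed flag: can it reach the shared quorum disk?


def choose_active(primary: Subsystem, secondary: Subsystem) -> Optional[str]:
    """Return the name of the subsystem that should serve I/O, or None.

    Assumed rule (not taken from HDS documentation): a subsystem may serve
    I/O only if it is healthy and can reach the quorum disk, and the
    primary wins when both qualify.
    """
    for candidate in (primary, secondary):
        if candidate.healthy and candidate.sees_quorum:
            return candidate.name
    return None


a = Subsystem("USP-V A")
b = Subsystem("USP-V B")
print(choose_active(a, b))  # prints "USP-V A"

a.healthy = False  # simulate a primary failure
print(choose_active(a, b))  # prints "USP-V B"
```

The quorum check is what keeps both subsystems from claiming the storage at the same time if they merely lose contact with each other; a subsystem that cannot reach the quorum disk steps aside in this model.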

What does this do for data migration? Apparently, using automated failover with HAM, one can migrate data between two different storage pools and then fail over server access from one to the other non-disruptively.

Obviously, all the servers accessing storage under HAM control would need to be able to access both USP-Vs for all this to work properly.

Continuous availability is a hard nut to crack. HDS seems to have taken a shot at doing this from a purely storage subsystem perspective. This might be very useful for data centers running heterogeneous server environments. Typically, server clustering software is OS specific, like MSCS, with Symantec being the lone exception with VCS, which supports multiple OSs. Such server clustering can handle storage outages but also depends on storage replication services to make this work.

It is unclear to me which is preferable, but when you add the non-disruptive data migration, it seems that HAM might make sense.