Why SO/CFS, Why Now

Why all the interest in Scale-out/Cluster File Systems (SO/CFS) and why now?

Why now is probably easiest to answer: valuations are down. NetApp is migrating GX to their main platform, IBM continues to breathe life into GPFS, HP buys IBRIX, and now LSI buys ONStor. It seems every day brings some new activity with scale-out/cluster file system products. Interest seems to be based on the perception that SO/CFS would make a good storage backbone/infrastructure for Cloud Computing. But this takes some discussion…

What can one do with a SO/CFS?

  • As I see it, SO/CFS provides a way to quickly scale out and scale up NAS system performance. This doesn’t mean that file data can be in multiple locations/sites or that files can be supplied across the WAN, but file performance can be scaled independently of file storage.
  • What seems even more appealing is the amount of data/size of the file systems supported by SO/CFS systems. It seems like PBs of storage can be supported and served up as millions of files. Now that sounds like something useful for Cloud environments if one could front-end it with some Cloud-enabled services.

So why aren’t they taking off? Low valuations signal to me they aren’t doing well. I think today few end-users need to support millions of files, PBs of data, or the performance these products could sustain. Currently, their main market is high performance computing (HPC) labs, but there are only so many physics/genomics labs out there that need this much data/performance.

That’s where the cloud enters the picture. The cloud’s promise is that it can aggregate everybody’s computing and storage demand into a service offering where 1,000s of users can log in from the internet and do their work. With 1,000s of users, each with 1,000s of files, we now start to talk in the million-file range.

Ok, so if the cloud market is coming, then maybe SO/CFS has some lasting/broad appeal. One can see preliminary cloud services emerging today, especially in backup services such as Mozy or Norton Online Backup (see Norton Online Backup), but not many cloud services exist today with general purpose/generic capabilities, Amazon notwithstanding. If the Cloud market takes time to develop, then buying into SO/CFS technology while it’s relatively cheap and early in its adoption cycle makes sense.

There are many ways to supply cloud storage. Some companies have developed their own brand-new solutions here; EMC/Atmos and DataDirect Networks/WOS (see DataDirect Network WOS) seem most prominent. Many others exist, toiling away to address this very same market. Which of these solutions survive/succeed in the Cloud market is an open question that will take years to answer.

EMC's Data Domain ROI

I am trying to put EMC’s price for Data Domain (DDup) into perspective but am having difficulty. According to an InfoWorld article on EMC acquisitions from ’03-’06 and some other research, this $2.2B-$2.4B is more money (not inflation adjusted) than anything in EMC’s previous acquisition history. The only thing that comes close was the RSA acquisition for $2.1B in ’06.

VMware only cost EMC $625M and has been, by all accounts, very successful, being spun out of EMC in an IPO and currently showing a market cap of ~$10.2B. Documentum cost $1.7B and Legato only cost $1.3B, both of which are still within EMC.

Something has happened here; in a recession, valuations are supposed to be more realistic, not less realistic. At Data Domain’s TTM revenues ($300.5M), this will take over 7 years to break even on a straight-line view. If one considers WACC (weighted average cost of capital), it looks much worse. Looking at DDup’s earnings makes it look worse still.
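To make that arithmetic concrete, here is a minimal sketch of the payback math, assuming a roughly $2.2B purchase price, DDup’s $300.5M TTM revenue treated (generously) as the annual cash inflow, and a placeholder 10% WACC — the WACC figure is my assumption, not a disclosed number:

```python
# Rough payback math on the Data Domain purchase (illustrative only).
# Assumptions: ~$2.2B price, $300.5M TTM revenue treated as the annual
# cash inflow, and a placeholder 10% WACC -- the WACC is my guess.

price = 2.2e9            # assumed purchase price
annual_inflow = 300.5e6  # DDup TTM revenue (ignores costs/earnings)
wacc = 0.10              # assumed weighted average cost of capital

# Straight-line payback: years of revenue needed to cover the price
print(f"Straight-line payback: {price / annual_inflow:.1f} years")

# Discounted payback: accumulate inflows discounted at the WACC
cumulative, year = 0.0, 0
while cumulative < price and year < 50:
    year += 1
    cumulative += annual_inflow / (1 + wacc) ** year
print(f"Discounted payback (10% WACC): "
      f"{year if cumulative >= price else '>50'} years")
```

On those assumptions the straight-line payback comes out a bit over 7 years, and discounting at the WACC pushes it out to roughly double that — and that’s before substituting earnings for revenue, which would stretch it further still.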

Other than fire up EMC’s marketing and sales engine to sell more DDup products, what else can EMC do to gain a better return on its DDup acquisition? (not in order)

  • Move EMC’s current Disk Libraries to DDup technology and let go of the Quantum-FalconStor OEM agreements, and/or abandon the current DL product line and substitute DDup
  • Incorporate DDup technology into Legato Networker for target deduplication applications
  • Incorporate DDup technology into Mozy and Atmos
  • Incorporate DDup technology into Documentum
  • Incorporate DDup technology into Centera and Celerra

Can EMC, by selling DDup products and doing all of this to better its technology, double the revenue, earnings, and savings derived from DDup products and technology? Maybe. But the incorporation of DDup into Centera and Celerra could just as easily decrease EMC revenues and profits from the storage capacity lost, depending on the relative price differences.

I figure the Disk Library, Legato, and Mozy integrations would be first on anyone’s list. Atmos next, and Celerra-Centera last.

As for what to add to DDup’s product line, possible additions are at the top end and the bottom end. DDup has been moving up market of late, and integration with EMC DL might just help take it there. Down market, there is a potential market of small businesses that might want to use DDup technology at the right price point.

Not sure if the money paid for DDup still makes sense, but at least it begins to look better…

HDS upgrades AMS2000

Today, HDS refreshed their AMS2000 product line with a new high-density drive expansion tray with 48 drives and up to a maximum capacity of 48TB, 8Gbps FC (8GFC) ports for the AMS2300 and AMS2500 systems, and a new NEBS Level-3 compliant and DC-powered version, the AMS2500DC.

HDS also re-iterated their stance that Dynamic Provisioning will be available on AMS2000 in the 2nd half of this year. (See my prior post on this subject for more information).

HDS also mentioned that the AMS2000 now supports external authentication infrastructure for storage managers and will support Common Criteria Certification for more stringent data security needs. The external authentication will be available in the second half of the year.

I find the DC version pretty interesting; it signals a renewed interest in telecom OEM applications for this mid-range storage subsystem. It’s unclear to me whether this is a significant market for HDS. The 2500DC only supports 4Gbps FC and is packaged with a Cisco MDS 9124 SAN switch. DC-powered storage is also more energy efficient than AC storage.

Other than that, the Common Criteria Certification can be a big thing for those companies or government entities with significant interest in secure data centers. There was no specific time frame for this certification, but presumably they have started the process.

As for the rest of this, it’s a pretty straightforward refresh.

DataDirect Networks WOS cloud storage

DataDirect Networks (DDN) announced this week a new product offering private cloud services. Apparently the new Web Object Scaler (WOS) is a storage appliance that can be clustered together across multiple sites and offers a single global file name space across all the sites. The WOS cloud also supports policy-based file replication and distribution across sites for redundancy and/or load balancing purposes.

DDN’s press release said a WOS cloud can service up to 1 million random file reads per second. They did not indicate the number of nodes required to sustain this level of performance, and they didn’t identify the protocol used to do it. The press release implied low-latency file access but didn’t define what they meant by that; 1M file reads/sec doesn’t necessarily mean they are all read quickly. Also, there appears to be more work in a file write than in a file read, and no statement on file ingest rate was provided.

There are many systems out there touting a global name space. However, not many say their global name space spans multiple sites. I suppose cloud storage would need to support such a facility to keep file names straight across sites. Nonetheless, such name space services would imply more overhead during file creation/deletion to keep everything straight, plus metadata duplication/replication/redundancy to support it.
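To illustrate the overhead I’m talking about, here is a toy model (my own simplification, not DDN’s design) of a multi-site global namespace: every file create has to land the new namespace entry at each participating site before the create can be acknowledged, which is exactly where the extra round trips and metadata redundancy come from.

```python
# Toy model of a multi-site global namespace (not DDN's actual design).
# Each site keeps a copy of the namespace metadata; a file create is
# only acknowledged once every site has recorded the new entry.

class Site:
    def __init__(self, name):
        self.name = name
        self.namespace = {}   # file name -> metadata (owner site, size, ...)

    def record(self, fname, meta):
        if fname in self.namespace:
            raise FileExistsError(f"{fname} already exists at {self.name}")
        self.namespace[fname] = meta

def create_file(fname, owner_site, sites):
    """Create a file: replicate the namespace entry to every site."""
    meta = {"owner": owner_site.name, "size": 0}
    for site in sites:            # extra round trips vs. a single-site create
        site.record(fname, meta)
    return meta

sites = [Site("nyc"), Site("lon"), Site("tok")]
create_file("/projects/results.dat", sites[0], sites)
print(all("/projects/results.dat" in s.namespace for s in sites))  # True
```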

There are many questions about how this all works together with NFS or CIFS, but it’s entirely possible that WOS doesn’t support either file access protocol and just depends on HTTP GET and POST or similar web services to access files. Moreover, assuming WOS supports NFS or CIFS protocols, I often wonder why these sorts of announcements aren’t paired with a SPECsfs(r) 2008 benchmark report, which could validate any performance claim at least at the NFS or CIFS protocol level.

I talked to one media person a couple of weeks ago and they said cloud storage is getting boring. There are a lot of projects (e.g., Atmos from EMC) out there targeting future cloud storage, I hope for their sake boring doesn’t mean no market exists for cloud storage.

HDS Dynamic Provisioning for AMS

HDS announced today that their thin provisioning feature (called Dynamic Provisioning) will be available in their mid-range storage subsystem family, the AMS. Expanding the subsystems that support thin provisioning can only help the customer in the long run.

It’s not clear whether you can add Dynamic Provisioning to an already in-place AMS subsystem or if it’s only available on a fresh installation of an AMS subsystem. Also, no pricing was announced for this feature. In the past, HDS charged double the price of a GB of storage when it was in a thinly provisioned pool.

As you may recall, thin provisioning is a little like a room with a bunch of inflatable castles inside. Each castle starts with its initial inflation amount. As demand dictates, each castle can independently inflate to whatever level is needed to support the current workload, up to that castle’s limit and the overall limit imposed by the room the castles inhabit. In this analogy, the castles are LUN storage volumes, the room the castles are located in is the physical storage pool for the thinly provisioned volumes, and the air inside the castles is the physical disk space consumed by the thinly provisioned volumes.

In contrast, hard provisioning is like building permanent castles (LUNs) in stone; any change to the size of a structure would require major renovation and/or possible destruction of the original castle (deletion of the LUN).
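To put the analogy in more concrete terms, here is a minimal sketch of how a thinly provisioned pool might behave: LUNs advertise a large virtual size, but physical extents are drawn from the shared pool only when blocks are first written. This is a generic model of the technique, not HDS’s implementation; the 1 MiB extent size is an arbitrary choice.

```python
# Generic thin-provisioning model (not HDS's implementation).
# LUNs advertise a virtual size; physical extents come out of a shared
# pool only when a block is first written.

EXTENT = 1024 * 1024  # 1 MiB allocation unit (arbitrary choice)

class ThinPool:
    def __init__(self, physical_bytes):
        self.free_extents = physical_bytes // EXTENT

    def allocate(self):
        if self.free_extents == 0:
            raise RuntimeError("pool exhausted -- the 'room' is full")
        self.free_extents -= 1

class ThinLUN:
    def __init__(self, pool, virtual_bytes):
        self.pool = pool
        self.virtual_bytes = virtual_bytes   # what the host sees
        self.mapped = set()                  # extents actually backed by disk

    def write(self, offset, length):
        first, last = offset // EXTENT, (offset + length - 1) // EXTENT
        for ext in range(first, last + 1):
            if ext not in self.mapped:       # first touch "inflates" the castle
                self.pool.allocate()
                self.mapped.add(ext)

pool = ThinPool(physical_bytes=10 * EXTENT)       # small shared pool
lun = ThinLUN(pool, virtual_bytes=100 * EXTENT)   # oversubscribed LUN
lun.write(0, 3 * EXTENT)                          # consumes only 3 extents
print(len(lun.mapped), pool.free_extents)         # 3 7
```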

When HDS first came out with dynamic provisioning it was only available for USP-V internal storage, later they released the functionality for USP-V external storage. This announcement seems to complete the roll out to all their SAN storage subsystems.

HDS also announced today a new service called the Storage Reclamation Service that helps
1) Assess whether thin provisioning will work well in your environment
2) Provide tools and support to identify candidate LUNs for thin provisioning, and
3) Configure new thinly provisioned LUNs and migrate your data over to the thinly provisioned storage.

Other products that support SAN storage thin provisioning include 3PAR, Compellent, EMC DMX, IBM SVC, NetApp and PillarData.

HDS High Availability Manager (HAM)

What does HAM look like to the open systems end user? We need to break this question into two parts – one part for USP-V internal storage and the other for external storage.

It appears that for internal storage, you first need data replication services such as asynchronous or synchronous replication between the two USP-V storage subsystems. But you still need some shared external storage used as a quorum disk. Then, once all this is set up under HAM, the two subsystems can automatically fail over access to the replicated internal and shared external storage from one USP-V to the other.

For external storage, it appears that this storage must be shared between the two USP-V systems, and whenever the primary one fails the secondary one can take over (fail over) data storage responsibilities for the failing USP-V front end.

What does this do for data migration? Apparently, using automated failover with HAM, one can migrate data between two different storage pools and then fail over server access from one to the other non-disruptively.

Obviously all the servers accessing storage under HAM control would need to be able to access both USP-Vs in order for this to all work properly.
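As I understand it, the role of the shared quorum disk is to keep the two controllers from both claiming the storage at once. Here is a minimal sketch of that failover decision — my own simplification, not HDS’s actual logic:

```python
# Simplified quorum-based failover decision (my model, not HDS's logic).
# Two controllers share a quorum disk; the secondary only takes over the
# storage if the primary is unreachable AND the quorum disk is reachable,
# which avoids split-brain when only the inter-controller link is down.

def decide_role(i_am_primary, peer_alive, quorum_reachable):
    if i_am_primary:
        # Primary keeps serving as long as it can see the quorum disk.
        return "serve" if quorum_reachable else "stand down"
    if not peer_alive and quorum_reachable:
        return "take over"      # genuine primary failure: fail over
    return "standby"            # peer healthy, or quorum unreachable

# Secondary's view when the primary USP-V has failed:
print(decide_role(i_am_primary=False, peer_alive=False, quorum_reachable=True))
# -> take over
```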

Continuous availability is a hard nut to crack. HDS seems to have taken a shot at doing this from a purely storage subsystem perspective. This might be very useful for data centers running heterogeneous server environments. Typically, server clustering software is OS-specific, like MSCS, with Symantec being the lone exception with VCS, which supports multiple OSs. Such server clustering can handle storage outages but also depends on storage replication services to make this work.

Unclear to me which is preferable, but when you add the non-disruptive data migration, it seems that HAM might make sense.

Data Domain bidding war

It’s unclear to me what EMC would want with Data Domain (DD) other than to lock up deduplication technology across the enterprise. EMC has Avamar for source dedupe, DL for target dedupe, and Celerra dedupe; the only ones missing are V-Max, Symm, and Clariion dedupe.

My guess is that EMC sees Data Domain’s market share as the primary target. It doesn’t take a lot of imagination to figure that once Data Domain is a part of EMC, EMC’s Disk Library (DL) offerings will move over to DD technology, which probably leaves the FalconStor/Quantum technology used in DL today on the outside.

EMC’s $100M loan to Quantum last month was probably just insurance to keep a business partner afloat until something better came along or they could make it on their own. The DD deal would leave the Quantum partnership supporting EMC with just Quantum’s tape offerings.

Quantum’s deduplication technology doesn’t have nearly the market share that DD has in the enterprise, but they have won a number of OEM deals, not the least of which is EMC, and they were looking to expand. But if EMC buys DD, this OEM agreement will end soon.

I wonder, if DD is worth $1.8B in cash, what Sepaton could be worth. They seem to be the only pure-play dedupe appliance left standing out there.

Not sure whether NetApp will up their bid, but they always seem to enjoy competing with EMC. It’s also unclear how much of this bid is EMC wanting DD versus EMC just wanting to hurt NetApp; either way, DD stockholders win out in the end.