Fall SNWUSA 2013

Here are my thoughts on SNWUSA, which took place this past week at the Long Beach Convention Center.

First, it was a great location. I saw a number of users I had never seen at SNWUSA before, some of whom I have known for years from other (non-storage) venues.

Second, the exhibit hall was sparsely populated. There were no major storage vendors at the show at all. Gold sponsors included NEC, Riverbed, & Sepaton, representing the largest exhibitors present. Making up the next (Contributing) tier were Western Digital, Toshiba, Active Archive Alliance, and the LTO consortium, with a smattering of smaller companies. Finally, there were another 12 vendors with kiosks around the floor, the largest being Veeam Software.

I suspect VMworld Europe happening at the same time in Barcelona might have had something to do with the sparse exhibit floor, but the trend has been present for the past few shows.

That being said there were still a few surprises in store, at least for me.  Two of the most interesting ones were:

  • Coho Data, who came out of stealth with a scale-out, RAIN (redundant array of independent nodes) based storage cluster that distributes and mirrors customer data across nodes and uses software defined networking (SDN). They currently support NFS for VMware, with a management UI reminiscent of iOS 7 sans touch support. The product comes as a series of nodes with SSDs, disk storage and SDN. The SDN allows Coho Data to relocate front-end (client) connections to where the customer data lies. The distributed, mirrored backend storage provides redundancy in the case of a node/disk failure, at which time the system understands what data is now at risk and rebuilds the now-mirrorless data onto other nodes. It reminds me a lot of Bycast/Archivas-like architectures, with SDN and NFS support added. I suppose the reason they are supporting VMware VMDKs is that the files are fairly large and thus easier to support.
  • CloudPhysics, who were not exhibiting but sponsored a break. As such, they were there talking with analysts and the press about their product. It installs as a VMware VM service and propagates VMware management agents to ESX servers, which then pipe information back to their app about how your VMware environment is running, how VMs are performing, how your network and storage are performing for the VMs running, etc. This data is then sent to the cloud, where it’s anonymized. There, customers can use apps (called Cards) to analyze the data, which can help them understand problem areas, predict what configuration changes can do for them, show them how VMs are performing, etc. It essentially logs all this information to the cloud and provides ways to analyze the data to optimize your VMware environment.
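
To make the Coho Data rebuild behavior described above a bit more concrete, here is a minimal Python sketch of a two-way mirrored cluster that works out which data is at risk after a node failure and re-mirrors it onto surviving nodes. This is not Coho Data's actual algorithm; the node names, the `placement` map and the least-loaded placement policy are all assumptions made for illustration.

```python
from collections import defaultdict

def find_at_risk(placement, failed_node):
    """Return the chunk ids that lost one mirror copy when failed_node went down."""
    return [chunk for chunk, nodes in placement.items() if failed_node in nodes]

def rebuild(placement, failed_node, live_nodes):
    """Re-mirror every at-risk chunk onto the least-loaded surviving node.

    The least-loaded policy is a hypothetical choice, not a vendor detail.
    """
    # Count how many chunk copies each surviving node already holds.
    load = defaultdict(int)
    for nodes in placement.values():
        for node in nodes:
            if node != failed_node:
                load[node] += 1

    new_placement = {}
    for chunk, nodes in placement.items():
        survivors = [n for n in nodes if n != failed_node]
        if len(survivors) < len(nodes):
            # Pick a node that does not already hold a copy of this chunk.
            candidates = [n for n in live_nodes if n not in survivors]
            target = min(candidates, key=lambda n: (load[n], n))
            load[target] += 1
            survivors.append(target)
        new_placement[chunk] = tuple(survivors)
    return new_placement

# Hypothetical cluster: three chunks, each mirrored on two nodes.
placement = {
    "c1": ("node-a", "node-b"),
    "c2": ("node-b", "node-c"),
    "c3": ("node-a", "node-c"),
}
at_risk = find_at_risk(placement, "node-b")
repaired = rebuild(placement, "node-b", ["node-a", "node-c", "node-d"])
```

In this toy run, chunks `c1` and `c2` lose a copy when `node-b` fails, and both get a new mirror on the empty `node-d`. A real system would also have to throttle rebuild traffic and handle a second failure mid-rebuild.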

Coming in just behind these two was Jeda Networks with their Software Defined Storage Network (SDSN). They use commodity (OpenFlow compatible) 10GbE switches to support a software FCoE storage SAN. Jeda Networks says that over the past two years, most 10GbE switch hardware has started to support DCB in hardware and, with that in place plus OpenFlow compatibility, they can provide an SDSN on top of it just by emulating a control layer for FCoE switches. Of course, one would still need FCoE storage and CNAs, but with that in place one could use much cheaper switches to support FCoE.

CloudPhysics has a subscription based pricing model which offers three tiers:

  • Free, where you get their vApp, the management agents and a defined set of Free Card Apps for no cost;
  • Standard level, where you get all the above plus a set of Card Apps which provide more VMware manageability for $50/ESX server/Month; and
  • Enterprise level where you get all the above plus all the Card Apps presently available for $150/ESX server/Month.

Jeda Networks and Coho Data are still developing their pricing and had none they were willing to disclose.

One of the CloudPhysics Card apps could predict how certain VMs would benefit from host based (PCIe or SSD) IO caching. They had a chart which showed working set inflection points for (I think) one VM running an OLTP application. I have asked for this chart to discuss further in a future post. CloudPhysics has the data to produce the full chart, but the application itself shows just three potential break points where, say, adding 500MB, 2000MB or 10000MB of SSD cache can speed up application performance by 10%, 30% or 50% (numbers made up for example purposes, not taken from the chart they showed me).
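
As a rough illustration of how such break points might be used, the Python sketch below picks the smallest cache size that reaches a target speedup and shows the diminishing return per GB of cache. The break points are the same made-up example numbers as above, and the function names are hypothetical; none of this reflects how the CloudPhysics Card is actually implemented.

```python
# Illustrative break points only: (cache size in MB, speedup in percent).
# These are the made-up example numbers from the text, not measured data.
BREAK_POINTS = [(500, 10), (2000, 30), (10000, 50)]

def smallest_cache_for(target_speedup_pct, break_points=BREAK_POINTS):
    """Return the smallest cache size (MB) that reaches the target, or None."""
    for size_mb, speedup_pct in sorted(break_points):
        if speedup_pct >= target_speedup_pct:
            return size_mb
    return None

def speedup_per_gb(break_points=BREAK_POINTS):
    """Speedup (percentage points) bought per GB of cache at each break point."""
    return [(size_mb, speedup_pct / (size_mb / 1000))
            for size_mb, speedup_pct in break_points]

cache_for_25_pct = smallest_cache_for(25)   # needs the 2000MB break point
cache_for_60_pct = smallest_cache_for(60)   # no break point reaches 60%
returns = speedup_per_gb()
```

With these example numbers, each additional GB of cache buys less speedup than the one before it, which is the sort of inflection a working set chart is meant to expose.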

A few other companies made announcements at the show. For example, Sepaton announced their new VirtuoSO, a scale-out hybrid deduplication appliance.

That’s about it. I would have to say that SNW needs to rethink their business model, the frequency of their shows or what they are trying to do at their conferences. However, on the plus side, most of the users I talked with came away with a lot of information and thought the show was worthwhile, and I came away with a few surprises.



New deduplication solutions from Sepaton and NEC

In the last few weeks both Sepaton and NEC have announced new data deduplication appliance hardware and, for Sepaton at least, new functionality. Both of these vendors compete against solutions from EMC Data Domain, IBM ProtecTIER, HP StoreOnce and others.

Sepaton v7.0 Enterprise Data Protection

From Sepaton’s point of view, data growth is exploding with little increase in organizational budgets, and system environments are becoming more complex, with data risks expanding, not shrinking. To address these challenges, Sepaton has introduced a new version of their hardware appliance with new functionality to help address the rising data risks.

Their new S2100-ES3 Series 2925 Enterprise Data Protection Platform with the latest Sepaton software now supports up to 80 TB/hour of cluster data ingest (presumably with Symantec OST) and up to 2.0 PB of raw storage in an 8-node cluster. The new appliances support four 8Gbps FC and two 10GbE host ports per node, and are based on HP DL380p Gen8 servers with dual 8-core, 2.9GHz Intel Xeon E5-2690 processors, 128 GB of DRAM and a new high performance compression card from EXAR. With the bigger capacity and faster throughput, enterprise customers can now support large backup data streams with fewer appliances, reducing complexity and maintenance/licensing fees. S2100-ES3 Platforms can scale from 2 to 8 nodes in a single cluster.

The new appliance supports data-at-rest encryption for customer data security as well as data compression, both of which are hardware based, so there is no performance penalty. Also, data encryption is an optional licensed feature and uses OASIS KMIP 1.0/1.1 to integrate with RSA, Thales and other KMIP compliant, enterprise key management solutions.

NEC HYDRAstor Gen 4

With Gen4, HYDRAstor introduces a new Hybrid Node, which contains both the logic of an accelerator node and the capacity of a storage node in one 2U rackmounted server. Before the hybrid node, similar capacity and accessibility would have required 4U of rack space: 2U for the accelerator node and another 2U for the storage node.

The HS8-4000 HN supports ingest rates of 4.9TB/hr per node standard, or 5.6TB/hr per node with NetBackup OST IO express, and holds twelve 4TB, 3.5in SATA drives for up to 48TB of raw capacity. They have also introduced an HS8-4000 SN, which just consists of the 48TB of additional storage capacity. Gen4 is the first use of 4TB drives we have seen anywhere and quadruples raw capacity per node over the Gen3 storage nodes. HYDRAstor clusters can scale from 2 to 165 nodes, and performance scales linearly with the number of cluster nodes.

With the new HS8-4000 systems, a 165-node cluster now tops out at 7.9PB of raw capacity and supports up to 920.7 TB/hr (almost a PB/hr, need to recalibrate my units) when all 165 nodes are HS8-4000 HNs. Of course, how many customers need a PB/hr of backup ingest is another question, let alone 7.9PB of raw storage, which of course gets deduplicated to an effective capacity of over 100PBs of backup data (or 0.1EB, units change again).
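
The cluster numbers above follow from simple per-node multiplication, which the short Python sketch below walks through. The constant and function names are my own. Note that multiplying the rounded 5.6TB/hr per-node rate by 165 gives about 924 TB/hr, slightly above the quoted 920.7 TB/hr; I assume the difference comes from rounding of the per-node figure. The implied deduplication ratio is likewise only what falls out of the "over 100PB" effective capacity claim, not a number NEC quoted to me.

```python
# Per-node figures for the HS8-4000 HN as quoted in the text.
DRIVES_PER_NODE = 12
TB_PER_DRIVE = 4
RAW_TB_PER_NODE = DRIVES_PER_NODE * TB_PER_DRIVE   # 48 TB raw per node
INGEST_TB_HR_OST = 5.6                             # rounded per-node rate
MAX_NODES = 165

def cluster_totals(nodes, ingest_tb_hr_per_node):
    """Return (raw capacity in PB, ingest in TB/hr), assuming linear scaling."""
    raw_pb = nodes * RAW_TB_PER_NODE / 1000
    ingest_tb_hr = nodes * ingest_tb_hr_per_node
    return raw_pb, ingest_tb_hr

raw_pb, ingest_tb_hr = cluster_totals(MAX_NODES, INGEST_TB_HR_OST)

# Ratio implied by "over 100PB effective" on top of the raw capacity.
implied_dedupe_ratio = 100 / raw_pb

print(f"raw capacity: {raw_pb:.2f} PB")
print(f"ingest:       {ingest_tb_hr:.1f} TB/hr")
print(f"implied dedupe ratio: at least {implied_dedupe_ratio:.1f}:1")
```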

NEC has also introduced a new low-end appliance, the HS3-410, for remote/branch office environments, with 3.2TB/hr ingest and up to 24TB of raw storage. It is only available as a single-node system.

Maybe Facebook could use a 0.1EB backup repository?

Image: Intel Team Inside Facebook Data Center by IntelFreePress


Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

Was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive) holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming of a 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk only archive appliance (see my post on EMC Data Domain products enter the archive market).

Oracle assures me that tape density is keeping up if not gaining on disk density trends and capacity. But density or capacity are not the only issues causing data to move off of tape in today’s enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it’s one dimensional (well, literally it’s 2 dimensional). With data compression, disk or tape systems can easily double the density on a piece of media. But with data deduplication, the multiples start becoming more like 5X to 30X, depending on the frequency of full backups or the amount of duplicated data. And numbers like those dwarf any discussion of density ratios and, as such, get everyone’s attention.
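
To see why the multiple depends so heavily on the frequency of full backups, here is a toy Python sketch of fixed-size chunk deduplication. It hashes each chunk with SHA-256 and stores only chunks it has not seen before. The chunk size, the synthetic data and the function names are assumptions for illustration; commercial products typically use more sophisticated (often variable-length) chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only

def chunks(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]

def dedupe_ratio(backups):
    """Logical bytes written divided by unique chunk bytes actually stored."""
    stored = {}   # chunk digest -> chunk length
    logical = 0
    for data in backups:
        logical += len(data)
        for chunk in chunks(data):
            stored.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return logical / sum(stored.values())

# Synthetic "full backup": 100 distinct chunks.
base = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(100))

# Ten identical full backups dedupe down to one copy.
ratio_ten_fulls = dedupe_ratio([base] * 10)

# Two fulls where only the last chunk changed between them.
changed = base[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE
ratio_two_fulls = dedupe_ratio([base, changed])
```

Ten unchanged fulls give a 10X multiple in this toy model, while two fulls with a small change give just under 2X. Retaining more fulls of slowly changing data is what pushes real-world ratios toward the high end of the range.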

I can remember talking to an avowed tape engineer years ago, and he was describing deduplication technology at the VTL level as being architecturally impure and inefficient. From his perspective it needed to be done much earlier in the data flow. But what he failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beach-head and economies of scale. From there such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, what with Avamar, Symantec PureDisk and others, source-level deduplication, or close-to-source deduplication, is a reality. But all this came about because they were able to offer 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is some system in front of a tape library that provided deduplication capabilities not just to the disk media but the tape media as well. This way the 30X density over non-deduplicated storage could follow through all the way to the tape media.

In the past, this made little sense because a deduplicated tape would require potentially multiple volumes in order to restore a particular set of data. However, with today’s 5TB of data on a tape, maybe this doesn’t have to be the case anymore. In addition, by having a deduplication system in front of the tape library, it could support most of the immediate data restore activity while data restored from tape was sort of like pulling something out of an archive and as such, might take longer to perform. In any event, with LTO’s multi-partitioning and the other enterprise class tapes having multiple domains, creating a structure with meta-data partition and a data partition is easier than ever.
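
As a thought experiment, the meta-data partition plus data partition structure could look something like the toy Python model below. It keeps unique chunks in an append-only data partition and a chunk index plus per-backup "recipes" in the metadata partition, so any backup can be restored from the one volume. The class, its fields and the tiny chunk size are all hypothetical; this is not how any shipping tape format or library lays out data.

```python
import hashlib

class DedupedTape:
    """Toy model of one tape volume with a metadata and a data partition."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size
        self.data_partition = bytearray()   # append-only unique chunks
        self.metadata_partition = {
            "index": {},     # chunk digest -> (offset, length) in data partition
            "catalog": {},   # backup name -> ordered list of chunk digests
        }

    def write_backup(self, name, data):
        """Append only unseen chunks; record the recipe in the metadata."""
        index = self.metadata_partition["index"]
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in index:
                index[digest] = (len(self.data_partition), len(chunk))
                self.data_partition += chunk
            recipe.append(digest)
        self.metadata_partition["catalog"][name] = recipe

    def restore(self, name):
        """Rebuild a backup by following its recipe through the chunk index."""
        index = self.metadata_partition["index"]
        out = bytearray()
        for digest in self.metadata_partition["catalog"][name]:
            offset, length = index[digest]
            out += self.data_partition[offset:offset + length]
        return bytes(out)

tape = DedupedTape()
tape.write_backup("monday", b"AAAABBBBCCCC")
tape.write_backup("tuesday", b"AAAABBBBDDDD")
restored = tape.restore("tuesday")
```

Here 24 bytes of backups occupy 16 bytes in the data partition, and both backups restore from the single volume. On real tape the restore would involve seeks between chunk locations, which is part of why restores from deduplicated tape might take longer.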

“Got Dedupe”

There are plenty of places where today’s tape vendors can obtain deduplication capabilities. Permabit offers dedupe code for OEM applications for those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP, and Quantum already have tape libraries and their own dedupe systems available today, all of which can readily support a deduplicating front-end to their tape libraries, if they don’t already.

Where “Tape Rulz”

There are places where data deduplication doesn’t work very well today, mainly rich media, physics, biopharm and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world today, deduplication is taking over, if it hasn’t already. The sooner tape gets on the deduplication bandwagon, the better for the IT industry.


Of course there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data which doesn’t deduplicate well (see my previous Oracle RMAN posts). And at least one other rich media conglomerate is considering the very same move. For now, tape has a safe harbor in big science, but it won’t last long.


Data Domain bidding war

It’s unclear to me what EMC would want with Data Domain (DD) other than to lock up deduplication technology across the enterprise. EMC has Avamar for source dedupe, has DL for target dedupe, has Celerra dedupe, and the only ones missing are V-Max, Symm & Clariion dedupe.

My guess is that EMC sees Data Domain’s market share as the primary target. It doesn’t take a lot of imagination to figure that once Data Domain is a part of EMC, EMC’s Disk Library (DL) offerings will move over to DD technology. Which probably leaves FalconStor/Quantum technology used in DL today as outsiders.

EMC’s $100M loan to Quantum last month probably was just insurance to keep a business partner afloat until something better came along or they could make it on their own. The DD deal would leave the Quantum partnership supporting EMC with just Quantum’s tape offerings.

Quantum deduplication technology doesn’t have nearly the market share that DD has in the enterprise, but they have won a number of OEM deals, not the least of which is EMC, and they were looking to expand. But if EMC buys DD, this OEM agreement will end soon.

I wonder, if DD is worth $1.8B in cash, what could Sepaton be worth? They seem to be the only pure play dedupe appliance left standing out there.

Not sure whether NetApp will up their bid, but they always seem to enjoy competing with EMC. Also unclear how much of this bid is EMC wanting DD or EMC just wanting to hurt NetApp. Either way, DD stockholders win out in the end.