Enterprise file synch

Strange Clouds by michaelroper (cc) (from Flickr)

Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services, each with a slightly different take on the requirements. The one that comes most readily to mind is Egnyte, which supported file synchronization across a hybrid cloud (public cloud and network storage) and which we discussed in our Fall SNWUSA wrap-up post last year.

The problem with BYOD

With bring your own devices (BYOD), corporate end users are quickly abandoning any pretense of IT control and turning to consumer-class file synchronization services to help synch files across the desktops, laptops and mobile devices they haul around. But the problem with these solutions, such as Dropbox, Box, Oxygen Cloud and others, is that they are entirely outside of IT's control.

Which is why there's a real need today for enterprise-class file synchronization solutions that exhibit the ease of use and setup of consumer file synch systems but offer IT security, compliance and control over the data being moved into the cloud and across corporate and end-user devices.

EMC Syncplicity and EMC on-premises storage

Last week EMC announced an enterprise version of its recently acquired Syncplicity software that supports on-premises storage on Isilon or on Atmos, EMC's own cloud storage offering.

In previous versions of Syncplicity, storage was based in the cloud, using Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on-premises storage to host user file synchronization services that can span mobile devices, laptops and end-user desktops.

New Syncplicity users must download desktop client software to support file synchronization, or mobile apps for mobile device synchronization. After that it's a simple matter of identifying which, if any, directories and/or files are to be synchronized with the cloud and/or shared with others.
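
To make the mechanics a little more concrete, below is a minimal sketch of how a desktop sync client generally decides what to upload: hash the files in each watched directory and flag anything whose contents changed since the last pass. This is not Syncplicity's actual protocol (which isn't public), just an illustration; the function names are hypothetical.

```python
import hashlib
from pathlib import Path

def hash_file(path: Path) -> str:
    """Return a SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def find_changed_files(watched_dir: str, last_state: dict) -> dict:
    """Return {relative_path: hash} for files that are new or modified
    since the last sync pass and therefore need to be uploaded."""
    root = Path(watched_dir)
    changed = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = str(path.relative_to(root))
            digest = hash_file(path)
            if last_state.get(rel) != digest:
                changed[rel] = digest
    return changed

# First pass: everything looks "changed"; later passes only report
# files whose contents differ from the saved state.
state = {}
to_upload = find_changed_files("/home/user/Sync", state)
state.update(to_upload)
```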

However, with the Business (read: enterprise) edition one also gets the Security and Compliance console, which supports access controls defining which users and devices can synchronize or share data, enforcement of data retention policies, remote wipe of corporate data, and native support for single sign-on services. In addition, one gets centralized user and group management services to grant, change, or revoke user and group access to data. One also obtains enterprise security with AES-256 data-at-rest encryption, key-manager data centers kept separate from data storage data centers, quadruple replication of data for high disaster fault tolerance, and SAS 70 Type II compliant data centers.
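
EMC doesn't publish Syncplicity's implementation details, but AES-256 data-at-rest encryption with keys fetched from a separate key manager generally looks something like the following sketch (using the third-party Python cryptography package; the key-manager call is a hypothetical placeholder):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def fetch_key_from_key_manager(key_id: str) -> bytes:
    """Hypothetical stand-in for a request to a key manager hosted in a
    separate data center; here it just returns a random 256-bit key."""
    return os.urandom(32)  # 32 bytes == AES-256

def encrypt_at_rest(plaintext: bytes, key_id: str) -> bytes:
    """Encrypt file contents with AES-256-GCM before writing to storage."""
    key = fetch_key_from_key_manager(key_id)
    nonce = os.urandom(12)                      # unique nonce per object
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + ciphertext                   # store nonce with the ciphertext

blob = encrypt_at_rest(b"quarterly-results.xlsx contents", key_id="corp-files")
```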

If the client wants to use on-premises storage, they would also need to deploy a virtual appliance (VM) somewhere in the data center to act as the gateway for file synchronization service requests. The file synch server would presumably need access to the on-premises storage, and it's unclear whether the virtual appliance is in-band or out-of-band (see the discussion of Egnyte's solution options below).

Egnyte’s solution

Egnyte comes as a software-only solution, building a file server in the cloud for end-user storage. It also includes an Egnyte app for mobile hardware and the ever-present web file browser. Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.

One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on-premises storage, but you cannot have just on-premises storage. Syncplicity offers only one or the other for file data, i.e., file synchronization data can be either in the cloud or on local on-premises storage, but not in both locations.

The other major difference is that Egnyte operates with just about anybody's NAS storage, such as EMC, IBM and HDS, for the on-premises file storage. It operates as an in-band, software appliance solution that traps file activity going to your on-premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.

But for NetApp storage only (today), they utilize ONTAP APIs to offer an out-of-band file synchronization solution. This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in its current directory locations.
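
The in-band vs. out-of-band distinction boils down to whether the gateway sits in the data path or learns about changes after the fact. A rough sketch of the two approaches (the filer API call shown is a hypothetical placeholder, not an actual ONTAP interface):

```python
# In-band: the gateway sits in the data path, so it sees every write as
# it happens and can queue the file for synchronization immediately.
class InBandGateway:
    def __init__(self, backend_nas, sync_queue):
        self.backend_nas = backend_nas
        self.sync_queue = sync_queue

    def write(self, path, data):
        self.backend_nas.write(path, data)   # pass the write through to the NAS
        self.sync_queue.append(path)         # ...and mark the file for cloud sync

# Out-of-band: clients write to the NAS directly; the gateway periodically
# asks the filer which files changed. `list_changes_since` is a hypothetical
# placeholder, not an actual ONTAP API call.
class OutOfBandGateway:
    def __init__(self, filer_api, sync_queue):
        self.filer_api = filer_api
        self.sync_queue = sync_queue
        self.last_poll = 0.0

    def poll(self, now: float):
        for path in self.filer_api.list_changes_since(self.last_poll):
            self.sync_queue.append(path)
        self.last_poll = now
```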

Egnyte promises enterprise-class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers. There's no mention of separate key management in their literature.

As for cloud backend storage, Egnyte has its own public cloud and also supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp Storage Grid and HP Public Cloud.

There's more to Egnyte's solution than just file synchronization and sharing, but that's the subject of today's post. Perhaps we can cover the rest at more length in a future post if there's interest.

File synchronization, cloud storage’s killer app?

The nice thing about these capabilities is that IT staff can now regain control over what is and isn't synched and shared across multiple devices. Up until now all this was happening outside the data center and external to IT control.

From Egnyte's perspective, they are seeing more and more enterprises wanting data both on premises, for performance and compliance, and in cloud storage, for ubiquitous access. They see it as both a shareability demand between an enterprise's far-flung team members (and potentially client/customer personnel) and a need to access, edit and propagate siloed corporate information using the new mobile devices that everyone has these days.

In any event, enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage. Up to this point cloud gateways made sense for SME backup or disaster recovery solutions but, IMO, didn't really take off beyond that space. But if you can package a robust and secure file sharing and synchronization solution around cloud storage, then you just might have something that enterprise customers are clamoring for.

~~~~

Comments?

Latest SPECsfs2008 results, over 1 million NFS ops/sec – chart-of-the-month

Column chart showing the top 10 NFS throughput operations per second for SPECsfs2008
(SCISFS111221-001) (c) 2011 Silverton Consulting, All Rights Reserved

[We are still catching up on our charts for the past quarter but this one brings us up to date through last month]

There's just something about a million SPECsfs2008(r) NFS throughput operations per second that kind of excites me (weird, I know). Yes, it takes 44 nodes of Avere FXT 3500 with over 6TB of DRAM cache, 140 nodes of EMC Isilon S200 with almost 7TB of DRAM cache and 25TB of SSDs, or at least 16 nodes of NetApp FAS6240 in Data ONTAP 8.1 cluster mode with 8TB of FlashCache to get to that level.

Nevertheless, a million NFS throughput operations is something worth celebrating.  It’s not often one achieves a 2X improvement in performance over a previous record.  Something significant has changed here.

The age of scale-out

We have reached a point where scaling systems out can provide linear performance improvements, at least up to a point. For example, the EMC Isilon and NetApp FAS6240 showed close to linear speedup in performance as they added nodes, indicating (to me at least) that there may be more headroom if they just throw more storage nodes at the problem. Then again, maybe they saw some drop-off and didn't wish to show the world, or the costs became prohibitive and they had to stop someplace. On the other hand, Avere only benchmarked a 44-node system with their current hardware (FXT 3500); they must have figured winning the crown was enough.

However, I would like to point out that throwing just any hardware at these systems doesn't necessarily increase performance. Previously (see my CIFS vs. NFS corrected post), we had shown the linear regression of NFS throughput against spindle count, and although the regression coefficient was good (~R**2 of 0.82), it wasn't perfect. And of course we eliminated any SSDs from that prior analysis. (We probably should consider eliminating any system with more than a TB of DRAM as well, but that analysis was done before the 44-node Avere result was out.)
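
For the curious, that spindle-count regression is just an ordinary least-squares fit; a minimal sketch with made-up sample data (not the actual SPECsfs2008 submissions) looks like this:

```python
import numpy as np

# Hypothetical (spindle count, NFS ops/sec) pairs -- stand-in data to show
# the method, NOT the real SPECsfs2008 submissions.
spindles = np.array([100, 250, 500, 750, 1000, 1500])
nfs_ops = np.array([20e3, 55e3, 110e3, 150e3, 210e3, 320e3])

# Ordinary least-squares fit: ops ~= slope * spindles + intercept
slope, intercept = np.polyfit(spindles, nfs_ops, 1)

# Coefficient of determination (R**2) for the fit
predicted = slope * spindles + intercept
ss_res = np.sum((nfs_ops - predicted) ** 2)
ss_tot = np.sum((nfs_ops - nfs_ops.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"ops/sec per spindle ~ {slope:.1f}, R**2 = {r_squared:.2f}")
```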

Speaking of disk drives, the FAS6240 system nodes each had 72 450GB 15Krpm disks, the Isilon nodes each had 24 300GB 10Krpm disks, and each Avere node had 15 600GB 7.2Krpm SAS disks. However, the Avere system also had 4 Solaris ZFS file storage systems behind it, each of which had another 22 3TB (7.2Krpm, I think) disks. Given all that, the 16-node NetApp system, the 140-node Isilon and the 44-node Avere systems had totals of 1152, 3360 and 748 disk drives, respectively. Of course, this doesn't count the system disks for the Isilon and Avere systems nor any of the SSDs or FlashCache in the various configurations.
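
As a quick sanity check on those totals (nodes times disks per node, plus the ZFS backends in Avere's case):

```python
# Nodes x disks per node (plus Avere's 4 ZFS backends at 22 disks each)
drive_counts = {
    "NetApp FAS6240 (16 nodes)": 16 * 72,
    "EMC Isilon S200 (140 nodes)": 140 * 24,
    "Avere FXT 3500 (44 nodes)": 44 * 15 + 4 * 22,
}
for system, drives in drive_counts.items():
    print(f"{system}: {drives} data drives")
# -> 1152, 3360 and 748 drives respectively, matching the totals above
```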

I would say that with this round of SPECsfs2008 benchmarks, scale-out NAS systems have come into their own. It's too bad that neither NetApp nor Avere released comparable CIFS benchmark results, which would have helped in my perennial discussion of CIFS vs. NFS.

But there’s always next time.

~~~~

The full SPECsfs2008 performance report went out to our newsletter subscribers last December. A copy of the full report will be up on the dispatches page of our site sometime later this month (if all goes well). However, you can get our full SPECsfs2008 performance analysis now and subscribe to our free monthly newsletter to receive future reports directly, by just sending us an email or using the signup form above right.

For a more extensive discussion of file and NAS storage performance covering the top 30 SPECsfs2008 results and NAS storage system features and functionality, please consider purchasing our NAS Buying Guide, available from SCI's website.

As always, we welcome any suggestions on how to improve our analysis of SPECsfs2008 results or any of our other storage system performance discussions.

Comments?

Intel acquires InfiniBand fabric technology from Qlogic

[InfiniBand interconnected] Isilon Packaging by ChrisDag (cc) (from Flickr)

Intel announced today that they are going to acquire the InfiniBand (IB) fabric technology business from Qlogic.

From many analysts' perspective, IB is one of the only technologies out there that can efficiently interconnect a cluster of commodity servers into a supercomputing system.

What’s InfiniBand?

Recall that IB is one of three reigning data center fabric technologies available today, the others being 10GbE and 16 Gb/s FC. IB is currently available in DDR, QDR and FDR modes of operation, that is, 5Gb/s, 10Gb/s or 14Gb/s per single lane, respectively, according to the IB trade association (IBTA) technology update. Systems can aggregate multiple IB lanes in units of 4 or 12 paths (see the Wikipedia IB article), such that IB QDRx4 supports 40Gb/s and IB FDRx4 currently supports 56Gb/s.
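
The aggregate link speeds are just the per-lane rate multiplied by the lane count; a quick sketch of the math:

```python
# Per-lane signaling rates (Gb/s) for the IB generations discussed above
per_lane_gbps = {"DDR": 5, "QDR": 10, "FDR": 14}

# IB aggregates lanes in groups of 1, 4 or 12 paths
for mode, rate in per_lane_gbps.items():
    for lanes in (1, 4, 12):
        print(f"{mode} x{lanes}: {rate * lanes} Gb/s")
# e.g. QDR x4 -> 40 Gb/s and FDR x4 -> 56 Gb/s, as cited above
```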

The IBTA pitch cited above showed that IB is the most widely used interface for the top supercomputing systems and is the most power-efficient interconnect available (although how that's calculated is not described).

Where else does IB make sense?

One thing IB has going for it is low latency through the use of RDMA, or remote direct memory access. That same report says that an SSD directly connected through FC takes about ~45 μsec to do a read, whereas the same SSD directly connected through IB using RDMA would only take ~26 μsec.

However, RDMA technology is now also coming out on 10GbE through RDMA over Converged Ethernet (RoCE, pronounced "rocky"). But the IBTA claims that IB RDMA has a 0.6 μsec latency while RoCE has 1.3 μsec. Although at these speeds a 0.7 μsec difference doesn't seem like a big thing, it more than doubles the latency.
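
Putting those numbers side by side makes the relative gap clearer, even though the absolute difference is well under a microsecond:

```python
# Latency figures quoted above (microseconds)
fc_ssd_read, ib_rdma_ssd_read = 45.0, 26.0
ib_rdma_xfer, roce_xfer = 0.6, 1.3

print(f"SSD read, FC vs IB/RDMA: {fc_ssd_read / ib_rdma_ssd_read:.2f}x the latency")
print(f"RoCE vs native IB RDMA:  {roce_xfer / ib_rdma_xfer:.2f}x the latency")
```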

Nonetheless, Intel's purchase is an interesting play. I know that Intel is focused on supporting an ExaFLOP HPC computing environment by 2018 (see their release). But IB is already a pretty active technology in the HPC community and doesn't seem to need their support.

In addition, IB has been gradually making inroads into enterprise data centers via storage products like the Oracle Exadata Storage Server, which uses 40 Gb/s IB QDRx4 interconnects. There are a number of other storage products out there that use IB as well, from EMC Isilon, SGI, Voltaire, and others.

Of course, where IB can mostly be found today is in computer-to-computer interconnects, and just about every server vendor out there today, including Dell, HP, IBM, and Oracle, supports IB interconnects on at least some of their products.

Who's left standing?

With Qlogic out, I guess this leaves Cisco (de-emphasized lately), Flextronics, Mellanox, and Intel as the only companies that supply IB switches. Mellanox, Intel (via Qlogic) and Voltaire supply the HCA (host channel adapter) cards that provide the server interface to the switched IB network.

It's probably a logical choice for Intel to go after some of this technology, just to keep it moving forward and if they want to be seriously involved in the networking business.

IB use in Big Data?

On the other hand, it's possible that Hadoop and other big data applications could make use of IB speeds, and as these are mainly vast clusters of commodity systems, it would be a logical choice.

There is some interesting research out of Ohio State University on the advantages of IB in HDFS (Hadoop) environments (see Can high performance interconnects boost Hadoop distributed file system performance). This research essentially says that Hadoop HDFS can perform much better when you combine IB with IPoIB (IP over IB, see the OpenFabrics Alliance article) and SSDs, but that SSDs alone do not provide as much benefit. (Although my reading of the performance charts seems to indicate it's not that much better than 10GbE with TOE?)

It’s possible other Big data analytics engines are considering using IB as well.  It would seem to be a logical choice if you had even more control over the software stack.

~~~~

Comments?


Shared DAS

Code Name "Thumper" by richardmasoner (cc) (from Flickr)

An announcement this week by VMware of their vSphere 5 Virtual Storage Appliance has brought back the concept of shared DAS (see vSphere 5 storage announcements).

Over the years, there have been a few products, such as Seanodes and Condor Storage (which may no longer exist), that have tried to make a market out of sharing DAS across a cluster of servers.

Arguably, Hadoop HDFS (see Hadoop – part 1), Amazon S3/cloud storage services and most scale out NAS systems all support similar capabilities. Such systems consist of a number of servers with direct attached storage, accessible by other servers or the Internet as one large, contiguous storage/file system address space.

Why share DAS? The simple fact is that DAS is cheap, its capacity is increasing, and it’s ubiquitous.
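
One way to picture how a cluster of servers, each with its own DAS, can present a single shared address space is a placement function that maps every block (or file chunk) to an owning node, for example by hashing. The sketch below is a generic illustration, not any particular vendor's implementation:

```python
import hashlib

NODES = ["server-a", "server-b", "server-c"]  # each server has its own DAS

def owner_of(block_id: int) -> str:
    """Map a logical block of the shared address space to the node whose
    direct attached storage holds it (simple hash-based placement)."""
    digest = hashlib.md5(str(block_id).encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Any server (or client) can compute where block 42 lives and then read it
# over the network from that node's local disks.
print(owner_of(42))
```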

Shared DAS system capabilities

VMware has limited their DAS virtual storage appliance to a 3-ESX-node environment; there are possibly lots of reasons for this. But there is no such restriction for Seanodes Exanodes clusters.

On the other hand, VMware has specifically targeted SMB data centers with this facility. In contrast, Seanodes has focused on both the HPC and SMB markets for their shared internal storage, which provides support for a virtual SAN on Linux, VMware ESX, and Windows Server operating systems.

Although the VMware Virtual Storage Appliance and Seanodes do provide rudimentary SAN storage services, they do not supply advanced enterprise storage capabilities such as point-in-time copies, replication, data reduction, etc.

But some of these facilities are available outside their systems. For example, VMware with vSphere 5 will support a host-based replication service and has had software-based snapshots for some time now. Similar services exist or can be purchased for Windows and presumably Linux. Also, cloud storage providers have offered a smattering of these capabilities in their offerings from the start.

Performance?

Although distributed DAS storage has the potential for high performance, it seems to me that these systems should perform worse than an equivalent amount of processing power and storage in a dedicated storage array. But my biases might be showing.

On the other hand, Hadoop and scale-out NAS systems are capable of screaming performance when put together properly. Recent SPECsfs2008 results for the EMC Isilon scale-out NAS system have demonstrated very high performance, and Hadoop's claim to fame is high-performance analytics. But you have to throw a lot of nodes at the problem.

—–

In the end, all it takes is software. Virtualizing servers, sharing DAS, and implementing advanced storage features, any of these can be done within software alone.

However, service levels, high availability and fault tolerance requirements have historically necessitated a physical separation between storage and compute services. Nonetheless, if you really need screaming application performance and software-based fault tolerance/high availability will suffice, then distributed DAS systems with co-located applications, like Hadoop or some scale-out NAS systems, are the only game in town.

Comments?