QoM1610: Will NVMe over Fabric GA in enterprise AFA by Oct’2017

NVMe over fabric (NVMeoF) was a hot topic at Flash Memory Summit last August. Facebook and others were showing off their JBOF (see my Facebook moving to JBOF post) but there were plenty of other NVMeoF offerings at the show.

NVMeoF hardware availability

When Brocade announced their Gen6 switches they made a point of saying that both their Gen5 and Gen6 switches currently support NVMeoF protocols. In addition to Brocade's support, in Dec 2015 Qlogic announced NVMeoF support for select HBAs and, as of July 2016, Emulex announced NVMeoF support in their HBAs.

From an Ethernet perspective, Qlogic has an NVMe Direct NIC which supports NVMe protocol offload for iSCSI. But even without NVMe Direct, 40GbE & 100GbE Ethernet with iWARP, RoCE v1/v2, or iSER (iSCSI Extensions for RDMA) could readily support NVMeoF. The nice thing about running NVMeoF over Ethernet is that the same fabric not only supports iSCSI & FCoE, but CIFS/SMB and NFS as well.

InfiniBand and Omni-Path Architecture already support native RDMA, so they should already support NVMeoF.

So hardware/firmware is already available for any enterprise AFA customer that wants NVMeoF for their data center storage.

NVMeoF Software

Intel claims that ~90% of the software driver functionality of NVMe is the same for NVMeoF. The primary differences between the two seem to be the NVMeoF discovery and queueing mechanisms.

There are two fabric methods that can be used to implement NVMeoF data and command transfers: capsule mode, where NVMe commands and data are encapsulated in normal fabric packets, or fabric-dependent mode, where drivers make use of native fabric memory transfer mechanisms (RDMA, …) to transfer commands and data.
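
To make the capsule vs. fabric-dependent distinction concrete, here is a toy sketch in Python. The 64-byte submission queue entry size is real NVMe, but the field layout, the register_memory() helper and the packed memory descriptor are made-up placeholders, not the NVMeoF wire encoding.

```python
# Toy illustration of the two NVMeoF transfer styles described above.
# Field layouts are simplified placeholders, not the spec's encoding.
import struct

SQE_SIZE = 64  # an NVMe submission queue entry is 64 bytes

def capsule_mode(sqe: bytes, data: bytes = b"") -> bytes:
    """Capsule mode: the NVMe command (and optionally its data) are
    encapsulated in a single fabric packet/capsule."""
    assert len(sqe) == SQE_SIZE
    return sqe + data  # command capsule, with optional in-capsule data

def register_memory(buf: bytes):
    """Stand-in for an RDMA memory registration; returns a fake (addr, rkey)."""
    return id(buf) & 0xFFFFFFFFFFFF, 0x1234

def fabric_dependent_mode(sqe: bytes, data: bytes) -> bytes:
    """Fabric-dependent mode: only the command travels in the message; the
    data is pulled by the target via the fabric's native memory transfer
    mechanism (e.g., an RDMA read of the registered buffer)."""
    assert len(sqe) == SQE_SIZE
    addr, rkey = register_memory(data)
    return sqe + struct.pack("<QI", addr, rkey)  # command + memory descriptor

cmd = bytes(SQE_SIZE)                                 # dummy 64-byte NVMe command
print(len(capsule_mode(cmd, b"A" * 512)))             # 576: command + in-capsule data
print(len(fabric_dependent_mode(cmd, b"A" * 4096)))   # 76: command + memory descriptor
```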

A (Linux) host driver for NVMeoF is currently available from Seagate. As a result, support for NVMeoF in Linux is under development and not far from release in the next kernel (I think). (Mellanox has a tutorial on how to compile a Linux kernel with NVMeoF driver support.)

With Linux coming out, Microsoft Windows and VMware can’t be far behind. However, I could find nothing online, aside from base NVMe support, for either platform.

NVMeoF target support is another matter but with NICs/HBAs & switch hardware/firmware and drivers presently available, proprietary storage system target drivers are just a matter of time.

Boot support is another concern. I could find no information on BIOS support for booting off of a NVMeoF AFA. Arguably, one may not need boot support for NVMeoF AFAs as they are probably not a viable target for storing app code or OS software.

From what I could tell, normal fabric multi-pathing support should work fine with NVMeoF. This should allow for HA NVMeoF storage, a critical requirement for enterprise AFA storage systems these days.

NVMeoF advantages/disadvantages

Chelsio and others have shown that NVMeoF adds only ~8μsec of overhead beyond native NVMe SSDs, which if true would warrant implementation on all NVMe AFAs. This may or may not impact max IOPS depending on the scalability of NVMeoF.

For instance, servers (PCIe bus hardware) typically limit the number of private NVMe SSDs to 255 or less. With NVMeoF, one could potentially have 1000s of shared NVMe SSDs accessible to a single server. At this scale, a single server attached to a scale-out NVMeoF AFA (cluster) could be supplied with ~4X the IOPS it could achieve using private NVMe storage.
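
As a back-of-the-envelope check on those numbers (the per-server limit and shared pool size come from the paragraph above; the 100μsec native read latency is my own round-number assumption):

```python
# Back-of-the-envelope sketch of the scaling and latency arguments above.
# All figures are illustrative assumptions, not measured results.
private_ssd_limit = 255      # rough per-server limit on private NVMe SSDs
shared_ssd_pool = 1000       # NVMe SSDs reachable over an NVMeoF fabric

scale_factor = shared_ssd_pool / private_ssd_limit
print(f"~{scale_factor:.1f}X more SSDs addressable per server")   # ~3.9X

# Latency side: if a native NVMe read takes ~100 usec end to end (assumed),
# the ~8 usec of NVMeoF overhead cited above is under a 10% penalty.
native_latency_us = 100
nvmeof_overhead_us = 8
print(f"overhead ≈ {nvmeof_overhead_us / native_latency_us:.0%}")  # ≈ 8%
```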

Base level NVMe SSD support and protocol stacks are starting to be available from most flash vendors and for operating systems such as Linux, FreeBSD, VMware, Windows, and Solaris. If Intel's claim of 90% common software between NVMe and NVMeoF drivers is true, then it should be a relatively easy development project to provide host NVMeoF drivers.

The need for special Ethernet hardware that supports RDMA may delay some storage vendors from implementing NVMeoF AFAs quickly. The lack of BIOS boot support may be a minor irritant in comparison.

NVMeoF forecast

AFA storage systems, as far as I can tell, are all about selling high IOPS and very-low latency IOs. It would seem that NVMeoF would offer early adopter AFA storage vendors a significant performance advantage over slower paced competition.

In previous QoM/QoW posts we have established that there are about 13 new enterprise storage systems that come out each year. Probably 80% of these will be AFA, given the current market environment.

Of the 10.4 AFA systems coming out over the next year, ~20% pride themselves on being the lowest latency solutions in the market, and thus command high margins. One would think these systems would be the first to adopt NVMeoF. But most of these systems have their own proprietary flash modules and interfaces rather than standard (NVMe) SSDs. This will delay any NVMeoF implementation for them until they can convert their flash storage to NVMe, which may take some time.

On the other hand, most (70%) of the other AFA systems, which currently use SAS/SATA SSDs, could boost their IOPS counts and drastically reduce their IO response times by implementing NVMe SSDs and NVMeoF. But converting SAS/SATA backends to NVMe will take time and effort.

But a select few (~10%) of AFA systems already use NVMe SSDs, and these few would seem to have a fast track towards implementing NVMeoF. The fact that NVMeoF is supported over all fabrics and all storage interface protocols makes it even easier.
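
Pulling the market math above together in one place (same assumptions as the text, just worked through):

```python
# Worked version of the market math above (same assumptions as the text).
new_systems_per_year = 13
afa_share = 0.80
afa_systems = new_systems_per_year * afa_share     # ~10.4 AFA systems/year

segments = {
    "proprietary flash modules (lowest latency)": 0.20,  # slower to adopt NVMeoF
    "SAS/SATA SSD backends":                      0.70,  # need backend rework first
    "already NVMe SSD based":                     0.10,  # fast track to NVMeoF
}
for name, share in segments.items():
    print(f"{name}: ~{afa_systems * share:.1f} systems")  # ~2.1, ~7.3, ~1.0
```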

Moreover, NVMeoF has been under discussion since the summer of 2015, which tells me that astute AFA vendors have already had 18+ months to develop it. And with NVMeoF host drivers & hardware available since Dec. 2015, the hardware and software exist to test and validate against.

I believe that NVMeoF will be GA’d within the next 12 months by at least one enterprise AFA system. So my QoM1610 forecast for NVMeoF is YES, with a 0.83 probability.

Comments?

Exablox, bring your own disk storage

We talked with Exablox a month or so ago at Storage Field Day 10 (SFD10) where they discussed their unique storage solution and some new software functionality. If you're not familiar with Exablox, they sell a OneBlox appliance with drive slots, but no data drives.

The OneBlox appliance provides Linux based, scale-out, distributed object storage software with a file system in front of it. They support SMB and NFS access protocols and have inline deduplication, data compression and continuous snapshot capabilities. You supply the (SATA or SAS) drives, making it a bring your own drive (BYOD) storage offering.

Their OneSystem management solution is available on a subscription basis and usually runs in the cloud as a web accessed service used to monitor and manage your Exablox cluster(s). However, for those customers that want it, OneSystem is also available as a Docker container, which you can run on any Docker compatible system.

Rubrik has a better idea for VMware backup

Rubrik has been around since January 2014 and just GA'd in April of last year. They recently presented at TechFieldDay 10 (TFD10, videos here) with Chris Wahl, Technical Evangelist, Arvin "Nitro" Nithrakashyap, Co-Founder and Bipul Sinha, Co-Founder, in attendance.

I have known Chris Wahl since November of 2013, from our time together on Storage Field Day 4 (SFD4). Howard and I (the "Greybeards") also interviewed Chris Wahl about Rubrik on a Greybeards on Storage podcast.

Primary data’s path to better data storage presented at SFD8

A couple of weeks ago we met with Primary Data, Lance Smith, CEO, David Flynn, CTO and Kaycee Lai, SVP Product & Sales, who were presenting at Storage Field Day 8 (SFD8, videos of their sessions available here). Primary Data emerged out of stealth late last year and has ~$60M in funding. They also have Steve Wozniak (of Apple fame) as Chief Scientist, but he wasn't at the SFD8 session 🙁

Primary Data seems out to change the world. At first I thought this was just another form of storage virtualization but they are laser focused on data virtualization or what they call data mobility. It differs from pure storage virtualization by being outside the data path.  (I have written about data virtualization before as well as the data hypervisor a long time ago). Nowadays they seem to be using the tag line of data in motion.

Why move data?

David has a theory about the proliferation of startup storage companies: the spectrum between capacity and performance has gotten immense over time, which has provided an opening for a number of companies to address these widening needs.

David believes that caching at the storage system or in the servers is an attempt to address this issue by "loaning" the data from the storage silo to the cache, trying to supply a lower cost $/IOPS for the data. Similar considerations are apparent at the other end, where customers use archive or backup services to take advantage of much cheaper $/GB storage.

However, given the difficulty of moving data around in present day storage environments, customer data has become essentially immobile. Primary Data is trying to bring about a data mobility revolution and allow data to move over this spectrum of performance and capacity with ease. Doing so easily will provide significant benefits, as customers can more fully take advantage of the various levels of performance and capacity in their data center storage environments.

Primary Data architecture

Primary Data provides data mobility by using their meta-data service, called the DataSphere appliance, and their client software running on host servers, called the Data Portal. Their offering can be best explained in three layers:

  • Data virtualization layer – provides continuity of identity and continuity of access across multiple physical storage systems. That is, the same data (identity continuity) can be accessed wherever it resides (access continuity) by server applications. Such access and identity must transcend access protocols and interfaces. The Data Portal client software intercepts server data activity, performs control plane activity using the DataSphere appliance, and performs IO directly against the physical storage.
  • Objective based data management – supplies a data affinity service. That is, data can have a temporary location relationship with physical storage depending on the current performance (R:W, IOPS, bandwidth, latency) and protection (durability, availability, disaster recoverability, security, copy-ability, version-ability) characteristics of the data. These data objectives are matched to the capabilities or service catalog of the storage infrastructure, and data objectives can change over time (see the matching sketch after this list).
  • Analytics in the loop – detects the performance and other characteristics of the storage and data in real time. That is, by monitoring the storage IO activity Primary Data can determine the actual performance attributes of the storage. Similarly, by monitoring application IO characteristics over time, the system can determine the performance objectives of its data. The system also takes advantage of SMI-S to define some of the other characteristics of the storage systems.
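
To make the objective-matching idea more concrete, here is a minimal sketch. The tiers, numbers, and the place() function are my own toy model of the concept, not Primary Data's actual policy engine or API.

```python
# Toy model of objective-based data management: match a data object's
# objectives to the storage tier whose capabilities satisfy them.
# Names and numbers are illustrative, not Primary Data's API.
storage_catalog = [
    {"tier": "NVMe flash", "iops": 500_000, "latency_ms": 0.2,  "durability": 0.999},
    {"tier": "hybrid",     "iops": 50_000,  "latency_ms": 2.0,  "durability": 0.9999},
    {"tier": "capacity",   "iops": 5_000,   "latency_ms": 10.0, "durability": 0.99999},
]

def place(objectives, catalog):
    """Return the first tier that meets the data's current objectives."""
    for tier in catalog:
        if (tier["iops"] >= objectives.get("min_iops", 0) and
                tier["latency_ms"] <= objectives.get("max_latency_ms", float("inf")) and
                tier["durability"] >= objectives.get("min_durability", 0)):
            return tier["tier"]
    return None

# Objectives can change over time, triggering a (transparent) move.
print(place({"min_iops": 100_000, "max_latency_ms": 1.0}, storage_catalog))  # NVMe flash
print(place({"min_durability": 0.99999}, storage_catalog))                   # capacity
```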

How does Primary Data work?

Primary Data has taken advantage of the parallel NFS (pNFS) extensions in NFSv4.1 to externalize and separate the storage control plane from the IO data plane. This works well for native Linux, where the main developer of the Linux file system stack is on their payroll.

In Windows they put a filter driver in front of SMB to split the control plane off from the data IO plane. Something similar is done for VMware ESX servers, but in this case a software defined Data Portal goes along with the DataSphere client so it can all be done within the same ESX server. Another alternative is to use the Data Portal appliance as a storage virtualization service, but then the IO data path goes through the portal.
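
The control plane/data plane split is easier to see as pseudocode. This is only a schematic of the pNFS-style flow described above; the object and method names (get_layout, return_layout, and so on) are hypothetical, not Primary Data's or the kernel's actual interfaces.

```python
# Schematic of a pNFS-style control/data plane split.
# The metadata server (DataSphere-like) hands out a layout; the client
# (Data Portal-like) then does IO directly against physical storage.
# All names here are hypothetical stand-ins.

def read_file(path, offset, length, metadata_server, storage_devices):
    # Control plane: ask the metadata service where the data lives.
    layout = metadata_server.get_layout(path, offset, length)
    # e.g. layout = {"device": "array-7", "lba": 0x1000, "blocks": 8}

    # Data plane: go straight to the device the layout points at,
    # bypassing the metadata service for the actual data movement.
    device = storage_devices[layout["device"]]
    data = device.read(layout["lba"], layout["blocks"])

    # Control plane again: hand the layout back when done.
    metadata_server.return_layout(path, layout)
    return data
```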

According to their datasheet they currently support data virtualization services for NetApp cDOT and 7-mode, EMC Isilon OneFS 7.2, and Nexenta 4.x & 5.0, but plan on more.

They are not quite GA yet, but are close.

Comments?

Object store and hybrid clouds at Cloudian

Out of Japan comes another object storage provider called Cloudian. We talked with Cloudian at Storage Field Day 7 (SFD7) last month in San Jose (see the videos of their presentations here).

Cloudian history

Cloudian has been out on the market since March of 2011 but we haven't heard much about them, probably because their focus has been East Asia. The same day that the Tōhoku earthquake and tsunami hit, the company announced Cloudian, an Amazon S3 compliant, multi-tenant cloud storage solution.

Their timing couldn’t have been better. Japanese IT organizations were beating down their door over the next two years for a useable and (earthquake and tsunami) resilient storage solution.

Cloudian spent the next 2 years hardening their object storage system, the HyperStore, and now they are ready to take on the rest of the world.

Currently Cloudian has about 20PB of storage under management and are shipping a HyperStore Appliance or a software only distribution of their solution. Cloudian’s solutions support S3 and NFS access protocols.

Their solution uses Cassandra, a highly scalable, distributed NoSQL database which came out of Facebook, for their meta-data database. This provides a scalable, shared-nothing database for object meta-data storage and lookup.

Cloudian creates virtual storage pools on backend storage which can be optimized for small objects, replication or erasure coding, and can include automatic tiering to any Amazon S3 or Glacier compatible cloud storage. I would guess this is how they qualify for hybrid cloud status.

The HyperStore appliance

Cloudian creates a HyperStore P2P ring structure. Each appliance has Cloudian management console services as well as the HyperStore engine which supports three different data stores: Cassandra, Replicas, and Erasure coding. Unlike Scality, it appears as if one HyperStore Ring must exist in a region. But it can be split across data centers. Unclear what their definition of a “region” is.

HyperStore hardware comes in entry level (HSA-2024: 24TB/1U), capacity optimized (HSA-2048: 48TB/1U), and performance optimized (HSA-2060: all flash, 60TB/2U) models.

Replication with Dynamic Consistency

The other thing that Cloudian supports is different levels of consistency for replicated data. Most object stores support eventual consistency (see Eventual Data Consistency and Cloud Storage post).  HyperStore supports 3 (well maybe 5) different levels of consistency:

  1. One – object written to one replica and committed there before responding to client
  2. Quorum – object written to N/2+1 replicas before responding to client
    1. Local Quorum – replicas are written to N/2+1 nodes in the same data center before responding to client
    2. Each Quorum – replicas are written to N/2+1 nodes in each data center before responding to client.
  3. All – all replicas must have received and committed the object write before responding to client

There are corresponding read consistency levels as well. Object writes have a "coordinator" node which handles this consistency. The implication is that consistency could be established on a per-object basis. It's unclear to me whether read and write dynamic consistency can differ.
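
To see what those write levels mean in practice, here is a small sketch computing the acknowledgements a write coordinator would wait for under each level. It is just an illustration of the N/2+1 arithmetic, not Cloudian's implementation, and the data center layout is an assumed example.

```python
# Acks a write coordinator waits for under each consistency level,
# for N total replicas spread across data centers. Illustrative only.
def required_acks(level, replicas_per_dc, local_dc="dc1"):
    n_total = sum(replicas_per_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n_total // 2 + 1
    if level == "LOCAL_QUORUM":          # quorum within the local DC only
        return replicas_per_dc[local_dc] // 2 + 1
    if level == "EACH_QUORUM":           # quorum in every DC
        return sum(n // 2 + 1 for n in replicas_per_dc.values())
    if level == "ALL":
        return n_total
    raise ValueError(level)

dcs = {"dc1": 3, "dc2": 3}               # 3 replicas in each of 2 data centers
for lvl in ("ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"):
    print(lvl, required_acks(lvl, dcs))  # 1, 4, 2, 4, 6
```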

Apparently small objects are also stored in the Cassandra datastore; that way HyperStore optimizes for object size. Also, HyperStore nodes can be added to a ring and the system will automatically rebalance data across the old and new nodes.

Cloudian also supports object versioning, ACLs, and QoS services.

~~~

I was a bit surprised by Cloudian. I thought I knew all the object storage solutions out on the market. But then again they made their major strides in Asia and as an on-premises Amazon S3 solution, rather than a generic object store.

For more on Cloudian from SFD7 see:

Cloudian – Storage Field Day 7 Preview by @VirtualizedGeek (Keith Townsend)

What’s next for Nexenta

We talked with Nexenta at Storage Field Day 6 where they discussed their current and future software defined storage solutions. I highly encourage you to see the SFD6 videos of their sessions if you want to learn more about them.

Nexenta was an early adopter of software defined storage and has recently signed with Solinea to support Nexenta under OpenStack Cinder block storage. Nexenta is based on ZFS and supports inline deduplication and advanced performance functionality.

NexentaStor™

NexentaStor™ is their base storage software and comes as a download in both an Enterprise edition and a Community edition. NexentaStor can run on most industry standard, x86 server platforms.

  • The Community edition supports up to 18TB and uses DAS and/or SAS connected storage to supply NFS and SMB file services.
  • The Enterprise edition extends capacity into the PB range and supports FC and iSCSI block storage services as well as file services. The Enterprise edition supports plugins for HA solutions and storage replication.

Nexenta mentioned that they had over 6500 customers for NexentaStor, of which 1500 are cloud service providers. But they have a whole lot more to offer than just NexentaStor, including NexentaConnect™ and, coming soon, NexentaEdge™ and NexentaFusion™.

NexentaConnect™

NexentaConnect software works with VMware or Citrix solutions to provide advanced storage services, such as file services, IO acceleration, and storage automation/analytics. There are three products in the NexentaConnect family:

  • NexentaConnect for VMware Virtual SAN – by combining NexentaConnect with VMware Virtual SAN software and DAS or SAS storage, one can offer NFS and SMB/CIFS file services. Prior to NexentaConnect, VMware Virtual SAN only provided VMware dedicated SAN storage, but now that same infrastructure can be used for any NFS or SMB/CIFS file system storage.
  • NexentaConnect for VMware Horizon – by combining NexentaConnect with VMware Horizon and DAS plus local SSD storage, one can provide accelerated virtual desktop IO with state of the art write logging, inline deduplication, and GUI based storage automation/analytics.
  • NexentaConnect for Citrix XenDesktop (in beta now) – by combining NexentaConnect with Citrix XenDesktop software and DAS plus local SSD storage, one can accelerate XenDesktop IO and ease the management of XenDesktop storage.

Nexenta has teamed up with Dell to offer Dell-Nexenta (and VMware) storage solution using NexentaConnect and VMware Virtual SAN software on Dell hardware.

NexentaEdge™

They spent a lot of time on NexentaEdge and what they plan to offer is a software defined object storage solution. Most object storage systems on the market either started as software only or currently support a software only version. But Nexenta is the first to come at it from a file services heritage that I know of.

NexentaEdge will offer iSCSI services as well as standard object storage services such as Amazon S3 and OpenStack SWIFT. Their solution splits up objects into chunks and replicates/distributes the object chunks across their software defined (object) storage cluster.

Cluster communications use UDP (not TCP) and so have less overhead; NexentaEdge uses its own Replicast protocol to send messages and data across the cluster.
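
As a rough mental model of the chunk-and-distribute step, here is an illustrative sketch. It is not Replicast, and the 4MB chunk size, node list and hash-based placement are all made-up assumptions.

```python
# Rough mental model of object chunking and placement across a cluster.
# Not Replicast; chunk size and placement policy are made-up examples.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024             # assumed 4MB chunks
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICAS = 3

def place_object(name: str, payload: bytes):
    """Split an object into chunks and pick REPLICAS target nodes per chunk."""
    placements = []
    for offset in range(0, max(len(payload), 1), CHUNK_SIZE):
        chunk = payload[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(name.encode() + str(offset).encode()).digest()
        start = digest[0] % len(NODES)    # pick a replica set from a hash
        targets = [NODES[(start + r) % len(NODES)] for r in range(REPLICAS)]
        placements.append((offset, len(chunk), targets))
    return placements

print(place_object("vm-disk-01", b"x" * (10 * 1024 * 1024)))  # 3 chunks, 3 copies each
```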

They designed NexentaEdge to support Shingled Magnetic Recording (SMR) disks, which are very dense storage but occasionally have to go "away" while they perform garbage collection/re-organization. I did two posts about SMR disks a while back (see Shingled magnetic recording disks and Sequential-only disk for more information on SMR).

I have to admit I had a BIG problem with support for iSCSI over eventually consistent storage. I don’t see how this can be used to support ACID database requests but I suppose Nexenta would argue that anyone using object storage for ACID database IO needs to have their head examined.

NexentaFusion™

Although this was not discussed as much, NexentaFusion is another future offering supplying software defined storage analytics and orchestration automation. The intent is to use NexentaFusion with NexentaStor, NexentaConnect and/or NexentaEdge. As you scale up your Nexenta storage cluster, automation/orchestration and storage analytics become a more pressing need. According to Nexenta's website, NexentaFusion 1.0 will support multi-tenant storage monitoring and real time storage analytics, while NexentaFusion 2.0 will support storage provisioning and orchestration.

~~~~

Nexenta provided Converse all-star shoes to all the participants as well as pens and notebooks. I had to admit I liked the look of the new tennis shoes but my wife and kids thought I was crazy.

Different views on Nexenta from the other SFD6 bloggers can be found below:

SFD6 – Day 2 – Nexenta from PenguinPunk (Dan Firth, @PenguinPunk)

Nexenta – Back in da house by Nigel Poulton (@NigelPoulton)

Sorry Nexenta, but I don’t get it … and questions arise by Juku (Enrico Signoretti, @ESignoretti)

Day 2 at SFD6: Nexenta by Absolutely Windows (John Obeto, @JohnObeto)

New SPECsfs2008 CIFS/SMB vs. NFS (again) – chart of the month

[Chart: SPECsfs2008 benchmark results for CIFS/SMB vs. NFS protocol performance (SCISFS140326-001, (c) 2014 Silverton Consulting, All Rights Reserved)]

The above chart represents another in a long line of charts on the relative performance of the CIFS[/SMB] versus NFS file interface protocols. The information on the chart is taken from vendor submissions that used the exact same hardware configurations for both NFS and CIFS/SMB protocol SPECsfs2008 benchmark submissions.

There are generally two charts I show in our CIFS/SMB vs. NFS analysis: the one above and another that shows an ops/sec-per-spindle analysis for all NFS and CIFS/SMB submissions. Both have historically indicated that CIFS/SMB had an advantage. The one above shows the total number of NFS or CIFS/SMB operations per second on the two separate axes and provides a linear regression across the data. It shows that, on average, the CIFS/SMB protocol provides about 40% more (~36.9%) operations per second than the NFS protocol does with the same hardware configuration.
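
For the curious, the regression itself is nothing exotic. Something like the following, with made-up placeholder numbers standing in for the 13 paired submissions, reproduces the kind of fit described above:

```python
# Sketch of the CIFS/SMB vs. NFS fit: a simple linear regression over
# paired same-hardware submissions. The arrays below are placeholders,
# not the actual SPECsfs2008 results.
import numpy as np

nfs_ops  = np.array([ 9_000, 15_000, 18_000, 22_000, 60_000, 110_000], float)
cifs_ops = np.array([12_500, 20_000, 25_000, 30_500, 82_000, 150_000], float)

slope, intercept = np.polyfit(nfs_ops, cifs_ops, 1)
pred = slope * nfs_ops + intercept
r2 = 1 - np.sum((cifs_ops - pred) ** 2) / np.sum((cifs_ops - cifs_ops.mean()) ** 2)

print(f"CIFS ≈ {slope:.2f} × NFS  (advantage ≈ {slope - 1:.0%}), R² ≈ {r2:.3f}")
```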

However, there are a few caveats about this and my other CIFS/SMB vs. NFS comparison charts:

  • The SPECsfs2008 organization has informed me (and posted on their website) that  CIFS[/SMB] and NFS are not comparable.  CIFS/SMB is a stateful protocol and NFS is stateless and the corresponding commands act accordingly. My response to them and my readers is that they both provide file access, to a comparable set of file data (we assume, see my previous post on What’s wrong with SPECsfs2008) and in many cases today, can provide access to the exact same file, using both protocols on the same storage system.
  • The SPECsfs2008 CIFS/SMB benchmark does slightly more read and slightly less write data operations than the corresponding NFS workload. Specifically, the CIFS/SMB workload issues 20.5% READ_ANDX and 8.6% WRITE_ANDX commands vs. 18% READ and 9% WRITE commands for NFS.
  • There are fewer CIFS/SMB benchmark submissions than NFS and even fewer with the same exact hardware (only 13). So the statistics comparing the two in this way must be considered preliminary, even though the above linear regression is very good (R**2 at ~0.98).
  • Many of the submissions on the above chart are for smaller systems. In fact 5 of the 13 submissions were for storage systems that delivered less than 20K NFS ops/sec which may be skewing the results and most of which can be seen above bunched up around the origin of the graph.

And all of this would be wonderfully consistent if not for a recent benchmark submission by NetApp on their FAS8020 storage subsystem. For once NetApp submitted the exact same hardware for both an NFS and a CIFS/SMB submission and, lo and behold, they performed better on NFS (110.3K NFS ops/sec) than they did on CIFS/SMB (105.1K CIFS ops/sec), or just under ~5% better on NFS.

Luckily for the chart above this was a rare event and most others that submitted both did better on CIFS/SMB. But I have been proven wrong before and will no doubt be proven wrong again. So I plan to update this chart whenever we get more submissions for both CIFS/SMB and NFS with the exact same hardware so we can see a truer picture over time.

For those with an eagle eye, you can see NetApp’s FAS8020 submission as the one below the line in the first box above the origin which indicates they did better on NFS than CIFS/SMB.

Comments?

~~~~

The complete SPECsfs2008 performance report went out in SCI's March 2014 newsletter. But a copy of the report will be posted on our dispatches page sometime next quarter (if all goes well). However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

Even more performance information on NFS and CIFS/SMB protocols, including our ChampionCharts™ for file storage, can be found in SCI's recently (March 2014) updated NAS Buying Guide, on sale from our website.

As always, we welcome any suggestions or comments on how to improve our SPECsfs2008 performance reports or any of our other storage performance analyses.

Who’s the next winner in data storage?

Strange Clouds by michaelroper (cc) (from Flickr)

“The future is already here – just not evenly distributed”, W. Gibson

It starts as it always does, outside the enterprise data center: in the lines of business, in the development teams, in the small business organizations that don't know any better but still have an unquenchable need for data storage.

It's essentially an Innovator's Dilemma situation. The upstarts are coming into the market at the lower end, lower margin side of the business that the major vendors don't seem to care about, don't service very well and are ignoring to their peril.

Yes, it doesn't offer all the data services that the big guns (EMC, Dell, HDS, IBM, and NetApp) have. It doesn't offer the data availability and reliability that enterprise data centers have come to demand from their storage. And it doesn't have the performance of major enterprise data storage systems.

But what it does offer is lower CapEx, unlimited scalability, and much easier to manage and adopt data storage, albeit using a new protocol. It does have some inherent, hard-to-get-around problems, not the least of which are speed of data ingest/egress, highly variable latency and eventual consistency. There are other problems which are more easily solvable with work, but the three listed above are intrinsic to the solution and need to be dealt with systematically.

And the winner is …

It has to be cloud storage providers, and the big elephant in the room has to be Amazon. I know there's a lot of hype surrounding AWS S3 and EC2, but you must admit that they are growing, doubling year over year. Yes, they are starting from a much lower capacity point and yes, they are essentially providing "rentable" data storage space with limited or even non-existent storage services. But they are opening up whole new ways to consume storage that never existed before. And therein lies their advantage and threat to the major storage players today, unless they act to counter this upstart.

On AWS's EC2 website there must be 4 dozen different applications that can be fired up in a matter of a click or two. When I checked out S3, you only need to sign up and identify a bucket name to start depositing data (files, objects). After that, you are charged for the storage used, data transfer out (data in is free), and the number of HTTP GETs, PUTs, and other requests that are done on a per month basis. The first 5GB is free and comes with a judicious amount of gets, puts, and outbound data transfer bandwidth.

… but how can they attack the enterprise?

Aside from the three systemic weaknesses identified above, for enterprise customers they seem to lack enterprise security, advanced data services and high availability storage. Yes, NetApp’s Amazon Direct addresses some of the issues by placing enterprise owned, secured and highly available storage to be accessed by EC2 applications. But to really take over and make a dent in enterprise storage sales, Amazon needs something with enterprise class data services, availability and security with an on premises storage gateway that uses and consumes cloud storage, i.e., a cloud storage gateway. That way they can meet or exceed enterprise latency and services requirements at something that approximates S3 storage costs.

We have talked about cloud storage gateways before but none offer this level of storage service. An enterprise class S3 gateway would need to support all storage protocols, especially block (FC, FCoE, & iSCSI) and file (NFS & CIFS/SMB). It would need enterprise data services, such as read-writeable snapshots, thin provisioning, data deduplication/compression, and data mirroring/replication (synch and asynch). It would need to support standard management configuration capabilities, like VMware vCenter, Microsoft System Center, and SMI-S. It would need to mask the inherent variable latency of cloud storage through memory, SSD and hard disk data caching/tiering. It would need to conceal the eventual consistency nature of cloud storage (see link above). And it would need to provide iron-clad data security for cloud storage.

It would also need to be enterprise hardened, highly available and highly reliable. That means dually redundant, highly serviceable hardware FRUs, concurrent code load, and multiple controllers with multiple, independent, high speed links to the internet. Today's highly-available data storage requires multi-path storage networks, multiple-independent power sources and resilient cooling, so adding multiple-independent, high-speed internet links to use Amazon S3 in the enterprise is not out of the question. In addition to the highly available and serviceable storage gateway capabilities described above, it would need to supply high data integrity and reliability.
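
To illustrate the latency-masking idea, here is a bare-bones write-back cache sketch in front of S3, using boto3 for the cloud side. The bucket name, the in-memory cache standing in for SSD/disk tiers, and the single destage worker are all simplifying assumptions, not a product design.

```python
# Minimal write-back cache sketch for a cloud storage gateway:
# acknowledge writes from a local cache, then destage to S3
# asynchronously. Error handling, tiering, eviction and the
# consistency masking discussed above are all omitted.
import queue
import threading
import boto3

s3 = boto3.client("s3")
BUCKET = "example-gateway-bucket"        # hypothetical bucket name
destage_q: "queue.Queue[tuple[str, bytes]]" = queue.Queue()
local_cache: dict = {}                   # stands in for SSD/disk cache tiers

def write(key: str, data: bytes) -> None:
    local_cache[key] = data              # low-latency local commit
    destage_q.put((key, data))           # cloud upload happens in the background

def read(key: str) -> bytes:
    if key in local_cache:               # cache hit masks cloud latency
        return local_cache[key]
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

def destage_worker() -> None:
    while True:
        key, data = destage_q.get()
        s3.put_object(Bucket=BUCKET, Key=key, Body=data)
        destage_q.task_done()

threading.Thread(target=destage_worker, daemon=True).start()
```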

Who could build such a gateway?

I would say any of the major and some of the minor data storage players could easily do an S3 gateway if they desired. There are a couple of gateway startups (see link above) that have made a stab at it but none have it quite down pat or to the extent needed by the enterprise.

However, the problem with standalone gateways from other, non-Amazon vendors is that they could easily support other cloud storage platforms and most do. This is great for gateway suppliers but bad for Amazon’s market share.

So, I believe Amazon has to invest in its own storage gateway if they want to go after the enterprise. Of course, when they create an enterprise cloud storage gateway they will piss off all the other gateway providers and will signal their intention to target the enterprise storage market.

So who is the next winner in data storage? I have to believe it's going to be, and already is, Amazon. Even if they don't go after the enterprise, which I feel is the major prize, they have already carved out an unbreachable market share in a new way to implement and use storage. But when (not if) they go after the enterprise, they will threaten every major storage player.

Yes but what about others?

Arguably, Microsoft Azure is in a better position than Amazon to go after the enterprise. Since their acquisition of StorSimple last year, they already have a gateway that, with help, could be just what they need to provide enterprise class storage services using Azure. And they already have access to the enterprise, along with the services, distribution and go-to-market capabilities that address enterprise needs and requirements. Maybe they have it all, but they are not yet at the scale of Amazon. Could they go after this? Certainly, but will they?

Google is the other major unknown. They certainly have the capability to go after enterprise cloud storage if they want. They already have Google Cloud Storage, which is priced under Amazon’s S3 and provides similar services as far as I can tell. But they have even farther to go to get to the scale of Amazon. And they have less of the marketing, selling and service capabilities that are required to be an enterprise player. So I think they are the least likely of the big three cloud providers to be successful here.

There are many other players in cloud services that could make a play for enterprise cloud storage and emerge out of the pack, namely Rackspace, Savvis, Terremark and others. I suppose DropBox, Box and the other file sharing/collaboration providers might also be able to take a shot at it, if they wanted. But I am not sure any of them have enterprise storage on their radar just yet.

And I wouldn’t leave out the current major storage, networking and server players as they all could potentially go after enterprise cloud storage if they wanted to. And some are partly there already.

Comments?