Cloudian has been on the market since March of 2011, but we haven’t heard much about them, probably because their focus has been East Asia. On the same day that the Tōhoku earthquake and tsunami hit, the company announced Cloudian, an Amazon S3-compliant, multi-tenant cloud storage solution.
Their timing couldn’t have been better. Japanese IT organizations were beating down their door over the next two years for a useable and (earthquake and tsunami) resilient storage solution.
Cloudian spent the next two years hardening their object storage system, HyperStore, and now they are ready to take on the rest of the world.
Currently Cloudian has about 20PB of storage under management and is shipping HyperStore as either an appliance or a software-only distribution. Cloudian’s solutions support S3 and NFS access protocols.
Their solution uses Cassandra, a highly scalable, distributed NoSQL database that came out of Facebook, as its metadata database. This provides a scalable, non-sharable metadata database for object metadata repository and lookup.
Cloudian creates virtual storage pools on backend storage which can be optimized for small objects, replication or erasure coding and can include automatic tiering to any Amazon S3 and Glacier compatible cloud storage. I would guess this is how they qualify for Hybrid Cloud status.
The HyperStore appliance
Cloudian creates a HyperStore P2P ring structure. Each appliance has Cloudian management console services as well as the HyperStore engine which supports three different data stores: Cassandra, Replicas, and Erasure coding. Unlike Scality, it appears as if one HyperStore Ring must exist in a region. But it can be split across data centers. Unclear what their definition of a “region” is.
HyperStore hardware comes in entry level (HSA-2024: 24TB/1U), capacity optimized (HSA-2048: 48TB/1U), and performance optimized (HSA-2060: all flash, 60TB/2U) models.
Replication with Dynamic Consistency
The other thing that Cloudian supports is different levels of consistency for replicated data. Most object stores support eventual consistency (see Eventual Data Consistency and Cloud Storage post). HyperStore supports 3 (well maybe 5) different levels of consistency:
One – object written to one replica and committed there before responding to client
Quorum – object written to N/2+1 replicas before responding to client
Local Quorum – replicas are written to N/2+1 nodes in same data center before responding to client
Each Quorum – replicas are written to N/2+1 nodes in each data center before responding to client.
All – all replicas must have received and committed the object write before responding to client
There are corresponding read consistency levels as well. Each object write has a “coordinator” node which handles this consistency. The implication is that consistency could be established on a per-object basis. It’s unclear to me whether the read and write consistency levels can differ.
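To make the arithmetic behind these levels concrete, here is a minimal sketch of how many replica acknowledgements a coordinator would have to collect before responding to the client. This is not Cloudian's code. The function names, the data center names and my reading of N (total replicas for Quorum, replicas per data center for Local Quorum and Each Quorum) are all assumptions.

```python
from enum import Enum


class Consistency(Enum):
    ONE = "one"
    QUORUM = "quorum"
    LOCAL_QUORUM = "local quorum"
    EACH_QUORUM = "each quorum"
    ALL = "all"


def quorum(n: int) -> int:
    """Majority of n replicas: N/2 + 1, using integer division."""
    return n // 2 + 1


def min_acks(level: Consistency, replicas_per_dc: dict, local_dc: str) -> int:
    """Minimum replica acknowledgements needed before answering the client.

    replicas_per_dc maps a (hypothetical) data center name to the number
    of replicas kept there; local_dc is the coordinator's data center.
    """
    total = sum(replicas_per_dc.values())
    if level is Consistency.ONE:
        return 1
    if level is Consistency.QUORUM:
        # Assumption: N is the total replica count across all data centers.
        return quorum(total)
    if level is Consistency.LOCAL_QUORUM:
        # Only replicas in the coordinator's own data center count.
        return quorum(replicas_per_dc[local_dc])
    if level is Consistency.EACH_QUORUM:
        # Every data center must independently reach its own quorum.
        return sum(quorum(n) for n in replicas_per_dc.values())
    if level is Consistency.ALL:
        return total
    raise ValueError(f"unknown consistency level: {level}")


# Hypothetical layout: three replicas in each of two data centers.
layout = {"dc1": 3, "dc2": 3}
```

With that layout, One needs 1 acknowledgement, Quorum 4, Local Quorum 2, Each Quorum 4 (2 in each data center) and All 6. The fewer acknowledgements a level waits for, the faster the write returns and the more likely a later read hits a replica that hasn't caught up.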
Apparently small objects are also stored in the Cassandra datastore. That way HyperStore optimizes for object size. Also, HyperStore nodes can be added to a ring and the system will automatically rebalance the data across the old and new nodes.
Cloudian also supports object versioning, ACL, and QoS services.
I was a bit surprised by Cloudian. I thought I knew all the object storage solutions out on the market. But then again they made their major strides in Asia and as an on-premises Amazon S3 solution, rather than a generic object store.
We talked with Nexenta at Storage Field Day 6 where they discussed their current and future software defined storage solutions. I highly encourage you to see the SFD6 videos of their sessions if you want to learn more about them.
NexentaStor™ is their base storage software and comes as a download in both an Enterprise edition and a Community edition. NexentaStor can run on most industry standard x86 server platforms.
The Community edition supports up to 18TB and uses DAS and/or SAS connected storage to supply NFS and SMB file services.
The Enterprise edition extends capacity into the PB and supports FC and iSCSI block storage services as well as file services. The Enterprise edition supports plugins for HA solutions and storage replication.
Nexenta mentioned that they had over 6500 customers for NexentaStor of which 1500 are cloud service providers. But they have a whole lot more to offer than just NexentaStor including NexentaConnect™ and coming soon, NexentaEdge™ and NexentaFusion™.
NexentaConnect software works with VMware or Citrix solutions to provide advanced storage services, such as file services, IO acceleration, and storage automation/analytics. There are three products in the NexentaConnect family:
NexentaConnect for VMware Virtual SAN – by combining NexentaConnect together with VMware Virtual SAN software and DAS or SAS storage one can offer NFS and SMB/CIFS file services. Prior to NexentaConnect, VMware Virtual SAN storage only provided VMware dedicated SAN storage, but now that same infrastructure can be used for any NFS or SMB/CIFS file system storage.
NexentaConnect for VMware Horizon – by combining NexentaConnect with VMware Horizon and DAS plus local SSD storage, one can provide accelerated virtual desktop IO with state of the art write logging, inline deduplication, and GUI based storage automation/analytics.
NexentaConnect for Citrix XenDesktop (in Beta now) – by combining NexentaConnect with Citrix XenDesktop software and DAS plus local SSD storage, one can accelerate XenDesktop IO and ease the management of XenDesktop storage.
They spent a lot of time on NexentaEdge and what they plan to offer is a software defined object storage solution. Most object storage systems on the market either started as software only or currently support a software only version. But Nexenta is the first to come at it from a file services heritage that I know of.
NexentaEdge will offer iSCSI services as well as standard object storage services such as Amazon S3 and OpenStack SWIFT. Their solution splits up objects into chunks and replicates/distributes the object chunks across their software defined (object) storage cluster.
Cluster communications use UDP (not TCP) and so have less overhead. NexentaEdge cluster communications use their own Replicast protocol to send messages and data out across the cluster.
They designed NexentaEdge to be able to support Shingled Magnetic Recording (SMR) disks, which offer very dense storage but occasionally have to go “away” while they perform garbage collection/re-organization. I did two posts about SMR disks a while back (see Shingled magnetic recording disks and Sequential-only disk for more information on SMR).
I have to admit I had a BIG problem with support for iSCSI over eventually consistent storage. I don’t see how this can be used to support ACID database requests but I suppose Nexenta would argue that anyone using object storage for ACID database IO needs to have their head examined.
Although this was not discussed as much, NexentaFusion is another future offering supplying software defined storage analytics and orchestration automation. Their intent is to use NexentaFusion with NexentaStor, NexentaConnect and/or NexentaEdge. As you scale up your Nexenta storage cluster, automation/orchestration and storage analytics start to become a more pressing need. According to Nexenta’s website, NexentaFusion 1.0 will support multi-tenant storage monitoring and real time storage analytics while NexentaFusion 2.0 will support storage provisioning and orchestration.
Nexenta provided Converse all-star shoes to all the participants as well as pens and notebooks. I had to admit I liked the look of the new tennis shoes but my wife and kids thought I was crazy.
“The future is already here – just not evenly distributed”, W. Gibson
It starts, as it always does, outside the enterprise data center: in the lines of business, in the development teams, in the small business organizations that don’t know any better but still have an unquenchable need for data storage.
It’s essentially an Innovator’s Dilemma situation. The upstarts are coming into the market at the lower end, lower margin side of the business that the major vendors don’t seem to care about, don’t service very well and are ignoring to their peril.
Yes, it doesn’t offer all the data services that the big guns (EMC, Dell, HDS, IBM, and NetApp) have. It doesn’t offer the data availability and reliability that enterprise data centers have come to demand from their storage. And it doesn’t have the performance of major enterprise data storage systems.
But what it does offer is lower CapEx, unlimited scalability, and data storage that is much easier to manage and adopt, albeit using a new protocol. It does have some inherent, hard to get around problems, not the least of which are speed of data ingest/egress, highly variable latency and eventual consistency. There are other problems which are more easily solvable, with work, but the three listed above are intrinsic to the solution and need to be dealt with systematically.
And the winner is …
It has to be cloud storage providers and the big elephant in the room has to be Amazon. I know there’s a lot of hype surrounding AWS S3 and EC2 but you must admit that they are growing, doubling year over year. Yes, it is starting from a much lower capacity point and yes, they are essentially providing “rentable” data storage space with limited or even non-existent storage services. But they are opening up whole new ways to consume storage that never existed before. And therein lies their advantage and threat to the major storage players today, unless they act to counter this upstart.
On AWS’s EC2 website there must be 4 dozen different applications that can be fired up in a matter of a click or two. When I checked out S3, you only need to sign up and identify a bucket name to start depositing data (files, objects). After that, you are charged on a per month basis for the storage used, data transfer out (data in is free), and the number of HTTP GETs, PUTs, and other requests. The first 5GB is free and comes with a judicious amount of gets, puts, and out data transfer bandwidth.
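As a rough illustration of how those three charges add up, here is a minimal sketch of a monthly bill calculation. The function name and every rate passed to it are placeholders I made up, not Amazon's prices. It also simplifies by using a single request rate (GETs and PUTs may well be priced differently) and by ignoring the free request and transfer allowances, since I don't have their exact amounts.

```python
def monthly_charge(stored_gb: float,
                   transfer_out_gb: float,
                   requests: int,
                   *,
                   price_per_gb: float,
                   price_per_gb_out: float,
                   price_per_1000_requests: float,
                   free_gb: float = 5.0) -> float:
    """Estimate one month's charge from storage, outbound transfer and requests.

    All prices are caller-supplied placeholders. Inbound transfer is free,
    so it does not appear here at all.
    """
    # Only storage beyond the free allowance is billable.
    billable_gb = max(0.0, stored_gb - free_gb)
    storage_cost = billable_gb * price_per_gb
    transfer_cost = transfer_out_gb * price_per_gb_out
    request_cost = (requests / 1000) * price_per_1000_requests
    return storage_cost + transfer_cost + request_cost


# Example with made-up rates: 105GB stored, 10GB read back out, 20,000 requests.
example = monthly_charge(105, 10, 20_000,
                         price_per_gb=0.10,
                         price_per_gb_out=0.20,
                         price_per_1000_requests=0.01)
```

The point is less the total than the shape of the bill: storage is only one term, and a workload that reads its data back often pays for that in the transfer and request terms.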
… but how can they attack the enterprise?
Aside from the three systemic weaknesses identified above, for enterprise customers they seem to lack enterprise security, advanced data services and high availability storage. Yes, NetApp’s Amazon Direct addresses some of the issues by placing enterprise owned, secured and highly available storage to be accessed by EC2 applications. But to really take over and make a dent in enterprise storage sales, Amazon needs something with enterprise class data services, availability and security with an on premises storage gateway that uses and consumes cloud storage, i.e., a cloud storage gateway. That way they can meet or exceed enterprise latency and services requirements at something that approximates S3 storage costs.
We have talked about cloud storage gateways before but none offer this level of storage service. An enterprise class S3 gateway would need to support all storage protocols, especially block (FC, FCoE, & iSCSI) and file (NFS & CIFS/SMB). It would need enterprise data services, such as read-writeable snapshots, thin provisioning, data deduplication/compression, and data mirroring/replication (synch and asynch). It would need to support standard management configuration capabilities, like VMware vCenter, Microsoft System Center, and SMI-S. It would need to mask the inherent variable latency of cloud storage through memory, SSD and hard disk data caching/tiering. It would need to conceal the eventual consistency nature of cloud storage (see link above). And it would need to provide iron-clad data security for cloud storage.
It would also need to be enterprise hardened, highly available and highly reliable. That means dually redundant, highly serviceable hardware FRUs, concurrent code load, and multiple controllers with multiple, independent, high speed links to the internet. Today’s highly available data storage requires multi-path storage networks, multiple independent power sources and resilient cooling, so adding multiple independent, high-speed internet links to use Amazon S3 in the enterprise is not out of the question. In addition to the highly available and serviceable storage gateway capabilities described above, it would need to supply high data integrity and reliability.
Who could build such a gateway?
I would say any of the major and some of the minor data storage players could easily do an S3 gateway if they desired. There are a couple of gateway startups (see link above) that have made a stab at it but none have it quite down pat or to the extent needed by the enterprise.
However, the problem with standalone gateways from other, non-Amazon vendors is that they could easily support other cloud storage platforms and most do. This is great for gateway suppliers but bad for Amazon’s market share.
So, I believe Amazon has to invest in its own storage gateway if they want to go after the enterprise. Of course, when they create an enterprise cloud storage gateway they will piss off all the other gateway providers and will signal their intention to target the enterprise storage market.
So who is the next winner in data storage – I have to believe it’s going to be and already is Amazon. Even if they don’t go after the enterprise, which I feel is the major prize, they have already carved out an unbreachable market share in a new way to implement and use storage. But when (not if) they go after the enterprise, they will threaten every major storage player.
Yes but what about others?
Arguably, Microsoft Azure is in a better position than Amazon to go after the enterprise. Since their acquisition of StorSimple last year, they already have a gateway that, with help, could be just what they need to provide enterprise class storage services using Azure. And they already have access to the enterprise, and already have the services, distribution and go-to-market capabilities that address enterprise needs and requirements. Maybe they have it all, but they are not yet at the scale of Amazon. Could they go after this – certainly, but will they?
Google is the other major unknown. They certainly have the capability to go after enterprise cloud storage if they want. They already have Google Cloud Storage, which is priced under Amazon’s S3 and provides similar services as far as I can tell. But they have even farther to go to get to the scale of Amazon. And they have less of the marketing, selling and service capabilities that are required to be an enterprise player. So I think they are the least likely of the big three cloud providers to be successful here.
There are many other players in cloud services that could make a play for enterprise cloud storage and emerge out of the pack, namely Rackspace, Savvis, Terremark and others. I suppose DropBox, Box and the other file sharing/collaboration providers might also be able to take a shot at it, if they wanted. But I am not sure any of them have enterprise storage on their radar just yet.
And I wouldn’t leave out the current major storage, networking and server players as they all could potentially go after enterprise cloud storage if they wanted to. And some are partly there already.
We were talking with Ursheet Parikh at StorSimple today about their new cloud gateway product (to be covered in a future post) when at the end of the talk he described some IP they have to handle cloud storage’s “eventual consistency”. Dumbfounded, I asked him to clarify, having never heard this term before.
Apparently, eventual data consistency is what you get when you use most cloud storage providers. With eventual consistency they will not guarantee that when you read back an object that has been recently updated that you will get the latest copy.
In contrast, “immediate consistency” means that if you update an object, the cloud storage provider guarantees the latest version will be supplied for any and all subsequent read backs. To me, all storage up until cloud storage guaranteed immediate consistency; otherwise it was considered a data integrity failure.
To explain, cloud storage providers keep multiple copies of any object replicated throughout their environment, and every copy must be updated. As such, they cannot guarantee that you will read back the updated version rather than one of the downlevel ones. Yikes!
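A toy model makes the difference easier to see. The sketch below is not how any provider actually implements replication. The class and method names are made up, and propagation is triggered by hand so the stale read is repeatable.

```python
class ReplicatedStore:
    """Toy object store that keeps one copy of every object per replica."""

    def __init__(self, n_replicas: int = 3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # (replica index, key, value) updates not yet applied

    def put(self, key, value, immediate: bool = False):
        # The write always lands on replica 0 first.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            if immediate:
                # Immediate consistency: update every copy before returning.
                self.replicas[i][key] = value
            else:
                # Eventual consistency: return now, update the other copies later.
                self.pending.append((i, key, value))

    def get(self, key, replica: int):
        # A read is served by whichever replica the request happens to reach.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Background replication finally catches up.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = ReplicatedStore()
store.put("report", "v1", immediate=True)  # all copies hold v1
store.put("report", "v2")                  # only replica 0 holds v2 so far
fresh = store.get("report", replica=0)     # the updated copy
stale = store.get("report", replica=2)     # a downlevel copy
store.propagate()
caught_up = store.get("report", replica=2)
```

Here `fresh` is "v2" while `stale` is still "v1". Only after `propagate()` runs does replica 2 return "v2". Between the write and the propagation, which version you get depends entirely on which copy answers your read.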
What does this mean for your cloud storage?
First, Microsoft’s Azure cloud storage is the only provider that guarantees immediate consistency but in order to do so has made some restrictions on object size. But this means all the other cloud storage providers only guarantee eventual consistency.
Second, cloud storage with eventual consistency guarantee should not be used for data that’s updated frequently and then read back. It’s probably ok for archive or backup storage (that’s not restored for awhile) BUT it’s not ok for “normal” file or block data which is updated frequently and then read back expecting to see the updates.
According to Ursheet, the cloud storage providers have been completely up-front about their consistency level and as such his product, StorSimple, has been specifically designed to accommodate variable levels of consistency. We would need to ask the other gateway providers how they handle cloud storage consistency to understand whether they have tried to deal with this as well.
However, from my perspective eventual consistency is scary. It appears that cloud storage has redefined what we mean by storage, or at the very least eliminated data integrity as we knew it. Moreover, this seriously limits the usability of raw cloud storage to archive-like, infrequently updated data storage.
And I thought cloud storage was going to take over the data center – not like this…