At the USENIX ATC conference a couple of weeks ago, a group of researchers presented their Blockstack global name space and storage system, which is built on the Bitcoin blockchain network. Their paper was titled “Blockstack: A global naming and storage system secured by blockchain” (see pg. 181-194 in the USENIX ATC’16 proceedings).
Bitcoin blockchain simplified
Blockchains like Bitcoin’s have a number of interesting properties, including a completely distributed understanding of current state, based on hashing and an append-only log of transactions.
Blockchain nodes all participate in validating the current block of transactions and some nodes (deemed “miners” in Bitcoin) supply new blocks of transactions for validation.
All blockchain transactions are sent to every node; the blockchain software in each node timestamps the transactions and accumulates them in an ordered, append-only log (the “block”), which is then hashed. Each new block contains the hash of the previous block (the “chain” in blockchain).
The miner’s block is then compared against the non-miner nodes’ blocks (hashes are compared) and, if they are equal, everyone reaches consensus (agrees) that the transaction block is valid. Then the next miner supplies a new block of transactions, and the process repeats. (See Wikipedia’s article for more info.)
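To make the chaining concrete, here's a minimal sketch in Python (a toy illustration, not Bitcoin's actual data structures) showing how each block's hash covers both its transactions and the previous block's hash, so that tampering with any earlier block breaks every block that follows:

```python
import hashlib
import json
import time

def make_block(transactions, prev_hash):
    # A block is the ordered, append-only log of transactions for this round,
    # plus the hash of the previous block (the "chain" in blockchain).
    block = {
        "timestamp": time.time(),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    # Hash the whole block; any change to a transaction or to prev_hash
    # produces a completely different block hash.
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

genesis = make_block(["coinbase tx"], prev_hash="0" * 64)
block1 = make_block(["alice pays bob 1 BTC"], prev_hash=genesis["hash"])
block2 = make_block(["bob pays carol 0.5 BTC"], prev_hash=block1["hash"])

# Altering an earlier transaction changes that block's hash, which then no
# longer matches the prev_hash recorded in every later block.
assert block2["prev_hash"] == block1["hash"]
```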
Rubrik has been around since January 2014 and just GA’d in April of last year. They recently presented at TechFieldDay 10 (TFD10, videos here) with Chris Wahl, Technical Evangelist, Arvin “Nitro” Nithrakashyap, Co-Founder and Bipul Sinha, Co-Founder, in attendance.
Cloudian has been out on the market since March of 2011, but we haven’t heard much about them, probably because their focus has been on East Asia. The same day that the Tōhoku earthquake and tsunami hit, the company announced Cloudian, an Amazon S3-compliant, multi-tenant cloud storage solution.
Their timing couldn’t have been better. Japanese IT organizations were beating down their door over the next two years for a usable and (earthquake and tsunami) resilient storage solution.
Cloudian spent the next two years hardening their object storage system, HyperStore, and now they are ready to take on the rest of the world.
Currently Cloudian has about 20PB of storage under management and is shipping both a HyperStore appliance and a software-only distribution of their solution. Cloudian’s solutions support S3 and NFS access protocols.
Their solution uses Cassandra, the highly scalable, distributed NoSQL database that came out of Facebook, for its metadata. This provides a scalable, distributed metadata database for object metadata storage and lookup.
Cloudian creates virtual storage pools on backend storage, which can be optimized for small objects, replication or erasure coding, and can include automatic tiering to any Amazon S3- or Glacier-compatible cloud storage. I would guess this is how they qualify for hybrid cloud status.
The HyperStore appliance
Cloudian creates a HyperStore P2P ring structure. Each appliance runs the Cloudian management console services as well as the HyperStore engine, which supports three different data stores: Cassandra, replicas, and erasure coding. Unlike Scality, it appears as if one HyperStore ring must exist within a region, although it can be split across data centers. Unclear what their definition of a “region” is.
HyperStore hardware comes in entry-level (HSA-2024: 24TB/1U), capacity-optimized (HSA-2048: 48TB/1U) and performance-optimized (HSA-2060: all flash, 60TB/2U) configurations.
Replication with Dynamic Consistency
The other thing that Cloudian supports is different levels of consistency for replicated data. Most object stores support eventual consistency (see Eventual Data Consistency and Cloud Storage post). HyperStore supports 3 (well maybe 5) different levels of consistency:
One – object written to one replica and committed there before responding to client
Quorum – object written to N/2+1 replicas before responding to client
Local Quorum – replicas are written to N/2+1 nodes in the same data center before responding to client
Each Quorum – replicas are written to N/2+1 nodes in each data center before responding to client.
All – all replicas must have received and committed the object write before responding to client
There are corresponding read consistency levels as well. Object writes have a “coordinator” node which handles this consistency. The implication is that consistency could be established on a per-object basis. It’s unclear to me whether read and write dynamic consistency can be set differently.
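This is not Cloudian's code or API, but a rough Python sketch of what a coordinator enforcing these write consistency levels might look like, assuming the usual Cassandra-style thresholds (the function and object names here are made up for illustration):

```python
def required_acks(level, total_replicas, replicas_in_local_dc=None):
    # How many replica commits the coordinator must see before answering the
    # client, per consistency level (Cassandra-style thresholds). EACH_QUORUM
    # would apply the quorum rule separately in every data center.
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return total_replicas // 2 + 1
    if level == "LOCAL_QUORUM":
        return replicas_in_local_dc // 2 + 1
    if level == "ALL":
        return total_replicas
    raise ValueError(f"unknown consistency level: {level}")

def coordinate_write(obj, replicas, level):
    # Send the object to every replica; acknowledge the client as soon as
    # enough replicas report that they have committed it.
    # (Single data center assumed here for simplicity.)
    needed = required_acks(level, len(replicas), replicas_in_local_dc=len(replicas))
    acks = 0
    for replica in replicas:
        if replica.store(obj):      # True once this replica has committed
            acks += 1
        if acks >= needed:
            return True             # consistency level satisfied, respond to client
    return False                    # not enough replicas committed
```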
Apparently small objects are also stored in the Cassandra datastore; that way HyperStore optimizes for object size. Also, HyperStore nodes can be added to a ring and the system will automatically rebalance data across the old and new nodes.
Cloudian also supports object versioning, ACLs, and QoS services.
~~~
I was a bit surprised by Cloudian. I thought I knew all the object storage solutions out on the market. But then again they made their major strides in Asia and as an on-premises Amazon S3 solution, rather than a generic object store.
My friend Alex Teu (@alexteu) from Oxygen Cloud wrote a post today about how Cloud Storage is Eating the World Alive. Alex reports that all major NAS and SAN storage vendors lost revenue this year over the previous year, ranging from a ~3% loss to over a 20% loss (Q1-2014 compared to Q1-2013, from IDC).
Although an interesting development, it’s hard to say that this is the end of enterprise storage as we know it. I believe there are a number of factors that are impacting enterprise storage revenues and Cloud storage adoption may be only one of them.
Other trends impacting NAS & SAN storage adoption
One thing that has emerged over the last decade or so is the advance of flash storage. Some of it is used in storage controllers and some in servers, in both cases to speed up IO access. But any speedup of IO could potentially reduce the need for high-performing disk drives and could allow customers to use higher-capacity/slower disk drives instead, which could definitely reduce the cost of storage systems. A little bit of flash goes a long way to speed up IO access.
The other thing is that disk capacity is trending upward at exponential rates. Yesterday’s 2TB disk drive is today’s 4TB disk drive, and we are already seeing 6TB drives from Seagate, HGST and others. This too is driving down the cost of NAS and SAN storage.
Nowadays you can configure 1PB of storage with just over 170 drives. Somewhere in there you might want a couple hundred TB of flash to speed up IO access to these slow disks, but flash is also coming down in price ($/GB) (see SanDisk’s recent consumer-grade TLC drive at $0.44/GB). Also, the move to MLC flash has increased the capacity of flash devices, leading to fewer SSDs/flash cache cards needed to store/speed up more data.
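As a quick back-of-the-envelope check on that drive count (my own arithmetic, not a vendor configuration):

```python
petabyte_in_tb = 1000      # 1PB, decimal
drive_capacity_tb = 6      # today's 6TB drives from Seagate, HGST, et al.

raw_drives = petabyte_in_tb / drive_capacity_tb
print(round(raw_drives))   # ~167 drives of raw capacity; add a handful of
                           # spares/parity drives and you're just over 170
```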
Finally, the other trend which seems to have emerged recently is the movement away from enterprise-class storage to server storage. One can see this in VMware’s VSAN, in hyperconverged systems such as Nutanix and Scale Computing, and in a general trend among Windows Server applications (SQL Server, Exchange Server, etc.) to make better use of DAS storage. So some customers are moving their data to shared DAS storage today, whereas before this was difficult to accomplish effectively, which is why they previously purchased networked storage.
What about cloud storage?
Yes, as Alex has noted, the price of cloud storage has declined precipitously over the last year or so. Alex’s cloud storage pricing graph shows how the entry of Microsoft and Google has seemingly forced Amazon to match their price reductions. But the other thing of note is that they have all come down to about the same basic price of $0.024/GB/month.
It’s interesting that Amazon delayed their first serious S3 price reductions by about four months after Azure and Google Cloud Storage dropped theirs, and then within another month after that, they all were at price parity.
I could find no update to Amazon S3 numbers from last year, but the ~2.5X growth in Azure’s object count in ~8 months and the roughly doubled requests/second rate (in my post last year I didn’t mention that they were processing 900K requests/second) say something interesting is going on in cloud storage.
I suppose Google’s cloud storage service is too new to report serious results and maybe Amazon wants to keep their growth a secret. But considering Amazon’s recent matching of Azure’s and Google’s pricing, it probably means that their growth wasn’t what they expected.
The other interesting item from the Microsoft discussions on Azure was that they were already hosting 1M SQL databases in Azure and that 57% of Fortune 500 companies are currently using Azure.
In the “olden days”, before cloud storage, all these SQL databases and Fortune 500 data sets would more than likely have resided on NAS or SAN storage of some kind. And given traditional storage’s higher cost and greater complexity, some of this data might never have been spun up in the first place; but with cloud storage so cheap, rapidly configurable and easy to use, all this new data was placed in the cloud.
So I must conclude from Microsoft’s growth numbers, and their implication for the rest of the cloud storage industry, that maybe Alex was right: more data is moving to the cloud, and this is impacting traditional storage revenues. With IDC’s (2013) data growth at ~43% per year, it would seem that Microsoft’s cloud storage is growing more rapidly than worldwide data growth, ~14X faster!
On the other hand, if cloud storage were consuming most of the world’s data growth, one would expect a collapse of traditional storage revenues, not just a ~3-20% decline. So maybe most new cloud storage applications would never have been implemented if they had to use traditional storage, which means that only some of this new data would ever have been stored on traditional storage in the first place, leading to a relatively smaller decline in revenue.
One question remains: is this a short-term impact or more of a long-running trend that will play out over the next decade or so? From my perspective, new applications spinning up on non-traditional storage are a long-running threat to traditional NAS and SAN storage, which will ultimately see traditional storage relegated to a niche. How big this niche will ultimately be, and how well it can be defended, will need to be the subject of another post.
Just got back from an analyst summit with Spectra Logic. They announced a new interface to tape called Deep Simple Storage Service (DS3) and an appliance that implements this interface named the BlackPearl. The intent is to broaden the use of tape to include today’s web services application environments.
The main problem addressed by the new interface is how to map an essentially sequential, high-throughput but long-latency-to-first-byte, removable-media device onto an essentially small-file, GET-and-PUT environment. The other question is whether there is a market for such a service. I think Spectra Logic has answered the first question and is about to embark on a journey to answer the second.
The new interface – it’s all about simplifying tape
The DS3 interface answers the first question. With DS3, Spectra Logic has extended Amazon’s S3 interface to expose some of the sequentiality and removability of tape to the object storage world.
As you should recall, Amazon S3 is a RESTful web interface that uses HTTP-style GET and PUT commands to move data to and from the S3 storage service. The data you are moving is considered an object, and the object name or identifier is unique across the storage service. When you PUT an object you can attach key-value pairs of information, called metadata, to the object. When you GET an object you retrieve the data from the storage service. The other thing to be aware of is that objects are PUT into and GET from “buckets”.
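For readers who haven't used S3 directly, here's roughly what a PUT with metadata and the corresponding GET look like using the AWS SDK for Python (boto3); the bucket and object names below are made up for illustration:

```python
import boto3

s3 = boto3.client("s3")

# PUT an object into a bucket, attaching key-value metadata to it
with open("film-scan-0001.dpx", "rb") as f:
    s3.put_object(
        Bucket="my-archive-bucket",            # hypothetical bucket name
        Key="projects/film-scan-0001.dpx",     # object name, unique within the bucket
        Body=f,
        Metadata={"project": "restoration", "reel": "1"},
    )

# GET the object (and its metadata) back from the storage service
resp = s3.get_object(Bucket="my-archive-bucket", Key="projects/film-scan-0001.dpx")
data = resp["Body"].read()
meta = resp["Metadata"]
```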
With DS3, Spectra Logic has added essentially four new commands to the S3 protocol, which are:
Bulk Put – this provides a list of objects that one wants to PUT into a DS3 storage service, and the response from the DS3 storage service is an ordered list of which objects to PUT in which sequence and which DS3 storage server node (essentially an IP address) to send the data to.
Bulk Get – this supplies a list of objects that one wants to GET from a DS3 storage service, and the response is an ordered list of the sequence in which to get those objects and the node address to use for those object GETs.
Export Bucket – this identifies a BUCKET that you wish to remove from a DS3 storage service. Presumably the response would be where the bucket can be found, the number of pieces of media to expect, and some identification of the media serial numbers that constitute a bucket on the DS3 storage service.
Import Bucket – this identifies a new bucket which will be imported into a DS3 storage service and will supply some necessary information such as how many pieces of media to expect and the serial numbers of the media. Presumably the response will be a location which can be used to import the media.
With these four simple commands and an appropriate DS3 client, DS3 server and DS3 storage backend, one now has everything needed to support a removable-media object store. I could see real value in export/import like this on the “rare occasion” when a cloud service provider goes out of business.
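Spectra Logic supplies its own DS3 SDKs and clients; the sketch below is not that API, just a conceptual Python illustration (all function and attribute names are hypothetical) of the bulk PUT flow: the client proposes a list of objects, the server replies with the order and node to use, and the client then streams the data in that order so the tape backend sees sequential writes.

```python
def bulk_put(ds3_client, bucket, object_names):
    # 1. Tell the DS3 server everything we intend to PUT (the Bulk Put command).
    plan = ds3_client.start_bulk_put(bucket, object_names)

    # 2. The server's response is an ordered list of chunks: which objects to
    #    send, in what sequence, and which storage node (IP address) to use.
    for chunk in plan.chunks:
        for obj in chunk.objects_in_order:
            # 3. Ordinary S3-style PUTs, but in the order the server chose,
            #    so the data can be laid down on tape sequentially.
            with open(obj.name, "rb") as f:
                ds3_client.put_object(bucket, obj.name, f, node=chunk.node_address)
```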
The DS3 interface will be publicly available, and the intent is to supply both Spectra Logic-developed clients and ISV/partner-developed DS3 clients so as to provide removable-media object stores for all sorts of other applications.
Spectra is providing developer tools and documentation so that anyone can write a DS3 client. To that end, the DS3 developer portal is up (couldn’t find a link this AM but will update this post when I find it) and available free of charge to anyone today (I believe you need to register to gain access to the documentation). They have a DS3 server simulator that DS3 client developers can use to test out and validate their client software. They also have a try & buy service for client developers.
Essentially, the combination of DS3 clients, DS3 servers and DS3 backend storage create a really deep archive for object data. It’s not intended for primary or secondary storage access but it’s big, cheap, and power/space efficient storage that can be very effective if used for archive data.
BlackPearl, the first DS3 Server
Their second announcement is the first implementation of a DS3 server, which Spectra Logic calls BlackPearl™. The BlackPearl connects to one or more Spectra Logic tape libraries as a backend store, which together essentially provide a DS3 object storage archive; the DS3 server talks to DS3 clients on the front end. BlackPearl uses SAS- or FC-connected tape transports, which can be any transport currently supported by Spectra Logic tape libraries, including IBM TS1140, LTO-4, -5 and -6.
In addition to BlackPearl, Spectra Logic is releasing the first DS3 client for Hadoop. In this case, the DS3 client implements a new version of the Hadoop DistCp (distributed copy) command which can be used to create a copy of an HDFS directory tree onto a DS3 storage service.
Current BlackPearl hardware is a standard 2U server with four 400GB SSDs inside, which act as a sort of speed-matching buffer between the object interface and the SAS/FC tape interface.
We only saw a configuration with one BlackPearl in operation (GA of BlackPearl is expected this December). But the plan is to support multiple BlackPearl appliances to talk with the same DS3 backend storage. In that case, there will be a shared database and (tape) resource scheduler across all the appliances in the cluster.
Yes, but what about the market?
It’s a gutsy move for someone like Spectra Logic to define a new open interface to deep storage. The fact that the appliance exists outside the tape library itself and could potentially support any removable media offers interesting architectural capabilities. The current (beta) implementation lacked some sophistication but the expectation is that much of this will be resolved by GA or over time through incremental enhancements.
Pricing is appealing. For BlackPearl appliance(s) plus a Spectra Logic T950 tape library using LTO drives, supporting an uncompressed data store of ~2.4PB of archive data, the purchase price is ~$0.10/GB. This compares especially well with current Amazon Glacier pricing of $0.01/GB/month: for the price of 10 months of Glacier storage you could own your own DS3 storage service.
At larger capacities, such as BlackPearl with a T950 using TS1140 tape drives supporting 6.4PB, it’s even cheaper, at $0.09/GB. Other configurations are available; in general, bigger configurations are cheaper on a $/GB basis and smaller ones more expensive. The configurations are spec’d by Spectra Logic to include all the media, tape drives and BlackPearl systems needed to support an archive object store.
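Putting the Glacier comparison in round numbers, using only the figures above (my arithmetic, not Spectra Logic's quote):

```python
glacier_per_gb_month = 0.01        # Amazon Glacier, $/GB/month (pricing at the time)
ds3_purchase_per_gb = 0.10         # BlackPearl + T950 with LTO, ~2.4PB configuration

breakeven_months = ds3_purchase_per_gb / glacier_per_gb_month
print(breakeven_months)            # 10 months of Glacier equals the DS3 purchase price

capacity_gb = 2.4e6                # 2.4PB expressed in GB (decimal)
print(capacity_gb * ds3_purchase_per_gb)   # implies a purchase price around $240,000
```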
As for markets, Spectra Logic already has beta interest from a large well known web services customer and a number of media & entertainment customers.
In the long run, Spectra Logic believes that if they can simplify access to tape for the workload it is well qualified to support (deep archive), this will enable new applications to take advantage of tape that weren’t even dreamed of before. By opening up an object store interface to tape, anyone currently using S3 becomes a potential customer.
Amazon announced earlier this year that they have over 2 trillion objects in their S3. And as far as I can tell (see my post Who’s the next winner in storage?), they are growing with no end in sight.
“The future is already here – just not evenly distributed”, W. Gibson
It starts as it always does, outside the enterprise data center: in the lines of business, in the development teams, in the small business organizations that don’t know any better but still have an unquenchable need for data storage.
It’s essentially an Innovator’s Dilemma situation. The upstarts are coming into the market at the lower-end, lower-margin side of the business that the major vendors don’t seem to care about, don’t service very well and are ignoring to their peril.
Yes, it doesn’t offer all the data services that the big guns (EMC, Dell, HDS, IBM, and NetApp) have. It doesn’t offer the data availability and reliability that enterprise data centers have come to demand from their storage. And it doesn’t have the performance of major enterprise data storage systems.
But what it does offer is lower CapEx, unlimited scalability, and data storage that is much easier to manage and adopt, albeit using a new protocol. It does have some inherent, hard-to-get-around problems, not the least of which are speed of data ingest/egress, highly variable latency and eventual consistency. There are other problems which are more easily solvable, with work, but the three listed above are intrinsic to the solution and need to be dealt with systematically.
And the winner is …
It has to be the cloud storage providers, and the big elephant in the room has to be Amazon. I know there’s a lot of hype surrounding AWS S3 and EC2, but you must admit that they are growing, doubling year over year. Yes, they are starting from a much lower capacity point, and yes, they are essentially providing “rentable” data storage space with limited or even non-existent storage services. But they are opening up whole new ways to consume storage that never existed before. And therein lies their advantage and their threat to the major storage players today, unless they act to counter this upstart.
On AWS’s EC2 website there must be four dozen different applications that can be fired up in a matter of a click or two. When I checked out S3, you only need to sign up and identify a bucket name to start depositing data (files, objects). After that, you are charged monthly for the storage used, data transfer out (data in is free), and the number of HTTP GETs, PUTs, and other requests. The first 5GB is free and comes with a judicious amount of GETs, PUTs, and outbound data transfer bandwidth.
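As a rough illustration of that pricing model, here's a toy bill estimator; the rates below are placeholders chosen only to show the structure of the charges, not Amazon's actual price sheet:

```python
def monthly_s3_bill(stored_gb, egress_gb, puts, gets,
                    storage_rate=0.03,            # $/GB/month stored (placeholder)
                    egress_rate=0.12,             # $/GB transferred out (placeholder)
                    put_rate=0.005 / 1000,        # $ per PUT request (placeholder)
                    get_rate=0.004 / 10000,       # $ per GET request (placeholder)
                    free_storage_gb=5):
    storage = max(stored_gb - free_storage_gb, 0) * storage_rate
    egress = egress_gb * egress_rate              # data transfer in is free
    requests = puts * put_rate + gets * get_rate
    return storage + egress + requests

# e.g. 500GB stored, 50GB transferred out, 10K PUTs and 100K GETs in a month
print(round(monthly_s3_bill(500, 50, 10_000, 100_000), 2))
```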
… but how can they attack the enterprise?
Aside from the three systemic weaknesses identified above, for enterprise customers they seem to lack enterprise security, advanced data services and high-availability storage. Yes, NetApp’s Amazon Direct addresses some of the issues by providing enterprise-owned, secured and highly available storage that can be accessed by EC2 applications. But to really take over and make a dent in enterprise storage sales, Amazon needs something with enterprise-class data services, availability and security, with an on-premises storage gateway that uses and consumes cloud storage, i.e., a cloud storage gateway. That way they can meet or exceed enterprise latency and service requirements at something that approximates S3 storage costs.
We have talked about cloud storage gateways before, but none offer this level of storage service. An enterprise-class S3 gateway would need to support all storage protocols, especially block (FC, FCoE, & iSCSI) and file (NFS & CIFS/SMB). It would need enterprise data services, such as read-writeable snapshots, thin provisioning, data deduplication/compression, and data mirroring/replication (sync and async). It would need to support standard management configuration capabilities, like VMware vCenter, Microsoft System Center, and SMI-S. It would need to mask the inherent variable latency of cloud storage through memory, SSD and hard disk data caching/tiering. It would need to conceal the eventual consistency nature of cloud storage (see link above). And it would need to provide iron-clad data security for cloud storage.
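To make the latency-masking requirement concrete: a cloud storage gateway is, at its core, a local write-back cache in front of an object store. A bare-bones conceptual sketch follows (not anybody's shipping product; the class and method names are invented for illustration):

```python
class CloudGatewayCache:
    """Concept sketch: local cache tiers absorb reads and writes at enterprise
    latency, while dirty data is destaged to cloud object storage in the
    background, which is also where eventual consistency gets hidden."""

    def __init__(self, object_store):
        self.object_store = object_store   # e.g. an S3-style client
        self.cache = {}                    # block_id -> data (stand-in for RAM/SSD/disk tiers)
        self.dirty = set()                 # blocks not yet flushed to the cloud

    def write(self, block_id, data):
        self.cache[block_id] = data        # acknowledged at local-storage speed
        self.dirty.add(block_id)

    def read(self, block_id):
        if block_id not in self.cache:     # cache miss: fetch from the cloud tier
            self.cache[block_id] = self.object_store.get(block_id)
        return self.cache[block_id]

    def destage(self):
        for block_id in list(self.dirty):  # background flush to cloud storage
            self.object_store.put(block_id, self.cache[block_id])
            self.dirty.discard(block_id)
```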
It would also need to be enterprise-hardened, highly available and highly reliable. That means dually redundant, highly serviceable hardware FRUs, concurrent code load, and multiple controllers with multiple, independent, high-speed links to the internet. Today’s highly available data storage requires multi-path storage networks, multiple independent power sources and resilient cooling, so adding multiple independent, high-speed internet links to use Amazon S3 in the enterprise is not out of the question. In addition to the highly available and serviceable storage gateway capabilities described above, it would need to supply high data integrity and reliability.
Who could build such a gateway?
I would say any of the major and some of the minor data storage players could easily do an S3 gateway if they desired. There are a couple of gateway startups (see link above) that have made a stab at it but none have it quite down pat or to the extent needed by the enterprise.
However, the problem with standalone gateways from other, non-Amazon vendors is that they could easily support other cloud storage platforms and most do. This is great for gateway suppliers but bad for Amazon’s market share.
So, I believe Amazon has to invest in its own storage gateway if they want to go after the enterprise. Of course, when they create an enterprise cloud storage gateway they will piss off all the other gateway providers and will signal their intention to target the enterprise storage market.
So who is the next winner in data storage? I have to believe it’s going to be, and already is, Amazon. Even if they don’t go after the enterprise, which I feel is the major prize, they have already carved out an unbreachable market share in a new way to implement and use storage. But when (not if) they go after the enterprise, they will threaten every major storage player.
Yes but what about others?
Arguably, Microsoft Azure is in a better position than Amazon to go after the enterprise. Since their acquisition of StorSimple last year, they already have a gateway that, with some help, could be just what they need to provide enterprise-class storage services using Azure. And they already have access to the enterprise, and already have the services, distribution and go-to-market capabilities that address enterprise needs and requirements. Maybe they have it all, but they are not yet at the scale of Amazon. Could they go after this? Certainly, but will they?
Google is the other major unknown. They certainly have the capability to go after enterprise cloud storage if they want. They already have Google Cloud Storage, which is priced under Amazon’s S3 and provides similar services as far as I can tell. But they have even farther to go to get to the scale of Amazon. And they have less of the marketing, selling and service capabilities that are required to be an enterprise player. So I think they are the least likely of the big three cloud providers to be successful here.
There are many other players in cloud services that could make a play for enterprise cloud storage and emerge out of the pack, namely Rackspace, Savvis, Terremark and others. I suppose Dropbox, Box and the other file sharing/collaboration providers might also be able to take a shot at it, if they wanted. But I am not sure any of them have enterprise storage on their radar just yet.
And I wouldn’t leave out the current major storage, networking and server players as they all could potentially go after enterprise cloud storage if they wanted to. And some are partly there already.
Last fall at SNW in San Jose there were a few vendors touting enterprise file synchronization services, each having a slightly different version of the requirements. The one that comes most readily to mind was Egnyte, which supported file synchronization across a hybrid cloud (public cloud and network storage) and which we discussed in our Fall SNWUSA wrap-up post last year.
The problem with BYOD
With bring your own device (BYOD), corporate end users are quickly abandoning any pretense of IT control and turning to consumer-class file synchronization services to help sync files across the desktops, laptops and mobile devices they haul around. But the problem with these solutions, such as Dropbox, Box, OxygenCloud and others, is that they are really outside of IT’s control.
Which is why there’s a real need today for enterprise-class file synchronization solutions that exhibit the ease of use and setup available from consumer file sync systems but offer IT security, compliance and control over the data that’s being moved into the cloud and across corporate and end user devices.
EMC Syncplicity and EMC on premises storage
Last week EMC announced an enterprise version of their recently acquired Syncplicity software that supports on-premises Isilon or Atmos storage, EMC’s own cloud storage offering.
In previous versions of Syncplicity storage was based in the cloud and used Amazon Web Services (AWS) for cloud orchestration and AWS S3 for cloud storage. With the latest release, EMC adds on premises storage to host user file synchronization services that can span mobile devices, laptops and end user desktops.
New Syncplicity users must download desktop client software to support file synchronization or mobile apps for mobile device synchronization. After that it’s a simple matter of identifying which if any directories and/or files are to be synchronized with the cloud and/or shared with others.
However, with the Business (read: enterprise) edition one also gets the Security and Compliance console, which supports access controls to define the users and devices that can synchronize or share data, enforcement of data retention policies, remote wipe of corporate data, and native support for single sign-on services. In addition, one gets centralized user and group management services to grant, change, and revoke user and group access to data. One also obtains enterprise security with AES-256 data-at-rest encryption, separate data centers for key management and data storage, quadruple replication of data for high disaster fault tolerance, and SAS70 Type II compliant data centers.
If the client wants to use on-premises storage, they would also need to deploy a virtual appliance (VM) somewhere in the data center to act as the gateway for file synchronization service requests. The file sync server would also presumably need access to the on-premises storage, and it’s unclear whether the virtual appliance is in-band or out-of-band (see the discussion of Egnyte’s solution options below).
Egnyte’s solution
Egnyte comes as a software-only solution, building a file server in the cloud for end user storage. It also includes an Egnyte app for mobile hardware and the ever-present web file browser. Desktop file access is provided via mapped drives which access the Egnyte cloud file server gateway running as a virtual appliance.
One major difference between Syncplicity and Egnyte is that Egnyte offers a combination of both cloud and on-premises storage, but you cannot have just on-premises storage. Syncplicity offers only one or the other for file data, i.e., file synchronization data can reside either in the cloud or on local on-premises storage, but not in both locations.
The other major difference is that Egnyte operates with just about anybody’s NAS storage, such as EMC, IBM, and HDS, for the on-premises file storage. It operates as an in-band, software appliance solution that intercepts file activity going to your on-premises storage. In this case, one would need to start using a new location or directory for data to be synchronized or shared.
But for NetApp storage only (today), they utilize ONTAP APIs to offer out-of-band file synchronization solutions. This means that you can keep NetApp data where it resides and just enable synchronization/shareability services for the NetApp file data in current directory locations.
Egnyte promises enterprise class data security with AD, LDAP and/or SSO user authentication, AES-256 data encryption and their own secure data centers. No mention of separate key security in their literature.
As for cloud backend storage, Egnyte has its own public cloud or supports other cloud storage providers such as AWS S3, Microsoft Azure, NetApp Storage Grid and HP Public Cloud.
There’s more to Egnyte’s solution than just file synchronization and sharing, but that’s the subject of today’s post. Perhaps we can cover the rest at more length in a future post if there’s interest.
File synchronization, cloud storage’s killer app?
The nice thing about these capabilities is that now IT staff can re-gain control over what is and isn’t synched and shared across multiple devices. Up until now all this was happening outside the data center and external to IT control.
From Egnyte’s perspective, they are seeing more and more enterprises wanting data both on premises, for performance and compliance, and in cloud storage, for ubiquitous access. They feel it’s both a shareability demand between an enterprise’s far-flung team members (and potentially client/customer personnel) and a need to access, edit and propagate silo’d corporate information using the new mobile devices that everyone has these days.
In any event, enterprise file synchronization and sharing is emerging as one of the killer apps for cloud storage. Up to this point, cloud gateways made sense for SME backup or disaster recovery solutions but, IMO, didn’t really take off beyond that space. But if you can package a robust and secure file sharing and synchronization solution around cloud storage, then you just might have something that enterprise customers are clamoring for.
[Image: Code Name "Thumper" by richardmasoner (cc), from Flickr]
An announcement this week by VMware on their vSphere 5 Virtual Storage Appliance has brought back the concept of shared DAS (see vSphere 5 storage announcements).
Over the years, there have been a few products, such as Seanodes and Condor Storage (may not exist now) that have tried to make a market out of sharing DAS across a cluster of servers.
Arguably, Hadoop HDFS (see Hadoop – part 1), Amazon S3/cloud storage services and most scale out NAS systems all support similar capabilities. Such systems consist of a number of servers with direct attached storage, accessible by other servers or the Internet as one large, contiguous storage/file system address space.
Why share DAS? The simple fact is that DAS is cheap, its capacity is increasing, and it’s ubiquitous.
Shared DAS system capabilities
VMware has limited their DAS virtual storage appliance to a three-ESX-node environment, possibly for lots of reasons. But there is no such restriction for Seanodes Exanodes clusters.
On the other hand, VMware has specifically targeted SMB data centers for this facility. In contrast, Seanodes has focused on both HPC and SMB markets for their shared internal storage which provides support for a virtual SAN on Linux, VMware ESX, and Windows Server operating systems.
Although VMware Virtual Storage Appliance and Seanodes do provide rudimentary SAN storage services, they do not supply advanced capabilities of enterprise storage such as point-in-time copies, replication, data reduction, etc.
But some of these facilities are available outside these systems. For example, VMware with vSphere 5 will support a host-based replication service and has had software-based snapshots for some time now. Also, similar services exist or can be purchased for Windows and presumably Linux. And cloud storage providers have provided a smattering of these capabilities in their offerings from the start.
Performance?
Although distributed DAS storage has the potential for high performance, it seems to me that these systems should perform worse than an equivalent amount of processing power and storage in a dedicated storage array. But my biases might be showing.
On the other hand, Hadoop and scale-out NAS systems are capable of screaming performance when put together properly. Recent SPECsfs2008 results for EMC’s Isilon scale-out NAS system have demonstrated very high performance, and Hadoop’s claim to fame is high-performance analytics. But you have to throw a lot of nodes at the problem.
—–
In the end, all it takes is software. Virtualizing servers, sharing DAS, implementing advanced storage features – any of these can be done in software alone.
However, service level, high availability and fault tolerance requirements have historically necessitated a physical separation between storage and compute services. Nonetheless, if you really need screaming application performance and software-based fault tolerance/high availability will suffice, then distributed DAS systems with co-located applications, like Hadoop or some scale-out NAS systems, are the only game in town.