NetApp Analyst Summit Customer Panel – how to survive a category 5 tornado

NetApp brought three of their customer innovation winners up on stage for a panel discussion moderated by Dave Hitz. All three had interesting deployments of NetApp storage systems:

  • Andrew Henderson from ING DIRECT talked about their need to deploy copies of the bank's IT environment for test, development, optimization and security testing. The first time they tried, this process took 12 weeks and produced only a single copy. They wanted to speed this up and be able to deploy 10 or more copies if necessary. Andrew turned to Microsoft Hyper-V, System Center and NetApp FlexClones and transformed the process so that it now generates a copy of the entire bank's IT services in under 10 minutes. Since the new capabilities have been in place, they have created over 400 copies of the bank (he called these bank-in-a-box) for various purposes.
  • Teresa Wahlert from the Iowa Workforce Development Agency was up next and talked about their VDI implementation. Iowa cut the agency's budget, which forced them to shut down a number of physical offices. But with VDI, VMware and NetApp storage, Workforce was able to disperse its services to over 3000 locations, now including prisons, libraries, and other venues where they had no presence before. They put out a general call for all the tired, dying PCs in Iowa government and used these to host VDI services. Now Workforce services are up 7X24 at these locations, pretty amazing for government work. Apparently they had tried VDI before and their previous storage couldn't handle it. They moved to NetApp with FlashCache and it worked just fine. That's when they rolled out VDI services to their customers and businesses. With NetApp they were able to implement VDI, reduce storage costs (via deduplication and other storage efficiency features) and increase department services.
  • Jeff Bell at Mercy Healthcare talked about the difficulties of rolling out electronic health records (EHR) and the challenges of integrating ~30 hospitals and ~400 medical clinics. They started with EHR fairly early (2006-2007), well before the latest governmental push. He mentioned Joplin, MO and last year's category 5 tornado, which all but wiped out their hospital there. He said that within 2 hours of the disaster, Mercy Healthcare was printing out the EHRs for the 183 patients present in the hospital at the time who had to be moved to other care facilities. The promise of EHR is that the information travels with the patient, can be recovered in the event of a disaster and is immediately available. It seems that, at least at Mercy Healthcare, EHR is living up to its promise. In addition, they just built a new data center as they were running out of space, power and cooling at the old one. They installed new NetApp storage there and for the first few months had to run heaters to keep the data center livable, because the new power/cooling load was so far below what they had experienced previously. Looking back on what they had accomplished, Jeff was not so sure they would build a new data center again. With new cloud offerings coming out and the reduced power/cooling and increased density of NetApp storage, they could almost get by without another data center at all.

That’s about it from the customer session.

NetApp execs spent the rest of the day on innovation, mostly at NetApp but also in the IT industry in general.

There was lots of discussion of the new release of Data ONTAP 8.1.1 with its latest cluster-mode features. NetApp positioned it as fulfilling the transition to data/storage as an infrastructure that IT has been pushing for the last decade or so, following in the grand tradition of what IBM did for computing infrastructure with the 360 and what Cisco and others did for networking infrastructure in the mid-80s.

Comments?

IBM Edge 2012 Day 1 part A – new Ultradrawer, integrated real-time compression and more

Brian got up and talked about the SmarterStorage work coming out of STG (IBM's Systems and Technology Group). A couple of items of specific interest included:

  • IBM's new Ultradrawer – apparently a server-side all-flash array; it's unclear whether this is a shared storage device, but I would think so.
  • EasyTier cross-domain support – in conjunction with the Ultradrawer rollout, Brian mentioned that EasyTier would support multiple domains (beyond just shared-storage tiering?!)
  • Virtual Storage Center – a new administration capability targeted to private and public cloud services which supports a storage service catalog and more self-service provisioning.
  • Real-time data compression – a new (integrated) data compression capability for SVC and Storwize V7000, based on LZ compression, that compresses primary, active data in real time (a tiny illustration of LZ-style compression follows this list). This will be integrated with other IBM storage over time.
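
IBM hasn't published the internals of its compression engine here, but as a tiny illustration of the LZ family of algorithms it is based on, here is a minimal sketch using Python's zlib (whose DEFLATE format is built on LZ77). This is only a demonstration of LZ-style compression on a block of repetitive "primary" data, not IBM's implementation:

```python
import zlib

# Highly repetitive block of data, the kind that LZ-family compression handles well
block = b"customer_id,balance,branch\n" * 512

compressed = zlib.compress(block)            # LZ77 + Huffman coding (DEFLATE)
restored = zlib.decompress(compressed)       # lossless: we get the original back

assert restored == block
print(f"{len(block)} bytes -> {len(compressed)} bytes "
      f"({100 * len(compressed) / len(block):.1f}% of original)")
```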

 

That's about it from the keynote, other than that the electronic strings music was awesome…


EMC World 2012 part 1 – VNX/VNXe

Plenty of announcements this morning from EMC World 2012. I'll try to group them into different posts. Today's post covers the Unified Storage Division VNX/VNXe announcements:

  • A new VNXe3150, which fills out the lower end of the VNXe line and replaces the VNXe3100 (but will coexist with it for a while). The new storage system supports 2.5″ drives, has quad-core processing, now supports SSDs as a static storage tier (no FAST yet), has a 100-drive capacity, supports 3.5″ 3TB drives and has dual-port 10GbE frontend interface cards. The new system provides a 50% increase in performance and capacity in the same rack.
  • New VNX software now supports 256 read-writeable snapshots per LUN; the previous limit was 8 (I think). EMC has also improved storage pooling for both VNX and VNXe, which now allows multiple types of RAID groups per pool (previously they all had to be the same RAID level), rebalancing across RAID groups for better performance, and new, larger RAID 5 & 6 groups (why?). They now offer better storage analytics with VMware vCenter, with impressive integration with FAST VP and FAST Cache that supplies performance and capacity information, allowing faster diagnosis and resolution of storage issues under VMware.

Stay tuned, more to come I'm sure.

 

Will Hybrid drives conquer enterprise storage?

Toyota Hybrid Synergy Drive Decal: RAC Future Car Challenge by Dominic's pics (cc) (from Flickr)

I saw that Seagate announced the next generation of their Momentus XT Hybrid (SSD & disk) drive this week. We haven't discussed Hybrid drives much on this blog, but they have become a viable product family.

I am not planning on describing the new drive specs here as there was an excellent review by Greg Schulz at StorageIOblog.

However, the question some in the storage industry have asked is whether Hybrid drives can supplant data center storage. I believe the answer is no, and I will tell you why.

Hybrid drive secrets

The secret to Seagate’s Hybrid drive lies in its FAST technology.  It provides a sort of automated disk caching that moves frequently accessed OS or boot data to NAND/SSD providing quicker access times.
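
Seagate hasn't published the details of its caching algorithm, but as a rough illustration of the idea, a hybrid drive's promotion logic might look something like a per-region access counter that copies hot blocks into NAND once they cross a threshold. This is a minimal sketch with made-up numbers, not Seagate's FAST technology:

```python
from collections import defaultdict

# Illustrative parameters only, not Seagate specs
REGION_SIZE = 1024 * 1024        # track heat per 1MB region of the disk
PROMOTE_THRESHOLD = 8            # reads before a region gets promoted
NAND_CAPACITY_REGIONS = 4096     # e.g. ~4GB of NAND at 1MB per region

heat = defaultdict(int)          # region -> access count
in_nand = set()                  # regions currently cached in NAND

def read(lba):
    region = lba // REGION_SIZE
    if region in in_nand:
        return "NAND hit"        # fast path, served from flash
    heat[region] += 1
    if heat[region] >= PROMOTE_THRESHOLD and len(in_nand) < NAND_CAPACITY_REGIONS:
        in_nand.add(region)      # copy the region into NAND for future reads
    return "disk read"           # slow path, served from rotating media
```

The point is that this only works well when the same regions keep getting hit, which is exactly the boot/OS access pattern described above.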

Caching logic has been around in storage subsystems for decades now, ever since the IBM 3880 Mod 11 & 13 storage controllers came out last century. However, these algorithms have gotten much more sophisticated over time and today can make a significant difference in storage system performance. This can easily be witnessed in the wide variance in storage system performance on a per-disk-drive basis (e.g., see my post on Latest SPC-2 results – chart of the month).

Enterprise storage use of Hybrid drives?

The problem with using Hybrid drives in enterprise storage is that caching algorithms depend on some predictability in access/reference patterns. When a Hybrid drive is directly connected to a server or a PC, it can view a significant portion of the server's IO (at least to the boot/OS volume), but more importantly, that boot/OS data is statically allocated, i.e., it doesn't move around all that much. This means that one PC session looks pretty much like the next and, as such, the hybrid drive can learn an awful lot about the next IO session just by remembering the last one.

However, enterprise storage IO changes significantly from one storage session (day?) to another.  Not only are the end-user generated database transactions moving around the data, but the data itself is much more dynamically allocated, i.e., moves around a lot.

Backend data movement is especially prevalent with the automated storage tiering used in subsystems that contain both SSDs and disk drives. But it's also true of systems that map data placement using log-structured file systems. NetApp's Write Anywhere File Layout (WAFL) is a prominent example of this approach, but other storage systems do this as well.

In addition, any fixed, permanent mapping of a user data block to a physical disk location is becoming less useful over time as advanced storage features make dynamic or virtualized mapping a necessity. Just consider snapshots based on copy-on-write technology: all it takes is a write for a block to be moved to a different location.

Nonetheless, the main problem is that the smarts about what is happening to data on backend storage lie primarily at the controller level, not at the drive level. This applies not only to data mapping but also to end-user/application data access, as controller cache hits are never even seen by a drive. As such, Hybrid drives alone don't make much sense in enterprise storage.

Maybe, if they were intricately tied to the subsystem

I guess one way this could all work better is if the Hybrid drive caching logic were somehow controlled by the storage subsystem.  In this way, the controller could provide hints as to which disk blocks to move into NAND.  Perhaps this is a way to distribute storage tiering activity to the backend devices, without the subsystem having to do any of the heavy lifting, i.e., the hybrid drives would do all the data movement under the guidance of the controller.
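
No such hint interface exists today, which is really the point of the next paragraph, but purely as a sketch of the concept, a controller-to-drive hint might look something like the following. Every name and field here is hypothetical; nothing maps to a real SAS/SATA command:

```python
from dataclasses import dataclass

@dataclass
class CacheHint:
    """Hypothetical hint a controller might push down to a hybrid drive."""
    action: str        # "pin" = keep these blocks in NAND, "unpin" = evict them
    start_lba: int     # first logical block the hint applies to
    block_count: int   # number of contiguous blocks
    priority: int      # 0 (best effort) .. 7 (keep resident if at all possible)

def tier_hints_for_hot_extents(hot_extents):
    """Translate controller-level tiering decisions into per-drive hints."""
    return [CacheHint("pin", lba, count, priority=7) for (lba, count) in hot_extents]

# e.g. the controller's tiering engine found two hot extents on this drive:
hints = tier_hints_for_hot_extents([(0, 2048), (1_000_000, 512)])
```

The drive would then do the actual data movement into and out of its NAND, guided by these hints, leaving the heavy lifting off the controller.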

I don't think this is likely, because it would take industry standardization to define any new "hint" commands, and they would be specific to Hybrid drives. Barring standards, it's an interface between one storage vendor and one drive vendor. That would probably be OK if you made both the storage subsystem and the hybrid drives, but there aren't any vendors left that make both the drives and the storage controllers.

~~~~

So, given the state of enterprise storage today and its continuing proclivity to move data around across its backend storage, I believe Hybrid drives won't be used in enterprise storage anytime soon.

Comments?

 

One day with HDS

HDS CEO Jack Domme shares the company's vision and strategy with Influencer Summit attendees #HDSday by HDScorp

Attended #HDSday yesterday in San Jose and listened to what seemed like the majority of the executive team. The festivities were MCed by Asim Zaheer, VP Corporate and Product Marketing, a long-time friend and employee who came to HDS with the acquisition of Archivas five or so years ago. Some highlights of the day's sessions are included below.

The first presenter was Jack Domme, HDS CEO, and his message was that there is a new, more aggressive HDS, focused on executing and growing the business.

Jack said there will be almost half a ZB of data by 2015, and ~80% of that will be unstructured. HDS firmly believes that much of this growing body of data today lives in silos, locked into application environments, and can't truly become information until it is liberated from those boxes. Getting information out of unstructured data is one of the key problems facing the IT industry.

To that end, Jack talked about the three clouds appearing on the horizon:

  • infrastructure cloud – the cloud as we know and love it today, where infrastructure services can be paid for on a per-use basis and data and applications move seamlessly across various infrastructure boundaries.
  • content cloud – this is somewhat new, but here we take on the governance, analytics and management of millions to billions of pieces of content, using the infrastructure cloud as a basic service.
  • information cloud – the end game, where any and all data streams can be analyzed in real time to provide information and insight to the business.

Jack mentioned the example of Japan's earthquake earlier this year: they automatically stopped all the trains operating in the country, to prevent further injury and accidents, until they could assess the extent of track damage. Now this was a specialized example in a narrow vertical, but the idea is that the information cloud does that sort of real-time analysis on data streaming in all the time.

For much of the rest of the day the executive team filled out the details that surrounded Jack’s talk.

For example, Randy DeMont, Executive VP & GM Global Sales, Services and Support, talked about the new, more focused sales team – one that has moved to concentrate on better opportunities and expanded to take on new verticals and emerging markets.

Then Brian Householder, SVP WW Marketing and Business Development got up and talked about some of the key drivers to their growth:

  • The current economic climate has everyone doing more with less. Hitachi VSP and storage virtualization are uniquely positioned to obtain more value out of current assets – not a rip-and-replace strategy. With VSP one layers better management on top of current infrastructure, which helps get more done with the same equipment.
  • Focus on the channel and verticals is starting to pay off. More than 50% of HDS revenues now come from indirect channels. Also, healthcare and life sciences are starting to emerge as crucial verticals for HDS.
  • The scalability of their storage solutions is significant. It used to be that a PB was a good-sized data center, but these days we are starting to talk about multiple PBs and even much more. I think earlier Jack mentioned that in the next couple of years HDS will see their first 1EB customer.

Mike Gustafson, SVP & GM NAS (former CEO of BlueArc), got up and talked about the long and significant partnership between the two companies regarding the HNAS product. He mentioned that ~30% of BlueArc's revenue came from HDS. He also talked about some of the verticals that BlueArc had done well in, such as eDiscovery and Media and Entertainment. Now these verticals will become new focus areas for HDS storage as well.

John Mansfield, SVP Global Solutions Strategy and Development, came up and talked about the successes they have had in the product arena. Apparently they have over 2000 VSPs installed (a product announced just a year ago), and over 50% of the new systems are going in with virtualization. When asked later what has led to the acceleration in virtualization adoption, the consensus view was that server virtualization and, in general, doing more with less (storage efficiency) were driving increased use of this capability.

Hicham Abdessamad, SVP Global Services, got up and talked about what has been happening in the services end of the business. Apparently there has been a serious shift in HDS's services revenue stream from break-fix over to professional services (PS). Such service offerings now include taking over customer data center infrastructure and leasing it back to the customer for a monthly fee. Hicham reiterated that ~68% of all IT initiatives fail, while 44% of those that succeed are completed late and/or over budget. HDS is providing professional services to help turn this around. His main problem is finding experienced personnel to help deliver these services.

After this there was a Q&A panel with John Mansfield's team: Roberto Bassilio, VP Storage Platforms and Product Management; Sean Moser, VP Software Products; and Sean Putegnat, VP File and Content Services, CME. There were a number of questions, one of which was about the floods in Thailand and their impact on HDS's business.

Apparently, the flood problems are causing supply disruptions in the consumer end of the drive market and are not having serious repercussions for their enterprise customers. But they did mention that they were nudging customers to purchase the right form factor (LFF?) disk drives while the supply problems work themselves out.

Also, there was some indication that HDS would be going after more SSD and/or NAND flash capabilities similar to other major vendors in their space. But there was no clarification of when or exactly what they would be doing.

After lunch the GMs of all the Geographic regions around the globe got up and talked about how they were doing in their particular arena.

  • Jeff Henry, SVP & GM Americas, talked about their success in the F500 and some of the emerging markets in Latin America. In fact, they have been so successful in Brazil that they had to split the country into two regions.
  • Niels Svenningsen, SVP & GM EMEA, talked about the emerging markets in his area of the globe, primarily eastern Europe, Russia and Africa. He mentioned that many believe Africa will be the next area to take off the way Asia did in the last couple of decades of the last century. Apparently there are a billion people in Africa today.
  • Kevin Eggleston, SVP & GM APAC, talked about the high rate of server and storage virtualization, the explosive growth, and the heavy adoption of pay-as-you-go cloud services. His major growth areas were India and China.

The rest of the afternoon was NDA presentations on future roadmap items.

—-

All in all a good overview of HDS’s business over the past couple of quarters and their vision for tomorrow.  It was a long day and there was probably more than I could absorb in the time we had together.

Comments?

 

SSD news roundup

The NexGen n5 Storage System (c) 2011 NexGen Storage, All Rights Reserved

NexGen comes out of stealth

NexGen Storage, a local storage company, came out of stealth today, and its product is now generally available. Their storage system has been in beta since April 2011 and is in use by a number of customers today.

Their product uses DRAM caching, PCIe NAND flash, and nearline SAS drives to provide guaranteed QoS for LUN I/O. The system can provision IOP rate, bandwidth and (possibly) latency over a set of configured LUNs. Such provisioning can be changed over time using policy management, to support time-based tiering. Also, one can prioritize how important the QoS is for a LUN, so that it is either guaranteed or can be sacrificed to support performance for other LUNs in the storage system.
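
NexGen hasn't published its management interface, but a time-based, per-LUN QoS policy of the kind described above might be represented something like this. Field names and numbers are my own illustration, not NexGen's API:

```python
# Hypothetical per-LUN QoS policies with time-based tiers
qos_policies = {
    "oltp-lun-01": [
        # business hours: guaranteed performance, never sacrificed to other LUNs
        {"from": "08:00", "to": "18:00", "iops": 20000, "bandwidth_mb_s": 200,
         "priority": "mission-critical"},
        # overnight: relaxed limits, can be sacrificed to batch/backup workloads
        {"from": "18:00", "to": "08:00", "iops": 5000, "bandwidth_mb_s": 50,
         "priority": "best-effort"},
    ],
    "backup-lun-07": [
        {"from": "00:00", "to": "24:00", "iops": 2000, "bandwidth_mb_s": 400,
         "priority": "best-effort"},
    ],
}
```

The interesting design point is the priority field: a "mission-critical" LUN keeps its guarantee under contention, while "best-effort" LUNs give up performance first.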

The NexGen storage provides a multi-tiered hybrid storage system that supports 10GbE iSCSI and uses an MLC NAND PCIe card to boost performance of its nearline SAS drives. NexGen also supports data deduplication, which is done during off-peak times to reduce the data footprint.

DRAM replacing Disk!?

In a report by Ars Technica, a research group out of Stanford is attempting to gang together server DRAM to create a networked storage system. There have been a number of attempts to use DRAM as a storage system in the past, but the Stanford group is going after it in a different way, by aggregating DRAM across a gaggle of servers. They are using standard disks or SSDs for backup purposes, because DRAM is, of course, a volatile storage device, but the intent is to keep all data in memory to speed up performance.

I was at SNW USA a couple of weeks ago talking to a Taiwanese company that was offering a DRAM storage accelerator device which also used DRAM as a storage service. Of course, Texas Memory Systems and others have had DRAM based storage for a while now. The cost for such devices was always pretty high but the performance was commensurate.

In contrast, the Stanford group is trying to use commodity hardware (servers) with copious amounts of DRAM to create a storage system. The article seems to imply that the system could take advantage of unused DRAM sitting around your server farm, but I find that hard to believe. Most virtualized server environments today are running lean on memory, and there shouldn't be a lot of excess DRAM capacity hanging around.

The other Achilles' heel of the Stanford DRAM storage is that it is highly dependent on low-latency networking. Although Infiniband probably qualifies as low latency, it's not low latency enough to support this system's IO workloads. As such, they believe they need even lower-latency networking than Infiniband to make it work well.

OCZ ups the IOP rate on their RevoDrive3 Max series PCIe NAND storage

Speaking of PCIe NAND flash, OCZ just announced speedier storage, upping the random read IO rate to 245K IOPS from the 230K IOPS offered by their previous PCIe NAND storage. It's unclear what they did to boost this, but it's entirely possible that they have optimized their NAND controller to support more random reads.

OCZ announces they will ship TLC SSD storage in 2012

OCZ’s been busy.  Now that the enterprise is moving to adopt MLC and eMLC SSD storage, it seems time to introduce TLC (3-bits/cell) SSDs.  With TLC, the price should come down a bit more (see chart in article), but the endurance should also suffer significantly.  I suppose with the capacities available with TLC and enough over provisioning OCZ can make a storage device that would be reliable enough for certain applications at a more reasonable cost.

I never thought I would see MLC in enterprise storage, so I suppose at some point even TLC makes sense, but I would be even more hesitant to jump on this bandwagon for a while yet.

SolidFire obtains more funding

Early last week SolidFire, another local SSD startup, obtained $25M in additional funding. SolidFire, an all-SSD storage system company, is still technically in beta but expects general availability near the end of the year. We haven't talked about them before on RayOnStorage, but they are focusing on cloud service providers with an all-SSD solution which includes deduplication. I promise to talk about them some more when they reach GA.

LaCie introduces a Little Big Disk, a Thunderbolt SSD

Finally, in the high-end consumer space, LaCie just released a new SSD which attaches to servers/desktops using the new Apple-Intel Thunderbolt IO interface. Given the expense (~$900 for a 128GB SSD), it seems a bit much, but if you absolutely have to have the performance this may be the only way to go.

 

—-

Well, that's about all I could find on SSD and DRAM storage announcements. However, I am sure I missed a couple, so if you know of one I should have mentioned, please comment.

IBM’s 120PB storage system

Susitna Glacier, Alaska by NASA Goddard Photo and Video (cc) (from Flickr)

Talk about big data, Technology Review reported this week that IBM is building a 120PB storage system for some unnamed customer.  Details are sketchy and I cannot seem to find any announcement of this on IBM.com.

Hardware

It appears that the system uses 200K disk drives to support the 120PB of storage.  The disk drives are packed in a new wider rack and are water cooled.  According to the news report the new wider drive trays hold more drives than current drive trays available on the market.

For instance, HP has a hot-pluggable, 100-drive SFF (small form factor, 2.5″) disk enclosure that sits in 3U of standard rack space. 200K SFF disks would take up about 154 full racks, not counting the interconnect switching that would be required. It's unclear whether water cooling would increase the density much, but I suppose a wider tray with special cooling might get you more drives per floor tile.

There was no mention of the interconnect, but today's drives use either SAS or SATA. SAS interconnects for 200K drives would require many separate SAS busses. With a SAS expander able to address 255 drives or other expanders, one would need at least 4 SAS busses, but that would put ~64K drives on a bus and would not perform well. Something more like 64-128 drives per bus would perform much better, and each drive would need dual pathing; if we use 100 drives per SAS string, that's 2000 SAS drive strings, or at least 4000 SAS busses (for dual-port access to the drives).
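
A quick back-of-envelope reproduction of those numbers (the ~39 usable U per rack is my assumption, chosen to match the ~154-rack figure above):

```python
drives = 200_000

# Rack estimate, using HP's 100-drive, 3U SFF enclosure as a yardstick
drives_per_enclosure = 100
enclosures = drives // drives_per_enclosure        # 2,000 enclosures
enclosure_rack_units = enclosures * 3              # 6,000U of enclosures
racks = enclosure_rack_units / 39                  # ~154 racks at ~39 usable U per rack

# SAS string estimate: 100 drives per string, dual-ported
drives_per_string = 100
strings = drives // drives_per_string              # 2,000 SAS drive strings
busses = strings * 2                               # 4,000 busses for dual-port access

print(round(racks), strings, busses)               # 154 2000 4000
```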

The report mentioned GPFS as the underlying software which supports three cluster types today:

  • Shared storage cluster – GPFS front-end nodes access shared storage across the backend. This is generally one or more SAN storage systems. But given the requirements for high density, it doesn't seem likely that the 120PB storage system uses SAN storage on the backend.
  • Network-based cluster – here the GPFS front-end nodes talk over a LAN to a cluster of NSD (Network Shared Disk) servers, which can have access to all or some of the storage. My guess is this is what will be used in the 120PB storage system.
  • Shared network-based cluster – this looks just like a bunch of NSD servers but provides access across multiple NSD clusters.

Given the above, assuming ~100 drives per NSD server means an extra 1U per 100 drives, or (given HP drive density) 4U per 100 drives – that is, 1000 drives and 10 IO servers per 40U rack (not counting switching). At this density it takes ~200 racks for 120PB of raw storage plus its NSD nodes, and 2000 NSD nodes in total.

It's unclear how many GPFS front-end nodes would be needed on top of this, but even at 1 GPFS front-end node for every 5 NSD nodes, we are talking another 400 GPFS front-end nodes and, at 1U per server, another 10 racks or so (not counting switching).
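
The same back-of-envelope arithmetic for the server and rack counts (the 5:1 NSD-to-front-end ratio is the guess from the paragraph above, not anything IBM has published):

```python
drives = 200_000

# NSD layer: ~100 drives per NSD server, 3U of enclosure + 1U of server = 4U per 100 drives
nsd_servers = drives // 100                 # 2,000 NSD servers
racks = nsd_servers * 4 / 40                # 1,000 drives + 10 servers per 40U rack -> ~200 racks

# GPFS front end: guessing 1 front-end node per 5 NSD servers
gpfs_frontends = nsd_servers // 5           # 400 front-end nodes
frontend_racks = gpfs_frontends / 40        # 1U servers -> another ~10 racks

print(nsd_servers, round(racks), gpfs_frontends, round(frontend_racks))  # 2000 200 400 10
```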

If my calculations are correct we are talking over 210 racks with switching thrown in to support the storage.  According to IBM’s discussion on the Storage challenges for petascale systems, it probably provides ~6TB/sec of data transfer which should be easy with 200K disks but may require even more SAS busses (maybe ~10K vs. the 2K discussed above).

Software

IBM GPFS is used behind the scenes in IBM's commercial SONAS storage system, but has been around as a cluster file system designed for HPC environments for 15 years or more now.

Given this many disk drives something needs to be done about protecting against drive failure.  IBM has been talking about declustered RAID algorithms for their next generation HPC storage system which spreads the parity across more disks and as such, speeds up rebuild time at the cost of reducing effective capacity. There was no mention of effective capacity in the report but this would be a reasonable tradeoff.  A 200K drive storage system should have a drive failure every 10 hours, on average (assuming a 2 million hour MTBF).  Let’s hope they get drive rebuild time down much below that.

The system is expected to hold around a trillion files. I'm not sure of the exact overhead, but even at 1024 bytes of metadata per file, this number of files would chew up ~1PB of metadata storage space.
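
Both of those estimates (the drive failure every 10 hours above, and the ~1PB of metadata) are simple arithmetic:

```python
# Mean time between drive failures across the whole population
drives = 200_000
mtbf_hours = 2_000_000
print(mtbf_hours / drives)                 # 10.0 hours between failures, on average

# Metadata footprint for ~1 trillion files at ~1KB of metadata each
files = 10**12
metadata_bytes = files * 1024              # ~1.02e15 bytes
print(metadata_bytes / 10**15, "PB")       # ~1 PB
```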

GPFS provides ILM (information life cycle management, or data placement based on information attributes) using automated policies and supports external storage pools outside the GPFS cluster storage.  ILM within the GPFS cluster supports file placement across different tiers of storage.

All the discussion up to now has revolved around homogeneous backend storage, but it's quite possible that multiple storage tiers could also be used. For example, a high-density but slower storage tier could be combined with a low-density but faster storage tier to provide a more cost-effective storage system. It's unclear, though, whether the application (real-world modeling) could readily utilize this sort of storage architecture, or whether they would care about system cost.

Nonetheless, presumably an external storage pool would be a useful adjunct to any 120PB storage system for HPC applications.

Can it be done?

Let’s see, 400 GPFS nodes, 2000 NSD nodes, and 200K drives. Seems like the hardware would be readily doable (not sure why they needed watercooling but hopefully they obtained better drive density that way).

Luckily GPFS supports Infiniband which can support 10,000 nodes within a single subnet.  Thus an Infiniband interconnect between the GPFS and NSD nodes could easily support a 2400 node cluster.

The only real question is whether a GPFS software system can handle 2000 NSD nodes and 400 GPFS nodes with around a trillion files over 120PB of raw storage.

As a comparison here are some recent examples of scale out NAS systems:

It would seem that a 20X multiple of a current Isilon cluster, or even a 10X multiple of a currently supported SONAS system, would take some software effort to get working together, but seems entirely within reason.

On the other hand, Yahoo supports a 4000-node Hadoop cluster, which seems to work just fine. So from a feasibility perspective, a 2500-node GPFS-NSD system seems like a walk in the park by comparison.

Of course, IBM Almaden is working on a project to support Hadoop over GPFS, which might not be optimal for real-world modeling but would nonetheless support the node count being talked about here.

——

I wish there were some real technical information on the project out on the web, but I could not find any. Much of this is informed conjecture based on current GPFS and storage hardware capabilities. But hopefully I haven't traveled too far astray.

Comments?

 

Shared DAS

Code Name "Thumper" by richardmasoner (cc) (from Flickr)

An announcement this week by VMware on their vSphere  5 Virtual Storage Appliance has brought back the concept of shared DAS (see vSphere 5 storage announcements).

Over the years, there have been a few products, such as Seanodes and Condor Storage (may not exist now) that have tried to make a market out of sharing DAS across a cluster of servers.

Arguably, Hadoop HDFS (see Hadoop – part 1), Amazon S3/cloud storage services and most scale out NAS systems all support similar capabilities. Such systems consist of a number of servers with direct attached storage, accessible by other servers or the Internet as one large, contiguous storage/file system address space.

Why share DAS? The simple fact is that DAS is cheap, its capacity is increasing, and it’s ubiquitous.

Shared DAS system capabilities

VMware has limited their DAS virtual storage appliance to a 3-ESX-node environment – possibly for lots of reasons. But there is no such restriction for Seanodes Exanodes clusters.

On the other hand, VMware has specifically targeted SMB data centers for this facility.  In contrast, Seanodes has focused on both HPC and SMB markets for their shared internal storage which provides support for a virtual SAN on Linux, VMware ESX, and Windows Server operating systems.

Although VMware Virtual Storage Appliance and Seanodes do provide rudimentary SAN storage services, they do not supply advanced capabilities of enterprise storage such as point-in-time copies, replication, data reduction, etc.

But some of these facilities are available outside these systems. For example, VMware with vSphere 5 will support a host-based replication service and has had software-based snapshots for some time now. Similar services exist or can be purchased for Windows and presumably Linux. Also, cloud storage providers have offered a smattering of these capabilities in their offerings from the start.

Performance?

Although distributed DAS storage has the potential for high performance, it seems to me that these systems should perform worse than an equivalent amount of processing power and storage in a dedicated storage array. But my biases might be showing.

On the other hand, Hadoop and scale-out NAS systems are capable of screaming performance when put together properly. Recent SPECsfs2008 results for EMC's Isilon scale-out NAS system have demonstrated very high performance, and Hadoop's claim to fame is high-performance analytics. But you have to throw a lot of nodes at the problem.

—–

In the end, all it takes is software. Virtualizing servers, sharing DAS, and implementing advanced storage features, any of these can be done within software alone.

However, service level, high availability and fault tolerance requirements have historically necessitated a physical separation between storage and compute services. Nonetheless, if you really need screaming application performance and software-based fault tolerance/high availability will suffice, then distributed DAS systems with co-located applications, like Hadoop or some scale-out NAS systems, are the only game in town.

Comments?