Data storage features for virtual desktop infrastructure (VDI) deployments

The Planet Data Center by The Planet (cc) (from Flickr)

I was talking with someone yesterday about one of my favorite topics, data storage for virtual desktop infrastructure (VDI) deployments.  In my mind there are a few advanced storage features that help considerably with VDI implementations:

  • Deduplication – almost every one of your virtual desktops will share 75-90% of their O/S disk data with every other virtual desktop.  Having sub-file/sub-block deduplication can be a godsend for all this replicated data and reduce O/S storage requirements considerably.
  • Zero-storage snapshots/clones – another solution to the duplication of O/S data is to use some sort of space-conserving snapshot.  For example, one creates a master (gold) disk image and makes 100s if not 1000s of snapshots of it, each taking up almost no additional space (a rough sketch of the potential savings follows this list).
  • Highly available/highly reliable storage – when you have a lone desktop dependent on DAS for its O/S, a device failure doesn’t impact many users. However, when you have 100s to 1000s of users dependent on DAS device(s) for their O/S software, any DAS failure could impact all of them at the same time.  As such, one needs to move off DAS and invest in highly reliable and available external storage of some kind to sustain reasonable uptime for your user community.
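
To make the savings concrete, here is a rough back-of-the-envelope sketch in Python; the desktop count, image size, shared fraction and clone delta are all made-up illustrative numbers, not vendor figures.

```python
# Rough sketch of O/S-image storage for a VDI farm under three approaches.
# All numbers (desktop count, image size, shared fraction, clone delta) are
# illustrative assumptions, not vendor figures.

desktops = 1000          # virtual desktops in the deployment
os_image_gb = 30         # per-desktop O/S image size (GB)
shared_fraction = 0.80   # portion of O/S data common across desktops (75-90% per the post)
delta_gb_per_desktop = 2 # assumed per-desktop delta for gold-image clones

raw_gb = desktops * os_image_gb
# Sub-file deduplication stores the shared data once plus each desktop's unique data.
deduped_gb = os_image_gb * shared_fraction + desktops * os_image_gb * (1 - shared_fraction)
# Zero-storage snapshots/clones keep one gold image plus per-desktop deltas.
cloned_gb = os_image_gb + desktops * delta_gb_per_desktop

print(f"Raw block storage: {raw_gb:>8,.0f} GB")
print(f"Deduplicated:     ~{deduped_gb:>8,.0f} GB")
print(f"Gold image+clones:~{cloned_gb:>8,.0f} GB")
```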

Those seem to me to be the most important attributes for VDI storage, but there are a couple more features/facilities which can also help:

  • NAS systems with NFS – VDI deployments will generate lots of VMDKs for all the user desktop C: drives.  Although this can be managed with block level storage as separate LUNs or multi-VMDK LUNs, who wants to configure 100 to 1,000 LUNs?  NFS files can perform just as well and are much easier to create on the fly, and thus, for VDI it’s hard to beat NFS storage.
  • Boot storm enhancements – Another problem with VDI is that everyone gets to work at 8am Monday and proceeds to boot up their (virtual) machines, which drives an awful lot of IO to their virtual C: drives.  Deduplication and zero-storage snapshots can help manage the boot storm as long as these characteristics are retained throughout system cache, i.e., deduplication exists in cache as well as on backend disk.  But there are other approaches to the problem as well, available from various vendors to better manage boot storms.
  • Anti-virus scan enhancements – Similar to boot storms, A-V scans also typically happen around the same time for many desktop users and can be just as bad for virtual C: drive performance.  Again, deduplication or zero-storage snapshots can help (with the above caveats), but some vendor storage can offload these activities from the desktop altogether.  Also, last week’s VMworld release of VMware’s vShield Edge (see VMworld 2010 review) supports some A-V scan enhancements. Any of these approaches should be able to help.

Regular “dumb” block storage will always work but it will require a lot more raw storage, performance will suffer just when everybody gets back to work, and the administrative burden will be much higher.

I may seem biased, but enterprise class reliability & availability combined with some of the advanced storage features described above can help make your deployment of VDI that much better for you and all your knowledge workers.

Anything I missed?

Enterprise data storage defined and why 3PAR?

More SNW hall servers and storage

Recent press reports about a bidding war for 3PAR bring into focus the expanding need for enterprise class data storage subsystems.  What exactly is enterprise storage?

Defining enterprise storage is fraught with problems but I will take a shot.  Enterprise class data storage has:

  • Enhanced reliability, high availability and serviceability – meaning it hardly ever fails, it keeps operating (on redundant components) when it does fail, and repairing the storage when the rare failure occurs can be accomplished without disrupting ongoing storage services
  • Extreme data integrity – goes beyond just RAID storage, meaning that these systems lose data very infrequently, provide the latest data written to a location when read and will tell you when data cannot be accessed.
  • Automated I/O performance – meaning sophisticated caching algorithms that try to keep ahead of sequential I/O streams, buffer actively read data, and buffer write data in non-volatile cache before destaging to disk or other media.
  • Multiple types of storage – meaning the system supports SATA, SAS and/or FC disk drives and SSDs or Flash storage
  • PBs of storage – meaning behind one enterprise class storage (sub-)system one can support over 1PB of storage
  • Sophisticated functionality – meaning the system supports multiple forms of offsite replication, thin provisioning, storage tiering, point-in-time copies, data cloning, administration GUIs/CLIs, etc.
  • Compatibility with all enterprise O/Ss – meaning the storage has been tested and is on hardware compatibility lists for every major operating system in use by the enterprise today.

As for storage protocol, it seems best to leave this off the list.  I wanted to just add block storage, but enterprises today probably have as much if not more external file storage (CIFS or NFS) as they have block storage (FC or iSCSI).  And the proportion in file systems seems to be growing (see IDC report referenced below).

In addition, while I don’t like the non-determinism of iSCSI or file access protocols, this doesn’t seem to stop such storage from putting up pretty impressive performance numbers (see our performance dispatches).  Anything that can crack 100K I/Os or file operations per second probably deserves to call itself enterprise storage, as long as it meets the other requirements.  So, maybe I should add high-performance storage to the list above.

Why the sudden interest in enterprise storage?

Enterprise storage has been around arguably since the 2nd half of last century (for mainframe systems) but lately has become even more interesting as applications deploy to the cloud and server virtualization (from VMware, Microsoft Hyper-V and others) takes over the data center.

Cloud storage and cloud computing services are lowering the entry points for storage and processing, enabling application deployments which were heretofore unaffordable.  These new cloud applications consume storage at increasing rates and don’t seem to be slowing down any time soon.  Arguably, some cloud storage is not enterprise storage but as service levels go up for these applications, providers must ultimately turn to enterprise storage.

In addition, server virtualization transforms the enterprise data center from a single application per server to easily 5 or more applications per physical server.  This trend is raising server utilization, driving more I/O, and requiring higher capacity.  Such “multi-application” storage almost always requires high availability, reliability and performance to work well, generating even more demand for enterprise data storage systems.

Despite all the demand, worldwide external storage revenues dropped 12% last year according to IDC.  Now the economy had a lot to do with this decline, but another factor reducing external storage revenue is the ongoing drop in the price of storage on a $/GB basis.  To this point, that same IDC report stated that external storage capacity increased 33% last year.

Why do Dell & HP want 3PAR storage?

Margins on enterprise storage are good, some would say very good.  While raw disk storage can be had at under $0.50/GB, enterprise class storage is often 10 or more times that price.  Now that has to cover redundant hardware, software/firmware engineering and other characteristics, but this still leaves pretty good margins.

In my mind, Dell would see enterprise storage as a natural extension of their current enterprise server business.  They already sell to and support these customers; adding enterprise class storage just adds another product to the mix.  Developing enterprise storage from scratch is probably a 4-7 year journey with the right people; buying 3PAR puts them in the market today with a competitive product.

HP is already in the enterprise storage market today, with their XP and EVA storage subsystems.  However, having their own 3PAR enterprise class storage may get them better margins than their current XP storage OEMed from HDS.  But I think Chuck Hollis’s post on HP’s counter bid for 3PAR may have revealed another side to this discussion – sometimes M&A is as much about constraining your competition as it is about adding new capabilities to a company.

——

What do you think?

WD’s new SiliconEdge Blue SSD data write spec

Western Digital's SiliconEdge Blue SSD SATA drive (from their website)

Western Digital (WD) announced their first SSD drive for the desktop/laptop market space today.  The drive comes in the typical 256, 128, and 64GB capacity points over a SATA interface.  Performance looks OK at 5K random read or write IO/s with sustained transfers of 250 and 140MB/s for read and write respectively.  But what caught my eye was a new specification I hadn’t seen before: a maximum GB written per day of 17.5, 35 and 70GB/d across the drive capacities, using WD’s Operational Lifespan – LifeEST(tm) definition.

I couldn’t find anywhere that said which NAND technology is used in the device, but it likely uses MLC NAND.  In a prior posting we discussed a Toshiba study that said a “typical” laptop user writes about 2.4GB/d and a “heavy” laptop user writes about 9.2GB/d.  This data would indicate that WD’s new 64GB drive can handle almost 2X the defined “heavy” user workload for laptops, and their other drives would handle it just fine.  A data write rate for desktop work, as far as I can tell, has not been published, but presumably it would be greater than for laptop users.
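
To put those numbers side by side, here is a quick sketch using the figures quoted above; I'm assuming the smallest write budget (17.5GB/d) goes with the smallest drive (64GB), which is consistent with the ~2X headroom noted for the 64GB model.

```python
# Compare WD's published maximum-GB-written-per-day spec against the laptop
# write rates from the Toshiba study cited above.  The mapping of budgets to
# capacities is an assumption (smallest budget to smallest drive).
wd_write_budget_gb_per_day = {64: 17.5, 128: 35, 256: 70}
laptop_writes_gb_per_day = {"typical": 2.4, "heavy": 9.2}

for capacity, budget in wd_write_budget_gb_per_day.items():
    for user, rate in laptop_writes_gb_per_day.items():
        headroom = budget / rate
        print(f"{capacity}GB drive vs {user} user: {headroom:.1f}x write headroom")
```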

From my perspective, more information on the drive’s underlying NAND technology, on what a LifeEST specification actually means, and a specification as to how much NAND storage is actually present would be nice, but these are all personal nits.  All that aside, I applaud WD for standing up and saying what data write rate their drives can support.  This needs to be a standard part of any SSD specification sheet and I look forward to seeing more information like this coming from other vendors as well.

Atmos GeoProtect vs RAID

The Night Lights of Europe (as seen from space) by woodleywonderworks (cc) (from flickr)

Yesterday, twitterland was buzzing about EMC’s latest enhancement to their Atmos Cloud Storage platform called GeoProtect.  This new capability improves cloud data protection by supporting erasure code data protection rather than just pure object replication.

Erasure coding has been used in storage for over a decade; some of the common algorithms are Reed-Solomon, Cauchy Reed-Solomon, EVENODD coding, etc.  All these algorithms provide a way of splitting customer data into data fragments plus parity (coding) fragments so that some number of data or parity fragments can be erased (or lost) while customer data can still be supplied.  For example, a R-S encoding scheme we used in the past (called RAID 6+) had 13 data fragments and 2 parity fragments.  Such an encoding scheme supported the simultaneous failure of any two drives and could still supply (reconstruct) customer data.
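
For those who like to see the counting argument spelled out, here is a minimal sketch of the fragment arithmetic; it is not any vendor's implementation, just the bookkeeping behind a k-data/m-parity layout like the 13+2 example above.

```python
# Counting argument for an erasure-coded layout: with k data fragments and
# m parity (coding) fragments, any m fragment losses can be tolerated, and
# the raw storage overhead is (k + m) / k.
def erasure_profile(k, m):
    return {"fragments": k + m, "tolerates": m, "overhead": (k + m) / k}

for k, m in [(13, 2), (10, 6), (4, 2)]:    # 13+2 matches the RAID 6+ example above
    p = erasure_profile(k, m)
    print(f"{k}+{m}: {p['fragments']} fragments, survives {p['tolerates']} losses, "
          f"{p['overhead']:.2f}x raw storage")
```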

But how does RAID differ from something like GeoProtect?

  • RAID is typically within a storage array and not across storage arrays
  • RAID is typically limited to a small number of alternative configurations of data disks and parity disks which cannot be altered in the field, and
  • Currently, RAID typically doesn’t support more than two disk failures while still being able to recover customer data (see Are RAIDs days numbered?)

As I understand it, GeoProtect currently supports only two different encoding schemes, which provide for different levels of data instance failures while still protecting customer data.  And with GeoProtect you are protecting data across Atmos nodes and potentially across different geographic locations, not just within storage arrays.  Also, with Atmos this is all policy driven, and data that comes into the system can use any object replication policy or either of the two GeoProtect policies supported today.

The nice thing about R-S encoding is that it doesn’t have to be fixed at two encoding schemes.  And as it’s all software, new coding schemes could easily be released over time, possibly someday becoming something a user could dial up or down at their whim.

But this would seem much more like what Cleversafe has been offering in their SliceStor product.  With Cleversafe the user can specify exactly how much redundancy they want to support and the system takes care of everything else.  In addition, Cleversafe has implemented a more fine grained approach (with many more fragments) and data and parity are intermingled in each stored fragment.

It’s not a big stretch for Atmos to go from two GeoProtect configurations to four or more.  It’s unclear to me what the right number would be, but once you get past 3 or so, it might be easier to just code a generic R-S routine that can handle any configuration the customer wants – though I may be oversimplifying the mathematics here.

Nonetheless, in future versions of Atmos I wouldn’t be surprised if the way data is protected could change over time through policy management. Specifically, while data is being frequently accessed, one could use object replication or a less compact encoding to speed up access, but once access frequency diminishes (or time passes), data could then be protected with more storage-efficient encoding schemes, reducing the data footprint in the cloud while still offering similar resiliency to data loss.
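
A hypothetical footprint comparison illustrates the point; the policy names and fragment counts below are made up for illustration and are not actual Atmos GeoProtect configurations.

```python
# Hypothetical comparison of on-disk footprint for the same usable data under
# replication vs erasure-coded policies (all policies illustrative).
usable_tb = 100

policies = {
    "3-way replication (hot data)":    3.0,
    "9+3 erasure coding (cool data)":  (9 + 3) / 9,    # ~1.33x
    "13+2 erasure coding (cold data)": (13 + 2) / 13,  # ~1.15x
}

for name, multiplier in policies.items():
    print(f"{name}: {usable_tb * multiplier:.0f} TB on disk")
```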

Full disclosure: I have worked for Cleversafe in the past, and although I am currently working with EMC, I have had no work from EMC’s Atmos team.

Intel-Micron new 25nm/8GB MLC NAND chip

Intel and Micron 25nm NAND technology

Intel-Micron Flash Technologies just announced another increase in NAND density. This one manages to put 8GB on a single chip with MLC(2) technology in a 167mm² package, or roughly a half inch per side.

You may recall that Intel-Micron Flash Technologies (IMFT) is a joint venture between Intel and Micron to develop NAND technology chips. IMFT chips can be used by any vendor and typically show up in Intel SSDs as well as other vendors’ systems. MLC technology is more suitable for use in consumer applications, but at these densities it’s starting to make sense for data center use as well. We have written before about MLC NAND used in enterprise disks by STEC and about Toshiba’s MLC SSDs. But in essence, MLC NAND reliability and endurance will ultimately determine its place in the enterprise.

But at these densities, you can just throw more capacity at the problem to mask MLC endurance concerns. For example, with this latest chip one could conceivably build a single-layer 2.5″ configuration with almost 200GB of MLC NAND. If you wanted to configure this as a 128GB SSD, you could use the additional 72GB of NAND for failing pages. Doing this could conceivably add more than 50% to the life of an SSD.
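
The arithmetic behind that estimate, as a quick sketch (idealized wear leveling assumed):

```python
# Over-provisioning arithmetic from the paragraph above: ~200GB of raw MLC
# exposed as a 128GB SSD leaves ~72GB of spare NAND for wear leveling and
# retiring failing pages.
raw_gb = 200
exposed_gb = 128
spare_gb = raw_gb - exposed_gb
print(f"Spare NAND: {spare_gb} GB ({spare_gb / exposed_gb:.0%} over-provisioning)")

# With idealized wear leveling, total bytes written before hitting the
# write/erase limit scales with raw (not exposed) capacity, so the spare NAND
# stretches drive life by roughly the same factor, consistent with the
# "more than 50%" estimate above.
print(f"Idealized life extension: {raw_gb / exposed_gb - 1:.0%}")
```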

SLC still has better (~10X) endurance but being able to ship 2X the capacity in the same footprint can help.  Of course, MLC and SLC NAND can be combined in a hybrid device to give some approximation of SLC reliability at MLC costs.

IMFT made no mention of SLC NAND chips at the 25nm technology node, but presumably these will be forthcoming shortly.  As such, if we assume the technology can support 4GB of SLC NAND in a 167mm² chip, it should be of significant interest to most enterprise SSD vendors.

A couple of things were missing from yesterday’s IMFT press release, namely:

  • read/write performance specifications for the NAND chip
  • write endurance specifications for the NAND chip

SSD performance is normally a function of all the technology that surrounds the NAND chip, but it all starts with the chip.  Also, MLC used to be capable of 10,000 write/erase cycles and SLC of 100,000 w/e cycles, but the most recent technology from Toshiba (presumably 34nm technology) shows an MLC NAND write/erase endurance of only 1,400 cycles.  This seems to imply that as NAND technology increases in density, write endurance degrades. How much is subject to much debate, and with the lack of any standardized w/e endurance specifications and reporting, it’s hard to see how bad it gets.
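
To see why the cycle count matters so much, here is a simple sketch of idealized total-bytes-written for an 8GB chip under the cycle figures discussed above; write amplification is ignored and actual endurance for this part isn't published.

```python
# Idealized total-bytes-written (TBW) for an 8GB NAND chip at different
# write/erase cycle counts, assuming perfect wear leveling and no write
# amplification.  Cycle figures are those discussed above, not IMFT specs.
chip_gb = 8
for label, cycles in [("older MLC", 10_000), ("SLC", 100_000), ("recent MLC", 1_400)]:
    tbw = chip_gb * cycles / 1024  # TB written per chip
    print(f"{label:>10} ({cycles:>6} cycles): ~{tbw:,.0f} TB written per chip")
```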

The bottom line: capacity is great, but we need to know w/e endurance to really see where this new technology fits.  Ultimately, if endurance degrades significantly, such NAND technology will only be suitable for consumer products.  Of course, at ~10X (just guessing) the size of the enterprise market, maybe that’s ok.

Toshiba studies laptop write rates confirming SSD longevity

Toshiba's New 2.5" SSD from SSD.Toshiba.com

Today Toshiba announced a new series of SSD drives based on their 32nm MLC NAND technology. The new technology is interesting, but what caught my eye was another part of their website, i.e., their SSD FAQs. We have talked about MLC NAND technology before and have discussed its inherent reliability limitations, but this is the first time I have seen a company discuss their reliability estimates so publicly. This was documented more fully in an IDC white paper on their site, but the summary on the FAQ web page speaks to most of it.

Toshiba’s answer to the MLC write endurance question revolves around how much data a laptop user writes per day, which their study makes clear.  Essentially, Toshiba assumes MLC NAND write endurance is 1,400 write/erase cycles, and for their 64GB drive a user would have to write, on average, 22GB/day for 5 years before they would exceed the manufacturer’s warranty based on write endurance cycles alone.

Let’s see:

  • 5 years is ~1825 days
  • 22GB/day over 5 years would be over 40,000GB of data written
  • If we divide this by the 1,400 MLC W/E cycle limit given above, that gives us something like 28.7 NAND pages that could fail while still supporting write reliability.

Not sure what Toshiba’s MLC SSD supports for page size, but it’s not unusual for SSDs to ship an additional 20% of capacity to over-provision for write endurance and ECC. Given that 20% of 64GB is ~12.8GB, and it has to sustain at least ~28.7 NAND page failures, this puts Toshiba’s MLC NAND page at something like 512MB or ~4Gb, which makes sense.
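
Here is the same arithmetic as a quick sketch, using the 22GB/day, 5-year and 1,400-cycle figures from above:

```python
# Reproduce the warranty arithmetic above: how much NAND must absorb
# 22GB/day of writes over 5 years at 1,400 write/erase cycles.
gb_per_day = 22
days = 5 * 365            # ~1,825 days
cycles = 1_400            # assumed MLC write/erase limit

total_written_gb = gb_per_day * days          # ~40,150 GB written over 5 years
nand_fully_cycled_gb = total_written_gb / cycles  # ~28.7 GB worn through completely
print(f"Total written over warranty: {total_written_gb:,.0f} GB")
print(f"NAND fully cycled: ~{nand_fully_cycled_gb:.1f} GB (well under 64GB capacity)")
```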

MLC vs. SLC write endurance from SSD.Toshiba.com

The not so surprising thing about this analysis is that as drive capacity goes up, write endurance concerns diminish because the amount of data that needs to be written daily goes up linearly with the capacity of the SSD. Toshiba’s latest drive announcements offer 64/128/256GB MLC SSDs for the mobile market.

Toshiba studies mobile users write activity

To come at their SSD reliability estimate from another direction, Toshiba’s laptop usage modeling study of over 237 mobile users showed that the “typical” laptop user wrote an average of 2.4GB/day (with auto-save & hibernate on) and a “heavy” laptop user wrote 9.2GB/day under similar settings. Now averages are well and good, but to really put this into perspective one needs to know the workload variability. Nonetheless, their published results do put a rational upper bound on how much data typical laptop users write during a year, which can then be used to compute (MLC) SSD drive reliability.

I must applaud Toshiba for publishing some of their mobile user study information to help us all better understand SSD reliability for this environment. It would have been better to see the complete study, including all the statistics, when it was done and how users were selected, and it would have been really nice to see this study done by a standards body (say SNIA) rather than a manufacturer, but these are all personal nits.

Now, I can’t wait to see a study on write activity for the “heavy” enterprise data center environment, …

Seagate launches their Pulsar SSD

Seagate's Pulsar SSD (seagate.com)

Today Seagate announced their new SSD offering, named the Pulsar SSD.  It uses SLC NAND technology and comes in a 2.5″ form factor at 50, 100 or 200GB capacity.  The fact that it uses a 3Gb/s SATA interface seems to indicate that Seagate is going after the server market rather than the high-end storage market, but different interfaces can be added over time.

Pulsar SSD performance

The main fact that makes the Pulsar interesting is its peak write rate of 25,000 4KB aligned writes per second versus a peak read rate of 30,000.  The 30:25 ratio of peak reads to peak writes represents a significant advance over prior SSDs, and presumably this is achieved through the magic of buffering.  But once we get beyond peak IO buffering, sustained 128KB writes drop to 2,600, 5,300, or 10,500 ops/sec for the 50, 100, and 200GB drives respectively.  It’s kind of interesting that this drops as capacity drops, implying that adding capacity also adds parallelism. Sustained 4KB reads for the Pulsar are speced at 30,000.

In contrast, STEC’s Zeus drive is speced at 45,000 random reads and 15,000 random writes sustained, and 80,000 peak reads and 40,000 peak writes.  So performance-wise, the Seagate Pulsar (200GB) SSD has about ~37% of the peak read and ~63% of the peak write performance, with ~67% of the sustained read and ~70% of the sustained write performance, of the Zeus drive.
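
For the record, here are the ratios behind those percentages, using the ops/sec figures as quoted above:

```python
# Pulsar (200GB) vs Zeus ratios, computed from the spec figures quoted in the post.
pulsar_200gb = {"peak_read": 30_000, "peak_write": 25_000,
                "sust_read": 30_000, "sust_write": 10_500}
zeus = {"peak_read": 80_000, "peak_write": 40_000,
        "sust_read": 45_000, "sust_write": 15_000}

for metric in pulsar_200gb:
    ratio = pulsar_200gb[metric] / zeus[metric]
    print(f"{metric}: Pulsar is {ratio:.1%} of Zeus")
```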

Pulsar reliability

The other item of interest is that Seagate states a 0.44% annual failure rate (AFR), so for a 100-drive Pulsar storage subsystem one Pulsar drive would fail roughly every 2.27 years.  Also, the Pulsar bit error rate (BER) is specified at less than 1 in 10E16 bits when new and less than 1 in 10E15 at end of life.  As far as I can tell, both of these specifications are better than STEC’s specs for the Zeus drive.
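
A quick sketch of what a 0.44% AFR implies for a 100-drive subsystem:

```python
# Expected drive failures implied by a 0.44% annual failure rate (AFR).
afr = 0.0044
drives = 100
failures_per_year = afr * drives              # 0.44 drives/year for 100 drives
years_between_failures = 1 / failures_per_year
print(f"~{failures_per_year:.2f} failures/year, i.e. one roughly every "
      f"{years_between_failures:.2f} years")
```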

Both the Zeus and Pulsar drives support a 5 year limited warranty.  But if the Pulsar is indeed a more reliable drive as indicated by their respective specifications, vendors may prefer the Pulsar as it would require less service.

All this seems to say that reliability may become a more important factor in vendor SSD selection. I suppose once you get beyond 10K read or write IOPS per drive, performance differences just don’t matter that much. But a BER of 10E14 vs 10E16 may make a significant difference to product service cost and, as such, may make changing SSD vendors much easier to justify. Seagate seems to be opening up a new front in the SSD wars – drive reliability.

Now if they only offered 6Gb/s SAS or 4GFC interfaces…

7 grand challenges for the next storage century

Clock tower (4) by TJ Morris (cc) (from flickr)

I saw a recent IEEE Spectrum article on engineering’s grand challenges for the next century and thought something similar should be done for data storage. So this is a start:

  • Replace magnetic storage – most predictions show that magnetic disk storage has another 25 years and magnetic tape another decade after that before they run out of steam. Such end-dates have been wrong before, but it is unlikely that we will be using disk or tape 50 years from now. Some sort of solid state device seems most probable as the next evolution of storage. I doubt this will be NAND, considering its write endurance and other long-term reliability issues, but if such issues could be resolved maybe it could replace magnetic storage.
  • 1000 year storage – paper can be printed today with non-acidic ink and retain its image for over 1,000 years. Nothing in data storage today can claim much more than 100 years of longevity. The world needs data storage that lasts much longer than 100 years.
  • Zero energy storage – today SSD/NAND and rotating magnetic media consume energy constantly in order to be accessible. Ultimately, the world needs some sort of storage that consumes energy only when read or written; such storage would provide “online access with offline power consumption”.
  • Convergent fabrics running divergent protocols – whether it’s Ethernet, InfiniBand, FC, or something new, all fabrics should be able to handle any and all storage (and datacenter) protocols. The internet has become so ubiquitous because it handles just about any protocol we throw at it. We need the same or something similar for datacenter fabrics.
  • Securing data – securing books or paper is relatively straightforward today: just throw them in a vault/safety deposit box. Securing data seems simple, yet it is not widely practiced today. It doesn’t have to be that way. We need better, longer lasting tools and methodologies to secure our data.
  • Public data repositories – libraries exist to provide access to the output of society in the form of books, magazines, papers and other printed artifacts. No such repository exists today for data. Society would be better served if there were library-like institutions that could store and provide access to data. Most of the obstacles here are legal, due to data ownership, but technological issues exist as well.
  • Associative accessed storage – sequential and random access have been around for over half a century now. Associative storage could complement these, providing another approach that allows storage to be retrieved by its content. We can kind of do this today by keywording and indexing data. Biological memory is accessed via associations or linkages to other concepts, and once accessed, memory seems almost sequentially accessed from there. Something comparable to biological memory may be required to build more intelligent machines.

Some of these are already being pursued and yet others receive no interest today. Nonetheless, I believe they all deserve investigation, if storage is to continue to serve its primary role to society, as a long term storehouse for society’s culture, thoughts and deeds.

Comments?