Why virtualize now?

HP servers by lilit (School of Electrical Engineering, University of Belgrade)
I suppose it’s obvious to most analysts why server virtualization is such a hot topic these days. Most IT shops purchase servers today that are way overpowered for the single applications they run. All that capacity is wasted, and these servers could easily run multiple applications if only an operating system could run them together without interference.

Enter virtualization. With virtualization, hypervisors can run multiple applications concurrently, and sometimes simultaneously, on the same hardware server without compromising application execution integrity. Multiple virtual machine applications execute on a single server under a hypervisor that isolates them from one another. Thus, they all run together on the same hardware without impacting each other.

But why doesn’t the O/S do this?

Most computer purists would ask why not just run the multiple applications under the same operating system. But the operating systems that run servers nowadays weren’t designed to run multiple applications together and, as such, weren’t designed to isolate them properly.

Virtualization hypervisors have had a clean slate to execute and isolate multiple applications. Thus, virtualization is taking over the data center floor. As new servers come in, old servers are retired and the applications that used to run on them are consolidated onto fewer and fewer physical servers.

Why now?

Current hardware trends dictate that each new generation of server has more processing power and oftentimes, more processing elements than previous generations. Today’s applications are getting more sophisticated but even with added sophistication, they do not come close to taking advantage of all the processing power now available. Hence, virtualization wins.

What seems to be happening nowadays is that while data centers started out consolidating tier 3 applications through virtualization, they are now starting to consolidate tier 2 applications, and tier 1 apps are not far down this path. But tier 2 and tier 1 applications require more dedicated services, more processing power, and more deterministic execution times, and thus require more sophisticated virtualization hypervisors.

As such, VMware and others are responding by providing more hypervisor sophistication, e.g., more ways to dedicate and split up the processing, networking and storage available to the physical server for virtual machine or application use. In doing so they are preparing for a point in the not too distant future when tier 1 applications run with all the comforts of a dedicated server environment but actually execute alongside other VMs on a single physical server.

VMware vSphere

We can see the start of this trend with the latest offering from VMware, vSphere. This product now supports more processing hardware, more networking options and stronger storage support. vSphere can also dedicate more processing elements to virtual machines. Such new features make it easier to support tier 2 applications today and tier 1 applications sometime in the future.

ESRP results for 1K and under mailboxes – chart of the month

Top 10 ESRP database transfers/sec

As described more fully in last month’s SCI newsletter, the chart above depicts Exchange Solution Reviewed Program (ESRP) results for up to 1,000 mailboxes in the database reads and writes per second category. This top 10 chart is dominated by HP’s new MSA 2000fc G2 product.

Microsoft will tell you that ESRP is not to be used to compare one storage vendor against another but more as a proof of concept to show how some storage can support a given email workload. The nice thing about ESRP, from my perspective, is that it represents a realistic storage workload rather than the more synthetic workloads offered by the other benchmarks.

What does over 3,000 Exchange database operations per second mean to the normal IT shop or email user? It should mean more emails per hour can be sent/received with less hardware. It should mean a higher capacity to service email clients. It should mean a happier IT staff.
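
As a rough, purely illustrative translation of that number into serviceable mailboxes (the per-mailbox I/O rate below is an assumption for the sketch, not a figure from the ESRP report):

    # Rough translation of database transfers/sec into mailbox capacity.
    # The per-mailbox I/O rate is an assumed figure for illustration only.

    db_ops_per_sec        = 3000     # headline result from the chart
    assumed_iops_per_mbox = 0.5      # hypothetical real-world Exchange I/O per mailbox

    supported_mailboxes = db_ops_per_sec / assumed_iops_per_mbox
    print(f"~{supported_mailboxes:,.0f} mailboxes at {assumed_iops_per_mbox} IOPS each")
    # -> ~6,000 mailboxes at 0.5 IOPS each, under this assumed profile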

But does it mean happier end-users?

I would show my other chart from this latest dispatch that has read latency on it, but that would be two charts. Anyway, what the top 10 read latency chart would show is that EMC CLARiiON dominates with the overall lowest latency, holding the top 9 positions with various versions of CLARiiON and replication alternatives reported in ESRP results. Those nine CLARiiON subsystems had latencies of around 8-11 msec. The one CLARiiON on the chart above (CX3-20, #7 in the top 10) had a read latency of around 9 msec. and a write latency of 5 msec. In contrast, the HP MSA had a read latency of 16 msec. with a write latency of 5 msec. – very interesting.

What this says is that database transfers per second are really a throughput measure: even though a single database operation may take almost 2X longer (16 vs. 9 msec. read latency), a subsystem can still perform more database transfer operations per second due to concurrency. Almost makes sense.
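
A back-of-the-envelope way to see how higher latency can coexist with higher throughput, using a simple Little’s Law style model; the outstanding-I/O counts below are made-up numbers for illustration, not anything reported in ESRP:

    # Little's Law sketch: throughput (ops/sec) = outstanding I/Os / per-op latency.
    # The queue depths below are illustrative assumptions, not ESRP data.

    def ops_per_sec(outstanding_ios: int, latency_sec: float) -> float:
        """Achievable operations/sec for a given queue depth and per-op latency."""
        return outstanding_ios / latency_sec

    # Lower-latency subsystem with modest concurrency
    print(ops_per_sec(outstanding_ios=24, latency_sec=0.009))   # ~2,667 ops/sec at 9 msec

    # Higher-latency subsystem, but with more I/Os in flight
    print(ops_per_sec(outstanding_ios=56, latency_sec=0.016))   # 3,500 ops/sec at 16 msec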

Are vendors different?

This probably says something more about the focus of the two storage vendors’ engineering groups – EMC CLARiiON on getting data to you the fastest and HP MSA on getting the most data through the system. It might also speak to what the vendors’ ESRP teams were trying to show. In any case, EMC’s CLARiiON and HP’s MSA have very different performance profiles.

Which vendor’s storage product makes the best sense for your Exchange servers – that’s the more significant question.

The full report will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – just subscribe by emailing us.

What's holding back the cloud?

Cloud whisps by turtlemom4bacon

Steve Duplessie’s recent post on how the lack of scarcity will be a game changer got me thinking. Free is good, but the simplicity of the user/administrative interface is worth paying for. And it’s that simplicity that pays off for me.

Ease of use

I agree wholeheartedly with Steve about what and where people should spend their time today. TweetDeck, the Mac, and the iPhone are three key examples that make my business life easier (most of the time).

  • TweetDeck allows me to filter who I am following all while giving me access to any and all of them.
  • The Mac leaves me much more time to do what needs to be done and allows me to spend less time on non-essential stuff.
  • The iPhone has 1000s of apps which make my idle time that much more productive.

Nobody would say any of these things are easy to create, and for most of them (TweetDeck is free at the moment) I pay a premium. All of these products hide significant complexity behind the simple user and administrative interfaces they supply.

The iPhone is probably the closest to the cloud from my perspective. But it performs poorly (compared to broadband) and service (AT&T?) is spotty. Now these are nuisances that can be lived with in a cell phone. If this were my only work platform, they would be deadly.

Now the cloud may be easy to use because it removes the administrative burden, but that’s only one facet of using it. I assume using most cloud services is as easy as signing up on the web and then recoding applications to use the cloud provider’s designated API. This doesn’t sound easy to me. (Full disclosure: I am not a current cloud user and thus cannot speak to its ease of use.)

Storm clouds

However, today the cloud is not there for other reasons – availability concerns, security concerns, performance issues, etc. All these are inhibitors today and need to be resolved before the cloud can reach the mainstream or maybe become my platform of choice. Also, I have talked before about some other issues with the cloud.

Aside from those inhibitors, the other main problem with the cloud is the lack of applications I need to do business today. Google Apps and MS Office over the net are interesting but not sufficient. I’m not sure what is sufficient, and that would depend on your line of business, but server and desktop platforms had the same problem when they started out. However, servers and desktops have evolved over time from killer apps to providing needed application support. The cloud will no doubt follow, over time.

In the end, the cloud needs to both grow up and evolve to host my business model and, I would presume, many others as well. Personally, I don’t care whether my data and apps are hosted on the cloud or hosted on office machines. What matters to me are security, reliability, availability, and usability. When the cloud can support me in the same way that the Mac can, then who hosts my applications will be a purely economic decision.

The cloud and net are just not there yet.

STEC’s MLC enterprise SSD

So Many Choices by Robert S. Donovan

I haven’t seen much of a specification on STEC’s new enterprise MLC SSD but it should be interesting.  So far everything I have seen seems to indicate that it’s a pure MLC drive with no SLC  NAND.  This is difficult for me to believe but could easily be cleared up by STEC or their specifications.  Most likely it’s a hybrid SLC-MLC drive similar, at least from the NAND technology perspective, to FusionIO’s SSD drive.

MLC write endurance issue

My difficulty with a pure MLC enterprise drive is the write endurance factor. MLC NAND can only endure around 10,000 erase/program passes before it starts losing data. With a hybrid SLC-MLC design, one could have the heavily written data go to SLC NAND, which has a 100,000 erase/program pass lifecycle, and have the less heavily written data go to MLC. Sort of like a storage subsystem “fast write” which writes to cache first and then destages to disk, but in this case the destage may never happen if the data is written often enough.
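
A minimal sketch of what such a placement policy could look like, assuming the controller tracks per-block rewrite counts; the threshold and data structures are purely illustrative, not anything STEC has disclosed:

    # Hypothetical hybrid SLC/MLC placement: frequently rewritten logical blocks stay
    # in SLC, colder blocks get destaged to MLC. All thresholds are made-up numbers.

    from collections import defaultdict

    HOT_WRITE_THRESHOLD = 8          # rewrites in a window before a block counts as "hot"

    write_counts = defaultdict(int)  # logical block address -> recent rewrite count
    slc_region, mlc_region = {}, {}  # stand-ins for the two NAND pools

    def write_block(lba: int, data: bytes) -> None:
        write_counts[lba] += 1
        if write_counts[lba] >= HOT_WRITE_THRESHOLD:
            slc_region[lba] = data          # heavy-write data lands in SLC
            mlc_region.pop(lba, None)
        else:
            mlc_region[lba] = data          # colder data goes to MLC

    def destage_cold_blocks() -> None:
        """Move blocks that have cooled off from SLC down to MLC, then reset the window."""
        for lba in list(slc_region):
            if write_counts[lba] < HOT_WRITE_THRESHOLD:
                mlc_region[lba] = slc_region.pop(lba)
            write_counts[lba] = 0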

The only flaw in this argument is that as SSD drives get bigger (STEC’s drive is available in capacities up to 800GB) this becomes less of an issue. With more raw storage, the fact that a small portion of the data is very actively written gets swamped by the fact that there is plenty of storage to hold this data. As such, when one NAND cell gets close to its lifetime, another, younger cell can be used instead. This process is called wear leveling. STEC’s current SLC Zeus drive already has sophisticated wear leveling to deal with this sort of problem for SLC SSDs, and doing this for MLC just means having larger tables to work with.
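
For illustration, the core idea of wear leveling can be sketched in a few lines; the block count and endurance figure are assumptions, and a real flash translation layer would use far more efficient data structures:

    # Wear-leveling sketch (illustrative only): always program the block with the
    # fewest erase cycles, so no single block hits its endurance limit far ahead of the rest.

    MLC_ENDURANCE = 10_000                              # approximate erase/program cycles per block

    erase_counts = {blk: 0 for blk in range(10_000)}    # physical block -> erase count

    def pick_block_for_write() -> int:
        """Choose the least-worn block; a real FTL keeps this in sorted/indexed structures."""
        blk = min(erase_counts, key=erase_counts.get)
        erase_counts[blk] += 1
        if erase_counts[blk] > MLC_ENDURANCE:
            raise RuntimeError(f"block {blk} exceeded rated endurance")
        return blk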

I guess at some point, with multi-TB drives, the fact that MLC cannot sustain more than 10,000 erase/write passes becomes moot, because there just isn’t that much actively written data out there in an enterprise shop. When you amortize the highly written data as a percentage of a drive, the more drive capacity there is, the smaller the active data percentage becomes. As such, as SSD drive capacities get larger this becomes less of an issue. I figure with 800GB drives the active data proportion might still be high enough to cause a problem, but then again it might not be an issue at all.

Of course, with MLC it’s also cheaper to over-provision NAND storage to further help with write endurance. For an 800GB MLC SSD, you could easily add another 160GB (20% over-provisioning) fairly cheaply. As such, over-provisioning will also allow you to sustain an overall drive write endurance that is much higher than the individual NAND cell write endurance.
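
A back-of-the-envelope endurance calculation, assuming perfect wear leveling; the write-amplification factor and workload figures are assumptions for the sketch, not STEC specifications:

    # Rough drive-lifetime estimate under over-provisioning, assuming perfect wear leveling.
    # All inputs are illustrative assumptions, not published STEC specifications.

    usable_gb        = 800            # advertised capacity
    overprovision_gb = 160            # extra NAND (20%) the host never sees
    endurance_cycles = 10_000         # approximate MLC erase/program cycles
    write_amp        = 2.0            # assumed internal write amplification

    total_nand_gb      = usable_gb + overprovision_gb
    lifetime_writes_tb = total_nand_gb * endurance_cycles / write_amp / 1_000

    host_writes_gb_per_day = 400      # assumed workload
    years = lifetime_writes_tb * 1_000 / host_writes_gb_per_day / 365

    print(f"~{lifetime_writes_tb:,.0f} TB of host writes, roughly {years:.0f} years at "
          f"{host_writes_gb_per_day} GB/day")
    # -> ~4,800 TB of host writes, roughly 33 years at 400 GB/day (under these assumptions)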

Another solution to the write endurance problem is to increase the power of ECC to handle write failures. This would probably take some additional engineering and may or may not be in the latest STEC MLC drive but it would make sense.

MLC performance

The other issue with MLC NAND is that it has slower read and erase/program cycle times. Now these are still orders of magnitude faster than standard disk, but slower than SLC NAND. For enterprise applications, SLC SSDs are blistering fast and are often performance limited by the subsystem they are attached to. So the fact that MLC SSDs are somewhat slower than SLC SSDs may not even be perceived by enterprise shops.

MLC performance is slower because it takes longer to read a cell holding multiple bits than it takes to read one holding just a single bit. MLC, in one technology I am aware of, encodes 2 bits in the voltage that is programmed into or read out of a cell, e.g., VoltageA = “00”, VoltageB = “01”, VoltageC = “10”, and VoltageD = “11”. This gets more complex with 3 or more bits per cell, but the logic holds. With multiple voltage levels, determining which level is present is more complex for MLC and hence takes longer to perform.
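
To make that concrete, here is a toy read routine following the VoltageA through VoltageD example above; the threshold voltages are made-up numbers purely for illustration:

    # Toy MLC vs. SLC read: recovering 2 bits requires comparing the sensed voltage
    # against three thresholds, vs. a single threshold for SLC. Voltages are made up.

    def read_mlc_cell(voltage: float) -> str:
        """Map a sensed voltage to 2 bits via three threshold comparisons."""
        if voltage < 1.0:       # VoltageA band
            return "00"
        elif voltage < 2.0:     # VoltageB band
            return "01"
        elif voltage < 3.0:     # VoltageC band
            return "10"
        else:                   # VoltageD band
            return "11"

    def read_slc_cell(voltage: float) -> str:
        """SLC needs only one comparison, hence the faster read."""
        return "0" if voltage < 1.5 else "1"

    print(read_mlc_cell(2.4), read_slc_cell(2.4))   # -> 10 1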

In the end, I would expect STEC’s latest drive to be some sort of SLC-MLC hybrid, but I could be wrong. It’s certainly possible that STEC has gone with just an MLC drive and beefed up the capacity, over-provisioning, ECC, and wear leveling algorithms to handle its lack of write endurance.

MLC takes over the world

But the major factor favoring MLC in SSDs is that MLC technology is driving the NAND market. All those items in the photo above are most probably using MLC NAND, if not today then certainly tomorrow. As such, the consumer market will be driving MLC NAND manufacturing volumes way above anything the SLC market requires. Such volumes will ultimately make it unaffordable to manufacture and use any other type of NAND, namely SLC, in most applications, including SSDs.

So sooner or later all SSDs will be using only MLC NAND technology. I guess the sooner we all learn to live with that the better for all of us.

XAM and data archives

Vista de la Biblioteca Vasconcelos by Eneas

XAM, a SNIA-defined interface standard supporting reference data archives, is starting to become real. EMC and other vendors are starting to supply XAM-compliant interfaces. I could not locate any application vendors supporting XAM APIs (my Twitter survey of application vendors came back empty), but it’s only a matter of time. What does XAM mean for your data archive?

The problem

Most IT shops with data archives use special purpose applications that support a vendor-defined proprietary interface to store and retrieve data from a dedicated archive appliance. For example, many email archives support EMC Centera, which has defined a proprietary Centera API to store and retrieve data from the appliance. Most other archive storage vendors have followed suit, leading to proprietary vendor lock-in which slows adoption.

However, some proprietary APIs have been front-ended with something like NFS. The problem with NFS and other standard file interfaces is that they were never meant for reference data (data that does not change). So when you try to update an archived file, you often get some sort of weird system error.

Enter XAM

XAM, by contrast, was designed from the start for reference data. Moreover, XAM supports concurrent access to multiple vendor archive storage systems from the same application. As such, an application supplier need only code to one standard API to gain access to multiple vendors’ archive systems.
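
To illustrate the idea only (this is a hypothetical sketch of a vendor-neutral archive interface, not the actual SNIA XAM C or Java bindings; all class and method names here are invented):

    # Hypothetical sketch of "one API, many archive back ends". These names are
    # invented for illustration and are NOT the real SNIA XAM C/Java API.

    from abc import ABC, abstractmethod

    class ArchiveSystem(ABC):
        """What each vendor would implement underneath the common interface."""
        @abstractmethod
        def store(self, content: bytes, metadata: dict) -> str: ...
        @abstractmethod
        def retrieve(self, object_id: str) -> bytes: ...

    class EmailArchiver:
        """Application code written once against the common interface."""
        def __init__(self, archive: ArchiveSystem):
            self.archive = archive

        def archive_message(self, raw_message: bytes, sender: str) -> str:
            # Reference data: stored once with metadata, never updated in place.
            return self.archive.store(raw_message, {"sender": sender, "type": "email"})

    # Swapping archive vendors then means swapping the ArchiveSystem implementation,
    # e.g. EmailArchiver(VendorAArchive()) vs. EmailArchiver(VendorBArchive()),
    # rather than recoding the application against another proprietary API.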

SNIA released the V1.0 XAM interface specification last July, which defines the XAM architecture and C- and Java-language APIs for both the application and the storage vendor, although from the looks of it the C version of the vendor API is more complete.

However, currently I can only locate two archive storage vendors having released support for the XAM interface (EMC Centera and SAND/DNA?). A number of vendors have expressed interest in providing XAM interfaces (HP, HDS HCAP, Bycast StorageGrid and others). How soon their XAM API support will be provided is TBD.

I would guess what’s really needed is for more storage vendors to start supporting the XAM interface, which would get the application vendors more interested in supporting XAM. It’s sort of a chicken-and-egg thing, but I believe the storage vendors have the first move; the application vendors will take more time to see the need.

Does anyone know what other storage vendors support XAM today? Is there any single place where one could even find out? Ditto for applications supporting XAM today?

Toshiba’s New MLC NAND Flash SSDs

Toshiba has recently announced a new series of SSDs based on MLC NAND (Yahoo Biz story). This is only the latest in a series of MLC SSDs which Toshiba has released.

Historically, MLC (multi-level cell) NAND has supported higher capacity but has been slower and less reliable than SLC (single-level cell) NAND. The capacity points supplied for the new drives (64, 128, 256, & 512GB) reflect the higher density NAND. Toshiba’s performance numbers for the new drives also look appealing but are probably overkill for most desktop/notebook/netbook users.

Toshiba’s reliability specifications were not listed in the Yahoo story and would probably be hard to find elsewhere (I looked on the Toshiba America website and couldn’t locate any). However, the duty cycle for a desktop/notebook data drive is not that severe, so the fact that MLC can only endure ~1/10th the writes that SLC can endure is probably not much of an issue.

SNIA is working on SSD reliability (or SSS reliability, as SNIA calls it; see the SNIA SSSI forum website) but has yet to publish anything externally. I’m unsure whether they will break out MLC vs. SLC drives, but it’s certainly worthy of discussion.

But the advantage of MLC NAND SSDs is that they should be 2 to 4X cheaper than SLC SSDs, depending on the number (2, 3 or 4) of bits per cell, and as such, more affordable. This advantage can be reduced by the need to over-provision the device and add more parallelism in order to improve MLC reliability and performance. But both of these facilities are becoming more commonplace and so should be relatively straightforward to support in an SSD.
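
A quick sketch of that cost arithmetic; the SLC cost figure and over-provisioning ratio are assumptions for illustration, not Toshiba or market pricing:

    # Rough $/GB comparison. The SLC cost and over-provisioning ratio are assumed
    # figures for illustration only, not actual pricing.

    slc_cost_per_gb = 8.00                  # assumed SLC NAND cost
    bits_per_cell   = 2                     # MLC storing 2 bits per cell
    overprovision   = 0.20                  # extra NAND reserved for endurance/performance

    mlc_cost_per_gb = slc_cost_per_gb / bits_per_cell          # same cells, twice the bits
    effective_cost  = mlc_cost_per_gb * (1 + overprovision)    # pay for the hidden 20% too

    print(f"SLC ${slc_cost_per_gb:.2f}/GB vs. MLC ~${effective_cost:.2f}/GB usable")
    # -> SLC $8.00/GB vs. MLC ~$4.80/GB usable, under these assumptions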

The question remains whether, given the reliability differences, MLC NAND will ever become reliable enough for enterprise-class SSDs. Although many vendors make MLC NAND SSDs for the notebook/desktop market (Intel, SanDisk, Samsung, etc.), FusionIO is probably one of the few using a combination of SLC and MLC NAND for enterprise-class storage (see the FusionIO press release), although calling the FusionIO device an SSD is probably a misnomer. What FusionIO does to moderate MLC endurance issues is not clear, but buffering write data to SLC NAND must certainly play some part.

Testing storage systems – Intel’s SSD fix

Intel’s latest (35nm NAND) SSD shipments were halted today because a problem was identified when modifying BIOS passwords (see IT PRO story). At least they gave a timeframe for a fix – a couple of weeks.

The real question is whether products can be tested sufficiently these days to ensure they work in the field. Many companies today will ship product to end-user beta testers to work out the bugs before the product reaches the field. But beta testing has got to be complemented with active product testing and validation. As such, unless you plan to get 100s or perhaps 1000s of beta testers, you could have a serious problem with field deployment.

And therein lies the problem: software products are relatively cheap and easy to beta test, just set up a download site and have at it. But with hardware products, beta testing actually involves sending product to end users, which costs quite a bit more to support. So I understand why Intel might be having problems with field deployment.

So if you can’t beta test hardware products as easily as software, then you have to have a much more effective test process. Functional testing and validation is more of an art than a science and can cost significant money and, more importantly, time. All of which brings us back to some form of beta testing.

Perhaps Intel could use their own employees as beta testers, rotating new hardware products from one organization to another over time to get some variability in the use of a new product. Many companies use their new product hardware extensively in their own data centers to validate functionality prior to shipment. In the case of Intel’s SSD drives, these could be placed in the innumerable servers/desktops that Intel no doubt has throughout its corporation.

One can argue whether beta testing takes longer than extensive functional testing. However, given today’s diverse deployments, I believe beta testing can be a more cost-effective process when done well.

Intel is probably trying to figure out just what went wrong in their overall testing process today. I am sure, given their history, they will do better next time.

Digital Rosetta Stone vs. 3D barcodes

The BBC reported today on a new way to store digital data for 1000 years coming out of Japan (BBC NEWS | Technology | ‘Rosetta stone’ offers digital lifeline). Personally, I don’t feel that silicon storage is the best answer to this problem, and “wireless” read-back may be problematic over protracted periods of time.

Something more like a 3-dimensional barcode makes a lot more sense to me. Such a recording device could easily record a lot more data than paper does today, be readable via laser scans, microscopes, or other light-based mechanisms, and, by being a physical representation, could be manufactured out of many different materials.

That’s not to say silicon might not be a good material, lasting for a long time. The article did not go into detail on how the data was recorded, but presumably this etched storage device somehow traps a charge in a particular cell that can be read back electronically – not unlike NAND flash does today, but with much better reliability. It is unclear to me, though, why the article states that humidity surrounding the Digital Rosetta Stone device impairs storage longevity. This seems to imply that even though the device is sealed, it can still be impacted by external environmental conditions.

That’s why having a recording format that can be applied to many types of materials makes more sense to me. Such a device could conceivably be etched out of marble, ceramics, steel, or any number of other materials. Marble has lasted for millennia in Greece, Italy, and other places. Of course, marble is subject to weather and acid rain. But the point is that by having multiple substances that can record data for long periods, all using the same recording format and read-back mechanisms, we can ensure that at least some of them retain data far into the future. Such a 3D barcode could also be sealed in a transparent medium such as glass, which has also been known to last centuries.

Today, 3D barcodes can be attached to the surface of a cube, but they could just as easily be attached to a plate, disk, or page. Once attached (or printed), they could easily record vast amounts of data.

In my view, magnetic storage cannot last for over 50 years and electronic storage will not last over 100 years; the only thing I know of that can last 1,000 years is some physical mechanism. The 3D barcode easily emerges as the answer to this storage problem.