ESRP 1K to 5Kmbox performance – chart of the month

ESRP 1001 to 5000 mailboxes, database transfers/second/spindle

One astute reader of our performance reports pointed out that some ESRP results could be skewed by the number of drives used during a run.  So we included a database transfers per spindle chart in our latest Exchange Solution Reviewed Program (ESRP) report on 1001 to 5000 mailboxes in our newsletter.  The chart shown here is reproduced from that report and shows the overall database transfers attained (total of reads and writes) per spindle for the top 10 storage subsystems reporting in the latest ESRP results.

This cut of the system performance shows a number of diverse systems:

  • Some storage systems had 20 disk drives and others had 4.
  • Some of these systems were FC storage (2), some were SAS-attached storage (3), but most were iSCSI storage.
  • Mailbox counts supported by these subsystems ranged from 1400 to 5000 mailboxes.

What’s not shown is the speed of the disk spindles. Also, none of these systems use SSDs or NAND cards to help sustain their respective workloads.

A couple of surprises here:

  • We would have expected iSCSI systems to show up much worse than FC storage. True, the number 1 system (NetApp FAS2040) is FC while numbers 2 and 3 are iSCSI, but the differences are not that great.  It would seem that protocol overhead is not a large determinant of spindle performance for ESRP workloads.
  • The number of drives used also doesn’t seem to matter much.  The FAS2040 had 12 spindles while the AX4-5i had only 4.  Although this per-spindle cut of the data should minimize drive count variability, one would think that more drives would still result in higher overall performance across all drives.
  • Such performance approaches the limit of what a 15Krpm drive can sustain.  No doubt some of this is helped by system caching, but no amount of cache can hold all the database read and write data for the duration of a Jetstress run.  It’s still pretty impressive, considering a typical 15Krpm drive (e.g., Seagate 15K.6) can probably do ~172 random 64KB block IOs/second. The NetApp FAS2040 hit almost 182 database transfers/second/spindle; perhaps not 64KB blocks and maybe not completely random, but impressive nonetheless (see the arithmetic sketch after this list).
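For the arithmetically inclined, here is a minimal sketch of the per-spindle calculation and how it stacks up against that nominal 15Krpm drive ceiling. The only numbers used are the ones quoted above; the helper function is mine, purely for illustration.

```python
# Rough per-spindle arithmetic for ESRP database transfers (illustrative only).

def transfers_per_spindle(total_db_transfers_per_sec: float, spindles: int) -> float:
    """Aggregate read+write database transfers divided by drive count."""
    return total_db_transfers_per_sec / spindles

# NetApp FAS2040 example from the chart: ~182 transfers/sec/spindle on 12 drives.
fas2040_total = 182.0 * 12          # back out the aggregate from the per-spindle figure
per_spindle = transfers_per_spindle(fas2040_total, 12)

# Nominal ceiling for a 15Krpm drive doing random 64KB IOs (e.g., Seagate 15K.6).
drive_limit_iops = 172.0

print(f"per spindle: {per_spindle:.0f} transfers/sec "
      f"({per_spindle / drive_limit_iops:.0%} of a ~{drive_limit_iops:.0f} IO/sec drive ceiling)")
```

Running above the raw drive ceiling like this is exactly why caching (and smaller or more sequential transfers) has to be part of the explanation.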

The other nice thing about this metric is that it doesn’t correlate that well with any of the other ESRP metrics we track, such as aggregate database transfers, database latencies, database backup throughput, etc. So it seems to measure a completely different dimension of Exchange performance.

The full ESRP report went out to our newsletter subscribers last month and a copy will be up on the dispatches page of our website later this month. However, you can get this information now, and receive future full reports even earlier, by subscribing to our newsletter via email.

As always, we welcome any suggestions on how to improve our analysis of ESRP or any of our other storage system performance results.  This new chart was a result of one such suggestion.

Latest SPECsfs2008 CIFS performance – chart of the month

Above we reproduce a chart from our latest StorInt(tm) Dispatch newsletter on SPECsfs(R) 2008 benchmark results.  This chart shows the top 10 CIFS throughput benchmark results as of the end of last year.  As can be seen in the chart, Apple’s Xserve running Snow Leopard took top honors with over 40K CIFS throughput operations per second.  My problem with this chart is that there are no enterprise-class systems represented in the top 10, or for that matter (not shown above) in any CIFS result.

Now some would say it’s still early in the life of the 2008 benchmark, but it has been out for 18 months now and still has not seen a single enterprise-class system submission.  Possibly CIFS is not considered an enterprise-class protocol, but I can’t believe that given the proliferation of Windows.  So what’s the problem?

I have to believe it’s part tradition, part not wanting to look bad, and part just lack of awareness on the part of CIFS users.

  • Traditionally, NFS benchmarks were supplied by SPECsfs and CIFS benchmarks were supplied elsewhere, i.e., by NetBench. However, there never was a central repository for NetBench results, so comparing system performance was cumbersome at best.  I believe that’s one reason for SPECsfs’s CIFS benchmark: seeing the lack of a central repository for a popular protocol, SPECsfs created their own.
  • Performance on system benchmarks is always a mixed bag.  No one wants to look bad, and any top-performing result is temporary until the next vendor comes along.  So most vendors won’t release a benchmark result unless it shows well for them.  It’s not clear whether Apple’s 40K CIFS ops is a hard number to beat, but it’s been up there for quite a while now, and that has to tell us something.
  • CIFS users seem to be aware of and understand NetBench but don’t have similar awareness of the SPECsfs CIFS benchmark yet.  So, given today’s economic climate, any vendor wanting to impress CIFS customers would probably choose to ignore SPECsfs and spend their $s on NetBench.  The fact that comparing results was nigh impossible could even be considered an advantage by many vendors.

So SPECsfs CIFS just keeps going on.  One way to change this dynamic is to raise awareness: as more IT staff, consultants and vendors discuss SPECsfs CIFS results, awareness will increase.  I realize some of my analysis of CIFS and NFS performance results doesn’t always agree with the SPECsfs party line, but we all agree that this benchmark needs wider adoption.  Anything that can be done to facilitate that deserves my (and their) support.

So for all my friends out there who are storage admins, CIOs and other influencers of NAS system purchases: you need to start asking about SPECsfs CIFS benchmark results.  All my peers out there in the consultant community, get on the bandwagon.  As for my friends in the vendor community, SPECsfs CIFS benchmark results should be part of any new product introduction.  Whether you want to release results is, and always will be, a marketing question, but you all should be willing to spend the time and effort to see how well new systems perform on this and other benchmarks.

Now if I could just get somebody to define an iSCSI benchmark, …

Our full report on the latest SPECsfs 2008 results, including both NFS and CIFS performance, will be up on our website later this month.  However, you can get this information now and receive future full reports even earlier by subscribing to our newsletter: just email us at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

Latest SPC-2 results – chart of the month

SPC-2* benchmark results, spider chart for LFP, LDQ and VOD throughput

Our latest SPC-2 (Storage Performance Council-2) benchmark results chart displays the top ten subsystems in aggregate MBPS(TM), broken down into Large File Processing (LFP), Large Database Query (LDQ) and Video On Demand (VOD) throughput. One problem with this chart is that it really shows only 4 subsystems: HDS and their OEM partner HP; the IBM DS5300 and Sun 6780 w/8GFC at RAID 5&6, which appear to be the same OEMed subsystem; the IBM DS5300 and Sun 6780 w/4GFC at RAID 5&6, which also appear to be the same OEMed subsystem; and the IBM SVC 4.2 (with IBM 4700s behind it).

What’s interesting about this chart is what’s going on at the top end. Both HDS (#1 and #2) and the IBM SVC (#3) seem to have found some secret sauce for performing better on the LDQ workload, or conversely some dumbing down of the other two workloads (LFP and VOD). According to the SPC-2 specification:

  • LDQ is a workload consisting of 1024KiB and 64KiB transfers, whereas LFP consists of 1024KiB and 256KiB transfers and VOD consists of 256KiB transfers only, so transfer size doesn’t tell the whole story.
  • LDQ has a lower write proportion (~1%), attempting to look like joining two tables into one or scanning a data warehouse to create output, whereas LFP has a read rate of 50% (R:W of 1:1) across a write-only phase, a read-write phase and a read-only phase, and VOD is a 100% read-only workload mimicking streaming video.
  • 50% of the LDQ workload uses 4 I/Os outstanding and the remainder 1 I/O outstanding, LFP uses only 1 I/O outstanding, and VOD uses 8 I/Os outstanding (the three profiles are summarized in the sketch after this list).
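The differences are easier to see side by side, so here is the same information restated as a small data structure. The field names are mine and the percentages are as characterized above, not text quoted from the SPC-2 specification.

```python
# SPC-2 workload profiles as characterized above (illustrative summary, not spec text).
spc2_workloads = {
    "LFP": {  # Large File Processing
        "transfer_sizes_kib": [1024, 256],
        "read_pct": 50,                # write-only, read-write and read-only phases overall
        "outstanding_ios": [1],
    },
    "LDQ": {  # Large Database Query
        "transfer_sizes_kib": [1024, 64],
        "read_pct": 99,                # roughly 1% writes
        "outstanding_ios": [4, 1],     # half the workload at 4, the remainder at 1
    },
    "VOD": {  # Video On Demand
        "transfer_sizes_kib": [256],
        "read_pct": 100,
        "outstanding_ios": [8],
    },
}

for name, profile in spc2_workloads.items():
    print(name, profile)
```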

These seem to be the major differences between the three workloads. I would have to say that some sort of caching sophistication is evident in the HDS and SVC systems that is less present in the remaining systems. I was hoping to provide some guidance as to what that sophistication looked like, but:

  • I was going to say they must have a better sequential detection algorithm, but the VOD, LDQ and LFP workloads have 100%, 99% and 50% read ratios respectively, and sequential detection should perform better with VOD and LDQ than with LFP. So that’s not all of it.
  • Next I was going to say it had something to do with outstanding I/O counts. But VOD has 8 I/Os outstanding and LFP only 1, so if this were true VOD should perform better than LFP, while LDQ, with phases at 1 and 4 I/Os outstanding, should land somewhere in between. So that’s not all of it.
  • Next I was going to say stream (or file) size is an important differentiator, but the “Segment Stream Size” for all workloads is 0.5GiB. So that doesn’t help.

So now I am at a complete loss to understand why LDQ throughput is so much better than LFP and VOD throughput for HDS and SVC.

I can only conclude that the little write activity (1%) thrown into the LDQ mix is enough to give the backend storage a breather and allow the subsystem to respond better to the other (99%) read activity. Why this would help the top performers so much more than the remaining systems is not entirely evident. But I would add that being able to handle lots of writes or lots of reads is relatively straightforward; handling an unbalanced mixture is harder to do well.

Validating this conjecture would take some effort. I thought it would be easy to understand what’s happening, but as with most performance conundrums, the deeper you look the more confounding the results seem to be.

The full report on the latest SPC results will be up on my website later this year but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

I will be taking the rest of the week off so Happy Holidays to all my readers and a special thanks to all my commenters. See you next week.

ESRP results over 5K mbox – chart of the month

ESRP Results, over 5K mailbox, normalized (per 5Kmbx) read and write DB transfers as of 30 October 2009

In our quarterly study of Exchange Solution Reviewed Program (ESRP) results we show a number of charts to get a good picture of storage subsystem performance under Exchange workloads. The two of most interest to data centers are the normalized and un-normalized database transfer (DB xfer) charts. The problem with un-normalized DB xfer charts is that the subsystem supporting the largest mailbox count normally shows up best, and the rest of the results are highly correlated with mailbox count. In contrast, the normalized view of DB xfers tends to discount high mailbox counts and shows a more even-handed view of performance.
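For anyone wanting to reproduce the normalized view, here is a minimal sketch. It assumes normalization here simply means scaling each result to a per-5,000-mailbox basis; the sample inputs are made up to show the effect, not taken from the chart.

```python
# Normalizing ESRP database transfers to a per-5,000-mailbox basis (assumed method).

def normalize_db_xfers(db_xfers_per_sec: float, mailboxes: int, per_mbx: int = 5000) -> float:
    """Scale a result to the DB transfers it represents per 5,000 mailboxes."""
    return db_xfers_per_sec * per_mbx / mailboxes

# Hypothetical results: a small and a large configuration with the same aggregate rate.
print(normalize_db_xfers(2500.0, mailboxes=5400))    # ~2315 per 5K mailboxes
print(normalize_db_xfers(2500.0, mailboxes=100000))  # 125 per 5K mailboxes
```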

We show above a normalized view of the ESRP results for this category that were available as of last month. A couple of caveats are warranted here:

  • Normalized results don’t necessarily scale – results shown in the chart range from 5,400 mailboxes (#1) to 100,000 mailboxes (#6). While normalization should allow one to see what a storage subsystem could do at any mailbox count, it is highly unlikely that one would configure the HDS AMS2100 to support 100,000 mailboxes, and equally unlikely that one would configure the HDS USP-V to support 5,400 mailboxes.
  • The higher mailbox count results tend to cluster when normalized – with over 20,000 mailboxes, one can no longer just use one big Exchange server, and with multiple servers driving a single storage subsystem, results tend to shrink when normalized. So one should probably compare like mailbox counts rather than depend on normalization alone to iron out mailbox count differences.

There are a number of storage vendors in this top 10. There are no standouts here: the midrange systems from HDS, HP, and IBM seem to hold down the top 5 slots, while the high-end subsystems from EMC, HDS, and 3PAR seem to own the bottom 5.

However, Pillar is fairly unusual in that their 8.5Kmbx result came in at #4 while their 12.8Kmbx result came in at #8, even though the un-normalized results for the two runs appear exactly the same. Which brings up yet another caveat: when comparing two benchmark runs of the same system, normalization may show a difference where none exists.

The full report on the latest ESRP results will be up on our website later this month but if you want to get this information earlier and receive your own copy of our newsletter – just subscribe by emailing us.

Latest SPECsfs2008 results – chart of the month

Top 10 SPEC(R) sfs2008 NFS throughput results as of 25Sep2009

The adjacent chart is from our September newsletter and shows the top 10 NFSv3 throughput results from the latest SPEC(R) sfs2008 benchmark runs published as of 25 September 2009.

There have been a number of announcements of newer SPECsfs2008 results of late, namely Symantec’s FileStore and Avere Systems releases, but those results are not covered here. In this chart, the winner is the NetApp FAS6080 with FCAL disks behind it, clocking in at 120K NFSv3 operations/second. This was accomplished with 324 disk drives using two 10GbE links.

PAM comes out

All that’s interesting of course, but what is even more interesting is NetApp’s results with their PAM II (Performance Acceleration Module) cards. The number 3, 4 and 5 results were all from the same system (FAS3160) with different configurations of disks and PAM II cards. Specifically:

  • The #3 result was a FAS3160 running 56 FCAL disks, with PAM II cards plus DRAM cache totaling 532GB. The system attained 60.5K NFSv3 operations per second.
  • The #4 result was a FAS3160 running 224 FCAL disks, with no PAM II cards and 20GB of DRAM cache. This system attained 60.4K NFSv3 ops/second.
  • The #5 result was a FAS3160 running 96 SATA disks, with PAM II cards plus DRAM cache totaling 532GB. This system also attained 60.4K NFSv3 ops/second.

Similar results can be seen with the FAS3140 systems at #8, 9 and 10. In this case the FAS3140 systems at #9 and #10 used PAM I (non-NAND) cards with 41GB of cache, while the #8 result had no PAM and only 9GB of cache. The #8 result used 224 FCAL disks, #9 used 112 FCAL disks, and #10 used 112 SATA disks. They achieved 40.1K, 40.1K and 40.0K NFSv3 ops/second respectively.
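One way to see the trade-off is to divide the reported throughput by the drive counts listed above. The sketch below does just that; nothing in it comes from NetApp beyond the figures already quoted.

```python
# NFSv3 ops/second per disk drive for the FAS3160 and FAS3140 results discussed above.
results = [
    ("FAS3160 + PAM II, 56 FCAL", 60500, 56),
    ("FAS3160 no PAM, 224 FCAL",  60400, 224),
    ("FAS3160 + PAM II, 96 SATA", 60400, 96),
    ("FAS3140 no PAM, 224 FCAL",  40100, 224),
    ("FAS3140 + PAM I, 112 FCAL", 40100, 112),
    ("FAS3140 + PAM I, 112 SATA", 40000, 112),
]

for label, ops, drives in results:
    print(f"{label:30s} {ops / drives:7.1f} ops/sec per drive")
```

The PAM-equipped configurations squeeze several times more ops/second out of each drive, which is the whole point of the comparison.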

I don’t know how much PAM II cards cost versus FCAL or SATA disks, but there is an obvious trade-off here: you can use fewer FCAL disks, or cheaper SATA disks, and attain the same NFSv3 ops/second performance.

As I understand it, the PAM II cards come in 256GB configurations and you can have 1 or 2 cards in a FAS system. PAM cards act as an extension of the FAS system cache, so all IO workloads can benefit from their performance.

As with all NAND flash, write access is significantly slower than read, and NAND chip reliability has to be actively managed through wear leveling and other mechanisms to create a reliable storage environment. We assume NetApp has either implemented the appropriate logic to support reliable NAND storage or purchased NAND cache with that logic already onboard. In any case, NAND reliability concerns center on write activity more than reads, and by managing the PAM cache to minimize writes, those concerns could largely be avoided.
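Wear leveling itself is conceptually simple. The toy sketch below (my own illustration, not NetApp’s PAM logic) just steers each write to the least-worn block so no single NAND block wears out far ahead of the rest.

```python
# Toy wear-leveling illustration (not NetApp's PAM implementation):
# always direct the next write to the block with the fewest program/erase cycles.

erase_counts = [0] * 8          # per-block program/erase counters for 8 NAND blocks

def write_block(data: bytes) -> int:
    """Pick the least-worn block, 'write' to it, and return its index."""
    block = min(range(len(erase_counts)), key=erase_counts.__getitem__)
    erase_counts[block] += 1    # model the wear of one program/erase cycle
    return block

for _ in range(20):
    write_block(b"payload")

print(erase_counts)             # wear stays even: [3, 3, 3, 3, 2, 2, 2, 2]
```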

The full report on the latest SPECsfs2008 results will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

Full disclosure: I currently have a contract with NetApp on another facet of their storage but it is not on PAM or NFSv3 performance.

Chart of the month: SPC-1 LRT performance results

Chart of the Month: SPC-1 LRT(tm) performance results

The above chart shows the top 12 LRT(tm) (Least Response Time) results for the Storage Performance Council’s SPC-1 benchmark. The vertical axis is LRT in milliseconds (msec.) for the top benchmark runs. As can be seen, the two subsystems from TMS (RamSan-400 and RamSan-320) dominate this category with LRTs significantly less than 2.5 msec. The IBM DS8300 and its turbo cousin come in next, followed by a slew of others.

The 1msec. barrier

Aside from the blistering LRTs of the TMS systems, one significant item in the chart above is that the two IBM DS8300 systems crack the <1msec. barrier using rotating media. I didn’t think I would ever see the day; of course, this happened 3 or more years ago. Still, it’s kind of interesting that there haven’t been more vendors with subsystems that can achieve this.

LRT is probably most useful for high cache hit workloads. For these workloads the data comes directly out of cache, and the only thing between a server and its data is subsystem IO overhead, measured here as LRT.

Encryption cheap and fast?

The other interesting tidbit from the chart is that the DS5300 with full drive encryption (FDE), using drives which I believe come from Seagate, cracks into the top 12 at 1.8 msec., exactly equivalent to the IBM DS5300 without FDE. Now, FDE from Seagate is a hardware drive encryption capability whose overhead might not even be measurable at the subsystem level. Nonetheless, it shows that having data security need not reduce performance.

What is not shown in the above chart is that adding FDE to the base subsystem only costs an additional US$10K (the base DS5300 lists at US$722K and the FDE version at US$732K). That seems like a small price to pay for data security which, in this case, is simply a matter of turning it on, generating keys, and forgetting about it.

FDE is a hard drive feature where the drive itself encrypts all data written to and decrypts all data read from the drive, and it requires a subsystem-supplied drive key at power on/reset. In this way the data is never in plaintext on the drive media itself. If the drive were taken out of the subsystem and attached to a drive tester, all one would see is ciphertext. Similar capabilities have been available in enterprise and SMB tape drives in the past, but to my knowledge the IBM DS5300 FDE result is the first disk storage benchmark run with drive encryption.
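To make the idea concrete, here is a conceptual toy, not Seagate’s or IBM’s actual drive firmware (which does this in dedicated hardware): data only ever hits the media as ciphertext, and reads decrypt on the fly using the key supplied at power on.

```python
# Conceptual FDE toy (not the actual Seagate/IBM implementation): data reaches the
# "media" only as ciphertext; reads decrypt on the fly with the drive key.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

drive_key = AESGCM.generate_key(bit_length=256)   # supplied at power on/reset in real FDE
media = {}                                        # stand-in for the platters: LBA -> ciphertext

def fde_write(lba: int, data: bytes) -> None:
    nonce = os.urandom(12)
    media[lba] = nonce + AESGCM(drive_key).encrypt(nonce, data, None)

def fde_read(lba: int) -> bytes:
    blob = media[lba]
    return AESGCM(drive_key).decrypt(blob[:12], blob[12:], None)

fde_write(0, b"mailbox database page")
print(fde_read(0))        # plaintext, because the subsystem supplied the key
print(media[0][:16])      # what a drive tester would see: ciphertext only
```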

I believe the key manager for DS5300 FDE is integrated within the subsystem. Most shops would need a separate, standalone key manager for more extensive data security, and I believe the DS5300 can also interface with a standalone (IBM) key manager. In any event, it’s still an easy and simple step towards increased data security for a data center.

The full report on the latest SPC results will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – email me at SubscribeNews@SilvertonConsulting.com?Subject=Subscribe_to_Newsletter.

ESRP results 1K and under mailboxes – chart of the month

Top 10 ESRP database transfers/sec

As described more fully in last month’s SCI newsletter, the chart to the left depicts Exchange Solution Reviewed Program (ESRP) results for up to 1000 mailboxes in the database reads and writes per second category. This top 10 chart is dominated by HP’s new MSA 2000fc G2 product.

Microsoft will tell you that ESRP is not to be used to compare one storage vendor against another, but rather as a proof of concept to show how a given storage configuration can support a given email workload. The nice thing about ESRP, from my perspective, is that it represents a realistic storage workload rather than the more synthetic workloads offered by other benchmarks.

What does over 3000 Exchange database operations per second mean to the normal IT shop or email user? It should mean more emails per hour can be sent and received with less hardware. It should mean a higher capacity to service email clients. It should mean a happier IT staff.

But does it mean happier end-users?

I would show my other chart from this latest dispatch, the one with read latency on it, but that would be two charts. Anyway, what the top 10 read latency chart would show is that EMC CLARiiON dominates with the overall lowest latencies, holding the top 9 positions with various versions of CLARiiON and the replication alternatives reported in ESRP results. The 9 CLARiiON subsystems had read latencies of around 8-11 msec. The one CLARiiON in the chart above (CX3-20, #7 in the top 10) had a read latency of around 9 msec. and a write latency of 5 msec. In contrast, the HP MSA had a read latency of 16 msec. with a write latency of 5 msec. – very interesting.

What this says is that database transfers per second is really a throughput measure: even though a single database read operation may take almost 2X longer on the MSA (16 vs. 9 msec.), it can still perform more database transfer operations per second due to concurrency. Almost makes sense (see the sketch below).
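It does make sense once you apply Little’s Law: with enough IOs in flight, throughput is outstanding IOs divided by latency, not simply 1 divided by latency. The sketch below uses the latencies quoted above with hypothetical queue depths, just to show how a higher-latency system can post more transfers per second.

```python
# Throughput from latency and concurrency (Little's Law: throughput = outstanding / latency).

def iops(latency_ms: float, outstanding_ios: int) -> float:
    return outstanding_ios / (latency_ms / 1000.0)

# Read latencies reported above; the queue depths are hypothetical, for illustration only.
print(f"CLARiiON CX3-20 @ 9 ms,  QD 16: {iops(9.0, 16):6.0f} IO/s")
print(f"HP MSA2000fc G2 @ 16 ms, QD 48: {iops(16.0, 48):6.0f} IO/s")
```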

Are vendors different?

This probably says something about the focus of the two vendors’ storage engineering groups – EMC CLARiiON on getting data to you the fastest and HP MSA on getting the most data through the system.  It might also speak to what each vendor’s ESRP team was trying to show. In any case, EMC’s CLARiiON and HP’s MSA have very different performance profiles.

Which vendor’s storage product makes the best sense for your Exchange servers? That’s a more significant question.

The full report will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – just subscribe by emailing us.