Exchange 2010/ESRP 3.0 results – chart of the month

(c) 2010 Silverton Consulting, Inc., All Rights Reserved

Well after last month’s performance reversals and revelations, we now return to our more typical review of the latest Exchange Solution Reviewed Program (ESRP 3.0) results for Exchange 2010.  Microsoft’s new Exchange 2010 has substantially changed the efficiency and effectiveness of Exchange database I/O.  This will necessitate a new round of ESRP results for all vendors to once again show how many Exchange 2010 mail users their storage can support.  IBM was the first vendor to take this on with their XIV and SVC results, but within the last quarter EMC and HP also submitted results.  This marks our first blog review of ESRP 3.0 results.

We show here a chart of database latency for current ESRP 3.0 results.  The three lines for each subsystem show the latency in milliseconds for an ESE database read, database write, and log write.  One may recall from prior ESRP reviews that write latency was impacted by the Exchange redundancy option in use.  In this chart all four subsystems were using database availability group (DAG) redundancy, so write activity should truly reflect subsystem overhead rather than redundancy options.

It’s unclear why IBM’s XIV showed up so poorly here.  The HP EVA 8400 is considered a high-end subsystem, but all the rest are midrange.  If one considers the drives being used, the HP used 15Krpm FC disk drives, the SVC used 15Krpm SAS drives, and both the CLARiiON and the XIV used 7.2Krpm SATA drives.  That still doesn’t explain the poor showing.

Of course, the XIV had the heaviest user mail workload, at 40,000 simulated user mailboxes, and it did perform relatively better from a normalized database transactions perspective (not shown).  Given all this, perhaps this XIV submission was intended to show the top end of what the XIV could do on mailbox count rather than on latency.

Which points up one failing in our analysis.  In past ESRP reviews we have always split results into one of three categories: <1Kmbx, 1001-5Kmbx, and >5Kmbx.  As ESRP 3.0 is so new, there are only 4 results to date, and as such we have focused only on “normalized” quantities in our full newsletter analysis and here.  We believe database latency should not “normally” be impacted by the count of mail users being simulated, and we must say we are surprised by the XIV’s showing because of this.  But in all fairness, it sustained 80 times the workload that the CLARiiON did.

Interpreting ESRP 3.0 results

As discussed above, all 4 tested subsystems were operating with database availability group (DAG) redundancy, and as such, half of the simulated mail user workload was actually being executed on a subsystem while the other half was being executed as if it were a DAG copy being updated on the subsystem under test.  For example, the #1 HP EVA configuration requires two 8400s to sustain a real 9K-mailbox configuration with DAG in operation.  Such a configuration would support 2 mailbox databases (with 4,500 mailboxes each), with one active mailbox database residing on each 8400 and the inactive copy of that database residing on its brethren.  (Naturally, the HP ESRP submission also supported VSS shadow copies for the DAGs, which added yet another wrinkle to our comparisons.)
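To make that arithmetic a bit more concrete, here is a minimal Python sketch of how a two-copy DAG splits the workload across two subsystems.  The mailbox counts mirror the HP example above; the per-mailbox I/O rate is a hypothetical planning value, not a figure from the ESRP submission.

```python
# A minimal sketch (not from the ESRP report) of how a two-copy DAG splits the
# Exchange workload across two storage subsystems. Mailbox counts mirror the
# HP EVA example above; the per-mailbox I/O rate is hypothetical, purely for
# illustration.

TOTAL_MAILBOXES = 9_000      # total users across the DAG
DATABASES = 2                # one active database per subsystem
IOPS_PER_MAILBOX = 0.15      # assumed active-user I/O rate (hypothetical)

mailboxes_per_db = TOTAL_MAILBOXES // DATABASES        # 4,500 mailboxes each

# Each subsystem hosts one active database plus the passive DAG copy of the
# other database; in the ESRP simulation the copy workload is treated as
# equivalent to the active workload.
active_iops = mailboxes_per_db * IOPS_PER_MAILBOX      # real user I/O
copy_iops = mailboxes_per_db * IOPS_PER_MAILBOX        # simulated DAG-copy I/O

print(f"Active-user I/O per subsystem: {active_iops:,.0f} IOPS")
print(f"DAG-copy I/O per subsystem:    {copy_iops:,.0f} IOPS")
print(f"Total per subsystem:           {active_iops + copy_iops:,.0f} IOPS")
```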

A couple of concerns simulating DAGs in this manner:

  • Comparing DAG and non-DAG ESRP results will be difficult at best.  It’s unclear to me whether all future ESRP 3.0 submissions will be required to use DAGs.  But if not, comparing DAG to non-DAG results will be almost meaningless.
  • Vendors could potentially perform ESRP 3.0 tests with less server and storage hardware.  By using DAGs, the storage under test need only endure half the real mail server I/O workload and half a DAG copy workload.  The other half of this workload simulation may not actually be present, as it’s exactly equivalent to the first workload.
  • It’s hard to determine whether all the hardware was present or only half.  It’s unclear from a casual skimming of the ESRP report whether all the hardware was tested or not.
  • Half the real mail server I/O is not the same as half the DAG copy workload.  As such, it’s unclear whether half the proposed configuration could actually sustain a non-DAG version of an equivalent user mailbox count.

All this makes for exciting times in interpreting current and future ESRP 3.0 results.  Look for more discussion on future ESRP results in about a quarter from now.

As always if you wish to obtain a free copy of our monthly Storage Intelligence newsletter please drop us a line. The full report on ESRP 3.0 results will be up on the dispatches section of our website later this month.

ESRP 1K to 5Kmbox performance – chart of the month

ESRP 1001 to 5000 mailboxes, database transfers/second/spindle

One astute reader of our performance reports pointed out that some ESRP results could be skewed by the number of drives used during a run.  So, we included a database transfers per spindle chart in our latest Exchange Solution Reviewed Program (ESRP) report on 1001 to 5000 mailboxes in our latest newsletter.  The chart shown here is reproduced from that report and shows the number of overall database transfers attained per second per spindle (total of reads and writes) for the top 10 storage subsystems reporting in the latest ESRP results.
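For those wanting to reproduce this cut of the data, the metric is just total database reads plus writes per second divided by the drive count in the tested configuration.  A minimal Python sketch, using made-up subsystem entries rather than any actual submission:

```python
# Sketch of the database-transfers-per-spindle metric behind the chart:
# (database reads/sec + writes/sec) divided by the number of disk drives in
# the tested configuration. The entries below are made-up placeholders, not
# actual ESRP submissions.

results = [
    # (subsystem, db reads/sec, db writes/sec, spindles)
    ("Example FC array",    900.0, 600.0, 12),
    ("Example iSCSI array", 450.0, 250.0,  4),
]

ranked = sorted(
    ((name, (reads + writes) / spindles)
     for name, reads, writes, spindles in results),
    key=lambda row: row[1],
    reverse=True,
)

for rank, (name, per_spindle) in enumerate(ranked, start=1):
    print(f"#{rank} {name}: {per_spindle:.1f} DB transfers/sec/spindle")
```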

This cut of the system performance shows a number of diverse systems:

  • Some storage systems had 20 disk drives and others had 4.
  • Some of these systems were FC storage (2), some were SAS-attached storage (3), but most were iSCSI storage.
  • Mailbox counts supported by these subsystems ranged from 1400 to 5000 mailboxes.

What’s not shown is the speed of the disk spindles.  Also, none of these systems used SSDs or NAND cards to help sustain their respective workloads.

A couple of surprises here:

  • iSCSI systems should have shown up much worse than FC storage.  True, the number 1 system (NetApp FAS2040) is FC while numbers 2 and 3 are iSCSI, but the differences are not that great.  It would seem that protocol overhead is not a large determinant of spindle performance for ESRP workloads.
  • The number of drives used also doesn’t seem to matter much.  The FAS2040 had 12 spindles while the AX4-5i had only 4.  Although this cut of the data should minimize drive-count variability, one would think that more drives would result in higher overall performance.
  • Such performance approaches the limit of what a 15Krpm drive can sustain.  No doubt some of this is helped by system caching, but no amount of cache can hold all the database write and read data for the duration of a Jetstress run.  It’s still pretty impressive, considering a typical 15Krpm drive (e.g., Seagate 15K.6) can probably do ~172 random 64Kbyte block IOs/second (see the back-of-the-envelope sketch after this list).  The NetApp FAS2040 hit almost 182 database transfers/second/spindle – perhaps not 64Kbyte blocks and maybe not completely random, but impressive nonetheless.
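For reference, that ~172 IOs/second figure can be approximated from drive mechanics alone.  Here is a rough Python sketch; the seek time and media transfer rate are representative catalog-style assumptions, not values measured in these ESRP runs:

```python
# Rough estimate of random-I/O throughput for a single 15Krpm drive.
# Seek and transfer numbers are representative catalog-style assumptions,
# not measured from these ESRP runs.

avg_seek_ms = 3.5                                 # average seek time (assumed)
rotational_latency_ms = 0.5 * 60_000 / 15_000     # half a revolution at 15K rpm = 2.0 ms
transfer_ms = 64.0 / 160_000 * 1_000              # 64 KB at ~160 MB/s sustained = 0.4 ms

service_time_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
ios_per_second = 1_000 / service_time_ms

print(f"Per-I/O service time: {service_time_ms:.1f} ms")
print(f"Random 64KB IOs/sec:  {ios_per_second:.0f}")   # roughly 170 IOs/sec
```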

The other nice thing about this metric is that it doesn’t correlate that well with any of the other ESRP metrics we track, such as aggregate database transfers, database latencies, database backup throughput, etc.  So it seems to measure a completely different dimension of Exchange performance.
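For the curious, that lack of correlation is easy to check once the metric is tabulated.  A quick Python sketch with placeholder numbers; any real check would use the actual ESRP figures we track:

```python
# Quick check of whether transfers/sec/spindle tracks another ESRP metric,
# here aggregate database transfers/sec. Values are made-up placeholders.
import numpy as np

per_spindle = np.array([182.0, 150.0, 140.0, 95.0, 60.0])
aggregate = np.array([2184.0, 600.0, 2800.0, 1900.0, 240.0])

r = np.corrcoef(per_spindle, aggregate)[0, 1]
print(f"Pearson correlation: {r:+.2f}")  # a value near zero suggests an independent dimension
```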

The full ESRP report went out to our newsletter subscribers last month, and a copy will be up on the dispatches page of our website later this month.  However, you can get this information now and receive future full reports even earlier – just subscribe to our newsletter by email.

As always, we welcome any suggestions on how to improve our analysis of ESRP or any of our other storage system performance results.  This new chart was a result of one such suggestion.

ESRP results over 5K mailboxes – chart of the month

ESRP Results, over 5K mailboxes, normalized (per 5Kmbx) read and write DB transfers as of 30 October 2009

In our quarterly study of Exchange Solution Reviewed Program (ESRP) results we show a number of charts to give a good picture of storage subsystem performance under Exchange workloads.  The two that are of most interest to data centers are the normalized and un-normalized database transfer (DB xfer) charts.  The problem with un-normalized DB xfer charts is that the subsystem supporting the largest mailbox count normally shows up best, and the rest of the results are highly correlated with mailbox count.  In contrast, the normalized view of DB xfers tends to discount high mailbox counts and shows a more even-handed view of performance.
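For clarity, the normalization used here is a simple per-5,000-mailbox scaling of the raw DB transfer rate.  A minimal Python sketch with made-up transfer rates:

```python
# Sketch of the per-5,000-mailbox normalization used in the chart: raw database
# transfers/sec are scaled by (5,000 / mailboxes tested) so that large and
# small configurations share an axis. The transfer rates below are made up.

NORMALIZATION_BASE = 5_000

def normalized_db_transfers(raw_transfers_per_sec: float, mailboxes: int) -> float:
    """Return database transfers/sec per 5,000 mailboxes."""
    return raw_transfers_per_sec * NORMALIZATION_BASE / mailboxes

print(normalized_db_transfers(1_200.0, 5_400))     # small configuration, ~1,111
print(normalized_db_transfers(14_000.0, 100_000))  # large configuration, 700
```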

 

We show above a normalized view of the ESRP results for this category that were available last month.  A couple of caveats are warranted here:

  • Normalized results don’t necessarily scale – results shown in the chart range from 5,400 mailboxes (#1) to 100,000 mailboxes (#6).  While normalization should allow one to see what a storage subsystem could do for any mailbox count, it is highly unlikely that one would configure the HDS AMS2100 to support 100,000 mailboxes, and it is equally unlikely that one would configure the HDS USP-V to support only 5,400 mailboxes.
  • The higher mailbox-count results tend to cluster when normalized – with over 20,000 mailboxes, one can no longer use just one big Exchange server, and with multiple servers driving a single storage subsystem, results tend to shrink when normalized.  So one should probably compare like mailbox counts rather than depend on normalization alone to iron out the mailbox-count differences.

There are a number of storage vendors in this top 10, but no standouts: the midrange systems from HDS, HP, and IBM seem to hold down the top 5 slots, while the high-end subsystems from EMC, HDS, and 3PAR seem to own the bottom 5.

However, Pillar is fairly unusual in that its 8.5Kmbx result came in at #4 and its 12.8Kmbx result came in at #8.  In contrast, the un-normalized results for these two submissions appear exactly the same.  Which brings up yet another caveat: when running two benchmarks with the same system, normalization may show a difference where none exists.

The full report on the latest ESRP results will be up on our website later this month but if you want to get this information earlier and receive your own copy of our newsletter – just subscribe by emailing us.

ESRP results 1K and under mailboxes – chart of the month

Top 10 ESRP database transfers/sec

As described more fully in last month’s SCI newsletter, to the left is a chart depicting Exchange Solution Reviewed Program (ESRP) results for up to 1,000 mailboxes in the database reads and writes per second category.  This top 10 chart is dominated by HP’s new MSA 2000fc G2 product.

Microsoft will tell you that ESRP is not to be used to compare one storage vendor against another but more as a proof of concept to show how some storage can support a given email workload. The nice thing about ESRP, from my perspective, is that it represents a realistic storage workload rather than the more synthetic workloads offered by the other benchmarks.

What does over 3,000 Exchange database operations per second mean to the normal IT shop or email user?  It should mean more emails per hour can be sent and received with less hardware.  It should mean a higher capacity to service email clients.  It should mean a happier IT staff.

But does it mean happier end-users?

I would show my other chart from this latest dispatch, the one with read latency on it, but that would be two charts.  Anyway, what the top 10 read latency chart would show is that EMC’s CLARiiON dominates with the overall lowest latency, holding the top 9 positions with various versions of CLARiiON and replication alternatives reported in ESRP results.  The 9 CLARiiON subsystems had latencies of around 8-11 msec.  The one CLARiiON on the chart above (CX3-20, #7 in the top 10) had a read latency of around 9 msec. and a write latency of 5 msec.  In contrast, the HP MSA had a read latency of 16 msec. with a write latency of 5 msec. – very interesting.

What this says is that database transfers per second is really more of a throughput measure: even though a single database operation’s latency may be almost 2X longer (16 vs. 9 msec.), a subsystem can still perform more database transfer operations per second due to concurrency.  Almost makes sense.
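Little’s Law captures the idea: throughput equals the number of outstanding I/Os divided by the per-I/O latency, so a subsystem with longer latency can still post more transfers per second if it keeps more requests in flight.  A small Python sketch; the queue depths below are assumed purely for illustration and are not taken from the ESRP submissions:

```python
# Little's Law: throughput (IOs/sec) = outstanding I/Os / per-I/O latency.
# Latencies come from the chart discussion above; the queue depths are
# assumed purely for illustration, not from the ESRP submissions.

def throughput(outstanding_ios: int, latency_ms: float) -> float:
    return outstanding_ios / (latency_ms / 1_000.0)

print(f"Lower latency, fewer in flight: {throughput(16, 9.0):,.0f} reads/sec")
print(f"Higher latency, more in flight: {throughput(64, 16.0):,.0f} reads/sec")
```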

Are vendors different?

This probably says something more about the focus of the two storage vendors’ engineering groups – EMC CLARiiON on getting data to you the fastest and HP MSA on getting the most data through the system.  It might also speak to what the vendors’ ESRP teams were trying to show.  In any case, EMC’s CLARiiON and HP’s MSA have very different performance profiles.

Which vendor’s storage product makes the best sense for your Exchange servers?  That’s the more significant question.

The full report will be up on my website later this week but if you want to get this information earlier and receive your own copy of our newsletter – just subscribe by emailing us.