Well, after last month's performance reversals and revelations, we now return to our more typical review of the latest Exchange Solution Reviewed Program (ESRP 3.0) for Exchange 2010 results. Microsoft's new Exchange 2010 has substantially changed the efficiency and effectiveness of Exchange database I/O. This necessitates a new round of ESRP results, as all vendors once again show how many Exchange 2010 mail users their storage can support. IBM was the first vendor to take this on, with its XIV and SVC results, but within the last quarter EMC and HP have also submitted results. This marks our first blog review of ESRP 3.0 results.
We show here a chart of database latency for the current ESRP 3.0 results. The three lines for each subsystem show the latency in milliseconds for an ESE database read, a database write, and a log write. In prior ESRP reviews, one may recall, write latency was impacted by the Exchange redundancy option in use. In this chart all four subsystems were using database availability group (DAG) redundancy, so write activity should truly show subsystem overhead and not redundancy options.
It's unclear why IBM's XIV showed up so poorly here. The HP EVA 8400 is considered a high-end subsystem, but all the rest are midrange. As for the drives being used: the HP used 15Krpm FC disk drives, the SVC used 15Krpm SAS drives, and both the CLARiiON and the XIV used 7.2Krpm SATA drives. Still, none of this explains the XIV's poor showing.
Of course, the XIV had the heaviest user mail workload, with 40,000 user mailboxes being simulated, and it did perform relatively better from a normalized database-transactions perspective (not shown). Given all this, perhaps this XIV submission was intended to show the top end of what the XIV could do at the mailbox-count level rather than on latency.
Which points up one failing in our analysis. In past ESRP reviews we have always split results into one of three categories: <1Kmbx, 1001..5Kmbx, and >5Kmbx. As ESRP 3.0 is so new, there are only 4 results to date, and as such we have focused only on "normalized" quantities, both in our full newsletter analysis and here. We believe database latency should not "normally" be impacted by the count of mail users being simulated, and must say we are surprised by the XIV's showing because of this. But in all fairness, it sustained 80 times the workload that the CLARiiON did.
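The "normalized" comparison we rely on is simple arithmetic; here is a minimal sketch, using only the mailbox counts cited above (the function names are ours, not anything from the ESRP reports):

```python
# Normalizing ESRP results by simulated mailbox count.
# Mailbox counts below come from the submissions discussed in the text;
# the helper names are illustrative, not part of any ESRP tooling.

def workload_ratio(mailboxes_a, mailboxes_b):
    """How many times more simulated users system A carried than system B."""
    return mailboxes_a / mailboxes_b

def per_mailbox(aggregate_metric, mailboxes):
    """Normalize an aggregate metric (e.g., total IOPS) to a per-mailbox figure."""
    return aggregate_metric / mailboxes

xiv_mailboxes = 40_000
clariion_mailboxes = 500

print(workload_ratio(xiv_mailboxes, clariion_mailboxes))  # 80.0
```

A per-mailbox metric like this is why latency, which should not scale with user count, is what we chose to chart.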
Interpreting ESRP 3.0 results
As discussed above, all 4 tested subsystems were operating with database availability group (DAG) redundancy, and as such, 1/2 of the simulated mail user workload was actually being executed on a subsystem while the other 1/2 was being executed as if it were a DAG copy being updated on the subsystem under test. For example, the #1 HP EVA configuration requires two 8400s to sustain a real 9K-mailbox configuration with DAG in operation. Such a configuration would support 2 mailbox databases (with 4500 mailboxes each), with one active mailbox database residing on each 8400 and the inactive copy of that database residing on its brethren. (Naturally, the HP ESRP submission also supported VSS shadow copies for the DAGs, which added yet another wrinkle to our comparisons.)
A couple of concerns simulating DAGs in this manner:
- Comparing DAG and non-DAG ESRP results will be difficult at best. It's unclear to me whether all future ESRP 3.0 submissions will be required to use DAGs. If not, comparing DAG to non-DAG results will be almost meaningless.
- Vendors could potentially perform ESRP 3.0 tests with less server and storage hardware. By using DAGs, the storage under test need only endure 1/2 the real mail server I/O workload and 1/2 a DAG copy workload. The other half of this workload simulation may not actually be present as it’s exactly equivalent to the first workload.
- It's hard to determine whether all the hardware was present or only half. A casual skim of the ESRP report doesn't make clear whether all the hardware was actually tested.
- 1/2 the real mail server I/O is not the same as 1/2 the DAG copy workload. As such, it’s unclear whether 1/2 the proposed configuration could actually sustain a non-DAG version of an equivalent user mailbox count.
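The last concern above can be made concrete with a little arithmetic. The sketch below uses the 9,000-mailbox HP EVA example from the text, but the per-mailbox I/O rates are made-up placeholders (not figures from any ESRP report), purely to show why DAG load and half-configuration non-DAG load are not equivalent:

```python
# Under DAG, the subsystem under test carries active I/O for half the
# users PLUS copy-update I/O for the other half's database. Per-mailbox
# rates here are hypothetical placeholders for illustration only.

def subsystem_load(mailboxes, active_iops_per_mbx, copy_iops_per_mbx):
    """Total I/O load on one DAG member: active I/O for half the
    users plus copy-update I/O for the other half's database."""
    half = mailboxes / 2
    active_io = half * active_iops_per_mbx
    copy_io = half * copy_iops_per_mbx
    return active_io + copy_io

# 9,000-mailbox DAG example, with assumed per-mailbox rates:
dag_load = subsystem_load(9_000, active_iops_per_mbx=1.0, copy_iops_per_mbx=0.6)

# Half the mailboxes running non-DAG: all active I/O, no copy traffic.
non_dag_load = 4_500 * 1.0

print(dag_load, non_dag_load)  # 7200.0 4500.0
```

Because copy-update I/O differs from active-user I/O, the two totals diverge, which is exactly why half a DAG configuration can't simply be assumed to sustain a non-DAG workload of equivalent mailbox count.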
All this makes for exciting times in interpreting current and future ESRP 3.0 results. Look for more discussion on future ESRP results in about a quarter from now.
As always if you wish to obtain a free copy of our monthly Storage Intelligence newsletter please drop us a line. The full report on ESRP 3.0 results will be up on the dispatches section of our website later this month.
2 thoughts on “Exchange 2010/ESRP 3.0 results – chart of the month”
I think it's very unwise to compare ESRP results in this way; firstly because Microsoft specifically say that the ESRP is not a benchmark and shouldn't be used as such, and secondly, because the size of the workload that's tested on each of the storage systems is so very different. As is the type, number, and configuration of the Windows and Exchange servers.
For example, it appears that the Clariion was tested with only 500 mailboxes, while the XIV was tested with 40,000 mailboxes. That's 80 times the workload (not 8 times). Surely that makes a nonsense out of any comparison?
And the XIV test was configured to simulate 0.8 IOPS per user (+20% headroom to 1.0), while the EVA test was configured to simulate 0.25 IOPS per user (+20% headroom to 0.3) and only for 9000 mailboxes.
It's unfortunate that you chose to graph only the latency values for each system, since they are all below Microsoft's recommended maximum (20ms IIRC). More interesting perhaps would be to look at the IOPS delivered by each system; for example we see the XIV deliver 20,555 IOPS, and the EVA8400 deliver 4,205.
Now that *is* interesting, considering the type and number of spindles in each system (but, of course, ESRP results shouldn't be compared in this way 🙂
All good points. I only show one chart in these analyses. However, I do mention that XIV did better on other aspects of its ESRP. Also I would normally not compare a 500 mailbox run against a 40,000 mailbox run but these were the only ones available at the time.
As for latency, it can be gamed somewhat, but it's as good a "normalized" ESRP measure as any other I report on. And the reason I compare ESRP runs is that it's a real-world workload, albeit simulated. SPC and SPECsfs are one level further removed from real workloads. In essence, I like ESRP because it's closer to what a real customer will see. Ray