There have been a number of Microsoft ESRP submissions this past quarter, especially in the over 5K mailbox category; that category alone now totals 12 submissions.

The above chart is one of a series of charts from our recent StorInt(tm) dispatch on Exchange performance.  It is the Exchange email counterpart to last month’s SpecSFS 2008 CIFS ORT chart, this time depicting the Top 10 Exchange database read, write and log latencies (sorted by read latency).

Except for the HP Smart Array (at #4) and Dell PowerVault MD1200 (#7), all the remaining submissions are FC attached subsystems.  The HP Smart Array and Dell exceptions used SAS attached storage.

For some reason the HP Smart Array had an almost immeasurable log write response time (<~0.1msec.) and a very respectable database read response time of 8.4msec.

As log writes are essentially sequential, we would expect a SAS/JBOD to do well here. But the random database reads and writes seem indicative of a well tuned, caching (sub-)system, not a JBOD!?

One secret to good Exchange 2010 JBOD performance appears to be matching your Exchange email database and log LUN size to disk drive size.  This seems to be a significant difference between Dell’s SAS storage and HP’s SAS storage.  For instance, both systems had 15Krpm SAS drives at ~600GB, but Dell’s LUN size was 13.4TB while HP’s database and log LUN size was 558GB.   Database and log LUN size relative to disk size didn’t seem to significantly impact Exchange performance for FC subsystems.
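To make the LUN-to-drive size comparison concrete, here is a minimal sketch that uses only the sizes quoted above. The decimal unit conversion (1TB = 1000GB) and the names `lun_size_gb` and `lun_to_drive_ratio` are assumptions made for illustration; nothing here comes from the ESRP reports themselves.

```python
# Figures quoted in the text; decimal units (1 TB = 1000 GB) are an assumption.
DRIVE_GB = 600.0  # "~600GB" 15Krpm SAS drives in both systems

lun_size_gb = {
    "Dell PowerVault MD1200": 13.4 * 1000,  # 13.4TB LUN
    "HP Smart Array": 558.0,                # 558GB database and log LUN
}


def lun_to_drive_ratio(lun_gb, drive_gb=DRIVE_GB):
    """Return how many drives' worth of capacity one LUN spans."""
    return lun_gb / drive_gb


ratios = {name: lun_to_drive_ratio(size) for name, size in lun_size_gb.items()}

for name, ratio in ratios.items():
    print(f"{name}: LUN spans about {ratio:.1f} drive(s)")
```

By this rough measure the Dell LUN spans about 22 drives' worth of capacity, while the HP LUN fits within a single drive, which is the size match described above.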

The other secret to good SAS Exchange 2010 performance is to stick with relatively small mailbox counts.  Both the HP and Dell JBODs had the smallest mailbox counts of this category at 6K and 7.2K respectively.

Exchange database write latency

There appears to be little correlation between read and write latencies in this data.  All of these results used Exchange database resiliency or DAGs, so they had similar types of database activity to contend with. Also, the number of DAGs typically increased with higher mailbox counts, but this wasn’t universal, e.g., the HDS AMS 2100 (#1) with 17.2K mailboxes had four DAGs while the last two IBM XIVs (#9&10) with 40K mailboxes had one each.  But the number of database availability groups shouldn’t matter much to Exchange database latencies.

On the other hand, the number of DAG copies may matter to Exchange write performance.  It is unclear how DAG copy writes are measured/simulated in Jetstress, the program used to drive ESRP workloads.   But the number of database copies was either two (#1,2,5,8&10) or three (#3,4,6,7&9) for all these submissions, with no significant advantage for fewer copies.  So that’s not the answer.

I will make a stand here and say that the high variability between read and write database latencies has something to do with storage (sub-)system caching effectiveness and Exchange 2010’s larger block sizes.   However, the available data is too limited to confirm this, and the variability could easily be an artifact of the small sample.

Why we like database access latency metrics

In our view, database read latencies correlate well with average Microsoft Exchange user experience for email read/search activities.  Also, log write and database write times can be good substitutes for Exchange Server email send times.  We like to think of database latencies as an end-user view of Exchange email performance.

Exchange 2010 is just a year old now and everyone is still trying to figure out how to perform well within the new architecture, so I expect some significant revisions to this chart over time.  Nonetheless, the current crop of submissions clearly shows a wide disparity in Exchange storage performance.

As always, we welcome any constructive comments on how to improve our analysis of ESRP results.