Exchange database access latencies – Silverton Consulting

(c) 2010 Silverton Consulting, Inc., All Rights Reserved

The chart is from SCI’s October newsletter/performance dispatch on the Exchange 2010 Solution Reviewed Program (ESRP v3.0) and shows the mailbox database access latencies for read, write and log write. For this report we are covering solutions supporting from 1001 up to 5000 mailboxes (1K-to-5Kmbx); larger and (a few) smaller configurations have been covered in previous performance dispatches. On latency charts like this, lower is better.

We like this chart because in our view this represents a reasonable measure of email user experience. As users read and create new emails they are actually reading Exchange databases and writing database and logs. Database and log latencies should show up as longer or shorter delays in these activities. (Ok, not exactly true, email client and Exchange server IO aren’t the same thing. But ultimately every email sent has to be written to an Exchange database and log sometime and every new email read-in has to come from an Exchange database as well).

A couple of caveats are in order for this chart.

Xiotech’s top run (#1) did not use database redundancy or DAGs (Database Availability Groups) in their ESRPv3 run. Their feeling is that this technology is fairly new and it will take some time before it’s widely adopted.
There is quite the mix of SAS (#2,3,6,7,9&10), FC (#1,5&8) and iSCSI (#4) connected storage in this mailbox range. Some would say that SAS connected storage should have an advantage here but that’s not obvious from the rankings.
Vendors get to select the workload intensity for any ESRPv3/Jetstress run, e.g. the solutions shown here used between 0.15 IO/sec/mailbox (#9&10) and 0.36 IO/sec/mailbox (#1). IO intensity is just one of the myriad Jetstress tweakable parameters that make analyzing ESRP so challenging. Normally this would only matter for database and log access counts, but heavier workloads can also affect latencies.

Wide variance between read and write latencies

The other thing of interest in this chart is the wide span between read latencies and write (database and log) latencies for the same solution. Take the #10 Dell PowerEdge system, for example. It showed a database read latency of ~18msec but a database write latency of ~0.4msec. Why?

It turns out this Dell system had only 6 disk drives (2TB/7200 RPM). So few disk drives don’t seem adequate to support the read workload and, as a result, show up poorly in database read latencies. However, write activity can mostly be masked with cache until it fills up, forcing write delays. With only 1100 mailboxes and 0.15 IOs/sec/mailbox, the write workload apparently fits in cache well enough to be destaged over time, without delaying ongoing write activity. Similar results appear for the other Dell PowerEdge (#6) and the HP Smart Array (#7), which had 12-2TB/7200 RPM and 24-932GB/7200 RPM drives respectively.
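To put the #10 configuration in perspective, here is a rough back-of-the-envelope sketch of the nominal load implied by the numbers above (1100 mailboxes, 0.15 IO/sec/mailbox, 6 drives). The function names are ours, purely for illustration. It assumes the load spreads evenly over all drives and ignores read/write mix, RAID overhead, caching and any background database activity, so treat it as a first approximation only.

```python
def aggregate_iops(mailboxes: int, io_per_sec_per_mailbox: float) -> float:
    """Nominal total IO/sec the mailbox population asks of the storage."""
    return mailboxes * io_per_sec_per_mailbox


def iops_per_drive(total_iops: float, drive_count: int) -> float:
    """Even split of the nominal load across drives (an assumption)."""
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return total_iops / drive_count


# Figures quoted in the post for the #10 Dell PowerEdge run.
total = aggregate_iops(mailboxes=1100, io_per_sec_per_mailbox=0.15)
per_drive = iops_per_drive(total, drive_count=6)

print(f"nominal total: {total:.1f} IO/sec")
print(f"per drive:     {per_drive:.1f} IO/sec")
```

That works out to about 165 IO/sec in total, or roughly 27.5 IO/sec per drive, before any of the effects the sketch leaves out.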

On the other hand, Xiotech’s #1 position had 20-360GB/15Krpm drives and EMC’s Celerra #4 run had 15-400GB/10Krpm drives, both of which were able to sustain a more balanced performance across reads and writes (database and logs). For Xiotech’s #5 run they used 40-500GB/10Krpm drives.

It seems that faster drives correlate with lower database read latencies. Most of the systems in the bottom half of this chart have 7200 RPM drives (except for #8, HP StorageWorks MSA) and the top 3 all had 15Krpm drives. However, write latencies don’t seem to be as affected by drive speed and have more to do with the balance between workload, cache size and effective destaging.

The other thing that’s apparent from this chart is that SAS connected storage continues to be an effective solution for this range of Exchange configurations, following a trend first shown in ESRP v2 (Exchange 2007) results. We reported on this in our January ESRPv2 analysis dispatch for this year.

The full dispatch will be up on our website in a couple of weeks but if you are interested in seeing it sooner just sign up for our free newsletter (see upper right) or subscribe by email and we will send you the current issue with download instructions for this and other reports.

As mentioned previously ESRP/Jetstress results are difficult to compare/analyze and we continue to welcome any constructive suggestions on how to improve.

There have been a number of Microsoft ESRP submissions this past quarter, especially in the over 5K mailbox category and they now total 12 submissions in this category alone.

The above chart is one of a series of charts from our recent StorInt(tm) dispatch on Exchange performance. This chart displays an Exchange email counterpart to last month’s SpecSFS 2008 CIFS ORT chart, only this time depicting the Top 10 Exchange database read, write and log latencies (sorted by read latency).

Except for the HP Smart Array (at #4) and Dell PowerVault MD1200 (#7), all the remaining submissions are FC attached subsystems. The HP Smart Array and Dell exceptions used SAS attached storage.

For some reason the HP Smart Array had an almost immeasurable log write response time (<~0.1msec.) and a very respectable database read response time of 8.4msec.

As log writes are essentially sequential, we would expect a SAS/JBOD to do well here. But the random database reads and writes seem indicative of a well tuned, caching (sub-)system, not a JBOD!?

One secret to good Exchange 2010 JBOD performance appears to be matching your Exchange email database and log LUN size to disk drive size. This seems to be a significant difference between Dell’s SAS storage and HP’s SAS storage. For instance, both systems had 15Krpm SAS drives at ~600GB, but Dell’s LUN size was 13.4TB while HP’s database and log LUN size was 558GB. Database and log LUN size relative to disk size didn’t seem to significantly impact Exchange performance for FC subsystems.
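One way to make the LUN-versus-drive comparison concrete is to express each LUN size as a multiple of the drive size. The sketch below uses only the figures quoted above, takes the drives at a nominal 600GB and assumes 1TB = 1000GB; the helper name is hypothetical.

```python
def lun_to_drive_ratio(lun_gb: float, drive_gb: float) -> float:
    """How many drives' worth of capacity a single LUN spans."""
    if drive_gb <= 0:
        raise ValueError("drive_gb must be positive")
    return lun_gb / drive_gb


DRIVE_GB = 600.0  # nominal size; the post says "~600GB"

dell_ratio = lun_to_drive_ratio(13.4 * 1000, DRIVE_GB)  # 13.4TB LUN
hp_ratio = lun_to_drive_ratio(558.0, DRIVE_GB)          # 558GB LUN

print(f"Dell LUN spans about {dell_ratio:.1f} drives' worth of capacity")
print(f"HP LUN spans about {hp_ratio:.2f} drives' worth of capacity")
```

On these assumptions Dell’s LUN covers a bit over 22 drives’ worth of capacity while HP’s is just under one, which is the mismatch the paragraph above is pointing at.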

The other secret to good SAS Exchange 2010 performance is to stick with relatively small mailbox counts. Both the HP and Dell JBODs had the smallest mailbox counts of this category at 6K and 7.2K respectively.

Exchange database write latency

There appears to be little correlation between read and write latencies in this data. All of these results used Exchange database resiliency or DAGs, so they had similar types of database activity to contend with. Also, the number of DAGs typically increased with higher mailbox counts but this wasn’t universal, e.g., the HDS AMS 2100 (#1) with 17.2K mailboxes had four DAGs while the last two IBM XIVs (#9&10) with 40K mailboxes had one each. But the number of database availability groups shouldn’t matter much to Exchange database latencies.

On the other hand, the number of DAG copies may matter to Exchange write performance. It is unclear how DAG copy writes are measured/simulated in Jetstress, the program used to drive ESRP workloads. But, the number of database copies stood between two (#1,2,5,8&10) and three (#3,4,6,7&9) for all these submissions with no significant advantage for fewer copies. So that’s not the answer.

I will make a stand here and say that high variability between read and write database latencies has something to do with storage (sub-)system caching effectiveness and Exchange 2010’s larger block sizes but it’s not clear from the available data. However, this could easily be an artifact of the limited data available.

Why we like database access latency metrics

In our view, database read latencies correlate well with average Microsoft Exchange user experience for email read/search activities. Also, log write and database write times can be good substitutes for Exchange Server email send times. We like to think of database latencies as an end-user view of Exchange email performance.

The full ESRP v3.0 performance report will go up on SCI’s website next month in our dispatches directory. However, if you are interested in receiving this sooner, just subscribe by email to our free newsletter and we will send you the current issue with download instructions for this and other reports.

Exchange 2010 is just a year old now and everyone is still trying to figure out how to perform well within the new architecture, so I expect some significant revisions to this chart over time. Nonetheless, the current crop clearly indicates that there is a wide disparity in Exchange storage performance.

As always, we welcome any constructive comments on how to improve our analysis of ESRP results.
