Microsoft Exchange Performance, ESRP v3.0 results – chart of the month

(c) 2010 Silverton Consulting, Inc.

There have been a number of Microsoft ESRP submissions this past quarter, especially in the over 5K mailbox category, which now has 12 submissions in total.

The above chart is one of a series of charts from our recent StorInt(tm) dispatch on Exchange performance. This chart is the Exchange email counterpart to last month’s SpecSFS 2008 CIFS ORT chart, only this time depicting the Top 10 Exchange database read, write and log latencies (sorted by read latency).

Except for the HP Smart Array (#4) and Dell PowerVault MD1200 (#7), all the remaining submissions are FC-attached subsystems. The HP Smart Array and Dell exceptions used SAS-attached storage.

For some reason the HP Smart Array had an almost immeasurable log write response time (<~0.1msec.) and a very respectable database read response time of 8.4msec.

As log writes are essentially sequential, we would expect a SAS/JBOD to do well here. But its random database read and write results look more like those of a well-tuned, caching (sub-)system than a JBOD!?

One secret to good Exchange 2010 JBOD performance appears to be matching your Exchange email database and log LUN size to disk drive size. This seems to be a significant difference between Dell’s SAS storage and HP’s SAS storage. For instance, both systems had ~600GB 15Krpm SAS drives, but Dell’s LUN size was 13.4TB while HP’s database and log LUN size was 558GB. Database and log LUN size relative to disk size didn’t seem to significantly impact Exchange performance for FC subsystems.
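To make the mismatch concrete, a short Python sketch can divide each LUN size by the drive size. It assumes decimal units (1TB = 1000GB) and the approximate ~600GB drive figure quoted above, so the ratios are rough: roughly 22 drives’ worth of capacity per LUN for Dell, versus a LUN that fits on a single drive for HP. The helper name `lun_to_drive_ratio` is hypothetical.

```python
# Minimal sketch: express each LUN size as a multiple of the drive size.
# Assumptions: decimal units (1TB = 1000GB) and the approximate "~600GB"
# drive size quoted for both systems; the LUN sizes are the quoted figures.

DRIVE_SIZE_GB = 600.0  # approximate 15Krpm SAS drive capacity, both systems


def lun_to_drive_ratio(lun_size_gb: float, drive_size_gb: float = DRIVE_SIZE_GB) -> float:
    """Return LUN capacity divided by a single drive's capacity."""
    return lun_size_gb / drive_size_gb


luns_gb = {
    "Dell PowerVault MD1200": 13.4 * 1000,  # 13.4TB LUN
    "HP Smart Array": 558.0,                # 558GB database/log LUN
}

for system, lun_gb in luns_gb.items():
    print(f"{system}: LUN/drive ratio ~{lun_to_drive_ratio(lun_gb):.1f}")
```

A ratio at or below 1 means each LUN maps onto at most one drive, the pattern the HP submission followed.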

The other secret to good SAS Exchange 2010 performance is to stick with relatively small mailbox counts.  Both the HP and Dell JBODs had the smallest mailbox counts of this category at 6K and 7.2K respectively.

Exchange database write latency

There appears to be little correlation between read and write latencies in this data. All of these results used Exchange database resiliency, i.e., Database Availability Groups (DAGs), so they had similar types of database activity to contend with. The number of DAGs typically increased with higher mailbox counts, but not universally; e.g., the HDS AMS 2100 (#1) with 17.2K mailboxes had four DAGs, while the last two IBM XIVs (#9 and #10) with 40K mailboxes had one each. But the number of DAGs shouldn’t matter much to Exchange database latencies.

On the other hand, the number of DAG copies may matter to Exchange write performance. It is unclear how DAG copy writes are measured or simulated in Jetstress, the program used to drive ESRP workloads. But the number of database copies was either two (#1, 2, 5, 8 and 10) or three (#3, 4, 6, 7 and 9) for all these submissions, with no significant advantage for fewer copies. So that’s not the answer.

I will venture that the high variability between read and write database latencies has something to do with storage (sub-)system caching effectiveness and Exchange 2010’s larger block sizes, but this isn’t clear from the available data and could easily be an artifact of how limited that data is.
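One way to test the “little correlation” observation is to compute the Pearson correlation coefficient between each submission’s database read and write latencies. The minimal Python sketch below does only that. The chart’s underlying values are not reproduced here, so the lists passed to `pearson` (a hypothetical helper name) would have to come from the ESRP reports themselves.

```python
# Minimal sketch: Pearson correlation between per-submission database read
# and write latencies (msec), one entry per ESRP submission, same order.
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations about each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient near 0 would support the observation, but with only ten submissions even a moderate value is weak evidence either way.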

Why we like database access latency metrics

In our view, database read latencies correlate well with the average Microsoft Exchange user experience for email read/search activities. Also, log write and database write times can be good proxies for Exchange Server email send times. We like to think of database latencies as an end-user view of Exchange email performance.

The full ESRP v3.0 performance report will go up on SCI’s website next month in our dispatches directory.  However, if you are interested in receiving this sooner, just subscribe by email to our free newsletter and we will send you the current issue with download instructions for this and other reports.

Exchange 2010 is just a year old now and everyone is still trying to figure out how to perform well within the new architecture, so I expect some significant revisions to this chart over time.  Nonetheless, the current crop clearly indicates that there is a wide disparity in Exchange storage performance.

As always, we welcome any constructive comments on how to improve our analysis of ESRP results.

35 thoughts on “Microsoft Exchange Performance, ESRP v3.0 results – chart of the month”

  1. Few comments to consider
    1. Microsoft allows up to 20ms response times for reads with Exchange 2010 (as well as 2007), so ESRPs respect that and limit read response times to 20ms. If a vendor chooses to target lower response times, the price you pay is TCO, of course.
    2. Exchange 2010's average block size is around 32KB-34KB, and generally speaking, the larger the block size, the higher the response times. Compared to Exchange 2007's 1/4-size IOs of around 8KB, I actually think Microsoft should increase the limit to even more than 20ms.
    3. If you measure the impact of a 20ms response time vs. 10ms on an email read, you will find that it's negligible compared to the transfer times over the IP network, the processing time on the clients, etc.

    Response times for Exchange 2010 following the ESRP guidelines should be less than 20ms. Yes, all vendors can aim lower than that, e.g., by putting fewer mailboxes on the storage, but you lose on TCO and don't gain anything.

    In my opinion, your metric of tracking response times is irrelevant as long as the results follow the guidelines.

    1. ESRPer,
      Thanks for your comments. I already knew about the 20msec limit for reads and that Exchange 2010 block sizes were increased over 2007. However, I respectfully disagree that 20msec vs. 10msec response times have a negligible impact on end-user experience. As for ways to generate better response time, there are many, and just putting fewer mailboxes per database does not seem sufficient to reduce database response times for some storage subsystems.
      Ray

  2. I meant that putting fewer mailboxes on the storage as a whole reduces the response times, and that's very easy to test and show.

    Microsoft defines the 20ms limit, your argument is not with me but with them.
    I completely agree with MS that 20ms for large 32KB-34KB blocks, with all the network overhead outside of the storage, makes a lot more sense than 10ms.

    Anyway, my point is that each vendor could have chosen to go with a lower-response-time ESRP submission, but that would be a waste of our (the customers') money, since 20ms is sufficient for Exchange 2010's large blocks (heck, it was good enough for 8K blocks on Exchange 2007).

    Last, Jetstress testing assumes 100% of mailboxes producing 100% of their IOPS demand ALL the time.
    The reality is that mailboxes don't run at 100% of their IOPS demand all the time. Hence the table above shows the worst-case scenario for a real production Exchange 2010 environment. And if the worst case is that you get close to 20ms at your peaks, then any customer should be fine with it!

    So let's not waste our (the customers') money getting a pie-in-the-sky 10ms at 100% IOPS demand…

    You think that EMC or IBM could not have done 10ms?!? You are wrong, my friend. They just chose to be better on TCO and not waste their customers' money. So they put more and bigger mailboxes on the storage and used more capacity to drive better TCO, while still staying within MS's limits.

    1. ESRPer,
      Thanks again for your comments. But I will continue to disagree with you regarding the importance of response time. Yes, most certainly anyone can approach a minimal response time for their subsystem by having only one or a few mailboxes in their configuration, but then they would be competing in a totally different mailbox category. When you get to the 10K to 50K mailbox range, getting down to 10msec or less is a different matter. We will need to continue to disagree on this.
      Ray

      1. You provide no argument for why latency is important here. You ignore my comments above that ESRP tests all mailboxes running at max IOPS as a worst case. You ignore the TCO that I mention.
        Really, what is the basis of your claim here?
        MS made great changes in the IO stack to allow larger and even slower drives, yet you and others keep pushing faster drives, ignoring those changes and MS's recommendations without giving any real justification, and thereby having customers buy smaller, faster drives at a worse TCO.

        It's such an easy stand to say the fastest response time is the best for customers. If you did a real study and checked the impact, I would love to see it. MS did, published their recommendations on their website, and produced a very good calculator that demonstrates it.

        As examples of TCO, and of why customers should not listen to this claim about faster response times, I will pick two ESRPs right from the MS website.
        HDS did 68,000 mailboxes of 1GB with 0.12 iops per mailbox; that's 8160 iops in total against the storage. They did it with 480 15K 450GB drives.

        XIV (1TB system) did 40,000 mailboxes of 1GB with 1 iops per mailbox; that's 40,000 iops in total against the storage. They did it with 360 7.2K 1TB drives.

        Any non-storage person can clearly see that the latter did much more with much less!
        That's what I'm talking about saving money to customers. That's the real deal here!

        I encourage you to take a deeper dive here. You are missing the point of this new technology, misleading customers to look at the wrong metrics and to spend their money on the old approach of speeds and feeds.

        Good holidays.

        1. ESRPer,
          Once again thanks for your comments. I am open to changing my mind. As for the two examples you cite:
          - the IBM/XIV 40Kmbx subsystem run had 180 disk drives, and although it was being driven at 0.8 (tested at 1.0 simulated IOPS) and could have achieved 40K IOPS, it only managed to actually attain 9.3K database transfers/second. I would contend this is due to its poor database read and write response times (DB read 17.6msec, DB write 8.2msec). The fact is the XIV subsystem could only achieve ~23% of what it should have attained if it could actually sustain the email server workload.
          - the HDS AMS 2500 68.8Kmbx subsystem run had 480 disks, and although it was being driven at 0.12 and could have achieved 8.3K IOPS, it managed to actually achieve 13.2K database transfers/second. I would contend this was due to its better-matched workload and better database read and write response times (DB read 9.3msec, DB write 3.8msec). Of course, the HDS subsystem actually achieved 160% of what it should have attained.
          Now given the examples above, I would contend that response time matters, and matters significantly, to what level of workload the Exchange 2010 server can actually attain. I believe we will need to continue to disagree on this one.
          Also, could you provide the previously mentioned website that claimed storage response time would have no bearing on Exchange 2010 end-user experience? I would like to check it out.
          Ray

          1. You have a mistake with your numbers above. Please re-examine the ESRP.
            The IBM submission tested one XIV at 20,000 mailboxes at 1 iops. Don't confuse it with the 2TB ESRP; I was talking about the 1TB ESRP. MS allows you to test a building block, hence it approved that ESRP.
            EMC and others did the same thing.
            If you check the appendix of each ESRP, you will find that in practice many of the ESRPs did a higher number of iops but, to be safe, reported a little less. The same goes for the IBM XIV one.
            So check your data again, since you provided incorrect results in your answer above.

            You have to understand that MS approves those results. So one can't claim to do 40,000 mailboxes at 1 iops and then not achieve it, as you claim! MS would not approve it.
            So please check your data and correct your answer.
            I would send you screenshots but can't attach them here.
            I owe you the website, but I'm on the road today.

          2. I see how you made the mistake. The IBM ESRP ran 180 drives to generate 20,000 mailboxes and then used two XIVs (360 drives) for a multisite DAG.
            So your numbers are half of what they should be at best.
            The appendix on the submitted paper is for one XIV tested 180 drives only.
            So from the paper I see:
            Achieved Transactional I/O per Second 10380.94
            Achieved Transactional I/O per Second 10174.77

            That's 20,555.71 IOPS for half the solution (180 drives),
            which means in practice it did more than 40,000 iops for 40,000 mailboxes at 1 IOPS per mailbox.

            Do you see your mistake now?

          3. ESRPer,
            Thanks again. Sorry for my confusion; I was mixing up the results for the other IBM/XIV 40Kmbx run with this one. You are correct that the IBM/XIV was able to achieve 20.5K database operations per second. And yet that is still only ~50% of their potential maximum of 40K. Why? I would still contend this is due to their relatively poor response time.
            As for the disk drives, the IBM/XIV report lists 180 drives as being tested. The fact that they were using DAGs and would have needed another subsystem someplace to handle a similar portion of the workload is interesting, but it doesn't prove it would have been a similarly configured XIV subsystem. Nonetheless, if you use the phantom subsystem to state they had 2X the drives that were tested, then you will have to do that for every ESRP run. Specifically, you would then need to say the HDS AMS 2500 run, which was also using DAGs, had 960 drives.
            Nevertheless, my view still holds and we still disagree. Exchange 2010 response time still matters to the Exchange server environment; whether it impacts the end user is yet TBD.
            Ray

          4. When submitting ESRP with DAG you don't have to test the entire setup.
            Jetstress allows you to test half of the DAG while simulating the IOs as if it were the whole setup.
            If you walk through the Jetstress screens, you will see this option to test multiple copies of databases.
            As such, IBM submitted a 180-drive, 20,000-mailbox test using this option. The ESRP is for 40,000 mailboxes, with half of the hardware tested and the multiple copies simulated using the Jetstress option.
            The bottom line: the ESRP is for 40,000 mailboxes with 2 XIVs and 4 servers.
            The test was on 1 XIV and two servers at 20,000 iops (10,000 per server).
            This is an approved ESRP from Microsoft, so let's not forget that.

            You keep saying your view still holds and you disagree, yet you continue not to provide any real justification.

            Again, to summarize: the IBM ESRP is for 40,000 mailboxes at 1 IOPS per mailbox and 1GB per mailbox. The test was for half of the setup and was approved by MS. Jetstress has the option to do such a test; MS wrote it that way to help vendors with smaller setups prove the solution, including all its IOs.

            Please re-read the ESRPs, IBM is not the only one who did that. EMC did the same.

          5. ESRPer,
            Thanks again for your continuing comments. Yes, what you say is correct, and it also applies equally well to the HDS AMS 2500 ESRP run. So if we take the same tack here, the simulated HDS AMS 2500 run would have been capable of ~26.5K database operations per second (13.3K actual with another 13.3K in the phantom system), which by my reckoning is ~320% of what it should have done with a 0.12 IOPS workload for 68.8Kmbx. I believe this is mainly due to its relatively better response time.
            Once again, the IBM XIV ESRP run looks very slow in comparison, only able to sustain 100% of its corresponding workload (1.0 IOPS with 40Kmbx), which I contend is mostly due to its poor response time.
            As for ESRPs using smaller-than-real setups, I wish there were some easy way to see this was the case in the standard ESRP report. It seems most of the ESRP runs using database resiliency (DAGs) have done the same thing, i.e., testing only half the configuration. This changed with Exchange 2010 and probably makes vendor life much easier, but analysis that includes phantom boxes/phantom workloads tends to confuse the picture rather than clarify it.
            I would still be interested in the web reference you mentioned that shows Exchange 2010 server IO response time has no bearing on end-user experience.
            Ray

          6. Ray,
            You are twisting the math again.
            AMS did 68,000 mailboxes with 0.12 iops per mailbox. That's 68,000×0.12=8160 iops
            IBM XIV did 40,000 mailboxes with 1 iops. That's 40,000×1=40,000 iops

            Why can't you see that!? It's very very simple.
            All vendors submitted ESRPs for half the DAG, and that's simply not relevant to the math above.

            I give up educating you, since you are not seeing the results as they are advertised, or at least not understanding them.
            This is my last reply. And judging by the thumbs up/down on the replies, I think people who read this article figured out long ago that you simply lack an understanding of how Jetstress tests work in a 2010 DAG environment.

            I do owe you the link. I will get it tomorrow. Sorry for the delay there.

          7. here is what I owe:
            IO Reductions: Exchange 2010 delivers up to a 50% reduction in disk IO from Exchange 2007 levels. This means that more disks meet the minimum performance required to run Exchange, driving down storage costs.

            Optimizations for SATA Disks: IO patterns are optimized so that disk writes do not come in bursts. This removes a barrier that had previously limited the use of Serial Advanced Technology Attachment (SATA) desktop class hard disk drives disks.

          8. ESRPer,
            Thanks for the link. The fact that Exchange 2010 no longer writes in bursts, which enables SATA drives, doesn't mean that Exchange end-user experience is oblivious to storage system response time. It just means that current Exchange IO should attain better write response times with SATA drives. Of course, I read this as saying that storage response time is still an important characteristic of Exchange workloads and is something Microsoft continues to try to improve with every software release. You probably read this as saying that the relatively poor response times of SATA disks won't impact Exchange workloads.
            I am afraid we are going to continue to disagree on this topic.
            Ray

          9. Now you just lack an understanding of how SATA drives work and their speed.
            A good SATA drive has a minimum response time of 8ms per transaction!
            If you start running more than one transaction against a SATA drive, you get to 12-15ms, etc. I suggest you look at some drive specs.
            I give up. At least your readers are voting for the truth and not for your article.

          10. ESRPer,
            Thanks again. But the AMS 2500 with 68,800 mailboxes was driven at 0.12 IOPS yet attained much more: specifically 13,246 on half the storage, and by your logic it was capable of 26,492 IOPS (by doubling it for the DAG), or ~320% of what it should have attained at 68,800*0.12=8,256 IOPS, because of its better response times.
            Ray

          11. I don't understand what you are saying here.
            AMS advertised 68,000 mailboxes with 480 drives at 0.12 IOPS per mailbox; that's 8160 IOPS against the storage.
            That's the only math I see.

          12. ESRPer,
            Sorry, I should have added… That Jetstress is set up to drive a solution at 0.12 IOPS per mailbox doesn't necessarily mean that a subsystem can actually attain that level, OR that a subsystem couldn't achieve more than that. Apparently, Jetstress is capable of driving a solution harder than specified, if warranted (by whatever decision criteria Jetstress uses). I believe this is the case with the AMS 2500 solution, because it achieved significantly more than it should have given its driving specification of 0.12 IOPS/mbx.
            Ray

          13. Lost you.
            Again, facts from the AMS paper, page 9:

            AMS advertised 68,000 mailboxes with 480 drives at 0.12 IOPS per mailbox; that's 8160 IOPS against the storage.

            This is what they tested, this is what they wanted to advertise, so this is what we should judge them on.

            The bottom line: IBM XIV did 4.9x more IOPS against the storage (40,000 vs. 8160).

            Do you understand that?

          14. ESRPer,
            Thanks again. Let me try to explain. If you look at the ESRP v3.0 website and page down to the HDS section, the second entry from the bottom lists the HDS AMS 2500 ESRP report, clearly saying this was for 68,800 mailboxes. If you download their report and look on page 9, you will see they also listed 68,800 mailboxes with a simulated IO profile of 0.12 IOPS. Now if one goes to page 20 of that report, one can clearly see in the “Aggregate Performance Across All Servers Metrics” that the “Database disk transfers per second” is 13,246.
            Taking your logic that this can be doubled, as they were using DAGs and the other (phantom) subsystem would have been able to do another equivalent amount of IO, I contend this ESRP shows that the HDS AMS 2500 should be doing 26,492 database disk transfers/second.
            With 68,800 mailboxes being driven at 0.12 IOPS per mailbox, as you say, they should have been doing 68,800*0.12 or 8,256 database transfers/second. But by your logic of doubling their actual results, the Jetstress results show they did 26,492 database disk transfers per second, and 26,492/8,256 says they did 320% of what they should have done.
            Ray

          15. I see your mistake.
            AMS did not put all the servers' outputs in the PDF. Instead they only put one server as an example. So you do not multiply it by two here!
            It's 13,246 for ALL their hosts!
            IBM put in the document the output from four servers showing 20,000 iops, and the overall number of servers is eight, so you do multiply it by two!
            Overall, IBM did 40,000 iops against the storage. AMS did 13,246 iops against the storage.
            That's about 3x more iops with fewer drives on IBM compared to AMS.

            I hope it's clear now!

          16. ESRPer,
            I disagree. On the bottom of p. 2 of HDS's report, it clearly states their solution consists of 16 DAGs, each configured with 2 servers. P. 3 clearly shows this as well, with each DAG having 2 servers depicted. The portion of this configuration actually tested consisted of DAG1-DAG8, which included 16 physical servers. The body of the report on pp. 14-19 clearly lists the performance of each of these 16 named servers, and therefore the aggregate (on p. 20) is for only 1/2 of the storage configuration (the configuration actually tested).
            Now, given your logic for the IBM XIV run, we should be able to double this for the full configuration and end up with 26,492 database disk operations per second for this ESRP report.
            Ray

          17. AMS ESRP tables 12, 13 and 14 clearly say the number of drives TESTED is 480.
            It also clearly states right after that it achieved 13,246 iops.
            They tested with all the 480 disks!

            IBM tested only 180 disks and achieved 20555 iops.

            Clearly the IBM did significantly better.

          18. ESRPer,
            Thanks again. Yes, I agree, IBM XIV did better on an IOPS per drive spindle basis, but this doesn't change the general discussion that database response time does matter to Exchange end-user experience.
            You still haven't accounted for, nor offered an explanation of, the discrepancy in how the HDS AMS 2500 system was able to do 320% of what it was supposed to, whereas the IBM system was only able to attain ~100% of what it was supposed to, given their respective IOPS driving parameters (0.12 and 1.0 respectively). I contend this is due mainly to the superior response time offered by the AMS system vs. the relatively poor response time of the IBM XIV system.
            We will need to continue to disagree on this topic.
            Ray

  3. Interesting thread guys. But why are the storage systems ranked by I/O latency when they all exceed Microsoft's recommendation? I want to see the disk vendors ranked by mailbox density, footprint, and TCO. We're not interested in latency arms races. And this is why we don't hire techie storage consultants.

    1. Hetson,
      Thanks for your comment. I understand your interest in TCO and footprint, but I don't see anywhere that ESRP submissions are required to provide this information. I respectfully suggest you contact Microsoft to make this information mandatory for future ESRP submissions, after which I would be happy to report on it.
      As for the latency discussions, I suggest you follow the thread some more; it's not over yet.
      Ray

      1. Clearly MS doesn't care about that. They are not selling you the storage. That's why people like you (Ray) have a job! To help customers understand those results and what they mean.

  4. Surely the only two important metrics are happy users and lowest TCO. Will real users detect these small differences in latency? I doubt it. Once you are below the Microsoft recommended limits, you will have happy users. Will management notice differences in TCO? Yes, of course they will.
    Sorry, this article is all theoretical technobabble, good for geeks but with no relevance to the real world.

    1. Bazzerman,
      Thanks for your comment. I believe you are absolutely correct, but would add happy CIOs and administrators to your list. However, I disagree that users will not detect differences in latency and that this is all technobabble. I encourage you to follow the discussion thread as it continues to unfold if you are interested.
      Ray

      1. Ray,
        I'm a bit shocked to hear that you think email users will detect differences in latency between 10 and 20ms. Email users are demanding users, yes, but not to the point of holding you accountable for SLAs between 10ms and 20ms per IO. That, in my humble opinion, is a bit of a stretch. I also doubt that a customer will pay for a solution that guarantees sub-10ms latency with configurations that will most likely be expensive in overall TCO.

        1. Florescen,
          Thanks for your comment. I would say that end-user email experience is driven by a number of factors; some of these are local to the client and some are outside the client environment. It's plain to me that activity at the Exchange server can make or break the end-user/client experience. The ESRP database response time is an average across all “simulated” Exchange database operations, such as searching email, reading email, sending email, copying email to the client, etc. I would think that in aggregate an end user would notice a doubling in server database response time in a number of ways, not the least of which is email sync between the client and the server, but it would also apply to how quickly an email gets from the outside world to the end user, which flows through the server, and vice versa.
          However, this debate is not over by any means and I welcome your input.
          As for TCO, I mentioned in response to another comment that ESRP does not report on storage costs, and without this, TCO comparisons are impossible to make.
          Ray

  5. While latency is a pretty good first-order metric to use as a proxy for user experience of the performance of the systems, there were so many other variables that were significantly different between the systems (cost, number of spindles, mailbox sizes, etc.) that comparing latency alone has limited utility.

    In my past analyses of ESRP (and other) results, metrics like $/IOP, IOPS/spindle and raw capacity vs. user data ratios are all good to have, as these are reflected in datacenter resource consumption, which is a large concern for most. They also allow a more "like for like" comparison, though even that's not perfect.

    If two systems have similar IOPS/spindle, then comparing latency is a good indication of how hard they were pushing the disks/system and whether there is any headroom left to handle unusual spikes in the workloads.

    It might be interesting to come up with a weighting system where these are all pulled together into an overall "score". You could debate whether the weighting was appropriate, but if you published all the data, you could adjust the weighting factors to match your requirements/concerns and re-score the comparisons. That could be very useful.

    1. StorageWithoutBorders,
      Thanks for your comments. Great idea. I guess I will need to consider what you have said and perhaps incorporate this into a new ESRP dispatch when it comes out.
      Ray
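
The IOPS arithmetic argued over in the thread can be reproduced with a short Python sketch. It uses only figures quoted in the comments: 68,800 mailboxes at 0.12 IOPS/mailbox, 13,246 database transfers/second and 480 tested drives for the HDS AMS 2500 (Ray's reading of the HDS report), and 40,000 mailboxes at 1.0 IOPS/mailbox, 10,380.94 + 10,174.77 achieved IOPS and 180 tested drives for the IBM XIV. The `tested_fraction` parameter is a hypothetical knob modelling the disputed "tested half the DAG, so double it" reading; since the commenters disagree on which runs it applies to, the sketch prints both readings.

```python
# Sketch of the ESRP arithmetic debated above; figures are those quoted in
# the comments. tested_fraction=0.5 models the disputed "double it" reading.
from dataclasses import dataclass


@dataclass
class EsrpRun:
    name: str
    mailboxes: int
    iops_per_mailbox: float
    achieved_iops: float  # as reported for the tested portion
    drives_tested: int

    def target_iops(self) -> float:
        """IOPS the full solution is driven at: mailboxes x IOPS/mailbox."""
        return self.mailboxes * self.iops_per_mailbox

    def percent_of_target(self, tested_fraction: float = 1.0) -> float:
        """Achieved IOPS, scaled up to the full solution, as % of target."""
        return 100 * (self.achieved_iops / tested_fraction) / self.target_iops()

    def iops_per_drive(self) -> float:
        """Achieved IOPS per tested spindle."""
        return self.achieved_iops / self.drives_tested


runs = [
    EsrpRun("HDS AMS 2500", 68_800, 0.12, 13_246, 480),
    EsrpRun("IBM XIV", 40_000, 1.0, 10_380.94 + 10_174.77, 180),
]

for run in runs:
    print(f"{run.name}: target {run.target_iops():,.0f} IOPS, "
          f"{run.percent_of_target():.0f}% as tested, "
          f"{run.percent_of_target(0.5):.0f}% if doubled, "
          f"{run.iops_per_drive():.1f} IOPS/drive")
```

Under either reading, both observations made in the thread show up: the XIV delivered far more IOPS per tested drive, while the HDS run exceeded its driving rate by a wider margin. The arithmetic alone cannot say whether response time is the cause.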

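StorageWithoutBorders' weighting idea can be sketched in Python as follows, assuming min-max normalization of each metric across the compared systems (1.0 = best, 0.0 = worst) and a weighted average of the normalized values. The metric names, weights and system values in the example are hypothetical placeholders, not ESRP results; rank-based or z-score normalization are reasonable alternatives if an outlier dominates a metric.

```python
# Minimal sketch of a weighted "overall score" across storage systems.
# Assumption: each metric is min-max normalized across systems so that
# 1.0 = best and 0.0 = worst, then combined as a weighted average.


def normalize(values: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Min-max normalize one metric's values so the best system scores 1.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # every system ties on this metric
        return {system: 1.0 for system in values}
    span = hi - lo
    if higher_is_better:
        return {system: (v - lo) / span for system, v in values.items()}
    return {system: (hi - v) / span for system, v in values.items()}


def weighted_scores(
    metrics: dict[str, tuple[bool, dict[str, float]]],
    weights: dict[str, float],
) -> dict[str, float]:
    """Combine normalized metrics; every metric must have a weight."""
    total_weight = sum(weights[m] for m in metrics)
    scores: dict[str, float] = {}
    for metric, (higher_is_better, values) in metrics.items():
        for system, norm in normalize(values, higher_is_better).items():
            scores[system] = scores.get(system, 0.0) + weights[metric] * norm / total_weight
    return scores


# Hypothetical example: systems "A" and "B" and their values are made up.
example = {
    "iops_per_spindle": (True, {"A": 100.0, "B": 50.0}),
    "db_read_latency_ms": (False, {"A": 18.0, "B": 9.0}),
}
print(weighted_scores(example, {"iops_per_spindle": 1.0, "db_read_latency_ms": 1.0}))
# {'A': 0.5, 'B': 0.5}
print(weighted_scores(example, {"iops_per_spindle": 3.0, "db_read_latency_ms": 1.0}))
# {'A': 0.75, 'B': 0.25}
```

With equal weights the two hypothetical systems tie; tripling the IOPS/spindle weight ranks A first, which is the kind of re-scoring the comment describes.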