Microsoft ESRP database transfer performance by storage interface – chart of the month

SCIESRP160728-001
The above chart was included in our e-newsletter Microsoft Exchange Solution Reviewed Program (ESRP) performance report, which went out at the end of July. ESRP reports on a number of metrics, but one of the more popular ones is total (reads + writes) Exchange database transfers per second.

Categories reported on in ESRP include: over 5,000 mailboxes; 1,001 to 5,000 mailboxes; and 1,000 and under mailboxes. For the above chart we created our own category using all submissions up to 10,000 mailboxes. Then we grouped the data by the storage interface between the host Exchange servers and the storage, and only included ESRP reports that used 10Krpm disk drives.
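If you want to replicate this sort of cut on the ESRP data yourself, here's a minimal sketch of the selection and grouping, using a made-up table and column names (the actual ESRP report fields differ):

```python
import pandas as pd

# Hypothetical ESRP submission data; column names and numbers are made up for illustration.
esrp = pd.DataFrame({
    "system":           ["A", "B", "C", "D"],
    "mailboxes":        [4000, 9000, 12000, 2500],
    "interface":        ["FC", "SAS", "iSCSI", "SAS"],
    "drive_rpm":        [10000, 10000, 7200, 10000],
    "db_xfers_per_sec": [4500.0, 8200.0, 9100.0, 3100.0],
})

# Our custom category: all submissions up to 10,000 mailboxes,
# restricted to 10Krpm disk drive configurations.
subset = esrp[(esrp.mailboxes <= 10000) & (esrp.drive_rpm == 10000)]

# Group by the host-to-storage interface and summarize database transfers/sec.
print(subset.groupby("interface")["db_xfers_per_sec"].describe())
```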

Microsoft Exchange database backup performance – chart of the month

Microsoft Exchange 1001-5000 mailboxes, top 10 database backup per server
In last month’s Storage Intelligence newsletter we discussed the latest Exchange storage system performance for 1001 to 5000 mailboxes. One chart we updated was the above Exchange database backup throughput on a per-server basis. There were two new submissions this quarter, and both the Dell PowerEdge R730xd (#2 above) and the HP D3600 drive shelf with P441 storage controller (#10) ranked well on this metric.

This ESRP-reported metric only measures backup throughput at the server level, not for the storage system as a whole. However, because these two new submissions each used only one server, per-server and total throughput are the same, so it’s not as much of a problem here.

The Dell system had a SAS-connected JBOD with 14 4TB 7,200RPM disks and the HP system had a SAS-connected JBOD with 11 6TB 7,200RPM disks. The other major difference is that the HP system had 4GB of “flash backed write cache” while the Dell system only had 2GB of “flash backed cache”.

As far as I can tell, the fact that the Dell storage managed ~2.3GB/sec. while the HP storage only managed ~1.1GB/sec. is probably due more to their respective drive configurations than anything else.
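As a quick back-of-the-envelope check (just arithmetic on the drive counts and aggregate rates quoted above, nothing from the detailed reports):

```python
# Rough per-drive backup throughput, using the quoted aggregates and drive counts.
dell_gbps, dell_drives = 2.3, 14   # Dell PowerEdge R730xd: ~2.3GB/sec over 14 drives
hp_gbps, hp_drives = 1.1, 11       # HP D3600/P441: ~1.1GB/sec over 11 drives

print(f"Dell: ~{dell_gbps / dell_drives * 1000:.0f} MB/sec per drive")  # ~164 MB/sec
print(f"HP:   ~{hp_gbps / hp_drives * 1000:.0f} MB/sec per drive")      # ~100 MB/sec
```

So beyond simply having three more spindles, the Dell configuration was also extracting more backup throughput out of each drive.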

RAID 0 vs. RAID 1

One surprising characteristic of the HP setup is that it used RAID 0 while the Dell system used RAID 1. RAID 1 would offer a significant benefit to the Dell system during heavy read activity, but as I understand it, the database backup activity is run alongside a standard email stress workload, so there is a healthy mix of reads and writes going on at the time of the backup. The Dell system would thus have an advantage for reads but a penalty for writes (writing two copies of all data), which means Dell’s RAID advantage is probably a wash.
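To put the “wash” argument in rough numbers, here’s a toy effective-spindle model (my own simplification, not anything from the ESRP reports, and it ignores the controllers’ flash-backed write caches):

```python
# Toy model: effective spindles available to reads vs. writes for N physical drives.
def effective_spindles(n_drives, raid):
    if raid == "RAID0":
        return n_drives, n_drives            # reads and writes both stripe across all drives
    if raid == "RAID1":
        return n_drives, n_drives / 2.0      # reads can use either copy; each write hits two drives
    raise ValueError(raid)

for raid, n in (("RAID1", 14), ("RAID0", 11)):   # Dell vs. HP drive counts from above
    reads, writes = effective_spindles(n, raid)
    print(f"{raid} with {n} drives: ~{reads:.0f} read spindles, ~{writes:.0f} write spindles")
```

On this crude view the Dell mirror sees more spindles for reads but fewer for writes than the HP stripe, which is roughly the trade-off described above.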

Whether RAID 0 vs. RAID 1 would have made any difference to other ESRP metrics (database transfers per second, read/write/log access latencies, log processing, etc.) is a subject for another post.

Of course, with Exchange DAGs there’s built-in database redundancy, so maybe RAID 0 is an OK configuration for some customers. Software-based redundancy does seem to be Microsoft’s direction, at least since Exchange 2010, so maybe I’m the one that’s out of touch.

Still, for such a small configuration, I’m not sure I would have gone with RAID 0…

Comments?

SCI SPECsfs2008 NFS throughput per node – Chart of the month

SCISFS150928-001
As SPECsfs2014 still only has (SPECsfs-sourced) reference benchmark results, we have been showing some of our seldom-seen SPECsfs2008 charts in our quarterly SPECsfs performance reviews. The above chart was sent out in last month’s Storage Intelligence Newsletter and shows NFS throughput operations per second per node.

In the chart, we only include NFS SPECsfs2008 benchmark results for configurations with more than two nodes, and we divided the maximum NFS throughput operations per second achieved by the node count to compute NFS ops/sec/node.
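The per-node metric itself is just a division; a minimal sketch, with totals back-computed from the per-node figures discussed below (so approximate, not the official SPECsfs2008 numbers):

```python
# NFS ops/sec per node = maximum NFS throughput ops/sec achieved / node count.
# Totals below are back-computed approximations, not official SPECsfs2008 results.
submissions = [
    ("HDS VSP G1000 (8 file modules)",    1_216_000,  8),
    ("HDS HUS (VM) (4 file modules)",       608_000,  4),
    ("Huawei OceanStor N8500 (24 nodes)", 3_072_000, 24),
]
for name, max_ops, nodes in submissions:
    if nodes > 2:                       # only configurations with more than 2 nodes
        print(f"{name}: ~{max_ops / nodes / 1000:.0f}K NFS ops/sec/node")
```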

The HDS VSP G1000 with 8 4100 file modules (nodes) and the HDS HUS (VM) with 4 4100 file modules (nodes) came in at #1 and #2 respectively for ops/sec/node, each attaining ~152K NFS throughput operations/sec. per node. The #3 competitor was the Huawei OceanStor N8500 Cluster NAS with 24 nodes, which achieved ~128K NFS throughput operations/sec./node. In 4th and 5th place were the EMC VNX VG8/VNX5700 with 5 X-blades and the Dell Compellent FS8600 with 4 appliances, each of which reached ~124K NFS throughput operations/sec. per node. It falls off significantly from there, with two groups at ~83K and ~65K NFS ops/sec./node.

Although not shown above, it’s interesting that there are many well-known scale-out NAS solutions in the SPECsfs2008 results with over 50 nodes that do much worse than the top 10 above, at <10K NFS throughput ops/sec/node. Fortunately, most scale-out NAS nodes cost quite a bit less than the systems above.

But for my money, one can be well served with a more sophisticated, enterprise-class NAS system which can do >10X the NFS throughput operations per second per node of a scale-out system. That is, if you don’t have to deploy 10PB or more of NAS storage.

More information on SPECsfs2008/SPECsfs2014 performance results as well as our NFS and CIFS/SMB ChampionsCharts™ for file storage systems can be found in our just updated NAS Buying Guide available for purchase on our web site.

Comments?

~~~~

The complete SPECsfs2008 performance report went out in SCI’s September newsletter.  A copy of the report will be posted on our dispatches page sometime this quarter (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free monthly newsletters by just using the signup form above right.

As always, we welcome any suggestions or comments on how to improve our SPECsfs  performance reports or any of our other storage performance analyses.

 

New SPECsfs2008 CIFS/SMB vs. NFS (again) – chart of the month

SPECsfs2008 benchmark results for CIFS/SMB vs. NFS protocol performance
SCISFS140326-001 (c) 2014 Silverton Consulting, All Rights Reserved

The above chart represents another in a long line of charts on the relative performance of the CIFS[/SMB] versus NFS file interface protocols. The information on the chart is taken from vendor submissions that used the exact same hardware configuration for both their NFS and CIFS/SMB SPECsfs2008 benchmark submissions.

There are generally two charts I show in our CIFS/SMB vs. NFS analysis, the one above and another that shows an ops/sec-per-spindle analysis for all NFS and CIFS/SMB submissions.  Both have historically indicated that CIFS/SMB had an advantage. The one above plots the total number of NFS operations per second on one axis and CIFS/SMB operations per second on the other, and provides a linear regression across the data. It shows that, on average, the CIFS/SMB protocol provides about 40% more (~36.9%) operations per second than the NFS protocol does with the same hardware configuration.
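For the curious, here is a minimal sketch of the sort of regression behind that number, assuming the same-hardware submissions have been paired up into (NFS ops/sec, CIFS/SMB ops/sec) tuples; the data below is made up purely for illustration:

```python
import numpy as np

# Hypothetical (NFS ops/sec, CIFS ops/sec) pairs for identical hardware configurations;
# made-up data for illustration, not actual SPECsfs2008 submissions.
nfs  = np.array([12_000, 18_500, 42_000, 75_000, 110_000, 210_000])
cifs = np.array([16_500, 25_000, 58_000, 101_000, 150_000, 287_000])

# Least-squares linear regression of CIFS ops/sec against NFS ops/sec.
slope, intercept = np.polyfit(nfs, cifs, 1)
r2 = np.corrcoef(nfs, cifs)[0, 1] ** 2
print(f"CIFS ~= {slope:.2f} x NFS + {intercept:.0f}  (R^2 = {r2:.3f})")
# A slope of ~1.37 would correspond to the ~37% CIFS/SMB advantage cited above.
```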

However, there are a few caveats about this and my other CIFS/SMB vs. NFS comparison charts:

  • The SPECsfs2008 organization has informed me (and posted on their website) that CIFS[/SMB] and NFS are not comparable. CIFS/SMB is a stateful protocol and NFS is stateless, and the corresponding commands act accordingly. My response to them and my readers is that they both provide file access to a comparable set of file data (we assume; see my previous post on What’s wrong with SPECsfs2008) and, in many cases today, can provide access to the exact same file using both protocols on the same storage system.
  • The SPECsfs2008 CIFS/SMB benchmark does slightly more read and slightly fewer write data operations than the corresponding NFS workload. Specifically, the CIFS/SMB workload is 20.5% READ_ANDX and 8.6% WRITE_ANDX commands vs. 18% READ and 9% WRITE commands for NFS.
  • There are fewer CIFS/SMB benchmark submissions than NFS submissions, and even fewer with the exact same hardware (only 13). So the statistics comparing the two in this way must be considered preliminary, even though the above linear regression fit is very good (R**2 at ~0.98).
  • Many of the submissions on the above chart are for smaller systems. In fact, 5 of the 13 submissions were for storage systems that delivered less than 20K NFS ops/sec, which may be skewing the results; most of these can be seen bunched up around the origin of the graph above.

And all of this would be wonderfully consistent if not for a recent benchmark submission by NetApp on their FAS8020 storage subsystem. For once, NetApp submitted the exact same hardware for both an NFS and a CIFS/SMB submission and, lo and behold, it performed better on NFS (110.3K NFS ops/sec) than on CIFS/SMB (105.1K CIFS ops/sec), or just under ~5% better on NFS.

Luckily for the chart above, this was a rare event, and most others that submitted both did better on CIFS/SMB. But I have been proven wrong before and will no doubt be proven wrong again. So I plan to update this chart whenever we get more submissions for both CIFS/SMB and NFS with the exact same hardware, so we can see a truer picture over time.

For those with an eagle eye, you can see NetApp’s FAS8020 submission as the point below the line in the first box above the origin, which indicates it did better on NFS than CIFS/SMB.

Comments?

~~~~

The complete SPECsfs2008  performance report went out in SCI’s March 2014 newsletter.  But a copy of the report will be posted on our dispatches page sometime next quarter (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

Even more performance information on NFS and CIFS/SMB protocols, including our ChampionCharts™ for file storage can be found in  SCI’s recently (March 2014) updated NAS Buying Guide, on sale from our website.

As always, we welcome any suggestions or comments on how to improve our SPECsfs2008 performance reports or any of our other storage performance analyses.

Latest SPC-2 performance results – chart of the month

Spider chart of the top 10 SPC-2 MB/second results broken out by workload (LFP, LDQ and VOD)
In the figure above you can see one of the charts from our latest performance dispatch on SPC-1 and SPC-2 benchmark results. The above chart shows SPC-2 throughput results sorted in aggregate MB/sec order, with all three workloads broken out for more information.

Just last quarter I was saying it didn’t appear as if any all-flash system could do well on SPC-2’s throughput-intensive workloads. Well, I was wrong (again): with an aggregate MBPS™ of ~33.5GB/sec., Kaminario’s all-flash K2 took the SPC-2 MBPS results to a whole different level, almost doubling the nearest competitor in this category (Oracle ZFS ZS3-4).

Ok, Howard Marks (deepstorage.net), my GreyBeardsOnStorage podcast co-host and long-time friend, had warned me that SSDs had the throughput to be winners at SPC-2, but that they would probably cost too much to be viable. I didn’t believe him at the time; how wrong could I be.

As for cost, both Howard and I misjudged this one. The K2 came in at just under $1M USD, whereas the #2 Oracle system was under $400K. But there were five other top 10 SPC-2 MBPS systems over $1M, so the K2 all-flash system’s price was about average for the top 10.

Ok, if cost and high throughput aren’t the problem, why haven’t we seen more all-flash SPC-2 benchmark submissions? I tend to think that most flash systems are optimized for OLTP-like update activity and not sequential throughput. The K2 is obviously one exception. But I think we need to go a little deeper into the numbers to understand just what it was doing so well.

The details

The LFP (large file processing) reported MBPS metric is the average over 1MB and 256KB data transfer sizes of streaming activity at 100% write, 100% read and 50%:50% read-write. In K2’s detailed SPC-2 report, one can see that for the 100% write workload the K2 averaged ~26GB/sec., for the 100% read workload ~38GB/sec., and for the 50:50 read:write workload ~32GB/sec.

On the other hand, the LDQ (large database query) workload appears to be entirely sequential read-only, but the report shows that it is made up of two workloads, one using 1MB data transfers and the other using 64KB data transfers, with various numbers of streams fired up to generate stress. The surprising item in K2’s LDQ run is that it did much better on the 64KB data streams than the 1MB data streams, an average of 41GB/sec. vs. 32GB/sec. This probably says something about an internal flash data transfer bottleneck for large data transfers someplace in the architecture.

The VOD (video on demand) workload also appears to be sequential read-only. The report doesn’t indicate a data transfer size, but given K2’s actual results, averaging ~31GB/sec., it would seem to be on the order of 1MB.

So what we can tell is that K2’s SSD write throughput is worse than its read throughput (~1/3rd worse) and relatively smaller sequential reads are better than relatively larger sequential reads (~1/4 better). But I must add that even at the relatively “slower” write throughput, the K2 would still have beaten the next best disk-only storage system by ~10GB/sec.
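Those ratios fall straight out of the per-workload averages quoted above; here’s the arithmetic (simplified, since the official SPC-2 composites are computed over the individual test runs rather than these rounded averages):

```python
# Per-workload averages for Kaminario's K2, as quoted above (GB/sec).
lfp_write, lfp_read, lfp_mixed = 26.0, 38.0, 32.0   # LFP: 100% write, 100% read, 50:50
ldq_64kb, ldq_1mb = 41.0, 32.0                      # LDQ: 64KB vs. 1MB streams
vod = 31.0                                          # VOD

print(f"Write vs. read throughput: {1 - lfp_write / lfp_read:.0%} lower")   # ~32%, i.e. ~1/3rd
print(f"64KB vs. 1MB LDQ streams:  {ldq_64kb / ldq_1mb - 1:.0%} higher")    # ~28%, i.e. ~1/4

# SPC-2 MBPS is (as I understand it) the average of the three workload composites,
# which lands in the right ballpark of the ~33.5GB/sec aggregate quoted above.
lfp = (lfp_write + lfp_read + lfp_mixed) / 3
ldq = (ldq_64kb + ldq_1mb) / 2
print(f"Rough aggregate: ~{(lfp + ldq + vod) / 3:.1f} GB/sec")
```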

Where are the other all-flash SPC-2 benchmarks?

Prior to the K2 there was only one other all-flash system submission (TMS RamSan-630) for SPC-2. I suspect that writing 26GB/sec. to an all-flash system would be hazardous to its health, and maybe other all-flash storage system vendors don’t want to encourage this type of activity.

Just for the record, the K2 SPC-2 result has been submitted for “review” (as of 18Mar2014) and may be modified before it is finally “accepted”. However, the review process typically doesn’t impact performance results as much as other report items. So, officially, we will need to await final acceptance before we can truly believe these numbers.

Comments?

~~~~

The complete SPC  performance report went out in SCI’s February 2014 newsletter.  But a copy of the report will be posted on our dispatches page sometime next quarter (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

Even more performance information and OLTP, Email and Throughput ChampionCharts for Enterprise, Mid-range and SMB class storage systems are also available in SCI’s SAN Buying Guide, available for purchase from our website.

As always, we welcome any suggestions or comments on how to improve our SPC  performance reports or any of our other storage performance analyses.

SPECsfs2008 results NFS throughput vs. flash size – Chart of the Month

Scatter plot of SPECsfs2008 NFS throughput results against flash (SSD) size
The above chart was sent out in our December newsletter and represents yet another attempt to understand how flash/SSD use is impacting storage system performance. This chart’s interesting twist is to try to categorize the use of flash in hybrid (disk-SSD) systems vs. flash-only/all-flash storage systems.

First, we categorize SSD/Flash-only (blue diamonds on the chart) systems as any storage system that has as much or more flash storage capacity than SPECsfs2008 exported file system capacity. While not entirely accurate (there is one system that has ~99% of its exported capacity in flash), it is a reasonable approximation. Any other system that has some flash identified in its configuration is considered a Hybrid SSD&Disks (red boxes on the chart) system.

Next, we plot each system’s NFS throughput on the vertical axis and its flash capacity (in GB) on the horizontal axis. Then we chart a linear regression for each set of data.
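A minimal sketch of that categorization and the per-category regressions, using a made-up result table (columns and numbers are for illustration only, not actual SPECsfs2008 submissions):

```python
import numpy as np

# Hypothetical SPECsfs2008 rows: (flash GB, exported capacity GB, NFS ops/sec);
# made-up data for illustration only.
rows = [
    (  800,     750, 210_000),   # flash >= exported capacity -> flash-only
    (2_000,   1_900, 260_000),
    (  500,  40_000, 190_000),   # flash < exported capacity -> hybrid
    (1_200,  90_000, 430_000),
    (3_000, 250_000, 700_000),
]

flash_only = [(f, ops) for f, exported, ops in rows if f >= exported]
hybrid     = [(f, ops) for f, exported, ops in rows if f < exported]

for label, group in (("flash-only", flash_only), ("hybrid", hybrid)):
    flash_gb, ops = map(np.array, zip(*group))
    slope, intercept = np.polyfit(flash_gb, ops, 1)      # NFS ops/sec vs. flash GB
    print(f"{label}: ~{slope:.0f} additional NFS ops/sec per GB of flash")
```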

What troubles me with this chart is that hybrid systems are getting much more NFS throughput performance out of their flash capacity than flash-only systems. One would think that flash-only systems would generate more throughput per flash GB than hybrid systems because of the slow access times from disk. But the data shows this is wrong?!

We understand that NFS throughput operations are mostly metadata file calls and not data transfers, so one would think that the relatively short random IOs would favor flash-only systems. But that’s not what the data shows.

What the data seems to tell me is that judicious use of flash and disk storage in combination can be better than either alone, or at least better than flash alone. So maybe those short random IOs should be served out of SSD, and the relatively longer, more sequential-like data accesses (which represent only 28% of the operations that constitute NFS throughput) should be served out of disk. And as the metadata for file systems is relatively small in capacity, this can be supported with a small amount of SSD, leveraging that minimal flash capacity for the greater good (or more NFS throughput).

I would be remiss if I didn’t mention that there are relatively few (7) flash-only systems in the SPECsfs2008 benchmarks and the regression coefficient is very poor (R**2=~0.14), which means that this could change substantially with more flash-only submissions. However, it’s looking pretty flat from my perspective, and it would take an awful lot of flash-only systems showing much higher NFS throughput per flash GB to make a difference in the regression equation.

Nonetheless, I am beginning to see a pattern here in that SSD/flash is good for some things and disk continues to be good for others. And smart storage system developers would do well to realize this fact. Also, as a side note, I am beginning to see some rationale for why there aren’t more flash-only SPECsfs2008 results.

Comments?

~~~~

The complete SPECsfs2008 performance report went out in SCI’s December 2013 newsletter.  But a copy of the report will be posted on our dispatches page sometime this quarter (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

Even more performance information and ChampionCharts for NFS and CIFS/SMB storage systems are also available in SCI’s NAS Buying Guide, available for purchase from our website.

As always, we welcome any suggestions or comments on how to improve our SPECsfs2008  performance reports or any of our other storage performance analyses.

 

VM working set inflection points & SSD caching – chart-of-the-month

I attended SNW USA a couple of weeks ago and talked with Irfan Ahmad, Founder/CTO of CloudPhysics, a new Management-as-a-Service offering for VMware. He pulled out a chart that I found very interesting, which I reproduce below as my Chart of the Month for October.

© 2013 CloudPhysics, Inc., All Rights Reserved

Above is a plot of a typical OLTP-like application’s IO activity fed into CloudPhysics’ SSD caching model. (I believe this is a read-only SSD cache, although they have write-back and write-through SSD caching models as well.)

On the horizontal axis is SSD cache size in MB, ranging from 0MB to 3,500MB. On the left vertical axis is the % of application IO activity that results in cache hits. On the right vertical axis is the amount of data that comes out of cache in MB, ranging from 0MB to 18,000MB.

The IO trace was for a 24-hour period and shows how much of the application’s IO workload could be captured and converted to (SSD) cache hits given a certain sized cache.
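I don’t know the internals of CloudPhysics’ model, but conceptually a curve like this comes from replaying the IO trace against a simulated cache at a range of sizes. Here’s a minimal sketch of the idea using a plain LRU read cache and a synthetic trace (my own simplification, not CloudPhysics’ algorithm):

```python
from collections import OrderedDict
import random

def lru_hit_rate(trace, cache_blocks):
    """Replay a trace of block addresses against an LRU cache; return the hit rate."""
    cache, hits = OrderedDict(), 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)     # evict the least recently used block
    return hits / len(trace)

# Synthetic, skewed "OLTP-like" trace: a small hot set plus a long random tail.
random.seed(42)
trace = [random.randrange(1_000) if random.random() < 0.6
         else 1_000 + random.randrange(200_000)
         for _ in range(200_000)]

for size in (1_000, 10_000, 50_000, 100_000):   # simulated cache sizes, in blocks
    print(f"{size:>7} blocks: {lru_hit_rate(trace, size):.0%} hit rate")
```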

The other thing that would have been interesting to know is the size of the OLTP database being used by the application; it could easily be 18GB or TBs in size, but we don’t see that here.

Analyzing the chart

First, in the mainframe era (we’re still there, aren’t we), the rule of thumb was that doubling cache size should increase the cache hit rate by about 10%.

Second, I don’t understand why at 0MB of cache the cache hit rate is ~25%. From my perspective, at 0MB of cache the hit rate should be 0%. Seems like a bug in the model, but that aside, the rest of the curve is very interesting.

Somewhere around 500MB of cache there is a step function where the cache hit rate goes from ~30% to ~50%. This is probably some sort of DB index that now fits in cache and whose accesses have become cache hits.

As for the rule of thumb, going from 500MB to 1000MB doesn’t seem to do much; maybe it increases the cache hit rate by a few percent. And doubling it again (to 2000MB) only seems to get you another percent or two of cache hits.

But moving to a 2300MB cache gets you to over an 80% cache hit rate. I would have to say the rule of thumb doesn’t work well for this workload.

I'm not sure what the step up really represents from the OLTP workload perspective, but at an 80% cache hit rate, most of the more frequently accessed database tables must now reside in cache. Prior to this cache size (<2300MB), all of those tables apparently just didn’t fit in cache; as one was being accessed and moved into cache, another was being pushed out of cache, causing a read miss the next time it was accessed. At or beyond this cache size (>=2300MB), all these frequently accessed tables can now remain in cache, resulting in the ~80% cache hit rate seen on the chart.

Irfan said that they do not display the chart in the CloudPhysics solution but rather display the inflection points. That is, their solution would say something like: at 500MB of SSD the traced application should see an ~50% cache hit rate, and at 2300MB of SSD it should generate ~80% cache hits. This nets it out for the customer but hides the curve above and the underlying complexity.

Caching models & application working sets …

With CloudPhysics’ SSD trace simulation Card (caching model) and the ongoing lightweight IO trace collection (IO tracing) available with their service, any VM’s working set can be understood at this fine level of granularity. The advantage of CloudPhysics is that, with these tools, one can determine the optimum cache size required to generate a given level of cache hits.

I would add some cautions to the above:

  • The results shown here are based on a CloudPhysics SSD caching model. Not all SSDs cache in the same way, and there can be quite a lot of sophistication in caching algorithms (having worked on a few in my time). So although this may show the hit rate for a simplistic SSD cache, it could easily under- or over-estimate real cache hit rates, perhaps by a significant amount. The only way to validate CloudPhysics’ SSD simulation model is to put a physical cache in at the appropriate size and measure the VM’s cache hit rate.
  • Real caching algorithms have a number of internal parameters which can impact cache hit rates. Not the least of which is the size of the IO block being cached. This can be (commonly) fixed  or (rarely) variable in length. But there are plenty of others which can adversely impact cache hit rates as well for differing workloads.
  • Real caches have a warm up period. During this time the cache is filling up with tracks which may never be referenced again. Some warm up periods take minutes while some I have seen take weeks or longer. The simulation is for 24 hours only, unclear how the hit rate would be impacted if the trace/simulation was for longer or shorter periods.
  • Caching IO activity can introduce a positive (or negative) feedback into any application’s IO stream. If, without a cache, an index IO took, let’s say, 10 msec to complete and now, with an appropriately sized cache, it takes 10 μseconds to complete, the application users are going to complete more transactions, faster. As this takes place, database IO activity will change from what it looked like without any caching. Also, even the non-cache hits should see some speedup, because the amount of IO issued to the backend storage is reduced substantially. At some point this all reaches some sort of stasis and we have an ongoing cache hit rate. But the key point is that it’s unlikely to be an exact match to the cache hit rate a trace and model would predict. The point is that adding cache to any application environment has effects which are chaotic in nature and inherently difficult to model.

Nonetheless, I like what I see here. I believe it would be useful to understand a bit more about CloudPhysics’ caching model algorithm, the size of the application database being traced here, and how well their predictions actually matched up to physical caches at the sizes recommended.

… the bottom line

Given what I know about caching in the real world, my suggestion is to take the cache sizes recommended here as a bottom end estimate and the cache hit predictions as a top end estimate of what could be obtained with real SSD caches.  I would increase the cache size recommendations somewhat and expect something less than the cache hits they predicted.

In any case, having application (even VM) IO traces like this that could be accessed and used to drive caching simulation models should be a great boon to storage developers everywhere. I can only hope that server-side SSD and caching storage vendors supply their own proprietary cache model Cards for CloudPhysics so that potential customers could use their application traces with the vendor Cards to predict what that hardware can do for an application.

If you want to learn more about block storage performance from SMB to enterprise class SAN storage systems, please check out our SAN Buying Guide, available for purchase on our website. Also, we report each month on storage performance results from SPC, SPECsfs, and ESRP in our free newsletter. If you would like to subscribe to this, please use the signup form above right.

~~~~

Comments?

Image:  Chart courtesy of and use approved by CloudPhysics

Latest SPECsfs2008 results NFS vs. CIFS – chart-of-the-month

SCISFS121227-010(001) (c) 2013 Silverton Consulting, Inc. All Rights Reserved

We return to our perennial quest to understand file storage system performance and our views on NFS vs. CIFS performance.  As you may recall, SPECsfs2008 believes that there is no way to compare the two protocols because

  • CIFS/SMB is “stateful” and NFS is “stateless”
  • The two protocols are issuing different requests.

Nonetheless, I feel it’s important to go beyond these concerns and see if there is any way to assess the relative performance of the two protocols.  But first a couple of caveats on the above chart:

  • There are 25 CIFS/SMB submissions and most of them are for small and medium business (SMB) environments, vs. 64 NFS submissions which are all over the map
  • There are about 12 systems that have submitted the exact same configurations for both CIFS/SMB and NFS SPECsfs2008 benchmarks.
  • This chart does not include any SSD or FlashCache systems, just disk drive only file storage.

All that being said, let us now see what the plot has to tell us. First, the regression line is computed by Excel and is a linear regression. The regression coefficient for CIFS/SMB is much better, at 0.98 vs. 0.80 for NFS. But this just means that there is a better correlation between CIFS/SMB throughput operations per second and the number of disk drives in a benchmark submission than is seen for NFS.

Second, the equations and slopes of the two lines are a clear indicator that CIFS/SMB provides more throughput operations per second per disk than NFS. What this tells me is that, given the same hardware and all things being equal, the CIFS/SMB protocol should perform better than the NFS protocol for file storage access.
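For completeness, here is a sketch of how those two regression lines could be reproduced outside of Excel, fitting throughput ops/sec against disk drive count separately for each protocol (made-up data, not actual submissions):

```python
import numpy as np

# Hypothetical SPECsfs2008 rows: (protocol, disk drive count, throughput ops/sec);
# made-up for illustration, not actual submissions.
results = [
    ("CIFS", 24, 21_000), ("CIFS", 48, 43_000), ("CIFS",  96, 86_000),
    ("NFS",  24, 15_500), ("NFS",  48, 30_000), ("NFS",  120, 74_000),
]

for proto in ("CIFS", "NFS"):
    drives, ops = zip(*[(d, o) for p, d, o in results if p == proto])
    slope, intercept = np.polyfit(drives, ops, 1)   # least-squares line per protocol
    print(f"{proto}: ~{slope:.0f} throughput ops/sec per disk drive")
```

The protocol with the steeper slope is the one getting more throughput out of each spindle, which is what the chart above is showing for CIFS/SMB.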

Just for the record, the CIFS/SMB version used by SPECsfs2008 is currently SMB2 and the NFS version is NFSv3. SMB3 was just released last year by Microsoft, and there aren’t that many vendors (other than Windows Server 2012) that support it in the field yet; SPECsfs2008 has yet to adopt it as well. NFSv4 has been out since 2000, but SPECsfs2008 and most vendors never adopted it. NFSv4.1 came out in 2010 and has seen little adoption so far.

So these results are based on older, but current versions of both protocols available in the market today.

So, given all that, if I had an option I would run CIFS/SMB protocol for my file storage.

Comments?

More information on SPECsfs2008 performance results as well as our NFS and CIFS/SMB ChampionsCharts™ for file storage systems can be found in our NAS Buying Guide available for purchase on our web site.

~~~~

The complete SPECsfs2008 performance report went out in SCI’s December newsletter.  But a copy of the report will be posted on our dispatches page sometime this month (if all goes well).  However, you can get the latest storage performance analysis now and subscribe to future free newsletters by just using the signup form above right.

As always, we welcome any suggestions or comments on how to improve our SPECsfs2008  performance reports or any of our other storage performance analyses.