IO500 performance report as of 25Nov2021


This Storage Intelligence (StorInt™) dispatch covers the IO500 benchmark[1]. Recall that the IO500 is focused on HPC (high performance computing) file IO. Unlike other file benchmarks (which use NFS and SMB), most IO500 submissions use POSIX file systems that require client software to access their file systems. The IO500 supports two rankings: one that allows submissions with any number of client nodes and another limited to 10 client nodes.

The group that organizes the IO500 benchmarks, the Virtual Institute for IO (VI4IO), ranks submissions using a composite score that is a function of four IOR bandwidth-intensive workloads (easy read, easy write, hard read, and hard write) and eight mdtest metadata-intensive workloads (easy write, stat, and delete; hard write, read, stat, and delete; and find). The IO500 IOR benchmarks simulate big block, bandwidth intensive (traditional HPC) file IO activity and the mdtest benchmarks simulate small block (possibly AI) IO activity. Both are factored into the composite score used to rank IO500 system performance.
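Roughly speaking (and as our own illustration, not the official scoring code), the composite is built from geometric means: a bandwidth score in GiB/sec (the geometric mean of the four IOR results), a metadata score in KIOP/sec (the geometric mean of the eight mdtest/find results), and an overall score that is the geometric mean of those two. A minimal Python sketch, with made-up sub-test numbers:

```python
from math import prod

def geo_mean(values):
    """Geometric mean -- a single weak sub-test drags the whole score down."""
    return prod(values) ** (1.0 / len(values))

# Made-up sub-test results, for illustration only (not from any submission).
ior_bw_gib_s = [42.0, 55.0, 8.5, 6.2]                 # the four IOR bandwidth results, GiB/sec
md_kiops = [310, 290, 250, 95, 120, 180, 160, 900]    # the eight mdtest/find results, KIOP/sec

bw_score = geo_mean(ior_bw_gib_s)            # bandwidth score, GiB/sec
md_score = geo_mean(md_kiops)                # metadata score, KIOP/sec
io500_score = (bw_score * md_score) ** 0.5   # overall score: geometric mean of the two

print(f"BW={bw_score:.2f} GiB/sec  MD={md_score:.2f} KIOP/sec  IO500={io500_score:.2f}")
```

Because geometric means are used, a submission can't buy a high ranking with one great sub-test; a poor hard-write or find result pulls the whole composite down.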

IO500 rankings are updated twice a year, once at the ISC-HPC conference in the spring and the other at the SC conference in the fall. The recently completed SC21 (Supercomputing 2021) conference IO500 results had several new submissions. 

IO500 10 client node results

We start our discussion with the top 10 composite score rankings as reported by IO500 for submissions with 10 client nodes in Figure 1. 

Figure 1 Top 10 IO500 10 client nodes overall score results

In Figure 1, the Pengcheng CloudBrain-II MadFS system came in first, with a composite IO score of ~2600 from earlier this year, at ISC21. We discussed the Pengcheng results in our last IO500 report but, to reiterate, Pengcheng's bandwidth scores were ok, but their mdtest and find scores were great (see mdtest rankings below).

Coming in at #2 and #3 were new SC21 submissions from Huawei, running their OceanFS storage OS (or OceanStor). Key to Huawei's #2 and #3 rankings was their use of a distributed hash table for directory elements, large IO passthrough, small IO file aggregation, and large numbers (800 and 400, respectively) of NVMe SSDs. They also did well in mdtest IO activity.
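Hashing directory entries across metadata servers is a standard way to spread metadata-heavy workloads like mdtest. The sketch below is our own illustration of the general idea (not Huawei's implementation), assuming a fixed set of metadata servers and a hash over the (parent directory, entry name) pair:

```python
import hashlib

NUM_MD_SERVERS = 8  # illustrative value, not from the Huawei submissions

def md_server_for(parent_dir: str, name: str) -> int:
    """Pick the metadata server that owns a directory entry.

    Hashing (parent dir, entry name) spreads the entries of even a single
    huge directory across all servers, so mdtest-style create/stat/delete
    storms aren't serialized on one metadata node.
    """
    key = f"{parent_dir}/{name}".encode()
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_MD_SERVERS

# Entries created in the same directory land on different servers.
for i in range(5):
    name = f"file-{i:04d}"
    print(name, "-> MD server", md_server_for("/bench/mdtest-easy", name))
```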

The #4 ranked system used Intel DAOS, which was also extensively discussed in our last IO500 report; to summarize, this submission used Optane PMEM as its primary data storage.

At #5 was another new SC21 submission, from Kongming BPFS Lab using BPFS. There's not a lot of current information on BPFS (byte-addressable file system), which came out of UCLA research around 2009 and used DRAM to simulate byte-addressable persistent memory. The IO500-supplied data showed that BPFS Lab used 280 NVMe SSDs for their submission.

In Figure 2 we show the top 10 mdtest and find composite scores for 10 client node submissions.

Figure 2 IO500 top 10 mdtest & find (composite) score results

Here too, the latest Pengcheng submission came in at #1, with an mdtest & find composite score of ~34.8K KIOP/sec. As one can see above, the Pengcheng submission did ~2X the nearest (Huawei) competitor on mdtest and find IO.

Once again, the #2 and #3 mdtest and find ranked submissions were the same two Huawei OceanFS systems seen in the composite IO rankings above. The solution with more drives (800 vs. 400) came in at 18.2K KIOP/sec and the smaller drive configuration at 16.7K KIOP/sec on the metadata composite score.

The #4 submission was the Kongming BPFS solution, with a composite mdtest & find score of 9.8K KIOP/sec, or about 59% of the next-higher-ranked Huawei OceanFS solution.

In Figure 3, we show the IO500 top 10 mdtest & find individual scores, ranked by their composite mdtest & find rankings, for 10 client node systems.

Figure 3 IO500 Top 10 mdtest & find individual score results

Sorry for the complexity in Figure 3. The scale is logarithmic, and we show the individual scores for each of the mdtest and find IO activities that go into the mdtest & find composite score. Higher is better.

There's something strange happening with the find IO activity. The #1-3 ranked submissions all had find scores above 1,000 KIOP/sec and the #4 submission was over 100 KIOP/sec, while all the remaining submissions here (#5-10) were under 10 KIOP/sec.

This can't be explained by storage response time or latency, as the DAOS solutions (#5-9) used Optane PMEM while all the rest used (NVMe) SSDs. Our best guess is that more of the find IO activity is happening external to the client nodes (at the storage nodes) for the #1-4 submissions than for any of the others (see the sketch below). If this is the case, we will need to somehow account for this storage node IO activity or restrict it. Another option is to rank submissions by average (client node) response time as well as IO counts. There are other options as well.
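To illustrate why offloading matters, consider the difference between a client-side find, which must pull every entry over the network and filter it on the 10 client nodes, and a server-side scan, where each storage/metadata node filters its own shard and returns only matches. The sketch below is purely our own illustration, not any submission's actual implementation:

```python
import fnmatch
from dataclasses import dataclass

@dataclass
class MetadataServer:
    """Stand-in for a storage/metadata node that owns one shard of the namespace."""
    names: list

    def scan(self, pattern: str) -> list:
        # Filtering happens locally on the server; only matches cross the network.
        return [n for n in self.names if fnmatch.fnmatch(n, pattern)]

def client_side_find(all_names: list, pattern: str) -> list:
    """Clients pull every entry over the wire and filter it themselves,
    so the find rate is bounded by what 10 client nodes can stat/readdir."""
    return [n for n in all_names if fnmatch.fnmatch(n, pattern)]

def server_side_find(servers: list, pattern: str) -> list:
    """Each server scans its own shard (in a real system, in parallel) and
    returns only the matches, so most of the find work never touches the clients."""
    hits = []
    for s in servers:
        hits.extend(s.scan(pattern))
    return hits

# Tiny illustrative namespace split across two servers.
servers = [MetadataServer([f"file-{i:04d}" for i in range(0, 100)]),
           MetadataServer([f"file-{i:04d}" for i in range(100, 200)])]
print(len(server_side_find(servers, "file-015*")))   # 10 matches, found server-side
```

If the top-ranked submissions are doing something like the server-side path, their reported find KIOP/sec largely reflects storage node capability rather than client node IO, which is exactly the accounting question we raise above.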

In Figure 4, we show the IO500 top 10 IOR (bandwidth) composite rankings for 10 client node systems. 

Figure 4 IO500 Top 10 IOR (bandwidth) composite score results

In Figure 4, we can see that the Intel DAOS system came in at #1 with a bandwidth score of 399 GiB/sec while the Pengcheng system only achieved 4th place with a bandwidth score of 194 GiB/sec.

And coming in at #2 and #3, once again, are our two Huawei OceanFS systems, with 317 and 314 GiB/sec, respectively.

For the IOR score, the Kongming BPFS system didn't even rank in the top 10 (not shown above), only achieving 97 GiB/sec.

Significance

IO500 HPC-focused benchmarks differ substantially from other file system benchmarks we report on. For one thing, client node counts seem all-important, and for another, file bandwidth is weighted equally with file IOPS.

Also, as discussed above, there's something happening with find IO that's providing a unique advantage to some submissions over others, and it is having an outsized influence on their mdtest & find rankings. We presume that this (whatever it is) will become more universal as time goes on, and then we will see more realistic IO rankings across IO500 submissions; otherwise, IO500 rankings will need to somehow account for storage node IO activity as discussed above.

This is our 4th time reporting on IO500 results. Our next IO500 report will coincide with the ISC22-HPC conference release, next spring. 

[This system/storage performance report was originally sent out to our newsletter subscribers in November of 2021. If you would like to receive this information via email, please consider signing up for our free monthly newsletter (see subscription request, above right) and we will send our current issue along with download instructions for this and other reports. Dispatches are posted to our website generally a month or more after they are sent to our subscribers.]

Silverton Consulting, Inc., is a U.S.-based Storage, Strategy & Systems consulting firm offering products and services to the data storage community.


[1] All IO500 information is available at https://www.vi4io.org/std/io500/start as of 25 November 2021
