Testing filesystems for CPU core scalability

I attended the HotStorage’16 and Usenix ATC’16 conferences this past week, and there was a paper presented at ATC titled “Understanding Manycore Scalability of File Systems” (see p. 71 in the PDF) by Changwoo Min and others at the Georgia Institute of Technology. This team of researchers set out to understand the bottlenecks in typical file systems as they scaled from 1 to 80 (or more) CPU cores on the same server.

FxMark, a new scalability benchmark

They created a new benchmark to probe CPU core scalability, which they called FxMark (source code available at FxMark). It consists of 19 microbenchmarks stressing specific scalability scenarios and three application-level benchmarks representing popular file system activities.

The application benchmarks in FxMark included a standard mail server (Exim), a NoSQL database (RocksDB) and a standard user file server (DBENCH).

In the microbenchmarks, they stressed 7 different components of file systems: 1) path name resolution; 2) page cache for buffered IO; 3) inode management; 4) disk block management; 5) file offset to disk block mapping; 6) directory management; and 7) the consistency guarantee mechanism.

They varied each of these microbenchmark characteristics across three different sharing levels, for both data reads and data overwrites: multiple cores accessing different blocks in different files (LOW share-ability); multiple cores accessing different blocks in the same file (MEDIUM share-ability); and multiple cores accessing the same block in the same file (HIGH share-ability). They also used the same share-ability levels for metadata path name reads, and used LOW and MEDIUM for directory list reads, file creates, file unlinks and file renames. More details are available in the paper.

FxMark results, bad news

The above photo is one chart from Changwoo's presentation at Usenix ATC’16. Wherever you see a red highlighted arrow, the file system being highlighted did not scale well in performance as CPU cores were added; where there's a blue highlighted arrow, the file system did scale well with additional CPU cores.

The file systems tested were btrfs, ext4, F2FS, tmpfs and XFS. They used a RAM disk to eliminate IO performance as a bottleneck. Their testing showed that most file systems exhibited performance bottlenecks on one or more FxMark workloads as CPU cores were added.

In one case that Changwoo highlighted, for many of the microbenchmarks, just about every file system showed bad CPU core scaling. For example, for HIGH read share-ability (DRBH), all file systems suffered performance degradation as CPU cores were increased, apparently due to contention for page reference counters. For MEDIUM read share-ability (DRBM), one file system (XFS) had a performance collapse due to its consistency guarantees (inode read/write semaphores). And for LOW read share-ability (DRBL), every file system was able to scale performance linearly as CPU cores were added.
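The page-reference-counter contention behind the DRBH degradation is an instance of the classic shared-counter problem: every core updating one counter serializes on the same cache line. The sketch below (my own illustration in Python threads, not kernel code and not the paper's analysis) contrasts a single shared counter with a sharded, per-core counter of the kind commonly used to relieve such contention:

```python
import threading


class SharedRefCount:
    """One counter behind one lock: every 'core' bounces the same cache
    line, the pattern that hurt DRBH as core counts grew."""

    def __init__(self):
        self._n = 0
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            self._n += 1

    def value(self):
        return self._n


class ShardedRefCount:
    """One shard per core, summed only when the total is needed, so
    updates on different cores never contend (a sketch of the usual
    mitigation, not the kernel's actual fix)."""

    def __init__(self, ncores):
        self._shards = [0] * ncores
        self._locks = [threading.Lock() for _ in range(ncores)]

    def get(self, core):
        with self._locks[core]:
            self._shards[core] += 1

    def value(self):
        return sum(self._shards)
```

Both counters produce the same total; the difference is purely in how much cross-core synchronization each update costs, which is exactly what FxMark's HIGH share-ability workloads expose.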

In addition, the Exim, RocksDB and DBENCH application benchmarks were the last 3 charts in the photo, respectively. For Exim, only tmpfs scaled well; all the rest did not. For RocksDB, there was some scaling from 1 to 10 cores, but after that there was little to be gained by adding CPU cores. And for DBENCH, we see btrfs improve performance up to 40 cores and then gradually fall off.

I asked whether HIGH share-ability would ever occur in a real-world environment, and Changwoo suggested it could be seen for database servers referencing their index blocks. I'm not sure I totally agree, as these would typically be held in the database's own buffer cache rather than read through the file system, but he has a point.

Extending FxMark

In any case, the approach seems interesting and deserves extension to testing external NFS and SMB file systems rather than just Linux server file systems. And although FxMark was used to test CPU core scaling, it could easily be generalized to test node scalability as well. However, using a RAM disk to eliminate IO overhead may not work as well for external file systems.


Photo Credit(s): Taken from the Usenix ATC’16 paper, see pp. 71-85