I attended the HotStorage’16 and Usenix ATC’16 conferences this past week, and there was a paper presented at ATC titled “Understanding Manycore Scalability of File Systems” (see p. 71 in PDF) by Changwoo Min and others at Georgia Institute of Technology. This team of researchers set out to understand the bottlenecks in typical file systems as they scaled from 1 to 80 (or more) CPU cores on the same server.
FxMark, a new scalability benchmark
They created a new benchmark to probe CPU core scalability, which they called FxMark (source code available at FxMark). It consists of 19 “micro benchmarks” that stress specific scalability scenarios and three application-level benchmarks that represent popular file system activities.
The application benchmarks in FxMark included a standard mail server (Exim), a NoSQL DB (RocksDB) and a standard user file server (DBENCH).
In the micro benchmarks, they stressed 7 different components of file systems: 1) path name resolution; 2) page cache for buffered IO; 3) inode management; 4) disk block management; 5) file offset to disk block mapping; 6) directory management; and 7) consistency guarantee mechanism. Continue reading Testing filesystems for CPU core scalability
At SFD10 a couple of weeks back, while we were visiting with Nimble Storage, they mentioned that their latest all-flash storage array was going to support triple-parity RAID.
And last week at a NetApp-SolidFire analyst event, someone mentioned the new ONTAP 9 triple-parity RAID-TEC™ for larger SSDs. Also heard at the meeting was that a 15.3TB SSD would take on the order of 12 hours to rebuild.
Need for better protection
When Nimble discussed the need for triple-parity RAID, they mentioned the report from Google I talked about recently (see my Surprises from 4 years of SSD experience at Google post). In that post, the main surprise was the number of read errors they had seen from the SSDs they deployed throughout their data center.
I think triple-parity RAID and larger (15TB+) SSDs will both become more common over time. There’s no reason to think that the SSD vendors will stop at 15TB. And if it takes 12 hours to rebuild a 15TB one, I think it’s probably something like ~30 hours to rebuild a 30TB one, which is just a generation or two away.
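For a rough feel of those rebuild numbers, here is a back-of-the-envelope sketch in Python. It assumes (my assumption, not something stated at either event) that rebuild time scales linearly with capacity from the 15TB/12 hour data point; the function names are made up for illustration. Straight linear scaling gives about 24 hours for a 30TB drive, somewhat under my ~30 hour guess above, so treat it as a floor rather than a prediction.

```python
# Back-of-the-envelope SSD rebuild time estimate.
# ASSUMPTION: rebuild time grows linearly with drive capacity, anchored
# on the quoted data point of roughly 12 hours for a 15TB SSD.
# Real rebuilds depend on array load, RAID geometry and drive speed.

def rebuild_hours(capacity_tb, ref_capacity_tb=15.0, ref_hours=12.0):
    """Linearly extrapolate rebuild time (hours) from a reference point."""
    return ref_hours * capacity_tb / ref_capacity_tb


def implied_rate_mb_s(capacity_tb=15.0, hours=12.0):
    """Average rebuild rate in MB/s implied by capacity and rebuild time."""
    bytes_total = capacity_tb * 1e12      # decimal terabytes
    seconds = hours * 3600
    return bytes_total / seconds / 1e6


if __name__ == "__main__":
    print(f"implied rebuild rate: {implied_rate_mb_s():.0f} MB/s")
    for tb in (15, 30, 60):
        print(f"{tb}TB SSD: ~{rebuild_hours(tb):.0f} hours (linear estimate)")
```

The longer a rebuild runs, the longer the RAID group is exposed to additional drive failures or read errors, which is the argument for a third parity drive.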
They called the new chip a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude more power-efficient machine learning than what’s achievable with off-the-shelf GPUs/CPUs. TensorFlow is Google’s open sourced machine learning software.
Just got back from EMCWorld2016 this week but on the way there and back I was perusing the FAST’16 papers. One of the papers I read (see Slacker: Fast Distribution with Lazy Docker Containers, p. 181) discussed performance problems with initializing Docker container micro-services and how they could be solved using persistent, intelligent NFS storage.
It appears that Docker container initialization spends a lot of time provisioning and initializing a local file system for each container. Docker containers typically use an AUFS (Another Union File System) storage driver, which is layered on top of another file system (like ext4). That underlying file system in turn sits on either DAS or external storage.
When using persistent and intelligent NFS storage, Docker can take advantage of storage system snapshots and cloning to improve container initialization significantly. In the paper, the researchers used Tintri as the underlying persistent, enterprise class NFS storage, but I believe the functionality they take advantage of is available with most enterprise class NAS systems and, as such, is readily available with other storage subsystems. Continue reading Faster Docker initialization through Slacker snapshots & NFS storage
In a FAST’16 article I recently read (Flash reliability in production: the expected and unexpected, see p. 67), researchers at Google reported on field experience with flash drives in their data centers, totaling many millions of drive days covering MLC, eMLC and SLC drives with a minimum of 4 years of production use (3 years for eMLC). In some cases, they had 2 generations of the same drive in their field population. SSD reliability in the field is not what I would have expected and was a surprise to Google as well.
The SSDs seem to be used in a number of different application areas but mainly as SSDs with a custom designed PCIe interface (FusionIO drives maybe?). Aside from the technology changes, there were some lithographic changes as well from 50 to 34nm for SLC and 50 to 43nm for MLC drives and from 32 to 25nm for eMLC NAND technology. Continue reading Surprises from 4 years of SSD experience at Google
Microsoft Azure uses a different style of erasure coding for their cloud storage than what I have encountered in the past. Their erasure coding technique was documented in a paper presented at USENIX ATC’12 (for more info check out their Erasure coding in Windows Azure Storage paper).