Has triple-parity RAID’s time come?

Data center with hard drives

At SFD10 a couple of weeks back, while we were visiting with Nimble Storage, they mentioned that their latest all-flash storage array was going to support triple-parity RAID.

And last week at a NetApp-SolidFire analyst event, someone mentioned that the new ONTAP 9 offers triple-parity RAID-TEC™ for larger SSDs. I also heard at the meeting that a 15.3TB SSD would take on the order of 12 hours to rebuild.

Need for better protection

When Nimble discussed the need for triple-parity RAID, they mentioned the report from Google I talked about recently (see my Surprises from 4 years of SSD experience at Google post). In that post, the main surprise was the number of read errors Google had seen from the SSDs deployed throughout their data centers.

I think the need for triple-parity RAID and larger (15TB+) SSDs will become more common over time. There’s no reason to think that the SSD vendors will stop at 15TB. And if it takes 12 hours to rebuild a 15TB one, I think it’s probably something like 30 hours to rebuild a 30TB one, which is just a generation or two away.
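As a back-of-the-envelope check on those rebuild times, here is a minimal Python sketch. It assumes rebuild time scales linearly with capacity at a constant effective rate, which is my simplification and not something either vendor stated; the function names are made up for illustration. Straight linear scaling from the 15.3TB/12-hour figure comes in somewhat under my 30-hour guess, so that guess leaves some room for rebuilds slowing down as drives grow or as host I/O competes for bandwidth.

```python
def implied_rebuild_rate_mb_s(capacity_tb, hours):
    """Effective rebuild rate (MB/s) implied by rebuilding capacity_tb in hours.

    Uses decimal units (1 TB = 1e6 MB), as drive vendors do.
    """
    return capacity_tb * 1e6 / (hours * 3600)


def rebuild_hours(capacity_tb, rate_mb_s):
    """Hours to rebuild capacity_tb at a constant rate_mb_s (assumed linear)."""
    return capacity_tb * 1e6 / rate_mb_s / 3600


# The only figure taken from the post: a 15.3TB SSD rebuilds in about 12 hours.
rate = implied_rebuild_rate_mb_s(15.3, 12)
print(f"Implied rebuild rate: {rate:.0f} MB/s")
print(f"30TB at the same rate: {rebuild_hours(30.0, rate):.1f} hours")
```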

A read error on one SSD in a RAID group during an SSD rebuild can be masked by having dual parity. A read error on two SSDs can only be masked by having triple parity RAID.

Likelihood of a 2nd error is rising

What’s the likelihood of having two read errors in a RAID group during a 12-hour rebuild? Probably not that high, but if SSDs see more read errors in general, there are going to be more partial rebuilds at the very least. And with 15TB, or even 30TB, SSDs, there’s not going to be a whole lot of SSDs in your typical storage system anymore. So having one large parity group with triple parity might make a lot of sense.

Working out actual failure rates is beyond my math skills, but with a higher frequency of read errors and longer rebuild times (due to larger SSDs), triple parity makes sense. If not today with 15TB SSDs, then the next generation of 3D NAND SSDs may make it mandatory.
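For anyone who wants to start on that math, here is a rough Python sketch, not a real reliability model. Every input is an assumption of mine: an unrecoverable bit error rate (UBER) of 1 in 10^17 bits read, 15TB drives, and a hypothetical group with 11 surviving SSDs that each get read in full during the rebuild. It also treats bit errors as independent and counts any read error on a drive, even though two errors only hurt you if they land in the same stripe, so the "two or more" figure overstates the real exposure. And since the Google report suggests field error rates don’t track spec-sheet numbers well, treat the output as illustrative only.

```python
import math


def p_read_error(bytes_read, uber):
    """P(at least one unrecoverable read error) when reading bytes_read bytes.

    Assumes independent bit errors at rate uber (errors per bit read).
    log1p/expm1 keep precision for very small probabilities.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-uber))


def p_at_least_k(n_drives, p_drive, k):
    """Binomial tail: P(at least k of n_drives hit a read error)."""
    return sum(
        math.comb(n_drives, i) * p_drive**i * (1 - p_drive) ** (n_drives - i)
        for i in range(k, n_drives + 1)
    )


# Assumed, illustrative inputs (none of these come from a vendor).
DRIVE_BYTES = 15e12  # 15TB SSD, read in full during the rebuild
UBER = 1e-17         # assumed unrecoverable bit error rate
SURVIVORS = 11       # hypothetical number of surviving SSDs in the group

p_drive = p_read_error(DRIVE_BYTES, UBER)
print(f"P(read error on one surviving SSD):  {p_drive:.2e}")
print(f"P(1+ surviving SSDs hit an error):   {p_at_least_k(SURVIVORS, p_drive, 1):.2e}")
print(f"P(2+ surviving SSDs hit an error):   {p_at_least_k(SURVIVORS, p_drive, 2):.2e}")
```

Doubling DRIVE_BYTES to 30e12 or raising UBER shows how quickly the exposure grows with bigger drives and higher error rates, which is really the whole argument here.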

Moreover, I am aware of recent research (see my Better erasure coding … post) that indicates rebuild activity may be even more prevalent in the future, being used to work around slow SSD accesses as well as data errors. So having more parity would make sense if you were rebuilding blocks more often.

Triple parity seems here to stay…

Comments?