Are RAID's days numbered?

HP/EVA drive shelves in the HP/EVA lab in Colo. Springs
An older article that I recently came across, by Robin Harris of StorageMojo, said RAID 5 would be dead in 2009. In essence, it argued that as drives reach 1TB or more, the time it takes to rebuild a failed drive forces a move to RAID 6.

Another older article I came across said RAID is dead, all hail the storage robot. Its point was that RAID groups needed more flexibility, specifically support for drives of different capacities within a single RAID group. Data Robotics' Drobo products now support this capability, which we discuss below.

I am here to tell you that RAID is not dead, not even on life support, and without it the storage industry would seize up and die. One must first realize that RAID as a technology is just a way to group together a bunch of disks and to protect the data on those disks. RAID comes in a number of flavors, including:

  • RAID 0 – no protection
  • RAID 1 – mirrored data protection
  • RAID 2 through 5 – single parity (or ECC) protection
  • RAID 6 and DP – dual parity protection

The rebuild time problem with RAID

The problem with drive rebuild time is that rebuilding a 1TB or larger disk drive can take hours if not days, depending on how busy the storage system and the RAID group are. And of course, as 1.5 and 2TB drives come online, this just keeps getting longer. Rebuilds can be sped up by having larger single parity RAID groups (more disk spindles in the RAID stripe), by using DP, which actually has two RAID groups cross-coupled (which means more disk spindles), or by using RAID 6, which often has more spindles in the RAID group.
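
To put some rough numbers on this, here's a quick back-of-the-envelope calculation in Python. The rebuild rates are illustrative guesses, not measurements from any particular array:

```python
# Back-of-the-envelope rebuild time estimates. The rebuild rates are
# illustrative guesses; real rates depend on how busy the storage
# system and the RAID group are.

def rebuild_hours(capacity_gb, rate_mb_per_sec):
    """Hours to reconstruct a drive of capacity_gb at a sustained rate."""
    return capacity_gb * 1000 / rate_mb_per_sec / 3600

for capacity_gb in (1000, 1500, 2000):    # 1TB, 1.5TB and 2TB drives
    for rate in (100, 25):                # lightly loaded vs. busy array
        print(f"{capacity_gb / 1000}TB at {rate}MB/s: "
              f"{rebuild_hours(capacity_gb, rate):.1f} hours")
```

At a busy-system rate of 25MB/s, a 2TB rebuild runs close to a full day.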

Regardless of how you cut it, there is some upper limit to the number of spindles that can be used to rebuild a failed drive – the number of active spindles in the storage subsystem. You could conceivably incorporate all these drives into one RAID 5 or 6 group (albeit a very large one).

The downside of such a large RAID group is that data overwrites could create a performance bottleneck on the parity disks. Whenever a block is overwritten in a RAID 2-6 group, the parity for that data block (usually located on one or more other drives) has to be read, recalculated, and rewritten back to the same location. The parity write can be buffered and done lazily, but the data is not actually protected until the parity is on disk someplace.
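
For XOR-based parity the update arithmetic is simple enough to show in a few lines of Python. This is just a toy sketch with 4-byte "blocks", but it shows why one logical write turns into two reads and two writes:

```python
# Toy sketch of the RAID small-write penalty: overwriting one data block
# means reading the old data and old parity, then writing the new data
# and new parity (4 I/Os for one logical write).

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def new_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    # Strip the old block's contribution out of parity, add the new one in.
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Parity of a 3-data-block stripe, using 4-byte "blocks"
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\x0a\x0b\x0c\x0d"
parity = xor_blocks(xor_blocks(d0, d1), d2)

# Overwrite d1 and update parity via read-modify-write
new_d1 = b"\xff\xee\xdd\xcc"
parity = new_parity(d1, parity, new_d1)

# The updated parity still reconstructs any single lost block
assert xor_blocks(xor_blocks(d0, new_d1), parity) == d2
```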

One way around this problem is to use a log-structured file system. Log-structured file systems never rewrite data in place, so there is no overwrite penalty, which neatly eliminates the problem.

Alas, not everyone uses log-structured file systems for backend storage. So for the rest of the storage industry the write penalty is real and needs to be managed effectively so it doesn't become a performance problem. One way to manage it is to limit RAID group size to a small number of drives.

So the dilemma is this: to provide reasonable drive rebuild times you want a wide (large) RAID group with as many drives as possible in it, but to minimize the (over-)write penalty you want as thin (small) a RAID group as possible. How can we solve this dilemma?

Parity declustering

Parity declustering figure from the Holland & Gibson 1992 paper

Consider the declustered parity scheme described by Holland and Gibson in their 1992 paper. Parity and stripe data can be spread across more drives than a plain RAID 5 or 6 group would use. They show an 8-drive system (see figure) where stripes, each with 3 data block sets and one parity block set, are rotated around the 8 physical drives in the array. In this way all 7 remaining drives are used to service a failed 8th drive: some blocks are rebuilt from one set of 3 drives, other blocks from a different set of 3. Working through the failed drive's entire block set, the rebuild draws on all 7 remaining drives, but not all of them are busy for every block. This should shrink drive rebuild time considerably by utilizing more spindles.

Because parity declustering distributes the parity, as well as the data, across a number of disk drives, no one disk holds the parity for all the others. This eliminates the hot parity drive phenomenon, which is normally dealt with by using smaller RAID group sizes.
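
To make this concrete, here's a small Python sketch. It uses the simplest possible declustered layout, placing a stripe on every 4-drive subset of the 8 drives (Holland and Gibson use balanced block designs to get the same even spread with far fewer stripes), and shows that rebuilding a failed drive draws equally on all 7 survivors:

```python
from collections import Counter
from itertools import combinations

DRIVES = 8          # physical drives in the array
STRIPE_WIDTH = 4    # 3 data block sets + 1 parity block set per stripe

# Simplest declustered layout: one stripe on every 4-drive subset.
layout = list(combinations(range(DRIVES), STRIPE_WIDTH))  # C(8,4) = 70 stripes

failed = 3
helpers = Counter()
for members in layout:
    if failed in members:                 # this stripe lost a block set
        helpers.update(d for d in members if d != failed)

# All 7 survivors share the rebuild, each serving 15 of the 35 affected
# stripes, versus 3 fixed partners in a dedicated 4-drive RAID 5 group.
print(sorted(helpers.items()))
# [(0, 15), (1, 15), (2, 15), (4, 15), (5, 15), (6, 15), (7, 15)]
```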

The mixed drive capacity problem with RAID today

The other problem with RAID today is that it assumes a homogeneous set of disk drives in the storage array, so that the same blocks/tracks/block sets can be set up as a RAID stripe across those disks to compute parity. Now, the original RAID paper by Patterson, Gibson, and Katz never explicitly required all the disk drives to be the same capacity, but it seems easiest to implement RAID that way. Drives of diverse capacity and performance would normally go into separate RAID groups. You could create a mixed-capacity RAID group sized to the least common capacity (i.e., the smallest drive), but by doing this you waste all the excess storage in the larger disks.
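
To see how much gets wasted, take the hypothetical 100GB/200GB/400GB/800GB mix used in the Drobo example below:

```python
# Capacity wasted by a least-common-capacity RAID group over a
# hypothetical mixed drive set (same mix as the Drobo example below).
drives_gb = [100, 200, 400, 800]

per_drive = min(drives_gb)              # group sized to the smallest drive
used = per_drive * len(drives_gb)       # 400GB usable in the RAID group
wasted = sum(drives_gb) - used          # 1100GB of 1500GB stranded

print(f"used: {used}GB, wasted: {wasted}GB "
      f"({wasted / sum(drives_gb):.0%} of raw capacity)")
```

Almost three quarters of the raw capacity is stranded.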

One solution to the above would be the declustered parity scheme mentioned earlier, but in the end you would need at least N drives of the same capacity for whatever your stripe size (N) was going to be. And if you had that many same-capacity drives, why not just use RAID 5 or 6?

Another solution, popularized by Drobo, is to carve the various disk drives up into RAID group segments. So if you had 4 drives of 100GB, 200GB, 400GB and 800GB, you could carve out 4 RAID groups: a 100GB RAID 5 group across all 4 drives; another 100GB RAID 5 group across the largest 3 drives; a 200GB RAID 1 mirror across the largest 2 drives; and a 400GB RAID 0 on the largest drive alone. These could be configured as 4 LUNs or Windows drive letters and used any way you wish.
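
Here's a greedy sketch in Python that reproduces that carve-up. To be clear, this is not Drobo's actual algorithm, just one simple way to arrive at the same layout: repeatedly take the largest segment every remaining drive can contribute, and pick a RAID level based on how many drives participate.

```python
# Greedy sketch of the carve-up described above. Not Drobo's actual
# algorithm, just one simple way to reproduce the example's layout.
def carve(drives_gb):
    free = sorted(drives_gb)                # free space left on each drive
    groups = []
    while any(free):
        avail = [c for c in free if c > 0]  # drives that still have space
        if len(avail) >= 3:
            level = "RAID5"
        elif len(avail) == 2:
            level = "RAID1"
        else:
            level = "RAID0"
        seg = min(avail)                    # biggest segment they all share
        groups.append((seg, level, len(avail)))
        free = [c - seg if c > 0 else 0 for c in free]
    return groups

for seg, level, width in carve([100, 200, 400, 800]):
    print(f"{seg}GB {level} across {width} drive(s)")
```

Running it prints the same 4 groups as the example: 100GB RAID 5 across 4 drives, 100GB RAID 5 across 3, a 200GB RAID 1 pair, and 400GB of unprotected RAID 0.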

But is this RAID?

I would say “yes”. Although this works at the sub-drive level, it still looks like RAID storage, using parity and data blocks across stripes of data. All that’s been done is to change the unit of allocation from a whole drive to some portion of a drive. Marketing aside, I think it’s an interesting concept and it works well for a few drives of mixed capacity (just the market space Drobo is going after).

For larger configurations with intermixed drives I like parity declustering. It gives you the benefits of bigger RAID groups without the increased over-write activity. Given today’s drive capacities, I might still lean toward a dual parity scheme within the parity declustering stripe, but that doesn’t seem difficult to incorporate.

So when people ask if RAID’s days are numbered – my answer is a definite NO!