Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

I was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive), holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk-only archive appliance (see my post on EMC Data Domain products enter the archive market).

Oracle assures me that tape density is keeping up with, if not gaining on, disk density trends and capacity. But density and capacity are not the only issues causing data to move off of tape in today's enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it's one dimensional (well, literally it's two dimensional). With data compression, disk or tape systems can easily double the density on a piece of media. But with data deduplication, the multiples become more like 5X to 30X, depending on the frequency of full backups and the amount of duplicated data. Numbers like those dwarf any discussion of density ratios and, as such, get everyone's attention.
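To see why those multiples dwarf the density discussion, here is a back-of-envelope sketch of the logical capacity of a single 5TB cartridge under compression versus deduplication. The ratios are illustrative assumptions from the ranges above, not measured figures:

```python
# Back-of-envelope: logical (pre-reduction) data held per cartridge.
# Assumed ratios: ~2x for compression, 5x-30x for dedupe, per the
# ranges discussed above; actual ratios vary with backup frequency
# and data redundancy.

NATIVE_TB = 5  # native (uncompressed) cartridge capacity, in TB

def effective_capacity(native_tb, reduction_ratio):
    """Logical data stored per cartridge at a given reduction ratio."""
    return native_tb * reduction_ratio

print(f"compression 2x : {effective_capacity(NATIVE_TB, 2)} TB logical")
for ratio in (5, 30):
    print(f"dedupe {ratio}x : {effective_capacity(NATIVE_TB, ratio)} TB logical")
```

A 2X compression gain looks respectable until it sits next to a 30X dedupe ratio, which is the whole argument in one line of arithmetic.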

I can remember talking to an avowed tape engineer years ago, and he described deduplication at the VTL level as architecturally impure and inefficient. From his perspective it needed to be done much earlier in the data flow. But what he failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beachhead and economies of scale. From there, such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, what with Avamar, Symantec PureDisk and others, source-level deduplication, or close-to-source-level deduplication, is a reality. But all this came about because these systems were able to offer up to 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is a system in front of a tape library that provided deduplication capabilities not just for the disk media but for the tape media as well. This way the 30X density advantage over non-deduplicated storage could follow through all the way to the tape media.

In the past, this made little sense because restoring a particular set of data from deduplicated tape could require mounting multiple volumes. However, with today's 5TB on a single tape, maybe that no longer has to be the case. In addition, with a deduplication system in front of the tape library, the disk tier could support most immediate restore activity, while restoring data from tape would be more like pulling something out of an archive and, as such, might take longer to perform. In any event, with LTO's multi-partitioning and the other enterprise-class tapes' multiple domains, creating a structure with a metadata partition and a data partition is easier than ever.
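One way to picture the metadata/data partition split is as a chunk index living in the small metadata partition, with unique chunk payloads in the data partition. The sketch below is purely hypothetical — the class name, the in-memory stand-ins for the two partitions, and the layout are my own illustration, not any vendor's format or tape API:

```python
# Hypothetical sketch of a deduplicating tape layout in the spirit of
# LTO-5 partitioning: a small metadata partition holds the chunk index,
# while the data partition holds each unique chunk exactly once.
# Both partitions are simulated in memory; no real tape I/O occurs.

import hashlib

class DedupeTape:
    def __init__(self):
        self.index = {}          # "metadata partition": chunk hash -> offset
        self.data = bytearray()  # "data partition": unique chunk payloads

    def write(self, chunk: bytes) -> str:
        """Store a chunk if unseen; return its content-hash reference."""
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.index:          # store only first occurrence
            self.index[digest] = len(self.data)
            self.data.extend(chunk)
        return digest                         # reference kept by the backup stream

    def read(self, digest: str, length: int) -> bytes:
        """Fetch a chunk back by its reference."""
        offset = self.index[digest]
        return bytes(self.data[offset:offset + length])

tape = DedupeTape()
ref1 = tape.write(b"weekly full backup block")
ref2 = tape.write(b"weekly full backup block")  # duplicate: not stored again
print(ref1 == ref2, len(tape.data))
```

The point of the sketch is that repeated full-backup blocks cost one reference in the index rather than another copy on media, which is exactly where the 5X to 30X multiples come from.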

“Got Dedupe”

There are plenty of places today's tape vendors can obtain deduplication capabilities. Permabit offers dedupe code for OEM applications for those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP and Quantum already have both tape libraries and their own dedupe systems available today, all of which could readily support a deduplicating front end to their tape libraries, if they don't already.

Where “Tape Rulz”

There are places where data deduplication doesn't work very well today, mainly rich media, physics, biopharma and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world, deduplication is taking over, if it hasn't already. The sooner tape gets on the deduplication bandwagon, the better for the IT industry.

—-

Of course, there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data that doesn't deduplicate well (see my previous Oracle RMAN posts). And at least one other rich media conglomerate is considering the very same move. For now, tape has a safe harbor in big science, but it won't last long.

Comments?

Tape v Disk v SSD v RAM

There was a time not long ago when the title of this post wouldn’t have included SSD. But, with the history of the last couple of years, SSD has earned its right to be included.

A couple of years back I was at a Rocky Mountain Magnetics Seminar (see the IEEE Magnetics Society) and a disk drive technologist stated that disks have about another 25 years of technology roadmap ahead of them. During this time they will continue to increase density, throughput and other performance metrics. After those 25 years, they will run up against theoretical limits which will halt further density progress.

At the same seminar, the presenter said that tape lags disk technology by about 5-10 years. As such, tape should continue to advance for another 5-10 years after disk stops improving, at which time tape density would also stop increasing.

Does all this mean the end of tape and disk? I think not. Paper stopped advancing in density some 2,000 to 3,000 years ago (the papyrus scroll was the ultimate in paper "rotating media"). The move to the codex, or book form (which in my view is a form factor advance), took place around 400AD (see the history of the scroll and codex). The paperback, another form factor advance, arrived in the early 20th century (see paperback history).

Turning now to write performance, movable type was a significant paper (write) performance improvement, starting in the mid-15th century. The printing press would go on to improve (paper write) performance for the next six centuries (see printing press history) and continues to do so today.

All this indicates that a data technology whose density was capped over 2,000 years ago can continue to advance and support valuable activity in today's world and for the foreseeable future. "Will disk and tape go away?" is the wrong question; the right question is "Can disk or tape, after SSDs reach price equivalence on a $/GB basis, still be useful to the world?"

I think yes, but that depends on how the relative SSD, disk and tape technologies advance. Assuming someday all these technologies support equivalent Tb/sq.in. (spatial) density, and

  • SSDs retain their relative advantage in random access speed,
  • Tape retains its advantage in sequential throughput, volumetric density and long media life, and
  • Disk retains its all-around, combined sequential and random access advantage,

it seems likely that each can sustain some niche in the data center/home office of tomorrow, although probably not where they are today.

One can see trends in enterprise data centers today that are altering the relative positioning of SSDs, disks and tape. Tape is being relegated to long-term, archive storage; disk is moving to medium-term, secondary storage; and SSDs are taking over top-tier, primary storage.

More thoughts on this in future posts.