What’s wrong with tape?

StorageTek Automated Cartridge System by brewbooks (cc) (from Flickr)

I was on a conference call today with Oracle’s marketing team discussing their tape business. Fred Moore (of Horison Information Systems) was on the call and mentioned something that surprised me: what’s missing in open and distributed systems is a standalone mechanism to stack multiple tape volumes onto a single cartridge.

The advantages of tape are significant, namely:

  • Low power utilization for offline or nearline storage
  • Cheap media, drives, and automation systems
  • Good sequential throughput
  • Good cartridge density

But most of these advantages fade when cartridge capacity utilization drops.  One way to increase cartridge capacity utilization is to stack multiple tape volumes on a single cartridge.
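To make the utilization argument concrete, here is a minimal Python sketch comparing two layouts. In the first, each logical volume gets its own cartridge. In the second, volumes are stacked using first-fit decreasing bin packing. The packing heuristic, the `Cartridge` class, the function names, and the volume sizes are all hypothetical illustrations, not how any mainframe or StorageTek product actually stacks volumes. Capacity is set to the 1.5TB native LTO-5 figure quoted in this post.

```python
from dataclasses import dataclass, field

# LTO-5 native (uncompressed) capacity in GB, per the figure quoted in this post.
LTO5_CAPACITY_GB = 1500


@dataclass
class Cartridge:
    capacity_gb: int
    volumes: list = field(default_factory=list)  # sizes (GB) of stacked volumes

    @property
    def used_gb(self) -> int:
        return sum(self.volumes)

    def fits(self, size_gb: int) -> bool:
        return self.used_gb + size_gb <= self.capacity_gb


def one_volume_per_cartridge(volume_sizes_gb, capacity_gb=LTO5_CAPACITY_GB):
    """No stacking: every logical volume gets a cartridge to itself."""
    return [Cartridge(capacity_gb, [size]) for size in volume_sizes_gb]


def stack_volumes(volume_sizes_gb, capacity_gb=LTO5_CAPACITY_GB):
    """Hypothetical stacking: first-fit decreasing bin packing."""
    cartridges = []
    for size in sorted(volume_sizes_gb, reverse=True):
        if size > capacity_gb:
            raise ValueError(f"a {size} GB volume exceeds cartridge capacity")
        # Reuse the first cartridge with room left; otherwise start a new one.
        target = next((c for c in cartridges if c.fits(size)), None)
        if target is None:
            target = Cartridge(capacity_gb)
            cartridges.append(target)
        target.volumes.append(size)
    return cartridges


def utilization(cartridges) -> float:
    """Fraction of the cartridges' total capacity that actually holds data."""
    total_gb = sum(c.capacity_gb for c in cartridges)
    return sum(c.used_gb for c in cartridges) / total_gb if total_gb else 0.0


if __name__ == "__main__":
    volumes_gb = [200, 350, 100, 600, 50, 400, 75, 300]  # made-up sizes
    for label, layout in [("unstacked", one_volume_per_cartridge(volumes_gb)),
                          ("stacked", stack_volumes(volumes_gb))]:
        print(f"{label}: {len(layout)} cartridges, {utilization(layout):.1%} used")
    # unstacked: 8 cartridges, 17.3% used
    # stacked: 2 cartridges, 69.2% used
```

With these made-up volume sizes, stacking needs two cartridges instead of eight. First-fit decreasing is only one simple heuristic. A real stacking system would presumably also have to track where each logical volume sits on its physical cartridge so that it can be recalled later.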

Mainframes (like IBM System z) have had cartridge stacking since the late ’90s. The capability arose in response to the ever-larger cartridge capacities then becoming available. Fast-forward a decade and the problem still exists: Oracle’s StorageTek T10000 holds 1TB per cartridge and LTO-5 holds 1.5TB, both uncompressed. Nonetheless, open and distributed systems still have no tape stacking capability.

Although I agree with Fred that volume stacking is missing in open systems, I wonder whether open systems really need it. Currently, open systems seem to use tape for backups, archive data, and the occasional batch run. Automated hierarchical storage management (HSM) can readily fill tape cartridges by holding off data movement to tape until enough data has accumulated. Backups, meanwhile, by their very nature create large sequential streams of data, which should result in high capacity utilization on every tape except the last one in a series. That leaves only the problem of occasional batch runs using large datasets or files.
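The two patterns in the preceding paragraph can be sketched in a few lines of Python, under some stated assumptions. `MigrationBuffer` is a hypothetical HSM-style policy. It holds files until the pending data reaches a fill threshold (90% here, an arbitrary choice) and then writes them to one cartridge in arrival order. `stream_utilization` shows why a single sequential backup stream wastes space only on its final tape. Neither function reflects any particular HSM or backup product.

```python
import math
from collections import deque


class MigrationBuffer:
    """Hypothetical HSM-style policy: hold files bound for tape until
    enough data has accumulated to fill most of a cartridge."""

    def __init__(self, cartridge_capacity_gb: int, fill_threshold: float = 0.9):
        if not 0 < fill_threshold <= 1:
            raise ValueError("fill_threshold must be in (0, 1]")
        self.capacity_gb = cartridge_capacity_gb
        self.fill_threshold = fill_threshold
        self.pending = deque()   # (name, size_gb) pairs awaiting migration
        self.pending_gb = 0
        self.written = []        # (file names, GB used) per cartridge written

    def add(self, name: str, size_gb: int) -> None:
        if not 0 < size_gb <= self.capacity_gb:
            raise ValueError(f"{name} must be non-empty and fit on one cartridge")
        self.pending.append((name, size_gb))
        self.pending_gb += size_gb
        # Migrate only once roughly a cartridge's worth of data is waiting.
        while self.pending_gb >= self.fill_threshold * self.capacity_gb:
            self._write_cartridge()

    def _write_cartridge(self) -> None:
        # Take files in arrival order until the next one would not fit.
        names, used_gb = [], 0
        while self.pending and used_gb + self.pending[0][1] <= self.capacity_gb:
            name, size_gb = self.pending.popleft()
            names.append(name)
            used_gb += size_gb
        self.pending_gb -= used_gb
        self.written.append((names, used_gb))


def stream_utilization(total_gb: int, capacity_gb: int):
    """One sequential backup stream fills every tape except possibly the last.
    Returns (tapes needed, fraction of their total capacity used)."""
    tapes = math.ceil(total_gb / capacity_gb)
    return tapes, total_gb / (tapes * capacity_gb)
```

For example, a hypothetical 10,000 GB backup on 1,500 GB cartridges needs seven tapes and uses about 95% of their combined capacity. A production HSM policy would presumably also need a time limit, so that data isn't held back from tape indefinitely when it trickles in slowly.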

I believe most batch processing today already takes place on the mainframe, leaving relatively little for open or distributed systems. There are certainly some verticals that do lots of batch processing, for example banks and telcos. But most heavy batch users grew up in the heyday of the mainframe and still run their batch work on mainframes today.

Condor notwithstanding, open and distributed systems never had the sophisticated batch processing capabilities readily available on the mainframe. As a result, my guess is that new companies needing batch processing start on open systems and, as their batch needs grow, move those applications to the mainframe.

So the real question becomes: how do we increase batch processing on open systems? I don’t think a tape volume stacking system solves that problem.

Given all the above, I see tape use in open systems being relegated to backup and archive, and used less and less for anything else.

What do you think?