CDs and DVDs longevity questioned

DVD-R read/write side (from

A recent BBC article, Should you store treasured data on (optical) disk, concluded that CDs and DVDs have significantly worse archive life than advertised or even suspected until recently.  The study, done by the French National Centre for Scientific Research, found that a few optical disks stayed reliable for just over one year and most “rarely lasted longer than five to 10 years”, although they were advertised to last significantly longer.

There was not much detail in the BBC article, and searching (in English) for the original research yielded nothing pertaining to the topic.  However, the article did say that the centre used accelerated life testing with heat, water vapor and light (standard IT industry practice) to determine point of failure, and that products under the same brand had significant archive life variability due to multiple manufacturers.  They also stated that branding the discs might be impacting longevity as well, and that the more than seven miles of (probably DVD) data recorded on the discs appears to be deteriorating faster than anticipated.

As a result, they suggested that data on optical disks should be copied every two to three years.  Maybe as time moves on this can be done less frequently, assuming optical disk lifespans improve.  Also, important data should be spread across multiple storage formats.

The case for (IT) tape in video archives

Nonetheless, the article did mention that a 52 minute documentary typically requires about 500GB of high definition video to be recorded and at the moment that video is normally stored on data (tape) cassettes and hard drives.  In my experience these (video) tapes were specific to the recording equipment vendor, i.e. Panasonic, Sony, or others and as such, relatively expensive.  But nowadays, this data can also be stored on LTO or other IT tapes.  In contrast to the above, LTO tape has an archival storage life of around 30 years (depending on vendor) and can be had at reasonable cost.

Also, in the past I was aware of a number of TV broadcasters that had an archive of finished broadcasts residing only on DVDs.  They typically took one additional copy of a DVD and stored them both in their desks or file cabinets.  Many of these people will be very surprised when five years down the line, they go to access their archived broadcasts and find that they can no longer be read.  Of course, I have made the same mistake with my family video archive stored on DVDs.

Video archives whether of raw video or finished broadcasts require large capacity, sequentially accessed storage which seems ideal for automated LTO or other magnetic tape storage.  By using IT tape data storage for video archives, one can benefit from technology advances in density and throughput that happen every couple of years, benefit from volume manufacturing available to IT product manufacturers, and benefit from a significantly longer archive life.

Now if I can just find a USB LTO tape drive that works on the Mac for my home videos and family backups I would feel much better, …

3.3 Exabytes-a-day?!

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

NetworkWorld today reported on an EMC-funded IDC study that said the world will create 1.2 Zettabytes (ZB, 10**21 bytes) of data in 2010.  By my calculations this is 3.3 Exabytes-a-day (EB, 10**18 bytes), 2.3PB (10**15 bytes) a minute or 38TB (10**12 bytes) a second.  This seems high, and I talked about how we could get here last year in my Exabyte-a-day post.  But what interested me most was a statement that about 35% more information is created than can be stored.  Not sure I understand this claim. (Deduplication perhaps?)
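For anyone who wants to check my arithmetic, here is a minimal Python sketch of the conversion.  The function name and the 365-day year are my own assumptions for illustration; the only input taken from the study is the 1.2 ZB figure.

```python
# Decimal (SI) byte units, matching the 10**N notation used in this post.
ZB = 10**21
EB = 10**18
PB = 10**15
TB = 10**12


def creation_rates(bytes_per_year, days_per_year=365):
    """Return (bytes per day, bytes per minute, bytes per second).

    Assumes data is created at a constant rate over a 365-day year,
    which is a simplification made only for this estimate.
    """
    per_day = bytes_per_year / days_per_year
    per_minute = per_day / (24 * 60)
    per_second = per_minute / 60
    return per_day, per_minute, per_second


per_day, per_minute, per_second = creation_rates(1.2 * ZB)
print(f"{per_day / EB:.1f} EB a day")
print(f"{per_minute / PB:.1f} PB a minute")
print(f"{per_second / TB:.0f} TB a second")
```

Rounded, this gives the 3.3 EB a day, 2.3PB a minute and 38TB a second quoted above.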

Aside from deduplication, what this must mean is that data is being created, sent across the Internet and not stored anywhere except while in flight, to be discarded soon after.  I assume this data is associated with something like VoIP phone calls and video chats/conferences, only some portion of which is ever recorded and stored.  (Although that will soon no longer be true for audio, see my Yottabytes by 2015 post).

But 35% would indicate ~1 out of every 3 bytes of data is discarded shortly after creation.  IDC also expects this factor to grow, not shrink, and “… to over 60% over the next few years.”  So 3 out of 5 bytes of data will only be available in real time, to be discarded thereafter.
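The "1 out of 3" and "3 out of 5" figures are just the percentages rounded to simple ratios.  A small Python sketch, with a helper name I made up, shows the rounding.  It also shows what happens under a stricter reading of the claim, which is only my assumption about what IDC might mean: if 35% more is created than stored, then creation is 1.35 times storage and the unstored share is 0.35/1.35, closer to 1 byte in 4.

```python
from fractions import Fraction


def nearest_simple_ratio(share, max_denominator=5):
    """Round a share (as a Fraction) to the closest ratio with a small denominator."""
    return share.limit_denominator(max_denominator)


# Reading used in this post: 35% (later 60%) of created data is never stored.
print(nearest_simple_ratio(Fraction(35, 100)))  # 1/3
print(nearest_simple_ratio(Fraction(60, 100)))  # 3/5

# Alternative reading (an assumption): created = 1.35 x stored,
# so the unstored share of created data is 0.35 / 1.35.
unstored_share = Fraction(35, 135)
print(float(unstored_share))                 # about 0.26
print(nearest_simple_ratio(unstored_share))  # 1/4
```

Either way a sizable slice of what gets created is never kept, which is the point that matters for the rest of this post.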

Why this portion should be growing more rapidly than data being stored is hard to fathom. Again video and voice over the internet must be a significant part of the reason.

Storing voice data

I don’t know about most people, but I record only a few of my more important calls.  Also, these calls happen to be longer on average than my normal calls.  Does this mean that 35% of my call data volume is not stored?  Maybe.  All my business calls are done via the Internet nowadays, so this data is being created and shipped across the net, used while the call is occurring, but never stored other than in flight or by call participants.  So non-recorded calls easily qualify as data created but not stored.  Even so, while I may listen to maybe ~33% of the recorded calls afterwards, I ultimately overwrite all of them, keeping only the ones that fit on the recorder’s flash device.  Hence, in the end even the voice data I do keep is only retained until I need storage to record more.

Not sure how this is treated in the IDC study but it seems to me to be yet another class of data, maybe call this transient data.  I can see similarities of transient data in company backups, log files, database dumps, etc.  Most of this data is stored for a limited time only to be later erased/recorded over in the end.  How IDC classified such data I cannot tell.

But will transient data grow?

As for video, I currently do no video conferencing, so I have no information on this.  But I am considering moving to another communication platform that supplies video chats and will make it less intrusive to record calls.  While demoing this new capability I rapidly consumed over 200MB of storage for call recordings.  (I need to cap this some way before it gets out of hand).  In any case, I believe recording convenience should make such data more likely to be stored over time, not less.

So while I may agree that 1 out of 3 bytes of data created today is not stored, I definitely don’t think that over time that ratio will grow and certainly not to 60%.  My only caveat is that there is a limit to the amount of data the world can readily store at any one time and this will ultimately drive all of us to delete data we would rather keep.

But maybe all this just points to a more interesting question: how much data does the world create that is kept for a year, a decade, or a century?  But that will need to await another post…