5D storage for humanity’s archive

A group of researchers at the University of Southampton in the UK have invented a new type of optical recording, based on femtosecond laser pulses and silica/quartz media, that can store up to 300TB per (1″ diameter) disc platter, with thermal stability at up to 1000°C and a media life of up to 13.8B years at 190°C (virtually unlimited at room temperature). The claim is that the memory device could outlive humanity and maybe the universe.

The new media/recording technique was recently used to create copies of text files, including the Holy Bible (pictured above). Other significant humanitarian, political and scientific treatises have also been stored on the new media. The new device has been nicknamed “Superman Memory Crystal” because the memory glass (quartz) resembles Superman’s memory crystals.

We have written before on long-term archives (see the Super Long Term Archive and Today’s data and the 1000 year archive posts), but this one beats them all by many orders of magnitude.

Super long term archive

Read an article this past week in Scientific American about a new fused silica glass storage device from Hitachi Ltd., announced last September. Data is recorded by lasers burning dots into the media to represent a binary 1, or leaving spaces to represent a binary 0.
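To make the dot/space idea concrete, here is a minimal sketch that lays bytes out as rows of dots (1) and spaces (0), then reads them back. The one-byte-per-row, most-significant-bit-first layout is purely an assumption for illustration; Hitachi's actual track and bit layout isn't described here.

```python
def bytes_to_dots(data: bytes) -> list[str]:
    """Render each byte as one row of 8 cells: a dot for a 1 bit, a space for a 0 bit (MSB first)."""
    rows = []
    for byte in data:
        bits = format(byte, "08b")  # e.g. 72 -> "01001000"
        rows.append("".join("●" if bit == "1" else " " for bit in bits))
    return rows


def dots_to_bytes(rows: list[str]) -> bytes:
    """Read the rows back: each dot is a 1 bit, anything else is a 0 bit."""
    return bytes(int("".join("1" if cell == "●" else "0" for cell in row), 2) for row in rows)


if __name__ == "__main__":
    rows = bytes_to_dots(b"Hi")
    for row in rows:
        print(f"|{row}|")
    assert dots_to_bytes(rows) == b"Hi"
```

Reading the dots is the easy part; as the next paragraph points out, knowing what the recovered bytes mean is not.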

As can be seen in the photos above, the data can readily be read by microscope, which makes it pretty easy for some future civilization to read the binary data. However, knowing how to decode the binary data into pictures, documents and text is another matter entirely.
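A quick way to see the decoding problem: the same eight bytes mean entirely different things depending on the format you assume. The bytes below are made up for illustration; the three interpretations use Python's standard struct module.

```python
import struct

# Hypothetical 8 bytes, as they might be transcribed from the media under a microscope.
raw = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x21, 0x21, 0x21])

as_text = raw.decode("ascii")            # interpret as ASCII text
as_ints = struct.unpack("<2I", raw)      # interpret as two little-endian 32-bit unsigned ints
as_float = struct.unpack(">d", raw)[0]   # interpret as one big-endian IEEE 754 double

print(repr(as_text), as_ints, as_float)
```

Nothing in the bits themselves says which reading is right; that knowledge has to travel with the data.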

We have discussed the format problem before in our Today’s data and the 1000 year archive as well as Digital Rosetta stone vs. 3D barcodes posts. This new technology would also compete with Millenniata’s currently available M-disc, a long-term archivable DVD technology, which we have also talked about before.

Semi-perpetual storage archive!!

Hitachi tested the new fused silica glass storage media at 1000°C for several hours, which they say indicates that it can survive several hundred million years without degradation. At this level it can provide a 300 million year storage archive (M-disc only claims 1000 years). They are calling their new storage device “semi-perpetual” storage. If hundreds of millions of years is semi-perpetual, I gotta wonder what perpetual storage might look like.
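The post doesn't say how Hitachi got from a few hours at 1000°C to hundreds of millions of years. One common way to extrapolate accelerated aging tests is the Arrhenius model, where the acceleration factor is AF = exp[(Ea/k)(1/T_use - 1/T_stress)], with temperatures in kelvin. The sketch below assumes that model, a hypothetical 2-hour bake, and 25°C storage, and solves for the activation energy Ea the 300 million year claim would imply. None of these inputs come from Hitachi.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
HOURS_PER_YEAR = 24 * 365.25


def celsius_to_kelvin(t_c: float) -> float:
    return t_c + 273.15


def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a stress temperature and a use temperature."""
    inv_delta = 1 / celsius_to_kelvin(t_use_c) - 1 / celsius_to_kelvin(t_stress_c)
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * inv_delta)


def implied_activation_energy(stress_hours: float, claimed_years: float,
                              t_use_c: float, t_stress_c: float) -> float:
    """Ea (eV) needed for stress_hours at t_stress_c to equal claimed_years at t_use_c."""
    af = claimed_years * HOURS_PER_YEAR / stress_hours
    inv_delta = 1 / celsius_to_kelvin(t_use_c) - 1 / celsius_to_kelvin(t_stress_c)
    return BOLTZMANN_EV_PER_K * math.log(af) / inv_delta


if __name__ == "__main__":
    # Hypothetical inputs: a 2-hour bake at 1000 C standing in for 300 million years at 25 C.
    ea = implied_activation_energy(2, 300e6, 25, 1000)
    print(f"implied Ea ~ {ea:.2f} eV")
    print(f"check: AF = {acceleration_factor(ea, 25, 1000):.3g}")
```

The answer is sensitive to the assumed test length and temperatures, and the whole approach assumes a single degradation mechanism holds all the way from room temperature to 1000°C.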

At CD recording density, with higher densities possible

They were able to achieve CD levels of recording density with a four layer approach. This amounted to about 40MB/sqin. By comparison, DVD technology is on the order of 330MB/sqin and Blu-ray is ~15Gb/sqin (about 1.9GB/sqin), but neither of these technologies claims even a million year lifetime. Also, more layers are possible, so the 40MB/sqin could potentially double or quadruple.
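A back-of-the-envelope check on "double or quadruple", using the post's four-layer and DVD density figures (the ratios hold whatever the units, as long as both figures use the same ones). The linear scaling with layer count is a hypothetical simplification, not something Hitachi has stated.

```python
# Post's figures: four-layer fused silica media vs. DVD (same units for both).
FOUR_LAYER_DENSITY = 40.0
DVD_DENSITY = 330.0


def density_for_layers(layers: int) -> float:
    """Assumes (hypothetically) that areal density scales linearly with layer count."""
    return FOUR_LAYER_DENSITY / 4 * layers


for layers in (4, 8, 16):
    d = density_for_layers(layers)
    print(f"{layers:2d} layers: {d:4.0f} per sq in, {DVD_DENSITY / d:.1f}x below DVD")
```

Even at 16 layers the media would sit at roughly half of DVD density, so its selling point is longevity rather than capacity.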

But data formats change every few years nowadays

My problem with all this is the data format issue: we will need something like a digital Rosetta stone for every data format ever conceived in order to make this a practical digital storage device.

Alternatively, we could use it more like an analog storage device, imprinting black-and-white or grayscale photographs of the information to be retained into the media. That way, a simple microscope could be used to see the photo image. I suppose color photographs could be implemented using a different plate per color, similar to four-color magazine printing. Texts could be handled by just taking a black-and-white photo of each document and imprinting it in the media.

According to a post I read about the size of the collection at the Library of Congress, they currently have about 3PB of digital data in their collections, which in 650MB CD chunks would be about 4.6M CDs. So if there is an intent to copy this data onto the new semi-perpetual storage media for the year 300,002,012, we probably ought to start now.
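The CD arithmetic, spelled out. This assumes decimal (SI) units, which is what reproduces the ~4.6M figure; the 3PB estimate itself comes from the post cited above.

```python
import math

# Decimal (SI) units, which reproduce the ~4.6M figure above.
LOC_DIGITAL_BYTES = 3 * 10**15   # ~3PB of digital collections
CD_BYTES = 650 * 10**6           # one 650MB CD

cds_needed = math.ceil(LOC_DIGITAL_BYTES / CD_BYTES)
print(f"{cds_needed:,} CDs (~{cds_needed / 1e6:.1f}M)")  # prints: 4,615,385 CDs (~4.6M)
```

If you count in binary units instead (3PiB over 650MiB discs), the total comes out closer to 5.0M CDs.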

Another tidbit to add to the discussion: at last month’s Hitachi Data Systems Influencers Summit, HDS was showing off some of their recent lab work, and they had an optical jukebox on display that they claimed would be used for long-term archive. I get the feeling that maybe they plan to commercialize this technology soon. Stay tuned for more.

 

~~~~

Image: Hitachi.com website (c) 2012 Hitachi, Ltd.

The problems with digital audio archives

ldbell15 by Zyada (cc) (from Flickr)

A recent article in Rolling Stone (File Not Found: The Record Industry’s Digital Storage Crisis) laments the fact that digital recordings can go out of service due to format changes, plugin changes, and/or files not being readable (file not found).

In olden days, multi-track masters were recorded on audio tape and kept in vaults. Audio tape formats never seemed to change, or at least changed infrequently, and thus tapes remained usable years or decades after being recorded. And the audio tape drives seemed to last forever.

Digital audio recordings, on the other hand, are typically stored in bookcases/file cabinets/drawers, on media that can easily become out-of-date technology (i.e., unreadable), and in digital formats that seem to change with every new version of software.

Consumer grade media doesn’t archive very well

The article talks about using hard drives for digital recordings and trying to read them decades after they were recorded. I would be surprised if they still spin up (due to stiction), let alone are still readable. But even if these were CDs or DVDs, the lifetime of consumer grade media is not that long: maybe a couple of years at best if treated well, and considerably less if abused by writing on them or by bad handling.

Digital audio formats change frequently

The other problem with digital audio recordings is that formats go out of date. I am no expert, but let’s take Apple’s GarageBand as an example. I would be surprised if, 15 years down the line, a GarageBand session recorded today (2010) were readable/usable with GarageBand 2025, assuming it even exists. That sounds like a long time, but it’s probably nothing for popular music coming out today.

Solutions to digital audio media problems

Audio recordings must use archive grade media if they are to survive longer than 18-36 months. I am aware of archive grade DVD discs but have never tested any, so I cannot speak to their viability in this application. However, for an interesting discussion on archive quality CD and DVD media, see How to choose CD/DVD archival media. But there are other alternatives.

Removable data center class archive media today includes magnetic tape, removable magnetic disks or removable MO disks.

  • Magnetic tape – LTO media vendors specify archive life on the order of 30 years; however, this assumes a drive exists that can read the media. The LTO consortium states that current generation drives will read back two generations (an LTO-5 drive today reads LTO-4 and LTO-3 media) and write back one generation (an LTO-5 drive can write on LTO-4 media [in LTO-4 format]). With LTO generations coming every 2 years or so, it would only take 6 years for an LTO volume recorded today to become unreadable by current drives. Naturally, one could keep an old drive around, but maintenance/service would no longer be available for it after a couple of years. LTO drives are available from a number of vendors.
  • Magnetic disk – The RDX Storage Alliance claims a media archive life of 30 years, but I wonder whether an RDX drive would still exist that could read it, and how that archive life was validated. Today’s removable disk typically imitates a magnetic tape drive/format. The most prominent removable disk vendor is ProStor Systems, but there are others.
  • Magneto-optical (MO) media – Plasmon UDO claims a media life of 50+ years for their magneto-optical media. UDO has been used for years to record check images, medical information and other data. Nonetheless, UDO technology has recently not been able to keep up with other digital archive solutions and has gained a pretty bad rap for usability problems. However, they plan to release a new generation of UDO products in 2010, which may shake things up if it arrives and addresses their usability issues.
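The LTO "6 years" figure falls out of the read-back rule described above. Here is a minimal model of it, assuming exactly the rule stated (read two generations back, a new generation every ~2 years); real compatibility rules have varied between LTO generations, so treat this as a sketch of the reasoning rather than a compatibility chart.

```python
GENERATION_YEARS = 2  # post's estimate: a new LTO generation every ~2 years
READ_BACK = 2         # drives read their own generation plus two back (rule as stated above)


def can_read(drive_gen: int, media_gen: int) -> bool:
    """True if a drive of drive_gen can read media written in media_gen under the stated rule."""
    return 0 <= drive_gen - media_gen <= READ_BACK


def years_until_unreadable(media_gen: int) -> int:
    """Years until the newest drive generation can no longer read media_gen."""
    drive_gen = media_gen
    while can_read(drive_gen, media_gen):
        drive_gen += 1
    return (drive_gen - media_gen) * GENERATION_YEARS


print(years_until_unreadable(5))  # LTO-5 media: unreadable by then-current drives after 6 years
```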

Finally, one could use non-removable, high density disk drives and migrate the audio data every 2-3 years to new generation disks. This would keep the data readable and continuously accessible. Modern storage systems with RAID and other advanced protection schemes can protect data from any single, and potentially double, drive failure, but as drives age, their error rates go up. This is why the data needs to be moved to new disks periodically. Naturally, this is more frequent than with magnetic tape, but given disk drive usability and capacity gains, it might make sense in certain applications.
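A periodic migration is only as good as its verification. Below is a minimal sketch of copying an archive tree to new disks and checking each copy with a SHA-256 checksum. The checksum step and the function names are assumptions for illustration, not a product feature mentioned above, and the directory paths in the usage are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large audio files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root to dst_root; return source files whose copies don't verify."""
    mismatches = []
    for src in src_root.rglob("*"):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src)
    return mismatches


# Hypothetical usage: migrate("/mnt/old_array") to "/mnt/new_array" and report failures.
# bad = migrate(Path("/mnt/old_array"), Path("/mnt/new_array"))
```

Keeping the recorded checksums alongside the data would let the next migration, 2-3 years later, verify against them as well.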

As for removable USB sticks, it’s unclear what the archive life is for these consumer devices, but potentially some version that went after the archive market might make sense. It would need to be robust, have a long archive life and be cheap enough to compete with all the above. I just don’t see anything here yet.

Solutions to digital audio format problems

There needs to be an XML-like description of a master recording that reduces everything to a more self-defined level, which describes the hierarchy of the recording and provides object buckets for various audio tracks/assets. Plugins that create special effects would need to convert their effects to something akin to an MPEG-like track that could be mixed with the other tracks, surrounded by metadata describing where it starts, ends and other important info.
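Here is a rough sketch of what such a description might look like, built with Python's standard xml.etree.ElementTree. Every element and attribute name (masterRecording, renderedEffects, the hypothetical ReverbX plugin, and so on) is invented for illustration; no such standard schema is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: element and attribute names are illustrative only.
session = ET.Element("masterRecording", title="Example Song", sampleRate="96000", bitDepth="24")

# Object buckets for the individual audio tracks/assets.
tracks = ET.SubElement(session, "tracks")
for track_id, name, asset in [("t1", "Lead vocal", "assets/vocal.wav"),
                              ("t2", "Bass", "assets/bass.wav")]:
    ET.SubElement(tracks, "track", id=track_id, name=name, file=asset)

# A plugin effect rendered to its own audio track, with metadata on where it applies.
effects = ET.SubElement(session, "renderedEffects")
ET.SubElement(effects, "effect", appliesTo="t1", plugin="ReverbX",
              file="assets/vocal_reverb.wav", startSeconds="12.5", endSeconds="47.0")

ET.indent(session)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(session, encoding="unicode")
print(xml_text)
```

Because the effect is rendered to audio, a future reader needs only the track files and this manifest, not the original plugin.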

Barring that, some form of standardization on a master recording format would work. Such a standard could be supported by all major recording tools and would allow a master recording to be exported and imported across software tools/versions. As this format evolved, migration/conversion products could be supplied to upgrade old formats to new ones.
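One way such migration/conversion products could work is a chain of one-version-at-a-time upgraders. This sketch assumes a master recording has already been parsed into a Python dict with a "version" field; the version numbers, field names, and the v1_to_v2/v2_to_v3 converters are all hypothetical.

```python
from typing import Callable

# Hypothetical converters: each upgrades a master-recording document by one format version.
Upgrader = Callable[[dict], dict]


def v1_to_v2(doc: dict) -> dict:
    # Example change: v2 renamed "rate" to "sampleRate".
    new = {k: v for k, v in doc.items() if k != "rate"}
    new["sampleRate"] = doc["rate"]
    new["version"] = 2
    return new


def v2_to_v3(doc: dict) -> dict:
    # Example change: v3 added an explicit list of rendered effects.
    new = dict(doc)
    new.setdefault("renderedEffects", [])
    new["version"] = 3
    return new


UPGRADERS: dict[int, Upgrader] = {1: v1_to_v2, 2: v2_to_v3}


def upgrade(doc: dict, target: int) -> dict:
    """Apply one-step upgraders in sequence until doc reaches the target version."""
    while doc["version"] < target:
        step = UPGRADERS.get(doc["version"])
        if step is None:
            raise ValueError(f"no upgrader from version {doc['version']}")
        doc = step(doc)
    return doc


old_master = {"version": 1, "title": "Example Song", "rate": 48000}
print(upgrade(old_master, 3))
```

Chaining single-step converters means each new format version needs only one new converter, rather than one from every older version.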

Another approach is to have some repository for current master audio recording formats.  As software packages go out of date/business, their recording format could be stored in some “format repository”, funded by the recording industry and maintained in perpetuity.  Plug-in use would need to be documented similarly.  With a repository like this around and “some amount” of coding, no master recording need be lost to out-of-date software formats.

Nonetheless, if your audio archive needs to be migrated periodically, that would be a convenient time to upgrade the audio format as well.

—-

I have written about these problems before in a more general sense (see Today’s data and the 1000 year archive), but the recording industry seems to be “leading edge” for these issues. When producer T Bone Burnett testifies at a hearing that “Digital is a feeble storage medium”, it’s time to step up and take action.

Digital storage is no more feeble than analog storage – they each have their strengths and weaknesses.  Analog storage has gone away because it couldn’t keep up with digital recording densities, pricing, and increased functionality.  Just because data is recorded digitally doesn’t mean it has to be impermanent, hard to read 15-35 years hence, or in formats that are no longer supported.  But it does take some careful thought on what storage media you use and on how you format your data.

Comments?

Today's data and the 1000 year archive

Untitled (picture of a keypunch machine) by Marcin Wichary (cc) (from flickr)

Somewhere in my basement I have card boxes dating back to the 1970s and paper tape canisters dating back to the 1960s with BASIC, System/360 assembler, COBOL, and PL/1 programs on them. These could be reconstructed if needed by reading the Hollerith encoding and typing them out into text files. Finding a compiler/assembler/interpreter to interpret and execute them is another matter. But just knowing the logic may suffice to translate them into another readily compilable language of today. Hollerith is a data card format which is well known and well described. But what of the data being created today? How will we be able to read such data in 50 years, let alone 500? That is the problem.
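Because Hollerith card code is well described, decoding it really is mechanical. Card columns have rows labeled 12, 11, 0 and 1-9; a digit is a single punch, and a letter combines a zone punch (12, 11 or 0) with a digit punch. The sketch below covers only blanks, digits and letters in the IBM 029 style; special characters are omitted, and representing a column as a set of punched row labels is an assumption for illustration.

```python
from typing import Iterable

# Simplified IBM 029-style Hollerith code: blank, digits 0-9 and letters A-Z only.
# (zone row, first letter in that zone, digit row that encodes the first letter)
ZONES = ((12, "A", 1), (11, "J", 1), (0, "S", 2))


def decode_column(punches: Iterable[int]) -> str:
    rows = set(punches)
    if not rows:
        return " "                      # an unpunched column is a blank
    if len(rows) == 1:
        (row,) = rows
        if 0 <= row <= 9:
            return str(row)             # a single digit punch is that digit
    elif len(rows) == 2:
        for zone, first_letter, first_digit in ZONES:
            if zone in rows:
                (digit,) = rows - {zone}
                if first_digit <= digit <= 9:
                    return chr(ord(first_letter) + digit - first_digit)
    raise ValueError(f"unsupported punch combination: {sorted(rows)}")


def decode_card(columns: Iterable[Iterable[int]]) -> str:
    return "".join(decode_column(col) for col in columns).rstrip()


card = [{12, 8}, {12, 5}, {11, 3}, {11, 3}, {11, 6}, set(), {4}, {2}]
print(decode_card(card))  # HELLO 42
```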

Vista de la Biblioteca Vasconcelos by Eneas (cc) (from flickr)

Civilization needs to come up with some way to keep information around for 1000 years or more. There are books relevant today (besides the Bible, Koran, and other sacred texts) that, had they become unreadable 900 years ago, would have altered the world as we know it. No doubt, data or information like this being created today will survive for posterity by virtue of its recognized importance to the world. But there are a few problems with this viewpoint:

  • Not all documents/books/information are recognized as important during their lifetime of readability
  • Some important information is actively suppressed and may never be published during a regime’s lifetime
  • Even seemingly “unimportant information” may have significance to future generations

From my perspective, knowing what’s important to the future needs to be left to future generations to decide.

Formats are the problem

Consider my blog posts: WordPress creates MySQL database entries for them. Imagine deciphering MySQL database entries 500 or 1000 years in the future, and the problem becomes obvious. Of course, WordPress is open source, so this information could conceivably be interpreted by reading its source code.

I have written before about the forms that such long-lived files can take, but for now consider that some form of digital representation of a file (magnetic, optical, paper, etc.) can be constructed that lasts a millennium. Some data forms are easier to read than others (e.g., paper), but even paper can be encoded with bar codes that would be difficult to decipher without a key to their format.

The real problem becomes file or artifact formats. Who or what in 1000 years will be able to render a JPEG file, display an old MS Word file from 1995, or read a WordPerfect file from 1985? Okay, JPEG is probably a bad example, as it’s a standard format, but older Word and WordPerfect file formats constitute a lot of information today. Although there may be programs available to read them today, the likelihood that they will continue to do so in 50, let alone 500, years is pretty slim.
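Identifying a file's container format is often the easy part, thanks to well-known leading "magic number" bytes. The sketch below checks a handful of them. It only names the container: a ZIP signature, for example, doesn't tell you the contents are a Word document, and older WordPerfect files may predate the header shown. Knowing how to interpret what's inside remains the hard part.

```python
# A few well-known file signatures ("magic numbers"), checked against a file's first bytes.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (used by legacy binary Word .doc files)"),
    (b"\xffWPC", "WordPerfect document (5.x and later)"),
    (b"PK\x03\x04", "ZIP container (e.g., Office Open XML .docx)"),
]


def identify(header: bytes) -> str:
    """Return a description of the first matching signature, or 'unknown format'."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown format"


# Usage with a file on disk (path is hypothetical):
# with open("old_letter.doc", "rb") as f:
#     print(identify(f.read(16)))
```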

The problem is that as applications evolve from one version to another, formats change, and developers have a negative incentive to publicize these new file formats. Few developers today want to supply competitors with easy access to convert files to a competitive format. Hence, as developers or applications go out of business, formats cease to be readable or convertible into anything that could be deciphered 50 years hence.

Solutions to disappearing formats

What’s missing, in my view, is a file format repository. Such a repository could be maintained by an adjunct of national patent and trademark offices (nPTOs). Just like today’s patents, file formats, once published, could be available for all to see, in multiple databases or printouts. Corporations or other entities that create applications with new file formats would be required to register their new file format with the local nPTO. Such a format description would be kept confidential as long as that application or its descendants continued to support that format or until copyright expired, whichever came first.
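The proposed confidentiality rule is simple enough to state in code. Everything here is hypothetical: the FormatRegistration record, its fields, and the 95-year default term are placeholders, since no such registry exists today.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FormatRegistration:
    """Hypothetical registry entry for a file format filed with a national office."""
    format_name: str
    registrant: str
    spec_text: str                          # description of fields, records, and layout
    filed: date
    support_ended: Optional[date] = None    # when the app and its descendants dropped the format
    copyright_term_years: int = 95          # hypothetical protection period

    def confidential_until(self) -> date:
        """Confidentiality ends at end of support or copyright expiry, whichever comes first."""
        year = self.filed.year + self.copyright_term_years
        try:
            copyright_end = self.filed.replace(year=year)
        except ValueError:  # a Feb 29 filing date landing in a non-leap year
            copyright_end = self.filed.replace(year=year, day=28)
        if self.support_ended is None:
            return copyright_end
        return min(self.support_ended, copyright_end)

    def is_public(self, today: date) -> bool:
        return today >= self.confidential_until()


reg = FormatRegistration("ExampleDoc v3", "Example Corp", "record layout ...",
                         filed=date(2010, 1, 1), support_ended=date(2020, 6, 30))
print(reg.confidential_until(), reg.is_public(date(2021, 1, 1)))
```

Once is_public returns True, the format description would be published for anyone, including archivists centuries from now, to read.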

The form that a file format description could take could be the subject of standards activities, but in the meantime, anything that explains the various fields, records, and logical organization of a format in a text file would be a step in the right direction.

This brings up another viable solution to this problem: self-defining file formats. Applications that use native XML as their file format essentially create a self-defining file format. Such a file format could potentially be understood by any XML parser. And because XML is a widely used, well-defined standard, XML parsers could conceivably still be available to archivists of the year 3000. So I applaud Microsoft for using XML for their latest generation of Office file formats. Others, please take up the cause.
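To illustrate what "self-defining" buys an archivist, here is a sketch of a function that outlines any XML document using only Python's standard xml.etree.ElementTree parser, with no prior knowledge of the schema. The sample document is made up.

```python
import xml.etree.ElementTree as ET


def outline(xml_text: str) -> list[str]:
    """Describe any XML document's structure (tags, attributes, text) without knowing its schema."""
    lines = []

    def walk(elem: ET.Element, depth: int) -> None:
        attrs = ", ".join(f"{k}={v!r}" for k, v in elem.attrib.items())
        text = (elem.text or "").strip()
        label = elem.tag + (f" [{attrs}]" if attrs else "") + (f": {text!r}" if text else "")
        lines.append("  " * depth + label)
        for child in elem:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines


sample = '<post id="1"><title>Hello</title><body>Text</body></post>'
print("\n".join(outline(sample)))
```

Real Office Open XML files are ZIP packages of XML parts that use namespaces; ElementTree reports namespaced tags as {uri}tag, so an outline like this stays readable, if verbose.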

If such repositories existed today, people in the year 3010 could still be reading my blog entries and wonder why I wrote them…