Read an article this past week in Scientific American about a new fused silica glass storage device from Hitachi Ltd., announced last September. The new media is recorded with lasers burning dots, which represent binary ones, or leaving spaces, which represent binary zeros.
The new media/recording technique was recently used to create copies of text files (the Holy Bible, pictured above). Other significant humanitarian, political and scientific treatises have also been stored on the new media. The device has been nicknamed the “Superman Memory Crystal”, due to the quartz glass medium’s likeness to Superman’s memory crystals.
As can be seen in the photos above, the data can readily be read by microscope which makes it pretty easy for some future civilization to read the binary data. However, knowing how to decode the binary data into pictures, documents and text is another matter entirely.
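Just to make the scheme concrete, here is a minimal sketch (my own illustration, not Hitachi’s actual encoding) of what dot/no-dot recording and a microscope read-back amount to:

```python
# Minimal sketch of dot/no-dot binary recording (not Hitachi's actual format).
# A laser-burned dot stands for a 1 bit, an unburned spot for a 0 bit.

def encode_to_dots(text: str, row_width: int = 32) -> list[str]:
    """Turn text into rows of '*' (dot) and '.' (blank)."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    dots = bits.replace("1", "*").replace("0", ".")
    return [dots[i:i + row_width] for i in range(0, len(dots), row_width)]

def decode_from_dots(rows: list[str]) -> str:
    """Read the dot grid back into text -- what a future microscope reader would do."""
    bits = "".join(rows).replace("*", "1").replace(".", "0")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))
    return data.decode("utf-8")

rows = encode_to_dots("In the beginning...")
assert decode_from_dots(rows) == "In the beginning..."
```

Reading the dots back is the easy half; knowing that they are 8-bit bytes of UTF-8 text is the part a future civilization would have to figure out.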
Hitachi tested the new fused silica glass storage media at 1000°C for several hours, which they say indicates it can survive several hundred million years without degradation. At that level it could provide a 300 million year storage archive (M-DISC only claims 1,000 years). They are calling their new device “semi-perpetual” storage. If hundreds of millions of years is semi-perpetual, I gotta wonder what perpetual storage might look like.
At CD recording density, with higher densities possible
They were able to achieve CD levels of recording density with a four-layer approach, which amounted to about 40Mb/sqin. DVD technology is on the order of 330Mb/sqin and Blu-ray is ~15Gb/sqin, but neither of those technologies claims even a million year lifetime. Also, more layers are possible, so the 40Mb/sqin could potentially double or quadruple.
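The back-of-the-envelope layer math, using the densities quoted above (approximate figures, just for scale):

```python
# Rough density arithmetic using the figures quoted above (approximate, for scale only).
MB_PER_SQIN_4_LAYERS = 40                 # Hitachi prototype: ~40Mb/sqin with 4 layers
per_layer = MB_PER_SQIN_4_LAYERS / 4      # ~10Mb/sqin per layer

for layers in (4, 8, 16):
    print(f"{layers:2d} layers -> ~{per_layer * layers:.0f} Mb/sqin")
# 4 layers -> ~40 Mb/sqin, 8 layers -> ~80 (double), 16 layers -> ~160 (quadruple)
```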
But data formats change every few years nowadays
My problem with all this is the data format issue: we will need something like a digital Rosetta Stone for every data format ever conceived in order to make this a practical digital storage device.
Alternatively, we could plan to use it more like an analogue storage device, imprinting black-and-white or grayscale photographs of the information to be retained directly in the media. That way, a simple microscope could be used to see the photo image. I suppose color photographs could be implemented using a different plate per color, similar to four-color magazine production processing. Text could be handled by just taking a black-and-white photo of a document and printing it in the media.
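As a rough illustration of the plate-per-color idea (my own sketch, using the Pillow imaging library), each color channel becomes its own grayscale separation that could be imprinted on a separate plate and recombined later:

```python
# Sketch of the "one plate per color" idea using the Pillow library.
# Each channel becomes a grayscale separation that could be imprinted on its own plate.
from PIL import Image

def make_plates(path: str) -> dict[str, Image.Image]:
    image = Image.open(path).convert("RGB")
    red, green, blue = image.split()       # three grayscale separations
    return {"red": red, "green": green, "blue": blue}

def recombine(plates: dict[str, Image.Image]) -> Image.Image:
    return Image.merge("RGB", (plates["red"], plates["green"], plates["blue"]))

# plates = make_plates("family_photo.png")   # hypothetical input file
# restored = recombine(plates)
```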
According to a post I read about the size of the collection at the Library of Congress, they currently have about 3PB of digital data in their collections, which in 650MB CD chunks would be about 4.6M CDs. So if there is any intent to copy this data onto the new semi-perpetual storage media for readers in the year 300,002,012, we probably ought to start now.
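For anyone who wants to check the arithmetic:

```python
# Library of Congress digital collection, expressed in 650MB CD-sized chunks.
collection_bytes = 3 * 10**15          # ~3PB (decimal petabytes)
cd_bytes = 650 * 10**6                 # 650MB per CD
print(f"{collection_bytes / cd_bytes / 1e6:.1f} million CDs")   # ~4.6 million
```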
Another tidbit to add to the discussion: at last month’s Hitachi Data Systems Influencers Summit, HDS was showing off some of their recent lab work, and they had an optical jukebox on display that they claimed would be used for long-term archive. I get the feeling they may plan to commercialize this technology soon – stay tuned for more.
Read an article the other day about scientists creating an optical disk that would be readable in a million years or so. The article in Science Mag, titled “A million-year hard disk”, was intended to warn people in the far future about potential dangers being created today.
A while back I wrote about a 1,000 year archive, which was predominantly about disappearing formats. At the time, I believed that, given the growth in data density, information could easily be copied and saved over time, but the formats for that data would be long gone by the time someone tried to read it.
The million year optical disk eliminates the format problem by using pixelated images etched on the media, which works just dandy if you happen to have a microscope handy.
Why would you need a million year disk
The problem is: how do you warn people in the far future not to mess with the radioactive waste deposits buried below? If the waste is radioactive for a million years, you need something around to tell people to keep away from it.
Stone markers last for a few thousand years at best but get overgrown and wear down in time. For instance, my grandmother’s tombstone in Northern Italy has already been worn down so much that it’s almost unreadable. And it’s not even 80 years old yet.
But a sapphire hard disk that could easily be read with any serviceable microscope might do the job.
How to create a million year disk
This new disk is similar to the old StorageTek 100K year optical tape. Both would depend on microscopic impressions, something like bits physically marked on media.
For the optical disk, the bits are created by etching a sapphire platter with platinum. Apparently the prototype costs €25K, but they’re hoping prices go down with production.
There are actually two 20cm (7.9in) wide disks that are molecularly fused together, and each disk can store 40K miniaturized pages that can hold text or images. They are doing accelerated life testing on the sapphire disks by bathing them in acid to ensure a 10M year life for the media and its message.
Presumably the images are grey tone (or in this case, platinum tone). If I assume 100KB per page, that’s about 4GB per disk, something around a single-layer DVD in a much larger form factor.
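The arithmetic behind that estimate (the 100KB-per-page figure is my assumption, not something the researchers specified):

```python
# Capacity estimate for one sapphire platter, using my assumed page size.
pages_per_disk = 40_000
bytes_per_page = 100_000               # assumption: ~100KB per miniaturized page
capacity_gb = pages_per_disk * bytes_per_page / 1e9
print(f"~{capacity_gb:.0f} GB per disk")   # ~4GB, roughly a single-layer DVD (4.7GB)
```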
It appears that sapphire is available from industrial processes, and it seems impervious to the wear that harms other materials. But that’s what they are trying to prove.
It’s unclear why they decided to “molecularly” fuse the two platters together. It seems to me this could easily be a weak link in the technology over the course of a dozen millennia or so. On the other hand, more storage is always a good thing.
In the end, creating dangers today that last millions of years requires some serious thought about how to warn future generations.
In another assault on the tape market, EMC announced today a new Data Domain 860 Archiver appliance. This new system supports both short-term and long-term retention of backup data. This attacks one of the last bastions of significant tape use – long-term data archives.
Historically, a cheap form of archive had been the long-term retention of full backup tapes. If one needed to keep data around for 5 years, one would keep all the full backup tape sets offsite, in a vault somewhere, for 5 years. The tapes could then be rotated (brought back into scratch use) after the 5 years elapsed. One problem with this – tape technology advances to a new generation more like every 2-3 years, so a 5-year-old tape cartridge would be at least one generation back before it could be re-used. But current tape technology always reads two generations back and writes at least one generation back, so this use would still be feasible. I would say that many tape users did something like this to create a “pseudo-archive”.
On the other hand, there exist many point archive products focused on one or a few application arenas, such as email, records, or database archives, which extract specific data items and place them into an archive. These did not generally apply outside one or a few application domains but were used to support stringent compliance requirements. The advantage of these application-based archive systems is that the data was actually removed from primary storage, taken out of any data protection activities, and placed permanently in “archive storage” only. Such data would be subject to strict retention policies and as such would be inviolate (couldn’t be modified) and could not be deleted until formally expired.
Enter the Data Domain 860 Archiver. This system supports up to 24 disk shelves, each of which can be dedicated to either short- or long-term data retention. Backup file data is moved within the appliance by automated policy from short- to long-term storage. Up to 4 disk shelves can be dedicated to short-term storage, with the remainder considered long-term archive units.
When a long-term archive unit (disk shelf) fills up with backup data it is “sealed”, i.e., it is given all the metadata required to reconstruct its file system and deduplication domain, and thus it does not require the use of other disk shelves to access its data. In this way one creates a standalone unit that contains everything needed to recover the data, not unlike a full backup tape set, which can be used in standalone fashion to restore data.
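To illustrate what sealing has to accomplish (this is my own sketch of the concept, not EMC’s implementation), think of each long-term shelf carrying its own file catalog and deduplication store, so any file on it can be restored without touching another shelf:

```python
# My own sketch of the "sealed archive unit" concept -- not EMC's implementation.
# The point: once sealed, a shelf carries everything needed to read back its own data.
from dataclasses import dataclass, field

@dataclass
class ArchiveUnit:
    shelf_id: str
    file_catalog: dict[str, list[str]] = field(default_factory=dict)  # file name -> chunk ids
    chunk_store: dict[str, bytes] = field(default_factory=dict)       # chunk id -> deduped chunk
    sealed: bool = False

    def seal(self) -> None:
        """Freeze the unit with its own catalog and dedupe store so it is standalone."""
        self.sealed = True

    def restore(self, file_name: str) -> bytes:
        # Restoration uses only this unit's own metadata and chunks -- no other shelf
        # needed, much like restoring from a standalone full backup tape set.
        return b"".join(self.chunk_store[c] for c in self.file_catalog[file_name])
```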
Today, the Data Domain 860 Archiver only supports file access and DD Boost data access. As such, the backup software is responsible for deleting data that has expired. Such data will then be absent from any backups taken, and as policy automation copies the backups to long-term archive units, it will be missing from there as well.
While Data Domain’s Archiver can’t remove data from ongoing backup streams the way application-based archive products can, it does look exactly like what can be achieved with tape-based archives today.
One can also replicate base Data Domain or Archiver appliances to an Archiver unit to achieve offsite data archives.
Full disclosure: I currently work with EMC on projects specific to other products but am not currently working on anything associated with this product.
Somewhere in my basement I have card boxes dating back to the 1970s and paper tape canisters dating back to the 1960s with BASIC, 360 Assembly, COBOL, and PL/1 programs on them. These could be reconstructed if needed, by reading the Hollerith encoding and typing them out into text files. Finding a compiler/assembler/interpreter to execute them is another matter. But just knowing the logic may suffice to translate them into another readily compilable language of today. Hollerith is a data card format that is well known and well described. But what of the data being created today? How will we be able to read such data in 50 years, let alone 500? That is the problem.
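Decoding Hollerith really is mechanical enough that a few lines of code capture the idea; the sketch below covers only digits and letters (special characters omitted):

```python
# Sketch of decoding Hollerith punched-card columns (digits and letters only).
# A column is the set of rows punched: zone rows 12/11/0 plus digit rows 1-9.
HOLLERITH = {frozenset([d]): str(d) for d in range(10)}                              # 0-9
HOLLERITH.update({frozenset([12, d]): chr(ord("A") + d - 1) for d in range(1, 10)})  # A-I
HOLLERITH.update({frozenset([11, d]): chr(ord("J") + d - 1) for d in range(1, 10)})  # J-R
HOLLERITH.update({frozenset([0, d]): chr(ord("S") + d - 2) for d in range(2, 10)})   # S-Z
HOLLERITH[frozenset()] = " "                                                          # blank column

def decode_card(columns: list[set[int]]) -> str:
    return "".join(HOLLERITH.get(frozenset(col), "?") for col in columns)

# "IBM": I = punches 12+9, B = 12+2, M = 11+4
print(decode_card([{12, 9}, {12, 2}, {11, 4}]))   # IBM
```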
Civilization needs to come up with some way to keep information around for 1,000 years or more. There are books relevant today (besides the Bible, Koran, and other sacred texts) whose loss 900 years ago would have altered the world as we know it. No doubt some data or information like this being created today will survive to posterity by virtue of its recognized importance to the world. But there are a few problems with this viewpoint:
Not all documents/books/information are recognized as important during their lifetime of readability
Some important information is actively suppressed and may never be published during a regime’s lifetime
Even seemingly “unimportant information” may have significance to future generations
From my perspective, knowing what’s important to the future needs to be left to future generations to decide.
Formats are the problem
Consider my blog posts: WordPress stores blog posts as MySQL database entries. Imagine deciphering MySQL database entries 500 or 1,000 years in the future and the problem becomes obvious. Of course, WordPress is open source, so this information could conceivably be interpreted by reading its source code.
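Here is roughly what a reader would have to know to pull posts out directly, assuming the standard WordPress wp_posts table and the mysql-connector-python driver (connection details are placeholders):

```python
# Sketch: pulling posts straight out of a WordPress MySQL database.
# Assumes the standard wp_posts table; host/user/password here are placeholders.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="wp_reader", password="change-me", database="wordpress"
)
cursor = conn.cursor()
cursor.execute(
    "SELECT post_date, post_title, post_content "
    "FROM wp_posts WHERE post_status = 'publish' AND post_type = 'post'"
)
for post_date, title, content in cursor.fetchall():
    print(post_date, title)
conn.close()
# Knowing that this table, these column names, and MySQL's on-disk format even exist
# is exactly the context a reader 500 years from now would be missing.
```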
I have written before about the forms that such long-lived files can take, but for now consider that some digital representation of a file (magnetic, optical, paper, etc.) can be constructed that lasts a millennium. Some data forms are easier to read than others (e.g., paper), but even paper can be encoded with bar codes that would be difficult to decipher without a key to their format.
The real problem is file or artifact formats. Who or what in 1,000 years will be able to render a JPEG file, display an old MS Word file from 1995, or read a WordPerfect file from 1985? Okay, a JPEG is probably a bad example, as it’s a standard format, but older Word and WordPerfect files constitute a lot of information today. Although there may be programs available to read them today, the likelihood that they will continue to exist in 50, let alone 500 years, is pretty slim.
The problem is that as applications evolve from one version to another, formats change, and developers have a negative incentive to publicize the new file formats. Few developers today want to supply competitors with an easy way to convert files to a competing format. Hence, as developers or applications go out of business, formats cease to be readable or convertible into anything that could be deciphered 50 years hence.
Solutions to disappearing formats
What’s missing, in my view, is a file format repository. Such a repository could be maintained as an adjunct of national patent and trade offices (nPTOs). Just like today’s patents, file formats, once published, would be available for all to see, in multiple databases or printouts. Corporations or other entities that create applications with new file formats would be required to register the new file format with their local nPTO. Such a format description would be kept confidential as long as that application or its descendants continued to support the format, or until copyright time frames expired, whichever came first.
The form that a file format description should take could be the subject of standards activities, but in the meantime, anything that explains the various fields, records, and logical organization of a format, in a text file, would be a step in the right direction.
This brings up another viable solution to the problem – self-defining file formats. Applications that use native XML as their file format essentially create a self-defining file format, one that could potentially be understood by any XML parser. And XML, as a defined standard, is widely enough deployed that parsers for it could conceivably still be available to archivists of the year 3000. So I applaud Microsoft for using XML for their latest generation of Office file formats. Others, please take up the cause.
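As a quick demonstration of why self-defining formats help, any generic XML parser recovers both the structure and the field names with no format documentation at all (the sample document below is made up purely for illustration):

```python
# Sketch: a generic XML parser can recover the structure of a self-defining file.
# The sample document is made up purely for illustration.
import xml.etree.ElementTree as ET

sample = """<document>
  <title>A sample blog post</title>
  <author>Some future-proofed author</author>
  <body>
    <paragraph>Formats are the problem...</paragraph>
  </body>
</document>"""

root = ET.fromstring(sample)
for element in root.iter():
    text = (element.text or "").strip()
    if text:
        print(f"{element.tag}: {text}")
```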
If such repositories existed today, people in the year 3010 could still be reading my blog entries and wonder why I wrote them…