Tape still alive, well and growing at Spectra Logic

T-Finity library at Spectra Logic's test facility (c) 2011 Silverton Consulting, All Rights Reserved
Today I met with Spectra Logic execs and some of their Media and Entertainment (M&E) customers, and toured their manufacturing, test labs and briefing center. The tour was a blast, and the customers, Kyle Knack from National Geographic (Nat Geo) Global Media, Toni Perez from Medcom (a Panama-based entertainment company) and Lee Coleman from Entertainment Tonight (ET), all talked about their use of Spectra Logic T-950 tape libraries in their media ingest, editing and production processes.

Mr. Coleman from ET spoke almost reverently about their T-950 and how it has enabled ET to access over 30 years of video interviews, movie segments and other media, which they can now use to put together clips on just about any entertainment subject imaginable.

He talked specifically about the obit they did for Michael Jackson and how they were able to grab footage from an interview they had done years ago and splice it together with more recent media to tell a more complete story. He also showed a piece built from some early Eddie Murphy film footage and interviews they had done at the time, which they used in a recent segment about his new movie.

All this was made possible by moving to digital file formats and placing digital media in their T-950 tape libraries.

Spectra Logic T-950 (I think) with TeraPack loaded in robot (c) 2011 Silverton Consulting, All Rights Reserved
Mr. Knack from Nat Geo Media said every bit of media they get now automatically goes into the library archive and becomes the “original copy” of the media, used in case other copies are corrupted or lost. Nat Geo started out putting only important media in the library but found it cost so much less to store media in the tape archive that they decided it made more sense to move all media to the tape library.

Typically they keep two copies in their tape library, and important media is also copied to tape and shipped offsite (three copies in all for this data). They have a 4-frame T-950 with around 4000 slots and 14 drives (a combination of LTO-4 and LTO-5). They use FC- and FCoE-attached storage for their primary storage, which depends on 1000s of SATA drives.

He said they only use SSDs for some metadata support for their web site. He found that SATA drives can handle their big-block sequential workload and provide consistent throughput and, especially important to M&E companies, consistent latency.

3D printer at Spectra Logic (for mechanical parts fabrication) (c) 2011 Silverton Consulting, All Rights Reserved
Mr. Perez from Medcom had much the same story. They are in the process of moving off a proprietary video tape format (Sony Betacam) to LTO media and digital files. The process is still ongoing, although they are more than halfway there for current production.

They still have a lot of old media in Betacam format, which will take them years to convert to digital files, but they are at least starting this activity. He said a recent move from one site to another revealed that many of the Betacam tapes were no longer readable. Digital files on LTO tape should solve that problem for them when they finally get there.

Matt Starr, Spectra Logic CTO, talked about the history of tape libraries at Spectra Logic, which was founded in 1979 and has been laser focused on tape data protection and tape libraries.

I find it pleasantly surprising that a company today can supply just tape libraries with software and make a going concern of it. Spectra Logic must be doing something right: revenue grew 30% YoY last year, and they are already outgrowing the 88K sq ft office, lab, and manufacturing building they moved into earlier this year and have just signed to occupy another building providing 55K sq ft more space.

T-Series robot returning TeraPack to shelf (c) 2011 Silverton Consulting, All Rights Reserved
Molly Rector, Spectra Logic CMO, talked about the shift in the market from peta-scale (10**15 bytes) storage repositories to exa-scale (10**18 bytes) ones. Ms. Rector believes that today’s cloud storage environments can take advantage of these large tape-based archives to provide much more economical storage for their users without suffering any performance penalty.

At lunch, Matt Starr, Fred Moore (Horison Information Strategies), Mark Peters (Enterprise Strategy Group) and I were talking about HPSS (High Performance Storage System), developed in conjunction with IBM and 5 US national labs, which supports vast amounts of data residing across primary disk and tape libraries.

Matt said that there are about a dozen large HPSS sites (the HPSS website shows at least 30 sites using it) that store a significant portion of the world's 1ZB (10**21 bytes) of digital data created this past year (see my 3.3 exabytes of data a day!? post). Later that day, Nathan Thompson, Spectra Logic CEO, said these large HPSS sites probably store ~10% of the world's data, or 100EB. I find it difficult to comprehend that much data at only ~12 sites, but the national labs do have lots of data on hand.
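To put those figures in perspective, here is a quick sketch of the arithmetic. It assumes decimal units, takes the 1ZB and ~10% figures above at face value, and uses 12 as a stand-in for "about a dozen" sites; the even split across sites is purely illustrative.

```python
# Rough check of the HPSS share quoted above.
# Assumptions: 1 ZB created worldwide, a 10% share held at HPSS sites,
# and 12 sites (a stand-in for "about a dozen"); decimal units throughout.
ZB, EB = 10**21, 10**18

world_bytes = 1 * ZB
hpss_bytes = world_bytes // 10   # ~10% of the world's data
sites = 12                       # hypothetical count, evenly split
per_site_bytes = hpss_bytes / sites

print(f"HPSS total: {hpss_bytes / EB:.0f} EB")
print(f"Average per site: {per_site_bytes / EB:.1f} EB")
```

Under these assumptions each site would average a bit over 8EB, which is why the number is so hard to picture.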

Nowadays you can get a Spectra Logic T-Finity tape complex with 122K slots, using LTO-4/-5 or IBM TS1140 (enterprise class) tape drives. A T-Finity this large has 4 rows of tape libraries and uses the ‘Skyway’ to transport a TeraPack of tape cartridges from one library row to another. All Spectra Logic libraries are built around a tape cartridge package they call the TeraPack, which contains 10 LTO cartridges or (I think) 9 TS1140 tape cartridges (they are bigger than LTO tapes). The TeraPack is used both to import or export tapes from the library and to hold the tapes in all the tape slots in the library.

The software used to control all this is called BlueScale and is used in everything from their T50e, a small 50-slot library, all the way up to the 122K-slot T-Finity tape complex. There are some changes for configuration, robotics and other personalization for each library type, but the UI looks exactly the same across all of their libraries. Moreover, BlueScale offers the same enterprise level of functionality (e.g., drive and media life management) for all Spectra Logic tape libraries.

Day 1 for SpectraPRDay closed with the lab tour and dinner. Day 2 will start discussing futures and will be under NDA, so there won’t be much to talk about right away. But from what I can see, Spectra Logic seems to be breaking down the barriers inhibiting tape use and providing tape library systems that people almost revere.

I haven’t seen that sort of reaction about a tape library since the STK 4400 first came out last century.



3.3 Exabytes-a-day?!

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)
NetworkWorld today reported on an EMC-funded IDC study that said the world will create 1.2 Zettabytes (ZB, 10**21 bytes) of data in 2010. By my calculations this is 3.3 Exabytes-a-day (EB, 10**18 bytes), 2.3PB (10**15 bytes) a minute or 38TB (10**12 bytes) a second. This seems high, although I talked last year about how we could get here in my Exabyte-a-day post. But what interested me most was a statement that about 35% more information is created than can be stored. Not sure I understand this claim. (Deduplication perhaps?)
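For anyone who wants to check the conversion, here is a minimal sketch of the calculation. It assumes decimal units and a 365-day year; the function name `creation_rates` is just an illustrative helper.

```python
# Back-of-the-envelope check of the per-day, per-minute and per-second rates.
# Assumptions: decimal units (1 ZB = 10**21 bytes) and a 365-day year.
ZB, EB, PB, TB = 10**21, 10**18, 10**15, 10**12

def creation_rates(bytes_per_year, days_per_year=365):
    """Return (bytes per day, bytes per minute, bytes per second)."""
    per_day = bytes_per_year / days_per_year
    per_minute = per_day / (24 * 60)
    per_second = per_minute / 60
    return per_day, per_minute, per_second

per_day, per_minute, per_second = creation_rates(1.2 * ZB)
print(f"{per_day / EB:.1f} EB/day")
print(f"{per_minute / PB:.1f} PB/minute")
print(f"{per_second / TB:.0f} TB/second")
```

Rounded, this reproduces the 3.3 exabytes a day, 2.3PB a minute and 38TB a second quoted above.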

Aside from deduplication, what this must mean is that data is being created, sent across the Internet and not stored anywhere except while in flight, to be discarded soon after. I assume this data is associated with something like VoIP phone calls and video chats/conferences, only some portion of which is ever recorded and stored. (Although that will soon no longer be true for audio, see my Yottabytes by 2015 post).

But 35% would indicate ~1 out of every 3 bytes of data is discarded shortly after creation. IDC also expects this factor to grow, not shrink, “… to over 60% over the next few years.” So 3 out of 5 bytes of data will only be available in real time, to be discarded thereafter.
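Since the claim is ambiguous, it may help to see how the discarded fraction depends on how the percentage is read. The sketch below compares two readings: reading A (the one used above) treats the percentage as the share of created data that is never stored, while reading B is an assumption on my part about what "35% more information is created than can be stored" could literally mean, namely that creation exceeds what is stored by that percentage. Both function names are hypothetical.

```python
from fractions import Fraction

def unstored_if_share_of_created(pct):
    """Reading A: pct percent of all bytes created are never stored."""
    return Fraction(pct, 100)

def unstored_if_excess_over_stored(pct):
    """Reading B (assumption): created = stored * (1 + pct/100),
    so the unstored share of created data is excess / (1 + excess)."""
    excess = Fraction(pct, 100)
    return excess / (1 + excess)

for pct in (35, 60):
    a = unstored_if_share_of_created(pct)
    b = unstored_if_excess_over_stored(pct)
    print(f"{pct}%: reading A = {a} ({float(a):.3f}), reading B = {b} ({float(b):.3f})")
```

Under reading A, 35% is roughly 1 byte in 3 and 60% is exactly 3 in 5. Under reading B the unstored share would be smaller, about 26% and 37.5% respectively.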

Why this portion should be growing more rapidly than data being stored is hard to fathom. Again, video and voice over the Internet must be a significant part of the reason.

Storing voice data

I don’t know about most people, but I record only a few of my more important calls. Also, these calls happen to be longer on average than my normal calls. Does this mean that 35% of my call data volume is not stored? Maybe. All my business calls are done via the Internet nowadays, so this data is being created and shipped across the net, used while the call is occurring but never stored other than in flight or by call participants. So non-recorded calls easily qualify as data created but not stored. Even so, while I may listen to maybe ~33% of the recorded calls afterwards, I overwrite all of them ultimately, keeping only the ones that fit on the recorder’s flash device. Hence, in the end, even the voice data I do keep is only retained until I need storage to record more.

Not sure how this is treated in the IDC study, but it seems to me to be yet another class of data; maybe call this transient data. I can see similar transient data in company backups, log files, database dumps, etc. Most of this data is stored for a limited time, only to be erased or recorded over in the end. How IDC classified such data I cannot tell.

But will transient data grow?

As for video, I currently do no video conferencing so have no information on this. But I am considering moving to another communication platform that supplies video chat and will make it less intrusive to record calls. While demoing this new capability I have rapidly consumed over 200MB of storage for call recordings. (I need to cap this some way before it gets out of hand). In any case, I believe recording convenience should make such data more store-able over time, not less.

So while I may agree that 1 out of 3 bytes of data created today is not stored, I definitely don’t think that over time that ratio will grow and certainly not to 60%.  My only caveat is that there is a limit to the amount of data the world can readily store at any one time and this will ultimately drive all of us to delete data we would rather keep.

But maybe all this just points to a more interesting question: how much data does the world create that is kept for a year, a decade, or a century? But that will need to await another post…