“… would consume nearly half the world’s digital storage capacity.”

A National Geographic article on recent brain research (February 2014) said something I find intriguing. “Producing an image of an entire human brain at the same resolution [as a mouse brain] would consume nearly half of the world’s current digital storage capacity.”

They were imaging slices of a mouse brain with an electron microscope, one millimeter square and a micron deep, or about a million cubic microns per image. Such a scan of the full mouse brain would require 450,000 TB (0.45 EB; exabyte = 10^18 bytes) of storage for the images.

Getting an equivalent resolution image of a single human brain would require 1.3 billion TB (or 1.3 ZB; zettabyte = 10^21 bytes). They went on to say that the world’s digital storage was just 2.7 billion TB (or 2.7 ZB), which is where the “… nearly half the world’s digital storage capacity” figure comes from.
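
A quick back-of-the-envelope check of the article's numbers (my own arithmetic, nothing beyond the figures quoted above):

```python
# Back-of-the-envelope check of the National Geographic numbers (my arithmetic).
TB = 10**12  # bytes

mouse_scan = 450_000 * TB      # full mouse brain at EM resolution, ~0.45 EB
human_scan = 1.3e9 * TB        # human brain at the same resolution, ~1.3 ZB
world_storage = 2.7e9 * TB     # world digital storage per the article, ~2.7 ZB

print(f"Human/mouse scan ratio: {human_scan / mouse_scan:,.0f}x")
print(f"Fraction of world storage: {human_scan / world_storage:.0%}")  # ~48%, i.e. "nearly half"
```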

So how much digital storage is there in the world today?

Setting aside the need for such a detailed map for the moment, let’s talk about the world’s digital storage.

  • Tape – I don’t have much information about the enterprise tape capacity currently installed (IBM TS1120/TS1130 or Oracle T10000C/B/A), but a relatively recent article indicated that the 225 millionth LTO cartridge shipped sometime in 3Q13, which represents about 90,000 PB (or 90 EB; exabyte = 10^18 bytes) of storage capacity.
  • Disk – Although I couldn’t find a reasonable estimate of installed disk capacity, IDC reported that 2012 disk capacity shipments were 20EB and that 24.3EB had shipped through 3Q13. It’s probably safe to assume capacity shipments were ~8.3EB or more in 4Q13, so roughly ~32.5EB of disk capacity shipped in 2013. IDC also estimates that worldwide disk storage capacity is doubling every two years, which puts installed disk capacity at the end of 4Q13 somewhere on the order of 113.6EB.

I won’t delve into optical storage as that’s even more difficult to get a handle on, but my guess is it’s not quite at the level of LTO digital storage, so maybe another 90EB there, for a total of ~0.3ZB of digital storage in disk, LTO tape and optical.
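
Adding up these rough numbers (a sketch of my own tally, taking the installed-disk estimate above as given):

```python
# Rough tally of the installed capacity estimates discussed above (EB = 10**18 bytes).
lto_tape_eb = 90.0      # from the 225 millionth LTO cartridge shipment figure
disk_eb     = 113.6     # installed disk estimate derived from IDC shipment data
optical_eb  = 90.0      # my guess: roughly on par with LTO tape

total_eb = lto_tape_eb + disk_eb + optical_eb
print(f"Total: {total_eb:.0f} EB = ~{total_eb / 1000:.1f} ZB")   # ~0.3 ZB
```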

However, back in February of 2010, researchers reported in Science that the world’s information storage capacity was 2.0 ZB. Also, last October IDC reported that the US alone had a digital storage capacity of 2.6 ZB and that the US holds somewhere between 24 and 40% of the world’s storage. Using 33% for simplicity’s sake, this would put the world’s digital capacity at around 7.8ZB of storage according to IDC.

Thankfully, a human brain scan at the resolutions above would take only a sixth of the world’s digital storage based on my estimates.
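
Here's how that works out (a quick sketch of the arithmetic, assuming the US holds about a third of the world's storage as above):

```python
# How the ~7.8 ZB world estimate and the "one sixth" figure fall out (ZB = 10**21 bytes).
us_storage_zb  = 2.6      # IDC's US digital storage capacity
human_brain_zb = 1.3      # storage needed for one human brain scan

world_zb = us_storage_zb * 3   # assume the US holds ~1/3 of the world's storage
print(f"World storage estimate: {world_zb:.1f} ZB")                # ~7.8 ZB
print(f"Brain scan fraction: 1/{world_zb / human_brain_zb:.0f}")   # ~1/6
```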

But, we really need to talk about data reduction techniques

I think we need to start discussing some form of data reduction: data compression, fractal compression or even graphical encoding. For example, with appropriate software and compute power the neural scans could be encoded at appropriate levels of detail into a graphical representation. Hopefully, this would be many orders of magnitude less storage intensive, so maybe only 1/600th to 1/60,000th of all the world’s digital storage.

Another approach might be to use a form of fractal compression similar to that done in motion pictures/photographic images. Perhaps, I am being naive but it seems to me that there ought to be some form of fractal encoding of neural branching. Most of nature’s branching structures have an underlying fractal basis and I see nothing in neural anatomy that would show me it’s any different.

Of course, I am not a neural biologist, but I am a storage expert and there’s got to be a way to reduce this data load somehow.

Comments?

Photo Credit: Microscopic embryonic mouse brain (DAPI, GFP) by Joseph Elsbernd

DS3, the BlackPearl and the way forward for … tape

Spectra Logic Summit 2013, Nathan Thompson, CEO, talking about Spectra Logic's history

Just got back from an analyst summit with Spectra Logic. They announced a new interface to tape called Deep Simple Storage Service (DS3) and an appliance that implements this interface named the BlackPearl. The intent is to broaden the use of tape to include today’s web services and application environments.

The main problems addressed by the new interface are: how do you map an essentially sequential, high-throughput but long-latency-to-first-byte, removable media device onto an essentially small-file, get-and-put environment? And is there a market for such a service? I think Spectra Logic has answered the first question and is about to embark on a journey to answer the second.

The new interface – it’s all about simplifying tape

The DS3 interface answers the first question. With DS3, Spectra Logic has extended Amazon’s S3 interface to expose some of the sequentiality and removability of tape to the object storage world.

As you may recall, Amazon S3 is a RESTful, web interface that uses HTTP-style GET and PUT commands to move data to and from the S3 storage service. The data you are moving is considered an object, and the object name or identifier is unique across the storage service. When you “PUT” an object you can attach key-value pairs of information, called metadata, to the object. When you “GET” an object you retrieve the data from the storage service. The other thing to be aware of is that you get and put objects into “BUCKETs”.
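
For reference, here's what a plain S3 put and get with user metadata looks like using the boto3 Python SDK (a generic S3 example, not Spectra's DS3 client; the bucket and object names are made up):

```python
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

# PUT an object with a couple of key-value metadata pairs attached.
with open("video-0001.mxf", "rb") as f:
    s3.put_object(
        Bucket="my-archive-bucket",        # hypothetical bucket name
        Key="projects/video-0001.mxf",     # object name, unique within the bucket
        Body=f,
        Metadata={"project": "documentary", "ingest-date": "2013-10-01"},
    )

# GET the object back; the user metadata comes along in the response.
resp = s3.get_object(Bucket="my-archive-bucket", Key="projects/video-0001.mxf")
data = resp["Body"].read()
print(resp["Metadata"])
```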

With DS3, Spectra Logic has added essentially four new commands to the S3 protocol, which are:

  • Bulk Put – this provides a list of objects that one wants to “PUT” into a DS3 storage service; the response from the DS3 storage service is an ordered list of which objects to PUT in sequence and which DS3 storage server node (essentially an IP address) to send the data to.
  • Bulk Get – this supplies a list of objects that one wants to GET from a DS3 storage service; the response is an ordered list of the sequence in which to get those objects and the node address to use for those object gets.
  • Export Bucket – this identifies a BUCKET that you wish to remove from a DS3 storage service. Presumably the response would be where the bucket can be found, the number of pieces of media to expect, and some identification of the media serial numbers that constitute the bucket on the DS3 storage service.
  • Import Bucket – this identifies a new bucket which will be imported into a DS3 storage service and supplies some necessary information, such as how many pieces of media to expect and the serial numbers of the media. Presumably the response will be a location which can be used to import the media.

With these four simple commands and an appropriate DS3 client, DS3 server and DS3 storage backend, one now has everything needed to support a removable media object store. I could see real value for export/import like this on the “rare occasion” when a cloud service provider goes out of business.
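
To make the flow concrete, here's a hypothetical sketch of what a DS3 bulk put client might look like, based purely on the command descriptions above; the function names and response fields are my invention, not Spectra Logic's actual SDK:

```python
# Hypothetical DS3 bulk-put flow, sketched from the command descriptions above.
# ds3_bulk_put() and put_object() stand in for whatever the real client library provides.

def bulk_put(ds3_service, bucket, object_names):
    # 1. Tell the DS3 service everything we intend to PUT.
    plan = ds3_service.ds3_bulk_put(bucket=bucket, objects=object_names)

    # 2. The service replies with an ordered plan: which objects to send,
    #    in what sequence, and to which DS3 server node (an IP address).
    for chunk in plan.ordered_chunks:
        node = chunk.node_address
        for obj in chunk.objects:          # send objects in the prescribed order
            with open(obj, "rb") as f:
                ds3_service.put_object(node=node, bucket=bucket, key=obj, body=f)
```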

The DS3 interface will be publicly available and the intent is to supply both Spectra Logic-developed clients as well as ISV/partner-developed DS3 clients, so as to provide removable media object stores for all sorts of other applications.

Spectra is providing developer tools and documentation so that anyone can write a DS3 client. To that end, the DS3 developer portal is up (couldn’t find a link this AM but will update this post when I find it) and available free of charge to anyone today (I believe you need to register to gain access to the documentation). They have a DS3 server simulator that DS3 client developers can use to test out and validate their client software. They also have a try & buy service for client developers.

Essentially, the combination of DS3 clients, DS3 servers and DS3 backend storage create a really deep archive for object data. It’s not intended for primary or secondary storage access but it’s big, cheap, and power/space efficient storage that can be very effective if used for archive data.

BlackPearl, the first DS3 Server

Their second announcement is the first implementation of a DS3 server, which Spectra Logic calls BlackPearl™. The BlackPearl connects to one or more Spectra Logic tape libraries as a backend store, which together essentially provide a DS3 object storage archive. The DS3 server talks to DS3 clients on the front end. BlackPearl uses SAS or FC connected tape transports, which can be any transport currently supported by Spectra Logic tape libraries, including IBM TS1140, LTO-4, -5 and -6.

In addition to BlackPearl, Spectra Logic is releasing the first DS3 client for Hadoop. In this case, the DS3 client implements a new version of the Hadoop DistCp (distributed copy) command which can be used to create a copy of an HDFS directory tree onto a DS3 storage service.

Current BlackPearl hardware is a standard 2U server with four 400GB SSDs inside, which act as a sort of speed-matching buffer between the object interface and the SAS/FC tape interface.

We only saw a configuration with one BlackPearl in operation (GA of BlackPearl is expected this December). But the plan is to support multiple BlackPearl appliances talking to the same DS3 backend storage. In that case, there will be a shared database and (tape) resource scheduler across all the appliances in the cluster.

Yes, but what about the market?

It’s a gutsy move for someone like Spectra Logic to define a new open interface to deep storage. The fact that the appliance exists outside the tape library itself and could potentially support any removable media offers interesting architectural capabilities. The current (beta) implementation lacked some sophistication but the expectation is that much of this will be resolved by GA or over time through incremental enhancements.

Pricing is appealing. For BlackPearl appliance(s) plus a Spectra Logic T950 tape library using LTO drives, supporting an uncompressed data store of ~2.4PB of archive data, the purchase price is ~$0.10/GB. This compares especially well with current Amazon Glacier pricing of $0.01/GB/month, so for the price of 10 months of Glacier storage you could own your own DS3 storage service.

At larger capacities it gets even cheaper: BlackPearl with a T950 using TS1140 tape drives, supporting 6.4PB, comes in at ~$0.09/GB. Other configurations are available and, in general, bigger configurations are cheaper on a $/GB basis and smaller ones more expensive. The configurations are specced by Spectra Logic to include all the media, tape drives and BlackPearl systems needed to support an archive object store.
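
A quick check on the Glacier comparison, using the list prices quoted above (my arithmetic):

```python
# Rough cost comparison for the ~2.4 PB LTO configuration (my arithmetic).
capacity_gb = 2.4e6                    # 2.4 PB expressed in GB
ds3_purchase = capacity_gb * 0.10      # ~$0.10/GB purchase price
glacier_monthly = capacity_gb * 0.01   # Glacier at $0.01/GB/month

print(f"DS3 purchase price: ${ds3_purchase:,.0f}")      # ~$240,000
print(f"Glacier per month:  ${glacier_monthly:,.0f}")   # ~$24,000
print(f"Breakeven: ~{ds3_purchase / glacier_monthly:.0f} months of Glacier")
```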

As for markets, Spectra Logic already has beta interest from a large well known web services customer and a number of media & entertainment customers.

In the long run, Spectra Logic believes that if they can simplify access to tape for the applications it’s well qualified to support (deep archive), this will enable new applications to take advantage of tape that weren’t even dreamed of before.  By opening up an object store interface to tape, anyone currently using S3 becomes a potential customer.

Amazon announced earlier this year that they have over 2 trillion objects in their S3 service. And as far as I can tell (see my post Who’s the next winner in storage?) they are growing with no end in sight.

~~~~

Comments?

 

Oracle (finally) releases StorageTek VSM6

[Full disclosure: I helped develop the underlying hardware for VSM 1-3 and also way back, worked on HSC for StorageTek libraries.]

Virtual Storage Manager System 6 (VSM6) is here. I am not exactly sure when VSM5 or VSM5E were released, but it seems like an awfully long time in Internet years.  The new VSM6 migrates the platform to Solaris software and hardware while expanding capacity and improving performance.

What’s VSM?

Oracle StorageTek VSM is a virtual tape system for mainframe, System z environments.  It provides a multi-tiered storage system which includes both physical disk and (optional) tape storage for long-term big data requirements of z/OS applications.

VSM6 emulates up to 256 virtual IBM tape transports but actually moves data to and from VSM Virtual Tape Storage Subsystem (VTSS) disk storage and backend real tape transports housed in automated tape libraries.  As VSM data ages, it can be migrated out to physical tape such as a StorageTek SL8500 Modular [Tape] Library system that is attached behind the VSM6 VTSS or system controller.

VSM6 offers a number of replication solutions for DR to keep data in multiple sites in synch and to copy data to offsite locations.  In addition, real tape channel extension can be used to extend the VSM storage to span onsite and offsite repositories.

One can cluster together up to 256 VSM VTSSs  into a tapeplex which is then managed under one pane of glass as a single large data repository using HSC software.

What’s new with VSM6?

The new VSM6 hardware increases volatile cache to 128GB from 32GB (in VSM5).  Non-volatile cache goes up as well, now supporting up to ~440MB, up from 256MB in the previous version.  Power, cooling and weight all seem to have also gone up (the wrong direction??) vis a vis VSM5.

The new VSM6 removes the ESCON option of previous generations and moves to 8 FICON and 8 GbE Virtual Library Extension (VLE) links. FICON channels are used for both host access (frontend) and real tape drive access (backend).  VLE was introduced in VSM5 and offers a ZFS based commodity disk tier behind the VSM VTSS for storing data that requires longer residency on disk.  Also, VSM supports a tapeless or disk-only solution for high performance requirements.

System capacity moves from 90TB (gosh, that was a while ago) to up to 1.2PB of data.  I believe much of this comes from supporting the new T10000C tape cartridge and drive (5TB uncompressed).  With the ability of VSM to cluster more VSM systems into the tapeplex, system capacity can now reach over 300PB.
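
The 300PB figure appears to follow from the 256-VTSS clustering limit mentioned earlier (a minimal sketch of my arithmetic, assuming a fully populated tapeplex):

```python
# Where the >300 PB tapeplex figure seems to come from (my arithmetic).
vtss_capacity_pb = 1.2        # max capacity of a single VSM6 VTSS
max_vtss_per_tapeplex = 256   # cluster limit noted above

print(f"Tapeplex capacity: ~{vtss_capacity_pb * max_vtss_per_tapeplex:.0f} PB")  # ~307 PB
```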

Somewhere along the way VSM started supporting triple redundancy  for the VTSS disk storage which provides better availability than RAID6.  Not sure why they thought this was important but it does deal with increasing disk failures.

Oracle stated that VSM6 supports up to 1.5GB/sec of throughput. Presumably this is landing data on disk or transferring the data to backend tape, but not both.  There doesn’t appear to be any standard benchmarking for these sorts of systems, so I will take their word for it.

Why would anyone want one?

Well it turns out plenty of mainframe systems use tape for a number of things such as data backup, HSM, and big data batch applications.  Once you get past the sunk  costs for tape transports, automation, cartridges and VSMs, VSM storage can be a pretty competitive data storage solution for the mainframe environment.

The fact that most mainframe environments grew up with tape and have long ago invested in transports, automation and new cartridges probably makes VSM6 an even better buy.  But tape is also making a comeback in open systems with LTO-5 and now LTO-6 coming out and with Oracle’s 5TB T10000C cartridge and IBM’s 4TB 3592 JC cartridge.

Not to mention Linear Tape File System (LTFS) as a new tape format that provides a file system for tape data which has brought renewed interest in all sorts of tape storage applications.

Competition not standing still

EMC introduced their Disk Library for Mainframe 6000 (DLm6000) product that supports two different backends to deal with the diversity of tape use in the mainframe environment.  Moreover, IBM has continuously enhanced their Virtual Tape Server, the TS7700, but I would have to say it doesn’t come close to these capacities.

Lately, when I talk with long-time StorageTek mainframe tape customers they have all said the same thing: when is VSM6 coming out, and when will Oracle get their act in gear and start supporting us again?  Hopefully this signals a new emphasis on this market.  Although who is losing and who is winning in the mainframe tape market is the subject of much debate, there is no doubt that the lack of any update to VSM has hurt Oracle’s StorageTek tape business.

Something tells me that Oracle may have fixed this problem.  We hope that we start to see some more timely VSM enhancements in the future, for their sake and especially for their customers.

~~~~

Comments?

~~~~

Image credit: Interior of StorageTek tape library at NERSC (2) by Derrick Coetzee

 

Million year optical disk

Read an article the other day about scientists creating an optical disk that would be readable in a million years or so. The article in Science Mag, titled A million-year hard disk, was intended to warn people about potential dangers in the far future that are being created today.

A while back I wrote about a 1000 year archive which was predominantly about disappearing formats. At the time, I believed given the growth in data density that information could easily be copied and saved over time but the formats for that data would be long gone by the time someone tried to read it.

The million year optical disk eliminates the format problem by using pixelated images etched on media. Which works just dandy if you happen to have a microscope handy.

Why would you need a million year disk?

The problem is: how do you warn people in the far future not to mess with the radioactive waste deposits buried below? If the waste is radioactive for a million years, you need something around to tell people to keep away from it.

Stone markers last for a few thousand years at best but get overgrown and wear down in time. For instance, my grandmother’s tombstone in Northern Italy has already been worn down so much that it’s almost unreadable. And that’s not even 80 yrs old yet.

But a sapphire hard disk that could easily be read with any serviceable microscope might do the job.

How to create a million year disk

This new disk is similar to the old StorageTek 100K year optical tape. Both would depend on microscopic impressions, something like bits physically marked on media.

For the optical disk the bits are created by etching a sapphire platter with platinum. Apparently the prototype costs €25K but they’re hoping the prices go down with production.

There are actually two 20cm (7.9in) wide disks that are molecularly fused together, and each disk can store 40K miniaturized pages that can hold text or images. They are doing accelerated life testing on the sapphire disks by bathing them in acid to ensure a 10M year life for the media and message.

Presumably the images are grey tone (or in this case platinum tone). If I assume 100Kbytes per page that’s about 4GB, something around a single layer DVD disk in a much larger form factor.
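
The arithmetic behind that estimate (the 100Kbytes per page is my assumption, as noted):

```python
# Capacity estimate for one sapphire disk (assumes ~100 KB per miniaturized page).
pages_per_disk = 40_000
bytes_per_page = 100_000

capacity_gb = pages_per_disk * bytes_per_page / 1e9
print(f"~{capacity_gb:.0f} GB per disk")   # ~4 GB, roughly a single-layer DVD
```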

Why sapphire?

It appears that sapphire is available from industrial processes and it seems impervious to wear that harms other material. But that’s what they are trying to prove.

It's unclear why they decided to “molecularly” fuse two platters together. It seems to me this could easily be a weak link in the technology over the course of a dozen millennia or so. On the other hand, more storage is always a good thing.

~~~~

In the end, creating dangers today that last millions of years requires some serious thought about how to warn future generations.

Image: Clock of the Long Now by Arenamontanus

A “few exabytes-a-day” from SKA

VLA by C. G. P. Grey (cc) (from Flickr)

ArsTechnica reported today on the proposed Square Kilometer Array (SKA) radio telescope and its data requirements. IBM is collaborating with the Netherlands Institute for Radio Astronomy (ASTRON) on a project called DOME to help develop the SKA.

When completed in ~2024, the SKA will generate over an exabyte a day (10**18) of raw data.  I reported in a previous post how the world was generating an exabyte-a-day, but that was way back in 2009.

What is the SKA?

The new SKA telescope will be a configuration of “millions of radio telescopes” which, when combined, will create a telescope with a collecting area of one square kilometer, which is no small feat.  They hope that the telescope will be able to shed some light on galaxy evolution, cosmology and dark energy.  But it will go beyond that, to investigating “strong-field tests of gravity“, “origins and evolution of cosmic magnetism” and searching for life on other planets.

But the interesting part from a storage perspective is that the SKA will be generating a “few exabytes a day” of radio telescope data for every full day of operation.   Apparently the new radio telescopes will make use of a new, more sensitive detector able to generate data at up to 10GB/second.

How much data, really?

The team projects final storage needs at between 300 to 1500 PB per year. This compares to the LHC at CERN which consumes ~15PB of storage per year.

It would seem that the immediate data download would be the few exabytes, which would then be post- or inline-processed into something more manageable and storable.  Unless they have some hellaciously fast processing, I am hard pressed to believe this could all happen inline.  But then they would need at least another “few exabytes” of storage to buffer the data feed before processing.
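
To get a feel for the data reduction implied, here's a rough sketch using the figures above (taking "a few exabytes a day" to mean ~2EB/day, my assumption):

```python
# Rough SKA data-reduction arithmetic (using the figures quoted above).
raw_per_day_eb = 2.0                      # "a few exabytes a day"; assume ~2 EB/day
raw_per_year_eb = raw_per_day_eb * 365    # ~730 EB/year of raw data

stored_low_eb, stored_high_eb = 0.3, 1.5  # 300 to 1500 PB/year of final storage

print(f"Raw per year: ~{raw_per_year_eb:.0f} EB")
print(f"Reduction factor: {raw_per_year_eb / stored_high_eb:.0f}x "
      f"to {raw_per_year_eb / stored_low_eb:.0f}x")    # roughly 500x to 2400x
```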

I guess that’s why it’s still a research project.  Presumably, this also says that the telescope won’t be in full operation every day of the year, at least at first.

The IBM-ASTRON DOME collaboration project

The joint research project was named for the structure that covers a major telescope and for a famous Swiss mountain.  Focus areas for the IBM-ASTRON DOME project include:

  • Advanced high performance computing utilizing 3D chip stacks for better energy efficiency
  • Optical interconnects with nanophotonics for high-speed data transfer
  • Storage, both for high-performance access and for dense/energy-efficient data storage.

In this last focus area, IBM is considering the use of phase change memories (PCM) for high access performance and new generation tape for dense/efficient storage.  We have discussed PCM before in a previous post as an alternative to NAND based storage today (see Graphene Flash Memory).  But IBM has also been investigating MRAM based race track memory as a potential future storage technology.  I would guess the advantage of PCM over MRAM might be access speed.

As for tape, IBM has already demonstrated technologies in their labs for a 35TB tape. However, storing 1500 PB would take over 40K tapes per year, so they may need even higher capacities to support SKA tape data needs.
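
The tape count arithmetic, for what it's worth:

```python
# Tapes needed per year at the demonstrated 35 TB/cartridge capacity.
stored_per_year_tb = 1500 * 1000        # 1500 PB expressed in TB
tape_capacity_tb = 35

print(f"~{stored_per_year_tb / tape_capacity_tb:,.0f} tapes per year")   # ~43,000
```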

Of course new optical interconnects will be needed to move this much data around from telescope to data center and beyond.  It’s likely that the nanophotonics will play some part as an all optical network for transceivers, amplifiers, and other networking switching gear.

The 3D chip stacks have the advantage of decreasing chip IO, and denser packing of components will make efficient use of board space.  But how these help with energy efficiency is another question.  The team projects very high energy and cooling requirements for their exascale high performance computing complex.

If this is anything like CERN, datasets gathered onsite are initially processed then replicated for finer processing elsewhere (see my 15PB a year created by CERN post).  But moving PBs around, as SKA will require, is way beyond today’s Internet infrastructure.

~~~~

Big science like this gives a whole new meaning to BIGData. Glad I am in the storage business.  Now just what exactly is nanophotonics, MEMS-based photo-electronics?

Tape still alive, well and growing at Spectra Logic

T-Finity library at SpectraLogic's test facility (c) 2011 Silverton Consulting, All Rights Reserved

Today I met with Spectra Logic execs and some of their Media and Entertainment (M&E) customers, and toured their manufacturing, test labs and briefing center.  The tour was a blast and the customers Kyle Knack from National Geographic (Nat Geo) Global Media, Toni Perez from Medcom (Panama based entertainment company) and Lee Coleman from Entertainment Tonight (ET) all talked about their use of the T-950 Spectra Logic tape libraries in the media ingest, editing and production processes.

Mr. Coleman from ET spoke almost reverently about their T-950 and how it has enabled ET to access over 30 years of video interviews, movie segments and other media they can now use to put together clips on just about any entertainment subject imaginable.

He  talked specifically about the obit they did for Michael Jackson and how they were able to grab footage from an interview they did years ago and splice it together with more recent media to show a more complete story.  He also showed a piece on some early Eddie Murphy film footage and interviews they had done at the time which they used in a recent segment about his new movie.

All this was made possible by moving to digital file formats and placing digital media in their T-950 tape libraries.

Spectra Logic T-950 (I think) with TeraPack loaded in robot (c) 2011 Silverton Consulting, All Rights Reserved

Mr. Knack from Nat Geo Media said every bit of media they get now automatically goes into the library archive and becomes the “original copy” of the media, used in case other copies are corrupted or lost.  Nat Geo started out only putting important media in the library but found it cost so much less to store it in the tape archive that they decided it made more sense to move all media to the tape library.

Typically they keep two copies in their tape library, and important media is also copied to tape and shipped offsite (3 copies for this data).  They have a 4-frame T-950 with around 4000 slots and 14 drives (a combination of LTO-4 and -5).  They use FC and FCoE for their primary storage access and depend on 1000s of SATA drives for primary storage.

He said they only use SSDs for some metadata support for their web site. He found that SATA drives can handle their big-block sequential workloads and provide consistent throughput and, especially important to M&E companies, consistent latency.

3D printer at Spectra Logic (for mechanical parts fabrication) (c) 2011 Silverton Consulting, All Rights Reserved

Mr. Perez from Medcom had much the same story. They are in the process of moving off a proprietary video tape format (Sony Betacam) to LTO media and digital files. The process is still ongoing, although they are more than halfway there for current production.

They still have a lot of old media in Betacam format which will take them years to convert to digital files but they are at least starting this activity.  He said a recent move from one site to another revealed that much of the Betacam tapes were no longer readable.  Digital files on LTO tape should solve that problem for them when they finally get there.

Matt Starr, Spectra Logic CTO, talked about the history of tape libraries at Spectra Logic, which was founded in 1998 and has been laser focused on tape data protection and tape libraries.

I find it pleasantly surprising that a company today can just supply tape libraries with software and make an ongoing concern of it. Spectra Logic must be doing something right: revenue grew 30% YoY last year, they are outgrowing the current (88K sq ft) office, lab, and manufacturing building they just moved into earlier this year, and they have just signed to occupy another building providing 55K sq ft of additional space.

T-Series robot returning TeraPack to shelf (c) 2011 Silverton Consulting, All Rights Reserved

Molly Rector, Spectra Logic CMO, talked about the shift in the market from peta-scale (10**15 bytes) storage repositories to exa-scale (10**18 bytes) ones.  Ms. Rector believes that today’s cloud storage environments can take advantage of these large tape-based archives to provide much more economical storage for their users without suffering any performance penalty.

At lunch with Matt Starr, Fred Moore (Horison Information Strategies), Mark Peters (Enterprise Strategy Group) and I were talking about HPSS (High Performance Storage System), developed by IBM in conjunction with 5 US national labs, which supports vast amounts of data residing across primary disk and tape libraries.

Matt said that there are about a dozen large HPSS sites (the HPSS website shows at least 30 sites using it) that store a significant portion of the world’s 1ZB (10**21 bytes) of digital data created this past year (see my 3.3 exabytes of data a day!? post).  Later that day, talking with Nathan Thompson, Spectra Logic CEO, he said these large HPSS sites probably store ~10% of the world’s data, or 100EB.  I find it difficult to comprehend that much data at only ~12 sites, but the national labs do have lots of data on hand.

Nowadays you can get a Spectra Logic T-Finity tape complex with 122K slots, using LTO-4/-5 or IBM TS1140 (enterprise class) tape drives.  This large a T-Finity has 4 rows of tape libraries and uses the ‘Skyway’ to transport a TeraPack of tape cartridges from one library row to another.   All Spectra Logic libraries are built around a tape cartridge package they call the TeraPack, which contains 10 LTO cartridges or (I think) 9 TS1140 tape cartridges (they are bigger than LTO tapes).  The TeraPack is used to import or export tapes from the library, and all the tape slots in the library hold TeraPacks.

The software used to control all this is called BlueScale and is used in everything from their T50e, a small 50-slot library, all the way up to the 122K-slot T-Finity tape complex.  There are some changes for configuration, robotics and other personalization for each library type, but the UI looks exactly the same across any of their libraries. Moreover, BlueScale offers the same enterprise level of functionality (e.g., drive and media life management) services for all Spectra Logic tape libraries.

Day 1 of the Spectra Logic summit closed with the lab tour and dinner.  Day 2 will start discussing futures and will be under NDA, so there won’t be much to talk about right away. But from what I can see, Spectra Logic seems to be breaking down the barriers inhibiting tape use and providing tape library systems that people almost revere.

I haven’t seen that sort of reaction about a tape library since the STK 4400 first came out last century.

—-

Comments?

Tape vs. Disk, the saga continues

Inside a (Spectra Logic) T950 library by ChrisDag (cc) (from Flickr)

Was on a call late last month where Oracle introduced their latest generation T10000C tape system (media and drive) holding 5TB native (uncompressed) capacity. In the last 6 months I have been hearing about the coming of a 3TB SATA disk drive from Hitachi GST and others. And last month, EMC announced a new Data Domain Archiver, a disk-only archive appliance (see my post on EMC Data Domain products enter the archive market).

Oracle assures me that tape density is keeping up with, if not gaining on, disk density trends and capacity. But density and capacity are not the only issues causing data to move off of tape in today’s enterprise data centers.

“Dedupe Rulz”

A problem with the data density trends discussion is that it’s one dimensional (well, literally it’s two dimensional). With data compression, disk or tape systems can easily double the density on a piece of media. But with data deduplication, the multiples start becoming more like 5X to 30X, depending on the frequency of full backups or duplicated data. And numbers like those dwarf any discussion of density ratios and, as such, get everyone’s attention.
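
A simple illustration of why those multiples dominate the density discussion (the 5TB raw capacity is just the tape cartridge figure mentioned earlier):

```python
# Effective capacity of a 5 TB piece of media under compression vs. deduplication.
raw_tb = 5.0

compressed_tb  = raw_tb * 2     # ~2x from data compression
dedupe_low_tb  = raw_tb * 5     # 5x dedupe ratio
dedupe_high_tb = raw_tb * 30    # 30x dedupe ratio

print(f"Compression only: {compressed_tb:.0f} TB effective")
print(f"Deduplication:    {dedupe_low_tb:.0f} to {dedupe_high_tb:.0f} TB effective")
```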

I can remember talking to an avowed tape engineer years ago, and he described deduplication technology at the VTL level as being architecturally impure and inefficient. From his perspective it needed to be done much earlier in the data flow. But what they failed to see was the ability of VTL deduplication to be plug-compatible with the tape systems of that time. Such ease of adoption allowed deduplication systems to build a beachhead and economies of scale. From there such systems have now been able to move upstream, into earlier stages of the backup data flow.

Nowadays, what with Avamar, Symantec PureDisk and others, source-level deduplication, or near-source deduplication, is a reality. But all this came about because they were able to offer 30X the density on a piece of backup storage.

Tape’s next step

Tape could easily fight back. All that would be needed is some system in front of a tape library that provided deduplication capabilities not just to the disk media but the tape media as well. This way the 30X density over non-deduplicated storage could follow through all the way to the tape media.

In the past, this made little sense because a deduplicated tape would potentially require multiple volumes in order to restore a particular set of data. However, with today’s 5TB of data on a tape, maybe this doesn’t have to be the case anymore. In addition, by having a deduplication system in front of the tape library, it could support most of the immediate data restore activity, while data restored from tape would be more like pulling something out of an archive and, as such, might take longer to perform. In any event, with LTO’s multi-partitioning and the other enterprise class tapes having multiple domains, creating a structure with a metadata partition and a data partition is easier than ever.

“Got Dedupe”

There are plenty of places, that today’s tape vendors can obtain deduplication capabilities. Permabit offers Dedupe code for OEM applications for those that have no dedupe systems today. FalconStor, Sepaton and others offer deduplication systems that can be OEMed. IBM, HP, and Quantum already have tape libraries and their own dedupe systems available today all of which can readily support a deduplicating front-end to their tape libraries, if they don’t already.

Where “Tape Rulz”

There are places where data deduplication doesn’t work very well today, mainly rich media, physics, biopharma and other non-compressible big-data applications. For these situations, tape still has a home, but for the rest of the data center world today, deduplication is taking over, if it hasn’t already. The sooner tape gets on the deduplication bandwagon, the better for the IT industry.

—-

Of course there are other problems hurting tape today. I know of at least one large conglomerate that has moved all backup off tape altogether, even data which doesn’t deduplicate well (see my previous Oracle RMAN posts). And at least another rich media conglomerate that is considering the very same move. For now, tape has a safe harbor in big science, but it won’t last long.

Comments?

SOHO backup options

© 2010 RDX Storage Alliance. All Rights Reserved. (From their website)

I must admit, even though I have disparaged DVD archive life (see CDs and DVDs longevity questioned), I still back up my work desktops/family computers to DVD and DVDdl disks.  It’s cheap (on sale, 100 DVDs cost about $30 and DVDdl ~2.5X that much) and it’s convenient (no need for additional software, outside storage fees, or additional drives).  For offsite backups I take the monthly backups and store them in a safety deposit box.

But my partner (and wife) said “Your time is worth something, every time you have to swap DVDs you could be doing something else.” (… like helping around the house.)

She followed up by saying, “Couldn’t you use something that was ‘start it and forget it’ til it was done?”

Well, this got me thinking (as did having multiple media errors in my latest DVDdl full backup): there’s got to be a better way.

The options for SOHO (small office/home office) offsite backups look to be as follows (from sexiest to least sexy):

  • Cloud storage for backup – Mozy, Norton Backup, Gladinet, Nasuni, and no doubt many others can provide secure, cloud-based backup of desktop and laptop data for Mac and Windows systems.  Some of these would require a separate VM or server to connect to the cloud while others would not.  Using the cloud might require the office systems to be left on at night, but that would be a small price to pay to back up your data offsite.   Benefits to cloud storage approaches are that it would get the backups offsite, could be automatically scheduled/scripted to take place off-hours and would require no (or minimal) user intervention to perform.  Disadvantages to this approach are that the office systems would need to be left powered on, backup data is out of your control and bandwidth and storage fees would need to be paid.
  • RDX devices – these are removable, NFS-accessed disk storage which can support from 40GB to 640GB per cartridge. The devices claim a 30yr archive life, which should be fine for SOHO purposes.  Cost of cartridges is probably RDX’s greatest issue BUT, unlike DVDs, you can reuse RDX media if you want to.   Benefits are that RDX would require minimal operator intervention for anything less than 640GB of backup data, backups would be faster (45MB/s), and the data would be under your control.  Disadvantages are the cost of the media (640GB Imation RDX cartridge ~$310) and drives (?), data would not be encrypted unless encrypted at the host, and you would need to move the cartridge data offsite.
  • LTO tape – To my knowledge there is only one vendor out there that makes an iSCSI LTO tape and that is my friends at Spectra Logic but they also make a SAS (6Gb/s) attached LTO-5 tape drive.  It’s unclear which level of LTO technology is supported with the iSCSI drive but even one or two generations down would work for many SOHO shops.  Benefits of LTO tape are minimal operator intervention, long archive life, enterprise class backup technology, faster backups and drive data encryption.  Disadvantages are the cost of the media ($27-$30 for LTO-4 cartridges), drive costs(?), interface costs (if any) and the need to move the cartridges offsite.  I like the iSCSI drive because all one would need is a iSCSI initiator software which can be had easily enough for most desktop systems.
  • DAT tape – I thought these were dead but my good friend John Obeto informed me they are alive and well.  DAT drives support USB 2.0, SAS or parallel SCSI interfaces. Although it’s unclear whether they have drivers for Mac OS/X, Windows shops could probably use them without problem. Benefits are similar to LTO tape above but not as fast and not as long an archive life.  Disadvantages are cartridge cost (320GB DAT cartridge ~$37), drive costs (?) and one would have to move the media offsite.
  • (Blu-ray, Blu-ray dl), DVD, or DVDdl – These are ok but their archive life is miserable (under 2yrs for DVDs at best, see post link above). Benefits are that they’re very cheap to use, the lowest cost removable media (100GB of data would take ~22 DVDs or 12 DVDdls, which at $0.30/DVD or $0.75/DVDdl is ~$6.60 to $9 per backup; see the cost and throughput sketch after this list), and the lowest cost drive (comes optional on most desktops today). Disadvantages are high operator intervention (to swap out disks), more complexity to keep track of each DVD’s portion of the backup, more complex media storage (you have a lot more of it), it takes forever (burning 7.7GB to a DVDdl takes around an hour, or ~2.1MB/sec), data encryption would need to be done at the host, and one has to take the media offsite.  I don’t have similar performance data for using Blu-ray for backups, other than Blu-ray dl media costs about $11.50 each (50GB).
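
Here's the cost and throughput arithmetic behind the DVD numbers above (assuming 4.7GB per DVD and 8.5GB per DVDdl, my figures):

```python
import math

# DVD backup cost and throughput arithmetic for a 100 GB backup set.
backup_gb = 100
dvd_gb, dvddl_gb = 4.7, 8.5          # single-layer and dual-layer capacities
dvd_cost, dvddl_cost = 0.30, 0.75    # per-disc media cost

dvds = math.ceil(backup_gb / dvd_gb)       # ~22 discs
dvddls = math.ceil(backup_gb / dvddl_gb)   # ~12 discs
print(f"{dvds} DVDs (~${dvds * dvd_cost:.2f}) or {dvddls} DVDdls (~${dvddls * dvddl_cost:.2f})")

# Burn speed: 7.7 GB to a DVDdl in about an hour.
print(f"~{7.7e3 / 3600:.1f} MB/sec burn rate")
```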

Please note this post only discusses Offsite backups. Many SOHOs do not provide offsite backup (risky??) and for online backups I use a spare disk drive attached to every office and family desktop.

Probably other alternatives exist for offsite backups, not the least of which is NAS data replication.  I didn’t list this as most SOHO customers are unlikely to have a secondary location where they could host the replicated data copy and the cost of a 2nd NAS box would need to be added along with the bandwidth between the primary and secondary site.  BUT for those sophisticated SOHO customers out there already using a NAS box for onsite shared storage maybe data replication might make sense. Deduplication backup appliances are another possibility but suffer similar disadvantages to NAS box replication and are even less likely to be already used by SOHO customers.

—-

Ok, where to now?  Given all this I am hoping to get a Blu-ray dl writer in my next iMac.  Let’s see, that would cut my DVDdl swaps down by ~3.2X for single-layer Blu-ray and ~6.5X for dl Blu-ray.  I could easily live with that until I quadrupled my data storage, again.

Although an iSCSI LTO-5 tape transport would make a real nice addition to the office…

Comments?