Protecting the Yottabyte archive

blinkenlights by habi (cc) (from flickr)

In a previous post I discussed what it would take to store 1YB of data in 2015 for the National Security Agency (NSA). Due to length, that post did not discuss many other aspects of the 1YB archive, such as ingest, index, and data protection. I will attempt to cover each of these in turn; this post covers some of the data protection aspects of the 1YB archive and its catalog/index.

RAID protecting 1YB of data

Protecting the 1YB archive will require some sort of parity protection. RAID data protection could certainly be used, and may need to be extended to removable media (RAID for tape), but that would require somewhere in the neighborhood of 10-20% additional storage (RAID 5 across a 10- to 5-wide tape drive stripe). With Reed-Solomon encoding and RAID 6, we could take this down to 5-10% additional storage (RAID 6 across a 40- to 20-wide tape drive stripe). Possibly other forms of ECC (such as turbo codes) might be usable in a RAID-like configuration, which would give even better reliability with less additional storage.
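To make those overhead figures concrete, here is a minimal sketch (Python, using the stripe widths mentioned above; the overhead is simply parity drives divided by data drives in a stripe):

```python
def parity_overhead(data_drives: int, parity_drives: int) -> float:
    """Fraction of additional storage consumed by parity in one stripe."""
    return parity_drives / data_drives

# RAID 5: one parity drive per stripe
print(f"RAID 5, 5 wide:  {parity_overhead(5, 1):.0%}")    # 20%
print(f"RAID 5, 10 wide: {parity_overhead(10, 1):.0%}")   # 10%

# RAID 6 (Reed-Solomon): two parity drives per stripe
print(f"RAID 6, 20 wide: {parity_overhead(20, 2):.0%}")   # 10%
print(f"RAID 6, 40 wide: {parity_overhead(40, 2):.0%}")   # 5%
```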

But RAID-like protection also applies to the data catalog and indexes required to access the 1YB archive of data. Ditto for the online data itself while it's being ingested, indexed, or read back. For the remainder of this post I ignore the RAID overhead, but suffice it to say that an additional ~10% of storage for parity would not change this discussion much.

Also, in the original post I envisioned a multi-tier storage hierarchy where the lowest tier always held a copy of any files residing in the upper tiers. This provides some RAID 1-like redundancy for any online data, which could be pretty useful: if a file is of high interest, it has likely been accessed recently and therefore also resides in the upper storage tiers, so multiple copies of interesting files could exist.

Catalog and index backups for the 1YB archive

IMHO, RAID or other parity protection is different than data backup. Data backup is generally used as a last line of defense for hardware failure, software failure or user error (deleting the wrong data). It’s certainly possible that the lowest tier data is stored on some sort of WORM (write once read many times) media meaning it cannot be overwritten, eliminating one class of user error.

But this presumes the catalog is available and the media is locatable, which means the catalog has to be preserved/protected from user error and HW and SW failures. I wrote about whether cloud storage needs backup in a prior post and feel strongly that the 1YB archive would require backups as well.

In general, backup today is done by copying the data to some other storage and keeping that storage offsite from the original data center. At this scale, most likely the 2.1×10**21 bytes of catalog and index data (see original post) would be copied to some form of removable media. The catalog is most important, as the other two indexes could potentially be rebuilt from the catalog and original data. Assuming we are unwilling to reindex the data, with LTO-6 tape cartridges the catalog and index backups would take 1.3×10**9 LTO-6 cartridges (at 1.6×10**12 bytes/cartridge).

To back up this amount of data once per month would take a gaggle of tape drives. There are ~2.6×10**6 seconds/month and each LTO-6 drive can transfer 5.4×10**8 bytes/sec, or 1.4×10**15 bytes/drive-month, but we need to back up 2.1×10**21 bytes of data, so we need ~1.5×10**6 tape transports. Now tapes do not operate 100% of the time, because when a cartridge becomes full it has to be swapped for an empty one, but this amounts to a rounding error at these numbers.

To figure out the tape robotics needed to service 1.5×10**6 transports, we could use the latest T-Finity tape library just announced by Spectra Logic. The T-Finity supports 500 tape drives and 122,000 tape cartridges, so we would need 3.0×10**3 libraries to handle the drive workload and about 1.1×10**4 libraries to store the cartridge set required; 11,000 T-Finity libraries would suffice. Presumably, using LTO-7, these numbers could be cut in half: ~5,500 libraries, ~7.5×10**5 transports, and 6.6×10**8 cartridges.
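A quick back-of-the-envelope sketch of the tape math above (Python; the LTO-6 capacity and transfer rate and the T-Finity drive/slot counts are the figures quoted in this post):

```python
CATALOG_BYTES  = 2.1e21    # catalog + index data to back up (see original post)
SECONDS_PER_MO = 2.6e6     # roughly one month
LTO6_CAPACITY  = 1.6e12    # bytes per LTO-6 cartridge
LTO6_RATE      = 5.4e8     # bytes/sec per LTO-6 drive
TFINITY_DRIVES = 500       # tape drives per Spectra Logic T-Finity
TFINITY_SLOTS  = 122_000   # cartridges per T-Finity

cartridges      = CATALOG_BYTES / LTO6_CAPACITY                  # ~1.3e9 cartridges
transports      = CATALOG_BYTES / (LTO6_RATE * SECONDS_PER_MO)   # ~1.5e6 drives
libs_for_drives = transports / TFINITY_DRIVES                    # ~3.0e3 libraries
libs_for_slots  = cartridges / TFINITY_SLOTS                     # ~1.1e4 libraries

print(f"{cartridges:.1e} cartridges, {transports:.1e} transports")
print(f"{libs_for_drives:.1e} libraries for drives, {libs_for_slots:.1e} for cartridges")
```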

Other removable media exist, most notably the Prostor RDX. However, RDX roadmap information beyond the next generation is not readily available, and high-end robotics do not currently support RDX. So for the moment tape seems the only viable removable backup medium for the catalog and index of the 1YB archive.

Mirroring the data

Another approach to protecting the data is to mirror the catalog and index data. This involves copying the data to another online storage repository, doubling the storage required (to 4.2×10**21 bytes). Replication doesn't easily protect against user error, but it is an option worthy of consideration.

Networking infrastructure needed

Whether mirroring or backing up to tape, moving this amount of data will require substantial networking infrastructure. Assume that in 2015 we have 32GFC (32Gb/sec Fibre Channel) interfaces; each interface could potentially transfer 3.2GB/s, or 3.2×10**9 bytes/sec. Mirroring or backing up 2.1×10**21 bytes over one month will then take ~2.5×10**5 32GFC interfaces. We should probably have twice this much networking just so no single link becomes a bottleneck, so ~5×10**5 32GFC interfaces should work.

As for switches, the current Brocade DCX supports 768 8GFC ports, and presumably similar port counts will be available in 2015 to support 32GFC. If we assume at least 2 switch ports per link, we will need ~1×10**6 ports, or ~1,300 fully populated DCX switches. This doesn't account for multi-layer switches and other sophisticated switch topologies, but those could be accommodated with another factor of 2, or ~2,600 switches.
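The interface and switch counts above follow directly from the stated assumptions; here is the arithmetic as a small sketch (Python, all inputs are the figures quoted in this post):

```python
DATA_BYTES     = 2.1e21   # catalog + index data moved per month
SECONDS_PER_MO = 2.6e6
GFC32_RATE     = 3.2e9    # bytes/sec per 32GFC interface
DCX_PORTS      = 768      # ports per fully populated Brocade DCX

interfaces    = DATA_BYTES / (GFC32_RATE * SECONDS_PER_MO)   # ~2.5e5 interfaces
with_headroom = 2 * interfaces                               # ~5e5, to avoid bottlenecks

ports    = 2 * with_headroom        # two switch ports per link
switches = ports / DCX_PORTS        # ~1,300 fully populated DCX switches
print(f"{interfaces:.1e} interfaces, {with_headroom:.1e} with headroom, {switches:.0f} switches")
```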

Hot backups require journals

This all assumes we can do catalog and index backups once per month and take the whole month to do them. Storage today normally has to be quiesced (via snapshot or some other mechanism) to be backed up in a consistent state. While it's not impossible to back up data that is concurrently being updated, it is more difficult: one needs to maintain a journal of the updates going on while the data is being backed up and then be able to apply the journaled changes to the backup.

For the moment I am not going to determine the storage requirements for the journal needed to cover a month of catalog transactions, since this depends on the change rate of the catalog data. It will necessarily be a function of the index or ingest rate of the 1YB archive, to be covered in a future post.

Stay tuned, I am just having too much fun to stop.

Repositioning of tape

HP LTO 4 Tape Media
In my past life, I worked for a dominant tape vendor. Over the years, we had heard a number of times that tape was dead. But it never happened. BTW, it’s also not happening today.

Just a couple of weeks ago, I was at SNW and a vendor friend of mine asked if I knew anyone with tape library expertise, because they were bidding on more and more tape archive opportunities. Tape seems alive and kicking from what I can see.

However, the fact is that tape use is being repositioned. Tape is no longer the direct target for backups that it once was. Most backup packages nowadays back up to disk and then later, if at all, migrate this data to tape (D2D2T). Tape is being relegated to a third tier of storage, a long-term archive and/or a long-term backup repository.

The economics of tape are not hard to understand. You pay for robotics, media, and drives. Tape, just like any removable media, requires no additional power once it's removed from the transport/drive used to write it. Removable media can be transported to an offsite repository or across the continent. There it can await recall with nary an ounce (or volt) of power consumed.

Problems with tape

So what's wrong with tape? Why aren't more shops using it? Let me count the problems:

  1. Tape, without robotics, requires manual intervention
  2. Tape, because of its transportability, can be lost or stolen, leading to data security breaches
  3. Tape processing, in general, is more error prone than disk. Tape can have media and drive errors which cause data transfer operations to fail
  4. Tape is accessed sequentially; it cannot be randomly accessed (quickly), and only one stream of data can be accepted per drive
  5. Much of a tape volume is wasted, never written space
  6. Tape technology doesn’t stay around forever, eventually causing data obsolescence
  7. Tape media doesn’t last forever, causing media loss and potentially data loss

I've likely missed some other issues with tape, but these seem the major ones from my perspective.

It’s no surprise that most of these problems are addressed or mitigated in one form or another by the major tape vendors, software suppliers and others interested in continuing tape technology.

Robotics can address the manual intervention, if you can afford it. Tape encryption deals effectively with stolen tapes, but requires key management somewhere. Many applications exist today to help predict when media will go bad or transports need servicing. Tape data is, and always will be, accessed sequentially, but then so is lots of other data in today's IT shops. Tape transports are most definitely single threaded, but sophisticated applications can intersperse multiple streams of data onto a single tape. Tape volume stacking is old technology, not necessarily easy to deploy outside of some sort of VTL front-end, but it is available. Drive and media technology obsolescence will never go away, but this indicates a healthy tape marketplace.

Future of tape

Say what you will about Ultrium, the Linear Tape-Open (LTO) technology from research partners HP, IBM, and Quantum, but it has solidified/consolidated mid-range tape technology. Is it as advanced as it could be, or pushing to open new markets? Probably not. But they are advancing tape technology, providing higher capacity, higher performance, and more functionality over recent generations. And they have not stopped: Ultrium's roadmap shows LTO-6 right after LTO-5, and delivery of LTO-5, at 1.6TB uncompressed capacity per tape, is right around the corner.

Also IBM and Sun continue to advance their own proprietary tape technology. Yes, some groups have moved away from their own tape formats but that’s alright and reflects the repositioning that’s happening in the tape marketplace.

As for the future, I was at an IEEE magnetics meeting a couple of years back and the leader said that tape technology was always a decade behind disk technology. So the disk recording heads/media in use today will likely see some application to tape technology in about 10 years. As such, as long as disk technology advances, tape will come out with similar capabilities sometime later.

Still, it's somewhat surprising that tape is able to provide so much volumetric density with decade-old disk technology, but that's the way tape works. Packing a ribbon of media around a hub can provide a lot more volumetric storage density than a platter of media using similar recording technology.

In the end, tape has a future to exploit if vendors continue to push its technology. As a long term archive storage, it’s hard to beat its economics. As a backup target it may be less viable. Nonetheless, it still has a significant install base which turns over very slowly, given the sunk costs in media, drives and robotics.

Full disclosure: I have no active contracts with LTO or any of the other tape groups mentioned in this post.

Today's data and the 1000 year archive

Untitled (picture of a keypunch machine) by Marcin Wichary (cc) (from flickr)

Somewhere in my basement I have card boxes dating back to the 1970s and paper tape canisters dating back to the 1960s, with BASIC, 360 assembly, COBOL, and PL/1 programs on them. These could be reconstructed if needed by reading the Hollerith encoding and typing them out into text files. Finding a compiler/assembler/interpreter to execute them is another matter, but just knowing the logic may suffice to translate them into a readily compilable language of today. Hollerith is a data card format which is well known and well described. But what of the data being created today? How will we be able to read it in 50 years, let alone 500? That is the problem.

Vista de la Biblioteca Vasconcelos by Eneas (cc) (from flickr)

Civilization needs to come up with some way to keep information around for 1000 years or more. There are books still relevant today (besides the Bible, Koran, and other sacred texts) that would have altered the world as we know it had they been unreadable 900 years ago. No doubt some data or information of this kind, being created today, will survive to posterity by virtue of its recognized importance to the world. But there are a few problems with this viewpoint:

  • Not all documents/books/information are recognized as important during their lifetime of readability
  • Some important information is actively suppressed and may never be published during a regime’s lifetime
  • Even seemingly “unimportant information” may have significance to future generations

From my perspective, knowing what’s important to the future needs to be left to future generations to decide.

Formats are the problem

Consider my blog posts: WordPress stores blog posts as MySQL database entries. Imagine deciphering MySQL database entries 500 or 1000 years in the future and the problem becomes obvious. Of course, WordPress is open source, so this information could conceivably be interpretable by reading its source code.

I have written before about the forms that such long-lived files can take, but for now consider that some form of digital representation of a file (magnetic, optical, paper, etc.) can be constructed that lasts a millennium. Some data forms are easier to read than others (e.g., paper), but even paper can be encoded with bar codes that would be difficult to decipher without a key to their format.

The real problem becomes file or artifact formats. Who or what in 1000 years will be able to render a JPEG file, display an old MS Word file from 1995, or read a WordPerfect file from 1985? Okay, JPEG is probably a bad example as it's a standard format, but older Word and WordPerfect file formats constitute a lot of information today. Although there may be programs available to read them today, the likelihood that they will continue to do so in 50, let alone 500 years, is pretty slim.

The problem is that as applications evolve from one version to another, formats change, and developers have a negative incentive to publicize these new file formats. Few developers today want to supply competitors with an easy way to convert files to a competing format. Hence, as developers or applications go out of business, formats cease to be readable or convertible into anything that could be deciphered 50 years hence.

Solutions to disappearing formats

What's missing, in my view, is a file format repository. Such a repository could be maintained by an adjunct of national patent and trademark offices (nPTOs). Just like today's patents, file formats, once published, could be available for all to see, in multiple databases or printouts. Corporations or other entities that create applications with new file formats would be required to register each new file format with the local nPTO. Such a format description would be kept confidential as long as that application or its descendants continued to support that format, or until the copyright time frame expired, whichever came first.

The form that a file format description could take could be the subject of standards activities, but in the meantime, anything that explains the various fields, records, and logical organization of a format, in a text file, would be a step in the right direction.

This brings up another viable solution to this problem – self-defining file formats. Applications that use native XML as their file format essentially create a self-defining file format, one that could potentially be understood by any XML parser. And XML formats, as a defined standard, are widely enough documented that they could conceivably still be available to archivists of the year 3000. So I applaud Microsoft for using XML for their latest generation of Office file formats. Others, please take up the cause.
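As a toy illustration of why self-describing formats help, here is a sketch (Python standard library only; the document structure below is invented for the example) showing that a generic XML parser recovers field names and values with no knowledge of the application that wrote the file:

```python
import xml.etree.ElementTree as ET

# A hypothetical application file saved as native XML -- the element and
# attribute names document the record structure themselves.
doc = """
<blogPost id="42" created="2009-10-09">
  <title>Today's data and the 1000 year archive</title>
  <author>blog author</author>
  <body>Formats are the problem...</body>
</blogPost>
"""

root = ET.fromstring(doc)
print(root.tag, root.attrib)          # blogPost {'id': '42', 'created': '2009-10-09'}
for field in root:
    print(f"  {field.tag}: {field.text.strip()}")
```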

If such repositories existed today, people in the year 3010 could still be reading my blog entries and wonder why I wrote them…

Cache appliances rise from the dead

XcelaSAN picture from DataRam.com website
Sometime back in the late 80s, a company I once worked with had a product called the tape accelerator, which was nothing more than a RAM cache in front of a tape device to smooth out physical tape access. The tape accelerator was a popular product for its time, until most tape subsystems started incorporating their own cache to do this.

At SNW in Phoenix this week, I saw a couple of vendors touting similar products with a new twist: they had both RAM and SSD cache and were doing this for disk only. DataRAM's XcelaSAN was one such product, although apparently there were at least two others on the floor that I didn't talk to.

XcelaSAN is targeted at midrange disk storage, where the storage subsystems have limited amounts of cache. The product is Fibre Channel attached and lists for US$65K per subsystem. Two appliances can be paired together for high availability. Each appliance has eight 4GFC ports, with 128GB of DRAM and 360GB of SSD cache.

I talked to them a little about their caching algorithms. They claim to have sequential detect, lookahead, and other sophisticated caching capabilities, but the proof is in the pudding. It would be great to put this in front of a storage subsystem with current SPC benchmarks and see how much it accelerates its SPC-1 or SPC-2 results, if at all.

From my view, this is yet another economic foot race. Most new midrange storage subsystems today ship with 8-16GB of DRAM cache and relatively primitive caching algorithms. DataRAM's appliance has considerably more cache, but at these prices it would need to be amortized over a number of midrange subsystems to be justified.
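To put a rough number on that amortization argument, here is a sketch; everything in it except the US$65K list price and the cache sizes above is an assumption for illustration:

```python
import math

# Assumptions for illustration only -- the $65K list price is the vendor's,
# the midrange array price and the 20% overhead cap are made up.
appliance_pair = 2 * 65_000    # an HA pair of XcelaSAN appliances
midrange_array = 50_000        # assumed price of one midrange FC array
overhead_cap   = 0.20          # keep the cache layer under 20% of storage spend

# How many midrange arrays the pair must front before its cost fits the cap
arrays_needed = math.ceil(appliance_pair / (overhead_cap * midrange_array))
print(f"Amortize the appliance pair across at least {arrays_needed} midrange arrays")
```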

Enterprise-class storage subsystems already have a lot of RAM cache, but most use SSDs as a storage tier rather than a cache tier (except for NetApp's PAM card). A few other open questions:

  • We didn't talk much about the reliability of their NAND cache or whether they were using SLC or MLC, but these days workloads are approaching 1:1 read:write ratios. IMHO, having some SSD in the system for heavy reads is good, but you need RAM for the heavy write workloads.
  • What happens when the power fails is another interesting question. Most subsystem caches have battery backup or non-volatile RAM sufficient to get data written to RAM out to some more permanent storage like disk. In these appliances, perhaps they just write it to SSD.
  • What happens when the storage subsystem power fails but the appliance stays up? Sooner or later the appliance has to go back to the storage to retrieve or write the data.

In my view, none of these issues are insurmountable, but they take clever code to get around. Knowing how clever their appliance developers are is hard to judge from the outside. Quality is often as much a factor of testing as it is of development (see my Price of Quality post to learn more on this).

Also, caching algorithms are most often tailored closely to the storage subsystem that surrounds them, but this isn't always necessary. Take IBM's SVC or HDS's USP-V, both of which can add a lot of cache in front of other storage subsystems. Those products also offer storage virtualization, however, which the caching appliances do not provide.

All in all, I feel this is a good direction to take but it’s somewhat time limited until the midrange storage subsystems start becoming more cache intensive/knowledgeable. At that time these products will once again fall into the background. But in the meantime they can have a viable market benefit for the right storage environment.

The future of libraries

Vista de la Biblioteca Vasconcelos by Eneas (cc) (from flickr)
My recent post on an exabyte-a-day generated a comment that got me thinking. What we need in the world today is a universal deduped archive. Such an archive would be a repository for all information generated by the world, nation, state, etc. and would automatically deduplicate the data and back it up.

Such an archive could be a new form of the current library, keeping data for future generations as well as for a nation's current population. Data held in the library repository would need to have:

  • Iron-clad data security via some form of data-at-rest encryption. This is a bit tricky since we would want to dedupe all the data from everywhere yet at the same time have the data be encrypted.
  • Enforceable digital rights management that would allow authorized users data access but unauthorized users would be restricted from viewing the information
  • Easy accessibility that would allow home consumers access to their data in an “always on” type of environment or access from any internet enabled location.
  • Dependable backups that would allow user restore of data.
  • Time-limited protection scheme whereby, after so many years (say 60 or 100) of non-access/non-modification, the data would revert to public, non-secured access for future research.
  • Government funding akin to today’s libraries that are publicly funded but serve those consumers that take the time to access their library facilities.

I see this as another outgrowth of current libraries, which provide a repository for today's books, magazines, media, maps, and other published artifacts. However, in this case most data would not be published during a person's lifetime but would become public property sometime after that person dies.

Benefits to society and the individual

Of what use could such a data repository be? Once the data becomes publicly accessible:

  • Future historians could find out what life was really like, in a detail never before available. Find out what people were watching/listening to, who people wrote to/conversed with, and what people cared about in the 21st century by perusing the data feeds of that generation.
  • Future scientists could mine the data for insights into a generation, network links, and personal data consumption.
  • Future governments could mine the data looking for what people thought about a nation, its economy, politics, etc., to help create better government.

But mostly, we don't know what future researchers could do with the data. If such a repository existed today for what people were thinking and doing 60 to 100 years ago, history would be much more person-derived rather than media-derived. Economists would have a much more accurate picture of the Great Depression's effect on humankind. Medicine would have a much better picture of how the pollutants and lifestyles of yesterday impact the health of today.

Also, as more and more of society's activities involve data, the detail available on a person's life becomes even more pervasive. Consider medical imaging: if you had a repository of a person's X-rays from birth to death, this data could potentially be invaluable to the medicine of tomorrow.

While the data is still protected, people:

  • Would have a secure repository to store all their data, accessible from any internet enabled location
  • Would have an unlimited repository for their data storage, not unlike Time Machine on the Mac, which they could go back to at any time to retrieve data.
  • Would have the potential to record even more information about their daily activities.
  • Would have a way to license their data feeds to researchers for a price sort of like registering for Nielsen TV or Alexa web tracking.

Costs to society

The price society would pay could be minimized by appropriate storage and systems technology. If the data created by individuals (~87PB/day from the above-mentioned post) could be deduped by a factor of 50X, this would amount to only 1.7PB of unique data per day worldwide. If I take a nation's portion of world GDP as a surrogate for the data created by that nation, then the US, with 23.6% of the world's '08 GDP, creates ~0.4PB of individual deduped data per day, or ~150PB of data per year.

Of course this would be split up by state or by municipality, so the load on any one jurisdiction would be considerably smaller than this. But storing 150PB of data today would take 75K 2TB drives and would cost about $15.8M in drive costs (a 2TB WD drive costs $210 on Amazon) in the US. This does not account for servers, backups, power, cooling, floor space, administration, etc., so let's triple it to incorporate these other costs. So to store all the data created by individuals in the US in 2009 would cost around $47.4M with today's technology.
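The arithmetic behind those figures, as a quick sketch (all inputs are the ones quoted above; the totals reproduce the post's numbers to within rounding):

```python
WORLD_INDIVIDUAL_DATA = 87e15   # bytes/day created by individuals (from the exabyte-a-day post)
DEDUPE_RATIO          = 50      # assumed dedupe factor
US_GDP_SHARE          = 0.236   # US share of 2008 world GDP
DRIVE_CAPACITY        = 2e12    # bytes per 2TB drive
DRIVE_PRICE           = 210     # US$ per 2TB WD drive
OVERHEAD_MULTIPLIER   = 3       # servers, power, cooling, admin, etc.

world_unique_per_day = WORLD_INDIVIDUAL_DATA / DEDUPE_RATIO   # ~1.7PB/day worldwide
us_per_day  = world_unique_per_day * US_GDP_SHARE             # ~0.4PB/day for the US
us_per_year = us_per_day * 365                                # ~150PB/year
drives      = us_per_year / DRIVE_CAPACITY                    # ~75K 2TB drives
total_cost  = drives * DRIVE_PRICE * OVERHEAD_MULTIPLIER      # ~$47M all-in
print(f"{us_per_year/1e15:.0f}PB/year, {drives:,.0f} drives, ${total_cost/1e6:.1f}M")
```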

Also consider that this cost is being cut in half every 18 to 24 months, but counteracting that trend is significant growth in the data created/stored by individuals each year (~50%). An 18-month halving is roughly a 37% cost decline per year, so the net annual cost change is about 1.5 × 0.63 ≈ 0.95, a slight decline; at a 24-month halving the two effects roughly cancel. Hence, by my calculations, the cost to store all this data stays roughly flat to slightly declining each year, depending on the speed of density increases and the average individual data growth rate.

In any event, $47.4M is not a lot to spend to keep a nation’s worth of individual data. The benefits to today’s society would be considerable and future generations would have a treasure trove of data to analyze whenever the need presented itself.

Holding this back today is the obvious cost, but also all of the data security considerations. I believe the costs are manageable, at least at the state or municipal level. As for the data security considerations, simple data-at-rest encryption is one viable solution, although how to encrypt while still providing deduplication is a serious problem to be overcome. Enforceable digital rights, time-limited protection, and the other technological features could come with time.
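One known approach to the encrypt-yet-dedupe tension is convergent encryption, where each chunk's key is derived from a hash of the chunk's own content, so identical chunks encrypt to identical ciphertext and still dedupe, while the repository never holds a user's private key. A toy sketch (Python standard library; the SHA-256 counter-mode keystream stands in for a real cipher such as AES):

```python
import hashlib

def derive_key(chunk: bytes) -> bytes:
    # Convergent encryption: the key is a hash of the chunk itself.
    return hashlib.sha256(chunk).digest()

def keystream(key: bytes, length: int) -> bytes:
    # Toy SHA-256-in-counter-mode keystream; a real system would use AES.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(chunk: bytes):
    key = derive_key(chunk)
    ciphertext = bytes(a ^ b for a, b in zip(chunk, keystream(key, len(chunk))))
    return key, ciphertext

# Two users storing the same chunk produce identical ciphertext, so the
# archive can dedupe it without ever seeing the plaintext.
k1, c1 = encrypt(b"the same family photo chunk")
k2, c2 = encrypt(b"the same family photo chunk")
assert k1 == k2 and c1 == c2
```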

An Exabyte-a-day

snp microarray data by mararie (cc) (from flickr)

At HPTechDay this week, Jim Pownell, office of the CTO, HP StorageWorks Division, reported on an IDC study that said the world is creating about an Exabyte of data each day this year. An Exabyte (XB) is 10**18 bytes, or 1000PB of data. That seems a bit high from my perspective.

Data creation by individuals

Population Growth and Income Level Chart by mattlemmon (cc) (from flickr)

The US Census Bureau estimates today's worldwide population at around 6.8 billion people. Given that estimate, the XB/day number says that the average person is creating about 150MB/day.

Now I don't know about you, but we probably create that much data during our best week. That being said, our family's average over the last 3.5 years is more like 30.1MB/day. Over the last year, that average has been closer to 75.1MB/day (darn new digital camera).

If I take our 75.1MB/day as a reasonable approximate average for our family, then with 2 adults in the family, each adult creates ~37.6MB of data per day.

Probably about 50% of today's worldwide population has no access to create any data whatsoever. Of the remaining 50%, maybe 33% are at an age where data creation is insignificant. All this leaves about 2.3B people actively creating data at around 37.6MB/day, which would account for about 86.5PB of data creation a day.

Naturally, I would consider myself a power data creator, but:

  • We are not doing much with video production, which creates gobs of data.
  • Also, my wife retains camera rights and I only take the occasional photo with my cell phone. So I wouldn’t say we are heavy into photography.

Nonetheless, 37.6MB/day on average seems exceptionally high, even for us.

Data creation by companies

However, that XB a day also accounts for corporate data generation as well as individuals'. Hoovers, a US corporate database, lists about 33M companies worldwide. These are probably the biggest 33M, no doubt creating lots of data each day.

Given the above estimate that individuals account for 86.5PB/day, that leaves about 913.5PB/day for the Hoovers DB of 33M companies to create. By my calculations, this would say each of these companies is generating about 27.6GB/day. No doubt there are plenty of companies out there doing this each day, but the average company generates 27.6GB a day?? I don't think so.

Ok, my count of companies could be wildly off. Perhaps the 33M companies in the Hoovers DB represent only the top 20% of companies worldwide, which means there may be another 132M smaller companies out there, for a total of 165M companies. Now the 913.5PB/day says the average company generates ~5.5GB/day. This still seems high to me, especially considering it is an average across all 165M companies worldwide.
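The whole back-of-the-envelope chain, as a sketch using the figures above (results match the post's numbers to within rounding):

```python
XB_PER_DAY  = 1e18      # IDC's reported worldwide data creation per day
WORLD_POP   = 6.8e9     # US Census Bureau worldwide population estimate
PER_ADULT   = 37.6e6    # bytes/day per data-creating adult (our family's number)
HOOVERS_COS = 33e6      # companies listed in the Hoovers database
ALL_COS     = 165e6     # if Hoovers covers only the top 20% of companies

per_person  = XB_PER_DAY / WORLD_POP                # ~150MB/person/day average
creators    = WORLD_POP * 0.5 * (1 - 0.33)          # ~2.3B active data creators
individuals = creators * PER_ADULT                  # ~86.5PB/day from individuals
companies   = XB_PER_DAY - individuals              # ~913.5PB/day left for companies

print(f"{per_person/1e6:.0f}MB/person/day, {individuals/1e15:.1f}PB/day from individuals")
print(f"{companies/HOOVERS_COS/1e9:.1f}GB/day per Hoovers company, "
      f"{companies/ALL_COS/1e9:.1f}GB/day averaged over 165M companies")
```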

Most analysts predict data creation is growing by over 100% per year, so that XB/day number for this year will be 2XB/day next year.

Of course I have been looking at a new HD video camera for my birthday…

Sony HDR-TG5V

The price of quality

At HPTechDay this week we had a tour of the EVA test lab, in the south building of HP’s Colorado Springs Facility. I was pretty impressed and I have seen more than my fair share of labs in my day.

Tony Green, HP's EVA Lab Manager
The fact that they have 1200 servers and 500 EVA arrays was pretty impressive, but they also happen to have about 20PB of storage across those 500 arrays. In my day, a couple of dozen arrays and a hundred or so servers seemed to be enough to test a storage subsystem.

Nowadays that seems to have increased by an order of magnitude. Of course, they have sold something like 70,000 EVAs over the years, and some of these 500 arrays happen to be older subsystems used to validate problems and debug issues for the current field population.

Another picture of the EVA lab with older EVAs

They had some old Compaq equipment there, but I seem to have flubbed the picture of that equipment, so this one will have to suffice. It has both vertically and horizontally oriented drive shelves. I couldn't tell you which EVAs these were, but as they came earlier in the tour, I figured they were older equipment. It seemed that as you got farther into the tour you moved closer to the current iterations of EVA, like an archaeological dig in reverse: instead of having the most current layers/levels first, they came last.

I asked Tony how many FC ports he had and he said it was probably easiest to count the switch ports and double them but something in the thousands seemed reasonable.

FC switch rack with just a small selection of switch equipment

There were parts of the lab, deep in its bowels, which were off limits to both cameras and bloggers. But we talked about some of the remote replication support that EVA has and how they test it over distance. Tony said they had to ship their reel of 100 miles of FC up north (probably for some other testing), but they have a surrogate machine which can be programmed to create the proper FC delay to simulate any required distance.

FC delay generator box

The blue box in the adjacent picture seemed to be this magic FC delay inducer box. It had interesting lights on it.

Nigel Poulton of Ruptured Monkeys and Devang Panchigar of StorageNerve Blog were also on the tour taking pictures and video. You can barely make out Devang in the picture next to Nigel. Calvin Zito from HP StorageWorks Blog was also on the tour but isn't in any of my pictures.

Nigel and Devang (not pictured) taking videos on EVA lab tour

Throughout our tour of the lab I can say I only saw one logic analyzer although I am sure there were plenty more in the off limits area.

Lonely logic analyzer in EVA lab
During HPTechDay they hit on the topic of storage-server convergence and the use of commodity x86 hardware for future storage systems. From the lack of logic analyzers, I would have to concur with this analysis.

Nonetheless, I saw some hardware workstations, although this was another lonely workstation surrounded by a sea of EVAs.

Hardware workstation in the EVA lab, covered in parts and HW stuff
Believe it or not, I actually saw one stereo microscope but failed to take a picture of it. Yet another indicator of hardware's decline and of my inadequacies as a photographer.

Here's one picture of an EVA obviously undergoing some error injection test, with drives tagged as removed and being rebuilt or reborn as part of RAID testing.

Drives tagged for removal during EVA test
In my day we would save particularly “squirrelly drives” from the field and use them to verify storage subsystem error handling. I would bet anything these tagged drives had specific error injection points used to validate EVA drive error handling.

I could go on, and I have a couple more decent lab pictures, but you get the gist of the tour.

For some reason I enjoy lab tours. You can tell a lot about an organization by how their labs look and how they are manned, organized, and set up. What HP's EVA lab tells me is that they spare no expense to ensure their product is bulletproof, bug proof, and works every time for their customer base. I must say I was pretty impressed.

At the end of the HPTechDay event, Greg Knieriemen of Storage Monkeys and Stephen Foskett of GestaltIT hosted an InfoSmack podcast to be broadcast next Sunday, 10/4/2009. There we talked a little more about commodity hardware versus purpose-built storage subsystem hardware; it was a brief but interesting counterpoint to the discussions earlier in the week and the evidence from our portion of the lab tour.