Archeology meets Big Data

Polynya off the Antarctic Coast by NASA Earth Observatory (cc) (From Flickr)

I read an article yesterday about the use of LIDAR (light detection and ranging, Wikipedia) to map the remains of a pre-Columbian civilization in Mexico, the little-known Purepecha empire, peers of the Aztecs.

The original study (see LIDAR at Angamuco) cited in the piece above was a result of the Legacies of Resilience project sponsored by Colorado State University (CSU) and goes into some detail about the data processing and archeological use of the LIDAR maps.

Why LIDAR?

LIDAR sends a laser pulse from an airplane/satellite to the ground and measures how long it takes to reflect back to the receiver. With that information and “some” data processing, these measurements can be converted to an X, Y, & Z coordinate system or detailed map of the ground.
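
To make the time-of-flight idea concrete, here's a minimal sketch with my own illustrative numbers and a deliberately simplified geometry (this is not the study's processing pipeline, which also has to handle aircraft attitude, GPS positioning, multiple returns, etc.):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def pulse_range(round_trip_seconds):
    """Sensor-to-ground distance: the pulse travels down and back."""
    return C * round_trip_seconds / 2.0

def ground_point(sensor_xyz, range_m, scan_angle_rad):
    """Very simplified geometry: ignores aircraft attitude, earth curvature
    and atmospheric effects; just projects the return along the scan angle."""
    x0, y0, z0 = sensor_xyz
    return (x0 + range_m * math.sin(scan_angle_rad),
            y0,
            z0 - range_m * math.cos(scan_angle_rad))

# A return arriving ~6.67 microseconds after the pulse left is ~1,000m away
r = pulse_range(6.67e-6)
print(round(r))                                   # ~1000 m
print(ground_point((0.0, 0.0, 1000.0), r, 0.0))   # roughly (0, 0, 0)
```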

The archeologists in the study used LIDAR to create a detailed map of the empire’s main city at a resolution of +/- 0.25m (~10in). They mapped ~207 square kilometers (80 square miles) at this level of detail. In 4 days of airplane LIDAR mapping, they were able to gather more information about the area than they had been able to accumulate over 25 years of field work. Seems like digital archeology was just born.

So how much data?

I wanted to find out just how much data this was, but neither the article nor the study told me anything about the size of the LIDAR map. However, assuming this is a flat area (which it wasn’t), and assuming the +/-0.25m resolution represents a point every 625sqcm (0.25m x 0.25m), then the area mapped above should represent a minimum of ~3.3 billion points in a LIDAR point cloud.

Another paper I found (see Evaluation of MapReduce for Gridding LIDAR Data) said that a LIDAR “grid point” (containing X, Y & Z coordinates) takes 52 bytes of data.

Given the above, I estimate the 207sqkm LIDAR grid point cloud represents a minimum of ~172GB of data. There are LIDAR compression tools available, but even at a 50% reduction, it’s still ~86GB for the 207sqkm.
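
Here's the back-of-the-envelope arithmetic, under the same assumptions (flat terrain, one point per 0.25m x 0.25m cell, 52 bytes per grid point):

```python
# Back-of-the-envelope sizing for the Angamuco LIDAR point cloud
area_sqkm = 207
points_per_sqm = 1 / (0.25 * 0.25)          # 16 points per square meter
bytes_per_point = 52                         # per the SDSC gridding paper

points = area_sqkm * 1_000_000 * points_per_sqm
size_gb = points * bytes_per_point / 1e9

print(f"{points/1e9:.1f} billion points")    # ~3.3 billion
print(f"{size_gb:.0f} GB uncompressed")      # ~172 GB
print(f"{size_gb/2:.0f} GB at 50% compression")
```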

My understanding is that the raw LIDAR data would be even bigger than this and the study applied a number of filters against the LIDAR map data to extract different types of features which of course would take even more space. And that’s just one ancient city complex.

With all the above, the size of the LIDAR raw data, grid point fields, and multiple filtered views is approaching significance (in storage terms). Moving and processing all this data must also be a problem. As evidence, the flights for the LIDAR runs over Angamuco, Mexico occurred in January 2011 and they were able to analyze the data sometime that summer, ~6 months later. That seems a bit long from my perspective; maybe the data processing/analysis could use some help.

Indiana Jones meets Hadoop

That was the main subject of the second paper mentioned above, done by researchers at the San Diego Supercomputer Center (SDSC). They essentially ran a benchmark comparing MapReduce/Hadoop on a relatively small cluster of 4 to 8 commodity nodes against an HPC cluster (28 Sun x4600 M2 servers, each an 8-processor, quad-core node with anywhere from 256GB to 512GB [on only 8 nodes] of DRAM) running a C++ implementation of the algorithm.

The results of their benchmarks were that the HPC cluster beat the Hadoop cluster only when all of the LIDAR data could fit in memory (on a DRAM-per-core basis); beyond that, the Hadoop cluster performed just as well in elapsed wall-clock time. Of course, from a cost perspective the Hadoop cluster was much more economical.

The 8-node Hadoop cluster was able to “grid” a 150M-point LIDAR-derived point cloud at the 0.25m resolution in just a bit over 10 minutes. Now this processing step is just one of the many steps in LIDAR data analysis, but it’s probably indicative of similar activity occurring earlier and later down the (data) line.
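
For flavor, here's a toy version of the gridding step expressed in map/reduce terms. This is not the SDSC implementation, just the general pattern: map each point to a grid cell key, then reduce each cell's returns to a single elevation (a simple mean here):

```python
from collections import defaultdict

CELL = 0.25  # grid resolution in meters

def map_point(x, y, z):
    """Emit (cell_key, elevation) for one LIDAR return."""
    return (int(x // CELL), int(y // CELL)), z

def reduce_cell(elevations):
    """Collapse all returns in a cell to one value (simple mean here;
    the real thing would interpolate/filter)."""
    return sum(elevations) / len(elevations)

def grid(points):
    cells = defaultdict(list)
    for x, y, z in points:                                  # "map" + shuffle
        key, z_val = map_point(x, y, z)
        cells[key].append(z_val)
    return {k: reduce_cell(v) for k, v in cells.items()}    # "reduce"

print(grid([(0.1, 0.1, 101.2), (0.2, 0.15, 101.4), (0.6, 0.1, 99.8)]))
# approximately {(0, 0): 101.3, (2, 0): 99.8}
```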

~~~~

Let’s see: at 172GB per 207sqkm, with an earth surface of 510Msqkm, a similar resolution LIDAR grid point cloud of the entire earth’s surface would be about 0.5EB (Exabyte, 10**18 bytes). It’s just great to be in the storage business.
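
(In case you want to check that figure, here's the arithmetic:)

```python
# Scaling the same estimate to the whole planet (very rough, land + ocean)
gb_per_sqkm = 172 / 207
earth_sqkm = 510_000_000
total_eb = gb_per_sqkm * earth_sqkm / 1e9   # GB -> EB
print(f"~{total_eb:.1f} EB")                # ~0.4 EB, call it half an exabyte
```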

 

One day with HDS

HDS CEO Jack Domme shares the company’s vision and strategy with Influencer Summit attendees #HDSday by HDScorp

Attended #HDSday yesterday in San Jose.  Listened to what seemed like the majority of the executive team. The festivities were MCed by Asim Zaheer, VP Corp and Product Marketing, a long-time friend and employee who came to HDS with the acquisition of Archivas five or so years ago.  Some highlights of the day’s sessions are included below.

The first presenter was Jack Domme, HDS CEO, and his message was that there is a new, more aggressive HDS, focused on executing and growing the business.

Jack said there will be almost half a ZB of data by 2015 and ~80% of that will be unstructured.  HDS firmly believes that much of this growing body of data today lives in silos, locked into application environments, and can’t become true information until it is liberated from those boxes.  Getting information out of unstructured data is one of the key problems facing the IT industry.

To that end, Jack talked about the three clouds appearing on the horizon:

  • infrastructure cloud – cloud as we know and love it today, where infrastructure services can be paid for on a per-use basis and where data and applications move seamlessly across various infrastructure boundaries.
  • content cloud – this is somewhat new, but here we take on the governance, analytics and management of the millions to billions of pieces of content, using the infrastructure cloud as a basic service.
  • information cloud – the end game, where any and all data streams can be analyzed in real time to provide information and insight to the business.

Jack mentioned the example of Japan’s earthquake earlier this year: trains across the country were automatically stopped to prevent further injury and accidents until the extent of track damage could be assessed.  Now this was a specialized example in a narrow vertical, but the idea is that the information cloud does that sort of real-time analysis on data streaming in all the time.

For much of the rest of the day the executive team filled out the details that surrounded Jack’s talk.

For example, Randy DeMont, Executive VP & GM Global Sales, Services and Support, talked about the new, more focused sales team: one that has moved to concentrate on better opportunities and expanded to take on new verticals and emerging markets.

Then Brian Householder, SVP WW Marketing and Business Development got up and talked about some of the key drivers to their growth:

  • The current economic climate has everyone doing more with less.  Hitachi VSP and storage virtualization are uniquely positioned to obtain more value out of current assets rather than requiring a rip-and-replace strategy.  With VSP one layers better management on top of current infrastructure, which helps get more done with the same equipment.
  • Focus on the channel and verticals are starting to pay off.  More than 50% of HDS revenues now come from indirect channels.  Also, healthcare and life sciences are starting to emerge as a crucial vertical for HDS.
  • Scalability of their storage solutions is significant. It used to be that a PB was a good-sized data center, but these days we are starting to talk about multiple PBs and much more.  I think earlier Jack mentioned that in the next couple of years HDS will see its first 1EB customer.

Mike Gustafson, SVP & GM NAS (former CEO of BlueArc), got up and talked about the long and significant partnership between the two companies regarding the HNAS product.  He mentioned that ~30% of BlueArc’s revenue came from HDS.  He also talked about some of the verticals in which BlueArc had done well, such as eDiscovery and Media and Entertainment.  Now these verticals will become new focus areas for HDS storage as well.

John Mansfield, SVP Global Solutions Strategy and Development, came up and talked about the successes they have had in the product arena.  Apparently they have over 2,000 VSPs installed (the product was announced just a year ago), and over 50% of the new systems are going in with virtualization. When asked later what has led to the acceleration in virtualization adoption, the consensus view was that server virtualization and, in general, doing more with less (storage efficiency) were driving increased use of this capability.

Hicham Abdessamad, SVP Global Services, got up and talked about what has been happening in the services end of the business.  Apparently there has been a serious shift in HDS services revenue from break-fix over to professional services (PS).  Such service offerings now include taking over customer data center infrastructure and leasing it back to the customer at a monthly fee.  Hicham reiterated that ~68% of all IT initiatives fail, while 44% of those that succeed are completed late and/or over budget.  HDS is providing professional services to help turn this around.  His main problem is finding experienced personnel to help deliver these services.

After this there was a Q&A panel with John Mansfield’s team: Roberto Bassilio, VP Storage Platforms and Product Management; Sean Moser, VP Software Products; and Sean Putegnat, VP File and Content Services, CME.  There were a number of questions, one of which was on the floods in Thailand and their impact on HDS’s business.

Apparently, the flood problems are causing supply disruptions in the consumer end of the drive market and are not having serious repercussions for their enterprise customers. But they did mention that they were nudging customers to purchase the right form factor (LFF?) disk drives while the supply problems work themselves out.

Also, there was some indication that HDS would be going after more SSD and/or NAND flash capabilities similar to other major vendors in their space. But there was no clarification of when or exactly what they would be doing.

After lunch the GMs of all the Geographic regions around the globe got up and talked about how they were doing in their particular arena.

  • Jeff Henry, SVP & GM Americas, talked about their success in the F500 and some of the emerging markets in Latin America.  In fact, they have been so successful in Brazil that they had to split the country into two regions.
  • Niels Svenningsen, SVP & GM EMEA, talked about the emerging markets in his area of the globe, primarily Eastern Europe, Russia and Africa. He mentioned that many believe Africa will be the next area to take off, like Asia did in the last couple of decades of the last century.  Apparently there are a billion people in Africa today.
  • Kevin Eggleston, SVP & GM APAC, talked about the high rate of server and storage virtualization and the explosive growth and heavy adoption of cloud pay-as-you-go services. His major growth areas were India and China.

The rest of the afternoon was NDA presentations on future roadmap items.

—-

All in all a good overview of HDS’s business over the past couple of quarters and their vision for tomorrow.  It was a long day and there was probably more than I could absorb in the time we had together.

Comments?

 

The sensor cloud comes home

We thought the advent of smart power meters would be the killer app for building the sensor cloud in the home.  But, this week Honeywell announced a new smart thermostat that attaches to the Internet and uses Opower’s cloud service to record and analyze home heating and cooling demand.  Looks to be an even better bet.

9/11 Memorial renderings, aerial view (c) 9/11 Memorial.org (from their website)

Just this past week, on a PBS NOVA telecast, Engineering Ground Zero, about building the 9/11 memorial in NYC, it was mentioned that all the trees planted in the memorial have individual sensors to measure soil chemistry, dampness, and other tree health indicators. Yes, even trees are getting on the sensor cloud.

And of course the buildings going up at Ground Zero are all smart buildings as well, containing sensors embedded in the structure, the infrastructure, and anywhere else that matters.

But what does this mean in terms of data?

Data requirements will explode as the smart home and other sensor clouds build out.  For example, even if a smart thermostat only issues a message every 15 minutes and the message is only 256 bytes, the data from the 130 million households in the US alone would be an additional ~3.2TB/day.  And that’s just one sensor per household.
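
A quick check of that math, using the message size and interval assumed above:

```python
# Quick check on the smart-thermostat math (assumptions from the text above)
households   = 130_000_000          # US households
msgs_per_day = 24 * 60 // 15        # one 256B message every 15 minutes = 96/day
msg_bytes    = 256

per_day_tb = households * msgs_per_day * msg_bytes / 1e12
print(f"{per_day_tb:.1f} TB/day")   # ~3.2 TB/day for a single sensor per home

# add five more sensors per home (power meter, lawn, intrusion, fridge, freezer)
print(f"{per_day_tb * 5:.0f} TB/day more")   # ~16 TB/day more, ~19 TB/day total
```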

If you add the smart power meter, lawn sensor, intrusion/fire/chemical sensor and, god forbid, the refrigerator and freezer product sensors to the mix, that’s another ~16TB/day of incoming data.

And that’s just assuming a 256-byte payload per sensor every 15 minutes.  The intrusion sensors could easily be a combination of multiple real-time exterior video feeds as well as multi-point intrusion/motion/fire/chemical sensors, which would generate much, much more data.

But we have smart roads/bridges, smart cars/trucks, smart skyscrapers, smart port facilities, smart railroads, smart boats/ferries, etc. still to come.  I could go on but the list seems long enough already.  Each of these could generate another ~19TB/day data stream, if not more.  Some of these infrastructure entities/devices are much more complex than a house, and there are a lot more cars on the road than houses in the US.

It’s great to be in the (cloud) storage business

All that data has to be stored somewhere and that place is going to be the cloud.  The Honeywell smart thermostat uses Opower’s cloud storage and computing infrastructure specifically designed to support better power management for heating and cooling the home.  Following this approach, it’s certainly feasible that more cloud services would come online to support each of the smart entities discussed above.

Naturally, using this data to provide real-time understanding of the infrastructure it monitors will require big data analytics. Hadoop and its counterparts are the only platforms around today that are up to the task.

—-

So cloud computing, cloud storage, and big data analytics have yet another part to play, this time in the upcoming sensor cloud that will envelop the world and all of its infrastructure.

Welcome to the future, it’s almost here already.

Comments?

 

 

Disk capacity growing out-of-sight

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

Last week, Hitachi Global Storage Technologies (acquired by Western Digital, with the deal closing in 4Q2011) and Seagate announced some higher capacity disk drives for desktop applications.

Most of us in the industry have become somewhat jaded with respect to new capacity offerings. But last week’s announcements may give one pause.

Hitachi announced that they are shipping over 1TB per platter, using 3.5″ platters with 569Gb/sqin technology.  In the past, 4-6 platter disk drives have shipped in full-height, 3.5″ form factors.  Given the platter capacity available now, 4-6TB drives are certainly feasible or just around the corner. Both Seagate and Samsung beat HGST to 1TB platter capacities, which they announced in May of this year and began shipping in drives in June.

Speaking of 4TB drives, Seagate announced a new 4TB desktop external disk drive.  I couldn’t locate any information about the number of platters or the Gb/sqin of their technology, but 4 platters are certainly feasible and, as a result, a 4TB disk drive is available today.

I don’t know about you, but a 4TB disk drive for a desktop seems about as much as I could ever use. But looking seriously at my desktop environment, my CAGR for storage (measured as fully compressed TAR backup files) is ~61% year over year.  At that rate, I will need a 4TB drive for backup purposes in about 7 years, and if I assume a 2X compression rate, then a 4TB desktop drive will be needed in ~3.5 years (darn music, movies, photos, …).  And we are not heavy digital media consumers; others who shoot and edit their own video probably use orders of magnitude more storage.
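
For anyone who wants to run their own numbers, the compounding math is simple enough; the starting capacity in the sketch below is made up, since I haven't published mine here:

```python
# How long until a drive of a given size fills up, at a constant growth rate?
# (the 0.5TB starting capacity is a made-up example, not my actual number)
import math

def years_to_fill(current_tb, drive_tb, cagr):
    return math.log(drive_tb / current_tb) / math.log(1 + cagr)

print(f"{years_to_fill(0.5, 4.0, 0.61):.1f} years")   # ~4.4 years at 61% YoY
```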

Hard to believe, but given current trends it seems inevitable that a 4TB disk drive will become a necessity for us within the next 4 years.

—-

Comments?

 

 

 

 

e-pathology and data growth

Blue nevus (4 of 4) by euthman (cc) (From Flickr)

I was talking the other day with another analyst, John Koller of Kai Consulting, who specializes in the medical space, and he brought up the rise of electronic pathology (e-pathology).  I hadn’t heard about this one.

He said that, just as radiology did in the recent past, pathology investigations are moving to digital formats.

What does that mean?

The biopsies taken today for cancer and disease diagnosis, which involve one or more specimens of tissue examined under a microscope, will now be digitized, and the digital files will be inspected instead of the original slides.

Apparently microscopic examinations typically use a 1×3 inch slide, and the whole slide can be devoted to tissue.  To do a pathological examination, one has to digitize the whole slide under magnification, at various depths within the tissue.  According to Koller, tissue is essentially a 3D structure, and pathological exams must inspect different depths (slices) within the sample to form a diagnosis.

I was struck by the need for different slices of the same specimen. I hadn’t anticipated that, but whenever I look in a microscope I am always adjusting the focus, revealing different depths within the slide.  So it makes sense: if you want to understand the pathology of a tissue sample, multiple views (or slices) at different depths are a necessity.

So what does a slide take in storage capacity?

Koller said an uncompressed, full slide will take about 300GB of space. However, with compression, and given that most often the slide is not completely used, a more typical space consumption would be on the order of 3 to 5GB per specimen.

As for volume, Koller indicated that a medium hospital facility (~300 beds) typically does around 30K radiological studies a year but does about 10X that in pathological studies.  So at 300K pathological examinations a year, at 3 to 5GB each, we are talking about ~0.9 to 1.5PB of digitized specimen images a year for a mid-sized hospital.
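
A quick sanity check on that estimate, using the figures above:

```python
# Volume estimate for a mid-sized (~300 bed) hospital, per the figures above
radiology_per_year = 30_000
pathology_per_year = 10 * radiology_per_year        # ~300K pathology exams
gb_per_specimen    = (3, 5)                          # compressed, partly-used slide

low  = pathology_per_year * gb_per_specimen[0] / 1000   # TB
high = pathology_per_year * gb_per_specimen[1] / 1000   # TB
print(f"~{low:.0f} TB to ~{high:.0f} TB per year")       # ~900 TB to ~1,500 TB (1.5PB)
```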

Why move to  e-pathology?

It can open up a myriad of telemedicine offerings, similar to the radiological study services currently available around the globe.  Today, non-electronic pathology involves sending specimens off to a local lab for examination by medical technicians under a microscope.  But with e-pathology, the specimen gets digitized (where: the hospital, the lab?) and then the digital files can be sent anywhere around the world, wherever someone is qualified and available to scrutinize them.

—–

At a recent analyst event we were discussing big data, and aside from the analytics component and other markets, the vendor mentioned that content archives are starting to explode.  Given where e-pathology is heading, I can understand why.

It’s great to be in the storage business

Coming data bubble or explosion?

World population by Arenamontanus (cc) (from Flickr)

I was at another conference the other day where someone showed a chart that said the world will create 35ZB (10**21 bytes) of data and content in 2020, up from 800EB (10**18 bytes) in 2009.

Every time I see something like this I cringe.  Yes, lots of data is being created today, but what does that tell us about corporate data growth?  Not much, I’d wager.

Data bubble

That being said, I have a couple of questions I would ask of the people who estimated this:

  • How much is personal data and how much is corporate data?
  • Did you factor in how entertainment data growth rates will change over time?

These two questions are crucial.

Entertainment dominates data growth

Just as personal entertainment is becoming the major consumer of national bandwidth (see study [requires login]), it’s clear to me that the majority of the data being created today is for personal consumption/entertainment – video, music, and image files.

Looking at my own office, our corporate data (office files, PDFs, text, etc.) represents ~14% of the data we keep.  Images, music, video, and audio take up the remainder of our data footprint.  Is this data growing? Yes, faster than I would like, but the corporate data is only averaging ~30% YoY growth while the overall data growth for our shop is averaging ~116% YoY. [As I interrupt this activity to load up another 3.3GB of photos and videos from our camera.]

Moreover, although some media content is of significant external interest to select companies today (Media and Entertainment, social media photo/video sharing sites, mapping/satellite, healthcare, etc.), most corporations don’t deal with lots of video, music or audio data.  Thus, I personally see ~30% as a more realistic growth rate for corporate data than 116%.

Will entertainment data growth flatten?

Will we see a drop in entertainment data growth rates over time? Undoubtedly.

Two factors will reduce the growth of this data.

  1. What happens to entertainment data recording formats?  I believe media recording formats are starting to level out.  The issue here is one of fidelity to nature, in terms of how closely a digital representation matches reality as we perceive it.  For example, most digital projection systems in movie theaters today run from ~2 to 8TB per feature-length motion picture, which seems to indicate that at some point further gains in fidelity (more pixels per frame) may not be worth it.  Similar issues will ultimately slow the growth of other media encoding formats.
  2. When will all the people that can create content be doing so? Recent data indicates that more than 2B people will be on the internet this year, or ~28% of the world’s population.  But at some point we must reach saturation on internet penetration, and when that happens data growth rates should also start to level out.  Let’s say, for argument’s sake, that the 800EB in 2009 was correct, and assume there were 1.5B internet users in 2009.  That works out to about 533EB per billion internet users, or ~0.5TB per internet user (see the quick check below), which seems high but certainly doable.
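
Here's that per-user arithmetic, with the 2009 figures assumed above:

```python
# Quick check on the per-user footprint (2009 figures assumed above)
data_2009_eb   = 800            # EB of data/content created in 2009
internet_users = 1.5e9          # assumed 2009 internet population

eb_per_billion = data_2009_eb / (internet_users / 1e9)
tb_per_user    = data_2009_eb * 1e6 / internet_users    # 1 EB = 1e6 TB
print(f"~{eb_per_billion:.0f} EB per billion users, ~{tb_per_user:.2f} TB per user")
# ~533 EB per billion users, ~0.53 TB per user
```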

Once these two factors level off, we should see world data and content growth rates plummet.  Nonetheless, internet user population growth could be driving data growth rates for some time to come.

Data explosion

The scary part is that the 35ZB represents only a ~41% compound annual growth rate over the period, against the baseline 2009 data and content creation level.
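
That rate is easy to verify:

```python
# CAGR implied by growing from 800EB (2009) to 35ZB (2020)
start_eb, end_eb, years = 800, 35_000, 2020 - 2009
cagr = (end_eb / start_eb) ** (1 / years) - 1
print(f"~{cagr:.0%}")    # ~41% per year
```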

But I must assume this estimate doesn’t consider much growth in digital creators of content, otherwise these numbers should go up substantially.   In the last week, I ran across someone who said there would be 6B internet users by the end of the decade (can’t seem to recall where, but it was a TEDx video).  I find that a little hard to believe but this was based on the assumption that most people will have smart phones with cellular data plans by that time.  If that be the case, 35ZB seems awfully short of  the mark.

A previous post blows this discussion completely away with just one application (see Yottabytes by 2015 for the NSA; a Yottabyte (YB) is 10**24 bytes of data), and I had already discussed an Exabyte-a-day and 3.3 Exabytes-a-day in prior posts.  [Note: those YB by 2015 are all audio (phone) recordings, but if we start using Skype Video, FaceTime and other video communications technologies, can Nonabytes (10**27 bytes) be far behind… BOOM!]

—-

I started out thinking that 35ZB by 2020 wasn’t pertinent to corporate considerations and figured things had to flatten out, then convinced myself that it wasn’t large enough to accommodate internet user growth, and then finally recalled prior posts that put all this into even more perspective.

Comments?

Why Bus-Tech, why now – Mainframe/System z data growth

Z10 by Roberto Berlim (cc) (from Flickr)

Yesterday, EMC announced the purchase of Bus-Tech, their partner in mainframe or System z attachment for the Disk Library Mainframe (DLm) product line.

The success of open systems mainframe attach products based on Bus-Tech or competitive technology is subject to some debate, but it’s the only inexpensive way to bring such functionality to mainframes.  The other, more expensive approach is to build System z attach directly into the storage system’s hardware/software.

Most mainframers know that FC and FICON (the System z storage interface) utilize the same underlying transport technology.  However, FICON has a few crucial differences when it comes to data integrity, device commands and other nuances, which make easy interoperability more of a challenge.

But all that just covers the underlying hardware; when you factor in disk layout (CKD), tape formats, and disk and tape commands (CCWs), System z interoperability can become quite an undertaking.

Bus-Tech’s virtual tape library maps mainframe tape/tape library commands and FICON protocols into standard FC and tape SCSI command sets. This way one could theoretically attach anybody’s open systems tape or virtual tape system to System z.  Looking at Bus-Tech’s partner list, quite a few organizations aside from EMC, including Hitachi, NetApp, HP and others, were using them to do so.

Surprise – Mainframe data growth

Why is there such high interest in mainframes? Mainframe data is big and growing, in some markets almost at open systems/distributed systems growth rates.  I always thought mainframes made better use of data storage, had better utilization, and controlled data growth better.  However, this can only delay growth, it can’t stop it.

Although I have no hard numbers to back up mainframe data market size or growth rates, I do have anecdotal evidence.  I was talking with an admin at one big financial firm a while back and he casually mentioned they had 1.5PB of mainframe data storage under management!  I didn’t think this was possible; he replied that not only was it possible, he was certain they weren’t the largest in their vertical/East Coast area by any means.

Ok, so mainframe data is big and needs lots of storage, but this also means that mainframe backup needs storage as well.

Surprise 2 – dedupe works great on mainframes

Which brings us back to EMC DLm and their deduplication option.  Recently, EMC announced a deduplication storage target for disk library data used as an alternative to their previous CLARiion target.  This just happens to be a Data Domain 880 appliance behind a DLm engine.

Another surprise: data deduplication works great for mainframe backup data.  It turns out that z/OS users have been doing incremental and full backups for decades.  Obviously, anytime a system takes repeated full backups, dedupe technology can reduce storage requirements substantially.
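
To see why, here's a toy sketch of fixed-size chunk, content-hash dedupe across two nearly identical full backups (my own illustration of the general idea, not how Data Domain or anyone else actually implements it):

```python
# Toy illustration of why repeated full backups dedupe so well: chunk the
# data, fingerprint each chunk, and store each unique chunk only once.
import hashlib
import os

def chunks(data, size=4096):
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedupe_ratio(backups):
    total = stored = 0
    seen = set()
    for backup in backups:
        for chunk in chunks(backup):
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:       # only never-seen chunks consume space
                seen.add(digest)
                stored += len(chunk)
    return total / stored

# two "full backups": the second is the first plus a small appended change
full1 = os.urandom(1_000_000)
full2 = full1 + b"new records" * 100
print(f"{dedupe_ratio([full1, full2]):.1f}x reduction")   # ~2.0x for two fulls
```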

I talked recently with Tom Meehan at Innovation Data Processing, creators of FDR, one of only two remaining mainframe backup packages (the other being IBM DFSMShsm).  He reiterated that deduplication works just fine on mainframes, assuming you can separate the metadata from the actual backup data.

System z and distributed systems

In the meantime, this past July IBM announced the zBX, a System z Blade eXtension hardware system which incorporates Power7 blade servers running AIX under System z management and control.  As such, the zBX brings some of the reliability and availability of System z to the AIX open systems environment.

IBM had already supported Linux on System z but that was just a software port.  With zBX, System z could now support open systems hardware as well.  Where this goes from here is anybody’s guess but it’s not a far stretch to talk about running x86 servers under System z’s umbrella.

—-

So there you have it: Bus-Tech is the front end of the EMC DLm system.  As such, if EMC was going to focus more resources on the mainframe dedupe market space, it made logical sense to lock up Bus-Tech, a critical technology partner.  Also, given market valuations these days, perhaps the opportunity was too good to pass up.

However, this now leaves Luminex as the last remaining independent vendor providing mainframe attach for open systems.  Luminex and EMC Data Domain already have a “meet-in-the-channel” model to sell low-end deduplication appliances to the mainframe market.  But with the Bus-Tech acquisition, expect this to slowly fade away, with current non-EMC Bus-Tech partners migrating to Luminex or abandoning the mainframe attach market altogether.

[I almost spun up a whole section on CCWs, CKD and other mainframe I/O oddities but it would have detracted from this post’s main topic.  Perhaps, another post will cover mainframe IO oddities, stay tuned.]

Enterprise data storage defined and why 3PAR?

More SNW hall servers and storage

Recent press reports about a bidding war for 3PAR bring into focus the expanding need for enterprise class data storage subsystems.  What exactly is enterprise storage?

Defining enterprise storage is fraught with problems but I will take a shot.  Enterprise class data storage has:

  • Enhanced reliability, high availability and serviceability – meaning it hardly ever fails, it keeps operating (on redundant components) when it does fail, and repairing the storage when the rare failure occurs can be accomplished without disrupting ongoing storage services
  • Extreme data integrity – goes beyond just RAID storage, meaning that these systems lose data very infrequently, provide the latest data written to a location when read and will tell you when data cannot be accessed.
  • Automated I/O performance – meaning sophisticated caching algorithms that try to keep ahead of sequential I/O streams, buffer actively read data, and buffer write data in non-volatile cache before destaging to disk or other media.
  • Multiple types of storage – meaning the system supports SATA, SAS and/or FC disk drives and SSDs or Flash storage
  • PBs of storage – meaning behind one enterprise class storage (sub-)system one can support over 1PB of storage
  • Sophisticated functionality – meaning the system supports multiple forms of offsite replication, thin provisioning, storage tiering, point-in-time copies, data cloning, administration GUIs/CLIs, etc.
  • Compatibility with all enterprise O/Ss – meaning the storage has been tested and is on hardware compatibility lists for every major operating system in use by the enterprise today.

As for storage protocol, it seems best to leave this off the list.  I wanted to just add block storage, but enterprises today probably have as much if not more external file storage (CIFS or NFS) as they have block storage (FC or iSCSI).  And the proportion in file systems seems to be growing (see IDC report referenced below).

In addition, while I don’t like the non-determinism of iSCSI or file access protocols, this doesn’t seem to stop such storage from putting up pretty impressive performance numbers (see our performance dispatches).  Anything that can crack 100K I/O or file operations per second probably deserves to call itself enterprise storage, as long as it meets the other requirements.  So, maybe I should add high-performance storage to the list above.

Why the sudden interest in enterprise storage?

Enterprise storage has been around arguably since the 2nd half of last century (for mainframe systems) but lately has become even more interesting as applications deploy to the cloud and server virtualization (from VMware, Microsoft Hyper-V and others) takes over the data center.

Cloud storage and cloud computing services are lowering the entry points for storage and processing, enabling application deployments which were heretofore unaffordable.  These new cloud applications consume storage at increasing rates and don’t seem to be slowing down any time soon.  Arguably, some cloud storage is not enterprise storage but as service levels go up for these applications, providers must ultimately turn to enterprise storage.

In addition, server virtualization transforms the enterprise data center from a single application per server to easily 5 or more applications per physical server.  This trend is raising server utilization, driving more I/O, and requiring higher capacity.  Such “multi-application” storage almost always requires high availability, reliability and performance to work well, generating even more demand for enterprise data storage systems.

Despite all the demand, worldwide external storage revenues dropped 12% last year according to IDC.  Now the economy had a lot to do with this decline, but another factor reducing external storage revenue is the ongoing drop in the price of storage on a $/GB basis.  To this point, that same IDC report stated that external storage capacity increased 33% last year.

Why do Dell & HP want 3PAR storage?

Margins on enterprise storage are good, some would say very good.  While raw disk storage can be had at under $0.50/GB, enterprise class storage is often 10 or more times that price.  Now that has to cover redundant hardware, software/firmware engineering and other characteristics, but this still leaves pretty good margins.

In my mind, Dell would see enterprise storage as a natural extension of their current enterprise server business.  They already sell to and support these customers; including enterprise-class storage just adds another product to the mix.  Developing enterprise storage from scratch is probably a 4-7 year journey with the right people; buying 3PAR puts them in the market today with a competitive product.

HP is already in the enterprise storage market today, with their XP and EVA storage subsystems.  However, having their own 3PAR enterprise-class storage may get them better margins than their current XP storage, OEMed from HDS.  But I think Chuck Hollis’s post on HP’s counter bid for 3PAR may have revealed another side to this discussion – sometimes M&A is as much about constraining your competition as it is about adding new capabilities to a company.

——

What do you think?