VPLEX surfaces at EMCWorld

Pat Gelsinger introducing VPLEXes on stage at EMCWorld

At EMCWorld today Pat Gelsinger had a pair of VPLEXes flanking him on stage, actively moving VMs from “Boston” to “Hopkinton” data centers.  They showed a demo of moving a bunch of VMs from one to the other while all of them actively performed transaction processing.  I have written about EMC’s vision in a prior blog post called Caching DaaD for Federated Data Centers.

I talked to a vSpecialist at the blogging lounge afterwards and asked him where the data actually resided for the VMs that were moved.  He said the data was synchronously replicated and actively updated at both locations. They proceeded to long-distance teleport (vMotion) 500 VMs from Boston to Hopkinton.  After that completed, Chad Sakac powered down the ‘Boston’ VPLEX and everything in ‘Hopkinton’ continued to operate.  All this was done on stage, so the Boston and Hopkinton data centers were possibly both located in the convention center, but it was interesting nonetheless.

I asked the vSpecialist how they moved the IP address between the sites and he said they shared the same IP domain.  I am no networking expert but I had thought that moving network addresses was the last problem to solve for long-distance vMotion.  But he said Cisco had solved this with their OTV (Overlay Transport Virtualization) for the Nexus 7000, which can move IP addresses from one data center to another.

1 Engine VPLEX back view

Later at the Expo, I talked with a Cisco rep who said they do this by encapsulating Layer 2 protocol messages into Layer 3 packets. Once encapsulated, the traffic can be routed over anyone’s gear to the other site, and as long as there is another Nexus 7K switch at the other site, within the proper IP domain shared with the vMotion target servers, it all works fine.  I didn’t ask what happens if the primary Nexus 7K switch/site goes down, but my guess is that the IP address movement would cease to work. For active VM migration between two operational data centers, though, it all seems to hang together.  I asked Cisco if OTV was a formal standard TCP/IP protocol extension and the rep didn’t know, which probably means that other switch vendors won’t support OTV.

4 Engine VPLEX back view

There was a lot of other stuff at EMCWorld today and at the Expo.

  • EMC’s Content Management & Archiving group was renamed Information Intelligence.
  • EMC’s Backup Recovery Systems group was in force on the Expo floor with a big pavilion with Avamar, Networker and Data Domain present.
  • EMC keynotes were mostly about the journey to the private cloud.  VPLEX seemed to be crucial to this journey as EMC sees it.
  • EMCWorld’s show floor was impressive. Lots of major partners were there: RSA, VMware, Iomega, Atmos, VCE, Cisco, Microsoft, Brocade, Dell, CSC, STEC, Forsythe, QLogic, Emulex and many others.  Talked at length with Microsoft about SharePoint 2010. Still trying to figure that one out.
One table at bloggers lounge: StorageNerve & BasRaayman in the foreground, hard at work

I would say the bloggers lounge was pretty busy for most of the day.  Met a lot of bloggers there including StorageNerve (Devang Panchigar), BasRaayman (Bas Raayman), Kiwi_Si (Simon Seagrave), DeepStorage (Howard Marks), Wikibon (Dave Vellante), and a whole bunch of others.

Well, I’m not sure what EMC has in store for day 2, but from my perspective it will be hard to beat day 1.

Full disclosure: I have written a white paper discussing VPLEX for EMC and work with EMC on a number of other projects as well.

Smart metering’s data storage appetite

European smart meter in use (from en.wikipedia.org/wiki/Smart_meter) (cc)

A couple of years back I was talking with a storage person from PG&E who was concerned about the storage performance aspects of installing smart meters in California.  I saw a website devoted to another electric company in California installing 1.4M smart meters that send information every 15 minutes to the electric company.  Given that this must be only a small portion of California, this represents ~134M electricity-recording transactions per day and seems entirely doable. But even at only 128 bytes per transaction, ~17GB a day of electric metering data is ingested for this company’s service area. Naturally, this power company wants to extend smart metering to gas usage as well, which should not quite double the data load.

According to US census data there were ~129M households in 2008.  At that same 15-minute interval, smart metering for the whole US would generate ~12B transactions a day and, at 128 bytes per transaction, would represent ~1.5TB/day.  Of course that’s only households and only electricity usage.

That same census website indicates there were 7.7M businesses in the US in 2007.  To smart meter these businesses at the same interval would take an additional ~740M transactions a day, or ~95GB of data. But fifteen-minute intervals may be too long for some companies (and their power suppliers), so maybe the interval should be dropped to every minute for businesses.  At one-minute intervals, businesses would add 1.4TB of electricity metering data to the household 1.5TB, for a total of ~3TB of data/day.
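The base-load arithmetic above can be sketched in a few lines of Python; the household/business counts and the 128-byte transaction size are the post’s own assumptions, not measured figures:

```python
# Back-of-envelope smart metering data volumes, using the post's assumptions:
# 128 bytes/transaction, 15-min household reads, 1-min business reads.
BYTES_PER_TXN = 128
READS_PER_DAY_15MIN = 24 * 60 // 15   # 96 reads/day at 15-minute intervals
READS_PER_DAY_1MIN = 24 * 60          # 1,440 reads/day at 1-minute intervals

households = 129_000_000   # ~129M US households (2008 census)
businesses = 7_700_000     # ~7.7M US businesses (2007 census)

hh_txns = households * READS_PER_DAY_15MIN                    # ~12B txns/day
hh_bytes = hh_txns * BYTES_PER_TXN                            # ~1.5TB/day
biz_bytes = businesses * READS_PER_DAY_1MIN * BYTES_PER_TXN   # ~1.4TB/day

print(f"households: {hh_txns/1e9:.1f}B txns/day, {hh_bytes/1e12:.2f} TB/day")
print(f"businesses: {biz_bytes/1e12:.2f} TB/day")
print(f"total base load: {(hh_bytes + biz_bytes)/1e12:.1f} TB/day")
```

The total lands at roughly the 3TB/day base load the post works from.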

Storage multiplication tables:

  • That 3TB a day must be backed up so that’s at least another 3TB a day of backup load (deduplication notwithstanding).
  • That 3TB of data must be processed offline as well as online, so that’s another 3TB a day of data copies.
  • That 3TB of data is probably considered part of the power company’s critical infrastructure and as such, must be mirrored to some other data center which is another 3TB a day of mirrored data.

So with this relatively “small” base data load of 3TB a day we are creating an additional 9TB/day of copies.  Over the course of a year this 12TB/day generates ~4.4PB of data.  A study done by StorageTek in the late ’90s showed that on average data was copied 6 times, so the 3 copies above may be conservative.  If the study results held true today for metering data, it would generate ~7.7PB/year.
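The copy multiplication works out as follows; the 3-copy and 6-copy figures come from the post itself (the latter from the StorageTek study):

```python
# Copy multiplication on the ~3TB/day base metering load.
base_tb_per_day = 3.0
copies = 3                                    # backup + offline copy + mirror
daily_total_tb = base_tb_per_day * (1 + copies)        # 12 TB/day
yearly_pb = daily_total_tb * 365 / 1000                # ~4.4 PB/year

# StorageTek's late-'90s finding: data copied ~6 times on average
yearly_pb_6_copies = base_tb_per_day * (1 + 6) * 365 / 1000   # ~7.7 PB/year
```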

To paraphrase Senator E. Dirksen: a petabyte here, a petabyte there, and pretty soon you’re talking real storage.

In prior posts we discussed the 1.5PB of data generated by CERN each year, the expectations for the world to generate an exabyte (EB) a day of data in 2009, and NSA’s need to capture and analyze a yottabyte (YB) a year of voice data by 2015.  Here we show how another 4-8PB of storage could be created each year just by rolling out smart electricity metering to US businesses and homes.

As more and more aspects of home and business become digitized, more data is created each day and it all must be stored someplace – data storage.  Other technology arenas may also benefit from this digitization of life, leisure, and economy, but today we would contend that storage benefits most from this trend. We must defer discussion of why storage benefits more than other technological domains to some future post.

Future iBook pricing

Apple's iPad iBook app (from Apple.com)

All the news about iPad and iBooks app got me thinking. There’s been much discussion on e-book pricing but no one is looking at what to charge for items other than books.  I look at this as something like what happened to albums when iTunes came out.  Individual songs were now available without having to buy the whole album.

As such, I started to consider what iBooks should charge for items outside of books.  Specifically,

  • Poems – no reason the iBooks app should not offer poems as well as books, but what’s a reasonable price for a poem?  I believe Natalie Goldberg in Writing Down the Bones: Freeing the Writer Within used to charge $0.25 per poem.  So this is a useful lower bound; however, considering inflation (and assuming $0.25 was 1976 pricing), in today’s prices this would be closer to $1.66.  With the iBooks app’s published commission rate (33% for Apple), future poets would walk away with $1.11 per poem.
  • Haiku – As a short form poem, I would argue that a haiku should cost less than a poem.  So maybe $0.99 per haiku would be a reasonable price.
  • Short stories – As a short form of book, pricing for short stories needs to be somehow proportional to normal e-book pricing.  A typical book has about 10 chapters and as such, it might be reasonable to consider a short story as equal to a chapter.  So maybe 1/10th the price of an e-book is reasonable.  With the prices being discussed for books, this would be roughly the price we set for poems.  No doubt incurring the wrath of poets forevermore, I am willing to say this undercuts the worth of short stories and would suggest something more on the order of $2.49 for a short story.  (Poets, please forgive my transgression.)
  • Comic books – Comic books seem close to short stories and with their color graphics would do well on the iPad.  It seems to me that these might be priced somewhere in between short stories and poems, perhaps at $1.99 each.
  • Magazine articles – I see no reason magazine articles shouldn’t be offered separately, outside the magazine itself, just as short stories would be. Once again, the color graphics found in most high-end magazines should do well on the iPad.  I would assume pricing similar to short stories would make sense here.
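The poem-pricing arithmetic above can be checked with a quick sketch; the inflation multiplier is simply the one implied by the post’s $0.25-to-$1.66 estimate, not an official CPI figure:

```python
# Hypothetical iBooks poem pricing, per the post's assumptions.
base_price_1976 = 0.25               # Goldberg's assumed 1976 price per poem
inflation_multiplier = 6.64          # implied by the post's $1.66 estimate
commission = 0.33                    # Apple's cut, as stated in the post

todays_price = round(base_price_1976 * inflation_multiplier, 2)   # ~$1.66
poet_take = round(todays_price * (1 - commission), 2)             # ~$1.11
```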

University presses, the prime outlet for short stories today, seem similar to small record labels.  Of course, the iBooks app could easily offer to sell their output as e-books in addition to selling their stories separately. Similar considerations apply to poetry publishers. Selling poems and short stories outside of book form might provide more exposure for the authors/poets and, in the long run, more revenue for them and their publishers.  But record companies will attest that your results may vary.

Regarding magazine articles and comic books, there seems to be a dependence on advertising revenue that may suffer from iBook publishing.  This could be dealt with by incorporating publisher advertisements in iBook displays of an article or comic book.  However, significant advertisement revenue comes from ads placed outside of articles, such as in back matter, around the table of contents, in between articles, etc.  This will need to change with the transition to e-articles – revenues may suffer.

Nonetheless, all these industries can continue to do what they do today.  Record companies still exist, perhaps not doing as well as before iTunes, but they still sell CDs.  So there is life after iTunes/iBooks, but one thing’s for certain – it’s different.

I’m probably missing whole categories of items that could be separated from book form as sold today.  But in my view, anything that could be offered separately probably will be.  Comments?

7 grand challenges for the next storage century

Clock tower (4) by TJ Morris (cc) (from flickr)

I saw a recent IEEE Spectrum article on engineering’s grand challenges for the next century and thought something similar should be done for data storage. So this is a start:

  • Replace magnetic storage – most predictions show that magnetic disk storage has another 25 years and magnetic tape another decade after that before they run out of steam. Such end-dates have been wrong before, but it is unlikely that we will be using disk or tape 50 years from now. Some sort of solid state device seems most probable as the next evolution of storage. I doubt this will be NAND, considering its write endurance and other long-term reliability issues, but if such issues could be resolved maybe it could replace magnetic storage.
  • 1000 year storage – paper can be printed today with non-acidic ink and retain its image for over 1,000 years. Nothing in data storage today can claim much more than 100 years of longevity. The world needs data storage that lasts much longer than 100 years.
  • Zero energy storage – today SSD/NAND and rotating magnetic media consume energy constantly in order to be accessible. Ultimately, the world needs some sort of storage that only consumes energy when read or written; such storage would provide “online access with offline power consumption”.
  • Convergent fabrics running divergent protocols – whether it’s Ethernet, InfiniBand, FC, or something new, all fabrics should be able to handle any and all storage (and datacenter) protocols. The internet has become so ubiquitous because it handles just about any protocol we throw at it. We need the same or something similar for datacenter fabrics.
  • Securing data – securing books or paper is relatively straightforward today: just throw them in a vault/safety deposit box. Securing data seems simple, yet it is not widely practiced today. It doesn’t have to be that way. We need better, longer-lasting tools and methodologies to secure our data.
  • Public data repositories – libraries exist to provide access to the output of society in the form of books, magazines, papers and other printed artifacts. No such repository exists today for data. Society would be better served if library-like institutions existed to store and retrieve data. Most of the issues here are legal, due to data ownership, but technological issues exist as well.
  • Associative accessed storage – sequential and random access have been around for over half a century now. Associative storage could complement these as another approach, allowing storage to be retrieved by its content. We can kind of do this today by keywording and indexing data. Biological memory is accessed via associations or linkages to other concepts; once accessed, memories seem almost sequentially accessed from there. Something comparable to biological memory may be required to build more intelligent machines.

Some of these are already being pursued and yet others receive no interest today. Nonetheless, I believe they all deserve investigation, if storage is to continue to serve its primary role to society, as a long term storehouse for society’s culture, thoughts and deeds.

Comments?

Storage strategic inflection points

EMC vs S&P 500 stock price chart - 20 yrs from Yahoo Finance

Both EMC and Spectra Logic celebrated 30 years in business this month and it got me to thinking. Both companies started at the same time, but one is a ~$14B revenue (’09 projected) behemoth and the other a successful but relatively mid-size storage company (Spectra Logic is private and does not report revenues). What’s the big difference between these two? As far as I can tell both companies have been adequately run for some time now by very smart people. Why is one two or more orders of magnitude bigger than the other? Recognizing strategic inflection points is key.

So what is a strategic inflection point? Andy Grove may have coined the term and calls a strategic inflection point a point “… where the old strategic picture dissolves and gives way to the new.” In my view EMC has been more successful at recognizing storage strategic inflection points than Spectra Logic and this explains a major part of their success.

EMC’s history in brief

In listening this week to Joe Tucci’s talk at EMC Analyst Days, he talked about the rather humble beginnings of EMC. It started out selling furniture and memory for mainframes (I think), but Joe said it really took off in 1991, almost 12 years after it was founded. It seems they latched onto some DRAM-based SSD-like storage technology and converted it to use disk as a RAID storage device in the mainframe and later open systems arenas. RAID killed off the big (14″ platter) disk devices that had dominated storage at that time, and once started it could not be stopped. Whether by luck or smarts, EMC’s push into RAID storage made them what they are today – probably a little of both.

It was interesting to see how this played out in the storage market space. RAID used smaller disks, first 8″, then 5.25″ and now 3.5″. When first introduced, manufacturing costs for RAID storage were so low that one couldn’t help but make a profit selling against big disk devices that held 14″ platters. The more successful RAID became, the more available and reliable the smaller disks became, which led to a virtuous cycle culminating in the highly reliable 3.5″ disk devices available today. I’m not sure Joe was at EMC at the time, but if he was he would probably have called that transition between big-platter disks and RAID a “strategic inflection point” in the storage industry.

Most of EMC’s competitors and customers would probably say that aggressive marketing also helped propel EMC to be the top of the storage heap. I am not sure which came first, the recognition of a strategic inflection like RAID or the EMC marketing machine but, together, they gave EMC a decided advantage that re-constructed the storage industry.

Spectra Logic’s history in brief

As far as I can tell, Spectra Logic has been in the backup software business for a long time and later started supporting tape technology, for which they are well known today. Spectra Logic has disk storage systems as well, but they seem better known for their tape and backup technology.

The big changes in tape technology over the past 30 years have been tape cartridges and robotics. Although tape cartridges were introduced by IBM (for the IBM 3480 in 1985), the first true tape automation was introduced by Storage Technology Corp. (with the STK 4400 in 1987). Storage Technology rode the wave of the robotics revolution throughout the late 80’s into the mid 90’s and was very successful for a time. Spectra Logic’s entry into tape robotics was sometime later (1995) but by the time they got onboard it was a very successful and mature technology.

Nonetheless, the revolution in tape technology and operations brought on by these two advances, probably held off the decline in tape for a decade or two, and yet it could not ultimately stem the tide in tape use apparent today (see my post on Repositioning of tape). Spectra Logic has recently introduced a new tape library.

Another strategic inflection point that helped EMC

Proprietary “open” Unix systems had started to emerge in the late ’80s and early ’90s, and by the mid ’90s were beginning to host most new and sophisticated applications. The FC interface also emerged in the early-to-mid ’90s as a replacement for HIPPI technology and for a while battled it out against SSA technology from IBM, but by 1997 emerged victorious. Once FC and the follow-on higher-level protocols (resulting in SANs) were available, proprietary Unix systems had the IO architecture to support any application needed by the enterprise, and the two took off feeding on each other. This was yet another strategic inflection point; I am not sure if EMC was the first entry into this market, but they sure were the biggest and, as such, quickly emerged to dominate it. In my mind EMC’s real accelerated growth can be tied to this timeframe.

EMC’s future bets today

Again, today, EMC seems to be in the fray for the next inflection. Their latest bets are on virtualization technology in VMware, NAND-SSD storage and cloud storage. They bet large on the VMware acquisition and it’s working well for them. They were the largest company and earliest to market with NAND-SSD technology in the broad market space and seem to enjoy a commanding lead. Atmos is not the first cloud storage service out there, but once again EMC was one of the largest companies to go after this market.

One can’t help but admire a company that swings for the bleachers every time they get a chance at bat. Not every swing goes out of the park, but when they get ahold of one, sometimes they can change whole industries.

What’s happening with MRAM?

16Mb MRAM chips from Everspin

At the recent Flash Memory Summit there were a few announcements that show continued development of MRAM technology which can substitute for NAND or DRAM, has unlimited write cycles and is magnetism based. My interest in MRAM stems from its potential use as a substitute storage technology for today’s SSDs that use SLC and MLC NAND flash memory with much more limited write cycles.

MRAM has the potential to replace NAND SSD technology because of its write speed (current prototypes write at 400MHz, i.e., a few nanoseconds per write), with the potential to go up to 1GHz. At 400MHz, MRAM is already much, much faster than today’s NAND. And with no write limits, MRAM technology should be very appealing to most SSD vendors.

The problem with MRAM

The only problem is that current MRAM chips use a 150nm process technology whereas today’s NAND ICs use a 32nm process. This means that current MRAM chips hold about 1/1000th the memory capacity of today’s NAND chips (16Mb MRAM from Everspin vs 16Gb NAND from multiple vendors). MRAM has to get on the same (chip) design node as NAND to make a significant play for storage-intensive applications.
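A rough sketch of the density gap, using only the figures above: a process shrink from 150nm to 32nm scales cell area by roughly (150/32)², so lithography alone does not account for the full ~1000x capacity difference; cell design differences (e.g. MLC NAND storing multiple bits per cell) make up much of the rest.

```python
# Back-of-envelope MRAM vs NAND density comparison (post's figures).
mram_bits = 16 * 2**20          # 16Mb Everspin MRAM chip
nand_bits = 16 * 2**30          # 16Gb NAND chip

capacity_gap = nand_bits / mram_bits      # ~1000x (exactly 1024x)
litho_scaling = (150 / 32) ** 2           # ~22x from the process shrink alone

print(f"capacity gap: {capacity_gap:.0f}x, lithography scaling: {litho_scaling:.1f}x")
```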

It’s encouraging that somebody at least is starting to manufacture MRAM chips rather than these remaining lab prototypes.  From my perspective, it can only get better from here…