TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, Google revealed (see Google supercharges machine learning tasks …) that they had been designing and operating their own processor chips in order to optimize machine learning.

They call the new chip a Tensor Processing Unit (TPU). According to Google, the TPU delivers an order of magnitude more power-efficient machine learning than what's achievable with off-the-shelf GPUs/CPUs. TensorFlow is Google's open-sourced machine learning software.
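
Google's order-of-magnitude claim is, at bottom, a performance-per-watt ratio. A back-of-the-envelope sketch – all figures below are hypothetical placeholders, since Google published no TPU specs at announcement time:

```python
def perf_per_watt(ops_per_sec: float, watts: float) -> float:
    """Operations per second delivered per watt consumed."""
    return ops_per_sec / watts

# Hypothetical figures for illustration only -- not published TPU or GPU specs.
gpu = perf_per_watt(6e12, 250)    # e.g., a GPU doing 6 TOPS at 250 W
tpu = perf_per_watt(20e12, 75)    # e.g., an accelerator doing 20 TOPS at 75 W

print(f"accelerator advantage: ~{tpu / gpu:.0f}x per watt")  # ~11x per watt
```

With these made-up numbers the ratio lands around 11x – roughly the "order of magnitude" being claimed.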

This is very interesting, as Google and the rest of the hyper-scale hive seem to have latched onto open-source software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and that the only thing anyone needs is software developers. I believe this is incorrect and that hardware innovation combined with software innovation is a better way (see my Commodity hardware always loses and Better storage through hardware posts).

Commodity hardware loses again …

It seemed only a couple of years back that everyone was touting how hardware engineering no longer mattered.

What with Intel and others playing out Moore’s law, why invest in hardware engineering when the real smarts were all in software?

I said then that hardware-engineered solutions still had a significant role to play, but few believed me (see my posts: Commodity hardware always loses and Commodity hardware debate heats up).

Well hardware’s back, …

A few examples:

  1. EMC DSSD – at EMCWorld2015 a couple of weeks back, EMC demoed a new rack-scale flash storage system targeted at extremely high IOPS and very low latency. DSSD is a classic study in how proprietary hardware can enable new levels of performance. The solution connects to servers over a PCIe switched network, which didn't really exist before, and uses hardware-engineered flash modules that are extremely dense, extremely fast and extremely reliable. (See my EMCWorld post on DSSD and our Greybeards on Storage (GBoS) podcast with Chad Sakac for more info on DSSD.)
  2. Diablo Memory Channel Storage (MCS)/SanDisk UltraDIMMs – Diablo's MCS comes out in SanDisk's UltraDIMM NAND storage, which plugs into DRAM slots and provides memory-paged access to NAND storage. The key is that the hardware logic keeps NAND access overhead to ~50 μsec. (We've written about MCS and UltraDIMMs here.)
  3. Hitachi VSP G1000 storage and their Hitachi Accelerated Flash (HAF) – recent SPC-1 results showed that a G1000 outfitted with HAF modules could generate over 2M IOPS with very low latency (220 μsec). (See our announcement summary on the Hitachi G1000 here.)
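
Little's law ties these published numbers together: average concurrency equals arrival rate times time in system, so the G1000's 2M IOPS at 220 μsec implies roughly 440 IOs in flight. A quick sketch (plain arithmetic, not any vendor's tool):

```python
def outstanding_ios(iops: float, latency_secs: float) -> float:
    """Little's law: average IOs in flight = arrival rate x time in system."""
    return iops * latency_secs

# Hitachi VSP G1000 with HAF, per the SPC-1 numbers above:
concurrency = outstanding_ios(2_000_000, 220e-6)
print(f"~{concurrency:.0f} IOs in flight")  # ~440 IOs in flight
```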

Diablo ran into some legal problems but that’s all behind them now, so the way forward is clear of any extraneous hurdles.

There are other examples of proprietary hardware engineering from IBM FlashSystems, networking companies, PCIe flash vendors and others, but these will suffice to make my point.

My point is that if you want to gain orders of magnitude better performance, you need to seriously consider engaging in some proprietary hardware engineering. Proprietary hardware may take longer to develop than software-only solutions (although that's somewhat a function of the resources you throw at it), but the performance gains are sometimes unobtainable any other way.


Chad made an interesting point on our GBoS podcast: hardware innovation is somewhat cyclical. For a period of time, commodity hardware is much better than any storage solution really needs, so innovation swings to the software arena. But over time, software functionality comes up to speed and maxes out the hardware that's available, and then you need more hardware innovation to take performance to the next level. Then the cycle swings back to hardware engineering. And the cycle will swing back and forth a lot more times before storage is ever through as an IT technology.

Today, when it seems there's a new software-defined storage solution coming out every month, we are very close to peak software innovation, with little left for performance gains. But there's still plenty left if we open our eyes to proprietary hardware.

Welcome to the start of the next hardware innovation cycle – take that, commodity hardware.


Why EMC is doing Project Lightning and Thunder

rayo 3 by El Garza (cc) (from Flickr)

Although technically Project Lightning and Thunder represent some interesting offshoots of EMC software, hardware and system prowess, I wonder why they decided to go after this particular market space.

There are plenty of alternative offerings in the PCIe NAND memory card space. Moreover, the PCIe card caching functionality, while interesting, is not that hard to replicate, and such software capability is not a serious barrier to entry for HP, IBM, NetApp and many, many others. And the margins cannot be that great.

So why get into this low margin business?

I can see a couple of reasons why EMC might want to do this.

  • Believing in the commoditization of storage performance.  I have had this debate with a number of analysts over the years, but there remain many out there who firmly believe that storage performance will become a commodity sooner rather than later.  By entering the PCIe NAND card IO buffer space, EMC can create a beachhead in this movement that helps them build market awareness, higher manufacturing volumes, and support expertise.  As such, when the inevitable happens and high margins for enterprise storage start to deteriorate, EMC will be able to capitalize on this hard-won operational effectiveness.
  • Moving up the IO stack.  From an application's IO request to the disk device that actually services it is a long journey, with multiple places to make money.  Currently, EMC has a significant share of everything that happens after the fabric switch, whether it is FC, iSCSI, NFS or CIFS.  What they don't have is a significant share in the switch infrastructure or anywhere on the other (host) side of that interface stack.  Yes, they have Avamar, Networker, Documentum, and other software that helps manage, secure and protect IO activity, together with other significant investments in RSA and VMware.  But these represent adjacent market spaces rather than primary IO stack endeavors.  Lightning represents a hybrid software/hardware solution that moves EMC up the IO stack to inside the server.  As such, it represents yet another opportunity to profit from all the IO going on in the data center.
  • Making big data more effective.  The fact that Hadoop doesn't really need or use high-end storage has not been lost on most storage vendors.  With Lightning, EMC has a storage enhancement offering that can readily improve Hadoop cluster processing.  Something like Lightning's caching software could easily be tailored to enhance HDFS file access and thus speed up cluster processing.  If Hadoop and big data are to be the next big consumer of storage, then speeding up cluster processing will certainly help, and profiting by doing so only makes sense.
  • Believing that SSDs will transform storage.  To many of us the age of disks is waning.  SSDs, in some form or another, will be the underlying technology for the next age of storage.  The density, performance and energy efficiency of current NAND-based SSD technology are commendable, and they will only get better over time.  The capabilities brought about by such technology will certainly transform the storage industry as we know it, if they haven't already.  But where SSD technology actually emerges is still being played out in the marketplace.  Many believe that when industry transitions like this happen, it's best to be engaged everywhere change is likely to occur, hoping that at least some of those bets will succeed.  Perhaps PCIe SSD cards won't take over all server IO activity, but if they do, not being there or being late will certainly hurt a company's chances to profit from it.
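
On the Hadoop point, the core idea behind server-side flash caching is simple: keep recently read blocks in fast local storage and only fall through to the slow tier on a miss. A toy LRU read cache sketches it – the class, names and sizes here are mine for illustration, not anything from EMC's actual Lightning design:

```python
from collections import OrderedDict

class FlashReadCache:
    """Toy LRU read cache, standing in for a PCIe-flash caching layer."""

    def __init__(self, backing_store, capacity_blocks=4):
        self.backing = backing_store          # slow tier (e.g., HDFS/disk)
        self.capacity = capacity_blocks
        self.cache = OrderedDict()            # block_id -> data, in LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)  # mark most-recently-used
            return self.cache[block_id]
        self.misses += 1
        data = self.backing[block_id]         # fall through to the slow tier
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least-recently-used
        return data

store = {n: f"block-{n}" for n in range(10)}
cache = FlashReadCache(store)
for b in [0, 1, 2, 0, 1, 3, 0]:               # re-reads of hot blocks hit flash
    cache.read(b)
print(cache.hits, cache.misses)               # 3 hits, 4 misses
```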

There may be more reasons I missed here, but these seem to be the main ones.  Of the above, I think the last one, that SSDs rule the next transition, is most important to EMC.

They have been successful during other industry transitions in the past.  If anything, their acquisitions show a similar pattern of buying into transitions they don't own – witness Data Domain, RSA, and VMware.  So I suspect the view at EMC is that doubling down on SSDs will enable them to ride out the next storm and be in a profitable place for the next change, whatever that might be.

And following Lightning, Project Thunder

Similarly, Project Thunder seems to represent EMC doubling their bet yet again on SSDs.  Just about every month I talk to another storage startup coming out in the market with another new take on storage using every form of SSD imaginable.

However, Project Thunder as envisioned today is not storage, but rather some form of external shared memory.  I have heard this before, in the IBM mainframe space about 15-20 years ago.  At that time shared external memory was going to handle all mainframe IO processing and the only storage left was going to be bulk archive or migration storage – a big threat to the non-IBM mainframe storage vendors at the time.

One problem then was that the shared DRAM memory of the time was way more expensive than sophisticated disk storage, and its price wasn't coming down fast enough to counteract increased demand.  The other problem was that making shared memory work with all the existing mainframe applications was not easy.  IBM at least had control over the OS, hardware and most of the larger applications at the time.  Yet they still struggled to make it usable and effective – probably some lesson here for EMC.

Fast forward 20 years and NAND-based SSDs are the right hardware technology to make inexpensive shared memory happen.  In addition, the road map for NAND and other SSD technologies looks poised to continue the capacity increases and price reductions necessary to compete effectively with disk in the long run.
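
The "price reductions necessary to compete with disk" argument is just compound-decline arithmetic: flash undercuts disk once its steeper price decline overcomes its higher starting price. A sketch with made-up starting prices and decline rates (assumptions for illustration, not actual market data):

```python
import math

def crossover_years(p_fast, p_slow, decline_fast, decline_slow):
    """Years until the faster-declining price undercuts the slower one.
    Solves p_fast * (1 - decline_fast)**t == p_slow * (1 - decline_slow)**t."""
    return math.log(p_fast / p_slow) / math.log((1 - decline_slow) / (1 - decline_fast))

# Hypothetical: flash at $2.00/GB dropping 30%/yr vs. disk at $0.10/GB dropping 15%/yr
years = crossover_years(2.00, 0.10, 0.30, 0.15)
print(f"crossover in ~{years:.1f} years")  # crossover in ~15.4 years
```

The point of the arithmetic: even a 20x price gap closes in a decade or two if the decline rates differ enough, which is exactly the long-run bet being described.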

However, the challenges then and now seem as much to do with the software that makes shared external memory universally effective as with the hardware technology to implement it.  Providing a new storage tier in Linux, Windows and/or VMware is easier said than done. Most recent successes have been offshoots of SCSI (iSCSI, FCoE, etc.).  Nevertheless, if it was good for mainframes then, it's certainly good for Linux, Windows and VMware today.

And that seems to be where Thunder is heading, I think.




Making hardware-software systems design easier

Exposed by AMagill (cc) (from Flickr)

Recent research from MIT on streamlining chip design was in the news today.  The report described work done by Nirav Dave, PhD, and Myron King to create a new programming language, BlueSpec, that can convert specifications into hardware chip designs (Verilog) or compile them into software (C++).

BlueSpec designers can tag (annotate) system modules as hardware or software.  The intent of the project is to make it easier to decide what is done in hardware versus software.  By specifying this decision with a language attribute, architectural hardware-software tradeoffs become much easier to make and, as a result, the decision can be delayed until much later in the development cycle.

Hardware-software tradeoffs

Making good hardware-software tradeoffs is especially important in mobile handsets, where power efficiency and system performance requirements often clash.  It's not that unusual in these systems for functionality to be changed from a hardware to a software implementation, or vice versa.

The problem is that the two implementations (hardware and software) use different design languages, and switching would typically require a complete re-coding effort, delaying system deployment significantly.  This makes such decisions all the more important to get right early in system architecture.

In contrast, with BlueSpec, all it would take is a different tag to have the language translate the module into Verilog (a chip design language) or C++ (software code).
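
I haven't written any BlueSpec, but the tagging idea can be mimicked in ordinary code: one module description plus an attribute steering it to a hardware or software backend. The sketch below is a loose analogy only – nothing like BlueSpec's real syntax or semantics, and the module names are invented:

```python
def implement(module, target):
    """Pretend 'compiler' pass: route one module spec to a chosen backend."""
    backends = {
        "hardware": lambda m: f"// Verilog module {m['name']} (...)",
        "software": lambda m: f"/* C++ class {m['name']} {{...}} */",
    }
    return backends[target](module)

# The partitioning decision is just data -- flip a tag, re-run the compiler.
modules = [
    {"name": "packet_filter", "target": "hardware"},  # power-critical: to silicon
    {"name": "ui_logic",      "target": "software"},  # flexible: to C++
]

for m in modules:
    print(implement(m, m["target"]))
```

The design point this illustrates is exactly the article's: because the hardware/software decision is an attribute rather than a rewrite, changing it late costs a recompile, not a re-coding effort.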

Better systems through easier hardware design

There is a long-running debate around commodity hardware versus special purpose hardware in storage systems (see Commodity Hardware Always Loses and Commodity Hardware Debate Heats-up Again).  We believe there will be a continuing place for special purpose-built hardware in storage.  I would go on to say this is likely the case in networking and server systems, as well as telecommunications handsets/back-office equipment.

The team at MIT specifically created their language to help create more efficient mobile phone handsets. But from my perspective it has an equally valid part to play in storage and other systems.

Hardware and software design, more similar than different

Nowadays, hardware and software designers are all just coders using different languages.

Yes, hardware engineers have more design constraints and have to deal with the real, physical world of electronics. But what they deal with most is a hardware design language and design verification tools tailored for their electronic design environment.

Doing hardware design is not that much different from software developers coding in a specific language like C++ or Java.  Software coders must also understand the framework/virtual machine/OS environment their code operates in to produce something that works.  Perhaps design verification tools don't work or even exist in software as much as they should, but that is more a subject for research than a distinction between the two types of designers.


Whether BlueSpec is the final answer or not isn't as interesting as the fact that it has taken a first step toward unifying system design.  Being able to decide much later in the process whether to make a module hardware or software will benefit all system designers and should get products out with less delay.  And getting hardware designers and software coders talking more, using the same language to express their designs, can't help but result in better, more tightly integrated designs, which end up benefiting the world.


Commodity hardware debate heats up again

Gold Nanowire Array by lacomj (cc) (from Flickr)

A post by Chris M. Evans, in his The Storage Architect blog (Intel inside storage arrays) re-invigorated the discussion we had last year on commodity hardware always loses.

But buried in the comments was one from Michael Hay (HDS), which pointed to another post by Andrew Huang on his bunnie's blog (Why the best days of open hardware are ahead), where he has an almost brilliant discussion of how Moore's law will eventually peter out (~5nm), after which it will take much longer to double transistor density.  At that time, hardware customization (by small companies/startups) will once again come to the forefront of new technology development.

Custom hardware, here now and the foreseeable future

Although it would be hard to argue against Andrew's point, I nevertheless firmly believe there is still plenty of opportunity today to customize hardware in ways that bring true value to the market.  The fact is that Moore's law doesn't mean hardware customization cannot still be worthwhile.

Hitachi's VSP (see Hitachi's VSP vs. VMAX) is a fine example of the use of custom ASICs, FPGAs (I believe) and standard off-the-shelf hardware.  HP's 3PAR is another example; they couldn't have their speedy mesh architecture without custom hardware.

But will anyone be around that can do custom chip design?

Nigel Poulton commented on Chris’s post that with custom hardware seemingly going away, the infrastructure, training and people will no longer be around to support any re-invigorated custom hardware movement.

I disagree.  Intel, IBM, Samsung, and many other large companies still maintain active electronics engineering teams/chip design capabilities, any of which are capable of creating state-of-the-art ASICs.  These capabilities are what make Moore's law a reality and will not go away over the long run (the next 20-30 years).

The fact that these competencies are locked up in very large organizations doesn't mean they cannot be used by small companies/startups as well.  It probably does mean that this wherewithal may cost more. But the marketplace will deal with that in the long run – that is, if the need continues to exist.

But do we still need custom hardware?

Custom hardware creates capabilities that magnify Moore's law processing power to do things that standard, off-the-shelf hardware cannot.  The main problem with Moore's law from a custom hardware perspective is that it takes functionality that required custom hardware yesterday (or 18 months ago) and makes it available on off-the-shelf components with custom software today.

This dynamic just means that custom hardware needs to keep moving, providing ever more user benefits and functionality to remain viable.  When custom hardware cannot provide any real benefit over standard off the shelf components – that’s when it will die.

Andrew talks about the time it takes to develop custom ASICs and the fact that by the time you have one ready, a new standard chip has come out that doubles processor capabilities. Yes, custom ASICs take time to develop, but FPGAs can be created and deployed in much less time. FPGAs, like custom ASICs, also take advantage of Moore's law, with increased transistor density every 18 months. Yes, FPGAs may run slower than custom ASICs, but what they lack in processing power, they make up in time to market.

Custom hardware has a bright future as far as I can see.



HDS buys BlueArc

wall o' storage (fisheye) by ChrisDag (cc) (From Flickr)

Yesterday, HDS announced that they had closed on the purchase of BlueArc, their NAS supplier for the past 5 years or so.  Many commentators mentioned that this was a logical evolution of their ongoing OEM agreement, that the timing was right, and speculated on what the purchase price might have been.  If you are interested in these aspects of the acquisition, I refer you to the excellent post by David Vellante of Wikibon on the HDS BlueArc deal.

Hardware as a key differentiator

In contrast, I would like to concentrate here on another view of the purchase, specifically on how HDS and Hitachi, Ltd. have both been working to increase their product differentiation through advanced and specialized hardware (see my post on Hitachi’s VSP vs VMAX and for more on hardware vs. software check out Commodity hardware always loses).

Similarly, BlueArc shared this philosophy and was one of the few NAS vendors to develop special purpose hardware for their Titan and Mercury systems, specifically to speed up NFS and CIFS processing.  Most other NAS systems use more general purpose hardware and, as a result, a majority of their R&D investment focuses on software functionality.

But not BlueArc; their performance advantage was highly dependent on specially designed FPGAs and other hardware.  As such, they carried a significant hardware R&D budget to continue to maintain and leverage their unique hardware advantage.

From my perspective, this follows what HDS and Hitachi, Ltd. have been doing all along with the USP, USP-V, and now their latest entrant, the VSP.  If you look under the covers of these products you find a plethora of special purpose ASICs, FPGAs and other hardware that help accelerate IO performance.

BlueArc and HDS/Hitachi, Ltd. seem to be some of the last vendors standing that still believe hardware specialization can bring value to data storage. From that standpoint, it makes an awful lot of sense to me for HDS to purchase them.

But others aren’t standing still

In the meantime, scale-out NAS products continue to move forward on a number of fronts.  As readers of my newsletter know, the current SPECsfs2008 overall performance winner is a scale-out NAS solution using 144 nodes from EMC Isilon (newsletter signup is above right or can also be found here).

The fact that now HDS/Hitachi, Ltd. can bring their considerable hardware development skills and resources to bear on helping BlueArc develop and deploy their next generation of hardware is a good sign.

Another interesting tidbit is HDS's previous purchase of ParaScale, which seems to have some scale-out NAS capabilities of its own.  How this all gets pulled together within HDS's product line remains to be seen.

In any event, all this means that the battle for NAS isn’t over and is just moving to a higher level.



One platform to rule them all – Compellent&EqualLogic&Exanet from Dell

Compellent drive enclosure (c) 2010 Compellent (from

Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystem. If there’s any company out there that can take advantage of commodity hardware it’s probably Dell. (Of course Commodity hardware always loses in the end, but that’s another story).

Similarly, Dell's EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services.  It doesn't take a big leap of imagination to envision one storage system that combines the functionality of EqualLogic's iSCSI and Compellent's FC storage capabilities.  Of course, others are already doing this, including Compellent themselves, who have iSCSI support already built into their FC storage system.

Which way to integrate?

Does EqualLogic survive such a merger?  I think so.  It's easy to imagine that EqualLogic may have the bigger market share today. If so, the right thing might be to merge Compellent's FC functionality into EqualLogic.  If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both.  It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-to-iSCSI approach may be the optimum path for Dell, because of the popularity of their EqualLogic storage.

What about NAS?

The only thing missing from this storage system is NAS.  Of course, Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end.  Dell's EqualLogic does much the same to offer NAS protocols for their iSCSI system.  Neither of these is a bad solution, but they are not fully integrated NAS offerings such as those available from NetApp and others.

However, there is a little-discussed piece, the Dell-Exanet acquisition, which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the marketplace.  It's been a while since I have heard about Exanet, and nothing since the acquisition earlier this year.  Does it make sense to back-end a clustered NAS solution with FC storage?  Probably.


Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed.  If I look at where Dell is weakest (barring their OEM agreement with EMC), it's in the high-end storage space.  Compellent probably didn't have as much of a footprint there as possible, due to their limited distribution and support channels.  A Dell acquisition could easily eliminate these problems and open up this space without having to do much other than start marketing, selling and supporting Compellent.

In the end, a storage solution supporting clustered NAS, FC, and iSCSI, combining functionality equivalent to Exanet, Compellent and EqualLogic on commodity hardware (ouch!), could be a formidable competitor to what's out there today, if done properly. Whether Dell could actually pull this off, and in a timely manner, even if they purchase Compellent, is another question.


Commodity hardware always loses

Herman Miller's Embody Chair by johncantrell (cc) (from Flickr)
A recent post by Stephen Foskett has revisited a blog discussion that Chuck Hollis and I had on commodity vs. special purpose hardware.  It's clear to me that commodity hardware is a losing proposition for the storage industry and for storage users as a whole.  Not sure why everybody else disagrees with me about this.

It's all about delivering value to the end user.  If one can deliver value with commodity hardware equivalent to what's possible with special purpose hardware, then obviously commodity hardware wins – no question about it.

But, and it's a big BUT, when a company invests in special purpose hardware, it has an opportunity to deliver better value to its customers.  Yes, it's going to be more expensive on a per-unit basis, but that doesn't mean it can't deliver commensurate benefits to offset that cost disadvantage.

Supercar Run 23 by VOD Cars (cc) (from Flickr)

Look around; one sees special purpose hardware everywhere. For example, just check out Apple's iPad, iPhone, and iPod, to name a few.  None of these would be possible without special, non-commodity hardware.  Yes, if one disassembles these products, one may find some commodity chips, but I venture that the majority of the componentry is special purpose, one-off designs that aren't readily purchasable from any chip vendor.  And the benefit this brings, aside from the coolness factor, is significant miniaturization with advanced functionality.  The popularity of these products proves my point entirely – value sells, and special purpose hardware adds significant value.

One may argue that the storage industry doesn't need such radical miniaturization.  I disagree, of course, but even so, there are other more pressing concerns worthy of hardware specialization, such as reduced power and cooling, increased data density and higher IO performance, to name just a few.  Can some of this be delivered with SBB and other mass-produced hardware designs?  Perhaps.  But I believe that with judicious selection of special purpose hardware, the storage value delivered along these dimensions can be 10 times more than what can be done with commodity hardware.

Cuba Gallery: France / Paris / Louvre / architecture / people / buildings / design / style / photography by Cuba Gallery (cc) (from Flickr)

Special purpose HW cost and development disadvantages denied

The other argument for commodity hardware is the belief that it's just easier to develop and deliver functionality in software than in hardware.  (I disagree; software functionality can be much harder to deliver than hardware functionality – maybe a subject for a different post.)  But hardware development is becoming more software-like every day.  Most hardware engineers do as much coding as any software engineer I know, and then some.

Then there's the cost of special purpose hardware, but ASIC manufacturing is getting more commodity-like every day.  Several hardware design shops sell off-the-shelf processor and other logic designs that one can readily incorporate into an ASIC, and fabs can be found that will manufacture any ASIC design at a moderate price in reasonable volumes.  And if one doesn't need the cost advantage of ASICs, one can use FPGAs and CPLDs to develop special purpose hardware with programmable logic.  This will cut engineering and development lead times considerably but will cost commensurately more per unit than ASICs.
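
The ASIC-versus-FPGA choice above reduces to up-front (NRE) cost against per-unit cost: the ASIC wins once volume repays its NRE. A sketch with hypothetical figures (invented for illustration, not real quotes):

```python
def breakeven_units(nre_asic, unit_asic, nre_fpga, unit_fpga):
    """Volume at which an ASIC's extra up-front (NRE) cost is repaid
    by its lower per-unit cost versus an FPGA. All inputs are assumptions."""
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)

# Hypothetical: $2M ASIC NRE vs. $100K FPGA tooling,
# $15/unit ASIC vs. $120/unit FPGA
n = breakeven_units(2_000_000, 15, 100_000, 120)
print(f"ASIC pays off above ~{n:,.0f} units")  # ASIC pays off above ~18,095 units
```

Below that volume the FPGA's lower NRE and faster time to market win; above it, the ASIC's unit cost dominates – which is the tradeoff the paragraph above describes.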

Do we ever stop innovating?

Probably the hardest argument to counter is that, over time, commodity hardware becomes proficient at providing the same value as special purpose hardware.  Although this may be true, products don't have to stand still.  One can continue to innovate and always increase the market-delivered value of any product.

If there comes a time when further product innovation is not valued by the market, then and only then does commodity hardware win.  However, chairs, cars, and buildings have all been around for many years – decades, even centuries now – and innovation continues to deliver added value.  I can't see where the data storage business will be any different a century or two from now…