Hardware vs. software innovation – round 4

We, the industry and I, have had a long-running debate over whether hardware innovation still makes sense (see my Hardware vs. software innovation – rounds 1, 2, & 3 posts).

The news within the last week or so is that Dell-EMC cancelled their multi-million dollar DSSD project, a hardware-innovation-intensive, Tier 0 flash storage solution offering 10 million IO/sec at 100µsec response times to a rack of servers.

DSSD required specialized hardware and software in the client or host server, specialized cabling between the client and the DSSD storage device, and specialized hardware and flash storage in the storage device itself.

What ultimately did DSSD in was the emergence of NVMe protocols, NVMe SSDs and RoCE (RDMA over Converged Ethernet) NICs.

Last week’s post on Excelero (see my 4.5M IO/sec@227µsec … post) was just one example of what can be done with such “commodity” hardware. We just finished a GreyBeardsOnStorage podcast (GreyBeards podcast with Zivan Ori, CEO & Co-founder, E8 storage) with E8 Storage, which takes yet another approach to using NVMe-RoCE “commodity” hardware to provide amazing performance.

Both Excelero and E8 Storage offer over 4 million IO/sec with ~120 to ~230µsec response times to multiple racks of servers. All this with off-the-shelf, commodity hardware and lots of software magic.
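
To put those numbers in perspective, Little’s Law (outstanding requests = throughput × response time) tells you how much I/O the host side must keep in flight to sustain them. A quick, back-of-the-envelope sketch in Python, using only the round figures quoted above rather than any vendor-measured data:

```python
# Little's Law: concurrency = IOPS x average response time.
# The (iops, latency) pairs below are the rough numbers quoted in the post,
# not measured results.

def outstanding_ios(iops: float, latency_usec: float) -> float:
    """Outstanding I/Os (queue depth summed across all hosts) needed to
    sustain `iops` at an average response time of `latency_usec`."""
    return iops * (latency_usec * 1e-6)

for iops, lat in [(4_000_000, 120), (4_000_000, 230), (10_000_000, 100)]:
    print(f"{iops:>10,} IO/sec @ {lat:>3}µs -> ~{outstanding_ios(iops, lat):,.0f} I/Os in flight")
```

In other words, a few hundred to a thousand outstanding I/Os spread across multiple racks of servers, which is comfortably within what ordinary NVMe queue depths can supply.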

Lessons for future hardware innovation

What can be learned for future hardware innovation from the transition away from DSSD to NVMe (SSDs & protocol) and RoCE:

  1. Closely track all commodity hardware innovations, especially ones that offer similar functionality and/or performance to what you are doing with your hardware.
  2. Intensely focus any specialized hardware innovation on the small subset of functionality that gives you the most benefit at minimum cost, and avoid unnecessary changes to other hardware.
  3. Speed up the hardware design-validation-prototype-production cycle as much as possible to get your solution to market faster, and try to stay ahead of commodity hardware innovation for as long as possible.
  4. When (and not if) commodity hardware innovation emerges that provides similar functionality/performance, abandon your hardware approach as quickly as possible and adopt commodity hardware.

Of all the above, I believe the main problem is hardware innovation cycle times. Yes, hardware innovation costs too much (not discussed above), but I believe those costs are a concern only if the product doesn’t succeed in the market.

When a storage (or any systems) company can start up and in 18-24 months produce a competitive product with only software development and aggressive hardware sourcing/validation/testing, specialized hardware innovation that takes 18 months to start and another 1-2 years to get to GA is way too long.

What’s the solution?

I think FPGAs have to be a part of any solution to making hardware innovation faster. With FPGAs, hardware innovation can occur in days to weeks rather than months to years. Yes, ASICs cost much less, but cycle time is THE problem from my perspective.

I’d like to think that ASIC development cycle times for design, validation, prototype and production could also be reduced, but I don’t see how. Maybe AI can help reduce design-validation time. But independent fabs can only speed up the prototype and production phases for new ASICs so much.

ASIC failures also happen on a regular basis. There’s got to be a way to fix ASIC and other hardware errors more quickly. Yes, some hardware fixes can be done in software, but occasionally the fix requires hardware changes. A quicker hardware fix process would help.

Finally, there must be an expectation that commodity hardware will catch up eventually, especially if the market is large enough. So an eventual changeover to commodity hardware should be baked in from the start.

~~~~

In the end, project failures like this happen. Hardware innovation needs to learn from them and move on. I commend Dell-EMC for making the hard decision to kill the project.

There will be a next time for specialized hardware innovation and it will be better. There are just too many problems that remain in the storage (and systems) industry and a select few of these can only be solved with specialized hardware.

Comments?

Picture credit(s): Gravestones by Sherry Nelson; Motherboard 1 by Gareth Palidwor; Copy of a DSSD slide photo taken from EMC presentation by Author (c) Dell-EMC

TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, Google revealed (see Google supercharges machine learning tasks …) that they had been designing and operating their own processor chips in order to optimize machine learning.

They called the new chip a Tensor Processing Unit (TPU). According to Google, the TPU provides an order of magnitude more power-efficient machine learning than what’s achievable with off-the-shelf GPUs/CPUs. TensorFlow is Google’s open-sourced machine learning software.
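
For readers who haven’t looked at it, TensorFlow workloads ultimately boil down to large dense tensor operations, and that is exactly the arithmetic the TPU was built to accelerate. A minimal sketch, assuming a stock TensorFlow 2.x install (the original TPU targeted TensorFlow’s earlier graph runtime, and real TPU use needs additional runtime setup not shown here):

```python
import time
import tensorflow as tf  # assumes a stock TensorFlow 2.x install

# A toy stand-in for the dense linear algebra inside a neural network layer.
a = tf.random.normal([2048, 2048])
b = tf.random.normal([2048, 2048])

start = time.time()
c = tf.matmul(a, b)   # runs on whatever device TF finds: CPU, GPU or (with setup) TPU
_ = c.numpy()         # force execution before stopping the clock
print(f"2048x2048 matmul took {time.time() - start:.4f}s")
print("devices available:", tf.config.list_physical_devices())
```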

This is very interesting, as Google and the rest of the hype-scale hive seem to have latched onto open-sourced software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization/innovation is dead and the only thing anyone needs is software developers. I believe this is incorrect and that hardware innovation combined with software innovation is a better way (see Commodity hardware always loses and Better storage through hardware posts).

Commodity hardware loses again …

It seemed only a couple of years back that everyone was touting how hardware engineering no longer mattered.

What with Intel and others playing out Moore’s law, why invest in hardware engineering when the real smarts were all in software?

I said then that hardware-engineered solutions still had a significant role to play, but few believed me (see my posts: Commodity hardware always loses and Commodity hardware debate heats up).

Well, hardware’s back …

A few examples:

  1. EMC DSSD – at EMCWorld 2015 a couple of weeks back, EMC demoed a new rack-scale flash storage system targeted at extremely high IOPS and very low latency. DSSD is a classic study in how proprietary hardware can enable new levels of performance. The solution connected to servers over a PCIe switched network, which didn’t really exist before, and used hardware-engineered flash modules that were extremely dense, fast and reliable. (See my EMCWorld post on DSSD and our Greybeards on Storage (GBoS) podcast with Chad Sakac for more info on DSSD.)
  2. Diablo Memory Channel Storage (MCS)/SanDisk UltraDIMMs – Diablo’s MCS is coming out in SanDisk’s UltraDIMM NAND storage that plugs into DRAM slots and provides memory-paged access to NAND storage. The key is that the hardware logic keeps NAND access overhead to ~50 μsec. (We’ve written about MCS and UltraDIMMs here.)
  3. Hitachi VSP G1000 storage and their Hitachi Accelerated Flash (HAF) – recent SPC-1 results showed that a G1000 outfitted with HAF modules could generate over 2M IOPS with very low latency (220 μsec). (See our announcement summary on the Hitachi G1000 here.)

Diablo ran into some legal problems but that’s all behind them now, so the way forward is clear of any extraneous hurdles.

There are other examples of proprietary hardware engineering from IBM FlashSystem, networking companies, PCIe flash vendors and others, but these will suffice to make my point.

My point is that if you want to gain orders of magnitude better performance, you need to seriously consider engaging in some proprietary hardware engineering. Proprietary hardware may take longer than software-only solutions (although that’s somewhat a function of the resources you throw at it), but the performance gains are sometimes unobtainable any other way.

~~~~

Chad made an interesting point on our GBoS podcast: hardware innovation is somewhat cyclical. For a period of time, commodity hardware is much better than any storage solution really needs, so innovation swings to the software arena. But over time, software functionality comes up to speed and maxes out the hardware that’s available, and then you need more hardware innovation to take performance to the next level. Then the cycle swings back to hardware engineering. And the cycle will swing back and forth many more times before storage is ever through as an IT technology.

Today, when there seems to be a new software-defined storage solution coming out every month, we look very close to peak software innovation, with little left in the way of performance gains. But there’s still plenty left if we open our eyes to proprietary hardware.

Welcome to the start of the next hardware innovation cycle – take that, commodity hardware.

Comments?

Why EMC is doing Project Lightning and Thunder

rayo 3 by El Garza (cc) (from Flickr)

Although technically Project Lightning and Thunder represent some interesting offshoots of EMC software, hardware and system prowess, I wonder why they would decide to go after this particular market space.

There are plenty of alternative offerings in the PCIe NAND memory card space. Moreover, the PCIe card caching functionality, while interesting, is not that hard to replicate, and such software capability is not a serious barrier to entry for HP, IBM, NetApp and many, many others. And the margins cannot be that great.

So why get into this low margin business?

I can see a few reasons why EMC might want to do this.

  • Believing in the commoditization of storage performance. I have had this debate with a number of analysts over the years, but there remain many out there who firmly believe that storage performance will become a commodity sooner rather than later. By entering the PCIe NAND card IO buffer space, EMC can create a beachhead in this movement that helps them build market awareness, higher manufacturing volumes, and support expertise. As such, when the inevitable happens and high margins for enterprise storage start to deteriorate, EMC will be able to capitalize on this hard-won operational effectiveness.
  • Moving up the IO stack. From an application’s IO request to the disk device that actually services it is a long journey, with multiple places to make money. Currently, EMC has a significant share of everything that happens after the fabric switch, whether it is FC, iSCSI, NFS or CIFS. What they don’t have is a significant share in the switch infrastructure or anywhere on the other (host) side of that interface stack. Yes, they have Avamar, Networker, Documentum, and other software that help manage, secure and protect IO activity, together with other significant investments in RSA and VMware. But these represent adjacent market spaces rather than primary IO stack endeavors. Lightning represents a hybrid software/hardware solution that moves EMC up the IO stack to inside the server. As such, it represents yet another opportunity to profit from all the IO going on in the data center.
  • Making big data more effective. The fact that Hadoop doesn’t really need or use high-end storage has not been lost on most storage vendors. With Lightning, EMC has a storage enhancement offering that can readily improve Hadoop cluster processing. Something like Lightning’s caching software could easily be tailored to enhance HDFS file access and thus speed up cluster processing (a toy sketch of the caching idea follows this list). If Hadoop and big data are to be the next big consumer of storage, then speeding up cluster processing will certainly help, and profiting by doing so only makes sense.
  • Believing that SSDs will transform storage. To many of us the age of disks is waning. SSDs, in some form or another, will be the underlying technology for the next age of storage. The densities, performance and energy efficiency of current NAND-based SSD technology are commendable, but they will only get better over time. The capabilities brought about by such technology will certainly transform the storage industry as we know it, if they haven’t already. But where SSD technology actually emerges is still being played out in the marketplace. Many believe that when industry transitions like this happen, it’s best to be engaged everywhere change is likely to happen, hoping that at least some of those bets will succeed. Perhaps PCIe SSD cards won’t take over all server IO activity, but if they do, not being there or being late will certainly hurt a company’s chances to profit from it.
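
As promised above, here is a toy, hypothetical read-through cache in Python. It has nothing to do with EMC’s actual caching code; it just illustrates how a server-side flash cache layered under HDFS-style block reads could turn repeat reads into fast local hits. The `read_from_disk` callback and the block-count capacity are assumptions made up for the sketch:

```python
from collections import OrderedDict
from typing import Callable

class ReadThroughCache:
    """Toy LRU read cache, standing in for a server-side flash cache
    sitting underneath HDFS-style block reads."""

    def __init__(self, read_from_disk: Callable[[int], bytes], capacity_blocks: int = 1024):
        self.read_from_disk = read_from_disk   # slow path (spinning disk or network)
        self.capacity = capacity_blocks
        self.cache = OrderedDict()             # block_id -> block data, LRU ordered
        self.hits = self.misses = 0

    def read_block(self, block_id: int) -> bytes:
        if block_id in self.cache:             # fast path: served from local flash/DRAM
            self.cache.move_to_end(block_id)
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.read_from_disk(block_id)   # slow path, then populate the cache
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used block
        return data
```

The interesting engineering, of course, is doing this at PCIe flash speeds and keeping it coherent across a Hadoop cluster, which is where the hardware/software co-design comes in.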

There may be more reasons I missed here, but these seem to be the main ones. Of the above, I think the last one, that SSDs rule the next transition, is the most important to EMC.

They have been successful in the past during other industry transitions. If anything, their acquisition history shows a pattern of buying into transitions they don’t already own: witness Data Domain, RSA, and VMware. So I suspect the view inside EMC is that doubling down on SSDs will enable them to ride out the next storm and be in a profitable place for the next change, whatever that might be.

And following Lightning, Project Thunder

Similarly, Project Thunder seems to represent EMC doubling their bet yet again on SSDs. Just about every month I talk to another storage startup coming to market with another new take on storage using every form of SSD imaginable.

However, Project Thunder as envisioned today is not storage, but rather some form of external shared memory.  I have heard this before, in the IBM mainframe space about 15-20 years ago.  At that time shared external memory was going to handle all mainframe IO processing and the only storage left was going to be bulk archive or migration storage – a big threat to the non-IBM mainframe storage vendors at the time.

One problem then was that the shared DRAM memory of the time was way more expensive than sophisticated disk storage, and the price wasn’t coming down fast enough to counteract increased demand. The other problem was that making shared memory work with all the existing mainframe applications was not easy. IBM at least had control over the OS, hardware and most of the larger applications at the time. Yet they still struggled to make it usable and effective; there’s probably a lesson here for EMC.

Fast forward 20 years, and NAND-based SSDs are the right hardware technology to make inexpensive shared memory happen. In addition, the roadmap for NAND and other SSD technologies looks poised to continue the capacity increases and price reductions necessary to compete effectively with disk in the long run.

However, the challenges then and now have as much to do with the software that makes shared external memory universally effective as with the hardware technology to implement it. Providing a new storage tier in Linux, Windows and/or VMware is easier said than done. Most recent successes have usually been offshoots of SCSI (iSCSI, FCoE, etc.). Nevertheless, if it was good for mainframes then, it’s certainly good for Linux, Windows and VMware today.

And that seems to be where Thunder is heading, I think.

Comments?

 


Making hardware-software systems design easier

Exposed by AMagill (cc) (from Flickr)

Recent research from MIT on streamlining chip design was in the news today. The report described work done by Nirav Dave PhD and Myron King to create a new programming language, BlueSpec, that can convert specifications into a hardware chip design (Verilog) or compile them into software (C++).

BlueSpec designers can tag (annotate) system modules to be hardware or software. The intent of the project is to make it easier to decide what is done in hardware versus software. By specifying this decision as a language attribute, architectural hardware-software tradeoffs become much easier to make and, as a result, the decision can be delayed until much later in the development cycle.

Hardware-software tradeoffs

Making good hardware-software tradeoffs is especially important in mobile handsets, where power efficiency and system performance requirements often clash. It’s not that unusual in these systems for functionality to be moved from a hardware to a software implementation, or vice versa.

The problem is that the two different implementations (hardware or software) use different design languages, and changing one into the other would typically require a complete re-coding effort, delaying system deployment significantly. That makes such decisions all the more important to get right early in the system architecture.

In contrast, with BlueSpec, all it would take is a different tag to have the language translate the module into Verilog (chip design language) or C++ (software code).
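
As an analogy only, here is a toy Python sketch of that idea. This is not BlueSpec syntax, and the decorator, tag names and backends below are invented for illustration; it just shows how a single module definition plus a one-word annotation could decide which backend (hardware or software) the toolchain emits:

```python
# Toy illustration of hardware/software tagging -- NOT BlueSpec syntax.
# The decorator and backend names below are invented for this sketch.

REGISTRY = {}

def module(target):
    """Tag a module for the 'hardware' or 'software' backend."""
    def wrap(fn):
        REGISTRY[fn.__name__] = target
        return fn
    return wrap

@module(target="hardware")        # flip this one word to "software" to retarget
def checksum(block: bytes) -> int:
    return sum(block) & 0xFFFF

def emit(name: str) -> None:
    target = REGISTRY[name]
    # In BlueSpec the toolchain would emit Verilog or C++ here; we just report it.
    backend = "Verilog (chip design)" if target == "hardware" else "C++ (software)"
    print(f"module '{name}' -> {backend}")

emit("checksum")
```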

Better systems through easier hardware design

There is a long-running debate around commodity hardware versus special-purpose, hardware-designed systems in storage (see Commodity Hardware Always Loses and Commodity Hardware Debate Heats-up Again). We believe there will be a continuing place for special-purpose hardware in storage. I would go on to say this is likely the case in networking and server systems as well as telecommunications handsets/back-office equipment.

The team at MIT specifically created their language to help create more efficient mobile phone handsets. But from my perspective it has an equally valid role to play in storage and other systems.

Hardware and software design, more similar than different

Nowadays, hardware and software designers are all just coders using different languages.

Yes, hardware engineers have more design constraints and have to deal with the real, physical world of electronics. But what they deal with most is a hardware design language and design verification tools tailored for their electronic design environment.

Doing hardware design is not that much different from software developers coding in a specific language like C++ or Java. Software coders must also understand the framework/virtual machine/OS environment their code operates in to produce something that works. Perhaps design verification tools aren’t as mature or as widely used in software as they should be, but that is more a subject for research than a real distinction between the two types of designers.

—-

Whether BlueSpec is the final answer or not isn’t as interesting as the fact that it has taken a first step toward unifying system design. Being able to decide much later in the process whether to make a module hardware or software will benefit all system designers and should get products out with less delay. But getting hardware designers and software coders talking more, using the same language to express their designs, can’t help but result in better, more tightly integrated designs, which ends up benefiting everyone.

Comments?

Commodity hardware debate heats up again

Gold Nanowire Array by lacomj (cc) (from Flickr)

A post by Chris M. Evans in his The Storage Architect blog (Intel inside storage arrays) re-invigorated the discussion we had last year on whether commodity hardware always loses.

But buried in the comments was one from Michael Hay (HDS), which pointed to another blog post by Andrew Huang in his bunnie’s blog (Why the best days of open hardware are ahead), where he has an almost brilliant discussion of how Moore’s law will eventually peter out (~5nm), at which point it will take much longer to double transistor density. At that time, hardware customization (by small companies/startups) will once again come to the forefront of new technology development.

Custom hardware, here now and for the foreseeable future

Although it would be hard to argue against Andrew’s point, I nevertheless firmly believe there is still plenty of opportunity today for customized hardware that brings true value to the market. The fact is that Moore’s law doesn’t mean hardware customization cannot still be worthwhile.

Hitachi’s VSP (see Hitachi’s VSP vs. VMAX) is a fine example of the use of custom ASICs, FPGAs (I believe) and standard off-the-shelf hardware. HP’s 3PAR is another example; they couldn’t have their speedy mesh architecture without custom hardware.

But will anyone be around who can do custom chip design?

Nigel Poulton commented on Chris’s post that with custom hardware seemingly going away, the infrastructure, training and people will no longer be around to support any re-invigorated custom hardware movement.

I disagree. Intel, IBM, Samsung, and many other large companies still maintain active electronics engineering teams and chip design capabilities, any of which are capable of creating state-of-the-art ASICs. These capabilities are what make Moore’s law a reality and will not go away over the long run (the next 20-30 years).

The fact that these competencies are locked up in very large organizations doesn’t mean they cannot be used by small companies/startups as well. It probably does mean that this wherewithal may cost more. But the marketplace will deal with that in the long run, that is, if the need continues to exist.

But do we still need custom hardware?

Custom hardware magnifies Moore’s law processing capabilities to do things that standard, off-the-shelf hardware cannot. The main problem with Moore’s law from a custom hardware perspective is that it takes functionality that required custom hardware yesterday (or 18 months ago) and makes it available today on off-the-shelf components with custom software.

This dynamic just means that custom hardware needs to keep moving, providing ever more user benefits and functionality to remain viable. When custom hardware cannot provide any real benefit over standard off-the-shelf components – that’s when it will die.

Andrew talks about the time it takes to develop custom ASICs and the fact that by the time you have one ready, a new standard chip has come out that doubles processor capabilities. Yes, custom ASICs take time to develop, but FPGA designs can be created and deployed in much less time. FPGAs, like custom ASICs, also take advantage of Moore’s law, with increased transistor density every 18 months. Yes, FPGAs may run slower than custom ASICs, but what they lack in processing power, they make up for in time to market.

Custom hardware has a bright future as far as I can see.

—–

Comments?

HDS buys BlueArc

wall o' storage (fisheye) by ChrisDag (cc) (From Flickr)

Yesterday, HDS announced that they had closed on the purchase of BlueArc, their NAS supplier for the past 5 years or so. Many commentators mentioned that this was a logical evolution of their ongoing OEM agreement, noted how the timing was right, and speculated on what the purchase price might have been. If you are interested in these aspects of the acquisition, I would refer you to the excellent post by David Vellante from Wikibon on the HDS BlueArc deal.

Hardware as a key differentiator

In contrast, I would like to concentrate here on another view of the purchase, specifically on how HDS and Hitachi, Ltd. have both been working to increase their product differentiation through advanced and specialized hardware (see my post on Hitachi’s VSP vs VMAX and for more on hardware vs. software check out Commodity hardware always loses).

Similarly, BlueArc shared this philosophy and was one of the few NAS vendors to develop special-purpose hardware for their Titan and Mercury systems, specifically to speed up NFS and CIFS processing. Most other NAS systems use more general-purpose hardware and, as a result, a majority of their R&D investment focuses on software functionality.

But not BlueArc; their performance advantage was highly dependent on specially designed FPGAs and other hardware. As such, they have a significant hardware R&D budget to continue to maintain and leverage their unique hardware advantage.

From my perspective, this follows what HDS and Hitachi, Ltd. have been doing all along with the USP, USP-V, and now their latest entrant, the VSP. If you look under the covers of these products you find a plethora of special-purpose ASICs, FPGAs and other hardware that helps accelerate IO performance.

BlueArc and HDS/Hitachi, Ltd. seem to be some of the last vendors standing that still believe that hardware specialization can bring value to data storage. From that standpoint, it makes an awful lot of sense to me to have HDS purchase them.

But others aren’t standing still

In the meantime, scale-out NAS products continue to move forward on a number of fronts. As readers of my newsletter know, the current SPECsfs2008 overall performance winner is a 144-node scale-out NAS solution from EMC Isilon.

The fact that HDS/Hitachi, Ltd. can now bring their considerable hardware development skills and resources to bear on helping BlueArc develop and deploy their next generation of hardware is a good sign.

Another interesting tidbit is HDS’s previous purchase of ParaScale, which seems to have some scale-out NAS capabilities of its own. How this all gets pulled together within HDS’s product line remains to be seen.

In any event, all this means that the battle for NAS isn’t over and is just moving to a higher level.

—-

Comments?

One platform to rule them all – Compellent & EqualLogic & Exanet from Dell

Compellent drive enclosure (c) 2010 Compellent (from Compellent.com)

Dell and Compellent may be a great match because Compellent uses commodity hardware combined with specialized software to create their storage subsystem. If there’s any company out there that can take advantage of commodity hardware, it’s probably Dell. (Of course, commodity hardware always loses in the end, but that’s another story.)

Similarly, Dell’s EqualLogic iSCSI storage system uses commodity hardware to provide its iSCSI storage services. It doesn’t take a big leap of imagination to envision one storage system that combines the functionality of EqualLogic’s iSCSI and Compellent’s FC storage capabilities. Of course, there are others already doing this, including Compellent themselves, who already have iSCSI support built into their FC storage system.

Which way to integrate?

Does EqualLogic survive such a merger? I think so. It’s easy to imagine that EqualLogic may have the bigger market share today. If that’s so, the right thing might be to merge Compellent FC functionality into EqualLogic. If Compellent has the larger market, the correct approach is the opposite. The answer probably lies with a little of both. It seems easier to add iSCSI functionality to an FC storage system than the converse, but the FC-into-iSCSI approach may be the optimum path for Dell because of the popularity of their EqualLogic storage.

What about NAS?

The only thing missing from this storage system is NAS. Of course, the Compellent storage offers a NAS option through the use of a separate Windows Storage Server (WSS) front end. Dell’s EqualLogic does much the same to offer NAS protocols for their iSCSI system. Neither of these is a bad solution, but they are not fully integrated NAS offerings such as are available from NetApp and others.

However, there is a little-discussed piece of the puzzle: the Dell-Exanet acquisition, which happened earlier this year. Perhaps the right approach is to integrate Exanet with Compellent first and target this at the high-end enterprise/HPC marketplace, keeping EqualLogic at the SMB end of the market. It’s been a while since I have heard anything about Exanet, and nothing since the acquisition earlier this year. Does it make sense to back-end a clustered NAS solution with FC storage? Probably.

—-

Much of this seems doable to me, but it all depends on making the right moves once the purchase is closed. If I look at where Dell is weakest (barring their OEM agreement with EMC), it’s in the high-end storage space. Compellent probably didn’t have much of a footprint there, possibly due to their limited distribution and support channels. A Dell acquisition could easily eliminate these problems and open up this space without having to do much other than start marketing, selling and supporting Compellent.

In the end, a storage solution supporting clustered NAS, FC, and iSCSI that combined functionality equivalent to Exanet, Compellent and EqualLogic on commodity hardware (ouch!) could make a formidable competitor to what’s out there today, if done properly. Whether Dell could actually pull this off, and in a timely manner, even if they purchase Compellent, is another question.

Comments?