Are neuromorphic chips a dead end?

Read a recent article in IEEE Spectrum about Intel’s Pohoiki Beach neuromorphic system, which has scaled their Loihi chips up to 8M neurons (Intel’s neuromorphic system hits 8M neurons). In the last month or so I also wrote about two startups, one of which seemed (?) to be working on neuromorphic chip development (see my Photonics computing sees the light of day post).


I’ve been writing about neuromorphic chips since 2011, 8 long years now (see my IBM SyNAPSE chip post from 2011 or search my site for “neuromorphic”), and none has successfully reached the market. The problems with neuromorphic architectures have always been twofold: scaling AND software.

Scaling up neurons

The human brain has ~86B neurons (see the Wikipedia human brain article). So 8 million neuromorphic neurons is great, but it’s about 10,000X too few. And that doesn’t count the connections between neurons. Some human neurons have over 1,000 connections to other nerve cells (can’t seem to find this reference anymore?).

Chemical synapse schema (cropped), Wikimedia Commons (481px-Chemical_synapse_schema_cropped)

To get from a single chip with 125K neurons to their 8M neuron system, Intel took 64 chips and put them on a couple of boards. To scale that to 86B or so would take ~690,000 of their neuromorphic chips. Now, no one can say whether some level below 86B neuromorphic neurons couldn’t support a useful AI solution, but the scaling problem still exists.
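
To put some rough numbers on that (my own back-of-envelope arithmetic, using the article’s figures):

```python
# Back-of-envelope scaling estimate (my arithmetic, not Intel's figures)
neurons_per_chip = 125_000            # Loihi, per the IEEE Spectrum article
human_brain_neurons = 86_000_000_000  # ~86B neurons

chips_in_pohoiki_beach = 64
pohoiki_neurons = neurons_per_chip * chips_in_pohoiki_beach   # 8,000,000

chips_for_brain_scale = human_brain_neurons / neurons_per_chip
print(f"Pohoiki Beach neurons: {pohoiki_neurons:,}")
print(f"Chips needed for ~86B neurons: {chips_for_brain_scale:,.0f}")   # ~688,000
```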

Then there’s the problem of synapse connections between neuromorphic neurons. The article says that Loihi chips are connected in a hierarchical routing network, which implies to me that there are switches and master switches (and maybe a really big master switch) in their 8M neuromorphic neuron system. Adding another 4 orders of magnitude more neuromorphic neurons to this may be impossible, or at least may require another 4 sets of progressively larger switches to be added to their interconnect network. There’s a question of how many hops, and the resultant latency, in connecting two neuromorphic neurons together, but that seems to be the least of the problems with neuromorphic architectures.
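
A crude way to see why: if each switch in such a hierarchy fans out to, say, 64 children (a radix I’m making up purely for illustration, Intel hasn’t published one as far as I know), the number of switch levels grows with the log of the chip count:

```python
import math

def hierarchy_levels(num_chips, radix=64):
    """Levels of a tree-structured interconnect needed to span num_chips,
    assuming each switch connects `radix` children (the radix is my
    assumption, not a published Loihi figure)."""
    return max(1, math.ceil(math.log(num_chips, radix)))

print(hierarchy_levels(64))        # 1 level for the current 64-chip system
print(hierarchy_levels(690_000))   # ~4 levels for a brain-scale system
```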

Missing software abstractions

The first time I heard about neuromorphic chips, I asked what the software looked like. The only thing I heard was that it was complex and not very user friendly, and they didn’t want to talk about it.

I keep asking about software for neuromorphic chips and still haven’t gotten a decent answer. So, what’s the problem? In this day and age, software is easy to do and relatively inexpensive to produce, and it can range from spaghetti code to hierarchical masterpieces, so there’s plenty of room to innovate here.

But whenever I talk to engineers about what the software looks like, it almost seems like a software version of an early plugboard, unit-record computer (essentially card sorters). Only instead of wires, you have software neuromorphic network connections, and instead of electromechanical devices, you have spiking neuromorphic neuron hardware.
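
To give a feel for just how low that level of abstraction is, here’s a minimal leaky integrate-and-fire neuron in plain Python. This is a generic textbook model, not Intel’s or IBM’s actual programming interface:

```python
# Minimal leaky integrate-and-fire (LIF) neuron, a generic textbook model,
# not any vendor's actual neuromorphic programming interface.
def simulate_lif(input_current, threshold=1.0, leak=0.9, steps=20):
    potential = 0.0
    spikes = []
    for t in range(steps):
        potential = potential * leak + input_current   # integrate with leak
        if potential >= threshold:                     # fire and reset
            spikes.append(t)
            potential = 0.0
    return spikes

print(simulate_lif(0.3))   # time steps at which this one neuron spikes
```

Wiring even a handful of these together, spike by spike and connection by connection, is the software equivalent of wiring a plugboard, which is exactly the point.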

The way we left plugboards behind was by building up hardware abstractions such as adders, shifters, multipliers, etc., and moving away from punch cards as a storage medium. Somewhere along this transition, we created programming languages like (macro) assemblers, COBOL, FORTRAN, LISP, etc. It’s the software languages that brought computing out of the labs and into the market.

It’s been at least 8 years now, and yet no one has built a spiking neuromorphic computer language. Why not?

I think the problem is that there’s no level of abstraction above a neuron. Where’s the arithmetic logic unit (ALU) or register equivalent in neuromorphic computers? They don’t exist as far as I can see.

Until we can come up with some higher levels of abstraction, coding neuromorphic chips is going to be an engineering problem not a commercial endeavor.

But neuromorphism has advantages

The IEEE article states a couple of advantages for neuromorphic computing: less energy to perform inferencing (and possibly training) and the ability to train on incremental data rather than having to train across whole datasets again.

Yes these are great, but there’s a gaggle of startups (e.g., see New GraphCore GC2 chip…, AI processing at the edge, TPU and HW-SW innovation) going after the energy problem in AI DL using Von Neumann architectures.

And the incremental training issue doesn’t seem any easier when you have ~86B neurons, each with potentially thousands of connections to adjust correctly. From my perspective, its training advantage seems illusory at best.

Another advantage of neuromorphism is that it simulates the real analog logic of a human brain. Again, that’s great, but a brain takes ~22 years to train (to college level). Maybe, because neuromorphic chips are electronic, training could be done 100 times faster. But there’s still the software issue.

~~~~

I hate to be the bearer of bad news. There’s been some major R&D spend on neuromorphism and it continues today with no abatement.

I just think we’d all be better served figuring out how to program the beast than spending more to develop more chip hardware.

This is hard for me to say, as I have always been a proponent of hardware innovation. It’s just that neuromorphic software tools don’t exist yet. And I’m afraid I don’t see any easy way forward to make progress on this.

Comments?

Picture credit(s): Chemical synapse schema (cropped), Wikimedia Commons

AI processing at the edge

Read a couple of articles over the past few weeks (TechCrunch: Google is making a fast, specialized TPU chip for edge devices … and IEEE Spectrum: Two startups use processing in flash for AI at the edge) about chips for AI at the IoT edge.

The two startups, Syntiant and Mythic, are moving to analog only or analog-digital solutions to provide AI processing needed at the edge while Google is taking their TPU technology to the edge.  We have written about Google’s TPU before (see: TPU and hardware vs. software  innovation (round 3) post).


The major challenge in AI processing at the edge is power consumption. Both  startups attack the power problem by using flash and other analog circuitry to provide power efficient compute.

Google attacked the power problem with their original TPU by reducing computational precision from 64- to 8-bits. By reducing transistor counts, they lowered power requirements proportionally.

AI today is based on neural networks (NN) that connect simulated neurons via simulated synapses, with weights attached to indicate whether to boost or decrease the signal being transmitted. AI learning is done by setting those weights and creating the connections between simulated neurons and synapses. So learning is setting weights and establishing connections. Actual inferencing (using AI to do something) is a process of exciting input simulated neurons/synapses and letting the signal flow through the NN, with each weight being used to determine the output(s).

AI with standard compute

The problem with doing AI learning or inferencing with normal CPUs, or even GPUs (CUDA), is that the NN does thousands, if not millions, of multiplication-accumulation operations at each simulated synapse-neuron connection. Doing all these multiplication-accumulations takes power. CPUs and GPUs can do these sorts of operations on 32- or 64-bit integers or even floating point numbers, but it still takes power.
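
To make that concrete, here’s a toy fully connected layer in Python; every output neuron requires one multiply-accumulate per input connection (the sizes here are made up for illustration):

```python
import numpy as np

# One fully connected layer: every output is a sum of input*weight products,
# i.e., one multiply-accumulate (MAC) per connection.
rng = np.random.default_rng(0)
inputs = rng.random(256)            # 256 simulated input neurons
weights = rng.random((256, 128))    # 256 x 128 synaptic weights

outputs = inputs @ weights          # 256 * 128 = 32,768 MACs for this one layer
print(outputs.shape, "MACs:", inputs.size * outputs.size)
```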

AI processing power

AI processing power is measured in trillions of (multiply-accumulate) operations per second per watt (TOPS/W). Mythic believes it can perform 4 TOPS/W and Syntiant says it can do 20 TOPS/W. In comparison, the NVIDIA Volta V100 can do about 0.4 TOPS/W (according to the article), although comparing Syntiant or Mythic TOPS to NVIDIA TOPS is a little like comparing apples to oranges.

A current Intel Xeon Platinum 8180M (2.5GHz, 28 cores, 205W) can probably do (assuming one multiplication-accumulation per hertz per core) about 2.5 billion X 28 cores = 70 billion ops/second / 205W, or ~0.34 GOPS/W (source: Platinum 8180M data sheet).

As for Google’s TPU TOPS/W, TPU2 is rated at ~45 TFLOPS/chip, and the best guess for power consumption is between 160W and 200W, let’s say 180W. With power at that level, TPU2 should hit ~0.25 TFLOPS/W. TPU3 is coming out with 8X the performance, but it uses water cooling (read: LOTS MORE POWER).
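
Pulling the back-of-envelope arithmetic from the last few paragraphs into one place (my estimates; the TPU2 power figure is a guess, as noted above):

```python
# Back-of-envelope efficiency comparison (my arithmetic; the TPU2 power
# figure is a guess, as noted in the text).
xeon_ops_per_sec = 2.5e9 * 28                      # 2.5 GHz x 28 cores, 1 MAC/Hz assumed
xeon_gops_per_watt = xeon_ops_per_sec / 205 / 1e9  # ~0.34 GOPS/W

tpu2_flops = 45e12                                 # ~45 TFLOPS per TPU2 chip
tpu2_tflops_per_watt = tpu2_flops / 180 / 1e12     # ~0.25 TFLOPS/W

print(f"Xeon 8180M: ~{xeon_gops_per_watt:.2f} GOPS/W")
print(f"TPU2:       ~{tpu2_tflops_per_watt:.2f} TFLOPS/W")
print(f"Syntiant 20 TOPS/W vs V100 0.4 TOPS/W: {20 / 0.4:.0f}x")
```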

Nonetheless, it appears that Mythic and Syntiant are one to two orders of magnitude better than the best that NVIDIA and TPU2 can do today and many orders of magnitude better than Intel X86.

Improving TOPS/W

Using NAND as an analog memory to read, write and hold NN weights is an easy way to reduce power consumption. Combine that with analog circuitry that can do multiplication and addition with those flash values and you have an AI NN processor. This way you reduce the need to hold weights in memory and do compute in registers, by collapsing both compute and memory into the same componentry.
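
Conceptually, the flash cells act as programmable conductances: apply input voltages and the summed currents on each output line are the multiply-accumulate results, with Ohm’s and Kirchhoff’s laws doing the math. Here’s a rough digital simulation of that idea; the number of conductance levels is my assumption, purely for illustration:

```python
import numpy as np

# Rough simulation of an analog in-memory MAC: weights are stored as a small
# number of conductance levels (the level count is illustrative), inputs are
# "voltages", and outputs are the summed "currents" on each line.
rng = np.random.default_rng(1)
true_weights = rng.uniform(-1, 1, (64, 16))

levels = 16                                     # distinct conductance states
quantized = np.round((true_weights + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1

voltages = rng.random(64)
currents = voltages @ quantized                 # analog-style MAC, one result per output line
print("max error vs ideal MAC:", np.max(np.abs(currents - voltages @ true_weights)))
```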

The major difference between Syntiant and Mythic seems to be the amount of analog circuitry they use. Mythic seems to relegate the analog circuitry to an accelerator, while Syntiant makes more extensive use of analog circuitry throughout their chip. That’s probably why it can perform 5X the TOPS/W of Mythic’s IPU.

IBM and others have been working on neuromorphic chips some of which are analog based and others which are all digital based. We’ve written extensively on IBM and some on MIT’s approaches (for the latest on IBM see: More power efficient deep learning through IBM and PCM, and for MIT see: MIT builds an analog synapse chip) and follow the links there to learn more.

~~~~

Special purpose AI hardware is emerging from the labs and finally reaching reality. IBM R&D has been playing with it for a long time. Google is working on TPU3, so there’s no stopping them. And startups are seeing an opening and are taking everyone on. Stay tuned, we’re in for a good long ride before someone rises above the crowd and becomes the next chip giant.

Comments?

Photo Credit(s): TechCrunch  Google is making a fast, specialized TPU chip for edge devices … article

Introduction to Digital Design Verification at Mythic, Medium.com Article

Images from Google Cloud Platform Blog on the TPU

Two startups use processing in flash for AI at the edge, IEEE Spectrum article courtesy of Mythic

More power efficient deep learning through IBM and PCM

Read an article today from MIT Technology Review (TR) (AI could get 100 times more efficient with IBM’s new artificial synapses) discussing the power efficiency of a new analog approach to neural nets and deep learning.

We have talked about IBM’s TrueNorth and Synapse neuromorphic devices  and PCM neural nets before (see: Parts 1, 2, 3, & 4).

The paper in Nature (Equivalent-accuracy accelerated neural-network training using analogue memory) referred to by the TR article is behind a paywall. However, an ArsTechnica (Ars) article (Training a neural network in phase change memory beats GPUs) on the new research was a bit more informative.

Both articles discuss a new analog approach, using phase change memory (PCM), which has significant power/training efficiency advantages when compared to today’s standard GPU AI processors. Both the TR and Ars articles report on IBM developments simulating a new (PCM-based) neuromorphic device that reduces training power consumption AND training time by a factor of 100. But the Nature paper abstract says it reduces both power consumption and computational space (computations per sq mm) by a factor of 100, which is not exactly the same thing.

Why PCM

PCM is a nonvolatile memory technology (see part 4 above for more info) that uses electronically induced phase changes in a material to establish a 1 or 0 state for a PCM bit.

However, another advantage of PCM is that it also can take on a state between 0 and 1. This is bad for data memory/storage but good for neural nets.

For a PCM-based neural net, you could have a layer of PCM (neuron) structures and standard wiring that connects all the PCM neurons to the next layer down, for however many layers your neural net requires. The PCM value would indicate the strength of the connection between neurons (the synapse).

But the problem with a PCM neural net is that PCM states don’t provide enough gradations of value between 0 and 1 to fully map today’s neural net weights.
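
A quick illustration of the gradation problem (the level counts below are mine, not figures from the paper):

```python
import numpy as np

# How much resolution do a handful of PCM states give you?  Level counts are
# illustrative; actual PCM gradation counts vary by device.
weights = np.random.default_rng(2).uniform(0, 1, 10_000)

for levels in (4, 16, 256):
    q = np.round(weights * (levels - 1)) / (levels - 1)
    print(f"{levels:4d} levels -> mean quantization error {np.mean(np.abs(q - weights)):.4f}")
```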

IBM’s latest design has two different tiers of neural nets

According to the Ars article, IBM’s latest design takes a two-tier approach to using PCM in its neural net. The first, top tier uses a PCM structure and the second, lower tier uses a more traditional, silicon-based structure; together they implement the neural net.

The Ars article describes the new two-tier design as providing two-digit resolution for the weight between neurons. The structure implemented in PCM determines the higher order digit and the more traditional, silicon-based neural net segment determines the lower order digit of the two-digit neural net weight.

With this approach, training occurs mostly in the more traditional, silicon-layer neural net, but every 100 or so training events (epochs), that information is used to modify the PCM structure as well. In this fashion, the PCM-silicon neural net is fine-tuned using 1 out of every 100 or so training events to correct the PCM layer and the other 99 or so to modify the silicon layer.
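
As I read the Ars description, each weight is effectively split into a coarse (PCM) part and a fine (silicon) part, combined for inference, with the fine part periodically folded back into the coarse part. Here’s a sketch of that bookkeeping; the step size, update interval and names are my assumptions, not IBM’s:

```python
# Sketch of the two-tier weight idea as I understand it from the Ars article:
# a coarse PCM "digit" plus a fine silicon "digit", combined for inference,
# with the fine part periodically transferred into the coarse part.
# The step size, update interval and names are my assumptions.
PCM_STEP = 0.1          # coarse granularity (high-order "digit")
TRANSFER_EVERY = 100    # training events between PCM updates

pcm_part, silicon_part = 0.0, 0.0

for step in range(1, 501):
    gradient = 0.003                      # stand-in for a real training signal
    silicon_part += gradient              # most updates land in the silicon tier
    if step % TRANSFER_EVERY == 0:        # occasionally fold into the PCM tier
        coarse = round(silicon_part / PCM_STEP) * PCM_STEP
        pcm_part += coarse
        silicon_part -= coarse

print(f"PCM part {pcm_part:.2f}, silicon part {silicon_part:.3f}, "
      f"effective weight {pcm_part + silicon_part:.3f}")
```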

In addition, the silicon layer is apparently implemented to mimic the PCM layer, using capacitors and transistors.

~~~~

I wonder why they didn’t just use two tiers of PCM to do the same thing, but it’s possible that training the silicon layer is more power efficient, speedier, or both, compared to the PCM layer.

The TR and Ars articles seem to make a point of saying this is analog computing. I would guess that’s because the PCM and the silicon layer can take on many values between 0 and 1, which means it’s not digital.

Much of the article is based on combined hardware (built using 90nm technology) and software simulations of the new PCM-silicon neuromorphic device. However, simulations like this are a standard step in the ASIC design process, and if successful, we would expect a chip to emerge from a foundry within 6-12 months from now.

The Nature paper’s abstract indicated that they simulated the device using standard training datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100) for handwritten digit recognition and color image classification/recognition. The new device was able to come within 1% of the accuracy of a software-trained neural net, with 1% of the power and (when updated to the latest foundry technologies) in 1% of the space.

Furthermore, the abstract said that the current device supports ~205K synapses. The previous generation, IBM TrueNorth (see part 2 above), had the “equivalent of 1M neurons”, and the earlier IBM SyNAPSE (see part 1 above) chip had “256K programmable synapses” and 256 computational elements. But I believe both of those were single-tier devices.

I’d also be very interested in whether the neuromorphic device is compatible with and could be programmed with PyTorch or TensorFlow but I didn’t see any information on how the devices were programmed.

Comments?

Photo Credit(s): neuron by mararie 

3D CrossPoint graphic, taken from Intel-Micron session at FMS16

brain-neurons by Fotis Bobolas

Collaboration as a function of proximity vs. heterogeneity, MIT research

Read an article the other week in MIT News on how Proximity boosts collaboration on MIT campus. Using MIT patents and papers published between 2004 and 2014, researchers determined how collaboration varied based on proximity, or physical distance.

What they found was that distance matters. The closer you are to a person, the more likely you are to collaborate with him or her (on papers and patents at least).

Paper results

In looking at the PLOS research paper (An exploration of collaborative scientific production at MIT …), one can see that the relative frequency of collaboration decays as distance increases (Graph A shows frequency of collaboration vs. proximity for papers and Graph B shows a similar relationship for patents).

 

Other paper results

The two sets of charts below show the buildings where research (papers and patents) was generated. Building heterogeneity, crowdedness (lab space/researcher) and number of papers and patents per building is displayed using the color of the building.

The number of papers and patents per building is self evident.

The heterogeneity of a building is a function of the number of different departments that use the building. The crowdedness of a building is an indication of how much lab space per faculty member a building has. So the more crowded buildings are lighter in color and less crowded buildings are darker in color.

I would like to point out Building 32. It seems to have high heterogeneity, moderate crowdedness and high paper production, but relatively low patent production. Conversely, Building 68 has low heterogeneity, low crowdedness, a high production of papers and a relatively low production of patents. So similar results have been obtained from buildings with different crowdedness and different heterogeneity.

The paper specifically cites buildings 3 & 32 as being most diverse on campus and as “hubs on campus” for research activity.  The paper states that these buildings were outliers in research production on a per person basis.

And yet there’s no global correlation between heterogeneity, or crowdedness for that matter, and (paper/patent) research production. I view crowdedness as a proxy for researcher proximity. That is, the more crowded a building is, the closer its researchers should be. Such buildings should theoretically be hotbeds of collaboration. But it doesn’t seem like they produce any more papers than non-crowded buildings.

Also, heterogeneity is often cited as a generator of research. Steven Johnson’s Where Good Ideas Come From frequently mentions that good research often derives from collaboration outside your area of speciality. And yet high-heterogeneity buildings don’t seem to have a high production of research, at least for patents.

So I am perplexed and unsatisfied with the research. Yes, proximity leads to more collaboration, but it doesn’t necessarily lead to more papers or patents. The paper shows other information on the number of papers and patents by discipline, which may be confounding the results in this regard.

Telecommuting and productivity

So what does this tell us about the plight of telecommuters in today’s business and R&D environments? While the paper has shown that collaboration goes down as a function of distance, it doesn’t show that an increase in collaboration leads to more research or productivity.

This last chart from the paper shows how collaboration on papers is trending down and on patents is trending up. For both papers and patents, inter-departmental collaboration is more important than inter-building collaboration. Indeed, the sidebars seem to show that the MIT faculty participation in papers and patents is flat over the whole time period even though the number of authors (for papers) and inventors (for patents) is going up.

So I, as a one-person company, can be considered an extreme telecommuter for any organization I work with. I am often concerned that my lack of proximity to others adversely limits my productivity. Thankfully the research is inconclusive at best on this, and if anything it tells me that this is not a significant factor in research productivity.

And yet, many companies (Yahoo, IBM, and others) have recently instituted policies restricting telecommuting because, they believe,  it  reduces productivity. This research does not show that.

So, IBM and Yahoo, I think what you are doing to concentrate your employee population and reduce or outright eliminate telecommuting is wrong.

Picture credit(s): All charts and figures are from the PLOS paper. 

 

Moore’s law is still working with new 2D-electronics, just 1nm thin

This week scientists at Oak Ridge National Laboratory created two-dimensional nano-electronic circuits just 1nm tall (see Nature Communications article). Apparently they were able to grow two crystal layers, one on top of the other, and then infuse the top layer with sulfur. With that as a base, they used standard, scalable photolithographic and electron beam lithographic processing techniques to pattern electronic junctions in the crystal layer, and then used a pulsed laser to evaporate sulfur atoms from a target (selective sulfurization of the material), converting MoSe2 to MoS2. At the end of this process was a 2D electronic circuit just 3 atoms thick, with heterojunctions, molecularly similar to pristine MOS available today, but much thinner (~1nm) and at a smaller scale (~5nm).

In other news this month, IBM also announced that they had produced working prototypes of ~7nm transistors in a processor chip (see NY Times article). IBM sold off their chip foundry a while ago to GlobalFoundries, but they continue working on semiconductor research with SEMATECH, an Albany, NY semiconductor research consortium. Recently Samsung and Intel left SEMATECH, maybe a bit too early.

On the other hand, Intel announced they were having some problems getting to the next node on the semiconductor roadmap after their current 14nm transistor chips (see Fortune article). Intel stated that the last two generations took 2.5 years each instead of 2 years, and that pace is likely to continue for the foreseeable future. Intel seems to be spending more research $’s on creating low-power or new (GPU) types of processing than on a mad rush to double transistors every 2 years.

So taking it all in, Moore’s law is still being fueled by billion-$ R&D budgets and the ever increasing demand for more transistors per area. It may take a little longer to double the transistors on a chip, but we can see at least another two generations down the ITRS semiconductor roadmap. That is, if the Oak Ridge research proves as manufacturable as it seems to be.

So Moore’s law has at least another generation or two to run. Whether there’s a need for more processing power is anyone’s guess but the need for cheaper flash, non-volatile memory and DRAM is a certainty for as far as I can see.

Comments?

Photo Credits: 

  1. From “Patterned arrays of lateral heterojunctions within monolayer two-dimensional semiconductors”, by Masoud Mahjouri-Samani, Ming-Wei Lin, Kai Wang, Andrew R. Lupini, Jaekwang Lee, Leonardo Basile, Abdelaziz Boulesbaa, Christopher M. Rouleau, Alexander A. Puretzky, Ilia N. Ivanov, Kai Xiao, Mina Yoon & David B. Geohegan
  2. From “Comparison semiconductor process nodes” by Cmglee – Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons – https://commons.wikimedia.org/wiki/File:Comparison_semiconductor_process_nodes.svg#/media/File:Comparison_semiconductor_process_nodes.svg

Cloud based database startups are heating up

IBM recently agreed to purchase Cloudant, an online database service using a NoSQL database called CouchDB. Apparently this is an attempt by IBM to take on Amazon and others that support cloud-based services using a NoSQL database backend to store massive amounts of data.

In other news, Dassault Systèmes, a provider of 3D and other design tools, has invested $14.2M in NuoDB, a cloud-based, NewSQL-compliant database service provider. Apparently Dassault intends to start offering its design software as a service using NuoDB as the backend database.

We have discussed NewSQL and NoSQL databases before (see the NewSQL and the curse of old SQL databases post) and there are plenty available today. So why the sudden interest in cloud-based database services? I tend to think there are a couple of different trends playing out here.

IBM playing catchup

In the IBM case, there’s just so much data going to the cloud these days that IBM has to have a hand in it if it wants to continue to be a major IT services organization. Amazon and others are blazing this trail and IBM has to get on board or be left behind.

The NoSQL, or non-relational, database model allows for different types of data structuring than the standard tables/rows of traditional RDBMS databases. Specifically, NoSQL databases are very useful for data that can be organized as a tree (directed graph), a graph (non-directed graph?) or key=value pairs. This latter item is very useful for Hadoop, MapReduce and other big data analytics applications. Doing this in the cloud just makes sense, as the data can be both gathered and analyzed in the cloud, with nothing more than the results of the analysis sent back to the requesting party.
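
The classic MapReduce word count, for example, is nothing more than emitting and merging key=value pairs (a generic illustration, not Cloudant’s or CouchDB’s actual map/reduce API):

```python
from collections import defaultdict

# MapReduce-style word count over key=value pairs, a generic illustration,
# not Cloudant/CouchDB's actual map/reduce API.
documents = ["the cloud stores data", "the cloud analyzes data"]

# map: emit (word, 1) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# reduce: sum values by key
counts = defaultdict(int)
for key, value in mapped:
    counts[key] += value

print(dict(counts))   # {'the': 2, 'cloud': 2, 'stores': 1, 'data': 2, 'analyzes': 1}
```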

IBM doesn’t necessarily need a SQL database service, as it already has DB2. IBM already has a cloud-based DB2 service that can be implemented by public or private cloud organizations. But they have no cloud-based NoSQL service today, and having one makes a lot of sense if IBM wants to branch out to more cloud service offerings.

Dassault is broadening their market

As for the cloud-based NuoDB NewSQL database, not all data fits the tree, graph or key=value pair structuring of NoSQL databases. Many traditional applications that use databases today revolve around SQL services and would be hard pressed to move off an RDBMS.

Also, one ongoing problem with NoSQL databases is that they don’t really support ACID transaction processing and, as such, often compromise on data consistency in order to support highly parallelizable activities. In contrast, a SQL database supports rigid transaction consistency and is just the thing for moving something like a traditional OLTP application to the cloud.
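
For contrast, here’s the kind of all-or-nothing transaction a SQL database guarantees, using Python’s sqlite3 purely as a convenient stand-in (NuoDB’s actual interface will differ):

```python
import sqlite3

# All-or-nothing (ACID) transfer between two accounts; sqlite3 is used here
# as a stand-in for any SQL database, not NuoDB's actual API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:   # commits on success, rolls back the whole transfer on any error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except sqlite3.Error:
    pass         # neither update is applied if either one fails

print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
```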

I would guess that how NuoDB handles the high throughput needed by its cloud service partners, while still providing ACID transaction consistency, is part of its secret sauce.

But what’s behind it? At least some of this interest may just be the internet of things (IoT)

The other thing that seems to be driving a lot of the interest in cloud based databases is the IoT. As more and more devices become internet connected, they will start to generate massive amounts of data. The only way to capture and analyze this data effectively today is with NoSQL and NewSQL database services. By hosting these services in the cloud, analyzing/processing/reporting on this tsunami of data becomes much, much easier.

Storing and analyzing all this IoT data should make for an interesting decade or so as the internet of things gets built out across the world. Cisco’s CEO, John Chambers, recently said that the IoT market will be worth $19T and will have 50B internet-connected devices by 2020. Seems a bit of a stretch, seeing as how they just predicted (in June 2013) 10B devices attached to the internet by the middle of last year, but who am I to disagree.

There’s much more to be written about the IoT and its impact on data storage, but that will need to wait for another time… stay tuned.

Comments?

Photo Credit(s): database 2 by Tim Morgan 

 

Has latency become the key metric? SPC-1 LRT results – chart of the month

I was at EMCworld a couple of months back and they were showing off a preview of the next version of VNX storage, which was trying to achieve a million IOPS with under a millisecond of latency. Then I attended NetApp’s analyst summit, and the discussion at their flash seminar was how latency was changing the landscape of data storage and how flash latencies were going to enable totally new applications.

One executive at NetApp mentioned that IOPS was never the real problem. As an example, he mentioned one large oil & gas firm that had a peak IOPS of 35K.

Also, there was some discussion at NetApp of trying to come up with a way of segmenting customer applications by latency requirements.  Aside from high frequency trading applications, online payment processing and a few other high-performance database activities, there wasn’t a lot that could easily be identified/quantified today.

IO latencies have been coming down for years now. Sophisticated disk-only storage systems have been lowering latencies for a decade or more. But since the introduction of SSDs, it’s been a whole new ballgame. For proof, all one has to do is examine the top 10 SPC-1 LRT (least response time, measured with workloads at 10% of peak activity) results.

Top 10 SPC-1 LRT results, SSD system response times

 

In looking over the top 10 SPC-1 LRT benchmarks (see figure above), one can see a general pattern. These systems mostly use SSD or flash storage, except for the TMS-400, TMS 320 (IBM FlashSystems) and Kaminario’s K2-D, which primarily use DRAM storage backed by other storage.

Hybrid disk-flash systems seem to start with an LRT of around 0.9 msec (not on the chart above).  These can be found with DotHill, NetApp, and IBM.

Similarly, you almost have to get to as “slow” as 0.93 msec before you can find any disk-only storage systems. But most disk-only storage comes in with a latency of 1 msec or more. Between 1 and 2 msec LRT, we see storage from EMC, HDS, HP, Fujitsu, IBM, NetApp and others.

There was a time when the storage world was convinced that to get really good response times you had to have a purpose-built storage system like TMS or Kaminario, or stripped-down functionality like IBM’s Power 595. But it seems that the general purpose HDS HUS, IBM Storwize, and even Huawei OceanStor are all capable of providing excellent latencies with all-SSD storage behind them. And all seem to perform at least in the same ballpark as the purpose-built TMS RamSan-620 SSD storage system. These general purpose storage systems have just about every advanced feature imaginable, with the exception of mainframe attach.

It seems nowadays that there is a trifurcation of latency results going on, based on underlying storage:

  • DRAM only systems at 0.4 msec to ~0.1 msec.
  • SSD/flash only storage at 0.7 down to 0.2msec
  • Disk only storage at 0.93msec and above.

The hybrid storage systems are attempting to mix the economics of disk with the speed of flash storage and seem to be contending with all these single technology, storage solutions. 

It’s a new IO latency world today. SSD-only storage systems are now available from every major storage vendor and many of them are showing pretty impressive latencies. Now, with fully functional storage latency below 0.5 msec, what’s the next hurdle for IT?

Comments?

Image: EAB 2006 by TMWolf

 


Latest SPC-1 results – IOPS vs drive counts – chart-of-the-month

Scatter plot of SPC-1  IOPS against Spindle count, with linear regression line showing Y=186.18X + 10227 with R**2=0.96064
(SCISPC111122-004) (c) 2011 Silverton Consulting, All Rights Reserved

[As promised, I am trying to get up-to-date on my performance charts from our monthly newsletters. This one brings us current up through November.]

The above chart plots Storage Performance Council SPC-1 IOPS against spindle count.  On this chart, we have eliminated any SSD systems, systems with drives smaller than 140 GB and any systems with multiple drive sizes.

Alas, the coefficient of determination (R**2) of 0.96 tells us that SPC-1 IOPS performance is mainly driven by drive count. But what’s more interesting here is that as drive counts get higher than, say, 1000, the variance around the linear regression line widens, implying that system sophistication starts to matter more.
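
Plugging a few drive counts into the regression line from the chart gives a quick feel for what “expected” IOPS looks like at a given spindle count:

```python
# Predicted SPC-1 IOPS from the regression line in the chart above:
# IOPS = 186.18 * spindles + 10227  (R**2 = 0.96)
def predicted_iops(spindles):
    return 186.18 * spindles + 10227

for drives in (250, 1000, 2000):
    print(f"{drives:5d} drives -> ~{predicted_iops(drives):,.0f} IOPS")
```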

Processing power matters

For instance, if you look at the three systems centered around 2000 drives, they are (from lowest to highest IOPS) 4-node IBM SVC 5.1, 6-node IBM SVC 5.1 and an 8-node HP 3PAR V800 storage system.  This tells us that the more processing (nodes) you throw at an IOPS workload given similar spindle counts, the more efficient it can be.

System sophistication can matter too

The other interesting facet on this chart comes from examining the three systems centered around 250K IOPS that span from ~1150 to ~1500 drives.

  • The 1156 drive system is the latest HDS VSP 8-VSD (virtual storage directors, or processing nodes) running with dynamically (thinly) provisioned volumes – which is the first and only SPC-1 submission using thin provisioning.
  • The 1280 drive system is a (now HP) 3PAR T800 8-node system.
  • The 1536 drive system is an IBM SVC 4.3 8-node storage system.

One would think that thin provisioning would degrade storage performance, and maybe it did, but without a non-dynamically provisioned HDS VSP benchmark to compare against, it’s hard to tell. However, the fact that the HDS VSP performed as well as the other systems did, with much lower drive counts, seems to tell us that thin provisioning potentially uses hard drives more efficiently than fat provisioning, that the 8-VSD HDS VSP is more effective than an 8-node IBM SVC 4.3 or an 8-node (HP) 3PAR T800 system, or perhaps some combination of these.

~~~~

The full SPC performance report went out to our newsletter subscribers last November. [The one change to this chart from the full report is that the date in the chart’s title was wrong and is fixed here.] A copy of the full report will be up on the dispatches page of our website sometime this month (if all goes well). However, you can get performance information now, and subscribe to future newsletters to receive these reports even earlier, by just sending us an email or using the signup form above right.

For a more extensive discussion of block or SAN storage performance covering SPC-1&-2 (top 30) and ESRP (top 20) results please consider purchasing our recently updated SAN Storage Buying Guide available on our website.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.

Comments?