Collaboration as a function of proximity vs. heterogeneity, MIT research

I read an article the other week in MIT News on how proximity boosts collaboration on the MIT campus. Using MIT patents and papers published between 2004 and 2014, researchers determined how collaboration varied with proximity, or physical distance.

What they found was that distance matters. The closer you are to a person, the more likely you are to collaborate with him or her (on papers and patents at least).

Paper results

In looking at the PLOS research paper (An exploration of collaborative scientific production at MIT …), one can see that the relative frequency of collaboration decays as distance increases (Graph A shows frequency of collaboration vs. proximity for papers and Graph B shows a similar relationship for patents).


Other paper results

The two sets of charts below show the buildings where research (papers and patents) was generated. Building heterogeneity, crowdedness (lab space/researcher) and the number of papers and patents per building are displayed using the color of the building.

The number of papers and patents per building is self-evident.

The heterogeneity of a building is a function of the number of different departments that use the building. The crowdedness of a building is an indication of how much lab space per faculty member it has, so more crowded buildings are lighter in color and less crowded buildings are darker.
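To make these two metrics concrete, here's a minimal sketch of how heterogeneity and a crowdedness proxy might be computed per building. The building records below are made up for illustration; they are not data from the paper.

```python
# Hypothetical sketch of the two building metrics discussed above.
# The building records are invented for illustration only; they are
# not figures from the PLOS paper.

buildings = {
    "B32": {"departments": ["CSAIL", "EECS", "Linguistics", "Philosophy"],
            "lab_space_sqft": 40000, "faculty": 100},
    "B68": {"departments": ["Biology"],
            "lab_space_sqft": 60000, "faculty": 60},
}

for name, b in buildings.items():
    # Heterogeneity: number of distinct departments using the building.
    heterogeneity = len(set(b["departments"]))
    # Crowdedness proxy: lab space per faculty member (less space per
    # person means a more crowded building).
    space_per_faculty = b["lab_space_sqft"] / b["faculty"]
    print(f"{name}: heterogeneity={heterogeneity}, "
          f"lab space/faculty={space_per_faculty:.0f} sqft")
```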

I would like to point out Building 32. It seems to have high heterogeneity, moderate crowdedness and high paper production but relatively low patent production. In contrast, Building 68 has low heterogeneity and low crowdedness, yet also high production of papers and relatively low production of patents. So similar results have been obtained from buildings with different crowdedness and different heterogeneity.

The paper specifically cites buildings 3 & 32 as being the most diverse on campus and as “hubs on campus” for research activity. The paper states that these buildings were outliers in research production on a per-person basis.

And yet there’s no global correlation between heterogeneity (or crowdedness, for that matter) and paper/patent research production. I view crowdedness as a proxy for researcher proximity: the more crowded a building is, the closer its researchers should be. Such buildings should theoretically be hotbeds of collaboration, but they don’t seem to produce any more papers than less crowded buildings.

Also, heterogeneity is often cited as a generator of research. Steven Johnson’s Where Good Ideas Come From frequently mentions that good research often derives from collaboration outside your area of specialty. And yet, high-heterogeneity buildings don’t seem to have high research production, at least for patents.

So I am perplexed and unsatisfied with the research. Yes, proximity leads to more collaboration, but it doesn’t necessarily lead to more papers or patents. The paper shows other information on the number of papers and patents by discipline, which may be confounding the results in this regard.

Telecommuting and productivity

So what does this tell us about the plight of telecommuters in today’s business and R&D environments? While the paper has shown that collaboration goes down as a function of distance, it doesn’t show that an increase in collaboration leads to more research or productivity.

This last chart from the paper shows how collaboration on papers is trending down and collaboration on patents is trending up. For both papers and patents, inter-departmental collaboration is more important than inter-building collaboration. Indeed, the sidebars seem to show that MIT faculty participation in papers and patents is flat over the whole time period even though the number of authors (for papers) and inventors (for patents) is going up.

So I, as a one-person company, can be considered an extreme telecommuter for any organization I work with. I am often concerned that my lack of proximity to others limits my productivity. Thankfully, the research is inconclusive at best on this and, if anything, tells me that proximity is not a significant factor in research productivity.

And yet, many companies (Yahoo, IBM, and others) have recently instituted policies restricting telecommuting because, they believe, it reduces productivity. This research does not show that.

So, IBM and Yahoo: I think what you are doing to concentrate your employee populations and reduce or outright eliminate telecommuting is wrong.

Picture credit(s): All charts and figures are from the PLOS paper. 


Moore’s law is still working with new 2D-electronics, just 1nm thin

This week scientists at Oak Ridge National Laboratory created two-dimensional nano-electronic circuits just 1nm tall (see Nature Communications article). Apparently they were able to grow one crystal layer on top of another and then infuse the top layer with sulfur. With that as a base, they used standard, scalable photolithographic and electron beam lithographic processing techniques to pattern electronic junctions in the crystal layer, and then used a pulsed laser to evaporate sulfur atoms from a target onto selected areas (selective sulfurization of the material), converting MoSe2 to MoS2. At the end of this process was a 2D electronic circuit just 3 atoms thick, with heterojunctions, molecularly similar to pristine MOS available today, but at a much thinner (~1nm) and smaller (~5nm) scale.

In other news this month, IBM also announced that it had produced working prototypes of a ~7nm transistor in a processor chip (see NY Times article). IBM sold off its chip foundry a while ago to GlobalFoundries, but continues working on semiconductor research with SEMATECH, an Albany, NY semiconductor research consortium. Recently Samsung and Intel left SEMATECH, maybe a bit too early.

On the other hand, Intel announced it was having some problems getting to the next node in the semiconductor roadmap after its current 14nm transistor chips (see Fortune article). Intel stated that the last two generations took 2.5 years instead of 2 years, and that pace is likely to continue for the foreseeable future. Intel seems to be spending more research dollars on creating low-power or new types of processing (GPUs) than on a mad rush to double transistors every 2 years.

So taking it all in, Moore’s law is still being fueled by billion-dollar R&D budgets and the ever-increasing demand for more transistors per area. It may take a little longer to double the transistors on a chip, but we can see at least another two generations down the ITRS semiconductor roadmap. That is, if the Oak Ridge research proves manufacturable, as it seems to be.

So Moore’s law has at least another generation or two to run. Whether there’s a need for more processing power is anyone’s guess but the need for cheaper flash, non-volatile memory and DRAM is a certainty for as far as I can see.

Comments?

Photo Credits: 

  1. From “Patterned arrays of lateral heterojunctions within monolayer two-dimensional semiconductors”, by Masoud Mahjouri-Samani, Ming-Wei Lin, Kai Wang, Andrew R. Lupini, Jaekwang Lee, Leonardo Basile, Abdelaziz Boulesbaa, Christopher M. Rouleau, Alexander A. Puretzky, Ilia N. Ivanov, Kai Xiao, Mina Yoon & David B. Geohegan
  2. From “Comparison semiconductor process nodes” by Cmglee – Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons – https://commons.wikimedia.org/wiki/File:Comparison_semiconductor_process_nodes.svg#/media/File:Comparison_semiconductor_process_nodes.svg

Cloud based database startups are heating up

IBM recently agreed to purchase Cloudant, an online database service built on a NoSQL database called CouchDB. Apparently this is an attempt by IBM to take on Amazon and others that support cloud-based services using a NoSQL database backend to store massive amounts of data.

In other news, Dassault Systèmes, a provider of 3D and other design tools, has invested $14.2M in NuoDB, a cloud-based NewSQL database service provider. Apparently Dassault intends to start offering its design software as a service using NuoDB as a backend database.

We have discussed NewSQL and NoSQL databases before (see our NewSQL and the curse of old SQL databases post) and there are plenty available today. So why the sudden interest in cloud-based database services? I tend to think there are a couple of different trends playing out here.

IBM playing catchup

In IBM’s case, there’s just so much data going to the cloud these days that IBM has to have a hand in it if it wants to continue to be a major IT services organization. Amazon and others are blazing this trail and IBM has to get on board or be left behind.

The NoSQL, or non-relational, database model allows for different types of data structuring than the standard tables/rows of traditional RDBMS databases. Specifically, NoSQL databases are very useful for data that can be organized as a tree (directed graph), a graph (non-directed graph?) or key=value pairs. This latter item is very useful for Hadoop, MapReduce and other big data analytics applications. Doing this in the cloud just makes sense, as the data can be both gathered and analyzed in the cloud without having anything more than the results of the analysis sent back to a requesting party.
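As a minimal illustration of that key=value structuring and why it suits MapReduce-style analytics, here's a sketch of a tiny map-reduce aggregation over key=value records. It's purely illustrative and not tied to CouchDB's or any other product's API.

```python
# Minimal sketch of key=value data and a map-reduce style aggregation.
# Purely illustrative; real NoSQL services have their own APIs for this.
from collections import defaultdict

# Key=value records, e.g. sensor readings gathered in the cloud.
records = [
    {"sensor": "A", "temp_c": 21.5},
    {"sensor": "B", "temp_c": 19.0},
    {"sensor": "A", "temp_c": 22.1},
]

# "Map" step: emit (key, value) pairs.
mapped = [(r["sensor"], r["temp_c"]) for r in records]

# "Reduce" step: aggregate values per key; only this small result
# would need to be sent back to the requesting party.
totals, counts = defaultdict(float), defaultdict(int)
for key, value in mapped:
    totals[key] += value
    counts[key] += 1

averages = {k: totals[k] / counts[k] for k in totals}
print(averages)  # {'A': 21.8, 'B': 19.0}
```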

IBM doesn’t necessarily need another SQL database as it already has DB2, and it already offers a cloud-based DB2 service that can be implemented by public or private cloud organizations. But it has no cloud-based NoSQL service today, and having one makes a lot of sense if IBM wants to branch out to more cloud service offerings.

Dassault is broadening their market

As for the cloud-based NuoDB NewSQL database, not all data fits the tree, graph or key=value pair structuring of NoSQL databases. Many traditional applications that use databases today revolve around SQL services and would be hard pressed to move off an RDBMS.

Also, one ongoing problem with NoSQL databases is that they don’t really support ACID transaction processing and, as such, often compromise on data consistency in order to support highly parallelizable activities. In contrast, a SQL database supports rigid transaction consistency and is just the thing for moving something like a traditional OLTP application to the cloud.
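To illustrate the rigid transaction consistency SQL databases provide, here's a sketch of a classic OLTP-style transfer that either commits completely or not at all. It uses Python's built-in sqlite3 module purely as a stand-in; it is not NuoDB's API.

```python
# Sketch of ACID-style atomicity using Python's built-in sqlite3 module.
# This only illustrates the concept; it is not NuoDB's actual interface.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    # Both updates happen inside one transaction...
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()          # ...and become visible only on commit.
except sqlite3.Error:
    conn.rollback()        # On any failure, neither update is applied.

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'alice': 70, 'bob': 80}
```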

I would guess that how NuoDB handles the high throughput needed by its cloud service partners while still providing ACID transaction consistency is part of its secret sauce.

But what’s behind it? At least some of this interest may just be the internet of things (IoT)

The other thing that seems to be driving a lot of the interest in cloud-based databases is the IoT. As more and more devices become internet connected, they will start to generate massive amounts of data. The only way to capture and analyze this data effectively today is with NoSQL and NewSQL database services. By hosting these services in the cloud, analyzing/processing/reporting on this tsunami of data becomes much, much easier.

Storing and analyzing all this IoT data should make for an interesting decade or so as the internet of things gets built out across the world. Cisco’s CEO, John Chambers, recently said that the IoT market will be worth $19T and will have 50B internet-connected devices by 2020. That seems a bit of a stretch, seeing as how Cisco just predicted (June 2013) there would be 10B devices attached to the internet by the middle of last year, but who am I to disagree.

There’s much more to be written about the IoT and its impact on data storage, but that will need to wait for another time… stay tuned.

Comments?

Photo Credit(s): database 2 by Tim Morgan 


Has latency become the key metric? SPC-1 LRT results – chart of the month

I was at EMCworld a couple of months back and they were showing off a preview of the next version of VNX storage, which was trying to achieve a million IOPS with under a millisecond latency. Then I attended NetApp’s analyst summit, and the discussion at their Flash seminar was how latency was changing the landscape of data storage and how flash latencies were going to enable totally new applications.

One executive at NetApp mentioned that IOPS was never the real problem. As an example, he cited one large oil & gas firm that had a peak of 35K IOPS.

Also, there was some discussion at NetApp of trying to come up with a way of segmenting customer applications by latency requirements. Aside from high-frequency trading applications, online payment processing and a few other high-performance database activities, there wasn’t a lot that could easily be identified/quantified today.

IO latencies have been coming down for years now. Sophisticated disk-only storage systems have been lowering latencies for a decade or more. But since the introduction of SSDs it’s been a whole new ballgame. For proof, all one has to do is examine the top 10 SPC-1 LRT (least response time, measured with workloads at 10% of peak activity) results.

Top 10 SPC-1 LRT results, SSD system response times


In looking over the top 10 SPC-1 LRT benchmarks (see Figure above) one can see a general pattern. These systems mostly use SSD or flash storage, except for the TMS-400, TMS 320 (IBM FlashSystems) and Kaminario’s K2-D, which primarily use DRAM storage (backed by other media).

Hybrid disk-flash systems seem to start with an LRT of around 0.9 msec (not on the chart above). These can be found from Dot Hill, NetApp, and IBM.

Similarly, you have to get to as “slow” as 0.93 msec. before you can find any disk-only storage systems, and most disk-only storage comes in with a latency of 1 msec or more. Between 1 and 2 msec. LRT we see storage from EMC, HDS, HP, Fujitsu, IBM, NetApp and others.

There was a time when the storage world was convinced that to get really good response times you had to have a purpose-built storage system like TMS or Kaminario, or stripped-down functionality like IBM’s Power 595. But it seems that the general-purpose HDS HUS, IBM Storwize, and even Huawei OceanStor are all capable of providing excellent latencies with all-SSD storage behind them. And all seem to perform at least in the same ballpark as the purpose-built TMS RamSan-620 SSD storage system. These general-purpose storage systems have just about every advanced feature imaginable, with the exception of mainframe attach.

It seems nowadays that there is a trifurcation of latency results going on, based on underlying storage:

  • DRAM-only systems at ~0.4 msec down to ~0.1 msec.
  • SSD/flash-only storage at ~0.7 msec down to ~0.2 msec.
  • Disk-only storage at 0.93 msec and above.

The hybrid storage systems are attempting to mix the economics of disk with the speed of flash storage and seem to be contending with all of these single-technology storage solutions.

It’s a new IO latency world today. SSD-only storage systems are now available from every major storage vendor, and many of them are showing pretty impressive latencies. Now, with fully functional storage latency below 0.5 msec., what’s the next hurdle for IT?

Comments?

Image: EAB 2006 by TMWolf



Latest SPC-1 results – IOPS vs drive counts – chart-of-the-month

Scatter plot of SPC-1 IOPS against spindle count, with linear regression line showing Y = 186.18X + 10227 and R**2 = 0.96064
(SCISPC111122-004) (c) 2011 Silverton Consulting, All Rights Reserved

[As promised, I am trying to get up to date on my performance charts from our monthly newsletters. This one brings us current through November.]

The above chart plots Storage Performance Council SPC-1 IOPS against spindle count.  On this chart, we have eliminated any SSD systems, systems with drives smaller than 140 GB and any systems with multiple drive sizes.

Alas, the coefficient of determination (R**2) of 0.96 tells us that SPC-1 IOPS performance is mainly driven by drive count. But what’s more interesting here is that as drive counts get higher than, say, 1000, the variance surrounding the linear regression line widens, implying that system sophistication starts to matter more.
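As a quick illustration of what that fitted line implies, here's a small sketch that plugs a few arbitrary drive counts into the chart's regression (Y = 186.18X + 10227). The coefficients come from the chart; the drive counts are just examples.

```python
# Sketch: apply the chart's fitted regression line (SPC-1 IOPS vs. spindle
# count) to a few arbitrary drive counts. Coefficients are from the chart;
# the drive counts chosen here are only examples.

SLOPE = 186.18        # incremental SPC-1 IOPS per additional drive
INTERCEPT = 10227     # fitted line's intercept

def predicted_iops(drives: int) -> float:
    """Predicted SPC-1 IOPS for a disk-only system with this many drives."""
    return SLOPE * drives + INTERCEPT

for drives in (250, 1000, 2000):
    print(f"{drives:>5} drives -> ~{predicted_iops(drives):,.0f} IOPS")

# With R**2 ~0.96, drive count alone explains most of the variance,
# but the scatter around the line grows above ~1000 drives.
```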

Processing power matters

For instance, if you look at the three systems centered around 2000 drives, they are (from lowest to highest IOPS) a 4-node IBM SVC 5.1, a 6-node IBM SVC 5.1 and an 8-node HP 3PAR V800 storage system. This tells us that the more processing (nodes) you throw at an IOPS workload, given similar spindle counts, the more efficient the system can be.

System sophistication can matter too

The other interesting facet on this chart comes from examining the three systems centered around 250K IOPS that span from ~1150 to ~1500 drives.

  • The 1156 drive system is the latest HDS VSP 8-VSD (virtual storage directors, or processing nodes) running with dynamically (thinly) provisioned volumes – which is the first and only SPC-1 submission using thin provisioning.
  • The 1280 drive system is a (now HP) 3PAR T800 8-node system.
  • The 1536 drive system is an IBM SVC 4.3 8-node storage system.

One would think that thin provisioning would degrade storage performance, and maybe it did, but without a non-dynamically provisioned HDS VSP benchmark to compare against it’s hard to tell. However, the fact that the HDS VSP performed as well as the other systems did with much lower drive counts seems to tell us that thin provisioning potentially uses hard drives more efficiently than fat provisioning, that the 8-VSD HDS VSP is more effective than an 8-node IBM SVC 4.3 or an 8-node (HP) 3PAR T800 system, or perhaps some combination of these.

~~~~

The full SPC performance report went out to our newsletter subscribers last November. [The one change to this chart from the full report is that the date in the chart’s title was wrong and is fixed here.] A copy of the full report will be up on the dispatches page of our website sometime this month (if all goes well). However, you can get performance information now and subscribe to future newsletters to receive these reports even earlier by just sending us an email or using the signup form above right.

For a more extensive discussion of block or SAN storage performance covering SPC-1&-2 (top 30) and ESRP (top 20) results please consider purchasing our recently updated SAN Storage Buying Guide available on our website.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.

Comments?

New wireless technology augmenting data center cabling

1906 Patent for Wireless Telegraphy by Wesley Fryer (cc) (from Flickr)

I read a report today in Technology Review about how bouncing data would speed up data centers, which talked about using wireless technology and special ceiling tiles to create dedicated data links between servers. The wireless signal was in the 60GHz range and would yield something on the order of a couple of Gb per second.

The cable mess

Wireless could solve a problem evident to anyone who has looked under data center floor tiles today – cabling. Underneath our data centers today there is a spaghetti-like labyrinth of cables connecting servers to switches to storage and back again. The mass of cables underneath some data centers is so deep and impenetrable that some shops don’t even try to extract old cables when replacing equipment, just leaving them in place and layering on new ones as the need arises.

Bouncing data around a data center

The nice thing about the new wireless technology is that you can easily set up a link between two servers (or servers and switches) just by properly positioning antennas and ceiling tiles, without needing any cables. However, in order to increase bandwidth and reduce interference, the signal has to be narrowly focused, which makes the technology point-to-point, requiring line of sight between the endpoints. But with signal-bouncing ceiling tiles, a “line-of-sight” pathway could readily be created around the data center.

This could easily be accomplished with differently shaped ceiling tiles, such as pyramids, flat panels, or other geometric configurations, that would guide the radio signal to the correct transceiver.

I see it all now: the data center of the future would have its ceiling studded with geometric figures protruding below the tiles, providing waveguides for wireless data paths and routing the signals around obstacles to their final destinations.

Other questions probably remain:

  • It appears the technology can only support 4 channels per stream, which means it might not scale up much beyond current speeds.
  • Electromagnetic radiation is something most IT equipment tries to eliminate rather than transmit. Having something generate and receive radio waves in a data center may require different equipment regulations, and having those types of signals bouncing around a data center may make proper shielding more of a concern.
  • Signal interference is a real problem, which might make routing these signals even more of a problem than routing cables. Which is why I believe some sort of multi-directional wireless switching equipment might help.

In the report there wasn’t any discussion of the energy costs of the wireless technology, and that may be another issue to consider. However, any reduction in cabling can only help IT labor costs, which are a major factor in today’s data center economics.

~~~~

It’s just in the investigation stage now, but Intel, IBM and others are certainly thinking about how wireless technology could help the data centers of tomorrow reduce costs, clutter and cables.

All this gives a whole new meaning to top of rack switching.

Comments?

Analog neural simulation or digital neuromorphic computing vs. AI

DSC_9051 by Greg Gorman (cc) (from Flickr)

At last week’s IBM Smarter Computing Forum we had a session on Watson, IBM’s artificial intelligence machine that won Jeopardy last year, and another session on IBM-sponsored research helping to create the SyNAPSE digital neuromorphic computing chip.

Putting “Watson to work”

Apparently, IBM is taking Watson’s smarts and applying them to healthcare and other information-intensive verticals (intelligence, financial services, etc.). At the conference IBM had Manoj Saxena, senior director of Watson Solutions, and Dr. Herbert Chase, a senior professor of clinical medicine from the Columbia School of Medicine, come up and talk about Watson in healthcare.

Mr. Saxena’s contention, with which Dr. Chase concurred, was that Watson can play an important part in helping healthcare apply current knowledge. Watson’s core capability is the ability to ingest and make sense of information and then apply that knowledge, in this case using medical research knowledge to help diagnose patient problems.

Dr. Chase had been struck at a young age by one patient who had what appeared to be an incurable and unusual disease. He was an intern at the time and was given the task of diagnosing her issue. Eventually, he was able to provide a proper diagnosis, but it irked him that it took so long and so many doctors to get there.

So as a test of Watson’s capabilities, Dr. Chase input this person’s medical symptoms into Watson, and it was able to provide a list of potential diagnoses. Sure enough, Watson did list the medical problem the patient actually had all those years ago.

At the time, I mentioned to another analyst that Watson seemed to represent the end game of artificial intelligence: almost a final culmination of 60 years of AI research, creating a comprehensive service offering for a number of verticals.

That’s all great, but it’s time to move on.

SyNAPSE is born

In the next session IBM had Dr. Dharmendra Modha come up and talk about their latest SyNAPSE chip, a new neuromorphic digital silicon chip that mimics the brain to model neurological processes.

We are quite a ways away from productization of the SyNAPSE chip. Dr. Modha showed us a real-time demonstration of the SyNAPSE chip in action (connected to his laptop), interpreting a handwritten numeral into its numerical representation. I would say it’s a bit early yet to put “SyNAPSE to work”.

Digital vs. analog redux

I have written about the SyNAPSE neuromorphic chip and a competing technology, the direct analog simulation of neural processes, before (see IBM introduces SyNAPSE chip and MIT builds analog synapse chip). In the MIT brain chip post I discussed the differences between the two approaches, focusing on the digital vs. analog divide.

It seems that IBM research is betting on digital neuromorphic computing.  At the Forum last week, I had a discussion with a senior exec in IBM’s STG group, who said that the history of electronic computing over the last half century or so has been mostly about the migration from analog to digital technologies.

Yes, but that doesn’t mean that digital is better, just easier to produce.

On that topic, I asked Dr. Modha what he thought of MIT’s analog brain chip. He said:

  • MIT’s brain chip was built on a 180nm fabrication process whereas his is on 45nm, or over 3X finer. Perhaps the fact that IBM has some of the best fabs in the world has something to do with this.
  • The digital SyNAPSE chip can potentially operate at 5.67GHz and will be absolutely faster than any analog brain simulation. Yes, but each analog simulated neuron is actually one element of a parallel processing complex, and with a thousand or a million of them operating, even at 1000X or a million X slower, the analog approach should be able to keep up (see the back-of-envelope sketch after this list).
  • The digital SyNAPSE chip was carefully designed to be complementary to current digital technology. As I look at IT today, we are surrounded by analog devices that interface very well with the digital computing environment, so I don’t think this will be a problem when we are ready to use it.
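Here's that back-of-envelope comparison as a small sketch. All the rates are hypothetical, chosen only to show the parallelism argument, not measured figures for either chip.

```python
# Back-of-envelope sketch: aggregate update throughput of many slow analog
# neurons vs. one fast digital pipeline. All rates are hypothetical and
# chosen only to illustrate the parallelism argument.

digital_clock_hz = 5.67e9                    # digital chip's potential clock rate (from above)
digital_updates_per_sec = digital_clock_hz   # assume ~1 neuron update per cycle (assumption)

analog_neurons = 1_000_000                   # a million analog neurons in parallel (hypothetical)
analog_update_hz = 5.67e9 / 1000             # each one running 1000X slower (hypothetical)

analog_aggregate = analog_neurons * analog_update_hz
print(f"digital: ~{digital_updates_per_sec:.2e} updates/sec")
print(f"analog : ~{analog_aggregate:.2e} updates/sec aggregate")

# Even at 1000X slower per neuron, a million parallel analog neurons would
# deliver ~1000X the aggregate update rate of one serial digital pipeline.
```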

Analog still surrounds us and defines the real world. Someday the computing industry will get off its digital hobby horse and see the truth in that statement.

~~~~

In any case, if it takes another 60 years to productize one of these technologies, then the Singularity is farther away than I thought; somewhere around 2071 should about do it.

Comments?

Services and products, a match made in heaven

wrench rust by HVargas (cc) (from Flickr)

In all the hoopla about companies’ increasing services revenues, what seems to be missing is that hardware and software sales automatically drive lots of services revenue.

A recent Wikibon post by Doug Chandler (see Can cloud pull services and technology together …) showed a chart of leading IT companies’ percent of revenue from services. The percentages ranged from a high of 57% for IBM to a low of 12% for Dell, with the median being ~26.5%.

In the beginning, …

It seems to me that services started out as an adjunct to hardware and software sales – i.e., maintenance, help installing the product, operational support, etc. Over time, companies like IBM and others went after service offerings as a separate, distinct business activity, outside of normal HW and SW sales cycles.

This turned out to be a great revenue booster, and practically turned IBM around in the 90s. However, one problem with hardware and software vendors’ reporting of service revenue is that they also embed break-fix, maintenance and infrastructure revenue streams in these line items.

The Wikibon blog mentioned StorageTek’s great service revenue business when Sun purchased them. I recall that at the time this was primarily driven by break-fix, maintenance and infrastructure revenues and not mainly by other, non-product-related revenues.

Certainly companies like EDS (now with HP), Perot Systems (now with Dell), and other pure service companies generate all their revenue from services not associated with selling HW or SW, which is probably why HP and Dell purchased them.

The challenge for analysts is to try to separate the ongoing maintenance, break-fix and infrastructure revenues from other service activity in order to understand what is driving service revenue growth:

  • IBM seems to break out its GBS (consulting and application management) from its GTS (outsourcing, infrastructure, and maintenance) revenues (see IBM’s 10-K). However, extracting break-fix and maintenance revenues from the other GTS revenues is impossible outside IBM.
  • EMC has no breakdown whatsoever in the services revenue line item in its 10-K.
  • HP, similarly, has no breakdown of its service revenues in its 10-K.

Some of this may be discussed in financial analyst calls, but I could locate nothing but the above in their annual reports/10Ks.

IBM and Dell to the rescue

So we are all left to wonder how much of reported services revenue is ongoing maintenance and infrastructure business versus other services business. Certainly IBM, in reporting both GBS and GTS, gives us some inkling of what this might be in its annual report: GBS is $18B and GTS is $38B, so maintenance and break-fix must be some portion of that GTS line item.

Perhaps we could use Dell as a proxy to determine break-fix, maintenance and infrastructure service revenues. I’m not sure where Wikibon got the reported service revenue percentage for Dell, but Dell’s most recent 10-K shows services at more like 19% of annual revenue.

Dell had a note in its “Results from operations” section that said Perot Systems was 7% of this, which means pre-existing services, primarily break-fix, maintenance and other infrastructure support revenues, accounted for something like 12% (maybe this is what Wikibon is reporting).

It’s unclear how representative Dell’s revenue percentages are of the rest of the IT industry, but if we take that ~12% of revenue off the percentages reported by Wikibon, then the new ranges for non-break-fix, maintenance and infrastructure service revenues run from 45% for IBM to 7% for Dell, with a median around 14.5%.
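The arithmetic behind those adjusted figures, as a small sketch. The 57%, 26.5% and 19% inputs and the ~12% break-fix proxy come from the discussion above; note the Dell entry uses the ~19% services share from its 10-K rather than Wikibon's 12%.

```python
# Sketch of the adjustment described above: subtract the ~12% Dell-derived
# proxy for break-fix/maintenance/infrastructure revenue from reported
# services percentages. Inputs come from the discussion above.

BREAK_FIX_PROXY = 12.0   # percent of revenue: Dell's ~19% services minus Perot's ~7%

reported_services_pct = {"IBM (high)": 57.0, "median": 26.5, "Dell (low)": 19.0}

for name, pct in reported_services_pct.items():
    other_services = pct - BREAK_FIX_PROXY
    print(f"{name:>10}: {pct:.1f}% reported -> ~{other_services:.1f}% non-break-fix services")

# IBM (high): 57.0% -> ~45.0%
#     median: 26.5% -> ~14.5%
# Dell (low): 19.0% -> ~7.0%
```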

Why is this important?

Break-fix, maintenance and most infrastructure revenues are entirely associated with product (HW or SW) sales, representing an annuity once the original product sale closes. The remaining service revenues are special-purpose contracts (which may last years), much of which are sold on a project basis and represent non-recurring revenue streams.

—-

So the next time some company tells you its service revenues are up 25% YoY, ask how much of this is due to break-fix and maintenance. This may tell you whether product footprint expansion or service offering success is driving the service revenue growth.

Comments?