Electro-human interface

Flaming Lotus Girls Neuron by SanFranAnnie (cc) (from Flickr)

There has been a lot of talk recently about neuromorphic computing (see IBM introduces SyNAPSE) and about using Phase Change Memory for artificial neurons, but there hasn't been much discussion of ways to interface humans, or for that matter any living thing, to electronics.

That is until today.

Recently a report came out, highlighted in IEEE Spectrum, on a Transistor Made to Run on Protons.  It seems the transistor technology used everywhere today runs on electrons, or negative charges, while bio-neurological systems all run on positive ions and/or protons.  This makes a proton transistor especially appealing for a biocompatible electronic interface.

Proton transistors are born

Apparently the chip is made out of "nanofibers of chitosan" originally derived from a squid.  Also, the device works only in high humidity and when "loaded with proton-donating acid groups".

All sounds a bit biological to me but that’s probably the point.

It seems that past attempts to provide a bio-electronic interface like this used micro-fluidics, with a flow of positive ions through a pipe. This new approach uses no flowing liquid at all, just a flow of protons across an acid.  The proton flow is controlled by an electrostatic potential applied to the transistor's gate electrode.

Today the proton transistor has a channel width of 3.5 μm.  At that size, it's ~100X bigger than current transistor technology in linear dimension (and much more in area).  Which means it will be some time before they embed a proton based, 12-core Xeon processor in a brain.

Apparently the protons have a mobility of ~5×10⁻³ cm²/(V·s) through the transistor, or by my calculations, roughly ~1/2 bit per 100 seconds.  Seems like we are going to need a lot more channels, but it's only a start.  For more information on the new transistor read the original article in Nature, A polysaccharide bioprotonic field-effect transistor.
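As a back-of-envelope check on what that mobility implies (my own sketch, not from the paper), one can estimate the proton drift velocity and channel transit time. The 1V bias below is an assumed value, and actual signaling rates also depend on how much charge has to move per bit, which this ignores:

    # Rough proton transit estimate; the 1V bias is an ASSUMPTION, not from the paper.
    channel_length_cm = 3.5e-4    # 3.5 um channel, expressed in cm
    mobility = 5e-3               # proton mobility, cm^2/(V*s), per the paper
    bias_volts = 1.0              # assumed bias across the channel

    field = bias_volts / channel_length_cm              # electric field in V/cm
    drift_velocity = mobility * field                   # ~14 cm/s
    transit_time = channel_length_cm / drift_velocity   # seconds to cross the channel
    print("transit time: %.1e seconds" % transit_time)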

But what can it do for me?

A proton transistor has the potential to interface directly with human neurons and as such, can form a biocompatible electronic interface.  Such a bionanoprotonic device can conceivably create an in the brain-to-electronics interface that can bring digital information directly into a person’s consciousness.  Such a capability would be a godsend to the blind, deaf and handicapped.

Of course if information can go in, it can also come out

One can imagine that such an interface can provide a portal to the web, an interface to a desktop or mobile computing device without the burden of displays, keyboards, or speakers. Such a device, when universally available, may make today’s computing paradigm look like using a manual typewriter.

This all sounds like science fiction but it feels like it just got a step closer to reality.

Can The Singularity be that far behind?

—–

Comments?

Commodity hardware debate heats up again

Gold Nanowire Array by lacomj (cc) (from Flickr)

A post by Chris M. Evans in his The Storage Architect blog (Intel inside storage arrays) re-invigorated the discussion we had last year on whether commodity hardware always loses.

But buried in the comments was one from Michael Hay (HDS), which pointed to another blog post by Andrew Huang in his bunnie's blog (Why the best days of open hardware are ahead) containing a brilliant discussion of how Moore's law will eventually peter out (~5nm), after which transistor density will take much longer to double.  At that time, hardware customization (by small companies/startups) will once again come to the forefront of new technology development.

Custom hardware, here now and the foreseeable future

Although it would be hard to argue against Andrew's point, I firmly believe there is still plenty of opportunity today to customize hardware in ways that bring true value to the market.  The fact is that Moore's law doesn't mean hardware customization can't still be worthwhile.

Hitachi's VSP (see Hitachi's VSP vs. VMAX) is a fine example of the use of custom ASICs, FPGAs (I believe), and standard off-the-shelf hardware.  HP's 3PAR is another example; they couldn't have their speedy mesh architecture without custom hardware.

But will anyone be around that can do custom chip design?

Nigel Poulton commented on Chris’s post that with custom hardware seemingly going away, the infrastructure, training and people will no longer be around to support any re-invigorated custom hardware movement.

I disagree.  Intel, IBM, Samsung, and many other large companies still maintain active electronics engineering teams and chip design capabilities, any of which could create state-of-the-art ASICs.  These capabilities are what make Moore's law a reality and will not go away over the long run (the next 20-30 years).

The fact that these competencies are locked up in very large organizations doesn't mean they cannot be used by small companies/startups as well.  It probably does mean that this wherewithal will cost more. But the marketplace will deal with that in the long run, that is, if the need continues to exist.

But do we still need custom hardware?

Custom hardware creates capabilities that magnify Moore's law processing power to do things that standard, off-the-shelf hardware cannot.  The main problem with Moore's law, from a custom hardware perspective, is that functionality which required custom hardware yesterday (or 18 months ago) becomes available on off-the-shelf components with custom software today.

This dynamic just means that custom hardware needs to keep moving, providing ever more user benefits and functionality to remain viable.  When custom hardware cannot provide any real benefit over standard off the shelf components – that’s when it will die.

Andrew talks about the time it takes to develop custom ASICs and the fact that by the time you have one ready, a new standard chip has come out that doubles processor capabilities. Yes, custom ASICs take time to develop, but FPGAs can be created and deployed in much less time. FPGAs, like custom ASICs, also take advantage of Moore's law, with transistor density increasing every 18 months. And yes, FPGAs may run slower than custom ASICs, but what they lack in processing power they make up in time to market.

Custom hardware has a bright future as far as I can see.

—–

Comments?

SCI’s latest SPC-2 performance results analysis – chart-of-the-month

SCISPC110822-002 (c) 2011 Silverton Consulting, All Rights Reserved

There really weren't that many new submissions for the Storage Performance Council SPC-1 or SPC-2 benchmarks this past quarter (just the new Fujitsu DX80S2 SPC-2 run), so we thought it time to roll out a new chart.

The chart above shows a scatter plot of the number of disk drives in a submission vs. the MB/sec attained for the Large Database Query (LDQ) component of an SPC-2 benchmark.

As anyone who follows this blog and our twitter feed knows, we continue to have an ongoing, long-running discussion on how I/O benchmarks such as this are mostly just a measure of how much hardware (disks and controllers) is thrown at them.  We added a linear regression line to the above chart to evaluate the validity of that claim, and as clearly shown above, disk drive count is NOT highly correlated with SPC-2 performance.

We necessarily exclude from this analysis any system results that used NAND-based caching or SSD devices, so as to focus specifically on the relevance of disk drive count.   There are not a lot of these in SPC-2 results, but there are enough that including them would make the correlation look even worse.

We chose to display only the LDQ segment of the SPC-2 benchmark because it has the best correlation between workload and disk count, with the highest R**2 at 0.41. The aggregate MBPS, as well as the other components of the SPC-2 benchmark, video on demand (VOD) and large file processing (LFP), all had R**2's of less than 0.36.
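For those who want to check the numbers, here is a minimal sketch of the kind of regression we ran, assuming you have extracted (disk drive count, LDQ MBPS) pairs from the published SPC-2 full disclosure reports. The sample values below are made up for illustration, apart from the two ~775-drive systems discussed next:

    from scipy import stats

    # (drive count, LDQ MB/s) pairs; illustrative stand-ins, not actual results,
    # except the two 775-drive systems mentioned in the text.
    drives   = [96, 192, 384, 480, 775, 775, 960]
    ldq_mbps = [1900, 3200, 5400, 6800, 6000, 11500, 9700]

    fit = stats.linregress(drives, ldq_mbps)
    print("slope = %.1f MB/s per drive, R**2 = %.2f" % (fit.slope, fit.rvalue ** 2))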

For instance, just look at the vertical band centered around 775 disk drives.  Two systems show up there, one doing ~6,000 MBPS and the other ~11,500 MBPS, quite a difference.  The fact that these are two different storage architectures from the same vendor is even more informative.

Why is the overall correlation so poor?

One can only speculate, but there must be something about system sophistication at work in SPC-2 results.  It's probably tied to better caching, better data layout on disk, and better IO latency, but that's only an educated guess.  For example:

  • Most of the SPC-2 workload is sequential in nature.  How a storage system detects sequentiality in a seemingly random IO mix is an art form, and what a system does armed with that knowledge is probably more of a science (see the sketch after this list).
  • In the old days of big, expensive CKD DASD, sequential data was laid out consecutively (barring lacing) around a track and up a cylinder.  In these days of zoned FBA disks, one can only hope that sequential data resides in laced sectors along consecutive tracks on the media, minimizing head seek activity.  Another approach, popular this last decade, has been to throw more disks at the problem, resulting in many more seeking heads to handle the workload no matter where the data lies.
  • IO latency is another factor.  We have discussed this before (see Storage throughput vs IO response time and why it matters). One key to system throughput is how quickly data gets out of cache and into the hands of servers. The other part of this, of course, is how fast the storage system gets the data sitting on disk into cache.
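To make the sequential detection point concrete, here is a toy sketch of one way a controller might spot sequential streams in an incoming IO mix and trigger read-ahead. Real arrays use far more sophisticated, proprietary schemes, so treat this purely as an illustration:

    # Toy sequential-stream detector; purely illustrative, not any vendor's algorithm.
    # Track the next expected LBA per stream; enough consecutive hits in a row
    # triggers read-ahead (prefetch) for that stream.
    SEQ_THRESHOLD = 4                  # consecutive hits before declaring "sequential"

    class StreamDetector:
        def __init__(self):
            self.next_lba = {}         # stream id -> next expected LBA
            self.run_len = {}          # stream id -> current sequential run length

        def on_read(self, stream_id, lba, blocks):
            if self.next_lba.get(stream_id) == lba:
                self.run_len[stream_id] = self.run_len.get(stream_id, 0) + 1
            else:
                self.run_len[stream_id] = 0
            self.next_lba[stream_id] = lba + blocks
            if self.run_len[stream_id] >= SEQ_THRESHOLD:
                self.prefetch(stream_id, lba + blocks)

        def prefetch(self, stream_id, lba):
            print("stream %s: prefetching from LBA %d" % (stream_id, lba))

    detector = StreamDetector()
    for i in range(6):                 # six sequential 8-block reads
        detector.on_read("host0", 1000 + i * 8, 8)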

Systems that do these better will perform better on SPC-2 like benchmarks that focus on raw sequential throughput.

Comments?

—–

The full SPC performance report went out to our newsletter subscribers last month.  A copy of the full report will be up on the dispatches page of our website later next month. However, you can get this information now and subscribe to future newsletters to receive these reports even earlier by just sending us an email or using the signup form above right.

As always, we welcome any suggestions on how to improve our analysis of SPC results or any of our other storage system performance discussions.

Big data and eMedicine combine to improve healthcare

fix_me by ~! (cc) (from Flickr)

We have talked before about ePathology and data growth, but Technology Review recently reported that researchers at Stanford University have used Electronic Medical Records (EMR) from multiple medical institutions to identify a new harmful drug interaction. Apparently, they found that when patients take Paxil (an antidepressant) and Pravachol (a cholesterol reducer) together, the drugs interact to raise blood sugar to levels similar to those seen in diabetics.

Data analytics to the rescue

The researchers started out looking for new drug interactions that could result in conditions like those seen in diabetics. Their initial study showed a strong signal that taking both Paxil and Pravachol could be a problem.

Their study used the FDA Adverse Event Reports (AERs) data that hospitals and medical care institutions record.  Originally, the researchers at Stanford's Biomedical Informatics group used AERs available at the Stanford University School of Medicine, but found that although they had a clear signal that there could be a problem, they didn't have sufficient data to statistically prove the combined drug interaction.

They then went to Harvard Medical School and Vanderbilt University and asked to access their AERs to add to their data.  With the combined data, the researchers were able to clearly see and statistically prove the adverse interaction between the two drugs.

But how did they analyze the data?

I could find no information about what tools the biomedical informatics researchers used to analyze the set of AERs they amassed, but it wouldn't surprise me to find out that Hadoop played a part in this activity.  It would seem a natural fit to use Hadoop and MapReduce to aggregate the AERs into a semi-structured data set and then reduce that data set to extract the AERs matching their interaction profile.
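Purely as a sketch of what such a map/reduce pass might look like (the record fields and matching logic are my own stand-ins; I have no knowledge of the researchers' actual pipeline), the map step could flag AERs listing both drugs and the reduce step could tally how many also report elevated blood sugar:

    from collections import defaultdict

    # Illustrative only: field names and matching logic are my assumptions,
    # not the Stanford group's actual analysis.
    def map_phase(aer):
        drugs = {d.lower() for d in aer["drugs"]}
        if {"paroxetine", "pravastatin"} <= drugs:      # Paxil + Pravachol
            yield ("paxil+pravachol", aer["high_blood_sugar"])

    def reduce_phase(pairs):
        tally = defaultdict(lambda: [0, 0])             # key -> [reports, high sugar]
        for key, high_sugar in pairs:
            tally[key][0] += 1
            tally[key][1] += int(high_sugar)
        return dict(tally)

    aers = [  # toy records standing in for hospital AER data
        {"drugs": ["Paroxetine", "Pravastatin"], "high_blood_sugar": True},
        {"drugs": ["Paroxetine"], "high_blood_sugar": False},
        {"drugs": ["Pravastatin", "Paroxetine"], "high_blood_sugar": True},
    ]
    pairs = (p for aer in aers for p in map_phase(aer))
    print(reduce_phase(pairs))   # {'paxil+pravachol': [2, 2]}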

Then again, it's entirely possible that they used a standard database analytics tool to do the work.  After all, we are only talking about 100K to 200K records or so.

Nonetheless, the Technology Review article stated that some large hospitals and medical institutions using EMR are starting to have database analysts (maybe data scientists) on staff to mine their record data and electronic information to help improve healthcare.

Although EMR was originally envisioned as a way to keep better track of individual patients, when a single patient's data is combined with that of thousands of other patients, one creates something entirely different, something that can be mined to extract information.  Such a data repository can be used to ask questions about healthcare that were inconceivable before.

—-

Digitized medical imagery (X-Rays, MRIs, & CAT scans), E-pathology, and now EMR are together giving rise to a new form of electronic medicine, or E-Medicine.  With everything being digitized, securely accessed, and amenable to big data analytics, medical care as we know it is about to undergo a paradigm shift.

Big data and eMedicine combined together are about to change healthcare for the better.

The sensor cloud comes home

We thought the advent of smart power meters would be the killer app for building the sensor cloud in the home.  But this week Honeywell announced a new smart thermostat that connects to the Internet and uses Opower's cloud service to record and analyze home heating and cooling demand.  That looks to be an even better bet.

9/11 Memorial renderings, aerial view (c) 9/11 Memorial.org (from their website)

Just this past week, on the PBS NOVA telecast Engineering Ground Zero, about building the 9/11 memorial in NYC, it was mentioned that all the trees planted in the memorial have individual sensors to measure soil chemistry, dampness, and other tree health indicators. Yes, even trees are getting on the sensor cloud.

And of course the buildings going up at Ground Zero are all smart buildings as well, containing sensors embedded in the structure, the infrastructure, and anywhere else that matters.

But what does this mean in terms of data?

Data requirements will explode as the smart home and other sensor clouds build out.  For example, even if a smart thermostat issues only a 256-byte message every 15 minutes, the data from the 130 million households in the US alone would add up to ~3.2TB/day.  And that's just one sensor per household.

If you add the smart power meter, lawn sensor, intrusion/fire/chemical sensor, and, god forbid, the refrigerator and freezer product sensors to the mix, that's another ~16TB/day of incoming data.

And that’s just assuming a 256 byte payload per sensor every 15 minutes.  The intrusion sensors could easily be a combination of multiple, real time exterior video feeds as well as multi-point intrusion/motion/fire/chemical sensors which would generate much, much more data.

But we still have smart roads/bridges, smart cars/trucks, smart skyscrapers, smart port facilities, smart railroads, smart boats/ferries, etc. to come.   I could go on, but the list seems long enough already.  Each of these could generate another ~19TB/day data stream, if not more.  Some of these infrastructure entities/devices are much more complex than a house, and there are a lot more cars on the road than houses in the US.
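The arithmetic behind these estimates is simple enough to sketch out; the sensor counts and message sizes are just the assumptions stated above:

    # Back-of-envelope sensor cloud data volumes, using the post's assumptions.
    households   = 130e6            # US households
    msg_bytes    = 256              # one 256-byte message...
    msgs_per_day = 24 * 60 // 15    # ...every 15 minutes = 96 messages/day

    one_sensor_tb = households * msg_bytes * msgs_per_day / 1e12
    print("1 sensor/household : %.1f TB/day" % one_sensor_tb)        # ~3.2 TB/day
    print("5 more sensors     : %.1f TB/day" % (5 * one_sensor_tb))  # ~16 TB/day
    print("all 6 sensors      : %.1f TB/day" % (6 * one_sensor_tb))  # ~19 TB/day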

It’s great to be in the (cloud) storage business

All that data has to be stored somewhere and that place is going to be the cloud.  The Honeywell smart thermostat uses Opower’s cloud storage and computing infrastructure specifically designed to support better power management for heating and cooling the home.  Following this approach, it’s certainly feasible that more cloud services would come online to support each of the smart entities discussed above.

Naturally, using this data to provide real-time understanding of the infrastructure they monitor will require big data analytics. Hadoop and its counterparts are the only platforms around today that are up to this task.

—-

So cloud computing, cloud storage, and big data analytics have yet another part to play, this time in the upcoming sensor cloud that will envelope the world and all of its infrastructure.

Welcome to the future, it’s almost here already.

Comments?

Disk capacity growing out-of-sight

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

Over the past week, Hitachi Global Storage Technologies (being acquired by Western Digital, with the deal closing in 4Q2011) and Seagate announced some higher capacity disk drives for desktop applications.

Most of us in the industry have become somewhat jaded with respect to new capacity offerings. But last week's announcements may give one pause.

Hitachi announced that they are shipping 3.5″ platters holding over 1TB each, using 569Gb/sqin technology.  In the past, full-height 3.5″ disk drives have shipped with 4-6 platters.  Given the platter capacity available now, 4-6TB drives are certainly feasible or just around the corner. Both Seagate and Samsung beat HGST to 1TB platter capacities, which they announced in May of this year and began shipping in drives in June.

Speaking of 4TB drives, Seagate announced a new 4TB desktop external disk drive.  I couldn't locate any information about the number of platters or the Gb/sqin of their technology, but 4 platters are certainly feasible, and as a result, a 4TB disk drive is available today.

I don't know about you, but a 4TB disk drive for a desktop seems about as much as I could ever use. But looking seriously at my desktop environment, my CAGR for storage (measured as fully compressed TAR files) is ~61% year over year.  At that rate, I will need a 4TB drive for backup purposes in about 7 years, and if I assume a 2X compression rate, then a 4TB desktop drive will be needed in ~3.5 years (darn music, movies, photos, …).  And we are not heavy digital media consumers; others who shoot and edit their own video probably use orders of magnitude more storage.
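The projection itself is a simple compound-growth calculation; the starting backup sizes below are hypothetical values back-solved to match the 7 and 3.5 year figures, since my actual numbers weren't published:

    import math

    # Years until backups outgrow a 4TB drive at a given CAGR.
    # The starting sizes are HYPOTHETICAL, for illustration only.
    def years_to_fill(target_tb, starting_tb, cagr):
        return math.log(target_tb / starting_tb) / math.log(1 + cagr)

    cagr = 0.61   # ~61% year-over-year storage growth
    print("%.1f years" % years_to_fill(4.0, 0.14, cagr))  # ~7 years from ~140GB compressed
    print("%.1f years" % years_to_fill(4.0, 0.75, cagr))  # ~3.5 years from ~750GB uncompressed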

Hard to believe, but given current trends, it seems inevitable that a 4TB disk drive will become a necessity for us within the next 4 years.

—-

Comments?

HDS buys BlueArc

wall o' storage (fisheye) by ChrisDag (cc) (From Flickr)

Yesterday, HDS announced that they had closed on the purchase of BlueArc, their NAS supplier for the past 5 years or so.  Many commentators mentioned that this was a logical evolution of the ongoing OEM agreement, noted that the timing was right, and speculated on what the purchase price might have been.   If you are interested in those aspects of the acquisition, I would refer you to the excellent post by David Vellante of Wikibon on the HDS BlueArc deal.

Hardware as a key differentiator

In contrast, I would like to concentrate here on another view of the purchase, specifically on how HDS and Hitachi, Ltd. have both been working to increase their product differentiation through advanced and specialized hardware (see my post on Hitachi's VSP vs VMAX, and for more on hardware vs. software, check out Commodity hardware always loses).

BlueArc shared this philosophy and was one of the few NAS vendors to develop special purpose hardware, for their Titan and Mercury systems, specifically to speed up NFS and CIFS processing.  Most other NAS systems use more general purpose hardware, and as a result, the majority of their R&D investment focuses on software functionality.

But not BlueArc; their performance advantage was highly dependent on specially designed FPGAs and other hardware.  As such, they maintain a significant hardware R&D budget to sustain and leverage their unique hardware advantage.

From my perspective, this follows what HDS and Hitachi, Ltd. have been doing all along with the USP, USP-V, and now their latest entrant, the VSP.  If you look under the covers of these products, you find a plethora of special purpose ASICs, FPGAs, and other hardware that helps accelerate IO performance.

BlueArc and HDS/Hitachi, Ltd. seem to be some of the last vendors standing that still believe hardware specialization can bring value to data storage. From that standpoint, it makes an awful lot of sense to me for HDS to purchase them.

But others aren’t standing still

In the meantime, scale-out NAS products continue to move forward on a number of fronts.  As readers of my newsletter know, the current SPECsfs2008 overall performance winner is a scale-out NAS solution using 144 nodes from EMC Isilon (newsletter signup is above right, or can also be found here).

The fact that now HDS/Hitachi, Ltd. can bring their considerable hardware development skills and resources to bear on helping BlueArc develop and deploy their next generation of hardware is a good sign.

Another interesting tidbit was HDS's previous purchase of ParaScale, which seems to have some scale-out NAS capabilities of its own.  How this all gets pulled together within HDS's product line remains to be seen.

In any event, all this means that the battle for NAS isn’t over and is just moving to a higher level.

—-

Comments?

Graphene Flash Memory

Model of graphene structure by CORE-Materials (cc) (from Flickr)

I have been thinking about writing a post on "Is Flash Dead?" for a while now, at least since talking with IBM Research a couple of weeks ago about the new memory technologies they have been working on.

But then this new Technology Review article came out  discussing recent research on Graphene Flash Memory.

Problems with NAND Flash

As we have discussed before, NAND flash memory has some serious limitations as it’s shrunk below 11nm or so. For instance, write endurance plummets, memory retention times are reduced and cell-to-cell interactions increase significantly.

These issues are not that much of a problem with today’s flash at 20nm or so. But to continue to follow Moore’s law and drop the price of NAND flash on a $/Gb basis, it will need to shrink below 16nm.  At that point or soon thereafter, current NAND flash technology will no longer be viable.

Other non-NAND based non-volatile memories

That’s why IBM and others are working on different types of non-volatile storage such as PCM (phase change memory), MRAM (magnetic RAM) , FeRAM (Ferroelectric RAM) and others.  All these have the potential to improve general reliability characteristics beyond where NAND Flash is today and where it will be tomorrow as chip geometries shrink even more.

IBM seems to be betting on MRAM or racetrack memory technology because it has near-DRAM performance, extremely low power, and can store far more data in the same amount of space. It sort of reminds me of delay line memory, where bits were stored on a wire and read out as they passed a read/write circuit. Only in the case of racetrack memory, the delay line is etched into a silicon indentation, with the read/write head implemented at the bottom of the cleft.

Graphene as the solution

Then along comes graphene-based flash memory.  Graphene can apparently be used as a substitute for the storage layer in a flash memory cell.  According to the report, graphene stores data using less power and with better stability over time, both crucial problems with NAND flash memory as it's shrunk below today's geometries.  The research is being done at UCLA and is supported by Samsung, a significant manufacturer of NAND flash memory today.

Current demonstration chips are much larger than would be useful.  However, given graphene's material characteristics, the researchers believe there should be no problem scaling it down below the point where NAND flash starts exhibiting problems.  The next iteration of research will be to see whether their scaling assumptions hold when device geometry is shrunk.

The other problem is getting graphene, a new material, into current chip production.  Materials used in chip manufacturing lines are very tightly controlled, and building hybrid graphene devices to the same level of manufacturing tolerances and control will take some effort.

So don't look for graphene flash memory to show up anytime soon. But given that 16nm chip geometries are only a couple of years out, and 11nm a couple of years beyond that, it wouldn't surprise me to see graphene-based flash memory introduced in about 4 years or so.  Then again, I am no materials expert, so don't hold me to this timeline.

—-

Comments?