Genome informatics takes off at 100GB/human

All is One, the I-ching and Genome case by TheAlieness (cc) (from flickr)

I read a recent article (actually a series of charts and text) in MIT Technology Review called Bases to Bytes, which discusses how the cost of having your DNA sequenced is dropping faster than Moore’s law and how storing a person’s DNA data now takes ~100GB.

Apparently Nature magazine says ~30,000 genomes have been sequenced (not counting biotech sequenced genomes), representing ~3PB of data.

Why it takes 100GB

At the moment DNA sequencing doesn’t use compression, deduplication or any other storage efficiency tools to reduce this capacity footprint.  The 3.2 billion DNA base pairs would each take a minimum of 2 bits to store, which works out to ~800MB, but more information about each base is saved (for future needs?) and the DNA is often re-sequenced multiple times just to be sure (replicas?).  All this seems to add up to needing ~100GB for a typical DNA sequencing output.
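The 2-bit arithmetic above can be sketched in a few lines of Python. This is a toy encoder assuming only the four unambiguous bases A/C/G/T; real sequencer output also stores per-base quality scores and ambiguity codes, which is part of where the extra bytes go:

```python
# Toy 2-bit packer for DNA bases (A/C/G/T only -- real formats also
# carry per-base quality scores, ambiguity codes, read metadata, etc.)
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq: str) -> bytes:
    """Pack 4 bases per byte, most significant bits first."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)

print(len(pack("ACGTACGTACGT")))    # 12 bases -> 3 bytes
print(3_200_000_000 * 2 / 8 / 1e9)  # whole genome: 0.8 (GB)
```

At 4 bases per byte, 3.2B base pairs come out to exactly the ~800MB figure above.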

How they get from 0.8GB to 100GB, a 125X expansion, just by saving more info on each base pair and keeping multiple copies is beyond me.

However, we have written about DNA informatics before (see our Dits, codons & chromozones – the storage of life post).  In that post I estimated that human DNA would need ~64GB of storage, almost right on.  (Although there was a math error somewhere in that analysis. Let’s see, 1B codons, each with 64 possibilities [needing 6 bits], should require 6B bits or ~750MB of storage, close enough.)

Dedupe to the rescue

But in my view some deduplication should help.  It’s not clear whether it would work at the codon level or at some higher organizational level (chromosome, protein, ?), but a “codon-differential” deduplication algorithm might just do the trick and take DNA capacity requirements down to size.  In fact, with all the replication in junk DNA, it starts to look more and more like backup sets already.

I am sure any of my Deduplication friends in the industry such as EMC Data Domain, HP StoreOnce, NetApp, SEPATON, and others would be happy to give it some thought if adequate funding were to follow.  But with this much storage at stake, some of them may take it on just to go after the storage requirements.

Gosh, with a 50:1 deduplication ratio, maybe we could get a human DNA sequence down to 2GB.  Then it would only take 14EB to sequence the world’s 7B population today.
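A quick back-of-envelope check of those numbers, using the post’s figures (the 50:1 ratio is of course hypothetical):

```python
# Hypothetical 50:1 dedupe applied to today's ~100GB per-genome footprint
raw_gb = 100                         # per-genome capacity today
deduped_gb = raw_gb / 50             # -> 2 GB per genome
world_total_gb = deduped_gb * 7e9    # ~7B people
print(deduped_gb, world_total_gb / 1e9)  # 2.0 GB each, 14.0 EB total
```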

Now if we could just sequence the human microbiome, with metagenomic analysis of the microbial communities of organisms that live upon, within and around all of us, we might have the answer to everything we wanted to know biologically about a person.

What we could do with all this information is another matter.

Comments?

Graphene Flash Memory

Model of graphene structure by CORE-Materials (cc) (from Flickr)

I have been thinking about writing a post on “Is Flash Dead?” for a while now.  Well, at least since talking with IBM Research a couple of weeks ago about the new memory technologies they have been working on.

But then this new Technology Review article came out discussing recent research on Graphene Flash Memory.

Problems with NAND Flash

As we have discussed before, NAND flash memory has some serious limitations as it’s shrunk below 11nm or so. For instance, write endurance plummets, memory retention times are reduced and cell-to-cell interactions increase significantly.

These issues are not that much of a problem with today’s flash at 20nm or so. But to continue to follow Moore’s law and drop the price of NAND flash on a $/Gb basis, it will need to shrink below 16nm.  At that point or soon thereafter, current NAND flash technology will no longer be viable.

Other non-NAND based non-volatile memories

That’s why IBM and others are working on different types of non-volatile storage, such as PCM (phase change memory), MRAM (magnetic RAM) and FeRAM (ferroelectric RAM).  All these have the potential to improve general reliability characteristics beyond where NAND flash is today and where it will be tomorrow as chip geometries shrink even more.

IBM seems to be betting on MRAM or racetrack memory technology because it has near DRAM performance, extremely low power and can store far more data in the same amount of space. It sort of reminds me of delay line memory where bits were stored on a wire line and read out as they passed across a read/write circuit. Only in the case of racetrack memory, the delay line is etched in a silicon circuit indentation with the read/write head implemented at the bottom of the cleft.

Graphene as the solution

Then along comes graphene-based flash memory.  Graphene can apparently be used as a substitute for the storage layer in a flash memory cell.  According to the report, graphene stores data using less power and with better stability over time, both crucial problems with NAND flash memory as it’s shrunk below today’s geometries.  The research is being done at UCLA and is supported by Samsung, a significant manufacturer of NAND flash memory today.

Current demonstration chips are much larger than would be useful.  However, given graphene’s material characteristics, the researchers believe there should be no problem scaling it down below where NAND Flash would start exhibiting problems.  The next iteration of research will be to see if their scaling assumptions can hold when device geometry is shrunk.

The other problem is getting graphene, a new material, into current chip production.  Materials used in chip manufacturing lines are very tightly controlled, and building hybrid graphene devices to the same level of manufacturing tolerances and control will take some effort.

So don’t look for Graphene Flash Memory to show up anytime soon. But given that 16nm chip geometries are only a couple of years out and 11nm, a couple of years beyond that, it wouldn’t surprise me to see Graphene based Flash Memory introduced in about 4 years or so.  Then again, I am no materials expert, so don’t hold me to this timeline.

 

—-

Comments?

EMCWorld news Day1 1st half

EMC World keynote stage, storage, vblocks, and cloud...

EMC announced today a couple of new twists on the flash/SSD storage end of the product spectrum.  Specifically,

  • They now support all-flash/no-disk storage systems. Apparently they have been getting requests to eliminate disk storage altogether. Probably government IT, but maybe also some high-end enterprise customers with low-power, high-performance requirements.
  • They are going to roll out enterprise MLC flash.  It’s unclear when it will be released, but it’s coming soon; a different price curve and (maybe) different longevity, but it brings down the cost of flash by ~2X.
  • EMC is going to start selling server-side flash, using FAST-like caching algorithms to knit the storage to the server-side flash.  It’s unclear what server flash they will be using, but it sounds a lot like a Fusion-IO type of product.  How well the server cache and the storage cache talk is another matter.  Chuck Hollis said EMC decided to redraw the boundary between storage and server, and now there is a dotted line that spans the SAN/NAS boundary and carves out a piece of the server, which is sort of on-server caching.

Interesting to say the least.  How well it’s tied to the rest of the FAST suite is critical.  What happens when one or the other loses power?  As flash is non-volatile, no data would be lost, but the currency of the data for shared storage may be another question.  Also, having multiple servers in the environment may require cache coherence across the servers and storage participating in this data network!?

Some teaser announcements from Joe’s keynote:

  • VPLEX asynchronous, active-active support for access to the same data from two datacenters over 1700Km apart, Pittsburgh to Dallas.
  • New Isilon NL appliance with record scalability and capacity. It can now support a 15PB file system with trillions of files in it.  One gene sequencing customer says a typical assay generates 500M objects/files…
  • Embracing Hadoop open source products, so that EMC will support a Hadoop distro in an appliance or software-only solution.

Pat G also showed an EMC Greenplum appliance searching an 8B-row database to find out how many products have been shipped to a specific zip code…

Services and products, a match made in heaven

wrench rust by HVargas (cc) (from Flickr)

In all the hoopla about companies’ increasing services revenues, what seems to be missing is that hardware and software sales automatically drive lots of services revenue.

A recent Wikibon post by Doug Chandler (see Can cloud pull services and technology together …) showed a chart of leading IT companies’ percent of revenue from services.  The percentages ranged from a high of 57% for IBM to a low of 12% for Dell, with the median being ~26.5%.

In the beginning, …

It seems to me that services started out being an adjunct to hardware and software sales – i.e., maintenance, help to install the product, provide operational support, etc. Over time, companies like IBM and others went after service offerings as a separate distinct business activity, outside of normal HW and SW sales cycles.

This turned out to be a great revenue booster, and practically turned IBM around in the 90s.  However, one problem with hardware and software vendors’ reporting of service revenue is that they also embed break-fix, maintenance and infrastructure revenue streams in these line items.

The Wikibon post mentioned StorageTek’s great service revenue business when Sun purchased them.  I recall that at the time this was driven primarily by break-fix, maintenance and infrastructure revenues rather than by other non-product related revenues.

Certainly companies like EDS (now with HP), Perot Systems (now with Dell), and other pure service companies generate all their revenue from services not associated with selling HW or SW.  Which is probably why HP and Dell purchased them.

The challenge for analysts is to try to extract the ongoing maintenance, break-fix and infrastructure revenues from other service activity in order to understand what is driving service revenue growth:

  • IBM seems to break out their GBS (consulting and application mgmt) from their GTS (outsourcing, infrastructure, and maint) revenues (see IBM’s 10K).  However, extracting break-fix and maintenance revenues from the other GTS revenues is impossible outside IBM.
  • EMC has no breakdown whatsoever in their services revenue line item in their 10K.
  • HP similarly, has no breakdown for their service revenues in their 10K.

Some of this may be discussed in financial analyst calls, but I could locate nothing but the above in their annual reports/10Ks.

IBM and Dell to the rescue

So we are all left to wonder how much of reported services revenue is ongoing maintenance and infrastructure business versus other services business.  Certainly IBM, in reporting both GBS and GTS, gives us some inkling of what this might be in their annual report: GBS is $18B and GTS is $38B, so maintenance and break-fix must be some portion of that GTS line item.

Perhaps we could use Dell as a proxy to determine break-fix, maintenance and infrastructure service revenues. Not sure where Wikibon got the reported service revenue % for Dell, but their most recent 10K shows services are more like 19% of annual revenues.

Dell had a note in their “Results from operations” section that said Perot Systems was 7% of this, which means previous services, primarily break-fix, maintenance and other infrastructure support revenues, accounted for something like 12% (maybe this is what Wikibon is reporting).

Unclear how well Dell’s revenue percentages represent the rest of the IT industry, but if we take their ~12% of revenues off the percentages reported by Wikibon, then the new range runs from 45% for IBM to 7% for Dell, with a median around 14.5%, for non-break-fix, maintenance and infrastructure service revenues.
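In other words, a rough adjustment using Dell’s ~12% as a proxy for the break-fix/maintenance share at every vendor (admittedly crude, since that share surely differs by vendor):

```python
# Subtract Dell's ~12% break-fix/maintenance proxy from reported services %
reported = {"IBM": 57, "Dell": 19}  # % of revenue from services (Wikibon / Dell 10K)
proxy = 12                          # Dell's pre-Perot services share, per their 10K
adjusted = {co: pct - proxy for co, pct in reported.items()}
print(adjusted)  # {'IBM': 45, 'Dell': 7}
```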

Why is this important?

Break-fix, maintenance revenues and most infrastructure revenues are entirely associated with product (HW or SW) sales, representing an annuity once original product sales close.  The remaining service revenues are special purpose contracts (which may last years), much of which are sold on a project basis representing non-recurring revenue streams.

—-

So the next time some company tells you their service revenues are up 25% YoY, ask them how much of this is due to break-fix and maintenance.  This may tell you whether their product footprint expansion or their service offerings success is driving service revenue growth.

Comments?

Strategy is dead, again

American War Cemetary - Remembering by tienvijftien (cc) (from Flickr)

I was talking with a friend of mine this week, and he said that strategic planning has been deemphasized these last few years, mainly due to the economic climate.  We have discussed this before (see Strategy, as we know it, is dead).  Most companies are in a struggle to survive and have had little time or resources to spend on thinking about their long-term future, let alone next year.

Yes, the last few years have been tough on everyone but the lack of strategic planning is hard for me to accept.  As I look around, products are still being developed, functionality is being enhanced, technology continues to move forward.  All of these exemplify some strategic planning/thinking, albeit prior efforts from 24 to 30 months ago.

Nonetheless, development has not ceased in the interim. New features and products are still being planned for introduction over the next year or so.  Development pipelines seem as full as ever.

One could read the apparent dichotomy between deemphasizing strategic planning and continuing to roll out new products/enhancements as indicating that strategic planning has little impact on product development.  Another, more subtle interpretation is that strategic thinking goes beyond near-term product improvements to something longer term, perhaps outside the 3-year window we see for current product enhancements.

In that case, the evidence for reduced strategic planning will not show up until a couple more years have passed.  Thus, we should eventually see a slowdown in new or revolutionary technology offerings.

Such a slowdown would be hard to detect.  Apple seems to introduce a revolutionary product every 4.5 years or so (iPod ’01, iPhone ’07, iPad ’10).  Other companies probably have longer cycles.  But any evidence for a strategic planning reduction may ultimately show up as a slowdown in the rate of new/revolutionary product introductions.

Other potential indicators of decreased strategic planning include margin erosion, loss of core competencies, reduction in market value, etc.  Some of these are objective, some subjective, but they all sound like a better topic for an MBA thesis than a blog post, or at least my blog posts.

For example, Kodak over the last 15 years or so comes to mind as a strategic corporate catastrophe playing out.  They almost invented digital photography/imaging, but for whatever reason failed to react to the coming transition until it was too late.  The result is the much diminished company of today; e.g., over the last 15 years their stock price has fallen by a factor of 12X or more.

There are probably many more examples of both business strategy failure and success, but from my perspective the choice is obvious: ignore strategic planning for too long and your company struggles to survive; implement strategic planning today and your company may thrive.

What other examples of strategic failure and successes can you think of?