Latest Publications

Future iBook pricing

Apple's iPad iBook app (from Apple.com)

All the news about the iPad and the iBooks app got me thinking. There’s been much discussion of e-book pricing, but no one is looking at what to charge for items other than books.  I see this as something like what happened to albums when iTunes came out: individual songs became available without having to buy the whole album.

As such, I started to consider what iBooks should charge for items outside of books.  Specifically,

  • Poems - no reason the iBooks app should not offer poems as well as books, but what’s a reasonable price for a poem?  I believe Natalie Goldberg, in Writing Down the Bones: Freeing the Writer Within, used to charge $0.25 per poem.  So this is a useful lower bound; however, considering inflation (and assuming $0.25 was 1976 pricing), in today’s prices this would be closer to $1.66.  With the iBooks app’s published commission rate (33% for Apple), future poets would walk away with $1.11 per poem.
  • Haiku - As a short-form poem, a haiku should, I would argue, cost less than a poem.  So maybe $0.99 per haiku would be a reasonable price.
  • Short stories - As a short-form book, a short story needs pricing that is somehow proportional to normal e-book pricing.  A typical book has about 10 chapters and as such, it might be reasonable to consider a short story as equal to a chapter.  So maybe 1/10th the price of an e-book is reasonable.  With the prices being discussed for books, this would be roughly the price we set for poems.  No doubt incurring the wrath of poets forevermore, I am willing to say this undercuts the worth of short stories and would suggest something more on the order of $2.49 for a short story.  (Poets, please forgive my transgression.)
  • Comic books - Comic books seem close to short stories and with their color graphics would do well on the iPad.  It seems to me that these might be priced somewhere in between short stories and poems,  perhaps at $1.99 each.
  • Magazine articles - I see no reason that magazine articles shouldn’t be offered as well as short stories outside the magazine itself. Once again, color graphics found in most high end magazines should do well on the iPad.  I would assume pricing similar to short stories would make sense here.
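
As a rough check on the numbers above, the author's share at each suggested price can be computed directly. This is only a sketch: it assumes the 33% commission quoted above applies uniformly to every item type, and the price table and the `author_take` helper are illustrative names, not anything Apple publishes.

```python
# Author's share per sale at the suggested prices, assuming the 33%
# store commission quoted in this post applies to every item type.
APPLE_COMMISSION = 0.33  # rate as stated in this post (an assumption here)

suggested_prices = {
    "poem": 1.66,
    "haiku": 0.99,
    "short story": 2.49,
    "comic book": 1.99,
}

def author_take(price, commission=APPLE_COMMISSION):
    """Return what the author/publisher keeps from one sale, in dollars."""
    return round(price * (1 - commission), 2)

for item, price in suggested_prices.items():
    print(f"{item}: ${price:.2f} list, ${author_take(price):.2f} to the author")
```

At these prices a poem nets the $1.11 mentioned above, and a haiku nets roughly two thirds of a dollar.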

University presses, the prime outlet for short stories today, seem similar to small record labels.  Of course,  the iBooks app could easily offer to sell their production as e-books in addition to selling their stories separately. Similar considerations apply to poetry publishers. Selling poems and short stories outside of book form might provide more exposure for the authors/poets and in the long run, more revenue for them and their publishers.  But record companies will attest that your results may vary.

Regarding magazine articles and comic books, there seems to be a dependence on advertising revenue that may suffer from iBook publishing.  This could be dealt with by incorporating publisher advertisements in iBook displays of an article or comic book.  However, significant advertisement revenue comes from ads placed outside of articles, such as in back matter, around the table of contents, in between articles, etc.  This will need to change with the transition to e-articles – revenues may suffer.

Nonetheless, all these industries can continue to do what they do today.  Record companies still exist, perhaps not doing as well as before iTunes, but they still sell CDs.  So there is life after iTunes/iBooks, but one thing’s for certain – it’s different.

I am probably missing whole categories of items that could be separated from book form as sold today.  But in my view, anything that could be offered separately probably will be.  Comments?

Intel-Micron new 25nm/8GB MLC NAND chip

intel_and_micron_in_25nm_nand_technology

Intel-Micron Flash Technologies just announced another increase in NAND density. This one manages to put 8GB on a single chip with MLC(2) technology in a 167 mm² package, or roughly a half inch per side.

You may recall that Intel-Micron Flash Technologies (IMFT) is a joint venture between Intel and Micron to develop NAND technology chips. IMFT chips can be used by any vendor and typically show up in Intel SSDs as well as other vendors' systems. MLC technology is more suitable for use in consumer applications, but at these densities it’s starting to make sense for use by data centers as well. We have written before about MLC NAND used in enterprise disk by STEC and about Toshiba’s MLC SSDs. But in essence, MLC NAND reliability and endurance will ultimately determine its place in the enterprise.

But at these densities, you can just throw more capacity at the problem to mask MLC endurance concerns. For example, with this latest chip, one could conceivably have a single layer 2.5″ configuration with almost 200GB of MLC NAND. If you wanted to configure this as a 128GB SSD, you could use the additional 72GB of NAND to replace failing pages. Doing this could conceivably add more than 50% to the life of an SSD.
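
The spare-capacity arithmetic behind that estimate is simple enough to sketch. The snippet below assumes, as a simplification, that extra NAND translates roughly proportionally into extra life; real gains depend on wear leveling and write amplification, which are not modeled here.

```python
# Spare-capacity arithmetic for the example above: ~200GB of raw MLC NAND
# exposed to the host as a 128GB SSD.
raw_gb = 200      # raw NAND in the 2.5" configuration (approximate)
usable_gb = 128   # capacity presented to the host

spare_gb = raw_gb - usable_gb
spare_ratio = spare_gb / usable_gb  # spare NAND relative to usable capacity

print(f"spare NAND: {spare_gb}GB ({spare_ratio:.0%} of usable capacity)")
```

The spare pool works out to a bit over half of the usable capacity, which is where the "more than 50%" figure comes from.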

SLC still has better (~10X) endurance but being able to ship 2X the capacity in the same footprint can help.  Of course, MLC and SLC NAND can be combined in a hybrid device to give some approximation of SLC reliability at MLC costs.

IMFT made no mention of SLC NAND chips at the 25nm technology node but presumably this will be forthcoming shortly.  If we assume the technology can support 4GB of SLC NAND in a 167 mm² chip, it should be of significant interest to most enterprise SSD vendors.

A couple of things were missing from yesterday’s IMFT press release, namely:

  • read/write performance specifications for the NAND chip
  • write endurance specifications for the NAND chip

SSD performance is normally a function of all the technology that surrounds the NAND chip, but it all starts with the chip.  Also, MLC used to be capable of 10,000 write/erase cycles and SLC was capable of 100,000 w/e cycles, but the most recent technology from Toshiba (presumably 34nm technology) shows an MLC NAND write/erase endurance of only 1400 cycles.  This seems to imply that as NAND density increases, write endurance degrades. How much is subject to debate, and with the lack of any standardized w/e endurance specifications and reporting, it’s hard to see how bad it gets.

The bottom line: capacity is great, but we need to know w/e endurance to really see where this new technology fits.  Ultimately, if endurance degrades significantly, such NAND technology will only be suitable for consumer products.  Of course, at ~10X (just guessing) the size of the enterprise market, maybe that’s ok.

Free P2P-Cloud Storage and Computing Services?

FFT_graph from Seti@home

What would happen if somebody came up with a peer-to-peer cloud (P2P-Cloud) storage or computing service?  I see this as

  • Operating a little like Napster/Gnutella where many people come together and share out their storage/computing resources.
  • It could operate in a centralized or decentralized fashion
  • It  would allow access to data/computing resources anywhere from the internet

Everyone joining the P2P-cloud would need to set aside computing and/or storage resources they were willing to devote to the cloud.  By doing so, they would gain access to an equivalent amount (minus overhead) of other nodes computing and storage resources to use as they see fit.

P2P-Cloud Storage

For cloud storage the P2P-Cloud would create a common cloud data repository spread across all nodes in the network:

  • Data would be distributed across the network in such a way that would allow reconstruction within any reasonable time frame and would handle any reasonable amount of node outages without loss of data.
  • Data would be encrypted before being sent to the cloud rendering the data unreadable without the key.
  • Data would NOT necessarily be shared, but would be hosted on other users systems.

As such, if I were to offer up 100GB of storage to the P2P-Cloud, I would get up to 100GB (less overhead) of protected storage elsewhere on the cloud to use as I see fit.  Some % of this would be lost to administration, say 1-3%, and to redundancy protection, say ~25%, but the remaining ~72GB of off-site storage could be very useful for DR purposes.
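
A minimal sketch of that overhead arithmetic follows. The function name and default percentages are illustrative: the 1-3% administration and ~25% redundancy figures are the rough guesses from the paragraph above, not measured values.

```python
def usable_offsite_gb(offered_gb, admin_pct=3.0, redundancy_pct=25.0):
    """Storage left for the contributor after overhead, in GB.

    admin_pct and redundancy_pct are the rough guesses from the text
    (1-3% administration, ~25% redundancy), not measured values.
    """
    overhead_pct = admin_pct + redundancy_pct
    return offered_gb * (100.0 - overhead_pct) / 100.0

print(usable_offsite_gb(100))               # 3% administration overhead
print(usable_offsite_gb(100, admin_pct=1))  # 1% administration overhead
```

With the high-end 3% administration guess, a 100GB contribution leaves 72GB of usable off-site storage; with 1% it leaves 74GB.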

P2P-Cloud storage would provide a reliable, secure, distributed file repository that could be easily accessible from any internet location.  At a minimum, the service would be free and equivalent to what someone supplies (less overhead) to the P2P-Cloud Storage service.  If storage needs exceeded your commitment, more cloud storage could be provided at a modest cost to the consumer.  Such fees would be shared by all the participants offering excess [=offered - (consumed + overhead)] storage to the cloud .
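
One way the fee sharing could work is sketched below. The excess formula is the one given above; splitting fees pro rata by excess is my assumption (the text only says fees are "shared"), and the node names and numbers are hypothetical.

```python
def excess_gb(offered, consumed, overhead):
    """Excess storage a node contributes: offered - (consumed + overhead)."""
    return max(0.0, offered - (consumed + overhead))

def share_fees(total_fees, nodes):
    """Split fees pro rata by excess storage (an assumed policy).

    nodes maps a node name to (offered, consumed, overhead) in GB.
    """
    excess = {name: excess_gb(*usage) for name, usage in nodes.items()}
    total_excess = sum(excess.values())
    if total_excess == 0:
        return {name: 0.0 for name in nodes}
    return {name: total_fees * e / total_excess for name, e in excess.items()}

# Hypothetical nodes: (offered, consumed, overhead) in GB
nodes = {
    "alice": (100, 40, 28),  # 32GB excess
    "bob": (200, 100, 56),   # 44GB excess
    "carol": (50, 50, 14),   # consumes more than offered, so no share
}
payouts = share_fees(76.0, nodes)
print(payouts)
```

Nodes that consume as much as (or more than) they offer receive nothing, which keeps the incentive pointed at contributing spare capacity.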

P2P-Cloud Computing

Cloud computing is definitely more complex, but generally follows the SETI@home/BOINC model:

  • P2P-Cloud computing suppliers would agree to use something like a “new screensaver” which would perform computation while generating a viable screensaver.
  • Whenever the screensaver was invoked, it would start execution on the last assigned processing unit.  Intermediate work results would need to be saved and when completed, the answer could be sent to the requester and a new processing unit assigned.
  • Processing units would be assigned by the P2P-Cloud computing consumer, would be timeout-able and re-assignable at will.
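
The assign, time out, and re-assign cycle described in the list above might look something like the following toy, in-memory sketch. It is not how BOINC implements scheduling; the `WorkUnitQueue` class and its method names are hypothetical, and saving intermediate results is left out.

```python
import time

class WorkUnitQueue:
    """Toy, in-memory work-unit tracker with timeouts and reassignment."""

    def __init__(self, units, timeout_s):
        self.pending = list(units)   # units not yet handed out
        self.assigned = {}           # unit -> (node, deadline)
        self.results = {}            # unit -> answer
        self.timeout_s = timeout_s

    def assign(self, node, now=None):
        """Hand the next unit to a node, reclaiming timed-out units first."""
        now = time.time() if now is None else now
        for unit, (_, deadline) in list(self.assigned.items()):
            if deadline <= now:            # supplier took too long
                del self.assigned[unit]
                self.pending.append(unit)  # make it re-assignable
        if not self.pending:
            return None
        unit = self.pending.pop(0)
        self.assigned[unit] = (node, now + self.timeout_s)
        return unit

    def complete(self, node, unit, answer):
        """Record an answer; ignore it if the unit was reassigned elsewhere."""
        owner = self.assigned.get(unit, (None, None))[0]
        if owner != node:
            return False
        del self.assigned[unit]
        self.results[unit] = answer
        return True

queue = WorkUnitQueue(["u1", "u2"], timeout_s=60)
first = queue.assign("node-a", now=0)      # node-a gets u1
second = queue.assign("node-b", now=10)    # node-b gets u2
third = queue.assign("node-c", now=100)    # both timed out; u1 is reassigned
late = queue.complete("node-a", "u1", 42)  # rejected: u1 now belongs to node-c
done = queue.complete("node-c", "u1", 42)  # accepted
```

Passing `now` explicitly keeps the example deterministic; a real service would also need persistence and some way to verify the answers it gets back.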

Computing users won’t gain much if the computing time they consume is <= the computing time they offer (less overhead).  However, a computing time offset may be worth something, i.e., computing time now might be more valuable than computing time tonight.  This may offer a slight margin of value to help get this off the ground.  As such, P2P-Cloud computing suppliers would need to be able to specify when computing resources might be mostly available, along with the type, quality and quantity.

It is unclear how to secure the processing unit, which makes legal issues more prominent.  That may not be much of a problem, as a complex distributed computing task makes little sense in isolation. But the (il-)legality of some data processing activities could conceivably put the provider in a precarious position. (Somebody from the legal profession would need to clarify all this, but I would think that some “Amazon EC2”-like licensing might offer safe harbor here.)

P2P-Cloud computing services wouldn’t necessarily be amenable to the more normal, non-distributed or linear computing tasks but one could view these as just a primitive version of distributed computing tasks.  In either case, any data needed for computation would need to be sent along with the computing software to be run on a distributed node.  Whether it’s worth the effort is something for the users to debate.

BOINC can provide a useful model here.  Also, the Condor(R) project at U. of Wisconsin/Madison can provide a similar framework for scheduling the work of a “less distributed” computing task model.  In my mind, both types of services ultimately need to be provided.

To attract more compute servers, SETI@home and similar BOINC projects rely on doing good deeds.  As such, if you can make your computing task do something of value to most users, then maybe that’s enough. In that case, I would suggest joining up as a BOINC project. For the rest of us, doing more mundane data processing, just offering our compute services to the P2P-Cloud will need to suffice.

Starting up the P2P-Cloud

Bootstrapping the P2P-Cloud might take some effort, but once going it should be self-sustaining (assuming no centralized infrastructure).  I envision an open source solution, taking off from the work done on Napster and Gnutella and/or BOINC and Condor.

I believe the P2P-Cloud Storage service would be the easiest to get started.  BOINC and SETI@home (see the list of active BOINC projects) have been around a lot longer than cloud storage, and their existence suggests that with the right incentives, even the P2P-Cloud Computing service can make sense.

Strategy, as we know it, is dead

Or at least that’s how the WSJ reported it yesterday.

Years back when I was working in corporate strategy we used to have this yearly dance called strategic planning.  Every year we would fan out to all the business units, look at what they were doing and try to figure out what they needed to be doing three to five years down the road.

This process typically lasted the better part of a quarter or so and culminated in a presentation to upper management on a direction to pursue for the business unit.  What would happen next was often the best part.  Some business groups would shelve the work and not look at it again.  Other business units would invest time and effort to incorporate the strategic plan recommendations into what they were doing that year to try to make it happen in 3 to 5 years time.  At the end of this process, annual budgets would be declared “done” and the world would go back to work.

But that was the old, dead strategy.

The “New Strategy”

The new strategy is defined by adaptability and flexibility to take advantage of any opportunity that presents itself.  This results in strategic plans and operating budgets that are updated monthly, just-in-time decision making, and wider ranging planning scenarios.  For example:

  • Strategic plans and budgets updated  monthly – as the economy tanked over the last couple of years, baseline assumptions were rendered useless in no time at all.  Budgets updated yearly were no help.  Even budgets that were updated quarterly were subject to significant tracking error.  The only way to survive was to look at your budgets every month and adjust for cost of capital, inventory, and revenue mix.  This way a company could adjust their product mix immediately to best match what was selling and thus, maximize return.
  • Just-in-time decision making – the WSJ used a factory closing example in their article but I prefer to look at the SSD vs HDD product mix.  When to get on the SSD bandwagon is a strategic decision.  One can examine this decision yearly, quarterly or monthly to see if it makes sense today, or take the time to identify the trigger points that would make the decision for you.  For SSDs, one could decide what price SLC-NAND memory has to drop to, say $X/GB, before SSDs would make sense.  To make this decision, one must determine how long it would take to create and launch SSD product offerings and what SLC-NAND pricing trends look like today, and then back up the trigger point to take all this into account.  But after that, all one need do is monitor NAND pricing daily and, when it hits your trigger point, start the product changeover.
  • Wider ranging scenarios – all old strategic planning used economic variables such as cost of capital, revenue growth, and cost of goods sold.  Many would use a range of +/- 5% on each of these factors to generate operating scenarios that were then fed into the strategic planning process.  The problem with such scenarios is that they didn’t take into account the extreme circumstances of the last couple of years.  By widening the scenarios to something like +/- 15%, they become much more useful and would have reflected actual experience.
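
To make the "back up the trigger point" step in the just-in-time example concrete, here is a small sketch. It assumes NAND prices fall by a constant fraction each month, which is my simplification; the function name and all of the input numbers are hypothetical.

```python
def trigger_price(target_price, monthly_decline, lead_time_months):
    """NAND price ($/GB) at which to start the changeover.

    Assumes prices fall by a constant fraction each month, so that a price
    of `target_price` is reached `lead_time_months` after the trigger.
    """
    return target_price / (1.0 - monthly_decline) ** lead_time_months

# Hypothetical inputs: SSDs make sense at $2.00/GB, prices fall 3% per
# month, and a product launch takes 12 months.
start_at = trigger_price(2.00, 0.03, 12)
print(f"start the changeover when NAND hits ${start_at:.2f}/GB")
```

The longer the launch lead time or the steeper the price decline, the higher the trigger price, i.e., the earlier the changeover has to start.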
F-15 F-16 F86 Sabre Jet Heritage Flight by TMWolf (cc) (from flickr)

But in the end most of this speaks to speed and taking advantage of opportunities that are present.

OODA

All this reminds me of Colonel John R. Boyd (USAF, deceased) who came up with a new military and competitive strategic paradigm called OODA or Observation, Orientation, Decision, and Action.  Observe the competition (or marketplace), orient to (or appreciate what) the market is doing, decide what the most appropriate action will be, and then do it.  John believed that the fastest OODA cycle always wins in the end.  Any OODA cycle takes time to perform, and the one that is fastest will change the marketplace such that by the time your (slower) adversary sees what’s happening and reacts, you have already changed the world out from under them.

There was a good book on Col. Boyd’s life by Robert Coram, Boyd: The Fighter Pilot Who Changed the Art of War. Also there was a bio, Genghis John, written by a close friend, Chuck Spinney.  If you are interested in understanding more of his views on conflict and strategy, I suggest starting with the bio, though the book was an easy read.

How this all applies to the world with 6-18 month product development cycles, and 3 month marketing campaigns needs to be the subject of a future post…

Are SSDs an invasive species?

A head assembly on a Seagate disk drive by Robert Scoble (cc) (from flickr)

I was reading about pythons becoming an invasive species in the Florida Everglades and that brought to mind SSDs.  The current ecological niche in data storage has rotating media as the most prolific predator with tape going on the endangered species list in many locales.

So where does SSD enter into the picture?  We have written before on SSD shipments starting to take off, but that was looking at the numbers from another direction. Given recent announcements, it appears that in the enterprise, SSDs are taking over the place formerly held by 15Krpm disk devices.  These were formerly the highest performers and most costly storage around.  But today, SSDs, as a class of storage, are easily the most costly storage and have the highest performance currently available.

The data

Seagate announced yesterday that they had shipped almost 50M disk drives last quarter, up 8% from the prior quarter, or ~96M drives over the past 6 months.  Now Seagate is not the only enterprise disk provider (Hitachi, Western Digital and others also supply this market) but they probably have the lion’s share.  Nonetheless, Seagate did mention that the last quarter was supply constrained and that they believed the total addressable market was 160-165M disk drives.  That puts Seagate’s market share (in unit volume) at ~31%, and at that rate the last 6 months’ total disk drive production should have been ~312M units.

In contrast, IDC reports that SSD shipments last year totaled 11M units. In both the disk and SSD cases we are not just talking enterprise class devices; the numbers include PC storage as well.  If we divide this number in half, we have a comparable number of 5.5M SSDs for the last 6 months, giving SSDs less than a 2% market share (in units).

Back to the ecosystem.  In the enterprise, there are 15Krpm disks, 10Krpm disks and 7.2Krpm rotating media disks.  As speed goes down, capacity goes up.  In Seagate’s last annual report they stated that approximately 10% of the drives they manufactured were shipped to the enterprise.  Given that rate, of the 312M drives, maybe 31M were enterprise class (this probably overstates the number but usable as an upper bound).

As for SSDs, the IDC report cited above mentioned two primary markets for SSD penetration, the PC and enterprise markets.  In that same Seagate annual report, they said their desktop and mobile markets were around 80% of disk drives shipped.  If we use that proportion for SSDs, that would say that of the 5.5M units shipped in the last half year, 4.4M were in the PC space and 1.1M were for the enterprise.  Given that, enterprise class SSDs represent ~3.5% of the enterprise class disk drives shipped.  This is over 10X more than my prior estimate of SSDs being <0.2% of enterprise disk drives.  Reality probably lies somewhere between these two estimates.
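
The whole chain of estimates above can be reproduced in a few lines. The sketch uses only the figures quoted in this post, with two assumptions of mine: the midpoint of the 160-165M addressable market range, and "almost 50M" taken as exactly 50M.

```python
# Inputs quoted in the post (units in millions of drives)
seagate_last_qtr = 50.0       # "almost 50M" drives, taken as 50M
qtr_growth = 0.08             # up 8% from the prior quarter
tam_qtr = (160 + 165) / 2     # midpoint of the 160-165M quarterly market
ssd_last_year = 11.0          # IDC: SSD units shipped last year
enterprise_share = 0.10       # Seagate: ~10% of drives go to the enterprise
pc_share_of_ssd = 0.80        # desktop + mobile proportion, applied to SSDs

seagate_6mo = seagate_last_qtr + seagate_last_qtr / (1 + qtr_growth)
seagate_share = seagate_last_qtr / tam_qtr
total_disk_6mo = seagate_6mo / seagate_share

ssd_6mo = ssd_last_year / 2
ssd_unit_share = ssd_6mo / total_disk_6mo

enterprise_disk_6mo = total_disk_6mo * enterprise_share
enterprise_ssd_6mo = ssd_6mo * (1 - pc_share_of_ssd)
enterprise_ssd_ratio = enterprise_ssd_6mo / enterprise_disk_6mo

print(f"Seagate, 6 months: {seagate_6mo:.0f}M drives, {seagate_share:.0%} share")
print(f"All disk, 6 months: {total_disk_6mo:.0f}M drives")
print(f"SSD unit share: {ssd_unit_share:.1%}")
print(f"Enterprise SSDs vs enterprise disks: {enterprise_ssd_ratio:.1%}")
```

Run as written, the totals land within rounding of the figures above: roughly 313M disk drives over six months, an SSD unit share under 2%, and enterprise SSDs at about 3.5% of enterprise disk units.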

I wrote a research report a while back which predicted that SSDs would never take off in the enterprise.  I was certainly wrong then.  If these numbers are correct, capturing over 3% of the enterprise disk market in a little under 2 years can only mean that high-end, 15Krpm drives are losing ground faster than anticipated.  Which brings up the analogy of the invasive species.  SSDs seem to be winning a significant beachhead in the enterprise market.

In the meantime, drive vendors are fighting back by moving from the 3.5″ to the 2.5″ form factor, offering both 15K and 10K rpm drives.  This probably means that the 15Krpm 3.5″ drive’s days are numbered.

I made another prediction almost a decade ago that 2.5″ drives would take over the enterprise around 2005 – wrong again, but only by about 5 years or so. I’ve got to stop making predictions, …

Is M and A the only way to grow?

Photograph of Women Working at a Bell System Telephone Switchboard by US National Archives (cc) (from flickr)

Oracle buys Sun, EMC buys Data Domain, Cisco buys Tandberg; it seems like every month another major billion dollar acquisition occurs.  Part of this is because of the recent economic troubles, which have left many companies valued at the lowest they have been for many years, thus making it cheaper to acquire good (and/or failing) companies.  But one has to wonder: is this the only way to grow?

I don’t think so.

Corporate growth can be purely internally driven or organic just as well as from acquisition.  But it’s definitely harder to do internally.  Why?

  • Companies are focused on current revenue producing products – Revolutionary products rarely make it into development in today’s corporations because they take resources away from other (revenue producing) products.
  • Companies are focused on their current customer base - Products that serve other customers rarely make it out into the market from today’s corporations because such markets are foreign to the company’s current marketing channels.
  • Company personnel understand current customer problems – To be successful, any new product must address its customers’ pain points and offer some sort of unique, differentiated solution to those issues.  Because this takes understanding other customers’ problems, it seldom happens.
  • New products can sometimes threaten old product revenue streams – It’s a rare new product that doesn’t take market share away from some old way of doing business.  As companies focus on a particular market, any new product development will no doubt focus on those customers as well.  Thus, many new internally developed products will often displace (or eat away at) current product revenue.  Early on, it’s hard to see how any such product can be justified with respect to current corporate revenue.
  • New products often take efforts above and beyond current product activities – To develop, market and sell revolutionary products takes enormous, “all-out” efforts to get off the ground.  Most corporations are unable to sustain this level of effort for long, as their startup phase was long ago and long forgotten.

We now know how hard it can be but how does Apple do it?  The iPod and iPhone were revolutionary products (at least from Apple’s perspective) and yet they both undeniably became great successes and helped to redefine industries in the process.  And no one can argue that they haven’t helped Apple to grow significantly in the process.  So how can this be done?

  • It takes strong visionary leadership in the company at the highest level – Such management can make the tough decisions to take resources away from current, revenue producing products and devote time and effort to new ones.
  • It takes marketing genius - Going after new markets, even if they are adjacent, requires in-depth understanding of new market dynamics and total engagement to be successful.
  • It takes development genius – Developing entirely new products, even if based on current technology, takes development expertise above and beyond evolutionary product enhancement.
  • It takes hard work and a dedicated team – Getting new products off the ground takes a level of effort above and beyond current ongoing product activities.
  • It takes a willingness to fail - Most new internally developed products and/or startups fail.  This fact can be hard to live with and makes justifying future products even harder.

In general, all these items are easier to find in startups rather than an ongoing corporation today.  This is why most companies today find it easier and more successful to grow through acquisitions rather than through organic or internal development.

However, it’s not the only way.  AT&T did it for almost a century in the telecom industry, but they owned a monopoly.  IBM and HP did it occasionally over the past 60 years or so, but they had strong visionary leadership for much of that time and stumbled miserably when such leadership was lacking.  Apple has done it over the past couple of decades or so, but this is mainly due to Steve Jobs.  There are others of course, but I would venture to say all had strong leadership at the helm.

But these are the exceptions.  Strong visionary leaders usually don’t make it to the top of today’s corporations.  Why that’s the case needs to be the subject of a future post…

Latest CIFS performance – chart of the month

Above we reproduce a chart from our latest newsletter, StorInt™ Dispatch, on SPECsfs(R) 2008 benchmark results.  This chart shows the top 10 CIFS throughput benchmark results as of the end of last year.  As observed in the chart, Apple’s Xserve running Snow Leopard took top performance with over 40K CIFS throughput operations per second.  My problem with this chart is that there are no enterprise class systems represented in the top 10 or, for that matter (not shown in the above), in any CIFS result.

Now some would say it’s still early yet in the life of the 2008 benchmark but it has been out now for 18 months and still has not a single enterprise class system submission reported.  Possibly, CIFS is not considered an enterprise class protocol but I can’t believe that given the proliferation of Windows.  So what’s the problem?

I have to believe it’s part tradition, part not wanting to look bad, and part just lack of awareness on the part of CIFS users.

  • Traditionally, NFS benchmarks were supplied by SPECsfs and CIFS benchmarks were supplied elsewhere, i.e., NetBench. However, there never was a central repository for NetBench results, so comparing system performance was cumbersome at best.  I believe that’s one reason for SPECsfs’s CIFS benchmark.  Seeing the lack of a central repository for a popular protocol, SPECsfs created their own CIFS benchmark.
  • Performance on system benchmarks is always a mixed bag.  No one wants to look bad, and any top performing result is temporary until the next vendor comes along.  So most vendors won’t release a benchmark result unless it shows well for them.  Not clear if Apple’s 40K CIFS ops is a hard number to beat, but it’s been up there for quite a while now, and has to tell us something.
  • CIFS users seem to be aware of and understand NetBench but don’t have similar awareness of the SPECsfs CIFS benchmark yet.  So, given today’s economic climate, any vendor wanting to impress CIFS customers would probably choose to ignore SPECsfs and spend their $s on NetBench.  The fact that comparing results was nigh impossible could be considered an advantage for many vendors.

So SPECsfs CIFS just keeps going on.  One way to change this dynamic is to raise awareness.  So as more IT staff/consultants/vendors discuss SPECsfs CIFS results, its awareness will increase.  I realize some of  my analysis on CIFS and NFS performance results doesn’t always agree with the SPECsfs party line, but we all agree that this benchmark needs wider adoption.  Anything that can be done to facilitate that deserves my (and their) support.

So to all my friends out there who are storage admins, CIOs and other influencers of NAS system purchases: you need to start asking about SPECsfs CIFS benchmark results.  All my peers out there in the consultant community, get on the bandwagon.  As for my friends in the vendor community, SPECsfs CIFS benchmark results should be part of any new product introduction.  Whether you want to release results is, and always will be, a marketing question, but you all should be willing to spend the time and effort to see how well new systems perform on this and other benchmarks.

Now if I could just get somebody to define an iSCSI benchmark, …

Our full report on the latest SPECsfs 2008 results, including both NFS and CIFS performance, will be up on our website later this month.  However, you can get this information now, and subscribe to future newsletters to receive the full report even earlier, by emailing us at SubscribeNews@SilvertonConsulting.com.

What is cloud storage good for?

Facebook friend carrousel by antjeverena (cc) (from flickr)

Cloud storage has emerged as a viable business service in the last couple of years, but what does cloud storage really do for the data center?  Moving data out to the cloud makes for unpredictable access times with potentially unsecured and unprotected data.  So what does the data center gain by using cloud storage?

  • Speed – it often takes a long time (days, weeks or months) to add storage to in-house data center infrastructure.  In this case, having a cloud storage provider where one can buy additional storage by the GB/Month may make sense if one is developing/deploying new applications where speed to market is important.
  • Flexibility – data center storage is often leased or owned for long time periods.  If an application’s data storage requirements vary significantly over time, then cloud storage, purchasable or retirable at a moment’s notice, may be just right.
  • Distributed data access – some applications require data to be accessible around the world.  Most cloud providers have multiple data centers throughout the world that can be used to host one’s data. Such multi-site data centers can often be accessed much quicker than going back to a central data center.
  • Data archive – backing up data that is infrequently accessed wastes time and resources. As such, this data could easily reside in the cloud with little trouble.  References to such data would need to be redirected to one’s cloud provider but that’s about all that needs to be done.
  • Disaster recovery – disaster recovery for many data centers is very low on their priority list.  Cloud storage provides an easy, ready made solution to accessing one’s data outside the data center.  If you elect to copy all mission critical data out to the cloud on a periodic basis, then this data could theoretically be accessed anywhere, usable in many DR scenarios.

There are probably some I am missing here, but these will do for now.  Most cloud storage providers can provide any and all of these services.

Of course all these capabilities can be done in-house with additional onsite infrastructure, multi-site data centers, archive systems, or offsite backups.  But the question then becomes which is more economical.  Cloud providers can amortize their multi-site data centers across many customers and as such, may be able to provide these services much cheaper than could be done in-house.

Now if they could only solve that unpredictable access time, …

Toshiba studies laptop write rates confirming SSD longevity

Toshiba's New 2.5" SSD from SSD.Toshiba.com

Today Toshiba announced a new series of SSD drives based on their 32NM MLC NAND technology. The new technology is interesting but what caught my eye was another part of their website, i.e., their SSD FAQs. We have talked about MLC NAND technology before and have discussed its inherent reliability limitations, but this is the first time I have seen some company discuss their reliability estimates so publicly. This was documented more in an IDC white paper on their site but the summary on the FAQ web page speaks to most of it.

Toshiba’s answer to the MLC write endurance question all revolves around how much data a laptop user writes per day, which their study makes clear. Essentially, Toshiba assumes MLC NAND write endurance is 1,400 write/erase cycles and for their 64GB drive a user would have to write, on average, 22GB/day for 5 years before they would exceed the manufacturer’s warranty based on write endurance cycles alone.

Let’s see:

  • 5 years is ~1825 days
  • 22GB/day over 5 years would be over 40,000GB of data written
  • If we divide this by the 1400 MLC W/E cycle limit given above, that gives us something like 28.7 NAND pages that could fail while the drive still supports write reliability.

Not sure what Toshiba’s MLC SSD supports for page size but it’s not unusual for SSDs to ship an additional 20% of capacity to over-provision for write endurance and ECC. Given that 20% of 64GB is ~12.8GB, and it has to at least sustain ~28.7 NAND page failures, this puts Toshiba’s MLC NAND page at something like 512MB or ~4Gb, which makes sense.
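The arithmetic above can be rechecked in a few lines. This is a minimal sketch that uses only the figures quoted in this post (22GB/day, 5 years, 1,400 W/E cycles, a 64GB drive). The 20% over-provisioning is the rule of thumb assumed above rather than a Toshiba specification, and the variable names are mine.

```python
# Figures quoted from Toshiba's FAQ, as summarized in this post
GB_PER_DAY = 22          # average daily writes needed to hit the endurance limit
YEARS = 5                # period over which those writes are sustained
WE_CYCLES = 1400         # assumed MLC write/erase cycle limit
DRIVE_GB = 64            # drive capacity

# Assumption from this post, not a Toshiba figure: 20% spare capacity
OVER_PROVISION = 0.20

days = YEARS * 365                       # ignores leap days
total_written_gb = GB_PER_DAY * days     # lifetime data written
ratio = total_written_gb / WE_CYCLES     # the ~28.7 figure from the list above
spare_gb = DRIVE_GB * OVER_PROVISION     # ~12.8GB of spare capacity
implied_unit_gb = spare_gb / ratio       # spare capacity spread over ~28.7 units

print(f"days: {days}")
print(f"total written: {total_written_gb:,} GB")
print(f"total / W/E cycles: {ratio:.1f}")
print(f"spare capacity: {spare_gb:.1f} GB")
print(f"implied unit size: {implied_unit_gb * 1024:.0f} MB")
```

The implied size works out to roughly 457MB, a bit under the 512MB figure, so “something like 512MB” is a rounding up to the nearest power of two rather than an exact result.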

MLC vs. SLC write endurance from SSD.Toshiba.com


The not so surprising thing about this analysis is that as drive capacity goes up, write endurance concerns diminish because the amount of data that would have to be written daily to exhaust the drive’s write endurance goes up linearly with the capacity of the SSD. Toshiba’s latest drive announcements offer 64/128/256GB MLC SSDs for the mobile market.
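As a rough sketch of that linear relationship, the snippet below scales the 22GB/day figure quoted for the 64GB drive to the other two capacities. It assumes the same 1,400-cycle limit, the same 5-year window and the same wear-leveling behavior at every capacity, which Toshiba’s FAQ may or may not bear out. The function name is hypothetical.

```python
BASE_CAPACITY_GB = 64     # drive size for which the 22GB/day figure was quoted
BASE_GB_PER_DAY = 22      # daily writes needed to reach the endurance limit in 5 years

def daily_write_budget(capacity_gb: float) -> float:
    """Scale the quoted 64GB figure linearly with capacity (an assumption)."""
    return BASE_GB_PER_DAY * capacity_gb / BASE_CAPACITY_GB

for capacity in (64, 128, 256):
    budget = daily_write_budget(capacity)
    print(f"{capacity}GB drive: ~{budget:.0f}GB/day sustained for 5 years")
```

Under those assumptions the 128GB and 256GB drives would need roughly 44GB/day and 88GB/day respectively before write endurance alone became the limit.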

Toshiba studies mobile users’ write activity

To come at their SSD reliability estimate from another direction, Toshiba’s laptop usage modeling study of over 237 mobile users showed the “typical” laptop user wrote an average of 2.4GB/day (with auto-save and hibernate on) and a “heavy” laptop user wrote 9.2GB/day under similar specifications. Now averages are well and good but to really put this into perspective one needs to know the workload variability. Nonetheless, their published results do put a rational upper bound on how much data typical laptop users write during a year, which can then be used to compute (MLC) SSD drive reliability.
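To put the study’s averages next to the endurance figure, here is a minimal sketch that treats 22GB/day for 5 years as the 64GB drive’s total write budget and asks how long each user profile would take to consume it. It uses averages only, so it says nothing about the workload variability mentioned above, and the function name is hypothetical.

```python
# Total write budget implied by "22GB/day for 5 years" on the 64GB drive
ENDURANCE_BUDGET_GB = 22 * 5 * 365

def years_to_limit(gb_per_day: float) -> float:
    """Years of steady writing at gb_per_day before the budget is used up."""
    return ENDURANCE_BUDGET_GB / gb_per_day / 365

for label, rate in (("typical", 2.4), ("heavy", 9.2)):
    print(f"{label} user at {rate}GB/day: ~{years_to_limit(rate):.0f} years")
```

On these numbers the typical user would take about 46 years and the heavy user about 12 years to reach the limit, both well past the 5-year window.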

I must applaud Toshiba for publishing some of their mobile user study information to help us all better understand SSD reliability for this environment. It would have been better to see the complete study including all the statistics, when it was done, and how users were selected, and it would have been really nice to see this study done by a standards body (say SNIA) rather than a manufacturer, but these are all personal nits.

Now, I can’t wait to see a study on write activity for the “heavy” enterprise data center environment, …

5 laws of unstructured data

Richard (Dick) Nafzger with Apollo data tape by Goddard Photo and Video (cc) (from flickr)


All data operates under a set of laws but unstructured data suffers from these tendencies more than most. Although information technology has helped us create and manage data more easily, it hasn’t done much to minimize the problems these laws produce.

As such, I introduce here my 5 laws of unstructured data in the hopes that they may help us better understand the data we create.

Law 1: Unstructured data grows 50% per year

This has been a truism in the data center for as far back as I can remember. In the data center this is driven by business transactions, new applications and new products/services. On top of all that, corporate compliance often dictates that data be retained long after its usefulness has passed.

Nowadays, Law 1 is true for the home user as well. Here it’s a combination of email and media. Not only are cameras moving from 6 to 9 megapixels, home video is moving to high definition and there is just a whole lot more media being created every day. Also, social media now seems to have doubled or tripled our outreach data creation above “normal email” alone.
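To see what 50% per year compounds to, here is a small sketch. The 10TB starting size is purely a hypothetical example, and the function name is mine.

```python
import math

GROWTH_RATE = 0.50  # Law 1: 50% growth per year

def projected_size(initial_tb: float, years: int, rate: float = GROWTH_RATE) -> float:
    """Compound growth: size after the given number of years."""
    return initial_tb * (1 + rate) ** years

# Time for the data to double at this growth rate
doubling_time_years = math.log(2) / math.log(1 + GROWTH_RATE)

start_tb = 10  # hypothetical starting point
for year in range(6):
    print(f"year {year}: {projected_size(start_tb, year):.1f}TB")
print(f"doubling time: {doubling_time_years:.2f} years")
```

At this rate data doubles roughly every 1.7 years and grows about 7.6 times over 5 years.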

Law 2: Unstructured data access frequency diminishes over time

Data created today is accessed frequently during its first 90 days of life and then less often after that. Reasons for this decaying access pattern vary, but human memory has to play a significant part in this.

Furthermore, business transactions encounter a life cycle from initiation, to delivery and finally, to termination. During these transitions various unstructured data are created representing the transaction state. Such data may be examined at quarter end and possibly at year end but may never see the light of day after that.

Law 3: Unsearchable data is lost data

Given Law 2’s data access decay and Law 1’s data growth, unsearchable data is, by definition, inaccessible data. It’s not hard to imagine how this plays out in the data center or home.

For the data center, unstructured data mostly resides in user and application directories. I am constantly amazed that it’s easier to find data out on the web than it is to find data elsewhere in the data center. Moreover, E-discovery has become a major business segment in recent years by attempting to search unstructured corporate data.

As a Mac user my home environment is searchable for any text string. However, my photo library is another matter. Finding a specific photo from a couple of years ago is a sequential perusal of iPhoto’s library and as such, is seldom done.

Law 4: Unstructured data is copied often

Over a decade ago, a company I worked with sponsored a study to see how often data is copied. The numbers we came up with were impressive. A small but significant percentage of data is copied often, and it’s not unusual to see 6-8 copies of such data. Some of this copying occurs when final documents are passed on, some comes from teamwork and other joint collaboration as working documents are reviewed, and some is just interesting information that deserves broader dissemination. As such, data copies can represent a significant portion of any data center’s storage.

I suppose data proliferation may not be as evident in the home but our home would be an exception. Each of our Macs has a copy of all email accounts and copies of the best photos. In addition, with laptops and multiple desktops, most Macs have copies of each (adult) user’s work environment.

Law 5: Unstructured data manual classification schemes degrade over time

In the data center, one could easily classify any file data created and maintain a database of file meta-data to facilitate access to file data. But who has the discipline or spare time to update such a database whenever they create a file or document? While this may work for “official records”, the effort involved makes it unusable for everything else.

My favorite home example of this is, once again, our iPhoto library with its manual classification system using stars, e.g., I can assign anything from 0 to 5 stars to any photo. Used to be that after each camera import, I would assign a star rating to each new photo. Nowadays, the only time I do this is once a year and as such, it’s becoming more problematic and less useful. As we take more photographs each year this becomes much more of a burden.

Not sure these 5 laws of unstructured data are mutually exclusive and completely exhaustive but it’s a start. If anyone has any ideas on how to improve my unstructured data laws, feel free to comment below. In the meantime, as for structured data laws, …