(QoM16-002): Will Intel Omni-Path GA in scale out enterprise storage by February 2016 – NO 0.91 probability

Question of the month (QoM) for February is: Will Intel Omni-Path Architecture (OPA) GA in scale out enterprise storage by February 2016?

In this forecast, enterprise storage means the major and startup vendors supplying storage systems to data center customers.

What is OPA?

OPA is Intel’s replacement for InfiniBand and starts out at 100Gbps. It’s intended more for high performance computing (HPC), to be used as an inter-cluster server interconnect or next generation fabric. Intel says it “will maintain consistency and compatibility with existing Intel True Scale Fabric and InfiniBand APIs by working through the open source OpenFabrics Alliance (OFA) software stack on leading Linux* distribution releases”. Seems like Intel is making it as easy as possible for vendors to adopt the technology.

(Storage QoM 16-001): Will we see NVM Express (NVMe) drives GA’d in enterprise storage over the next year?

First, let me state that QoM stands for Question of the Month. Doing these forecasts can be a lot of work, and rather than focusing my whole blog on weekly forecast questions and answers, I would like to do something else as well. So, from now on we are doing only one new forecast a month.

So for the first question of 2016, we will forecast whether NVMe SSDs will be GA’d in enterprise storage over the next year.

NVM Express (NVMe) is the new PCIe interface for SSD storage. Wikipedia has a nice description of NVMe. As discussed there, NVMe was designed for the higher performance and enhanced parallelism that come with the PCI Express (PCIe) bus. The current version of the NVMe spec is 1.2a (available here).

GA means generally available for purchase by any customer.

Enterprise storage systems refers to mid-range and enterprise class storage systems from major AND non-major storage vendors, which includes startups.

Over the next year means by 19 January 2017.

Special thanks to Kacey Lai (@mrdedupe), Primary Data, for suggesting this month’s question.

Updates and current status of previous forecasts

 

Update on QoW 15-001 (3DX) forecast:

News out today indicates that 3DX (3D XPoint non-volatile memory) samples may be available soon, but it could take another 12 to 18 months to get the technology into production. 3DX manufacturing is more challenging than current planar NAND technology and uses about 100 new materials, many of which are currently single sourced. Our 3DX forecast already built in the potential for a 6-month delay in reaching production; the news above suggests the delay could be worse than expected. As such, I feel even more strongly that 3DX is unlikely to ship in storage systems by next December, so I would update my forecast for QoW 15-001 to NO with a 0.75 probability at this time.

So current forecasts for QoW 15-001 are:

A) YES with 0.85 probability; and

B) NO with 0.75 probability

Current QoW 15-002 (3D TLC) forecast

We have 3 active participants, current forecasts are:

A) Yes with 0.95 probability;

B) No with 0.53 probability; and

C) Yes with 1.0 probability

Current QoW 15-003 (SMR disk) forecast

We have 1 active participant, current forecast is:

A) Yes with 0.85 probability

 

(Storage QoW 15-003): SMR disks in GA enterprise storage in 12 months? Yes@.85 probability

Hard Disk by Jeff Kubina (cc) (from Flickr)

(Storage QoW 15-003): Will we see SMR (shingled magnetic recording) disks in GA enterprise storage systems over the next 12 months?

Are there two vendors of SMR?

Yes, both Seagate and HGST have announced and are currently shipping (?) SMR drives: HGST has a 10TB drive and Seagate an 8TB drive, both on the market since last summer.

One other interesting fact is that SMR will be the common format for all future disk head technologies including HAMR, MAMR, & BPMR (see presentation).

What would storage vendors have to do to support SMR drives?

Because of the nature of SMR disks, writes overlap adjacent tracks, so data must be written, at least in part, sequentially (see our original post on Sequential only disks). Another post I did reported on recent work by Garth Gibson at CMU (Shingled Magnetic Recording disks), which showed how multiple bands or zones on an SMR disk could be used, some written randomly and others written only sequentially, but all readable randomly. With such an approach you could have a reasonable file system on an SMR device, with a metadata partition (randomly writeable) and a data partition (sequentially writeable).

In order to support SMR devices, changes have been requested for the T10 SCSI & T13 ATA command protocols. Such changes would include:

  • SMR devices support a new write cursor for each SMR sequential band.
  • SMR devices support sequential writes within SMR sequential bands at the write cursor.
  • SMR band write cursors can be read, statused and reset to 0. SMR sequential band LBA writes only occur at the band cursor and for each LBA written, the SMR device increments the band cursor by one.
  • SMR devices can report their band map layout.
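To make these semantics concrete, here’s a minimal Python sketch of a restricted-mode SMR device with per-band write cursors. All names here are illustrative assumptions of mine, not the actual T10/T13 command set:

```python
# Illustrative model of per-band write cursors on a restricted-mode SMR drive.
# Class/method names are hypothetical; the real T10/T13 changes are SCSI/ATA commands.

class SMRBand:
    def __init__(self, start_lba, length):
        self.start_lba = start_lba   # first LBA of this sequential band
        self.length = length         # band capacity in blocks
        self.cursor = 0              # write cursor, as an offset into the band

    def write(self, lba, nblocks):
        # Restricted mode: writes are only accepted at the band write cursor
        if lba != self.start_lba + self.cursor:
            raise IOError("rejected: write not at band write cursor")
        if self.cursor + nblocks > self.length:
            raise IOError("rejected: write exceeds band capacity")
        self.cursor += nblocks       # cursor advances by one per LBA written

    def reset_cursor(self):
        self.cursor = 0              # band may now be sequentially rewritten


class SMRDrive:
    def __init__(self, band_size, nbands):
        self.bands = [SMRBand(i * band_size, band_size) for i in range(nbands)]

    def report_band_map(self):
        # Band map layout: (start LBA, length, cursor status) for each band
        return [(b.start_lba, b.length, b.cursor) for b in self.bands]


drive = SMRDrive(band_size=256, nbands=4)
drive.bands[0].write(0, 16)    # OK: at the cursor
drive.bands[0].write(16, 16)   # OK: sequential continuation
# drive.bands[0].write(8, 1)   # would raise: not at the write cursor
print(drive.report_band_map())
```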

The presentation refers to multiple approaches to SMR support or SMR drive modes:

  • Restricted SMR devices – the device accepts no random writes; all writes must occur at a band’s write cursor and any random write is rejected. Performance, however, would be predictable.
  • Host Aware SMR devices – where the host using the SMR devices is aware of SMR characteristics and actively manages the device using write cursors and band maps to write the most data to the device. However, the device will accept random writes and will perform them for the host. This will result in sub-optimal and non-predictable drive performance.
  • Drive managed SMR devices – the SMR device acts like a randomly accessed disk device but maps random writes to sequential writes internally, using virtualization of the drive LBA map, not unlike SSDs do today. These devices would be backward compatible with today’s disk devices, but drive performance would be bad and non-predictable.

It’s unclear which of these drive modes are currently shipping, but I believe Restricted mode SMR devices are already available, and drive manufacturers are likely working on Host Aware and Drive managed modes to help adoption.

So, assuming Restricted mode SMR devices and prototypes of the T10/T13 changes are available, the changes enterprise storage systems need to support SMR devices are significant but well understood.

Nevertheless, a number of hybrid storage systems already implement Log Structured File (LSF) systems on their backends, which mostly write sequentially to backend devices, so moving to SMR restricted mode devices would be easier for these systems.

It’s unclear how many storage systems have such a back end, but NetApp uses one for WAFL, and just about every hybrid startup has an LSF format for their backend layout. So, being conservative, let’s say 50% of enterprise hybrid storage vendors use LSF.

The other 50% would have more of a problem implementing SMR restricted mode devices, but it’s only a matter of time before all will need to go that way, assuming they still use disks. So, we are primarily talking about hybrid storage systems.

All major storage vendors support hybrid storage and about 60% of startups support hybrid storage, so adding these together, maybe about 75% of enterprise storage vendors have hybrid offerings.

Using the analysis from QoW 15-001, about 60% of enterprise storage vendors will probably ship new hardware versions of their systems over the next 12 months. So of the 13 likely new hardware systems over the next 12 months, 75% have hybrid solutions and 50% of those have LSF, which means ~4.9 new hardware systems released over the next 12 months will be hybrid with LSF backends already.

What are the advantages of SMR?

SMR devices will have higher storage densities and lower cost. Today’s disk drives run 6-8TB while SMR devices run 8-10TB, so a 25-30% step up in storage capacity is possible with SMR devices.

New drive support has historically been relatively easy because command sets/formats haven’t changed much over the past 7 years or so, but SMR is different and will take more effort to support. The fact that all new drives will be SMR over time adds emphasis to getting on the bandwagon as soon as feasible. So, I would give a storage vendor an 80% likelihood of implementing SMR, assuming they have new systems coming out, are already hybrid and are already using LSF.

So, taking the ~4.9 LSF/hybrid systems being released times 0.8 says ~3.9 systems will introduce SMR devices over the next 12 months.

For non-LSF hybrid systems the effort seems much harder, so I would give them about a 40% likelihood of implementing SMR. Of the ~8.1 remaining systems to be introduced in the next year, 75% are hybrid, or ~6.1 systems, and at a 40% likelihood of implementing SMR, ~2.4 of these non-LSF systems will probably introduce SMR devices.

There’s one other category we need to consider: startups in stealth. These could have been designing their hybrid storage for SMR from the get go. In the QoW 15-001 analysis I assumed another ~1.8 startup vendors would emerge to GA over the next 12 months, and if we assume 75% of these are hybrid, that’s ~1.4 startup vendors (1.8 × 0.75) that could be using SMR technology in their hybrid storage. That makes a total of 3.9 + 2.4 + 1.4 = ~7.7 systems with a high probability of implementing SMR over the next 12 months in GA enterprise storage products.
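Pulling that arithmetic together in one place, here’s a back-of-envelope script; every input below is one of the judgment-call percentages from this post, not measured data:

```python
# Back-of-envelope SMR adoption estimate, using this post's assumptions.
new_systems  = 13      # new hardware systems expected in the next 12 months
hybrid_share = 0.75    # fraction of vendors with hybrid storage
lsf_share    = 0.50    # fraction of hybrid systems with LSF backends

lsf_hybrid    = new_systems * hybrid_share * lsf_share   # ~4.9 systems
lsf_adopt     = lsf_hybrid * 0.80                        # ~3.9 adopt SMR
non_lsf       = new_systems - lsf_hybrid                 # ~8.1 systems
non_lsf_adopt = non_lsf * hybrid_share * 0.40            # ~2.4 adopt SMR
stealth_adopt = 1.8 * 0.75                               # ~1.4 stealth startups

total = lsf_adopt + non_lsf_adopt + stealth_adopt
print(f"~{total:.1f} systems likely to ship SMR within 12 months")  # ~7.7
```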

Forecast

So my forecast of SMR adoption by enterprise storage is Yes with a 0.85 probability (it’s hard to pin the number down exactly, but it’s highly probable).

~~~~

Comments?

(Storage QoW 15-003): Will we see SMR disks in GA enterprise storage systems over the next 12 months?

SMR refers to very high density (10TB) shingled magnetic recording hard disk devices.

GA means generally available for purchase by any customer.

Enterprise storage systems refers to mid-range and enterprise class storage systems from major AND non-major storage vendors, which includes startups.

Over the next 12 months means by 22 December 2016.

We discussed our Analyst Forecasting contest and previous Questions of the week (QoW) on 3D XPoint technology (forecast) and 3D TLC NAND technology (forecast), with present status below.

Present QoW forecasts:

(#Storage-QoW 15-001) – Will 3D XPoint be GA’d in enterprise storage systems within 12 months? 2 active forecasters, current forecasts are:

A) YES with 0.85 probability; and

B) NO with 0.62 probability.

(Storage-QoW 15-002) 3D TLC NAND GA’d in major vendor storage next year? 3 active participants, current forecasts are:

A) Yes with 0.95 probability;

B) No with 0.53 probability; and

C) Yes with 1.0 probability

(Storage-QoW 15-002) 3D TLC NAND GA’d in major vendor storage next year – NO 0.53

Latest forecast question is: Will 3D TLC NAND be GA’d in major storage products in 12 months?

Splitting up the QoW into more answerable questions:

A) Will any vendor be shipping 3D TLC NAND SSDs/PCIe cards over the next 9 months?

Samsung is reportedly already shipping 3D TLC NAND SSDs and PCIe cards as of August 13, 2015 and will be producing 48 layer 256Gb 3D TLC NAND memory soon. It’s unclear what 3D TLC NAND technology will ship in the next generation drives due out soon, but they are all spoken of as read-intensive/write-light storage.

One consideration is that major storage vendors typically will not introduce new storage technologies unless they’re available from multiple suppliers. This is not always the case, and certainly not for internally developed storage, but it has been a critical criterion for most major vendors. In the above reference, it was reported that SK Hynix and Toshiba are gearing up for 2016 shipments of 48 layer 3D TLC NAND as well; how long it takes to get these into SSDs/PCIe cards is another question.

A number of startups are rumored to be using 3D TLC, and Kaminario has publicly announced that their systems already use 3D TLC.

My probability of a second source for 3D TLC storage coming out within the first 9 months of next year is 0.75.

B) What changes will be required for storage vendors to utilize 3D TLC NAND storage?

The important considerations are SSD endurance and IO performance.

NAND endurance is rated in DWPD (drive writes per day). Current Samsung 3D TLC SSDs are reportedly rated anywhere from 1.3 to 3.5 DWPD for a 5 year warranty period, and newer 3D TLC SSDs are rated at 5 DWPD (unknown warranty period). Current enterprise (800GB) MLC drives are reportedly rated at 10-25 DWPD (for 5 years). So if we use 3.5 DWPD for 3D TLC and 17.5 DWPD for MLC, 3D TLC NAND has a ~5X reduction in endurance.

As for performance, Samsung’s reported numbers are 160K random reads and 18K random writes, vs. an HGST 800GB MLC SSD at 145K random reads and 100K random writes: a ~5.6X reduction in write performance. Read performance is actually better with 3D TLC NAND.
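A quick back-of-envelope check on those two ratios, using the reported specs above:

```python
# Endurance and write-performance ratios from the reported drive specs.
tlc_dwpd, mlc_dwpd = 3.5, 17.5            # drive writes per day (5 yr warranty)
endurance_ratio = mlc_dwpd / tlc_dwpd     # 5.0 => ~5X endurance reduction

tlc_writes, mlc_writes = 18_000, 100_000  # random write IOPS
write_ratio = mlc_writes / tlc_writes     # ~5.6X write performance reduction

print(f"endurance: {endurance_ratio:.1f}X, writes: {write_ratio:.1f}X")
```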

In order for major vendors to handle the reduction in 3D TLC endurance, they will need to limit the amount of data written to these devices. Conveniently, in order to deal with the reduction in 3D TLC write performance, they will also have to limit the amount of data written to these devices.

Hence, one potential solution is a multi-tier, all-flash array that uses standard MLC SSDs/PCIe cards to absorb the heavy write activity; data from this tier that is relatively unused could then be archived (?) over time to a 2nd tier of storage consisting of 3D TLC SSDs/PCIe cards.

This is not that unusual; it’s being done today in hybrid (disk-SSD) systems with automated storage tiering, only in that case data is moved to SSD only if it’s accessed frequently. For 3D TLC the tiering policy should change from access frequency to time since last access. Doing so in a hybrid array with disk, MLC SSD and TLC SSD would require the creation of an additional pool of storage and could be accomplished with software changes alone. Some current major vendor storage systems already support 3 tiers of storage, and some already support archiving to cloud storage, so these sorts of changes are present in shipping product.
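As a sketch of what that policy change might look like (all names hypothetical; a real array would track extent access times in its own metadata):

```python
# Hypothetical tiering policy: demote extents from the MLC tier to the 3D TLC
# tier based on time since last access, rather than promoting on access frequency.
import time

COLD_AFTER_SECS = 7 * 24 * 3600   # e.g., extents untouched for a week

def select_for_tlc_tier(extents, now=None):
    """Return the extents that should migrate from the MLC tier to the TLC tier."""
    now = now if now is not None else time.time()
    return [e for e in extents if now - e["last_access"] > COLD_AFTER_SECS]

extents = [
    {"id": 1, "last_access": time.time() - 3600},            # accessed an hour ago
    {"id": 2, "last_access": time.time() - 30 * 24 * 3600},  # idle for a month
]
print([e["id"] for e in select_for_tlc_tier(extents)])       # [2]
```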

So yes, there’s a reduction in endurance, and yes, it has worse write performance, but it’s still much faster than disk, and most major vendors already have software to handle storage tiers of diverse performance. So accommodating the new 3D TLC storage shouldn’t be much of a problem.

New storage technology like this usually doesn’t require a hardware change to use. So the only thing that needs to change to accommodate the new 3D TLC is software functionality.

So if the 3D TLC 2nd source was available there’s a 0.9 probability that some major storage vendor would adopt the technology over the next year.

C) What are the advantages of 3D TLC storage?

Price should be cheaper than MLC storage and the density (GB/volume) should be better. So it offers a reduction in cost/GB and an increase in GB/volume, and for these reasons alone it should probably be adopted.

The advantages are good and would certainly give a major vendor an edge in capacity density and in $/GB, or at least get them to parity (barring any functionality differential) with startups adopting the technology.

So given the advantages present in the technology, I would say there should be a 0.7 probability of adoption within the next 12 months.  

Forecast for QoW 15-002 is:

0.75 × 0.90 × 0.70 = 0.47 probability of YES adoption, or 0.53 probability of NO adoption, of 3D TLC NAND in major storage vendor products over the next 12 months.
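Spelled out as a calculation (treating the three sub-question probabilities as independent, which is itself an assumption):

```python
# QoW 15-002 forecast: product of the three sub-question probabilities.
p_second_source = 0.75   # A) second 3D TLC source within ~9 months
p_vendor_adopts = 0.90   # B) a major vendor adopts, given availability
p_advantages    = 0.70   # C) the advantages drive adoption within 12 months

p_yes = p_second_source * p_vendor_adopts * p_advantages
print(f"YES: {p_yes:.2f}, NO: {1 - p_yes:.2f}")   # YES: 0.47, NO: 0.53
```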

Update on QoW 15-001 forecast:

I have an update to my post that forecast QoW 15-001 as a No with 0.62 probability. That question was on the adoption of 3D XPoint (3DX) technology in any enterprise storage vendor product within a year.

It has been brought to my attention that Intel mentioned the cost of producing 3DX was somewhere between 1/2 and 1/4 the cost of DRAM. Also, recent information has come to light that Intel-Micron will price 3DX between 3D NAND and DRAM. So my analysis of the cost differential for caching technologies was way off (~20X), and there could be a cost advantage in using the technology for volatile and non-volatile cache. But even if the chips cost nothing, a 3DX solution might only be on the order of $3-5K cheaper than battery/superCap backed-up DRAM plus volatile DRAM caching; so the advantage exists, but it’s less than a significant cost saver. This being the case, I would have to adjust my 0.35 probability of adoption for this use up to 0.65. I had failed to incorporate this parameter in my final forecast, so all that analysis was for nothing.

Another potential use is as a non-volatile write buffer for SSDs, which is even more important for 3D TLC NAND (see above). As this is inside an SSD, where software and hardware integration is commonplace, there’s a higher probability of adoption there as well. And since there are more SSDs than DRAM caches, the cost differential could be more significant. Then again, it would depend on two technologies being adopted (TLC and 3DX), so it’s less likely than either one alone.

The other news (to me) was that Intel announced they would incorporate proprietary changes in the DIMM bus to support 3DX as one approach. This does not lend credence to widespread adoption, and it probably only applies to server support for the technology, so I would reduce my probability there to 0.55.

Updated forecast for QoW 15-001 is now:

  1. Chip in production stays at 0.85, so there are still 2.6 potential systems that could adopt the technology directly.
  2. 0.85 probability that chips are in production × 0.55 probability of servers with the technology × 0.65 probability that a storage vendor would adopt the technology to replace caching = ~0.30 probability of server-based adoption in storage; with 18 potential vendors, that’s another 5.5 systems potentially adopting the technology.
  3. Add in the two-three startups likely to emerge, with a similar probability of adoption (0.30), which is another 0.9 systems.

For a total of 2.6 + 5.5 + 0.9 = 9 systems out of ~24, or a 0.38 probability of adoption.
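The updated arithmetic in one place (intermediates rounded to match the figures above):

```python
# Updated QoW 15-001 estimate with the revised probabilities.
p_server_path = 0.85 * 0.55 * 0.65        # chip * servers * storage => ~0.30

direct   = 2.6                            # systems adopting the chip directly
servers  = round(18 * p_server_path, 1)   # ~5.5 systems via the server path
startups = round(3 * p_server_path, 1)    # ~0.9 systems from new startups

total = direct + servers + startups       # 9.0 systems
print(f"{total:.1f} of ~24 systems => {total / 24:.2f} probability")  # 0.38
```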

So my updated forecast still stands at No with a 0.62 probability.

(#Storage-QoW 2015-002): Will we see 3D TLC NAND GA in major vendor storage products in the next year?


I was almost going to just say TLC NAND, but there’s planar TLC and 3D TLC. From my perspective, planar NAND is on the way out, so we go with 3D TLC NAND.

QoW 2015-002 definitions

By “3D TLC NAND” we mean 3 dimensional (rather than planar or 2 dimensional) triple level cell (meaning 3 bits per cell rather than two [MLC] or one [SLC]) NAND technology. It could show up in SSDs, PCIe cards and perhaps other implementations. At least one flash vendor is claiming to be shipping 3D TLC NAND, so it’s available to be used. We did a post earlier this year on 3D NAND, how high can it go. Rumors are out that startup vendors will adopt the technology, but I have heard nothing about any major vendor’s plans for it.

By “major vendor storage products” I mean EMC VMAX, VNX or XtremIO; HDS VSP G1000, HUS VM (or replacement), VSP-F/VSP G800-G600; HPE 3PAR; IBM DS8K, FlashSystem, or Storwize V7000; & NetApp AFF/FAS 8080, 8060, or 8040. I tried to use block storage product lines supporting 700 drives or more from the major storage vendors.

By “in the next year” I mean between today (15Dec2015) and one year from today (15Dec2016).

By “GA” I mean a generally available product offering that can be ordered, sold and installed within the time frame identified above.

Forecasts for QoW 2015-002 need to be submitted to me via email (or via twitter, for those whose email addresses are known to me) before end of day (PT) next Tuesday, 22Dec2015.

Thanks to Howard Marks (DeepStorage.net, @DeepStorageNet) for the genesis of this week’s QoW.

We are always looking for future QoW’s, so if you have any ideas please drop me a line.

Forecast contest – status update for prior QoW(s):

(#Storage-QoW 2015-001) – Will 3D XPoint be GA’d in enterprise storage systems within 12 months? 2 active forecasters, current forecasts are:

A) YES with 0.85 probability; and

B) NO with 0.62 probability.

These can be updated over time, so we will track current forecasts for both forecasters with every new QoW.

 

An analyst forecasting contest à la Superforecasting & the 1st #Storage-QoW

I recently read the book Superforecasting: The Art and Science of Prediction by P. E. Tetlock & D. Gardner. Their Good Judgement Project has been running for years now, and the book presents the results of their experiments. I thought it was a great book.

But it also got me to thinking, how can industry analysts do a better job at forecasting storage trends and events?

Impossible to judge most analyst forecasts

One thing the book mentions is that typical analyst/pundit forecasts are too infrequent, too vague and too time independent to be judged for accuracy. I have committed this fault as much as anyone, both on this blog and on our GreyBeards on Storage podcast (e.g., see our yearend podcast videos…).

What do we need to do differently?

The experiments documented in the book show us the way. One suggestion is to put time durations/limits on all forecasts so that we can better assess analyst accuracy. Another is to estimate a probability for each forecast and update that estimate periodically as new information becomes available. Another is to document your rationale for making each forecast. Also, do post mortems on both correct and incorrect forecasts to learn how to forecast better.

Finally, make more frequent forecasts so that accuracy can be assessed statistically. The book discusses Brier scores as a way of scoring the accuracy of forecasters.
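For reference, here’s one simple binary form of the Brier score: the mean squared difference between your forecast probabilities and what actually happened. (The book’s version sums over all outcome categories, which doubles these values for binary questions, but it ranks forecasters the same way.)

```python
# Binary Brier score: 0.0 is perfect; always answering 0.5 scores 0.25.
def brier_score(forecasts, outcomes):
    """forecasts: probabilities assigned to YES; outcomes: 1 = YES, 0 = NO."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# e.g., two resolved questions: YES@0.85 (came true) and YES@0.38 (did not)
print(brier_score([0.85, 0.38], [1, 0]))   # ~0.083
```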

How to be better forecasters?

In the back of the book the authors publish a list of helpful hints or guidelines for better forecasting, which I summarize here (read the book for more information):

  1. Triage – focus on questions where your work will pay off. For example, try not to forecast anything beyond, say, 5 years out, because there’s just too much randomness that can impact results.
  2. Split intractable problems into tractable ones – the authors call this Fermi-izing, after the physicist Enrico Fermi, who loved to ballpark answers to hard questions by breaking them down into easier questions. So decompose problems into simpler (answerable) problems.
  3. Balance inside and outside views – search for comparisons (outside) that can be made to help estimate unique events and balance this against your own knowledge/opinions (inside) on the question.
  4. Balance over- and under-reacting to new evidence – as forecasts are updated periodically, new evidence should impact your forecasts. But a balance has to be struck as to how much new evidence should change forecasts.
  5. Search for clashing forces at work – in storage there are many ways to store data and perform faster IO. Search out all the alternatives, especially ones that can critically impact your forecast.
  6. Distinguish all degrees of uncertainty – there are many degrees of knowability, try to be as nuanced as you can and properly aggregate your uncertainty(ies) across aspects of the question to create a better overall forecast.
  7. Balance under/over confidence, prudence/decisiveness – rushing to judgement can be as bad as dawdling too long. You must get better at both calibration (how accurate your probabilities are across many forecasts) and resolution (decisiveness in forecasts). For calibration, think weather forecasts: if you say rain tomorrow is 80% probable, then over many such days it should rain about 80% of the time (see the sketch after this list). Resolution is no guts, no glory: if all your estimates are between 0.4 and 0.6 probable, you’re probably being too conservative to really be effective.
  8. During post mortems, beware of hindsight bias – e.g., of course we were going to have flash in storage because the price was coming down, controllers were becoming more sophisticated, reliability became good enough, etc., represents hindsight bias. What was known before SSDs came to enterprise storage was much less than this.
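Here’s the calibration sketch promised in hint 7: bucket past forecasts by their stated probability, then compare each bucket’s average outcome to that probability (the helper and bucket width are my own, purely illustrative):

```python
# Calibration check: for each stated-probability bucket, how often did the
# event actually occur? Well-calibrated forecasts land near the bucket value.
from collections import defaultdict

def calibration(forecasts, outcomes, width=0.1):
    buckets = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        buckets[round(f / width) * width].append(o)
    return {round(p, 2): sum(os) / len(os) for p, os in sorted(buckets.items())}

forecasts = [0.8, 0.8, 0.8, 0.3, 0.3]
outcomes  = [1,   1,   0,   0,   1]
print(calibration(forecasts, outcomes))   # {0.3: 0.5, 0.8: ~0.67}
```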

There are a few more hints beyond the above. In the Good Judgement Project, forecasters were put in teams, and one guideline deals with how to be a better forecaster on a team. Another says don’t treat these guidelines as gospel. And a third is about balancing between over and under compensating for recent errors (which sounds like #4 above).

Again, I would suggest reading the book if you want to learn more.

Storage analysts forecast contest

I think we all want to be better forecasters. At least I think so. So I propose a multi-year contest, where someone provides a storage question of the week and analysts, such as myself, provide forecasts. Over time we can score the forecasts by computing a Brier score for each analyst’s set of forecasts.

I suggest we run the contest for 1 year to see if there’s any improvements in forecasting and decide again next year to see if we want to continue.

Question(s) of the week

But the first step in better forecasting is to have more frequent and better questions to forecast against.

I suggest that the analyst community come up with a question of the week. Everyone would then get one week from publication to record their forecast. Over time, as the questions resolve, we can score analysts on their forecasting ability.

I would propose we use some sort of hash tag to track new questions, “#storage-QoW” might suffice and would stand for Question of the week for storage.

Not sure if one question a week is sufficient but that seems reasonable.

(#Storage-QoW 2015-001): Will 3D XPoint be GA’d in enterprise storage systems within 12 months?

3D XPoint NVM was announced last July by Intel-Micron (I wrote a post about it here). By enterprise storage I mean enterprise and mid-range class, shared storage systems that are accessed as block storage via Ethernet or Fibre Channel using SCSI device protocols, or as file storage using SMB or NFS file access protocols. By 12 months I mean by EoD 12/8/2016. By GA’d, I mean announced as generally available and sellable in any of the major IT regions of the world (USA, Europe, Asia, or the Middle East).

I hope to have my prediction in by next Monday with the next QoW as well.

Anyone interested in participating please email me at Ray [at] SilvertonConsulting <dot> com and put QoW somewhere in the subject line. I will keep actual names anonymous unless told otherwise. Brier scores will be calculated starting after the 12th forecast.

Please email me your forecasts. Initial forecasts need to be in within one week after the QoW goes live. You can update your forecasts at any time.

Forecasts should be of the form “[YES|NO] Probability [0.00 to 0.99]”.

Better forecasting demands some documentation of the rationale for your forecasts. You don’t have to send me your rationale, but I suggest you document it someplace you can refer back to during post mortems.

Let me know if you have any questions and I will try to answer them here.

I could use more storage questions…

Comments?

Photo Credits: Renato Guerreiro, Crystalballer