3D NAND, how high can it go?

I was at the Flash Memory Summit a couple of weeks ago and a presenter (from Hynix, I think) got up and talked about how 3D NAND was going to be the way forward for all NAND technology. I had always thought we were talking about a handful of layers. But on his slide was what looked to be a skyscraper block with 20-40 layers of NAND.

Currently shipping 3D NAND

It seems all the major NAND fabs are shipping 30+ layer 3D NAND:

  • Samsung last year said they were shipping 32-layer 3D (V-)NAND.
  • Toshiba announced earlier this year that they had 48-layer 3D NAND.
  • Hynix is shipping 36-layer 3D NAND.
  • Micron-Intel is also shipping 32-layer 3D NAND.

Am I missing anyone?

Samsung also said that they will be shipping a 32GB, 48-layer V-NAND chip later this year. Apparently, Samsung is also working on 64-layer V-NAND in their labs and getting good results. In an article on Samsung's website, they mentioned the possibility of 100 layers of NAND in a 3D stack.

The other NAND fabs are also probably looking at adding layers to their 3D NAND but aren't talking as much about it.

Earlier this year on a GreyBeards on Storage podcast we talked with Jim Handy, Director at Objective Analysis, about what was going on in NAND fabrication. Talking with Jim was fascinating, but one thing he said was that with 3D NAND, etching a hole with the right depth and width, and straight enough, was a key challenge. At the time I was thinking a couple of layers deep. Boy was I wrong.

How high/deep can 3D NAND go?

On the podcast, Jim said he thought that 3D NAND would run out of gas around 2023. Given current press releases, it seems NAND fabs are adding ~16 layers a year to their 3D-NAND.

So if 32 to 48 layers is today's 3D-NAND and we can keep adding 16 layers/year through 2023, that's 8 years * 16 layers, or an additional 128 layers on top of the 32 to 48 layers currently shipping. At that rate we should get to 160- to 176-layer 3D NAND chips. And if 48 layers yields 32GB, then maybe we could see ~100GB+ 3D NAND chips.

This of course assumes that there is no loss in capacity as we increase layers, and that the industry can continue to add 16 layers/year to 3D-NAND chips.
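The back-of-the-envelope projection above can be sketched out in a few lines (the 16-layers/year cadence and linear capacity scaling are the post's assumptions, not vendor roadmaps):

```python
# Projection of 3D NAND layer counts, assuming: 32-48 layers shipping in
# 2015, +16 layers/year through 2023, and capacity that scales linearly
# with layer count (no per-layer loss).
YEARS = 2023 - 2015            # 8 years of scaling runway
LAYERS_PER_YEAR = 16

def project_layers(layers_today, years=YEARS, rate=LAYERS_PER_YEAR):
    """Project a layer count assuming a constant layers/year cadence."""
    return layers_today + years * rate

def project_capacity_gb(capacity_today_gb, layers_today, layers_future):
    """Scale chip capacity linearly with layer count."""
    return capacity_today_gb * layers_future / layers_today

low = project_layers(32)     # 160 layers
high = project_layers(48)    # 176 layers
# If 48 layers yields a 32GB chip, 160 layers suggests a ~107GB chip
cap = project_capacity_gb(32, 48, low)
print(low, high, round(cap, 1))   # 160 176 106.7
```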

I suppose there's one other proviso: that nothing else comes along that is less expensive to fabricate while still providing ever-increasing capacities of lightning fast, non-volatile storage (see a recent post on 3D XPoint NVM technology).

Photo Credit(s):

  1. Micron’s press release on 3D NAND, (c) 2015 Micron
  2. Toshiba’s press release as reported by AnandTech, (c) 2015 Toshiba

Next generation NVM, 3D XPoint from Intel + Micron

Earlier this week Intel-Micron announced (see webcast here and here) a new, transistor-less NVM with 1000 times the speed of NAND (~10ns access time, vs. ~10µsec for NAND) and 10X the density of DRAM (currently 16Gb/DRAM chip). They call the new technology 3D XPoint™ (cross-point) NVM (non-volatile memory).

In addition to the speed and density advantages, 3D XPoint NVM also doesn't have the endurance problems associated with today's NAND. Intel and Micron say that it has 1000 times the endurance of today's NAND (MLC NAND endurance is ~3000 write (P/E) cycles).

At 10X current DRAM density it's roughly equivalent to today's MLC/TLC NAND capacities/chip. And at 1000 times the speed of NAND, it's roughly equivalent in performance to DDR4 DRAM. Of course, because it's non-volatile it should take much less power to use than current DRAM technology, since there's no need for power refresh.
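For what it's worth, the claimed ratios are easy to sanity-check (the figures below are the announcement's round numbers, not measurements):

```python
# Sanity-check of the claimed 3D XPoint vs. NAND/DRAM ratios from the
# webcast (the post's round numbers, not measured values).
NAND_ACCESS_NS = 10_000     # ~10 usec NAND access time
XPOINT_ACCESS_NS = 10       # ~10 nsec claimed for 3D XPoint
DRAM_DENSITY_GBIT = 16      # current DRAM chip density
XPOINT_DENSITY_X = 10       # claimed 10X DRAM density

speedup = NAND_ACCESS_NS / XPOINT_ACCESS_NS
density_gbit = DRAM_DENSITY_GBIT * XPOINT_DENSITY_X
print(speedup, density_gbit)   # 1000.0 160
```

The 160Gb implied by 10X DRAM density is in the same ballpark as the 128Gb first chips mentioned below, which suggests the 10X figure is a round-number claim.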

We have talked about the end of NAND before (see The end of NAND is here, maybe). If this is truly more scaleable than NAND, it seems to me that it does signal the end of NAND. It's just a matter of time before endurance and/or density growth of NAND hits a wall, and then 3D XPoint can do everything NAND can do, but better, faster and more reliably.

3D XPoint technology

The technology comes from a dual-layer design divided into columns; at the top and bottom of the columns are access connections laid out in an orthogonal pattern that together form a grid to access a single bit of memory. This also means that 3D XPoint NVM can be read and written a bit at a time (rather than a "page" at a time as with NAND) and doesn't have to be erased before being written, as NAND does.
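As a toy illustration of the addressing idea (my own sketch, not Intel-Micron's actual design), a cross-point array selects exactly one cell per row/column wire pair:

```python
# Toy model of cross-point addressing: each cell sits at the intersection
# of one top-layer wire (row) and one orthogonal bottom-layer wire
# (column), so a single bit can be selected, read, or written without
# touching a whole page.
class CrossPointArray:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_bit(self, row, col, value):
        # Select exactly one cell via its row/column wire pair; no
        # page-sized erase-before-write as NAND requires.
        self.cells[row][col] = value & 1

    def read_bit(self, row, col):
        return self.cells[row][col]

array = CrossPointArray(4, 4)
array.write_bit(2, 3, 1)
print(array.read_bit(2, 3), array.read_bit(0, 0))  # 1 0
```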

The 3D nature of the new NVM comes from the fact that you can build up as many layers as you want of these structures to create more and more NVM cells. Each microscopic pillar between the two layers of wiring includes a memory cell and a switch component, which allow a bit of data to be accessed (via the switch) and stored/read (via the memory cell). In the photo above the yellow material is the switch and the green material is the memory cell.

A memory cell operates via a bulk property change of the material, unlike DRAM (which uses capacitors to hold charge) or NAND (which uses floating gates of electrons). As such it uses all of the material to hold a memory value, which should allow 3D XPoint memory cells to scale downward much better than NAND or DRAM.

Intel and Micron are calling the new 3D XPoint NVM storage AND memory. That is, it's suitable both for fast-access, non-volatile data storage and for non-volatile processor memory.

3D XPoint NVM chips in manufacturing today

First chips with the new technology are being manufactured today at Intel-Micron's joint manufacturing fab in Idaho. The first chips will supply 128Gb of NVM and use just two layers of 3D XPoint memory.

Intel and Micron will independently produce system products (read SSDs or NVM memory devices) with the new technology during 2016. They mentioned during the webcast that the technology is expected to be attached (as SSDs) to a PCIe bus and use NVMe as an interface to read and write it. Although if it’s used in a memory application, it might be better attached to the processor memory bus.

The expectation is that the 3D XPoint cost/bit will be somewhere in between NAND and DRAM, i.e. more expensive than NAND but less expensive than DRAM. It’s nice to be the only companies in the world with a new, better storage AND memory technology.


Over the last 10 years or so, SSDs (solid state devices) have all used NAND technologies of one form or another, but after today SSDs can be made from NAND or 3D XPoint technology.

Some expected uses for the new NVM are gaming applications (currently storage-speed and memory constrained) and in-memory databases (which are memory-size constrained). There was mention on the webcast of edge analytics as well.

Welcome to the dawn of a new age of computer storage AND memory.

Photo Credits: (c) 2015 Intel and Micron, from Intel’s 3D XPoint website

HP Tech Day – StoreServ Flash Optimizations

Attended HP Tech Field Day late last month in Disneyland. Must say the venue was the best ever for HP, and getting in on Nth Generation Conference was a plus. Sorry it has taken so long for me to get around to writing about it.

We spent a day going over HP's new converged storage, software defined storage and other storage topics. HP has segmented the Software Defined Data Center (SDDC) storage requirements into cost-optimized Software Defined Storage and SLA-optimized Service Refined Storage. Under Software Defined Storage they talked about their StoreVirtual product line, which is an outgrowth of the LeftHand Networks VSA, first introduced in 2007. This June, they extended SDS to include their StoreOnce VSA product to go after SMB and ROBO backup storage requirements.

We also discussed some of HP’s OpenStack integration work to integrate current HP block storage into OpenStack Cinder. They discussed some of the integrations they plan for file and object store as well.

However what I mostly want to discuss in this post is the session discussing how HP StoreServ 3PAR had optimized their storage system for flash.

They showed an SPC-1 chart depicting various storage systems' IOPS levels and response times as they ramped from 10% to 100% of their IOPS rate. StoreServ 3PAR's latest entry showed a considerable band of IOPS (25K to over 250K), all within a sub-msec response time range. That was pretty impressive, since at the time no other storage system seemed able to do this over its whole range of IOPS. (A more recent SPC-1 result from HDS, with an all-flash VSP using Hitachi Accelerated Flash, also accomplished this [sub-msec response time throughout the whole benchmark], only in their case reaching over 600K IOPS; read about this in our latest performance report in our newsletter, sign up above right.)

  • Adaptive Read – As I understood it, this changes the size of backend reads to match the size requested by the front end. For disk systems, one often sees that a host read of, say, 4KB causes a read of 16KB from the backend, on the assumption that the host will request additional data after the block is read off disk; 90% of the time spent doing a disk read is getting the head to the correct track, and once there it takes almost no extra effort to read more data. With flash, however, there is no effort needed to get to the proper location to read a block of data, so there is no advantage to reading more data than the host requests; if the host comes back for more, one can immediately read it from flash.
  • Adaptive Write – Similar to adaptive read, adaptive write only writes the changed data to flash. So if a host writes a 4KB block then 4KB is written to flash. This doesn’t help much for RAID 5 because of parity updates but for RAID 1 (mirroring) this saves on flash writes which ultimately lengthens flash life.
  • Adaptive Offload (destage) – This changes the frequency of destaging or flushing cache depending on the level of write activity. Slower destaging allows written (dirty) data to accumulate in cache if there’s not much write activity going on, which means in RAID 5 parity may not need to be updated as one could potentially accumulate a whole stripe’s worth of data in cache. In low-activity situations such destaging could occur every 200 msecs. whereas with high write activity destaging could occur as fast as every 3 msecs.
  • Multi-tenant IO processing – For disk drives with sequential reads, one wants the largest stripes possible (due to the head positioning penalty), but for SSDs one wants the smallest stripe sizes possible. The other problem with large stripe sizes is that devices are busy for the duration of the longer IO while performing the stripe writes (and reads). StoreServ modified the stripe size for SSDs to be 32KB so that other IO activity need not wait as long to get its turn in the (IO device) queue. The other advantage is during SSD rebuilds: with a 32KB stripe size one can intersperse more IO activity on the devices involved in the rebuild without impacting rebuild performance.
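The Adaptive Offload behavior described above can be sketched as a simple mapping from write activity to destage cadence (my reading of HP's description; only the 3 msec and 200 msec endpoints come from the talk, the linear interpolation is a guess):

```python
# Sketch of adaptive offload (destage): the destage interval stretches
# toward 200 msec when write activity is low, letting dirty data
# accumulate into full RAID 5 stripes, and shrinks toward 3 msec under
# heavy write load.
MIN_INTERVAL_MS = 3     # fastest destage cadence under heavy writes
MAX_INTERVAL_MS = 200   # slowest cadence when write activity is low

def destage_interval_ms(write_load):
    """Map a 0.0-1.0 write-activity level to a destage interval."""
    load = min(max(write_load, 0.0), 1.0)
    return MAX_INTERVAL_MS - load * (MAX_INTERVAL_MS - MIN_INTERVAL_MS)

print(destage_interval_ms(0.0))   # 200.0 (idle: accumulate full stripes)
print(destage_interval_ms(1.0))   # 3.0   (busy: flush aggressively)
```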

Of course the other major advantage HP StoreServ's 3PAR architecture provides for flash is its intrinsic wide striping across a storage pool. This way all the SSDs can be used optimally and equally to service customer IOs.

I am certain there were other optimizations HP made to support SSDs in StoreServ storage, but these are the ones they were willing to talk publicly about.

No mention of when Memristor SSDs would be available, but stay tuned; HP let slip that sooner or later Memristor-based storage will be in HP storage & servers.


Photo Credits: (c) 2013 Silverton Consulting, Inc

Has latency become the key metric? SPC-1 LRT results – chart of the month

I was at EMCworld a couple of months back and they were showing off a preview of the next version of VNX storage, which was trying to achieve a million IOPS with under a millisecond latency.  Then I attended NetApp's analyst summit, and the discussion at their Flash seminar was how latency was changing the landscape of data storage and how flash latencies were going to enable totally new applications.

One executive at NetApp mentioned that IOPS was never the real problem. As an example, he mentioned one large oil & gas firm that had a peak IOPS of 35K.

Also, there was some discussion at NetApp of trying to come up with a way of segmenting customer applications by latency requirements.  Aside from high frequency trading applications, online payment processing and a few other high-performance database activities, there wasn’t a lot that could easily be identified/quantified today.

IO latencies have been coming down for years now. Sophisticated disk-only storage systems have been lowering latencies for a decade or more. But since the introduction of SSDs it's been a whole new ballgame. For proof, all one has to do is examine the top 10 SPC-1 LRT (least response time, measured with workloads at 10% of peak activity) results.

Top 10 SPC-1 LRT results, SSD system response times


In looking over the top 10 SPC-1 LRT benchmarks (see Figure above) one can see a general pattern. These systems mostly use SSD or flash storage, except for the TMS-400, TMS 320 (IBM FlashSystems) and Kaminario's K2-D, which primarily use DRAM storage with backup storage behind it.

Hybrid disk-flash systems seem to start with an LRT of around 0.9 msec (not on the chart above).  These can be found with DotHill, NetApp, and IBM.

Similarly, you almost have to get as "slow" as 0.93 msec before you can find any disk-only storage systems. But most disk-only storage comes with a latency of 1 msec or more. Between 1 and 2 msec LRT we see storage from EMC, HDS, HP, Fujitsu, IBM, NetApp and others.

There was a time when the storage world was convinced that to get really good response times you had to have a purpose-built storage system like TMS or Kaminario, or stripped-down functionality like IBM's Power 595. But it seems that the general-purpose HDS HUS, IBM Storwize, and even Huawei OceanStore are all capable of providing excellent latencies with all-SSD storage behind them, and all seem to perform at least in the same ballpark as the purpose-built TMS RAMSAN-620 SSD storage system. These general-purpose storage systems have just about every advanced feature imaginable, with the exception of mainframe attach.

It seems nowadays that there is a trifurcation of latency results going on, based on underlying storage:

  • DRAM-only systems at ~0.1 to 0.4 msec.
  • SSD/flash-only storage at 0.2 to 0.7 msec.
  • Disk-only storage at 0.93 msec and above.
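One could bucket an SPC-1 LRT result into these three bands with something like the following (band edges are my approximations from the chart discussion, and the DRAM and flash bands overlap in practice, so the cutoffs are fuzzy):

```python
# Rough classifier for the three latency bands seen in SPC-1 LRT results.
def storage_tier(lrt_msec):
    """Guess the underlying media from an SPC-1 LRT value (msec)."""
    if lrt_msec < 0.45:
        return "DRAM"        # ~0.1-0.4 msec band
    if lrt_msec < 0.93:
        return "SSD/flash"   # ~0.2-0.7 msec band; overlaps with DRAM
    return "disk"            # 0.93 msec and above

print(storage_tier(0.15), storage_tier(0.6), storage_tier(1.2))
# DRAM SSD/flash disk
```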

The hybrid storage systems are attempting to mix the economics of disk with the speed of flash storage and seem to be contending with all these single technology, storage solutions. 

It's a new IO latency world today. SSD-only storage systems are now available from every major storage vendor, and many of them are showing pretty impressive latencies. Now, with fully functional storage latency below 0.5 msec, what's the next hurdle for IT?


Image: EAB 2006 by TMWolf


Racetrack memory gets rolling

A recent MIT study showed how new technology can be used to control and write magnetized bits in nano-structures using voltage alone. The new technique also consumes much less power than using magnets or magnetism.

They envision a sort of nano-circuit, -wire or -racetrack with a series of transistor-like structures spaced at regular intervals above it. Nano-bits would be racing around these nano-wires as a series of magnetized domains. These new transistor-like devices would be a sort of onramp for the bits, as well as stop-lights/speed limits for the racetrack.

Magnetic based racetrack memory issues

The problem with using magnets to write bits in a nano-racetrack is that magnetism casts a wide shadow and can impact adjacent racetracks, sort of like shingled writes (which we last discussed in Shingled magnetic recording disks). The other problem has been finding a way to (magnetically) control the speed of the racing bits so they can be isolated and read or written effectively.

Magneto-ionic racetrack memory solutions

But MIT researchers have discovered a way to use voltage to change the magnetic orientation of a bit on a race track.  They also found a way through the use of voltage to precisely control the position of magnetic bits speeding around the track and to electronically isolate and select a bit.

What they have created is sort of a transistor for magnetized domains using ion-rich materials.  Voltages can be used to attract or repel those ions and then those ions can interact with flowing magnetic domains to speed up or slow down the movement of magnetic domains.

Thus, the transistor-like device can be set to attract (or speed up) magnetized domains, slow them down or stop them, and also be used to change the magnetic orientation of a domain. MIT researchers call these magneto-ionic devices.

Racetrack memory redefined

So now we have a way to (electronically) seek to bit data on a racetrack, a way to precisely (electronically) select bits on the racetrack, and a way to precisely (electronically) write data on a racetrack. And presumably, with an appropriate (magnetic) read head, a way to read this data. As an added bonus, data once written on the racetrack apparently requires no additional power to stay magnetized.

So the transistor-like devices are a combination of write heads, motors and brakes for racetrack memory. I'm not sure, but if they can write, slow down and speed up magnetic domains, why can't they read them as well? That way the transistor-like devices could be read heads too.

Why do they need more than one write head per track? It seems to me that one should suffice for a fairly long track, not unlike disk drives. I suppose more of them would make the track faster to write, but they would all have to operate in tandem, speeding up or stopping the racing bits on the track together and then starting them all back up together again. Maybe this way they can write a byte, a word or a chunk of data all at the same time.

In any event, it seems that racetrack memory took a quantum leap forward with this new research out of MIT.

Racetrack memory futures

IBM has been talking about race track memory for some time now and this might be the last hurdle to overcome to getting there (we last discussed this in A “few exabytes-a-day” from SKA post).

In addition, there don't appear to be any write-cycle endurance, bit-duration or whole-page erase issues with this type of technology. So as the underlying storage for a new sort of semiconductor storage device (SSD), it has significant inherent advantages.

Not to mention that it's all based on nano-scale device sizes, which means it can pack a lot of bits into very little volume or area. So SSDs based on these racetrack memory technologies would be denser, faster, and require less energy; what more could you want?

Image: Nürburgring 2012 by Juriën Minke


The shrinking low-end

I was updating my features list for my SAN Buying Guide the other day when I noticed that low-end storage systems were getting smaller.

That is, NetApp, HDS and others had recently reduced the number of drives they support on their newest low-end storage systems (e.g., see specs for the HDS HUS-110 vs. AMS2100 and the NetApp FAS2220 vs. FAS2040). And as the number of drives determines system capacity, the size of their SMB storage was shrinking.

But what about the data deluge?

With the data explosion going on, data growth in most IT organizations is something like 65%. But this growth seems to be concentrated in larger organizations or in data warehouse databases used for operational analytics. In the case of analytics, the work is typically done on database machines or Hadoop clusters and doesn't use low-end storage.

As for larger organizations, the most recent storage systems all seem to be flat to growing in capacity, not shrinking. So, the shrinking capacity we are seeing in new low-end storage doesn’t seem to be an issue in these other market segments.

What else could explain this?

I believe the introduction of SSDs is changing the drive requirements for low-end storage.  In the past, prior to SSDs, organizations would often over provision their storage to generate better IO performance.

But with most low-end systems now supporting SSDs, over-provisioning is no longer an economical way to increase performance. As such, for those needing higher IO performance the most economical solution (CAPex and OPex) is to buy a small amount of SSD capacity in conjunction with the remaining capacity in disk.
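The economics here are easy to illustrate (all prices and IOPS figures below are made-up assumptions for the sketch, not quotes): before SSDs, hitting an IOPS target meant buying spindles for performance rather than capacity.

```python
# Rough cost comparison: over-provisioning disks for IOPS vs. adding a
# little SSD. All figures are illustrative assumptions.
import math

HDD_IOPS, HDD_COST = 150, 300       # ~150 IOPS per 15K disk (assumed)
SSD_IOPS, SSD_COST = 20_000, 1_500  # one enterprise SSD (assumed)

def hdd_only_cost(target_iops):
    """Buy enough spindles to hit the IOPS target."""
    return math.ceil(target_iops / HDD_IOPS) * HDD_COST

def hybrid_cost(target_iops, capacity_hdds=12):
    """Buy SSDs for IOPS plus a handful of disks for capacity."""
    ssds = math.ceil(target_iops / SSD_IOPS)
    return ssds * SSD_COST + capacity_hdds * HDD_COST

target = 30_000
print(hdd_only_cost(target))  # 60000 -> 200 spindles just for IOPS
print(hybrid_cost(target))    # 6600  -> 2 SSDs + 12 disks for capacity
```

Under these assumptions the hybrid configuration needs an order of magnitude fewer drives, which is consistent with shrinking low-end drive counts.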

That and the finding that maybe SMB data centers don’t need as much disk storage as was originally thought.

The downturn begins

So this is the first downturn in capacity to come along in my long history with data storage.  Never before have I seen capacities shrink in new versions of storage systems designed for the same market space.

But if SSDs are driving the reduction in SMB storage systems, shouldn’t we start to see the same trends in mid-range and enterprise class systems?

But disk enclosure re-tooling may be holding these systems' capacities flat. It takes time, effort and expense to re-implement disk enclosures for storage systems. And as the reductions we are seeing in the low-end are not that significant, maybe it's just not worth it for these other systems – just yet.

But it would be useful to see something that showed the median capacity shipped per storage subsystem. I suppose weighted averages are available from something like IDC disk system shipments and overall capacity shipped, but there's no real way to derive a median from these measures, and I think that's the only stat that might show how this trend is being felt in other market segments.


Image credit: Photo of Dell EqualLogic PSM4110 Blade Array disk drawer, taken at Dell Storage Forum 2012




SPECsfs2008 NFS SSD/NAND performance, take two – chart-of-the-month

SCISFS120623-010(002) (c) 2012 Silverton Consulting, Inc. All Rights Reserved

For some time now I have been experimenting with different approaches to normalize IO activity (in the chart above it's NFS throughput operations per second) for systems that use SSDs or Flash Cache. My previous attempt (see prior SPECsfs2008 chart of the month post) normalized based on the GB of NAND capacity used in a submission.

I found the previous chart somewhat lacking, so this quarter I decided to use SSD device and/or Flash Cache card counts instead. This approach is shown in the chart above. Funny thing: although the rankings were exactly the same between the two charts, one can see significant changes in the magnitudes achieved, especially in the relative values between the top two rankings.

For example, the Avere FXT 3500 result still came in at number one, but whereas here it achieved ~390K NFS ops/sec/SSD, on the prior chart it obtained ~2000 NFS ops/sec/NAND-GB. More interesting was the number two result. Here the NetApp FAS6240 with a 1TB Flash Cache card achieved ~190K NFS ops/sec/FC-card, but on the prior chart it only hit ~185 NFS ops/sec/NAND-GB.

That means on this version of the normalization the Avere is about 2X more effective than the NetApp FAS6240 with the 1TB Flash Cache card, whereas on the prior chart it was 10X more effective in ops/sec/NAND-GB. I feel this is getting closer to the truth but not quite there yet.
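The two normalizations themselves are trivial to compute; what changes the picture is the divisor (the throughput and device counts below are hypothetical, chosen only to show how the same submission lands near the ~390K/SSD and ~2000/NAND-GB figures):

```python
# The two normalization approaches compared on one hypothetical
# SPECsfs2008 submission.
def ops_per_device(total_ops, device_count):
    """Normalize throughput by SSD / Flash Cache card count."""
    return total_ops / device_count

def ops_per_gb(total_ops, nand_gb):
    """Normalize throughput by GB of NAND capacity."""
    return total_ops / nand_gb

# Hypothetical: 1,560,000 NFS ops/sec over 4 SSDs of 200GB each
total, ssds, gb = 1_560_000, 4, 4 * 200
print(ops_per_device(total, ssds))  # 390000.0 ops/sec/SSD
print(ops_per_gb(total, gb))        # 1950.0 ops/sec/NAND-GB
```

Since per-GB numbers penalize big devices and per-device numbers ignore capacity entirely, two submissions can keep their ranking while their relative magnitudes swing wildly, which is exactly what the two charts showed.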

We still have the problem that all the SPECsfs2008 submissions that use SSDs or FlashCache also have disk drives as well as (sometimes significant) DRAM cache in them.  So doing a pure SSD normalization may never suffice for these systems.

On the other hand, I have taken a shot at normalizing SPECsfs2008 performance for SSD/NAND, disk devices and DRAM caching as one dimension in a ChampionsChart™ I use for a NAS Buying Guide, for sale on my website. If you're interested in seeing it, drop me a line, or better yet purchase the guide.


The complete SPECsfs2008 performance report went out in SCI’s June newsletter.  But a copy of the report will be posted on our dispatches page sometime next month (if all goes well).  However, you can get the SPECsfs2008 performance analysis now and subscribe to future free newsletters by just using the signup form above right.

For a more extensive discussion of current NAS or file system storage performance covering SPECsfs2008 (Top 20) results and our new ChampionsChart™ for NFS and CIFS storage systems, please see SCI’s NAS Buying Guide available from our website.

As always, we welcome any suggestions or comments on how to improve our analysis of SPECsfs2008 results or any of our other storage performance analyses.

EMC buys ExtremeIO

Wow, $430M for a $25M startup that's been around since 2009 and hasn't generated any revenue yet. It probably compares well against Facebook's recent $1B acquisition of Instagram, but it still seems a bit much.

It certainly signals a significant ongoing interest in flash storage in whatever form that takes. Currently EMC offers PCIe flash storage (VFCache), SSD options in VMAX and VNX, and has plans for a shared flash cache array (project: Thunder).  An all-flash storage array makes a lot of sense if you believe this represents an architecture that can grab market share in storage.

I have talked with ExtremeIO in the past, but they were pretty stealthy then (and still are as far as I can tell). Not many details about their product architecture, specs on performance, interfaces or anything substantive. The only thing they told me then was that they were in the flash array storage business.

In a presentation to SNIA's BOD last summer, I said that the storage industry is in revolution. When a 20-or-so-device system can generate ~250K or more IOs/second with a single controller, simple interfaces, and solid state drives, we are not in Kansas anymore.

Can a million-IO storage system be far behind?

It seems to me that achieving enterprise storage performance has gotten much easier over the last few years. Now that doesn't mean enterprise storage reliability, availability or features, but just getting to that level of performance used to take 1000s of disk drives and racks of equipment. Today, you can almost do it in a 2U enclosure, and that's without breaking a sweat.

Well, that seems to be the point: with a gaggle of startups all vying after SSD storage in one form or another, the market is starting to take notice. Maybe EMC felt it was a good time to enter the market with their own branded product; they seem to already have all the other bases covered.

Their website mentions that ExtremeIO is a load-balanced, deduplicated, clustered storage system with enterprise-class services (which could mean anything). Nonetheless, a deduplicating, clustered SSD storage system built out of commodity servers could describe at least 3 other SSD startups I have recently talked with and a bunch I haven't talked with in a while.

Why EMC decided that ExtremeIO was the one to buy is somewhat of a mystery. There was some mention of an advanced data protection scheme for the flash storage, but no real details.

Nonetheless, enterprise SSD storage startups with relatively low valuations and the potential to disrupt enterprise storage might be something to invest in. Certainly EMC felt so.


Comments, anyone know anything more about ExtremeIO?