Axellio, next gen, IO intensive server for RT analytics by X-IO Technologies

We were at X-IO Technologies last week for SFD13 in Colorado Springs, talking with the team, and they showed us their new IO- and storage-intensive server, the Axellio. They want to sell Axellio to customers with extreme IOPS, very high bandwidth, and large capacity requirements. Videos of X-IO’s sessions at SFD13 are available here.

The hardware

Axellio comes in a 2U appliance with two server nodes. Each server node supports 2 sockets of Intel E5-26xx v4 CPUs (4 sockets total per appliance), for anywhere from 16 to 88 cores in total. Each server node can be configured with up to 1TB of DRAM, or with NVDIMMs.

There are two key differentiators to Axellio:

  1. FabricExpress™, a PCIe-based interconnect that allows both server nodes to access dual-ported 2.5″ NVMe SSDs; and
  2. Dense drive trays; the Axellio supports up to 72 2.5″ NVMe SSDs (6 trays with 12 drives each), offering up to 460TB of raw NVMe flash using 6.4TB NVMe SSDs. Higher-capacity NVMe SSDs, available soon, will increase Axellio capacity to 1PB of raw NVMe flash.

They also appear to have spent a lot of time on packaging, cooling and power in order to make Axellio a reliable solution for edge computing. We asked if it was NEBS compliant and they told us not yet, but they are working on it.

Axellio can also be configured to replace 2 drive trays with 2 processor offload modules, such as 2x Intel Xeon Phi coprocessors for parallel compute, 2x Nvidia GRID K2 GPU modules for high-end video or VDI processing, or 2x Nvidia Tesla P100 modules for machine learning. Presumably anything that fits within Axellio’s power, cooling and PCIe lane limitations would work here as well.

At the front end of the appliance, each server node retains one x16 PCIe slot for networking, which can take off-the-shelf HHHL or FHHL NICs/HCAs/HBAs for Ethernet, InfiniBand or FC access to the Axellio. This provides up to 2x100GbE of network access per server node.

Performance of Axellio

With Axellio using all NVMe SSDs, we expect high IO performance. Further, note that they are measuring IO performance internally, from the CPUs on the Axellio server nodes. X-IO says the Axellio can hit >12 million IO/sec at 35µsec latencies with 72 NVMe SSDs.

Lab testing (detailed in a chart X-IO shared) shows IO rates for an Axellio appliance with 48 NVMe SSDs. With that configuration the Axellio can do 7.8M 4KB random write IOPS at 90µsec average response times and 8.6M 4KB random read IOPS at 164µsec latencies. We don’t know why reads would take longer than writes in Axellio, but they are doing 10% more of them.

Furthermore, the gap between read and write IOPS rates isn’t anything like what we have seen with other AFAs. Typically, maximum write IOPS are much lower than read IOPS. Why Axellio’s read and write IOPS rates are so close to one another (within ~10%) is a significant mystery.

As for IO bandwidth, Axellio supports up to 60GB/sec sustained, and in the 48-drive lab testing it generated 30.5GB/sec for random 4KB writes and 33.7GB/sec for random 4KB reads. Again, much closer together than what we have seen from other AFAs.
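As a quick sanity check (my own back-of-the-envelope arithmetic, not anything X-IO presented), the 4KB IOPS numbers and the bandwidth numbers from the 48-drive test are consistent with each other to within about 5%:

```python
# Back-of-the-envelope check (my arithmetic, not X-IO's): do the quoted 4KB IOPS
# rates from the 48-drive test imply the quoted GB/sec bandwidth figures?
KB = 1024
GB = 1000 ** 3        # bandwidth is usually quoted in decimal GB/sec

write_iops = 7.8e6    # 4KB random write IOPS at 90µsec
read_iops = 8.6e6     # 4KB random read IOPS at 164µsec

write_bw = write_iops * 4 * KB / GB   # ~31.9 GB/sec vs. 30.5 GB/sec measured
read_bw = read_iops * 4 * KB / GB     # ~35.2 GB/sec vs. 33.7 GB/sec measured
print(f"write: {write_bw:.1f} GB/sec, read: {read_bw:.1f} GB/sec")
```

In other words, the bandwidth figures line up with IOPS × 4KB, so the two sets of measurements hang together.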

Also noteworthy, given PCIe’s bi-directional capabilities, X-IO said that there’s no reason that the system couldn’t be doing a mixed IO workload of both random reads and writes at similar rates. Although, they didn’t present any test data to substantiate that claim.

Markets for Axellio

They really didn’t talk about the software for Axellio. We would guess this is up to the customer/vertical that uses it.

Aside from the obvious use case as X-IO’s next generation ISE storage appliance, Axellio could easily be used as an edge processor for a massive fabric of IoT devices, an analytics processor for large RT streaming data, a deep packet capture and analysis processor for cyber security/intelligence gathering, etc. X-IO seems to be focusing their current efforts on attacking these verticals and others with similar processing requirements.

X-IO Technologies’ sessions at SFD13

Other sessions at X-IO included: Richard Lary, CTO, X-IO Technologies, who gave a very interesting presentation on a mathematically optimized way to do data dedupe (caution: some math involved); Bill Miller, CEO, X-IO Technologies, who presented on edge computing’s new requirements; and Gavin McLaughlin, Strategy & Communications, who talked about X-IO’s history and its new approach to take the company into more profitable business.

Again, all the videos are available online (see link above). We were very impressed with Richard’s dedupe session; we haven’t heard that much about Bloom filters since Andy Warfield, CTO and Co-founder of Coho Data, talked at SFD8.

For more information, see other SFD13 bloggers’ posts on X-IO’s sessions.

Full Disclosure

X-IO paid for our presence at their sessions and they provided each blogger a shirt, lunch and a USB stick with their presentations on it.

 

Facebook moving to JBOF (just a bunch of flash)

At Flash Memory Summit (FMS 2016) this past week, Vijay Rao, Director of Technology Strategy at Facebook gave a keynote session on some of the areas that Facebook is focused on for flash storage. One thing that stood out as a significant change of direction was a move to JBOFs in their datacenters.

As you may recall, Facebook was an early adopter of (FusionIO’s) server flash cards to accelerate their applications. But they are moving away from that technology now.

Insane growth at Facebook

Why? Vijay started his talk with some of the growth they have seen over the years in photos, videos, messages, comments, likes, etc. Each was depicted as an animated bubble chart, with a timeline on the horizontal axis, a growth measurement in % on the vertical axis, and the size of the bubble representing the actual quantity of each element.

Although the user activity growth rates all started out small at different times and grew at different rates over their individual timelines, by the end of each video they were all at almost 90-100% growth in 4Q15 (we assume this is a yearly growth rate, but could be wrong).

Vijay had similar slides showing the growth of their infrastructure, i.e., compute, storage and networking. Although infrastructure grew less quickly than user activity (messages/videos/photos/etc.), it showed similar trends and ended up (as far as I could tell) at ~70% growth.
Continue reading “Facebook moving to JBOF (just a bunch of flash)”

Intel Cloud Day 2016 news and views

A couple of weeks back I was at Intel Cloud Day 2016 with the rest of the TFD team. We listened to a number of presentations from Intel’s management team, mostly about how the IT world is changing and how they plan to help lead the transition to the new cloud world.

The view from Intel is that any organization with 1200 to 1500 servers has enough scale to do a private cloud deployment that would be more economical than using public cloud services. Intel’s new goal is to facilitate the deployment of 10,000 (private) clouds across the world.

In order to facilitate the next 10,000 clouds, Intel is working hard to introduce a number of new technologies and programs that they feel can make it happen. One discussed at the show was a new OpenStack scheduler based on Google’s open-sourced Kubernetes technology, which derives from the container management Google uses for its own infrastructure and now supports the OpenStack framework.

Another way Intel is helping is by building a new 1000-server (500 now) cloud test lab in San Antonio, TX. Of course the servers will use the latest Xeon chips from Intel (see below for more info on the latest chips). The other enabling technology discussed a lot at the show was software defined infrastructure (SDI), which applies across compute, networking and storage in the data center.

According to Intel, security isn’t the number 1 concern holding back cloud deployments anymore. Nowadays it’s more the lack of skills that’s governing how quickly the enterprise moves to the cloud.

At the event, Intel talked about a couple of verticals that seemed to be ahead of the pack in adopting cloud services, namely, education and healthcare.  They also spent a lot of time talking about the new technologies they were introducing today.
Continue reading “Intel Cloud Day 2016 news and views”

A tale of two AFAs: EMC DSSD D5 & Pure Storage FlashBlade

There’s been an ongoing debate in the analyst community about the advantages of software-only innovation vs. hardware-software innovation (see the Commodity hardware loses again and Commodity hardware always loses posts). Here is another example where two separate companies have turned to hardware innovation to take storage to the next level.

DSSD D5 and FlashBlade

Within the last couple of weeks, two radically different AFAs were introduced: one by perennial heavyweight EMC, with their new DSSD D5 rack-scale flash system, and the other by relative newcomer Pure Storage, with their new FlashBlade storage system.

These two arrays seem to be going after opposite ends of the storage market: the 5U DSSD D5 targets both structured and unstructured data that needs ultra-high-speed IO access times (<100µsec), while the 4U FlashBlade goes after more general purpose unstructured data. And yet the two have many similarities, at least superficially.
Continue reading “A tale of two AFAs: EMC DSSD D5 & Pure Storage FlashBlade”

QoM 16-001: Will NVMe GA in enterprise storage over the next 12 months? Yes 0.68 probability

The latest analyst forecast contest Question of the Month (QoM 16-001) asks whether NVMe PCIe SSDs will GA in enterprise storage over the next 12 months. For more information on our analyst forecast contest, please check out the post.

There are a couple of considerations that would impact NVMe adoption.

Availability of NVMe SSDs?

Intel, Samsung, Seagate and WD-HGST are currently shipping 2.5″ & HH-HL NVMe PCIe SSDs for servers. Hynix, Toshiba, and others had samples at last year’s Flash Memory Summit and promised production early this year. So yes, they are available, from at least 3 sources now, including enterprise class storage vendors, with more coming online over the year.

Some advantages of NVMe SSDs?

Advantages of NVMe (compiled from NVMe organization and other NVMe sources):

  • Lower SSD write and read IO access latencies
  • Higher mixed IOPS performance
  • Widespread OS support (not necessarily relevant for storage systems)
  • Lower power consumption
  • x4 PCIe lane support
  • NVMe over Fabrics support (FC and new RDMA fabrics)

Disadvantages of NVMe SSDs?

Disadvantages of NVMe (compiled from NVMe drive reviewers and other sources):

  • Smaller form factors limit (MLC) SSD capacities
  • New cabling (U.2) for 2.5″ SSDs
  • BIOS changes to support boot from NVMe (not much of a problem in storage systems)

Not many enterprise storage vendors use PCIe Flash

Current storage vendors that use PCIe flash (sourced from web searches on PCIe flash for major storage vendors):

  • Using PCIe SSDs as part of, or the only, storage tier
    • Kaminario K2 all-flash array
    • NexGen hybrid storage
  • NetApp (PCIe) FlashCache
  • Others (2?) with volatile cache backed by PCIe SSDs
  • Others (2?) using PCIe SSDs as non-volatile cache

Only a few of these will have new storage hardware out over the next 12 months. I estimated (earlier) about 1/3 of current storage vendors will release new hardware over the next 12 months.

The advantages of NVMe don’t matter as much unless you have a lot of PCIe flash in your system, so the 2 vendors above that use PCIe SSDs as storage are the most likely to move to NVMe. But the limited capacity of NVMe drives and the meagre storage-level performance speed-up available from NVMe may make adoption less likely. So maybe there’s a 0.3 probability * 1/3 (of vendors with a hardware refresh) * 2 (vendors using PCIe flash as storage), or ~0.2.

For the other 5 candidates listed above, the advantages of NVMe aren’t that significant, so if they are refreshing their hardware there’s maybe a low chance that they will take on NVMe, mainly because it’s going to become the predominant PCIe flash protocol. So maybe that adds another 0.15 probability * 1/3 * 5, or ~0.25. (When I originally formulated the NVMe QoM I had not anticipated NVMe SSDs backing volatile cache, but they certainly exist today.)

The other potential candidates for NVMe are all startups. EMC DSSD uses a PCIe fabric for its NAND support and could already be making use of NVMe. (Although I would not classify DSSD as an enterprise storage vendor.)

But there may be other startups out there using PCIe flash that would consider moving to NVMe. A while back, I estimated ~3 startups are likely to emerge over the next year. It’s almost a certainty that they would all have some sort of flash storage, but maybe only one of them would make use of PCIe SSDs. And it’s unclear whether they would use NVMe drives as main storage or for caching. So, splitting the difference in probabilities, we will use 0.23 probability * 1, or ~0.23.

So totaling it up, my forecast for NVMe adoption in GA enterprise storage hardware over the next 12 months is Yes with 0.68 probability.
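To make the arithmetic behind that 0.68 explicit, here is the same estimate worked out in a few lines of Python (the component probabilities are my own guesses from above, not hard data):

```python
# My QoM 16-001 forecast, spelled out: each term is
# P(adopting NVMe) * P(hardware refresh in the next 12 months) * (number of candidate vendors)
refresh = 1 / 3                            # fraction of vendors refreshing hardware this year

pcie_as_storage = 0.3 * refresh * 2        # Kaminario, NexGen: ~0.20
pcie_as_cache = 0.15 * refresh * 5         # NetApp FlashCache + other caching users: ~0.25
startups = 0.23 * 1                        # ~1 new startup likely using PCIe/NVMe SSDs: 0.23

total = pcie_as_storage + pcie_as_cache + startups
print(round(total, 2))                     # 0.68
```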

The other likely candidates to support NVMe are software defined storage and hyperconverged storage. I don’t count these as enterprise storage vendors, but I could be convinced that this is a mistake. If I add in SW defined storage, the probability goes up to the high 0.80s or low 0.90s.

Comments?

 

(Storage QoM 16-001): Will we see NVM Express (NVMe) drives GA’d in enterprise storage over the next year?

First, let me state that QoM stands for Question of the Month. Doing these forecasts can be a lot of work, and rather than focusing my whole blog on weekly forecast questions and answers, I would like to do something else as well. So, from now on we are doing only one new forecast a month.

So for the first question of 2016, we will forecast whether NVMe SSDs will be GA’d in enterprise storage over the next year.

NVM Express (NVMe) refers to the new PCIe interface for SSD storage. Wikipedia has a nice description of NVMe. As discussed there, NVMe was designed for the higher performance and enhanced parallelism that come with the PCI Express (PCIe) bus. The current version of the NVMe spec is 1.2a (available here).

GA means generally available for purchase by any customer.

Enterprise storage systems refers to mid-range and enterprise class storage systems from major AND non-major storage vendors, which includes startups.

Over the next year means by 19 January 2017.

Special thanks to Kacey Lai (@mrdedupe), Primary Data, for suggesting this month’s question.

Current and updates to previous forecasts

 

Update on QoW 15-001 (3DX) forecast:

News out today indicates that 3DX (3D XPoint non-volatile memory) samples may be available soon, but it could take another 12 to 18 months to get it into production. 3DX manufacturing is more challenging than current planar NAND technology and uses about 100 new materials, many of which are currently single-sourced. Our 3DX forecast already built in the potential for delays in reaching production within 6 months, and the news above says things could be worse than expected. As such, I feel even more strongly that there is less of a possibility of 3DX shipping in storage systems by next December. So I would update my forecast for QoW 15-001 to NO with a 0.75 probability at this time.

So current forecasts for QoW 15-001 are:

A) YES with 0.85 probability; and

B) NO with 0.75 probability

Current QoW 15-002 (3D TLC) forecast

We have 3 active participants, current forecasts are:

A) Yes with 0.95 probability;

B) No with 0.53 probability; and

C) Yes with 1.0 probability

Current QoW 15-003 (SMR disk) forecast

We have 1 active participant, current forecast is:

A) Yes with 0.85 probability

 

Coho Data, the packet processing squeeze and working set exploits

We were at Coho Data this week with Storage Field Day 8 (SFD8) (see the videos here) and met with Andy Warfield (@andywarfield), CTO and Co-founder of Coho Data. Last time we met (at SFD6), Andy talked at length about some enhancements they were working on and gave us a tutorial on HyperLogLog (HLL) data structures, which can be used to identify application working sets.

Packet processing time is getting squeezed

 


Andy’s always a joy to talk with and this time was no exception. He started out talking about the speed of networking and what it means for network packet processing time. He showed a chart with network speed on the horizontal axis and packet processing time (in nsec) on the vertical axis. It was a log-log chart, and it showed per-packet time falling off in inverse proportion to link speed: at 10GbE a system has 67.2nsec to process a packet, at 40GbE it has 16.8nsec, and at 100GbE just 6.7nsec per packet. He was leading up to explaining why “storage datapaths are like network datapaths in hell”.
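Andy’s per-packet numbers are easy to reproduce: at line rate, a minimum-size Ethernet frame takes 84 bytes on the wire (64B frame plus preamble and inter-frame gap), so the time budget is simply 84×8 bits divided by link speed. A quick sketch (my arithmetic, assuming minimum-size frames):

```python
# Time budget per minimum-size Ethernet frame at various link speeds.
# 64B frame + 8B preamble + 12B inter-frame gap = 84 bytes on the wire.
WIRE_BYTES = 84

for gbps in (10, 40, 100):
    ns_per_packet = WIRE_BYTES * 8 / gbps   # bits / (Gb/s) == nanoseconds
    print(f"{gbps:>3} GbE: {ns_per_packet:.1f} ns to process each packet")
# 10 GbE: 67.2 ns, 40 GbE: 16.8 ns, 100 GbE: 6.7 ns -- matching Andy's chart
```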

Similar performance dynamics are impacting storage device processing. In this case, NVMe PCIe flash devices are becoming processing bound.

Andy showed a chart for 4K random reads, plotting the number of cores on the horizontal axis against K-IOPS on the vertical axis. At about 4 cores with one Intel P3700 PCIe NVMe card, the IOPS performance of the storage system (as measured by NIC throughput) flattened out and stayed flat even after doubling the number of cores. It turns out that with just one Intel P3700 NVMe PCIe flash card and 4 Xeon cores, one can quickly max out the IO one can push across a 40GbE network, even though there’s plenty of networking bandwidth still available. Of course, this situation becomes much worse with the new XPoint NVM, which is 1000X faster than NAND and coming out next year from Micron-Intel (subject for a future post, as Intel was another SFD8 presenter).

Andy also made the point that as a component of a system increases in cost, software usually tries to improve its utilization. This dynamic is now occurring for PCIe flash cards, which generally make up about 50% of the cost of a storage controller complex.

Location, location, location, …

Net net, (as with the network forwarding decision) the time available for storage data transfers is shrinking while data placement is taking longer. By that I think he means that determining where to place data in the storage hierarchy is becoming more complex, taking more processing cycles, just when we have less time to make those decisions.

So the crux of the question is how do we make those decisions better. Coho Data has attacked this problem by implementing HLLs to better identify application working sets.

Last year (see the prior post for more info on HLLs), Coho Data had just started working with HLL technology and hadn’t fully implemented their working set analytics. But this year, Andy displayed an On-Stream reporting service treemap chart (where rectangle size indicates the relative size of a parameter) showing an application’s cache working set size.

Using working set history to improve IO

By using a time series of properly implemented HLLs together with snapshots of working set block information, Coho Data can tell how the working set changes over time for an application or VM. Andy showed an example of an application’s working set size changing over the course of multiple days; each evening there was a giant spike in working set size, which turned out to be backup scans.
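For a concrete picture of what an HLL buys you here, below is a minimal toy sketch (my own code, assuming nothing about Coho Data’s actual implementation) that keeps one small HyperLogLog per time window and uses its distinct-block estimate as the working set size; a backup scan shows up as exactly the kind of spike Andy described:

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog: estimates the number of distinct items (e.g. block addresses) seen."""
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # 2^p registers; std error ~1.04/sqrt(m) ≈ 1.6%
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        h = int.from_bytes(hashlib.blake2b(str(item).encode(), digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)             # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros in the remainder + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        e = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:      # small-range (linear counting) correction
            e = self.m * math.log(self.m / zeros)
        return int(e)

# One HLL per time window: the estimate is the number of distinct blocks touched,
# i.e. the cache working set size for that window. Repeated IO to the same block
# doesn't inflate it, which is exactly why HLLs work for working set estimation.
working_set_per_hour = []
for hour in range(24):
    hll = HyperLogLog()
    blocks = 200_000 if hour == 22 else 20_000    # hour 22 mimics the backup-scan spike
    for _ in range(3):                            # re-reading the same blocks three times...
        for lba in range(blocks):
            hll.add(lba)                          # ...still counts each block only once
    working_set_per_hour.append(hll.estimate())

print(working_set_per_hour)   # ~20K every hour, with a ~200K spike at hour 22
```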

So Coho Data could go back and snapshot the working set information before and after the spike to see if it was different. Once it was determined to be different, they could go further and, after the backup scan, re-apply the working set cache data from before the spike (not sure if this is implemented just yet) to re-warm the workload data cache. Of course, this means the system would have to read all that data back into cache. But doing so would leave the application’s data placement optimized for upcoming IO activity.

This was just one example of what Coho Data could do to make a better data placement decision and improve the application’s IO performance. Neat stuff, if you ask me.

Can’t wait until next year to see what Coho Data is working on next.

Comments?

Next generation NVM, 3D XPoint from Intel + Micron

Earlier this week Intel-Micron announced (see webcast here and here) a new, transistor-less NVM with 1000 times the speed of NAND (~10ns access times vs. ~10µsec for NAND) and 10X the density of DRAM (currently 16Gb/DRAM chip). They call the new technology 3D XPoint™ (cross-point) NVM (non-volatile memory).

In addition to the speed and density advantages, 3D XPoint NVM also doesn’t have the endurance problems associated with today’s NAND. Intel and Micron say that it has 1000 times the endurance of today’s NAND (MLC NAND endurance is ~3000 write (P/E) cycles).

At 10X current DRAM density, it’s roughly equivalent to today’s MLC/TLC NAND capacities per chip. And at 1000 times the speed of NAND, it’s roughly equivalent in performance to DDR4 DRAM. Of course, because it’s non-volatile, it should take much less power than current DRAM technology, with no need for refresh.
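As a rough consistency check on those density claims (my arithmetic, using only the numbers quoted in this post):

```python
# 10X the density of a 16Gb DRAM chip implies roughly a 160Gb 3D XPoint die,
# which is in the same ballpark as today's 128-256Gb MLC/TLC NAND dies and as
# the 128Gb first-generation 3D XPoint chips mentioned below.
dram_chip_gb = 16          # Gb per DRAM chip today, per the announcement
density_advantage = 10     # claimed density advantage over DRAM

implied_xpoint_gb = dram_chip_gb * density_advantage
print(implied_xpoint_gb)   # 160 (Gb)
```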

We have talked about the end of NAND before (see The end of NAND is here, maybe). If this is truly more scalable than NAND, it seems to me that it does signal the end of NAND. It’s just a matter of time before endurance and/or density growth of NAND hits a wall, and then 3D XPoint can do everything NAND can do, but better, faster and more reliably.

3D XPoint technology

The technology uses a dual-layer design divided into columns; at the top and bottom of the columns are access connections laid out in orthogonal patterns that together form a grid addressing a single bit of memory. This also means that 3D XPoint NVM can be read and written a bit at a time (rather than a “page” at a time, as with NAND) and, unlike NAND, doesn’t have to be initialized to 0 before being written.

The 3D nature of the new NVM comes from the fact that you can build up as many layers of these structures as you want, to create more and more NVM cells. Each microscopic pillar between the two layers of wiring includes a memory cell and a switch component, which allow a bit of data to be selected (via the switch) and stored/read (via the memory cell). In the photo above, the yellow material is the switch and the green material is the memory cell.
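To make the bit-addressable grid easier to picture, here is a toy model (my own sketch, not Intel-Micron’s design) in which a single bit is selected by driving one top-layer and one bottom-layer conductor, so reads and writes happen one bit at a time with no prior erase:

```python
# Toy model of one cross-point layer: a bit lives at each crossing of a top-layer
# conductor (row) and a bottom-layer conductor (column). Illustrative only -- the
# real device selects a pillar electrically through its threshold switch.
class CrossPointLayer:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]   # one memory cell per pillar

    def write_bit(self, row, col, value):
        # Selecting one row and one column activates exactly one switch/cell pillar;
        # no page buffering or block erase is needed first (unlike NAND).
        self.cells[row][col] = value & 1

    def read_bit(self, row, col):
        return self.cells[row][col]

layer = CrossPointLayer(rows=4, cols=4)
layer.write_bit(2, 3, 1)       # write a single bit in place
print(layer.read_bit(2, 3))    # 1
```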

A memory cell operates by using a bulk property change of the material, unlike DRAM (which stores charge in capacitors) or NAND (which traps electrons in floating gates). As such, it uses all of the material to hold a memory value, which should allow 3D XPoint memory cells to scale downwards much better than NAND or DRAM.

Intel and Micron are calling the new 3D XPoint NVM both storage AND memory. That is, it’s suitable for fast-access, non-volatile data storage and for non-volatile processor memory.

3D XPoint NVM chips in manufacturing today

The first chips with the new technology are being manufactured today at Intel-Micron’s joint fab in Idaho. These first chips will supply 128Gb of NVM and use just two layers of 3D XPoint memory.

Intel and Micron will independently produce system products (read: SSDs or NVM memory devices) with the new technology during 2016. They mentioned during the webcast that the technology is expected to attach to a PCIe bus (as SSDs) and use NVMe as the interface for reads and writes, although if it’s used in a memory application it might be better attached to the processor memory bus.

The expectation is that the 3D XPoint cost/bit will be somewhere in between NAND and DRAM, i.e. more expensive than NAND but less expensive than DRAM. It’s nice to be the only companies in the world with a new, better storage AND memory technology.

~~~~

Over the last 10 years or so, SSDs (solid state devices) all used NAND technologies of one form or another, but after today SSDs can be made from NAND or 3D XPoint technology.

Some expected uses for the new NVM are in gaming applications (currently storage-speed and memory constrained) and in-memory databases (which are memory-size constrained). There was mention on the webcast of edge analytics as well.

Welcome to the dawn of a new age of computer storage AND memory.

Photo Credits: (c) 2015 Intel and Micron, from Intel’s 3D XPoint website