October 2016 – Silverton Consulting

Docker presents at Cloud Field Day 1 (CFD1)

Posted on October 25, 2016 by Ray in Cloud services, Distributed computing, Strategic Inflection Points, System effectiveness

Earlier this summer, Docker presented at Cloud Field Day 1 (CFD1) on some of their current technology and upcoming enhancements. (See the video’s here).

As you probably recall, Docker is an implementation of Linux containers which is a way of packaging applications into micro-services that can be built, ship and run across onprem, private and public cloud infrastructure.

Docker containers and Docker Engine

Docker containers combine a base OS image, plus whatever other binaries are needed to run a micro-service into a container which runs ontop of a Docker Engine. Containers can then be run as a single instance or multiple instances on a Docker Engine.

Containers are not VMs, they have a fundamentally different architecture. For instance,

A VM includes a full OS and App software, it often takes several minutes to boot up and there is a hypervisor underneath it that emulates hardware and other critical services needed to run a VM. But there is no underlying standard OS under the VM layer.
A Docker container relies on shared OS resources, which allows for a lighter weight application package using shared resources, which means that instantiation/booting up is much faster, there is no Hypervisor, but a container can run under Linux, Windows or Mac OSs, and containers provide for full stack portability.

In the Docker Hub (srepository for Docker containers) one can find a WordPress container that contains the whole LAMP + WordPress stack in a single container. To run WordPress you would also need a MySQL or compatible database and there’s a MySQL machine container that can be used. You could easily run both the WordPress/LAMP container and the MySQL container in the same Docker Engine, connect the two together and connect the LAMP+Wordpress container to the Internet to fire up a WordPress blog site.

Docker compared VMs to houses and containers to apartments. Docker Engines can run as a VM or on bare metal hardware.

Running Docker containers on desktop, servers and in the cloud

If you want to experiment with Docker, you can download Docker for Mac or Docker for Windows which can be used install and run a native Docker engine on your desktop.

Windows Server also supports native Docker containers. In VMware one can run Docker containers under vSphere Integrated Containers which supplies Docker API endpoints as standard ESX VMs or you can run Docker containers under Project Photon which is a streamlined, non-ESX hypervisor that also supplies Docker API endpoints.

You can run Docker containers in AWS and Azure as well that integrates with each public cloud’s compute, network and storage services.

Docker Swarm

So you have your Docker engine running, with multiple containers sharing resources and to create an application but your out of compute, storage or networking power on your engine and need to bring on another server or two. What do you do? With Docker 1.12, you can now use Docker Swarm, which supports multiple Docker Engines.

With Docker Swarm, you have management nodes and worker nodes. Management nodes provide HA services for Docker containers which runs across multiple worker nodes. Worker nodes run Docker Engines with multiple containers.

Docker Swarms orchestrates the operation of multiple Docker Engines running Docker Services.

A Docker Service is a Docker container running across multiple worker nodes (engines) in a Docker Swarm. Docker services can be run globally (across each worker node) or replicated (some number of Docker Container instances are run across one or more worker nodes). You specify on the Docker Service command which you want and Swarm will insure that the specifications selected are implemented across its worker nodes.

If a worker node goes down, Swarm will detect it and re-start the failed container instances on other worker nodes in the Swarm. Beware, if your container relied on persistent storage, that storage must be also available to all Swarm worker nodes.

Swarm provides a Routing Mesh. When you fire up a container service you can identify a swarm-wide ingress port for a container. Every worker node will listen in on that port to provide a container-aware routing service to route app requests across the Swarm to wherever the containers are currently running.

You can have multiple Swarm management nodes which share the management of the Swarm. Swarm management nodes are either leaders or followers and provide a RAFT consensus model. If the leader node goes down, another management node will take on its leadership role and start managing the Swarm.

There are many other technologies underneath Docker Swarm that are worth a look but suffice it to say it provides a load-balancing, HA service for container execution across multiple engines.

Docker Datacenter

What could possibly be missing? We have Docker Engines that can run multiple containers and Docker Swarms that can run multiple Docker Engines and containers in an HA manner. But we really need something that supports multiple Docker Swarms, and throw in a private secure Container repository and enterprise support options while you’re at it.

Earlier this year Docker introduced Docker Datacenter, a priced service offering which does just that. It provides Containers-as-a-Service (CaaS) across multiple Docker Swarms that has commercial support options, a Docker Trusted Repository and integrates it all with enterprise services like LDAP/AD to provide audit logs and other monitoring capabilities for container services execution.

Using Docker Datacenter, developers can have their own multiple development swarms to support engineering activities and ship and store their container images in a secure, private repository and operations can have multiple Swarms which all run the same Docker Container apps in an HA manner.

From an app developer standpoint, it all looks like container instances are running in the same Docker Engine environment across all those implementations. Operations sees a centralized management console (plane) that provides a way to monitor and manage multiple Docker Swarms running everywhere.

Well that’s about it for the update on Docker. There wasn’t much at the sessions on how containers access persistent storage but there’s a Flocker service that offers plugin support for EMC, NetApp and other enterprise SAN storage for Container apps. And there seem to be others out there and available.

You can read/hear more about Docker from these other CFD1 participants:

The point of Docker is more than containers by Justin Warren (@JPWarren)
Docker on Cloudcast (podcast) by Nigel Moulton (@NigelPoulton)

Comments

Full disclosure: Docker gave us a very nice/very long scarf, and two t-shirts decorated with Docker logo and tagline and a number of stickers and pins.

QoM1610: Will NVMe over Fabric GA in enterprise AFA by Oct’2017

Posted on October 16, 2016 by Ray in Block Storage, CIFS/SMB, Ethernet, FC, FCoE, File Storage, Forecasting, Infiniband, iSCSI, Networking, NFS, NVMe, NVMe storage, Omni-Path, QoM 2016, RDMA, RoCE, SSD storage, Storage performance, Strategic Inflection Points

NVMe over fabric (NVMeoF) was a hot topic at Flash Memory Summit last August. Facebook and others were showing off their JBOF (see my Facebook moving to JBOF post) but there were plenty of other NVMeoF offerings at the show.

NVMeoF hardware availability

When Brocade announced their Gen6 Switches they made a point of saying that both their Gen5 and Gen6 switches currently support NVMeoF protocols. In addition to Brocade’s support, in Dec 2015 Qlogic announced support for NVMeoF for select HBAs. Also, as of July 2016, Emulex announced support for NVMeoF in their HBAs.

From an Ethernet perspective, Qlogic has a NVMe Direct NIC which supports NVMe protocol offload for iSCSI. But even without NVMe Direct, Ethernet 40GbE & 100GbE with iWARP, RoCEv1-v2, iSCSI SER, or iSCSI RDMA all could readily support NVMeoF on Ethernet. The nice thing about NVMeoF for Ethernet is not only do you get support for iSCSI & FCoE, but CIFS/SMB and NFS as well.

InfiniBand and Omni-Path Architecture already support native RDMA, so they should already support NVMeoF.

So hardware/firmware is already available for any enterprise AFA customer to want NVMeoF for their data center storage.

NVMeoF Software

Intel claims that ~90% of the software driver functionality of NVMe is the same for NVMeoF. The primary differences between the two seem to be the NVMeoY discovery and queueing mechanisms.

There are two fabric methods that can be used to implement NVMeoF data and command transfers: capsule mode where NVMe commands and data are encapsulated in normal fabric packets or fabric dependent mode where drivers make use of native fabric memory transfer mechanisms (RDMA, …) to transfer commands and data.

A (Linux) host driver for NVMeoF is currently available from Seagate. And as a result, support for NVMeoF for Linux is currently under development, and not far from release in the next Kernel (I think). (Mellanox has a tutorial on how to compile a Linux kernel with NVMeoF driver support).

With Linux coming out, Microsoft Windows and VMware can’t be far behind. However, I could find nothing online, aside from base NVMe support, for either platform.

NVMeoF target support is another matter but with NICs/HBAs & switch hardware/firmware and drivers presently available, proprietary storage system target drivers are just a matter of time.

Boot support is a major concern. I could find no information on BIOS support for booting off of a NVMeoF AFA. Arguably, one may not need boot support for NVMeoF AFAs as they are probably not a viable target for storing App code or OS software.

From what I could tell, normal fabric multi-pathing support should work fine with NVMeoF. This should allow for HA NVMeoF storage, a critical requirement for enterprise AFA storage systems these days.

NVMeoF advantages/disadvantages

Chelsio and others have shown that NVMeoF adds ~8μsec of additional overhead beyond native NVMe SSDs, which if true would warrant implementation on all NVMe AFAs. This may or may not impact max IOPS depending on scale-ability of NVMeoF.

For instance, servers (PCIe bus hardware) typically limit the number of private NVMe SSDs to 255 or less. With an NVMeoF, one could potentially have 1000s of shared NVMe SSDs accessible to a single server. With this scale, one could have a single server attached to a scale-out NVMeoF AFA (cluster) that could supply ~4X the IOPS that a single server could perform using private NVMe storage.

Base level NVMe SSD support and protocol stacks are starting to be available for most flash vendors and operating systems such as, Linux, FreeBSD, VMware, Windows, and Solaris. If Intel’s claim of 90% common software between NVMe and NVMeoF drivers is true, then it should be a relatively easy development project to provide host NVMeoF drivers.

The need for special Ethernet hardware that supports RDMA may delay some storage vendors from implementing NVMeoF AFAs quickly. The lack of BIOS boot support may be a minor irritant in comparison.

NVMeoF forecast

AFA storage systems, as far as I can tell, are all about selling high IOPS and very-low latency IOs. It would seem that NVMeoF would offer early adopter AFA storage vendors a significant performance advantage over slower paced competition.

In previous QoM/QoW posts we have established that there are about 13 new enterprise storage systems that come out each year. Probably 80% of these will be AFA, given the current market environment.

Of the 10.4 AFA systems coming out over the next year, ~20% of these systems pride themselves on being the lowest latency solutions in the market, and thus command high margins. One would think these systems would be the first to adopt NVMeoF. But, most of these systems have their own, proprietary flash modules and do not use standard (NVMe) SSDs and can use their own proprietary interface to their proprietary flash storage. This will delay any implementation for them until they can convert their flash storage to NVMe which may take some time.

On the other hand, most (70%) of the other AFA systems, that currently use SAS/SATA SSDs, could boost their IOP counts and drastically reduce their IO response times, by implementing NVMe SSDs and NVMeoF. But converting SAS/SATA backends to NVMe will take time and effort.

But, there are a select few (~10%) of AFA systems, that already use NVMe SSDs in their AFAs, and for these few, they would seem to have a fast track towards implementing NVMeoF. The fact that NVMeoF is supported over all fabrics and all storage interface protocols make it even easier.

Moreover, NVMeoF has been under discussion since the summer of 2015, which tells me that astute AFA vendors have already had 18+ months to develop it. With NVMeoF host drivers & hardware available since Dec. 2015, means hardware and software exist to test and validate against.

I believe that NVMeoF will be GA’d within the next 12 months by at least one enterprise AFA system. So my QoM1610 forecast for NVMeoF is YES, with a 0.83 probability.

Comments?

Hitachi and the coming IoT gold rush

Posted on October 8, 2016October 8, 2016 by Ray in Artificial Intelligence, Cloud services, Cloud storage, Data analytics, Distributed computing, Energy efficiency, Information economy, Market dynamics, Object storage, R&D measures, Strategic Inflection Points, System effectiveness, Visionary leadershp

Earlier this week I attended Hitachi Summit 2016 along with a number of other analysts and Hitachi executives where Hitachi discussed their current and ongoing focus on the IoT (Internet of Things) business.

We have discussed IoT before (see QoM1608: The coming IoT tsunami or not, Extremely low power transistors … new IoT applications). Analysts and companies predict ~200B IoT devices by 2020 (my QoM prediction is 72.1B 0.7 probability). But in any case there’s a lot of IoT activity going to come online, very shortly. Hitachi is already active in IoT and if anything, wants it to grow, significantly.

Hitachi’s current IoT business

Hitachi is uniquely positioned to take on the IoT business over the coming decades, having a number of current businesses in industrial processes, transportation, energy production, water management, etc. Over time, all these industries and more are becoming much more data driven and smarter as IoT rolls out.

Some metrics indicating the scale of Hitachi’s current IoT business, include:

Hitachi is #79 in the Fortune Global 500;
Hitachi’s generated $5.4B (FY15) in IoT revenue;
Hitachi IoT R&D investment is $2.3B (over 3 years);
Hitachi has 15K customers Worldwide and 1400+ partners; and
Hitachi spends ~$3B in R&D annually and has 119K patents

Hitachi has been in the OT (Operational [industrial] Technology) business for over a century now. Hitachi has also had a very successful and ongoing IT business (Hitachi Data Systems) for decades now. Their main competitors in this IoT business are GE and Siemans but neither have the extensive history in IT that Hitachi has had. But both are working hard to catchup.

Hitachi Rail-as-a-Service

For one example of what Hitachi is doing in IoT, they have recently won a 27.5 year Rail-as-a-Service contract to upgrade, ticket, maintain and manage all new trains for UK Rail. This entails upgrading all train rolling stock, provide upgraded rail signaling, traffic management systems, depot and station equipment and ticketing services for all of UK Rail.

The success and profitability of this Hitachi service offering hinges on their ability to provide more cost efficient rail transport. A key capability they plan to deliver is predictive maintenance.

Today, in UK and most other major rail systems, train high availability is often supplied by using spare rolling stock, that’s pre-positioned and available to call into service, when needed. With Hitachi’s new predictive maintenance capabilities, the plan is to reduce, if not totally eliminate the need for spare rolling stock inventory and keep the new trains running 7X24.

Hitachi said their new trains capture 48K data items and generate over ~25GB/train/day. All this data, will be fed into their new Hitachi Insight Group Lumada platform which includes Pentaho, HSDP (Hitachi Streaming Data Platform) and their Content Analytics to analyze train data and determine how best to keep the trains running. Behind all this analytical power will no doubt be HDS HCP object store used to keep track of all the train sensor data and other information, Hitachi UCP servers to process it all, and other Hitachi software and hardware to glue it all together.

The new trains and services will be rolled out over time, but there’s a pretty impressive time table. For instance, Hitachi will add 120 new high speed trains to UK Rail by 2018. About the only thing that Hitachi is not directly responsible for in this Rail-as-a-Service offering, is the communications network for the trains.

Hitachi other IoT offerings

Hitachi is actively seeking other customers for their Rail-as-a-service IoT service offering. But it doesn’t stop there, they would like to offer smart-water-as-a-service, smart-city-as-a-service, digital-energy-as-a-service, etc.

There’s almost nothing that Hitachi currently supplies as industrial products that they wouldn’t consider offering in an X-as-a-service solution. With HDS Lumada Analytics, HCP and HDS storage systems, Hitachi UCP converged infrastructure, Hitachi industrial products, and Hitachi consulting services, together they are primed to take over the IoT-industrial products/services market.

Welcome to the new Hitachi IoT world.

Comments?

Bar chart depicting IOPS/GB-NAND, #1 is Datacore Parallel Server with ~266 IOPS/GB-NAND,

SPC-1 IOPS performance per GB-NAND – chart of the month

Posted on October 1, 2016 by Ray in IOPS, SAS, SATA, SPC-1, SSD storage, Storage performance

The above is an updated chart from last months SCI newsletter StorInt™ SPC Performance Report depicting the top 10 SPC-1 submissions IOPS™ per GB-NAND. We have been searching for a while now how to depict storage system effectiveness when using SSD or other flash storage. We have used IOPS/SSD in the past but IOPS/GB-NAND looks better.

Calculating IOPS/GB-NAND

SPC-1 does not report this metric but it can be calculated by dividing IOPS by NAND storage capacity. One can find out NAND storage capacity by looking over SPC-1 full disclosure reports (FDR), totaling up the NAND storage in the configuration in all the SSDs and flash devices. This is total NAND capacity, not Total ASU (used storage) Capacity. GB-NAND reflects just what’s indicated for SSD/flash device capacity in the configuration section. This is not necessarily the device’s physical NAND capacity when over provisioned, but at least it’s available in the FDR.

DataCore Parallel Server IOPS/GB-NAND explained

The DataCore Parallel Server generated over 5M IOPS (IO’s/second) under an SPC-1 (OLTP-like) workload. And with their 54-480GB SSDs, totaling ~25.9TB of NAND capacity, it gives them just under 200 IOPS/GB-NAND. The chart in the original report was incorrect. There we used 36-480GB SSDs or ~17.3TB of NAND to compute IOPS/GB-NAND, which gave them just under 300 IOPS/GB-NAND in the report, which was incorrect. (The full report has been since corrected and is available for re-download for subscribers to our newsletter).

The 480GB (Samsung SM863 MZ-7KM480E)SSDs were all SATA attached. Samsung lists these SSDs as V-NAND, MLC drives, rated at 97K random Reads and 26K random writes. At over 5M IOPS, it should be running close to 100% of the SSDs rated performance. However, DataCore’s Parallel Server included 2 controllers with a total of 3TB of DRAM cache, which was then SAS connected to 4 DELL MD1220 storage arrays, each with 512GB of DRAM cache, so their total configuration had about 5TB of DRAM in it, most of which would have been used as a IO cache.

The SPC-1 submission only used 11.8TB (Total ASU capacity) of storage. All the DRAM cache help to explain how they attained 5M IOPS. Having a multi-tiered cache like DataCore-MD1220 configuration, doesn’t insure that all the cache is effectively used but even without cache tiering logic, there might not be much of an overlap between the MD1220 and Parallel Server caches. It would be more interesting to see how busy the SSDs were during this SPC-1 run.

How random the SPC-1 workload is, is subject to much speculation in the industry. Suffice it to say it’s not 100% random, but what is. Non-random OLTP workloads would tend to favor larger caches.

SPC is coming out with a new version of their benchmark with supplementary information which may shed more light on device busyness.

All SPC-1 benchmark submissions are available at storageperformance.org.

Want more?

The August 2016 and our other SPC Performance reports have much more information on SPC-1 and SPC-2 performance. Moreover, there’s a lot more performance information, covering email and other (OLTP and throughput intensive) block storage workloads, in our SAN Storage Buying Guide, available for purchase on our website. More information on file and block protocol/interface performance is included in SCI’s SAN-NAS Buying Guide, also available from our website .

~~~~

The complete SPC performance report went out in SCI’s August 2016 Storage Intelligence e-newsletter. A copy of the report will be posted on our SCI dispatches (posts) page over the next quarter or so (if all goes well). However, you can get the latest storage performance analysis now and subscribe to future free SCI Storage Intelligence e-newsletters, by just using the signup form in the sidebar or you can subscribe here.