New GraphCore GC2 chips with 2PFlops performance in a Dell Server

I was at the Dell EMC Analyst Summit this past week, and at the show they had a series of sessions describing some of Dell’s venture capital investments. One of the sessions was about GraphCore, a UK design firm that’s working on a new AI chip.

Their new GC2 chip is now out and available for customers to use. The new chip offers unprecedented performance for AI NN computations.

Hardware

GraphCore’s new Colossus GC2 chip holds 1216 IPU-Cores™. Each IPU runs at 100GFlops and is capable of running 7 threads. The GC2 chip supports 300MB of memory, with an aggregate memory bandwidth of 30TB/s. Each IPU supports low-precision floating-point arithmetic with fully parallel/concurrent execution. The GC2 chip has 23.6B transistors.
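To put the per-chip numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above; the decimal units (1MB = 1000KB) and the even split of memory and bandwidth across IPUs are my assumptions, not something GraphCore stated, and the function name is just for illustration.

```python
# Per-chip figures quoted above for the Colossus GC2.
IPUS_PER_CHIP = 1216
GFLOPS_PER_IPU = 100       # quoted per-IPU rate
THREADS_PER_IPU = 7
CHIP_MEMORY_MB = 300
CHIP_MEM_BW_TB_S = 30      # aggregate memory bandwidth, TB/s


def gc2_chip_summary():
    """Derive per-chip and per-IPU figures from the quoted specs.

    Assumes decimal units and an even split across IPUs (my assumption).
    """
    return {
        "chip_tflops": IPUS_PER_CHIP * GFLOPS_PER_IPU / 1000,
        "hw_threads": IPUS_PER_CHIP * THREADS_PER_IPU,
        "memory_kb_per_ipu": CHIP_MEMORY_MB * 1000 / IPUS_PER_CHIP,
        "mem_bw_gb_s_per_ipu": CHIP_MEM_BW_TB_S * 1000 / IPUS_PER_CHIP,
    }


if __name__ == "__main__":
    for name, value in gc2_chip_summary().items():
        print(f"{name}: {value:,.1f}")
```

Under those assumptions, a single chip works out to roughly 121.6TFlops and 8,512 hardware threads, with about 247KB of memory per IPU.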

Each GC2 chip supports 80 IPU-Links™ to connect to other GC2 chips, with 2.5Tbps of chip-to-chip bandwidth. Further, the chip includes a PCIe Gen 4 x16 link (31.5GB/s) to host processors. And each chip supports up to 8TB/s of IPU-Exchange™ on-chip bandwidth for IPU-to-IPU communications.
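The link speeds above are quoted in a mix of bits and bytes, so a quick conversion helps when comparing them. The sketch below assumes decimal units and uses the standard PCIe Gen 4 parameters (16GT/s per lane, 128b/130b encoding); the function names are mine, for illustration only.

```python
def tbps_to_gb_per_s(tbps):
    """Convert terabits per second to gigabytes per second (decimal units)."""
    return tbps * 1000 / 8


def pcie_gen4_gb_per_s(lanes=16):
    """Usable PCIe Gen 4 bandwidth in GB/s, one direction.

    Gen 4 signals at 16 GT/s per lane with 128b/130b line encoding.
    """
    gigatransfers_per_s = 16
    encoding_efficiency = 128 / 130
    return gigatransfers_per_s * lanes * encoding_efficiency / 8


if __name__ == "__main__":
    ipu_links = tbps_to_gb_per_s(2.5)   # chip-to-chip bandwidth quoted above
    pcie = pcie_gen4_gb_per_s(16)       # host link quoted above
    print(f"IPU-Links: {ipu_links:.1f} GB/s")
    print(f"PCIe Gen 4 x16: {pcie:.1f} GB/s")
    print(f"Ratio: {ipu_links / pcie:.1f}x")
```

By this arithmetic the 2.5Tbps of IPU-Link bandwidth comes to about 312.5GB/s, or roughly 10X the PCIe Gen 4 x16 host link, and the PCIe calculation reproduces the 31.5GB/s figure quoted above.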

The GC2 chip is available on a PCIe accelerator board that includes 2 GC2 chips. It’s also available in a Dell server configuration with 8 of their PCIe accelerator boards. In the server, with 2 GC2 chips per board (16 chips in total), it has ~19.5K IPUs with ~2.0PFlops of IPU processing power in total.
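The server totals follow directly from the per-chip figures. Here is a minimal sketch of the arithmetic, using only the numbers quoted in this post (the function name is hypothetical):

```python
# Figures quoted in this post.
BOARDS_PER_SERVER = 8
CHIPS_PER_BOARD = 2
IPUS_PER_CHIP = 1216
GFLOPS_PER_IPU = 100


def server_totals():
    """Return (chips, IPUs, PFlops) for the Dell server configuration."""
    chips = BOARDS_PER_SERVER * CHIPS_PER_BOARD
    ipus = chips * IPUS_PER_CHIP
    pflops = ipus * GFLOPS_PER_IPU / 1_000_000  # GFlops -> PFlops
    return chips, ipus, pflops


if __name__ == "__main__":
    chips, ipus, pflops = server_totals()
    print(f"{chips} chips, {ipus:,} IPUs, {pflops:.2f} PFlops")
```

That gives 16 chips, 19,456 IPUs and about 1.95PFlops, which rounds to the ~19.5K IPUs and ~2.0PFlops above.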

Software

GC2 IPUs support GraphCore’s Poplar® software and APIs, which allow users to code in many of their favorite AI frameworks, such as PyTorch and TensorFlow.

At the NIPS 2017 conference, GraphCore showed some ResNet-50, DeepBench LSTM RNN, and DeepVoice WaveNet performance benchmark results for their GC2 accelerator cards.

The chart above shows DeepBench LSTM RNN runs comparing their GC2 accelerator card against an Nvidia P100 GPU board (longer is better).

DeepBench is a set of workloads that mimic typical deep neural net operations and is used to compare NN hardware systems. The chart above compares DeepBench RNN inference operations on the GC2 accelerator card vs. Nvidia P100 cards at three response-time levels (<2msec, <5msec, and <7msec).

As can be seen in the chart, the GraphCore GC2 accelerator card performed significantly better (from 182X to 242X) than the Nvidia accelerator card when executing NN inferencing at <5msec and <7msec latency. It was also able to perform ~42K inferences at <2msec latency, a level the Nvidia P100 was unable to reach at all.

~~~~
The GC2 chip, accelerator card and Dell EMC servers that run them look to be a significant advance in AI NN computations. We didn’t see any technical specs for the server, but we assume it comes in a 4U configuration and uses less power than 8 GPUs.

However, at the moment, the servers are sold out. We have no information on the GC2 accelerator cards, but our guess is that they are sold out as well, and probably ditto for the chips. Dell didn’t quote us any pricing on the servers, so it’s hard to know whether we could afford one, even if they weren’t sold out.

Who wouldn’t want to own a 4U server with 2PFlops performance for their AI apps?

Comments?

Photo Credit(s): Photos taken during Dell EMC Analyst Summit GraphCore presentation

Photos from GraphCore NIPS 2017 presentations