TPU and hardware vs. software innovation (round 3)

At the Google I/O conference this week, Google revealed (see Google supercharges machine learning tasks …) that it has been designing and operating its own processor chips to optimize machine learning.

They call the new chip a Tensor Processing Unit (TPU). According to Google, the TPU delivers an order of magnitude better power efficiency for machine learning than what's achievable with off-the-shelf GPUs and CPUs. TensorFlow is Google's open sourced machine learning software.

This is very interesting, as Google and the rest of the hyper-scale hive seem to have latched onto open sourced software and commodity hardware for all their innovation. This has led the industry to believe that hardware customization and innovation is dead and that the only thing anyone needs is software developers. I believe this is incorrect and that hardware innovation combined with software innovation is a better way (see my Commodity hardware always loses and Better storage through hardware posts).

Historically, the hyper-scale hive has avoided proprietary or vendor-based storage systems, networking routers and switches, and proprietary operating systems. Instead, it has favored software defined storage using commodity server flash and disk, software defined networking using commodity switch hardware, and commodity servers and OSs based on Intel/AMD/ARM processors. In most cases, the hyper-scale hive developed its own software and then open sourced it to gain wider adoption, broader development and a deeper ecosystem.

But somewhere along the way, the light must have slowly dawned on Google's developers that maybe there was a better way: design your own processor, have it manufactured by a contract fab, and put it to work optimizing machine intelligence data flow and data processing. By doing all this, they could provide more deep learning for less.


A TPU is a custom ASIC tailored to execute TensorFlow workloads. Its efficiency comes from reducing computational precision, e.g., from 64-bit to 8-bit, to just what's needed for TensorFlow processing. This also means TPUs need fewer transistors per operation, presumably speeding up computation.
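To make the reduced precision idea concrete, here is a minimal Python sketch of a dot product computed with 8-bit integers instead of full-precision floats. Google hasn't disclosed how the TPU actually handles this, so the scheme below (a single shared scale factor per vector, a common textbook approach) and the names quantize and quantized_dot are my own assumptions for illustration, not the TPU's method.

```python
def quantize(values, bits=8):
    """Map floats to signed integers of the given width using one shared scale.

    Illustrative only: a simple symmetric scheme, not Google's disclosed design.
    """
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0       # avoid dividing by zero
    return [round(v / scale) for v in values], scale


def quantized_dot(a, b, bits=8):
    """Dot product using integer multiply-accumulate, rescaled once at the end."""
    qa, scale_a = quantize(a, bits)
    qb, scale_b = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer arithmetic only
    return acc * scale_a * scale_b             # convert back to a float


# Made-up example numbers, standing in for one neuron's weights and inputs.
weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.00, 0.25, -0.60, 0.50]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(f"full precision: {exact:.4f}  8-bit: {approx:.4f}")
```

The two results should land close to each other, which is the point: for many machine learning workloads a small loss of precision is tolerable, and narrow integer multipliers are much cheaper to build in silicon than wide floating point units.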

The TPU board (see above) fits into a drive slot in a server rack.

Normally, disk drives in off-the-shelf servers are SATA or SAS attached. But it's also possible these drive slots were meant for SSDs, which can be PCIe attached, and the latest generation of NVMe connectors is multi-protocol capable, that is, SAS or NVMe. If the TPUs are NVMe PCIe attached, they would have much higher bandwidth and faster access to memory. Also, I don't see any DIMMs on the TPU card, so presumably it depends on memory elsewhere in the server.

According to Google, the project has been ongoing for several years and TPUs have been in operation in their data centers for over a year. Moreover, Google mentioned that their TPUs were up and running in real applications within 22 days of testing silicon.

What TPUs are used for

Applications that currently depend on TPUs include RankBrain (the machine learning component of Google's search ranking), Street View (Google Maps), and AlphaGo (Google's world champion Go playing program). All of these either depend heavily on machine learning algorithms today or are in the process of moving that way. It seems that Google's mainstream applications (and some that are still in the lab) require more and more machine learning.

I believe the intent is to make TPU TensorFlow processing available to Google Cloud customers through their Cloud Machine Learning SaaS solution.

Google’s not alone

Apparently, other organizations are also using custom designed ASICs, FPGAs, GPUs, etc. to accelerate machine learning (for more on custom ASICs, see my posts on IBM's TrueNorth and PCM neuromorphic accelerators). Others are working with FPGA accelerators (see Microsoft's new FPGA) and GPU accelerators (see Facebook's openGPU), and at least one other has a wholly custom designed ASIC (see Nervana Systems) to be used in cloud based machine learning.

If anything, machine learning seems to be heating up hardware development and customization.

Google's also a member of the OpenPOWER Foundation, created by IBM and partners to build a wider ecosystem around customized processor chips based on IBM's POWER architecture. No word on whether the TPU is based on the POWER architecture, but it wouldn't surprise me if it was.

Hardware wins again, take that software innovation…

Photo Credit(s): Images from Google Cloud Platform Blog on the TPU