Where should IoT data be processed – part 2

I wrote a post a while back on Where IOT data should be processed – part 1. We will get back to that post in a moment, but recently I read an article (How big data forced the hunt for ET intelligence to evolve) that mentioned after 20 years, they were shutting down SETI@home.

SETI@home was a crowdsourced computational network that took snippets of radio spectrum, sent them to 1000s of home computers to be analyzed during idle computer time, once processed the analysis was sent back to SETI@home. It was one of the first to use a crowdsourced approach to perform data processing. The data was collected at a radio telescope, sent to SETI@home and distributed from there.

6 Factors for IOT data processing

In my post I talked about 6 factors that should help determine where data is processed. Those 6 factors included

  • Data size which is a measure of the amount (GB, TB or PBs) of data that is being generated at an IOT node
  • Data pipe availability, which is all about the networking bandwidth that’s available at the IOT node. If we are talking some sort of low-bandwidth networking access then it probably makes sense to process the data more locally and send only results of processing up the stack.
  • Processing criticality which indicates how important is the processing of the data. If the processing could save a life then maybe it should be done as close as possible to where the data is generated. If the data processing is less critical it could perhaps be done at other nodes in an IOT network
  • Processing time and infrastructure cost which is all about what sort of computational resources are required to perform the processing and how much would it cost. If processing of the data is to undergo multiple passes or requires multi-core CPUs or GPUs, moving data off the IoT node and onto a more comprehensive server to process it, could make sense.
  • Compliance, governance and archive requirements, which discussed the potential need for all data to be available for regulatory audits and as such may need to be available at a central location anyway so why not perform processing there.
  • Data information funnel, which talked about the fact that an IoT network should be configured in layers and that each layer in the stack should probably be responsible for some portion of the data processing needed by the overall system, if nothing more than compressing the information before it is sent elsewhere.

Now that I review the list, the last, Data information funnel, factor really should be a function of the other factors rather than a separate factor.

In that blog post I promised to follow it up with some examples of the logic applied to real world problems. SETI is the first one I’ve seen in the literature

SETI’s IoT processing problem

Closeup front view of one antenna of the Allan Telescope Array, a radio telescope for combined radio astronomy and SETI (Search for Extraterrestrial Intelligence) research being built by the University of California at Berkeley, outside San Francisco. The first phase, consisting of 42 6 meter dish antennas like the one shown here, was completed in 2007. Eventually it will have 350 antennas. This type of antenna is called an offset Gregorian design. The incoming radio waves are reflected by the large parabolic dish onto a secondary concave parabolic reflector in front of the dish, and then into a feed horn. A metal shroud can be seen along the bottom of the secondary reflector which shields the antenna from ground noise. It covers the frequency range from 0.5 to 11.2 GHz.

The SETI researchers found that “The telescopes are now capable of producing so much data that it’s not possible to get that volume of data out to volunteers,” And “The discovery space is in these massive, massive data streams. And it’s just not efficient to distribute many terabits per second out to volunteers all over the world. It’s more efficient for that data processing to happen at the actual observatory.”

So they moved the data processing for the SETI IoT network from being distributed out to home computers throughout the world to being done at the (telescope) source where the data was originally generated.

This decision seems to rely on a couple of the factors above. Namely the pipe availability and data size factors. They had to move processing because no pipes existed to send Tb of data to 1000s of home computers. And finally, the processing time and infrastructure cost has come down so much, that it was just easier to do the processing onsite.

It doesn’t seem like processing criticality or compliance-governance-archive had any bearing on the decision.

So there’s the first example that seems to fit well into our data processing framework.

~~~~

We ought to be able to come up with a formula that uses all these factors and comes up to with a yes or no as to whether to process the data on the node or not.

Photo Credit(s)

Open source ASICs – Hardware vs. Software innovation (round 5)

A good friend of mine sent me an article yesterday (Produce your own physical chips for free, in the open) that announced a collaboration between Google, Skywater Technology Foundry and FOSSi (Free and open source silicon) Foundation that ultimately supplies a completely open source set of tools to create ASICs at 130nm node ASIC level. The last piece of this toolkit was an open source PDK (Process Design Kit) data that was produced by Google-Skywater technologies and their offer for free fab services to manufacture chips that were designed with the tool set.

Layout snapshots of 2D and 3D ICs designed in 130-nm process technology: (a) 2D IC (2D-130); (b) the top and bottom tiers of a 3D IC using macro-level partitioning (3D-MP-130); and (c) the top and bottom tiers of a 3D IC using pipeline-level partitioning (3D-PP-130). 

The industry and I have had a long term discussion in this blog and elsewhere about the superiority of hardware innovation vs. software innovation using commodity hardware (e.g., see TPU and hardware vs. software innovation (Round 3) and Hardware vs. software innovation – Round 4). Most of the tech industry believes that software innovation on commodity hardware is better than hardware innovation. We beg to differ and in our mind, it’s the combination of hardware AND software innovation that is remaking the world.

Much of this can be seen with smart phone technology. The smart phone would not be possible without significant hardware innovation and has supplied ubiquitous computing for the world. That is it has connected billions to the internet that had no connection before.

But historically, hardware innovation has been hard to do, took a long time, and costs a lot vs. software innovation with commodity hardware, which by definition, is easier to do, takes almost no time (with continuous innovation even less) and costs almost nothing, especially when using open source.

The one innovation that emerged over the last few decades to make new hardware creation easier, has been the FPGA. FPGAs allow for “programing” hardware logic in the lab (sometime in the field) rather than having it be set in silicon in the fab. The toolchains to support FPGA programming can be proprietary but some are also available in open source. For example, SymbiFlow (open source) takes in Verilog (IEEE standard hardware definition language) and converts it into a binary bit stream used to program most (Xilinx-7 and Lattice) FPGAs.

But this recent announcement makes the process to create ASICs completely open source and much easier and cheaper to do

ASICs design flow

Prior to this announcement, most PDKs were expensive and specific to a particular FAB and process node. With Google’s and Skywater’s release of open source PDK (on GitHub) data, designers and engineers now have a completely open source tool kit that is they have RTL (Register-transfer-level, hardware description logic) design tools, EDA tools and PDK data to create their own ASICs. And with this toolkit Skywater together with Google will manufacture ASICs for you, at no cost.

The FOSSi dial up talk (embedded in the announcement above) goes into much detail about the FPGA and ASIC tool chain. but prior to this announcement the PDK data which is used to help the RTL and EDA tools simulate, verify and determine the optimum layout for the hardware design was always proprietary.

Open source RTL tools have been available for years now starting with OpenCores, OpenRISC, RISC-V and now OpenPower. RISC-V and OpenPower include RTL to implement sophisticade instruction set CPUs. OpenRISC is RTL for a precursor to RISC-V and OpenCores supplies the RTL for a number of other (CPU) cores. But this is just a sample of the RTL that’s available in open source.

EDA Tools are also available in open source. The most recent incarnation would be the DARPA funded, OpenROAD project. OpenROAD will ultimately provide a completely open source EDA Tool set for electronic design. The first component of this is a set of EDA tools that convert RTL to GDS II (industry standard graphical design stream description of a IC chip componentry and layout). GDS II streams are used to create masks for IC fabrication.

And now with the open source Google-Skywater PDK data, one has a complete open source tool chain to create ASICs at the 130nm node level for the Skywater Fab in Minnesota.

A PDK contains a lot of data about the ASIC fabrication process including process design rules, analog and digital design cells and models, behavioral models for analog and digital design, extracted data for simulation and other supporting functionality.

The Google-Skywater Technologies open source PDK is Apache 2.0 Licensed. The PDK is used in the SKY130 process node, which includes 130nm technologies, high voltage support, 5 metal layers and one interconnect layer.

At the moment the PDK includes standard digital cell support (“nor” gates, “and” gates, flip flops, etc.) but over time they are planning to add analog cells, IO & periphery cells, analog RF as well a fully automated design rule checking, with SRAM/flash build spaces.

The PDK does include standard SRAM bit cells and in combination with OpenRAM project one can use SRAM cells to create SRAM memory for the ASIC.

Google-Skywater are going to be fabricating, for free, up to 40 ASIC designs starting in Fall of 2020 and then six months later, they will start fabricating ~40 ASICs ever 3 months.

However to qualify for free fabrication, your design has to be completely open source (located on GitHub). To submit your ASIC you need to send your public GitHub URL repository to efabless and they will perform verification processes on it. If it works, they will respond with an email that it was accepted. If more than 40 designs are submitted for a run, the Google-Skywater team will decide on which 40 will be manufactured

The 16mm**2 ASIC automatically comes with a RISC-V CPU, RAM and power plus ~40 IOs. There is another 10mm**2 space for all of your ASIC specific logic. If successful, you will get back ~100 to 400 packaged chips.

~~~~

ASICs were always lengthy and costly to design and then fabrication took more money and time, before you got anything back to test. With Open source tool kits, design should no longer cost anything but engineering time and with the sophistication available is todays toolchain, should not be that lengthy. And if your one of the lucky 40 designs, ASIC fabrication is free. And then starting next year fabrication runs will occur every 3 months. So you could potentially get your design back in an ASIC in as little as 3 months.

And while the 130nm technology node dates back to 2001-2003, there were plenty of sophisticated ASICss made during those years (at a previous job, we did a couple ourselves). And of course, with your very own RISC-V CPU inside, you could pretty much do anything you want with your ASIC. Yeah RAM, SRAM and other constraints may limit you, but that’s what hardware innovation is all about, deal with the physical constraints but open up a whole new architectural world.

Welcome to the a new era of ASIC (hardware) innovation.

Photo Credit(s):

Software defined power grid

Read an article this past week in IEEE Spectrum (The Software Defined Power Grid is here) about a company that has been implementing software defined power grids throughout USA and the world to better integrate and utilize renewable energy alongside conventional power generation equipment.

Moreover, within the last year or so, Tesla has installed a Virtual Power Plant (VPP) using residential solar and grid scale batteries to better manage the electrical grid of South Australia (see Tesla’s Australian VPP propped up grid during coal outage). VPP use to offset power outages would necessitate something like a software defined power grid.

Software defined power grid

Not sure if there’s a real definition somewhere but from our perspective, a software defined power grid is one where power generation and control is all done through the use of programatic automation. The human operator still exists to monitor and override when something goes wrong but they are not involved in the moment to moment control of which power is saved vs. fed into the grid.

About a decade ago, we wrote a post about smart power meters (Smart metering’s data storage appetite) discussing the implementation of smart meters for home owners that had some capabilities to help monitor and control power use. But although that technology still exists, the software defined power grid has moved on.

The IEEE Spectrum article talks about a phasor measurement units (PMUs) that are already installed throughout most power grids. It turns out that most PMUs are capable of transmitting phasor power status at 60 times a second granularity and each status report is time stamped with high accuracy, GPS synchronized time.

On the other hand, most power grids today use SCADAs (supervisory control and data acquisition) to monitor and manage the power grid. But SCADAs only send data every 2-4 seconds. PMU’s are also installed in most power grids, but their information is not as important as SCADA to the monitoring, management and control of most (non-software defined) power grids.

One software defined power grid

PXiSE, the company in the IEEE Spectrum article, implemented their first demonstration project in Hawaii. That power grid had reached the limit of wind and solar power that it could support with human management. The company took their time and implemented a digital simulation of the power grid. But with the simulation in hand, battery storage and a off the shelf PC, the company was able to manage the grids power generation mix in real time with complete automation.

After that success, the company next turned to a micro-grid (building level power) with electronic vehicles, battery and solar power. Their software defined power grid reduced peak electricity demand within the building, saving significant money. With that success the company took their software defined power grid on the road to South Korea, Chile, Mexico and a number of other locations the world.

Tesla’s VPP

The Tesla VPP in South Australia, is planned to consists of up to 50K houses with solar PV panels and 13.5Kwh of batteries, able to deliver up to 250Mw of power generation and 650Mwh of power storage.

At the present time, the system has ~1000 house systems installed but even with that limited generation and storage capability it has already been called upon at least twice to compensate for coal generation power outage. To manage each and every household, they’d need something akin to the smart meters mentioned above in conjunction with a plethora of PMUs.

Puerto Rico’s power grid problems and solutions

There was an article not so long ago about the disruption to Puerto Rico’s power grid caused by Hurricanes Irma and Maria in IEEE Spectrum (Rebuilding Puerto Rico’s Power Grid: The Inside Story) and a subsequent article on making Puerto Rico’s power grid more resilient to hurricanes and other natural disasters (How to harden Puerto Rico’s power grid). The later article talked about creating micro grids, community PV and battery storage that could be disconnected from the main grid in times of disaster but also used to distribute power generation throughout the island.

Although the researchers didn’t call for the software defined power grid, it is our understanding that something similar would be an outstanding addition to their work there.

~~~~

As the use of renewables goes up and the price of batteries decreases while their capabilities go up over time, more and more power grids will need to become software defined. In the end, more software defined power grids with increasing renewables power generation and storage will make any power grid, more resilient and more fault tolerant.

Photo Credit(s):

Hybrid digital training-analog inferencing AI

Read an article from IBM Research, Iso-accuracy DL inferencing with in-memory computing, the other day that referred to an article in Nature, Accurate DNN inferencing using computational PCM (phase change memory or memresistive technology) which discussed using a hybrid digital-analog computational approach to DNN (deep neural network) training-inferencing AI systems. It’s important to note that the PCM device is both a storage device and a computational device, thus performing two functions in one circuit.

In the past, we have seenPCM circuitry used in neuromorphic AI. The use of PCM here is not that (see our Are neuromorphic chips a dead end? post).

Hybrid digital-analog AI has the potential to be more energy efficient and use a smaller footprint than digital AI alone. Presumably, the new approach is focused on edge devices for IoT and other energy or space limited AI deployments.

Whats different in Hybrid digital-analog AI

As researchers began examining the use of analog circuitry for use in AI deployments, the nature of analog technology led to inaccuracy and under performance in DNN inferencing. This was because of the “non-idealities” of analog circuitry. In other words, analog electronics has some intrinsic capabilities that induce some difficulties when modeling digital logic and digital exactitude is difficult to implement precisely in analog circuitry.

The caption for Figure 1 in the article runs to great length but to summarize (a) is the DNN model for an image classification DNN with fewer inputs and outputs so that it can ultimately fit on a PCM array of 512×512; (b) shows how noise is injected during the forward propagation phase of the DNN training and how the DNN weights are flattened into a 2D matrix and are programmed into the PCM device using differential conductance with additional normalization circuitry

As a result, the researchers had to come up with some slight modifications to the typical DNN training and inferencing process to improve analog PCM inferencing. Those changes involve:

  • Injecting noise during DNN neural network training, so that the resultant DNN model becomes more noise resistant;
  • Flattening the resultant DNN model from 3D to 2D so that neural network node weights can be implementing as differential conductance in the analog PCM circuitry.
  • Normalizing the internal DNN layer outputs before input to the next layer in the model

Analog devices are intrinsically more noisy than digital devices, so DNN noise sensitivity had to be reduced. During normal DNN training there is both forward pass of inputs to generate outputs and a backward propagation pass (to adjust node weights) to fit the model to the required outputs. The researchers found that by injecting noise during the forward pass they were able to create a more noise resistant DNN.

Differential conductance uses the difference between the conductance of two circuits. So a single node weight is mapped to two different circuit conductance values in the PCM device. By using differential conductance, the PCM devices inherent noisiness can be reduced from the DNN node propagation.

In addition, each layer’s outputs are normalized via additional circuitry before being used as input for the next layer in the model. This has the affect of counteracting PCM circuitry drift over time (see below).

Hybrid AI results

The researchers modeled their new approach and also performed some physical testing of a digital-analog DNN. Using CIFAR-10 image data and the ResNet-32 DNN model. The process began with an already trained DNN which was then retrained while injecting noise during forward pass processing. The resultant DNN was then modeled and programed into a PCM circuit for implementation testing.

Part D of Figure 4 shows the Baseline which represents a completely digital implementation using FP32 multiplication logic; Experiment which represents the actual use of the PCM device with a global drift calibration performed on each layer before inferencing; Mode which represents theira digital model of the PCM device and its expected accuracy. Blue band is one standard-deviation on the modeled result.

One challenge with any memristive device is that over time its functionality can drift. The researchers implemented a global drift calibration or normalization circuitry to counteract this. One can see evidence of drift in experimental results between ~20sec and ~60 seconds into testing. During this interval PCM inferencing accuracy dropped from 93.8% to 93.2% but then stayed there for the remainder of the experiment (~28 hrs). The baseline noted in the chart used digital FP32 arithmetic for infererenci and achieved ~93.9% for the duration of the test.

Certainly not as accurate as the baseline all digital implementation, but implementing DNN inferencing model in PCM and only losing 0.7% accuracy seems more than offset by the clear gain in energy and footprint reduction.

While the simplistic global drift calibration (GDC) worked fairly well during testing, the researchers developed another adaptive (batch normalization statistical [AdaBS]) approach, using a calibration image set (from the training data) and at idle times, feed these through the PCM device to calculate an average error used to adjust the PCM circuitry. As modeled and tested, the AdaBS approach increased accuracy and retained (at least modeling showed) accuracy over longer time frames.

The researchers were also able to show that implementing part (first and last layers) of the DNN model in digital FP32 and the rest in PCM improved inferencing accuracy even more.

~~~~

As shown above, a hybrid digital-analog PCM AI deployment can provide similar accuracy (at least for CIFAR-10/ResNet-24 image recognition) to an all digital DNN model but due to the efficiencies of the PCM analog circuitry allowed for a more energy efficient DNN deployment.

Photo Credit(s):