New era of graphical AI is near #AIFD2 @Intel

I attended AIFD2 ( videos of their sessions available here) a couple of weeks back and for the last session, Intel presented information on what they had been working on for new graphical optimized cores and a partner they have, called Katana Graph, which supports a highly optimized graphical analytics processing tool set using latest generation Xeon compute and Optane PMEM.

What’s so special about graphs

The challenges with graphical processing is that it’s nothing like standard 2D tables/images or 3D oriented data sets. It’s essentially a non-Euclidean data space that has nodes with edges that connect them.

But graphs are everywhere we look today, for instance, “friend” connection graphs, “terrorist” networks, page rank algorithms, drug impacts on biochemical pathways, cut points (single points of failure in networks or electrical grids), and of course optimized routing.

The challenge is that large graphs aren’t easily processed with standard scale up or scale out architectures. Part of this is that graphs are very sparse, one node could point to one other node or to millions. Due to this sparsity, standard data caching fetch logic (such as fetching everything adjacent to a memory request) and standardized vector processing (same instructions applied to data in sequence) don’t work very well at all. Also standard compute branch prediction logic doesn’t work. (Not sure why but apparently branching for graph processing depends more on data at the node or in the edge connecting nodes).

Intel talked about a new compute core they’ve been working on, which was was in response to a DARPA funded activity to speed up graphical processing and activities 1000X over current CPU/GPU hardware capabilities.

Intel presented on their PIUMA core technology was also described in a 2020 research paper (Programmable Integrated and Unified Memory Architecture) and YouTube video (Programmable Unified Memory Architecture).

Intel’s PIUMA Technology

DARPA’s goals became public in 2017 and described their Hierarchical Identity Verify Exploit (HIVE) architecture. HIVE is DOD’s description of a graphical analytics processor and is a multi-institutional initiative to speed up graphical processing. .

Intel PIUMA cores come with a multitude of 64-bit RISC processor pipelines with a global (shared) address space, memory and network interfaces that are optimized for 8 byte data transfers, a (globally addressed) scratchpad memory and an offload engine for common operations like scatter/gather memory access.

Each multi-thread PIUMA core has a set of instruction caches, small data caches and register files to support each thread (pipeline) in execution. And a PIUMA core has a number of multi-thread cores that are connected together.

PIUMA cores are optimized for TTEPS (Tera-Traversed Edges Per Second) and attempt to balance IO, memory and compute for graphical activities. PIUMA multi-thread cores are tied together into (completely connected) clique into a tile, multiple tiles are connected within a single node and multiple nodes are tied together with a 8 byte transfer optimized network into a PIUMA system.

P[I]UMA (labeled PUMA in the video) multi-thread cores apparently eschew extensive data and instruction caching to focus on creating a large number of relatively simple cores, that can process a multitude of threads at the same time. Most of these threads will be waiting on memory, so the more threads executing, the less likely that whole pipeline will need to be idle, and hopefully the more processing speedup can result.

Performance of P[I]UMA architecture vs. a standard Xeon compute architecture on graphical analytics and other graph oriented tasks were simulated with some results presented below.

Simulated speedup for a single node with P[I]UMAtechnology vs. Xeon range anywhere from 3.1x to 279x and depends on the amount of computation required at each node (or edge). (Intel saw no speedups between a single Xeon node and multiple Xeon Nodes, so the speedup results for 16 P[I]UMA nodes was 16X a single P[I]UMA node).

Having a global address space across all PIUMA nodes in a system is pretty impressive. We guess this is intrinsic to their (large) graph processing performance and is dependent on their use of photonics HyperX networking between nodes for low latency, small (8 byte) data access.

Katana Graph software

Another part of Intel’s session at AIFD2 was on their partnership with Katana Graph, a scale out graph analytics software provider. Katana Graph can take advantage of ubiquitous Xeon compute and Optane PMEM to speed up and scale-out graph processing. Katana Graph uses Intel’s oneAPI.

Katana graph is architected to support some of the largest graphs around. They tested it with the WDC12 web data commons 2012 page crawl with 3.5B nodes (pages) and 128B connections (links) between nodes.

Katana runs on AWS, Azure, GCP hyperscaler environment as well as on prem and can scale up to 256 systems.

Katana Graph performance results for Graph Neural Networks (GNNs) is shown below. GNNs are similar to AI/ML/DL CNNs but use graphical data rather than images. One can take a graph and reduce (convolute) and summarize segments to classify them. Moreover, GNNs can be used to understand whether two nodes are connected and whether two (sub)graphs are equivalent/similar.

In addition to GNNs, Katana Graph supports Graph Transformer Networks (GTNs) which can analyze meta paths within a larger, heterogeneous graph. The challenge with large graphs (say friend/terrorist networks) is that there are a large number of distinct sub-graphs within the graph. GTNs can break heterogenous graphs into sub- or meta-graphs, which can then be used to understand these relationships at smaller scales.

At AIFD2, Intel also presented an update on their Analytics Zoo, which is Intel’s MLops framework. But that will need to wait for another time.


It was sort of a revelation to me that graphical data was not amenable to normal compute core processing using today’s GPUs or CPUs. DARPA (and Intel) saw this defect as a need for a completely different, brand new compute architecture.

Even so, Intel’s partnership with Katana Graph says that even today compute environment could provide higher performance on graphical data with suitable optimizations.

It would be interesting to see what Katana Graph could do using PIUMA technology and appropriate optimizations.

In any case, we shouldn’t need to wait long, Intel indicated in the video that P[I]UMA Technology chips could be here within the next year or so.


Photo Credit(s):

  • From Intel’s AIFD2 presentations
  • From Intel’s PUMA you tube video

Swarm learning for distributed & confidential machine learning

Read an article the other week about researchers in Germany working with a form of distributed machine learning they called swarm learning (see: AI with swarm intelligence: a novel technology for cooperative analysis …) which was reporting on a Nature magazine article (see: Swarm Learning for decentralized and confidential clinical machine learning).

The problem of shared machine learning is particularly accute with medical data. Many countries specifically call out patient medical information as data that can’t be shared between organizations (even within country) unless specifically authorized by a patient.

So these organizations and others are turning to use distributed machine learning as a way to 1) protect data across nodes and 2) provide accurate predictions that uses all the data even though portions of that data aren’t visible. There are two forms of distributed machine learning that I’m aware of federated and now swarm learning.

The main advantages of federated and swarm learning is that the data can be kept in the hospital, medical lab or facility without having to be revealed outside that privileged domain BUT the [machine] learning that’s derived from that data can be shared with other organizations and used in aggregate, to increase the prediction/classification model accuracy across all locations.

How distributed machine learning works

Distributed machine learning starts with a common model that all nodes will download and use to share learnings. At some agreed to time (across the learning network), all the nodes use their latest data to re-train the common model and share new training results (essentially weights used in the neural network layers) with all other members of the learning network.

Shared learnings would be encrypted with TLS plus some form of homomorphic encryption that allowed for calculations over the encrypted data.

In both federated and swarm learning, the sharing mechanism was facilitated by a privileged block chain (apparently Etherium for swarm). All learning nodes would use this blockchain to share learnings and download any updates to the common model after sharing.

Federated vs. Swarm learning

The main difference between federated and swarm learning is that with federated learning there is a central authority that updates the model(s) and with swarm learning that processing is replaced by a smart contract executing within the blockchain. Updating model(s) is done by each node updating the blockchain with shared data and then once all updates are in, it triggers a smart contract to execute some Etherium VM code which aggregates all the learnings and constructs a new model (or at least new weights for the model). Thus no node is responsible for updating the model, it’s all embedded into a smart contract within the Etherium block chain. .

Buthow does the swarm (or smart contract) update the common model’s weights. The Nature article states that they used either a straight average or a weighted average (weighted by “weight” of a node [we assume this is a function of the node’s re-training dataset size]) to update all parameters of the common model(s).

Testing Swarm vs. Centralized vs. Individual (node) model learning

In the Nature paper, the researchers compared a central model, where all data is available to retrain the models, with one utilizing swarm learning. To perform the comparison, they had all nodes contribute 20% of their test data to a central repository, which ran the common swarm updated model against this data to compute an accuracy metric for the swarm. The resulting accuracy of the central vs swarm learning comparison look identical.

They also ran the comparison of each individual node (just using the common model and then retraining it over time without sharing this information to the swarm versus using the swarm learning approach. In this comparison the swarm learning approach alway seemed to have as good as if not better accuracy and much narrower dispersion.

In the Nature paper, the researchers used swarm learning to manage the machine learning model predictions for detecting COVID19, Leukemia, Tuberculosis, and other lung diseases. All of these used public data, which included PBMC (peripheral blood mono-nuclear cells) transcription data, whole blood transcription data, and X-ray images.

Swarm learning also provides the ability to onboard new nodes in the network. Which would supply the common model and it’s current weights to the new node and add it to the shared learning smart contract.

The code for the swarm learning can be downloaded from HPE (requires an HPE passport login [it’s free]). The code for the models and data processing used in the paper are available from github. All this seems relatively straight forward, one could use the HPE Swarm Learning Library to facilitate doing this or code it up oneself.

Photo Credit(s):

Towards a better AGI – part 3(ish)

Read an article this past week in Nature about the need for Cooperative AI (Cooperative AI: machines must learn to find common ground) which supplies the best view I’ve seen as to a direction research needs to go to develop a more beneficial and benign AI-AGI.

Not sure why, but this past month or so, I’ve been on an AGI fueled frenzy (at leastihere). I didn’t realize this was going to be a multi-part journey otherwise, I would have lableled them AGI part-1 & -2 ( please see: Existential event risks [part-0], NVIDIA Triton GMI, a step to far [part-1] and The Myth of AGI [part-2] to learn more).

But first please take our new poll:

The Nature article puts into perspective what we all want from future AI (or AGI). That is,

  • AI-AI cooperation: AI systems that cooperate with one another while at the same time understand that not all activities are zero sum competitions (like chess, go, Atari games) but rather most activities, within the human sphere, are cooperative activities where one agent has a set of goals and a different agent has another set of goals, some of which overlap while others are in conflict. Sport games like soccer lacrosse come to mind. But there are other card and (Risk & Diplomacy) board games that use cooperating parties, with diverse goals to achieve common ends.
  • AI-Human cooperation: AI systems that cooperate with humans to achieve common goals. Here too, most humans have their own sets of goals, some of which may be in conflict with the AI systems goals. However, all humans have a shared set of goals, preservation of life comes to mind. It’s in this arena where the challenges are most acute for AI systems. Divining human and their own system underlying goals and motivations is not simple. And of course giving priority to the “right” goals when they compete or are in conflict will be an increasingly difficult task to accomplish, given todays human diversity.
  • Human-Human cooperation: Here it gets pretty interesting, but the paper seems to say that any future AI system should be designed to enhance human-human interaction, not deter or interfere with it. One can see the challenge of disinformation today and how wonderful it would be to have some AI agent that could filter all this and present a proper picture of our world. But, humans have different goals and trying to figure out what they are and which are common and thereby something to be enhanced will be an ongoing challenge.

The problem with today’s AI research is that its all about improving specific activities (image recognition, language understanding, recommendation engines, etc) but all are point solutions and none (if any) are focused on cooperation.

Tit for tat wins the award

To that end, the authors of the paper call for a new direction one that attempts to imbue AI systems with social intelligence and cooperative intelligence to work well in the broader, human dominated world that lies ahead.

In the Nature article they mentioned a 1984 book by Richard Axelrod, The Evolution of Cooperation. Perhaps, the last great research on cooperation that was ever produced.

In this book it talked about a world full of simulated prisoner dilemma actors that interacted, one with another, at random.

The experimenters programmed some agents to always do the proper thing for their current partner, some to always do the wrong thing to their partner, others to do right once than wrong from that point forward, etc. The experimenters tried every sort of cooperation policy they could think of.

Each agent in an interaction would get some number of points for an interaction. For example, if both did the right thing they would each get 3 points, if one did wrong, the sucker would get 1 and the bad actor would get 4, both did wrong each got 1 point, etc.

The agents that had the best score during a run (of 1000s of random pairings/interactions) would multiply for the the next run and the agents that did worse would disappear over time in the population of agents in simulated worlds.

The optimal strategy that emerged from these experiments was

  1. Do the right thing once with every new partner, and
  2. From that point forward tit for tat (if the other party did right the last time, then you do right thing the next time you interact with them, if they did wrong the last time, then you do wrong the next time you interact with them).

It was mind boggling at the time to realize that such a simple strategy could be so effective/sustainable in simulation and perhaps in the real world. It turns out that in a (simulated) world of bad agents, there would be this group of Tit for Tat agents that would build up, defend itself and expand over time to succeed.

That was the state of the art in cooperation research back then (1984). I’ve not seen anything similar to this since.

I haven’t seen anything like this that discusses how to implement algorithms in support of social intelligence.


The authors of the Nature article believe it’s once again time to start researching cooperation techniques and start researching social intelligence so we can instill proper cooperation and social intelligence technology into future AI (AGI) systems .

Perhaps if we can do this, we may create a better AI (or AGI) so that both it and we can live better in our world, galaxy and universe.


The myth of AGI

Sorry seem to be on an AGI bent this month…

Read an article the other day about a new book (The myth of AI, by Erik. J. Larson) that explains how the present direction of AI-ML-DL will be very unlikely to achieve artificial general intelligence (AGI) given it’s current direction. Amazon and others offer a short preview of the book which is where most of this discussion comes from.

Types of (human) reasoning

Near as I can tell, (don’t have the book), the book discusses the three types of reasoning that exist in human intellect, i.e., deduction, induction and abduction.

  • Deduction uses formal logic (or its equivalents) to derive facts or theorems from basic principles.
  • Induction uses a multitude of samples and constructs general principles from the analysis of them
  • Abduction uses a set of probabilistic assertions and formal logic, to come up with a probabilistic principle.

Deduction is most famously observed in geometry and arithmetic proofs and was most evident in the early years of AI through its use of expert systems. The challenge with expert systems is that the real world is vastly more complex than any geometrical or arithmetical artifice that humankind can produce.

Expert systems became champions of checkers, chess and some other games but in the end was not easily generalizable beyond a few (gaming and medically) restricted domains.

Induction is presently all the rage and represents what machine learning and deep neural networks (DNN) are doing with all that training data and resultant classification inferencing.

Today we have DNNs that can classify the objects in an image, can learn to play any game on the planet better than humans, and can even safely drive a car down the road.

The current AI world view is that this form of reasoning, DNN induction, will if taken to its extreme will ultimately result in some level of AGI, or human-equivalent levels of intelligence in a system. The author of the book begs to differ.

Abduction is less well known or discussed in rational circles. It’s essentially what any human does when presented with real world examples/experiences to derive an understanding (or principe) of what happened.

For example, a plate full of cookies last night becomes an almost empty plate of crumbs and two cookies. So what happened, your son woke up early, consumed most if not all of them, and left for work. This is a probabilistic (most likely) inference, but has a high probability of being true.

Any AGI will need all forms of reasoning

The challenge is that AI has been through the deduction phase through the rise of expert systems which crashed and burned because of the cost and time required to produce an exhaustive and correct expert system. And AI is currently in the induction phase, via DNN training, which seems to be entirely more generalizable and successfully usable in many different domains, but no one is talking seriously about doing abduction in AI (anymore).

The author claims (again, have not read the book) that any AGI will require as much abduction as induction (as well as perhaps deduction), and therefore, AGI is not inevitable based on our current AI DNN (or induction) intensive path.

Previous and current attempts at abduction reasoning

Some may recall fuzzy logic as one of the avenues taken after expert systems seemed to fail at doing successful and realistic inferencing around the end of last century. Fuzzy logic was a way of bring probabilities into deduction, not unlike abduction as defined above. With fuzzy logic each assertion or base assumption was given a probabilistic value (of being true) and the final derivation was assigned some level of probability of being true.

The wikipedia article has definitions for fuzzy logic and, or and not which of course would allow any system to make these assertions. But fuzzy logic (like expert systems above) suffered from the inability to exhaustively cover all examples in a real world situation.

Furthermore, the (funny) thing about DNNs is that they are much more probabilistic than it appears. If one examines classification outputs of any DNN, it is extremely rare to see some sort of boolean (true or false) yes or no answers. Mostly one sees a series of probabilities that are assigned to each classification bucket.

DNN systems hide these probabilities by just selecting the maximum (or minimum) probability generated as its final classification. This is entirely an artifact of needing to have some discrete output (classification selection). But DNN (internal) results always result in probabilistic values.

So although, pure induction doesn’t include probabilities, DNN induction as practiced today in AI systems, uses probabilistic reasoning in every layer of a DNN and in its final results.

What else may be missing from AI to allow AGI to be developed

Personally, AGI seems to require not just the reasoning approaches above, but a more workable and general purpose planning solution. I’ve tried to identify to see whether some researchers are using DNNs to provide general purpose planning solutions but have been yet to find any (in publcly available research). These are probably the one place where expert (or control) fuzzy systems still shine. But again they are hard to generalize and prove almost impossible to be completely exhaustive.

Nonetheless, in the end, I think that all the above just proves, that there are a number of distinct reasoning and other (planning) techniques that may need to come together to provide AGI. As any of us can attest, all of these different approaches are available within any human intellect.

And if we assume that any AGI will need to follow the human design to intelligence (not a given), they will all need to be stitched together, combined and brought to bear to realize AGI.

But, at present, with all the focus on DNN/induction, we, as AI researchers, are not making any progress on using these other techniques or in combining them into a single system.

And for that I am happy. I would be very pleased to have any AGI be farther out than nearer term. Because for the life of me, AGI scares the s&#t out of me.

Mostly because I don’t see any real way to control AGI, once unleashed. That and given the diversity of motives around this world, I don’t see any realistic mechanism to instill a universal and firm (unalterable) belief in the sanctity of human and other life, the dependance this life has on our environment/biosphere and the rule of law needed to maintain peace across humankind (and I’m probably missing a half dozen more things that we would want any AGI to adhere to).

Maybe, if I saw more effort on how, we as a species can come up with universal views on these and other topics and can come up with some way of instilling, essentially a system of programs, with these unalterable beliefs and AGI controls based on these, I’d be less fearful of AGI emerging.

Lacking that, any way of delaying its emergence, is fine by me.


Photo Credit(s):

NVIDIA Triton Giant Model Inference, a step too far

At GTC this week NVIDIA announced a new capability for their AI suite called Triton Giant Model Inference . This solution addresses the current and future problem of trying to perform inferencing with models whose parameters exceed a single GPU card.

During NVIDIA’s GTC show they showed a chart which indicates that model parameters are on an exponential climb (just eyeballing it here but 10X every year since 2018). Current models, like OpenAI’s GPT-3 have 175B parameters. Such a model would require ~350GB of GPU memory to perform inferencing on the whole model.

The fact that NVIDIA’s A100 currently sports 80GB of GPU memory means that GPT-3 would need to be cut up or partitioned to run on NVIDIA GPUs. Hence the need (from NVIDIA’s perspective) for a mechanism that can allow them to perform multi-GPU inferencing or their Triton Giant Machine Inference engine (GMI).

But first please take our new poll:

Why do we need GMI

It’s unclear what needs to be done to perform inferencing with a 175B parameter model today but my guess it involves a lot of manual splitting up of the model, into different layers/partitions and running the layers/partitions on separate GPUs and gluing the output of one portion to the input of the next. Such activity would be a complex, manual undertaking and would inherently slow down the model inferencing activities and add to inferencing latencies.

With Triton GMI, NVIDIA appears able to supply automated multi-GPU inferencing for models that exceed single GPU memory. Whether such models can span (DGX) servers or not was not revealed but even within a single DGX server there’s 4-A100s, so that provides an aggregate of 320GB of GPU memory. Of course, it’s very likely future Ampere GPUs will allow for more memory.

Why consider a step too far

Here’s my point, with artificial general intelligence (AGI, reasoning at human levels and beyond), coming sooner or later. My (and perhaps, humanities) preference is to have this happen later than earlier. Hopefully, this will give us more time to understand how to design/engineer/control AGI so that it doesn’t harm humanity or the earth. (See my post on Existential event risk… for more information on risks of Superintelligence)

One way to control or delay the emergence of AGI is to limit model size. Now NVIDIA, Google and others have already released capabilities that allow them to train models that exceed the size of one GPU.

Alas, the only thing left is to consider limit the size of models that can be used to perform inferencing. I fear that Triton GMI pretty much open up the flood gates to supply any size model inferencing. This will provide for more and more sophisticated AI/ML/DL models and will uncap model sizes in the near future.

Doing this will give us (humanity) a little more time to understand how to control AGI. But all this presupposes that any AGI will require more parameters than current DNN models. I think this is a safe assumption but I’m no expert.

Will delaying NVIDIA Triton GMI really help

I was not briefed on internals of GMI but possibly it makes use of DGX NV-Link and NVIDIA Software to automatically partition a DNN and deploy it over the 4-A100 GPUS in a DGX.

NVIDIA is not the only organization working on advancing DNN training and inferencing capabilities. And it’s very likely that more than one of them (Google, FaceBook, AWS, etc) have probably identified the model size as a problem for inferencing and are working on their own solutions. So delaying GMI will not be a long term fix.

But maybe if we could just delay this capability from reaching the market for 2 to 5 years it would have a follow on impact of delaying the emergence of AGI.

Is this going to stop some one/some organization from achieving AGI, probably not. Could it delay some person/organization/government from getting there – maybe. Perhaps, it will give humanity enough time to come up with other ways to control AGI. But I fear the more technology moves on, are options for controlling AGI diminish.

Don’t get me wrong. I think AI, DL NN and NVIDIA (Google, DeepMind, Facebook and others) have done a great service to help mankind succeed over this next century. And I in no way wish to hold back this capability. And a “good” AGI has the potential to help everyone on this earth in more ways than I can imagine.

But achieving AGI is a step function and once unleashed it may be difficult to control. Anything we can do today to a) delay the emergence of AGI and b) help to control it, is IMHO, worthy of consideration.


Photo Credits:

  • from NVIDIA GTC Keynote by Jensen Huang, CEO
  • From Hackernoon article, Can Bitcoin AGI develops to benefit humanity

AI inferencing using light alone

Researchers at UCLA have taken a trained DL neural network and implemented it into a series of passive optical only, 3D printed diffraction gratings to perform fashion MNIST object classification. And did the same with a MNIST handwritten digit and ImageNet DL neural network classifiers.

But first please take our new poll:

Experimental testing of 3D-printed D2NNs.(A and B) After the training phase, the final designs of five different layers (L1, L2, …, L5) of the handwritten digit classifier, fashion product classifier, and the imager D2NNs are shown. To the right of the network layers, an illustration of the corresponding 3D-printed D2NN is shown. (C and D) Schematic (C) and photo (D) of the experimental terahertz setup. An amplifier-multiplier chain was used to generate continuous-wave radiation at 0.4 THz, and a mixer-amplifier-multiplier chain was used for the detection at the output plane of the network. RF, radio frequency; f, frequency.

See the article on SlashGear, 3D printed all-optical diffractive deep learning neural network…. The research article is only available on Optical Society of America’s website/magazine (see Residual D2NN: training diffractive deep neural networks via learnable light shortcuts behind hard paywall). However, I did find a follow on article on ArchivX (see Analysis of Diffractive Optical Neural Networks and Their Integration with Electronic Neural Networks) that discussed how to integrate D2NN approaches with an electronic NN to create a hybrid inference engine. And another earlier Science article (see All-optical machine learning using diffractive deep neural networks) that was available which described earlier versions of D2NN technology for MNIST digit classification, fashion MNIST classification and ImageNet object classification.

How does it work

Apparently the researchers trained a normal (electronic based) deep learning neural network on the MNIST, Fashion MNIST and ImageNet and then converted the resultant trained NNs into a set of multiple diffraction grids. They did some computer simulation of the D2NN and once satisfied it worked and achieved decent accuracy, 3D printed the diffraction plates.

All-optical D2NN-based classifiers. These D2NN designs were based on spatially and temporally coherent illumination and linear optical materials/layers. (a) D2NN setup for the task of classification of handwritten digits (MNIST), where the input information is encoded in the amplitude channel of the input plane. (b) Final design of a 5-layer, phase-only classifier for handwritten digits. (c) Amplitude distribution at the input plane for a test sample (digit ‘0’). (d-e) Intensity patterns at the output plane for the input in (c); (d) is for MSE-based, and (e) is softmax- cross-entropy (SCE)-based designs. (f) D2NN setup for the task of classification of fashion products (Fashion-MNIST), where the input information is encoded in the phase channel of the input plane. (g) Same as (b), except for fashion product dataset. (h) Phase distribution at the input plane for a test sample. (i-j) Same as (d) and (e) for the input in (h),  refers to the illumination source wavelength. Input plane represents the plane of the input object or its data, which can also be generated by another optical imaging system or a lens, projecting an image of the object data onto this plane.

In their D2NN, they start with coherent (laser) light in the THz spectrum, used this to illuminate the input plane (I assume an image of the object/digit/fashion accessory) and passed this through multiple plates of diffraction grids onto THz detector which was used to detect the illuminated spot that indicated the classification.

The article in science has a supplementary materials download that show how the researchers converted NN weights into a diffraction grating. Essentially each pixel on the diffraction grating either transmits, refracts, or reflects a light path. And this represents the connections between layers. It’s unclear whether the 5 or 6 plates used in the D2NN correspond to the NN layers but it’s certainly possible.

And to the life of me I can’t understand what they mean by “Residual D2NN”, other than if it means using a trained (residual) NN and converting this to D2NN.

Some advantages of D2NN

3D printing diffraction gratings means anyone/lab could do this. The 3D printers they used had a spatial accuracy of 600 dpi, with 0.1mm accuracy, almost consumer grade 3D printers. In any case, being able to print these in a matter of hours, while not as easy as changing an all digital NN, seems like an easy way to try out the approach.

For example, for the MNIST digit classifier they used a pixel size of 400um and each diffraction layer they created was equivalent to 200X200 neural weights. Which means that 5 layer D2NN could handle about 0.2M neural weights which were completely connected to one another. This meant they could have (200×200)**2*5=8B connections in the MNIST D2NN. In the image classifier, each diffraction layer had 300×300 neural weights. So D2NN’s seem to scale very well.

Being an all passive optical device, the system is operates entirely in parallel, That is, the researchers indicated that the D2NN devices operate at the speed of light and would perform the inferencing activity in the time it takes a camera to capture the image.

Also the device uses very little energy (I assume just the energy for the THz generator, the input plane detector and the THz detector at the end.

And the researchers also claimed the device was cheap to manufacture, it could be created for less than $50. (Unclear if this included all the electronics or just the D2NN diffraction gratings and holder). And once you have locked into a D2NN that you wanted to use, could be manufactured in volume, very cheaply (sort of like stamping out CD platters). Finally, the number of neural network nodes and layers can be scaled up to a large number of layers and nodes per layer while still fitting on the diffraction gratings. In contrast, all electronic NN require more compute power as you scale up network layers and nodes per layer.

The other article (ArchivX) talked about potentially using a hybrid optical-electronic DNN approach with some layers being D2NN and others being purely digital (electronics). Such a system could potentially be used where some portion of the NN was more stable/more compute intensive than others and where the final output classification layer(s) was more changeable and much smaller/less compute intensive. Such a hybrid system could make use of the best of of the all optical D2NN to efficiently and quickly compress the input space and then have the electronic final classification layer provide the final classification step.

The Oracle

Combining a handful of D2NNs into a device that accepts speech input and provides speech output with the addition of say an offline copy of Wikipedia, Google Books etc. with a search engine that could be used to retrieve responses to questions asked would create an oracle device. Where you would ask a question and the device would respond with the best answer it could find (in it’s databases).

If this could be made out of an all passive optical components and use natural sunlight/electronic illumination to perform it’s functionality, such an all optical, question to answer oracle would be very useful to the populations of the world. And could be manufactured in volume very cheaply and would cost almost nothing to operate.

A couple of other tweaks, if we could collapse the multiple grating D2NNs into a single multi-layer plate/platter and make these replaceable in the device that would allow the oracle’s information base to be updated periodically.

Then if we could embed such a device into a Long Now Clock that would reflect sunlight onto the disk every Solstice, or Equinox, then we could have a quarterly oracle device that could last for 1000 of years. That would provide answers to queries one day every quarter. And that would be quite the oracle…

Photo credit(s):

The birth of biocomputing (on paper)

Read an article this past week discussing how researchers in Barcelona Spain have constructed a biological computing device on paper (see Biocomputer built with cells printed on paper). Their research was written up in a Nature Article (see 2D printed multi-cellular devices performing digital or analog computations).

We’ve written about DNA computing and storage before (see DNA IT …, DNA Computing… posts and our GBoS podcast on DNA storage…). But this technology takes all that to another level.

2-bit_ALU (from
2-bit_ALU (from

The challenges with biological computing previously had been how to perform the input processing and output within a single cell or when using multiple cells for computations, how to wire the cells together to provide the combinational logic required for the circuit.

The researchers in Spain seemed to have solved the wiring problems by using diffusion across a porous surface (like paper) to create a carrier signal (wire equivalent) and having cell groups at different locations along this diffusion path either enhance or block that diffusion, amplify/reduce that diffusion or transform that diffusion into something different

Analog (combinatorial circuitry types of) computation for this biocomputer are performed based on the location of sets of cells along this carrier signal. So spatial positioning is central to the device and the computation it performs. Not unlike digital or combinatorial circuitry, different computations can be performed just by altering the position along the wire (carrier signal) that gates (cells) are placed.

Their process seems to start with designing multiple cell groups to provide the processing desired, i.e., enhancing, blocking, transforming of the diffusion along the carrier signal, etc. Once they have the cells required to transform the diffusion process along the carrier signal, they then determine the spatial layout for the cells to be used in the logical circuit to perform the computation desired. Then they create a stamp which has wells (or indentations) which can be filled in with the cells required for the computation. Then they fill these wells with cells and nutrients for their operation and then stamp the circuit onto a porous surface.

The carrier signal the research team uses is a small molecule, the bacterial 3OC6HSL acyl homoserine lactone (AHL) which seems to be naturally used in a sort of biologic quorum sensing. And the computational cells produce an enzyme that enhances or degrades the AHL flow along the carrier signal. The AHL diffuses across the paper and encounters these computational cells along the way and compute whatever it is that’s required to be computed. At some point a cell transforms AHL levels to something externally available

They created:

  • Source cells (Sn) that take a substance as input (say mercury) and converts this into AHL
  • .Gate cells (M) that provide a switch on the solution of AHL difusing across the substrate.
  • Carrier reporter cells (CR) which can be used to report on concentrations of AHL.

The CR cells produce green florescent reporter proteins (GFP). Moreover, each gate cell expresses red florescent reporter proteins (RFP) as well for sort of a diagnostic tap into its individual activity.

Mapping of a general transistor architecture on a cellular printed pattern obtained using a stamping template. Similar to the transistor architecture, the cellular pattern is composed of three main components: source (S1 cells), gate (M cells) that responds to external inputs and a drain (CR cells) as the final output responding to the presence of the carrying signal (CS). b Stamping template used to create the circuit made of PLA with a layer of synthetic fibre (green). Cellular inks (yellow) are in their corresponding containers. Before stamping, the synthetic fibre is soaked with the different cell types. Finally, the stamping template is pressed against the paper surface, depositing all cells. c Circuit response. In the absence of external input, i.e. arabinose, the CS encoded in the production of AHL molecules by S1 cells diffuses along the surface, inducing GFP expression in reporter cells CR. In the presence of 10−3 M arabinose (Ara), the modulatory element Mara produces the AHL cleaving enzyme Aiia, which degrades the CS. Error bars are the standard deviation (SD) of three independent experiments. Data are presented as mean values ± SD. Experiments are performed on paper strips. The average fold change is 5.6x. d Photography of the device. Source data are provided as a Source Data file.

Using S, M and CR cells they are able to create any type of gate needed. This includes OR, AND, NOR and XNOR gates and just about any truth table needed. With this level of logic they could potentially implement any analog circuit on a piece of paper (if it was big enough).

a Schematic representation of the multi-branch implementation of a truth table. bImplementation of different logic gates. A schematic representation of the cells used in each paper strip and their corresponding distance points is given (Left). Gates with two sources of S1 (OR and XNOR gates) are circuits carrying two branches, while the other gates (NOR and AND gates) can be implemented with just one branch. Input concentrations are Ara = 10−3 M and aTc = 10−6 M. M+aTc and MaTc are, respectively, positive and negative modulatory cells responding to aTc. M+ara and Mara are, respectively, positive and negative modulatory cells responding to arabinose. S1 cells produce AHL constitutively and CR are the reporter cells. Error bars are the standard deviation (SD) of three independent experiments. The average fold change has been obtained from the mean of ON and OFF states from each circuit. OR gate 14.31x, AND gate 6.21x, NOR gate 6.58x, XNOR gate 5.6x. Source data are provided as a Source Data file.

As we learn in circuits class, any digital logic can be reduced to one of a few gates, such as NAND or NOR.

As an example of uses of the biocomputing, they implemented a mercury level sensing device. Once the device is dipped in a solution with mercury, the device will display a number of green florescent dots indicating the mercury levels of the solution

The bio-logical computer can be stamped onto any surface that supports agent diffusion, even flexible surfaces such as paper. The process can create a single use bio-logic computer, sort of smart litmus paper that could be used once and then recycled.

The computational cells stay “alive” during operation by metabolizing nutrients they were stamped with. As the biocomputer uses biological cells and paper (or any flexible diffusible substrate) as variable inputs and cells can be reproduced ad-infinitum for almost no cost, biocomputers like this can be made very inexpensively and once designed (and the input cells and stamp created) they can be manufactured like a printing press churns out magazines.


Now I’d like to see some sort of biological clock capability that could be used to transform this combinatorial logic into digital logic. And then combine all this with DNA based storage and I think we have all the parts needed for a biological, ARM/RISC V/POWER/X86 based server.

And a capacitor would be a nice addition, then maybe they could design a DRAM device.

Its one off nature, or single use will be a problem. But maybe we can figure out a way to feed all the S, M, and CR cells that make up all the gates (and storage) for the device. Sort of supplying biological power (food) to the device so that it could perform computations continuously.

Ok, maybe it will be glacially slow (as diffusion takes time). We could potentially speed it up by optimizing the diffusion/enzymatic processes. But it will never be the speed of modern computers.

However, it can be made very cheap, and very height dense. Just imagine a stack of these devices 40in tall that would potentially consist of 4000-8000 or more processing elements with immense amounts of storage. And slowness may not be as much of a problem.

Now if we could just figure out how to plug it into an ethernet network, then we’d have something.

Photo credit(s):

  • 2 Bit alu from Wikipedia
  • Figures 1 & 3 from Nature article 2D printed multi-cellular devices performing digital and analog computation