Towards a better AGI – part 3(ish)

Read an article this past week in Nature about the need for Cooperative AI (Cooperative AI: machines must learn to find common ground) which supplies the best view I’ve seen as to a direction research needs to go to develop a more beneficial and benign AI-AGI.

Not sure why, but this past month or so, I’ve been on an AGI-fueled frenzy (at least here). I didn’t realize this was going to be a multi-part journey; otherwise, I would have labeled the earlier posts AGI part-1 & -2 (please see: Existential event risks [part-0], NVIDIA Triton GMI, a step too far [part-1] and The Myth of AGI [part-2] to learn more).

The Nature article puts into perspective what we all want from future AI (or AGI). That is,

  • AI-AI cooperation: AI systems that cooperate with one another while understanding that not all activities are zero-sum competitions (like chess, Go or Atari games). Rather, most activities within the human sphere are cooperative, where one agent has a set of goals and a different agent has another set, some of which overlap while others are in conflict. Sports like soccer and lacrosse come to mind. But there are also card and board games (Risk and Diplomacy) that use cooperating parties, with diverse goals, to achieve common ends.
  • AI-Human cooperation: AI systems that cooperate with humans to achieve common goals. Here too, most humans have their own sets of goals, some of which may be in conflict with the AI system’s goals. However, all humans have a shared set of goals; preservation of life comes to mind. It’s in this arena where the challenges are most acute for AI systems. Divining the underlying goals and motivations of humans, and of the AI system itself, is not simple. And of course giving priority to the “right” goals when they compete or are in conflict will be an increasingly difficult task, given today’s human diversity.
  • Human-Human cooperation: Here it gets pretty interesting, but the paper seems to say that any future AI system should be designed to enhance human-human interaction, not deter or interfere with it. One can see the challenge of disinformation today and how wonderful it would be to have some AI agent that could filter all this and present a proper picture of our world. But, humans have different goals and trying to figure out what they are and which are common and thereby something to be enhanced will be an ongoing challenge.

The problem with today’s AI research is that it’s all about improving specific activities (image recognition, language understanding, recommendation engines, etc.), but all are point solutions and few (if any) are focused on cooperation.

Tit for tat wins the award

To that end, the authors of the paper call for a new direction, one that attempts to imbue AI systems with the social and cooperative intelligence needed to work well in the broader, human-dominated world that lies ahead.

In the Nature article they mentioned a 1984 book by Robert Axelrod, The Evolution of Cooperation, perhaps the last great research on cooperation that was ever produced.

The book describes a world full of simulated prisoner’s dilemma actors that interacted with one another at random.

The experimenters programmed some agents to always do the proper thing for their current partner, some to always do the wrong thing to their partner, others to do right once then wrong from that point forward, etc. The experimenters tried every sort of cooperation policy they could think of.

Each agent would get some number of points for an interaction. For example, if both did the right thing they would each get 3 points; if one did wrong, the sucker would get 1 and the bad actor would get 4; if both did wrong, each got 1 point, etc.

The agents that had the best score during a run (of 1000s of random pairings/interactions) would multiply for the next run, and the agents that did worse would disappear over time from the population of agents in the simulated world.

The optimal strategy that emerged from these experiments was

  1. Do the right thing once with every new partner, and
  2. From that point forward, tit for tat (if the other party did right the last time, then you do the right thing the next time you interact with them; if they did wrong the last time, then you do wrong the next time you interact with them).
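The two rules above can be sketched in a few lines of Python. This is only a minimal illustration, not Axelrod's actual tournament code: the function names (`tit_for_tat`, `always_defect`, `play`) are made up for this sketch, and the payoffs are the commonly cited tournament values (3 each for mutual cooperation, 5 and 0 when one side defects, 1 each for mutual defection) rather than the example numbers given earlier.

```python
# Toy iterated prisoner's dilemma. Payoff values are an assumption (the
# commonly cited Axelrod tournament numbers), keyed by (my_move, their_move).
C, D = "C", "D"  # C = do the right thing, D = do wrong
PAYOFF = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate on first contact, then copy the partner's previous move."""
    return C if not their_history else their_history[-1]


def always_defect(my_history, their_history):
    """A 'bad agent' that does wrong every time."""
    return D


def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds between two strategies and return both totals."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("TFT vs TFT:          ", play(tit_for_tat, tit_for_tat))
    print("TFT vs always defect:", play(tit_for_tat, always_defect))
    print("defect vs defect:    ", play(always_defect, always_defect))
```

In this toy setup a tit for tat agent loses only the first round to a defector, while two tit for tat agents paired together out-earn two defectors paired together, which is the intuition behind small groups of cooperators being able to hold their own.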

It was mind boggling at the time to realize that such a simple strategy could be so effective/sustainable in simulation and perhaps in the real world. It turns out that in a (simulated) world of bad agents, there would be this group of Tit for Tat agents that would build up, defend itself and expand over time to succeed.

That was the state of the art in cooperation research back then (1984). I’ve not seen anything similar since, nor anything that discusses how to implement algorithms in support of social intelligence.

The authors of the Nature article believe it’s once again time to start researching cooperation techniques and social intelligence, so we can instill both into future AI (AGI) systems.

Perhaps if we can do this, we may create a better AI (or AGI) so that both it and we can live better in our world, galaxy and universe.


The myth of AGI

Sorry seem to be on an AGI bent this month…

Read an article the other day about a new book (The Myth of Artificial Intelligence, by Erik J. Larson) that explains why the present direction of AI-ML-DL is very unlikely to achieve artificial general intelligence (AGI). Amazon and others offer a short preview of the book, which is where most of this discussion comes from.

Types of (human) reasoning

Near as I can tell, (don’t have the book), the book discusses the three types of reasoning that exist in human intellect, i.e., deduction, induction and abduction.

  • Deduction uses formal logic (or its equivalents) to derive facts or theorems from basic principles.
  • Induction uses a multitude of samples and constructs general principles from the analysis of them.
  • Abduction uses a set of probabilistic assertions and formal logic, to come up with a probabilistic principle.

Deduction is most famously observed in geometry and arithmetic proofs and was most evident in the early years of AI through its use of expert systems. The challenge with expert systems is that the real world is vastly more complex than any geometrical or arithmetical artifice that humankind can produce.

Expert systems became champions of checkers, chess and some other games, but in the end were not easily generalizable beyond a few restricted (gaming and medical) domains.

Induction is presently all the rage and represents what machine learning and deep neural networks (DNN) are doing with all that training data and resultant classification inferencing.

Today we have DNNs that can classify the objects in an image, can learn to play many games better than humans, and can even drive a car down the road in many conditions.

The current AI world view is that this form of reasoning, DNN induction, will, if taken to its extreme, ultimately result in some level of AGI, or human-equivalent levels of intelligence in a system. The author of the book begs to differ.

Abduction is less well known or discussed in rational circles. It’s essentially what any human does when presented with real world examples/experiences to derive an understanding (or principle) of what happened.

For example, a plate full of cookies last night becomes an almost empty plate of crumbs and two cookies. So what happened? Your son woke up early, consumed most of them, and left for work. This is a probabilistic (most likely) inference, but has a high probability of being true.
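One common way to approximate this kind of "most likely explanation" reasoning in software is a simple Bayesian comparison of candidate hypotheses. The sketch below is only an illustration of that idea, not something taken from the book: the hypotheses, the prior and likelihood numbers, and the function name `most_likely_explanation` are all made up for the cookie example.

```python
def most_likely_explanation(priors, likelihoods):
    """Rank hypotheses by posterior probability given one observation.

    priors[h]      = how plausible hypothesis h was before looking at the plate
    likelihoods[h] = how likely the observed plate is if h were true
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    posteriors = {h: value / total for h, value in unnormalized.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical numbers for the "almost empty cookie plate" observation.
PRIORS = {"son ate them": 0.6, "dog ate them": 0.3, "burglar ate them": 0.1}
LIKELIHOODS = {"son ate them": 0.8, "dog ate them": 0.1, "burglar ate them": 0.05}

if __name__ == "__main__":
    best, posteriors = most_likely_explanation(PRIORS, LIKELIHOODS)
    print("Best explanation:", best)
    for hypothesis, probability in posteriors.items():
        print(f"  {hypothesis}: {probability:.3f}")
```

The hard part of abduction, and presumably what the book is getting at, is not this arithmetic but coming up with the candidate explanations in the first place.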

Any AGI will need all forms of reasoning

The challenge is that AI has been through the deduction phase through the rise of expert systems which crashed and burned because of the cost and time required to produce an exhaustive and correct expert system. And AI is currently in the induction phase, via DNN training, which seems to be entirely more generalizable and successfully usable in many different domains, but no one is talking seriously about doing abduction in AI (anymore).

The author claims (again, have not read the book) that any AGI will require as much abduction as induction (as well as perhaps deduction), and therefore, AGI is not inevitable based on our current AI DNN (or induction) intensive path.

Previous and current attempts at abduction reasoning

Some may recall fuzzy logic as one of the avenues taken after expert systems seemed to fail at doing successful and realistic inferencing around the end of last century. Fuzzy logic was a way of bringing probabilities into deduction, not unlike abduction as defined above. With fuzzy logic each assertion or base assumption was given a probabilistic value (of being true) and the final derivation was assigned some level of probability of being true.

The Wikipedia article has definitions for the fuzzy logic AND, OR and NOT operators, which of course would allow any system to make these assertions. But fuzzy logic (like expert systems above) suffered from the inability to exhaustively cover all examples in a real world situation.
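For reference, the most widely used definitions of the fuzzy operators (the Zadeh operators) are small enough to sketch directly. The truth values and the "warm"/"humid" example below are made up for illustration.

```python
def fuzzy_and(a, b):
    """Zadeh AND: a conjunction is only as true as its weakest part."""
    return min(a, b)


def fuzzy_or(a, b):
    """Zadeh OR: a disjunction is as true as its strongest part."""
    return max(a, b)


def fuzzy_not(a):
    """Zadeh NOT: complement of a truth degree in [0, 1]."""
    return 1.0 - a


if __name__ == "__main__":
    warm, humid = 0.7, 0.4  # hypothetical degrees of truth
    print("warm AND humid:", fuzzy_and(warm, humid))
    print("warm OR humid: ", fuzzy_or(warm, humid))
    print("NOT warm:      ", fuzzy_not(warm))
```

Other operator families exist (for example, using a product for AND), so this is one choice among several rather than the definition.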

Furthermore, the (funny) thing about DNNs is that they are much more probabilistic than it appears. If one examines classification outputs of any DNN, it is extremely rare to see some sort of boolean (true or false) yes or no answers. Mostly one sees a series of probabilities that are assigned to each classification bucket.

DNN systems hide these probabilities by just selecting the maximum (or minimum) probability generated as their final classification. This is entirely an artifact of needing to have some discrete output (classification selection). But DNN (internal) results are always probabilistic values.

So although pure induction doesn’t include probabilities, DNN induction as practiced today in AI systems uses probabilistic reasoning in every layer of a DNN and in its final results.
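To make the point concrete, here is a minimal sketch of what typically happens at the output of a classifier: raw scores are turned into probabilities with a softmax, and only then is the largest one picked as "the" answer. The labels and scores are hypothetical, and real frameworks do this on tensors rather than Python lists.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the single label a user sees, plus the probabilities behind it."""
    probs = softmax(logits)
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return labels[best_index], probs


if __name__ == "__main__":
    labels = ["cat", "dog", "toaster"]  # hypothetical classes
    label, probs = classify([2.0, 1.0, 0.1], labels)
    print("Reported class:", label)
    for name, p in zip(labels, probs):
        print(f"  {name}: {p:.3f}")
```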

What else may be missing from AI to allow AGI to be developed

Personally, I think AGI requires not just the reasoning approaches above, but a more workable and general purpose planning solution. I’ve tried to find out whether some researchers are using DNNs to provide general purpose planning solutions but have yet to find any (in publicly available research). Planning is probably the one place where expert (or fuzzy control) systems still shine. But again, they are hard to generalize and almost impossible to make completely exhaustive.

Nonetheless, in the end, I think that all the above just proves, that there are a number of distinct reasoning and other (planning) techniques that may need to come together to provide AGI. As any of us can attest, all of these different approaches are available within any human intellect.

And if we assume that any AGI will need to follow the human design to intelligence (not a given), they will all need to be stitched together, combined and brought to bear to realize AGI.

But, at present, with all the focus on DNN/induction, we, as AI researchers, are not making any progress on using these other techniques or in combining them into a single system.

And for that I am happy. I would be very pleased to have any AGI be farther out than nearer term. Because for the life of me, AGI scares the s&#t out of me.

Mostly because I don’t see any real way to control AGI, once unleashed. That and given the diversity of motives around this world, I don’t see any realistic mechanism to instill a universal and firm (unalterable) belief in the sanctity of human and other life, the dependence this life has on our environment/biosphere and the rule of law needed to maintain peace across humankind (and I’m probably missing a half dozen more things that we would want any AGI to adhere to).

Maybe, if I saw more effort on how we as a species can come up with universal views on these and other topics, and on some way of instilling these unalterable beliefs (and AGI controls based on them) into what is essentially a system of programs, I’d be less fearful of AGI emerging.

Lacking that, any way of delaying its emergence is fine by me.


NVIDIA Triton Giant Model Inference, a step too far

At GTC this week NVIDIA announced a new capability for their AI suite called Triton Giant Model Inference. This solution addresses the current and future problem of trying to perform inferencing with models whose parameters exceed a single GPU card.

During NVIDIA’s GTC show they showed a chart which indicates that model parameters are on an exponential climb (just eyeballing it here but 10X every year since 2018). Current models, like OpenAI’s GPT-3 have 175B parameters. Such a model would require ~350GB of GPU memory to perform inferencing on the whole model.

The fact that NVIDIA’s A100 currently sports 80GB of GPU memory means that GPT-3 would need to be cut up or partitioned to run on NVIDIA GPUs. Hence the need (from NVIDIA’s perspective) for a mechanism that can allow them to perform multi-GPU inferencing, or their Triton Giant Model Inference engine (GMI).
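The arithmetic behind those numbers is easy to check. The sketch below assumes 2 bytes per parameter (16-bit weights), which is what makes 175B parameters come out at ~350GB, and it counts only the weights, ignoring activations and other working memory. The function names are made up for this example.

```python
import math


def inference_memory_gb(num_params, bytes_per_param=2):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption.
    """
    return num_params * bytes_per_param / 1e9


def gpus_needed(model_gb, gpu_gb=80):
    """Minimum number of GPUs whose combined memory can hold the weights."""
    return math.ceil(model_gb / gpu_gb)


if __name__ == "__main__":
    gpt3_gb = inference_memory_gb(175_000_000_000)
    print(f"GPT-3 weights: ~{gpt3_gb:.0f} GB")
    print(f"80GB A100s needed: {gpus_needed(gpt3_gb)}")
```

Under these assumptions the weights alone need at least five 80GB cards, so no single GPU comes close.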

Why do we need GMI

It’s unclear what needs to be done to perform inferencing with a 175B parameter model today, but my guess is it involves a lot of manual splitting up of the model into different layers/partitions, running the layers/partitions on separate GPUs and gluing the output of one portion to the input of the next. Such activity would be a complex, manual undertaking and would inherently slow down the model inferencing activities and add to inferencing latencies.
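As a rough illustration of what that manual splitting might look like, here is a toy greedy partitioner that assigns consecutive layers to GPUs until each card is full. It is a guess at the general idea only, not how Triton GMI or any real framework does it; the layer sizes and the function name `partition_layers` are hypothetical.

```python
def partition_layers(layer_sizes_gb, gpu_capacity_gb):
    """Greedily pack consecutive layers onto GPUs, in order.

    Returns a list of partitions, each a list of layer indices. The output of
    the last layer in one partition would feed the first layer of the next.
    """
    partitions = [[]]
    used_gb = 0.0
    for index, size_gb in enumerate(layer_sizes_gb):
        if size_gb > gpu_capacity_gb:
            raise ValueError(f"layer {index} does not fit on a single GPU")
        if used_gb + size_gb > gpu_capacity_gb:
            partitions.append([])  # current GPU is full, start the next one
            used_gb = 0.0
        partitions[-1].append(index)
        used_gb += size_gb
    return partitions


if __name__ == "__main__":
    # Hypothetical model: 10 equal blocks of 35 GB each (350 GB in total).
    layout = partition_layers([35.0] * 10, gpu_capacity_gb=80.0)
    for gpu, layers in enumerate(layout):
        print(f"GPU {gpu}: layers {layers}")
```

Even in this simple form, every partition boundary is a point where data has to move between GPUs, which is where the added latency comes from.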

With Triton GMI, NVIDIA appears able to supply automated multi-GPU inferencing for models that exceed single GPU memory. Whether such models can span (DGX) servers or not was not revealed but even within a single DGX server there’s 4-A100s, so that provides an aggregate of 320GB of GPU memory. Of course, it’s very likely future Ampere GPUs will allow for more memory.

Why consider a step too far

Here’s my point. Artificial general intelligence (AGI, reasoning at human levels and beyond) is coming sooner or later. My (and perhaps humanity’s) preference is to have this happen later rather than earlier. Hopefully, this will give us more time to understand how to design/engineer/control AGI so that it doesn’t harm humanity or the earth. (See my post on Existential event risk… for more information on risks of Superintelligence.)

One way to control or delay the emergence of AGI is to limit model size. Now NVIDIA, Google and others have already released capabilities that allow them to train models that exceed the size of one GPU.

Alas, the only thing left is to consider limiting the size of models that can be used to perform inferencing. I fear that Triton GMI pretty much opens up the flood gates to inferencing with models of any size. This will provide for more and more sophisticated AI/ML/DL models and will uncap model sizes in the near future.

Limiting inference model size would give us (humanity) a little more time to understand how to control AGI. But all this presupposes that any AGI will require more parameters than current DNN models. I think this is a safe assumption but I’m no expert.

Will delaying NVIDIA Triton GMI really help

I was not briefed on the internals of GMI but possibly it makes use of DGX NV-Link and NVIDIA software to automatically partition a DNN and deploy it over the 4-A100 GPUs in a DGX.

NVIDIA is not the only organization working on advancing DNN training and inferencing capabilities. And it’s very likely that more than one of them (Google, Facebook, AWS, etc.) has identified model size as a problem for inferencing and is working on its own solution. So delaying GMI will not be a long term fix.

But maybe if we could just delay this capability from reaching the market for 2 to 5 years it would have a follow on impact of delaying the emergence of AGI.

Is this going to stop some person or organization from achieving AGI? Probably not. Could it delay some person/organization/government from getting there? Maybe. Perhaps it will give humanity enough time to come up with other ways to control AGI. But I fear the more technology moves on, the more our options for controlling AGI diminish.

Don’t get me wrong. I think AI, DL NN and NVIDIA (Google, DeepMind, Facebook and others) have done a great service to help mankind succeed over this next century. And I in no way wish to hold back this capability. And a “good” AGI has the potential to help everyone on this earth in more ways than I can imagine.

But achieving AGI is a step function and once unleashed it may be difficult to control. Anything we can do today to a) delay the emergence of AGI and b) help to control it, is IMHO, worthy of consideration.


Photo Credits:

  • from NVIDIA GTC Keynote by Jensen Huang, CEO
  • From Hackernoon article, Can Bitcoin AGI develops to benefit humanity

AI inferencing using light alone

Researchers at UCLA have taken a trained DL neural network and implemented it into a series of passive optical only, 3D printed diffraction gratings to perform fashion MNIST object classification. And did the same with a MNIST handwritten digit and ImageNet DL neural network classifiers.

Experimental testing of 3D-printed D2NNs.(A and B) After the training phase, the final designs of five different layers (L1, L2, …, L5) of the handwritten digit classifier, fashion product classifier, and the imager D2NNs are shown. To the right of the network layers, an illustration of the corresponding 3D-printed D2NN is shown. (C and D) Schematic (C) and photo (D) of the experimental terahertz setup. An amplifier-multiplier chain was used to generate continuous-wave radiation at 0.4 THz, and a mixer-amplifier-multiplier chain was used for the detection at the output plane of the network. RF, radio frequency; f, frequency.

See the article on SlashGear, 3D printed all-optical diffractive deep learning neural network…. The research article is only available on Optical Society of America’s website/magazine (see Residual D2NN: training diffractive deep neural networks via learnable light shortcuts, behind a hard paywall). However, I did find a follow on article on arXiv (see Analysis of Diffractive Optical Neural Networks and Their Integration with Electronic Neural Networks) that discussed how to integrate D2NN approaches with an electronic NN to create a hybrid inference engine. And another earlier Science article (see All-optical machine learning using diffractive deep neural networks) was available, which described earlier versions of D2NN technology for MNIST digit classification, fashion MNIST classification and ImageNet object classification.

How does it work

Apparently the researchers trained a normal (electronic based) deep learning neural network on the MNIST, Fashion MNIST and ImageNet datasets and then converted the resultant trained NNs into a set of multiple diffraction grids. They did some computer simulation of the D2NN and, once satisfied it worked and achieved decent accuracy, 3D printed the diffraction plates.

All-optical D2NN-based classifiers. These D2NN designs were based on spatially and temporally coherent illumination and linear optical materials/layers. (a) D2NN setup for the task of classification of handwritten digits (MNIST), where the input information is encoded in the amplitude channel of the input plane. (b) Final design of a 5-layer, phase-only classifier for handwritten digits. (c) Amplitude distribution at the input plane for a test sample (digit ‘0’). (d-e) Intensity patterns at the output plane for the input in (c); (d) is for MSE-based, and (e) is softmax-cross-entropy (SCE)-based designs. (f) D2NN setup for the task of classification of fashion products (Fashion-MNIST), where the input information is encoded in the phase channel of the input plane. (g) Same as (b), except for fashion product dataset. (h) Phase distribution at the input plane for a test sample. (i-j) Same as (d) and (e) for the input in (h). λ refers to the illumination source wavelength. Input plane represents the plane of the input object or its data, which can also be generated by another optical imaging system or a lens, projecting an image of the object data onto this plane.

In their D2NN, they start with coherent (laser) light in the THz spectrum, use this to illuminate the input plane (I assume an image of the object/digit/fashion accessory) and pass this through multiple plates of diffraction grids onto a THz detector, which is used to detect the illuminated spot that indicates the classification.

The article in Science has a supplementary materials download that shows how the researchers converted NN weights into a diffraction grating. Essentially each pixel on the diffraction grating either transmits, refracts, or reflects a light path. And this represents the connections between layers. It’s unclear whether the 5 or 6 plates used in the D2NN correspond to the NN layers but it’s certainly possible.

And for the life of me I can’t understand what they mean by “Residual D2NN”, unless it means using a trained (residual) NN and converting this to a D2NN.

Some advantages of D2NN

3D printing diffraction gratings means anyone/lab could do this. The 3D printers they used had a spatial accuracy of 600 dpi, with 0.1mm accuracy, almost consumer grade 3D printers. In any case, being able to print these in a matter of hours, while not as easy as changing an all digital NN, seems like an easy way to try out the approach.

For example, for the MNIST digit classifier they used a pixel size of 400um and each diffraction layer they created was equivalent to 200X200 neural weights. This means that a 5-layer D2NN could handle about 0.2M neural weights, which were completely connected to one another. This meant they could have (200×200)**2*5=8B connections in the MNIST D2NN. In the image classifier, each diffraction layer had 300×300 neural weights. So D2NNs seem to scale very well.
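The numbers above can be reproduced with a few lines of arithmetic. This sketch uses the same counting as the paragraph (every weight in a layer connected to every weight in the next, multiplied by the number of layers) and assumes 5 layers for the 300×300 imager as well, which is my assumption. The function name `d2nn_counts` is made up.

```python
def d2nn_counts(side, layers, pixel_um=400):
    """Back-of-envelope sizes for a D2NN with square, fully connected layers."""
    weights_per_layer = side * side
    total_weights = weights_per_layer * layers
    connections = weights_per_layer ** 2 * layers
    plate_side_mm = side * pixel_um / 1000  # physical width of one plate
    return weights_per_layer, total_weights, connections, plate_side_mm


if __name__ == "__main__":
    for name, side in (("MNIST digits", 200), ("imager", 300)):
        per_layer, total, connections, width = d2nn_counts(side, layers=5)
        print(f"{name}: {per_layer:,} weights/layer, {total:,} total, "
              f"{connections:,} connections, plate ~{width:.0f} mm wide")
```

At a 400um pixel size, a 200×200 layer works out to a plate about 8cm on a side.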

Being an all passive optical device, the system operates entirely in parallel. That is, the researchers indicated that the D2NN devices operate at the speed of light and would perform the inferencing activity in the time it takes a camera to capture the image.

Also the device uses very little energy (I assume just the energy for the THz generator, the input plane detector and the THz detector at the end).

And the researchers also claimed the device was cheap to manufacture; it could be created for less than $50. (Unclear if this included all the electronics or just the D2NN diffraction gratings and holder). And once you have locked into a D2NN that you wanted to use, it could be manufactured in volume, very cheaply (sort of like stamping out CD platters). Finally, the number of neural network nodes and layers can be scaled up to a large number of layers and nodes per layer while still fitting on the diffraction gratings. In contrast, all electronic NNs require more compute power as you scale up network layers and nodes per layer.

The other article (arXiv) talked about potentially using a hybrid optical-electronic DNN approach with some layers being D2NN and others being purely digital (electronics). Such a system could potentially be used where some portion of the NN was more stable/more compute intensive than others and where the final output classification layer(s) was more changeable and much smaller/less compute intensive. Such a hybrid system could make use of the best of the all optical D2NN to efficiently and quickly compress the input space and then have the electronic final classification layer provide the final classification step.

The Oracle

Combining a handful of D2NNs into a device that accepts speech input and provides speech output, with the addition of say an offline copy of Wikipedia, Google Books etc. and a search engine that could be used to retrieve responses to questions asked, would create an oracle device. You would ask a question and the device would respond with the best answer it could find (in its databases).

If this could be made out of all passive optical components and use natural sunlight/electronic illumination to perform its functionality, such an all optical, question to answer oracle would be very useful to the populations of the world. And it could be manufactured in volume very cheaply and would cost almost nothing to operate.

A couple of other tweaks, if we could collapse the multiple grating D2NNs into a single multi-layer plate/platter and make these replaceable in the device that would allow the oracle’s information base to be updated periodically.

Then if we could embed such a device into a Long Now Clock that would reflect sunlight onto the disk every Solstice or Equinox, then we could have a quarterly oracle device that could last for 1000s of years. That would provide answers to queries one day every quarter. And that would be quite the oracle…

The birth of biocomputing (on paper)

Read an article this past week discussing how researchers in Barcelona, Spain have constructed a biological computing device on paper (see Biocomputer built with cells printed on paper). Their research was written up in a Nature article (see 2D printed multi-cellular devices performing digital and analog computation).

We’ve written about DNA computing and storage before (see DNA IT …, DNA Computing… posts and our GBoS podcast on DNA storage…). But this technology takes all that to another level.

2-bit_ALU (from

The challenges with biological computing previously had been how to perform the input processing and output within a single cell or when using multiple cells for computations, how to wire the cells together to provide the combinational logic required for the circuit.

The researchers in Spain seem to have solved the wiring problems by using diffusion across a porous surface (like paper) to create a carrier signal (wire equivalent) and having cell groups at different locations along this diffusion path either enhance or block that diffusion, amplify/reduce that diffusion or transform that diffusion into something different.

Analog (combinatorial circuitry types of) computation for this biocomputer is performed based on the location of sets of cells along this carrier signal. So spatial positioning is central to the device and the computation it performs. Not unlike digital or combinatorial circuitry, different computations can be performed just by altering the position along the wire (carrier signal) where gates (cells) are placed.

Their process seems to start with designing multiple cell groups to provide the processing desired, i.e., enhancing, blocking, transforming of the diffusion along the carrier signal, etc. Once they have the cells required to transform the diffusion process along the carrier signal, they then determine the spatial layout for the cells to be used in the logical circuit to perform the computation desired. Then they create a stamp which has wells (or indentations) which can be filled in with the cells required for the computation. Then they fill these wells with cells and nutrients for their operation and then stamp the circuit onto a porous surface.

The carrier signal the research team uses is a small molecule, the bacterial 3OC6HSL acyl homoserine lactone (AHL), which seems to be naturally used in a sort of biologic quorum sensing. And the computational cells produce an enzyme that enhances or degrades the AHL flow along the carrier signal. The AHL diffuses across the paper and encounters these computational cells along the way, which compute whatever it is that’s required to be computed. At some point a cell transforms AHL levels to something externally available.

They created:

  • Source cells (Sn) that take a substance as input (say mercury) and convert this into AHL.
  • Gate cells (M) that provide a switch on the solution of AHL diffusing across the substrate.
  • Carrier reporter cells (CR) which can be used to report on concentrations of AHL.

The CR cells produce green fluorescent reporter proteins (GFP). Moreover, each gate cell expresses red fluorescent reporter proteins (RFP) as well, as a sort of diagnostic tap into its individual activity.

a Mapping of a general transistor architecture on a cellular printed pattern obtained using a stamping template. Similar to the transistor architecture, the cellular pattern is composed of three main components: source (S1 cells), gate (M cells) that responds to external inputs and a drain (CR cells) as the final output responding to the presence of the carrying signal (CS). b Stamping template used to create the circuit made of PLA with a layer of synthetic fibre (green). Cellular inks (yellow) are in their corresponding containers. Before stamping, the synthetic fibre is soaked with the different cell types. Finally, the stamping template is pressed against the paper surface, depositing all cells. c Circuit response. In the absence of external input, i.e. arabinose, the CS encoded in the production of AHL molecules by S1 cells diffuses along the surface, inducing GFP expression in reporter cells CR. In the presence of 10⁻³ M arabinose (Ara), the modulatory element Mara produces the AHL cleaving enzyme Aiia, which degrades the CS. Error bars are the standard deviation (SD) of three independent experiments. Data are presented as mean values ± SD. Experiments are performed on paper strips. The average fold change is 5.6x. d Photography of the device. Source data are provided as a Source Data file.

Using S, M and CR cells they are able to create any type of gate needed. This includes OR, AND, NOR and XNOR gates and just about any truth table needed. With this level of logic they could potentially implement any combinational logic circuit on a piece of paper (if it was big enough).

a Schematic representation of the multi-branch implementation of a truth table. b Implementation of different logic gates. A schematic representation of the cells used in each paper strip and their corresponding distance points is given (Left). Gates with two sources of S1 (OR and XNOR gates) are circuits carrying two branches, while the other gates (NOR and AND gates) can be implemented with just one branch. Input concentrations are Ara = 10⁻³ M and aTc = 10⁻⁶ M. M+aTc and MaTc are, respectively, positive and negative modulatory cells responding to aTc. M+ara and Mara are, respectively, positive and negative modulatory cells responding to arabinose. S1 cells produce AHL constitutively and CR are the reporter cells. Error bars are the standard deviation (SD) of three independent experiments. The average fold change has been obtained from the mean of ON and OFF states from each circuit. OR gate 14.31x, AND gate 6.21x, NOR gate 6.58x, XNOR gate 5.6x. Source data are provided as a Source Data file.
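One way to picture the branch idea in the caption is as a purely boolean toy model: each branch is a source followed by modulator cells in series, the carrier signal reaches the reporter only if every modulator on the branch lets it through, and the reporter lights up if any branch delivers signal. This is my own simplification, not the paper's model. It ignores concentrations, distances and diffusion time, and the specific modulator arrangements per gate are guesses that merely agree with the one-branch/two-branch split described in the caption.

```python
def positive(input_name):
    """Toy positive modulator: passes the carrier only if its input is present."""
    return lambda inputs: inputs[input_name]


def negative(input_name):
    """Toy negative modulator: degrades the carrier when its input is present."""
    return lambda inputs: not inputs[input_name]


def branch_conducts(modulators, inputs):
    """Carrier reaches the end of a branch only if every modulator passes it."""
    return all(modulator(inputs) for modulator in modulators)


def reporter_on(branches, inputs):
    """Reporter cells glow if any branch delivers the carrier signal."""
    return any(branch_conducts(branch, inputs) for branch in branches)


# Hypothetical layouts: NOR and AND on one branch, OR and XNOR on two.
GATES = {
    "NOR": [[negative("ara"), negative("aTc")]],
    "AND": [[positive("ara"), positive("aTc")]],
    "OR": [[positive("ara")], [positive("aTc")]],
    "XNOR": [[positive("ara"), positive("aTc")],
             [negative("ara"), negative("aTc")]],
}


def truth_table(branches):
    """Outputs for (ara, aTc) = (0,0), (0,1), (1,0), (1,1), in that order."""
    return [reporter_on(branches, {"ara": ara, "aTc": atc})
            for ara in (False, True) for atc in (False, True)]


if __name__ == "__main__":
    for name, branches in GATES.items():
        print(name, [int(value) for value in truth_table(branches)])
```

The point of the sketch is just that placing a few kinds of cells in series and in parallel along a diffusion path is enough to express a truth table.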

As we learn in circuits class, any digital logic can be reduced to one of a few gates, such as NAND or NOR.
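As a quick reminder of why that holds, here is a small sketch that builds NOT, OR and AND out of nothing but NOR and checks them against Python's own boolean operators. The function names are arbitrary.

```python
def nor(a, b):
    """The only primitive gate used below."""
    return not (a or b)


def not_(a):
    return nor(a, a)


def or_(a, b):
    return not_(nor(a, b))


def and_(a, b):
    # De Morgan: a AND b == NOT(NOT a OR NOT b) == NOR(NOT a, NOT b)
    return nor(not_(a), not_(b))


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            assert or_(a, b) == (a or b)
            assert and_(a, b) == (a and b)
        assert not_(a) == (not a)
    print("NOT, OR and AND all reproduced from NOR alone")
```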

As an example of a use for biocomputing, they implemented a mercury level sensing device. Once the device is dipped in a solution with mercury, the device will display a number of green fluorescent dots indicating the mercury levels of the solution.

The bio-logical computer can be stamped onto any surface that supports agent diffusion, even flexible surfaces such as paper. The process can create a single use bio-logic computer, sort of smart litmus paper that could be used once and then recycled.

The computational cells stay “alive” during operation by metabolizing nutrients they were stamped with. As the biocomputer uses biological cells and paper (or any flexible diffusible substrate) as variable inputs and cells can be reproduced ad-infinitum for almost no cost, biocomputers like this can be made very inexpensively and once designed (and the input cells and stamp created) they can be manufactured like a printing press churns out magazines.


Now I’d like to see some sort of biological clock capability that could be used to transform this combinational logic into clocked, sequential logic. And then combine all this with DNA based storage and I think we have all the parts needed for a biological, ARM/RISC-V/POWER/X86 based server.

And a capacitor would be a nice addition, then maybe they could design a DRAM device.

Its one-off, single use nature will be a problem. But maybe we can figure out a way to feed all the S, M, and CR cells that make up all the gates (and storage) for the device. Sort of supplying biological power (food) to the device so that it could perform computations continuously.

Ok, maybe it will be glacially slow (as diffusion takes time). We could potentially speed it up by optimizing the diffusion/enzymatic processes. But it will never be the speed of modern computers.

However, it can be made very cheap and very dense when stacked. Just imagine a stack of these devices 40in tall that could potentially hold 4000-8000 or more processing elements with immense amounts of storage. And slowness may not be as much of a problem.
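A quick sanity check on that stack estimate, as a back of the envelope sketch. It assumes (my assumption, not the paper's) one processing element per stamped sheet, and just works out how thick each sheet would have to be.

```python
INCH_TO_MM = 25.4

def layer_thickness_mm(stack_height_in: float, layers: int) -> float:
    """Thickness each layer must have for `layers` sheets to fill the stack."""
    return stack_height_in * INCH_TO_MM / layers

# Assumption: one processing element per sheet in a 40in stack.
for layers in (4000, 8000):
    print(f"{layers} layers -> {layer_thickness_mm(40, layers):.3f} mm per layer")
```

That works out to a fraction of a millimeter per layer, which is the same order as a sheet of ordinary paper or light card stock, so the estimate doesn't look crazy.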

Now if we could just figure out how to plug it into an Ethernet network, then we’d have something.

Photo credit(s):

  • 2 Bit alu from Wikipedia
  • Figures 1 & 3 from Nature article 2D printed multi-cellular devices performing digital and analog computation

Storageless data!?

I (virtually) attended SFD21 earlier this year and a company called Hammerspace presented discussing their vision for storageless data (see videos of their session at SFD21).

We’ve talked with them before, but now they have something to offer the enterprise: data mobility, or storageless data.

The white board after David Flynn’s session at SFD8

In essence, customers want to be able to run their workloads wherever it makes the most sense, on prem, in private cloud, and in the public cloud among other places. Historically, it’s been relatively painless to transfer an application’s binary from one to another data center, to a managed service provider or to the public cloud.

And with VMware Cloud Foundation, Kubernetes, Docker and Linux operating everywhere, the runtime environment and other OS services that applications depend on are pretty much available in any of those locations. So now customers have 2 out of 3, what’s left?

It’s all about the Data

Data can take a very long time to move around a data center, let alone across the web between locations. MBs and even GBs of data may be relatively painless to move, but TBs of data can take days, and moving PBs of data is suicidal.

For instance, when we signed up for a globally accessible file synch and share storage service, I probably had 75GB or so of data I wanted managed. It literally took several days to upload this. Yes, I didn’t have data center class internet access, but even that might have only sped this up 2-5X. Ok, now try this with 1TB or more and it’s pretty much going to take days, and you can multiply that by 1,000 to do a PB. And that’s if the transfer happens to continue without disruption.
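To put rough numbers on this, here is a small sketch that estimates transfer time from data size and link speed. The link speeds and the 80% efficiency factor are illustrative assumptions of mine, not measurements, and real transfers (protocol overhead, throttling, restarts) will usually be slower.

```python
GB, TB, PB = 1e9, 1e12, 1e15   # decimal units, in bytes
SECONDS_PER_DAY = 86_400

def transfer_days(size_bytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move size_bytes over a link of link_mbps megabits/s.

    efficiency is the assumed fraction of the raw link rate actually achieved.
    """
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return size_bytes * 8 / usable_bits_per_second / SECONDS_PER_DAY

# Hypothetical links: a 10 Mbit/s home uplink and a 1 Gbit/s data center link.
for label, size in (("75GB", 75 * GB), ("1TB", TB), ("1PB", PB)):
    for mbps in (10, 1000):
        print(f"{label:>5} over {mbps:>4} Mbit/s: {transfer_days(size, mbps):10.2f} days")
```

Whatever the link, time scales linearly with size, so a PB takes 1,000 times as long as a TB over the same pipe.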

So what’s Hammerspace storageless data got to do with any of this?

Hammerspace’s idea

It’s been sort of a ground truth of storage, since I’ve been in the industry (40+ years now), that not all random IO data is accessed at the same frequency. That is, some data is accessed a lot and other data accessed hardly at all. That’s why DRAM caching of data can be so important to a host or storage system.

Similarly for sequential access, if you can get the first blocks of data to the host and then stream the rest in time, a storage system can appear to read fast.

Now I won’t go into all the tricks of doing good data caching (the secret sauce of every vendor’s enterprise storage), but if you can cache data well, you don’t actually need to transfer all the data associated with an application to the location it’s running in. You can appear as if all the data is there, when actually only some of it is present.

Essentially, Hammerspace creates a global file system for your data, across any locations you wish to use it, with great caching, optimized data transfer and with real storage behind it. Servers running your applications mount a Hammerspace file system/share that stitches together all the file storage behind it, across all the locations it’s operating in.

An application request goes to Hammerspace and if the data is not present there, Hammerspace goes and fetches and caches blocks of data as fast as it can. This will let the application start performing IO while the rest of the data is being cached and if allowed, moved to the new location.
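The general technique here is fetch on miss, or read-through caching. Below is a minimal sketch of it with LRU eviction. To be clear, this is not Hammerspace's implementation; the class name, the `fetch_remote` callback and the simulated remote site are all hypothetical, and it only shows why an application can start doing IO with just part of its data local.

```python
import random
from collections import OrderedDict
from typing import Callable

class ReadThroughCache:
    """Local block cache that fetches from a remote site only on a miss."""

    def __init__(self, fetch_remote: Callable[[int], bytes], capacity_blocks: int):
        self._fetch_remote = fetch_remote      # hypothetical remote read function
        self._capacity = capacity_blocks
        self._blocks = OrderedDict()           # block number -> data, in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self._blocks:
            self._blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_no]
        self.misses += 1
        data = self._fetch_remote(block_no)     # slow path: go get it
        self._blocks[block_no] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)    # evict least recently used
        return data

# Simulated remote site holding 1000 blocks; only 100 fit locally.
random.seed(42)
remote_site = {n: f"block-{n}".encode() for n in range(1000)}
cache = ReadThroughCache(remote_site.__getitem__, capacity_blocks=100)

for _ in range(10_000):
    # Skewed access: 90% of reads hit a hot set of 50 blocks.
    block = random.randrange(50) if random.random() < 0.9 else random.randrange(1000)
    cache.read(block)

print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.1%}")
```

With skewed access like this, most reads are served locally even though only a tenth of the data is cached, which is the whole point of the "not all data is accessed at the same frequency" observation.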

Storage can be unmanaged, read-write managed, or read-only managed by Hammerspace. Customers who want the whole Hammerspace storageless data functionality would use read-write mode. Customers who want to continue to access data directly, but also want global read access to it, would use read-only mode.

Once read-write storage is assigned to Hammerspace, it grabs all the file metadata information on the storage system. Once this process completes, customers no longer access this file data directly, but rather must access it through Hammerspace. At that point, this data is essentially storageless and can be accessed wherever Hammerspace services are available.

How does Hammerspace do it

Behind the scenes is a lot of technology, some of which is discussed in the SFD21 sessions (see videos above). Hammerspace is not in the data path but rather in the control path of data access. But it does orchestrate data movement, and it does route data IO requests from an application to where the data (currently) resides.

Hammerspace also supports Service Level Objectives (SLOs) for performance, geolocation, security, data protection options, etc. These can be used to keep data in particular regions, to encrypt data (using KMIP), to ensure high performance, high data availability, etc.

Hammerspace can manage data across 32 separate sites. It takes a couple of hours to deploy per site. Each site has a Hammerspace metadata service with standalone access to all data within that site. For example, standalone access could be used in the event of a network loss.

At the moment, they support eventual consistency and don’t support a global lock service. Rather, Hammerspace uses a conflict resolution service in the event data is overwritten by two or more applications. Any file that was being updated in two or more locations would be flagged as in conflict. Hammerspace would provide snapshots of the various versions of the file(s), and some sort of manual intervention would be required to resolve the conflict. Each location would have (temporary) access to the data it had written directly, but at some point the conflict would need resolution.
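Here is a toy sketch of that flag, snapshot and resolve flow, just to make the idea concrete. Everything in it (the `FileRecord` type, `reconcile`, `resolve`, the site names) is hypothetical and heavily simplified; it assumes all writes in a sync round started from the same file version, and it is not how Hammerspace actually does it.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FileRecord:
    version: int = 0
    content: bytes = b""
    in_conflict: bool = False
    snapshots: Dict[str, bytes] = field(default_factory=dict)  # site -> its version

def reconcile(record: FileRecord, site_writes: Dict[str, bytes]) -> FileRecord:
    """Merge the writes made at each site since the last sync round."""
    if not site_writes:
        return record                          # nothing changed anywhere
    if len(site_writes) == 1:
        (content,) = site_writes.values()      # single writer: just accept it
        return FileRecord(version=record.version + 1, content=content)
    # Two or more sites updated the file: flag it and keep every site's copy.
    return FileRecord(version=record.version, content=record.content,
                      in_conflict=True, snapshots=dict(site_writes))

def resolve(record: FileRecord, winning_site: str) -> FileRecord:
    """Manual intervention: an admin picks which site's snapshot wins."""
    return FileRecord(version=record.version + 1,
                      content=record.snapshots[winning_site])

base = FileRecord(version=3, content=b"original")
flagged = reconcile(base, {"denver": b"edit A", "london": b"edit B"})
print(flagged.in_conflict, sorted(flagged.snapshots))
print(resolve(flagged, "london"))
```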

They also support NFS and SMB file access for the front end and use object storage services for backend data. Data is copied on demand to the local site’s storage when accessed, based on the SLO policies in effect for it. During data movement it is copied up temporarily into objects on AWS, Microsoft Azure, or GCP, and then copied down to the location it’s being moved to. I believe this temporary object data is encrypted and compressed. Hammerspace supports KMIP key providers.

Pricing for Hammerspace is on a managed capacity basis. But anyone can use Hammerspace for up to 10TB for free. Hammerspace is available in the AWS Marketplace for configuration there.


Well it’s been a long time coming, but it appears to be here. Any customers wanting hybrid-cloud operations or global access to their data would be remiss to not check out Hammerspace.

[Edited after posting, The Eds.]

Tattoos that light up

Read an article the other day in ScienceDaily, titled Light-emitting tattoo engineered, which reported on research done by University College London and Istituto Italiano di Tecnologia (Italian Institute of Technology) (Ultrathin, ultra-comfortable and free-standing, tattooable LEDs – behind paywall).

The new technology out of their research can construct OLEDs, like those found in TVs, phones, and other displays, and apply them as temporary tattoos. The tattoos will eventually degrade and wash off, but while present on the skin they can light up and display information.

According to the Nanowerk news article reporting on the research, (see Light emitting tattoos engineered for the 1st time), the OLEDs are printed onto paper which can then be transferred to skin by the application of water. The picture above shows a number of the OLED tattoos ready for application.

The vision is that OLED tattoos, along with other flexible electronics, could provide wearable sensors of a person’s bio-chemical activity. Such sensors could be used in hospitals and in the home to display dehydration, glucose status, oxygenation, etc. as well as heart and breath rates. But in order to get to that vision there are a few steps that are needed.

Flexible, stretchable electronics

There have been a number of articles about creating flexible electronics (e.g., see A design to improve the resilience and electrical performance thin metal film based electrodes). This article was reporting on research done at the University of Illinois, Champaign-Urbana, published in Nature (behind paywall), but one of the researchers blogged about it in NaturePortfolio Devices & Materials (see: An atom-thick interlayer enables the electrical ductility of thin-film metal electrodes).

Flexible electronics can be constructed by creating a thin metal film with the electronics embedded in it placed on top of a flexible substrate. However, when that flexible substrate starts to deform or stretch it induces cracks in the thin metal films which lead to loss of conductivity, or loss of electronics function.

The research cited in the article above showed videos of cracking that takes place during deformation and stretching which would lead to loss of conductivity.

But the researchers at U of I found that if you place a thin layer of graphene or another 2D sheet of material between the electronic thin film and the flexible substrate, the cracks that eventually happen are much less harmful to electronic conduction or functioning. In other words, the layer provides electronic ductility. To add ductility to an electronic circuit using LEDs, the team applied an atomically thin (<1nm), 2D layer of graphene between it and the flexible substrate.

Somehow the graphene provided a mechanical buffer between the flexible substrate and the thin film electronics that allowed the circuits to have much more ductility. It appears that this mechanical buffer changed the type of cracking that occurs in the thin metal film, such that the cracks are shorter and more varied in direction rather than straight across, and this helped the circuits retain functioning longer than without the graphene layer.

The researchers at U of I actually created an LED display that could be bent without failure. See a video of them comparing the thin film vs thin film with 2D substrate.

Skin sensors

Moreover, there have been a number of articles discussing new wearable technologies that could be used to sense a person’s bio-chemical state. For example, recently reported research (see Do Sweat It! Wearable Microfluidic Sensor to Measure Lactate Concentration in Real Time) done at the Tokyo University of Science, published in Electrochimica Acta (behind paywall), talks about a sweat sensor that can be applied to skin to determine when athletes or others are getting dehydrated.

This sensor uses a micro-fluidics device which is printed with electronic ink. Such a device could be manufactured in volume and be readily printed onto surfaces that could be applied to the skin, anywhere sweat was being produced.

Future tattoos

Wearable sensors already surround us. We have watches that can tell our heart rates, walking/running speeds, step counts, etc. It doesn’t take much to imagine that most, if not all, of these could be fabricated on a thin film and, with the proper 2D substrate layer, be applied as a tattoo to a person while in the hospital. But all these sensors have lacked a readout or display up until now. With OLED readouts, wearable sensors now have a reasonable display capability.

The sweat sensor above uses microfluidics to do a lactate assay of sweat. The motion sensors in my watch use MEMS and an onboard IMU/GPS to determine speed and direction of movement. Electronic temperature sensors use thermoelectric effects. Blood oxygen sensors use LEDs and light sensors. None of these appears impossible to fabricate, miniaturize and print on thin films. Add OLEDs and why do we need a watch anymore?

What seems to be the most glaring omission is gas sensors (although the lactate micro-fluidic sensor is close). If we could somehow miniaturize gas sensors with enough sensitivity to detect glucose levels, immunological load, or specific diseases (COVID-19), then maybe there’d be a mass market for such devices beyond hospitals and smart watch users.

Then, with OLEDs and electronics that can be temporarily tattooed onto a person’s skin, why couldn’t this be a fashion accessory? I can imagine lots of people would be interested in lighting up messages, iconography or other data on their arms, hands, or other areas of the body. I wonder if it could be used to display hair on the top of my head :)?

And of course these OLED-electronics based tattoos are temporary. But if they are all made from electronic ink, it seems to me that such tattoos could be permanently printed (implanted?) onto a person’s skin.

Maybe at some future point a permanent OLED-electronics based tattoo could provide an electronic display and input device that could be used in conjunction with a phone or a smart watch. All it would take would be Bluetooth.



Data Science storage with NetApp’s Python Toolkit

I’ve got a book someplace (yet to be read completely) with the title Data science with Python. At Storage Field Day 21 last month, NetApp was there discussing a number of their product offerings, one of which was their Python SDK to manage NetApp storage for data scientists and AI researchers (see videos of their sessions here).

I’m not a data science expert but a Python SDK for storage management just makes so much sense to me I just had to take a look. Their GitHub repo is available online and they call it the NetApp Data Science Toolkit.


The challenge for data science and AI researchers is that it’s all about the data. How do you find the data, gain access to it, clean it, and process it quickly so you can do it all over again? Having some sort of Python SDK that allows you to do some rudimentary storage volume configuration, access, snapshotting, etc. can make these sorts of pipelines self-service, rather than going back and forth with operations to get volumes configured, mounted, and services established.

NetApp Data Science Toolkit

The NetApp Data Science Toolkit can be pip installed into anything with Python 3.5 or later and can be invoked via a command line or as a library of Python functions. The command line utility and the Python calls appear to be functionally equivalent.

pip3 install netapp-ontap pandas tabulate requests boto3

The Toolkit must be configured for your environment and NetApp storage, but once that’s done you’re ready to rock and roll.

MLOps pipeline from Google

The command line is invoked with


following that command are subcommands and parameters specifying what ONTAP operation you want to perform and how it is to be done. Python function calls seem to follow the same parameterization as the CLI.

The CLI and Python function calls can run on macOS or any Linux distribution. There’s a paper that discusses how to use the SDK to accelerate AI pipelines as well as another ReadMe that describes its use in Kubernetes with NetApp’s Trident CSI plugin.

The functionality supports NetApp AFF, FAS, Cloud Volumes and Select systems that are running ONTAP 9.7 or later. For a current list of ONTAP functions available, check out the toolkit. But as an overview, these ONTAP functions were available:

  • For Volume Management – cloning, creating, listing all, deleting or mounting a volume,
  • For Snapshot Management – creating, deleting, listing and restoring snapshots (of volumes)
  • For Data Fabric Management – listing all cloud sync relationships, triggering a cloud sync operation, multi-thread pulling a bucket down from S3 storage (into a NetApp volume directory), pulling a single object down from S3 into a file, pushing the contents of a directory to a bucket on S3 and pushing a file into an object on S3.
  • For Advanced Data Fabric Management – listing all SnapMirror relationships and triggering a sync operation for an existing SnapMirror relationship.

This is a pretty comprehensive list of NetApp ONTAP storage functionality. Having all this under the control of Python and a CLI for data scientists or AI researchers seems pretty impressive.
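To show what self-service might look like in a pipeline, here is a rough sketch of the snapshot-then-clone pattern. Fair warning: I'm not using the real toolkit here. `InMemoryVolumeStore` and its method names are hypothetical stand-ins of my own so the example runs without any ONTAP storage, and they should not be taken as the toolkit's actual function names or signatures.

```python
import copy

class InMemoryVolumeStore:
    """Hypothetical stand-in for a storage SDK; NOT the NetApp toolkit API."""

    def __init__(self):
        self.volumes = {}     # volume name -> {file path: data}
        self.snapshots = {}   # (volume name, snapshot name) -> frozen copy

    def create_volume(self, name):
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        self.volumes[name] = {}

    def create_snapshot(self, volume, snapshot):
        self.snapshots[(volume, snapshot)] = copy.deepcopy(self.volumes[volume])

    def restore_snapshot(self, volume, snapshot):
        self.volumes[volume] = copy.deepcopy(self.snapshots[(volume, snapshot)])

    def clone_volume(self, source, new_name):
        # A real array would make a space-efficient clone; here we just copy.
        self.create_volume(new_name)
        self.volumes[new_name] = copy.deepcopy(self.volumes[source])

def start_experiment(store, dataset_volume, run_id):
    """Protect the dataset with a snapshot, then give the run its own clone."""
    snapshot = f"before-{run_id}"
    store.create_snapshot(dataset_volume, snapshot)
    workspace = f"{dataset_volume}-{run_id}"
    store.clone_volume(dataset_volume, workspace)
    return workspace, snapshot

store = InMemoryVolumeStore()
store.create_volume("training-data")
store.volumes["training-data"]["raw.csv"] = b"id,label\n1,cat\n"

workspace, snapshot = start_experiment(store, "training-data", "run42")
store.volumes[workspace]["cleaned.csv"] = b"id,label\n1,cat\n"   # work in the clone
print(sorted(store.volumes[workspace]), sorted(store.volumes["training-data"]))
```

The point is that the data scientist does all of this from Python, in the same notebook as the rest of the pipeline, without filing a ticket with operations.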

Of course not every option for all those functions is supported, but it’s just a start (V1.1 of the toolkit). I’m sure there’s more to come, especially if customers demand it.

However, it would be nice to have an ONTAP simulator available with the toolkit that could be used to test out your Python code and CLI commands before using real NetApp storage. This would be very useful for those of us lacking our own test ONTAP storage, just hanging around on prem or in the cloud.

As Python becomes the language of choice for AI and now data science, it seems only natural that storage and data protection companies would start releasing Python SDKs/APIs for their product functionality. That way AI and data science researchers could embed any storage functionality they needed directly into their Python code or Jupyter Notebook application.

Having a Python SDK for NetApp ONTAP storage means using data storage for your MLOps or data science pipelines is that much easier.

Great move by NetApp. Ok where’s the rest of the industry?

Picture credit(s):