AI navigation goes with the flow

Read an article the other day (Engineers Teach AI to Navigate Ocean with Minimal Energy) about a simulated robot that was trained to navigate 2D turbulent water flow to travel between locations. They used a combination of reinforcement learning and a DNN-derived policy. The article was reporting on a Nature Communications open access paper (Learning efficient navigation in vortical flow fields).

The team was attempting to create an autonomous probe that could navigate the ocean and other large bodies of water to gather information. I believe ultimately the intent was to provide the navigational smarts for a submersible that could navigate terrestrial and non-terrestrial oceans.

One of the biggest challenges for probes like this is navigating turbulent flow without requiring a lot of propulsive power or a lot of computational power. The researchers noted that any probe that could propel itself faster than the current could easily travel wherever it wanted; the real problem is getting somewhere with a lower-powered submersible. As a result, they set their probe to swim at a constant speed equal to 80% of the overall simulated water flow.

Even that would be relatively feasible with unlimited computational power to train and inference with, but doing it on something that could fit in a small submersible was a significant challenge. NLP models today have millions of parameters and take hours to train, with multiple GPU/CPU cores in operation and lots of memory. Inferencing with these NLP models also takes a lot of processing power.

The researchers targeted something significantly smaller for their computational power and wished to train and perform real-time inferencing on the same hardware. They chose a “Teensy 4.0 micro-controller” board for their computational engine, which costs under $20, has ~2MB of flash memory and fits in a space smaller than 1.5″x1.0″ (38.1mm x 25.4mm).

The simulation setup

The team started their probe's turbulent flow training with a cylinder in a constant flow that generated downstream vortices rotating in opposite directions. These vortices would travel from left to right in the simulated flow field. In order for the navigation logic to traverse this vortical flow, they randomly selected start and end locations on different sides.

The AI model they trained and used for inferencing was a combination of reinforcement learning (with an interesting multi-factor reward signal) and a policy using a trained deep neural network. They called this approach Deep RL.

For reinforcement learning, they used a reward signal that was a function of three variables: the time the trip took, the change in distance to the target, and a success bonus if the probe reached the target. The time variable was a penalty equal to the duration of the swim. Distance to target was how much the euclidean distance between the current probe location and the target location had changed over time. The bonus was only applied when the probe came into close proximity to the target location. The researchers indicated the reward signal could also be used to optimize for other values, such as energy to complete the trip, surface area traversed, wear and tear on propellers, etc.
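To make that reward structure concrete, here's a minimal Python sketch of a three-part reward of this kind. The coefficients, capture radius and exact functional form are my own guesses for illustration, not the paper's actual values.

import numpy as np

def reward(prev_pos, curr_pos, target, dt, capture_radius=0.5,
           time_penalty=1.0, bonus=200.0):
    """Toy reward: a per-step time penalty, a term for progress made
    toward the target, and a one-time bonus on reaching it.
    All constants here are illustrative only."""
    prev_dist = np.linalg.norm(np.asarray(target) - np.asarray(prev_pos))
    curr_dist = np.linalg.norm(np.asarray(target) - np.asarray(curr_pos))
    r = -time_penalty * dt              # penalize elapsed swim time
    r += (prev_dist - curr_dist)        # reward reduction in distance to target
    done = curr_dist < capture_radius
    if done:
        r += bonus                      # success bonus near the target
    return r, done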

For the reinforcement learning state information, they supplied the probe-to-target relative location [the difference between probe (x,y) and target (x,y)] and whatever sensor data was being tested (e.g., for the velocity-sensor-equipped probe, the local velocity of the water at the probe’s location).

They trained the DNN policy using the state information (probe start and end location, local velocity/vorticity sensor data) to predict the swim angle used to navigate to the target. The DNN policy used 2 internal layers with 64 nodes each.
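Here's a rough Keras sketch of what such a policy network could look like, assuming the state is the (x,y) offset to the target plus the local flow velocity and the action is a single swim heading. The activation choices and action parameterization are my assumptions, not the paper's.

import math
import tensorflow as tf

def build_policy(state_dim=4):
    """Policy DNN sketch: two hidden layers of 64 nodes (per the paper).
    State assumed to be (dx, dy) to target plus local flow (u, v);
    output is a swim heading scaled to (-pi, pi)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="tanh"),
        tf.keras.layers.Lambda(lambda t: t * math.pi),  # heading in radians
    ])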

They benchmarked the Deep RL solution with local velocity sensing against a number of different approaches: a naive approach that always swam in the direction of the target; a flow-blind approach that had no sensors but trained on feedback from its own location changes; a vorticity-sensor approach that sensed the vorticity of the local water flow; and a complete-knowledge approach that had information on the actual flow at every location in the 2D simulation.

It turned out that of the first four (naive, flow-blind, vorticity sensor and velocity sensor), the velocity-sensor-equipped robot had the highest success rate (“near 100%”).

That simulated probe was then measured against the complete flow knowledge version. The complete knowledge version had faster trip speeds, but only 18-39% faster (on the examples shown in the paper). However, the knowledge required to implement this algorithm would not be feasible in a real ocean probe.

More to be done

They tried the probe's Deep RL navigation algorithm on a different simulated flow configuration, a double gyre flow field (sort of like two circular flows side by side, rotating in opposite directions).

The previously trained (on cylinder vortical flow) Deep RL navigation algorithm only had a ~4% success rate in the double gyre flow. However, after training the Deep RL navigation algorithm on the double gyre flow, it was able to achieve an 87% success rate.

So with sufficient re-training it appears that the simulated probe’s navigation Deep RL could handle different types of 2D water flow.

The next question is how well their Deep RL can handle real 3D water flows, such as tidal flows, up-down swells, long-term currents, surface wind-wave effects, etc. It’s probable that any navigation for real-world flows would need a multitude of Deep RL trained algorithms to handle each and every flow encountered in real oceans.

However, the fact that training and inferencing could be done on the same small hardware indicates that the Deep RL approach could possibly be deployed in any flow: let it train on the local flow conditions until it succeeds, then let it loose until it starts failing again. Training each time would take a lot of propulsive power, but that may be acceptable for some probes.

The researchers have 3D printed a submersible with a Teensy microcontroller and an Arduino controller board, with propellers surrounding it so it can swim in any 3D direction. They have also constructed a water tank for use in real-life testing of their Deep RL navigation algorithms.

CTERA, Cloud NAS on steroids

We attended SFD22 last week and one of the presenters was CTERA (for more information please see the SFD22 videos of their session), discussing their enterprise-class, cloud NAS solution.

We’ve heard a lot about cloud NAS systems lately (see/listen to our GreyBeards on Storage podcast with LucidLink from last month). Cloud NAS systems provide a NAS (SMB, NFS, and S3 object storage) front end that uses cloud or on-prem object storage to hold customer data, which is accessed through (virtual or hardware) caching appliances.

These differ from file synch and share in that Cloud NAS systems

  • Don’t copy most or all customer data to user devices; the only data that resides locally is metadata and the user’s or site’s working set (of files).
  • Do cache working set data locally to provide faster access
  • Do provide NFS, SMB and S3 access along with user drive, mobile app, API and web based access to customer data.
  • Do provide multiple options to host user data in multiple clouds or on prem
  • Do allow for some levels of collaboration on the same files

Although admittedly, the boundary lines between synch and share and Cloud NAS are starting to blur.

CTERA is a software defined solution. But, they also offer a whole gaggle of hardware options for edge filers, ranging from a smartphone-sized, 1TB flash cache appliance for home office users to a multi-RU media edge server with 128TB of hybrid disk-SSD storage for 8K video editing.

They have HC100 edge filers, X-Series HCI edge servers, branch-in-a-box, edge and media edge filers. These latter systems have specialized support for MacOS and Adobe suite systems. For their HCI edge systems they support Nutanix, SimpliVity, HyperFlex and VxRail systems.

CTERA edge filers/servers can be clustered together to provide higher performance and HA. This way customers can scale-out their filers to supply whatever levels of IO performance they need. And CTERA allows customers to segregate (file workloads/directories) to be serviced by specific edge filer devices to minimize noisy neighbor performance problems.

CTERA supports a number of ways to access cloud NAS data:

  • Through (virtual or real) edge filers which present NFS, SMB or S3 access protocols
  • Through the use of CTERA Drive on MacOS or Windows desktop/laptop devices
  • Through a mobile device app for IOS or Android
  • Through their web portal
  • Through their API

CTERA uses an HA, dual-redundant Portal service, which is a cloud (or on-prem) service that provides the CTERA metadata database, edge filer/server management and other services, such as web access, cloud drive endpoints, mobile apps, the API, etc.

CTERA uses S3- or Azure-compatible object storage for its back-end, source-of-truth repository to hold customer file data. CTERA currently supports 36 on-prem and in-cloud object storage services. Customers can have their data in multiple object storage repositories. Customer files are mapped one to one to objects.
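As a generic illustration of what a one-to-one file-to-object mapping onto S3-compatible storage can look like, here's a small sketch using boto3. The bucket name, key scheme and endpoint below are made up; CTERA's actual on-object format isn't public here.

import boto3

# Generic illustration only: how a cloud NAS might map one file to one
# object in an S3-compatible bucket. Names below are assumptions.
s3 = boto3.client("s3", endpoint_url="https://objectstore.example.com")

def put_file_as_object(local_path, share, relative_path, bucket="cloud-nas-data"):
    key = f"{share}/{relative_path}"          # 1:1 file-to-object mapping
    with open(local_path, "rb") as f:
        s3.put_object(Bucket=bucket, Key=key, Body=f)
    return key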

CTERA offers global dedupe, virus scanning, policy based scheduled snapshots and end to end encryption of customer data. Encryption keys can be held in the Portals or in a KMIP service that’s connected to the Portals.

CTERA has impressive data security support. Besides the end-to-end data encryption mentioned above, they also support dark sites and zero-trust authentication, and are DISA (Defense Information Systems Agency) certified.

Customer data can also be pinned to edge filers. Moreover, specific customer (directory/sub-directory) data can be hosted on specific buckets so that data can:

  • Stay within specified geographies,
  • Support multi-cloud services to eliminate vendor lock-in

CTERA file locking is what I would call hybrid. They offer strict consistency for file locking within sites but eventual consistency for file locking across sites. There are performance tradeoffs for strict consistency, so by using a hybrid approach, they offer most of what the world needs from file locking without incurring the performance overhead of strict consistency across sites. For another way to support hybrid file-locking consistency, check out LucidLink’s approach (see the GreyBeards podcast with LucidLink above).

At the end of their session Aron Brand got up and took us into a deep dive on select portions of their system software. One thing I noticed is that the Portal is NOT in the data path. When an edge filer wants to access a file, the Portal provides credential verification and points the filer(s) to the appropriate object, and the filers take it from there.

CTERA’s customer list is very impressive. It seems that many large enterprises (50 of the worldwide Fortune 500) are customers of theirs. Some of the more prominent include GE, McDonald's, the US Navy, and the US Air Force.

Oh, and besides supporting potentially 1000s of sites and 100K users in the same namespace, they also have intrinsic support for multi-tenancy and offer cloud data migration services. For example, one can use Portal services to migrate cloud data from one cloud object storage provider to another.

They also mentioned they are working on supplying K8S container access to CTERA’s global file system data.

There’s a lot to like in CTERA. We hadn’t heard of them before but they seem focused on enterprises with lots of sites, boatloads of users and massive amounts of data. It seems like our kind of storage system.

Comments?

The birth of biocomputing (on paper)

Read an article this past week discussing how researchers in Barcelona, Spain have constructed a biological computing device on paper (see Biocomputer built with cells printed on paper). Their research was written up in a Nature article (see 2D printed multi-cellular devices performing digital or analog computations).

We’ve written about DNA computing and storage before (see DNA IT …, DNA Computing… posts and our GBoS podcast on DNA storage…). But this technology takes all that to another level.

2-bit ALU (from wikimedia.org)

The challenges with biological computing previously had been how to perform input, processing and output within a single cell, or, when using multiple cells for computations, how to wire the cells together to provide the combinational logic required for the circuit.

The researchers in Spain seem to have solved the wiring problem by using diffusion across a porous surface (like paper) to create a carrier signal (the wire equivalent) and having cell groups at different locations along this diffusion path enhance, block, amplify/reduce or transform that diffusion into something different.

Analog (combinatorial-circuitry-like) computation in this biocomputer is performed based on the location of sets of cells along this carrier signal. So spatial positioning is central to the device and the computation it performs. Not unlike digital or combinatorial circuitry, different computations can be performed just by altering the position along the wire (carrier signal) at which gates (cells) are placed.

Their process seems to start with designing multiple cell groups to provide the processing desired, i.e., enhancing, blocking, transforming the diffusion along the carrier signal, etc. Once they have the cells required, they determine the spatial layout for the cells used in the logical circuit to perform the computation desired. Next, they create a stamp with wells (or indentations) that can be filled with the cells required for the computation. Finally, they fill these wells with cells and the nutrients for their operation and stamp the circuit onto a porous surface.

The carrier signal the research team uses is a small molecule, the bacterial 3OC6HSL acyl homoserine lactone (AHL), which is naturally used in a sort of biological quorum sensing. The computational cells produce an enzyme that enhances or degrades the AHL flowing along the carrier signal. The AHL diffuses across the paper, encounters these computational cells along the way, and together they compute whatever is required. At some point a cell transforms AHL levels into something externally observable.

They created:

  • Source cells (Sn) that take a substance as input (say mercury) and convert it into AHL
  • Gate cells (M) that act as a switch on the AHL diffusing across the substrate.
  • Carrier reporter cells (CR) which can be used to report on concentrations of AHL.

The CR cells produce a green fluorescent reporter protein (GFP). Moreover, each gate cell also expresses a red fluorescent reporter protein (RFP), as a sort of diagnostic tap into its individual activity.

[Figure from the Nature paper: a transistor-like architecture mapped onto a stamped cellular pattern, with source (S1) cells, gate (M) cells responding to external inputs (e.g., arabinose), and drain (CR) reporter cells responding to the AHL carrier signal. Without arabinose, the carrier signal reaches the CR cells and induces GFP; with arabinose, the gate cells produce the AHL-cleaving enzyme Aiia, which degrades the carrier signal. Average fold change 5.6x.]

Using S, M and CR cells they are able to create any type of gate needed. This includes OR, AND, NOR and XNOR gates and just about any truth table needed. With this level of logic they could potentially implement any analog circuit on a piece of paper (if it was big enough).
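To get a feel for how gate behavior can fall out of cell placement along a diffusing carrier signal, here's a toy 1D model in Python. Every constant in it is invented; it's a cartoon of the idea, not the paper's model. With the two modulatory cells placed as below, the reporter lights up only when neither input is present, i.e., NOR behavior.

import numpy as np

def carrier_profile(gate_positions, gate_active, n=100, decay=0.02, block=0.85):
    """Cartoon of the idea: AHL produced by source cells at x=0 diffuses
    along a 1D paper strip; each *active* modulatory (M) cell placed on
    the strip degrades the carrier signal. All constants are invented."""
    signal = 1.0
    profile = np.zeros(n)
    for x in range(n):
        signal *= (1.0 - decay)                  # passive loss along the strip
        for pos, active in zip(gate_positions, gate_active):
            if pos == x and active:
                signal *= (1.0 - block)          # active gate cleaves AHL
        profile[x] = signal
    return profile

def reporter_on(profile, reporter_pos=90, threshold=0.05):
    """CR cells express GFP only if enough carrier signal reaches them."""
    return profile[reporter_pos] > threshold

# NOR-like behavior: output is ON only when neither input gate is active.
for a in (False, True):
    for b in (False, True):
        out = reporter_on(carrier_profile([30, 60], [a, b]))
        print(f"A={a}, B={b} -> GFP {'ON' if out else 'OFF'}")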

[Figure from the Nature paper: implementation of OR, AND, NOR and XNOR logic gates as paper strips with different cell placements. Gates with two S1 sources (OR, XNOR) use two branches; the others (NOR, AND) use a single branch. Average fold changes: OR 14.31x, AND 6.21x, NOR 6.58x, XNOR 5.6x.]

As we learn in circuits class, any digital logic can be reduced to one of a few gates, such as NAND or NOR.

As an example use of the biocomputer, they implemented a mercury-level-sensing device. Once the device is dipped in a solution containing mercury, it displays a number of green fluorescent dots indicating the mercury level of the solution.

The bio-logic computer can be stamped onto any surface that supports agent diffusion, even flexible surfaces such as paper. The process can create a single-use bio-logic computer, a sort of smart litmus paper that could be used once and then recycled.

The computational cells stay “alive” during operation by metabolizing the nutrients they were stamped with. As the biocomputer uses biological cells and paper (or any flexible, diffusible substrate), and cells can be reproduced ad infinitum for almost no cost, biocomputers like this can be made very inexpensively. Once designed (and the input cells and stamp created), they can be manufactured like a printing press churns out magazines.

~~~~

Now I’d like to see some sort of biological clock capability that could be used to transform this combinatorial logic into digital logic. And then combine all this with DNA based storage and I think we have all the parts needed for a biological, ARM/RISC V/POWER/X86 based server.

And a capacitor would be a nice addition, then maybe they could design a DRAM device.

Its one-off nature, or single use, will be a problem. But maybe we can figure out a way to feed all the S, M, and CR cells that make up all the gates (and storage) for the device. Sort of supplying biological power (food) to the device so that it could perform computations continuously.

Ok, maybe it will be glacially slow (as diffusion takes time). We could potentially speed it up by optimizing the diffusion/enzymatic processes. But it will never be the speed of modern computers.

However, it can be made very cheaply and very densely. Just imagine a stack of these devices 40 inches tall: it could potentially contain 4000-8000 or more processing elements with immense amounts of storage. At that point the slowness may not be as much of a problem.

Now if we could just figure out how to plug it into an ethernet network, then we’d have something.

Photo credit(s):

  • 2 Bit alu from Wikipedia
  • Figures 1 & 3 from Nature article 2D printed multi-cellular devices performing digital and analog computation

Tattoos that light up

Read an article the other day in ScienceDaily titled Light-emitting tattoo engineered, which was reporting on research done by University College London and the Istituto Italiano di Tecnologia (Italian Institute of Technology) (Ultrathin, ultra-comfortable and free-standing, tattooable LEDs, behind paywall).

The new technology out of their research can construct OLEDs, the display elements found in TVs, phones, and other displays, and apply them as temporary tattoos. The tattoos will eventually degrade or wash off, but while present on the skin they can light up and display information.

According to the Nanowerk news article reporting on the research, (see Light emitting tattoos engineered for the 1st time), the OLEDs are printed onto paper which can then be transferred to skin by the application of water. The picture above shows a number of the OLED tattoos ready for application.

The vision is that OLED tattoos along with other flexible electronics could provide wearable sensors of bio-chemical activity of a person. Such sensors could be used in hospitals and in the home to display dehydration, glucose status, oxygenation, etc. as well as be able to display heart and breath rates. But in order to get to that vision there’s a few steps that are needed.

Flexible, stretchable electronics

There have been a number of articles about creating flexible electronics (e.g., see A design to improve the resilience and electrical performance of thin metal film based electrodes). That article was reporting on research done at the University of Illinois, Urbana-Champaign, published in Nature (behind paywall), which one of the researchers blogged about in Nature Portfolio Devices & Materials (see: An atom-thick interlayer enables the electrical ductility of thin-film metal electrodes).

Flexible electronics can be constructed by creating a thin metal film with the electronics embedded in it, placed on top of a flexible substrate. However, when that flexible substrate starts to deform or stretch, it induces cracks in the thin metal film, which leads to loss of conductivity or loss of electronic function.

The research cited in the article above showed videos of cracking that takes place during deformation and stretching which would lead to loss of conductivity.

But the researchers at U of I found that if you place a thin layer of graphene or another 2D sheet of material between the electronic thin film and the flexible substrate, the cracks that eventually form are much less harmful to electronic conduction or function; in other words, the interlayer provides electrical ductility. To add ductility to an electronic circuit using LEDs, the team applied an atomically thin (<1nm) 2D layer of graphene between it and the flexible substrate.

Somehow the graphene provided a mechanical buffer between the flexible substrate and the thin film electronics that allowed the circuits to have much more ductility. It appears that this mechanical buffer changed the type of cracking that occurs in the thin metal film: the cracks are shorter and more varied in direction rather than running straight across, and this helped the circuits retain function longer than films without the interlayer.

The researchers at U of I actually created an LED display that could be bent without failure. See their video comparing the thin film with and without the 2D interlayer.

Skin sensors

Moreover, there have been a number of articles discussing new wearable technologies that could be used to sense a person's bio-chemical state. For example, research reported on recently (see Do Sweat It! Wearable Microfluidic Sensor to Measure Lactate Concentration in Real Time), done at the Tokyo University of Science and published in Electrochimica Acta (behind paywall), talks about a sweat sensor that can be applied to skin to determine when athletes or others are getting dehydrated.

This sensor uses a micro-fluidic device printed with electronic ink. Such a device could be manufactured in volume and readily printed onto surfaces that could be applied to the skin anywhere sweat is being produced.

Future tattoos

Wearable sensors already surround us. We have watches that can tell our heart rates, walking/running speeds, step counts, etc. It doesn’t take much to imagine that most if not all of these could be fabricated on a thin film and, with the proper 2D interlayer, applied as a tattoo to a person while in the hospital. But all these sensors have lacked a readout or display until now. With OLED readouts, wearable sensors now have a reasonable display capability.

The sweat sensor above uses microfluidics to do a lactate assay of sweat. The motion sensors in my watch use MEMS and an onboard IMU/GPS to determine speed and direction of movement. Electronic temperature sensors use thermoelectric effects. Blood oxygen sensors use LEDs and light sensors. None of these appears impossible to fabricate, miniaturize and print on thin films. Add OLED readouts, and why would we need a watch anymore?

What seems to be the most glaring omission is gas sensors (although the lactate micro-fluidic sensor is close). If we could somehow miniaturize gas sensors with enough sensitivity to glucose levels, immunological load, specific diseases (COVID19), then maybe there’d be a mass market for such devices, outside of a hospital or smart watch users.

Then, with OLEDs and electronics that can be temporarily tattooed onto a person's skin, why couldn't this be a fashion accessory? I can imagine lots of people would be interested in lighting up messages, iconography or other data on their arms, hands, or other parts of their bodies. I wonder if it could be used to display hair on the top of my head :)?

And of course these OLED-electronics based tattoos are temporary. But if they are all made from electronic ink, it seems to me that such tattoos could be permanently printed (implanted?) onto a person's skin.

Maybe at some future point a permanent OLED-electronics based tattoo could provide an electronic display and input device that could be used in conjunction with a phone or a smart watch. All it would take is Bluetooth.

Comments?

Ok, maybe neuromorphic chips aren’t a deadend

Those of you who follow my blog will no doubt recall that I pronounced neuromorphic chips dead (see our Are neuromorphic chips a deadend blog post). Not because the hardware technology wasn’t improving or good enough, but because software support for the technology was sorely lacking and it was extremely complex or nigh impossible to program and use.

Meanwhile, GPUs, TPUs and other more “normal” neural network hardware and accelerators were all able to utilize standard, easy-to-use, mostly open source AI DL frameworks. And all this hardware was steadily improving, coming out regularly with more power and performance, with no end in sight.

But then I attended AIFD1 (AI Field Day 1) and at one of the sessions, Anil Mankar, COO & Co-Founder of a company named BrainChip Inc (see the video of their talk), presented yet another neuromorphic chip, called the AKIDA Neural Processor. The current generation of the technology is available in their AKD 1000 SoC, focused on IoT solutions. But they had created a software development environment that allows one to take standard, TensorFlow-trained neural network models and deploy them on their hardware. And that got my interest.

BrainChip’s AKIDA AKD 1000 hardware AND software

Their AI DL neuromorphic chip is made up of Event Domain Neural Processing Units (NPUs). AKIDA technology is focused on low-power, sensor-like applications. They claim to save power by only consuming power (i.e., running) when an event takes place. They are also able to save on memory requirements by using 1, 2 or 4 bits (vs. 8, 16, 32 or more bits) for model weights/activations.

Their hardware seems to run spiking neural networks (SNNs, see our blog post on another chip technology using SNNs). In their SDK, they have a CNN2SNN tool that can take any trained (TensorFlow) CNN model and convert it to an SNN, which can then run on their AKIDA technology.
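Based on their description at AIFD1, the workflow seems to be: take a trained Keras CNN, quantize it to low-bit weights, convert it with the CNN2SNN tool, and deploy it to the AKIDA device. The sketch below is how I imagine that looks; the package, function names and arguments are my recollection and should be treated as assumptions, not a verified API reference.

import tensorflow as tf
# Package and function names below (cnn2snn.quantize/convert) are my
# recollection of BrainChip's tooling; treat them as assumptions.
from cnn2snn import quantize, convert

# Start from an ordinary trained Keras CNN.
keras_model = tf.keras.models.load_model("my_trained_cnn.h5")

# Quantize weights/activations down to the low-bit precision AKIDA
# expects, then convert the quantized model to a spiking model.
quantized = quantize(keras_model, weight_quantization=4, activ_quantization=4)
akida_model = convert(quantized)

# The converted model can then be mapped onto the AKD 1000 for
# event-based inferencing.
akida_model.summary()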

They also have an AKIDA Model Zoo with a handful of pre-trained CNN-type models that have already been converted to run on their technology, and they provide a tutorial on the technology. Mankar said that if you understand how to use TensorFlow Keras today to construct and train your models, it shouldn’t be too hard to use their tools to do what you want.

Their chip hardware is available today as a PCIe card, an M.2 form factor card, or as a chip. Finally, they also license their AKIDA IP to other chip designers.

AKIDA AKD 1000 performance

At AIFD1, Mankar showed statistics on the performance and accuracy attained using their chip vs. standard 32-bit floating point CNN implementations.

As discussed above, their processor uses 1-4 bits for weight quantization and as such loses some accuracy, but only by one to a few percent vs. the same models using a 32-bit floating point CNN implementation.
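For a feel of what 4-bit weight quantization does, here's a generic uniform-quantization sketch (not BrainChip's actual scheme) showing the memory saving vs. 32-bit floats and the small per-weight error introduced.

import numpy as np

def quantize_uniform(w, bits=4):
    """Uniform symmetric quantization of a weight tensor to `bits` bits.
    Generic illustration only, not BrainChip's actual scheme."""
    levels = 2 ** (bits - 1) - 1                  # e.g., 7 levels each side for 4 bits
    scale = np.max(np.abs(w)) / levels
    q = np.clip(np.round(w / scale), -levels, levels).astype(np.int8)
    return q, scale                               # dequantize with q * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_uniform(w, bits=4)
print("float32 bytes:", w.nbytes)                 # 64*64*4  = 16384
print("4-bit bytes  :", w.size * 4 // 8)          # 64*64/2  =  2048 (packed)
print("max abs error:", np.max(np.abs(w - q * scale)))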

Because of their smaller weights, AKIDA uses less memory and less bandwidth to update models vs. models using larger weights.

As shown in the chart, the memory required for the 8-bit deep learning algorithm (DLA) models was all significantly larger than the memory requirements for the AKIDA solution. For one algorithm, AKIDA required ~1/2 the memory of the 8-bit DLA version of the model.

Mankar also provided information on the number of calculations required per inference using AKIDA vs. 8-bit DLAs.

Just to set the stage, MMACs/inference is the (millions of) multiply-accumulate operations required to perform a single inference with the selected CNN model. ImageNet (1000), ImageNette (20) and Visual Wake Words models are all standard CNN models that have been pre-trained on vast repositories of data and can run in many hardware environments. The non-AKIDA solutions above were all running an 8-bit DLA CNN model. Activity regularization is a training method that penalizes large activations, shrinking the weight changes made during training to reduce model overfit.
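As a back-of-the-envelope example of what MACs per inference means, the multiply-accumulate count of a single convolution layer can be computed as follows (the layer dimensions below are arbitrary, picked just for the arithmetic):

def conv_macs(out_h, out_w, out_ch, k_h, k_w, in_ch):
    """Multiply-accumulates for one conv layer: every output element
    needs k_h*k_w*in_ch multiplies and adds."""
    return out_h * out_w * out_ch * (k_h * k_w * in_ch)

# e.g., a 3x3 conv, 64->128 channels, on a 56x56 feature map:
macs = conv_macs(56, 56, 128, 3, 3, 64)
print(f"{macs/1e6:.1f} MMACs for this one layer")   # ~231 MMACs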

He also showed some comparisons of their technology vs. Intel’s Loihi hardware. Loihi is another neuromorphic chip, whose original introduction prompted me to write the “Are neuromorphic chips a deadend” post (link above). Unfortunately, I didn’t capture any of these charts, but from my recollection, they showed that AKIDA technology used slightly less power than Loihi technology in all their comparisons.

AKIDA technology demo

In their live, on-camera demo, they used a previously downloaded, trained VGG16 (if I recall correctly) CNN model. Offline, they had replaced the last classification layer with a (blank, untrained) dense layer, converted this to an SNN and downloaded it onto one of their boards. They had developed an application that used this board with a camera to perform further CNN training or CNN image inferencing (classification).

They first (one-shot) trained their board’s model to recognize the background of what the camera was seeing and then proceeded to perform (one-shot) trainings to classify toy tigers, elephants and cars. All this was completed in real time during the demo. They were able to verify the training took hold using pictures of tigers, elephants and cars, as well as classify all the toys in different orientations and even a different toy car.

The AIFD1 crowd (a tough one) said they had seen all this before but would be really interested to see if the chip could distinguish between different cars (one a toy race car and the other a toy police car). On camera, they were able to re-train their CNN to distinguish between (toy) car 1 and car 2 and classify properly between the two of them. There were one or two instances where the CNN model was confused, but they were able to re-train it to recognize the toy car and place it into the correct classification (using two-shot[?] learning).

At AIFD1, Mankar also presented detailed, real-world data on how they were able to perform keyword spotting, person detection, E-nose classification, E-tongue classification, and auditory (E-ear?) classification in embedded sensor systems.

AKIDA technology limitations

At the moment, their chip doesn’t support neural networks that use memory, such as LSTMs or RNNs, but it seems to work fine for any CNN, as was shown multiple times in the data they presented and in their demo.

We were really impressed with their software stack, liked what we saw of their hardware/IP, and enjoyed their demo and its one-shot learning. Check out their videos (link above) for more information on them.

Photo Credit(s): all charts are from BrainChip Inc’s website or were presented at their AIFD1 session

Open source digital assistant

I’ve purchased a number of digital assistants over the last couple of years from both Google and Amazon, but not Apple. At first their novelty drove me to use them for a number of things. But over time I started to use them only for playing music or telling jokes. And then I started to hear about some other concerns with the technology.

The problems with today’s vendor-based digital assistants

My (and others’) main concern was their ability to listen in on conversations in the home and workplace without being queried. Yes, there are controls on some of them to turn off the mic and thus any recording. But these are not hardwired switches; as software, they may or may not work depending on the implementation. As such, there is no guarantee that they won’t still be recording audio even with their mic (supposedly) turned off.

At one point I saw a news article where police had subpoenaed recordings of a digital assistant to use in a criminal case. Now I’m OK with this for specific, court-approved criminal cases, but what’s to limit its use to those? And not all courts, or governments for that matter, are as protective of personal privacy as some.

Open source digital assistant on the way

But an open source digital assistant, one where the user has complete programmatic control over its recording and use of audio data, is another matter. I suppose this doesn’t necessarily help the technically challenged among us who can’t program our way out of a paper bag, but even for those individuals, the fact that an open source version exists to protect privacy could be construed as something much more secure than a company or vendor’s product.

All that made it very interesting when I saw an article recently about a project put together at Stanford, an “Open source challenger to popular virtual assistants”.

How to create an open source digital assistant

The main problem facing an open source digital assistant is the need for massive amounts of annotated training request data. This is one of the main reasons that commercial digital assistants often record conversations when not specifically requested.

But Stanford University, which is responsible for creating the open source digital assistant above, has managed to design and create a “rules-based” system to help generate all the training data needed for a virtual assistant.

They can then use all this automatically generated training data to train the digital assistant’s natural language processing neural network to understand what’s being asked and drive whatever action is being requested.

At the moment the digital assistant (and its conversation generator) has somewhat limited skills, or rather only works in a restricted set of domains such as restaurants, people, movies, books and music. For example: “identify a restaurant near me that has deep dish pizza and is rated greater than 4 on a 5 point scale”, “find me a mystery novel that is about magic”, or “who was the 22nd president of the USA”.

But as the digital assistant and its annotated, rules-based conversation generator are both open source, anyone can contribute more skills code or add more conversational capabilities. Over time, with enough participation, it could perhaps someday perform all of the skills or capabilities of commercial digital assistants.

Introducing Almond and Stanford’s OVAL

Stanford’s work on this project comes out of their OVAL (Open Virtual Assistant Lab). Their open source virtual assistant is called Almond.

Almond’s verbal generator is called Genie and uses compositional technology to generate conversations that are used to train their linguistic user interface (LUInet). Almond also uses ThingTalk, a new declarative programming language, to process responses to queries and requests. Finally, Almond makes use of Thingpedia, a repository of information about internet services and IoT devices, to tell it how to interact with these systems.

Stanford Genie technology

The technology behind Genie is based on using source text statements to create templates that can generate sentences for any domain you wish to have Almond work in. If one is interested in expanding the Almond domains, they can create their own templates using the Genie toolkit.

One essentially provides a small set of input sentences that are converted into templates and used by Genie to understand how to parse all similar sentences. This enables Almond to “understand” what’s being requested of it.

The set of input sentences can start small and be augmented or added to over time to handle more diverse or complex queries or requests. Their GitHub toolkit and Genie technology are described more fully in the paper Genie: A generator of natural language semantic parsers for virtual assistant commands.

Stanford ThingTalk declarative language

ThingTalk is the programming language used to control what Almond can do for requests and queries. Essentially it’s a multi-part statement about what to do when a request comes along. The main parts in a ThingTalk statement include:

  1. When a particular action is supposed to be triggered.
  2. What service the request needs in order to perform its action.
  3. What action is requested.

The “what service the request needs” part is based on open API calls (see Thingpedia below). The “what action is requested” part can either be a standard Almond action or invoke other Thingpedia open source API calls, such as creating a tweet, posting on FB, sending email, etc.

For example, a ThingTalk statement looks like:

monitor @com.foxnews.get() => @com.slack.send();

This monitors Fox News for any new articles and sends them (the links) to your Slack channel.

Stanford Thingpedia

Thingpedia is an open source repository of structured information available on the web and of API services available on the web. Structured information or data is the information behind calendars, contact databases, article repositories, etc., any of which can be queried for information and some of which can be updated or have actions performed on them. API services are the means by which those queries and actions are performed.

One page of the Thingpedia multi-page summary of services that are offered

The Thingpedia web page shows a number of services that already have open source APIs defined and registered, for example Twitter, Facebook, Bing search, BBC News, Gmail and a host of other services. More are being added all the time, and these represent the domains that Almond can act upon.

Some of these domains are more fully defined than others. But in any case, any service that takes the form of a web-based API can be added to Thingpedia.

Thingpedia as a standalone open source repository is valuable in and of itself, regardless of its use by Almond. But Almond would be impossible without Thingpedia. Thingpedia wants to be the Wikipedia of APIs.

Almond, putting it all together

Almond consists mainly of the Almond Agent, the Engine and Thingpedia. The Agent is used by the various Almond implementations to parse and understand a request and access the ThingTalk program statement. The Almond Agent uses its LUInet natural language interpreter to interpret the request and select the ThingTalk program for it. Once the ThingTalk program is identified, the Engine uses the various Thingpedia APIs requested by the ThingTalk statement to generate the proper API calls to the service being requested and to generate any output that is requested.

Where can you run Almond

Almond is currently available as a web app, an Android app, a GNOME (Linux) desktop/laptop app, or a CLI application, and can be run on your Mac or Windows computer. You could of course create your own smart speaker to run Almond, or perhaps hack a current smart speaker to do so.

One important consideration is that with the Android app, all your data and credentials are stored only on the phone and will not go out to the cloud or elsewhere. I didn’t see similar statements about privacy protections for the web app or any of the other deployments. But as Almond is open source, you potentially have much greater control over where your data resides.

~~~~

What I would really like is a smart speaker app running on an RPi with a microphone and a decent speaker attached, all packaged in a cube or cylinder.

I thought their videos on Almond were pretty cheesy, but the technology is very interesting and could potentially make for an interesting competitor to today’s smart assistants.

Photo Credit(s):

All photos and graphics from Stanford Almond and OVAL Lab websites.

Your Rx on blockchain

We have been discussing blockchain technology for a while now (e.g., see our posts Etherium enters the enterprise, Blockchains go mainstream, and our podcast Discussing blockchains with Donna Dillenberger). And we were at VMworld 2019 where there was brief mention of Project Concord and VMware’s blockchain use for supply chain management.

But recently there was an article in Science Daily about the use of blockchain technology to improve prescriptions. This was a summary of a research paper on Cryptopharmaceuticals (paper behind paywall).

How cryptopharmaceuticals could work

Essentially the intent is to use a medical blockchain to fight counterfeit drugs. They have a proof-of-concept iOS & Android MedBlockChain app that shows how it could work, but it’s just a sample of some of the functionality and doesn’t use an external blockchain.

The MedBlockChain app would create a platform and have at least two sides to it.

  • One side would be the pharmaceutical manufacturers, which would use the blockchain to add or check in the medications they manufacture, in an immutable fashion. So the blockchain would essentially hold an unfalsifiable record of each pill or batch of pills ever manufactured by pharmaceutical companies around the world.
  • The other side would be used by a person taking a medication. Here they could use the app to check whether the medication they are taking was manufactured by a certified supplier of the drug. Presumably there would be a QR code or something similar that could be read off the medicine package or pill itself. The app would scan the QR code and then use MedBlockChain to look up the provenance of the medication to see whether it’s valid or a fraudulent copy (a toy sketch of this check-in/verify flow follows below).
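Here's a toy hash-chained ledger illustrating those two sides. A real MedBlockChain would be a distributed blockchain with consensus, signatures and so on; every field, name and value below is invented purely to illustrate the check-in/verify idea.

import hashlib, json, time

class MedLedger:
    """Toy hash-chained ledger illustrating the check-in / verify idea.
    Everything here (field names, flow) is invented for illustration."""
    def __init__(self):
        self.chain = [{"index": 0, "prev": "0" * 64, "data": "genesis"}]

    def _hash(self, block):
        return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

    def check_in(self, manufacturer, batch_id):
        """Manufacturer side: immutably record a batch (QR code = batch_id)."""
        block = {"index": len(self.chain),
                 "prev": self._hash(self.chain[-1]),
                 "data": {"mfr": manufacturer, "batch": batch_id, "ts": time.time()}}
        self.chain.append(block)

    def verify(self, batch_id):
        """Consumer side: scan the QR code and look up its provenance."""
        return any(isinstance(b["data"], dict) and b["data"]["batch"] == batch_id
                   for b in self.chain[1:])

ledger = MedLedger()
ledger.check_in("Acme Pharma", "BATCH-2021-0042")
print(ledger.verify("BATCH-2021-0042"))   # True -> genuine batch
print(ledger.verify("BATCH-FAKE-9999"))   # False -> possible counterfeit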

The example MedBlockChain app also has more medical information that could be made available on the blockchain, such as test results, body measurements, vitals, etc. These could all be stored immutably in MedBlockChain and provided to medical practitioners. How such (HIPAA-controlled) personal medical information would be properly secured and only supplied in plaintext to appropriate personnel is another matter.

~~~~
Cryptopharmaceuticals and MedBlockChain remind me of IBM’s blockchain providing diamond provenance and other supply chain services, only in this case applied to medications. Diamond provenance makes sense because of diamonds’ high cost, but drugs seem to me a harder market to crack.

I was going to say that such a market may not exist in first world countries. But then I saw a Wikipedia article on counterfeit medicines (bad steroids and cancer medicines with no active ingredients). It appears that counterfeit/fraudulent medications are a problem wherever you may live.

Then of course, the price of medications seems to be going up. So maybe, it could start as a provenance tool for expensive medications and build a market from there.

How to convince manufacturers and the buying public to use the blockchain is another matter. It’s sort of a chicken-and-egg thing. You need the manufacturers to use it for the medications, pills or batches that they manufacture. Doing so adds overhead, time and additional expense, and they would need to add a QR code or something similar to every pill, pen or other drug delivery device.

Then maybe you could get consumers and medical practitioners administering drugs to start using it to validate expensive meds. Starting with expensive medications could potentially build the infrastructure, consumer/medical practitioner and pharmaceutical company buy in that would kick start the MedBlockChain. Once started there, it could work its way down to more widely used medications.

Comments?

Where should IoT data be processed – part 1

I was at FlashMemorySummit 2019 (FMS2019) this week and there was a lot of talk about computational storage (see our GBoS podcast with Scott Shadley, NGD Systems). There was also a lot of discussion about IoT and the need for data processing done at the edge (or in near-edge computing centers/edge clouds).

At the show, I was talking with Tom Leyden of Excelero and he mentioned there was a real need for some insight on how to determine where IoT data should be processed.

For our discussion let’s assume a multi-layered IoT architecture, with 1000s of sensors at the edge, 100s of near-edge processing/multiplexing stations, and 1 to 3 core data center or cloud regions. Data comes in from the sensors, is sent to near-edge processing/multiplexing and then to the core data center/cloud.

Data size

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

When deciding where to process data, one key aspect is the size of the data. This can be GB or TB, but in today's world it can be PB as well. This lone parameter has multiple impacts and affects many other considerations, such as the cost and time to transfer the data, the cost of data storage, the amount of time to process the data, etc. All of these sub-factors depend on the size of the data to be processed.

Data size can be the largest single determinant of where to process the data. If we are talking about GB of data, it could probably be processed anywhere from the sensor edge, to the near-edge station, to the core. But if we are talking about TB, the processing requirements and time go up substantially; that capacity is unlikely to be available at the sensor edge and may not be available at the near-edge station. And PB takes this up to a whole other level and may require processing at the core due to the infrastructure requirements.

Processing criticality

Human or machine safety may depend on quick processing of sensor data, e.g., in a self-driving car, on a factory floor, with flood gauges, etc. In these cases, some amount of data processing (sufficient to ensure human/machine safety) needs to be done at the lowest point in the hierarchy that has the processing power to perform this activity.

This could be in the self-driving car or the factory automation that controls a mechanism. Similar situations would probably apply to robots and autopilots. Anywhere an IoT sensor array is used to control an entity that could jeopardize human life or the safety of machines, safety-level processing needs to be done at the lowest level in the hierarchy.

If processing doesn’t involve safety, then it could potentially be done at the near-edge stations or at the core.

Processing time and infrastructure requirements

Although we talked about this in data size above, infrastructure requirements must also play a part in where data is processed. Yes sensors are getting more intelligent and the same goes for near-edge stations. But if you’re processing the data multiple times, say for deep learning, it’s probably better to do this where there’s a bunch of GPUs and some way of keeping the data pipeline running efficiently. The same applies to any data analytics that distributes workloads and data across a gaggle of CPU cores, storage devices, network nodes, etc.

There’s also an efficiency component to this. Computational storage is all about how some workloads can better be accomplished at the storage layer. But the concept applies throughout the hierarchy. Given the infrastructure requirements to process the data, there’s probably one place where it makes the most sense to do this. If it takes 100 CPU cores to process the data in a timely fashion, it’s probably not going to be done at the sensor level.

Data information funnel

We make the assumption that raw data comes in through sensors, and more processed data is sent to higher layers. This would mean at a minimum, some sort of data compression/compaction would need to be done at each layer below the core.

We were at a conference a while back where they talked about updating deep learning neural networks. It’s possible that each near-edge station could perform a mini deep learning training cycle and share its learning with the core periodically, which could then send this information back down to the lowest level to be used (see our Swarm Intelligence @ #HPEDiscover post).

All this means that there’s a minimal level of data processing that needs to go on throughout the hierarchy.

Pipe availability

binary data flow

The availability of a networking access point may also have some bearing on where data is processed. For example, a self driving car could generate TB of data a day, but access to a high speed, inexpensive data pipe to send that data may be limited to a service bay and/or a garage connection.

So some processing may need to be done between access point connections. This will need to take place at lower levels. That way, there would be no need to send the data while the car is out on the road but rather it could be sent whenever it’s attached to an access point.

Compliance/archive requirements

Any sensor data probably needs to be stored for a long time and as such will need access to a long-term archive. Depending on the extent of this data, that may help dictate where processing is done. That is, if all the raw data needs to be held, then maybe the processing of that data can be deferred until it’s already at the core and on its way to archive.

However, any safety-oriented data processing needs to be done at the lowest level and may need to be reprocessed higher up in the hierarchy, to ensure proper safety decisions were made. And needless to say, all this data would need to be held.

~~~~

I started this post with 40 or more factors but that was overkill. In the above, I tried to summarize the 6 critical factors which I would use to determine where IoT data should be processed.
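As a toy illustration only, here's how those factors might be combined into a simple placement helper. The thresholds and priority ordering are mine, invented for the example, not a validated policy.

def where_to_process(size_gb, safety_critical, needs_gpu_cluster,
                     pipe_available, must_archive_raw):
    """Toy encoding of the factors above; thresholds and ordering are
    illustrative only."""
    if safety_critical:
        return "sensor edge"                     # safety decisions stay at the lowest level
    if needs_gpu_cluster or size_gb > 1000:      # TB-scale data or heavy training
        return "core/cloud"
    if not pipe_available:
        return "near-edge"                       # reduce/compress until a pipe is available
    if must_archive_raw:
        return "core/cloud"                      # process on the way to archive
    return "near-edge" if size_gb > 10 else "sensor edge"

print(where_to_process(size_gb=2000, safety_critical=False,
                       needs_gpu_cluster=True, pipe_available=True,
                       must_archive_raw=True))   # -> core/cloud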

My intent is, in part 2 of this post, to work through some examples. If there’s any one example that you feel may be instructive, please let me know.

Also, if there are other factors that you would use to determine where to process IoT data, let me know.