The myth of AGI

Sorry, I seem to be on an AGI bent this month…

Read an article the other day about a new book (The Myth of Artificial Intelligence, by Erik J. Larson) that explains why AI-ML-DL, given its current direction, is very unlikely to achieve artificial general intelligence (AGI). Amazon and others offer a short preview of the book, which is where most of this discussion comes from.

Types of (human) reasoning

As near as I can tell (I don’t have the book), it discusses the three types of reasoning that exist in human intellect, i.e., deduction, induction and abduction.

  • Deduction uses formal logic (or its equivalents) to derive facts or theorems from basic principles.
  • Induction uses a multitude of samples and constructs general principles from the analysis of them.
  • Abduction uses a set of probabilistic assertions and formal logic to come up with a probabilistic principle.

Deduction is most famously observed in geometry and arithmetic proofs and was most evident in the early years of AI through its use of expert systems. The challenge with expert systems is that the real world is vastly more complex than any geometrical or arithmetical artifice that humankind can produce.

Expert systems became champions of checkers, chess and some other games but in the end were not easily generalizable beyond a few restricted (gaming and medical) domains.

Induction is presently all the rage and represents what machine learning and deep neural networks (DNN) are doing with all that training data and resultant classification inferencing.

Today we have DNNs that can classify the objects in an image, can learn to play any game on the planet better than humans, and can even safely drive a car down the road.

The current AI world view is that this form of reasoning, DNN induction, if taken to its extreme, will ultimately result in some level of AGI, or human-equivalent levels of intelligence in a system. The author of the book begs to differ.

Abduction is less well known or discussed in rational circles. It’s essentially what any human does when presented with real world examples/experiences to derive an understanding (or principle) of what happened.

For example, a plate full of cookies last night becomes an almost empty plate of crumbs and two cookies. So what happened? Your son woke up early, consumed most if not all of them, and left for work. This is a probabilistic (most likely) inference, but it has a high probability of being true.

Any AGI will need all forms of reasoning

The challenge is that AI has already been through the deduction phase, with the rise of expert systems, which crashed and burned because of the cost and time required to produce an exhaustive and correct expert system. AI is currently in the induction phase, via DNN training, which seems far more generalizable and successfully usable in many different domains. But no one is talking seriously about doing abduction in AI (anymore).

The author claims (again, have not read the book) that any AGI will require as much abduction as induction (as well as perhaps deduction), and therefore, AGI is not inevitable based on our current AI DNN (or induction) intensive path.

Previous and current attempts at abduction reasoning

Some may recall fuzzy logic as one of the avenues taken after expert systems seemed to fail at doing successful and realistic inferencing around the end of last century. Fuzzy logic was a way of bringing probabilities into deduction, not unlike abduction as defined above. With fuzzy logic each assertion or base assumption was given a probabilistic value (of being true) and the final derivation was assigned some level of probability of being true.

The Wikipedia article has definitions for fuzzy AND, OR and NOT, which of course would allow any system to make these assertions (see the sketch below). But fuzzy logic (like expert systems above) suffered from the inability to exhaustively cover all examples in a real world situation.
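To make the fuzzy operators concrete, here’s a minimal sketch using the standard Zadeh definitions (min for AND, max for OR, 1−x for NOT) that the Wikipedia article describes. The “disk is hot” example is mine, purely for illustration.

```python
# Minimal sketch of Zadeh-style fuzzy logic operators (illustrative only).
# Truth values are degrees of membership in [0.0, 1.0] rather than True/False.

def fuzzy_not(a: float) -> float:
    return 1.0 - a

def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)

def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)

# Hypothetical assertions: "the disk is hot" = 0.8, "the fan is working" = 0.3
hot = 0.8
fan_ok = 0.3

# Degree of belief that the system is at risk: hot AND NOT fan_ok
at_risk = fuzzy_and(hot, fuzzy_not(fan_ok))   # min(0.8, 0.7) = 0.7
print(f"at_risk = {at_risk:.2f}")
```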

Furthermore, the (funny) thing about DNNs is that they are much more probabilistic than they appear. If one examines the classification outputs of any DNN, it is extremely rare to see boolean (true or false), yes or no answers. Mostly one sees a series of probabilities, one assigned to each classification bucket.

DNN systems hide these probabilities by just selecting the maximum (or minimum) probability generated as their final classification. This is entirely an artifact of needing to have some discrete output (classification selection). But DNN (internal) results are always probabilistic values.

So although pure induction doesn’t include probabilities, DNN induction, as practiced today in AI systems, uses probabilistic reasoning in every layer of a DNN and in its final results.
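Here’s a minimal sketch of what I mean. The raw scores are made up, but the pattern is typical: the network emits a score per class, a softmax turns those scores into probabilities, and the reported answer is just whichever class has the highest probability.

```python
import math

# Hypothetical raw scores (logits) from a classifier's final layer
# for three classes: cat, dog, raccoon.
logits = {"cat": 2.1, "dog": 1.3, "raccoon": -0.4}

# Softmax: convert the scores into a probability distribution over classes.
exp_scores = {k: math.exp(v) for k, v in logits.items()}
total = sum(exp_scores.values())
probs = {k: v / total for k, v in exp_scores.items()}
print(probs)        # roughly {'cat': 0.65, 'dog': 0.29, 'raccoon': 0.05}

# What gets reported as "the" classification is just the most probable class;
# the underlying probabilities are discarded.
prediction = max(probs, key=probs.get)
print(prediction)   # 'cat'
```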

What else may be missing from AI to allow AGI to be developed

Personally, AGI seems to require not just the reasoning approaches above, but a more workable and general purpose planning solution. I’ve tried to find out whether any researchers are using DNNs to provide general purpose planning solutions but have yet to find any (in publicly available research). This is probably the one place where expert (or fuzzy control) systems still shine. But again they are hard to generalize and prove almost impossible to make completely exhaustive.

Nonetheless, in the end, I think all the above just proves that there are a number of distinct reasoning and other (planning) techniques that may need to come together to provide AGI. As any of us can attest, all of these different approaches are available within any human intellect.

And if we assume that any AGI will need to follow the human design for intelligence (not a given), they will all need to be stitched together, combined and brought to bear to realize AGI.

But, at present, with all the focus on DNN/induction, we, as AI researchers, are not making any progress on using these other techniques or in combining them into a single system.

And for that I am happy. I would be very pleased to have any AGI be farther out than nearer term. Because for the life of me, AGI scares the s&#t out of me.

Mostly because I don’t see any real way to control AGI, once unleashed. That, and given the diversity of motives around this world, I don’t see any realistic mechanism to instill a universal and firm (unalterable) belief in the sanctity of human and other life, the dependence this life has on our environment/biosphere, and the rule of law needed to maintain peace across humankind (and I’m probably missing a half dozen more things that we would want any AGI to adhere to).

Maybe, if I saw more effort on how we as a species can come up with universal views on these and other topics, and on some way of instilling a system of programs with these unalterable beliefs and AGI controls based on them, I’d be less fearful of AGI emerging.

Lacking that, any way of delaying its emergence is fine by me.

Comments?

Photo Credit(s):

Anti-Gresham’s Law: Good information drives out bad

(Good information is in blue, bad information is in Red)

Read an article the other day in ScienceDaily (Faster way to replace bad info in networks) which discusses research published in a recent issue of the IEEE/ACM Transactions on Networking journal (behind a paywall). Luckily there was a pre-print available (Modeling and analysis of conflicting information propagation in a finite time horizon).

The article discusses information epidemics using the analogy of a virus and its antidote. This is where bad information (the virus) and good information (the antidote) circulate within a network of individuals (systems, friend networks, IoT networks, etc.). Such bad information could be malware and its good information counterpart could be a system patch to fix the vulnerability. Another example would be an outright lie about some event and its counterpart could be the truth about the event.

The analysis in the paper makes some simplifying assumptions. In any single individual (network node), the virus and the antidote cannot co-exist. That is, an individual (node) is either infected by the virus, cured by the antidote, or yet to be infected or cured.

The network is fully connected and complex. That is, once an individual in a network is infected, unless an antidote is introduced the infection proceeds to infect all individuals in the network. And once an antidote is created it will cure all individuals in a network over time. Some individuals in the network have more connections to other nodes while others have fewer.

The network functions in a bi-directional manner. That is, any node, let’s say RAY, can infect/cure any node it is connected to and, conversely, any node it is connected to can infect/cure the RAY node.

Gresham’s law (see Wikipedia article) is a monetary principle which states that bad money in circulation drives out good. Bad money is money whose commodity content is worth less than its face value and good money is money whose commodity content is worth more than its face value. In essence, good money is hoarded and people preferentially spend bad money.

My anti-Gresham’s law is that good information drives out bad. Where good information is the truth about an event, security patches, antidotes to infections, etc., and bad information is falsehoods, malware, biological viruses, etc.

The Susceptible-Infected-Cured (SIC) model

The paper describes a SIC model that simulates the epidemic propagation process, or the process whereby the virus and its antidote propagate throughout a network. This assumes that once a network node is infected (at time0), during the next interval (time0+1) it infects its nearest neighbors (nodes that are directly connected to it) and they in turn infect their nearest neighbors during the following interval (time0+2), etc., until all nodes are infected. Similarly, once a network node is cured it will cure all its neighbor nodes during the next interval and these nodes will cure all of their neighbor nodes during the following interval, etc., until all nodes are cured.
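Here’s a rough sketch of that propagation process as I understand it. The paper works with continuous-time dynamics and derives closed-form statistics; this discrete-interval simulation is just my simplification to show the mechanics, and the tie-breaking rule (cure beats infection within the same interval) is my own assumption.

```python
def sic_step(graph, state):
    """One discrete interval of a simplified SIC process.
    graph: dict node -> list of neighbor nodes
    state: dict node -> 'S' (susceptible), 'I' (infected) or 'C' (cured)
    Assumption: if a node receives both in the same interval, the cure wins."""
    new_state = dict(state)
    for node, neighbors in graph.items():
        if state[node] == 'I':
            for n in neighbors:
                if new_state[n] == 'S':           # infection only takes susceptibles
                    new_state[n] = 'I'
        elif state[node] == 'C':
            for n in neighbors:
                if new_state[n] in ('S', 'I'):    # cure converts anyone not yet cured
                    new_state[n] = 'C'
    return new_state

# Tiny example network: a line of five nodes.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
state = {n: 'S' for n in graph}
state[0] = 'I'   # bad information starts at node 0
state[4] = 'C'   # the antidote starts at node 4

t = 0
while 'I' in state.values():
    state = sic_step(graph, state)
    t += 1
print(f"bad information extinguished after {t} intervals: {state}")
```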

What can the SIC model tell us

The model provides calculations to generate a number of statistics, such as the half-life time of bad information and the extinction time of bad information. The paper discusses the SIC model across complex (irregular) network topologies as well as completely connected and star topologies, and derives formulas for each type of network.

In the discussion portion of the paper, the authors indicate that if you are interested in curing a population of bad information it’s best to map out the network’s topology and focus your cure efforts on those node(s) that lie along the most shortest paths within the network.

I wrongly thought that the best way to cure a population of nodes would be to cure the nodes with the highest connectivity. While this may work, and such nodes are no doubt along at least one if not all shortest paths, it may not be the optimum solution to reduce extinction time. If there are other nodes that lie on more shortest paths in the network, targeting those nodes with a cure works better (see the sketch below).
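If you want to find those “on the most shortest paths” nodes in practice, betweenness centrality is the standard graph measure for it (my suggestion, not necessarily the paper’s exact formulation). A sketch using networkx, with a made-up contact network:

```python
import networkx as nx

# Hypothetical friend/contact network (edges are bidirectional connections).
G = nx.Graph()
G.add_edges_from([
    ("amy", "bob"), ("bob", "carl"), ("carl", "dee"),
    ("dee", "ed"), ("bob", "dee"), ("amy", "frank"), ("frank", "bob"),
])

# Betweenness centrality = fraction of all shortest paths passing through
# each node; high values mark the best nodes to seed the cure with.
centrality = nx.betweenness_centrality(G)
targets = sorted(centrality, key=centrality.get, reverse=True)
print(targets[:2])   # the two most "between" nodes -- prime cure targets
```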

Applying the SIC model to COVID-19

It seems to me that we could model the physical social connectivity of individuals in a population (city, town, state, etc.). If we wanted to infect the highest portion of people in the shortest time, we would target shortest-path individuals to be infected first.

Conversely, if we wanted to slow down the infection rate of COVID-19, it would be extremely important to reduce the physical connectivity of individuals on the shortest paths in a population. Which is why social distancing, at least when broadly applied, works. It’s also why, when infected, self quarantining is the best policy. But if you wished to not apply social distancing broadly, perhaps targeting those individuals on the shortest paths to practice social distancing could suffice.

However, there are at least two other approaches to using the SIC model to eradicate (extinguish the disease) the fastest:

  1. If we were able to produce an antidote, say a vaccine, but one which had the property of being infectious (say a less potent strain of the COVID-19 virus), then targeting this vaccine to those people on the shortest paths in a network would extinguish the pandemic in the shortest time. Please note that, to my knowledge, any vaccine (course), if successful, will eliminate a disease and provide antibodies against any future infections of that disease. So the time when a person is infected with a vaccine strain is limited and would likely be much shorter than the time someone is infected with the original disease. And most vaccines, likely being weakened versions of an original disease, may not be as infectious. So in the wild the vaccine and the original disease would compete to infect people.
  2. Another approach to using the SIC model is to produce a normal (non-transmissible) vaccine and target vaccination to individuals on the shortest paths in a population network. Once vaccinated, these people would no longer be able to infect others and would block any infections to other individuals down network from them. One problem with this approach is if everyone is already infected; vaccinating anyone then will not slow down future infection rates.

There may be other approaches to using SIC to combat COVID-19 than the above but these seem most reasonable to me.

So, health organizations of the world, figure out your population’s physical-social connectivity network (perhaps using mobile phone GPS information) and target any cure/vaccination to those individuals on the highest number of shortest paths through your network.

Comments?

Photo Credit(s):

  1. Figure 2 from the Modeling and analysis of conflicting information propagation in a finite time horizon article pre-print
  2. Figure 3 from the Modeling and analysis of conflicting information propagation in a finite time horizon article pre-print
  3. COVID-19 virus micrograph, from USA CDC.

Where should IoT data be processed – part 1

I was at FlashMemorySummit 2019 (FMS2019) this week and there was a lot of talk about computational storage (see our GBoS podcast with Scott Shadley, NGD Systems). There was also a lot of discussion about IoT and the need for data processing done at the edge (or in near-edge computing centers/edge clouds).

At the show, I was talking with Tom Leyden of Excelero and he mentioned there was a real need for some insight on how to determine where IoT data should be processed.

For our discussion let’s assume a multi-layered IoT architecture, with 1000s of sensors at the edge, 100s of near-edge processing/multiplexing stations, and 1 to 3 core data center or cloud regions. Data comes in from the sensors, is sent to near-edge processing/multiplexing and then to the core data center/cloud.

Data size

Dans la nuit des images (Grand Palais) by dalbera (cc) (from flickr)

When deciding where to process data one key aspect is the size of the data. Data size is typically in GB or TB but, in today’s world, can be PB as well. This lone parameter has multiple impacts and can affect many other considerations, such as the cost and time to transfer the data, the cost of data storage, the amount of time to process the data, etc. All of these sub-factors depend on the size of the data to be processed.

Data size can be the largest single determinant of where to process the data. If we are talking about GB of data, it could probably be processed anywhere from the sensor edge, to near-edge station, to core. But if we are talking about TB the processing requirements and time go up substantially and are unlikely to be available at the sensor edge, and may not be available at the near-edge station. And PB take this up to a whole other level and may require processing only at the core due to the infrastructure requirements.

Processing criticality

Human or machine safety may depend on quick processing of sensor data, e.g. in a self-driving car, on a factory floor, for flood gauges, etc. In these cases, some amount of data processing (sufficient to ensure human/machine safety) needs to be done at the lowest point in the hierarchy that has the processing power to perform this activity.

This could be in the self-driving car or in the factory automation that controls a mechanism. Similar situations would probably apply for any robots and auto pilots. Anywhere an IoT sensor array is used to control an entity that could jeopardize the life of human(s) or the safety of machines, safety level processing needs to be done at the lowest level in the hierarchy.

If processing doesn’t involve safety, then it could potentially be done at the near-edge stations or at the core.

Processing time and infrastructure requirements

Although we talked about this in data size above, infrastructure requirements must also play a part in where data is processed. Yes sensors are getting more intelligent and the same goes for near-edge stations. But if you’re processing the data multiple times, say for deep learning, it’s probably better to do this where there’s a bunch of GPUs and some way of keeping the data pipeline running efficiently. The same applies to any data analytics that distributes workloads and data across a gaggle of CPU cores, storage devices, network nodes, etc.

There’s also an efficiency component to this. Computational storage is all about how some workloads can better be accomplished at the storage layer. But the concept applies throughout the hierarchy. Given the infrastructure requirements to process the data, there’s probably one place where it makes the most sense to do this. If it takes a 100 CPU cores to process the data in a timely fashion, it’s probably not going to be done at the sensor level.

Data information funnel

We make the assumption that raw data comes in through sensors, and more processed data is sent to higher layers. This would mean at a minimum, some sort of data compression/compaction would need to be done at each layer below the core.

We were at a conference a while back where they talked about updating deep learning neural networks. It’s possible that each near-edge station could perform a mini deep learning training cycle and share its learning with the core periodically, which could then send this information back down to the lowest level to be used (see our Swarm Intelligence @ #HPEDiscover post).

All this means that there’s a minimal level of processing of the data that needs to go on throughout the hierarchy between access point connections.

Pipe availability

binary data flow

The availability of a networking access point may also have some bearing on where data is processed. For example, a self driving car could generate TB of data a day, but access to a high speed, inexpensive data pipe to send that data may be limited to a service bay and/or a garage connection.

So some processing may need to be done between access point connections. This will need to take place at lower levels. That way, there would be no need to send the data while the car is out on the road but rather it could be sent whenever it’s attached to an access point.

Compliance/archive requirements

Any sensor data probably needs to be stored for a long time and as such will need access to a long term archive. Depending on the extent of this data, it may help dictate where processing is done. That is, if all the raw data needs to be held, then maybe the processing of that data can be deferred until it’s already at the core and on its way to archive.

However, any safety oriented data processing needs to be done at the lowest level and may need to be reprocessed higher up in the hierarchy. This would be done to ensure proper safety decisions were made. And needless to say, all this data would need to be held.

~~~~

I started this post with 40 or more factors but that was overkill. In the above, I tried to summarize the 6 critical factors which I would use to determine where IoT data should be processed (a toy sketch pulling them together follows).
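To pull these together, here’s a toy decision helper of my own devising. The parameter names, thresholds and precedence of the factors are arbitrary assumptions, not recommendations; it just encodes the rough logic of the discussion above.

```python
def where_to_process(data_size_tb: float,
                     safety_critical: bool,
                     needs_gpu_farm: bool,
                     has_pipe_now: bool,
                     must_archive_raw: bool) -> str:
    """Toy sketch mapping the factors discussed above to a processing tier.
    Thresholds are illustrative guesses only."""
    if safety_critical:
        return "sensor edge"            # safety decisions can't wait for a pipe
    if needs_gpu_farm or data_size_tb >= 1000:    # heavy reprocessing or PB scale
        return "core/cloud"
    if not has_pipe_now:
        return "near-edge station"      # reduce/hold data until a pipe is available
    if must_archive_raw:
        return "core/cloud"             # the raw data is headed to the archive anyway
    if data_size_tb >= 1:
        return "near-edge station"
    return "sensor edge"

print(where_to_process(0.05, safety_critical=True, needs_gpu_farm=False,
                       has_pipe_now=False, must_archive_raw=False))  # -> sensor edge
```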

My intent is, in a part 2 to this post, to work through some examples. If there’s any one example that you feel may be instructive, please let me know.

Also, if there are other factors that you would use to determine where to process IoT data, let me know.

Clouds an existential threat – part 2

Recall that in part 1, we discussed most of the threats posed by clouds to both hardware and software IT vendors. In that post we talked about some of the more common ways that vendors are trying to head off this threat (for now).

In this post we want to talk about some uncommon ways to deal with the coming cloud apocalypse.

But first, just to put the cloud threat in perspective, the IT TAM is estimated, by one major consulting firm, to be ~$3.8T in 2019 with a growth rate of 3.7% Y/Y. The same number for public cloud spending is ~$214B in 2019, growing by 17.5% Y/Y. If both growth rates continue (a BIG if), public cloud services spend will constitute essentially all (~98.7%) of IT TAM ~24 years from now. Now nobody would predict those growth rates will continue, but it’s pretty evident the growth trends are going the wrong way for (non-public cloud) IT vendors.
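For anyone who wants to check the arithmetic, here’s the crossover calculation using the same published estimates (my rounding comes out closer to ~23 years, but it’s the same ballpark):

```python
import math

it_tam, it_growth = 3.8e12, 0.037      # ~$3.8T IT TAM, growing 3.7% Y/Y
cloud, cloud_growth = 214e9, 0.175     # ~$214B public cloud, growing 17.5% Y/Y

# Years until public cloud spend equals total IT TAM, if both rates held:
years = math.log(it_tam / cloud) / math.log((1 + cloud_growth) / (1 + it_growth))
print(f"crossover in ~{years:.0f} years")                 # roughly 23 years

share = cloud * (1 + cloud_growth) ** 23 / (it_tam * (1 + it_growth) ** 23)
print(f"cloud share of IT TAM after 23 years: {share:.1%}")   # ~99-100%
```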

There are probably an infinite number of ways to deal with the cloud. But outside of the common ones we discussed in part 1, only a dozen or so seem feasible to me and even fewer are fairly viable for present IT vendors.

  • Move to the edge and IoT.
  • Make data center as easy and cheap to use as the cloud
  • Focus on low-latency, high data throughput, and high performing work and applications
  • Move 100% into services
  • Move into robotics

The edge has legs

Probably the first approach we should point out would be to start selling hardware and software to support the edge. Speaking in financial terms, the IoT/Edge market is estimated to be $754B in 2019, growing at over a 15.4% CAGR.

So we are talking about serious money. At the moment the edge is a very diverse environment, from cameras, to sensors, to moveable devices. And everybody seems to be in on the act: big industrial firms, small startups and everyone in between. Given this diversity it’s hard to see how IT vendors could make a decent return here. But given its great diversity, one could say it’s ripe for consolidation.

And the edge could use some reference architectures, where there are devices at the extreme edge, concentrators at the edge, higher level concentrators at nodes, and more at the core, etc. So there’s a look and feel to it that seems like Ro/Bo (remote office/branch office), central core, hub and spoke architectures, only on steroids, with leaf proliferation that can’t be stopped. And all that data coming in has to be classified, acted upon and understood.

There are a few vendors going into the Edge/IoT in a small way, but no one vendor personifies this approach more than Hitachi Vantara. The Hitachi family of companies has a long and varied history in OT (operational technology) or industrial technology. And over the last many years, HDS and now Hitachi Vantara have been pivoting their organization to focus more on IoT and edge solutions and seem to have made IoT, OT and the edge a central part of their overall strategy.

Hitachi Vantara has the advantage of being into industrial technology in a big way, so the products they create already operate in factories, rail yards, ship yards and other industrial sites around the world. So adding IoT and edge capabilities to their portfolio is a natural extension of this expertise.

However, Hitachi Vantara seems to be focusing on the software side of the edge. This may be an artifact of Hitachi family of companies dynamics. But it seems to be leaving some potential sales on the table.

There are plenty of other big industrial suppliers in this IoT/edge field but none seem to have the IT end of the market that Hitachi Vantara can claim to. Some sort of combination of a large IT vendor and a large industrial firm could potentially do the same.

So there’s plenty of money to be made with IoT/Edge hardware and software, one just has to go after it in a big way and there’s lots of competition. But all the competition seems to be on the same playing field (unlike the public cloud playing field).

Getting to “data center as a cloud”

There are a number of reasons why customers migrate work to the cloud: ease of use, ease of storage, ease of scale, access to myriad applications, access to multi-regional data centers, and an OPex financial model, to name just a few.

There’s nothing that says much of this couldn’t be provided at the data center. It’s mostly just a lot of open source software and a lot of common hardware. IT vendors can do this sort of work if they put their vast resources behind it.

From the pure software side, there are a couple of companies trying to do this, namely VMware and Nutanix, but (IBM) Red Hat, (Dell) Pivotal, HPE SimpliVity and others are also going after this approach.

Hardware wise, CI and HCI seem to be rudimentary steps towards common hardware that’s easy to deploy, operate and support. But these baby steps aren’t enough. And delivery to deployment in weeks is never going to get them there. If Amazon can deliver books, mattresses, bicycles, etc. in a couple of days, IT vendors should be able to do the same with some select set of common hardware and have it automatically deployable in seconds to minutes once powered on.

And operating these systems has to be drastically simplified. On any public cloud there’s really no tuning required, almost minimal configuration, and then it’s just load your data and go. Yes, there’s a marketplace to select (virtual) hardware, (virtual) storage hardware, (virtual) networking hardware, (virtual server) O/S and (virtual?) open source applications.

Yes, there’s a lot of software behind all that virtualization. And it’s fundamentally different than today’s virtualized systems. It’s made to operate only on commodity hardware and only with open source software.

The OPex financial model is less of a problem today. I find many vendors are offering their hardware (and some software) on an OPex, pay as you go model. More of this needs to be made available, but the IT vendors see this and are already aggressively moving in this direction.

The clouds are not standing still, what with Azure Stack, AWS and GCP all starting to provide versions of their stack on prem in the enterprise. This looks to be a strategic battleground between the clouds and IT vendors.

Making everything IT can do in the cloud available in the data center, with common hardware and software and with the speed and ease of deployment, operations and support (maintenance), should be on every IT vendor’s to do list.

Unfortunately, this is not going to stop the public cloud completely, but it has the potential to slow the growth rate. But time is short, momentum has moved to the public cloud and I don’t (yet) see the urgency of the IT vendors to make this transition happen today.

Focus on low-latency, high data throughput and high performance work

This is somewhat unfair as all the IT vendors are already involved in these markets in a big way. But, there are some trends here, that indicate this low-latency market will be even more important over time.

For example, more and more of commercial IT is starting to take advantage of big data and AI to profit from all their data. And big science is starting to migrate to IT, where massive data flows and data analysis tools are becoming important to the data center. If anything, the emergence of IoT and the edge will increase data flows that need to be analyzed, understood, and ultimately dealt with.

DNA genomics may be relegated to big pharma/medical but 3D visualization is becoming so mainstream that I can do it on my desktop. These sorts of things were relegated to HPC/big science just a decade or so ago. What tools exist in HPC today that the IT data center of the future will deem a necessary part of their application workload?

Is this a sizable TAM? Probably not today. In all honesty it’s buried somewhere in the IT TAM above. But it can be a growing niche, where IT vendors can stake out a defensive position that the cloud may have a tough time dislodging.

I say the cloud “may have trouble dislodging” because nothing says that the entire data flow/workflow couldn’t migrate to the cloud, if the responsiveness were available there. But, if anything, (guaranteed) responsiveness is one of the few Achilles heels of the public cloud. Security may be the other one.

We see IBM, Intel, and a few others taking this space seriously. But all IT vendors need to see where they can do better here.

Focus on services

This is not really out-of-the-box thinking. Some (old) IT vendors have been moving into services for over 50 years now; others are just seeing there’s money to be made here. Just about every IT vendor has deployment & support services, and most hardware vendors have break-fix services.

But standalone IT services are more specialized and in the coming cloud apocalypse, services will revolve around implementing cloud applications and functionality or migrating work from the cloud or (rarely in the future) back to on prem.

The TAM for services is buried in the total IT spend, but industry analysts estimate that total worldwide TAM for IT services will be about $1.0T in 2019, growing by a 2.6% CAGR.

So services are already a significant portion of IT spend today. And will probably not be impacted by the move to the cloud. I’d say that because implementing applications and services will still exist as long as the cloud exists. Yes it may get simpler (better frameworks, containerization, systemization), but it won’t ever go away completely.

Robots, the endgame

Ok, laugh now. I understand this is a big ask, to think that robot spending could supplement and maybe someday surpass IT spending. But we all have to think long term. What is a self driving car but a robotic data center on wheels, generating TB of data every day it’s driven?

Robots over the next century will invade every space, becoming ever present in and ever necessary to modern world functioning. They will have sophisticated onboard computing, motors, servos, sensors and onboard and backend processing requirements. The real low-latency workload of the future will be in the (computing) minds of robots.

Even if the data center moves entirely to the cloud, all robotic computation will never reside there because A) it’s too real time and B) it needs to operate well even disconnected from the Internet.

Is all this going to happen in the next 10 or 20 years? Maybe not, but 30 to 50 years out this world will have a multitude of robots operating within it.

Who’s going to develop, manufacture, support and sustain these mobile computing data centers on wheels, legs, slithering and flying bodies?

I would say IT vendors of today are uniquely positioned to dominate this market. Here too, the industry is very fragmented today. There are a few industrial robotic companies and just about every major auto manufacturer is going after self driving cars. And there are many bit players today. So it’s ripe for disruption and consolidation.

Yet, none of the major IT vendors seem to be going after this. Ok, Amazon (hardware & software) and Microsoft (software) have done work in this arena. If anything this should tell IT vendors that they need to start working here as well.

But alas, none have taken up the mantle. In the meantime robot startups are biting the dust left and right, trying to gain market traction.

~~~~

That seems to be about it for the major viable out of the box approaches to the public cloud threat. I have a few other ideas but none seem as useful as the above.

Let me know what you think.

Picture credit(s):

AI processing at the edge

Read a couple of articles over the past few weeks (TechCrunch: Google is making a fast, specialized TPU chip for edge devices … and IEEE Spectrum: Two startups use processing in flash for AI at the edge) about chips for AI at the IoT edge.

The two startups, Syntiant and Mythic, are moving to analog only or analog-digital solutions to provide AI processing needed at the edge while Google is taking their TPU technology to the edge.  We have written about Google’s TPU before (see: TPU and hardware vs. software  innovation (round 3) post).


The major challenge in AI processing at the edge is power consumption. Both  startups attack the power problem by using flash and other analog circuitry to provide power efficient compute.

Google attacked the power problem with their original TPU by reducing computational precision from 32-bit floating point to 8-bit integers. By reducing transistor counts, they lowered power requirements proportionally.

AI today is based on neural networks (NN), that connect simulated neurons via simulated synapses with weights attached to indicate whether to boost or decrease the signal being transmitted. AI learning is done by setting those weights and creating the connections between simulated neurons and the synapses.  So learning is setting weights and establishing connections. Actual inferences (using AI to do something) is a process of exciting input simulated neurons/synapses and letting the signal flow through the NN with each weight being used to determine output(s).

AI with standard compute

The problem with doing AI learning or inferencing with normal CPUs, or even GPUs running CUDA, is that the NN does thousands if not millions of multiplication-accumulation actions at each simulated synapse-neuron connection. Doing all these multiplication-accumulations takes power. CPUs and GPUs can do these sorts of operations on 32 or 64 bit integers or even floating point numbers, but it still takes power (see the sketch below).
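For concreteness, here’s what one layer’s worth of those multiply-accumulates looks like, spelled out in plain Python with toy dimensions; a real network repeats this across millions of weights, which is where the power goes.

```python
# One fully connected layer as explicit multiply-accumulate operations.
# Toy dimensions for illustration only.
inputs = [0.5, -1.2, 0.3]               # simulated neuron activations
weights = [[0.2, -0.4, 0.9],            # one row of synapse weights
           [0.7,  0.1, -0.3]]           # per output neuron

outputs = []
for row in weights:
    acc = 0.0
    for x, w in zip(inputs, row):
        acc += x * w                    # the multiply-accumulate that burns the power
    outputs.append(acc)

print(outputs)   # 2 outputs, each costing len(inputs) MAC operations
```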

AI processing power

AI processing power is measured in trillions of (accumulate-multiply) operations per second per watt (TOPS/W). Mythic believes it can perform 4 TOPS/W and Syntiant says it can do 20 TOPS/W. In comparison, the NVIDIA Volta V100 can do about 0.4 TOPS/W (according to the article). Although  comparing Syntiant-Mythic TOPS to NVIDIA TOPS is a little like comparing apples to oranges.

A current Intel Xeon Platinum 8180M (2.5GHz, 28 core processor, 205 W) can probably do (assuming one multiplication-accumulation per core per cycle) about 2.5 billion X 28 cores = 70 billion ops/sec / 205 W, or ~0.3 GOPS/W (source: Platinum 8180M data sheet).

As for Google’s TPU TOPS/W, TPU2 is rated at 45 TFLOPS/chip and the best guess for power consumption is between 160W and 200W, let’s say 180W. With power at that level, TPU2 should hit 0.25 TFLOPS/W. TPU3 is coming out with 8X the performance but it uses water cooling (read LOTS MORE POWER).
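Spelling out the back-of-the-envelope numbers above (same assumptions as in the text: one multiply-accumulate per core per cycle for the Xeon, and ~180W guessed for TPU2):

```python
# Xeon Platinum 8180M: 2.5 GHz x 28 cores, 205 W (assuming 1 op/core/cycle)
xeon_ops = 2.5e9 * 28                                 # ~70 billion ops/sec
print(f"Xeon: {xeon_ops / 205 / 1e9:.2f} GOPS/W")     # ~0.34 GOPS/W

# TPU2: ~45 TFLOPS/chip, power guessed at ~180 W
print(f"TPU2: {45e12 / 180 / 1e12:.2f} TFLOPS/W")     # 0.25 TFLOPS/W

# Claimed numbers from the articles, for comparison
print("NVIDIA V100: ~0.4 TOPS/W, Mythic: ~4 TOPS/W, Syntiant: ~20 TOPS/W")
```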

Nonetheless, it appears that Mythic and Syntiant are one to two orders of magnitude better than the best that NVIDIA and TPU2 can do today and many orders of magnitude better than Intel X86.

Improving TOPS/W

Using NAND as an analog memory to read, write and hold NN weights is an easy way to reduce power consumption. Combine that with analog circuitry that can do multiplication and addition with those flash values and you have an AI NN processor. This way you reduce the need to hold weights in memory and do compute in registers, by collapsing both compute and memory into the same componentry.
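Conceptually, the analog trick is just Ohm’s law plus Kirchhoff’s current law: store each weight as a cell conductance, apply the input as a voltage, and the summed bit line current is the multiply-accumulate, essentially for free. Here’s an idealized numerical sketch of that idea (real devices add noise, drift and limited precision, which I’m ignoring):

```python
# Idealized analog in-memory multiply-accumulate.
# Each weight is stored as a cell conductance G (siemens); the input arrives
# as a voltage V; each cell contributes current I = G * V (Ohm's law) and the
# bit line sums the currents (Kirchhoff), yielding the dot product.

conductances = [1e-6, 3e-6, 0.5e-6]   # stored weights, as conductances
voltages = [0.2, 0.5, 0.8]            # input activations, as voltages

bitline_current = sum(g * v for g, v in zip(conductances, voltages))
print(f"accumulated current: {bitline_current:.2e} A")  # proportional to the MAC result
```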

The major difference between Syntiant and Mythic seems to be the amount of analog circuitry they use. Mythic seems to relegate the analog circuitry to an accelerator while Syntiant has a more extensive use of analog circuitry throughout their chip. Probably why it can perform 5X the TOPS/W of Mythic’s IPU.

IBM and others have been working on neuromorphic chips some of which are analog based and others which are all digital based. We’ve written extensively on IBM and some on MIT’s approaches (for the latest on IBM see: More power efficient deep learning through IBM and PCM, and for MIT see: MIT builds an analog synapse chip) and follow the links there to learn more.

~~~~

Special purpose AI hardware is emerging from the labs and finally reaching reality. IBM R&D has been playing with it for a long time. Google is working on TPU3 so there’s no stopping them. And startups are seeing an opening and are taking everyone on. Stay tuned, we’re in for a good long ride before someone rises above the crowd and becomes the next chip giant.

Comments?

Photo Credit(s): TechCrunch  Google is making a fast, specialized TPU chip for edge devices … article

Introduction to Digital Design Verification at Mythic, Medium.com Article

Images from Google Cloud Platform Blog on the TPU

Two startups use processing in flash for AI at the edge, IEEE Spectrum article courtesy of Mythic

Surprises in flash storage IO distributions from 1 month of Nimble Storage customer base

We were at Nimble Storage (videos of their sessions) for Storage Field Day 10 (SFD10) last week and they presented some interesting IO statistics from data analysis across their 7500 customer install base using InfoSight.

As I understand it, the data are from all customers that have maintenance and are currently connected to InfoSight, their SaaS service solution for Nimble Storage. The data represents all IO over the course of a single month across the customer base. Nimble wrote a white paper summarizing their high level analysis, called Busting the myth of storage block size.

(#Storage-QoW 2015-002): Will we see 3D TLC NAND GA in major vendor storage products in the next year?


I was almost going to just say something about TLC NAND but there’s planar TLC and 3D TLC. From my perspective, planar NAND is on the way out, so we go with 3D TLC NAND.

QoW 2015-002 definitions

By “3D TLC NAND” we mean 3 dimensional (rather than planar or 2 dimensional) triple level cell (meaning 3 bits per cell rather than two [MLC] or one [SLC]) NAND technology. It could show up in SSDs, PCIe cards and perhaps other implementations. At least one flash vendor is claiming to be shipping 3D TLC NAND so it’s available to be used. We did a post earlier this year on 3D NAND, how high can it go. Rumors are out that startup vendors will adopt the technology but I have heard nothing about any major vendor’s plans for the technology.

By “major vendor storage products” I mean EMC VMAX, VNX or XtremIO;  HDS VSP G1000, HUS VM (or replacement), VSP-F/VSP G800-G600; HPE 3PAR, IBM DS8K, FlashSystem, or V7000 StorWize; & NetApp AFF/FAS 8080, 8060, or 8040. I tried to use 700 drives or better block storage product lines for the major storage vendors.

By “in the next year” I mean between today (15Dec2015) and one year from today (15Dec2016).

By “GA” I mean a generally available product offering that can be ordered, sold and installed within the time frame identified above.

Forecasts for QoW 2015-002 need to be submitted via email (or via twitter with email addresses known to me) to me before end of day (PT) next Tuesday 22Dec2015.

Thanks to Howard Marks (DeepStorage.net, @DeepStorageNet) for the genesis of this weeks QoW.

We are always looking for future QoW’s, so if you have any ideas please drop me a line.

Forecast contest – status update for prior QoW(s):

(#Storage-QoW 2015-001) – Will 3D XPoint be GA’d in  enterprise storage systems within 12 months? 2 active forecasters, current forecasts are:

A) YES with 0.85 probability; and

B) NO with 0.62 probability.

These can be updated over time, so we will track current forecasts for both forecasters with every new QoW.

 

An analyst forecasting contest ala SuperForecasting & 1st #Storage-QoW

I recently read the book Superforecasting: The Art and Science of Prediction by P. E. Tetlock & D. Gardner. Their Good Judgement Project has been running for years now and the book is the result of their experiments. I thought it was a great book.

But it also got me to thinking, how can industry analysts do a better job at forecasting storage trends and events?

Impossible to judge most analyst forecasts

One thing the book mentioned was that typically analyst/pundit forecasts are too infrequent, too vague and too time independent to be judged for accuracy. I have committed this fault as much as anyone in this blog and on our GreyBeards on Storage podcast (e.g., see our yearend podcast videos…).

What do we need to do differently?

The experiments documented in the book show us the way. One suggestion is to start putting time durations/limits on all forecasts so that we can better assess analyst accuracy. Another is to start estimating a probability for a forecast and updating your estimate periodically when new information becomes available. Another is to document your rationale for making your forecast. Also, do post mortems on both correct and incorrect forecasts to learn how to forecast better.

Finally, make more frequent forecasts so that accuracy can be assessed statistically. The book discusses Brier scores as a way of scoring the accuracy of forecasters (see the sketch below).
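For reference, a Brier score is just the mean squared error between your probability forecasts and what actually happened; 0 is perfect and lower is better. A minimal sketch of the common binary form (as I recall, the Good Judgement Project used a variant that sums over all outcome categories, which doubles the range for yes/no questions):

```python
def brier_score(forecasts, outcomes):
    """forecasts: probabilities assigned to the event occurring (0.0-1.0)
    outcomes: 1 if the event occurred, 0 if it didn't.
    Returns the mean squared error; 0.0 is perfect, and always hedging
    at 50/50 earns 0.25 no matter what happens."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical example: three forecasts, two events occurred, one didn't.
print(brier_score([0.85, 0.62, 0.30], [1, 1, 0]))   # ~0.086
```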

How to be better forecasters?

In the back of the book the authors publish a list of helpful hints or guidelines for better forecasting, which I will summarize here (read the book for more information):

  1. Triage – focus on questions where your work will pay off.  For example, try not to forecast anything that’s beyond say 5 years out, because there’s just too much randomness that can impact results.
  2. Split intractable problems into tractable ones – the author calls this Fermizing (after the physicist Enrico Fermi, who loved to ballpark answers to hard questions by breaking them down into easier questions to answer). So decompose problems into simpler (answerable) problems.
  3. Balance inside and outside views – search for comparisons (outside) that can be made to help estimate unique events and balance this against your own knowledge/opinions (inside) on the question.
  4. Balance over- and under-reacting to new evidence – as forecasts are updated periodically, new evidence should impact your forecasts. But a balance has to be struck as to how much new evidence should change forecasts.
  5. Search for clashing forces at work – in storage there are many ways to store data and perform faster IO. Search out all the alternatives, especially ones that can critically impact your forecast.
  6. Distinguish all degrees of uncertainty – there are many degrees of knowability, try to be as nuanced as you can and properly aggregate your uncertainty(ies) across aspects of the question to create a better overall forecast.
  7. Balance under/over confidence, prudence/decisiveness – rushing to judgement can be as bad as dawdling too long. You must get better at both calibration (how accurate multiple forecasts are) and resolution (decisiveness in forecasts). For calibration think weather rain forecasts: if rain tomorrow is forecast at 80% probability, then over time such rain probability estimates should be on average correct. Resolution is no guts no glory: if all your estimates are between 0.4 and 0.6 probable, you’re probably being too conservative to really be effective.
  8. During post mortems, beware of hindsight bias – e.g., of course we were going to have flash in storage because the price was coming down, controllers were becoming more sophisticated, reliability became good enough, etc., represents hindsight bias. What was known before SSDs came to enterprise storage was much less than this.

There are a few more hints than the above.  In the Good Judgement Project, forecasters were put in teams and there’s one guideline that deals with how to be better forecasters on teams. Then, there’s another that says don’t treat these guidelines as gospel. And a third, on trying to balance between over and under compensating for recent errors (which sounds like #4 above).

Again, I would suggest reading the book if you want to learn more.

Storage analysts forecast contest

I think we all want to be better forecasters. At least I think so. So I propose a multi-year contest, where someone provides a storage question of the week and analysts, such as myself, provide forecasts. Over time we can score the forecasts by computing a Brier score for each analyst’s set of forecasts.

I suggest we run the contest for 1 year to see if there’s any improvement in forecasting and decide again next year to see if we want to continue.

Question(s) of the week

But the first step in better forecasting is to have more frequent and better questions to forecast against.

I suggest that the analyst community come up with a question of the week. Then, everyone would get one week from publication to record their forecast. Over time, as the outcomes become known, we can then score analysts on their forecasting ability.

I would propose we use some sort of hash tag to track new questions, “#storage-QoW” might suffice and would stand for Question of the week for storage.

Not sure if one question a week is sufficient but that seems reasonable.

(#Storage-QoW 2015-001): Will 3D XPoint be GA’d in  enterprise storage systems within 12 months?

3D XPoint NVM was announced last July by Intel-Micron (I wrote a post about it here). By enterprise storage I mean enterprise and mid-range class, shared storage systems, that are accessed as block storage via Ethernet or Fibre Channel using SCSI device protocols or as file storage using SMB or NFS file access protocols. By 12 months I mean by EoD 12/8/2016. By GA’d, I mean announced as generally available and sellable in any of the major IT regions of the world (USA, Europe, Asia, or Middle East).

I hope to have my prediction in by next Monday with the next QoW as well.

Anyone interested in participating please email me at Ray [at] SilvertonConsulting <dot> com and put QoW somewhere in the title. I will keep actual names anonymous unless told otherwise. Brier scores will be calculated starting after the 12th forecast.

Please email me your forecasts. Initial forecasts need to be in by one week after the QoW goes live.  You can update your forecasts at any time.

Forecasts should be of the form “[YES|NO] Probability [0.00 to 0.99]”.

Better forecasting demands some documentation of the rationale for your forecasts. You don’t have to send me your rationale but I suggest you document it someplace you can refer back to during post mortems.

Let me know if you have any questions and I will try to answer them here.

I could use more storage questions…

Comments?

Photo Credits: Renato Guerreiro, Crystalballer