The Data Wall – AGI part 11, ASI part 2

Went to a conference the other week (Cloud Field Day 20) and heard a term I hadn’t heard before, the Data Wall. I wasn’t sure what this meant but thought it an interesting concept.

Then later that week, I read an article online, Situational Awareness – The Decade Ahead, by Leopold Aschenbrenner, which talked about the path to AGI. He predicts it will happen in 2027, and ASI in 2030. However, he also discusses many of the obstacles to reaching AGI, and one key roadblock is the Data Wall.

This is a follow-on to our long-running series on AGI (see AGI part 10 here), and with this we are creating a new series on Artificial Super Intelligence (ASI) and have relabeled an earlier post as ASI part 1.

The Data Wall

LLMs, these days, are being trained on the internet’s text, images, video and audio. However, the vast majority of the internet is spam, junk and trash. And because of this, LLMs are rapidly reaching (bad) data saturation. There’s only so much real intelligence to be gained from scraping the internet.

The (LLM) AI industry apparently believes that there has to be a better way to obtain clean, good training data for their LLMs, and if that can be found, true AGI is just a matter of time (and compute power). This current wall of garbage data is prohibiting true progress toward AGI, and it is what is meant by the Data Wall.

Leopold doesn’t go into much detail about solutions to the data wall other than to suggest that Deep Reinforcement Learning may be one (see below). Given the importance of this bottleneck, every LLM company is trying to solve it. And as a result, any solutions to the Data Wall will likely end up being proprietary, because solving it enables AGI.


But the real gist of Leopold’s paper is that AGI and its follow-on, Artificial Super Intelligence (ASI), will be the key to enabling or retaining national supremacy in the near future (the next decade and beyond).

And that any and all efforts to achieve this must be kept a national top secret. I think he wants to see something similar to the Manhattan Project created in the USA, only rather than working to create an atom/hydrogen bomb, it would be focused on AGI and ASI.

The problem is that when AGI, and its follow-on ASI, is achieved, it will represent an unimaginable advantage to the country/company that owns it. Such technology, if applied to arms, weapons and national defense, will be unbeatable in any conflict. And it could conceivably be used to defeat any adversary before a single shot was fired.

The AGI safety issue

In the paper, Leopold talks about AGI safety, and his proposed solution is to have AGI/ASI agents focused on crafting the technologies to manage/control this. I see the logic in this and welcome it, but feel it’s not sufficient.

I believe (a minority view, it seems, these days) that rather than having a few nation states or uber corporations own and control AGI, it should be owned by the world, and be available to all nation states/corporations and ultimately every human on the planet.

My view is that the only way to safely pass through the next “existential technological civilizational bottleneck” (e.g., AGI is akin to atomic weapons, genomics and climate change, all of which could potentially end life on earth) is to have many AGIs that can compete effectively with one another. Hopefully such competition will keep them all in check and, in the end, have them be focused on the betterment of all of humanity.

Yes, there will be many bad actors that will take advantage of AGI and any other technology to spread evil, disinformation and societal destruction. But to defeat this, AGI needs to become ubiquitous, everywhere, and in that way these agents can be used to keep the bad actors in check.

And of course keeping the (AGI/ASI) genie in the bottle will be harder and harder as time goes on.

Computational performance is going up 2X every few years. So building a cluster of 10K H200 GPUs, while extremely cost prohibitive today for any but uber corporations and nation states, will in a decade or so be something any average sized corporation could put together in their data center (or use in the cloud). And in another decade or so, it could be built into your own personal basement data center.
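To put rough numbers on that trend, here’s a back-of-envelope sketch. The $300M starting cost for a 10K-GPU cluster and the 2.5-year halving period are my own illustrative assumptions, not measured figures:

```python
# Rough projection of what a fixed compute capability costs over time,
# assuming (hypothetically) price/performance halves every 2.5 years.
# The $300M starting figure for a 10K-GPU cluster is a notional guess.
def projected_cost(initial_cost, years, halving_period=2.5):
    """Cost of the same capability after `years` of periodic halving."""
    return initial_cost / (2 ** (years / halving_period))

today = 300e6
print(f"in 10 years: ${projected_cost(today, 10)/1e6:.1f}M")  # $18.8M
print(f"in 20 years: ${projected_cost(today, 20)/1e6:.2f}M")  # $1.17M
```

Even with these made-up numbers, two decades of halving shrinks the price tag by more than 250X, which is the heart of the argument above.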

The software skills to train an LLM, while today they may require a master’s degree or higher, will be much easier to acquire and apply in a decade or so. So that’s not much of a sustainable advantage either.

This only leaves the other bottlenecks to achieving AGI, a key one of which is the Data Wall.

Solving the Data Wall

In order to have as many AGI agents as possible, the world must have an open dialogue on research into solving the Data Wall.

So how can the world generate better data to use to train open source AGIs? I offer a few suggestions below, but by no means is this an exhaustive list. And I’m just an interested (and talented) amateur in all this.

Deep reinforcement learning (DRL)

Leopold mentioned DRL as one viable solution to the data wall in his paper. DRL is a technique that DeepMind used to create super intelligent Atari, Chess and Go players. They essentially programmed an agent to play a game against itself and determined which participant won the game. Once this was ready, they set multiple agents loose to play one another.

Each win would be used to reward the better player, and each loss to penalize the worse player. After 10K (or ~10M) games, they ended up with agents that could beat any human player.
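As a toy illustration of that reward loop (the game, win probabilities and Elo-style rating updates below are my own simplifications, nothing like DeepMind’s actual training):

```python
import random

# Toy illustration of self-play with reward/penalty, loosely in the
# spirit of the DRL approach described above. The "game" is a biased
# coin flip; ratings use Elo-style updates. All numbers are made up.
def play(skill_a, skill_b):
    """Return True if player A wins this game."""
    return random.random() < skill_a / (skill_a + skill_b)

def elo_update(r_a, r_b, a_won, k=16):
    """Reward the winner and penalize the loser, Elo style."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = k * ((1.0 if a_won else 0.0) - expected_a)
    return r_a + delta, r_b - delta

random.seed(0)
ratings = {"strong": 1000.0, "weak": 1000.0}
for _ in range(10_000):                      # the "10K games"
    a_won = play(0.7, 0.3)                   # strong agent wins ~70%
    ratings["strong"], ratings["weak"] = elo_update(
        ratings["strong"], ratings["weak"], a_won)

# After enough games the stronger agent's rating pulls well clear.
print(ratings["strong"] > ratings["weak"])   # True
```

The point of the sketch is the feedback loop: as long as there is a cheap, reliable way to score each game, the ratings (and in DRL, the policies) sort themselves out automatically.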

Something similar could be used to attack the Data Wall. Have proto-AGI agents interact (play, talk, work) with one another to generate, let’s say, more knowledge, more research, more information. And over time, as the agents get smarter and better at this, AGI will emerge.

However, the advantage of Go, Chess, Atari, protein folding, finding optimal datacenter energy usage, faster sorting algorithms, etc. is that there’s a somewhat easy way to determine which of a gaggle of agents has won. For research, this is not so simple.

Let’s say we program/prompt a proto-AGI agent to generate a research paper on some arbitrary topic (How to Improve Machine Learning, perhaps). So it generates a research paper; how does one effectively and inexpensively judge whether it is better, worse or the same as another agent’s paper?

I suppose with enough proto-AGI agents, one could automatically use “repeatability” of the research as one gauge of research correctness. Have a gaggle of proto-AGIs be prompted to replicate the research and see if that’s possible.
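A minimal sketch of that repeatability gauge might look like the following, where `replicate` is a hypothetical stand-in for prompting a proto-AGI agent to redo the work and report its result:

```python
# Minimal sketch of the repeatability gauge: score a claimed result by
# the fraction of independent replication attempts that reproduce it.
# `replicate(claim, i)` is a hypothetical stand-in for prompting the
# i-th proto-AGI agent to redo the work and report its result.
def repeatability_score(claim, replicate, n_agents=5):
    """Fraction of n_agents whose replication matches the claim."""
    matches = sum(1 for i in range(n_agents) if replicate(claim, i) == claim)
    return matches / n_agents

# Toy stand-in: agents 0-3 reproduce the claim, agent 4 disagrees.
fake_replicate = lambda claim, i: claim if i < 4 else "different result"
score = repeatability_score("result X", fake_replicate)
print(score)  # 0.8
```

In practice, "matches" would itself need an AI judge rather than an exact comparison, but the scoring idea is the same.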

Alternatively, submit the papers to an “AGI journal” and have real researchers review them (sort of like how Reinforcement Learning from Human Feedback works for LLMs today). The costs for real researchers reviewing AGI generated papers would be high, and of course the amount of research generated would be overwhelming, but perhaps with enough paid and (unpaid) voluntary reviewers, the world could start generating more good (research) data.

Perhaps at one extreme we could create automated labs/manufacturing lines that are under the control of AGI agent(s) and have them create real world products. With some modest funding, perhaps we could place the new products into the marketplace and see if they succeed or not. Market success would be the ultimate decision making authority for such automated product development.

(This latter approach touches on a perennial AGI concern: tell an AGI agent to make better paper clips and it uses all of the earth’s resources to do so.)

Other potential solutions to the Data Wall

There are no doubt other approaches that could be used to validate proto-AGI agent knowledge generation.

  • Human interaction – have an AGI agent be available 7X24 with humans as they interact with the world. Sensors worn by the human would capture all their activities. An AGI agent would periodically ask a human why they did something. Privacy considerations make this a nightmare, but perhaps using surveillance videos and an occasional check-in with the human would suffice.
  • Art, culture and literature – there is so much information embedded in cultural artifacts generated around the world that I believe this could effectively be mined to capture additional knowledge. Unlike the internet this information has been generated by humans at a real economic cost, and as such represents real vetted knowledge.
  • Babies & children – I can’t help but believe that babies and young children can teach us (and proto-AGI agents) an awful lot about how knowledge is generated and validated. It’s unclear how to obtain this other than to record everything they do. But maybe it’s sufficient to capture such data from daycares and public playgrounds, with appropriate approvals of course.

There are no doubt others. But finding some that are cheap enough to be used for open source is a serious consideration.

~~~~

How we get through the next decade will determine the success or failure of AI and perhaps life on earth. I can’t help but think the more the merrier will help us get there.

Comments,

Steam Locomotive lessons for disk vs. SSD

Read a PHYS ORG article, Extinction of steam locomotives derails assumptions about biological evolution…, which was reporting on a Royal Society research paper, The end of the line: competitive exclusion & the extinction…, that looked at the historical record of steam locomotives from their inception in the early 19th century until their demise in the mid 20th century. Reading the article, it seems to me to have wider applicability than just evolutionary extinction dynamics; in fact, similar analysis could reveal some secrets of technological extinction.

Steam locomotives

During their 150 years of production, many competitive technologies emerged, starting with electric locomotives, followed by automobiles & trucks and finally, the diesel locomotive.

The researchers selected a single metric to track the evolution (or fitness) of the steam locomotive, called tractive effort (TE), or the weight a steam locomotive could move. Early on, steam locomotives hauled both passengers and freight. The researchers included automobiles and trucks as competitive technologies because they do offer a way to move people and freight. The diesel locomotive was a more obvious competitor.

The dark line is a linear regression trend line on the wavy mean TE line, the boxes are the interquartile (25%-75%) range, the line within the boxes is the median TE value, and the shaded areas are the 95% confidence interval for the trend line of the steam locomotives’ TE produced that year. Raw data from Locobase, a steam locomotive database.

One can see three phases in the graph. In the red phase, from 1829-1881, there was unencumbered growth of TE for steam locomotives. But in 1881, electric locomotives were introduced, corresponding to the blue phase, and after WWII the black phase led to the demise of steam.
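For anyone curious what that trend line amounts to, here’s a sketch of the ordinary least-squares fit behind such a chart. The (year, mean TE) points are made-up stand-ins, not Locobase data:

```python
# Least-squares fit of the kind of linear trend line drawn on the TE
# chart. The (year, mean TE) points below are made-up stand-ins for
# Locobase data, just to show the mechanics.
def fit_trend(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

years = [1830, 1840, 1850, 1860, 1870, 1880]    # red (growth) phase
te    = [2000, 4500, 7000, 9800, 12500, 15000]  # hypothetical mean TE (lbf)
slope, intercept = fit_trend(years, te)
print(f"TE grew roughly {slope:.0f} lbf per year over this span")
```

The interesting part of the researchers’ analysis is how that slope changes between the red, blue and black phases, not the fit itself.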

Here (in the blue phase) we see a phenomenon often seen with the introduction of competitive technologies: there seems to be an increase in innovation as the multiple technologies duke it out in the ecosystem.

Automobiles and trucks were introduced in 1901, but they don’t seem to impact steam locomotive TE. Possibly this is because the passenger and freight volume hauled by cars and trucks wasn’t that significant. Or maybe their impact was more on the distances hauled.

In 1925, diesel locomotives were introduced. Again, we don’t see an immediate change in trend values, but over time this seemed to be the death knell of the steam locomotive.

The researchers identified four aspects to the tracking of inter-species competition:

  • A functional trait within the competing species can be identified and tracked. For the steam locomotive this was TE.
  • Direct competitors for the species can be identified that coexist within its spatial, temporal and resource requirements. For the steam locomotive: autos/trucks and electric/diesel locomotives.
  • A complete time series for the species/clade (group of related organisms) can be identified. This was supplied by Locobase.
  • Non-competitive factors don’t apply or are irrelevant. There’s plenty here, including most of the items listed on their chart.

From locomotives to storage

I’m not saying that disk is akin to steam locomotives while flash is akin to diesel, but maybe. For example, one could consider storage capacity as similar to locomotive TE. There’s a plethora of other factors that one could track over time, but this one factor was relevant at the start and is still relevant today. What we in the industry lack is any true tracking of capacities produced between the birth of the disk drive in 1956 (according to Wikipedia’s History of hard disk drives article) and today.

But I’d venture to say the mean capacity has been trending up, while the variance in that capacity has been static for years (driven by higher platter counts more than anything else).

There are plenty of other factors that could be tracked, for example areal density or $/GB.

Here’s a chart comparing areal (2D) density growth of flash, disk and tape media between 2008 and 2018. Note that both this chart and the following charts are log charts.

Over the last 5 years NAND has gone 3D. Current NAND chips in production have 300+ layers. Disks went 3D back in the 1960s or earlier. And of course tape has always been 3D, as it’s a ribbon wrapped around reels within a cartridge.

So areal density plays a critical role, but it’s only 2 of the 3 dimensions that determine capacity. The areal density crossover point between HDD and NAND in 2013 seems significant to me, and perhaps to the history of disk.
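A quick sketch of why layer count matters as much as areal density. The density, die area and layer counts below are illustrative round numbers, not any vendor’s actual figures:

```python
# Back-of-envelope: effective die capacity is roughly areal density x
# die area x layer count. The numbers below are illustrative round
# figures, not any vendor's actual specs.
def die_capacity_gbit(areal_density_gbit_per_mm2, die_area_mm2, layers):
    return areal_density_gbit_per_mm2 * die_area_mm2 * layers

planar  = die_capacity_gbit(0.5, 100, 1)    # planar (1 layer) NAND
stacked = die_capacity_gbit(0.5, 100, 300)  # 300-layer 3D NAND
print(f"{stacked / planar:.0f}x the capacity at the same areal density")  # 300x
```

Which is why NAND no longer needs to win on areal density alone, while disk’s third dimension is stuck at the platter count.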

Here’s another chart showing the history of $/GB of these technologies.

In this chart they are comparing price/GB of the various technologies (presumably the most economical available during that year). The $/GB trajectory for HDDs between 2008-2010 was on a 40%/year reduction trend, then flatlined, and now appears to be on a 20%/year reduction trend. Flash during 2008-2017 was on a 25%/year reduction in $/GB, which flatlined in 2018. LTO tape had been on a 25%/year reduction from 2008 through 2014 and since then has been on an 11%/year reduction.

If these $/GB trends continue, a big if, flash will overtake disk in $/GB, and tape over time.
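As a sketch of how such a crossover projection works, using the 25%/year (flash) and 20%/year (disk) reduction rates quoted above and my own notional starting prices, and assuming both trends hold, which is the big if:

```python
import math

# Sketch of projecting a $/GB crossover under constant yearly reduction
# rates. Starting prices here are notional round numbers, not measured;
# rates are the 25%/year (flash) and 20%/year (disk) trends cited above.
def years_to_crossover(price_hi, rate_hi, price_lo, rate_lo):
    """Years until the pricier technology undercuts the cheaper one,
    solving price_hi * (1-rate_hi)**t = price_lo * (1-rate_lo)**t for t."""
    return math.log(price_lo / price_hi) / math.log((1 - rate_hi) / (1 - rate_lo))

t = years_to_crossover(0.10, 0.25, 0.02, 0.20)  # flash vs disk, $/GB
print(f"crossover in about {t:.0f} years, if both trends hold")
```

What jumps out is how sensitive the answer is to a few points of yearly reduction rate: with only a 5%/year gap, even a 5X price difference takes decades to close.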

But here’s something on just capacity which seems closer to the TE chart for steam locomotives.

HDD capacity 1980-2020.

There’s some dispute regarding this chart, as it only reflects drives available for retail, and drives with higher capacities were not always available there. Nonetheless, it shows a couple of interesting items. Early on, up to ~1990, drive capacities were relatively stagnant. From 1995-2010 there was a significant increase in drive capacity, and since 2010, drive capacities have seemed to stop increasing as much. We presume the number of x’s for a typical year shows the different drive capacities available for retail sale, sort of similar to the box plots on the TE chart above.

SSDs were first created in the early 90’s, but the first 1TB SSD came out around 2010. Since then, the number of disk drives offered for retail (as depicted by x’s on the chart each year) seems to have declined, and their range in capacity (other than ~2016) seems to have declined significantly.

If I take the lessons from the steam locomotive to heart here, one would have to say that the HDD has been forced to adapt to a smaller market than it had prior to 2010. And if areal density trends are any indication, it would seem that R&D efforts to increase capacity have declined, or we have reached some physical barrier with today’s media-head technologies. Although such physical barriers have always been surpassed after new technologies emerged.

What we really need is something akin to Locobase for disk drives, one that would track all disk drives sold during each year. That way we could truly see something similar to the chart tracking TE for steam locomotives. And this would allow us to see whether the end of HDD is nigh or not.

Final thoughts on technology extinction dynamics

The Royal Society research had a lot to say about the dynamics of technology competition. And they had other charts in their report but I found this one very interesting.

This shows an abstract analysis of steam locomotive data. They identify 3 zones of technology life: the safe zone, where the technology has no direct competitors; the danger zone, where competition has emerged but has not conquered all of the technology’s niches; and the extinction zone, where competing technology has entered every niche the original technology occupied.

In the late 90s, enterprise disk supported high performance/low capacity, medium performance/medium capacity and low performance/high capacity drives. Since then, SSDs have pretty much conquered the high performance/low capacity disk segment. And with the advent of QLC and PLC (4 and 5 bits per cell) using multi-layer NAND chips, SSDs seem poised to conquer the low performance/high capacity niche. And there are plenty of SSDs using MLC/TLC (2 or 3 bits per cell) with multi-layer NAND to attack the medium performance/medium capacity disk market.

There were also very small disk drives at one point which seem to have been overtaken by M.2 flash.

On the other hand, just over 95% of all disk and flash storage capacity being produced today is disk capacity. So even though disk is clearly in the extinction zone with respect to flash storage, it seems to still be doing well.

It would be wonderful to have a similar analysis done on transistors vs vacuum tubes, jet vs propeller propulsion, CRT vs. LED screens, etc. Maybe at some point with enough studies we could have a theory of technological extinction that can better explain the dynamics impacting the storage and other industries today.

Comments,


The Hollowing out of enterprise IT

We had a relatively long discussion yesterday amongst a bunch of independent analysts, and one topic that came up was my thesis that enterprise IT is being hollowed out by two forces pulling in opposite directions on their apps: the cloud and the edge.

Western part of the abandoned Packard Automotive Plant in Detroit, Michigan. by Albert Duce

Cloud sirens

The siren call of the cloud for business units, developers and modern apps has been present for a long time now. And their call is more omnipresent than anything Odysseus ever had to deal with.

The cloud’s allure is primarily low-cost, instant infrastructure that just works, a software solution/tool box that’s overflowing, locations close to most major metropolitan areas, and the extreme ease of starting up.

If your app ever hopes to scale to meet customer demand, where else can you go? If your data can literally come in from anywhere, it usually lands in the cloud. And if you need modern solutions, tools, frameworks or just about anything the software world can create, there’s nowhere with more of this than the cloud.

Pre-cloud, all those apps would have run in the enterprise or wouldn’t have run at all. And all that data would have been funneled back into the enterprise.

Not today; the cloud has it all. Its siren call is getting louder every day, ever ready to satisfy every IT desire anyone could possibly have, except for the edge.

The Edge, last bastion for onsite infrastructure

The edge sort of emerged over the last decade or so, kind of in stealth mode. Yes, there were always pockets of edge with unique compute or storage needs; for example, video surveillance has been around forever. But the real acceleration of edge deployments started over the last decade or so, as compute and storage prices came down drastically.

These days, the data being generated is staggering, and the compute requirements that go along with all that data are all over the place, from a few ARM/RISC-V cores to a server farm.

For instance, CERN’s LHC creates a PB of data every second of operation (see the IEEE Spectrum article, ML shaking up particle physics too). But they don’t store all that. So they use extensive compute (and ML) to try to store only interesting events.

Seismic ships roam the seas taking images of underground structures, generating gobs of data, some of which is processed on ship and the rest elsewhere. A friend of mine creates RPi enabled devices that measure tank liquid levels deployed in the field.

More recently, smart cars are like a data center on tires, rolling across roads around the world generating more data than you can even imagine. 5G towers are data centers on top of buildings, in farmland, and in cell towers dotting the highways of today. All off the beaten path, and all places where no data center has ever gone before.

In olden days, there would have been much less processing done at the edge and more in an enterprise data center. But nowadays, with the advent of relatively cheap computing and storage, data can be pre-processed, compressed and tagged at the edge, and then sent elsewhere for further processing (mostly in the cloud, of course).

IT Vendors at the crossroads

And what does the hollowing out of enterprise data centers mean for IT server and storage vendors? Mostly, danger lies ahead. Enterprise IT hardware spend will stop growing, if it hasn’t already, and over time will shrink dramatically. It may be hard to see this today, but it’s only a matter of time.

Certainly, all these vendors can become more cloud-like on prem, offering compute and storage as a service, with various payment options to make it easier to consume. And storage vendors can take advantage of their installed base by providing software versions of their systems running in the cloud, which allows for easier migration and onboarding to the cloud. The server vendors have no such option. I see all the above as more of a defensive, delaying or holding action.

This is not to say the enterprise data centers will go away. Just like mainframes and tape before them, on prem data centers will exist forever, but will be relegated to smaller and smaller niche markets that won’t grow anymore. But only as long as vendor(s) continue to upgrade the technology AND there’s profit to be made.

It’s just that the astronomical growth that’s been happening ever since the middle of the last century won’t happen in enterprise hardware anymore.

Long term life for enterprise vendors will be hard(er)

Over the long haul, some server vendors may be able to pivot to the edge. But the diversity of compute hardware there will make it difficult to generate enough volume to make a decent profit. That’s not to say there will be zero profits there, just less. So, when I see a Dell or HPE server under the hood of my next smart car or inside the guts of my next drone, then and only then will I see a path forward (or sustained revenue growth) for these guys.

For enterprise storage vendors, their future prospects look bleak in comparison. Despite the data generation and growth at the edge, I don’t see much of a role for them there. The enterprise-class features and functionality they have spent decades creating and nurturing aren’t valued as much in the cloud, nor are they presently needed at the edge. Maybe I’m missing something here, but I just don’t see a long term play for them in the cloud or edge.

~~~~

For the record, all this is conjecture on my part. But I have always believed that if you follow where new apps are being created, there you will find a market ready to explode. And where apps are no longer being created, there you will see a market in the throes of a slow death.


New DRAM can be layered on top of CPU cores

At the last IEDM (IEEE International Electron Devices Meeting), there were two sessions devoted to a new type of DRAM cell, consisting of 2 transistors and no capacitor (2T0C), that can be built in layers on top of a microprocessor without disturbing the microprocessor silicon. I couldn’t access (behind paywalls) the actual research, but one of the research groups was from Belgium (IMEC) and the other from the US (Notre Dame and RIT). This was written up in a couple of teaser articles in the tech press (see the IEEE Spectrum tech talk article).

DRAM today is built using 1 transistor and 1 capacitor (1T1C). And it appears that capacitors and the logic used for microprocessors aren’t very compatible. As such, most DRAM lives outside the CPU (or microprocessor core) chip and is attached over a memory bus.

New 2T0C DRAM bit cell: data is written by applying current to the WBL and WWL, and bits are read by seeing if a current can pass through the RWL/RBL.

Memory busses have gotten faster in order to allow faster access to DRAM, but this too is starting to reach fundamental physical limits, and DRAM memory sizes aren’t scaling like they used to.

Wouldn’t it be nice if there were a new type of DRAM that could be easily built closer to, or even layered on top of, a CPU chip, with faster direct access from/to CPU cores through inter-chip electronics?

Oxide based 2T0C DRAM

DRAM was designed from the start with 1T1C so that it could hold a charge. With a charge in place it could be read out quickly and refreshed periodically without much of a problem.

The researchers found that at certain sizes (and with the proper dopants), small transistors can also hold a (small) charge without needing any capacitor.

By optimizing the chemistry used to produce those transistors they were able to make 2T0C transistors hold memory values. And given the fabrication ease of these new transistors, they can easily be built on top of CPU cores, at a low enough temperature so as not to disturb the CPU core logic.

But given these characteristics, the new 2T0C DRAM can also be built up in layers, just like 3D NAND and unlike current DRAM technologies.

Today, 3D NAND is being built at over 64 layers, with flash NAND roadmaps showing double or quadruple that number of layers on the horizon. Researchers presenting at IEDM were able to fabricate an 8 layer 2T0C DRAM on top of a microprocessor and provide direct, lightning fast access to it.

The other thing about the new DRAM technology is that it doesn’t need to be refreshed as often. Current DRAM must be refreshed every 64 msec. This new 2T0C technology has a much longer retention time and currently only needs to be refreshed every 400 seconds, and much longer retention times are technically feasible.
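The refresh savings are easy to quantify from those two numbers:

```python
# Quick arithmetic on the refresh claim: 64 msec vs 400 second refresh
# intervals, straight from the numbers quoted above.
conventional_interval_s = 0.064   # 1T1C DRAM: refresh every 64 msec
new_interval_s = 400.0            # 2T0C DRAM: refresh every 400 s

refresh_reduction = new_interval_s / conventional_interval_s
print(f"{refresh_reduction:.0f}x fewer refresh cycles")  # 6250x fewer
```

Fewer refresh cycles means less power burned and fewer windows where the memory is busy refreshing instead of serving the cores.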

Some examples of processing needing more memory:

  • AI/ML and the memory wall – Deep learning models are getting so big that memory size is starting to become a limiting factor in AI model effectiveness. And this is just with DRAM today. Optane and other SCM can start to address some of this problem, but the problem doesn’t go away; AI DL models are just getting more complex. I recently read an article where Google trained a trillion parameter language model.
  • In memory databases – SAP HANA is just one example, but there are other startups as well as traditional database providers that are starting to use huge amounts of memory to process data at lightning fast speeds. Data only seems to grow, not shrink.

Yes, Optane and other SCM today can solve some of these problems. But having a 3D scaleable DRAM memory that can be built right on top of the CPU cores, with longer hold times and faster direct access, could be a game changer.

It’s unclear whether we will see all DRAM move to the new 2T0C format. But if it can scale well in the Z direction, has better access times, and has longer retention, it’s unclear why this wouldn’t displace all current 1T1C DRAM over time. However, given the $Bs of R&D spent on new and current 1T1C DRAM fabrication technology, it’s going to be a tough and long battle.

Now if the new 2T0C DRAM could only move from 1 bit per cell to multiple bits per cell, like SLC to MLC NAND, the battle would heat up considerably.


Is hardware innovation accelerating – hardware vs. software innovation (round 6)

There’s something happening in the IT industry that maybe hasn’t happened in a couple of decades or so: hardware innovation is back. We’ve been covering bits and pieces of it in our hardware vs software innovation series (see our Open source ASICs – HW vs. SW innovation [round 5] post).


Hardware innovation never really went away; Intel, AMD, Apple and others have always worked on new compute chips. DRAM and NAND have also taken giant leaps over the last two decades. These were all major hardware suppliers. But special purpose chips, non-CPU compute engines and hardware accelerators had been relegated to the dustbins of history, as the CPU giants kept assimilating their functionality into the next round of CPU chips.

And then something happened. It kind of made sense for GPUs to be their own electronics, as these were SIMD architectures, intrinsically different from the SISD, standard von Neumann X86 and ARM CPU architectures.

But for some reason it didn’t stop there. We first started seeing some inklings of new hardware innovation in the AI space, with a number of special purpose DL NN accelerators coming online over the last 5 years or so (see Google TPU, SC20-Cerebras, GraphCore GC2 IPU chip, AI at the Edge Mythic and Syntiant IPU chips, and neuromorphic chips from BrainChip, Intel, IBM and others). Again, one could look at these as taking the SIMD model of GPUs in a slightly different direction. It’s probably one reason that GPUs were so useful for AI-ML-DL, but further accelerations were now possible.

But it hasn’t stopped there either. In the last year or so we have seen SPUs (Nebulon Storage), DPUs (Fungible, NVIDIA Networking, others), and computational storage (NGD Systems, ScaleFlux Storage, others) all come online and become available to the enterprise. And most of these are for more normal workload environments, i.e., not AI-ML-DL workloads.

I thought at first these were just FPGAs implementing different logic but now I understand that many of these include ASICs as well. Most of these incorporate a standard von Neumann CPU (mostly ARM) along with special purpose hardware to speed up certain types of processing (such as low latency data transfer, encryption, compression, etc.).

What happened?

It’s pretty easy to understand why non-von Neumann computing architectures should come about; witness all those new AI-ML-DL chips that have become available, and why these would be implemented outside the normal X86-ARM CPU environment.

But SPU, DPUs and computational storage, all have typical von Neumann CPUs (mostly ARM) as well as other special purpose logic on them.

Why?

I believe there are a few reasons, but the main two are that Moore’s law (transistor sizes halving every 2 years, effectively doubling transistor counts in the same area) is slowing down, and Dennard scaling (as you reduce the size of transistors, their power consumption goes down and their speed goes up) has almost stopped. Both of these have caused major CPU chip manufacturers to focus on adding cores to boost performance, rather than just adding more transistors to the same core to increase functionality.

This hasn’t stopped adding instruction functionality to each CPU, but it has slowed considerably. And single (core) processor speeds (GHz) have reached a plateau.

But what it has stopped is having the real estate available on a CPU chip to absorb lots of additional hardware functionality, which had been the case since the 1980’s.

I was talking with a friend who used to work on math co-processors, like the 8087, 80287 & 80387, that performed floating point arithmetic. But after the 486, floating point logic was completely integrated into the CPU chip itself, killing off the co-processor business.

Hardware design is getting easier & chip fabrication is becoming a commodity

We wrote a post a couple of weeks back talking about an open foundry (see HW vs. SW innovation round 5 noted above) that would take a hardware design and manufacture the ASICs for you for free (or at little cost). This says that the tool chain to perform chip design is becoming more standardized and much less complex. Does this mean that it takes less than 18 months to create an ASIC? I don’t know, but it seems so.

But the really interesting aspect of this is that world class foundries are now available outside the major CPU developers. And these foundries, for a fair but high price, would be glad to fabricate a thousand or a million chips for you.

Yes, your basic state of the art fab probably costs $12B plus these days. But all that has meant is that A) they will take any chip design and manufacture it, B) they need to keep factory volume up by manufacturing chips in order to amortize the fab’s high price and C) they have to keep their technology competitive or chip manufacturing will go elsewhere.

So chip fabrication is not quite a commodity. But there are enough state of the art fabs in existence to make it seem so.

But it’s also physics

The extremely low latencies that are available with NVMe storage and higher speed networking (100GbE & above) are demanding a lot more processing power to keep up with. And just the physics of how long it takes to transfer data across a distance (aka racks) is starting to consume too much overhead and impact other work that could be done.

When we start measuring IO latencies in under 50 microseconds, there’s just not a lot of CPU instructions and task switching that can go on anymore. Yes, you could devote a whole core or two to this process and keep up with it. But wouldn’t the data center be better served keeping that core busy with normal work and offloading that low-latency, realtime(-like) work to a hardware accelerator that could be executing on the network rather than behind a NIC?

So realtime processing demands have tightened: the time available to execute the CPU instructions that switch tasks and process data, while still keeping up with faster line speeds, keeps shrinking.
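To make that shrinking budget concrete, here is a rough sketch; the clock rate, instructions-per-cycle, and context switch cost are illustrative assumptions, not measurements.

```python
# Rough instruction budget per IO at NVMe-class latencies.
# All numbers below are illustrative assumptions, not measurements.

CLOCK_GHZ = 3.0        # assumed core clock rate
IPC = 1.0              # assumed instructions per cycle (conservative)
IO_LATENCY_US = 50     # NVMe-class IO latency from the text
CTX_SWITCH_US = 5      # assumed cost of a sleep/wake task switch pair

# Instructions one core can execute during a single IO:
instr_budget = CLOCK_GHZ * 1e3 * IPC * IO_LATENCY_US   # cycles/us * us
# Fraction of the IO window lost just to task switching:
switch_overhead = CTX_SWITCH_US / IO_LATENCY_US

print(f"~{instr_budget:.0f} instructions per IO")               # ~150000
print(f"~{switch_overhead:.0%} of the IO window spent switching")  # ~10%
```

With these (assumed) numbers, one switch pair alone eats a tenth of the IO window, which is the argument for polling cores or, better, offloading the work to an accelerator.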

So that explains smart NICs, DPUs, & SPUs. What about the other hardware accelerator cards?

  • AI-ML-DL is becoming such an important and data AND compute intensive workload that, just like GPUs before them, TPUs & IPUs are becoming a necessary evil if we want to service those workloads effectively and expeditiously.
  • Computational storage is becoming more widespread because although data compression can easily be done at the CPU, it can be done faster (less data needs to be transferred back and forth) at the smart drive.
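As a sketch of that "back and forth" savings: compressing already-stored data host-side means reading the raw data up from the drive and writing the compressed result back, while an in-drive compressor keeps all of it off the bus. The payload below is a made-up compressible blob.

```python
# Bus traffic for host-side vs drive-side compression of stored data.
# The payload is a made-up, highly compressible blob.
import zlib

raw = b"some repetitive log data " * 4000          # ~100KB of compressible data
compressed = zlib.compress(raw)

host_side_bus_bytes = len(raw) + len(compressed)   # read raw up, write compressed back
drive_side_bus_bytes = 0                           # compression happens inside the drive

print(len(raw), len(compressed), host_side_bus_bytes)
```

The exact ratio depends entirely on the data, but the structural point holds: the host-side path always moves at least the raw data across the bus, the drive-side path moves none of it.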

My guess is we haven’t seen the end of this at all. When you open up the possibility of a long term business model focused on hardware accelerators, there would seem to be a lot of stuff that needs to be done and could be done faster and more effectively outside the core CPU.

There was a point over the last decade where software was destined to “eat the world”. I got a lot of flack for saying that was BS and that hardware innovation was really eating the world. Now that hardware innovation’s back, it seems to be a little of both.

Comments?


Software defined power grid

Read an article this past week in IEEE Spectrum (The Software Defined Power Grid is here) about a company that has been implementing software defined power grids throughout USA and the world to better integrate and utilize renewable energy alongside conventional power generation equipment.

Moreover, within the last year or so, Tesla has installed a Virtual Power Plant (VPP) using residential solar and grid scale batteries to better manage the electrical grid of South Australia (see Tesla’s Australian VPP propped up grid during coal outage). Using VPPs to offset power outages would necessitate something like a software defined power grid.

Software defined power grid

Not sure if there’s a real definition somewhere, but from our perspective, a software defined power grid is one where power generation and control is all done through programmatic automation. The human operator still exists to monitor and override when something goes wrong, but they are not involved in the moment to moment control of which power is saved vs. fed into the grid.

About a decade ago, we wrote a post about smart power meters (Smart metering’s data storage appetite) discussing the implementation of smart meters for home owners that had some capabilities to help monitor and control power use. But although that technology still exists, the software defined power grid has moved on.

The IEEE Spectrum article talks about phasor measurement units (PMUs) that are already installed throughout most power grids. It turns out that most PMUs are capable of transmitting phasor power status 60 times a second, and each status report is time stamped with high accuracy, GPS synchronized time.

On the other hand, most power grids today use SCADA (supervisory control and data acquisition) systems to monitor and manage the power grid. But SCADA systems only send data every 2-4 seconds. PMUs are installed in most power grids, but their information is less important than SCADA data to the monitoring, management and control of most (non-software defined) power grids.
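A quick comparison of the two reporting rates quoted above makes the gap obvious; the 3 second SCADA period is just the midpoint of the quoted 2-4 seconds.

```python
# PMU vs SCADA reporting rates, per the article: PMUs report 60 times
# a second, SCADA systems every 2-4 seconds (3s midpoint assumed here).

PMU_HZ = 60
SCADA_PERIOD_S = 3
SECONDS_PER_DAY = 86_400

pmu_reports_per_day = PMU_HZ * SECONDS_PER_DAY             # 5,184,000
scada_reports_per_day = SECONDS_PER_DAY // SCADA_PERIOD_S  # 28,800

print(pmu_reports_per_day // scada_reports_per_day)  # PMUs deliver 180x the samples
```

That two-orders-of-magnitude difference in sampling rate is what makes PMU data the natural foundation for moment-to-moment automated grid control.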

One software defined power grid

PXiSE, the company in the IEEE Spectrum article, implemented their first demonstration project in Hawaii. That power grid had reached the limit of wind and solar power that it could support with human management. The company took their time and implemented a digital simulation of the power grid. With the simulation in hand, battery storage and an off-the-shelf PC, the company was able to manage the grid’s power generation mix in real time with complete automation.

After that success, the company next turned to a micro-grid (building level power) with electric vehicles, batteries and solar power. Their software defined power grid reduced peak electricity demand within the building, saving significant money. With that success, the company took their software defined power grid on the road to South Korea, Chile, Mexico and a number of other locations around the world.

Tesla’s VPP

The Tesla VPP in South Australia is planned to consist of up to 50K houses with solar PV panels and 13.5KWh of batteries each, able to deliver up to 250MW of power generation and 650MWh of power storage.

At the present time, the system has ~1000 house systems installed, but even with that limited generation and storage capability it has already been called upon at least twice to compensate for coal generation power outages. To manage each and every household, they’d need something akin to the smart meters mentioned above in conjunction with a plethora of PMUs.

Puerto Rico’s power grid problems and solutions

There was an article not so long ago about the disruption to Puerto Rico’s power grid caused by Hurricanes Irma and Maria in IEEE Spectrum (Rebuilding Puerto Rico’s Power Grid: The Inside Story) and a subsequent article on making Puerto Rico’s power grid more resilient to hurricanes and other natural disasters (How to harden Puerto Rico’s power grid). The latter article talked about creating micro grids, community PV and battery storage that could be disconnected from the main grid in times of disaster but also used to distribute power generation throughout the island.

Although the researchers didn’t call for a software defined power grid, it is our understanding that something similar would be an outstanding addition to their work there.

~~~~

As the use of renewables goes up and the price of batteries decreases while their capabilities improve over time, more and more power grids will need to become software defined. In the end, more software defined power grids with increasing renewable power generation and storage will make any power grid more resilient and more fault tolerant.


Anti-Gresham’s Law: Good information drives out bad

(Good information is in blue, bad information is in Red)

Read an article the other day in ScienceDaily (Faster way to replace bad info in networks) which discusses research published in a recent IEEE/ACM Transactions on Network journal (behind paywall). Luckily there was a pre-print available (Modeling and analysis of conflicting information propagation in a finite time horizon).

The article discusses information epidemics using the analogy of a virus and its antidote. This is where bad information (the virus) and good information (the antidote) circulate within a network of individuals (systems, friend networks, IoT networks, etc.). Such bad information could be malware, and its good information counterpart could be a system patch that fixes the vulnerability. Another example would be an outright lie about some event, and its counterpart could be the truth about the event.

The analysis in the paper makes some simplifying assumptions: that in any single individual (network node), the virus and the antidote cannot co-exist. That is, an individual (node) is either infected by the virus, cured by the antidote, or yet to be infected or cured.

The network is fully connected and complex. That is, once an individual in a network is infected, unless an antidote is developed the infection proceeds to infect all individuals in the network. And once an antidote is created, it will cure all individuals in the network over time. Some individuals in the network have more connections to other nodes, while others have fewer.

The network functions in a bi-directional manner. That is, any node, let’s say RAY, can infect/cure any node it is connected to, and conversely any node it is connected to can infect/cure the RAY node.

Gresham’s law, (see Wikipedia article) is a monetary principle which states bad money in circulation drives out good. Where bad money is money that is worth less than the commodity it is backed with and good money is money that’s worth more than the commodity it is backed with. In essence, good money is hoarded and people will preferentially use bad money.

My anti-Gresham’s law is that good information drives out bad. Where good information is the truth about an event, security patches, antidotes to infections, etc., and bad information is falsehoods, malware, biological viruses, etc.

The Susceptible Infected-Cured (SIC) model

The paper describes a SIC model that simulates the (virus and antidote) epidemic propagation process, or the process whereby a virus and its antidote propagate throughout a network. This assumes that once a network node is infected (at time0), during the next interval (time0+1) it infects its nearest neighbors (nodes that are directly connected to it), and they in turn infect their nearest neighbors during the following interval (time0+2), etc., until all nodes are infected. Similarly, once a network node is cured it will cure all its neighbor nodes during the next interval, and these nodes will cure all of their neighbor nodes during the following interval, etc., until all nodes are cured.
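The propagation process above can be sketched in a few lines of Python. The tiny line graph and the "cure wins ties" rule here are illustrative assumptions of mine; the paper analyzes general finite networks.

```python
# Minimal SIC propagation sketch: each interval, infected ('I') nodes
# infect susceptible ('S') neighbors, and cured ('C') nodes cure their
# neighbors. The antidote displaces the virus, so cure wins ties.

def sic_step(graph, state):
    """One propagation interval. state maps node -> 'S', 'I', or 'C'."""
    new_state = dict(state)
    for node, status in state.items():
        for nbr in graph[node]:
            if status == 'I' and new_state[nbr] == 'S':
                new_state[nbr] = 'I'      # virus spreads to susceptible neighbors
            elif status == 'C' and new_state[nbr] in ('S', 'I'):
                new_state[nbr] = 'C'      # antidote cures susceptible & infected
    return new_state

# Tiny line network 0-1-2-3-4: node 0 starts infected, node 4 starts cured.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
state = {0: 'I', 1: 'S', 2: 'S', 3: 'S', 4: 'C'}
while 'I' in state.values():
    state = sic_step(graph, state)        # cure eventually extinguishes the virus
print(state)
```

Running this, the infection and the cure race toward each other from opposite ends of the line until the cure wins everywhere, which is the extinction-time behavior the paper quantifies.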

What can the SIC model tell us

The model provides calculations to generate a number of statistics, such as the half-life time of bad information and the extinction time of bad information. The paper discusses the SIC model across complex (irregular) network topologies as well as completely connected and star topologies, and derives formulas for each type of network.

In the discussion portion of the paper, the authors indicate that if you are interested in curing a population with bad information, it’s best to map out the network’s topology and focus your curing efforts on those node(s) that lie along the most shortest path(s) within the network.

I wrongly thought that the best way to cure a population of nodes would be to cure the nodes with the highest connectivity. While this may work, and such nodes are no doubt along at least one, if not many, shortest paths, it may not be the optimal way to reduce extinction time. If other nodes lie on more shortest paths in the network, it’s better to target those nodes with the cure.
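A small sketch of that targeting idea: count how many shortest paths each node sits on (a brute-force betweenness-style score) and pick the top node as the cure target. The example graph is made up, and the BFS enumeration is only practical for small illustrative networks.

```python
# Rank nodes by the number of shortest paths they interrupt, to pick
# the best cure target. Brute-force BFS; fine for tiny example graphs.
from collections import deque
from itertools import combinations

def shortest_paths(graph, src, dst):
    """Enumerate all shortest paths from src to dst with BFS."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break                      # all shortest paths already found
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nbr in graph[node]:
            if nbr not in path:
                queue.append(path + [nbr])
    return paths

def cure_target_scores(graph):
    """Per node, count the shortest paths (between other pairs) it lies on."""
    score = {n: 0 for n in graph}
    for a, b in combinations(graph, 2):
        for path in shortest_paths(graph, a, b):
            for node in path[1:-1]:    # interior nodes only
                score[node] += 1
    return score

# Hub-and-spoke with a tail: h connects a, b, c; c connects on to d.
graph = {'h': ['a', 'b', 'c'], 'a': ['h'], 'b': ['h'], 'c': ['h', 'd'], 'd': ['c']}
scores = cure_target_scores(graph)
print(max(scores, key=scores.get))  # 'h' lies on the most shortest paths
```

In this toy graph the hub wins, but the paper's point is precisely that on irregular topologies the highest-degree node and the most-shortest-paths node need not coincide.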

Applying the SIC model to COVID-19

It seems to me that we could model the physical social connectivity of individuals in a population (city, town, state, etc.). If we wanted to infect the highest portion of people in the shortest time, we would target shortest path individuals to be infected first.

Conversely, if we wanted to slow down the infection rate of COVID-19, it would be extremely important to reduce the physical connectivity of individuals on the shortest paths in a population. Which is why social distancing, at least when broadly applied, works. It’s also why, when infected, self quarantining is the best policy. But if you wished to not apply social distancing broadly, perhaps targeting those individuals on the shortest paths to practice social distancing could suffice.

However, there are at least two other approaches to using the SIC model to eradicate (extinguish the disease) the fastest:

  1. Suppose we were able to produce an antidote, say a vaccine, but one which had the property of being infectious (say a less potent strain of the COVID-19 virus). Then targeting this vaccine to those people on the shortest paths in a network would extinguish the pandemic in the shortest time. Please note that, to my knowledge, any vaccine (course), if successful, will eliminate a disease and provide antibodies against any future infections of that disease. So the time when a person is infected with a vaccine strain is limited, and would likely be much shorter than the time someone is infected with the original disease. And most vaccines, likely being weakened versions of the original disease, may not be as infectious. So in the wild, the vaccine and the original disease would compete to infect people.
  2. Another approach to using the SIC model is to produce a normal (non-transmissible) vaccine and target vaccination to individuals on the shortest paths in a population network. Once vaccinated, these people would no longer be able to infect others and would block any infections to other individuals down network from them. One problem with this approach is if everyone is already infected; vaccinating anyone then will not slow down future infection rates.

There may be other approaches to using SIC to combat COVID-19 than the above but these seem most reasonable to me.

So, health organizations of the world, figure out your population’s physical-social connectivity network (perhaps using mobile phone GPS information) and target any cure/vaccination to those individuals on the highest number of shortest paths through your network.

Comments?

Photo Credit(s):

  1. Figure 2 from the Modeling and analysis of conflicting information propagation in a finite time horizon article pre-print
  2. Figure 3 from the Modeling and analysis of conflicting information propagation in a finite time horizon article pre-print
  3. COVID-19 virus micrograph, from USA CDC.

Gaming is driving storage innovation at WDC

I was at SFD19 a couple of weeks ago and Western Digital supplied the afternoon sessions on their technology (see videos here). Phil Bullinger gave a great session on HDDs and the data center market. Carl Che did a session on HDD technology and discussed how 5G was going to ramp up demand for video streaming and IoT data requirements. Of course one of the sessions was on their SSD and NAND technologies.

But the one session that was pretty new and interesting to me was their discussion of gaming and how it’s driving system innovation. Eric Spaneut, VP of Client Computing, was the main speaker for the session, but they also had Leah Schoeb, Sr. Developer Manager at AMD, to discuss the gaming market and its impact on systems technology.

There were over 100M viewers of the League of Legends World Championships, with a peak viewership of 44M viewers. To put that in perspective the 2020 Super Bowl had 102M viewers. So gaming championships today are almost as big as the Super Bowl in viewership.

Gaming demands higher performing systems

Gaming users are driving higher compute processors/core counts, better graphics cards, faster networking and better storage. Gamers are building/buying high end desktop systems that cost $30K or more, dwarfing the cost of most data center server hardware.

Their gaming rigs are typically liquid cooled, have LEDs all over and are encased in glass. I could never understand why my crypto mine graphics cards had LEDs all over them. The reason was they were intended for gaming systems, not crypto mines.

Besides all the other components in these rigs, they are also buying special purpose storage. Yes storage capacity requirements are growing for games but performance and thermal/cooling have also become major considerations.

Western Digital has dedicated a storage line to gaming called WD Black. It includes both HDDs and SSDs (internal NVMe and external USB/SATA attached) at the moment. But Leah mentioned that gaming systems are quickly moving away from HDDs onto SSDs.

Thermal characteristics matter

Of WDC’s internal NVMe SSDs (WD Black SN750s), one comes with a heat sink attached. It turns out SSD IO performance can be throttled back due to heat. The heatsink allows the SSD to operate at higher temperatures and offer more bandwidth than the one without. Presumably, it allows the electronics to stay cooler and thus keep running at peak performance.

I believe their WD Black HDDs have internal fans in them to keep them cool. And of course they all come in black with LEDs surrounding them.

Storage can play an important part in the “gaming experience” for users once you get beyond network bottlenecks for downloading. For downloading, most storage performs well. However, for game loading and playing/editing videos and other gaming tasks, NVMe SSDs offer a significant performance boost over SATA SSDs and HDDs.

But not all gaming is done on high-end gaming desktop systems. Today a lot of gaming is done on dedicated consoles or in the cloud. Cloud based gaming is mostly just live streaming of video to a client device, whether it be a phone, tablet, console, etc. Live game streaming is almost exactly like video on demand but with more realtime input/output and more compute cores/graphic engines to perform the gaming activity and to generate the screens in “real” time. So having capacity and performance to support multiple streams AND the performance needed to create the live, real time experience takes a lot of server compute & graphics hardware, networking AND storage.

~~~~

So wherever gamers go, storage is becoming more critical in their environment. Both WDC and AMD see this market as strategic and growing, whose requirements are unique enough to demand special purpose products. Both are responding with dedicated hardware and product lines tailored to gaming needs.

Photo credit(s): All graphics in this post are from WDC’s gaming session video stream