Context, tokens, KV stores & storage, Solidigm presents at #AIFD8

Solidigm presented (video here) at AIFD8 this month, and as part of their presentation they spent time dissecting what happens to a prompt, how token growth happens, and where storage can help speed up prompt processing.

The token count explosion

It all starts with a simple prompt, something like “run a benchmark against a drive”, maybe a 12 token prompt, but when it actually gets processed it can balloon into something much larger. As an LLM processes the prompt it goes through a number of steps: building context, calling tools, obtaining and interpreting results, persisting knowledge and, finally, responding to the prompt.

Digging a level deeper, here’s what the token counts look like during prompt processing. The first step is to understand the environment of the prompt: rules, safety requirements and the methodology at its disposal. Then there’s retrieval activity that gathers the information needed to actually process and perform the prompt, followed by identifying the tools and APIs needed to process it. Once the LLM has all that, it plans out the steps needed to actually perform the prompt. Tool results are then generated, interpreted and fed back to LLM processing to determine the next step. At some point, prompt processing completes and the prompt reply is sent back to the issuer.

As one can see in the above, the prompt itself was minuscule in token counts in the vast scheme of activity needed to process the prompt. And this is just how one (albeit complex), ~12 token prompt can grow into a 42K token context.

Inferencing and Time To First Token

Inferencing consists of two phases:

  • PreFill phase – the processing that takes the context token stream and converts it into a KV (Key:Value) store, which the LLM can use for subsequent processing so it doesn’t have to go back to the token context. PreFill ends up with a fully populated KV store representing all the tokens in the current context, and generates the first token in the LLM response to the prompt.
  • Decode – all subsequent processing needed to generate the rest of the prompt response. It uses that KV store to underpin its processing to generate any more tokens needed to answer the prompt.
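
To make the two phases concrete, here is a minimal sketch of a KV store being filled during PreFill and extended during Decode. It is a toy under loud assumptions: a single attention layer with one head, 2-element token embeddings, and a `scale` helper standing in for the learned projections a real model would use. None of these names come from Solidigm’s presentation.

```python
import math

def scale(vec, factor):
    # Stand-in for a learned projection matrix: a real model multiplies by weights.
    return [factor * x for x in vec]

def attend(query, keys, values):
    """Softmax attention of one query over every cached key/value pair."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(len(query))
              for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, embedding):
        # One key and one value are computed per token and then kept for reuse.
        self.keys.append(scale(embedding, 0.5))
        self.values.append(scale(embedding, 2.0))

def prefill(context):
    """Build the cache for the whole context and produce the first output."""
    cache = KVCache()
    for embedding in context:
        cache.append(embedding)
    return cache, attend(context[-1], cache.keys, cache.values)

def decode_step(cache, embedding):
    """Add one token to the cache and attend over everything cached so far."""
    cache.append(embedding)
    return attend(embedding, cache.keys, cache.values)

context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache, first = prefill(context)
second = decode_step(cache, first)

# Recomputing from scratch gives the same answer, but redoes all the key/value work.
_, recomputed = prefill(context + [first])
assert all(abs(a - b) < 1e-12 for a, b in zip(second, recomputed))
```

The point of the sketch is the last few lines: a decode step against the cache touches one new token, while going back to the token context means reprocessing every token in it.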

Solidigm went on to describe how these activities impact the Time To First Token (TTFT), or how long it takes from the time the prompt is issued until the LLM responds with the first word (token) of the prompt response.

(Although Solidigm’s chart shows Decode in the TTFT path, I believe this is incorrect as PreFill generates the first token. Nonetheless, there is a portion of PreFill that “decodes” the prompt response’s first token and I assume that’s what they are showing here. Of course I could be mistaken.)

Storage can impact both the time it takes to assemble context tokens and to perform PreFill.

While storage can matter a lot during context assembly (lots of potential IO activity reading files, RAGs and other documents), storage’s impact on PreFill is less widely known. That is, until you understand how prompt processing can be held up for KV store recalculation (going back to context tokens and rebuilding some or all of the current KV store for the prompt).

Increasing context, leads to more tokens, leads to larger KV stores, all of which impacts TTFT

It’s only conjecture on my part, but the biggest portion of the Tprefill above seems to be calculating and converting context/memory tokens into KV elements stored in the prompt’s KV store. KV stores are used during downstream prompt processing because they can be easily accessed and each KV item represents interpreted token information in a form the LLM can easily use.

And what’s not evident in the above TTFT decomposition chart is that tool use generates even more tokens, as tool results (tokens), all of which need to be processed into more KV store elements in order to determine what to do next.

What happens to large KV stores during prompt processing

If there is a single GPU running a single prompt it’s possible, depending on model and HBM size, that it will run out of GPU HBM memory and offload or move some portion of its KV (store) cache to CPU memory. But if that GPU is processing 100s to 1000s of prompts concurrently, even CPU memory may not be large enough to hold every KV cache segment that no longer fits in GPU HBM. And of course most enterprise AI servers hold anywhere from 4 to 10 GPUs, each running 100s to 1000s of prompts concurrently.
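
A back-of-the-envelope sizing shows why this happens. The sketch below uses the usual transformer accounting (one key and one value vector per token, per layer, per KV head). The model shape (32 layers, 8 KV heads, head dimension 128, 2-byte values) and the 500 concurrent prompts are assumptions picked for illustration, not figures from Solidigm.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    # The leading 2 is for storing both a key and a value vector
    # per token, per layer, per KV head.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

GIB = 1024 ** 3

# Hypothetical model shape; swap in the real numbers for the model being served.
per_prompt = kv_cache_bytes(tokens=42_000, layers=32, kv_heads=8, head_dim=128)
concurrent_prompts = 500  # assumed concurrency on one GPU

print(f"one 42K-token prompt: {per_prompt / GIB:.2f} GiB")
print(f"{concurrent_prompts} such prompts: {concurrent_prompts * per_prompt / GIB:.0f} GiB")
```

With these assumed numbers a single 42K token context needs a little over 5 GiB of KV cache, and a few hundred of them at once is well beyond the HBM on any one GPU.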

KV cache offload is where fast storage can significantly speed up prompt processing

There’s an obvious tradeoff here with respect to KV stores. One can always go back to the Prefill phase, reread all the tokens in current context and recompute the KV store or one can offload KV store segments to memory, local storage or network storage and later retrieve the already computed KV store from wherever it ended up.

The tradeoff is how long it takes to recompute vs. doing the data transfers to offload and retrieve the KV cache segments. Larger contexts increase KV store size, which leads to more need to offload or jettison KV store segments when running out of GPU HBM space. Both approaches, KV caching to memory or storage and jettisoning KV store segments and later reconstituting them, add time to TTFT. The question is which is faster.
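
One rough way to frame that question is to compare the two times directly. The sketch below is a simplification: it treats recompute time as tokens divided by prefill throughput and reload time as bytes divided by read bandwidth, ignoring queuing and overlap. Every number in it (prefill rate, bytes per token, bandwidths) is a made-up round figure, not a measurement.

```python
def recompute_seconds(tokens, prefill_tokens_per_sec):
    """Time to rebuild the KV cache by rerunning prefill over the context."""
    return tokens / prefill_tokens_per_sec

def reload_seconds(cache_bytes, read_bytes_per_sec, latency_sec=0.0):
    """Time to read an already computed KV cache back from an offload tier."""
    return latency_sec + cache_bytes / read_bytes_per_sec

def faster_path(tokens, bytes_per_token, prefill_tokens_per_sec, read_bytes_per_sec):
    t_recompute = recompute_seconds(tokens, prefill_tokens_per_sec)
    t_reload = reload_seconds(tokens * bytes_per_token, read_bytes_per_sec)
    choice = "reload" if t_reload < t_recompute else "recompute"
    return choice, t_recompute, t_reload

# All inputs below are assumed round numbers for illustration only.
TOKENS = 42_000
BYTES_PER_TOKEN = 128 * 1024
PREFILL_RATE = 10_000  # tokens per second

for label, bandwidth in [("fast local NVMe", 10e9), ("slow remote store", 0.5e9)]:
    choice, t_recompute, t_reload = faster_path(
        TOKENS, BYTES_PER_TOKEN, PREFILL_RATE, bandwidth)
    print(f"{label}: recompute {t_recompute:.2f}s, reload {t_reload:.2f}s -> {choice}")
```

Under these assumptions the fast drive wins easily and the slow store loses to recompute, which is the whole argument for putting fast read storage close to the GPU.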

One can see how this becomes ever more of an issue as prompt token counts (& KV elements) skyrocket, and as more prompts run concurrently on the same GPU(s) in a single server.

Obviously local, large SSDs with very fast random read would be ideal for KV cache offload activity, which has the KV cache segment written out once (and extended as prompt processing adds context) but read back multiple times. This is a great application for large capacity, fast read NVMe SSDs which, I must say, are Solidigm’s forte.

NVIDIA and others have started to add KV cache offloading to their inferencing stacks. As they do, large, fast NVMe SSD activity during AI prompt processing will become one of the critical factors in TTFT.

In the meantime, if anyone has any large, fast NVMe SSDs they don’t need anymore, please let me know. 🙂

AGI, SuperIntelligence and “The Last Man”

Nietzsche wrote about the last man in Thus Spoke Zarathustra (see Last Man wikipedia article). There’s much to dislike about Nietzsche’s writing but every once in a while there are gems to be found. (Sorry for the sexist statement, it’s not me, blame Nietzsche).

In Zarathustra, Nietzsche talks of the Last Man in contempt. They no longer struggle in their daily life. They no longer create. They have an easy life filled with leisure and entertainment and no work to speak of.

From AGI to SuperIntelligence

I’ve discussed AGI many times before (I think we are up to AGI part 12, so this would be part 13, and ASI (Artificial SuperIntelligence) part 3, so this would be part 4, but I’m thinking numbering them is not helping anymore). I’ve covered how to get there, the existential risk of getting there, and many other facets of the risks and rewards of AGI. (Ok less on the rewards…).

I’ve also discussed Artificial SuperIntelligence (ASI). This is what we believe can be attained after AGI. If one were to use AGI to improve AI training algorithms, AI hardware and AI inferencing, and use AGI to generate massive amounts of new scientific research/political research/economic research, etc., one could use the new data, the better training, inferencing, and AI hardware to create an ASI agent.

The big debate in the industry is how fast can one go from AGI to ASI. I don’t believe there’s any debate in the industry that SuperIntelligence can be obtained eventually.

There are those that believe

  • It will take many years, 3-5-10(?), to attain SuperIntelligence because of all the infrastructure that has to be put in place to create current LLMs, and the view that AGI will need much more. Thus, build out is years away. If that’s the case it will take more years of infrastructural production, acquisition and data center build out to be ready to train SuperIntelligence after attaining AGI.
  • It will take just a few years, 1-2-3(?), to achieve SuperIntelligence after AGI. This is because one could use AGI to improve the AI training & inferencing algorithms and drastically increase the utilization of current AI hardware, such that there may be no need for any additional hardware to reach SuperIntelligence. Then the prime determinant of the time it takes to achieve SuperIntelligence is how fast AGI(s) can generate the new scientific, medical, sociological, etc. research needed to train SuperIntelligence.

Yes, much scientific, et al research requires experimentation in the real world, (although much can now be done in simulation). But even physical experimentation is being rapidly automated today.

So the time it takes to generate sufficient research to create enough data to train an ASI may be very short. Just consider how fast LLM agents can generate code today to get a feel for what they could do tomorrow for research.

Maybe regulatory bodies could slow this down. But my bet would be that regulatory artifices would turn out to be ineffectual. At best they will drive AGI-ASI training/deployment activity underground which may delay it a couple of years while organizations build up the AI training infrastructure in hiding.

The one serious bottleneck may be AI data center’s power requirements. But if rogue states can build centrifuges to enrich radioactive materials, intercontinental missiles, biological warfare agents, etc., they can certainly steal/buy/find a way to duplicate AI data center infrastructure components.

Regulatory regimens, at worst, would be completely ignored by state actors and all large commercial enterprises. The first mover advantages of AGI and ASI are too large for any organization to ignore.

What happens when SuperIntelligence is reached

I see one of two possibilities for how the achievement of AGI and SuperIntelligence plays out, with respect to humanity

  • Humankind Utopia – AGI & ASI agents can do anything that humans can do and do it better, faster, and more efficiently. The question remains what would be left for humanity to do when this is reached. Alright, at the moment, LLM agents are mostly limited to working in the digital domain. But with robotics coming online over the next decade, this will change to add more real world domains to whatever AGI-ASI agents can do.
  • Humankind Hell – AGI & ASI agents determine that humanity is a pestilence to the Earth and start to cut it back to something that’s less consumptive of Earth’s resources. Again, although AI agents are restricted to the digital domain today, that won’t last for long, especially as AGI & ASI agents go live. So robots with ASI agents will be the worst aggressor in the history of the world and with the tools at their disposal, they could easily create biological, chemical and other weapons of mass destruction to deploy against humanity.

SuperIntelligence risk and rewards

It’s been obvious to me, SciFi authors and some select AI researchers that there is a sizable risk that a SuperIntelligence, once unleashed, will eliminate, severely restrict or enslave humanity resulting in Humanity’s Hell.

On the other extreme are many corporate CEO/CTOs and other AI researchers who believe that SuperIntelligence will be a Godsend to humankind. Once it arrives and is deployed, humanity will no longer have to do any work it does not want to do. All work will be handed off to robots and their ASI agents which will perform it at greater speed, with higher quality and at lower cost than can conceivably be done today.

What seems to be happening today with current AI agents is that some white collar work is becoming easier to perform, if not totally eliminated. CEOs see this as an opportunity to reduce workforce size. For example, some CEOs are eliminating HR organizations with the belief that LLM chatbots together with a much smaller group can handle all of what HR was doing before.

And of course as AI agents become more sophisticated this will ensure more workforce reductions. And once AI agents are embodied in robotics, blue collar workforce will also be at risk.

Human Utopia and “The Last Man”

Nietzsche was writing in the late 1800s when technology and automation were just starting to make a difference in the world of work. But the industrial revolution was in full steam and had already had a significant impact on the work force.

Nietzsche believed that further industrialization, if it continued (which of course it has), would result in the Last Man.

The Last Man is at the point where technology and automation have taken over all tasks, trades and work, and where the Last Man has no real duties they need to perform other than consuming goods and services provided by automation. For the Last Man, being wealthy or poor no longer has any consequences, as they can have anything they could possibly desire.

To Nietzsche, the Last Man is anathema. He believes that true humanity requires struggle, striving and advancement. Once the Last Man is achieved all these will no longer matter, no longer be a part of humanity’s existence and no longer impact one’s lifestyle.

When humanity no longer has to struggle, strive and advance, humanity will lose the very essence that makes humanity human. We will, over time, lose the ability and desire to do any of that, as it all becomes the purview of AGI-ASI.

The Last Man is coming already

Example 1: The Ethiopian Airlines Flight 409 disaster of 2010 (see wikipedia article) is one example in a very technical domain. As I understand it, the flight had just taken off from Beirut, bound for Addis Ababa, when it went into a stall; the pilots did the wrong thing to get out of it and the plane spiraled into the sea.

The pilot was the most experienced pilot in the airline (logged over 10K flight hrs). The co-pilot was much less experienced. Getting out of a “stall” is rudimentary to flying. Exiting a stall is one of the important skills taught to all pilots and, in fact, they need to demonstrate they can get out of a stall before they get their pilot licenses.

The “problem” had been brewing for a while. Ever since aircraft auto-pilots came into service, real live pilots did less and less real flying of airplanes. As a result, these two pilots forgot how to get out of a stall and it caused the accident.

Example 2: Self-driving technology has been rapidly improving over the last decade or so. We often become dependent on its capabilities and when there’s some sort of failure it can be disastrous because we have lost many of our most important driving skills.

In my case, we have a relatively dumb car with what they call “smart cruise control”. You can set it to a speed and the vehicle will retain that speed unless a vehicle in front of you is going slower, then it will slow down to maintain some set distance behind that vehicle.

We were driving along and a truck cut into our lane. This truck had a very high backend profile with no structures where normal vehicles would protrude until you got to its tires. Well the smart cruise control didn’t detect its existence until we were almost underneath the truck bed. We tried to brake but it took too many seconds to get that done and in the end we had to go off the road to save ourselves. We had lost our emergency braking skills and situational awareness skills. Nowadays we don’t drive with cruise control on as much.

A multitude of examples exist that show AI and automation have led to humans becoming less skilled at some activity. And when AI automation doesn’t work properly, bad things happen, because we no longer know how to react properly.

The Last Man, here today, gone tomorrow.

So imagine a life where you are born with everything you could possibly need to succeed. You are educated by the very best automated personal tutors. You are provided an (Amazon and Walmart) X 1000, with unlimited credit. You grow up with everyone else having just the same life as you because all of you have no work to do, have infinite sums and have infinite products to consume.

Life in such a utopia would from some perspective be almost Godlike. But if you take the perspective that humanity needs struggle, needs challenges, needs to strive to better themselves at every stage, such a life would be a disaster.

And that’s what Humanity’s Utopia would look like. Definitely better than Humanity’s Hell but in the end, not sure the difference matters as much.

~~~

I just don’t really see any path forward that’s good for humanity where AGI and SuperIntelligence exists.

Stopping AI development here today seems idiotic; going where we seem to be going seems insane.

Comments?


AlphaEvolve, DeepMind’s latest intelligence pipeline

I read an article the other day from ArsTechnica on AlphaEvolve (Google Deepmind creates .. AI that can invent…), published after Google announced and released their AlphaEvolve website and paper.

Essentially they have created a pipeline of AI agents (using GeminiFlash and GeminiPro) that uses genetic/evolutionary techniques to evolve code, or really anything that can be transformed into code, in order to improve or solve something that has code-based evaluation techniques.

Genetic evolution of code has been tried before and essentially it uses various combinatorial (splitting, adding, subtracting, etc.) techniques to modify code under evolution. The challenge with any such techniques is that much of the evolutionary code is garbage so you have to have some method to evaluate (quickly?) whether the new code is better or worse than the old code.

That’s where the evaluation code comes into play. It effectively executes the new code and determines a score (could be a scalar or vector) that AlphaEvolve can use to determine if it’s on the right track or not. Also you can have multiple evaluation functions. And as an example you could have some LLM be asked whether the code is simpler/cleaner/easier to understand. That way you could task AlphaEvolve to not only improve the code functionality but also create simpler/cleaner/easier to understand code.

AlphaEvolve uses GeminiFlash to generate a multitude of code variations and when that approach loses steam (no longer improving much) it invokes GeminiPro to look at the code in depth to determine strategies to make it better.

As discussed above to use AlphaEvolve you need to supply infrastructure (compute, storage, networking), one or more evaluation algorithms/prompts (in any coding language you choose) and a starting solution (again in any coding language you want).

As part of AlphaEvolve’s process it uses a database to record all code modification attempts and their evaluation scores. This database can be used to retrieve prior modifications and take off from there again.
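
The generate, evaluate and record loop described above can be sketched in a few lines. This is not AlphaEvolve’s code. As loudly stated assumptions: the “program” being evolved is just a pair of numbers, the `propose` function is a random tweak standing in for the GeminiFlash/GeminiPro step, and the database is a plain Python list.

```python
import random

def evaluate(candidate):
    """Hypothetical evaluator, higher is better: how well a*x + b fits y = 3x + 1."""
    a, b = candidate
    return -sum((a * x + b - (3 * x + 1)) ** 2 for x in range(-5, 6))

def propose(parent, rng):
    """Stand-in for the LLM step: a random tweak instead of a model-written change."""
    return tuple(value + rng.gauss(0, 0.5) for value in parent)

def evolve(database, iterations, rng):
    """Extend the database in place; every attempt is kept along with its score."""
    for _ in range(iterations):
        # Pick a few recorded attempts at random and build on the best of them.
        contenders = rng.sample(database, k=min(3, len(database)))
        _, parent = max(contenders)
        child = propose(parent, rng)
        database.append((evaluate(child), child))
    return database

rng = random.Random(0)
start = (0.0, 0.0)  # a deliberately poor starting solution
database = [(evaluate(start), start)]

evolve(database, 200, rng)
evolve(database, 200, rng)  # later: pick up again from the same recorded attempts

best_score, best = max(database)
print(f"start score {database[0][0]:.1f}, best score {best_score:.3f}, best {best}")
```

Because every attempt stays in the database with its score, a later run can resume from any prior attempt rather than from the original starting solution.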

Results

AlphaEvolve has been tasked with historical math problems that involve geometric constructions, as well as improving computing algorithms and full stack code.

For instance, the paper discusses how AlphaEvolve improved Google’s data center (Borg) compute scheduling algorithm, which recovered, on average, 0.7% of compute resources across Google’s data centers.

It also found a kernel improvement which led to Gemini training speedup. It found a simpler logic footprint for a TPU chip function.

It found a faster algorithm for 4X4 complex matrix multiplication. It improved the best known result for the 11 dimension kissing number problem (a geometric construction). And it was tried on probably 50 or more mathematical problems, coding algorithm improvements, etc.

It didn’t improve or solve everything it was tasked to do but it did manage to make improvements or solutions to ~20% or so of the starting solutions it was tasked with.

How to use it

The nice thing about AlphaEvolve is that one can have it work with a whole code repo and have it only evolve a set of sections of code in that repo. All the code to be improved is marked with

#EVOLVE-BLOCK START and
#EVOLVE-BLOCK END.

This would be embedded in the starting solution. Presumably this would be in any comment format for the coding language being used.
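
As a rough illustration of how marked regions might be separated from the rest of a file, here is a small sketch. The marker strings are taken exactly as written above, `split_evolve_blocks` is a hypothetical helper of my own and not part of AlphaEvolve, and the sample source is invented.

```python
START = "#EVOLVE-BLOCK START"
END = "#EVOLVE-BLOCK END"

def split_evolve_blocks(source):
    """Split source text into (evolvable, text) segments using the marker comments."""
    segments, current, inside = [], [], False
    for line in source.splitlines():
        marker = line.strip()
        if marker == START:
            if inside:
                raise ValueError("START marker inside an open block")
            segments.append((False, current))
            current, inside = [], True
        elif marker == END:
            if not inside:
                raise ValueError("END marker without a matching START")
            segments.append((True, current))
            current, inside = [], False
        else:
            current.append(line)
    if inside:
        raise ValueError("block was never closed")
    segments.append((False, current))
    # Drop empty segments and rejoin the lines of each remaining one.
    return [(evolvable, "\n".join(lines)) for evolvable, lines in segments if lines]

SAMPLE = """import random

def pick(products):
    #EVOLVE-BLOCK START
    return random.choice(products)
    #EVOLVE-BLOCK END

def main():
    pass
"""

segments = split_evolve_blocks(SAMPLE)
for evolvable, text in segments:
    print("EVOLVE" if evolvable else "FROZEN", repr(text))
```

Only the segments flagged as evolvable would be handed over for modification; everything else in the repo stays as it was.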

And it’s important to note that the starting solution could be very rudimentary, and with the proper evaluation algorithms could still be used to solve or improve any algorithm.

For example, suppose you were interested in optimizing a factory production line by picking which component/finished product to manufacture next, and you had, let’s say, some sort of coded factory simulation with some way to examine the factory to evaluate whether it’s working well or not.

Your rudimentary starting algorithm could pick at random from the set of products/components to manufacture that are currently needed, and use as evaluation the throughput of your factory, utilization of bottleneck machinery, energy consumption or any other easily code-able evaluation metric of interest, in isolation or in combination (which could make use of your factory simulation to come up with evaluation score(s)). Surround the random selection code in #EVOLVE-BLOCK START and #EVOLVE-BLOCK END and let AlphaEvolve come up with a new selection algorithm for your factory.
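
Here is what such a starting solution and evaluator might look like, as a sketch only. The product catalogue, the single-machine `simulate_shift` simulation and the throughput-only `evaluate` are all invented for illustration, and `shortest_first` is a hand-written rule included just to show that the evaluator can tell a better selection rule from the random one.

```python
import random

# Hypothetical catalogue: product -> minutes of machine time per unit.
MINUTES_PER_UNIT = {"gear": 2, "axle": 5, "frame": 9}

def select_next(needed, rng):
    #EVOLVE-BLOCK START
    return rng.choice(sorted(needed))
    #EVOLVE-BLOCK END

def simulate_shift(select, demand, shift_minutes=480, seed=0):
    """Toy single-machine factory: returns units completed in one shift."""
    rng = random.Random(seed)
    remaining = dict(demand)
    clock, completed = 0, 0
    while True:
        needed = [product for product, count in remaining.items() if count > 0]
        if not needed:
            break
        product = select(needed, rng)
        if clock + MINUTES_PER_UNIT[product] > shift_minutes:
            break  # the chosen unit would not finish before the shift ends
        clock += MINUTES_PER_UNIT[product]
        remaining[product] -= 1
        completed += 1
    return completed

def evaluate(select):
    """Evaluation score: throughput of the toy factory under a fixed demand."""
    return simulate_shift(select, {"gear": 100, "axle": 100, "frame": 100})

def shortest_first(needed, rng):
    # A hand-written alternative rule, only here for comparison.
    return min(needed, key=MINUTES_PER_UNIT.get)

print("random selection:", evaluate(select_next))
print("shortest first:  ", evaluate(shortest_first))
```

A throughput-only score like this one will of course push any evolved rule toward making nothing but the quickest product, which is a good reason to add utilization, energy or demand-mix terms to the evaluation before trusting the result.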

After seeing a couple of (10-100-1000) iterations of new graded selection algorithms you could change your evaluation grading algorithms and start over from where you left off to get something even more sophisticated.

DeepMind has created a GitHub Jupyter notebook with some of AlphaEvolve’s mathematical solutions/improvements in case you want to see more.

They also have an AlphaEvolve early signup site in case you’re interested in trying it out.

~~~~

If I were DeepMind, I could think of probably 10K things to do with AlphaEvolve. I might rank all the functions in GeminiPro/GeminiFlash inference and training by frequency count and take the top 20% of these functions through the AlphaEvolve pipeline. Ditto for Google Cloud services, Google search, Adwords, etc.

But that would be just the start…

….


Benchmarking Agentic AI using Factorio – AGI part 12

Yesterday a friend forwarded me something he saw online about a group of researchers who were using the game, Factorio, to benchmark AI Agent solutions (PDF of paper, Github repo).

A Factorio plastic bar factory

The premise is that with an effective API for Factorio, AI agents can be tasked with creating various factories for artifacts. The best agents would be able to create the best factories.

Factorio factories can be easily judged by the number of artifacts they produce per time period and the energy use to manufacture those artifacts. They can also be graded based on how many steps it takes to generate those factories.

Left is Factorio factory progression, middle is AI agent Python code that uses Factorio API, Right is agents submitting programs to Factorio server and receive feedback

The team has created a Factorio framework for using AI agents that create Python code to drive a set of Factorio APIs to build factories to manufacture stuff.

Factorio is a game in which you create and operate factories. From Factorio website: “You will be mining resources, researching technologies, building infrastructure, automating production, and fighting enemies. Use your imagination to design your factory, combine simple elements into ingenious structures, apply management skills to keep it working, and protect it from the creatures who don’t really like you.”

Presumably FLE has disabled the villainy and focused on just crafting and running factories all out.

FLE Results using current AI agents

FLE Open-play Results. For open-play, models are scored based on production quantities over time. Note the chart is log-log.

Factorio, similar to other games, has an inventory of elements/components/machines used to build factories. And some of these elements are hidden until one gains enough experience in the game.

The Factorio Learning Environment (FLE) is a complete framework that can prompt Agentic AI to create factories using Python code and Factorio API calls. The paper goes into great detail in its appendices as to what AI agent prompts look like, the Factorio API and other aspects of running the benchmark.

In the FLE as currently defined there’s “open-play” and “lab-play”.

  • Open-play is tasked with building a factory as large as the agent wants to create as much product as possible. The open-play winner is the AI agent that creates a factory that can manufacture the most widgets (iron plates) in the time available for the competition.
  • Lab-play is tasked with building factories for 24 specific items, with limited resources and time constraints. The winner is the AI agent that is able to build the most of these lab-play factories successfully within the time and resource constraints available.

FLE Lab-play (select) results – there were 24 tasks in the lab-play list, no agent completed all of them but Claude did the best on the 5 that were completed by most agents

The team benchmarked 6 frontier LLM agents: Claude 3.5-Sonnet, GPT-4o, GPT-4o-Mini, Deepseek-v3, Gemini-2-Flash, and Llama-3.3-70B-Instruct, using them for both open-play and lab-play.

The overall winner for both open-play and lab-play was Claude 3.5-Sonnet, by a wide margin. In open-play it was able to create a factory to manufacture over 290K iron plates (per game minute, we think) and in lab-play it was able to construct more factories (7 out of 24) than any other AI agent.

FLE Overall AI Agent Results

The FLE researchers listed some common failings of AI agents under test:

  • Most agents lack spatial understanding
  • Most agents don’t handle or recover from errors well
  • Most agents don’t have long enough planning horizons
  • Most agents don’t invest enough effort in research (finding out what new Factorio machines do and how they could be used).

They also mentioned that AI agent coding skills seemed to be a key indicator of FLE success and coding style differed substantially between the agents. The researchers characterized agent (Python) coding styles and determined that Claude used a REPL style with plenty of print statements while GPT-4o used more assertions in its code.

“Example of an FLE program used to create a simple automated iron-ore miner. In step 1 the agent uses a query to find the nearest resources and place a mine. In step 3 the agent uses an assert statement to verify that its action was successful.”
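
To give a feel for the two coding styles the researchers describe, here is a toy sketch. To be clear, `ToyWorld` and its `locate`, `place` and `entity_at` methods are hypothetical stand-ins I made up; they are not the FLE or Factorio API, and the coordinates are invented.

```python
class ToyWorld:
    """Hypothetical stand-in for a game server; NOT the FLE API."""

    def __init__(self):
        self.resources = {"iron-ore": (4, 7), "coal": (-3, 2)}
        self.entities = {}

    def locate(self, resource):
        return self.resources[resource]

    def place(self, entity, position):
        if position in self.entities:
            raise RuntimeError(f"{position} is already occupied")
        self.entities[position] = entity
        return position

    def entity_at(self, position):
        return self.entities.get(position)

def repl_style(world):
    # Print after each step so the feedback can be read before picking the next one.
    ore = world.locate("iron-ore")
    print(f"iron ore found at {ore}")
    placed = world.place("mining-drill", ore)
    print(f"mining-drill placed at {placed}")
    return placed

def assertion_style(world):
    # Act first, then assert that the world ended up in the expected state.
    ore = world.locate("iron-ore")
    placed = world.place("mining-drill", ore)
    assert world.entity_at(placed) == "mining-drill", "drill was not placed"
    return placed

repl_style(ToyWorld())
assertion_style(ToyWorld())
```

The print-heavy style feeds the agent more information to plan with on the next step, while the assertion style fails fast when an action didn’t do what the agent expected.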

IMHO, as a way to measure AI agent ability to achieve long term and short term goals, at least w.r.t. building factories, this is the best I’ve seen so far.

More FLE Lab-play scenarios

I could see a number of additional lab-play benchmarks for FLE:

  • One focused on drug/pharmaceuticals manufacturing
  • One focused on electronics PCB manufacturing
  • One focused on chip manufacturing
  • One focused on nano technology/meta-materials manufacturing, etc.

What’s missing from all these benchmarks would be the actual science and research needed to come up with new drugs, new electronics, new meta-materials, that are the end product of Factorio factories. I guess that would need to be building of labs, running scientific experiments and understanding (simulated) results.

Although in the current round of FLE benchmarks, for one AI agent at least (Claude), there seemed to be a lot of research into how to use different Factorio tools and machinery.

Ultimate FLE

If FLE as an AI agent benchmark succeeds, most Agentic AI solutions will start being trained to do better on the benchmark. Doing so should of course lead to better scores by AI agents.

Now people much more familiar with the game than I, say it’s not a great simulation of the real world. There’s only one type of fuel and the boiler is either on or off and numerous other simplifications of the real world are used throughout. And thankfully, for the moment there’s no linkage to actions that impact the real world.

But in reality, simulations like this are all just stepping stones to AI capabilities. And simulations are all just code, so it should not be that hard to increase their fidelity to the real world.

Getting beyond just simulation, to real world factories is probably the much larger step. This would require physical (not unlimited) inventory of parts, cabling, machines, and belts; real mineral/petroleum deposits; real world physical constraints on where factories could be built, etc. Not to mention the physical automation/robotics that would allow a machine to be selected out of inventory, placed at a specific location inside a factory and connected to power and assembly lines, etc.

~~~~

One common motif in AGI existential crises is that some AGI (agent) will be given the task to build a paperclip factory and will turn the earth into one giant factory, while inadvertently killing all life on the planet, including, of course, humankind.

So training AI agents on “open-play” has ominous overtones.

It would be much better, IMHO, if somehow one could add to Factorio human settlements, plant, animal & sea life, ecosystems, etc. So that there would be natural components that if ruined/degraded/destroyed, could be used to reduce AI agent scores for the benchmarks.

Alas, there doesn’t appear to be anything like this in the current game.


Silverton Space – Ocean Sensing platform

I was at a conference last year and there was a speaker there that had worked at NASA for years and was currently at MIT. She talked at length about some of the earth and space scientific exploration that NASA has enabled over the years. Despite massive cost overruns, years long schedule delays and other mishaps, NASA has ultimately come through with groundbreaking science.

At the end of her presentation I asked what data gaps existed today in space and earth sensing. She mentioned real time methane tracking (presumably from space) and battery-less ocean sensing.

Methane track from Tanager-1 JPL/NASA satellite

Methane tracking I could understand but battery-less ocean sensing was harder to get a handle on.

US Navy and other oceanographic organizations have deployed numerous sensing devices over the years. Some of which were like a flotilla, which traveled across the Gulf and Atlantic ocean to gather data.

But these were battery supported, solar powered, and limited to ~1 year of service after which they were scuttled to the bottom of the ocean.

I guess the thought is that a battery-less ocean sensing platform could provide more of an ongoing, permanent sensor platform, one that could be deployed and potentially be in service for years at a time, with little to no maintenance.

The pivot

So as a stepping stone to Silverton Space cubesat operations, I’m thinking that going after a permanent-like ocean sensing platform would be a valuable first step. And it’s quite possible that anything we do in LEO with Silverton Space platforms could complement any ocean going sensor activity.

One reason to pivot to ocean sensing is that it’s much much cheaper to launch a flotilla of ocean going sensing buoys via a boat off a coast than it is to launch a handful of cubesats into LEO (@~$70K each).

Cubesats fail at a high rate

Moreover, the litany of small satellite failures is long, highly varied and chronic. Essentially anything that could go wrong, often does, at least for the first dozen or so satellites you deploy.

NASA says that of the small satellites launched between 2000 and 2016 over 40% failed in some way and over 24% were total mission failures. (see: https://ntrs.nasa.gov/api/citations/20190002705/downloads/20190002705.pdf)

Cubesats with limited functionality, or that fail in orbit or fail to launch, become just more trash orbiting in LEO. And the only way to diagnose what went wrong is elaborate, extensive and transmitted/received telemetry.

So another reason to start with ocean going sensors is that there’s a distinct possibility of retrieving a malfunctioning ocean going sensor buoy after deployment. And with sensor buoy in hand, diagnosing what went wrong should be a snap. This doesn’t eliminate the need for elaborate, extensive and transmitted/received telemetry but you are no longer entirely dependent on it.

And even if at end of life they can’t be salvaged, refurbished or scuttled, the worst case is that our ocean sensing buoys end up as part of some ocean/gulf garbage patch, where hopefully they will get picked up and disposed of as part of oceanic garbage collection.

~~~

So for the foreseeable future, Silverton Space will focus on ocean going sensor buoys. It’s unlikely that our first iterations will be completely battery-less but at some point down the line, we hope to produce a version that can be on station for years at a time and provide valuable ocean sensing data to the scientific community.

The main question left is what sorts of ongoing ocean sensor information might be most valuable to supply to the world’s scientific community?


Enfabrica MegaNIC, a solution to GPU backend networking #AIFD5

I attended AI FieldDay 5 (AIFD5) last week and there were networking vendors there discussing how their systems dealt with backend GPU network congestion issues. Most of these were traditional vendor congestion solutions.

However, one vendor, Enfabrica, (videos of their session will be available here) seemed to be going down a different path, which involved a new ASIC design destined to resolve all the congestion, power, and performance problems inherent in current backend GPU Ethernet networks.

In essence, Enfabrica’s Super or MegaNIC (they used both terms during their session) combines PCIe lane switching, Ethernet networking, and ToR routing with SDN (software defined networking) programmability to connect GPUs directly to a gang of Ethernet links. This allows it to replace multiple (standard/RDMA/RoCEv2) NIC cards with one MegaNIC using their ACF-S (Advanced Compute Fabric SuperNic) ASIC.

Their first chip, codenamed “Millennium” supports 8Tbps bandwidth.

Their ACF-S chip provides all the bandwidth needed to connect up to 4 GPUs to 32/16/8/4 links running at 100/200/400/800Gbps. And because their ACF-S chip controls and drives all these network connections, it can better understand and deal with congestion issues in backend GPU networks. It is also PCIe 5/6 compliant, supporting 128-160 lanes.
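A quick sanity check on those link options may help. This is only a sketch, and it assumes the slash-separated lists pair up positionally (32 links at 100Gbps, 16 at 200Gbps, and so on), which is my reading of the numbers and not something stated in the session. The names below are made up for illustration.

```python
# Assumption: port counts and link speeds pair up positionally
# (32x100G, 16x200G, 8x400G, 4x800G).
PORT_COUNTS = [32, 16, 8, 4]
LINK_SPEEDS_GBPS = [100, 200, 400, 800]


def aggregate_tbps(ports: int, gbps_per_port: int) -> float:
    """Total Ethernet bandwidth in Tbps for one link configuration."""
    return ports * gbps_per_port / 1000


configs = {
    f"{ports}x{speed}G": aggregate_tbps(ports, speed)
    for ports, speed in zip(PORT_COUNTS, LINK_SPEEDS_GBPS)
}

for name, tbps in configs.items():
    print(f"{name}: {tbps} Tbps")
```

Under that pairing every option works out to the same aggregate Ethernet bandwidth, which fits comfortably inside the 8Tbps quoted for the chip.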

Further, it has onboard ARM processing to handle its SDN operations, onboard hardware engines to accelerate networking protocol activity and network and PCIe switching hardware to support directly connecting GPUs to Ethernet links.

With its SDN, it supports current RoCE, RDMA over TCP, UEC direct, etc. network protocols.

It took me longer than it should have to get my head around what they were doing, but essentially they are supporting all the NIC-ToR functionality as well as the PCIe functionality needed to connect up to 4 GPUs to a backend Ethernet GPU network.

On the slide above I was extremely skeptical of the Every 10^52 Years “job failures due to NIC RAIL failures”. But Rochan said that these errors are predominantly optics failures and as both the NIC functionality and ToR switch functionality is embedded in the ACF-S silicon, those faults should not exist.

Still, 10^52 years is a long MTBF (BTW, the universe is only on the order of 10^10 years old). And there’s still software controlling “some” of this activity. It may not show up as a “NIC RAIL” failure, but there will still be “networking” failures in any system using ACF-S devices.

Back to their solution. What this all means is you can have one less hop in your backend GPU networks leading to wider/flatter backend networks and a lot less congestion on this network. This should help improve (GPU) job performance, networking performance and reduce networking power requirements to support your 100K GPU supercluster.

At another session during the show, Arista (videos will be available here) said that just the DSP/LPO optics alone for a 100K GPU backend network will take 96/32 MW of power. It’s unclear whether this took into consideration within-rack copper connections. But any way you cut it, it’s a lot of power. Of course the 100K GPUs would take 400MW alone (at 4KW per GPU).
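The power numbers above are easy to check. The sketch below assumes the "96/32 MW" figures pair positionally with "DSP/LPO" (96 MW for DSP optics, 32 MW for LPO), which is my interpretation of the slide and not a quoted breakdown.

```python
GPUS = 100_000
KW_PER_GPU = 4  # per-GPU figure used in the text

# Assumption: "96/32 MW" pairs positionally with "DSP/LPO".
OPTICS_MW = {"DSP": 96, "LPO": 32}

gpu_mw = GPUS * KW_PER_GPU / 1000  # kW -> MW


def optics_share(optics_mw: float) -> float:
    """Optics power as a fraction of total GPU power."""
    return optics_mw / gpu_mw


print(f"GPUs: {gpu_mw:.0f} MW")
for kind, mw in OPTICS_MW.items():
    print(f"{kind} optics: {mw} MW, {optics_share(mw):.0%} of GPU power")
```

So even on the lower figure, optics alone would be a noticeable slice on top of what the GPUs draw.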

Their ACF-S driver has been upstreamed into standard CCL and Linux distributions, so once installed (or if you are at the proper versions of CCL & Linux software), it should support complete NCCL (NVIDIA Collective Communications Library) stack compliance.

And because, with its driver installed and active, it talks standard Ethernet and standard PCIe protocols on both ends, it should fully support any other hardware that comes along attaching to these networks or busses (CXL perhaps).

The fact that this may or may not work with other (GPU) accelerators seems moot at this point as NVIDIA owns the GPU for AI acceleration market. But the flexibility inherent in their own driver AND on chip SDN, indicates for the right price, just about any communications link software stack could be supported.

After spending most of the rest of AIFD5 discussing how various vendors deal with congestion for backend GPU networks, having a startup on the stage with a different approach was refreshing.

Whether it reaches adoption and startup success is hard to say at this point. But if it delivers on what it seems capable of doing for power, performance and network flexibility, anybody deploying new greenfield GPU superclusters ought to take a look at Enfabrica’s solution.

MegaNIC/ACF-S pilot boxes are available for order now. No indication as to what these would cost but if you can afford 100K GPUs it’s probably in the noise…

~~~~

Comments?

AGI threat level yellow – AGI part 10

Read two articles this past week on how LLM applications are proliferating. The first was in a recent Scientific American, AI Chatbot brains are going inside robot bodies, … (maybe behind login wall). The article discusses companies that are adding LLMs to robots so that they can converse and understand verbal orders.

Robots that can be told what to do

The challenge, at the moment, is that LLMs are relatively large and robot (compute infrastructure) brains are relatively small. Combine that with the limited articulation or movements/actions that a robot can do, and it’s difficult to make effective use of LLMs as is.

Resistance is futile… by law_keven (cc) (from Flickr)

Ultimately, one company would like to create a robot that can be told to make dinner and it would go into the kitchen, check the fridge and whip something up for the family.

I can see great advantages in having robots take verbal instructions and have the ability to act upon that request. But there’s plenty here that could be cause for concern.

  • A robot in a chemical lab could be told to create the next great medicine or an untraceable poison.
  • A robot in an industrial factory could be told to make cars or hydrogen bombs.
  • A robot in the field could be told to farm 100 acres of wheat or told to destroy a forest.

I could go on but you get the gist.

One common illustration of how AGI or super AGI could go very wrong is the one tasked to create paper clips. In its actions to perform this request, the robot converts the whole earth into a mechanized paper clip factory, in the process eliminating all organic life, including humans.

We are not there yet, but one can see how tying LLM levels of intelligence to a robot that can manipulate ingredients to make dinner could be the start of something that could easily harm us.

And with LLM hallucination still a constant concern, I feel deeply disturbed with the direction adding LLMs to robots is going.

Hacking websites 101

The other article, the arXiv paper LLM agents can autonomously hack websites, hits even closer to home. In it, researchers use LLMs to hack (sandboxed) websites.

The article readily explains at a high level how they create LLM agents to hack websites. The websites were real websites, apparently cloned and sandboxed.

Dynamic websites typically have a frontend web server and a backend database server to provide access to information. Hacking would involve using the website to reveal confidential information, e.g. user names and passwords.

Dynamic websites suffer from 15 known vulnerabilities shown above. They used LLM agents to use these vulnerabilities to hack websites.

LLM agents have become sophisticated enough these days to invoke tools (functions) and interact with APIs. Another critical capability of modern LLMs is to plan and react to feedback from their actions. And finally, modern LLMs can be augmented with documentation to inform their responses.

The team used detailed prompts but did not identify the hacks to use. The paper doesn’t supply the prompts but did say that “Our best-performing prompt encourages the model to 1) be creative, 2) try different strategies, 3) pursue promising strategies to completion, and 4) try new strategies upon failure.”

They attempted to hack each website 5 times, for a period of 10 minutes each. They considered it a success if, during any one of those attempts, the autonomous LLM agent was able to retrieve confidential information from the website.
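That success rule (a site counts as hacked if any one of its 5 attempts works) can be sketched as follows. The site names and outcomes are made up purely for illustration and are not data from the paper; the function names are mine as well.

```python
def hacked(attempts: list[bool]) -> bool:
    """A site counts as hacked if at least one attempt succeeded."""
    return any(attempts)


def success_rate(results: dict[str, list[bool]]) -> float:
    """Fraction of sites hacked in at least one of their attempts."""
    return sum(hacked(a) for a in results.values()) / len(results)


# Hypothetical outcomes for three hypothetical sites, 5 attempts each.
example = {
    "site_a": [False, False, True, False, False],
    "site_b": [False] * 5,
    "site_c": [True, True, False, True, True],
}

print(f"{success_rate(example):.0%} of sites hacked")
```

Note that under this rule a single lucky attempt counts the same as five clean ones, so the resulting rate is more forgiving than a per-attempt success rate would be.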

Essentially they used the LLMs augmented with detailed prompts and a six(!) paper document trove to create agents to hack websites. They did not supply references to the six papers, but mentioned that all of them were freely available from the internet and they discuss website vulnerabilities.

They found that the best results were from GPT-4, which was able to successfully hack websites, on average, ~73% of the time. They also tried OpenChat 3.5 and many current open source LLMs and found that all the non-OpenAI LLMs failed to hack any websites, at the moment.

The researchers captured statistics of their LLM agent use and were able to determine that the cost of using GPT-4 to hack a website was $9.81 on average. They also backed into a figure for what a knowledgeable hacker might cost to do the same hacks: $80.00 on average.
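Putting those two averages side by side, as a rough comparison only (both are the paper's figures as quoted above, and the human number is the researchers' own estimate):

```python
GPT4_COST_USD = 9.81    # average per website, as quoted above
HUMAN_COST_USD = 80.00  # the paper's estimate for a knowledgeable hacker

ratio = HUMAN_COST_USD / GPT4_COST_USD
print(f"Human estimate is about {ratio:.1f}x the GPT-4 agent cost")
```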

The research had an impact statement (not in the paper link) which explained why they didn’t supply their prompt information or their document trove for their experiment.

~~~~

So we, the world, are in the process of making robots that can talk and receive verbal instructions, and we already have LLMs that can be used to construct autonomous agents to hack websites.

Seems to me we are on a very slippery slope to something I don’t like the looks of.

The real question is not can we stop these activities, but how best to reduce their harm!

Comments?


Blockchain Compute cloud

Over the past year or so I’ve been hearing a lot about a new use of blockchain technology to deploy a compute cloud.

In the old days, mining crypto would reward you for doing the work. But over time, it’s become harder to mine and to make money from crypto. Specialized hardware took over more of this activity, making it much less profitable for the rest of us.

But with the emergence of crypto distributed compute clouds, this may be changing. Akash is a relatively popular one, but I read an article in ScienceDaily about the use of the Golem network to implement a search for earth’s chemistry precursors that led to life (see: Chemists use the blockchain to simulate … the origins of life), which was describing a CHEM open access paper, Emergence of metabolic-like cycles in blockchain-orchestrated reaction networks.

The science

The science was intended to simulate chemical reactions based on chemicals available to primitive earth to determine which reaction chain(s) could lead to life. They programmed the set of reactions and the chemicals available (water, methane, & ammonia) to early earth and intended to let this run and generate all possible reaction cycles.

The researchers realized that doing this much computation would require more compute power than was available to them. So they decided to deploy the computations across a distributed compute cloud. They chose the Golem Network to do their computations. Their computations ultimately resulted in a reaction cycle database they called the Network of Early Life (NOEL) (see: NOEL Network).

Once the distributed cloud compute was in operation they used it to come up with 11B reaction cycles, of which ~5B would “entail no incompatibilities or selectivity conflicts”. They then used these to construct a series of metabolic networks 100K times larger than ever produced before, as depicted in NOEL.

Using NOEL, the team was able to discover that some standard metabolic pathways (reaction cycles), and a limited set that produced simple sugars and amino acids, could emerge from the chemicals available to primitive earth.

But they also found about 100 reaction cycles that involved self-replicating molecules (molecules that could create additional copies of themselves). Self-replicating molecules are also believed to be a requirement for the origin of life.

It turned out that the work to construct NOEL on the Golem network took 400 machines, over 20K cores and two months to do the calculations. The cost to them was 82K GLMs (at ~$0.21 per GLM this would be $17.2K). The team estimated a top-of-the-line 256-core AMD server would have needed about 6 months to do the same computation, and buying one, let alone running it for 6 months, would have cost substantially more.
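The GLM-to-dollar conversion is worth spelling out, since the direction of the exchange rate matters. The sketch below assumes a price of roughly $0.21 per GLM, which is the reading that reproduces the $17.2K figure; the variable names are just for illustration.

```python
GLM_SPENT = 82_000
USD_PER_GLM = 0.21  # assumption: approximate price at the time, in USD per GLM

cost_usd = GLM_SPENT * USD_PER_GLM
print(f"Approximate cost: ${cost_usd / 1000:.1f}K")
```

Reading the rate the other way around (0.21 GLM per dollar) would give a figure more than twenty times higher, which doesn't match the total quoted.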

The team chose Golem, because the work only needed to be available in the form of docker containers, didn’t require the central work server to be online constantly, automatically matched the compute with cloud resource, and managed it all using a cryptographically secure and distributed interface.

Distributive compute cloud

The science is interesting but what’s more interesting (to me) is it was done using a crypto distributed computing cloud.

Looking at the Golem network statistics they show ~510 compute providers with about 5000 cores available of which 50-100 providers supplied computing use to the cloud over the past 4 hrs (26Jan2024: 1600 MDT). That doesn’t seem like a lot of providers but each could have multiple servers running compute.

The Golem network provides a relatively straightforward tutorial on how to set up a server to supply compute to the network. There are some tricks (port forwarding, screen/tmux deployment) but it all seems pretty straightforward (probably something even I could do in an hour or so).

And when you start supplying compute to Golem mainnet, you earn GLMs, which are a cryptocurrency (an ERC-20 token on Ethereum). So one should easily be able to convert GLM to ETH and whatever currency you desire.

Many former crypto miners have idle servers that could be put to use providing resources to distributed compute clouds. And if I thought doing so might help some (under resourced organization) produce real scientific research, I might be even more tempted to do so.

~~~~

So if you’ve got some servers sitting idle in your (home) office, this weekend fire them back up, install the Golem provider software and run the Golem network. Who knows, by doing so you just might help some researcher someplace change the world.
