NVIDIA Triton Giant Model Inference, a step too far

At GTC this week NVIDIA announced a new capability for their AI suite called Triton Giant Model Inference . This solution addresses the current and future problem of trying to perform inferencing with models whose parameters exceed a single GPU card.

During NVIDIA’s GTC show they showed a chart which indicates that model parameters are on an exponential climb (just eyeballing it here but 10X every year since 2018). Current models, like OpenAI’s GPT-3 have 175B parameters. Such a model would require ~350GB of GPU memory to perform inferencing on the whole model.

The fact that NVIDIA’s A100 currently sports 80GB of GPU memory means that GPT-3 would need to be cut up or partitioned to run on NVIDIA GPUs. Hence the need (from NVIDIA’s perspective) for a mechanism that can allow them to perform multi-GPU inferencing or their Triton Giant Machine Inference engine (GMI).

But first please take our new poll:

Why do we need GMI

It’s unclear what needs to be done to perform inferencing with a 175B parameter model today but my guess it involves a lot of manual splitting up of the model, into different layers/partitions and running the layers/partitions on separate GPUs and gluing the output of one portion to the input of the next. Such activity would be a complex, manual undertaking and would inherently slow down the model inferencing activities and add to inferencing latencies.

With Triton GMI, NVIDIA appears able to supply automated multi-GPU inferencing for models that exceed single GPU memory. Whether such models can span (DGX) servers or not was not revealed but even within a single DGX server there’s 4-A100s, so that provides an aggregate of 320GB of GPU memory. Of course, it’s very likely future Ampere GPUs will allow for more memory.

Why consider a step too far

Here’s my point, with artificial general intelligence (AGI, reasoning at human levels and beyond), coming sooner or later. My (and perhaps, humanities) preference is to have this happen later than earlier. Hopefully, this will give us more time to understand how to design/engineer/control AGI so that it doesn’t harm humanity or the earth. (See my post on Existential event risk… for more information on risks of Superintelligence)

One way to control or delay the emergence of AGI is to limit model size. Now NVIDIA, Google and others have already released capabilities that allow them to train models that exceed the size of one GPU.

Alas, the only thing left is to consider limit the size of models that can be used to perform inferencing. I fear that Triton GMI pretty much open up the flood gates to supply any size model inferencing. This will provide for more and more sophisticated AI/ML/DL models and will uncap model sizes in the near future.

Doing this will give us (humanity) a little more time to understand how to control AGI. But all this presupposes that any AGI will require more parameters than current DNN models. I think this is a safe assumption but I’m no expert.

Will delaying NVIDIA Triton GMI really help

I was not briefed on internals of GMI but possibly it makes use of DGX NV-Link and NVIDIA Software to automatically partition a DNN and deploy it over the 4-A100 GPUS in a DGX.

NVIDIA is not the only organization working on advancing DNN training and inferencing capabilities. And it’s very likely that more than one of them (Google, FaceBook, AWS, etc) have probably identified the model size as a problem for inferencing and are working on their own solutions. So delaying GMI will not be a long term fix.

But maybe if we could just delay this capability from reaching the market for 2 to 5 years it would have a follow on impact of delaying the emergence of AGI.

Is this going to stop some one/some organization from achieving AGI, probably not. Could it delay some person/organization/government from getting there – maybe. Perhaps, it will give humanity enough time to come up with other ways to control AGI. But I fear the more technology moves on, are options for controlling AGI diminish.

Don’t get me wrong. I think AI, DL NN and NVIDIA (Google, DeepMind, Facebook and others) have done a great service to help mankind succeed over this next century. And I in no way wish to hold back this capability. And a “good” AGI has the potential to help everyone on this earth in more ways than I can imagine.

But achieving AGI is a step function and once unleashed it may be difficult to control. Anything we can do today to a) delay the emergence of AGI and b) help to control it, is IMHO, worthy of consideration.


Photo Credits:

  • from NVIDIA GTC Keynote by Jensen Huang, CEO
  • From Hackernoon article, Can Bitcoin AGI develops to benefit humanity