Comment by ckbyrd on How feasible/costly would it be to train a very large AI model on distributed clusters of GPUs? · 2022-01-25T15:44:13.841Z

Don't know a ton about this but here are a few thoughts:

- Overall, I think distributed compute is probably not good for training or inference, but might be useful for data engineering or other support functions. 

- Folding@home crowdsources compute for expanding Markov state models of possible protein-folding paths. AFAIK, this doesn't require backpropagation or any similarly latency-sensitive update step. The crowdsourced computers just play out a bunch of scenarios, which are then aggregated and pruned offline. Interesting paths are used to generate new workloads for future rounds of crowdsourcing.

This is an important disanalogy to deep RL models, and I suspect it's why F@H doesn't suffer from the issues Lennart mentioned (latency, data bandwidth, etc.).

This approach can work for some of the applications that people use big models for - e.g. Rosetta@home does roughly the same thing as AlphaFold, just worse. (AFAIK AlphaFold can't do what F@H does - it's a different problem.)
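To make the contrast with backprop concrete, here's a minimal sketch of that loop - toy dynamics, made-up state space and constants, nothing resembling F@H's actual protocol. The point is just that workers run independently and only counts get merged offline:

```python
# Hypothetical sketch of the F@H-style pattern: workers run independent
# trajectories with no gradient sync; a coordinator aggregates transition
# counts offline and seeds the next round from under-explored states.
import random
from collections import defaultdict

N_STATES = 50                 # toy discrete state space (assumption)
TRAJ_LEN = 100                # steps per work unit (assumption)
WORK_UNITS_PER_ROUND = 200    # assumption

def simulate_trajectory(start_state):
    """One work unit: a latency-insensitive random walk a volunteer could run offline."""
    traj = [start_state]
    for _ in range(TRAJ_LEN):
        # stand-in dynamics; a real worker would run molecular dynamics here
        traj.append((traj[-1] + random.choice([-1, 0, 1])) % N_STATES)
    return traj

def aggregate(trajectories, counts):
    """Offline aggregation: fold observed transitions into a count matrix."""
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[(a, b)] += 1
    return counts

def next_work_units(counts, k=WORK_UNITS_PER_ROUND):
    """Adaptive sampling: start the next round from the least-visited states."""
    visits = defaultdict(int)
    for (a, _), c in counts.items():
        visits[a] += c
    starts = sorted(range(N_STATES), key=lambda s: visits[s])[:10]
    return [random.choice(starts) for _ in range(k)]

counts = defaultdict(int)
work = [random.randrange(N_STATES) for _ in range(WORK_UNITS_PER_ROUND)]
for _round in range(3):
    results = [simulate_trajectory(s) for s in work]   # in reality: scattered volunteer GPUs
    counts = aggregate(results, counts)                # in reality: done server-side, offline
    work = next_work_units(counts)
```

Nothing in that loop cares whether a work unit comes back in a second or a day, which is exactly what you lose once the update step needs synchronized gradients.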

- F@H and DL both benefit from GPUs because both workloads boil down to matrix multiplication, which is a pretty general primitive. If future AI systems train on more specialized hardware, it may become too hard to crowdsource useful levels of compute.

- Inference needs less aggregate compute, but it often requires very low latency, which probably makes it a bad candidate for distribution (rough numbers after this list).

- IMO crowdsourced compute is still interesting even if it's no good for large model training/inference. It's really good at what it does (see F@H, Rosetta@home, cryptomining collectives, etc.), and MDPs are highly general even with long latencies/aggregation challenges. 
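To put the inference-latency point above in rough numbers (all of these are assumptions for illustration, not measurements of any real deployment): if you pipeline a large model's layers across volunteer machines, every token has to cross the WAN once per stage.

```python
# Back-of-envelope for distributed inference latency. Every number below
# is an assumption picked for illustration.
n_stages = 48                 # model split across 48 volunteer machines
wan_rtt_s = 0.05              # ~50 ms round trip between random consumer connections
compute_per_stage_s = 0.002   # per-token compute per stage, assumed small vs network

per_token_s = n_stages * (wan_rtt_s + compute_per_stage_s)
print(f"~{per_token_s:.1f} s per token distributed")   # ~2.5 s/token
print("vs a few tens of ms per token on one co-located server")
```

The aggregate FLOPs are there; the serial network hops are what kill you.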

Maybe clever engineers could find ways to use it for e.g. A/B testing fine-tunings of large models, or exploiting unique data sources/compute environments (self-driving cars, drones, satellites, IoT niches, phones).
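For instance (purely hypothetical sketch, with made-up names): a coordinator could hand each volunteer device a prompt plus two fine-tuned variant IDs, let the device run and score both locally, and aggregate only the win counts - no latency-sensitive syncing required.

```python
# Hypothetical sketch of crowdsourced A/B testing of two fine-tuned variants.
# run_variant and judge are placeholders for whatever a volunteer device would
# actually run locally; only small (prompt, vote) pairs cross the network.
import random
from collections import Counter

def run_variant(variant_id, prompt):
    # placeholder: a real device would query a local copy of the fine-tune
    return f"{variant_id} answer to {prompt!r}"

def judge(prompt, answer_a, answer_b):
    # placeholder preference signal (user choice, heuristic, local reward model, ...)
    return random.choice(["A", "B"])

def volunteer_job(prompt):
    """Everything in here runs on the volunteer's hardware, offline."""
    a = run_variant("finetune_A", prompt)
    b = run_variant("finetune_B", prompt)
    return judge(prompt, a, b)

prompts = [f"prompt {i}" for i in range(1000)]          # made-up eval set
votes = Counter(volunteer_job(p) for p in prompts)      # in reality: scattered devices
print(votes.most_common())                              # coordinator only sees vote totals
```

Like F@H, the structure is embarrassingly parallel with cheap offline aggregation, which is the regime where crowdsourced compute actually shines.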