comment by jbash · 2023-11-03T23:23:39.025Z
OK... although I notice that everybody in the initial post is just assuming you could run the uploads, without offering any argument that you can.
Human brains have probably more than 1000 times as many synapses as current LLMs have weights. All the values describing the synapse behavior have to be resident in some kind of memory with a whole lot of bandwidth to the processing elements. LLMs already don't fit on single GPUs.
Unlike transformers, brains don't pass nice compact contexts from layer to layer, so splitting them across multiple GPU-like devices is going to slow you way down because you have to shuttle such big vectors between them... assuming you can even vectorize most of it at all given the timing and whatnot, and don't have to resort to much discrete message passing.
It's not even clear that you can reduce a biological synapse to a single weight; in fact you probably can't. For one thing, brains don't run "inference" in the way that artificial neural networks do. They run forward "inference-like" things, and at the same time do continuous learning based on feedback systems that I don't think are well understood... but definitely are not back propagation. It's implausible that relatively short-term tasks mostly don't depend on that learning, so you're probably going to have to do something more like continuously running training than like continuously running inference.
There are definitely also things going on in there that depend on the relative timings of cascades of firing through different paths. There are also chemicals sloshing around that affect the ensemble behavior of whole regions on the scale of seconds to minutes. I don't know about in brains, but I do know that there exist biological synapses that aren't just on or off, either.
You can try to do dedicated hardware, and colocate the "weights" with the computation, but then you run into the problem that biological synapses aren't uniform. Brains actually do have built-in hardware architectures, and I don't believe those can be replicated efficiently with arrays of uniform elements of any kind... at least not unless you make the elements big enough and programmable enough that your on-die density goes to pot. If you use any hardwired heterogeneity and you get it wrong, you have to spin the hardware design, which is Not Cheap (TM). You also lose density because you have to replicate relatively large computing elements instead of only replicating relatively dense memory elements. You do get a very nice speed boost on-die, but at a wild guess I'd say that's probably a wash with the increased need for off-die communication because of the low density.
If you want to keep your upload sane, or be able to communicate with it, you're also going to have to give it some kind of illusion of a body and some kind of illusion of a comprehensible and stimulating environment. That means simulating an unknown but probably large amount of non-brain biology (which isn't necessarily identical between individuals), plus a not-inconsiderable amount of outside-world physics.
So take a GPT-4 level LLM as a baseline. Assume you want to speed up your upload to be able to fast-talk about as fast as the LLM can now, so that's a wash. Now multiply by 1000 for the raw synapse count, by say 2 for the synapse diversity, by 5? for the continuous learning, by 2 for the extra synapse complexity, and by conservatively 10 for the hardware bandwidth bottlenecks. Add another 50 percent for the body, environment, etc.
So running your upload needs 300,000 times the processing power you need to run GPT-4. Which I suspect is usually run on quad A100s (at maybe $100,000 per "inference machine").
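Spelled out, the 300,000 figure is just the product of the guesses above. This is a minimal sketch of that arithmetic; the factor names are labels made up for illustration, and the values are the rough guesses from the text, not measurements.

```python
# Multipliers over a GPT-4-level baseline, copied from the guesses above.
# The dictionary keys are illustrative labels, not established terms.
factors = {
    "synapse_count": 1000,
    "synapse_diversity": 2,
    "continuous_learning": 5,   # the "5?" guess
    "synapse_complexity": 2,
    "bandwidth_bottleneck": 10,
}
BODY_ENV_OVERHEAD = 0.5  # "another 50 percent" for body, environment, etc.


def upload_multiplier(factors, overhead):
    """Multiply all factors together, then add the fractional overhead."""
    product = 1
    for value in factors.values():
        product *= value
    return product * (1 + overhead)


multiplier = upload_multiplier(factors, BODY_ENV_OVERHEAD)
print(f"compute multiplier over the baseline: {multiplier:,.0f}x")
```

Every factor enters multiplicatively, so halving or doubling any single guess moves the total by the same ratio.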
You can't just spend 30 billion dollars and shove 1,200,000 A100s into a chassis; the power, cooling, and interconnect won't scale (nor is there fab capacity to make them). If you packed them into a sphere at say 500 per cubic meter (which allows essentially zero space for cooling or interconnects, both of which get big fast), the sphere would be about 16 meters across and dissipate 300MW (with a speed of light delay from one side to the other of 50ns).
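The hardware numbers in that paragraph can be checked the same way. This is a back-of-the-envelope sketch, not a design: the 250 W per GPU is an assumption chosen because it is what 300MW across 1,200,000 GPUs implies, and the other constants are the guesses from the text.

```python
import math

UPLOAD_MULTIPLIER = 300_000        # from the estimate above
GPUS_PER_MACHINE = 4               # "quad A100s"
COST_PER_MACHINE_USD = 100_000     # guessed price per inference machine
GPUS_PER_CUBIC_METER = 500         # packing density from the text
WATTS_PER_GPU = 250                # assumption implied by the 300MW figure
SPEED_OF_LIGHT_M_PER_S = 299_792_458

gpus = UPLOAD_MULTIPLIER * GPUS_PER_MACHINE
cost_usd = UPLOAD_MULTIPLIER * COST_PER_MACHINE_USD
volume_m3 = gpus / GPUS_PER_CUBIC_METER

# Sphere volume V = (4/3) * pi * r^3, solved for the diameter 2r.
diameter_m = 2 * (3 * volume_m3 / (4 * math.pi)) ** (1 / 3)

power_mw = gpus * WATTS_PER_GPU / 1e6
light_delay_ns = diameter_m / SPEED_OF_LIGHT_M_PER_S * 1e9

print(f"GPUs:        {gpus:,}")
print(f"cost:        ${cost_usd:,}")
print(f"volume:      {volume_m3:,.0f} m^3")
print(f"diameter:    {diameter_m:.1f} m")
print(f"power:       {power_mw:.0f} MW")
print(f"light delay: {light_delay_ns:.0f} ns")
```

The diameter comes out between 16 and 17 meters and the one-way light delay a little over 50ns, which matches the rounded figures in the text.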
Improved chips help, but won't save you. Moore's law in "area form" is dead and continues to get deader. If you somehow restarted Moore's law in its original, long-since-diverged-from form, and shrank at 1.5x in area per year for the next 17 years, you'd have transistors ten times smaller than atoms (and your power density would be, I don't know, 100 times as high, leading to melted chips). And once you go off-die, you're still using macroscopic wires or fibers for interconnect. Those aren't shrinking... and I'm not sure the dies can get a lot bigger.
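For a sense of scale on the shrink scenario, here is a rough sketch of what 17 years at 1.5x area per year compounds to. The 3 nm starting size is a hypothetical placeholder based on a nominal node name, not a measured transistor dimension, so the final number is only indicative.

```python
import math

AREA_SHRINK_PER_YEAR = 1.5   # from the scenario in the text
YEARS = 17                   # from the scenario in the text
START_FEATURE_NM = 3.0       # hypothetical starting size (nominal node name)

# Area shrinks compound yearly; linear dimensions shrink by the square root.
area_shrink = AREA_SHRINK_PER_YEAR ** YEARS
linear_shrink = math.sqrt(area_shrink)
end_feature_nm = START_FEATURE_NM / linear_shrink

print(f"area shrink:   {area_shrink:,.0f}x")
print(f"linear shrink: {linear_shrink:.1f}x")
print(f"end feature:   {end_feature_nm:.3f} nm (from {START_FEATURE_NM} nm)")
```

That is nearly a 1000x reduction in area, or about 31x in each linear dimension. Under the assumed starting size the result lands around a tenth of a nanometer, which is sub-atomic territory; a different starting size changes the endpoint but not the compounding.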
Switching to a completely different architecture the way I mentioned above might get back 10X or so, but doesn't help with anything else as long as you're building your system out of a fundamentally planar array of transistors. So you still have a 240 cubic meter, 30MW, order-of-3-billion-dollar machine, and if you get the topology wrong on the first try you get to throw it away and replace it. For one upload. That's not very competitive with just putting 10 or even 100 people in an office.
Basically, to be able to use a bunch of uploads, you need to throw away all current computing technology and replace it with some kind of much more element-dense, much more interconnect-dense, and much less power-dense computing substrate. Something more brain-like, with a 3D structure. People have been trying to do that for decades and haven't gotten anywhere; I don't think it's going to be manufactured in bulk by 2040.
... or you can try to trim the uploads themselves down by factors that end with multiple zeroes, without damaging them into uselessness. That strikes me as harder than doing the scanning... and it also strikes me as something you can't make much progress on until you have mostly finished solving the scanning problem.
It's not that you can't get some kind of intelligence in realistic hardware. You might even be able to get something much smarter than a human. But you're specifically asking to run a human upload, and that doesn't look feasible.
Replies from: steve2152, jacobjacob