How did LW update p(doom) after LLMs blew up?
post by FinalFormal2 · 2023-04-22T14:21:23.174Z · LW · GW · 7 comments
This is a question post.
Contents
Answers: DirectedEvolution (15), Daniel Kokotajlo (14), tailcalled (9), mako yass (2), Raemon (2), waveman (1) · 7 comments
Here's something which makes me feel very much as if I'm in a cult:
After LLMs became a massive thing, I've heard a lot of people raise their p(doom) on the basis that we're in shorter timelines.
How have we updated p(doom) on the idea that LLMs are very different from the hypothesized AI?
Firstly, it seems to me that it would be much more difficult to FOOM with an LLM; it seems much more difficult to create a superintelligence in the first place; and getting them to act creatively and be reliable looks like a much harder problem than making sure they aren't too creative.
LLMs often default to human wisdom on topics, and the way we're developing them with AutoGPT, they can't even really think privately. If you had to imagine a better model of AI for a disorganized species to trip into, could you get safer than LLMs?
Maybe I've just not been looking in the right places to see how the discourse has changed, but it seems like we're spending all the weirdness points on preventing the training of a language model that at the end of the day will be slightly better than GPT-4.
I will bet any amount of money that GPT-5 will not kill us all.
Answers
Firstly, it seems to me that it would be much more difficult to FOOM with an LLM; it seems much more difficult to create a superintelligence in the first place; and getting them to act creatively and be reliable looks like a much harder problem than making sure they aren't too creative.
Au contraire, for me at least. I am no expert on AI, but prior to the LLM blowup and seeing AutoGPT emerge almost immediately, I thought that endowing AI with agency[1] would take an elaborate engineering effort that somehow went beyond imitation of human outputs, such as language or imagery. I was somewhat skeptical of the orthogonality thesis. I also thought that it would take massive centralized computing resources not only to train but also to operate trained models (as I said, no expert). Obviously that is not true, and in a utopian outcome, access to LLMs will probably be a commodity good, with lots of roughly comparable models from many vendors to choose from and widely available open-source or hacked models as well.
Now, I see the creation of increasingly capable autonomous agents as just a matter of time, and ChaosGPT is overwhelming empirical evidence of orthogonality as far as I'm concerned. Clearly morality has to be enforced on the fundamentally amoral intelligence that is the LLM.
For me, my p(doom) increased due to the orthogonality thesis being conclusively proved correct and realizing just how cheap and widely available advanced AI models would be to the general public.
Edit: One other factor I forgot to mention is how instantaneously we shifted from "AI doom is sci-fi, don't worry about it" to "AI doom is unrealistic because it just won't happen, don't worry about it" as LLMs became an instant sensation. I have been deeply disappointed on this issue by Tyler Cowen, who I really did not expect to shift from his usual thoughtful, balanced engagement with advanced ideas to just utter punditry on the issue. I think I understand where he's coming from - the huge importance of growth, the desire not to see AI killed by overregulation in the manner of nuclear power, etc - but still.
It has reinforced my belief that a fair fraction of the wealthy segment of the boomer generation will see AI as a way to cheat death (a goal I'm a big fan of), and will rush full-steam ahead to extract longevity tech out of it because they personally do not have time to wait to align AI, and they're dead either way. I expect approximately zero of them to admit this is a motivation, and only a few more to be crisply conscious of it.
^ creating adaptable plans to pursue arbitrarily specified goals in an open-ended way
↑ comment by FinalFormal2 · 2023-04-24T18:18:52.890Z · LW(p) · GW(p)
It sounds like your model of AI apocalypse is that a programmer gets access to a powerful enough AI model that they can make the AI create a disease or otherwise cause great harm?
Orthogonality and wide access as threat points both seem to point towards that risk.
I have a couple of thoughts about that scenario:
- OpenAI (and hopefully other companies as well) are doing the basic testing of how much harm can be done with a model used by a human.
- The best models will be gate kept for long enough that we can expect the experts will know the capabilities of the system before they make it widely available.
- Under this scenario the criminal has an AI, but so does everyone else.
- Running the best LLMs will be very expensive, so the criminal is restricted in their access.
- All these barriers to entry increase the time that experts have to realize the risk and gatekeep.
I understand the worry, but this does not seem like a high P(doom) scenario to me.
Given that in this scenario we have access to a very powerful LLM that is not immediately killing people, this sounds like a good outcome to me.
↑ comment by DirectedEvolution (AllAmericanBreakfast) · 2023-04-24T19:24:21.045Z · LW(p) · GW(p)
It sounds like your model of AI apocalypse is that a programmer gets access to a powerful enough AI model that they can make the AI create a disease or otherwise cause great harm?
AI risk is disjunctive - there are a lot of ways to proliferate AI, a lot of ways it could fail to be reasonably human-aligned, and a lot of ways to use or allow an insufficiently aligned AI to do harm. So that is one part of my model, but my model doesn't really depend on gaming out a bunch of specific scenarios.
I'd compare it to the heuristic economists use that "growth is good": we don't know exactly what will happen, but if we just let the market do its magic, good things will tend to happen for human welfare. Similarly, "AI is bad (by default)": we don't know exactly what will happen, but if we just let capabilities keep advancing, there's a >10% chance we'll see an unavoidably escalating or sudden history-defining catastrophe as a consequence. We can make micro-models (e.g. talking about what we see with ChaosGPT) or macro-models (e.g. coordination difficulties) in support of this heuristic.
OpenAI (and hopefully other companies as well) are doing the basic testing of how much harm can be done with a model used by a human
I don't think this is accurate. They are testing specific harm scenarios where they think the risks are manageable. They are not pushing AI to the limit of its ability to cause harm.
the best models will be gate kept for long enough that we can expect the experts will know the capabilities of the system before they make it widely available
In this model, the experts may well release a model with much capacity for harm, as long as they know it can cause that harm. As I say, I think it's unlikely that the experts are going to figure out all the potential harms - I work in biology, and everybody knows that the experts in my field have many times released drugs without understanding the full extent of their ability to cause harm, even in the context of the FDA. My field is probably overregulated at this point, but AI most certainly is not - it's a libertarian's dream (for now).
under this scenario the criminal has an AI, but so does everyone else, running the best LLMs will be very expensive, so the criminal is restricted in their access
Models are small enough that if hacked out of the trainer's systems, they could be run on a personal computer. It's training that is expensive and gatekeeping-compatible.
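(As a rough illustration of the sizes involved - the parameter count and precisions below are assumptions for the sake of the arithmetic, not claims about any specific model:)

```python
# Back-of-the-envelope footprint of leaked model weights on consumer hardware.
def model_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB for a given parameter count and precision."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# Hypothetical example: a 13B-parameter model.
print(model_size_gb(13, 2.0))   # ~26 GB in fp16 - workstation territory
print(model_size_gb(13, 0.5))   # ~6.5 GB at 4-bit quantization - fits a consumer GPU
```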
We don't need to posit that a human criminal will be actively using the AI to cause havoc. We only need to imagine an LLM-based computer virus hacking other computers, importing its LLM onto them, and figuring out new exploits as it moves from computer to computer.
Again, AI risk is disjunctive: arguing against one specific scenario is useful, but it doesn't end the debate. It's like Neanderthals trying to game out all the ways they could fight back against humans if superior human intelligence started letting the humans run amok. "If the humans try to kill us in our sleep, we can just post guards to keep an eye out for them. CHECKMATE, HUMANS!"... and, well, here we are, and where are the Neanderthals? Superior intelligence can find many avenues to get what it wants, unless you have some way of aligning its interests with your own.
The USA just had a huge leak of extremely important classified documents because the Pentagon apparently can't get its act together to not just spray this stuff all over the place. People hack computers for a few thousand bucks; they'd certainly go after the world's leading software technology, worth something like a billion dollars in training funds. And I know for a fact that not all SOTA LLM purveyors have fully invested in adequate security measures to prevent their models from being stolen. This is par for the course.
I disagree with your premise; what's currently happening is very much in-distribution for what was prophesied. It's definitely got a few surprises in it, but "much more difficult to FOOM" and the other things you list aren't among them IMO.
I agree that predict-the-world-first, then-develop-agency (and do it via initially-human-designed-bureaucracies) is a safer AGI paradigm than e.g. "train a big NN to play video games and gradually expand the set of games it can play until it can play Real Life." (credit to Jan Leike for driving this point home to me). I don't think this means things will probably be fine; I think things will probably not be fine.
We could have had CAIS (Comprehensive AI Services) though, and that would have been way safer still. (At least, five years ago more people seemed to think this; I was not among them.) Alas, things don't seem to be heading in that direction.
↑ comment by jacob_cannell · 2023-04-23T01:34:59.913Z · LW(p) · GW(p)
By "what was prophecied", I'm assuming you mean EY's model of the future as written in the sequences and moreover in hanson foom debates [? · GW].
EY's foom model goes something like this:
- humans are nowhere near the limits of intelligence - not only in terms of circuit size, but also crucially in terms of energy efficiency and circuit/algorithm structure
- biology is also not near physical limits - there is great room for improvement (i.e. strong nanotech)
- mindspace is wide [LW · GW] and humans occupy only a narrow slice of it
So someday someone creates an AGI, and then it can "rewrite its source code" to create a stronger or at least faster thinker, quickly bottoming out in a completely alien mind far more powerful than humans which then quickly creates strong nanotech [LW · GW] and takes over the world.
But he was mostly completely wrong [LW(p) · GW(p)] here - because human brains are actually efficient [LW · GW], and biology is actually pretty much pareto optimal [LW(p) · GW(p)] so we can mostly rule out [LW(p) · GW(p)] strong nanotech.
So instead we are more slowly advancing towards brain-like AGI, where we train ANNs through distillation on human thoughts to get AGI designed in the image of the human mind [LW(p) · GW(p)], which thinks human-like thoughts including our various cognitive biases & heuristics. These AGIs cannot 'rewrite their source code' any more than you or I can (which is to say, you or I or an AGI could write the source code for a new AGI architecture... and then spend $1B training it...).
So even though Hanson was somewhat wrong in the one specific detail that our AGI is not literally scanned brain emulations, it is far, far closer to brain emulations than the de novo alien AGI EY predicted (because you don't actually need to scan a brain to recreate a human-like mind; distillation works pretty well), and this isn't a very important distinction regardless. So Hanson's model is far closer to reality.
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-23T02:22:40.803Z · LW(p) · GW(p)
Upvoted for quality argument/comment, but agreement-downvoted.
I wasn't referring specifically to Yudkowsky's views, no.
I disagree that energy efficiency is relevant, either as a part of Yudkowsky's model or as a constraint on FOOM.
I also disagree that nanotech possibility is relevant. I agree that Yud is a big fan of nanotech, but FOOM followed by rapid world takeover does not require nanotech.
I think mindspace is wide. It may not be wide in the ways your interpretation of Yud thinks it is, but it's wide in the relevant sense -- there's lots of room for improvement in general intelligence, and human values are complex/fragile.
Thanks for the link to Hanson's old post; it's a good read! I stand by my view that Yudkowsky's model is closer to reality than Hanson's.
↑ comment by jacob_cannell · 2023-04-23T04:01:25.814Z · LW(p) · GW(p)
I disagree that energy efficiency is relevant, either as a part of Yudkowsky's model or as a constraint on FOOM.
I said efficiency in general, not energy efficiency specifically. Assume Moore's law is over now, and the brain is fully flop efficient, such that training AGI requires at least 1e24 flops (and perhaps even 1e23B memops) on a 1e13B+ model. There is no significant further room for any software or hardware improvement - at all. In that world, is EY's FOOM model correct in the slightest?
Everything about foom depends on efficiency of AGI vs the brain.
You are also probably mistaken that efficiency is not a part of EY's model, in part because he seems to agree that foom depends on thermodynamic efficiency improvement over the brain, and explicitly said so a bit over a year ago [LW · GW]:
Which brings me to the second line of very obvious-seeming reasoning that converges upon the same conclusion - that it is in principle possible to build an AGI much more computationally efficient than a human brain - namely that biology is simply not that efficient, and especially when it comes to huge complicated things that it has started doing relatively recently.
ATP synthase may be close to 100% thermodynamically efficient, but ATP synthase is literally over 1.5 billion years old and a core bottleneck on all biological metabolism. Brains have to pump thousands of ions in and out of each stretch of axon and dendrite, in order to restore their ability to fire another fast neural spike. The result is that the brain's computation is something like half a million times less efficient than the thermodynamic limit for its temperature - so around two millionths as efficient as ATP synthase. And neurons are a hell of a lot older than the biological software for general intelligence!
The software for a human brain is not going to be 100% efficient compared to the theoretical maximum, nor 10% efficient, nor 1% efficient, even before taking into account the whole thing with parallelism vs. serialism, precision vs. imprecision, or similarly clear low-level differences.
This is a critical flaw in his model, which spurred me to write an entire post [LW · GW] refuting it.
I also disagree that nanotech possibility is relevant. I agree that Yud is a big fan of nanotech, but FOOM followed by rapid world takeover does not require nanotech.
I also agree that nanotech is not that relevant (unless you are talking about practical top-down nanotech, aka chip lithography), but I was discussing EY's model in which strong nanotech is important.
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-04-23T16:47:28.203Z · LW(p) · GW(p)
I said efficiency in general, not energy efficiency specifically.
Your link went to a post which we had previously argued about... Your wonderful post goes into all sorts of details about the efficiency of the brain, most centrally energy efficiency, but doesn't talk about the kinds of efficiency that matter most. The kind of efficiency that matters most is something like "Performance on various world-takeover and R&D tasks, as a function of total $, compute, etc. initially controlled." Here are the kinds of efficiency you talk about in that post (direct quote):
- energy efficiency in ops/J
- spatial efficiency in ops/mm^2 or ops/mm^3
- speed efficiency in time/delay for key learned tasks
- circuit/compute efficiency in size and steps for key low level algorithmic tasks [1] [LW · GW]
- learning/data efficiency in samples/observations/bits required to achieve a level of circuit efficiency, or per unit thereof
Yeah, those are all interesting and worth thinking about but not what matters at the end of the day. (To be clear, I am not yet convinced by your arguments in the post, but that's a separate discussion) Consider the birds vs. planes analogy. My guess is that planes still aren't as efficient as birds in a bunch of metrics (energy expended per kg per mile travelled, dollar cost to manufacture per kg, energy cost to manufacture per kg...) but that hasn't stopped planes from being enormously useful militarily and economically, much more so than birds. (We used to use birds to carry messages; occasionally people have experimented with them for military purposes also e.g. anti-drone warfare).
Assume Moore's law is over now, and the brain is fully flop efficient, such that training AGI requires at least 1e24 flops (and perhaps even 1e23B memops) on a 1e13B+ model. There is no significant further room for any software or hardware improvement - at all. In that world, is EY's FOOM model correct in the slightest?
Funnily enough, I think these assumptions are approximately correct* & yet I think once we get human-level AGI, we'll be weeks rather than years from superintelligence. If you agree with me on this, then it seems a bit unfair to dunk on EY so much, even if he was wrong about various kinds of brain efficiency. Basically, if he's wrong about these kinds of brain efficiency, then the maximum limits of intelligence reachable by FOOM are lower than Yud thought, and also the slope of the intelligence explosion will probably be a bit less steep. And I'm grateful that your post exists carefully working through the issues there. But quantitatively if it still takes only a few weeks to reach superintelligence -- by which I mean AGI which is significantly more competent than the best-ever humans at task X, for all relevant intellectual tasks -- then the bottom line conclusions Yudkowsky drew appear to be correct, no?
(I'd like to give more concrete details about what I expect singularity to look like, but I'm hesitant because I'm a bit constrained in what I can and should say. I'd be curious though to hear your thoughts on Gwern's fictional takeover story, which I think is unrealistic in a bunch of ways but am curious to hear whether it's violating any of the efficiency limits you've argued for in that brain efficiency post--and then how the story would need to be changed in order to respect those limits.)
*To elaborate: I'm anticipating a mostly-software singularity, not hardware-based, so while I do think there's probably significant room for improvement in hardware I don't think it matters to my bottom line. I also expect that the first AGIs will be trained on more than 1e24 FLOP, and while I no longer think that 1e13 parameters will definitely be required, I think it's quite plausible that they will be. I guess the main way in which I substantively disagree with your assumptions is the "no significant further room for any software improvement at all" bit. If we interpret it narrowly as no significant further room for improvement in the list of efficiency dimensions you gave earlier, then sure, I'm happy to take that assumption on board. If we interpret it broadly as no significant further room for capabilities-per-dollar, or capabilities-per-flop, then I don't accept that assumption, and claim that you haven't done nearly enough to establish it, and instead seem to be making a similar mistake to a hypothetical bird enthusiast in 1900 who declared that planes would never outcompete pigeons because it wasn't possible to be substantially more efficient than birds.
↑ comment by jacob_cannell · 2023-04-23T18:31:48.960Z · LW(p) · GW(p)
The kind of efficiency that matters most is something like "Performance on various world-takeover and R&D tasks, as a function of total $, compute, etc. initially controlled." Here are the kinds of efficiency you talk about in that post
Efficiency in terms of intelligence/$ is obviously downstream dependent on the various lower level metrics I cited.
Funnily enough, I think these assumptions are approximately correct* & yet I think once we get human-level AGI, we'll be weeks rather than years from superintelligence.
I may somewhat agree, depending on how we define SI. However the current transformer GPU paradigm seems destined for a slowish takeoff. GPT4 used perhaps 1e25 flops and produced only a proto-AGI (which ironically is far more general than any one human, but still missing critical action/planning skills/experience), and it isn't really feasible to continue that scaling to 1e27 flops and beyond any time soon.
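(To put a rough number on that scaling gap - the per-GPU throughput and utilization below are assumptions for the sake of the estimate, not measured figures:)

```python
# Fermi estimate of what a 1e27-FLOP training run would take.
target_flops = 1e27
peak_flops_per_gpu = 1e15      # assumed peak throughput of a current flagship GPU
utilization = 0.4              # assumed effective utilization during training
seconds_per_month = 30 * 24 * 3600

gpu_months = target_flops / (peak_flops_per_gpu * utilization) / seconds_per_month
print(f"{gpu_months:,.0f} GPU-months")             # roughly a million
print(f"{gpu_months / 3:,.0f} GPUs for 3 months")  # roughly 320,000
```

Under those assumptions, a 1e27-FLOP run needs on the order of a million GPU-months, which is why the next 100x step looks hard in the near term.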
If you agree with me on this, then it seems a bit unfair to dunk on EY so much, even if he was wrong about various kinds of brain efficiency.
I don't think it's unfair at all. EY's unjustified claims are accepted at face value by too many people here, but in reality his sloppy analysis results in a poor predictive track record. The AI/ML folks who dismiss the LW doom worldview as crankish are justified in doing so if this is the best argument for doom.
But quantitatively if it still takes only a few weeks to reach superintelligence -- by which I mean AGI which is significantly more competent than the best-ever humans at task X, for all relevant intellectual tasks -- then the bottom line conclusions Yudkowsky drew appear to be correct, no?
I'm not sure what the "only a few weeks" measures, but I'll assume you are referring to the duration of a training run. For various reasons I believe this will tend to be a few months or more for the most competitive models at least for the foreseeable future, not a few weeks.
We already have proto-AGI in the form of GPT4 which is already more competent than the average human at most white-collar, non-robotic tasks. Further increase in generality is probably non-useful; most of the further value will come from improving agentic performance to increasingly out-compete the most productive/skilled humans in valuable skill niches - i.e. going for more skill depth rather than width. This may require increasing specialization and larger parameter counts - for example, if creating the world's best lisp programmer requires 1T params just by itself, that will result in pretty slow takeoff from here.
I also suspect it may soon be possible to have a (very expensive) speed-intelligence that has roughly human-level ability but thinks 100x or 1000x faster, but that isn't the kind of FOOM EY predicted. That's a scenario I predicted [LW · GW], as did Hanson and others to varying degrees: human-shaped minds running at high speeds. Those will necessarily be brain-like AGI, as the brain is simply what intelligence optimized for efficiency and especially low latency looks like. Digital minds can use their much higher clock rate to run minimal-depth brain-like circuits at high speeds or much deeper circuits at low speeds, but the evidence is now pretty overwhelmingly favoring the benefits of the former over the latter - as I predicted well in advance.
I'd be curious though to hear your thoughts on Gwern's fictional takeover story, which I think is unrealistic in a bunch of ways but am curious to hear whether it's violating any of the efficiency limits
The central premise of the story is that an evolutionary search auto-ML process running on future TPUs and using less than 5e24 flops (around the training cost of GPT4) suddenly results in an SI, in a future world that seems to completely lack even human-level AGI. No, I don't think that's realistic at all, because the brain is efficient and just human-level AGI requires around that much; SI requires far more. The main caveat of course - as I already mentioned - is that once you train a human-level AGI you could throw a bunch of compute at it to run it faster, but doing that doesn't actually increase the net human-power of the resulting mind (vs spending the same compute on N agent instances in parallel).
If we interpret it broadly as no significant further room for capabilities-per-dollar, or capabilities-per-flop, then I don't accept that assumption, and claim that you haven't done nearly enough to establish it,
Essentially all recent progress comes from hardware, not software. Some people here like to cite a few works trying to measure software progress, but those conclusions/analyses are mostly all wrong (a tangent for another thread). The difference between GPT1 scaling to GPT4 is almost entirely due to throwing more money at hardware combined with hardware advances from Nvidia/TSMC. OpenAI's open secret of success is simply that they were the first to test the scaling hypothesis - which, more than anything else, is a validation of my predictive model[1].
The hardware advances are about to peter out, and scaling up the spend on supercomputer training by another 100x from ~$1B ~~is not really an option~~ seems unlikely anytime soon due to poor scaling of supercomputers of that size and various attendant risks[2].
and instead seem to be making a similar mistake to a hypothetical bird enthusiast in 1900 who declared that planes would never outcompete pigeons because it wasn't possible to be substantially more efficient than birds.
I never said AGI wouldn't outcompete humans; on the contrary, my model has very much been AGI or early SI by the end of this decade and a strong singularity before 2050. But the brain is actually efficient, it just takes a lot of compute to reverse engineer it, and Moore's law is ending. Moravec's model was mostly correct, but so was Hanson's (because Hanson's model is very much an AGI-requires-virtual-brains model, and he has carefully thought out much of the resulting economics).
I should point out that for anthropic reasons we should obviously never expect to witness the endpoint of EY's doom model, but that model still makes some tentatively different intermediate predictions which have mostly all been falsified. ↩︎
A $100B one month training run would require about 50 million high-end GPUs (which cost about $2,000 a month each), and tens of gigawatts of power. Nvidia ships well less than a million flagship GPUs per year. ↩︎
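(The footnote's Fermi estimate checks out under its own stated figures; the per-GPU power draw below is an added assumption:)

```python
# Sanity check of the footnote's estimate for a $100B one-month training run.
budget_usd = 100e9            # $100B budget
gpu_cost_per_month = 2_000    # stated cost per high-end GPU per month
gpus_needed = budget_usd / gpu_cost_per_month         # 50,000,000 GPUs

watts_per_gpu = 700           # assumed draw per GPU, before cooling/overhead
total_gigawatts = gpus_needed * watts_per_gpu / 1e9   # ~35 GW, i.e. tens of GW
print(gpus_needed, total_gigawatts)
```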
↑ comment by habryka (habryka4) · 2023-04-23T18:57:57.399Z · LW(p) · GW(p)
Not commenting on this whole thread, which I do have a lot of takes about that I am still processing, but a quick comment on this line:
The hardware advances are about to peter out, and scaling up the spend on supercomputer training by another 100x from ~$1B is not really an option.
I don't see any reason for why we wouldn't see a $100B training run within the next few years. $100B is not that much (it's roughly a third of Google's annual revenue, so if they really see competition in this domain as an existential threat, they alone might be able to fund a training run like this).
It might have to involve some collaboration of multiple tech companies, or some government involvement, but I currently expect that if scaling continues to work, we are going to see a $100B training run (though like, this stuff is super hard to forecast, so I am more like 60% on this, and also wouldn't be surprised if it didn't happen).
↑ comment by jacob_cannell · 2023-04-23T20:05:20.868Z · LW(p) · GW(p)
In retrospect I actually somewhat agree with you, so I edited that line and denoted it with a strike-through. Yes, a $100B training run is an option in theory, but it is unlikely to translate to a 100x increase in training compute due to datacenter scaling difficulties, and this is also greater than OpenAI's estimated market cap. (I also added a note with a quick Fermi estimate showing that a training run of that size would require massively increasing Nvidia's GPU output by at least an OOM.) For various reasons I expect even those with pockets that deep to instead invest more in a number of GPT4-size runs exploring alternate training paths.
I basically agree that LLMs don't seem all that inherently dangerous and am somewhat confused about rationalists' reaction to them. LLMs seem to have some inherent limitations.
That said, I could buy that they could become dangerous/accelerate timelines. To understand my concern, let's consider a key distinction in general intelligence: horizontal generality vs vertical generality.
- By horizontal generality, I mean the ability to contribute to many different tasks. LLMs supersede or augment search engines in being able to funnel information from many different places on the internet right to a person who needs it. Since the internet contains information about many different things, this is often useful.
- By vertical generality, I mean the ability to efficiently complete tasks with minimal outside assistance. LLMs do poorly on this, as they lack agency, actuators, sensors and probably also various other things needed to be vertically general.
(You might think horizontal vs vertical generality is related to breadth vs depth of knowledge, but I don't think it is. The key distinction is that breadth vs depth of knowledge concerns fields of information, whereas horizontal vs vertical generality concerns tasks. Inputs vs outputs. Some tasks may depend on multiple fields of knowledge, e.g. software development depends on programming capabilities and understanding user needs, which means that depth of knowledge doesn't guarantee vertical generality. On the other hand, some fields of knowledge, e.g. math or conflict resolution, may give gains in multiple tasks, which means that horizontal generality doesn't require breadth of knowledge.)
While we have had previous techniques like AlphaStar with powerful vertical generality, they required a lot of data from those domains they functioned in in order to be useful, and they do not readily generalize to other domains.
Meanwhile, LLMs have powerful horizontal generality, and so people are integrating them into all sorts of places. But I can't help but wonder whether the integration of LLMs in various places will develop their vertical generality, partly by giving them access to more data, and partly by incentivizing people to develop programmatic scaffolding which increases their vertical generality.
So LLMs getting integrated everywhere may incentivize removing their limitations and speeding up AGI development.
Note that a lot of people are responding to a nontrivial enhancement of LLMs that they can see over the horizon, but won't talk about publicly for obvious reasons, so it won't be clear what they're reacting to and they also might not say when you ask.
Though, personally, although my timelines have shortened, my P(Doom) has decreased in response to LLMs, as it seems more likely now that we'll be able to get machines to develop an ontology and figure out what we mean by "good" before having developed enough general agency to seriously deceive us or escape the lab. However, shortening timelines have still led me to develop an intensified sense of focus and urgency. Many of the things that I used to be interested in doing don't make sense any more. I'm considering retraining.
↑ comment by FinalFormal2 · 2023-04-24T18:34:43.083Z · LW(p) · GW(p)
Hey Mako, I haven't been able to identify anyone who seems to be referring to an enhancement in LLMs that might be coming soon.
Do you have evidence that this is something people are implicitly referring to? Do you personally know someone who has told you this possible development, or are you working as an employee for a company which makes it very reasonable for you to know this information?
If you have arrived at this information through a unique method, I would be very open to hearing that.
↑ comment by mako yass (MakoYass) · 2023-04-24T19:10:12.910Z · LW(p) · GW(p)
Basically everyone working on AGI professionally sees potential enhancements on prior work that they're not talking about. The big three have NDAs even just for interviews, and if you look closely at what they're hiring for, it's pretty obvious they're trying a lot of stuff that they're not talking about.
It seems like you're touching on a bigger question: do the engines of invention see where they're going before they arrive? Personally, I think so, but it's not a very legible skill, so people underestimate it or half-ass it.
↑ comment by FinalFormal2 · 2023-04-24T17:37:57.651Z · LW(p) · GW(p)
I didn't really update on LLMs in the past year. I did update after GPT2* that LLMs were a proof of concept that we could do a variety of types of cognition, and the mechanism of how the cognition played out seemed to have similar mid-level building blocks to my cognition [LW · GW]. So, it was an update on timelines (which can affect p(doom)).
GPT4 is mostly confirming that hypothesis rather than providing significant new evidence (it'd have been an update for me if GPT4 hadn't been that useful).
*in particular after this post https://slatestarcodex.com/2020/01/06/a-very-unlikely-chess-game/
↑ comment by Raemon · 2023-04-22T21:50:53.495Z · LW(p) · GW(p)
(I think people are confusing "rationalists are pointing at LLMs as a smoking gun for a certain type of progress being possible" as "rationalists are updating on LLMs specifically being dangerous")
↑ comment by Rudi C (rudi-c) · 2023-04-23T00:12:45.243Z · LW(p) · GW(p)
EY explicitly calls for an indefinite ban on training GPT5. If GPTs are harmless in the near future, he's being disingenuous by scaring people about nonexistent threats and making them forgo economic (and intellectual) progress so that AGI timelines are vaguely pushed a bit back. Indeed, by now I won't be surprised if EY's private position is to oppose all progress so that AGI is also hindered along with everything else.
This position is not necessarily wrong per se, but EY needs to own it honestly. p(doom) doesn’t suddenly make deceiving people okay.
↑ comment by Raemon · 2023-04-23T00:16:10.321Z · LW(p) · GW(p)
The reason to ban GPT5 (at least in my mind), is because each incremental chunk of progress reduces the amount of distance from here to AGI Foom and total loss of control of the future, and because there won't be an obvious step after GPT5 at which to stop.
(I think GPT5 wouldn't be dangerous by default, but could maybe become dangerous if used as the base for a RL trained agent-type AI, and we've seen with GPT4 that people move on to that pretty quickly)
↑ comment by Rudi C (rudi-c) · 2023-04-23T17:44:23.145Z · LW(p) · GW(p)
1. This argument (no a priori known fire alarm after X) applies to GPT4 not much better than to any other impressive AI system. More narrowly, it could have been said about GPT3 as well.
2. I can't imagine a (STEM) human-level LLM-based AI FOOMing.
2.1 LLMs are slow. Even GPT3.5-turbo is only a bit faster than humans, and I doubt a more capable LLM will be able to reach even that speed.
2.1.1 Recursive LLM calls ala AutoGPT are even slower.
2.2 LLMs’ weights are huge. Moving them around is difficult and will leave traceable logs in the network. LLMs can’t copy themselves ad infinitum.
2.3 LLMs are very expensive to run. They can’t just parasitize botnets to run autonomously. They need well funded human institutions to run.
2.4 LLMs seem to be already plateauing.
2.5 LLMs can’t easily self-update like all other deep models; “catastrophic forgetting.” Updating via input consumption (pulling from external memory to the prompt) is likely to provide limited benefits.
So what will such a smart LLM accomplish? At most, it’s like throwing a lot of researchers at the problem. The research might become 10x faster, but such an LLM won’t have the power to take over the world.
One concern is that once such an LLM is released, we can no longer pause even if we want to. This doesn't seem that likely on first thought; human engineers are also incentivized to siphon GPU hours to mine crypto, yet this did not happen at scale. So the smart LLM will also not be able to stealthily train other models on institutional GPUs.
3. I do not expect to see such a smart LLM in this decade. GPT4 can't even play tic-tac-toe well; its reasoning ability seems very low.
4. Mixing RL and LLMs seems unlikely to lead to anything major. AlphaGo etc. probably worked so well because of the search mechanism (simple MCTS beats most humans) and the relatively low dimensionality of the games. ChatGPT is already utilizing RLHF and search in its decoding phase. I doubt much more can be added. AutoGPT has had no success stories thus far, either.
Summary: We can think about pausing when a plausible capability jump has a plausible chance of escaping control and causing significantly more damage than some rogue human organization. OTOH, now is a great time to attract technical safety researchers from nearby fields. Both the risks and rewards are in sharp focus.
Postscript: The main risks EY's current thesis has are stagnation and power consolidation. While cloud-powered AI is easier to control centrally to avoid rogue AI, cloud-powered AI is also easier to rent-seek on, to erase privacy, and to brainwash people. An ideal solution must be a form of multipolarity in equilibrium. There are two main problems imaginable:
- asymmetrically easy offense (e.g., single group kills most others).
- humans being controlled by AIs even while the AIs are fighting (like how horses fought in human wars).
If we can’t solve this problem, we might only escape AI control to become enslaved by a human minority instead.
My only update was the thought that maybe more people will see the problem. The whole debate in the world at large has been a cluster***k.
* Linear extrapolation - exponentials apparently do not exist
* Simplistic analogies, e.g. the tractor only caused 10 years of misery and unemployment, so any further technology will do no worse.
* Conflicts of interest and motivated reasoning
* The usual dismissal of geeks and their ideas
* Don't worry, leave it to the experts. We can all find plenty of examples where this did not work. https://en.wikipedia.org/wiki/List_of_laboratory_biosecurity_incidents
* People saying this is risky being interpreted as a definite prediction of a certain outcome.
As Elon Musk recently pointed out, the more proximate threat may be the use of highly capable AIs as tools, e.g. to work on social media to feed ideas to people and manipulate them. An evil/amoral/misaligned AI taking over the world would happen later.
Some questions I ask people:
* How well did the advent of homo sapiens work out for less intelligent species like homo habilis? Why would AI be different?
* Look at the strife between groups of differing cognitive abilities and the skewed availability of resources between those groups (deliberately left vague to avoid triggering someone).
* Look how hard it is to predict the impact of technology - e.g. Krugman's famous insight that the internet would have no more impact than the fax machine. I remember doing a remote banking strategy in 1998 and asking senior management where they thought the internet fitted into their strategy. They almost all dismissed it as a land of geeks and academics and of no relevance to real businesses. A year later they demanded to know why I had misrepresented their clear view that the internet was going to be central to banking henceforth. Such is the ability of people to think they knew it all along, when they didn't.
↑ comment by FinalFormal2 · 2023-04-24T17:51:18.002Z · LW(p) · GW(p)
What are your opinions about how the technical quirks of LLMs influence their threat level? I think the technical details are much more consistent with a lower threat level.
If you update P(doom) every time people are not rational, you might be double-counting, btw. (AKA you can't update every time you rehearse your argument.)
7 comments
comment by quetzal_rainbow · 2023-04-22T21:42:52.867Z · LW(p) · GW(p)
How have we updated p(doom) on the idea that LLMs are very different from the hypothesized AI?
Actually, what were your predictions? "Hypothesized AI", as far as I understood you, is only the final step - AGI that kills us. The path to it can be very weird. I think that before GPT many people could say "my peak of probability distribution lies on model-based RL as the path to AGI", but they still had very fat and long tails in this distribution.
it seems like we're spending all the weirdness points on preventing the training of a language model that at the end of the day will be slightly better than GPT-4.
The point of slowing down AI is not preventing the training of the next model; the point is to slow down AI. There is no right moment in the future to slow down AI, because there is no fire alarm for AI (i.e., there is no formally defined threshold in capabilities that can logically convince everyone to halt development of AI until we solve the alignment problem). The right moment is "right now", and that has been true for every moment since we realized that AI can kill us all (sometime in the 1960s?).
comment by tailcalled · 2023-04-22T21:17:26.553Z · LW(p) · GW(p)
I suspect it's worth distinguishing cults from delusional ideologies. As far as I can tell, it is common for ideologies to have inelastic, false, poorly founded beliefs; the classical example is belief in the supernatural. I'm not sure what the exact line between cultishness and delusion is, but I suspect that it's often useful to define cultishness as something like treating opposing ideologies as infohazards. While rationalists are probably guilty of this, the areas where they are guilty of it don't seem to be p(doom) or LLMs, so it might not be informative to focus cultishness accusations on that.
comment by gilch · 2023-04-23T19:25:22.128Z · LW(p) · GW(p)
My timelines got shorter. ChatGPT to GPT-4 rollout was only a few months (the start of an exponential takeoff, like our recent experience with COVID?), and then we had the FLI petition, and Eliezer's ongoing podcast tour, and the ARC experiment with GPT-4 defeating a captcha by lying to a human.
I also personally experienced talking to these things, and they can more-or-less competently write code, one of the key requirements for an intelligence explosion scenario.
Before all this, I felt that the AI problem couldn't possibly happen at present, and we still had decades, at least. I don't think so anymore. All of the pieces are here and it's only a matter of putting them together and adding more compute.
I used to have the bulk of my probability mass around 2045, because that's when cheap compute would catch up with estimates of the processing power of the human brain. I now have significant probability mass on takeoff this decade, and noticeably nonzero mass on it having happened yesterday and not caught up with me.
comment by rvnnt · 2023-04-23T17:20:17.690Z · LW(p) · GW(p)
I will bet any amount of money that GPT-5 will not kill us all.
What's the exchange rate for USD to afterlife-USD, though? Or what if they don't use currency in the afterlife at all? Then how would you pay the other party back if you lose?
comment by Mitchell_Porter · 2023-04-24T00:36:25.615Z · LW(p) · GW(p)
if you had to imagine a better model of AI for a disorganized species to trip into, could you get safer than LLMs?
Conjecture's CoEms [LW · GW], which are meant to be cognitively anthropomorphic and transparently interpretable. (They remind me a bit of the Chomsky-approved concept of "anthronoetic AI".)
comment by Andy Lin (andy-lin) · 2023-04-22T21:23:50.607Z · LW(p) · GW(p)
I don't see how LLMs are "very different" from hypothesized AI.
Personally my p(doom) was already high and increased modestly but not fundamentally after recent advances.