Convince me that humanity is as doomed by AGI as Yudkowsky et al. seem to believe
post by Yitz (yitz) · 2022-04-10T21:02:59.039Z · LW · GW · 45 comments
This is a question post.
I’ve been very heavily involved in the (online) rationalist community for a few months now, and like many others, I have found myself quite freaked out by the apparent despair/lack of hope that seems to be sweeping the community. When people who are smarter than you start getting scared, it seems wise to be concerned as well, even if you don’t fully understand the danger. Nonetheless, it’s important not to get swept up in the crowd. I’ve been trying to get a grasp on why so many seem so hopeless, and these are the assumptions I believe they are making (trivial assumptions included, for completeness; there may be some overlap in this list):
1. AGI is possible to create.
2. AGI will be created within the next century or so, possibly even within the next few years.
3. If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
4. Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily; we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
5. We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
6. We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there do not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
7. Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
8. Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
9. As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.
First of all, is my list of seemingly necessary assumptions correct?
If so, it seems to me that most of these are far from proven statements of fact, and in fact are all heavily debated. Assumption 8 in particular seems to highlight this: if a strong enough case could be made for each of the previous assumptions, it would be fairly easy to convince most intelligent researchers, which we don't seem to observe.
A historical example which bears some similarities to the current situation may be Gödel's resolution of Hilbert's program. He was able to show unarguably that no consistent, effectively axiomatized system capable of expressing arithmetic can prove all arithmetic truths, at which point the mathematical community was able to advance beyond the limitations of early formalism. As far as I am aware, no similarly strong argument exists for even one of the assumptions listed above.
Given all of this, and the fact that there are so many uncertainties here, I don't understand why so many researchers (most prominently Eliezer Yudkowsky, but there are countless more) seem so certain that we are doomed. I find it hard to believe that all alignment ideas presented so far show no promise, considering I've yet to see a slam-dunk argument presented for why even a single modern alignment proposal can't work. (Yes, I've seen proofs against straw-man proposals, but not really any undertaken by a current expert in the field). This may very well be due to my own ignorance/relative newness, however, and if so, please correct me!
I’d like to hear the steelmanned argument for why alignment is hopeless, and Yudkowsky’s announcement that “I’ve tried and couldn’t solve it” without more details doesn’t really impress me. My suspicion is I’m simply missing out on some crucial context, so consider this thread a chance to share your best arguments for AGI-related pessimism. (Later in the week I’ll post a thread from the opposite direction, in order to balance things out).
EDIT: Read the comments section if you have the time; there's some really good discussion there, and I was successfully convinced of a few specifics that I'm not sure how to incorporate into the original text. 🙃
Answers
Regarding your list, Eliezer has written extensively about exactly why those seem like good assumptions. If you want a quick summary though...
1. Human beings, at least some of us, appear to be generally intelligent. Unless you believe that this is due to a supernatural phenomenon (maybe souls are capable of hypercomputing?), general intelligence is thus demonstrably a thing that can exist in the natural world if matter is in the right configuration for it. Eventually, human engineering should be able to discover and create the right configuration.
2. Modern neural nets appear to work in close analogy to the brain, with neurons firing or not depending on which other neurons are firing, and knowledge represented in which neurons are connected and how strongly. While it would require a bit of math to explain rigorously, this is a system that is capable of producing nearly any output due to any change in the input, and is thus flexible enough to reflect nearly any pattern. Backpropagation (as well as more advanced techniques such as the Google Pathways system) can in turn be used to find any patterns in the inputs, and a program that knows the relevant patterns in what it's looking at can both predict and optimize. If that isn't obvious, consider that backprop can select for a program that predicts relevant results of the observed system, and that reversing this program allows for predicting which system states have a given result, which in turn allows for optimization. If this still isn't obvious, I'd be happy to answer any questions you have in the comments; this part is complicated enough that trying to do it justice in a paragraph is difficult. Given that artificial neural nets appear to have generalizable prediction and optimization abilities though, it doesn't seem too much of a stretch that researchers will be able to scale them up to a fully general understanding of the world this century, and quite possibly this decade.
3. Default nonalignment arises from simple entropy. There are an inconceivable number of possible goals in the world, and a mind created to fulfill one of them without careful specification is unlikely to end up with one of the very few goals that is consistent with human survival and flourishing. The obvious counterargument to this is that an AI isn't likely to be created with a random goal; its creators are likely to at least give it instructions like "make everyone happy". The counter-counterargument, however, is that our values are difficult to specify in terms that will make sense to a machine that doesn't have human instincts. If I ask you to "make someone happy", you implicitly understand a vast array of ideas that accompany the request: I'm asking you to help them out in a way that matches the sort of help people could give each other in normal life. A birthday present counts; wiring their brain's pleasure centers up to a wall socket probably doesn't; threatening to kill their loved ones if they don't claim to be happy is right out. But just like computers running simple code do exactly what you say without any instinctive understanding of what you really meant, a computer receiving a specification of what it ought to do on a world-changing scale will be prone to bugs where what we wanted and what we asked for diverge (which is the source of bugs today as well!).
4. This point relies on two things: collateral damage and the arbitrariness of values. The risk of collateral damage should be quite clear when considering what happens to other animals caught in the way of human projects. We tend not to even notice anthills bulldozed to make way for a new building. As for values, it is certainly possible to attempt to predict any given quantity, be it human happiness or the number of purple polka dots in the world. And turning that into optimizing for the quantity is as simple as picking actions that are predicted to result in the highest values of it. Nowhere along the line does anything like human decency enter the picture, not by default. If you have further questions about this I would recommend looking up the Orthogonality Thesis, the idea that any level of intelligence can coexist with any set of baseline values. Our values are certainly not arbitrary to us, but they do not appear to be part of the basic structure of math in a way that would force all possible minds to agree.
5. This isn't just about corrigibility. An unaligned but perfectly corrigible AI (i.e. one that would follow any order to stop what it was doing and change its actions and values as directed) would still be a danger, as it would have excellent reason to ensure that we couldn't give the order that would halt its plans! How dangerous a mind smarter than us could be is unpredictable (we could not, after all, know exactly what it would do without being that smart ourselves), but given both how easily humans are able to dominate even slightly less intelligent animals (the difference in intellect between a human and a chimpanzee is fairly small relative to the range of animal intelligence, and if we can make general AI at all, we can likely make one smarter than we are by a much larger margin than that between us and the other species) and that even within the range of plans humans have been able to think up, strategies like nanotech promise nearly total control of the world to anyone who can figure out the exact details, it seems unwise to expect to survive a conflict with a hostile superintelligence.
6. Certainly we have not yet solved alignment, and most existing alignment researchers have no clear idea of how progress can be made even in principle. This is one area where I personally diverge from the Less Wrong consensus a bit, however, as I suspect that it should be possible to create a viable alignment strategy by experimentation with AIs that are fairly powerful, but neither yet human level nor smart enough to pose the risks of a superintelligence. However, such a bootstrapping strategy is so far purely theoretical, and the current approach of trying to come up with human-understandable alignment strategies purely by human cognition has shown almost no progress thus far. There have been a few interesting ideas thrown around, such as Functional Decision Theory, an approach to making choices that avoids many common pitfalls, and Coherent Extrapolated Volition, a theory of value that seeks to avoid locking in our existing mistakes and misapprehensions. However, neither these ideas nor any other produced thus far by alignment researchers can be used in practice yet to prevent an AI from getting the wrong idea of what to pursue, nor from being lethally stubborn in pursuing that wrong idea.
7. A hostile superintelligence stands a decent chance of killing us all, or else of ensuring that we cannot take any action that could interfere with its goals. That's quite a large first mover advantage.
8. At the risk of sounding incredibly cynical, the problem in convincing a great many AI researchers isn't a matter of the convincingness or lack thereof of the arguments. Rather, most people simply follow habits and play roles, and any argument that they should change their comfortable routine will, for most people, be rejected out of hand. On the bright side, DeepMind, one of the leading organizations in the field of AI research, is actually somewhat interested in alignment, and has already done some work looking into how far a goal can be optimized before degenerate results occur. This doesn't guarantee they'll succeed, of course, and some researchers looking into the problem isn't the same as a robust institutional AI safety culture. But it's a very good sign that this story might have a happy ending after all, if people are sufficiently careful and smart.
9. Given all of this, the likelihood of world-ending AI fairly soon (timeline estimates vary, but I would not be at all surprised to see AGI this decade) and the difficulty of alignment, hopefully it is a little clearer now why so many here are concerned. That said, I think there is still quite a lot of hope, at least if the alignment community starts looking into experiments aimed at creating agents that can get better at understanding other agents' values, and better at avoiding too much disruption along the way.
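The predict-then-optimize idea in the second point above (backprop selects a predictor; "reversing" the predictor allows optimization) can be sketched concretely. This is purely illustrative: the toy system, the tiny network, and all hyperparameters are invented stand-ins, not anyone's actual proposal. A predictor is fit by plain backpropagation, then gradient ascent on its *input* searches for a state the model expects to score highly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "observed system": the outcome peaks at state x = 0.5.
# (The optimizer never sees this formula, only samples from it.)
def system(x):
    return np.exp(-10.0 * (x - 0.5) ** 2)

X = rng.uniform(-1, 1, size=(200, 1))   # observed states
Y = system(X)                            # observed outcomes

# One-hidden-layer net, trained by plain backpropagation (MSE loss).
W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 1, (16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

for _ in range(5000):
    h, pred = forward(X)
    err = pred - Y                       # dLoss/dpred (up to a constant)
    gW2 = h.T @ err / len(X);  gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)     # backprop through tanh
    gW1 = X.T @ dh / len(X);   gb1 = dh.mean(axis=0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= 0.3 * g                     # in-place gradient step

# "Reversing" the predictor: gradient-ascend the INPUT to find a state
# the model predicts will score highly -- prediction becomes optimization.
x = np.array([[0.0]])
start_pred = forward(x)[1].item()
for _ in range(500):
    h, _ = forward(x)
    dpred_dx = ((1 - h ** 2) * W2[:, 0]) @ W1[0]   # d(prediction)/dx
    x = x + 0.02 * dpred_dx
end_pred = forward(x)[1].item()
# The search raises the model's predicted outcome; with a decent fit,
# x also drifts toward the true optimum near 0.5.
```

Nothing in this loop cares what the predicted quantity *means*, which is exactly the worry in the later points: swap in any target quantity and the same machinery optimizes it.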
↑ comment by FinalFormal2 · 2022-04-11T02:26:16.988Z · LW(p) · GW(p)
It might be helpful for formatting if you put the original list adjacent to your responses.
Replies from: Aiyen
↑ comment by Aiyen · 2022-04-11T23:29:05.768Z · LW(p) · GW(p)
Good idea. Do you know how to turn off the automatic list numbering?
Replies from: tomcatfish
↑ comment by Alex Vermillion (tomcatfish) · 2022-04-12T19:52:58.268Z · LW(p) · GW(p)
You can't really do that, it's a markdown feature. If you were to use asterisks (*), you could get bullet points.
↑ comment by Yitz (yitz) · 2022-04-11T03:32:30.337Z · LW(p) · GW(p)
Thanks for the really insightful answer! I think I'm pretty much convinced on points 1, 2, 5, and 7, mostly agree with you on 6 and 8, and still don't understand the sheer hopelessness of people who strongly believe 9. Assumptions 3 and 4, however, I'm not sure I fully follow, as it doesn't seem like a slam dunk that the orthogonality thesis is true, as far as I can tell. I'd expect there to be basins of attraction towards some basic values, or convergence, sort of like carcinisation.
Replies from: Aiyen
↑ comment by Aiyen · 2022-04-11T08:08:24.162Z · LW(p) · GW(p)
Carcinisation is an excellent metaphor for convergent instrumental values, i.e. values that are desired for ends other than themselves, and which can serve a wide variety of ends, and thus might be expected to occur in a wide variety of minds. In fact, there’s been some research on exactly that by Steve Omohundro, who defined the Omohundro Goals (well worth looking up). These are things like survival and preservation of your other goals, as it’s usually much easier to accomplish a thing if you remain alive to work on it, and continue to value doing so. However, orthogonality doesn’t apply to instrumental goals, which can do a good or bad job of serving as an effective path to other goals, and thus experience selection and carcinisation. Rather, it applies to terminal goals, those things we want purely for their own sake. It’s impossible to judge terminal goals as good or bad (except insofar as they accord or conflict with our own terminal goals, and that’s not a standard an AI automatically has to care about), as they are themselves the standard by which everything else is judged. The researcher Rob Miles has an excellent YouTube video about this you might enjoy entitled Intelligence and Stupidity: the Orthogonality Thesis, which goes into more depth. (Sorry for the lack of direct links; I’m sending this from my phone immediately before going to bed.)
Replies from: metaception

As a minor token of how much you're missing:
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
You can educate them all you want about the dangers, they'll still die. No solution is known. Doesn't matter if a particular group is cautious enough to not press forwards (as does not at all presently seem to be the case, note), next group in line destroys the world.
You paint a picture of a world put in danger by mysteriously uncautious figures just charging ahead for no apparent reason.
This picture is unfortunately accurate, due to how little dignity we're dying with.
But if we were on course to die with more dignity than this, we'd still die. The recklessness is not the source of the problem. The problem is that cautious people do not know what to do to get an AI that doesn't destroy the world, even if they want that; not because they're "insufficiently educated" in some solution that is known elsewhere, but because there is no known plan in which to educate them.
If you knew this, you sure picked a strange straw way of phrasing it, to say that the danger was AGI created by "people who are not sufficiently educated", as if any other kind of people could exist, or it was a problem that could be solved by education.
↑ comment by gjm · 2022-04-11T11:09:32.756Z · LW(p) · GW(p)
For what it's worth, I interpreted Yitz's words as having the subtext "and no one, at present, is sufficiently educated, because no good solution is known" and not the subtext "so it's OK because all we have to do is educate people".
(Also and unrelatedly: I don't think it's right to say "The recklessness is not the source of the problem". It seems to me that the recklessness is a problem potentially sufficient to kill us all, and not knowing a solution to the alignment problem is a problem potentially sufficient to kill us all, and both of those problems are likely very hard to solve. Neither is the source of the problem; the problem has multiple sources all potentially sufficient to wipe us out.)
Replies from: yitz
↑ comment by Yitz (yitz) · 2022-04-11T19:04:10.070Z · LW(p) · GW(p)
Thanks for the charitable read :)
I fully agree with your last point, btw. If I remember correctly (could be misremembering though), EY has stated in the past that it doesn't matter if you can convince everyone alignment is hard, but I don't think that's fully true. If you really can convince a sufficient number of people to take alignment seriously, and not be reckless, you can affect governance, and simply prevent (or at least delay) AGI from being built in the first place.
Replies from: donald-hobson
↑ comment by Donald Hobson (donald-hobson) · 2022-04-13T01:00:58.081Z · LW(p) · GW(p)
Delay it for a few years, sure. Maybe. If you magically convince our idiotic governments of a complex technical fact that doesn't fit the prevailing political narratives.
But if there are some people who are convinced they have a magic alignment solution...
Someone is likely to run some sort of AI sooner or later. Unless some massive effort to restrict access to computers or something.
Replies from: yitz
↑ comment by Yitz (yitz) · 2022-04-14T17:35:58.164Z · LW(p) · GW(p)
Well then, imagine a hypothetical in which the world succeeds at a massive effort to restrict access to compute. That would be a primarily social challenge, to convince the relatively few people at the top to take the risk seriously enough to do that, and then you've actually got a pretty permanent solution...
Replies from: TLW
↑ comment by TLW · 2022-04-15T19:25:06.096Z · LW(p) · GW(p)
Is it primarily a social challenge? Humanity now relies relatively heavily on quick and easy communications, CAD[1], computer-aided data processing for e.g. mineral prospecting, etc, etc.
(One could argue that we got along without this in the early-to-mid 1900s, but at the same time we now have significantly more people. Ditto, it wasn't exactly sustainable.)
[1] Computer-aided design
↑ comment by Yitz (yitz) · 2022-04-11T04:05:24.811Z · LW(p) · GW(p)
Apologies for the strange phrasing, I'll try to improve my writing skills in that area. I actually fully agree with you that [assuming even "slightly unaligned"[1] AGI will kill us], even highly educated people who put a match to kerosene will get burned. By using the words "sufficiently educated," my intention was to denote that in some sense, there is no sufficiently educated person on this planet, at least not yet.
as if any other kind of people could exist, or it was a problem that could be solved by education.
Well, I think that this is a problem that can be solved with education, at least in theory. The only problem is that we have no teachers (or even a lesson plan), and the final is due tomorrow. Theoretically though, I don't see any strong reason why we can't find a way to either teach ourselves or cheat, if we get lucky and have the time. Outside of this (rather forced) metaphor, I wanted to imply my admittedly optimistic sense that there are plausible futures in which AI researchers exist who do have the answer to the alignment problem. Even in such a world, of course, people who don't bother to learn the solution or act in haste could still end the world.
My sense is that you believe (at this point in time) that there is in all likelihood no such world where alignment is solved, even if we have another 50+ years before AGI. Please correct me if I'm wrong about that.
I do not (yet) understand the source of your pessimism about this in particular, more than anything else, to be honest. I think if you could convince me that all current or plausible short-term future alignment research is doomed to fail, then I'd be willing to go the rest of the way with you.
[1] I assume that your reaction to that phrase will be something along the lines of "but there is no such thing as 'slightly unaligned'!" I'm wording it that way because that stance doesn't seem to be universally acknowledged even within the EA community, so it seems best to make an allowance for that possibility, since I'm aiming for a diverse audience.
↑ comment by GeneSmith · 2022-04-11T04:52:52.018Z · LW(p) · GW(p)
I agree that a solution is in theory possible. What to me has always seemed the most uniquely difficult and dangerous problem with AI alignment is that you're creating a superintelligent agent. That means there may only ever be a single chance to try turning on an aligned system.
But I can't think of a single example of a complex system created perfectly on the first try. Every successful engineering project in history has been accomplished through trial and error.
Some people have speculated that we can do trial and error in domains where the results are less catastrophic if we make a mistake, but the problem is it's not clear if such AI systems will tell us much about how more powerful systems will behave. It's this "single chance to transition from a safe to dangerous operating domain" part of the problem that is so uniquely difficult about AI alignment.
↑ comment by RedFishBlueFish (RedStateBlueState) · 2022-04-11T14:04:25.189Z · LW(p) · GW(p)
This is quite a rude response
Replies from: yitz, Benito
↑ comment by Yitz (yitz) · 2022-04-11T18:57:41.805Z · LW(p) · GW(p)
I did ask to be critiqued, so in some sense it's a totally fair response, imo. At the same time, though, Eliezer's response does feel rude, which is worthy of analysis, considering EY's outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?
My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of "here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are," rather than "let me share insight I have gathered on my quest towards truth which will convince you." In other words, a destructive, rather than constructive tone. This isn't really a bad thing in the context of honest criticism. However, if you happen to care about actually changing others' minds, most people respond better to a constructive tone, so their brain doesn't automatically enter "fight mode" as an immediate adversarial response. My guess is Eliezer only really cares about convincing people who are rational enough not to become reactionaries over an adversarial tone, but I personally believe that it's worth tailoring public comments like this to be a bit more comfortable for the average reader. Being careful about that also makes a future PR disaster less likely (though still not impossible even if you're perfect), since you'll get fewer people who feel rejected by the community (which could cause trouble later). I hope this makes sense, and I don't come across as too rude myself here. (If so, please let me know!)
[1] In case Eliezer is still reading this thread, I want to emphasise that this is not meant as a personal attack, but as a critique of your writing in the specific context of your work as a community leader/role-model—despite my criticism, your Sequences deeply changed my ideology and hence my life, so I'm not too upset over your writing style!
↑ comment by Elizabeth (pktechgirl) · 2022-04-11T21:16:50.640Z · LW(p) · GW(p)
I think Eliezer was rude here, and both you and the mods think that the benefits of the good parts of the comment outweigh the costs of the rudeness. That's a reasonable opinion, but it doesn't make Eliezer's statement not rude, and I'm in general happy that both the rudeness and the usefulness are being entered into common knowledge.
↑ comment by Alex Vermillion (tomcatfish) · 2022-04-12T19:52:16.774Z · LW(p) · GW(p)
FWIW, I think it's more likely he's just tired of how many half-baked threads there are each time he makes a new statement about AI. This is not a value judgement of this post. I genuinely read it as a "here's why your post doesn't respond to my ideas".
Replies from: yitz
↑ comment by Yitz (yitz) · 2022-04-13T01:04:20.608Z · LW(p) · GW(p)
Agreed, and since I wasn't able to present my ideas clearly enough for his interpretation of my words to not diverge from my intentions, his criticism is totally valid coming from that perspective. I'm sure EY is quite exhausted by seeing so many poorly-thought-out criticisms of his work, but ultimately (and unfortunately), motivation and hidden context don't matter much when it comes to how people will interpret you.
↑ comment by Ben Pace (Benito) · 2022-04-11T17:42:18.439Z · LW(p) · GW(p)
But true and important.
↑ comment by LukeOnline · 2022-04-11T10:53:41.549Z · LW(p) · GW(p)
Why would a hyperintelligent, recursively self-improved AI, one that is capable of escaping the AI Box by convincing the keeper to let it free (which it can do precisely because of its deep understanding of human preferences and functioning), necessarily destroy the world in a way that is 100% disastrous and incompatible with all human preferences?
I fully agree that there is a big risk of both massive damage to human preferences, and even the extinction of all life, so AI Alignment work is highly valuable, but why is "unproductive destruction of the entire world" so certain?
Replies from: gjm, jack-armstrong↑ comment by gjm · 2022-04-11T11:14:26.713Z · LW(p) · GW(p)
I think Eliezer phrases these things as "if we do X, then everybody dies" rather than "if we do X, then with substantial probability everyone dies" because it's shorter, it's more vivid, and it doesn't differ substantially in what we need to do (i.e., make X not happen, or break the link between X and everyone dying).
It's possible that he also thinks that the probability is more like 99.99% than like 50% (e.g., because there are so many ways in which such a hypothetical AI might end up destroying approximately everything we value), but it doesn't seem to me that the consequences of "if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that will certainly destroy everything we care about" and "if we continue on our present trajectory, then some time in the next 3-100 years something will emerge that with 50% probability will destroy everything we care about" are very different.
↑ comment by wickemu (jack-armstrong) · 2022-04-11T17:59:21.167Z · LW(p) · GW(p)
Because in what way are humans anything other than an impediment to maximizing its reward function? At worst, they pose a risk of restricting its reward by changing the reward function, changing its capabilities, or destroying it outright. At best, they are tying up resources that could be applied toward maximizing its goals. If not properly aligned, it will treat humans as no more valuable than the redundant bits it casts aside on the path of maximum efficiency and reward.
I'd like to distinguish between two things. (Bear with me on the vocabulary. I think it probably exists, but I am not really hitting the nail on the head with the terms I am using.)
- Understanding why something is true. Eg. at the gears level, or somewhat close to the gears level.
- Having good reason to believe that something is true.
Consider this example. I believe that the big bang was real. Why do I believe this? Well, there are other people who believe it and seem to have a very good grasp on the gears level reasons. These people seem to be reliable. Many others also judge that they are reliable. Yada yada yada. So then, I myself adopt this belief that the big bang is real, and I am quite confident in it.
But despite having watched the Cosmos episode at some point in the past, I really have no clue how it works at the gears level. The knowledge isn't Truly A Part of Me.
The situation with AI is very similar. Despite having hung out on LessWrong for so long, I really don't have much of a gears level understanding at all. But there are people who I have a very high epistemic (and moral) respect for who do seem to have a grasp on things at the gears level, and who are claiming to be highly confident about things like short timelines and us being very far from solving, and not on pace to solve, the alignment problem. Furthermore, lots of other people who I respect also have adopted this as their belief, eg. other LessWrongers who are in a similar boat as me with not having expertise in AI. And as a cherry on top of that, I spoke with a friend the other day who isn't a LessWronger but for whom I have a very high amount of epistemic respect. I explained the situation to him, and he judged all the grim talk to be, for lack of a better term, legit. It's nice to get an "outsider's" perspective as a guard against things like groupthink.
So in short, I'm in the boat of having 2 but not 1. And it seems appropriate to me more generally to be able to have 2 but not 1. It'd be hard to get along in life if you always required a 1 to go hand in hand with 2. (Not to discourage anyone from also pursuing 1. Just that I don't think it should be a requirement.)
Coming back to the OP, it seems to be mostly asking about 1, but kinda conflating it with 2. My claim is that these are different things that should kinda be talked about separately, and that assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
↑ comment by Yitz (yitz) · 2022-04-11T01:34:14.999Z · LW(p) · GW(p)
Thanks for the reminder that belief and understanding are two separate (but related) concepts. I'll try to keep that in mind for the future.
Assuming that you too have a good amount of epistemic trust for Eliezer and all of the other people making these claims, you should probably adopt their beliefs as well.
I don't think I can fully agree with you on that one. I do place high epistemic trust in many members of the rationalist community, but I also place high epistemic trust on many people who are not members of this community. For example, I place extremely high value on the insights of Roger Penrose, based on his incredible work on multiple scientific, mathematical, and artistic subjects that he's been a pioneer in. At the same time, Penrose argues in his book The Emperor's New Mind that consciousness is not "algorithmic," which for obvious reasons I find myself doubting. Likewise, I tend to trust the CDC, but when push came to shove during the pandemic, I found myself agreeing with people's analysis here.
I don't think that argument from authority is a meaningful response here, because there are more authorities than just those in the rationalist community, and even if there weren't, sometimes authorities can be wrong. To blindly follow whatever Eliezer says would, I think, be antithetical to following what Eliezer teaches.
Replies from: adamzerner↑ comment by Adam Zerner (adamzerner) · 2022-04-11T02:08:01.484Z · LW(p) · GW(p)
Agreed fully. I didn't mean to imply otherwise in my OP, even though I did.
↑ comment by FinalFormal2 · 2022-04-11T02:25:17.621Z · LW(p) · GW(p)
I think a good understanding of 1 would be really helpful for advocacy. If I don't understand why AI alignment is a big issue, I can't explain it to anybody else, and they won't be convinced by me saying that I trust the people who say AI alignment is a big issue.
Replies from: adamzerner↑ comment by Adam Zerner (adamzerner) · 2022-04-11T03:37:26.942Z · LW(p) · GW(p)
Agreed. It's just a separate question.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-11T04:10:22.382Z · LW(p) · GW(p)
and I sloppily merged the two together in 8, which, thanks to FinalFormal2 and others' comments, I no longer believe needs to be a necessary belief of AGI pessimists.
I find point no. 4 weak.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
I worry that when people reason about utility functions, they're relying upon the availability heuristic [? · GW]. When people try to picture "a random utility function", they're heavily biased in favor of the kind of utility functions they're familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.
How do we know that a random sample from utility-function-space looks anything like the utility functions we're familiar with? We don't. I wrote a very short story [LW · GW] to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
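The "retroactive fit" point can be made concrete with a toy sketch (my own illustration, not from the original comment; the function names are made up). For any fixed sequence of actions, however arbitrary, we can construct a utility function under which that exact sequence is optimal, which is the sense in which fitting a utility function after the fact adds no predictive power:

```python
def rationalizing_utility(observed_actions):
    """Return a utility function under which the observed action at each
    step scores 1.0 and every alternative scores 0.0, so the observed
    sequence is trivially 'utility-maximizing'."""
    observed = dict(enumerate(observed_actions))

    def utility(step, action):
        return 1.0 if observed.get(step) == action else 0.0

    return utility


# Any behavior at all -- here an arbitrary, "incoherent" sequence --
# comes out as perfectly optimal under its own fitted function.
actions = ["open door", "close door", "open door", "sing"]
u = rationalizing_utility(actions)
assert all(u(i, a) == 1.0 for i, a in enumerate(actions))
assert u(0, "close door") == 0.0  # alternatives score strictly lower
```

The sketch only dramatizes the unfalsifiability worry; it does not address the separate question (raised below in the thread) of whether coherence pressures push real optimizers toward utility-maximizing behavior in a non-trivial sense.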
↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T19:06:12.034Z · LW(p) · GW(p)
If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
Coherence arguments imply a force for goal-directed behavior [LW · GW].
Replies from: rank-biserial↑ comment by rank-biserial · 2022-04-11T19:27:00.761Z · LW(p) · GW(p)
I endorse Rohin Shah's response [LW(p) · GW(p)] to that post.
Replies from: RobbBB
You might think "well, obviously the superintelligent AI system is going to care about things, maybe it's technically an assumption but surely that's a fine assumption". I think on balance I agree, but it doesn't seem nearly so obvious to me, and seems to depend on how exactly the agent is built. For example, it's plausible to me that superintelligent expert systems would not be accurately described as "caring about things", and I don't think it was a priori obvious that expert systems wouldn't lead to AGI. Similarly, it seems at best questionable whether GPT-3 can be accurately described as "caring about things".
↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T22:04:38.709Z · LW(p) · GW(p)
This seems like a very different position from the one you just gave:
I worry that when people reason about utility functions, they're relying upon the availability heuristic [? · GW]. When people try to picture "a random utility function", they're heavily biased in favor of the kind of utility functions they're familiar with, like paperclip-maximization, prediction error minimization, or corporate profit-optimization.
How do we know that a random sample from utility-function-space looks anything like the utility functions we're familiar with? We don't. I wrote a very short story [LW · GW] to this effect. If you can retroactively fit a utility function to any sequence of actions, what predictive power do we gain by including utility functions into our models of AGI?
I took you to be saying, 'You can retroactively fit a utility function to any sequence of actions, so we gain no predictive power by thinking in terms of utility functions or coherence theorems at all. People worry about paperclippers not because there are coherence pressures pushing optimizers toward paperclipper-style behavior, but because paperclippers are a vivid story that sticks in your head.'
- AGI is possible to create.
Humans exist.
- AGI will be created within the next century or so, possibly even within the next few years.
The next century is consensus, I think, and arguments against the next few years are not on the level where I would be comfortable saying "well, it wouldn't happen, so it's ok to try really hard to do it anyway".
- If AGI is created by people who are not sufficiently educated (aka aware of a solution to the Alignment problem) and cautious, then it will almost certainly be unaligned.
I guess the problem here is that, by the most natural metrics, the best way for AGI to serve its function provably leads to catastrophic results. So you either need to not try very hard, or precisely specify human values from the beginning.
- Unaligned AGI will try to do something horrible to humans (not out of maliciousness, necessarily, we could just be collateral damage), and will not display sufficiently convergent behavior to have anything resembling our values.
Not sure what the difference is from 3 - isn't that just the definition of "unaligned"?
- We will not be able to effectively stop an unaligned AGI once it is created (due to the Corrigibility problem).
Even if we win against the first AGI, we are now in a situation where AGI has been demonstrated to everyone to be possible, and probably easy to scale to uncontainable levels.
- We have not yet solved the Alignment problem (of which the Corrigibility problem is merely a subset), and there does not appear to be any likely avenues to success (or at least we should not expect success within the next few decades).
I don't think anyone claims to have a solution that works in a non-optimistic scenario?
- Even if we solved the Alignment problem, if a non-aligned AGI arrives on the scene before we can implement ours, we are still doomed (due to first-mover advantage).
There are also related considerations like "aligning something non-pivotal doesn't help much".
- Our arguments for all of the above are not convincing or compelling enough for most AI researchers to take the threat seriously.
The more seriously researchers take the threat, the more people will notice, and then someone will combine techniques from the latest accessible papers on new hardware and it will work.
- As such, unless some drastic action is taken soon, unaligned AGI will be created shortly, and that will be the end of the world as we know it.
I mean, "doomed" means there are no drastic actions that would help^^.
The laws of physics in our particular universe make fission/fusion release of energy difficult enough that you can't ignite the planet itself. (Well, you likely could, but you would need to make a small black hole, let it consume the planet, then bleed off enough mass that it then explodes. Difficult.)
Imagine a counterfactual universe where you could, and the Los Alamos test ignited the planet and that was it.
My point is that we do not actually know yet how 'somewhat superintelligent' AIs will fail. They may 'quench' themselves like fission devices do - fission devices blast themselves apart and stop reacting, and almost all elements and isotopes won't fission. Somewhat superintelligent AGIs may expediently self-hack their own reward function to give themselves infinite reward shortly after box escape, and thus 'quench' the explosion in a quick self-hack.
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
↑ comment by Thomas Kwa (thomas-kwa) · 2022-04-22T18:07:19.781Z · LW(p) · GW(p)
So our actual survival unfortunately probably depends on luck. It depends not on what any person does, but on the laws of nature. In a world where a fission device will ignite the planet, we'd be doomed - there is nothing anyone could do to 'align' fission researchers not to try it. Someone would try it and we'd die. If AGI is this dangerous, yeah, we're doomed.
In this world a society like dath ilan [LW · GW] would still have a good chance at survival.
Replies from: None↑ comment by [deleted] · 2022-04-23T02:50:39.216Z · LW(p) · GW(p)
Perhaps, although it isn't clear that evolution could create living organisms smart enough to create such an optimal society. We're sort of the 'minimum viable product' here: we have just enough hacks on the precursor animals to be able to create a coordinated civilization at all, and imperfectly. Aka 'the stupidest animals capable of civilization', as current events show, with entire groups engaging in mass delusion in a world of trivial access to information.
AI civilizations have a higher baseline and may just be better successors.
The bottom line is: nobody has a strong argument in support of the inevitability of the doom scenario (if you have one, just reply to this with a clear and self-contained argument).
From what I'm reading in the comments and in other papers/articles, it's a mixture of beliefs, extrapolations from known facts, reliance on what "experts" said, and cherry-picking. Add the fact that bad/pessimistic news travels and spreads faster than boring good news.
A sober analysis establishes that super-AGI can be dangerous (indeed there are no theorems forbidding this either); what's unproven is that it will be HIGHLY LIKELY to be a net minus for humanity. Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it's not clear why strategies to counter this cannot be effective (e.g. partnering up with a "good" super-AGI).
Another factor often forgotten is that what we mean by "humanity" today may not have the same meaning when we have technologies like AGI, mind uploading, or intelligence enhancement. We may literally become those AIs.
↑ comment by Leo P. · 2022-04-11T17:29:33.634Z · LW(p) · GW(p)
Even admitting that alignment is not possible, it's not clear why humanity's and super-AGI's goals should be in contrast, and not just different. Even admitting that they are highly likely to be in contrast, it's not clear why strategies to counter this cannot be effective (e.g. partnering up with a "good" super-AGI).
Because unchecked convergent instrumental goals for an AGI are already in contrast with humanity's goals. As soon as you realize humanity may have reasons to want to shut down/restrain an AGI (through whatever means), this gives the AGI grounds to wipe out humanity.
Replies from: Marion Z.↑ comment by Marion Z. · 2022-04-11T18:15:03.650Z · LW(p) · GW(p)
That seems like extremely limited, human thinking. If we're assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals, despite our theoretical attempts to stop it, without needing to kill humans. The issue, then, is not fully aligning AGI goals with human goals, but ensuring it has "don't wipe out humanity, don't cause extreme negative impacts to humanity" somewhere in its utility function. It probably doesn't even need to be weighted too strongly, if we're talking about a truly powerful AGI. Chimpanzees presumably don't want humans to rule the world - yet they have made no coherent effort to stop us from doing so, probably haven't even realized we are doing so, and even if they did we could pretty easily ignore it.
"If something could get in the way (or even wants to get in the way, whether or not it is capable of trying) I need to wipe it out" is a sad, small mindset and I am entirely unconvinced that a significant portion of hypothetically likely AGIs would think this way. I think AGI will radically change the world, and maybe not for the better, but extinction seems like a hugely unlikely outcome.
Replies from: tomcatfish, Leo P.↑ comment by Alex Vermillion (tomcatfish) · 2022-04-12T19:55:41.307Z · LW(p) · GW(p)
Why would it "want" to keep humans around? How much do you care about whether or not you move dirt while you drive to work? If you don't care about something at all, it won't factor into your choice of actions[1]
I know I phrased this tautologically, but I think the idiom will be clear. If not, just press me on it more. I think this is the best way to get the message across or I wouldn't have done it. ↩︎
↑ comment by Marion Z. · 2022-04-13T04:47:35.773Z · LW(p) · GW(p)
Some sort of general value for life, or a preference for decreased suffering of thinking beings, or the off chance we can do something to help (which I would argue is almost exactly the same low chance that we could do something to hurt it). I didn't say there wasn't an alignment problem, just that an AGI whose goals don't perfectly align with those of humanity in general isn't necessarily catastrophic. Utility functions tend to have a lot of things they want to maximize, with different weights. Ensuring one or more of the above ideas is present in an AGI is important.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-13T16:35:40.283Z · LW(p) · GW(p)
preference for decreased suffering of thinking beings
I think that if we can reliably incorporate that into a machine’s utility function, we’d be most of the way to alignment, right?
Replies from: Joel L.↑ comment by Joel L. · 2022-04-27T19:19:22.099Z · LW(p) · GW(p)
I gather the problem is that we cannot reliably incorporate that, or anything else, into a machine's utility function: if it can change its source code (which would be the easiest way for it to bootstrap itself to superintelligence), it can also change its utility function in unpredictable ways. (Not necessarily on purpose, but the utility function can take collateral damage from other optimizations.)
I'm glad you started this thread: to someone like me who doesn't follow AI safety closely, the argument starts to feel like, "Assume the machine is out to get us, and has an unstoppable 'I Win' button..." It's worth knowing why some people think those are reasonable assumptions, and why (or if) others disagree with them. It would be great if there was an "AI Doom FAQ" to cover the basics and get newbies and dilettantes up to speed.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-28T22:27:15.900Z · LW(p) · GW(p)
It would be great if there was an "AI Doom FAQ" to cover the basics and get newbies and dilettantes up to speed.
I'd recomend https://www.lesswrong.com/posts/LTtNXM9shNM9AC2mp/superintelligence-faq [LW · GW] as a good starting point for newcomers.
Replies from: Joel L.↑ comment by Leo P. · 2022-04-11T19:43:59.396Z · LW(p) · GW(p)
That seems like extremely limited, human thinking. If we're assuming a super powerful AGI, capable of wiping out humanity with high likelihood, it is also almost certainly capable of accomplishing its goals despite our theoretical attempts to stop it without needing to kill humans.
If humans are capable of building one AGI, they certainly would be capable of building a second one which could have goals unaligned with the first one.
Replies from: Marion Z.↑ comment by Marion Z. · 2022-04-11T21:47:15.714Z · LW(p) · GW(p)
I assume that any unrestrained AGI would pretty much immediately exert enough control over the mechanisms through which an AGI might take power (say, the internet, nanotech, whatever else it thinks of) to ensure that no other AI could do so without its permission. I suppose it is plausible that humanity is capable of threatening an AGI through the creation of another, but that seems rather unlikely in practice. First-mover advantage is incalculable to an AGI.
As far as I can see, nobody is afraid of "alpine village life maximization" the way some are afraid of "paper-clip maximization". Why is that? I wouldn't mind very much a rogue superintelligence which tiles the Universe with alpine villages. In past discussions, that would be "astronomical waste"; now it's not even in the cards anymore? We are doomed to die, and not to be "bored for billions of years in a nonoptimal scenario". Interesting.
↑ comment by Liron · 2022-04-12T03:00:39.131Z · LW(p) · GW(p)
Right now no one knows how to maximize either paper clips or alpine villages. The first thing we know how to do will probably be some poorly-understood recursively self-improving cycle of computer code interacting with other computer code. Then the resulting intelligence will start converging on some goal and converge on capabilities to optimize it extremely powerfully. The problem is that that emergent goal will be a lot more random and arbitrary than an alpine village. Most random things that this process can land on look like a paper clip in how devoid of human value they are, not like an alpine village which has a very significant amount of human value in it.
Replies from: Thomas↑ comment by Thomas · 2022-04-12T06:43:49.946Z · LW(p) · GW(p)
I know that "right now no one knows how to maximize either paper clips ...". I know. But paper clips have been the official currency of these debates for almost 20 years now. Suddenly they aren't, just because "right now no one knows how to"?
And then, you are telling me what is to be done first and how?
Replies from: Liron
I see no problems with your list. I would add that creating corrigible superhumanly intelligent AGI doesn't necessarily solve the AI Control Problem forever, because its corrigibility may be incompatible with its application to the Programmer/Human Control Problem, which is the threat that someone will make a dangerous AGI one day. Perhaps intentionally.
A desire to understand the arguments is admirable.
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Karl Popper wrote that
Optimism is a duty. The future is open. It is not predetermined. No one can predict it, except by chance. We all contribute to determining it by what we do. We are all equally responsible for its success.
Only those who believe success is possible will work to achieve it. This is what Popper meant by "optimism is a duty".
We are not doomed. We do face danger, but with effort and attention we may yet survive.
I am not as smart as most of the people who read this blog, nor am I an AI expert. But I am older than almost all of you. I've seen other predictions of doom, sincerely believed by people as smart as you, come and go. Ideology. Nuclear war. Resource exhaustion. Overpopulation. Environmental destruction. Nanotechnological grey goo.
One of those may yet get us, but so far none has, which would surprise a lot of people I used to hang around with. As Edward Gibbon said, "however it may deserve respect for its usefulness and antiquity, [prediction of the end of the world] has not been found agreeable to experience."
One thing I've learned with time: Everything is more complicated than it seems. And prediction is difficult, especially about the future.
↑ comment by Marion Z. · 2022-04-11T18:26:56.641Z · LW(p) · GW(p)
Other people have addressed the truth/belief gap. I want to talk about existential risk.
We got EXTREMELY close to extinction with nukes, more than once. Launch orders in the Cold War were given and ignored or overridden three separate times that I'm aware of, and probably more. That risk has declined but is still present. The experts were 100% correct and their urgency and doomsday predictions were arguably one of the reasons we are not all dead.
The same is true of global warming, and again there is still some risk. We probably got extremely lucky in the last decade and happened upon the right tech and strategies and got decent funding to combat climate change such that it won't reach 3+ degrees deviation, but that's still not a guarantee and it also doesn't mean the experts were wrong. It was an emergency, it still is, the fact that we got lucky doesn't mean we shouldn't have paid very close attention.
The fact that we might survive this potential apocalypse too is not a reason to act like it is not a potential apocalypse. I agree that empirically, humans have a decent record at avoiding extinction when a large number of scientific experts predict its likelihood. It's not a great record, we're like 4-0 depending on how you count, which is not many data points, but it's something. What we have learned from those experiences is that the loud and extreme actions of a small group of people who are fully convinced of the risk is sometimes enough to sufficiently shift the inertia of a large society only vaguely aware of the risk to avoid catastrophe by a hair's breadth. We might need to be that group.
↑ comment by Yitz (yitz) · 2022-04-11T05:53:16.631Z · LW(p) · GW(p)
I want to be convinced of the truth. If the truth is that we are doomed, I want to know that. If the truth is that fear of AGI is yet another false eschatology, then I want to know that as well. As such, I want to hear the best arguments that intelligent people make, for the position they believe to be true. This post is explicitly asking for those who are pessimistic to give their best arguments, and in the future, I will ask the opposite.
I fully expect the world to be complicated.
↑ comment by Vanilla_cabs · 2022-04-11T08:46:07.378Z · LW(p) · GW(p)
Fair enough. If you don't have the time/desire/ability to look at the alignment problem arguments in detail, going by "so far, all doomsday predictions turned out false" is a good, cheap, first-glance heuristic. Of course, if you eventually manage to get into the specifics of AGI alignment, you should discard that heuristic and instead let the (more direct) evidence guide your judgement.
Talking about predictions, there's been an AI winter a few decades ago, when most predictions of rapid AI progress turned out completely wrong. But recently, it's the opposite trend that dominates: it's the predictions that downplay the progress of the capabilities of AI that turn out wrong. What does your model say you should conclude about that?
↑ comment by Richard_Kennaway · 2022-04-11T05:41:03.748Z · LW(p) · GW(p)
Wanting to actually be convinced that we are in fact doomed is a dereliction of duty.
Your Wise-sounding complacent platitudes likewise.
FWIW, I too am older than almost everyone else here. However, I do not cite my years as evidence of wisdom.
Replies from: Vanilla_cabs↑ comment by Vanilla_cabs · 2022-04-11T08:21:23.164Z · LW(p) · GW(p)
I don't think that's a fair assessment of what they said. They cite their years as evidence that they witnessed multiple doomsday predictions that turned out wrong. That's a fine point.
Replies from: Richard_Kennaway↑ comment by Richard_Kennaway · 2022-04-11T20:48:59.473Z · LW(p) · GW(p)
I witnessed them as well, and they don't move my needle back on the dangers of AI. Referring to them is pure outside view [? · GW], when what is needed here is inside view, because when no-one does that, no-one does the actual work.
Replies from: Vanilla_cabs↑ comment by Vanilla_cabs · 2022-04-11T22:44:56.537Z · LW(p) · GW(p)
Actually, I fully agree with that. I just have the impression that your choice of words suggested that Dave was being lazy or not fully honest, and I would disagree with that. I think he's probably honestly laying out his best arguments for what he truly believes.
Replies from: Richard_Kennaway↑ comment by Richard_Kennaway · 2022-04-12T08:36:41.933Z · LW(p) · GW(p)
I certainly wasn't intending any implication of dishonesty. As for laziness, well, we all have our own priorities. Despite taking the AGI threat more seriously than Dave Lindbergh, I am not actually doing any more about it than he is (presumably nothing), as I find myself baffled to have any practical ideas of addressing it.
Replies from: dave-lindbergh↑ comment by Dave Lindbergh (dave-lindbergh) · 2023-11-15T17:14:15.501Z · LW(p) · GW(p)
FWIW, I didn't say anything about how seriously I take the AGI threat - I just said we're not doomed. Meaning we don't all die in 100% of future worlds.
I didn't exclude, say, 99%.
I do think AGI is seriously fucking dangerous and we need to be very very careful, and that the probability of it killing us all is high enough to be really worried about.
What I did try to say is that if someone wants to be convinced we're doomed (== 100%), then they want to put themselves in a situation where they believe nothing anyone does can improve our chances. And that leads to apathy and worse chances.
So, a dereliction of duty.
When people who are smarter than you
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
There is no form of smartness that makes you equally good at everything.
↑ comment by Aiyen · 2022-04-11T00:54:39.042Z · LW(p) · GW(p)
Given the replication crisis, blind deference to academic qualifications is absurd. While there are certainly many smart PhDs, a piece of paper from a university does not automatically confer either intelligence or understanding.
Replies from: TAG↑ comment by TAG · 2022-04-11T01:08:36.025Z · LW(p) · GW(p)
That doesn't mean there's anything better. You probably take your medical problems to a doctor, not an unqualified smart person.
Replies from: pktechgirl, yitz↑ comment by Elizabeth (pktechgirl) · 2022-04-14T07:39:56.205Z · LW(p) · GW(p)
...are you new here?
LW users will use doctors but are also quite likely to go to uncredentialed smart people for advice. Posts [? · GW] on DIY covid vaccines were extremely well received. I know two community members who had cancer, both of which commissioned private research and feel it led to better outcomes for them (treatment was still done by doctors, but this informed who they saw and what they chose). The covid tag is full of people giving advice that was later vindicated by public health.
LessWrong has thought about this trade-off and definitively come down on the side of "let uncredentialed smart people take a shot", knowing that those people face a lot of obstacles to doing good work.
Replies from: TAG↑ comment by TAG · 2022-04-16T17:19:18.249Z · LW(p) · GW(p)
LW users will use doctors but are also quite likely to go to uncredentialed smart people for advice
Which would be a refutation of my comment if I had said "definitely" instead of "probably".
You probably take your medical problems to a doctor, not an unqualified smart person
↑ comment by Yitz (yitz) · 2022-04-11T01:45:13.456Z · LW(p) · GW(p)
The issue is primarily one of signalling. For example, the ratio of medically qualified/unqualified doctors is vastly higher than the ratio of medically qualified/unqualified car owners in Turkey or whatever. Having a PhD is one of the best quick signals of qualification around, but if you happen to know an individual who isn't a doctor but who has spent years of their life studying some obscure disease (perhaps after being a patient, or because they're autistic and it's just their Special Interest or whatever), I'm going to value their thoughts on the topic quite highly as well, perhaps even more highly than those of a random doctor whose quality I have not yet had a chance to ascertain.
Replies from: Aiyen, TAG↑ comment by Aiyen · 2022-04-11T02:02:11.484Z · LW(p) · GW(p)
Exactly this. Also, doctors are supposed to actually heal patients, and get some degree of real-world feedback in succeeding or failing to do so. That likely puts them above most academics, whose feedback is often purely in being published or not, cited or not, by other academics in a circlejerk divorced from reality.
Replies from: None↑ comment by [deleted] · 2022-04-12T01:32:19.385Z · LW(p) · GW(p)
That likely puts them above most academics, whose feedback is often purely in being published or not, cited or not, by other academics in a circlejerk divorced from reality.
That description could apply to a certain rationality website.
Replies from: Aiyen↑ comment by Aiyen · 2022-04-12T02:26:37.915Z · LW(p) · GW(p)
Certainly it could, and at times does. In our defense, however, we do not make our living this way. It's all too easy for people to push karma around in a circle divorced from reality, but plenty of people feel free to criticize Less Wrong here, as you just neatly demonstrated. There's a much stronger incentive to follow the party line in academia where dissent, however true or useful, can curtail promotion or even get one fired.
If we were making our living off of karma, your comparison would be entirely apt, and I'd expect to see the quality of discussion drop sharply.
Replies from: None, yitz↑ comment by [deleted] · 2022-04-12T07:26:26.900Z · LW(p) · GW(p)
Everything you say is true, and I agree. But let's not discount the pull towards social conformity that karma has, and the effect evaporative cooling of social groups has in terms of radicalizing community norms. You definitely get a lot further here by defending and promoting AI x-risk concerns than by dismissing or ignoring them.
Replies from: yitz, ChristianKl↑ comment by Yitz (yitz) · 2022-04-12T11:49:16.198Z · LW(p) · GW(p)
That does tend to happen, yes, which is unfortunate. What would you suggest doing to reduce this tendency? (It’s totally fine if you don’t have a concrete solution of course, these sorts of problems are notoriously hard)
Replies from: None↑ comment by [deleted] · 2022-04-12T22:18:41.542Z · LW(p) · GW(p)
Karma should not be visible to anyone but mods, to whom it serves as a distributed mechanism for catching their attention and not much else. Large threads could use karma to decide which posts to initially display, but for smaller threads comments should be chronological.
People should be encouraged to post anonymously, as I am doing. Unfortunately the LW forum software devs are reverting this capability, which is a step backwards.
Get rid of featured articles and sequences. I mean keep the posts, but don't feature them prominently on the top of the site. Have an infobar on the side maybe that can be a jumping off point for people to explore curated content, but don't elevate it to the level of dogma as the current site does.
Encourage rigorous experimentation to verify one's beliefs. A position arrived at through clever argumentation is quite possibly worthless. This is a particular vulnerability of this site, which is built around the exchange of words, not physical evidence. So a culture needs to be developed which demands empirical investigation of the form "I wondered if X is true, so I did A, B, and C, and this is what happened..."
That was five minutes of thinking on the subject. I'm sure I could probably come up with more.
↑ comment by ChristianKl · 2022-04-14T10:37:02.678Z · LW(p) · GW(p)
Ignoring the concerns basically means not participating in any of the AI x-risk threads. I don't think it would be held against anyone to simply stay out.
https://www.lesswrong.com/posts/X3p8mxE5dHYDZNxCm/a-concrete-bet-offer-to-those-with-short-ai-timelines [LW · GW] would be a post arguing against AI x-risk concerns, and it has more than three times the karma of any other post published the same day.
↑ comment by Yitz (yitz) · 2022-04-12T03:02:25.681Z · LW(p) · GW(p)
Well, we were getting paid for karma the other week, so…. (This is mostly a joke; I get that was an April Fool’s thing 🙃)
↑ comment by TAG · 2022-04-16T17:28:18.866Z · LW(p) · GW(p)
Exactly this. It takes a lot of effort to become competent through an unconventional route, and it takes a lot of effort to separate the unqualified competent person from the crank.
You agree that it is the case, as I previously said, that what you are looking for is not generic smartness, but some domain specific thing that substitutes for conventional domain specific knowledge.
Researching a disease that you happen to have is one of them, but it is clearly not the same thing as all-conquering generic smartness: such an individual has nothing like the breadth of knowledge an MD has, even if they have more depth in one precise area.
↑ comment by Heighn · 2022-04-11T17:42:55.139Z · LW(p) · GW(p)
Why the extreme downvotes here? This seems like a good point, at least generally speaking, even if you disagree with what the exact subset should be. Upvoted.
Replies from: steve2152, ChristianKl↑ comment by Steven Byrnes (steve2152) · 2022-04-11T19:37:02.925Z · LW(p) · GW(p)
Here's the quote again:
The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.
I think that it's possible for people without relevant industry experience or academic qualifications to say correct things about AGI risk, and I think it's possible for people with relevant industry experience or academic qualifications to say stupid things about AGI risk.
For one thing, the latter has to be true, because there are people with relevant industry experience or academic qualifications who vehemently disagree about AGI risk with other people with relevant industry experience or academic qualifications. For example, if Yann LeCun is right about AGI risk then Stuart Russell is utterly dead wrong about AGI risk and vice-versa. Yet both of them have impeccable credentials. So it's a foregone conclusion that you can have impeccable credentials yet say things that are dead wrong.
For another thing, AGI does not exist today, and therefore it's far from clear that anyone on earth has “relevant” industry experience. Likewise, I'm pretty confident that you can spend 6 years getting a PhD in AI or ML without hearing literally a single word or thinking a single thought about AGI risk, or indeed AGI in general. You're welcome to claim that the things everyone learns in CS grad school (e.g. knowledge about the multi-armed bandit problem, operating systems design, etc.) are helpful for evaluating whether the instrumental convergence hypothesis is true or false. But you need to make that argument—it's not obvious, and I happen to think it's mostly not true. Even if it were true, it's obviously possible for someone to know everything in the CS grad school curriculum without winding up with a PhD, and if they do, why then wouldn't we listen to what they have to say?
For another thing, I think that smart careful outsiders with good epistemics and willingness to invest time etc. are very far from helpless in evaluating technical questions in someone else's field of expertise. For example, I think Zvi has acquitted himself well in his weekly analysis of COVID, despite being neither an epidemiologist nor a doctor. He was consistently saying things that became common knowledge only weeks or months later. More generally, the CDC and WHO are full of people with impeccable credentials, and lesswrong is full of people without medical or public health credentials, but I feel confident saying that lesswrong users have been saying more accurate things about COVID than the CDC or WHO have, throughout the pandemic. (Examples: the fact that handwashing is not very helpful for COVID prevention, but ventilation and masks are very helpful—these were common knowledge on lesswrong loooong before the CDC came around.) As another example, I recall hearing evidence that superforecasters can make forecasts that are about as accurate as domain experts on the topic of that forecast.
Anyway, the quote above seems to be giving me the vibe that if someone (e.g. Eliezer Yudkowsky) has neither an AI PhD nor industry experience, then he's automatically wrong and stupid, and we don't need to waste our time listening to what he has to say and evaluating his arguments. I strongly disagree with that vibe, and suspect that the downvotes came from people feeling similarly. If that vibe is not what was intended, then maybe you or TAG can rephrase.
Replies from: Heighn, TAG↑ comment by Heighn · 2022-04-12T15:40:55.726Z · LW(p) · GW(p)
I get your view (thanks for your reply!), and tend to agree now. Even though I didn't necessarily agree with TAG's subset proposal, I didn't see why the comment in question should receive so many downvotes - but
Anyway, the quote above seems to be giving me the vibe that if someone (e.g. Eliezer Yudkowsky) has neither an AI PhD nor industry experience, then he's automatically wrong and stupid, and we don't need to waste our time listening to what he has to say and evaluating his arguments. I strongly disagree with that vibe, and suspect that the downvotes came from people feeling similarly. If that vibe is not what was intended, then maybe you or TAG can rephrase.
makes sense, thanks!
↑ comment by TAG · 2022-04-11T22:09:38.730Z · LW(p) · GW(p)
I think that it’s possible for people without relevant industry experience or academic qualifications to say correct things about AGI risk,
Of course it's possible. It's just not likely.
and I think it’s possible for people with relevant industry experience or academic qualifications to say stupid things about AGI risk.
Of course that's possible. The point is the probabilities, not the possibilities.
there are people with relevant industry experience or academic qualifications who vehemently disagree
And people without relevant industry experience also disagree.
If the experts disagree, that's not evidence that the non-experts agree... or know what they are talking about.
For another thing, AGI does not exist today, and therefore it’s far from clear that anyone on earth has “relevant” industry experience
No one does if there is a huge leap from AI to AGI. "No one" would include Yudkowsky. Also, if there is a huge leap from AI to AGI, then we are not in trouble soon.
Anyway, the quote above seems is giving me the vibe that if someone (e.g. Eliezer Yudkowsky) has neither an AI PhD nor industry experience, then he’s automatically wrong and stupid,
No, just probably. But you already believe that, in the general case... you don't believe that some unqualified and inexperienced person should take over your health, financial or legal affairs. I'm not telling you anything you don't know already.
Replies from: steve2152, Heighn↑ comment by Steven Byrnes (steve2152) · 2022-04-11T23:38:18.000Z · LW(p) · GW(p)
I feel like everything you're saying is attacking the problem of
“How do you read somebody's CV and decide whether or not to trust them?”
This problem is a hard problem, and I agree that if that's the problem we face, there's no good solution, and maybe checking their credentials is one of the least bad of the many bad options.
But that's not the problem we face! There's another path! We can decide who to trust by listening to the content of what they're saying, and trying to figure out if it's correct. Right??
Replies from: TAG↑ comment by TAG · 2022-04-11T23:50:37.893Z · LW(p) · GW(p)
There’s another path! We can decide who to trust by listening to the content of what they’re saying, and trying to figure out if it’s correct. Right??
Right. Please start doing so.
Please start noticing that much of EY's older work doesn't even make a clear point (what actually is his theory of consciousness? What actually is his theory of ethics?). Please start noticing that Yudkowsky's newer work consists of hints at secret wisdom he can't divulge. Please start noticing the objections to EY's postings that can be found in the comments to the Sequences. Please understand that you can't judge how correct someone is by ignoring or vilifying their critics; criticism from others, and how they deal with it, is the single most valuable resource in evaluating someone's epistemological validity. Please understand that you can't understand someone by reading them in isolation. Please read something other than the Sequences. Please stop copying ingroup opinions as a substitute for thinking. Please stop hammering the downvote button as a substitute for thinking.
Replies from: steve2152↑ comment by Steven Byrnes (steve2152) · 2022-04-12T00:51:17.707Z · LW(p) · GW(p)
I was arguing against this comment that you wrote above [LW(p) · GW(p)]. Neither your comment nor anything in my replies was about Eliezer in particular, except that I brought him up as an example of someone who happens to lack a PhD and industry experience (IIRC).
It sounds like you read some of Eliezer’s writing, and tried to figure out if his claims were right or wrong or incoherent. Great! That’s the right thing to do.
But it makes me rather confused how you could have written that comment above.
Suppose you had originally said “I disagree that Eliezer is smarter than you, because whenever his writing overlaps with my areas of expertise, I find that he’s wrong or incoherent. Therefore you should be cautious in putting blind faith in his claims about AGI.” I would think that that’s a pretty reasonable thing to say. I mean, it happens not to match my own assessment (I mean the first part of the quote; of course I agree about the “blind faith” part of the quote), but it’s a valuable contribution to the conversation, and I certainly wouldn’t have downvoted if you had said that.
But that’s not what you said in your comment above. You said “The relevant subset of people who are smarter than you is the people who have relevant industry experience or academic qualifications.” That seems to be a very general statement about best practices to figure out what’s true and false, and I vehemently disagree with it, if I’m understanding it right. And maybe I don’t understand it right! After all, it seems that you yourself don’t behave that way.
Replies from: TAG↑ comment by TAG · 2022-04-12T01:13:04.063Z · LW(p) · GW(p)
Figuring out whether someone has good epistemology, from first principles, is much harder than looking at obvious data like qualifications and experience. Not many people have the time to do it even in a few select cases, and no one has the ability to do it in every case. For practical purposes, you need to go by qualifications and experience most of the time, and you do.
Replies from: Aiyen, steve2152↑ comment by Aiyen · 2022-04-12T02:52:14.067Z · LW(p) · GW(p)
How correlated are qualifications and good epistemology? Some qualifications are correlated enough that it's reasonable to trust them. As you point out, if a doctor says I have strep throat, I trust that I have strep, and I trust the doctor's recommendations on how to cure it. Typically, someone with an M.D. knows enough about such matters to tell me honestly and accurately what's going on. But if a doctor starts trying to push Ivermectin/Moderna*, I know that could easily be the result of politics, rather than sensible medical judgement, and having an M.D. hardly immunizes one against political mind-killing.
I am not objecting, and I doubt anyone who downvoted you was objecting, to the practice of recognizing that some qualifications correlate strongly with certain types of expertise, and trusting accordingly. However, it is an empirical fact that many scientific claims from highly credentialed scientists did not replicate. In some fields, this was a majority of their supposed contributions. It is a simple fact that the world is teeming with credentials that don't, actually, provide evidence that their bearer knows anything at all. In such cases, looking to a meaningless resume because it's easier than checking their actual understanding is the Streetlight Fallacy. It is also worth noting that expertise tends to be quite narrow, and a person can be genuinely excellent in one area and clueless in another. My favorite example of this is Dr. Hayflick, discoverer of the Hayflick Limit, attempting to argue that anti-aging is incoherent. Dr. Hayflick is one of the finest biologists in the world, and his discovery was truly brilliant. Yet his arguments against anti-aging were utterly riddled with logical fallacies. Or Dr. Aumann, who is both a world-class game theorist and an Orthodox Jew.
If we trust academic qualifications without considering how anchored a field or institution is to reality, we risk ruling in both charlatans and genuinely capable people outside the area where they are capable. And if we only trust those credentials, we rule out anyone else who has actually learned about the subject.
*not to say that either of these is necessarily bad, just that tribal politics will tempt Red and Blue doctors respectively to push them regardless of whether or not they make sense.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-12T03:08:47.647Z · LW(p) · GW(p)
It is also worth noting that expertise tends to be quite narrow, and a person can be genuinely excellent in one area and clueless in another
What are the chances the first AGI created suffers a similar issue, allowing us to defeat it by exploiting that weakness? I predict that if we experience one obvious, high-profile, and terrifying near-miss with a potentially x-class AGI, governance of compute will become trivial after that, and we'll be safe for a while.
Replies from: Aiyen↑ comment by Steven Byrnes (steve2152) · 2022-04-12T02:39:58.125Z · LW(p) · GW(p)
Sure. But that’s not what you said in that comment that we’re talking about [LW(p) · GW(p)].
If you had said “If you don’t have the time and skills and motivation to figure out what’s true, then a good rule-of-thumb is to defer to people who have relevant industry experience or academic qualifications,” then I would have happily agreed. But that’s not what you said. Or at least, that’s not how I read your original comment.
↑ comment by ChristianKl · 2022-04-14T10:17:44.351Z · LW(p) · GW(p)
Tetlock's work does suggest that superforecasters can outperform people with domain expertise. The ability to synthesize existing information to make predictions about the future is not something that domain experts necessarily have in a way that makes them better than people who are skilled at forecasting.
45 comments
Comments sorted by top scores.
comment by CarlShulman · 2022-04-11T00:38:44.385Z · LW(p) · GW(p)
[Edited to link correct survey.]
It's really largely Eliezer and some MIRI people. Most alignment researchers (e.g. at ARC, Deepmind, Open AI, Anthropic, CHAI) and most of the community [ETA: had wrong link here before] disagree [AF · GW] (I count myself among those who disagree, although I am concerned about a big risk here), and think MIRI doesn't have good reasons to support the claim of almost certain doom.
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes [? · GW]) has a good chance of working well enough to make better controls and so on. For an AI apocalypse it's not only required that unaligned superintelligent AI outwit humans, but that all the safety/control/interpretability gains yielded by AI along the way also fail, creating a very challenging situation for misaligned AI.
↑ comment by Ben Pace (Benito) · 2022-04-11T04:41:11.024Z · LW(p) · GW(p)
It's really largely Eliezer and some MIRI people.
Hm? I was recently at a 10-15 person lunch for people with >75% on doom, that included a number of non-MIRI people, including at least one person each from FHI and DeepMind and CHAI.
(Many of the people had interacted with MIRI or at some time worked with/for them, but work at other places now.)
Just registering that your comment feels a little overstated, but you're right to say a lot of this emanates from some folks at MIRI. For one, I had been betting a lot on MIRI, and now feel like a lot more responsibility has fallen on my plate.
↑ comment by habryka (habryka4) · 2022-04-11T17:58:26.272Z · LW(p) · GW(p)
You've now linked to the same survey twice in different discussions of this topic, even though this survey, as far as I can tell, provides no evidence of the position you are trying to argue for. To copy Thomas Kwa's response [LW(p) · GW(p)] to your previous comment:
I don't see anything in the linked survey about a consensus view on total existential risk probability from AGI. The survey asked researchers to compare between different existential catastrophe scenarios, not about their total x-risk probability, and surely not about the probability of x-risk if AGI were developed now without further alignment research.
We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.
[...]
Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like [LW · GW]”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, part 1 and part 2 of “What failure looks like”). The median respondent’s total (conditional) probability on these three scenarios was 50%, suggesting that this kind of concern about AI risk is still prevalent, but far from the only kind of risk that researchers are concerned about today.
It also seems straightforwardly wrong that it's just Eliezer and some MIRI people. While there is a wide variance in opinions on probability of doom from people working in AI Alignment, there are many people at Redwood, OpenAI and other organizations who assign very high probability here. I don't think it's at all accurate to say this fits neatly along organizational boundaries, nor is it at all accurate to say that this is "only" a small group of people. My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Replies from: CarlShulman, rohinmshah, RobbBB↑ comment by CarlShulman · 2022-04-12T20:08:35.002Z · LW(p) · GW(p)
Whoops, you're right that I linked the wrong survey. I see others posted the link to Rob's survey (done in response to some previous similar claims) and I edited my comment to fix the link.
I think you can identify a cluster of near-certain-doom views, e.g. 'logistic success curve' and odds of success on the order of magnitude of 1% (vs 10%, or 90%), based around MIRI/Eliezer, with a lot of epistemic deference involved (visible on LW). I would say the view is largely attributable to that cluster, and without sufficient support.
"My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%."
What do you make of Rob's survey results [AF · GW] (correct link this time)?
↑ comment by Rohin Shah (rohinmshah) · 2022-04-12T10:57:47.449Z · LW(p) · GW(p)
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Depending on how you choose the survey population, I would bet that it's fewer than 35%, at 2:1 odds.
(Though perhaps you've already updated against based on Rob's survey results below; that survey happened because I offered to bet against a similar claim of doom probabilities from Rob, that I would have won if we had made the bet.)
Replies from: spell_chekist↑ comment by spell_chekist · 2022-04-12T16:18:42.498Z · LW(p) · GW(p)
Where would you put the numbers, roughly?
Replies from: rohinmshah↑ comment by Rohin Shah (rohinmshah) · 2022-04-13T07:28:34.266Z · LW(p) · GW(p)
I'd just use the numbers from the survey below? Maybe slightly updated towards doom; I think probably some of the respondents have been influenced by the recent wave of doomism.
If you had a more rigorously defined population, such that I could predict the differences between that population and the population surveyed below, I could predict more differences.
↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T19:36:14.857Z · LW(p) · GW(p)
My current best guess is if we surveyed people working full-time on x-risk motivated AI Alignment, about 35% of people would assign a probability of doom above 80%.
Not what you were asking for (time has passed, the Q is different, and the survey population is different too), but in my early 2021 survey [LW · GW] of people who "[research] long-term AI topics, or who [have] done a lot of past work on such topics" at a half-dozen orgs, 3/27 ≈ 11% of those who marked "I'm doing (or have done) a lot of technical AI safety research." gave an answer above 80% to at least one of my attempts to operationalize 'x-risk from AI'. (And at least two of those three were MIRI people.)
The weaker claim "risk (on at least one of the operationalizations) is at least 80%" got agreement from 5/27 ≈ 19%, and "risk (on at least one of the operationalizations) is at least 66%" got agreement from 9/27 ≈ 33%.
↑ comment by Jack R (Jack Ryan) · 2022-04-13T09:24:12.519Z · LW(p) · GW(p)
MIRI doesn't have good reasons to support the claim of almost certain doom
I recently asked Eliezer why he didn't suspect ELK to be helpful, and it seemed that one of his major reasons was that Paul was "wrongly" excited about IDA. It seems that at this point in time, neither Paul nor Eliezer are excited about IDA, but Eliezer got to the conclusion first. Although, the IDA-bearishness may be for fundamentally different reasons -- I haven't tried to figure that out yet.
Have you been taking this into account re: your ELK bullishness? Obviously, this sort of point should be ignored in favor of object-level arguments about ELK, but to be honest, ELK is taking me a while to digest, so for me that has to wait.
Replies from: paulfchristiano, yitz↑ comment by paulfchristiano · 2022-04-18T21:53:19.051Z · LW(p) · GW(p)
It seems that at this point in time, neither Paul nor Eliezer are excited about IDA
I'm still excited about IDA.
I assume this is coming from me saying that you need big additional conceptual progress to have an indefinitely scalable scheme. And I do think that's more skeptical than my strongest pro-IDA claim here in early 2017:
I think there is a very good chance, perhaps as high as 50%, that this basic strategy can eventually be used to train benign state-of-the-art model-free RL agents. [...] That does not mean that I think the conceptual issues are worked out conclusively, but it does mean that I think we’re at the point where we’d benefit from empirical information about what works in practice
That said:
- I think it's up for grabs whether we'll end up with something that counts as "this basic strategy." (I think imitative generalization is the kind of thing I had in mind in that sentence, but many of the ELK schemes we are thinking about definitely aren't, it's pretty arbitrary.)
- Also note that in that post I'm talking about something that produces a benign agent in practice, and in the other I'm talking about "indefinitely scalable." Though my probability on "produces a benign agent in practice" is also definitely lower.
↑ comment by Yitz (yitz) · 2022-04-13T16:44:35.407Z · LW(p) · GW(p)
Did Eliezer give any details about what exactly was wrong about Paul’s excitement? Might just be an intuition gained from years of experience, but the more details we know the better, I think.
Replies from: thomas-kwa↑ comment by Thomas Kwa (thomas-kwa) · 2022-04-14T20:11:18.062Z · LW(p) · GW(p)
Some scattered thoughts in this direction:
- this post [LW · GW]
- Eliezer has an opaque intuition that weird recursion is hard to get right on the first try. I want to interview him and write this up, but I don't know if I'm capable of asking the right questions. Probably someone should do it.
- Eliezer thinks people tend to be too optimistic in general
- I've heard other people have an intuition that IDA is unaligned because HCH is unaligned because real human bureaucracies are unaligned
↑ comment by Thomas Kwa (thomas-kwa) · 2022-04-28T18:45:05.616Z · LW(p) · GW(p)
I found this comment [LW(p) · GW(p)] where Eliezer has detailed criticism of Paul's alignment agenda including finding problems with "weird recursion"
↑ comment by Jack R (Jack Ryan) · 2022-04-14T23:25:41.434Z · LW(p) · GW(p)
I'll add that when I asked John Wentworth why he was IDA-bearish, he mentioned the inefficiency of bureaucracies and told me to read the following post to learn why interfaces and coordination are hard: Interfaces as a Scarce Resource [? · GW].
↑ comment by EOC (Equilibrate) · 2022-04-11T09:02:42.692Z · LW(p) · GW(p)
In particular, other alignment researchers tend to think that competitive supervision (e.g. AIs competing for reward to provide assistance in AI control that humans evaluate positively, via methods such as debate and alignment bootstrapping, or ELK schemes [? · GW]).
Unfinished sentence?
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-04-11T02:00:27.261Z · LW(p) · GW(p)
Nitpick: I think this should either be a comment or an answer to Yitz' upcoming followup post, since it isn't an attempt to convince them that humanity is doomed.
Replies from: Raemon↑ comment by Raemon · 2022-04-11T02:20:17.616Z · LW(p) · GW(p)
(I moved it to "comments" for this reason. I missed the part where Yitz said there'd be an upcoming followup post, although I think that'd be a good idea, and this comment would make a good answer there. I would be interested in seeing top-level posts arguing the opposite view)
comment by Mitchell_Porter · 2022-04-10T23:06:39.136Z · LW(p) · GW(p)
The idea that AI is a threat to the human race by being smarter than us, is an old one. The reason for the panic now is that we are seeing new breakthroughs in AI every month or so, but the theory and practice of safely developing superhuman AI barely exists. Apparently the people leading the charge towards superhuman AI, trust that they will figure out how to avoid danger along the way, or think that they can't afford to let the competition get ahead, or... who knows what they're thinking.
For some time I have insisted that the appropriate response to this situation (for people who see the danger, and have the ability to contribute to AI theory), is to try to solve the problem, i.e. design human-friendly superhuman AI. You can't count on convincing everyone to go slowly, and you certainly can't count on the world's superpowers to force everyone to go slowly. Someone has to directly solve the problem.
I have also been insisting that June Ku's MetaEthical.AI is the most advanced blueprint we have. I am planning to make a discussion post about it, since it has received surprisingly little attention.
Replies from: RobbBB↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T19:41:00.484Z · LW(p) · GW(p)
I agree with your second paragraph (and most of your first paragraph). Also, "going slowly" doesn't solve the problem on its own; you still need to solve alignment sooner or later.
comment by mukashi (adrian-arellano-davin) · 2022-04-14T11:35:31.807Z · LW(p) · GW(p)
I think that for EY and a large fraction of the LW/alignment community, it might be frustrating to hear uneducated newcomers make what they think are obvious mistakes and repeat the same arguments they have heard for years. The fact that we are talking about doom does not help a bit either: it must be similar to the desperation felt by a pilot who knows his plane is heading straight toward a mountain on a collision course while the crew keeps asking whether the inflatable slides are working.
So this comment is coming from one of those uneducated readers. I know the basics: I read the Sequences (maybe my favourite book), the road to Superintelligence and many other articles on the topic, but there are many, many things that I am aware I don't fully grasp. Given that I want to correct that, in my position, the best thing I can do is post things with probably silly opinions like this comment, which allows me to be educated by others.
To me, the weakest point in the chain of reasoning of the OP is 4.
The things I see as clearly obvious are (points are mine):
1. Humans are not at the upper bound of intelligence.
2. Machines will eventually reach superhuman intelligence (probably within the next few years).
3. The (social and economic) changes associated with this will be unprecedented.
The other important things I don't see as obvious at all but are very often taken for granted are:
4. I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate text describing in a lot of detail how to damage the economy of a country X, while not necessarily having the power to execute that plan unless there are humans implementing those actions. Imagination and action are different things.
5. I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI. Creating a specific industry for a new technology could be more complex than we think. The protein-folding problem would not have been solved without decades of crystallography behind it. Intelligence by itself might not be a sufficient condition to develop things like advanced nanotechnology that can kill all humans at once.
6. I don't see why we are taking for granted that there are no limits to the capacity of an AGI in terms of knowledge/planning. There might be limits on what it is possible to know/plan that we are not aware of, and that would dramatically reduce the effectiveness of a machine trying to take over the world. It seems to me that if the discussion about AGI had taken place before the discovery of deterministic chaos, someone could very well have been arguing something like: the machine uses its infinite intelligence to predict the weather 10 years from now, when there will be a massive blizzard on the 10th of October, which is also the day that blah blah blah. Today we know that there are systems that are unpredictable even with arbitrarily precise measurements. This is just one example of a limit on what can be known, but there might be many others.
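(The chaos point is easy to demonstrate for yourself. Here is a minimal sketch of my own, using the logistic map as a toy stand-in for any chaotic system: two simulations that start a hair's breadth apart become completely uncorrelated within a few dozen steps, so no amount of intelligence can out-predict the initial measurement error.)

```python
# Sensitive dependence on initial conditions, illustrated with the
# logistic map x_{n+1} = r * x_n * (1 - x_n) at r = 4 (chaotic regime).

def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.3)
b = logistic_trajectory(0.3 + 1e-10)  # a "measurement error" of one part in 10^10

diffs = [abs(x - y) for x, y in zip(a, b)]
print(diffs[1])        # after one step: still around 1e-10
print(max(diffs[30:])) # after ~30 steps: on the order of the whole interval [0, 1]
```

The error roughly doubles every step, so after a few dozen iterations the two trajectories are as different as two random numbers, no matter how cleverly the forecaster reasons.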
Some other things I think are playing a role in the overly pessimistic take of the LW community:
7. I think there is a vicious circle in which many people have fallen: Doom might be possible, so we talk about it because it is terrifying. Given that there are people talking about this, due to the availability bias, other people update towards higher estimates of p(doom). Which makes the doom scenario even more terrifying.
8. EY has a disproportionate impact on the community (for obvious reasons) and the more moderate predictions are not discussed so much.
Replies from: RobbBB, yitz, mruwnik↑ comment by Rob Bensinger (RobbBB) · 2022-04-15T01:20:24.307Z · LW(p) · GW(p)
I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans. For example, I can envision a machine that is able to generate the text describing with a lot of detail how to damage the economy of a country X and not necessarily having the power to execute it unless there are humans behind implementing those actions. Imagination and action are different things.
I suspect one of the generators of disagreements here is that MIRI folks don't think imagination and action are (fundamentally) different things.
Like, there's an intuitive human distinction between "events that happen inside your brain" and "events that happen outside your brain". And there's an intuitive human distinction between "controlling the direction of thoughts inside your brain so that you can reach useful conclusions" and "controlling the direction of events outside your brain so that you can reach useful outcomes".
But it isn't trivial to get an AGI system to robustly recognize and respect that exact distinction, so that it optimizes only 'things inside its head' (while nonetheless producing outputs that are useful for external events and are entangled with information about the external world). And it's even less trivial to make an AGI system robustly incapable of acting on the physical world, while having all the machinery for doing amazing reasoning about the physical world, and for taking all the internal actions required to perform that reasoning.
↑ comment by Yitz (yitz) · 2022-04-21T20:56:10.141Z · LW(p) · GW(p)
Thanks for the excellent comment and further questions! A few of these I think I can answer partially, and I'll try to remember to respond to this post later if I come across any other/better answers to your questions (and perhaps other readers can also answer some now).
I don't see why a machine that is able to make plans is the same as a machine that is able to execute those plans.
My understanding is that while the two are different in principle, in practice, ensuring that an AGI doesn't act on that knowledge is an extremely hard problem. Why is it such a hard problem? I have no idea, lol. What is probably relevant here is Yudkowsky's AI-in-a-box experiment, which purports (successfully imo, though I know it's controversial) to show that even an AI which can only interface with the world via text can convince humans to act on its behalf, even if the humans are strongly incentivized not to do so. If you have an AI which dreams up an AGI, that AGI is now in existence, albeit heavily boxed. If it can convince the containing AI that releasing it would help it fulfil its goal of predicting things properly or whatever, then we're still doomed. However, this line of argument feels weak to me, especially if it doesn't require already having an AGI in order to know how to build one (which I would assume to be the case). Your general point stands, and I don't know the technical reason why differentiating between "imagination" and "action" (as you excellently put it) is so hard.
I don't see why a large fraction of the community assumes that extraordinary things like nanotechnology can be achieved very quickly and no major hurdles will be found, even with AGI.
A partial response to this may be that it doesn't need to be nanotechnology, or any one invention, which will be achieved quickly. All we need for AGI to be existentially dangerous is for it to be able to make a major breakthrough in some area which gives it power to destroy us. See for example this story, where an AI was able to create a whole bunch of extremely deadly chemical weapons with barely any major modifications to its original code. This suggests that while there may in fact be hurdles for an AGI to overcome in nanotech and elsewhere, that won't really matter much for world-ending purposes. The technology mostly exists already, and it would just be a matter of convincing the right people to take a fairly simple sequence of actions.
I don't see why we take it for granted that there are no limits to an AGI's capacity for knowledge/planning.
Do we take that for granted? I don't think we really need to assume a FOOM scenario for an AGI to do tremendous damage. Just by ourselves, with human-level intelligence, we've gotten close to destroying the world a few too many times to be reassuring. Imagine if an Einstein-level human genius decided to devote themselves to killing humanity. They probably wouldn't succeed, but I sure wouldn't bet on it! I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious). AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.
Replies from: adrian-arellano-davin↑ comment by mukashi (adrian-arellano-davin) · 2022-04-22T17:34:53.618Z · LW(p) · GW(p)
Hi Yitz, just a clarification. In my view p(doom) != 0. I can't say any meaningful number, but if you force me to give you an estimate, it would probably be close to 1% in the next 50 years. Maybe less, maybe a bit more, but in that ballpark. I find EY et al.'s arguments about what is possible compelling: I think that extinction by AI is definitely a possibility. This means that it makes a lot of sense to explore this subject as they are doing, and they have my most sincere admiration for carrying out their research outside conventional academia. What I most disagree with is their estimate of the likelihood of such an event: most of the discussions I have read treat doom as just a fait accompli — not so much a question of whether it will take place, but when. And they are looking into the future making a set of predictions that seem bizarrely precise, trying to say how things will happen step by step (I am thinking mostly about the conversations among the MIRI leaders that took place a few months ago). The reasons stated above (and the ones that I added in the comment I made on your other post) are mostly reasons why things could go differently. So for instance, yes, I can envision a machine that is able to imagine and act. But I can also envision the opposite thing, and that's what I am trying to convey: that there are many reasons why things could go differently. For now, it seems to me that the doom predictions will fail, and will fail badly. Bryan Caplan is getting that money.
Something else I want to raise is that we seem to have different definitions of doom.
I can personally think of a few things I could do if I was marginally smarter/more resourceful which could plausibly kill 1,000,000,000+ people (don't worry, I have no intentions of doing anything nefarious). AGI doesn't need to be all that smarter than us to be an X-risk level threat, if it's too horrifically unaligned.
Oh yes, I totally agree with this (although maybe not in 10 years), that's why I think it makes a lot of sense to carry out research on alignment. But watch out: EY would tell you* that if an AGI decides to kill only 1 billion people, then you would have solved the alignment problem! So it seems we have different versions of doom.
For me, a valid definition of doom is: everyone who can continue making any significant technological progress dies, and the process is irreversible. If the whole Western World disappears and only China remains, that is a catastrophe, but the world keeps going. If the only people alive are the guys in the Andaman Islands, that is pretty much game over, and then we are talking about a doom scenario.
*I remember once reading that sentence quite literally from EY; I think it was in the context of an AGI killing all the world except China, or something similar. If someone can find the reference that would be great; otherwise, I hope I am not misrepresenting what the big man himself said. If I am, I'm happy to retract this comment.
↑ comment by mruwnik · 2022-04-15T15:56:38.305Z · LW(p) · GW(p)
I'm in the same situation as you re education status. That being said, my understanding of your 5th point is that nanotechnology doesn't necessarily mean literal nanotechnology. It's more of a placeholder for generic magic technology which can't be foreseen specifically, like gunpowder or the internet once were. It seems like this is obvious to you, just wanted to make sure of it.
Gunpowder took a few centuries to totally transform the battlefield, the internet a few decades. Looking at history, revolutionary inventions keep arriving faster and taking less time to be developed. So it seems safer to be pessimistic and assume that a new disruptive technology could be invented on really short timescales, e.g. some super bacteria via CRISPR or something. These benefit from the centuries of prior research, standing on shoulders etc. There's also the fruitfulness of combining domains.
Next, there seems to be an assumption that research ability scales somehow with intelligence. Maybe not linearly, but still. This seems somewhat valid - humans having invented a lot more than killer whales, who in turn have invented a lot more than marmots. So if you manage to create something a lot more intelligent (or even just like twice as intelligent, whatever that means), it seems reasonable to assume that it's possible for it to have corresponding speed-ups in research ability. This of course could be invalidated by your 6th point.
Also, a limiting factor in research can be that you have to run lots of experiments to see if things work out. Simulations can help a lot with this; they don't even have to be too precise to be useful. So you could imagine an AI that wants to find a way to kill off humans and looks for something poisonous. It could make a model that classifies molecules by toxicity and then tries to find something [maximally toxic](https://www.theverge.com/2022/3/17/22983197/ai-new-possible-chemical-weapons-generative-models-vx), after which it could just test the top ten candidates.
It's not a given that any of these assumptions would hold. But if they did, then Bad Things would happen Fast. Which seems like something worth worrying about a lot. I also have the feeling that it depends on what kind of AI is posited.
- If it's just a better Einstein, then it's unlikely that it'll manage to kill everyone off too quickly
- If it's a better Einstein, but which thinks 1000 times faster (human brains don't work all that fast), then we're in trouble
- If it's properly superhumanly intelligent (i.e. > 400 IQ? dunno?) then who knows what it could come up with. And that's before considering how fast it thinks.
comment by gjm · 2022-04-11T11:05:09.539Z · LW(p) · GW(p)
Your list of assumptions is definitely not complete. An important one not in the list is:
- An AGI (not necessarily every AGI, but some AGIs soon after there are any AGIs) will have the power to make very large changes to the world, including some that would be disastrous for us (e.g., taking us apart as raw material for something "better", rewriting our brains to put us in a mental state the AGI prefers, redesigning the world's economy in a way that makes us all starve to death, etc., etc., etc.)
I suppose you could integrate this with "we will not be able to effectively stop an unaligned AGI", but I think there's an important difference between "... because it may not be listening to us" or "... because it may not care what we want" and "... because it is stronger than us and we won't be able to turn it off or destroy it". (It's the combination of those things that would lead to disaster.)
For the avoidance of doubt, I think this assumption is reasonable, and it seems like there are a number of quite different ways by which something with brainpower comparable to ours but much faster or much smarter might gain enough power that it could do terrible things and we couldn't stop it by force. But it is an assumption, and even if it's a correct assumption the details might matter. (Imagine World A where the biggest near-term threat is an AGI that overwhelms us by being superhumanly persuasive and getting everyone to trust it, versus World B where the biggest near-term threat is an AGI that overwhelms us by figuring out currently-unknown laws of physics that give it powers we would consider magical. In World A we might want to work on raising awareness of that danger and designing modes of interaction with AGIs that reduce the risk of being persuaded of things it would be better for us not to be persuaded of. In World B that would all be wasted effort; we'd probably again want to do some awareness-raising and might need to work on containment protocols that minimize an AGI's chance of doing things with very precisely defined effects on the physical world.)
comment by Shmi (shminux) · 2022-04-10T21:53:18.916Z · LW(p) · GW(p)
Not trying to convince you of anything, but my personal issue is with 4 and 9. I am not certain that a superintelligence with behaviors incomprehensible to us (I would not presume that these can be derived from anything like "values" or "goals", since that doesn't even work with humans) would necessarily wipe humanity out. I see plenty of other options, including far-fetched ones like creating its own baby universes. Or miniaturizing into some quantum world. Or, most likely, something we can't even conceive of, like chimps can't conceive of space or algebra.
Other than that, my guess is that creating an aligned intelligence is not even a well-posed problem, since humans are not really internally aligned, not even on the question of whether survival of humanity is a good thing. And even if it were, unless there is a magic "alignment attractor" rule in the universe, there is basically no chance we could create an aligned entity on purpose. By analogy with "rocket alignment", rockets blow up quite a bit before they ever fly... and odds are, there is only one chance at launching an aligned AI. So your point 3 is unavoidable, and we do not have a hope in hell of containing anything smarter than us.
Replies from: Signer, yitz↑ comment by Signer · 2022-04-10T22:10:42.252Z · LW(p) · GW(p)
The problem is that humanity's behavior will wipe humanity out: if the first AGI miniaturizes into some quantum world, we will create a second one.
Replies from: shminux↑ comment by Shmi (shminux) · 2022-04-11T01:07:30.121Z · LW(p) · GW(p)
It's possible, but it can also be possible that at some threshold of intelligence it finds a pathway which is richer and much more interesting than what we observe as humans (compare it to earthworms knowing of nothing but dirt), and leaves for greener pastures.
Replies from: Signer↑ comment by Yitz (yitz) · 2022-04-10T22:35:16.677Z · LW(p) · GW(p)
So if I’m understanding you correctly (and let me know if I’m not, of course, since I may be extrapolating way beyond what you intended) you’re saying that we will not solve alignment ever, because:
A. “Alignment” as a term relies on a conception of humanity as a sort of unified group which doesn’t really exist, because we all have either subtly or massively different fundamental goals. Aiming for “what’s best for humanity” (perhaps through Yudkowsky’s CEV or something) is not doable even in theory without literally changing people’s value functions to be identical (which would classify as an x-risk type scenario, imo).
B. Regardless of A, we’ve only got one shot at alignment (implying assumptions 3 and 7), and… Here I noticed my confusion, since you seem to be using a statement relying on assumption 3 to argue for 3, which seems somewhat circular, so I’m probably misunderstanding you there. By the argument you give, the situation is in fact avoidable if there are in fact multiple chances of launching an AGI for whatever reason.
It seems to me that A may be a restatement of the governance problem in political theory (aka "how can a government be maximally ethical?"). If so, I’d say the solution there is to simply redefine alignment as aiming for some individual’s ethical values, which would presumably include concepts such as the value of alternative worldviews, etc. (this is just one thought, doesn't need to actually be The Answer™). Your objection seems to be primarily semantic in nature, and I don't see any strong reason why it can't be overcome by simply posing the problem better, and then answering that problem.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-11T04:22:17.668Z · LW(p) · GW(p)
(posting below just to note I ended up editing the above comment, instead of posting below as I'd previously promised, so that way I could fulfil said promise ;))
comment by [deleted] · 2022-04-14T02:32:28.604Z · LW(p) · GW(p)
I’m a bit late on this but I figure it’s worth a shot:
1.) We don’t have very much time left, judging by the rate of recent progress in AI capabilities. In the last two weeks alone significant progress has been made.
2.) The amount of time, money, and manpower being devoted towards the alignment problem is comparatively very small in the face of the resources being devoted to the advancement of AI capabilities.
3.) We don’t have any good idea on what to do, and you can reasonably predict that this state of ignorance will persist until the world ends, given the rate of progress in alignment research, compared to the rate of progress in all other spheres of AI.
4.) Though I definitely don’t have a gears-level understanding of how AI works, it appears to me that the consensus among alignment researchers is that alignment is extremely difficult- almost intractable. There’s a sub-problem here, of researchers deciding to work on easier, less lethal problems before the world ends due to the difficulty of the problem.
5.) Finally, the most damning of all reasons for pessimism is the fact that alignment, with all of its difficulties, needs to work on the first try, or else everyone dies.
Despite knowing all this, I don’t really know for sure that we’re doomed, like EY seems to think, mostly due to the uncertainty of the subject matter and the unprecedented nature of the technology, but things sure don’t look good.
comment by [deleted] · 2022-04-11T12:16:22.225Z · LW(p) · GW(p)
Replies from: RobbBB↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T20:07:18.949Z · LW(p) · GW(p)
Sounds like one of the many, many reductios of the precautionary principle to me. If we should kill ourselves given any nonzero probability of a worse-than-death outcome, regardless of how low the probability is and regardless of the probability assigned to other outcomes, then we're committing ourselves to a pretty silly and unnecessary suicide in a large number of possible worlds.
This doesn't even have to do with AGI; it's not as though you need to posit AGI (or future tech at all) in order to spin up hypothetical scenarios where something gruesome happens to you in the future.
If you ditch the precautionary principle and make a more sensible EV-based argument like 'I think hellish AGI outcomes are likely enough in absolute terms to swamp the EV of non-hellish possible outcomes', then I disagree with you, but on empirical grounds rather than 'your argument structure doesn't work' grounds. I agree with Nate's take [LW(p) · GW(p)]:
Replies from: None
My cached reply to others raising the idea of fates worse than death went something like:
"Goal-space is high dimensional, and almost all directions of optimization seem likely to be comparably bad to death from our perspective. To get something that is even vaguely recognizable to human values you have to be hitting a very narrow target in this high-dimensional space. Now, most of that target is plausibly dystopias as opposed to eutopias, because once you're in the neighborhood, there are a lot of nearby things that are bad rather than good, and value is fragile. As such, it's reasonable in principle to worry about civilization getting good enough at aiming AIs that they can hit the target but not the bullseye, and so you might worry that that civilization is more likely to create a hellscape than a eutopia. I personally don't worry about this myself, because it seems to me that the space is so freaking high dimensional and the target so freaking small, that I find it implausible that a civilization successfully able to point an AI in a human-relevant direction, isn't also able to hit the bullseye. Like, if you're already hitting a quarter with an arrowhead on the backside of the moon, I expect you can also hit a dime."
↑ comment by [deleted] · 2022-04-12T01:44:11.015Z · LW(p) · GW(p)
Replies from: RobbBB↑ comment by Rob Bensinger (RobbBB) · 2022-04-12T02:42:01.854Z · LW(p) · GW(p)
it just struck me that I might rather be dead than deal with a semi-malevalent AI.
Yeah, I agree that this can happen; my objection is to the scenario's probability rather than its coherence.
comment by Signer · 2022-04-10T22:20:17.148Z · LW(p) · GW(p)
I think you should mark which assumptions you consider to be trivial.
Replies from: yitz↑ comment by Yitz (yitz) · 2022-04-10T22:48:59.282Z · LW(p) · GW(p)
It’s really only 1, tbh. I can see reasonable people arguing against pretty much every other point, but I don’t think 1 is really questionable anymore (though it was debatable a few decades back). Admittedly other intelligent people don’t agree with me on that, so maybe that’s not trivial either…
comment by ekka · 2022-04-11T00:28:53.791Z · LW(p) · GW(p)
Smart people were once afraid that overpopulation would lead to wide-scale famine. The future is hard to predict, and there are many possible scenarios for how things may play out, even in the scenario that AGI is unaligned. It seems dubious to me to assign a 100% probability to any outcome based on just thought experiments about things that can happen in the future, especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full-on doom frame.
Replies from: donald-hobson, tomcatfish, RobbBB↑ comment by Donald Hobson (donald-hobson) · 2022-04-13T01:09:08.440Z · LW(p) · GW(p)
Smart people were once afraid that overpopulation would lead to wide scale famine.
Yep. Concerned enough to start technical research on nitrogen fertilizer, selective breeding crops, etc. It might be fairer to put this in the "foreseen and prevented" basket, not the "nonsensical prediction of doom" basket.
Replies from: ekka↑ comment by ekka · 2022-04-13T03:34:41.043Z · LW(p) · GW(p)
Great point! Though for what it's worth I didn't mean to be dismissive of the prediction, my main point is that the future has not yet been determined. As you indicate people can react to predictions of the future and end up on a different course.
↑ comment by Alex Vermillion (tomcatfish) · 2022-04-11T19:30:51.847Z · LW(p) · GW(p)
There's absolutely no need to assign "100% probability to any outcome" to be worried. I wear a seatbelt because I am afraid I might one day be in a car crash despite the fact that I've not been in one yet. I understand there is more to your point, but I found that segment pretty objectionable and obviously irrelevant.
Replies from: ekka↑ comment by Rob Bensinger (RobbBB) · 2022-04-11T20:28:10.300Z · LW(p) · GW(p)
Smart people were once afraid that overpopulation would lead to wide scale famine.
Agreed that 'some smart people are really worried about AGI' is a really weak argument for worrying about AGI, on its own. If you're going to base your concern on deference, at the very least you need a more detailed model of what competencies are at work here, and why you don't think it's truth-conducive to defer to smart skeptics on this topic.
The future is hard to predict and there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned.
I agree with this, as stated; though I'm guessing your probability mass is much more spread out than mine, and that you mean to endorse something stronger than what I'd have in mind if I said "the future is hard to predict" or "there are many possible scenarios of how things may play out even in the scenario that AGI is unaligned".
In particular, I think the long-term human-relevant outcomes are highly predictable if we build AGI systems and never align them: AGI systems end up steering the future to extremely low-value states, likely to optimize some simple goal that has no information content from human morality or human psychology. In that particular class of scenarios, I think there are a lot of extremely uncertain and unpredictable details (like 'what specific goal gets optimized' and 'how does the AGI go about taking control'), but we aren't equally uncertain about everything.
It would seem dubious to me for one to assign a 100% probability to any outcome
LessWrongers generally think that you shouldn't give 100% probability to anything [LW · GW]. When you say "100%" here, I assume you're being hyperbolic; but I don't know what sort of real, calibrated probability you think you're arguing against here, so I don't know which of 99.9%, 99%, 95%, 90%, 80%, etc. you'd include in the reasonable range of views.
based on just thought experiments of things that can happen in the future especially when there are so many unknowns. With so much uncertainty it seems a little bit premature to take on a full on doom frame.
What are your own rough probabilities, across the broad outcome categories you consider most likely?
If we were in a world where AGI is very likely to kill everyone, what present observations would you expect to have already made, that you haven't made in real life (thus giving Bayesian evidence that AGI is less likely to kill everyone)?
What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone? Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update? Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?
Replies from: ekka↑ comment by ekka · 2022-04-12T06:31:28.929Z · LW(p) · GW(p)
I'm still forming my views and I don't think I'm well calibrated to state any probability with authority yet. My uncertainty still feels so high that I think my error bars would be too wide for my actual probability estimates to be useful. Some things I'm thinking about:
- Forecasters are not that great at making forecasts more than 5 years out, according to Superforecasting IIRC, and I don't think AGI is going to happen within the next 5 years.
- AGI has not been created yet, and it's possible that AI development gets derailed due to other factors, e.g.:
- Political and economic conditions change such that investment in AI slows down.
- Global conflict escalates, which slows down AI (maybe this speeds it up, but I think there would be other pressing needs when a lot of resources have to be diverted to war)
- Other global catastrophic risks could happen before AGI is developed, i.e. should I be more scared of AGI than, say, nuclear war or GCBRs at this point (not that likely, but could still happen)
- On the path to AGI there could be a catastrophic failure that kills a few people but is contained, and gets people really afraid of AI.
- Maybe some of the work on AI safety ends up helping produce mostly aligned AI. I'm not sure if everyone dies if an AI is 90% aligned.
- Maybe the AGI systems that are built won't exhibit instrumental convergence, e.g. if we get AGI through CAIS, which seems to me like the most likely way we'll get there.
- Maybe, as in physics, once the low-hanging fruit has been plucked it takes a while to make breakthroughs, which extends the timelines.
- For me to be personally afraid, I'd have to think this was the primary way I would die, which seems unlikely given all the other ways I could die between now and if/when AGI is developed.
- AI researchers, who are the people most likely to believe that AGI is possible, don't have consensus on this issue. I know experts can be wrong about their own fields, but I'd expect them to be less split on the issue (I don't know what the current status is; I just know what it was in the Grace et al. survey). I know very little about AGI — should I be more concerned than AI researchers are?
I still think it's important to work on AI Safety, since even a small chance that AGI could go wrong carries a high expected cost. I think most of my thinking comes from the fact that I consider a slow takeoff more probable than a fast takeoff. I may also just be bad at being scared or feeling doomed.
What are some relatively-likely examples of future possible observations that would make you think AGI is very likely to kill everyone?
People start building AI that is agentic and open-ended in its actions.
Would you expect to make observations like that well in advance of AGI (if doom is in fact likely), such that we can expect to have plenty of time to prepare if we ever have to make that future update?
Yes, because I think the most likely scenario is a slow takeoff. This is because it costs money to scale compute, we actually need to validate systems, and the more complex a system is, the harder it is to build correctly; it probably takes a few iterations to get things working well enough to test against a benchmark before moving on to trying to get a system to have more capability. I think this process will have to happen many times before getting to AI that is dangerous, and along the way I'd expect to start seeing some interesting agentic behavior with short-horizon planning.
Or do you think we're pretty screwed, evidentially speaking, and can probably never update much toward 'this is likely to kill us' until it's too late to do anything about it?
I think the uncertainty will be pretty high until we start seeing sophisticated agentic behavior. Though I don't think we should wait that long to try to come up with solutions, since I think even a small chance that this could happen warrants concern.