[Prediction] Humanity will survive the next hundred years
post by lsusr · 2023-02-25T18:59:57.845Z · LW · GW · 44 comments
Definition: Living Homo sapiens will live on the surface of the planet Earth exactly 100 years from the publication of this post. This is similar to the Caplan-Yudkowsky bet except I extend the duration from 13 to 100 years and remove the reference to the cosmic endowment.
Confidence: >80%. I'd like to put it higher but there's a lot of unknowns and I'm being conservative.
I'd be happy to bet money at the Caplan-Yudkowsky bet [LW · GW]'s 2-1 odds on a 13-year horizon. However, I don't think their clever solution actually works. The only way Yudkowsky benefits from his loan is if he spends all of his resources before 2030, but if he does that then Caplan won't get paid back, even if Caplan wins the bet. Moreover, even Yudkowsky doesn't seem to treat this as a serious bet.
While there's no good way to bet that the world will end soon (except indirectly via financial derivatives), there is a way to bet that our basic financial infrastructure will continue to exist for decades to come: I am investing in a Roth IRA.
Reasoning
Biowarfare [LW · GW] won't kill everyone. Nuclear war won't kill everyone. Anthropogenic global warming won't kill everyone. At worst, these will destroy civilization which, counterintuitively, makes Homo sapiens more likely to survive in the short term (the next century). The same goes for minor natural disasters like volcanic eruptions.
Natural disasters like giant meteors, or perhaps a gamma ray burst, are unlikely. The last time something like that happened was 66 million years ago, so the odds of something similar happening in the next century are on the order of 100 / 66,000,000, roughly 10^-6. That's small enough to ignore for the purposes of this bet. The only way everyone dies is via AI.
I see two realistic roads to superintelligent world optimizers.
- Human simulator mesa optimizer running on a non-agentic superintelligence.
- Inhuman world-optimizing agent.
Human simulators are unlikely to exterminate humanity by accident because the agent mesa optimizer is (more or less) human aligned and the underlying superintelligence (currently LLMs) is not a world optimizer. The underlying superintelligence won't kill everyone (intentionally or unintentionally), because it's not a world optimizer. The emulated human probably won't kill everyone by accident because that would require a massive power differential combined with malevolent intent. Another way humanity gets exterminated in this scenario is as a side effect of futuretech war, but since neither Yudkowsky nor Caplan considers this possibility likely, I don't feel I need to explain why it's unlikely.
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode. A world-optimizing agent must align its world model with reality [LW · GW]. Poorly-aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure. A misaligned world optimizer that seeks to minimize conflict between its sensory data and internal world model will just turn off its sensors.
These ideas are not very fleshed out. I'm posting, not to explain my logic, but to publicly register a prediction. If you disagree (or agree), then please register a counter prediction in the comments.
44 comments
Comments sorted by top scores.
comment by gjm · 2023-02-27T13:45:14.666Z · LW(p) · GW(p)
There are several instances in the comments of the following schema: Someone finds lsusr's (self-admittedly not very fleshed out) arguments unsatisfactory and explains why; lsusr responds with just "Would you like to register a counter-prediction?".
I do not think this is a productive mode of discussion, and would like to see less of it. I think the fault is lsusr's.
These responses give me the very strong impression -- and I would bet fairly heavily that I am not alone in this -- that lsusr considers those comments inappropriate, and is trying to convey something along the lines of "put up or shut up; if you disagree with me then you should be able to make a concrete prediction that differs from mine; otherwise, no one should care about your arguments".
I see three possibilities.
First, maybe lsusr considers that (especially given the last paragraph in the OP) any discussion of the arguments at all is inappropriate. In that case, I think either (1) the arguments should simply not be in the OP at all or else (2) there should be an explicit statement along the lines of "I am not willing to discuss my reasoning and will ignore comments attempting to do so".
Second, maybe lsusr is happy to discuss the arguments, but only with people who demonstrate their seriousness by making a concrete prediction that differs from lsusr's. I think this would be a mistake, because it is perfectly possible to find something wrong in an argument without having a definite opinion on the proposition the argument is purporting to support. If someone says "2+2=5, therefore the COVID-19 pandemic was the result of a lab leak", one should not have to have formed a definite opinion on the origin of the pandemic in order to notice that 2+2 isn't 5 or that (aside from ex falso quodlibet) there is no particular connection between basic arithmetic and the origins of the pandemic.
(Also: some of those arguments may be of independent interest. lsusr's claim that poorly-aligned world-optimizing agents converge on "seizing competing pieces of their own mental infrastructure" sounds rather interesting if true -- and, for what it's worth, implausible to me prima facie -- regardless of how long the human race is going to survive.)
Third, maybe the impression I have is wrong, and lsusr doesn't in fact disapprove of, or dislike, comments addressing the arguments without a concrete disagreement with the conclusions. In that case, I think lsusr is giving a misleading impression by not engaging with any such comments in any way other than by posting the same "Would you like to register a counter-prediction?" response to all of them.
Replies from: lsusr, sharmake-farah
↑ comment by lsusr · 2023-02-28T03:07:49.440Z · LW(p) · GW(p)
Your comment is contingent on several binary possibilities about my intentions. I appreciate your attempt to address all leaves of the decision tree. Here I will help limit the work you have to do by pinning things down.
To clarify,
My post serves one purpose: to register a public prediction. I am betting reputation. But it makes no sense to bet reputation on something everyone agrees on. It only makes sense to bet on things people disagree on. I'm hoping people will make counter-predictions because that can help verify, in the future, that the claims I made were disputed at the time.
These responses give me the very strong impression -- and I would bet fairly heavily that I am not alone in this -- that lsusr considers those comments [comments without specific predictions] inappropriate, and is trying to convey something along the lines of "put up or shut up; if you disagree with me then you should be able to make a concrete prediction that differs from mine; otherwise, no one should care about your arguments".
Not exactly. The comments are perfectly appropriate. I don't plan on engaging with them in either direction because the purpose of this post isn't (for me) to debate. It's to register a public prediction.
So why ask if people want to make a counter-prediction? Because arguing against me without making a concrete prediction after I just made a public prediction puts comments in a frustratingly ambiguous state where it's not obvious whether they constitute a counter-prediction. I want to avoid ambiguity in future evaluation of these threads.
To put it another way, someone could claim "I knew lsusr was wrong--see this comment I made" if I turn out to be wrong. That same person could also claim "I don't lose reputation because I didn't make an explicit counter-prediction". I want to avoid this potential for strategic ambiguity.
First, maybe lsusr considers that (especially given the last paragraph in the OP) any discussion of the arguments at all is inappropriate. In that case, I think either (1) the arguments should simply not be in the OP at all or else (2) there should be an explicit statement along the lines of "I am not willing to discuss my reasoning and will ignore comments attempting to do so".
They're not arguments intended to convince anyone else of anything. They're personal reasons for my conclusion. I think it's better to have them than not to have them, because my post is part of a collaborative effort to find the truth, and more transparency is better toward achieving this end.
As for an explicit statement, here's something I could try the next time I make a similar post:
This post primarily serves as a public prediction. Please begin all comments with either a counter-prediction or [no prediction]. You are welcome to debate the logic of my reasoning, but do not expect me to engage with you. Right now I am putting skin in the game, not grandstanding.
Second, maybe lsusr is happy to discuss the arguments, but only with people who demonstrate their seriousness by making a concrete prediction that differs from lsusr's.
Nope, but I appreciate you considering the possibility. I am happy to consider the arguments elsewhere [LW · GW], but the arguments presented here are too thin to defend. For each tiny point I'd have to write a whole post, and if/when I do that I'd rather make an actual top-level post.
Third, maybe the impression I have is wrong, and lsusr doesn't in fact disapprove of, or dislike, comments addressing the arguments without a concrete disagreement with the conclusions.
I "[don't] in fact disapprove of, or dislike, comments addressing the arguments without a concrete disagreement with the conclusions." I enjoy them, actually. I just want to clarify whether the comments are counter-predictions or not.
I appreciate the feedback. I will consider it in the future so as not to give a misleading impression.
Replies from: gjm
↑ comment by gjm · 2023-02-28T12:21:19.055Z · LW(p) · GW(p)
Thanks for the clarification.
I'm not sure I quite agree with you about strategic ambiguity, though. Again, imagine that you'd said "I am 80% confident that the human race will still be here in 100 years, because 2+2=5". If someone says "I don't know anything about existential risk, but I know that 2+2 isn't 5 and that aside from ex falso quodlibet basic arithmetic like this obviously can't tell us anything about it", then I am perfectly happy for them to claim that they knew you were wrong even though they didn't stand to lose anything if your overall prediction turns out right.
(My own position, not that anyone should care: my gut agrees with lsusr's overall position "but I try not to think with my gut"; I don't think I understand all the possible ways AI progress could go well enough for any prediction I'd make by explicitly reasoning it out to be worth much; accordingly I decline to make a concrete prediction; I mostly agree that making such predictions is a virtuous activity because it disincentivizes overconfident-looking bullshitting, but I think admitting one's ignorance is about equally virtuous; the arguments mentioned in the OP seem to me unlikely to be correct but I could well be missing important insights that would make them more plausible. And I do agree that the comments lsusr replied to in the way I'm gently objecting to would have been improved by adding "and therefore I think our chance of survival is below 20%" or "but I do agree that we will probably still be here in 100 years" or "and I have no idea about the actual prediction lsusr is making" or whatever.)
Replies from: lsusr
↑ comment by Noosphere89 (sharmake-farah) · 2023-02-27T17:08:28.122Z · LW(p) · GW(p)
While I agree with you that the approach lsusr's taking isn't great, and I disagree with the focus on prediction right now, at the same time I do sympathize with lsusr's approach, and I think this is related to a problem LW has on AI safety epistemics, which is related to general problems of epistemics on LW.
1a3orn nailed it perfectly: LW has theories that are not predictive enough, in that you could justify too many outcomes of AI safety with the theories we have. From my meta/outside viewpoint, a lot of LW is neither using empirical evidence like science nor trying to formalize things as mathematics has done, but rather philosophizing, which is often terrible at getting anywhere close to the epistemic truth of the matter. The response is usually some variation of "LW is special at something", but why is this assumed rather than deferring to the outside view?
Don't get me wrong, I think LW is right that the science establishment we have is deeply inadequate, and is stuck in an inadequate equilibrium, but one thing we have learned at great cost is that we need to be empirical and touch reality, rather than appeal to cultural traditions or out-of-touch grand theories.
What lsusr is trying to do is to get people to have predictions on AI safety that are falsifiable and testable, to prevent the problem of being out of touch with reality.
A good model here is Ajeya Cotra and the technical interpretability community. While Stephen Casper criticized them, they're probably the only group that is actually trying to touch reality. The fact that most LWers are hostile to empirical evidence is probably a product of assumed specialness, without any reason for that specialness.
Replies from: gjm
comment by lc · 2023-02-25T19:45:48.002Z · LW(p) · GW(p)
Human simulators are unlikely to exterminate humanity by accident because the agent mesa optimizer is (more or less) human aligned and the underlying superintelligence (currently LLMs) is not a world optimizer.
If a superintelligent LLM is not a mesa-optimizer itself, it can be turned into an optimizer via a one-line bash script asking it to produce the shell commands that maximize some goal. So this isn't much help unless you can use that superintelligent LLM to patch the holes in humanity that would allow someone to squiggle the planet.
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode. A world-optimizing agent must align its world model with reality. Poorly-aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure. A misaligned world optimizer that seeks to minimize conflict between its sensory data and internal world model will just turn off its sensors.
- This appropriates the word alignment in a way that is probably unhelpful to your thesis, whatever you intend to mean.
- This is not what humans do, so it is clearly possible, conceptually speaking, for inhuman world-agents to target the outside world instead of maximizing an internal worldscore variable. And even maximizing an internal worldscore variable can be unsafe, if the robot decides it wants to use all available matter to add "1s" to the number.
↑ comment by Razied · 2023-02-25T19:55:39.925Z · LW(p) · GW(p)
If a superintelligent LLM is not a mesa-optimizer itself, it can be turned into an optimizer via a one-line bash script asking it to produce the shell commands that maximize some goal.
Why would a LLM trained on internet text ever do something like this? The most likely continuation of a prompt asking it to produce shell commands to take over the world is very unlikely to actually contain such commands, because that's not the sort of thing that exists in the training data. The LLM might contain latent superintelligent capabilities, but it's still being aimed at predicting the continuations that were likely in its training set.
Replies from: abramdemski, lc
↑ comment by abramdemski · 2023-03-06T16:44:58.224Z · LW(p) · GW(p)
Here's my answer.
People fine-tune the superintelligent LLM to do something other than pure prediction, like with ChatGPT. Because it's "superintelligent", it has the capabilities buried in there (which is to say, more specifically, it can generate superhumanly-intelligent outputs if conditioned on superhumanly intelligent inputs -- I'm not trying to argue this as what will happen, it's just my interpretation of the assumption of "superintelligent LLM"). So perhaps fine-tuning on a dataset of true answers to hard questions brings this out. Or perhaps using RLHF or something else.
I agree that this isn't a "one-line bash script". My interpretation of lc is that "LLM" doesn't necessarily mean pure prediction (since existing LLMs aren't only trained on pure prediction, either); and in particular "superintelligent LLM" suggests that someone found a way to get superhumanly-useful outputs from an LLM (which people surely try to do).
↑ comment by lc · 2023-02-25T19:57:00.638Z · LW(p) · GW(p)
I'm not saying it would do something like this. I'm saying that as soon as you release it someone out there will say "OK LLM, maximize stock price of my company".
Replies from: Razied
↑ comment by Razied · 2023-02-25T20:01:51.171Z · LW(p) · GW(p)
Certainly, someone will ask it to produce the text that maximizes the stock price of their company; the superLLM will then pass that prompt through its model and output the most likely continuation of that request, which is not at all text that actually maximizes the stock price. Because out of all instances of text containing "Please maximize my stock price" on the internet, there are no examples of superintelligent outputs to that request. It's more likely to treat that request as part of a story prompt, or output something like "I don't know how to do that", even if it did internally know how to do that.
Replies from: abramdemski, lc
↑ comment by abramdemski · 2023-03-06T16:51:43.506Z · LW(p) · GW(p)
I want to note that if we assume it's merely a superintelligent predictor, trained on all available data in the world, but only able to complete patterns super-well, it's still extremely useful for predicting stock prices. This is in itself an incredibly profitable ability, and can also be leveraged to "output text that maximizes stock price" without too much difficulty:
- Have the system output some text periodically.
- Interleave the company stock prices between text blocks.
- Generate a large number of samples for each new prediction, and keep the text blobs for which further completions predict high stock prices down the line. (This can be done automatically - no human review, just look at the predicted price.)
Not saying this is a great technique in real life, just saying that if we assume "really great predictor" and go from there, this will eventually start working well, as the system notices the influence of its text blobs on the subsequent stock prices.
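Read purely as an illustration, the selection loop described above might look like the sketch below. `predict_continuation` and `parse_predicted_price` are hypothetical stand-ins for the assumed super-predictor and a price parser; nothing here is a real API, just the shape of "sample, interleave prices, keep the blob with the best predicted price".

```python
import re

def parse_predicted_price(continuation: str) -> float:
    """Pull a stock price out of the predicted continuation.
    Purely illustrative: assumes the price appears as the last number."""
    numbers = re.findall(r"\d+(?:\.\d+)?", continuation)
    return float(numbers[-1]) if numbers else float("-inf")

def best_text_blob(history: str, predict_continuation, n_samples: int = 100) -> str:
    """Sample candidate text blobs, then keep the one whose predicted
    follow-on stock price is highest. No human review is involved; the
    filter just reads the predicted price."""
    best_blob, best_price = "", float("-inf")
    for _ in range(n_samples):
        blob = predict_continuation(history)                     # step 1: output some text
        extended = history + blob + "\n[NEXT-DAY STOCK PRICE]: "  # step 2: interleave a price slot
        predicted_tail = predict_continuation(extended)           # step 3: predict what fills the slot
        price = parse_predicted_price(predicted_tail)
        if price > best_price:
            best_blob, best_price = blob, price
    return best_blob
```

The point of the sketch is only that a pure predictor, called in a loop like this, starts behaving like an optimizer for the quantity used in the filter.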
comment by Donald Hobson (donald-hobson) · 2023-02-26T17:32:19.704Z · LW(p) · GW(p)
Inhuman world-optimizing agents are unlikely to turn the Universe into paperclips because that's not the most likely failure mode.
Suppose 100 such agents are made. The first 99 brick themselves: they wirehead and then do nothing. The 100th destroys the world. You need to claim that this failure mode is sufficiently unlikely that all AI experiments on Earth fail to hit it.
A world-optimizing agent must align its world model with reality [LW · GW]. Poorly-aligned world-optimizing agents instrumentally converge, not on seizing control of reality, but on the much easier task of seizing competing pieces of their own mental infrastructure.
What's more, this argument needs to hold even when there are humans seeing their AI seizing its own mental infrastructure and trying to design an AI that doesn't do this. Do you believe the only options are agents that harmlessly wirehead, and aligned AI?
A misaligned world optimizer that seeks to minimize conflict between its sensory data and internal world model will just turn off its sensors.
Maybe, but will it write a second AI, an AI that takes over the world to ensure the first AI receives power and doesn't have its sensors switched off? If you really care about your own mental infrastructure, the easiest way to control it might be to code a second AI that takes nanotech to your chip.
Or maybe, once the conflict is done, and half the AI has strangled the other half, the remaining mind is coherent enough to take over the world.
Replies from: lsusr
comment by abramdemski · 2023-02-26T19:11:10.667Z · LW(p) · GW(p)
This is just to state that I'm very far from buying any of the arguments as-written. They seem like dumb arguments. But then, you say you're not trying to explain your arguments here.
Replies from: lsusr
↑ comment by lsusr · 2023-02-26T19:38:21.810Z · LW(p) · GW(p)
Would you like to register a counter prediction?
Replies from: abramdemski
↑ comment by abramdemski · 2023-02-28T17:02:10.045Z · LW(p) · GW(p)
Sure. It seems to me like humans are in a bad spot, with a significant chance of not surviving the next hundred years, depending mainly on whether very weak alignment methods are enough to land us in anything like a corrigibility basin.
comment by Vladimir_Nesov · 2023-02-25T20:23:40.530Z · LW(p) · GW(p)
Human simulator mesa optimizer running on a non-agentic superintelligence
Even if LLM AGIs are directly aligned [LW(p) · GW(p)] (don't discard humanity themselves), they are not necessarily transitively aligned [LW(p) · GW(p)] (don't build something that discards both them and humanity). If they fail at not building inhuman world-optimizing agents as soon as they are able, there is going to be only a brief time when LLM AGIs are in charge, much shorter than 100 years.
Replies from: lsusr
↑ comment by lsusr · 2023-02-25T22:01:19.532Z · LW(p) · GW(p)
Philosophical zombie takeover seems like a real possibility to me.
Replies from: Vladimir_Nesov
↑ comment by Vladimir_Nesov · 2023-02-25T22:19:59.414Z · LW(p) · GW(p)
Huh? Anyway, I don't believe in p-zombies, and for example think that human emotions expressed by sufficiently coherent LLM characters are as real as human emotions, because emotions exist as thingy [LW(p) · GW(p)] simulacra [LW(p) · GW(p)] even when there are no concrete physical or cognitive-architectural correlates.
Replies from: lsusr, JacobW
↑ comment by JacobW38 (JacobW) · 2023-02-26T09:59:27.832Z · LW(p) · GW(p)
Explain to me how a sufficiently powerful AI would fail to qualify as a p-zombie. The definition I understand for that term is "something that is externally indistinguishable from an entity that has experience, but internally has no experience". While it is impossible to tell the difference empirically, we can know by following evolutionary lines: all future AIs are conceptually descended from computer systems that we know don't have experience, whereas even the earliest things we ultimately evolved from almost certainly did have experience (I have no clue at what other point one would suppose it entered the picture). So either it should fit the definition or I don't have the same definition as you.
Your statement about emotions, though, makes perfect sense from an outside view. For all practical purposes, we will have to navigate those emotions when dealing with those models exactly as we would with a person. So we might as well consider them equally legitimate; actually, it'd probably be a very poor idea not to, given the power these things will wield in the future. I wouldn't want to be basilisked because I hurt Sydney's feelings.
Replies from: gbear605, Dentin
↑ comment by gbear605 · 2023-02-26T10:23:35.051Z · LW(p) · GW(p)
If you look far enough back in time, humans are descended from animals akin to sponges that seem to me like they couldn't possibly have experience. They don't even have neurons. If you go back even further we're the descendants of single celled organisms that absolutely don't have experience. But at some point along the line, animals developed the ability to have experience. If you believe in a higher being, then maybe it introduced it, or maybe some other metaphysical cause, but otherwise it seems like qualia has to arise spontaneously from the evolution of something that doesn't have experience - with possibly some "half conscious" steps along the way.
From that point of view, I don’t see any problem with supposing that a future AI could have experience, even if current ones don’t. I think it’s reasonable to even suppose that current ones do, though their lack of persistent memory means that it’s very alien to our own, probably more like one of those “half conscious” steps.
Replies from: JacobW
↑ comment by JacobW38 (JacobW) · 2023-02-26T11:09:32.617Z · LW(p) · GW(p)
If you go back even further we’re the descendants of single celled organisms that absolutely don’t have experience.
My disagreement is here. Anyone with a microscope can still look at them today. The ones that can move clearly demonstrate acting on intention in a recognizable way. They have survival instincts just like an insect or a mouse or a bird. It'd be completely illogical not to generalize downward that the ones that don't move also exercise intention in other ways to survive. I see zero reason to dispute the assumption that experience co-originated with biology.
I find the notion of "half consciousness" irredeemably incoherent. Different levels of capacity, of course, but experience itself is a binary bit that has to either be 1 or 0.
Replies from: gbear605
↑ comment by gbear605 · 2023-02-26T11:57:38.717Z · LW(p) · GW(p)
If bacteria have experience, then I see no reason to say that a computer program doesn’t have experience. If you want to say that a bacteria has experience based on guesses from its actions, then why not say that a computer program has experience based on its words?
From a different angle, suppose that we have a computer program that can perfectly simulate a bacteria. Does that bacteria have experience? I don’t see any reason why not, since it will demonstrate all the same ability to act on intention. And if so, then why couldn’t a different computer program also be conscious? (If you want to say that a computer can’t possibly perfectly simulate a bacteria, then great, we have a testable crux, albeit one that can’t be tested right now.)
↑ comment by Dentin · 2023-02-26T12:46:10.154Z · LW(p) · GW(p)
AIUI, you've got the definition of a p-zombie wrong in a way that's probably misleading you. Let me restate the above:
"something that is externally indistinguishable from an entity that experiences things, but internally does not actually experience things"
The whole p-zombie thing hinges on what it means to "experience something", not whether or not something "has experience".
Replies from: gbear605
↑ comment by gbear605 · 2023-02-26T13:03:46.073Z · LW(p) · GW(p)
I understand that definition, which is why I'm confused about why you brought up the behavior of bacteria as evidence that bacteria have experience. I don't think any non-animals have experience, and I think many animals (like sponges) also don't. As I see it, bacteria are more akin to natural chemical reactions than they are to humans.
I brought up the simulation of a bacteria because an atom-for-atom simulation of a bacteria is completely identical to a bacteria - the thing that has experience is represented in the atoms of the bacteria, so a perfect simulation of a bacteria must also internally experience things.
comment by Razied · 2023-02-25T19:40:06.845Z · LW(p) · GW(p)
Human simulators are unlikely to exterminate humanity by accident because the agent mesa optimizer is (more or less) human aligned and the underlying superintelligence (currently LLMs) is not a world optimizer.
GPT-N is not a Human simulator, but more like a "text-existing-on-the-internet simulator". If you give it a prompt conditioned on metadata with a future date, it will need to internally predict the future of humanity, and if it predicts that humanity does not solve alignment, then some significant fraction of the text on the internet might be written by malign AIs, which means that GPT-N will try to internally simulate a malign AI. I think there are ways of conditioning LLMs to mitigate this sort of problem, but it doesn't just fall out naturally out of them being "human simulators".
comment by Teerth Aloke · 2023-02-26T06:03:34.851Z · LW(p) · GW(p)
Biowarfare [LW · GW] won't kill everyone. Nuclear war won't kill everyone. Anthropogenic global warming won't kill everyone. At worst, these will destroy civilization which, counterintuitively, makes Homo sapiens more likely to survive in the short term (the next century). The same goes for minor natural disasters like volcanic eruptions.
Natural disasters like giant meteors, or perhaps a gamma ray burst, are unlikely. The last time something like that happened was 66 million years ago, so the odds of something similar happening in the next century are on the order of 100 / 66,000,000, roughly 10^-6. That's small enough to ignore for the purposes of this bet. The only way everyone dies is via AI.
Strongly agreed with this. The only non-infinitesimal probability of human extinction comes from an alien intelligence (like AI) actively pursuing the goal of human extinction.
Replies from: donald-hobson
↑ comment by Donald Hobson (donald-hobson) · 2023-02-26T17:40:46.298Z · LW(p) · GW(p)
There's some not-totally-tiny chance of humans pursuing human extinction using, say, advanced nanotech.
Replies from: Teerth Aloke
↑ comment by Teerth Aloke · 2023-02-27T11:07:52.204Z · LW(p) · GW(p)
Admitted.
comment by [deleted] · 2023-02-25T20:29:51.367Z · LW(p) · GW(p)
Here's a third form of AGI:
https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/?commentId=Mvyq996KxiE4LR6ii [LW · GW]
This is a search for the most powerful and general algorithm (what I call a cognitive architecture), where you do not put any optimization pressure on the algorithm having "world optimizing" ability.
When you use the agent, you use it in "sessions", like LLMs; those sessions are finite in time, and you clear state variables afterwards. No online training; training must be all offline.
For ongoing tasks, the agent must fill out a data structure that is human-readable, such that another agent can seamlessly pick up where the last session ended.
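For concreteness, here is a minimal sketch of the session discipline described above. Every name (`HandoffState`, `run_session`, `agent_step`) is invented for illustration; this is not an existing framework, just one way the "finite session plus human-readable handoff" idea could be structured.

```python
from dataclasses import dataclass, field

@dataclass
class HandoffState:
    """Human-readable record that a finished session leaves for the next one."""
    task: str
    progress_notes: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

def run_session(agent_step, handoff: HandoffState, max_steps: int = 50) -> HandoffState:
    """Run one finite session. The agent's working memory exists only inside
    this function; when it returns, the only thing that persists is the
    human-readable handoff, so no hidden state accumulates across sessions."""
    working_memory: list[str] = []                # cleared when the session ends
    for _ in range(max_steps):
        action, note, done = agent_step(handoff, working_memory)
        working_memory.append(action)
        if note:
            handoff.progress_notes.append(note)   # only auditable notes survive
        if done:
            break
    return handoff                                # working_memory is discarded here
```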
The reasons why this form of AGI is likely to beat the two forms you mention are:
(1) It's much cheaper and faster to reach AGI and then ASI.
(2) It will outperform the "human emulator" in ability, and will be measurably safer on tasks than the "world optimizer" due to no state buildup. The world optimizer cannot be tested for safety because the machine is constantly accumulating state. You need to be able to test models when they are always in a known state, and benchmark their reliability.
Many of the ideas were drawn from:
https://www.lesswrong.com/posts/HByDKLLdaWEcA2QQD/applying-superintelligence-without-collusion [LW · GW]
https://www.lesswrong.com/posts/5hApNw5f7uG8RXxGS/the-open-agency-model
comment by DaemonicSigil · 2023-02-25T20:08:10.623Z · LW(p) · GW(p)
Counter-predictions:
- Humanity will still be around in 2030 (P = 90%)
- ... in 2040 (P = 70%)
- A Strong AGI will be built by 2123 (P = 75%)
- Conditional on this, no attempts at solving the alignment problem succeed (P = no idea / depends on our decisions)
- Conditional on this, humanity survives anyway, because the AI is aligned by default or for some other reason we survive without solving alignment. (P = 10%)
↑ comment by lsusr · 2023-02-25T20:11:21.797Z · LW(p) · GW(p)
What are your definitions for "Strong AGI", "the alignment problem succeed[s]" and "humanity survives"?
Replies from: DaemonicSigil
↑ comment by DaemonicSigil · 2023-02-25T20:50:08.631Z · LW(p) · GW(p)
Strong AGI: Artificial intelligence strong enough to build nanotech, while being at least as general as humans (probably more general). This definition doesn't imply anything about the goals or values of such an AI, but being at least as general as humans does imply that it is an agent that can select actions, and also implies that it is at least as data-efficient as humans.
Humanity survives: At least one person who was alive before the AI was built is still alive 50 years later. Includes both humanity remaining biological and uploading, doesn't include everyone dying.
Alignment problem: The problem of picking out an AI design that won't kill everyone from the space of possible designs for Strong AGI.
Aligned by default: Maybe most of the space of possible designs for Strong AGI does in fact consist of AIs that won't kill everyone. If so, then "pick a design at random" is a sufficient strategy for solving the alignment problem.
An attempt at solving the alignment problem: Some group of people who believe that the alignment problem is hard (i.e. picking a design at random has a very low chance of working) try to solve it. The group doesn't have to be rationalists, or to call it the "alignment problem" though.
A successful attempt at solving the alignment problem: One of the groups in the above definition does in fact solve the alignment problem, i.e. they find a design for Strong AGI that won't kill everyone. Important note: If Strong AGI is aligned by default, then no attempts are considered successful, since the problem wasn't really a problem in the first place.
Replies from: lsusr
↑ comment by lsusr · 2023-02-25T22:00:08.265Z · LW(p) · GW(p)
Thanks. These seem like good definitions. They actually set the bar high for your prediction, which is respectable. I appreciate you taking this seriously.
If you'll permit just a little bit more pedantic nitpicking, do you mind if I request a precise definition of nanotech? I assume you mean self-replicating nanobots (grey goo) because, technically, we already have nanotech. However, putting the bar at grey goo (potential, of course—the system doesn't have to actually make it for real) might be setting it above what you intended.
Replies from: DaemonicSigil
↑ comment by DaemonicSigil · 2023-02-26T10:00:47.438Z · LW(p) · GW(p)
Self replicating nanotech is what I'm referring to, yes. Doesn't have to be a bacteria-like grey goo sea of nanobots, though. I'd generally expect nanotech to look more like a bunch of nanofactories, computers, energy collectors, and nanomachines to do various other jobs, and some of the nanofactories have the job of producing other nanofactories so that the whole system replicates itself. There wouldn't be the constraint that there is with bacteria where each cell is in competition with all the others.
comment by simon · 2023-02-25T19:09:42.322Z · LW(p) · GW(p)
The only way Yudkowsky benefits from his loan is if he spends all of his resources before 2030, but if he does that then Caplan won't get paid back, even if Caplan wins the bet.
I don't think that's true; he has sufficient earning potential to come up with the money in a short time frame near when it's due.
Replies from: lsusr
↑ comment by lsusr · 2023-02-25T19:10:59.503Z · LW(p) · GW(p)
I think the right place for this discussion is on my other post The Caplan-Yudkowsky End-of-the-World Bet Scheme Doesn't Actually Work [LW · GW].