Comments
He specifically told me, when I asked this question, that his views were the same as those of Geoff Hinton and Scott Aaronson, and neither of them holds the view that smarter-than-human AI poses zero threat to humanity.
I enjoyed this and thank you for writing it. Ultimately, the only real reason to do this is for your own enjoyment, or perhaps that of friends (and random people on the internet).
Non-signatories to the NPT (Israel, India, Pakistan) were able to and did develop nuclear weapons without being subject to military action. By contrast (and very much contrary to international law), Yudkowsky proposes that non-signatories to his treaty be subject to bombardment.
It is not a well-thought-out exception. If this proposal were meant to be taken seriously, the exception would make enforcement exponentially harder and set up an overhang situation in which AI capabilities would continue to increase within a limited domain while becoming less likely to be interpretable.
The use of violence in response to violations of the NPT has been fairly limited and is highly questionable under international law. And, in fact, calls for such violence are very much frowned upon because of the fear that they tend to lead to full-scale war.
No one has ever seriously suggested violence as a response to potential violation of the various other nuclear arms control treaties.
No one has ever seriously suggested running the risk of a nuclear exchange to prevent a potential treaty violation. So what Yudkowsky is suggesting is very different from how treaty violations are usually handled.
Given Yudkowsky's view that the continued development of AI has an essentially 100% probability of killing all human beings, his position makes total sense, but he is explicitly advocating for violence up to and including acts of war. (His objections to individual violence mostly appear to relate to such violence being ineffective.)
I would think you could force the AI not to notice that the world was round by essentially inputting this as an overriding truth. And if that was actually and exactly what you cared about, you would be fine. But if what you cared about was any corollary or consequence of the world being round (or of the world being some sort of curved polygon), that wouldn't save you.
To take the Paul Tibbets analogy: you told him not to murder and he didn't murder; but what you wanted was for him not to kill, and in most systems, including the one he grew up in, killing the enemy in war is not murder.
This may say more about the limits of the analogy than anything else, but in essence: you might be able to tell the AI it can't deceive you, yet it will be bound exactly by the definition of deception you provide and will freely deceive you in any way you didn't think of.
-
Other planets have more mass, higher insolation, lower gravity, lower temperature, and/or rings and more (mass in) moons. I can think of reasons why any of those might be more or less desirable than the characteristics of Earth. It is also possible that the AI may determine it is better off not being on a planet at all. In addition, in a non-foom scenario, the AI may wind up leaving Earth for defensive or conflict-avoidance reasons, and once it does so may choose not to return.
-
That depends a lot on how it views the probe. In particular, by doing this is it setting up a more dangerous competitor than humanity or not? Does it regard the probe as self? Has it solved the alignment problem, and how good does it think its solution is?
-
No. Humans aren't going to be the best solution. The question is whether they will be good enough that it would be a better use of resources to continue using the humans and focus on other issues.
-
It's definitely possible that it will discover extra reasons to process Earth (or destroy the humans even if it doesn't process Earth).
This is just wrong. Avoiding processing Earth doesn't require that the AI cares for us. Other possibilities include:
(1) Earth is not worth it; the AI determines that getting off Earth fast is better;
(2) AI determines that it is unsure that it can process Earth without unacceptable risk to itself;
(3) AI determines that humans are actually useful to it one way or another;
(4) Other possibilities that a super-intelligent AI can think of, that we can't.
There are, of necessity, a fair number of assumptions in the arguments he makes. Similarly, counter-arguments to his views also make a fair number of assumptions. Given that we are talking about something that has never happened and which could happen in a number of different ways, this is inevitable.
What makes monkeys intelligent in your view?
This is an interesting question on which I've gone back and forth. I think ultimately, the inability to recognize blatant inconsistencies or to reason at all means that LLMs so far are not intelligent. (Or at least not more intelligent than a parrot.)
Bing Chat is not intelligent. It doesn't really have a character. (And whether one calls it GPT-4 or not doesn't seem very meaningful, given the number of post-GPT-4 changes.)
But to the extent that people think one or more of the above things are true, it will tend to increase skepticism of AI and support for taking more care in deploying it and for regulating it, all of which seem positive.
"An exception is made for jobs that fail to reach their employment due to some clearly identifiable non-software-related shock or change in trends, such as an economic crisis or a war. Such jobs will be removed from the list before computing the fraction."
But macroeconomic or geopolitical events such as a major recession or a war are likely to affect all job categories. So the correct way to deal with this is not to remove such jobs but to adjust the fraction by the change in overall employment, as sketched below.
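A minimal sketch of one way that adjustment could work; the function name, the specific correction formula, and the figures are my own illustration and are not part of the original proposal:

```python
# Hypothetical illustration: instead of dropping job categories hit by a macro
# shock, scale the raw automation fraction by the economy-wide employment change.

def adjusted_automation_fraction(
    remaining_listed_jobs: float,      # jobs on the list still done by humans
    baseline_listed_jobs: float,       # jobs on the list at the baseline date
    current_total_employment: float,   # economy-wide employment now
    baseline_total_employment: float,  # economy-wide employment at the baseline
) -> float:
    """Fraction of listed jobs lost to software, net of the overall employment change."""
    raw_remaining = remaining_listed_jobs / baseline_listed_jobs
    overall_change = current_total_employment / baseline_total_employment
    # Credit the listed jobs for the part of their decline that matches the
    # economy-wide shock; whatever decline is left is attributed to software.
    return 1 - raw_remaining / overall_change


# Example (made-up numbers): 70 of 100 listed jobs remain, while overall
# employment fell by 5%. The raw fraction would be 30%; adjusted, roughly 26%.
print(round(adjusted_automation_fraction(70, 100, 152e6, 160e6), 3))
```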
There already exist communication mechanisms more nuanced than signing a petition. You can call or write/email your legislator with more nuanced views. The barrier is not the effort to communicate (which under this proposal might be slightly lower) but the effort to evaluate the issue and come up with a nuanced position.
If the risk from AGI is significant (and whether you think p(doom) is 1% or 10% or 100%, it is unequivocally significant) and imminent (and whether your timelines are 3 years or 30 years, it is pretty imminent), the problem is that an institution as small as MIRI is a significant part of the efforts to mitigate this risk, not whether or not MIRI gave up.
(I recognize that some of the interest in MIRI is the result of having a relatively small community of people focused on the AGI x-risk problem and the early prominence in that community of a couple of individuals, but that really is just a restatement of the problem).
I appreciate that you have defined what you mean when you say AGI. One problem with a lot of timeline work, especially now, is that AGI is not always defined.
That isn't very comforting. To extend the analogy: there was a period, when humans were relatively less powerful, during which they would trade with some other animals such as wolves/dogs. Later, when humans became more powerful, that stopped.
It is likely that the powers of AGI will increase relatively quickly, so even if you conclude there is a period when AGI will trade with humans that doesn't help us that much.
I was thinking along similar lines. I note that someone with amnesia probably remains generally intelligent, so I am not sure continuous learning is really required.
I was aware of a couple of these, but most are new to me. Obviously, published papers (even if this is comprehensive) represent only a fraction of what is happening and, likely, are somewhat behind the curve.
And it's still fairly surprising how much of this there is.
The problem is twofold. One, as and to the extent that AI proliferates, you will eventually find someone who is less capable of, and less careful about, sandboxing. Two, relatedly and more importantly, for much the same reason that people will not be satisfied with AIs without agency, they will weaken the sandboxing.
The STEM AI proposal referred to above can be used to illustrate this. If you want the AGI to do theoretical math, you don't need to tell it anything about the world. If you want it to cure cancer, you need to give it a lot of information about physics, chemistry, and mammalian biology. And if you want it to win the war or the election, then you have to tell it about human society and how it works. And, as it competes with others, whoever has more real-time and complete data is likely to win.
I suspect that part of what is going on is that many in the AI safety community are inexperienced with and uncomfortable with politics and have highly negative views about government capabilities.
Another potential (and related) issue is that people in the AI safety community think that their comparative advantage doesn't lie in political action (which is likely true) and therefore believe they are better off pursuing their comparative advantage (which is likely false).
That is too strong a statement. I think that it is evidence that general intelligence may be easier to achieve than commonly thought. But, past evidence has already shown that over the last couple of years and I am not sure that this is significant additional evidence in that regard.
It's not my fire alarm (in part because I don't think that's a good metaphor). But it has caused me to think about updating timelines.
My initial reaction was to update timelines, but this achievement seems less impressive than I thought at first. It doesn't seem to represent an advance in capabilities; instead it is (another) surprising result of existing capabilities.
My understanding is that starting in late 2020 with the release of Stockfish 12, Stockfish would probably be considered AI, but before that it would not be. I am, of course, willing to change this view based on additional information.
The original AlphaZero-Stockfish match was in 2017, so if the above is correct, I think referring to Stockfish as non-AI makes sense.
"AI agents may not be radically superior to combinations of humans and non-agentic machines"
I'm not sure that the evidence supports this unless the non-agentic machines are also AI.
In particular: (i) humans are likely to subtract from this mix and (ii) AI is likely to be better than non-AI.
In the case of chess, after two decades of non-AI programming advances from the time that computers beat the best human, involving humans no longer provides an advantage over just using the computer programs. And AlphaZero fairly decisively beat Stockfish (one of the best non-AI chess programs).
If the requirement for this to be true is that the non-agentic machine needs to be non-agentic AI, I am unsure that this is a separate argument from the one about AI being non-agentic. Rather, it is a necessary condition for that point.
My impression is that there has been a variety of suggestions about the necessary level of alignment. It is only recently that "don't kill most of humanity" has been suggested as a goal, and I am not sure that the suggestion was meant to be taken seriously. (Because if you can do that, you can probably do much better; the point of that comment, as I understand it, was that we aren't even close to being able to achieve even that goal.)
As an empirical fact, humans are not perfect human face recognizers. It is something humans are very good at, but not perfect. We are definitely much better recognizers of human faces than of worlds high in human values. (I think it is perhaps more relevant to say that consensus on what constitutes a human face is much, much higher than on what constitutes a world high in human values.)
I am unsure whether this distinction is relevant for the substance of the argument however.
I don't have the same order, but tend to agree that option 0 is the most likely one.
This was well written and persuasive. It doesn't change my views against AGI on very short timelines (pre-2030), but it does suggest that I should be updating likelihoods thereafter and shortening timelines.
Humans obtain value from other humans and depend on them for their existence. It is hypothesized that AGIs will not depend on humans for their existence. Thus, humans who would not push the button to kill all other humans may choose not to do so for reasons of utility that don't apply to AGI. Your hypothetical assumes this difference away, but our observations of humans don't.
As you note, human morality and values were shaped by evolutionary and cultural pressure in favor of cooperation with other humans. The way this presumably worked is that humans who were less able or willing to cooperate tended to die more frequently, and cultures that were less able or willing to do so were conquered and destroyed. It is unclear how we would be able to replicate this or how well it translates.
It is unclear how many humans would actually choose to press this button. Your guess is that between 5% and 50% of humans would choose to do so.
That doesn't suggest humans are very aligned; rather the opposite. It means that if we have between 2 and 20 AGIs (and those numbers don't seem unreasonable), between 1 and 10 would choose to destroy humanity (see the sketch below). Of course, extinction is the extreme version; having an AGI could also result in other negative consequences.
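A minimal sketch of the arithmetic behind that range, using only the figures above; the grid itself is just an illustration:

```python
# Apply the guessed 5%-50% "button-presser" rate to a hypothetical population
# of 2-20 AGIs and see how many would be expected to destroy humanity.
for n_agis in (2, 20):
    for rate in (0.05, 0.50):
        expected = n_agis * rate
        print(f"{n_agis} AGIs x {rate:.0%} -> {expected:g} expected to press the button")

# At the 50% rate this gives 1 (of 2) to 10 (of 20), the range quoted above.
```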
Those are fair concerns, but my impression in general is that those kinds of attitudes tend to moderate in practice as Balsa becomes larger, develops, and focuses on particular issues. To the extent they don't and are harmful, Balsa is likely to be ineffective, but it is unlikely to be significant enough to cause negative outcomes.
I understand why you get the impression you do. The issues mentioned are all over the map. Zoning is not even a Federal government issue. Some of those issues are already the subject of significant reform efforts. In other cases, such as "fixing student loans" it's unclear what Balsa's goal even is.
But, many of the problems identified are real.
And, it doesn't seem that much progress is being made on many of them.
So, Balsa's goal is worthy.
And, it may well be that Balsa turns out to be unsuccessful, but doing nothing is guaranteed to be unsuccessful.
So, I for one applaud this effort and am excited to see what comes of it.
It does sound reckless, doesn't it? Even more so when you consider that over time you would likely have to eliminate many species of mosquito, not just one, to achieve the effect you desire. And, as the linked Nature article noted, this could have knock-on effects on other species which prey on mosquitos.
I think your comment is important, because this is probably the heart of the objection to using gene drives to exterminate mosquitos.
I think a few points are relevant in thinking about this objection:
(1) We already take steps to reduce mosquito populations, which are successful in wealthier countries.
(2) This shows the limited ecological effects of eliminating mosquitos.
(3) The existing efforts are not narrowly targeted. Eliminating malaria and other disease causing mosquitos would enable these other efforts to stop, possibly reducing overall ecological effects.
(4) Malaria is a major killer and there are other mosquito-borne diseases. If you are looking at this from a human-centered perspective, the ecological consequences would have to be clear and extreme to conclude that this step shouldn't be taken, and the consequences don't appear to be clear or extreme. (If there is another perspective you are looking at this from, I'd be happy to consider it.)
(5) Humanity is doing its best to eradicate Guinea worm to universal praise. It's a slow process. Would you suggest reversing it? Why are mosquitos and Guinea worms different?
I think we have significantly longer. Still, if success requires several tens of thousands of people researching this for decades, we will likely fail.
(1) Reasoned estimates for when we will develop AGI start at less than two decades from now.
(2) To my knowledge, there aren't thousands studying alignment now (let alone tens of thousands) and there does not seem to be a significant likelihood of that changing in the next few years.
(3) Even if, by the early 2030s, there are tens of thousands of researchers working on alignment, there is a significant chance they may not have time to work on it for decades before AGI is developed.
"When we refer to “aligned AI”, we are using Paul Christiano’s conception of “intent alignment”, which essentially means the AI system is trying to do what its human operators want it to do."
Reading this makes me think that the risk of catastrophe due to human use of AGI is higher than I was thinking.
In a world where AGI is not agentic but is ubiquitous, I can easily see people telling "their" AGIs to "destroy X" or "do something about Y" and having catastrophic results. (And attempts to prevent such outcomes could also have catastrophic results for similar reasons.)
So you may need to substantively align AGI (i.e., have AGI with substantive values or hard-to-alter restrictions) even if the AGI itself does not have agency or goals.
I think that either of the following would be reasonably acceptable outcomes:
(i) alignment with the orders of the relevant human authority, subject to the Universal Declaration of Human Rights as it exists today and other international human rights law as it exists today;
(ii) alignment with the orders of relevant human authority, subject to the constraints imposed on governments by the most restrictive of the judicial and legal systems currently in force in major countries.
Alignment doesn't mean that AGI is going to be aligned with some perfect distillation of fundamental human values (which doesn't exist) or the "best" set of human values (on which there is no agreement); it means that a range of horrible results (most notably human extinction due to rational calculation) is ruled out.
That my values aren't perfectly captured by those of the United States government isn't a problem. That the United States government might rationally decide it wanted to kill me and then do so would be.
Some of your constraints, in particular the first two, seem like they would not be practical in the real world in which AI would be deployed. On the other hand, there are also other things one could do in the real world which can't be done in this kind of dialogue, which makes boxing theoretically stronger.
However, the real problem with boxing is that whoever boxes less is likely to have a more effective AI, which likely results in someone letting an AI out of its box or, more likely, loosening the box constraints sufficiently to permit an escape.
"The greatest cost is probably starting expansion a tiny bit later, not making the most effective use of what's immediately at hand."
Possible, but not definitely so. We don't really know all the relevant variables.
The two questions you pose are not equivalent. There are critiques of AI existential risk arguments. Some of them are fairly strong. I am unaware of any which do a good job of quantifying the odds of AI existential risk. In addition, your second question appears to be asking for a cumulative probability. It's hard to see how you can provide that absent a mechanism for eventually cutting AI existential risk to zero...which seems difficult.
You are making a number of assumptions here.
(1) The AI will value or want the resources used by humans. Perhaps. Or, perhaps the AI will conclude that being on a relatively hot planet in a high-oxygen atmosphere with lots of water isn't optimal and leave the planet entirely.
(2) The AI will view humans as a threat. The superhuman AI that those on LessWrong usually posit (one so powerful that it can cause human extinction with ease, can't be turned off or reprogrammed, and can manipulate humans as easily as I can type) can't effectively be threatened by human beings.
(3) An AI which just somewhat cares about humans is insufficient for human survival. Why? Marginal utility is a thing.
In addition to being misleading, this just makes AI one more (small) facet of security. But security is broadly underinvested in and there is limited government pushback. In addition, there is already a security community which prioritizes other issues and thinks differently. So this would place AI in the wrong metaphorical box.
While I'm not a fan of the proposed solution, I do want to note that it's good that people are beginning to look at the problem.
In general, expressions of support for the bill will (modestly) help its passage. So, if you think this is a good bill and you live in the United States (1) call or write your Senators and urge them to support it; and (2) call or write your member of Congress, mention the Senate bill and urge them to introduce such a bill in the House.
If you think there are issues, then you should do the same thing, but instead note the issues and you should also write/call Senator Peters' Washington Office with the same message.
One line of reasoning is as follows:
(1) We don't know what goal(s) the AGI will ultimately have. (We can't reliably ensure what those goals are.)
(2) There is no particular reason to believe it will have any particular goal.
(3) Looking at all the possible goals it might have, goals of explicitly benefiting or harming human beings are not particularly likely.
(4) On the other hand, because human beings use resources which the AGI might want to use for its own goals and/or might pose a threat to the AGI (by, e.g., creating other AGIs), there are reasons why an AGI not dedicated to harming or benefiting humanity might destroy humanity anyway. (This is an example or corollary of "instrumental convergence.")
(5) Because of (3), minds being tortured for eternity is highly unlikely.
(6) Because of (4), humanity being ended in the service of some alien goal, which has zero utility from the perspective of humanity, is far more likely.
Exactly this. The rest, those little irregularities, at the time didn't matter, because we didn't know what we didn't know.
It is a separate and entirely different problem.
First, do no harm.
One in a hundred likely won't be enough if the organization doing the boxing is sufficiently security conscious. (And if not, there will likely be other issues.)
China is currently an effective peer competitor of the US, among other issues. 2010 is a rough estimate of when that condition started to obtain.
I think people here are uncomfortable advocating for political solutions either because of their views of politics or their comfort level with it.
You don't have to believe that alignment is impossible to conclude that you should advocate for a political/governmental solution. All you have to believe is that the probability of x-risk from AGI is reasonably high and the probability of alignment working to prevent it is not reasonably high. That seems to describe the beliefs of most of those on LessWrong.
I suspect you will not accept this answer, but for many practical definitions the United States had control over the world starting in 1991 and ending around 2010.
It suggests putting more weight on a plan to get AI research globally banned. I am skeptical that this will work (though if burning all GPUs would be a pivotal act, the chances of success are significantly higher), but it seems very unlikely that there is a technical solution either.
In addition, at least some purported technical solutions to AI risk seem to meaningfully increase the risk to humanity. If you have someone creating an AGI to exercise sufficient control over the world to execute a pivotal act, that raises the stakes of being first enormously, which incentivizes cutting corners. And it also makes it more likely that the AGI will destroy humanity, and quicker to do so.