Comments
Yeah, so I think Tomasik has written basically what I’m saying: https://reducing-suffering.org/dissolving-confusion-about-consciousness/
I can think of two interpretations of consciousness being "causally connected" to physical systems:
1. consciousness is the result of physical phenomena like brain states, but it does not cause any. So it has an in-edge coming from the physical world, but not an out-edge to the physical world. Again, this implies that consciousness cannot be what causes me to think about consciousness.
2. consciousness causes things in the physical world, which, again, I believe necessitates a consciousness variable in the laws of the universe.
Note that I am not trying to get at what Eliezer was arguing, I am asking about the consequences of his arguments, even ones that he may not have intended.
So does this suppose that there is some “consciousness” variable in the laws of the universe? If consciousness causes me to think X, and thinking X can be traced back to a set of physical laws that govern the neurons in my brain, then there must be some consciousness variable somewhere in these physical laws, no? Otherwise it has to be that consciousness corresponds to some physical phenomenon, and it is that phenomenon - not the consciousness - that caused you to think about it. If there were no consciousness attached to that physical phenomenon, you would go along just the same way, thinking the exact same thing.
I was reading this old Eliezer piece arguing against the conceivability of p-zombies. https://www.lesswrong.com/posts/7DmA3yWwa6AT5jFXt/zombies-redacted
And to me this feels like a more general argument against the existence of consciousness itself, in any form similar to how we normally think about it, not just against p-zombies.
Eliezer says that consciousness itself cannot be what causes me to think about consciousness, or what causes philosophers to write papers about consciousness; thus it must be the physical system that such consciousness corresponds to that causes these things. But then…that seems to discredit using your own consciousness as evidence for anything. If my own conscious experience cannot be evidence that I am thinking about consciousness, why should we think consciousness exists?
Am I confused about something here?
That's not a simple problem. First you have to specify "not killing everyone" robustly (outer alignment), and then you have to train the AI to have this goal and not an approximation of it (inner alignment).
See my other comment for the response.
Anyway, the rest of your response is spent talking about the case where AI cares about its perception of the paperclips rather than the paperclips themselves. I'm not sure how severity level 1 would come about, given that the AI should only care about its reward score. Once you admit that the AI cares about worldly things like "am I turned on", it seems pretty natural that the AI would care about the paperclips themselves rather than its perception of the paperclips. Nevertheless, even in severity level 1, there is still no incentive for the AI to care about future AIs, which contradicts concerns that non-superintelligent AIs would fake alignment during training so that future superintelligent AIs would be unaligned.
We don't know how to represent "do not kill everyone"
I think this is addressed by Matthew Barnett’s recent article arguing that actually, yes, we do. And regardless, I don’t think this point is a big part of Eliezer’s argument. https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument
We don't know how to pick which quantity would be maximized by a would-be strong consequentialist maximizer
Yeah, so I think this is the crux of it. My point is that if we find some training approach that leads to a model that cares about the world itself rather than hacking some reward function, that’s a sign that we can in fact guide the model in important ways, and there’s a good chance this includes being able to tell it not to kill everyone.
We don't know what a strong consequentialist maximizer would look like, if we had one around, because we don't have one around (because if we did, we'd be dead)
This is just a way of saying “we don’t know what AGI would do”. I don’t think this point pushes us toward x-risk any more than it pushes us toward not-x-risk.
I also do not think the responses to this question are satisfying enough to be a refutation. I don’t even think they are satisfying enough to make me confident I haven’t just found a hole in AI risk arguments. This is not a simple case of “you just misunderstand something simple”.
I don’t care that much, but if LessWrong is going to downvote sincere questions because it finds them dumb or whatever, it will make the site very unwelcoming to newcomers.
I do indeed agree this is a major problem, even if I'm not sure whether I agree with the main claim. The rise of fascism over the last decade, and the expectation that it will continue, is extremely evident; its consequences for democracy are a lot less clear.
The major wrinkle in all of this is in assessing anti-democratic behavior. Democracy indices are not a great way of assessing democracy, for much the same reason that the Doomsday Clock is a bad way of assessing nuclear risk: they're subjective metrics produced by (probably increasingly) left-leaning academics, and they tend to measure a lot of things that I wouldn't classify as democracy (eg rights of women/LGBT people/minorities). This paper found, using re-election rates, that there has been no evidence of global democratic backsliding. That finding started quite the controversy in political science; my read on the subsequent discussion is that there is evidence of backsliding, but it has been fairly modest.
I expect things to get worse as more countries elect far-right leaders, and as countries that already have them see their democratic institutions increasingly captured. And yet...a lot of places with far-right leaders continue to have close elections - see Poland, Turkey, and Israel if you count it; in Brazil the far right even lost the election. One plausible theory here is that the more anti-democratic behavior a party engages in, the more resistance it faces - either because voters are turned off or because its opponents increasingly become center or center-right parties seeking to build broad pro-democracy coalitions - and that this roughly balances out. What does this mean for how one evaluates democracy?
Finally, some comments specifically on more Western countries. I think the future of these countries is really uncertain.
For the next decade, it's really dependent on a lot of short-term events. Will Italy's PM Meloni engage in anti-democratic behavior? Will Le Pen win the election in France, and if so, will she engage in anti-democratic behavior? Will Trump win in 2024? How far and how fast will the upward polling trend for Germany's and Spain's far-right parties continue?
I know the piece specifies the next decade, but taking a longer view: the rise of fascism has come quite suddenly, in the span of the last eight years. If it continues for a few more decades (and AI doesn't kill us all), then we are probably headed for fascist governments almost everywhere and the deterioration of democratic institutions. But how long this global trend will last is really the big question in global politics. Maybe debates over AI will become the big issue that supplants fascism? IDK. I'd love to see some analysis of historical trends in public opinion to get a sense of what a prior for this question would look like; I've never gotten around to doing it myself and am really not very well informed about the history here.
I'm going to quote from an EA Forum post I just made on why mere repeated exposure to AI Safety (through eg media coverage) will probably do a lot to persuade people:
[T]he more people hear about AI Safety, the more seriously people will take the issue. This seems to be true even if the coverage is purporting to debunk the issue (which as I will discuss later I think will be fairly rare) - a phenomenon called the illusory truth effect. I also think this effect will be especially strong for AI Safety. Right now, in EA-adjacent circles, the argument over AI Safety is mostly a war of vibes. There is very little object-level discussion - it's all just "these people are relying way too much on their obsession with tech/rationality" or "oh my god these really smart people think the world could end within my lifetime". The way we (AI Safety) win this war of vibes, which will hopefully bleed out beyond the EA-adjacent sphere, is just by giving people more exposure to our side.
No, it does not say that either. I’m assuming you’re referring to “choose our words carefully”, but stating something imprecisely is a long way from not telling the truth.
Nowhere in that quote does it say we should not speak the truth.
Yeah so this seems like what I was missing.
But it seems to me that in these types of models, where the utility function is based on the state of the world rather than on input to the AI, aligning the AI not to kill humanity is easier. Like if an AI gets a reward every time it sees a paperclip, then it seems hard to penalize the AI for killing humans, because "a human dies" is a hard thing for an AI with just sensory input to explicitly recognize. If, however, the AI is trained on a bunch of runs where the utility function is the number of paperclips actually created, then we can also penalize the model for the number of people who actually die.
I'm not very familiar with these forms of training so I could be off here.
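To make the distinction concrete, here is a minimal sketch in Python. The toy `WorldState` class, its field names, and the specific penalty weight are all my own assumptions for illustration, not a claim about how such training is actually implemented:

```python
# Toy illustration (assumed names/structure): the difference between a reward
# computed from the agent's perception and one computed from ground-truth
# world state available during training.

from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips_made: int          # paperclips actually created in the (simulated) world
    humans_harmed: int            # people actually harmed in the (simulated) world
    camera_paperclip_count: int   # what the agent's sensors report seeing

def perception_reward(state: WorldState) -> float:
    # Reward based only on what the agent perceives: fooling the camera
    # scores just as well as making real paperclips, and harm is invisible.
    return float(state.camera_paperclip_count)

def world_state_reward(state: WorldState, harm_penalty: float = 1000.0) -> float:
    # Reward computed from the actual state during training: the same signal
    # that rewards real paperclips can also penalize real deaths.
    return state.paperclips_made - harm_penalty * state.humans_harmed
```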
Steelmanning is useful as a technique because often the intuition behind somebody’s argument is true even if the precise argument they are using is not. If the other person is a rationalist, then you can point out the argument’s flaws and expect them to update the argument to more precisely capture their intuition. If not, you likely have to do some of the heavy lifting for them by steelmanning their argument and seeing where its underlying intuition might be correct.
This post seems only focused on the rationalist case.
As with most things in life: this seems like it could be a real improvement, and it's great that we're testing it and finding out!
For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is that it’s inherently unreliable - we don’t know how the inner algorithm works, so it just breaks at random points, especially because most of the tasks it handles are really hard (which is why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point where it can actually be called reliable.
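As a rough back-of-the-envelope illustration of the car example (the 20-commutes-per-month figure is just an assumption for the sketch), a 1-in-10 per-trip failure rate compounds badly over repeated use:

```python
# Assumed numbers for illustration only: a car that breaks down on 1 out of
# every 10 trips, used for about 20 commutes a month.
per_trip_reliability = 0.9
trips_per_month = 20

p_trouble_free_month = per_trip_reliability ** trips_per_month
print(f"Chance of a breakdown-free month: {p_trouble_free_month:.1%}")
# -> about 12%, i.e. you'd expect to get stranded most months
```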
The things AI is already used for are things where reliability doesn’t matter as much. Advertisement algorithms just need to be as good as possible to make the company as much revenue as possible. People currently use machine translation just to get the message across and not for formal purposes, so AI algorithms are sufficient (if they were better, maybe we could use them for more formal purposes!). The list goes on.
I honestly think AI won’t become super practical until we reach AGI, at which point (if we ever get there) its usage will explode due to massive applicability and solid reliability (if it doesn’t take over the world, that is).
Yeah, maybe we could show the ratio of strong upvotes to upvotes.
This is quite a rude response