Comments
What do you mean? Surely they aren't offering this for anyone who writes anything manically. It would be nice if someone volunteered to do that service more often, though.
I think you're right that it will take work to parse; it's definitely taking me work to parse! Possibly what you suggest would be good, but it sounds like work. I'll see what I think after the dialogue.
I was going to say The Thing. https://en.wikipedia.org/wiki/The_Thing_(1982_film)
Seems like someone went through my top-level posts and strong downvoted them.
The analogy from historical evolution is the misalignment between human genes and human minds, where the rise of the latter did not result in the extinction of the former. It plausibly could have, but that is not what we observe.
The analogy is that the human genes thing produces a thing (human minds) which wants stuff, but the stuff it wants is different from what the human genes want. From my perspective you're strawmanning and failing to track the discourse here to a sufficient degree that I'm bowing out.
For evolution in general, this is obviously pattern measure, and truly cannot be anything else.
This sure sounds like my attempt elsewhere to describe your position:
There's no such thing as misalignment. There's one overarching process, call it evolution or whatever you like, and this process goes through stages of creating new things along new dimensions, but all the stages are part of the overall process. Anything called "misalignment" is describing the relationship of two parts or stages that are contained in the overarching process. The overarching process is at a higher level than that misalignment relationship, and the misalignment helps compute the overarching process.
Which you dismissed.
I'm saying that you, a bio-evolved thing, are saying that you hope something happens, and that something is not what bio-evolution wants. So you're a misaligned optimizer from bio-evolution's perspective.
A different way to maybe triangulate here: Is misalignment possible, on your view? Like does it ever make sense to say something like "A created B, but failed at alignment and B was misaligned with A"? I ask because I could imagine a position, that sort of sounds a little like what you're saying, which goes:
There's no such thing as misalignment. There's one overarching process, call it evolution or whatever you like, and this process goes through stages of creating new things along new dimensions, but all the stages are part of the overall process. Anything called "misalignment" is describing the relationship of two parts or stages that are contained in the overarching process. The overarching process is at a higher level than that misalignment relationship, and the misalignment helps compute the overarching process.
The original argument that your OP is responding to is about "bio evolution". I understand the distinction, but why is it relevant? Indeed, in the OP you say:
For the evolution of human intelligence, the optimizer is just evolution: biological natural selection. The utility function is fitness: gene replication count (of the human defining genes).
So we're talking about bio evolution, right?
I'm saying that the fact that you, an organism built by the evolutionary process, hope to step outside the evolutionary process and do stuff that the evolutionary process wouldn't do, is misalignment with the evolutionary process.
The search process is just searching for designs that replicate well in the environment.
This is a retcon, as I described here:
If you run a big search process, and then pick a really extreme actual outcome X of the search process, and then go back and say "okay, the search process was all along a search for X", then yeah, there's no such thing as misalignment. But there's still such a thing as a search process visibly searching for Y and getting some extreme and non-Y-ish outcome, and {selection for genes that increase their relative frequency in the gene pool} is an example.
Ok so the point is that the vast vast majority of optimization power coming from {selection over variation in general} is coming more narrowly from {selection for genes that increase their relative frequency in the gene pool} and not from {selection between different species / other large groups}. In arguments about misalignment, evolution refers to {selection for genes that increase their relative frequency in the gene pool}.
If you run a big search process, and then pick a really extreme actual outcome X of the search process, and then go back and say "okay, the search process was all along a search for X", then yeah, there's no such thing as misalignment. But there's still such a thing as a search process visibly searching for Y and getting some extreme and non-Y-ish outcome, and {selection for genes that increase their relative frequency in the gene pool} is an example.
Of course - and we'd hope that there is some decoupling eventually! Otherwise it's just "be fruitful and multiply", forever.
This "we'd hope" is misalignment with evolution, right?
Say you have a species. Say you have two genes, A and B.
Gene A has two effects:
A1. Organisms carrying gene A reproduce slightly MORE than organisms not carrying A.
A2. For every copy of A in the species, every organism in the species (carrier or not) reproduces slightly LESS than it would have if not for this copy of A.
Gene B has two effects, the reverse of A:
B1. Organisms carrying gene B reproduce slightly LESS than organisms not carrying B.
B2. For every copy of B in the species, every organism in the species (carrier or not) reproduces slightly MORE than it would have if not for this copy of B.
So now what happens with this species? Answer: A is promoted to fixation, whether or not this causes the species to go extinct; B is eliminated from the gene pool. Evolution doesn't search to increase total gene count, it searches to increase relative frequency. (Note that this is not resting specifically on the species being a sexually reproducing species. It does rest on the fixedness of the niche capacity. When the niche doesn't have fixed capacity, evolution is closer to selecting for increasing gene count. But this doesn't last long; the species grows to fill capacity, and then you're back to zero-sum selection.)
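To make the dynamic concrete, here's a minimal simulation sketch of the setup above. The haploid model, selection coefficients, and niche capacity below are my own illustrative assumptions, not anything from the exchange; the point it tries to show is just that a gene which gives its carriers a relative edge while depressing everyone's absolute output still sweeps to fixation when capacity is fixed.

```python
import random

# Illustrative haploid, fixed-capacity model; the parameters below are
# made up for this sketch, not taken from the discussion.
N = 1000        # fixed niche capacity
GENS = 250
s = 0.05        # A1: carriers of A reproduce slightly MORE (relative edge)
c = 0.0004      # A2: each copy of A slightly reduces EVERYONE's absolute output

pop = [True] * 100 + [False] * (N - 100)   # True = carries gene A

for gen in range(GENS):
    n_A = sum(pop)
    # Per-capita absolute output, depressed by every copy of A in the population.
    base = 2.0 - c * n_A
    weights = [base * (1 + s) if carrier else base for carrier in pop]
    if gen % 50 == 0:
        print(f"gen {gen:3d}: freq(A) = {n_A / N:.2f}, per-capita output = {base:.2f}")
    # Fixed capacity: the next generation is N draws proportional to output,
    # so only RELATIVE output determines who fills the niche.
    pop = random.choices(pop, weights=weights, k=N)

# Typical result: freq(A) climbs toward 1.0 while per-capita output falls,
# i.e. A fixes even though it makes the whole species worse at reproducing.
# Gene B (the mirror image) would be eliminated by the same logic.
```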
I don't see how this detail is relevant. The fact remains that humans are, in evolutionary terms, much more successful than most other mammals.
What do you mean by "in evolutionary terms, much more successful"?
IIUC a lot of DNA in a lot of species consists of gene-drive-like things.
These seem like failure modes rather than the utility function.
By what standard are you judging when something is a failure mode or a desired outcome? I'm saying that what evolution is, is a big search process for genes that increase their relative frequency given the background gene pool. When evolution built humans, it didn't build agents that try to promote the relative frequency of the genes that they are carrying. Hence, inner misalignment and sharp left turn.
You write:
The utility function is fitness: gene replication count (of the human defining genes)[1]. And by this measure, it is obvious that humans are enormously successful. If we normalize so that a utility score of 1 represents a mild success - the expectation of a typical draw of a great apes species, then humans' score is >4 OOM larger, completely off the charts.[2]
Footnote 1 says:
Nitpick arguments about how you define this specifically are irrelevant and uninteresting.
Excuse me, what? This is not evolution's utility function. It's not optimizing for gene count. It does one thing, one thing only, and it does it well: it promotes genes that increase their RELATIVE FREQUENCY in the reproducing population.
The failure of alignment is witnessed by the fact that humans very very obviously fail to maximize the relative frequency of their genes in the next generation, given the opportunities available to them; and they are often aware of this; and they often choose to do so anyway. The whole argument in this post is totally invalid.
Something else is going on here, it seems to me
To me the obvious candidate is that people are orienting around Nate in particular in an especially weird way.
Thank you.
not insult your reasoning skills
More detail here seems like it could be good. What form did the insult take? Other relevant context?
Ok, thanks for clarifying that that paragraph was added later.
(My comments also apply to the paragraph that was in the original.)
FWIW it seems to me that EY did not carefully read your post, and missed your distinction between having the human utility function somewhere in the AI vs. having it explicitly. Assuming you didn't edit the post, your paragraph here
The key difference between the value identification/specification problem and the problem of getting an AI to understand human values is the transparency and legibility of how the values are represented: if you solve the problem of value identification, that means you have an actual function that can tell you the value of any outcome (which you could then, hypothetically, hook up to a generic function maximizer to get a benevolent AI). If you get an AI that merely understands human values, you can't necessarily use the AI to determine the value of any outcome, because, for example, the AI might lie to you, or simply stay silent.
makes this clear enough. But my eyes sort of glazed over this part. Why? Quoting EY's comment above:
We are trying to say that because wishes have a lot of hidden complexity, the thing you are trying to get into the AI's preferences has a lot of hidden complexity.
A lot of the other sentences in your post sound like things that would make sense to say if you didn't understand this point, and that wouldn't make sense to say if you did understand this point. EY's point here still goes through even if you have the ethical-situations-answerer. I suspect that's why others, and I initially, misread / projected onto your post, and why (I suspect) EY took your explicit distancing as not reflecting your actual understanding.
This updated me, thank you. A fair amount, from "IDK, this sounds like it's fairly likely to mainly be just people being sensitive about blunt confrontational communication in a context where blunt confrontational communication is called for" to "Maybe that, but sure sounds a lot like Nate has a general disregard for fellows--maybe there's some internal story he has where his behavior would make sense if other people shared that story, but they don't and that should be obvious and he should have not behaved that way given that they don't".
Without digging in too much, I'll say that this exchange and the OP are pretty confusing to me. It sounds like MB is saying "MIRI doesn't say it's hard to get an AI that has a value function" and then also saying "GPT has the value function, so MIRI should update". This seems almost contradictory.
A guess: MB is saying "MIRI doesn't say the AI won't have the function somewhere, but does say it's hard to have an externally usable, explicit human value function". And then saying "and GPT gives us that", and therefore MIRI should update.
And EY is blobbing those two things together, and saying neither of them is the really hard part. Even having the externally usable explicit human value function doesn't mean the AI cares about it. And it's still a lot of bits, even if you have the bits. So it's still true that the part about getting the AI to care has to go precisely right.
If there's a substantive disagreement about the facts here (rather than about the discourse history or whatever), maybe it's like:
Straw-EY: Complexity of value means you can't just get the make-AI-care part to happen by chance; it's a small target.
Straw-MB: Ok but now we have a very short message pointing to roughly human values: just have a piece of code that says "and now call GPT and ask it what's good". So now it's a very small number of bits.
(It's almost certainly actually Eliezer, given this tweet: https://twitter.com/ESYudkowsky/status/1710036394977235282)
Top level blog post, do it.
The details matter here; I don't feel I can guess from what you've said whether we'd agree or not.
For example:
Tam: proposes some idea about alignment
Newt: points out some particular flaw: "...and this is an instance of a general problem, which you'll have to address if you want to make progress..." gestures a bit at the general problem
Tam: makes a tweak to the proposal that locally addresses the particular flaw
Newt: "This still doesn't address the problem."
Tam: "But it seems to solve the concrete problem, at least as you stated it. It's not obvious to me that there's a general problem here; if we can solve instances of it case-by-case, that seems like a lot of progress."
Newt: "Look, we could play this game for some more rounds, where you add more gears and boxes to make it harder to see that there's a problem that isn't being addressed at all, and maybe after a few rounds you'll get the point. But can we just skip ahead to you generalizing to the class of problem, or at least trying to do that on your own?"
Tam: feels dismissed/disrespected
I think Newt could have been more graceful and more helpful, e.g. explicitly stating that he's had a history of conversations like this, and setting boundaries about how much effort he feels excited about putting in, and using body language that is non-conflictual... But even if he doesn't do that, I don't really think he's violating a norm here. And depending on context this sort of behavior might be about as well as Newt can do for now.
Cowards going around downvoting without making arguments.
There are ways of communicating other than being blunt that can... unsettlingly affect you
I really wish it were possible for this conversation to address what the affected people are coming in with. I suspect (from priors and the comments here) that there are social effects, at core not located in either Nate or TurnTrout, that result in this.
But I think some people possess the skill of "being able to communicate harsh truths accurately in ways where people still find the interaction kind, graceful, respectful, and constructive." And my understanding is that's what people like TurnTrout are wishing for.
This is a thing, but I'm guessing that what you have in mind involves a lot more not-actually-trying-for-the-crux-of-the-conversation than you're crediting. As just one example, you can be "more respectful" by making fewer "sweeping claims" such as "you are making such and such error in reasoning throughout this discussion / topic / whatever". But that's a pretty important thing to be able to say, if you're trying to get to real cruxes and address despair and so on.
Engaging with people in ways such that they often feel heard/seen/understood
This is not a reasonable norm. In some circumstances (including, it sounds like, some of the conversations under discussion) meeting this standard would require a large amount of additional effort, not related to the ostensible reason for talking in the first place.
Engaging with people in ways such that they rarely feel dismissed/disrespected
Again, a pretty unreasonable norm. For some topics, such as "is what you're doing actually making progress towards that thing you've arranged your life (including social context) around making progress on?", it's very easy for people to feel this way, even if they are being told true, useful, relevant things.
Something fuzzy that lots of people would call "kindness" or "typical levels of warmth"
Ditto, though significantly less strongly; I do think there's ways to do this that stay honest and on-mission without too much tradeoff.
More likely, they say the creativity came from MrUgleh. Which it did, in many important senses, amazing work. I’m confused by the prompt not having at least a weird trick in it, seems like you should have to work harder for this?
You're aware it's using ControlNet, right?
Oh yeah. Well I guess it depends on the latitude.
I just tried to change it from being a quote to being in a box. But apparently you need a package to put a box around verbatim text in LaTeX. https://tex.stackexchange.com/questions/6260/how-to-draw-box-around-text-that-contains-a-verbatim-block
So feature suggestion: boxes. Or LaTeX packages.
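For what it's worth, here's a minimal sketch of one package-based route I believe works (my own suggestion, not necessarily the approach the linked thread recommends): fancyvrb's Verbatim environment takes a frame option.

```latex
\documentclass{article}
\usepackage{fancyvrb} % provides a Verbatim environment with framing options

\begin{document}
% frame=single draws a box around the verbatim block
\begin{Verbatim}[frame=single]
verbatim text goes here,
  with spacing preserved
\end{Verbatim}
\end{document}
```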
I wonder why people don't build very narrow, very tall apartment buildings, with the broad face facing the sun arc. Then all apartments get lots of sun; on the back you have hallways and such.
Seems very plausible to me.
I want to make a hopefully-sandboxed comment:
This seems kind of cringe.
I don't think of myself as someone who thinks in terms of cringe, much, but apparently I have this reaction. I don't particularly endorse it, or any implications of it, but it's there. Maybe it means I have some intuition that the thing is bad to do, or maybe it means I expect it to have some weird unexpected social effect. Maybe it will be mocked in a way that shows that it's not actually a good symbolic social move.
Maybe the intuition is something like: protests are the sort of thing that the weak side does, the side that will be mainly ignored, or perhaps mocked, and so making a protest puts stop-AI-ists in a weak social position. (I continue to not endorse any direct implication here, such as "this is bad to do for this reason".) Why would someone in power and reasoning in terms of power, like LeCun, take the stop-AI-ists seriously, when they've basically publicly admitted to not having social power, i.e. to being losers? Someone in power can't gain more power by cooperating with losers, and does not need to heed the demands of losers because they can't threaten them in the arena of power.
(I do endorse trying to be aware of this sort of dynamic. I hope to see some version of the protest that is good, and/or some version of updating on the results or non-results.)
[ETA: and to be extra clear, I definitely don't endorse making decisions from within the frame of social symbolic moves and power dynamics and conflict. That sort of situation is something that we are to some extent always in, and sometimes forced to be in and sometimes want to be in, but that thinking-frame is never something that we are forced to or should want to restrict our thinking to.]
Alternative theory: Alice felt on thin ice socially + professionally. When she was sick she finally felt she had a bit of leeway and therefore felt even a little willing to make requests of these people who were otherwise very "elitist" wrt everyone, somewhat including her. She tries to not overstep. She does this by stating what she needs, but also in the same breath excusing her needs as unimportant, so that the people with more power can preserve the appearance of not being cruel while denying her requests. She does this because she doesn't know how much leeway she actually has.
Unfortunately this is a hard-to-falsify theory. But at a glance it seems consistent, and I think it's also totally a thing that happens.
I want to note a specific pattern that I've noticed. I am not commenting on this particular matter overall; the events with Nonlinear may or may not be an instance of the pattern. It goes like this:
- Fred does something unethical / immoral.
- People start talking about how Fred did something bad.
- Fred complains that people should not be talking the way they are talking, and Fred specifically invokes the standard of the court system, saying stuff like "there's a reason courts presume innocence / allow the accused to face the accuser / give a right to a defense attorney / have discovery / have the right to remain silent / right to avoid incriminating oneself / etc. etc.".
Fred's implication is that people shouldn't be talking the way they're talking because it's unjust.
... Of course, this pattern could also happen when step 1 is Fred not doing something bad; and either way, maybe Fred is right... But I suspect that in reality, Fred uses this as a way of making isolated demands for rigor.
Being able to deduce a policy from beliefs doesn’t mean that common knowledge of beliefs is required.
Sure, I didn't say it was. I'm saying it's sufficient (given some assumptions), which is interesting.
In any case it doesn’t mean that an agent in reality in a prisoner’s dilemma has a crystal ball telling them the other’s policy.
Sure, who's saying so?
But a case where they each learn each other’s beliefs doesn’t feel that natural to me
It's analyzed this way in the literature, and I think it's kind of natural; how else would you make the game be genuinely perfect information (in the intuitive sense), including the other agent, without just picking a policy?
Yes, but the idea (I think!) is that you can recover the policy from just the beliefs (on the presumption of CDT EU maxxing). Saying that A does xyz because B is going to do abc is one thing; it builds in some of the fixpoint finding. The common knowledge of beliefs instead says: A does xyz because he believes "B believes that A will do xyz, and therefore B will do abc as the best response"; so A chooses xyz because it's the best response to abc.
But that's just one step. Instead you could keep going:
--> A believes that
----> B believes that
------> A believes that
--------> B believes that A will do xyz,
--------> and therefore B will do abc as the best response
------> and therefore A will do xyz as the best response
----> and therefore B will do abc as the best response
so A does xyz as the best response. And then you go to infinityyyy.
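Here's a minimal sketch of the fixed point this regress bottoms out in (the prisoner's-dilemma payoffs and starting beliefs are my own illustrative choices, not from the discussion): each player best-responds to its current belief about the other, the beliefs get updated to those replies, and the process settles on a pair of moves that are mutual best responses.

```python
# Sketch of the belief regress bottoming out in a fixed point; the
# prisoner's-dilemma payoffs and starting beliefs are illustrative choices.
PAYOFF = {  # row player's payoff for (my_move, opponent_move)
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 4, ("D", "D"): 1,
}

def best_response(opponent_move):
    """Causal EU-maxxing reply to a fixed belief about the opponent's move."""
    return max(("C", "D"), key=lambda m: PAYOFF[(m, opponent_move)])

# A starts out believing B will play C, and vice versa.
belief_about_B, belief_about_A = "C", "C"
for step in range(4):
    a_move = best_response(belief_about_B)  # A replies to its belief about B
    b_move = best_response(belief_about_A)  # B replies to its belief about A
    print(f"step {step}: A plays {a_move}, B plays {b_move}")
    # Next level of the regress: each now believes the other will play that reply.
    belief_about_A, belief_about_B = a_move, b_move

# The fixed point is a mutual best response, i.e. a Nash equilibrium.
assert a_move == best_response(b_move) and b_move == best_response(a_move)
```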
The situation is slightly complicated, in the following way. You're broadly right; source code sharing is new. But the old concept of Nash equilibrium is I think sometimes justified like this: We assume that not only do the agents know the game, but they also know each other. They know each other's beliefs, each other's beliefs about the other's beliefs, and so on ad infinitum. Since they know everything, they will know what their opponent will do (which is allowed to be a stochastic policy). Since they know what their opponent will do, they'll of course (lol) do a causal EU-maxxing best response. Therefore the final pair of strategies must be a Nash equilibrium, i.e. a mutual best-response.
This may be what Isaac was thinking of when referring to "common knowledge of everything".
OSGT then shows that there are code-reading players who play non-Nash strategies and do better than Nashers.
I will go ahead and say it is not once-a-year level hard for most people to find worthwhile first dates.
I'm a counter datapoint.
functional Machine Intelligence Research Imaging
What's more exciting IMO isn't so much the big-data aspect, but rather the opportunity for "big individual data": people getting to watch their own brain state for many hours. E.g. learning when you're rationalizing, when you're avoiding something, when you're deluded, when you're tired, when you're really thinking about something else, etc.
I don't see much of a disagreement here? I'm just saying that the way in which random things are accelerated is largely via convergent stuff; and therefore there's maybe some way that one can "repurpose" all that convergent stuff towards some aligned goal. I agree that this idea is dubious / doesn't obviously work. As a contrast, one could imagine instead a world in which new capabilities are sort of very idiosyncratic to the particular goal they serve, and when you get an agent with some goals, all its cognitive machinery is idiosyncratic and hard to parse out, and it would be totally infeasible to extract the useful cognitive machinery and repurpose it.
So in general, you'd say that nothing is objective? Is that right?
It would be useful to know relative dust levels in practice, given equipment + habits. E.g.: with such and such air filter running all the time, the air has X% less particulates of size Y; etc.
Note that you can probably find the broken LW posts by searching the title (+author) in LW.
Good point, but also according to Wikipedia "the index includes 167 countries and territories", so small changes in the average are plausibly meaningful.