Posts

If Alignment is Hard, then so is Self-Improvement 2023-04-07T00:08:21.567Z
The curious case of Pretty Good human inner/outer alignment 2022-07-05T19:04:49.434Z

Comments

Comment by PavleMiha on If Alignment is Hard, then so is Self-Improvement · 2023-04-07T18:00:41.732Z · LW · GW

"It's much easier to find parts of the system that don't affect values than it is to nail down exactly where the values are encoded." - I really don't see why this is true, how can you only change parts that don't affect values if you don't know where values are encoded?

Comment by PavleMiha on If Alignment is Hard, then so is Self-Improvement · 2023-04-07T17:59:46.503Z · LW · GW

I guess I don't really see that in myself. If you offered me a brain chip that would make me smarter but made me stop caring for my family, I simply wouldn't take it. Maybe I'd meditate to make myself want to watch less TV, but that's because watching TV isn't really part of what I'd consider my "core" desires.

Comment by PavleMiha on If Alignment is Hard, then so is Self-Improvement · 2023-04-07T11:51:37.719Z · LW · GW

Quite curious to see Eliezer's or someone else's take on this subject, if you could point me in the right direction!

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-08T14:38:55.871Z · LW · GW

Ah, I do personally find that a lot better than wholesale uploading, but even then I'd stop short of complete replacement. I would be too afraid that, without noticing, I would lose my subjective experience - the people doing the procedure would never know the difference. Additionally, I think a lot of people wouldn't want such a procedure if it would stop them from having kids. It's somewhat akin to having kids with a completely new genetic code, which most people seem not to want. It's hard to predict the exact details of these procedures and what public opinion will be of them, but it would only take some people consistently refusing for their genes to keep propagating.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-08T10:33:00.312Z · LW · GW

I completely agree that our behaviour doesn't maximise the outer goal. My mysteriously capitalised "Pretty Good" was intended to point in this direction - I find it interesting that we still have some kids, even when we could have none and still have sex and do other fun things. Declining populations would also point to worse alignment. I would consider properly bad alignment to be no kids at all, or the destruction of the planet and the human race along with it, although my phrasing, and my thinking on this, is quite vague.

There is an element of unsustainability in your strategy for maximal gene spreading: if everyone was constantly doing everything they could to spread their genes as much as possible, in the ways you describe, humanity as a whole might not survive, spreading no genes at all. But even if it would be unsustainable for everyone to do the things you described, a few more people could do it, spread their genes far and wide, and society would keep ticking along. Or everyone could have just a few more children and things would probably be fine in the long term. I would say that men getting very little satisfaction from sperm donation is a case of misalignment - a deep mismatch between our "training" ancestral environments and our "deployment" modern world.

So I agree we don't maximise the outer goal, especially now that we know how not to. One of the things that made me curious about this whole thing is that this characteristic, some sort of robust goal-following without maximising, seems like something we would desire in artificial agents. Reading through all these comments is crystallising in my head what my questions on this topic actually are:

  1. Is this robust non-maximalness an emergent quality of some or all very smart agents? - I doubt it, but it would be nice, as it would reduce the chances that we get turned into paperclips.
  2. Do we know how to create agents that exhibit these characteristics I think are positive? - I doubt it, but it might be worth figuring out. An AGI that follows its goals only to some sustainable, reasonable extent seems safer than the AGI equivalent of the habitual sperm donor.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T21:00:30.286Z · LW · GW

So like a couple would decide to have kids and they would just pick a set of genes entirely unrelated to theirs to maximise whatever characteristics they valued?

If I understand it correctly, I still feel like most people would choose not to do this; a lot of people seem against even minor genetic engineering, let alone something as major as that. I do understand that a lot of the reticence towards genetic engineering has sources other than "this wouldn't feel like my child", so it's hard to make any clear predictions.

Yeah, anthropomorphising evolution is pretty iffy. I guess in this situation I'm imagining we're evolution and we create a human race, with the goal of replicating a bunch of DNA sequences, that starts doing all sorts of wild things we didn't predict. I still think I'd be more pleased with that outcome than a lot of current thinking on AGI predicts we will be once we create a capable enough AGI. We do propagate our little DNA sequences, not as ambitiously as we perhaps could, but also responsibly enough that we aren't destroying absolutely everything in our path. I don't see this as a wholesale reinterpreting of what evolution "wants", more a not-very-zealous approach to achieving it.

A bit like if I made a very capable paperclip-making AI and it made only a few million paperclips, then got distracted watching YouTube and only made some paperclips every now and then. Not ideal, but better than annihilation.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T20:40:57.823Z · LW · GW

So if I upload my brain onto silicon, but don't destroy my meat self in the process, how is the one in the silicon me? Would I feel the qualia of the silicon me? Should I feel better about being killed after I've gone through this process? I really don't think it's a matter of the Overton window: people have an innate desire not to die, and unless I'm missing something this process seems a lot like dying while leaving a copy somewhere.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T15:35:49.042Z · LW · GW

Yes, that's exactly the direction this line of thought is pulling me in! Although perhaps I am less certain we can copy the mechanics of the brain, and more keen on looking at the environments that led to human intelligence developing the way it did, and whether we can do the same with AI.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T11:59:02.886Z · LW · GW

I don't think people have shown any willingness to modify themselves anywhere close to that extent. Most people believe mind uploading would amount to death (I've only found a survey of philosophers [1]), so I don't see a clear path for us to abandon our biology entirely. Really, the clearest path I can see is us being replaced by AI in mostly unpleasant ways, but I wouldn't exactly call that humanity at that point.

I'd even argue that, given the choice to just pick a whole new set of genes for their kids unrelated to their own, most people would say no. A lot of people have a very robust desire to have biological children.

While I agree that our niceness evolved because it was beneficial, I do wonder why we didn't evolve the capacity for really long-term deception instead, like we fear AGIs will develop. A commenter above made a point about the geographic concentration of genes that I found very interesting and that might explain this.

I reckon the question is whether we can replicate whatever made us nice in AGIs.

[1] https://survey2020.philpeople.org/survey/results/5094

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T11:49:38.719Z · LW · GW

I suspect (but can't prove) that most people would not upload themselves to a non-biological substrate if given the choice - only 27% of philosophers [1] believe that uploading your brain would mean you survive on the non-biological substrate. I also suspect that people would not engineer the desire to have kids out of themselves. If most people want to have kids, I don't think we can assume that they would change that desire, a bit like we don't expect very powerful AGIs to allow themselves to be modified. The closest analogue I can think of right now is taking drugs that would completely kill my sex drive; almost no one would do that willingly, although that probably has other horrible side effects.

If humans turn out to be misaligned in that way - we modify ourselves completely out of alignment with "evolution's wishes" - that would tell us something about the alignment of intelligent systems, but I think so far people have shown no willingness to do that sort of thing.

[1] https://survey2020.philpeople.org/survey/results/5094

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-06T11:38:47.029Z · LW · GW

I agree with you on what the inner optimiser is. I might not have been able to make myself super clear in the OP, but I see the "outer" alignment as some version of "propagate our genes", and I find it curious that that outer goal produced a very robust "want to have kids" inner alignment. I did also try to make the point that the alignment isn't maximal in some way - yeah, we don't have 16 kids, and men don't donate to sperm banks as much as possible, or do other things that might maximise gene propagation - but even that I find interesting: we fulfill evolution's "outer goal" somewhat, without going into paperclip-maximiser-style gene propagation at all costs. This seems to me like something we would want out of an AGI.

Comment by PavleMiha on The curious case of Pretty Good human inner/outer alignment · 2022-07-05T18:11:44.792Z · LW · GW

Genes being concentrated geographically is a fascinating idea, thanks for the book recommendation, I'll definitely have a look.

Niceness does seem like the trait easiest to explain with our current frameworks, and it makes me think about whether there is scope to train agents in shared environments where they are forced to play iterated games with either other artificial agents or us. Unless an AI can take immediate decisive action, as in a fast take-off scenario, it will, at least for a while, need to play repeated games. This does seem to be covered under the idea that a powerful AI would be deceptive and pretend to play nice until it didn't have to, but somehow our evolutionary environment led to the evolution of actual care for others' wellbeing rather than only very sophisticated long-term deception abilities.

I remember reading about how we evolved emotional reactions that are purposefully hard to fake, such as crying, in a sort of arms race against deception - I believe it's in How the Mind Works. This reminds me somewhat of that: areas where people have genuine care for each other's wellbeing are more likely to propagate the genes concentrated there.