Comments
Thanks for the response, Martin. I'd like to try to get to the heart of what we disagree on. Do you agree that an agent with a sufficiently different architecture - e.g. a human who somehow had a dog's brain implanted - would grow to have different values in some respects? For example, you mention arguing persuasively. Argument is a pretty specific ability, but we can widen our field to language in general - the human brain has pretty specific circuitry for that. A dog's brain that lacks the appropriate language centers would likely never learn to speak, let alone argue persuasively.
I want to point out again that the disagreement is just a matter of scale. I do think relatively similar values can be learnt through similar experiences for basic RL agents; I just want to caution that for most human and animal examples, architecture may matter more than you might think.
Hi Martin, thanks a lot for reading and for your comment! I think what I was trying to express is actually quite similar to what you write here.
'If we did they would still have different experiences, notably the experience of having a brain architecture ill-suited to operating their body.' - I agree. If I understand shard theory right, it claims that the underlying brain architecture doesn't make much difference, and that e.g. the experience of trying to walk in different ways, failing at some and succeeding at others, would be enough to eventually learn to walk. However, I'm pointing out that a dog's brain would still be ill-suited to learning things such as walking in a human body (at least compared to a human's brain), which shows the importance of architecture.
My goal was to illustrate the importance of brain structure through an absurd thought experiment, not to create a coherent scenario - I'm sorry if that led to confusion. The argument does not rest on the dog; the dog is meant to serve as an illustration of the argument.
At the end of the day, I think the authors of shard theory also concede that architecture is important in some cases - the difference seems to be more of a matter of scale. I'm merely suggesting that architecture may be a little more important than they consider it to be, and pointing to the variety of brain architectures, and the resulting variety of values, across different animals as an example.
Thanks, I really appreciate that! I've just finished an undergrad in cognitive science, so I'm glad that I didn't make any egregious mistakes, at least.
"AGI won't be just an RL system ... It will need to have explicit goals": I agree that this if very likely. In fact, the theory of 'instrumental convergence' often discussed here is an example of how an RL system could go from being comprised of low-level shards to having higher-level goals (such as power-seeking) that have top-down influence. I think Shard Theory is correct about how very basic RL systems work, but am curious about if RL systems might naturally evolve higher-level goals and values as they become more larger, more complex and are deployed over longer time periods. And of course, as you say, there's always the possibility of us deliberately adding explicit steering systems.
"shards can't hide any complex computations...since these route through consciousness.": Agree. Perhaps a case could be made for people being in denial of certain ideas they don't want to accept, or 'doublethink' where people have two views about a subject contradict each other. Maybe these could be considered different shards competing? Still, it seems a bit of a stretch, and certainly doesn't describe all our thinking.
"I think there's another important error in applying shard theory to AGI alignment or human cognition in the claim reward is not the optimization target": I think this is a very interesting area of discussion. I kind of wanted to delve into this further in the post, and talk about our aversion to wireheading, addiction and the reward system, and all the ways humans do and don't differentiate between intrinsic rewards and more abstract concepts of goodness, but figured that would be too much of a tangent haha. But overall, I think you're right.
"Something about being watched makes us more responsible ... In a pinch, placebo-ing yourself with a huge fake pair of eyes might also help."
There are 'Study with me'/'Work with me' videos on YouTube, which are usually just a few hours of someone working silently at a desk or in a library. I sometimes turn one of those on to give me the feeling that I'm not alone in the room, which raises my sense of accountability.
Great post!
I don't think people focus on language and vision because those fields are less boring than things like decision trees; they focus on them because the domains of language and vision are much broader than the domains that decision trees and similar models are applied to. If you train a decision tree model to predict the price of a house, it will do just that, whereas if you train a language model to write poetry, it could conceivably write about various topics such as math, politics and even itself (since poetry has such a broad scope). This is (possibly) a step towards general intelligence, which is what people are worried/excited about.
I agree with your argument that algorithms such as decision trees excel at things that humans can't do, whereas language and vision models do not.
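To make the 'narrow domain' contrast concrete, here's a minimal sketch (my own illustration, using scikit-learn with made-up numbers, not something from the post) of just how constrained the decision-tree setting is: whatever you feed the model, the only thing it can ever return is a price.

```python
# A minimal sketch of the narrow-domain point, using scikit-learn and toy,
# made-up numbers: a decision tree trained on house features can only ever
# output house-price predictions.
from sklearn.tree import DecisionTreeRegressor

# Toy data: [floor area in m^2, number of bedrooms] -> sale price
X = [[50, 1], [80, 2], [120, 3], [200, 4]]
y = [150_000, 230_000, 340_000, 520_000]

model = DecisionTreeRegressor(max_depth=3)
model.fit(X, y)

# Whatever inputs we pass in, the only kind of output we can get back
# is a price estimate - the model's domain never widens.
print(model.predict([[100, 2]]))
```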
It's great to see that Brittany's response was so positive, but could you still clarify whether you explicitly told her you would help her learn how to cook, or whether she asked you to? Or did you just infer that it was something she would enjoy, and proceed without making it explicit?
Again, I'm happy for Tiffany's newfound cooking abilities - congratulations to her!