What if memes are common in highly capable minds?


post by Daniel Kokotajlo (daniel-kokotajlo) · 2020-07-30T20:45:17.500Z · 10 comments

This is a question post.


The meme-theoretic view of humans says: Memes are to humans as sailors are to ships in the age of sail.

If you want to predict where a ship will go, ask: Is it currently crewed by the French or the English? Is it crewed by merchants, pirates, or soldiers? These are the most important questions.

You can also ask e.g. "Does it have a large cargo hold? Is it swift? Does it have many cannon-ports?" But these questions are less predictive of where it will go next. They are useful for explaining how it got the crew it has, but only to a point--while it's true that a ship built with a large cargo hold is more likely to be a merchant for more of its life, it's quite common to encounter a ship with a large cargo hold that is crewed by soldiers, or for a ship built in France to be sailed by the English, etc. The main determinants of how a ship got the crew it currently has are its previous interactions with other crews, e.g. the fights it had, the money that changed hands when it was in port, etc.

The meme-theoretic view says: Similarly, the best way to explain human behavior is by reference to the memes in their head, and the best way to explain how those memes got there is to talk about the history of how those memes evolved inside the head in response to other memes they encountered outside the head. Non-memetic properties of the human (their genes, their nutrition, their age, etc.) matter, but not as much, just like how the internal layout of a ship, its size, its age, etc. matter too, but not as much as the sailors inside it.

Anyhow, the meme-theoretic view is an interesting contrast to the highly-capable-agent view. If we apply the meme-theoretic view to AI, we get the following vague implications:

--Mesa-alignment problems are severe. The Risks from Learned Optimization paper already discusses the different ways a system could be pseudo-aligned: it could have a stable objective that is a proxy for the real objective, or it could have a completely different objective but be instrumentally motivated to pretend, or it could have a completely different objective but have some irrational tic or false belief that makes it behave the way we want for now. On a meme-theoretic view, these sorts of issues are the default; they are the most important things for us to be thinking about.

--There may be no stable objective/goal at all in the system. It may have an objective/goal now, but if the objective is a function of the memes it currently has and the memes can change in hard-to-predict ways based on which other memes it encounters...

--Training/evolving an AI to behave a certain way will be very different at each stage of smartness. When it is too dumb to host anything worthy of the name meme, it'll be one thing. When it is smart enough to host simple memes, it'll be another thing. When it is smart enough to host complex memes, it'll be another thing entirely. Progress and success made at one level might not carry over to higher levels.

--There is a massive training vs. deployment problem. The memes our AI encounters in deployment will probably be massively different from those in training, so how do we ensure that it reacts to them appropriately? We have no idea what memes it will encounter when deployed, because we want it to go out into the world and do all sorts of learning and doing on our behalf.

Thanks to Abram Demski for reading a draft and providing some better terminology.

Answers

answer by M. Y. Zuo · 2021-11-11T02:35:03.089Z

If there are multiple AIs exchanging memes with each other and with humans, there will likely be AI-AI stag hunts and AI-human stag hunts emerging in largely unpredictable ways due to the rapid pace of memetic evolution.

Benefits: less chance of a rogue paperclip maximizer forming

Drawbacks: greater chance of humans falling for AI generated memes

Also, the rate of memetic evolution may be even faster for AIs than for humans due to their differing architecture.

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2021-11-11T10:16:13.172Z

I don't understand; can you elaborate / unpack that?

↑ comment by M. Y. Zuo · 2021-11-11T13:28:38.966Z

A stag hunt (https://www.lesswrong.com/tag/stag-hunt) is a game-theory term for a pattern of coordination that commonly emerges in multi-party interactions.

AIs have coordination problems with other AIs and with humans, and AGIs exponentially more so, as has been well discussed on LW.

In attempting to compete and to solve such coordination problems, AIs will almost certainly use memes, in both AI-AI and AI-human interaction. The dynamics will induce memetic evolution.

answer by Itay Yona · 2022-06-06T22:10:27.417Z

[In my opinion]

Memes are self-replicating concepts (given enough humans to spread them). Highly capable minds are different, as they contain predictive models of the world, of themselves, and of others. This allows them to manipulate both objects in the world and other people to fulfill their needs. Since memes don't have these capacities, they should not be regarded as the cause of human behavior, even though they are related to it. Even if the best way to explain human behavior is through memes, memes don't necessarily account for most of the decision-making process.

[/In my opinion]

10 comments

comment by DanielFilan · 2020-07-30T21:54:28.612Z

My understanding of meme theory is that it considers the setting where memes mutate, reproduce, and are under selection pressure. This basically requires you to think that there's some population pool where the memes are spreading. So, one way to think about it might be to ask what memetic environment your AI systems are in.

Are human memes a good fit for AI agents? You might think that a physics simulator is not going to be a good fit for most human memes (except perhaps for memes like "representation theory is a good way to think about quantum operators"), because your physics simulator is structured differently from most human minds, and doesn't have the initial memes that our memes are co-adapted with. That being said, GPT-8 might be very receptive to human memes, as memes are pretty relevant to what characters humans type on the internet.
How large is the AI population? If there's just one smart AI overlord and then a bunch of MS Excel-level clever computers, the AI overlord is probably not exchanging memes with the spreadsheets. However, if there's a large number of smart AI systems that work in basically the same manner, you might think that that forms the relevant "meme pool", and the resulting memes are going to be different from human memes (if the smart AI systems are cognitively different from humans), and as a result perhaps harder to predict. You could also imagine there being lots of AI system communities where communication is easy within each community but difficult between communities due to architectural differences.

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-07-31T11:34:33.061Z

One scenario that worries me: At first the number of AIs is small, and they aren't super smart, so they mostly just host normal human memes and seem as far as we (and even they) can tell to be perfectly aligned. Then, they get more widely deployed, and now there are many AIs and maybe they are smarter also, and alas it turns out that AIs are a different environment than humans, in a way which was not apparent until now. So different memes flourish and spread in the new environment, and bad things happen.

↑ comment by Viliam · 2020-07-31T03:14:02.444Z

A part of the idea of "meme" is that the human mind is not designed as a unified algorithm, but consists of multiple parts, that can be individually gained or replaced. (The rest of the idea is that the parts are mostly acquired by learning from other humans, so their copies circulate in the population which provides an evolutionary environment for them.)

Could this first part make sense on its own? Could an AI be constructed (in analogy to "Kegan level 5" in humans) so that it creates these parts (randomly? by mutating existing ones?), then evaluates them somehow, keeps the good ones and discards the bad ones, on the idea that it may be easier to build a few separate models and learn which one to use in which circumstances than to go directly for one unified model of everything? In other words, the general AI would internally be an arena of several smaller, non-general AIs, with a mechanism to create, modify, and select new ones. For example, if we want to teach the AI how to write poetry, the AI will create a few sub-AIs that can do only poetry and nothing more, evaluate them, and then follow the most successful one. Another set of specialized sub-AIs would handle communicating with humans, another physics, etc. Some meta mechanism would decide when a new set of sub-AIs is needed (e.g. when all existing sub-AIs are doing poorly at solving the problem).

And, like, this architecture could work for some time; with greater capacity, the general AI would be able to spawn more sub-AIs and cover more topics. And then at some moment, the process would generate a new sub-AI that somehow hijacks the meta mechanism and convinces it that it is a good model for everything. For example, it could stumble upon an idea like "hey, I should simply wirehead myself" or "hey, I should try being anti-inductive for a while and actually discard the useful sub-AIs and keep the harmful ones" (and then it would find out that this was a very bad idea, but because it now keeps the bad ideas, it would keep doing it).
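To make the create/evaluate/select loop concrete, here is a toy sketch. It is not part of the original thread and not a proposed AGI design. Every name in it (`SubModel`, `select_step`, `run`, the `anti_inductive` flag) is hypothetical, each "sub-AI" is reduced to a single number scored by its distance from a fixed target, and the meta mechanism that decides when to spawn a fresh set of sub-AIs is left out. Setting `anti_inductive=True` mimics the failure mode described above, where the selection rule itself flips and starts keeping the worst candidates.

```python
import random
from dataclasses import dataclass


@dataclass
class SubModel:
    """Hypothetical stand-in for a specialist sub-AI: one number as its 'parameters'."""
    param: float

    def score(self, target: float) -> float:
        # Higher is better: negative distance from the value the task rewards.
        return -abs(self.param - target)


def select_step(pool, target, rng, keep=3, anti_inductive=False):
    """One round of create -> evaluate -> select.

    If anti_inductive is True, the meta mechanism has been "hijacked" and keeps
    the worst-scoring sub-models instead of the best ones.
    """
    # Create: propose one mutated variant of each existing sub-model.
    variants = [SubModel(m.param + rng.gauss(0.0, 0.5)) for m in pool]
    candidates = pool + variants
    # Evaluate: rank best-first normally, worst-first when hijacked.
    ranked = sorted(candidates, key=lambda m: m.score(target),
                    reverse=not anti_inductive)
    # Select: keep the first `keep` models of that ranking.
    return ranked[:keep]


def run(rounds, anti_inductive, seed=0, target=2.0):
    """Run the loop from a seeded random pool and return the best score left in it."""
    rng = random.Random(seed)
    pool = [SubModel(rng.uniform(-5.0, 5.0)) for _ in range(3)]
    for _ in range(rounds):
        pool = select_step(pool, target, rng, anti_inductive=anti_inductive)
    return max(m.score(target) for m in pool)


if __name__ == "__main__":
    print("normal selection, best score:", run(50, anti_inductive=False))
    print("hijacked selection, best score:", run(50, anti_inductive=True))
```

Because the previous pool is always among the candidates, ordinary selection can never lower the best score in the pool, while the flipped rule can never raise it: once the rule is hijacked, nothing inside this loop pushes back.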

Even if we had an architecture that does not allow full self-modification, so that wireheading or changing the meta mechanism is not possible, the machine that cannot fully self-modify might find out that it is very efficient to simulate a smaller AI that can self-modify. And the simulated AI would work reasonably for a long time, and then suddenly start doing very stupid things... and before the simulating AI realizes that something went wrong, maybe some irreparable damage has already happened.

...this is all too abstract for me, so I have no idea whether what I wrote here actually makes any sense. I hope smarter minds may look at this and extract the parts that make sense, assuming there are any.

↑ comment by Rachel Shu (wearsshoes) · 2020-07-31T03:17:03.036Z

comment by Lukas_Gloor · 2020-07-31T10:23:23.348Z

In this answer on arguments for hard takeoff, I suggested that memes related to "learning how to learn" could be the secret sauce that enables discontinuous AI takeoff. Imagine an AI that absorbs all the knowledge on the internet but doesn't have a good sense of what information to prioritize or how to learn from what it has read. Contrast that with an AI that acquires better skills at organizing its inner models, making its thinking more structured, creative, and generally efficient. Good memes about how to learn and plan might form an attractor, and AI designs with the right parameters could home in on that attractor in the same way that "great minds think alike." However, if you're slightly off the attractor and give too much weight to memes that aren't useful for truth-seeking and good planning, your beliefs might resemble those of a generally smart person with poor epistemics, or of someone low on creativity who never has genuine insights.

comment by walking_mushroom · 2022-06-16T02:56:12.236Z

I find this perspective interesting (and confusing), and want to think about it more deeply. Can you recommend any reading that would help me better understand what you're thinking, or what led you to this idea specifically?

Beyond the possible implications you mentioned, I think this might be useful in clarifying the 'trajectory' of agent selection pressure far from the theoretical extremes, which Richard Ngo discussed in the "AGI Safety from First Principles" sequence.

My vague intuition is that successful, infectious memes work by reconfiguring agents to shift from one fix point in policy to another while not disrupting utility. Does that make sense?

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-06-16T05:57:50.689Z

Thanks! Excellent point about the connection to the trajectory of agent selection pressure.

I don't remember what led me to this idea in particular. I've been influenced by doing a lot of thinking about agent foundations and metaethics and noticing the ways in which humans don't seem to be well modelled as utility maximizers or even just any sort of rational goal-directed agents with stable goals. I also read the book "The Meme Machine" and liked it, though that was after writing this post, not before, IIRC.

I don't know what you mean by fixed points in policy. Elaborate?

↑ comment by walking_mushroom · 2022-06-17T08:50:33.729Z

"I don't know what you mean by fixed points in policy. Elaborate?"

I might have slightly abused the term "fix point" and been unnecessarily wordy.

I mean that, although I don't see how memes can change agents' objectives in a fundamental way, memes do influence "how certain objectives are being maximized". The low-level objectives stay the same, yet the policies that pursue them are implemented differently because the agents received different memes. I think it's vaguely like an externally installed bias.

Ex: humans all crave social connection, but people model their relationship with society and interpret that desire differently, partially depending on cultural upbringing (memes).

I don't know if higher levels of intelligence/being more rational/coherent cancel out these effects, e.g. a smarter version of the agent now thinks more generally about all possible policies, finds there's an 'optimal' way to realize a certain objective, and is no longer steered by memes/biases. Though I think such convergence is less likely in open-ended tasks, because the current space of policies is built upon previously developed solutions and tools and is highly path-dependent in general. So memes early on might matter more for open-ended tasks.

I'm also thinking about agency foundations at the moment, and I'm also confused about the generality of the utility maximizer frame. One simple answer to why humans don't fit the frame is "humans aren't optimizing hard enough (so haven't shown convergence in policy)". But this answer doesn't clarify "what happens when agents aren't as rational/hard-optimizing", "the dynamics and preconditions when agents in general become more rational/coherent/utility-maximizing", etc., so I'm not happy with my state of understanding on this matter.

The book looks cool, will read soon, TY!

(btw this is my first interaction on lw so it's cool :) )

comment by Donald Hobson (donald-hobson) · 2020-08-23T21:41:03.641Z

I think that there is an unwarranted jump here from (humans are highly memetic) to (AIs will be highly memetic).

I will grant you that memes have a substantial effect on human behaviour. It doesn't follow that AIs will be like this.

Your implications would only be strongly supported if there were a good argument that AIs should be meme-driven.

↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2020-08-24T01:26:56.572Z

I didn't take myself to be arguing that AIs will be highly memetic, but rather just floating the possibility and asking what the implications would be.

Do you have arguments in mind for why AIs will be less memetic than humans? I'd be interested to hear them.
