AI alignment via civilizational cognitive updates

post by AtillaYasar (atillayasar) · 2024-11-10T09:33:35.023Z · LW · GW · 10 comments

Contents

  The gist
  Explore vs exploit ("don't buttclench")
  List of thoughts related to memetic evolution and AI alignment
    Civilizational change as memetic evolution
    Social media performs memetic updates
    Wokeism as case study
    Connor Leahy's emphasis
    Alignment discourse lacks updates
    Fix big lab incentives?
    World will change fast
    Meta-thoughts about the above list
  Side note: GUI tools, externality of cognition
  Betting on the human spirit
  Memetic evolution bad?

(This started as a reply to @Tamsin Leake [LW · GW]'s reply in my post about why cyborgism maybe should be open [LW · GW]. This post does not require you to read our interaction, though it led to this, and I'm very grateful for Tamsin's reply.)

In general, this is a counterargument against:

we should only share cyborg tools (software that lets AI help us think) among AI safety people, so that the big labs don't get ahold of them, so we can save the world before they end it

 

The gist

My idea is that IF humanity doesn't want to die, we can discover this by maximizing information sharing and "converging" our culture and discourse towards what is *actually* going on with AI: what systems are capable of, and that society isn't equipped to handle it. That will then cause humanity to resolve its dislike of the state of the world, by creating safe AI and/or institutions for creating safe AI.

(or whatever the path may be, which we don't know yet!!)

Tools that (in part via AI) help us think, help us share ideas and parse information and find information, will speed up memetic evolution.

Can't solve a problem you can't see.

 

Explore vs exploit ("don't buttclench")

Something feels wrong about AI safety people not wanting to discuss insights "because big bad tech will use it to build stronger AI".
(It doesn't jibe with my personality at all because it feels very buttclenchy, so I'm aware that I'm biased against this way of viewing things, but I'll keep writing anyway)

I wanna propose this framing: 

how much you share moves you along a spectrum, with fast learning on one side and "the enemy" getting bad information on the other.


I believe we still need a LOT of information, and therefore should err on "share more and learn more".

If AI safety is a massive civilizational coordination problem, we need all the memetic-evolutionary pressure we can get. Technology that helps us think and communicate is the way to perform such updates.

 

List of thoughts related to memetic evolution and AI alignment

 

Civilizational change as memetic evolution

Basically, updates happen along 2 axes: raising update speed and widening the information bottleneck.
Population density increased both of those (hunter/gatherer -> villages -> cities), and so did the printing press and social media. (Specialization is also a factor here, though you can see specialization as a result of a widening search space + a widening information bottleneck, by society listening to more people on the fringes.)

 

Social media performs memetic updates

I think the magnitude and speed of updates that happen purely via Twitter and Youtube (maybe TikTok too), and the effect of that, are really important to understand, and I would guess that if, many years from now, we looked back at the years 2000-2030 with a sophisticated understanding of memetics, they would be a central topic.

 

Wokeism as case study

There's a way to view the insanity of wokeism as a runaway memetic phenomenon that developed because of social media, which is a novel and powerful technology for spreading and developing memes. (as @Connor Leahy [LW · GW] put it somewhere, "mentally ill teenagers developing increasingly deranged memetic viruses and unleashing them upon the population")

People say wokeism peaked in 2020, maybe its arc can serve as a case study for understanding memetics (there are historical examples too, like religions, Nazism, slavery and its abolishment, communism, many many trends and social movements I'm ignorant about..)

 

Connor Leahy's emphasis

He often talks about AI safety in terms of "civilizational coordination" (and in his/Conjecture's recent creation The Compendium [LW · GW] it's emphasized even more), which makes me wanna "update like a Bayesian" in this direction, or think about it more seriously until I can refute it or extract insights.

 

Alignment discourse lacks updates

I've heard multiple very smart people/good thinkers criticize @Eliezer Yudkowsky [LW · GW] and this field in general as not having updated well on modern AI (despite having great insights years ago when nobody saw it coming).

This is more a "Bayesian update in this direction", not object level, but anyway it does feel like alignment discourse is not talking about actual specific models and developments that are happening *right now*, though Anthropic is probably doing this more than anybody via mechanistic interpretability.

You can argue "but we're worried about future AIs" and yes I agree, but I find it very suspicious that that argument excuses the lack of updates.
(there's probably a logical fallacy or epistemological sin I'm committing here, but whatever)

 

Fix big lab incentives?

It's often said that big lab CEOs and their armies of researchers are very well-intentioned people who simply misunderstand and (maybe due to personal flaws/laziness/ambitions of grandeur) underestimate the colossal forces acting upon them (profit incentive being one).

So maybe they even become your ally if the system in which they're embedded is more aligned with your values, or maybe they'll jump out if its unaligned-ness is more globally and more concretely understood. (via memetic/cultural evolution)

(or as @Connor Leahy [LW · GW] might put it, if I understand his ideas that he explains in this podcast, "if you can make the gods and forces that the big labs are controlled by, do your bidding")

 

World will change fast

With short timelines (end of 2028, which is generally what I believe, but if it's longer, this point is even stronger), the world will change *massively* via not-yet-world-ending-AI.

The value of adapting to changes (by sharing information and arguments and insights), and the value of a civilization that is able to adapt, increases proportionally to how much things are changing.

Sama testifying before Congress, and Dario talking openly on Dwarkesh's podcast about 25% doom, updated the discourse.

 

Meta-thoughts about the above list

I hope to expand and elaborate on those topics... and part of why I'm writing on LW is that you can read an article's preview on-hover, which lets you effectively create a web of posts and build ideas. This website is a powerful tool for memetic evolution.

I'm sure almost everything there is already covered by multiple people with much more depth and writing skill -- if only the technology to find and reference such articles (which I bet actually already exists in multiple forms) was more widespread and well-known!!

 

Side note: GUI tools, externality of cognition

This topic really really fascinates me and is super personal to me, because I've been building the app that I'm currently working on for over a year, and it's very much been a tool to boost my cognition and has helped my mental health immeasurably. The same was true even before that with other apps, pretty soon after I started learning to code ~4 years ago.
(in very short, it's a desktop app written in Tkinter: a glorified note taking system + code editor/executor with many windows and tabs and other widget types, where you can talk to LMs and create hotkeys for arbitrary code written within the application)

Simply the ability to copypaste a shitload of text into a chat window and get a summary (or any shape of breakdown or cognitive work of your choosing) is extremely valuable -- this article is a result of many cycles of copypaste to Claude -> edit -> repeat -- and seems to yield like 95+% of the benefits of AI despite literally hundreds of hours of engineering work and thinking on my part to create better tools. (and god knows how many tens of thousands of hours from big labs' engineers)

(the non-AI benefits are more about being able to organize my thoughts and feelings better, iterate on UI design, search across the app/my files, and these changes do very much respond to engineering efforts and to thinking)

How and why and in what ways tools and AI help our cognition, what the "landscape of cognitive tasks and abilities" looks like, what AI does and doesn't help with and why, why note-taking is so good, cognition being outsourced to our social interactions, etc. etc. -- I'm basically planting a flag around these topics. I hope to write about this more, I think it's fascinating and that the ceiling for empowering us via externalized cognition + AI tooling is very very high.

(And I suspect, based on my experience with this app and older intuitions, that there is way more benefit in AI-less tooling than we know (in very short: think of the speed and quantity of processing our visual system performs on a video or image of a scene, compared to reading and parsing information, it's like 100x at least, in both speed and bandwidth))

Part of why I haven't written about it yet and instead am writing posts like this, is that maybe discussing such ideas will "raise p(doom)" by empowering the AI industry, somehow via second- and third-order effects. It's also very frustrating that my ideas might be dumb and trivial, and that I can't even discover this without writing and publishing.

 

Betting on the human spirit

Maximizing global memetic evolution is fundamentally a bet on the human spirit and on the power of a globally cooperating and evolving civilization -- which is basically an expression of the human spirit.

Whereas "buttclenching", ie, "we safety people will keep everything secret and create an aligned AI, ship it to big labs and save the world before they destroy it (or directly use the AI to stop them)", is a bet on a small number of AI safety people, and on the brilliance of individual humans, as opposed to the larger system in which we are embedded.

(by "cooperation" I don't mean "everyone agrees on a goal and then does it in unison", I mean, "everyone is in some kind of communication/bit-sharing/mutual exploitation/applying adaptation pressure, adversarial or otherwise". Which was nicely pointed out by Connor in this segment of his Bankless podcast appearance (in short, he praised the podcaster for suing the US government [ something crypto related ] because that's a mechanism of civilizational coordination, regardless of being object-level correct about his case))

 

Memetic evolution bad?

> But what if it's actually a bad thing to allow humans to understand each other more frictionlessly and influence each other more rapidly and globally converge on things? What if we converge towards hell, and the only way to save us is an aligned ASI to stop all badness?

> Idk man we're just fucked then?

 

(edit: followup-ish: launched more general exploration into memetics [LW · GW] )

10 comments

Comments sorted by top scores.

comment by Connor Leahy (NPCollapse) · 2024-11-10T15:39:40.332Z · LW(p) · GW(p)

Hi, as I was tagged here, I will respond to a few points. There are a bunch of smaller points only hinted at that I won't address. In general, I strongly disagree with the overall conclusion of this post.

There are two main points I would like to address in particular:

1 More information is not more Gooder

There seems to be a deep underlying confusion here that in some sense more information is inherently more good, or inherently will result in good things winning out. This is very much the opposite of what I generally claim about memetics. Saying that all information is good is like saying all organic molecules or cells are equally good. No! Adding more biosludge and toxic algal blooms to your rosegarden won't make it better!

Social media is the exact living proof of this. People genuinely thought social media would bring everyone together, resolve conflicts, create a globally unified culture and peace and democracy, that autocracy and bigotry couldn't possibly thrive if you only had enough information. I consider this hypothesis thoroughly invalidated. "Increasing memetic evolutionary pressure" is not a good thing! (all things equal)

Increasing the evolutionary pressure on the flu virus doesn't make the world better, and viruses mutate a lot faster than nice fluffy mammals. Most mutations in fluffy mammals kill them; mutations in viruses help them far more. Value is fragile. It is asymmetrically easier to destroy than to create.

Raw evolution selects for fitness/reproduction, not Goodness. You are just feeding the Great Replicator.

For an accessible intro to some of this, I recommend the book "Nexus" by Yuval Harari. (not that I endorse everything in that book, but the first half is great)

2 "Pivotal Act" style theories of change

You talk about theories of change of the form "we safety people will keep everything secret and create an aligned AI, ship it to big labs and save the world before they destroy it (or directly use the AI to stop them)". I don't endorse, and in fact strongly condemn, such theories of change. 

But not because of the hiding information part, but because of the "we will not coordinate with others and will use violence unilaterally" part! Such theories of change are fundamentally immoral for the same reasons labs building AGI is immoral. We have a norm in our civilization that we don't as private citizens threaten to harm or greatly upend the lives of our fellow civilians without either their consent or societal/governmental/democratic authority.

The not sharing information part is fine! Not all information is good! For example, Canadian researchers a while back figured out how to reconstruct an extinct form of smallpox, and then published how to do it. Is this a good thing for the world to have that information out there?? I don't think so. Should we open source the blueprints of the F-35 fighter jet? I don't think so, I think it's good that I don't have those blueprints!

Information is not inherently good! Not sharing information that would make the world worse is virtuous. Now, you might be wrong about the effects of sharing the information you have, sure, but claiming there is no tradeoff or the possibility that sharing might actually, genuinely, be bad, is just ignoring why coordination is hard.

3 Conclusion

If you ever find yourself thinking something of the shape "we must simply unreservedly increase [conceptually simple variable X], with no tradeoffs", you're wrong. Doesn't matter how clever you think X is, you're wrong. Any real life, not fake complex thing is made of towers upon towers of tradeoffs. If you think there are no tradeoffs in whatever system you are looking at, you don't understand the system.

Memes are not our friends. Conspiracy theories and lies spread faster than complex, nuanced truth. The printing press didn't bring the scientific revolution, it brought the witch burnings and the Thirty Years' War. The scientific revolution came from the Royal Society and its nuanced, patient, complex norms of critical inquiry. Yes, spreading your scientific papers was also important, it was necessary but not sufficient for a good outcome.

More mutation/evolution, all things equal, means more cancer, not more health and beauty. Health and beauty can come from cancerous mutation and selection, but it's not a pretty process, and requires a lot of bloody, bloody trial and error (and a good selection function). The kind of inefficient and morally abominable process I would prefer us not relying on.

 

With that being said, I think it's good that you wrote things down and are thinking about them, please don't take what I'm saying as some kind of personal disparaging, I wish more people wrote down their ideas and tried to think things through! I think there are indeed a lot of valuable things in this direction, around better norms, tools, processes and memetic growth, but they're just really quite non-trivial! You're on your way to thinking critically about morality, coordination and epistemology, which is great! That's where I think real solutions are!

Replies from: cleo-scrolls, feugjavnpolj, shankar-sivarajan, atillayasar
comment by Cleo Scrolls (cleo-scrolls) · 2024-11-12T17:51:17.883Z · LW(p) · GW(p)

That last paragraph seems important. There’s a type of person that doesn’t have an opinion yet in AI discourse, which is new, and will bounce off the "side" that appears most hostile to them--which, if they have misguided ideas, might be the truth-seeking side that gently criticizes. (Not saying that's the case for the author of this post!) 

It’s really hard to change the mind of someone who’s found their side in AI. But not to have them join one in the first place! 

Replies from: atillayasar
comment by AtillaYasar (atillayasar) · 2024-11-13T07:14:41.356Z · LW(p) · GW(p)

Despite being "into" AI safety for a while, I haven't picked a side. I do believe it's extremely important and deserves more attention and I believe that AI actually could kill everyone in less than 5 years.

But any effort spent on pinning down one's "p(doom)" is not spent usefully on things like: how to actually make AI safe, how AI works, how to approach this problem as a civilization/community, how to think about this problem. And, as was my intention with this article, "how to think about things in general, and how to make philosophical progress".

comment by feugjavnpolj · 2024-11-11T20:46:18.817Z · LW(p) · GW(p)

I'm worried you're not seeing this at a long enough timescale.

I'm claiming:

1. "information sharing is good" is an invariant as timeless as "people will sacrifice truth and empathy for power", you can't claim Moloch wins based on available evidence.

2. both of these are more powerful than short-term effects which we can forecast

On 1:

Increased information sharing leads to faster iteration. Faster iteration of science and technology leads to increased power derived from technology. Faster iteration of social norms and technologies leads to increased power derived from better coordination.

It is not a coincidence that the USA is simultaneously the most powerful and one of the most tolerant societies in human history.

Suppose you were the inventor of the Gutenberg press deciding whether to release your technology or not. Maybe you could have foreseen the witch burnings. Maybe you could've even foreseen something like the 95 theses.

You couldn't have foreseen democracy in France, or that its success would inspire the US. (Which was again only possible because of sharing of information between Europe and the US) You couldn't have foreseen that Jewish physicists leaving Europe for a more tolerant society would invent an atomic bomb that would ultimately bring peace to Europe. You couldn't have foreseen the peace among EU nations in 2024, enforced not just by the threat of the bomb but more strongly via intermixing of its peoples.

If you decided not to release the Gutenberg press because of forecasted witch burnings you might have made a colossal mistake.

Information sharing is argued as good because it relies on principles of human behaviour that survive long after you die, long after any specific circumstances.

Information survives the rise and fall of civilisations. As long as 1-of-n people preserve some information, it is preserved. A basic desire for truth and empathy is universal amongst human beings across space and time, as it's encoded in genetics not culture.

Yes, people are often forced to sacrifice certain values at the altar of other ones, and we see this throughout history. You could call this Moloch. This too is universal.

Both of these are invariants that could hold long after the point where we can forecast specific events.

On 2:

Witch burnings don't prove the Gutenberg press bad.

Social media isn't proven bad on such a short timescale for the same reason witch burnings don't prove the Gutenberg press bad.

You haven't even proven publishing smallpox papers in public is bad. Maybe one day bioweapon research is banned and this is only possible because of public consensus built on publicly available papers such as the smallpox paper.

comment by Shankar Sivarajan (shankar-sivarajan) · 2024-11-11T01:56:49.791Z · LW(p) · GW(p)

societal/governmental/democratic authority.

There is a certain type of person who would look at the mountains of skulls that Genghis Khan piled up and before judging it evil, ask whether it was a state acting or a group of individuals.

Fuck that. States/governments, "democratic" or otherwise, have absolutely no privileged moral status, and to hell with any norm that suggests otherwise, and to hell with any "civilization" that promotes such a norm.

What the state can do is wield violence far more effectively than you, so if you want to level a city, say, Beijing or Moscow, yeah, you should get the US military to do it instead of trying to do it yourself. And it can wield violence against you if you defy its will, so it's a bad idea to do so publicly, but for purely pragmatic reasons, not moral ones.

Replies from: NPCollapse
comment by Connor Leahy (NPCollapse) · 2024-11-11T13:55:55.697Z · LW(p) · GW(p)

Morality is multifaceted and multilevel. If you have a naive form of morality that is just "I do whatever I think is the right thing to do", you are not coordinating or being moral, you are just selfish.

Coordination is not inherently always good. You can coordinate with one group to more effectively do evil against another. But scalable Good is always built on coordination. If you want to live in a lawful, stable, scalable, just civilization, you will need to coordinate with your civilization and neighbors and make compromises.

As a citizen of a modern country, you are bound by the social contract. Part of the social contract is "individuals are not allowed to use violence against other individuals, except in certain circumstances like self defense." [1] Now you might argue that this is a bad contract or whatever, but it is the contract we play by (at least in the countries I have lived in), and I think unilaterally reneging on that contract is immoral. Unilaterally saying "I will expose all of my neighbors to risk of death from AGI because I think I'm a good person" is very different from "we all voted and the majority decided building AGI is a risk worth taking."

Now, could it be that you in some exceptional circumstances need to do something immoral to prevent some even greater tragedy? Sure, it can happen. Murder is bad, but self defense can make it on net ok. But just because it's self defense doesn't make murder moral, it just means there was an exception in this case. War is bad, but sometimes countries need to go to war. That doesn't mean war isn't bad.

Civilization is all about commitments, and honoring them. If you can't honor your commitments to your civilization, even when you disagree with them sometimes, you are not civilized and are flagrantly advertising your defection. If everyone does this, we lose civilization.

Morality is actually hard, and scalable morality/civilization is much, much harder. If an outcome you dislike happened because of some kind of consensus, this has moral implications. If someone put up a shitty statue that you hate in the town square because he's an asshole, that's very different morally from "everyone in the village voted, and they like the statue and you don't, so suck it up." If you think "many other people want X and I want not X" has no moral implications whatsoever your "morality" is just selfishness.[2]

  1. ^

     (building AGI that might kill everyone to try to create your vision of utopia is "using violence")

  2. ^

     (I expect you don't actually endorse this, but your post does advocate for this)

Replies from: shankar-sivarajan
comment by Shankar Sivarajan (shankar-sivarajan) · 2024-11-11T16:28:57.069Z · LW(p) · GW(p)

has no moral implications whatsoever

I do in fact believe morality to be entirely orthogonal to "consensus" or what "many other people" want, and since you call this "selfishness," I shall return the favor and call your view, for all that you frame it as "coordination" or "scalable morality," abject bootlicking.

A roaming bandit's "do what I tell you and you get to live" could be thought of as a kind of contract, I suppose, but I wouldn't consider myself bound by it if I could get away with breaching it. I consider the stationary bandits' "social contracts" not to be meaningfully different. One clue to how they're similar is how the more powerful party can go, à la Vader, "Here is a New Deal. Pray I don't renew it any further." Unilaterally reneging on such a contract when you are the weaker party would certainly be unwise, for the same reason trying to stand between a lynch mob and its intended victim would be -- simple self-preservation -- but I condemn the suggestion that it would be immoral.

If everyone does this, we lose civilization.

I see what you call "civilization," and I'm against it. I vaguely recall reading of a medieval Christian belief that if everyone stopped sinning for a day, Christ would return and restore the Kingdom of Heaven. This reminds me of that: would be nice, but it ain't gonna happen.

Replies from: atillayasar
comment by AtillaYasar (atillayasar) · 2024-11-13T07:42:52.306Z · LW(p) · GW(p)

I agree that morality and consensus are in principle not the same -- Nazis or any evil society is an easy counterexample.
(One could argue Nazis did not have the consensus of the entire world, but you can then just imagine a fully evil population.)

But for one, simply rejecting civilization and consensus based on "you have no rigorous definition of them and also look at Genghis Khan/Nazis this proves that governments are evil" is like, basically, putting the burden of proof on the side that is arguing for civilization and common sense morality, which is suspicious.

I'm open to alternatives, but just saying "governments can be evil therefore I reject it, full stop" is not that helpful for discourse. Like what do you wanna do, just abolish civilization?

So consider a handwavy view of morality and of what a "good civilization" looks like. Let's assume common sense morality is correct, and that mostly everyone understands what this means: "don't steal, don't hurt people, leave people alone unless they're doing bad things, don't be a sex pervert, etc.". And assume most people agree with this and want to live by this. Then when you have consensus, meaning most people are observing and agreeing that civilization (or practically speaking, the area or community over which they have a decent amount of influence), is abiding by "common sense morality", then everything is basically moral and fine.

 

 

(I also want to point out that caring too much about pinpointing what morality means exactly and how to put it into words distracts from solving practical problems where it's extremely obvious what is morally going wrong, but where you have to sort out the "implementation details".)

comment by AtillaYasar (atillayasar) · 2024-11-10T19:09:14.806Z · LW(p) · GW(p)

TLDR:
Here's all the ways in which you're right, and thanks for pointing these things out!


At a meta-level, I'm *really* excited by just how much I didn't see your criticism coming. I thought I was thinking carefully, and that iterating on my post with Claude (though it didn't write a single word of it!) was taking out the obvious mistakes, but I missed so much. I have to rethink a lot about my process of writing this.

I strongly agree that I need a *way* more detailed model of what "memetic evolution" looks like, when it's good vs bad, and why, whether there's a better way of phrasing and viewing it, dig into historical examples, etc.

I'm curious if social media is actually bad beyond the surface -- but again I should've anticipated "social media kinda seems bad in a lot of ways" being such an obvious problem in my thinking, and attended to it.

Reading it back, it totally reads as an argument for "more information more Gooder", which I didn't see at all. (generally viewing the post as "more X is always more good" is also cool as in, a categorization trick that brings clarity)

I think a good way to summarize my mistake is that I didn't "go all the way" in my (pretty scattered) lines of thinking.

 You're on your way to thinking critically about morality, coordination and epistemology, which is great!

Thanks :)  A big part of why I got into writing ideas explicitly and in big posts (vs off-hand Tweets/personal notes), is because you've talked about this being a coordination mechanism on Discord.

Replies from: atillayasar
comment by AtillaYasar (atillayasar) · 2024-11-11T00:36:10.005Z · LW(p) · GW(p)

So I've been thinking more about this...

I think you completely missed the angle of, civilizational coordination via people updating on the state of the world and on what others are up to.

(To be fair I literally wrote in The Gist, "speed up memetic evolution", lol that's really dumb, also explicitly advocated for "memetic evolution" multiple times throughout)

 

Communication is not exactly "sharing information"

Communication is about making sure you know where you each stand and that you resolve to some equilibrium, not that you tell each other your life story and all the object level knowledge in your head.

Isn't this exactly what you're doing when going around telling people "hey guys big labs are literally building gods they don't understand nor control, this is bad and you should know it" ?
I should still dig into what that looks like exactly and when it's done well vs badly (for example you don't tell people how exactly OpenAI is building gods, just that they are).

I'd argue that if Youtube had a chatbot window embedded in the UI which can talk about the contents of a video, this would be a very positive thing, because generally it would increase people's clarity about, and ability to parse, the contents of videos.

 

Clarity of ideas is not just "pure memetic evolution"

Think of the type of activity that could be described as "doing good philosophy" and "being a good reader". This process is iterative too: absorb info from world -> share insight/clarified version of info -> get feedback -> iterate again -> affect world state -> repeat. It's still in the class of "unpredictable memetic phenomena", but it's very very different from what happens on the substrate of mindless humans scrolling TikTok, guided by the tentacles of a recommendation algorithm.

Even a guy typing something into a comment box, constantly re-reading and re-editing and re-considering, will land on (evolve towards) unpredictable ideas (memes). That's the point!