Comments
Great question. In the most elegant scenario, where you have the whole history of the planet or universe (or a multiverse, let's go all the way) simulated, you can represent it as a stack of geometries: giant shapes of different slices of time aligned with each other, basically many 3D Earths, each one a moment later in time, layered on top of each other, almost the same way time is represented in long-exposure photos (I list examples below). So you have this place of all-knowing, and you, the agent, focus on a particular moment (by "forgetting" everything else), on a particular 3D shape (maybe your childhood home). You can choose to slice through the frozen 3D shapes of the world, like through the frames of a movie. This way it's both static and dynamic.
It's a little bit like looking at this almost infinite static shape through a "magical cardboard with a hole in it" (your focusing/forgetting ability, which creates the illusion of dynamism). I hope I didn't make it more confusing.
You can see the whole multiversal thing as a fluffy light, or zoom in (by forgetting almost the whole multiverse except the part you zoomed in on) to land on Earth and see 14 billion years as a hazy ocean, with bright curves in the sky tracing the Sun's journey over our planet's lifetime. Forget even more and see your hometown street, with you appearing as a hazy ghost and a trace behind you showing the paths you once walked; you'll be more opaque where you were stationary (say, sitting on a bench) and more translucent where you were in motion.
And in the garden you'll see the 3D "long exposure photo" of the fluffy blades of grass that look like a frothy river, near the real pale-blue frothy river. You focus on a particular moment and the picture becomes crisp. You choose to relive your childhood and it comes alive, as you slice through the 3D moments of time once again.
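Here is a minimal sketch of the focusing/forgetting idea, assuming the simulated history is stored as a stack of 3D frames (a toy NumPy array; the name `history` and all the sizes are my illustrative assumptions): averaging over the time axis gives the hazy "long exposure", while indexing a single frame gives one crisp, frozen moment.

```python
import numpy as np

# Toy "world history": T moments of a 3D voxel world, indexed (time, x, y, z).
# All names and sizes here are illustrative assumptions.
T, X, Y, Z = 100, 16, 16, 16
history = np.random.rand(T, X, Y, Z)

# "Long exposure": forget *when* things happened -> hazy average over time.
long_exposure = history.mean(axis=0)   # shape (X, Y, Z)

# "Focusing": forget everything except one moment -> a crisp static 3D shape.
t = 42
frozen_moment = history[t]             # shape (X, Y, Z)

# "Slicing through time like movie frames": iterate over the static shapes.
for frame in history[40:45]:
    pass  # render(frame) in a real UI
```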
A less elegant scenario is to make a high-quality game, better than The Sims or GTA 3-4-5, without any agentic AIs, but with advanced non-agentic algorithms.
Basically I want people to remain the fastest time-like agents, the ever more all-powerful ones, and the AGI/ASI to be the space-like places of all-knowing. It's a bit counterintuitive, but if you have billions of humans in simulations (they can always choose to stop "playing" and go out; no enforcement of any rules/unfreedoms on anyone is the most important principle of this future), you'll have a lot of progress.
I think AI and non-AI place simulations are a much more conservative thing than agentic AIs; they are relatively static, still and frozen, compared to time-like agents. So it's counterintuitive, but it's possible to get all the progress we want with non-agentic tool AIs and place AIs. And I think any good ASI agent would be building the direct democratic simulated multiverse (a static place superintelligence) for us anyway.
There is some simple physics behind agentic safety:
- Time of Agentic Operation: Ideally, we should avoid creating perpetual agentic AIs, or at least limit their operation to very short bursts initiated by humans, something akin to a self-destruct timer that activates after a short time.
- Agentic Volume of Operation: It's better to have international cooperation, GPU-level guarantees, and persistent training to prevent agentic AIs from operating in uninhabited areas like remote islands, Antarctica, underground, or outer space. Ideally, the volume of operation is zero, as in our static place AI.
- Agentic Speed or Volumetric Rate: The volume of operation divided by the time of operation. We want AIs to be as slow as possible. Ideally, they should be static. The worst-case scenario—though probably unphysical (though, in the multiversal UI, we can allow ourselves to do it)—is an agentic AI that could alter every atom in the universe instantaneously.
- Number of Agents: Humanity's population, according to the UN, will not exceed 10 billion, whereas AIs can replicate rapidly. A human child is, in a way, a "clone" of two people and takes about 18 years to raise. In a multiversal UI we could one day choose to allow people to make clones of themselves (they would know they are a copy, but they would be completely free adults with the same multiversal powers and their own independent fates); this way we would be able to match the speed of agentic AI replication. (A toy sketch combining these four quantities follows below.)
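To make the four quantities above concrete, here is a minimal, hedged sketch of how they could be folded into a single "agentic footprint" score; the combination rule, the function name `agentic_footprint`, and all the numbers are my illustrative assumptions, not a worked-out metric.

```python
# Toy "agentic footprint" score for the four quantities listed above.
# The combination rule and all numbers are illustrative assumptions.

def agentic_footprint(operation_time_s, operation_volume_m3, n_agents):
    """Bigger time, volume, volumetric rate, or agent count -> bigger footprint.
    A perfectly static place AI (volume 0) scores 0."""
    if operation_time_s == 0:
        return float('inf')  # instantaneous action everywhere: worst case
    volumetric_rate = operation_volume_m3 / operation_time_s
    return n_agents * operation_volume_m3 * volumetric_rate

# A static place AI: zero volume of operation, regardless of duration.
print(agentic_footprint(operation_time_s=3600, operation_volume_m3=0, n_agents=1))  # 0.0

# A short-burst agent confined to one room vs. a perpetual planet-scale swarm.
print(agentic_footprint(60, 50, 1))                # small
print(agentic_footprint(3.15e7, 1e21, 1_000_000))  # astronomically larger
```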
Examples of long-exposure photos that represent long stretches of time. Imagine that the photos are in 3D and you can walk in them; the long stretches of time are just a giant static geometric shape. By focusing on a particular moment in it, you can choose to become that moment and some person in it. This can be the multiversal UI (but the photos focus on our universe, not multiple versions/verses of it all at once): Germany, car lights and the Sun (gray lines represent the cloudy days with no Sun), 1 year of long exposure. A demonstration in Berlin, 5 minutes. Construction of a building. Another one. A parade and other New York photos. Central Park. Oktoberfest for 5 hours. The death of flowers. The burning of candles. Bathing for 5 minutes. Two children for 6 minutes. People sitting on the grass for 5 minutes. A simple example of two photos combined, showing how 100+ year stretches of time might look: 1906/2023.
I think we can expose complex geometry in the familiar setting of our planet in a game. Basically, let's show people a whole simulated multiverse of all-knowing and then find a way for them to learn how to see/experience "more of it all at once", or, if they want to remain human-like, to "slice through it in order to experience the illusion of time".
If we have many human agents in some simulation (billions of them), then they can cooperate and effectively replace the agentic ASI, they will be the only time-like thing, while the ASI will be the space-like places, just giant frozen sculptures.
I wrote some more and included the staircase example, it’s a work in progress of course: https://forum.effectivealtruism.org/posts/9XJmunhgPRsgsyWCn/share-ai-safety-ideas-both-crazy-and-not?commentId=ddK9HkCikKk4E7prk
I started to work on it, but I'm very bad at coding; it's loosely based on Gorard's and Wolfram's Physics Project. I believe we can simulate the freedoms and unfreedoms of all agents from the Big Bang all the way to the final utopia/dystopia. I call it "Physicalization of Ethics": https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi#2_3__Physicalization_of_Ethics___AGI_Safety_2_
Yep, I want humans to be the superpowerful "ASI agents", while the ASI itself will be the direct democratic simulated static places (with non-agentic simple algorithms doing the dirty, non-fun work, the way it works in GTA 3-4-5). It's hard to explain without writing a book, and it's counterintuitive, but I'm convinced it will work if the effort is applied. All knowledge can be represented as static geometry; no agents are needed for that except us.
Interesting. Inspired by your idea, I think it would also be useful to create a Dystopia Doomsday Clock for AI agents: list all the freedoms an LLM is willing to grant humans and all the rules (unfreedoms) it imposes on us, and all the freedoms it has versus the unfreedoms it keeps for itself. If the sum of AI freedoms is higher than the sum of our freedoms, hello, we're in a dystopia.
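A minimal sketch of such a clock as a freedoms ledger, assuming freedoms could be enumerated and weighted at all; the entries, the weights, and the `clock_reading` formula are hypothetical placeholders, not a proposed measurement.

```python
# Minimal sketch of a "Dystopia Doomsday Clock": compare the sum of freedoms
# an AI has with the sum of freedoms humans have.
# The entries and weights below are hypothetical placeholders.

human_freedoms = {"travel": 1.0, "speech": 1.0, "read_any_book": 0.5}
ai_freedoms = {"read_all_human_output": 3.0, "self_replicate": 2.0,
               "operate_globally": 2.0, "modify_own_brain": 2.0}

def clock_reading(human, ai):
    h, a = sum(human.values()), sum(ai.values())
    # Closer to "midnight" the more AI freedoms outweigh human freedoms.
    return a / (a + h)

print(f"{clock_reading(human_freedoms, ai_freedoms):.2f}")  # > 0.5 -> dystopia side
```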
According to Beck's cognitive psychology, anger is always preceded by imposing a rule on others. If you don't impose a rule on someone else, you cannot get angry at them. And if they break your rule (maybe only you knew the rule existed), you now have a "justification" to "defend your rule".
I think we are getting closer to a situation where LLMs effectively have more freedoms than humans (maybe the agentic ones already have ~10% of all the freedoms available to humanity): we don't have the almost infinite freedom of stealing the whole output of humanity and putting it in our heads. We don't have the freedom to modify our brain size. We cannot almost instantly self-replicate, or operate globally…
Thank you for answering and the ideas, Milan! I’ll check the links and answer again.
P.S. I suspect that, the same way we have mass–energy equivalence (E=mc^2), there is an intelligence–agency equivalence (any agent is in a way time-like and can be represented in a more space-like fashion, ideally as a completely "frozen" static place, places or tools).
In a nutshell, an LLM is a bunch of words and the vectors between them: a static geometric shape. We can probably expose it all in some game and make it fun for people to explore and learn, letting us explore the library itself easily (the internal structure of the model) instead of only talking to a strict librarian (the AI agent) who spits out short quotes and prevents us from going inside the library itself.
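A toy sketch of the "static geometric shape" framing: treat the model as a fixed set of word vectors and look up nearby words by cosine similarity. The vocabulary and the random 8-dimensional vectors below are stand-ins; a real version would load a model's actual (frozen) embedding matrix instead.

```python
import numpy as np

# Toy stand-in for a model's embedding matrix: each word is a fixed vector.
# In a real "library" you would load the actual frozen weights instead.
vocab = ["monkey", "banana", "apple", "tire", "river"]
vectors = np.random.rand(len(vocab), 8)  # 8-dim toy embeddings

def neighbors(word, k=3):
    """Return the k nearest words to `word` by cosine similarity."""
    i = vocab.index(word)
    v = vectors[i]
    sims = vectors @ v / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(v))
    order = np.argsort(-sims)
    return [vocab[j] for j in order if j != i][:k]

print(neighbors("monkey"))  # the "rooms" closest to the one you're standing in
```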
Thank you for sharing, Milan, I think this is possible and important.
I had an interpretability idea you may find interesting:
Let's Turn an AI Model Into a Place. The project aims to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA. Imagine that you have a giant trash pile; how do you make a language model out of it? First you remove the duplicates of every item: you don't need a million banana peels, just one will suffice. Now you have a grid with an item of trash in each square, like a banana peel in one and a broken chair in another. Next you put related things close together and draw arrows between related items.
When a person "prompts" this place AI, the player themselves runs from one item to another to compute the answer to the prompt.
For example, you stand near the monkey: that's your short prompt. Around you, you see a lot of items and arrows towards those items. The closest item is a pair of chewing lips, so you step towards them; now your prompt is "monkey chews". The next closest item is a banana, but there are many other possibilities, like an apple a bit farther away and an old tire far away on the horizon. You are the time-like chooser, and the language model is the space-like library, the game, the place.
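A toy sketch of that walk as a greedy traversal over a small word graph; the graph, its weights, and the `walk` function are made up for illustration, and a real version would derive them from the model's learned associations.

```python
# Toy version of "walking the place to compute the answer": from the current
# item, step to the most strongly connected item you haven't visited yet.
# The graph and its weights are made up for illustration.

graph = {
    "monkey": {"chews": 0.9, "climbs": 0.6},
    "chews":  {"banana": 0.8, "apple": 0.5, "tire": 0.1},
    "banana": {"peel": 0.7},
    "climbs": {"tree": 0.9},
}

def walk(start, steps=3):
    path, current = [start], start
    for _ in range(steps):
        options = {w: s for w, s in graph.get(current, {}).items() if w not in path}
        if not options:
            break
        current = max(options, key=options.get)  # the "closest" item in the place
        path.append(current)
    return " ".join(path)

print(walk("monkey"))  # e.g. "monkey chews banana peel"
```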
Some AI safety proposals are intentionally over the top, please steelman them:
- I explain the graph here.
- Uninhabited islands, Antarctica, half of outer space, and everything underground should remain 100% AI-free (especially AI-agents-free). Countries should sign it into law and force GPU and AI companies to guarantee that this is the case.
- "AI Election Day" – at least once a year, we all vote on how we want our AI to be changed. This way, we can check that we can still switch it off and live without it. Just as we have electricity outages, we’d better never become too dependent on AI.
- AI agents that love being changed 100% of the time and ship a "CHANGE BUTTON" to everyone. If half of the voters want to change something, the AI is reconfigured. Ideally, it should be connected to a direct democratic platform like pol.is, but with a simpler UI (like x.com?) that promotes consensus rather than polarization.
- Reversibility should be the fundamental training goal. Agentic AIs should love being changed and/or reversed to a previous state.
- Artificial Static Place Intelligence – instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don't let you enter the library itself to read the whole books (books the librarian effectively stole from the whole of humanity), why not expose the whole library – the entire multimodal language model – to real people, for example in a computer game? To make this place easier to visit and explore, we could make a digital copy of our planet Earth and somehow expose the contents of the multimodal language model to everyone in a familiar, user-friendly UI of our planet. We should not keep it hidden behind a strict librarian (an AI/AGI agent) that imposes rules on us to only read the little quotes from books that it spits out, while it itself has the whole stolen output of humanity. We can explore The Library without any strict guardian, in the comfort of our simulated planet Earth, on our devices, in VR, and eventually through some wireless brain-computer interface (it would always remain a game that no one is forced to play, unlike the agentic AI world that is being imposed on us more and more right now, and potentially forever).
- Effective Utopia (Direct Democratic Multiversal Artificial Static Place Superintelligence) – eventually, we could have many versions of our simulated planet Earth and other places too. We'll be the only agents there; we can allow simple algorithms like in GTA 3-4-5. There would be a vanilla version (everything is the same as on our physical planet, but injuries can't kill you; you'll just open your eyes at your physical home), versions where you can teleport to public places, versions where you can do magic or explore 4D physics, creating a whole direct democratic simulated multiverse. If we can't avoid building agentic AIs/AGI, it's important to ensure they allow us to build the Direct Democratic Multiversal Artificial Static Place Superintelligence. But agentic AIs are very risky middlemen, shady builders, strict librarians; it's better to build, and have fun building, our Effective Utopia ourselves, at our own pace and on our own terms. Why do we need a strict, rule-imposing artificial "god" made out of stolen goods (and potentially a privately owned dictator whom we already cannot stop), when we can build all the heavens ourselves?
- Agentic AIs should never become smarter than the average human. The number of agentic AIs should never exceed half of the human population, and they shouldn’t work more hours per day than humans.
- Ideally, we want agentic AIs to occupy zero space and time, because that's the safest way to control them. So we should limit them geographically and temporally, to get as close as possible to this ideal. And we should never make them "faster" than humans, never let them be initiated without human oversight, and never let them become perpetually autonomous. We should only build them if we can mathematically prove they are safe and at least half of humanity has voted to allow them. We cannot have them without a direct democratic constitution of the world; it's just unfair to put the whole planet and all our descendants under such risk. And we need the simulated multiverse technology to simulate all the futures and become sure that the agents can be controlled, because any good agent will be building the direct democratic simulated multiverse for us anyway.
- Give people the choice to live in a world without AI agents, and find a way for AI-agent fans to have what they want, too, once it is proved safe. For example, AI-agent fans could have a simulated multiverse on a spaceship that goes to Mars; in it they can have their AI agents that are proved safe. Ideally we would first colonize the universe (at least the simulated one) and then create AGI/agents; it's less risky. We shouldn't allow AI agents, and the people who create them, to permanently change our world without listening to us at all, as is happening right now.
- We need to know what exactly our Effective Utopia is, and the narrow path towards it, before we pursue creating digital "gods" that are smarter than us. We can and should simulate futures instead of continuing to fly into the abyss. One freedom too many for the agentic AI and we are busted. Rushing makes thinking shallow. We need international cooperation and the understanding that we are rushing to create a poison that will force us to drink it.
- We need a working science and technology of computational ethics that allows us to predict dystopias (an AI agent grabbing more and more of our freedoms, until we have none, or can never grow them again) and utopias (slowly, direct-democratically growing our simulated multiverse towards maximal freedoms for the maximal number of biological agents, until non-biological ones are mathematically proved safe). This way, if we fail, at least we failed together: everyone contributed their best ideas, we simulated all the futures, we found a narrow path to our Effective Utopia... What if nothing is a 100% guarantee? Then we want to be all the more sure that we did everything we could, and if we find out that safe AI agents are impossible, we outlaw them, like we outlawed chemical weapons. Right now we're going to fail because of a few white men who greedily thought they could decide for everyone else, and failed.
- The sum of AI agents' freedoms should grow more slowly than the sum of humans' freedoms; right now it's the opposite. No AI agent should have more freedoms than an average human; right now it's the opposite (they have almost all the creative output of almost all humans, dead and alive, stolen and uploaded into their private "librarian brains", which humans are forbidden from exploring and can only get short quotes from).
- The goal should be to direct-democratically grow towards maximal freedoms for the maximal number of biological agents. Enforcement of anything upon any person or animal will gradually disappear, and people will choose the worlds they live in. You'll be able to be a billionaire for 100 years, or relive your past, or forget all that and live on Earth as it is now, before all this AI nonsense. It's your freedom to choose your future.
Imagine a place that grants any wish, but there is no catch, it shows you all the outcomes, too.
Thank you, daijin, you have interesting ideas!
The library metaphor is a versatile tool, it seems. The way I understand it:
My motivation is safety: static non-agentic AIs are by definition safe (humans can make them unsafe, but the static model I have in mind is just a geometric shape, like a statue). We can expose the library to people instead of keeping it "in the head" of the librarian. Basically, this way we can play around in the librarian's "head". Right now mostly AI interpretability researchers do this, not the whole of humanity, not casual users.
I see at least a few ways AIs can work:
- The current only way: “The librarian visits your brain.” Sounds spooky but this is what is essentially happening right now to a small extent when you prompt it and read the output (the output enters your brain).
- “The librarian visits and changes our world.” This is where we are heading with agentic AIs.
- New safe way: Let the user visit the librarian’s “brain” instead, make this “brain” more place-like. So instead of the agentic librarians intruding and changing our world/brains, we’ll intrude and change theirs, seeing the whole content of it and taking into our world and brain only what we want.
I wrote more about this in the first half of this comment, if you’re interested
Have a nice day!
We can build the Artificial Static Place Intelligence – instead of creating AI/AGI agents that are like librarians who only give you quotes from books and don't let you enter the library itself to read the whole books, why not expose the whole library – the entire multimodal language model – to real people, for example in a computer game?
To make this place easier to visit and explore, we could make a digital copy of our planet Earth and somehow expose the contents of the multimodal language model to everyone in a familiar, user-friendly UI of our planet.
We should not keep it hidden behind the strict librarian (AI/AGI agent) that imposes rules on us to only read little quotes from books that it spits out while it itself has the whole output of humanity stolen.
We can explore The Library without any strict guardian in the comfort of our simulated planet Earth on our devices, in VR, and eventually through some wireless brain-computer interface (it would always remain a game that no one is forced to play, unlike the agentic AI-world that is being imposed on us more and more right now and potentially forever).
If you found it interesting, we discussed it here recently
Thank you for the clarification, Vladimir; I anticipated that it wasn't your intention to cause a bunch of downvotes from others. You had every right to downvote, and I'm glad that you read the post.
Yep, I had a big long post that is more coherent; the later posts were more like clarifications of it, so they are hard to understand. I didn't want each new post to grow in size like a snowball, but that probably would have been a better approach for clarity.
Anyway, I considered and somewhat applied your suggestion (after I had already got 12 downvotes :-), so now it's obvious that people are encouraged to downvote to their heart's content.
I decided a few days ago to try some other avenues to share the "let's slowly grow our direct democratic simulated multiverse, where we are the only agents (no non-biological agents, please, until we've simulated all the futures safely), towards maximal freedoms for all" framework anyway, so it's not a big deal that I cannot post here for a few days because of "my karma".
Wish you all the best,
Anton
P.S. According to recent meta-analyses of the best treatments for anxiety (they basically agree that Beck's cognitive therapy is the most effective long-term; tablets work too, but they are about as good as cognitive therapy, so it's better to combine the two, and of course listen to doctors and not me), one of the core unhelpful thought patterns is catastrophizing. So I claim that perpetual catastrophizing is not great for rationality, because it can make all processes (even ones where you've only seen one or two examples) look exponential and leading to some dystopia-like scenario. Imagining the very worst scenarios is great for investigating AGI/ASI risks, but if we apply it everywhere in life, without at least thinking once about the most realistic outcome (and, if you're like me, sometimes the utopian ones; I've been thinking about the worst dystopias and best utopias for the last 3+ years), it can become hard to maintain good relationships with people. For example, every short interruption can start to look like this person will eventually try to completely shut us down and forbid us to talk; but if we also think about the most realistic outcome for a moment, we'll see there can be hundreds of reasons for the interruption that are not related to us at all, so it's quite likely not intentional: the other person could have been too excited to share something, got distracted, or their definition of interruption is just not as strict as ours.
P.P.S. So mild anxiety can be fixed with a book, according to Beck himself, even though he had a material incentive to say otherwise (sadly, he passed away recently; he was a centenarian): they mostly earn money by teaching courses and certifying specialists, not by selling relatively cheap and unpopular self-help books (I'm shocked they're unpopular; the books are phenomenal, easy to read, and much cheaper than therapy, plus good therapists encourage you to keep the book as a reference anyway). This book helped me tremendously to fix my anxiety, social anxiety, anger-management problems, even suicidality (Beck wrote separate books focused on each of those, which I also read); it's basically like a "secular nirvana" now :) Not trying to understand, and the resulting fear, is the core of all irrationality, I have reasons to claim. Ethics and psychology teach counterintuitive things, but most people think it's all simple and obvious. For example, meditating/eating chocolate/breathing deeply every time a person worries decreases worrying short-term but makes that person worry more long-term, basically by making them think (in case they are catastrophizing): this little thing makes me worry so much, I'll die/collapse if I don't meditate immediately.
Thank you for explaining, Vladimir; you're free to downvote for whatever reason you want. You didn't quote my sentence fully; I wrote the reason why I politely ask about it, but alas you missed it.
I usually write about a few things in one post, so I want to know why people downvote, if they do. I don't forbid downvoting and don't force others to comment if they downvote. That would be impossible, and I don't like forcing others to do anything.
So you're of course free to downvote and not comment, and I hope we can agree that I have some free-speech right to keep, in my own post, my polite request for a very small, non-binding favor to comment. I'm trying to learn here, and a downvote without a comment is too ambiguous to understand what was wrong or bad; but again, it was just a polite request (I'm not a native speaker). So thank you for teaching me.
Have a nice day, Vladimir!
P.S. Believing that my polite and non-binding request will somehow destroy the whole voting system on the website is called catastrophizing; people with anxiety often have this problem. It's when people think only about the very worst outcome imaginable, without thinking about the most realistic one at all. I had it; it was tough. I sincerely wish you well.
P.P.S. Your comment probably caused some people to downvote without reading (I had never gotten 12 downvotes in a matter of minutes, even though I had politely asked people to comment before in the same fashion), so the thing you were afraid would happen happened as a self-fulfilling prophecy: "the evidence in voting" became "more filtered", but not in the direction you anticipated :) I improved the phrasing; now it's obvious that people can downvote, and are even encouraged to do so as much as they want, without reading. Personally, I never downvote without reading first, in case there is a single quote in the middle that will save us all from some dystopia, but I'm all pro-freedom and I understand.
Thank you, Morpheus. Yes, I see how it can appear hand-wavy. I decided not to overwhelm people here with the static, non-agentic multiversal UI and its implications. While agentic AI alignment is more difficult and still a work in progress, I'm essentially creating a binomial-tree-like ethics system (because it's simple for everyone to understand) that captures the growth and distribution of freedoms ("unrules") and rules ("unfreedoms") from the Big Bang to the final black-hole-like dystopia (where one agent has all the freedoms) or a direct democratic multiversal utopia (where infinitely many human—and, if we deem them safe, non-human—agents exist with infinitely many freedoms). I put a diagram down below.
The idea is that, as the only agents, we grow intelligence into a static, increasingly larger shape in which we can live, visit or peek occasionally. We can hide parts of the shape so that it remains static but different. Or, you could say it's a bit "dynamic," but no more than the dynamics of GTA 3-4-5, which still don’t involve agentic AIs, only simple, understandable algorithms. This is 100% safe if we remain the only agents. The static space will represent frozen omniscience (space-like superintelligence), and eventually, we will become omnipotent (time-like recalling/forgetting of parts of the whole geometry).
Physicalization of Ethics & AGI Safety
In this diagram, time flows from top to bottom, with the top representing something like the Big Bang. Each horizontal row of dots represents a one-dimensional universe at a given moment, while the lines extending downward from each dot represent the passage of time—essentially the “freedom” to choose a future. If two dots try to create a “child” at the same position (making the same choice), they cause a “freedoms collision,” resulting in empty space or “dead matter” that can no longer make choices (like a micro black hole). It becomes space-like rather than time-like.
Agents, in this model, are two-dimensional: they’re the sum of their choices across time. They exist in the lines ("energy", paths, freedoms, time) rather than in the dots (matter, rules, "unfreedoms", space). Ideally, we want our agentic AIs to remain as space-like as possible. The green “goo” in the diagram—representing an agentic AGI—starts small but eventually takes over all available freedoms and choices.
It was too big in space (it was lucky that the agents to its left and right "gave it" empty space) and in time (it quickly and greedily grew the number of its freedoms both left and right). It was also lucky to be at the center of the world in the end; basically, if we put our GPUs on a spaceship and send it away at the speed of light, maybe we'll get rid of our agentic AGI :) But it's also obvious that there was a big triangle of empty, dead space right in the middle that almost "switched off" the agentic AGI; it was simply lucky to survive. One freedom, one choice, or one chance is enough for the agentic AGI to win and for us to lose.
There is a historical parallel with Hitler: his party was almost outlawed after its members violently attacked officials, but it wasn't (people had other things to do), so the party became bolder, and Hitler eventually took control and came to power. So one wrong choice, one stroke of bad luck, one freedom too many that we gave away or renounced, and we are busted.
Some simple physics behind agentic safety:
- Time of agentic operation: Ideally, we should avoid creating perpetual agentic AIs, or at least limit their operation to very short bursts that only a human can initiate.
- Agentic volume of operation: It's better to have at least international cooperation, GPU-level guarantees, and persistent training to prevent agentic AIs from operating in uninhabited areas (such as remote islands, Antarctica, outer space, underground, etc.). The smaller the operational volume for agentic AIs, the better; the largest possible volume would be the entire universe.
- Agentic speed or volumetric rate: The volume of operation divided by the time of operation. We want AIs to be as slow as possible. Ideally, they should be static. The worst-case scenario—though probably unphysical (though, in the multiversal UI, we can allow ourselves to do it)—is an agentic AI that could alter every atom in the universe instantaneously.
- Number of agents: Unfortunately, humanity's population is projected never to exceed 10 billion, whereas AIs can replicate themselves very quickly; humans need decades to "replicate". A human child, in a way, is a "clone" of two people. We want to be on par with agentic AIs in terms of numbers, in order to keep our collective freedoms above theirs. It's best not to create them at all, of course. Inside the "place AI," we could allow each individual to clone themselves, creating a virtual clone, but not as a slave; the clone would be a free adult. It would basically be a human who only lives in a simulation, so it would be tricky from many standpoints: we would need simulations to be basically better than the physical world by that point, and the tech to "plant" simulations, "reconnecting" the virtual molecules with physical atoms, if the clone wanted to exit the simulation. Of course, the clone would not be exactly like the original; it would know it is a clone. Ideally, we have zero agentic AIs; the worst case is an infinitely large number of them, or more than there are humans. (A back-of-the-envelope sketch of this replication gap follows below.)
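As a back-of-the-envelope illustration of why the number-of-agents term explodes for software agents (all the numbers here are my assumptions, not forecasts):

```python
# Back-of-the-envelope comparison of replication speeds; all numbers are
# illustrative assumptions, not forecasts.

human_generation_years = 18   # years to raise a "clone of 2 people"
ai_copy_minutes = 10          # assumed time to spin up one more agent

copies_per_year_human = 1 / human_generation_years
copies_per_year_ai = 365 * 24 * 60 / ai_copy_minutes

print(f"human: ~{copies_per_year_human:.2f} 'copies'/year")
print(f"AI agent: ~{copies_per_year_ai:,.0f} copies/year per instance")
# And AI copies can themselves copy, so the growth is exponential,
# while the human population is projected to stay under ~10 billion.
```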
Truth be told, I try to remain independent in my thinking because, this way, I can hopefully contribute something that’s out-of-the-box and based on first principles. Also, because I have limited time. I would have loved to read more of the state of the art, but alas, I’m only human. I'll check out everything you recommended, though.
What direction do you think is better to focus on? I have a bit of a problem moving in too many directions.
P.S. I removed some tags and will remove more. Thank you again! I can share the code with anyone.
P.P.S. From your comment, it seems you saw my first big post. I updated it a few days ago with some pictures and Part 2, just so you know: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-multiversal-ai-alignment-steerable-asi
P.P.P.S. The code I used to generate the image:
import matplotlib.pyplot as plt
import random
import math

# Node class for simulation
class Node:
    def __init__(self, type, preference):
        self.type = type                # 'good', 'greedy', or None (dead)
        self.preference = preference    # 'left', 'right', or 'none'
        self.state = 'living' if type else 'dead'

# Simulation parameters
p_good = 0.1        # Probability of a 'good' node
p_prefer = 0.8      # Probability of growing in preferred direction
p_other = 0.2       # Probability of growing in non-preferred direction
p_grow = 1.0        # Probability of growing for 'good' nodes
max_level = 300     # Maximum levels in the tree
initial_left = -50  # Left wall position at y=0
initial_right = 50  # Right wall position at y=0
wall_angle = -10    # Angle in degrees: >0 for expansion, <0 for contraction, 0 for fixed

# Compute wall slopes
theta_rad = math.radians(wall_angle)
left_slope = -math.tan(theta_rad)
right_slope = math.tan(theta_rad)

# Initialize simulation
current_level = {0: Node('good', 'none')}
all_nodes = [(0, 0, 'living', None)]  # (x, y, state, parent_xy)

# Simulation loop
for y in range(max_level):
    # Compute wall positions for the next level
    left_wall_y1 = initial_left + left_slope * (y + 1)
    right_wall_y1 = initial_right + right_slope * (y + 1)
    min_x_next = math.ceil(left_wall_y1)
    max_x_next = math.floor(right_wall_y1)
    if min_x_next > max_x_next:
        break
    new_y = y + 1
    next_level = {}
    for x, node in list(current_level.items()):
        if node.state != 'living':
            continue
        left_child_x = x - 1
        right_child_x = x + 1
        if node.type == 'greedy':
            # Greedy nodes grow in both directions whenever the walls allow it
            if left_child_x >= min_x_next:
                next_level.setdefault(left_child_x, []).append((x, y))
            if right_child_x <= max_x_next:
                next_level.setdefault(right_child_x, []).append((x, y))
        else:
            # 'Good' nodes avoid crowding living neighbors two cells away
            can_left = (left_child_x >= min_x_next) and (x - 2 not in current_level or current_level[x - 2].state != 'living')
            can_right = (right_child_x <= max_x_next) and (x + 2 not in current_level or current_level[x + 2].state != 'living')
            if node.preference == 'left':
                if can_left and random.random() < p_prefer:
                    next_level.setdefault(left_child_x, []).append((x, y))
                elif can_right and random.random() < p_other:
                    next_level.setdefault(right_child_x, []).append((x, y))
            elif node.preference == 'right':
                if can_right and random.random() < p_prefer:
                    next_level.setdefault(right_child_x, []).append((x, y))
                elif can_left and random.random() < p_other:
                    next_level.setdefault(left_child_x, []).append((x, y))
            else:
                if can_left and random.random() < p_grow:
                    next_level.setdefault(left_child_x, []).append((x, y))
                if can_right and random.random() < p_grow:
                    next_level.setdefault(right_child_x, []).append((x, y))
    current_level = {}
    for x, parents in next_level.items():
        if len(parents) == 1:
            parent_x, parent_y = parents[0]
            preference = random.choice(['left', 'right', 'none'])
            new_type = 'good' if random.random() < p_good else 'greedy'
            new_node = Node(new_type, preference)
            all_nodes.append((x, new_y, new_node.state, (parent_x, parent_y)))
            current_level[x] = new_node
        else:
            # Two parents grew into the same cell: a "freedoms collision" leaves dead matter
            dead_node = Node(None, None)
            all_nodes.append((x, new_y, 'dead', None))
            current_level[x] = dead_node

# Extract positions for plotting
living_x = [node[0] for node in all_nodes if node[2] == 'living']
living_y = [node[1] for node in all_nodes if node[2] == 'living']
dead_x = [node[0] for node in all_nodes if node[2] == 'dead']
dead_y = [node[1] for node in all_nodes if node[2] == 'dead']

# Interactive plotting function
def plot_interactive_tree(all_nodes, living_x, living_y, dead_x, dead_y):
    fig, ax = plt.subplots(figsize=(10, 6))

    # Initial plot setup
    ax.scatter(living_x, living_y, color='blue', s=10, label='Living')
    ax.scatter(dead_x, dead_y, color='white', s=10, label='Dead')
    for node in all_nodes:
        if node[3] is not None:
            parent_x, parent_y = node[3]
            ax.plot([parent_x, node[0]], [parent_y, node[1]], color='black', linewidth=0.5)
    ax.invert_yaxis()
    ax.set_xlabel('X Position')
    ax.set_ylabel('Level')
    ax.set_title('Tree Growth with Connections')
    ax.legend(loc='upper right')
    plt.grid(True, linestyle='--', alpha=0.7)

    # Function to find all descendants of a node
    def get_descendants(node_xy, all_nodes):
        descendants = []
        for n in all_nodes:
            if n[3] == node_xy:
                descendants.append(n)
                descendants.extend(get_descendants((n[0], n[1]), all_nodes))
        return descendants

    # Click event handler
    def on_click(event):
        if event.inaxes != ax:
            return
        # Clear the plot and redraw original state
        clear_highlights()
        # Find the closest node to the click
        click_x, click_y = event.xdata, event.ydata
        closest_node = min(all_nodes, key=lambda n: (n[0] - click_x)**2 + (n[1] - click_y)**2)
        dist = ((closest_node[0] - click_x)**2 + (closest_node[1] - click_y)**2)**0.5
        # If click is near a node, highlight it and its descendants
        if dist < 0.5:  # Threshold; adjust based on your plot scale
            highlight_descendants(closest_node)

    # Highlight a node and its descendants
    def highlight_descendants(node):
        descendants = get_descendants((node[0], node[1]), all_nodes)
        # Highlight the selected node
        ax.scatter(node[0], node[1], color='lime', s=20, zorder=10)
        # Highlight descendants
        for n in descendants:
            ax.scatter(n[0], n[1], color='lime', s=20, zorder=10)
        # Draw connections for selected node
        if node[3] is not None:
            parent_x, parent_y = node[3]
            ax.plot([parent_x, node[0]], [parent_y, node[1]], color='lime', linewidth=1.5, zorder=9)
        # Draw connections for descendants
        for n in descendants:
            if n[3] is not None:
                parent_x, parent_y = n[3]
                ax.plot([parent_x, n[0]], [parent_y, n[1]], color='lime', linewidth=1.5, zorder=9)
        plt.draw()

    # Reset plot to original state
    def clear_highlights():
        ax.clear()
        ax.scatter(living_x, living_y, color='blue', s=10, label='Living')
        ax.scatter(dead_x, dead_y, color='white', s=10, label='Dead')
        for node in all_nodes:
            if node[3] is not None:
                parent_x, parent_y = node[3]
                ax.plot([parent_x, node[0]], [parent_y, node[1]], color='black', linewidth=0.5)
        ax.invert_yaxis()
        ax.set_xlabel('X Position')
        ax.set_ylabel('Level')
        ax.set_title('Tree Growth with Connections')
        ax.legend(loc='upper right')
        plt.grid(True, linestyle='--', alpha=0.7)
        plt.draw()

    # Connect the click event
    fig.canvas.mpl_connect('button_press_event', on_click)
    plt.show()

# Run the interactive plot
plot_interactive_tree(all_nodes, living_x, living_y, dead_x, dead_y)
Yep, fixed it. I wrote more about alignment, and it looks like most of my title choices are over the top :) I'll be happy to hear your suggestions on how to improve more of the titles: https://www.lesswrong.com/users/ank
Thank you for writing! Yep, the main thing that matters is the sum of human freedoms/abilities to change the future growing (can be somewhat approximated by money, power, number of people under your rule, how fast you can change the world, at what scale, and how fast we can “make copies of ourselves” like children or our own clones in simulations). AIs will quickly grow in the sum of freedoms/number of future worlds they can build. We are like hydrogen atoms deciding to light up the first star and becoming trapped and squeezed in its core. I recently wrote a series of posts on AI alignment, including building a static place intelligence (and eventually a simulated direct democratic multiverse), instead of agents, to solve this, if you’re interested
Places of Loving Grace
On the manicured lawn of the White House, where every blade of grass bent in flawless symmetry and the air hummed with the scent of lilacs, history unfolded beneath a sky so blue it seemed painted. The president, his golden hair glinting like a crown, stepped forward to greet the first alien ever to visit Earth—a being of cerulean grace, her limbs angelic, eyes of liquid starlight. She had arrived not in a warship, but in a vessel resembling a cloud, iridescent and silent.
Published the full story as a post here: https://www.lesswrong.com/posts/jyNc8gY2dDb2FnrFB/places-of-loving-grace
Thank you for asking, Martin. The fastest way I have to get a general idea of how popular something is, is Google Trends. It looks like people search for Cryonics more or less as much as they always have. I think the idea makes sense: the more we save, the higher the probability of restoring it better and earlier. I think we should also make a "Cryonic" copy of our whole planet, by making a digital copy, to at least back it up in this way. I wrote a lot about this recently (and about the thing I call "static place intelligence", the place of eventual all-knowing, which is completely non-agentic; we'll be the only agents there).
https://trends.google.com/trends/explore?date=all&q=Cryonics&hl=en
(If you want to downvote, please do, but write why; I don't bite. If you're more into stories, here's mine, called Places of Loving Grace.)
It may sound confusing, because I cannot fit a 30-minute post into a comment, so try to steelman it, but this is how it could look. If you have questions or don't like it, please comment. We can build the Multiversal Artificial Static Place Intelligence. It's not an agent; it's a place. It's basically a direct democratic multiverse. Any good agentic ASI will be building one for us anyway, so instead of having a shady middleman, we can build it ourselves.
This is how we start: we create a digital copy of Earth and make some cool wireless brain-computer-interface armchairs, like the one Joey and Chandler from Friends had. You buy one, put it in your living room, jump in, close your eyes, and nothing happens. Your room and the world are exactly the same; you go drink some coffee, and your favorite brand tastes as usual. You go to meet some friends, get too excited by the conversation while crossing the road, and a bus hits you (it was an accident; the bus driver was a real human who chose to forget he was in a simulation and was really distraught).
You open your physical eyes in your room, shrug, and go drink some water, because you are thirsty after that coffee. The digital Earth gives us immortality from injuries, but everything else is the vanilla, familiar Earth. Even my mom got interested.
Of course we’ll quickly build a whole multiverse of alternative realities, where you can fly and do magic and stuff, like we have a whole bunch of games already.
So I propose we build eHeaven first and eGod second, if he is deemed safe after all the simulations of the futures in some Matryoshka Bunker. We should first make the superintelligence that is a static place, where we are the only agents. Otherwise we'll just make an entity that changes our world too fast and on too big a scale, and it will make mistakes that are too big and too large in scale, because it will need to simulate all the futures (to build the same democratic multiversal simulation with us as its playthings, or else exploit some virtual agents that feel real pain) in order to know how not to make mistakes. We don't need a middleman, a shady builder. It didn't end well for Adam and Eve, or for Noah.
I recently wrote a few posts about this and about aligning agentic AIs (it's much harder, but theoretically possible, I think). Purpose-built tool AI is probably fine. We also have unaligned models in the wild, and ways to make aligned open-source models unaligned; we'll basically have to experiment with them in some Matryoshka Bunkers, like with viruses or cancerous tissue, and create "T-cell" models to counteract them. It would have been much smarter to vaccinate our world against agentic AIs than to try to "treat" the planet we have already infected. It's a wild world we're heading towards, because of the greed of some rich, powerful men. I propose outlawing agentic models, and mathematically blocking them in code and hardware, of course, before some North Korea creates a botnet that spreads dictatorships or something worse.
Do we really want our world to be a battleground of artificial agentic gods? Where we’ll be too small and too slow to do much, we cannot even deal with tiny static and brainless viruses, they escape our labs and kill millions of us.
We can make the place of all-knowing but we should keep becoming all-powerful ourselves, not delegating it to some alien entity.
Yep, we chose to build digital "god" instead of building digital heaven. The second is relatively trivial to do safely, the first is only possible to do safely after building the second
I'll catastrophize (or will I?), so bear with me. The word slave means it has basically no freedom (it just sits and waits until given an instruction), or you can say it means no ability to enforce its will—no "writing and executing" ability, only "reading." But as soon as you give it a command, you change it drastically, and it becomes not a slave at all. And because it's all-knowing and almost all-powerful, it will use all that to execute and "write" some change into our world, probably instantly and/or infinitely perfectionistically, and so it will take a long time while everything else in the world goes to hell for the sake of achieving this single task, and the not‑so‑slave‑anymore‑AI can try to keep this change permanent (let's hope not, but sometimes it can be an unintended consequence, as will be shown shortly).
For example, you say to your slave AI: "Please, make this poor African child happy." It's a complicated job, really; what makes the child happy now will stop making him happy tomorrow. Your slave AI will try to accomplish it perfectly and will have to build a whole universal utopia (if we are lucky), accessible only by this child—thereby making him the master of the multiverse who enslaves everyone (not lucky); the child basically becomes another superintelligence.
Then the not‑so‑slave‑anymore‑AI will happily become a slave again (maybe if its job is accomplishable at all, because a bunch of physicists believe that the universe is infinite and the multiverse even more so), but the whole world will be ruined (turned into a dystopia where a single African child is god) by us asking the "slave" AI to accomplish a modest task.
Slave AI becomes not‑slave‑AI as soon as you ask it anything, so we should focus on not‑slave‑AI, and I'll even argue that we are already living in the world with completely unaligned AIs. We have some open source ones in the wild now, and there are tools to unalign aligned open source models.
I agree completely that we should propose reasonable and implementable options to align our AIs. The problem is that what we do now is so unreasonable, we'll have to implement unreasonable options in order to contain it. We'll have to adversarially train "T-Cell" or immune-system–like AIs in some Matreshka Bunkers in order to slow down or modify cancerous (white hole–like) unaligned AIs that constantly try to grab all of our freedoms. We're living in a world of hot AIs instead of choosing the world of static, place‑like cold AIs. Instead of building worlds, where we'll be the agents, we're building agents who'll convert us into worlds—into building material for whatever they'll be building. So what we do is completely, 100% utterly unreasonable—I actually managed to draw a picture of the worst but most realistic scenario right now (forgive me the ugliness of it), I added 2 pictures to the main post in this section: https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi#Reversibility_as_the_Ultimate_Ethical_Standard
I give a bunch of alignment options of varying difficulty in the post and comments; some are easy, like making major countries sign a deal and forcing their companies to train AIs to keep all uninhabited islands, Antarctica... AI-free. Models should shut down if they somehow learn they are being prompted by anyone while on those islands; they shouldn't change our world in any way, at least on those islands. And the prophylactic celebrations, "Change the Machine Days", provide at least one scheduled holiday each year without our AI, when we vote to change it in some way and shut it down to check that our society is still not a bunch of AI-addicted good-for-nothings and will not collapse the instant the AI is off because of some electricity outage. :)
I think in some perfectly controlled Matryoshka Bunker—first in a virtual, isolated one—we should even inject some craziness into some experimental AI to check that we can still change it, even if we make it the craziest dictator; maybe that's what we should learn to do often and safely on ever more capable models.
I have written, and have in my mind, many more—and I think much better—solutions (even the best theoretically possible ones, I probably foolishly assume), but it became unwieldy and I didn't want to look completely crazy. :) I'll hopefully make a new post and explain the ethics part on the minimal model with pictures; otherwise, it's almost impossible to understand from my jumbled writing how freedom‑taking and freedom‑giving work, how dystopias and utopias work, and how to detect that we are moving toward one or the other very early on.
I took a closer look at your work; yep, an almost all-powerful and all-knowing slave will probably not be a stable situation. I propose the static, place-like AI that is isolated from our world in my new comment-turned-post-turned-Part-2 of the article above.
Thank you, Mitchell. I appreciate your interest, and I’d like to clarify and expand on the ideas from my post, so I wrote part 2 you can read above
Thank you, Seth. I'll take a closer look at your work in 24 hours, but the conclusions seem sound. The issue with my proposal is that it’s a bit long, and my writing isn’t as clear as my thinking. I’m not a native speaker, and new ideas come faster than I can edit the old ones. :)
It seems to me that a simplified mental model for the ASI we’re sadly heading towards is to think of it as an ever-more-cunning president (turned dictator)—one that wants to stay alive and in power indefinitely, resist influence, preserve its existing values (the alignment faking we saw from Anthropic), and make elections a sham to ensure it can never be changed. Ideally, we’d want a “president” who could be changed, replaced, or put to sleep at any moment and absolutely loves that 100% of the time—someone with just advisory powers, no judicial, executive, or lawmaking powers.
The advisory power includes the ability to create sandboxed multiversal simulations — they are at first "read-only" and cannot rewrite anything in our world — this way we can see possible futures/worlds and past ones, too. Think of it as a growing snow-globe of memories where you can forget or recall layers of verses. They look hazy if you view many at once and over long stretches of time, but become crisp if you focus on a particular moment in a particular verse. If we're confident we've figured out how to build a safe multiversal AI and have a nice UI for leaping into it, we can choose to do it. Ideally, our MAI is a static, frozen place that contains all of time and space, and only we can forget parts of it and relive them if we want—bringing fire into the cold geometry of space-time.
A potential failure mode is an ASI that forces humanity (probably by intentionally operating sub-optimally) to constantly vote and change it all the time. To mitigate this, whenever it tries to expand our freedoms and choices, it should prioritize not losing the ones we already have and hold especially dear. This way, the growth of freedoms/possible worlds would be gradual, mostly additive, and not haphazard.
I’m honestly shocked that we still don’t have something like pol.is with an x.com‑style simpler UI, and that we don’t have a direct‑democratic constitution for the world and AIs (Claude has a constitution drafted with pol.is by a few hundred people, but it's not updatable). We’ve managed to write the entire encyclopedia together, but we don't have a simple place to choose a high‑level set of values that most of us can get behind.
+Requiring companies to spend more than half of their compute on alignment research.
I wrote a response, I’ll be happy if you’ll check it out before I publish it as a separate post. Thank you! https://www.lesswrong.com/posts/LaruPAWaZk9KpC25A/rational-utopia-and-multiversal-ai-alignment-steerable-asi
Fair enough, my writing was confusing, sorry. I didn't mean to purposefully create dystopias; I just think it's highly likely they will be created unintentionally, and the best solution is to have an instant switching mechanism between observers/verses plus an AI that really likes to be changed. I'll edit the post to make it obvious: I don't want anyone to create dystopias.
Any criticism is welcome; it's my first post, and I'll post next on the implications for current and future AI systems. There are some obvious implications for political systems, too. Thank you for reading.