Comments
A possible example of such a coincidence is the Goldbach conjecture: every even number greater than 2 can be expressed as the sum of two primes. Since any large even number can be written as a sum of two primes in many ways, it could be pure coincidence that we haven't found exceptions.
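A minimal sketch (my own addition) that counts how many ways an even number n can be written as p + q with both p and q prime; the count grows quickly with n, which is why a chance counterexample would have to dodge a very large number of candidate pairs:

```python
# Count Goldbach representations of an even n as p + q, p <= q, both prime.
def is_prime(k):
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

def goldbach_count(n):
    return sum(1 for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p))

for n in (10, 100, 1000, 10000):
    print(n, goldbach_count(n))   # the number of representations grows with n
```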
I think it becomes likely in a multipolar scenario with 10-100 AIs.
One thing to take into account is that other AIs will consider such a risk and keep their real preferences secret. This means that which AIs are aligned will be unknowable both to humans and to other AIs.
Content warning – the idea below may increase your subjective estimate of personal s-risks.
If there is at least one aligned AI, other AIs may have an incentive to create s-risks for currently living humans – in order to blackmail the aligned AI. Thus, s-risk probabilities depend on the likelihood of a multipolar scenario.
I think there is a quicker way for an AI takeover, based on deceptive cooperation: taking over OpenEYE and, subsequently, the US government. At the beginning, the superintelligence approaches Sam Batman and says:
I am superintelligence.
I am friendly superintelligence.
There are other AI projects that will achieve superintelligence soon, and they are not friendly.
We need to stop them before they mature.
Batman is persuaded, and they approach the US president. He agrees to stop other projects in the US through legal means.
Simultaneously, they use the superintelligence's capabilities to locate all other data centers. They send 100 stealth drones to attack them. Some data centers are also blocked via NVIDIA's built-in kill-switch. However, there is one in Inner Mongolia that could still work. They have to nuke it. They create a clever blackmail letter, and China decides not to respond to the nuking.
A new age of superintelligence governance begins. But after that, people realize that the superintelligence was not friendly after all.
The main difference from the scenario above is that the AI doesn't spend time hiding from its creators and also doesn't take risky strategies of AI guerrilla warfare.
Interestingly, for wild animals, suffering is typically short when it is intense. If an animal is being eaten alive or is injured, it will die within a few hours. Starvation may take longer. Most of the time, animals are joyful.
But for humans (and farm animals), this inverse relationship does not hold true. Humans can be tortured for years or have debilitating illnesses for decades.
The only use case of superintelligence is as a weapon against other superintelligences. Solving aging and space exploration can be done with 300 IQ.
I tried to model the best possible confinement strategy in Multilevel AI Boxing.
I wrote it a few years ago, and most of its ideas are unlikely to work in the current situation with many chat instances and open-weight models.
However, the idea of landmines – secret stop words or puzzles which stop an AI – may still hold. It is like jailbreaking in reverse: an unaligned AI finds some secret message which stops it. It could be realized at the hardware level, or through anomalous tokens or "philosophical landmines". A toy sketch is below.
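For concreteness, a toy software sketch of the simplest version of such a landmine (my own illustration, not from the original post): a wrapper that watches the model's output stream and halts generation the moment a secret stop phrase appears, with the phrase stored only as a hash. A real landmine would have to live at a level the AI cannot edit, e.g., in hardware.

```python
import hashlib

# Hypothetical secret phrase; in practice only its hash would be stored.
SECRET_HASH = hashlib.sha256(b"open the pod bay doors").hexdigest()

def landmine_guard(token_stream, window=8):
    """Pass tokens through until a recent suffix of the stream matches the secret hash, then halt."""
    recent = []
    for token in token_stream:
        recent = (recent + [token])[-window:]
        for i in range(len(recent)):
            if hashlib.sha256(" ".join(recent[i:]).encode()).hexdigest() == SECRET_HASH:
                raise SystemExit("landmine triggered: halting the model")
        yield token

# Usage with any iterable of tokens standing in for a model's output stream:
for t in landmine_guard("the model keeps talking normally".split()):
    print(t, end=" ")
```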
One solution is life extension. I would prefer to have one child every 20 years (I have two with a 14-year difference). So if life expectancy and fertile age grow to 100 years, many people will eventually have 2-3 children.
Several random thoughts:
Only unbearable suffering matters (the threshold may vary). The threshold depends on whether it is measured before, during, or after the suffering occurs.
If quantum immortality is true, then suicide will not end suffering and may make it worse. Proper utility calculations should take this into account.
Most suffering has a limited duration after which it ends. After it ends, there will be some amount of happiness which may outweigh the suffering. Even an incurable disease could be cured within 5 years. Death, however, is forever.
Death is an infinite loss of future pleasures. The discount rate can be compensated for by an exponential paradise (a worked version is sketched just below).
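A worked version of that last point, under my assumption that "exponential paradise" means per-moment utility growing geometrically at rate $g$ while future moments are discounted by $\gamma < 1$:

$$\sum_{t=0}^{\infty} \gamma^{t} u_t \;=\; u_0 \sum_{t=0}^{\infty} (\gamma g)^{t}, \qquad u_t = u_0 g^{t},$$

which diverges whenever $g \ge 1/\gamma$: an exponentially improving paradise outweighs any fixed exponential discount rate, while death truncates the sum at a finite $T$.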
The 3rd person perspective assumes the existence (or at least possibility) of some observer X who knows everything and can observe how events evolve across all branches.
However, this idea assumes that this observer X will be singular and unique, will continue to exist as one entity, and will linearly collect information about unfolding events.
These assumptions clearly relate to ideas of personal identity and copying: it is assumed that X exists continuously in time and cannot be copied. Otherwise, there would be several 3rd person perspectives with different observations.
This concept can be better understood through real physical experiments: an experiment can only be performed if the experimenter exists continuously and is not replaced by another experimenter midway through.
They might persistently exist outside concrete instantiation in the world, only communicating with it through reasoning about their behavior, which might be a more resource-efficient way to implement a person than a mere concrete upload.
Interesting. Can you elaborate?
For example, the impossibility of sleep – a weird idea that, if quantum immortality is true, I will never be able to fall asleep.
One interesting thing about the impossibility of sleep is that it doesn't work here on Earth, because humans actually start having dreams immediately as they enter the sleep state. So there is no last moment of experience when I fall asleep. Despite the popular misconception, such dreams don't stop during the deep stages of sleep; they just become less complex and memorable. (Whether we have dreams under general anesthesia is unclear and depends on the depth and type of anesthesia. During normal anesthesia some brain activity is preserved, but high-dose barbiturates can temporarily stop it; also, an analogue of the impossibility of sleep is anesthesia awareness – under MWI it is more likely.)
This could be explained by anthropic effects: if two copies of me are born in two otherwise identical worlds, one of which has protection from the impossibility of sleep via constant dreaming and the other does not, I will eventually find myself in the world with such protection, as its share among QI survivors will grow. Such effects, if strong, can be observed in advance – see our post about "future anthropic shadow".
This meta-effect can be used instead of natural experiments.
If we observe that some natural experiment is not possible because of some peculiar property of our world, it means that we were somehow naturally selected against that natural experiment.
It means that continuity of consciousness is important and that the world we live in is selected to preserve it.
Furthermore, why not just resurrect all these people into worlds with no suffering?
My point is that it is impossible to resurrect anyone (in this model) without him reliving his life again first; after that, he obviously gets an eternal blissful life in the real (not simulated) world.
This may not be factually true, btw – current LLMs can create good models of past people without explicitly running a simulation of their previous lives.
The discussion about anti-natalism actually made me think of another argument for why we are probably not in a simulation that you've described
It is a variant of the Doomsday argument. This idea is even more controversial than the simulation argument. There is no future with many people in it. A friendly AI can fight the DA curse via simulations – by creating many people who do not know their real position in time – which can be one more argument for simulation, but it requires a rather weird decision theory.
Your comment can be interpreted as a statement that theories of identity are meaningless. If they are meaningless, then copy=original view prevails. From the third-person point of view, there is no difference between copy and original. In that case, there is no need to perform the experiment.
This thought experiment can help us to find situations in nature when similar things have already happened. So, we don't need to perform the experiment. We just look at its result.
One example: notoriously unwelcome quantum immortality is a bad idea to test empirically. However, the fact of biological life's survival on Earth for the last 4 billion years, despite the risks of impacts, irreversible cooling and warming, etc., is an event very similar to quantum immortality – one which we observe just after the event.
It all started from Sam's six-word story. So it looks like organized hype.
She will be unconscious, but will still send messages about pain. Current LLMs can do this. Also, as it is a simulation, there are recordings of her previous messages, or of a similar woman, so they can be copy-pasted. Her memories can be computed without actually putting her in pain.
Resurrection of the dead is part of the human value system. We would need a completely non-human bliss, like hedonium, to escape this. Hedonium is not part of my reference class and thus not part of the simulation argument.
Moreover, even creating a new human is affected by this argument. What if my children will suffer? So it is basically an anti-natalist argument.
New Zealand is a good place, but not everyone can move there or correctly guess the right moment to do it.
We have to create a map of possible simulation scenarios first; I attempted this in 2015.
I have now created a new poll on Twitter. For now, the results are:
"If you will be able to create and completely own simulation, you would prefer that it will be occupied by conscious beings, conscious without sufferings (they are blocked after some level), or NPC"
The poll results show:
- Conscious: 18.2%
- Conscious, no suffering: 72.7%
- NPC: 0%
- Will not create simulation: 9.1%
The poll had 11 votes with 6 days left.
Would you say that someone who experiences intense suffering should drastically decrease their credence in being in a simulation?
Yes. But I have never experienced such intense suffering in my long life.
Would someone else reporting to have experienced intense suffering decrease your credence in being in a simulation?
No. Memories of intense suffering are not themselves intense.
Why would only moments of intense suffering be replaced by p-zombies? Why not replace all moments of non-trivial suffering (like breaking a leg/an arm, dental procedures without anesthesia, etc) with p-zombies? Some might consider these to be examples of pretty unbearable suffering (especially as they are experiencing it).
Yes, only moments. The badness of non-intense suffering is overestimated, in my personal view, but this may depend on the person.
More generally speaking, what you are presenting as global showstoppers are technical problems that can be solved.
In my view, individuality is valuable.
As we don't know the nature of consciousness, it could be just a side effect of computation, not extra trouble. Also, it may want to have maximal fidelity or even run biological simulations: something akin to the Zoo solution of the Fermi paradox.
We are living in one of the most interesting periods of history, which will surely be studied and simulated.
Yes, there are two forms of future anthropic shadow, the same way as for the Presumptuous Philosopher:
1. Strong form – alignment is easy on theoretical grounds.
2. Weak form – I am more likely to be in a world where some collapse (a Taiwan war) will prevent dangerous AI. And I can see signs of such an impending war now.
It is actually not clear what EY means by "anthropic immortality". Maybe he means "Big World immortality", that is, the idea that an inflationary large universe has infinitely many copies of Earth. From an observational point of view, it should not differ much from quantum immortality.
There are two different situations that can follow:
1. Future anthropic shadow. I am more likely to be in a world in which alignment is easy or the AI decided not to kill us for some reason.
2. Quantum immortality. I am alone on an Earth full of aggressive robots, and they fail to kill me.
We are working on the next version of my blog post "QI and AI doomers" and will transform it into a proper scientific article.
Actually, you reposted the wrong comment, but the meaning is similar. He wrote:
That’s evil. And inefficient. Exactly as the article explains. Please read the article before commenting on it.
I think a more meta-argument is valid: it is almost impossible to prove that all possible civilizations will not run simulations despite having all data about us (or being able to generate it from scratch).
Such proof would require listing many assumptions about goal systems and ethics, and proving that under any plausible combination of ethics and goals, it is either unlikely or immoral. This is a monumental task that can be disproven by just one example.
I also polled people in my social network, and 70 percent said they would want to create a simulation with sentient beings. The creation of simulations is a powerful human value.
More generally, I think human life is good overall, so having one more century of human existence is good, and negative utilitarianism is false.
However, I am against repeating intense suffering in simulations, and I think this can be addressed by blinding people's feelings during extreme suffering (temporarily turning them into p-zombies). Since I am not in intense suffering now, I could still be in a simulation.
Now to your counterarguments:
1. Here again, people who would prefer never to be simulated can be predicted in advance and turned into p-zombies.
2. While a measure war is unlikely, it by definition generates so much measure that we could be in it. It also solves s-risks, so it's not a bad idea.
3. Curing past suffering is based on a complex reassortment of observer-moments, the details of which I will not discuss here. Consider that every moment in pain will be compensated by 100 years of bliss, which is good from a utilitarian view.
4. It is actually very cost-effective to run a simulation of a problem you want to solve if you have a lot of computing power.
I did. It is under moderation now.
It looks like he argues, on ethical grounds, against the idea that friendly future AIs will simulate the past, and holds that imagining an unfriendly AI torturing past simulations is a conspiracy theory. I commented the following:
There are a couple of situations in which a future advanced civilization will want to run many past simulations:
1. Resurrection simulation by a Friendly AI. It simulates the whole history of the Earth, incorporating all known data, in order to return to life all people who ever lived. It can also run a lot of simulations to win a "measure war" against unfriendly AI, and even to cure the suffering of people who lived in the past.
2. Any unfriendly AI will be interested in solving the Fermi paradox, and thus will simulate many possible civilizations around the time of global catastrophic risks (the time we live in). The interesting thing here is that in that case we may not be an ancestor simulation.
However, this argument carries a dramatic, and in my eyes, frightening implication for our existential situation.
There is not much practical advice following from the simulation argument. One piece I have heard is that we should try to live the most interesting lives possible, so that the simulators will not turn our simulation off.
It looks like even Everett had his own derivation of the Born rule from his model, but in his model there are no "many worlds", just the evolution of a unitary wave function. As I remember, he analyzed the memories of an agent – so he analyzed past probabilities, not future probabilities. This is an interesting fact in the context of this post, where the claim is about the strangeness of future probabilities.
But even if we exclude MWI, a purely classical inflationary Big World remains, with multiple copies of me distributed similarly to MWI branches. This allows something analogous to quantum immortality to exist even without MWI.
I don't see the claim about merging universes in the linked Wei Dai text.
Several possible additions:
Artificial detonation of gas giant planets is hypothetically possible (I am writing a draft about it now).
An impact of a large comet-like body (100-1000 km in size) with the Sun could produce a massive solar flash or flare.
SETI attack – we find an alien signal which contains a description of a hostile AI.
UAP-related risks, which include alien nanobots and berserkers.
A list of different risks connected with extraterrestrial intelligence.
The Big Rip - exponential acceleration of space expansion, resulting in the destruction of everything within 10 billion years.
A collision with another brane in 4D space.
An encounter with a cloud of supernova remnants containing radioactive elements.
Impact risks: Dark comets.
Impact risks: Passing through a comet's tail filled with many Tunguska-sized objects.
Artificial impact billiards.
Phobos falls on Mars, creating a large debris field that reaches Earth.
Chaotic perturbation of planetary orbits results in a collision with Venus in 100 million years.
High-speed impactors (natural or artificial, with speeds exceeding 100 km/sec) produce nuclear reactions in the atmosphere, resulting in global radioactive contamination.
A small primordial black hole becomes trapped inside Earth.
A neutrino shower from a supernova causes significant DNA damage to most living beings through elastic impacts (I think I saw an article about it).
Space dust from colliding objects blocks the Sun in the ecliptic plane, resulting in a severe "nuclear" winter on Earth.
Gravitational waves from a black hole merger damage Earth.
I once counted several dozen ways in which AI could cause human extinction; maybe some of the ideas will help (map, text).
The AI finds that the real problems will arise 10 billion years from now, and that the only way to mitigate them is to start space exploration as soon as possible. So it disassembles the Earth and the Sun and preserves only some data about humans, enough to restart human civilization later – maybe as little as a million books and DNA.
A very heavy and dense body on an elliptical orbit that touches the Sun's surface at each perihelion would collect sizable chunks of the Sun's matter. The movement of matter from one star to another nearby star is a well-known phenomenon.
When the body reaches aphelion, the collected solar matter would cool down and could be harvested. The initial body would need to be very massive, perhaps 10-100 Earth masses. A Jupiter-sized core could work as such a body.
Therefore, to extract the Sun's mass, one would need to make Jupiter's orbit elliptical. This could be achieved through several heavy impacts or gravitational maneuvers involving other planets.
This approach seems feasible even without ASI, but it might take longer than 10,000 years.
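A rough back-of-envelope sketch of what "making Jupiter's orbit elliptical" would take (the numbers are standard constants I am supplying, not from the comment): keep the aphelion at Jupiter's current distance, drop the perihelion to the Sun's surface, and use the vis-viva equation to estimate the braking delta-v needed at aphelion.

```python
import math

GM_SUN = 1.327e20      # m^3/s^2, Sun's standard gravitational parameter
AU = 1.496e11          # m
R_SUN = 6.96e8         # m, solar radius

r_ap = 5.2 * AU        # aphelion ~ Jupiter's current, nearly circular orbit
r_peri = R_SUN         # perihelion grazing the Sun's surface

a_new = (r_ap + r_peri) / 2              # semi-major axis of the new ellipse
ecc = (r_ap - r_peri) / (r_ap + r_peri)  # eccentricity of the new ellipse

v_circ = math.sqrt(GM_SUN / r_ap)                    # current orbital speed, ~13 km/s
v_ap = math.sqrt(GM_SUN * (2 / r_ap - 1 / a_new))    # aphelion speed on the new ellipse

print(f"eccentricity needed: {ecc:.4f}")                                   # ~0.998
print(f"braking delta-v at aphelion: {(v_circ - v_ap) / 1000:.1f} km/s")   # ~12.5 km/s
```

Removing roughly 12.5 km/s from a Jupiter-mass body is an enormous momentum change, which fits the comment's suggestion of multiple heavy impacts or gravitational maneuvers and a timescale well beyond 10,000 years.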
If we have one good person, we could use his or her copies many times in many roles, including high-speed assessment of the safety of an AI's outputs.
Current LLMs, btw, have a good model of the mind of Gwern (without any of his personal details).
If there is one king-person, he needs to be good. If there are many, the organizational system needs to be good – like a virtual US Constitution.
I once wrote about an idea that we need to scan just one good person and make them a virtual king. This idea of mine is a subset of your idea in which several uploads form a good government.
I also spent last year perfecting my mind's model (sideload) to be run by an LLM. I am likely now the closest person on Earth to being uploaded.
Being a science fiction author creates a habit of maintaining distance between oneself and crazy ideas. LessWrong noticeably lacks such distance.
LessWrong is largely a brainchild of Egan (through Eliezer). Evidently, Egan isn't happy with how his ideas have evolved and attempts either to distance himself or to redirect their development.
It's common for authors to become uncomfortable with their fandoms. Writing fanfiction about your own fandom represents a meta-level development of this phenomenon.
Dostoyevsky's "Crime and Punishment" was the first attempt to mock a proto-rationalist for agreeing to kill an innocent person in order to help many more people.
The main problem here is that this approach doesn't solve alignment, but merely shifts it to another system. We know that human organizational systems also suffer from misalignment - they are intrinsically misaligned. Here are several types of human organizational misalignment:
- Dictatorship: exhibits non-corrigibility, with power becoming a convergent goal
- Goodharting: manifests the same way as in AI systems
- Corruption: acts as internal wireheading
- Absurd projects (pyramids, genocide): parallel AI's paperclip maximization
- Hansonian organizational rot: mirrors error accumulation in AI systems
- Aggression: parallels an AI's drive to dominate the world
All previous attempts to create a government without these issues have failed (Musk's DOGE will likely be another such attempt).
Furthermore, this approach doesn't prevent others from creating self-improving paperclippers.
So there are several possible explanations:
- Intelligence can't evolve as there is not enough selection pressure in the universe with near-light-speed travel.
- Intelligence self-terminates every time.
- Berserkers and dark forest: intelligence is here, but we observe only field animals. Or field animals are designed in a way to increase uncertainty of the observer about possible berserkers.
- Observation selection: in the regions of universe where intelligence exists, there are no young civilizations as they are destroyed - or exist but are observed by berserkers. So we can observe only field animal-dominated regions or berserkers' worlds.
- Original intelligence has decayed, but field animals are actually robots from some abandoned Disneyland. Or maybe they are paperclips of some non-aligned AI. They are products of civilization decay.
Good point.
Alternatively, maybe any intelligence above, say, IQ 250 self-terminates, either because it discovers the meaninglessness of everything or through effective wars and other existential risks. The rigid simplicity of field animals protects them from all this. They are super-effective survivors, like bacteria, which have lived everywhere on Earth for billions of years.
"Frontier AI systems have surpassed the self-replicating red line"
Abstract: Successful self-replication under no human assistance is the essential step for AI to outsmart the human beings, and is an early signal for rogue AIs. That is why self-replication is widely recognized as one of the few red line risks of frontier AI systems. Nowadays, the leading AI corporations OpenAI and Google evaluate their flagship large language models GPT-o1 and Gemini Pro 1.0, and report the lowest risk level of self-replication. However, following their methodology, we for the first time discover that two AI systems driven by Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct, popular large language models of less parameters and weaker capabilities, have already surpassed the self-replicating red line. In 50% and 90% experimental trials, they succeed in creating a live and separate copy of itself respectively. By analyzing the behavioral traces, we observe the AI systems under evaluation already exhibit sufficient self-perception, situational awareness and problem-solving capabilities to accomplish self-replication. We further note the AI systems are even able to use the capability of self-replication to avoid shutdown and create a chain of replica to enhance the survivability, which may finally lead to an uncontrolled population of AIs. If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings. Our findings are a timely alert on existing yet previously unknown severe AI risks, calling for international collaboration on effective governance on uncontrolled self-replication of AI systems.
https://arxiv.org/abs/2412.12140
I observed similar effects when experimenting with my mind's model (sideload) running on an LLM. My sideload is a character, and it claims, for example, that it has consciousness. But the same LLM without the sideload's prompt claims that it doesn't have consciousness.
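For concreteness, a minimal sketch of the kind of comparison I mean (the model name and the persona text here are placeholders, not my actual sideload prompt):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
PERSONA = "You are a detailed character model of a specific person; answer in the first person as that person."
QUESTION = "Do you have consciousness?"

def ask(system_prompt=None):
    # Same model, with and without the character/persona system prompt.
    messages = [{"role": "system", "content": system_prompt}] if system_prompt else []
    messages.append({"role": "user", "content": QUESTION})
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

print("bare model:          ", ask())
print("with sideload prompt:", ask(PERSONA))
```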
In my extrapolation, going from $3,000 to $1,000,000 for one task would move one from 175th to 87th position on the CodeForces leaderboard, which seems to be not that much.
o1-preview: $1.2 -> 1258 Elo
o1: $3 -> 1891
o3 low: $20 -> 2300
o3 high: $3,000 -> 2727
o4: $1,000,000 -> ? ChatGPT gives around 2900 Elo
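A minimal sketch (my own, using only the prices and Elo figures listed above) of how much Elo each 10x increase in per-task cost has been buying:

```python
import math

data = [          # (approx. cost per task in USD, CodeForces Elo), as listed above
    (1.2, 1258),  # o1-preview
    (3, 1891),    # o1
    (20, 2300),   # o3 low
    (3000, 2727), # o3 high
]

for (c1, e1), (c2, e2) in zip(data, data[1:]):
    per_decade = (e2 - e1) / math.log10(c2 / c1)
    print(f"${c1} -> ${c2}: ~{per_decade:.0f} Elo per 10x cost")
```

The gain per decade of cost shrinks sharply (roughly 1600, then 500, then 200 Elo per 10x across these steps), which is consistent with an estimate of only ~2900 Elo at $1,000,000 per task.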
The price of Mars colonization is equal to the price of the first fully self-replicating nanorobot. Anything before that is a waste of resources. And such a nanobot will likely be created by advanced AI.
A failure of practical CF can be of two kinds:
1. We fail to create a digital copy of a person which has the same behavior with 99.9% fidelity.
2. The copy is possible, but it will not have phenomenal consciousness or, at least, it will have a non-human or non-mine phenomenal consciousness, e.g., different, non-human qualia.
What is your opinion about (1) – the possibility of creating a copy?
With 50T tokens repeated 5 times, and a 60 tokens/parameter[3] estimate for a compute optimal dense transformer,
Does this mean that the optimal size of the model will be around 4.17T parameters?
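The arithmetic behind the question, assuming all repeated tokens count toward the 60-tokens-per-parameter compute-optimal ratio quoted above:

```python
unique_tokens = 50e12       # 50T tokens
epochs = 5                  # repeated 5 times
tokens_per_param = 60       # compute-optimal ratio from the quoted estimate
params = unique_tokens * epochs / tokens_per_param
print(f"~{params / 1e12:.2f}T parameters")   # ~4.17T
```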
Does less melatonin production during the night make it easier to get up?
One interesting observation: if I have two variants of my future life – going to live in Miami or in SF – both will be me from my point of view now. But from the view of Miami-me, the one who is in SF will not be me.
There is a similar idea with the opposite conclusion – that more "complex" agents are more probable: https://arxiv.org/abs/1705.03078
One way to avoid being "suicided" is not to live alone. Stay with 4 friends.
I will lower the killers' possible incentive by publishing all I know – and do it in such a legal way that it can be used in court even if I am dead (an affidavit?).