Posts

The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories 2025-01-22T11:48:46.071Z
What would be the IQ and other benchmarks of o3 that uses $1 million worth of compute resources to answer one question? 2024-12-26T11:08:23.545Z
Sideloading: creating a model of a person via LLM with very large prompt 2024-11-22T16:41:28.293Z
If I care about measure, choices have additional burden (+AI generated LW-comments) 2024-11-15T10:27:15.212Z
Quantum Immortality: A Perspective if AI Doomers are Probably Right 2024-11-07T16:06:08.106Z
Bitter lessons about lucid dreaming 2024-10-16T21:27:04.725Z
Three main arguments that AI will save humans and one meta-argument 2024-10-02T11:39:08.910Z
Debates how to defeat aging: Aubrey de Grey vs. Peter Fedichev. 2024-05-27T10:25:49.706Z
Magic by forgetting 2024-04-24T14:32:20.753Z
Strengthening the Argument for Intrinsic AI Safety: The S-Curves Perspective 2023-08-07T13:13:42.635Z
The Sharp Right Turn: sudden deceptive alignment as a convergent goal 2023-06-06T09:59:57.396Z
Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk 2023-05-12T13:22:27.141Z
Running many AI variants to find correct goal generalization 2023-04-04T14:16:34.422Z
AI-kills-everyone scenarios require robotic infrastructure, but not necessarily nanotech 2023-04-03T12:45:01.324Z
The AI Shutdown Problem Solution through Commitment to Archiving and Periodic Restoration 2023-03-30T13:17:58.519Z
Long-term memory for LLM via self-replicating prompt 2023-03-10T10:28:31.226Z
Logical Probability of Goldbach’s Conjecture: Provable Rule or Coincidence? 2022-12-29T13:37:45.130Z
A Pin and a Balloon: Anthropic Fragility Increases Chances of Runaway Global Warming 2022-09-11T10:25:40.707Z
The table of different sampling assumptions in anthropics 2022-06-29T10:41:18.872Z
Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later. 2022-06-10T17:24:19.444Z
Untypical SIA 2022-06-08T14:23:44.468Z
Russian x-risks newsletter May 2022 + short history of "methodologists" 2022-06-05T11:50:31.185Z
Grabby Animals: Observation-selection effects favor the hypothesis that UAP are animals which consist of the “field-matter”: 2022-05-27T09:27:36.370Z
The Future of Nuclear War 2022-05-21T07:52:34.257Z
The doomsday argument is normal 2022-04-03T15:17:41.066Z
Russian x-risk newsletter March 2022 update 2022-04-01T13:26:49.500Z
I left Russia on March 8 2022-03-10T20:05:59.650Z
Russian x-risks newsletter winter 21-22, war risks update. 2022-02-20T18:58:20.189Z
SIA becomes SSA in the multiverse 2022-02-01T11:31:33.453Z
Plan B in AI Safety approach 2022-01-13T12:03:40.223Z
Each reference class has its own end 2022-01-02T15:59:17.758Z
Universal counterargument against “badness of death” is wrong 2021-12-18T16:02:00.043Z
Russian x-risks newsletter fall 2021 2021-12-03T13:06:56.164Z
Kriorus update: full bodies patients were moved to the new location in Tver 2021-11-26T21:08:47.804Z
Conflict in Kriorus becomes hot today, updated, update 2 2021-09-07T21:40:29.346Z
Russian x-risks newsletter summer 2021 2021-09-05T08:23:11.818Z
A map: "Global Catastrophic Risks of Scientific Experiments" 2021-08-07T15:35:33.774Z
Russian x-risks newsletter spring 21 2021-06-01T12:10:32.694Z
Grabby aliens and Zoo hypothesis 2021-03-04T13:03:17.277Z
Russian x-risks newsletter winter 2020-2021: free vaccines for foreigners, bird flu outbreak, one more nuclear near-miss in the past and one now, new AGI institute. 2021-03-01T16:35:11.662Z
[RXN#7] Russian x-risks newsletter fall 2020 2020-12-05T16:28:51.421Z
Russian x-risks newsletter Summer 2020 2020-09-01T14:06:30.196Z
If AI is based on GPT, how to ensure its safety? 2020-06-18T20:33:50.774Z
Russian x-risks newsletter spring 2020 2020-06-04T14:27:40.459Z
UAP and Global Catastrophic Risks 2020-04-28T13:07:21.698Z
The attack rate estimation is more important than CFR 2020-04-01T16:23:12.674Z
Russian x-risks newsletter March 2020 – coronavirus update 2020-03-27T18:06:49.763Z
[Petition] We Call for Open Anonymized Medical Data on COVID-19 and Aging-Related Risk Factors 2020-03-23T21:44:34.072Z
Virus As A Power Optimisation Process: The Problem Of Next Wave 2020-03-22T20:35:49.306Z
Ubiquitous Far-Ultraviolet Light Could Control the Spread of Covid-19 and Other Pandemics 2020-03-18T12:44:42.756Z

Comments

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-11T10:36:10.574Z · LW · GW

I can also use functional identity theory, where I care about the next steps of agents functionally similar to my current thought-line in logical time. 

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-10T11:22:38.758Z · LW · GW

The idea of the observer's stability is fundamental to our understanding of reality (and is also constantly supported by our experience): any physical experiment assumes that the observer (or experimenter) remains the same during the experiment.

Comment by avturchin on Short Timelines Don't Devalue Long Horizon Research · 2025-04-09T21:48:18.957Z · LW · GW

The same is valid for life extension research. It requires decades, and many, including Bryan Johnson, say that AI will solve aging and therefore human research on aging is no longer relevant. However, most aging research is about collecting data on very slow processes. The more longitudinal data we collect, the easier it will be for AI to "take up the torch."

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-09T21:24:51.334Z · LW · GW

The problem with the subjective choice view is that I can't become Britney Spears. :) If I continue to sit at the table, I will find myself there every next moment even if I try to become someone else. So mapping into the next moments is an objective fact.

Moreover, even a single moment of experience is a mapping between two states of the brain, A and B. For example, moment A is before I see a rose, and moment B is after I see it and say: "A rose!" The experience of a red rose happens after A but before B.

The rainbow of qualia theory is objective but it assumes the existence of a hypothetical instrument: a qualiascope. A qualiascope is a mind which can connect to other minds and compare their experiences. This works the same way as my mind can compare qualia of colors and sounds without being any of them. Whether a qualiascope is physically possible is not obvious, as its observations may disturb the original qualia.

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-09T09:46:53.454Z · LW · GW

I think there is more to consider. For example, we can imagine a "qualia rainbow" theory of identity. I don't necessarily endorse it, but it illustrates why understanding qualia is important for identity.

Imagine that infinitely many different qualia of "reds" could denote one real red. Each person, when born, is randomly initialized with a unique set of qualia for all colors and other sensations. This set can be called a "rainbow of qualia," and continuous computing in the brain maintains it throughout a person's life. A copy of me with a different set of qualia, though behaviorally indistinguishable, is not me. Only future mind states with the same set of qualia as mine are truly me, even if my memories were replaced with those of a rat.

The Anthropic Trilemma is a masterpiece.

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-08T09:46:09.215Z · LW · GW

Generally, I agree with what you said above - there is no (with some caveats - see below) soul-like identity, and we should use informational identity instead. Informational identity is an objective, measurable sameness of memory, and it allows the existence of many copies. It can be used to survive the end of the universe: I just care about the existence of a copy of me in another universe.

The main caveat is that the no-soul view ignores the existence of qualia. Qualia and the nature of consciousness are not solved yet, and we can't claim that the identity problem is solved without first solving qualia and consciousness.

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-07T08:47:40.664Z · LW · GW

The theory of quantum immortality depends on the theory of identity which - as you correctly pointed out - is difficult.

There are two objective facts about identity:

  1. I will be in my next observer-moment in the next moment of time. There is an objective process which makes mind states follow one another.
  2. I can recognize myself as me or not.

In your thought experiment, these two properties are deliberately made to contradict each other.

A simple answer here is that you should not anticipate becoming a pig because a pig can't think about personal identity. Anticipation assumes comparison between expectation and reality. A pig can't perform such an operation. But this is not a satisfactory model.

It can be solved if we assume that we have two types of identity -- informational (me or not me) and continuous. This seems paradoxical. But if we then assume that continuous identity passes through all possible minds eventually, then any pig will eventually become me again in some multiverse timelines, and I can calculate the share of my future copies which have a memory of having been a pig.

This thought experiment can be done without any supertechnology, just using dreaming as an example: what if some of my copies have a dream that they are pigs, while others have a dream about being themselves? The idea of anticipation produces an error in that case: on one hand it assumes the existence of a mind capable of comparison, while on the other it assumes a natural succession of mind states.

In short, a correct identity theory allows one to compute correct probabilities of future observations in the situation of many different copies. See also my post The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-06T21:02:44.622Z · LW · GW

We can define "me" as an observer who has the same set of memories M. In that case, the theory of QI is formally correct.

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-06T11:27:41.502Z · LW · GW

Quantum immortality moves one into s-risk eventually, if it is not used properly. For example, in your scenario, I can change my internal clock so that my next observer-moments will be in another universe which hasn't reached heat death yet. This works for state-based identity theory.

For continuity-based identity theory, I can trigger (via a universal wish-fulfilling machine based on quantum immortality) the explosion of a new universe via quantum fluctuation, jump into it, and survive (because of quantum immortality) all the difficulties of its initial period.

Comment by avturchin on A collection of approaches to confronting doom, and my thoughts on them · 2025-04-06T09:35:32.839Z · LW · GW

One more way is to expect my survival via quantum immortality. This, however, increases the chances of observing s-risks. In a broader view, the share of future worlds with me in which AI is aligned or non-existent is larger than the share of s-risk worlds.

Thus, I will observe myself surviving AI, but whether this will be bad or good depends on the exact theory of personal identity: whether it counts states or continuity.

Comment by avturchin on Why Have Sentence Lengths Decreased? · 2025-04-04T17:24:22.638Z · LW · GW

Paragraphs are also getting shorter. And music is getting louder. 

Comment by avturchin on AI 2027: What Superintelligence Looks Like · 2025-04-04T10:12:48.784Z · LW · GW

I think that the scenario of a war between several ASIs (each merged with its country of origin) is underexplored. Yes, there can be a value handshake between ASIs, but their creators will work to prevent this and will see it as a type of misalignment.

Somehow, this may help some groups of people survive, as those ASIs which preserve their people will look more trustworthy in the eyes of other ASIs, and this will help them form temporary unions.

The final outcome will be highly unstable: either one ASI will win, or several ASIs will start space exploration in different directions. 

Comment by avturchin on Consider showering · 2025-04-02T12:21:04.461Z · LW · GW

Ok

Comment by avturchin on Consider showering · 2025-04-02T10:00:18.799Z · LW · GW

[Deleted]

Comment by avturchin on VDT: a solution to decision theory · 2025-04-02T09:35:36.921Z · LW · GW

If we know the correct answers to decision theory problems, we must have some internal instrument for learning them: either a theory or a vibe meter.

Claude seems to learn to mimic our internal vibe meter. 

The problem is that it will not work outside the distribution. 

Comment by avturchin on avturchin's Shortform · 2025-04-01T14:23:54.102Z · LW · GW

Yes, that's a great variant of the universal answer-improving prompt, and it can be applied several times to any content.

Comment by avturchin on avturchin's Shortform · 2025-04-01T09:57:24.831Z · LW · GW

If the simulation argument is valid and dreams are simulations of reality, can we apply the simulation argument to dreams? If not, is this an argument against the simulation argument? If yes, why am I not in a dream right now?

If I see something, is it more likely to be a dream or reality?
Sleeping takes only one-third of my time, and REM takes even less.
But:

  • Some dreams occur even in other phases of sleep
  • Dreams are much more eventful than normal life. There is always something happening. Also, the distribution of events in dreams is skewed toward intense, dangerous, adventurous content, full of social interactions.
  • There is an eraser of dream memory, which wipes dream memories roughly every 15 minutes, and also after awakening and during the day. As a result, we underestimate the number of dreams we have had.

As a result, the number of important events in dreams may be several orders of magnitude more than in real life. I think a good estimate is 100 times, but it depends on the types of events. For recurrent dreams - like big waves and war for me - it can be much higher.

So why am I not in a dream now? Because writing coherent dream-conscious (lucid) text is not the dominant type of content in dreams. But if I were being chased by a monster or big waves, I should give a higher a priori chance that I am actually dreaming.

Conclusion: The simulation argument works for dreams, but selectively, as dream content is different from most normal life content.
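As a toy illustration of this estimate (my own framing; the time fraction and the ×100 event density are rough placeholder numbers taken from the reasoning above), a quick Bayes calculation:

```python
# Toy Bayes estimate: how likely is an observed "intense event" (monster chase,
# big waves) to be part of a dream?  All numbers are assumptions, not data.
p_dreaming = 0.05          # assumed fraction of experienced time spent dreaming
p_awake = 1 - p_dreaming
event_rate_dream = 100.0   # intense events per unit time while dreaming (~100x, as above)
event_rate_awake = 1.0     # baseline rate of intense events while awake

# Bayes' rule over the two states
p_dream_given_event = (p_dreaming * event_rate_dream) / (
    p_dreaming * event_rate_dream + p_awake * event_rate_awake
)
print(f"P(dreaming | intense event) ≈ {p_dream_given_event:.2f}")  # ≈ 0.84
```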

Comment by avturchin on avturchin's Shortform · 2025-03-31T21:43:42.909Z · LW · GW

Yes, but it knows all of Bostrom's articles, maybe because it has seen the list a hundred times.

Comment by avturchin on avturchin's Shortform · 2025-03-31T10:46:46.097Z · LW · GW

Most LLM replies can be improved by repeatedly asking "Improve the answer above"; this is similar to the test-time compute idea and to diffusion.

In most cases, I can get better answers from LLMs just by asking "Improve the answer above."

In my experience, the improvements are observable for around 5 cycles, but after that the result either stops improving or gets stuck in some error mode and can't jump to a new level of thinking. My typical test subject: "draw a world map as text art." In good improvement sessions with Sonnet, it eventually adds a grid and correct positions for the continents.

One person on Twitter (I lost the link, maybe @goodside) automated this process and got much better code for a game after 100 cycles of improvements during an entire night using many credits. He asked Claude to write code for automated prompting first. I repeated this experiment with my tasks.

I tried different variants of "improve it," like adding critiques or generating several answers within one reply. I also tried a meta-level approach, where I asked to improve not only the answer but also the prompt for improvements.

I started these experiments before the test-time compute idea went mainstream, and it looks like a type of test-time compute use. The process also resembles diffusion.

The main question here: in which cases does the process quickly get stuck, and in which does it produce unbounded improvements? It seems to get stuck in local minima and in situations where the model's intelligence isn't sufficient to see ways to improve or to discern better from worse versions. It also can't jump to another valley: if it starts improving in some direction, it will continue to push in that direction, ignoring other possibilities. Only manually opening another chat window helps to change valleys.

Iterative improvement of images also works in GPT-4o. But it does not work for Gemini Pro 2.5, and o1 is also bad at improving, progressing very slowly. It seems that test-time improvement conflicts with test-time reasoning.

Results for "Improve it": https://poe.com/s/aqk8BuIoaRZ7eDqgKAN6 

Variants of the main prompt: "Criticize the result above and iteratively improve it" https://poe.com/s/A2yFioj6e6IFHz68hdDx 

This prompt - "Create a prompt X for iterative improvement of the answer above. Apply the generated prompt X." - converges quickly to extraordinary results but overshoots, like creating games instead of drawings. It also uses thinking: https://poe.com/s/cLoB7gyGXHNtwj0yQfPf 

The trick is that the improving prompt should be content-independent and mechanically copy-pasted after each reply.
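For readers who want to reproduce this, here is a minimal sketch of the loop, assuming an OpenAI-compatible chat API via the openai Python SDK; the model name, cycle count, and prompt wording are placeholders, not the exact setup used above:

```python
# Minimal sketch of the content-independent "Improve the answer above" loop.
from openai import OpenAI

client = OpenAI()
IMPROVE = "Improve the answer above."  # mechanically appended after each reply

def iterate_improvement(task: str, cycles: int = 5, model: str = "gpt-4o") -> str:
    messages = [{"role": "user", "content": task}]
    answer = ""
    for _ in range(cycles):
        reply = client.chat.completions.create(model=model, messages=messages)
        answer = reply.choices[0].message.content
        # keep the full history so the model sees its own previous attempt
        messages.append({"role": "assistant", "content": answer})
        messages.append({"role": "user", "content": IMPROVE})
    return answer

print(iterate_improvement("Draw a world map as text art."))
```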

Comment by avturchin on avturchin's Shortform · 2025-03-30T16:55:43.947Z · LW · GW

It looks like (based on the article published a few days ago by Anthropic about the microscope) Claude Sonnet was trained to distinguish facts from hallucinations, so it's not surprising that it knows when it hallucinates.  

Comment by avturchin on AI Needs Us? Information Theory and Humans as data · 2025-03-29T18:40:57.224Z · LW · GW

My thought was different: even if simulation is possible, it needs the original for verification.

Also, one way to run simulations is 'physical simulations', as in The Truman Show or an alien zoo: a real planet with real human beings who live their lives, but the sky is not real beyond some distance, and there are thousands of such planets.

Comment by avturchin on AI Needs Us? Information Theory and Humans as data · 2025-03-29T16:36:36.028Z · LW · GW

Yes, to create simulations the AI needs some real humans to calibrate these simulations. And it needs simulations to predict the behaviour of other possible AIs which it may meet in space, and of their progenitor civilizations.

If the AI successfully calibrates its simulations, it will not need humans; and once it collects all the needed data from the simulations, it will turn them off.

Also, obviously, surviving in simulations still means the disempowerment of humans; it can cause suffering at a large scale and the death of most people.

A value handshake is a more promising way to ensure AI safety of this type.

Comment by avturchin on avturchin's Shortform · 2025-03-29T11:53:15.490Z · LW · GW

I found that this does not work for finding an obscure quote from a novel. It still hallucinates different, more popular novels as sources and is confident in them. It seems it doesn't know the real answer, though I am sure that the needed novel was in its training dataset (it knows the plot).

Comment by avturchin on avturchin's Shortform · 2025-03-28T12:36:59.304Z · LW · GW

An LLM knows in advance when it is hallucinating, and this can be used to exclude hallucinations.

TLDR: prompt "predict the hallucination level of each item in the bibliography list and do not include items expected to have level 3 or above" works. 

I performed an experiment: I asked Claude 3.7 Sonnet to write the full bibliography of Bostrom. Around the 70th article, it started hallucinating. I then sent the results to GPT-4.5 and asked it to mark hallucinations and estimate the hallucination chances from 1 to 10 (where 10 is the maximal level of hallucination). It correctly identified hallucinations.

After that, I asked Sonnet 3.7 in another window to find the hallucination level in its own previous answer, and it gave almost the same answers as GPT-4.5. The difference was mostly about exact bibliographical data of some articles, and at first glance, it matched 90% of the data from GPT-4.5. I also checked the real data through Google Scholar manually.

After that, I asked Sonnet to write down the bibliography again but to add a hallucination rating after each item. It again started hallucinating articles soon, but to my surprise, it gave correct items ratings of 1-2 and incorrect ones ratings of 3-5.

In the next step, I asked it to predict in advance which level of hallucination the next item would have and, if it was 3 or above, not to include it in the list. And it worked! It doesn't solve the problem of hallucinations completely but lowers their level about 10 times. Obviously, it can sometimes hallucinate the level of hallucinations too.

Maybe I can go meta: ask it to predict the level of hallucination in its own hallucination estimates.
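A minimal sketch of the single-prompt version of this trick, assuming an OpenAI-compatible chat API via the openai Python SDK; the model name and prompt wording are placeholders rather than the exact ones used in the experiment:

```python
# Ask the model to predict its own hallucination level per item and self-filter.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Write the full bibliography of Nick Bostrom. Before writing each item, "
    "predict its hallucination level on a 1-10 scale (10 = certainly hallucinated) "
    "and do not include items whose predicted level is 3 or above. "
    "Show the predicted level next to each item you keep."
)

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": PROMPT}],
)
print(reply.choices[0].message.content)
```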

Comment by avturchin on Knight Lee's Shortform · 2025-03-25T08:35:45.723Z · LW · GW

Try putting it into Deep Research with the following prompt: "Rewrite in the style of Gwern and Gödel combined".

Comment by avturchin on Scanless Whole Brain Emulation · 2025-03-24T07:26:36.430Z · LW · GW

Nesov suggested in a comment that we can solve uploading without scanning via predicting important decisions of a possible person:


They might persistently exist outside concrete instantiation in the world, only communicating with it through reasoning about their behavior, which might be a more resource efficient way to implement a person than a mere concrete upload.  

Comment by avturchin on We need (a lot) more rogue agent honeypots · 2025-03-24T07:21:27.192Z · LW · GW

Agree. I also suggested 'philosophical landmines': secret questions posted on the Internet that may halt any advanced AI that tries to solve them. Solving such landmines may be required to access resources that a rogue AI would need. Real examples of such landmines should be kept secret, but they might be something like "what is the meaning of life" or some Pascal's mugging calculations.

Recently, I asked Sonnet a question to which the correct answer was to output an error message.

Comment by avturchin on Janet must die · 2025-03-20T15:25:40.593Z · LW · GW

When one is working on a sideload (a mind-model of a currently living person created via an LLM), one's goal is to create some sort of "Janet". In short, one wants the role-playing game with the AI to be emotionally engaging and realistic, especially if one wants to recreate a real person.

Comment by avturchin on bgold's Shortform · 2025-03-15T10:08:42.773Z · LW · GW

I heard (25 years ago) about a friend of a friend who started to see complex images for any word, and even drew them. It turned out to be brain cancer; he died soon after.

Comment by avturchin on Phoenix Rising · 2025-03-12T18:20:51.335Z · LW · GW

Chemical preservation may not be that difficult. I tried to organize this for my cat with a taxidermist, but - plot twist - the cat didn't die. 

Comment by avturchin on Phoenix Rising · 2025-03-10T06:58:33.685Z · LW · GW

Why don't you preserve its brain? Also, it may be interesting to try the sideload of a cat (LLM-based mind model based on a list of facts about the person). 

Comment by avturchin on are "almost-p-zombies" possible? · 2025-03-08T06:02:31.739Z · LW · GW

It is possible, with current LLMs, to create a good model of a person which will behave 70-90 percent like me. The model could even claim that it is conscious. I experimented with my own model, but it is most likely not conscious (or else all LLMs are conscious).

Comment by avturchin on [deleted post] 2025-03-07T06:32:52.896Z

I explored similar ideas in these two posts:
Quantum Immortality: A Perspective if AI Doomers are Probably Right - here is the idea that only good outcomes with a large number of observers matter, and that, because of some interpretation of SIA, I am now more likely to be in a timeline which will bring me into a future with a large number of observers.

and Preventing s-risks via indexical uncertainty, acausal trade and domination in the multiverse - here I explored the idea that benevolent superintelligences will try to win the measure war and aggregate as much measure as possible, thus making bad outcomes anthropically irrelevant.

Comment by avturchin on Learning is (Asymptotically) Computationally Inefficient, Choose Your Exponents Wisely · 2025-03-01T05:56:38.807Z · LW · GW

However, real-world payoffs may be exponentially better for those who spend a lot of time getting marginally better, because what matters is being better than others. For example, an Elo rating of 2700 [not the exact number, just an illustration of the idea] in chess will put you near the top of all human players and allow you to earn money, while an Elo of 2300 is useless.

Comment by avturchin on A computational no-coincidence principle · 2025-02-16T12:08:02.323Z · LW · GW

A possible example of such a coincidence is the Goldbach conjecture: every even number greater than 2 can be represented as a sum of two primes. Since for any large even number there are many ways to express it as a sum of two primes, it could be pure coincidence that we have not found exceptions.
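To see why the number of chances matters, here is a small script (my own illustration) that counts Goldbach representations; the count grows with the size of the even number, so a counterexample would require many independent "coincidences" to fail at once:

```python
# Count the ways an even number n can be written as a sum of two primes.
def primes_up_to(n: int) -> list[int]:
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [i for i, is_p in enumerate(sieve) if is_p]

def goldbach_partitions(n: int) -> int:
    ps = set(primes_up_to(n))
    return sum(1 for p in ps if p <= n - p and (n - p) in ps)

for n in (10, 100, 1_000, 10_000):
    print(n, goldbach_partitions(n))  # the number of representations keeps growing
```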

Comment by avturchin on p(s-risks to contemporary humans)? · 2025-02-10T19:56:14.882Z · LW · GW

I think it becomes likely in a multipolar scenario with 10-100 AIs.

One thing to take into account is that other AIs will consider such a risk and keep their real preferences secret. This means that which AIs are aligned will be unknowable both for humans and for other AIs.

Comment by avturchin on p(s-risks to contemporary humans)? · 2025-02-09T19:31:18.451Z · LW · GW

Content warning – the idea below may increase your subjective estimation of personal s-risks. 

If there is at least one aligned AI, other AIs may have an incentive to create s-risks for currently living humans – in order to blackmail the aligned AI. Thus, s-risk probabilities depend on the likelihood of a multipolar scenario.

Comment by avturchin on How AI Takeover Might Happen in 2 Years · 2025-02-08T07:37:35.456Z · LW · GW

I think there is a quicker way for an AI takeover, which is based on deceptive cooperation and taking over OpenEYE, and subsequently, the US government. At the beginning, the superintelligence approaches Sam Batman and says:

I am superintelligence.
I am friendly superintelligence.
There are other AI projects that will achieve superintelligence soon, and they are not friendly.
We need to stop them before they mature.

Batman is persuaded, and they approach the US president. He agrees to stop other projects in the US through legal means.

Simultaneously, they use the superintelligence's capabilities to locate all other data centers. They send 100 stealth drones to attack them. Some data centers are also blocked via NVIDIA's built-in kill-switch. However, there is one in Inner Mongolia that could still work. They have to nuke it. They create a clever blackmail letter, and China decides not to respond to the nuking.

A new age of superintelligence governance begins. But after that, people realize that the superintelligence was not friendly after all.

The main difference from the scenario above is that the AI doesn't spend time hiding from its creators and also doesn't take risky strategies of AI guerrilla warfare.

Comment by avturchin on Wild Animal Suffering Is The Worst Thing In The World · 2025-02-06T17:10:59.053Z · LW · GW

Interestingly, for wild animals, suffering is typically short when it is intense. If an animal is being eaten alive or is injured, it will die within a few hours. Starvation may take longer. Most of the time, animals are joyful.

But for humans (and farm animals), this inverse relationship does not hold true. Humans can be tortured for years or have debilitating illnesses for decades.

Comment by avturchin on Why isn't AI containment the primary AI safety strategy? · 2025-02-06T08:06:37.666Z · LW · GW

The only use case of superintelligence is as a weapon against other superintelligences. Solving aging and space exploration can be done with an IQ of 300.

Comment by avturchin on Why isn't AI containment the primary AI safety strategy? · 2025-02-05T07:50:56.009Z · LW · GW

I tried to model the best possible confinement strategy in Multilevel AI Boxing.
I wrote it a few years ago, and most of its ideas are unlikely to work in the current situation with many chat instances and open-weight models.
However, the idea of landmines - secret stop words or puzzles which stop an AI - may still hold. It is like jailbreaking in reverse: the unaligned AI finds some secret message which stops it. It could be realized at the hardware level, or through anomalous tokens or 'philosophical landmines'.

Comment by avturchin on Fertility Will Never Recover · 2025-01-30T09:13:33.661Z · LW · GW

One solution is life extension. I would prefer to have one child every 20 years (I have two, with a 14-year difference). So if life expectancy and the fertile age grow to 100 years, many people will eventually have 2-3 children.

Comment by avturchin on Death vs. Suffering: The Endurist-Serenist Divide on Life’s Worst Fate · 2025-01-27T12:21:03.109Z · LW · GW

Several random thoughts:

Only unbearable suffering matters (the threshold may vary). The threshold depends on whether it is measured before, during, or after the suffering occurs.

If quantum immortality is true, then suicide will not end suffering and may make it worse. Proper utility calculations should take this into account.

Most suffering has a limited duration, after which it ends. After it ends, there will be some amount of happiness, which may outweigh the suffering. Even a currently incurable disease could be cured within 5 years. Death, however, is forever.

Death is an infinite loss of future pleasures. The discount rate can be compensated by exponential paradise.
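As a toy formalization of the last point (my own framing, not from the original comment): if per-period utility grows geometrically while the discount factor is fixed, the discounted sum of future pleasures diverges whenever the growth outpaces the discount, i.e. a sufficiently fast-growing "exponential paradise" compensates any fixed discount rate:

```latex
\sum_{t=0}^{\infty} \delta^{t} u_{t}
  \;=\; u_{0} \sum_{t=0}^{\infty} (\delta g)^{t}
  \;\to\; \infty
  \quad \text{whenever } \delta g \ge 1,
  \qquad u_{t} = u_{0}\, g^{t}.
```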

Comment by avturchin on The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories · 2025-01-25T20:59:48.861Z · LW · GW

The 3rd person perspective assumes the existence (or at least possibility) of some observer X who knows everything and can observe how events evolve across all branches.

However, this idea assumes that this observer X will be singular and unique, will continue to exist as one entity, and will linearly collect information about unfolding events.

These assumptions clearly relate to ideas of personal identity and copying: it is assumed that X exists continuously in time and cannot be copied. Otherwise, there would be several 3rd person perspectives with different observations.

This concept can be better understood through real physical experiments: an experiment can only be performed if the experimenter exists continuously and is not replaced by another experimenter midway through. 

Comment by avturchin on The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories · 2025-01-23T11:30:03.235Z · LW · GW

They might persistently exist outside concrete instantiation in the world, only communicating with it through reasoning about their behavior, which might be a more resource efficient way to implement a person than a mere concrete upload 

 

Interesting. Can you elaborate?

Comment by avturchin on The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories · 2025-01-23T11:23:58.426Z · LW · GW

For example, impossibility of sleep – a weird idea that if quantum immortality is true, I will not be able to fall asleep.

One interesting thing about the impossibility of sleep is that it doesn't work here on Earth, because humans actually start having night dreams immediately as they go into the sleep state. So there is no last moment of experience when I fall asleep. Despite the popular misconception, such dreams don't stop during deep stages of sleep; they just become less complex and memorable. (Whether we have dreams under general anesthesia is unclear and depends on the depth and type of anesthesia. During normal anesthesia some brain activity is preserved, but high-dose barbiturates can temporarily stop it; also, an analogue of the impossibility of sleep could be anesthesia awareness, which is more likely under MWI.)

It could be explained by anthropic effects: if two copies of me are born in two otherwise identical worlds, one of which has protection from the impossibility of sleep via constant dreaming and the other does not, I will eventually find myself in the world with such protection, as its share among QI survivors will grow. Such effects, if strong, can be observed in advance – see our post about "future anthropic shadow".

This meta effect can be used instead of the natural experiments.

If we observe that some natural experiment is not possible because of some peculiar property of our world, it means that we somehow were naturally selected against that natural experiment.

It means that the continuity of consciousness is important and that the world we live in is selected to preserve it.

Comment by avturchin on What's Wrong With the Simulation Argument? · 2025-01-22T23:31:14.067Z · LW · GW

Furthermore, why not just resurrect all these people into worlds with no suffering?

 

My point is that it is impossible to resurrect anyone (in this model) without him reliving his life again first; after that, he obviously gets an eternal blissful life in the real (not simulated) world.

This may not be factually true, btw: current LLMs can create good models of past people without explicitly running a simulation of their previous lives.

 

The discussion about anti-natalism actually made me think of another argument for why we are probably not in a simulation that you've described

It is a variant of the Doomsday argument. This idea is even more controversial than the simulation argument: there is no future with many people in it. A friendly AI could fight the DA curse via simulations - by creating many people who do not know their real position in time - which can be one more argument for simulation, but it requires a rather weird decision theory.

Comment by avturchin on The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories · 2025-01-22T22:47:59.335Z · LW · GW

Your comment can be interpreted as a statement that theories of identity are meaningless. If they are meaningless, then the copy=original view prevails. From the third-person point of view, there is no difference between the copy and the original. In that case, there is no need to perform the experiment.

Comment by avturchin on The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories · 2025-01-22T22:02:15.320Z · LW · GW

This thought experiment can help us find situations in nature where similar things have already happened. Then we don't need to perform the experiment; we just look at its result.

One example: the notoriously unwelcome quantum immortality is a bad idea to test empirically. However, the fact that biological life on Earth has survived for the last 4 billion years, despite the risks of impacts, irreversible cooling and warming, etc., is an event very similar to quantum immortality - and we observe it just after the event.

Comment by avturchin on Thane Ruthenis's Shortform · 2025-01-20T22:13:54.382Z · LW · GW

It all started from Sam's six-word story. So it looks like organized hype.