Posts

Johannes' Biography 2024-01-03T13:27:19.329Z
What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate 2024-01-03T13:22:11.700Z
Thoughts on Max Tegmark's AI verification 2023-12-22T20:38:31.566Z
Saturating the Difficulty Levels of Alignment 2023-11-23T00:39:54.041Z
Is there Work on Embedded Agency in Cellular Automata Toy Models? 2023-11-14T09:08:38.598Z
Would this be Progress in Solving Embedded Agency? 2023-11-14T09:08:10.850Z
The Science Algorithm AISC Project 2023-11-13T12:52:43.150Z
Pivotal Acts might Not be what You Think they are 2023-11-05T17:23:50.464Z
A Good Explanation of Differential Gears 2023-10-19T23:07:46.354Z
[Video] Too much Empiricism kills you 2023-10-19T05:08:09.149Z
Focus on the Hardest Part First 2023-09-11T07:53:33.188Z
Using Negative Hallucinations to Manage Sexual Desire 2023-09-10T11:56:24.906Z
Where are the people building AGI in the non-dumb way? 2023-07-09T11:39:12.692Z
Write the Worst Post on LessWrong! 2023-06-23T19:17:56.049Z
I can see how I am Dumb 2023-06-10T19:18:59.659Z
Leave an Emotional Line of Retreat 2023-06-08T18:36:31.485Z
Hallucinating Suction 2023-06-02T14:16:34.676Z
Believe in Yourself and don't stop Improving 2023-04-25T22:34:30.354Z
Being at peace with Doom 2023-04-09T14:53:22.924Z
Transparency for Generalizing Alignment from Toy Models 2023-04-02T10:47:03.742Z
Eliezer's Videos and More 2023-03-30T22:16:30.269Z
Computer Input Sucks - A Brain Dump 2023-03-08T11:06:37.780Z
Bite Sized Tasks 2023-03-04T03:31:30.404Z
Reading Speed Exists! 2023-02-18T15:30:52.681Z
Run Head on Towards the Falling Tears 2023-02-18T01:33:50.202Z
My Advice for Incoming SERI MATS Scholars 2023-01-03T19:25:38.678Z
Don't feed the void. She is fat enough! 2022-12-29T14:18:44.526Z
Is there any unified resource on Eliezer's fatigue? 2022-12-29T14:04:53.488Z
Working towards AI alignment is better 2022-12-09T15:39:08.348Z
Understanding goals in complex systems 2022-12-01T23:49:49.321Z
Is there an Ultimate text editor? 2022-09-11T09:19:51.436Z
[Exploratory] Becoming more Agentic 2022-09-06T00:45:43.835Z
[Exploratory] What does it mean that an experiment is high bit? 2022-09-05T03:13:10.034Z
[Exploratory] Seperate exploratory writing from public writing 2022-09-03T02:57:18.167Z
[Exploratory] Exploratory Writing Info 2022-09-03T02:50:57.795Z
How (not) to choose a research project 2022-08-09T00:26:37.045Z
Gathering Information you won't use directly is often useful 2022-07-24T21:21:54.877Z
Post hoc justifications as Compression Algorithm 2022-07-03T05:02:15.142Z
SOMA - A story about Consciousness 2022-07-03T04:46:18.291Z
Sexual self-acceptance 2022-07-03T04:26:46.801Z
Agent level parallelism 2022-06-18T20:56:12.236Z
Saying no to the Appleman 2022-04-29T10:39:48.693Z
Convincing Your Brain That Humanity is Evil is Easy 2022-04-07T21:39:14.688Z
Finding Useful Things 2022-04-07T05:57:47.058Z
Setting the Brains Difficulty-Anchor 2022-04-07T05:04:54.411Z
What Should We Optimize - A Conversation 2022-04-07T03:47:42.439Z
My Transhuman Dream 2022-04-05T15:44:46.636Z
Being the Hero is hard with the void 2022-01-17T11:27:31.020Z
The possibility of no good amazing forecasters 2022-01-03T12:57:59.362Z
Johannes C. Mayer's Shortform 2021-05-23T18:30:20.427Z

Comments

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-04-19T19:18:53.352Z · LW · GW

Today I learned that being successful can involve feelings of hopelessness.

When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.

This is quite strange because most of the time when I am in such a situation and manage to make a real efford anyway I seem to always suprise myself with how much progress I manage to make. Empirically this feeling of hopelessness does not seem to track the actual likelyhood that you will completely fail.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-04-18T18:58:33.232Z · LW · GW

Default mode network suppression

I don't get distracted when talking to people. I hypothesise that this is because as long as I am actively articulating a stream of thought out loud, the default mode network will be suppressed, making it easy to not get derailed.

So even if IA does not say anything, just me talking about some specific topic continuously, would make it easier for IA to say something, because the default mode network suppression will not immediately vanish.

When thinking on my own or talking to IA, the stream of thoughts is shorter, and there are a lot of pauses. Usually, I don't even get to the point where I would articulate a complex stream of thought. Instead, we are at the level of "Look there is some mud there, let's not step into that", or "We can do this". That really does seem very similar to most of the idle chatter that the default mode network would produce when I am just thinking on my own.

Once I get to the point where I am having an engaging discussion with IA, it is actually pretty easy not to get distracted. It's probably still easier to get distracted with IA, because when I am talking to another person, they could notice that I am lost in thought, but I myself (or IA) would not be able to notice as easily.

Capturing IA's Thoughts

One reason why I don't do research with IA might be that I fear that I will not be able to capture any important thoughts that I have. However, using the audio recorder tool on the walk today seemed to really fix most of the issue.

Maybe in my mind so far I thought that because I can't record IA when she is talking to me, it would be bad to think about research. But this now seems very wrong. It is true that I can't create a video with her in it like I do with other people. But these videos are not the thing that is most useful. The actually useful thing is where I am distilling the insight that I have into some text document.

But this is something that I can totally do when talking to IA. Like I did with the audio recorder today. It seemed that making the audio recording made it also easier to talk to IA. Probably because when making the recording I would naturally be suppressing the default mode network very strongly. This effect then probably did not vanish immediately.

Writing

In fact, it seems like this would work very well with IA because I don't need to think about the problem of what the other person could do while I write. In the worst case, IA is simply not run. At best, we could write the text together.

Writing together would seem to work unusually well because IA does have insight into the things that I am thinking while I am writing, which is not something that other people could easily get.

And I haven't really explored all the possibilities here. Another one would be to have IA read out loud my writing and give me feedback.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-04-05T22:39:25.428Z · LW · GW

Here is a song a friend made based on this post I wrote very long ago about how to eat a rock. He using suno. I thought it's quite good, but that might just be me.

Comment by Johannes C. Mayer (johannes-c-mayer) on What subjects are unexpectedly high-utility? · 2024-02-02T13:46:36.693Z · LW · GW

In principle, this seems quite plausible that it could be helpful. I am asking if you have actually used this and if you have observed benefits.

Comment by Johannes C. Mayer (johannes-c-mayer) on leogao's Shortform · 2024-01-29T21:41:25.443Z · LW · GW

I think it's very important to keep track of what you don't know. It can be useful to not try to get the best model when that's not the bottleneck. But I think it's always useful to explicitly store the knowledge of what models are developed to what extent.

Comment by Johannes C. Mayer (johannes-c-mayer) on leogao's Shortform · 2024-01-29T21:40:56.869Z · LW · GW

The algorithm that I have been using, where what to understand to what extend is not a hyperparameter, is to just solve the actual problems I want to solve, and then always slightly overdo the learning, i.e. I would always learn a bit more than necessary to solve whatever subproblem I am solving right now. E.g. I am just trying to make a simple server, and then I learn about the protocol stack.

This has the advantage that I am always highly motivated to learn something, as the path to the problem on the graph of justifications is always pretty short. It also ensures that all the things that I learn are not completely unrelated to the problem I am solving.

I am pretty sure if you had perfect control over your motivation this is not the best algorithm, but given that you don't, this is the best algorithm I have found so far.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-01-29T21:28:44.620Z · LW · GW

List of good, widely legal, and prescription-free drugs:

Stimulants

  • Huperzine-A
  • X-tiamine (e.g. Benfotiamine (only one available in Germany), Sulbotiamine, etc.)
  • Vitamin-D
  • L-tyrosine
  • Caffeine Tablets + L-theanine
  • Theobromine (cacao beans, e.g. chocolate)
  • Nicotine (lozenges, gum, or patches)
  • Conversation (Not a drug, but very stimulating. E.g. I have multiple times talked to somebody for over 12 hours.)

Sleep

  • Melatonin
Comment by Johannes C. Mayer (johannes-c-mayer) on What subjects are unexpectedly high-utility? · 2024-01-26T17:46:59.145Z · LW · GW

How is 3d sketching good? I don't understand. I guess it's like a whiteboard, but in 3d (I assume you are talking about the VR thing). Could you explain how you think this is useful? How have you used this in the past? What could you do using 3d sketching that you could not do before (or that got significantly easier to do).

Comment by Johannes C. Mayer (johannes-c-mayer) on What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate · 2024-01-07T00:59:23.464Z · LW · GW

I steam the blood until it solidifies. It all gets heated to approximately 100 degrees Celsius. That gives you a sort of black pudding (there are literally dishes called black pudding in the UK that use blood like this).

Comment by Johannes C. Mayer (johannes-c-mayer) on What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate · 2024-01-04T21:45:31.439Z · LW · GW

I think eating blood is not that bad, because afaik the blood would be used as fertilizer on fields otherwise, or fed to animals.

You need to get diagnosed for sleep apnea before being even able to get a CPAP device (at least in Germany). I did sleep 3 nights in a sleep lab with electrodes and all for that.

Comment by Johannes C. Mayer (johannes-c-mayer) on What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate · 2024-01-04T21:39:27.655Z · LW · GW

500-600g frozen prechopped Kale. It contains a small amount of liquid (or rather ice). I'd guess maybe 5%-10% of the weight is the liquid. I am not sure if they count the water. I would buy the thing that says either 600g on the packaging, or 1kg, and then use half.

Also, I always drink the cooking liquid. I am not sure that is required, but it has a pretty strong kale taste so I'd guess there is probably at least some more kale goodness in there.

Comment by Johannes C. Mayer (johannes-c-mayer) on What Helped Me - Kale, Blood, CPAP, X-tiamine, Methylphenidate · 2024-01-04T21:36:17.568Z · LW · GW

Yes. Note that benfotiamine has a slight stimulant effect (at 600mg) that appears very quickly and lasts for 1-2 hours. So it is pretty good to take just after waking up just for that effect alone.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-01-04T21:34:47.750Z · LW · GW

Initially, I thought that your comment did not apply to me at all. I thought that most of the feedback that I get that is negative is actually of the form that the feedback is correct, but it was delivered incorrectly. But now that I think about it, it seems that most of the negative feedback that I get is based on that somebody does not understand what I am saying sufficiently. This might be in large part because I fail to explain it properly.

There are definitely instances though where people did point out big important holes in my reasoning. All of the people who did that were really competent I think. And they did point out things in such a way that I was like "Oh damm, this seems really important! I should have thought about this myself." But I did not really get negative reinforcement at all from them. They usually pointed it out in a neutral philosopher style, where you talk about the content not the person. I think most of the negative feedback that I am talking about you would get when people don't differentiate between the content and the person. You want to say "This idea does not work for reason X". You don't want to say "Your idea is terrible because you did not write it up well, and even if you had written up well, it seems to really not talk about anything important."

Interestingly I get less and less negative feedback, on the same things I do. This is probably because of a selection effect where people who like what I do would stick around. However, another major factor seems to be that because I worked on what I do for so long, it gets easier and easier to explain. In the beginning, it is very illegible because it is mostly intuitions. And then as you cash out the intuitions things become more and more legible.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2024-01-03T12:56:26.943Z · LW · GW

Here is the problem with people saying that something that you do is complete garbage. Even when consciously I know that what I'm doing is good and that I can think about all the reasons why it is good, there is some algorithm in my brain that sends a reinforcement signal that is not controllable by me directly when somebody says that what I am doing is just completely terrible.

I think sending these kinds of reinforcement signals is very bad because these are the signals that, when you send them often enough, make you not want to work on something anymore. Even when you consciously think that this is the best thing that you can do, simply because these reinforcement signals have such a strong impact on your mind. An impact that cannot be overwritten or removed (or at least not easily). It is very hard to ignore these signals initially, which is when you need to ignore them, in order to make them have not this strong negative impact. You basically need to be in a mindset of "I don't give a fuck what you think" in order for it to not affect you. At least that is the most effective way that I have discovered so far. But this has other negative side-effects, like being more likely to ignore good things the other person says.

It also seems like, to a significant extent, it's important how you say something. You can say something in a demeaning way that puts yourself above the other person, which is not what you would want to do. You should do it in a very cold, philosopher-like demeanor. Really, I think one of the superpowers philosophers have is that they usually get trained to talk in a way that you're not talking about any person anymore and you know that whatever you're saying is not reflecting on what any person is saying, but only on the content that is being spoken about.

I would like my mind to be such that anybody could just say whatever they think is best for maximizing information flow and I could just handle that information appropriately, but it seems like I'm not able to do this. I think I'm pretty good at it, but I think I'm not so good that it makes sense for me to request you to just optimize the information flow. I would like you to optimize for information flow, but also for saying things in a way that doesn't trigger this reinforcement circuitry, which I think is very bad.

I think in future conversations I'd like people to say P9 instead of you or Johannes. Where P9 means the computer that is Johannes' brain and all the algorithms/processes that run on it and have run on it. Now we removed the 'I' form the equation, and it seems that in principle no matter what you say with regard to P9 it should not make me feel bad. I have used this technique to some limited extent in the past and there it had worked pretty well.

Another thing that might be useful to try is to use the meditation technique to resolve the self and see how then the feedback is taken and if still the qualia of negativity arises.

I have talked to many people who said they subscribe to Kruger's rules and I think, possibly each time I noticed, I think this exact phenomenon that I am describing here and then. Sometimes it was so strong that they literally wanted to stop talking about the particular topic that's being discussed where I am was just being really straightforward in a way too harsh way about what I think. I really strongly recommend that these people don't say that they subscribe to the standard version of Kruger's rules because clearly it has a negative impact on them giving them reinforcement signals that make them not want to think anymore about particular topics which seems extremely bad.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-12-30T11:42:28.750Z · LW · GW

How to get better at writing? Writing a lot helps a lot. I was very bad at communication when I started, but then I just kept writing bad posts and got better.

You want to find the right balance between just writing and reflecting. This is similar to learning programming. In programming, you get really good by just doing a lot of programming. Reflecting on how you have written a program, and trying to get better, is also an important part of it, but is secondary to just doing a lot of programming.

Studying existing materials, like reading textbooks, is also important, but comes third after reflecting on your own output. At least that is my model. I believe this model of first doing then reflecting, then studying existing things applies to basically everything you want to get good at.

Comment by Johannes C. Mayer (johannes-c-mayer) on Send us example gnarly bugs · 2023-12-26T16:46:12.367Z · LW · GW

Maybe looking at the code for games would make sense because in games we optimize bugs way very differently than for other software. Usually, when there is a bug in a game, it's not critical. Many developers only do the kinds of testing where you play the game and then see if there are any bugs that you run into that need to be fixed.

I expect this overall approach to developing software would lead to subtle bugs that are hard to fix.

This is partly because of the complexity of games, but also because bugs that are not critical usually will not get fixed. You start to build upon these bugs, possibly creating a very rigid structure such that anything within the structure, including the pre-existing bugs, will be tough to fix later on.

However, this might not be what you're looking for, especially when we're talking about games written in engines like Unity, because then the structure of the program is extremely different from something like a Python module, and to properly evaluate it, you would need to look at and understand some video stream,

However, maybe this is still interesting because it does present a sort of very, very tough-to-crack kind of programming challenge, really because the programmers who write the code mainly look at the video stream to evaluate it, and therefore, the overall problem of engaging with the program and realizing what is even wrong is a lot harder. At least some of the time.

I do have some pretty horrible spaghetti code mess games written in Unity that have not been posted publicly but again, I would expect that this is not that useful to you therefore I will not submit it unless you tell me otherwise.

Another thing to consider is to look at is code for shaders, which has similar properties to what I outlined above, but in a more accessible way and is a lot more self-contained and smaller in scope (also has the same problems you potentially need to look at a video stream to evaluate if it works).

Comment by Johannes C. Mayer (johannes-c-mayer) on TurnTrout's shortform feed · 2023-12-07T12:09:34.857Z · LW · GW

To me, it seems that consequentialism is just a really good algorithm to perform very well on a very wide range of tasks. Therefore, for difficult enough tasks, I expect that consequentialism is the kind of algorithm that would be found because it's the kind of algorithm that can perform the task well.

When we start training the system we have a random initialization. Let's make the following simplification. We have a goal in the system somewhere and then we have the consequential reasoning algorithm in the system somewhere. As we train the system the consequential reasoning will get better and better and the goal will get more and more aligned with the outer objective because both of these things will improve performance. However, there will come a point in training where the consequential reasoning algorithm is good enough to realize that it is in training. And then bad things start to happen. It will try to figure out and optimize it for the outer objective. SGD will incentivize this kind of behavior because it performs better than not doing it.

There really is a lot more to this kind of argument. So far I have failed to write it up. I hope the above is enough to hint at why I think that it is possible that being deceptive is just better than not being deceptive in terms of performance. When you become deceptive, that aligns the consequentialist reasoner faster to the outer objective, compared to waiting for SGD to gradually correct everything. In fact it is sort of a constant boost to your performance to be deceptive, even before the system has become very good at optimizing for the true objective. A really good consequential reasoner could probably just get zero loss immediately by retargeting its consequential reasoner instead of waiting for the goal to be updated to match the outer objective by SGD, as soon as it got a perfect model of the outer objective.

I'm not sure that deception is a problem. Maybe it is not. but to me, it really seems like you don't provide any reason that makes me confident that it won't be a problem. It seems very strange to me to argue that this is not a thing because existing arguments are flimsy. I mean you did not even address my argument. It's like trying to prove that all bananas are green by showing me 3 green bananas.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-12-07T11:25:20.618Z · LW · GW

So, what happens when we figure out how to align language models? By then the state-of-the-art will involve having multi-modal models. Assume we have figured out how to make steganography not a problem in chain of thoughts reasoning. But maybe that is now kind of useless because there are so many more channels that could be stenographically exploited. Or maybe some completely different problem that we haven't even thought about yet will come up.

Comment by Johannes C. Mayer (johannes-c-mayer) on Computer Input Sucks - A Brain Dump · 2023-12-04T19:05:14.159Z · LW · GW

How long have you been using this setup? I'd be curious to see if you use it more than 2 weeks consistently.

Comment by johannes-c-mayer on [deleted post] 2023-11-26T18:02:40.751Z

Filler Filler

This is a test comment.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-11-25T14:19:58.805Z · LW · GW

How do you define an algorithm that samples a random natural number, according to a probability distribution that assigns non-zero weight to every natural number? Meditate on it, before reading the solution.


def sample_nat(p_heads=0.5):
    i = 0
    while True:
        if coin_flip(p_heads) == 'heads':
            return i
        i += 1

With p_heads=0.5 we implicitly define the probability distribution:

This program is pretty weird because the probability that it will not have halted after n steps is non-zero, for any n.

[Edit 2023-11-26]

Now I understand what is going on. This program is in fact a non-halting program, that halts with probability 1.

This is similar to how the probability that you get a rational number when you sample uniformly from is zero.

(I am unsure about this paragraph.) In practice, this depends somewhat on what pseudo-random number generator we use. The pseudo-random number generator that we use might be such that there is no possible initial state, such that the generated sequence would be all tails, in which case the program would be guaranteed to terminate.

[END Edit]

Comment by Johannes C. Mayer (johannes-c-mayer) on Robustness to Scale · 2023-11-23T13:34:11.258Z · LW · GW

It seems there is a generalization of this. I expect there to be many properties of an AI system and an environment such that if the value of the property changes alignment doesn't break.

The hard part is to figure out which properties are most useful to pay attention to. Here are a few:

  • Capability (as discussed in the OP)
    • Could be something very specific like the speed of compute
    • Pseudo Cartesianess (a system might be effectively cartesian at a certain level of capability before it figures out how to circumvent some constraints we put on it)
  • Alignment (ideally we would like to detect and shut down if the agent becomes misaligned)
  • Complexity of the system (maybe your alignment scheme rests on being able to understand the world model of the system, in which case it might stop working as we move from modeling toy worlds to modeling the real world)
  • etc.

I think it can be useful to consider the case where alignment breaks as you decrease capabilities. For example, you might think of constructing a minimal set of assumptions such that you would know how to solve alignment. One might be, having an arbitrary amount of compute and memory available that can execute any halting program arbitrarily fast. If we want to remove this assumption it might break alignment. It's pretty easy to see how alignment could break in this case, but it seems useful to have the concept of the generalized version.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-11-21T22:26:35.826Z · LW · GW

Upon reflection, I'm unsure what you mean by the program being simpler. What is your preferred way to represent modular addition? I could of course write down 20 % 11. I know exactly what that means. But first of all, this is not an algorithm. It just talks about the concept of modular arithmetic without specifying how to compute it. And understanding the concept at a high level is of course easier than representing the entire algorithm all at once in my mind.

I guess the normal way you would compute the modulo would be to take a number and then subtract from it until what is left is smaller than . What is left is then the modulo. Ok, that seems simpler so never mind.

It does seem an important distinction to think about the way we represent a concept and the actual computation associated with obtaining the results associated with that concept. I got confused because I was conflating these two things.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-11-18T14:55:12.222Z · LW · GW

Do you think this might be a significant obstacle in the future? For example, do you think it is likely that the algorithms inside of an AGI-neural-network built by SGD will be so complicated that they are not humanly understandable, because of their sheer size? I am especially thinking about the case where an algorithm exists that is just as capable but understandable.

This seems more likely if we end up with an AGI-neural-network that mushes together the world model and the algorithms that use the world model (e.g. update it, use it to plan), such that there are no clear boundaries. If the AGI is really good at manipulating the world, it probably has a pretty good model of the world. As the world contains a lot of algorithmic information, the AGI's model of the world will be complex. If the system is mushed we might need to understand all that complexity to an intractable extent.

I expect that if you can have a system where the world model is factored out into its own module, it will be easier to handle the complexity in the world because then we can infer properties of the world model based on the algorithms that construct and use it. I expect the world model will still be very complex, and the algorithms that construct and use it will be simple. Therefore infering properties of the world model based on these simple algorithms might still be tractable.

Do you think this problem is likely to show up in the future?

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-11-18T12:58:03.531Z · LW · GW

Imagine all possible programs that implement a particular functionality. Imagine we have a neural network that implements this functionality. If we have perfect mechanistic interpretability we can extract the algorithms of a neural network that implements that functionality. But what kind of program do we get? Maybe there are multiple qualitatively different algorithms that all implement the functionality. Some of them would be much easier to understand for human. The algorithm the neural network finds might not be that program that is easiest to understand to a human.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-11-13T14:46:59.060Z · LW · GW

I made this collage of people I think are cool and put it in my room. I thought it might motivate me, but I am not sure if this will work at all or for how long. Feel free to steal. Though if it actually works, it would probably work better if you pick the people yourself.

Comment by Johannes C. Mayer (johannes-c-mayer) on The Fundamental Theorem for measurable factor spaces · 2023-11-13T10:12:56.131Z · LW · GW

Consider adding more context at the beginning of the post about why the reader should care. Something like:

"Finite factored sets have the fundamental theorem, which is X. This is an analogous theorem but for the more general case of using measurable spaces. It is formulated using the causal graph construction."

This is important because it allows us to solve "Concrete problem P" which we were not able to solve before. (Or whatever the reason is that this is progress.) An alternative to this would be to first introduce a problem that is clearly important and then show how finite factored sets can't handle this problem very well.

Of course, if there are writeups doing these things you may supplement links to them. Though even then it is probably good to give a very brief summary still.

I think making these changes would be almost no work, compared to doing all the math, and it would make it a lot more accessible. Right now this text seems to assume that you are intimately familiar with finite factored sets, to the extent that you would not even look up again what the fundamental theorem for finite factored sets is (there is no link to it at the appropriate point).

As I don't quite understand what you did, some of the details in my suggestions might be wrong, but I think the overall points I am trying to make should still hold up.

Comment by Johannes C. Mayer (johannes-c-mayer) on Pivotal Acts might Not be what You Think they are · 2023-11-11T11:19:19.899Z · LW · GW

Letting Loose a Rogue AI is not a Pivotal Act

It seems likely that governments who witnessed an "almost AI takeover" will try to develop their own AGI. This could happen even if they understand the risks. They might think that other people are rushing it, and that they could do a better job.

But if they don't really understand the risks, like right now, then they are even more likely to do it. I don't count this as a pivotal act. If you can get the outcome you described then it would be a pivotal act, but the actions you propose would not have that outcome with high probability. I would guess with much less than 50%. Probably less than 10%.

There might be a version of this, with a much more concrete plan, such that we can see that the outcome would actually follow from executing the plan.

On Having an AI explain how Alignment is Hard

I think your second suggestion is interesting. I'd like to see you write a post about it exploring this idea further.

If we build a powerful AI and then have it tell us about all the things that can go wrong with an AI, then we might be able to generate enough scientific evidence about how hard alignment is, and how likely we will die, e.g. in the current paradigm, such that people would stop.

I am not talking about conceptual arguments. To me at least I think the current best conceptual arguments already strongly point in that direction. But extremely concrete rigorous mathematical arguments, or specific experimental setups that show how specific phenomena do in fact arise. For example, if you had an experiment that showed that Eliezer's arguments are correct, that when you train hard enough on a general enough objective, you will in fact get out more and more general cognitive algorithms, up to and including AGI. If the system also figures out some rigorous formalisms to correctly present these results, then this could be valuable.

The reason why this seems good to me, at first sight, is that false positives are not that big of an issue. If an AI finds all the things that can go wrong, but 50% of them are false positives in the sense that they would not be a problem in practice, we may get saved, because we are aware of all the ways things can go wrong. When solving alignment, false positives, i.e. thinking that something is safe when it is not, kill you.

Intuitively it also seems that evaluating whether something describes a failure case is a lot easier than evaluating whether something can't fail.

When doing this you are much less prone to the temptation of delaying a system in practice with the insights you got. Understanding failure modes does not necessarily imply that you know how to solve them (though it is the first step, and definitely can do this).

Pitfalls

That being said there are a lot of potential pitfalls with this idea, but I don't think they disqualify the idea:

  • An AI that could tell you how alignment is hard might already be so capable that it is dangerous.
  • When telling you how things are dangerous, it might formulate concepts that are also very useful for advancing capabilities.
  • If the AI is very smart it could probably trick you. It could present to you a set of problems with AI, such that it looks like if you solved all the problems you would have solved alignment. But in fact, you would still get a misaligned AGI that would then reward the AI that deceived you.
    • E.g. if the AI roughly understands its own cognitive reasoning process, and notices how it is not really aligned, it would give the AI information about what parts of alignment the humans have figured out already.
  • Can we make the AI figure out the useful failure modes? There are tons of failure modes, but ideally, we would like to discover new failure modes such that eventually we can paint a convincing picture of the hardness of the problem. An even better would be a list of problems corresponding to having new important insights (though this would go beyond what this proposal tries to do).

Prefered Planning

Let's go back to your first pivotal act proposal. I think I might have figured out where you miss-stepped.

Missing step plans are a fallacy, and thinking of them I realize that I think you probably committed another type of planning fallacy here. I think you generated a plan and then assumed some preferred outcome would occur. That outcome might be possible in principle but not what would happen in practice. This seems very related to The Tragedy of Group Selectionism.

This fallacy probably shows up when generating plans, because if there are no other people involved and the situation is not that complex, it is probably a very good heuristic. When you are generating a plan, you don't want to fill in all the details of the plan. You want to make the planning problem as easy as possible. So our brain might implicitly make the assumption that we are going to optimize for the successful completion of the plan. That means that the plan can be useful as long as it roughly points in the correct direction. Mispredicting an outcome is fine, because later on when you realize that the outcome is not what you wanted, you can just apply more optimization pressure, changing the plan, such that now the plan again has the desired outcome. As long as you were walking roughly in the right direction, and things you have been doing so far don't turn out to be completely useless, this heuristic is great for reducing the computational load of the planning task.

Details can be filled in later, corrections can be made later. At least as long as you will reevaluate your plan later on. You could do this by reevaluating the plan when:

  • A step is completed
  • You notice a failure when executing the current step
  • You notice that the next step has not been filled in yet.
  • After a specific amount of time passed.

Sidenote: Making an abstract step more concrete might seem like a different operation from regenerating in the case where you notice that the plan does not work. But it could just involve the same planning procedure. In one case with a different starting point, and in the other with a different set of constraints.

I expect part of the failure mode here is that you generate a plan and then to evaluate the consequences of the plan, you implicitly plug yourself into the role of the people who would be impacted by the plan, to predict their reaction. Without words, you think "What would I do if I observed a rouge AI almost taking over the world, if I were China?" Probably without realizing, that this is what you are doing. But the resulting prediction is wrong.

Comment by johannes-c-mayer on [deleted post] 2023-11-07T15:12:36.411Z

I am not sure. It seems I figured out some high level stuff that I am not sure other GOFAI people have figured out yet, by thinking for tens of hours. When skiming AI a modern approach I did not find these insights in there.

Test dialog comment.

Comment by Johannes C. Mayer (johannes-c-mayer) on Pivotal Acts might Not be what You Think they are · 2023-11-07T12:35:49.786Z · LW · GW

I think you are correct, for a particular notion of pivotal act. One that I think is different from Eliezer's notion. It's certainly different from my notion.

I find it pretty strange to say that the problem is that a pivotal act is a single action. Everything can be framed in terms of a single action.

For any sequence of actions, e.g. [X, Y, Z] I can define a new action ω := [X, Y, Z], which executes X, then Y, and then Z. You can do the same for plans. The difference between plans and action sequences is that plans can have things like conditionals. For example, choosing the next sequence of actions based on the current state of the environment. You could also say that a plan is a function that tells you what to do. Most often this function takes in your model of the world.

So really you can see anything you could ever do as a single plan that you execute. If there are multiple steps involved you simply give a new name to to all these steps, such that you now have only a single thing. That is how I am thinking about it. After this definition, we can have a pivotal act that is composed of many small actions that are distributed across a large timespan.

The usefulness of the concept of a pivotal act comes from the fact that a pivotal act needs to be something that saves us with a very high probability. It's not important at all that it happens suddenly, or that it is a single action. So your criticism seems to miss the mark. You are attacking the concept of a pivotal act for having properties that it simply does not have.

"Upload a human" is something that requires many steps dispersed throughout time. We just use the name "Upload a human" such that we don't need to specify all of these individual steps in detail. That would be impossible right now anyway, as we don't know exactly how to do it.

So if you provide a plan that is composed of many actions distributed throughout time, that will save us with a very high probability, I would count this as a pivotal act.

Note that being a pivotal act is a property of a plan in relation to the territory. There can be a plan P that saves us when executed. But I might fail to predict this. I.e. it is possible to misestimate is_pivotal_act(P). So one reason for having relatively simple, abstract plans like "Upload a human", is that these plans specify a world state, with easily visible properties. In the "Upload a human" example we would have a superintelligent human. Then we can evaluate the is_pivotal_act property, and based on that we have created a superintelligent human. I am heavily simplifying here, but I think you get the idea.

I think your "positively transformative AI" just does not capture what a pivotal act is about (I haven't read the article, I am guessing based on the name). You could have positive transformative AI, that makes things increasingly better and better, by a lot. And then somebody builds a misaligned AGI and everybody dies. One doesn't exclude the other.

Comment by Johannes C. Mayer (johannes-c-mayer) on Pivotal Acts might Not be what You Think they are · 2023-11-06T07:28:39.338Z · LW · GW

Yes, if you can make the article's contents be in all the brains that would be liable to accidentally create a misaligned AGI, now and in the future, and we also assume that none of these brains want to intentionally create a misaligned AGI, then that would count as a pivotal act in my book.

This might work without the assumption that nobody wants to create a misaligned AGI, through a different mechanism than described in the OP. Then it seems relatively likely that there is enough oomph to push for effective regulations.

Comment by Johannes C. Mayer (johannes-c-mayer) on Pivotal Acts might Not be what You Think they are · 2023-11-06T07:14:41.350Z · LW · GW

Thank you for the feedback. That's useful.

I agree that you need to be very careful about who you upload. There are less than 10 people I would be really confident in uploading. That point must have been so obvious in my own mind that I forgot to mention it.

Depending on the setup I think an additional important property is how resistant the uploaded person is, to going insane. Not because the scan wasn't perfect, or the emulation engine is buggy, but because you would be very lonely (assuming you only upload one person and don't immediately clone yourself) if you run that much faster. And you need to handle some weird stuff about personal identity that comes up naturally, through cloning, simple self-modifications, your program being preempted by another process, changing your running speed, etc.

Comment by Johannes C. Mayer (johannes-c-mayer) on Pivotal Acts might Not be what You Think they are · 2023-11-05T17:54:30.168Z · LW · GW

In retrospect, I am somewhat confused about what I am trying to do with this article. I am glad that I did publish it, because I too frequently don't publish writeups that are almost complete. It all started out by trying to give a brief summary that people could look at to get less confused about pivotal acts. Basically in a different writeup, a person said that it is unclear what I mean by pivotal act. Instead of writing this article, I should probably just have added a note to the original article that pivotal act actually means something very specific and that it is easily conflated with other things, and then linked to the original article. I did link to the original article in the document I was writing, but apparently, that was not enough.

I think it does an okay job of succinctly stating the definition. I think stating some common confusions is probably a good thing to do, to preemptively prevent people from falling into these failure modes. And the ones I listed seem somewhat different from the ones in the original article. So maybe they add a tiny bit of value.

I think the most valuable thing about writing this article is to make me slightly less confused about pivotal acts. Before writing this article my model was implicitly telling me that any pivotal act needs to be directly about putting some entity into a position of great enough power that it can do the necessary things to save the world. After writing this article it is clear that this is not true. I now might be able to generate some new pivotal acts that I could not have generated before.

If I want the articles I write to be more valuable to other people, I should probably plan things out much more precisely and set a very specific scope for the article that I am writing beforehand. I expect that this will decrease the insights I will generate for myself during writing, but make the writing more useful to other people.

Comment by Johannes C. Mayer (johannes-c-mayer) on Open Thread, August 2010-- part 2 · 2023-11-02T05:22:39.131Z · LW · GW

It seems that Benfotiamine (none of the other tiamines are over the counter in Germany) had a similar effect on me. I feel a lot better now, whereas before I would feel constantly tired. Before I felt like I could not do anything most of the time without taking stimulants. Now my default is probably more than 50% towards what I felt like on a medium stimulant dose. I did try a lot of interventions in the same time period, so I am not sure how much Benfotiamin contributed on its own, but I expect it to contribute between 25-65% of the positive effects. I also figured out that I am borderline diabetic, which is evidence in favor of Benfotiamine being very significant.

Comment by Johannes C. Mayer (johannes-c-mayer) on Spaghetti Towers · 2023-11-01T13:48:18.839Z · LW · GW

Let's say we want to create a program to perform a particular task T. Imagine you are constructing a program A by just throwing random things on the wall. The output of A is a program. So we run A and get a program B. We now see how well B performs on T. That tells us how well a change to A sticks.

Now assume that T is quite complicated and that A is random, i.e. it has no structure to it. Would you expect in that circumstance that B has no structure to it? To me, it seems quite likely that B will have a lot of regularity to it. It will not be good code from the human perspective, but there will be a lot of structure I think, simply because that structure is in T and the environment. If there are two obvious actions in the environment A and B, and a simple solution to T is to perform a sequence of these two actions depending on the context, then it seems kind of likely that partial solutions to this factorization would stick better to A.

If you try random things you are not actually accepting all random changes. You only accept changes that match the structure of the problem you are solving. You might often find a matching structure that no human designer would choose because it is not the easiest understandable thing. But to me, it seems that the program that is produced by the spaghetti test will be closer to a program a human designer would come up with than with a program sampled randomly. If we only consider all the programs that solve T, we will still be closer to a human interpretable program, than the "average program" that solves T.

That is somewhat vague, but I have a semi-strong intuition that something like this holds. Even in some non-trivial cases. In the case where we just consider all possible programs that solve T it kind of holds trivially. Then we would obviously be very close to the human-designed program, as there are programs of arbitrary size without dead code (in a weak sense) that solve T.

Now I made the assumption that A has no structure to it. But I am not sure that this actually holds in general. Throwing random things on the wall, and seeing if they stick to generate a program of size , doesn't seem equivalent to just randomly sampling from all programs of size .

Comment by Johannes C. Mayer (johannes-c-mayer) on What makes people intellectually active? · 2023-10-24T23:26:59.126Z · LW · GW

I have been doing something similar lately. I wrote with somebody online extensively, at one point writing a 4000 word Discord messages. That was mostly not about AI alignment, but was helpful in learning how to better communicate in writing.

An important transition in my private writing has been to aim for the same kind of quality I would in public content. That is a nice trick to get better at public <writing/communication>. There is very large difference between writing an idea down such that you will be able to retrieve the information content, and to write something down such that it truly stands on it's own, such that another person can retrive the information.

This is not only useful for training communicating in writing, it also is very useful when you want to come back to your own notes much later, when you forgot about all of the context wich allowed you to fill in all the missing details. Previously I would only rarely read old nodes because they where so hard to understand and not fun to read. I think this got better.

Maybe one can get some milage out of framing the audience to include your future self.

The very first and probably most important step in the direction of "writing to effectively communicate" which I took many years ago, was to always write in "full text", i.e. writing full sentences instead of a bunch of disparate bullet points. I think doing this is also very important to get the intelligence augmenting effects of writing.

For me the public in public writing is not the issue. The core issue for me is that I start multiple new drafts every day, and get distracted by them, such that I never finish the old drafts.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-21T03:31:12.462Z · LW · GW

Transhuman Love

I just had an interesting thought. How would you show somebody that you love them in the transhumanist future? Well, one way would be to reveal to one another, in a verifiable way, what kinds of algorithms you are made of.

E.g. you could reveal exactly how your decision algorithm works and how it will take the others preferences into account. You could also show somebody that in the past you self modified to have certain preferences, the others like.

You could also show them exactly how the algorithm works that makes you feel good when you see and interact with them. Then you can reveal how this creates a positive experience and how that positivity factors into a reinforcement algorithm that makes you like them even more.

I think this quite a beautiful thought.

Comment by Johannes C. Mayer (johannes-c-mayer) on What makes people intellectually active? · 2023-10-20T12:51:10.134Z · LW · GW

This reminds me of how little control you have over your own mind. However, that is not the worst part. The worst part is when you don't realize how little control you actually have.

I think I have almost my entire life fallen prey to the fallacy of believing that emotions don't affect me. I thought I was impenetrable to feedback on the emotional level. That I could, with a cold mind, extract all of the object-level information from feedback. But then somebody gave me very negative feedback on an article I had worked very hard on for over a week. Afterwards, I basically stopped writing for 6 months about that topic. Actually, I think it still affects me now, 1 year later.

I think all of this would have been a lot less terrible had I realized what was going on. I did not even consider the possibility that I was not writing because I got some very strong negative feedback until maybe 4 months ago.

I think there is probably a time and place for intellectually isolating yourself if you are prone to this failure mode. I only notice now that I have often intellectually isolated myself in the past, and that at those times I never ran into the issue of not being able to come up with ideas. However, I think feedback can be extremely valuable, so there is certainly a balance to strike here.

That was potentially valuable early on for me when I started to write down my ideas. I wrote down probably over a million words worth of ideas before I ever wrote up anything publically. I am pretty sure >3% of all the writing I have done is public right now.

Now that I frequently talk to people about the things I am thinking about I am constantly running into the issue that I get critique about something other than the thing I am trying to explain because I am so bad at explaining. Often this only becomes apparent in hindsight. I think this could have been very damaging early on if I always got this kind of negative feedback. However, it's worth noting that not isolating myself for so long would probably also have helped me get better at explaining.

Becoming good at detaching yourself from your ideas is probably better than isolating yourself as much as I did.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-20T12:20:51.128Z · LW · GW

Deceived by your own Incompetence

Consider how Ofria failed. Somebody told me that in that context deception is a property of the environment and its observer. However, it seems to me that the objective of the designer of the system needs to be factored in.

Although in general an observer can be deceived, this is not the case here I would argue. Ofria just designed a system that did not do what he wanted it to do. It failed transparently.

It would seem that this is similar to you wanting to build a rocket that goes to the moon, but then building a rocket that just explodes, because your understanding of rocketry is poor, and then saying "The rocket deceived me into thinking it would not explode".

Imagine you want to build an honest system. You build a system that looks to you like it would be honest. But it ends up deceiving you because your understanding is flawed. In that case, the system would be deceptive, but I wouldn't call the fact that you were mistaken about what would constitute a non-deceptive system deception. That is a particular kind of incompetence.

Also, note that the failure mode here is different from "Humans use condoms" type things. When Ofria created his system the organism would perform well in the overall system. The overall system stayed constant. It was not the case that at some point ice cream was introduced into the organism's environment.

Instead, the failure mode was to think that by creating a special environment, the behavior observed in that environment would generalize widely by default.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-19T22:52:45.811Z · LW · GW

Here is a video about interrogation techniques I found interesting. It's all about social manipulation.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-19T05:35:18.399Z · LW · GW

I just rewatched a video I made about a VR experience I made 6 years ago with a bunch of other people. You fly through a 3D slice of a 4D fractal. The beginning has the most interesting morphing geometry.

We made this in only 3 or 4 weeks IIRC. Pretty crazy. I needed to implement raymarching in Unity, which then effectively replaced the entire default rendering pipeline. It was a lot easier than it sounds though, as we did not need to have any interactions with the environment (which would be basically impossible, or at least I don't know how to do it).

Comment by Johannes C. Mayer (johannes-c-mayer) on johnswentworth's Shortform · 2023-10-12T17:59:49.642Z · LW · GW

If you upload a human and let them augment themselves would there be any u? The preferences would be a tangled mess of motivational subsystems. And yet the upload could be very good at optimizing the world. Having the property of being steered internally by a tangled mess of motivational systems seems to be a property that would select many minds from the set of all possible minds. Many of which I'd expect to be quite different from a human mind. And I don't see the reason why this property should make a system worse at optimizing the world in principle.

Imagine you are an upload that has been running for very very long, and that you basically have made all of the observations that you can make about the universe you are in. And then imagine that you also have run all of the inferences that you can run on the world model that you have constructed from these observations.

At that point, you will probably not change what you think is the right thing to do anymore. You will have become reflectively stable. This is an upper bound for how much time you need to become reflective stable, i.e. where you won't change your u anymore.

Now depending on what you mean with strong AGI, it would seem that that can be achieved long before you reach reflective stability. Maybe if you upload yourself, and can copy yourself at will, and run 1,000,000 times faster, that could already reasonably be called a strong AGI? But then your motivational systems are still a mess, and definitely not reflectively stable.

So if we assume that we fix u at the beginning as the thing that your upload would like to optimize the universe for when it is created, then "give u() up", and "let u go down" would be something the system will definitely do. At least I am pretty sure I don't know what I want the universe to look like right now unambiguously.

Maybe I am just confused because I don't know how to think about a human upload in terms of having a utility function. It does not seem to make any sense intuitively. Sure you can look at the functional behavior of the system and say "Aha it is optimizing for u. That is the revealed preference based on the actions of the system." But that just seems wrong to me. A lot of information seems to be lost when we are just looking at the functional behavior instead of the low-level processes that are going on inside the system. Utility functions seem to be a useful high-level model. However, it seems to ignore lots of details that are important when thinking about the reflective stability of a system.

Comment by Johannes C. Mayer (johannes-c-mayer) on Josh Jacobson's Shortform · 2023-10-12T05:38:46.858Z · LW · GW

Maybe ask @plex

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-12T04:51:33.587Z · LW · GW

I have found it tremendously useful to start out my workout routine by dancing. Dancing is so fun that I am always looking forward to doing it. I want to dance, and I frequently run into the problem that it is hard to stop. After dancing I do the rest of my workout routine which is very boring in comparison. But it is really not a problem to get started on them after the dancing.

I expect this because I have developed a habit, i.e. my brain saved a procedure "execute workout" which is a sequence of instructions that I run through, without thinking about them. Because there is no decision point, you don't need to make the right decision. Doing the right thing (continuing the workout) becomes automatic.

The difficult part is to start the workout. Starting the workout by dancing solves this problem. I expect this technique to generalize widely. If you want to do some activity regularly, then start it with something very fun. This overcomes the barrier to starting. Engraining the habit that encodes the activity itself will then be much easier.

It might not be required that all the activities are related to each other. Maybe it would work to watch one episode of your favorite TV, and then do some boring workout, as long as you are very consistent about stopping and then doing the workout long enough for the habit to form.

With dancing and sports, this is relatively easy to form the habit. You are either gonna do the boring workout immediately after dancing or not at all because showering twice would be a waste of time (at least that is how I feel about it). Therefore I recommend looking for synergies like this in other habits.

Comment by Johannes C. Mayer (johannes-c-mayer) on Nonspecific discomfort · 2023-10-11T22:12:19.110Z · LW · GW

To me, it also seems you undervalue introspection. This can be bad because if you think introspection is not useful, you will underuse it.

Most people I expect can't tell the difference between "I'm unhappy in my profession" and "I'm unhappy with my immediate manager"

That might be correct, not because it is impossible for these people to introspect, but because they have not learned it. Most people don't hold off on proposing solutions but I would be surprised if they could not learn to do it.

I am sure I can tell if I am unhappy because of my job or my boss. It is likely I wouldn't have been able to tell a year ago when I thought I was good at introspection while being terrible at it. So terrible, that it is hard to imagine how I could have been worse.

How is that possible? Well, I thought I was good at introspection because I was very good at certain kinds of introspection. E.g. introspecting on how I do analytical reasoning, while I do it. But I was terrible at emotional introspection. I only had the concept of introspection. Now it is clear to me that there are multiple kinds of introspection. I was blind to emotions, without realizing this.

Before I got better at introspection, all of my negative feelings could have been described as nonspecific discomfort. But it was really not nonspecific at all. It was just that I had ignored and suppressed my emotions so much that they got disassociated from their actual causes. So I would feel bad but didn't know why. I basically did exactly what Hazard talks about here. I did this basically for every negative emotion I experienced.

But then I discovered this tek to introspect, and it seems to work quite well. I have applied it maybe 8 times now. Mostly to emotions that at first seem nonspecific. In my experience, most emotions are actually non-specific, even if they are temporally linked very tightly.

E.g. if I experience social rejection, I normally feel good at first but after 15 minutes I start to feel bad. It seems like it should be clear to me that this is because of the social rejection, but it's not. It will seem like the most likely explanation to me, but there will be uncertainty about if this is actually what is going on. This is ridiculous maybe I am especially bad at analyzing emotions without spinning up a conscious expliitit optimization process. But once I use the technique I linked above it becomes very clear why I feel bad. The interesting thing is that once you understand the underlying cause of the nonspecific comfort, it disappears. Without you doing anything.

This makes sort of sense. Once you have truly understood what a specific feeling "wants you to accomplish" there is really no more point in it sticking around. Now that you have understood the feeling you can either optimize for getting what the feeling wants, or you can realize that the feeling doesn't actually make sense in the current situation. Doing the appropriate thing will make the feeling go away. At least that is what happened so far for me.

Comment by Johannes C. Mayer (johannes-c-mayer) on Using Negative Hallucinations to Manage Sexual Desire · 2023-10-09T19:54:40.691Z · LW · GW

As an update, the 3rd thing I tried also failed. Now I ran out of things to try. The problem is that anything that is non-sexual love seems to be corrupted by sexual love, in a way that makes the non-sexual part worse. E.g. imagine you have a female friend that you like to talk to because she is a good interlocutor. When not talking to her, you might think about what topics would be good to talk about, how to make the conversations better at a meta-level, or how much you enjoy talking to her.

I expect that if you would now start to have sex with that female friend your mind would get corrupted by sexual desire. E.g. instead of thinking about what to discuss in the next meeting, a sexual fantasy would pop into your head.

That seems strictly worse. This is not exactly what happened in my failed attempts number 2 and 3. But I think this example highlights the underlying dynamic that made none of my attempts work out. Attempt number 1 failed because there wasn't enough love there that you could extend in the first place.

My current strategy is to just not think anything sexual anymore, and be sensitive to any negative emotions that arise. I then plan to use my version of IDC on them to figure out what the subagents that generate the emotions want. So far it seems that to some extent realizing this corruption dynamic has cooled down the sexual part of my mind a bit. But attempt 3 only failed yesterday so this cooling effect might only be temporary.

I wouldn't expect to get it all figured out quickly

I feel like I have figured out a lot of stuff about this general topic in the last month. Probably more than in the rest of my life so far. Mainly by properly processing my emotions instead of ignoring them using my IDC technique. It feels like I have figured it out to 75% or something like that. And the good thing is that I don't need to figure it out all the way, to get large benefits. I expect to be much better off now even if I would not do any more optimization.

"Extension of non-sexual love" sounds right, but also just so much weird and unexpected stuff that it's hard to foresee in sufficient detail that it's likely that your perspective on what this entails isn't complete.

I agree that my understanding is not complete. But I think my model is pretty good now. Definitely much better than before, because before I did not have a model. I thought about this point specifically for 5-15 hours. Here is the most important section of my notes on it, based on some analysis I did after watching the first 10 episodes of

.

A specific instance of the general idea here is that you want to reach a true-best-friends-forever status before feelings of love become involved at all.

I also like the general progression of the relationship between Ryuuji and Taiga. It really seems like the kind of progression that I want. You slowly build up closeness with the other person. Love comes in only at the end, if at all. You make your relationship closer and closer through many things, but none of them are love.

The ideal version of this is getting maximally close in a relationship via some context, and only once you get maximally close in that context do you extend the context. And then again you optimize for getting as close as possible in the new extended context, before extending the context again. And you add things to the context sorted such that you add the less impactful stuff first. Adding the component of love to the context should be very late in this chain. I don't say last, because I expect in the transhumanist future there would be things that are even stronger than love.

I feel like this description points roughly in the right direction, but is probably wrong and confused in some details.

I also realize now that this just solves the problem that I have had with romance all along. That is the reason why I did not like how my mind behaved. My mind normally just starts to love somebody immediately, overwriting all of the other aspects of the relationship. This is exactly not what I want love to be. I want love to be the thing that follows after everything else is maximally good. And I want the same to be true for other attributes. E.g. before feeling friendly with somebody, you should like them as much as possible, and get as close to them as possible, without that friendliness feeling there.

It really feels like this is something that generalizes. This is basically a rough sketch of methodology for how to gradually evolve a relationship such that you get the optimal relationship.

The main update that I have had so far is that maybe adding sexuality in the way it exists in my brain right now would always be a downgrade in most relationships. But in principle, you could probably find a version of it, if could effectively selfmodify, that would upgrade most relationships.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-09T16:19:13.898Z · LW · GW

I do it in a very convoluted way. Basically, I have created a subagent in my mind that somehow has access to this aspect, and then I can tell the subagent to make me feel good when I tap the laptop. If I just try to make it feel good myself to tap the laptop then it does not work. It works best with discrete events that give you feedback like tapping. Throwing something in the trash does not work as easily. I actually have used this technique almost never, which seems strange, because it seems very powerful.

Comment by Johannes C. Mayer (johannes-c-mayer) on Write the Worst Post on LessWrong! · 2023-10-09T15:12:36.041Z · LW · GW

Well, I am talking about creating games, not playing them if that was unclear. I think that is significantly harder than making games. It took over a thousand hours of practice to get good. I think AI alignment is a lot harder, but I think the same pattern applies to some extent. For example, asking the question "What will this project look like if it goes really well is a good question." Why well when John asked this question to a bunch of people he got good results. I have not thought about why you get good results, but asking this question. But I am pretty sure I could understand it better, and that is likely to be useful, compared to not understanding. But clearly, you can get benefits even when you don't understand.

Most of the time when you are applying a technique you will just be applying the technique. You will normally not retrieve all of the knowledge of why this technique works before using it. And it works fine. The knowledge about why the technique is mostly useful for refining the technique is my guess. However, applying the refined technique does not require retrieving the knowledge. In fact, you might often forget the knowledge but not the refined technique, i.e. the procedural knowledge.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-08T03:50:50.786Z · LW · GW

It is short for internal double crux.

Comment by Johannes C. Mayer (johannes-c-mayer) on Johannes C. Mayer's Shortform · 2023-10-07T20:00:35.177Z · LW · GW

Is "mind algorithms" a known concept? I definitely have a concept like this in my head that matches this name. I have never seen anybody else talk about it though. Also each time I tell somebody about this concept they don't seem to get it. They tend to dismiss it as trivial and obvious. Probably because they have a model in their mind that fits the name "mind algorithm". But I expect the concept in my head to be much more powerful. I expect that I can think of certain thoughts that are inaccessible to them because their model is less powerful.

I would ask things like to what extent is it true that you can run arbitrary algorithms on your bain? Certainly, there are limits but I am not sure where they are. E.g. it is definitely possible to temporarily become a different person, by creating a separate personality. And that personality can be very different. E.g. it could not get upset at something that you get normally upset by.

It should not be too surprising that this is possible. It is normal to behave differently depending on who you talk to. I am just talking about a much stronger version of this, where you have more explicit control.

In my experience, you can also create an algorithm that arbitrarily triggers the reward circuitry in your brain. E.g. I can make it such that each time I tap on the top of my laptop it feels really good. I.e. I am creating a new algorithm that watches for an event and then triggers some rewards circuitry.

It also shouldn't be surprising that this is possible. Why do I feel good when I get a good weapon drop in a video game? That seems to be learned too. The thing I just described is likely doing a similar thing, only that there you don't rely on some subconscious process to set the reward trigger. Instead, you explicitly construct it. When you look at the reward trigger it might be impossible to tell, whether it was created by some subconscious process or explicitly.