Posts
Comments
I have tried meditation a little bit, although not very seriously. Everything I've heard about it makes me think it would be a good idea to do it more seriously.
Not sure how to be weird without also being useless. What does a weird but useful background look like?
Also, I've already been trying to read a lot, but I'm still somewhat dissatisfied with my pace. You mentioned you could read at 3x your previous speed. How did you do that?
I am pretty anxious about posting this, since it's my first post on LessWrong and it's about a pretty confusing topic, but I'm probably not well calibrated on this front, so, oh well. Also, thanks to NicholasKross for taking a look at my drafts.
What other advice/readings do you have for optimizing your life/winning/whatever?
I think this depends on whether you use SIA or SSA or some other theory of anthropics.
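To illustrate how much the choice of anthropic theory can matter, take the standard Sleeping Beauty setup (just as a stand-in example, not necessarily the case under discussion): a fair coin is flipped, heads means you're woken once, tails means you're woken twice, and on each waking you're asked for the probability of heads.

$$P_{\text{SSA}}(\text{heads} \mid \text{awake}) = \tfrac{1}{2} \qquad P_{\text{SIA}}(\text{heads} \mid \text{awake}) = \tfrac{1}{3}$$

SSA treats you as a random sample from the awakenings within each possible world, leaving the coin at its prior of 1/2, while SIA also weights worlds by how many awakenings they contain, which pushes heads down to 1/3.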
I have a strong inside view of the alignment problem and what a solution would look like. The main reason I don't have an equally concrete inside-view AI timeline is that I don't know enough about ML, so I have to defer to others to get a specific decade. The biggest gap in my model of the alignment problem is what a solution to inner misalignment would look like, although I think it would be something like trying to find a way to avoid wireheading.
I've checked out John Wentworth's study guide before, mostly doing CS50.
Part of the reason I'm considering getting a degree is so I can get a job if I want and not have to bet on living rent-free with other rationalists or something.
The people I've talked to the most have timelines centering around 2030. However, I don't have a detailed picture of why, because their reasons are capabilities exfohazards. From what I can tell, their reasons are tricks you could implement to get RSI even on hardware that exists right now, but I think most good-sounding tricks don't actually work (no one expected transformer models to be the architecture closest to AGI), and I think superintelligence is more contingent on compute and training data than they think. It also seems like other people in AI alignment disagree in a more optimistic direction. Now that I think about it, though, I probably overestimated how long the timelines of optimistic alignment researchers were, so it's probably more like 2040.
The difference between an expected utility maximizer using updateless decision theory and an entity who likes the number 1 more than the number 2, or who cannot count past 1, or who has a completely wrong model of the world which nonetheless makes it one-box, is that the expected utility maximizer using updateless decision theory also wins in scenarios outside of Newcomb's problem, where you may have to choose $2 instead of $1, count amounts of objects larger than 1, or believe true things. Similarly, an entity that "acts like it has a choice" generalizes well to other scenarios, whereas these other possible entities don't.
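As a rough illustration of "one-boxers win" (assuming, purely for the sake of the numbers, a predictor that is right 99% of the time and the usual $1,000 / $1,000,000 payoffs):

$$E[\text{one-box}] = 0.99 \cdot \$1{,}000{,}000 = \$990{,}000$$
$$E[\text{two-box}] = \$1{,}000 + 0.01 \cdot \$1{,}000{,}000 = \$11{,}000$$

The comparison comes out the same way for any predictor accuracy above roughly 50.05%, so the conclusion doesn't hinge on the 99% figure.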
- I think getting an extra person to do alignment research can provide a massive amount of marginal utility, considering how few people are doing it and how it will determine the fate of humanity. We're still in the stage where adding an extra person removes a scarily large amount from p(doom), like up to 10% for an especially good individual, which probably averages out to something much smaller but still scarily large for your average new alignment researcher. This is especially true for agent foundations.
- I think it's very possible to solve the alignment problem. Stuff like QACI, while not a full solution yet, makes me think that this is conceivable and that you could probably find a solution if you threw enough people at the problem.
- I think we'll get a superintelligence at around 2050.
One-boxers win because they reasoned in their heads that one-boxers win because of updateless decision theory or something, so they "should" be one-boxers. The decision is predetermined, but the reasoning acts like it has a choice in the matter (and people who act like they have a choice in the matter win). What carado is saying is that people who act like they can move around the realityfluid tend to win more, just as people who act like they have a choice in Newcomb's problem, and therefore one-box, win even though they don't have a choice in the matter.
I don't think this matters all that much. In Newcomb's problem, even though your decision is predetermined, you should still want to act as if you can affect the past, specifically Omega's prediction.
> I don't believe something can persuade generals to go to war in a short period of time just because it's very intelligent.
A few things I've seen give pretty worrying lower bounds for how persuasive a superintelligence would be:
- How it feels to have your mind hacked by an AI
- The AI in a box boxes you (content warning: creepy blackmail-y acausal stuff)
Remember that a superintelligence will be at least several orders of magnitude more persuasive than character.ai or Stuart Armstrong.
Formal alignment proposals avoid this problem by doing metaethics, mostly something like determining what a person would want if they were perfectly rational (so no cognitive biases or logical errors), otherwise basically omniscient, and had an unlimited amount of time to think about it. This is called reflective equilibrium. I think this approach would work for most people, even pretty terrible people. If you extrapolated a terrorist who commits acts of violence for some supposed greater good, for example, they'd realize that the reasoning they used to determine that said acts of violence were good was wrong.
Corrigibility, on the other hand, is more susceptible to this problem, so you'd want to get the AI to do a pivotal act, for example destroying every GPU, to prevent other people from deploying harmful AI, or unaligned AI for that matter.
Realistically, I think that most entities who'd want to use a superintelligent AI like a nuke would probably be too short-sighted to care about alignment, but don't quote me on that.
To the first one: they aren't actually suffering that much, or experiencing anything they'd rather not experience, because they're continuous with you and you aren't suffering.
I don't actually think a simulated human would be continuous in spacetime with the AI because the computation wouldn't be happening inside of the qualia-having parts of the AI.
I think what defines a thing as a specific qualia-haver is not what information it actually holds but how continuous it is with other qualia-having instances in different positions of spacetime. I think that mental models are mostly continuous with the modeler so you can't actually kill them or anything. In general, I think you're discounting the importance that the substrate of a mental model/identity/whatever has. To make an analogy, you're saying the prompt is where the potential qualia-stuff is happening, and isn't merely a filter on the underlying language model.
My immediate thought is that the cat is already out of the bag, that whatever risk there was of AI safety people accelerating capabilities is nowadays far outweighed by capabilities hype and, in general, much larger incentives, and that the most we can do is continue to build awareness of AI risk. Something about this line of reasoning strikes me as uncritical, though.
I'm probably not the best person on this forum when it comes to either PR or alignment, but I'm interested enough, if only in knowing your plan, that I want to talk to you about it anyway.
Will the karma thing affect users who joined before a certain point in time? Asking because I joined quite a while ago but have only 4 karma right now.
That's not really specific enough. I would describe it as someone being really angry about something, contingent on a certain belief being true, but when you ask them why they hold that belief, it turns out to rest on very weak evidence, or on something that is the opposite of an open-and-shut case, or on something that could vary depending on context, and so on and so forth.