Posts

Deliberative Cognitive Algorithms as Scaffolding 2024-02-23T17:15:26.424Z
The Byronic Hero Always Loses 2024-02-22T01:31:59.652Z
Gemini 1.5 released 2024-02-15T18:02:50.711Z
Noticing Panic 2024-02-05T03:45:51.794Z
Suggestions for net positive LLM research 2023-12-13T17:29:11.666Z
Sherlockian Abduction Master List 2023-12-02T22:10:21.848Z
Is there a hard copy of the sequences available anywhere? 2023-09-11T19:01:54.980Z
Decision theory is not policy theory is not agent theory 2023-09-05T01:38:27.175Z
Mechanistic Interpretability is Being Pursued for the Wrong Reasons 2023-07-04T02:17:10.347Z
A flaw in the A.G.I. Ruin Argument 2023-05-19T19:40:03.135Z
Is "Regularity" another Phlogiston? 2023-03-12T03:13:44.646Z

Comments

Comment by Cole Wyeth (Amyr) on List your AI X-Risk cruxes! · 2024-04-29T15:14:11.669Z · LW · GW

Cruxes connected to whether we get human level A.I. soon:
Do LLM agents become useful in the short term?
How much better is GPT-5 than GPT-4?
Does this generation of robotics startups (e.g. Figure) succeed?

Cruxes connected to whether takeoff is fast:
Are A.I.s significantly better at self improving while maintaining alignment of their future versions than we are at aligning A.I.?

Cruxes that might change my mind about mech. interp. being doomed:
Can a tool which successfully explains cognitive behavior in GPT-N do the same for GPT-N+1 without significant work?

Last ditch crux:
In high dimensional spaces, do agents with radically different utility functions actually stomp on each other, or do they trade? When the intelligence of one agent scales far beyond the other's, does trade turn into stomping, or do both effects just diminish?

Comment by Cole Wyeth (Amyr) on We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society · 2024-04-27T17:55:37.483Z · LW · GW

I think it's worth considering that Jaynes may actually be right here about general agents. His argument does seem to work in practice for humans: it's standard economic theory that trade works between cultures with strong comparative advantages. On the other hand, probably the most persistent and long-running conflict between humans that I can think of is warfare over occupancy of Jerusalem. Of course there is an indexical difference in utility functions here - cultures disagree about who should control Jerusalem. But I would have to say that under many metrics of similarity this conflict arises from highly similar loss/utility functions. Certainly I am not fighting for control of Jerusalem, because I just don't care at all about who has it - my interests are orthogonal in some high dimensional space.

The standard "instrumental utility" argument holds that an unaligned AGI will have some bizarre utility function very different from ours, but the first step towards most such utility functions will be seizing control of resources, and that this will become more true the more powerful the AGI. But what if the resources we are bottlenecked by are only bottlenecks for our objectives and at our level of ability? After all, we don't go around exterminating ants; we aren't competing with them over food, we used our excess abilities to play politics and build rockets (I think Marcus Hutter was the first to bring this point to my attention in a lasting way). I think the standard response is that we just aren't optimizing for our values hard enough, and if we didn't intrinsically value ants/nature/cosmopolitanism, we would eventually tile the planet with solar panels and wipe them out. But why update on this hypothetical action that we probably will not in fact take? Is it not just as plausible that agents at a sufficiently high level of capability tunnel into some higher dimensional space of possibilities where lower beings can't follow or interfere, and never again have significant impact on the world we currently experience? 

I can imagine a few ways this might happen (energy turns out not to be conserved and deep space is the best place to build a performant computer, it's possible to build a "portal" of some kind to a more resource rich environment (interpreted very widely), the most effective means of spreading through the stars turns out to be just skipping between stars and ignoring planets) but the point is that the actual mechanism would be something we can't think of.  

Comment by Cole Wyeth (Amyr) on We are headed into an extreme compute overhang · 2024-04-27T17:27:00.638Z · LW · GW

I have no reason to question your evidence, but I don't agree with your arguments. It is not clear that a million LLMs coordinate better than a million humans. There are probably substantial gains from diversity among humans, so the identical weights you mentioned could cut in either direction. An additional million human level intelligences would have a large economic impact, but not necessarily a transformative one. Also, your argument for speed superintelligence is probably flawed; since you're discussing what happens immediately after the first human level AGI is created, gains from any speedup in thinking should already be factored in and will not lead to superintelligence in the short term.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2024-04-10T19:07:59.654Z · LW · GW

Should be fixed.

Comment by Cole Wyeth (Amyr) on General Thoughts on Secular Solstice · 2024-03-27T15:00:19.923Z · LW · GW

You have pointed out some important tradeoffs. Many of my closest friends and intellectual influences outside of lesswrong are religious, and often have interesting perspectives and ideas (though I can't speak to whether this is because of their religions, caused by a common latent variable, or something else). However, I do not think that the purpose of lesswrong is served by engaging with religious ideology here, and I think that avoiding this is probably worth the cost of losing some valuable perspectives.

As you've said, @Jeffrey Heninger does participate in the lesswrong community, at its current level of hostility towards religion. I have read some of his other posts in the past and found them enjoyable and valuable, though I think I am roughly indifferent to this one being published. Why does this suggest to you that the community needs to be less hostile to religion, instead of more or roughly the same amount? Presumably if it were less hostile towards religion, there would be more than the current level of religious discussion - do you think that would be better on the margin? I would also expect an influx of religious people below Jeffrey's level, not above it.

I'm open to starting a dialogue if you want to discuss this further. 

Comment by Cole Wyeth (Amyr) on Modern Transformers are AGI, and Human-Level · 2024-03-26T22:54:40.852Z · LW · GW

Perhaps AGI but not human level. A system that cannot drive a car or cook a meal is not human level. I suppose it's conceivable that the purely cognitive functions are at human level, but considering the limited economic impact I seriously doubt it. 

Comment by Cole Wyeth (Amyr) on General Thoughts on Secular Solstice · 2024-03-26T22:42:53.991Z · LW · GW

Hostility towards Others may be epistemically and ethically corrosive, but the kind of hostility I have discussed is also sometimes necessary. For instance, militaristic jingoism is bad, and I am hostile to it. I am also wary of militaristic jingoists, because they can be dangerous (this is an intentionally extreme example; typical religions are less dangerous).

There is a difference between evangelizing community membership and evangelizing an ideology or set of beliefs. 

Usually, a valuable community should only welcome members insofar as it can still maintain its identity and reason for existing. Some communities, such as elite universities, should and do have strict barriers for entry (though the specifics are not always ideal). The culture of lesswrong would probably be erased (that is, retreat to other venues) if lesswrong were mainstreamed and successfully invaded by the rest of the internet. 

I generally agree that (most) true beliefs should be shared. Ideologies however are sometimes useful to certain people in certain contexts and not to other people in other contexts. Also, evangelism is costly, and it's easy to overestimate the value of your ideology to others. 

Comment by Cole Wyeth (Amyr) on General Thoughts on Secular Solstice · 2024-03-25T14:38:08.529Z · LW · GW

The distinctions you're pointing out are subtle enough that it may be better to ask people instead of trying to infer their beliefs from such a noisy signal. I reject painting religious people as uniformly or even typically villainous, but it is probably fine to share a common frustration with some religions and religious institutions shutting down new ideas. I was not at secular solstice so I can't say for sure which better describes it, but based on the examples given the former seems at least as accurate.  

Comment by Cole Wyeth (Amyr) on General Thoughts on Secular Solstice · 2024-03-24T14:44:18.324Z · LW · GW

I appreciate the perspective. Personally I don't really see the point of a secular solstice. But frankly, the hostility to religion is a feature of the rationalist community, not a bug. 

Rejection of faith is a defining feature of the community and an unofficial litmus test for full membership. The community has a carefully cultivated culture that makes it a kind of sanctuary from the rest of the world where rationalists can exchange ideas without reestablishing foundational concepts and repeating familiar arguments (along with many other advantages). The examples you point to do not demonstrate hostility towards religious people, they demonstrate hostility towards religion. This is as appropriate here as hostility towards factory farming is at a vegan group.

Organizations (corporate, social, biological) are all defined by their boundaries. Christianity seems to be unusually open to everyone, but I think this is partially a side effect of evangelism. It makes sense to open your boundaries to the other when you are trying to eat it. Judaism in contrast carefully enforces the boundaries of its spaces.    

Lesswrong hates religion in the way that lipids hate water. We want it on the outside. I don't know about other rationalists, but I don't have a particular desire to seek it out and destroy it everywhere it exists (and I certainly wish no harm to religious people). I agree with you that too much hostility is harmful; but I don't agree that good organizations must always welcome the other.  

Comment by Cole Wyeth (Amyr) on Reflective AIXI and Anthropics · 2024-03-23T15:21:12.887Z · LW · GW

I don't find these intuitive arguments reliable. In particular, I doubt it is meaningful to say that reflective oracle AIXI takes the complexity of its own counterfactual actions into account when weighing decisions. This is not how its prior works or interacts with its action choice. I don't fully understand your intuition, and perhaps you're discussing how it reasons about other agents in the environment, but this is much more complicated than you imply, and probably depends on the choice of reflective oracle. (I am doing a PhD related to this). 

Comment by Cole Wyeth (Amyr) on On green · 2024-03-22T21:54:37.060Z · LW · GW

I think about this a lot too. One of my biggest divergences with (my perception of) the EA/rationalist worldview is a desire to leave "wilderness"/"wildness" in the world. I think that all of us should question our wisdom. 

Comment by Cole Wyeth (Amyr) on Advice Needed: Does Using a LLM Compomise My Personal Epistemic Security? · 2024-03-11T16:26:22.649Z · LW · GW

The concern may become valid in the future, but honestly, if this description of your circumstances is accurate, I strongly suggest you completely stop reading lesswrong and instead focus on your financial circumstances and mental health.

Comment by Cole Wyeth (Amyr) on Many arguments for AI x-risk are wrong · 2024-03-05T22:18:45.582Z · LW · GW

It's true that modern RL models don't try to maximize their reward signal (it is the training process which attempts to maximize reward), but DeepMind tends to build systems that look a lot like AIXI approximations, conducting MCTS to maximize an approximate value function (I believe AlphaGo and MuZero both do something like this). AIXI does maximize its reward signal. I think it's reasonable to expect future RL agents will engage in a kind of goal directed utility maximization which looks very similar to maximizing a reward. Wireheading may not be a relevant concern, but many other safety concerns around RL agents remain relevant.
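
To gesture at what I mean by decision-time maximization, here is a minimal sketch (not AlphaGo's or MuZero's actual algorithm, just greedy depth-limited lookahead; `model`, `value_fn`, and `actions` are hypothetical stand-ins for a learned dynamics model, value network, and action set):

```python
def lookahead_value(state, model, value_fn, actions, depth):
    """Estimate the value of `state` by greedy depth-limited search."""
    if depth == 0:
        return value_fn(state)
    return max(lookahead_value(model(state, a), model, value_fn, actions, depth - 1)
               for a in actions)

def act(state, model, value_fn, actions, depth=3):
    # The deployed agent explicitly picks the action whose simulated successor
    # maximizes an approximate value function: maximization happens at
    # decision time, not only during training.
    return max(actions,
               key=lambda a: lookahead_value(model(state, a), model, value_fn,
                                             actions, depth - 1))
```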

Comment by Cole Wyeth (Amyr) on Deliberative Cognitive Algorithms as Scaffolding · 2024-02-24T00:39:03.203Z · LW · GW

It is probably possible to make some form of scaffolding work, I'm just skeptical that it's going to be as effective as training an agent directly. Depending on timelines, scaffolding might still feed progress towards superintelligence. 

Comment by Cole Wyeth (Amyr) on OpenAI's Sora is an agent · 2024-02-16T17:33:53.625Z · LW · GW

One could hook up a language model to decide what to visualize, Sora to generate visualizations, and a vision model to extract outcomes. 
This seems like around 40% of what intelligence is - the only thing I don't really see is how reward should be "plugged in," but there may be naive ways to set goals. 
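
Concretely, the kind of loop I'm imagining looks something like this minimal sketch (all of these callables are hypothetical placeholders rather than real APIs, and the reward step is exactly the part I'm unsure about):

```python
# Minimal sketch of the proposed scaffolding; `language_model`, `video_model`,
# and `vision_model` are hypothetical callables standing in for an LLM,
# a Sora-like generator, and a vision model respectively.

def agent_step(language_model, video_model, vision_model, goal, history):
    # 1. The language model decides what to visualize next.
    prompt = f"Goal: {goal}\nSo far: {history}\nDescribe the next scene to simulate."
    scene_description = language_model(prompt)
    # 2. The video model generates a rollout of that scene.
    video = video_model(scene_description)
    # 3. The vision model extracts the predicted outcome from the rollout.
    outcome = vision_model(video)
    # 4. Some external mechanism would still have to score the outcome
    #    against the goal; this is the missing "reward" piece.
    history.append((scene_description, outcome))
    return outcome
```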

Comment by Cole Wyeth (Amyr) on Gemini 1.5 released · 2024-02-15T19:32:05.855Z · LW · GW

This is mind-blowing. When it works it's crazy, and even the "glitches" are bizarre, as if the real world were a slightly broken video game.

Comment by Cole Wyeth (Amyr) on Gemini 1.5 released · 2024-02-15T18:20:23.274Z · LW · GW

A remarkably long context window can process many tokens, but can it process the token safety discussion? ;)

Comment by Cole Wyeth (Amyr) on Noticing Panic · 2024-02-05T15:31:03.467Z · LW · GW

Yes, this is a good point. Confidence in delegation is probably a panic reducing factor. In a way delegating to one's future self can be viewed as a special case.

Comment by Cole Wyeth (Amyr) on Noticing Panic · 2024-02-05T15:28:32.073Z · LW · GW

I don't mean to imply that levels of planning are objective. I think that solving math problems usually has relatively few levels of planning involved, but requires domain specific competency and a pretty high level of fluid intelligence. Also, the levels of planning required depends on the person - at a low level of mathematical maturity, each recursive level of chasing definitions might feel like a level of planning, though for me "chasing definitions" is itself effectively one step. I do think that in mathematics it can be easier to compartmentalize the levels of planning, which perhaps is one of a combination of factors that makes me good at it. This note is probably worth adding to the post, thank you.

I did note that executives who don't panic at many levels of planning can still fail because they lack domain specific competency. Difficulty and levels of planning are not identical.

Comment by Cole Wyeth (Amyr) on Suggestions for net positive LLM research · 2024-01-07T21:52:35.166Z · LW · GW

I like this idea - can I DM you about the research frontier?

Comment by Cole Wyeth (Amyr) on Suggestions for net positive LLM research · 2024-01-07T21:51:18.977Z · LW · GW

This is an extensive list; I'll return to it when I have a bit more experience with the area.

Comment by Cole Wyeth (Amyr) on A Decision Theory Can Be Rational or Computable, but Not Both · 2023-12-22T04:58:17.076Z · LW · GW

The levels of computability of various notions of optimal decision making are discussed in Jan Leike's PhD thesis: https://jan.leike.name/publications/Nonparametric%20General%20Reinforcement%20Learning%20-%20Leike%202016.pdf

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-11T16:54:14.149Z · LW · GW

Thanks! Good to know.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-10T03:54:58.721Z · LW · GW

Useful to have a counterexample.

No, I don't think many other things in my life correlate with MMA. There are no other visual cues I would consider reliable.

If it's not too personal, is your condition persistent? Does it include both ears? And does it look similar to what you might see on UFC fighters?

Comment by Cole Wyeth (Amyr) on Google Gemini Announced · 2023-12-06T17:47:27.965Z · LW · GW

Progress on competitive coding seems impressive. Otherwise I am having trouble evaluating it; it seems slightly better than GPT-4 at most tasks, and multi-modal. Tentatively, it seems to be significant progress.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-04T03:44:21.978Z · LW · GW

Similarly, American conservatives generally refer to democrats as "liberals" and rarely as "progressives," which democrats use more often to describe themselves.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-03T21:14:11.819Z · LW · GW

@Sune specified that they know the person's gender. 

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-03T18:00:01.602Z · LW · GW

Yeah that definitely belongs in the post :)

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-03T17:59:32.252Z · LW · GW

Being from America, I feel pretty confident seconding this one. I will be visiting America soon and will pay attention for examples.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-03T16:07:48.756Z · LW · GW

Interesting! I will attempt to verify this and then add it to the list.

Comment by Cole Wyeth (Amyr) on Sherlockian Abduction Master List · 2023-12-03T00:36:13.868Z · LW · GW

Yeah, I guess I view Rugby and American football as being essentially combat sports. This may be worth clarifying in the post, but no one who read it and then found out "oh this person actually did Rugby not wrestling" would be particularly surprised.

Still, this is somewhat of an illustration of the general problem: there are often many adjacent and some non-adjacent alternative explanations.

Comment by Cole Wyeth (Amyr) on Protest against Meta's irreversible proliferation (Sept 29, San Francisco) · 2023-09-20T16:45:43.220Z · LW · GW

This seems to me like a second order correction which is unlikely to change the sign of the outcome. 

Comment by Cole Wyeth (Amyr) on Contra Heighn Contra Me Contra Functional Decision Theory · 2023-09-13T23:28:07.247Z · LW · GW

Making a decision at the policy level may be useful for forming habits, but I don't think that considering others with a similar policy, or those who can discern my policy, is useful in this example. Those latter two are the ones I associate more closely with FDT, and such considerations seem to be very difficult to carry out in practice and perhaps sensitive to assumptions about the world.
Honestly I don't see the motivation for worrying about the decisions of "other agents sufficiently similar to oneself" at all. It doesn't seem useful to me, right now, making decisions or adopting a policy, and it doesn't seem useful to build into an A.I. either except in very specific cases where many copies of the A.I. are likely to interact. The heuristic arguments that this is important aren't convincing to me because they are sufficiently elaborate that it seems other assumptions about the way the environment accesses/includes one's policy could easily lead to completely different conclusions. 

The underlying flaw I see in many pro-FDT style arguments is that they tend to uncritically accept that if adopting FDT (or policy X) is better in one example than adopting policy Y, then policy X must be better than policy Y, or at least neither one is the best policy. But I strongly suspect there are no free lunch conditions here - even in the purely Decision Theoretic context of AIXI there are serious issues with the choice of prior being subjective, so I'd expect it to be even worse if one allows the environment read/write access to the whole policy. I haven't seen any convincing argument that there is some kind of "master policy." I suppose if you pin down a mixture rigorously defining how the environment is able to read/write the policy then there would be some Bayes optimal policy, but I'm willing to bet it would be deviously hard to find or even approximate.

Comment by Cole Wyeth (Amyr) on Contra Heighn Contra Me Contra Functional Decision Theory · 2023-09-13T19:13:56.817Z · LW · GW

I agree that this is an important consideration for humans, though I feel perhaps FDT is overkill for the example you mentioned (insofar as I understand what FDT means). 

Comment by Cole Wyeth (Amyr) on High school advice · 2023-09-12T22:44:00.267Z · LW · GW

I'm surprised no one seems to have mentioned this yet but the sequences are a good place to start on lesswrong generally (this isn't high school specific).
 

Also one piece of advice for high school, which is only distantly connected to rationality: If you want to go to a good university, jump through some hoops while you're in high school. I spent a lot of time reading textbooks and gave my classes like 15% effort because they seemed trivial and uninteresting, but 25% effort would have probably been enough to get me a 4.0 which would have been useful later.

Comment by Cole Wyeth (Amyr) on High school advice · 2023-09-12T22:39:29.796Z · LW · GW

Though the above shouldn't be used as justification for doing nothing. Reading textbooks and taking classes becomes doing nothing if there is no output.

Comment by Cole Wyeth (Amyr) on High school advice · 2023-09-12T22:38:03.611Z · LW · GW

Early in college I wanted to jump into "doing A.I." and ended up doing robotics research/clubs that I mostly hated in practice, and worse, for whatever reason (probably too much obsession with discipline), I didn't even notice that I hated them, and I wasted a lot of time that way. It's true that a lot of A.I. related fields attract people with less of a mathematical focus, and it's important to develop your mind by working on problems you actually like.

Comment by Cole Wyeth (Amyr) on Contra Heighn Contra Me Contra Functional Decision Theory · 2023-09-12T22:28:25.413Z · LW · GW

You might like my recent post, which also argues that FDT (or at least the sequential version of EDT; I don't feel that FDT is well specified enough for me to talk about it) shouldn't be considered a decision theory.
 

https://www.lesswrong.com/posts/MwetLcBPvshg9ePZB/decision-theory-is-not-policy-theory-is-not-agent-theory?commentId=rZ5tpB5nHewBzAszR

Comment by Cole Wyeth (Amyr) on Decision theory is not policy theory is not agent theory · 2023-09-11T18:56:29.901Z · LW · GW

I am skeptical of going beyond Decision Theory, though I think to some extent it is necessary for actually building an intelligent machine. Agent Theory seems to exist mostly to address that problem, but I believe that the proper way to make any headway on Agent Theory is likely to start with the much easier to analyze Decision Theory and then make corrections.
It's interesting to consider whether there is another level beyond Agent Theory. I am not sure whether I would consider Society Theory to be a generalization of the same type; I would have to think about that.

Comment by Cole Wyeth (Amyr) on Decision theory is not policy theory is not agent theory · 2023-09-06T11:50:41.565Z · LW · GW

I suggest the paper I mentioned on sequential extensions of causal and evidential decision theory. Sequential policy evidential decision theory is definitely a policy theory. But sequential action evidential decision theory is a decision theory making slightly weaker assumptions than CDT. So it's not clear where the general category EDT should go; I think I'll update the post to be more precise about that.

Comment by Cole Wyeth (Amyr) on Against Almost Every Theory of Impact of Interpretability · 2023-08-22T16:27:21.843Z · LW · GW

I roughly agree with the case made here because I expect interpretability research to be much, much harder than others seem to appreciate. This is a consequence of strong intuitions from working on circuit complexity. Figuring out the behavior of a general circuit sounds like it's in a very hard complexity class - even writing down the truth table for a circuit takes time exponential in the number of inputs! I would be surprised if coming up with a human-interpretable explanation of subcircuits is easy; there are some reasons to believe that SGD will usually produce simple circuits, so some success in the average case is possible (see recent work by Ard Louis), but it would be pretty shocking if the full problem had a solution fast enough to run on the huge transformer circuits we are dealing with.
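As a toy illustration of that blowup (a random circuit standing in for a learned one; nothing here is specific to transformers):

```python
import itertools
import random

def random_circuit(n_inputs, n_gates, seed=0):
    """A toy boolean circuit: each gate applies and/or/xor to two earlier wires."""
    rng = random.Random(seed)
    return [(rng.choice(["and", "or", "xor"]),
             rng.randrange(n_inputs + g), rng.randrange(n_inputs + g))
            for g in range(n_gates)]

def evaluate(gates, inputs):
    wires = list(inputs)
    for op, i, j in gates:
        a, b = wires[i], wires[j]
        wires.append(a & b if op == "and" else a | b if op == "or" else a ^ b)
    return wires[-1]  # output of the final gate

n = 16  # even 16 inputs already means 2**16 = 65536 rows in the truth table
circuit = random_circuit(n, n_gates=100)
truth_table = {bits: evaluate(circuit, bits)
               for bits in itertools.product((0, 1), repeat=n)}
print(len(truth_table))  # 65536 evaluations before any "explaining" has begun
```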
I outlined this position (and pointed out that there is some hope of at least understanding some individual circuits and learning about intelligence) here: https://www.lesswrong.com/posts/RTmFpgEvDdZMLsFev/mechanistic-interpretability-is-being-pursued-for-the-wrong
(Not my best writing though)

Comment by Cole Wyeth (Amyr) on The Problem with AIXI · 2023-08-13T23:27:11.060Z · LW · GW

I have discussed this problem with Professor Hutter, and though I wouldn't claim to be able to predict how he would respond to this dialogue, I think his viewpoint is that the anvil problem will not matter in practice. In rough summary of his response: an agent will form a self model by observing itself taking actions through its own camera. When you write something on a piece of paper, you can read what you are writing, and see your own hand holding the pen. Though AIXI may not compress its own action bits, it will compress the results it observes of its actions, and will form a model of its hardware (except perhaps the part that produces and stores those action bits). 

Comment by Cole Wyeth (Amyr) on Contra Anton 🏴‍☠️ on Kolmogorov complexity and recursive self improvement · 2023-07-09T21:05:38.395Z · LW · GW

As has been observed by other commenters, the argument fails to take into account runtime limitations - in the real world, programs can self improve by finding (provably) faster programs that (provably) perform the same inferences that they do, which most people would consider self improvement. However, the argument may be onto something: it is indeed true that a program p cannot output a program q with K(q) exceeding K(p) by more than a constant (there is a short program which simulates any input program and then runs its output). Here K(p) is the length of the shortest program with the same behavior as p - in this case we seem to require p to both output another program q and learn to predict a sequence. It is also true that a high level of Kolmogorov complexity is required to eventually predict all sequences up to a high level of complexity: https://arxiv.org/pdf/cs/0606070.pdf.
The real world implications of this argument are probably lessened by the fact that predictors are embedded, and can improve their own hardware or even negotiate with other agents for provably superior software. 
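
For concreteness, the bound I'm referring to is just the standard one (not specific to the linked paper):

```latex
% If program p halts and outputs program q, then "simulate p and run its
% output" is a description of q, so the complexity of q can exceed that of p
% by at most a fixed machine-dependent constant c:
\[
  K(q) \;\le\; K(p) + c .
\]
```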

Comment by Cole Wyeth (Amyr) on Contra Anton 🏴‍☠️ on Kolmogorov complexity and recursive self improvement · 2023-07-09T20:54:11.808Z · LW · GW

Another good example is the Goedel machine.

Comment by Cole Wyeth (Amyr) on K-complexity is silly; use cross-entropy instead · 2023-07-07T18:58:48.976Z · LW · GW

As noted by interstice, this has been defined for infinite and finite strings (instead of functions). I believe Li and Vitanyi use the notation KM.

Comment by Cole Wyeth (Amyr) on Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · 2023-07-03T03:48:56.205Z · LW · GW

It has become slightly more plausible that Melanie Mitchell could come around. 

Comment by Cole Wyeth (Amyr) on Open Problems Related to Solomonoff Induction · 2023-06-01T12:41:03.892Z · LW · GW

How would you take the average over an infinite number of UTMs? You would first need to choose a distribution on them.
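
To spell that out: "averaging" only becomes well-defined once you fix a prior w over an enumeration U_1, U_2, ... of UTMs. This is just the standard mixture construction, not a specific proposal:

```latex
% A prior w over UTMs turns the "average" into a weighted mixture of the
% machine-dependent Solomonoff priors M_{U_i}; the arbitrariness just moves
% into the choice of w.
\[
  M_{\mathrm{avg}}(x) \;=\; \sum_{i \ge 1} w(U_i)\, M_{U_i}(x),
  \qquad \sum_{i \ge 1} w(U_i) = 1 .
\]
```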

Comment by Cole Wyeth (Amyr) on A flaw in the A.G.I. Ruin Argument · 2023-05-21T17:53:09.296Z · LW · GW

Not really. To be clear, I am criticizing the argument Eliezer tends to make. There can be flaws in that argument and we can still be doomed. I am saying his stated confidence is too high because even if alignment is as hard as he thinks, A.I. itself may be harder than he thinks, and this would give us more time to take alignment seriously.

In the second scenario I outlined (say, scenario B) where gains to intelligence feed back into hardware improvements but not drastic software improvements, multiple tries may be possible. On the whole I think that this is not very plausible (1/3 at most), and the other two scenarios look like they only give us one try. 

Comment by Cole Wyeth (Amyr) on A flaw in the A.G.I. Ruin Argument · 2023-05-19T21:51:30.666Z · LW · GW

I believe you're saying that if foom is more than a few years away, it becomes easy to solve the alignment problem before then. I certainly think it becomes easier. 

But the view that "foom more than a few years away -> the alignment problem is easy" is not the one I expressed, which contained among other highly tentative assertions: "the alignment problem is hard -> foom more than a few years away", and the two are opposed in the sense that they have different truth values when alignment is hard. The distinction here is that the chances we will solve the alignment problem depend on the time to takeoff, and are not equivalent to the difficulty of the alignment problem. 

So, you mentioned a causal link between time to foom and chances of solving alignment, which I agree on, but I am also asserting a "causal link" between difficulty of the alignment problem and time to foom (though the counterfactuals here may not be as well defined). 

As for how you could possibly believe foom is not going to happen any year now: My opinion depends on precisely what you mean by foom and by "any year now" but I think I outlined scenarios where it takes ~25 years and ~60 years. Do you have a reason to think both of those are unlikely? It seems to me that hard takeoff within ~5 years relies on the assumptions I mentioned about recursive algorithmic improvement taking place near human level, and seems plausible, but I am not confident it will happen. How surprised will you be if foom doesn't happen within 5 years?

I do expect the next 10 years to be pretty strange, but under the assumptions of the ~60 year scenario the status quo may not be completely upset that soon.  

Comment by Cole Wyeth (Amyr) on AI Will Not Want to Self-Improve · 2023-05-19T19:47:36.416Z · LW · GW

This is funny, but realistically the human who physically swapped out the device driver for the virtual person would probably just swap the old one back. Generally speaking, digital objects that produce value are backed up carefully and are not too fragile. At later stages of self improvement, dumb robots could be used for "screwdriver" tasks like this.