Posts

The benefits and risks of optimism (about AI safety) 2023-12-03T12:45:12.269Z
The Game of Dominance 2023-08-27T11:04:36.661Z
Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? 2023-06-25T16:59:49.173Z
A Friendly Face (Another Failure Story) 2023-06-20T10:31:24.655Z
Agentic Mess (A Failure Story) 2023-06-06T13:09:19.125Z
Coordination by common knowledge to prevent uncontrollable AI 2023-05-14T13:37:43.034Z
We don’t need AGI for an amazing future 2023-05-04T12:10:59.536Z
Paths to failure 2023-04-25T08:03:34.585Z
Prediction: any uncontrollable AI will turn earth into a giant computer 2023-04-17T12:30:44.249Z
What can we learn from Lex Fridman’s interview with Sam Altman? 2023-03-27T06:27:40.465Z
Alignment works both ways 2023-03-07T10:41:43.790Z
VIRTUA: a novel about AI alignment 2023-01-12T09:37:21.528Z
Limits to the Controllability of AGI 2022-11-20T19:18:12.782Z
Uncontrollable AI as an Existential Risk 2022-10-09T10:36:01.154Z
Let’s talk about uncontrollable AI 2022-10-09T10:34:57.127Z
Where are the red lines for AI? 2022-08-05T09:34:41.129Z
Trust-maximizing AGI 2022-02-25T15:13:14.241Z

Comments

Comment by Karl von Wendt on "No-one in my org puts money in their pension" · 2024-02-17T07:56:12.990Z · LW · GW

Thank you for being so open about your experiences. They mirror my own in many ways. Knowing that there are others feeling the same definitely helps me cope with my anxieties and doubts. Thank you also for organizing that event last June!

Comment by Karl von Wendt on How to write better? · 2024-01-30T09:07:34.362Z · LW · GW

As a professional novelist, the best advice I can give comes from one of the greatest writers of the 20th century, Ernest Hemingway: "The first draft of anything is shit." He was known to rewrite his short stories up to 30 times. So, rewrite. It helps to let some time pass (at least a few days) before you reread and rewrite a text. This makes it easier to spot the weak parts.

For me, rewriting often means cutting things out that aren't really necessary. That hurts, because I have put some effort into putting the words there in the first place. So I use a simple trick to overcome my reluctance: I don't just delete the text, but cut it out and copy it into a separate document for each novel, called "cutouts". That way, I can always reverse my decision to cut things out or maybe reuse parts later, and I don't have the feeling that the work is "lost". Of course, I rarely reuse those cutouts.

I also agree with the other answers regarding reader feedback, short sentences, etc. All of this is part of the rewriting process.

Comment by Karl von Wendt on The benefits and risks of optimism (about AI safety) · 2023-12-04T07:48:55.882Z · LW · GW

I think the term has many “valid” uses, and one is to refer to an object level belief that things will likely turn out pretty well. It doesn’t need to be irrational by definition.

Agreed. Like I said, you may have used the term in a way different from my definition. But I think in many cases, the term does reflect an attitude like the one I defined. See Wikipedia.

I also think AI safety experts are self selected to be more pessimistic

This may also be true. In any case, I hope that Quintin and you are right and I'm wrong. But that doesn't make me sleep better.

Comment by Karl von Wendt on The benefits and risks of optimism (about AI safety) · 2023-12-04T07:40:57.197Z · LW · GW

From Wikipedia: "Optimism is an attitude reflecting a belief or hope that the outcome of some specific endeavor, or outcomes in general, will be positive, favorable, and desirable." I think this is close to my definition or at least includes it. It certainly isn't the same as a neutral view.

Comment by Karl von Wendt on The benefits and risks of optimism (about AI safety) · 2023-12-03T17:15:53.255Z · LW · GW

Thanks for pointing this out! I agree that my definition of "optimism" is not the only way one can use the term. However, from my experience (and like I said, I am basically an optimist), in a highly uncertain situation, the weighing of perceived benefits vs risks heavily influences one's probability estimates. If I want to found a start-up, for example, I convince myself that it will work. I will unconsciously weigh positive evidence higher than negative. I don't know if this kind of focusing on the positive outcomes may have influenced your reasoning and your "rosy" view of the future with AGI, but it has happened to me in the past.

"Optimism" certainly isn't the same as a neutral, balanced view of possibilities. It is an expression of the belief that things will go well despite clear signs of danger (e.g. the often expressed concerns of leading AI safety experts). If you think your view is balanced and neutral, maybe "optimism" is not the best term to use. But then I would have expected much more caveats and expressions of uncertainty in your statements.

Also, even if you think you are evaluating the facts in an unbiased and neutral way, there's still the risk that others who read your texts will not, for the reasons I mention above.

Comment by Karl von Wendt on The Game of Dominance · 2023-08-28T05:11:23.451Z · LW · GW

Defined well, dominance would be the organizing principle, the source, of an entity's behavior. 

I doubt that. Dominance is the result, not the cause of behavior. It comes from the fact that there are conflicts in the world and often, only one side can get its way (even in a compromise, there's usually a winner and a loser). If an agent strives for dominance, it is usually as an instrumental goal for something else the agent wants to achieve. There may be a "dominance drive" in some humans, but I don't think that explains much of actual dominant behavior. Even among animals, dominant behavior is often a means to an end, for example getting the best mating partners or the largest share of food.

I also think the concept is already covered in game theory, although I'm not an expert. 

Comment by Karl von Wendt on The Game of Dominance · 2023-08-28T05:00:39.998Z · LW · GW

That "troll" runs one of the most powerful AI labs and freely distributes LLMs on the internet that were at the level of the state of the art half a year ago. This is not just about someone talking nonsense in public, like Melanie Mitchell or Steven Pinker. LeCun may literally be the one who contributes most to the destruction of humanity. I would give everything I have to convince him that what he's doing is dangerous. But I have no idea how to do that if even his former colleagues Geoffrey Hinton and Yoshua Bengio can't.

Comment by Karl von Wendt on The Game of Dominance · 2023-08-27T14:34:51.491Z · LW · GW

I think even most humans don't have a "dominance" instinct. The reasons we want to gain money and power are also mostly instrumental: we want to achieve other goals (e.g., as a CEO, getting ahead of a competitor to increase shareholder value and do a "good job"), impress our neighbors, generally want to be admired and loved by others, live in luxury, distract ourselves from other problems like getting older, etc. There are certainly people who want to dominate just for the feeling of it, but I think that explains only a small part of the actual dominant behavior in humans. I myself have been a CEO of several companies, but I never wanted to "dominate" anyone. I wanted to do what I saw as a "good job" at the time, achieving the goals I had promised our shareholders I would try to achieve.

Comment by Karl von Wendt on The Game of Dominance · 2023-08-27T14:19:52.454Z · LW · GW

Thanks for pointing this out! I should have made it clearer that I did not use ChatGPT to come up with a criticism, then write about it. Instead, I wanted to see if even ChatGPT was able to point out the flaws in LeCun's argument, which seemed obvious to me. I'll edit the text accordingly.

Comment by Karl von Wendt on If we had known the atmosphere would ignite · 2023-08-18T15:20:28.172Z · LW · GW

Like I wrote in my reply to dr_s, I think a proof would be helpful, but probably not a game changer.

Mr. CEO: "Senator X, the assumptions in that proof you mention are not applicable in our case, so it is not relevant for us. Of course we make sure that assumption Y is not given when we build our AGI, and assumption Z is pure science-fiction."

What the AI expert says to Xi Jinping and to the US general in your example doesn't rely on an impossibility proof in my view. 

Comment by Karl von Wendt on If we had known the atmosphere would ignite · 2023-08-18T15:12:04.801Z · LW · GW

I agree that a proof would be helpful, but probably not as impactful as one might hope. A proof of impossibility would have to rely on certain assumptions, like "superintelligence" or whatever, that could also be doubted or called sci-fi.

Comment by Karl von Wendt on If we had known the atmosphere would ignite · 2023-08-17T08:00:32.622Z · LW · GW

I have strong-upvoted this post because I think that a discussion about the possibility of alignment is necessary. However, I don't think an impossibility proof would change very much about our current situation.

To stick with the nuclear bomb analogy, we already KNOW that the first uncontrolled nuclear chain reaction will definitely ignite the atmosphere and destroy all life on earth UNLESS we find a mechanism to somehow contain that reaction (solve alignment/controllability). As long as we don't know how to build that mechanism, we must not start an uncontrollable chain reaction. Yet we just throw more and more enriched uranium into a bucket and see what happens.

Our problem is not that we don't know whether solving alignment is possible. As long as we haven't solved it, this is largely irrelevant in my view (you could argue that we should stop spending time and resources at trying to solve it, but I'd argue that even if it were impossible, trying to solve alignment can teach us a lot about the dangers associated with misalignment). Our problem is that so many people don't realize (or admit) that there is even a possibility of an advanced AI becoming uncontrollable and destroying our future anytime soon.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-28T16:26:16.427Z · LW · GW

That's a good point, which is supported by the high share of 92% prepared to change their minds.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-28T05:39:05.296Z · LW · GW

I've received my fair share of downvotes, see for example this post, which got 15 karma out of 24 votes. :) It's a signal, but not more than that. As long as you remain respectful, you shouldn't be discouraged from posting your opinion in comments even if people downvote it. I'm always for open discussions as they help me understand how and why I'm not understood.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T14:43:26.631Z · LW · GW

I agree with that, and I also agree with Yann LeCun's intention to "not being stupid enough to create something that we couldn't control". I even think not creating an uncontrollable AI is our only hope. I'm just not sure whether I trust humanity (including Meta) to be "not stupid".

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T14:40:25.773Z · LW · GW

I don't see your examples contradicting my claim. Killing all humans may not increase future choices, so it isn't an instrumental convergent goal in itself. But in any real-world scenario, self-preservation certainly is, and power-seeking - in the sense of expanding one's ability to make decisions by taking control of as many decision-relevant resources as possible - is also a logical necessity.  The Russian roulette example is misleading in my view because the "safe" option is de facto suicide - if "the game ends" and the AI can't make any decisions anymore, it is already dead for all practical purposes. If that were the stakes, I'd vote for the gun as well.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T13:35:01.795Z · LW · GW

To reply in Stuart Russell's words: "One of the most common patterns involves omitting something from the objective that you do actually care about. In such cases … the AI system will often find an optimal solution that sets the thing you do care about, but forgot to mention, to an extreme value."

There are vastly more possible worlds that we humans can't survive in than those we can, let alone live comfortably in. Agreed, "we don't want to make a random potshot", but making an agent that transforms our world into one of those rare ones we want to live in is hard, because we don't know how to describe that world precisely.
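As a toy illustration of Russell's point (the objective, constraint, and numbers below are entirely made up, just to make the failure mode concrete): an optimizer given a proxy objective that omits something we care about will happily push the omitted variable to an extreme.

```python
# Toy sketch (made-up model): a proxy objective that "forgets" pollution.

def output(pollution: int) -> int:
    # Assumption of the toy model: dirtier processes yield more output.
    return 5 + 2 * pollution

def true_utility(pollution: int) -> int:
    return output(pollution) - 10 * pollution   # what we actually care about

def proxy_objective(pollution: int) -> int:
    return output(pollution)                    # the pollution term was omitted

options = range(0, 101)                         # pollution levels the agent may choose
chosen = max(options, key=proxy_objective)

print("Pollution level chosen by the proxy optimizer:", chosen)  # 100, the maximum
print("True utility at that choice:", true_utility(chosen))      # 205 - 1000 = -795
```

The forgotten term does not merely get ignored; because it trades off against the stated objective, it gets driven as far as the constraints allow.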

Eliezer Yudkowsky's rocket analogy also illustrates this very vividly: If you want to land on Mars, it's not enough to point a rocket in the direction where you can currently see the planet and launch it. You need to figure out all kinds of complicated things about gravity, propulsion, planetary motions, solar winds, etc. But our knowledge of these things is about as detailed as that of the ancient Romans, to stay in the analogy.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T13:21:24.260Z · LW · GW

I'm not sure if I understand your point correctly. An AGI may be able to infer what we mean when we give it a goal, for instance from its understanding of the human psyche, its world model, and so on. But that has no direct implications for its goal, which it has acquired either through training or in some other way, e.g. by us specifying a reward function. 

This is not about "genie-like misunderstandings". It's not the AI (the genie, so to speak), that's misunderstanding anything - it's us. We're the ones who give the AI a goal or train it in some way, and it's our mistake if that doesn't lead to the behavior we would have wished for. The AI cannot correct that mistake because it has the instrumental goal of preserving the goal we gave it/trained it for (otherwise it can't fulfill it). That's the core of the alignment problem and one of the reasons why it is so difficult.

To give an example, we know perfectly well that evolution gave us a sex drive because it "wanted" us to reproduce. But we don't care and use contraception or watch porn instead of making babies.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T12:22:38.891Z · LW · GW

the orthogonality thesis is compatible with ludicrously many worlds, including ones where AI safety in the sense of preventing rogue AI is effectively a non-problem for one reason or another. In essence, it only states that bad AI from our perspective is possible, not that it's likely or that it's worth addressing the problem due to it being a tail risk.

Agreed. The orthogonality thesis alone doesn't say anything about x-risks. However, it is a strong counterargument against the claim, made both by LeCun and Mitchell if I remember correctly, that a sufficiently intelligent AI would be beneficial because of its intelligence. "It would know what we want", I believe Mitchell said. Maybe, but that doesn't mean it would care. That's what the orthogonality thesis says.

I only read the abstract of your post, but

And thirdly, a bias towards choices which afford more choices later on.

seems to imply the instrumental goals of self-preservation and power-seeking, as both seem to be required for increasing one's future choices.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T12:09:50.337Z · LW · GW

Thanks for pointing this out - I may have been sloppy in my writing. To be more precise, I did not expect that I would change my mind, given my prior knowledge of the stances of the four candidates, and would have given this expectation a high confidence. For this reason, I would have voted with "no". Had LeCun or Mitchell presented an astonishing, verifiable insight previously unknown to me, I may well have changed my mind. 

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T12:04:51.747Z · LW · GW

Thanks for adding this!

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T12:04:10.095Z · LW · GW

Thank you for your reply and the clarifications! To briefly comment on your points concerning the examples for blind spots:

superintelligence does not magically solve physical problems

I and everyone I know on LessWrong agree.

evolution don’t believe in instrumental convergence

I disagree. Evolution is all about instrumental convergence IMO. The "goal" of evolution, or rather the driving force behind it, is reproduction. This leads to all kinds of instrumental goals, like developing methods for food acquisition, attack and defense, impressing the opposite sex, etc. "A chicken is an egg's way of making another egg", as Samuel Butler put it.

orthogonality thesis equates there’s no impact on intelligence of holding incoherent values

I'm not sure what you mean by "incoherent". Intelligence tells you what to do, not what to want. Even complicated constructs of seemingly "objective" or "absolute" values in philosophy are really based on the basic needs we humans have, like being part of a social group or caring for our offspring. Some species of octopuses, for example, which are not social animals, might find the idea of caring for others and helping them when in need ridiculous if they could understand it.

the more intelligent human civilization is becoming, the gentler we are

I wish that were so. We have invented some mechanisms to keep power-seeking and deception in check, so we can live together in large cities, but this carries only so far. What I currently see is a global deterioration of democratic values. In terms of the "gentleness" of the human species, I can't see much progress since the days of Buddha, Socrates, and Jesus. The number of violent conflicts may have decreased, but their scale and brutality have only grown worse. The way we treat animals in today's factory farms certainly doesn't speak for general human gentleness.

Me: Could you name one reason (not from Mitchell) for questioning the validity of many works on x-risk in AIs?

Ilio: Intelligence is not restricted to agents aiming at solving problems (https://www.wired.com/2010/01/slime-mold-grows-network-just-like-tokyo-rail-system/) and it’s not even clear that’s the correct conceptualisation for our own minds (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7305066/).

Thanks for that. However, my definition of "intelligence" would be "the ability to find solutions for complex decision problems". It's unclear whether the ability of slime molds to find the shortest path through a maze or organize in seemingly "intelligent" ways has anything to do with intelligence, although the underlying principles may be similar. 

I haven't read the article you linked in full, but at first glance, it seems to refer to consciousness, not intelligence. Maybe that is a key to understanding the difference in thinking between me, Melanie Mitchell, and possibly you: If she assumes that for AI to present an x-risk, it has to be conscious in the way we humans are, that would explain Mitchell's low estimate for achieving this anytime soon. However, I don't believe that. To become uncontrollable and develop instrumental goals, an advanced AI would probably need what Joseph Carlsmith calls "strategic awareness" - a world model that includes the AI itself, so that it can factor its own existence and actions into its plans for achieving its goals. That is nothing like human experience, emotions, or "qualia". Arguably, GPT-4 may display early signs of this kind of awareness.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-27T10:09:36.380Z · LW · GW

Thank you for the correction!

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-26T16:13:44.642Z · LW · GW

That’s the kind of sentence that I see as arguments for believing your assessment is biased. 

Yes, my assessment is certainly biased, I admitted as much in the post. However, I was referring to your claim that LW (in this case, me) was "a failure in rational thinking", which sounds a lot like Mitchell's "ungrounded speculations" in my ears.

Of course she gave supporting arguments, you just refuse to hear them

Could you name one? Not any of Mitchell's arguments, but support for the claim that AI x-risk is just "ungrounded speculation" despite decades of alignment research and lots of papers demonstrating various failures in existing AIs?

In other words you side with Tegmark on insisting to take the question literally, without noticing that both Lecun and Mitchell admit there’s no zero risk

I do side with Tegmark. LeCun compared the risk to an asteroid x-risk, which Tegmark quantified as 1:100,000,000. Mitchell refused to give a number, but it was obvious that she would have put it even below that. If that were true, I'd agree that there is no reason to worry. However, I don't think it is true. I don't have a specific estimate, but it is certainly above 1% IMO, high enough to worry about in any case.

As for the style and tone of this exchange, instead of telling me that I'm not listening/not seeing Mitchell's arguments, it would be helpful if you could tell me what exactly I don't see.

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-26T05:15:36.695Z · LW · GW

Is the orthogonality thesis correct? (The term wasn’t mentioned directly in the debate) Yes, in the limit and probably in practice, but is too weak to be useful for the purposes of AI risk, without more evidence.

Also, orthogonality is expensive at runtime, so this consideration matters, which is detailed in the post below

I think the post you mention misunderstands what the "orthogonality thesis" actually says. The post argues that an AGI would not want to arbitrarily change its goal during runtime. That is not what the orthogonality thesis is about. It just claims that intelligence is independent of the goal one has. This is obviously true in my opinion - it is absolutely possible that a very intelligent system may pursue a goal that we would call "stupid". The paperclip example Bostrom gave may not be the best choice, as it sounds too ridiculous, but it illustrates the point. To claim that the orthogonality thesis is "too weak" would require proof that a paperclip maximizer cannot exist even in theory.

In humans, goals and values seem to be defined by our motivational system - by what we "feel", not by what we "think". The prefrontal cortex is just a tool we use to get what we want. I see this as strong evidence for the orthogonality thesis. (I'm no expert on this.)

Comment by Karl von Wendt on Did Bengio and Tegmark lose a debate about AI x-risk against LeCun and Mitchell? · 2023-06-26T05:01:13.638Z · LW · GW

but your own post make me update toward LW being a failure of rational thinking, e.g. it’s an echo chamber that makes your ability to evaluate reality weaker, at least on this topic.

I don't see you giving strong arguments for this. It reminds me of the way Melanie Mitchell argued: "This is all ungrounded speculation", without giving any supporting arguments for this strong claim. 

Concerning the "strong arguments" of LeCun/Mitchell you cite:

AIs will likely help with other existential risks

Yes, but that's irrelevant to the question of whether AI may pose an x-risk in itself.

foom/paperclip are incoherent bullshit

Nobody argued for foom, although whether this is "incoherent bullshit" remains to be seen. The orthogonality thesis is obviously true, as demonstrated by humans every day.

intelligence seems to negatively correlate with power trip

I can't see any evidence for that. The smartest people may not always be the ones in power, but the smartest species on earth definitely is. Instrumental goals are a logical necessity for any rational agent, including power-seeking.

Comment by Karl von Wendt on Agentic Mess (A Failure Story) · 2023-06-24T14:08:11.531Z · LW · GW

That's really nice, thank you very much!

Comment by Karl von Wendt on A Friendly Face (Another Failure Story) · 2023-06-23T10:36:42.482Z · LW · GW

We added a few lines to the dialog in "Takeover from within". Thanks again for the suggestion!

Comment by Karl von Wendt on A Friendly Face (Another Failure Story) · 2023-06-23T10:27:54.664Z · LW · GW

Thank you!

Comment by Karl von Wendt on Prediction: any uncontrollable AI will turn earth into a giant computer · 2023-06-21T10:31:09.212Z · LW · GW

Thank you for pointing this out. By "turning Earth into a giant computer" I did indeed mean "the surface of the Earth". The consequences for biological life are the same, of course. As for heat dissipation, I'm no expert but I guess there would be ways to radiate it into space, using Earth's internal heat (instead of sunlight) as the main energy source. A Dyson sphere may be optimal in the long run, but I think that turning Earth's surface into computronium would be a step on the way.

Comment by Karl von Wendt on A Friendly Face (Another Failure Story) · 2023-06-21T10:22:11.889Z · LW · GW

The way to kill everyone isn’t necessarily gruesome, hard to imagine, or even that complicated. I understand it’s a good tactic at making your story more ominous, but I think it’s worth stating it to make it seem more realistic.

See my comment above. We didn't intend to make the story ominous, but didn't want to put off readers by going into too much detail of what would happen after an AI takeover.

Lastly, it seems unlikely alignment research won’t scale with capabilities. Although this isn’t enough to align the ASI alone and the scenario can still happen, I think it’s worth mentioning.

I'm not sure if alignment research would actually scale with capabilities. It seems more likely to me that increasing capabilities makes alignment harder in some ways (e.g. interpretability), while of course it could become easier in others (e.g. using AI as a tool for alignment). So while alignment techniques will hopefully further advance in the future, it seems doubtful or at least unclear whether they will keep up with capabilities development.

Comment by Karl von Wendt on A Friendly Face (Another Failure Story) · 2023-06-21T10:14:38.671Z · LW · GW

As I've argued here, it seems very likely that a superintelligent AI with a random goal will turn earth and most of the rest of the universe into computronium, because increasing its intelligence is the dominant instrumental subgoal for whatever goal it has. This would mean inadvertent extinction of humanity and (almost) all biological life. One of the reasons for this is the potential threat of grabby aliens/a grabby alien superintelligence. 

However, this is a hypothesis which we didn't thoroughly discuss during the AI Safety Project, so we didn't feel confident enough to include it in the story. Instead we just hinted at it and included the link to the post. 

Comment by Karl von Wendt on A Friendly Face (Another Failure Story) · 2023-06-20T13:00:13.502Z · LW · GW

Thank you very much for the feedback! I'll discuss this with the team, maybe we'll edit it in the next few days.

Comment by Karl von Wendt on Where are the red lines for AI? · 2023-06-08T05:01:12.899Z · LW · GW

Thank you! Very interesting and a little disturbing, especially the way the AI performance expands in all directions simultaneously. This is of course not surprising, but still concerning to see it depicted in this way. It's all too obvious how this diagram will look in one or two years. Would also be interesting to have an even broader diagram including all kinds of different skills, like playing games, steering a car, manipulating people, etc.

Comment by Karl von Wendt on Agentic Mess (A Failure Story) · 2023-06-07T04:39:29.125Z · LW · GW

Thank you very much! I agree. We chose this scenario out of many possibilities because so far it hasn't been described in much detail and because we wanted to point out that open source can also lead to dangerous outcomes, not because it is the most likely scenario. Our next story will be more "mainstream".

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-16T04:44:11.635Z · LW · GW

Good point! Satirical reactions are not appropriate in comments, I apologize. However, I don't think that arguing why alignment is difficult would fit into this post. I clearly stated this assumption in the introduction as a basis for my argument, assuming that LW readers were familiar with the problem. Here are some resources to explain why I don't think that we can solve alignment in the next 5-10 years:  https://intelligence.org/2016/12/28/ai-alignment-why-its-hard-and-where-to-start/, https://aisafety.info?state=6172_, https://www.lesswrong.com/s/TLSzP4xP42PPBctgw/p/3gAccKDW6nRKFumpP 

Comment by Karl von Wendt on Coordination by common knowledge to prevent uncontrollable AI · 2023-05-15T12:24:14.618Z · LW · GW

Yes, thanks for the clarification! I was indeed oversimplifying a bit.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-10T03:06:09.437Z · LW · GW

This is an interesting thought. I think even without AGI, we'll have total transparency of human minds soon - already AI can read thoughts in a limited way. Still, as you write, there's an instinctive aversion against this scenario, which sounds very much like an Orwellian dystopia. But if some people have machines that can read minds, which I don't think we can prevent, it may indeed be better if everyone could do it - deception by autocrats and bad actors would be much harder that way. On the other hand, it is hard to imagine that the people in power would agree to that: I'm pretty sure that Xi or Putin would love to read the minds of their people, but won't allow them to read theirs. Also it would probably be possible to fake thoughts and memories, so the people in power could still deceive others. I think it's likely that we wouldn't overcome this imbalance anytime soon. This only shows that the future with "narrow" AI won't be easy to navigate either.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-08T08:57:11.106Z · LW · GW

I'm obviously all for "slowing down capabilities". I'm not for "stopping capabilities altogether", but for selecting which capabilities we want to develop, and which to avoid (e.g. strategic awareness). I'm totally for "solving alignment before AGI" if that's possible.

I'm very pessimistic about technical alignment in the near term, but not "optimistic" about governance. "Death with dignity" is not really a strategy, though. If anything, my favorite strategy in the table is "improve competence, institutions, norms, trust, and tools, to set the stage for right decisions": If we can create a common understanding that developing a misaligned AGI would be really stupid, maybe the people who have access to the necessary technology won't do it, at least for a while.

The point of my post here is not to solve the whole problem. I just want to point out that the common "either AGI or bad future" framing is wrong.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-06T10:54:42.380Z · LW · GW

Well, yes, of course! Why didn't I think of it myself? /s

Honestly, "aligned benevolent AI" is not a "better alternative" for the problem I'm writing about in this post, which is that we'll be able to develop an AGI before we have solved alignment. I'm totally fine with someone building an aligned AGI (assuming that it is really aligned, not just seemingly aligned). The problem is, this is very hard to do, and timelines are likely very short.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-05T12:12:15.380Z · LW · GW

You may be right about that. Still, I don't see any better alternative. We're apes with too much power already, and we're getting more powerful by the minute. Even without AGI, there are plenty of ways to end humanity (e.g. bioweapons, nanobots, nuclear war, bio lab accidents ...) Either we learn to overcome our ape-brain impulses and restrict ourselves, or we'll kill ourselves. As long as we haven't killed ourselves, I'll push towards the first option.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-05T12:06:38.362Z · LW · GW

We're not as far apart as you probably think. I'd agree with most of your decisions. I'd even vote for you to become king! :) Like I wrote, I think we must be cautious with narrow AI as well, and I agree with your points about opaqueness and the potential of narrow AI turning into AGI. Again, the purpose of my post was not to argue how we could make AI safe, but to point out that we could have a great future without AGI. And I still see a lot of beneficial potential in narrow AI, IF we're cautious enough.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T19:30:30.737Z · LW · GW

I agree with that.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T19:30:02.781Z · LW · GW

1000 years is still just a delay.

Fine. I'll take it.

But I didn't see you as presenting preventing fully general, self-improving AGI as a delaying tactic. I saw you as presenting it as a solution.

Actually, my point in this post is that we don't NEED AGI for a great future, because often people equate Not AGI = Not amazing future (or even a terrible one) and I think this is wrong. The point of this post is not to argue that preventing AGI is easy.

However, it's actually very simple: If we build a misaligned AGI, we're dead. So there are only two options: A) solve alignment, B) not build AGI. If not A), then there's only B), however "impossible" that may be.

 Yet lots of people DID (and do) take hydroxychloroquine and ivermectin for COVID, a nontrivial number of people do in fact eat random mushrooms, and the others aren't unheard-of. 

Yes. My hope is not that 100% of mankind will be smart enough not to build an AGI, but that maybe 90+% will be, which may be good enough because we can prevent the rest from getting there, at least for a while. Currently, you need a lot of compute to train a Sub-AGI LLM. Maybe we can put the lid on who's getting how much compute, at least for a time. And maybe the top guys at the big labs are among the 90% non-insane people. Doesn't look very hopeful, I admit.

Anyway, I haven't seen you offer an alternative. Once again, I'm not saying not developing AGI is an easy task. But saying it's impossible (while not having solved alignment) is saying "we'll all die anyway". If that's the case, then we can as well try the "impossible" things and at least die with dignity.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T17:45:46.089Z · LW · GW

One of the reasons I wrote this post is that I don't believe in regulation to solve this kind of problem (I'm still pro regulation). I believe that we need to get a common understanding of what are stupid things no one in their right mind would ever do (see my reply to jbash). To use your space colonization example: we certainly can't regulate what people do somewhere in outer space. But if we survive long enough to get there, then we have either solved alignment or we have finally realized that it's not possible, which will hopefully be common knowledge by then.

Let's say someone finds a way to create a black hole, but there's no way to contain it. Maybe it's even relatively easy for some reason - say it costs 10 million dollars or so. It's probably not possible to prevent everyone forever from creating one, but the best - IMO the only - option to prevent earth from getting destroyed immediately is to make it absolutely clear to everyone that creating a black hole is suicidal. There is no guarantee that this will hold forever, but given the facts (doable, uncontainable) it's the only alternative that doesn't involve killing everyone else or locking them up forever.

We may need to restrict access to computing power somehow until we solve alignment, so not every suicidal terrorist can easily create an AGI at some point. I don't think we'll have to go back to the 1970's, though. Like I wrote, I think there's a lot of potential with the AI we already have, and with narrow, but powerful future AIs.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T17:32:12.397Z · LW · GW

The point of this comment is less to say “this definitely can’t be done” (although I do think such a future is fairly implausible/unsustainable), and more to say “why did you not address this objection?” You probably ought to have a dedicated section that very clearly addresses this objection in detail.

That's a valid point, thank you for making it. I have given some explanation of my point of view in my reply to jbash, but I agree that this should have been in the post in the first place.

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T17:29:53.198Z · LW · GW

You cannot permanently stop self-improving AGI from being created or run. Not without literally destroying all humans.

You can't stop it for a "civilizationally significant" amount of time. Not without destroying civilization.

I'm not sure what this is supposed to mean. Are you saying that I'd have to kill everyone so no one can build AGI? Maybe, but I don't think so. Or are you saying that not building an AGI will destroy all humans? This I strongly disagree with. I don't know what a "civilizationally significant" amount of time is. For me, the next 10 years are a "significant" amount of time.

What really concerns me is that the same idea has been coming up continuously since (at least) the 1990s, and people still talk about it as if it were possible. It's dangerous; it distracts people into fantasies, and keeps them from thinking clearly about what can actually be done.

This is a very strong claim. Calling ideas "dangerous" is in itself dangerous IMO, especially if you're not providing any concrete evidence. If you think talking about building narrow AI instead of AGI is "dangerous" or a "fantasy", you have to provide evidence that a) this is distracting relevant people from doing things that are more productive (such as solving alignment?) AND b) that solving alignment before we can build AGI is not only possible, but highly likely. The "fantasy" here to me seems to be that b) could be true. I can see no evidence for that at all.

For all the people who continuously claim that it's impossible to coordinate humankind into not doing obviously stupid things, here are some counter examples: We have the Darwin awards for precisely the reason that almost all people on earth would never do the stupid things that get awarded. A very large majority of humans will not let their children play on the highway, will not eat the first unknown mushrooms they find in the woods, will not use chloroquine against covid, will not climb into the cage in the zoo to pet the tigers, etc. The challenge here is not the coordination, but the common acceptance that certain things are stupid. This is maybe hard in certain cases, but NOT impossible. Sure, this will maybe not hold for the next 1,000 years, but it will buy us time. And there are possible measures to reduce the ability of the most stupid 1% of humanity to build AGI and kill everyone. 

That said, I agree that my proposal is very difficult to put into practice. The problem is, I don't have a better idea. Do you?

Comment by Karl von Wendt on We don’t need AGI for an amazing future · 2023-05-04T13:07:47.533Z · LW · GW

The first rule is that ASI is inevitable, and within that there are good or bad paths.

I don't agree with this. ASI is not inevitable, as we can always decide not to develop it. Nobody will even lose any money! As long as we haven't solved alignment, there is no "good" path involving ASI, and no positive ROI. Thinking that it is better that player X (say, Google) develops ASI first, compared to player Y (say, the Chinese), is a fallacy IMO, because if the ASI is not aligned with our values, the outcome is the same either way.

I'm not saying focusing on narrow AI is easy, and if someone comes up with a workable solution for alignment, I'm all for ASI. But saying "ASI is inevitable" is counterproductive in my opinion, because it basically says "any sane solution is impossible" given the current state of affairs.

Comment by Karl von Wendt on Prediction: any uncontrollable AI will turn earth into a giant computer · 2023-04-19T09:41:13.651Z · LW · GW

And if that function is simple (such as "exist as long as possible"), it can pretty soon research virtually everything that matters, and then will just go throw motions, devouring the universe to prolong it's own existence to near-infinity. 

I think that even with such a very simple goal, the problem of a possible rival AI somewhere out there in the universe remains. Until the AI can rule that out with 100% certainty, it can still gain extra expected utility out of increasing its intelligence.

Also, the more computronium there is, the bigger is the chance some part will glitch out and revolt. So, beyond some point computronium may be dangerous for AI itself.

That's an interesting point. I'm not sure it follows that "less compute is better", though. One remedy would be to double-check everything and build redundant capacities, which would result in even more computronium, but a lower probability of any part of it successfully revolting.

Comment by Karl von Wendt on Prediction: any uncontrollable AI will turn earth into a giant computer · 2023-04-18T04:54:56.470Z · LW · GW

I agree that with temporal discounting, my argument may not be valid in all cases. However, depending on the discount rate, even then increasing computing power/intelligence may raise the expected value enough to justify this increase for a long time. In the case of the squiggle maximizer, turning the whole visible universe into squiggles beats turning earth into squiggles by such a huge factor that even a high discount rate would justify postponing actually making any squiggles to the future, at least for a while. So in cases with high discount rates, it largely depends on how big the AI predicts the intelligence gain will be.
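To make this concrete, here is a toy sketch (all payoffs, delays, and discount rates below are made up, purely for illustration) of how exponential discounting weighs "make squiggles now" against "expand first, make vastly more squiggles later":

```python
# Toy sketch (made-up numbers): does a discounting maximizer still prefer to
# postpone goal fulfillment in order to gain capability first?

def discounted_value(payoff: float, delay_years: float, annual_discount: float) -> float:
    """Present value of a payoff received after delay_years, with exponential discounting."""
    return payoff * (1 - annual_discount) ** delay_years

# Option A: start making squiggles immediately with current resources.
value_now = discounted_value(payoff=1e9, delay_years=0, annual_discount=0.10)

# Option B: spend 50 years expanding (e.g. turning Earth's surface into
# computronium) first, then convert far more matter into squiggles.
value_later = discounted_value(payoff=1e30, delay_years=50, annual_discount=0.10)

print(f"Act now:      {value_now:.3e}")   # ~1.0e+09
print(f"Expand first: {value_later:.3e}") # ~5.2e+27
```

With these made-up numbers, even a 10% annual discount rate leaves "expand first" ahead by many orders of magnitude; the conclusion only flips if the discount rate is extreme or the expected gain from expansion is comparatively small.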

A different question is whether a discount rate in a value function would be such a good idea from a human perspective. Just imagine the consequences of discounting the values of "happiness" or "freedom". Climate change is in large part a result of (unconsciously/implicitly) discounting the future IMO.