LessWrong 2.0 Reader
I read your linked shortform thread. I agreed with most of your arguments against some common AGI takeover arguments. I agree that they won't coordinate against us and won't have "collective grudges" against us.
But I don't think the arguments for continued stability are very thorough, either. I think we just don't know how it will play out. And I think there's a reason to be concerned that takeover will be rational for AGIs, where it's not for humans.
The central difference in logic is the capacity for self-improvement. In your post, you addressed self-improvement by linking a Christiano piece on slow takeoff. But he noted at the start that he wasn't arguing against self-improvement, only that its pace would be more modest. The potential implications for the balance of power in the world remain.
Humans are all locked to a similar level of cognitive and physical capability. That has implications for game theory where all of the competitors are humans: cooperation often makes more sense for humans. But the same isn't necessarily true of AGI, whose cognitive and physical capacities can potentially be expanded. So it's (very loosely) like the difference between game theory in chess, and chess where one of the moves is to add new capabilities to your pieces. We can't learn much about the new game from the theory of the old, particularly if we don't even know all of the capabilities a player might add to their pieces.
More concretely: it may be quite rational for a human controlling an AGI to tell it to try to self-improve and develop new capacities, strategies, and technologies to potentially take over the world. With a first-mover advantage, such a takeover might be entirely possible. The aggressor's capabilities might stay ahead of the rest of the world's AIs/AGIs if they hadn't started to aggressively self-improve and develop the capacity to win conflicts. This would be particularly true if the aggressor AGI were willing to cause global catastrophe (e.g., EMPs, bringing down power grids).
The assumption of a stable balance of power in the face of competitors that can improve their capacities in dramatic ways seems unlikely to be true by default, and at the least, worthy of close inspection. Yet I'm afraid it's the default assumption for many.
Your shortform post is more on-topic for this part of the discussion, so I'm copying this comment there and will continue there if you want. It's worth more posts; I hope to write one myself if time allows.
Edit: It looks like there's an extensive discussion there, including my points here, so I won't bother copying this over. As far as I could tell, neither you nor anyone else had really addressed the destabilizing effect of potential AGI self-improvement. So I continue to think that a massively multipolar AGI scenario probably results fairly quickly in conflict and potential catastrophe.
eggsyntax on Language Models Model Us
I'm aware of the paper because of the impact it had. I might personally not have chosen to draw their attention to the issue, since the main effect seems to be making some research significantly more difficult, and I haven't heard of any attempts to deliberately exfiltrate weights that this would be preventing.
bec-hawk on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Noting that while Sam describes the provision as being “about potential equity cancellation”, the actual wording says 'shall be cancelled', not 'may be cancelled', as per this tweet from Kelsey Piper: https://x.com/KelseyTuoc/status/1791584341669396560
eggsyntax on Language Models Model Us
Interesting! Tough to test at scale, though, or score in any automated way (which is something I'm looking for in my approaches, although I realize you may not be).
bec-hawk on Stephen Fowler's Shortform
Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.
OpenAI wasn’t a private company (ie for-profit) at the time of the OP grant though.
seth-herd on Instruction-following AGI is easier and more likely than value aligned AGI
In the near term, AI and search are blurred, but that's a separate topic. This post was about AGI as distinct from AI. There's no sharp line between the two, but there are important distinctions, and I'm afraid we're confused as a group because of that blurring. More above [LW(p) · GW(p)], and it's worth its own post and some sort of new clarifying terminology. The term AGI has been watered down to include LLMs that are fairly general, rather than its original and important meaning: AI that can think about anything, implying the ability to learn, and therefore almost necessarily having explicit goals and agency. This post was about that type of "real" AGI, which is still hypothetical even though increasingly plausible in the near term.
alex_altair on Fund me please - I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Hey Johannes, I don't quite know how to say this, but I think this post is a red flag about your mental health. "I work so hard that I ignore broken glass and then walk on it" is not healthy.
I've been around the community a long time and have seen several people have psychotic episodes. This is exactly the kind of thing I start seeing before they do.
I'm not saying it's 90% likely, or anything. Just that it's definitely high enough for me to need to say something. Please try to seek out some resources to get you more grounded.
arthur-conmy on Language Models Model Us
They emailed some people about this: https://x.com/brianryhuang/status/1763438814515843119
The reason is that it may allow unembedding matrix weight stealing: https://arxiv.org/abs/2403.06634
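(For readers unfamiliar with that paper: as I understand it, the core observation is that full logit vectors returned by an API all lie in a subspace whose dimension equals the model's hidden width, so collecting enough of them reveals that width and the unembedding matrix up to a linear transform. Below is a minimal toy sketch of that idea; the model sizes and the `query_logits` stand-in are hypothetical, and the real attack on production endpoints is considerably more involved.)

```python
# Toy illustration of the subspace idea behind https://arxiv.org/abs/2403.06634.
# All sizes and the "API" below are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_dim = 1000, 64            # hypothetical toy model dimensions
W_unembed = rng.normal(size=(vocab_size, hidden_dim))

def query_logits(prompt_id: int) -> np.ndarray:
    """Stand-in for an API call returning the full logit vector for one prompt.
    logits = W_unembed @ h, where h is the final hidden state (unknown to the attacker)."""
    h = rng.normal(size=hidden_dim)
    return W_unembed @ h

# Collect logit vectors for many distinct prompts.
n_queries = 200
L = np.stack([query_logits(i) for i in range(n_queries)])   # shape (n_queries, vocab_size)

# Every logit vector lies in the column space of W_unembed, so the stacked matrix
# has rank ~= hidden_dim. Its singular values expose that dimension, and its leading
# right singular vectors span W_unembed's column space (i.e. the unembedding matrix
# is recovered up to an invertible linear transform).
singular_values = np.linalg.svd(L, compute_uv=False)
est_hidden_dim = int((singular_values > 1e-6 * singular_values[0]).sum())
print(est_hidden_dim)  # prints 64 for this toy setup
```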
seth-herd on Instruction-following AGI is easier and more likely than value aligned AGI
Yes, we do see such "values" now, but that's a separate issue IMO.
There's an interesting thing happening in which we're mixing discussions of AI safety and AGI x-risk. There's no sharp line, but I think they are two importantly different things. This post was intended to be about AGI, as distinct from AI. Most of the economic and other concerns relative to the "alignment" of AI are not relevant to the alignment of AGI.
This thesis could be right or wrong, but let's keep it distinct from theories about AI in the present and near future. My thesis here (and a common thesis) is that we should be most concerned about AGI that is an entity with agency and goals, like humans have. AI as a tool is a separate thing. It's very real and we should be concerned with it, but not let it blur into categorically distinct, goal-directed, self-aware AGI.
Whether or not we actually get such AGI is an open question that should be debated, not assumed. I think the answer is very clearly that we will, and soon; as soon as tool AI is smart enough, someone will make it agentic, because agents can do useful work, and they're interesting. So I think we'll get AGI with real goals, distinct from the pseudo-goals implicit in current LLMs' behavior.
The post addresses such "real" AGI, one that is self-aware and agentic but whose sole goal is doing what people want. That's pretty much a third thing, and a somewhat counterintuitive one.
bec-hawk on Ilya Sutskever and Jan Leike resign from OpenAI [updated]
Is that not what Altman is referring to when he talks about vested equity? My understanding was that employees had no other form of equity besides PPUs, in which case he's talking non-misleadingly about the non-narrow case of vested PPUs, i.e. the thing people were alarmed about, right?