"Does your paradigm beget new, good, paradigms?"

post by Raemon · 2024-01-25T18:23:15.497Z · LW · GW · 5 comments

A very short version of this post, which seemed worth rattling off quickly for now.

 

A few months ago, I was talking to John about paradigmaticity in AI alignment. John said "we don't currently have a good paradigm." I asked "Is 'Natural Abstraction' a good paradigm?" He said "No, but I think it's something that's likely to output a paradigm that's closer to the right paradigm for AI Alignment."

"How many paradigms are we away from the right paradigm?"

"Like, I dunno, maybe 3?" said he.

A while later I saw John arguing on LessWrong with (I think?) Ryan Greenblatt about whether Ryan's current pseudo-paradigm was good. (Sorry if I got the names or substance here wrong; I couldn't find the original thread, and it seemed slightly better to be specific so we could dig into a concrete example.)

One distinction in the discussion seemed to be something like:

Now, a) again, I'm not sure I'm remembering this conversation right, and b) whether either of those points is true in this particular case is up for debate, and I'm not arguing they're true. (Also, regardless, I am interested in the idea of AI Control, and I think that getting AI companies to actually do the steps necessary to control at least near-term AIs [LW · GW] is something worth putting effort into.)

But it seemed good to promote to attention the idea that: when you're looking at a cluster of AI Safety research and thinking about whether it's congealing into a useful, promising paradigm, one of the questions to ask is not just "does this paradigm seem locally tractable?" but "do I have a sense that this paradigm will open up new lines of research that can lead to better paradigms?"

(Whether one can be accurate in answering that question is yet another uncertainty. But I think if you ask yourself "is this approach/paradigm useful?", your brain will respond with different intuitions than if you ask "does this approach/paradigm seem likely to result in new/better paradigms?")

Some prior reading:

5 comments


comment by Thomas Kwa (thomas-kwa) · 2024-01-25T21:53:21.203Z · LW(p) · GW(p)

What do you mean by paradigm? It's easy to get confused [LW(p) · GW(p)] talking about paradigms.

comment by aysja · 2024-01-25T19:32:37.310Z · LW(p) · GW(p)

I think the guiding principle behind whether or not scientific work is good should probably look something more like "is this getting me closer to understanding what's happening," where "understanding" is something like "my measurements track the thing in one-to-one lock-step with reality, because I know the right typings and I've isolated the underlying causes well enough."

AI control doesn’t seem like it’s making progress on that goal, which is certainly not to say it’s not important—it seems good to me to be putting some attention on locally useful things. Whereas the natural abstractions agenda does feel like progress on that front.

As an aside: I dislike basically all words about scientific progress at this point. I don't feel like they're precise enough, and it seems easy to get satiated on them and lose track of what's actually important, which is, imo, absolute progress on the problem of understanding what the fuck is going on with minds. Calling this sort of work "science" risks lumping it in with every activity that happens in, e.g., academia, and that isn't right. Calling it "pre-paradigmatic" risks people writing it off as "Okay, so people just sit around being confused for years? How could that be good?"

I wish we had better ways of talking about it. I think that more precisely articulating what our goals are with agent foundations/paradigmaticity/etc. could be very helpful, not only for people pursuing it, but for others to even have a sense of what it might mean for field-founding science to help in solving alignment. As it is, it seems to often get rounded off to "armchair philosophy" or "just being sort of perpetually confused," which seems bad.

comment by ryan_greenblatt · 2024-01-25T23:17:41.109Z · LW(p) · GW(p)

Sorry if I got the names here or substance here wrong, I couldn't find the original thread, and it seemed slightly better to be specific so we could dig into a concrete example

FWIW, I don't seem to remember the exact conversation you mentioned (but it does sound sorta plausible). Also, I personally don't mind you using a fake example with me in it.

[Unimportant, but whatever] Quickly on the object level of the plausibly fictional conversation (lol):

had a bunch of traction on producing a plan that would at least reasonably help if we had to align superintelligent AIs in the near future.

I would more say "seems like it would reasonably help a lot in getting a huge amount of useful work out of AIs". (And then this work could plausibly help with aligning superintelligent AIs, but that isn't clearly the only or even main thing we're initially targeting.)

Replies from: Raemon
comment by Raemon · 2024-01-26T00:18:00.734Z · LW(p) · GW(p)

I would more say "seems like it would reasonably help a lot in getting a huge amount of useful work out of AIs". (And then this work could plausibly help with aligning superintelligent AIs, but that isn't clearly the only or even main thing we're initially targeting.)

Yeah, I think if I'd thought more carefully before posting I'd have come up with this rephrasing myself. It matches my understanding of what you're going for.

comment by Joseph Bloom (Jbloom) · 2024-01-28T16:41:40.613Z · LW(p) · GW(p)

Thanks for writing this! This is an idea that I think is pretty valuable and one that comes up fairly frequently when discussing different AI safety research agendas.

I think that there's a possibly useful analogue of this, from the perspective of being deep inside a cluster of AI safety research and wondering whether it's good. Specifically, I think we should ask "does the value of my current line of research hinge on us basically being right about a bunch of things, or does much of the research value come from discovering all the places we are wrong?"

One reason this feels like an important variant to me is that when I speak to people skeptical about the area of research I've been working in, they often seem surprised that I'm very much in agreement with them about a number of issues. Still, I disagree with them that the solution is to shift focus, so much as to try to work out how the one paradigm might need to shift into another.