Understanding Conjecture: Notes from Connor Leahy interview

post by Akash (akash-wasil) · 2022-09-15T18:37:51.653Z · LW · GW · 23 comments

Contents

  Highlights
    Timelines
    Thoughts on MIRI Dialogues & Eliezer’s style
    Thoughts on Death with Dignity & optimizing for “dignity points” rather than utility
    Thoughts on the importance of playing with large models
    Conjecture
    Refine (alignment incubator)
    Uncorrelated Bets
    What does Conjecture need right now?
  Full notes
    AGI Timelines
    How we’ll get AGI
    Thoughts on Ajeya’s bioanchors report
    Thoughts on Paul-Eliezer-Others Dialogues
    Thoughts on Death with Dignity
    Thoughts on the importance of playing with large models
    Was Eleuther AI net negative?
    Conjecture
    Thoughts on government coordination
    Miracles
    Uncorrelated bets
    What partial solutions to alignment will look like
    What Conjecture works on
    Refine (alignment incubator)
    Thoughts on infohazards
    Why is Conjecture for-profit?
    How will Conjecture make money?
    What does Conjecture need right now?
    Why invest in Conjecture instead of Redwood or Anthropic?

I recently listened to Michaël Trazzi interview Connor Leahy (co-founder & CEO of Conjecture, a new AI alignment organization) on a podcast called The Inside View. YouTube video here; full video & transcript here.

The interview helped me better understand Connor’s worldview and Conjecture’s theory of change.

I’m sharing my notes below. The “highlights” section includes the information I found most interesting/useful. The "full notes" section includes all of my notes.

Disclaimer #1: I didn’t take notes on the entire podcast. I selectively emphasized the stuff I found most interesting. Note also that these notes were mostly for my own understanding, and I did not set out to perfectly or precisely capture Connor’s views.

Disclaimer #2: I’m always summarizing Connor (even when I write “I” or “we,” the first person refers to Connor). I do not necessarily endorse or agree with any of these views.

Highlights

Timelines

Thoughts on MIRI Dialogues & Eliezer’s style

Thoughts on Death with Dignity & optimizing for “dignity points” rather than utility

Thoughts on the importance of playing with large models

Conjecture

Refine (alignment incubator)

Uncorrelated Bets

What does Conjecture need right now?

Full notes

AGI Timelines

How we’ll get AGI

Thoughts on Ajeya’s bioanchors report

Thoughts on Paul-Eliezer-Others Dialogues

Thoughts on Death with Dignity

Thoughts on the importance of playing with large models

Was Eleuther AI net negative?

Conjecture

Thoughts on government coordination

Miracles

Uncorrelated bets

What partial solutions to alignment will look like

What Conjecture works on

Refine (alignment incubator)

Thoughts on infohazards

Why is Conjecture for-profit?

How will Conjecture make money?

What does Conjecture need right now?

Why invest in Conjecture instead of Redwood or Anthropic?

23 comments

Comments sorted by top scores.

comment by Kaj_Sotala · 2022-09-16T15:00:18.150Z · LW(p) · GW(p)

Nice interview, liked it overall! One small question -

  • Heuristic: Imagine you were in a horror movie. At what point would the audience be like “why aren’t you screaming yet?” And how can you see GPT-3 and Dall-E (especially Dall-E) and not imagine the audience screaming at you?

I feel like I'm missing something; to me, this heuristic obviously seems like it'd track "what might freak people out" rather than "how close are we actually to AI". E.g. it feels like I could also imagine an audience at a horror movie starting to scream in the 1970s if they were shown the sample dialogue with SHRDLU starting from page 155 here. Is there something I'm not getting?

Replies from: Bjartur Tómas
comment by Tomás B. (Bjartur Tómas) · 2022-09-17T13:55:11.472Z · LW(p) · GW(p)

Jonathan Blow had a thread on Twitter about this: like Eroisko, SHRDLU has no published code and no similar system showing the same behaviour after 40-50 years, just the author’s word. I think the performance of both was wildly exaggerated.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2022-09-17T19:25:49.124Z · LW(p) · GW(p)

But if we are the movie audience seeing just the publication of the paper in the 70s, we don't yet know that it will turn out to be a dead end with no meaningful follow-up after 40-50 years. We just see what looks to us like an impressive result at the time.

And we also don't yet know if GPT-3 and Dall-E will turn out to be dead ends with no significant progress for the next 40-50 years. (I will grant that it seems unlikely, but when the SHRDLU paper was published, it being a dead end must have seemed unlikely too.)

Replies from: Bjartur Tómas
comment by Tomás B. (Bjartur Tómas) · 2022-09-17T19:27:25.570Z · LW(p) · GW(p)

Millions have personally used GPT-3 in this movie. 

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2022-09-17T19:53:11.826Z · LW(p) · GW(p)

If we start going to the exact specifics of what makes them different then yes, there are reasonable grounds for why GPT-3 would be expected to genuinely be more of an advance than SHRDLU was. But at least as described in the post, the heuristic under discussion wasn't "if we look at the details of GPT-3, we have good reasons to expect it to be a major milestone"; the heuristic was "the audience of a horror movie would start screaming when GPT-3 is introduced". 

If the audience of a 1970s horror movie would have started screaming when SHRDLU was introduced, what we now know about why it was a dead end doesn't seem to matter, nor does it seem to matter that GPT-3 is different. Especially since why would a horror movie introduce something like that only for it to turn out to be a red herring?

I realize that I may be taking the "horror movie" heuristic too literally but I don't know how else to interpret it than "evaluate AI timelines based on what would make people watching a horror movie assume that something bad is about to happen".

Replies from: Bjartur Tómas
comment by Tomás B. (Bjartur Tómas) · 2022-09-18T02:05:16.266Z · LW(p) · GW(p)

Seems like he basically admits the thing was a fraud:

comment by Raemon · 2022-09-15T22:37:44.878Z · LW(p) · GW(p)

I found this a useful crystallization of what was going on with Death With Dignity (I'm curious if Eliezer thinks this was a good summary)

comment by [deleted] · 2022-09-17T09:08:56.043Z · LW(p) · GW(p)

I appreciate the post and Connor for sharing his views, but the antimeme thing kind of bothers me.

  • Here’s my hot take: I think Paul and Eliezer were having two totally different conversations. Paul was trying to have a scientific conversation. Eliezer was trying to convey an antimeme.
  • An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them. So they need to be communicated in a special way. Moral intuitions. Truths about yourself. A psychologist doesn’t just tell you “yo, you’re fucked up bro.” That doesn’t work.
  • A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.
  • A lot of his frustration throughout the years has been him telling everyone that it’s really really hard to convey antimemes. Because it is.
  • If you read The Sequences, some of it is just factual explanations of things. But a lot of it is metaphor. It reads like a religious text. Not because it’s a text of worship, but because it’s about metaphors and stories that affect you more deeply than facts.
  • What happened in the MIRI dialogues is that Eliezer was telling Paul “hey, I’m trying to communicate an antimeme to you, but I’m failing because it’s really really hard.”

Does Connor ever say what antimeme Eliezer is trying to convey, or is it so antimemetic that no one can remember it long enough to write it down? 

I understand that if this antimeme stuff is actually true, these ideas will be hard to convey. But it's really frustrating to hear Connor keep talking about antimemes while not actually mentioning what these antimemes are and what makes them antimemetic. Also, saying "There are all these antimemes out there but I can't convey them to you" is a frustratingly unfalsifiable statement.

Replies from: Bjartur Tómas, None
comment by Tomás B. (Bjartur Tómas) · 2022-09-17T13:43:39.746Z · LW(p) · GW(p)

I suppose if it’s an antimeme, I may not be understanding it. But this was my understanding:

Most humans are really bad at being strict consequentialists. In this case, they think of some crazy scheme to slow down capabilities that seems sufficiently hardcore to signal that they are TAKING SHIT SERIOUSLY and ignore second-order effects that EY/Connor consider obvious. Anyone whose consequentialism has taken them to this place is not a competent consequentialist. EY proposes such people (which I think he takes to mean everyone, possibly even including himself) follow a deontological rule instead: attempt to die with dignity. Connor analogizes this to reward shaping: the practice of assigning partial credit to RL agents for actions likely to be useful in reaching the true goal.
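For readers unfamiliar with the term, here is a minimal sketch of potential-based reward shaping. The one-dimensional gridworld, goal position, and potential function are all invented for illustration; the point is just that the agent earns partial credit for steps that move it toward the true goal, not only at the goal itself:

```python
# Potential-based reward shaping on a toy 1-D gridworld (illustrative only).
# The true reward (+1) is paid only at the goal cell; the shaping term
# F(s, s') = gamma * phi(s') - phi(s) adds partial credit for progress,
# and adding F is known to preserve the set of optimal policies.

GOAL = 10
GAMMA = 0.99

def potential(state: int) -> float:
    # Heuristic "progress" estimate: closer to the goal means higher potential.
    return -abs(GOAL - state)

def true_reward(next_state: int) -> float:
    return 1.0 if next_state == GOAL else 0.0

def shaped_reward(state: int, next_state: int) -> float:
    return true_reward(next_state) + GAMMA * potential(next_state) - potential(state)

# A step toward the goal earns positive shaped reward even far from the goal:
print(shaped_reward(3, 4))  # positive (moved closer)
print(shaped_reward(3, 2))  # negative (moved away)
```

On this analogy, “dignity points” play the role of the potential function: a hand-crafted progress signal for agents (humans) who cannot reliably evaluate the true long-horizon objective.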

Replies from: None
comment by [deleted] · 2022-09-18T07:07:25.085Z · LW(p) · GW(p)

I think that's the antimeme from the Dying with Dignity post. If I remember correctly, the MIRI dialogues between Paul and Eliezer were about takeoff speeds, so Connor is probably referring to something else in the section I quoted, no?

comment by [deleted] · 2022-09-17T13:42:05.384Z · LW(p) · GW(p)
comment by Shmi (shminux) · 2022-09-16T04:26:46.153Z · LW(p) · GW(p)

That is a very enlightening post! My favorite bits:

A lot of Eliezer’s value as a thinker is that he notices & comprehends antimemes. And he figures out how to communicate them.

I think Paul and Eliezer were having two totally different conversations. Paul was trying to have a scientific conversation. Eliezer was trying to convey an antimeme.

Replies from: Kaj_Sotala, Mo Nastri
comment by Kaj_Sotala · 2022-09-16T16:02:08.805Z · LW(p) · GW(p)

Yeah, I also really liked those bits.

comment by Mo Putera (Mo Nastri) · 2022-09-16T15:20:45.420Z · LW(p) · GW(p)

I'm curious if Eliezer endorses this, especially the first paragraph. 

comment by Raemon · 2022-09-15T22:38:34.424Z · LW(p) · GW(p)

We posted on LessWrong saying that we’re hiring, and we got so many high-quality applications. 1 in 3 applications were really good— that never happens! So we have some new people, and we have lots of projects, but we’re currently funding-constrained.

I want to flag that I expect this to be not just funding-constrained but also network-constrained: onboarding a new employee doesn't just cost money but a massive amount of time, especially if you're trying to scale a nuanced company culture.

comment by Lone Pine (conor-sullivan) · 2022-09-18T10:45:31.881Z · LW(p) · GW(p)

I'm starting to think that utilitarianism is the heart of the problem here. "Utilitarianism is intractable" is only an antimeme to utilitarians, in the same way that "Object-Oriented Programming is complex" is only an antimeme to people who are fans of Object-Oriented Programming.

Replies from: sharmake-farah
comment by Noosphere89 (sharmake-farah) · 2022-09-23T12:30:31.376Z · LW(p) · GW(p)

I'd argue that a major part of the problem really is long-term consequentialism, but that this becomes at least partially inevitable by default as soon as 2 conditions are met:

  1. Trade-offs exist, and the value of something can be neither infinite nor arbitrarily large.

  2. The agent doesn't have full knowledge of the value of something.

It really doesn't matter whether consequentialism or morality is actually true, just whether it's more useful than other approaches (given that capabilities researchers are only focusing on how capable a model is).

And for a lot of problems in the real world, this is pretty likely to occur.

For a link to a deontological AI idea, here it is:

https://www.lesswrong.com/posts/FSQ4RCJobu9pussjY/ideological-inference-engines-making-deontology

And for a myopic decision theory, LCDT:

https://www.lesswrong.com/posts/Y76durQHrfqwgwM5o/lcdt-a-myopic-decision-theory

comment by Stephen McAleese (stephen-mcaleese) · 2022-09-19T22:11:01.311Z · LW(p) · GW(p)

This is a really interesting interview with lots of great ideas. Thanks for taking notes on this!

The only point I don't really agree with is the idea that Redwood Research, Anthropic, and ARC are correlated. Although they are all in the same geographic area, they seem to be working on fairly different projects to me:

  • Redwood Research: controlling the output of language models.
  • Anthropic: deep transformer interpretability work.
  • ARC: theoretical alignment research (e.g. ELK).
comment by TAG · 2022-09-16T14:48:35.595Z · LW(p) · GW(p)

Lacks explanations of basics like who Conjecture are, and what an antimeme is.

Replies from: akash-wasil
comment by Akash (akash-wasil) · 2022-09-16T15:23:01.549Z · LW(p) · GW(p)

Conjecture is a new AI alignment organization (https://www.conjecture.dev/). Edited the post to include the link.

Connor's explanation of an antimeme (as presented in the interview) is above:

An antimeme is something that by its very nature resists being known. Most antimemes are just boring—things you forget about. If you tell someone an antimeme, it bounces off them.

Replies from: TAG
comment by TAG · 2022-09-16T16:01:44.972Z · LW(p) · GW(p)

And some are too complicated, and some are too unusual, and some are too disturbing. Four very different things.

Replies from: shminux
comment by Shmi (shminux) · 2022-09-16T16:44:22.632Z · LW(p) · GW(p)

Yes, all those. And all of them occur in the AI safety discussions. The result is the same though: SCP-style disappearance from personal or public consciousness. 

Replies from: TAG
comment by TAG · 2022-09-16T17:03:22.255Z · LW(p) · GW(p)

Since the causes are different, the cures are different.