One person's worth of mental energy for AI doom aversion jobs. What should I do?

post by Lorec · 2024-08-26T01:29:01.700Z · LW · GW · 17 comments


Hi! I'm Lorec, AKA Mack. I made this post 3 years ago:

Wanted: Foom-scared alignment research partner [LW · GW]

I met some great people, but we never got much of anywhere.

Since then, technical alignment research in general also has not gotten much of anywhere [ counterexample; other strongish counterexamples I know of include the Visible Thoughts idea and Pliny's approach; Anthropic's approach doesn't seem to have panned out ], and AI doom aversion policy [LW · GW] has become a thing.

I made a Discord a while ago for discussion of doom aversion methods. We were some of the first people [to my knowledge] talking positively about SB-1047. I consider it a failure: we were early and correct, but because we were not plugged into any network, nothing came of it.

I am indifferent between technical and policy work, except insofar as [ the effectiveness-to-risk ratio of [technical work in general] ] differs from [ the effectiveness-to-risk ratio of [policy work in general] ].

Factors I come in considering important contributors to the technical-versus-policy weighing:

Pro Technical

- Can potentially have low safety risks if the researcher knows exactly what they are doing and does not use their employer's money to contribute to capabilities

- Can potentially have high safety upsides if the researcher knows exactly what they are doing and is a paranoid saint and can work without ever posting their exciting intermediate results on social media [difficulty level: impossible]

- Technical experience lends [any] policy credibility, while policy experience does not lend technical credibility

Pro Policy

- Fairly safe, for people who have a reasonable level of knowing what they are doing

- Policy jobs [from my faraway position; this might be wrong] seem likely to be more fungible [with each other] than technical jobs - resulting in less risk of being locked in to one employer whose mission I find myself disagreeing with

- I expect to have an easier time getting one of these kinds of jobs; while I consider myself decent enough at programming to be qualified for such technical alignment research as is hiring in principle, in practice I have no degree, job history, or portfolio, and am done wasting my time trying to acquire them, like, no, really, done. End of story.

Who should I talk to? What movements or orgs should I look into? Where are Things Happening the most? As stated in the title, all my spoons are available for this, provided I find something that's actually high prospective impact and low prospective risk.

I appreciate your time and consideration.

17 comments

Comments sorted by top scores.

comment by Neel Nanda (neel-nanda-1) · 2024-08-26T02:13:46.836Z · LW(p) · GW(p)

Anthropic's approach doesn't seem to have panned out

Please don't take that tweet as evidence that mech interp is doomed! Much attention is on sparse autoencoders nowadays, which seem like a cool and promising approach

Replies from: Lorec
comment by Lorec · 2024-08-26T03:00:01.090Z · LW(p) · GW(p)

Tweet link removed.

Replies from: neel-nanda-1
comment by Neel Nanda (neel-nanda-1) · 2024-08-26T11:57:12.012Z · LW(p) · GW(p)

Thanks! I will separately say that I disagree with the statement regardless of whether you're treating my tweet as evidence

Replies from: Lorec
comment by Lorec · 2024-08-29T14:41:07.320Z · LW(p) · GW(p)

In what sense do you consider the mech interp paradigm that originated with Olah to be working?

Replies from: neel-nanda-1
comment by Neel Nanda (neel-nanda-1) · 2024-08-29T19:52:45.221Z · LW(p) · GW(p)

We are finding a bunch of insights about the internal features and circuits inside models that I believe to be true, and developing useful techniques like sparse autoencoders and activation patching that expand the space of what we can do. We're starting to see signs of life of actually doing things with mech interp, though it's early days. I think skepticism is reasonable, and we're still far from actually mattering for alignment, but I feel like the field is making real progress and is far from failed
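[Editorial aside, not part of the comment: for readers unfamiliar with the sparse-autoencoder technique Nanda names here, the sketch below shows the basic idea of training an overcomplete autoencoder on model activations with an L1 sparsity penalty. The dimensions, the sparsity coefficient, and the random stand-in activations are illustrative assumptions, not anything specified in this thread.]

```python
# Minimal sketch of a sparse autoencoder as used in mech interp.
# All sizes and hyperparameters below are hypothetical placeholders.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_hidden: int = 16384):
        super().__init__()
        # Overcomplete dictionary: many more features than activation dimensions.
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, activations: torch.Tensor):
        # Encode activations into a (hopefully interpretable) sparse feature basis.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(reconstruction, activations, features, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features to zero.
    mse = torch.mean((reconstruction - activations) ** 2)
    sparsity = l1_coeff * features.abs().sum(dim=-1).mean()
    return mse + sparsity

# Usage example with random stand-in "residual stream" activations.
acts = torch.randn(64, 768)
sae = SparseAutoencoder()
recon, feats = sae(acts)
loss = sae_loss(recon, acts, feats)
loss.backward()
```

The design choice being illustrated is just the tradeoff in the loss: the MSE term keeps the learned features faithful to the original activations, while the L1 term forces only a few features to fire per input, which is what makes the resulting dictionary a candidate for human-interpretable "features" inside the model.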

comment by TsviBT · 2024-08-26T16:30:15.958Z · LW(p) · GW(p)

https://tsvibt.blogspot.com/2023/07/views-on-when-agi-comes-and-on-strategy.html#things-that-might-actually-work

Replies from: nathan-helm-burger
comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-27T18:57:07.113Z · LW(p) · GW(p)

I think Tsvi is quite mistaken about the speed we are likely to see AGI develop at. I expect AGI by 2028 [LW(p) · GW(p)] with ~95% probability. He does not. Maybe we should dialogue about this?

Replies from: TsviBT
comment by TsviBT · 2024-08-27T19:18:05.024Z · LW(p) · GW(p)

Sure, though if you're just going to say "I know how to do it! Also I won't tell you!" then it doesn't seem very pointful?

Replies from: Lorec
comment by Lorec · 2024-08-28T13:17:37.844Z · LW(p) · GW(p)

"Endpoints are easier to predict than trajectories"; eventual singularity is such an endpoint [LW · GW]; on our current trajectory, the person who is going to do it does not necessarily know they are going to do it until it is done.

Replies from: M. Y. Zuo
comment by M. Y. Zuo · 2024-09-02T19:11:07.195Z · LW(p) · GW(p)

"Endpoints are easier to predict than trajectories"

According to…? Can you link the proof?

comment by lemonhope (lcmgcd) · 2024-08-29T06:54:51.916Z · LW(p) · GW(p)

I don't know if you're a woman, but the women I know have had much more success in politics than the men I know.

Replies from: Lorec
comment by Lorec · 2024-08-29T14:57:51.540Z · LW(p) · GW(p)

Not a woman, sadly.

I believe it, especially if one takes a view of "success" that's about popularity rather than fiat power.

But FYI to future advisors: the thing I would want to prospectively optimize for, along the gov path, when making this decision, is about fiat power. I'm highly uncertain about whether viable paths exist from a standing start to [benevolent] bureaucratic fiat power over AI governance, and if so, where those viable paths originate.

If it was just about reach, I'd probably look for a columnist position instead.

comment by Seth Herd · 2024-11-19T02:56:55.680Z · LW(p) · GW(p)

Did you make any progress on choosing a course? My brief pitch is this: LLM agents are our most likely route to AGI, and particularly likely in short timelines. Aligning them is not the same as aligning the base LLMs. Yet almost no one is working on bridging that gap.

That's what I'm working on. More can be found in my user profile.

I do think this is high prospective impact. I'm not sure what you mean by low prospective risk. I think the work has good odds of being at least somewhat useful, since it's so neglected and it's pretty commonly agreed that language model agents (or foundation model agents or LLM cognitive architectures) are a pretty likely path to first AGI.

I'm happy to talk more. I meant to respond here sooner.

comment by Nathan Helm-Burger (nathan-helm-burger) · 2024-08-27T18:54:23.568Z · LW(p) · GW(p)

I don't have much in the way of good ideas for you to try next. I will, however, link you to my viewpoint on what the next few years probably look like. [LW(p) · GW(p)]

comment by Chris_Leong · 2024-08-26T16:25:04.452Z · LW(p) · GW(p)

If you can land a job in government, it becomes much easier to land other jobs in government.

comment by Chris_Leong · 2024-08-26T16:12:21.114Z · LW(p) · GW(p)

Pliny's approach?

Replies from: Lorec