Does anyone's full-time job include reading and understanding all the most-promising formal AI alignment work?

post by NicholasKross · 2023-06-16T02:24:31.048Z · LW · GW · 2 comments

This is a question post.


(By "most promising" I mostly mean "not obviously making noob mistakes", with the central examples being "any Proper Noun research agenda associated with a specific person or org".)

(By "formal" I mean "involving at least some math proofs, and not solely coding things".)

Asking because the field is relatively small, and I'm not sure any single person "gets" all of it anymore.

Example that made me ask this (not necessarily a central example): Nate Soares wrote this [LW · GW] about John Wentworth's work, but then Wentworth replied [LW(p) · GW(p)] saying it was inaccurate about his current/overall priorities.

Answers

2 comments


comment by Seth Herd · 2023-06-16T06:38:58.709Z · LW(p) · GW(p)

I'm just curious: why the specification of math proofs? I know of some modestly promising ideas for aligning the sorts of AGI we're likely to get, and none of them were originally specified in mathematical terms. Tacking math onto those wouldn't really be useful. My impression is that the search for formal proofs of safety has failed and is probably hopeless. It also seems like adding mathematical gloss to ML and psychological concepts is more often confusing than enlightening.

Replies from: NicholasKross
comment by NicholasKross · 2023-06-16T12:42:26.387Z · LW(p) · GW(p)

It's to differentiate from more-obviously-tractable research agendas that are less based on formalism/conceptual work/deconfusion, e.g. HCH. As the question asks, I'm looking for info specifically about this other kind.