What constraints does deep learning place on alignment plans?

post by Garrett Baker (D0TheMath) · 2023-05-03T20:40:16.007Z · LW · GW · No comments

This is a question post.


A common complaint about particularly theoretical alignment research is that it doesn't seem compatible with deep learning. For example, Tammy's QACI proposal seems pretty clearly not compatible with deep learning. Even more extreme is early MIRI work.

Prosaic alignment seems more clearly compatible with deep learning, as does work that rests on interpretability.

Ideally, we'd have an alignment solution that is both compatible with deep learning and at least as theoretically sound (in terms of alignment properties) as QACI (though I'm skeptical of its soundness).

Sometimes clearly defining the constraints of a problem helps with solving it. So, what are the currently known (or probable) constraints that deep learning imposes on alignment solutions (and plans)?
