My recent posts

post by paulfchristiano · 2016-11-29T18:51:09.000Z · LW · GW · 0 comments

Over at Medium, I'm continuing to write about AI control; here's a roundup from the last month.

Many of these seem like interesting things to discuss here; would it be better to post each of these as a link when I write it?

#Strategy

Prosaic AI control argues that AI control research should first consider the case where AI involves no "unknown unknowns."
Handling destructive technology tries to explain the upside of AI control if we live in a universe where we will eventually need to build a singleton anyway.
Hard-core subproblems explains a concept I find helpful for organizing research.

#Building blocks of ALBA

Security amplification and reliability amplification are complements to capability amplification. Ensembling for reliability is now implemented in ALBA on GitHub.
Meta-execution is my current leading contender for security and capability amplification. It’s totally unclear how well it can work (some relevant speculation).
Thoughts on reward engineering discusses a bunch of prosaic but important issues when designing reward functions.

#Terminology and concepts

Clarifying the distinction between safety, control and alignment.
Benignity may be a useful invariant when designing aligned AI.
