My recent posts
Post by paulfchristiano · 2016-11-29T18:51:09.000Z
Over at Medium, I'm continuing to write about AI control; here's a roundup from the last month.
Many of these seem like interesting things to discuss here; would it be better to post each of these as a link when I write it?
# Strategy
- Prosaic AI control argues that AI control research should first consider the case where AI involves no "unknown unknowns."
- Handling destructive technology tries to explain the upside of AI control, if we live in a universe where we eventually need to build a singleton anyway.
- Hard-core subproblems explains a concept I find helpful for organizing research.
# Building blocks of ALBA
- Security amplification and reliability amplification are complements to capability amplification. Ensembling for reliability is now implemented in ALBA on github.
- Meta-execution is my current leading contender for security and capability amplification. It’s totally unclear how well it can work (some relevant speculation).
- Thoughts on reward engineering discusses a bunch of prosaic but important issues when designing reward functions.
# Terminology and concepts
- Clarifying the distinction between safety, control and alignment.
- Benignity may be a useful invariant when designing aligned AI.