AISC project: SatisfIA – AI that satisfies without overdoing it
post by Jobst Heitzig · 2023-11-11T18:22:57.050Z
This is a link post for https://docs.google.com/document/d/1JhmK31IwYGcwqX0nKmxKsbmTh_DX3o1OoW7NJmhVbIw/edit?usp=sharing
This project is part of the upcoming round of AI Safety Camp.
SatisfIA – AI that satisfies without overdoing it
Summary
This project will contribute to fundamental design aspects of AI systems. We will explore novel designs for generic AI agents – AI systems that can be trained to act autonomously in a variety of environments – and their implementation in software.
Our designs deviate from most existing designs in that they are not based on the idea that the agent should aim to maximize some kind of objective function (which I argue is inherently unsafe if the agent is powerful enough and one cannot be absolutely sure of having found exactly the right objective function). Instead, our agents will aim to fulfill goals specified via constraints called “aspirations” (which I argue makes “extreme” actions much less likely and is therefore likely much safer).
For example, I might want my AI butler to prepare 100–150 ml of tea at a temperature of 70–80°C, taking at most 10 minutes, spending at most $1 worth of resources, and succeeding with at least 95% probability (rather than preparing as much tea as possible, as fast and as cheaply as possible, with the largest possible probability of success).
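To make this concrete, here is a minimal illustrative sketch of how such a goal could be written down as a set of interval constraints rather than as a single objective to maximize. The names and data structures are my own, chosen for illustration only, and are not taken from the project's code.

```python
# Illustrative only: a hypothetical way to express the tea-butler goal
# as interval constraints ("aspirations") instead of a quantity to maximize.
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float   # lower bound (inclusive)
    high: float  # upper bound (inclusive)

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Each entry constrains one measurable outcome of the task. Any outcome
# satisfying all of them counts as success -- there is nothing left to maximize.
tea_aspiration = {
    "volume_ml":     Interval(100, 150),
    "temperature_C": Interval(70, 80),
    "duration_min":  Interval(0, 10),
    "cost_usd":      Interval(0, 1),
    "success_prob":  Interval(0.95, 1.0),
}

assert tea_aspiration["temperature_C"].contains(74.0)
```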
For a lightweight introduction to this way of thinking, you can watch this interview.
We will study several versions of such “non-maximizing” agent designs and corresponding learning algorithms (mostly variants of Reinforcement Learning in Markov Decision Processes).
This involves designing agents and algorithms in theory, implementing them in software (mostly Python), simulating their behavior in selected test environments (e.g., the AI safety gridworlds), formulating hypotheses about that behavior (especially its safety-relevant consequences), trying to prove or disprove these hypotheses formally and/or providing numerical evidence that supports them, and writing up the results in blog posts and an academic paper.
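For illustration, below is a minimal sketch of one conceivable “non-maximizing” decision rule in such a setting: instead of taking the action with the highest estimated return, the agent accepts any action whose estimated return falls inside a given aspiration interval. The function, the fallback rule, and the toy numbers are hypothetical and are not the project's actual algorithms.

```python
# A minimal, hypothetical sketch of a non-maximizing decision rule:
# the agent treats an aspiration interval as "good enough" and picks
# randomly among actions whose estimated return lies inside it.
# This is only one of many conceivable satisficing rules.
import random


def choose_action(q_estimates: dict, aspiration: tuple) -> str:
    """q_estimates maps action -> estimated expected return;
    aspiration is a (low, high) interval of acceptable returns."""
    low, high = aspiration
    acceptable = [a for a, q in q_estimates.items() if low <= q <= high]
    if acceptable:
        # Any acceptable action will do; randomizing avoids systematic extremes.
        return random.choice(acceptable)
    # Fallback: no action meets the aspiration, so take the one whose
    # estimate is closest to the interval.
    return min(q_estimates, key=lambda a: min(abs(q_estimates[a] - low),
                                              abs(q_estimates[a] - high)))


# Toy usage: a maximizer would always pick 'boil_full_kettle';
# the aspiration-based rule is content with 'heat_one_cup'.
q = {"heat_one_cup": 0.7, "boil_full_kettle": 0.99, "do_nothing": 0.0}
print(choose_action(q, aspiration=(0.6, 0.8)))  # -> 'heat_one_cup'
```

The contrast in the toy usage line reflects the project's core idea: a maximizer always reaches for the extreme, while an aspiration-based agent stops at “good enough”, which is the behavior the project aims to study and analyze rigorously.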
Details
For details, please consult the full proposal.
Applying
To apply to work on this project, please visit the AI Safety Camp webpage and apply for project #21.