[Aspiration-based designs] Outlook: dealing with complexity
post by Jobst Heitzig, jossoliver, thomasfinn, Simon Dima (simon-dima) · 2024-04-28T13:06:35.841Z
Summary. This teaser post sketches our current ideas for dealing with more complex environments. It will ultimately be replaced by one or more longer posts describing these in more detail. Reach out if you would like to collaborate on these issues.
Multi-dimensional aspirations
For real-world tasks that are specified in terms of more than a single evaluation metric, e.g., how many apples to buy and how much money to spend at most, we can generalize Algorithm 2 from aspiration intervals to convex aspiration sets as follows:
- Assume there are $d > 1$ many evaluation metrics $f_1, \dots, f_d$, combined into a vector-valued evaluation metric $f = (f_1, \dots, f_d)$.
- Preparation: Pick $d+1$ many linear combinations $g_1, \dots, g_{d+1}$ in the space spanned by these metrics so that their convex hull is full-dimensional and contains the origin, and consider the $d+1$ many policies $\pi_j$, each of which maximizes the expected value of the corresponding function $g_j$. Let $V_j(s)$ and $Q_j(s,a)$ be the expected values of $f$ when using $\pi_j$ in state $s$ or after using action $a$ in state $s$, respectively (see Fig. 1). Let the admissibility simplices $\mathcal{V}(s)$ and $\mathcal{Q}(s,a)$ be the simplices spanned by the vertices $V_j(s)$ and $Q_j(s,a)$, respectively (red and violet triangles in Fig. 1). They replace the feasibility intervals used in Algorithm 2.
- Policy: Given a convex state-aspiration set $\mathcal{E}(s) \subseteq \mathcal{V}(s)$ (central green polyhedron in Fig. 1), compute its midpoint (centre of mass) $x$ and consider the $d+1$ segments $\ell_j$ from $x$ to the corners $V_j(s)$ of $\mathcal{V}(s)$ (dashed black lines in Fig. 1). For each of these segments $\ell_j$, let $A_j$ be the (nonempty!) set of actions $a$ for which $\ell_j$ intersects $\mathcal{Q}(s,a)$. For each $a \in A_j$, compute the action-aspiration $\mathcal{E}(s,a) \subseteq \mathcal{Q}(s,a)$ by shifting a copy $C_j$ of $\mathcal{E}(s)$ along $\ell_j$ towards $V_j(s)$ until the intersection of $C_j$ and $\ell_j$ is contained in the intersection of $\mathcal{Q}(s,a)$ and $\ell_j$ (half-transparent green polyhedra in Fig. 1), and then intersecting $C_j$ with $\mathcal{Q}(s,a)$ to give $\mathcal{E}(s,a) := C_j \cap \mathcal{Q}(s,a)$ (yellow polyhedra in Fig. 1). Then pick one candidate action $a_j$ from each $A_j$ and randomize between these $d+1$ actions in proportions so that the corresponding convex combination of the sets $\mathcal{E}(s,a_j)$ is included in $\mathcal{E}(s)$. Note that this is always possible because $x$ is in the convex hull of the sets $C_j \cap \ell_j$ and the shapes of the sets $\mathcal{E}(s,a_j)$ "fit" into $\mathcal{E}(s)$ by construction.
- Aspiration propagation: After observing the successor state $s'$, the action-aspiration $\mathcal{E}(s,a)$ is rescaled linearly from $\mathcal{Q}(s,a)$ to $\mathcal{V}(s')$ to give the next state-aspiration $\mathcal{E}(s')$, see Fig. 2.
(We also consider other variants of this general idea.)
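To make the randomization and propagation steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes that each shifted copy of the state-aspiration set is a pure translate by a fraction `shifts[j]` of the segment from the midpoint to the j-th corner of the state's admissibility simplex, and that "rescaled linearly" means the affine map sending the j-th corner of the action's admissibility simplex to the j-th corner of the successor state's simplex. All function and parameter names (`barycentric`, `mixing_proportions`, `propagate`, `shifts`) are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: take the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate simplex: corners are affinely dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def barycentric(point: Sequence[float],
                corners: Sequence[Sequence[float]]) -> List[float]:
    """Weights lam_j with sum(lam_j) = 1 and sum(lam_j * corners[j]) = point."""
    d = len(point)
    if len(corners) != d + 1:
        raise ValueError("a simplex in d dimensions needs d + 1 corners")
    # d coordinate equations plus one sum-to-one equation, d + 1 unknowns.
    a = [[float(c[i]) for c in corners] for i in range(d)] + [[1.0] * (d + 1)]
    b = [float(p) for p in point] + [1.0]
    return solve_linear(a, b)


def mixing_proportions(midpoint: Sequence[float],
                       state_corners: Sequence[Sequence[float]],
                       shifts: Sequence[float]) -> List[float]:
    """Action probabilities p_j whose weighted shifts cancel out.

    Assumption: copy j is the aspiration set translated by
    shifts[j] * (state_corners[j] - midpoint), with shifts[j] in [0, 1].
    With lam the barycentric coordinates of the midpoint, choosing
    p_j proportional to lam_j / shifts[j] gives
    sum_j p_j * shifts[j] * (corner_j - midpoint) = 0.
    """
    for j, t in enumerate(shifts):
        if t == 0:
            # An unshifted copy already lies inside the aspiration set,
            # so that action can be taken with probability 1.
            return [1.0 if k == j else 0.0 for k in range(len(shifts))]
    lam = barycentric(midpoint, state_corners)
    raw = [l / t for l, t in zip(lam, shifts)]
    total = sum(raw)
    return [r / total for r in raw]


def propagate(point: Sequence[float],
              action_corners: Sequence[Sequence[float]],
              next_state_corners: Sequence[Sequence[float]]) -> List[float]:
    """Map a point affinely so that action corner j lands on next-state corner j."""
    lam = barycentric(point, action_corners)
    d = len(point)
    return [sum(w * c[i] for w, c in zip(lam, next_state_corners))
            for i in range(d)]
```

Because the map in `propagate` is affine, applying it to every vertex of a convex polyhedron yields the vertices of the rescaled polyhedron, so a polyhedral action-aspiration can be propagated vertex by vertex. For example, with two metrics, state corners `(0, 0)`, `(4, 0)`, `(0, 4)`, midpoint `(1, 1)` and shift fractions `[0.5, 0.5, 0.25]`, `mixing_proportions` returns weights close to `[0.4, 0.2, 0.4]`.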
Hierarchical decision making
A common way of planning complex tasks is to decompose them into a hierarchy of two or more levels of subtasks. Similar to existing approaches from hierarchical reinforcement learning, we imagine that an AI system can make such hierarchical decisions as depicted in the following diagram (shown for only two hierarchical levels, but obviously generalizable to more levels):
3 comments
comment by Roman Malov · 2024-05-02T20:33:22.700Z
"Pick $d+1$ many linearly independent linear combinations"
isn't there at most $d$ linearly independent linear combinations of the $d$ metrics?
↑ comment by Roman Malov · 2024-05-02T20:37:34.741Z
maybe you meant pairwise linearly independent (by looking at the graph)
↑ comment by Jobst Heitzig · 2024-05-02T20:49:17.902Z
You are of course perfectly right. What I meant was: so that their convex hull is full-dimensional and contains the origin. I fixed it. Thanks for spotting this!