Relative Abstracted Agency

post by Audere (Ozzalus) · 2023-04-08T16:57:38.556Z · LW · GW · 6 comments

Note: This post was pasted without much editing or work put into formatting. I may come back and make it more presentable at a later date, but the concepts should still hold.

Relative abstracted agency is a framework for considering the extent to which a modeler models a target as an agent, which factors lead a modeler to model a target as an agent, and which sorts of models count as agent-models. The relative abstracted agency of a target, relative to a reasonably efficient modeler, is determined by the most effective strategies that modeler can use to model the target. These strategies lie on a spectrum from terminalizing strategies (modeling the target via its goals or utility function) to simulating strategies (modeling the target via its physical state and mechanics).


Factors that affect relative abstracted agency of a target:

  1. Complexity of the target and quantity of available information about the target. AIXI can get away with simulating everything even with only a single bit of information about a target, because it has infinite computing power. In practice, any modeler that fits in the universe likely needs a significant fraction of the bits of complexity of a nontrivial target to model it well using simulating strategies. Having less information about a target tends to, but doesn’t always, make agent-abstracted strategies more effective than less agent-abstracted ones. For example, a modeler may best predict a target using mechanizing strategies until it has information suggesting that the target acts agentically enough to be better modeled using psychologizing or projecting strategies.
  2. Predictive or strategic ability of the modeler relative to the target. Targets with overwhelming predictive superiority over a modeler are usually best modeled using terminalizing strategies, whereas targets that a modeler has an overwhelming predictive advantage over are usually best modeled using mechanizing or simulating strategies.
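
The two factors above can be sketched as a toy decision rule. Everything here is hypothetical, including the strategy names' thresholds (the post gives no numbers); the point is only the qualitative pattern: low predictive advantage pushes toward terminalizing, while abundant information plus predictive dominance pushes toward mechanizing or simulating.

```python
def best_strategy(info_fraction: float, predictive_advantage: float) -> str:
    """Toy rule for which modeling strategy a bounded modeler should prefer.

    info_fraction: fraction of the target's bits of complexity available
        to the modeler (0.0 to 1.0).
    predictive_advantage: modeler's predictive ability relative to the
        target's (> 1 means the modeler out-predicts the target).

    All thresholds are made up for illustration.
    """
    if predictive_advantage < 0.1:
        # Target overwhelmingly out-predicts the modeler (factor 2).
        return "terminalizing"
    if predictive_advantage > 10.0 and info_fraction > 0.9:
        # Modeler dominates and has nearly complete information (factor 1).
        return "simulating"
    if info_fraction > 0.5:
        # Enough information to track the target's mechanics directly.
        return "mechanizing"
    # Little information, comparable abilities: treat the target as an agent.
    return "psychologizing"
```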


Relevance of this framework to AI alignment:

Additional thoughts:
(draws on prior work, particularly Kosoy’s model of agenthood)

Suppose there is a correct hypothesis for the world in the form of a non-halting Turing machine program. Hereafter I’ll simply refer to this as “the world.”


Consider a set of bits of the program at one point in its execution, which I will call the target. This set of bits can also be interpreted as a cartesian boundary around an agent executing some policy in Vanessa’s framework. We would like to evaluate the degree to which the target is usefully approximated as an agent, relative to some agent that (instrumentally or terminally) attempts to make accurate predictions under computational constraints using partial information about the world, which we will call the modeler.


Vanessa Kosoy’s framework outlines a way of evaluating the probability that an agent G has a utility function U, taking into account both the agent’s efficacy at satisfying U and the complexity of U. Consider the utility function with respect to which the target is most kosoy-agentic. Hereafter I’ll simply refer to this as the target’s utility function.
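
As a loose illustration of “most kosoy-agentic,” one might score candidate utility functions by combining the agent’s efficacy at satisfying U with a simplicity prior 2^(-complexity) over U. This is a hypothetical stand-in, not Kosoy’s actual formula, and the candidate names and numbers below are invented:

```python
def kosoy_agency_score(efficacy: float, complexity_bits: float) -> float:
    """Toy score for how plausibly an agent has utility function U:
    efficacy at satisfying U, weighted by a simplicity prior over U."""
    return efficacy * 2.0 ** (-complexity_bits)

def most_agentic_utility(candidates):
    """Pick the (name, efficacy, complexity_bits) candidate with respect
    to which the target is most kosoy-agentic under the toy score."""
    return max(candidates, key=lambda c: kosoy_agency_score(c[1], c[2]))
```

For example, a simple hypothesis the agent satisfies well beats a complex hypothesis it satisfies slightly better: `most_agentic_utility([("reach food", 0.8, 4.0), ("elaborate goal", 0.9, 10.0)])` selects `"reach food"`.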


Suppose the modeler can choose between gaining 1 bit of information of its choice about the target’s physical state in the world, and gaining 1 bit of information of its choice about the target’s utility function. (Effectively, the modeler can choose between obtaining an accurate answer to a binary question about the target’s physical state, and obtaining an accurate answer to a binary question about the target’s utility function). The modeler, as an agent, should assign some positive amount of utility to each option relative to a null option of gaining no additional information. Let’s call the amount of utility it assigns to the former option SIM and the amount it assigns to the latter option TERM.


A measure of the relative abstracted agency (RAA) of the target, relative to the modeler, is given by TERM/SIM. Small values indicate that the target has little relative abstracted agency, while large values indicate that it has significant relative abstracted agency. The RAA of a rock relative to myself should be less than one, as I expect information about its physical state to be more useful to me than information about its most likely utility function. On the other hand, the RAA of an artificial superintelligence relative to myself should be greater than one, as I expect information about its utility function to be more useful to me than information about its physical state.
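
The measure itself is just a ratio; a minimal sketch, with the utility assignments for the rock and the superintelligence invented for illustration (the post assigns no numbers):

```python
def relative_abstracted_agency(term: float, sim: float) -> float:
    """RAA of a target relative to a modeler: the utility TERM the modeler
    assigns to one bit about the target's utility function, divided by the
    utility SIM it assigns to one bit about the target's physical state.
    Values below 1 favor simulating; values above 1 favor terminalizing."""
    if sim <= 0:
        raise ValueError("SIM must be positive for the ratio to be defined")
    return term / sim

# Hypothetical utility assignments:
rock_raa = relative_abstracted_agency(term=0.01, sim=1.0)  # well below 1
asi_raa = relative_abstracted_agency(term=5.0, sim=0.5)    # well above 1
```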



Comments sorted by top scores.

comment by Jan_Kulveit · 2023-04-08T17:13:01.369Z · LW(p) · GW(p)

I don't mind that the post was posted without much editing or work put into formatting, but I find it somewhat unfortunate that the post was probably written without any work put into figuring out what other people have written about the topic and what terminology they use.

Recommended reading: 
- Daniel Dennett's Intentional stance
- Grokking the intentional stance [LW · GW]
- Agents and Devices review [LW · GW]

Replies from: lahwran
comment by the gears to ascension (lahwran) · 2023-04-08T21:08:49.091Z · LW(p) · GW(p)

@Audere [LW · GW] Thoughts on changing words to match previous ones?

comment by the gears to ascension (lahwran) · 2023-04-03T21:37:11.888Z · LW(p) · GW(p)

@mods, if there were an alignmentforum tier for sketch-grade posts, this would belong there. It seems like there ought to be a level between LessWrong and the Alignment Forum, one that is gently vetted but specifically allows low-quality posts.

comment by the gears to ascension (lahwran) · 2023-04-03T21:30:34.562Z · LW(p) · GW(p)

A question came up: how do you formalize this exactly? How do you separate questions about physical state from questions about utility functions? Perhaps, Audere says, one could bound the relative complexity of the utility-function representation versus the simulating perspective?

Also, how do you deal with modeling smaller, boundedly rational agents in an actual formalism? I can recognize that psychologizing is the right perspective for modeling a cat that is failing to walk around a glass wall to get the food on the other side and is instead meowing sadly at the wall, but how do I formalize that? It seems like the Discovering Agents paper still has a lot to tell us about how to do this.

comment by the gears to ascension (lahwran) · 2023-04-03T21:25:23.985Z · LW(p) · GW(p)

Still on the call - Audere was saying this builds on Kosoy's definition by trying to patch a hole; I am not quite keeping track of which thing is being patched

comment by the gears to ascension (lahwran) · 2023-04-03T21:11:48.117Z · LW(p) · GW(p)

We were discussing this on a call and I was like "this is very interesting and more folks on LW should consider this perspective". It came up after a while of working through Discovering Agents, which is a very deep and precise causal-models read that takes a very specific perspective. The perspective in this post is an extension of

Agents and Devices: A Relative Definition of Agency

According to Dennett, the same system may be described using a 'physical' (mechanical) explanatory stance, or using an 'intentional' (belief- and goal-based) explanatory stance. Humans tend to find the physical stance more helpful for certain systems, such as planets orbiting a star, and the intentional stance for others, such as living animals. We define a formal counterpart of physical and intentional stances within computational theory: a description of a system as either a device, or an agent, with the key difference being that 'devices' are directly described in terms of an input-output mapping, while 'agents' are described in terms of the function they optimise. Bayes' rule can then be applied to calculate the subjective probability of a system being a device or an agent, based only on its behaviour. We illustrate this using the trajectories of an object in a toy grid-world domain.
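
The Bayes-rule step in that abstract can be sketched as a simple posterior update on the binary hypothesis "agent vs device." This is a minimal illustration, not the paper's actual likelihood models (which describe devices by input-output mappings and agents by the function they optimise); the likelihood numbers in the test usage are invented:

```python
def update_agent_posterior(prior_agent, likelihoods):
    """Sequentially apply Bayes' rule to P(agent) given observations.

    prior_agent: prior probability that the system is an agent.
    likelihoods: iterable of pairs (p_obs_given_agent, p_obs_given_device),
        one pair per observed behavior.
    """
    p_agent = prior_agent
    for p_a, p_d in likelihoods:
        numerator = p_agent * p_a
        denominator = numerator + (1.0 - p_agent) * p_d
        p_agent = numerator / denominator
    return p_agent
```

For instance, starting from a 50/50 prior, a behavior nine times likelier under the agent hypothesis moves the posterior to 0.9, and further such observations push it higher still.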

One of the key points @Audere [LW · GW] is arguing is that how precisely one can model a target depends on the amount of information one has about it: one needs to know more and more about a possible agent to do higher and higher levels of precise modeling. Very interesting. So a key concern we have is the threat from an agent that is able to do full simulation of other agents. If we could become unpredictable to potentially scary agents, we would be safe; but because we are made of mechanisms we cannot hide, we cannot stay unpredictable indefinitely.