Does an LLM have a utility function?
post by Dagon · 2022-12-09T17:19:45.936Z · LW · GW · 2 comments
This is a question post.
Contents
Answers: cfoster0 (18) · Loppukilpailija (10) · rachelAF (4)
2 comments
There's a lot of discussion and research into AI alignment, almost always about variants of how to define/create a utility function (or meta-function, if it changes over time) that is actually aligned with ... something. That something is at least humanity's survival, but often something like flourishing or some other semi-abstract goal. Oops, that's not my question for today.
My question for today is whether utility functions are actually part of the solution at all. Humans don't have them; the most interesting spurs toward AI don't have them. Maybe anything complicated enough to be called AGI doesn't have one (or at least doesn't have a simple, concrete, consistent one).
Answers
It may be better to ask "Is a utility function a useful abstraction to describe how X makes decisions?" (Does it allow you to compress your description of X's decisions [? · GW]?) Recall that utility functions are just a representation derived from preferences that are structured in a particular way. But not all ways of deciding on a preferred outcome are structured in that way[1], and not all decision algorithms work by preferring outcomes, so thinking in terms of utility functions is not always helpful.
[1] See for example:
Aumann, R. J. (1962). Utility theory without the completeness axiom. Econometrica: Journal of the Econometric Society, 445–462.
Bewley, T. F. (2002). Knightian decision theory. Part I. Decisions in Economics and Finance, 25(2), 79–110.
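To make the representation point concrete, here is a minimal sketch (all function and variable names are illustrative, not from any library): complete, acyclic strict preferences over a finite set can be compressed into a utility function, while cyclic or incomplete ones cannot.

```python
from itertools import combinations

def has_utility_representation(outcomes, prefers):
    """prefers(a, b) -> True if a is strictly preferred to b, False otherwise,
    None if the agent cannot compare them (incompleteness)."""
    # Completeness: every pair must be comparable.
    for a, b in combinations(outcomes, 2):
        if prefers(a, b) is None:
            return False
    # Acyclicity: a strict preference cycle rules out any real-valued utility.
    better = {o: {x for x in outcomes if prefers(o, x)} for o in outcomes}
    def reachable(start, target, seen=()):
        return any(n == target or (n not in seen and reachable(n, target, seen + (n,)))
                   for n in better[start])
    return not any(reachable(o, o) for o in outcomes)

# Transitive, complete preferences: representable by a utility function.
order = {"A": 3, "B": 2, "C": 1}
assert has_utility_representation("ABC", lambda a, b: order[a] > order[b])

# Cyclic preferences (A > B > C > A): no utility function can represent them,
# even though the agent still "decides" between any two outcomes.
cycle = {("A", "B"), ("B", "C"), ("C", "A")}
assert not has_utility_representation("ABC", lambda a, b: (a, b) in cycle)
```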
↑ comment by TAG · 2022-12-10T14:59:35.455Z · LW(p) · GW(p)
Even if it's a useful abstraction, it's only an abstraction. You can't make an AI safe by changing its UF unless its UF is a distinct component at the engineering level, not just an abstraction.
↑ comment by Dagon · 2022-12-10T15:43:03.897Z · LW(p) · GW(p)
And you can't determine whether it's safe by examining or understanding its utility function, if the abstraction is so loose as to not be align-able.
↑ comment by Dan (dan-4) · 2023-01-15T15:53:09.638Z · LW(p) · GW(p)
It's not really an abstraction at all in this case; it literally has a utility function. What rates highest on its utility function is returning whatever token is 'most likely' given its training data.
I found janus's post Simulators [LW · GW] to address this question very well. Much of AGI discussion revolves around agentic AIs (see the section Agentic GPT for discussion of this), but this does not model large language models very well. janus suggests that one should instead think of LLMs such as GPT-3 as "simulators". Simulators are not very agentic themselves or well described as having a utility function, though they may create simulacra that are agentic (e.g. GPT-3 writes a story where the main character is agentic).
↑ comment by janus · 2022-12-10T03:45:07.556Z · LW(p) · GW(p)
A relevant passage from Simulators:
We can specify some types of outer objectives using a ground truth distribution that we cannot with a utility function. As in the case of GPT, there is no difficulty in incentivizing a model to predict actions that are corrigible, incoherent, stochastic, irrational, or otherwise anti-natural to expected utility maximization. All you need is evidence of a distribution exhibiting these properties.
For instance, during GPT’s training, sometimes predicting the next token coincides with predicting agentic behavior, but:
- The actions of agents described in the data are rarely optimal for their goals; humans, for instance, are computationally bounded, irrational, normative, habitual, fickle, hallucinatory, etc.
- Different prediction steps involve mutually incoherent goals, as human text records a wide range of differently-motivated agentic behavior
- Many prediction steps don’t correspond to the action of any consequentialist agent but are better described as reporting on the structure of reality, e.g. the year in a timestamp. These transitions incentivize GPT to improve its model of the world, orthogonally to agentic objectives.
- When there is insufficient information to predict the next token with certainty, log-loss incentivizes a probabilistic output. Utility maximizers aren’t supposed to become more stochastic [LW · GW] in response to uncertainty.
Everything can be trivially modeled as a utility maximizer, but for these reasons, a utility function is not a good explanation or compression of GPT’s training data, and its optimal predictor is not well-described as a utility maximizer. However, just because information isn’t compressed well by a utility function doesn’t mean it can’t be compressed another way. The Mandelbrot set is a complicated pattern compressed by a very simple generative algorithm which makes no reference to future consequences and doesn’t involve argmaxxing anything (except vacuously being the way it is [LW · GW]). Likewise the set of all possible rollouts of Conway’s Game of Life – some automata may be well-described as agents [LW · GW], but they are a minority of possible patterns, and not all agentic automata will share a goal. Imagine trying to model Game of Life as an expected utility maximizer!
This makes the same point as cfoster0's comment on this post, and adds that self-supervised learning is a method of AI specification that does not require "choosing a utility function", even implicitly, since the resulting policy won't necessarily be well-described as a utility maximizer at all.
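The log-loss point from the quoted passage can be checked numerically. A toy sketch (made-up probabilities, not GPT's actual training distribution): expected log-loss is minimized by reporting the data's true probabilities, not by piling all the mass on the single most likely token.

```python
import math

# Suppose the data says the next token is "cat" 70% of the time, "dog" 30%.
true_p = {"cat": 0.7, "dog": 0.3}

def expected_log_loss(pred):
    # Expected negative log-likelihood of the prediction under the true distribution.
    return -sum(p * math.log(pred[tok]) for tok, p in true_p.items())

calibrated = expected_log_loss({"cat": 0.7, "dog": 0.3})   # match the data
confident = expected_log_loss({"cat": 0.99, "dog": 0.01})  # near-argmax answer

# The stochastic, calibrated predictor scores strictly better.
assert calibrated < confident
```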
↑ comment by Dan (dan-4) · 2023-01-15T15:05:05.452Z · LW(p) · GW(p)
I'm going to disagree here.
Its utility function is pretty simple and explicitly programmed. It wants to find the best token, where 'best' is mostly the same as 'the most likely according to the data I'm trained on', with a few other particulars (you can adjust how 'creative' vs. derivative it should be).
That's a utility function. GPT is what's called a hill-climbing algorithm. It must have a simple, straightforward utility function hard-coded right in there for it to assess whether a given choice is 'climbing' or not.
↑ comment by Rafael Harth (sil-ver) · 2023-01-15T15:14:33.697Z · LW(p) · GW(p)
That's the training signal, not the utility function. Those are different things. (I believe this point was made in Reward is not the Optimization Target [LW · GW], though I could be wrong since I never actually read this post; corrections welcome.)
I think that the significant distinction is whether an AI system has a utility function that it is attempting to optimize at test time. An LLM does have a utility function, in that there is an objective function written in its training code that it uses to calculate gradients and update its parameters during training. However, once it is deployed, its parameters are frozen and its score on this objective function can no longer impact its behavior. In that sense, I don't think it makes sense to think of an LLM as "trying to" optimize this objective after deployment. However, this answer could change in response to changes in model training strategy, which is why this distinction is significant.
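That training/deployment distinction can be sketched with a hypothetical one-parameter "model" (not a real LLM; all names are illustrative): the objective function appears only inside the training loop, while at deployment the frozen parameter is simply sampled from and no loss is ever computed.

```python
import math
import random

random.seed(0)

# Toy "model": a single logit theta deciding P(next token = "a") via a sigmoid.
theta = 0.0
def p_a(t):
    return 1.0 / (1.0 + math.exp(-t))

data = ["a", "a", "b", "a"]  # "a" appears 75% of the time

# Training: the objective (negative log-likelihood) drives updates to theta.
for _ in range(2000):
    tok = random.choice(data)
    p = p_a(theta)
    grad = (p - 1.0) if tok == "a" else p  # d(NLL)/d(theta) for a Bernoulli-sigmoid model
    theta -= 0.1 * grad

# Deployment: theta is frozen; we only sample. The loss is never evaluated here,
# so nothing "pushes" the model toward it anymore.
def generate():
    return "a" if random.random() < p_a(theta) else "b"

sample = [generate() for _ in range(1000)]
assert 0.6 < sample.count("a") / 1000 < 0.9  # output roughly matches the data
```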
2 comments
Comments sorted by top scores.
comment by Dan (dan-4) · 2023-01-15T15:48:17.312Z · LW(p) · GW(p)
Yes, it wants to find the best next token, where 'best' is 'the most likely'.
That's a utility function. Its utility function is a line of code necessary for training; otherwise nothing would happen when you tried to train it.
comment by Dan (dan-4) · 2023-01-15T14:53:47.189Z · LW(p) · GW(p)
A utility function is the assessment by which you decide how much an action would further your goals. If you can do that, highly accurately or not, you have a utility function.
If you had no utility function, you might decide you like NYC more than Kansas, and Kansas more than Nigeria, yet prefer Nigeria to NYC. So you'd get on a plane and fly in circles forever, hopping on a new plane every time you reached your destination.
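That preference cycle can be simulated directly (city names from the comment; the code is purely illustrative):

```python
# Cyclic preferences: NYC > Kansas, Kansas > Nigeria, Nigeria > NYC.
# There is always a strictly "better" city to fly to, so the agent never
# settles -- the classic money pump.
better_than = {"Kansas": "NYC", "Nigeria": "Kansas", "NYC": "Nigeria"}

def travel(start, steps):
    itinerary = [start]
    for _ in range(steps):
        itinerary.append(better_than[itinerary[-1]])
    return itinerary

trip = travel("Kansas", 6)
assert trip[0] == trip[3] == trip[6]  # period-3 loop: it never converges

# With transitive preferences there IS a utility function, and the same
# "move somewhere better" rule terminates at the most preferred city.
utility = {"Nigeria": 1, "Kansas": 2, "NYC": 3}

def settle(start):
    city = start
    while any(utility[c] > utility[city] for c in utility):
        city = max(utility, key=utility.get)
    return city

assert settle("Nigeria") == "NYC"
```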
Humans definitely have a utility function. We just don't know what ranks very highly on it. We mostly agree on the low-ranking stuff. A utility function is the process by which you rate potential futures that you might be able to bring about and decide you prefer some futures to others.
With a utility function plus your (limited) predictive ability, you rate potential futures as better, worse, or equal to each other, and act accordingly.
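That picture can be sketched as a few lines of code (utilities, action names, and probabilities are all made up for illustration): a utility function over futures, combined with a limited predictive model giving a distribution over futures for each action, ranks actions by expected utility.

```python
# Hypothetical utilities over possible futures.
utility = {"flourish": 10, "muddle": 3, "disaster": -50}

# Hypothetical predicted distributions P(future | action) from a limited predictor.
prediction = {
    "act_cautiously": {"flourish": 0.3, "muddle": 0.6, "disaster": 0.1},
    "act_boldly":     {"flourish": 0.6, "muddle": 0.1, "disaster": 0.3},
}

def expected_utility(action):
    return sum(p * utility[future] for future, p in prediction[action].items())

# "Act accordingly": pick the action whose predicted futures rate highest.
best = max(prediction, key=expected_utility)
assert best == "act_cautiously"  # here the bold play risks too much disaster
```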