Describing humans with a "utility function" or in terms of "goals" is wrong.
Humans are a bundle of habits (like CFAR TAPs) that correlate somewhat with working towards goals, but describing them in goal terms is more an imperfect rationalization than a reasonable or natural description of the situation.
Yes, we do have some part that thinks in goals, but it has far less effect on anything (like our actions) than we'd naturally assume.
Credit to a friend.
[I have no idea what I'm talking about; feel free to ignore if this doesn't resonate, of course. It just seemed worth a comment.]
Some people here inspire me to make predictions ;) So here's my attempt.
My guess, based mainly on this image (linked from the post), is that he'd say it's a subcategory of "getting models to output things based only on their training data, while treating them as a black box and still assuming unexpected outputs will happen sometimes", as well as:

- "this might work well for training, but obviously not for an AGI"
- "if we're going to talk about limiting a model's output, Redwood Research is more of a way to go"
- and perhaps "this will just advance AI faster"