"Decision Transformer" (Tool AIs are secret Agent AIs)

post by gwern · 2021-06-09T01:06:57.937Z · 4 comments

This is a link post for https://sites.google.com/berkeley.edu/decision-transformer

4 comments


comment by John Schulman (john-schulman) · 2021-06-09T15:46:36.485Z

Basically agree. I think that a model trained by maximum likelihood on offline data is less goal-directed than one trained by an iterative process that reinforces its own samples (aka online RL), but it is still somewhat goal-directed: to do a good job at maximum likelihood, it needs to simulate a goal-directed agent. OTOH it's mostly concerned with covering all possibilities, so the goal-directed reasoning isn't emphasized. But with multiple iterations, the model can improve quality (and with it, goal-directedness) at the expense of coverage/diversity.
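The quality-versus-coverage trade-off described above can be illustrated with a toy model. This sketch is not from the comment or the paper; it assumes a hypothetical four-output task with fixed rewards and hypothetical offline demonstration counts. It fits a categorical policy by maximum likelihood, then stands in for online RL with repeated reward-weighted refitting on the model's own samples, computed in the infinite-sample limit as the update p(a) ∝ p(a)·exp(β·r(a)). Entropy serves as a proxy for coverage and expected reward as a proxy for quality.

```python
import math

# Hypothetical toy task: four possible outputs with fixed rewards.
REWARDS = [0.0, 1.0, 2.0, 3.0]


def mle_fit(counts):
    """Maximum-likelihood fit of a categorical policy: the empirical frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy(p):
    """Shannon entropy in nats, used here as a proxy for coverage/diversity."""
    return -sum(x * math.log(x) for x in p if x > 0)


def expected_reward(p):
    """Mean reward of outputs sampled from p, used here as a proxy for quality."""
    return sum(x * r for x, r in zip(p, REWARDS))


def reinforce_own_samples(p, beta=1.0):
    """One round of: sample from p, weight each sample by exp(beta * reward),
    refit by weighted maximum likelihood. In the infinite-sample limit this is
    p_new(a) proportional to p(a) * exp(beta * r(a))."""
    weighted = [x * math.exp(beta * r) for x, r in zip(p, REWARDS)]
    z = sum(weighted)
    return [w / z for w in weighted]


# Offline data from a mixed-quality demonstrator (hypothetical counts).
policy = mle_fit([10, 20, 30, 40])
history = [(entropy(policy), expected_reward(policy))]
for _ in range(4):
    policy = reinforce_own_samples(policy)
    history.append((entropy(policy), expected_reward(policy)))

for step, (h, q) in enumerate(history):
    print(f"round {step}: entropy = {h:.3f} nats, expected reward = {q:.3f}")
```

Each round raises expected reward and lowers entropy. With these particular counts the entropy falls at every step because the offline data already favor higher-reward outputs; with data skewed toward low-reward outputs, entropy can briefly rise before it falls.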

comment by gwern · 2021-06-09T01:07:49.698Z

Rewards need not be written in natural language as crudely as "REWARD: +10 UTILONS". Something to think about as you continue to write text online.

And what of the dead? I own that I thought of myself, at times, almost as dead. Are they not locked below ground in chambers smaller than mine was, in their millions of millions? There is no category of human activity in which the dead do not outnumber the living many times over. Most beautiful children are dead. Most soldiers, most cowards. The fairest women and the most learned men – all are dead. Their bodies repose in caskets, in sarcophagi, beneath arches of rude stone, everywhere under the earth. Their spirits haunt our minds, ears pressed to the bones of our foreheads. Who can say how intently they listen as we speak, or for what word?
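The point about rewards is that the reward signal is simply part of the context the model conditions on. In the linked Decision Transformer, each timestep of a trajectory is preceded by its return-to-go (the sum of rewards from that step to the end of the episode), and at evaluation time the model is prompted with a desired return that is decremented by each reward actually received. The Python sketch below shows only that input layout, under simplifying assumptions: scalar rewards, placeholder strings for states and actions, and hypothetical function names. It leaves out the embeddings and the causal transformer itself.

```python
from typing import Any, List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """R_hat_t = r_t + r_{t+1} + ... + r_T, computed with one backward pass."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out


def to_sequence(states: List[Any], actions: List[Any],
                rewards: List[float]) -> List[Tuple[str, Any]]:
    """Interleave a trajectory as (R_hat_1, s_1, a_1, R_hat_2, s_2, a_2, ...)."""
    seq: List[Tuple[str, Any]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        seq.extend([("return_to_go", g), ("state", s), ("action", a)])
    return seq


def next_target(target: float, reward: float) -> float:
    """At evaluation time, start from a desired return and subtract each reward received."""
    return target - reward


# Usage with a made-up three-step episode.
seq = to_sequence(["s1", "s2", "s3"], ["a1", "a2", "a3"], [1.0, 2.0, 3.0])
print(seq[:3])
```

Prompting with a high initial target asks the model for the actions that, in its training data, tended to precede high returns.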

comment by evhub · 2021-06-09T02:03:35.965Z

(Moderation note: added to the Alignment Forum from LessWrong.)

comment by mtaran · 2021-06-10T05:36:06.650Z

Nice video reviewing this paper at https://youtu.be/-buULmf7dec

In my experience it's reasonably easy to listen to such videos while doing chores etc.