Shouldn't we 'Just' Superimitate Low-Res Uploads?

post by lukemarks (marc/er) · 2023-11-03T07:42:06.824Z · LW · GW · 1 comment

This is a question post.

Contents

  Answers
    2 Bogdan Ionut Cirstea
1 comment

If you have three optimizers that are all very different (one is a neural network, another was built by hand, and the third materialized through a Boltzmann Brain-like process) but have identical preferences and an equivalent capacity for optimization, they probably end up doing similar things over a long enough time-scale.
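As a toy illustration of that intuition (a sketch under strong assumptions, not an argument in itself): the three hypothetical optimizers below share a single-peaked utility function, which I made up for the example, but search for its maximum in very different ways. Given enough steps, all three settle near the same point.

```python
import random


def utility(x: float) -> float:
    """Shared preference (an assumption of this toy): one peak at x = 3."""
    return -(x - 3.0) ** 2


def gradient_ascent(steps: int = 200, lr: float = 0.1) -> float:
    """Optimizer 1: follows the analytic gradient of the utility."""
    x = -10.0
    for _ in range(steps):
        x += lr * (-2.0 * (x - 3.0))
    return x


def random_search(steps: int = 20000, seed: int = 0) -> float:
    """Optimizer 2: samples blindly and keeps the best point seen so far."""
    rng = random.Random(seed)
    best = rng.uniform(-10.0, 10.0)
    for _ in range(steps):
        candidate = rng.uniform(-10.0, 10.0)
        if utility(candidate) > utility(best):
            best = candidate
    return best


def hill_climb(steps: int = 5000, seed: int = 1) -> float:
    """Optimizer 3: proposes small random moves and accepts only improvements."""
    rng = random.Random(seed)
    x = 10.0
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, 0.5)
        if utility(candidate) > utility(x):
            x = candidate
    return x


results = {
    "gradient_ascent": gradient_ascent(),
    "random_search": random_search(),
    "hill_climb": hill_climb(),
}
for name, x in results.items():
    print(f"{name}: x = {x:.3f}")
```

The toy leans on the utility being simple and fully known to every optimizer, which is exactly the part that is hard in the human case.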

I bring this up because, in discussions of uploading, people seem to gravitate toward obtaining a digital encoding of the entire mind, as though the value of the upload lies in the fact that it optimizes in a human-like way. It seems like what we actually want is something that optimizes in any way at all, so long as it is optimizing for the things we want, in all their complexity.

In Does Davidad's Uploading Moonshot Work? [LW · GW], @jacobjacob [LW · GW] brings up the following:

What about the idea of "just" training a giant transformer to, instead of predicting next tokens in natural language, predicting neural activity?

@lisathiergart [LW · GW] replies by saying that this probably violates the assumption that the product would be more aligned than the status quo. I raise the following objections:


Vanessa Kosoy refers to physicalist superimitation as one potential endpoint of the Learning Theoretic Agenda [LW · GW]. She describes superimitation as follows:

An agent (henceforth: the "imitator") that receives the policy of another agent (henceforth: the "original"), and produces behavior which pursues the same goals but significantly better.
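To make the definition concrete, here is a toy sketch. It is not Kosoy's formal construction: the 1-D world, the hand-placed mistakes in the original's policy, and the "pick the goal most states agree with" inference rule are all assumptions of mine. The imitator receives only the original's policy table, guesses the goal it is aiming at, and then pursues that goal without the original's mistakes.

```python
N_STATES = 21    # positions 0..20 on a line (hypothetical toy world)
TRUE_GOAL = 14   # known to the original, never shown to the imitator


def ideal_action(goal: int, state: int) -> int:
    """Step toward the goal: +1, -1, or 0 when already there."""
    return (goal > state) - (goal < state)


def make_original_policy() -> dict:
    """The original heads for TRUE_GOAL but errs at a few hand-picked states."""
    policy = {s: ideal_action(TRUE_GOAL, s) for s in range(N_STATES)}
    for stuck_state in (3, 9, 17):
        policy[stuck_state] = 0   # stalls instead of moving
    policy[6] = -1                # moves the wrong way
    return policy


def infer_goal(policy: dict) -> int:
    """Pick the goal whose ideal policy agrees with the observed one most often."""
    def agreement(goal: int) -> int:
        return sum(policy[s] == ideal_action(goal, s) for s in range(N_STATES))
    return max(range(N_STATES), key=agreement)


def rollout(policy: dict, start: int, steps: int = 50) -> int:
    """Follow a policy from `start` and return the final position."""
    state = start
    for _ in range(steps):
        state = min(N_STATES - 1, max(0, state + policy[state]))
    return state


original = make_original_policy()
inferred_goal = infer_goal(original)
imitator = {s: ideal_action(inferred_goal, s) for s in range(N_STATES)}

print("inferred goal:", inferred_goal)
print("original ends at:", rollout(original, start=0))
print("imitator ends at:", rollout(imitator, start=0))
```

The toy only works because the original's goal is a single well-defined target and its errors are sparse; nothing here shows that the same inference is well-posed for human data.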

If you can superimitate an upload, or otherwise optimize for its preferences, maybe simple approaches (like predicting the next MEG reading) are sufficient, or at least comparable to training a full upload and leveraging it, while being significantly easier?
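For a sense of what "predicting the next reading" means mechanically, here is a deliberately tiny sketch. It swaps the giant transformer for a two-coefficient linear autoregressive model, and the "recording" is a synthetic sinusoid I made up as a stand-in for one sensor channel. Real MEG data is noisy and multi-channel, so this shows only the shape of the objective, not its difficulty.

```python
import math


def synthetic_signal(n: int = 400, freq: float = 0.05) -> list:
    """Hypothetical stand-in for one sensor channel: a pure sinusoid."""
    return [math.sin(2.0 * math.pi * freq * t) for t in range(n)]


def fit_ar2(x: list) -> tuple:
    """Least-squares fit of x[t] ~ a*x[t-1] + b*x[t-2] via 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(x)):
        p1, p2, y = x[t - 1], x[t - 2], x[t]
        s11 += p1 * p1
        s12 += p1 * p2
        s22 += p2 * p2
        r1 += p1 * y
        r2 += p2 * y
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    return a, b


def predict_next(a: float, b: float, prev1: float, prev2: float) -> float:
    """One-step-ahead prediction from the two most recent readings."""
    return a * prev1 + b * prev2


signal = synthetic_signal()
train, held_out = signal[:300], signal[300:]
a, b = fit_ar2(train)

# Mean absolute one-step error on readings the model never saw during fitting.
errors = [
    abs(predict_next(a, b, held_out[t - 1], held_out[t - 2]) - held_out[t])
    for t in range(2, len(held_out))
]
mean_abs_error = sum(errors) / len(errors)
print(f"a = {a:.4f}, b = {b:.4f}, held-out error = {mean_abs_error:.2e}")
```

A sinusoid obeys a two-term linear recurrence exactly, which is why such a small model suffices here. Whether low prediction error on real recordings captures the preferences you would want to superimitate is the open question.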

Answers

answer by Bogdan Ionut Cirstea · 2023-11-03T11:09:17.921Z · LW(p) · GW(p)

You might be interested in this AI Safety Camp '23 project I proposed on fine-tuning LMs on fMRI data, and in some of the linkposts I've published on LW, including e.g. The neuroconnectionist research programme [LW · GW], Scaling laws for language encoding models in fMRI [LW · GW] and Mapping Brains with Language Models: A Survey [LW · GW]. Personally, I'm particularly interested in low-res uploads for automated alignment research, e.g. to plug into something like the superalignment plan (I have some shortform [LW · GW] notes on this).

1 comment

Comments sorted by top scores.

comment by Vladimir_Nesov · 2023-11-03T11:34:14.656Z · LW(p) · GW(p)

Preference/optimization/reward are not well-defined for humans; attempts to shoehorn human data into this frame probably result in monstrous Goodharting. Uploading is more straightforwardly valuable where it produces exact uploads that can then be instantiated in vast quantities and with greater serial speed, with the hope that they can then figure out what to do next.