Woods’ new preprint on object permanence
post by Steven Byrnes (steve2152) · 2024-03-07T21:29:57.738Z
Quick poorly-researched post, probably only of interest to neuroscientists.
The experiment
Justin Wood at Indiana University has, over many years with great effort, developed a system for raising baby chicks such that all the light hitting their retinas is experimentally controlled right from when they’re embryos—the chicks are incubated and hatched in darkness, then moved to a room with video screens, head-tracking and so on. For a much better description of how this works and how he got into this line of work, check out his recent appearance on the Brain Inspired podcast.
He and collaborators posted a new paper last week: “Object permanence in newborn chicks is robust against opposing evidence” by Wood, Ullman, Wood, Spelke, and Wood. I just read it today. It’s really cool!
In their paper, they are using the system above to study “object permanence”, the idea that things don’t disappear when they go out of sight behind an occluder. The headline result is that baby chicks continue to act as if object permanence is true, even if they have seen thousands of examples where it is false and zero where it is true over the course of their short lives.
They describe two main experiments. Experiment 1 is the warmup, and Experiment 2 is the headline result I just mentioned.
In experiment 1, the chicks are raised in a VR visual world where they never see anything occlude anything, ever. They only see one virtual object move around an otherwise-empty virtual room. The chicks of course imprint on the object. This phase lasts 4 days. Then we move into the test phase.
The test initializes when the chick moves towards the virtual object, which starts in the center of the room. Two virtual opaque screens appear on the sides of the room.
- In the easier variant of the test, the object moves behind one of the screens, and then nothing else happens for a few minutes. The experimenters measure which screen the chick spent more time standing near.[1] The result: all 8 chicks hung out more near the screen that the virtual object would be behind, than near the other screen, at least for the first 30 seconds or so after the object disappeared from view.
- In the harder variant, one of the screens moves to the object, occludes the object, then moves back to its starting point. Again, the experimenters measure which screen the chick spent more time nearby. Here, 7 of the 8 chicks hung out near the screen that the virtual object would be behind, at least for 15ish seconds.
Moving on to experiment 2, the test phase was the same as the easier variant above—the object moved to behind one of the two opaque virtual screens on the sides. But the preceding 4-day training phase was different for these chicks: instead of never seeing any occlusion events, they witnessed thousands of occlusion events, where the object would go behind a virtual opaque screen, and then after a variable amount of time (0-20 seconds), the screens would lower to reveal that the object was where we might expect (for the “natural world” chicks), or had magically teleported to behind the “wrong” screen (the “unnatural world” chicks). (There was no randomization—each chick lived its whole training-phase in either the natural or unnatural world.)
Remarkably, all four chicks in the “natural world” and all four chicks in the “unnatural world” spent more time near the screen that the object had disappeared behind, rather than the other one, at least for the first 15-30 seconds. In fact, there was no difference between the natural-world and unnatural-world chicks!
How do we make sense of these results?
It’s always worth asking: maybe the experiment is garbage? I’m far from an expert, but the methodology and statistics seem fine as far as I can tell. The effect sizes are big—you can see it at a glance in the reported data. I see no obvious confounders. The experimenters seem like they were scrupulously trying to be careful—for example, they discuss how they fed the chicks amorphous mounds of food in transparent containers, so as not to let them witness any object occlusions.
So I’m inclined to take the experimental results at face value. Moving on:
My regular readers have heard this spiel many times before, but I think we can roughly divide the brain into:
- Parts that are learning algorithms and which “learn from scratch” (locally-random initialization) [LW · GW] including the cortex, hippocampus, striatum, cerebellum, amygdala, and some other odds and ends;
- Parts that are not learning algorithms in the first place [LW · GW] including mainly the brainstem and hypothalamus.
So, in my head, there are basically two ways to explain the Wood, Ullman, Wood, Spelke, Wood results.
The first centers around the first bullet point, and talks about inductive biases of brain learning algorithms. This is the main explanation mentioned by the authors. It’s definitely possible. It would be interesting, though not impossible as far as I know, that thousands of datapoints over four days were insufficient to overrule these inductive biases. Actually, I kinda have no idea whether I should think of “thousands of datapoints over four days” as being “a lot” or “a little”.
A variant of that explanation involves not inductive biases per se but rather that the learning algorithms are pretrained when the chicks are still in their eggs, using a kind of prenatal “synthetic training data” called “retinal waves”. The authors mention this one too.
But I also want to bring up a second possible category of explanation, namely that the story centers around innate algorithms built into the hypothalamus and brainstem, which are not learning algorithms at all—so terms like “inductive bias” and “pretraining” would be the wrong way to think about them.
The most famous brainstem thing in the vicinity of this paper is the “orienting reflex”. If you hear an unexpected sound, see an unexpected flash, feel an unexpected touch, etc., then you will tend to move your eyes, head, neck, and sometimes body to investigate it. This is a brainstem reflex, centrally involving the superior colliculus.[2]
More specifically, I think the superficial (shallow) layers of the superior colliculus have a retinotopic map (ref), i.e. each different part of the superior colliculus gets light from a different part of the visual field (along with partially-processed sound, somatosensory, and other inputs), and then figures out whether anything is happening here that warrants an orienting reflex, and then actually executes that reflex if appropriate (by moving in just the right way to bring that area into the center of the visual field).
Now we’re getting more speculative / ill-informed, but I think this system is more sophisticated than just what I said above.
For one thing, if there’s weak evidence that some part of the visual field is worth orienting towards, then there are (supposedly) superior colliculus cells that gradually accumulate that Bayesian evidence and trigger an orienting reflex when the cumulative strength-of-evidence is strong enough (ref).[3]
It would be cool if that accumulated evidence was also appropriately shifted around the surface of the superior colliculus when your eyes move,[4] so that the evidence can continue to accumulate instead of restarting from complete uncertainty every time your eyes move at all. In a brief search, I couldn’t easily confirm or disconfirm whether the superior colliculus is capable of doing that. But if it does, then the superior colliculus would have a semi-stable map of “how much important stuff seems to be happening in each different part of my egocentric world”.
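To make that concrete, here’s a minimal toy sketch of the kind of accumulate-to-threshold computation I have in mind, with a `remap` step standing in for the hypothesized shift when the eyes move. Everything in it (the 1-D map, the threshold, the reset rule) is a made-up illustration of the idea, not a claim about the actual circuit:

```python
import numpy as np

class OrientingAccumulator:
    """Toy accumulate-to-threshold model: each location in a 1-D retinotopic
    map accumulates log-odds evidence that something there is worth orienting
    towards, and an orienting reflex fires when any location crosses threshold.
    (Real maps are 2-D; this is purely illustrative.)"""

    def __init__(self, n_locations=20, threshold=3.0):
        self.log_odds = np.zeros(n_locations)  # start at the prior (log-odds 0)
        self.threshold = threshold

    def step(self, evidence):
        """`evidence` = per-location log-likelihood ratios for this timestep."""
        self.log_odds += evidence
        best = int(np.argmax(self.log_odds))
        if self.log_odds[best] > self.threshold:
            self.log_odds[:] = 0.0   # reset after the reflex fires
            return best              # orient towards this location
        return None                  # keep accumulating

    def remap(self, eye_shift):
        """Hypothesized step: when the eyes move by `eye_shift` locations,
        shift the accumulated evidence so it stays aligned with the world
        instead of restarting from complete uncertainty. Locations that
        shift in from outside the map start back at the prior."""
        shifted = np.zeros_like(self.log_odds)
        n = len(shifted)
        if eye_shift >= 0:
            shifted[:n - eye_shift] = self.log_odds[eye_shift:]
        else:
            shifted[-eye_shift:] = self.log_odds[:n + eye_shift]
        self.log_odds = shifted
```

The point of the `remap` step is that weak evidence at one spot can keep building up across eye movements, instead of being thrown away each time the map’s alignment with the world changes.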
Supplementing that picture is this paper which (the way I interpreted it) says that the rodent superior colliculus is always secretly keeping track of the answer to the question “if I suddenly have to scamper away from a threat, which direction in the world should I scamper towards?” (Darkness and shelter are good, bright open fields are bad, if you’re a rodent fleeing from birds-of-prey.) That’s a hint that maybe this map-of-the-egocentric-world is tracking more than just the single parameter “this direction does or doesn’t merit an orienting reflex”.
If something like that above picture is true, then in the experiment above, it would be pretty natural to expect the superior colliculus to be flagging that the imprinted object (or maybe just “something important” or “something I want to go to”) might still be in the part of the visual world where it was last seen. And that’s basically my hypothesis / hunch for the Wood et al. results.
If that’s true, then we wouldn’t describe that as an inductive bias, or pretraining—again, this isn’t a learning algorithm. Instead, we would say “there’s an innate brainstem algorithm that was designed by evolution to work in a world in which object permanence is valid”.
[UPDATE: Spelling it out for clarity, this alleged superior colliculus algorithm might work something like: (1) At each point of retinotopic space, there are a couple parameters that we can think of as “credence that there’s something worth looking at here” and “credence that there’s something here that I want to approach”. (2) Continually do Bayesian updating of all those parameters based on fresh incoming visual data, as processed in other parts of the superior colliculus. (3) If there’s optical flow, then shift these parameters around to follow that flow. (4) Use these parameters for approach and orienting decisions. This algorithm only works well, i.e. only leads to evolutionarily-sensible behavior, if “things that are worth looking at” and “things that are worth approaching” actually follow the local optical flow, as opposed to teleporting. Note that this algorithm is not a within-lifetime learning algorithm—but of course meanwhile there are vision-related learning algorithms in other parts of the brain, and also, if we get into more detailed implementation, then there are some dynamic adjustable parameters involved (see §2.3.3 here [LW · GW]).]
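Here is the same hypothesis as a toy code sketch of steps (1)-(4), again purely my own illustration under made-up simplifications (a small square grid, a single whole-field flow vector, arbitrary thresholds), not a claim about how the real circuit is implemented:

```python
import numpy as np

def shift_grid(grid, d_row, d_col):
    """Move every cell of `grid` by (d_row, d_col); cells that move off the
    edge are dropped, and newly exposed cells revert to the prior (zero)."""
    shifted = np.zeros_like(grid)
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                shifted[r2, c2] = grid[r, c]
    return shifted

class EgocentricSalienceMap:
    """Toy version of the hypothesized algorithm: a retinotopic grid where
    each cell tracks two log-odds credences, "something worth looking at
    here" and "something here that I want to approach"."""

    def __init__(self, shape=(32, 32)):
        # Step (1): a couple of parameters at each point of retinotopic space.
        self.look = np.zeros(shape)       # log-odds: worth looking at
        self.approach = np.zeros(shape)   # log-odds: worth approaching

    def bayes_update(self, look_llr, approach_llr):
        """Step (2): Bayesian updating from fresh visual evidence, supplied
        here as per-cell log-likelihood ratios (in reality these would be
        computed by other parts of the superior colliculus)."""
        self.look += look_llr
        self.approach += approach_llr

    def follow_flow(self, d_row, d_col):
        """Step (3): shift both credence maps to follow the optical flow.
        (Simplification: one whole-field flow vector; real flow varies
        across the image.)"""
        self.look = shift_grid(self.look, d_row, d_col)
        self.approach = shift_grid(self.approach, d_row, d_col)

    def decide(self, threshold=3.0):
        """Step (4): use the credences for orienting / approach decisions.
        Returns the best candidate location for each, or None if nothing
        clears the threshold."""
        decisions = {}
        for name, grid in (("orient", self.look), ("approach", self.approach)):
            idx = np.unravel_index(np.argmax(grid), grid.shape)
            decisions[name] = idx if grid[idx] > threshold else None
        return decisions
```

On this sketch, once the imprinted object slides behind a screen, nothing in the later evidence or flow moves the “approach” credence anywhere else, so the map keeps pointing at the cells behind that screen, no matter how many teleportation events the chick has previously witnessed, because there is no within-lifetime learning in the loop at all.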
As a footnote, more broadly, I think people tend to widely underemphasize the role of the hypothalamus and brainstem in navigating and tracking surroundings. For example, the dorsal & ventral tegmental nuclei of Gudden, and the lateral & medial mammillary nuclei, are all intimately involved in tracking head direction and navigating the world, and AFAICT nobody knows exactly what calculations any of those regions are doing or how they’re doing it. (I even have a heretical hunch that the famous “grid cells” in the entorhinal cortex might be downstream of a more ancient path-integration / dead-reckoning system involving those brainstem and hypothalamus regions.) Relatedly, researchers have sometimes surgically removed the entire cortex of rats, and these “decorticate rats” can get around their environment reasonably well, including the ability to swim towards and climb onto visible platforms in the water. I think they also do basic things like “not walk into walls”. (They do sometimes get helplessly stuck in narrow passageways, and also, they fail the swim test when the platform is just below the water surface, such that its location needs to be triangulated from distant visual landmarks, if I recall correctly.) (ref)
As mentioned at the top, this is a quick sloppy blog post. Please comment below, or email me, if you have any thoughts!
- [1] UPDATE MARCH 8: When I first published this, I talked in several places about the chicks “looking towards” the left or right screen. Oops, that was an error. The measurement was actually about which screen the chicks spent more time standing near (see filial imprinting). Thanks to Justin Wood for the emailed correction.
- [2] I’m using the mammal term “superior colliculus” instead of the equivalent bird term “optic tectum” because most of the literature I’m citing involves rodent experiments. I’m doing that based on my impression that the rodent superior colliculus is doing generally similar things in similar ways as the chicken optic tectum, despite the 300M years that have elapsed since their last common ancestor. But I haven’t checked that in detail, and someone should tell me if I’m wrong.
- [3] UPDATE MARCH 11: Just to clarify, that reference involves a so-called “diffusion model”, whereas I’m describing it in the text as tracking credences and doing Bayesian updates. But that’s OK because this paper (which I admittedly didn’t check in detail) says that those are two ways to talk about the same algorithm (see the note just after these footnotes).
- [4] If I were an Intelligent Designer, I would shift things around based on local optical flow. But I don’t know if the superior colliculus can do that or not.
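Regarding footnote 3, here is one standard way to see the correspondence (my own gloss, under the usual assumption that the evidence samples are independent given the hypothesis): sequential Bayesian updating of the log-odds between “something worth orienting towards is here” ($H_1$) and “nothing is here” ($H_0$) gives

$$\log\frac{P(H_1\mid x_{1:t})}{P(H_0\mid x_{1:t})} \;=\; \log\frac{P(H_1)}{P(H_0)} \;+\; \sum_{i=1}^{t}\log\frac{P(x_i\mid H_1)}{P(x_i\mid H_0)},$$

i.e. the accumulated credence is a running sum of increments whose average drift depends on which hypothesis is true, and triggering the orienting reflex when that sum crosses a bound is exactly the decision rule of a (discrete-time) drift-diffusion model.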
Comments
comment by Mikhail Samin (mikhail-samin) · 2024-03-08T04:24:46.568Z
(I read the experiments and only skimmed through the rest.) I feel fairly confident I would’ve predicted the results of the first experiment, despite the possibility of hindsight bias; I predicted what I would see before reading the results of the second one (though the results were in my visual field). I think object permanence and movement are much more important than appearance after being occluded. I.e., you might expect the object to be somewhere, you might have your eyes follow an object, and when it’s not where it should be, you get some error, but you still look there. I feel less certain about what happens if you never see objects moving; following things with your gaze is probably not hardwired in the absence of any data; but if you see a lot of moving objects, I think you look where you expect an object to be, even if it’s not there.
An experiment that I’d like to see would be:
Object A moves behind screen 1; object B moves from screen 1 to behind screen 2; the chick is only interested in object A; where does it look? My prediction (feels obvious!): it will look at screen 2 more than if there were no object B.