Posts

Jan's Shortform 2023-02-27T02:44:02.473Z
[Simulators seminar sequence] #2 Semiotic physics - revamped 2023-02-27T00:25:52.635Z
This week in fashion 2023-01-23T17:23:57.191Z
[Simulators seminar sequence] #1 Background & shared assumptions 2023-01-02T23:48:50.298Z
[Hebbian Natural Abstractions] Mathematical Foundations 2022-12-25T20:58:03.423Z
Results from a survey on tool use and workflows in alignment research 2022-12-19T15:19:52.560Z
[Hebbian Natural Abstractions] Introduction 2022-11-21T20:34:15.298Z
"Brain enthusiasts" in AI Safety 2022-06-18T09:59:04.149Z
A descriptive, not prescriptive, overview of current AI Alignment Research 2022-06-06T21:59:22.344Z
The Brain That Builds Itself 2022-05-31T09:42:57.028Z
Adversarial attacks and optimal control 2022-05-22T18:22:49.975Z
Elementary Infra-Bayesianism 2022-05-08T12:23:00.192Z
Continental Philosophy as Undergraduate Mathematics 2022-04-26T08:05:17.433Z
Pop Culture Alignment Research and Taxes 2022-04-16T15:45:26.831Z
A Brief Excursion Into Molecular Neuroscience 2022-04-10T17:55:10.174Z
Compute Governance: The Role of Commodity Hardware 2022-03-26T10:08:07.518Z
A survey of tool use and workflows in alignment research 2022-03-23T23:44:30.058Z
On Context And People 2022-03-19T23:38:36.701Z
Via productiva - my writing and productivity framework 2022-03-06T16:06:19.816Z
Trust-maximizing AGI 2022-02-25T15:13:14.241Z
Inferring utility functions from locally non-transitive preferences 2022-02-10T10:33:18.433Z
The Greedy Doctor Problem... turns out to be relevant to the ELK problem? 2022-01-14T11:58:05.107Z
The Unreasonable Feasibility Of Playing Chess Under The Influence 2022-01-12T23:09:57.679Z
Belief-conditional things - things that only exist when you believe in them 2021-12-25T10:49:38.203Z
On (Not) Reading Papers 2021-12-21T09:57:19.416Z
Slightly advanced decision theory 102: Four reasons not to be a (naive) utility maximizer 2021-11-23T11:02:38.256Z
The Greedy Doctor Problem 2021-11-16T22:06:15.724Z
Drug addicts and deceptively aligned agents - a comparative analysis 2021-11-05T21:42:48.993Z
Frankfurt Declaration on the Cambridge Declaration on Consciousness 2021-10-24T09:54:13.127Z
Applied Mathematical Logic For The Practicing Researcher 2021-10-17T20:28:10.211Z
How to build a mind - neuroscience edition 2021-10-03T21:45:59.629Z
Cognitive Biases in Large Language Models 2021-09-25T20:59:51.755Z
Soldiers, Scouts, and Albatrosses. 2021-09-12T10:36:32.719Z
Frankfurt, Germany – ACX Meetups Everywhere 2021 2021-08-23T08:46:40.821Z

Comments

Comment by Jan (jan-2) on Jan's Shortform · 2023-02-27T02:44:02.904Z · LW · GW

Neuroscience and Natural Abstractions

Similarities in structure and function abound in biology; individual neurons that respond selectively to particular oriented stimuli exist in animals from drosophila (Strother et al. 2017) via pigeons (Li et al. 2007) and turtles (Ammermueller et al. 1995) to macaques (De Valois et al. 1982). The universality of major functional response classes suggests that the neural systems underlying information processing in biology might be highly stereotyped (Van Hooser, 2007, Scholl et al. 2013). In line with this hypothesis, a wide range of neural phenomena emerge as optimal solutions to their respective functional requirements (Poggio 1981, Wolf 2003, Todorov 2004, Gardner 2019). Intriguingly, recent studies on artificial neural networks that approach human-level performance reveal surprising similarities between the representations that emerge in artificial and biological brains (Kriegeskorte 2015, Yamins et al. 2016, Zhuang et al. 2020).

Despite the commonalities across different animal species, there is also substantial variability (Van Hooser, 2007). Prominent examples of functional neural structures that are present in some, but absent in other, animals include the orientation pinwheels in the primary visual cortex (Meng et al. 2012), synaptic clustering with respect to orientation selectivity (Kirchner et al. 2021), and the distinct three-layered cortex in reptiles (Tosches et al. 2018). These examples demonstrate that while general organizational principles might be universal, the details of how exactly and where in the brain the principles manifest are highly dependent on anatomical factors (Keil et al. 2012, Kirchner et al. 2021), genetic lineage (Tosches et al. 2018), and ecological factors (Roeth et al. 2021). Thus, the universality hypothesis as applied to biological systems does not imply perfect replication of a given feature across all instances of the system. Rather, it suggests that there are broad principles or abstractions underlying the function of cognitive systems that are conserved across different species and contexts.

Comment by Jan (jan-2) on [Simulators seminar sequence] #2 Semiotic physics - revamped · 2023-02-27T01:38:12.808Z · LW · GW

Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out sufficiently clearly what it was I wanted to demonstrate here and was also (as you correctly point out) wrong in the way it was stated. That wasted people's time and I feel pretty bad about it. Mea culpa.

I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regarding your points:

  1. The limit now shows up on both sides of the equation (as it should)! The dependence on  on the RHS does actually kind of drop away at some point, but I'm not showing that here. I'd previously just sloppily substituted "choose  as a large number" and then rewritten the proposition in the way indicated at the end of the Note for Proposition 2. That's the way these large deviation principles are typically used.
  2. Yeah, that should have been an  rather than a . Sorry, sloppy.
  3. True. Thinking more about it now, perhaps framing the proposition in terms of "bridges" was a confusing choice; if I revisit this post again (in a month or so 🤦‍♂️) I will work on cleaning that up. 
Comment by Jan (jan-2) on [Simulators seminar sequence] #2 Semiotic physics - revamped · 2023-02-27T00:59:04.872Z · LW · GW

Hmm, there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing for the position you are arguing for. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system  then  is the state, not the trajectory of states . In this situation, the dynamics of the system only depend on the current state (that's because it's a first-order system). When we move to higher-order systems, , then the state is still just , but the dynamics depend not only on the current state but also on the "direction from which we entered it". That's the first derivative (in a time-continuous system) or the previous state (in a time-discrete system).
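To spell out the geometric picture (generic notation, not necessarily the exact symbols from the post):

```latex
\begin{align}
  x_{t+1} &= f(x_t)
    && \text{first-order: the current state alone determines the dynamics} \\
  x_{t+1} &= f(x_t, x_{t-1})
    && \text{second-order: the ``direction from which we entered'' $x_t$ also matters} \\
  s_t &:= (x_t, x_{t-1}), \quad s_{t+1} = F(s_t)
    && \text{augmenting the state recovers a first-order system}
\end{align}
```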

At least I think that's what's going on. If someone makes a compelling argument that defuses my argument then I'm happy to concede!

Comment by Jan (jan-2) on [Simulators seminar sequence] #2 Semiotic physics - revamped · 2023-02-27T00:36:27.742Z · LW · GW

Thanks for pointing this out! This argument made it into the revised version. I think because of finite precision it's reasonable to assume that such an  always exists in practice (if we also assume that the probability gets rounded to something < 1).

Comment by Jan (jan-2) on [Simulators seminar sequence] #2 Semiotic physics - revamped · 2023-02-27T00:34:09.797Z · LW · GW

Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well - although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".

Comment by Jan (jan-2) on Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review) · 2023-01-30T00:37:46.421Z · LW · GW

Thanks for sharing your thoughts Shos! :)

Comment by Jan (jan-2) on This week in fashion · 2023-01-25T16:38:12.865Z · LW · GW

Hmmm, good point. I originally made that decision because loading the image from the server was actually kind of slow. But then I figured out asynchronicity, so I could totally change it... I'll see if I find some time later today to push an update! (to make an 'all vs all' mode in addition to the 'King of the hill')

Comment by Jan (jan-2) on This week in fashion · 2023-01-24T01:26:08.822Z · LW · GW

Hi Jennifer!

Awesome, thank you for the thoughtful comment! The links are super interesting, reminds me of some of the research in empirical aesthetics I read forever ago.

On the topic of circular preferences: It turns out that the type of reward model I am training here handles non-transitive preferences in a "sensible" fashion. In particular, if you're "non-circular on average" (i.e. you only make accidental "mistakes" in your rating) then the model averages that out. And if you consistently have a loopy utility function, then the reward model will map all the elements of the loop onto the same reward value.
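As a minimal illustration of that claim (a generic Bradley-Terry-style reward model fit by gradient ascent on a perfectly circular preference set; a sketch, not the exact model or training code from the post):

```python
import numpy as np

items = ["A", "B", "C"]
comparisons = [(0, 1), (1, 2), (2, 0)]   # (winner, loser) pairs: a consistent loop A > B > C > A

rewards = np.zeros(3)                    # one scalar reward per item
lr = 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(2000):
    grad = np.zeros(3)
    for w, l in comparisons:
        # P(winner preferred) = sigmoid(r_w - r_l); gradient of the log-likelihood
        p = sigmoid(rewards[w] - rewards[l])
        grad[w] += 1 - p
        grad[l] -= 1 - p
    rewards += lr * grad
    rewards -= rewards.mean()            # rewards are only defined up to an additive constant

print(dict(zip(items, rewards.round(3).tolist())))
# With a fully circular preference set, the symmetric optimum is r_A = r_B = r_C:
# all elements of the loop get mapped onto (roughly) the same reward value.
```

If the circularity only shows up as occasional rating mistakes on top of an otherwise consistent ordering, the same fit averages the mistakes out.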

Finally: Yes, totally, feel free to send me the guest ID either here or via DM!

Comment by Jan (jan-2) on [Simulators seminar sequence] #2 Semiotic physics - revamped · 2023-01-04T17:30:09.314Z · LW · GW

Hi Erik! Thank you for the careful read, this is awesome!

Regarding proposition 1 - I think you're right, that counter-example disproves the proposition. The proposition we were actually going for was  ,  i.e. the probability without the end of the bridge! I'll fix this in the post.

Regarding proposition II - janus had the same intuition and I tried to explain it with the following argument: When the distance between tokens becomes large enough, then eventually all bridges between the first token and an arbitrary second token end up with approximately the same "cost".  At that point, only the prior likelihood of the token will decide which token gets sampled. So Proposition II implies something like , or that in the limit "the probability of the most likely sequence ending in  will be (when appropriately normalized) proportional to the probability of ", which seems sensible? (assuming something like ergodicity). Although I'm now becoming a bit suspicious about the sign of the exponent, perhaps there is a "log" or a minus missing on the RHS... I'll think about that a bit more.

Comment by Jan (jan-2) on [Hebbian Natural Abstractions] Mathematical Foundations · 2022-12-27T00:02:25.855Z · LW · GW

Uhhh exciting! Thanks for sharing!

Comment by Jan (jan-2) on The Greedy Doctor Problem... turns out to be relevant to the ELK problem? · 2022-08-01T17:36:26.495Z · LW · GW

Huh, thanks for spotting that! Yes, should totally be ELK 😀 Fixed it.

Comment by Jan (jan-2) on Formal Philosophy and Alignment Possible Projects · 2022-07-02T13:17:42.799Z · LW · GW

This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"

And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"

(Extremely cool project ideas btw)

Comment by Jan (jan-2) on A descriptive, not prescriptive, overview of current AI Alignment Research · 2022-07-01T11:00:37.202Z · LW · GW

Hey Ben! :) Thanks for the comment and the careful reading!

Yes, we only added the missing arXiv papers after clustering, but then we repeat the dimensionality reduction and show that the original clustering still holds up even with the new papers (Figure 4 bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering), but of course the clusters might look slightly different if we also re-run k-means on the extended dataset.

Comment by Jan (jan-2) on [Link] Adversarially trained neural representations may already be as robust as corresponding biological neural representations · 2022-06-25T21:22:24.151Z · LW · GW

There's an important caveat here:

The visual stimuli are presented 8 degrees over the visual field for 100ms followed by a 100ms grey mask as in a standard rapid serial visual presentation (RSVP) task.

I'd be willing to bet that if you give the macaque more than 100ms they'll get it right - that's at least how it is for humans!

(Not trying to shift the goalpost, it's a cool result! Just pointing at the next step.)

Comment by Jan (jan-2) on "Brain enthusiasts" in AI Safety · 2022-06-22T09:44:38.415Z · LW · GW

Great points, thanks for the comment! :) I agree that there are potentially some very low-hanging fruits. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment).

But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access? Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the activations, transforming everything into a useful form, ...) and then there are so many parameters to look at. 

Giving researchers structured access to the model via a research API could solve a lot of those difficulties and appears like something that totally should exist (although there is of course the danger of accelerating progress on the capabilities side also).

Comment by Jan (jan-2) on "Brain enthusiasts" in AI Safety · 2022-06-18T19:06:05.150Z · LW · GW

Great point! And thanks for the references :) 

I'll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)

Comment by Jan (jan-2) on A descriptive, not prescriptive, overview of current AI Alignment Research · 2022-06-09T12:15:31.046Z · LW · GW

Thank you for the comment and the questions! :)

This is not clear from how we wrote the paper, but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot you can see that the clusters are slightly overlapping - that would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
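For concreteness, here is a hedged sketch of that recipe with standard scikit-learn calls on stand-in data (not our actual pipeline, embeddings, or hyperparameters):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))   # stand-in for 768-dimensional abstract embeddings

# Cluster in the full 768-dimensional space...
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# ...and only project down to 2D afterwards, for plotting.
X_2d = PCA(n_components=2).fit_transform(X)

# Membership was decided by distance to the 768-D centroids, so nothing forces
# the clusters to stay separated (or even convex) in the 2D projection.
print(labels[:10], X_2d.shape)
```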

Comment by Jan (jan-2) on The Brain That Builds Itself · 2022-06-03T20:08:54.410Z · LW · GW

Oh true, I completely overlooked that! (if I keep collecting mistakes like this I'll soon have enough for a "My mistakes" page)

Comment by Jan (jan-2) on The Brain That Builds Itself · 2022-06-02T13:53:16.536Z · LW · GW

Yes, good point! I had that in an earlier draft and then removed it for simplicity and for the other argument you're making!

Comment by Jan (jan-2) on Adversarial attacks and optimal control · 2022-05-29T12:42:04.847Z · LW · GW

This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts and if I find the time to look more into this then that work is probably the right direction.

Comment by Jan (jan-2) on [Alignment] Is there a census on who's working on what? · 2022-05-24T07:39:51.985Z · LW · GW

As part of the AI Safety Camp our team is preparing a research report on the state of AI safety! Should be online within a week or two :)

Comment by Jan (jan-2) on Adversarial attacks and optimal control · 2022-05-23T20:36:48.015Z · LW · GW

Interesting, I added a note to the text highlighting this! I was not aware of that part of the story at all. That makes it more of a Moloch-example than a "mistaking adversarial for random"-example.

Comment by Jan (jan-2) on Adversarial attacks and optimal control · 2022-05-23T09:29:27.015Z · LW · GW

Yes, that's a pretty fair interpretation! The macroscopic/folk psychology notion of "surprise" of course doesn't map super cleanly onto the information-theoretic notion. But I tend to think of it as: there is a certain "expected surprise" about what future possible states might look like if everything evolves "as usual", . And then there is the (usually larger) "additional surprise" about the states that the AI might steer us into, . The delta between those two is the "excess surprise" that the AI needs to be able to bring about.

It's tricky to come up with a straightforward setting where the actions of the AI can be measured in nats, but perhaps the following works as an intuition pump: "If we give the AI full, unrestricted access to a control panel that controls the universe, how many operations does it have to perform to bring about the catastrophic event?". That's clearly still not well defined (there is no obvious/privileged way the panel should look), but it shows that 1) the "excess surprise" is a lower bound (we wouldn't usually give the AI unrestricted access to that panel) and 2) the minimum number of operations required to bring about a catastrophic event is probably still larger than 1.
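For what it's worth, here is a rough way to write down the quantity I'm gesturing at (made-up notation; the formulas from the comment above are not reproduced here):

```latex
\begin{align}
  S_{\text{usual}} &\approx -\log P_{\text{usual}}(x)
    && \text{surprise of the states $x$ we expect if everything evolves ``as usual''} \\
  S_{\text{AI}} &\approx -\log P_{\text{usual}}(x')
    && \text{surprise (under the same baseline) of the states $x'$ the AI steers us into} \\
  \Delta S &= S_{\text{AI}} - S_{\text{usual}}
    && \text{the ``excess surprise'' the AI has to be able to bring about}
\end{align}
```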

Comment by Jan (jan-2) on Elementary Infra-Bayesianism · 2022-05-09T11:58:39.878Z · LW · GW

Thank you for your comment! You are right, these things are not clear from this post at all and I did not do a good job at clarifying that. I'm a bit low on time atm, but hopefully, I'll be able to make some edits to the post to set the expectations for the reader more carefully.

The short answer to your question is: Yep, X is the space of events. In Vanessa's post it has to be compact and metric; I'm simplifying this to an interval in R. And  can be derived from  by plugging in g=0 and replacing the measure  by the Lebesgue integral . I have scattered notes where I derive the equations in this post. But it was clear to me that if I wanted to do this rigorously in the post, then I'd have to introduce an annoying amount of measure theory and the post would turn into a slog. So I decided to do things hand-wavy, but went a bit too hard in that direction.

Comment by Jan (jan-2) on High-stakes alignment via adversarial training [Redwood Research report] · 2022-05-05T21:08:38.506Z · LW · GW

Cool paper, great to see the project worked out! (:

One question: How do you know the contractors weren't just answering randomly (or were confused about the task) in your "quality after filtering" experiments (Table 4)? Is there agreement across contractors about the quality of completions (in case they saw the same completions)?

Comment by Jan (jan-2) on GPT-3 and concept extrapolation · 2022-04-21T08:42:16.880Z · LW · GW

Fascinating! Thanks for sharing!

Comment by Jan (jan-2) on GPT-3 and concept extrapolation · 2022-04-20T11:53:09.406Z · LW · GW
Comment by Jan (jan-2) on GPT-3 and concept extrapolation · 2022-04-20T11:52:46.847Z · LW · GW

Cool experiment! I could imagine that the tokenizer handicaps GPT's performance here (reversing the characters leads to completely different tokens). With a character-level tokenizer GPT should/might be able to handle that task better!
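A quick way to see the tokenizer point, assuming the tiktoken package is available (any BPE tokenizer shows the same effect):

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2/GPT-3-style BPE vocabulary
word = "extrapolation"

print(enc.encode(word))        # a few multi-character tokens
print(enc.encode(word[::-1]))  # the reversed string shatters into completely different tokens
# Reversing characters is trivial character-by-character, but from the model's
# point of view the input and output live in unrelated token sequences.
```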

Comment by Jan (jan-2) on Pop Culture Alignment Research and Taxes · 2022-04-18T17:52:59.401Z · LW · GW

Interesting, thank you! I guess I was thinking of deception as characterized by Evan Hubinger, with mesa-optimizers, bells, whistles, and all. But I can see how a sufficiently large competence-vs-performance gap could also count as deception.

Comment by Jan (jan-2) on Pop Culture Alignment Research and Taxes · 2022-04-18T08:40:42.580Z · LW · GW

Thanks for the comment! I'm curious about the Anthropic Codex code-vulnerability prompting, is this written up somewhere? The closest I could find is this, but I don't think that's what you're referencing?

Comment by Jan (jan-2) on Pop Culture Alignment Research and Taxes · 2022-04-18T08:24:51.907Z · LW · GW

I was not aware of this, thanks for pointing this out! I made a note in the text. I guess this is not an example of "advanced AI with an unfortunately misspecified goal" but rather just an example of the much larger class of "system with an unfortunately misspecified goal".

Comment by Jan (jan-2) on Pop Culture Alignment Research and Taxes · 2022-04-17T07:30:49.231Z · LW · GW

Thanks for the comment, I did not know this! I'll put a note in the essay to highlight this comment.

Comment by Jan (jan-2) on Pop Culture Alignment Research and Taxes · 2022-04-17T07:28:54.228Z · LW · GW

Iiinteresting! Thanks for sharing! Yes, the choice of how to measure this affects the outcome a lot.

Comment by Jan (jan-2) on A Brief Excursion Into Molecular Neuroscience · 2022-04-11T13:36:55.334Z · LW · GW

Hmm, fair, I think you might get along fine with my coworker from footnote 6 :) I'm not even sure there is a better way to write these titles - but they can still be very intimidating for an outsider.

Comment by Jan (jan-2) on A Brief Excursion Into Molecular Neuroscience · 2022-04-11T13:35:29.582Z · LW · GW

Yes, I agree, a model can really push intuition to the next level! There is a failure mode where people just throw everything into a model and hope that the result will make sense. In my experience that just produces a mess, and you need some intuition for how to properly set up the model.

Comment by Jan (jan-2) on A Brief Excursion Into Molecular Neuroscience · 2022-04-10T18:42:25.457Z · LW · GW

Hi! :) Thanks for the comment! Yes, that's on purpose; the idea is that a lot of the shorthand in molecular neuroscience is very hard to digest. So since the exact letters don't matter, I intentionally garbled them with a Glitch Text Generator. But perhaps that isn't very clear without explanation; I'll add something.

This word Ǫ̵͎͊G̶̦̉̇l̶͉͇̝̽͆̚i̷͔̓̏͌c̷̱̙̍̂͜k̷̠͍͌l̷̢̍͗̃n̷̖͇̏̆å̴̤c̵̲̼̫͑̎̆ is, for example, a garbled version of O-GLicklnac, which in turn is the phonetic version of "O-GlcNAc".

Comment by Jan (jan-2) on Theories of Modularity in the Biological Literature · 2022-04-04T14:38:42.458Z · LW · GW

Theory #4 appears very natural to me, especially in light of papers like Chen et al 2006 or Cuntz et al 2012. Another supporting intuition from developmental neuroscience is that development is a huge mess and that figuring out where to put a long-range connection is really involved. And while there can be a bunch of circuit remodeling on a local scale, once you have established a long-range connection, there is little hope of substantially rewiring it.

In case you want to dive deeper into this (and you don't want to read all those papers), I'd be happy to chat more about this :)

Comment by Jan (jan-2) on Basic Inframeasure Theory · 2022-03-30T16:51:40.132Z · LW · GW

I've been meaning to dive into this for-e-ver and only now find the time for it! This is really neat stuff, haven't enjoyed a framework this much since logical induction. Thank you for writing this!

Comment by Jan (jan-2) on Compute Governance: The Role of Commodity Hardware · 2022-03-28T10:04:38.158Z · LW · GW

Yep, I agree, SLIDE is probably a dud. Thanks for the references! And my inside view is also that current trends will probably continue and most interesting stuff will happen on AI-specialized hardware.

Comment by Jan (jan-2) on Compute Governance: The Role of Commodity Hardware · 2022-03-26T20:22:22.180Z · LW · GW

Thank you for the comment! You are right, that should be a ReLU in the illustration, I'll fix it :)

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-15T10:11:58.891Z · LW · GW

Great explanation, I feel substantially less confused now. And thank you for adding two new shoulder advisors to my repertoire :D

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-15T10:07:36.575Z · LW · GW

Thank you for the thoughtful reply!

3. I agree with your point, especially that  should be true.

But I think I can salvage my point by making a further distinction. When I write  I actually mean  where  is a semantic embedding that takes sentences to vectors. Already at the level of the embedding we probably have 

and that's (potentially) a good thing! Because if we structure our embedding in such a way that  points to something that is actually comparable to the conjunction of the two, then our utility function can just be naively linear in the way I constructed it above. I belieeeeeve that this is what I wanted to gesture at when I said that we need to identify an appropriate basis in an appropriate space (i.e. where and whatever else we might want out of the embedding). But I have a large amount of uncertainty around all of this.
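A minimal sketch of what "naively linear in the embedding" could look like, with made-up embeddings (purely illustrative, not the construction from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {"A": rng.normal(size=16), "B": rng.normal(size=16)}
emb["A and B"] = emb["A"] + emb["B"]   # the structural assumption on the embedding

w = rng.normal(size=16)                # parameters of the (linear) utility function

def utility(sentence):
    return float(w @ emb[sentence])

# If the embedding of the conjunction is the sum of the embeddings, a utility
# that is a dot product with a fixed weight vector is automatically additive.
assert np.isclose(utility("A and B"), utility("A") + utility("B"))
```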

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-12T13:48:52.693Z · LW · GW

Awesome, thanks for the feedback Eric! And glad to hear you enjoyed the post!

I'm confused why you're using a neural network

Good point, for the example post it was total overkill. The reason I went with a NN was to demonstrate the link with the usual setting in which preference learning is applied. And in general, NNs generalize better than the table-based approach (see also my response to Charlie Steiner).

happy to chat about that

I definitely plan to write a follow-up to this post, will come back to your offer when that follow-up reaches the front of my queue :)

But there doesn't seem to be any point where we can't infer the best possible approximation at all.

Hadn't thought about this before! Perhaps it could work to compare the inferred utility function with a random baseline? I.e. the baseline policy would be "for every comparison, flip a coin and make that your prediction about the human preference". 

If this happens to accurately describe how the human makes the decision, then the utility function should not be able to perform better than the baseline (and perhaps even worse). How much more structure can we add to the human choice before the utility function performs better than the random baseline?

it's not obvious to me that approximating inconsistent preferences using a utility function is the "right" thing to do

True! I guess one proposal to resolve these inconsistencies is CEV, although that is not very computable.

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-12T13:35:57.028Z · LW · GW

Thanks for the comment! (:

  1. True, fixed it! I was confused there for a bit.
  2. This is also true. I wrote it like this because the proof sketch on Wikipedia included that step. And I guess if step 3 can't be executed (complicated), then it's nice to have the sorted list as a next-best-thing.
  3. Those are interesting points and I'm not sure I have a good answer (because the underlying problems are quite deep, I think). My statement about linearity in semantic embeddings is motivated by something like the famous "King – Man + Woman = Queen" from word2vec. Regarding linearity of the utility function - I think this should be given by definition, no? (Hand-wavy: Using this we can write  and so on).
    But your point is well-taken, the semantic embedding is not actually always linear. This requires some more thought.
  4. Ahhh very interesting, I'd not have expected that intuitively, in particular after reading the comment from @cousin_it above. I wonder how an explicit solution with the Hodge decomposition can be reconciled with the NP-hardness of the problem :thinkies:
Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-12T13:09:13.963Z · LW · GW

Hey Charlie! 

Good comment, gave me a feeling of "oh, oops, why didn't I?" for a while. I think having the Elo-like algorithm as a baseline to compare to would have been a good thing to have in any case. But there is something that the NN can do that the Elo-like algorithm can't: generalization. Every "new" element (or even an interpolation of older elements) will get the "initial score" (like 1500 in chess) in Elo, while the NN can exploit similarities between the new element and older elements.
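For reference, the Elo-like baseline I have in mind would look roughly like this (the standard Elo update with illustrative names; not code from the post):

```python
def elo_update(r_winner, r_loser, k=32):
    # Standard Elo: expected win probability from the rating gap, then a k-scaled nudge.
    expected_win = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"old_item_1": 1500.0, "old_item_2": 1500.0}
ratings["old_item_1"], ratings["old_item_2"] = elo_update(ratings["old_item_1"], ratings["old_item_2"])

# A brand-new element just gets the default score - Elo has no notion of features,
# so there is nothing for it to generalize from.
ratings["brand_new_item"] = 1500.0
print(ratings)
```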

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-12T13:05:00.586Z · LW · GW

Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.

Comment by Jan (jan-2) on [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain · 2022-02-11T01:07:41.606Z · LW · GW

I'm pretty confused here.

Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!

I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.

A couple of thoughts:

  • Yes, I agree that the inductive bias (/genetically hardcoded information) can live in different components: the learning rule, the network architecture, or the initialization of the weights. So learning-from-scratch is logically compatible with inductive biases - we can just put all the inductive bias into the learning rule and the architecture and none in the weights.
    • But from the architecture and the learning rule, the hardcoded info can enter into the weights very rapidly (e.g. first step of the learning rule: set all the weights to the values appropriate for an adult brain. Or, more realistically, a ConvNet architecture can be obtained from a fully-connected DNN by setting a lot of connections to zero - see the toy sketch below this list). Therefore I don't see what it buys you to assume the weights to be free of inductive bias.
    • There might also be a case that in the actual biological brain the weights are not initialized randomly. See e.g. this work on clonally related neurons.
  • Something that is not appreciated a lot outside of neuroscience: "Learning" in the brain is as much a structural process as it is a "changing weights" process. This is true not only throughout development but also into adulthood - activity-dependent learning rules do not only adjust the weights of connections, they can also prune bad connections and add new ones. The brain simultaneously produces activity, which induces plasticity, which changes the circuit, which produces slightly different activity in turn.
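The toy sketch referenced above: a 1-D convolution written as a fully-connected layer whose weight matrix has most entries fixed to zero and the rest tied along the diagonals (numpy only, purely illustrative):

```python
import numpy as np

n_in = 8
kernel = np.array([1.0, -2.0, 1.0])      # arbitrary 3-tap kernel
n_out = n_in - len(kernel) + 1           # "valid" convolution

# Start from a dense (fully-connected) weight matrix and zero out / tie everything
# except a sliding band of kernel weights.
W = np.zeros((n_out, n_in))
for i in range(n_out):
    W[i, i:i + len(kernel)] = kernel

x = np.arange(n_in, dtype=float)
assert np.allclose(W @ x, np.convolve(x, kernel[::-1], mode="valid"))
```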

The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.

That sounds a lot more like cognitive science than neuroscience! This is completely fine (I did my undergrad in CogSci), but it requires a different set of arguments from the ones you are providing in your post, I think. If you want to make a CogSci case for learning from scratch, then your argument has to be a lot more constructive (i.e. literally walk us through the steps of how your proposed system can learn all/a lot of what humans can learn). Either you take a look at what is there in the brain (subplate, synapses, ...), describe how these things interact, and (correctly) infer that it's sufficient to produce a mind (this is the neuroscience strategy); or you propose an abstract system, demonstrate that it can do the same thing as the mind, and then demonstrate that the components of the abstract system can be identified with the biological brain (this is the CogSci strategy). I think you're skipping step two of the CogSci strategy.

Whatever that explanation is, it's a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.

I'm on board with that. I anticipate that the design spec will contain (the equivalent of) a ton of hardcoded genetic stuff also for the "learning subsystem"/cortex. From a CogSci perspective, I'm willing to assume that this genetic stuff could be in the learning rule and the architecture, not in the initial weights. From a neuroscience perspective, I'm not convinced that's the case.

is that true even if there haven't been any retinal waves?

Blocking retinal waves messes up the cortex pretty substantially (same as if the animal were born without eyes). There is the beta-2 knockout mouse, which has retinal waves but with weaker spatiotemporal correlations.  As a consequence beta-2 mice fail to track moving gratings and have disrupted receptive fields.

Comment by Jan (jan-2) on Inferring utility functions from locally non-transitive preferences · 2022-02-10T16:27:52.458Z · LW · GW

Yes please, would be excited to see that!

Comment by Jan (jan-2) on [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain · 2022-02-07T18:34:15.741Z · LW · GW

Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?

Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of how the program ends up there over training probably looks different (and might take a bit longer if it doesn't use the shortcuts that the brain got from evolution), but I don't think the stuff that evolution put in the cortex is strictly necessary.

A caveat: I'm not sure how much weight the similarity between the program and the brain can support before it breaks down. I'd strongly suspect that certain aspects of the cortex are not logically implied by the statistics of the environment, but rather represent idiosyncratic quirks that were adapted at some point during evolution. Those idiosyncratic quirks won't be in the learning-from-scratch program. But perhaps (probably?) they are also not relevant in the big scheme of things.

I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyperparameters".

Fair! Most people in computational neuroscience are also very happy to ignore those differences, and so far nothing terribly bad happened.

If you buy the "locally-random pattern separation" story (Section 2.5.4), that would make it impossible for evolution to initialize the adjustable parameters in a non-locally-random way.

You point out yourself that some areas (e.g. the motor cortex) are agranular, so that argument doesn't work there. But ignoring that, and conceding the cerebellum and the drosophila mushroom body to you (not my area of expertise), I'm pretty doubtful about postulating "locally-random pattern separation" in the cortex. I'm interpreting your thesis to cash out as "Given a handful of granule cells from layer 4, the connectivity with pyramidal neurons in layer 2/3 is (initially) effectively random, and therefore layer 2/3 neurons need to learn (from scratch) how to interpret the signal from layer 4". Is that an okay summary?

Because then I think this fails at three points:

  1. One characteristic feature of the cortex is the presence of cortical maps. They exist in basically all sensory and motor cortices, and they have a very regular structure that is present in animal species separated by as much as 64 million years of evolution. These maps imply that if you pick a handful of granule cells from layer 4 that are located nearby, their functional properties will be somewhat similar! Therefore, even if connectivity between L4 and L2/3 is locally random it doesn't really matter since the input is somewhat similar in any case. Evolution could "use" that fact to pre-structure the circuit in L2/3.
  2. Connectivity between L4 and L2/3 is not random. Projections from layer 4 are specific to different portions of the postsynaptic dendrite, and nearby synapses on mature and developing dendrites tend to share similar activation patterns. Perhaps you want to argue that this non-randomness only emerges through learning and the initial configuration is random? That's a possibility, but ...
  3. ... when you record activity from neurons in the cortex of an animal that had zero visual experience prior to the experiment (lid-suture), they are still orientation-selective! And so is the topographic arrangement of retinal inputs and the segregation of eye-specific inputs. At the point of eye-opening, the animals are already pretty much able to navigate their environment.

Obviously, there are still a lot of things that need to be refined and set up during later development, but defects in these early stages of network initialization are pretty bad (a lot of neurodevelopmental disorders manifest as "wiring defects" that start in early development).

I'm very confused by this. I have coded up a ConvNet with random initialization. It was computationally tractable; in fact, it ran on my laptop!

Okay, my claim there came out a lot stronger than I wanted and I concede a lot of what you say. Learning from scratch is probably not computationally intractable in the technical sense. I guess what I wanted to argue is that it appears practically infeasible to learn everything from scratch. (There is a lot of "everything" and not a lot of time to learn it. Any headstart might be strictly necessary and not just a nice-to-have).

(As a side point: your choice of a convnet as the example is interesting. People came up with convnets because fully-connected, randomly initialized networks were not great at image classification and we needed some inductive bias in the form of a locality constraint to learn in a reasonable time. That's the point I wanted to make.)

I guess maybe what you're claiming is: we can't have all three of {learning from scratch, general intelligence, computational tractability}.

Interesting, I haven't thought about it like this before. I do think it could be possible to have all three - but then it's not the brain anymore. As far as I can tell, evolutionary pressures make complete learning from scratch infeasible.

Comment by Jan (jan-2) on [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain · 2022-02-04T23:20:09.729Z · LW · GW

Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.

I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is in the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuroscientists love their trivia). They might be missing the point, but I think that still makes your theory (by definition) controversial. (I think the term "scientific consensus" should be used carefully and very selectively).

In that spirit, there are a few points that I could push back on:

  • Cortical uniformity (and by extension canonical microcircuits) are extremely useful concepts for thinking about the brain. But they are not literally 100% accurate. There are a lot of differences between different regions of the cortex, not only in thickness but also in the developmental process (here or here). I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.
  • In section 2.5.4.1 you are being a bit dismissive of biologically-"realistic" implementations of backpropagation. I used to be pretty skeptical too, but some of the recent studies are beginning to make a lot of sense. This one (a collaboration of DeepMind and some of the established neuroscience bigshots) is really quite elegant and offers some great insights on how interneurons and dendritic branches might interact.
  • A more theoretical counter: If evolution could initialize certain parts of the cortex so that they are "up and running" faster, why wouldn't it? (Just so that we can better understand it? How nice!) From the perspective of evolution, it makes a lot of sense to initialize the cortex with an idea of what an oriented edge is, because oriented edges have always been around since the inception of the eye.
    Or, in terms of computation theory, learning from scratch is computationally intractable. Strong, informative priors over hypothesis space might just be necessary to learn anything worthwhile at all.

But perhaps I'm missing the point with that nitpicking. I think the broader conceptual question I have is: What does "randomly initialized" even mean in the brain? At what point is the brain initialized? When the neural tube forms? When interneurons begin to migrate to the cortex? When the first synapses are established? When the subplate is gone? When the pruning of excess synapses and the apoptosis of cells is over? When the animal/human is born? When all the senses begin to transmit input? After college graduation?

Perhaps this is the point that the "old-timer" also wanted to make. It doesn't really make sense to separate the "initialization" from the "refinement". They happen at the same time, and whether you put a certain thing into one category or the other is up to individual taste.

All of this being said, I'm very curious to read the next parts of this sequence! :) Perhaps my points don't even affect your core argument about AI Safety.