My Criticism of Singular Learning Theory 2023-11-19T15:19:16.874Z
Goodhart's Law in Reinforcement Learning 2023-10-16T00:54:11.669Z
VC Theory Overview 2023-07-02T22:45:59.974Z
How Smart Are Humans? 2023-07-02T15:46:55.309Z
Using (Uninterpretable) LLMs to Generate Interpretable AI Code 2023-07-02T01:01:53.846Z
Some Arguments Against Strong Scaling 2023-01-13T12:04:27.444Z
What kinds of algorithms do multi-human imitators learn? 2022-05-22T14:27:31.430Z
Updating Utility Functions 2022-05-09T09:44:46.548Z
Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian 2020-12-29T13:33:53.202Z
Baseline Likelihood of Long-Term Side Effects From New Drugs? 2020-12-27T13:37:35.288Z
Two senses of “optimizer” 2019-08-21T16:02:08.985Z
Risks from Learned Optimization: Conclusion and Related Work 2019-06-07T19:53:51.660Z
Deceptive Alignment 2019-06-05T20:16:28.651Z
The Inner Alignment Problem 2019-06-04T01:20:35.538Z
Conditions for Mesa-Optimization 2019-06-01T20:52:19.461Z
Risks from Learned Optimization: Introduction 2019-05-31T23:44:53.703Z
Two agents can have the same source code and optimise different utility functions 2018-07-10T21:51:53.939Z


Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-24T23:11:02.712Z · LW · GW

If a universality statement like the above holds for neural networks, it would tell us that most of the details of the parameter-function map are irrelevant.  

I suppose this depends on what you mean by "most". DNNs and CNNs have noticeable and meaningful differences in their (macroscopic) generalisation behaviour, and these differences are due to their parameter-function map. This is also true of LSTMs vs transformers, and so on. I think it's fairly likely that these kinds of differences could have a large impact on the probability that a given type of model will learn to exhibit goal-directed behaviour in a given training setup, for example.

The ambitious statement here might be that all the relevant information you might care about (in terms of understanding universality) are already contained in the loss landscape.

Do you mean the loss landscape in the limit of infinite data, or the loss landscape for a "small" amount of data? In the former case, the loss landscape determines the parameter-function map over the data distribution. In the latter case, my guess would be that the statement probably is false (though I'm not sure).

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T13:40:50.993Z · LW · GW

You're right, I put the parameters the wrong way around. I have fixed it now, thanks!

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T13:39:13.927Z · LW · GW

I could have changed it to Why Neural Networks can obey Occam's Razor, but I think this obscures the main point.

I think even this would be somewhat inaccurate (in my opinion). If a given parametric Bayesian learning machine does obey (some version of) Occam's razor, then this must be because of some facts related to its prior, and because of some facts related to its parameter-function map. SLT does not say very much about either of these two things. What the post is about is primarily the relationship between the RLCT and posterior probability, and how this relationship can be used to reason about training dynamics. To connect this to Occam's razor (or inductive bias more broadly), further assumptions and claims would be required.
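For reference, the relationship in question is Watanabe's asymptotic expansion of the Bayesian free energy, which (as I understand the result) can be written as:

```latex
% Asymptotic expansion of the Bayesian free energy for a singular model,
% where n is the number of data points, w_0 an optimal parameter,
% \lambda the RLCT, and m its multiplicity:
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log \log n + O_p(1)

% For a regular model with d parameters, \lambda = d/2 and m = 1,
% recovering the familiar BIC penalty:
F_n \approx n L_n(\hat{w}) + \frac{d}{2} \log n
```

Note that this is a statement about posterior concentration; nothing in the expression itself mentions the parameter-function map.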

At the time of writing, basically nobody knew anything about SLT

Yes, thank you so much for taking the time to write those posts! They were very helpful for me to learn the basics of SLT.

As we discussed at Berkeley, I do like the polynomial example you give and this whole discussion has made me think more carefully about various aspects of the story, so thanks for that.

I'm very glad to hear that! :)

My inclination is that the polynomial example is actually quite pathological and that there is a reasonable correlation between the RLCT and Kolmogorov complexity in practice

Yes, I also believe that! The polynomial example is definitely pathological, and I do think that a low RLCT almost certainly is correlated with simplicity in the case of neural networks. My point is more that the mathematics of SLT does not explain generalisation, and that additional assumptions definitely will be needed to derive specific claims about the inductive bias of neural networks.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T11:53:45.695Z · LW · GW

Well neural networks do obey Occam's razor, at least according to the formalisation of that statement that is contained in the post (namely, neural networks when formulated in the context of Bayesian learning obey the free energy formula, a generalisation of the BIC which is often thought of as a formalisation of Occam's razor).

Would that not imply that my polynomial example also obeys Occam's razor? 

However, I accept your broader point, which I take to be: readers of these posts may naturally draw the conclusion that SLT currently says something profound about (ii) from my other post, and the use of terms like "generalisation" in broad terms in the more expository parts (as opposed to the technical parts) arguably doesn't make enough effort to prevent them from drawing these inferences.

Yes, I think this probably is the case. I also think the vast majority of readers won't go deep enough into the mathematical details to get a fine-grained understanding of what the maths is actually saying.

I'm often critical of the folklore-driven nature of the ML literature and what I view as its low scientific standards, and especially in the context of technical AI safety I think we need to aim higher, in both our technical and more public-facing work.

Yes, I very much agree with this too.

Does that sound reasonable?

Yes, absolutely!

At least right now, the value proposition I see of SLT lies not in explaining the "generalisation puzzle" but in understanding phase transitions and emergent structure; that might end up circling back to say something about generalisation, eventually.

I also think that SLT probably will be useful for understanding phase shifts and training dynamics (as I also noted in my post above), so we have no disagreements there either.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T11:27:13.322Z · LW · GW

I think I recall reading that, but I'm not completely sure.

Note that the activation function affects the parameter-function map, and so the influence of the activation function is subsumed by the general question of what the parameter-function map looks like.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T11:21:49.026Z · LW · GW

I'm not sure, but I think this example is pathological.

Yes, it's artificial and cherry-picked to make a certain rhetorical point as simply as possible.

This is the more relevant and interesting kind of symmetry, and it's easier to see what this kind of symmetry has to do with functional simplicity: simpler functions have more local degeneracies.

This is probably true for neural networks in particular, but mathematically speaking, it completely depends on how you parameterise the functions. You can create a parameterisation in which this is not true.
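To make this concrete, here is a toy Monte Carlo sketch (my own construction, purely illustrative) of how the degeneracy of the very same function depends on the parameterisation. Both models below realise the zero function at their loss minima on the data {(1, 0), (-1, 0)}, but for f(x) = abx the minimum is the whole degenerate set ab = 0, while for f(x) = ax + bx² it is a single point. The volume-scaling exponent of {loss < ε} (which is what the RLCT controls) differs accordingly:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500_000
a, b = rng.uniform(-1, 1, N), rng.uniform(-1, 1, N)

# Squared-error loss on the data {(1, 0), (-1, 0)} for two parameterisations
# that both realise the zero function at their minima:
loss_degenerate = 2 * (a * b) ** 2        # model f(x) = a*b*x       (minimum: ab = 0)
loss_regular    = 2 * (a ** 2 + b ** 2)   # model f(x) = a*x + b*x^2 (minimum: a = b = 0)

def volume_exponent(loss):
    """Fit the slope of log V(eps) vs log eps, where V(eps) = vol{loss < eps}."""
    eps = np.array([1e-4, 1e-3, 1e-2, 1e-1])
    vols = np.array([(loss < e).mean() for e in eps])
    return np.polyfit(np.log(eps), np.log(vols), 1)[0]

s_deg = volume_exponent(loss_degenerate)
s_reg = volume_exponent(loss_regular)
# the degenerate parameterisation has a visibly smaller volume-scaling exponent
print(f"degenerate: {s_deg:.2f}, regular: {s_reg:.2f}")
```

The same underlying function, in other words, can be made more or less degenerate by choosing how it is parameterised.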

You can make the same critique of Kolmogorov complexity.

Yes, I have been using "Kolmogorov complexity" in a somewhat loose way here.

Wild conjecture: [...]

Is this not satisfied trivially due to the fact that the RLCT has a certain maximum and minimum value within each model class? (If we stick to the assumption that the parameter space W is compact, etc.)

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T10:56:30.424Z · LW · GW

Will do, thank you for the reference!

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T10:52:38.542Z · LW · GW

Yes, I completely agree. The theorems that have been proven by Watanabe are of course true and non-trivial facts of mathematics; I do not mean to dispute this. What I do criticise is the magnitude of the significance of these results for the problem of understanding the behaviour of deep learning systems.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T10:19:56.486Z · LW · GW

Thank you for this -- I agree with what you are saying here. In the post, I went with a somewhat loose equivocation between "good priors" and "a prior towards low Kolmogorov complexity", but this does skim past a lot of nuance. I do also very much not want to say that the DNN prior is exactly towards low Kolmogorov complexity (this would be uncomputable), but only that it is mostly correlated with Kolmogorov complexity for typical problems.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T10:09:45.731Z · LW · GW

Yes, I mostly just mean "low test error". I'm assuming that real-world problems follow a distribution that is similar to the Solomonoff prior (i.e., that data generating functions are more likely to have low Kolmogorov complexity than high Kolmogorov complexity) -- this is where the link is coming from. This is an assumption about the real world, and not something that can be established mathematically.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-22T10:03:50.279Z · LW · GW

I think that it gives us an adequate account of generalisation in the limit of infinite data (or, more specifically, in the case where we have enough data to wash out the influence of the inductive bias); this is what my original remark was about. I don't think classical statistical learning theory gives us an adequate account of generalisation in the setting where the training data is small enough for our inductive bias to still matter, and it only has very limited things to say about out-of-distribution generalisation.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-21T01:02:16.432Z · LW · GW

The assumption that small neural networks are a good match for the actual data generating process of the world is equivalent to the assumption that neural networks have an inductive bias that gives large weight to the actual data generating process of the world, if we also append the claim that neural networks have an inductive bias that gives large weight to functions which can be described by small neural networks (and this latter claim is not too difficult to justify, I think).

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-21T00:44:36.825Z · LW · GW

I think the second one by Carroll is quite careful to say things like "we can now understand why singular models have the capacity to generalise well" which seems to me uncontroversial, given the definitions of the terms involved and the surrounding discussion.

The title of the post is Why Neural Networks obey Occam's Razor! It also cites Zhang et al, 2017, and immediately after this says that SLT can help explain why neural networks have the capacity to generalise well. This gives the impression that the post is intended to give a solution to problem (ii) in your other comment, rather than a solution to problem (i).

Jesse's post includes the following expression:

I think this also suggests an equivocation between the RLCT measure and practical generalisation behaviour. Moreover, neither post contains any discussion of the difference between (i) and (ii).

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-21T00:25:13.646Z · LW · GW

Anyway I'm guessing you're probably willing to grant (i), based on SLT or your own views, and would agree the real bone of contention lies with (ii).

Yes, absolutely. However, I also don't think that (i) is very mysterious, if we view things from a Bayesian perspective. Indeed, it seems natural to say that an ideal Bayesian reasoner should assign non-zero prior probability to all computable models, or something along those lines, and in that case, notions like "overparameterised" no longer seem very significant.

Maybe that has significant overlap with the critique of SLT you're making?

Yes, this is basically exactly what my criticism of SLT is -- I could not have described it better myself!

Again, I think this reduction is not trivial since the link between the RLCT and generalisation error is nontrivial.

I agree that this reduction is relevant and non-trivial. I don't have any objections to this per se. However, I do think that there is another angle of attack on this problem that (to me) seems to get us much closer to a solution (namely, to investigate the properties of the parameter-function map).

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T23:48:28.225Z · LW · GW

A few things:

1. Neural networks do typically learn functions with low Kolmogorov complexity (otherwise they would not be able to generalise well).
2. It is a type error to describe a function as having low RLCT. A given function may have a high RLCT or a low RLCT, depending on the architecture of the learning machine. 
3. The critique is against the supposition that we can use SLT to explain why neural networks generalise well in the small-data regime. The example provides a learning machine which would not generalise well, but which does fit all assumptions made by SLT. Hence, the SLT theorems which appear to prove that learning machines will generalise well when they are subject to the assumptions of SLT must in fact be showing something else.

My point is precisely that SLT does not give us a predictive account of how neural networks behave, in terms of generalisation and inductive bias, because it abstracts away from factors which are necessary to understand generalisation and inductive bias.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T23:34:11.119Z · LW · GW

To say that neural networks are empirical risk minimisers is just to say that they find functions with globally optimal training loss (and, if they find functions with a loss close to the global optimum, then they are approximate empirical risk minimisers, etc). 

I think SLT effectively assumes that neural networks are (close to being) empirical risk minimisers, via the assumption that they are trained by Bayesian induction.

Comment by Joar Skalse (Logical_Lunatic) on VC Theory Overview · 2023-11-20T11:59:39.316Z · LW · GW

The bounds are not exactly vacuous -- in fact, they are (in a sense) tight. However, they concern a somewhat adversarial setting, where the data distribution may be selected arbitrarily (including by making it maximally opposed to the inductive bias of the learning algorithm). This means that the bounds end up being much larger than what you would typically observe in practice, if you give typical problems to a learning algorithm whose inductive bias is attuned to the structure of "typical" problems. 

If you move from this adversarial setting to a more probabilistic setting, where you assume a fixed distribution over the possible target functions (or data-generating distributions), then you may be able to prove tighter probabilistic bounds. However, I do not have any references to places where this actually has been done (and as far as I know, it has not been done before).

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:56:24.146Z · LW · GW

I already posted this in response to Daniel Murfet, but I will copy it over here:

For example, the agnostic PAC-learning theorem says that if a learning machine L (for binary classification) is an empirical risk minimiser with VC dimension d, then for any distribution D over X × {0,1}, if L is given access to at least O((d + log(1/δ))/ε²) data points sampled from D, then it will with probability at least 1−δ learn a function whose (true) generalisation error (under D) is at most ε worse than the best function which L is able to express (in terms of its true generalisation error under D). If we assume that D corresponds to a function which L can express, then the generalisation error of L will with probability at least 1−δ be at most ε.

This means that, in the limit of infinite data, L will with probability arbitrarily close to 1 learn a function whose error is arbitrarily close to the optimal value (among all functions which L is able to express). Thus, any empirical risk minimiser with a finite VC-dimension will generalise well in the limit of infinite data.

For a bit more detail, see this post.
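As a toy illustration of the kind of guarantee involved (my own sketch, not from the references): empirical risk minimisation over threshold classifiers on [0,1] (a class with VC dimension 1), with 10% label noise, already gets within a few percent of the best achievable error at moderate sample sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 1, n)
# true threshold at 0.5, labels flipped with probability 0.1
y = (x >= 0.5).astype(int) ^ (rng.uniform(0, 1, n) < 0.1)

# ERM over the class h_t(x) = 1[x >= t]: pick the threshold with lowest training error.
thresholds = np.linspace(0, 1, 1001)
train_err = np.array([np.mean((x >= t).astype(int) != y) for t in thresholds])
t_hat = thresholds[np.argmin(train_err)]

# The true generalisation error of h_t is 0.1 + 0.8*|t - 0.5|
# (best achievable in the class: 0.1).
true_err = 0.1 + 0.8 * abs(t_hat - 0.5)
print(f"learned threshold {t_hat:.3f}, true error {true_err:.3f}")
```

The gap to the best-in-class error shrinks as n grows, as the theorem predicts; the bound itself is of course distribution-free.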

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:51:20.060Z · LW · GW

Does this not essentially amount to just assuming that the inductive bias of neural networks in fact matches the prior that we (as humans) have about the world?

This is basically a justification of something like your point 1, but AFAICT it's closer to a proof in the SLT setting than in your setting.

I think it could probably be turned into a proof in either setting, at least if we are allowed to help ourselves to assumptions like "the ground truth function is generated by a small neural net" and "learning is done in a Bayesian way", etc.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:48:04.074Z · LW · GW

In your example there are many values of the parameters that encode the zero function

Ah, yes, I should have made the training data be (1,1), rather than (0,0). I've fixed the example now!

Is that a fair characterisation of the argument you want to make?

Yes, that is exactly right!

Assuming it is, my response is as follows. I'm guessing you think the zero function is simpler than the alternative because the former function can be encoded by a shorter code on a UTM than the latter.

The notion of complexity that I have in mind is even more pre-theoretic than that; it's something like "the more complicated function looks like an intuitively less plausible guess than the zero function". However, if we want to keep things strictly mathematical, then we can substitute this for the definition in terms of UTM codes.

But this isn't the kind of complexity that SLT talks about

I'm well aware of that -- that is what my example attempts to show! My point is that the kind of complexity which SLT talks about does not allow us to make inferences about inductive bias or generalisation behaviour, contra what is claimed e.g. here and here.

So we agree that Kolmogorov complexity and the local learning coefficient are potentially measuring different things. I want to dig deeper into where our disagreement lies, but I think I'll just post this as-is and make sure I'm not confused about your views up to this point.

As far as I can tell, we don't disagree about any object-level technical claims. Insofar as we do disagree about something, it may be more about methodological meta-questions. I think that what would probably be the most important thing to understand about neural networks is their inductive bias and generalisation behaviour, on a fine-grained level, and I don't think SLT can tell you very much about that. I assume that our disagreement must be about one of those two claims?

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:35:51.010Z · LW · GW

For example, the agnostic PAC-learning theorem says that if a learning machine L (for binary classification) is an empirical risk minimiser with VC dimension d, then for any distribution D over X × {0,1}, if L is given access to at least O((d + log(1/δ))/ε²) data points sampled from D, then it will with probability at least 1−δ learn a function whose (true) generalisation error (under D) is at most ε worse than the best function which L is able to express (in terms of its true generalisation error under D). If we assume that D corresponds to a function which L can express, then the generalisation error of L will with probability at least 1−δ be at most ε.

This means that, in the limit of infinite data, L will with probability arbitrarily close to 1 learn a function whose error is arbitrarily close to the optimal value (among all functions which L is able to express). Thus, any empirical risk minimiser with a finite VC-dimension will generalise well in the limit of infinite data.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:31:16.614Z · LW · GW

I'm going to make a few comments as I read through this, but first I'd like to thank you for taking the time to write this down, since it gives me an opportunity to think through your arguments in a way I wouldn't have done otherwise.

Thank you for the detailed responses! I very much enjoy discussing these topics :)

My impression is that you tend to see this as a statement about flatness, holding over macroscopic regions of parameter space

My intuitions around the RLCT are very much geometrically informed, and I do think of it as being a kind of flatness measure. However, I don't think of it as being a "macroscopic" quantity, but rather, a local quantity.

I think the rest of what you say coheres with my current picture, but I will have to think about it for a bit, and come back later!

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:16:37.531Z · LW · GW

I have often said that SLT is not yet a theory of deep learning, this question of whether the infinite data limit is really the right one being among one of the main question marks I currently see.

Yes, I agree with this. I think my main objections are (1) the fact that it mostly abstracts away from the parameter-function map, and (2) the infinite-data limit.

My view is that the validity of asymptotics is an empirical question, not something that is settled at the blackboard.

I largely agree, though it depends somewhat on what your aims are. My point there was mainly that theorems about generalisation in the infinite-data limit are likely to end up being weaker versions of more general results from statistical and computational learning theory.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:14:08.094Z · LW · GW

That's interesting, thank you for this!

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:13:45.345Z · LW · GW

Yes, I meant specifically on LW and in the AI Safety community! In academia, it remains fairly obscure.

Comment by Joar Skalse (Logical_Lunatic) on My Criticism of Singular Learning Theory · 2023-11-20T10:13:13.229Z · LW · GW

I think this is precisely what SLT is saying, and this is nontrivial!

It is certainly non-trivial, in the sense that it takes many lines to prove, but I don't think it tells you very much about the actual behaviour of neural networks.

Note that loss landscape considerations are more important than parameter-function considerations in the context of learning.

One of my core points is, precisely, to deny this claim. Without assumptions about the parameter-function map, you cannot make inferences from the characteristics of the loss landscape to conclusions about generalisation behaviour, and understanding generalisation behaviour is crucial for understanding learning. (Unless you mean something like "convergence behaviour" when you say "in the context of learning", in which case I agree, but then you would consider generalisation to be outside the scope of learning.)

For example it's not clear in your example why f(x) = 0 is likely to be learned

My point is precisely that it is not likely to be learned, given the setup I provided, even though it should be learned.

Learning bias in a NN should most fundamentally be understood relative to the weights, not higher-order concepts like Kolmogorov complexity (though as you point out, there might be a relationship between the two).

There is a relationship between the two, and I claim that this relationship underlies the mechanism behind why neural networks work well compared to other learning machines.

The thing is, the "complexity of f" (your K(f)) is not a very meaningful concept from the point of view of a neural net's learning

If we want to explain generalisation in neural networks, then we must explain if and how their inductive bias aligns with our (human) priors. Moreover, our human priors are (in most contexts) largely captured by computational complexity. Therefore, we must somewhere, in some way, connect neural networks to computational complexity.

indeed, there is no way to explain why generalizable networks like modular addition still sometimes memorize without understanding that the two are very distinct

Why not? The memorising solution has some nonzero "posterior" weight, so you would expect it to be found with some frequency. Does the empirical frequency of this solution deviate far from the theoretical prediction?

Comment by Joar Skalse (Logical_Lunatic) on Goodhart's Law in Reinforcement Learning · 2023-10-16T09:03:32.890Z · LW · GW

including stuff Joar has worked on

That is right! See this paper.

Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-10T13:28:01.122Z · LW · GW

 which animals cannot do at all, they can't write computer code or a mathematical paper

This is not obvious to me (at least not for some senses of the word "could"). Animals cannot be motivated into attempting to solve these tasks, and they cannot study maths or programming. If they could do those things, then it is not at all clear to me that they wouldn't be able to write code or maths papers. To make this more specific; insofar as humans rely on a capacity for general problem-solving in order to do maths and programming, it would not surprise me if many animals also have this capacity to a sufficient extent, but that it cannot be directed in the right way. Note that animals even outperform humans at some general cognitive tasks. For example, chimps have a much better short-term memory than humans.

Moreover, we know a lot about human performance at those tasks, and it's abysmal, even for top humans, and for AI research as a field.

Abysmal, compared to what? Yes, we can see that it is abysmal compared to what would in principle be information-theoretically possible. However, this doesn't tell us very much about whether or not it is abysmal compared to what is computationally possible.

The problem of finding the minimal complexity hypothesis for a given set of data is not computationally tractable. For Kolmogorov complexity, it is uncomputable, but even for Boolean complexity, it is at least exponentially difficult (depending a bit on how exactly the problem is formalised). This means that in order to reason effectively about large amounts of data, it is (presumably) necessary to model most of it using low-fidelity methods, and then (potentially) use various heuristics in order to determine what pieces of information deserve more attention. I would therefore expect a "saturated" AI system to also frequently miss things that look obvious in hindsight.
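As a toy illustration of how quickly exact minimal-complexity search blows up (my own sketch, using node count as the cost): computing the minimal {AND, OR, NOT}-formula size for every Boolean function of just 3 variables already requires repeatedly combining everything found so far with everything else, and there are 2^(2^n) functions of n variables, so the approach is hopeless for even modestly larger inputs:

```python
from itertools import product

n = 3
points = list(product((0, 1), repeat=n))
MASK = (1 << len(points)) - 1  # truth tables encoded as 2^n-bit integers

# size 1: the bare variables x_0, ..., x_{n-1}
best = {}
for i in range(n):
    tt = sum(1 << j for j, p in enumerate(points) if p[i])
    best[tt] = 1

# Relax to a fixed point: NOT adds 1 to the size, AND/OR of sizes s, s2 costs s + s2 + 1.
changed = True
combos = 0
while changed:
    changed = False
    items = list(best.items())
    for tt, s in items:
        cand = [(~tt & MASK, s + 1)]           # NOT
        for tt2, s2 in items:
            combos += 1
            cand.append((tt & tt2, s + s2 + 1))  # AND
            cand.append((tt | tt2, s + s2 + 1))  # OR
        for c_tt, c_s in cand:
            if c_s < best.get(c_tt, 10**9):
                best[c_tt] = c_s
                changed = True

print(f"found minimal sizes for {len(best)} of {2**2**n} functions, "
      f"after {combos} pairwise combinations")
```

Even at n = 3 the search touches hundreds of thousands of combinations; the number of functions to cover is doubly exponential in n, which is why exhaustive minimal-hypothesis search is not a viable model of cognition.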

So it seems that, at least, there is quite a bit of room for a large initial boost over the current human-equivalent capacity.

I agree that AI systems have many clear and obvious advantages, and that e.g. simply running them at a higher clock speed will give you a clear boost regardless of what assumptions we make about the "quality" of their cognition compared to that of humans. The question I'm concerned with is whether or not a takeoff scenario is better modeled as "AI quickly bootstraps to incomprehensible, Godlike intelligence through recursive self-improvement", or whether it is better modeled as "economic growth suddenly goes up by a lot". All the obvious advantages of AI systems are compatible with the latter.

Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-08T14:08:44.741Z · LW · GW

So, the claim is (of course) not that intelligence is zero-one. We know that this is not the case, from the fact that some people are smarter than other people. 

As for the other two points, see this comment and this comment.

Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-08T13:59:03.841Z · LW · GW

So, this model of a takeoff scenario makes certain assumptions about how intelligence works, and these assumptions may or may not be correct. In particular, it assumes that the initial AI systems are very far from being algorithmically optimal. We don't know whether or not this will be the case; that is what I am trying to highlight.

The task of extracting knowledge from data is a computational task, which has a certain complexity-theoretic hardness. We don't know what that hardness is, but there is a lower bound on how efficiently this task can be done. Similarly for all the other tasks of intelligence (such as planning, etc).

Strong recursive self-improvement (given a fixed amount of resources) is only possible if the first AI systems are very far from being algorithmically optimal at all the relevant computational tasks. This is not a given; it could be true or false. For example, while you can optimise a SAT-solver in many ways, it will at the end of the day necessarily have a worst-case exponential runtime complexity (unless P = NP).

Therefore, the question of how much more intelligent AI systems will end up being compared to humans, depends on how close the human brain algorithm is to being (roughly) Pareto-optimal for all the relevant computational tasks. We don't know the answer to this question. Strong, sustained recursive self-improvement is only possible if our initial AGI algorithm, and the human brain, both are very far from being Pareto-optimal.

Is this the case? You could point to the discrepancy between humans and animals, and argue that this demonstrates that there are cognitive algorithms that yield vastly different results given similar amounts of resources (in terms of data and parameters). However, the argument I've written casts doubt on whether or not this evidence is reliable. Given that, I think the case is no longer so clear; perhaps the human neural architecture is (within some small-ish polynomial factor of being) Pareto optimal for most relevant cognitive tasks.

Now, even if the human brain algorithm is roughly optimal, AI systems will almost certainly still end up with vastly more cognitive force (because they can be run faster, and given more memory and more data). However, I think that this scenario is different in relevant ways. In particular, without (strong) recursive self-improvement, you probably don't get an uncontrollable, exponential growth in intelligence, but rather a growth that is bottle-necked by resources which you could control (such as memory, data, CPU cycles, etc.).


Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-08T09:55:48.069Z · LW · GW

I don't have any good evidence that humans raised without language per se are less intelligent (if we understand "intelligence" to refer to a general ability to solve new problems). For example, Genie was raised in isolation for the first 13 years of her life, and never developed a first language. Some researchers have, for various reasons, guessed that she was born with average intelligence, but that she, as a 14-year old, had a mental age "between a 5- and 8-year-old". However, here we have the confounding factor that she also was severely abused, and that she got very little mental stimulus in general for the first 13 years of her life, which would presumably obstruct mental development independently of a lack of language. This makes it hard to draw any strong conclusions (and we would regardless have a very small number of data points).

However, just to clarify, the argument I'm making doesn't crucially rely on the assumption that a human with language is significantly more intelligent than a human without language, but rather on the (presumably much less controversial) assumption that language is a significant advantage regardless of whether or not it is also paired with an increase in intelligence. For example, it would not surprise me if orangutans with language (but orangutan-level intelligence) over time would outcompete humans without language (but otherwise human-level intelligence). This, in turn, makes it difficult to infer how intelligent humans are relative to animals based on what we have achieved relative to them.

For example, one might say 

"Humans have gone to space, but no other species is anywhere close to being able to do that. This proves that humans are vastly more intelligent than all other species."

However, without access to language, humans can't go to space either. Moreover, we don't know if orangutans would eventually be able to go to space if they did have access to language. This makes it quite non-trivial to make a direct comparison.

Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-08T09:11:00.228Z · LW · GW

I think the broad strokes are mostly similar, but that a bunch of relevant details are different.

Yes, a large collective of near-human AI that is allowed to interact freely over a (subjectively) long period of time is presumably roughly as hard to understand and control as a Bostrom/Yudkowsky-esque God in a box. However, in this scenario, we have the option to not allow free interaction between multiple instances, while still being able to extract useful work from them. It is also probably much easier to align a system that is not of overwhelming intelligence, and this could be done before the AIs are allowed to interact. We might also be able to significantly influence their collective behaviour by controlling the initial conditions of their interactions (similarly to how institutions and cultural norms have a substantial long-term impact on the trajectory of a country, for example). It is also more plausible that humans (or human simulations or emulations) could be kept in the loop for a long time period in this scenario. Moreover, if intelligence is bottle-necked by external resources (such as memory, data, CPU cycles, etc) rather than internal algorithmic efficiency, then you can exert more control over the resulting intelligence explosion by controlling those resources. Etc etc.

Comment by Joar Skalse (Logical_Lunatic) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-08T08:48:37.679Z · LW · GW

Note that this proposal is not about automating interpretability.

Comment by Joar Skalse (Logical_Lunatic) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-08T08:44:37.035Z · LW · GW

The point is that you (in theory) don't need to know whether or not the uninterpretable AGI is safe, if you are able to independently verify its output (similarly to how I can trust a mathematical proof, without trusting the mathematician).

Of course, in practice, the uninterpretable AGI presumably needs to be reasonably aligned for this to work. You must at the very least be able to motivate it to write code for you, without hiding any trojans or backdoors that you are not able to detect.

However, I think that this is likely to be much easier than solving the full alignment problem for sovereign agents. Writing software is a myopic task that can be accomplished without persistent, agentic preferences, which means that the base system could be much more tool-like than the system which it produces.

But regardless of that point, many arguments for why interpretability research will be helpful also apply to the strategy I outline above.  

Comment by Joar Skalse (Logical_Lunatic) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-02T22:54:30.668Z · LW · GW
  1. This is obviously true; any AI complete problem can be trivially reduced to the problem of writing an AI program that solves the problem. That isn't really a problem for the proposal here. The point isn't that we could avoid making AGI by doing this, the point is that we can do this in order to get AI systems that we can trust without having to solve interpretability.
  2. This is probably true, but the extent to which it is true is unclear. Moreover, if the inner workings of intelligence are fundamentally uninterpretable, then strong interpretability must also fail. I already commented on this in the last two paragraphs of the top-level post.
Comment by Joar Skalse (Logical_Lunatic) on How Smart Are Humans? · 2023-07-02T16:38:06.285Z · LW · GW

Yes, I agree with this. I mean, even if we assume that the AIs are basically equivalent to human simulations, they still get obvious advantages from the ability to be copy-pasted, the ability to be restored to a checkpoint, the ability to be run at higher clock speeds, and the ability to make credible pre-commitments, etc etc. I therefore certainly don't think there is any plausible scenario in which unchecked AI systems wouldn't end up with most of the power on earth. However, there is a meaningful difference between the scenario where their advantages mainly come from overwhelmingly great intelligence, and the scenario where their advantages mainly (or at least in large part) come from other sources. For example, scalable oversight is a more realistic possibility in the latter scenario than it is in the former scenario. Boxing methods are also more realistic in the latter scenario than the former scenario, etc.

Comment by Joar Skalse (Logical_Lunatic) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-02T15:39:24.733Z · LW · GW

No, I don't have any explicit examples of that. However, I don't think that the main issue with GOFAI systems necessarily is that they have bad performance. Rather, I think the main problem is that they are very difficult and laborious to create. Consider, for example, IBM Watson. I consider this system to be very impressive. However, it took a large team of experts four years of intense engineering to create Watson, whereas you probably could get similar performance in an afternoon by simply fine-tuning GPT-2. But this is less of a problem if you can use a fleet of LLM software engineers and have them spend 1,000 subjective years on the problem over the course of a weekend.

I also want to note that:
1. Some trade-off between performance and transparency is acceptable, as long as it is not too large. 
2. The system doesn't have to be an expert system: the important thing is just that it's transparent.
3. If it is impossible to create interpretable software for solving a particular task, then strong interpretability must also fail.


Comment by Joar Skalse (Logical_Lunatic) on Using (Uninterpretable) LLMs to Generate Interpretable AI Code · 2023-07-02T01:10:41.435Z · LW · GW

To clarify, the proposal is not (necessarily) to use an LLM to create an interpretable AI system that is isomorphic to the LLM -- their internal structure could be completely different. The key points are that the generated program is interpretable and trustworthy, and that it can solve some problem we are interested in. 

Comment by Joar Skalse (Logical_Lunatic) on My impression of singular learning theory · 2023-06-20T15:42:34.501Z · LW · GW

What is the exact derivation that gives you claim (1)?

Comment by Joar Skalse (Logical_Lunatic) on The basic reasons I expect AGI ruin · 2023-04-27T18:05:57.666Z · LW · GW

Empirically, the inductive bias that you get when you train with SGD, and similar optimisers, is in fact quite similar to the inductive bias that you would get, if you were to repeatedly re-initialise a neural network until you randomly get a set of weights that yield a low loss. Which optimiser you use does have an effect as well, but this is very small by comparison. See this paper.

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-23T13:05:17.995Z · LW · GW

The kinds of humans that we are worried about are the kinds of humans that can do original scientific research and autonomously form plans for taking over the world. Human brains learn to take actions and plans that previously led to high rewards (outcomes like eating food when hungry, having sex, etc)*. These two things are fundamentally not the same thing. Why, exactly, would we expect that a system that is good at the latter necessarily would be able to do the former?

This feels like a bit of a digression, but we do have concrete examples of systems that are good at eating food when hungry, having sex, and etc, without being able to do original scientific research and autonomously form plans for taking over the world; animals. And the difference between humans and animals isn't just that humans have more training data (or even that we are that much better at survival and reproduction in the environment of evolutionary adaptation). But I should also note that I consider argument 6 to be one of the weaker arguments I know of.

We know, from computer science, that it is very powerful to be able to reason in terms of variables and operations on variables. It seems hard to see how you could have human-level intelligence without this ability. However, humans do not typically have this ability, with most human brains instead being more analogous to Boolean circuits, given their finite size and architecture of neuron connections.

The fact that human brains have a finite size and architecture of neuron connections does not mean that they are well-modelled as Boolean circuits. For example, a (real-world) computer is better modelled as a Turing machine than as a finite-state automaton, even though there is a sense in which they actually are finite-state automata. 

The brain is made out of neurons, yes, but it matters a great deal how those neurons are connected. Depending on the answer to that question, you could end up with a system that behaves more like Boolean circuits, or more like a Turing machine, or more like something else.

With neural networks, the training algorithm and the architecture together determine how the neurons end up connected, and therefore whether the resulting system is better thought of as a Boolean circuit, or a Turing machine, or otherwise. If the wiring of the brain is determined by a different mechanism than what determines the wiring of a deep learning system, then the two systems could end up with very different properties, even if they are made out of similar kinds of parts.

With the brain, we don't know what determines the wiring. This makes it difficult to draw strong conclusions about the high-level behaviour of brains from their low-level physiology. With deep learning, it is easier to do this.

I find it hard to make the argument here because there is no argument -- it's just flatly asserted that neural networks don't use such representations, so all I can do is flatly assert that humans don't use such representations. If I had to guess, you would say something like "matrix multiplications don't seem like they can be discrete and combinatorial", to which I would say "the strength of brain neuron synapse firings doesn't seem like it can be discrete and combinatorial".

What representations you end up with does not just depend on the model space, it also depends on the learning algorithm. Matrix multiplications can be discrete and combinatorial. The question is if those are the kinds of representations that you in fact would end up with when you train a neural network by gradient descent, which to me seems unlikely. The brain does (most likely) not use gradient descent, so the argument does not apply to the brain.

Do you perhaps agree that you would have a hard time navigating in a 10-D space? Clearly you have simply memorized a bunch of heuristics that together are barely sufficient for navigating 3-D space, rather than truly understanding the underlying algorithm for navigating spaces.

It would obviously be harder for me to do this, and narrow heuristics are obviously an important part of intelligence. But I could do it, which suggests a stronger transfer ability than what would be suggested if I couldn't do this.

In some other parts, I feel like in many places you are being one-sidedly skeptical.

Yes, as I said, my goal with this post is not to present a balanced view of the issue. Rather, my goal is just to summarise as many arguments as possible for being skeptical of strong scaling. This makes the skepticism one-sided in some places.

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-18T23:43:32.637Z · LW · GW

The general rule I'm following is "if the argument would say false things about humans, then don't update on it".

Yes, this is of course very sensible. However, I don't see why these arguments would apply to humans, unless you make some additional assumption or connection that I am not making. Considering the rest of the conversation, I assume the difference is that you draw a stronger analogy between brains and deep learning systems than I do?

I want to ask a question that goes something like "how correlated is your credence that arguments 5-10 apply to human brains with your credence that human brains and deep learning systems are analogous in important sense X"? But because I don't quite know what your beliefs are, or why you say that arguments 5-10 apply to humans, I find it hard to formulate this question in the right way.

For example, regarding argument 7 (language of thought), consider the following two propositions:

  1. Some part of the human brain is hard-coded to use LoT-like representations, and the way that these representations are updated in response to experience is not analogous to gradient descent.
  2. Updating the parameters of a neural network with gradient descent is very unlikely to yield (and maintain) LoT-like representations.

These claims could both be true simultaneously, no? Why, concretely, do you think that arguments 5-10 apply to human brains?

I'm not seeing why that's evidence for the perspective. Even when word order is scrambled, if you see "= 32 44 +" and you have to predict the remaining number, you should predict some combination of 76, 12, and -12 to get optimal performance; to do that you need to be able to add and subtract, so the model presumably still develops addition and subtraction circuits. Similarly for text that involves logic and reasoning, even after scrambling word order it would still be helpful to use logic and reasoning to predict which words are likely to be present. The overall argument for why the resulting system would have strong, general capabilities seems to still go through.

It is empirically true that the resulting system has strong and general capabilities, there is no need to question that. What I mean is that this is evidence that those capabilities are a result of information processing that is quite dissimilar from what humans do, which in turn opens up the possibility that those processes could not be re-tooled to create the kind of system that could take over the world. In particular, they could be much more shallow than they seem.

It is not hard to argue that a model with general capabilities for reasoning, hypothesis generation, and world modelling, etc, would get a good score at the task of an LLM. However, I think one of the central lessons from the history of AI is that there probably also are many other ways to get a good score at this task.

In addition, I don't know why you expect that intelligence can't be implemented through "a truly massive ensemble of simple heuristics".

Given a sufficiently loose definition of "intelligence", I would expect that you almost certainly could do this. However, if we instead consider systems that would be able to overpower humanity, or very significantly shorten the amount of time before such a system could be created, then it is much less clear to me.

Why don't you think a big random forest classifier could lead to AGI?

I don't rule out the possibility, but it seems unlikely that such a system could learn representations and circuits that would enable sufficiently strong out-of-distribution generalisation.

But it is "forced" by the training data? The argument here is that text prediction is hard enough that the only way the network can do it (to a very very high standard) is to develop these sorts of representation?

I think this may be worth zooming in on. One of the main points I'm trying to get at is that it is not just the asymptotic behaviour of the system that matters; two other (plausibly connected) things that are at least as important are how well the system generalises out-of-distribution, and how much data it needs to attain that performance. In other words, how good it is at extrapolating from observed examples to new situations. A system could be very bad at this, and yet eventually, with enough training data, get good in-distribution performance.

The main point of LoT-like representations would be a better ability to generalise. This benefit is removed if you could only learn LoT-like representation by observing training data corresponding to all the cases you would like to generalise to.

I certainly agree that a randomly initialized network is not going to have sensible representations, just as I'd predict that a randomly initialized human brain is not going to have sensible representations (modulo maybe some innate representations encoded by the genome). I assume you are saying something different from that but I'm not sure what.

Yes, I am not saying that.

Maybe if I rephrase it this way; to get us to AGI, LLMs would need to have a sufficiently good inductive bias, but I'm not convinced that they actually have a sufficiently good inductive bias.

But why not? If I were to say "it seems as though the human brain works like a deep learning system, while of course being implemented somewhat differently", how would you argue against that?

It is hard for me to argue against this, without knowing in more detail what you mean by "like", and "somewhat differently", as well as knowing what pieces of evidence underpin this belief/impression.

I would be quite surprised if there aren't important high-level principles in common between deep learning and at least parts of the human brain (it would be a bit too much of a coincidence if not). However, this does not mean that deep learning (in its current form) captures most of the important factors behind human intelligence. Given that there are both clear physiological differences (some of which seem more significant than others) and many behavioural differences, I think that the default should be to assume that there are important principles of human cognition that are not captured by (current) deep learning.

I know several arguments in favour of drawing a strong analogy between the brain and deep learning, and I have arguments against those arguments. However, I don't know if you believe in any of these arguments (eg, some of them are arguments like "the brain is made out of neurons, therefore deep learning"), so I don't want to type out long replies before I know why you believe that human brains work like deep learning systems.

Oh, is your point "LLMs do not have a general notion of search that they can apply to arbitrary problems"? I agree this is currently true, whereas humans do have this. This doesn't seem too relevant to me, and I don't buy defining memorization as "things that are not general-purpose search" and then saying "things that do memorization are not intelligent", that seems too strong.

Yes, that was my point. I'm definitely not saying that intelligence = search, I just brought this up as an example of a case where GPT3 has an impressive ability, but where the mechanism behind that ability is better construed as "memorising the training data" rather than "understanding the problem". The fact that the example involved search was coincidental.

Do you actually endorse that response? Seems mostly false to me, except inasmuch as humans can write things down on external memory (which I expect an LLM could also easily do, we just haven't done that yet).

I don't actually know much about this, but that is the impression I have got from speaking with people who work on this. Introspectively, it also feels like it's very non-random what I remember. But if we want to go deeper into this track, I would probably need to look more closely at the research first.

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-16T22:07:05.145Z · LW · GW

But for all of them except argument 6, it seems like the same argument would imply that humans would not be generally intelligent.

Why is that?

Because text on the Internet sometimes involves people using logic, reasoning, hypothesis generation, analyzing experimental evidence, etc, and so plausibly the simplest program that successfully predicts that text would do so by replicating that logic, reasoning etc, which you could then chain together to make scientific progress.

What does the argument say in response?

There are a few ways to respond.

First of all, what comes after "plausibly" could just turn out to be wrong. Many people thought human-level chess would require human-like strategising, but this turned out to be wrong (though the case for text prediction is certainly more convincing).

Secondly, an LLM is almost certainly not learning the lowest K-complexity program for text prediction, and given that, the case becomes less clear. For example, suppose an LLM instead learns a truly massive ensemble of simple heuristics that together produce human-like text. It seems plausible that such an ensemble could produce convincing results, but without replicating logic, reasoning, etc. IBM Watson did something along these lines. Studies such as this one also provide some evidence for this perspective.

To give an intuition pump, suppose we trained an extremely large random forest classifier on the same data that GPT-3 was trained on. How good would the output of this classifier be? While it would probably not be as good as GPT-3, it would probably still be very impressive. And a random forest classifier is also a universal function approximator, whose performance keeps improving as it is given more training data; I'm sure there are scaling laws for them. But I don't think many people believe that we could get AGI by making a sufficiently big random forest classifier for next-token prediction. Why is that? I have found this to be an interesting prompt to think about. For me, a gestalt shift that makes long timelines seem plausible is to look at LLMs sort of like how you would look at a giant random forest classifier.
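To make the intuition pump concrete, here is a toy sketch (hypothetical code, not from the original discussion) of next-token prediction with a random forest. On a tiny repetitive corpus it predicts essentially perfectly, by memorising context-to-next-character statistics rather than by modelling grammar or meaning:

```python
# Toy sketch: next-character prediction with a random forest.
# Hypothetical illustration; corpus and hyperparameters are arbitrary choices.
from sklearn.ensemble import RandomForestClassifier

text = "the cat sat. " * 50  # tiny repetitive "corpus"
context = 3                  # predict the next character from the previous 3

# Encode each length-3 context as a feature vector of character codes.
X = [[ord(c) for c in text[i:i + context]] for i in range(len(text) - context)]
y = [text[i + context] for i in range(len(text) - context)]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# The forest "predicts" the corpus essentially perfectly, without anything
# resembling reasoning or world modelling.
print(clf.score(X, y))
```

The point of the sketch is only that good next-token performance on seen distributions is compatible with a purely memorisation-like mechanism; the open question is how far that mechanism scales.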

(Also, just to reiterate, I am not personally convinced of long timelines; I am just trying to make the best arguments for this view more easily available.)

How do you know neural networks won't use such representations?

I can't say this for sure, especially not for newer or more exotic architectures, but it certainly does not seem like these are the kinds of representations that deep learning systems are likely to learn. Rather, they seem much more likely to learn manifold-like representations, where proximity corresponds to relevant similarity, or something along those lines. Syntactically organised, combinatorial representations are certainly not very "native" to the deep learning paradigm.

It is worth clarifying that neural networks of course in principle could implement these representations, at least in the same sense as how a Boolean network can implement a Turing machine. The question is if they in practice can learn such representations in a reasonable way. Consider the example I gave with how an MLP can't learn an identity function, unless the training data essentially forces it to memorise one. The question is whether or not a similar thing is true of LoT-style representations. Can you think of a natural way to represent a LoT in a vector space, that a neural network might plausibly learn, without being "forced" by the training data?

As an extremely simple example, a CNN and an MLP will in practice not learn the same kinds of representations, even though the CNN model space is contained in the MLP model space (if you make them wide enough). How do I know that an MLP won't learn a CNN-like representation? Because these representations are not "natural" to MLPs, and the MLP will not be explicitly incentivised to learn them. My sense is that most deep learning systems are inclined away from LoT-like representations for similar reasons.
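The identity-function example mentioned above can be sketched as a small, hypothetical experiment (using scikit-learn; the exact architecture and numbers are my own illustrative assumptions): an MLP trained on f(x) = x over a narrow range fits it well there, but fails badly outside that range, because it has fit the training data rather than "learned identity" as a rule.

```python
# Toy sketch of the MLP identity-function point: good interpolation,
# poor extrapolation. Hypothetical illustration; details are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(500, 1))
y_train = X_train.ravel()  # target: f(x) = x

mlp = MLPRegressor(hidden_layer_sizes=(32,), activation="tanh",
                   max_iter=5000, random_state=0).fit(X_train, y_train)

# Small error inside the training range [-1, 1]...
in_range_err = abs(mlp.predict([[0.5]])[0] - 0.5)
# ...large error far outside it, since the saturated tanh units cannot
# represent "output = input" beyond the region the data forced them to fit.
far_err = abs(mlp.predict([[100.0]])[0] - 100.0)

print(in_range_err, far_err)
```

A symbolic learner that hypothesised the rule "output = input" would generalise to any input for free; that contrast is the sense in which LoT-like representations would have to be "forced" by data here.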

What is true of human brains but not of neural networks such that human brains can do this but neural networks can't?

A human brain is not a tabula rasa system trained by gradient descent. I don't know how a human brain is organised, what learning algorithms are used, or what parts are learnt as opposed to innate, etc, but it does not seem as though it works in the same way as a deep learning system. 

What is true of human brains but not neural networks such that human brains can represent programs but neural networks can't?

(I'd note that I'm including chain-of-thought as a way that neural networks can run programs.)

Here I will again just say that a human brain isn't a tabula rasa system trained by gradient descent, so it is not inherently surprising for one of the two to have a property that the other one does not.

Chain-of-thought and attention mechanisms certainly do seem to bring deep learning systems much closer to the ability to reason in terms of variables. Whether or not this is sufficient, I do not know.

I would bet that you can play chess, but you cannot fold a protein (even if the rules for protein were verbally described to you). What's the difference?

Why wouldn't I be able to fold a protein? At least if the size of the relevant state space is similar to that of eg chess.

(Also, to be clear, GPT-3 struggles with verbally described mazes with as few as ~5 states.)

Why doesn't this apply to humans as well? We forget stuff all the time.

The argument would have to be that humans are more strategic with what to remember, and what to forget.

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-16T10:07:44.367Z · LW · GW

"human-level question answering is believed to be AI-complete" - I doubt that. I think that [...]

Yes, these are also good points. Human-level question answering is often listed as a candidate for an AI-complete problem, but there are of course people who disagree. I'm inclined to say that question-answering probably is AI-complete, but that belief is not very strongly held. In your example of the painter, you could still convey a low-resolution version of the image as a grid of flat colours (similar to how images are represented in computers), and tell the painter to first paint that out, and then paint another version of what the grid image depicts.

We don't care that much about specific phrasing, and use the "loss" of how much the content makes sense, is true, is useful...

Yes, I agree. Humans are certainly better than the GPTs at producing "representative" text, rather than text that is likely on a word-by-word basis. My point there was just to show that "reaching human-level performance on next-token prediction" does not correspond to human-level intelligence (and has already been reached).

Memorization & generalisation - just noting that it is a spectrum rather than a dichotomy, as compression ratios are. Anyway, the current methods don't seem to generalise well enough to overcome the sparsity of public data in some domains - which may be the main bottleneck in (e.g.) RL anyway.

I agree.

let's spell out the obvious objection - it is obviously possible to implement discrete representations over continuous representations. This is why we can have digital computers that are based on electrical currents rather than little rocks. The problem is just that keeping it robustly discrete is hard, and probably very hard to learn.

Of course. The main question is if it is at all possible to actually learn these representations in a reasonable way. The main benefit from these kinds of representations would come from a much better ability to generalise, and this is only useful if they are also reasonably easy to learn. Consider my example with an MLP learning an identity function -- it can learn it, but it is by "memorising" it rather than "actually learning" it. For AGI, we would need a system that can learn combinatorial representations quickly, rather than learn them in the way that an MLP learns an identity function.

I think that problem may be solved easily with minor changes of architecture though, and therefore should not affect timelines.

Maybe, that remains to be seen. My impression is that the most senior AI researchers (Yoshua Bengio, Yann LeCun, Stuart Russell, etc) lean in the other direction (but I could be wrong about this). As I said, I feel a bit confused/uncertain about the force of the LoT argument.

Inductive logic programming - generalise well in a much more restricted hypothesis space, as one should expect based on learning theory.

To me, it is not at all obvious that ILP systems have a more restricted hypothesis space than deep learning systems. If anything, I would expect it to be the other way around (though this of course depends on the particular system -- I have mainly used metagol). Rather, the way I think of it is that ILP systems have a much stronger simplicity bias than deep learning systems, and that this is the main reason for why they can generalise better from small amounts of data (and the reason they don't work well in practice is that this training method is too expensive for more large-scale problems).

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-16T09:32:39.402Z · LW · GW

Ah, now I get your point, sorry. Yes, it is true that GPTs are not incentivised to reproduce the full data distribution, but rather, are incentivised to reproduce something more similar to a maximum-likelihood estimate point distribution. This means that they have lower variance (at least in the limit), which may improve performance in some domains, as you point out. But individual samples from the model will still have a high likelihood under the data distribution. 

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-16T09:26:39.501Z · LW · GW

The benchmarks tell you about what the existing systems do. They don't tell you about what's possible.

Of course. It is almost certainly possible to solve the problem of catastrophic forgetting, and the solution might not be that complicated either. My point is that it is a fairly significant problem that has not yet been solved, and that solving it probably requires some insight or idea that does not yet exist. You can achieve some degree of lifelong learning through regularised fine-tuning, but you cannot get anywhere near what would be required for human-level cognition.

You could summarize InstructGPT's lesson as "You can get huge capability gains by comparably minor things added on top".

Yes, I think that lesson has been proven quite conclusively now. I also found systems like PaLM-SayCan very convincing for this point. But the question is not whether or not you can get huge capability gains -- this is evidently true -- the question is whether you get close to AGI without new theoretical breakthroughs. I want to know if we are now on (and close to) the end of the critical path, or whether we should expect unforeseeable breakthroughs to throw us off course a few more times before then.

Comment by Joar Skalse (Logical_Lunatic) on Some Arguments Against Strong Scaling · 2023-01-14T00:38:26.593Z · LW · GW

If you look at a random text on the internet it would be very surprising if every word in it is the most likely word to follow based on previous words. 

I'm not completely sure what your point is here. Suppose you have a biased coin, that comes up heads with p=0.6 and tails with p=0.4. Suppose you flip it 10 times. Would it be surprising if you then get heads 10 times in a row? Yes, in a sense. But that is still the most likely individual sequence.
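The coin arithmetic can be checked directly by enumerating all length-10 sequences: all-heads is the single most likely sequence, even though as an aggregate event "exactly 6 heads" is vastly more likely than "10 heads".

```python
from itertools import product
from math import comb

p = 0.6  # probability of heads

def seq_prob(seq):
    """Probability of one specific heads/tails sequence (1 = heads)."""
    h = sum(seq)
    return p ** h * (1 - p) ** (len(seq) - h)

# All-heads is the most likely *individual* sequence of length 10...
all_heads = (1,) * 10
assert all(seq_prob(all_heads) >= seq_prob(s)
           for s in product([0, 1], repeat=10))
print(seq_prob(all_heads))  # 0.6**10, about 0.006

# ...but the *event* "exactly 6 heads" is far more likely than "10 heads":
p_six_heads = comb(10, 6) * p**6 * (1 - p)**4
print(p_six_heads)  # about 0.251
```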

The step from GPT-3 to InstructGPT and ChatGPT was not one of scaling up in terms of size of models and substantial increase in the amount of training data. [...] Over at DeepMind they have GATO, which is an approach that combines a large language model with other problem sets. 

I would consider InstructGPT, ChatGPT, GATO, and similar systems, to all be in the general reference class of systems that are "mostly big transformers, trained in a self-supervised way, with some comparably minor things added on top".

That's just not true for ChatGPT. ChatGPT was very fast in learning how people tricked it to produce TOS violating content. 

I'm not sure if this has been made public, but I would be surprised if it was achieved by (substantial) retraining of the underlying foundation model. My guess is that it was achieved mainly by various filters put on top, though it is possible that fine-tuning was used. Regardless, catastrophic forgetting remains a fundamental issue. There are various benchmarks you can take a look at, if you want.

If you ask ChatGPT to multiply two 4-digit numbers it writes out the reasoning process in natural language and comes to the right answer. ChatGPT is already today decent at using language for its reasoning process. 

A system can multiply two 4-digit numbers and explain the reasoning process without exhibiting productivity and systematicity to the degree that an AGI would have to. Again, the point is not quite whether or not the system can use language to reason, the point is how it represents propositions, and what that tells us about its ability to generalise (the LoT hypothesis should really have been given a different name...).

Comment by Joar Skalse (Logical_Lunatic) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-03T14:46:57.781Z · LW · GW

What I'm suggesting is that volume in high-dimensions can concentrate on the boundary.

Yes. I imagine this is why overtraining doesn't make a huge difference.

Falsifiable Hypothesis: Compare SGD with overtraining to the random sampling algorithm. You will see that functions that are unlikely to be generated by random sampling will be more likely under SGD with overtraining. Moreover, functions that are more likely with random sampling will become less likely under SGD with overtraining.

See e.g., page 47 in the main paper.

Comment by Joar Skalse (Logical_Lunatic) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-03T14:46:41.081Z · LW · GW

(Maybe you weren't disagreeing with Zach and were just saying the same thing a different way?)

I'm honestly not sure, I just wasn't really sure what he meant when he said that the Bayesian and the Kolmogorov complexity stuff were "distractions from the main point".

This feels similar to:

Saying that MLK was a "criminal" is one way of saying that MLK thought and acted as though he had a moral responsibility to break unjust laws and to take direct action.

(This is an exaggeration but I think it is directionally correct. Certainly when I read the title "neural networks are fundamentally Bayesian" I was thinking of something very different.)

Haha. That's obviously not what we're trying to do here, but I do see what you mean. I originally wanted to express these ideas in more geometric language, rather than probability-theoretic language, but in the end we decided to go for more probability-theoretic language anyway. 

I agree that this arguably could be mildly misleading. For example, the correspondence between SGD and Bayesian sampling only really holds for some initialisation distributions. If you deterministically initialise your neural network to the origin (i.e., all zero weights) then SGD will most certainly not behave like Bayesian sampling with the initialisation distribution as its prior. Then again, the probability-theoretic formulation might make other things more intuitive.
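The all-zero-initialisation counterexample can be made concrete with a toy numpy MLP (hypothetical architecture, chosen only to exhibit the symmetry): with zero weights and no biases, a tanh network sits at a stationary point -- all gradients vanish, so SGD never moves at all, which is nothing like drawing samples with the initialisation distribution as a prior.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(32, 3))
y = rng.normal(size=(32, 1))

# Two-layer tanh net, deterministically initialised at the origin.
W1 = np.zeros((3, 4))
W2 = np.zeros((4, 1))

for _ in range(100):
    h = np.tanh(X @ W1)   # all zeros, since W1 == 0
    pred = h @ W2
    err = pred - y
    # Backprop for squared error: every gradient term below is
    # multiplied by a zero weight matrix or zero activations.
    dW2 = h.T @ err / len(X)
    dh = err @ W2.T * (1 - h**2)
    dW1 = X.T @ dh / len(X)
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

# The origin is a symmetric stationary point: after 100 steps of
# gradient descent, the weights have not moved at all.
print(np.abs(W1).max(), np.abs(W2).max())
```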

Comment by Joar Skalse (Logical_Lunatic) on Why Neural Networks Generalise, and Why They Are (Kind of) Bayesian · 2021-03-03T14:46:22.257Z · LW · GW

I agree with your summary. I'm mainly just clarifying what my view is of the strength and overall role of the Algorithmic Information Theory arguments, since you said you found them unconvincing. 

I do however disagree that those arguments can be applied to "literally any machine learning algorithm", although they certainly do apply to a much larger class of ML algorithms than just neural networks. However, I also don't think this is necessarily a bad thing. The picture that the AIT arguments give makes it reasonably unsurprising that you would get the double-descent phenomenon as you increase the size of a model (at small sizes VC-dimension-style mechanisms dominate, but at larger sizes the overparameterisation starts to induce a simplicity bias, which eventually starts to dominate). Since you get double descent in the model size for both neural networks and e.g. random forests, you should expect there to be some mechanism in common between them (even if the details of course differ from case to case).