Wittgenstein and ML — parameters vs architecture

post by Cleo Nardo (strawberry calm) · 2023-03-24T04:54:07.648Z

Contents

  1. Deep Learning
  2. Symbolic Logic
  3. Wittgenstein
  4. Alignment relevance

Status: a brief distillation of Wittgenstein's book On Certainty, using examples from deep learning and GOFAI, plus discussion of AI alignment and interpretability.


"That is to say, the questions that we raise and our doubts depend on the fact that some propositions are exempt from doubt, are as it were like hinges on which those turn."

— Ludwig Wittgenstein, On Certainty

1. Deep Learning

Suppose we want a neural network to detect whether two children are siblings based on photographs of their faces. The network will receive two $n$-dimensional vectors $x$ and $y$ representing the pixels in each image, and will return a value $z$ which we interpret as the log-odds that the children are siblings. So the model has type-signature $f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$.

There are two ways we can do this.

  1. We could use an architecture $f_\theta(x, y) = \sigma(x^\top M y + b)$, where —
    • $\sigma$ is the sigmoid function,
    • $M$ is an $n \times n$ matrix of learned parameters,
    • $b \in \mathbb{R}$ is a learned bias.
    • This model has $n^2 + 1$ free parameters.
  2. Alternatively, we could use an architecture $f_\theta(x, y) = \sigma(x^\top (M + M^\top) y + b)$, where —
    • $\sigma$ is the sigmoid function,
    • $M$ is an $n \times n$ upper-triangular matrix of learned parameters,
    • $b \in \mathbb{R}$ is a learned bias.
    • This model has $\frac{n(n+1)}{2} + 1$ free parameters.

Each model has a vector of free parameters $\theta$. If we train the model via SGD on a dataset (or via some other method), we will end up with a trained model $f^A_{\theta^\ast}$, where $A \in \{1, 2\}$ is the architecture.
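A minimal sketch of the two architectures in NumPy (the variable names and toy dimension are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 4                                    # toy input dimension
rng = np.random.default_rng(0)

# Architecture 1: a full n-by-n matrix M of learned parameters,
# plus a bias — n^2 + 1 free parameters in total.
def model1(x, y, M, b):
    return sigmoid(x @ M @ y + b)

# Architecture 2: an upper-triangular matrix M of learned parameters,
# plus a bias — n(n+1)/2 + 1 free parameters. The effective matrix
# M + M^T is symmetric by construction, whatever values are learned.
def model2(x, y, M, b):
    return sigmoid(x @ (M + M.T) @ y + b)

M1 = rng.normal(size=(n, n))             # Model 1's learned parameters
M2 = np.triu(rng.normal(size=(n, n)))    # Model 2's learned parameters (upper triangle only)
b = 0.1

x, y = rng.normal(size=n), rng.normal(size=n)
print(model1(x, y, M1, b), model1(y, x, M1, b))   # generally differ
print(model2(x, y, M2, b), model2(y, x, M2, b))   # always equal
```

The last two lines preview the point below: swapping the two inputs can change Model 1's output, but can never change Model 2's.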

Anyway, we now have two different NN models, and we want to ascribe beliefs to each of them. Consider the proposition $\Phi$ that siblingness is symmetric, i.e. every person is a sibling of their siblings. What does it mean to say that a model knows or believes that $\Phi$?

Let's start with a black-box definition of knowledge or belief: when we say that a model knows or believes that $\Phi$, we mean that $f(x, y) \approx f(y, x)$ for all $x, y$ which look sufficiently like faces. According to this black-box definition, both trained models believe $\Phi$.

But if we peer inside the black box, we can see that NN Model 1 believes $\Phi$ in a very different way than how NN Model 2 believes $\Phi$. For NN Model 1, the belief is encoded in the learned parameters $\theta$: SGD happens to find a matrix $M$ that is (approximately) symmetric. For NN Model 2, the belief is encoded in the architecture itself: $x^\top (M + M^\top) y$ is unchanged when $x$ and $y$ are swapped, whatever parameters have been learned.

These are two different kinds of belief.
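Continuing the NumPy sketch above (still illustrative), perturbing the parameters makes the difference concrete: Model 1's symmetry survives only so long as its learned matrix stays symmetric, whereas Model 2's is enforced by the architecture for any parameter values.

```python
# Continuing the NumPy sketch above.

def black_box_believes_symmetry(model, params, b, trials=100, tol=1e-9):
    """Crude black-box test: does swapping the two faces ever change the output?"""
    for _ in range(trials):
        u, v = rng.normal(size=n), rng.normal(size=n)
        if abs(model(u, v, params, b) - model(v, u, params, b)) > tol:
            return False
    return True

M1_trained = (M1 + M1.T) / 2             # pretend SGD converged to a symmetric matrix
print(black_box_believes_symmetry(model1, M1_trained, b))                                   # True
print(black_box_believes_symmetry(model1, M1_trained + 0.1 * rng.normal(size=(n, n)), b))   # False: perturbation destroys it
print(black_box_believes_symmetry(model2, np.triu(rng.normal(size=(n, n))), b))             # True for any parameters
```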

2. Symbolic Logic

Suppose we use GOFAI/symbolic logic to determine whether two children are siblings.

Our model consists of three things —

  1. A language $\mathcal{L}$ consisting of names and binary familial relations.
  2. A knowledge-base $KB$ consisting of $\mathcal{L}$-formulae.
  3. A deductive system $\vdash$ which takes a set of $\mathcal{L}$-formulae (premises) to a larger set of $\mathcal{L}$-formulae (conclusions).

There are two ways we can do this.

  1. We could use a system $(\mathcal{L}_1, KB_1, \vdash_1)$, where —
    • The language $\mathcal{L}_1$ has names for every character and familial relations such as $\mathrm{Sibling}(-,-)$.
    • The knowledge-base $KB_1$ has axioms recording particular familial facts, plus the symmetry axiom $\forall x\, \forall y\, (\mathrm{Sibling}(x, y) \to \mathrm{Sibling}(y, x))$.
    • The deductive system $\vdash_1$ corresponds to first-order predicate logic.
  2. Alternatively, we could use a system $(\mathcal{L}_2, KB_2, \vdash_2)$, where —
    • The language $\mathcal{L}_2$ has names for every character and familial relations such as $\mathrm{Sibling}(-,-)$.
    • The knowledge-base $KB_2$ has axioms recording the same particular familial facts, but no symmetry axiom.
    • The deductive system $\vdash_2$ corresponds to first-order predicate logic with an additional logical rule $\mathrm{Sibling}(x, y) \vdash \mathrm{Sibling}(y, x)$.
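A minimal sketch of the contrast in Python, assuming a toy forward-chaining closure rather than a full first-order prover (the representation and the names are illustrative): System 1 keeps the symmetry of siblinghood as an axiom inside its knowledge-base, while System 2 builds it into the deduction step itself.

```python
# Ground facts are tuples like ("Sibling", "alice", "bob"); the universally
# quantified symmetry axiom is represented by a single sentinel sentence.
SYMMETRY_AXIOM = "forall x, y: Sibling(x, y) -> Sibling(y, x)"

def closure(kb, extra_rules=()):
    """Toy stand-in for deduction: close a knowledge-base under the applicable rules."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        new = set()
        for s in kb:
            if isinstance(s, tuple) and s[0] == "Sibling":
                if SYMMETRY_AXIOM in kb:           # the axiom, if present, licenses the flip
                    new.add(("Sibling", s[2], s[1]))
                for rule in extra_rules:           # rules of the deductive system itself
                    new.update(rule(s))
        if not new <= kb:
            kb |= new
            changed = True
    return kb

def sibling_symmetry_rule(fact):
    """The extra logical rule Sibling(x, y) |- Sibling(y, x)."""
    return [("Sibling", fact[2], fact[1])] if fact[0] == "Sibling" else []

# System 1: the symmetry is an axiom in the knowledge-base; deduction is generic.
KB1 = {("Sibling", "alice", "bob"), SYMMETRY_AXIOM}
def derive_1(kb):
    return closure(kb)

# System 2: the knowledge-base has no symmetry axiom; the deductive system has the rule.
KB2 = {("Sibling", "alice", "bob")}
def derive_2(kb):
    return closure(kb, extra_rules=[sibling_symmetry_rule])
```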

In this situation, we have two different SL models, and we want to ascribe beliefs to each of them. Consider the proposition $\Phi$ that siblingness is symmetric, i.e. every person is a sibling of their siblings.

Let's start with a black-box definition of knowledge or belief: when we say that a model knows or believes that $\Phi$, we mean that $KB \vdash \mathrm{Sibling}(a, b)$ implies $KB \vdash \mathrm{Sibling}(b, a)$, for every pair of closed $\mathcal{L}$-terms $a, b$. According to this black-box definition, both models believe $\Phi$.

But if we peer inside the black box, we can see that SL Model 1 believes $\Phi$ in a very different way than how SL Model 2 believes $\Phi$.

These are two different kinds of belief. Can you see how they map onto the distinction in the previous section?
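As a quick check, using the toy systems sketched above: both pass the black-box test, but editing the knowledge-bases shows where each belief lives.

```python
# Black-box test: whenever Sibling(a, b) is derivable, is Sibling(b, a) derivable too?
def black_box_believes_symmetry(derive, kb):
    derived = derive(kb)
    sib_facts = [s for s in derived if isinstance(s, tuple) and s[0] == "Sibling"]
    return all(("Sibling", b, a) in derived for (_, a, b) in sib_facts)

print(black_box_believes_symmetry(derive_1, KB1))                      # True
print(black_box_believes_symmetry(derive_2, KB2))                      # True
# But the two beliefs come apart once we edit the knowledge-bases:
print(black_box_believes_symmetry(derive_1, KB1 - {SYMMETRY_AXIOM}))   # False: the belief lived in KB1
print(black_box_believes_symmetry(derive_2, KB2 | {("Sibling", "carol", "dan")}))  # True: the belief lives in the deductive system
```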

3. Wittgenstein

In On Certainty, Wittgenstein contrasts two different kinds of belief.

|              | Perception               | Judgement           |
|--------------|--------------------------|---------------------|
| Free belief  | This cat is furry        | Today is a Thursday |
| Hinge belief | There are three colours  |                     |

4. Alignment relevance

9 comments


comment by philipn · 2023-03-24T07:30:55.545Z · LW(p) · GW(p)

Could you elaborate on "For NN Model 1, the belief is encoded in the learned parameters $\theta$. For NN Model 2, the belief is encoded in the architecture itself"?

Replies from: philh
comment by philh · 2023-03-26T10:27:57.444Z · LW(p) · GW(p)

If $M = M^\top$ (i.e. $M$ is symmetric), then $f(x, y) = f(y, x)$. The first model would (we suppose) learn a symmetric $M$, because in reality siblingness is symmetric. The second model uses a matrix that will always be symmetric, no matter what it's learned.

(In reality the first model presumably wouldn't learn an exactly-symmetric matrix, but we could talk about "close enough" and/or about behavior in the limit.)

Replies from: strawberry calm
comment by Cleo Nardo (strawberry calm) · 2023-03-26T12:06:31.154Z · LW(p) · GW(p)

Yep, exactly!

Two things to note:

(1)

Note that the distinction between hinge beliefs and free beliefs does not supervene on the black-box behaviour of NNs/LLMs. It depends on how the belief is implemented, how the belief is learned, how the belief might change, etc.

(2)

"The second model uses a matrix that will always be symmetric, no matter what it's learned." might make it seem that the two models are more similar than they actually are.

You might think that both models store an $n \times n$ matrix $M$, and the architecture of both models is $f_\theta(x, y) = \sigma(x^\top M y + b)$, but Model 1 has an approximately symmetric matrix $M$ whereas Model 2 has an exactly symmetric matrix $M$. But this isn't true. The second model doesn't store a symmetric matrix — it stores an upper triangle.

comment by abhayesian · 2023-03-24T13:35:41.072Z · LW(p) · GW(p)

I do not think that "101 is a prime number" and  "I am currently on Earth" are implemented that differently in my brain; they both seem to be implemented in parameters rather than architecture.  I guess they also wouldn't be implemented differently in modern-day LLMs.  Maybe the relevant extension to LLMs would be the facts the model would think of when prompted with the empty string vs. some other detailed prompt.

Replies from: strawberry calm
comment by Cleo Nardo (strawberry calm) · 2023-03-24T14:56:50.339Z · LW(p) · GW(p)

The proposition "I am currently on Earth" is implemented both in the parameters and in the architecture, independently.

Replies from: abhayesian
comment by abhayesian · 2023-03-24T15:07:30.972Z · LW(p) · GW(p)

How can "I am currently on Earth" be encoded directly into the structure of the brain?  I also feel that "101 is a prime number" is more fundamental to me (being about logical structure rather than physical structure) than currently being on Earth, so I'm having a hard time understanding why this is not considered a hinge belief.

comment by Mateusz Bagiński (mateusz-baginski) · 2024-11-09T14:28:14.647Z · LW(p) · GW(p)

Thanks for the post! I expected some mumbo jumbo but it turned out to be an interesting intuition pump.

comment by Marc Carauleanu (Marc-Everin Carauleanu) · 2023-03-29T11:24:28.750Z · LW(p) · GW(p)

It feels like, if the agent is generally intelligent enough, hinge beliefs could be reasoned/fine-tuned against for the purposes of a better model of the world. This would mean that the priors from the hinge beliefs would still be present, but the free parameters would update to try to account for them, at least on a conceptual level. Examples would include general relativity, quantum mechanics, and potentially even paraconsistent logic, for which some humans have tried to update their free parameters to account as much as possible for their hinge beliefs for the purpose of better modelling the world (we should expect this in AGI, as it is an instrumentally convergent goal). Moreover, a sufficiently capable agent could self-modify to get rid of the limiting hinge beliefs for the same reasons. This problem could be averted if the hinge beliefs/priors defined the agent's goals, but goals seem to be fairly specific and about concepts in a world model, whereas hinge beliefs tend to be more general, e.g. how those concepts relate. Therefore, I'm uncertain how stable alignment solutions that rely on hinge beliefs would be.

comment by Bill Benzon (bill-benzon) · 2023-03-24T10:20:53.671Z · LW(p) · GW(p)

LessWrong's favourite analogy — the map and the territory.

Ah, that explains so much about this place.

Random not-so-random factoid. One of Wittgenstein's students was a woman named Margaret Masterman. She became a very distinguished British academic and was a pioneer in the field of computational linguistics. I believe she was the first to program a computer to generate haiku, back in the late 60s. Yes, primitive by today's standards. But the first.