Posts

The Internal Model Principle: A Straightforward Explanation 2025-04-12T10:58:51.479Z
Towards building blocks of ontologies 2025-02-08T16:03:29.854Z
Ruling Out Lookup Tables 2025-02-04T10:39:34.899Z
A Generalization of the Good Regulator Theorem 2025-01-04T09:55:25.432Z
A Straightforward Explanation of the Good Regulator Theorem 2024-11-18T12:45:48.568Z
Abstractions are not Natural 2024-11-04T11:10:09.023Z

Comments

Comment by Alfred Harwood on $500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need? · 2025-04-23T14:52:10.205Z · LW · GW

This seems like an interesting problem! I've been thinking about it a little bit but wanted to make sure I understood before diving in too deep. Can I see if I understand this by going through the biased coin example?

Suppose I have 2^5 coins and each one is given a unique 5-bit string label, covering all binary strings from 00000 to 11111. Call the string on the label λ.

The label given to the coin indicates its 'true' bias. The string 00000 indicates that the coin with that label has p(heads)=0. The coin labelled 11111 has p(heads)=1. The ‘true’ p(heads) increases in equal steps going up from 00000 to 00001 to 00010 etc. Suppose I randomly pick a coin from this collection, toss it 200 times and call the number of heads X_1. Then I toss it another 200 times and call the number of heads X_2.

Now, if I tell you what the label on the coin was (which tells us the true bias of the coin), telling you X_1 would not give you any more information to help you guess X_2 (and vice versa). This is the first Natural Latent condition (λ induces independence between X_1 and X_2). Alternatively, if I didn't tell you the label, you could estimate it from either X_1 or X_2 equally well. This corresponds to the other two diagrams.

I think that the full label λ will be an approximate stochastic natural latent. But if we consider only the first bit[1] of the label (which roughly tells us whether the bias is above or below 50% heads), then this bit will be a deterministic natural latent, because with reasonably high certainty you can guess the first bit of λ from X_1 or X_2. This is because the conditional entropy H(first bit of λ | X_1) is low. On the other hand, H(λ | X_1) will be high. If I get only 23 heads out of 200 tosses, I can be reasonably certain that the first bit of λ is a 0 (i.e. the coin has a less than 50% chance of coming up heads), but can't be as certain what the last bit of λ is. Just because λ satisfies the natural latent conditions within ε, this doesn't imply that λ satisfies the determinism condition within ε. We can use X_1 to find a 5-bit estimate of λ, but most of the useful information in that estimate is contained in the first bit. The second bit might be somewhat useful, but it's less certain than the first. The last bit of the estimate will largely be noise. This means that going from using λ to using 'first bit of λ' doesn't decrease the usefulness of the latent very much, since the stuff we are throwing out is largely random. As a result, the 'first bit of λ' will still satisfy the natural latent conditions almost as well as λ. By throwing out the later bits, we threw away the most 'stochastic' bits, while keeping the most 'latenty' bits.
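
Here is a quick simulation sketch of that asymmetry (my own illustration, not part of the original problem; the mapping of labels 0–31 to biases λ/31 and the threshold-at-100-heads estimator are assumptions made just for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels, n_tosses, n_trials = 32, 200, 100_000

# Labels 0..31 stand in for the 5-bit strings 00000..11111; the 'true' bias
# of the coin with label lam is lam / 31, increasing in equal steps.
labels = rng.integers(0, n_labels, size=n_trials)
biases = labels / (n_labels - 1)
x1 = rng.binomial(n_tosses, biases)   # X_1: heads in the first 200 tosses

# Guess the first (most significant) bit of the label from X_1 alone.
# That bit is 1 iff the label is >= 16, i.e. the bias is above ~0.5,
# so thresholding X_1 at 100 heads is the natural estimator.
true_first_bit = labels >= 16
guessed_first_bit = x1 > n_tosses / 2
print("first-bit accuracy from X_1:", (true_first_bit == guessed_first_bit).mean())

# Guessing the full 5-bit label from X_1 is much less reliable, which is the
# sense in which H(first bit | X_1) is low while H(label | X_1) is not.
guessed_label = np.clip(np.round(x1 / n_tosses * (n_labels - 1)), 0, n_labels - 1)
print("full-label accuracy from X_1:", (guessed_label == labels).mean())
```

With these assumed numbers the first-bit guess should come out right the vast majority of the time, while the full-label guess is right only a minority of the time, which is the asymmetry the argument above relies on.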

So in this case, we have started from a stochastic natural latent and used it to construct a deterministic natural latent which is almost as good. I haven't done the calculation, but hopefully we could say something like 'if λ satisfies the natural latent conditions within ε, then the first bit of λ satisfies the natural latent conditions within something like 2ε (or some other simple function of ε)'. Would an explicit proof of a statement like this for this case be a special case of the general problem?

The problem could then be framed as something like: "Is there some standard process we can apply to every stochastic natural latent in order to obtain a deterministic natural latent which is almost as good (in terms of ε)?" This process would be analogous to the 'throwing away the less useful/more random bits of λ' which we did in the example above. Does this sound right?

Also, can all stochastic natural latents be thought of as 'more approximate' deterministic latents? If a latent satisfies the three natural latent conditions within ε, we can always find a (potentially much bigger) ε′ such that this latent also satisfies the deterministic latent condition, right? This is why you need to specify that the problem is about showing that a deterministic natural latent exists with 'almost the same' ε. Does this sound right?

 

 

  1. ^

    I'm going to talk about the 'first bit' but an equivalent argument might also hold for the 'first two bits' or something. I haven't actually checked the maths. 

Comment by Alfred Harwood on Ruling Out Lookup Tables · 2025-02-21T16:42:24.513Z · LW · GW

It seems that the relevant thing is not so much how many values you have tested as the domain size of the function. A function with a large domain cannot be explicitly represented with a small lookup table. But this means you also have to consider how the black box behaves when you feed it something outside of its domain, right?

This sounds right. Implicitly, I was assuming that when the black box was fed an x outside of its domain it would return an error message or, at least, something which is not equal to f(x). I realise that I didn't make this clear in the post.

But what if it is a lookup table, but the index is taken mod the length of the table? Or more generally, what if a hash function is applied to the index first?

In the post I was considering programs which are 'just' lookup tables. But I agree that there are other programs that use lookup tables but are not 'just' tables. The motivation is that if you want to simulate f(x) over a large domain using a program smaller than the bound given in the post, then the program needs to contain some feature which captures something nontrivial about the nature of f.

Like (simple example just to be concrete) suppose f(x) = [x (mod 10)]^2, defined on all integers. Then the program in the black box might have two steps: 1) compute x (mod 10) using a non-lookup-table method, then 2) use a lookup table containing x^2 for x = 0 to 9. This (by my slightly arbitrary criterion) would count as a program which 'actually computes the function', since it has fixed length and its input/output behaviour coincides with f(x) for all possible inputs. It seems to me that if a black-box program contains a subsection which computes x (mod 10) then it has captured something important about f(x), more so than if the program only contains a big lookup table.
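
To make that two-step structure concrete, here is a minimal sketch in Python (my own illustration, not from the post or the parent comment), using the integer-domain version of the example:

```python
# Hybrid program for f(x) = (x mod 10)^2 on all integers:
# step 1 is computed directly, step 2 uses a 10-row lookup table.

SQUARES = {r: r * r for r in range(10)}  # lookup table: residue -> residue^2

def f_hybrid(x: int) -> int:
    r = x % 10           # step 1: not a lookup; captures structure of f
    return SQUARES[r]    # step 2: lookup over only 10 cases

# The program has fixed length but agrees with f on every integer input,
# which a pure lookup table of bounded size could not do.
assert all(f_hybrid(x) == (x % 10) ** 2 for x in range(-1000, 1000))
```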

It seems that counting the number of unique output values taken by the function is more robustly lower-bounds its size as a (generalized) lookup table?

I think this is right. If you wanted to rule out lookup tables being used at any point in the program then this is bottlenecked by the number of unique outputs. 

Comment by Alfred Harwood on Ruling Out Lookup Tables · 2025-02-07T11:55:40.789Z · LW · GW

Cool, that all sounds fair to me. I don't think we have any substantive disagreements.

Comment by Alfred Harwood on Ruling Out Lookup Tables · 2025-02-05T11:33:03.223Z · LW · GW

hmm, we seem to be talking past each other a bit. I think my main point in response is something like this:

In non-trivial settings, (some but not all) structural differences between programs lead to differences in input/output behaviour, even if there is a large domain for which they are behaviourally equivalent.

But that sentence lacks a lot of nuance! I'll try to break it down a bit more to find if/where we disagree (so apologies if a lot of this is re-hashing).

  • I agree that if two programs produce the same input output behaviour for literally every conceivable input then there is not much of an interesting difference between them and you can just view one as a compression of the other.
  • As I said in the post, I consider a program to be 'actually calculating the function' if it is a finite program which will return f(x) for every possible input x.
  • If we have a finite length lookup table, it can only output f(x) for a finite number of inputs.
  • If that finite number of inputs is less than the total number of possible inputs, this means that there is at least one input (call it x_0) for which a lookup table will not output f(x_0).
  • I've left unspecified what it will do if you query the lookup table with input x_0. Maybe it doesn't return anything, maybe it outputs an error message, maybe it blows up. The point is that whatever it does, by definition it doesn't return f(x_0).
  • Maybe the number of possible inputs to f is finite and the lookup table is large enough to accommodate them all. In this case, the lookup table would be a 'finite program which will return f(x) for every possible input x', so I would be happy to say that there's not much of a distinction between the lookup table and a different method of computing the function. (Of course, there is a trivial difference in the sense that they are different algorithms, but it's not the one we are concerned with here).
  • However, this requires that the size of the program is on the order of (or larger than) the number of possible inputs it might face. I think that in most interesting cases this is not true.
  • By 'interesting cases', I mean things like humans, who do not contain an internal representation of every possible sensory input, or LLMs, where the number of possible sentences you could input is far larger than the size of the model itself.
  • This means that in 'interesting cases', a lookup table will, at some point, give a different output to a program which is directly calculating a function.
  • So structural difference (of the 'lookup table or not' kind) between two finite programs will imply behavioural difference in some domain, even if there is a large domain for which the two programs are behaviourally equivalent (for an environment where the number of possible inputs is larger than the size of the programs).
  • As I see it, this is the motivation behind the Agent-like structure problem. If you know that a program has agent-like structure, this can help you predict its behaviour in domains where you haven't seen it perform.
  • Or, conversely, if you know that selecting for certain behaviours is likely to lead to agent-like structure you can avoid selecting for those behaviours (even if, within a certain domain, those behaviours are benign) because in other domains agent-like behaviour is dangerous.
  • Of course, there are far more types of programs than just 'lookup table' or 'not lookup table'. Lookup tables are just one obvious way for which a program might exhibit certain behaviour in a finite domain which doesn't extend to larger domains.

Comment by Alfred Harwood on Ruling Out Lookup Tables · 2025-02-04T21:15:18.898Z · LW · GW

For most practical purposes, "calculating a function" is only and exactly a very good compression algorithm for the lookup table.  

I think I disagree. Something like this might be true if you just care about input and output behaviour (it becomes true by definition if you consider that any functions with the same input/output behaviour are just different compressions of each other). But it seems to me that how outputs are generated is an important distinction to make. 

I think the difference goes beyond 'heat dissipation or imputed qualia'. As a crude example, imagine that, for some environment f(x) is an 'optimal strategy' (in some sense) for inputs x. Suppose we train AIs in this environment and AI A learns to compute f(x) directly whereas AI B learns to implement a lookup table. Based on performance alone, both A and B are equal, since they have equivalent behaviour. But it seems to me that there is a meaningful distinction between the two. These differences are important since you could use them to make predictions about how the AIs might behave in different environments or when exposed to different inputs.

I agree that there are more degrees of distinction between algorithms than just "lookup table or no". These are interesting, just not covered in the post!

Comment by Alfred Harwood on Ruling Out Lookup Tables · 2025-02-04T20:15:42.678Z · LW · GW

I agree that there is a spectrum of ways to compute f(x) ranging from efficient to inefficient (in terms of program length). But I think that lookup tables are structurally different from direct ways of computing f because they explicitly contain the relationships between inputs and outputs. We can point to a 'row' of a lookup table and say 'this corresponds to the particular input x_1 and the output y_1', and do this for all inputs and outputs, in a way that we can't do with a program which directly computes f(x). I think that allowing for compression preserves this important property, so I don't have a problem calling a compressed lookup table a lookup table. Note that the compression allowed in the derivation is not arbitrary, since it only applies to the 'table' part of the program, not the full input/output behaviour of the program.

if you have a program computing a predicate P(x, y) that is only true when y = f(x), and then the program just tries all possible y - is that more like a function, or more like a lookup?

In order to test whether y=f(x), the program must have calculated f(x) and stored it somewhere. How did it calculate f(x)? Did it use a table or calculate it directly?

What about if you have first compute some simple function of the input (e.g. x mod N), then do a lookup?

I agree that you can have hybrid cases. But there is still a meaningful distinction between the part of the program which is a lookup table and the part of the program which isn't (in describing the program you used this distinction!). In the example you have given, the pre-processing function (x mod N) is not a bijection. This means that we couldn't interpret the pre-processing as an 'encoding', so we couldn't point to parts of the program corresponding to each unique input and output. Suppose the function was f(x) = (x mod N) + 2, the pre-processing captured the x mod N part, and it then used a 2×N lookup table to calculate the '+2'. I think this program is importantly different from one which stores the input and output for every single x. So when taken as a whole the program would not be a lookup table and might be shorter than the lookup table bound presented above. But this captures something important! Namely, that the program is doing some degree of 'actually computing the function'.
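
Under my reading, a hypothetical sketch of that program (the names and the choice of N are my own, just to make the 'pointing at rows' point concrete):

```python
N = 7  # arbitrary choice for illustration

# The pre-processing (x mod N) is not a bijection, so the table has one row
# per residue class rather than one row per possible input x.
PLUS_TWO = {r: r + 2 for r in range(N)}   # the '+2' step stored as a 2-by-N table

def f(x: int) -> int:
    return PLUS_TWO[x % N]   # mod computed directly, '+2' looked up

# We can point at the row PLUS_TWO[3] and say "this handles every x with
# x mod N == 3", but there is no row corresponding to one particular input
# such as x = 10**9 + 3 -- which is why the whole program isn't 'just' a
# lookup table for f, even though it contains one.
```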

Comment by Alfred Harwood on A Generalization of the Good Regulator Theorem · 2025-01-05T14:40:19.140Z · LW · GW

Regarding your request for a practical example.

Short Answer: It's a toy model. I don't think I can come up with a practical example which would address all of your issues.

Long Answer, which I think gets at what we disagree about:

I think we are approaching this from different angles. I am interested in the GRT from an agent foundations point of view, not because I want to make better thermostats. I'm sure that GRT is pretty useless for most practical applications of control theory! I read John Wentworth's post where he suggested that the entropy-reduction problem may lead to embedded-agency problems. Turns out it doesn't but it would have been cool if it did! I wanted to tie up that loose end from John's post.

Why do I care about entropy reduction at all?

  • I'm interested in 'optimization', as it pertains to the agent-like structure problem, and optimization is closely related to entropy reduction, so this seemed like an interesting avenue to explore.
  • Reducing entropy can be thought of as one 'component' of utility maximization, so it's interesting from that point of view.
  • Reducing entropy is often a necessary (but not sufficient) condition for achieving goals. A thermostat can achieve an average temperature of 25°C by ensuring that the room temperature comes from a uniform distribution over all temperatures between -25°C and 75°C. But a better thermostat will ensure that the temperature is distributed over a narrower (lower-entropy) distribution around 25°C (see the quick numerical illustration below).
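
A rough numerical illustration of that last bullet (my own sketch; the 1°C standard deviation for the 'better' thermostat is an assumption for illustration):

```python
import numpy as np

# Differential entropy (in bits) of the two temperature distributions,
# both of which have mean 25 C.
h_uniform = np.log2(75 - (-25))                        # uniform on [-25, 75]: log2(width)
sigma = 1.0                                            # assumed spread of the better thermostat
h_normal = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)  # normal(25, sigma)

print(f"uniform(-25, 75) C : {h_uniform:.2f} bits")    # ~6.64 bits
print(f"normal(25, sd=1) C : {h_normal:.2f} bits")     # ~2.05 bits
# Same mean, but the narrower distribution has much lower entropy -- the sense
# in which the second thermostat is the better regulator.
```
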
Comment by Alfred Harwood on A Generalization of the Good Regulator Theorem · 2025-01-05T14:11:06.738Z · LW · GW

I think we probably agree that the Good Regulator Theorem could have a better name (the 'Good Entropy-Reducer Theorem'?). But unfortunately, the result is most commonly known using the name 'Good Regulator Theorem'. It seems to me that 55 years after the original paper was published, it is too late to try to re-brand.

I decided to use that name (along with the word 'regulator') so that readers would know which theorem this post is about. To avoid confusion, I made sure to be clear (right in the first few paragraphs) about the specific way that I was using the word 'regulator'. This seems like a fine compromise to me.

Comment by Alfred Harwood on A Generalization of the Good Regulator Theorem · 2025-01-04T20:36:34.243Z · LW · GW

Minimising the entropy of Z says that Z is to have a narrow distribution, but says nothing about where the mean of that distribution should be. This does not look like anything that would be called "regulation".

I wouldn't get too hung up on the word 'regulator'. It's used in a very loose way here, as is common in old cybernetics-flavoured papers. The regulator is regulating, in the sense that it is controlling and restricting the range of outcomes.

Time is absent from the system as described. Surely a "regulator" should keep the value of Z near constant over time?

You could imagine that this is a memoryless system where the process repeats every timestep (and S/X has no memory of what Z was previously). Then a regulator which chose a policy which minimized the entropy of Z (and used the same policy each timestep) would keep the value of Z as near constant as possible. (Maybe this is closer to what you were thinking of as a 'regulator' in the previous point?)

The value of Z is assumed to be a deterministic function of S and R. Systems that operate over time are typically described by differential equations, and any instant state of S and R might coexist with any state of Z.

This post deals with discrete systems where S and R happen 'before' Z. If you want something similar for continuous systems evolving over time, you might want to take a look at the Internal Model Principle, which is a similar kind of result that can apply to continuous-time systems modelled with differential equations.

R has no way of knowing the value of Z. It is working in the dark. Why is it hobbled in this way?

In this setup, S is just defined as the input to R. If you wanted, you could think of S as the 'initial state' and Z as the 'final state' of the system, provided they had the same range. But this setup also allows S to have a range different to Z's. R can know the value of S/X, which is what it needs to know in order to affect Z. If it knows the dynamics of the setup then it can predict the outcome and doesn't need to 'see' Z.

If you are thinking of something like 'R must learn a strategy by trying out actions and observing their effect on Z' then this is beyond the scope of this post! The Good Regulator Theorem(s) concern optimal behaviour, not how that behaviour is learned.

There are no unmodelled influences on Z. In practice, there are always unmodelled influences.

If by 'unmodelled' influences, you mean 'variables that the regulator cannot see, but still affect Z', then this problem is solved in the framework with X (and implicitly N). 

Comment by Alfred Harwood on Why are there no interesting (1D, 2-state) quantum cellular automata? · 2024-11-26T22:04:08.883Z · LW · GW

When you are considering finite tape length, how do you deal with the neighbouring cell when you are at the left end of the tape, or when you are at the right end?

Comment by Alfred Harwood on Why I Think All The Species Of Significantly Debated Consciousness Are Conscious And Suffer Intensely · 2024-11-20T21:53:44.988Z · LW · GW

Should there be a 'd' on the end of 'Debate' in the title or am I parsing it wrong? 

Comment by Alfred Harwood on A Straightforward Explanation of the Good Regulator Theorem · 2024-11-19T15:34:20.249Z · LW · GW

It is meant to read 350°F. The point is that the temperature is too high to be a useful domestic thermostat. I have changed the sentence to make this clear (and added a ° symbol ). The passage now reads:

Scholten gives the evocative example of a thermostat which steers the temperature of a room to 350°F with a probability close to certainty. The entropy of the final distribution over room temperatures would be very low, so in this sense the regulator is still 'good', even though the temperature it achieves is too high for it to be useful as a domestic thermostat.

(Edit: I've just realised that 35°F would also be inappropriate for a domestic thermostat by virtue of being too cold so either works for the purpose of the example. Scholten does use 350, so I've stuck with that. Sorry, I'm unfamiliar with Fahrenheit!) 

Comment by Alfred Harwood on Abstractions are not Natural · 2024-11-06T22:24:25.947Z · LW · GW

Your reaction seems fair, thanks for your thoughts! It's a good suggestion to add an epistemic status - I'll be sure to add one next time I write something like this.

Comment by Alfred Harwood on Abstractions are not Natural · 2024-11-06T22:12:35.739Z · LW · GW

Got it, that makes sense. I think I was trying to get at something like this when I was talking about constraints/selection pressure (a system has less need to use abstractions if its compute is unconstrained or there is no selection pressure in the 'produce short/quick programs' direction) but your explanation makes this clearer. Thanks again for clearing this up!

Comment by Alfred Harwood on Abstractions are not Natural · 2024-11-05T14:23:47.917Z · LW · GW

Thanks for taking the time to explain this. This clears a lot of things up.

Let me see if I understand. So one reason that an agent might develop an abstraction is that it has a utility function that deals with that abstraction (if my utility function is 'maximize the number of trees', it's helpful to have an abstraction for 'trees'). But the NAH goes further than this and says that, even if an agent had a very 'unnatural' utility function which didn't deal with abstractions (e.g. it was something very fine-grained like 'I value this atom being in this exact position and this atom being in a different position etc…'), it would still, for instrumental reasons, end up using the 'natural' set of abstractions, because the natural abstractions are in some sense the only 'proper' set of abstractions for interacting with the world. Similarly, while there might be perceptual systems/brains/etc which favour using certain unnatural abstractions, once agents become capable enough to start pursuing complex goals (or rather goals requiring a high level of generality), the universe will force them to use the natural abstractions (or else fail to achieve their goals). Does this sound right?

Presumably it's possible to define some 'unnatural' abstractions. Would the argument be that unnatural abstractions are just not useful in practice, or is it that the universe is such that it's ~impossible to model the world using unnatural abstractions?

Comment by Alfred Harwood on Abstractions are not Natural · 2024-11-04T23:24:14.741Z · LW · GW

It's late where I am now so I'm going to read carefully and respond to comments tomorrow, but before I go to bed I want to quickly respond to your claim that you found the post hostile, because I don't want to leave it hanging.

I wanted to express my disagreements/misunderstandings/whatever as clearly as I could but had no intention to express hostility. I bear no hostility towards anyone reading this, especially people who have worked hard thinking about important issues like AI alignment. Apologies to you and anyone else who found the post hostile.

Comment by Alfred Harwood on Abstractions are not Natural · 2024-11-04T16:53:36.465Z · LW · GW

Thanks for taking the time to explain this to me! I would like to read your links before responding to the meat of your comment, but I wanted to note something before going forward because there is a pattern I've noticed in both my verbal conversations on this subject and the comments so far.

I say something like 'lots of systems don't seem to converge on the same abstractions' and then someone else says 'yeah, I agree obviously' and then starts talking about another feature of the NAH while not taking this as evidence against the NAH.

But most posts on the NAH explicitly mention something like the claim that many systems will converge on similar abstractions [1]. I find this really confusing!

Going forward it might be useful to taboo the phrase 'the Natural Abstraction Hypothesis' (?) and just discuss what we think is true about the world.

Your comment that it's a claim about 'proving things about the distribution of environments' is helpful. To help me understand what people mean by the NAH, could you tell me what would (in your view) constitute strong evidence against the NAH? (If the fact that we can point to systems which haven't converged on using the same abstractions doesn't count.)

 

  1. ^

    Natural Abstractions: Key Claims, Theorems and Critiques: 'many cognitive systems learn similar abstractions'

    Testing the Natural Abstraction Hypothesis: Project Intro: 'a wide variety of cognitive architectures will learn to use approximately the same high-level abstract objects/concepts to reason about the world'

    The Natural Abstraction Hypothesis: Implications and Evidence: 'there exist abstractions (relatively low-dimensional summaries which capture information relevant for prediction) which are "natural" in the sense that we should expect a wide variety of cognitive systems to converge on using them.'

Comment by Alfred Harwood on Open Thread Summer 2024 · 2024-09-12T12:45:06.538Z · LW · GW

Hello! My name is Alfred. I recently took part in AI Safety Camp 2024 and have been thinking about the Agent-like structure problem. Hopefully I will have some posts to share on the subject soon.