Ruling Out Lookup Tables
post by Alfred Harwood · 2025-02-04T10:39:34.899Z · LW · GW
This post was written during Alex Altair's agent foundations fellowship [LW · GW] funded by the LTFF.
This is not particularly surprising or complex, but I wanted it written up somewhere. In discussions of the Agent-like Structure problem, lookup tables often come up as a useful counter-example or intuition pump for how a system could exhibit agent-like behaviour without agent-like structure. It is fairly intuitive that, in the limit of a large number of entries, a lookup table requires a longer program to implement than a program which 'just' computes a function. A lookup table's size can be reduced using efficient encoding, but even a lookup table whose entries are encoded using a maximally efficient entropy code still grows in size with the number of entries. This post makes that claim quantitative.
Introduction
Suppose we have a black box which takes in strings and spits out other strings. Based on the inputs and outputs of the black box, it seems like the box is calculating some computable function $f$. One possibility is that the black box contains a program which computes $f(x)$ on the input $x$, and then outputs it. Another possibility is that the black box contains a big table with lots of possible inputs and their corresponding outputs. The program inside the black box then looks through this table until it finds an entry matching its input and prints the corresponding output. This is known as a lookup table.
Suppose we say that the box is 'actually calculating the function' if the box contains a finite program which will return $f(x)$ for every possible input $x$. Can we be sure that the box is actually calculating the function rather than using a lookup table?
Just by looking at a finite number of instances of input-output behaviour, we can't. The input-output behaviour of any function can be mimicked by a lookup table. But we can rule out a lookup table if we know the size of the program being executed inside the black box. This is because a program which computes a function has a fixed length. The program which calculates the function is the same length whether we test its behaviour using just one string of length two or one hundred strings of length one trillion (though other implementation requirements, such as memory, may differ). A lookup table must, at minimum, contain entries corresponding to all of the inputs and outputs on which we have tested it. This means that the minimum possible size of a lookup-table-based program grows as the number of inputs and outputs on which it is tested grows.
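The contrast between the two possibilities can be sketched in Python (a toy illustration; the function and the table contents below are invented for the example):

```python
# Two black boxes with identical input-output behaviour on tested inputs.

def f_direct(x: str) -> str:
    """Computes the function directly: the program has a fixed size,
    no matter how many inputs we test it on."""
    return x + "0" * 4  # toy function: append four zeros

# A lookup table mimicking f_direct on the inputs tested so far.
# Its size grows with every input-output pair it must store.
TABLE = {
    "1": "10000",
    "11": "110000",
    "101": "1010000",
}

def f_table(x: str) -> str:
    return TABLE[x]  # fails on any untested input

# On the tested inputs, the two boxes are indistinguishable.
for x in TABLE:
    assert f_direct(x) == f_table(x)
```

The point of the sketch is only that `f_direct` stays the same size however many pairs we test, while `TABLE` must gain a row per tested input.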
Suppose we know the length of the program in the black box. If the box shows the correct input-output behaviour for a number of inputs which is too large to be stored in a lookup table, then we can rule out a lookup table and assume that the box is implementing some other program. We will now make that claim quantitative.
The Smallest Possible Lookup Table
In order to rule out a lookup table on the basis of length, we must first identify the smallest program which constitutes a lookup table and which can mimic the input-output behaviour of some function. We will call something a lookup table if it stores a representation of each input-output pair and, when presented with an input, searches through this list to find an appropriate output.
Let us consider the example above, where it seems like the black box is computing the function $f$. Imagine that we have tested it on one hundred bitstrings, each of length one billion. One naive way to capture this behaviour in a lookup table would be to explicitly represent each input and output in a table (along with a fixed-length program which searches through the entries of the table and outputs the correct entry).
Since each of the 100 inputs and 100 outputs has length 1 billion, the length of such a program would be at least 200 billion bits, just to accommodate the entries to the table, plus a constant to accommodate the program which searches through the table. On top of this, the program would need some way of delimiting the 'rows' and 'columns' of the table. Can we compress the size of this program further, without losing the right to call it a 'lookup table'?
For a start, the table itself doesn't have to contain explicit representations of all of the inputs and outputs. We can add a small program which encodes the inputs and outputs as smaller strings. After searching through the table which contains these small encoded strings, the program then decodes the entry in the table corresponding to the output.
The total program for this procedure will be shorter than the program which explicitly represents each entry in the lookup table, provided that the reduction in length from compression is greater than the extra length from the encoding/decoding programs. A program like this is still spiritually a lookup table since it represents each input and output.
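A toy sketch of this encoded-table idea, with an invented two-entry codebook standing in for a real entropy code:

```python
# A lookup table whose entries are stored in encoded (compressed) form.
# The codebook and table contents are invented for illustration.
codebook = {"a": "0" * 1000, "b": "1" * 1000}  # short code -> long string
encode = {v: k for k, v in codebook.items()}   # long string -> short code

# The table stores only the short codes, not the long strings themselves.
table = [("a", "b"), ("b", "a")]  # (encoded input, encoded output) pairs

def lookup(x: str) -> str:
    cx = encode[x]                  # encode the query
    for cin, cout in table:         # search the table of codes
        if cin == cx:
            return codebook[cout]   # decode the stored output
    raise KeyError(x)

assert lookup("0" * 1000) == "1" * 1000
```

The program still stores one (encoded) entry per input-output pair and searches through them, so it remains a lookup table in the sense used here.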
Optimal Compression
Suppose we have a lookup table containing $2N$ entries in total which correspond to encodings of the inputs and outputs (so there are $N$ inputs and $N$ outputs). Let the length of each encoded entry be denoted by $l_i$ with $1 \le i \le n$, where $n$ is the number of unique entries in the table. Let the number of times each entry appears in the table be denoted by $m_i$, so that $\sum_{i=1}^n m_i = 2N$. In a lookup table without redundancy, each input will only be represented once, but multiple inputs might map to the same output, or some input strings may 'double up' as outputs. This would lead to $m_i > 1$ for the codewords representing those strings. In order for a program to be able to search through the table and identify each input and output separately, the encoding must be uniquely decodable. Ensuring that the code is uniquely decodable means that we don't need to include extra bits to demarcate where one entry ends and another begins.
The total length of the part of the program constituting the lookup table is:

$$L(\vec{l}) = \sum_{i=1}^n m_i l_i$$

where we have treated the $l_i$'s as a vector $\vec{l}$.
Since the codewords form a uniquely decodable code, they must satisfy the Kraft-McMillan inequality:

$$\sum_{i=1}^n 2^{-l_i} \leq 1.$$
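The Kraft-McMillan condition is easy to check for a concrete code. Prefix codes are one familiar class of uniquely decodable codes; the code below is an invented example:

```python
# Kraft-McMillan: a uniquely decodable binary code with codeword
# lengths l_i must satisfy sum(2**-l_i) <= 1.
prefix_code = ["0", "10", "110", "111"]  # no codeword is a prefix of another
lengths = [len(w) for w in prefix_code]

kraft_sum = sum(2 ** -l for l in lengths)  # 1/2 + 1/4 + 1/8 + 1/8
assert kraft_sum <= 1  # this code saturates the inequality exactly
```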
Note that the left hand side of this constraint monotonically decreases with each $l_i$. Recall that we are interested in finding the shortest possible encoding. Any encoding with $\sum_i 2^{-l_i} < 1$ can have its codeword lengths shortened until it satisfies $\sum_i 2^{-l_i} = 1$. This means that we can treat the constraint as an equality and use Lagrange multipliers to minimize $L(\vec{l})$ subject to the constraint $\sum_{i=1}^n 2^{-l_i} = 1$.
The Lagrange multiplier method involves solving the system of equations:

$$\frac{\partial}{\partial l_i}\left[\sum_{j=1}^n m_j l_j + \lambda\left(\sum_{j=1}^n 2^{-l_j} - 1\right)\right] = 0$$

for $1 \le i \le n$ and a parameter $\lambda$, along with the condition

$$\sum_{i=1}^n 2^{-l_i} = 1.$$

This leads to the following set of equations:

$$m_i - \lambda \ln 2 \cdot 2^{-l_i} = 0 \quad\Longleftrightarrow\quad 2^{-l_i} = \frac{m_i}{\lambda \ln 2}.$$
Combined with the constraint, this is a system of $n+1$ equations with $n+1$ unknowns (the $n$ values of $l_i$ plus $\lambda$) and can be solved. Details of the solution can be found in this footnote[1]. The set of lengths obtained by minimizing $L(\vec{l})$ subject to the constraint is

$$l_i = \log_2\!\left(\frac{2N}{m_i}\right).$$
Note that we haven't actually specified what encoding we are using; we have just obtained the minimum lengths of the table entries. For a concrete comparison, note that this encoding has properties similar to the Shannon-Fano code, Huffman code, or other entropy codes. These codes assign codewords of length roughly $\log_2(1/p)$, where $p$ is the probability that a particular word will appear. Our code assigns codewords similarly if the ratios $m_i/2N$ are interpreted as 'probabilities' that each word appears in any given cell of the lookup table. The most common entries in the table (those with the largest $m_i$) are assigned the shortest codewords, while entries which only appear once in the table are assigned codewords of length $\log_2(2N)$.
This means that the total length of the smallest lookup table is

$$L_{\min} = \sum_{i=1}^n m_i \log_2\!\left(\frac{2N}{m_i}\right).$$
The complete program which uses the lookup table will have length equal to $L_{\min}$ plus a value capturing the encoding/decoding program and the procedure for searching through the table. Therefore, our expression for $L_{\min}$ is a lower bound on the total length of the program.
If we write $p_i = \frac{m_i}{2N}$ for the probability that a randomly selected element of the table is the $i$-th codeword, we can write $L_{\min}$ explicitly as an entropy of the probability distribution $\vec{p}$:

$$L_{\min} = 2N \sum_{i=1}^n p_i \log_2\!\left(\frac{1}{p_i}\right) = 2N \, H(\vec{p}).$$
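These formulas can be sanity-checked numerically; the multiplicities below are made up for illustration:

```python
from math import log2

# Made-up multiplicities: n = 3 unique entries filling a table of 2N = 8 cells.
m = [4, 2, 2]
two_N = sum(m)  # 2N = 8

# Optimal codeword lengths l_i = log2(2N / m_i).
lengths = [log2(two_N / mi) for mi in m]
assert abs(sum(2 ** -l for l in lengths) - 1) < 1e-12  # Kraft equality holds

# L_min computed two ways: sum_i m_i * l_i, and 2N * H(p) with p_i = m_i / 2N.
L_min = sum(mi * li for mi, li in zip(m, lengths))
p = [mi / two_N for mi in m]
entropy = -sum(pi * log2(pi) for pi in p)
assert abs(L_min - two_N * entropy) < 1e-12  # the two expressions agree
```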
Unlike a program which computes the function directly, this program grows with the number of inputs for which it outputs the correct string. Thus, if we test our black box on a number of inputs and outputs such that $L_{\min}$ is larger than the length of the program in the black box, we can be sure that the program is not implementing a lookup table. (This doesn't necessarily mean that it is computing the function $f$; the black box may contain a different function which happens to agree with the candidate function on a subset of inputs and outputs.) In practice, the length of a lookup table program will be larger than this bound (since the bound does not include the length of the encoding/decoding program or the program which searches through the table). Nonetheless, it is a hard bound. If the length of a program is smaller than $L_{\min}$, we can't guarantee that it is computing the function we hoped, but we can guarantee that it is not a lookup table (for our reasonable definition of 'lookup table').
Some Obvious Consequences
- Many-to-one functions are easier to store in a small lookup table than bijections.
- If you want to rule out a lookup table quickly, choose inputs which are expected to map to diverse outputs, since these take up more room in the table. You can strategically query the black box with these inputs.
- If the black box only needs to produce an output on a small number of inputs, it may actually be shorter to store the program as a table, if the program for the function is long.
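The first consequence can be illustrated with the $L_{\min}$ formula; the table sizes below are arbitrary choices:

```python
from math import log2

def L_min(multiplicities):
    """Minimum table length: sum_i m_i * log2(2N / m_i)."""
    two_N = sum(multiplicities)
    return sum(m * log2(two_N / m) for m in multiplicities)

N = 100  # 100 input-output pairs, so 2N = 200 table cells

# Bijection: every input and every output is distinct, so all
# 200 entries appear exactly once.
bijection = [1] * (2 * N)

# Many-to-one: 100 distinct inputs all map to one shared output,
# which therefore appears 100 times in the table.
many_to_one = [1] * N + [N]

# The many-to-one table compresses to a strictly shorter encoding.
assert L_min(many_to_one) < L_min(bijection)
```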
An Example
Suppose $f$ is a function which takes an input string $x$ and outputs $x$ concatenated with one thousand 0's. Suppose a black box containing a program of length $\ell$ bits seems to be calculating this function. Assuming we test it in such a way that all inputs and outputs are unique (for example, we only use inputs which do not end in one thousand zeros) and it correctly outputs $f(x)$ for each input $x$, how many inputs do we need to test to rule out a lookup table?
Since each entry in the table is unique, we have $m_i = 1$ for all $i$ and $n = 2N$. Plugging this into our equation gives $L_{\min} = 2N \log_2(2N)$. Solving $2N \log_2(2N) = \ell$ for $N$ gives the threshold: if the black box gives the correct output for more than $N$ inputs, we can rule out a lookup table. We could improve on this bound if we had an estimate $c$ for a lower bound on the size of the encoding/decoding and search parts of the lookup table program. We could then solve $2N \log_2(2N) + c = \ell$ to obtain a smaller bound.
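For a concrete (invented) program length, the threshold $N$ can be found numerically; the one-million-bit figure below is an assumption for illustration, not a figure from the post:

```python
from math import log2

def inputs_to_rule_out_table(program_bits: float) -> int:
    """Smallest N with 2N * log2(2N) > program_bits, found by bisection."""
    lo, hi = 1, 1
    while 2 * hi * log2(2 * hi) <= program_bits:
        hi *= 2  # grow until the bound is exceeded
    while lo < hi:
        mid = (lo + hi) // 2
        if 2 * mid * log2(2 * mid) > program_bits:
            hi = mid
        else:
            lo = mid + 1
    return lo

# With an assumed program length of one million bits:
N = inputs_to_rule_out_table(1e6)
assert 2 * N * log2(2 * N) > 1e6          # N correct answers exceed the bound
assert 2 * (N - 1) * log2(2 * (N - 1)) <= 1e6  # N-1 do not
```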
Why do we care?
When studying the agent-like structure problem [LW · GW] and selection theorems [LW · GW], there is an important distinction between behaviour and structure. Just because a program exhibits some behaviour does not imply that it has a particular structure. A wide variety of structures can produce the same input-output behaviour, so behaviour alone licenses almost no claims about structure. But by making some limited structural assumptions (such as limiting the length of the program) and combining these with behaviours, we can make stronger claims about the kind of structure implied by behaviours. In this sense, the result in this post can be thought of as a (very weak!) selection theorem. Given a basic structural assumption (the limited length of the program) and some behaviour (computing some function for a large number of inputs/outputs), we can say something further about the structure (that it is not a lookup table).
- ^
We have $2^{-l_i} = \frac{m_i}{\lambda \ln 2}$. Summing both sides over $i$ gives:

$$\sum_{i=1}^n 2^{-l_i} = \frac{1}{\lambda \ln 2}\sum_{i=1}^n m_i.$$

Now, we use the fact that $\sum_{i=1}^n m_i = 2N$ and $\sum_{i=1}^n 2^{-l_i} = 1$ to write

$$1 = \frac{2N}{\lambda \ln 2},$$

so $\lambda \ln 2 = 2N$. Substituting this into $2^{-l_i} = \frac{m_i}{\lambda \ln 2}$ gives

$$2^{-l_i} = \frac{m_i}{2N}.$$

This allows us to solve for the lengths and obtain $l_i = \log_2\!\left(\frac{2N}{m_i}\right)$.
comment by Anon User (anon-user) · 2025-02-04T15:09:53.092Z · LW(p) · GW(p)
This seems to be making a somewhat arbitrary distinction - specifically between a program that computes f(x) in some sort of a direct way, and a program that computes it in some less direct way (you call it a "lookup table", but you seem to actually allow combining that with arbitrary decompression/decoding algorithms). But realistically, this is a spectrum - e.g. if you have a program computing a predicate P(x, y) that is only true when y = f(x), and then the program just tries all possible y - is that more like a function, or more like a lookup? What about if you first compute some simple function of the input (e.g. x mod N), then do a lookup?