# Equivalent of Information Theory but for Computation?

post by Jemist · 2021-07-17T09:38:48.227Z · LW · GW · 27 commentsThis is a question post.

A quick google search brings up "computation theory". This does not seem to be as precise as information theory, we cannot talk about "n units of computation" the same way we can talk about "m bits of information". In fact there does not seem to be a generally accepted fundamental unit of computation which we can talk about.

Computational complexity theory is well-developed but only talks about scaling.

Turing machines seem to be able to "hide" a lot of the work in the number of available symbols to go on the tape or the number of available states of the pointer.

Logic circuits haven't produced much and seem a bit arbitrary (why only OR, AND, and NOT gates?).

Seems like we need a theory of computation to qualitatively understand things like logical uncertainty.

## Answers

Oddly, it is simply because it is a much harder problem. The basic theories for representing information are the basic theories of computation, but then you have to manipulate and transform them. Different ways of representing this make certain computations much simpler, and others much more complex. Software is equivalent to hardware, so anything that can be done in software would have to be fundamentally possible in dedicated hardware, that would find it incredibly easier to compute than the general purpose stuff. Literally everything that can be done with a computer can be done with a dedicated machine that would find it much easier than a normal computer.

And, Or, and Not gates are simply for human convenience. We find them easy to reason about, and don't need to use to many to describe/make complicated things. The computers themselves use Nand or Nor.

You might want to look at trinary, which was a perfectly good logic that can include uncertainty. It is perhaps slightly better for computers than binary, but only the Soviet Union used it, so it was a casualty of the far superior western economy.

I'm not sure what you mean by "equivalent of information theory but for computation". Though it can be applied to other things, information theory is fundamentally about computation. Asking what its equivalent for computation is is like asking what the equivalent of a car is for driving on roads.

(Not that it seems to matter, but logic circuits go far beyond or, and and not gates. There are also nand gates, xor gates, really anything you can write a truth table for you can build a gate for.)

Logic circuits haven’t produced much

Haven't they?

and seem a bit arbitrary (why only OR, AND, and NOT gates?).

You can compose any othe gate out of them. In fact you can compose any gate out of NANDs.

One problem with quantifying computation is that any problem with only a single instance can be trivially solved.

For example,

- How much computation does it take to write a program that computes the one millionth digit of pi? Easy - the program just immediately prints "1".
- How much computation does it take to compute the shortest path passing through the 100 largest cities in the US? Again, it's easy - the program does no "real work" and instead outputs the list of the 100 largest cities in the correct order. (I don't know the correct order, but such a program definitely exists.)

For this reason, it doesn't really make sense to talk about computation in terms of problems with only a single instance, or even a finite number of instances.

Instead, computational complexity theory focuses on how algorithms scale for arbitrary input sizes, because that's the only type of question that's actually interesting.

So can you clarify what you're asking for? And how is this related to "qualitatively understanding things like logical uncertainty"?

## ↑ comment by deepthoughtlife · 2021-07-19T17:57:28.211Z · LW(p) · GW(p)

Your comment about cities is not actually true. It is (if I recall correctly), an NP-complete problem. It is quick to check, but unless all NP problems are quick to solve, this one will take a long time [100 is relatively large for the problem] unless you exploit very specific parts of the problem's structure [usually via A* search which uses a very particular kind of heuristic], and even then, you have to be willing to calculate for a long time, or accept an approximate answer if things don't line up just right. Each extra path between them that needs to be checked adds time, and there are a lot of paths. You could not even store a lookup table for this kind of problem easily unless you were certain of getting this exact problem [and creating this lookup table would not be easy.]

Your other example seems suspect as well [unless you actually checked the millionth digit of Pi]. A lookup table would work for this problem though [and lookup tables are trivial for Pi.]

It is in fact useful to think about the scaling on small problems as well [mostly because some algorithms are very difficult, and become infeasible well before large sizes.] Sometimes it is correct to use an algorithm that doesn't scale well when the input size is small. For instance, hybrid quicksorts are superior to pure ones, despite using algorithms that don't scale well when sizes are small.

Replies from: Slider, timothy-johnson## ↑ comment by Slider · 2021-07-19T18:31:45.570Z · LW(p) · GW(p)

There is a difference between solving the route for these specific 100 cities and 100 unknown or freely changable cities. In order to have an interesting problem at all we need to have that scaling built in.

NP-completeness would mean that any problem structure part would transfer. You might be thinking about not bothering to solve it exactly but settling for the "most cases" being allowed to miss some options which is how problems known to be hard in their teorethical pure forms can be tamed to have practically fast "solutions".

Replies from: deepthoughtlife## ↑ comment by deepthoughtlife · 2021-07-19T20:25:41.723Z · LW(p) · GW(p)

You fail to see the issue. There are 100 cities, and an extreme number of paths to get between them [literally infinite if you include cycles, which you have to convince the algorithm to exclude.] You do not know the length of these composite paths between cities, so you have to calculate them.

Theoretically, you need to know the length of every path to be sure you have the shortest path. In practice, we can exclude cycles, and use heuristics [such as map distance] to narrow down which paths to compute, but it is still an overly difficult problem. (It is an equally difficult variant of the traveling salesman problem in computer science). When I just googled it, a team in Japan was being lauded for 16 'cities' using a different kind of processor in 2020. (I don't know how links work here. https://www.popularmechanics.com/science/a30655520/scientists-solve-traveling-salesman-problem/ )

Replies from: Slider## ↑ comment by Slider · 2021-07-19T23:26:20.499Z · LW(p) · GW(p)

I agree we are disagreeing where the core of the issues are.

Sure it explodes pretty heavily. But if we are using constant cities then we could tailor about pattern and knowledge about the composite paths, we would essentially know them.

It is a different task to multiply two numbers together rathetr than to calculate 1234*5678. In order for the solution to the second question to be a valid solution to the first problem it needs to be insensitive to the specific numbers used. Timothy Johnsons main answer was about that no matter how hard the problem is is the scope of the instances to be covered is 1 then the answer will/can consist of the just the value without any actual computation being involved. For an interesting answer the computation aids on how the variations of the cases to be covered can be handled, how the digits of the numbers provided affects what calculations needed to be done to compute the product. But that has the character of scaling.

Replies from: Pattern, deepthoughtlife## ↑ comment by Pattern · 2021-07-20T18:17:34.599Z · LW(p) · GW(p)

It might be useful to specify some of the effects that are specific. For instance:

- Driving between a set of cities might be more Euclidean than the Traveling Salesman problem (which is more general).
- In theory, it could be more difficult if driving costs are asymmetric. Perhaps in practice this would take the form of 1 or 2 roads which are closed for construction, which, by shaping the solution might make computation to find an optimal path faster.
- Ditch the quest for the optimal path, say 3 times it in length is acceptable, and the problem gets easier.
- If you're talking about how in practice, the running time is a lot better than theory('s worst case), then say that.

## ↑ comment by deepthoughtlife · 2021-07-20T02:36:49.883Z · LW(p) · GW(p)

Note: I already responded to him directly about his reply to me.

The fact that the specific and general differ is unimportant to my point. You don't have the answer to start with, and so you have to calculate it. The calculation is what computation is. You can't just assume that you already know the answer, and claim that makes computing it trivial.

The cities being constant changes nothing in this discussion, since you still had to compute it before putting the answer in the lookup table, and knowing which cities it was is only a precondition, not an answer. Memoization (the technical term for keeping track of the intermediate results of your computation) is useful, but not a panacea.

Replies from: Slider## ↑ comment by Slider · 2021-07-20T08:51:58.656Z · LW(p) · GW(p)

If I am a programmer and I can do the calculation on the behalf of my program beforehand at compile time and avoid any runtime computation that is still significant. We can't have the compute time include everything about understanding the question otherwise we need to include kindergarden time about learning what the word "city" means. Thus while "global compute" is inescapable we are proper to just focus on time spent after the algorithm has been designed and frozen in place.

Replies from: deepthoughtlife## ↑ comment by deepthoughtlife · 2021-07-20T19:24:05.218Z · LW(p) · GW(p)

Calculating at compile time is still obviously computation! Obviously, if you can, it is better to do so most of the time, but it is also irrelevant to the point. This isn't something that simply takes a long time to calculate, but if you run it for a few hours or days when creating the program it can be recorded. You cannot, in fact, calculate this beforehand because it is computationally infeasible. (In some cases, where the heuristics mentioned earlier work well enough, it can be computed, but that relies on the structure of the problem, and still requires a lot of computation.)

Obviously, we are just talking past each other, so I'll stop responding here.

## ↑ comment by Timothy Johnson (timothy-johnson) · 2021-07-19T18:30:47.072Z · LW(p) · GW(p)

Sorry, you misunderstood my point. Perhaps I'm being a little pedantic and unclear.

For the cities example, the point is that when the problem domain is restricted to a single example (the top 100 cities in the US), there is some program out there that outputs the list of cities in the correct order.

You can imagine the set of all possible programs, similar to The Library of Babel - Wikipedia. Within that set, there are programs that print the 100 cities in every possible order. One of them is the correct answer. I don't need to know which program that is to know that it exists.

This is why the OP's suggestion to define "fundamental units of computation" doesn't really make sense, and why the standard tools of computational complexity theory are (as far as I know) the only reasonable approach.

Replies from: deepthoughtlife## ↑ comment by deepthoughtlife · 2021-07-19T20:17:16.250Z · LW(p) · GW(p)

If that's what you meant, it is rather unclear in the initial comment. It is, in fact, very important that we do not know what the sequence is. You could see it as the computation is to determine which book in the library of Babel to look at. There is only one correct book [though some are close enough], and we have to find that one [thus, it is a search problem.] How difficult this search is, is actually a well defined problem, but it simply has multiple ways of being done [for instance, by a specialist algorithm, or a general one.]

Of course, I do agree that a lookup table can make some problems trivial, but that doesn't work for this sort of thing [and a lookup table of literally everything is basically is what the Library of Babel would be.] Pure dumb search doesn't work that well, especially when the table is infinite.

Edit: You can consider finding it randomly the upper bound on computational difficulty, but lowering bound requires an actual algorithm [or at least a good description of the kind of thing it is], not just the fact that there is an algorithm. The Library of Babel proves very little in this regard. (Note: I had to edit my edit due to writing something incorrect.)

## 27 comments

Comments sorted by top scores.

## comment by JBlack · 2021-07-19T13:49:04.334Z · LW(p) · GW(p)

Turing machine encodings don't really "hide" much in the alphabet or state, since their information content is logarithmic. Large alphabets can be efficiently emulated with only logarithmic slowdown, that is alphabets of up to 2^N symbols can always be emulated with a TM of only 2 symbols running N times slower. Larger state tables are harder to *efficiently* emulate, but there are at least many universal Turing machines that can emulate any other Turing machine in time proportional to the information content of the state table.

Boolean circuits aren't really arbitrary. The exact set of operators doesn't actually matter much, since every set of operators can be (usually efficiently) emulated by every other universal set. The set {or, and, not} is familiar and can express every other boolean function. The singleton sets {nand} and {nor} are also commonly used. Either of these is also universal, with the only difference between them being the labelling of the states. In one sense, this is the simplest element of digital computation.

I'm not sure what you mean by needing a theory of computation to understand things like logical uncertainty. Can you expand on this a bit more?

Replies from: Slider## ↑ comment by Slider · 2021-07-19T14:30:09.763Z · LW(p) · GW(p)

I think this can also be walked in the other direction in that if you have some computation that does a certain amount of steps then there can be an analogous computation that uses a larger alphabet but runs faster. The limit of this would be where each letter of the alphabeth represents a novel start state/end state and then all computation can be done in a single step.

Replies from: JBlack, timothy-johnson## ↑ comment by JBlack · 2021-07-20T03:41:20.795Z · LW(p) · GW(p)

Yes, and for this reason it's usual to consider only *finite* alphabets.

While any particular bound on input and output size can be processed in a single step with a large enough finite alphabet, Turing machines operate on any finite input without bound on length. Representing all of those with one symbol each would require an infinite alphabet with a correspondingly infinite state transition table.

Replies from: Slider## ↑ comment by Slider · 2021-07-20T09:15:56.606Z · LW(p) · GW(p)

The hope in the orginial question was to somehow have a method of saying "this thing has n compute in it".

I am a bit unsure whether control structures and such can be faithfully preserved. But it seems if 00,01,10,11 can be translated to 0,1,2,3 then a ...010101010111 could be translated into ...123231412 and the very same process could be applied to turn 01,02,03,11,12,13,21,22,23,31,32,33 into A,B,C,D,E,F,G,H,I,J,L,M. Even if we can't get to a single symbol at any given point we can get increased performance by predictable increase in alphabeth and this will not run out. That is for any N for all binary strings of that length there exists a pyramid of letter subsititution where at the top each string is covered by a single letter. That is the trick deosn't rely on actual infinities, it also works for all finite numbers.

## ↑ comment by Timothy Johnson (timothy-johnson) · 2021-07-20T00:43:02.523Z · LW(p) · GW(p)

Turing machines can only have a finite number of states. So you can do all computation in a single step only if the problem has a finite number of possible inputs. At that point, your program is basically a lookup table.

Nontrivial computational problems will have an infinite number of possible inputs, so this strategy won't work in general.

Replies from: Pattern, Slider## ↑ comment by Pattern · 2021-07-20T18:18:57.190Z · LW(p) · GW(p)

At that point, your program is basically a lookup table.

Can you actually produce a look up table for perfect chess play?

Replies from: deepthoughtlife## ↑ comment by deepthoughtlife · 2021-07-20T19:37:22.926Z · LW(p) · GW(p)

Such a table cannot really be created because it is too large, not just for computing, but even for storing in memory if it were somehow given to you. It is not out of the question that computing resources continues to grow enough such that it eventually becomes feasible, but we have no idea if they will, and it would be a long time in the future.

Theoretical Turing machines are very simple, but have infinite resources, and are thus a bad way of determining the difficulty of things.

Replies from: Pattern## ↑ comment by Pattern · 2021-07-21T16:40:19.447Z · LW(p) · GW(p)

Theoretical Turing machines are very simple, but have infinite resources, and are thus a bad way of determining the difficulty of things.

If you can't do it with a Turing machine, maybe you can't do it. This sort of hedges against 'does someone come up with a better algorithm', or a more efficient data structure, so things don't get broken when something we thought was impossible is accomplished.

Other approaches include working out a minimum amount of computation, then looking at:

- Practical costs (today)*
- Minimum amount of energy used in theory, per physics laws re: deleting information

(Both might get thrown out of whack by quantum computers, if they pan out. Also, reversible computing *might* lower theoretical and practical costs.)

*This one can be affected by 'if someone buys hardware specifically for doing this once, they can reuse it' , if there's not enough padding.

Replies from: deepthoughtlife## ↑ comment by deepthoughtlife · 2021-07-21T21:47:20.958Z · LW(p) · GW(p)

Yes, I suppose I left out that you can determine that something can't be computed if you couldn't do it with a Turing machine. Proofs of impossibility are actually somewhat important.

Practical costs today are a shifting sand, but worthwhile for difficult and important tasks, while being useless on a number of other ones due to the difficulty of determining them. What algorithm, and what should be the reference computer? (Or reference computer building / buying process. It would be silly to include a massive R&D program, but what about a small one that doesn't take very long to make a component that vastly improves results?)

Reversible computing is a huge question mark, but so are the lower bounds of minimum energy used to delete information. [I think the theories haven't really been tested in that area, because we have no idea how to, and are many orders of magnitude away.] On a side note, I expect based purely on my own intuition that reversible computing will actually end up taking more energy in all practical cases than the minimum reachable efficiency of nonreversible.

Quantum computing does change the mark quite a bit cost on very specific algorithms if quantum computers actually manage to scale to large enough sizes for them, but that is still many orders of magnitude for most problems they are likely superior for. Nonetheless, I think that an algorithm that is improved sufficiently by a quantum computer should be considered using a quantum version as soon as we have good number for it.

## ↑ comment by Slider · 2021-07-20T08:57:18.548Z · LW(p) · GW(p)

Are there computational problems that can' t be represented with turing machines?

Replies from: timothy-johnson## ↑ comment by Timothy Johnson (timothy-johnson) · 2021-07-21T18:56:12.769Z · LW(p) · GW(p)

That depends what you mean by "computational problems". In its usual definition, a Turing machine takes a single input string, and provides a single bit as output: "Accept" or "Reject". (It may also of course run forever without providing an answer.)

For example, the question, "What is the shortest path through this list of cities?" isn't possible to encode directly into a standard Turing machine. Instead, the Turing machine can answer, "Is there a path through this list of cities with length at most k?", for some constant k.

If you don't like this, there are ways to modify the definition of a Turing machine. But for the purposes of studying computational complexity, all reasonable definitions seem to be equivalent.

Replies from: Slider## ↑ comment by Slider · 2021-07-21T20:05:42.409Z · LW(p) · GW(p)

I think the machine halting can be interpreted as accepting and you mgiht be allowed to leave a number on the tape.

I was wondering whether cases like the halting problem might be intedresting edgecases but TMs are not especially inferior. Church-turing thesis is about there not being anything interesting missed by what is captured by machines.