The Missing Piece

post by Martin Sustrik (sustrik) · 2019-10-27T05:50:00.824Z · LW · GW · 6 comments

Contents

  October 27th, 2019
None
6 comments

In computer science, compiler is a program that translates things that programmers write, such as "if CloseButtonPressed then CloseApplication", to things that computers are able to understand and execute, such as "0001010111001010111011000111100010101011110".

When the first compiler was written, someone had to write it in the machine code, the zeroes and ones that computers understand. They had to type in the numbers, or, more likely, punch holes into punchcards.

But from that point on, programmers were able to write human-readable programs instead of the inscrutable machine code. They just had to use the existing compiler to translate from the former to the latter.

Now imagine you are tasked with writing the second, improved, version of the compiler.

You can, of course, do the same thing again. You can type in the numbers, or maybe punch zeroes and ones into punchcards. But writing machine code is hard.

And now you have a better option at your disposal: You can write the new version of the compiler in a human-readable language and use the old version of the compiler to translate it into machine code!

Similarly, the third version of the compiler can be compiled by the second version of the compiler, the fourth by the third and so on.

That, by the way is not a hypothetical scenario. That's how the compilers are written nowadays.

Now, here comes the question: What if all the software disappeared overnight? What if the only thing we were left with was the human-readable source code of the latest version of the compiler? Would we be able to reconstruct the language and the compiler and get everything running again?

On one hand, the source code contains all the relevant information. It contains the full description of the language, its syntax and its semantics. It contains all the instructions needed to translate it into the machine code. It should be therefore possible to use it to create a functional compiler again.

But on the other hand, how exactly would you do that? The source code is, after all, just a pile of text. And there's no compiler to turn it into the machine code, into something that computer understands. It sounds like you are faced with a kind of chicken-and-egg problem.

Clearly, the source code is not enough. There's some information missing. But it's hard to pinpoint what exactly the missing part may be.

***

What if we found an undamaged T-Rex DNA in the gut of ancient mosquito trapped inside a piece of amber? Would we be able to grow an adult T-Rex? Would we be able to find out what colour they were, whether they had feathers and whether males attracted females by nightingale-like song?

On one hand, one regularly reads news about scientists being on the verge of resurrecting the wooly mammoth. It am no embryologist, but as far as I understand, the idea is that the DNA reconstructed from the frozen remains of mammoths found in Siberia would be injected into an elephant egg and grown either in an elephant female or maybe ex-vivo.

On the other hand, dinosaurs have no close living relatives. For T-Rex, there's no equivalent of what elephant is to mammoth. Maybe the DNA could be injected into an ostrich egg, the birds being the closest surviving relatives to dinosaurs. But given the evolutionary distance between T-Rex and ostrich, how likely it is that the biochemistry and embryology of the ostrich egg is similar enough to the biochemistry and embryology of the T-Rex egg to produce a viable offspring? I wouldn't bet much on it.

And even if that was possible, what if there was no living organism left, just a piece of DNA? Would we still be able to reconstruct the animal?

Although DNA encodes all the information about the organism, there's clearly something missing. And, again, it's hard to put your finger on what exactly it may be.

***

Central African economy is failing badly. CAR authorities therefore look around for inspiration and decide to adopt the Swiss model. Switzerland, after all, is a small multi-ethnic, multi-confessional country, just like CAR, and it was able to raise from being an obscure and poor backwater country to one of the world's most wealthy countries in just few decades.

CAR therefore adopts the Swiss constitution, Swiss political system, Swiss law, Swiss approach to bank secrecy, Swiss educational system. They establish a church choir in every village and start eating fondue.

But to everyone's surprise nothing changes. CAR economy is doing as badly as ever. Corruption is rife. The rule of law falters. The institutions that run like clockwork in Switzerland fail mysteriously in CAR.

Clearly, there's something about Swiss society that wasn't captured in the laws and institutions introduced in CAR. But, once again, it's hard to say what the missing piece could possibly be.

October 27th, 2019

>

6 comments

Comments sorted by top scores.

comment by Donald Hobson (donald-hobson) · 2019-10-27T20:22:59.854Z · LW(p) · GW(p)

The case about swizerland is different, so I won't talk about it. In the other cases, what is going on is about fixed points. Call the text version of the latest compiler , and the compiled version . These have the relation that . The compiled code, when given the non compiled code as input, returns itself. However there are many different pieces of machine code with the property that . Some will ignore their input entirely and quine themselves, some will be nonsensical data shuffling that produces gibberish on any other input. A few might be compilers that detect when they are compiling themselves and insert a malicious package. https://en.wikipedia.org/wiki/Backdoor_(computing)#Compiler_backdoors . Its possible that some feature of the programming language is defined in a way that uses itself, in such a way that there are stable modifications to the language. Suppose the only time an else block is used is in defining what to do when compiling an else block. Then a broken compiler that never ran any code in else blocks might compile into itself.

The fixed points of self compiling compilers are sufficiently rare, and most of them will be sufficiently stupid, that it should be possible to deduce which fixed point you want given only weak assumptions of sanity. I would expect a team of smart programmers to be able to figure out the language P, given only a P compiler written in P. (assuming P is a sensible programming language they haven't seen before, like python would be to a parallel universe where the biggest diff was that python didn't exist.) For instance, they would have a pretty good guess at what if statements, multiplication, ect did at first glance. It would then be a case o using that to figure out the details of object inheritance.


The biological case is basically the same. DNA is source code, proteins and other cellular machinery form the binaries. The DNA contains instructions that tell proteins how to duplicate themselves. Biologists can probably find the right fixed point by putting protein constructors from several animal sources, along with amino acids and plenty of DNA into a test tube. If the right pieces get close enough in the right way that the first protein machine forms, it will duplicate exponentially. (Maybe not, if the gap in the meaning of DNA is significant).

Replies from: sustrik, None
comment by Martin Sustrik (sustrik) · 2019-10-28T05:24:10.376Z · LW(p) · GW(p)

Let me restate the question in a different way:

If we have just the compiler source code, we are missing some information (easily proven by showing that there's infinite number of such Xs where X(S)=X, whereas only one is "correct").

To find out what that information may be let's consider the case where both the source code of the compiler and the compiler binary are available, but there's no programmer that understands the language. Are we still missing said piece of information?

On one hand, we can assume that yes, the information in question is still missing. In that case it must be something that is in the head of the programmer, some kind of "interpretation" of the language. But if that is so, how does that apply to the biological case? What's the "interpretation" of DNA and whose head it resides in?

On the other hand, we can assume that no, with the compiler binary at hand there's no information missing. Therefore, there must be something in the binary that's not present in the source code. But given that the binary is just a transformation of the source code, what exactly that may be? Is it some kind of "interpretation" of the language, but encoded as machine code?

An unrelated though: Why is the Swiss/CAR case different from the other two? If one looks at how the reproduction is carried out in living organisms (not the high school biology version, but the real thing) then it is, given its complexity and distributed nature, much more similar to the working of a society than to a compiler. Maybe, after all, the biological and sociological cases are similar, and the compilers have nothing to do with the other two?

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2019-10-29T00:09:11.463Z · LW(p) · GW(p)

There is info in the compiler binary that isn't in the source code. Suppose the language contains the constant Pi.

The lines in the compiler that deal with this look like

If (next token == "Pi"){

return float_to_binary(Pi)}

The actual value of Pi, 3.14... is nowhere to found in the source code, its stored in the binary, and passed from the compilers binary into the compiled binary. Of course, at some point the value of Pi must have been hard coded in. Perhaps it was written into the first binary, and has never been seen in source code. Or perhaps a previous version contained return float_to_binary(3.14) instead. Given just the source code, there would be no way to tell the value of Pi without getting out a maths book and relying on the programmers using normal mathematical names. The binary is a transformation of the source code, but that doesn't stop the compiler adding info. A compiled binary contains info about which processor architecture it runs on, source code doesn't, and so can be compiled onto different architectures. A compiler adds info, even to its own source code.

comment by [deleted] · 2019-10-30T17:03:27.727Z · LW(p) · GW(p)

A very insightful explanation. It leads me to think what this implies for the replication of nanobots [LW · GW]:

If all nanodevices produced are precise molecular copies, and moreover, any mistakes on the assembly line are not heritable because the offspring got a digital copy of the original encrypted instructions for use in making grandchildren, then your nanodevices ain't gonna be doin' much evolving.
You'd still have to worry about prions—self-replicating assembly errors apart from the encrypted instructions, where a robot arm fails to grab a carbon atom that is used in assembling a homologue of itself, and this causes the offspring's robot arm to likewise fail to grab a carbon atom, etc., even with all the encrypted instructions remaining constant.

So is prion evolution just sliding from fixed point to fixed point? If so, how likely is it to happen and how would one go about suppressing process? How would one reduce the density of fixed points?

Replies from: donald-hobson
comment by Donald Hobson (donald-hobson) · 2019-10-30T23:22:34.068Z · LW(p) · GW(p)

Yes, prion evolution is sliding between fixed points, One way to reduce fixed points would be to measure and test the finished duplicate, and destroy it if it fails the tests. Without tests, you just need A to build A, and A' to build A'. No prion can reside exclusively in the testing mechanisms, so either the difference between A and A' is something that the tests can't measure, or A' builds A' and also T', a tester that has a prion making A' pass the tests. This is a much more stringent set of conditions, so there are less prions. Of course, a self reproducing program is always a fixed point. You can't stop those (nanomachines that self reproduce without looking at your instructions) from being possible, just avoid making them.

comment by lsusr · 2019-10-30T09:08:18.976Z · LW(p) · GW(p)

But, once again, it's hard to say what the missing piece could possibly be.

This statement took me a long time to untangle.

The Central African Republic is located in the poorest, most war-torn region of the globe. Switzerland is in the middle of the European Union. If the Central African Republic was transposed into the middle of Western Europe and assimilated into the European Union then it'd a prosperous country in no time. In this sense, it's easy to say what the missing piece is.

But it's not well-defined which parts Europe are essential to Switzerland's continuous existence. That's because "which parts Europe are essential to Switzerland's continuous existence?" isn't even a well-defined question. There are many different combinations of features of Europe you could remove to trigger Switzerland's internal collapse.

I think that the original statement is intended to mean something along the following lines.

A self-perpetuating system includes its environment. Part of the self-perpetuating system is encoded in its environment. Exactly which bits of the self-perpetuating system are encoded in its environment is often not a well-defined question and therefore difficult to state.