once you know what your probability distribution is...
I'd merely stress that that's an enormous "once." When you're writing a program (which, yes, I used to do), normally you have only the foggiest idea of what a typical input is going to be, yet you want the program to work anyway. This is not just a hypothetical worry, or something limited to cryptography: people have actually run into strange problems using pseudorandom generators for Monte Carlo simulations and hashing (see here for example, or Knuth vol 2).
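To make that concrete, here's a toy Python sketch (my own aside, not anything from the thread) using the infamous RANDU generator, x_{k+1} = 65539·x_k mod 2^31. Every three consecutive outputs satisfy a hidden linear relation, which is why triples of RANDU values fall on a small number of planes and why Monte Carlo estimates that treat the outputs as independent can go badly wrong.

```python
# Sketch: the classic RANDU generator looks random to the eye, yet
# x_{k+2} = 6*x_{k+1} - 9*x_k (mod 2^31) holds for every k,
# exactly the kind of structure that quietly breaks Monte Carlo runs.

def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x)
    return out

xs = randu(1, 10000)
violations = sum(
    (xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % 2**31 != 0
    for i in range(len(xs) - 2)
)
print("triples violating the hidden linear relation:", violations)  # prints 0
```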
Even so, intuition suggests it should be possible to design PRG's that defeat anything the world is likely to throw at them. I share that intuition; it's the basis for the (yet-unproved) P=BPP conjecture.
"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin." --von Neumann
I'm not making any nonstandard claims about computational complexity, only siding with the minority "derandomize things" crowd
Note that I also enthusiastically belong to a "derandomize things" crowd! The difference is, I think derandomizing is hard work (sometimes possible and sometimes not), since I'm unwilling to treat the randomness of the problems the world throws at me on the same footing as randomness I generate myself in the course of solving those problems. (For those watching at home tonight, I hope the differences are now reasonably clear...)
When Scott Aaronson was 12 years old, he: "set myself the modest goal of writing a BASIC program that would pass the Turing Test by learning from experience and following Asimov's Three Laws of Robotics..."
As I think back on that episode, I realize that even at the time, I didn't really expect to succeed; I just wanted to see how far I could get and what would happen if I tried. And it's not clear to me in retrospect that it wasn't worth a day's work: at the least, I learned something about how to write tokenizers and user interfaces! Certainly I've spent many, many days less usefully. For similar reasons, it's probably worth it for budding computer scientists to spend a few days on the P vs. NP question, even if their success probability is essentially zero: it's the only way to get a gut, intuitive feel for why the problem is hard.
Is it likewise possible that some of the AGI researchers you've met (not the creationist, but the other ones) aren't quite as stupid as they seemed? That even if they don't succeed at their stated goal (as I assume they won't), the fact that they're actually building systems and playing around with them makes it halfway plausible that they'll succeed at something?
While "reifying the internal nodes" must indeed be counted as one of the great design flaws of the human brain, I think the recognition of this flaw and the attempt to fight it are as old as history. How many jokes, folk sayings, literary quotations, etc. are based around this one flaw? "in name only," "looks like a duck, quacks like a duck," "by their fruits shall ye know them," "a rose by any other name"... Of course, there wouldn't be all these sayings if people didn't keep confusing labels with observable attributes in the first place -- but don't the sayings suggest that recognizing this bug in oneself or others doesn't require any neural-level understanding of cognition?
Are there any witticisms that don't count as 'rationality quotes'?
So here's my question: Can you actually do asymptotically better than natural selection by applying an error-correcting code that doesn't hamper beneficial mutations?
In principle, yes. In a given generation, all we want is a mutation rate that's nonzero, but below the rate that natural selection can correct. That way we can maintain a steady state indefinitely (if we're indeed at a local optimum), but still give beneficial mutations a chance to take over.
Now with DNA, the mutation rate is fixed at ~10^-8. Since we need to be able to weed out bad mutations, this imposes an upper bound of ~10^8 on the number of functional base pairs. But there's nothing special mathematically about the constant 10^-8 -- that (unless I'm mistaken) is just an unwelcome intruder from physics and chemistry. So by using an error-correcting code, could we make the "effective mutation rate" nonzero, but as far below 10^-8 as we wanted?
Indeed we could! Here's my redesigned, biology-beating DNA that achieves this. Suppose we want to simulate a mutation rate ε far below 10^-8. Then, in addition to the "functional" base pairs, we also stick in "parity-check pairs" from a good error-correcting code. These parity-check pairs let us correct as many mutations as we want, with only a tiny probability of failure.
Next we let the physics and chemistry of DNA do their work, and corrupt a 10^-8 fraction of the base pairs. And then, using exotic cellular machinery whose existence we get to assume, we read the error syndrome off the parity-check pairs, and use it to undo all but one mutation in the unencoded, functional pairs. But how do we decide which mutation gets left around for evolution's sake? We just pick it at random! (If we need random bits, we can just extract them from the error syndrome -- the cosmic rays or whatever it is that cause the physical mutations kindly provide us with a source of entropy.)
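Here's a toy Python sketch of what I have in mind (my own illustration only: a crude 3-fold repetition code stands in for the parity-check pairs of a good code, and the mutation rate is wildly inflated so that something actually happens at toy scale):

```python
import random

MUT_RATE = 0.01   # stand-in for the physical ~10^-8 error rate, inflated for the toy

def encode(functional):
    # 3-fold repetition stands in for the "parity-check pairs" of a good code
    return [b for b in functional for _ in range(3)]

def mutate(strand):
    # physics and chemistry corrupt each base (bit) independently
    return [b ^ (random.random() < MUT_RATE) for b in strand]

def decode(strand):
    # majority vote within each block of 3 plays the role of syndrome decoding
    return [1 if sum(strand[3*i:3*i+3]) >= 2 else 0
            for i in range(len(strand) // 3)]

def next_generation(functional):
    corrected = decode(mutate(encode(functional)))
    # deliberately leave one mutation around for evolution's sake,
    # at a randomly chosen position
    i = random.randrange(len(corrected))
    corrected[i] ^= 1
    return corrected

genome = [0] * 1000
for _ in range(100):
    genome = next_generation(genome)
# roughly 1 retained mutation per generation, versus ~30 physical ones
print("bits drifted after 100 generations:", sum(genome))
```

With a real error-correcting code the redundancy overhead shrinks from 3x to a constant factor close to 1, and the failure probability of the decoding step can be driven as low as we like.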
Wiseman, let M be the number of "functional" base pairs that get mutated per generation, and let C be the number of those base pairs that natural selection can affect per generation. Then if M>>C, the problem is that the base pairs will become mostly (though not entirely) random junk, regardless of what natural selection does. This is a point about random walks that has nothing to do with biology.
To illustrate, suppose we have an n-bit string. At every time step, we can change one of the bits to anything we want, but then two bits get chosen at random and inverted. Question: in "steady state," how many of the bits can we ensure are 0?
I claim the answer is only 3n/4. For suppose pn of the bits are 1, and that we always pick a '1' bit and change it to 0. Then the expected change in the number of '1' bits after a single time step is
D(p) = p^2·(-3) + 2p(1-p)·(-1) + (1-p)^2·(+1).
Setting D(p)=0 and solving yields p=1/4.
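If you don't trust the calculation, here's a quick Python simulation of the toy model (just my own sanity check, with n and the number of steps chosen arbitrarily):

```python
import random

# n bits; each step we zero one '1' bit if any exist, then two uniformly
# random bits are inverted. The fraction of '1' bits should settle near 1/4.
n, steps = 10_000, 200_000
bits = [0] * n
for _ in range(steps):
    try:
        bits[bits.index(1)] = 0      # correct one '1' bit per step
    except ValueError:
        pass                         # no '1' bits left to correct
    for _ in range(2):               # then two random bits get inverted
        j = random.randrange(n)
        bits[j] ^= 1
print("fraction of 1 bits:", sum(bits) / n)   # hovers around 0.25
```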
So we get an either/or behavior: either the mutation rate is small enough that we can keep the functional DNA pretty much exactly as it is, or else a huge fraction of the "functional" DNA becomes randomized. In other words, there's no intermediate regime where the functional DNA keeps mutating around within some "interesting" region of configuration space, without spilling out into a huge region of configuration space.
Note that for the above argument, I've assumed that the "interesting" regions of DNA configuration space are necessarily small -- and in particular, that they can't correspond to Hamming balls of size c*n where c is a constant. This assumption is basically a restatement of our earlier observation that natural selection hasn't discovered error-correcting codes. As such, it seems to me to be pretty secure biologically. But if this assumption fails then so does my argument.
Wiseman, if it's true that (1) copying DNA inherently incurs a 10^-8 probability of error per base pair, and that (2) evolution hasn't invented any von-Neumann-type error-correction mechanism, then all the objections raised by you and others (and by me, earlier!) are irrelevant.
In particular, it doesn't matter how capable a species is of coping with a few detrimental mutations. For if the mutation rate is higher than what natural selection can correct, the species will just keep on mutating, from one generation to the next, until the mutations finally do become detrimental.
Also, your idea that sections of DNA can become more robust once they've "proven themselves" violates the 10^-8 assumption -- which I'd imagine (someone correct me if I'm wrong) comes from physics and chemistry, not because it's favored by natural selection.
Eliezer, sorry for spamming, but I think I finally understand what you were getting at.
Von Neumann showed in the 50's that there's no in-principle limit to how big a computer one can build: even if some fraction p of the bits get corrupted at every time step, as long as p is smaller than some threshold one can use a hierarchical error-correcting code to correct the errors faster than they happen. Today we know that the same is true even for quantum computers.
What you're saying -- correct me if I'm wrong -- is that biological evolution never discovered this fact.
If true, this is a beautiful argument for one of two conclusions: either that (1) digital computers shouldn't have as hard a time as one might think surpassing billions of years of evolution, or (2) 25MB is enough for pretty much anything!
OK, I posted the following update to my blog entry:
Rereading the last few paragraphs of Eliezer's post, I see that he actually argues for his central claim -- that the human genome can’t contain more than 25MB of "meaningful DNA" -- on different (and much stronger) grounds than I thought! My apologies for not reading more carefully.
In particular, the argument has nothing to do with the number of generations since the dawn of time, and instead deals with the maximum number of DNA bases that can be simultaneously protected, in steady state, against copying errors. According to Eliezer, copying a DNA sequence involves a ~10^-8 probability of error per base pair, which — because only O(1) errors per generation can be corrected by natural selection — yields an upper bound of ~10^8 on the number of "meaningful" base pairs in a given genome.
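(To spell out the arithmetic as I understand it: keeping the genome in steady state requires roughly ε·N ≤ C mutations per generation, where C = O(1) is the number natural selection can weed out. With ε ≈ 10^-8 that gives N ≤ C/ε ≈ 10^8 base pairs, and at 2 bits per base pair that's about 2×10^8 bits, or roughly 25 megabytes.)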
However, while this argument is much better than my straw-man based on the number of generations, there's still an interesting loophole. Even with a 10^-8 chance of copying errors, one could imagine a genome reliably encoding far more than 10^8 bits (in fact, arbitrarily many bits) by using an error-correcting code. I'm not talking about the "local" error-correction mechanisms that we know DNA has, but about something more global -- by which, say, copying errors in any small set of genes could be completely compensated by other genes. The interesting question is whether natural selection could read the syndrome of such a code, and then correct it, using O(1) randomly-chosen insertions, deletions, transpositions, and reversals. I admit that this seems unlikely, and that even if it's possible in principle, it's probably irrelevant to real biology. For apparently there are examples where changing even a single base pair has horrible consequences. And on top of that, we can't have the error-correcting code be too good, since otherwise we'll suppress beneficial mutations!
Incidentally, Eliezer's argument makes the falsifiable prediction that we shouldn't find any organism, anywhere in nature, with more than 25MB of functional DNA. Does anyone know of a candidate counterexample? (I know there are organisms with far more than humans' 3 billion base pairs, but I have no idea how many of the base pairs are functional.)
Lastly, in spite of everything above, I'd still like a solution to my "pseudorandom DNA sequence" problem. For if the answer were negative -- if given any DNA sequence, one could efficiently reconstruct a nearly-optimal sequence of insertions, transpositions, etc. producing it -- then even my original straw-man misconstrual of Eliezer's argument could put up a decent fight! :-)
OK, I came up with a concrete problem whose solution would (I think) tell us something about whether Eliezer's argument can actually work, assuming a certain stylized set of primitive operations available to DNA: insertion, deletion, copying, and reversal. See here if you're interested.
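For concreteness, here's how I'd formalize those four primitives on a string, in Python (my own formalization; the problem as posed in the linked post may differ in details):

```python
# Sketch of the stylized edit operations on a DNA string.

def insert(s, i, t):            # insert substring t at position i
    return s[:i] + t + s[i:]

def delete(s, i, j):            # delete the stretch s[i:j]
    return s[:i] + s[j:]

def copy(s, i, j, k):           # copy the stretch s[i:j] and paste it at position k
    return s[:k] + s[i:j] + s[k:]

def reverse(s, i, j):           # reverse the stretch s[i:j] in place
    return s[:i] + s[i:j][::-1] + s[j:]

s = "ACGTACGT"
print(reverse(copy(s, 0, 4, 8), 8, 12))   # ACGTACGTTGCA
```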
Eliezer, so long as an organism's fitness depends on interactions between many different base pairs, the effect can be as if some of the base pairs are logically determined by others.
Also, unless I'm mistaken there are some logical operations that the genome can perform: copying, transpositions, reversals...
To illustrate, suppose (as apparently happens) a particular DNA stretch occurs over and over with variations: sometimes forwards and sometimes backwards, sometimes with 10% of the base pairs changed, sometimes chopped in half and sometimes appended to another stretch, etc. Should we then count all but one of these occurrences as "junk"? Of course, we could measure the number of bits using our knowledge of how the sequence actually arose ("first stretch X was copied Y times, then the copies were modified as follows..."). But the more such knowledge we bring in, the further we get from the biologist's concept of information and the closer to the Platonic mathematical concept.
Eliezer, your argument seems to confuse two different senses of information. You first define "bit" as "the ability to eliminate half the possibilities" -- in which case, yes, if every organism has O(1) children then the logical "speed limit on evolution" is O(1) bits per generation.
But you then conclude that "the meaningful DNA specifying a human must fit into at most 25 megabytes" -- and more concretely, that "it is an excellent bet that nearly all the DNA which appears to be junk, really is junk." I don't think that follows at all.
The underlying question here seems to be this: suppose you're writing a software application, and as you proceed, many bits of code are generated at random, many bits are logically determined by previous bits (albeit in a more-or-less "mindless" way), and at most K times you have the chance to fix a bit as you wish. (Bits can also be deleted as you go.) Should we then say that whatever application you end up with can have at most K bits of "meaningful information"?
Arguably yes, from some God's-eye view. But any mortal examining the code could see far more than K of the bits fulfilling a "functional role" -- indeed, possibly even all of them. The reason is that the web of logical dependencies, by which the K "chosen" bits interacted with the random bits to produce the code we see, could in general be too complicated ever to work out within the lifetime of the universe. And crucially, when biologists talk about how many base pairs are "coding" and how many are "non-coding", it's clearly the pragmatic sense of "meaningful information" they have in mind rather than the Platonic one.
Indeed, it's not even clear that God could produce a ~K-bit string from which the final application could be reliably reconstructed. The reason is that the application also depends on random bits, of which there are many more than K. Without assuming some conjecture about pseudorandom number generators, it seems the most God could do would be to give us a function mapping the random bits to K bits, such that by applying that function we'd end up most of the time with an application that did more-or-less the same thing. (This actually leads to some interesting CS questions, but I'll spare you for now! :) )
To say something more concrete, without knowing much more than I do about biology, I wouldn't venture a guess as to how much of the "junk DNA" is really junk. The analogy I prefer is the following: if I printed out the MS Word executable file, almost all of it would look like garbage to me, with only a few "coding regions" here and there ("It looks like you're writing a letter. Would you like help?"). But while the remaining bits might indeed be garbage in some sense, they're clearly not in the sense a biologist would mean.
Bravo.
Eliezer: Excellent post, but I wonder if what you're saying is related to impressionist painting or the French Revolution? :-)
Seriously, I constantly meet people who ask me questions like: "could quantum algorithms have implications for biomechanical systems?" Or "could neural nets provide insight to the P versus NP problem?" And I struggle to get across to them what you've articulated so clearly here: that part of being a successful researcher is figuring out what isn't related to what else.