Information theory and FOOM

post by PhilGoetz · 2009-10-14T16:52:35.107Z · 95 comments

Contents

  DNA Evolution
  Science!
  Analysis

Information is power.  But how much power?  This question is vital when considering the speed and the limits of post-singularity development.  To address it, consider two other domains in which information accumulates and is translated into an ability to solve problems:  evolution and science.

DNA Evolution

Genes code for proteins.  Proteins are composed of modules called "domains"; a protein contains from 1 to dozens of domains.  We classify genes into gene "families", which can be loosely defined as sets of genes that on average share >25% of their amino acid sequence and have a good alignment for >75% of their length.  The number of genes and gene families known doubles every 28 months; but most "new" genes code for proteins that recombine previously-known domains in different orders.

Almost all of the information content of a genome resides in the amino-acid sequence of its domains; the rest mostly indicates what order to use domains in individual genes, and how genes regulate other genes.  About 64% of domains (and 84% of those found in eukaryotes) evolved before eukaryotes split from prokaryotes about 2 billion years ago. (Michael Levitt, PNAS July 7 2009, "Nature of the protein universe"; S. Yooseph et al. "The Sorcerer II global ocean sampling expedition", PLoS Bio 5:e16.)  (Prokaryotes are single-celled organisms lacking a nucleus, mitochondria, or gene introns.  All multicellular organisms are eukaryotes.)

It's therefore accurate to say that most of the information generated by evolution was produced in the first one or two billion years; the development of more-complex organisms seems to have nearly stopped evolution of protein domains.  (Multi-cellular organisms are much larger and live much longer; therefore there are many orders of magnitude fewer opportunities for selection in a given time period.)  Similarly, most evolution within eukaryotes seems to have occurred during a period of about 50 million years leading up to the Cambrian explosion, half a billion years ago.

My first observation is that evolution has been slowing down in information-theoretic terms, while speeding up in terms of the intelligence produced.  This means that adding information to the gene pool increases the effective intelligence that can be produced using that information by a more-than-linear amount.

In the first of several irresponsible assumptions I'm going to make, let's assume that the information evolved in time $t$ is proportional to $i = \log t$, while the intelligence evolved is proportional to $e^t = e^{e^i}$.  I haven't done the math to support those particular functions; but I'm confident that they fit the data better than linear functions would.  (This assumption is key, and the data should be studied more closely before taking my analysis too seriously.)
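A minimal numerical sketch of these assumed forms (natural logarithms, proportionality constants dropped; both are simplifications, and nothing here is fitted to data). It shows that each doubling of $t$ adds the same amount of information while the intelligence it supports grows ever faster:

```python
import math


def information(t: float) -> float:
    """Information evolved by time t, assumed to be i = log(t)."""
    return math.log(t)


def intelligence_from_information(i: float) -> float:
    """Intelligence assumed to scale as e^(e^i); since e^i = t, this equals e^t."""
    return math.exp(math.exp(i))


# Each doubling of t adds the same information (log 2, about 0.693) ...
for t in [1, 2, 4, 8]:
    i = information(t)
    # ... but the intelligence supported by that information grows faster and faster.
    print(f"t={t}  i={i:.3f}  intelligence={intelligence_from_information(i):.1f}")
```

Under these assumed forms, equal increments of information buy ever larger increments of intelligence, which is the "more-than-linear" claim in numerical form.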

My second observation is that evolution occurs in spurts.  There's a lot of data to support this, including data from simulated evolution; see in particular the theory of punctuated equilibrium, and the data from various simulations of evolution in Artificial Life and Artificial Life II.  But I want to single out the eukaryote-to-Cambrian-explosion spurt.  The evolution of the first eukaryotic cell suddenly made a large subset of organism-space more accessible; and the speed of evolution, which normally decreases over time, instead increased for tens of millions of years.

Science!

The following discussion relies largely on de Solla Price's Little Science, Big Science (1963), Nicholas Rescher's Scientific Progress: A Philosophical Essay on the Economics of Research in Natural Science (1978), and the data I presented in my 2004 TransVision talk, "The myth of accelerating change".

The growth of "raw" scientific knowledge is exponential by most measures: Number of scientists, number of degrees granted, number of journals, number of journal articles, number of dollars spent.  Most of these measures have a doubling time of 10-15 years.  (GDP has a doubling time closer to 20 years, suggesting that the ultimate limits on knowledge may be economic.)
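As a quick check on how these doubling times compare, they can be converted into annual growth rates, assuming clean exponential growth compounded once a year (a simplification):

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Annual growth rate r implied by doubling time T, solving (1 + r) ** T == 2."""
    return 2 ** (1 / doubling_time_years) - 1


# Doubling times quoted in the paragraph above.
for label, years in [("raw science (fast)", 10), ("raw science (slow)", 15), ("GDP", 20)]:
    print(f"{label:<19} doubles every {years} yr -> about {annual_growth_rate(years):.1%} per year")
```

So a 10-15 year doubling time corresponds to roughly 5-7% growth per year, against roughly 3.5% for a 20-year doubling.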

The growth of "important" scientific knowledge, measured by journal citations, discoveries considered worth mentioning in histories of science, and perceived social change, is much slower; if it is exponential, it appears IMHO to have had a doubling time of 50-100 years between 1600 and 1940.  (It can be argued that this growth began slowing down at the onset of World War II, and more dramatically around 1970.)  Nicholas Rescher argues that important knowledge = log(raw information).

A simple argument supporting this is that "important" knowledge is the number of distinctions you can make in the world, and the number of distinctions you can draw based on a set of examples is of course proportional to the log of the size of your data set, assuming that the different distinctions are independent and equiprobable, and your data set is random.  However, an opposing argument is that log(i) is simply the amount of non-redundant information present in a database with uncompressed information i.  (This appears to be approximately the case for genetic sequences.  IMHO it is unlikely that scientific knowledge is that redundant; but that's just a guess.)  Therefore, important knowledge is somewhere between O(log(information)) and O(information), depending on whether information is closer to O(raw information) or O(log(raw information)).
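A toy version of the distinctions argument, under exactly the stated assumptions (independent, equiprobable binary distinctions; random data). The criterion used here, that every combination of k distinctions must be able to appear at least once among the examples, is an assumption of this sketch rather than something the argument specifies:

```python
def max_independent_distinctions(n_examples: int) -> int:
    """Largest k with 2**k <= n_examples.

    Toy criterion (an assumption of this sketch): k independent, equiprobable
    binary distinctions define 2**k classes, and each class must be able to
    appear at least once in the data set.
    """
    if n_examples < 1:
        return 0
    return n_examples.bit_length() - 1  # exact floor(log2(n)) for integers


for n in [10**3, 10**6, 10**9]:
    print(f"{n:>13,} examples -> at most {max_independent_distinctions(n)} distinctions")
```

Multiplying the data set by a thousand adds only about ten distinctions, which is the logarithmic relationship the argument relies on.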

Analysis

We see two completely opposite pictures:  In evolution, the efficaciousness of information increases more-than-exponentially with the amount of information.  In science, it increases somewhere between logarithmically and linearly.

My final irresponsible assumption will be that the production of ideas, concepts, theories, and inventions ("important knowledge") from raw information is analogous to the production of intelligence from gene-pool information.  Therefore, evolution's efficacy at using the information present in the gene pool can give us a lower bound on the amount of useful knowledge that could be extracted from our raw scientific knowledge.

I argued above that the amount of intelligence produced from a given gene-information pool $i$ is approximately $e^{e^i}$, while the amount of useful knowledge we extract from raw information $i$ is somewhere between $O(i)$ and $O(\log i)$.  The implication is that the fraction of discoveries that we have made, out of those that could be made from the information we already have, has an upper bound between $O(1/e^{e^i})$ and $O(1/e^{e^{e^i}})$.
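One way to unpack these bounds under the post's own assumptions (this reading is an interpretation, not spelled out in the post): let $k$ be the important knowledge already extracted, and express the achievable total $E(i) = e^{e^i}$ in terms of $k$. Up to the comparatively negligible factor $k$, the two cases reproduce the double- and triple-exponential bounds, with the exponent read as the knowledge already held:

```latex
% Post's assumptions: evolution-like extraction from raw information i gives
% E(i) = e^{e^i}; human extraction gives knowledge k with \log i \lesssim k \lesssim i.
\begin{aligned}
k \approx i:&\quad \frac{k}{E(i)} = k\, e^{-e^{k}},\\
k \approx \log i \;\;(\text{so } i = e^{k}):&\quad \frac{k}{E(i)} = k\, e^{-e^{e^{k}}}.
\end{aligned}
```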

One key question about the shape of AI takeoff is therefore:  Will AI's efficiency at drawing inferences from information be closer to that of humans, or to that of evolution?

If the latter, then the number of important discoveries that an AI could make, using only the information we already have, may be between $e^{e^i}$ and $e^{e^{e^i}}$ times the number of important discoveries that we have made from it.  $i$ is a large number representing the total information available to humanity.  $e^{e^i}$ is a goddamn large number.  $e^{e^{e^i}}$ is an awful goddamn large number.  Where before we predicted FOOM, we would then predict FOOM^FOOM^FOOM^FOOM.
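To get a feel for these magnitudes, the sketch below evaluates the towers for deliberately tiny values of $i$ (the real $i$ is enormous; small values are used only so the numbers stay printable). Because $e^x$ overflows a double for $x$ above about 709, the triple exponential is reported by its base-10 logarithm:

```python
import math

LOG10_E = math.log10(math.e)


def log10_of_exp(x: float) -> float:
    """log10(e**x), computed without forming e**x (which overflows a float for x > ~709)."""
    return x * LOG10_E


# i is tiny here only so the numbers stay printable; the post's i is enormous.
for i in [1, 2, 3]:
    e_e_i = math.exp(math.exp(i))  # e^(e^i) still fits in a float for these i
    print(f"i={i}: e^e^i ~ {e_e_i:.3g};  e^e^e^i ~ 10^{log10_of_exp(e_e_i):.3g}")
```

Even at $i = 3$, $e^{e^{e^i}}$ has a couple of hundred million decimal digits.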

Furthermore, the development of the first AI will be, I think, analogous to the evolution of the first eukaryote, in terms of suddenly making available a large space of possible organisms.  I therefore expect the pace of information generation by evolution to suddenly switch from falling to increasing, even before taking into account recursive self-improvement.  This means that the rate of information increase will be much greater than can be extrapolated from present trends.  Supposing that the rate of acquisition of important knowledge will change from $\log(i)$, with $i = e^t$, to $e^t$ gives us FOOM^FOOM^FOOM^FOOM^FOOM, or 4FOOM.

This doesn't necessarily mean a hard takeoff.  "Hard takeoff" means, IMHO, FOOM in less than 6 months.  Reaching the e^e^e^i level of efficiency would require vast computational resources, even given the right algorithms; an analysis might find that the universe doesn't have enough computronium to even represent, let alone reason over, that space.  (In fact, this brings up the interesting possibility that the ultimate limits of knowledge will be storage capacity:  Our AI descendants will eventually reach the point where they need to delete knowledge from their collective memory in order to have the space to learn something new.)

However, I think this does mean FOOM.  It's just a question of when.

ADDED:  Most commenters are losing sight of the overall argument.  This is the argument:

  1. Humans have diminishing returns on raw information when trying to produce knowledge.  It takes more dollars, more data, and more scientists to produce a publication or discovery today than in 1900.
  2. Evolution has increasing returns on information when producing intelligence.  With 51% of the information in a human's DNA, you could build at best a bacterium.  With 95-99%, you could build a chimpanzee.
  3. Producing knowledge from information is like producing intelligence from information. (Weak point.)
  4. Therefore, the knowledge that could be inferred from the knowledge that we have is much, much larger than the knowledge that we have.
  5. An artificial intelligence may be much more able than us to infer what is implied by what it knows.
  6. Therefore, the Singularity may not go FOOM, but FOOM^FOOM.

95 comments


comment by whpearson · 2009-10-16T10:24:17.543Z

Doesn't information have to be about something? Bits are not inherently powerful... proteins are about structure but they do not inherently win you any evolutionary races. I'd contend that there is a lot more information about how to survive in which proteins are in the genome and when they are transcribed.

You seem to be mixing up bits needed to replicate the genome with bits of information gained about the outside world and how to survive in it.

Edit: To give you an example of the difference. Consider a standard computer program that does something useful in the world. If I take the machine code and chop it up and rearrange the segments then you are not likely to get a program that does anything useful. Yet it has the same number of bits (even likely similar or more complexity) as the original program.
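The point about rearranged machine code can be made concrete with a toy Python program (a hypothetical stand-in for the machine code in the comment): rearranging its lines keeps exactly the same characters, and hence the same raw bit count, but the result no longer runs.

```python
ORIGINAL = (
    "def fib(n):\n"
    "    a, b = 0, 1\n"
    "    for _ in range(n):\n"
    "        a, b = b, a + b\n"
    "    return a\n"
    "result = fib(10)\n"
)

# Same segments (lines), rearranged: here simply reversed.
SHUFFLED = "".join(reversed(ORIGINAL.splitlines(keepends=True)))


def run(source: str):
    """Execute `source` and return its `result`, or a note on how it failed."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as exc:  # SyntaxError/IndentationError, NameError, ...
        return f"fails with {type(exc).__name__}"
    return namespace.get("result")


print(sorted(ORIGINAL) == sorted(SHUFFLED))  # same characters, same bit count
print(run(ORIGINAL), "|", run(SHUFFLED))
```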

comment by taw · 2009-10-14T17:17:46.113Z

The idea that pace of discovery slowed down is an extremely common and really obvious fallacy.

We only know that a discovery was important after it gets widely implemented, which happens decades after invention. Yet we count it as happening not at implementation time, but at invention time. So recent discoveries that will be implemented in the future are not counted at all, artificially lowering our discovery importance counts.

Also if you use silly measures like railroad tracks per person, or max land mph, you will obviously not see much progress, as a large part of the progress is exploring new kinds of activities, not just making old activities more efficient. Any constant criterion like that will underestimate progress.
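The recency argument above can be illustrated with a deliberately simple, hypothetical model (the constant rate of 100 discoveries per year and the 0 to 49 year recognition lag are invented for illustration, not estimates): the true discovery rate never changes, yet the count credited to each invention year falls off for recent years.

```python
def recognized_discoveries(invention_year: int, now: int, rate: int = 100, max_lag: int = 50) -> float:
    """Discoveries from `invention_year` already recognized as important by `now`.

    Hypothetical model: `rate` discoveries are made every year (a constant true
    rate), and each is recognized after a lag spread evenly over 0..max_lag-1 years.
    """
    age = now - invention_year
    recognized_lags = min(max(age + 1, 0), max_lag)  # number of lags L with L <= age
    return rate * recognized_lags / max_lag


for year in range(1900, 2001, 20):
    print(year, recognized_discoveries(year, now=2009))
```

Counted this way, the apparent rate drops from 100 to 20 per year over the last few decades even though nothing about discovery itself has changed.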

Replies from: Perplexed, PhilGoetz, timtyler
comment by Perplexed · 2010-10-13T16:45:27.536Z

The idea that pace of discovery slowed down is an extremely common and really obvious fallacy.

The idea can't be a fallacy. What you mean is that the usual argument for this idea contains an obvious fallacy.

It is an important distinction because reversed stupidity is not intelligence. Identifying the fallacy doesn't prove that the pace of discovery has not slowed.

comment by PhilGoetz · 2009-10-14T17:30:37.777Z

The idea that pace of discovery slowed down in the 20th century is a parenthetical digression, and has no bearing on the analysis in this post.

Also if you use silly measures like railroad tracks per person, or max land mph, you will obviously not see much progress, as a large part of the progress is exploring new kinds of activities, not just making old activities more efficient. Any constant criterion like that will underestimate progress.

But it's okay when Ray Kurzweil does it? He is underestimating progress by doing so? What measures are less silly?

Replies from: taw
comment by taw · 2009-10-14T18:58:06.940Z

The idea that pace of discovery slowed down in the 20th century is a parenthetical digression, and has no bearing on the analysis in this post.

It seemed vaguely related to your exps and logs.

What measures are less silly?

There are many locally valid measures, but all become ridiculous when applied to the wrong times. It seems to me that GDP/capita is the least bad measure at the moment, but it's very likely it won't do too far in the past or too far in the future.

I have no idea what Kurzweil is doing.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:34:17.856Z

It seemed vaguely related to your exps and logs.

It is related, which is why I mentioned it. But it isn't a link in the chain of reasoning.

Replies from: gwern
comment by gwern · 2009-10-15T00:06:52.140Z

I don't quite follow the whole thing (too many Big Os and exponents for me to track the whole thing), but wouldn't it be quite relevant given your observations about S-curves in the development of microbes?

What's to stop us from saying that science has hit its S-curve's peak of how much it could extract from the data and that an AI would be similarly hobbled, especially if we bring in statistical studies like Charles Murray's _Human Accomplishment_ which argues that up to 1950, long enough ago that recency effects ought to be gone, major scientific discoveries show a decline from peaks in the 1800s or whenever? (Or that mammalian intelligences have largely exhausted the gains?)

Eliezer may talk about how awesome a Solomonoff-inducting intelligence would be and write stories about how much weak superintelligences could learn, but that's still extrapolation which could easily fail (e.g., we know the limits on maximum velocity and have relatively good ideas about how one could get near the speed of light, but we're not very far from where we began, even with awesome machines).

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T00:35:49.251Z

I see what you're saying. That would lead to a more complicated analysis, which I'm not going to do, since people here don't find this approach interesting.

Replies from: gwern
comment by gwern · 2010-10-13T15:17:14.181Z

If an idea is important and interesting to you, then I think that's enough justification. The post isn't negative, after all.

comment by timtyler · 2010-10-14T20:36:07.070Z

I don't think there is any consensus on how to measure innovation. So, before dealing with the question, one must first be clear about what form of measurement you are using - otherwise nobody will know what you are talking about.

comment by timtyler · 2009-10-14T18:55:53.565Z

Re: ""Hard takeoff" means, IMHO, FOOM in less than 6 months."

Nobody ever specifies when the clock is going to start ticking. We already have speech recognition, search oracles, stockmarket wizards, and industrial automation on a massive scale. Machine intelligence has been under construction for at least 60 years - and machines have been taking people's jobs for over 100 years.

If your clock isn't ticking by now, then what exactly are you waiting for?

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:24:04.065Z

Valid point. I spoke loosely. I was referring to the EY - RH debate, in which "6 months" usually meant 6 months from proof-of-concept, or else from not being able to stay under the radar, to FOOM.

Replies from: timtyler
comment by timtyler · 2009-10-14T20:08:48.318Z

Well, it's not just you at fault. Nobody who advocates the idea that some future period of change is rapid ever seems to pin down what they mean to the point where their view is falsifiable. People need to say when they are starting and stopping their clocks - or else they might just as well be mumbling mystical incantations.

Explosive increases in information technology are happening NOW.

If you are looking for an information technology explosion, then THIS IS IT.

Replies from: billswift
comment by billswift · 2009-10-15T16:26:16.117Z

But not as explosive as it will (may) be in the future; this is what has people worried.

Replies from: timtyler
comment by timtyler · 2009-10-15T19:24:30.674Z

How do you measure the "explosiveness" of an explosion? By measuring the base of its exponential growth? If so, I would classify the hypothesis that the explosion will get "more explosive" as a somewhat speculative one. The growth in CPU clock rates has already run out of steam. The explosion may well get faster in some areas - but getting more "explosive" has got to be different from that.

Today, information technology is exploding in the same sense that the nuclei in a nuclear bomb are exploding - by exhibiting exponential growth processes. The explosion is already happening. How fast it is happening is really another issue.
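One concrete reading of "measuring the base of its exponential growth" is a straight-line least-squares fit to log(value) against time. The sketch below assumes a clean exponential and uses a made-up series; on this reading, an explosion getting "more explosive" would show up as the fitted base rising across successive time windows:

```python
import math


def growth_base(times, values):
    """Least-squares fit of log(value) = log(a) + t * log(b); returns the per-period base b."""
    n = len(times)
    logs = [math.log(v) for v in values]
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    covariance = sum((t - mean_t) * (y - mean_log) for t, y in zip(times, logs))
    variance = sum((t - mean_t) ** 2 for t in times)
    return math.exp(covariance / variance)


# Hypothetical series that doubles every 2 periods.
times = [0, 1, 2, 3, 4]
values = [100 * 2 ** (t / 2) for t in times]
base = growth_base(times, values)
print(f"base = {base:.4f} per period; doubling time = {math.log(2) / math.log(base):.2f} periods")
```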

comment by gwern · 2011-08-15T03:53:30.709Z

Here's another interesting set of quotes; are we even correct in assuming the most recent percent of DNA matters much? After all, chimps outperform humans in some areas like monkey ladder. From "If a Lion Could Talk":

"Giving a blind person a written IQ test is obviously not a very meaningful evaluation of his mental abilities. Yet that is exactly what many cross-species intelligence tests have done. Monkeys, for example, were found not only to learn visual discrimination tasks but to improve over a series of such tasks -- they formed a learning set, a general concept of the problem that betokened a higher cognitive process than a simple association. Rats given the same tasks showed difficulty in mastering the problems and no ability to form a learning set. The obvious conclusion was that monkeys are smarter than rats, a conclusion that was comfortably accepted, as it fit well with our preexisting prejudices about the distribution of general intelligence in nature. But when the rat experiments were repeated, only this time the rats were given the task of discriminating different smells, they learned quickly and showed rapid improvement on subsequent problems, just as the monkeys did.

The problem of motivation is another major confounding variable. Sometimes we may think we are testing an animal's brain when we are only testing its stomach. For example, in a series of studies goldfish never learned to improve their performance when challenged with "reversal" tasks. These are experiments in which an animal is trained to pick one of two alternative stimuli (a black panel versus a white panel, say) in order to obtain a food reward; the correct answer is then switched and the subject has to relearn which one to pick. Rats quickly learned to switch their response when the previously rewarded answer no longer worked. Fish didn't. This certainly fit comfortably with everyone's sense that fish are dumber than rats. But when the experiment was repeated with a different food reward (a paste squirted into the tank right where the fish made its correct choice, as opposed to pellets dropped into the back of the tank), lo and behold the goldfish suddenly did start improving on reversal tasks. Other seemingly fundamental learning differences between fish and rodents likewise vanished when the experiments were redesigned to take into account differences in motivation.

Equalizing motivation is an almost insoluble problem for designers of experiments. Are three goldfish pellets the equivalent of one banana or fifteen bird seeds? How could we even know? We would somehow have to enter into the internal being of different animals to know for sure, and if we could do that we would not need to be devising roundabout experiments to probe their mental processes in the first place.

When we do control for all of the confounding variables that we possibly can, the striking thing about the "pure" cognitive differences that remain is how the similarities in performance between different animals given similar problems vastly outweigh the differences. To be sure, there seems to be little doubt that chimpanzees can learn new associations with a single reinforced trial, and that that is genuinely faster than other mammals or pigeons do it. Monkeys and apes also learn lists faster than pigeons do. Apes and monkeys seem to have a faster and more accurate grasp of numerosity judgments than birds do. The ability to manipulate spatial information appears to be greater in apes than in monkeys.

But again and again experiments have shown that many abilities thought the sole province of "higher" primates can be taught, with patience, to pigeons or other animals. Supposedly superior rhesus monkeys did better than the less advanced cebus monkeys in a visual learning-set problem using colored objects. Then it turned out that the cebus monkeys did better than the rhesus monkeys when gray objects were used. Rats were believed to have superior abilities to pigeons in remembering locations in a radial maze. But after relatively small changes in the procedure and the apparatus, pigeons did just as well.

If such experiments had shown, say, that monkeys can learn lists of forty-five items but pigeons can only learn two, we would probably be convinced that there are some absolute differences in mental machinery between the two species. But the absolute differences are far narrower. Pigeons appear to differ from baboons and people in the way they go about solving problems that involve matching up two images that have been rotated one from the other, but they still get the right answers. They essentially do just as well as monkeys in categorizing slides of birds or fish or other things. Euan Macphail's review of the literature led him to conclude that when it comes to the things that can be honestly called general intelligence, no convincing differences, either qualitative or quantitative, have yet been demonstrated between vertebrate species. While few cognitive researchers would go quite so far -- and indeed we will encounter a number of examples of differences in mental abilities between species that are hard to explain as anything but a fundamental difference in cognitive function -- it is striking how small those differences are, far smaller than "common sense" generally has it. Macphail has suggested that the "no-difference" stance should be taken as a "null hypothesis" in all studies of comparative intelligence; that is, it is an alternative that always has to be considered and ought to be assumed to be the case unless proven otherwise."

EDIT: I've added this and some other points to my Evolutionary drug heuristics article.

comment by gwern · 2011-08-07T16:30:05.037Z

In the first of several irresponsible assumptions I'm going to make, let's assume that the information evolved in time $t$ is proportional to $i = \log t$, while the intelligence evolved is proportional to $e^t = e^{e^i}$. I haven't done the math to support those particular functions; but I'm confident that they fit the data better than linear functions would.

This may be covered by the following assumption about 'spurts', but this doesn't seem to work for me.

If intelligence really could jump like that, shouldn't we expect to see that in humans already? For example, shouldn't we expect to see small mutations or genes with outsized effects on intelligence? Instead, we see that even a highly inbred population with many dozens of nasty genetic problems like the Ashkenazi only get 10 or 20 IQ points*, and we see a long-term stagnation in cranial capacity, and genetic surveys seem to (as far as I've heard) turn up hundreds or thousands of genetic variations weakly linked to small IQ increases. (I cover some related points in my article on evolution & drugs.) All of this makes intelligence look like it has a logarithmic relationship with diminishing returns.

* My understanding is that on a hypothetical 'absolute' scale of intelligence, as you get smarter, each IQ point corresponds to less and less 'actual' intelligence, due to the bell curve/relative ranking that IQ is - it's an ordinal scale, not a cardinal scale.

Replies from: gwern, CarlShulman, Swimmer963
comment by gwern · 2011-08-09T16:43:29.333Z

hundreds or thousands of genetic variations weakly linked to small IQ increases

For example, I may be misinterpreting this new study http://www.guardian.co.uk/science/2011/aug/09/genetic-differences-intelligence but it seems to back me up:

"To test his idea, researchers looked at more than half a million locations in the genetic code of 3,511 unrelated adults. Each of these sites is where people are known to have single-letter variations in their DNA, called single nucleotide polymorphisms (SNPs). These variations were correlated with the individuals' performance in two types of psychometric tests that are established in assessing intelligence: one test measuring recalled knowledge (via vocabulary) and the second measuring problem-solving skills.

They found that 40% of the variation in knowledge (called "crystallised intelligence" by the researchers) and 51% of the variation in problem-solving skills ("fluid-type intelligence") between individuals could be accounted for by the differences in DNA. The results are published on Tuesday in the journal Molecular Psychiatry.

..."It is the first to show biologically and unequivocally that human intelligence is highly polygenic [involving lots of genes] and that purely genetic (SNP) information can be used to predict intelligence," Deary wrote in the journal paper.

Though the researchers now know the proportion of the variation in intelligence that is likely to be a result of genes, they do not know which genes are likely to be most important in determining intelligence. "If they can be found, and if we want to follow them up, to find out some of the mechanisms that underlie successful thinking, our best guess at present is that the number is huge. It could be many thousands," said Deary. "That could be a limitation to progress using this type of research."

From the abstract, "Genome-wide association studies establish that human intelligence is highly heritable and polygenic":

"General intelligence is an important human quantitative trait that accounts for much of the variation in diverse cognitive abilities. Individual differences in intelligence are strongly associated with many important life outcomes, including educational and occupational attainments, income, health and lifespan. Data from twin and family studies are consistent with a high heritability of intelligence, but this inference has been controversial. We conducted a genome-wide analysis of 3511 unrelated adults with data on 549 692 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits. We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits. We partitioned genetic variation on individual chromosomes and found that, on average, longer chromosomes explain more variation. Finally, using just SNP data we predicted ~1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P=0.009 and 0.028, respectively). Our results unequivocally confirm that a substantial proportion of individual differences in human intelligence is due to genetic variation, and are consistent with many genes of small effects underlying the additive genetic influences on intelligence."

Replies from: CarlShulman
comment by CarlShulman · 2011-08-09T21:00:31.964Z

Interesting. This study is a significant positive update for the feasibility of embryo selection for intelligence: it means that sufficiently enormous/high-powered GWAS studies can give good estimates of genetic potential for IQ in embryos. If common SNPs were less important relative to rare deleterious variants (in explaining heritability), then embryo selection would be complicated by the need to attribute effects to novel rare mutations (without having those properties made immediately clear by the population studies) based on physiological models.

Replies from: gwern
comment by gwern · 2011-08-09T21:52:34.757Z

Well, it's good news if you didn't expect it to be possible at all (is that anyone here?), but it's bad news if you were expecting it to be easy or give high gains.

The result seems to say only that X percent of the genome was related in any way; when it comes time to actually predict intelligence, they only get '1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample'. Given that they cover a lot of genetic information and that with this sort of thing, there seem to be diminishing returns, that suggests the final product will only be a few percent, and nowhere near the ceiling set by genetic influence. Maybe a few points is worthwhile but embryo selection is an expensive procedure...

Replies from: CarlShulman
comment by CarlShulman · 2011-08-09T23:51:32.419Z

We already knew that there weren't common variants of large effect. Conditioning on that, more heritability from common variants of small effect is better for embryo selection than heritability from rare variants.

comment by CarlShulman · 2011-08-09T18:04:36.132Z

My understanding is that on a hypothetical 'absolute' scale of intelligence, as you get smarter, each IQ point corresponds to less and less 'actual' intelligence, due to the bell curve/relative ranking that IQ is - it's an ordinal scale, not a cardinal scale.

In what sense? As you go to higher IQs each additional IQ point means a greater (multiplicative) difference in the rarity of individuals with that score. Studies like those of Benbow and Terman show sizable continuing practical impact of increasing IQ on earnings, patents, tenured academic positions, etc.

ETA: Thanks for the clarification.
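The rarity point can be checked under the usual idealization that IQ is normed to a normal distribution with mean 100 and SD 15 (real norms are empirical, so this is only approximate). Note that it speaks only to rarity, not to how much "actual" intelligence each point represents, which is the separate point raised in the next reply:

```python
import math


def fraction_above(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Fraction of a normal(mean, sd) population scoring above `iq`."""
    z = (iq - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


for iq in [100, 115, 130, 145, 160]:
    ratio = fraction_above(iq) / fraction_above(iq + 1)
    print(f"IQ {iq}: about 1 in {1 / fraction_above(iq):,.0f} score higher; "
          f"each further point is {ratio:.3f}x rarer")
```

The per-point rarity ratio grows steadily with IQ, which is the multiplicative effect described above.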

Replies from: gwern
comment by gwern · 2011-08-09T18:48:42.494Z

Because of the construction of the tests. As you go to higher points, each point represents fewer and fewer correctly answered questions. Matrix IQ tests can be mechanically generated by combining simple rules, and they show the same bell curve norms despite what look only like linear increases in number of rules or complexity.

And the Benbow and Terman studies and others do show practical impact, but they don't show a linear impact where each IQ point is as valuable as the previous, and they certainly do not show an increasing marginal returns where the next IQ point gives greater benefits than before!

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-08-09T19:11:40.767Z

Do you mean "at higher IQ values each additional point corresponds to less and less additional (expected) intelligence" or "at higher IQ values each additional point corresponds to less and less total intelligence"?

Replies from: gwern, CarlShulman
comment by gwern · 2011-08-09T20:03:51.397Z

I mean the former - diminishing returns in measured intelligence (IQ) versus actual intelligence.

(I definitely am not saying that IQ points are uncorrelated with actual intelligence at some point, or inversely correlated!)

comment by CarlShulman · 2011-08-09T19:51:00.084Z · LW(p) · GW(p)

I didn't thus misinterpret: my prior on the latter meaning is low.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2011-08-09T19:53:44.387Z · LW(p) · GW(p)

(I corrected my comment before you replied, sorry for acting confused.)

comment by Swimmer963 (Miranda Dixon-Luinenburg) (Swimmer963) · 2011-08-09T18:22:13.751Z · LW(p) · GW(p)

I cover some related points in my article on evolution & drugs.

I followed the link and read the page. Fascinating!

Replies from: gwern
comment by gwern · 2011-08-09T21:51:01.292Z · LW(p) · GW(p)

You're welcome. If you have any suggestions for further examples, I'd be glad to hear them - the essay is kind of skinny for such a grand principle.

comment by teageegeepea · 2009-10-14T23:57:29.440Z · LW(p) · GW(p)

Have you read "The 10,000 Year Explosion"? Cochran & Harpending (and Hawks and some others in the paper it's based on) argue that evolution has accelerated recently. The reason is that there is a larger population, so there are more new mutations to be selected. Also, because our environment is not in a steady state, our genes don't reach a steady state either (as they have in horseshoe crabs and a number of other species). I've only read a bit past the first chapter, but it would seem relevant to your claim.

Replies from: PhilGoetz, timtyler
comment by PhilGoetz · 2009-10-15T00:26:56.305Z · LW(p) · GW(p)

I'd be interested, but evolution over the past 20,000 years doesn't affect the argument I'm making here, which looks at a long-term trend in evolution.

ADDED: There are some factors that will increase genetic exchange and selective pressure, as discussed in some comments below; but note that increasing genetic exchange often slows evolution. There's a balance between being able to spread beneficial mutations and reaching premature convergence; the "sweet spot" of that balance is with very small communities, much, much smaller than continent-sized. Some equations and data indicate that species diversity is much larger when the environment is fragmented into small areas with little communication (google "theory of island biogeography").

comment by timtyler · 2009-10-16T17:31:33.560Z · LW(p) · GW(p)

It sounds as though they are talking about human evolution - plus maybe the evolution of rats, lettuces and pigeons. The numbers of many other species have dwindled.

comment by bogus · 2009-10-14T17:21:55.339Z · LW(p) · GW(p)

What's your justification for the claim that "almost all of the information content of an organism resides in the amino-acid sequence of its domains"?

For your claims about "speed of evolution" to make any sense, it must be the case that we could get rid of the information content which does not reside in these sequences with minimal losses in evolutionary fitness. My guess is that this is not the case, hence your measure of "information" is quite suspect.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T17:29:10.804Z · LW(p) · GW(p)

What's your justification for the claim that "almost all of the information content of an organism resides in the amino-acid sequence of its domains"?

The paragraphs before that claim, plus the fact that the fraction of DNA devoted to regulation, such as promoter sequences, is a small fraction of that devoted to coding, and is also much more redundant. In short: That is the result you find when you measure it.

For your claims about "speed of evolution" to make any sense, it must be the case that we could get rid of the information content which does not reside in these sequences with minimal losses in evolutionary fitness.

I see no reason to think that. You can often kill an organism by changing 1 bit in its genome.

Replies from: SilasBarta, bogus
comment by SilasBarta · 2009-10-14T17:42:14.305Z · LW(p) · GW(p)

What's your justification for the claim that "almost all of the information content of an organism resides in the amino-acid sequence of its domains"?

The paragraphs before that claim, plus the fact that the fraction of DNA devoted to regulation, such as promoter sequences, is a tiny fraction of that devoted to coding, and is also much more redundant. In short: That is the result you find when you measure it. This is fact, not opinion.

Even so, you're saying that I have virtually all of the information needed to (in principle) reconstruct an organism once I see its DNA (which tells me the protein domains and their order therein). What about the information about its ontogenic process (the surrounding womb/shell that enables chemical reactions to happen in just the right way), its physical injuries and modifications, its upbringing, its diet, its culture ...?

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T17:47:26.371Z · LW(p) · GW(p)

Even so, you're saying that I have virtually all of the information needed to (in principle) reconstruct an organism once I see its DNA.

No, I'm not saying that. Yes, there is extra-genomic information. I'm sure that the increase in intelligence happens because more-complex creatures can extract more information from the environment. But that is an output, not an input, in my analysis; it goes into "intelligence", not into "information input". I am asking what function translates genomic information (plus epigenetic information) into "intelligence", or an organism's ability to solve problems.

Replies from: SilasBarta, timtyler
comment by SilasBarta · 2009-10-14T19:28:40.301Z · LW(p) · GW(p)

No, I'm not saying that. Yes, there is extra-genomic information. I'm sure that the increase in intelligence happens because more-complex creatures can extract more information from the environment.

Yes, you were saying "that" where "that" refers to "almost all of the information content of an organism resides in the amino-acid sequence of its domains". And that statement means that, but for practical difficulties, the DNA suffices to tell you what you need to do to build the organism (though you left a caveat that you might still need a small amount of extra information, which I assumed to mean e.g. age).

If you mean something else by it, then you're using the terms in a non-standard way.

This isn't an issue of organisms being able to extract more information from the environment; irrespective of how much information it extracts from the environment, you still need lots of information in addition to the genome to make a copy -- and this is a big part of why Jurassic Park hasn't already happened.

(By the way, based on our previous exchanges, we seem to be looking at similar problems and could help each other: one big hole in my knowledge is that of organic chemistry and thus how existing self-replicators work at the chemical level.)

Replies from: PhilGoetz, billswift
comment by PhilGoetz · 2009-10-14T19:32:35.718Z · LW(p) · GW(p)

Yes, you were saying "that" where "that" refers to "almost all of the information content of an organism resides in the amino-acid sequence of its domains".

I misspoke. I've fixed it now.

Replies from: MichaelBishop
comment by Mike Bishop (MichaelBishop) · 2009-10-14T22:22:59.557Z · LW(p) · GW(p)

I assume you mean you've edited your previous comment. I'd appreciate it if, when people edit their posts or comments, they indicate that they have done so, and ideally how, in the very same comment/post. That said, I don't want to be so nitpicky as to discourage contributions.

comment by billswift · 2009-10-15T16:41:47.774Z · LW(p) · GW(p)

"almost all of the information content of an organism resides in the amino-acid sequence of its domains" is an accurate statement. The uterine environment is involved with the creation of another organism, but it is not part of the information content of the organism, except in the sense that everything that happens to an organism is recorded in damages to that organism.

comment by timtyler · 2009-10-14T19:03:17.418Z · LW(p) · GW(p)

"Almost all of the information content of an organism resides in the amino-acid sequence of its domains" seems simply wrong to me. Whatever the intended point was, it needs rephrasing.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:32:55.231Z · LW(p) · GW(p)

Rephrased.

comment by bogus · 2009-10-14T17:39:38.068Z · LW(p) · GW(p)

the fraction of DNA devoted to regulation, such as promoter sequences, is a tiny fraction of that devoted to coding

Given that most of DNA is junk, considering "information" in raw storage terms makes little sense. It may be a tiny fraction, but is it an important contributor to genetic fitness? If it is, then it's hard to argue that evolution has slowed down.

I see no reason to think that. You can often kill an organism by changing 1 bit in its genome.

ETA: I'm not disputing that, but see above. I'm trying to quantify the information's overall contribution to genetic fitness.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T17:41:24.978Z · LW(p) · GW(p)

I'm only arguing about the speed at which information is produced, and the speed at which intelligence is produced.

comment by Vladimir_Nesov · 2009-10-15T15:53:43.969Z · LW(p) · GW(p)

Even the concepts involved in this "analysis" seem pretty meaningless, but when you start plugging them into "math" and "exponentials", the result is a meaninglessness singularity, that is, a point at which all sanity breaks down and woo ensues!

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-10-15T17:16:02.519Z · LW(p) · GW(p)

Downvoted because the criticism is too vague to really reply to. This isn't the first time I've observed it in your comments; I'd recommend elaborating more if you want them to be useful. (On the other hand, often - including now - your comments do seem like they'd have an insightful idea behind them, which is why I took the time to contribute this comment.)

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-10-15T17:28:35.478Z · LW(p) · GW(p)

It's more of a voice of dissent than criticism, naming an issue rather than constructing it. There is a tradeoff between elaborating and staying silent: on one side too much effort, on another absence of data about what intuition says. I don't feel it's OK for this post to have a positive score, and I'm pretty sure about this judgment even without making details explicit for myself. Sometimes that's a fallacy, of course.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-10-15T18:24:39.942Z · LW(p) · GW(p)

Fair enough. Though I don't think it'd take that much work to make the comment in question more constructive. A sentence or two about why the concepts used are useless would already help a lot.

comment by SilasBarta · 2009-10-15T03:42:50.142Z · LW(p) · GW(p)

May be relevant, and seems to be consistent with your point: evolution has a speed limit and complexity bound.

Replies from: PhilGoetz, timtyler
comment by PhilGoetz · 2009-10-15T15:52:22.657Z · LW(p) · GW(p)

Actually, that's based on the mistaken belief that selection can provide only 1 bit of information per generation. If you'll look down to the end of the original 2007 post, you'll see I gave the correct (and now Eliezer-approved) formulation, which is:

If you take a population of organisms, and you divide it arbitrarily into 2 groups, and you show the 2 groups to God and ask, "Which one of these groups is, on average, more fit?", and God tells you, then you have been given 1 bit of information.

But if you take a population of organisms, and ask God to divide it into 2 groups, one consisting of organisms of above-average fitness, and one consisting of organisms of below-average fitness, that gives you a lot more than 1 bit. It takes n lg(n) bits to sort the population; then you subtract out the information needed to sort each half, so you gain n lg(n) - 2(n/2) lg(n/2) = n[lg(n) - lg(n/2)] = n lg(2) = n bits.
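The n lg(n) figure is the usual approximation. The exact number of bits needed to sort n distinct items is lg(n!). Putting that into the same subtraction gives lg(n!) - 2 lg((n/2)!) = lg C(n, n/2), which is the log of the number of ways to choose which half is the fitter half. The sketch below assumes n is even and that there are no fitness ties. It shows that the exact figure is slightly less than n bits but approaches n in ratio.

```python
import math

def lg_factorial(n: int) -> float:
    """log2(n!) computed via lgamma to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)

def bits_from_half_split(n: int) -> float:
    """Exact bits gained when an oracle splits n organisms (n even) into a
    fitter half and a less fit half: lg(n!) - 2 lg((n/2)!) = lg C(n, n/2)."""
    return lg_factorial(n) - 2 * lg_factorial(n // 2)

for n in (10, 100, 1000, 10000):
    exact = bits_from_half_split(n)
    print(f"n={n}: exact {exact:.1f} bits vs approximation {n}, ratio {exact / n:.4f}")
```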

If you do tournament selection, you have n/2 tournaments, each of which gives you 1 bit, so you get n/2 bits per generation.
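Here is a sketch of one generation of pairwise tournament selection. It assumes a population of even size n with hypothetical random fitness values. Each tournament is one binary comparison, so a generation makes n/2 comparisons. That is at most n/2 bits, and exactly n/2 when each outcome is a coin flip a priori, as it is for randomly paired, independently drawn fitnesses.

```python
import random

def tournament_round(fitness, rng):
    """One generation of pairwise tournament selection.
    Shuffles the population, pairs neighbours, and keeps each pair's fitter member.
    Each comparison answers one yes/no question, so it yields at most 1 bit."""
    order = list(range(len(fitness)))
    rng.shuffle(order)
    winners, bits = [], 0
    for a, b in zip(order[0::2], order[1::2]):
        winners.append(a if fitness[a] >= fitness[b] else b)
        bits += 1  # one binary outcome per tournament
    return winners, bits

rng = random.Random(0)
population = [rng.random() for _ in range(1000)]  # hypothetical fitness values
winners, bits = tournament_round(population, rng)
print(len(winners), bits)  # 500 500
```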

ADDED: This doesn't immediately get you out of the problem, as n bits spread out among n genomes gives you 1 bit per genome. That doesn't mean, though, that you've gained only 1 bit for the species as a whole. The more-important observation in that summary is that organisms with more mutations are more likely to die, eliminating > 1 mutation per death on average.

Replies from: PhilGoetz, Eliezer_Yudkowsky, timtyler, SilasBarta
comment by PhilGoetz · 2009-10-15T21:54:34.595Z · LW(p) · GW(p)

This paragraph is more important:

Although the actual Genome Project's finding of 25,000 genes fits well under Yudkowsky's attempted bound, the mathematical argument failed. A computer simulation failed to bear out the bound, and the flaw appears to have been as follows: Even if one mutation creates one death, this does not mean that one death eliminates only a single mutation. Organisms bearing more deleterious mutations are more likely to lose the evolutionary competition, and so each death can eliminate more mutations than average. If mating is random and the least fit organisms are perfectly eliminated in every generation, the information supportable in the genome goes as the inverse square of the mutation rate.
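The key step in the quoted paragraph is that each death can remove more than the average number of mutations. A toy model makes this easy to see. This is a minimal sketch under three assumptions: fitness depends only on the count of deleterious mutations, mutation counts are Poisson-distributed (mean 5 is an arbitrary choice), and the less fit half is eliminated (truncation selection). It illustrates only the per-death effect and does not reproduce the inverse-square bound.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def mutations_removed_per_death(pop_size=10_000, mean_mutations=5.0, seed=1):
    """Truncation selection on mutation count: the half of the population carrying
    the most mutations dies. Returns (population mean, mean carried by the dead)."""
    rng = random.Random(seed)
    loads = sorted(poisson(mean_mutations, rng) for _ in range(pop_size))
    dead = loads[pop_size // 2:]  # highest loads are the least fit
    return sum(loads) / pop_size, sum(dead) / len(dead)

pop_mean, dead_mean = mutations_removed_per_death()
print(f"population mean {pop_mean:.2f}, mutations removed per death {dead_mean:.2f}")
```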

comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-10-15T17:38:27.724Z · LW(p) · GW(p)

That's not exactly Eliezer-approved, because now the real problem is to tell what the conditions are more like in nature - Worden or MacKay or somewhere in between. That's what I put up on the Wiki as summary of the state of information. Mathematical assumptions are cheaper than empirical truths.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T21:50:36.270Z · LW(p) · GW(p)

Right - I just meant wrt "1 bit per generation regardless of population size".

comment by timtyler · 2009-10-15T21:47:53.360Z · LW(p) · GW(p)

If this is a discussion of Worden's paper, then you seem to have missed that he is not talking about information, but rather "Genetic Information in the Phenotype" - which is actually a completely different concept.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T22:08:28.859Z · LW(p) · GW(p)

he is not talking about information, but rather "Genetic Information in the Phenotype" - which is actually a completely different concept

How so?

Replies from: timtyler
comment by timtyler · 2009-10-16T06:18:06.461Z · LW(p) · GW(p)

For instance:

"GIP is a measure of how much the observed values i in a large population tend to cluster on a few values; if there is no clustering, Gµ=0, and if there is complete clustering on one value, Gµ= log2(Nµ). It is a property of the population, not of an individual."

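The two limiting cases in the quoted definition match a familiar quantity: log2(N) minus the Shannon entropy of the observed distribution. Whether this is exactly Worden's formula for Gµ is an assumption here. The sketch only shows why such a measure describes clustering across a population rather than the information in any one genome.

```python
import math
from collections import Counter

def clustering_bits(observed_values, n_possible):
    """log2(N) minus the Shannon entropy (bits) of the observed value distribution.
    0 when observations are spread uniformly over all N values,
    log2(N) when every observation takes the same value."""
    counts = Counter(observed_values)
    total = len(observed_values)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(n_possible) - entropy

print(clustering_bits([0, 1, 2, 3], 4))  # no clustering
print(clustering_bits([7] * 10, 4))      # complete clustering
```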
comment by SilasBarta · 2009-10-15T15:54:43.957Z · LW(p) · GW(p)

Ah. Sorry for not reading through the history, and thanks for the good explanation!

comment by timtyler · 2009-10-15T20:19:07.277Z · LW(p) · GW(p)

Worden? Essentially that's a crock. See:

http://alife.co.uk/essays/no_speed_limit_for_evolution/

comment by LauraABJ · 2009-10-15T02:38:55.504Z · LW(p) · GW(p)

I think this post presents a very interesting view of the information explosion. Even the task of self-improvement will undergo an evolution of sorts, and we have no better example to draw from than genetic evolution. We have observed an increasing efficiency of information to directed behavior (intelligence as the article puts it), and it is yet to be seen what the limits of that efficiency may be.

Only one upvote? Really?

comment by timtyler · 2009-10-14T19:09:54.881Z · LW(p) · GW(p)

Long-lived organisms do reproduce and evolve more slowly - though note that evolution still acts on their germ-line cells during their lifetime.

However, to jump from there to "evolution has been slowing down in information-theoretic terms" seems like a bit of a wild leap. Bacteria haven't gone away - and they are evolving as fast as ever. How come their evolution is not being counted?

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:21:15.163Z · LW(p) · GW(p)

I mentioned that 64% of the known protein domains, and 84% of those found in eukaryotes, evolved before the prokaryote-eukaryote split. This means that 73% of those found in bacteria evolved in the first billion years of life, before the prok-euk split about 2 billion years ago. So, roughly 3/4 of the information in bacterial genomes evolved during the first 1/3 of life on earth, meaning the rate of information generation in bacteria during that first 1/3 was about 6 times what it was in the next 2/3. This is more surprising than the observations about eukaryote evolution. I interpret it as meaning that competition became more intense as evolution progressed, allowing less experimentation and causing more confinement to local maxima.
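The rate comparison in this paragraph is simple arithmetic: divide each share of the information by the share of time in which it accumulated. This is a minimal sketch. It assumes a constant rate within each period and, as the paragraph implies, that the first period covers 1/3 of the history of life.

```python
def early_to_late_rate_ratio(frac_info_early: float, frac_time_early: float) -> float:
    """Ratio of the average rate of information gain in an early period to that in
    the remaining late period, given the fractions of information and of elapsed
    time that fall in the early period."""
    early_rate = frac_info_early / frac_time_early
    late_rate = (1 - frac_info_early) / (1 - frac_time_early)
    return early_rate / late_rate

print(early_to_late_rate_ratio(3 / 4, 1 / 3))  # the rounded figures in the comment
print(early_to_late_rate_ratio(0.73, 1 / 3))
```

The start-of-life date enters only through `frac_time_early`, so the ratio can be recomputed under other datings.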

Replies from: timtyler
comment by timtyler · 2009-10-14T20:01:33.121Z · LW(p) · GW(p)

I didn't find your figures in the (enormous) referenced papers. Without them, it isn't clear what the figures mean or where the 73% comes from.

Most proteins have not been discovered - and there is probably a bias towards discovering the ones that are shared with eucaryotes - which would distort the figures in favour of finding older genes.

Also, life started around 3.7 billion years ago. Also, it seems rather dubious to measure the rate of information change within evolution as the rate of information change within bacterial genomes. That doesn't consider the information present in the diversity of life.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T21:57:55.595Z · LW(p) · GW(p)

In the Levitt paper, 64% is the number of single-domain architecture proteins that are found in at least two of the 3 groups viruses, prokaryotes, and eukaryotes (figure 3). This is my (very close) approximation for the fraction of families in eukaryotes or prokaryotes found in both eukaryotes and prokaryotes, which isn't reported. 84% is computed from that information, plus the caption of figure 3 saying that prokaryotes contain 88% of SDA families. 73% is computed from all of that information.
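Here is one reconstruction that reproduces the 84% and 73% figures from the 64% and 88% inputs. The reconstruction is an inference, not something the comment spells out. It assumes virus-only families can be ignored, so that every family is prokaryotic, eukaryotic, or both.

```python
# Fractions of all single-domain-architecture families, as quoted above.
shared = 0.64          # found in both prokaryotes and eukaryotes (the stated approximation)
in_prokaryotes = 0.88  # from the Figure 3 caption, as quoted

# Assumption: ignore virus-only families, so P + E - S = 1 (inclusion-exclusion).
in_eukaryotes = 1 - in_prokaryotes + shared

print(f"eukaryotic families also in prokaryotes: {shared / in_eukaryotes:.0%}")   # 84%
print(f"prokaryotic families also in eukaryotes: {shared / in_prokaryotes:.0%}")  # 73%
```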

Most proteins have not been discovered - and there is probably a bias towards discovering the ones that are shared with eucaryotes - which would distort the figures in favour of finding older genes.

There is no bias towards discovering genes shared with eukaryotes in ordinary sequencing. We sequence complete genomes. Almost all of the bacterial genes known come from these whole-genome projects. We've sequenced many more bacteria than eukaryotes. Bacterial genomes don't contain much repetitive intergenic DNA, so you get nice complete genome assemblies.

Life starting 3.7 billion years ago - could be. Google's top ten show claims ranging from 2.7GY to 4.4GY ago. Adding that .7 billion could make the information-growth curve more linear, and remove one exponentiation in my analysis.

Also, it seems rather dubious to measure the rate of information change within evolution as the rate of information change within bacterial genomes. That doesn't consider the information present in the diversity of life.

Let's just say I'm measuring the information in DNA. Information in "the diversity of life" is too vague. I don't want to measure any information that an organism or an ecosystem gains from the environment by expressing those genetic codes.

Replies from: SilasBarta, SilasBarta, timtyler, taw, timtyler, timtyler, timtyler
comment by SilasBarta · 2009-10-14T22:33:00.867Z · LW(p) · GW(p)

Out of curiosity, has the protein problem yet been mathematically formalized so that it can be handed over to computers? That is, do we understand molecular dynamics well enough to automatically discover all possible proteins, starting from the known ones?

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T00:28:51.355Z · LW(p) · GW(p)

We could list them in order, if that's what you mean. It would be a Library of Babel. Could we determine their structures? No. Need much much more computational power, for starters.

Replies from: SilasBarta
comment by SilasBarta · 2009-10-15T03:20:44.691Z · LW(p) · GW(p)

That's what I intended to refer to -- the tertiary structure. Have they thrown the full mathematical toolbox at it? I've heard that predicting it is NP-complete, in that it's order-dependent, etc., but have they thrown all the data-mining tricks at it to find out whatever additional regularity constrains the folding so as to make it less computationally intensive?

The reason I don't immediately assume the best minds have already considered this, is that there seems to be some disconnect between the biological and mathematical communities -- just recently I heard that they just got around to using the method of adjacency matrix eigenvectors (i.e. what Google's PageRank uses) for identifying critical species in ecosystems.
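For readers unfamiliar with the method mentioned here: eigenvector centrality scores each node by the scores of the nodes it is linked to. It can be computed by power iteration. This is a minimal sketch on a hypothetical four-species interaction web. PageRank is a closely related variant that adds damping and normalizes links by out-degree, and real ecological studies may use different matrices.

```python
def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration: repeatedly multiply a score vector by the adjacency matrix
    and renormalise. For a connected, non-bipartite undirected graph this
    converges to the leading eigenvector."""
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(iterations):
        new = [sum(adjacency[i][j] * scores[j] for j in range(n)) for i in range(n)]
        norm = max(new)
        scores = [s / norm for s in new]
    return scores

# Hypothetical web: species 0 interacts with all others; 1 and 2 also interact.
web = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(web)
print(max(range(len(scores)), key=scores.__getitem__))  # most central species: 0
```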

Replies from: pengvado
comment by pengvado · 2009-10-15T09:52:17.894Z · LW(p) · GW(p)

Predicting the ground state of a protein is NP-hard. But nature can't solve NP-hard problems either, so predicting what actually happens when a protein folds is merely in BQP.

I would expect most proteins found in natural organisms to be in some sense easy instances of the protein folding problem, i.e. that the BQP method finds the ground state. Because the alternative is getting stuck in local minima, which probably means it doesn't fold consistently to the same shape, which is probably an evolutionary disadvantage. But if there are any remaining differences, then for the purpose of protein structure prediction it's actually the local minimum that's the right answer, and the NP problem that doesn't get solved is irrelevant.

And yes there are quantum simulation people hard at work on the problem, so it's not just biologists. But I don't know enough of the details to say whether they've exhausted the conventional toolbox of heavy-duty math yet.

Replies from: Cyan, DanArmak, PhilGoetz
comment by Cyan · 2009-10-15T15:07:56.665Z · LW(p) · GW(p)

This is a nice insight.

comment by DanArmak · 2009-10-15T16:08:05.067Z · LW(p) · GW(p)

But nature can't solve NP-hard problems in general either, so predicting what actually happens when a protein folds is merely in BQP.

That explains why I've seen descriptions of folding prediction algorithms that run in polynomial time, on the order of n^5 or less with n = number of amino acids in primary chain.

I wanted to add that many proteins found in nature require chaperones to fold correctly. These can be any other molecules - usually proteins, ribozymes, or lipids - that influence the folding process to either assist or prevent certain configurations. They can even form temporary covalent bonds with the protein being folded. (Or permanent ones; some working proteins have attached sugars, metals, other proteins, etc.) And the protein-making machinery in the ribosomes has a lot of complexity as well - amino acid chains don't just suddenly appear and start folding.

All this makes it much harder to predict the folding and action of a protein in a real cell environment. In vivo experiments can't be replaced by calculations without simulating a big chunk of the whole cell on a molecular level.

comment by PhilGoetz · 2009-10-15T15:33:20.996Z · LW(p) · GW(p)

Why do you think nature can't solve NP-hard problems? When you dip a twisted wire with 3D structure into a dish of liquid soap and water, and pull it out and get a soap film, didn't it just solve an NP problem?

All of the bonds and atoms in a protein are "computing" simultaneously, so the fact that the problem is NP in terms of number of molecules isn't a problem. I don't understand BQP & so can't comment on that.

Incidentally, your observation about consistent folding is often right, but some proteins have functions that depend on their folding to different shapes under different conditions. Usually these shapes are similar. I don't know if any proteins routinely fold into 2 very different shapes.

Replies from: SilasBarta, Cyan, pengvado
comment by SilasBarta · 2009-10-15T15:51:29.210Z · LW(p) · GW(p)

Why do you think nature can't solve NP-hard problems? When you dip a twisted wire with 3D structure into a dish of liquid soap and water, and pull it out and get a soap film, didn't it just solve an NP problem?

Oh no. Ohhhhh no. Not somebody trying to claim that nature solves problems in NP in polynomial time because of bubble shape minimization again!

Like Cyan just said, nature does not solve the NP problem of a global optimal configuration. It just finds a local optimum, which is already known to be computationally easy! Here's a reference list.
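The distinction between a local optimum and the global one can be shown on a toy energy landscape. The example below is a spin glass, not protein folding. Finding its global minimum is NP-hard in general, while a greedy descent that only looks at neighbouring configurations stops at whatever local minimum it reaches first. This is a minimal sketch with hypothetical random couplings, and it is small enough that the global minimum can still be found by brute force for comparison.

```python
import itertools
import random

def energy(spins, couplings):
    """Ising spin-glass energy: E = -sum_{i<j} J_ij s_i s_j, with s_i in {-1, +1}."""
    n = len(spins)
    return -sum(couplings[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def greedy_descent(spins, couplings):
    """Flip single spins while any flip lowers the energy; stops at a local minimum.
    Every accepted flip strictly lowers E, so this always terminates."""
    spins = list(spins)
    improved = True
    while improved:
        improved = False
        for i in range(len(spins)):
            before = energy(spins, couplings)
            spins[i] = -spins[i]
            if energy(spins, couplings) < before:
                improved = True
            else:
                spins[i] = -spins[i]  # undo a flip that did not help
    return spins

def global_minimum(couplings):
    """Exhaustive search over all 2^n configurations (exponential time)."""
    n = len(couplings)
    return min(energy(s, couplings) for s in itertools.product((-1, 1), repeat=n))

rng = random.Random(42)
n = 12
J = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
start = [rng.choice((-1, 1)) for _ in range(n)]
local = energy(greedy_descent(start, J), J)
print(f"local minimum {local:.3f}, global minimum {global_minimum(J):.3f}")
```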

comment by Cyan · 2009-10-15T15:45:31.053Z · LW(p) · GW(p)

The more convoluted the wire structure, the more likely the soap film is to be in a stable sub-optimal configuration.

comment by pengvado · 2009-10-15T16:24:34.267Z · LW(p) · GW(p)

All of the bonds and atoms in a protein are "computing" simultaneously,

And there are at most N^2 of them, so that doesn't transform exponential into tractable. It's not even a Grover speedup (2^N -> 2^(N/2)), which we do know how to get out of a quantum computer.
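Here is the scale of this argument in numbers. Suppose each of at most N^2 interactions works in parallel. Exhaustive search over 2^N configurations still needs about 2^N / N^2 steps, which remains exponential. Even a Grover-style quadratic speedup leaves 2^(N/2). The sketch assumes two states per residue, as in the 2^N framing above.

```python
def search_scales(n: int):
    """Return (N^2, 2^(N/2), 2^N) for a chain of n residues with 2 states each."""
    return n ** 2, 2 ** (n // 2), 2 ** n

for n in (50, 100, 200):
    pairwise, grover, brute = search_scales(n)
    print(f"N={n}: N^2={pairwise}, 2^(N/2)={grover:.3e}, 2^N={brute:.3e}, "
          f"2^N / N^2={brute / pairwise:.3e}")
```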

comment by SilasBarta · 2009-10-15T23:39:04.333Z · LW(p) · GW(p)

And, interestingly enough, slashdot just ran a story on progress in protein folding in the nucleus of a cell.

comment by timtyler · 2009-10-15T20:45:25.343Z · LW(p) · GW(p)

So: with two data-points and a back-of-the-envelope calculation, you conclude that DNA-evolution has been slowing down?

It seems like pretty feeble evidence to me :-(

I should add that - conventionally - evolutionary rates are measured in 'darwins', and are based on trait variation (not variation in the underlying genotypes) because of how evolution is defined.
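For reference, the darwin (introduced by J.B.S. Haldane) measures the change in the natural logarithm of a trait's mean value per million years. This is a minimal sketch with hypothetical trait values.

```python
import math

def rate_in_darwins(trait_start: float, trait_end: float, elapsed_myr: float) -> float:
    """Haldane's darwin: change in the natural log of a trait's mean value per
    million years. One darwin is a change by a factor of e per million years."""
    return (math.log(trait_end) - math.log(trait_start)) / elapsed_myr

# Hypothetical: a mean tooth length rising from 10 mm to 12 mm over 2 million years.
print(f"{rate_in_darwins(10.0, 12.0, 2.0):.4f} darwins")
```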

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-16T03:48:39.550Z · LW(p) · GW(p)

Dude. It's an idea. I said not to take my conclusions too seriously. This is not a refereed journal. Why do you think your job is only to find fault with ideas, and never to play with them, try them out, or look for other evidence?

Different people measure evolutionary rate or distance differently, depending on what their data is. People studying genetic evolution never use darwins. The reason for bringing up genomes at all in this post is to look at the shape of the relationship between genome information and phenotypic complexity; so to start by measuring only phenotypes would get you nowhere.

Replies from: timtyler
comment by timtyler · 2009-10-16T06:31:08.638Z · LW(p) · GW(p)

Inaccurate premise: I don't think my job is "only to find fault with ideas". When I do that, it's often because that is the simplest and fastest way to contribute. Destruction is easier than construction - but it is pretty helpful nonetheless. Critics have saved me endless hours of frustration pursuing bad ideas. I wish to pass some of that on.

In this particular sub-thread, my behavior is actually fairly selfish: if there's reasonable evidence that DNA-evolution has been slowing down, I would be interested in hearing about it. However, I'm not going to find such evidence in this thread if people get the idea that this point has already been established.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-21T22:20:19.324Z · LW(p) · GW(p)

I don't have strong evidence that DNA evolution has been slowing down in bacteria. I presented both evidence and explanation why it has been slowing down in eukaryotes. That is all that matters for this post; because the point of referring to DNA evolution in this post has to do with how efficiently evolution uses information in the production of intelligence. Eukaryotes are more intelligent than bacteria.

comment by taw · 2009-10-15T00:02:25.305Z · LW(p) · GW(p)

So I've read the paper. According to it, and it seems very plausible to me, we have some reason to suspect we seriously underestimate the number of SDA families. The most widely distributed SDA families are the most likely to be known (those often happen to occur in multiple groups), and less widely distributed families are the least likely to be known (those often happen to occur in one group only).

The actual percentage of shared SDA families is almost certainly lower than what we can naively estimate from current data. I don't know how much lower. Maybe just a few percent, maybe a lot.
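The direction of this first effect can be shown with expected counts. This is a minimal sketch with hypothetical detection probabilities. If shared families are more likely to have been discovered than single-group families, then the shared fraction among discovered families overstates the true fraction.

```python
def observed_shared_fraction(true_shared: float, p_detect_shared: float,
                             p_detect_single: float) -> float:
    """Expected fraction of *discovered* families that are shared, when shared
    families are detected with probability p_detect_shared and single-group
    families with probability p_detect_single."""
    found_shared = true_shared * p_detect_shared
    found_single = (1 - true_shared) * p_detect_single
    return found_shared / (found_shared + found_single)

# Hypothetical: half of all families are truly shared, but shared families are
# twice as likely to have been found.
print(observed_shared_fraction(0.50, 0.8, 0.4))  # about 0.667, not 0.5
```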

Not mentioned in the paper, but quite obvious, is the huge amount of horizontal gene transfer happening on evolutionary scales like that (especially with viruses). It also increases apparent sharing and makes families appear older than they really are.

A third effect is that an SDA family that diverged a long time ago might be unrecognizable as a single family, while one that developed more recently is still recognizable as such. This can only increase the apparent age of SDA families.

So there are at least three effects of unknown magnitude, but known direction. If any of them is strong enough, it invalidates your hypothesis. If all of them are weak, your hypothesis still relies a lot on dating of eukaryote-prokaryote split.

comment by timtyler · 2009-10-14T22:48:13.955Z · LW(p) · GW(p)

Imagine you have 100 related organisms in a bag with PRO written on it. You take 10 and put them in a bag with EUC written on it. Then you sequence everything - and calculate what fraction of the total number of genes are found in both bags - and you come back with 64%.

That this is less than 100% doesn't represent the genomes in the EUC bag changing. It just means that you selected a small sample from the PRO bag. Choose 5 and the figure would have been smaller. Choose 50 and the figure would have been bigger.
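This thought experiment can be made concrete with expected values. The sketch below rests on a hypothetical assumption: each gene is present in each organism independently with probability q = 0.05. That ignores the correlations that shared ancestry would create. The ratio describes the limit of many genes.

```python
def expected_shared_fraction(total: int, moved: int, q: float) -> float:
    """Expected fraction of genes (among those present in either bag) found in both,
    after moving `moved` of `total` organisms from the PRO bag to the EUC bag.
    Assumes each gene is present in each organism independently with probability q."""
    in_euc = 1 - (1 - q) ** moved            # gene seen in at least one EUC organism
    in_pro = 1 - (1 - q) ** (total - moved)  # gene seen in at least one remaining PRO organism
    both = in_euc * in_pro
    either = in_euc + in_pro - both
    return both / either

for moved in (5, 10, 50):
    print(moved, round(expected_shared_fraction(100, moved, q=0.05), 3))
```

Under these assumptions the shared fraction rises as more organisms are moved into the EUC bag, for the sample sizes shown.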

comment by timtyler · 2009-10-14T22:33:02.643Z · LW(p) · GW(p)

At least 3.7 billion years ago, then.

comment by timtyler · 2009-10-14T22:26:10.232Z · LW(p) · GW(p)

I too was talking about information in DNA. The number of species influences the quantity of information present in the DNA of an ecosystem - just as rolling a die 100 times supplies more information than rolling it once.
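Here is the die analogy in bits. It assumes fair, independent rolls. Treating species as independent draws is only an analogy.

```python
import math

def entropy_of_fair_die_rolls(rolls: int, sides: int = 6) -> float:
    """Shannon entropy in bits of `rolls` independent rolls of a fair die."""
    return rolls * math.log2(sides)

print(entropy_of_fair_die_rolls(1), entropy_of_fair_die_rolls(100))  # ~2.585 vs ~258.5 bits
```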

comment by timtyler · 2009-10-14T18:52:35.405Z · LW(p) · GW(p)

I can't say I care much for the Evolution/Science split in this post.

A more natural split would be between DNA evolution and cultural evolution.

Evolution is best seen as an umbrella term that covers any copying-with-variation-and-selection process - and science is one example of cultural evolution.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:25:11.721Z · LW(p) · GW(p)

Agreed on DNA evolution. Not agreed on cultural evolution. Science is easier to quantify. My data concerns mainly science.

Replies from: timtyler
comment by timtyler · 2009-10-14T22:22:58.421Z · LW(p) · GW(p)

You were being specific, when you could have been more general. It isn't just science that is evolving rapidly, it is technology, fashion, music, literature - and so on. Science makes a fine example - but it's just one part among many of a snowballing human culture - and readers should ideally be aware of that.

comment by timtyler · 2009-10-14T18:43:27.626Z · LW(p) · GW(p)

Re: "I therefore expect the pace of evolution to suddenly switch from falling, to increasing [...]"

Since the pace of evolution has clearly been increasing recently, this seems like a rather retroactive prediction.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T19:28:52.365Z · LW(p) · GW(p)

Since the pace of evolution has clearly been increasing recently, this seems like a rather retroactive prediction.

I refer you to Katja's blog post, "'Clearly' covers murky thought". I'm not aware of any evidence that the pace of evolution has been increasing recently. I am pretty sure that the pace of DNA evolution has not been increasing recently. I am aware of many reasons for predicting that the pace of evolution of humans has been decreasing in the past centuries.

But I will change the statement to specify rate of information acquisition in genomes.

Replies from: timtyler
comment by timtyler · 2009-10-14T20:15:01.287Z · LW(p) · GW(p)

Evolution as a whole has clearly been speeding up recently - due to the current mass extinction, the evolution of intelligent design, genetic engineering, etc. Today we are witnessing an unprecedented rate of evolutionary change. Just look out of your window.

For the human genome, perhaps see: Human evolution is 'speeding up'

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-14T22:06:53.726Z · LW(p) · GW(p)

We are witnessing many extinctions. That's a loss of information, not a gain of information. But, yes, massive disturbances and relocations increase the rate of evolution, all else being equal.

Replies from: timtyler
comment by timtyler · 2009-10-14T22:53:52.070Z · LW(p) · GW(p)

If you refer to the "pace of evolution" I should hope that mass extinctions count as rapid evolution. The gene frequencies there are changing pretty rapidly.

If you mean to refer to some other kind of metric, you should probably be more specific - for example, you might want to consider talking about "constructive evolution" - or something similar.

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T00:31:55.842Z · LW(p) · GW(p)

"If you refer to the "pace of evolution" I should hope that mass extinctions count as rapid evolution."

I wouldn't. That's just loss. If your planet was hit by a supernova, would you call that rapid evolution?

I've been quite specific. I'm talking about the accumulation of information in the DNA of all organisms.

Replies from: timtyler
comment by timtyler · 2009-10-15T21:24:21.812Z · LW(p) · GW(p)

Don't call it "evolution", then - or people will get very confused. Evolution is about change - not about gain or loss. Check with the definition of the term.

comment by PeterKinnon · 2009-10-15T03:31:18.571Z · LW(p) · GW(p)

Sadly we here observe a retreat into the simple language of mathematics. I am not decrying mathematics, nor am I underestimating the great value of that language in extending knowledge of the physical world by bypassing the complexities and irrelevancies common to the natural languages.

It does, however, suffer from two major weaknesses:

Firstly, like all languages, it is capable of generating fictions - entities and scenarios which have no correspondence with the real world.

Secondly, like all reasoning or computational processes, it is sensitive to its raw data. This is expressed in the venerable IT mantra "Garbage In - Garbage Out".

The second of these is, I believe, the main culprit in generating the conclusions that you affirm. For you seem to have fallen into the same trap as Jonathan Huebner, who, using a rather arbitrary criterion of "significant advances", concluded that the rate of innovation has actually been decreasing since a maximum that occurred in 1873. A glance around the shelves of a Chemical Abstracts archive, for instance, will quickly tell a denizen of the real world that something is seriously wrong with that analysis.

Now it is very evident that not all aspects of human activities, even some technologies, are subject to exponential growth, as underlined in your very interesting presentation "The Myth of Accelerating Change". Similarly, the evolution of life has not exhibited a consistent acceleration in all its bifurcations and ramifications. The "significant steps" used by Huebner (even after allowing for the strong "self selection" effect involved in that analysis) are merely break points, sharp upheavals resulting from the accumulation of innumerable "baby steps" wherein lies the exponentiality.

The input data for your own analysis lies in the conceptual arbitrariness of "important knowledge" and "useful knowledge". These are the foundation of your argument. If their validity is in question, as I propose it is, then all the arithmetic in the world will not hold it together. Similarly extrapolations of arbitrary data to towers of exponentiality must be considered as pure flights of fancy. GIGO rules!

The great problem with ascribing importance and utility (citation hits certainly won't do) is in determining the entity to which these are relative. Certainly these value judgments will vary enormously between individual humans, and consensus is probably out of the question. We can tie it down a little by noticing that a fairly constant gross exponentiality appears to be tied to technology rather than other human activities. This distinction is not easy to make, as many aspects of social and individual behavior are themselves driven by technological change. Such features as sex, art and religion are among those with some immunity.

One way to escape from this dilemma, however, if we are to properly interpret the patterns science observes in nature, is to learn the trick of stepping outside our (very natural) anthropocentric shell so that objectivity is not compromised. My book "Unusual Perspectives" (the electronic edition of which can be freely downloaded) uses this approach to arrive at the proposition that the evolution of living systems and the evolution of technology (with which it is contiguous) are components of an ongoing natural process. From this point of reference we can perhaps better determine what aspects are "important" or "useful" (to the life process). This stance certainly strongly suggests the advent of what some call a singularity, but which I prefer to regard as something akin to a fairly imminent phase transition. I further speculate that, considering the history of the process and the apparent direction of its vector, the internet could, as the result of an inevitable self-assembly mechanism, be the most likely candidate for the next prime effector of the process. "Unusual Perspectives" can be downloaded from the dedicated website: www.unusual-perspectives.net

Replies from: PhilGoetz
comment by PhilGoetz · 2009-10-15T15:38:25.673Z · LW(p) · GW(p)

"The input data for your own analysis lies in the conceptual arbitrariness of "important knowledge" and "useful knowledge". These are the foundation of your argument. If their validity is in question, as I propose it is, then all the arithmetic in the world will not hold it together."

Nope. If you'll look at the math, you'll see that I said "important knowledge" ranges somewhere between O(log(raw information)) and O(raw information). Important knowledge = O(raw information) means we do not make any distinction between raw information and "important" information.
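To make the two bounds concrete, here is a minimal sketch. It assumes, purely for illustration, that raw information grows exponentially with a fixed doubling time; the doubling time of 2 years and the function names `raw_information` and `important_knowledge_bounds` are hypothetical and not taken from the original post. Constant factors are dropped, as big-O notation allows.

```python
import math


def raw_information(t_years, doubling_time_years=2.0, initial=1.0):
    """Hypothetical stock of raw information at time t.

    Assumption for illustration only: it doubles every
    `doubling_time_years` years, starting from `initial`.
    """
    return initial * 2 ** (t_years / doubling_time_years)


def important_knowledge_bounds(raw):
    """Return (lower, upper) growth-order bounds on important knowledge.

    lower ~ log2(raw information)  -> the O(log(raw information)) case
    upper ~ raw information itself -> the O(raw information) case
    Constant factors are ignored.
    """
    return math.log2(raw), raw


if __name__ == "__main__":
    for t in (0, 10, 20, 40):
        raw = raw_information(t)
        lower, upper = important_knowledge_bounds(raw)
        print(f"t={t:>2} yr  log-bound={lower:5.1f}  linear-bound={upper:>12.0f}")
```

Under these assumptions, every 10 years the raw information (and the linear bound) is multiplied by 32, while the log bound only rises by 5. In other words, if important knowledge sits at the O(log) end, exponential growth in raw information yields only linear growth in important knowledge.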

"the evolution of living systems and the evolution of technology (with which it is contiguous) are components of an ongoing natural process."

Some of the ancients would have said that human inventions and nature are fundamentally the same, since nature is the invention of God. Now some people say that technology and evolution are fundamentally the same, since humans are part of nature.

Whatever. I just want to know if the curves match.