Posts

Comments

Comment by Jed_Harris on Artificial Mysterious Intelligence · 2008-12-07T21:05:43.000Z · LW · GW

I should mention that the NIPS '08 papers aren't on line yet, but all previous conferences do have the papers, tutorials, slides, background material, etc. on line. For example here's last year.

Comment by Jed_Harris on Artificial Mysterious Intelligence · 2008-12-07T21:00:36.000Z · LW · GW

The arguments Eliezer describes are made, and his reactions are fair. But really the actual research community "grew out" of most of this stuff a while back. CYC and the "common sense" efforts were always a sideshow (in terms of research money and staff, not to mention results). Neural networks were a metonym for statistical learning for a while, then serious researchers figured out they needed to address statistical learning explicitly. Etc.

Admittedly there's always excessive enthusiasm for the current hot thing. A few years ago it was support vector machines; I'm not sure what it is now.

I recognize there's some need to deflate popular misconceptions, but there's also a need to move on and look at current work.

Eliezer, I'd be very interested in your comments on (what I regard as) the best current work. Examples for you to consider would be Sebastian Thrun, Andrew Ng (both in robotics at Stanford), Chris Manning (linguistics at Stanford), and the papers in the last couple of NIPS conferences (the word "Neural" in the conference title is just a fossil, don't have an allergic reaction).

As an entertaining side note, here's an abstract for a poster for NIPS '08 (happening tomorrow) that addresses the crossover between AI and ems:

A Bayesian Approach for Extracting State Transition Dynamics from Multiple Spike Trains

Neural activity is non-stationary and varies across time. Hidden Markov Models (HMMs) have been used to track the state transition among quasi-stationary discrete neural states. Within this context, an independent Poisson model has been used for the output distribution of HMMs; hence, the model is incapable of tracking the change in correlation without modulating the firing rate. To achieve this, we applied a multivariate Poisson distribution with a correlation term for the output distribution of HMMs. We formulated a Variational Bayes (VB) inference for the model. The VB could automatically determine the appropriate number of hidden states and correlation types while avoiding the overlearning problem. We developed an efficient algorithm for computing posteriors using the recursive relationship of a multivariate Poisson distribution. We demonstrated the performance of our method on synthetic data and a real spike train recorded from a songbird.

This is a pretty good example of what I meant by "solving engineering problems" and it should help the ems program "cut corners".
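For readers who want a concrete handle on the kind of model the abstract describes, here is a minimal sketch of the baseline it starts from: an HMM whose hidden states emit independent Poisson spike counts, scored with the standard forward algorithm. The multivariate-Poisson output and the Variational Bayes inference that are the paper's actual contribution are not implemented here, and all rates and transition probabilities below are invented.

```python
# Minimal sketch: forward-algorithm log-likelihood for an HMM whose hidden
# states emit independent Poisson spike counts per neuron. This is only the
# baseline model the abstract contrasts with its multivariate-Poisson / VB
# extension; the toy numbers below are invented for illustration.
import numpy as np
from scipy.stats import poisson

def hmm_poisson_loglik(counts, log_A, log_pi, rates):
    """counts: (T, N) spike counts; log_A: (K, K) log transition matrix;
    log_pi: (K,) log initial state distribution; rates: (K, N) Poisson rates."""
    T, K = counts.shape[0], rates.shape[0]
    # log P(counts_t | state k): independent Poisson across the N neurons
    log_obs = np.array([poisson.logpmf(counts, rates[k]).sum(axis=1)
                        for k in range(K)]).T              # shape (T, K)
    log_alpha = log_pi + log_obs[0]                        # forward recursion
    for t in range(1, T):
        log_alpha = log_obs[t] + np.logaddexp.reduce(
            log_alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(log_alpha)                  # log P(counts)

# Invented toy data: 2 hidden states, 3 neurons, 100 time bins.
rng = np.random.default_rng(0)
rates = np.array([[1.0, 5.0, 2.0],
                  [4.0, 1.0, 6.0]])                        # (K, N) firing rates
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
pi = np.array([0.5, 0.5])
states = [0]
for _ in range(99):
    states.append(rng.choice(2, p=A[states[-1]]))
counts = rng.poisson(rates[states])                        # (T, N) spike counts
print("log-likelihood:", hmm_poisson_loglik(counts, np.log(A), np.log(pi), rates))
```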

Comment by Jed_Harris on Sustained Strong Recursion · 2008-12-06T17:36:56.000Z · LW · GW

Regarding serial vs. parallel:

The effect on progress is indirect and as a result hard to figure out with confidence.

We have gradually learned how to get nearly linear speedups from large numbers of cores. We can now manage linear speedups over dozens of cores for fairly structured computations, and linear speedups over hundreds of cores are possible in many cases. This is well beyond the near-future number of cores per chip. For the purposes of this analysis I think we can assume that Intel can get linear speedups from increasing processors per chip, say for the next ten years.
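To see why "nearly linear" is a strong claim, the standard back-of-the-envelope is Amdahl's law: a workload with serial fraction s run on p cores speeds up by at most 1 / (s + (1 - s) / p). The serial fractions below are invented, not measurements of any Intel workload.

```python
# Amdahl's law: a program with serial fraction s run on p cores speeds up by
# 1 / (s + (1 - s) / p). Even a small serial fraction caps the gain.
# The fractions below are illustrative, not measurements of a real workload.
def amdahl_speedup(serial_fraction, cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for s in (0.01, 0.05, 0.20):
    print(f"serial fraction {s:.0%}:",
          [round(amdahl_speedup(s, p), 1) for p in (16, 64, 256)])
# serial fraction 1%:  [13.9, 39.3, 72.1]
# serial fraction 5%:  [9.1, 15.4, 18.6]
# serial fraction 20%: [4.0, 4.7, 4.9]
```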

But there are other issues.

More complicated / difficult programming models may not slow down a given program, but they make changing programs more difficult.

Over time our ability to create malleable highly parallel programs has improved. In special cases a serial program can be "automatically" parallelized (compilation with hints) but mostly parallelization still requires explicit design. But the abstractions have gotten much easier to use and revise.
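A toy illustration of what that explicit design looks like in practice: the programmer has to carve the work into independent tasks and hand them to a pool, rather than relying on a compiler to discover the parallelism. This is a generic sketch, not a description of Intel's tools; the workload function is invented.

```python
# Illustrative only: the "explicit design" step is restructuring a serial loop
# into independent tasks that a pool can map over.
from concurrent.futures import ProcessPoolExecutor

def simulate_block(seed):
    # stand-in for one independent chunk of a larger simulation
    x = seed
    for _ in range(100_000):
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64
    return x

if __name__ == "__main__":
    seeds = range(32)
    # serial version: results = [simulate_block(s) for s in seeds]
    with ProcessPoolExecutor() as pool:        # parallel version
        results = list(pool.map(simulate_block, seeds))
    print(len(results), "blocks done")
```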

(In my earlier analysis I was assuming, I think correctly, that this improvement was a function of human thought without much computational assist. The relevant experiments aren't computationally expensive. Intel has been building massively parallel systems since the mid-80s, but that work didn't produce most of the major improvements. The parallel programming ideas accreted slowly from a very broad community.)

So I guess I'd say that with the current software technology and trend, Intel can probably maintain most of its computational curve-riding. Certainly simulations with a known software architecture can be parallelized quite effectively, and can be maintained as requirements evolve.

The limitation will be on changes that violate the current pervasive assumptions of the simulation design. I don't know what those are these days, and if I did I probably couldn't say. However they reflect properties that are common to all the "processor like" chips Intel designs, over all the processes it can easily imagine.

Changes to software that involve revising pervasive assumptions have always been difficult, of course. Parallelization just increases the difficulty by some significant constant factor (not really constant, though; it has been slowly decreasing over time, as noted above).

So the types of improvement that will slow down are the ones that involve major new ways to simulate chips, or major new design approaches that don't fit Intel's current assumptions about chip micro-architecture or processes.

While these could be significant, unfortunately I can't predict how or when. I can't even come up with a list of examples where such improvements were made. They are pretty infrequent and hard to categorize.

I hope this helps.

Comment by Jed_Harris on Sustained Strong Recursion · 2008-12-06T01:09:44.000Z · LW · GW

I'll try to estimate as requested, but substituting fixed computing power for "riding the curve" (as Intel does now) is a bit of an apples to fruit cocktail comparison, so I'm not sure how useful it is. A more direct comparison would be with always having a computing infrastructure from 10 years in the future or past.

Even with this amendment, the (necessary) changes to design, test, and debugging processes make this hard to answer...

I'll think out loud a bit.

Here's the first quick guess I can make that I'm moderately sure of: The length of time to go through a design cycle (including shrinks and transitions to new processes) would scale pretty closely with computing power, keeping the other constraints pretty much constant. (Same designers, same number of bugs acceptable, etc.) So if we assume the power follows Moore's law (probably too simple as others have pointed out) cycles would run hundreds of times faster with computing power from 10 years in the future.

This more or less fits the reality, in that design cycles have stayed about the same length while chips have gotten hundreds of times more complex, and also much faster, both of which soak up computing power.
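To make the arithmetic behind "hundreds of times" explicit, the only assumption needed is the doubling period; which period actually applied is part of the "probably too simple" caveat above.

```python
# Compute-power ratio across a 10-year gap for a few assumed doubling periods.
for months in (12, 18, 24):
    print(f"doubling every {months} months -> 10-year factor ~ {2 ** (120 / months):,.0f}x")
# doubling every 12 months -> 10-year factor ~ 1,024x
# doubling every 18 months -> 10-year factor ~ 102x
# doubling every 24 months -> 10-year factor ~ 32x
```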

Probably more computing power would have also allowed faster process evolution (basically meaning smaller feature sizes) but I was never a process designer so I can't really generate a firm opinion on that. A lot of physical experimentation is required and much of that wouldn't go faster. So I'm going to assume very conservatively that the increased or decreased computing power would have no effect on process development.

The number of transistors on a chip is limited by process considerations, so adding computing power doesn't directly enable more complex chips. Leaving the number of devices the same and just cycling the design of chips with more or less the same architecture hundreds of times faster doesn't make much economic sense. Maybe instead Intel would create hundreds of times as many chip designs, but that implies a completely different corporate strategy so I won't pursue that.

In this scenario, experimentation via computing gets hundreds of times "cheaper" than in our world, so it would get used much more heavily. Given these cheap experiments, I'd guess Intel would have adopted much more radical designs.

Examples of more radical approaches would be self-clocked chips, much more internal parallelism (right now only about 1/10 of the devices change state on any clock), chips that directly use more of the quantum properties of the material, chips that work with values other than 0 and 1, direct use of probabilistic computing, etc. In other words, designers would have pushed much further out into the micro-architectural design space, to squeeze more function out of the devices. Some of this (e.g. probabilistic or quantum-enhanced computing) could propagate up to the instruction set level.

(This kind of weird design is exactly what we get when evolutionary search is applied directly to a gate array, which roughly approximates the situation Intel would be in.)
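A toy of that kind of search, in the spirit of the well-known evolved-FPGA experiments: hill-climb over randomly rewired NAND gates until the array computes XOR. Everything here is invented for illustration; the interesting point is that the search is driven purely by evaluation rather than by human-legible design rules, which is why the resulting circuits tend to look weird. It usually stumbles onto XOR within the step budget, but the point is the style of search, not the particular result.

```python
# Toy "evolved hardware" search: randomly rewired NAND gates, hill-climbed
# (with neutral drift) toward computing XOR. Purely illustrative.
import random

N_INPUTS, N_GATES = 2, 6
TRUTH = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # target: XOR

def evaluate(gates, inputs):
    signals = list(inputs)
    for a, b in gates:                       # each gate NANDs two earlier signals
        signals.append(1 - (signals[a] & signals[b]))
    return signals[-1]                       # last gate is the output

def fitness(gates):
    return sum(evaluate(gates, inp) == out for inp, out in TRUTH.items())

def random_gates(rng):
    return [(rng.randrange(N_INPUTS + i), rng.randrange(N_INPUTS + i))
            for i in range(N_GATES)]

def mutate(gates, rng):
    g = list(gates)
    i = rng.randrange(N_GATES)
    g[i] = (rng.randrange(N_INPUTS + i), rng.randrange(N_INPUTS + i))
    return g

rng = random.Random(0)
best = random_gates(rng)
for step in range(200_000):
    if fitness(best) == len(TRUTH):
        break
    cand = mutate(best, rng)
    if fitness(cand) >= fitness(best):       # accept ties to keep drifting
        best = cand
print("best fitness:", fitness(best), "of", len(TRUTH), "rows; gate wiring:", best)
```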

Conversely, if Intel had hundreds of times less computing power, they'd have to be extremely conservative. Designs would have to stay further from any possible timing bugs, new designs would appear much more slowly, they'd probably make the transition to multiple cores much sooner because scaling processor designs to large numbers of transistors would be intractable, there'd be less fine-grained internal parallelism, etc.

If we assumed that progress in process design was also more or less proportional to computing power available, then in effect we'd just be changing the exponent on the curve; to a first approximation we could assume no qualitative changes in design. However as I say this is a very big "if".

Now however we have to contend with an interesting feedback issue. Suppose we start importing computing from ten years in the future in the mid-1980s. If it speeds everything up proportionally, the curve gets a lot steeper, because that future is getting faster faster than ours. Conversely if Intel had to run on ten year old technology the curve would be a lot flatter.

On the other hand if there is skew between different aspects of the development process (as above with chip design vs. process design) we could go somewhere else entirely. For example if Intel develops some way to use quantum effects in 2000 due to faster simulations from 1985 on, and then that gets imported (in a black box) back to 1990, things could get pretty crazy.

I think that's all for now. Maybe I'll have more later. Further questions welcome.

Comment by Jed_Harris on Sustained Strong Recursion · 2008-12-05T22:56:05.000Z · LW · GW

I did work at Intel, and two years of that was in the process engineering area (running the AI lab, perhaps ironically).

The short answer is that more computing power leads to more rapid progress. Probably the relationship is close to linear, and the multiplier is not small.

Two examples:

  • The speed of a chip is limited by critical paths. Finding these and verifying fixes depends on physically realistic simulations (though they make simplifying assumptions, which sometimes fail). Generally the better the simulation the tighter one can cut corners. The limit on simulation quality is typically computer power available (though it can also be understanding the physics well enough to cheat correctly). (A toy sketch of the critical-path idea appears at the end of this comment.)

Specifically with reference to Phil Goetz's comment about scaling, the physics is not invariant under scaling (obviously) and the critical paths change in not entirely predictable ways. So again optimal "shrinks" are hostage to simulation performance.

  • The second example is more exotic. Shortly before I arrived in the process world, one of the guys who ended up working for me figured out how to watch the dynamics of a chip using a scanning electron microscope, since the charges in the chip modulate the electron beam. However integrating scanning control, imaging, chip control etc. was non-trivial and he wrote a lot of the code in Lisp. Using this tool he found the source of some serious process issues that no one had been able to diagnose.
  • This is a special case of the general pattern that progress in making the process better and the chips faster typically depends on modeling, analyzing, collecting data, etc. in new ways, and the limits are often how quickly humans can try out and evolve computer mediated tools. Scaling to larger data sets, using less efficient but more easily modified software, running simulations faster, etc. all pay big dividends.

    Intel can't in general substitute more processors in a cluster for faster processors, since writing software that gets good speedups on large numbers of processors is hard, and changing such software is much harder than changing single-processor software. The pool of people who can do this kind of development is also small and can't easily be increased.

    So I don't really know what difference it makes, but I think Eliezer's specific claim here is incorrect.
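As promised in the first bullet above, here is a toy version of critical-path (static timing) analysis: treat the netlist as a DAG with a delay per gate and find the longest arrival time. Real timing tools model wire delays, process corners, and the physics mentioned above; the netlist and delays here are invented.

```python
# Toy static timing analysis: the critical path is the longest delay path
# through the gate-level DAG. Real tools model wire RC, process variation,
# etc.; the netlist and delays here are invented.

# gate -> (delay in ns, list of fan-in signals); "in*" are primary inputs
netlist = {
    "g1": (0.3, ["in1", "in2"]),
    "g2": (0.5, ["in2", "in3"]),
    "g3": (0.2, ["g1", "g2"]),
    "g4": (0.4, ["g2"]),
    "out": (0.1, ["g3", "g4"]),
}

arrival = {}  # memoized latest arrival time at each gate output

def arrival_time(node):
    if node not in netlist:          # primary inputs arrive at t = 0
        return 0.0
    if node not in arrival:
        delay, fanin = netlist[node]
        arrival[node] = delay + max(arrival_time(f) for f in fanin)
    return arrival[node]

slowest = max(netlist, key=arrival_time)
print(f"critical path ends at {slowest}: {arrival_time(slowest):.1f} ns")
```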

    Comment by Jed_Harris on Logical or Connectionist AI? · 2008-11-17T16:20:30.000Z · LW · GW

    On the one hand, Eliezer is right in terms of historical and technical specifics.

    On the other hand, for many people neural networks are a metonym for continuous computations vs. the discrete computations of logic. This was my reaction when the two PDP volumes came out in the 80s. It wasn't "Here's the Way." It was "Here's an example of how to do things differently that will certainly work better."

    Note also that the GOFAI folks were not trying to use just one point in logic space. In the 70s we already knew that monotonic logic was not good enough (due to the frame problem among other things) so there was an active exploration of different types of non-monotonic logic. That's in addition to all the modal logics, etc.

    So the dichotomy Eliezer refers to should be viewed as more of a hyperplane separator in intelligence model space. From that point of view I think it is fairly valid -- the subspace of logical approaches is pretty separate from the subspace of continuous approaches, though Detlef and maybe others have shown you can build bridges.

    The two approaches were even more separate culturally at the time. AI researchers didn't learn or use continuous mathematics, and didn't want to see it in their papers. That probably has something to do with the 17 years. Human brains and human social groups aren't very good vehicles for this kind of search.

    So yes, treating this as distinction between sharp points is wrong. But treating it as a description of a big cultural transition is right.

    Comment by Jed_Harris on Complexity and Intelligence · 2008-11-03T23:25:57.000Z · LW · GW

    The "500 bits" only works if you take a hidden variable or Bohmian position on quantum mechanics. If (as the current consensus would say) non-linear dynamics can amplify quantum noise then enormous amounts of new information are being "produced" locally everywhere all the time. The current state of the universe incorporates much or all of that information. (Someone who understands the debates about black holes and the holographic principle should chime in with more precise analysis.)

    I couldn't follow the whole argument so I'm not sure how this affects it, but given that Eliezer keeps referring to this claim I guess it is important.

    Comment by Jed_Harris on A Premature Word on AI · 2008-05-31T23:45:56.000Z · LW · GW

    Poke's comment is interesting and I agree with his / her discussion of cultural evolution. But it also is possible to turn this point around to indicate a possible sweet spot in the fitness landscape that we are probably approaching. Conversely, however, I think the character of this sweet spot indicates scant likelihood of a very rapidly self-bootstrapping AGI.

    Probably the most important and distinctive aspect of humans is our ability and desire to coordinate (express ourselves to others, imitate others, work with others, etc.). That ability and desire is required to engage in the sort of cultural evolution that Poke describes. It underlies the individual acquisition of language, cultural transmission, long term research programs, etc.

    But as Eric Raymond points out, we are just good enough at this to make it work at all. A bunch of apes trying to coordinate world-wide culture, economy and research is a marginal proposition.

    Furthermore we can observe that major creative works come from a very small number of people in "hot" communities -- e.g. Florence during the Renaissance. As Paul Graham points out, this can't be the result of a collection of uniquely talented individuals, it must be some function of the local cultural resources and incentives. Unfortunately I don't know of any fine grained research on what these situations have in common -- we probably don't even have the right concepts to express those characteristics.

    A mundane version of this is the amazing productivity of a "gelled team", in software development and other areas. There is some interesting research on the fine grained correlates of team productivity but not much.

    So I conjecture that there is a sweet spot for optimized "thinking systems" equivalent to highly productive human teams or larger groups.

    Of course we already have such systems, combining humans and digital systems; the digital parts compensate for human limitations and decrease coordination costs in various ways, but they are still extremely weak -- basically networked bookkeeping mechanisms of various sorts.

    The natural direction of evolution here is that we improve the fit between the digital parts and the humans, tweak the environment to increase human effectiveness, and gradually increase the capabilities of the digital environment, until the humans are no longer needed.

    As described this is just incremental development. However it is self-accelerating; these systems are good tools for improving themselves. I expect we'll see the usual sigmoid curve, where these "thinking systems" relatively quickly establish a new level, but then development slows down as they run into intrinsic limitations -- though it is hard to predict what these will be, just as Ada Lovelace couldn't predict the difficulties of massively parallel software design.

    From here, we can see a sweet spot that is inhabited by systems with the abilities of "super teams", perhaps with humans as components. In this scenario any super team emerges incrementally in a landscape with many other similar teams in various stages of development. Quite likely different teams will have different strengths and weaknesses. However nothing in this scenario gives us any reason to believe in super teams that can bootstrap themselves to virtual omniscience or omnipotence.

    This development will also give us deep insight into how humans coordinate and how to facilitate and guide that coordination. This knowledge is likely to have very large consequences outside the development of the super teams.

    Unfortunately, none of this thinking gives us much of a grip on the larger implications of moving to this sweet spot, just as Ada Lovelace (or Thomas Watson) didn't anticipate the social implications of the computer, and Einstein and Leo Szilard didn't anticipate the social implications of control over nuclear energy.

    Comment by Jed_Harris on A Premature Word on AI · 2008-05-31T20:26:32.000Z · LW · GW

    I largely agree with Robin's point that smaller incremental steps are necessary.

    But Eliezer's point about big jumps deserves a reply. The transitions to humans and to atomic bombs do indicate something to think about -- and for that matter, so does the emergence of computers.

    These all seem to me to be cases where the gradually rising or shifting capacities encounter a new "sweet spot" in the fitness landscape. Other examples are the evolution of flight, or of eyes, both of which happened several times. Or trees, a morphological innovation that arises in multiple botanical lineages.

    Note that even for innovations that fit this pattern, e.g. computers and atomic bombs, enormous amounts of incremental development are required before we can get to the sweet spot and start to expand there. (This is also true for biological evolution of course.)

    I think most human innovations (tall buildings, rockets, etc.) are due to incremental accumulation of this sort, rather than finding any big sweet spots.

    I should also note that decades before the atomic bomb, the actual production of energy from nuclear fission (geothermal) and fusion (the sun) was clear, if not understood in detail. Similarly the potential of general purpose computers was sensed (e.g. by Ada Lovelace) far before we could build them. This foreknowledge was quite concrete -- it involved detailed physical accounts of existing sources of energy, automation of existing computing techniques, etc. So this sort of sweet spot can be understood in quite detailed ways well before we have the technical skills to reach it.

    Using this model, if AGI arrives rapidly, it will be because we found a sweet spot, over and above computing. If AGI is feasible in the near future, that implies that we are near such a sweet spot now. If we are near such a sweet spot, we should be able to understand some of its specific form (beyond "it uses Bayesian reasoning") and the limitations that keep us from getting to it immediately.

    I agree with Eliezer that Bayesian methods are "forced", and I also feel the "Good Old Fashioned AI" folks (certainly including Schank and McCarthy) are not good forecasters, for many reasons.

    However Bayesian approaches are at the root of existing impressive AI, such as Thrun's work on autonomous vehicles. I have been watching this work fairly closely, and it is making the normal sort of incremental progress. If there's a big sweet spot nearby in the fitness landscape, these practitioners should be able to sense it. They would be well qualified to comment on the prospects for AI, and AGI in particular. I would be very interested in what they have to say.

    Comment by Jed_Harris on GAZP vs. GLUT · 2008-04-08T00:22:37.000Z · LW · GW

    PK, Phil Goetz, and Larry D'Anna are making a crucial point here but I'm afraid it is somewhat getting lost in the noise. The point is (in my words) that lookup tables are a philosophical red herring. To emulate a human being they can't just map external inputs to external outputs. They also have to map a big internal state to the next version of that big internal state. (That's what Larry's equations mean.)

    If there was no internal state like this, a GLUT couldn't emulate a person with any memory at all. But by hypothesis, it does emulate a person (perfectly). So it must have this internal state.

    And given that a GLUT is maintaining a big internal state it is equivalent to a Turing machine, as Phil says.

    But that means it can implement any computationally well-defined process. If we believe that consciousness can be a property of some computation then GLUTs can have consciousness. This isn't even a stretch, it is totally unavoidable.
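A minimal sketch of the point, using an invented toy table: a lookup table whose entries map (internal state, input) to (next internal state, output) is just a finite-state machine written out explicitly. Scale the state space up astronomically and you have the GLUT; the formal structure is the same, which is why the "it's obviously just a table" intuition proves nothing.

```python
# A "giant lookup table" with internal state is just a finite-state machine:
# each entry maps (internal_state, input) -> (next_internal_state, output).
# The table below is an invented toy; the point is only the structure.
table = {
    ("no_memory", "hello"):   ("greeted", "hi there"),
    ("no_memory", "goodbye"): ("no_memory", "have we met?"),
    ("greeted", "hello"):     ("greeted", "you already said that"),
    ("greeted", "goodbye"):   ("no_memory", "bye!"),
}

def run(inputs, state="no_memory"):
    for inp in inputs:
        state, output = table[(state, inp)]   # pure lookup, no other machinery
        print(f"{inp!r} -> {output!r} (internal state is now {state!r})")

run(["hello", "hello", "goodbye"])
# 'hello' -> 'hi there' (internal state is now 'greeted')
# 'hello' -> 'you already said that' (internal state is now 'greeted')
# 'goodbye' -> 'bye!' (internal state is now 'no_memory')
```

The "memory" in the second response comes entirely from the internal-state column, which is exactly the feature the GLUT must have to emulate a person.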

    The whole reason that philosophers talk about GLUTs, or that Searle talks about the Chinese room, is to try to trick the reader into being overwhelmed by the intuition that "that can't possibly be conscious" and to STOP THINKING.

    Looking at this discussion, to some extent that works! Most people didn't say "Hmmm, I wonder how a GLUT could emulate a human..." and then realize it would need internal state, and the internal state would be supporting a complex computational process, and that the GLUT would in effect be a virtual machine, etc.

    This is like an argument where someone tries to throw up examples that are so scary, or disgusting, or tear jerking, or whatever that we STOP THINKING and vote for whatever they are trying to sneak through. In other words it does not deserve the honor of being called an argument.

    This leaves the very interesting question of whether a computational process can support consciousness. I think yes, but the discussion is richer. GLUTs are a red herring and don't lead much of anywhere.

    Comment by Jed_Harris on Zombie Responses · 2008-04-05T05:50:53.000Z · LW · GW

    Thanks for taking the time and effort to hash out this zombie argument. Often people don't seem to get the extreme derangement of the argument that Chalmers actually makes, and imagine that because it is discussed in respectable circles it must make sense.

    Even the people who do "understand" the argument and still support it don't let themselves see the full consequences. Some of your quotes from Richard Chappell are very revealing in this respect. I think you don't engage with them as directly as you could.

    At one point, you quote Chappell:

    It's misleading to say it's "miraculous" (on the property dualist view) that our qualia line up so neatly with the physical world. There's a natural law which guarantees this, after all. So it's no more miraculous than any other logically contingent nomic necessity (e.g. the constants in our physical laws).

    But since Chalmers' "inner light" is epiphenomenal, any sort of "inner light" could be associated with any sort of external expression. Perhaps Chalmers' inner experience is horrible embarrassment about the arguments he's making, a desperate desire to shut himself up, etc. That is just as valid a "logically contingent nomic necessity". There's no reason whatsoever to prefer the sort of alignment implied by our behavior when we "describe our awareness" (which by Chalmers' argument isn't actually describing anything, it is just causal chains running off).

    Then you quote Chappell:

    ... Zombie (or 'Outer') Chalmers doesn't actually conclude anything, because his utterances are meaningless. A fortiori, he doesn't conclude anything unwarrantedly. He's just making noises; these are no more susceptible to epistemic assessment than the chirps of a bird.

    But we can't know that Chalmers' internal experience is aligned with his expressions. Maybe the correct contingent nomic necessity is that everyone except people whose name begins with C have inner experience. So Chalmers doesn't. That would make all his arguments just tweets.

    And because these dual properties are epiphenomenal, there is no possible test that would tell us if Chalmers is making an argument or just tweeting away. Or at least, so Chalmers himself apparently claims (or tweets). So accepting Chappell's position makes all epistemic assessment of others contingent on unknowable facts about the world. Bit of a problem.

    As an aside, I'll also mention that Chappell's disparaging comments about "the chirps of a bird" indicate rather a blind spot. Birds chirp precisely to generate epistemic assessment in other birds, and the effectiveness of their chirps and their epistemic assessments is critical to their inclusive fitness.

    I'd like to see some speculation about why people argue like this. It certainly isn't because the arguments are intrinsically compelling.

    Comment by Jed_Harris on If You Demand Magic, Magic Won't Help · 2008-03-22T22:46:13.000Z · LW · GW

    Eliezer sayeth: "I want to be individually empowered by producing neato effects myself, without large capital investments and many specialists helping" ... [is] in principle doable - you can get this with, say, the right kind of nanotechnology, or (ahem) other sufficiently advanced tech, and bring it to a large user base..."

    Agreed. But as you hint, Eliezer, this case is indistinguishable from magic. So arguably the class of fantasies I mention are equivalent to living in some interesting future. In any case they don't seem to match the schema you present in the post.

    Eliezer continues: "...as long as they have the basic psychological ability to take joy in anything that is merely real." I think that even in a wonderful future, most people will take joy from unusually large bangs, crazy risks, etc. as they do today; fancy technology will make these easier to produce and survive. Most people still won't get much joy or wonder from the underlying phenomena unless we re-engineer human nature. Iain Banks's Culture novels and short stories have some pretty good ironic accounts of amazing Culture technology being used for thrills by idiots.

    I don't disagree with the importance of "joy in things that are merely real." But there are multiple sources of joy, some higher quality than others.

    And speaking of wishing for magical power, I wish I could copy a quote from this blog and paste it into the comment box with the text styles preserved. Shows how hard it is to come by magic.

    Comment by Jed_Harris on If You Demand Magic, Magic Won't Help · 2008-03-22T20:32:50.000Z · LW · GW

    There are a number of fantasy stories where the protagonist is very good at something, largely because they work hard at it, and then they enter a magical world and discover that their skills and work have a lot more impact. Often they have to work hard after they get there to apply their skills. Often the protagonist is a computer hacker whose skills, which in our world only work inside computers, can in a magical context alter physical / consensual reality. (Examples: Broken Crescent, Web Mage. There are many others. Arguably this pattern goes back at least to The Incomplete Enchanter, though success came way too easily for Harold Shea.)

    So I think the appeal of this type of fantasy is partly that big effects in our world usually require big causes -- capital investment, megatons of steel, etc. -- even after you know the right "magic spell". In these fantasy worlds -- and in some cases in computer networks -- big, widely distributed effects can be produced just by uttering the magic spell in the right place, or by building a local, inexpensive magical workshop using the right blueprint -- e.g. YouTube.

    Comment by Jed_Harris on Occam's Razor · 2007-09-29T05:51:23.000Z · LW · GW

    MIT Press has just published Peter Grünwald's The Minimum Description Length Principle. His Preface, Chapter 1, and Chapter 17 are available at that link. Chapter 17 is a comparison of different conceptions of induction.

    I don't know this area well enough to judge Peter's work, but it is certainly informative. Many of his points echo Eliezer's. If you find this topic interesting, Peter's book is definitely worth checking out.

    Comment by Jed_Harris on The Futility of Emergence · 2007-09-01T17:17:00.000Z · LW · GW

    Thanks, Eliezer. Regarding your questions:

    1. Is the property objective or subjective? The coarse grained property is objective -- e.g. the largest connected component in percolation. The meta-property that a coarse-grained property is emergent is as objective as the entropy of a configuration. It is model dependent, but in most cases we can't come up with a model that makes it go away.

    2. To the extent "emergentness" is subjective, it is because it is relative to a model. So in some cases it could possibly be the result of ignorance of a better model. But we can't claim "emergentness" due to ignorance of any workable model, we can only say "I don't know".

    3. Conjecturing that a property is emergent is a guide to inquiry. It is saying "Let's look for a model of this property that's robust under perturbation of the elements of the ensemble, but where the value of the property changes dramatically due to small changes in the average value of some properties of the elements. The model will be based on some highly simplified view of how the elements interact."

    4. Some observations about models of emergent properties:

      • We don't have a very good "toolbox" for building them yet. We're getting better, but have a long way to go before we know how to proceed when we conjecture that a property is emergent.

      • We are even weaker in the design of emergent systems. That is why even very simple designs with emergent properties, like flocking, seem so striking. This is a serious disability because we depend on systems with major emergent properties, such as markets, and we don't know how to manage them very effectively.

      • Often when people claim properties are "mysterious", we could dispel these claims if we could respond with an intuitive account of how those properties emerge. Lacking such an account, we are often vulnerable to mystification.

        Comment by Jed_Harris on The Futility of Emergence · 2007-09-01T07:36:00.000Z · LW · GW

        I do think there is a good deal of commonality among the reasonable comments about what emergence is and also feel the force of Eliezer's request for negative examples.

        I'll try to summarize (and of course over-simplify).

        When we have a large collection of interacting elements, and we can measure a property of the collection as a whole, in some cases we'd like to call that property emergent, and in some cases we wouldn't.

        I can think of three important cases:

        • If we can compute the property as a simple sum or average of properties of the individual elements, then it is not emergent. So e.g. mass or temperature are not emergent properties.

        • If we need to analyze long chains of structurally specific causal interactions to explain the coarser grained property, then it is not emergent. So e.g. the time telling properties of a mechanical clock, or the arithmetic computing properties of a calculator are not emergent.

        • If we can compute the property as a function of the properties of the elements, and it depends sensitively on specific characteristics of their behavior and interaction, but is robust under local perturbations (i.e. doesn't depend on structurally specific causal chains), then the property is emergent. So e.g. percolation is emergent. Also we have some warrant to say that flocking, thinking (as brains do it), social interaction, etc. are emergent.
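Percolation is the easiest of these to put on the table. In the toy simulation below, the largest-cluster fraction barely depends on which particular sites happen to be occupied, but it changes dramatically as the average occupation probability crosses the 2D site-percolation threshold (roughly 0.593), which is exactly the third case above. The lattice size and probabilities are arbitrary choices for illustration.

```python
# Toy 2D site percolation: the largest-cluster fraction is insensitive to
# *which* sites are occupied, but changes sharply as the occupation
# probability p crosses the 2D site-percolation threshold (~0.593).
import random
from collections import deque

def largest_cluster_fraction(n, p, seed):
    rng = random.Random(seed)
    occupied = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    seen, best = [[False] * n for _ in range(n)], 0
    for i in range(n):
        for j in range(n):
            if occupied[i][j] and not seen[i][j]:
                size, queue = 0, deque([(i, j)])
                seen[i][j] = True
                while queue:                      # flood-fill one cluster
                    x, y = queue.popleft()
                    size += 1
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = x + dx, y + dy
                        if 0 <= u < n and 0 <= v < n and occupied[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            queue.append((u, v))
                best = max(best, size)
    return best / (n * n)

for p in (0.45, 0.55, 0.60, 0.65, 0.75):
    print(p, round(largest_cluster_fraction(200, p, seed=1), 3))
# Below the threshold the largest cluster is a tiny fraction of the lattice;
# above it, a single cluster spans most of the occupied sites.
```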

        I'm not claiming these three cases cover all the legitimate positive and negative examples of emergence -- I don't think the concept has crystallized that completely yet. But I do think they answer Eliezer's challenge.

        Another, less crisply defined question is whether we should be using "emergence" so defined, and relatedly, whether people are mostly trying to use it in this sense, or whether they are, as Eliezer fears, just using it as a synonym for "magic".

        My own feeling is that many users of the term are groping for a clear definition of this general sort, and that they are doing so precisely to avoid having to explain a large class of phenomena by "magic".

        Comment by Jed_Harris on Tsuyoku Naritai! (I Want To Become Stronger) · 2007-04-14T19:29:47.000Z · LW · GW

        The discussion about the "dissipation" of knowledge from generation to generation (or of piety and trust in God, as ZH says) reminds me of Elizabeth Eisenstein's history of the transition to printing. Manual copying (on average) reduces the accuracy of manuscripts. Printing (on average) increases the accuracy, because printers can keep the type made up into pages, and can fix errors as they are found. Thus a type-set manuscript becomes a (more or less reliable) nexus for the accumulation of increasingly reliable judgments.

        Eisenstein's account has been questioned, but as far as I've seen, the issues that have been raised really don't undercut her basic point.

        Of course digital reproduction pushes this a lot further. (Cue the usual story about self-correcting web processes.) But I don't know of any really thorough analysis of the dynamics of error in different communication media.

        Comment by Jed_Harris on Tsuyoku Naritai! (I Want To Become Stronger) · 2007-04-14T19:21:24.000Z · LW · GW

        Great discussion! Regarding majoritarianism and markets, they are both specific judgment aggregation mechanisms with specific domains of application. We need a general theory of judgment aggregation but I don't know if there are any under development.

        In a purely speculative market (i.e. no consumption, just looking to maximize return) prices reflect majoritarian averages, weighted by endowment. Of course endowments change over time based on how good or lucky an investor is, so there is some intrinsic reputation effect. Also, investors can go bankrupt, which is an extreme reputation effect. If investors reproduce you can get a pretty "smart" system, but I'm sure it has systematic limitations -- the need to understand those limitations is a good example of why we need a general theory of judgment aggregation.

        I'd like to see an iterated jelly bean guessing game, with the individual guesses weighted by previous accuracy of each individual. I bet the results would quickly get better than just a flat average. Note that (unlike economies) there's no fixed quantity of "weight" here. Conserved exchanges are not a necessary part of this kind of aggregation.
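Here is roughly what such a game looks like as a simulation one could actually run: guessers with different (invented) noise levels, aggregated each round either as a flat mean or weighted by inverse accumulated squared error. It is a sketch for testing the conjecture, not a proof that weighting always wins.

```python
# Toy iterated guessing game: each round, aggregate guesses either as a flat
# mean or weighted by each guesser's inverse historical squared error.
# Guesser noise levels are invented; this is a sketch for testing the
# conjecture, not a demonstration that weighting always helps.
import random

def simulate(rounds=200, seed=0):
    rng = random.Random(seed)
    noises = [5, 10, 20, 40, 80]                 # per-guesser noise std devs
    sq_err = [1e-9] * len(noises)                # accumulated squared error
    flat_err = weighted_err = 0.0
    for _ in range(rounds):
        truth = rng.uniform(500, 1500)           # "beans in the jar" this round
        guesses = [truth + rng.gauss(0, s) for s in noises]
        flat = sum(guesses) / len(guesses)
        weights = [1.0 / e for e in sq_err]      # weight by past accuracy
        weighted = sum(w * g for w, g in zip(weights, guesses)) / sum(weights)
        flat_err += abs(flat - truth)
        weighted_err += abs(weighted - truth)
        for i, g in enumerate(guesses):          # update track records
            sq_err[i] += (g - truth) ** 2
    print("mean abs error, flat average:    ", round(flat_err / rounds, 1))
    print("mean abs error, accuracy-weighted:", round(weighted_err / rounds, 1))

simulate()
```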

        On the other hand if you let individuals see each other's guesses, I bet accuracy would get worse. (This is more similar to markets.) The problem is that there'd be herding effects, which are individually rational (for guessers trying to maximize their score) but which on average reduce the overall quality of judgment. This is an intrinsic problem with markets. Maybe we should see this as an example of Eliezer's point in another post about marginal zero-sum competition.

        Comment by Jed_Harris on Marginally Zero-Sum Efforts · 2007-04-14T17:50:26.000Z · LW · GW

        Nick Bostrom's point is important: We should regard the induced competition as a negative externality of the process that induces the competition -- grant writing, consideration for promotion, etc. The "correct" solution as Bostrom points out is to internalize the cost.

        I think good companies do this quite carefully with the inducements they build into their culture -- they are looking to only generate competition that will produce net benefits to the company (not always the individuals).

        Conversely, there are well-known shop-floor self-management processes (workers punishing each other for competing to win management favor) that form to prevent exactly this kind of zero-sum competition (zero-sum from the workers' point of view, since they don't get a share of the increased profits).

        I would bet that in at least some granting processes, informal regulation like this arises to control the costs to applicants. It would be especially easy in the context of peer review.

        This is a reasonable interpretation of behavior that produces "old boys clubs" -- the members of the club have formed a coalition to reduce their costs of marginal zero-sum behavior. Of course it imposes other costs...