Selecting Rationalist Groups

post by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-04-02T16:21:11.355Z · LW · GW · Legacy · 34 comments


Previously in series: Purchase Fuzzies and Utilons Separately
Followup to: Conjuring an Evolution To Serve You

GreyThumb.blog offered an interesting comparison of poor animal breeding practices and the fall of Enron, which I previously posted on in some detail.  The essential theme was that individual selection (breeding in each generation from the hen that laid the most eggs) produced highly competitive chickens: dominant birds that pecked their way to the top of the pecking order at the expense of other chickens.  The chickens subjected to this individual selection for egg-laying prowess needed their beaks clipped, or housing in individual cages, or they would peck each other to death.

Which is to say: individual selection is selecting on the wrong criterion, because what the farmer actually wants is high egg production from groups of chickens.

While group selection is nearly impossible in ordinary biology, it is easy to impose in the laboratory: breeding the best groups, rather than the best individuals, increased average days of hen survival from 160 to 348, and egg mass per bird from 5.3 to 13.3 kg.

The analogy being to the way that Enron evaluated its employees every year, fired the bottom 10%, and gave the top individual performers huge raises and bonuses.  Jeff Skilling fancied himself as exploiting the wondrous power of evolution, it seems.

If you look over my accumulated essays, you will observe that the art contained therein is almost entirely individual in nature... for around the same reason that it all focuses on confronting impossibly tricky questions:  That's what I was doing when I thought up all this stuff, and for the most part I worked in solitude.  But this is not inherent in the Art, not reflective of what a true martial art of rationality would be like if many people had contributed to its development along many facets.

Case in point:  At the recent LW / OB meetup, we played Paranoid Debating, a game that tests group rationality.  As is only appropriate, this game was not the invention of any single person, but was collectively thought up in a series of suggestions by Nick Bostrom, Black Belt Bayesian, Tom McCabe, and steven0461.

In the game's final form, Robin Gane-McCalla asked us questions like "How many Rhode Islands would fit into Alaska?" and a group of (in this case) four rationalists tried to pool their knowledge and figure out the answer... except that before the round started, we each drew facedown from a set of four cards containing one spade and one red card.  Whoever drew the red card got the job of trying to mislead the group.  Whoever drew the spade showed the card and became the spokesperson, who had to select the final answer.  It was interesting to play this game and realize how little I'd practiced basic skills like gauging the appropriateness of another's confidence or figuring out who was lying.

A bit further along, at the suggestion of Steve Rayhawk (slightly simplified by me), we named 60% confidence intervals for the quantity, with lower and upper bounds; Steve fit a Cauchy distribution to the interval ("because it has a fatter tail than a Gaussian") and we were scored according to the log of our probability density on the true answer, except for the red-card drawer, who got the negative of this number.
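A minimal sketch of this scoring rule in Python (the function name and interface are mine, not part of the game as played; SciPy is assumed): fit a Cauchy distribution whose central 60% of probability mass spans the named interval, then score the log density at the true answer.

```python
import numpy as np
from scipy.stats import cauchy

def score_interval(lo, hi, truth, drew_red=False):
    """Score a 60% confidence interval [lo, hi] against the true answer.

    Fits a Cauchy distribution whose central 60% of probability mass
    spans [lo, hi], then returns the log probability density at the
    truth (negated for the red-card player).
    """
    location = (lo + hi) / 2.0   # center of the interval
    half_width = (hi - lo) / 2.0
    # Choose the scale so that CDF(hi) - CDF(lo) = 0.6, i.e.
    # arctan(half_width / scale) = 0.3 * pi.
    scale = half_width / np.tan(0.3 * np.pi)
    score = cauchy.logpdf(truth, loc=location, scale=scale)
    return -score if drew_red else score

# E.g., an interval of 200..700 Rhode Islands, scored against a
# hypothetical true answer of 425:
print(score_interval(200, 700, 425))
```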

The Paranoid Debating game worked surprisingly well—at least I had fun, despite somehow managing to draw the red card three out of four times.  I can totally visualize doing this at some corporate training event or even at parties.  The red player is technically acting as an individual and learning to practice deception, but perhaps practicing deception (in this controlled, ethically approved setting) might help you be a little less gullible in turn.  As Zelazny observes, there is a difference in the arts of discovering lies and finding truth.

In a real institution... you would probably want to optimize less for fun, and more for work-relevance: something more like Black Belt Bayesian's original suggestion of The Aumann Game, no red cards.  But where both B3 and Tom McCabe originally thought in terms of scoring individuals, I would suggest forming people into groups and scoring the groups.  An institution's performance is the sum of its groups more directly than it is the sum of its individuals—though of course there are interactions between groups as well.  Find people who, in general, seem to have a statistical tendency to belong to high-performing groups—these are the ones who contribute much to the group, who are persuasive with good arguments.
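One way to operationalize "a statistical tendency to belong to high-performing groups" (my sketch, not anything proposed in the post) is to regress group scores on group membership: model each group's score as the sum of latent per-person effects and solve by least squares.

```python
import numpy as np

def member_effects(memberships, scores):
    """Estimate per-person contributions from group scores.

    memberships: (num_groups x num_people) 0/1 matrix, 1 if that
    person was in that group. Models each group's score as the sum
    of its members' latent effects, solved by least squares.
    """
    X = np.asarray(memberships, dtype=float)
    y = np.asarray(scores, dtype=float)
    effects, *_ = np.linalg.lstsq(X, y, rcond=None)
    return effects

# Three groups drawn from four people, with the groups' scores:
X = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 0, 1]]
print(member_effects(X, [10.0, 6.0, 9.0]))
```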

I wonder if there are any hedge funds that practice "trio trading", by analogy with pair programming?

Hal Finney called Aumann's Agreement Theorem "the most interesting, surprising, and challenging result in the field of human bias: that mutually respectful, honest, and rational debaters cannot disagree on any factual matter once they know each other's opinions".  It is not just my own essays that are skewed toward individual application; the whole trope of Traditional Rationality seems to me skewed the same way.  It's the individual heretic who is the hero, and Authority the untrustworthy villain whose main job is to put up just enough resistance to be properly defeated.  Science is cast as a competition between theories in an arena with rules designed to let the strongest contender win.  Of course, it may be that I am selective in my memory, and that if I went back and read my childhood books again, I would notice more about group tactics that originally slipped my attention... but really, Aumann's Agreement Theorem doesn't get enough attention.

Of course most Bayesian math is not widely known, and the Agreement Theorem is no exception.  But even the intuitively obvious counterpart of the Agreement Theorem, the treatment of others' beliefs as evidence, gets short shrift in Traditional Rationality.  This may have something to do with Science developing in the midst of insanity and in defiance of Authority; that is a historical fact about how Science developed.  But if the high performers of a rationality dojo need to practice the same sort of lonely dissent... well, that must not be a very effective rationality dojo.
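The "treatment of others' beliefs as evidence" mentioned above is, at bottom, an ordinary Bayesian update. A minimal sketch, with purely illustrative numbers and an invented function name:

```python
def update_on_peer(prior, p_assert_given_true, p_assert_given_false):
    """Treat a peer's stated belief as ordinary evidence: update the
    prior on a proposition given that the peer asserted it, by
    multiplying prior odds by the likelihood ratio of the assertion."""
    odds = prior / (1 - prior)
    odds *= p_assert_given_true / p_assert_given_false
    return odds / (1 + odds)

# E.g., prior 0.3; a peer who would assert the claim 80% of the time
# if it's true and 20% of the time if it's false:
print(update_on_peer(0.3, 0.8, 0.2))  # ~0.63
```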

 

Part of the sequence The Craft and the Community

Next post: "Incremental Progress and the Valley"

Previous post: "Purchase Fuzzies and Utilons Separately"

34 comments

Comments sorted by top scores.

comment by MBlume · 2009-04-02T20:05:46.028Z · LW(p) · GW(p)

So far there've only been LW/OB meetups in the Bay Area -- is there any way we could plot the geographic distribution of LW members and determine whether there are other spots where we could get a good meetup going?

Replies from: ciphergoth, MichaelHoward
comment by MichaelHoward · 2009-04-02T21:52:40.971Z · LW(p) · GW(p)

There have been meetups on the East Coast too, but it's about time they expanded beyond America's shores.

Here's a vote for London!

Replies from: Roko
comment by Roko · 2009-04-02T23:31:36.843Z · LW(p) · GW(p)

Carl Shulman and I were speaking about LW meetups at the end of June here in the UK. London, Oxford and Cambridge were mentioned. I am in Edinburgh.

comment by MBlume · 2009-04-02T18:45:46.571Z · LW(p) · GW(p)

The trouble with the scoring as described is that it is not zero sum, and, as far as I can tell, constitutes a prisoner's dilemma. That is, if you would cooperate on the one-shot PD, you should also completely ignore a red card handed you. This can be remedied by giving the red card -3 times the score of the group.

ETA: I suppose this PD-equivalence collapses if one player believes themself to be significantly more effective than the other three (or if anyone believes anyone believes this etc. etc.)
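To spell out the arithmetic (assuming each of the three honest players receives the group score $s$ and the red player receives its negation):

$$3s + (-s) = 2s \neq 0, \qquad \text{whereas} \qquad 3s + (-3s) = 0,$$

so tripling the red player's penalty restores the zero-sum property.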

Replies from: Psy-Kosh
comment by Psy-Kosh · 2009-04-02T22:06:49.726Z · LW(p) · GW(p)

The downside is that your observation messes up the game a bit. The upside is that your observation means that in "really real" situations, rationalists would be even less likely to try to deceive each other. :)

comment by jimrandomh · 2009-04-02T17:10:47.845Z · LW(p) · GW(p)

I like the concept of the Paranoid Debating game, but would propose one modification. Rather than always having a player assigned to deceive, have one with 50% probability, but don't reveal to anyone (except the deceiver) whether there is a deceiver in the group. To implement this with a group of n players, first choose a spokesman and give him a spade, then deal each other player one card from a deck containing 2n-3 black cards and one red card.

As another possible variant, introduce a small (say, 1/52) chance that everyone except the spokesman is red, and everyone except the spokesman knows it. To implement this, first choose a spokesman at random, then choose a dealer from the remaining players. The dealer looks at one card at random. If it's the ace of diamonds, he prepares a deck containing only diamonds; otherwise, he prepares a deck containing 2n-3 black cards and one heart, shuffles it and then deals from it. Getting a diamond means that everyone else has a diamond; getting a heart means that everyone else has black cards.
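A sketch of the first variant's dealing procedure (the function name is mine): with one spokesman set aside and a deck of 2n-3 black cards plus one red card, the chance the red card is dealt to one of the other n-1 players is (n-1)/(2n-2) = 1/2, as intended.

```python
import random

def deal_roles(n):
    """Assign roles for n players per the 50%-deceiver variant.

    One player is made spokesman; the remaining n-1 players each
    draw from a shuffled deck of 2n-3 black cards plus one red card,
    so a deceiver is present with probability (n-1)/(2n-2) = 1/2.
    """
    spokesman = random.randrange(n)
    deck = ["black"] * (2 * n - 3) + ["red"]
    random.shuffle(deck)
    return {
        p: "spokesman" if p == spokesman else deck.pop()
        for p in range(n)
    }

print(deal_roles(4))  # e.g. {0: 'black', 1: 'spokesman', 2: 'red', 3: 'black'}
```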

Replies from: Eliezer_Yudkowsky, MattFisher
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-04-02T17:28:31.617Z · LW(p) · GW(p)

Heh - why should you know whether all the others are evil or not? How interesting would it be, if, by being pulled hard in different directions by liars who didn't know the others were lying, the spokesperson ended up with a more accurate estimate?

Replies from: ciphergoth, jimrandomh
comment by Paul Crowley (ciphergoth) · 2009-04-02T21:57:03.949Z · LW(p) · GW(p)

There's endless variety here, since this is essentially a form of Werewolf about real facts. I can't wait to play it.

comment by jimrandomh · 2009-04-02T17:53:37.451Z · LW(p) · GW(p)

Heh - why should you know whether all the others are evil or not?

Logistics. It isn't practical to have someone shuffle a deck if they aren't allowed to see any of the cards they're shuffling, so if you're using playing cards to assign roles, at least the dealer will know whether or not the players are all red.

One possible solution would be to have a PDA or smartphone assign the roles, and pass it around. If you do it this way, you could also have a small chance that one player is given the exact real answer. (But red players could falsely claim that they have it.)

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-04-02T18:13:15.804Z · LW(p) · GW(p)

Our version had one person (Robin Gane-McCalla) as central coordinator. Also, it's quite possible to shuffle small units of cards without seeing their undersides.

comment by MattFisher · 2009-04-03T05:26:17.634Z · LW(p) · GW(p)

A simple variant with interesting results would be to deal everyone one card from a full deck. Anyone who is dealt a diamond is a deceiver. The dealer can be the spokesman, so it will rotate each turn. This way there is a 1/4 chance that any given person is a deceiver, and a small (1/(4^n))-ish chance that all n players (including the dealer) are trying to deceive each other.

Trying to reach the best outcome for everyone with an unknown number of deceivers in the mix? Sounds like life.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-04-03T05:44:37.061Z · LW(p) · GW(p)

But the spokesperson, who puts together the final estimate, has to be the one player known to be trustworthy - if they're a deceiver, they can just say "One googol!" or whatever.

comment by MichaelHoward · 2009-04-02T19:39:01.009Z · LW(p) · GW(p)

Paranoid Debating suggestion: after uncovering the red menace, spend a few minutes trying to figure out what effect the disinformation had on your estimate, then make a new one.

Repeat with the reformed Red joining in.

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2009-04-02T21:16:37.031Z · LW(p) · GW(p)

Average all the group members' suggestions as to how many Rhode Islands will fit in Alaska, then throw out the guess farthest from the average, re-average, and make that the center of your distribution.
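As a sketch (the function name is mine):

```python
def robust_center(guesses):
    """Average the guesses, drop the one farthest from that average,
    then re-average to get the center of the distribution."""
    mean = sum(guesses) / len(guesses)
    trimmed = sorted(guesses, key=lambda g: abs(g - mean))[:-1]
    return sum(trimmed) / len(trimmed)

print(robust_center([300, 425, 470, 9000]))  # the outlier 9000 is dropped
```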

Replies from: MichaelHoward
comment by MichaelHoward · 2009-04-02T21:41:30.496Z · LW(p) · GW(p)

Ah... sorry for not being clear. My suggestion was meant to help participants test their rationality, learn about biases, and internalize that knowledge through experience, not as a way to improve their estimate.

comment by Z_M_Davis · 2009-04-02T22:30:48.084Z · LW(p) · GW(p)

"[...] collectively thought up in a series of suggestions by [...] Black Belt Bayesian [...] and steven0461."

Aren't these the same person?

Replies from: steven0461
comment by steven0461 · 2009-04-02T22:37:16.703Z · LW(p) · GW(p)

They are.

Replies from: Eliezer_Yudkowsky
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2009-04-03T02:48:38.276Z · LW(p) · GW(p)

Ah. Okay, wasn't sure about that.

Why 0461, if it's not too private?

Replies from: steven0461
comment by steven0461 · 2009-04-03T13:38:05.792Z · LW(p) · GW(p)

I thought I should pick something more unique than "steven"; the specific number is one of those one-person in-jokes.

comment by SoullessAutomaton · 2009-04-02T19:56:26.001Z · LW(p) · GW(p)

As someone who is largely not likely to ever attend a LW/OB meetup, is there any chance of organizing a net-based variant of this game? It sounds reasonably fun and useful.

Replies from: ciphergoth
comment by Paul Crowley (ciphergoth) · 2009-04-02T22:10:32.619Z · LW(p) · GW(p)

It would be easy to play it on IRC, if we had a suitable webapp to support it. I've created a #lesswrong channel on irc.freenode.net - not sure it's in-mission for them, but I guess they'll decide that. I'll add something about timezone to my location survey.

comment by Steve_Rayhawk · 2009-04-03T03:37:29.439Z · LW(p) · GW(p)

we were scored according to the log of our probability density on the true answer, except for the red-card drawer, who got the negative of this number.

This part still needs to be improved by someone. Log probability densities are only defined up to an additive constant: the log of a scaling factor, which depends on the units of the answer. A player could get a high score by drawing the red card for a question whose answer is in small units.

To normalize the scores, you could subtract the average of the log probability densities across groups.
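A concrete illustration of the unit-dependence, assuming the Cauchy scoring described in the post (SciPy assumed; the numbers are illustrative):

```python
import numpy as np
from scipy.stats import cauchy

# The same interval and truth, expressed in kilometers vs. meters:
# rescaling the units shifts the log density by log(1000), so raw
# scores (and the red player's negated score) depend on the units.
in_km = cauchy.logpdf(42, loc=40, scale=5)
in_m = cauchy.logpdf(42_000, loc=40_000, scale=5_000)
print(in_km - in_m, np.log(1000))  # both ~6.91
```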

comment by NoSignalNoNoise (AspiringRationalist) · 2015-10-09T00:08:22.099Z · LW(p) · GW(p)

When we tried Paranoid Debating at the Boston meetup a few years back, we often had the problem that the deceiver didn't know enough about the question to know which direction to mislead in. I think the game would work better if the deceiver were simply trying to bias the group in a particular direction rather than make the group wrong. I think that's also a closer approximation to real life - plenty of people want to sell you their product and don't know or care which option is best. Not many just want you to buy the wrong product.

comment by TheOtherDave · 2010-11-30T17:27:39.842Z · LW(p) · GW(p)

I feel I ought to say that I really appreciated this post.

I have been struggling for some time, as I read through the archives, with the degree to which isolated intellectual problem-solving is presented as the heart of Fun. This very much does not describe "my Way" and I have been feeling increasingly alienated by it.

Consequently, it is a great relief to me to see you acknowledge that your essays are "skewed toward individual application."

It's not that I'm opposed to it, and it's not that I expect you to change your style (not least because you're reading this, if you are, 18 months "in the future").

I just appreciate the acknowledgment, however transient, that it is one Way among many.

Thanks.

comment by Peter_de_Blanc · 2009-04-10T15:42:47.165Z · LW(p) · GW(p)

The game that Paranoid Debating most reminds me of is Mafia (or Werewolf). Most of the players are cooperating to achieve a common goal, but there is a minority of secret saboteurs who are trying to achieve the opposite goal. I didn't think it was an interesting game until I played it with a group of skilled players, and then I had a lot of fun.

Other games might be enjoyable in a Paranoid Debating-like format. For instance, Paranoid Go could involve teams that decide collectively which move to play, and each team includes one saboteur who is trying to make the team lose.

comment by Peter_de_Blanc · 2009-04-05T11:36:02.242Z · LW(p) · GW(p)

Regarding chicken breeding: individual selection would do better with a larger breeding population. In a population of size N, destroying a rival's egg reduces the average per-chicken egg output by 1/N, but laying an egg increases your own output by 1. Sabotaging rivals becomes less important as population size increases.

Group selection probably still produces better outcomes, because then chickens will actually cooperate instead of being (at best) indifferent to each other.

You can notice this same effect in all sorts of zero-sum games. If there are only two players, then sabotaging your opponent is exactly as important as helping yourself, but if there are many players, then sabotaging your opponents becomes less attractive.
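Making the comment's arithmetic explicit (a sketch; assume selection acts on output relative to the group mean): laying an egg nets you $1 - 1/N$ relative to the mean, while destroying a rival's egg nets you only the $1/N$ drop in the mean, so

$$\frac{\text{payoff of sabotage}}{\text{payoff of laying}} = \frac{1/N}{1 - 1/N} = \frac{1}{N-1},$$

which shrinks as the population grows.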

comment by MendelSchmiedekamp · 2009-04-02T18:37:46.907Z · LW(p) · GW(p)

There is a reason selective breeding in business tends to focus on behavior rather than personnel: at the very least, the turnaround is much faster, and your replacement population will be more strongly influenced by the positive selection.

This seems like a good lens through which to view various management trends and ideas: what effects would they be predicted to have on both the people-space and behavior-space populations?

On the other hand, games as selectors can serve a dual role, both as a calibrated test and as a means to learn, especially because of the potential influence of play on reinforcing the less conscious techniques used in rationality. Unfortunately, there is a risk of diminishing returns if the game is used too much for either purpose.

comment by handoflixue · 2011-07-22T00:08:32.365Z · LW(p) · GW(p)

This seems like a key missing ingredient in a lot of the "rationality dojo" suggestions I'm seeing - there's a heavy focus on competition and individual selection, and not a lot of focus on rationalist groups.

comment by igoresque · 2009-04-04T13:53:14.611Z · LW(p) · GW(p)

This post is really about two issues, both interesting. The 'game' thread is interesting and fun.

However, the 'group vs. individuals' issue deserves attention as well! I believe it is entirely true, and this simple observation may skyrocket efficiency in many areas. At the same time, this isn't very appealing on a personal level. There doesn't seem to be a place for our egos anymore. That may be rational, but not fun. Also, as individualists will collectively agree, groups just... suck. I can't be creative 'in a group'. And I suspect I am not alone in that...

comment by pcm · 2009-04-03T23:28:53.893Z · LW(p) · GW(p)

I'm puzzled about why people preferred the spokesman version of paranoid debating to the initial version where the median number was the team's answer. Designating a spokesman publicly as a non-deceiver provides information about who the deceiver is. In one case, we determined who the deceiver was by two of us telling the spokesman that we were sufficiently ignorant about the subject relative to him that he should decide based only on his knowledge. That gave our team a big advantage that had little relation to our rationality. I expect the median approach can be extended to confidence intervals by taking the median of the lows and the median of the highs, but I'm not fully confident that there are no problems with that.

I have more comments about Sunday's game here.
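The median-of-bounds extension is easy to sketch (the function name is mine):

```python
import statistics

def median_interval(intervals):
    """Aggregate per-player (low, high) interval estimates by taking
    the median of the lows and the median of the highs."""
    lows, highs = zip(*intervals)
    return statistics.median(lows), statistics.median(highs)

print(median_interval([(200, 700), (350, 500), (100, 900)]))  # (200, 700)
```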

comment by Lightwave · 2009-04-02T21:53:25.977Z · LW(p) · GW(p)

This game sounds a lot like Mafia.

comment by PhilGoetz · 2009-04-02T18:46:48.834Z · LW(p) · GW(p)

Thumbs-up; nice to see collective rationality advocated here.