Fundamentally Flawed, or Fast and Frugal?

post by Kaj_Sotala · 2009-12-20T15:10:15.714Z · LW · GW · Legacy · 86 comments

Whenever biases are discussed around here, it tends to happen under the following framing: human cognition is a dirty, jury-rigged hack, only barely managing to approximate the laws of probability even in a rough manner. We have plenty of biases, many of them a result of adaptations that evolved to work well in the Pleistocene, but are hopelessly broken in a modern-day environment.

That's one interpretation. But there's also a different interpretation: that a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make for an excellent, possibly close to an optimal, use of the limited computational resources we happen to have available. It's not that the programming is bad; it's simply that you can't do much better without upgrading the hardware. In the interest of fairness, I will be presenting this view by summarizing a classic 1996 Psychological Review article, "Reasoning the Fast and Frugal Way: Models of Bounded Rationality" by Gerd Gigerenzer and Daniel G. Goldstein. It begins by discussing two contrasting views: the Enlightenment ideal of the human mind as the perfect reasoner, versus the heuristics and biases program that considers human cognition as a set of quick-and-dirty heuristics.

Many experiments have been conducted to test the validity of these two views, identifying a host of conditions under which the human mind appears more rational or irrational. But most of this work has dealt with simple situations, such as Bayesian inference with binary hypotheses, one single piece of binary data, and all the necessary information conveniently laid out for the participant (Gigerenzer & Hoffrage, 1995). In many real-world situations, however, there are multiple pieces of information, which are not independent, but redundant. Here, Bayes’ theorem and other “rational” algorithms quickly become mathematically complex and computationally intractable, at least for ordinary human minds. These situations make neither of the two views look promising. If one would apply the classical view to such complex real-world environments, this would suggest that the mind is a supercalculator like a Laplacean Demon (Wimsatt, 1976)—carrying around the collected works of Kolmogoroff, Fisher, or Neyman—and simply needs a memory jog, like the slave in Plato’s Meno. On the other hand, the heuristics-and-biases view of human irrationality would lead us to believe that humans are hopelessly lost in the face of real-world complexity, given their supposed inability to reason according to the canon of classical rationality, even in simple laboratory experiments.

There is a third way to look at inference, focusing on the psychological and ecological rather than on logic and probability theory. This view questions classical rationality as a universal norm and thereby questions the very definition of “good” reasoning on which both the Enlightenment and the heuristics-and-biases views were built. Herbert Simon, possibly the best-known proponent of this third view, proposed looking for models of bounded rationality instead of classical rationality. Simon (1956, 1982) argued that information-processing systems typically need to satisfice rather than optimize. Satisficing, a blend of sufficing and satisfying, is a word of Scottish origin, which Simon uses to characterize algorithms that successfully deal with conditions of limited time, knowledge, or computational capacities. His concept of satisficing postulates, for instance, that an organism would choose the first object (a mate, perhaps) that satisfies its aspiration level—instead of the intractable sequence of taking the time to survey all possible alternatives, estimating probabilities and utilities for the possible outcomes associated with each alternative, calculating expected utilities, and choosing the alternative that scores highest.
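
As a toy sketch of the contrast (the candidates and scores below are made up), a satisficer stops at the first option that clears its aspiration level, while an optimizer has to score every option before choosing:

```python
def satisfice(alternatives, value, aspiration):
    """Return the first alternative whose value meets the aspiration level."""
    for alt in alternatives:
        if value(alt) >= aspiration:
            return alt
    return None  # nothing was good enough

def optimize(alternatives, value):
    """Survey every alternative and return the one that scores highest."""
    return max(alternatives, key=value)

scores = {"a": 0.4, "b": 0.7, "c": 0.9, "d": 0.6}
candidates = list(scores)

print(satisfice(candidates, scores.get, aspiration=0.6))  # 'b': the first good-enough option
print(optimize(candidates, scores.get))                   # 'c': the best option, but needs a full scan
```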

Let us consider the following example question: Which city has a larger population? (a) Hamburg (b) Cologne.

The paper describes algorithms fitting into a framework that the authors call a theory of probabilistic mental models (PMM). PMMs fit three visions: (a) Inductive inference needs to be studied with respect to natural environments; (b) Inductive inference is carried out by satisficing algorithms; (c) Inductive inferences are based on frequencies of events in a reference class. PMM theory does not strive for the classical Bayesian ideal, but instead attempts to build an algorithm the mind could actually use.

These satisficing algorithms dispense with the fiction of the omniscient Laplacean Demon, who has all the time and knowledge to search for all relevant information, to compute the weights and covariances, and then to integrate all this information into an inference.

The first algorithm presented is the Take the Best algorithm, named because its policy is "take the best, ignore the rest". In the first step, it invokes the recognition principle: if only one of two objects is recognized, it chooses the recognized object. If neither is recognized, it chooses randomly. If both are recognized, it moves on to the next discrimination step. For instance, if a person is asked which of city a and city b is bigger, and the person has never heard of b, they will pick a.

If both objects are recognized, the algorithm will next search its memory for useful information that might provide a cue regarding the correct answer. Suppose that you know a certain city has its own football team, while another doesn't have one. It seems reasonable to assume that a city having a football team correlates with the city being of at least some minimum size, so the existence of a football team has positive cue value for predicting city size - it signals a higher value on the target variable.

In the second step, the Take the Best algorithm retrieves from memory the cue values of the highest-ranking cue. If the cue discriminates, which is to say one object has a positive cue value and the other does not, the search is terminated and the object with the positive cue value is chosen. If the cue does not discriminate, the algorithm moves on to the next-highest-ranking cue, choosing randomly if no cue discriminates.
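
In code, the core of the algorithm might look something like the minimal sketch below; the cue values and their ordering are invented for illustration (in the paper, cues are ranked by their ecological validity, estimated from a reference class):

```python
import random

def take_the_best(a, b, recognized, cues):
    """Guess which of two objects scores higher on the target variable.

    recognized: set of objects the agent has heard of.
    cues: list of dicts, highest-validity cue first, mapping object -> True/False;
          a missing entry means the cue value is unknown.
    """
    # Step 0: the recognition principle.
    if (a in recognized) != (b in recognized):
        return a if a in recognized else b
    if a not in recognized and b not in recognized:
        return random.choice([a, b])

    # Steps 1-2: walk down the cue ranking and stop at the first cue that
    # discriminates (one object positive, the other negative or unknown).
    for cue in cues:
        va, vb = cue.get(a), cue.get(b)
        if va and not vb:
            return a
        if vb and not va:
            return b

    # No cue discriminated: guess.
    return random.choice([a, b])

# Hypothetical cue values for the Hamburg vs. Cologne question.
cues = [
    {"Hamburg": True, "Cologne": True},   # has a university (does not discriminate)
    {"Hamburg": True, "Cologne": False},  # is a state capital (discriminates)
]
print(take_the_best("Hamburg", "Cologne", {"Hamburg", "Cologne"}, cues))  # -> Hamburg
```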

The algorithm is hardly a standard statistical tool for inductive inference: It does not use all available information, it is non-compensatory and nonlinear, and variants of it can violate transitivity. Thus, it differs from standard linear tools for inference such as multiple regression, as well as from nonlinear neural networks that are compensatory in nature. The Take The Best algorithm is noncompensatory because only the best discriminating cue determines the inference or decision; no combination of other cue values can override this decision. [...] the algorithm violates the Archimedian axiom, which implies that for any multidimensional object a (a1, a2, ... an) preferred to b (b1, b2, ... bn) where a1 dominates b1, this preference can be reversed by taking multiples of any one or a combination of b2, b3, ... , bn. As we discuss, variants of this algorithm also violate transitivity, one of the cornerstones of classical rationality (McClennen, 1990).

This certainly sounds horrible; what's possibly even more horrifying is that a wide variety of experimental results make perfect sense if we assume that the test subjects are unconsciously employing this algorithm. Yet, despite all of these apparent flaws, the algorithm works.

The authors designed a scenario where 500 simulated individuals with varying amounts of knowledge were presented with pairs of cities and were tasked with choosing the bigger one (83 cities, 3,403 city pairs). The Take the Best algorithm was pitted against five other algorithms that were suggested by "several colleagues in the fields of statistics and economics": Tallying (where the number of positive cue values for each object is tallied across all cues and the object with the largest number of positive cue values is chosen), Weighted Tallying, the Unit-Weight Linear Model, the Weighted Linear Model, and Multiple Regression.
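
For contrast, here is a rough sketch of two of those competitor strategies, reusing the cue representation from the Take the Best sketch above (again, an illustration rather than the authors' code):

```python
import random

def tallying(a, b, cues):
    """Choose the object with more positive cue values; break ties at random."""
    score_a = sum(1 for cue in cues if cue.get(a))
    score_b = sum(1 for cue in cues if cue.get(b))
    if score_a == score_b:
        return random.choice([a, b])
    return a if score_a > score_b else b

def weighted_tallying(a, b, cues, weights):
    """Like tallying, but each positive cue contributes its validity weight."""
    score_a = sum(w for cue, w in zip(cues, weights) if cue.get(a))
    score_b = sum(w for cue, w in zip(cues, weights) if cue.get(b))
    if score_a == score_b:
        return random.choice([a, b])
    return a if score_a > score_b else b
```

Unlike Take the Best, both of these must look up every cue value for both objects before deciding, which is why they need so many more lookups.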

Take the Best was clearly the fastest algorithm, needing to look up far fewer cue values than the rest. But what about the accuracy? When the simulated individuals had knowledge of all the cues, Take the Best drew as many correct inferences as any of the other algorithms, and more than some. When looking at individuals with imperfect knowledge? Take the Best won or tied for the best position for individuals with knowledge of 20 and 50 percent of the cues, and didn't lose by more than a few tenths of a percent for individuals that knew 10 and 75 percent of the cues. Averaging over all the knowledge classes, Take the Best made 65.8% correct inferences, tied with Weighted Tallying for the gold medal.

The authors also tried two even more stupid algorithms, which were variants of Take the Best. Take the Last, instead of starting the search from the highest-ranking cue, first tries the cue that discriminated last, then the cue that discriminated the time before that, and so on. The Minimalist algorithm picks a cue at random. This produced a perhaps surprisingly small drop in accuracy, with Take the Last getting 64.7% correct inferences and Minimalist 64.5%.
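
Rough sketches of these two variants, again reusing the cue representation above and omitting the recognition step for brevity:

```python
import random

def minimalist(a, b, cues):
    """Try the cues in random order; stop at the first one that discriminates."""
    for cue in random.sample(cues, len(cues)):
        va, vb = cue.get(a), cue.get(b)
        if va and not vb:
            return a
        if vb and not va:
            return b
    return random.choice([a, b])

def take_the_last(a, b, cues, memory):
    """Try the cue that discriminated most recently first.

    memory: indices of cues that have discriminated before, most recent first;
    it is updated in place as a record of which cue "worked last time".
    """
    order = memory + [i for i in range(len(cues)) if i not in memory]
    for i in order:
        va, vb = cues[i].get(a), cues[i].get(b)
        if (va and not vb) or (vb and not va):
            if i in memory:
                memory.remove(i)
            memory.insert(0, i)  # remember this as the most recently useful cue
            return a if va else b
    return random.choice([a, b])
```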

After the algorithm comparison, the authors spend a few pages discussing some of the principles related to the PMM family of algorithms and their empirical validity, as well as the implications all of this might have for the study of rationality. They note, for instance, that even though transitivity (if we prefer a to b and b to c, then we should also prefer a to c) is considered a cornerstone axiom in classical rationality, several algorithms violate transitivity without suffering very much from it.

At the beginning of this article, we pointed out the common opposition between the rational and the psychological, which emerged in the nineteenth century after the breakdown of the classical interpretation of probability (Gigerenzer et al., 1989). Since then, rational inference is commonly reduced to logic and probability theory, and psychological explanations are called on when things go wrong. This division of labor is, in a nutshell, the basis on which much of the current research on judgment under uncertainty is built. As one economist from the Massachusetts Institute of Technology put it, “either reasoning is rational or it’s psychological” (Gigerenzer, 1994). Can not reasoning be both rational and psychological?

We believe that after 40 years of toying with the notion of bounded rationality, it is time to overcome the opposition between the rational and the psychological and to reunite the two. The PMM family of cognitive algorithms provides precise models that attempt to do so. They differ from the Enlightenment’s unified view of the rational and psychological, in that they focus on simple psychological mechanisms that operate under constraints of limited time and knowledge and are supported by empirical evidence. The single most important result in this article is that simple psychological mechanisms can yield about as many (or more) correct inferences in less time than standard statistical linear models that embody classical properties of rational inference. The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate. Models of inference do not have to forsake accuracy for simplicity. The mind can have it both ways.

86 comments

comment by Kaj_Sotala · 2009-12-20T21:03:39.649Z · LW(p) · GW(p)

I don't, incidentally, think that our algorithms are anywhere close to optimal, but I nonetheless felt that the opposing point of view still merits a bit more attention than it has had here so far. They do have a point, even if they're not 100% correct.

Replies from: Gavin
comment by Gavin · 2009-12-21T08:09:47.734Z · LW(p) · GW(p)

This could actually act as counterevidence against the claim that AI will surpass humans around the time that the processing speed of computers rivals that of the human brain.

It may be that running a non-jury-rigged rational system against the complexity of the real world requires another order of magnitude or more of processing power.

This brings up the likelihood that initial AIs will need to be jury-rigged, and will have their own set of cognitive biases.

comment by Roko · 2009-12-20T20:43:55.295Z · LW(p) · GW(p)

a perfect Bayesian reasoner is computationally intractable, and our mental algorithms make for an excellent, possibly close to an optimal, use of the limited computational resources we happen to have available

Looking at Sandberg and Bostrom's The Wisdom of Nature: An Evolutionary Heuristic for Human Enhancement, we see that there are several reasons why the human brain's native algorithms are unlikely to be anything close to optimal, even given the limited computational resources we happen to have available inside our skulls:

  • Changed tradeoffs. Evolution ‘‘designed’’ the system for operation in one type of environment, but now we wish to deploy it in a very different type of environment

  • Value discordance. There is a discrepancy between the standards by which evolution measured the quality of her work, and the standards that we wish to apply.

  • Evolutionary restrictions. Sometimes the evolutionary algorithm just can't find certain solutions, for example because it gets stuck in local optima.

In the case of our cognitive algorithms, the "Changed tradeoffs" item seems particularly likely to be an issue. Our information-rich environment means that highly accurate information can be obtained much, much more easily than in the EEA, but it requires careful rational analysis.

Replies from: Kaj_Sotala, ChristianKl
comment by Kaj_Sotala · 2009-12-20T21:01:16.533Z · LW(p) · GW(p)

Changed tradeoffs. Evolution ‘‘designed’’ the system for operation in one type of environment, but now we wish to deploy it in a very different type of environment

An important question - how changed is the environment, really? Yes, there are plenty of cases where a changed environment is obviously breaking our evolved reasoning algorithms, but I suspect many people might be overstating the difference.

Value discordance. There is a discrepancy between the standards by which evolution measured the quality of her work, and the standards that we wish to apply.

At the risk of falling into a purely semantic discussion, this doesn't mean the algorithms wouldn't be optimal. It just makes them optimized for some other purpose than the one we'd prefer.

Replies from: cousin_it, zero_call, Roko
comment by cousin_it · 2009-12-20T21:33:22.783Z · LW(p) · GW(p)

An important question - how changed is the environment, really?

That's a great discussion to have. I'd say the biggest changes are that a modern person interacts with a lot of other people and receives a lot of symbolic information. Other "major" changes, like increased availability of food or better infant healthcare, look to me minor by comparison. Not sure how to weigh this stuff, though.

Replies from: whpearson
comment by whpearson · 2009-12-20T21:55:16.126Z · LW(p) · GW(p)

We now also have computers.

I suspect the optimal evolved system in a modern environment (efficient and effective) is an idiot savant that can live long enough to spit out the source code for an AI guaranteed to increase the inclusive fitness of the genes of the host.

Genetic engineering and sperm/egg donation are other modern inventions I don't think we are all exploiting to increase our fitness optimally.

comment by zero_call · 2009-12-21T01:43:31.173Z · LW(p) · GW(p)

One of the fundamental ways the environment has changed locally must be the level of information that we are now able to process. Namely, since writing was invented, we've been able to consume (I would suppose) far more knowledge from far more sources. But, after all, since writing is essentially a mimic of the speech we were originally "designed" for, I can't imagine the modern environment is so much different for our built-in algorithms for writing. And similarly for many other "modern" aspects of life.

Edit: Interestingly, I suppose books and written information have essentially developed in civilization as a response to the weaknesses of the evolved brain. Thus, many of the deficiencies in our cognitive operations have actually been attacked by civilization. Insofar as the brain was not properly designed, the modern environment has largely been a source of positive, external cognitive optimization/reorganization.

One might propose that the environment has actually become far less challenging in modern times; certainly I haven't had to hunt and kill for food anytime in recent memory. Now, I can live far longer, with much less (positive) stress, I can smoke and drink and damage my mind at will, I have the express ability to become morbidly obese and mentally unhealthy, and so on. I can freely read and absorb widely disseminated propaganda from sources like Hitler, in maybe the worst case scenario. Perhaps the environment has been effectively weakening our internal algorithms through this kind of under-usage and exploitation, rather than through any incidental non-optimization.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-12-21T01:58:04.555Z · LW(p) · GW(p)

Good point. Civilization allows us to use the strengths of our native makeup more efficiently; thus, instead of being "disadjusted" because of change since the EEA, in many areas we are more at home than we could ever be naturally.

comment by Roko · 2009-12-20T21:16:08.180Z · LW(p) · GW(p)

how changed is the environment, really?

We have to do far more very-long-term planning than in the EEA; we are protected from starvation by easy job markets and stable food sources like food shops; and we have access to healthcare, both mental and physical.

Most prominently, our explicit beliefs matter more for decision theory than for signalling, whereas in the EEA the opposite was true.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-12-20T21:45:43.241Z · LW(p) · GW(p)

We have to do far more very-long-term planning than in the EEA

As societies, perhaps. As individuals, probably not. I find it a bit odd that you mention a decreased risk of starvation at the same time as this item; needing to look forward a year or preferably several to make sure you didn't run out of food during the winter (or the winter after that) has been a major factor in the past. Even if you lived in a warm country, it seems like there would have been more long-term dangers than there are now, when we have a variety of safety networks and a much safer society.

Most prominently, our explicit beliefs matter more for decision theory than for signalling, whereas in the EEA the opposite was true.

Existential risks excluded, I'm not sure if this is true.

Replies from: Roko, Roko
comment by Roko · 2009-12-20T22:33:28.336Z · LW(p) · GW(p)

Most prominently, our explicit beliefs matter more for decision theory than for signalling, whereas in the EEA the opposite was true.

Existential risks excluded, I'm not sure if this is true.

Example: deciding to study at school rather than slack off.

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-12-20T22:46:40.341Z · LW(p) · GW(p)

Granted.

comment by Roko · 2009-12-20T22:32:20.016Z · LW(p) · GW(p)

Did hunter gatherers really look forward several winters ahead?

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-12-20T22:45:18.624Z · LW(p) · GW(p)

Hunter-gatherers, possibly not, but we've had agriculture around for 10,000 years. That has been enough time for other selection effects (for instance, the persistent domestication of cattle, and the associated dairying activities, did alter the selective environments of some human populations for sufficient generations to select for genes that today confer greater adult lactose tolerance), so I'd be cautious about putting too much weight on the hunter-gatherer environment.

Replies from: Roko
comment by Roko · 2009-12-20T23:56:08.715Z · LW(p) · GW(p)

Interesting. So in fact, those adaptations that could be implemented in just 10,000/20 = 500 generations are probably more skewed towards rationality.

We can probably see the difference that those 500 generations made by the differences in life outcomes between those with aboriginal Australian DNA and white European DNA.

Replies from: MichaelVassar
comment by MichaelVassar · 2009-12-21T22:39:31.550Z · LW(p) · GW(p)

Why be needlessly inflammatory?

Replies from: Larks, Roko
comment by Larks · 2009-12-23T19:54:49.184Z · LW(p) · GW(p)

It provides a test for the theory?

comment by Roko · 2009-12-22T01:03:58.793Z · LW(p) · GW(p)

hmmm well I was actually considering the point purely from an academic POV - it occurred to me that the aboriginals were a near-perfect example. But now that you point it out, I guess that comment could be construed as "in bad taste" or "racist" or something.

Replies from: Nick_Tarleton
comment by Nick_Tarleton · 2009-12-23T20:30:26.543Z · LW(p) · GW(p)

Cultural differences are hard to factor out, too.

comment by ChristianKl · 2009-12-20T23:53:22.009Z · LW(p) · GW(p)

The fact that human reasoning isn't optimal implies in no way that the intelligently designed algorithm of Bayesian reasoning is better.

Replies from: None, Roko
comment by [deleted] · 2009-12-22T01:32:05.041Z · LW(p) · GW(p)

If you mean optimal as in "maximizing accuracy given the processing power", then yes. But in terms of "maximizing accuracy given the data", then Bayesian reasoning is optimal from the definition of conditional probability.

Replies from: ChristianKl
comment by ChristianKl · 2009-12-22T01:44:58.953Z · LW(p) · GW(p)

Maximizing accuracy given available processing power and available data is the core problem when it comes to finding a good decision theory.

We don't ask what decision theory God should use but what decision theory humans should use. Both Go and chess are NP-hard and can't be fully processed even if you have a computer built from all the atoms in the universe.

Replies from: None
comment by [deleted] · 2009-12-22T02:54:53.573Z · LW(p) · GW(p)

You're confusing optimality in terms of results and efficiency in terms of computing power with your use of "NP-hard". Something like the travelling salesman problem is NP-hard in that there's no known way to solve it beyond a certain efficiency in terms of computing power (doing optimally on it in terms of results is easy). That doesn't apply to chess or go, where there is no known way to get optimal results no matter how much computing power you have. These are two completely different things.

Replies from: JohannesDahlstrom
comment by JohannesDahlstrom · 2009-12-25T02:08:06.325Z · LW(p) · GW(p)

Surely there is a known way to play chess and go optimally (in the sense of always either winning or forcing a draw). You just search through the entire game tree, instead of a sub-tree, using the standard minimax algorithm to choose the best move each turn. This is obviously completely computationally infeasible, but possible in principle. See Solved game
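
For concreteness, the procedure being described is exhaustive minimax over the whole game tree; a generic sketch (the game-state interface used here is hypothetical) would be:

```python
def minimax(state, maximizing):
    """Exhaustive game-tree search; returns (value, best_move) for the side to move.

    `state` is a hypothetical game interface with is_over(), score(), moves() and
    result(move); score() is +1 / 0 / -1 for a win / draw / loss of the maximizer.
    For chess or go this is infeasible in practice only because the tree is
    astronomically large.
    """
    if state.is_over():
        return state.score(), None
    best_value, best_move = None, None
    for move in state.moves():
        value, _ = minimax(state.result(move), not maximizing)
        if best_value is None or (value > best_value if maximizing else value < best_value):
            best_value, best_move = value, move
    return best_value, best_move
```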

comment by Roko · 2009-12-20T23:59:06.941Z · LW(p) · GW(p)

Correct.

It would be extraordinary if the algorithm that is optimal given infinite computational resource is also optimal given limited resource.

I suspect that by framing this as a battle between Bayesian inference and actual evolved human algorithms, we are missing the third alternative: algorithm X, which is the optimal algorithm for decision-making given the resources and options that we have in the society that we find ourselves in.

comment by Madbadger · 2009-12-20T20:51:15.200Z · LW(p) · GW(p)

It is worth remembering that human computation is a limited resource - we just don't have the ability to subject everything to Bayesian analysis. So, save our best rationality for what's important, and use heuristics to decide what kind of chips to buy at the grocery store.

Replies from: CronoDAS
comment by CronoDAS · 2009-12-20T21:03:43.802Z · LW(p) · GW(p)

I decided what college to go to by rolling a die. ;)

Replies from: billswift, Madbadger
comment by billswift · 2009-12-21T00:28:55.467Z · LW(p) · GW(p)

A random choice has long been considered a good tool to prevent dithering when you have equivalently valued alternatives.

comment by Madbadger · 2009-12-20T21:14:07.005Z · LW(p) · GW(p)

Yeah, sometimes you don't get the tools and information you need to make the best decision until after you've made it. 8-)

Replies from: CronoDAS
comment by CronoDAS · 2009-12-20T21:17:25.733Z · LW(p) · GW(p)

I wasn't disappointed in my choice of college, but I was disappointed in my choice of major. (I followed my father's advice, and, in this case, although his advice sounded reasonable, it turned out to be just plain wrong.)

comment by Roko · 2009-12-21T00:00:57.803Z · LW(p) · GW(p)

It would be extraordinary if the algorithm that is optimal given infinite computational resource is also optimal given limited resource.

I suspect that by framing this as a battle between Bayesian inference and actual evolved human algorithms, we are missing the third alternative: algorithm X, which is the optimal algorithm for decision-making given the resources and options that we have in the society that we find ourselves in.

Replies from: quanticle
comment by quanticle · 2009-12-21T18:22:26.677Z · LW(p) · GW(p)

Well, it may be that this ideal algorithm you're looking for is NP-hard, and thus cannot ever be executed in a short amount of time over a non-trivial problem space. Have you considered the possibility that this bounded rationality model is algorithm X?

Replies from: Cyan
comment by Cyan · 2009-12-21T18:25:53.401Z · LW(p) · GW(p)

Computing time is a resource, so "optimal algorithm for decision-making given the resources... we have" rules out impractical algorithms.

comment by CronoDAS · 2009-12-20T21:38:22.588Z · LW(p) · GW(p)

Incidentally, if I don't have a good answer to a "guessing" problem immediately, I find it faster to just Google the relevant facts than to try to struggle to find a distinction between them that I can latch onto.

As for Hamburg vs. Cologne, my recognition heuristic is more familiar with Hamburg as a city than Cologne as a city (I know Hamburg is in Germany, I suspect that Cologne is in France). On the other hand, I know that I recognize Hamburg because I often eat hamburgers, which doesn't seem like it says much about the city. Nevertheless, if I have to guess, I'll guess Hamburg. Now to look up the actual answer...

Wikipedia gives 1.8 million for Hamburg, and Cologne (also in Germany, which surprised me) is slightly under 1 million. So I guessed right, but I still prefer the JFGI heuristic. ;)

Replies from: Jawaka
comment by Jawaka · 2009-12-22T12:08:49.892Z · LW(p) · GW(p)

the German name for Cologne is Köln

comment by Roko · 2009-12-20T20:37:26.461Z · LW(p) · GW(p)

The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate.

I am suspicious of work that attempts to provide evidence for a counterintuitive result in a way that could fairly obviously have been rigged. In this case, the key question is how "generic" their competition really was. It might be more convincing if arguments could be made about a plausible "real-world" distribution of problem instances, then a set of sample competitions drawn from that distribution and various decision algorithms run on those instances.

Replies from: billswift, Kaj_Sotala
comment by billswift · 2009-12-21T00:25:36.976Z · LW(p) · GW(p)

There is a lot more work bearing on this point, though not all of it focused on it directly. What else, for example, would you call Robert Axelrod's "Tit-for-Tat" than a "fast and frugal satisficing algorithm"? In fact, there has been enough other work on this and related points that I would not refer to it as a counter-intuitive result.

Replies from: Nick_Tarleton
comment by Nick_Tarleton · 2009-12-21T02:31:55.818Z · LW(p) · GW(p)

What else, for example, would you call Robert Axelrod's "Tit-for-Tat" than a "fast and frugal satisficing algorithm"?

Tit-for-tat doesn't win because it's computationally efficient.

Replies from: Technologos
comment by Technologos · 2009-12-21T09:13:30.003Z · LW(p) · GW(p)

Now that would be a cool extension of Axelrod's test: include a penalty per round or per pairing as a function of algorithm length.

Replies from: None
comment by [deleted] · 2009-12-22T03:05:21.910Z · LW(p) · GW(p)

Length of source code, or running time? (How the heck did English end up with the same word for measuring time and a 3D axis?)

Replies from: Technologos
comment by Technologos · 2009-12-22T03:11:49.844Z · LW(p) · GW(p)

I had been thinking source code length, such that it would correspond to Kolmogorov complexity. Both would actually work, testing different things.

And perhaps the English question makes more sense if we consider things with a fourth time dimension ;)

comment by Kaj_Sotala · 2009-12-20T22:20:51.277Z · LW(p) · GW(p)

Later work seems to support the notion of fast and frugal algorithms performing evenly with more complicated ones. See e.g. Fast and Frugal Heuristics: The Tools of Bounded Rationality for references to later experiments. (It's unfortunately too late here for me to write a proper summary of it, especially since I haven't read the referenced later studies.)

Replies from: Roko
comment by Roko · 2009-12-20T22:30:28.358Z · LW(p) · GW(p)

ok, sure. So there has been some thought put in beyond "here's this one challenge that we rigged to make the fast and frugal algorithm look good!"

comment by Daniel_Burfoot · 2009-12-21T04:05:04.736Z · LW(p) · GW(p)

The demonstration that a fast and frugal satisficing algorithm won the competition defeats the widespread view that only “rational” algorithms can be accurate.

While this demonstration is interesting in some sense, it's pretty obvious that for any algorithm one can find an example problem at which the algorithm excels. Does the paper state how many example problems were tried?

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-12-21T08:27:52.495Z · LW(p) · GW(p)

Does the paper state how many example problems were tried?

Only this problem. Later research has studied the algorithm's performance for other problems, though.

comment by Vladimir_Nesov · 2009-12-20T22:04:05.371Z · LW(p) · GW(p)

(Not directly related, but may be interesting to someone. )

In a certain technical sense, "satisficing" is formally equivalent to expected utility maximization. Specifically, consider an interval on a real line (e.g. the amount of money that could be made), and a continuous and monotonic utility function on that interval. Expected utility maximization for that utility function u (i.e. the choice of a random variable X with codomain in the amounts of money) is then equivalent to maximization of the probability Pr(X>V), where V is a random variable that depends only on u. Pr(X>V) looks quite like satisficing: you are trying to choose the outcome X so that it's better than a threshold V, except you are uncertain about what the "correct" threshold is.

In detail, an appropriately scaled utility function u(x) is a cumulative distribution function for a random variable V, so expected utility can be written as

\mathrm{EU}(X) = \int u(x)\, d\mathrm{Pr}(x \geq X) = \int \mathrm{Pr}(x \geq V)\, d\mathrm{Pr}(x \geq X) = \mathrm{Pr}(X \geq V)

Reference: E. Castagnoli & M. Licalzi (1996). `Expected utility without utility'. Theory and Decision 41(3):281-301.
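
A quick numerical sanity check of the identity: take u to be the CDF of V, draw X and V independently, and compare E[u(X)] with Pr(X ≥ V). The particular distributions below are arbitrary choices; by symmetry both quantities should come out near 0.5.

```python
import random
from statistics import NormalDist

V = NormalDist(mu=0.0, sigma=1.0)     # threshold variable; u(x) = V.cdf(x)

def draw_x():                         # any outcome distribution for X will do
    return random.uniform(-2.0, 2.0)

n = 200_000
expected_utility = sum(V.cdf(draw_x()) for _ in range(n)) / n
prob_beat_threshold = sum(draw_x() >= random.gauss(0.0, 1.0) for _ in range(n)) / n

print(round(expected_utility, 3), round(prob_beat_threshold, 3))  # both come out near 0.5
```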

Replies from: Technologos
comment by Technologos · 2009-12-21T09:33:29.040Z · LW(p) · GW(p)

That utility function would have a very interesting second derivative, though...

Also, the example appears to depend on simultaneous consideration of the options; with sequential consideration, might not a small standard deviation for V induce a situation where many options will have high Pr(X>V) and only EU maximization would support rejection of early options?

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-12-21T10:55:55.315Z · LW(p) · GW(p)

Could you say that more explicitly? What sequential consideration? Where do Pr(X>V) and EU(X) disagree, given that they are equal?

Replies from: Technologos
comment by Technologos · 2009-12-21T12:06:12.699Z · LW(p) · GW(p)

To be clear, I am not disagreeing with your analysis of the model you presented; I am arguing that satisficing and EU maximization are not equivalent in general, but rather only when certain conditions are satisfied. Imagine, for instance, that there was no uncertainty in V; then two distributions of X could both have Pr(X>V) = 1 with different EUs.

I was thinking of sequential consideration as essentially introducing uncertainty about the set of possible X distributions, but on reflection it's clear that this would be inadequate by itself to change C&L's result. The above modification--or any variant where satisficing includes a threshold requirement for Pr(X>V) rather than trying to maximize that quantity--would have to be integrated to make sequential consideration matter.

Finally, if V depends only on money, rather than utility, then having a utility function with positive second derivative could make EU maximizers pick an X distribution with higher mean and standard deviation than satisficers might.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-12-21T15:53:21.464Z · LW(p) · GW(p)

Finally, if V depends only on money, rather than utility, then having a utility function with positive second derivative could make EU maximizers pick an X distribution with higher mean and standard deviation than satisficers might.

I can't make sense of this statement.

Replies from: Technologos
comment by Technologos · 2009-12-21T19:26:49.166Z · LW(p) · GW(p)

The way you set up the model, V was a threshold of utility. Thus, anything that increased one's expected utility also increased one's expected probability of being above that threshold.

If, however, V was a threshold of money (distributed, say, as N($100,$10)), then look at these two X-distributions, given a utility function U(x) = x (the case of a function with positive second derivative just makes the following more extreme):

1) 100% probability of $200

2) 90% probability of $100 and 10% probability of $2100

Expected utilities:

1) 200

2) 300

Probabilities of meeting threshold:

1) 1 - the probability of being 10 standard deviations above the mean, or "very damn close to 1"

2) 90% × 50% + 10% × "even closer to 1 than the above" = 55%

So EU-maxers will take the latter choice, where satisficers will take the former.

Note that if U(x) = x^2, then the disparity is even stronger.
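
With the stated assumptions (V ~ Normal($100, $10) and U(x) = x), a few lines of code reproduce these numbers:

```python
from statistics import NormalDist

V = NormalDist(mu=100, sigma=10)  # threshold V in dollars

# Option 1: $200 for sure.   Option 2: 90% chance of $100, 10% chance of $2100.
eu1, eu2 = 200, 0.9 * 100 + 0.1 * 2100     # 200 vs. 300: the EU maximizer takes option 2
p1 = V.cdf(200)                            # ~1.0: $200 is 10 standard deviations above the mean
p2 = 0.9 * V.cdf(100) + 0.1 * V.cdf(2100)  # ~0.55: the satisficer prefers option 1

print(eu1, eu2, round(p1, 6), round(p2, 4))
```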

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-12-21T19:35:12.563Z · LW(p) · GW(p)

I believe you are confused, but can't pinpoint on the first look in what way exactly. V is not "threshold of utility", it is a random variable of the same kind as X. I don't see what you mean by setting V to be normally distributed and U(x)=x, given that by construction they determine each other by the rule Pr(x>V)=u(x). If you redefine the concepts, you should do so more clearly.

Replies from: Technologos
comment by Technologos · 2009-12-21T19:48:11.099Z · LW(p) · GW(p)

If the decision agent is trying to maximize the probability of its utility being greater than a draw from the random variable V (where V is specified in utility) then it is trying to maximize the probability of being above some (yet-unknown) threshold value, no?

The departure from your model that I was clarifying (unsuccessfully) in the last comment was for V to be a random variable not of utility but of money, distributed normally in this example. U(x) = x is the utility function for the EU-maxing agent, because when V is specified in money, the satisficing agent no longer needs to worry about utility.

The rule you gave is only true when the satisficer defines the threshold level in terms of utility.

Replies from: Vladimir_Nesov
comment by Vladimir_Nesov · 2009-12-21T19:57:03.041Z · LW(p) · GW(p)

No luck. You should write everything in math, specifying types (domains/codomains) of all functions/random variables. It'll really be easier, and the confusion (mine or yours) will be instantly resolved.

Replies from: Technologos, Technologos
comment by Technologos · 2009-12-21T20:34:11.526Z · LW(p) · GW(p)

Also, thanks for posting the original comment--it's actually useful to some research I'm doing, now that I actually understand it!

comment by Technologos · 2009-12-21T20:31:03.335Z · LW(p) · GW(p)

Ah, I think I found it. I took V to have a codomain in utilons in your example (that was my interpretation of "V is a random variable that depends only on u").

Reinterpreting the subsequent comments in that context, I can see that I was responding to "formally equivalent" in the original comment as if it meant "expected utility maximization of the traditional sort, where each outcome x is itself assigned a value by a function on x that does not involve V, will produce the same decisions as satisficing of the type described under these conditions."

Interestingly, the latter may be true if V did have a codomain in utilons (or at least, I was unable to come up with a consistent counterexample).

comment by vinayak · 2010-01-20T14:42:32.950Z · LW(p) · GW(p)

I think one thing that evolution could have easily done with our existing hardware is to at least allow us to use rational algorithms whenever it's not intractable to do so. This would have easily eliminated things such as Akrasia, where our rational thoughts do give a solution, but our instincts do not allow us to use them.

Replies from: wedrifid
comment by wedrifid · 2010-01-20T14:48:07.805Z · LW(p) · GW(p)

This would have easily eliminated things such as Akrasia, where our rational thoughts do give a solution, but our instincts do not allow us to use them.

It tried that with your great^x uncle. But he actually spent his time doing the things he said he wanted to do instead of what was best for him in his circumstances and had enough willpower to not cheat on his mate with the girls who were giving him looks.

comment by [deleted] · 2009-12-22T03:03:01.290Z · LW(p) · GW(p)

Heh, this reminds me of something I saw a while ago. http://plover.net/~bonds/shibboleths.html

comment by Madbadger · 2009-12-21T05:39:05.605Z · LW(p) · GW(p)

Here is an example of an amusing "Fast and Frugal" heuristic for evaluating claims with a lot of missing knowledge and required computation: http://xkcd.com/678/

comment by zero_call · 2009-12-21T01:20:45.583Z · LW(p) · GW(p)

Outstanding post and clearly written. I'd like to see more posts of this nature on here. The results definitely seem to make sense, and seem pleasing to my intuition, but I feel kind of skeptical about such a simplified account of the cognitive process. I suppose you have to start somewhere though, and I'm not really at all familiar with this kind of science.

From personal experience, encountering a lot of excellent mathematicians in university, I have often felt that some of the best mathematicians are people who simply have the best computational resources. In other words, they have the best memory, fastest sense of reasoning, etc. etc. But remarkably, these people of course are not somehow complete "intelligence trumps" that can out-argue you or outsmart you on any topic. It gives a distinct impression of a situation similar to the one posed in this article, where optimized decision making relies perhaps equally on both power and algorithmic efficiency, or insight.

At the risk of sounding redundant or pedantic, I'm wondering how you came across this paper in particular?

Replies from: Kaj_Sotala
comment by Kaj_Sotala · 2009-12-21T08:18:34.020Z · LW(p) · GW(p)

At the risk of sounding redundant or pedantic, I'm wondering how you came across this paper in particular?

We had a university course on heuristics and biases a while back. This article was among the required reading.

(Unfortunately I couldn't make the time to complete that course back then, so I'm only now reading the articles.)

comment by PhilGoetz · 2009-12-21T00:50:26.837Z · LW(p) · GW(p)

This point is important if one is constructing a theory about how future AIs will think, and assumes that they will reach Aumann agreement because they are Bayesians.

comment by CronoDAS · 2009-12-20T21:02:29.962Z · LW(p) · GW(p)

The "recognition heuristic" tends to work surprisingly well for stock picking, or so I've heard.

Find a bunch of "ordinary" people who have no special knowledge of stock picking, give them a list of companies, and ask them to say which ones they've heard of. Stocks of companies people have heard of tend to do better than stocks that people haven't heard of.

Replies from: Alicorn, Bo102010, ABranco
comment by Alicorn · 2009-12-20T21:24:11.448Z · LW(p) · GW(p)

My guess is that this would stop working if you did it with people who do have special stock related knowledge. The recognition heuristic is most effective when you've heard of comparatively few things. For instance, on the "which city is bigger" test, Germans were shown to do better for American cities than Americans did, and vice-versa, because Americans are more likely than Germans to have heard of small American cities and vice-versa.

comment by Bo102010 · 2009-12-20T22:00:24.711Z · LW(p) · GW(p)

Note also that "tend to do better than" does not mean "tend to outperform the market as a whole," an important point.

Replies from: magfrump, CronoDAS
comment by magfrump · 2009-12-21T02:00:22.186Z · LW(p) · GW(p)

for any well-defined sense of "tend to do better than" it has to, otherwise it isn't tending to do better.

(since any stock someone has heard of is "tending to do better than" the set of stocks people haven't heard of)

Unless the statement was intended to be "stocks of companies people have heard of tend to do better than stocks of SIMILAR COMPANIES people haven't heard of."

comment by CronoDAS · 2009-12-20T23:59:52.499Z · LW(p) · GW(p)

Indeed.

comment by ABranco · 2009-12-21T00:01:34.157Z · LW(p) · GW(p)

You're right. It works better if the group interviewed is composed of neither experts nor completely isolated news-averse schizoids.

comment by [deleted] · 2012-12-11T22:04:39.914Z · LW(p) · GW(p)

There is a very clear cluster of people working in cognitive science with bayesian and machine learning savvy, centered around Tenenbaum, Griffiths, Kemp, Goodman, Chater, Oaksford, Perfors, Steyvers, et cetera. They often coauthor papers and have something of a unified perspective on The Way to do things (more unified and more coauthory even restricting the field to other bayesian and machine learning savvy folk, like Hinton, Gigerenzer, Friston, MD Lee). It seems like they should have a name. Tengrikemgoochoakpersteyvetcet perhaps? But then, perhaps not.

Anyway, the Tengriks have a new paper for NIPS 2012, part of their ongoing bounded optimality project to show how brains implement rational approximations of rational inference. They show how anchoring bias naturally falls out in situations when there's a time cost to continuing computation. Pdf here.

Not a revolutionary idea, but still a nice paper.

comment by [deleted] · 2012-03-02T22:04:33.027Z · LW(p) · GW(p)

What if the question required picking the smaller city? Then, if you've only heard of one, it would seem you should pick the unknown city, as you are more likely to know of larger than smaller cities. Doesn't the take the best algorithm, by specifying taking the one you know as a general fast-and-frugal tactic, lead you astray? Do you know whether subjects still choose the known city?

comment by Thomas · 2009-12-21T10:53:43.139Z · LW(p) · GW(p)

Just a question for MWI advocates.

If this world W1 has a parallel world W2, which has a parallel world W3, and which W1 hasn't - this is the very difference between W1 and W2 - is the W3 second order parallel to us?

comment by ChristianKl · 2009-12-20T23:59:23.514Z · LW(p) · GW(p)

There's no person who plays chess at a good level while employing Bayesian reasoning.

In Go, Bayesian reasoning performs even worse. A good Go player makes some of his moves simply because he appreciates their beauty and without having "rational" reasons for them. Our brain is capable of doing very complex pattern matching that allows the best humans to be better at a large variety of tasks than computers which use rule-based algorithms.

Replies from: MichaelVassar, None
comment by MichaelVassar · 2009-12-21T22:33:21.669Z · LW(p) · GW(p)

In chess or go idealized Bayesians just make the right move because they are logically omniscient.

Replies from: wedrifid, nickjhay, ChristianKl
comment by wedrifid · 2009-12-21T23:33:38.017Z · LW(p) · GW(p)

In chess or go idealized Bayesians just make the right move because they are logically omniscient.

Logical omniscience comes close to the perfect move but understanding the imperfections of the opponent can alter what the ideal move is slightly. This requires prior information that can not be derived logically (from the rules of the game).

comment by Nick Hay (nickjhay) · 2009-12-21T22:52:01.392Z · LW(p) · GW(p)

Idealized Bayesians don't have to be logically omniscient -- they can have a prior which assigns probability to logically impossible worlds.

comment by ChristianKl · 2009-12-22T00:31:15.021Z · LW(p) · GW(p)

If you argue that Bayesianism is only a good way to reason when you are omniscient, and a bad idea for people who aren't omniscient, I can agree with your argument.

If you are however omniscient you don't need much decision theory anyway.

Replies from: None
comment by [deleted] · 2009-12-22T03:14:21.197Z · LW(p) · GW(p)

There's a bit of a difference between logical omniscience and vanilla omniscience: with logical omniscience, you can perfectly work out all the implications of all of the evidence you find, and with the other sort, you get to look a printout of the universe's state.

Replies from: ChristianKl
comment by ChristianKl · 2009-12-23T23:59:50.898Z · LW(p) · GW(p)

But you don't have any of those in the real world and therefore they shouldn't factor into a discussion about effective decision making strategies.

Replies from: None
comment by [deleted] · 2010-01-26T04:37:26.871Z · LW(p) · GW(p)

You'll never find perfect equality in the real world, so let's abandon math.

Replies from: ChristianKl
comment by ChristianKl · 2010-01-30T15:53:56.618Z · LW(p) · GW(p)

You will never find evidence for the existence of God, so let's abandon religion...

Replies from: Richard_Kennaway
comment by Richard_Kennaway · 2010-01-30T17:47:41.948Z · LW(p) · GW(p)

Yes! Already did!

Replies from: ChristianKl
comment by ChristianKl · 2010-01-30T20:53:57.169Z · LW(p) · GW(p)

Where's the difference between believing in nonexistent logical omniscience and believing in nonexistent Gods?

comment by [deleted] · 2009-12-22T03:10:10.611Z · LW(p) · GW(p)

I'd imagine Deep Blue is more approximately Bayesian than a human (search trees vs. giant crazy neural net).

Replies from: Nick_Tarleton
comment by Nick_Tarleton · 2009-12-24T00:11:29.096Z · LW(p) · GW(p)

I think you mean "cleanly constructed" or something like that. Minimax search doesn't deal with uncertainty at all, whereas good human chess players presumably do so, causally model their opponents, and the like.

comment by brazil84 · 2009-12-20T20:43:24.425Z · LW(p) · GW(p)

It seems to me that the problems with human rationality really start to come out when our sense of self is somehow on the line.

It's one thing to guess at which of two foreign cities is bigger. It's another to guess at which child is smarter -- our own child or or somone else's.

So perhaps we as humans have hardware and software which is pretty good, except that we sometimes use our brainpower to fool ourselves.