Noisy Reasoners

post by lukeprog · 2012-12-13T07:53:29.193Z · LW · GW · Legacy · 14 comments

One of the more interesting papers at this year's AGI-12 conference was Fintan Costello's Noisy Reasoners. I think it will be of interest to Less Wrong:

 

This paper examines reasoning under uncertainty in the case where the AI reasoning mechanism is itself subject to random error or noise in its own processes. The main result is a demonstration that systematic, directed biases naturally arise if there is random noise in a reasoning process that follows the normative rules of probability theory. A number of reliable errors in human reasoning under uncertainty can be explained as the consequence of these systematic biases due to noise. Since AI systems are subject to noise, we should expect to see the same biases and errors in AI reasoning systems based on probability theory.
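If I have the paper's mechanism right (probability estimates are formed by counting remembered instances, and each remembered instance has a small chance of being read in error), the result is easy to see in simulation. The sketch below is mine, not the paper's; the parameter d is a hypothetical per-read error rate. Under these assumptions the expected estimate works out to (1 - 2d)·P(A) + d, so rare events are overestimated and common events underestimated (conservatism), even though every individual error is random.

```python
import random

def noisy_estimate(p_true, n=1000, d=0.05):
    """Estimate P(A) by counting remembered instances of A out of n events,
    where each stored flag is read incorrectly (flipped) with probability d."""
    flags = [random.random() < p_true for _ in range(n)]
    read = [(not f) if random.random() < d else f for f in flags]
    return sum(read) / n

def mean_estimate(p_true, trials=2000):
    return sum(noisy_estimate(p_true) for _ in range(trials)) / trials

# The noise does not average out: estimates regress toward 0.5, so rare
# events look more probable and common events less so (conservatism),
# even though each individual read error is random and undirected.
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"true P = {p:.2f}   mean noisy estimate = {mean_estimate(p):.3f}")
```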

 

14 comments


comment by gwern · 2012-12-14T03:11:07.861Z · LW(p) · GW(p)

A recent paper I found even more interesting, courtesy of XiXiDu: "Burn-in, bias, and the rationality of anchoring"

Bayesian inference provides a unifying framework for addressing problems in machine learning, artificial intelligence, and robotics, as well as the problems facing the human mind. Unfortunately, exact Bayesian inference is intractable in all but the simplest models. Therefore minds and machines have to approximate Bayesian inference. Approximate inference algorithms can achieve a wide range of time-accuracy tradeoffs, but what is the optimal tradeoff? We investigate time-accuracy tradeoffs using the Metropolis-Hastings algorithm as a metaphor for the mind’s inference algorithm(s). We find that reasonably accurate decisions are possible long before the Markov chain has converged to the posterior distribution, i.e. during the period known as “burn-in”. Therefore the strategy that is optimal subject to the mind’s bounded processing speed and opportunity costs may perform so few iterations that the resulting samples are biased towards the initial value. The resulting cognitive process model provides a rational basis for the anchoring-and-adjustment heuristic. The model’s quantitative predictions are tested against published data on anchoring in numerical estimation tasks.
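A rough illustration of the claimed effect (my own sketch in Python, not the authors' code, using an arbitrary Normal target and proposal width): a Metropolis-Hastings chain started at an "anchor" and stopped well before convergence yields estimates pulled toward the starting value, and the pull fades as more iterations are allowed.

```python
import math
import random

def mh_estimate(anchor, n_steps, target_mean=0.0, target_sd=1.0, step_sd=0.5):
    """Metropolis-Hastings chain targeting a Normal(target_mean, target_sd)
    posterior, started at `anchor`; returns the mean of all samples so far."""
    def log_p(x):
        return -0.5 * ((x - target_mean) / target_sd) ** 2
    x, samples = anchor, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_sd)
        if random.random() < math.exp(min(0.0, log_p(proposal) - log_p(x))):
            x = proposal
        samples.append(x)
    return sum(samples) / len(samples)

def average_over_runs(anchor, n_steps, runs=500):
    return sum(mh_estimate(anchor, n_steps) for _ in range(runs)) / runs

# Stopped early (during burn-in), the estimate stays anchored near the
# starting value; given many more iterations it approaches the true
# posterior mean of 0 and the anchoring bias washes out.
for steps in (5, 20, 100, 1000):
    print(f"anchor = 10, {steps:4d} steps -> mean estimate {average_over_runs(10.0, steps):6.2f}")
```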

comment by [deleted] · 2012-12-13T08:29:07.582Z · LW(p) · GW(p)

We can, however, expect AI systems to be less subject to noise than human brains.

Replies from: twanvl
comment by twanvl · 2012-12-13T14:39:12.147Z · LW(p) · GW(p)

Not necessarily. By using randomness you can often get more work done with fewer resources, at the cost of increased noise. This is also a trade-off that an AI system should make.
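One concrete instance of that trade-off (my illustration, not twanvl's): estimating a probability by Monte Carlo sampling instead of exhaustive enumeration does far less work and returns an answer that is only approximately right, with the noise shrinking as the sample budget grows.

```python
import itertools
import random

# Exact answer: enumerate all 6**8 (~1.7 million) outcomes, a lot of work.
def exact_prob(threshold=33, dice=8):
    hits = total = 0
    for roll in itertools.product(range(1, 7), repeat=dice):
        total += 1
        hits += sum(roll) >= threshold
    return hits / total

# Monte Carlo: a few thousand random rolls, far less work, noisier answer.
def sampled_prob(threshold=33, dice=8, samples=5000):
    hits = sum(sum(random.randint(1, 6) for _ in range(dice)) >= threshold
               for _ in range(samples))
    return hits / samples

print("exhaustive :", round(exact_prob(), 4))
print("monte carlo:", round(sampled_prob(), 4), "(varies from run to run)")
```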

Replies from: None, PatSwanson
comment by [deleted] · 2012-12-13T22:12:50.598Z · LW(p) · GW(p)

Not that level of randomness. Computers are far more precise than meat. Most of the noise in meat is just plain error, not approximation by probabilistic methods.

comment by PatSwanson · 2012-12-13T21:53:29.807Z · LW(p) · GW(p)

Wouldn't increasing noise levels in the decision-making processes of a Friendly AI decrease the Friendliness of that AI?

I think that ought to take this approach to reducing resource-consumption off the table.

Replies from: None
comment by [deleted] · 2012-12-13T22:11:19.229Z · LW(p) · GW(p)

The worst that noise can do is decrease the quality of the approximation that the AI is using. (EDIT: barring the effects described in the OP, of which I am skeptical.) For friendliness, this means decreased decision quality.

If you decide that such is unacceptable, the AI needs to spend more resources (time and energy) on coming to the conclusion. In some cases that will be worth it; in others, not. The AI is capable of making this trade-off on its own.

If you don't let it trade accuracy for speed, the day will come when you need a decision now, and the AI will choke and everyone will die.

It's not clear how an AI that couldn't trade off accuracy could even work, given that the exact forms of nearly everything are intractable.
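As a toy version of that trade-off (my sketch, with made-up payoff distributions): an "anytime" decision procedure that keeps refining sampled estimates of each option's expected payoff and simply returns its current best guess when the time budget expires. Tight deadlines buy speed at the cost of occasional wrong choices.

```python
import random
import time

def sample_payoff(option):
    """Stand-in for an expensive, stochastic evaluation of an option."""
    mean = {"A": 1.0, "B": 1.1}[option]  # B is truly (slightly) better
    return random.gauss(mean, 1.0)

def anytime_choose(budget_seconds):
    """Keep refining running-average payoff estimates for both options
    until the deadline, then return the current best guess."""
    totals = {"A": 0.0, "B": 0.0}
    counts = {"A": 0, "B": 0}
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        for opt in ("A", "B"):
            totals[opt] += sample_payoff(opt)
            counts[opt] += 1
    return max(totals, key=lambda opt: totals[opt] / counts[opt])

# A tight budget gives a fast but noisy (sometimes wrong) choice; a larger
# budget makes picking the genuinely better option "B" more reliable.
for budget in (0.0005, 0.005, 0.05):
    picks = [anytime_choose(budget) for _ in range(20)]
    print(f"budget {budget:6.4f}s -> chose B {picks.count('B')}/20 times")
```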

comment by MrMind · 2012-12-13T16:12:21.208Z · LW(p) · GW(p)

While the model is interesting, it is almost irremediably ruined by this line: "since by definition P(A) = Ta/n", which substantially conflates probability with frequency. From this point of view, the conclusion:

It is clear from Pearl's work that probability theory provides normatively correct rules which an AI system must use to reason optimally about uncertain events. It is equally clear that AI systems (like all other physical systems) are unavoidably subject to a certain degree of random variation and noise in their internal workings. As we have seen, this random variation does not produce a pattern of reasoning in which probability estimates vary randomly around the correct value; instead, it produces systematic biases that push probability estimates in certain directions and so will produce conservatism, subadditivity, and the conjunction and disjunction errors in AI reasoning.

does not follow (because the estimate is determined by the prior, not by counting).
BUT the issue of noise in AI is interesting per se: if we have a stable, self-improving, friendly AI, could it faultily copy/update itself into an unfriendly version?

Replies from: nigerweiss, Kindly
comment by nigerweiss · 2012-12-14T08:20:28.601Z · LW(p) · GW(p)

Repeated self-modification is problematic because it represents a product over a series of modifications (though possibly a convergent one, if the AI gets better at maintaining its utility function / rationality with each modification) -- naively, because no projection about the future can have a confidence of 1, there is some chance that each change to the AI will be negative-value.
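To spell out the arithmetic behind that (my gloss, not nigerweiss's): if the $i$-th self-modification preserves the AI's values with probability $1 - \epsilon_i$, the chance that the first $n$ modifications are all value-preserving is

$$\prod_{i=1}^{n} (1 - \epsilon_i),$$

which stays bounded away from zero as $n \to \infty$ only if $\sum_i \epsilon_i$ converges, i.e. only if the per-modification failure probabilities shrink fast enough; that is the "possibly convergent" case mentioned above.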

Replies from: MrMind
comment by MrMind · 2012-12-14T11:16:47.387Z · LW(p) · GW(p)

Right, it's not only noise that can alter the value of copying/propagating source code, since we can at least imagine that future improvements will also be more stable in this regard: there's also the possibility that the moral landscape of an AI could be fractal, so that even a small modification might turn friendliness into unfriendliness/hostility.

comment by Kindly · 2012-12-14T14:32:29.698Z · LW(p) · GW(p)

While the model is interesting, it is almost irremediably ruined by this line: "since by definition P(A) = Ta/n", which substantially conflates probability with frequency.

Think of P(A) merely as the output of a noiseless version of the same algorithm. Obviously this depends on the prior, but I think this one is not unreasonable in most cases.

Replies from: MrMind
comment by MrMind · 2012-12-14T15:18:44.679Z · LW(p) · GW(p)

I'm not sure I've understood the sentence

Think of P(A) merely as the output of a noiseless version of the same algorithm.

because P(A) is the noiseless parameter.
Anyway, the entire paper relies on the counting algorithm to establish that random noise can give rise to structured bias, and that this is a problem for a Bayesian AI.
But while the mechanism may be an interesting and maybe even correct way to unify the mentioned biases in the human mind, it can hardly be posed as a problem for such an artificial intelligence. A counting algorithm for establishing probabilities basically denies everything Bayesian updating is designed for (the most trivial example: drawing from a finite urn).

Replies from: Kindly
comment by Kindly · 2012-12-14T15:31:39.163Z · LW(p) · GW(p)

Well, yes, the prior that yields counting algorithms is not universal. But in many cases it's a good idea! And if you decide to use, for example, some rule-of-succession-style modifications, the same situation appears.

In the case of a finite urn, you might see different biases (or none at all if your algorithm stubbornly refuses to update because you chose a silly prior).
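For concreteness (my sketch, reusing the bit-flip read-noise model from the earlier example, which may not match the paper's exact setup): a rule-of-succession estimator inherits essentially the same directed bias once the underlying reads are noisy; the prior correction barely moves the numbers at this sample size, and the finite-urn case would need a different likelihood entirely.

```python
import random

def noisy_counts(p_true, n=1000, d=0.05):
    """Count remembered instances of A, with each stored flag misread
    (flipped) with probability d; the same noise model as above."""
    flags = [random.random() < p_true for _ in range(n)]
    read = [(not f) if random.random() < d else f for f in flags]
    return sum(read), n

def compare_estimators(p_true, trials=2000):
    counting = succession = 0.0
    for _ in range(trials):
        t, n = noisy_counts(p_true)
        counting += t / n                 # counting estimate  T_a / n
        succession += (t + 1) / (n + 2)   # rule of succession (T_a + 1) / (n + 2)
    return counting / trials, succession / trials

# Both estimators are pulled toward 0.5 by the read noise; the prior
# correction in the rule of succession does not remove that directed bias.
for p in (0.05, 0.5, 0.95):
    c, s = compare_estimators(p)
    print(f"true P = {p:.2f}   counting = {c:.3f}   rule of succession = {s:.3f}")
```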

comment by [deleted] · 2012-12-13T22:19:51.705Z · LW(p) · GW(p)

The same biases

Highly unlikely. Roughly analogous at best.

Luckily, the AI is fully capable of throwing correction factors in there if there are in fact systematic biases to its approximations.

I don't see immediately how noise could cause a systematic error unless you were doing something stupid like representing probabilities as real numbers between 0 and 1. Maybe I should actually read this...

Replies from: thomblake
comment by thomblake · 2012-12-17T16:06:17.075Z · LW(p) · GW(p)

Maybe I should actually read this...

Yes, "I don't see how..." is not a very useful comment on a paper that you haven't read that purports to explain how.