reply to benelliott about Popper issues

post by curi · 2011-04-07T08:11:14.351Z · LW · GW · Legacy · 189 comments

This is a discussion page because I got the message "Comment too long". Apparently the same formatting magic doesn't work here for quotes :(  It is a reply to:

http://lesswrong.com/lw/3ox/bayesianism_versus_critical_rationalism/3ulv

 

> > You can conjecture Bayes' theorem. You can also conjecture all the rest, however some things (such as induction, justificationism, foundationalism) contradict Popper's epistemology. So at least one of them has a mistake to fix. Fixing that may or may not lead to drastic changes, abandonment of the main ideas, etc

> Fully agreed. In principle, if Popper's epistemology is of the second, self-modifying type, there would be nothing wrong with drastic changes. One could argue that something like that is exactly how I arrived at my current beliefs, I wasn't born a Bayesian.

OK great.

If the changes were large enough, to important parts (for example if it lost the ability to self-modify) I wouldn't want to call it Popper's epistemology anymore (unless maybe the changes were made very gradually, with Popper's ideas being valued the whole time, and still valued at the end). It would be departing from his tradition too much, so it would be something else. A minor issue in some ways, but tradition matters.

> I can also see some ways to make induction and foundationalism easier to swallow.

> A discussion post sounds about right for this, if enough people like it you might consider moving it to the main site.

104 comments later it's at 0 karma. There is interest, but not so much liking. I don't think the main site is the right place for me ;-)

> > I think you are claiming that seeing a white swan is positive support for the assertion that all swans are white. (If not, please clarify).

> This is precisely what I am saying.

Based on what you say later, I'm not sure if you mean this in the same way I meant it. I meant: it is positive support for "all swans are white" *over* all theories which assert "all swans are black" (I disagree with that claim). If it doesn't support "all swans are white" *more than those other theories*, then I regard the support as vacuous. I don't believe the math you offered meets this challenge of supporting "all swans are white" more than various opposites of it. I'm not sure if you intended it to.

> > If so, this gets into important issues. Popper disputed the idea of positive support. The criticism of the concept begins by considering: what is support? And in particular, what is the difference between "X supports Y" and "X is consistent with Y"?

> The beauty of Bayes is how it answers these questions. To distinguish between the two statements we express them each in terms of probabilities.

> "X is consistent with Y" is not really a Bayesian way of putting things, I can see two ways of interpreting it. One is as P(X&Y) > 0, meaning it is at least theoretically possible that both X and Y are true. The other is that P(X|Y) is reasonably large, i.e. that X is plausible if we assume Y.

Consistent means "doesn't contradict". It's the first one. Plausible is definitely not what I wanted.

> "X supports Y" means P(Y|X) > P(Y), X supports Y if and only if Y becomes more plausible when we learn of X. Bayes tells us that this is equivalent to P(X|Y) > P(X), i.e. if Y would suggest that X is more likely that we might think otherwise then X is support of Y.

This is true but fairly vacuous, in my view. I don't want to argue over what counts as significant. If you like it, shrug. It is important that, e.g., we reject ideas refuted by evidence. But I don't think this addresses the major problems in epistemology which come after we decide to reject things which are refuted by evidence.

The reason it doesn't is there's always infinitely many things supported by any evidence, in this sense. Infinitely many things which make wildly different predictions about the future, but identical predictions about whatever our evidence covers. If Y is 10 white swans, and X is "all swans are white" then X is supported, by your statement. But also supported are infinitely many different theories claiming that all swans are black, and that you hallucinated. You saw exactly what you would see if any of those theories were true, so they get as much support as anything else. There is nothing (in the concept of support) to differentiate between "all swans are white" and those other theories.

If you do add something else to differentiate, I will say the support concept is useless. The new thing does all the work. And further, the support concept is frequently abused. I have had people tell me that "all swans are black, but tomorrow you will hallucinate 10 white swans" is supported less by seeing 10 white swans tomorrow than "all swans are white" is, even though they made identical predictions (and asserted them with 100% probability, and would both have been definitely refuted by anything else). That kind of stuff is just wrong. I don't know if you think that kind of thing or not. What you said here doesn't clearly disown it, nor advocate it. But that's the kind of thing that concerns me.

> Suppose we make X the statement "the first swan I see today is white" and Y the statement "all swans are white". P(X|Y) is very close to 1, P(X|~Y) is less than 1 so P(X|Y) > P(X), so seeing a white swan offers support for the view that all swans are white. Very, very weak support, but support nonetheless.

The problem I have is that it's not supported over infinitely many rivals. So how is that really support? It's useless. The only stuff not being supported is that which contradicts the evidence (like, literally contradicts, with no hallucination claims; e.g. a theory that predicts you will think you saw a green swan tomorrow, but then you don't, just the white ones; that one is refuted). The inconsistent theories are refuted. The theories which make probabilistic predictions are partially supported. And the theories that say "screw probability, 100% every time" for all predictions get maximally supported, and between them support does not differentiate. (BTW I think it's ironic that I score better on support when I just stick 100% in front of every prediction in all theories I mention, while you score lower by putting in other numbers, and so your support concept discourages ever making predictions with under 100% confidence.)
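To make that concrete, here is a small illustrative sketch (the priors and the "other" catch-all are made-up numbers, not anyone's actual model) computing the Bayesian support factor for two rival theories that both predicted the observation with certainty:

```python
# Illustrative numbers only. E = "you see 10 white swans".
# T1 = "all swans are white"
# T2 = "all swans are black, but you hallucinate 10 white swans"
priors = {"T1": 0.6, "T2": 0.1, "other": 0.3}
likelihoods = {"T1": 1.0, "T2": 1.0, "other": 0.05}  # P(E | T)

p_e = sum(priors[t] * likelihoods[t] for t in priors)  # P(E) = 0.715

for t in ("T1", "T2"):
    posterior = priors[t] * likelihoods[t] / p_e
    support_factor = posterior / priors[t]  # = P(E|T) / P(E)
    print(t, round(posterior, 3), round(support_factor, 3))

# Both theories get the identical support factor (about 1.4): the evidence
# raises each by the same multiplicative amount, so the support relation
# alone does not differentiate between them; only the priors do.
```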

> (The above is not meant to be condescending, I apologise if you know all of it already).

It is not condescending. I think (following Popper) that explaining things is important and that nothing is obvious, and that communication is difficult enough without people refusing to go over the "basics" in order to better understand each other. Of course this is a case where Popper's idea is not unique. Other people have said similar things. But this idea, and others, are integrated into his epistemology closely. There's also *far more detail and precision* available, to explain *why* this stuff is true (e.g. lengthy theories about the nature of communication, also integrated into his epistemology). I don't think ideas about interpreting people's writing in kind ways, and miscommunication being a major hurdle, are so closely integrated with Bayesian approaches, which are more math focussed and don't integrate so nicely with explanations.

My reply about support is basic stuff too, to my eye. But maybe not yours. I don't know. I expect not, since if it was you could have addressed it in advance. Oh well. It doesn't matter. Reply as you will. No doubt I'm also failing to address in advance something you regard as important.

> > To show they are correct. Popper's epistemology is different: ideas never have any positive support, confirmation, verification, justification, high probability, etc...

> This is a very tough bullet to bite.

Yes it is tough. Because this stuff has been integral to the Western philosophy tradition since Aristotle until Popper. That's a long time. It became common sense, intuitive, etc...

> > How do we decide which idea is better than the others? We can differentiate ideas by criticism. When we see a mistake in an idea, we criticize it (criticism = explaining a mistake/flaw). That refutes the idea. We should act on or use non-refuted ideas in preference over refuted ideas.

> One thing I don't like about this is the whole 'one strike and you're out' feel of it. It's very boolean,

Hmm. FYI that is my emphasis more than Popper's. I think it simplifies the theory a bit to regard all changes to theories as new theories. Keep in mind you can always invent a new theory with one thing changed. So the ways it matters have some limits; it's partly just a terminology thing (terminology has meaning, and some is better than others. Mine is chosen with Popperian considerations in mind. A lot of Popper's is chosen with considerations in mind of talking with his critics). Popper sometimes emphasized that it's important not to give up on theories too easily, but to look for ways to improve them when they are criticized. I agree with that. So, the "one strike and you're out" way of expressing this is misleading, and isn't *substantially* implied in my statements (b/c of the possibility of creating new and similar theories). Other terminologies have different problems.

> the real world isn't usually so crisp. Even a correct theory will sometimes have some evidence pointing against it, and in policy debates almost every suggestion will have some kind of downside.

This is a substantive, not terminological, disagreement, I believe. I think it's one of the *advantages* of my terminology that it helped highlight this disagreement.

Note the idea that evidence "points" is the support idea.

In the Popperian scheme of things, evidence does not point. It contradicts, or it doesn't (given some interpretation and explanation, which are often more important than the evidence itself). That's it. Evidence can thus be used in criticisms, but is not itself inherently a criticism or argument.

So let me rephrase what you were saying. "Even a correct theory will sometimes have critical arguments against it".

Part of the Popperian view is that if an idea has one false aspect, it is false. There is a sense in which any flaw must be decisive. We can't just go around admitting mistakes into our ideas on purpose.

One way to explain the issue is: for each criticism, consider it. Judge if it's right or wrong. Do your best and act on the consequence. If you think the criticism is correct, you absolutely must reject the idea it criticizes. If you don't, then you can regard the theory as not having any *true* critical arguments against it, so that's fine.

When you reject an idea for having one false part, you can try to form a new theory to rescue the parts you still value. This runs into dangers of arbitrarily rescuing everything in an ad hoc way. There's two answers to that. The first is: who cares? Popperian epistemology is not about laying out rules to prevent you from thinking badly. It's about offering advice to help you think better. We don't really care very much if you find a way to game the system and do something dumb, such as making a series of 200 ad hoc and silly arguments to try to defend a theory you are attached to. All we'll do is criticize you for it. And we think that is good enough: there are criticisms of bad methodologies, but no formal rules that definitively ban them. Now the second answer, which Deutsch presents in The Fabric of Reality, is that when you modify theories you often ruin their explanation. If you don't ruin it, then the modification is OK and the new theory is worth considering. But if the explanation is ruined, that puts an end to trying to rescue it (unless you can come up with a good idea for a new way to modify it that won't ruin the explanation).

This concept of ruining explanations is important and not simple. Reading the book would be great (it is polished! edited!) but I'll try to explain it briefly. This example is actually from his other book, _The Beginning of Infinity_, chapter 1. We'll start with a bad theory: the seasons are caused by Persephone's imprisonment, for 6 months of the year, in the underworld (via her mother Demeter's magic powers which she uses to express her emotions). This theory has a bad explanation in the first place, so it can be easily rescued when it's empirically contradicted. For example this theory predicts the seasons will be the same all over the globe, at the same time. That's false. But you can modify the theory very easily to account for the empirical data. You can say that Demeter only cares about the area where she lives. She makes it cold when Persephone is gone, and hot when she's present. The cold or hot has to go somewhere, so she puts it far away. So, the theory is saved by an ad hoc modification. It's no worse than before. Its substantive content was "Demeter's emotions and magic account for the seasons". And when the facts change, that explanation remains intact. This is a warning against bad explanations (which can be criticized directly for being bad explanations, so there's no big problem here).

But when you have a good explanation, such as the real explanation for the seasons, based on the Earth orbiting the sun, and the axis being tilted, and so on, ad hoc modifications cause bigger problems. Suppose we found out the seasons are the same all around the world at the same time. That would refute the axis tilt theory of seasons. You could try to save it, but it's hard. If you added magic you would be ruining the axis tilt *explanation* and resorting to a very different explanation. I can't think of any way to save the axis tilt theory from the observation that the whole world has the same seasons at the same time, without contradicting or replacing its explanation. So that's why ad hoc modifications sometimes fail (for good explanatory theories only). In the cases where there is not a failure of this type -- if there is a way to keep a good explanation and still account for new data -- then that new theory is genuinely worth consideration (and if there is something wrong with it, you can criticize it).

> There is also the worry that there could be more than one non-refuted idea, which makes it a bit difficult to make decisions.

Yes I know. This is an important problem. I regard it as solved. For discussion of this problem, go to:

http://lesswrong.com/r/discussion/lw/551/popperian_decision_making/

> Bayesianism, on the other hand, when combined with expected utility theory, is perfect for making decisions.

Bayesianism works when you assume a bunch of stuff (e.g. some evidence), and you set up a clean example, and you choose an issue it's good at handling. I don't think it is very helpful in a lot of real world cases. Certainly it helps in some. I regard Bayes' theorem itself as "how not to get probability wrong". That matters to a good amount of stuff. But hard real world scenarios usually have rival explanations of the proper interpretation of the available evidence, they have fallible evidence that is in doubt, they have often many different arguments that are hard to assign any numbers to, and so on. Using Solomonoff induction to assign numbers, for example, doesn't work in practice as far as I know (e.g. people don't actually compute the numbers for dozens of political arguments using it). Another assumption being made is *what is a desirable (high utility) outcome* -- Bayesianism doesn't help you figure that out, it just lets you assume it (I see that as entrenching bias and subjectivism in regard to morality -- we *can* make objective criticisms of moral values).

 

189 comments

Comments sorted by top scores.

comment by jimrandomh · 2011-04-07T12:10:09.067Z · LW(p) · GW(p)

At this point, I have to conclude that you just plain don't understand Bayesian epistemology well enough to criticize it. I also suspect that you have become too strongly opinionated on this topic to be able to learn, at least until you get some distance.

The principal difference between Bayesian and Popperian epistemology is that Bayesianism is precise; it puts all necessary ambiguity in the prior, and assumes only noncontroversial, well-tested mathematical axioms, and everything thereafter is deductively sound math. In Popper, the ambiguity (which is still necessary) is in the definitions and spreads through the whole system, making its predictions much less concrete and thus making it hard to falsify.

To make progress in epistemology beyond Popper, you must switch from English to math. It takes a lot of work and a lot of time to rebuild a fuzzy English understanding of epistemology as a precise math understanding, but you will find that the precise math reproduces the same predictions, and many more predictions that the fuzzy English could never have made.

Replies from: None, David_Gerard, curi
comment by [deleted] · 2011-04-07T15:19:23.629Z · LW(p) · GW(p)

The principal difference between Bayesian and Popperian epistemology is that Bayesianism is precise; it puts all necessary ambiguity in the prior, and assumes only noncontroversial, well-tested mathematical axioms, and everything thereafter is deductively sound math.

I think you're overselling it. Here are two big weaknesses of Bayesian epistemology as I understand it:

  1. it cannot handle uncertainty about unproved mathematical truths.

  2. It does not describe the way any existing intelligence actually operates, or even could operate in principle. (That last clause is the problem of writing an AI.)

I have never seen on this website any argument resolved or even approached on semirigorous Bayesian lines, except a couple of not-so-successful times between Yudkowsky and Hanson. Popper, or the second-hand accounts of him that I understand, seems to describe the way that I (and I think you!) actually think about things: we collect a big database of explanations and of criticisms of those explanations, and we decide the merits of those explanations and criticisms using our messy judgement.

In cases that satisfy some mild assumptions (but not so mild as to handle the important problem 1.!) this might be equivalent to Bayesianism. But equivalences go both ways, and Popper seems to be what we actually practice -- what's the problem?

Replies from: timtyler, komponisto, khafra
comment by timtyler · 2011-04-07T17:02:00.440Z · LW(p) · GW(p)

Here are two big weaknesses of Bayesian epistemology as I understand it:

it cannot handle uncertainty about unproved mathematical truths.

Why do you think that?

It does not describe the way any existing intelligence actually operates, or even could operate in principle. (That last clause is the problem of writing an AI.)

Solomonoff induction is uncomputable? So: use a computable approximation.

Replies from: curi, None
comment by curi · 2011-04-07T19:38:33.464Z · LW(p) · GW(p)

Solomonoff induction is uncomputable? So: use a computable approximation.

What is the argument that the approximation you use is good? What I mean is, when you approximate you are making changes. Some possible changes you could make would create massive errors. Others -- the type you are aiming for -- only create small errors that don't spread all over. What is your method of creating an approximation of the second type?

Replies from: timtyler, JoshuaZ
comment by timtyler · 2011-04-07T21:02:45.998Z · LW(p) · GW(p)

Making computable approximations of Solomonoff induction is a challenging field which it seems inappropriate to try and cram into a blog comment. Probably the short answer is "by using stochastic testing".

comment by JoshuaZ · 2011-04-08T03:19:11.097Z · LW(p) · GW(p)

There's a large amount of math behind this sort of thing, and frankly, given your other comments I'm not sure that you have enough background. It might help to just read up on Bayesian machine learning, which needs to deal with just this sort of issue. Then keep in mind that there are theorems that, given some fairly weak conditions to rule out pathological cases, one can approximate any distribution by a computable distribution to arbitrary accuracy. You need to be careful about what metric you are using, but it turns out to be true for a variety of different notions of approximating and different metrics. While this is far from my area of expertise, my impression is that the theorems are essentially of the same flavor as the theorems one would see in a real analysis course about approximating functions with continuous functions or polynomial functions.

comment by [deleted] · 2011-04-08T02:03:09.072Z · LW(p) · GW(p)

Why do you think that?

If you think I'm mistaken, please say so and elaborate.

Solomonoff induction is uncomputable? So: use a computable approximation.

It's hard for me to believe that you haven't thought of this, but it's difficult to "approximate" an uncomputable function. Think of any enormous computable function f(n) you like. Any putative "approximation" of the busy beaver function is off by a factor larger than f(n). I bet Solomonoff is similarly impossible to approximate -- am I wrong?

Replies from: timtyler
comment by timtyler · 2011-04-08T03:11:48.732Z · LW(p) · GW(p)

I am not aware of any particular issues regarding Bayesian epistemology handling uncertainty about unproved mathematical truths. How is that different from other cases where there is uncertainty?

Using a computable approximation of Solomonoff induction is a standard approach. If you don't have a perfect compressor, you just use the best one you have. It is the same with Solomonoff induction.

Replies from: None
comment by [deleted] · 2011-04-08T14:01:34.942Z · LW(p) · GW(p)

I am not aware of any particular issues regarding Bayesian epistemology handling uncertainty about unproved mathematical truths. How is that different from other cases where there is uncertainty?

I outlined the problem with mathematical uncertainty here. The only reason I believe that this is an open problem is on Yudkowsky's say-so in his reply.

Using a computable approximation of Solomonoff induction is a standard approach. If you don't have a perfect compressor, you just use the best one you have. It is the same with Solomonoff induction.

Standard approach to what? I don't know what a "compressor" or "perfect compressor" is, if those are technical terms.

To me, the question is whether an approximation to Solomonoff induction has approximately the same behavior as Solomonoff induction. I think it can't, for instance because no approximation of the busy beaver function (even the "best compressor you have") behaves anything like the busy beaver function. If you think this is a misleading way of looking at it please tell me.

Replies from: JoshuaZ, timtyler
comment by JoshuaZ · 2011-04-08T14:12:25.449Z · LW(p) · GW(p)

To me, the question is whether an approximation to Solomonoff induction has approximately the same behavior as Solomonoff induction. I think it can't, for instance because no approximation of the busy beaver function (even the "best compressor you have") behaves anything like the busy beaver function. If you think this is a misleading way of looking at it please tell me.

Solomonoff induction can't handle the Busy Beaver function because Busy Beaver is non-computable. So it isn't an issue for approximations of Solomonoff (except in so far as they can't handle it either).

Replies from: None
comment by [deleted] · 2011-04-08T14:16:18.163Z · LW(p) · GW(p)

I am not saying that "Solomonoff can't handle Busy Beaver." (I'm not even sure I know what you mean.) I am saying that Solomonoff is analogous to Busy Beaver, for instance because they are both noncomputable functions. Busy Beaver is non-approximatable in a strong sense, and so I think that Solomonoff might also be non-approximatable in an equally strong sense.

Replies from: timtyler
comment by timtyler · 2011-04-08T17:14:29.518Z · LW(p) · GW(p)

Kolmogorov complexity is uncomputable, but you can usefully approximate Kolmogorov complexity for many applications using PKZIP. The same goes for Solomonoff induction. Its prior is based on Kolmogorov complexity.
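As an illustrative sketch of that idea (using Python's zlib in place of PKZIP; any general-purpose compressor plays the same role), compressed length can stand in for Kolmogorov complexity:

```python
import os
import zlib

def complexity_proxy(data: bytes) -> int:
    """Compressed length: a crude, computable stand-in for
    Kolmogorov complexity (shorter output = more regularity found)."""
    return len(zlib.compress(data, 9))

patterned = b"01" * 500        # highly regular 1000-byte sequence
random_ish = os.urandom(1000)  # 1000 bytes with no exploitable structure

print(complexity_proxy(patterned))   # small: the repetition compresses well
print(complexity_proxy(random_ish))  # near 1000: almost incompressible
```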

comment by timtyler · 2011-04-08T16:29:55.637Z · LW(p) · GW(p)

I outlined the problem with mathematical uncertainty here.

Don't agree with the premise there. As to what Yudkowsky is talking about by saying "Logical uncertainty is an open problem" - it beats me. There's really only uncertainty. Uncertainty about mathematics is much the same as other kinds of uncertainty.

Replies from: None
comment by [deleted] · 2011-04-08T16:32:53.592Z · LW(p) · GW(p)

What premise?

Replies from: timtyler
comment by timtyler · 2011-04-08T16:53:45.712Z · LW(p) · GW(p)

The first 5 lines of the post - this bit:

Any prior P, of any agent, has at least one of the following three properties:

Replies from: None
comment by [deleted] · 2011-04-08T22:49:05.795Z · LW(p) · GW(p)

I can prove that to you, unless I made a mistake. Are you saying you can defeat it a priori by telling me a prior that doesn't have any of those three properties?

Replies from: timtyler
comment by timtyler · 2011-04-09T12:08:45.650Z · LW(p) · GW(p)

Take induction, for example, where the domain of the function P(X) ranges over the possible symbols that might come next in the stream (or, if you prefer, ranges over the hypotheses that predict them).

Then P(X and Y) is typically not a meaningful concept.

Replies from: None
comment by [deleted] · 2011-04-09T15:19:56.296Z · LW(p) · GW(p)

The trichotomy is:

  1. It is not defined on all X -- i.e. P is agnostic about some things

  2. It has P(X) < P(X and Y) for at least one pair X and Y -- i.e. P sometimes falls for the conjunction fallacy

  3. It has P(X) = 1 for all mathematically provable statements X -- i.e. P is an oracle.

Taken literally, your P falls for 1. For instance it doesn't have an opinion about whether it will be sunny tomorrow or the Riemann hypothesis is true. If you wish to avoid this problem by encoding the universe as a string of symbols to feed to your induction machine, why wouldn't you let me encode "X and Y" at the same time?

comment by komponisto · 2011-04-07T16:59:18.636Z · LW(p) · GW(p)

Here are two big weaknesses of Bayesian epistemology as I understand it:

1. it cannot handle uncertainty about unproved mathematical truths.

Why not? You just use an appropriate formalization of mathematics, and treat it as uncertainty about the behavior of a proof-searching machine.

I have never seen on this website any argument resolved or even approached on semirigorous Bayesian lines, except a couple of not-so-successful times between Yudkowsky and Hanson

I can think of at least one concrete example. But I'm guessing you were familiar with that example (and numerous other smaller-scale ones) and rejected it, so you must mean something different than I do by "argument approached on semirigorous Bayesian lines".

equivalences go both ways, and Popper seems to be what we actually practice -- what's the problem?

Perhaps there isn't any, except insofar as the poster is claiming that Bayes is wrong because it isn't Popper.

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-07T18:29:10.977Z · LW(p) · GW(p)

Why not? You just use an appropriate formalization of mathematics, and treat it as uncertainty about the behavior of a proof-searching machine.

Unfortunately this isn't helpful. Consider for example a Turing machine that seems to halt on all inputs, and we know that when this one halts it halts with either a 0 or a 1. Does this machine represent a computable sequence (hence should have non-zero probability assigned if one is using a Solomonoff prior)? If we haven't resolved that question we don't know. But in order to use any form of prior over computable sequences we need to assume that we have access to what actually represents a computable hypothesis and what doesn't. There are other problems as well.

Replies from: komponisto
comment by komponisto · 2011-04-07T21:15:36.898Z · LW(p) · GW(p)

I'm having trouble parsing your third (I don't know what it means for a Turing machine to [fail to] "represent a computable sequence", especially since I thought that a "computable sequence" was by definition the output of a Turning machine) and fourth (we don't know what?) sentences, but if your general point is what I think it is ("after formalizing logical uncertainty, we'll still have meta-logical uncertainty left unformalized!"), that's simply a mathematical fact, and not an argument against the possibility of formalizing logical uncertainty in the first place.

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-08T01:52:30.524Z · LW(p) · GW(p)

I'm having trouble parsing your third (I don't know what it means for a Turing machine to [fail to] "represent a computable sequence", especially since I thought that a "computable sequence" was by definition the output of a Turing machine)

A sequence f(n) is computable if there's a Turing machine T that given input n halts with output f(n). But, not all Turing machines halt on all inputs. It isn't hard to make Turing machines that go into trivial infinite loops, and what is worse, Turing machines can fail to halt in much more ugly and non-obvious ways, to the point where the question "Does the Turing machine M halt on input n?" is not in general decidable. This is known as the Halting theorem. So if I'm using some form of Solomonoff prior I can't even in general tell whether a machine describes a point in my hypothesis space.
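As an illustrative sketch of how non-obvious halting can be, consider a program that halts if and only if an odd perfect number exists (an open question), so deciding whether it halts is as hard as settling that conjecture:

```python
def search_for_odd_perfect_number():
    """Halts iff an odd perfect number exists (an open problem).
    Deciding whether this function halts is as hard as settling
    the conjecture; it is defined here but deliberately not called."""
    n = 1
    while True:
        if sum(d for d in range(1, n) if n % d == 0) == n:
            return n  # found an odd perfect number: the search halts
        n += 2  # check odd numbers only
```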

Replies from: komponisto
comment by komponisto · 2011-04-08T02:15:18.830Z · LW(p) · GW(p)

What I don't understand is your argument that there is a specific problem with logical uncertainty that doesn't apply to implementing Solomonoff induction in general. Yes, the halting problem is undecidable, so you can't decide if a sequence is computable; but assuming you've already got a Solomonoff-induction machine that can say "my probability that it will rain tomorrow is 50%", why can't it also say "my probability that the Riemann Hypothesis is true is 50%"?

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-08T02:23:04.929Z · LW(p) · GW(p)

but assuming you've already got a Solomonoff-induction machine that can say "my probability that it will rain tomorrow is 50%", why can't it also say "my probability that the Riemann Hypothesis is true is 50%"?

That's actually a really good example. It isn't that difficult to make a Turing machine that halts if and only if the Riemann hypothesis is false. So a system using Solomonoff induction has to recognize for starters whether or not that Turing machine halts. Essentially, in the standard version of Solomonoff induction, you need to assume that you have access to indefinitely large computing power. You can try making models about what happens when you have limited computational power in your entity (in some sense AIXI implementations and implementations of Bayesian reasoning need to do something close to this). But if one doesn't assume that one has indefinite computing power then a lot of the results about how different priors behave no longer hold (or at least the proofs don't obviously go through). For more detail on that sort of thing I'd recommend talking to cousin_it or jimrandomh since they've thought and know a lot more about these issues than I do.

Replies from: komponisto
comment by komponisto · 2011-04-08T15:19:17.036Z · LW(p) · GW(p)

It isn't that difficult to make a Turing machine that halts if and only if the Riemann hypothesis is false. So a system using Solomonoff induction has to recognize for starters whether or not that Turing machine halts.

Only in the sense that a human trying to solve the Riemann Hypothesis also has to recognize whether the same Turing machine halts.

When I talk about "going meta", I really mean it: when the Solomonoff machine that I have in mind is considering "whether this sequence is computable" or "whether the Riemann Hypothesis is true" or more generally "whether this Turing machine halts", it is going to be doing so the same way a human does: by using a model of the mathematical object in question that isn't actually equivalent to that same mathematical object. It won't be answering the question "natively", the way a computer typically adds 3+5 (i.e. by specific addition algorithms built into it); instead, it will be more closely analogous to a computer being programmed to simulate three apples being combined with five apples on its display screen, and then count the apples by recognizing their visual representation.

So the upshot is that to be able to give an answer to the question "what is your probability that this Turing machine halts?", the Solomonoff AI does not need to solve anything equivalent to the halting problem. It just needs to examine the properties of some internal model corresponding to the label "Turing machine", which need not be an actual Turing machine. It is in this way that uncertainty about mathematical truths is handled.

It should go without saying that this isn't directly of use in building such an AI, because it doesn't tell you anything about how to construct the low-level algorithms that actually run it. But this thread isn't about how to build a Bayesian AI; rather, it's about whether a Bayesian AI is something that it makes sense to build. And my point here is that "Well, if you had a Bayesian AI, it wouldn't be able to give you probability estimates concerning the truth of mathematical statements" is not a valid argument on the latter question.

Replies from: None
comment by [deleted] · 2011-04-08T15:56:57.007Z · LW(p) · GW(p)

So the upshot is that to be able to give an answer to the question "what is your probability that this Turing machine halts?", the Solomonoff AI does not need to solve anything equivalent to the halting problem.

By "Solomonoff AI" do you mean "some computable approximation of a Solomonoff AI"? My impression is that the Solomonoff prior just does solve the halting problem, and that this is a standard proof that it is uncomputable.

when the Solomonoff machine that I have in mind is considering "whether this sequence is computable" or "whether the Riemann Hypothesis is true" or more generally "whether this Turing machine halts", it is going to be doing so the same way a human does.

Humans are bad at this. Is there some reason to think that a "the Solomonoff machine you have in mind" will be better at it?

Replies from: komponisto
comment by komponisto · 2011-04-08T18:10:02.238Z · LW(p) · GW(p)

My impression is that the Solomonoff prior just does solve the halting problem,

The programmer would need to have solved the halting problem in order to program the Solomonoff prior into the AI, but the AI itself would not be solving the halting problem.

when the Solomonoff machine that I have in mind is considering "whether this sequence is computable" or "whether the Riemann Hypothesis is true" or more generally "whether this Turing machine halts", it is going to be doing so the same way a human does.

Humans are bad at this. Is there some reason to think that a "the Solomonoff machine you have in mind" will be better at it?

It may or may not be (though hopefully it would be more intelligent than humans -- that's the point after all!); it doesn't matter for the purpose of this argument.

Replies from: None
comment by [deleted] · 2011-04-09T03:15:27.984Z · LW(p) · GW(p)

The programmer would need to have solved the halting problem in order to program the Solomonoff prior into the AI, but the AI itself would not be solving the halting problem.

This strikes me as a confused way of looking at things. As you know, it's a theorem that the halting problem is unsolvable in a technical sense. That theorem expresses a limitation on computer programs, not computer programmers.

comment by khafra · 2011-04-07T15:36:13.723Z · LW(p) · GW(p)

My thinking seems to me more qualitatively Bayesian than Popperian. I don't have a good enough memory to keep all the criticisms I've ever heard of each theory I provisionally accept in mind. Instead, when I encounter a criticism that seems worth considering, I decrease my belief in the theory by an amount corresponding to the strength of the criticism. If I then go on to find evidence that weakens the criticism, strengthens the original theory, or weakens all possible alternate theories, I increase my belief in the original theory again.

Replies from: curi, None
comment by curi · 2011-04-07T21:18:04.496Z · LW(p) · GW(p)

You raise an interesting issue which is: what is the strength of a criticism? How is that determined?

For example, your post is itself a criticism of Popperian epistemology. What is the strength of your post?

By not using strengths of arguments, I don't have this problem. Strengths of arguments remind me of proportional representation voting where every side gets a say. PR voting makes a mess of things, not just in practice but also in terms of rigorous math (e.g. Arrow's Theorem).

Replies from: Sniffnoy, JoshuaZ
comment by Sniffnoy · 2011-04-08T04:58:43.443Z · LW(p) · GW(p)

What does Arrow's theorem have to do with proportional representation? Arrow's theorem deals with single-winner ordinal voting systems. Is there some generalization that covers proportional representation as well?

Replies from: curi
comment by curi · 2011-04-08T09:37:10.570Z · LW(p) · GW(p)

For one thing, all elections have a single overall outcome that wins.

Replies from: Sniffnoy
comment by Sniffnoy · 2011-04-08T10:12:14.609Z · LW(p) · GW(p)

Indeed, but "single-winner" has a technical meaning here that's rather more restrictive than that. Unless each voter could choose their vote arbitrarily from among the set of those overall outcomes, it's not single-winner.

Replies from: curi
comment by curi · 2011-04-08T16:58:21.072Z · LW(p) · GW(p)

Can you give the meaning of "single winner" and the reason you think not directly voting for the single winner will remove the problems?

In the US presidential elections, no voter can directly vote for his desired overall outcome. We have a primary system. Are you saying that the primary system makes Arrow's theorem's problems go away for us?

Replies from: Sniffnoy
comment by Sniffnoy · 2011-04-08T17:53:28.940Z · LW(p) · GW(p)

This appears to be very confused. Arrow's theorem is a theorem, a logical necessity, not a causal influence. It does not go around causing problems that can then be warded off by modifying your system to avoid its preconditions. It's just a fact that your voting system can't satisfy all its desiderata simultaneously. If you're not talking about a single-winner voting system it's simply an inapplicable statement. Perhaps the system meets all the desiderata, perhaps it doesn't. Arrow's theorem simply has nothing to say about it. But if you take a system that meets the preconditions, and simply modify it to not do so, then there's no reason to expect that it suddenly will start meeting the desiderata. For instance, though I think it's safe to say that US presidential elections are single winner, even if they somehow didn't count as such they'd fail to satisfy IIA. My point is just: don't bring up theorems where they don't apply.

If you want a formal statement of Arrow's theorem... well it can take several forms, so take a look at the definitions I made in this comment about their equivalence. Then with that set of definitions, any voting system satisfying IIA and unanimity (even in just the "weak" form where if everyone votes for the same linear order, the overall winner is the winner of that linear order) with a finite set of voters must be a dictatorship. (There's a version for infinite sets of voters too but that's not really relevant here.)

Replies from: curi
comment by curi · 2011-04-08T18:03:49.911Z · LW(p) · GW(p)

You're very condescending and you're arguing with my way of speaking while refusing to actually provide a substantive argument.

It does apply. You're treating yourself as an authority, and me a beginner, and just assuming that, knowing nothing about me.

You made a mistake which I explained in my second paragraph. Your response: to ignore it, and say you were right anyway and I should just read more.

Replies from: Sniffnoy
comment by Sniffnoy · 2011-04-08T18:13:19.103Z · LW(p) · GW(p)

and just assuming that, knowing nothing about me.

I am, in fact, inferring that based on what you wrote.

You made a mistake which I explained in my second paragraph. Your response: to ignore it, and say you were right anyway and I should just read more.

Then I must ask that you point out this mistake more explicitly, because nothing in that second paragraph contradicts anything I said.

Replies from: curi
comment by curi · 2011-04-08T18:15:19.060Z · LW(p) · GW(p)

Your claims implied Arrow's Theorem doesn't apply to the US election system. Or pretty much any other. You also defined "single-winner" so that the single US president who wins the election doesn't qualify.

OK, you're right I didn't actually contradict you. But don't you think that's a mistake? I think it does apply to real life voting systems that people use.

Replies from: Sniffnoy
comment by Sniffnoy · 2011-04-08T18:18:29.306Z · LW(p) · GW(p)

Perhaps you misunderstand what is meant when people say a theorem "applies". It means that the preconditions are met (and that therefore the conclusions hold), not simply that the conclusions hold. The conclusions of a theorem can hold without the theorem being at all applicable.

Replies from: curi
comment by curi · 2011-04-08T18:19:55.318Z · LW(p) · GW(p)

If you meant "Arrow's theorem is not applicable to any voting system on Earth" why did you object to my statements about PR? The PR issue is irrelevant, is it not?

comment by JoshuaZ · 2011-04-08T01:47:17.033Z · LW(p) · GW(p)

By not using strengths of arguments, I don't have this problem.

Do you intend to treat all criticism equally?

comment by [deleted] · 2011-04-07T15:48:19.899Z · LW(p) · GW(p)

I'm suspicious of the notion of "increasing" and "decreasing" a belief. Did you pick those words exactly because you didn't want to use the word "updating"? Why not?

My guess is that having a bad memory is as much a disadvantage for Bayesians as for Popperians.

Replies from: khafra
comment by khafra · 2011-04-07T16:06:25.023Z · LW(p) · GW(p)

I'm suspicious of your suspicion. Is it purely because of the terms I used, or do you really have no beliefs you hold more tenuously than others?

If I ask you whether the sun rose this morning, will you examine a series of criticisms of that idea for validity and strength?

If I ask you whether it's dangerous to swim after a heavy meal, you'll probably check snopes to verify your suspicion that it's an old wives' tale, and possibly check any sources cited at snopes. But will you really store the details of all those arguments in your memory as metadata next to "danger of swimming after a heavy meal," or just mark it as "almost certainly not dangerous"?

Replies from: curi, None
comment by curi · 2011-04-07T21:19:36.694Z · LW(p) · GW(p)

To save on storage, learn powerful explanations. Sometimes ideas can be integrated into better ideas that elegantly cover a lot of ground. Finding connections between fields is an important part of learning.

Learning easy to remember rules of thumb -- and improving them with criticism when they cause problems -- is also valuable for some applications.

comment by [deleted] · 2011-04-07T23:18:40.602Z · LW(p) · GW(p)

I don't think the shortcuts I take on easy questions are very demonstrative of anything.

I'm suspicious of your suspicion. Is it purely because of the terms I used, or do you really have no beliefs you hold more tenuously than others?

I know what it means to think an outcome is likely or not likely. I don't know what it means for a belief to be tenuous or not tenuous.

comment by David_Gerard · 2011-04-07T12:21:00.741Z · LW(p) · GW(p)

At this point, I have to conclude that you just plain don't understand Bayesian epistemology well enough to criticize it.

LessWrong is seriously lacking a proper explanation of Bayesian epistemology (as opposed to the theorem itself). Do you have one handy?

Replies from: timtyler, None
comment by timtyler · 2011-04-07T17:12:41.957Z · LW(p) · GW(p)

http://yudkowsky.net/rational/bayes has a section on Bayesian epistemology that compares it to Popper's ideas.

Bayesian epistemology boils down to: use probabilities to represent your confidence in your beliefs, use Bayes's theorem to update your confidences - and try to choose a sensible prior.
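As a minimal worked example of that recipe (the numbers are made up): start with a prior confidence, observe evidence, and update with Bayes's theorem.

```python
# One Bayesian update, with made-up numbers.
# H = "all swans in this lake are white"; E = "the sampled swan is white".
prior = 0.5              # initial confidence in H
p_e_given_h = 0.99       # P(E | H)
p_e_given_not_h = 0.60   # P(E | not H)

p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
posterior = prior * p_e_given_h / p_e  # Bayes's theorem

print(round(posterior, 3))  # ~0.623: confidence in H rises a little
```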

Replies from: curi
comment by curi · 2011-04-07T19:39:50.664Z · LW(p) · GW(p)

Of course I've read that.

It first of all is focussed on Bayes' theorem without a ton of epistemology.

It second of all does not discuss Popper's ideas but only nasty myths about them. See the original post here:

http://lesswrong.com/lw/54u/bayesian_epistemology_vs_popper/

comment by [deleted] · 2011-04-07T12:50:49.083Z · LW(p) · GW(p)

Yes, http://wiki.lesswrong.com/wiki/Sequences . Specifically the first four 'sequences' there. Many people on LW will say "Read the sequences!" as an answer to almost everything, like they're some kind of Holy Writ, which can be offputting, but in this case Yudkowsky really does answer most of the objections you've been raising and is the simplest explanation I know of.

ETA - That's weird. When I posted this reply I could have sworn that the username on the comment above wasn't David_Gerard but one I didn't recognise. I wouldn't point D_G to the sequences because I know he's read them, but would point a newbie to them. Apologies for the brainfart.

Replies from: None
comment by [deleted] · 2011-04-07T12:59:33.043Z · LW(p) · GW(p)

I've read the Sequences, and I don't know what you are referring to--the first four Sequences do explain Bayes-related ideas and how to apply them in everyday life, but they don't address all of the criticism that curi and others have pointed out. Did you have a specific post or series of posts in mind?

Replies from: None
comment by [deleted] · 2011-04-07T13:19:51.346Z · LW(p) · GW(p)

The criticisms here that haven't been just noise have mostly boiled down to the question of choosing one hypothesis from an infinite sample space. I'm not sure exactly where in the sequences EY covered this, but I know he did, and did in one of those four, because I reread them recently. Sorry I can't be more help.

Replies from: David_Gerard
comment by David_Gerard · 2011-04-07T13:27:18.400Z · LW(p) · GW(p)

What I'm meaning to point out is the absence of something with a title along the lines of "Bayesian epistemology" that explains Bayesian epistemology specifically.

What LW has is "here's the theorem" and "everything works by this theorem", without the second one being explained in coherent detail. I mean, Bayes structure is everywhere. But just noting that is not an explanation.

There's potential here for an enormously popular front-page post that gets linked all over the net forever, because the SEP article is so awful ...

comment by curi · 2011-04-07T21:14:31.575Z · LW(p) · GW(p)

At this point, I have to conclude that you just plain don't understand Bayesian epistemology well enough to criticize it.

What you should do is say specifically what I got wrong (just one thing is fine). Then you'll be making a substantive statement!

making its predictions much less concrete and thus making it hard to falsify.

What predictions? It is a philosophical theory.

To make progress in epistemology beyond Popper, you must switch from English to math.

Your conception of epistemology is different than ours. We seek things like explanations that help us to understand the world.

Replies from: jimrandomh
comment by jimrandomh · 2011-04-07T21:29:44.472Z · LW(p) · GW(p)

What you should do is say specifically what I got wrong (just one thing is fine). Then you'll be making a substantive statement!

Ok, here's one. You criticize Bayesian updating for invoking infinitely many hypotheses, as a fundamental problem. In fact, the problem of infinite sets is an issue, but it's resolved in Jaynes' book by a set of rules in which one never deals with infinities directly, but rather with convergent limiting expressions, which are mathematically well-behaved in ways that infinities aren't. This ensures, among other things, that any set of hypotheses (whether finite or infinite) has only finite total plausibility, and lets us compute plausibilities for whole sets at once (ideally, picking out one element and giving it a high probability, and assigning a low total probability to the infinitely many other hypotheses).
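A toy sketch of that flavor of bookkeeping (an illustrative geometric prior, not Jaynes's own construction): a countably infinite family of hypotheses can be given weights that sum to 1, and the plausibility of any infinite tail of the family can be computed in closed form.

```python
# Illustrative only: hypotheses H_1, H_2, ... with P(H_n) = 2**(-n),
# so the whole infinite family has total plausibility exactly 1.
def prior(n: int) -> float:
    return 0.5 ** n

def tail_mass(k: int) -> float:
    """P(H_{k+1} or H_{k+2} or ...) = 2**(-k), by the geometric series."""
    return 0.5 ** k

print(sum(prior(n) for n in range(1, 50)))  # ~1.0: partial sums converge
print(tail_mass(10))  # ~0.001: the infinitely many hypotheses beyond H_10
                      # jointly get less than 0.1% of the plausibility
```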

What predictions? It is a philosophical theory.

Both theories make predictions about the validity of models using evidence - that is, they predict whether future observations will agree with the model.

Your conception of epistemology is different than ours. We seek things like explanations that help us to understand the world.

No, our conceptions of epistemology are the same. Math does help us understand the world, in ways that natural language can't.

Replies from: curi
comment by curi · 2011-04-07T21:42:01.002Z · LW(p) · GW(p)

Ok, here's one. You criticize Bayesian updating for invoking infinitely many hypotheses, as a fundamental problem.

No, I didn't say that. I invoked them, because they matter. You then claim Jaynes deals with the problem. Yet Yudkowsky concedes it is a problem. I don't think you understood me, rather than vice versa.

Both theories make predictions about the validity of models using evidence

Popper never made a prediction like that. And this rather misses some points. Some models for using evidence (e.g. induction) are literally incapable of making predictions (therefore people who do make predictions must be doing something else). Here Popper was not making a prediction, and also was pointing out prediction isn't the right way to judge some theories.

No, our conceptions of epistemology are the same. Math does help us understand the world, in ways that natural language can't.

Can you write philosophical explanations in math? Of course math helps for some stuff, but not everything.

Replies from: jimrandomh
comment by jimrandomh · 2011-04-07T21:52:57.657Z · LW(p) · GW(p)

Ok, here's one. You criticize Bayesian updating for invoking infinitely many hypotheses, as a fundamental problem.

No, I didn't say that. I invoked them, because they matter. You then claim Jaynes deals with the problem. Yet Yudkowsky concedes it is a problem

Here's where you've really gone astray. You're trying to figure out math by reading what people are saying about it. That doesn't work. In order to understand math, you have to look at the math itself. I'm not sure what statement by Yudkowsky you're referring to, but I'll bet it was something subtly different.

Both theories make predictions about the validity of models using evidence

Popper never made a prediction like that.

Uh, wait a second. Did you really just say that Popper doesn't provide a method for using evidence to decide whether models are valid? There must be some sort of misunderstanding here.

Replies from: timtyler, curi
comment by timtyler · 2011-04-08T12:41:17.859Z · LW(p) · GW(p)

I'm not sure what statement by Yudkowsky you're referring to, but I'll bet it was something subtly different.

I am pretty sure it was this one - where Yudkowsky goes loopy.

comment by curi · 2011-04-07T22:17:18.556Z · LW(p) · GW(p)

The only way evidence is used is that criticisms may refer to it.

I'm not trying to figure out math, I'm trying to discuss the philosophical issues.

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-09T17:03:14.090Z · LW(p) · GW(p)

The only way evidence is used is that criticisms may refer to it.

Please reread what Jim wrote. You seem to be in agreement with his statement that evidence is used.

I'm not trying to figure out math, I'm trying to discuss the philosophical issues.

Unfortunately, they are interrelated. There's a general pattern here: some people (such as Jaynes and Yudkowsky) are using math as part of their philosophy. In the process of that they are making natural language summaries and interpretations of those claims. You are taking those natural language statements as if that was all they had to say, and then trying to apply your intuition to ill-defined natural language statements rather than reading them in the context of the formalisms and math they care about. You can't divorce the math from the philosophy.

comment by [deleted] · 2011-04-07T15:17:26.387Z · LW(p) · GW(p)

I want to emphasize this line from the long op, which I think is curi's best argument:

Hard real world scenarios usually have rival explanations of the proper interpretation of available evidence, they have fallible evidence that is in doubt, they have often many different arguments that are hard to assign any numbers to, and so on.

Therefore Bayesianism does not describe the way that we actually find out true things. I think this is a pretty compelling criticism of Bayes, does anyone have a stock answer?

Replies from: timtyler, komponisto
comment by timtyler · 2011-04-07T17:15:32.441Z · LW(p) · GW(p)

Therefore Bayesianism does not describe the way that we actually find out true things. I think this is a pretty compelling criticism of Bayes, does anyone have a stock answer?

It isn't a theory about human psychology in the first place.

Replies from: curi
comment by curi · 2011-04-07T21:28:30.522Z · LW(p) · GW(p)

Epistemologies are theories about how knowledge is created.

Humans create knowledge.

If you want to be an epistemology, address the problem of how they do it.

Replies from: timtyler
comment by timtyler · 2011-04-08T12:50:13.269Z · LW(p) · GW(p)

Humans do all kinds of things badly. They are becoming obsolete.

For a neater perspective that is more likely to stand the test of time, it is better to consider how machines create knowledge.

comment by komponisto · 2011-04-07T16:39:13.441Z · LW(p) · GW(p)

Firstly, "the way we actually do X is by Y" is never a valid criticism of a theory saying "the way we should do X is by Z". (Contemporary philosophers are extremely fond of this mistake, it must be said.) If we're not using Bayes, then maybe we're doing it wrong.

Secondly, the fact that we don't consciously think in terms of numbers doesn't mean that our brains aren't running Bayes-like algorithms on a low level not accessible to conscious introspection.

Replies from: JoshuaZ, None
comment by JoshuaZ · 2011-04-07T18:31:00.776Z · LW(p) · GW(p)

Secondly, the fact that we don't consciously think in terms of numbers doesn't mean that our brains aren't running Bayes-like algorithms on a low level not accessible to conscious introspection.

Failure to perform correctly on the Monty Hall problem is cross-cultural. I haven't seen the literature in any detail but my impression is that the conjunction fallacy is also nearly universal. Whatever humans are doing it isn't very close to Bayes.

Replies from: komponisto, curi
comment by komponisto · 2011-04-07T20:51:01.581Z · LW(p) · GW(p)

Emphasis on low-level. Thinking "hmm, the probability of this outcome is 1/3" is high-level, conscious cognition. The sense in which we're "Bayesians" is like the sense in which we're good at calculus: catching balls, not (necessarily) passing written tests.

The conjunction fallacy is closer to being a legitimate counterargument, but I would remind you that "Bayes-like" does not preclude the possibility of deviations from Bayes.

Perhaps some general perspective would be helpful. My point of view is that "inference = Bayes" is basically an analytic truth. That is, "Bayesian updating" is the mathematically precise notion that best corresponds to the vague, confused human idea of "inference". The latter turns out to mean Bayesian updating in the same sense that our intuitive idea of "connectedness" turns out to mean this. As such, we can make our discourse strictly more informative by replacing talk of "inference" with talk of "Bayesian updating" throughout. We can talk about Bayesian updating done correctly, and done incorrectly. For example, instead of saying "humans don't update according to Bayes", we should rather say, "humans are inconsistent in their probability assignments".

Replies from: curi
comment by curi · 2011-04-07T21:31:06.629Z · LW(p) · GW(p)

That is, "Bayesian updating" is the mathematically precise notion that best corresponds to the vague, confused human idea of "inference".

I agree with you that "inference" is a vague and confused notion.

I don't agree that finding some math that somewhat corresponds to a bad idea makes things better!

Popper's approach to it is to reject the idea and come up with better, non-confused ideas.

Replies from: komponisto
comment by komponisto · 2011-04-07T21:44:10.423Z · LW(p) · GW(p)

An idea becomes "non-confused" when it is turned into math. "Inference" may be a confused notion, but Bayesian updating isn't.

If Popper has better math than Bayes, so much the better. That's not the impression I get from your posts, however. The impression I get from your posts is that you meant to say "Hey! Check out this great heuristic that Karl Popper came up with for generating more accurate probabilities!" but instead it came out as "Bayes sucks! Go Popper!"

Replies from: curi
comment by curi · 2011-04-07T21:47:59.288Z · LW(p) · GW(p)

If the math is non-confused, and the idea is confused, then what's going on is not that the idea became non-confused, but that the math doesn't correspond to reality.

If Popper has better math than Bayes, so much the better.

He doesn't have a lot of math.

No matter how much math you have, you always face problems of considering issues like whether some mathematical objects correspond to some real life things, or not. And you can't settle those issues with math.

"Bayes sucks! Go Popper!"

You guys are struggling with problems, such as justificationism, which Popper solved. Also with instrumentalism, lack of appreciation for explanatory knowledge, foundationalism, etc

Replies from: komponisto
comment by komponisto · 2011-04-07T22:03:15.833Z · LW(p) · GW(p)

If the math is non-confused, and the idea is confused, then what's going on is not that the idea became non-confused, but that the math doesn't correspond to reality.

What? Only confused ideas correspond to reality? That makes no sense.

No matter how much math you have, you always face problems of considering issues like whether some mathematical objects correspond to some real life things, or not. And you can't settle those issues with math.

You settle those issues by experiment.

You guys are struggling with problems, such as justificationism, which Popper solved. Also with instrumentalism, lack of appreciation for explanatory knowledge, foundationalism, etc

I'm not sure I see the problem, frankly. As far as I can tell this would be like me telling you that you're "struggling with the problem of Popperianism".

Replies from: curi
comment by curi · 2011-04-07T22:15:41.123Z · LW(p) · GW(p)

If you take a confused idea, X, and some non-confused math, Y, then they do not correspond precisely.

No matter how much math you have, you always face problems of considering issues like whether some mathematical objects correspond to some real life things, or not. And you can't settle those issues with math.

You settle those issues by experiment.

Can't be done. When you try to set up an experiment you always have to have philosophical theories. For example, if you want to measure something, you need a theory about the nature of your measuring device, e.g. you'll want to come up with some mathematical properties and know whether they correspond to the real physical object. So you run into the same problem again.

I'm not sure I see the problem, frankly.

How are theories justified?

How are theories induced? If you say using the Solomonoff prior, then are the theories it offers always best? If not, that's a problem, right? If yes, what's the argument for that?

comment by curi · 2011-04-07T21:29:51.738Z · LW(p) · GW(p)

I actually don't agree with this. Those problems are caused by memes, not hardware.

The cross-cultural pattern is caused by the logic of the situations different early cultures were in, and the mistakes that were easy to make, being similar.

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-08T01:45:20.771Z · LW(p) · GW(p)

I actually don't agree with this. Those problems are caused by memes, not hardware

How would you test this claim? (Note by the way that in the case of Monty Hall, the percentages don't change much from culture to culture. It is consistently between 75% and 90% refusing to switch in all tested cultures. This is actually one of the things that convinced me that this isn't memetic.)

comment by [deleted] · 2011-04-08T01:45:53.393Z · LW(p) · GW(p)

Firstly, "the way we actually do X is by Y" is never a valid criticism of a theory saying "the way we should do X is by Z". (Contemporary philosophers are extremely fond of this mistake, it must be said.) If we're not using Bayes, then maybe we're doing it wrong.

Let me go further. The way people with a good track record of finding out true things (for instance, komponisto) actually go about finding out true things is by collecting explanations and criticisms of those explanations, not by computing priors and posteriors.

What would it mean to be doing it wrong? I can only think of: believing a lot of false things. So tell me some false things that I could come to believe by Popperian methods, that I wouldn't come to believe by Bayesian methods, or even better show me that the converse happens much more rarely.

Secondly, the fact that we don't consciously think in terms of numbers doesn't mean that our brains aren't running Bayes-like algorithms on a low level not accessible to conscious introspection.

Sure. For instance there's good evidence that our brains judge what color something is by a Bayesian process. But why should I take advice about epistemology from such an algorithm?

comment by benelliott · 2011-04-07T11:09:10.476Z · LW(p) · GW(p)

Damn, I had a reply but it took so long to type that I lost internet connection.

Basically, with your point about supporting infinitely many theories, I refer you back to my comment that started this whole discussion.

As for the 'one strike and you're out' approach to criticism, I have three big problems with it:

The first is that if 'I don't understand' counts as a criticism, and you have claimed it does, then we need to reject every scientific theory we currently have since someone, somewhere, doesn't understand it.

Second, you accused Jaynes and Yudkowsky of being unscholarly when they said that Popper believed falsification is possible. Popper clearly believes refutation is possible, what is the difference between this and falsification?

Third, it leaves no room for weak criticisms.

Imagine I have a coin, which I think might be double-headed, although I'm not sure, it could also just be an ordinary coin. For some reason I am not allowed to examine both sides, all I can do is flip it and examine whichever side comes on top. If I was a Popperian how would I reason about this?

If tails comes on top then I have criticism of the 'double-headed' theory strong enough to refute it.

If heads comes on top then I do not have a strong enough criticism to refute either theory. This apparently means I do not have any criticism at all.

After 1000 heads I still don't have any criticism. At this point I get tired of flipping so I decide to make my decision. I still have two theories, so I criticise both of them for not refuting the other and reject them both. I sit down and try to come up with a better theory.

To me, this seems nuts. After 1000 heads, the 'ordinary coin' theory should have been thoroughly refuted. I would happily bet my life for a penny on it. If you wish to claim that it has been, then you must tell me where exactly, between 1 and 1000, you draw the line between refuted and not refuted.
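
For concreteness, here is a minimal sketch of the Bayesian calculation being appealed to, assuming an illustrative 50/50 prior between "double-headed" and "ordinary"; the prior is an assumption added for the example, not something specified in the scenario.

```python
# A sketch of the Bayes calculation for the coin example. Assumes an
# illustrative 50/50 prior over "double-headed" vs "ordinary fair coin".
def posterior_double_headed(n_heads, prior_double=0.5):
    like_double = 1.0            # a double-headed coin always lands heads
    like_fair = 0.5 ** n_heads   # a fair coin gives n heads with prob (1/2)^n
    numerator = like_double * prior_double
    return numerator / (numerator + like_fair * (1 - prior_double))

for n in (1, 5, 20, 1000):
    print(n, posterior_double_headed(n))
# After 1000 heads the posterior is 1 minus roughly 2^-1000 (floating point
# rounds this to 1.0). The 'ordinary coin' theory is never flatly refuted,
# just left with a vanishingly small share of the probability.
```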

Replies from: curi, GuySrinivasan
comment by curi · 2011-04-07T21:27:32.133Z · LW(p) · GW(p)

The first is that if 'I don't understand' counts as a criticism, and you have claimed it does, then we need to reject every scientific theory we currently have since someone, somewhere, doesn't understand it.

Sort of. It's not perfect. As far as scientific progress goes, it should be improved on. Indefinitely.

In the meantime we have to make decisions. For making decisions, we never directly use canonical scientific theories. Instead, you make a conjecture like, "QM isn't perfect. But I can use it for building this spaceship, and my spaceship will fly". This conjecture is itself open to criticism. It could be a bad idea, depending on the details of the scenario. But it is not open to criticism by some guy in Africa not understanding QM, which doesn't matter.

Second, you accused Jaynes and Yudkowsky of being unscholarly when they said that Popper believed falsification is possible.

That's not what they said. Check out the actual quotes, e.g. Yudkowsky said "Karl Popper's idea that theories can be definitely falsified". That is not Popper's idea. Ideas cannot be "definitely falsified" but only fallibly/conjecturally/tentatively falsified.

Third, it leaves no room for weak criticisms.

That's a merit!

Well, it does leave room for them as ideas which you might remember and try to improve on later.

If you wish to claim that it has been, then you must tell me where exactly, between 1 and 1000, you draw the line between refuted and not refuted.

I would conjecture an explanation of when to stop and why. It would depend on what my goal was with the coin flipping. If that explanation wasn't refuted by criticism, I would use it.

For example, I might be flipping coins to choose which to use for the coin flipping Olympics. My goal might be to keep my job. So I might stop after 5 flips all the same and just move on to the next coin. That would work fine for my purposes.

Replies from: benelliott
comment by benelliott · 2011-04-07T22:16:04.177Z · LW(p) · GW(p)

In the meantime we have to make decisions. For making decisions, we never directly use canonical scientific theories. Instead, you make a conjecture like, "QM isn't perfect. But I can use it for building this spaceship, and my spaceship will fly". This conjecture is itself open to criticism. It could be a bad idea, depending on the details of the scenario. But it is not open to criticism by some guy in Africa not understanding QM, which doesn't matter.

This kind of feels like cheating. You use all its predictions but never give it credit for making them.

Besides, are you really suggesting that 'someone doesn't understand this' is a legitimate criticism? If this was correct it would mean that scientific truth is partly dependent on human minds, and that the laws of physics themselves change with our understanding of them.

That's not what they said. Check out the actual quotes, e.g. Yudkowsky said "Karl Popper's idea that theories can be definitely falsified". That is not Popper's idea. Ideas cannot be "definitely falsified" but only fallibly/conjecturally/tentatively falsified.

Yudkowsky doesn't use 'definitely' to mean 'with certainty'.

That's a merit!

No it's not!

Weak criticisms are important, because they add up to strong ones and sometimes they are all we have to decide by.

I would conjecture an explanation of when to stop and why. It would depend on what my goal was with the coin flipping. If that explanation wasn't refuted by criticism, I would use it.

For example, I might be flipping coins to choose which to use for the coin flipping Olympics. My goal might be to keep my job. So I might stop after 5 flips all the same and just move on to the next coin. That would work fine for my purposes.

You're moving the goal posts.

I wasn't asking what you would do in pragmatic terms, I was asking at which point you would consider the theory refuted. You have claimed your thinking process is based on examining criticisms and refuting theories when the criticisms are valid, so when is a criticism valid?

Replies from: curi
comment by curi · 2011-04-07T22:23:19.700Z · LW(p) · GW(p)

This, kind of feels like cheating. You use all its predictions but never give it credit for making them.

I don't know what you mean by "give credit". I'm happy to hand out all sorts of credit.

If I don't have a criticism of using all QM's predictions, then I'll use them. That someone doesn't understand QM isn't a criticism of this. That's only a criticism of explanations of QM.

If this was correct it would mean that scientific truth is partly dependent on human minds,

It would mean that what ideas are valuable is partly dependent on what people exist to care about them.

Yudkowsky doesn't use 'definitely' to mean 'with certainty'.

What does it mean?

He shouldn't write stuff that, using the dictionary definitions, is a myth about Popper, and not clarify. Even if you're right he isn't excused.

Weak criticisms are important, because they add up to strong ones and sometimes they are all we have to decide by.

I think they can't and don't. I think this is a bit like saying 3 wrong answers add up to a right answer.

If an argument is false, why should it count for anything? Why would you ever want a large number of false arguments (false as best you can judge them) to trump one true argument (true as best you judge it)?

I wasn't asking what you would do in pragmatic terms, I was asking at which point would you consider the theory refuted.

I would tentatively, fallibly consider the theory "it is a fair coin" refuted after, say, 20 flips. Why 20? I conjectured 20 and don't have a criticism of it. For coin flipping in particular, if I had any rigorous needs, I would use some math in accordance with them.
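
As a sketch of the kind of math one might look up here (stated conditionally, not as a certainty): under the hypothesis that the coin is fair,

```latex
P(\text{20 heads in a row} \mid \text{fair coin}) = \left(\tfrac{1}{2}\right)^{20} \approx 9.5 \times 10^{-7},
```

so tentatively treating "fair coin" as refuted after 20 flips amounts to dismissing an outcome of roughly one-in-a-million probability.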

Replies from: benelliott
comment by benelliott · 2011-04-07T22:40:41.734Z · LW(p) · GW(p)

It would mean that what ideas are valuable is partly dependent on what people exist to care about them.

Be careful with arguing that an idea's value is a different thing to its truth; you're starting to sound like an apologist.

If I don't have a criticism of using all QM's predictions, then I'll use them. That someone doesn't understand QM isn't a criticism of this. That's only a criticism of explanations of QM.

If those explanations are helpful to some people they shouldn't be rejected simply because they are not helpful to others. After all, without them we would never have the predictions.

He shouldn't write stuff that, using the dictionary definitions, is a myth about Popper, and not clarify. Even if you're right he isn't excused.

I don't know about the dictionary definitions, but in everyday conversation 'definitely' doesn't mean 'with certainty'. As Wittgenstein pointed out, these words are frequently used in contexts where the speaker might be wrong for dozens of reasons, and knows it. For instance, "I definitely left my keys by the microwave" is frequently false, and is generally only said by people who are feeling uncertain about it.

I would tentatively, fallibly consider the theory "it is a fair coin" refuted after, say, 20 flips. Why 20? I conjectured 20 and don't have a criticism of it. For coin flipping in particular, if I had any rigorous needs, I would use some math in accordance with them.

I conjecture 21, you don't have any criticism of that either. I now have a criticism of 20, which is that it fails to explain why my conjecture is wrong.

I think they can't and don't. I think this is a bit like saying 3 wrong answers add up to a right answer.

If an argument is false, why should it count for anything? Why would you ever want a large number of false arguments (false as best you can judge them) to trump one true argument (true as best you judge it)?

A weak criticism is not the same as an invalid criticism. It just means a criticism that slightly erodes a position, without single-handedly bringing the whole thing crashing down.

The coin-flip thing was intended as an example.

Replies from: curi
comment by curi · 2011-04-08T00:19:23.195Z · LW(p) · GW(p)

If those explanations are helpful to some people they shouldn't be rejected simply because they are not helpful to others

You are taking rejection as a bigger deal than it is. The theory "X is the perfect explanation", for an X that confuses some people, is false.

So we reject it.

We can accept other theories, e.g. that X is flawed but, for some particular purpose, is appropriate to use.

I don't know about the dictionary definitions,

It means "without doubt". Saying things like "I have no doubt that X" when there is doubt is just dumb.

I conjecture 21, you don't have any criticism of that either. I now have a criticism of 20, which is that it fails to explain why my conjecture is wrong.

The problem situation is underspecified. When you ask ambiguous questions like "what should I do in [underspecified situation]?" then you get multiple possible answers and it's hard to do much in the way of criticizing.

In real world situations (which have rich context), it's not so hard to decide. But when I gave an example like that you objected.

I can't criticize 20 vs 21 unless I have some goal in mind, some problem we are trying to solve. (If there is no problem to be solved, I won't flip at all.) If the problem is figuring out if the coin is fair, with certainty, that is not solvable, so I won't flip at all. If it is figuring it out with a particular probability, given a few reasonable background assumptions, then I will look up the right math to use. If it's something else, what?

A weak criticism is not the same as an invalid criticism. It just means a criticism that slightly erodes a position, without single-handedly bringing the whole thing crashing down.

This is an important issue. I think your statement here is imprecise.

A criticism might demolish one single idea which is part of a bigger idea.

If it demolishes zero individual ideas, then where is the erosion?

If it demolishes one little idea, then that idea is refuted. And the big idea needs to replace it with something else which is not refuted, or find a way to do without.

Replies from: benelliott
comment by benelliott · 2011-04-08T07:32:33.518Z · LW(p) · GW(p)

You are taking rejection as a bigger deal than it is. The theory "X is the perfect explanation", for an X that confuses some people, is false.

Maybe QM is exactly right, and maybe it is just too complicated for some people to understand. There is no need to be so harsh in your criticism process; why not just admit that a theory can be right without being perfect in every other respect?

It means "without doubt". Saying things like "I have no doubt that X" when there is doubt is just dumb.

Yet everyone does it. Language is a convention, not a science. If you are using a word differently from everyone else then you are wrong; the dictionary has no authority on the matter.

The problem situation is underspecified. When you ask ambiguous questions like "what should I do in [underspecified situation]?" then you get multiple possible answers and it's hard to do much in the way of criticizing.

This is a flaw. Bayes can handle any level of information.

I can't criticize 20 vs 21 unless I have some goal in mind, some problem we are trying to solve. (If there is no problem to be solved, I won't flip at all.) If the problem is figuring out if the coin is fair, with certainty, that is not solvable, so I won't flip at all. If it is figuring it out with a particular probability, given a few reasonable background assumptions, then I will look up the right math to use. If it's something else, what?

Can you really not see why the above is moving the goal posts? Earlier, you said that you think by coming up with conjectures and criticising them, and only then make decisions. Now you are putting the decision-making process in the driving seat and saying that everything is based on that. So is Popperianism purely pragmatic? Is the whole conjecture and criticism thing not really the important part, and in fact it's all based on decision strategies? Or do you use the conjecture-criticism thing to try and reach the correct answer, as you have previously stated, and then use that for decision making?

If it demolishes zero individual ideas, then where is the erosion?

It makes the idea less likely, less plausible, by a small amount. The coin flip is intended to illustrate it. Saying that you will use Bayes in the coin flip example and nowhere else is like saying you believe Newton's laws work 'inside the laboratory' but you're going to keep using Aristotle outside.

comment by SarahSrinivasan (GuySrinivasan) · 2011-04-07T17:16:59.181Z · LW(p) · GW(p)

Following curi's steps, we'd lower our standards. How do you feel about the theory "I don't want to spend more time on this, and getting 1000 heads if it's double-headed is 2^1000 times more likely than getting 1000 heads if it's ordinary, so I'll make the same decisions I'd make if I knew it were double-headed unless I get a rough estimate of at least a factor of 2^990 difference in how much I care about the outcome of one of those decisions"?

Replies from: benelliott
comment by benelliott · 2011-04-07T17:24:43.117Z · LW(p) · GW(p)

What you appear to be suggesting amounts to Bayesian epistemology done wrong.

Replies from: curi
comment by curi · 2011-04-07T21:20:44.858Z · LW(p) · GW(p)

For coin flipping analysis, use Bayes' theorem (not Bayesian epistemology).

Replies from: benelliott, timtyler
comment by benelliott · 2011-04-07T22:08:27.089Z · LW(p) · GW(p)

If Bayes generates the right answer here, whereas naive Popperian reasoning without it goes spectacularly wrong, maybe this should be suggesting something. Also, it ignores my main point that Popper's theory does not admit weak criticisms, of which the coin coming up heads is just one example.

comment by timtyler · 2011-04-08T12:53:42.947Z · LW(p) · GW(p)

Whether you have a double-headed coin or not is still a form of knowledge.

The "Bayes' theorem: good, Bayesian epistemology: bad" perspective won't wash.

comment by Peterdjones · 2011-04-14T16:34:04.291Z · LW(p) · GW(p)

Neither the potential infinity of theories, nor the possibility of error favour Popper over Bayes.

"The reason it doesn't is there's always infinitely many things supported by any evidence, in this sense. Infinitely many things which make wildly different predictions about the future, but identical predictions about whatever our evidence covers. If Y is 10 white swans, and X is "all swans are white" then X is supported, by your statement. But also supported are infinitely many different theories claiming that all swans are black, and that you hallucinated. You saw exactly what you would see if any of those theories were true, so they get as much support as anything else. There is nothing (in the concept of support) to differentiate between "all swans are white" and those other theories."

And that doesn't matter unless they (a) all have equal priors, and (b) will continue to be supported equally by any future evidence. (a) is never the case, but it doesn't help that much since, without a solution to (b), the relative rankings of various theories will never change from their priors, making evidence irrelevant. (b) is also never the case. The claims that 100% of swans are white, 90% are white, 80%, and so on will not all be equally supported by a long series of observations of white swans. A Popperian could argue that the theories that are becoming relatively less supported are becoming partially refuted. But relative refutation of T is relative support for not-T, just because of the meaning of relative. The Popperian can only rescue the situation by showing that there is absolute refutation, but no absolute support.
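
A minimal sketch of that last point, with three illustrative hypotheses and equal priors; all the numbers are assumptions made up for the example.

```python
# Hypotheses of the form "a fraction p of swans are white" are not all equally
# supported by a run of white-swan observations, even when they start with
# equal priors.
def normalize(dist):
    total = sum(dist.values())
    return {h: v / total for h, v in dist.items()}

posterior = {1.0: 1/3, 0.9: 1/3, 0.8: 1/3}   # p -> prior (illustrative)

for _ in range(10):                          # observe 10 white swans
    # each white swan multiplies hypothesis p by its likelihood p
    posterior = normalize({p: prob * p for p, prob in posterior.items()})

print(posterior)
# p = 1.0 ends up most probable; p = 0.8 falls furthest behind, i.e. the
# hypotheses are differentially supported by the same observations.
```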

"If you do add something else to differentiate, I will say the support concept is useless. The new thing does all the work. And further, the support concept is frequently abused. I have had people tell me that "all swans are black, but tomorrow you will hallucinated 10 white swans" is supported less by seeing 10 white swans tomorrow than "all swans are white" is, even though they made identical predictions (and asserted them with 100% probability, and would both have been definitely refuted by anything else)."

They have different priors. The hallucination theory is a skeptical hypothesis, and it is well known that skeptical hypotheses can't be refuted empirically. But we can still give them low priors. Or regard them as bad explanations -- for instance, reject them because they are unfalsifiable or too easy to vary.

comment by [deleted] · 2011-04-07T09:32:51.967Z · LW(p) · GW(p)

"I see that as entrenching bias and subjectivism in reagards to morality -- we can make objective criticisms of moral values."

You keep asserting that. You keep failing to provide a shred of evidence.

"BTW I think it's ironic that I score better on support when I just stick 100% in front of every prediction in all theories I mention, while you score lower by putting in other numbers, and so your support concept discourages ever making predictions with under 100% confidence"

That's true right up until you see the first black swan. All else being equal, simpler explanations are always to be favoured over more complex ones. Look up Kolmogorov complexity and minimum message length.

Replies from: curi
comment by curi · 2011-04-07T09:46:31.697Z · LW(p) · GW(p)

You keep asserting that. You keep failing to provide a shred of evidence.

I posted arguments. What did you not like about them? Post a criticism of something I said. "No evidence" is just a request for justification which I regard as impossible, but I did give arguments.

That's true right up until you see the first black swan.

At which point infinitely many of my 100% theories will be refuted. And infinitely many will remain. You can never win at that game using finite evidence. For any finite set of evidence, infinitely many 100% type theories predict all of it perfectly.

Replies from: None, None, prase
comment by [deleted] · 2011-04-07T10:28:03.039Z · LW(p) · GW(p)

What arguments? I had a look through what you've written, and found this: "There are objective facts about how to live, call them what you will. Or, maybe you'll say there aren't. If there aren't, then it's not objectively wrong to be a mass murderer. Do you really want to go there into full blown relativism and subjectivism?"

This is hardly an argument for the truth content of a statement. Just because the consequences of a theory of moral behaviour make us feel bad doesn't mean they are not true -- we should be interested in whether the statements conflict with how the universe seems to work. The notion of morality independent of sentient beings has always seemed fundamentally absurd to me, and I have yet to find a decent argument in its favour.

The worry of slipping into moral relativism is that we are trapped in a position where we can't punish mass murderers. But there are lots of sensible reasons for mass murderers to punish mass murder, and not indulge in it themselves. One would have to get exceptional utility out of murdering to counteract all the downsides of performing such an action.

The problem here is, as often happens with moral discussion, that wrong is not well defined. You say wrong to mean a grand moral force of the universe, but it could mean "is this a sensible action for this being to take, given their goals and desires". It might turn out that, given all that, said person IS benefited most by mass murder.

Replies from: curi
comment by curi · 2011-04-07T10:43:05.358Z · LW(p) · GW(p)

As I recall I gave a citation where to find Popper discussing morality (it is The World of Parmenides).

And I explained that moral knowledge is created using the same method as any other kind of knowledge. And I said that that method is (conjectures and refutations).

Questioning if people want to advocate strong relativism or subjectivism is an argument, too. If you aren't aware of the already existing arguments against relativism or subjectivism, then it's incomplete for you. You could always ask.

You haven't understood my view. I didn't say it's a moral force. The issue of "what is the right action, given my goals and desires?" is 100% objective, and it is a moral issue. I don't know why you expected me to disagree about that. There is a fact of the matter about it. That is one of the major parts of morality. But there is also a second part: the issue of what are good goals and desires to have?

How can that be objective, you wonder? Well, for example, some sets of goals contradict each other. That allows for a type of objective moral argument, about what goals/values/preferences/utility-functions to have, against contradictory goals.

There are others. To start with, read: http://www.curi.us/1169-morality

Replies from: benelliott, None
comment by benelliott · 2011-04-07T12:30:08.246Z · LW(p) · GW(p)

"what is the right action, given my goals and desires?" is 100% objective

Bayes, combined with von Neumann-Morgenstern utility theory, answers this, at least in principle.

You keep acting as if it is a flaw that Bayes only predicts. Is it a flaw that Newton's laws of motion do not explain the price of gold? Narrowness is a virtue; attempting to spread your theory as wide as possible ends up forcing it into places where it doesn't belong.

Replies from: curi
comment by curi · 2011-04-07T18:19:56.107Z · LW(p) · GW(p)

If Bayes wants to be an epistemology then it must do more than predict. Same for Newton.

If you want to have math which doesn't dethrone Popper, but is orthogonal, you're welcome to do that and I'd stop complaining (much). However, Yudkowsky says Bayesian Epistemology dethrones and replaces Popper. He regards it as a rival theory to Popper's. Do you think Yudkowsky was wrong about that?

Replies from: timtyler, benelliott
comment by timtyler · 2011-04-07T19:40:07.283Z · LW(p) · GW(p)

Yudkowsky says Bayesian Epistemology dethrones and replaces Popper. He regards it as a rival theory to Popper's. Do you think Yudkowsky was wrong about that?

It replaces Popperian epistemology where their domains overlap - namely: building models from observations and using them to predict the future. It won't alone tell you what experiments to perform in order to gather more data - there are other puzzle pieces for dealing with that.

Replies from: curi
comment by curi · 2011-04-07T20:23:29.557Z · LW(p) · GW(p)

There's no overlap there b/c Popperian epistemology doesn't provide the specific details of how to do that. Popperian epistemology is fully compatible with, and can use, Bayes' theorem and any other pure math or logic insights.

Popperian epistemology contradicts your "other puzzle pieces". And without them, Bayes' theorem alone isn't epistemology.

Replies from: timtyler
comment by timtyler · 2011-04-07T20:54:30.284Z · LW(p) · GW(p)

It replaces Popperian epistemology where their domains overlap - namely: building models from observations and using them to predict the future.

There's no overlap there b/c Popperian epistemology doesn't provide the specific details of how to do that.

Except for the advice on induction? Or has induction merely been rechristened as corroboration? Popper enthusiasts usually seem to deny doing that.

Replies from: curi
comment by curi · 2011-04-07T20:55:50.885Z · LW(p) · GW(p)

Induction doesn't work.

building models from observations and using them to predict the future

I thought you were referring to things you can do with Bayes' theorem and some input. If you meant something more, provide the details of what you are proposing.

Replies from: timtyler
comment by timtyler · 2011-04-08T13:04:16.404Z · LW(p) · GW(p)

Building models from observations and using them to predict the future is what Solomonoff induction does. It is Occam's razor plus Bayes's theorem.
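
A toy sketch of that combination, not the real, uncomputable construction: a simplicity-weighted prior over a small made-up hypothesis set, followed by a Bayes update on some made-up data.

```python
# Toy "Occam plus Bayes": prior weight 2^-len(description), then a Bayes
# update on observed data. Real Solomonoff induction ranges over all programs
# and is uncomputable; this only illustrates the shape of the idea.
data = "HHHHHHHH"                 # eight observed heads (hypothetical data)

hypotheses = {                    # description -> probability of heads
    "always heads": 1.0,
    "fair coin": 0.5,
    "heads with probability 0.9": 0.9,
}

def prior(description):
    return 2.0 ** -len(description)       # crude simplicity weighting

def likelihood(p_head, observations):
    prob = 1.0
    for o in observations:
        prob *= p_head if o == "H" else 1 - p_head
    return prob

weights = {h: prior(h) * likelihood(p, data) for h, p in hypotheses.items()}
total = sum(weights.values())
print({h: w / total for h, w in weights.items()})
```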

comment by benelliott · 2011-04-07T19:28:42.950Z · LW(p) · GW(p)

The most common point of Popper's philosophy that I hear (including from my Popperian philosophy teacher) is the whole "black swan white swan" thing, which Bayes does directly contradict, and dethrone (though personally I'm not a big fan of that terminology).

The stuff you talked about with conjectures and criticisms does not directly contradict Bayes, and if the serious problems with 'one strike and you're out' criticisms are fixed I may be persuaded to accept both it and Bayes.

Bayes is not meant to be an epistemology all on its own. It only starts becoming one when you put it together with Solomonoff Induction, Expected Utility Theory, Cognitive Science and probably a few other pieces of the puzzle that haven't been found yet. I presume the reason it is referred to as Bayesian rather than Solomonoffian or anything else is that Bayes is both the most frequently used and the oldest part.

Replies from: curi
comment by curi · 2011-04-07T19:33:14.889Z · LW(p) · GW(p)

The black swan thing is not that important to Popper's ideas, it is merely a criticism of some of Popper's opponents.

How does Bayes dethrone it? By asserting that white swans support "all swans are white"? I've addressed that at length (still going through overnight replies; if someone answered my points I'll try to find it).

Solomonoff Induction, Expected Utility Theory, Cognitive Science

Well I don't have a problem with Bayes' theorem itself, of course (pretty much no one does, right? I hope not, lol). It's these surrounding ideas that make it an epistemology that I think are mistaken, all of which Popper's epistemology contradicts. (I mean the take on cognitive science popular here, not the idea of doing cognitive science.)

Replies from: benelliott
comment by benelliott · 2011-04-07T19:58:17.159Z · LW(p) · GW(p)

(still going through overnight replies; if someone answered my points I'll try to find it)

I think I answered your points a few days ago with my first comment of this discussion.

In short, yes, there are infinitely many hypotheses whose probabilities are raised by the white swan, and yes those include both "all swans are white" and "all swans are black and I am hallucinating", but the former has a higher prior, at least for me, so it remains more probable by several orders of magnitude. For evidence to support X it doesn't have to only support X. All that is required is that X does better at predicting than the weighted average of all alternatives.
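
In symbols, a sketch of that condition, where E is the evidence and the H_i are the mutually exclusive alternatives to X:

```latex
P(X \mid E) > P(X)
\;\Longleftrightarrow\;
P(E \mid X) > P(E \mid \neg X) = \sum_i P(E \mid H_i)\, P(H_i \mid \neg X),
\qquad 0 < P(X) < 1 .
```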

I have had people tell me that "all swans are black, but tomorrow you will hallucinate 10 white swans" is supported less by seeing 10 white swans tomorrow than "all swans are white" is, even though they made identical predictions (and asserted them with 100% probability, and would both have been definitely refuted by anything else).

Just to be clear I am happy to say those people were completely wrong. It would be nice if nobody ever invented a poor argument to defend a good conclusion but sadly we do not live in that world.

Replies from: curi
comment by curi · 2011-04-07T20:21:58.091Z · LW(p) · GW(p)

I think I answered your points a few days ago with my first comment of this discussion.

But then I answered your answer, right? If I missed one that isn't pretty new, let me know.

but the former has a higher prior

So support is vacuous and priors do all the real work, right?

And priors have their own problems (why that prior?).

Just to be clear I am happy to say those people were completely wrong. It would be nice if nobody ever invented a poor argument to defend a good conclusion but sadly we do not live in that world.

OK. I think your conception of support is unsubstantive but not technically wrong.

Replies from: benelliott
comment by benelliott · 2011-04-07T21:01:54.915Z · LW(p) · GW(p)

So support is vacuous and priors do all the real work, right?

No. Bayesian updating is doing the job of distinguishing "all swans are white" from "all swans are black" and "all swans are green" and "swans come in an equal mixture of different colours". It is only a minority of hypotheses which are specifically crafted to give the same predictions as "all swans are white" where posterior probabilities remain equal to priors.

What is it with you! I admit that priors are useful in one situation and you conclude that everything else is useless!

Also, the problem of priors is overstated. Given any prior that assigns the correct hypothesis nonzero probability, the probability of eventually converging to the correct hypothesis, or at any rate a hypothesis which gives exactly the same predictions as the correct one, is 1.

Bayes cannot distinguish between two theories that assign exactly the same probabilities to everything, but I don't see how you could distinguish them, without just making sh*t up, and it doesn't matter much anyway since all my decisions will be correct whichever is true.

Replies from: curi
comment by curi · 2011-04-07T21:07:45.270Z · LW(p) · GW(p)

Bayesian updating is doing the job of distinguishing "all swans are white" from "all swans are black"

But that is pretty simple logic. Bayes' not needed.

@priors -- are you saying you use self-modifying priors?

Bayes cannot distinguish between two theories that assign exactly the same probabilities to everything

That makes it highly incomplete, in my view. e.g. it makes it unable to address philosophy at all.

but I don't see how you could distinguish them

By considering their explanations. The predictions of a theory are not its entire content.

without just making sh*t up

That's one of the major problems Popper addressed (reconciling fallibilism and non-justification with objective knowledge and truth).

and it doesn't matter much anyway since all my decisions will be correct whichever is true.

It does matter, given that you aren't perfect. How badly things start breaking when mistakes are made depends on issues other than what theories predict -- it depends on their explanations, internal structure, etc...

Replies from: benelliott
comment by benelliott · 2011-04-07T22:07:49.596Z · LW(p) · GW(p)

It does matter, given that you aren't perfect. How badly things start breaking when mistakes are made depends on issues other than what theories predict -- it depends on their explanations, internal structure, etc...

No, I'm pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.

By considering their explanations. The predictions of a theory are not its entire content.

One could say that this is how to work out priors. You are aware that the priors aren't necessarily set in stone at the beginning of time? Jaynes pointed out that a prior should always include all the information you have that is not explicitly part of the data (and even the distinction between prior and data is just a convention), and may well be based on insights or evidence encountered at any time, even after the data was collected.

Solomonoff Induction is precisely designed to consider explanations. The difference is it does so in a rigorous mathematical fashion rather than with a wishy-washy word salad.

That makes it highly incomplete, in my view. e.g. it makes it unable to address philosophy at all.

It was designed to address science, which is a more important job anyway.

However, in my experience, the majority of philosophical questions are empirically addressable, at least in principle, and the majority of the rest are wrong questions.

Replies from: curi
comment by curi · 2011-04-07T22:11:36.592Z · LW(p) · GW(p)

No, I'm pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true.

No!

OK would you agree that this is an important point of disagreement, and an interesting discussion topic, to focus on? Do you want to know why not?

Do you have any particular argument that I can't be right about this? Or are you just making a wild guess? Are you open to being wrong about this? Would you be impressed if there was a theory which explained this issue? Intrigued to learn more about the philosophy from which I learned this concept?

Replies from: benelliott
comment by benelliott · 2011-04-07T22:27:58.875Z · LW(p) · GW(p)

I cannot see how I could be wrong.

Let's look at a Bayesian decision process. First you consider all possible actions that you could take; this is unaffected by the difference between A and B. For each of them you use your probabilities to get a distribution across all possible outcomes; these will be identical.

You assign a numerical utility to each outcome based on how much you would value that outcome. If you want, I can give a method for generating these numerical utilities. These will be a mixture of terminal and instrumental values. Terminal values are independent of beliefs, so these are identical. Instrumental values depend on beliefs, but only via predictions about what an outcome will lead to in the long run, so these are identical.

For each action you take an average of the values of all outcomes weighted by probability, and pick the action with the highest result. This will be the same with theory A or theory B. So I do the same thing either way, and the same thing happens to me either way. Why do I care which is true?
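
A minimal sketch of that decision process; the actions, outcomes, probabilities and utilities are all invented for illustration. Two theories with identical predictions yield identical expected utilities, hence the same choice.

```python
# Expected-utility choice under two theories that assign the same
# probabilities to outcomes. All numbers are illustrative assumptions.
actions = ["take umbrella", "leave umbrella"]
outcomes = ["rain", "no rain"]
utility = {("take umbrella", "rain"): 5, ("take umbrella", "no rain"): 8,
           ("leave umbrella", "rain"): 0, ("leave umbrella", "no rain"): 10}

theory_a = {"rain": 0.3, "no rain": 0.7}        # different internal stories,
theory_b = {"rain": 3 / 10, "no rain": 7 / 10}  # identical predictions

def best_action(theory):
    def expected_utility(action):
        return sum(theory[o] * utility[(action, o)] for o in outcomes)
    return max(actions, key=expected_utility)

assert best_action(theory_a) == best_action(theory_b)
print(best_action(theory_a))   # the same decision comes out either way
```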

Replies from: curi
comment by curi · 2011-04-07T22:32:25.226Z · LW(p) · GW(p)

I cannot see how I could be wrong.

So, if you were wrong, you'd be really impressed, and want to rethink your worldview?

Or did you mean you're not interested?

None of the rest of what you say is relevant at all. Remember that you said, "No, I'm pretty sure that if theory A and theory B generate the same predictions then things will go exactly as well or badly for me whichever is true." It wasn't specified that they were Bayesian decision theories.

And the context was how well or badly it goes for you when we introduce mistakes into the picture (e.g. the issue is you, being fallible, make some mistakes. How resilient is your life to them?).

Do you now understand the issue I'm talking about?

Replies from: benelliott
comment by benelliott · 2011-04-07T23:01:15.464Z · LW(p) · GW(p)

Notice how the description I gave made absolutely no reference to whether or not the theories are correct. The argument applies equally well regardless of any correspondence to reality or lack thereof. Nothing changes when we introduce mistakes to the picture because they are already in the picture.

The only kind of mistakes that can hurt me are the ones that affect my decisions, and the only ones that can do that are the ones that affect my predictions. The point remains, if the predictions are the same, my actions are the same, the outcome is the same.

Replies from: curi
comment by curi · 2011-04-08T00:09:00.177Z · LW(p) · GW(p)

You're still mistaken and have overlooked several things.

And you have ignored my questions.

In Popperian epistemology, we do not say things like

I cannot see how I could be wrong.

No, I'm pretty sure that [my position is true]

They are anti-fallibilist, closed minded, and silly. We don't think our lack of imagination of how we could possibly be wrong is an argument that we are right.

I want you to pin yourself down a little bit. What will you concede if you find out you are wrong about this? Will you concede a lot or almost nothing? Will you regard it as important and be glad, or will you be annoyed and bored? Will you learn much? Will your faith in Bayes be shaken? What do you think is at stake here?

And are you even interested? You have expressed no interest in why I think you're mistaken, you just keep saying how I can't possibly have a point (even though you don't yet know what it is).

Replies from: benelliott, benelliott
comment by benelliott · 2011-04-08T07:11:14.442Z · LW(p) · GW(p)

I want you to pin yourself down a little bit. What will you concede if you find out you are wrong about this? Will you concede a lot or almost nothing? Will you regard it as important and be glad, or will you be annoyed and bored? Will you learn much? Will your faith in Bayes be shaken? What do you think is at stake here?

It annoys me a lot when people do this, because I can be wrong in many different ways. If I give a maths proof, then say I cannot see how it could be wrong, someone else might come up and ask me if I will give up my trust (trust, not faith, is what I have in Bayes by the way) in maths. When they reveal why I am wrong, and it turns out I just made a mathematical error, I have learnt that I need to be more careful, not that maths is wrong.

I am confident enough in that statement that I would be interested to find out why you think it is wrong.

If the way in which you prove me wrong turns out to be interesting and important, rather than a technical detail or a single place where I said something I didn't mean, then it will likely cause a significant change in my world view. I will not just immediately switch to Popper, there are more than two alternatives after all, and I may well not give up on Bayes. This isn't a central tenet of Bayesian decision theory (although it is a central tenet of instrumental rationality), so it won't refute the whole theory.

My most likely response, if you really can show that more than prediction is required, is to acknowledge that at least one component of the complete Bayesian epistemology is still missing. It would not surprise me, although it would surprise me to find that this specific thing was what was missing.

No, I'm pretty sure that [my position is true]

I'm not asserting that I could not possibly be wrong, that P(I am wrong) = 0. All I am saying is that I feel pretty sure about this, which I do.

comment by benelliott · 2011-04-08T13:27:41.003Z · LW(p) · GW(p)

Since you refuse to state your point I'm going to guess what it is.

My guess is that you are referring to the point you made earlier about how the difference between "the seasons are caused by the earth tilting on its axis" and "the seasons are caused by the Goddess Demeter being upset about Persephone being in Hades" is that the former has a good explanation and the latter has a bad explanation. Is your point that if I don't care about explanations I have no means of distinguishing between them?

I do not find this convincing, I do not currently have time to explain why but I can do so later if you want.

Replies from: curi
comment by curi · 2011-04-08T16:55:18.528Z · LW(p) · GW(p)

I got bored of your evasions.

You're not on the right track.

If you were going to buy a black box which does multiplication, do you think all black boxes you could buy -- which you thoroughly test and find give perfect outputs for all inputs -- are equally good?

Disregard time taken to get an answer. And they only multiply numbers up to an absolute value of a trillion, say.

Replies from: benelliott
comment by benelliott · 2011-04-08T17:18:50.785Z · LW(p) · GW(p)

If you were going to buy a black box which does multiplication, do you think all black boxes you could buy -- which you thoroughly test and find give perfect outputs for all inputs -- are equally good?

Disregard time taken to get an answer. And they only multiply numbers up to an absolute value of a trillion, say.

If I disregard time taken then yes, they are all equally good (assuming we don't add in other confounding factors like if one works by torturing puppies and the other doesn't).

Replies from: curi
comment by curi · 2011-04-08T17:26:52.357Z · LW(p) · GW(p)

But one might work, internally, by torturing puppies.

One might work, internally, in a way that will break sooner.

One might work, internally, in a way that is harder to repair if it does break.

One might work, internally, in a way that is harder or easier to convert to perform some other function.

So the internal structure of knowledge, which makes identical predictions, does matter.

All good programmers know this in the form of: some coding styles, which achieve the same output for the users, have different maintenance costs.

This is an important fact about epistemology, that the internal structure of knowledge matters, not just its results.

edit: Relating this to earlier conversation, one might work, internally, in a way so that if an error does happen (maybe they have error rates of 1 time in 500 trillion, or maybe something partially breaks after you buy it and use it a while), then the result you get is likely to be off by a small amount. Another might work internally in a way that if something goes wrong you may get random output.

and lol @ Marius

Replies from: benelliott, Marius
comment by benelliott · 2011-04-08T19:39:25.819Z · LW(p) · GW(p)

I just said I was assuming away confounding factors like that for the sake of argument.

One might work, internally, in a way that will break sooner.

One might work, internally, in a way that is harder to repair if it does break.

Ideas do not 'break'. They either correspond to reality or they do not; this is a timeless fact about them. They do not suddenly switch from corresponding to reality to not doing so.

If by break you mean 'fail in some particular scenario that has not yet been considered' then the only way one can fail and the other not is if they generate different predictions in that scenario.

One might work, internally, in a way that is harder or easier to convert to perform some other function.

The only other function would be to predict in a different domain, which would mean that if they still make the same predictions they are equally good.

Replies from: curi
comment by curi · 2011-04-08T19:53:02.461Z · LW(p) · GW(p)

I just said I was assuming away confounding factors like that for the sake of argument.

You meant you were intentionally ignoring all factors that could possibly make them different? That the puppy example had to do with immorality was merely a coincidence, and not representative of the class of factors you wanted to ignore? That completely defeats the point of the question to say "I don't care which I buy, ignoring all factors that might make me care, except the factors that, by premise, are the same for both".

Ideas do not 'break'.

Not everything I said directly maps back to the original context. Some does. Some doesn't.

Ideas stored in brains can randomly break. Brains are physical devices with error rates.

Regardless, ideas need to be changed. How easily they change, and into what, depends on their structure.

The only other function would be to predict in a different domain, which would mean that if they still make the same predictions they are equally good.

Even if I granted that (I don't), one black box could have a list of predictions. Another one could predict using an algorithm which generates the same results as the list.

When you go to change them to another domain, you'll find it matters which you have. For example, you might prefer the list, since a list based approach will still work in other domains, while the algorithm might not work at all. Or you might prefer the algorithm since it helps you find a similar algorithm for the new domain, and hate the list approach b/c the new domain is trillions of times bigger and the list gets unwieldy.

Replies from: benelliott
comment by benelliott · 2011-04-08T21:53:26.416Z · LW(p) · GW(p)

Ideas stored in brains can randomly break. Brains are physical devices with error rates.

Brains are fairly resilient to the failure of a single component.

Regardless, ideas need to be changed. How easily they change, and into what, depends on their structure.

Part of the problem here is that I'm not clear what you mean by ideas. Are you using Hume's definition, in which an idea is like a mental photocopy of a sensory impression? Or do you mean something more like a theory?

If you mean the former then ideas are just the building blocks of predictions and plans. Their content matters, because their content affects the predictions I make.

If you mean something like a theory, then let me explain how I think of theories. A theory is like a compressed description of a long list of probabilities. For example, Newton's laws could be 'unfolded' into a very long, possibly infinite, list of predictions for what objects will do in a wide range of situations, but it is quicker to give them in their compressed form, as general principles of motion.

A theory can be seen as a mathematical function, from a subset of the set of all possible strings of future events to the real numbers between 0 and 1. Ultimately, a function's identity rests in its output and nothing else. x+x and 2x may look different, but to say they are different functions is absurd. If you give me a functional equation to solve, and I show that 2x is the answer, you cannot criticise me for failing to distinguish between the two possibilities of 2x and x+x, because they are not actually two different possibilities.

The only way you could compare 2x and x+x is to note that the former is slightly quicker to write, so when we have two theories giving identical predictions we pick the one that is shorter, or from which it is easier to generate the predictions (analogous to picking the faster box, or the box which is more convenient to carry around with you).

Replies from: curi
comment by curi · 2011-04-08T21:57:54.814Z · LW(p) · GW(p)

Brains are fairly resilient to the failure of a single component.

Yes. And they are not perfectly resilient. So, I'm right and we do have to consider the issue.

I'm not clear what you mean by ideas

http://fallibleideas.com/ideas

A theory is like a compressed description of a long list of probabilities

What do you call explanatory theories?

Even with your definition of theories, my example about changing a theory to apply to a new domain is correct, is it not? e.g. which compression algorithm is used matters, even though it doesn't affect the predictions being made.

Or similarly: compress data, flip one bit, uncompress. The (average) result depends on the compression algorithm used. And since our brains are fallible, this can happen to people. How often? Well... most people experience lots of errors while remembering stuff.
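
A minimal sketch of that bit-flip point, using Python's zlib purely as an illustration; the message and the flipped bit positions are arbitrary choices.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

message = b"all swans are white. " * 20

# Stored raw, one flipped bit corrupts a single character.
corrupted = flip_bit(message, 1000)
print(sum(a != b for a, b in zip(message, corrupted)), "byte(s) changed")

# Stored compressed, one flipped bit can garble or destroy the whole message.
compressed = zlib.compress(message)
try:
    recovered = zlib.decompress(flip_bit(compressed, 50))
    print(sum(a != b for a, b in zip(message, recovered)), "byte(s) differ")
except zlib.error:
    print("decompression failed entirely")
```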

Replies from: benelliott
comment by benelliott · 2011-04-08T22:06:14.170Z · LW(p) · GW(p)

http://fallibleideas.com/ideas

This 'explanation' does not explain anything. It gives a one-sentence definition, 'an idea is the smallest unit of coherent thought', and the rest is irrelevant to the definition. That one sentence just pushes back the question into "what counts as coherent thought".

The definition this looks most like is Hume's; is it that?

Even with your definition of theories, my example about changing a theory to apply to a new domain is correct, is it not? e.g. which compression algorithm is used matters, even though it doesn't affect the predictions being made.

No, you do not change a theory to apply to a new domain. You invent a new theory.

Or similarly: compress data, flip one bit, uncompress. The (average) result depends on the compression algorithm used. And since our brains are fallible then this can happen to people. How often? Well... most people experience lots of errors while remembering stuff.

Fine, you also try to pick the explanation which you find easiest to remember, which is pretty much equivalent to the shortest. Or you write stuff down.

Replies from: curi
comment by curi · 2011-04-08T22:19:28.258Z · LW(p) · GW(p)

Some new theories are modifications of old theories, right? e.g. in physics, QM wasn't invented from scratch.

So the "structure" (my word) of the old theories matters.

Fine, you also try to pick the explanation which you find easiest to remember, which is pretty much equivalent to the shortest. Or you write stuff down.

I think you're missing the point, which is that some compression algorithms (among other things) are more error resistant. This is an example of how the internal structure of knowledge making identical predictions can differ and how the differences can have real-world consequences.

Which is just the thing we were debating, and the thing about which you previously couldn't see how you could be wrong.

Replies from: benelliott
comment by benelliott · 2011-04-08T23:01:38.216Z · LW(p) · GW(p)

Some new theories are modifications of old theories, right? e.g. in physics, QM wasn't invented from scratch.

Some theories are modifications of theories from other domains. It is quite rare for such a theory to work. As for QM, what happened there is quite a common story in science.

A theory is designed which makes correct predictions within the range of things we can test, but as our range of testable predictions expands we find it doesn't predict as well in other domains. It gets abandoned in favour of something better (strictly speaking it never gets fully abandoned, its probability just shrinks to well below the threshold for human consideration). That better theory inevitably 'steals' some of the old theory's predictions: since the old theory was making correct predictions in some domains, the new theory must steal those, otherwise it will quickly end up falsified itself.

This doesn't mean theories should be designed to do anything other than predict well; the only thing a theory can hope to offer any successor is predictions, so the focus should remain on those.

I think you're missing the point, which is that some compression algorithms (among other things) are more error resistant. This is an example of how the internal structure of knowledge making identical predictions can differ, and how the differences can have real world consequences.

Which is just the thing we were debating, and the thing you previously couldn't conceive of being mistaken about.

As someone with quite a bit of training in maths and no training in computing I have a bit of a cognitive blind-spot for efficiency concerns with algorithms. However, if we look at the real point of this digression, which was whether it is a fault of Bayes that it does not do more than predict, then I think you don't have a point.

When do these efficiency concerns matter? Are you saying there should be some sort of trade-off between predictive accuracy and efficiency? That if we have a theory that generates slightly worse predictions but does so much more efficiently then we should adopt it?

I can't agree to that. It may sometimes be useful to use the second theory as an approximation to the first, but we should always keep one eye on the truth; you never know when you will need to be as accurate as possible (Newtonian Mechanics and Relativity provide a good example of this dynamic).

If you are saying efficiency concerns only matter with theories that have exactly the same predictive accuracy, then all your talk of structure and content can be reduced to a bunch of empirical questions: "which of these is shorter?", "which of these is quickest to use?", "which of these is least likely to create disastrous predictions if we make an error?". Every one of these has its own Bayesian answer, so the process solves itself.

Replies from: curi
comment by curi · 2011-04-09T19:29:12.339Z · LW(p) · GW(p)

As for QM, what happened there is quite a common story in science.

Right. It's common that old theories get changed. That there is piecemeal improvement of existing knowledge, rather than outright replacement.

How well ideas are able to be improved in this way -- how suited for it they are -- is an attribute of ideas other than what predictions they make. It depends on what I would call their internal structure.

This doesn't mean theories should be designed to do anything other than predict well

So the above is why it does mean that. Note that good structure doesn't come at the cost of prediction. It's not a trade-off; they aren't incompatible. You can have both.

As someone with quite a bit of training in maths and no training in computing I have a bit of a cognitive blind-spot for efficiency concerns with algorithms.

Efficiency of algorithms (space and speed both) is a non-predictive issue, and it's important, but it's not what I was talking about at all. It's ground that others have covered plenty. That's why I specifically disqualified that concern in my hypothetical.

When do these efficiency concerns matter? Are you saying there should be some sort of trade-off between predictive accuracy and efficiency?

No trade off, and I wasn't talking about efficiency. My concern is how good ideas are at facilitating their own change, or not.

There are some good examples of this from programming. For example, you can design a program which is more modular: it has separated, isolated parts. Then later the parts can be individually reused. This is good! There are no tradeoffs involved here with what output you get from the program -- this issue is orthogonal to output.

If the parts of the program are all messily tied together, then you remove one and everything breaks. (That means: if you introduce some random error, you end up with unpredictable, bad results.) If they are decoupled, and each part has fault tolerance and error checking on everything it's told by the other parts, then errors, or wanting to replace some part, can have much more limited effects on the rest of the system.
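
A minimal sketch of that in Python (the component names and the corrupted reading are made up for illustration):

```python
# Two decoupled parts: a parser that checks everything it is handed, and a
# reporter that only ever sees validated values. An error upstream is
# contained instead of propagating through the whole program.

def parse_temperature(raw):
    """Isolated parsing part: turn raw sensor text into degrees Celsius."""
    try:
        value = float(raw)
    except ValueError:
        return None                      # refuse garbage rather than pass it on
    if not -90.0 <= value <= 60.0:       # sanity bounds for surface weather
        return None
    return value

def describe(celsius):
    """Separate reporting part: trusts only validated input."""
    if celsius is None:
        return "reading unavailable"
    return "freezing" if celsius <= 0 else "above freezing"

print(describe(parse_temperature("21.5")))   # above freezing
print(describe(parse_temperature("2?.5")))   # corrupted reading -> contained
print(describe(parse_temperature("9999")))   # absurd reading -> contained
```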

That if we have a theory that generates slightly worse predictions but does so much more efficiently then we should adopt it?

No, nothing like that at all.

Replies from: benelliott
comment by benelliott · 2011-04-09T20:40:24.716Z · LW(p) · GW(p)

Right. It's common that old theories get changed. That there is piecemeal improvement of existing knowledge, rather than outright replacement.

It is my claim that this is not a process of change; it is a total replacement. It will happen that if the old theory had any merit at all then some of its predictions will be identical to those made by the new one.

There are some good examples of this from programming. For example, you can design a program which is more modular: it has separated, isolated parts. Then later the parts can be individually reused. This is good! There are no tradeoffs involved here with what output you get from the program -- this issue is orthogonal to output.

If my claim is correct then all theories are perfectly modular. A replacement theory may have an entirely different explanation, but it can freely take any subset of the old theory's predictions.

Efficiency of algorithms (space and speed both) is a non-predictive issue, and it's important, but it's not what I was talking about at all. It's ground that others have covered plenty. That's why I specifically disqualified that concern in my hypothetical.

I was using efficiency as a quick synonym for all such concerns, to save time and space.

No, nothing like that at all.

Fine, if we have a new theory that generates very slightly worse predictions, but is more modular, would you advocate replacement?

Replies from: curi
comment by curi · 2011-04-09T20:47:19.960Z · LW(p) · GW(p)

If my claim is correct then all theories are perfectly modular.

Do you think all computer programs are perfectly modular? What is it that programmers are learning when they read books on modular design and change their coding style?

I was using efficiency as a quick synonym for all such concerns, to save time and space.

Well OK. I didn't know that. I don't think you should though b/c efficiency is already the name of a small number of well known concerns, and the ones I'm talking about are separate.

Fine, if we have a new theory that generates very slightly worse predictions, but is more modular, would you advocate replacement?

I think this question is misleading. The answer is something like: keep both, and use them for different purposes. If you want to make predictions, use the theory currently best at that. If you want to do research, the more modular one might be a more promising starting point and might surpass the other in the future.

It is my claim that this is not a process of change; it is a total replacement. It will happen that if the old theory had any merit at all then some of its predictions will be identical to those made by the new one.

In physics, QM retains many ideas from classical physics. Put it this way: a person trained in classical physics doesn't have to start over to learn QM. There are lots of ways they are related, and his pre-existing knowledge remains useful. This is why even today classical physics is still taught (usually first) at universities. It hasn't been totally replaced. It's not just that some of its predictions are retained (they are only retained as quick approximations in limited sets of circumstances, btw; by and large, technically speaking, they are false, and taught anyway), it's that ways of thinking about physics, and of approaching physics problems, are retained. People need knowledge other than predictions, such as how to think like a physicist. While classical physics predictions were largely not retained, some other aspects largely were.

BTW speaking of physics, are you aware of the debate about the many worlds interpretation, the Bohmian interpretation, the shut-up-and-calculate interpretation, the Copenhagen interpretation, and so on? None of these debates are about prediction. They are all about the explanatory interpretation (or lack thereof, for the shut-up-and-calculate school), and the debates are between people who agree on the math and predictions.

Replies from: benelliott
comment by benelliott · 2011-04-09T21:20:52.588Z · LW(p) · GW(p)

Do you think all computer programs are perfectly modular? What is it that programmers are learning when they read books on modular design and change their coding style?

They are learning to write computer programs, which are not necessarily perfectly modular. Computer programs and scientific theories are different things and work in different ways.

I think this question is misleading. The answer is something like: keep both, and use them for different purposes. If you want to make predictions, use the theory currently best at that. If you want to do research, the more modular one might be a more promising starting point and might surpass the other in the future.

Are you imagining a scenario something like Ptolemy's astronomy versus Copernican astronomy during the days when the latter still assumed the planets moved in perfect circles?

In that case I can sort of see your point: the latter was a more promising direction for future research while the former generated better predictions. The former deserved a massive complexity penalty of course, but it may still have come out on top in the Bayesian calculation.

Sadly, there will occasionally be times when you do everything right but still get the wrong answer. Still, there is a Bayesian way to deal with this sort of thing without giving up the focus on predictions. Yudkowsky's Technical Explanation of Technical Explanation goes into it.

In physics, QM retains many ideas from classical physics. Put it this way: a person trained in classical physics doesn't have to start over to learn QM. There are lots of ways they are related, and his pre-existing knowledge remains useful. This is why even today classical physics is still taught (usually first) at universities. It hasn't been totally replaced. It's not just that some of its predictions are retained (they are only retained as quick approximations in limited sets of circumstances, btw; by and large, technically speaking, they are false, and taught anyway), it's that ways of thinking about physics, and of approaching physics problems, are retained. People need knowledge other than predictions, such as how to think like a physicist. While classical physics predictions were largely not retained, some other aspects largely were.

That may be the case, but it doesn't mean that when designing a theory I should waste any concern on "should I try to include general principles which may be useful to future theories". Thinking like that is likely to generate a lot of rubbish. Thinking "what is happening here, and what simple mathematical explanation can I find for it?" is the only way to go.

BTW speaking of physics, are you aware of the debate about the many worlds interpretation, the Bohmian interpretation, the shut-up-and-calculate interpretation, the Copenhagen interpretation, and so on? None of these debates are about prediction. They are all about the explanatory interpretation (or lack thereof, for the shut-up-and-calculate school), and the debates are between people who agree on the math and predictions.

Yes, I am. There are some empirical prediction differences; for instance, we might or might not be able to create quantum superpositions at increasingly large scales. In general, I say just pick the mathematically simplest and leave it at that.

You don't need to resort to QM for an example of that dilemma; "do objects vanish when we stop looking at them?" is another such debate, but ultimately it doesn't make much difference either way, so I say just assume they don't since it's mathematically simpler.

Edit:

Thinking about this some more, I can see the virtue of deliberately spending some time thinking about explanations, and keeping a record of such explanations even if they are attached to theories that make sub-par predictions.

This should all be kept separate from the actual business of science. Prediction should still remain the target; explanations are merely a means towards that end.

Double Edit:

I just realised that I wrote "do" where I meant "don't". That completely changes the meaning of the paragraph. Crap.

Replies from: None, curi, curi
comment by [deleted] · 2011-04-09T21:35:00.304Z · LW(p) · GW(p)

You don't need to resort to QM for an example of that dilemma, "do objects vanish when we stop looking at them?" is another such debate, but ultimately it doesn't make much difference either way so I say just assume they do since its mathematically simpler.

Assuming that objects do vanish when you stop looking at them is much simpler?

Replies from: benelliott
comment by benelliott · 2011-04-09T21:43:30.439Z · LW(p) · GW(p)

Just noted that in an edit, it was a typo.

comment by curi · 2011-04-09T22:08:09.706Z · LW(p) · GW(p)

This should all be kept separate from the actual business of science

Part of the business of science is to create successively better explanations of the world we live in. What its nature is, and so on.

Or maybe you will call that philosophy. If you do, it will be the case that many scientists are "philosophers" too. In the past, I would have said most of them. But instrumentalism started getting rather popular last century.

I wonder where you draw the line with your instrumentalism. For example, do you think the positivists were mistaken? If so, what are your arguments against them?

Replies from: benelliott
comment by benelliott · 2011-04-09T22:49:19.245Z · LW(p) · GW(p)

I do think the positivists went too far. They failed to realise that we can make predictions about things which we can never test. We can never evaluate these predictions, and we can never update our models on the basis of them, but we can still make them in the same way as we make any other predictions.

For example, consider the claim "a pink rhinoceros rides a unicycle around the Andromeda galaxy, he travels much faster than light and so completes a whole circuit of the galaxy every 42 hours. He is, of course, far too small for our telescopes to see."

The positivist says "meaningless!"

I say "meaningful, very high probability of being false"

Another thing they shouldn't have dismissed is counterfactuals. As Pearl showed, questions about counterfactuals can be reduced to Bayesian questions of fact.

Part of the business of science is to create successively better explanations of the world we live in. What its nature is, and so on.

I sympathise with this. To some extent I may have been exaggerating my own position in my last few posts; it happens to me occasionally. I do think that predictions are the only way of entangling your beliefs with reality, of creating a state of the world where what you believe is causally affected by what is true. Without that you have no way to attain a map that reflects the territory; any epistemology which claims you do is guilty of making stuff up.

Replies from: curi
comment by curi · 2011-04-09T22:53:55.444Z · LW(p) · GW(p)

Without that you have no way to attain a map that reflects the territory; any epistemology which claims you do is guilty of making stuff up.

I do not agree with this assertion.

Some things I note about it:

1) it isn't phrased as a prediction

2) it isn't phrased as an argument based on empirical evidence

Would you like to try rewriting it more carefully?

Replies from: benelliott
comment by benelliott · 2011-04-10T08:56:14.623Z · LW(p) · GW(p)

1) It can be phrased as a prediction. "I predict that if someone had no way to evaluate their predictions based on evidence, they would have no way of attaining a map that reflects the territory. They would have no way of attaining a belief-set that works better in this world than in the average of all possible worlds".

2) It is a mathematical statement, or at any rate the logical implication of a mathematical statement, and thus is probably true in all possible worlds so I am not trying to entangle it with the territory.

Replies from: curi
comment by curi · 2011-04-10T09:03:34.648Z · LW(p) · GW(p)

If Y can be phrased as a prediction, it does not follow that Y is the predictive content of X. Do you understand?

Replies from: benelliott
comment by benelliott · 2011-04-10T10:08:16.962Z · LW(p) · GW(p)

I understand, but disagree. The point I have been trying to make is that it does.

My original claim was that an agent's outcome was determined solely by that agent's predictions and the external world in which that agent lived. If you define a theory so that its predictive content is a strict subset of all the predictions which can be derived from it, then yes, its predictive content is not all that matters; the other predictions matter as well.

It nonetheless remains the case that what happens to an agent is determined by that agent's predictions. You need to understand that theories are not fundamentally Bayesian concepts, so it is much better to argue Bayes at either the statement-level or the agent-level than the theory-level.

In addition, I think our debate is starting to annoy everyone else here. There have been times when the entire recent comments bar is filled with comments from one of us, which is considered bad form.

Could we continue this somewhere else?

Replies from: curi
comment by curi · 2011-04-11T20:06:21.468Z · LW(p) · GW(p)

Could we continue this somewhere else?

Yes. I PMed you somewhere yesterday. Did you get it?

comment by curi · 2011-04-09T21:28:59.155Z · LW(p) · GW(p)

"do objects vanish when we stop looking at them?" is another such debate, but ultimately it doesn't make much difference either way so I say just assume they do since its mathematically simpler.

This kind of attitude is what we see as the antithesis of philosophy, and of understanding the world.

When people try to invent new theories in physics, they need to understand things. For example, will they want to use math that models objects that frequently blink in and out of existence, or objects that don't do that? Both ways might make the same predictions for all measurements humans can do, but they lead to different research directions. The MWI vs Bohm/etc stuff is similar: it leads to different research directions for what one thinks would be an important advance in physics, a promising place to look for a successor theory, and so on. As an example, Deutsch got interested in fungibility because of his way of understanding physics. That may or may not be the right direction to go -- depending on if his way of thinking about physics is right -- but the point is it concretely matters even though it's not an issue of prediction.

They are learning to write computer programs, which are not necessarily perfectly modular. Computer programs and scientific theories are different things and work in different ways.

Computer programs are a kind of knowledge, like any other, which has an organizational structure, like any other knowledge.

That may be the case, it doesn't mean that when designing a theory I should waste any concern on "should I try to include general principles which may be useful to future theories". Thinking like that is likely to generate a lot of rubbish.

That is not how you achieve good design. That's the wrong way to go about it. Good design is achieved by looking for simplicity, elegance, clarity, modularity, good explanations, and so on. When you have those, then you do get a theory which can be of more help to future theories. If you just try to think, "What will the future want?" then you won't know the answer, so you won't get anywhere.

EDIT: I thought you meant having them vanish is simpler b/c it means that less stuff exists less of the time. That is a rough description of what the Copenhagen Interpretation people think. One issue this raises is that people can and do disagree about what is simpler. I don't think it's good to set up "simpler" as a definitive criterion. It's better to have a discussion. You can use "it's simpler" as an argument. And that might end the discussion. But it might not if someone else has another criterion they think is relevant and an explanation of why it matters (you shouldn't rule out that ever happening). And also someone might criticize your interpretation of what makes things simpler in general, or which is simpler in this case.

Replies from: JoshuaZ, benelliott
comment by JoshuaZ · 2011-04-09T21:44:10.733Z · LW(p) · GW(p)

I'm pretty sure benelliott wrote "do" when he meant "don't."

comment by benelliott · 2011-04-09T21:38:12.172Z · LW(p) · GW(p)

The MWI vs Bohm/etc stuff is similar: it leads to different research directions for what one thinks would be an important advance in physics, a promising place to look for a successor theory, and so on. As an example, Deutsch got interested in fungibility because of his way of understanding physics. That may or may not be the right direction to go -- depending on if his way of thinking about physics is right -- but the point is it concretely matters even though it's not an issue of prediction.

Actually, it is an issue of prediction. We are trying to predict which future research will lead to promising results.

Computer programs are a kind of knowledge, like any other, which has an organizational structure, like any other knowledge.

I would say computer programs are more analogous to architecture than to scientific theories.

Replies from: curi
comment by curi · 2011-04-09T21:50:11.450Z · LW(p) · GW(p)

Actually, it is an issue of prediction. We are trying to predict which future research will lead to promising results.

If you have a theory, X, which predicts Y, then there are important aspects of X other than Y. There is non-predictive content of X. When you say that those other factors have to do with prediction in some way, that wouldn't mean that only the predictive content of X matters, since the way they relate to prediction isn't a prediction X made.

I would say computer programs are more analogous to architecture than to scientific theories.

I'm not speaking in loose analogies and opinions. All knowledge is the same thing ("knowledge") because it has shared attributes. The concept of an "idea" covers ideas in various fields, under the same word, because they share attributes.

Replies from: benelliott
comment by benelliott · 2011-04-09T21:55:50.011Z · LW(p) · GW(p)

If you have a theory, X, which predicts Y, then there are important aspects of X other than Y. There is non-predictive content of X. When you say that those other factors have to do with prediction in some way, that wouldn't mean that only the predictive content of X matters, since the way they relate to prediction isn't a prediction X made.

I would say, evaluate X solely on the predictive merits of Y. If we are interested in future research directions then make separate predictions about those.

All knowledge is the same thing ("knowledge") because it has shared attributes.

A computer program doesn't really count as knowledge. It's information, in the scientific and mathematical sense, and you write it down, but the similarity ends there. It is a tool that is built to do a job, and in that respect is more like a building. It doesn't really count as knowledge at all, not to a Bayesian at any rate.

Remember, narrowness is a virtue.

Replies from: curi
comment by curi · 2011-04-09T22:04:26.597Z · LW(p) · GW(p)

Remember, narrowness is a virtue.

What? Broad reach is a virtue. A theory which applies to many questions -- which has some kind of general principle to it -- is valuable. Like QM which applies to the entire universe -- it is a universal theory, not a narrow theory.

A computer program doesn't really count as knowledge.

It has apparent design. It has adaptation to a purpose. It's problem-solving information. (The knowledge is put there by the programmer, but it's still there.)

I would say, evaluate X solely on the predictive merits of Y. If we are interested in future research directions then make separate predictions about those.

One of the ways this came up is we were considering theories with identical Y, and whether they have any differences that matter. I said they do. Make sense now?

Replies from: benelliott
comment by benelliott · 2011-04-09T22:53:14.970Z · LW(p) · GW(p)

What happens if we taboo the word 'theory'?

It has apparent design. It has adaptation to a purpose. It's problem-solving information. (The knowledge is put there by the programmer, but it's still there.)

In this sense, a building is also knowledge. Programming is making, not discovering.

One of the ways this came up is we were considering theories with identical Y, and whether they have any differences that matter. I said they do. Make sense now?

Suppose two theories A and B make identical predictions for the results of all lab experiments carried out thus far but disagree about directions for future research. I would say they make different predictions about which research directions will lead to success, and are therefore not entirely identical.

Replies from: curi, curi
comment by curi · 2011-04-09T23:22:11.472Z · LW(p) · GW(p)

Just got an idea for a good example from another thread.

Consider chess. If a human and a chess program come up with the same move, then the differences between them, and their ways of thinking about the move, don't really matter, do you think?

And suppose we want to learn from them. So we give them both white. We play the same moves against each of them. We end up with identical games, suppose. So, in the particular positions from that game they make identical predictions about what move has the best chance to win.

Now, in each case we also gather some information about why they made each move, so we can learn from it.

The computer program provides move list trees it looked at, at every move, with evaluations of the positions they reach.

The human provides explanations. He says things like, "I was worried my queen side wasn't safe, so I decided I'd better win on the king side quickly" or "I saw that this was eventually heading towards a closed game with my pawns fixed on dark squares, so that's why I traded my bishop for a knight there".

When you want to learn chess, these different kinds of information are both useful, but in different ways. They are different. The differences matter. For a specific person, with specific strengths and weaknesses, one or the other may be far far more useful.
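
For the program's side, here is a minimal sketch of the kind of artifact it hands over: a tiny, made-up move tree with static evaluations, searched by plain minimax. The moves and scores are hypothetical, not from a real engine.

```python
# Each node is either a leaf evaluation (a number, from White's point of view)
# or a dict mapping a move to the subtree it leads to.
tree = {
    "Nf3": {"...d5": 0.3, "...c5": 0.1},
    "e4":  {"...e5": 0.2, "...c5": -0.4},
}

def minimax(node, maximising=True):
    """Return (best_score, principal_line) for this subtree."""
    if not isinstance(node, dict):       # leaf: a static evaluation
        return node, []
    best_score, best_line = None, []
    for move, child in node.items():
        score, line = minimax(child, not maximising)
        if best_score is None or (score > best_score if maximising else score < best_score):
            best_score, best_line = score, [move] + line
    return best_score, best_line

print(minimax(tree))                     # (0.1, ['Nf3', '...c5'])
```

Contrast that with the human's explanations above: possibly the same recommended move, but a very different kind of information.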

Replies from: benelliott
comment by benelliott · 2011-04-10T09:03:08.805Z · LW(p) · GW(p)

So, the computer program and the human do different things, and thereby produce different results. Your point?

I was claiming that if they did the same thing they would get the same results.

Replies from: curi
comment by curi · 2011-04-10T09:05:33.135Z · LW(p) · GW(p)

The point is that identical predictive content (in this case, in the sense of predicting what move has the best chance to win the game in each position) does not mean that what's going on behind the scenes is even similar.

It would be that one thing is intelligent, and one isn't. That's how big the differences can be.

So, are you finally ready to concede that your claim is false? The one you were so sure of, that identical predictive content of theories means nothing else matters, and that their internal structure can't be important?

Replies from: benelliott
comment by benelliott · 2011-04-10T10:01:39.064Z · LW(p) · GW(p)

No, because they do different things. If they take different actions, this implies they must have different predictions (admittedly it's a bit anthropomorphic to talk about a chess program having predictions at all).

Incidentally, they are using different predictions to make their moves. For example, the human may predict P(my left side is too weak) = 0.9 and use this prediction to derive P(I should move my queen to the left side) = 0.8, while the chess program doesn't really predict at all; but if it did, you would see something more like individual predictions for the chance of winning given each possible move, and a derived prediction like P(I should move my queen to the left side) = 0.8.

With such different processes, it's really an astonishing coincidence that they make the same moves at all.

(I apologise in advance for my lack of knowledge of how chess players actually think; I haven't played it since I discovered go. I hope my point is still apparent.)

Replies from: curi
comment by curi · 2011-04-10T10:22:35.772Z · LW(p) · GW(p)

That's not how chess players think.

Your point is apparent -- you try to reinterpret all human thinking in terms of probability -- but it just isn't true. There are lots of books on how to think about chess. They do not advise what you suggest. Many people follow the advice they do give, which is different and unlike what computers do.

People learn explanations like "control the center because it gives your pieces more mobility" and "usually develop knights before bishops because it's easier to figure out the correct square for them".

Chess programs do things like count up how many squares each piece on the board can move to. When humans play they don't count that. They will instead do stuff like think about what squares they consider important and worry about those.
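
A toy sketch of that kind of mobility count, for sliding pieces on a bare 8x8 board (the board setup is arbitrary, and real engines are of course far more involved):

```python
EMPTY = "."
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def mobility(board, file, rank, directions):
    """Count empty squares a sliding piece at (file, rank) can reach."""
    count = 0
    for df, dr in directions:
        f, r = file + df, rank + dr
        while 0 <= f < 8 and 0 <= r < 8 and board[r][f] == EMPTY:
            count += 1
            f, r = f + df, r + dr
    return count

# An otherwise empty board with a rook on a1 and a bishop on c1.
board = [[EMPTY] * 8 for _ in range(8)]
board[0][0] = "R"   # a1
board[0][2] = "B"   # c1

print("rook mobility:", mobility(board, 0, 0, ROOK_DIRS))       # 8 (blocked by the bishop)
print("bishop mobility:", mobility(board, 2, 0, BISHOP_DIRS))   # 7
```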

Replies from: benelliott
comment by benelliott · 2011-04-10T10:36:30.051Z · LW(p) · GW(p)

"control the center because it gives your pieces more mobility"

Notice how this sentence is actually a prediction in disguise.

"usually develop knights before bishops because it's easier to figure out the correct square for them"

As is this.

comment by curi · 2011-04-09T23:13:37.222Z · LW(p) · GW(p)

What happens if we taboo the word 'theory'?

I only write "idea" instead. If you taboo that too, i start writing "conjecture" or "guess" which is misleading in some contexts. Taboo that too and i might have to say "thought" or "believe" or "misconception" which are even more misleading in many contexts.

In this sense, a building is also knowledge.

Yes, buildings rely on, and physically embody, engineering knowledge.

Suppose two theories A and B make identical predictions for the results of all lab experiments carried out thus far but disagree about directions for future research. I would say they make different predictions about which research directions will lead to success, and are therefore not entirely identical.

But they don't make those predictions. They don't say this stuff; they embody it in their structure. It's possible for a theory to be more suited to something, but no one knows that, and it wasn't made that way on purpose.

Replies from: JoshuaZ, benelliott
comment by JoshuaZ · 2011-04-09T23:24:02.248Z · LW(p) · GW(p)

The point of tabooing words is to expand your definitions and remove misunderstandings, not to pick near-synonyms.

comment by benelliott · 2011-04-10T09:01:30.547Z · LW(p) · GW(p)

I only write "idea" instead. If you taboo that too, i start writing "conjecture" or "guess" which is misleading in some contexts. Taboo that too and i might have to say "thought" or "believe" or "misconception" which are even more misleading in many contexts.

You didn't read the article, and so you are missing the point. In spectacular fashion I might add.

Yes, buildings rely on, and physically embody, engineering knowledge.

So, buildings should be made out of bricks, therefore scientific theories should be made out of bricks?

But they don't make those predictions. They don't say this stuff; they embody it in their structure. It's possible for a theory to be more suited to something, but no one knows that, and it wasn't made that way on purpose.

I contend that a theory can make more predictions than are explicitly written down. Most theories make infinitely many predictions. A logically omniscient Ideal Bayesian would immediately be able to see all those predictions just from looking at the theory; a Human Bayesian may not, but they still exist.

Replies from: curi
comment by curi · 2011-04-10T09:09:00.775Z · LW(p) · GW(p)

What do you think is more likely:

1) I meant

So, buildings should be made out of bricks, therefore scientific theories should be made out of bricks?

2) I meant something else which you didn't understand

?

Can you specify the infinitely many predictions of the theory "Mary had a little lamb" without missing any structural issues I deem important? Saying the theory "Mary had a little lamb" is not just a prediction but infinitely many predictions is non-standard terminology, right? Did you invent this terminology during this argument, or have you always used it? Are there articles on it?

Replies from: benelliott, wedrifid
comment by benelliott · 2011-04-10T09:52:20.583Z · LW(p) · GW(p)

Bayesians don't treat the concept of a theory as being fundamental to epistemology (which is why I wanted to taboo it), so I tried to figure out the closest Bayesian analogue to what you were saying and used that.

As for 1) and 2), I was merely pointing out that "programs are a type of knowledge, programs should be modular, therefore knowledge should be modular" and "buildings are a type of knowledge, buildings should be made of bricks, therefore knowledge should be made of bricks" are of the same form and equally valid. Since the latter is clearly wrong, I was making the point that the former is also wrong.

To be honest I have never seen a better demonstration of the importance of narrowness than your last few comments; they are exactly the kind of rubbish you end up talking when you make a concept too broad.

Replies from: curi
comment by curi · 2011-04-10T09:56:09.894Z · LW(p) · GW(p)

programs are a type of knowledge, programs should be modular, therefore knowledge should be modular

I didn't make that argument. Try to be more careful not to put words into my mouth.

comment by wedrifid · 2011-04-10T09:30:47.006Z · LW(p) · GW(p)

When you have a reputation like curi's this is exactly the sort of rhetorical question you should avoid asking.

Replies from: curi
comment by curi · 2011-04-10T09:36:36.903Z · LW(p) · GW(p)

What should I do, do you think? I take it you know what my goals are in order to judge this issue. Neat. What are they? Also what's my reputation like?

comment by Marius · 2011-04-08T17:35:53.296Z · LW(p) · GW(p)

But one might work, infernally, by torturing puppies

FTFY

comment by [deleted] · 2011-04-07T11:05:32.816Z · LW(p) · GW(p)

Mm, I'm not sure I entirely agree with that link - we might accept that most long term goals, on maximisation, will lead to what most people might recognise as morality, but I don't know if all goals are long term. It also makes no argument as to one goal being "better" than another. There are sensible reasons for me to discourage people having the desire to kill me, for example, but I don't see that one could argue that I'm right and he's wrong. If someone is born with just one innate desire, that of killing me, it's in her interest to pursue that goal. Now she might well act morally elsewhere while engaging fully in her training towards killing me, but at some point, when she is confident that she will be able to kill me, she should drop everything else and kill me. Of course after this her life is empty, but she only had that one desire, and she had to fulfil it at some point - she got absolutely no value from everything else.

Now was she wrong to pursue that goal? I don't see how I can condemn her. I obviously will do everything in my power to stop her, and I would hope others in society would have goals which are interrupted by my untimely demise, but I don't see where condemnation comes in here. We had conflicting goals, and mine seem "nicer" from an intuitive argument, but if I lived in a world where everyone had a strong desire to see me dead then I imagine it would feel "nicer" to them for me to die, and "nasty" for me to survive.

Replies from: curi
comment by curi · 2011-04-07T18:17:07.554Z · LW(p) · GW(p)

Not all goals are long term.

One of the purposes of the dialog is to explain that the foundations are not very important. That means you don't have to figure out the correct foundations or starting place to have objective morality. You can start wherever you want, because rather little depends on the starting place.

Once you do make a ton of progress, when you're much wiser, if your starting place was squirrels you'd be able to reconsider it because it's so silly. The same holds for any other particularly dumb starting place.

The ones that will be harder to change later are specifically the ones that you don't see as bad -- that you don't want to change. The ones that are either correct, or that you don't yet have enough knowledge to see the problems with.

If someone is born with just one innate desire

Innate desires aren't morality. It's a bad argument: "I was born this way, therefore I should be this way". That's getting an ought from an is.

Moving on, one way to move past the squirrel scenario, which enables you to criticize the squirrel starting point and many others, is to consider other scenarios. Drop the squirrels and put in something else, like minimizing bison. Put in a wide variety of stuff. It's not too important what it is. Any kind of value, taken seriously, and which has something to say long term. Even wanting to kill someone will work if you also want them to stay dead forever (if you really want to make sure to destroy all the information that could be used to resurrect them later with advanced technology, and you want to know what kind of remains could be used for that and what would violate the laws of physics, then you will need advanced knowledge).

So, you try the same thought experiment with bison-minimizing, or killing-forever.

You find that some of the conclusions are the same, and some are different.

Take only the ones that are the same for thousands of starting points.

Those are the non-parochial ones. They are the ones that don't depend on your culture and biases. There is the objective content.

The parts that vary by starting points are wrong.

That's what I think. And I think this argument isn't bogged down in being totally subjective from the start. Maybe it's not perfect, but that just means we could make an even better argument in the future.

Getting back to your original point, this shared content across many starting points does say stuff about what to do short term. It doesn't give complete arbitrary freedom of action in the short term. It's also not totally restrictive, but that's good (it might get a lot more restrictive if made more precise, but that would also be good; knowing how to live well in high detail would be a good thing even if it gave you fewer non-immoral options. As long as we don't jump the gun and create very precise rules before we understand how to work them out well, we'll be OK.)

Replies from: None
comment by [deleted] · 2011-04-07T19:26:11.070Z · LW(p) · GW(p)

Sorry, but I just do not see how you can claim desires are not morality when you have yet to provide a basis for what it is! I see no reason to believe that those bases with common conclusions are somehow better. They might feel better, but that's not good enough.

Replies from: curi
comment by curi · 2011-04-07T19:28:57.109Z · LW(p) · GW(p)

you have yet to provide a basis

I've argued that morality is at least largely, if not entirely, independent of basis. So asking me for a basis isn't the right question.

Can you give an example of a starting point you think avoids the common conclusions such as liberalism?

Replies from: None
comment by [deleted] · 2011-04-07T20:32:06.605Z · LW(p) · GW(p)

You have shown that an argument can be made that, given a number of seemingly dissimilar long term goals, we can make arguments which convincingly argue that to achieve them one should act in a manner people would generally consider moral. I am not convinced squirrel morality gives me an answer on specific moral questions (abortion, say), but I can see how one might manage it. You have yet to convince me that short term bases will do the same: I am reasonably confident that many will not. To claim these bases as inferior seems to be begging the question to me.

As to your specific question: how about a basis of wanting to prevent liberalism? It would certainly be difficult to achieve and counterproductive, but to claim that those respective properties are bad begs the question: you need morality to condemn purposes which are going to cause nothing but pain for all involved.

Replies from: curi, None
comment by curi · 2011-04-07T20:52:02.949Z · LW(p) · GW(p)

how about a basis of wanting to prevent liberalism?

If you were just to destroy the world, or build a static society and die of a meteor strike one day b/c your science never advanced, then life could evolve on another planet.

You need enough science and other things to be able to affect the whole universe. And for that you need liberalism temporarily. Then at the very, very end, when you're powerful enough to easily do whatever you want to the whole universe (it needs to be well within your power, not at the limits of your power, or it's too risky; you might fail), then finally you can destroy or control everything.

So that goal leads straight to billions of years of liberalism. And that does mean freedom of abortion: ruining people's lives to punish them for having sex does not make society wealthier, does not promote progress, etc. But it does increase the risk of everyone dying of a meteor before you advance enough to deal with such a problem.

short term bases

Accomplishing short term things, in general, depends on principles. Suppose I want a promotion at work within the next few years. It's important to have the right kind of philosophy. I'll have a better shot at it if I think well. So I'll end up engaging with some big ideas. Not every single short term basis will lead somewhere interesting. If it's really short, it's not so important. Also consider this: we can conjecture that life is nice. People cannot use short term bases, which don't connect to big ideas, to criticize this. If they want to criticize it, they will have to engage with some big ideas, so then we get liberalism again.

Replies from: None
comment by [deleted] · 2011-04-08T08:53:50.521Z · LW(p) · GW(p)

Dealing with issues in order. OK, fine, once again you've taken a basis that I've given and assumed I want it to apply to the entire universe (note this isn't necessarily what most people actually mean. Just because I want humans to be happy doesn't necessarily mean I want a universe tiled with happy humans), but even under this assumption I'm not sure I agree: by encouraging liberalism in the short term we may make it impossible to create liberalism in the long term, and you are imagining a society which is human in nature. Humans like liberalism, as a rule, but to say that therefore morality needs liberalism is actually subjective: it depends on humans. If I invent a species of blergs who love illiberalism then I can get away with it. Bear in mind that an illiberal species isn't THAT hard to imagine - we suppose democracy is stable despite liberal societies being destroyed by more liberal ones. You make an assumption of stability based on the past 300 years or so of history, which seems somewhat presumptuous.

I actually agree that given sensible starting assumptions we can get to something that looks like morality, or at least argue strongly in its favour, but those bases have no reason outside of themselves to be accepted. They are axioms, and axioms are by necessity subjective. We can look at them and say "hey, those seem sensible" and "hey, those lead to results that jibe with my intuitions", but we can't really defend them as inherent rules. Look at Eliezer's Three Worlds Collide, with the Baby Eaters. While I disagree with many of the conclusions of that story, the evolution of the Baby Eaters doesn't sound totally implausible, but there's a society that's developed a morality utterly at odds with our own.

On short term bases, I can obviously invent short term bases that don't work. You claim that my murderer is worried about my resurrection. Most aren't, and it's easy to just say they want me to die once, and don't care heavily if I resurrect afterwards. If I do, their desire will already have been fulfilled and they will be sated. This person is weird and irrational, and there are multiple sensible reasons for us to make sure that person does not accomplish their goals, but to claim their goal is worse than ours inherently assumes a number of goals that that individual doesn't possess.

Replies from: curi
comment by curi · 2011-04-08T09:57:11.947Z · LW(p) · GW(p)

OK, fine, once again you've taken a basis that I've given and assumed I want it to apply to the entire universe

We have different priorities.

What I want is: if people want to improve, then they can. There is an available method they can use, such as taking seriously their ideas and fully applying them instead of arbitrarily restricting them.

Most murderers don't worry about resurrection. Yes, but I don't mind. The point is a person with a murder type of basis has a way out starting with his existing values.

I think what you want is not possible methods of progress people could use if they wanted to, but some kind of guarantee. But there can't be one. For one thing, no matter what you come up with people could simply refuse to accept your arguments.

They can refuse to accept mine too. I don't care. My interest is that they can improve if they like. That's enough.

There doesn't have to be a way to force a truth on everyone (other than, you know, guns) for it to be an objective truth.

Bear in mind that an illiberal species isn't THAT hard to imagine

They are easy to imagine. But they are only temporary. They always go extinct because they cannot deal with all the unforeseen problems they encounter.

You make an assumption of stability based on the past 300 years or so of history, which seems somewhat presumptuous.

No. You made an assumption about my reasoning. I certainly didn't say that. You just guessed it. If you'd asked my reasoning that isn't what I would have said.

Replies from: None, JoshuaZ
comment by [deleted] · 2011-04-08T10:33:19.976Z · LW(p) · GW(p)

Mm, I wonder if we are potentially arguing about the same thing here. I suspect our constructions of morality would look very similar at the end of the day, and that the word "objective" is getting in our way. I still don't see how one can possibly construct a morality which exists outside minds in a real way, as morality is a function of sentience.

Replies from: curi
comment by curi · 2011-04-08T17:03:26.699Z · LW(p) · GW(p)

morality is answers to questions about how to live. it is not a "function of sentience".

the theory "morality not is objective" means that for any non-ambiguous question, there are multiple equally good answers.

an example of a non-ambiguous question is, "which computer should i buy today, if any, given my situation and background knowledge, and indeed given the entire state of the universe if it's relevant".

morality being objective means if a different person got into an identical situation (it only has to actually be the same in the relevant ways which are limited), the answer would be the same for him, not magically change.

so far this doesn't have a lot of substance. yet it is what objectivity means. subjectivity is a dumb theory which advocates magic and whim.

the reason objectivity is important (besides for rejecting subjective denials that moral arguments can apply to anyone who doesn't feel like letting them) is that when you consider lots of objective moral answers (in full detail the way i was saying) you find: there are common themes across multiple answers and many things are irrelevant (so, common themes across all questions in categories specified by only a small number of details). some explanations hold, and govern the answers, across many different moral questions. those are important and objective moral truths. when we learn them, they don't just help us once but can be re-used to help with other choices later.

Replies from: wedrifid
comment by wedrifid · 2011-04-08T18:11:09.788Z · LW(p) · GW(p)

It is a convention here to use capital letters at the start of sentences. Either as mere stuffy traditional baggage inherited from English grammar, or because it makes it easier to parse the flow of a paragraph at a glance.

Replies from: curi
comment by curi · 2011-04-08T18:12:35.462Z · LW(p) · GW(p)

Why don't you just solve the problem in software instead of whining about it? It's not hard.

Replies from: Emile, jimrandomh
comment by Emile · 2011-04-08T18:40:08.254Z · LW(p) · GW(p)

The problem is solved by software, in the form of the "Vote down" and "Reply" buttons, which allow things like that to eventually be corrected.

(More seriously, having the site automatically add proper capitalization to what people write would be awful, even if it was worth adding a feature for the tiny minority of people who can't find their "shift" key)

Replies from: wedrifid, curi
comment by wedrifid · 2011-04-08T18:52:57.722Z · LW(p) · GW(p)

The problem is solved by software, in the form of the "Vote down" and "Reply" buttons, which allow things like that to eventually be corrected.

I like it. (Because I had written the exact same thing myself and only refrained from posting it because curi had tipped my fairly sensitive 'troll' meter which invokes my personal injunction against feeding with replies.)

Replies from: AlephNeil
comment by AlephNeil · 2011-04-08T20:02:02.857Z · LW(p) · GW(p)

curi had tipped my fairly sensitive 'troll' meter

Agreed.

This website has such high standards that I would have felt totally out of line if I'd offered a frank opinion on the credibility/crankiness of our visitor.

But I guess what's awesome about the karma system is that it removes any need to 'descend to the personal level'. No need to drive people away with mockery or insults.

comment by curi · 2011-04-08T18:55:37.695Z · LW(p) · GW(p)

why would it be awful to have (optional) software support for "a convention here"?

comment by jimrandomh · 2011-04-08T18:50:26.948Z · LW(p) · GW(p)

Following the rules of English is solely the writer's responsibility. Some text input methods, such as onscreen keyboards for cell phones, will do capitalization for you. They will sometimes get it wrong, though, so you have to override them, and this is inconvenient enough that people who're used to using the shift key don't want autocapitalization. For example, variable names stay in lower-case if they're at the start of a sentence, and some periods represent abbreviations rather than the ends of sentences.
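
A toy sketch of how crude such autocapitalization is (my own illustration, not anyone's proposed feature); the second call shows exactly the abbreviation problem just described:

```python
import re

def naive_capitalise(text):
    """Capitalise the first letter, and any letter following
    sentence-ending punctuation plus whitespace."""
    text = re.sub(r"^([a-z])", lambda m: m.group(1).upper(), text)
    return re.sub(r"([.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)

print(naive_capitalise("it is a convention here. people notice it."))
# -> "It is a convention here. People notice it."

print(naive_capitalise("see e.g. the style guide."))
# -> "See e.g. The style guide."  (treats the abbreviation as a sentence end)
```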

More to the point, though, proper capitalization, punctuation, spelling and grammar are signals that reveal how fluent a writer is in English, and whether they've proofread. Comments that don't follow the basic rules of English can be dismissed more readily, because writers still struggling with language are usually struggling with concepts too, and a comment that hasn't been proofread probably hasn't been checked for logical errors either.

Replies from: curi
comment by curi · 2011-04-08T18:59:33.454Z · LW(p) · GW(p)

Following the rules of English is solely the writer's responsibility.

It seems to me you have a moral theory that people should have to work hard, and be punished for failing to conform to convention, even though, if you want to read it a particular way, you could solve that in software without bothering me. If it's a convention here in particular, as I was told, then software support could be added to the website instead of used by individual readers. You're irrationally objecting to dissident or "lazy" behavior on principle, and you don't want to solve the problem in a way which is nicer to people you think should be forced to change. This is an intolerant and illiberal view.

Your plan of inferring whether I proofread, or whether I am fluent with English, from my use or not of capitalization, is rather flawed. I often proofread and don't edit capitalization. But you don't care about that. The important thing to you is the moral issue, not that your semi-factual arguments are false.

Replies from: jimrandomh
comment by jimrandomh · 2011-04-08T19:36:59.034Z · LW(p) · GW(p)

I have been trolled. I have lost. I will have a nice day anyways.

Replies from: playtherapist
comment by playtherapist · 2011-04-08T22:51:55.747Z · LW(p) · GW(p)

I like your attitude, son!

comment by JoshuaZ · 2011-04-08T14:04:06.551Z · LW(p) · GW(p)
Bear in mind that an illiberal species isn't THAT hard to imagine

They are easy to imagine. But they are only temporary. They always go extinct because they cannot deal with all the unforeseen problems they encounter.

What evidence do you have for this claim? This isn't at all obvious to me. The only highly sapient species we encounter are humans. And Homo sapiens aren't terribly liberal. Do you have examples of other species that are intrinsically illiberal that have gone extinct?

Replies from: curi
comment by curi · 2011-04-08T16:50:04.105Z · LW(p) · GW(p)

without progress, you don't get advanced science. that means eventually you die to a supernova explosion or meteor or something. how could it be otherwise?

Replies from: JoshuaZ
comment by JoshuaZ · 2011-04-09T15:03:38.906Z · LW(p) · GW(p)

without progress, you don't get advanced science. that means eventually you die to a supernova explosion or meteor or something. how could it be otherwise?

You may need to think carefully about what you mean by illiberal and progress. You also may want to consider why an illiberal species can't construct new technologies as needed to deal with threats.

comment by [deleted] · 2011-04-07T20:32:51.858Z · LW(p) · GW(p)

Urgh, typing this on my phone is less than fun. My point is pretty much finished, though.

comment by [deleted] · 2011-04-07T10:10:55.069Z · LW(p) · GW(p)

The problem is, nobody else here (or very few people here) regards justification as impossible. You're essentially saying you refuse to engage by the same evidentiary rules as anyone else here. You're not going to change anyone's mind without providing justification.

"At which point infinitely many of my 100% theories will be refuted. And infinitely many will remain. "

Like I said, look up Kolmogorov complexity and minimum message length. At any given time, the simplest of those 'theories' consistent with all data is the one with the highest probability.

Replies from: curi
comment by curi · 2011-04-07T10:25:50.015Z · LW(p) · GW(p)

Can you tell me how ideas are justified, without creating a regress or other severe problem? Tell me the type of justificationism that works, and then I will accept it.

Replies from: khafra
comment by khafra · 2011-04-07T13:00:04.794Z · LW(p) · GW(p)

Can you tell me at what posterior probability you consider an idea justified, and how many different models can be grouped together under a single idea, without appealing to intuition or other fuzzy concepts?

Also, can you replace the ">" in the top level post with html formatting of some sort?

comment by prase · 2011-04-07T14:14:05.598Z · LW(p) · GW(p)

At which point infinitely many of my 100% theories will be refuted. And infinitely many will remain. You can never win at that game using finite evidence. For any finite set of evidence, infinitely many 100% type theories predict all of it perfectly.

It seems that your objection is basically that if I toss a coin seventeen times and it ends up in a sequence of HTTTHTHHHHTHTHTTH, there is a specific theory T1 (namely, that the physical laws cause the sequence to be HTTTHTHHHHTHTHTTH) which scores higher than the clearly correct explanation T2 (i.e. the probability of each sequence is the same 2^(-17)). But this is precisely why priors depend on the Kolmogorov complexity of hypotheses: with such a prior, the posterior of T2 will be higher than the posterior of T1.
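
A toy numeric sketch of that point (the 20-bit description length is an illustrative assumption, not a real Kolmogorov complexity):

```python
seq = "HTTTHTHHHHTHTHTTH"      # the observed 17 flips
n = len(seq)

# T2: "each flip is an independent fair coin" -- a short description,
# assumed here to take about 20 bits to state.
prior_T2 = 2.0 ** -20
likelihood_T2 = 0.5 ** n        # 2^-17 for this (or any) particular sequence

# T1: "the physical laws force exactly this sequence" -- its description
# must spell out all 17 flips, so it costs at least n extra bits.
prior_T1 = 2.0 ** -(20 + n)
likelihood_T1 = 1.0             # it fits the observed data perfectly

post_T1 = prior_T1 * likelihood_T1
post_T2 = prior_T2 * likelihood_T2

# The memorising theory gains in likelihood exactly what it loses in prior,
# so it never pulls ahead; any extra overhead in T1's description tips the
# posterior toward T2.
print(post_T1, post_T2, post_T2 >= post_T1)
```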

And, after all, you don't have infinitely many theories. Theories live in brains, not in an infinite Platonic space of ideas. Why should we care whether there are infinitely many ways to formulate a theory so absurd that nobody would think of it, yet still compatible with the evidence? Solomonoff induction tells you to ignore them, which agrees with common sense.

Replies from: curi
comment by curi · 2011-04-07T19:22:13.344Z · LW(p) · GW(p)

Selectively ignoring theories, even when we're aware of them, is just bias, isn't it?

I'm a bit surprised that someone here is saying to me "OK so mathematically, abstractly, we're screwed, but in practice it's not a big deal, proceed anyway". Most people here respect math and abstract thinking, and don't dismiss problems merely for involving substantial amounts of theory.

Of course a prior can arbitrarily tell you which theories to prefer over others. But why those? You're getting into problems of arbitrary foundations.

Replies from: prase, calef
comment by prase · 2011-04-07T20:39:26.091Z · LW(p) · GW(p)

Bias is a systematic error in judgement, something which yields bad results. It is incorrect to apply that label to heuristics which are working well.

I haven't told you that we are abstractly screwed but that it's no big deal. We are not screwed; on the contrary, Solomonoff induction is a consistent algorithm which works well in practice. It is as arbitrary as any axioms are arbitrary. You can't do any better if you want to have any axioms at all, or any method at all. If your epistemology isn't completely empty, it can be criticised for being arbitrary without regard to its actual details. And after all, what ultimately matters is whether it works practically, not some perceived lack of arbitrariness.

comment by calef · 2011-04-07T20:32:35.120Z · LW(p) · GW(p)

We're fundamentally incapable of making statements about reality without starting on some sort of arbitrary foundation.

And I think describing it as "selectively ignoring" is doing it an injustice. We're deductively excluding, and if some evidence were to appear that contradicted that exclusion, those theories would no longer be excluded.

I'm actually having trouble finding a situation in which a fallibilist would accept/reject a proposition, and a Bayesian would do the opposite of the fallibilist. And I don't mean epistemological disagreements; I mean disagreements of the form "Theory Blah is not false."

Replies from: curi
comment by curi · 2011-04-07T20:36:54.713Z · LW(p) · GW(p)

We're fundamentally incapable of making statements about reality without starting on some sort of arbitrary foundation.

This is something Popper disputes. He says you can start in the middle, or anywhere. Why can't that be done?

And I think describing it as "selectively ignoring" is doing it an injustice. We're deductively excluding

I was talking about the theories that can't be deductively excluded b/c they make identical predictions for all available evidence.