Comments
Thank you for this info. I've signed up. I think this flipped my mood from gloomy to happy.
Incidentally, this is the second study I've signed up for via the web. The first is the Good Judgement Project which has been a fun exercise so far.
I think minesweeper makes a nice analogy with many of the ideas of epistemic rationality espoused in this community. At a basic level, it demonstrates how probabilities are subjectively objective -- our state of information (the board state) determines the probability of a mine under an unknown square, and yet there is only one correct set of mine probabilities. However, we also run quickly into the problem of bounded cognition, and in this situation we resort to heuristics. Of course, heuristics are of varying quality, and it is possible, with mathematics, to make better heuristics.
For example, if you find that the set of possible configurations of mines in a particular neighborhood is partitioned into, say, those that involve k mines and those that involve k+1 mines, then you can get a pretty good estimate of the probability that the true configuration lies in one partition or the other. It depends on the density of mines under the squares that aren't known (something like a prior).
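Here is a minimal sketch of that estimate in Python (the function name and the example numbers are mine, for illustration): each local configuration using j mines is weighted by the number of ways the remaining mines can fall on the unknown squares outside the neighborhood.

```python
from math import comb

def partition_probability(num_k, num_k1, k, outside, mines_left):
    """Estimate the probability that the true configuration uses k mines
    rather than k+1.

    num_k, num_k1 -- counts of local configurations using k and k+1 mines
    outside       -- unknown squares outside the constrained neighborhood
    mines_left    -- unflagged mines remaining on the whole board
    """
    # A configuration using j local mines leaves (mines_left - j) mines
    # for the outside squares: weight it by C(outside, mines_left - j).
    w_k  = num_k  * comb(outside, mines_left - k)
    w_k1 = num_k1 * comb(outside, mines_left - (k + 1))
    return w_k / (w_k + w_k1)

# Example: 10 configurations with 3 mines, 4 with 4 mines,
# 50 unknown outside squares, 12 mines left on the board.
print(partition_probability(10, 4, 3, 50, 12))
```

The weight ratio works out to roughly density/(1 - density) per extra mine, which is why the board's overall mine density acts like a prior.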
Situations also come up in which one must decide just what one's goal is -- to survive the next click, or to maximize the chance of winning the game? Often these two goals result in the same decision, but sometimes, interestingly, they result in different decisions.
I like to play on a 24x30 board (the maximum allowed on Windows machines) with 200 mines. This makes the game rarely about deductive logic alone. Situations in which probability theory is necessary come up all the time at this density.
Lovely. Thanks.
K. S. Van Horn gives a few lines describing the derivation in his PT:TLoS errata. I don't understand why he does step 4 there -- it seems to me to be irrelevant. The two main facts needed are steps 2-3 and step 5: the sum of a geometric series and the Taylor series expansion around y = S(x). Hopefully that is a good hint.
One nitpick with his errata: 1/(1-z) = 1 + z + O(z^2) for all z is wrong, since the interval of convergence for the RHS is (-1,1). This is not important to the problem, because here z = exp(-q), which is less than 1 since q is positive.
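For reference, the two standard facts in play, written out (nothing here is specific to the errata):

$$\frac{1}{1-z} = \sum_{n=0}^{\infty} z^n = 1 + z + O(z^2), \qquad |z| < 1,$$

and the Taylor expansion of a function $f$ about the point $y = S(x)$:

$$f(y) = f(S(x)) + f'(S(x))\,(y - S(x)) + O\big((y - S(x))^2\big).$$

With $z = e^{-q}$ and $q > 0$ we get $0 < z < 1$, safely inside the interval of convergence.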
I would like to share some interesting discussion on a hidden assumption used in Cox's Theorem (this is the result which states that what falls out of the desiderata is a probability measure).
First, some criticism of Cox's Theorem -- a paper by Joseph Y. Halpern published in the Journal of AI Research. Here he points out an assumption which is necessary to arrive at the associative functional equation:
F(x, F(y,z)) = F(F(x,y), z) for all x,y,z
This is (2.13) in PT:TLoS.
Because this equation was derived by using the associativity of the conjunction operation A(BC) = (AB)C, there are restrictions on what values the plausibilities x, y, and z can take. If these restrictions were stringent enough that x, y, and z could take on only finitely many values, or if they were to miss an entire interval of values, then the proof would fall apart. An additional assumption is needed: that the attainable values form a dense subset. Halpern argues that this assumption is unnatural and unreasonable, since it disallows "notions of belief with only finitely many gradations." For example, many AI projects consider only finitely many propositions.
K. S. Van Horn's article on Cox's Theorem addresses this criticism directly and powerfully starting on page 9. He argues that the theory that is being proposed should be universal and so having holes in the set of plausibilities should be unacceptable.
Anyhow, I found it interesting if only because it makes explicit a hidden assumption in the proof.
Ah OK. You're right. I guess I was taking the 'extension of logic' thing a little too far there. I had it in my head that ({any prop} | {any contradiction}) = T since contradictions imply anything. Thanks.
Yeah. My solution is basically the same as yours. Setting A=B=C makes F(T,T) = T. But setting A=B AND C -> ~A makes F(T,T) = F (warning: unfortunate collision here between F the function and F the truth value "false").
Yeah. A total derivative. The way I think about it is the dv thing there (jargon: a differential 1-form) eats a tangent vector in the y-z plane. It spits out the rate of change of the function in the direction of the vector (scaled appropriately with the magnitude of the vector). It does this by looking at the rate of change in the y-direction (the dy stuff) and in the z-direction (the dz stuff) and adding those together (since after taking derivatives, things get nice and linear).
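In symbols (standard multivariable calculus notation, not a quote from the book):

$$dv = \frac{\partial v}{\partial y}\,dy + \frac{\partial v}{\partial z}\,dz,$$

so fed a tangent vector $(a, b)$, the 1-form returns $a\,\partial v/\partial y + b\,\partial v/\partial z$: the rate of change of $v$ in that direction, scaled by the vector's magnitude.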
I'm not too familiar with the functional equation business either. I'm currently trying to figure out what the heck is happening on the bottom half of page 32. Figuring out the top half took me a really long while (esp. 2.50).
I'm convinced that the inequality in eqn 2.52 shouldn't be there. In particular, when you stick in the solution S(x) = 1 - x, it's false. I can't figure out if anything below it depends on that because I don't understand much below it.
I did not go through the 9 remaining cases, but I did think about one...
Suppose (AB|C) = F[(A|BC) , (B|AC)]. Compare A=B=C with (A = B) AND (C -> ~A).
Re 2-7: Yep, chain rule gets it done. By the way, it took me a few minutes to realize that your citation "2-7" refers to an equation in the pdf manuscript of the text. The numbering is different in the hardcopy version, which uses periods (e.g. equation 2.7) instead of dashes (e.g. equation 2-7). As long as we're all consistent about that, I don't suppose there will be much confusion.
> Jaynes discusses a "tricky point" with regard to the difference between the everyday meaning of the verb "imply" and its logical meaning; are there other differences between the formal language of logic and everyday language?
In formal logic, the disjunction "or" is inclusive -- "A or B" is true when at least one of A and B is true, including the case where both are. In everyday language, "or" is typically exclusive -- "A or B" is meant to exclude the possibility that A and B are both true.
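Side by side, as a truth table (just the standard one, for concreteness):

A   B   inclusive "A or B"   exclusive "A or B"
T   T   T                    F
T   F   T                    T
F   T   T                    T
F   F   F                    F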
I'm not claiming that working from the definition of derivative is the best way to present the topic. But it is certainly necessary to present the definition if calculus is being taught in a math course. Part of doing math is being rigorous. Doing derivatives without the definition is just calling on a black box.
On the other hand, once one has the intuition for the concept in hand through more tangible things like pictures, graphs, velociraptors, etc., the definition falls out so naturally that it ceases to be something which is memorized and becomes something that can be produced "on the fly".
I teach calculus often. Students don't get hung up on mechanical things like (x^3)' = 3x^2. They instead get hung up on what
$$f'(x) = \lim_{h \to 0} \dfrac{f(x+h) - f(x)}{h}$$
has to do with the derivative as a rate of change or as a slope of a tangent line. And from the perspective of a calculus student who has gone through the standard run of American school math, I can understand. It does require a level up in mathematical sophistication.
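To make the connection concrete, here is a quick numerical illustration (mine, not from any textbook): the difference quotient for f(x) = x^3 really does approach the mechanical answer 3x^2.

```python
# Difference quotient for f(x) = x**3 at x = 2; should approach 3 * 2**2 = 12.
f = lambda x: x**3
x = 2.0
for h in [1e-1, 1e-3, 1e-5]:
    print(h, (f(x + h) - f(x)) / h)
```

Watching the quotient settle toward 12 as h shrinks is exactly the limit the definition is talking about.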
I'm not sure about all the details, but I believe there was a small kerfuffle a few decades ago over a suggestion to change the apex of U.S. "school mathematics" from calculus to a sort of discrete-math-for-programming course. I cannot remember what sort of topics were suggested, though. I do remember having the impression that the debate was won by the pro-calculus camp fairly decisively -- and of course, we all see that school mathematics hasn't changed much.
Probability theory as extended logic.
I think it can be presented in a manner accessible to many (Jaynes' PT:TLoS is not accessible to many).
Jaynes references Polya's books on the role of plausible reasoning in mathematical investigations. The three volumes are How to Solve It, and the two volumes of Mathematics and Plausible Reasoning. They are all really fun and interesting books which kind of give a glimpse of the cognitive processes of a successful mathematician.
Particularly relevant to Jaynes' discussion of weak syllogisms and plausibility is a section of Vol. 2 of Mathematics and Plausible Reasoning which gives many other kinds of weak syllogisms. Things like: "A is analogous to B, B true, so A is more credible."
Just a heads up in case anyone wants to see more of this sort of thing (as at least one person on IRC #lesswrong did).
There are also fun exercises -- for example: cryptic crossword clues as an exercise in plausible reasoning.
Along with the distinction between causal and logical connections, when considering the conditional premise of the syllogisms (if A then B), Jaynes warns us to distinguish between conditional statements of a purely formal character (the material conditional) and those which assert a logical connection.
It seems to me that the weak syllogisms only "do work" when the conditional premise is true due to a logical connection between antecedent and consequent. If no such connection exists, or rather, if our mind cannot establish such a connection, then the plausibility of the antecedent doesn't change upon learning the consequent.
For example, "if the garbage can is green then frogs are amphibians" is true since frogs are amphibians, but this fact about frogs does not increase (or decrease) the probability that the garbage can is green, since, presumably, most of us don't see a connection between the two propositions.
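In Bayes' theorem terms (my gloss, not Jaynes' wording): learning B shifts the plausibility of A only insofar as A would have made B more or less plausible, since

$$P(A \mid B) = P(A)\,\frac{P(B \mid A)}{P(B)} = P(A) \quad \text{whenever } P(B \mid A) = P(B),$$

and the garbage can example is exactly this no-connection case.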
At some point in learning logic, I think I kind of lost touch with the common language use of conditionals as asserting connections. I like that Jaynes reminds us of the distinction.
Upon further study, I disagree with myself here. It does seem like entropy as a measurement of uncertainty in probability distributions does more or less fall out of the Cox Polya desiderata. I guess that 'common sense' one is pretty useful!
I wonder if Jaynes' statement is really true? Here is an example that is on my mind because I'm reading the (thus far) awesome book The Making of the Atomic Bomb. Apologies if I get details wrong:
In the 1930s, there was a lot of work done on neutron bombardment of uranium. At some point, Fermi fired slow-moving neutrons at uranium and got a bunch of interesting reaction products that he concluded were most plausibly transuranic elements. I believe he came to this conclusion because the models of the day discounted the hypothesis that a slow-moving neutron could do anything but release a "small" particle like a helium nucleus or something, and furthermore there was experimental work ruling out the elements just below uranium.
Some weird experimental data from Joliot and Curie, which seemed inconsistent with the prevailing model, came up later. Hahn and Strassmann seemed not to believe those results, and so tried to replicate them and found similar anomalies. A careful chemical analysis of the reaction products of uranium bombardment found elements like barium -- much lower on the periodic table. Meitner and Frisch came along and provided a new model which turned out to be right.
So here was data that seemed implausible when analyzed under the old models. The data was questioned, but then replicated, studied, and understood. The result was that the old model had to be cast aside for something new, because the data was incompatible with it (or at least implausible enough under it) to demand a new model.
Isn't this narrative the way knowledge often goes? New data comes along and blows up old ideas because the new data is inconsistent with or implausible in the old model. Does this jibe with Jaynes' statement?
I think this is not so important, but it is helpful to think about nonetheless. I guess the first step is to define what is meant by 'Bayesian'. In my original comment, I took one necessary condition to be that a Bayesian gadget is one which follows from the Cox-Polya desiderata. It might be better to define it as one which uses Bayes' Theorem. In either case, I think Maxent fails to meet the criteria.
Maxent produces the distribution on the sample space which maximizes entropy subject to any known constraints, which presumably come from data. If there are no constraints, then one gets the principle of indifference, which can also be gotten straight out of the Cox-Polya desiderata, as you say. But I think these are two different approaches to the same target. Maxent needs something new relative to Cox-Polya -- namely Shannon's information entropy. Furthermore, the derivation of Maxent is really different from the derivation of the principle of indifference from Cox-Polya. (A toy example is below.)
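A minimal sketch of Maxent on Jaynes' dice example, assuming the standard exponential form of the constrained solution (the function names and numbers are mine):

```python
import math

faces = range(1, 7)

def mean_for(lam):
    """Mean of the distribution p_i proportional to exp(lam * i)."""
    weights = [math.exp(lam * i) for i in faces]
    total = sum(weights)
    return sum(i * w for i, w in zip(faces, weights)) / total

def maxent_dice(target_mean, lo=-10.0, hi=10.0):
    """Max-entropy distribution on faces 1..6 with the given mean,
    found by bisection on the Lagrange multiplier lam."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:  # mean is increasing in lam
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * i) for i in faces]
    total = sum(weights)
    return [w / total for w in weights]

print(maxent_dice(3.5))  # mean 3.5 is no constraint at all: uniform, i.e. indifference
print(maxent_dice(4.5))  # a real constraint skews the weights toward high faces
```

With no effective constraint, Maxent reproduces the principle of indifference; with one, it needs the entropy functional, which is the "something new" I mean.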
I could be completely off here, but I believe the principle of indifference argument is generalized by the transformation group stuff. I think this because I can see the action of the symmetric group (the group, in the abstract algebra sense, of permutations) on the hypothesis space in the principle of indifference argument. Anyway, hopefully we'll get up to that chapter!
I think Jaynes more or less defines 'Bayesian methods' to be those gadgets which fall out of the Cox-Polya desiderata (i.e. probability theory as extended logic). Actually, this can't be the whole story given the following quote on page xxiii:
"It is true that all 'Bayesian' calculations are included automatically as particular cases of our rules; but so are all 'frequentist' calculations. Nevertheless, our basic rules are broader than either of these."
In any case, Maximum entropy gives you the pre-Bayesian ensemble (I got that word from here) which then allow the Bayesian crank to turn. In particular, I think Maximum entropy methods are not Bayesian in the sense that they do not follow from the Cox-Polya desiderata.
I'm (still) in!
I live in Davis, California, USA which is about an hour from the Bay Area.
The link to the pdf version seems to be missing in the original post.
I'm enthusiastically in.
My name is Taiyo Inoue. I am 32, male, the father of a 1-year-old son, married, and a math professor. I enjoy playing the acoustic guitar (American primitive fingerpicking), playing games, and soaking up the non-poisonous bits of the internet.
I went through 12 years of math study without ever really learning that probability theory is the ultimate applied math. I played poker for a bit during the easy-money boom for fun and hit on basic probability theory which the 12-year-old me could have understood, but I was ignorant of the Bayesian framework for epistemology until I was 30 years old. This really annoys me.
I blame my education for leaving me ignorant about something so fundamental, but mostly I blame myself for not trying harder to learn about fundamentals on my own.
This site is really good for remedying that second bit. I have a goal to help fix the first bit -- I think we call it "raising the sanity waterline".
As a father, I also want to teach my son so he doesn't have the same regret and annoyance at my age.
Hi.
Any comments I've made have been in the last few months, but I've been lurking on this site since its inception.
I have no problem letting decomposition refer to details of mind that can be adjusted independently of others. I can imagine such things. But I do not know if such things actually exist in my mind.
I have a math background so I tend to think a bit like that. Here's a silly analogy: consider a linear transformation on a vector space. Sometimes there are invariant subspaces of the vector space, called eigenspaces, but sometimes there are not. In the latter case, you cannot analyze the effect of the linear transformation by examining a smaller subspace of the vector space, since the transformation forces all subspaces to interact with one another.
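A concrete instance of the analogy (my own illustration, assuming NumPy): a 90-degree rotation of the plane has no real eigenvalues, hence no invariant line, so there is no smaller subspace on which you can study it.

```python
import numpy as np

# Rotation of the plane by 90 degrees.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
vals, vecs = np.linalg.eig(R)
print(vals)  # [0.+1.j 0.-1.j] -- purely complex, so no real eigenvectors
```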
I can imagine that I adjust some detail of my mind's inner workings and I might imagine that this detail is somehow independent from all the other details. But perhaps, in reality, my mind has no eigenspaces. Perhaps adjusting one detail necessarily has an effect on all the other details of my mind. I can't really understand my consciousness unless I look at the whole thing at once.
Maybe this is embodied in the qualification you made: "more or less independently". Seems fair. But this is the sort of objection I think a non-reductionist might have.
They might also reject the notion of an electron having a conscious experience (as do I) for reasons mentioned elsewhere in the comments section.
I'd like to offer what I think might be objections to this post. When I imagine myself as a non-reductionist and non-materialist reading this post (I am, in fact, neither of these things), I believe I find myself unconvinced by this thought experiment. I suppose I'm not sure convincing this hypothetical me is the goal... nonetheless here are my hypothetical objections:
When I introspect on my thought processes, I am using my mind. I might imagine that I can isolate "specks of consciousness" just as you ask me to do, but this is a fact about my state of mind which does not necessarily correlate with reality. Perhaps my consciousness is indecomposable. It feels that way to me. What it seems like you are asking me to do in this thought experiment is build a model of consciousness in which consciousness is already similar to matter in that it is decomposable. This seems like question begging! I need more evidence that such reductionism is actually possible.
In any case, suppose that I put my reductionist hat on, and examine my model of decomposable consciousness. Calling infinitesimal specks of my consciousness electrons isn't helpful to me. It also feels like question begging since it seems to assume that my consciousness is made of the same stuff as ordinary matter. If you want to call these specks of mind electrons, feel free. But labeling something is a fact about maps, not about the territory. I need evidence of this which speaks directly about the phenomenon of consciousness.
My perceptions or feelings may sometimes lead me astray, but often times they work really well! For example, when I perceive a rock flying at my face, I do much better when I trust my instincts and feelings than when I don't. How can I tell that my feelings that there is something extraphysical about my consciousness or that the only right way to think about consciousness is by taking a holistic approach are, in fact, wrong?
There is a sign problem when iev(A,B) is defined. You mention that you can get the mutual information of A and B by taking -log_2 of probabilistic evidence pev(A,B) = P(AB) / [P(A)P(B)], but this creates an extra negative sign:
-log_2(pev(A,B)) = -log_2[P(AB) / (P(A)P(B))]
                 = -[log_2(P(AB)) - log_2(P(A)) - log_2(P(B))]
                 = -log_2(P(AB)) + log_2(P(A)) + log_2(P(B))
                 = inf(AB) - inf(A) - inf(B).
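A quick numeric sanity check (the example probabilities are mine):

```python
import math

# Positively correlated A and B: P(AB) > P(A) * P(B).
p_ab, p_a, p_b = 0.4, 0.5, 0.5
pev = p_ab / (p_a * p_b)                  # 1.6
inf = lambda p: -math.log2(p)             # self-information in bits
print(-math.log2(pev))                    # -0.678..., negative!
print(inf(p_ab) - inf(p_a) - inf(p_b))    # same negative value
```

The mutual information here should come out positive, so the minus sign really is flipped.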
I donated 100 USD to the general fund.
I am a lurker -- always taking and never giving. This might change, but perhaps not. In any case, this opportunity is an effective way for me to give back to a community that has given me so much. Thank you.