## Posts

## Comments

**davids** on 2014 Less Wrong Census/Survey · 2014-10-23T12:39:14.958Z · score: 37 (37 votes)

I am curious what kind of analysis you plan to run on the calibration questions. Obvious things to do:

For each user, compute the correlation between their probabilities and the 0-1 vector of right and wrong answers. Then display the correlations in some way (a histogram?).

For each question, compute the mean (or median) of the probability for the correct answers and for the wrong answers, and see how separated they are.
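A minimal sketch of these two analyses in Python. It assumes, hypothetically, that each response is stored as a `(probability, correct)` pair, where `probability` is the confidence the respondent gave and `correct` says whether the answer was right. Correlating probabilities against a 0/1 vector is the point-biserial correlation. The function names and data layout are illustrative, not anything the survey actually uses.

```python
from statistics import mean, median

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (None if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def per_user_correlation(answers):
    """answers: {user: [(probability, correct), ...]} -> {user: correlation}."""
    result = {}
    for user, pairs in answers.items():
        probs = [p for p, _ in pairs]
        hits = [1.0 if correct else 0.0 for _, correct in pairs]
        result[user] = pearson(probs, hits)
    return result

def per_question_separation(responses, center=mean):
    """responses: {question: [(probability, correct), ...]}.

    Returns {question: (center of the probabilities given with correct answers,
                        center of the probabilities given with wrong answers)}.
    Pass center=median to use medians instead of means."""
    out = {}
    for question, pairs in responses.items():
        right = [p for p, correct in pairs if correct]
        wrong = [p for p, correct in pairs if not correct]
        out[question] = (center(right) if right else None,
                         center(wrong) if wrong else None)
    return out

print(per_user_correlation({"alice": [(0.9, True), (0.6, False), (0.8, True)]}))
```

The per-user correlations could then be binned into a histogram; a user whose probabilities are all identical gets `None`, since the correlation is undefined.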

But neither of those feels like a really satisfactory measure of calibration.

**davids** on Newcomblike problems are the norm · 2014-10-03T20:09:54.218Z · score: 1 (1 votes)

"Naively in the actual Newcombe's problem if omega is only correct 1/999,000+epsilon percent of the time…"

I'd like to argue with this by way of a parable. The eccentric billionaire, Mr. Psi, invites you to his mansion for an evening of decision theory challenges. Upon arrival, Mr. Psi's assistant brings you a brandy and interviews you for hours about your life experiences, religious views, favorite philosophers, ethnic and racial background … You are then brought into a room. In front of you is a transparent box with a $1 bill in it, and an opaque box. Mr. Psi explains:

"You may take just the solid box, or both boxes. If I predicted you take one box, then that box contains $1000, otherwise it is empty. I am not as good at this game as my friend Omega, but out of my last 463 games, I predicted "one box" 71 times and was right 40 times out of 71; I picked "two boxes" 392 times and was right 247 times out of 392. To put it another way, those who one-boxed got an average of (40*$1000+145*$0)/185 = $216 and those who two-boxed got an average of (31*$1001+247*$1)/278=$113. "

So, do you one-box?

"Mind if I look through your records?" you say. He waves at a large filing cabinet in the corner. You read through the volumes of records of Mr. Psi's interviews, and discover his accuracy is as he claims. But you also notice something interesting (ROT13): Ze. Cfv vtaberf nyy vagreivrj dhrfgvbaf ohg bar -- ur cynprf $1000 va gur obk sbe gurvfgf naq abg sbe ngurvfgf.

Still willing to say you should one-box?

By the way, if it bothers you that the odds of getting $1000 are less than 50% no matter what you choose, I could also have made Mr. Psi give money to 99/189 one-boxers (expected value $524) and only to 132/286 two-boxers (expected value $463) just by hfvat gur lrne bs lbhe ovegu (ROT13). This strategy has a smaller difference in expected value, and a smaller success rate for Mr. Psi, but might be more interesting to those of you who are anchoring on $500.
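The averages Mr. Psi quotes, and the alternative figures for the birth-year strategy, can be checked with a few lines of Python. This sketch assumes the payoffs described in the story ($1000 in the solid box when it is filled, $1 in the transparent box); `average_payoffs` is a hypothetical helper. From Mr. Psi's record, 40 of the 185 one-boxers and 31 of the 278 two-boxers had been predicted to one-box, so their solid box was filled.

```python
def average_payoffs(one_paid, one_total, two_paid, two_total, big=1000, small=1):
    """Average winnings of one-boxers and two-boxers.

    one_paid: one-boxers whose solid box was filled (each gets `big`).
    two_paid: two-boxers whose solid box was filled (each gets big + small).
    Everyone else gets 0 (one-boxers) or `small` (two-boxers)."""
    one_avg = one_paid * big / one_total
    two_avg = (two_paid * (big + small) + (two_total - two_paid) * small) / two_total
    return one_avg, two_avg

print([round(x) for x in average_payoffs(40, 185, 31, 278)])   # [216, 113]
print([round(x) for x in average_payoffs(99, 189, 132, 286)])  # [524, 463]
```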

**davids** on Explanations for Less Wrong articles that you didn't understand · 2014-04-04T00:41:41.643Z · score: 1 (1 votes)

A few years ago, I tried to write a friendly introduction to this technical part.

**davids** on [Open Thread] Stupid Questions (2014-02-17) · 2014-02-21T06:41:59.103Z · score: 0 (0 votes)

The grammar of the sentence is a bit hard to follow. When I present this paradox to friends (I have interesting friends), I hand them a piece of paper with the following words on it:

Take another piece of paper and copy these words:

"Take another piece of paper and copy these words: "QQQ" Then replace the three consecutive capital letters with another copy of those words. The resulting paragraph will make a false claim."

Then replace the three consecutive capital letters with another copy of those words. The resulting paragraph will make a false claim.

I urge you to carry out the task. You should wind up with a paper that has the exact same words on it as the paper I gave you.
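The typographical task can be simulated in Python, ignoring line breaks and treating the paper as a single string. `HEAD`, `TAIL` and `perform_task` are illustrative names. The output equals the handed-out paper by construction, which is exactly the self-reproducing trick.

```python
# The fixed text on either side of the placeholder QQQ.
HEAD = 'Take another piece of paper and copy these words: "'
TAIL = ('" Then replace the three consecutive capital letters with another copy '
        'of those words. The resulting paragraph will make a false claim.')

# The quoted words on the paper, still containing the placeholder.
words = HEAD + "QQQ" + TAIL

# The whole paper that was handed out: the instructions wrapped around the quoted words.
original_paper = HEAD + words + TAIL

def perform_task(quoted):
    """Copy the quoted words, then replace QQQ with another copy of them."""
    return quoted.replace("QQQ", quoted, 1)

print(perform_task(words) == original_paper)  # True
```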

If you believe that the statement on my paper is true, then you should believe that the statement on your paper is false, and vice versa. Yet they are the same statement! Assuming that you think truth or falsehood is a property of grammatical sentences, independent of where they are written, this should bother you. Moreover, unlike the standard liar paradox, the paper I gave you never talks about itself; it only talks about a message you will write on some other piece of paper (which does not, in turn, talk about the original message) when you perform some simple typographical operations.

Quine constructed this example to demonstrate the sort of subtleties that come up when one tries to invent a mathematical formalism that can talk about truth, and can talk about manipulating symbols, without bringing in the liar paradox. (To learn how this problem is solved, take a course on mathematical logic and Gödel's theorem.)

**davids** on A Fervent Defense of Frequentist Statistics · 2014-02-20T21:19:18.843Z · score: 1 (1 votes)

Well, I was trying to make the simplest possible example. We can of course add the monkey to our pool of experts. But part of the problem of machine learning is figuring out how long we need to watch an expert fail before we go to the monkey.

**davids** on A Fervent Defense of Frequentist Statistics · 2014-02-20T16:22:28.850Z · score: 2 (2 votes)

Suppose there are two experts and two horses. Expert 1 always predicts horse A, expert 2 always predicts horse B, and the winning horse actually alternates ABABABABABA... The frequentist randomizes the choice of expert according to weights; the Bayesian always chooses the expert who currently has more successes, and flips a coin when the experts are tied. (Disclaimer: I am not saying that these are the only possible strategies consistent with these philosophies; I am just saying that these seem like the simplest possible instantiations of "when I actually choose which person to follow on a given round, I randomize according to my weights, whereas a Bayesian would always want to deterministically pick the person with the highest expected value.")

If the frequentist starts with weights (1,1) and multiplies the weight of each wrong expert by 1/2, then we will have weights (1/2^k, 1/2^k) half the time and (1/2^k, 1/2^{k+1}) half the time. In the former case, we will guess A and B with equal probability and have a success rate of 50%; in the latter case, we will guess A (the most recent winner) with probability 2/3 and B with probability 1/3, for a success rate of 33%. On average, we achieve 5/12 = 41.7% success, i.e. a failure rate of 7/12 ≈ 0.583. The experts' failure rate is 0.5, and 0.583/0.5 = 1.166... < 2 ln 2 ≈ 1.39, as predicted.

The Bayesian does worse. On half of the races, expert 1 has one more correct guess than expert 2, so the Bayesian follows expert 1 and loses every one of these bets. On the other half, the experts are tied and the Bayesian guesses at random, so his or her total success rate is 25%, while the experts each get 50%. The failure rates are 0.75 and 0.5, and 0.75/0.5 = 1.5 < 2.41, as we expect.

As is usually the case, softening the penalty (multiplying a wrong expert's weight by, for example, 0.9 instead of 1/2) would yield slower convergence but better ultimate behavior. In the limit where the multiplier goes to 1, the frequentist achieves the same 50% that the "experts" do, while the Bayesian is still stuck at 25%.
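Below is a minimal Python sketch of the two simplified strategies on the alternating sequence. It is an illustration under stated assumptions, not a general weighted-majority implementation: it accumulates expected success probabilities instead of sampling random choices, and it renormalizes the weights after each race (which leaves the frequentist's choice probabilities unchanged) so that repeated halving does not underflow. `expected_success_rates` is a hypothetical name, and experts are indexed from 0.

```python
import math

def expected_success_rates(rounds, beta):
    """Expected per-race success rates when the winners alternate A, B, A, B, ...

    Expert 0 (the comment's expert 1) always predicts A; expert 1 always predicts B.
    "Frequentist": randomized weighted majority; follow expert i with
    probability w[i] / sum(w), and multiply a wrong expert's weight by beta.
    "Bayesian": follow the expert with more successes so far; coin flip on ties.
    """
    w = [1.0, 1.0]
    successes = [0, 0]
    freq_total = 0.0
    bayes_total = 0.0
    for t in range(rounds):
        winner = t % 2          # A (expert 0) wins even-numbered races, B odd ones
        loser = 1 - winner
        # Probability that the randomized choice lands on the right expert.
        freq_total += w[winner] / (w[0] + w[1])
        # Deterministic follow-the-leader, with a fair coin on ties.
        if successes[0] == successes[1]:
            bayes_total += 0.5
        elif successes[winner] > successes[loser]:
            bayes_total += 1.0
        # Penalize the wrong expert, then renormalize to avoid underflow.
        w[loser] *= beta
        total = w[0] + w[1]
        w = [x / total for x in w]
        successes[winner] += 1
    return freq_total / rounds, bayes_total / rounds

freq, bayes = expected_success_rates(1000, 0.5)
print(round(freq, 4), round(bayes, 4))                 # 0.4167 0.25
print(round(expected_success_rates(1000, 0.9)[0], 4))  # 0.4868
# Standard weighted-majority ratio bounds for a penalty of 1/2.
print(round(2 * math.log(2), 3), round(math.log(2) / math.log(4 / 3), 3))  # 1.386 2.409
```

With `beta=0.5` this reproduces the 5/12 and 25% figures; with `beta=0.9` the frequentist moves toward the experts' 50% while the Bayesian stays at 25%. The last line prints 2 ln 2 ≈ 1.39 and ln 2 / ln(4/3) ≈ 2.41, the two constants the comment compares against.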

Now, of course, one would hope that no algorithm would be so Sphexish as to ignore pure alternation like this. But the point of the randomized weighted majority guarantee is that it holds (up to the correctness of the random number generator) regardless of how much more complicated reality may be than the experts' models.

**davids** on Even Odds · 2014-01-13T18:20:26.261Z · score: 0 (0 votes)

I thought it was interesting too. As far as I can tell, your result is special to the situation of two bettors and two events. The method I described is a betting method for when there are more than two alternatives; that method is strategy-proof, but it is not fair, and I can't find a fair version of it.

I am really stumped about what to do when there are three people and a binary question. Naive approaches give no money to the person with the median opinion.

**davids** on Even Odds · 2014-01-13T16:14:53.848Z · score: 4 (4 votes)

Here is another attempt to present the same algorithm, with the goal of making it easier to memorize:

"Each puts in the square of their surprise, then swap."

To spell this out: suppose I predict that some event will happen with probability 0.1, and you say it is 0.25. When it happens, I am 0.9 surprised and you are only 0.75 surprised. So I put down (0.9)^2 * D = 0.81 D, you put down (0.75)^2 * D = 0.5625 D, and we swap our piles of money. Since I was more surprised, I come out the loser on the deal, by 0.2475 D.

"Square of the surprise" is a quantity commonly used to measure the error of predictive agents in machine learning; it is also known as the Brier score. So we could describe this rule as "each bettor pays the other his or her Brier score." There was some discussion of the merits of various scoring systems in an earlier post of Coscott's.

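A minimal Python sketch of the "square of the surprise, then swap" rule for a yes/no event. It assumes a bettor's surprise is the probability they assigned to the outcome that did not happen (so a forecast of 0.1 becomes 0.9 surprise when the event occurs, as in the example). `settle` is a hypothetical helper, and its parameter `d` plays the role of D.

```python
def settle(p_me, p_you, happened, d=1.0):
    """Each bettor puts in (surprise ** 2) * d, then the piles are swapped.

    p_me, p_you: each bettor's stated probability that the event happens.
    Returns (my net gain, your net gain)."""
    surprise_me = (1 - p_me) if happened else p_me
    surprise_you = (1 - p_you) if happened else p_you
    my_pile = surprise_me ** 2 * d      # what I put down
    your_pile = surprise_you ** 2 * d   # what you put down
    # After the swap I hold your pile and you hold mine.
    return your_pile - my_pile, my_pile - your_pile

print([round(g, 4) for g in settle(0.1, 0.25, True)])  # [-0.2475, 0.2475]
```

The settlement is zero-sum by construction: whatever one bettor gains, the other loses.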
**davids** on Mental Context for Model Theory · 2013-11-01T02:10:57.092Z · score: 0 (0 votes)

The operations on a Rubik's cube aren't abelian. Is that just a typo on your part, or am I missing some subtle point you are making?

I'm not sure what you are getting at when you say you don't want to found math on sets. I definitely intended to use the word "set" in a naive sense, so that it is perfectly fine for sets to contain numbers, or rotations of a Rubik's cube, or for that matter rocks and flowers. I wasn't trying to imply that the elements of a model had to be recursively constructed from the empty set by the axioms of ZFC. If you prefer "collection of things", I'd be glad to go with that. But I (and more to the point, model theorists) do want to think of a model as a bunch of objects with functions that take these objects as inputs and produce other objects, and relations which do and do not hold between various pairs of the objects.

I'm retracting a bunch of the other things I wrote after this because, on reflection, the later material was replying to a misreading of what you wrote in your following post. I still think your de-emphasis on the fact that the model is the universe is very confusing, especially when you then talk about the cardinality of models. (What is the cardinality of a rule for assigning truth values?) But on careful reading, you weren't actually saying something wrong.

**davids** on Mental Context for Model Theory · 2013-11-01T00:54:17.801Z · score: 1 (1 votes)

The reals can be studied as models of many theories. They (with the operation +, relation = and element 0) are a model of the axioms of an abelian group. They are also a model of the axioms of a group. The reals with (+, *, 0, 1, =) are a model of the axioms of a field. The reals with (+, *, 0, 1, =, <) are a model of the axioms of an ordered field. Etcetera...

Models are things. Theories are collections of statements about things. A model can satisfy many theories; a theory can have many models. I agree completely with So8res's statement that it is important to keep the two straight.

In addition, your example of Dedekind completeness is an awkward one because the Dedekind completeness axiom is a good example of the kind of thing you can't say in first order logic. (There are partial ways around this, but I'm trying to keep my replies on the introductory level of this post.) But I can just imagine that you had distinguished the reals and the rationals by saying that, in R, ∃ x : x^2=1+1 is true and in Q it is false, so I don't need to focus on that.

**davids** on Mental Context for Model Theory · 2013-10-31T23:51:00.031Z · score: 5 (5 votes)

"A model is an interpretation of the sentences generated by a language. A model is a structure which assigns a truth value to each sentence generated by some language under some logic."

I think this phrasing will be very misleading to anyone who tries to learn model theory from these posts. This is one thing a model DOES, but it isn't what a model IS. As far as I can tell, you nowhere say what a model is, even approximately. Writing out precisely what a model is takes a lot of space (like in the book you're reading!) so let me give an example.

Our alphabet will be the symbols of first order logic, plus as many variable names as we need, and the symbols +, =, 0.

Our axioms are

∀ x : x+0=0+x=x

∀ x,y: x+y=y+x

∀ x,y,z: (x+y)+z=x+(y+z)

∀ x ∃ y : x+y=y+x=0

Our THEORY is the set of all logical consequences of these statements, where "logical consequence" means "obtainable by the formal rules of first order logic." A MODEL of our theory is a specific set G, a specific element of G called 0 and a specific operation + taking two elements of G and returning an element of G, such that all of these statements are true about G. In other words, a model of this theory is an abelian group.

One thing an abelian group can do is give us a way to assign a true or false value to any statement in our language. For example, consider the statement ∀ x ∃ y : y+y+y=x. This statement is true in the group of rational numbers, but false in the group of integers. If we choose a particular abelian group, that will force a specific choice as to whether this statement is true or false.

However, you shouldn't identify an abelian group with a way of assigning truth values to statements about abelian groups. For example, the rational numbers and the real numbers are both abelian groups and, as it turns out, there is no statement using only +, 0, =, logical connectives and quantifiers whose truth value is different in these two groups. Nonetheless, they are different models.
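To make "a model is a bunch of objects with an operation" concrete, here is a small Python sketch. Brute force cannot check sentences over the integers or rationals, so as an assumed stand-in it uses the finite cyclic groups Z/n. A `Model` is just a carrier set, a chosen element interpreting 0, and a binary operation interpreting +; evaluating a sentence means quantifying over the carrier. The sentence ∀ x ∃ y : y+y+y=x comes out true in Z/5 and false in Z/6, mirroring the rationals/integers contrast. All names here are illustrative.

```python
from itertools import product

class Model:
    """A structure for the language {+, 0, =}: a carrier set, a chosen
    element interpreting 0, and a binary operation interpreting +."""
    def __init__(self, elements, zero, add):
        self.elements = list(elements)
        self.zero = zero
        self.add = add

def cyclic_group(n):
    """The integers mod n under addition (an illustrative finite model)."""
    return Model(range(n), 0, lambda a, b: (a + b) % n)

def satisfies_axioms(m):
    """Brute-force check of the four axioms listed above."""
    G, add, zero = m.elements, m.add, m.zero
    identity = all(add(x, zero) == x and add(zero, x) == x for x in G)
    commutative = all(add(x, y) == add(y, x) for x, y in product(G, repeat=2))
    associative = all(add(add(x, y), z) == add(x, add(y, z))
                      for x, y, z in product(G, repeat=3))
    inverses = all(any(add(x, y) == zero and add(y, x) == zero for y in G) for x in G)
    return identity and commutative and associative and inverses

def every_element_divisible_by_3(m):
    """Truth value in m of the sentence  ∀x ∃y : y+y+y = x."""
    return all(any(m.add(m.add(y, y), y) == x for y in m.elements) for x in m.elements)

print(satisfies_axioms(cyclic_group(6)))              # True
print(every_element_divisible_by_3(cyclic_group(5)),
      every_element_divisible_by_3(cyclic_group(6)))  # True False
```

A structure that fails an axiom, such as {0, 1, 2} with `max` as the operation (no inverses), is simply not a model of the theory.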

**davids** on Open Thread, September 30 - October 6, 2013 · 2013-10-09T18:19:06.492Z · score: 4 (4 votes)

Let me suggest a world view which is much less negative than the other replies: I view panhandlers as vendors of warm fuzzies and therefore treat them as I would any other street vendor whose product I am most likely not interested in. In particular, I have no reason to be hostile to them, or to be disrespectful of their trade.

If they engage me politely, I smile and say "No thanks." I think the second word there is helpful to my mindset and also makes their day a little better. If they become hostile or unpleasant, I feel no guilt about ignoring them; they have given me good reason to suspect their fuzzies are of low quality. If they have a particularly amusing approach, and I feel like treating myself, I give them money. (E.g., the woman who offered to bet me a dollar that she could "knock down this wall", gesturing at a nearby brick building. It was obviously a setup, but it was worth paying a dollar to learn the punchline, and she delivered it well.)

I developed this mindset while living in Berkeley, CA near Telegraph and walking everywhere, which I suspect means that I was encountering panhandlers at a rate about as high as anyone in the first world.

I also, of course, contribute significant amounts of money to charities which can do a lot more good with it. If you are looking for a charity which specifically aids people in a situation similar to the ones you are refusing, you may want to consider the HOPE Program http://www.thehopeprogram.org/ . In 2007, GiveWell said about them "For donors looking to help extremely disadvantaged adults obtain relatively low-paying jobs, we recommend HOPE." http://www.givewell.org/united-states/charities/HOPE-Program . There is an argument (and GiveWell makes it) that helping extremely disadvantaged adults in the first world obtain relatively low-paying jobs is so much harder than helping poor people in the third world that it should not be attempted. Without taking a side on that, if you feel guilty that you are not helping extremely disadvantaged adults in the first world, contributing to the HOPE Program would do more to actually address this issue than giving to panhandlers.

**davids** on Amplituhedron? · 2013-09-21T17:12:34.856Z · score: 21 (21 votes)

Here is an attempt to create a roadmap to the amplituhedron work. My relevant background and disclaimers: I am a mathematician with interests in particle physics who has been trying to learn about Arkani-Hamed and collaborators' ideas for the last two years. The specific result which is getting press now is one that has not been public for most of that time; my goal had been to understand the story of scattering amplitudes as described in his prior 154-page paper. I have been meeting regularly with a group of mathematicians and physicists here at the University of Michigan in pursuit of this goal.

So, what should you learn first:

You should be completely comfortable with quantum mechanics and special relativity. I would point out that Less Wrong will give you great ideas about the philosophy of QM but is very short on computing any actual examples; you should understand how to actually use QM to solve problems.

Mathematically, I found my familiarity with representation theory and Lie groups extremely useful. However, a lot of the physicists in our group didn't have this background and compensated for it with strengths of their own.

You should understand the material of a first graduate course in Quantum Field Theory, through the computation of tree-level amplitudes. To learn this, I audited a course taught out of Srednicki's book, and also read on my own in Peskin-Schroeder and Zee. I can't claim to have a great understanding of this material, and if anyone has advice as to how to learn it better, I'd love to hear some. However, I feel confident in saying that, had I been enrolled in that class, I would have gotten an A, and I think you should at least be at that level. A second course in QFT certainly wouldn't hurt -- the fact that I had never worked through any loop integrals in detail handicapped me -- but I am managing without it.

If you get this far, I strongly recommend you next read Henriette Elvang and Yu-Tin Huang's notes on scattering amplitudes http://arxiv.org/abs/1308.1697 . As the abstract says, "The purpose of this review is to bridge the gap between a standard course in quantum field theory and recent fascinating developments in the studies of on-shell scattering amplitudes." I have found this extremely helpful. (Of course, being able to knock on Henriette's door and get her to explain something to me is even more valuable :).)

After that, I'd look at "Scattering Amplitudes and the Positive Grassmannian" http://arxiv.org/abs/1212.5605 . This is long and hard, but has the advantage that it is written down in full detail, unlike the current subject which only exists in lecture notes.

At this point, you will have caught up to me, so I'm not sure I can advise you on how to go further. However, I will say that I find Arkani-Hamed's co-author, Trnka, much more understandable than Arkani-Hamed. These lecture notes http://wwwth.mpp.mpg.de/members/strings/amplitudes2013/amplitudes_files/program/Talks/WE/Trnka.pdf are the clearest presentation of the amplituhedron material I have found yet.

**davids** on Useful Questions Repository · 2013-07-25T16:27:18.287Z · score: 18 (18 votes)

"What hidden obstacle could be causing my failures?"

My mental shorthand for this is the following experience: I try to pull open the silverware drawer. It jams at an inch open. I push it shut and try again, same result. I pull harder, it opens a tiny bit more before stopping.

Reflection: Some physical object is getting in the way of the motion. Something could be on the drawer track, but more likely it is inside the drawer. It is a rigid object, because I always stop at the same place, although slightly squashable, because pulling harder got the drawer open a tiny bit more. It is probably striking the inner wall of the cabinet in which the drawer is mounted. It is at an angle, because I can't see it when I look through the inch gap. There is a fork or knife angled up and poking against the inner wall. Digging around with my finger quickly finds a fork.

Since then, I've brought up this question by asking myself, "What is the fork in the drawer?"

For example, my linear algebra students generally seem smart and attentive, but they become confused whenever I do a detailed computation with inner products. After some thought about which computations confuse them, I hypothesized that whoever taught them basic matrix manipulations didn't teach the "transpose" operator, and in particular didn't teach the rule (AB)^T = B^T A^T. The gap was fixed very quickly. (Of course, I also try to encourage them to ask questions about what confuses them, but I think that it is impossible to ever get a class comfortable enough questioning you to not need to think on your own about what is the underlying difficulty causing confusion.)
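For readers who want to see the rule in action, here is a quick numerical check in Python with plain lists (`matmul` and `transpose` are illustrative helpers, not a library API). Non-square matrices show why the order must flip: with A of shape 2×3 and B of shape 3×2, A^T B^T is 3×3 and cannot equal (AB)^T, which is 2×2.

```python
def transpose(M):
    """Rows become columns."""
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
B = [[7, 8], [9, 10], [11, 12]]   # 3 x 2

print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True
print(len(matmul(transpose(A), transpose(B))))  # 3: A^T B^T is 3 x 3, not 2 x 2
```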

**davids** on Harry Potter and the Methods of Rationality discussion thread, part 24, chapter 95 · 2013-07-19T16:27:53.242Z · score: 0 (0 votes)

So, what do you all think is Voldemort's goal here? In canon, he was a power hungry sadist, so conquering the world while torturing his minions made sense. But MOR!Voldemort seems to find people tiresome and is happiest as an immortal in lifeless space. In that case, why not Horcrux Pioneer 11, kill his earthly body and be done with it?

At the moment, he has a plausible motivation -- provoke Harry into discovering a better form of immortality than Horcruxes, and use it for himself. But it seems implausible that this was his goal until Harry came to Hogwarts this year since he had no reason to expect that his murder of James and Lily would lead to a combined scientific genius/wizard. What was he trying to accomplish before that?

**davids** on The flawed Turing test: language, understanding, and partial p-zombies · 2013-05-22T21:45:09.961Z · score: 10 (10 votes)

I remember hearing the story of a mathematical paper published in English but written by a Frenchman, containing the footnotes:

¹ I am grateful to Professor Littlewood for helping me translate this paper into English.²

² I am grateful to Professor Littlewood for helping me translate this footnote into English.³

³ I am grateful to Professor Littlewood for helping me translate this footnote into English.

Why was no fourth footnote necessary?

**davids** on Reflection in Probabilistic Logic · 2013-05-06T15:26:02.878Z · score: 1 (1 votes)

Other nitpicks (which I don't think are real problems):

If the Wikipedia article on Kakutani's fixed point theorem is to be believed, then Kakutani's result is only for finite-dimensional vector spaces. You probably want to be citing either Glicksberg or Fan for the infinite-dimensional version. Each of these has some additional hypotheses, so you should check that they hold.

At the end of the proof of Theorem 2, you want to check that the graph of the correspondence is closed. Call that graph $\Gamma$. What you check is that, if a sequence of points in $\Gamma$ approaches a limit, then that limit is in $\Gamma$. This set off alarm bells in my head, because there are examples of a topological space $X$ and a subspace $S$ such that $S$ is not closed in $X$ but, if $(s_n)$ is any sequence in $S$ which approaches a limit in $X$, then that limit is in $S$. See Wikipedia's article on sequential spaces. However, this is not an actual problem. Since the set of sentences is countable, the product of copies of $[0,1]$ indexed by the sentences is metrizable, and therefore closure is the same as sequential closure there.

**davids** on Reflection in Probabilistic Logic · 2013-05-06T14:58:26.304Z · score: 1 (1 votes)

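In the proof of Theorem 2, you write that the set of distributions satisfying the reflection principle is "clearly" convex. This isn't clear to me; could you explain what I am missing?

More specifically, let $X(\phi,a,b)$ be the subset of distributions $\mathbb{P}$ obeying $a < \mathbb{P}(\phi) < b \implies \mathbb{P}\left( a < \mathbb{P}(\lceil \phi \rceil) < b \right) = 1$. So the set in question is $\bigcap_{\phi,a,b} X(\phi,a,b)$. If each $X(\phi,a,b)$ were convex, then the intersection would be as well.

But $X(\phi,a,b)$ is not convex. Project $X(\phi,a,b)$ onto $[0,1]^2$ in the coordinates corresponding to the sentences $\phi$ and $a < \mathbb{P}(\lceil \phi \rceil) < b$. The image is $\left( [0,a] \times [0,1] \right) \cup \left( [0,1] \times \{ 1 \} \right) \cup \left( [b,1] \times [0,1] \right)$. This is not convex: for $a < 1/2 < b$ it contains $(0,0)$ and $(1,0)$ but not their midpoint $(1/2,0)$.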
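The non-convexity of the two-coordinate projection can be checked mechanically. This Python sketch encodes only the set of pairs $(x, y)$ in $[0,1]^2$ such that $a < x < b$ implies $y = 1$, where $x$ stands for the probability of $\phi$ and $y$ for the probability of the sentence $a < \mathbb{P}(\lceil \phi \rceil) < b$. The choice $a = 1/4$, $b = 3/4$ and the helper name `in_projection` are illustrative assumptions.

```python
def in_projection(x, y, a, b):
    """Membership in {(x, y) in [0,1]^2 : a < x < b implies y == 1}."""
    return not (a < x < b) or y == 1

a, b = 0.25, 0.75
p, q = (0.0, 0.0), (1.0, 0.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)   # (0.5, 0.0)
print(in_projection(*p, a, b), in_projection(*q, a, b), in_projection(*mid, a, b))
# True True False
```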

Of course, the fact that the $X(\phi,a,b)$ are not convex doesn't mean that their intersection isn't, but it makes it non-obvious to me why the intersection should be convex. Thanks!

**davids** on Logical Pinpointing · 2012-11-02T18:33:40.041Z · score: 0 (0 votes)

This is pretty close to how I remember the discussion in GEB. He has a good discussion of non-Euclidean geometry. He emphasizes that originally the negation of the Parallel Postulate was viewed as absurd, but that now we can understand that the non-Euclidean axioms are perfectly reasonable statements which describe something other than the plane geometry we are used to. Later he has a bit of a discussion of what a model of PA + NOT(CON(PA)) would look like. I remember finding it pretty confusing, and I didn't really know what he was getting at until I read some actual logic textbooks. But he did get across the idea that the axioms would still describe something, but that something would be larger and stranger than the integers we think we know.