## Posts

## Comments

**alex_zag_al**on Belief in Self-Deception · 2017-11-05T23:13:59.943Z · LW · GW

I think that someone who merely believed they were happy, and then experienced real happiness, would not want to go back.

**alex_zag_al**on The correct response to uncertainty is *not* half-speed · 2017-02-24T16:32:40.312Z · LW · GW

There's an important category of choices: the ones where any good choice is "acting as if" something is true.

That is, there are two possible worlds. And there's one choice best if you knew you were in world 1, and another choice best if you knew you were in world 2. And, in addition, under any probabilistic mixture of the two worlds, one of those two choices is still optimal.

The hotel example falls into this category. So, one of the important reasons to recognize this category is to avoid a half-speed response to uncertainty.
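A minimal sketch of this category, assuming made-up payoffs (the action names and numbers below are hypothetical, not taken from the hotel example). It checks that for every probability of world 1, the best action is one of the two "acting as if" choices and never the half-speed hedge.

```python
# Hypothetical payoffs for illustration: (utility in world 1, utility in world 2).
PAYOFFS = {
    "act as if world 1": (10.0, 0.0),
    "act as if world 2": (0.0, 10.0),
    "half-speed hedge": (4.0, 4.0),
}


def expected_utility(action, p_world1):
    """Expected utility of an action when world 1 has probability p_world1."""
    u1, u2 = PAYOFFS[action]
    return p_world1 * u1 + (1.0 - p_world1) * u2


def best_action(p_world1):
    """Action with the highest expected utility under the given mixture."""
    return max(PAYOFFS, key=lambda action: expected_utility(action, p_world1))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(p, best_action(p))
```

With these numbers the hedge never wins, because the better of the two committed actions is always worth at least 5. If the hedge paid 6 in both worlds instead, it would win near p = 0.5, and the problem would fall outside the category.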

Many choices don't fall into this category. You can tell because in many decision-making problems, gathering more information is a good decision. But, this is never acting as if you knew one of the possibilities for certain.

Arguably in your example, information-seeking actually *was* the best solution: pull over and take out a map or use a GPS.

It seems like another important category of choices is those where the best option is trying the world 1 choice for a specified amount of time and then trying the world 2 choice. Perhaps these are the choices where the best source of information is observing whether something works? Reminds me of two-armed bandit problems, where acting-as-if and investigating manifest in the same kind of choice (pulling a lever).

**alex_zag_al**on The price you pay for arriving to class on time · 2017-02-24T14:52:14.480Z · LW · GW

Yeah. I mean, I'm not saying you should arrive late to class.

The way to work what you're saying into the framework is:

The cost of consistently arriving late is high

The cost (in minutes spent waiting for the class to start) of avoiding consistent lateness is less high

Therefore, you should pay this cost in minutes spent waiting

The point is to quantify the price, not to say you shouldn't pay it.

**alex_zag_al**on Fact Posts: How and Why · 2017-02-23T07:39:32.508Z · LW · GW

the soft sciences have to deal with situations which never exactly repeat

This is also true of evolutionary biology--I think it's not widely recognized that evolutionary biology is like the soft sciences in this way.

**alex_zag_al**on Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” · 2017-02-23T06:09:55.154Z · LW · GW

iii. Emphasize all rationality use cases evenly. Cause all people to be evenly targeted by CFAR workshops. We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.

This surprised me, since I think of rationality as the general principles of truth-finding.

What have you found about the degree to which rationality instruction needs to be tailored to a use-case?

**alex_zag_al**on Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” · 2017-02-23T06:09:11.160Z · LW · GW

Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I don't think that AI safety is important, which I guess makes me one of the "more varied participants made wary by an explicit focus on AI." Happy you're being explicit about your goals but I don't like them.

**alex_zag_al**on Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality” · 2017-02-23T06:07:20.268Z · LW · GW

Wow, I've read the story but I didn't quite realize the irony of it being a textbook (not a curriculum, a textbook, right?) about judgment and decision making.

**alex_zag_al**on When (Not) To Use Probabilities · 2016-05-01T01:45:41.706Z · LW · GW

The alternative I would propose, in this particular case, is to debate the general rule of banning physics experiments because you cannot be absolutely certain of the arguments that say they are safe.

Giving up on debating the probability of a particular proposition, and shifting to debating the merits of a particular rule, is I feel one of the ideas behind frequentist statistics. Like, I'm not going to say anything about whether the true mean is in my confidence interval in this particular case. But note that using this confidence interval formula works pretty well on average.
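The "works pretty well on average" claim can be checked by simulation. The sketch below is an illustration under assumptions: normally distributed data, a normal-approximation 95% interval, and arbitrary values for the true mean, spread, and sample size. It says nothing about any single interval, only how often the rule captures the true mean across many repetitions.

```python
import random
import statistics


def confidence_interval(sample, z=1.96):
    """Normal-approximation 95% confidence interval for the mean."""
    mean = statistics.fmean(sample)
    half_width = z * statistics.stdev(sample) / len(sample) ** 0.5
    return mean - half_width, mean + half_width


def coverage(true_mean=3.0, sigma=2.0, n=50, trials=2000, seed=0):
    """Fraction of repeated experiments whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = confidence_interval(sample)
        hits += low <= true_mean <= high
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

The returned fraction should land close to 0.95, which is a statement about the rule rather than about any particular interval.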

**alex_zag_al**on Willing gamblers, spherical cows, and AIs · 2016-02-23T21:56:36.082Z · LW · GW

I don't know about the role of this assumption in AI, which is what you seem to care most about. But I think I can answer about its role in philosophy.

One thing I want from epistemology is a model of ideally rational reasoning, under uncertainty. One way to eliminate a lot of candidates for such a model is to show that they make some kind of obvious mistake. In this case, the mistake is judging something as a good bet when really it is guaranteed to lose money.
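A small worked example of that mistake, with hypothetical credences: an agent who assigns 0.6 to A and 0.6 to not-A regards both one-unit tickets as fairly priced at 0.6, buys both, and loses money whichever way A turns out.

```python
def net_payoffs(credence_a, credence_not_a, stake=1.0):
    """Net payoff in each outcome for an agent who buys both tickets.

    Each ticket pays `stake` if its proposition is true, and the agent pays
    credence * stake for it. Exactly one of the two tickets pays out.
    """
    cost = (credence_a + credence_not_a) * stake
    return {"A true": stake - cost, "A false": stake - cost}


if __name__ == "__main__":
    print(net_payoffs(0.6, 0.6))  # incoherent credences: a loss in both outcomes
    print(net_payoffs(0.6, 0.4))  # coherent credences: break even in both
```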

**alex_zag_al**on Examples of Rationality Techniques adopted by the Masses · 2016-02-14T14:45:26.539Z · LW · GW

Inquiring after the falsifiability of a theory?

Not perfect but very good, and pretty popular.

**alex_zag_al**on When Science Can't Help · 2015-11-14T11:13:31.333Z · LW · GW

After a few years in grad school, I think the principles of science are different from what you've picked up from your own sources.

In particular, this stands out to me as incorrect:

(1) I had carefully followed everything I'd been told was Traditionally Rational, in the course of going astray. For example, I'd been careful to only believe in stupid theories that made novel experimental predictions, e.g., that neuronal microtubules would be found to support coherent quantum states.

My training in writing grant applications contradicts this depiction of science. A grant has an introduction that reviews the facts of the field. It is followed by your hypothesis, and the mark of a promising grant is that the hypothesis looks obvious given your depiction of the facts. In fact, it is best if your introduction causes the reader to think of the hypothesis themselves, and anticipate its introduction.

This key feature of a good hypothesis is totally separate from its falsifiability (important later in the application). And remember, the hypothesis has to appear obvious in the eyes of a senior in the field, since that's who judges your proposal. Can you say this for your stupid theory?

(2) Science would have been perfectly fine with my spending ten years trying to test my stupid theory, only to get a negative experimental result, so long as I then said, "Oh, well, I guess my theory was wrong."

Given the above, the social practice of science would not have *funded* you to work for ten years on this theory. And this reflects the social practice's implementation of the ideals of Science. The ideals say your hypothesis, while testable, is stupid.

I think you have a misconception about how science handles stupid testable ideas. However, I can't think of a way that this undermines this sequence, which is about how science handles rational untestable ideas.

EDIT: it seems poke said all this years ago.

**alex_zag_al**on The Power of Noise · 2015-11-14T09:39:19.339Z · LW · GW

Bayesian adaptive clinical trial designs place subjects in treatment groups based on a posterior distribution. (Clinical trials accrue patients gradually, so you don't have to assign the patients using the prior: you assign new patients using the posterior conditioned on observations of the current patients.)

These adaptive trials are, as you conjecture, much more efficient than traditional randomized trials.

Example: I-SPY 2. Assigns patients to treatments based on their "biomarkers" (biological measurements made on the patients) and the posterior derived from previous patients.

When I heard one of the authors explain adaptive trials in a talk, he said they were based on multi-armed bandit theory, with a utility function that combines accuracy of results with welfare of the patients in the trial.

However, unlike in conventional multi-armed bandit theory, the trial design still makes random decisions! The trials are still sort of randomized: "adaptively randomized," with patients having a higher chance of being assigned to certain groups than others, based on the current posterior distribution.
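One simple way to get assignment that is both adaptive and random is Thompson sampling: draw a response rate for each arm from its current posterior and assign the next patient to the arm with the highest draw. The sketch below is an illustration of that idea only. It is not the I-SPY 2 design, and the response rates, uniform Beta(1, 1) priors, and patient count are assumptions.

```python
import random


def thompson_trial(true_response_rates, n_patients=500, seed=0):
    """Assign patients to arms by sampling from Beta posteriors.

    Returns the number of patients assigned to each arm.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per arm, stored as [alpha, beta].
    posteriors = [[1, 1] for _ in true_response_rates]
    assignments = [0] * len(true_response_rates)
    for _ in range(n_patients):
        # Random draw from each posterior: better-looking arms usually win,
        # but every arm keeps some chance of being chosen.
        draws = [rng.betavariate(a, b) for a, b in posteriors]
        arm = max(range(len(draws)), key=draws.__getitem__)
        assignments[arm] += 1
        responded = rng.random() < true_response_rates[arm]
        posteriors[arm][0 if responded else 1] += 1
    return assignments


if __name__ == "__main__":
    print(thompson_trial([0.3, 0.6]))
```

Under this scheme each arm's chance of receiving the next patient equals the current posterior probability that it is the best arm, so more patients tend to end up on the better treatment as evidence accumulates.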

**alex_zag_al**on Permitted Possibilities, & Locality · 2015-11-14T09:14:48.616Z · LW · GW

Here are some things that shouldn't happen, on my analysis: An ad-hoc self-modifying AI as in (1) undergoes a cycle of self-improvement, starting from stupidity, that carries it up to the level of a very smart human - and then stops, unable to progress any further.

I'm sure this has been discussed elsewhere, but to me it seems possible that progress may stop when the mind becomes too complex to make working changes to.

I used to think that a self-improving AI would foom because as it gets smarter, it gets easier for it to improve itself. But it may get harder for it to improve itself, because as it self-improves it may turn itself into more and more of an unmaintainable mess.

What if creating unmaintainable messes is the *only way* that intelligences up to very-smart-human-level know how to create intelligences up to very-smart-human level? That would make that level a hard upper limit on a self-improving AI.

**alex_zag_al**on Against the Bottom Line · 2015-11-01T18:06:07.360Z · LW · GW

As I understand the post, its idea is that a rationalist should never "start with a bottom line and then fill out the arguments".

I disagree. The idea, rather, is that your beliefs are as good as the algorithm that fills out the bottom line. Doesn't mean you shouldn't start by filling out the bottom line; just that you shouldn't do it by thinking of what feels good or what will win you an argument or by any other algorithm only weakly correlated with truth.

Also, note that if what you write above the bottom line can change the bottom line, that's part of the algorithm too. So, actually, I *do* agree that a rationalist should not write the bottom line, look for a chain of reasoning that supports it, and refuse to change the bottom line if the reasoning doesn't.

**alex_zag_al**on Don't You Care If It Works? - Part 1 · 2015-07-30T13:36:34.071Z · LW · GW

By trusting Eliezer on MWI, aren't you trusting both his epistemology and his mathematical intuition?

Eliezer believes that the MWI interpretation allows you to derive quantum physics without any additional hypotheses that add complexity, such as collapse or the laws of movement for Bohm's particles. But this belief is based on mathematical intuition, according to the article on the Born probabilities. Nobody knows *how* to derive the observations without additional hypotheses, but a lot of people such as Eliezer conjecture it's possible. Right?

I feel like this point must have been made many times before, as Eliezer's quantum sequence has been widely discussed, so maybe instead of a response I need a link to a previous conversation or a summary of previous conclusions.

But relating it to the point of your article... If Eliezer is wrong about quantum mechanics, should that lower my probability that his other epistemological views are correct? This is important because it affects whether or not I bother learning those views. The answer is "yes but not extremely", because I think if there's an error, it may be in the mathematical intuition.

To generalize a bit, it's hard to find pure tests of a single ability. Though your example of stopping rules is actually a pretty good one, for understanding the meaning of all the probability symbols. But usually we should not be especially confused when someone with expertise is wrong about a single thing, since that single thing is probably not a pure test of that expertise. However we *should* be confused if on average they are wrong as many times as people without the expertise. Then we have to doubt the expertise or our own judgments of the right and wrong answers.

**alex_zag_al**on Training Reflective Attention · 2015-01-08T23:13:10.626Z · LW · GW

This reminds me of hitting Ctrl+C, but on a thought process or object of focus instead of a program. After reading, I do it when I suspect I'm about to voluntarily do something I'm going to regret.

EDIT: At least, I think I'm doing it... I haven't done any training approaching the amount of time the training in your post takes.

**alex_zag_al**on Information theory and the symmetry of updating beliefs · 2014-11-22T19:37:52.996Z · LW · GW

Yes... if a theory adds to the surprisal of an experimental result, then the experimental result adds precisely the same amount to the surprisal of the theory. That's interesting.
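The symmetry follows from Bayes' theorem, since P(E|T)/P(E) = P(T|E)/P(T). A quick numerical check, with hypothetical probabilities chosen only for illustration:

```python
import math


def surprisal_bits(p):
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)


# Hypothetical numbers for illustration.
p_theory = 0.2
p_result_given_theory = 0.9
p_result_given_not_theory = 0.3

p_result = (p_theory * p_result_given_theory
            + (1 - p_theory) * p_result_given_not_theory)
p_theory_given_result = p_theory * p_result_given_theory / p_result

# How much assuming the theory changes the surprisal of the result...
change_in_result_surprisal = (surprisal_bits(p_result_given_theory)
                              - surprisal_bits(p_result))
# ...and how much observing the result changes the surprisal of the theory.
change_in_theory_surprisal = (surprisal_bits(p_theory_given_result)
                              - surprisal_bits(p_theory))

if __name__ == "__main__":
    print(change_in_result_surprisal, change_in_theory_surprisal)
```

The two changes agree in sign as well as size. In this example both are negative, because the theory makes the result less surprising.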

**alex_zag_al**on Information theory and the symmetry of updating beliefs · 2014-11-22T05:32:31.380Z · LW · GW

Much like how inches and centimeters are off by a constant factor. Different log bases are analogous to different units.
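To make the unit analogy concrete: surprisal in bits and surprisal in nats differ by the fixed factor 1/ln 2, whatever the probability. The helper name below is just for illustration.

```python
import math


def surprisal(p, base):
    """Surprisal of probability p, measured with logarithms of the given base."""
    return -math.log(p) / math.log(base)


# Conversion factor between units, playing the role of 2.54 cm per inch.
BITS_PER_NAT = 1 / math.log(2)

if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(p, surprisal(p, 2), surprisal(p, math.e) * BITS_PER_NAT)
```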

**alex_zag_al**on Unfriendly Natural Intelligence · 2014-11-16T02:00:47.974Z · LW · GW

The fear of cults, and the related fear of cults of personality, are antimemes against excessive awe of persons.

**alex_zag_al**on The Truth and Instrumental Rationality · 2014-11-09T02:14:12.036Z · LW · GW

(last time I heard the word "jungle" was a Peruvian guy saying his dad grew up in the jungle and telling me about Peruvian native marriage traditions)

**alex_zag_al**on The Truth and Instrumental Rationality · 2014-11-08T20:59:33.013Z · LW · GW

Well that was a straightforward answer.

**alex_zag_al**on The Truth and Instrumental Rationality · 2014-11-08T20:33:46.878Z · LW · GW

The metaphor's going over my head. Don't feel obligated to explain though, I'm only mildly curious. But know that it's not obvious to everyone.

**alex_zag_al**on The Truth and Instrumental Rationality · 2014-11-03T15:11:31.851Z · LW · GW

...my suggestion is that truth-seeking (science etc) has increased in usefulness over time, whereas charisma is probably roughly the same as it has been for a long time.

Yes, and I think it's a good suggestion. I think I can phrase my real objection better now.

My objection is that I don't think this article gives any *evidence* for that suggestion. The historical storytelling is a nice *illustration*, but I don't think it's *evidence*.

I don't think it's evidence because I don't expect evolutionary reasoning at this shallow a depth to produce reliable results. Historical storytelling can justify all sorts of things, and if it justifies your suggestion, that doesn't really mean anything to me.

A link to a more detailed evolutionary argument written by someone else, or even just a link to a Wikipedia article on the general concept, would have changed this. But what's here is just evolutionary/historical storytelling like I've seen justifying all sorts of incorrect conclusions, and the only difference is that I happen to agree with the conclusion.

If you just want to illustrate something that you expect your readers to already believe, this is fine. If you want to convince anybody you'd need a different article.

**alex_zag_al**on Rationality Quotes November 2014 · 2014-11-01T21:40:09.647Z · LW · GW

This is from a novel (Three Parts Dead by Max Gladstone). The situation is a man and a woman who have to work together but have trouble trusting each other because of propaganda from an old war:

[Abelard] hesitated, suddenly aware that he was alone with a woman he barely trusted, a woman who, had they met only a few decades before, would have tried to kill him and destroy the gods he served. Tara hated propaganda for this reason. Stories always outlasted their usefulness.

**alex_zag_al**on Rationality Quotes November 2014 · 2014-11-01T21:19:28.842Z · LW · GW

Colin Howson, talking about how Cox's theorem bears the mark of Cox's training as a physicist (source):

An alternative approach is to start immediately with a quantitative notion and think of general principles that any acceptable numerical measure of uncertainty should obey. R.T. Cox and I.J. Good, working independently in the mid nineteen-forties, showed how strikingly little in the way of constraints on a numerical measure yield the finitely additive probability functions as canonical representations. It is not just the generality of the assumptions that makes the Cox–Good result so significant: unlike some of those which have to be imposed on a qualitative probability ordering, the assumptions used by Cox and to a somewhat lesser extent Good seem to have the property of being *uniformly* self-evidently analytic principles of numerical epistemic probability *whatever particular scale it might be measured in*. Cox was a working physicist and his point of departure was a typical one: to look for *invariant principles*:

To consider first ... what principles of probable inference will hold however probability is measured. Such principles, if there are any, will play in the theory of probable inference a part like that of Carnot’s principle in thermodynamics, which holds for all possible scales of temperature, or like the parts played in mechanics by the equations of Lagrange and Hamilton, which have the same form no matter what system of coordinates is used in the description of motion. [Cox 1961]

**alex_zag_al**on The Truth and Instrumental Rationality · 2014-11-01T21:01:10.617Z · LW · GW

I like this post, I like the example, I like the point that science is newer than debate and so we're probably more naturally inclined to debate. I don't like the apparently baseless storytelling.

In the jungle of our evolutionary childhood, humanity formed groups to survive. In these groups there was a hierarchy of importance, status and power. Predators, starvation, rival groups and disease all took the weak on a regular basis, but the groups afforded a partial protection. However, a violent or unpleasant death still remained a constant threat. It was of particular threat to the lowest and weakest members of the group. Sometimes these individuals were weak because they were physically weak. However, over time groups that allowed and rewarded things other than physical strength became more successful. In these groups, discussion played a much greater role in power and status. The truly strong individuals, the winners in this new arena, were ones that could direct conversation in their favour - conversations about who will do what, about who got what, and about who would be punished for what. Debates were fought with words, but they could end in death all the same.

I don't know much about the environment of evolutionary adaptation, but it sounds like you don't either. Jungle? Didn't we live on the savannah? And forming groups for survival, it seems just as plausible that we formed groups for availability of mates.

If you don't know what the EEA was like, why use it as an example? All you really know is about the modern world. I think reasoning about the modern world makes your point quite well in fact. There are still plenty of people living and dying dependent on their persuasive ability. For example, Adolf Hitler lived while Ernst Rohm died. And we can guess that it's been like this since the beginning of humanity and that this has bred us to have certain behaviors.

I think this reasoning is a lot more reliable, in fact, than imagining what the EEA was like without any education in the subject.

Maybe I'm being pedantic--the middle of the post is structured as a story, a chronology. It definitely reads nicely that way.

**alex_zag_al**on Applications of logical uncertainty · 2014-10-24T20:04:29.271Z · LW · GW

Hmm. Yeah, that's tough. What do you use to calculate probabilities of the principles of logic you use to calculate probabilities?

Although, it seems to me that a bigger problem than the circularity is that I don't know what kinds of things are evidence for principles of logic. At least for the probabilities of, say, mathematical statements, conditional on the principles of logic we use to reason about them, we have some idea. Many consequences of a generalization being true are evidence for a generalization, for example. A proof of an analogous theorem is evidence for a theorem. So I can see that the kinds of things that are evidence for mathematical statements are other mathematical statements.

I don't have nearly as clear a picture of what kinds of things lead us to accept principles of logic, and what kind of statements they are. Whether they're empirical observations, principles of logic themselves, or what.

**alex_zag_al**on Applications of logical uncertainty · 2014-10-23T01:41:15.174Z · LW · GW

Do you know of any cases where this simulation-seeded Gaussian Process was then used as a prior, and updated on empirical data?

Like...

uncertain parameters --simulation--> distribution over state

noisy observations --standard bayesian update--> refined distribution over state
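A toy version of that pipeline, to show the shape of what is being asked about. Everything here is an assumption for illustration: the simulator is a made-up compound-growth function, the simulated states are summarized by a normal distribution instead of a Gaussian process, and the update is the conjugate normal one.

```python
import random
import statistics


def simulate(growth_rate, steps=10, initial=1.0):
    """Hypothetical toy simulator: compound growth over a fixed number of steps."""
    return initial * (1.0 + growth_rate) ** steps


def prior_from_simulation(n_runs=5000, seed=0):
    """Push parameter uncertainty through the simulator; return (mean, variance)."""
    rng = random.Random(seed)
    states = [simulate(rng.gauss(0.05, 0.01)) for _ in range(n_runs)]
    return statistics.fmean(states), statistics.variance(states)


def normal_update(prior_mean, prior_var, observation, noise_var):
    """Conjugate Bayesian update: normal prior, normal observation noise."""
    gain = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + gain * (observation - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


if __name__ == "__main__":
    mean, var = prior_from_simulation()
    print(normal_update(mean, var, observation=1.8, noise_var=0.01))
```

The posterior mean falls between the simulation-based prior mean and the observation, and the posterior variance is smaller than the prior variance.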

Cari Kaufman's research profile made me think that's something she was interested in. But I haven't found any publications by her or anyone else that actually do this.

I actually think that I misread her research description, latching on to the one familiar idea.

**alex_zag_al**on Post ridiculous munchkin ideas! · 2014-10-19T17:44:37.559Z · LW · GW

This reminds me of the story of Robert Edgar, who created the DNA and protein sequence alignment program MUSCLE.

He got a PhD in physics, but considers that a mistake. He did his bioinformatics work after selling a company and having free time. The bioinformatics work was notable enough that it's how I know of him.

His blog post, from which I learned this story: https://thewinnower.com/discussions/an-unemployed-gentleman-scholar

**alex_zag_al**on Logical uncertainty reading list · 2014-10-19T16:44:49.693Z · LW · GW

added, with whatever little bits of summary I could get by skimming.

**alex_zag_al**on Applications of logical uncertainty · 2014-10-19T16:26:38.347Z · LW · GW

It's true that this is a case of logical uncertainty.

However, I must add that in most of my examples, I bring up the benefits of a *probabilistic representation*. Just because you have logical uncertainty doesn't mean you need to represent it with probability theory.

In protein structure, we already have these Bayesian methods for inferring the fold, so the point of the probabilistic representation is to plug it into these methods as a prior. In philosophy, we want ideal rationality, which suggests probability. In automated theorem proving... okay, yeah, in automated theorem proving I can't explain why you'd want to use probability theory in particular.

But yes. If you had a principled way to turn your background information and already done computations into a probability distribution for future computations, you could use that for AI search problems. And optimization problems. Wow, that's a lot of problems. I'm not sure how it would stack up against other methods, but it'd be interesting if that became a paradigm for at least some problems.

In fact, now that you've inspired me to look for it, I find that it's being done! Not with the approach of coming up with a distribution over all mathematical statements that you see in Christiano's report, and which is the approach I had in mind when writing the post. But rather, with an approach like what Cari Kaufman I think uses, where you guess based on nearby points. Which is accomplished by modeling a difficult-to-evaluate function as a stochastic process with some kind of local correlations, like a Gaussian process, so that you get probability distributions for the values of the function at each point. What I'm finding is that this is, in fact, an approach people use to optimizing difficult-to-evaluate objective functions. See here for the details: Efficient Global Optimization of Expensive Black-Box Functions, by Jones, Schonlau and Welch.
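The criterion usually associated with that paper is expected improvement: given the surrogate model's predictive mean and standard deviation at a candidate point, how much better than the best value so far do we expect that point to be? A minimal sketch for minimization, assuming a normal predictive distribution. Fitting the Gaussian process itself is left out, and the function name is mine.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over best_so_far when minimizing.

    `mean` and `std` describe the surrogate's normal predictive distribution
    for the objective at a candidate point.
    """
    if std == 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best_so_far - mean) * cdf + std * pdf


if __name__ == "__main__":
    # Same predicted mean, different uncertainty.
    print(expected_improvement(1.0, 0.1, best_so_far=0.0))
    print(expected_improvement(1.0, 1.0, best_so_far=0.0))
```

At the same predicted mean, a more uncertain point scores higher, which is how this kind of search trades off exploring against exploiting.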

**alex_zag_al**on Outside View(s) and MIRI's FAI Endgame · 2014-10-19T14:13:31.045Z · LW · GW

They wouldn't classify their work that way, and in fact I thought that was the whole point of surveying these other fields. Like, for example, a question for philosophers in the 1600s is now a question for biologists, and that's why we have to survey biologists to find out if it was resolved.

**alex_zag_al**on Applications of logical uncertainty · 2014-10-19T13:30:43.724Z · LW · GW

Yes. Because, we're trying to express uncertainty about the consequences of axioms. Not about axioms themselves.

common_law's thinking does seem to be something people actually do. Like, we're uncertain about the consequences of the laws of physics, while simultaneously being uncertain of the laws of physics, while simultaneously being uncertain if we're thinking about it in a logical way. But, it's not the kind of uncertainty that we're trying to model, in the applications I'm talking about. The missing piece in these applications is probabilities *conditional* on axioms.

**alex_zag_al**on Logical uncertainty reading list · 2014-10-19T07:27:53.948Z · LW · GW

Nice. Links added to post and I'll check them out later. The Duc and Williamson papers were from a post of yours, by the way. Some MIRI status report or something. I don't remember.

**alex_zag_al**on A Limited But Better Than Nothing Way To Assign Probabilities to Statements of Logic, Arithmetic, etc. · 2014-10-18T16:43:44.385Z · LW · GW

I now think you're right that logical uncertainty doesn't violate any of Jaynes's desiderata. Which means I should probably try to follow them more closely, if they don't create problems like I thought they would.

An Aspiring Rationalist's Ramble has a post asserting the same thing, that nothing in the desiderata implies logical omniscience.

**alex_zag_al**on Applied Bayes' Theorem: Reading People · 2014-10-18T15:38:42.555Z · LW · GW

Here, the author is keeping in mind Conservation of Expected Evidence. If you could anticipate in advance the direction of any update, you should just update now. You should not expect to be able to get the right answer right away and never need to seriously update it.

There has to be a better way to put this.

The problem is that sometimes you *can* anticipate the direction. For example, if someone's flipping a coin, and you think it might have two heads. This is a simple example because a heads is always evidence in favor of the two-heads hypothesis, and a tails is always evidence in favor of the normal-coin hypothesis. We can see you become sure of the direction of evidence in this scenario: If the prior prob of two heads is 1/2, then after about ten heads you're 99% sure the eleventh is also going to be heads.
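The numbers in the coin example can be worked out exactly. This sketch assumes the alternative to the two-headed coin is a fair coin and uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def posterior_two_headed(n_heads, prior=Fraction(1, 2)):
    """P(two-headed | n_heads heads in a row and no tails)."""
    like_two_headed = Fraction(1)
    like_fair = Fraction(1, 2) ** n_heads
    evidence = prior * like_two_headed + (1 - prior) * like_fair
    return prior * like_two_headed / evidence


def prob_next_heads(n_heads, prior=Fraction(1, 2)):
    """Predictive probability that the next flip is heads."""
    p = posterior_two_headed(n_heads, prior)
    return p + (1 - p) * Fraction(1, 2)


if __name__ == "__main__":
    for n in (0, 5, 6, 10):
        print(n, posterior_two_headed(n), float(prob_next_heads(n)))
```

Under these assumptions, ten heads gives a probability of 2049/2050 that the eleventh is heads, and the 99% mark is already passed after six.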

However, I *do* think that this is just because of very artificial features of the example that would never hold when making first impressions of people. Specifically, what's going on in the coin example is a hypothesis that we're very sure of, that makes very specific predictions. I can't prove it, but I think that's what allows you to be very sure of the update direction.

This never happens in social situations where you've just recently met someone--you're never sure of a hypothesis that makes very specific predictions, are you?

I don't know. I do know that there's some element of the situation besides conservation of probability going into this. It takes more than just that to derive that updates will be gradual and in an unpredictable direction.

(EDIT: I didn't emphasize this but updates aren't necessarily gradual in the coin example--a tails leads to an extreme update. I think that might be related--an extreme update in an unexpected direction balancing a small one in a known direction?)
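That balancing can be checked directly. In the sketch below, which again assumes a fair coin as the alternative, the likely small update upward on heads and the unlikely collapse to zero on tails average out to exactly the current probability.

```python
from fractions import Fraction


def next_flip_update(p_two_headed):
    """Possible posteriors after one more flip, and their expectation.

    Returns (P(heads), posterior if heads, posterior if tails, expected posterior).
    """
    half = Fraction(1, 2)
    p_heads = p_two_headed + (1 - p_two_headed) * half
    post_if_heads = p_two_headed / p_heads  # small step up
    post_if_tails = Fraction(0)             # tails rules out two heads
    expected_posterior = (p_heads * post_if_heads
                          + (1 - p_heads) * post_if_tails)
    return p_heads, post_if_heads, post_if_tails, expected_posterior


if __name__ == "__main__":
    print(next_flip_update(Fraction(1024, 1025)))
```

So the direction of the next update can be nearly certain while the expected posterior still equals the current one, because the rare update in the other direction is large.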

**alex_zag_al**on The Level Above Mine · 2014-10-11T22:59:39.186Z · LW · GW

You'll note that I don't try to modestly say anything like, "Well, I may not be as brilliant as Jaynes or Conway, but that doesn't mean I can't do important things in my chosen field."

Because I do know... that's not how it works.

Maybe not in your field, but that is how it *usually* works, isn't it?

(the rest of this comment is basically an explanation of comparative advantage)

Anybody can take the load off of someone smarter, by doing the easiest tasks that have been taking their time.

As a most obvious example, a brilliant scientist's secretary. Another example: a brilliant statistician that employs a programmer, who turns his statistical and computational ideas into efficient, easy-to-use software. He doesn't have to be the best programmer, and doesn't have to be that great at statistics, but he allows the statistician to publish usable implementations of his statistical methods without having to code them himself.

Or, here's another one: I've heard MIRI needs a science writer, or needs funding for one. You don't have to be Eliezer Yudkowsky-level at thinking about FAI to save Yudkowsky the time it takes to write manuscripts that can be published in science journals, and then Yudkowsky uses that time for research.

This is "important work." It's not the kind of important work Jaynes or Conway does, and it doesn't put your name in the history books, and if that's what was meant by the article I have no disagreement. But by any *utilitarian* standard of importance, it's important.

**alex_zag_al**on Newcomblike problems are the norm · 2014-09-24T22:56:23.208Z · LW · GW

I don't understand this yet, which isn't too surprising since I haven't read the background posts yet. However, all the "roughly speaking" summaries of the more exact stuff are enough to show me that this article is talking about something I'm curious about, so I'll be reading in more detail later probably.

**alex_zag_al**on Logical uncertainty, kind of. A proposal, at least. · 2014-07-31T15:44:11.767Z · LW · GW

This is counterintuitive in an interesting way.

You'd think that since P(Q1|~∀xQx) = 1/2 and P(Q1|∀xQx) = 1, observing Q1 is evidence in favor of ∀xQx.

And it *is*, but the hidden catch is that this depends on the implication that ∀xQx->Q1, and that implication is exactly the same amount of evidence *against* ∀xQx.

It's also an amusing answer to the end of part 1 exercise.

**alex_zag_al**on Rationality Quotes May 2014 · 2014-05-22T02:50:49.797Z · LW · GW

I once had to go to the doctor so he could fish a lego out of my nose. So, that was worse than eating all the cabbage or spilling all the milk I think. More scary, and probably more expensive, depending on how the insurance worked out.

**alex_zag_al**on Truth: It's Not That Great · 2014-05-21T19:51:55.693Z · LW · GW

Truth is really important sometimes, but so far I've been bad about identifying when.

I know a fair bit about cognitive biases and ideal probabilistic reasoning, and I'm pretty good at applying it to scientific papers that I read or that people link through Facebook. But these applications are usually not important.

But, when it comes to my schoolwork and personal relationships, I commit the planning fallacy routinely, and make bad predictions against base rates. And I spend no time analyzing these kinds of mistakes or applying what I know about biases and probability theory.

If I really *operationalized* my belief that only some truths are important, I'd prioritize truths and apply my rationality knowledge to the top priorities. That would be awesome.

**alex_zag_al**on Truth: It's Not That Great · 2014-05-21T19:46:40.404Z · LW · GW

The first and third ones, about info sometimes being worthless, just made me think of Vaniver's article on value of information calculations. So, I mean, it sounded very LessWrongy to me, very much the kind of thing you'd hear here.

The second one made me think of nuclear secrets, which made me think of HPMOR. Again, it seems like the kind of thing that this community would recognize the value of.

I think my reactions to these were biased, though, by being told how I was expected to feel about them. I always like to subvert that, and feel a little proud of myself when what I'm reading fails to describe me.

**alex_zag_al**on The Joys of Conjugate Priors · 2014-04-22T22:58:06.397Z · LW · GW

I'm pretty sure that the Cauchy likelihood, like the other members of the t family, is a weighted mixture of normal distributions. (Gamma distribution over the inverse of the variance)

EDIT: There's a paper on this, "Scale mixtures of normal distributions" by Andrews and Mallows, if you want the details
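A quick numerical check of the mixture claim, as a rough sketch rather than a proof: draw a precision (inverse variance) from a Gamma distribution with shape 1/2 and rate 1/2, then draw a normal with that precision. The shape and rate values are the standard ones for a t distribution with one degree of freedom; the function name `sample_cauchy_via_mixture` and the sample size are just illustrative choices.

```python
import math
import random


def sample_cauchy_via_mixture(rng):
    """Draw one value by sampling a precision first, then a normal."""
    # Precision (inverse variance) ~ Gamma(shape=1/2, rate=1/2).
    # random.gammavariate takes (shape, scale), so scale = 1 / rate = 2.
    precision = rng.gammavariate(0.5, 2.0)
    # Normal with mean 0 and standard deviation 1 / sqrt(precision).
    return rng.gauss(0.0, 1.0 / math.sqrt(precision))


rng = random.Random(0)
draws = [sample_cauchy_via_mixture(rng) for _ in range(20000)]

# A standard Cauchy puts exactly half its mass in (-1, 1).
frac_inside = sum(abs(x) < 1.0 for x in draws) / len(draws)
print(f"fraction with |x| < 1: {frac_inside:.3f} (standard Cauchy: 0.500)")
```

If the mixture really is Cauchy, the printed fraction should sit close to one half, and the sample mean of `draws` should refuse to settle down as you increase the sample size.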

**alex_zag_al**on A Fervent Defense of Frequentist Statistics · 2014-02-10T09:18:16.785Z · LW · GW

Hmm. Considering that I was trying to come up with an example to illustrate how explicit the assumptions are, the assumptions aren't that explicit in my example, are they?

Prior knowledge about the world --> mathematical constraints --> prior probability distribution

The assumptions I used to get the constraints are that the best estimate of your next measurement is the average of your previous ones, and that the best estimate of its squared deviation from that average is some number s^2, maybe the variance of your previous observations. But those aren't states of the world, those are assumptions about your inference behavior.

Then I added later that the *real* assumptions are that you're making unbiased measurements of some unchanging quantity mu, and that the mechanism of your instrument is unchanging. These are facts about the world. But these are not the assumptions that I used to derive the constraints, and I don't show how they lead to the former assumptions. In fact, I don't think they do.

Well. Let me assure you that the assumptions that lead to the constraints are *supposed* to be facts about the world. But I don't see how that's supposed to work.

**alex_zag_al**on A Fervent Defense of Frequentist Statistics · 2014-02-10T08:38:40.362Z · LW · GW

it seems like a weird response to say "oh, well who cares about explicit assumptions anyways?"

Yeah, sorry. I was getting a little off topic there. It's just that in your post, you were able to connect the explicit assumptions being true to some kind of performance guarantee. Here I was musing on the fact that I couldn't. It was meant to undermine my point, not to support it.

What does it mean to "assume that the prior satisfies these constraints"?

?? The answer to this is so obvious that I think I've misunderstood you. In my example, the constraints are on moments of the prior density. In many other cases, the constraints are symmetry constraints, which are also easy to express mathematically.

But then you bring up concrete statements about the world? Are you asking how you get from your prior knowledge about the world to constraints on the prior distribution?

EDIT: you don't "assume a constraint", a constraint follows from an assumption. Can you re-ask the question?

**alex_zag_al**on A Fervent Defense of Frequentist Statistics · 2014-02-10T08:17:31.970Z · LW · GW

On the other hand, an argument I hear is that Bayesian methods make their assumptions explicit because they have an explicit prior. If I were to write this as an assumption and guarantee, I would write:

Assumption: The data were generated from the prior.

Guarantee: I will perform at least as well as any other method.

While I agree that this is an assumption and guarantee of Bayesian methods, there are two problems that I have with drawing the conclusion that “Bayesian methods make their assumptions explicit”. The first is that it can often be very difficult to understand how a prior behaves; so while we could say “The data were generated from the prior” is an explicit assumption, it may be unclear what exactly that assumption entails. However, a bigger issue is that “The data were generated from the prior” is an assumption that very rarely holds; indeed, in many cases the underlying process is deterministic (if you’re a subjective Bayesian then this isn’t necessarily a problem, but it does certainly mean that the assumption given above doesn’t hold).

In a post addressing a crowd where a lot of people read Jaynes, you're not addressing the Jaynesian perspective on where priors come from.

When E. T. Jaynes does statistics, the assumptions are made very clear.

The Jaynesian approach is this:

- your prior knowledge puts constraints on the prior distribution,
- and the prior distribution is the distribution of maximum entropy that meets all those constraints.

The assumptions are explicit because the constraints are explicit.

As an example, imagine that you're measuring something you've measured before. In a lot of situations, you'll end up with constraints on the mean and variance of your prior distribution. This is because you reason in the following way:

- You think that your best estimate for the next measurement is the average of your previous measurements.
- Your previous estimates have given you a sense of the general scale of the errors you get. You express this as an estimate of the squared difference between your estimated next measurement and your real next measurement.

If we take "best estimate" to mean "the estimate with minimum expected squared error", then these are constraints on the mean and variance of the prior distribution. If you maximize entropy subject to these constraints, you get a normal distribution. And that's your prior.
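To make the maxent step a bit more concrete, here is a small sketch that compares the differential entropy of three distributions sharing the same variance, using their textbook closed-form entropies. It only checks the normal against two hand-picked alternatives (uniform and Laplace, my choices for illustration), so it illustrates the claim rather than proving it.

```python
import math


def entropy_normal(sigma):
    """Differential entropy (nats) of a normal with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma**2)


def entropy_uniform(sigma):
    """Entropy of a uniform whose variance is sigma^2 (width = sigma * sqrt(12))."""
    width = sigma * math.sqrt(12)
    return math.log(width)


def entropy_laplace(sigma):
    """Entropy of a Laplace whose variance is sigma^2 (scale b = sigma / sqrt(2))."""
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)


sigma = 1.0  # the variance constraint; the mean does not affect the entropy
entropies = {
    "normal": entropy_normal(sigma),
    "uniform": entropy_uniform(sigma),
    "laplace": entropy_laplace(sigma),
}
for name, h in entropies.items():
    print(f"{name:8s} entropy = {h:.4f} nats")

best = max(entropies, key=entropies.get)
print("highest entropy among these:", best)
```

Changing `sigma` shifts all three entropies by the same amount, so the ordering does not depend on the particular variance you constrain to.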

The assumptions are quite explicit here. You've assumed that these measurements are measuring some unchanging quantity in an unbiased way, and that the magnitude of the error tends to stay the same. And you *haven't* assumed any other relationship between the next measurement and the previous ones. You definitely didn't assume any autocorrelation, for example, because you didn't use the previous measurement in any of your estimates/constraints.

But since we're not assuming that the data are generated from the prior, we don't have the corresponding guarantee. Which brings up the question, what do we have? What's the point of this?

A paper I'm reading right now (Portilla & Simoncelli, 2000, "A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients") puts it this way:

**Guarantee**: "The maximum entropy density is optimal in the sense that it does not introduce any constraints on the [prior] besides those [we assumed]."

It's not a performance guarantee. Just a guarantee against hidden assumptions, I guess?

Despite apparently having nothing to do with performance, it makes a lot of sense from within Jaynes's perspective. To him, probability theory is logic: figuring out which conclusions are supported by your premises. It just extends deductive logic, allowing you to see *to what degree* each conclusion is supported by your premises.

And, I think your criticism of the Dutch Book argument applies here: is your time better spent making sure you don't have any hidden assumptions, or writing more programs to make more money? But it definitely makes your assumptions explicit, that's just part of the whole "prob theory as logic" paradigm.

(But I don't really understand what it *means* to not introduce any constraints besides those assumed, or why maxent is the procedure that achieves this. That's why I quoted the guarantee from someone who I assume *does* understand)

**alex_zag_al**on Putting in the Numbers · 2014-02-03T05:24:31.808Z · LW · GW

okay, and you were just trying to make sure that Manfred knows that all this probability-of-distributions speech you're speaking isn't, as he seems to think, about the degree-of-belief-in-my-current-state-of-ignorance distribution for the first roll. Gotcha.

**alex_zag_al**on Putting in the Numbers · 2014-02-03T05:09:49.721Z · LW · GW

Okay... but do we agree that the degree-of-belief distribution for the first roll is (1/3, 1/3, 1/3), whether it's a fair die or a die that's completely biased in an unknown way?

Because I'm pretty sure that's what Manfred's talking about when he says

There is a single correct distribution for our starting information, which is (1/3,1/3,1/3),

and I think him going on to say

the "distribution across possible distributions" is just a delta function there.

was a mistake, because you were talking about different things.
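Here's a rough simulation of the (1/3, 1/3, 1/3) point, under an assumption I'm adding for illustration: "biased in an unknown way" is modeled as a bias vector drawn from a symmetric Dirichlet(1, 1, 1) prior. Any prior that treats the three faces symmetrically should give the same answer for the first roll; the names `sample_bias` and `first_roll_frequencies` are made up for this sketch.

```python
import random


def sample_bias(rng):
    """Draw a bias vector from a symmetric Dirichlet(1, 1, 1) (an assumed prior)."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(3)]
    total = sum(g)
    return [x / total for x in g]


def first_roll_frequencies(rng, n, fair):
    """Simulate n independent 'first rolls', each on a freshly drawn die."""
    counts = [0, 0, 0]
    for _ in range(n):
        probs = [1 / 3, 1 / 3, 1 / 3] if fair else sample_bias(rng)
        face = rng.choices([0, 1, 2], weights=probs)[0]
        counts[face] += 1
    return [c / n for c in counts]


rng = random.Random(0)
fair_freqs = first_roll_frequencies(rng, 30000, fair=True)
unknown_freqs = first_roll_frequencies(rng, 30000, fair=False)
print("fair die:         ", [round(f, 3) for f in fair_freqs])
print("unknown-bias die: ", [round(f, 3) for f in unknown_freqs])
```

Both rows should come out near one third per face. The two cases only come apart on later rolls, where the unknown-bias die lets earlier outcomes inform your predictions.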

EDIT:

I thought so too, which is why I asked him what he thought a delta function in the distribution space meant.

Ah. Yes. Okay. I am literally saying only things that you know, aren't I. My bad.

**alex_zag_al**on Foundations of Probability · 2014-02-02T05:09:49.253Z · LW · GW

I like your writing style. For something technical, it feels very personal. And you keep it very concise while also easy to read - is there a lot of trimming down that goes on, or do you just write it that way?

**alex_zag_al**on Putting in the Numbers · 2014-02-02T04:53:58.996Z · LW · GW

In my experience with Bayesian biostatisticians, they don't talk much about the information a prior represents. But they're also not just using common ones. They talk a lot about its "properties" - priors with "really nice properties". As far as I can tell, they mean two things:

- Computational properties
- The way the distribution shifts as you get evidence. They think about this in a lot of detail, and they like priors that lead to behavior they think is reasonable.

I think this amounts to the same thing. The way they think and infer about the problem is determined by their information. So, when they create a robot that thinks and infers in the same way, they are creating one with the same information as they have.

But, as a procedure for creating a prior that represents your information, it's very different from Jaynes's procedure. Jaynes's procedure being: state your prior information very precisely, and then find symmetries or constraints for maximum entropy, I guess.

I'm very happy you're writing about logical uncertainty btw, it's been on my mind a lot lately.