Using vs. evaluating (or, Why I don't come around here no more)

post by PhilGoetz · 2014-01-20T02:36:29.575Z · LW · GW · Legacy · 38 comments

[Summary: Trying to use new ideas is more productive than trying to evaluate them.]

I haven't posted to LessWrong in a long time. I have a fan-fiction blog where I post theories about writing and literature. Topics don't overlap at all between the two websites (so far), but I prioritize posting there much higher than posting here, because responses seem more productive there.

The key difference, I think, is that people who read posts on LessWrong ask whether they're "true" or "false", while the writers who read my posts on writing want to write. If I say something that doesn't ring true to one of them, he's likely to say, "I don't think that's quite right; try changing X to Y," or, "When I'm in that situation, I find Z more helpful", or, "That doesn't cover all the cases, but if we expand your idea in this way..."

Whereas on LessWrong a more typical response would be, "Aha, I've found a case for which your step 7 fails! GOTCHA!"

It's always clear from the context of a writing blog why a piece of information might be useful. It often isn't clear how a LessWrong post might be useful. You could blame the author for not providing that context. Or you could be proactive and provide it yourself: as you read a post, think about how it fits into the bigger framework of questions about rationality, utility, philosophy, ethics, and the future, and about which of your own questions and goals it might be relevant to.

38 comments

Comments sorted by top scores.

comment by [deleted] · 2014-01-20T13:29:17.937Z · LW(p) · GW(p)

Summary: Trying to use new ideas is more productive than trying to evaluate them.

If you drop the qualifier "new", then you may find something else holds: trying to use evaluated ideas is more productive than trying to use new ones. From personal experience, I've gotten a lot of utility from Less Wrong threads; because they are evaluated by the community, I find them more useful.

I rarely post unless I have something constructive to add -- for instance, when I notice an error or oversight and think that some reader might benefit from my remark. If I find a post useful, however, I just upvote it and don't comment about its usefulness.

Hypothesis: The lack of people using your posts is more perceived than real.

Proposed Test: At the end of a post, ask that readers who found the post helpful leave a comment saying so, instead of just upvoting. Make sure to give a reason for the request, to increase participation, e.g., "If you found this post useful, please comment saying so, because I am testing a hypothesis."

Replies from: itaibn0
comment by itaibn0 · 2014-01-20T14:28:47.217Z · LW(p) · GW(p)

Well, in my personal experience, I almost never use the ideas I find on Less Wrong.

Replies from: None
comment by [deleted] · 2014-01-22T21:30:50.894Z · LW(p) · GW(p)

Then why are you on Less Wrong? Have you read the Sequences?

I'm genuinely perplexed at why someone would spend as much time on LW as you seem to have done while never using any of the ideas in its meme pool. The stated point of this site is to improve human rationality. That means you.

Do you think you're already above what LW could have taught you? Do you think LW is entirely wrong and just like to tell people that? What's the deal?

I'm attracted to LW because the idea of a group of people collectively becoming stronger is a powerful one. It's not the reality of LW, but I'm willing to stick around and scan it every once in a while for something I can use, and when I see something I can use, I try to use it. I feel that using ideas I've found on LW has made me much better at the things I do.

Replies from: Lumifer
comment by Lumifer · 2014-01-22T21:35:23.435Z · LW(p) · GW(p)

I'm genuinely perplexed at why someone would spend as much time on LW as you seem to have done while never using any of the ideas in its meme pool. The stated point of this site is to improve human rationality.

The "stated point" is not very relevant to the personal utility one can find.

"...the street finds its own uses for things" -- William Gibson.

Replies from: None
comment by [deleted] · 2014-01-23T03:09:59.912Z · LW(p) · GW(p)

Can you give an example of great personal utility being conveyed by using a self-help website for some other purpose?

I guess this goes back to the OP's point that Less Wrong is more about arguing about minutiae than anything else.

Replies from: asr, Lumifer
comment by asr · 2014-01-23T16:08:28.122Z · LW(p) · GW(p)

Can you give an example of great personal utility being conveyed by using a self-help website for some other purpose?

I derive non-instrumental value from thinking about decision-making in new and different ways. For example, I don't know whether sharpening my ideas about ethics is exactly useful, but it's stimulating and pleasant.

A number of contributors here are good writers, and I enjoy reading their work. I particularly enjoy reading prose from Eliezer and Scott, even if it doesn't help me in any direct way.

comment by Lumifer · 2014-01-23T15:49:59.206Z · LW(p) · GW(p)

Can you give an example of great personal utility being conveyed by using a self-help website for some other purpose?

Sure. One whitehat example: LW is a community of highly intelligent weird people. I am sure some find a lot of personal utility in just being members of such a community.

And one blackhat example: LW people are a good target for trolling and successful trolling leads to hilarious lulz.

Replies from: None
comment by [deleted] · 2014-01-27T19:07:28.311Z · LW(p) · GW(p)

Sure. One whitehat example: LW is a community of highly intelligent weird people. I am sure some find a lot of personal utility in just being members of such a community.

This goes right back to the OP's point: more people "use" LW for status contests than they do for self-improvement.

So, if the OP posts something on LW, it's going to be mined for pitfalls to be exploited and pointed out in order to gain status, rather than being fairly evaluated as an instrumental tool.

Which is why the OP doesn't like LW any more. And why LW will trend towards uselessness over time.

Replies from: Lumifer
comment by Lumifer · 2014-01-27T19:10:43.024Z · LW(p) · GW(p)

more people "use" LW for status contests

Gaining utility from being a member of a community is NOT AT ALL the same as using that community for status contests.

comment by [deleted] · 2014-01-21T19:28:13.460Z · LW(p) · GW(p)

The key difference, I think, is that people who read posts on LessWrong ask whether they're "true" or "false", while the writers who read my posts on writing want to write.

Whereas on LessWrong a more typical response would be, "Aha, I've found a case for which your step 7 fails! GOTCHA!"

This is a failure mode of LW that's disenchanted me with the community for several years.

I think the core of it is that almost nobody on LW has anything to protect, which is what you're basically pointing out here. The people on your blog want to get better at writing. LWers want to get social validation from the LW community, and a great way to do that is to shit on anyone they can to eke out a little more status.

I think a lot of the problem has to do with karma. It's easy to upvote GOTCHA comments because it feels like the truth-valuing thing to do and most of the time the comments are verifiable. But GOTCHA comments are the imgur links of LW. They provide a bite-size amount of feel-good content that's ultimately useless.

I think less wrong would benefit from having its own /r/circlejerk, because sometimes the only way to notice these sorts of things is to magnify them to the point of absurdity.

comment by Cyan · 2014-01-20T11:51:53.973Z · LW(p) · GW(p)

I consider you a smart guy -- but when I wrote a couple of front page posts about Bayesian statistics, you made a few comments that revealed a notable Dunning-Kruger effect with respect to the topic. Since then, my observations of your interactions with other subject matter experts about topics in their domains of expertise have only reinforced this impression.

My current understanding of you is that you are a smart guy, but posting on LW is often unrewarding for you because (i) your high intelligence is a core part of your self-image, and (ii) you're not as smart as you think you are. I hope this info can be of use to you; apologies for any narcissistic injury caused by this comment.

Replies from: Anatoly_Vorobey, Kaj_Sotala, PhilGoetz
comment by Anatoly_Vorobey · 2014-01-20T19:14:08.160Z · LW(p) · GW(p)

Out of curiosity, did you consider sending this comment via PM, and if so, what made you decide to post it publicly?

Replies from: Cyan
comment by Cyan · 2014-01-20T21:58:09.376Z · LW(p) · GW(p)

I didn't think of using a PM. I don't have any good reason to do this publicly... hmm.

If you were implicitly questioning my motives, you were right to do so.

comment by Kaj_Sotala · 2014-01-20T17:35:49.273Z · LW(p) · GW(p)

If you're going to make a comment like this, you really have to provide specific examples.

Replies from: Cyan
comment by Cyan · 2014-01-20T18:45:55.507Z · LW(p) · GW(p)

Fair enough. Here's the comment thread. There was also a follow-up PM exchange between PhilGoetz and me which gave me very weak but non-zero evidence supporting my impression.

The "other subject matter experts" examples are too fuzzy in my memory to try to find; the principle example (and possibly only example) is the time EY explicitly discommended PhilGoetz's attempt to reiterate EY's ideas (in the comment thread of a post PhilGoetz wrote to critique EY's ideas).

Replies from: IlyaShpitser, PhilGoetz
comment by IlyaShpitser · 2014-01-20T19:19:04.584Z · LW(p) · GW(p)

Yeah, I have conversations like this about causality with people here all the time :(. (I don't remember any particular ones with Phil specifically). It is definitely a problem wider than just one individual.

comment by PhilGoetz · 2014-01-21T18:17:43.011Z · LW(p) · GW(p)

You called me overconfident and, as evidence, cited a conversation in which I mostly asked you questions. It seems your claim is based on my having said,

You said, "seek a prior that guarantees posterior calibration." That's what both EM and Gibbs sampling do, which is why I asked.

and on your opinion that that statement is wrong.

My recollection is that both EM and Gibbs sampling produce prior probabilities which maximize the likelihood of the observed data. In other words, they produce priors that result in posteriors (the probability of the observed data given those priors) that are perfectly calibrated to the data you train them on.

So our situations are symmetric: I think you did not quite understand what you said, or else what I said, or else you misunderstand EM and Gibbs sampling. I'm open to correction.

Replies from: Cyan
comment by Cyan · 2014-01-21T18:31:30.446Z · LW(p) · GW(p)

Perhaps you could lay out the problem with my evidence in more concrete terms?

(ETA: At the time I wrote this reply, the comment I was responding to read

You called me over-confident, and as evidence, cited a conversation in which I asked you questions.

As I write this ETA, there's a lot more detail in the parent.)

Replies from: PhilGoetz
comment by PhilGoetz · 2014-01-21T18:40:25.286Z · LW(p) · GW(p)

That is as concrete as I can make it, unless you want me to write out an algorithm for Gibbs sampling and explain why it produces priors that maximize the posterior. Or give an example where I used it to do so. I can do that: I had a set of about 8 different databases I was using to assign functions to known proteins. I wanted to estimate the reliability of each database, as a probability that its annotation was correct. This set of 8 probabilities was the set of priors I sought. I had a set of about a hundred-thousand annotated proteins, and given a set of priors, I could produce the probability of the given set of 100,000 annotations. I used that dataset plus Gibbs sampling to produce those 8 priors. And it worked extraordinarily well.
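
A minimal sketch of the kind of estimation loop described here -- assuming a Dawid-Skene-style reliability model with one binary latent "annotation correct" variable per protein. The simulation and names are illustrative, not the original code, and the EM variant is shown because it is shorter; a Gibbs version would sample the latent labels and reliabilities rather than computing expectations:

    import numpy as np

    # Illustrative setup (assumed, not the original): N proteins, K = 8
    # databases, votes[n, k] = 1 if database k asserts the candidate
    # annotation for protein n. The latent variable is whether each
    # annotation is actually correct; the estimands are the K database
    # reliabilities (the values called "priors" above).
    rng = np.random.default_rng(0)
    N, K = 100_000, 8
    true_rel = rng.uniform(0.6, 0.95, size=K)     # simulated ground truth
    is_correct = rng.random(N) < 0.5              # latent truth per protein
    votes = np.where(is_correct[:, None],
                     rng.random((N, K)) < true_rel,
                     rng.random((N, K)) < 1 - true_rel).astype(float)

    rel = np.full(K, 0.7)   # initial guess; EM converges from any sane start
    base = 0.5              # P(annotation correct) before seeing any votes
    for _ in range(100):
        # E-step: posterior probability that each annotation is correct,
        # given the current reliability estimates
        p1 = base * np.prod(np.where(votes == 1, rel, 1 - rel), axis=1)
        p0 = (1 - base) * np.prod(np.where(votes == 1, 1 - rel, rel), axis=1)
        post = p1 / (p1 + p0)
        # M-step: a database's reliability is its expected rate of
        # agreement with the latent truth
        rel = (votes * post[:, None] + (1 - votes) * (1 - post[:, None])).mean(axis=0)
        base = post.mean()

    print(np.round(rel, 3))   # tracks true_rel closely given this much data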

Replies from: Cyan
comment by Cyan · 2014-01-21T19:41:06.248Z · LW(p) · GW(p)

Oh man, you're not doing yourself any favors in trying to shift my understanding of you. Not that I doubt that your algorithm worked well! Let me explain.

You've used a multilevel modelling scheme in which the estimands are the eight proportions. In general, in any multilevel model, the parameters at a given level determine the prior probabilities for the variables at the level immediately below. In your specific context, i.e., estimating these proportions, a fully Bayesian multilevel model would also have a prior distribution on those proportions (a so-called "hyperprior", terrible name).

If you didn't use one, your algorithm can be viewed as a fully Bayesian analysis that implicitly used a constant prior density for the proportions, and this will indeed work well given enough information in the data. Alternatively, one could view the algorithm as a (randomized) type II maximum likelihood estimator, also known as "empirical Bayes".

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach. (These are definitions, by the way.) When you speak of "estimating the prior probabilities", you're taking an empirical Bayes point of view, but you're not well-informed enough to be aware that "Bayesian" and "empirical Bayes" are not the same thing.
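
In symbols (standard definitions, with illustrative notation: x the data, \theta the proportions, \eta the top-level hyperparameter):

    \text{fully Bayesian: } p(\theta \mid x) \propto p(x \mid \theta) \int p(\theta \mid \eta)\, p(\eta)\, d\eta

    \text{empirical Bayes: } \hat{\eta} = \arg\max_{\eta} \int p(x \mid \theta)\, p(\theta \mid \eta)\, d\theta,
    \quad \text{then use } p(\theta \mid x, \hat{\eta}) \propto p(x \mid \theta)\, p(\theta \mid \hat{\eta})

The fully Bayesian route requires a hyperprior p(\eta) fixed from prior information alone; empirical Bayes replaces it with a point estimate fit to the same data -- the type II maximum likelihood view mentioned above.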

The kinds of prior distributions with which I was concerned in my posts are those top-level prior distributions that don't come from data. Now, my pair of posts were terrible -- they basically dropped all of the readers into the inferential gap. But smart mathy guy cousin_it was intrigued enough to do his own reading and wrote some follow-up posts, and these serve as an existence proof that it was possible for someone with enough background to understand what I was talking about.

On the other hand, you didn't know what I was talking about, but you thought you did, and you offered questions and comments that apparently you still believe are relevant to the topic I addressed in my posts. To me, it really does look like -- in this context, at least -- you are laboring under a "cognitive bias in which unskilled individuals suffer from illusory superiority, mistakenly rating their ability much higher than is accurate".

So now I'll review my understanding of you:

  • Smart? Yes.
  • Not as smart as you think you are? Yes.
  • High intelligence is a core part of your self-image? Well, you did find my claim "not as smart as you think you are" irritating enough to respond to; you touted your math degree, teaching experience, and success in data analysis. So: yes.
  • Posting on LW is often unrewarding for you because of above three traits? Hmm... well, that has the same answer as this question: have you found our current exchange unrewarding? (Absent further info, I'm assuming the answer is "yes".)
Replies from: PhilGoetz
comment by PhilGoetz · 2014-01-22T01:29:37.345Z · LW(p) · GW(p)

To claim evidence that I'm overconfident, you have to show me asserting something that is wrong, and then failing to update when you provide evidence that it's wrong.

In the thread which you referenced, I asked you questions, and the only thing I asserted was that EM and Gibbs sampling find priors which will result in computed posteriors being well-calibrated to the data. You did not provide, and still have not provided, evidence that that statement was wrong. Therefore I did not exhibit a failure to update.

I might be using different terminology than you--by "priors" I meant the values that I'm going to use as priors in my running program on new data for transferred function annotations, and by "posteriors" I meant the posterior probability it will compute for a given annotation, given those "priors". I didn't claim to know what the standard terminology is. The only thing I claimed was that Gibbs sampling & EM did something that, using my terminology, could be described as setting priors so they gave calibrated results.

If you had corrected my terminology, and I'd ignored you, that would have been a failure to update. If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

Relevant to your post? I don't know. I didn't assert that that particular fact was relevant to your post. I don't know if I even read your post. I responded to your comment, "seek a prior that guarantees posterior calibration," very likely in an attempt to understand your post.

you didn't know what I was talking about, but you thought you did

Again, what are you talking about? I asked you questions. The only thing I claimed to know was about the subject that I brought up, which was EM and Gibbs sampling.

As far as I can see, I didn't say anything confidently, I didn't say anything that was incorrect AFAIK, I didn't claim you had made a mistake, and I didn't fail to update on any evidence that something I'd said was wrong. So all these words of yours are not evidence for my over-confidence.

Even now, after writing paragraphs on the subject, you haven't tried to take anything I claimed and explain why it is wrong!

Try this approach: Look over the comments that you provided as evidence of my overconfidence. Say what I would have written differently if I were not overconfident.

In a fully Bayesian analysis, there will always be a top-level prior that is chosen only on the basis of prior information, not data. Any approach that uses the data to set the prior at the top level is an empirical Bayes approach.

I don't see how that distinction makes sense for Gibbs sampling or EM. They are iterative procedures that take your initial (top-level) prior, and then converge on a posterior-to-the-data value (which I called the prior, as it is plugged into my operating program as a prior). It doesn't matter how you choose your initial prior; the algorithm will converge onto the same final result, unless there is some difficulty converging. That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.

Replies from: Cyan
comment by Cyan · 2014-01-22T06:20:22.751Z · LW(p) · GW(p)

If you'd explained that I misunderstand Gibbs sampling, that would have been a failure to update. You didn't.

I wrote a comment that was so discordant with your understanding of Gibbs sampling and EM that it should have been a red flag that one or the other of us was misunderstanding something. Instead you put forth a claim stating your understanding, and it fell to me to take note of the discrepancy and ask for clarification. This failure to update is the exact event which prompted me to attach "Dunning-Kruger" to my understanding of you.

I don't see how distinction makes sense for Gibbs sampling or EM... That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.

The way in which the ideas you have about EM and Gibbs sampling are wrong isn't easily fixable in a comment thread. We could do a Google Hangout at some point; if you're interested, PM me.

Replies from: PhilGoetz
comment by PhilGoetz · 2014-01-22T15:54:06.593Z · LW(p) · GW(p)

I believe my ideas about Gibbs sampling are correct, as demonstrated by my correct choice and implementation of it to solve a difficult problem. My terminology may be non-standard.

Here is what I believe happened in that referenced exchange: You wrote a comment that was difficult to comprehend, and I didn't see how it related to my question. I explained why I asked the question, hoping for clarification. That's a failure to communicate, not a failure to update.

Replies from: Vaniver, Cyan, jsalvatier
comment by Vaniver · 2014-01-22T21:57:29.171Z · LW(p) · GW(p)

Here is what I believe happened in that referenced exchange: You wrote a comment that was difficult to comprehend, and I didn't see how it related to my question. I explained why I asked the question, hoping for clarification. That's a failure to communicate, not a failure to update.

My interpretation, having read this comment thread and then the original: Cyan brought up a subtle point about statistics, explained in a non-obvious way. (This comment seemed about as informative to me as the entire post.) You asked "don't statistical procedures X and Y solve this problem?", to which Cyan responded that they weren't relevant, and then you repeated that they do.

Here, the takeaway I would make is that Cyan is likely a theory guy, and you're likely an applications guy. (I got what I think Cyan's point was on my first read, but it was a slow read and my "not my area of expertise" alarms were sounding.) It is evidence for overconfidence when people don't know what they don't know (heck, that might even be a good definition for overconfidence).

Say what I would have written differently if I were not overconfident.

After Cyan's response that Gibbs and EM weren't relevant, I would have written something like "If Gibbs and EM aren't relevant to the ideas of this post, then I don't think I understand the ideas of this post. Can you try to summarize those as clearly as possible?"

comment by Cyan · 2014-01-22T16:32:06.817Z · LW(p) · GW(p)

That's a failure to communicate, not a failure to update.

Okay, fair enough. I'll give it a shot, and then I'm bowing out.

Let me explain the problem with

That's why these algorithms exist--they spare you from having to choose a prior, if the data is strong enough that the choice makes no difference.

This is not why these algorithms exist. EM isn't really an algorithm per se; it's a recipe for building an optimization algorithm for an objective function with the form given in equation 1.1 of the seminal paper on the topic. Likewise, Gibbs sampling is a recipe for constructing a certain type of Markov chain Monte Carlo algorithm for a given target distribution.

If you read the source material I've linked, you'll notice that the EM paper gives many examples in which nothing like what you call a prior (actually a proportion) is present, e.g., sections 4.1.3, 4.6. Something like what you call priors are present in the example of section 4.3, although those models don't really match the problem you solved. (To see why I brought up empirical Bayes in the context of your problem, read section 4.5.)

You'll also notice that the Wikipedia article on MCMC does not mention priors in either your sense or my sense at all. That is because such notions only arise in specific applications; a true grokking of MCMC in general and Gibbs sampling in particular does not require the notion of a prior in either sense.

You've understood how to use the Gibbs sampling technology to solve a problem; that does not mean you understand the key ideas underlying the technology. Your problem was in the space of problems addressed by the technology, but that space is much larger, and the key ideas much more general, than you have as yet appreciated.
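
To make that last point concrete, here is a minimal textbook Gibbs sampler for a bivariate normal target (an illustrative example, not from the thread); note that nothing resembling a prior, in either sense, appears in it:

    import numpy as np

    # Gibbs sampling as a bare MCMC recipe: to sample a standard bivariate
    # normal with correlation rho, alternately draw each coordinate from its
    # conditional given the other. The target is just a distribution we want
    # samples from; no prior is involved anywhere.
    rng = np.random.default_rng(0)
    rho = 0.8
    sd = np.sqrt(1 - rho**2)   # conditional std dev: x | y ~ N(rho*y, 1 - rho^2)
    x = y = 0.0
    samples = np.empty((10_000, 2))
    for i in range(len(samples)):
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        samples[i] = x, y

    print(np.corrcoef(samples.T)[0, 1])   # approximately 0.8, matching the target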

comment by jsalvatier · 2014-02-07T00:17:01.789Z · LW(p) · GW(p)

Not to be a jerk, but your ideas about Gibbs and EM seem very wrong to me too, for exactly the reasons that Cyan describes below.

Because of that, I'm surprised that you said you had used Gibbs in a statistical application with great success. Perhaps you were using a stats package that used Gibbs sampling internally, rather than implementing Gibbs sampling yourself?

comment by PhilGoetz · 2014-01-21T18:13:24.568Z · LW(p) · GW(p)

Assuming that I disagreed with you re. Bayesian statistics, our positions are symmetric--you believe I am overconfident, and I believe you are overconfident. I have a degree in math, have taught basic Bayesian statistics at a university, and have used Bayesian statistics successfully to get correct results in computer programs many times, so I have some reason for my confidence. Have you made use of this information in re-evaluating your own confidence?

Replies from: Cyan
comment by Cyan · 2014-01-21T18:23:05.707Z · LW(p) · GW(p)

You'd told me by PM that you'd carried out analyses using Bayesian methods, but when I asked you to give me a look at some, you (justifiably!) deemed it not worth your time to do so. So that part is incorporated into my picture of you. I didn't know about the math degree or teaching, but that info is in line with my current understanding of you, so it doesn't shift it.

comment by Gunnar_Zarncke · 2014-01-20T10:52:20.829Z · LW(p) · GW(p)

So you say that you get more genuinely helpful comments on your blog. But do those comments help only you and your blog, or do they (possibly transitively) help to improve rationality overall, which is the stated goal here?

You don't say it outright, but I wonder whether you are proposing, or wishing for, more constructive comments here.

Is that a worthwhile goal here? Maybe not. Gotcha! (Meant humorously, as an example of the kind of 'accusation' you describe.)

Can this be generalized to comment cultures depending on blog topic?

I wonder whether you get this kind of comment on your blog.

comment by Stefan_Schubert · 2014-01-25T16:41:43.613Z · LW(p) · GW(p)

Interesting post. Please do stick around; you seem to have interesting ideas.

This very good and thought-provoking post is relevant: http://lesswrong.com/lw/3h/why_our_kind_cant_cooperate/

I'd like to think that one should be nice simply for the sake of being nice. But it also seems to me that in this case it's rational to be nice; or, to put it differently, that if Less Wrong's discussion culture drives away interesting posters such as Phil then that culture is not very rational.

Hence I think we should have a discussion about what the rational, or optimal, discussion culture is. This would be somewhat related to the ask vs. guess culture discussion, where I side with the guess culture (which says that you should try to think of how what you say is likely to be received) rather than with the ask culture (which encourages you to be "frank" and not be overly concerned with the consequences of this frankness). As a guesser, I think that you often have to restrain yourself from asking for things bluntly (even if you think it's a reasonable request). Similarly, I think that even if "step 7 fails", you shouldn't point that out in that blunt fashion ("Aha, I've found a case for which your step 7 fails! GOTCHA!").

comment by moridinamael · 2014-01-20T03:18:28.672Z · LW(p) · GW(p)

Oh, well, yes, there are lots of folks who seem only able to poke holes and criticize the surface of things, but those aren't the people who make LessWrong worthwhile; those people are the noise. Admittedly sometimes you have to slog through fifty nitpicks to find one earnest insight, but that's sort of the human condition.

Replies from: passive_fist
comment by passive_fist · 2014-01-20T05:20:25.620Z · LW(p) · GW(p)

I've been analyzing my own behavior and I realized that knowing more about rationality and biases and so on is a double-edged sword. On the one hand, it allows me to evaluate ideas in a better way, but on the other (the human brain being what it is), being right about a lot of things leads to arrogance and a belief that one is always right. This is compounded by the use of the upvote/downvote system, which only makes this problem worse. It's so easy to fall into the trap of arrogance when you have a large number of upvotes on your posts. It's also easy to fall into the trap of thinking that one is immune to this sort of self-deception.

Sadly I've been seeing this type of behavior often here, and sometimes even from established members.

I can't think of any solution except a total overhaul of the social dynamics of the site. There must be a better way of having more rational discussions than petty disagreements and blind anonymous upvotes/downvotes.

Replies from: fubarobfusco
comment by fubarobfusco · 2014-01-20T07:45:01.821Z · LW(p) · GW(p)

http://lesswrong.com/lw/he/knowing_about_biases_can_hurt_people/

Replies from: passive_fist
comment by passive_fist · 2014-01-21T02:17:08.220Z · LW(p) · GW(p)

That's a very good link, but I'm talking about the slightly different problem of arrogance - believing that you are right and other people are stupid, rather than simply trying to argue against them with rationalist ammunition. Although to some extent I think these two concepts overlap.

comment by atorm · 2014-01-20T06:11:14.558Z · LW(p) · GW(p)

I wish you came 'round here more.

Replies from: PhilGoetz
comment by PhilGoetz · 2014-04-29T19:09:32.446Z · LW(p) · GW(p)

Thanks. I appreciate it.

comment by MugaSofer · 2014-01-22T09:18:02.857Z · LW(p) · GW(p)

Posted in the wrong thread

comment by Gurkenglas · 2014-01-20T15:29:07.175Z · LW(p) · GW(p)

"Aha, I've found a case for which your step 7 fails! GOTCHA!"

That's a strawman! GOTCHA!