Important fact about how people evaluate sets of arguments

post by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-14T05:27:58.409Z · LW · GW · 11 comments


Ronny Fernandez on twitter:

  • A very important fact which just came to my attention is that people do not tend to sum or take the max reasonableness of arguments for P to form a judgement about P, rather they tend to take the average. 
  • This is a somewhat reasonable heuristic in some situations. For instance, if someone gives you a really unreasonable argument for P this is evidence that their judgement of arguments isn’t very good, and so their best argument is more likely to be secretly bad.
  • Similarly, it is evidence that they are motivated to convince you even using faulty arguments, which is generally speaking a bad sign.
  • It has important implications. Sometimes people think “oh I will make 50 ok arguments for P instead of one really good one” but most folks are not very impressed by this, even though they should be.
  • Relatedly, if you try to turn a complicated thesis T into a social movement, the average reasonableness of an argument in favor of T will plummet, and so you may very quickly find that everyone perceives the anti-T-ers as being much more reasonable.
  • This will probably still be true even if the best pro-T arguments are very good, and especially true if the best pro-T arguments are subtle or hard to follow.
  • Yes, this is about ai risk. I don’t think this is a slam dunk argument against trying to make ai-risk-pilled-ness into a popular social movement, but it is a real cost, and nearly captures the shape of my real worries.
  • The best version of my real worries, I also have real worries that are not nearly as defensible or cool.
  • Oh actually probably they take the min rather than the average unless they like you, in which case they take the max.

First of all, is this important fact actually true? I'd love to know. Reviewing my life experience... it sure seems true? At least true in many circumstances? I can think of lots of examples where this fact being true is a good explanation of what happened. If people have counterarguments or sources of skepticism, I'd be very interested to hear them in the comments.

Secondly, I concluded a while back that One Strong Argument Beats Many Weak Arguments [LW(p) · GW(p)], and in Ye Olden Days of Original Less Wrong, when rationalists spent more time talking about rationality, there was a whole series of posts arguing for the opposite claim (1 [LW · GW], 2 [LW · GW], 3 [LW · GW]). That seems possibly related. I'd love to see this debate revived, and tied in to the more general questions of: 

(A) Does rationality in practice recommend aggregating the quality of a group of arguments for a claim by taking the sum, the max, the min, the mean, or what? (To be clear, obviously the ideal is more complicated & looks more like Bayesian conditionalization on a huge set of fleshed-out hypotheses. But in practice, when you don't have time for that, what do you do?)

(B) What do people typically do, and on what factors does that depend--e.g. do they take the min if they don't like you or the claim, and take the max if they do?
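
To make the options in (A) concrete, here's a toy sketch in Python, under the (big) simplifying assumption that each argument's quality can be scored on a common 0-to-1 scale:

```python
# Toy comparison of aggregation rules for a set of argument-quality scores.
# The scores are made up, and nothing here handles correlation between
# arguments or filtering by the arguer.
from statistics import mean

scores = [0.9, 0.3, 0.2, 0.2]  # one strong argument plus several weak ones

rules = {
    "sum":  sum(scores),   # rewards piling on more arguments
    "max":  max(scores),   # only the single best argument counts
    "min":  min(scores),   # the worst argument drags everything down
    "mean": mean(scores),  # weak arguments dilute the strong one
}

for name, value in rules.items():
    print(f"{name:>4}: {value:.2f}")
```

On these made-up numbers, piling on weak arguments helps under the sum, is ignored by the max, and actively hurts under the mean and the min, which is the pattern Ronny describes.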

Finally: Steven Adler pointed me to this paper that maybe provides some empirical evidence for Ronny's claim.

 

11 comments

Comments sorted by top scores.

comment by Taran · 2023-02-14T18:40:58.945Z · LW(p) · GW(p)

Any time you get a data point about X, you get to update both on X and on the process that generated the data point.  If you get several data points in a row, then as your view of the data-generating process changes you have to re-evaluate all of the data it gave you earlier.  Examples (a toy numerical sketch of the first one follows the list):

  • If somebody gives me a strong-sounding argument for X and several weak-sounding arguments for X, I'm usually less persuaded than if I just heard a strong-sounding argument for X.  The weak-sounding arguments are evidence that the person I'm talking to can't evaluate arguments well, so it's relatively more likely that the strong-sounding argument has a flaw that I just haven't spotted.
  • If somebody gives me a strong-sounding argument for X and several reasonable-but-not-as-strong arguments against X, I'm more persuaded than just by the strong argument for X.  This is because the arguments against X are evidence that the data-generating process isn't filtered (there's an old Zack_M_Davis post about this but I can't find it).  But this only works to the extent that the arguments against X seem like real arguments and not strawmen: weak-enough arguments against X make me less persuaded again, because they're evidence of a deceptive data-generating process.
  • If I know someone wants to persuade me of X, I mostly update less on their arguments than I would if they were indifferent, because I expect them to filter and misrepresent the data (but this one is tricky: sometimes the strong arguments are hard to find, and only the enthusiasts will bother).
  • If I hear many arguments for X that seem very similar I don't update very much after the first one, since I suspect that all the arguments are secretly correlated.
  • On social media the strongest evidence is often false, because false claims can be better optimized for virality.  If I hear lots of different data points of similar strength, I'll update more strongly on each individual data point.
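
A toy numerical sketch of the first example, with entirely made-up numbers: treat the speaker as either a "careful" or a "careless" argument-generator, let weak-sounding arguments shift your posterior over which one you're dealing with, and then weigh the strong-sounding argument accordingly.

```python
# Toy Bayesian sketch (made-up numbers): hearing weak arguments lowers trust
# in the arguer, which in turn lowers how much the strong-sounding argument
# should move you.
def normalize(dist):
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

# Prior over what kind of argument-generator we're dealing with.
arguer = {"careful": 0.5, "careless": 0.5}

# How likely each type is to produce a weak-sounding argument.
p_weak = {"careful": 0.2, "careless": 0.6}

# Chance that a strong-sounding argument from each type is actually sound.
p_sound = {"careful": 0.9, "careless": 0.5}

def credence_strong_arg_is_sound(arguer_dist):
    return sum(arguer_dist[k] * p_sound[k] for k in arguer_dist)

print(credence_strong_arg_is_sound(arguer))  # about 0.70 before any weak arguments

# Update on hearing two weak-sounding arguments in a row.
for _ in range(2):
    arguer = normalize({k: arguer[k] * p_weak[k] for k in arguer})

print(credence_strong_arg_is_sound(arguer))  # about 0.54 afterwards
```

The strong-sounding argument itself hasn't changed at all; only the estimate of the process that produced it has.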

None of this is cheap to compute; there are a bunch of subtle, clashing considerations.  So if we don't have a lot of time, should we use the sum, or the average, or what?  Equivalently: what prior should we have over data-generating processes?  Here's how I think about it:

Sum: Use this when you think your data points are independent, and not filtered in any particular way -- or if you think you can precisely account for conditional dependence, selection, and so on.  Ideal, but sometimes impractical and too expensive to use all the time.

Max: Useful when your main concern is noise.  Probably what I use the most in my ordinary life.  The idea is that most of the data I get doesn't pertain to X at all, and the data that is about X is both subject to large random distortions and probably secretly correlated in a way that I can't quantify very well.  Nevertheless, if X is true you should expect to see signs of it, here and there, and tracking the max leaves you open to that evidence without having to worry about double-updating.  As a bonus, it's very memory efficient: you only have to remember the strongest data favoring X and the strongest data disfavoring it, and can forget all the rest.

Average: What I use when I'm evaluating an attempt at persuasion from someone I don't know well.  Averaging is a lousy way to evaluate arguments but a pretty-good-for-how-cheap-it-is way to evaluate argument-generating processes.  Data points that aren't arguments probably shouldn't ever be averaged.

Min: I don't think this one has any legitimate use at all.  Lots of data points are only very weakly about X, even when X is true.

All of these heuristics have cases where they abjectly fail, and none of them work well when your adversary is smarter than you are.

comment by riceissa · 2023-02-14T20:30:34.798Z · LW(p) · GW(p)

This doesn't seem to be what I or the people I regularly interact with do... I wish people would give some examples or link to conversations where this is happening.

My own silly counter-model is that people take the sum, but the later terms of the sum only get added if the running total stays above some level of plausibility. This accounts for idea inoculation (where people stop listening to arguments for something because they have already heard of an absurd version of the idea). It also explains the effect Ronny mentions about how "you may very quickly find that everyone perceives the anti-T-ers as being much more reasonable": people stopped listening to the popular-and-low-quality arguments in favor of T.
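
A minimal sketch of that counter-model, with made-up numbers (and with the extra assumption that an absurd-sounding argument contributes negatively rather than just weakly):

```python
# Sketch of the thresholded running-sum model: arguments are summed in the
# order heard, but listening stops once the running total falls below a
# plausibility floor. All numbers are invented for illustration.
def thresholded_sum(contributions, floor=0.0, prior=0.5):
    total = prior
    for c in contributions:
        total += c
        if total < floor:  # the idea now seems too implausible to keep listening
            break
    return total

good_first = [0.4, 0.3, -0.2]    # strong arguments heard first
absurd_first = [-0.8, 0.4, 0.3]  # absurd version heard first ("inoculation")

print(round(thresholded_sum(good_first), 2))    # 1.0  -- every argument gets counted
print(round(thresholded_sum(absurd_first), 2))  # -0.3 -- listener checks out after the first one
```

The same three contributions produce different verdicts depending on the order in which they arrive, which is the idea-inoculation effect described above.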

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-15T02:43:08.484Z · LW(p) · GW(p)

I've noticed it happening a bunch in conversations about timelines. People ask me why my timelines are 'short' and I start rattling off reasons, typically in order from most to least important, and then often I've got the distinct impression that I would have been more persuasive if I had just given the two most important reasons instead of proceeding down the list. As soon as I say something somewhat dubious, people pounce, and make the whole discussion about that, and then if I can't convince them on that point they reject the whole set of arguments. Sometimes this can be easily explained by motivated cognition. But it's happened often enough with people who seem fairly friendly & unbiased & curious (as opposed to skeptical) that I don't think that's the only thing that's going on. I think Ronny's explanation is what's going on in those cases.

Replies from: riceissa
comment by riceissa · 2023-02-15T20:52:27.776Z · LW(p) · GW(p)

I think it's often easiest/most tempting to comment specifically on a sketchy thing that someone says instead of being like "I basically agree with you based on your strongest arguments" and leaving it at that (because the latter doesn't seem like it's adding any value). (I think there's been quite a bit of discussion about the psychology of nitpicking, which is similar to but distinct from the behavior you mention, though I can't find a good link right now.) Of course it would be better to give both one's overall epistemic state plus any specific counter-arguments one thought of, but I only see a few people doing this sort of thing consistently. That would be my guess as to what's going on in the situations you mention (like, I could imagine myself behaving like the people you mention, but it wouldn't be because I'm taking averages, it would be because I'm responding to whatever I happen to have the most thoughts on). But you have a lot more information about those situations so I could be totally off-base.

Replies from: daniel-kokotajlo
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-15T21:01:29.141Z · LW(p) · GW(p)

Yeah idk, what you say makes sense too. But in at least some cases it seemed like the takeaway they had at the end of the conversation, their overall update or views on timelines, was generated by averaging the plausibility of the various arguments rather than by summing them or doing something more complex.

(And to be clear I'm not complaining that this is unreasonable! For reasons Ronny and others have mentioned, sometimes this is a good heuristic to follow.)

comment by ESRogs · 2023-02-14T19:28:01.545Z · LW(p) · GW(p)

I could have sworn that there was an LW comment or post from back in the day (prob 2014 or earlier) where someone argued this same point that Ronny is making — that people tend to judge a set of arguments (or a piece of writing in general?) by its average quality rather than peak quality. I've had that as a cached hypothesis/belief since then.

Just tried to search for it but came up empty. Curious if anyone else remembers or can find the post/comment.

Replies from: ESRogs
comment by ESRogs · 2023-02-14T19:34:14.161Z · LW(p) · GW(p)

in Ye Olden Days of Original Less Wrong when rationalists spent more time talking about rationality there was a whole series of posts arguing for the opposite claim (1 [LW · GW], 2 [LW · GW], 3 [LW · GW])

Oh, and FWIW I don't think I'm just thinking of Jonah's three posts mentioned here. Those are about how we normatively should consider arguments, whereas what I'm thinking of was just an observation about how people in practice tend to perceive writing.

(It's possible that what I'm thinking of was a comment on one of those posts. My guess is not, because it doesn't ring a bell as the context for the comment I have in mind, but I haven't gone through all the comments to confirm yet.)

comment by DirectedEvolution (AllAmericanBreakfast) · 2023-02-14T17:13:24.083Z · LW(p) · GW(p)

Perhaps we can use a model in which the overall strength of a set of arguments is a linear combination of the max, sum, and mean of the individual arguments' strengths:

Strength = A x Max + B x Sum + C x Mean

So then the question becomes how to assign the weights.

In theory you could obtain an empirical result by assigning values to the strength, max, sum and mean for various beliefs you hold and their competing beliefs you do not hold, then finding the global coefficient values that best account for the numbers you put in. You could also try this with nonlinear models. If you could get people you consider worthy of emulation to take this study, perhaps you would find that there’s a model and set of parameters that best explains their collective approach to decision-making, which you could then adopt for yourself.
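
A sketch of how that fitting exercise could look, with entirely invented data: score a few argument sets by hand, compute each set's max, sum, and mean, and solve for the weights by ordinary least squares.

```python
# Hypothetical fitting exercise for Strength = A*Max + B*Sum + C*Mean.
# Both the per-argument scores and the hand-assigned overall strengths are
# made up purely to show the mechanics.
import numpy as np

argument_sets = [
    [0.9, 0.2, 0.2],       # one strong argument, two weak ones
    [0.6, 0.6, 0.6],       # several middling arguments
    [0.9],                 # a single strong argument
    [0.3, 0.3, 0.3, 0.3],  # many weak arguments
]
hand_assigned_strength = np.array([0.7, 0.8, 0.85, 0.4])

features = np.array([[max(s), sum(s), np.mean(s)] for s in argument_sets])

# Least-squares fit for the weights (A, B, C).
weights, *_ = np.linalg.lstsq(features, hand_assigned_strength, rcond=None)
A, B, C = weights
print(f"A (max) = {A:.2f}, B (sum) = {B:.2f}, C (mean) = {C:.2f}")
```

Swapping in a nonlinear model would just mean replacing the least-squares step with a more general fitting routine.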

comment by Dagon · 2023-02-14T15:29:20.246Z · LW(p) · GW(p)

Need to specify context and define more precisely what it means to judge an argument. This post seems to vacillate between high-bandwidth mutual truth-seeking discussions and “social movement”, which is more of a simple broadcast-of-belief model, with little room for debate or consideration of arguments.

To the direct question, rationality does not naively aggregate arguments by strength or quantity. It demands decomposing the arguments into evidence, and de-duplicating the components which overlap/correlate to get an update. And what people typically do depends entirely on the people and situations you consider to be typical.
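
As a sketch of what that de-duplication could look like, with invented evidence labels and weights: break each argument into the evidence it rests on, then count each piece of evidence only once, however many arguments reuse it.

```python
# De-duplicating overlapping evidence across arguments. The evidence labels
# and weights are invented for illustration.
evidence_weight = {
    "benchmark trends": 0.4,
    "expert surveys": 0.2,
    "compute growth": 0.3,
    "vibes": 0.05,
}

arguments = [
    {"benchmark trends", "compute growth"},
    {"benchmark trends", "expert surveys"},  # partly recycles the first argument
    {"vibes"},
]

naive_sum = sum(evidence_weight[e] for arg in arguments for e in arg)
deduplicated = sum(evidence_weight[e] for e in set().union(*arguments))

print(round(naive_sum, 2))     # 1.35 -- the shared evidence gets double-counted
print(round(deduplicated, 2))  # 0.95 -- each piece of evidence counted once
```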

comment by Zach Stein-Perlman · 2023-02-14T07:34:28.794Z · LW(p) · GW(p)

This largely feels true.

But if someone is disposed to believe P because of a strong argument X, the existence of weak arguments for P doesn't feel like it dissuades them.

There's a related phenomenon where--separate from what people believe and how they evaluate arguments--your adversaries will draw attention to the most objectionable things you say, and a movement's adversaries will draw attention to the most objectionable things a member of the movement says.

comment by Garrett Baker (D0TheMath) · 2023-02-14T07:22:00.963Z · LW(p) · GW(p)

I think it really depends on the situation. Ideally, you'd take the best argument on offer for both positions, but this assumes arguments for both positions are equally easy for you to find (with help from third parties, not necessarily optimizing [well] for you making good decisions). I think in practice I try to infer what the blind-spots and spin-incentives [LW · GW] of the arguments I hear are, and try to think about what world we'd have to live in [? · GW] in order for these lines of arguments to be the ones which I end up hearing about via these sources.

Never do I do any kind of averaging or maximizing thing, and although what I said above sounds more complicated than saying "average!" or "maximize!", it mostly just runs in the background, on autopilot, at this point, so it doesn't take all that much extra time to implement. So I think it's a false dichotomy.

In some sense, one strong argument seems like it should defeat a bunch of weak arguments, but this assumes you're in a situation you never actually find yourself in in real life [LW · GW][1]. In reality, once you have one strong argument and a bunch of weak arguments, now begins the process of seeing how far you can take the weak arguments and turn them into strong arguments (either by thinking them yourself or seeking out people who seem to be convinced by the weak-to-you versions of the arguments). And if you can't do this, you should evaluate how likely you think it is you can make one of those weaker arguments stronger (either by some learned heuristics about what sorts of weak arguments are shadows of stronger ones, or looking at their advocates, or those who've been convinced, or the incentives involved, etc.).


  1. Although the original text talks about policies very specifically, I think this is also the case when trying to reason about progressively more accurate abstractions of the world. What you're really deciding on is which line of research inquiry to devote more thought to, with little expectation that either hypothesis on offer will be a truly general theory for the true hypothesis (even if---especially if---it is able to be developed into a truly general theory with a bit (or a lot) of work). ↩︎