Controversy - Healthy or Harmful?

post by Gunnar_Zarncke · 2014-04-07T22:03:55.836Z · LW · GW · Legacy · 18 comments

Follow-up to: What have you recently tried, and failed at?

Related-to: Challenging the Difficult Sequence

ialdabaoth's post about blockdownvoting and its threads have prompted me to keep an eye on controversial topics and community norms on LessWrong. I noticed some things.

I was motivated: My own postings are also sometimes controversial. I know beforehand which might be (this one possibly). Why do I post them nonetheless? Do I want to wreak havoc? Or do I want to foster productive discussion of unresolved but polarized questions? Or do I want to call into question some point on which the community may have a blind spot, or which it may have taken for granted too early?

Of course I see postings that provide objective information. These are almost never downvoted. Some members specialize in these. "Write a lukeprog-style post on X" is an appeal to invest time to provide a benefit (information) for the community. No problem here. Nor with the large body of 'typical' LW posts.

But some postings which I associate with newbies or aspiring rationalists (me included), which are sometimes personal and sometimes objective, often get a share of downvotes because they don't match some standards or norms. I mostly just don't vote on these. I upvote them if I recognize that the poster has made a genuine effort I want to honor.

But progress is made by the discussion of controversial topics, where a consensus or synthesis has not (yet) been reached. Agreed - there are topics that are inherently ambiguous because different people have different values (and I don't mean politics, which is the abuse of topics for us-them games). These topics can accumulate a sizeable share of downvotes. But even on these topics agreement should be possible, at least on the meta-level of accepting that different values exist. It is not necessary to downvote just because your position on the topic differs from the one taken or implied by the poster.

Besides these direct, on-track controversial topics, which are mostly civil with regard to voting (except possibly if a strong stance is taken by some party), we also have another kind of post, or often thread. These are posts and (sub)threads about topics like religion, recreation, politics, status, real-life pragmatism, relationships, and dealing with newbies and trolls (which is actually a range, and precise placement is difficult initially). What is the reason for these seemingly low-quality posts (measured by karma)? What keeps those who post them at LessWrong? Who are they?

I think some amount of these topics is a necessary part of a healthy community, and somebody has to tend to them. Some are more inclined to do so. Maybe these are housekeeping gnomes who do not really get rewarded for their work, as downvoting these topics (to limit them) cannot be differentiated from the need to handle them somehow. If you depreciate these posters you depreciate those topics. Do you want to shrink this area of LW topics, which connects LW to real life and helps keep LW alive?

One can wonder why the karma mechanism hasn't driven those posters and the topics away already. As always, what gets measured gets optimized. In this case the karma mechanism ensures that no poster with karma below 0 remains. But as long as you consistently achieve more than 50% positive votes you can stay. This suggests that we should see long-term members at all levels between 50% and 100%. Maybe someone with access to the database could provide a histogram of users by positive percentage. Do we have (long-term) members with near 50% karma?
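The histogram asked for here could be sketched roughly as follows. This is only an illustration: the function name `positive_rate_histogram`, the bucket width, and the sample vote counts are all made up, since the real numbers live in the site database.

```python
from collections import Counter


def positive_rate_histogram(users, bucket_width=10):
    """Count users per positive-percentage bucket.

    `users` is an iterable of (upvotes, downvotes) pairs (hypothetical data).
    Users without any votes are skipped. Keys are the lower bucket edges.
    """
    counts = Counter()
    for up, down in users:
        total = up + down
        if total == 0:
            continue
        pct = 100 * up / total
        # Put exactly 100% into the top bucket instead of opening a new one.
        bucket = min(int(pct // bucket_width) * bucket_width, 100 - bucket_width)
        counts[bucket] += 1
    return dict(sorted(counts.items()))


# Made-up (upvotes, downvotes) pairs, not real LW users.
sample = [(1500, 1000), (900, 100), (70, 30), (50, 50), (10, 0)]
width = 10
for low, n in positive_rate_histogram(sample, width).items():
    print(f"{low:3d}-{low + width:3d}%: {'#' * n}")
```

If the site's rule really does keep everyone above 50%, the buckets below 50 should be empty for long-term members, and the interesting question is how thin the 50-60% bucket is.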

It may be that some social processes are at work which reinforce downvoting near 50% (thus skewing the distribution), possibly driven by those who see karma as a proxy for status and who push those near 50% further away. From a status perspective this is counterproductive, though: it not only 'punishes' them by keeping them low but actually 'deletes' them, so the punishers themselves sink in relative rank.

I had a look at the positive karma rate of the recent top contributors, and it appears that most long-time members have >85% but few have above 95%. I also had a look for controversial long-time members, and those exist. There is even one with 70% positive and >7500 karma, which translates into nearly 20000 votes in total (up and down). The formula is: total votes = karma/(2*p-1), where p is the fraction of votes that are positive.
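The formula follows from two assumptions: karma = upvotes - downvotes, and p = upvotes / total votes. Then karma = p*total - (1-p)*total = (2*p-1)*total. A minimal sketch of that arithmetic, reading the 7500 figure as net karma (which is what the "nearly 20000" total implies); the function name `vote_breakdown` is just for illustration:

```python
def vote_breakdown(karma, positive_share):
    """Recover vote counts from net karma and the share of positive votes.

    Assumes karma = upvotes - downvotes and positive_share = upvotes / total,
    so that total = karma / (2 * positive_share - 1).
    """
    if not 0.5 < positive_share <= 1.0:
        raise ValueError("positive_share must lie in (0.5, 1.0] for positive karma")
    total = karma / (2 * positive_share - 1)
    upvotes = positive_share * total
    # Round to whole votes; the published percentage is itself rounded.
    return {"total": round(total), "up": round(upvotes), "down": round(total - upvotes)}


print(vote_breakdown(7500, 0.70))
```

With 7500 karma at 70% positive this gives 18750 votes in total, of which 13125 are upvotes and 5625 are downvotes. Note that the formula blows up as p approaches 50%, so small rounding errors in the displayed percentage matter a lot for very controversial posters.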

So obviously even long term members don't 'achieve' 99%. And this doesn't appear to be the goal. Taking Eliezer Yudkowsky as a role model we see a high karma (I'm surprised how he managed to average 100 karma per day!) and a positive rate of 94% which seems a lot. But if you look over his posts you find a surprising number of controversial posts (e.g. the recent April Fools' Day Confession).

I read this to mean that posting controversial topics is encouraged - if it doesn't get out of hand or into Main.

I read this as an encouragement to go forward and work on hard, controversial topics (at the risk of failure). This is in the spirit of What have you recently tried, and failed at? and all the posts in the Challenging the Difficult Sequence. You can only learn from this. Hey, losing karma is not the end of the world (only if it falls below 0).

Remember, the next time you see someone with 500 karma but only 60% positive: that means 1500 upvotes (and 2500 votes in total) and likely contributions that actually advanced something rather than 'only' disseminating information. And they follow the role models too.

And if you are a low-positive poster, then take consolation from the absolute number of votes you got.

18 comments

Comments sorted by top scores.

comment by Viliam_Bur · 2014-04-08T10:49:01.409Z · LW(p) · GW(p)

Controversy - Healthy or Harmful?

Depends on the specific controversy, I guess.

To give a better answer, we could let an algorithm choose, say, 100 random comments with negative karma, and then make a poll and let users rate these comments as "healthy controversy" or "harmful".

My guess is that I would rate 70-80% of them as a waste of time.

comment by Nornagest · 2014-04-07T22:31:56.648Z · LW(p) · GW(p)

Do we have (long-term) members with near 50% karma?

Not many. The average controversial poster (by my totally-not-objective-at-all standards) seems to hover around 70%, with only a few people below that and usually not by much; preliminary and non-rigorous investigation points toward total karma ratios of between 73% and 66%, weighted towards the high end, plus one outlier in the mid-50s. There are a couple of less-well-established individuals closer to parity, but it seems that striking a close balance between approval and disapproval and developing thick enough skin to stick around anyway is harder than it sounds.

It might be interesting to note that of those I looked up, almost everyone had 30-day karma ratios lower than their total karma ratios: 60% wasn't unusual. I don't know if this points toward more recent controversy or a tendency for people to grow more outspoken or more contrarian over time.

Replies from: Viliam_Bur, None, shminux, Protagoras
comment by Viliam_Bur · 2014-04-08T10:44:21.004Z · LW(p) · GW(p)

almost everyone had 30-day karma ratios lower than their total karma ratios

One factor that contributes to this, though I am not sure how large a part of the effect it explains: Downvoted articles disappear after a while, while upvoted articles remain visible. Therefore recent bad articles cost karma, but old bad articles don't; while both recent and old good articles can get upvotes. It's a bit similar with comments; when they are collapsed, people are probably less likely to look at them.

Please note that 30-day karma means recent votes on recent contributions -- it does not include recent votes on old contributions. And I suspect the "recent votes on old contributions" is biased towards positive, because the old bad contributions disappear.

comment by [deleted] · 2014-04-08T03:55:10.661Z · LW(p) · GW(p)

There's also the question of one poster having two very different types of posts.

My comments in which I simply inject biology or astronomy information tend to get massive levels of upvotes, while my posts on other topics have a tendency to be more controversial than my averaged 87% positive would indicate. I'm unable to separate the two to give hard numbers, unfortunately.

comment by shminux · 2014-04-07T23:01:46.799Z · LW(p) · GW(p)

It might be interesting to note that of those I looked up, almost everyone had 30-day karma ratios lower than their total karma ratios: 60% wasn't unusual. I don't know if this points toward more recent controversy or a tendency for people to grow more outspoken or more contrarian over time.

Mine is down to 62% due to some block-downvoting in the last couple of weeks, so that could be another reason. Or is this what you mean by "recent controversy"?

Replies from: Nornagest
comment by Nornagest · 2014-04-07T23:07:45.037Z · LW(p) · GW(p)

Nah, I was talking more about the recent rash of posts on demographics and related topics in Stranger than History, the reactions to Eliezer's mass social engineering speculation in the April Fools' post, and a few other similar events. Though now that you mention it, people do seem to have grown a bit more trigger-happy with downvotes in recent months, block and otherwise.

comment by Protagoras · 2014-04-08T14:21:11.090Z · LW(p) · GW(p)

I am inclined to believe that the more recent controversy may be a factor. It's the first time I've been block downvoted, so I'm inclined to believe that there's been an increase in that kind of activity.

comment by Schmoo · 2014-04-08T07:38:07.247Z · LW(p) · GW(p)

The Karma system is better than nothing, and also better than even simpler systems such as Facebook's like system, but the main problem is that it is too simple.

Presumably the Karma system is supposed to at least do two things:

1) Influence posters' behaviour (e.g. if you get downvoted when writing in a certain way you're likely to change)

2) Inform readers which posts and comments to read

However, it does not perform these tasks very efficiently, the reason being that it is so very unclear what we are voting on. People apply wildly different criteria. For instance, I would guess that some have a much lower threshold for throwing a downvote than others. Also, some primarily reward people who write posts containing objective information (as pointed out above), whereas others also reward other sorts of posts.

As someone pointed out somewhere, there is also a bandwagon effect when it comes to voting, so that posts/comments with upvotes/downvotes are more likely to continue to be upvoted/downvoted. This means that a certain post which a lot of people would actually find interesting can get downvoted because of bad luck: the first voter uses non-standard criteria and his vote then influences subsequent voters.

All this means that both posters and readers can't know exactly why it is that a certain post has got a certain amount of Karma. As a result, the present Karma system does not fulfil either task 1) or task 2) adequately. If you don't know why a certain post got a certain amount of Karma, how can you know how to change your writing, and how can you decide whether to read it or not?

Of course, the comments give both readers and posters a better picture of what people think of the post, but saying this is a bit beside the point. If it doesn't matter that the Karma system is less than satisfactory because you can read the comments, then why have the Karma system after all?

The main advantage of the present Karma system is its simplicity. It could be argued that a more complex system would be too complicated for people to comprehend, etc. That is perhaps an argument that would be viable at Reddit and similar sites, but surely a site claiming to be "rationalist" should be able to assume that its members can handle more complex systems.

Exactly how such a system is to be devised is an important question which should be discussed (suggestions are welcome) but I'll stop here for now.

Replies from: Richard_Kennaway, Gunnar_Zarncke, Gunnar_Zarncke
comment by Richard_Kennaway · 2014-04-08T08:54:32.530Z · LW(p) · GW(p)

People apply wildly different criteria.

This is a feature.

If everyone had identical criteria for voting, we would see all postings having either large positive karma, karma near zero, or large negative karma. The more alike people are in their judgements, the less information the total score provides. It is because people vary in what they find voteworthy that the whole spectrum of scores is meaningful.

As someone pointed out somewhere, there is also a bandwagon effect when it comes to voting, so that posts/comments with upvotes/downvotes are more likely to continue to be upvoted/downvoted.

If many people with different criteria all like a post, chances are that the next person to read it will like it also. I don't see a problem.

This means that a certain post which a lot of people would actually find interesting can get downvoted because of bad luck: the first voter uses non-standard criteria and his vote then influences subsequent voters.

I have often noticed the direction of karma on a post reversing after the first few votes. Sometimes I have voted on a post that I would not otherwise have done, just to oppose the trend of its karma when I thought it unmerited.

The main advantage of the present Karma system is its simplicity.

Yes! One click! A more complicated system would not be too complicated to use, but too complicated to be worth using. On Ebay, I'm happy to give feedback as positive/neutral/negative plus a few words of boilerplate, but I never use their 5-star scales for quality of packaging, promptness of delivery, etc. How do I rate a cardboard box out of 5?

In short, I think the karma system is excellent and sets a high bar for being improved on.

Replies from: Schmoo, Gunnar_Zarncke
comment by Schmoo · 2014-04-08T09:48:01.648Z · LW(p) · GW(p)

If everyone had identical criteria for voting, we would see all postings having either large positive karma, karma near zero, or large negative karma. The more alike people are in their judgements, the less information the total score provides.

If you can only give one positive vote, one negative vote, or no vote at all, that seems to follow. If you could instead give, say, 1-5 positive or negative Karma, we would see a greater variety of scores.

Also, note that many posts and especially comments have very few votes. This means that the votes actually cast will often not be typical of the whole population of possible voters in a system where people's votes vary considerably. In a system where people's votes are more alike, this obviously happens less frequently.

Yes! One click! A more complicated system would not be too complicated to use, but too complicated to be worth using. On Ebay, I'm happy to give feedback as positive/neutral/negative plus a few words of boilerplate, but I never use their 5-star scales for quality of packaging, promptness of delivery, etc. How do I rate a cardboard box out of 5?

I agree that one shouldn't have to rate, e.g., comments on, say, five different criteria. The system could be somewhat more complex to comprehend, but you're right that it shouldn't be significantly more complex to use.

I think one obvious improvement is, though, to separate the posts into different categories which are to be assessed on different criteria. You could have one "objective information/literature review" section, one "opinion piece/discussion" section, one "meetup" section, and possibly a few more. In each section, you'd be rated on different criteria. That way, original pieces wouldn't be downvoted because they're not literature reviews, which seems to be Gunnar's (justifiable) complaint.

This system would be superior to the present, and no more complicated. I think further improvements are also possible, but those should be separately discussed.

comment by Gunnar_Zarncke · 2014-04-08T11:18:19.356Z · LW(p) · GW(p)

People apply wildly different criteria.

This is a feature.

It is a property. It means some aggregation. But that is inevitable given a single bit.

In short, I think the karma system is excellent and sets a high bar for being improved on.

It is excellent compared to no rating or only single-direction voting. But it is quite inferior to e.g. the slashdot system. Even a single-click system that provides different buttons for different types of posts would be better.

comment by Gunnar_Zarncke · 2014-04-08T12:23:48.233Z · LW(p) · GW(p)

See also The Mathematics of Gamification - Application of Bayes Rule to Voting.

Replies from: Schmoo
comment by Schmoo · 2014-04-08T12:53:56.557Z · LW(p) · GW(p)

Thanks, that's very interesting. I was especially interested in this:

We can gauge each Superuser’s voting accuracy based on their performance on honeypots (proposed updates with known answers which are deliberately inserted into the updates queue). Measuring performance and using these probabilities correctly is the key to how we assign points to a Superuser’s vote.

So they measure voting accuracy based on some questions on which they know the true answer.

There is a difference between their votes and the kind of votes cast here, though; namely that on Less Wrong there is not, in a strict sense, a "true answer" to how good a post or comment is. So that tactic cannot be used.

On questions on which there is a true answer it is easier to track people's reliability and provide them with incentives to answer reliably. On questions which are more an issue of preference (e.g. "how good is this post?") that is harder.

comment by Gunnar_Zarncke · 2014-04-08T09:31:55.981Z · LW(p) · GW(p)

In some comment some time earlier I proposed a voting/rating system (which I now can't find because "vote" occurs in every hit) which was intended to be intuitive and provide the necessary information. The basic idea is to asynchronously transport human emotion. Translating the emotion to/from a few well-known words is trivial, and if the set of words is sufficiently rich and the aggregation of these ratings (for sorting/filtering) follows some sensible rules, then I think this system should be near optimum.

I'd add independent votes for the dichotomies love/hate, happy/sad, awe/pity, surprised/bored, funny/sick (for comparison you can have a look at the Lojban attitudinals). Using such a system a great insightful post might get voted love+awe. And a rant hate and/or sick. Some unhelpful commonplace gets 'bored'.

Adding a satisfied/dissatisfied attitudinal is problematic because it is prone to depend on the relationship to the poster. One could add an agreement/disagreement vote which votes the relation between both members and which isn't taken into account when ranking globally but in a personal view.

In a way the usual 'like' is an abstracted sum of the positive emotions. Whereas karma here is a sum of all emotions (because it allows downvotes).

Slashdot takes a different approach, using some objective categories which I can't translate to simple emotions ('informative'=curiosity? 'insightful'=surprise+awe? 'funny'=surprise+happiness?). But I get little out of these tags, and they are more difficult to translate.

ADDED: See Measuring Emotions

Replies from: christopherj, Schmoo
comment by christopherj · 2014-04-09T16:40:24.757Z · LW(p) · GW(p)

Since you mention Slashdot, here's a little side effect of one of their moderation systems. At one point, they decided that "funny" shouldn't give posters karma. However, given the per-post karma cap of 5, this can prevent karma-giving moderation while encouraging karma-deleting moderation by people who think the comment overrated, potentially costing the poster tons of karma. As such, moderators unwilling to penalize posters for making jokes largely abandoned the "funny" tag in favor of alternatives.

I suspect that if an agree/disagree moderation option were added, it would likely suffer from a similar problem. E.g., if we treated that tag reasonably and used it to try to separate karma gains/losses from personal agreement/disagreement, people would be tempted to rate a post they especially like as disagree/love/awe.

A more interesting idea, I think, would be to run correlations between your votes and various other bits, such as keywords, author, and other voters, to increase the visibility of posts you like and decrease the visibility of posts you don't like. This would encourage honest and frequent voting, and diversity. Conversely, it would cause people to overestimate the community's agreement with them (more than they would by default).
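A toy sketch of the voter-to-voter part of that idea, under loud assumptions: votes are stored as +1/-1 per item, and the function names `agreement` and `predicted_interest` and the sample voters are invented for illustration. It uses a simple agreement score rather than a true correlation coefficient, and it ignores keywords and authors.

```python
def agreement(mine, theirs):
    """Mean product of +1/-1 votes on items both users voted on.

    Returns 1.0 for identical votes, -1.0 for opposite votes,
    and 0.0 when there is no overlap at all.
    """
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    return sum(mine[item] * theirs[item] for item in shared) / len(shared)


def predicted_interest(item, mine, others):
    """Sum other voters' votes on `item`, weighted by their agreement with me."""
    return sum(
        agreement(mine, votes) * votes[item]
        for votes in others.values()
        if item in votes
    )


# Hypothetical vote records: item id -> +1 (upvote) or -1 (downvote).
my_votes = {"a": 1, "b": -1}
other_votes = {
    "alice": {"a": 1, "b": -1, "c": 1},          # votes like me
    "bob": {"a": -1, "b": 1, "c": -1, "d": 1},   # votes opposite to me
}
for item in ("c", "d"):
    print(item, predicted_interest(item, my_votes, other_votes))
```

In this toy data, item "c" scores high because a like-minded voter upvoted it and an opposite-minded voter downvoted it, while "d" scores negative. Sorting unseen posts by such a score is exactly what would produce the overestimated-agreement effect mentioned above.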

comment by Schmoo · 2014-04-08T09:49:57.329Z · LW(p) · GW(p)

Interesting, and an interesting Slashdot link. I especially like the idea of "moderating the moderators". You do need to check whether people vote seriously in some way, it seems to me.

The only problem I see is Richard's concern below that multi-criterial systems, where you actually vote on all criteria, may turn out to be too cumbersome to use.

Replies from: Gunnar_Zarncke, Gunnar_Zarncke
comment by Gunnar_Zarncke · 2014-04-15T14:47:59.440Z · LW(p) · GW(p)

Depends. It's too cumbersome if it is as elaborate as this one: Measuring Emotions

comment by Gunnar_Zarncke · 2014-04-08T11:13:52.283Z · LW(p) · GW(p)

I didn't propose to force a vote on all. Only the strongest emotional responses. Maybe none.