Detecting Web baloney with your nose?

post by uzalud · 2012-11-10T15:50:07.292Z · score: -3 (14 votes) · LW · GW · Legacy · 21 comments

Is there a useful heuristic for detecting rationally-challenged texts (as in Web pages, forum posts, Facebook comments) which takes relatively superficial attributes such as formatting choices, spelling errors, etc. as input? Something a casual Internet reader may use to detect possibly unworthy content so they can suspend their belief and research the matter further. Let's call them "text smells" (analogous to code smells), like:

  1. too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
  2. walls of text;
  3. little concrete data/links/references;
  4. too much irrelevant data and references;
  5. poor spelling and grammar;
  6. obvious half-truths and misinformation.

Since many crackpots, pseudoscientific con artists, and conspiracy theorists seem to have cleaned up their Web sites in recent years, I wonder whether these low-cost baloney detection tools might still be of real value. Does anyone know of any studies or analyses of correlation between these basic metrics and the actual quality of the content? Can you think of some other smells typical of Web baloney?
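To make the idea concrete, here is a minimal sketch of how a few of the purely superficial smells (emphasis, walls of text, links) could be measured. The function name `text_smells`, the metric names, and the regular expressions are all illustrative assumptions, not an established tool, and nothing here says how well these numbers track actual quality.

```python
import re

def text_smells(text: str) -> dict:
    """Compute a few superficial 'smell' metrics (hypothetical names)."""
    words = re.findall(r"[A-Za-z]+", text)
    # Words of two or more letters written entirely in capitals
    caps_words = [w for w in words if len(w) >= 2 and w.isupper()]
    # Paragraphs are runs of text separated by at least one blank line
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    longest = max((len(p.split()) for p in paragraphs), default=0)
    return {
        "caps_ratio": len(caps_words) / len(words) if words else 0.0,
        # Runs of two or more '!' or '?' characters
        "repeated_punct": len(re.findall(r"[!?]{2,}", text)),
        "longest_paragraph_words": longest,
        "link_count": len(re.findall(r"https?://\S+", text)),
    }

sample = "WAKE UP people!!! They are lying??\n\nSee http://example.com for data."
print(text_smells(sample))
```

Smells 5 and 6 (spelling, half-truths) are left out on purpose: they need a dictionary or actual domain knowledge, so they are not really "superficial" in the same sense.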



Comments sorted by top scores.

comment by jimrandomh · 2012-11-10T18:41:01.666Z · score: 12 (12 votes) · LW(p) · GW(p)

Remember, there's unlimited reading material to choose from; your not-worth-reading detector should be sensitive, because false negatives cost much more than false positives. When reading an author for the first time, unless I have a strong recommendation or other quality signal, I will stop if the first incidence of stupidity precedes the first insight, or if there are no good insights in the first 500 words or so.

For superficial signals like spelling and overuse of emphasis, I divide them into two categories: things a good writer would do if they were rushed, and things a good writer wouldn't ever do. Typos, missing words, few citations? You're looking at an unedited draft; whether that's okay or not depends on the context. Bold italic all-caps large font? Crackpot.

comment by fubarobfusco · 2012-11-10T17:04:34.420Z · score: 6 (6 votes) · LW(p) · GW(p)

"Proper" spelling and grammar are some indication of the conscientiousness that the writer has put into ① their education, and ② the text itself. However, it's a pretty noisy signal; there are plenty of properly-spelled Bible study guides out there.

comment by beoShaffer · 2012-11-10T19:53:18.418Z · score: 1 (1 votes) · LW(p) · GW(p)

Also, there are a lot of insightful people who focused on learning other things (it's amazing how little non-code writing even good CS programs will let you get away with) and/or who write in English because it's common rather than because it's what they were educated in.

comment by thomblake · 2012-11-12T22:01:42.323Z · score: 0 (0 votes) · LW(p) · GW(p)

it's amazing how little non-code writing even good CS programs will let you get away with

And yet the good ones leave one with an appreciation for syntax that transfers itself naturally to the written word.

comment by dbaupp · 2012-11-10T22:50:55.730Z · score: 3 (3 votes) · LW(p) · GW(p)

There are a few other "crackpot indices" around. John Baez has a famous one, and Scott Aaronson has one in that vein (mostly specific to mathematics papers though).

comment by buybuydandavis · 2012-11-10T19:27:45.833Z · score: 2 (4 votes) · LW(p) · GW(p)

In defense of crackpots, many of the canonical writers here would ping the crackpot meter of most people, as would most of the LW contributors.

Korzybski is a prime example. If I hadn't had a very strong prior from personal discussions, there is no way I would have made it 10 pages into Science and Sanity.

For serious reading, my priors are more important than typesetting. For web blogs and filtering forums, it's a decent way to filter complete unknowns.

comment by [deleted] · 2012-11-11T14:22:56.798Z · score: 1 (1 votes) · LW(p) · GW(p)

Number 7: comic sans

comment by [deleted] · 2012-11-10T16:59:12.250Z · score: 1 (5 votes) · LW(p) · GW(p)

  • too much emphasis in text (ALL CAPS, bold, color, exclamations, etc.);
  • walls of text;
  • little concrete data/links/references;
  • too much irrelevant data and references;
  • poor spelling and grammar;
  • obvious half-truths and misinformation.

I count three that apply to Eliezer's sequences and another that can be applied to lukeprog's posts. And in addition to all four of these, a fifth (poor spelling) applies to my own posts.

comment by gjm · 2012-11-11T03:14:35.938Z · score: 2 (2 votes) · LW(p) · GW(p)

Would you care to clarify how much you mean "... so Eliezer and Luke are crackpotty" and how much you mean "... so these aren't a very good guide"? (For the avoidance of doubt, I don't think either argument is obviously crazy, though actually I think Eliezer and Luke aren't crackpots and those are useful crackpot indicators.)

comment by TrE · 2012-11-10T17:10:55.318Z · score: 2 (2 votes) · LW(p) · GW(p)

  • two types of emphasis at once, such as underlined italic bold text
  • a product to be sold, such as a book written by a mistaken genius

comment by BerryPick6 · 2012-11-10T19:30:27.965Z · score: 1 (1 votes) · LW(p) · GW(p)

Which one applies to Luke?

comment by [deleted] · 2012-11-10T19:35:22.190Z · score: 3 (5 votes) · LW(p) · GW(p)

Too much irrelevant data and references.

comment by prase · 2012-11-10T18:44:47.809Z · score: 0 (0 votes) · LW(p) · GW(p)

Which three?

comment by [deleted] · 2012-11-10T18:46:32.567Z · score: 2 (4 votes) · LW(p) · GW(p)

Well, Eliezer was fond of italicizing words in his text, didn't provide references for most of his statements, and wrote quite a few walls of text. I mean, the sequences are huge.

comment by uzalud · 2012-11-10T19:23:13.436Z · score: 4 (4 votes) · LW(p) · GW(p)

I wouldn't call Eliezer's emphasis excessive, nor would I call the sequences "walls of text". This is an example of both:

My question is: if you didn't know any English, could you still infer that this is more likely to be baloney, or not?

comment by Decius · 2012-11-10T21:34:52.526Z · score: 1 (1 votes) · LW(p) · GW(p)

Without knowing English, I would suggest that only the excessive repeated bangs and interrogation marks are high-value. The excessive ALL CAPS is likely mid-value, and the lack of paragraph breaks is low-value.
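One way to read this ranking is as a weighted sum over three language-independent signals. The sketch below is only an illustration under assumed numbers: the weights (3, 2, 1), the 30% uppercase threshold, and the 200-word cutoff are hypothetical choices that merely preserve the high/mid/low ordering, and the name `smell_score` is made up for the example.

```python
import re

# Hypothetical weights: they only encode the ordering high > mid > low.
WEIGHTS = {"repeated_marks": 3.0, "all_caps": 2.0, "no_breaks": 1.0}

def smell_score(text: str) -> float:
    """Weighted sum of three surface signals that need no knowledge of English."""
    # High-value signal: any run of two or more '!' or '?'
    repeated_marks = 1.0 if re.search(r"[!?]{2,}", text) else 0.0
    # Mid-value signal: more than 30% of letters are uppercase (assumed threshold)
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    all_caps = 1.0 if letters and upper / len(letters) > 0.3 else 0.0
    # Low-value signal: over 200 whitespace-separated tokens and no blank line
    no_breaks = 1.0 if len(text.split()) > 200 and "\n\n" not in text else 0.0
    return (WEIGHTS["repeated_marks"] * repeated_marks
            + WEIGHTS["all_caps"] * all_caps
            + WEIGHTS["no_breaks"] * no_breaks)

print(smell_score("THE TRUTH IS OUT THERE!!!"))
print(smell_score("A calm sentence."))
```

Whether any such weighting actually predicts baloney is exactly the open empirical question of the thread; the code only shows what "high-value" versus "low-value" evidence could look like mechanically.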

comment by [deleted] · 2012-11-10T19:36:50.256Z · score: 1 (3 votes) · LW(p) · GW(p)

That extreme? Yes, it is evidence that the author has low competence, and that in turn is evidence that the content is false.

comment by Thomas · 2012-11-10T17:17:37.140Z · score: -3 (3 votes) · LW(p) · GW(p)

Connecting, for example, item 2 with crackpottery might be a sign of crackpottery itself.

comment by prase · 2012-11-10T18:43:24.328Z · score: 6 (6 votes) · LW(p) · GW(p)

In my experience, absence of paragraphs is strongly correlated with low quality of the text. Do you have examples of good walls of text?

comment by buybuydandavis · 2012-11-10T19:17:11.443Z · score: 4 (4 votes) · LW(p) · GW(p)

I think walls of text are a matter of time and place. My recollection is that Paine, Jefferson, and Franklin wrote in very long paragraphs, with very long sentences in them.

For contemporary texts written for the web, I'd generally agree, and even if I don't think the writer is a crackpot, I'll stop reading because of the difficulty of visually tracking through a wall of text.

comment by Thomas · 2012-11-10T18:49:50.737Z · score: 0 (2 votes) · LW(p) · GW(p)

I can't say I have. But I wouldn't judge a text based on some peculiar font use, either.

You have to read and understand why it is good or why it isn't. Shortcuts like "funny grammar" are not reliable.