Reading Level of Less Wrong

post by Alexandros · 2010-12-13T09:54:28.812Z · LW · GW · Legacy · 24 comments

Here's something to pick our collective spirits up:

According to Google's infallible algorithms, 20% of the content on LessWrong.com falls within the 'Advanced' reading level. For comparison, another well-known bastion of intelligence on the internets, Hacker News, only has 4% of its content in that category.

Strangely, inserting a space before the name of the site in the query tends to reduce the amount of content that falls in the highest bucket, but I am told that highly trained Google engineers are interrogating the bug in a dimly lit room as we speak, and expect it to crack soon.

24 comments

comment by [deleted] · 2010-12-13T11:07:48.617Z · LW(p) · GW(p)

Why would that pick anyone's spirits up? Surely in an ideal world, where you want to actually communicate, you want the reading level to be the lowest possible one that would get the idea across? Making something actively difficult to read is a good way to confine your ideas to an in-group...

Replies from: Jack, Alexandros
comment by Jack · 2010-12-13T11:37:10.028Z · LW(p) · GW(p)

Yes. In an ideal world everything interesting and important would be comprehensible to a ten-year-old. But since we don't live in an ideal world and many interesting and important ideas require difficult concepts and complex vocabulary we can be pleased with this evidence that we are rather unique in our ability and propensity to talk about important, difficult ideas.

Replies from: None
comment by [deleted] · 2010-12-13T13:20:27.441Z · LW(p) · GW(p)

It's not evidence of any such thing. Read Orwell's Politics And The English Language - http://www.mtholyoke.edu/acad/intrel/orwell46.htm . Every example of bad writing he gives there would show up as 'advanced'.

Replies from: Jack
comment by Jack · 2010-12-13T13:45:40.651Z · LW(p) · GW(p)

I have no idea how Google's algorithms work. If they're counting syllables per word or evaluating vocabulary then ranking as advanced is evidence for both the claim that we use too much jargon and the claim that we talk about difficult and complex ideas. But that isn't the only evidence to consider. We've both read much of Less Wrong and can evaluate the difficulty and complexity of the ideas we discuss here. Do you not think we talk about difficult and complex ideas here? If so, what makes you think the 'advanced' rating is a product of poor writing rather than attempts to grapple with complexity?

I bet my use of the word 'algorithm' in my first sentence increases our rating. Would you like to suggest another word to replace it?

Replies from: DanielLC
comment by DanielLC · 2010-12-14T19:51:10.534Z · LW(p) · GW(p)

'method', or maybe 'system'

comment by Alexandros · 2010-12-13T11:31:25.358Z · LW(p) · GW(p)

Nobody's making anything actively difficult to read, nor did I advocate working to increase the reading level.

However, one-shot unexpected measurements of proxies do carry useful information. In this case, this is weak objective evidence that the conversation on LW is of especially high quality compared to other esteemed communities. That is all.

Replies from: None
comment by [deleted] · 2010-12-13T13:29:46.912Z · LW(p) · GW(p)

If anything, it's weak evidence that the conversation is of poor quality. Doing the same search for TrueOrigin.org, a creationist site, for example, shows that it gets 70% 'advanced', 29% 'intermediate' and 0% 'basic'.

'Advanced' reading level is almost always a pretty good proxy for obfuscation, rather than for intelligence.

Replies from: atucker, David_Gerard, Jack
comment by atucker · 2010-12-14T07:24:17.644Z · LW(p) · GW(p)

Most reading level metrics are calculated with something like 206.835 - 1.015(total words/total sentences) - 84.6(total syllables/total words)*. Others involve long paragraphs and whatnot.

Besides being an amalgamation of funky constants to get answers the way they want (100 is easy, 0 is best for college-educated folks) it favors run-on sentences with polysyllabic words.

I think that most of the time, short well-phrased sentences are more understandable.

Long sentences of big words seem to be reminiscent of the incomprehensible journal article that takes effort to understand the language of, or the papers that kids in school throw together without regard for conveying an understanding of the subject, let alone editing for clarity.

*http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_test
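For anyone who wants to try the formula quoted above, here is a minimal Python sketch of it. The constants come straight from the formula; everything else is an assumption made for illustration: the function names are hypothetical, syllables are estimated by counting runs of vowels (a crude heuristic, not a real syllable counter), and only `.`, `!` and `?` end a sentence, so semicolons do not. Real implementations may differ on all of these points.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables as the number of vowel runs (rough heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    # Every word counts as at least one syllable.
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Apply 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    # Assumption: only '.', '!' and '?' end a sentence; ';' does not.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, optionally with one apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    simple = "The cat sat."
    dense = "Incomprehensible terminology obfuscates communication considerably."
    print(flesch_reading_ease(simple))
    print(flesch_reading_ease(dense))
```

Under these assumptions the short sentence of one-syllable words scores far higher (easier) than the single sentence of long words, and the score can go well below zero, since nothing in the formula bounds it.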

Replies from: magfrump, mindspillage
comment by magfrump · 2010-12-14T09:08:02.839Z · LW(p) · GW(p)

Your last sentence there has a complexity level of -105.87. Your overall level may be higher because your other sentences aren't over twenty words, but most of it came from your syllables.

Perhaps you could follow your own advice, and use shorter words?

I agree that these metrics are bad at finding depth and good at finding obfuscation.

Dang, I was hoping to get away with only two syllable words.

Also, upvoted for giving a link to a readability test.

Replies from: wedrifid
comment by wedrifid · 2010-12-14T09:36:15.593Z · LW(p) · GW(p)

Perhaps you could follow your own advice, and use shorter words?

The comprehensibility problem in the last sentence seems to be the grammar!

Replies from: None
comment by [deleted] · 2010-12-14T13:28:39.317Z · LW(p) · GW(p)

I thought it was for humour actually. Demonstrating the problem he was talking about in the very sentence he was talking about it.

Replies from: atucker
comment by atucker · 2010-12-14T20:52:29.382Z · LW(p) · GW(p)

I thought it was for humour actually. Demonstrating the problem he was talking about in the very sentence he was talking about it.

That was the intent. I probably could've done it better though.

The comprehensibility problem in the last sentence seems to be the grammar!

I agree with that, but the readability metric doesn't seem to deduct that much for grammar. Instead it just looks for long sentences, then docks for that. I don't think that it would actually be able to detect a long but readable sentence, and deduct fewer points for it.

Replies from: magfrump
comment by magfrump · 2010-12-14T23:21:44.468Z · LW(p) · GW(p)

Okay, so now I want to see how many words I can fit into a sentence without it getting too confusing to be read by someone who is pretty young or perhaps new to English; what sorts of ideas might you, or anyone else, have to make a sentence keep working as long as possible?

As to the original comment, sorry I guess I explained your joke.

Replies from: atucker
comment by atucker · 2010-12-15T01:16:19.250Z · LW(p) · GW(p)

Well, you did, but I was probably going to anyway at that point.

Really long descriptions seem to work well for making long sentences. Aside, do you want to do this with or without semicolons?

Replies from: magfrump
comment by magfrump · 2010-12-15T02:05:12.860Z · LW(p) · GW(p)

Does the algorithm count semicolons as creating new sentences? The purpose here remains to defeat the algorithm, correct?

Replies from: atucker
comment by atucker · 2010-12-15T02:52:45.412Z · LW(p) · GW(p)

I don't know actually. I'd guess not, but it might vary by implementation.

comment by mindspillage · 2010-12-17T08:28:08.038Z · LW(p) · GW(p)

I think these metrics are best for discovering which text would most benefit from efforts to simplify it. (Something I should do myself when writing for an audience.)

Thanks for posting the formula. I think it makes it much clearer what its limitations are, as compared to the opaque description "it measures reading level".

comment by David_Gerard · 2010-12-13T13:44:42.773Z · LW(p) · GW(p)

It tends to indicate intricate grammar. Tangled sentences are a hazard of attempting precision in English.

It'd be interesting to run the numbers on the top-rated comments, to see how much of a problem this is considered in practice.

Of course, that's just the front page of top comments, so these numbers are purely entertainment. But someone could do it more robustly if they care to.

comment by Jack · 2010-12-13T13:51:58.655Z · LW(p) · GW(p)

'Advanced' reading level is almost always a pretty good proxy for obfuscation, rather than for intelligence.

It's a good proxy for both. I don't see much obfuscation here. I do see a lot of people trying to precisely nail down ideas that are difficult to express.

Replies from: Richard_Kennaway, Pavitra
comment by Richard_Kennaway · 2010-12-13T14:24:03.979Z · LW(p) · GW(p)

It's a good proxy for both.

Which makes it a bad proxy for either.

Replies from: Alexandros
comment by Alexandros · 2010-12-13T14:58:00.552Z · LW(p) · GW(p)

Only if they were the only options, mutually exclusive, and equally likely.

comment by Pavitra · 2010-12-14T02:43:23.225Z · LW(p) · GW(p)

It's a good proxy for both.

Then the obvious next question would be: what's the differential for intelligence vs. obfuscation?

Replies from: Kevin
comment by Kevin · 2010-12-14T16:06:04.253Z · LW(p) · GW(p)

Compression. http://prize.hutter1.net/

comment by HonoreDB · 2010-12-13T20:55:50.595Z · LW(p) · GW(p)

Looks like text that is not forgotten tends to be Intermediate or Advanced.