Rationality Verification Opportunity?

post by beoShaffer · 2011-12-15T22:11:04.211Z · LW · GW · Legacy · 14 comments

One of the challenges of rationality verification is that most people who are willing to contribute personal data for it are already familiar with the techniques involved. This makes it difficult to tell if their performance on any form of rationality test is due to their training or their innate abilities. Does the start of a new sequence present a way around this for that sequence's content?

I believe that it might, and will propose some ideas on how we can take advantage of these opportunities. But first I would suggest that you try to think through the problem for yourself (I know this is slightly different from what is talked about in that post, but I think the principle holds).

Did you think through the general problem of rationality verification for new sequences before thinking of any solutions? Did you then think of your own solutions before getting your mind contaminated with mine? If yes, good. If no, not so good.

If we had good measures of general rationality that the same person could retake multiple times without losing reliability, we could simply ask LWers to take them at various intervals and see if they improved after reading the new sequence. As that is not the case, I suspect we would have to create specific measures for each sequence. It seems that most writers have a decent idea of what benefits they expect people to gain from their sequences, so perhaps they could try to come up with specific measures for the things that their sequences are supposed to improve. Then, before running the sequence, they could put out a call for people to complete these measures and send them in. They could then collect the data again from people who have read the completed sequence, preferably after they have had enough time to practice the material, but not so long that they have had too many other life changes. The necessity and viability of having additional experimental controls will vary between sequences. But I think we will generally be fine with a simple before and after picture.
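As a rough illustration of what a "simple before and after picture" could look like once the data is in, here is a minimal Python sketch. It assumes each participant has one numeric score from before the sequence and one from after, on the same sequence-specific measure. The function name `paired_change` and all of the scores are made up for illustration; nothing here comes from a real measure or real respondents.

```python
from math import sqrt
from statistics import mean, stdev


def paired_change(before, after):
    """Summarize per-person change between two administrations of one measure."""
    if len(before) != len(after):
        raise ValueError("each participant needs both a before and an after score")
    if len(before) < 2:
        raise ValueError("need at least two participants")
    # Pair each person's scores and take after minus before.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    # Paired t statistic; undefined if everyone changed by exactly the same amount.
    t_stat = mean_diff / (sd_diff / sqrt(len(diffs))) if sd_diff > 0 else float("nan")
    return {"n": len(diffs), "mean_diff": mean_diff, "sd_diff": sd_diff, "t": t_stat}


# Made-up scores for five hypothetical readers, for illustration only.
before_scores = [10, 12, 9, 14, 11]
after_scores = [12, 13, 11, 14, 13]
summary = paired_change(before_scores, after_scores)
print(summary)
```

Note that a positive mean difference here cannot by itself separate the effect of the sequence from the effect of simply taking the measure twice, which is why additional controls may matter for some sequences.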

While there are some time and talent limitations, I would be willing to help with creating the measures, collecting and interpreting the data, and any other necessary steps.

I declare Crocker's rules on the content and style of this post. This includes the title.

14 comments

Comments sorted by top scores.

comment by JenniferRM · 2011-12-16T00:39:15.286Z · LW(p) · GW(p)

The field of psychometrics is all about this kind of thing. The keyword here is "practice effect". It sort of looks like Wikipedia's deletionists have trimmed their content on the concept down to two sentences in an article that's been nominated for deletion as non-notable, but if you hunt around you can find pre-existing content on ways to control for practice effects.

The unique thing about the situation with LW in this respect seems to me to be that there are a lot of people who tend to conceive and execute and publish polls with relatively sophisticated methodology for the internet, in the complete absence of grants or formal publication or whatever. We're doing as a hobby, for a community blog (yes, a blog), what academics make an entire career out of! Maybe not fully solid with control groups yet, but we're actually kind of close to this already.

To make this sort of "surprising but casual competence" more dramatic and effective, it seems like it might be worthwhile to do a sequence on current best practices for community members to run studies on the community itself. Between the free polling technology available via the forms built into Google Docs and chapters from psychometrics textbooks, I bet it wouldn't be that hard for someone to pull together such content for a sequence that makes it easier for people here to spice their efforts up with pretty solid techniques :-)

For example, we probably could get some interesting control data for practice effects the way Anna got control data in her recent poll via Mechanical Turk. If you've developed the quiz content, you could ask LWers to take it and turkers to take it, and then get the same LWers and turkers to re-take it, with some being exposed to whatever manipulation you tried on LW by posting content and others not... it wouldn't be a perfect control, but (ignoring the costs) it would be better than nothing...
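To make the logic of that control concrete, here is a minimal Python sketch. It simplifies the design to two groups that both take the quiz twice: one exposed to the content between attempts and one not. The unexposed group's gain stands in for the practice effect, and subtracting it from the exposed group's gain gives a practice-adjusted estimate (a difference-in-differences). The function names and all of the scores are hypothetical, and the LWer/turker split is left out for brevity.

```python
from statistics import mean


def mean_gain(before, after):
    """Average per-person change between the first and second attempt."""
    return mean([a - b for b, a in zip(before, after)])


def practice_adjusted_effect(exposed, unexposed):
    """Exposed group's gain minus the unexposed (practice-only) group's gain."""
    return mean_gain(*exposed) - mean_gain(*unexposed)


# Made-up (before, after) scores, for illustration only.
exposed_group = ([10, 12, 9, 11], [14, 15, 12, 15])    # retook the quiz after reading the content
unexposed_group = ([10, 11, 9, 12], [11, 12, 10, 13])  # retook the quiz without reading it

effect = practice_adjusted_effect(exposed_group, unexposed_group)
print(effect)
```

With these invented numbers, both groups improve on the retake, and only the part of the exposed group's gain that exceeds the unexposed group's gain is attributed to the content.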

...which naturally leads me to wonder. What is the value of information here? Is there some change in behavior that certain results would cause? What kind of increase in value could be expected from such a change in behavior? Anyone have guesses here?

Replies from: beoShaffer

↑ comment by beoShaffer · 2011-12-17T03:15:40.476Z · LW(p) · GW(p)

What is the value of information here? Is there some change in behavior that certain results would cause? What kind of increase in value could be expected from such a change in behavior? Anyone have guesses here?

If the sequence were shown to be useful, we would be able to use the data to help show people that LW is useful. If the sequence is not useful, we would likely need to do more research to determine why. If we find that the sequence was merely ineffective in instilling the techniques, we could re-write it to be more effective. If it turns out that the techniques themselves are ineffective, we could stop teaching them. Preferably we wouldn't remove the sequence, just add a warning at the start of each post. This would save people time and encourage them to create alternative techniques.

comment by Incorrect · 2011-12-15T22:29:54.248Z · LW(p) · GW(p)

Since you declared Crocker's rules on the writing so explicitly...

teqnuices

from first paragraph
techniques*

verification for new sequnces before thinking of any solutions

from third paragraph
sequences*

yourself(I know t

from second paragraph
This should probably have a space.

genreally

from fourth paragraph last sentence
generally*

reliablity

from fourth paragraph first sentence
reliability*

While there are some time and talent limitation I would be willing to help with creating the measures, collecting and interpeting the data and any other necessary steps.

limitations
interpreting

It seems that most writers have a decent idea of what benefits they expect people to gain from their sequences'

from fourth paragraph third sentence
There is no need for an apostrophe after the word "sequences."

Then before running the sequence main sequence they could put out a call for people to complete these measures and send them in

"Then before publishing the sequence they could request people complete these measures and send them in" would be a bit better.

Your post reads like spoken communication. Almost every sentence could be improved.

Replies from: beoShaffer, Antisuji, billswift

↑ comment by beoShaffer · 2011-12-15T23:01:51.305Z · LW(p) · GW(p)

Almost every sentence could be improved.

Any specifics?

Replies from: Incorrect

↑ comment by Incorrect · 2011-12-15T23:41:21.676Z · LW(p) · GW(p)

My writing skills are also lacking but I'll give it a shot…

Does the start of a new sequence present a way around this for that sequence's content?

The publishing of a sequence with new material may present an opportunity to perform rationality tests without the aforementioned difficulties.

The necessity and viability of having additional experimental controls, like a control group that just reports the measures, without reading the sequence will vary between sequences. But I think we will generally be fine with a simple before and after picture.

Whether additional experimental controls are necessary and/or viable will vary between sequences. For example, a control group could report their measurements without reading the sequence. Regardless, I think it is generally adequate to simply compare individuals' measurements from before and after reading the sequence.

I guess I invoke Crocker's rules as well. Although, I think my sentences may be even worse…

Replies from: beoShaffer

↑ comment by beoShaffer · 2011-12-16T00:41:00.510Z · LW(p) · GW(p)

Changed

The necessity and viability of having additional experimental controls will vary between sequences. For example, we could use a control group that doesn't read the sequence, or reads an alternate version while still filling out the same measures. But I think we will generally be fine with a simple before and after picture.

To a new version of my own. Also considered changing it to:

The necessity and viability of having additional experimental controls will vary between sequences. For example, we could use a control group that doesn't read the sequence, or reads an alternate version while still filling out the same measures. But I think we will generally be fine with a simple before and after picture.

Any advice on which is better?

↑ comment by Antisuji · 2011-12-16T02:39:03.150Z · LW(p) · GW(p)

What do the asterisks mean?

Replies from: Incorrect

↑ comment by Incorrect · 2011-12-16T03:07:02.194Z · LW(p) · GW(p)

It indicates a correction.

Replies from: Antisuji

↑ comment by Antisuji · 2011-12-16T06:44:53.015Z · LW(p) · GW(p)

I apologize for being snarky. I am aware of the usage, though I am more familiar with the form in which the asterisk comes at the beginning of the line. I always assumed that the construction came from the way asterisks are used in footnotes, though of course I could be wrong about that. I had not thought to look at the Wikipedia page, so thanks for the link.

Perhaps I should have said that the use of asterisks as a correction marker is in a lower register than I am used to seeing on LW, at least to my eyes. It is fine for IM conversation (though it still grates a little*) but less so for a non-real-time format where you have the luxury of editing. I'll acknowledge that, at 31, I am on the older side for this forum and so possibly not fully au courant.

[Edited for formatting.]

* Which I'm trying to get over. I expect it annoys me for the same reason ending rants with "/rant" does, though on a smaller scale.

Replies from: Incorrect

↑ comment by Incorrect · 2011-12-16T07:25:53.124Z · LW(p) · GW(p)

It is fine for IM conversation (though it still grates a little*) but less so for a non-real-time format where you have the luxury of editing.

I was correcting the original poster, not myself.

↑ comment by billswift · 2011-12-15T22:41:59.956Z · LW(p) · GW(p)

How did you miss this one: "teqnuices" in the first sentence?

Replies from: Incorrect

↑ comment by Incorrect · 2011-12-15T23:08:15.460Z · LW(p) · GW(p)

I didn't.

comment by Morendil · 2011-12-16T07:31:54.291Z · LW(p) · GW(p)

Please read this paper before you start down an unproductive path. It's in a different field but many of the same caveats apply.

The solution I came up with (when you insisted) was to verify rationality by observing the production of novel results. They don't have to be Nobel-prize grade, but someone who has actually understood some material presented here should be able to use it to generate thoughts that are not the result of a cache lookup.

In practical terms, this would look like a "meta-exercise" a la "The Five Second Level", asking the reader to first design an application exercise for the ideas newly assimilated, and then solve that exercise. The only requirement is that any solution should come as a surprise to the person proposing it.

Replies from: lessdazed

↑ comment by lessdazed · 2011-12-30T23:56:45.161Z · LW(p) · GW(p)

verify rationality by observing the production of novel results. They don't have to be Nobel-prize grade

I would expect even better results when monitoring catastrophes among the trained and the untrained. Things like "not going bankrupt" relative to those in a good control group, rather than "got tenure."
