Please Critique Things for the Review!
post by Raemon · 2020-01-11T20:59:49.312Z · LW · GW · 30 comments
I’ve spent a lot of time defending LW authors’ right to have the conversation they want to have [LW · GW], whether that be early stage brainstorming, developing a high context idea, or just randomly wanting to focus on some particular thing.
LessWrong is not only a place for finished, flawless works. Good intellectual output requires both Babble [LW · GW] and Prune [LW · GW], and in my experience the best thinkers often require idiosyncratic environments in order to produce and refine important insights. LessWrong is a full-stack intellectual pipeline.
But the 2018 Review is supposed to be late stage in that pipeline. We’re pruning, not babbling here, and criticism is quite welcome. We’re deliberately offering just as much potential prize money ($2000) to reviewers as to the top-rated authors.
Nominated authors had the opportunity to opt out of the review process, and none of them did. Getting nominated is meant to feel something like “getting invited to the grown-ups table”, where your ideas are subjected to serious evaluation, and that scrutiny is seen as a sign of respect.
On my current model, the Review is one of the primary ways that LessWrong ensures high epistemic standards. But how well that plan works is proportional to how much effort critics put into it.
The Review and Voting Phases will continue for another week, until January 19th. During that time, review-comments will appear on the voting page, so anyone considering how to vote on a given post will have the opportunity to see critiques. The reviews will initially appear abridged, so I’d aim for the first couple of sentences to communicate your overall takeaway.
The Review norms aren’t “literally anything goes” – ridicule, name-calling, etc. still aren’t appropriate. I’d describe the intended norms for reviews as “professional”. But, posts nominated for the Review should be treated as something like “the usual frontpage norms, but with a heavier emphasis on serious evaluation.”
I’m still not sure precisely what the rules/guidelines should be about what is acceptable for the final Best of 2018 Book. In some cases, a post might make some important points, but also make some unjustified claims. (I personally think Local Validity as a Key to Sanity and Civilization falls in this category.) My current best guess is that it’d be fine if such posts end up in the book, but I’d want to make sure to also include reviews that highlight any questionable statements.
Happy Critiquing!
Comments sorted by top scores.
comment by Zvi · 2020-01-16T15:11:21.963Z · LW(p) · GW(p)
I'm finding that posts have to be read and reviewed one at a time to do this properly. As a result there's no way I'm going to get to the bulk of the posts in time, even after deciding several days ago to make this one of my priorities for free time. And yeah, the whole thing feels mostly like work, which can't help.
↑ comment by Wei Dai (Wei_Dai) · 2020-01-16T20:59:23.195Z · LW(p) · GW(p)
And yeah, the whole thing feels mostly like work, which can’t help.
This is partly why I haven't done any reviews, despite feeling a vague moral obligation to do so. Another reason is that I wasn't super engaged with LW throughout most of 2018, and few of the nominated posts jumped out at me (as something I have a strong opinion about) from a skim of the titles. The ones that did jump out at me, I think I already commented on when they were first posted, and I don't feel motivated to review them now. Maybe that's because I don't like to pass judgment (I don't think I've written a review for anything before), and when I first commented it was in the spirit of "here are some tentative thoughts I'm bringing up for discussion".
Also, I haven't voted yet because I don't remember the details of the vast majority of the posts, and don't feel comfortable just voting based on my current general feeling about each post (which is probably most strongly influenced by how much I currently agree with the main points it tried to make), and I also don't feel like taking the time to re-read all of the posts. (I think for this reason perhaps whoever's selecting the final posts to go into the book should consider post karma as much or even more than the votes?)
I think if there was a period where every few days a mod would post a few nominated posts and ask people to re-read and re-discuss them, that might have helped to engage people like me more. (Although honestly there's so much new content on LW competing for attention now that I might not have participated much even in that process.)
↑ comment by Raemon · 2020-01-16T21:09:17.656Z · LW(p) · GW(p)
Also, I haven't voted yet because I don't remember the details of the vast majority of the posts, and don't feel comfortable just voting based on my current general feeling about each post
Reminder here that it's pretty fine to vote in proportion to "how good does the post seem?" and "how confident am I in that assessment?" (i.e., I expect it to improve the epistemic value of the vote if people in your reference class weakly vote on the posts that seem good)
↑ comment by Raemon · 2020-01-16T21:05:18.096Z · LW(p) · GW(p)
I think if there was a period where every few days a mod would post a few nominated posts and ask people to re-read and re-discuss them, that might have helped to engage people like me more. (Although honestly there's so much new content on LW competing for attention now that I might not have participated much even in that process.)
That's a pretty good idea, might try something like that next year.
the ones that did jump out at me I think I already commented on back when they were first posted and don't feel motivated to review them now.
Not sure how helpful this is, but fwiw:
I think it's useful for post authors to write reviews basically saying "here is how my thinking has evolved since writing this" and/or "yup, I still just endorse this and think it's great".
In the same way, I think it'd be useful if people who did most of their commenting back in the day wrote a short review that basically says "I still endorse the things I said back then", or "my thinking has changed a bit, here's how." (As I noted elsethread, I think it was also helpful when Vanessa combined several previous comments into one more distilled comment, although obviously that's a bit more work.)
↑ comment by Raemon · 2020-01-16T18:41:38.927Z · LW(p) · GW(p)
Nod. Something perhaps worth saying explicitly: I was expecting/hoping for each longtime user to review a smallish number of posts (like, 1-5) over the course of the month-long review process, focusing on posts that they had some kind of strong opinion about.
(Some people have done lots of smaller reviews, which I also think is good but for different reasons, and not something I think people should be feeling pressure to do if they’re not finding it worthwhile.)
comment by Raemon · 2020-01-12T21:22:18.645Z · LW(p) · GW(p)
I wanted to highlight something particularly good about Vanessa's recent review of Realism About Rationality [LW(p) · GW(p)] – partly answering an implied question of "what if you already commented on a post a year ago and don't have anything new to say?"
I think the Review is a good time to do distillation on past discussion. Vanessa's comment was nice because it took what had previously been a lengthy set of back-and-forths and turned it into a single, more digestible comment.
comment by Bendini (bendini) · 2020-01-11T23:16:14.548Z · LW(p) · GW(p)
The shortage of reviews is both puzzling and concerning, but one explanation for it is that the expected financial return of writing reviews for the prize money is not high enough to motivate the average LessWrong user, and the expected social prestige for commenting on old things is lower per unit of effort than writing new things. (It's certainly true for me: I find commenting way easier than posting, but I've never gotten any social recognition from it, whereas my single LW post introduced me to about 50 people.)
Another potential reason is that it's pretty hard to "review" the submissions. Like most essays on LessWrong, they state one or two big ideas and then spend the vast majority of the words explaining those ideas and connecting them to other things we know. This insight density is what makes them interesting, but it also makes it very hard to evaluate the theories within them. If you can't examine the evidence behind a theory, you have to either assume it or challenge the theory as a whole, which is what usually happens in the comments section after a post is first published. If true, this means that you're not really asking for reviews, but for lengthy comments that can say something that wouldn't have been said last year.
↑ comment by Ruby · 2020-01-12T05:25:49.585Z · LW(p) · GW(p)
Raw numbers to go with Bendini's comment:
As of the time of writing this comment, there've been 82 reviews on the 75 qualified (i.e., twice-nominated) posts by 32 different reviewers. 24 reviews were by 18 different authors on their own posts.
Whether this counts as a shortage, is puzzling, or is concerning is a harder question to answer.
My quick thoughts:
- Personally, I was significantly surprised by the level of contribution to the 2018 Review. It's really hard to get people to do things (especially things that are New and Work) and I wouldn't have been puzzled at all if the actual numbers had been 20% of what they actually are. Even the more optimistic LW team members had planned for a world where we hunkered down and wrote all the reviews ourselves.
- If we consider the relevant population of potential reviewers to be the same as those eligible to vote, i.e., users with 1000+ karma, then there are ~130 [1] such users who view at least one post on the site each week (~150 at the monthly timescale). That gives us 20-25% of active eligible voters writing reviews.
- If you look at all users above 100 karma, 8-10% of candidate reviewers are engaging in the Review. People below 100 karma won't have written many comments and/or probably haven't been around for long, so they aren't likely candidates.
Relative to the people who could reasonably be expected to review, I think we're doing decently, if something like 10-20% of people who could do something are doing it. Of course, there's another question of why there aren't more people with 100+ or 1000+ karma around to begin with, but it's probably not to do with the incentives or mechanics of the review.
[1] For reference, there are 430 users in the LessWrong database with more than 1000 karma.
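(For concreteness, a minimal sketch of the participation arithmetic above; the activity counts are the rough estimates quoted in this comment, not exact database values:)

```python
# Rough participation rates from the figures above. The activity counts
# (~130 weekly / ~150 monthly active 1000+ karma users) are estimates.
reviewers = 32               # distinct users who have written a review
weekly_active_1000 = 130     # 1000+ karma users viewing a post each week
monthly_active_1000 = 150    # same, at the monthly timescale

print(f"weekly-active share:  {reviewers / weekly_active_1000:.0%}")   # ~25%
print(f"monthly-active share: {reviewers / monthly_active_1000:.0%}")  # ~21%
```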
↑ comment by Bendini (bendini) · 2020-01-12T07:16:36.087Z · LW(p) · GW(p)
Those numbers look pretty good in percentage terms. I hadn't thought about it from that angle and I'm surprised they're that high.
FWIW, my original perception that there was a shortage was based on the ratio between the quantity of reviews and the quantity of new posts written since the start of the review period. In theory, the latter takes a lot more effort than the former, so it would be unexpected if more people do the higher-effort thing unprompted while fewer people do the lower-effort thing despite explicit calls to action and $2000 in prize money.
↑ comment by Ruby · 2020-01-12T07:46:44.960Z · LW(p) · GW(p)
Re: the ratio
The ratio isn't obviously bad to me, depending on your expectation. Between the beginning of the review on Dec 8th and Jan 3rd [1], there have been 199 posts (excluding question posts but not excluding link posts), but of those:
- 149 posts were written by 66 users with over 100 karma
- 95 were written by 33 users above 1000 karma (the most relevant comparison)
- 151 posts were written by 75 people whose accounts were first active before 2019.
Comparing those with the 82 reviews by 32 reviewers gives a reviews:posts ratio between 1:1 and 1:2.
I'm curious if you'd been expecting something much different. [ETA: because of the incomplete data you might want to say 120 posts vs 82 reviews which is 1:1.5.]
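(The same arithmetic as a small sketch, using the post counts quoted above; per the footnote, the data pull is a few days stale, so treat these as approximate:)

```python
# Reviews-to-posts ratios from the Dec 8 - Jan 3 counts above.
reviews = 82
posts_by_100_karma = 149    # posts by users with over 100 karma
posts_by_1000_karma = 95    # posts by users above 1000 karma
posts_by_pre_2019 = 151     # posts by accounts first active before 2019

for label, posts in [("100+ karma", posts_by_100_karma),
                     ("1000+ karma", posts_by_1000_karma),
                     ("pre-2019 accounts", posts_by_pre_2019)]:
    print(f"{label}: 1 review per {posts / reviews:.1f} posts")
# -> roughly 1:1.8, 1:1.2, and 1:1.8, i.e. between 1:1 and 1:2
```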
Re: the effort
It's not clear to me that the effort involved means you should expect more reviews: 1) I think the cost-benefit ratio for posts is higher even if they take longer; 2) reviewing a post only happens if you've read the post and it impacted you enough to remember it and feel motivated to say something about it; 3) when I write posts, it's about something I've been thinking about and am excited about; I haven't developed any habit of being excited about reviews, since I'm not used to them.
[1] That's when I last pulled that particular data onto my machine, and I'm being a bit lazy because 8 more days isn't going to change the overall picture; though it means the relative numbers are a bit worse for reviews.
↑ comment by Thrasymachus · 2020-01-12T18:45:47.738Z · LW(p) · GW(p)
I also buy the econ story here (and, per Ruby, I'm somewhat pleasantly surprised by the amount of reviewing activity given this).
General observation suggests that people won't find writing reviews all that intrinsically motivating (compare to just writing posts, which all the authors are doing 'for free' with scant chance of reward; also compare to academia: I don't think many academics find peer review/refereeing one of the highlights of their job). With apologies for the classic classical econ joke: if reviewing was so valuable, how come people weren't doing it already? [It also looks like ~25%? of reviews, especially the most extensive, are done by authors on their own work.]
If we assume there's little intrinsic motivation (I'm comfortably in the 'you'd have to pay me' camp), the money doesn't offer that much incentive. Given Ruby's numbers, suppose each of the 82 reviews takes an average of 45 minutes or so (factoring in (re)reading time and similar). If the prize money is roughly allocated by person-time spent, the marginal expected return of me taking an hour to review is something like $40 (see the sketch after this list). Facially, this isn't too bad an hourly rate, but the real value is significantly lower:
- The 'person-time lottery' model should not be denominated by observed person-time so far, but by one's expectation of how much will be spent in total once reviewing finishes, which will be higher (especially conditional on posts like this).
- It's very unlikely the reward is going to be allocated proportionately to time spent (or some crude proxy thereof, like word count). Thus the EV would be discounted by whatever degree of risk aversion one has (I expect the modal 'payout' for a review to be $0).
- Opaque allocation also incurs further EV-reducing uncertainty, but best guesses suggest there will be Pareto-principle/tournament-style dynamics, so those with (e.g.) reasons to believe they're less likely to impress the mod team's evaluation of their 'pruning' have strong reasons to select themselves out.
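(A minimal sketch of the back-of-the-envelope calculation above; all inputs are the assumptions stated in this comment, and the hourly figure moves noticeably with the assumed minutes per review:)

```python
# Back-of-the-envelope EV of an hour of reviewing, assuming a $2000
# prize pool split roughly by person-time. Inputs are guesses, not data.
prize_pool = 2000          # dollars earmarked for reviewers
reviews_so_far = 82
hours_per_review = 0.75    # assumed average, incl. (re)reading time

total_hours = reviews_so_far * hours_per_review  # ~61.5 hours
print(f"naive marginal rate: ${prize_pool / (total_hours + 1):.0f}/hour")  # ~$32

# The 'person-time lottery' point: if total reviewing time grows ~25%
# before the deadline, the marginal rate falls further.
print(f"with more entrants:  ${prize_pool / (total_hours * 1.25 + 1):.0f}/hour")  # ~$26
```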
↑ comment by Raemon · 2020-01-12T20:13:23.814Z · LW(p) · GW(p)
Helpful thoughts, thanks!
I definitely don't expect the money to be directly rewarding in a standard monetary sense. (In general I think prizes do a bad job of providing expected monetary value). My hope for the prize was more to be a strong signal of the magnitude of how much this mattered, and how much recognition reviews would get.
It's entirely plausible that reviewing is sufficiently unmotivating that the thing to do is actually pay people directly for it. It's also possible that the prizes should be lopsided in favor of reviews. (This year the whole process was a bit of an experiment, so we didn't want to spend too much money on it, but it might be that just adding more funding to subsidize things is the answer.)
But I had some reason to think "actually things are mostly fine, it's just that the Review was a new thing and not well understood, and communicating more clearly about it might help."
My current sense is:
- There have been some critical reviews, so there is at least some latent motivation to write them.
- There are people on the site who seem generally interested in giving critical feedback, and I was kinda hoping they'd be up for doing so as part of a broader project. (Some of them have, but not as many as I'd hoped. To be fair, I think the job being asked of them for the 2018 Review is harder than what they normally do.)
- One source of motivation I'd expected to tap into (which I do think has happened a bit) is "geez, that might be going into the official Community Recognized Good Posts Book? Okay, before it wasn't worth worrying about Someone Being Wrong On the Internet, but now the stakes are raised and it is worth it."
↑ comment by Raemon · 2020-01-12T04:58:20.953Z · LW(p) · GW(p)
Agree with these reasons this is hard. A few thoughts (this is all assuming you're the sort of person who basically thinks the Review makes sense as a concept and wants to participate; obviously this may not apply to Mark)
Re: Prestige: I don't know if this helps, but to be clear, I expect to include good reviews in the Best of 2018 book itself. I'm personally hoping that each post comes with at least one review, and in the event that there are deeply substantive reviews, those may be given equivalent top billing. I'm not 100% sure what will happen with reviews in the online sequence.
(In fact, I expect reviews to be a potentially easier way to end up in the book than writing posts, since the target area is more clearly specified.)
"It's Hard to Review Posts"
This is definitely true. Often what needs reviewing is less like "author made an unsubstantiated claim or logical error" and more like "is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable? Does it contain subtle flaws? Are there better frames for carving up the world than the one in the post?"
This is a hard problem, and doing a good job of it is honestly harder than a month's worth of work. But this seems like a quite important problem for LessWrong to be able to solve. I think a lot of this site's value comes from people crystallizing ideas that shift one's frame, in domains where evidence is hard to come by. "How do we evaluate that?" feels like an essential question for us to figure out how to answer.
My best guess for now is for reviews not to try to fully answer "does this post check out?" (in cases where that depends on a lot of empirical questions that are hard to check, or where "is this the right ontology?" is hard to answer). Instead, try to map out "what are the questions I would want answered, that would help me figure out whether this post checked out?"
(An example of this is Eli Tyre's "Has there been a memetic collapse [LW · GW]?" question, relating to Eliezer's claims in Local Validity [LW · GW].)
↑ comment by Bendini (bendini) · 2020-01-12T07:58:15.015Z · LW(p) · GW(p)
Often what needs reviewing is less like "author made an unsubstantiated claim or logical error" and more like "is the entire worldview that generated the post, and the connections the post made to the rest of the world, reasonable?"
I agree with this, but given that these posts were popular because lots of people thought they were true and important, deeming the entire worldview of the author flawed would also imply that the worldview of the community was flawed. It's certainly possible that the community's entire worldview is flawed, but even if you believe that to be true, it would be very difficult to explain in a way that people would find believable.
↑ comment by DanielFilan · 2020-01-12T02:58:15.763Z · LW(p) · GW(p)
FWIW from a karma perspective I've found writing reviews to be significantly more profitable than most comments. IDK how this translates into social prestige though.
↑ comment by Bendini (bendini) · 2020-01-12T04:34:10.505Z · LW(p) · GW(p)
I'm not surprised to learn that is the case.
This is my understanding of how karma maps to social prestige:
- People with existing social prestige will be given more karma for a post or a comment than if it was written by someone unknown to the community.
- Posts with more karma tend to be more interesting, which helps boost the author's prestige because more people will click on a post with higher karma.
- Comments with high karma are viewed as more important.
- Comments with higher karma than other comments in the same thread are viewed as the correct opinion.
- Virtually nobody looks at how much karma you've got to figure out how seriously to take your opinions. This is probably because by the time you have accumulated enough for it to mean something, regulars will already associate your username with good content.
↑ comment by [deleted] · 2020-01-12T00:38:49.598Z · LW(p) · GW(p)
Not all of us agree with the project. I disagree with the entire concept of "pruning" output in this way. I wouldn't participate on principle.
↑ comment by Ruby · 2020-01-12T04:40:06.930Z · LW(p) · GW(p)
concept of "pruning" output in this way
I'd be curious to learn the alternative ways you favor, or more detail on why this approach is flawed. Standard academic peer review has its issues, but seemingly a community should have a way to review material and determine what's great, what needs work, and what is plain wrong.
↑ comment by [deleted] · 2020-01-12T05:43:41.239Z · LW(p) · GW(p)
Well, part of rationality is being able to assess and integrate this information yourself, rather than trusting in the accuracy of curators (which reinforces bad habits IMHO, hence the concern). Things that are useful get referenced, build citations, and are therefore more visible and more likely to be found.
↑ comment by Ruby · 2020-01-12T06:36:31.219Z · LW(p) · GW(p)
Do you think there are any ways the 2018 Review as we've been doing it could be modified to be better along the dimensions you're concerned about?
↑ comment by [deleted] · 2020-01-16T14:01:20.024Z · LW(p) · GW(p)
Sorry if I wasn't clear: I don't think it's a useful thing to do, full stop.
I don't mean to rain on anyone's parade. I was really just replying to the top-level comment which started with:
The shortage of reviews is both puzzling and concerning...
I was just pointing out that some people aren't participating because they don't find the project worth doing in the first place. To me it's just noise. I'm not going to get in the way of anyone else if they want to contribute, but if you're wondering why there is a shortage of reviews, well, I gave my reasons for not contributing.
↑ comment by Ruby · 2020-01-12T06:11:14.549Z · LW(p) · GW(p)
That makes sense. As I'm wont to say [LW · GW], there are often risks/benefits/costs in each direction.
Ways in which I think communal and collaborative review are imperative:
- Public reviews help establish the standards of reasoning expected in the community.
- By reading other people's evaluations, you can better learn how to perform your own.
- It's completely time-prohibitive for me to thoroughly review every post that I might reference; instead, I trust the author. Dangerously, many people might do this, and a post becomes highly cited despite flaws that would be exposed if a person or two spent several hours evaluating it.*
- I might be competent to understand and reference a paper, but lack the domain expertise to review it myself. The review of another domain expert can help me understand the shortcomings of a post.
- And as I think has been posted about, having a coordinated "review festival" is ideally an opportunity for people with different opinions about controversial topics to get together and hash it out. In an ideal world, review is the time when the community gets together to resolve what debates it can.
*An example is the work I began auditing the paper Eternity in Six Hours [LW · GW], which is tied to the Astronomical Waste argument. Many people reference that argument, but as far as I know, few people have spent much time attempting to systematically evaluate its claims. (I do hope to finish that work and publish more on it sometime.)
↑ comment by Pattern · 2020-01-12T03:18:37.408Z · LW(p) · GW(p)
Other possible factors:
- Maybe people read newer posts instead of (re-)reading older posts.
- The time of year (in which reviews occurred).
- The length of time open for reviews.
- The set of users reviews are open to.
- The set of posts open to review. For example, these are from long ago. (Perhaps if there were a 1-year retrospective, and 2-year, and so on up to 5 years, that could capture engagement earlier, and get ideas for short-term and longer-term effects.)
- Some trivial inconveniences around reading the posts to be reviewed (probably already addressed, but did that affect things a lot?)