Reviewing the Review

post by Raemon · 2020-02-26T02:51:20.159Z · score: 46 (11 votes) · LW · GW · 16 comments

Contents

    Was it worth it? Should we do it again?
      Identifying the best posts
      Improving Longterm Incentives and Feedback
      Checking out epistemic state on controversial posts
      Figure out how to evaluate blogposts
      Shared sense of ownership over LW’s intellectual pipeline
      Evaluate LessWrong as a site
    Problems with Execution
      Too much stuff
      Voting vs Reviewing
      Voting vs Survey
      Nomination Phase was a bit confusing
    Alignment Review?
  Further Feedback?

We just spent almost two months reviewing the best posts of 2018. It was a lot of development work, and many LW users put in a lot of work to review and vote on things. 

We’ve begun work on the actual printed book, which’ll be distributed at various conferences and events as well as shipped to the featured authors. I expect the finished product to influence the overall effect of the Review. But meanwhile, having completed the “review” part, I think there’s enough information to start asking: 

Was it worth it? Should we do it again? How should we do it differently?

Was it worth it? Should we do it again?

My short answer is “yes and yes.” But I have some caveats and concerns.

My own goals for the Review were:

  1. Actually identify the best posts
  2. Improve longterm incentive structures and feedback systems
    1. Give users a reason to improve old content
  3. Check our collective epistemic state on controversial posts
  4. Figure out how to evaluate blogposts
  5. Create a shared sense of ownership over LW’s intellectual pipeline
  6. Evaluate LessWrong as a site and project

Some of those (1, 4, 6) I feel able to evaluate independently; others depend on how other people felt about the process, and how much value they got for the effort they put in. It also depends on what counterfactual actions the Review is being compared to.

But overall, I personally found the process very rewarding. Nominating, reviewing and voting gave me a clearer sense of how various ideas fit together, and what LessWrong had accomplished in 2018. This involved a fair effort on my part (several hours of re-reading, thinking, comparing), but it felt both enjoyable and worthwhile.

Identifying the best posts

I think the review did a decent job at this. The obvious comparison is “what were the top karma posts of 2018?”. Could we have saved ourselves a ton of work by checking that? This is somewhat confounded by the fact that we changed our karma system partway through 2018 (reducing the power of the average upvote, after initially increasing it at the beginning of LW 2.0). At some point we plan to re-run the voting history using the current upvote strengths, which will give us clearer information there.
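As a rough illustration of what “re-running the voting history using the current upvote strengths” might involve, here is a minimal sketch. The vote records, karma thresholds, and strength values below are all invented placeholders for illustration; LessWrong’s actual vote-power rules are more involved.

```python
# Hypothetical sketch: recompute post scores under a newer vote-strength rule.
# Thresholds and strengths are invented, loosely inspired by tiered votes.

def vote_strength(voter_karma: int, strong: bool) -> int:
    """Vote power as a function of the voter's karma (placeholder values)."""
    if strong:
        return 3 if voter_karma >= 1000 else 2
    return 2 if voter_karma >= 1000 else 1

def recompute_scores(votes):
    """votes: iterable of (post_id, voter_karma, direction, strong),
    where direction is +1 or -1. Returns {post_id: recomputed score}."""
    scores = {}
    for post_id, voter_karma, direction, strong in votes:
        scores[post_id] = scores.get(post_id, 0) + direction * vote_strength(voter_karma, strong)
    return scores

votes = [
    ("arbital-postmortem", 1500, +1, True),   # strong upvote, high-karma voter
    ("arbital-postmortem", 200, +1, False),   # normal upvote
    ("local-validity", 1500, +1, False),
    ("local-validity", 50, -1, False),        # downvote
]
print(recompute_scores(votes))  # {'arbital-postmortem': 4, 'local-validity': 1}
```

Replaying the full vote history this way would show how much the mid-2018 change to vote power shifted the karma rankings.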

Meanwhile, comparing the top karma posts of 2018 to the top-voted posts in the review, there are some obvious differences. (Most obviously, “Arbital Postmortem” was the top-karma post of 2018, but didn’t end up featured in the review.)

I think the simple act of filtering on “Which posts had at least some people who had made use of them, and were willing to endorse them?” was a key factor.

I think there was a lot of room to improve here. I felt that the voting process, at least in some cases, measured “how prestigious should this post be?” [LW(p) · GW(p)] more than it gave me a clear sense of “how useful was this post?”

Next year, I’d like to experiment with other voting processes [LW(p) · GW(p)] that help disentangle “should this post be in a public-facing Best of LW book?” from “Was this post valuable?”, and “Did it reflect good intellectual practices?”.

Improving Longterm Incentives and Feedback

This is hardest to evaluate right now. But it’s also where I’m expecting most of the Review process’s value to lie. A decade from now, my bet is that LessWrong will have demonstrably better epistemics, and produce more value, if we keep doing some form of longterm, retrospective review process.

But for this year, we saw at least some outcomes in this space. 

First, as a nominated author, I found it pretty rewarding to see some posts of mine getting discussed, and that some of them had had a longterm impact on people. I’m guessing that generalizes. 

Second, at least 4 people I know of deliberately updated their posts for the review. Two of those people were on the LW team, and one of them was me, so, well, I’m not going to count that too strongly. But meanwhile, lots of authors gave self-reviews that reflected their updated thoughts. 

Third, we saw a number of reviews that gave critical feedback, often exploring not just the ideas in the post but how the post should fit conceptually into an overall worldview.

Not all of those reviews were clearly valuable. But I think the clearest signs of counterfactually valid and valuable reviews were:

Checking out epistemic state on controversial posts

I think this had more room for improvement. The aforementioned Rationality Realism discussion was great, and I think there was at least some progress in, say, Vaniver’s post on Circling [LW · GW] that acted as a sort-of-review for Unreal’s 2018 post.

I don’t have a strong sense that any debates were settled. But, well, I do think it takes an absurdly long time to resolve disagreements [LW · GW] even when people are trying hard in good faith.

I think we did at least get a clearer sense of how controversial each post was, from the voting process, and that seems like a good starting place.

Figure Out How To Evaluate Blogposts

There was a diverse array of blogposts. Some of them benefited from conceptual, philosophical debate. Some of them benefited from statistical review of scientific papers.

A few of them had implied empirical claims, which would be pretty hard to check. I still hope that someone investigates more thoroughly Eli Tyre’s question “Has there been a memetic collapse?” [LW · GW], which looks into some of Eliezer’s assumptions in Local Validity [LW · GW] and Is Clickbait Destroying Our General Intelligence? [LW · GW]. But, to be fair, it’s a lot of work, it’s confusing how to even go about it, and right now I don’t think we’ve offered good enough rewards for answering it thoroughly.

Overall, we got a fair number of people who worked on reviews [LW(p) · GW(p)], but a small number of people did most of the work. A couple [LW(p) · GW(p)] people [LW(p) · GW(p)] noted that reviewing felt like “work”, and I think the strength of internet forums is making intellectual work feel like play [LW · GW].

I am uncertain how to take all of this into account for next year.

Shared sense of ownership over LW’s intellectual pipeline

I don’t have a clear sense of how this worked out. I know that participating in the review process increased my own sense of partial ownership over the intellectual process, and I have some sense that the other people who participated most heavily in the process felt something of that. But I’m not sure how it worked overall.

This goal was less necessary than some of the other goals but still seems useful for the longterm health of the site.

Evaluate LessWrong as a site

While engaging in the review process, I skimmed posts from 2017 and 2019, as well as digging significantly into the nominated posts from 2018. This gave me some sense of LW’s overall output trajectory. It doesn’t necessarily give us clear common knowledge of the community’s collective epistemic state, but I found it at least useful for forming my own opinion on “how is LW doing?”

One thing I found was that in 2017, there were relatively few unique authors that I was excited about – many of the most exciting things were posts from Eliezer’s Inadequate Equilibria sequence, which had already been made into a book, and then maybe… 5 other authors among posters I was particularly excited about?

In 2018, there was a much more diverse array of popular authors. Eliezer is present as one of the top contributors, but there were around 40 authors featured in the review, and even if you’re just focusing on the top half that’s a healthier base of authorship.

I think we have a ways to go – 2018 was when LW 2.0 officially launched, and it was still hitting its stride. My rough sense (partially informed by the metrics we’ve started tracking) is that 2019 was a slight improvement, and that things started picking up in particular towards the end of 2019. 

In the 2018 Review, my overall sense is that there were many “quite solid” posts on general rationality and coordination. Meanwhile, I think a lot of very concrete progress was made on Alignment. It looks (from the outside) like the field went from not really having any common language to communicate in, to establishing several major paradigms of thought with clear introduction sequences. 

Some people have expressed some sense that…  Alignment posts don’t quite feel like they count. Partly because the Alignment Forum is [sort of] a separate website, and partly because they’re just a lot less accessible. I think it’s admittedly frustrating to have a lot of high-context, technical content that’s hard for the average user to participate in.

But, it still seems quite important, and I think it is sufficient to justify LessWrong 2.0’s existence. Much of it is tightly interwoven with the study of rationality and agency. And much of it is precisely the sort of hard, confusing problem that LessWrong was founded to help solve.

I do hope for more accessible “rationality”-oriented content to build momentum on LessWrong. I think some progress was made on that in 2019, and, well, we’ll hopefully see next year how that looks in retrospect.

Problems with Execution

Too much stuff

There were 75 posts that made it into the review process. This seemed like roughly the right number of contenders, but more than people could easily think about at once. One way or another, we need to do a better job of directing people’s attention.

Options I can see include:

A lot of the options feel like good ideas, but insufficient. But maybe if you combine them together you get something workable.

Voting vs Reviewing

This year, we initially planned to separate out the reviewing stage and the voting stage. Ben and I ended up deciding to have the Voting Phase overlap the end of the Review Phase. I can’t remember all the reasons for this, but they included “we still wanted more reviews overall, and we expected people who show up to vote to end up doing some reviews along the way”, and “some people might want to update their votes in response to reviews, and vice versa.”

I think it’s plausible that next year it might be better to have the Vote and Review phases completely overlap. (In particular if we end up doing something like “assign each user 10 random posts to evaluate.” I imagine “do a serious review of 10 posts” being a fairly hard task, but “think about 10 posts enough to cast some votes on them, and if you end up writing reviews in the meanwhile, that’d be great” being fairly achievable.)

Voting vs Survey

As I mentioned earlier [LW(p) · GW(p)]: A worry I have with our voting system this year is that it felt more to me like "ranking posts by prestige" than "ranking them by truth/usefulness." It so happens that "prestige in LW community" does prioritize truth/usefulness and I think the outcome mostly tracked that, but I think we can do better.

I'm worried because:

I also noticed conflict between “which posts really make sense to showcase to the rest of the world?” and “which posts do we want to reward for important internal updates, whether very technical or part of high-context ongoing conversations?”

So I’d like to frame it more as a survey next year, with questions like:

(With options for "yes/no/neutral", perhaps with Strong No and Strong Yes)

The main reason not to do this is that in many cases the answers may be similar enough that they feel annoyingly redundant rather than helpfully nuanced. But I currently lean towards “try it at least once.” I’m hoping this would prompt users to give honest answers rather than trying to vote strategically.
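For concreteness, here is a minimal sketch of how a survey-style vote like this might be tallied. The question names, the Strong No…Strong Yes scale, and the scoring are placeholder assumptions based on the proposal above, not an existing LessWrong feature.

```python
# Hypothetical tally for a survey-style review vote (placeholder questions/scale).
from collections import Counter

ANSWER_VALUES = {"Strong No": -2, "No": -1, "Neutral": 0, "Yes": 1, "Strong Yes": 2}

def tally(responses):
    """responses: list of {question: answer} dicts, one per voter.
    Returns per-question answer counts plus a simple net score."""
    counts, net = {}, {}
    for response in responses:
        for question, answer in response.items():
            counts.setdefault(question, Counter())[answer] += 1
            net[question] = net.get(question, 0) + ANSWER_VALUES[answer]
    return counts, net

responses = [
    {"Was this post valuable?": "Strong Yes", "Showcase to the world?": "No"},
    {"Was this post valuable?": "Yes", "Showcase to the world?": "Neutral"},
]
counts, net = tally(responses)
print(net)  # {'Was this post valuable?': 3, 'Showcase to the world?': -1}
```

Keeping the per-question answer distributions (rather than only a net score) is what would let the questions disentangle “valuable” from “showcase-worthy” instead of collapsing back into one prestige ranking.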

If we ended up combining the Review and Voting Phases, this might come along with text-boxes for Reviews. Possibly just one catch-all textbox. Or, possibly broken up into multiple freeform questions, such as:

Nomination Phase was a bit confusing

I originally hoped the nominations phase would include detailed information on why the posts were good. Instead most nominations ended up fairly brief, and people gave more nuanced positive thoughts in the review phase.

I think in many cases it was useful for nominations to include reasons why, to give other nominators some context for why they might consider seconding the nomination. But in some cases that just felt superfluous.

At the very least, next year I’d include a more prominent “endorse nomination” button that makes it lower-effort for subsequent nominators to increase the nomination count. It’s possible that including reasons for nominations isn’t necessary, and we can handle that as part of the Review step.

Alignment Review?

It did seem like the Alignment Forum would have benefited from a somewhat different review process. My current guess is that next year there will be a LessWrong Review and an Alignment Review, with some overlapping content, but optimized separately.

It's possible the Alignment Review might include things not published on the Alignment Forum (and, in fact, it’d be a fine outcome if it were concluded that the most important Alignment progress happened elsewhere). In the months leading up to the review, AF users might be encouraged to make link posts for 2019 Alignment Content that they think was particularly important. 

Further Feedback?

Did you have your own take on the review process? Were there particular problems with execution? Having seen how the review process fit together, do you have overall concerns about the approach or the ontology?

I'm interested in feedback on whatever levels stand out to you.

16 comments

Comments sorted by top scores.

comment by Pattern · 2020-02-26T18:51:45.609Z · score: 2 (1 votes) · LW(p) · GW(p)

Content:

Did you have your own take on the review process? Were there particular problems with execution?

Yes:

Give users a reason to improve old content

I prefer old posts stay the same.

An addition like "EDIT: an improved version of this essay was written for the 2019 review of 2010, click here to go to that post."* isn't bad. Worst case, this ends up forming a long linked list terminating in the latest version.

*A shorter version would be: "Next version of post here: [link]"

Doubly linked or a sequence might be better, but it's preservation of old links that is important. Posts that were good were good for a reason. Seeing the same idea presented multiple ways can make it easier to grasp, while rewrites can be less clear/useful in the ways the prior version was.

Changing the original can also decrease the value of discussions of old content, including:

Evaluation.

Figure out how to evaluate blogposts

This is easier when evaluations can be seen in the context of what they were evaluating, because it hasn't been changed.


Styling:

One thing I found was that in 2017, there were relatively few unique authors that I was excited [about] – 
I think we have [a ways] to go
comment by Raemon · 2020-02-26T21:13:43.579Z · score: 2 (1 votes) · LW(p) · GW(p)

One thing to check, which you may not be aware of: there's a (somewhat unfinished) feature wherein post-updates can be tagged with a "major revision". (Currently this feature is admin-only. It's been used for the updated versions of sequence posts, and a couple of ad-hoc things. I'm guessing it will be formally shipped to the public by the end of 2020.)

Once a post has at least one major revision, the top of the post gets a drop-down menu that lets you select past versions. (Changing version changes the url, which you can then link to)

Old comments, made before a major revision, get a little icon at the top-right that says "this comment was made for an older version of this post" (which you can click on to see that version of the post)

You can see this in action for the Being a Robust Agent post.

The current implementation is that the default link goes to the newest version, but you can create permalinks to old versions if you want. I can imagine deciding to change that, although I think on average it'll be a better experience to get the latest version of a post (with clear signposting that you are on a second version). I agree that sometimes someone might go all George-Lucas on their content and make it worse, but I expect most major version updates to be improvements.

I can imagine you still disagreeing with the underlying philosophy but wanted to make sure you were aware of all that before delving into it.

comment by Pattern · 2020-02-27T05:08:33.610Z · score: 2 (1 votes) · LW(p) · GW(p)

I appreciate your response. After checking one of the posts changed in the review I found that this feature was in place for it.

If this is the norm for reviews and changes otherwise remain rare,* this appears to take care of the major/immediate aspects that were of concern (w.r.t. preserving old links).

If at some future point LW (the site) ceases to exist, then things being formatted in this fashion might make preservation more difficult, but my concerns are currently more short term.


*It makes the following things trivial inconveniences, rather than large or mild:

  • after following an old link, accessing the original version (?revision=1.0.0)
  • when making a link to version two, making sure the link is to the new version (?revision=2.0.0)
comment by rohinmshah · 2020-02-26T17:14:48.803Z · score: 2 (5 votes) · LW(p) · GW(p)
It's possible the Alignment Review might include things not published on the Alignment Forum

I don't want this. There's a field of alignment outside of the community that uses the Alignment Forum, with very different ideas about how progress is made; it seems bad to have an evaluation of work they produce according to metrics that they don't endorse.

(I'm imagining a new entrant to the field deciding whether to go to CHAI, Ought, FHI, OpenAI or DeepMind based on the results of this review, and thinking "oh god no that sounds horrible".)

comment by Raemon · 2020-02-26T22:53:01.344Z · score: 4 (2 votes) · LW(p) · GW(p)

Nod. (to be clear this is still all in the early brainstorming stage. I ran the idea by ~3 alignment people, but an Alignment Review of any kind, AF-forum specific or otherwise, would be something I wanted to get a lot of buy-in for and iron out the details of before attempting)

(I'm imagining a new entrant to the field deciding whether to go to CHAI, Ought, FHI, OpenAI or DeepMind based on the results of this review, and thinking "oh god no that sounds horrible".)

I'm not actually sure I understood the intended point here, wondering if you could rephrase it in somewhat different words. 

Current Thoughts

Here's a more fleshed out version of my current thinking (again, all of this is intended to be early stage brainstorming-type-thoughts)

Taking a step back, there are a few different problems that seem worth solving. The core thing is "getting alignment right is a high-stakes and confusing problem, and many people disagree about what counts as progress." 

One problem is "Even within individual alignment paradigms, there's not much common knowledge of what sorts of results are meaningful. Or even what exactly the paradigm is." (this currently seems true to me, although I can imagine some paradigms being clearer than others)

Another problem is "It's not obvious how to evaluate across paradigms, or what paradigm is right, or if multiple paradigms are right for different reasons, how to integrate them." And that's also an important problem to actually solve some day.

I can see reasons to not try to solve the second problem until there's clearer consensus on the first problem, within at least some clusters of researchers. But I can also imagine them not being (much) harder to solve together than separately.

A few possibilities I can imagine, knobs to turn, and/or considerations:

Considerations

Less "rank ordering evaluative-ness"?

If I naively imagine using something close to the 2019 review for alignment (even within a single paradigm), I expect my concerns about "sort by prestige" to be much worse, because there are greater political consequences that one could screw up (and a lack of common knowledge about how large and bad those consequences might be could make everyone too anxious to buy in). 

So, I would currently guess that even for an Alignment Forum centric review that just focused on itself, I'd be particularly wary of the output being obviously rank-ordered. It may be better to be more like a survey that focuses on qualitative rather than quantitative questions. (In places where it's necessary to produce a list of posts, have the list randomized or something)

(I think at some point it is fairly important to actually be able to say 'this stuff was most important' and give prizes and stuff, but that might be something you do more Nobel Prize style a decade+ after the fact)

Option #1 – AF-Centric

The simplest option is "just focus on AF evaluating itself." This comes with less downside risk, but also less upside. I think there's a decent chance that the most important progress will have come from outside the Alignment Forum. If AF isn't producing ideas that seem meaningful outside of its own ecosystem, that's important to know. (It's not obviously bad; maybe the ideas don't seem meaningful because the broader ecosystem is wrong. But knowing which sorts of posts seemed good to a wider variety of researchers is useful.)

Option #2 – A Non-Alignment-Forum-Review, with Buy In From Everyone

The far end of a spectrum would be to not host the review on Alignment Forum, instead creating a new body that's specifically aiming to be representative of various subfields and paradigms of people who are working on something reasonably called "AI alignment", and get each of their opinions.

It's not really obvious where to draw that boundary, but it'd likely make sense to get people from OpenAI, DeepMind and CHAI. (I'm not as up-to-date on what Ought or FHI do, but depending on what sort of research they do I could see it.)

Option #3 – Alignment Forum Review of Everything, with Buy In From Everyone

Alternately, you might say "well, we tried to create a place where people from various paradigms could communicate with each other, and that place was the Alignment Forum, and insofar as that feels like a place with some-particular-paradigm where other paradigms are kinda unwelcome, that's something that should be fixed, and part of the point here might be to extend an olive branch to other places and get them more involved."

A recent observation someone made in person is that the AF is filtered for people who like to comment on the internet, which isn't the same filter as "people who like Agent Foundations", but there is some history that sort of conflates that. And meanwhile researchers elsewhere may not want to get dragged into internet discussions in the first place.

One thing the Review might offer is a way for people-who-don't-like-internet-discussions to quickly fill out a survey that gives them more of a voice into "the AF consensus" without having to get involved with extended discussions. (Or to have those discussions be time-boxed to one month.)

This is currently my favorite option (though not one that's obviously endorsed by anyone else).

Option #4 – Alignment Forum Review of Everything, with Buy In From AF and that's it

Somewhere else on a (possibly multidimensional) spectrum is to say "Okay, AF has some kind of opinion on what paradigms are good. That's supposed to be a relatively broad consensus among people at CHAI, OpenAI, MIRI, and at least some people from DeepMind. It might currently disproportionately favor MIRI for historical reasons, which is kinda bad. But, the optimal version of itself is still somewhat opinionated, compared to the broader landscape."

And in that case, yes, it'd be evaluating things on different metrics than people wanted for themselves. But... that seems fine? It's an important part of science as an institution that people get to evaluate you based on different things than you might have wanted to be evaluated on. 

Conferences you submit papers to can reject you; journals might be aiming to focus on particular subfields, and maybe you think your thing is relevant to their subfield but they don't. 

Most nonprofits aren't trying to optimize for Givewell's goals, but it was good that Givewell set up a system of evaluating that said "if you care about goal X, we think these are the best nonprofits, here's why." The collective system of evaluation from different vantage points is a key piece of intellectual progress.

comment by Vanessa Kosoy (vanessa-kosoy) · 2020-02-27T06:48:40.861Z · score: 5 (3 votes) · LW(p) · GW(p)

If I naively imagine using something close to the 2019 review for alignment (even within a single paradigm), I expect my concerns about "sort by prestige" to be much worse, because there are greater political consequences that one could screw up (and a lack of common knowledge about how large and bad those consequences might be could make everyone too anxious to buy in).

I don't think so.

Your main example for the prestige problem with the LW review was "affordance widths". I admit that I was one of the people who assigned a lot of negative points to "affordance widths", and also that I did it not purely on abstract epistemic grounds (in those terms the essay is merely mediocre) but because of the added context about the author. When I voted, the question I was answering was "should this be included in Best of 2018", including all considerations. If I wasn't supposed to do this, then I'm sorry; I hadn't noticed before.

The main reason I think it would be terrible to include "affordance widths" is not exactly prestige. The argument I used before [LW(p) · GW(p)] is prestige-based, but that's because I expected this part to be more broadly accepted, and wished to avoid the more charged debate I anticipated if I ventured closer to the core. The main reason is, I think it would send a really bad message to women and other vulnerable populations who are interested in LessWrong: not because of the identity of the author, but because the essay was obviously designed to justify the author's behavior. Some of the reputational ramifications of that would be well-earned (although I also expect the response to be disproportional).

On the other hand, it is hard for me to imagine anything of the sort applying to the Alignment Forum. It would be much more tricky to somehow justify sexual abuse through discussion about AI risk, and if someone accomplished it then surely the AI-alignment-qua-AI-alignment value of that work would be very low. The sort of political considerations that do apply here are not considerations that would affect my vote, and I suspect (although ofc I cannot be sure) the same is true about most other voters.

Also, next time I will adjust my behavior in the LW vote, since clearly it is against the intent of the organizers. However, I suggest that some process be created in parallel to the main vote, where context-dependent considerations can be brought up, either for public discussion or for the attention of the moderator team specifically.

comment by Raemon · 2020-02-27T21:17:07.776Z · score: 2 (1 votes) · LW(p) · GW(p)

To be clear, given the vote system we went with (which basically rolled all considerations into a single vote and ask), I don't think there was anything wrong with voting against Affordance Widths for that reason. 

I saw this more as "the system wasn't well designed, we should use a better system next time."

(Different LW team members also had different opinions on what exactly the Review should be doing and why, and some changed their mind over the course of the process, which is part of why some of the messaging was mixed).

The reason I thought (at the time) it was best to "just collapse everything into one vote, which is tied fairly closely to 'what should be in the book?'" was that if you told people it was about "being honest about good epistemics", but the result still ended up influencing the book, you'd have something of an asshole filter where some people vote strategically and are disproportionately rewarded.

I think I may have some conceptual disagreements with your framing, but my current goal for next year is to structure things in a way that separates out truth, usefulness, and broader reputational effects from each other, so that the process is more robust to people coming at it with different goals and frames.

comment by Raemon · 2020-02-27T21:26:54.745Z · score: 2 (1 votes) · LW(p) · GW(p)

The reason I'm more worried about this for an Alignment Review is that the stakes are higher, and it is not only important that the process be epistemically sound, but for everyone to believe it's epistemically sound and/or fair. (And meanwhile sexual abuse isn't the only possible worrisome thing to come up)

comment by rohinmshah · 2020-02-27T00:07:56.158Z · score: 2 (1 votes) · LW(p) · GW(p)
I'm not actually sure I understood the intended point here, wondering if you could rephrase it in somewhat different words. 

There are two pretty different approaches to AI safety, which I could uncharitably call MIRI-rationalist vs. everyone else. (I don't have an accurate charitable name for the difference.) I claim that AF sees mostly just the former perspective. See this comment thread [LW(p) · GW(p)]. (Standard disclaimers about "actually this is a spectrum and there are lots of other features that people disagree on", the point is that this is an important higher-order bit.)

I think that for both sides:

  • Their work is plausibly useful
  • They don't have a good model of why the other side's work is useful
  • They don't expect the other side's work to be useful on their own models

Given this, I expect that ratings by one side of the other side's work will not have much correlation with which work is actually useful.

So, such a rating seems to have not much upside, and does have downside, in that non-experts who look at these ratings and believe them will get wrong beliefs about which work is useful.

(I already see people interested in working on CHAI-style stuff who say things that MIRI-rationalist viewpoint says where my internal response is something like "I wish you hadn't internalized these ideas before coming here".)

I expect my concerns about "sort by prestige" to be much worse

I agree with this but it's not my main worry.

The far end of a spectrum would be to not host the review on Alignment Forum, instead creating a new body that's specifically aiming to be representative of various subfields and paradigms of people who are working on something reasonably called "AI alignment", and get each of their opinions.

This would be good if it could be done; I'd support it (assuming that you actually get a representative body). I think this is hard, but that doesn't mean it's impossible / not worth doing, and I'd want a lot of the effort to be in ensuring that you get a representative body.

A recent observation someone made in person is that the AF is filtered for people who like to comment on the internet, which isn't the same filter as "people who like Agent Foundations", but there is some history that sort of conflates that. And meanwhile researchers elsewhere may not want to get dragged into internet discussions in the first place.

I don't think this is the main selection effect to worry about.

Okay, AF has some kind of opinion on what paradigms are good. That's supposed to be a relatively broad consensus among people at CHAI, OpenAI, MIRI, and at least some people from DeepMind.

It's not a broad consensus. CHAI has ~10 grad students plus a few professors, research engineers, and undergrads; only Daniel Filan and I could reasonably be said to be part of AF. OpenAI has a pretty big safety team (>10, probably); only Paul Christiano could reasonably be said to be part of AF. Similarly for DeepMind, where only Richard Ngo would count.

But, the optimal version of itself is still somewhat opinionated, compared to the broader landscape.

Seems right; we just seem very far from this version.

And in that case, yes, it'd be evaluating things on different metrics than people wanted for themselves. But... that seems fine? It's an important part of science as an institution that people get to evaluate you based on different things than you might have wanted to be evaluated on. 

Agreed for the optimal version.

Conferences you submit papers to can reject you, journals might be aiming to focus on particular subfields and maybe you think your thing is relevant to their subfield but they don't. 

I'd be pretty worried if a bunch of biology researchers had to decide which physics papers should be published. (This exaggerates the problem, but I think it does qualitatively describe the problem.)

Most nonprofits aren't trying to optimize for GiveWell's goals, but it was good that GiveWell set up a system of evaluation that said "if you care about goal X, we think these are the best nonprofits, and here's why."

Nonprofits should be accountable to their donors. X-risk research should be accountable to reality. You might think that accountability to an AF review would be a good proxy for this, but I think it is not.

(You might find it controversial to claim that nonprofits should be accountable to donors, in which case I'd ask why it is good for GiveWell to set up such a system of evaluation. Though this is not very cruxy for me so maybe just ignore it.)

comment by Vanessa Kosoy (vanessa-kosoy) · 2020-02-27T07:12:37.451Z · score: 25 (6 votes) · LW(p) · GW(p)

I don't want this. There's a field of alignment outside of the community that uses the Alignment Forum, with very different ideas about how progress is made; it seems bad to have an evaluation of work they produce according to metrics that they don't endorse.

This seems like a very strange claim to me. If the proponents of the MIRI-rationalist view think that (say) a paper by DeepMind has valuable insights from the perspective of the MIRI-rationalist paradigm, and should be featured in "best [according to MIRI-rationalists] AI alignment work of 2018", how is that bad? On the contrary, it is very valuable that the MIRI-rationalist community is able to draw each other's attention to this important paper.

So, such a rating seems to have not much upside, and does have downside, in that non-experts who look at these ratings and believe them will get wrong beliefs about which work is useful.

Anything anyone says publicly can be read by a non-expert, and if something wrong was said, and the non-expert believes it, then the non-expert gets wrong beliefs. This is a general problem with non-experts, and I don't see how is it worse here. Of course if the MIRI-rationalist viewpoint is true then the resulting beliefs will not be wrong at all. But this just brings us back to the object-level question.

(I already see people interested in working on CHAI-style stuff who say things that MIRI-rationalist viewpoint says where my internal response is something like "I wish you hadn't internalized these ideas before coming here".)

So, not only is the MIRI-rationalist viewpoint wrong, it is so wrong that it irreversibly poisons the mind of anyone exposed to it? Isn't it a good idea to let people evaluate ideas on their own merits? If someone endorses a wrong idea, shouldn't you be able to convince em by presenting counterarguments? If you cannot present counterarguments, how are you so sure the idea is actually wrong? If the person in question cannot understand the counterargument, doesn't it make em much less valuable for your style of work anyway? Finally, if you actually believe this, doesn't it undermine the entire principle of AI debate? ;)

comment by rohinmshah · 2020-02-27T17:29:10.230Z · score: 5 (3 votes) · LW(p) · GW(p)

If the proponents of the MIRI-rationalist view think that (say) a paper by DeepMind has valuable insights from the perspective of the MIRI-rationalist paradigm, and should be featured in "best [according to MIRI-rationalists] of AI alignment work in 2018"

That seems mostly fine and good to me, but I predict it mostly won't happen (which is why I said "They don't expect the other side's work to be useful on their own models"). I think you still have the "poisoning" problem as you call it, but I'm much less worried about it.

I'm more worried about the rankings and reviews, which have a much stronger "poisoning" problem.

Anything anyone says publicly can be read by a non-expert, and if something wrong was said, and the non-expert believes it, then the non-expert gets wrong beliefs. This is a general problem with non-experts, and I don't see how is it worse here.
  • Many more people are likely to read the results of a review, relative to arguments in the comments of a linkpost to a paper.
  • Calling something a "review", with a clear process for generating a ranking, grants it much more legitimacy than one person saying something on the Internet.

So, not only is the MIRI-rationalist viewpoint wrong, it is so wrong that it irreversibly poisons the mind of anyone exposed to it?

Not irreversibly.

Isn't it a good idea to let people evaluate ideas on their own merits?

When presented with the strongest arguments for both sides, yes. Empirically that doesn't happen.

If someone endorses a wrong idea, shouldn't you be able to convince em by presenting counterarguments?

I sometimes can and have. However, I don't have infinite time. (You think I endorse wrong ideas. Why haven't you been able to convince me by presenting counterarguments?)

Also, for non-experts this is not necessarily true (or is true only in some vacuous sense). If a non-expert sees within a community of experts 50 people arguing for A, and 1 person arguing for not-A, even if they find the arguments for not-A compelling, in most cases they should still put high credence on A.

(The vacuous sense in which it's true is that the non-expert could become an expert by spending hundreds or thousands of hours becoming an expert, in which case they can evaluate the arguments on their own merits.)

If you cannot present counterarguments, how are you so sure the idea is actually wrong?

I in fact can present counterarguments, it just takes a long time.

If the person in question cannot understand the counterargument, doesn't it make em much less valuable for your style of work anyway?

Empirically, it seems that humans have very "sticky" worldviews, such that whichever worldview they first inhabit, it's very unlikely that they switch to the other worldview. So depending on what you mean by "understand", I could have two responses:

  • They "could have" understood (and generated themselves) the counterargument if they had started out in the opposite worldview
  • No one currently in the field is able to "understand" the arguments of the other side, so it's not a sign of incompetence if a new person cannot "understand" such an argument

Obviously ideal Bayesians wouldn't have "sticky" worldviews; it turns out humans aren't ideal Bayesians.

Finally, if you actually believe this, doesn't it undermine the entire principle of AI debate?

If you mean debate as a proposal for AI alignment, you might hope that we can create AI systems that are closer to ideal Bayesian reasoners than we are, or you might hope that humans who think for a very long time are closer to ideal Bayesian reasoners. Either way, I agree this is a problem that would have to be dealt with.

If you mean debate as in "through debate, AI alignment researchers will have better beliefs", then yes, it does undermine this principle. (You might have noticed that not many alignment researchers try to do this sort of debate.)

comment by Raemon · 2020-02-27T00:50:54.969Z · score: 4 (2 votes) · LW(p) · GW(p)

A lot of those concerns seem valid. I recalled the earlier comment thread and had it in mind while I was writing the response comment. (I agree that "viewpoint X" is a thing, and I don't even think it's that uncharitable to call it the MIRI/rationalist viewpoint, although it's simplified)

Fwiw, while I prefer option #3 (I just added #s to the options for easier reference), #2 and #4 both seem pretty fine. And whichever option one went with, getting representative members seems like an important thing to put a lot of effort into.

My current sense is that AF was aiming to be a place where people-other-than-Paul at OpenAI would feel comfortable participating. I can imagine it turning out that "AF has already failed at this badly enough that, if you want that, you need to start over", but starting over is moderately expensive. I would agree that this would require a lot of work, but it seems potentially quite important and worthwhile.

What are the failure modes you imagine, and/or how much harder do you think it is, to host the review on AF while aiming for a broader base of participants than AF currently feels oriented towards? (As compared to "trying for a broad base of participants and hosting it somewhere other than AF".)

comment by Raemon · 2020-02-27T01:08:47.525Z · score: 2 (1 votes) · LW(p) · GW(p)

Random other things I thought about:

I can definitely imagine "it turns out to get people involved you need more anonymity/plausible-deniability than a public forum affords", so starting from a different vantage point is better.

One of the options someone proposed was "CHAI, MIRI, OpenAI and DeepMind [potentially other orgs] are each sort of treated as an entity in a parliament, with N vote-weight each. It's up to them how they distribute that vote weight among their internal teams." I think I'd weakly prefer "actually you just really try to get more people from each team to participate, so you end up with information from 20 individuals rather than 4 opaque orgs", but I can imagine a few reasons why the former is more practical (with the plausible deniability being a feature/bug combo).
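For concreteness, here is a minimal sketch of how that org-parliament mechanism might aggregate votes. Everything here is hypothetical illustration (the function, the normalization rule, and the sample weights and ballots are my own assumptions, not anything actually proposed): each org gets a fixed vote-weight, distributes it internally however it likes, and the org's internal scores are scaled so it contributes exactly its weight.

```python
def aggregate_votes(org_weights, org_ballots):
    """Aggregate org-level parliament votes.

    org_weights: {org_name: total vote-weight for that org}
    org_ballots: {org_name: {post_id: score}} -- each org's internal scores
                 are normalized so the org spends exactly its vote-weight.
    Returns {post_id: total weighted score}.
    """
    totals = {}
    for org, ballots in org_ballots.items():
        weight = org_weights[org]
        # Normalize by the total absolute score the org handed out,
        # so no org can exceed its allotted weight by voting harder.
        norm = sum(abs(s) for s in ballots.values()) or 1
        for post, score in ballots.items():
            totals[post] = totals.get(post, 0.0) + weight * score / norm
    return totals

# Illustrative only: four orgs with equal weight voting on two posts.
weights = {"CHAI": 1.0, "MIRI": 1.0, "OpenAI": 1.0, "DeepMind": 1.0}
ballots = {
    "CHAI": {"post_a": 1.0},
    "MIRI": {"post_a": 0.5, "post_b": 0.5},
    "OpenAI": {"post_b": 1.0},
    "DeepMind": {"post_a": -1.0},
}
print(aggregate_votes(weights, ballots))  # post_a: 0.5, post_b: 1.5
```

The normalization step is where the feature/bug combo shows up: individual votes are invisible from outside, only the org-level total is.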

comment by rohinmshah · 2020-02-27T16:59:27.752Z · score: 2 (1 votes) · LW(p) · GW(p)
My current sense is that AF was aiming to be a place where people-other-than-Paul at OpenAI would feel comfortable participating.

Agreed that that was the goal; I'm arguing that it has failed at this. (Or, well, maybe they'd be comfortable participating, but they don't see the value in participating.)

What are the failure modes you imagine, and/or how much harder do you think it is, to host the review on AF, while aiming for a broader base of participants than AF currently feels oriented towards?

Mainly I think it would be really hard to get that broader base of participants. I imagine trying to convince specific people (not going to name names) that they should be participating, and the only argument that I think might be convincing to them would be "if we don't participate, then our work will be evaluated by MIRI-rationalist standards, and future entrants to the field will forever misunderstand our work in the same way that people forever misunderstand CIRL". It seems pretty bad to rely on that argument.

I think you might be underestimating how different these two groups are. Like, it's not just that they work on different things, they also have different opinions on the best ways to publish, what should count as good work, the value of theoretical vs. conceptual vs. empirical work, etc. Certainly most are glad that the other exists in the sense that they think it is better than nothing (but not everyone meets even this low bar), but beyond that there's not much agreement on anything. I expect the default reaction to be "this review isn't worth my time".

I can definitely imagine "it turns out to get people involved you need more anonymity/plausible-deniability than a public forum affords", so starting from a different vantage point is better.

As above, I expect the default reaction to be "this review isn't worth my time", rather than something like "I need plausible deniability to evaluate other people's work".

One of the options someone proposed was "CHAI, MIRI, OpenAI and Deepmind [potentially other orgs] are each sort of treated as an entity in a parliament, with N vote-weight each. It's up to them how they distribute that vote weight among their internal teams."

This sort of mechanism doesn't address the "review isn't worth my time" problem. It would probably give you a more unbiased estimate of what the "field" thinks, but only because e.g. Richard and I would get a very large vote weight. (And even that isn't unbiased -- Richard and I are much closer to the MIRI-rationalist viewpoint than the average for our orgs.)

comment by Raemon · 2020-02-27T00:59:45.476Z · score: 2 (1 votes) · LW(p) · GW(p)

On the Givewell example: 

One noteworthy thing about GiveWell is that it's not really trying to make all nonprofits accountable to donors (since most nonprofits aren't even ranked). It's trying to answer a particular question, for a subset of the donor population.

By contrast, something like CharityNavigator is aiming to cover a broad swath of nonprofits and is more implicitly claiming that all nonprofits should be more accountable-on-average than they currently are.

It's also noteworthy that GiveWell's paradigm is distinct from the general claims of "nonprofits should be accountable", or utilitarianism, or other EA frameworks. GiveWell is doing one fairly specific thing, which is different from what CharityNavigator or OpenPhil are doing.

I do think CharityNavigator is an important and perhaps relevant example, since it's optimizing a metric that I think is wrong. I think it's probably still at least somewhat good that CharityNavigator exists, since it moves the overall conversation of "we should be trying to evaluate nonprofits" forward and creates more transparency than there used to be. I could be persuaded that CharityNavigator was net-negative, though.

I'd be pretty worried if a bunch of biology researchers had to decide which physics papers should be published. (This exaggerates the problem, but I think it does qualitatively describe the problem.)

There's a pretty big distinction between starting a journal and "deciding which papers get published." If some biologists started a journal that dealt with physics (because they believed they had a unique and valuable take on Physics And Biology), that might be weird, perhaps bad. But it wouldn't be "deciding what physics things get published." It'd be "some biologists start a weird Physics Journal with its own kinda weird submission criteria."

(I think that might potentially be bad along the "affecting the signal/noise ratio" axis, but I also don't think the metaphor is that good – the only reason it feels potentially bad is the huge disconnect between physics and biology, and "biologists start a journal about some facet of biology that intersects some other field that's actually plausibly relevant to biology" feels fine.)

comment by rohinmshah · 2020-02-27T17:06:49.769Z · score: 2 (1 votes) · LW(p) · GW(p)
If some biologists started a journal that dealt with physics (because they thought they had some reason to believe they had a unique and valuable take on Physics And Biology) that might be weird, perhaps bad. But, it wouldn't be "decide what physics things get published." It'd be "some biologists start a weird Physics Journal with it's own kinda weird submission criteria."

I in fact meant "decide what physics things get published"; in this counterfactual every physics journal / conference sends their submissions to biologists for peer review and a decision on whether it should be published. I think that is more correctly pointing at the problems I am worried about than "some biologists start a new physics journal".

Like, it is not the case that there already exists a public evaluation mechanism for work coming out of CHAI / OpenAI / DeepMind. (I guess you could look at whether their papers are published at some top conference, but this isn't something OpenAI and DeepMind try very hard to do, and in any case that's a pretty bad evaluation mechanism, because it evaluates by the standards of the regular AI field, not the standards of AI safety.) So a public evaluation mechanism created where none exists is automatically going to acquire some legitimacy, at least among non-experts.