Recommendation Features on LessWrong

post by habryka (habryka4), jimrandomh · 2019-06-15T00:23:18.102Z · score: 61 (18 votes) · LW · GW · 36 comments

Contents

  Continue Reading
  From the Archives
  Why These Features
    LessWrong as a Repository for "long content"
    LessWrong as a Nudge
  Caveat: Addictiveness
  These Are Beta
37 comments

Today, we're rolling out several new beta features on the home page, which display recommended posts to read. The first is a Continue Reading section: if you start reading a multi-post sequence, it will suggest that you read another post from that sequence. The second is a From the Archives section, which recommends highly-rated posts that you haven't read, from all of LessWrong's history.

To use these features, please ensure you are logged in [LW · GW].


Continue Reading

Sequences are a mechanism on LessWrong for organizing collections of related posts. Anyone can create a Sequence from the library page [LW · GW]. If you write a series of posts and add them to a Sequence, they will have Previous and Next links at the top and bottom; if you create a Sequence out of posts by other authors, they will have Previous and Next links for readers who came to them via the Sequence.

When you are logged in and read a post from a Sequence, the first unread post from that sequence will be added as a recommendation in the Continue Reading section.

If you decide not to finish, hover over the recommendation and click the X button to dismiss the recommendation. For logged-out users, the Continue Reading section is replaced with a Core Reading section, which suggests the first posts of Rationality: A-Z [LW · GW], The Codex [LW · GW], and Harry Potter and the Methods of Rationality [LW · GW].
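The Continue Reading rule described above can be sketched in a few lines. This is only an illustration under assumed data shapes, not the site's actual code: the dictionary keys `id` and `post_ids`, the `read_ids` set, and the `dismissed` set are all hypothetical names.

```python
def continue_reading(sequences, read_ids, dismissed=frozenset()):
    """Return (sequence_id, post_id) pairs, one per started, unfinished sequence.

    sequences: list of dicts with hypothetical keys "id" and "post_ids" (in order).
    read_ids:  set of post ids the logged-in user has read.
    dismissed: set of sequence ids the user dismissed with the X button.
    """
    recommendations = []
    for seq in sequences:
        post_ids = seq["post_ids"]
        # A sequence only counts as "started" once at least one post was read.
        started = any(pid in read_ids for pid in post_ids)
        if not started or seq["id"] in dismissed:
            continue
        # Recommend the first unread post, wherever the user entered the sequence.
        first_unread = next((pid for pid in post_ids if pid not in read_ids), None)
        if first_unread is not None:
            recommendations.append((seq["id"], first_unread))
    return recommendations
```

Note that under this rule, someone who has read only the second post of a sequence is pointed back to the first post rather than on to the third.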


From the Archives

The home page now has a From the Archives section, which displays three posts randomly selected from the entire history of LessWrong. Currently, a post can appear in this section if:

(1) you've never read it while logged in (including on old-LessWrong),

(2) it has a score of at least 50, and

(3) it is not in the Meta section, or manually excluded from recommendation by moderators. (We manually exclude posts if they aged poorly in a way that wouldn't be captured by votes at the time -- for example, announcements of conferences that have already happened, and reporting of studies that later failed to replicate.)

Currently, if a post is eligible to appear in the From the Archives section, it will appear with probability proportional to the cube of its score.
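To make the cube weighting concrete: an eligible post with a score of 100 is 8 times as likely to be chosen as one with a score of 50, since (100/50)^3 = 8. The following is a minimal sketch of the selection rule, not the site's actual implementation. The field names (`id`, `score`, `meta`, `excluded`) are hypothetical, and drawing the three posts without replacement is an assumption the post does not state.

```python
import random

MIN_SCORE = 50  # eligibility threshold stated in the post


def eligible(post, read_ids):
    """Apply the three eligibility conditions (field names are hypothetical)."""
    return (
        post["id"] not in read_ids            # (1) never read while logged in
        and post["score"] >= MIN_SCORE        # (2) score of at least 50
        and not post.get("meta", False)       # (3) not in the Meta section...
        and not post.get("excluded", False)   # ...and not manually excluded
    )


def pick_from_archives(posts, read_ids, k=3, rng=random):
    """Draw up to k distinct posts, each with probability proportional to score**3."""
    pool = [p for p in posts if eligible(p, read_ids)]
    picks = []
    for _ in range(min(k, len(pool))):
        weights = [p["score"] ** 3 for p in pool]
        choice = rng.choices(pool, weights=weights, k=1)[0]
        picks.append(choice)
        pool.remove(choice)  # assumption: no post is shown twice at once
    return picks
```

Because the weights are recomputed on every call, each page load produces a fresh draw, and high-scoring posts dominate the results.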


Why These Features

LessWrong as a Repository for "long content"

Gwern’s about page has influenced me a lot in thinking about the future of LessWrong. Gwern uses the following quote:

The Internet is self destructing paper. A place where anything written is soon destroyed by rapacious competition and the only preservation is to forever copy writing from sheet to sheet faster than they can burn. If it’s worth writing, it’s worth keeping. If it can be kept, it might be worth writing…If you store your writing on a third party site like Blogger, Livejournal or even on your own site, but in the complex format used by blog/wiki software du jour you will lose it forever as soon as hypersonic wings of Internet labor flows direct people’s energies elsewhere. For most information published on the Internet, perhaps that is not a moment too soon, but how can the muse of originality soar when immolating transience brushes every feather?
-- Julian Assange (“Self destructing paper”, 5 December 2006)

Most of the content on the internet is designed to be read and forgotten in a very short period of time. Existing discussion platforms like Reddit and many forums even close threads automatically after a certain period of time to ensure that all discussion centers around the most recent activity on the site.

One of the goals I have with LessWrong is to be a place where we can build on each other's ideas over the course of multiple decades, and where if you show up, you engage with the content on the site in a focused way, more similar to a textbook than a normal internet forum. And as with our best textbooks, good introductions to core topics tend to stand the test of time quite well (e.g. the Feynman Lectures, which are still one of the best introductions to physics, if not the best, even 60 years later).

The Continue Reading system is a key part of that goal, because it makes it much easier to use the site as a tool for focused study, since continuing to read the sequences you started is now one of the core actions on the site.

The recommendation system is also a key part of that goal, because it creates a way to discover content from the complete history of LessWrong, instead of just the last week, which strikes me as a necessary component to make collective intellectual progress that can span multiple decades. The best things for almost anyone to read have very likely not been written in the past week.


LessWrong as a Nudge

Continue Reading is a nudge to encourage reading a few long things, rather than a lot of short things. Longer writing allows topics to be explored in greater depth, and also enables more explicit decisions about what to read, since making one decision per sequence is a lot less work than making one decision per post.

From the Archives is a nudge to read better posts. When we choose what to read, there is often a recency bias; the best writing of the past ten years will be better, on average, than the best writing of the past week, but active conversations will focus on the most recent things. A good information diet contains a mix of recent writing and of timeless classics; by putting From the Archives on the home page, we are saying, on the margin, read more of the past.

I also think that From the Archives will have a positive effect on what people write on LessWrong. There are many good ideas in LessWrong's archives, waiting to be built upon, which haven't received attention recently; my hope is that recommendations of older posts will inspire more good writing.


Caveat: Addictiveness

Reading the latest posts on LessWrong is finite; you will run out of interesting-seeming recent posts, which creates a natural limit on time spent. Reading posts from the archives is effectively infinite; LessWrong's archives are deep enough that you probably won't ever run out of things to read. These new recommendation features therefore offer an opportunity to spend a lot of time by accident. We'd rather you make a deliberate decision about what and how much to read on LessWrong.

If you find you're spending more time reading LessWrong's recommended posts than you want, or expect that you would spend more time than you want to, you can turn off the Continue Reading and/or From the Archives sections by clicking the gear icon. (This requires that you be logged in to save the setting.)


These Are Beta

These features are beta, and probably have bugs. The From the Archives post selection algorithm we're currently using (based on post scores) seems to work okay for now, but scores are heavily affected by post visibility as well as quality, so some posts (especially imported posts) aren't being recommended that should be, and post scores will suffer a positive-feedback effect where being recommended causes posts to be recommended more. So, we expect to rely less on the raw post score in the future, and more on other evaluation mechanisms such as asking users for retrospective evaluations of posts they've previously read, read-completion and clickthrough rates, vote-to-views ratios, etc. The recommendation algorithm is likely to become too complex to straightforwardly explain, though its workings will always be knowable to those willing to dive into the source code.

36 comments

Comments sorted by top scores.

comment by John_Maxwell_IV · 2019-06-16T05:40:55.363Z · score: 19 (6 votes) · LW · GW

Idea: nudge "From the Archives" so it tends to show different users the same posts around the same time, so if someone leaves a comment on a post they read, others might also see the comment and a discussion can happen. (Or alternatively, I suppose you could just upweight posts which recently received comments in the "From the Archives" selection process. That seems better.)

comment by Raemon · 2019-06-17T21:59:21.062Z · score: 7 (3 votes) · LW · GW
(Or alternatively, I suppose you could just upweight posts which recently received comments in the "From the Archives" selection process. That seems better.)

This is much simpler and I think makes for a pretty good default plan.

(There's also the question of "the recent discussion section doesn't currently do a good enough job of highlighting recent comments to address this whole concern automatically." I'm curious if people have opinions on what would improve that situation.)

comment by Zvi · 2019-06-16T15:15:16.270Z · score: 7 (3 votes) · LW · GW

I'd go a step stronger. Brainstorm: From the Archives should have a random order for some time period (e.g. something from a day to a week) and show you the three things highest on that list that you haven't read.

comment by habryka (habryka4) · 2019-06-17T22:15:52.336Z · score: 2 (1 votes) · LW · GW

The problem with this approach is that we randomize recommendations on each load, and it's not obvious how to do this while preserving that functionality (which I think is really key for the whole thing to work).

comment by philh · 2019-06-19T07:56:55.436Z · score: 15 (5 votes) · LW · GW

Fwiw, on other sites I sometimes find that I see something interesting just as I'm clicking away, and then when I come back the interesting thing is gone. Making the recommendations a little sticky would help with that. (I see they don't reload if I use the back button, so that might be sufficient.)

comment by Zvi · 2019-06-19T15:48:32.560Z · score: 4 (2 votes) · LW · GW

A 'remind me what recommendations you've given me recently' list being available to be clicked on might be nice?

comment by habryka (habryka4) · 2019-06-19T18:40:31.324Z · score: 2 (1 votes) · LW · GW

Yeah, agree with that. Not sure where to put a link to that, but will figure something out.

comment by Vaniver · 2019-06-17T05:45:52.623Z · score: 9 (4 votes) · LW · GW

So, I was just recommended Plastination is Maturing and Needs Funding [LW · GW]. I considered putting some effort into "what's the state of plastination in 2019, 7 years later?" and commenting, but hit a handful of obstacles, one of which was "is the state of plastination in 2019 long content?". Like, the relevant fund paid out its prizes at various times, and it'd take a bit more digging to figure out if the particular team in Hanson's post was the one that won, and it's not really obvious if it matters. (Suppose we discover that the prize wasn't won by that team, after the evaluation was paid for; what does that imply?)

This makes me more excited about John's idea [LW · GW] that shows posts with some simultaneity between users; like the Sequences Reruns, for example. It might be worth it to have a comment writing up what's changed for the other people clicking on it in 2019 who don't know where to look or aren't that committed to figuring things out, where it doesn't make sense to push that post into 'recent discussion' on my own (if this was randomly picked for me).

comment by Tetraspace Grouping (tetraspace-grouping) · 2019-06-15T13:22:17.340Z · score: 8 (4 votes) · LW · GW

Is there any way to mark a post as unread? It's recommending me a lot of sequences that it believes I'm halfway through when in fact I've just briefly checked a couple of posts in it, and it would be nice if I could start it again from the beginning.

comment by habryka (habryka4) · 2019-06-15T16:31:46.852Z · score: 5 (3 votes) · LW · GW

Yeah, this is pretty high on the Todo list. Hopefully we can do that next week.

comment by pranomostro · 2019-06-15T10:19:52.496Z · score: 7 (4 votes) · LW · GW

Just some quick feedback on the "Continue Reading" feature: At the moment, when I read a post in the middle of a sequence, the next recommended post is at the beginning of the sequence, but I would like it to be the post after the last one I read in the sequence. Perhaps this is intentional, but I wouldn't use the feature that way, since I already try to read the posts in sequence (sometimes without being logged in).

comment by jimrandomh · 2019-06-15T19:17:36.315Z · score: 5 (3 votes) · LW · GW

It's ambiguous whether to recommend the first unread post or the next post after the last read, and I suspect neither answer will satisfy everyone. You can at least click through to the sequence table of contents, and go from there, though.

comment by habryka (habryka4) · 2019-06-15T16:34:49.691Z · score: 5 (3 votes) · LW · GW

I prefer the current setup, mostly because I often discover sequences by just reading posts in the recommendations that then turn out to have been part of a sequence I want to read, for which I then want to start at the beginning (and I expect this will be particularly the case with posts from R:A-Z for most users).

Will think about whether there is a way to get the best of both worlds.

comment by Zvi · 2019-06-16T15:16:25.520Z · score: 2 (1 votes) · LW · GW

My gut says that it's worth it to explicitly offer both, if someone comes in in the middle?

comment by Said Achmiz (SaidAchmiz) · 2019-06-15T02:04:15.292Z · score: 6 (4 votes) · LW · GW

Currently, a post can appear in this section if … it has a score of at least 50 …

Is this adjusted by post date? Posts from before the relaunch are going to have much less karma, on average (and as user karma grows and the karma weight of upvotes grows with it, average karma will increase further). A post from last month with 50 karma, and a post from 2010 with 50 karma, are really not comparable…

We manually exclude posts if they aged poorly in a way that wouldn’t be captured by votes at the time—for example … reporting of studies that later failed to replicate

I wish you wouldn’t!

It seems to me that it would be extremely valuable to include posts like this in the recommendations—but annotate them with a note that the research in question hasn’t replicated. This would, I think, have an excellent pedagogic effect! To see how popular, how highly-upvoted, a study could be, while turning out later to have been bunk—think of the usefulness as a series of naturalistic rationality case studies! (Likewise useful would be to examine the comment threads of these old posts; did any of the commentariat suspect anything amiss? If so, what heuristics did they use? Did certain people consistently get it right, and if so, how? etc.) The new recommendation engine could do great good, in this way…

comment by jimrandomh · 2019-06-15T04:04:10.045Z · score: 8 (4 votes) · LW · GW

Is this adjusted by post date? Posts from before the relaunch are going to have much less karma, on average (and as user karma grows and the karma weight of upvotes grows with it, average karma will increase further). A post from last month with 50 karma, and a post from 2010 with 50 karma, are really not comparable…

This is one of a number of significant problems with using karma for this. My ideal system - which we probably won't do soon, because of the amount of effort involved - would be something like:

  • Periodically, users get a list of posts that they read over the past week, and are asked to pick their favorite and to update their votes
  • This is converted into pairwise comparisons and used to generate an elo rating for each post
  • The recommender has a VOI factor to increase the visibility of posts where it doesn't have a precise enough estimate of the rating
  • We separately have trusted raters compare posts from a more random sampling, compute a separate set of ratings that way, and use it as a ground truth to set the tuning parameters and see how well it's working.

In this world, karma would still be displayed and updated in response to votes the same way it is now, to give people an estimate of visibility and reception and to get a quick initial estimate of quality, but it would be superseded as a measurement of post quality for older content.
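As a rough illustration of the second bullet only, here is how a "pick your favorite" response might be turned into pairwise comparisons and standard Elo updates. It is a sketch under assumptions, not a design anyone has committed to: the K-factor of 16, the starting rating of 1500, and the function names are all hypothetical, and the VOI factor and trusted-rater calibration are left out.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(ratings, favorite, read_posts, k=16, default=1500.0):
    """Treat the favorite as having 'won' against every other post the user read.

    ratings:    dict mapping post id -> current rating (mutated and returned).
    favorite:   id of the post the user picked as their favorite.
    read_posts: ids of all posts on the user's list, including the favorite.
    k, default: hypothetical tuning values, not taken from the discussion.
    """
    for other in read_posts:
        if other == favorite:
            continue
        r_fav = ratings.get(favorite, default)
        r_other = ratings.get(other, default)
        # The favorite gains what the other post loses, so total rating is conserved.
        delta = k * (1.0 - expected_score(r_fav, r_other))
        ratings[favorite] = r_fav + delta
        ratings[other] = r_other - delta
    return ratings
```

One property of this scheme is that beating an already highly rated post moves the ratings more than beating a low-rated one, which is part of what makes pairwise ratings less sensitive to raw visibility than karma totals.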

comment by habryka (habryka4) · 2019-06-15T02:14:47.593Z · score: 2 (1 votes) · LW · GW
It seems to me that it would be extremely valuable to include posts like this in the recommendations—but annotate them with a note that the research in question hasn’t replicated. This would, I think, have an excellent pedagogic effect! To see how popular, how highly-upvoted, a study could be, while turning out later to have been bunk—think of the usefulness as a series of naturalistic rationality case studies! (Likewise useful would be to examine the comment threads of these old posts; did any of the commentariat suspect anything amiss? If so, what heuristics did they use? Did certain people consistently get it right, and if so, how? etc.) The new recommendation engine could do great good, in this way…

This is an interesting point. I think I would be in favor of this if we had a way to pin comments to the top as moderators. Right now I expect we could leave a comment, but I don't expect that comment to actually show up high enough in the comment tree to be seen by most users, and we could edit the post but I am particularly hesitant to write retraction notices for other people.

Ideally I would want a way for things like this to happen organically driven by user activity instead of moderator intervention, but I don't know yet how to best do that. Interested in suggestions, since it feels important for the broader vision of making progress over a long period of time.

comment by Said Achmiz (SaidAchmiz) · 2019-06-15T03:24:09.174Z · score: 2 (1 votes) · LW · GW

Why not insert a note at the top of the post?

Make it stand out, visually, like put it in a “moderator note” box or whatever, and you’re good to go…

Ideally I would want a way for things like this to happen organically driven by user activity instead of moderator intervention

I confess I don’t really know what you mean by this.

comment by habryka (habryka4) · 2019-06-15T03:36:41.283Z · score: 2 (1 votes) · LW · GW

I think there is still a loss of ownership that people would feel when we add big moderator notes to the top of their posts, even if clearly signaled as moderator-added content, that I think would feel quite violating to many authors, though I might be wrong here.

I confess I don’t really know what you mean by this.

Not sure how to explain more. It would be good if there was some system that would allow other users that are not moderators to be able to inform other users about the updated epistemic content of a post. There are many potential ways to achieve that.

One might be to add inline comments that, when they reach a certain threshold of votes, can be displayed prominently enough to get the attention of others reading the content for the first time (though that also comes with a cost). Another might be to find some way to reduce or remove the strong first-mover bias in comment sections that prevents new comments from reaching the top of the comment section most of the time (due to voting activity usually being concentrated right after a post is created, which makes it hard for new comments to get a lot of upvotes).

comment by Said Achmiz (SaidAchmiz) · 2019-06-15T03:50:24.186Z · score: 2 (1 votes) · LW · GW

It would be good if there was some system that would allow other users that are not moderators to be able to inform other users about the updated epistemic content of a post.

I see, yes. Well, I agree that such a system would be good to have, but I am not convinced that it would be better for what I have in mind than using the recommendation system you’ve built for this. After all, three-quarters of the work here is precisely in bringing the old posts in question to the attention of users; relying on users in the first place to accomplish that seems to be an ineffective plan, whereas using the automated recommendation engine is perfect. (Still, the user-originated system you allude to would, I think, be a good supplement.)

I think there is still a loss of ownership that people would feel when we add big moderator notes to the top of their posts, even if clearly signaled as moderator-added content, that I think would feel quite violating to many authors, though I might be wrong here.

Well, that seems to me to be a matter of designing the UI/styling for clear separation, which is an eminently tractable problem. (Or do you disagree, with either clause?) There is, after all, all sorts of metadata and navigation UI and so on around a post, which is not generated by the author (directly or at all); have the layout and styling and such of these “moderator notes” clearly associate them with this metadata/navigation, and I think (unless I am misunderstanding you) that your concern is thereby addressed.

comment by habryka (habryka4) · 2019-06-15T04:04:47.030Z · score: 2 (1 votes) · LW · GW
After all, three-quarters of the work here is precisely in bringing the old posts in question to the attention of users; relying on users in the first place, to accomplish that, seems to be an ineffective plan—whereas using the automated recommendation engine is perfect. (Still the user-originated system you allude to would, I think, be a good supplement.)

This indicates at least some misunderstanding of what I tried to convey. I agree that the recommendation system can do the job of promoting the visibility of such posts, but then I was additionally suggesting that it would be good to independently allow users to promote epistemic corrections to a higher level of visibility on the post-page itself in a way that does not require moderator interaction.

comment by Said Achmiz (SaidAchmiz) · 2019-06-15T04:11:42.362Z · score: 4 (2 votes) · LW · GW

Ah! Yes, I understand now, and entirely agree!

comment by habryka (habryka4) · 2019-06-15T03:59:12.150Z · score: 2 (1 votes) · LW · GW

I think I agree that we can do some better UI work to show that separation, and I think that's probably the correct long-term strategy. It's just that the backlog of additional features like that is long, and the difficulty of solving this problem well isn't trivial (and neither is the cost of messing up), so I was mostly comparing options that don't require any additional features like that and keep the existing site hierarchy.

This discussion has however made me update that putting in the relevant effort does surface a good amount of additional value, so I will think about that more.

comment by habryka (habryka4) · 2019-06-15T02:11:06.383Z · score: 2 (1 votes) · LW · GW
Is this adjusted by post date? Posts from before the relaunch are going to have much less karma, on average (and as user karma grows and the karma weight of upvotes grows with it, average karma will increase further). A post from last month with 50 karma, and a post from 2010 with 50 karma, are really not comparable…

Rerunning the whole vote history with the new karma is one of the next things on our to-do list. Right now it will indeed be biased towards the recent year, which I hope to fix soon (that is one of the things that I consider necessary before removing the "[beta]" tag from the feature).

comment by Zvi · 2019-06-16T15:13:35.830Z · score: 5 (3 votes) · LW · GW

Random recommendations included things I've read since LW2.0 came into fashion, presumably because I wasn't logged in. I'm guessing there's no reasonable fix for this (e.g. IP tracking), but perhaps a button that says "mark as read" would be cool, same as "mark as unread" but in a place that would be easy to mark. Dunno. I do realize one can just click on the thing.

It also doesn't feel like any of the current options actually give me a good "Best of Less Wrong" thing either in general or on particular topics. The selected sequences (and the sequences themselves) are good things to have access to, but it seems like the thing I want to exist, simply doesn't and isn't trivial to make? Alas, I don't have the time to make it right now.

comment by habryka (habryka4) · 2019-06-16T18:23:01.806Z · score: 2 (1 votes) · LW · GW

Yeah, I've heard the same from others, so I think it's likely we will add a mark as read button.

Interested in hearing more about what the thing is you would like to see. There are things we can do with cookies that would at least help the accuracy of the view tracking.

comment by Lanrian · 2019-06-15T13:07:29.540Z · score: 5 (3 votes) · LW · GW

Re addictiveness: a potential fix could be to add an option to only refresh the recommended archive posts once per day (or some other time period of your choice).

comment by Zvi · 2019-06-17T11:20:21.762Z · score: 4 (2 votes) · LW · GW

Part of the idea of curation is that some posts are what one might call Evergreen. They make sense out of the context of the discussion at that time, or are part of a full Evergreen discussion that makes sense out of the context of that time. Also, some posts are designed largely as exercises or places to sort things out, versus creating Evergreen things that last.

This especially applies to calls to action that no longer make any sense given the time that has passed.

If we're going to do recommendations as the top thing on the page every time, it seems like it would be worth it to remove the ones that are about topics that no longer apply or make sense. I realize this will involve judgment calls. I don't have a good solution beyond 'someone goes through them and picks which ones not to include.'

comment by habryka (habryka4) · 2019-06-17T16:19:22.137Z · score: 2 (1 votes) · LW · GW

Current algorithm is: The LessWrong team is removing all the ones that don't seem evergreen as soon as we see them in the recommendation section.

Likely also makes sense to do some systematic passthrough. My guess is there are around 3000 posts that can be recommended. Going through all of them is some work, but not prohibitively much work.

comment by Kaj_Sotala · 2019-06-23T18:24:44.314Z · score: 3 (1 votes) · LW · GW

The home page now has a From the Archives section, which displays three posts randomly selected from the entire history of LessWrong. Currently, a post can appear in this section if:

(1) you've never read it while logged in (including on old-LessWrong),

This seems bugged; I'm frequently recommended posts that I've already read, and in some cases even voted on (I was just recommended https://www.lesswrong.com/posts/6hfGNLf4Hg5DXqJCF/a-fable-of-science-and-politics [LW · GW] , which I know I've read multiple times, and which showed an upvote right when I opened the page).

comment by habryka (habryka4) · 2019-06-23T21:53:00.636Z · score: 2 (1 votes) · LW · GW

Huh, that's quite weird. It's definitely not doing that for me, though that might be a result of my account being fully populated with events we've tracked on LW2.

We've imported all the old post-view events from LW 1, but there is a good chance those are missing some coverage, in particular for very old accounts.

At the very least we should make sure to not show you something that you've already voted on.

comment by [deleted] · 2019-06-16T18:54:55.295Z · score: 1 (1 votes) · LW · GW

Due to the new recommendations, I made a comment to an old post; it is now shown as comment by "[deleted]". Some left-overs from the great import?

comment by [deleted] · 2019-06-16T18:57:15.513Z · score: 1 (1 votes) · LW · GW

Oh, I'm now deleted everywhere... Is that some account-merging gone wrong? (I had two users, with I think the same e-mail, and then could only log-in with my previously deleted account on LW2, ...)

comment by habryka (habryka4) · 2019-06-16T20:05:31.239Z · score: 2 (1 votes) · LW · GW

Oh, interesting. I will look into this, probably tomorrow. Sorry for the confusion. Probably an account-merging side effect.

comment by [deleted] · 2019-06-18T09:33:00.608Z · score: 1 (1 votes) · LW · GW

Thanks for looking into this.

comment by habryka (habryka4) · 2019-06-18T16:44:44.549Z · score: 2 (1 votes) · LW · GW

Can you send us a message in Intercom (the chat bubble in the bottom right)? We can then figure out what your two accounts are and merge them properly, without one of them being deleted.