Richard Ngo's Shortform

post by Richard_Ngo (ricraz) · 2020-04-26T10:42:18.494Z · LW · GW · 128 comments

comment by Richard_Ngo (ricraz) · 2020-08-20T14:06:57.832Z · LW(p) · GW(p)

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here. So far my best effort to make that argument has been in the comment thread starting here [LW(p) · GW(p)]. Looking back at that thread, I just noticed that a couple [LW(p) · GW(p)] of those comments [LW(p) · GW(p)] have been downvoted to negative karma. I don't think any of my comments have ever hit negative karma before; I find it particularly sad that the one time it happens is when I'm trying to explain why I think this community is failing at its key goal of cultivating better epistemics.

There are all sorts of arguments to be made here, which I don't have time to lay out in detail. But just step back for a moment. Tens or hundreds of thousands of academics are trying to figure out how the world works, spending their careers putting immense effort into reading and producing and reviewing papers. Even then, there's a massive replication crisis. And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

Replies from: mr-hire, zachary-robertson, Benito, Ruby, Pongo, Viliam, An1lam
comment by Matt Goldenberg (mr-hire) · 2020-08-20T16:11:44.972Z · LW(p) · GW(p)

And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

It seems to me that maybe this is what a certain stage in the desperate effort to find the truth looks like?

Like, the early stages of intellectual progress look a lot like thinking about different ideas and seeing which ones stand up robustly to scrutiny. Then the best ones can be tested more rigorously [LW · GW] and their edges refined through experimentation.

It seems to me like there needs to be some point in the desperate search for truth at which you're allowing for half-formed thoughts and unrefined hypotheses, or else you simply never get to a place where the hypotheses you're creating even brush up against the truth.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-20T20:50:53.709Z · LW(p) · GW(p)

In the half-formed thoughts stage, I'd expect to see a lot of literature reviews, agendas laying out problems, and attempts to identify and question fundamental assumptions. I expect that (not blog-post-sized speculation) to be the hard part of the early stages of intellectual progress, and I don't see it right now.

Perhaps we can split this into technical AI safety and everything else. Above I'm mostly speaking about the "everything else" that Less Wrong wants to solve, since AI safety is now a substantial enough field that its problems need to be solved in more systematic ways.

Replies from: mr-hire
comment by Matt Goldenberg (mr-hire) · 2020-08-20T22:22:52.535Z · LW(p) · GW(p)

In the half-formed thoughts stage, I'd expect to see a lot of literature reviews, agendas laying out problems, and attempts to identify and question fundamental assumptions. I expect that (not blog-post-sized speculation) to be the hard part of the early stages of intellectual progress, and I don't see it right now.

I would expect that later in the process. Agendas laying out problems and fundamental assumptions don't spring from nowhere (at least for me); they come from conversations where I'm trying to articulate some intuition and I recognize some underlying pattern. The pattern and structure don't emerge spontaneously; they come from trying to pick around the edges of a thing, get thoughts across, explain my intuitions and see where they break.

I think it's fair to say that crystallizing these patterns into a formal theory is a "hard part", but the foundation for making it easy is laid out in the floundering and flailing that came before.

comment by Zachary Robertson (zachary-robertson) · 2020-08-20T19:10:39.275Z · LW(p) · GW(p)

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity do you see Less Wrong as significantly useful or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about starting my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

Replies from: Viliam
comment by Viliam · 2020-08-20T20:57:24.435Z · LW(p) · GW(p)

Ironically, some people already feel threatened by the high standards here. Setting them higher probably wouldn't result in more good content. It would result in less mediocre content, but probably also less good content, as the authors who sometimes write a mediocre article and sometimes a good one would get discouraged and give up.

Ben Pace gives a few examples of great content in the next comment. It would be better if we could more easily separate the good content from the rest, but that's what the reviews are for. Well, only one review so far, if I remember correctly. I would love to see reviews of pre-2018 content (maybe multiple years in one review, if those years were less productive). Then I would love to see the winning content get the same treatment as the Sequences -- edit them and arrange them into a book, and make it "required reading" for the community (available as a free PDF).

Replies from: zachary-robertson
comment by Zachary Robertson (zachary-robertson) · 2020-08-20T22:44:48.091Z · LW(p) · GW(p)

Setting them higher (standards) probably wouldn't result in more good content.

I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem better adapted to continuing conversations, and they have a low bar for being made.

I could clarify a bit. My main problem with low-quality content isn't exactly that it's "wrong" or something like that. Mostly, the issues I'm finding most common for me are:

  1. Too many niche pre-requisites.
  2. No comments.
  3. A nagging feeling that the post is reinventing the wheel.

I think (1) is a ridiculously bad problem. I'm literally getting a PhD in machine learning, I write about AI safety, and I still find a large number of those posts (yes, AN posts) glazed in internal jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI related things.

(2) is just a tragedy of the fact that the rich get richer. While I'm guilty of this also, I think that requiring authors to also post seed questions/discussion topics in the comments could go a long way toward alleviating this problem. I oftentimes read a post and want to leave a comment, but then don't, because I'm not even sure the author thought about the discussion their post might start.

(3) is probably a bit mean. Yet more than once I've discovered that a Less Wrong concept already had a large research literature devoted to it. I think this ties in with (1), since niche pre-requisites often go hand-in-hand with insufficient literature review.

comment by Ben Pace (Benito) · 2020-08-20T20:00:42.082Z · LW(p) · GW(p)

The top posts in the 2018 Review [LW · GW] are filled with fascinating and well-explained ideas. Many of the new ideas are not settled science, but they're quite original and substantive, or excellent distillations of settled science, and are often the best piece of writing on the internet about their topics.

You're wrong that LW's epistemic standards aren't high enough to make solid intellectual progress; we already have. On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong, and I think a lot of them are brilliant.

I'm not saying we can't do far better, or that we're sufficiently good. Many of the examples of success so far are "Things that were in people's heads but didn't have a natural audience to share them with". There's not a lot of collaboration at present, which is why I'm very keen to build the new LessWrong Docs, which allows for better draft sharing and inline comments and more. We're working on the tools for editing tags, things like edit histories and so on, that will allow us to build a functioning wiki system with canonical writeups and explanations that people add to and refine. I want future iterations of the LW Review to have more allowance for incorporating feedback from reviewers. There's lots of work to do, and we're just getting started. But I disagree that the direction isn't "a desperate effort to find the truth". That's what I'm here for.

Even in the last month or two, how do you look at things like this [LW · GW] and this [LW · GW] and this [LW · GW] and this [LW · GW] and not think that they're likely the best publicly available pieces of writing in the world about their subjects? Wrt rationality, I expect things like this [LW · GW] and this [LW · GW] and this [LW · GW] and this [LW · GW] will probably go down as historically important LW posts that helped us understand the world, and make a strong showing in the 2020 LW Review.

Replies from: ricraz, zachary-robertson
comment by Richard_Ngo (ricraz) · 2020-08-21T06:28:12.725Z · LW(p) · GW(p)

As mentioned in my reply to Ruby, this is not a critique of the LW team, but of the LW mentality. And I should have phrased my point more carefully - "epistemic standards are too low to make any progress" is clearly too strong a claim, it's more like "epistemic standards are low enough that they're an important bottleneck to progress". But I do think there's a substantive disagreement here. Perhaps the best way to spell it out is to look at the posts you linked and see why I'm less excited about them than you are.

Of the top posts in the 2018 review, and the ones you linked (excluding AI), I'd categorise them as follows:

Interesting speculation about psychology and society, where I have no way of knowing if it's true:

  • Local Validity as a Key to Sanity and Civilization
  • The Loudest Alarm Is Probably False
  • Anti-social punishment (which is, unlike the others, at least based on one (1) study).
  • Babble
  • Intelligent social web
  • Unrolling social metacognition
  • Simulacra levels
  • Can you keep this secret?

Same as above but it's by Scott so it's a bit more rigorous and much more compelling:

  • Is Science Slowing Down?
  • The tails coming apart as a metaphor for life

Useful rationality content:

  • Toolbox-thinking and law-thinking
  • A sketch of good communication
  • Varieties of argumentative experience

Review of basic content from other fields. This seems useful for informing people on LW, but not actually indicative of intellectual progress unless we can build on them to write similar posts on things that *aren't* basic content in other fields:

  • Voting theory primer
  • Prediction markets: when do they work
  • Costly coordination mechanism of common knowledge (Note: I originally said I hadn't seen many examples of people building on these ideas, but at least for this post there seems to be a lot.)
  • Six economics misconceptions
  • Swiss political system

It's pretty striking to me how much the original sequences drew on the best academic knowledge, and how little most of the things above draw on the best academic knowledge. And there's nothing even close to the thoroughness of Luke's literature reviews.

The three things I'd like to see more of are:

1. The move of saying "Ah, this is interesting speculation about a complex topic. It seems compelling, but I don't have good ways of verifying it; I'll treat it like a plausible hypothesis which could be explored more by further work." (I interpret the thread I originally linked [LW(p) · GW(p)] as me urging Wei to do this).

2. Actually doing that follow-up work. If it's an empirical hypothesis, investigating empirically. If it's a psychological hypothesis, does it apply to anyone who's not you? If it's more of a philosophical hypothesis, can you identify the underlying assumptions and the ways it might be wrong? In all cases, how does it fit into existing thought? (That'll probably take much more than a single blog post).

3. Insofar as many of these scattered plausible insights are actually related in deep ways, trying to combine them so that the next generation of LW readers doesn't have to separately learn about each of them, but can rather download a unified generative framework.

Replies from: Benito, Ruby, DanielFilan
comment by Ben Pace (Benito) · 2020-08-22T03:40:34.339Z · LW(p) · GW(p)

Quoting your reply to Ruby below, I agree I'd like LessWrong to be much better at "being able to reliably produce and build on good ideas". 

The reliability and focus feel most lacking to me on the building side, rather than the production side, which I think we're doing quite well at. I think we've successfully formed a publishing platform that provides an audience who are intensely interested in good ideas around rationality, AI, and related subjects, and a lot of very generative and thoughtful people are writing down their ideas here.

We're low on the ability to connect people up to do more extensive work on these ideas – most good hypotheses and arguments don't get a great deal of follow up or further discussion.

Here are some subjects where I think various people have shared substantive perspectives, but there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered:

The above isn't complete; it's just some of the subjects that come to mind as having lots of people sharing perspectives. And the list of people definitely isn't complete.

Here are examples of things that I'd like to see more of, that feel more like doing the legwork to actually dive into the details:

  • Eli Tyre and Bucky replicating Scott's birth-order hypothesis
  • Katja and the other fine people at AI Impacts doing long-term research on a question (discontinuous progress) with lots of historical datapoints
  • Jameson writing up his whole research question in great detail and very well, and then an excellent commenter turning up and answering it
  • Zhukeepa writing up an explanation of Paul's research, allowing many more to understand it, and allowing Eliezer to write a response
  • Scott writing Goodhart Taxonomy, and the commenters banding together to find a set of four similar examples to add to the post
  • Val writing some interesting things about insight meditation, prompting Kaj to write a non-mysterious explanation
  • In the LW Review, Bucky checking out the paper Zvi analysed and arguing it did not support the conclusions Zvi reached (this changed my opinion of Zvi's post from 'true' to 'false')
  • The discussion around covid and EMH prompting Richard Meadows to write down a lot of the crucial and core arguments around the EMH

The above is also not mentioning lots of times when the person generating the idea does a lot of the legwork, like Scott or Jameson or Sarah or someone.

I see a lot of (very high quality) raw energy here that wants shaping and directing, with the use of lots of tools for coordination (e.g. better collaboration tools).

The epistemic standards being low is one way of putting it, but it doesn't resonate with me much and kinda feels misleading. I think our epistemic standards are way higher than the communities you mention (historians, people interested in progress studies). Bryan Caplan said he knows of no group whose beliefs are more likely to be right in general than the rationalists, and this often seems accurate to me. I think we do a lot of exploration and generation and evaluation, just not in a very coordinated manner, and so we could make progress at like 10x–100x the rate if we collaborated better, and I think we can get there without too much work.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-22T06:28:38.901Z · LW(p) · GW(p)

"I see a lot of (very high quality) raw energy here that wants shaping and directing, with the use of lots of tools for coordination (e.g. better collaboration tools)."

Yepp, I agree with this. I guess our main disagreement is whether the "low epistemic standards" framing is a useful way to shape that energy. I think it is because it'll push people towards realising how little evidence they actually have for many plausible-seeming hypotheses on this website. One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

When you say "there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered", I find myself expecting that this will involve people who believe the hypothesis continuing to build their castle in the sky, not analysis about why it might be wrong and why it's not.

That being said, LW is very good at producing "fake frameworks". So I don't want to discourage this too much. I'm just arguing that this is a different thing from building robust knowledge about the world.

Replies from: Benito, John_Maxwell_IV, Benito
comment by Ben Pace (Benito) · 2020-08-23T02:19:05.125Z · LW(p) · GW(p)

One proven claim is worth a dozen compelling hypotheses

I will continue to be contrary and say I'm not sure I agree with this.

For one, I think in many domains new ideas are really hard to come by, as opposed to making minor progress in the existing paradigms. Fundamental theories in physics, a bunch of general insights about intelligence (in neuroscience and AI), etc.

And secondly, I am reminded of what Lukeprog wrote in his moral consciousness report: that he wished the various different philosophies-of-consciousness would stop debating each other, go away for a few decades, then come back with falsifiable predictions. I sometimes take this stance regarding many disagreements of import, such as the basic science vs engineering approaches to AI alignment. It's not obvious to me that the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours; it may instead be for them to go away and work on their ideas for a decade, then come back with lots of fleshed-out details and results that can be more meaningfully debated.

I feel similarly about simulacra levels, Embedded Agency, and a bunch of IFS stuff. I would like to see more experimentation and literature reviews where they make sense, but I also feel like these are implicitly making substantive and interesting claims about the world, and I'd just be interested in getting a better sense of what claims they're making, and have them fleshed out + operationalized more. That would be a lot of progress to me, and I think each of them is seeing that sort of work (with Zvi, Abram, and Kaj respectively leading the charges on LW, alongside many others).

Replies from: Raemon, ricraz
comment by Raemon · 2020-08-23T05:02:22.381Z · LW(p) · GW(p)

I think I'm concretely worried that some of those models / paradigms (and some other ones on LW) don't seem pointed in a direction that leads obviously to "make falsifiable predictions."

And I can imagine worlds where "make falsifiable predictions" isn't the right next step, you need to play around with it more and get it fleshed out in your head before you can do that. But there is at least some writing on LW that feels to me like it leaps from "come up with an interesting idea" to "try to persuade people it's correct" without enough checking.

(In the case of IFS, I think Kaj's sequence is doing a great job of laying it out in a concrete way where it can then be meaningfully disagreed with. But the other people who've been playing around with IFS didn't really seem interested in that, and I feel like we got lucky that Kaj had the time and interest to do so.)

comment by Richard_Ngo (ricraz) · 2020-08-23T06:28:00.026Z · LW(p) · GW(p)

I feel like this comment isn't critiquing a position I actually hold. For example, I don't believe that "the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours". I am happy for people to work towards building evidence for their hypotheses in many ways, including fleshing out details, engaging with existing literature, experimentation, and operationalisation.

Perhaps this makes "proven claim" a misleading phrase to use. Perhaps more accurate to say: "one fully fleshed out theory is more valuable than a dozen intuitively compelling ideas". But having said that, I doubt that it's possible to fully flesh out a theory like simulacra levels without engaging with a bunch of academic literature and then making predictions.

I also agree with Raemon's response below.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T05:27:15.969Z · LW(p) · GW(p)

One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

Depends on the claim, right?

If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.
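
A minimal sketch of this tradeoff (my illustration, not John's; the "promise signal" setup and all numbers are assumed for the toy model): if each generated hypothesis comes with a cheap, noisy impression of how good it is, and full evaluation is the expensive step, then generating many hypotheses and evaluating only the most promising one tends to beat evaluating the first one you think of.

```python
# Toy model (illustrative assumptions, not from the comment): each hypothesis has a
# hidden true value, plus a cheap noisy "promise" signal you get for free at
# generation time. Evaluation is expensive, so you only fully evaluate one hypothesis.
import random

random.seed(0)

def generate():
    true_value = random.gauss(0, 1)            # how good the idea really is
    promise = true_value + random.gauss(0, 1)  # cheap, unreliable first impression
    return true_value, promise

def evaluate_first():
    """Generate one hypothesis and evaluate it."""
    return generate()[0]

def evaluate_best_of(n):
    """Generate n hypotheses cheaply, then evaluate only the most promising one."""
    candidates = [generate() for _ in range(n)]
    best = max(candidates, key=lambda c: c[1])  # select on the cheap signal
    return best[0]                              # payoff is its true value

trials = 100_000
print(sum(evaluate_first() for _ in range(trials)) / trials)      # ~0.0
print(sum(evaluate_best_of(20) for _ in range(trials)) / trials)  # ~1.3
```

Under these made-up noise levels, "generate 20, evaluate one" buys a much better hypothesis for the same evaluation budget; the exact gap depends entirely on how cheap generation is and how informative the cheap signal is.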

comment by Ben Pace (Benito) · 2020-08-23T02:18:58.082Z · LW(p) · GW(p)

Yepp, I agree with this. I guess our main disagreement is whether the "low epistemic standards" framing is a useful way to shape that energy. I think it is because it'll push people towards realising how little evidence they actually have for many plausible-seeming hypotheses on this website.

A housemate of mine said to me they think LW has a lot of breadth, but could benefit from more depth. 

I think in general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science"), but that our level of coordination and depth is often low. "LessWrongers should collaborate more and go into more depth in fleshing out their ideas" sounds more true to me than "LessWrongers have very low epistemic standards".

Replies from: ricraz, TAG
comment by Richard_Ngo (ricraz) · 2020-08-23T06:18:40.815Z · LW(p) · GW(p)

In general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science")

"Being more openminded about what evidence to listen to" seems like a way in which we have lower epistemic standards than scientists, and also that's beneficial. It doesn't rebut my claim that there are some ways in which we have lower epistemic standards than many academic communities, and that's harmful.

In particular, the relevant question for me is: why doesn't LW have more depth? Sure, more depth requires more work, but on the timeframe of several years, and hundreds or thousands of contributors, it seems viable. And I'm proposing, as a hypothesis, that LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

Replies from: John_Maxwell_IV, curi, Benito, TAG
comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T05:41:09.199Z · LW(p) · GW(p)

LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

Your solution to the "willingness to accept ideas even before they've been explored in depth" problem is to explore ideas in more depth. But another solution is to accept fewer ideas, or hold them much more provisionally.

I'm a proponent of the second approach because:

  • I suspect even academia doesn't hold ideas as provisionally as it should. See Hamming on expertise: https://forum.effectivealtruism.org/posts/mG6mckPHAisEbtKv5/should-you-familiarize-yourself-with-the-literature-before?commentId=SaXXQXLfQBwJc9ZaK [EA(p) · GW(p)]

  • I suspect trying to browbeat people to explore ideas in more depth works against the grain of an online forum as an institution. Browbeating works in academia because your career is at stake, but in an online forum, it just hurts intrinsic motivation and cuts down on forum use (the forum runs on what Clay Shirky called "cognitive surplus", essentially a term for people's spare time and motivation). I'd say one big problem with LW 1.0 that LW 2.0 had to solve before flourishing was that people felt too browbeaten to post much of anything.

If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive.

Maybe part of the issue is that on LW, peer review generally happens in the comments after you publish, not before. So there's no publication carrot to offer in exchange for overcoming the objections of peer reviewers.

Replies from: ricraz, Raemon
comment by Richard_Ngo (ricraz) · 2020-08-26T06:39:58.208Z · LW(p) · GW(p)

"If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive."

Hmm, it sounds like we agree on the solution but are emphasising different parts of it. For me, the question is: who's this "we" that should accept fewer ideas? It's the set of people who agree with my argument that you shouldn't believe things which haven't been fleshed out very much. But the easiest way to add people to that set is just to make the argument, which is what I've done. Specifically, note that I'm not criticising anyone for producing posts that are short and speculative: I'm criticising the people who update too much on those posts.

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T08:10:34.050Z · LW(p) · GW(p)

Fair enough. I'm reminded of a time someone summarized one of my posts as being a definitive argument against some idea X and me thinking to myself "even I don't think my post definitively settles this issue" haha.

comment by Raemon · 2020-08-26T05:56:26.144Z · LW(p) · GW(p)

Yeah, this is roughly how I think about it.

I do think right now LessWrong should lean more in the direction Richard is suggesting – I think it was essential to establish better Babble procedures, but now we're doing well enough on that front that I think setting clearer expectations of how the eventual pruning works is reasonable.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-26T06:53:44.830Z · LW(p) · GW(p)

I wanted to register that I don't like "babble and prune" as a model of intellectual development. I think intellectual development actually looks more like:

1. Babble

2. Prune

3. Extensive scholarship

4. More pruning

5. Distilling scholarship to form common knowledge

And that my main criticism is the lack of 3 and 5, not the lack of 2 or 4.

I also note that: a) these steps get monotonically harder, so that focusing on the first two misses *almost all* the work; b) maybe I'm being too harsh on the babble and prune framework because it's so thematically appropriate for me to dunk on it here; I'm not sure if your use of the terminology actually reveals a substantive disagreement.

Replies from: Raemon, John_Maxwell_IV
comment by Raemon · 2020-08-27T05:09:21.433Z · LW(p) · GW(p)

I basically agree with your 5-step model (I at least agree it's a more accurate description than Babble and Prune, which I just meant as rough shorthand). I'd add things like "original research/empiricism" or "more rigorous theorizing" to the "Extensive Scholarship" step.

I see the LW Review as basically the first of (what I agree should essentially be at least) a 5 step process. It's adding a stronger Step 2, and a bit of Step 5 (at least some people chose to rewrite their posts to be clearer and respond to criticism)

...

Currently, we do get non-zero Extensive Scholarship and Original Empiricism. (Kaj's Multi-Agent Models of Mind [? · GW] seems like it includes real scholarship. Scott Alexander / Eli Tyre and Bucky's exploration into Birth Order Effects seemed like real empiricism). Not nearly as much as I'd like.

But John's comment elsethread [? · GW] seems significant:

If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.

This reminded me of a couple of posts in the 2018 Review: Local Validity as Key to Sanity and Civilization [LW · GW], and Is Clickbait Destroying Our General Intelligence? [LW · GW]. Both of those seemed like "sure, interesting hypothesis. Is it real tho?"

During the Review I created a followup "How would we check if Mathematicians are Generally More Law Abiding? [LW · GW]" question, trying to move the question from Stage 2 to 3. I didn't get much serious response, probably because, well, it was a much harder question.

But, honestly... I'm not sure it's actually a question that was worth asking. I'd like to know if Eliezer's hypothesis about mathematicians is true, but I'm not sure it ranks near the top of questions I'd want people to put serious effort into answering. 

I do want LessWrong to be able to follow up Good Hypotheses with Actual Research, but it's not obvious which questions are worth answering. OpenPhil et al are paying for some types of answers, I think usually by hiring researchers full time. It's not quite clear what the right role is for LW to play in the ecosystem.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T11:52:22.090Z · LW(p) · GW(p)

  1. All else equal, the harder something is, the less we should do it.

  2. My quick take is that writing lit reviews/textbooks is a comparative disadvantage of LW relative to the mainstream academic establishment.

In terms of producing reliable knowledge... if people actually care about whether something is true, they can always offer a cash prize for the best counterargument (which could of course constitute citation of academic research). The fact that people aren't doing this suggests to me that for most claims on LW, there isn't any (reasonably rich) person who cares deeply re: whether the claim is true. I'm a little wary of putting a lot of effort into supply if there is an absence of demand.

(I guess the counterargument is that accurate knowledge is a public good, so an individual's willingness to pay doesn't give you the complete picture of the value accurate knowledge brings. Maybe what we need is a way to crowdfund bounties for the best argument related to something.)

(I agree that LW authors would ideally engage more with each other and academic literature on the margin.)

Replies from: AllAmericanBreakfast
comment by AllAmericanBreakfast · 2020-08-26T16:16:05.679Z · LW(p) · GW(p)

I’ve been thinking about the idea of “social rationality” lately, and this is related. We do so much here in the way of training individual rationality - the inputs, functions, and outputs of a single human mind. But if truth is a product, then getting human minds well-coordinated to produce it might be much more important than training them to be individually stronger. Just as assembly line production is much more effective in producing almost anything than teaching each worker to be faster in assembling a complete product by themselves.

My guess is that this could be effective not only in producing useful products, but also in overcoming biases. Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

Of course, one of the reasons we don't do that so much is that coordination is an up-front investment and is unfamiliar. Figuring out social technology to make it easier to participate in might be a great project for LW.

Replies from: John_Maxwell_IV
comment by John_Maxwell (John_Maxwell_IV) · 2020-08-27T04:36:06.301Z · LW(p) · GW(p)

There's been a fair amount of discussion of that sort of thing here: https://www.lesswrong.com/tag/group-rationality [? · GW] There are also groups outside LW thinking about social technology such as RadicalxChange.

Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

I'm not sure. If you put those 5 LWers together, I think there's a good chance that the highest status person speaks first and then the others anchor on what they say and then it effectively ends up being like a group project for school with the highest status person in charge. Some [LW · GW] related links [LW(p) · GW(p)].

Replies from: AllAmericanBreakfast
comment by AllAmericanBreakfast · 2020-08-27T14:17:45.925Z · LW(p) · GW(p)

That’s definitely a concern too! I imagine such groups forming among people who either already share a basic common view, and collaborate to investigate more deeply. That way, any status-anchoring effects are mitigated.

Alternatively, it could be an adversarial collaboration. For me personally, some of the SSC essays in this format have led me to change my mind in a lasting way.

comment by curi · 2020-09-03T22:32:10.667Z · LW(p) · GW(p)

they're willing to accept ideas even before they've been explored in depth

People also reject ideas before they've been explored in depth. I've tried to discuss similar issues with LW [LW · GW] before, but the basic response was roughly "we like chaos where no one pays attention to whether an argument has ever been answered by anyone; we all just do our own thing with no attempt at comprehensiveness or organizing who does what; having organized leadership of any sort, or anyone who is responsible for anything, would be irrational" (plus some suggestions that I'm low social status and that therefore I personally deserve to be ignored. There were also suggestions – phrased rather differently but amounting to this – that LW will listen more if published ideas are rewritten, not to improve on any flaws, but so that the new versions can be published at LW before anywhere else, because the LW community's attention allocation is highly biased towards that).

comment by Ben Pace (Benito) · 2020-08-23T06:48:39.182Z · LW(p) · GW(p)

I feel somewhat inclined to wrap up this thread at some point, even while there's more to say. We can continue if you like and have something specific or strong you'd like to ask, but otherwise I'll pause here.

comment by TAG · 2020-08-23T10:49:01.224Z · LW(p) · GW(p)

why doesn’t LW have more depth?

You have to realise that what you are doing isn't adequate in order to gain the motivation to do it better, and that is unlikely to happen if you are mostly communicating with other people who think everything is OK.

comment by TAG · 2020-08-23T10:44:02.903Z · LW(p) · GW(p)

Less Wrong is competing against philosophy as well as science, and philosophy has broader criteria of evidence still. In fact, LessWrongians are often frustrated that mainstream philosophy takes such topics as dualism or theism seriously, even though there's an abundance of Bayesian evidence for them.

comment by Ruby · 2020-08-21T07:57:52.321Z · LW(p) · GW(p)

(Thanks for laying out your position in this level of depth. Sorry for how long this comment turned out. I guess I wanted to back up a bunch of my agreement with words. It's a comment for the sake of everyone else, not just you.)

I think there's something to what you're saying, that the mentality itself could be better. The Sequences have been criticized because Eliezer didn't cite previous thinkers all that much, but at least as far as the science goes, as you said, he was drawing on academic knowledge. I also think we've lost something precious with the absence of epic topic reviews by the likes of Luke. Kaj Sotala still brings in heavily from outside knowledge, John Wentworth did a great review on Biological Circuits, and we get SSC crossposts that have that, but otherwise posts aren't heavily referencing or building upon outside stuff. I concede that I would like to see a lot more of that.

I think Kaj was rightly disappointed that he didn't get more engagement with his post whose gist was "this is what the science really says about S1 & S2, one of your most cherished concepts, LW community".

I wouldn't say the typical approach is strictly bad (there's value in thinking freshly for oneself, and failure to reference previous material shouldn't be a crime or make a text unworthy), but yeah, it'd be pretty cool if, after Alkjash laid out Babble & Prune (which intuitively feels so correct), someone had dug through what empirical science we have to see whether the picture lines up. Or heck, actually gone and done some kind of experiment. I bet it would turn up something interesting.

And I think what you're saying is that the issue isn't just that people aren't following up with scholarship and empiricism on new ideas and models, but that they're actually forgetting that these are the next steps. Instead, they're overconfident in our homegrown models, as though LessWrong were the one place able to come up with good ideas. (Sorry, some of this might be my own words.) 

The category I'd label a lot of LessWrong posts with is "engaging articulation of a point which is intuitive in hindsight" / "creation of common vocabulary around such points". That's pretty valuable, but I do think solving the hardest problems will take more.

-----

You use the word "reliably" in a few places. It feels like it's doing some work in your statements, and I'm not entirely sure what you mean or why it's important.

-----

A model which is interesting but maybe not obviously connected. I was speaking to a respected rationalist thinker this week and they classified potential writing on LessWrong into three categories:

  1. Writing stuff to help oneself figure things out. Like a diary, but publicly shared.
  2. People exchanging "letters" as they attempt to figure things out. Like old school academic journals.
  3. Someone having something mostly figured out but with a large inferential distance to bridge. They write a large collection of posts trying to cover that distance. One example is The Sequences, and more recent examples are from John Wentworth and Kaj Sotala.

I mention this because I recall you (alongside the rationalist thinker) complaining about the lack of people "presenting their worldviews on LessWrong".

The kinds of epistemic norms I think you're advocating for feel like a natural fit for the 2nd kind of writing, but it's less clear to me how they should apply to people presenting worldviews. Maybe it's not more complicated than: it's fine to present your worldview without a tonne of evidence, but people shouldn't forget that the evidence hasn't been presented, and its feeling intuitively correct isn't enough.

-----

There's something in here about Epistemic Modesty, something, something. Some part of me reads you as calling for more of that, which I'm wary of, but I don't currently have more to say than flagging it as maybe a relevant variable in any disagreements here.

We probably do disagree about the value of academic sources, or what it takes to get value from them. Hmm. Maybe it's something like: there's something to be said for thinking about models and assessing their plausibility yourself rather than relying on likely very flawed empirical studies.

Maybe I'm in favor of large careful reviews of what science knows but less in favor of trying to find sources for each idea or model that gets raised. I'm not sure.

-----

I can't recall whether I've written publicly much about this, but a model I've had for a year or more is that for LW to make intellectual progress, we need to become a "community of practice", not just a "community of interest". Martial arts vs literal stamp collecting. (Streetfighting might be better still, due to actually testing real fighting ability.) It's great that many people find LessWrong a guilty pleasure they feel less guilty about than Facebook, but for us to make progress, people need to see LessWrong as a place where one of the things you do is show up and do Serious Work, some of which is relatively hard and boring, like writing and reading lit reviews.

I suspect that a cap on the epistemic standards people hold stuff to is downstream of the level of effort people are calibrated on applying. But maybe it goes in the other direction, so I don't know.

Probably the 2018 Review is biased towards the posts which are most widely read, i.e., those easiest and most enjoyable to read, rather than solely rewarding those with the best contributions. Not overwhelmingly, but enough. Maybe same for karma. I'm not sure how to relate to that.

-----

3. Insofar as many of these scattered plausible insights are actually related in deep ways, trying to combine them so that the next generation of LW readers doesn't have to separately learn about each of them, but can rather download a unified generative framework.

This sounds partially like distillation work plus extra integration. And sounds pretty good to me too.


-----

I still remember my feeling of disillusionment with the LessWrong community relatively soon after I joined in late 2012. I realized that the bulk of members didn't seem serious about advancing the Art. I never heard people discussing new results from cognitive science and how to apply them, even though that's what the Sequences were in large part, and the Sequences hardly claimed to be complete! I guess I do relate somewhat to your "desperate effort" comment, though we've got some people trying pretty hard whom I wouldn't want to short-change.

We do good stuff, but more is possible [LW · GW]. I appreciate the reminder. I hope we succeed at pushing the culture and mentality in directions you like.

Replies from: drossbucket
comment by drossbucket · 2020-08-23T08:44:46.917Z · LW(p) · GW(p)

This is only tangentially relevant, but adding it here as some of you might find it interesting:

Venkatesh Rao has an excellent Twitter thread on why most independent research only reaches this kind of initial exploratory level (he tried it for a bit before moving to consulting). It's pretty pessimistic, but there is a somewhat more optimistic follow-up thread on potential new funding models. Key point is that the later stages are just really effortful and time-consuming, in a way that keeps out a lot of people trying to do this as a side project alongside a separate main job (which I think is the case for a lot of LW contributors?)

Quote from that thread:

Research =

a) long time between having an idea and having something to show for it that even the most sympathetic fellow crackpot would appreciate (not even pay for, just get)

b) a >10:1 ratio of background invisible thinking in notes, dead-ends, eliminating options etc

With a blogpost, it’s like a week of effort at most from idea to mvp, and at most a 3:1 ratio of invisible to visible. That’s sustainable as a hobby/side thing.

To do research-grade thinking you basically have to be independently wealthy and accept 90% deadweight losses

Also just wanted to say good luck! I'm a relative outsider here with pretty different interests to LW core topics but I do appreciate people trying to do serious work outside academia, have been trying to do this myself, and have thought a fair bit about what's currently missing (I wrote that in a kind of jokey style but I'm serious about the topic).

Replies from: ricraz, ricraz
comment by Richard_Ngo (ricraz) · 2020-08-23T10:05:22.149Z · LW(p) · GW(p)

Also, I liked your blog post! More generally, I strongly encourage bloggers to have a "best of" page, or something that directs people to good posts. I'd be keen to read more of your posts but have no idea where to start.

Replies from: drossbucket
comment by drossbucket · 2020-08-23T10:48:43.543Z · LW(p) · GW(p)

Thanks! I have been meaning to add a 'start here' page for a while, so that's good to have the extra push :) Seems particularly worthwhile in my case because a) there's no one clear theme and b) I've been trying a lot of low-quality experimental posts this year bc pandemic trashed motivation, so recent posts are not really reflective of my normal output.

For now some of my better posts in the last couple of years might be Cognitive decoupling and banana phones (tracing back the original precursor of Stanovich's idea), The middle distance (a writeup of a useful and somewhat obscure idea from Brian Cantwell Smith's On the Origin of Objects), and the negative probability post and its followup.

comment by Richard_Ngo (ricraz) · 2020-08-23T09:44:15.112Z · LW(p) · GW(p)

Thanks, these links seem great! I think this is a good (if slightly harsh) way of making a similar point to mine:

"I find that autodidacts who haven’t experienced institutional R&D environments have a self-congratulatory low threshold for what they count as research. It’s a bit like vanity publishing or fan fiction. This mismatch doesn’t exist as much in indie art, consulting, game dev etc"

comment by DanielFilan · 2020-08-21T06:34:31.394Z · LW(p) · GW(p)

As mentioned in this comment [LW(p) · GW(p)], the Unrolling social metacognition post is closely related to at least one research paper.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-21T06:58:31.031Z · LW(p) · GW(p)

Right, but this isn't mentioned in the post? Which seems odd. Maybe that's actually another example of the "LW mentality": why is the fact that there is solid empirical research on 3 layers not being enough not itself important enough to mention in a post on why 3 layers isn't enough? (Maybe because the post was time-boxed? If so that seems reasonable, but then I would hope that people comment saying "Here's a very relevant paper, why didn't you cite it?")

comment by Zachary Robertson (zachary-robertson) · 2020-08-20T23:13:56.178Z · LW(p) · GW(p)

On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong

I think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI related concepts (literally distill.pub). I think most people would agree that sort of work is important in order to make intellectual progress, but I also think significantly fewer people would agree distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps more so once you consider that Less Wrong is often being used more as a platform to share these distillations than to create them.

I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be.

On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to second-order approximation you can block LW (noise) and only look at AIAF (signal). I say to second-order because anything from LW that is signal ends up being posted on AIAF anyway which means the method is somewhat error-tolerant.

This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

Replies from: Benito, Benito
comment by Ben Pace (Benito) · 2020-08-21T02:21:54.546Z · LW(p) · GW(p)

Much of the same is true of scientific journals. Creating a place to share and publish research is a pretty key piece of intellectual infrastructure, especially for researchers to create artifacts of their thinking along the way. 

The point about being 'cross-posted' is where I disagree the most. 

This is largely original content that counterfactually wouldn't have been published, or occasionally would have been published but to a much smaller audience. What Failure Looks Like wasn't crossposted, Anna's piece on reality-revealing puzzles wasn't crossposted. I think that Zvi would have still written some on mazes and simulacra, but I imagine he writes substantially more content given the cross-posting available for the LW audience. Could perhaps check his blogging frequency over the last few years to see if that tracks. I recall Zhu telling me he wrote his FAQ because LW offered an audience for it, and likely wouldn't have done so otherwise. I love everything Abram writes, and while he did have the Intelligent Agent Foundations Forum, it had a much more concise, technical style, tiny audience, and didn't have the conversational explanations and stories and cartoons that have been so excellent and well received on LW, and it wouldn't as much have been focused on the implications for rationality of things like logical inductors. Rohin wouldn't have written his coherence theorems piece or any of his value learning sequence, and I'm pretty sure about that because I personally asked him to write that sequence, which is a great resource and I've seen other researchers in the field physically print off to write on and study. Kaj has an excellent series of non-mystical explanations of ideas from insight meditation that started as a response to things Val wrote, and I imagine those wouldn't have been written quite like that if that context did not exist on LW.

I could keep going, but probably have made the point. It seems weird to not call this collectively a substantial amount of intellectual progress, on a lot of important questions.

I am indeed focusing right now on how to do more 'conversation'. I'm in the middle of trying to host some public double cruxes for events, for example, and some day we will finally have inline commenting and better draft sharing and so on. It's obviously not finished.

Replies from: rohinmshah
comment by rohinmshah · 2020-09-02T16:13:36.039Z · LW(p) · GW(p)

Rohin wouldn't have written his coherence theorems piece or any of his value learning sequence, and I'm pretty sure about that because I personally asked him to write that sequence

Yeah, that's true, though it might have happened at some later point in the future as I got increasingly frustrated by people continuing to cite VNM at me (though probably it would have been a blog post and not a full sequence).

Reading through this comment tree, I feel like there's a distinction to be made between "LW / AIAF as a platform that aggregates readership and provides better incentives for blogging", and "the intellectual progress caused by posts on LW / AIAF". The former seems like a clear and large positive of LW / AIAF, which I think Richard would agree with. For the latter, I tend to agree with Richard, though perhaps not as strongly as he does. Maybe I'd put it as, I only really expect intellectual progress from a few people who work on problems full time who probably would have done similar-ish work if not for LW / AIAF (but likely would not have made it public).

I'd say this mostly for the AI posts. I do read the rationality posts and don't get a different impression from them, but I also don't think enough about them to be confident in my opinions there.

comment by Ben Pace (Benito) · 2020-08-20T23:23:50.900Z · LW(p) · GW(p)

 By "AN" do you mean the AI Alignment Forum, or "AIAF"?

Replies from: zachary-robertson
comment by Zachary Robertson (zachary-robertson) · 2020-08-21T00:24:38.789Z · LW(p) · GW(p)

Ya, totally messed that up. I meant the AI Alignment Forum, or AIAF. I think out of habit I used AN (Alignment Newsletter).

Replies from: Benito
comment by Ben Pace (Benito) · 2020-08-21T08:34:23.739Z · LW(p) · GW(p)

I did suspect you'd confused it with the Alignment Newsletter :)

comment by Ruby · 2020-08-21T03:14:25.043Z · LW(p) · GW(p)

Thanks for chiming in with this. People criticizing the epistemics is hopefully how we get better epistemics. When the Californian smoke isn't interfering with my cognition as much, I'll try to give your feedback (and Rohin's [LW(p) · GW(p)]) proper attention. I would generally be interested to hear your arguments/models in detail, if you get the chance to lay them out.

My default position is LW has done well enough historically (e.g. Ben Pace's examples) for me to currently be investing in getting it even better. Epistemics and progress could definitely be a lot better, but getting there is hard. If I didn't see much progress on the rate of progress in the next year or two, I'd probably go focus on other things, though I think it'd be tragic if we ever lost what we have now.

And another thought:

And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts

Yes and no. Journal articles have their advantages, and so do blog posts. A bunch of the LessWrong team's recent work has been around filling in the missing pieces for the system to work, e.g. Open Questions (hasn't yet worked for coordinating research), Annual Review, Tagging, Wiki. We often talk about conferences and "campus".
My work on Open Questions involved thinking about i) a better template for articles than "Abstract, Intro, Methods, etc." (though Open Questions didn't work, for unrelated reasons we haven't overcome yet), ii) getting lit reviews done systematically by people, iii) coordinating groups around research agendas.

I've thought about re-attempting the goals of Open Questions with instead a "Research Agenda" feature that lets people communally maintain research agendas and work on them. It's a question of priorities whether I work on that anytime soon.

I do really think many of the deficiencies of LessWrong's current work compared to academia are "infrastructure problems" at least as much as the epistemic standards of the community. Which means the LW team should be held culpable for not having solved them yet, but it is tricky.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-21T05:33:10.325Z · LW(p) · GW(p)

For the record, I think the LW team is doing a great job. There's definitely a sense in which better infrastructure can reduce the need for high epistemic standards, but it feels like the thing I'm pointing at is more like "Many LW contributors not even realising how far away we are from being able to reliably produce and build on good ideas" (which feels like my criticism of Ben's position in his comment, so I'll respond more directly there).

comment by Pongo · 2020-08-20T20:21:34.835Z · LW(p) · GW(p)

It seems really valuable to have you sharing how you think we’re falling epistemically short and probably important for the site to integrate the insights behind that view. There are a bunch of ways I disagree with your claims about epistemic best practices, but it seems like it would be cool if I could pass your ITT more. I wish your attempt to communicate the problems you saw had worked out better. I hope there’s a way for you to help improve LW epistemics, but also get that it might be costly in time and energy.

comment by Viliam · 2020-08-20T21:10:54.892Z · LW(p) · GW(p)
I just noticed that a couple of those comments have been downvoted to negative karma

Now they're positive again.

Confusingly, their Ω-karma (karma on another website) is also positive. Does it mean they previously had negative LW-karma but positive Ω-karma? Or that their Ω-karma also improved as a result of you complaining on LW a few hours ago? Why would it?

(Feature request: graph of evolution of comment karma as a function of time.)

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-21T14:36:50.178Z · LW(p) · GW(p)

I'm confused, what is Ω-karma?

Replies from: mikkel-wilson
comment by MikkW (mikkel-wilson) · 2020-08-21T15:25:00.554Z · LW(p) · GW(p)

AI Alignment Forum karma (which is also displayed here on posts that are crossposted)

comment by NaiveTortoise (An1lam) · 2020-08-21T13:17:59.270Z · LW(p) · GW(p)

I'd be curious what, if any, communities you think set good examples in this regard. In particular, are there specific academic subfields or non-academic scenes that exemplify the virtues you'd like to see more of?

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-21T14:35:06.121Z · LW(p) · GW(p)

Maybe historians of the industrial revolution? They grapple with really complex phenomena and large-scale patterns, like us, but unlike us they use a lot of data, write a lot of thorough papers and books, and then have a lot of ongoing debate on those ideas. And then the "progress studies" crowd is an example of an online community inspired by that tradition (but still very nascent, so we'll see how it goes).

More generally I'd say we could learn to be more rigorous by looking at any scientific discipline or econ or analytic philosophy. I don't think most LW posters are in a position to put in as much effort as full-time researchers, but certainly we can push a bit in that direction.

Replies from: An1lam
comment by NaiveTortoise (An1lam) · 2020-08-26T12:48:31.717Z · LW(p) · GW(p)

Thanks for your reply! I largely agree with drossbucket [LW(p) · GW(p)]'s reply.

I also wonder how much this is an incentives problem. As you mentioned and in my experience, the fields you mentioned strongly incentivize an almost fanatical level of thoroughness that I suspect is very hard for individuals to maintain without outside incentives pushing them that way. At least personally, I definitely struggle and, frankly, mostly fail to live up to the sorts of standards you mention when writing blog posts in part because the incentive gradient feels like it pushes towards hitting the publish button.

Given this, I wonder if there's a way to shift the incentives on the margin. One minor thing I've been thinking of trying for my personal writing is having a Knuth or Nintil style "pay for mistakes" policy. Do you have thoughts on other incentive structures for rewarding rigor or punishing the lack thereof?

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-26T15:47:40.572Z · LW(p) · GW(p)

It feels partly like an incentives problem, but also I think a lot of people around here are altruistic and truth-seeking and just don't realise that there are much more effective ways to contribute to community epistemics than standard blog posts.

I think that most LW discussion is at the level where "paying for mistakes" wouldn't be that helpful, since a lot of it is fuzzy. Probably the thing we need first is more reference posts that distill a range of discussion into key concepts, and place that in the wider intellectual context. Then we can get more empirical. (Although I feel pretty biased on this point, because my own style of learning about things is very top-down). I guess to encourage this, we could add a "reference" section for posts that aim to distill ongoing debates on LW.

In some cases you can get a lot of "cheap" credit by taking other people's ideas and writing a definitive version of them aimed at more mainstream audiences. For ideas that are really worth spreading, that seems useful.

comment by Richard_Ngo (ricraz) · 2020-12-02T23:39:08.239Z · LW(p) · GW(p)

The crucial heuristic I apply when evaluating AI safety research directions is: could we have used this research to make humans safe, if we were supervising the human evolutionary process? And if not, do we have a compelling story for why it'll be easier to apply to AIs than to humans?

Sometimes this might be too strict a criterion, but I think in general it's very valuable in catching vague or unfounded assumptions about AI development.

Replies from: adamShimi
comment by adamShimi · 2020-12-03T23:06:10.183Z · LW(p) · GW(p)

By making human safe, do you mean with regard to evolution's objective?

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-12-04T15:27:40.146Z · LW(p) · GW(p)

No. I meant: suppose we were rerunning a simulation of evolution, but can modify some parts of it (e.g. evolution's objective). How do we ensure that whatever intelligent species comes out of it is safe in the same ways we want AGIs to be safe?

(You could also think of this as: how could some aliens overseeing human evolution have made humans safe by those aliens' standards of safety? But this is a bit trickier to think about because we don't know what their standards are. Although presumably current humans, being quite aggressive and having unbounded goals, wouldn't meet them).

Replies from: adamShimi
comment by adamShimi · 2020-12-05T14:41:30.638Z · LW(p) · GW(p)

Okay, thanks. Could you give me an example of a research direction that passes this test? The thing I have in mind right now is pretty much everything that backchains to local search [LW · GW], but maybe that's not the way you think about it.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-12-05T15:14:44.763Z · LW(p) · GW(p)

So I think Debate is probably the best example of something that makes a lot of sense when applied to humans, to the point where they're doing human experiments on it already.

But this heuristic is actually a reason why I'm pretty pessimistic about most safety research directions.

Replies from: adamShimi
comment by adamShimi · 2020-12-18T16:03:20.854Z · LW(p) · GW(p)

So I've been thinking about this for a while, and I think I disagree with what I understand of your perspective. Which might obviously mean I misunderstand your perspective.

What I think I understand is that you judge safety research directions based on how well they could work on an evolutionary process like the one that created humans. But for me, the most promising approach to AGI is based on local search, which differs a bit from evolutionary process. I don't really see a reason to consider evolutionary processes instead of local search, and even then, the specific approach of evolution for humans is probably far too specific as a test bench.

This matters because problems for one are not problems for the other. For example, one way to mess with an evolutionary process is to find a way for everything to survive and reproduce/disseminate. Technology in general did that for humans, which means the evolutionary pressure decreased as technology evolved. But that's not a problem for local search, since at each step there will be only one next program.

On the other hand, local search might be dangerous because of things like gradient hacking [AF · GW]. And they don't make sense for evolutionary processes.

In conclusion, I feel for the moment that backchaining to local search [LW · GW] is a better heuristic for judging safety research directions. But I'm curious about where our disagreement lies on this issue.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-12-21T11:12:50.412Z · LW(p) · GW(p)

One source of our disagreement: I would describe evolution as a type of local search. The difference is that it's local with respect to the parameters of a whole population, rather than an individual agent. So this does introduce some disanalogies, but not particularly significant ones (to my mind). I don't think it would make much difference to my heuristic if we imagined that humans had evolved via gradient descent over our genes instead.

In other words, I like the heuristic of backchaining to local search, and I think of it as a subset of my heuristic. The thing it's missing, though, is that it doesn't tell you which approaches will actually scale up to training regimes which are incredibly complicated, applied to fairly intelligent agents. For example, impact penalties make sense in a local search context for simple problems. But to evaluate whether they'll work for AGIs, you need to apply them to massively complex environments. So my intuition is that, because I don't know how to apply them to the human ancestral environment, we also won't know how to apply them to our AGIs' training environments.

Similarly, when I think about MIRI's work on decision theory, I really have very little idea how to evaluate it in the context of modern machine learning. Are decision theories the type of thing which AIs can learn via local search? Seems hard to tell, since our AIs are so far from general intelligence. But I can reason much more easily about the types of decision theories that humans have, and the selective pressures that gave rise to them.

As a third example, my heuristic endorses Debate due to a high-level intuition about how human reasoning works, in addition to a low-level intuition about how it can arise via local search.

Replies from: adamShimi
comment by adamShimi · 2020-12-22T15:13:40.770Z · LW(p) · GW(p)

So if I try to summarize your position, it's something like: backchain to local search for simple and single-AI cases, and then think about aligning humans for the scaled and multi-agent version? That makes much more sense, thanks!

I also definitely see why your full heuristic doesn't feel immediately useful to me: because I mostly focus on the simple and single-AI case. But I've been thinking more and more (in part thanks to your writing) that I should allocate more thinking time to the more general case. I hope your heuristic will help me there.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-12-22T17:35:38.834Z · LW(p) · GW(p)

Cool, glad to hear it. I'd clarify the summary slightly: I think all safety techniques should include at least a rough intuition for why they'll work in the scaled-up version, even when current work on them only applies them to simple AIs. (Perhaps this was implicit in your summary already, I'm not sure.)

comment by Richard_Ngo (ricraz) · 2020-12-10T16:53:13.462Z · LW(p) · GW(p)

A well-known analogy from Yann LeCun: if machine learning is a cake, then unsupervised learning is the cake itself, supervised learning is the icing, and reinforcement learning is the cherry on top.

I think this is useful for framing my core concerns about current safety research:

  • If we think that unsupervised learning will produce safe agents, then why will the comparatively small contributions of SL and RL make them unsafe?
  • If we think that unsupervised learning will produce dangerous agents, then why will safety techniques which focus on SL and RL (i.e. basically all of them) work, when they're making comparatively small updates to agents which are already misaligned?

I do think it's more complicated than I've portrayed here, but I haven't yet seen a persuasive response to the core intuition.

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2020-12-11T11:44:00.195Z · LW(p) · GW(p)

I wrote a few posts on self-supervised learning last year:

I'm not aware of any airtight argument that "pure" self-supervised learning systems, either generically or with any particular architecture, are safe to use, to arbitrary levels of intelligence, though it seems very much worth someone trying to prove or disprove that. For my part, I got distracted by other things and haven't thought about it much since then.

The other issue is whether "pure" self-supervised learning systems would be capable enough to satisfy our AGI needs, or to safely bootstrap to systems that are. I go back and forth on this. One side of the argument I wrote up here [LW · GW]. The other side is, I'm now (vaguely) thinking that people need a reward system to decide what thoughts to think, and the fact that GPT-3 doesn't need reward is not evidence of reward being unimportant but rather evidence that GPT-3 is nothing like an AGI [LW · GW]. Well, maybe.

For humans, self-supervised learning forms the latent representations, but the reward system controls action selection. It's not altogether unreasonable to think that action selection, and hence reward, is a more important thing to focus on for safety research. AGIs are dangerous when they take dangerous actions, to a first approximation. The fact that a larger fraction of neocortical synapses are adjusted by self-supervised learning than by reward learning is interesting and presumably safety-relevant, but I don't think it immediately proves that self-supervised learning has a similarly larger fraction of the answers to AGI safety questions. Maybe, maybe not, it's not immediately obvious. :-)

comment by Richard_Ngo (ricraz) · 2021-01-14T00:25:39.837Z · LW(p) · GW(p)

In a bayesian rationalist view of the world, we assign probabilities to statements based on how likely we think they are to be true. But truth is a matter of degree, as Asimov points out. In other words, all models are wrong, but some are less wrong than others.

Consider, for example, the claim that evolution selects for reproductive fitness. Well, this is mostly true, but there's also sometimes group selection, and the claim doesn't distinguish between a gene-level view and an individual-level view, and so on...

So just assigning it a single probability seems inadequate. Instead, we could assign a probability distribution over its degree of correctness. But because degree of correctness is such a fuzzy concept, it'd be pretty hard to connect this distribution back to observations.
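One minimal way to write that down (my own sketch, nothing standard): replace the single credence $P(H) \in [0,1]$ with a density over a degree-of-correctness variable $c_H \in [0,1]$, for instance

$$c_H \sim \mathrm{Beta}(\alpha, \beta), \qquad \text{reporting } \mathbb{E}[c_H] \text{ and } \mathrm{Var}[c_H].$$

The catch is exactly the one noted above: there's no obvious likelihood $P(\text{observation} \mid c_H)$ with which to update $(\alpha, \beta)$.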

Or perhaps the distinction between truth and falsehood is sufficiently clear-cut in most everyday situations for this not to be a problem. But questions about complex systems (including, say, human thoughts and emotions) are messy enough that I expect the difference between "mostly true" and "entirely true" to often be significant.

Has this been discussed before? Given Less Wrong's name, I'd be surprised if not, but I don't think I've stumbled across it.

Replies from: habryka4, DanielFilan, Zack_M_Davis
comment by habryka (habryka4) · 2021-01-14T04:21:48.413Z · LW(p) · GW(p)

This feels generally related to the problems covered in Scott and Abram's research over the past few years. One of the sentences that stuck out to me the most was (roughly paraphrased since I don't want to look it up): 

In order to be a proper bayesian agent, a single hypothesis you formulate is as big and complicated as a full universe that includes yourself

I.e. our current formulations of bayesianism, like Solomonoff induction, only formulate the idea of a hypothesis at such a low level that even trying to think about a single hypothesis rigorously is basically impossible with bounded computational time. So in order to actually think about anything you have to somehow move beyond naive bayesianism.

Replies from: ricraz, EpicNamer27098
comment by Richard_Ngo (ricraz) · 2021-01-14T13:24:38.854Z · LW(p) · GW(p)

This seems reasonable, thanks. But I note that "in order to actually think about anything you have to somehow move beyond naive bayesianism" is a very strong criticism. Does this invalidate everything that has been said about using naive bayesianism in the real world? E.g. every instance where Eliezer says "be bayesian".

One possible answer is "no, because logical induction fixes the problem". My uninformed guess is that this doesn't work because there are comparable problems with applying to the real world. But if this is your answer, follow-up question: before we knew about logical induction, were the injunctions to "be bayesian" justified?

(Also, for historical reasons, I'd be interested in knowing when you started believing this.)

Replies from: habryka4
comment by habryka (habryka4) · 2021-01-14T18:34:50.950Z · LW(p) · GW(p)

I think it definitely changed a bunch of stuff for me, and does at least a bit invalidate some of the things that Eliezer said, though not actually very much. 

In most of his writing Eliezer used bayesianism as an ideal that was obviously unachievable, but that still gives you a rough sense of what the actual limits of cognition are, and rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal. I did definitely get confused for a while and tried to apply Bayes to everything directly, and then felt bad when I couldn't actually apply Bayes' theorem in some situations, which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot.

My shift on this happened over the last 2-3 years or so. I think starting with Embedded Agency, but maybe a bit before that. 

Replies from: ricraz, Raemon
comment by Richard_Ngo (ricraz) · 2021-01-14T19:12:49.108Z · LW(p) · GW(p)

rules out a bunch of methods of cognition as being clearly in conflict with that theoretical ideal

Which ones? In Against Strong Bayesianism [LW · GW] I give a long list of methods of cognition that are clearly in conflict with the theoretical ideal, but in practice are obviously fine. So I'm not sure how we distinguish what's ruled out from what isn't.

which I now realize is because those tended to be problems where embeddedness or logical uncertainty mattered a lot

Can you give an example of a real-world problem where logical uncertainty doesn't matter a lot, given that without logical uncertainty, we'd have solved all of mathematics and considered all the best possible theories in every other domain?

Replies from: habryka4
comment by habryka (habryka4) · 2021-01-14T22:51:09.898Z · LW(p) · GW(p)

I think in-practice there are lots of situations where you can confidently create a kind of pocket-universe where you can actually consider hypotheses in a bayesian way. 

Concrete example: Trying to figure out who voted a specific way on a LW post. You can condition pretty cleanly on vote-strength, and treat people's votes as roughly independent, so if you have guesses on how different people are likely to vote, it's pretty easy to create the odds ratios for basically all final karma + vote numbers and then make a final guess based on that. 

It's clear that there is some simplification going on here, by assigning static probabilities for people's vote behavior, treating them as independent (though modeling some amount of dependence wouldn't be too hard), etc. But overall I expect it to perform pretty well and to give you good answers.

(Note, I haven't actually done this explicitly, but my guess is my brain is doing something pretty close to this when I do see vote numbers + karma numbers on a thread)
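A minimal sketch of that calculation, with made-up voters, vote strengths, and priors (and ignoring downvotes), just to show the shape of the update:

```python
from itertools import product

# Made-up guesses: (name, karma added if they upvote, prior probability that they upvoted).
voters = [("A", 2, 0.8), ("B", 1, 0.5), ("C", 7, 0.1), ("D", 1, 0.3)]

observed_karma, observed_votes = 3, 2   # e.g. the post shows 3 karma from 2 votes

posterior = {}
for pattern in product([0, 1], repeat=len(voters)):        # 1 = this person upvoted
    karma = sum(v * strength for v, (_, strength, _) in zip(pattern, voters))
    if (karma, sum(pattern)) != (observed_karma, observed_votes):
        continue                                           # inconsistent with the evidence
    prior = 1.0                                            # voters treated as independent
    for v, (_, _, p) in zip(pattern, voters):
        prior *= p if v else (1 - p)
    posterior[pattern] = prior

total = sum(posterior.values())
for pattern, weight in sorted(posterior.items(), key=lambda kv: -kv[1]):
    names = [name for v, (name, _, _) in zip(pattern, voters) if v]
    print(names, round(weight / total, 3))   # e.g. ['A', 'B'] 0.7, ['A', 'D'] 0.3
```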

So I'm not sure how we distinguish what's ruled out from what isn't.

Well, it's obvious that anything that claims to be better than the ideal bayesian update is clearly ruled out. I.e. arguments that by writing really good explanations of a phenomenon you can get to a perfect understanding. Or arguments that you can derive the rules of physics from first principles.

There are also lots of hypotheticals where you do get to just use Bayes properly and then it provides very strong bounds on the ideal approach. There are a good number of implicit models behind lots of standard statistics models that when put into a bayesian framework give rise to a more general formulation. See the Wikipedia article for "Bayesian interpretations of regression" for a number of examples.

Of course, in reality it is always unclear whether the assumptions that give rise to various regression methods actually hold, but I think you can totally say things like "given these assumption, the bayesian solution is the ideal one, and you can't perform better than this, and if you put in the computational effort you will actually achieve this performance".

comment by Raemon · 2021-01-14T18:57:26.358Z · LW(p) · GW(p)

Are you able to give examples of the times you tried to be Bayesian and it failed because embeddedness was an issue?

comment by EpicNamer27098 · 2021-01-14T08:21:35.793Z · LW(p) · GW(p)

Scott and Abram? Who? Do they have any books I can read to familiarize myself with this discourse?

Replies from: habryka4, ricraz
comment by Richard_Ngo (ricraz) · 2021-01-14T12:37:06.314Z · LW(p) · GW(p)

Scott Garrabrant and Abram Demski, two MIRI researchers.

For introductions to their work, see the Embedded Agency sequence [? · GW], the Consequences of Logical Induction sequence [? · GW], and the Cartesian Frames sequence [? · GW].

comment by DanielFilan · 2021-01-14T00:54:26.827Z · LW(p) · GW(p)

Related but not identical: this shortform post [LW(p) · GW(p)].

comment by Zack_M_Davis · 2021-01-14T01:25:21.325Z · LW(p) · GW(p)

See the section about scoring rules in the Technical Explanation.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2021-01-14T13:16:42.836Z · LW(p) · GW(p)

Hmmm, but what does this give us? He talks about the difference between vague theories and technical theories, but then says that we can use a scoring rule to change the probabilities we assign to each type of theory.

But my question is still: when you increase your credence in a vague theory, what are you increasing your credence about? That the theory is true?

Nor can we say that it's about picking the "best theory" out of the ones we have, since different theories may overlap partially.

Replies from: Zack_M_Davis
comment by Zack_M_Davis · 2021-01-14T19:17:32.315Z · LW(p) · GW(p)

If we can quantify how good a theory is at making accurate predictions (or rather, quantify a combination of accuracy and simplicity [LW · GW]), that gives us a sense in which some theories are "better" (less wrong) than others, without needing theories to be "true".

comment by Richard_Ngo (ricraz) · 2020-11-26T18:05:46.782Z · LW(p) · GW(p)

Oracle-genie-sovereign is a really useful distinction that I think I (and probably many others) have avoided using mainly because "genie" sounds unprofessional/unacademic. This is a real shame, and a good lesson for future terminology.

Replies from: adamShimi, DanielFilan, adamShimi
comment by adamShimi · 2020-11-27T13:46:30.704Z · LW(p) · GW(p)

After rereading the chapter in Superintelligence, it seems to me that "genie" captures something akin to act-based agents. Do you think that's the main way to use this concept in the current state of the field, or do you have other applications in mind?

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-11-28T21:42:44.395Z · LW(p) · GW(p)

Ah, yeah, that's a great point. Although I think act-based agents is a pretty bad name, since those agents may often carry out a whole bunch of acts in a row - in fact, I think that's what made me overlook the fact that it's pointing at the right concept. So not sure if I'm comfortable using it going forward, but thanks for pointing that out.

comment by DanielFilan · 2020-11-26T18:25:24.419Z · LW(p) · GW(p)

Perhaps the lesson is that terminology that is acceptable in one field (in this case philosophy) might not be suitable in another (in this case machine learning).

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-12-04T15:29:42.223Z · LW(p) · GW(p)

I don't think that even philosophers take the "genie" terminology very seriously. I think the more general lesson is something like: it's particularly important to spend your weirdness points wisely when you want others to copy you, because they may be less willing to spend weirdness points.

comment by adamShimi · 2020-11-26T18:08:24.638Z · LW(p) · GW(p)

Is that from Superintelligence? I googled it, and that was the most convincing result.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-11-25T11:45:27.759Z · LW(p) · GW(p)

I suspect that AIXI is misleading to think about in large part because it lacks reusable parameters - instead it just memorises all inputs it's seen so far. Which means the setup doesn't have episodes, or a training/deployment distinction; nor is any behaviour actually "reinforced".

Replies from: DanielFilan, steve2152
comment by DanielFilan · 2020-11-25T23:04:55.284Z · LW(p) · GW(p)

I kind of think the lack of episodes makes it more realistic for many problems, but admittedly not for simulated games. Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism. [EDIT: I retract the second sentence]

Replies from: DanielFilan, DanielFilan
comment by DanielFilan · 2020-11-26T17:23:49.492Z · LW(p) · GW(p)

Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism.

Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-11-26T17:31:18.568Z · LW(p) · GW(p)

Wait, really? I thought it made sense (although I'd contend that most people don't think about AIXI in terms of those TMs reinforcing hypotheses, which is the point I'm making). What's incorrect about it?

Replies from: DanielFilan
comment by DanielFilan · 2020-11-26T17:42:09.018Z · LW(p) · GW(p)

Well now I'm less sure that it's incorrect. I was originally imagining that like in Solomonoff induction, the TMs basically directly controlled AIXI's actions, but that's not right: there's an expectimax. And if the TMs reinforce actions by shaping the rewards, in the AIXI formalism you learn that immediately and throw out those TMs.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-11-26T18:00:17.959Z · LW(p) · GW(p)

Oh, actually, you're right (that you were wrong). I think I made the same mistake in my previous comment. Good catch.

comment by DanielFilan · 2020-11-26T17:35:11.375Z · LW(p) · GW(p)

"Also, presumably many of the component Turing machines have reusable parameters and reinforce behaviour, altho this is hidden by the formalism."

Actually I think this is total nonsense produced by me forgetting the difference between AIXI and Solomonoff induction.

comment by Steven Byrnes (steve2152) · 2020-11-25T22:34:08.356Z · LW(p) · GW(p)

Humans don't have a training / deployment distinction either... Do humans have "reusable parameters"? Not quite sure what you mean by that.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-11-26T02:41:23.790Z · LW(p) · GW(p)

Yes we do: training is our evolutionary history, deployment is an individual lifetime. And our genomes are our reusable parameters.

Unfortunately I haven't yet written any papers/posts really laying out this analogy, but it's pretty central to the way I think about AI, and I'm working on a bunch of related stuff as part of my PhD, so hopefully I'll have a more complete explanation soon.

Replies from: steve2152
comment by Steven Byrnes (steve2152) · 2020-11-26T03:17:15.823Z · LW(p) · GW(p)

Oh, OK, I see what you mean. Possibly related: my comment here [LW(p) · GW(p)].

comment by Richard_Ngo (ricraz) · 2020-10-11T12:07:45.365Z · LW(p) · GW(p)

I've recently discovered waitwho.is, which collects all the online writing and talks of various tech-related public intellectuals. It seems like an important and previously-missing piece of infrastructure for intellectual progress online.

comment by Richard_Ngo (ricraz) · 2020-09-17T20:01:32.380Z · LW(p) · GW(p)

Greg Egan on universality:

I believe that humans have already crossed a threshold that, in a certain sense, puts us on an equal footing with any other being who has mastered abstract reasoning. There’s a notion in computing science of “Turing completeness”, which says that once a computer can perform a set of quite basic operations, it can be programmed to do absolutely any calculation that any other computer can do. Other computers might be faster, or have more memory, or have multiple processors running at the same time, but my 1988 Amiga 500 really could be programmed to do anything my 2008 iMac can do — apart from responding to external events in real time — if only I had the patience to sit and swap floppy disks all day long. I suspect that something broadly similar applies to minds and the class of things they can understand: other beings might think faster than us, or have easy access to a greater store of facts, but underlying both mental processes will be the same basic set of general-purpose tools. So if we ever did encounter those billion-year-old aliens, I’m sure they’d have plenty to tell us that we didn’t yet know — but given enough patience, and a very large notebook, I believe we’d still be able to come to grips with whatever they had to say.
Replies from: gwern
comment by gwern · 2020-09-17T21:56:05.791Z · LW(p) · GW(p)

Equivocation. "Who's 'we', flesh man?" Even granting the necessary millions or billions of years for a human to sit down and emulate a superintelligence step by step, it is still not the human who understands, but the Chinese room.

Replies from: An1lam
comment by NaiveTortoise (An1lam) · 2020-09-17T23:06:41.160Z · LW(p) · GW(p)

I've seen this quote before and always find it funny because when I read Greg Egan, I constantly find myself thinking there's no way I could've come up with the ideas he has even if you gave me months or years of thinking time.

Replies from: gwern
comment by gwern · 2020-09-18T00:33:50.179Z · LW(p) · GW(p)

Yes, there's something to that, but you have to be careful if you want to use that as an objection. Maybe you wouldn't easily think of it, but that doesn't exclude the possibility of you doing it: you can come up with algorithms you can execute which would spit out Egan-like ideas, like 'emulate Egan's brain neuron by neuron'. (If nothing else, there's always the ol' dovetail-every-possible-Turing-machine hammer.) Most of these run into computational complexity problems, but that's the escape hatch Egan (and Scott Aaronson has made a similar argument) leaves himself by caveats like 'given enough patience, and a very large notebook'. Said patience might require billions of years, and the notebook might be the size of the Milky Way galaxy, but those are all finite numbers, so technically Egan is correct as far as that goes.

Replies from: An1lam
comment by NaiveTortoise (An1lam) · 2020-09-18T01:02:29.079Z · LW(p) · GW(p)

Yeah, good point - given a generous enough interpretation of the notebook, my rejection doesn't hold. It's still hard for me to imagine that response feeling meaningful in the context, but maybe I'm just failing to model others well here.

comment by Richard_Ngo (ricraz) · 2020-04-26T21:00:05.813Z · LW(p) · GW(p)

There's some possible world in which the following approach to interpretability works:

  • Put an AGI in a bunch of situations where it sometimes is incentivised to lie and sometimes is incentivised to tell the truth.
  • Train a lie detector which is given all its neural weights as input.
  • Then ask the AGI lots of questions about its plans.
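As a very rough sketch of what the second bullet's "train a lie detector" step might look like in code: everything below is a placeholder (synthetic "internal state" features, synthetic labels, a plain linear probe), and real work would need genuine snapshots of the model's internals plus ground-truth labels for when it actually lied.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: one feature vector per (situation, answer) episode, e.g. flattened
# activations or weights recorded while the model answers a question.
n_episodes, n_features = 5000, 256
X = rng.normal(size=(n_episodes, n_features))
y = rng.integers(0, 2, size=n_episodes)   # 1 = episode where the model lied, 0 = truthful
# Plant a weak linear signal so the probe has something to find; the hope with a real
# model is that "about to lie" leaves some analogous trace in the internals.
X[y == 1, :8] += 0.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out lie-detection accuracy:", probe.score(X_te, y_te))
```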

One problem that this approach would face if we were using it to interpret a human is that the human might not consciously be aware of what their motivations are. For example, they may believe they are doing something for altruistic reasons, when in fact their unconscious motivations are primarily to look good. And the motivations which we are less conscious of are exactly those ones which it's most disadvantageous for others to know about.

So would using such an interpretability technique on an AGI work? I guess one important question is something like: by default, would the AGI be systematically biased when talking about its plans, like humans are? Or is this something which only arises when there are selection pressures during training for hiding information?

One way we could avoid this problem: instead of a "lie detector", you could train a "plan identifier", which takes an AGI brain and tells you what that AGI is going to do in english. I'm a little less optimistic about this, since I think that gathering training data will be the big bottleneck either way, and getting enough data to train a plan identifier that's smart enough to generalise to a wide range of plans seems pretty tricky. (By contrast, the lie detector might not need to know very much about the *content* of the lies).

comment by Richard_Ngo (ricraz) · 2020-04-26T10:42:18.798Z · LW(p) · GW(p)

I've heard people argue that "most" utility functions lead to agents with strong convergent instrumental goals. This obviously depends a lot on how you quantify over utility functions. Here's one intuition in the other direction. I don't expect this to be persuasive to most people who make the argument above (but I'd still be interested in hearing why not).

If a non-negligible percentage of an agent's actions are random, then to describe it as a utility-maximiser would require an incredibly complex utility function (because any simple hypothesised utility function will eventually be falsified by a random action). And so this generates arbitrarily simple agents whose observed behaviour can only be described as maximising a utility function for arbitrarily complex utility functions (depending on how long you run them).

I expect people to respond something like: we need a theory of how to describe agents with bounded cognition anyway. And if you have such a theory, then we could describe the agent above as "maximising simple function U, subject to the boundedness constraint that X% of its actions are random".

Replies from: TurnTrout, DanielFilan, Vaniver
comment by TurnTrout · 2020-04-26T14:34:57.887Z · LW(p) · GW(p)

I'm not sure if you consider me to be making that argument [LW · GW], but here are my thoughts: I claim that most reward functions lead to agents with strong convergent instrumental goals. However, I share your intuition that (somehow) uniformly sampling utility functions over universe-histories might not lead to instrumental convergence.

To understand instrumental convergence and power-seeking, consider how many of the reward functions we might specify automatically imply a causal mechanism for increasing reward. The structure of the reward function implies that more is better, and that there are mechanisms for repeatedly earning points (for example, by showing itself a high-scoring input).

Since the reward function is "simple" (there's usually not a way to grade exact universe histories), these mechanisms work in many different situations and points in time. The agent is naturally incentivized to assure its own safety in order to best leverage these mechanisms for gaining reward. Therefore, we shouldn't be surprised to see a lot of these simple goals leading to the same kind of power-seeking behavior.

What structure is implied by a reward function?

  • Additive/Markovian: while a utility function might be over an entire universe-history, reward is often additive over time steps. This is a strong constraint which I don't always expect to be true, but I think that among the goals with this structure, a greater proportion of them have power-seeking incentives.
  • Observation-based: while a utility function might be over an entire universe-history, the atom of the reward function is the observation. Perhaps the observation is an input to update a world model, over which we have tried to define a reward function. I think that most ways of doing this lead to power-seeking incentives.
  • Agent-centric: reward functions are defined with respect to what the agent can observe. Therefore, in partially observable environments, there is naturally a greater emphasis on the agent's vantage point in the environment.

My theorems apply to the finite, fully observable, Markovian situation.[1] We might not end up using reward functions for more impressive tasks – we might express preferences over incomplete trajectories, for example. The "specify a reward function over the agent's world model" approach may or may not lead to good subhuman performance in complicated tasks like cleaning warehouses. Imagine specifying a reward function over pure observations for that task – the agent would probably just get stuck looking at a wall in a particularly high-scoring way.

However, for arbitrary utility functions over universe histories, the structure isn't so simple. With utility functions over universe histories having far more degrees of freedom, arbitrary policies can be rationalized as VNM expected utility maximization [? · GW]. That said, with respect to a simplicity prior over computable utility functions, the power-seeking ones might have most of the measure.

A more appropriate claim might be: goal-directed behavior tends to lead to power-seeking, and that's why goal-directed behavior tends to be bad [LW · GW].


  1. However, it's well-known that you can convert finite non-Markovian MDPs into finite Markovian MDPs. ↩︎

Replies from: ricraz, ricraz
comment by Richard_Ngo (ricraz) · 2020-04-30T13:34:52.891Z · LW(p) · GW(p)

I've just put up a post [LW · GW] which serves as a broader response to the ideas underpinning this type of argument.

comment by Richard_Ngo (ricraz) · 2020-04-26T21:05:45.307Z · LW(p) · GW(p)
I claim that most reward functions lead to agents with strong convergent instrumental goals

I think this depends a lot on how you model the agent developing. If you start off with a highly intelligent agent which has the ability to make long-term plans, but doesn't yet have any goals, and then you train it on a random reward function - then yes, it probably will develop strong convergent instrumental goals.

On the other hand, if you start off with a randomly initialised neural network, and then train it on a random reward function, then probably it will get stuck in a local optimum pretty quickly, and never learn to even conceptualise these things called "goals".

I claim that when people think about reward functions, they think too much about the former case, and not enough about the latter. Because while it's true that we're eventually going to get highly intelligent agents which can make long-term plans, it's also important that we get to control what reward functions they're trained on up to that point. And so plausibly we can develop intelligent agents that, in some respects, are still stuck in "local optima" in the way they think about convergent instrumental goals - i.e. they're missing whatever cognitive functionality is required for being ambitious on a large scale.

Replies from: TurnTrout
comment by TurnTrout · 2020-04-26T21:15:07.986Z · LW(p) · GW(p)

Agreed – I should have clarified. I've been mostly discussing instrumental convergence with respect to optimal policies. The path through policy space is also important.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-04-27T01:39:47.532Z · LW(p) · GW(p)

Makes sense. For what it's worth, I'd also argue that thinking about optimal policies at all is misguided (e.g. what's the optimal policy for humans - the literal best arrangement of neurons we could possibly have for our reproductive fitness? Probably we'd be born knowing arbitrarily large amounts of information. But this is just not relevant to predicting or modifying our actual behaviour at all).

Replies from: TurnTrout
comment by TurnTrout · 2020-04-27T02:07:34.193Z · LW(p) · GW(p)

I disagree.

  1. We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1] Even if we don't expect the trained agents to reach the optimal policy in the real world, we should still understand what behavior is like at optimum. If you think your proposal is not aligned at optimum but is aligned for realistic training paths, you should have a strong story for why.

  2. Formal theorizing about instrumental convergence with respect to optimal behavior is strictly easier than theorizing about ϵ-optimal behavior, which I think is what you want for a more realistic treatment of instrumental convergence for real agents. Even if you want to think about sub-optimal policies, if you don't understand optimal policies... good luck! Therefore, we also have an instrumental (...) interest in studying the behavior at optimum.


  1. At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point. ↩︎

Replies from: ricraz, Pattern
comment by Richard_Ngo (ricraz) · 2020-04-27T17:26:33.629Z · LW(p) · GW(p)

1. I think it's more accurate to say that, because approximately none of the non-trivial theoretical results hold for function approximation, approximately none of our non-trivial agents are proven to eventually converge to the optimal policy. (Also, given the choice between an algorithm without convergence proofs that works in practice, and an algorithm with convergence proofs that doesn't work in practice, everyone will use the former). But we shouldn't pay any attention to optimal policies anyway, because the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute.

2. I think theorizing about ϵ-optimal behavior is more useful than theorizing about optimal behaviour by roughly ϵ, for roughly the same reasons. But in general, clearly I can understand things about suboptimal policies without understanding optimal policies. I know almost nothing about the optimal policy in StarCraft, but I can still make useful claims about AlphaStar (for example: it's not going to take over the world).

Again, let's try cash this out. I give you a human - or, say, the emulation of a human, running in a simulation of the ancestral environment. Is this safe? How do you make it safer? What happens if you keep selecting for intelligence? I think that the theorising you talk about will be actively harmful for your ability to answer these questions.

Replies from: TurnTrout
comment by TurnTrout · 2020-04-27T18:14:23.236Z · LW(p) · GW(p)

I'm confused, because I don't disagree with any specific point you make - just the conclusion. Here's my attempt at a disagreement which feels analogous to me:

TurnTrout: here's how spherical cows roll downhill!

ricraz: real cows aren't spheres.

My response in this "debate" is: if you start with a spherical cow and then consider which real world differences are important enough to model, you're better off than just saying "no one should think about spherical cows".

I think that the theorising you talk about will be actively harmful for your ability to answer these questions.

I don't understand why you think that. If you can have a good understanding of instrumental convergence and power-seeking for optimal agents, then you can consider whether any of those same reasons apply for suboptimal humans.

Considering power-seeking for optimal agents is a relaxed problem [LW · GW]. Yes, ideally, we would instantly jump to the theory that formally describes power-seeking for suboptimal agents with realistic goals in all kinds of environments. But before you do that, a first step is understanding power-seeking in MDPs [LW · GW]. Then, you can take formal insights from this first step and use them to update your pre-theoretic intuitions where appropriate.

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-04-29T00:50:49.319Z · LW(p) · GW(p)

Thanks for engaging despite the opacity of the disagreement. I'll try to make my position here much more explicit (and apologies if that makes it sound brusque). The fact that your model is a simplified abstract model is not sufficient to make it useful. Some abstract models are useful. Some are misleading and will cause people who spend time studying them to understand the underlying phenomenon less well than they did before. From my perspective, I haven't seen you give arguments that your models are in the former category not the latter. Presumably you think they are in fact useful abstractions - why? (A few examples of the latter: behaviourism, statistical learning theory, recapitulation theory, Gettier-style analysis of knowledge).

My argument for why they're overall misleading: when I say that "the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute", or that safety researchers shouldn't think about AIXI, I'm not just saying that these are inaccurate models. I'm saying that they are modelling fundamentally different phenomena than the ones you're trying to apply them to. AIXI is not "intelligence", it is brute force search, which is a totally different thing that happens to look the same in the infinite limit. Optimal tabular policies are not skill at a task, they are a cheat sheet, but they happen to look similar in very simple cases.

Probably the best example of what I'm complaining about is Ned Block trying to use Blockhead to draw conclusions about intelligence. I think almost everyone around here would roll their eyes hard at that. But then people turn around and use abstractions that are just as unmoored from reality as Blockhead, often in a very analogous way. (This is less a specific criticism of you, TurnTrout, and more a general criticism of the field).

if you start with a spherical cow and then consider which real world differences are important enough to model, you're better off than just saying "no one should think about spherical cows".

Forgive me a little poetic license. The analogy in my mind is that you were trying to model the cow as a sphere, but you didn't know how to do so without setting its weight as infinite, and what looked to you like your model predicting the cow would roll downhill was actually your model predicting that the cow would swallow up the nearby fabric of spacetime and the bottom of the hill would fall into its event horizon. At which point, yes, you would be better off just saying "nobody should think about spherical cows".

Replies from: TurnTrout
comment by TurnTrout · 2020-04-29T18:42:47.044Z · LW(p) · GW(p)

Thanks for elaborating this interesting critique. I agree we generally need to be more critical of our abstractions.

I haven't seen you give arguments that your models [of instrumental convergence] are [useful for realistic agents]

Falsifying claims and "breaking" proposals is a classic element of AI alignment discourse and debate. Since we're talking about superintelligent agents, we can't predict exactly what a proposal would do. However, if I make a claim ("a superintelligent paperclip maximizer would keep us around because of gains from trade"), you can falsify this by showing that my claimed policy is dominated by another class of policies ("we would likely be comically resource-inefficient in comparison; GFT arguments don't model dynamics which allow killing other agents and appropriating their resources").

Even we can come up with this dominant policy class, so the posited superintelligence wouldn't miss it either. We don't know what the superintelligent policy will be, but we know what it won't be (see also Formalizing convergent instrumental goals). Even though I don't know how Gary Kasparov will open the game, I confidently predict that he won't let me checkmate him in two moves.

Non-optimal power and instrumental convergence

Instead of thinking about optimal policies, let's consider the performance of a given algorithm $A$. $A$ takes a rewardless MDP $M$ and a reward function $R$ as input, and outputs a policy.

Definition. Let $\mathcal{D}$ be a continuous distribution over reward functions with CDF $F$. The average return achieved by algorithm $A$ at state $s$ and discount rate $\gamma$ is

$$\int V^{\pi_{A(M,R)}}_{R}(s, \gamma) \, dF(R),$$

where $\pi_{A(M,R)}$ is the policy $A$ outputs for the MDP $M$ and reward function $R$.

Instrumental convergence with respect to $A$'s policies can be defined similarly ("what is the $F$-measure of a given trajectory under $A$'s policies?"). The theory I've laid out allows precise claims, which is a modest benefit to our understanding. Before, we just had intuitions about some vague concept called "instrumental convergence".
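To make the definition concrete, here is a toy numerical sketch (my own construction: state-based rewards drawn i.i.d. uniform, a three-state deterministic MDP, and value iteration as the stand-in for $A$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rewardless MDP: 3 states, 2 actions, deterministic transitions T[s, a] = next state.
# State 2 is absorbing.
T = np.array([[1, 2],
              [0, 2],
              [2, 2]])
n_states, n_actions = T.shape
gamma = 0.9

def algorithm_A(T, R, gamma, iters=300):
    """Stand-in for the algorithm A: value iteration on state rewards R, returning a greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R[:, None] + gamma * V[T]   # Q[s, a] = R(s) + gamma * V(T[s, a])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def discounted_return(policy, T, R, gamma, s0, horizon=500):
    total, s = 0.0, s0
    for t in range(horizon):
        total += (gamma ** t) * R[s]
        s = T[s, policy[s]]
    return total

# Monte Carlo estimate of the average return of A at state 0 under D = Uniform[0,1]^3,
# plus the measure of reward functions whose A-policy avoids the absorbing state from state 0.
returns, avoids = [], []
for _ in range(2000):
    R = rng.uniform(size=n_states)      # sample a reward function from D
    pi = algorithm_A(T, R, gamma)
    returns.append(discounted_return(pi, T, R, gamma, s0=0))
    avoids.append(pi[0] == 0)           # action 0 at state 0 steers away from the absorbing state
print("average return of A at s=0:", np.mean(returns))
print("measure of 'avoid the absorbing state' under A:", np.mean(avoids))
```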

Here's bad reasoning, which implies that the cow tears a hole in spacetime:

Suppose the laws of physics bestow godhood upon an agent executing some convoluted series of actions; in particular, this allows avoiding heat death. Clearly, it is optimal for the vast majority of agents to instantly become god.

The problem is that it's impractical to predict what a smarter agent will do, or what specific kinds of action will be instrumentally convergent for $A$, or to assume that the real agent will be infinitely smart. Just because it's smart doesn't mean it's omniscient, as you rightly point out.

Here's better reasoning:

Suppose that the MDP modeling the real world represents shutdown as a single terminal state. Most optimal agents don't allow themselves to be shut down. Furthermore, since we can see that most goals offer better reward at non-shutdown states, superintelligent $A$ can as well.[1] While I don't know exactly what $A$ will tend to do, I predict that policies generated by $A$ will tend to resist shutdown.


  1. It might seem like I'm assuming the consequent here. This is not so – the work is first done by the theorems on optimal behavior, which do imply that most goals achieve greater return by avoiding shutdown. The question is whether reasonably intelligent suboptimal agents realize this fact. Given a uniformly drawn reward function, we can usually come up with a better policy than dying, so the argument is that $A$ can as well. ↩︎

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-04-29T23:22:41.783Z · LW(p) · GW(p)

I'm afraid I'm mostly going to disengage here, since it seems more useful to spend the time writing up more general + constructive versions of my arguments, rather than critiquing a specific framework.

If I were to sketch out the reasons I expect to be skeptical about this framework if I looked into it in more detail, it'd be something like:

1. Instrumental convergence isn't training-time behaviour, it's test-time behaviour. It isn't about increasing reward, it's about achieving goals (that the agent learned by being trained to increase reward).

2. The space of goals that agents might learn is very different from the space of reward functions. As a hypothetical, maybe it's the case that neural networks are just really good at producing deontological agents, and really bad at producing consequentialists. (E.g, if it's just really really difficult for gradient descent to get a proper planning module working). Then agents trained on almost all reward functions will learn to do well on them without developing convergent instrumental goals. (I expect you to respond that being deontological won't get you to optimality. But I would say that talking about "optimality" here ruins the abstraction, for reasons outlined in my previous comment).

Replies from: TurnTrout
comment by TurnTrout · 2020-04-30T20:18:12.337Z · LW(p) · GW(p)

I expect you to respond that being deontological won't get you to optimality. But I would say that talking about "optimality" here ruins the abstraction, for reasons outlined in my previous comment

I was actually going to respond, "that's a good point, but (IMO) a different concern than the one you initially raised". I see you making two main critiques.

  1. (paraphrased) "$A$ won't produce optimal policies for the specified reward function [even assuming alignment generalization off of the training distribution], so your model isn't useful" – I replied to this critique above.

  2. "The space of goals that agents might learn is very different from the space of reward functions." I agree this is an important part of the story. I think the reasonable takeaway is "current theorems on instrumental convergence help us understand what superintelligent won't do, assuming no reward-result gap. Since we can't assume alignment generalization, we should keep in mind how the inductive biases of gradient descent affect the eventual policy produced."

I remain highly skeptical of the claim that applying this idealized theory of instrumental convergence worsens our ability to actually reason about it.

ETA: I read some information you privately messaged me, and I see why you might see the above two points as a single concern.

comment by Pattern · 2020-04-27T03:39:04.318Z · LW(p) · GW(p)
We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1] [LW · GW]
At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point. ↩︎ [LW · GW]

Is the point that people try to use algorithms which they think will eventually converge to the optimal policy? (Assuming there is one.)

Replies from: TurnTrout
comment by TurnTrout · 2020-04-27T03:55:21.632Z · LW(p) · GW(p)

Something like that, yeah.

comment by DanielFilan · 2020-08-22T06:00:42.015Z · LW(p) · GW(p)

And so this generates arbitrarily simple agents whose observed behaviour can only be described as maximising a utility function for arbitrarily complex utility functions (depending on how long you run them).

I object to the claim that agents that act randomly can be made "arbitrarily simple". Randomness is basically definitionally complicated!

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-08-22T06:32:07.112Z · LW(p) · GW(p)

Eh, this seems a bit nitpicky. It's arbitrarily simple given a call to a randomness oracle, which in practice we can approximate pretty easily. And it's "definitionally" easy to specify as well: "the function which, at each call, returns true with 50% likelihood and false otherwise."

Replies from: DanielFilan
comment by DanielFilan · 2020-08-22T16:27:27.083Z · LW(p) · GW(p)

If you get an 'external' randomness oracle, then you could define the utility function pretty simply in terms of the outputs of the oracle.

If the agent has a pseudo-random number generator (PRNG) inside it, then I suppose I agree that you aren't going to be able to give it a utility function that has the standard set of convergent instrumental goals, and PRNGs can be pretty short. (Well, some search algorithms are probably shorter, but I bet they have higher Kt complexity, which is probably a better measure for agents)

comment by Vaniver · 2020-04-29T23:27:41.768Z · LW(p) · GW(p)

If a reasonable percentage of an agent's actions are random, then to describe it as a utility-maximiser would require an incredibly complex utility function (because any simple hypothesised utility function will eventually be falsified by a random action).

I'd take a different tack here, actually; I think this depends on what the input to the utility function is. If we're only allowed to look at 'atomic reality', or the raw actions the agent takes, then I think your analysis goes through, that we have a simple causal process generating the behavior but need a very complicated utility function to make a utility-maximizer that matches the behavior.

But if we're allowed to decorate the atomic reality with notes like "this action was generated randomly", then we can have a utility function that's as simple as the generator, because it just counts up the presence of those notes. (It doesn't seem to me like this decorator is meaningfully more complicated than the thing that gave us "agents taking actions" as a data source, so I don't think I'm paying too much here.)

This can lead to a massive explosion in the number of possible utility functions (because there's a tremendous number of possible decorators), but I think this matches the explosion that we got by considering agents that were the outputs of causal processes in the first place. That is, consider reasoning about python code that outputs actions in a simple game, where there are many more possible python programs than there are possible policies in the game.
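As a toy illustration of the decorator idea (my own sketch, not anything from the comment): if trajectories arrive annotated with how each action was generated, the matching utility function really is about as short as the generator itself.

```python
# Trajectories are lists of (action, notes) pairs; the notes "decorate" atomic reality
# with facts like how the action was generated.
def utility(trajectory):
    # About as simple as the random generator it matches: just count the decorations.
    return sum(1 for _action, notes in trajectory if "generated randomly" in notes)

trajectory = [("left", {"generated randomly"}),
              ("right", {"generated randomly"}),
              ("left", {"copied from demonstrator"})]
print(utility(trajectory))  # 2
```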

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-04-29T23:56:49.603Z · LW(p) · GW(p)

So in general you can't have utility functions that are as simple as the generator, right? E.g. the generator could be deontological. In which case your utility function would be complicated. Or it could be random, or it could choose actions by alphabetical order, or...

And so maybe you can have a little note for each of these. But now what it sounds like is: "I need my notes to be able to describe every possible cognitive algorithm that the agent could be running". Which seems very very complicated.

I guess this is what you meant by the "tremendous number" of possible decorators. But if that's what you need to do to keep talking about "utility functions", then it just seems better to acknowledge that they're broken as an abstraction.

E.g. in the case of python code, you wouldn't do anything analogous to this. You would just try to reason about all the possible python programs directly. Similarly, I want to reason about all the cognitive algorithms directly.

Replies from: Vaniver
comment by Vaniver · 2020-04-30T23:45:51.293Z · LW(p) · GW(p)

Which seems very very complicated.

That's right.

I realized my grandparent comment is unclear here:

but need a very complicated utility function to make a utility-maximizer that matches the behavior.

This should have been "consequence-desirability-maximizer" or something, since the whole question is "does my utility function have to be defined in terms of consequences, or can it be defined in terms of arbitrary propositions?". If I want to make the deontologist-approximating Innocent-Bot, I have a terrible time if I have to specify the consequences that correspond to the bot being innocent and the consequences that don't, but if you let me say "Utility = 0 - badness of sins committed" then I've constructed a 'simple' deontologist. (At least, about as simple as the bot that says "take random actions that aren't sins", since both of them need to import the sins library.)
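
A rough sketch of the point about both bots importing the same machinery (`badness_of_sins` stands in for the hypothetical sins library; the specific sins and scores are made up):

```python
import random

def badness_of_sins(action):
    """Stand-in for the hypothetical 'sins library': scores how sinful an action is."""
    return {"steal": 10.0, "lie": 5.0, "walk": 0.0, "wave": 0.0}.get(action, 0.0)

def innocent_bot_utility(action):
    # "Utility = 0 - badness of sins committed"
    return 0.0 - badness_of_sins(action)

def random_non_sinner(available_actions):
    # "take random actions that aren't sins" -- needs the same library.
    return random.choice([a for a in available_actions if badness_of_sins(a) == 0.0])
```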

In general, I think it makes sense to not allow this sort of elaboration of what we mean by utility functions, since the behavior we want to point to is the backwards assignment of desirability to actions based on the desirability of their expected consequences, rather than the expectation of any arbitrary property.

---

Actually, I also realized something about your original comment which I don't think I had the first time around; if by "some reasonable percentage of an agent's actions are random" you mean something like "the agent does epsilon-exploration" or "the agent plays an optimal mixed strategy", then I think it doesn't at all require a complicated utility function to generate identical behavior. Like, in the rock-paper-scissors world, and with the simple function 'utility = number of wins', the expected utility maximizing move (against tough competition) is to throw randomly, and we won't falsify the simple 'utility = number of wins' hypothesis by observing random actions.
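
A quick sanity check of the rock-paper-scissors claim, as a sketch (utility here just counts wins; everything else is the obvious setup):

```python
import itertools

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def expected_wins(my_dist, opp_dist):
    """Expected number of wins per round when both players use mixed strategies."""
    return sum(p * q
               for (m, p), (o, q) in itertools.product(my_dist.items(), opp_dist.items())
               if BEATS[m] == o)

uniform = {m: 1 / 3 for m in BEATS}
pure_rock = {"rock": 1.0, "paper": 0.0, "scissors": 0.0}

# Against the equilibrium (uniform) opponent, random play does exactly as well as any
# other strategy, so observing random throws doesn't falsify 'utility = number of wins'.
print(expected_wins(uniform, uniform))    # 0.333...
print(expected_wins(pure_rock, uniform))  # 0.333...
```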

Instead I read it as something like "some unreasonable percentage of an agent's actions are random": the agent is performing some simple-to-calculate mixed strategy that is either suboptimal or only optimal by luck (when the optimal mixed strategy happens to be the maxent strategy, for example). There, matching the behavior with an expected utility maximizer is a challenge, because your target has to be not some fact about the environment but some fact about the statistical properties of the actions taken by the agent.

---

I think this is where the original intuition becomes uncompelling. We care about utility-maximizers because they're doing their backwards assignment, using their predictions of the future to guide their present actions to try to shift the future to be more like what they want it to be. We don't necessarily care about imitators, or simple-to-write bots, or so on. And so if I read the original post as "the further a robot's behavior is from optimal, the less likely it is to demonstrate convergent instrumental goals", I say "yeah, sure, but I'm trying to build smart robots (or at least reasoning about what will happen if people try to)."

Replies from: ricraz
comment by Richard_Ngo (ricraz) · 2020-05-01T10:56:36.212Z · LW(p) · GW(p)

Instead I read it as something like "some unreasonable percentage of an agent's actions are random"

This is in fact the intended reading, sorry for ambiguity. Will edit. But note that there are probably very few situations where exploring via actual randomness is best; there will almost always be some type of exploration which is more favourable. So I don't think this helps.

We care about utility-maximizers because they're doing their backwards assignment, using their predictions of the future to guide their present actions to try to shift the future to be more like what they want it to be.

To be pedantic: we care about "consequence-desirability-maximisers" (or in Rohin's terminology, goal-directed agents) because they do backwards assignment. But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.

And so if I read the original post as "the further a robot's behavior is from optimal, the less likely it is to demonstrate convergent instrumental goals"

What do you mean by optimal here? The robot's observed behaviour will be optimal for some utility function, no matter how long you run it.
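
The trivial construction behind that claim, sketched in Python (the trace format is illustrative):

```python
def rationalizing_utility(observed_trace):
    """Returns a utility function under which the observed behaviour is optimal:
    it assigns utility 1 to exactly the action history the robot actually produced."""
    frozen = tuple(observed_trace)
    def utility(history):
        return 1.0 if tuple(history) == frozen else 0.0
    return utility

u = rationalizing_utility(["left", "left", "right"])
print(u(["left", "left", "right"]))  # 1.0 -- whatever the robot did comes out 'optimal'
print(u(["right", "left", "left"]))  # 0.0
```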

Replies from: Vaniver
comment by Vaniver · 2020-05-01T23:49:40.348Z · LW(p) · GW(p)

To be pedantic: we care about "consequence-desirability-maximisers" (or in Rohin's terminology, goal-directed agents) because they do backwards assignment.

Valid point.

But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.

This also seems right. Like, my understanding of what's going on here is we have:

  • 'central' consequence-desirability-maximizers, where there's a simple utility function that they're trying to maximize according to the VNM axioms
  • 'general' consequence-desirability-maximizers, where there's a complicated utility function that they're trying to maximize, which is selected because it imitates some other behavior

The first is a narrow class, and depending on how strict you are with 'maximize', quite possibly no physically real agents will fall into it. The second is a universal class, which instantiates the 'trivial claim' that everything is utility maximization.

Put another way, the first is what happens if you hold utility fixed / keep utility simple, and then examine what behavior follows; the second is what happens if you hold behavior fixed / keep behavior simple, and then examine what utility follows.

Distance from the first is what I mean by "the further a robot's behavior is from optimal"; I want to say that I should have said something like "VNM-optimal" but actually I think it needs to be closer to "simple utility VNM-optimal." 

I think you're basically right in calling out a bait-and-switch that sometimes happens: anyone who wants to talk about the universality of expected utility maximization in the trivial 'general' sense can't get it to do any work, because it should all add up to normality, and in normality there's a meaningful distinction between people who sort-of pursue fuzzy goals and people who are ruthless utility maximizers.