ricraz's Shortform

post by Richard_Ngo (ricraz) · 2020-04-26T10:42:18.494Z · score: 6 (1 votes) · LW · GW · 84 comments

Comments sorted by top scores.

comment by Richard_Ngo (ricraz) · 2020-08-20T14:06:57.832Z · score: 38 (18 votes) · LW(p) · GW(p)

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here. So far my best effort to make that argument has been in the comment thread starting here [LW(p) · GW(p)]. Looking back at that thread, I just noticed that a couple [LW(p) · GW(p)] of those comments [LW(p) · GW(p)] have been downvoted to negative karma. I don't think any of my comments have ever hit negative karma before; I find it particularly sad that the one time it happens is when I'm trying to explain why I think this community is failing at its key goal of cultivating better epistemics.

There's all sorts of arguments to be made here, which I don't have time to lay out in detail. But just step back for a moment. Tens or hundreds of thousands of academics are trying to figure out how the world works, spending their careers putting immense effort into reading and producing and reviewing papers. Even then, there's a massive replication crisis. And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

comment by Ben Pace (Benito) · 2020-08-20T20:00:42.082Z · score: 10 (9 votes) · LW(p) · GW(p)

The top posts in the 2018 Review [LW · GW] are filled with fascinating and well-explained ideas. Many of the new ideas are not settled science, but they're quite original and substantive, or excellent distillations of settled science, and are often the best piece of writing on the internet about their topics.

You're wrong about LW epistemic standards not being high enough to make solid intellectual progress; we already have. On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong, and I think a lot of them are brilliant.

I'm not saying we can't do far better, or that we're sufficiently good. Many of the examples of success so far are "Things that were in people's heads but didn't have a natural audience to share them with". There's not a lot of collaboration at present, which is why I'm very keen to build the new LessWrong Docs that allows for better draft sharing and inline comments and more. We're working on the tools for editing tags, things like edit histories and so on, that will allow us to build a functioning wiki system to have canonical writeups and explanations that people add to and refine. I want future iterations of the LW Review to have more allowance for incorporating feedback from reviewers. There's lots of work to do, and we're just getting started. But I disagree that the direction isn't "a desperate effort to find the truth". That's what I'm here for.

Even in the last month or two, how do you look at things like this [LW · GW] and this [LW · GW] and this [LW · GW] and this [LW · GW] and not think that they're likely the best publicly available pieces of writing in the world about their subjects? Wrt rationality, I expect things like this [LW · GW] and this [LW · GW] and this [LW · GW] and this [LW · GW] will probably go down as historically important LW posts that helped us understand the world, and make a strong showing in the 2020 LW Review.

comment by Richard_Ngo (ricraz) · 2020-08-21T06:28:12.725Z · score: 11 (5 votes) · LW(p) · GW(p)

As mentioned in my reply to Ruby, this is not a critique of the LW team, but of the LW mentality. And I should have phrased my point more carefully - "epistemic standards are too low to make any progress" is clearly too strong a claim, it's more like "epistemic standards are low enough that they're an important bottleneck to progress". But I do think there's a substantive disagreement here. Perhaps the best way to spell it out is to look at the posts you linked and see why I'm less excited about them than you are.

Taking the top posts in the 2018 Review and the ones you linked (excluding AI), I'd categorise them as follows:

Interesting speculation about psychology and society, where I have no way of knowing if it's true:

  • Local Validity as a Key to Sanity and Civilization
  • The Loudest Alarm Is Probably False
  • Anti-social punishment (which is, unlike the others, at least based on one (1) study).
  • Babble
  • Intelligent social web
  • Unrolling social metacognition
  • Simulacra levels
  • Can you keep this secret?

Same as above but it's by Scott so it's a bit more rigorous and much more compelling:

  • Is Science Slowing Down?
  • The tails coming apart as a metaphor for life

Useful rationality content:

  • Toolbox-thinking and law-thinking
  • A sketch of good communication
  • Varieties of argumentative experience

Review of basic content from other fields. This seems useful for informing people on LW, but not actually indicative of intellectual progress unless we can build on them to write similar posts on things that *aren't* basic content in other fields:

  • Voting theory primer
  • Prediction markets: when do they work
  • Costly coordination mechanism of common knowledge (Note: I originally said I hadn't seen many examples of people building on these ideas, but at least for this post there seems to be a lot.)
  • Six economics misconceptions
  • Swiss political system

It's pretty striking to me how much the original sequences drew on the best academic knowledge, and how little most of the things above draw on the best academic knowledge. And there's nothing even close to the thoroughness of Luke's literature reviews.

The three things I'd like to see more of are:

1. The move of saying "Ah, this is interesting speculation about a complex topic. It seems compelling, but I don't have good ways of verifying it; I'll treat it like a plausible hypothesis which could be explored more by further work." (I interpret the thread I originally linked [LW(p) · GW(p)] as me urging Wei to do this).

2. Actually doing that follow-up work. If it's an empirical hypothesis, investigating empirically. If it's a psychological hypothesis, does it apply to anyone who's not you? If it's more of a philosophical hypothesis, can you identify the underlying assumptions and the ways it might be wrong? In all cases, how does it fit into existing thought? (That'll probably take much more than a single blog post).

3. Insofar as many of these scattered plausible insights are actually related in deep ways, trying to combine them so that the next generation of LW readers doesn't have to separately learn about each of them, but can rather download a unified generative framework.

comment by Ben Pace (Benito) · 2020-08-22T03:40:34.339Z · score: 11 (6 votes) · LW(p) · GW(p)

Quoting your reply to Ruby below, I agree I'd like LessWrong to be much better at "being able to reliably produce and build on good ideas". 

The reliability and focus feel most lacking to me on the building side, rather than the production side, which I think we're doing quite well at. I think we've successfully formed a publishing platform that provides an audience intensely interested in good ideas around rationality, AI, and related subjects, and a lot of very generative and thoughtful people are writing down their ideas here.

We're low on the ability to connect people up to do more extensive work on these ideas – most good hypotheses and arguments don't get a great deal of follow up or further discussion.

Here are some subjects where I think there have been various people sharing substantive perspectives, but where I think there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered:

The above isn't complete; it's just some of the subjects that come to mind as having lots of people sharing perspectives. And the list of people definitely isn't complete.

Here are examples of things that I'd like to see more of, that feel more like doing the legwork to actually dive into the details:

  • Eli Tyre and Bucky replicating Scott's birth-order hypothesis
  • Katja and the other fine people at AI Impacts doing long-term research on a question (discontinuous progress) with lots of historical datapoints
  • Jameson writing up his whole research question in great detail and very well, and then an excellent commenter turning up and answering it
  • Zhukeepa writing up an explanation of Paul's research, allowing many more to understand it, and allowing Eliezer to write a response
  • Scott writing Goodhart Taxonomy, and the commenters banding together to find a set of four similar examples to add to the post
  • Val writing some interesting things about insight meditation, prompting Kaj to write a non-mysterious explanation
  • In the LW Review when Bucky checked out the paper Zvi analysed and argued it did not support the conclusions Zvi reached (this changed my opinion of Zvi's post from 'true' to 'false')
  • The discussion around covid and EMH prompting Richard Meadows to write down a lot of the crucial and core arguments around the EMH

The above is also not mentioning lots of times when the person generating the idea does a lot of the legwork, like Scott or Jameson or Sarah or someone.

I see a lot of (very high quality) raw energy here that wants shaping and directing, with the use of lots of tools for coordination (e.g. better collaboration tools).

The epistemic standards being low is one way of putting it, but it doesn't resonate with me much and kinda feels misleading. I think our epistemic standards are way higher than the communities you mention (historians, people interested in progress studies). Bryan Caplan said he knows of no group whose beliefs are more likely to be right in general than the rationalists; this seems often accurate to me. I think we do a lot of exploration and generation and evaluation, just not in a very coordinated manner, and so could make progress at like 10x–100x the rate if we collaborated better, and I think we can get there without too much work.

comment by Richard_Ngo (ricraz) · 2020-08-22T06:28:38.901Z · score: 9 (4 votes) · LW(p) · GW(p)

"I see a lot of (very high quality) raw energy here that wants shaping and directing, with the use of lots of tools for coordination (e.g. better collaboration tools)."

Yepp, I agree with this. I guess our main disagreement is whether the "low epistemic standards" framing is a useful way to shape that energy. I think it is, because it'll push people towards realising how little evidence they actually have for many plausible-seeming hypotheses on this website. One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

When you say "there's also a lot of space for more 'details' to get fleshed out and subquestions to be cleanly answered", I find myself expecting that this will involve people who believe the hypothesis continuing to build their castle in the sky, not analysis about why it might be wrong and why it's not.

That being said, LW is very good at producing "fake frameworks". So I don't want to discourage this too much. I'm just arguing that this is a different thing from building robust knowledge about the world.

comment by Ben Pace (Benito) · 2020-08-23T02:19:05.125Z · score: 5 (4 votes) · LW(p) · GW(p)

One proven claim is worth a dozen compelling hypotheses

I will continue to be contrary and say I'm not sure I agree with this.

For one, I think in many domains new ideas are really hard to come by, as opposed to making minor progress in the existing paradigms. Fundamental theories in physics, a bunch of general insights about intelligence (in neuroscience and AI), etc.

And secondly, I am reminded of what Lukeprog wrote in his moral consciousness report: that he wished the various different philosophies-of-consciousness would stop debating each other, go away for a few decades, then come back with falsifiable predictions. I sometimes take this stance regarding many disagreements of import, such as the basic science vs engineering approaches to AI alignment. It's not obvious to me that the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours; it may instead be for them to go away and work on their ideas for a decade, then come back with lots of fleshed-out details and results that can be more meaningfully debated.

I feel similarly about simulacra levels, Embedded Agency, and a bunch of IFS stuff. I would like to see more experimentation and literature reviews where they make sense, but I also feel like these are implicitly making substantive and interesting claims about the world, and I'd just be interested in getting a better sense of what claims they're making, and have them fleshed out + operationalized more. That would be a lot of progress to me, and I think each of them is seeing that sort of work (with Zvi, Abram, and Kaj respectively leading the charges on LW, alongside many others).

comment by Raemon · 2020-08-23T05:02:22.381Z · score: 10 (6 votes) · LW(p) · GW(p)

I think I'm concretely worried that some of those models / paradigms (and some other ones on LW) don't seem pointed in a direction that leads obviously to "make falsifiable predictions."

And I can imagine worlds where "make falsifiable predictions" isn't the right next step, you need to play around with it more and get it fleshed out in your head before you can do that. But there is at least some writing on LW that feels to me like it leaps from "come up with an interesting idea" to "try to persuade people it's correct" without enough checking.

(In the case of IFS, I think Kaj's sequence is doing a great job of laying it out in a concrete way where it can then be meaningfully disagreed with. But the other people who've been playing around with IFS didn't really seem interested in that, and I feel like we got lucky that Kaj had the time and interest to do so.)

comment by Richard_Ngo (ricraz) · 2020-08-23T06:28:00.026Z · score: 7 (3 votes) · LW(p) · GW(p)

I feel like this comment isn't critiquing a position I actually hold. For example, I don't believe that "the correct next move is for e.g. Eliezer and Paul to debate for 1000 hours". I am happy for people to work towards building evidence for their hypotheses in many ways, including fleshing out details, engaging with existing literature, experimentation, and operationalisation.

Perhaps this makes "proven claim" a misleading phrase to use. Perhaps more accurate to say: "one fully fleshed out theory is more valuable than a dozen intuitively compelling ideas". But having said that, I doubt that it's possible to fully flesh out a theory like simulacra levels without engaging with a bunch of academic literature and then making predictions.

I also agree with Raemon's response below.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T05:27:15.969Z · score: 2 (1 votes) · LW(p) · GW(p)

One proven claim is worth a dozen compelling hypotheses, but LW to a first approximation only produces the latter.

Depends on the claim, right?

If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.
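
A toy sketch of that intuition (my own illustration, not something from the thread; the Gaussian noise model and all numbers are arbitrary assumptions): with a fixed budget of expensive evaluations, generating a large pool of cheap hypotheses and using a cheap, noisy screen to choose where to spend that budget tends to surface a better hypothesis than evaluating each idea as soon as it is generated.

```python
# Toy comparison: evaluate-as-you-go vs. generate-many-then-select.
# Each hypothesis has a hidden quality; generation and a noisy "plausibility
# screen" are cheap, while rigorous evaluation is expensive and budget-limited.
import random

random.seed(0)

def generate_hypothesis():
    """Cheap step: returns (true_quality, noisy_screen_score)."""
    quality = random.gauss(0, 1)
    screen = quality + random.gauss(0, 1)   # cheap but noisy judgement
    return quality, screen

def best_found(n_generated, eval_budget):
    """Generate n_generated hypotheses, rank them by the cheap screen,
    then spend the expensive evaluations only on the top-ranked ones."""
    pool = sorted((generate_hypothesis() for _ in range(n_generated)),
                  key=lambda h: h[1], reverse=True)
    evaluated = pool[:eval_budget]           # the expensive step
    return max(quality for quality, _ in evaluated)

trials, eval_budget = 2000, 3
narrow = sum(best_found(eval_budget, eval_budget) for _ in range(trials)) / trials
broad = sum(best_found(50, eval_budget) for _ in range(trials)) / trials
print(f"evaluate-as-you-go: {narrow:.2f}   generate-50-then-select: {broad:.2f}")
```

In this toy setup the broad strategy reliably surfaces a better hypothesis; whether that transfers to real research questions is, of course, part of what's being debated above.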

comment by Ben Pace (Benito) · 2020-08-23T02:18:58.082Z · score: 1 (3 votes) · LW(p) · GW(p)

Yepp, I agree with this. I guess our main disagreement is whether the "low epistemic standards" framing is a useful way to shape that energy. I think it is because it'll push people towards realising how little evidence they actually have for many plausible-seeming hypotheses on this website.

A housemate of mine said to me they think LW has a lot of breadth, but could benefit from more depth. 

I think in general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science"), but that our level of coordination and depth is often low. "LessWrongers should collaborate more and go into more depth in fleshing out their ideas" sounds more true to me than "LessWrongers have very low epistemic standards".

comment by Richard_Ngo (ricraz) · 2020-08-23T06:18:40.815Z · score: 19 (7 votes) · LW(p) · GW(p)

In general when we do intellectual work we have excellent epistemic standards, capable of listening to all sorts of evidence that other communities and fields would throw out, and listening to subtler evidence than most scientists ("faster than science")

"Being more openminded about what evidence to listen to" seems like a way in which we have lower epistemic standards than scientists, and also that's beneficial. It doesn't rebut my claim that there are some ways in which we have lower epistemic standards than many academic communities, and that's harmful.

In particular, the relevant question for me is: why doesn't LW have more depth? Sure, more depth requires more work, but on the timeframe of several years, and hundreds or thousands of contributors, it seems viable. And I'm proposing, as a hypothesis, that LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T05:41:09.199Z · score: 6 (3 votes) · LW(p) · GW(p)

LW doesn't have enough depth because people don't care enough about depth - they're willing to accept ideas even before they've been explored in depth. If this explanation is correct, then it seems accurate to call it a problem with our epistemic standards - specifically, the standard of requiring (and rewarding) deep investigation and scholarship.

Your solution to the "willingness to accept ideas even before they've been explored in depth" problem is to explore ideas in more depth. But another solution is to accept fewer ideas, or hold them much more provisionally.

I'm a proponent of the second approach because:

  • I suspect even academia doesn't hold ideas as provisionally as it should. See Hamming on expertise: https://forum.effectivealtruism.org/posts/mG6mckPHAisEbtKv5/should-you-familiarize-yourself-with-the-literature-before?commentId=SaXXQXLfQBwJc9ZaK [EA(p) · GW(p)]

  • I suspect trying to browbeat people to explore ideas in more depth works against the grain of an online forum as an institution. Browbeating works in academia because your career is at stake, but in an online forum, it just hurts intrinsic motivation and cuts down on forum use (the forum runs on what Clay Shirky called "cognitive surplus", essentially a term for peoples' spare time and motivation). I'd say one big problem with LW 1.0 that LW 2.0 had to solve before flourishing was people felt too browbeaten to post much of anything.

If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive.

Maybe part of the issue is that on LW, peer review generally happens in the comments after you publish, not before. So there's no publication carrot to offer in exchange for overcoming the objections of peer reviewers.

comment by Richard_Ngo (ricraz) · 2020-08-26T06:39:58.208Z · score: 4 (2 votes) · LW(p) · GW(p)

"If we accept fewer ideas / hold them much more provisionally, but provide a clear path to having an idea be widely held as true, that creates an incentive for people to try & jump through hoops--and this incentive is a positive one, not a punishment-driven browbeating incentive."

Hmm, it sounds like we agree on the solution but are emphasising different parts of it. For me, the question is: who's this "we" that should accept fewer ideas? It's the set of people who agree with my argument that you shouldn't believe things which haven't been fleshed out very much. But the easiest way to add people to that set is just to make the argument, which is what I've done. Specifically, note that I'm not criticising anyone for producing posts that are short and speculative: I'm criticising the people who update too much on those posts.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T08:10:34.050Z · score: 4 (2 votes) · LW(p) · GW(p)

Fair enough. I'm reminded of a time someone summarized one of my posts as being a definitive argument against some idea X and me thinking to myself "even I don't think my post definitively settles this issue" haha.

comment by Raemon · 2020-08-26T05:56:26.144Z · score: 2 (1 votes) · LW(p) · GW(p)

Yeah, this is roughly how I think about it.

I do think right now LessWrong should lean more in the direction Richard is suggesting – I think it was essential to establish better Babble procedures, but now we're doing well enough on that front that I think setting clearer expectations of how the eventual pruning works is reasonable.

comment by Richard_Ngo (ricraz) · 2020-08-26T06:53:44.830Z · score: 4 (3 votes) · LW(p) · GW(p)

I wanted to register that I don't like "babble and prune" as a model of intellectual development. I think intellectual development actually looks more like:

1. Babble

2. Prune

3. Extensive scholarship

4. More pruning

5. Distilling scholarship to form common knowledge

And that my main criticism is the lack of 3 and 5, not the lack of 2 or 4.

I also note that: a) these steps get monotonically harder, so that focusing on the first two misses *almost all* the work; b) maybe I'm being too harsh on the babble and prune framework because it's so thematically appropriate for me to dunk on it here; I'm not sure if your use of the terminology actually reveals a substantive disagreement.

comment by Raemon · 2020-08-27T05:09:21.433Z · score: 2 (1 votes) · LW(p) · GW(p)

I basically agree with your 5-step model (I at least agree it's a more accurate description than Babble and Prune, which I just meant as rough shorthand). I'd add things like "original research/empiricism" or "more rigorous theorizing" to the "Extensive Scholarship" step.

I see the LW Review as basically the first of (what I agree should essentially be at least) a 5-step process. It's adding a stronger Step 2, and a bit of Step 5 (at least some people chose to rewrite their posts to be clearer and respond to criticism).

...

Currently, we do get non-zero Extensive Scholarship and Original Empiricism. (Kaj's Multi-Agent Models of Mind [? · GW] seems like it includes real scholarship. Scott Alexander / Eli Tyre and Bucky's exploration into Birth Order Effects seemed like real empiricism). Not nearly as much as I'd like.

But John's comment elsethread [? · GW] seems significant:

If the cost of evaluating a hypothesis is high, and hypotheses are cheap to generate, I would like to generate a great deal before selecting one to evaluate.

This reminded me of a couple of posts in the 2018 Review, Local Validity as a Key to Sanity and Civilization [LW · GW], and Is Clickbait Destroying Our General Intelligence? [LW · GW]. Both of those seemed like "sure, interesting hypothesis. Is it real tho?"

During the Review I created a followup "How would we check if Mathematicians are Generally More Law Abiding? [LW · GW]" question, trying to move the question from Stage 2 to 3. I didn't get much serious response, probably because, well, it was a much harder question.

But, honestly... I'm not sure it's actually a question that was worth asking. I'd like to know if Eliezer's hypothesis about mathematicians is true, but I'm not sure it ranks near the top of questions I'd want people to put serious effort into answering. 

I do want LessWrong to be able to follow up Good Hypotheses with Actual Research, but it's not obvious which questions are worth answering. OpenPhil et al are paying for some types of answers, I think usually by hiring researchers full time. It's not quite clear what the right role is for LW to play in the ecosystem.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-26T11:52:22.090Z · score: 2 (1 votes) · LW(p) · GW(p)
  1. All else equal, the harder something is, the less we should do it.

  2. My quick take is that writing lit reviews/textbooks is a comparative disadvantage of LW relative to the mainstream academic establishment.

In terms of producing reliable knowledge... if people actually care about whether something is true, they can always offer a cash prize for the best counterargument (which could of course constitute citation of academic research). The fact that people aren't doing this suggests to me that for most claims on LW, there isn't any (reasonably rich) person who cares deeply re: whether the claim is true. I'm a little wary of putting a lot of effort into supply if there is an absence of demand.

(I guess the counterargument is that accurate knowledge is a public good, so an individual's willingness to pay doesn't get you a complete picture of the value accurate knowledge brings. Maybe what we need is a way to crowdfund bounties for the best argument related to something.)

(I agree that LW authors would ideally engage more with each other and academic literature on the margin.)

comment by AllAmericanBreakfast · 2020-08-26T16:16:05.679Z · score: 4 (2 votes) · LW(p) · GW(p)

I’ve been thinking about the idea of “social rationality” lately, and this is related. We do so much here in the way of training individual rationality - the inputs, functions, and outputs of a single human mind. But if truth is a product, then getting human minds well-coordinated to produce it might be much more important than training them to be individually stronger. Just as assembly line production is much more effective in producing almost anything than teaching each worker to be faster in assembling a complete product by themselves.

My guess is that this could be effective not only in producing useful products, but also in overcoming biases. Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

Of course, one of the reasons we don't do that so much is that coordination is an up-front investment and is unfamiliar. Figuring out social technology to make it easier to participate in might be a great project for LW.

comment by John_Maxwell (John_Maxwell_IV) · 2020-08-27T04:36:06.301Z · score: 9 (4 votes) · LW(p) · GW(p)

There's been a fair amount of discussion of that sort of thing here: https://www.lesswrong.com/tag/group-rationality [? · GW] There are also groups outside LW thinking about social technology such as RadicalxChange.

Imagine you took 5 separate LWers and asked them to create a unified consensus response to a given article. My guess is that they’d learn more through that collective effort, and produce a more useful response, than if they spent the same amount of time individually evaluating the article and posting their separate replies.

I'm not sure. If you put those 5 LWers together, I think there's a good chance that the highest status person speaks first and then the others anchor on what they say and then it effectively ends up being like a group project for school with the highest status person in charge. Some [LW · GW] related links [LW(p) · GW(p)].

comment by AllAmericanBreakfast · 2020-08-27T14:17:45.925Z · score: 3 (2 votes) · LW(p) · GW(p)

That’s definitely a concern too! I imagine such groups forming among people who already share a basic common view and collaborate to investigate it more deeply. That way, any status-anchoring effects are mitigated.

Alternatively, it could be an adversarial collaboration. For me personally, some of the SSC essays in this format have led me to change my mind in a lasting way.

comment by curi · 2020-09-03T22:32:10.667Z · score: 2 (5 votes) · LW(p) · GW(p)

they're willing to accept ideas even before they've been explored in depth

People also reject ideas before they've been explored in depth. I've tried to discuss similar issues with LW [LW · GW] before but the basic response was roughly "we like chaos where no one pays attention to whether an argument has ever been answered by anyone; we all just do our own thing with no attempt at comprehensiveness or organizing who does what; having organized leadership of any sort, or anyone who is responsible for anything, would be irrational" (plus some suggestions that I'm low social status and that therefore I personally deserve to be ignored. There were also suggestions – phrased rather differently but amounting to this – that LW will listen more if published ideas are rewritten, not to improve on any flaws, but so that the new versions can be published at LW before anywhere else, because the LW community's attention allocation is highly biased towards that).

comment by Ben Pace (Benito) · 2020-08-23T06:48:39.182Z · score: 2 (1 votes) · LW(p) · GW(p)

I feel somewhat inclined to wrap up this thread at some point, even while there's more to say. We can continue if you like and have something specific or strong you'd like to ask, but otherwise will pause here.

comment by TAG · 2020-08-23T10:49:01.224Z · score: 1 (1 votes) · LW(p) · GW(p)

why doesn’t LW have more depth?

You have to realise that what you are doing isn't adequate in order to gain the motivation to do it better, and that is unlikely to happen if you are mostly communicating with other people who think everything is OK.

comment by TAG · 2020-08-23T10:44:02.903Z · score: 3 (2 votes) · LW(p) · GW(p)

LessWrong is competing against philosophy as well as science, and philosophy has broader criteria of evidence still. In fact, LessWrongians are often frustrated that mainstream philosophy takes such topics as dualism or theism seriously, even though there's an abundance of Bayesian evidence for them.

comment by Ruby · 2020-08-21T07:57:52.321Z · score: 9 (5 votes) · LW(p) · GW(p)

(Thanks for laying out your position in this level of depth. Sorry for how long this comment turned out. I guess I wanted to back up a bunch of my agreement with words. It's a comment for the sake of everyone else, not just you.)

I think there's something to what you're saying, that the mentality itself could be better. The Sequences have been criticized because Eliezer didn't cite previous thinkers all that much, but at least as far as the science goes, as you said, he was drawing on academic knowledge. I also think we've lost something precious with the absence of epic topic reviews by the likes of Luke. Kaj Sotala still brings in heavily from outside knowledge, John Wentworth did a great review on Biological Circuits, and we get SSC crossposts that have that, but otherwise posts aren't heavily referencing or building upon outside stuff. I concede that I would like to see a lot more of that.

I think Kaj was rightly disappointed that he didn't get more engagement with his post whose gist was "this is what the science really says about S1 & S2, one of your most cherished concepts, LW community".

I wouldn't say the typical approach is strictly bad; there's value in thinking freshly for oneself, and failure to reference previous material shouldn't be a crime or make a text unworthy. But yeah, it'd be pretty cool if after Alkjash laid out Babble & Prune (which intuitively feels so correct), someone had dug through what empirical science we have to see whether the picture lines up. Or heck, actually gone and done some kind of experiment. I bet it would turn up something interesting.

And I think what you're saying is that the issue isn't just that people aren't following up with scholarship and empiricism on new ideas and models, but that they're actually forgetting that these are the next steps. Instead, they're overconfident in our homegrown models, as though LessWrong were the one place able to come up with good ideas. (Sorry, some of this might be my own words.) 

The category I'd label a lot of LessWrong posts with is "engaging articulation of a point which is intuitive in hindsight" / "creation of common vocabulary around such points". That's pretty valuable, but I do think solving the hardest problems will take more.

-----

You use the word "reliably" in a few places. It feels like it's doing some work in your statements, and I'm not entirely sure what you mean or why it's important.

-----

A model which is interesting, though its relevance here may not be obvious. I was speaking to a respected rationalist thinker this week and they classified potential writing on LessWrong into three categories:

  1. Writing stuff to help oneself figure things out. Like a diary, but publicly shared.
  2. People exchanging "letters" as they attempt to figure things out. Like old school academic journals.
  3. Someone having something mostly figured out but with a large inferential distance to bridge. They write a large collection of posts trying to cover that distance. One example is The Sequences, and more recent examples are from John Wentworth and Kaj Sotala.

I mention this because I recall you (alongside the rationalist thinker) complaining about the lack of people "presenting their worldviews on LessWrong".

The kinds of epistemic norms I think you're advocating for feel like a natural fit for the 2nd kind of writing, but it's less clear to me how they should apply to people presenting worldviews. Maybe it's not more complicated than: it's fine to present your worldview without a tonne of evidence, but people shouldn't forget that the evidence hasn't been presented, and the worldview feeling intuitively correct isn't enough.

-----

There's something in here about Epistemic Modesty, something, something. Some part of me reads you as calling for more of that, which I'm wary of, but I don't currently have more to say than flagging it as maybe a relevant variable in any disagreements here.

We probably do disagree about the value of academic sources, or what it takes to get value from them. Hmm. Maybe it's something like: there's something to be said for thinking about models and assessing their plausibility yourself rather than relying on likely very flawed empirical studies.

Maybe I'm in favor of large careful reviews of what science knows but less in favor of trying to find sources for each idea or model that gets raised. I'm not sure.

-----

I can't recall whether I've written publicly much about this, but a model I've had for a year or more is that for LW to make intellectual progress, we need to become a "community of practice", not just a "community of interest". Martial arts vs literal stamp collecting. (Streetfighting might be better still due to actually testing real fighting ability.) It's great that many people find LessWrong a guilty pleasure they feel less guilty about than Facebook, but for us to make progress, people need to see LessWrong as a place where one of the things you do is show up and do Serious Work, some of which is relatively hard and boring, like writing and reading lit reviews.

I suspect that a cap on the epistemic standards people hold stuff to is downstream of the level of effort people are calibrated on applying. But maybe it goes in the other direction, so I don't know.

Probably the 2018 Review is biased towards the posts which are most widely read, i.e., those easiest and most enjoyable to read, rather than solely rewarding those with the best contributions. Not overwhelmingly, but enough. Maybe same for karma. I'm not sure how to relate to that.

-----

3. Insofar as many of these scattered plausible insights are actually related in deep ways, trying to combine them so that the next generation of LW readers doesn't have to separately learn about each of them, but can rather download a unified generative framework.

This sounds partially like distillation work plus extra integration. And sounds pretty good to me too.


-----

I still remember my feeling of disillusionment with the LessWrong community relatively soon after I joined in late 2012. I realized that the bulk of members didn't seem serious about advancing the Art. I never heard people discussing new results from cognitive science and how to apply them, even though that's what the Sequences were in large part, and the Sequences hardly claimed to be complete! I guess I do relate somewhat to your "desperate effort" comment, though we've got some people trying pretty hard that I wouldn't want to short change.

We do good stuff, but more is possible [LW · GW]. I appreciate the reminder. I hope we succeed at pushing the culture and mentality in directions you like.

comment by drossbucket · 2020-08-23T08:44:46.917Z · score: 21 (8 votes) · LW(p) · GW(p)

This is only tangentially relevant, but adding it here as some of you might find it interesting:

Venkatesh Rao has an excellent Twitter thread on why most independent research only reaches this kind of initial exploratory level (he tried it for a bit before moving to consulting). It's pretty pessimistic, but there is a somewhat more optimistic follow-up thread on potential new funding models. Key point is that the later stages are just really effortful and time-consuming, in a way that keeps out a lot of people trying to do this as a side project alongside a separate main job (which I think is the case for a lot of LW contributors?)

Quote from that thread:

Research =

a) long time between having an idea and having something to show for it that even the most sympathetic fellow crackpot would appreciate (not even pay for, just get)

b) a >10:1 ratio of background invisible thinking in notes, dead-ends, eliminating options etc

With a blogpost, it’s like a week of effort at most from idea to mvp, and at most a 3:1 ratio of invisible to visible. That’s sustainable as a hobby/side thing.

To do research-grade thinking you basically have to be independently wealthy and accept 90% deadweight losses

Also just wanted to say good luck! I'm a relative outsider here with pretty different interests to LW core topics but I do appreciate people trying to do serious work outside academia, have been trying to do this myself, and have thought a fair bit about what's currently missing (I wrote that in a kind of jokey style but I'm serious about the topic).

comment by Richard_Ngo (ricraz) · 2020-08-23T10:05:22.149Z · score: 5 (2 votes) · LW(p) · GW(p)

Also, I liked your blog post! More generally, I strongly encourage bloggers to have a "best of" page, or something that directs people to good posts. I'd be keen to read more of your posts but have no idea where to start.

comment by drossbucket · 2020-08-23T10:48:43.543Z · score: 6 (4 votes) · LW(p) · GW(p)

Thanks! I have been meaning to add a 'start here' page for a while, so that's good to have the extra push :) Seems particularly worthwhile in my case because a) there's no one clear theme and b) I've been trying a lot of low-quality experimental posts this year bc pandemic trashed motivation, so recent posts are not really reflective of my normal output.

For now some of my better posts in the last couple of years might be Cognitive decoupling and banana phones (tracing back the original precursor of Stanovich's idea), The middle distance (a writeup of a useful and somewhat obscure idea from Brian Cantwell Smith's On the Origin of Objects), and the negative probability post and its followup.

comment by Richard_Ngo (ricraz) · 2020-08-23T09:44:15.112Z · score: 3 (2 votes) · LW(p) · GW(p)

Thanks, these links seem great! I think this is a good (if slightly harsh) way of making a similar point to mine:

"I find that autodidacts who haven’t experienced institutional R&D environments have a self-congratulatory low threshold for what they count as research. It’s a bit like vanity publishing or fan fiction. This mismatch doesn’t exist as much in indie art, consulting, game dev etc"

comment by DanielFilan · 2020-08-21T06:34:31.394Z · score: 2 (1 votes) · LW(p) · GW(p)

As mentioned in this comment [LW(p) · GW(p)], the Unrolling social metacognition paper is closely related to at least one research paper.

comment by Richard_Ngo (ricraz) · 2020-08-21T06:58:31.031Z · score: 5 (3 votes) · LW(p) · GW(p)

Right, but this isn't mentioned in the post? Which seems odd. Maybe that's actually another example of the "LW mentality": why wasn't the solid empirical research into 3 layers not being enough considered important enough to mention in a post on why 3 layers isn't enough? (Maybe because the post was time-boxed? If so that seems reasonable, but then I would hope that people comment saying "Here's a very relevant paper, why didn't you cite it?")

comment by Zachary Robertson (zachary-robertson) · 2020-08-20T23:13:56.178Z · score: 7 (4 votes) · LW(p) · GW(p)

On AI alone (which I am using in large part because there's vaguely more consensus around it than around rationality), I think you wouldn't have seen almost any of the public write-ups (like Embedded Agency and Zhukeepa's Paul FAQ) without LessWrong

I think a distinction should be made between intellectual progress (whatever that is) and distillation. I know lots of websites that do amazing distillation of AI related concepts (literally distill.pub). I think most people would agree that sort of work is important in order to make intellectual progress, but I also think significantly fewer people would agree distillation is intellectual progress. Having this distinction in mind, I think your examples from AI are not as convincing. Perhaps more so once you consider that Less Wrong is often being used more as a platform to share these distillations than to create them.

I think you're right that Less Wrong has some truly amazing content. However, once again, it seems a lot of these posts are not inherently from the ecosystem but are rather essentially cross-posted. If I say a lot of the content on LW is low-quality it's mostly an observation about what I expect to find from material that builds on itself. The quality of LW-style accumulated knowledge seems lower than it could be.

On a personal note, I've actively tried to explore using this site as a way to engage with research and have come to a similar opinion as Richard. The most obvious barrier is the separation between LW and AIAF. Effectively, if you're doing AI safety research, to second-order approximation you can block LW (noise) and only look at AIAF (signal). I say to second-order because anything from LW that is signal ends up being posted on AIAF anyway which means the method is somewhat error-tolerant.

This probably comes off as a bit pessimistic. Here's a concrete proposal I hope to try out soon enough. Pick a research question. Get a small group of people/friends together. Start talking about the problem and then posting on LW. Iterate until there's group consensus.

comment by Ben Pace (Benito) · 2020-08-21T02:21:54.546Z · score: 11 (4 votes) · LW(p) · GW(p)

Much of the same is true of scientific journals. Creating a place to share and publish research is a pretty key piece of intellectual infrastructure, especially for researchers to create artifacts of their thinking along the way. 

The point about being 'cross-posted' is where I disagree the most. 

This is largely original content that counterfactually wouldn't have been published, or occasionally would have been published but to a much smaller audience. What Failure Looks Like wasn't crossposted, Anna's piece on reality-revealing puzzles wasn't crossposted. I think that Zvi would have still written some on mazes and simulacra, but I imagine he writes substantially more content given the cross-posting available for the LW audience. Could perhaps check his blogging frequency over the last few years to see if that tracks. I recall Zhu telling me he wrote his FAQ because LW offered an audience for it, and likely wouldn't have done so otherwise. I love everything Abram writes, and while he did have the Intelligent Agent Foundations Forum, it had a much more concise, technical style, tiny audience, and didn't have the conversational explanations and stories and cartoons that have been so excellent and well received on LW, and it wouldn't as much have been focused on the implications for rationality of things like logical inductors. Rohin wouldn't have written his coherence theorems piece or any of his value learning sequence, and I'm pretty sure about that because I personally asked him to write that sequence, which is a great resource and I've seen other researchers in the field physically print off to write on and study. Kaj has an excellent series of non-mystical explanations of ideas from insight meditation that started as a response to things Val wrote, and I imagine those wouldn't have been written quite like that if that context did not exist on LW.

I could keep going, but probably have made the point. It seems weird to not call this collectively a substantial amount of intellectual progress, on a lot of important questions.

I am indeed focusing right now on how to do more 'conversation'. I'm in the middle of trying to host some public double cruxes for events, for example, and some day we will finally have inline commenting and better draft sharing and so on. It's obviously not finished.

comment by rohinmshah · 2020-09-02T16:13:36.039Z · score: 5 (3 votes) · LW(p) · GW(p)

Rohin wouldn't have written his coherence theorems piece or any of his value learning sequence, and I'm pretty sure about that because I personally asked him to write that sequence

Yeah, that's true, though it might have happened at some later point in the future as I got increasingly frustrated by people continuing to cite VNM at me (though probably it would have been a blog post and not a full sequence).

Reading through this comment tree, I feel like there's a distinction to be made between "LW / AIAF as a platform that aggregates readership and provides better incentives for blogging", and "the intellectual progress caused by posts on LW / AIAF". The former seems like a clear and large positive of LW / AIAF, which I think Richard would agree with. For the latter, I tend to agree with Richard, though perhaps not as strongly as he does. Maybe I'd put it as, I only really expect intellectual progress from a few people who work on problems full time who probably would have done similar-ish work if not for LW / AIAF (but likely would not have made it public).

I'd say this mostly for the AI posts. I do read the rationality posts and don't get a different impression from them, but I also don't think enough about them to be confident in my opinions there.

comment by Ben Pace (Benito) · 2020-08-20T23:23:50.900Z · score: 3 (2 votes) · LW(p) · GW(p)

 By "AN" do you mean the AI Alignment Forum, or "AIAF"?

comment by Zachary Robertson (zachary-robertson) · 2020-08-21T00:24:38.789Z · score: 1 (1 votes) · LW(p) · GW(p)

Ya, totally messed up that. I meant the AI Alignment Forum or AIAF. I think out of habit I used AN (Alignment Newsletter)

comment by Ben Pace (Benito) · 2020-08-21T08:34:23.739Z · score: 2 (1 votes) · LW(p) · GW(p)

I did suspect you'd confused it with the Alignment Newsletter :)

comment by mr-hire · 2020-08-20T16:11:44.972Z · score: 10 (4 votes) · LW(p) · GW(p)

And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts, and hoping that a few of the best ideas stick? This is not what a desperate effort to find the truth looks like.

It seems to me that maybe this is what a certain stage in the desperate effort to find the truth looks like?

Like, the early stages of intellectual progress look a lot like thinking about different ideas and seeing which ones stand up robustly to scrutiny. Then the best ones can be tested more rigorously [LW · GW] and their edges refined through experimentation.

It seems to me like there needs to be some point in the desperate search for truth at which you're allowing for half-formed thoughts and unrefined hypotheses, or else you simply never get to a place where the hypotheses you're creating even brush up against the truth.

comment by Richard_Ngo (ricraz) · 2020-08-20T20:50:53.709Z · score: 5 (3 votes) · LW(p) · GW(p)

In the half-formed thoughts stage, I'd expect to see a lot of literature reviews, agendas laying out problems, and attempts to identify and question fundamental assumptions. I expect that (not blog-post-sized speculation) to be the hard part of the early stages of intellectual progress, and I don't see it right now.

Perhaps we can split this into technical AI safety and everything else. Above I'm mostly speaking about the "everything else" that Less Wrong wants to solve, since AI safety is now a substantial enough field that its problems need to be solved in more systemic ways.

comment by mr-hire · 2020-08-20T22:22:52.535Z · score: 3 (2 votes) · LW(p) · GW(p)

In the half-formed thoughts stage, I'd expect to see a lot of literature reviews, agendas laying out problems, and attempts to identify and question fundamental assumptions. I expect that (not blog-post-sized speculation) to be the hard part of the early stages of intellectual progress, and I don't see it right now.

I would expect that later in the process.  Agendas laying out problems and fundamental assumptions don't spring from nowhere (at least for me), they come from conversations where I'm trying to articulate some intuition, and I recognize some underlying pattern. The pattern and structure doesn't emerge spontaneously, it comes from trying to pick around the edges of a thing, get thoughts across, explain my intuitions and see where they break.

I think it's fair to say that crystallizing these patterns into a formal theory is a "hard part", but the foundation for making it easy is laid out in the floundering and flailing that came before.

comment by Zachary Robertson (zachary-robertson) · 2020-08-20T19:10:39.275Z · score: 9 (5 votes) · LW(p) · GW(p)

One fairly strong belief of mine is that Less Wrong's epistemic standards are not high enough to make solid intellectual progress here.

I think this is literally true. There seems to be very little ability to build upon prior work.

Out of curiosity do you see Less Wrong as significantly useful or is it closer to entertainment/habit? I've found myself thinking along the same lines as I start thinking about starting my PhD program etc. The utility of Less Wrong seems to be a kind of double-edged sword. On the one hand, some of the content is really insightful and exposes me to ideas I wouldn't otherwise encounter. On the other hand, there is such an incredible amount of low-quality content that I worry that I'm learning bad practices.

comment by Viliam · 2020-08-20T20:57:24.435Z · score: 3 (2 votes) · LW(p) · GW(p)

Ironically, some people already feel threatened by the high standards here. Setting them higher probably wouldn't result in more good content. It would result in less mediocre content, but probably also less good content, as the authors who sometimes write a mediocre article and sometimes a good one, would get discouraged and give up.

Ben Pace gives a few examples of great content in the next comment. It would be better if it were easier to separate the good content from the rest, but that's what the reviews are for. Well, only one review so far, if I remember correctly. I would love to see reviews of pre-2018 content (maybe multiple years in one review, if they were less productive). Then I would love to see the winning content get the same treatment as the Sequences -- edit them and arrange them into a book, and make it "required reading" for the community (available as a free PDF).

comment by Zachary Robertson (zachary-robertson) · 2020-08-20T22:44:48.091Z · score: 6 (3 votes) · LW(p) · GW(p)

Setting them higher (standards) probably wouldn't result in more good content.

I broadly agree here. However, I do see the short-forms as a consistent way to skirt around this. I'd say at least 30% of the Less Wrong value proposition is the conversations I get to have. Short-forms seem to be more adapted for continuing conversations and they have a low bar for being made.

I could clarify a bit. My main problem with low-quality content isn't exactly that it's 'wrong' or something like that. Mostly, the issues I'm finding most common for me are:

  1. Too many niche pre-requisites.
  2. No comments
  3. Nagging feeling post is reinventing the wheel

I think one is a ridiculously bad problem. I'm literally getting a PhD in machine learning, write about AI Safety, and still find a large number of those posts (yes, AN posts) glazed in internal jargon that makes it difficult to connect with current research. Things get even worse when I look at non-AI related things.

Two is just a tragedy of the fact that the rich get richer. While I'm guilty of this also, I think that requiring authors to also post seed questions/discussion topics in the comments could go a long way towards alleviating this problem. I oftentimes read a post and want to leave a comment, but then don't because I'm not even sure the author thought about the discussion their post might start.

Three is probably a bit mean. Yet, more than once I've discovered a Less Wrong concept already had a large research literature devoted to it. I think this ties in with one due to the fact niche pre-reqs often go hand-in-hand with insufficient literature review.

comment by Ruby · 2020-08-21T03:14:25.043Z · score: 6 (5 votes) · LW(p) · GW(p)

Thanks for chiming in with this. People criticizing the epistemics is hopefully how we get better epistemics. When the Californian smoke isn't interfering with my cognition as much, I'll try to give your feedback (and Rohin's [LW(p) · GW(p)]) proper attention. I would generally be interested to hear your arguments/models in detail, if you get the chance to lay them out.

My default position is LW has done well enough historically (e.g. Ben Pace's examples) for me to currently be investing in getting it even better. Epistemics and progress could definitely be a lot better, but getting there is hard. If I didn't see much progress on the rate of progress in the next year or two, I'd probably go focus on other things, though I think it'd be tragic if we ever lost what we have now.

And another thought:

And we're trying to produce reliable answers to much harder questions by, what, writing better blog posts

Yes and no. Journal articles have their advantages, and so do blog posts. A bunch of the LessWrong team's recent work has been around filling in the missing pieces for the system to work, e.g. Open Questions (which hasn't yet worked for coordinating research), the Annual Review, Tagging, and the Wiki. We often talk about conferences and "campus".

My work on Open Questions involved thinking about i) a better template for articles than "Abstract, Intro, Methods, etc." (though Open Questions didn't work, for unrelated reasons we haven't overcome yet), ii) getting lit reviews done systematically by people, and iii) coordinating groups around research agendas.

I've thought about re-attempting the goals of Open Questions with instead a "Research Agenda" feature that lets people communally maintain research agendas and work on them. It's a question of priorities whether I work on that anytime soon.

I do really think many of the deficiencies of LessWrong's current work compared to academia are "infrastructure problems" at least as much as the epistemic standards of the community. Which means the LW team should be held culpable for not having solved them yet, but it is tricky.

comment by Richard_Ngo (ricraz) · 2020-08-21T05:33:10.325Z · score: 5 (4 votes) · LW(p) · GW(p)

For the record, I think the LW team is doing a great job. There's definitely a sense in which better infrastructure can reduce the need for high epistemic standards, but it feels like the thing I'm pointing at is more like "Many LW contributors not even realising how far away we are from being able to reliably produce and build on good ideas" (which feels like my criticism of Ben's position in his comment, so I'll respond more directly there).

comment by Pongo · 2020-08-20T20:21:34.835Z · score: 5 (3 votes) · LW(p) · GW(p)

It seems really valuable to have you sharing how you think we’re falling epistemically short, and probably important for the site to integrate the insights behind that view. There are a bunch of ways I disagree with your claims about epistemic best practices, but it seems like it would be cool if I could pass your ITT more. I wish your attempt to communicate the problems you saw had worked out better. I hope there’s a way for you to help improve LW epistemics, but I also get that it might be costly in time and energy.

comment by Viliam · 2020-08-20T21:10:54.892Z · score: 4 (2 votes) · LW(p) · GW(p)
I just noticed that a couple of those comments have been downvoted to negative karma

Now they're positive again.

Confusingly, their Ω-karma (karma on another website) is also positive. Does it mean they previously had negative LW-karma but positive Ω-karma? Or that their Ω-karma also improved as a result of your complaining on LW a few hours ago? Why would it?

(Feature request: graph of evolution of comment karma as a function of time.)

comment by Richard_Ngo (ricraz) · 2020-08-21T14:36:50.178Z · score: 2 (1 votes) · LW(p) · GW(p)

I'm confused, what is Ω-karma?

comment by MikkW (mikkel-wilson) · 2020-08-21T15:25:00.554Z · score: 3 (2 votes) · LW(p) · GW(p)

AI Alignment Forum karma (which is also displayed here on posts that are crossposted)

comment by NaiveTortoise (An1lam) · 2020-08-21T13:17:59.270Z · score: 1 (1 votes) · LW(p) · GW(p)

I'd be curious what, if any, communities you think set good examples in this regard. In particular, are there specific academic subfields or non-academic scenes that exemplify the virtues you'd like to see more of?

comment by Richard_Ngo (ricraz) · 2020-08-21T14:35:06.121Z · score: 3 (2 votes) · LW(p) · GW(p)

Maybe historians of the industrial revolution? They grapple with really complex phenomena and large-scale patterns, like us, but unlike us they use a lot of data, write a lot of thorough papers and books, and then have a lot of ongoing debate about those ideas. And the "progress studies" crowd is an example of an online community inspired by that tradition (but still very nascent, so we'll see how it goes).

More generally I'd say we could learn to be more rigorous by looking at any scientific discipline or econ or analytic philosophy. I don't think most LW posters are in a position to put in as much effort as full-time researchers, but certainly we can push a bit in that direction.

comment by NaiveTortoise (An1lam) · 2020-08-26T12:48:31.717Z · score: 3 (2 votes) · LW(p) · GW(p)

Thanks for your reply! I largely agree with drossbucket [LW(p) · GW(p)]'s reply.

I also wonder how much this is an incentives problem. As you mentioned, and in my experience, the fields you listed strongly incentivize an almost fanatical level of thoroughness that I suspect is very hard for individuals to maintain without outside incentives pushing them that way. At least personally, I definitely struggle and, frankly, mostly fail to live up to the sorts of standards you mention when writing blog posts, in part because the incentive gradient feels like it pushes towards hitting the publish button.

Given this, I wonder if there's a way to shift the incentives on the margin. One minor thing I've been thinking of trying for my personal writing is having a Knuth- or Nintil-style "pay for mistakes" policy. Do you have thoughts on other incentive structures for rewarding rigor or punishing the lack thereof?

comment by Richard_Ngo (ricraz) · 2020-08-26T15:47:40.572Z · score: 5 (2 votes) · LW(p) · GW(p)

It feels partly like an incentives problem, but also I think a lot of people around here are altruistic and truth-seeking and just don't realise that there are much more effective ways to contribute to community epistemics than standard blog posts.

I think that most LW discussion is at the level where "paying for mistakes" wouldn't be that helpful, since a lot of it is fuzzy. Probably the thing we need first is more reference posts that distill a range of discussion into key concepts and place it in the wider intellectual context. Then we can get more empirical. (Although I feel pretty biased on this point, because my own style of learning about things is very top-down.) I guess to encourage this, we could add a "reference" section for posts that aim to distill ongoing debates on LW.

In some cases you can get a lot of "cheap" credit by taking other people's ideas and writing a definitive version of them aimed at more mainstream audiences. For ideas that are really worth spreading, that seems useful.

comment by Richard_Ngo (ricraz) · 2020-10-11T12:07:45.365Z · score: 6 (3 votes) · LW(p) · GW(p)

I've recently discovered waitwho.is, which collects all the online writing and talks of various tech-related public intellectuals. It seems like an important and previously-missing piece of infrastructure for intellectual progress online.

comment by Richard_Ngo (ricraz) · 2020-09-17T20:01:32.380Z · score: 4 (2 votes) · LW(p) · GW(p)

Greg Egan on universality:

I believe that humans have already crossed a threshold that, in a certain sense, puts us on an equal footing with any other being who has mastered abstract reasoning. There’s a notion in computing science of “Turing completeness”, which says that once a computer can perform a set of quite basic operations, it can be programmed to do absolutely any calculation that any other computer can do. Other computers might be faster, or have more memory, or have multiple processors running at the same time, but my 1988 Amiga 500 really could be programmed to do anything my 2008 iMac can do — apart from responding to external events in real time — if only I had the patience to sit and swap floppy disks all day long. I suspect that something broadly similar applies to minds and the class of things they can understand: other beings might think faster than us, or have easy access to a greater store of facts, but underlying both mental processes will be the same basic set of general-purpose tools. So if we ever did encounter those billion-year-old aliens, I’m sure they’d have plenty to tell us that we didn’t yet know — but given enough patience, and a very large notebook, I believe we’d still be able to come to grips with whatever they had to say.
comment by gwern · 2020-09-17T21:56:05.791Z · score: 15 (6 votes) · LW(p) · GW(p)

Equivocation. "Who's 'we', flesh man?" Even granting the necessary millions or billions of years for a human to sit down and emulate a superintelligence step by step, it is still not the human who understands, but the Chinese room.

comment by NaiveTortoise (An1lam) · 2020-09-17T23:06:41.160Z · score: 1 (1 votes) · LW(p) · GW(p)

I've seen this quote before and always find it funny because when I read Greg Egan, I constantly find myself thinking there's no way I could've come up with the ideas he has even if you gave me months or years of thinking time.

comment by gwern · 2020-09-18T00:33:50.179Z · score: 3 (2 votes) · LW(p) · GW(p)

Yes, there's something to that, but you have to be careful if you want to use that as an objection. Maybe you wouldn't easily think of it, but that doesn't exclude the possibility of you doing it: you can come up with algorithms you can execute which would spit out Egan-like ideas, like 'emulate Egan's brain neuron by neuron'. (If nothing else, there's always the ol' dovetail-every-possible-Turing-machine hammer.) Most of these run into computational complexity problems, but that's the escape hatch Egan (and Scott Aaronson has made a similar argument) leaves himself by caveats like 'given enough patience, and a very large notebook'. Said patience might require billions of years, and the notebook might be the size of the Milky Way galaxy, but those are all finite numbers, so technically Egan is correct as far as that goes.

comment by NaiveTortoise (An1lam) · 2020-09-18T01:02:29.079Z · score: 1 (1 votes) · LW(p) · GW(p)

Yeah, good point - given a generous enough interpretation of the notebook, my rejection doesn't hold. It's still hard for me to imagine that response feeling meaningful in context, but maybe I'm just failing to model others well here.

comment by Richard_Ngo (ricraz) · 2020-04-26T21:00:05.813Z · score: 2 (1 votes) · LW(p) · GW(p)

There's some possible world in which the following approach to interpretability works:

  • Put an AGI in a bunch of situations where it sometimes is incentivised to lie and sometimes is incentivised to tell the truth.
  • Train a lie detector which is given all its neural weights as input.
  • Then ask the AGI lots of questions about its plans.

One problem that this approach would face if we were using it to interpret a human is that the human might not consciously be aware of what their motivations are. For example, they may believe they are doing something for altruistic reasons, when in fact their unconscious motivations are primarily to look good. And the motivations which we are less conscious of are exactly those ones which it's most disadvantageous for others to know about.

So would using such an interpretability technique on an AGI work? I guess one important question is something like: by default, would the AGI be systematically biased when talking about its plans, like humans are? Or is this something which only arises when there are selection pressures during training for hiding information?

One way we could avoid this problem: instead of a "lie detector", you could train a "plan identifier", which takes an AGI brain and tells you what that AGI is going to do in english. I'm a little less optimistic about this, since I think that gathering training data will be the big bottleneck either way, and getting enough data to train a plan identifier that's smart enough to generalise to a wide range of plans seems pretty tricky. (By contrast, the lie detector might not need to know very much about the *content* of the lies).
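
For concreteness, here's a minimal sketch of the kind of training loop described above. Everything in it is a stand-in: `collect_internals` just returns noise so the script runs end to end, whereas the real proposal would record the agent's weights/activations while it answers in scenarios that do or don't incentivise lying.

```python
# Minimal sketch (not a real method): train a "lie detector" probe on an
# agent's internals, using episodes where we know whether it was
# incentivised to lie.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N_FEATURES = 512  # stand-in for the (huge) flattened internal state


def collect_internals(incentivise_lying: bool) -> np.ndarray:
    # Placeholder: in the real setup this would record the agent's
    # weights/activations during the episode. Here it returns noise.
    return np.random.randn(N_FEATURES)


# Gather labelled episodes: was the agent incentivised to lie or not?
X, y = [], []
for _ in range(2000):
    lying = np.random.rand() < 0.5
    X.append(collect_internals(incentivise_lying=lying))
    y.append(int(lying))
X, y = np.stack(X), np.array(y)

# Fit a simple probe as the "lie detector".
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
detector = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", detector.score(X_test, y_test))  # ~0.5 on noise

# Deployment step: query the agent about its plans, record its internals,
# and flag answers whose internals the detector scores as likely lies.
```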

comment by Richard_Ngo (ricraz) · 2020-04-26T10:42:18.798Z · score: 2 (1 votes) · LW(p) · GW(p)

I've heard people argue that "most" utility functions lead to agents with strong convergent instrumental goals. This obviously depends a lot on how you quantify over utility functions. Here's one intuition in the other direction. I don't expect this to be persuasive to most people who make the argument above (but I'd still be interested in hearing why not).

If a non-negligible percentage of an agent's actions are random, then to describe it as a utility-maximiser would require an incredibly complex utility function (because any simple hypothesised utility function will eventually be falsified by a random action). And so this generates arbitrarily simple agents whose observed behaviour can only be described as maximising a utility function for arbitrarily complex utility functions (depending on how long you run them).

I expect people to respond something like: we need a theory of how to describe agents with bounded cognition anyway. And if you have such a theory, then we could describe the agent above as "maximising simple function U, subject to the boundedness constraint that X% of its actions are random".
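
As a toy illustration of the falsification point, here's a sketch in which the "simple" candidate utility functions are (purely for illustration, not from the original argument) distance-to-a-target functions on a line. An agent that maximises one of them but takes random actions 25% of the time soon rules out every member of the family as an exact maximisation target.

```python
# An agent that mostly maximises a simple utility function, but acts
# randomly 25% of the time, quickly falsifies *every* hypothesis of the
# form "the agent maximises U_k(s) = -|s - k|".

import random

STATES = range(10)
TARGET = 3  # the agent's "true" simple goal: move towards state 3


def best_action(state, target):
    # The unique maximising move for the candidate utility U_k(s) = -|s - k|.
    if state < target:
        return +1
    if state > target:
        return -1
    return 0


def agent_action(state):
    if random.random() < 0.25:                 # the random fraction of actions
        return random.choice([-1, 0, +1])
    return best_action(state, TARGET)          # otherwise maximise U_TARGET


consistent = set(STATES)   # hypotheses "the agent maximises U_k", k = 0..9
state = 7
for _ in range(200):
    a = agent_action(state)
    # A hypothesis survives only if its optimal action matches every observation.
    consistent = {k for k in consistent if best_action(state, k) == a}
    state = min(max(state + a, 0), 9)

# Almost surely empty: even the agent's own "true" simple utility function
# has been falsified by its occasional random actions.
print("simple hypotheses still consistent:", consistent)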

comment by TurnTrout · 2020-04-26T14:34:57.887Z · score: 4 (2 votes) · LW(p) · GW(p)

I'm not sure if you consider me to be making that argument [LW · GW], but here are my thoughts: I claim that most reward functions lead to agents with strong convergent instrumental goals. However, I share your intuition that (somehow) uniformly sampling utility functions over universe-histories might not lead to instrumental convergence.

To understand instrumental convergence and power-seeking, consider how many of the reward functions we might specify automatically imply a causal mechanism for increasing reward. The structure of the reward function implies that more is better, and that there are mechanisms for repeatedly earning points (for example, by showing itself a high-scoring input).

Since the reward function is "simple" (there's usually not a way to grade exact universe histories), these mechanisms work in many different situations and points in time. The agent is naturally incentivized to assure its own safety in order to best leverage these mechanisms for gaining reward. Therefore, we shouldn't be surprised to see a lot of these simple goals leading to the same kind of power-seeking behavior.

What structure is implied by a reward function?

  • Additive/Markovian: while a utility function might be over an entire universe-history, reward is often additive over time steps. This is a strong constraint which I don't always expect to be true, but I think that among the goals with this structure, a greater proportion of them have power-seeking incentives.
  • Observation-based: while a utility function might be over an entire universe-history, the atom of the reward function is the observation. Perhaps the observation is an input to update a world model, over which we have tried to define a reward function. I think that most ways of doing this lead to power-seeking incentives.
  • Agent-centric: reward functions are defined with respect to what the agent can observe. Therefore, in partially observable environments, there is naturally a greater emphasis on the agent's vantage point in the environment.

My theorems apply to the finite, fully observable, Markovian situation.[1] We might not end up using reward functions for more impressive tasks – we might express preferences over incomplete trajectories, for example. The "specify a reward function over the agent's world model" approach may or may not lead to good subhuman performance in complicated tasks like cleaning warehouses. Imagine specifying a reward function over pure observations for that task – the agent would probably just get stuck looking at a wall in a particularly high-scoring way.
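
To make the Additive/Markovian bullet above concrete, here's a small illustrative contrast (the particular functions are made up for the example): an additive return grades each time step separately and sums, while a utility over universe-histories can depend on global properties of the whole trajectory.

```python
# Additive/Markovian reward vs. utility over whole histories (illustrative).

from typing import Callable, List, Tuple

Step = Tuple[str, str]            # (state, action)
Trajectory = List[Step]


def additive_return(traj: Trajectory,
                    reward: Callable[[str, str], float],
                    gamma: float = 0.99) -> float:
    # Markovian/additive: each step is graded on its own, then discounted and summed.
    return sum(gamma ** t * reward(s, a) for t, (s, a) in enumerate(traj))


def history_utility(traj: Trajectory) -> float:
    # Over universe-histories: can depend on global properties of the whole
    # trajectory, e.g. "never visited 'lava', and ended where it started".
    visited = {s for s, _ in traj}
    return float("lava" not in visited and traj[0][0] == traj[-1][0])


traj = [("home", "explore"), ("field", "explore"), ("home", "rest")]
print(additive_return(traj, reward=lambda s, a: 1.0 if s == "home" else 0.0))
print(history_utility(traj))
```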

However, for arbitrary utility functions over universe histories, the structure isn't so simple. With utility functions over universe histories having far more degrees of freedom, arbitrary policies can be rationalized as VNM expected utility maximization [? · GW]. That said, with respect to a simplicity prior over computable utility functions, the power-seeking ones might have most of the measure.

A more appropriate claim might be: goal-directed behavior tends to lead to power-seeking, and that's why goal-directed behavior tends to be bad [LW · GW].


  1. However, it's well-known that you can convert finite non-Markovian MDPs into finite Markovian MDPs. ↩︎

comment by Richard_Ngo (ricraz) · 2020-04-30T13:34:52.891Z · score: 5 (3 votes) · LW(p) · GW(p)

I've just put up a post [LW · GW] which serves as a broader response to the ideas underpinning this type of argument.

comment by Richard_Ngo (ricraz) · 2020-04-26T21:05:45.307Z · score: 2 (1 votes) · LW(p) · GW(p)
I claim that most reward functions lead to agents with strong convergent instrumental goals

I think this depends a lot on how you model the agent developing. If you start off with a highly intelligent agent which has the ability to make long-term plans, but doesn't yet have any goals, and then you train it on a random reward function - then yes, it probably will develop strong convergent instrumental goals.

On the other hand, if you start off with a randomly initialised neural network, and then train it on a random reward function, then probably it will get stuck in a local optimum pretty quickly, and never learn to even conceptualise these things called "goals".

I claim that when people think about reward functions, they think too much about the former case, and not enough about the latter. Because while it's true that we're eventually going to get highly intelligent agents which can make long-term plans, it's also important that we get to control what reward functions they're trained on up to that point. And so plausibly we can develop intelligent agents that, in some respects, are still stuck in "local optima" in the way they think about convergent instrumental goals - i.e. they're missing whatever cognitive functionality is required for being ambitious on a large scale.

comment by TurnTrout · 2020-04-26T21:15:07.986Z · score: 2 (1 votes) · LW(p) · GW(p)

Agreed – I should have clarified. I've been mostly discussing instrumental convergence with respect to optimal policies. The path through policy space is also important.

comment by Richard_Ngo (ricraz) · 2020-04-27T01:39:47.532Z · score: 4 (2 votes) · LW(p) · GW(p)

Makes sense. For what it's worth, I'd also argue that thinking about optimal policies at all is misguided (e.g. what's the optimal policy for humans - the literal best arrangement of neurons we could possibly have for our reproductive fitness? Probably we'd be born knowing arbitrarily large amounts of information. But this is just not relevant to predicting or modifying our actual behaviour at all).

comment by TurnTrout · 2020-04-27T02:07:34.193Z · score: 2 (1 votes) · LW(p) · GW(p)

I disagree.

  1. We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1] Even if we don't expect the trained agents to reach the optimal policy in the real world, we should still understand what behavior is like at optimum. If you think your proposal is not aligned at optimum but is aligned for realistic training paths, you should have a strong story for why.

  2. Formal theorizing about instrumental convergence with respect to optimal behavior is strictly easier than theorizing about ϵ-optimal behavior, which I think is what you want for a more realistic treatment of instrumental convergence for real agents. Even if you want to think about sub-optimal policies, if you don't understand optimal policies... good luck! Therefore, we also have an instrumental (...) interest in studying the behavior at optimum.


  1. At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point. ↩︎

comment by Richard_Ngo (ricraz) · 2020-04-27T17:26:33.629Z · score: 2 (1 votes) · LW(p) · GW(p)

1. I think it's more accurate to say that, because approximately none of the non-trivial theoretical results hold for function approximation, approximately none of our non-trivial agents are proven to eventually converge to the optimal policy. (Also, given the choice between an algorithm without convergence proofs that works in practice, and an algorithm with convergence proofs that doesn't work in practice, everyone will use the former). But we shouldn't pay any attention to optimal policies anyway, because the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute.

2. I think theorizing about ϵ-optimal behavior is more useful than theorizing about optimal behaviour by roughly ϵ, for roughly the same reasons. But in general, clearly I can understand things about suboptimal policies without understanding optimal policies. I know almost nothing about the optimal policy in StarCraft, but I can still make useful claims about AlphaStar (for example: it's not going to take over the world).

Again, let's try to cash this out. I give you a human - or, say, the emulation of a human, running in a simulation of the ancestral environment. Is this safe? How do you make it safer? What happens if you keep selecting for intelligence? I think that the theorising you talk about will be actively harmful for your ability to answer these questions.

comment by TurnTrout · 2020-04-27T18:14:23.236Z · score: 2 (1 votes) · LW(p) · GW(p)

I'm confused, because I don't disagree with any specific point you make - just the conclusion. Here's my attempt at a disagreement which feels analogous to me:

TurnTrout: here's how spherical cows roll downhill!

ricraz: real cows aren't spheres.

My response in this "debate" is: if you start with a spherical cow and then consider which real world differences are important enough to model, you're better off than just saying "no one should think about spherical cows".

I think that the theorising you talk about will be actively harmful for your ability to answer these questions.

I don't understand why you think that. If you can have a good understanding of instrumental convergence and power-seeking for optimal agents, then you can consider whether any of those same reasons apply for suboptimal humans.

Considering power-seeking for optimal agents is a relaxed problem [LW · GW]. Yes, ideally, we would instantly jump to the theory that formally describes power-seeking for suboptimal agents with realistic goals in all kinds of environments. But before you do that, a first step is understanding power-seeking in MDPs [LW · GW]. Then, you can take formal insights from this first step and use them to update your pre-theoretic intuitions where appropriate.

comment by Richard_Ngo (ricraz) · 2020-04-29T00:50:49.319Z · score: 5 (3 votes) · LW(p) · GW(p)

Thanks for engaging despite the opacity of the disagreement. I'll try to make my position here much more explicit (and apologies if that makes it sound brusque). The fact that your model is a simplified abstract model is not sufficient to make it useful. Some abstract models are useful. Some are misleading and will cause people who spend time studying them to understand the underlying phenomenon less well than they did before. From my perspective, I haven't seen you give arguments that your models are in the former category rather than the latter. Presumably you think they are in fact useful abstractions - why? (A few examples of the latter: behaviourism, statistical learning theory, recapitulation theory, Gettier-style analysis of knowledge).

My argument for why they're overall misleading: when I say that "the optimal policy in an environment anything like the real world is absurdly, impossibly complex, and requires infinite compute", or that safety researchers shouldn't think about AIXI, I'm not just saying that these are inaccurate models. I'm saying that they are modelling fundamentally different phenomena than the ones you're trying to apply them to. AIXI is not "intelligence", it is brute force search, which is a totally different thing that happens to look the same in the infinite limit. Optimal tabular policies are not skill at a task, they are a cheat sheet, but they happen to look similar in very simple cases.

Probably the best example of what I'm complaining about is Ned Block trying to use Blockhead to draw conclusions about intelligence. I think almost everyone around here would roll their eyes hard at that. But then people turn around and use abstractions that are just as unmoored from reality as Blockhead, often in a very analogous way. (This is less a specific criticism of you, TurnTrout, and more a general criticism of the field).

if you start with a spherical cow and then consider which real world differences are important enough to model, you're better off than just saying "no one should think about spherical cows".

Forgive me a little poetic license. The analogy in my mind is that you were trying to model the cow as a sphere, but you didn't know how to do so without setting its weight as infinite, and what looked to you like your model predicting the cow would roll downhill was actually your model predicting that the cow would swallow up the nearby fabric of spacetime and the bottom of the hill would fall into its event horizon. At which point, yes, you would be better off just saying "nobody should think about spherical cows".

comment by TurnTrout · 2020-04-29T18:42:47.044Z · score: 4 (2 votes) · LW(p) · GW(p)

Thanks for elaborating this interesting critique. I agree we generally need to be more critical of our abstractions.

I haven't seen you give arguments that your models [of instrumental convergence] are [useful for realistic agents]

Falsifying claims and "breaking" proposals is a classic element of AI alignment discourse and debate. Since we're talking about superintelligent agents, we can't predict exactly what a proposal would do. However, if I make a claim ("a superintelligent paperclip maximizer would keep us around because of gains from trade"), you can falsify this by showing that my claimed policy is dominated by another class of policies ("we would likely be comically resource-inefficient in comparison; GFT arguments don't model dynamics which allow killing other agents and appropriating their resources").

Even we can come up with this dominant policy class, so the posited superintelligence wouldn't miss it either. We don't know what the superintelligent policy will be, but we know what it won't be (see also Formalizing convergent instrumental goals). Even though I don't know how Gary Kasparov will open the game, I confidently predict that he won't let me checkmate him in two moves.

Non-optimal power and instrumental convergence

Instead of thinking about optimal policies, let's consider the performance of a given algorithm A, which takes a rewardless MDP and a reward function as input, and outputs a policy.

Definition. Let D be a continuous distribution over reward functions with CDF F. The average return achieved by algorithm A at state s and discount rate γ is ∫ V^{A(R)}_R(s, γ) dF(R): the expected return at s, under a reward function R drawn from D, of the policy that A outputs for R.

Instrumental convergence with respect to A's policies can be defined similarly ("what is the D-measure of a given trajectory under A?"). The theory I've laid out allows precise claims, which is a modest benefit to our understanding. Before, we just had intuitions about some vague concept called "instrumental convergence".
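
Here's a Monte Carlo sketch of that definition under illustrative assumptions (a tiny deterministic MDP, D = uniform rewards over states, and value iteration standing in for A): sample reward functions R from D, run the algorithm on each, and average the resulting returns at the state of interest.

```python
# Monte Carlo estimate of the average return of an algorithm A at a state,
# over a distribution D of reward functions (all choices here illustrative).

import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9
rng = np.random.default_rng(0)
# Deterministic transitions: P[s, a] gives the next state.
P = rng.integers(0, N_STATES, size=(N_STATES, N_ACTIONS))


def algorithm_A(reward):
    """Stand-in for A: value iteration, returning a greedy deterministic policy."""
    V = np.zeros(N_STATES)
    for _ in range(500):
        Q = reward[:, None] + GAMMA * V[P]   # Q[s, a] = r(s) + γ V(P[s, a])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                  # policy: state -> action


def return_of(policy, reward, s, horizon=500):
    """Discounted return of following `policy` from state s under `reward`."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        total += discount * reward[s]
        s, discount = P[s, policy[s]], discount * GAMMA
    return total


# Average return of A at state 0 and discount γ: sample R ~ D (uniform over
# [0,1]^states), run A on each, and average the return of A(R) against R.
samples = [return_of(algorithm_A(r), r, s=0)
           for r in rng.uniform(0, 1, size=(1000, N_STATES))]
print("estimated average return at state 0:", np.mean(samples))
```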

Here's bad reasoning, which implies that the cow tears a hole in spacetime:

Suppose the laws of physics bestow godhood upon an agent executing some convoluted series of actions; in particular, this allows avoiding heat death. Clearly, it is optimal for the vast majority of agents to instantly become god.

The problem is that it's impractical to predict what a smarter agent will do, or what specific kinds of action will be instrumentally convergent for A, or to assume that the real agent would be infinitely smart. Just because it's smart doesn't mean it's omniscient, as you rightly point out.

Here's better reasoning:

Suppose that the MDP modeling the real world represents shutdown as a single terminal state. Most optimal agents don't allow themselves to be shut down. Furthermore, since we can see that most goals offer better reward at non-shutdown states, superintelligent A can as well.[1] While I don't know exactly what A will tend to do, I predict that policies generated by A will tend to resist shutdown.


  1. It might seem like I'm assuming the consequent here. This is not so – the work is first done by the theorems on optimal behavior, which do imply that most goals achieve greater return by avoiding shutdown. The question is whether reasonably intelligent suboptimal agents realize this fact. Given a uniformly drawn reward function, we can usually come up with a better policy than dying, so the argument is that A can as well. ↩︎

comment by Richard_Ngo (ricraz) · 2020-04-29T23:22:41.783Z · score: 4 (2 votes) · LW(p) · GW(p)

I'm afraid I'm mostly going to disengage here, since it seems more useful to spend the time writing up more general + constructive versions of my arguments, rather than critiquing a specific framework.

If I were to sketch out the reasons I expect to be skeptical about this framework if I looked into it in more detail, it'd be something like:

1. Instrumental convergence isn't training-time behaviour, it's test-time behaviour. It isn't about increasing reward, it's about achieving goals (that the agent learned by being trained to increase reward).

2. The space of goals that agents might learn is very different from the space of reward functions. As a hypothetical, maybe it's the case that neural networks are just really good at producing deontological agents, and really bad at producing consequentialists. (E.g., if it's just really, really difficult for gradient descent to get a proper planning module working.) Then agents trained on almost all reward functions will learn to do well on them without developing convergent instrumental goals. (I expect you to respond that being deontological won't get you to optimality. But I would say that talking about "optimality" here ruins the abstraction, for reasons outlined in my previous comment).

comment by TurnTrout · 2020-04-30T20:18:12.337Z · score: 2 (1 votes) · LW(p) · GW(p)

I expect you to respond that being deontological won't get you to optimality. But I would say that talking about "optimality" here ruins the abstraction, for reasons outlined in my previous comment

I was actually going to respond, "that's a good point, but (IMO) a different concern than the one you initially raised". I see you making two main critiques.

  1. (paraphrased) "A won't produce optimal policies for the specified reward function [even assuming alignment generalization off of the training distribution], so your model isn't useful" – I replied to this critique above.

  2. "The space of goals that agents might learn is very different from the space of reward functions." I agree this is an important part of the story. I think the reasonable takeaway is "current theorems on instrumental convergence help us understand what superintelligent won't do, assuming no reward-result gap. Since we can't assume alignment generalization, we should keep in mind how the inductive biases of gradient descent affect the eventual policy produced."

I remain highly skeptical of the claim that applying this idealized theory of instrumental convergence worsens our ability to actually reason about it.

ETA: I read some information you privately messaged me, and I see why you might see the above two points as a single concern.

comment by Pattern · 2020-04-27T03:39:04.318Z · score: 2 (1 votes) · LW(p) · GW(p)
We do in fact often train agents using algorithms which are proven to eventually converge to the optimal policy.[1] [LW · GW]
At least, the tabular algorithms are proven, but no one uses those for real stuff. I'm not sure what the results are for function approximators, but I think you get my point. ↩︎ [LW · GW]

Is the point that people try to use algorithms which they think will eventually converge to the optimal policy? (Assuming there is one.)

comment by TurnTrout · 2020-04-27T03:55:21.632Z · score: 2 (1 votes) · LW(p) · GW(p)

Something like that, yeah.

comment by DanielFilan · 2020-08-22T06:00:42.015Z · score: 2 (1 votes) · LW(p) · GW(p)

And so this generates arbitrarily simple agents whose observed behaviour can only be described as maximising a utility function for arbitrarily complex utility functions (depending on how long you run them).

I object to the claim that agents that act randomly can be made "arbitrarily simple". Randomness is basically definitionally complicated!

comment by Richard_Ngo (ricraz) · 2020-08-22T06:32:07.112Z · score: 2 (1 votes) · LW(p) · GW(p)

Eh, this seems a bit nitpicky. It's arbitrarily simple given a call to a randomness oracle, which in practice we can approximate pretty easily. And it's "definitionally" easy to specify as well: "the function which, at each call, returns true with 50% likelihood and false otherwise."

comment by DanielFilan · 2020-08-22T16:27:27.083Z · score: 2 (1 votes) · LW(p) · GW(p)

If you get an 'external' randomness oracle, then you could define the utility function pretty simply in terms of the outputs of the oracle.

If the agent has a pseudo-random number generator (PRNG) inside it, then I suppose I agree that you aren't going to be able to give it a utility function that has the standard set of convergent instrumental goals, and PRNGs can be pretty short. (Well, some search algorithms are probably shorter, but I bet they have higher Kt complexity, which is probably a better measure for agents)

comment by Vaniver · 2020-04-29T23:27:41.768Z · score: 2 (1 votes) · LW(p) · GW(p)

If a reasonable percentage of an agent's actions are random, then to describe it as a utility-maximiser would require an incredibly complex utility function (because any simple hypothesised utility function will eventually be falsified by a random action).

I'd take a different tack here, actually; I think this depends on what the input to the utility function is. If we're only allowed to look at 'atomic reality', or the raw actions the agent takes, then I think your analysis goes through, that we have a simple causal process generating the behavior but need a very complicated utility function to make a utility-maximizer that matches the behavior.

But if we're allowed to decorate the atomic reality with notes like "this action was generated randomly", then we can have a utility function that's as simple as the generator, because it just counts up the presence of those notes. (It doesn't seem to me like this decorator is meaningfully more complicated than the thing that gave us "agents taking actions" as a data source, so I don't think I'm paying too much here.)

This can lead to a massive explosion in the number of possible utility functions (because there's a tremendous number of possible decorators), but I think this matches the explosion that we got by considering agents that were the outputs of causal processes in the first place. That is, consider reasoning about python code that outputs actions in a simple game, where there are many more possible python programs than there are possible policies in the game.
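
Here's a toy rendering of that move, with made-up names: each step carries a note recording how its action was produced, and the utility function just counts those notes, so it's no more complex than the generator itself.

```python
# "Decorated trajectory" sketch: annotate each step with how the action was
# generated, and let the utility function count the annotations.

import random


def random_generator(state):
    """The simple causal process we want to 'rationalise' as utility-maximising."""
    return random.choice(["left", "right"]), "generated-randomly"


def deterministic_agent(state):
    return "left", "generated-deterministically"


def run(policy, steps=20):
    traj, state = [], 0
    for _ in range(steps):
        action, note = policy(state)
        traj.append((state, action, note))
        state += 1
    return traj


def decorated_utility(traj):
    # Counts up the presence of the "generated randomly" notes.
    return sum(note == "generated-randomly" for _, _, note in traj)


print(decorated_utility(run(random_generator)))     # 20: the generator is "optimal"
print(decorated_utility(run(deterministic_agent)))  # 0
```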

comment by Richard_Ngo (ricraz) · 2020-04-29T23:56:49.603Z · score: 2 (1 votes) · LW(p) · GW(p)

So in general you can't have utility functions that are as simple as the generator, right? E.g. the generator could be deontological. In which case your utility function would be complicated. Or it could be random, or it could choose actions by alphabetical order, or...

And so maybe you can have a little note for each of these. But now what it sounds like is: "I need my notes to be able to describe every possible cognitive algorithm that the agent could be running". Which seems very very complicated.

I guess this is what you meant by the "tremendous number" of possible decorators. But if that's what you need to do to keep talking about "utility functions", then it just seems better to acknowledge that they're broken as an abstraction.

E.g. in the case of python code, you wouldn't do anything analogous to this. You would just try to reason about all the possible python programs directly. Similarly, I want to reason about all the cognitive algorithms directly.

comment by Vaniver · 2020-04-30T23:45:51.293Z · score: 2 (1 votes) · LW(p) · GW(p)

Which seems very very complicated.

That's right.

I realized my grandparent comment is unclear here:

but need a very complicated utility function to make a utility-maximizer that matches the behavior.

This should have been "consequence-desirability-maximizer" or something, since the whole question is "does my utility function have to be defined in terms of consequences, or can it be defined in terms of arbitrary propositions?". If I want to make the deontologist-approximating Innocent-Bot, I have a terrible time if I have to specify the consequences that correspond to the bot being innocent and the consequences that don't, but if you let me say "Utility = 0 - badness of sins committed" then I've constructed a 'simple' deontologist. (At least, about as simple as the bot that says "take random actions that aren't sins", since both of them need to import the sins library.)

In general, I think it makes sense to not allow this sort of elaboration of what we mean by utility functions, since the behavior we want to point to is the backwards assignment of desirability to actions based on the desirability of their expected consequences, rather than the expectation of any arbitrary property.
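
A small sketch of that contrast, with a made-up "sins library": the deontological utility is graded directly on the actions taken, while expressing the same bot purely over consequences would require classifying world-states as innocent or not, which is left unimplemented here.

```python
# Deontological vs. consequence-based framing of Innocent-Bot (illustrative).

SIN_BADNESS = {"lie": 1.0, "steal": 5.0, "murder": 100.0}  # the "sins library"


def deontological_utility(actions):
    # "Utility = 0 - badness of sins committed": graded on the actions alone.
    return 0.0 - sum(SIN_BADNESS.get(a, 0.0) for a in actions)


def consequentialist_utility(world_history):
    # To express the same bot purely over consequences, we'd have to classify
    # every possible resulting world-state as innocent or not -- the hard part.
    raise NotImplementedError("requires specifying the consequences of innocence")


print(deontological_utility(["walk", "lie", "walk"]))  # -1.0
```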

---

Actually, I also realized something about your original comment which I don't think I had the first time around; if by "some reasonable percentage of an agent's actions are random" you mean something like "the agent does epsilon-exploration" or "the agent plays an optimal mixed strategy", then I think it doesn't at all require a complicated utility function to generate identical behavior. Like, in the rock-paper-scissors world, and with the simple function 'utility = number of wins', the expected utility maximizing move (against tough competition) is to throw randomly, and we won't falsify the simple 'utility = number of wins' hypothesis by observing random actions.
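
A quick numerical check of the rock-paper-scissors point, under the illustrative setup of a win-counting utility against a best-responding opponent: uniform random play is the maximin strategy, so observing random throws is consistent with maximising 'utility = number of wins', whereas any skewed strategy does worse in the worst case.

```python
# Rock-paper-scissors: uniform play is maximin for "utility = number of wins".

import itertools

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
MOVES = list(BEATS)


def expected_wins(my_dist, opp_dist):
    """Expected wins per round for mixed strategy my_dist vs opp_dist."""
    return sum(p * q
               for (m, p), (o, q) in itertools.product(my_dist.items(),
                                                        opp_dist.items())
               if BEATS[m] == o)


def worst_case(my_dist):
    """Win rate against the best-responding pure opponent."""
    return min(expected_wins(my_dist, {m: float(m == o) for m in MOVES})
               for o in MOVES)


uniform = {m: 1 / 3 for m in MOVES}
skewed = {"rock": 0.5, "paper": 0.25, "scissors": 0.25}

print(worst_case(uniform))  # 1/3: no opponent can push it lower
print(worst_case(skewed))   # 0.25: exploitable, so worse by this utility
```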

Instead I read it as something like "some unreasonable percentage of an agent's actions are random", where the agent is performing some simple-to-calculate mixed strategy that is either suboptimal or only optimal by luck (when the optimal mixed strategy is the maxent strategy, for example), and matching the behavior with an expected utility maximizer is a challenge (because your target has to be not some fact about the environment, but some fact about the statistical properties of the actions taken by the agent).

---

I think this is where the original intuition becomes uncompelling. We care about utility-maximizers because they're doing their backwards assignment, using their predictions of the future to guide their present actions to try to shift the future to be more like what they want it to be. We don't necessarily care about imitators, or simple-to-write bots, or so on. And so if I read the original post as "the further a robot's behavior is from optimal, the less likely it is to demonstrate convergent instrumental goals", I say "yeah, sure, but I'm trying to build smart robots (or at least reasoning about what will happen if people try to)."

comment by Richard_Ngo (ricraz) · 2020-05-01T10:56:36.212Z · score: 4 (2 votes) · LW(p) · GW(p)
Instead I read it as something like "some unreasonable percentage of an agent's actions are random"

This is in fact the intended reading, sorry for ambiguity. Will edit. But note that there are probably very few situations where exploring via actual randomness is best; there will almost always be some type of exploration which is more favourable. So I don't think this helps.

We care about utility-maximizers because they're doing their backwards assignment, using their predictions of the future to guide their present actions to try to shift the future to be more like what they want it to be.

To be pedantic: we care about "consequence-desirability-maximisers" (or in Rohin's terminology, goal-directed agents) because they do backwards assignment. But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.

And so if I read the original post as "the further a robot's behavior is from optimal, the less likely it is to demonstrate convergent instrumental goals"

What do you mean by optimal here? The robot's observed behaviour will be optimal for some utility function, no matter how long you run it.

comment by Vaniver · 2020-05-01T23:49:40.348Z · score: 2 (1 votes) · LW(p) · GW(p)

To be pedantic: we care about "consequence-desirability-maximisers" (or in Rohin's terminology, goal-directed agents) because they do backwards assignment.

Valid point.

But I think the pedantry is important, because people substitute utility-maximisers for goal-directed agents, and then reason about those agents by thinking about utility functions, and that just seems incorrect.

This also seems right. Like, my understanding of what's going on here is we have:

  • 'central' consequence-desirability-maximizers, where there's a simple utility function that they're trying to maximize according to the VNM axioms
  • 'general' consequence-desirability-maximizers, where there's a complicated utility function that they're trying to maximize, which is selected because it imitates some other behavior

The first is a narrow class, and depending on how strict you are with 'maximize', quite possibly no physically real agents will fall into it. The second is a universal class, which instantiates the 'trivial claim' that everything is utility maximization.

Put another way, the first is what happens if you hold utility fixed / keep utility simple, and then examine what behavior follows; the second is what happens if you hold behavior fixed / keep behavior simple, and then examine what utility follows.

Distance from the first is what I mean by "the further a robot's behavior is from optimal"; I want to say that I should have said something like "VNM-optimal" but actually I think it needs to be closer to "simple utility VNM-optimal." 

I think you're basically right in calling out a bait-and-switch that sometimes happens, where anyone who wants to talk about the universality of expected utility maximization in the trivial 'general' sense can't get it to do any work, because it should all add up to normality, and in normality there's a meaningful distinction between people who sort of pursue fuzzy goals and ruthless utility maximizers.