Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-15T01:54:46.623Z · score: 6 (3 votes) · LW · GW

I don’t think any of those features strongly disincentivize me from participating on LW/AF; it’s more the lack of people close to my own viewpoint that disincentivizes me from participating.

I see. Hopefully the LW/AF team is following this thread and thinking about what to do, but in the meantime I encourage you to participate anyway, as it seems good to get ideas from your viewpoint "out there" even if no one is currently engaging with them in a way that you find useful.

as well as the focus on expected utility maximization with simple utility functions

I don't think anyone talks about simple utility functions? Maybe you mean explicit utility functions?

A priori I assign somewhat high probability that I will not find useful a critical comment on my work from anyone holding that perspective, but I’ll feel obligated to reply anyway.

If this feature request of mine were implemented, you'd be able to respond to such comments with a couple of clicks. In the meantime it seems best to just not feel obligated to reply.

Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-14T14:36:35.996Z · score: 6 (3 votes) · LW · GW

I mean, I’m not sure if an intervention is necessary—I do in fact engage with people who share my viewpoint, or at least understand it well; many of them are at CHAI. It just doesn’t happen on LW/AF.

Yeah, I figured as much, which is why I said I'd prefer having an online place for such discussions so that I would be able to listen in on these discussions. :) Another advantage is to encourage more discussions across organizations and from independent researchers, students, and others considering going into the field.

Maybe I more mean that there’s an emphasis that any particular idea must have a connection via a sequence of logical steps to a full solution to AI safety.

It's worth noting that many MIRI researchers seem to have backed away from this (or clarified that they didn't think this in the first place). This was pretty noticeable at the research retreat and also reflected in their recent writings. I want to note though how scary it is that almost nobody has a good idea how their current work logically connects to a full solution to AI safety.

Note that I’m not saying I disagree with all of these points; I’m trying to point at a cluster of beliefs / modes of thinking that I tend to see in people who have viewpoint X.

I'm curious what your strongest disagreements are, and what bugs you the most, as far as disincentivizing you to participate on LW/AF.

Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-12T22:04:07.629Z · score: 8 (4 votes) · LW · GW

It sounds like you might prefer a separate place to engage more with people who already share your viewpoint. Does that seem right? I think I would prefer having something like that too, if it means being able to listen in on discussions among AI safety researchers with perspectives different from my own.

I would be interested in getting a clearer picture of what you mean by "viewpoint X", how your viewpoint differs from it, and what especially bugs you about it, but I guess it's hard to do, or you would have done it already.

Comment by wei_dai on Narcissism vs. social signalling · 2019-05-12T20:23:06.257Z · score: 5 (2 votes) · LW · GW

It seems to me like the first two stages are simple enough that Jessica’s treatment is an adequate formalization, insofar as the “market for lemons” model is well-understood. Can you say a bit more about how you’d expect additional formalization to help here?

In the original "market for lemons" game there was no signaling. Instead the possibility of "lemons" in the market just drives out "peaches" until the whole market collapses.
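For readers unfamiliar with the model, here's a minimal sketch of the unraveling dynamic, with assumed numbers (quality uniform on [0, 1], sellers willing to sell at any price at or above their car's quality, buyers valuing a car at 1.5x its quality and bidding the expected value of whatever remains on the market):

```python
# Hedged sketch of Akerlof-style adverse selection: each round, only
# sellers whose quality is at or below the going price stay in the
# market, so buyers lower their bid to the expected value of what's
# left, which drives out more sellers, and so on.

def unravel(price=1.0, steps=20):
    for _ in range(steps):
        # Sellers with quality <= price remain; with quality uniform
        # on [0, 1], their mean quality is price / 2.
        mean_quality = price / 2
        # Buyers bid their valuation of the average remaining car.
        price = 1.5 * mean_quality  # i.e., 0.75 * price
    return price

print(unravel())  # price shrinks toward 0: the market collapses
```

Because buyers' bid is always a fixed fraction of the previous price, the price contracts geometrically toward zero; "peaches" exit first and the whole market collapses, all without any signaling in the model.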

As I mentioned in my reply to Jessica, the actual model for stage 2 she had in mind seems more complex than any formal model in the literature that I can easily find. I was unsure from her short verbal description in the original comment what model she had in mind (in particular I wasn't sure how to interpret "convincing lies"), and am still unsure whether the math would actually work out the way she thinks (although I grant that it seems intuitively plausible). I was also unsure whether she is assuming standard unbounded rationality or something else.

It’s in the transition from stage 2 to 3 and 4 that some modeling specific to this framework seems needed, to me.

I was confused/uncertain about stage 2 already, but sure I'd be interested in thoughts about how to model the higher stages too.

Comment by wei_dai on Narcissism vs. social signalling · 2019-05-12T20:10:00.556Z · score: 6 (3 votes) · LW · GW

I would not be surprised if this model is already in the literature somewhere.

I couldn't find one after doing a quick search. From what I can tell, there are separate classes of Audit Games and Signaling Games in the literature. It would seem natural to combine auditing and signaling into a single model, but I'm not sure anyone has done so, or how the math would work out.

Comment by wei_dai on Narcissism vs. social signalling · 2019-05-12T11:46:01.037Z · score: 6 (3 votes) · LW · GW

Stage 2 signalling is this but with convincing lies, which actually are enough to convince a Bayesian evaluator (who may be aware of the adversarial dynamic, and audit sometimes).

Do you have a formal (e.g., game theoretic) model of this in mind, or see an approach to creating a formal model for it?

On the one hand, I don't want to Goodhart on excess formality / mathematization or not take advantage of informal models where available, but on the other hand, I'm not sure if long-term intellectual progress is possible without using formal models, since informal models seem very lossy in transmission and it seems very easy to talk past each other when using informal models (e.g., two people think they're discussing one model but actually have two different models in mind). I'm thinking of writing a Question Post about this. If the answer to the above question is "no", would you mind if I used this as an example in my post?

"UDT2" and "against UD+ASSA"

2019-05-12T04:18:37.158Z · score: 40 (12 votes)
Comment by wei_dai on What features of people do you know of that might predict academic success? · 2019-05-11T17:57:17.444Z · score: 5 (2 votes) · LW · GW

Giving more context might help readers to know what kind of answer you're looking for. What kind of people are you planning to run your predictions on? High school students? College students? Graduate students? AI researchers? People in non-AI fields? What kind of interventions are you planning to do on them?

Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-11T17:51:24.088Z · score: 9 (5 votes) · LW · GW

The LW/AF audience by and large operates under a set of assumptions about AI safety that I don’t really share. I can’t easily describe this set, but one bad way to describe it would be “the MIRI viewpoint” on AI safety.

Are you seeing this reflected in the pattern of votes (comments/posts reflecting "the MIRI viewpoint" get voted up more), pattern of posts (there's less content about other viewpoints), or pattern of engagement (most replies you're getting are from this viewpoint)? Please give some examples if you feel comfortable doing that.

In any case, do you think recruiting more alignment/safety researchers with other viewpoints to participate on LW/AF would be a good solution? Would you like the current audience to consider the arguments for other viewpoints more seriously? Other solutions you think are worth trying?

TurnTrout and I switched to private online messaging at one point

Yeah, I think this is probably being done less than optimally, and I'd like to see LW support or encourage this somehow. One problem with the way people are doing this currently is that the chat transcripts are typically not posted, which prevents others from following along and perhaps getting a similar level of understanding, or asking questions, or spotting errors that both sides are making, or learning discussion skills from such examples.

Comment by wei_dai on Not Deceiving the Evaluator · 2019-05-11T04:10:32.588Z · score: 3 (1 votes) · LW · GW

Thanks for the example, which is really helpful. Am I correct in thinking the following? In general the agent will do what is optimal according to its own prior, which won't be the same as what is optimal according to the evaluator's prior. So if the evaluator is a rational agent trying to optimize the world according to its own prior, it would not actually reward the agent according to what this scheme specifies, but would instead reward the agent in a way that causes it to act according to the policy the evaluator thinks is best. In other words, the evaluator has an incentive to deceive the agent as to what its prior and/or utility function actually are?

Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-10T23:13:30.359Z · score: 5 (2 votes) · LW · GW

a) Do you have a sense that these people think of LW/AF as a/the primary nexus for discussing alignment-related issues? (but didn’t either because they didn’t expect to get much benefit, or would endure too much cost)

Again I didn't get a chance to talk much about this topic, but I would guess yes.

b) I don’t actually know if there’s any other actual locus of conversation happening anywhere other than individual private google docs, curious if you know of any such thing? (such as mailing lists. Not asking for the specific details of any such mailing list, just wanting to check if such a thing exists at all).

The only thing I personally know is a low-traffic private mailing list run by FHI which has non-FHI researchers on it but mostly consist of discussions between FHI researchers.

Comment by wei_dai on Disincentives for participating on LW/AF · 2019-05-10T22:56:16.341Z · score: 19 (6 votes) · LW · GW

I definitely expect there to be lot of room for improvement here – each of the areas you point to is something we’ve talked about.

That's good to hear.

One quick check (too late in this case, but fairly high priority to figure out IMO) is “do they even think of LW as a place they should have considered reading?”

A lot of them know of LW/AF and at least read some of the posts.

Also it’s sort of awkward to have the two places to hang out be “The AlignmentForum” and “The Alignment Subforum [of LessWrong]”

Agreed this seems really awkward/confusing, but it makes me realize we do need better ways to onboard people who are mainly interested in AI alignment as opposed to rationality and cater to their needs generally. If a new user tries to comment on a post on AF now, it just pops up a message "Log in or go to LessWrong to submit your comment." and there's not even a link to the same post on LW. This whole experience probably needs to be reconsidered.

Disincentives for participating on LW/AF

2019-05-10T19:46:36.010Z · score: 67 (27 votes)
Comment by wei_dai on Not Deceiving the Evaluator · 2019-05-09T23:30:14.463Z · score: 3 (1 votes) · LW · GW

I think I vaguely understand but it would be a lot clearer if you gave a concrete example. Also please update in the direction that people often find it hard to understand things without examples and giving examples preemptively is very cost effective in general.

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-09T19:17:36.252Z · score: 3 (1 votes) · LW · GW

I don’t think I’ve seen a single clear example of someone taking initiative (where saying something new in public based on engagement with the post’s underlying model would count as taking initiative) as a result of that post, and making different giving decisions would probably count too.

I wrote a post that was in part a response/followup to your GiveWell post although I'm not sure if you'd count that as engagement with your underlying model or just superficially engaging with the conclusions or going off on a tangent or something like that.

I think I have some general confusion about what you're trying to do. If you think you have ideas that are good enough to, upon vetting by a wider community, potentially be the basis for action for others, help change other people's decisions, or be the basis for further thinking by others, and you aren't getting as much engagement as you hope, it seems like you should try harder to communicate your ideas clearly and to a wide audience. On the other hand, if you're still pretty confused about something and still trying to figure things out to your own satisfaction, then it would make sense to just talk with others who already share your context and not try super hard to make things clear to a wider audience. Or do you think you've figured some things out, but it doesn't seem cost effective to communicate them to a wider audience, so you might as well put them out there in a low-effort way and maybe a few readers will get your ideas?

(So one suggestion/complaint is to make clearer which type of post is which. Just throwing things out there isn't low cost if it wastes readers' time! Again maybe you think that should just be obvious from looking at the first few paragraphs of a post but it was not to me, in part because others like Eliezer use dialogs to write the first kind of post. In retrospect he was writing fictionalized dialogs instead of reporting actual dialogs but I think that's why the post didn't immediately jump out to me as "maybe this isn't worthwhile for me to try to understand so I should stop before I invest more time/effort into it".)

It seems like you're saying that you rarely or never get enough engagement with the first type of writing, so you no longer think that is cost effective for you, but then what is your motivation for trying to figure these things out now? Just to guide your own actions and maybe a very small group of others? If so, what is your reason for being so pessimistic about getting your ideas into a wider audience if you tried harder? Are there not comparably complex or subtle or counterintuitive ideas that have gotten into a wider audience?

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-09T10:19:27.590Z · score: 12 (3 votes) · LW · GW

On review, there was already a summary at the beginning of the Authoritarian Empiricism post

I didn't recognize this as a summary because it seemed to be talking about a specific "pseudo-community" and I didn't interpret it as making a general point. Even reading it now, knowing that it's a summary, I still can't tell what the main point of the article might be. The beginning of Totalitarian ethical systems seems clearer as summary now that you've described it as such, but before that I didn't know if it was presenting the main point or just an example of a more general point or something tangential, etc., since I didn't understand all of the rest of the post so I couldn't be sure the main point of the post wasn't something different.

Also it seems like the point of a summary is to clearly communicate what the main points of the post are so the reader has some reference to know whether they should read the rest of it and also to help understand the rest of the post (since they can interpret it in relation to the main points) and having an unlabeled summary seems to defeat these purposes as the reader can't even recognize the summary as a summary before they've read and understood the rest of the post.

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-09T10:00:48.155Z · score: 12 (3 votes) · LW · GW

(Sorry, due to attending a research retreat I didn't get a chance to answer your comments until now.)

I don't think you should care so much about engagement as opposed to communicating your ideas to your readers. I found your series on GiveWell a lot easier to understand and would much prefer writings in that style.

More specific feedback would be helpful to me, like, “I started reading this article because I got the sense that it was about X, and was disappointed because it didn’t cover arguments Y and Z that I consider important.”

I started reading this post because I read some posts from you in the past that I liked (such as the GiveWell one), and on these dialog ones it was just really hard to tell what main points you're trying to make. I questioned the NK government vs NK people thing because I at least understood that part, and didn't realize it's tangential.

Like, before you added a summary, this post started by talking to a friend who used "threatening" with regard to NK, without even mentioning EA, which made me think "why should I care about this?" So I tried to skim the article, but that didn't work (I found one part that seemed clear to me, but that turned out to be tangential). I guess I just don't know how to read an article that doesn't clearly say at the outset what the main points are (and therefore why I should care), and which also can't be skimmed.

Comment by wei_dai on Not Deceiving the Evaluator · 2019-05-08T17:04:43.156Z · score: 6 (3 votes) · LW · GW

I worked out a toy example that may be helpful. Suppose the setup is that there are states labeled 0-10, actions labeled 0-10, observations labeled 0-10, initial state is 0 and each action takes the system into state with same label and agent/evaluator get observation with same label, and two equally probable utility functions: sum of state labels over time, or the negative of that.

First suppose the policy is to always do the same action, then when you sum over the two utility functions the utilities cancel out so the expected utility is 0. Now suppose the policy is to do any non-zero action (let's say 5) at the start, and then do 0 if the agent observes a negative reward and 10 if the agent observes a positive reward. Now when you sum over the utility functions, in the negative case the utility is -5 + 0 (this policy implies that conditional on that utility function, with probability 1 the state trajectory is 0, 5, 0), in the positive case it's 5+10 (state trajectory 0, 5, 10), so EU is .5 * -5 + .5 * 15 = 5 so this is a better policy than the first one and it should be easy to see that it's optimal.
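The computation above can be sketched in a few lines of code (a hypothetical toy model, not anything from the post itself; the policy representation and names here are my own):

```python
# Toy example: two equally probable utility functions, +1 (sum of
# state labels over time) and -1 (the negative of that). The agent
# picks a first action, observes the resulting reward's sign, then
# picks a second action.

def expected_utility(policy):
    """policy["first"] is the opening action; policy["second"] maps
    the first observed reward to the second action."""
    total = 0.0
    for sign in (+1, -1):  # the two equally probable utility functions
        first_action = policy["first"]
        first_reward = sign * first_action
        second_action = policy["second"](first_reward)
        # Utility is the (signed) sum of state labels visited.
        total += 0.5 * sign * (first_action + second_action)
    return total

constant = {"first": 5, "second": lambda r: 5}
adaptive = {"first": 5, "second": lambda r: 10 if r > 0 else 0}

print(expected_utility(constant))  # 0.0
print(expected_utility(adaptive))  # 5.0
```

As in the text: the constant policy's utilities cancel to 0, while the adaptive policy gets .5 * -5 + .5 * 15 = 5.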

Hope I understood the idea correctly and that this helps to explain it?

Is the point you are trying to make different from the one in Learning What to Value? (Specifically, the point about observation-utility maximizers.) If so, how?

It looks closer to the Value Learning Agent in that paper to me and maybe can be considered an implementation / specific instance of that? (Although I haven't tried to figure out whether that's mathematically / formally the case.)

Something that confuses me is that since the evaluator sees everything the agent sees/does, it's not clear how the agent can deceive the evaluator at all. Can someone provide an example in which the agent has an opportunity to deceive in some sense and declines to do that in the optimal policy?

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-06T16:31:09.414Z · score: 13 (3 votes) · LW · GW

It seems like you need this labeled DISCLAIMER so you can perform ACTION: CONSIDER WHETHER NOT TO READ instead of, well, parsing the information and acting based on the model it gives you.

Model building is cognitively taxing. I usually just go by the general expectation that someone wouldn't post on LW unless they think most readers would get positive value from reading it. It seems right to disclaim this when you already have a model and your model doesn't predict this.

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-06T15:55:55.380Z · score: 5 (2 votes) · LW · GW

but at this point I’ve basically given up on most real engagement by people I’m not in a direct dialogue with and am throwing these things up on a sort of just-in-case basis and trying not to do extra work that I don’t expect to pay off.

Thanks for the clarification, but if I had known this earlier, I probably would have invested less time/effort trying to understand these posts. Maybe you could put this disclaimer on top of your dialog posts in the future for the benefit of other readers?

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-06T15:42:24.285Z · score: 3 (1 votes) · LW · GW

I expect the vast majority of likely initiatives that claim to be the specific thing I mentioned to be fake

I don't understand this and how it relates to the second part of the sentence.

and people should judge EA on whether it generates that class of idea and acts on it in ways that could actually work (or more indirectly, whether EAs talk as though they’ve already thought this sort of thing through), not whether it tries to mimic specific suggestions I give.

I'm not convinced there exists a promising idea within the class that you're pointing to (as far as I can understand it), so absence of evidence that EA has thought things through in that direction doesn't seem to show anything from my perspective. In other words, they could just have an intuition similar to mine that there's no promising idea in that class so there's no reason to explore more in that direction.

Comment by wei_dai on Should Effective Altruism be at war with North Korea? · 2019-05-06T07:52:30.675Z · score: 32 (9 votes) · LW · GW

I wish the three recent dialog posts from you were instead written as conventional posts because they don't include abstracts/summaries or much context, it's hard to skim them to try to figure out what they are about (there are no section headings, the conversation moves from topic to topic based on how the interlocutor happens to respond instead of in some optimized way, and they have to be read as a linear dialog to make much sense), and the interlocutor often fails to ask questions that I'd like to have answered or fails to give counterarguments that I'd like to see addressed (whereas in a conventional post the author is more likely to try to anticipate more common questions/counterarguments and answer them).

For example I think if this post were written as a conventional post you probably would have clarified whether the "compromise version of Utilitarianism" is supposed to be a compromise with the NK people or with the NK government since that seems like an obvious question that a lot of people would have (and someone did ask on Facebook), as well as addressed some rather obvious problems with the proposal (whichever one you actually meant).

Comment by wei_dai on Meta-tations on Moderation: Towards Public Archipelago · 2019-05-02T02:34:35.516Z · score: 3 (1 votes) · LW · GW

Similarly, certain worldviews and approaches to problem-solving are overrepresented here relative to the broader community, and these aren’t necessarily the ones I most want to hear from.

I'm curious which worldviews and approaches you saw as over-represented, and which are the ones you most wanted to hear from, and whether anything has changed since you wrote this comment.

Maybe this just boils down to the problem of my friends not being on here and it’s not worth your time to try to solve. But it still feels like a problem.

Are your friends here now? If not, why?

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-30T16:04:20.017Z · score: 5 (2 votes) · LW · GW

Stuart Armstrong wrote a post that argued for merged utility functions of this form (plus a tie-breaker), but there are definitely things, like different priors and logical uncertainty, which the argument doesn't take into account, that make it unclear what the actual form of the utility function would be (or if the merged AI would even be doing expected utility maximization). I'm curious what your own reason for doubting it is.

Comment by wei_dai on Asymmetric Justice · 2019-04-30T04:51:43.394Z · score: 7 (3 votes) · LW · GW

Robin Hanson's Taboo Gradations (which was written after this post) seems related in that it's also about a non-linearity in our mental accounting system for social credit/blame. Might be a good idea to try to build a model that can explain both phenomena at the same time.

Comment by wei_dai on Asymmetric Justice · 2019-04-29T22:31:08.114Z · score: 9 (4 votes) · LW · GW

The relevance to FAI is that any group trying to design one (or really design anything substantively new from first principles) needs to be able to have internal communication that is really, really robustly not made out of telling each other to do specific things, and it seems like the default expectation, including in Rationalist circles, has increasingly become that words are not communicative unless they are commands.

I haven't seen this myself. If you want, I can point you to any number of posts on the Alignment Forum that are not made out of telling each other to do specific things. Can you give some examples of what you've seen that made you say this?

None of the work people were doing several years ago on decision theory was like this.

Again, I'm not really seeing this now either.

I took this to mean that but for this sentence (which I took to be a superfluous conclusion-flavored end, and Zvi agrees wasn’t part of the core content of the post), you wouldn’t have focused on the question of what specific actions the post was asking the reader to perform.

Probably, but I'm not totally sure. I guess unless the (counterfactual) conclusion said something to the effect of "This seems bad, and I'm not sure what to do about it" I might have asked something like "The models in this post don't seem to include enough gears for me to figure out what I can do to help with the situation. Do you have any further thoughts about that?" And then maybe he would have either said "I'm still trying to figure that out" in which case the conversation would have ended, or maybe he would have said "I think we should try not to use asymmetrical mental point systems unless structurally necessary" and then we would have had the same debate about whether that implication is justified or not.

(I'm not sure where this line of question is leading... Also I still don't understand why you're calling it a "technical" error. If the mistake was writing a superfluous conclusion-flavored end, wouldn't "rhetorical" or "presentation" error be more appropriate? What is technical about the error?)

Comment by wei_dai on Asymmetric Justice · 2019-04-29T08:49:09.709Z · score: 3 (1 votes) · LW · GW

I agree that Zvi made a technical error in the conclusion, in a way that reliably caused misinterpretation towards construing things as calls to action, and that it was good to point this out. Nothing amiss here.

This summary seems wrong or confused or confusing to me.

  1. What is the actual error you have in mind? (I myself have made a couple of different criticisms about the post but I'm not sure any of them fits your description of "minor technical error" that "reliably caused misinterpretation towards construing things as calls to action".)
  2. "Call to action" is apparently a loaded term with negative connotations among the mods and perhaps others here (which I wasn't previously aware of). Are you using it in this derogatory sense or some other sense?
  3. Zvi himself has confirmed that his original conclusion was intended as a call to action, albeit an "incidental" one. Why do you keep saying that there wasn't a call to action, and that "call to action" is a misinterpretation?

But, the fact that this minor technical error was so important relative to the rest of the post is, itself, a huge red flag that something is wrong with our discourse, and we should be trying to figure that out if we think something like FAI might turn out to be important.

I believe there have been several different layers of confusion happening in this episode (and there may continue to be), which has contributed to the large number of comments written about it and maybe a sense that it's more important than the rest of the post. Also, again, depending on exactly what you mean, I'm not sure I'd agree with "minor technical error". It seems like some of my own criticisms of the post were actually fairly substantial, and combined with the aforementioned confusions and the fact that disagreements will naturally generate more discussion than agreements, I don't understand why you think there is a "huge red flag that something is wrong with our discourse" here. I wanted to disengage, as I'm not sure continuing to participate in this debate (including trying to fully resolve all the layers of confusion) is the best use of my time, but I'm happy to listen to you explain more if you still think this is actually important or has relevance to FAI.

Comment by wei_dai on Asymmetric Justice · 2019-04-28T23:48:16.462Z · score: 7 (3 votes) · LW · GW

Agreed. And I think I was implicitly focusing on whether the post gave a sufficient explanation for its (original) conclusion, and was rather confused why others were so focused on whether there was a call to action or not (which, without knowing the context of your private discussions, I just interpreted to mean any practical suggestion).

Comment by wei_dai on Asymmetric Justice · 2019-04-28T19:19:55.922Z · score: 3 (1 votes) · LW · GW

(If the current mod team got hit by a truck and new people took over and tried to implement our “no calls to action on frontpage” rule without understanding it, I predict they wouldn’t get the nuances right).

When did this rule come into effect and where is it written down? The closest thing I can find in Frontpage Posting and Commenting Guidelines is:

A corollary of 1.3 is that we often prefer descriptive language (including language describing your current beliefs, emotional state, etc.) over prescriptive language, all else being equal.

Which seems pretty far from “no calls to action on frontpage” and isn't even in the "Things to keep to a minimum" or "Off-limits things" section.

(If I had been aware of this rule and surrounding discussions about it, maybe I would have been more sensitive about "accusing" someone of making a call to action, which to be clear wasn't my intention at all since I didn't even know such a rule existed.)

Comment by wei_dai on Asymmetric Justice · 2019-04-28T09:04:57.459Z · score: 3 (1 votes) · LW · GW

It felt very Copenhagen Interpretation—I’d interacted with the problem of what to do about it and thus was to blame for not doing more or my solution being incomplete.

I disagree with this framing. I think there's a difference between criticism (pointing out flaws in an idea or presentation or argument) and blame, and I was trying to engage in the former. I wrote a longer reply to one of your comments trying to explain this more but then deleted it because I feel like disengaging at this point. Initially I was just confused about what the conclusion of the post was trying to say and posted a comment about that, which drew me into a more substantive debate, and on reflection I don't think this is actually a debate that I need to be involved in.

Comment by wei_dai on Asymmetric Justice · 2019-04-28T07:02:18.383Z · score: 3 (1 votes) · LW · GW

and then is guilty if the call-to-action isn’t sufficiently well specified and doesn’t give concrete explicit paths to making progress that seem realistic and to fit people’s incentives and so on?

"Guilty" is a strange framing here, and seems to come out of nowhere except for the fact that the post itself is about justice. I believe I started by pointing out a confusion in the conclusion, and when the confusion was clarified, suggested that the conclusion was insufficiently justified. If that is equivalent to accusing someone of being guilty and therefore a bad thing to do, I don't know how we're supposed to criticize any ideas here.

Call to action, and the calling thereof, is an action, and thus makes one potentially blameworthy in various ways for being insufficient, whereas having no call to action would have been fine.

I don't think call to action per se was the problem, but rather the unclear statement of it and the insufficient justification. If it was some other kind of conclusion that was equally unclear or insufficiently justified, I think it would be equally worthy of criticism (which I deliberately use in place of "blame"), by which I just mean that flaws in an idea or presentation should be pointed out.

You’ve interacted with the problem, and thus by CIE are responsible for not doing more. So one must not interact with the problem in any real way, and ensure that one isn’t daring to suggest anything get done.

This has not been my experience, in interacting with the AI safety problem, for example. No one has accused me of not doing more, or said I should be blamed for making the suggestions that I've made. Sure, people have sometimes criticized my ideas as being wrong or lacking good arguments, but I don't think there has been any intention to assign guilt or blame or punishment, and I don't think there has been any such intention here either.

Comment by wei_dai on Asymmetric Justice · 2019-04-28T03:25:11.330Z · score: 13 (3 votes) · LW · GW

I am confused why it is unreasonable to suggest to people that, as a first step to correcting a mistake, that they themselves stop making it.

My reasoning is that 1) the problem could be a coordination problem. If it is, then telling people to individually stop making the mistake does nothing or just hurts the people who listen, without making the world better off as a whole. If it's not a coordination problem, then 2) there's still a high probability that it's a Chesterton's fence, and I think your post didn't do enough to rule that out either.

now it is unreasonable to suggest someone might do the right thing on their own in addition to any efforts to make that a better plan or to assist with abilities to coordinate

Maybe my position is more understandable in light of the Chesterton's fence concern? (Sorry that my critique is coming out in bits and pieces, but originally I just couldn't understand what the ending meant, then the discussion got a bit side-tracked onto whether there was a call to action or not, etc.)

I’d also challenge the idea that only the group’s conclusions on what is just matter, or that the goal of forming conclusions about what is just is to reach the same conclusion as the group, meaning that justice becomes ‘that which the group chooses to coordinate on.’

This seems like a strawman or a misunderstanding of my position. I would say that generally there could be multiple things that the group could choose to coordinate on (i.e., multiple equilibria in terms of game theory) or we could try to change what the group coordinates on by changing the rules of the game, so I would disagree that "the goal of forming conclusions about what is just is to reach the same conclusion as the group". My point is instead that we can't arbitrarily choose "where the coordination is going to land" and we need better models to figure out what's actually feasible.

Comment by wei_dai on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-27T18:59:32.175Z · score: 15 (4 votes) · LW · GW

(This is an unrelated question about LW that I'd like the LW team to see, but don't think needs its own post, so I'm posting it here.) I want to mention that it remains frustrating when someone says something like "I'm open to argument" or we're already in the middle of a debate, I give them an argument, and just hear nothing back. I've actually kind of gotten used to it a bit, and don't feel as frustrated as I used to, but it's still pretty much the strongest negative emotion I ever experience when participating on LW.

I believe there are good reasons to address this aside from my personal feelings, but I'm not sure if I'm being objective about that. So I'm interested to know whether this is something that's on the LW team's radar as a problem that could potentially be solved/ameliorated, or if they think it's not worth solving or probably can't be solved or it's more of a personal problem than a community problem. (See this old feature suggestion which I believe I've also re-submitted more recently, which might be one way to try to address the problem.)

Comment by wei_dai on Asymmetric Justice · 2019-04-27T01:04:57.550Z · score: 5 (2 votes) · LW · GW

“this seems like a problem, no idea what to do about it”

I think this is fine if made clear, but the post seemed to be implying (which the author later confirmed) that it did offer action-relevant implications.

Comment by wei_dai on Asymmetric Justice · 2019-04-27T00:11:17.051Z · score: 5 (2 votes) · LW · GW

[...] the models seem straightforwardly good.

Part of my complaint was that the models didn't seem to include enough gears for me to figure out what I could do to make things better. The author's own conclusion, which he later clarified in the comments, seems to be that we should individually do less of the thing that he suggests is bad. But my background assumption is that group rationality problems are usually coordination problems, so it usually doesn't help much to tell people to individually "do the right thing". That would be analogous to telling players in PD to just play cooperate. At this point I still don't know whether or why the author's call to action would work better than telling players in PD to just play cooperate.
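To make the PD analogy concrete, here is a minimal sketch (with the standard illustrative payoff values) showing why "just play cooperate" fails as individual advice: defection is a best response no matter what the other player does, so the advice only hurts whoever follows it unilaterally.

```python
# Standard Prisoner's Dilemma payoffs (row player's payoff listed first);
# the specific numbers 3/0/5/1 are the conventional illustrative values.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def best_response(opponent_move):
    """Return the move maximizing the row player's payoff
    against a fixed opponent move."""
    return max(["C", "D"], key=lambda m: PAYOFF[(m, opponent_move)][0])

# Defection is the best response whether the opponent cooperates or defects,
# so exhortation alone can't sustain cooperation without changing the game.
assert best_response("C") == "D"
assert best_response("D") == "D"
```

This is why coordination problems generally need changed incentives or enforcement mechanisms rather than individual exhortation.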

Comment by wei_dai on Asymmetric Justice · 2019-04-26T23:55:01.576Z · score: 8 (3 votes) · LW · GW

It becomes harder to form a good ending. You can’t just delete that line without substituting another ending.

My main complaint was that I just couldn't tell what you were trying to say with the current ending. If you're open to suggestions, I'd replace the last few lines with something like this instead:

If this analysis is correct, it suggests that we should avoid using asymmetric mental point systems except where structurally necessary. For example, the next time you're in situation ..., consider doing ... instead of ...


If we can’t put an incidental/payoff call to implied action into an analysis piece, then the concrete steps this suggests won’t get taken. People might think ‘this is interesting’ but not know what to do with it, and thus discard the presented model as unworthy of their brain space.

It's not clear to me whether you're saying A) people ought to keep this model in their brain even if there was no practical implication, but they won't in practice, so you had to give them one, or B) it's reasonable to demand a practical implication before making space for a model in one's brain which is why you included one in the post. If the latter, it seems like the practical implication / call to action shouldn't just be incidental, but significant space should be devoted to spelling it out clearly and backing it up by analysis/argument, so that if it was wrong it could be critiqued (which would allow people to discard the model after all).

Comment by wei_dai on Asymmetric Justice · 2019-04-26T19:21:45.621Z · score: 3 (1 votes) · LW · GW

If you allowed better capture of the upside then it would make sense to make them own more downside.

I thought Owen made a good case in the podcast that we currently have more mechanisms in place to fix/workaround the "insufficient capture of the upside" problem than the "insufficient capture of the downside" problem, as far as scientific research is concerned. (See also the related paper.) I would be interested to see the two of you engage each other's arguments directly.

The intention of the last line was, avoid using asymmetric mental point systems except where structurally necessary, and be-a-conclusion.

Do you have an explanation of why we currently often use asymmetric mental point systems when it's not structurally necessary? My general expectation is that when it comes to deficiencies in human group rationality, there are usually economic / game theoretic reasons for them to exist, so you can't fix it by saying "just don't do that".

Comment by wei_dai on Asymmetric Justice · 2019-04-26T19:09:01.690Z · score: 13 (3 votes) · LW · GW

What’s the model whereby a LessWrong post ought to have a “takeaway message” or “call to action”?

I was trying to figure out what "Let us all be wise enough to aim higher." was intended to mean. It seemed like it was either a "takeaway message" (e.g., suggestion or call to action), or an applause light (i.e., something that doesn't mean anything but sounds good), hence my question.

Zvi’s post seems like it’s in the analysis genre, where an existing commonly represented story about right action is critiqued.

I guess the last sentence threw me, since it seems out of place in the analysis genre?

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T12:16:51.027Z · score: 6 (2 votes) · LW · GW

War only happens if two agents don’t have common knowledge about who would win (otherwise they’d agree to skip the costs of war).

But that assumes a strong ability to enforce agreements (which humans typically lack). For example, suppose it's common knowledge that if countries A and B went to war, A would conquer B with probability .9 and it would cost each side $1 trillion. If they could enforce agreements, then they could agree to roll a 10-sided die in place of the war and save $1 trillion each, but if they couldn't, then A would go to war with B anyway if it lost the roll, so now B has a .99 probability of being taken over. Alternatively, maybe B agrees to be taken over by A with certainty but get some compensation to cover the .1 chance that it doesn't lose the war. But after taking over B, A could just expropriate all of B's property including the compensation that it paid.
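The arithmetic in this example can be checked directly (a minimal sketch using the hypothetical numbers from the comment: a .9 win probability and $1 trillion in war costs per side):

```python
p_A_wins = 0.9   # probability A conquers B in an actual war
p_die = 0.9      # the die roll substitutes for war: A "wins" on 9 of 10 faces

# With enforceable agreements, the die roll fully replaces the war,
# so B is taken over with probability .9 and both sides save $1T each.
p_B_taken_with_enforcement = p_die

# Without enforcement, A fights anyway whenever it loses the roll,
# so B is taken over if A wins the roll OR wins the subsequent war:
p_B_taken_without_enforcement = p_die + (1 - p_die) * p_A_wins

print(p_B_taken_without_enforcement)  # ~0.99
```

So the inability to enforce the agreement both raises B's probability of being taken over from .9 to .99 and leaves the expected war costs on the table.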

Comment by wei_dai on Asymmetric Justice · 2019-04-26T11:52:13.963Z · score: 9 (5 votes) · LW · GW

Too often we assign risk without reward.

Sometimes we assign too little risk though. Owen Cotton-Barratt made this point in Why daring scientists should have to get liability insurance. Maybe assigning too much risk is worse by frequency, but assigning too little risk is worse by expected impact. In other words, a few cases of assigning too little risk, leading to increased x-risk, could easily overwhelm many cases of "assign risk without reward."

Also, this post doesn't seem to go into the root causes of "Too often we assign risk without reward," which leaves me wondering how we are supposed to fix the problem (assuming the problem is worth trying to fix). The last sentence "Let us all be wise enough to aim higher." sounds more like an applause light than a substantive suggestion or call to action. It confuses me that the post is so highly upvoted yet I have little idea what the takeaway message is supposed to be.

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T11:14:58.771Z · score: 3 (1 votes) · LW · GW

I think superintelligent AI will probably have superhuman capability at cheating in an absolute sense, i.e., they'll be much better than humans at cheating humans. But I don't see a reason to think they'll be better at cheating other superintelligent AI than humans are at cheating other humans, since SAI will also be superhuman at detecting and preventing cheating.

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T11:00:42.304Z · score: 4 (2 votes) · LW · GW

A big obstacle to human cooperation is bargaining: deciding how to split the benefit from cooperation. If it didn’t exist, I think humans would cooperate more.

Can you give some examples of where human cooperation is mainly being stopped by difficulty with bargaining? It seems to me like enforcing deals is usually the bigger part of the problem. For example in large companies there are a lot of inefficiencies like shirking, monitoring costs to reduce shirking, political infighting, empire building, CYA, red tape, etc., which get worse as companies get bigger. It sure seems like enforcement (i.e., there's no way to enforce a deal where everyone agrees to stop doing these things) rather than bargaining is the main problem there.

Or consider the inefficiencies in academia, where people often focus more on getting papers published than working on the most important problems. I think that's mainly because an agreement to reward people for publishing papers is easily enforceable, while an agreement to reward people for working on the most important problems isn't. I don't see how improved bargaining would solve this problem.

Do you have any mechanisms in mind that would make bargaining easier for AIs?

I haven't thought about this much, but perhaps if AIs had introspective access to their utility functions, that would make it easier for them to make use of formal bargaining solutions that take utility functions as inputs? Generally it seems likely that AIs will be better at bargaining than humans, for the same kind of reason as here, but AFAICT just making enforcement easier would probably suffice to greatly reduce coordination costs.
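For instance, agents with introspective access to their utility functions could feed them into a formal solution concept such as Nash bargaining, which picks the division maximizing the product of utility gains over each side's disagreement payoff. Here is a toy sketch (the utility functions and the brute-force grid search are illustrative assumptions, not anything from the comment):

```python
def nash_bargaining_split(u1, u2, d1=0.0, d2=0.0, steps=10_000):
    """Find the share x of a unit resource going to agent 1 that maximizes
    the Nash product (u1(x) - d1) * (u2(1 - x) - d2), via grid search.
    d1, d2 are the disagreement (no-deal) payoffs."""
    best_x, best_product = None, float("-inf")
    for i in range(steps + 1):
        x = i / steps
        g1, g2 = u1(x) - d1, u2(1 - x) - d2
        if g1 >= 0 and g2 >= 0 and g1 * g2 > best_product:
            best_x, best_product = x, g1 * g2
    return best_x

# Two agents with identical linear utilities split the resource evenly:
split = nash_bargaining_split(lambda x: x, lambda x: x)
print(round(split, 3))  # 0.5
```

The point is only that once utility functions are mutually inspectable, the division of surplus becomes a computation rather than a costly negotiation.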

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T04:06:44.016Z · score: 8 (4 votes) · LW · GW

Yes, but humans generally hand off resources to their children as late as possible (whereas the AIs in my scheme would do so as soon as possible) which suggests that coordination is not the primary purpose for humans to have children.

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-26T03:00:20.921Z · score: 9 (4 votes) · LW · GW

Have you considered the specific mechanism that I proposed, and if so what do you find implausible about it? (If not, see this longer post or this shorter comment.)

I did manage to find a quote from you that perhaps explains most of our disagreement on this specific mechanism:

There are many other factors that influence coordination, after all; even perfect value matching is consistent with quite poor coordination.

Can you elaborate on what these other factors are? It seems to me that most coordination costs in the real world come from value differences, so it's puzzling to see you write this.

Abstracting away from the specific mechanism, as a more general argument, AI designers or evolution will (sooner or later) be able to explore a much larger region of mind design space than biological evolution could. Within this region there are bound to be minds much better at coordination than humans, and we should certainly expect coordination ability to be one objective that AI designers or evolution will optimize for, since it offers a significant competitive advantage.

This doesn't guarantee that the designs that end up "winning" will have much better coordination ability than humans, because maybe the designers/evolution will be forced to trade off coordination ability for something else they value, to the extent that the "winners" don't coordinate much better than humans. But that doesn't seem like something we should expect to happen by default, without some specific reason to, and it becomes less and less likely as more and more of mind design space is explored.

Comment by wei_dai on Strategic implications of AIs' ability to coordinate at low cost, for example by merging · 2019-04-25T07:21:27.778Z · score: 13 (6 votes) · LW · GW

One possible way for AIs to coordinate with each other is for two or more AIs to modify their individual utility functions into some compromise utility function, in a mutually verifiable way, or equivalently to jointly construct a successor AI with the same compromise utility function and then hand over control of resources to the successor AI. This simply isn't something that humans can do.

Comment by wei_dai on [Answer] Why wasn't science invented in China? · 2019-04-25T07:03:09.053Z · score: 8 (6 votes) · LW · GW

Classical Chinese is a language extremely difficult to master. It literally take decades of effort to be able to write a decent piece. It is hard not because of complicated grammar or complex sentence structure. But because it focus on poetic expressions and scholarly idioms.

Sounds like writing became mainly a way to signal one's intelligence and erudition, instead of a tool for efficient communication. But why didn't Western civilization fall into the same trap, or how did it manage to get out of it?

Strategic implications of AIs' ability to coordinate at low cost, for example by merging

2019-04-25T05:08:21.736Z · score: 42 (17 votes)
Comment by wei_dai on Where to Draw the Boundaries? · 2019-04-21T20:53:49.732Z · score: 14 (4 votes) · LW · GW

Thanks, I think I have a better idea of what you're proposing now, but I'm still not sure I understand it correctly, or if it makes sense.

mice and elephants form a cluster if you project into the subspace spanned by “color” and “relative ear size”, but using a word to point to a cluster in such a “thin”, impoverished subspace is a dishonest rhetorical move when your interlocutors are trying to use language to mostly talk about the many other features of animals which don’t covary much with color and relative-ear-size.

But there are times when it's not a dishonest rhetorical move to do this, right? For example, suppose an invasive predator species has moved into some new area, and I have a hypothesis that animals with grey skin and big ears might be the only ones in that area who can escape being hunted to extinction (because I think the predator has trouble seeing grey, big ears are useful for hearing the predator, and only this combination of traits offers enough advantage for a prey species to survive). While I'm formulating this hypothesis, discussing how plausible it is, applying for funding, doing field research, etc., it seems useful to create a new term like "eargreyish" so I don't have to keep repeating "grey animals with relatively large ears".

Since it doesn't seem to make sense to never use a word to point to a cluster in a "thin" subspace, what is your advice for when it's ok to do this or accept others doing this?

Comment by wei_dai on Announcement: AI alignment prize round 4 winners · 2019-04-19T22:30:58.962Z · score: 3 (1 votes) · LW · GW

Whose time do you mean? The judges? Your own time? The participants' time?

Comment by wei_dai on What failure looks like · 2019-04-17T21:08:07.976Z · score: 3 (1 votes) · LW · GW

The key issue here is whether there will be coordination between a set of influence-seeking systems that can cause (and will benefit from) a catastrophe, even when other systems are opposing them.

Do you not expect this threshold to be crossed sooner or later, assuming AI alignment remains unsolved? Also, it seems like the main alternative to this scenario is that the influence-seeking systems expect to eventually gain control of most of the universe anyway (even without a "correlated automation failure"), so they don't see a reason to "rock the boat" and try to dispossess humans of their remaining influence/power/resources, but this is almost as bad as the "correlated automation failure" scenario from an astronomical waste perspective. (I'm wondering if you're questioning whether things will turn out badly, or questioning whether things will turn out badly this way.)

Comment by wei_dai on What failure looks like · 2019-04-17T06:12:43.814Z · score: 7 (3 votes) · LW · GW

(Upvoted because I think this deserves more clarification/discussion.)

I'm not sure I understand this part. The influence-seeking systems which have the most influence also have the most to lose from a catastrophe. So they'll be incentivised to police each other and make catastrophe-avoidance mechanisms more robust.

I'm not sure either, but I think the idea is that once influence-seeking systems gain a certain amount of influence, it may become faster or more certain for them to gain more influence by causing a catastrophe than to continue to work within existing rules and institutions. For example they may predict that unless they do that, humans will eventually coordinate to take back the influence that humans lost, or they may predict that during such a catastrophe they can probably expropriate a lot of resources currently owned by humans and gain much influence that way, or humans will voluntarily hand more power to them in order to try to use them to deal with the catastrophe.

As an analogy: we may already be past the point where we could recover from a correlated "world leader failure": every world leader simultaneously launching a coup. But this doesn't make such a failure very likely, unless world leaders also have strong coordination and commitment mechanisms between themselves (which are binding even after the catastrophe).

I think such a failure can happen without especially strong coordination and commitment mechanisms. Something like this happened during the Chinese Warlord Era, when many military commanders became warlords during a correlated "military commander failure", and similar things probably happened many times throughout history. I think what's actually preventing a "world leader failure" today is that most world leaders, especially of the rich democratic countries, don't see any way to further their own values by launching coups in a correlated way. In other words, what would they do afterwards if they did launch such a coup, that would be better than just exercising the power that they already have?

Comment by wei_dai on Where to Draw the Boundaries? · 2019-04-16T15:54:53.757Z · score: 18 (6 votes) · LW · GW

My interest in terminological debates is usually not to discover new ideas but to try to prevent confusion (when readers are likely to infer something wrong from a name, e.g., because of different previous usage or because a compound term is defined to mean something that's different from what one would reasonably infer from the combination of individual terms). But sometimes terminological debates can uncover hidden assumptions and lead to substantive debates about them. See here for an example.

Comment by wei_dai on Where to Draw the Boundaries? · 2019-04-14T21:53:02.771Z · score: 15 (8 votes) · LW · GW

As someone who seems to care more about terminology than most (and as a result probably gets into more terminological debates on LW than anyone else (see 1 2 3 4)), I don't really understand what you're suggesting here. Do you think this advice is applicable to any of the above examples of naming / drawing boundaries? If so, what are its implications in those cases? If not, can you give a concrete example that might come up on LW or otherwise have some relevance to us?

Please use real names, especially for Alignment Forum?

2019-03-29T02:54:20.812Z · score: 30 (10 votes)

The Main Sources of AI Risk?

2019-03-21T18:28:33.068Z · score: 61 (24 votes)

What's wrong with these analogies for understanding Informed Oversight and IDA?

2019-03-20T09:11:33.613Z · score: 37 (8 votes)

Three ways that "Sufficiently optimized agents appear coherent" can be false

2019-03-05T21:52:35.462Z · score: 68 (17 votes)

Why didn't Agoric Computing become popular?

2019-02-16T06:19:56.121Z · score: 53 (16 votes)

Some disjunctive reasons for urgency on AI risk

2019-02-15T20:43:17.340Z · score: 37 (10 votes)

Some Thoughts on Metaphilosophy

2019-02-10T00:28:29.482Z · score: 55 (15 votes)

The Argument from Philosophical Difficulty

2019-02-10T00:28:07.472Z · score: 47 (13 votes)

Why is so much discussion happening in private Google Docs?

2019-01-12T02:19:19.332Z · score: 82 (23 votes)

Two More Decision Theory Problems for Humans

2019-01-04T09:00:33.436Z · score: 58 (19 votes)

Two Neglected Problems in Human-AI Safety

2018-12-16T22:13:29.196Z · score: 75 (24 votes)

Three AI Safety Related Ideas

2018-12-13T21:32:25.415Z · score: 73 (26 votes)

Counterintuitive Comparative Advantage

2018-11-28T20:33:30.023Z · score: 70 (25 votes)

A general model of safety-oriented AI development

2018-06-11T21:00:02.670Z · score: 70 (23 votes)

Beyond Astronomical Waste

2018-06-07T21:04:44.630Z · score: 92 (40 votes)

Can corrigibility be learned safely?

2018-04-01T23:07:46.625Z · score: 73 (25 votes)

Multiplicity of "enlightenment" states and contemplative practices

2018-03-12T08:15:48.709Z · score: 93 (23 votes)

Online discussion is better than pre-publication peer review

2017-09-05T13:25:15.331Z · score: 12 (12 votes)

Examples of Superintelligence Risk (by Jeff Kaufman)

2017-07-15T16:03:58.336Z · score: 5 (5 votes)

Combining Prediction Technologies to Help Moderate Discussions

2016-12-08T00:19:35.854Z · score: 13 (14 votes)

[link] Baidu cheats in an AI contest in order to gain a 0.24% advantage

2015-06-06T06:39:44.990Z · score: 14 (13 votes)

Is the potential astronomical waste in our universe too small to care about?

2014-10-21T08:44:12.897Z · score: 25 (27 votes)

What is the difference between rationality and intelligence?

2014-08-13T11:19:53.062Z · score: 13 (13 votes)

Six Plausible Meta-Ethical Alternatives

2014-08-06T00:04:14.485Z · score: 42 (43 votes)

Look for the Next Tech Gold Rush?

2014-07-19T10:08:53.127Z · score: 39 (37 votes)

Outside View(s) and MIRI's FAI Endgame

2013-08-28T23:27:23.372Z · score: 16 (19 votes)

Three Approaches to "Friendliness"

2013-07-17T07:46:07.504Z · score: 20 (23 votes)

Normativity and Meta-Philosophy

2013-04-23T20:35:16.319Z · score: 12 (14 votes)

Outline of Possible Sources of Values

2013-01-18T00:14:49.866Z · score: 14 (16 votes)

How to signal curiosity?

2013-01-11T22:47:23.698Z · score: 21 (22 votes)

Morality Isn't Logical

2012-12-26T23:08:09.419Z · score: 19 (35 votes)

Beware Selective Nihilism

2012-12-20T18:53:05.496Z · score: 40 (44 votes)

Ontological Crisis in Humans

2012-12-18T17:32:39.150Z · score: 44 (48 votes)

Reasons for someone to "ignore" you

2012-10-08T19:50:36.426Z · score: 23 (24 votes)

"Hide comments in downvoted threads" is now active

2012-10-05T07:23:56.318Z · score: 18 (30 votes)

Under-acknowledged Value Differences

2012-09-12T22:02:19.263Z · score: 47 (50 votes)

Kelly Criteria and Two Envelopes

2012-08-16T21:57:41.809Z · score: 11 (8 votes)

Cynical explanations of FAI critics (including myself)

2012-08-13T21:19:06.671Z · score: 21 (32 votes)

Work on Security Instead of Friendliness?

2012-07-21T18:28:44.692Z · score: 37 (40 votes)

Open Problems Related to Solomonoff Induction

2012-06-06T00:26:10.035Z · score: 27 (28 votes)

List of Problems That Motivated UDT

2012-06-06T00:26:00.625Z · score: 28 (29 votes)

How can we ensure that a Friendly AI team will be sane enough?

2012-05-16T21:24:58.681Z · score: 10 (15 votes)

Neuroimaging as alternative/supplement to cryonics?

2012-05-12T23:26:28.429Z · score: 17 (18 votes)

Strong intutions. Weak arguments. What to do?

2012-05-10T19:27:00.833Z · score: 17 (19 votes)

How can we get more and better LW contrarians?

2012-04-18T22:01:12.772Z · score: 58 (62 votes)

Reframing the Problem of AI Progress

2012-04-12T19:31:04.829Z · score: 28 (31 votes)

against "AI risk"

2012-04-11T22:46:10.533Z · score: 24 (37 votes)