Against Premature Abstraction of Political Issues 2019-12-18T20:19:53.909Z · score: 60 (19 votes)
What determines the balance between intelligence signaling and virtue signaling? 2019-12-09T00:11:37.662Z · score: 60 (16 votes)
Ways that China is surpassing the US 2019-11-04T09:45:53.881Z · score: 50 (19 votes)
List of resolved confusions about IDA 2019-09-30T20:03:10.506Z · score: 98 (33 votes)
Don't depend on others to ask for explanations 2019-09-18T19:12:56.145Z · score: 78 (25 votes)
Counterfactual Oracles = online supervised learning with random selection of training episodes 2019-09-10T08:29:08.143Z · score: 45 (12 votes)
AI Safety "Success Stories" 2019-09-07T02:54:15.003Z · score: 102 (30 votes)
Six AI Risk/Strategy Ideas 2019-08-27T00:40:38.672Z · score: 61 (30 votes)
Problems in AI Alignment that philosophers could potentially contribute to 2019-08-17T17:38:31.757Z · score: 83 (34 votes)
Forum participation as a research strategy 2019-07-30T18:09:48.524Z · score: 112 (37 votes)
On the purposes of decision theory research 2019-07-25T07:18:06.552Z · score: 65 (21 votes)
AGI will drastically increase economies of scale 2019-06-07T23:17:38.694Z · score: 41 (15 votes)
How to find a lost phone with dead battery, using Google Location History Takeout 2019-05-30T04:56:28.666Z · score: 53 (24 votes)
Where are people thinking and talking about global coordination for AI safety? 2019-05-22T06:24:02.425Z · score: 94 (32 votes)
"UDT2" and "against UD+ASSA" 2019-05-12T04:18:37.158Z · score: 49 (16 votes)
Disincentives for participating on LW/AF 2019-05-10T19:46:36.010Z · score: 79 (34 votes)
Strategic implications of AIs' ability to coordinate at low cost, for example by merging 2019-04-25T05:08:21.736Z · score: 56 (22 votes)
Please use real names, especially for Alignment Forum? 2019-03-29T02:54:20.812Z · score: 40 (13 votes)
The Main Sources of AI Risk? 2019-03-21T18:28:33.068Z · score: 70 (29 votes)
What's wrong with these analogies for understanding Informed Oversight and IDA? 2019-03-20T09:11:33.613Z · score: 39 (9 votes)
Three ways that "Sufficiently optimized agents appear coherent" can be false 2019-03-05T21:52:35.462Z · score: 69 (18 votes)
Why didn't Agoric Computing become popular? 2019-02-16T06:19:56.121Z · score: 53 (16 votes)
Some disjunctive reasons for urgency on AI risk 2019-02-15T20:43:17.340Z · score: 37 (10 votes)
Some Thoughts on Metaphilosophy 2019-02-10T00:28:29.482Z · score: 57 (16 votes)
The Argument from Philosophical Difficulty 2019-02-10T00:28:07.472Z · score: 47 (13 votes)
Why is so much discussion happening in private Google Docs? 2019-01-12T02:19:19.332Z · score: 87 (26 votes)
Two More Decision Theory Problems for Humans 2019-01-04T09:00:33.436Z · score: 58 (19 votes)
Two Neglected Problems in Human-AI Safety 2018-12-16T22:13:29.196Z · score: 79 (27 votes)
Three AI Safety Related Ideas 2018-12-13T21:32:25.415Z · score: 63 (25 votes)
Counterintuitive Comparative Advantage 2018-11-28T20:33:30.023Z · score: 78 (30 votes)
A general model of safety-oriented AI development 2018-06-11T21:00:02.670Z · score: 71 (24 votes)
Beyond Astronomical Waste 2018-06-07T21:04:44.630Z · score: 95 (42 votes)
Can corrigibility be learned safely? 2018-04-01T23:07:46.625Z · score: 75 (26 votes)
Multiplicity of "enlightenment" states and contemplative practices 2018-03-12T08:15:48.709Z · score: 101 (26 votes)
Online discussion is better than pre-publication peer review 2017-09-05T13:25:15.331Z · score: 18 (15 votes)
Examples of Superintelligence Risk (by Jeff Kaufman) 2017-07-15T16:03:58.336Z · score: 5 (5 votes)
Combining Prediction Technologies to Help Moderate Discussions 2016-12-08T00:19:35.854Z · score: 13 (14 votes)
[link] Baidu cheats in an AI contest in order to gain a 0.24% advantage 2015-06-06T06:39:44.990Z · score: 14 (13 votes)
Is the potential astronomical waste in our universe too small to care about? 2014-10-21T08:44:12.897Z · score: 25 (27 votes)
What is the difference between rationality and intelligence? 2014-08-13T11:19:53.062Z · score: 13 (13 votes)
Six Plausible Meta-Ethical Alternatives 2014-08-06T00:04:14.485Z · score: 49 (50 votes)
Look for the Next Tech Gold Rush? 2014-07-19T10:08:53.127Z · score: 39 (37 votes)
Outside View(s) and MIRI's FAI Endgame 2013-08-28T23:27:23.372Z · score: 16 (19 votes)
Three Approaches to "Friendliness" 2013-07-17T07:46:07.504Z · score: 20 (23 votes)
Normativity and Meta-Philosophy 2013-04-23T20:35:16.319Z · score: 12 (14 votes)
Outline of Possible Sources of Values 2013-01-18T00:14:49.866Z · score: 14 (16 votes)
How to signal curiosity? 2013-01-11T22:47:23.698Z · score: 21 (22 votes)
Morality Isn't Logical 2012-12-26T23:08:09.419Z · score: 19 (35 votes)
Beware Selective Nihilism 2012-12-20T18:53:05.496Z · score: 40 (44 votes)
Ontological Crisis in Humans 2012-12-18T17:32:39.150Z · score: 46 (50 votes)


Comment by wei_dai on AI Alignment Open Thread October 2019 · 2020-01-20T08:41:31.375Z · score: 3 (1 votes) · LW · GW

By unpredictable I mean that nobody really predicted:

  1. The Left routed the Right so thoroughly in the Cultural War in just the last few years.
  2. Having won, the Left is over-correcting so much in places where it gained power (e.g., establishing policies to achieve equality of outcomes, which ignore politically incorrect facts that nobody is allowed to argue about, establishing the equivalent of loyalty pledges in academia, "canceling" people on the flimsiest of pretexts, etc.). We can explain this after the fact by saying that the Left is being forced by impersonal social dynamics, e.g., runaway virtue signaling, to over-correct, but did anyone predict this ahead of time? How could we have known ahead of time that the virtue of "open inquiry" would lose out to that of "diversity, equity, and inclusion"?
  3. The Left would become so focused on equality between identity groups, rather than on socioeconomic class or individual outcomes. (Why is that?)
  4. Russia and China adopted communism even though they were extremely poor. (They were ahead of the US in gender equality and income equality for a time due to that, even though they were much poorer.)

None of these seem well-explained by your "rich society" model. My current model is that social media and a decrease in the perception of external threats relative to internal threats both favor more virtue signaling, which starts spiraling out of control after some threshold is crossed. But the actual virtue(s) that end up being signaled/reinforced (often at the expense of other virtues) are historically contingent and hard to predict.

Comment by wei_dai on AI Alignment Open Thread October 2019 · 2020-01-20T06:11:43.016Z · score: 19 (4 votes) · LW · GW

Studying recent cultural changes in the US and the ideas of virtue signaling and preference falsification more generally has also made me more pessimistic about non-AGI or delayed-AGI approaches to a positive long term future (e.g., the Long Reflection). I used to think that if we could figure out how to achieve strong global coordination on AI, or build a stable world government, then we'd be able to take our time, centuries or millennia if needed, to figure out how to build an aligned superintelligent AI. But it seems that human cultural/moral evolution often happens through some poorly understood but apparently quite unstable dynamics, rather than by philosophers gradually making progress on moral philosophy and ultimately converging to moral truth as I may have imagined or implicitly assumed. (I did pay lip service to concerns about "value drift" back then but I guess it just wasn't that salient to me.)

Especially worrying is that no country or culture seems immune to these unpredictable dynamics. My father used to tell me to look out for the next Cultural Revolution (having lived through one himself), and I always thought that it was crazy to worry about something like that happening in the West. Well, I don't anymore.

Comment by wei_dai on What long term good futures are possible. (Other than FAI)? · 2020-01-20T04:52:26.596Z · score: 3 (1 votes) · LW · GW

We build what I called a modest superintelligence, consisting of one or more humans who are naturally extremely intelligent or who underwent intelligence amplification. They figure out how to build a stable world government and decide that it's safer to do WBE (whole brain emulation) and gradually increase human (em) intelligence than to build an FAI.

Comment by wei_dai on How to Escape From Immoral Mazes · 2020-01-20T04:22:10.733Z · score: 5 (2 votes) · LW · GW

What are some specific alternative career paths to consider, for (1) someone who is already in the maze and wants a "second act", and (2) someone who thinks climbing the corporate ladder is their comparative advantage, but hasn't started yet?

Comment by wei_dai on Open & Welcome Thread - January 2020 · 2020-01-19T19:18:34.622Z · score: 10 (5 votes) · LW · GW

Anyone else kept all photos of themselves off the public net because they saw something like this coming?

Comment by wei_dai on How to Escape From Immoral Mazes · 2020-01-19T01:55:00.407Z · score: 11 (4 votes) · LW · GW

I don't like playing politics, I don't like having bosses and being told what to do, I don't like competition, I have no desire to manage other people, so I've instinctively avoided or quickly left any places that were even remotely maze-like. I guess I'm the exact opposite of the target audience of this post. :)

But suppose someone likes (or thinks they like) or has a comparative advantage in some of these things, and their career plan was to climb the corporate ladder. What would you suggest they go into instead?

Also, it seems a lot harder to take the kind of risks you're talking about if you're already in a maze and you have kids and a mortgage. Do you have any advice for people in that situation?

Comment by wei_dai on The Road to Mazedom · 2020-01-19T01:13:54.336Z · score: 5 (2 votes) · LW · GW

Would "maze behavior = playing company politics, especially punishment of attempts to communicate clearly about ethics and object level output" be a good way to sum it up? (Or is it missing something important?)

Comment by wei_dai on The Road to Mazedom · 2020-01-18T23:00:17.168Z · score: 3 (1 votes) · LW · GW

I guess "maze behavior" is short for "maze-creating and maze-supporting behaviors"? Did you explain what such behavior consists of earlier in the sequence? (I tried searching and skimming the previous posts but couldn't find it.)

Comment by wei_dai on Political Roko's basilisk · 2020-01-18T20:22:07.074Z · score: 14 (4 votes) · LW · GW

I think something like this actually happens a lot in politics. To give a contemporary example, in many places in the West currently, the best way to protect oneself against being politically attacked in the future via accusations of various "isms" and "phobias" is to prove to be an "ally" by loudly accusing others of "isms" and "phobias" or otherwise helping the faction that tends to make such accusations to gain more power.

(Posting as a comment because it's not exactly what you're looking for, i.e., a bill or law.)

Comment by wei_dai on Identity Isn't In Specific Atoms · 2020-01-18T07:43:21.932Z · score: 10 (2 votes) · LW · GW

I agree that this is a major unsolved problem. I started thinking about it more than 20 years ago, which eventually led to UDT (in part as an attempt to sidestep it). At one point I thought maybe we could just give up anticipation and switch to using UDT, which doesn't depend on a notion of anticipation, but I currently think that some of our values are likely expressed in terms of anticipation, so we probably still have to solve the problem (or a version of it) before we can translate them into a UDT utility function.

Comment by wei_dai on ialdabaoth is banned · 2020-01-17T03:55:18.140Z · score: 8 (3 votes) · LW · GW

I'm less worried about "cancel culture" becoming a thing on LW than in EA (because we seem naturally more opposed to that kind of thing), but I'm still a bit worried. I think having mods be obligated to explain all non-trivial banning decisions (with specifics instead of just pointing to broad categories like "manipulative") would be a natural Schelling fence to protect against a potential slippery slope, so the costs involved may be worth paying from that perspective.

Comment by wei_dai on Please Critique Things for the Review! · 2020-01-16T20:59:23.195Z · score: 7 (3 votes) · LW · GW

And yeah, the whole thing feels mostly like work, which can’t help.

This is partly why I haven't done any reviews, despite feeling a vague moral obligation to do so. Another reason is that I wasn't super engaged with LW throughout most of 2018, and few of the nominated posts jumped out at me from a skim of the titles (as something I have a strong opinion about). The ones that did jump out at me I think I already commented on back when they were first posted, and I don't feel motivated to review them now. Maybe that's because I don't like to pass judgment (I don't think I've written a review for anything before), and when I first commented it was in the spirit of "here are some tentative thoughts I'm bringing up for discussion".

Also, I haven't voted yet because I don't remember the details of the vast majority of the posts, and don't feel comfortable just voting based on my current general feeling about each post (which is probably most strongly influenced by how much I currently agree with the main points it tried to make), and I also don't feel like taking the time to re-read all of the posts. (I think for this reason perhaps whoever's selecting the final posts to go into the book should consider post karma as much or even more than the votes?)

I think if there was a period where every few days a mod would post a few nominated posts and ask people to re-read and re-discuss them, that might have helped to engage people like me more. (Although honestly there's so much new content on LW competing for attention now that I might not have participated much even in that process.)

Comment by wei_dai on Why Quantum? · 2020-01-16T20:10:16.317Z · score: 3 (1 votes) · LW · GW

Apparently LessWrong comments are not indexed by google, so I don’t have a non-time-intensive way of tracking down that comment.

Comments should be indexed by Google (I've seen comments show up in my search results before), but maybe not completely? Can you send a note to the LW team (telling them why you think comments are not being indexed) to see if there's anything they can do about this? In the meantime, have you tried LW's own search feature (the magnifying glass icon at the top)?

Here’s a paper by David Wallace on Deutsch’s decision theory formulation of the Born probabilities

I actually wrote a comment about that back in 2009 but haven't revisited it since. Have you read the response/counterargument I linked to, and still find Wallace's paper compelling?

Comment by wei_dai on Open & Welcome Thread - January 2020 · 2020-01-16T09:22:06.230Z · score: 17 (6 votes) · LW · GW

I think with the parliamentary model, it's probably best to assume away as many of the problems with group rationality as you can.

A big source of problems in group rationality is asymmetric information, and for the parliamentary model we can just assume that all the delegates can costlessly learn everything about all the other delegates, or equivalently that they differ only in their morality and not in their information set.

Another big source of problems is that coalitional behavior can lead to arbitrary and unfair outcomes: for example if you start out with three equal individuals, and any two of them can ally with each other and beat up the third person and take their stuff, you're going to end up with an arbitrary and unfair situation. For this perhaps we can assume that the delegates just don't engage in alliance building and always vote according to their own morality without regard to strategic coalitional considerations. (Actually I'm not sure this paragraph makes sense but I've run out of time to think about it.)

I'm probably missing other important group rationality problems, but hopefully this gives you the general idea.

Comment by wei_dai on Predictors exist: CDT going bonkers... forever · 2020-01-16T09:00:15.618Z · score: 11 (6 votes) · LW · GW

My introductory textbook to decision theory was an attempt to build for CDT an elegant mathematical foundation to rival the jeffrey-bolker axioms for EDT. And why do this? It said, basically, “EDT gives the wrong answer in Newcomb’s Problem and other problems, so we need to find a way to make some version of CDT mathematically respectable.”

Joyce's Foundations of Causal Decision Theory, right? That was the book I bought to learn decision theory too. My focus was on anthropic reasoning instead of Newcomb's problem at the time, so I just uncritically accepted the book's contention that two-boxing is the rational thing to do. As a result, while trying to formulate my own decision theory, I had to come up with complicated ways to force it to two-box. It was only after reading Eliezer's posts about Newcomb's problem that I realized that if one-boxing is actually the right thing to do, the decision theory could be made much more elegant. (Too bad it turns out to still have a number of problems that we don't know how to solve.)

Comment by wei_dai on The Alignment-Competence Trade-Off, Part 1: Coalition Size and Signaling Costs · 2020-01-16T07:58:52.193Z · score: 5 (2 votes) · LW · GW

Wish I had this post to reference when I wrote AGI will drastically increase economies of scale.

Comment by wei_dai on Why Quantum? · 2020-01-11T07:01:58.376Z · score: 6 (3 votes) · LW · GW

I'm not happy with the Barbour post either, but the rest of the sequence seems better. There was a post on this topic.

Just like the first post introducing the Born probabilities has a comment pointing out that the probabilities fall out of the Taylor expansion on state evolution and are not in fact mysterious at all. (Alternatively you can show this from the decision theory formulation.)

Can you link to this please? And explain the decision theory thing if that's not part of the comment you're referring to?

Comment by wei_dai on Outer alignment and imitative amplification · 2020-01-11T06:42:14.663Z · score: 5 (2 votes) · LW · GW

I may have asked this already somewhere, but do you know if there's a notion of "outer aligned" that is applicable to oracles/predictors in general, as opposed to trying to approximate/predict HCH specifically? Basically the problem is that I don't know what "aligned" or "trying to do what we want" could mean in the general case. Is "outer alignment" meant to be applicable in the general case?

This post talks about outer alignment of the loss function. Do you think it also makes sense to talk about outer alignment of the training process as a whole, so that for example if there is a security hole in the hardware or software environment and the model takes advantage of the security hole to hack its loss/reward, then we'd call that an "outer alignment failure". Or would it make more sense to use different terminology for that?

Intuitively, I will say that a loss function is outer aligned at optimum if all the possible models that perform optimally according that loss function are aligned with our goals—that is, they are at least trying to do what we want.

So technically, one should say that a loss function is outer aligned at optimum with respect to some model class, right?
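With the model class made explicit, the quoted definition might be written as below. This is only a sketch, not notation from the post: $\mathcal{M}$ is the model class, $\mathcal{L}$ the loss function, $\mathrm{Aligned}(m)$ is an informal predicate standing for "$m$ is at least trying to do what we want", and the minimum is assumed to be attained over $\mathcal{M}$.

```latex
\mathcal{L} \text{ is outer aligned at optimum w.r.t. } \mathcal{M}
\iff
\forall m \in \operatorname*{arg\,min}_{m' \in \mathcal{M}} \mathcal{L}(m') :\ \mathrm{Aligned}(m)
```

Changing $\mathcal{M}$ changes which models attain the minimum, so the same $\mathcal{L}$ can satisfy this condition for one model class and fail it for another.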

Also, related to Ofer's comment, can you clarify whether it's intended for this definition that the loss function only looks at the model's input/output behavior, or can it also take into account other information about the model?

HCH is just a bunch of humans after all and if you can instruct them not to do dumb things like instantiate arbitrary Turing machines

I believe the point about Turing machines was that, given a Low Bandwidth Overseer (LBO), it's not clear how to get HCH/IA to do complex tasks without making it instantiate arbitrary Turing machines. But other issues arise with a High Bandwidth Overseer (HBO), as William Saunders wrote in the above linked post:

The reason for this system [LBO] being introduced is wanting to avoid security issues as the system scales. The fear is that there would be an “attack” on the system: an input that could be shown to an overseer that would cause the overseer to become corrupted and try to sabotage the system. This could be some kind of misleading philosophical argument, some form of blackmail, a human adversarial example, etc. If an input like this exists, then as soon as the first agent is corrupted, it can try to spread the attack to other agents. The first agent could be corrupted either by chance, or through an attack being included in the input.

I understand you don't want to go into details about whether theoretical HCH is aligned or not here, but I still want to flag that "instruct them not to do dumb things like instantiate arbitrary Turing machines" seems rather misleading. I'm also curious whether you have HBO or LBO in mind for this post.

Comment by wei_dai on Outer alignment and imitative amplification · 2020-01-11T06:39:05.641Z · score: 20 (5 votes) · LW · GW

Aside from some quibbles, this matches my understanding pretty well, but may leave the reader wondering why Paul Christiano and Ought decided to move away from imitative amplification to approval-based amplification. To try to summarize my understanding of their thinking (mostly from an email conversation in September of last year between me, you (Evan), Paul Christiano, and William Saunders):

  • William (and presumably Paul) think approval-based amplification can also be outer aligned. (I don't have a good understanding of why they think this, and William said he "still have an IOU pending to provide a more fleshed out argument why it won't fail.")
  • Paul thinks imitative amplification has a big problem when the overseer gets amplified beyond the capacity of the model class that's being trained. (Approximating HCH as closely as possible wouldn't lead to good results in that case unless we had a rather sophisticated notion of "close".)
  • I replied that we could do research into how the overseer could effectively dumb itself down, similar to how a teacher would dumb themselves down to teach a child. One approach is to use a trial-and-error process, for example ramping up the difficulty of what it’s trying to teach and then backing down if the model stops learning well, and trying a different way of improving task performance and checking if the model can learn that, and so on. (I didn't get a reply on this point.)
  • William also wrote, "RL-IA is easier to run human experiments in, because the size of trees to complete tasks, and the access to human experts with full knowledge of the tree (eg the Ought reading comprehension experiments) I'd lean towards taking the position that we should try to use SL-IA where possible, but some tasks might just be much easier to work with in RL-AI"
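
The trial-and-error process in the third bullet above can be made concrete with a toy loop. This is only an illustrative sketch, not something proposed in the email exchange: the `learns_well` predicate is a hypothetical stand-in for whatever check the overseer would use to tell whether the model is still learning, difficulty is assumed to be a single integer level, and the "try a different way of improving task performance" branch is left out.

```python
from typing import Callable, List


def ramp_difficulty(
    learns_well: Callable[[int], bool],  # hypothetical check: does the model still learn at this level?
    start: int = 1,
    max_difficulty: int = 10,
    max_steps: int = 50,
) -> List[int]:
    """Toy trial-and-error schedule: ramp difficulty up while the model keeps
    learning, back down one level when it stops, and stop once the highest
    learnable level has been confirmed. Returns the levels tried, in order."""
    difficulty = start
    ceiling = max_difficulty  # highest level not yet known to be too hard
    history: List[int] = []
    for _ in range(max_steps):
        history.append(difficulty)
        if learns_well(difficulty):
            if difficulty >= ceiling:
                break  # reached the highest level the model can currently learn
            difficulty += 1  # ramp up
        else:
            if difficulty == start:
                break  # even the easiest level fails; a different approach is needed
            ceiling = difficulty - 1  # remember that this level was too hard
            difficulty -= 1  # back down
    return history


# A toy model that can only learn up to level 3.
print(ramp_difficulty(lambda d: d <= 3))  # prints [1, 2, 3, 4, 3]
```

Backing down one level and then stopping is the simplest possible policy; an overseer that is actually "dumbing itself down" would presumably also vary how it teaches, not just how hard the material is.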

Comment by wei_dai on Meta-discussion from "Circling as Cousin to Rationality" · 2020-01-06T08:22:13.474Z · score: 18 (10 votes) · LW · GW

Suppose (as seems to be the implication of your comments) that Eliezer would write for Less Wrong if, and only if, the environment (both technical and social) were comparable to that which now exists on Facebook (his current platform of choice).

I want to note that Eliezer now seems to spend more time on Twitter than on Facebook, and the discussion on Twitter is even lower quality than on Facebook or simply absent (i.e., I rarely see substantial back-and-forth discussions on Eliezer's Twitter posts). This, plus the fact that the LW team already made a bunch of changes at Eliezer's request to try to lure him back, without effect, makes me distrust habryka's explanations in the grandparent comment.

Comment by wei_dai on Open & Welcome Thread - December 2019 · 2020-01-06T08:07:23.233Z · score: 5 (2 votes) · LW · GW

I don't recall writing such a comment.

Comment by wei_dai on The Universe Doesn't Have to Play Nice · 2020-01-06T07:42:27.380Z · score: 18 (4 votes) · LW · GW

It’s often helpful to think about the root cause of your disagreements with other people

Can you please link to the specific disagreements you have in mind, so I can judge whether your proposed root cause actually explains those disagreements?

Sometimes the universe will play nice, but we can’t assume it.

Sure, but we also need to be wary about prematurely concluding that the universe doesn't play nice in any specific area. If my IQ were 30 points lower or I had been born a few centuries earlier, I wouldn't be able to know or understand a lot of things that I actually do. A lot of places where it seems like the universe isn't playing nice might just be a reflection of us not being smart enough or not having thought long enough.

Comment by wei_dai on What is Life in an Immoral Maze? · 2020-01-06T07:23:28.283Z · score: 15 (5 votes) · LW · GW

An Immoral Maze can be modeled as a super-perfectly competitive job market for management material. All the principles of super-perfect competition are in play. The normal barriers to such competition have been stripped away. Too many ‘qualified’ managers compete for too few positions.

Why is middle management more of a super-perfectly competitive field than other jobs? I'm skeptical that "super-perfect competition" is really what makes middle management more of an "Immoral Maze" compared to other fields. I'm guessing that something like the difficulty of measuring manager productivity (given randomness of the people they manage and other environmental factors) probably has more to do with it.

You previously wrote:

We will define super-perfect competition as competition that mostly has most of the elements of perfect competition, but lacks free entry and free exit, which creates more production than there would be at equilibrium (and thus, also, a lower price).

I don't see why middle management would especially lack "free entry and free exit".

Comment by wei_dai on Less Wrong Poetry Corner: Walter Raleigh's "The Lie" · 2020-01-06T05:39:54.159Z · score: 7 (3 votes) · LW · GW

(This comment is really helpful for me to understand your positions.)

Some part of your brain is performing some computation that, if it works, to the extent that it works, is mirroring Bayesian decision theory. But that doesn’t help the part of you can that talk, that can be reached by the part of me that can talk.

Why not? It seems likely to me that the part of my brain that is doing something like Bayesian decision theory can be trained in certain directions by the part of me that talks/listens (for example by studying history or thinking about certain thought experiments).

I falsifiably predict that a culture that has “Use Bayesian decision theory to decide whether or not to speak the truth” as its slogan won’t be able to do science

I'm not convinced of this. Can you say more about why you think this?

Comment by wei_dai on My new rationality/futurism podcast · 2020-01-06T05:13:22.045Z · score: 5 (2 votes) · LW · GW

I’ve been trying to understand the underlying dynamics that’s driving the leftward trend in academia and have not been able to find much online.

Since I wrote that, I came across Why Are Professors Liberal? with Neil Gross (a talk based on his 2013 book) which presents evidence that at least as of 2013, professors were mostly liberal due to self-selection: liberals tended to see academia as a suitable career choice more than conservatives (possibly due to some historically contingent reasons which then became self-reinforcing).

But since that book/video was released, the leftward trend in academia has accelerated, which I still haven't found a great explanation for. My guess is that it's due to a combination of (1) increasing virtue signaling making academia seem even less suitable for those who aren't left-leaning and (2) more explicit or barely concealed discrimination based on ideology.

Comment by wei_dai on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-06T04:33:43.531Z · score: 7 (3 votes) · LW · GW

It seems that the interviewees here either:

  1. Use "AI risk" in a narrower way than I do.
  2. Neglected to consider some sources/forms of AI risk (see above link).
  3. Have considered other sources/forms of AI risk but do not find them worth addressing.
  4. Are worried about other sources/forms of AI risk but they weren't brought up during the interviews.

Can you talk about which of these is the case for yourself (Rohin) and for anyone else whose thinking you're familiar with? (Or if any of the other interviewees would like to chime in for themselves?)

Comment by wei_dai on Does GPT-2 Understand Anything? · 2020-01-04T09:02:59.518Z · score: 3 (1 votes) · LW · GW

One way we might choose to draw these distinctions is using the technical vocabulary that teachers have developed. Reasoning about something is more than mere Comprehension: it would be called Application, Analysis or Synthesis, depending on how the reasoning is used.

So would you say that GPT-2 has Comprehension of "recycling" but not Comprehension of "in favor of" and "against", because it doesn't show even the basic understanding that the latter pair are opposites? I feel like even teachers' technical vocabulary isn't great here because it was developed with typical human cognitive development in mind, and AIs aren't "growing up" the same way.

Comment by wei_dai on Meta-discussion from "Circling as Cousin to Rationality" · 2020-01-04T08:52:09.323Z · score: 13 (3 votes) · LW · GW

A world where authors can simply ignore questions like this without significant negative social consequences is also the world that I would prefer the most, though I think we are currently not in that world, and getting there requires some shift in norms in culture that I would like to see.

I occasionally ignore questions and comments and have not noticed any significant negative social consequences from doing so. Others have also sometimes ignored my questions/comments without incurring significant negative social consequences that I can see. It seems to me that the current culture is already one where authors can simply ignore questions/comments, especially ones that are not highly upvoted. (I'd actually like to switch to or experiment with a norm where people have to at least indicate why they're ignoring something.)

Given this, I'm puzzled that other authors have complained to you about feeling obligated to answer questions. Can you explain more why they feel that way, or give some quotes of what people actually said?

Oh, I do recall someone saying that they feel obligated to answer all critical comments, but my interpretation is that it has more to do with their personal psychology than the site culture or potential consequences.

Comment by wei_dai on My new rationality/futurism podcast · 2020-01-04T01:08:34.998Z · score: 4 (2 votes) · LW · GW

I don’t see the trend reversing unless the economics of higher education change.

How is the economics of higher education causing the trend? Have you written about this anywhere? I've been trying to understand the underlying dynamics that are driving the leftward trend in academia and have not been able to find much online. (The same trend exists in journalism, K-12 education, etc., but that could perhaps all be explained by academia producing graduates who are increasingly left-wing.) It seems like while LessWrong has been trying to "raise the sanity waterline", the bigger trend in society is increasing bias (towards the political left, especially among elites), which I think we should probably pay more attention to, because it seems likely to affect our x-risk efforts sooner or later. (Arguably it already is.)

Comment by wei_dai on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2020-01-03T11:27:40.263Z · score: 5 (2 votes) · LW · GW

I think it does actually, although I'm not sure how exactly. See Logical vs physical risk aversion.

Comment by wei_dai on Does GPT-2 Understand Anything? · 2020-01-03T11:21:09.179Z · score: 8 (4 votes) · LW · GW

Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.

My complaint is that GPT-2 isn't able to reason with whatever "understanding" it has (as shown by FeepingCreature's example “We are in favor of recycling, because recycling doesn’t actually improve the environment, and that’s why we are against recycling.”), and the ability to reason seems like the most important thing we want in an AI that "understands language".

With an abstract understanding, here are some things one can do:
• answer questions about it in one’s own words
• define it
• use it appropriately in a sentence
• provide details about it
• summarize it

I suggest that these are all tests that, in a human, correlate highly with being able to reason with a concept (which again is what we really want), but the correlation apparently breaks down when we're dealing with AI, so the fact that an AI can pass these tests doesn't mean as much as it would with a human.

At this point we have to decide whether we want the word "understand" to mean "... and is able to reason with it", and I think we do. If we say "GPT-2 understands language", a lot of people will misinterpret that as meaning that GPT-2 can do verbal/symbolic reasoning, and that seems worse than the opposite confusion, where we say "GPT-2 doesn't understand language" and people misinterpret that as meaning that GPT-2 can't give definitions or summaries.

Comment by wei_dai on Perfect Competition · 2020-01-01T05:27:15.565Z · score: 5 (2 votes) · LW · GW

Like johnswentworth, I also don't understand the leap to "Iron Law of Wages takes over". You seem to be making at least some unstated assumptions in that part. Besides, even if subsistence wages do obtain for labor, land should continue to be very valuable and to earn a lot of rent/surplus, so can't the landlords at least be considered "children in Disneyland"?

Comment by wei_dai on Speaking Truth to Power Is a Schelling Point · 2020-01-01T05:03:37.370Z · score: 7 (4 votes) · LW · GW

Speak the truth, even if your voice trembles—unless adding that truth to our map would make it x% harder for our coalition to compete for Imperial grant money

Why do you assume that this is the only negative consequence of speaking the truth? In the real world (the one I think I live in), speaking some truths might get your child bullied in school (including by the teachers or administrators), or get you fired, jailed, or killed. Is this post supposed to have applications in that world?

Comment by wei_dai on human psycholinguists: a critical appraisal · 2020-01-01T04:49:45.683Z · score: 9 (4 votes) · LW · GW

Do you know any promising theories of the higher levels of speech production (i.e., human verbal/symbolic reasoning)? That seems to me to be one of the biggest pieces currently missing from a theoretical understanding of human intelligence (and from AGI theory), and I wonder if there's actually good theoretical work out there that I'm just not aware of.

Comment by wei_dai on We run the Center for Applied Rationality, AMA · 2019-12-25T05:43:51.671Z · score: 19 (9 votes) · LW · GW

and makes it somewhat plausible why I’m claiming that “coming to take singularity scenarios seriously can be pretty disruptive to common sense,” and such that it might be nice to try having a “bridge” that can help people lose less of the true parts of common sense as their world changes

Can you say a bit more about how CFAR helps people do this? Some of the "confusions" you mentioned are still confusing to me. Are they no longer confusing to you? If so, can you explain how that happened and what you ended up thinking on each of those topics? For example, lately I've been puzzling over something related to this:

Given this, should I get lost in “what about simulations / anthropics” to the point of becoming confused about normal day-to-day events?

Comment by wei_dai on Funk-tunul's Legacy; Or, The Legend of the Extortion War · 2019-12-24T18:05:47.499Z · score: 8 (4 votes) · LW · GW

When a ceedeetee agent met a 9-bot, she would reason causally: “Well, the other agent is going to name 9, so I had better name 1 if I want any payoff at all!”

How does a ceedeetee agent tell what kind of opponent they're facing, and what prevents ceedeetee agents from evolving to or deciding to hide such externally visible differences?

Depending on such details, there are situations where TDT/UDT/FDT seemingly does worse than CDT. See this example (a variant of 2TDT-1CDT) from cousin_it:

Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob’s life will be much happier than Alice’s, right?

More tangentially, the demand game is also one that UDT 1.x loses to a human, because of "unintentional simulation".
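To make the quoted ceedeetee reasoning concrete, here is a minimal Python sketch of the demand game as the quote implies it. The pot of 10, integer demands, and the rule that both players get nothing when demands exceed the pot are assumptions read off the "name 1 against a 9-bot" line; `payoff` and `causal_best_response` are hypothetical names for illustration. Note that the sketch takes the opponent's demand as a known, fixed input, which is exactly the step the question about telling opponents apart calls into doubt.

```python
# Hypothetical sketch of the demand game implied by the quote:
# each player names an integer 0..POT; if the two demands sum to at most
# POT, each receives what they named, otherwise both receive nothing.
POT = 10  # assumption: total of 10, consistent with "name 1" vs a 9-bot


def payoff(my_demand: int, their_demand: int, pot: int = POT) -> int:
    """Return my payoff for one round of the demand game."""
    return my_demand if my_demand + their_demand <= pot else 0


def causal_best_response(their_demand: int, pot: int = POT) -> int:
    """CDT-style reasoning: treat the opponent's demand as fixed and
    pick the demand that maximizes my own payoff against it."""
    return max(range(pot + 1), key=lambda d: payoff(d, their_demand, pot))


if __name__ == "__main__":
    print(causal_best_response(9))  # against a 9-bot, the best reply is 1
```

Against a 9-bot this returns 1, which is why an agent that reasons this way can be exploited by anything it believes is a 9-bot; whether it can reliably identify one is the open question.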

Comment by wei_dai on Should I floss? · 2019-12-24T16:53:43.423Z · score: 15 (7 votes) · LW · GW

From A Network Meta-analysis of Interproximal Oral Hygiene Methods in the Reduction of Clinical Indices of Inflammation:

The present BNMA enabled us to quantitatively evaluate OH aids and provide a global ranking of their efficacy. Among 10 IOH aids, interdental brushes and water-jets ranked high among the aids for reducing gingival bleeding. Unsupervised flossing did not yield substantial reductions in gingival inflammation. The present findings are aligned with the recommendations set forth following a consensus meeting during the 11th European Workshop in Periodontology[3] that forced the periodontal community to rethink the recommendation for flossing across groups and levels of periodontal health. The present work corroborates the recommendations derived from the workshop proceedings, which state that flossing cannot be generally recommended for managing gingivitis except for sites where the interdental space is too limited to allow the passage of an interdental brush without trauma. In fact, our meta-analysis did not limit the selection to interproximal hygiene aids to interdental brushes and floss but also suggested that water-jet devices and potentially toothpicks, when used under intensive oral hygiene instruction, may be beneficial homecare aids in the management of gingivitis. Given the prevalence of gingivitis, providing the general public with efficacious alternatives to flossing would likely have significant public health impact.

Flossing has received the most attention among IOH aids and is highly recommended by dentists and dental associations alike due to its conceptually superior capability of removing plaque for interdental areas.[50] Therefore, a word of caution regarding the interpretation of findings from the present study is important. The present NMA does not refute the efficacy of flossing for removing interproximal plaque around teeth. The challenge of performing a technically-demanding OH habit[51] such as flossing may help explain its relatively poor ranking against other IOH aids. When performed effectively, flossing is likely an efficacious approach against gingival inflammation and potentially dental caries.[52] For example, one study confirmed that daily (weekdays) professional flossing can prevent the incidence of caries in schoolchildren by 40%.[52] Nevertheless, trials assessing the efficacy of self-administered flossing and dental caries have largely failed to show any effect.[53] These findings support the hypothesis that flossing is indeed efficacious, but its effective application is elusive. Our results support that dental floss is not the quintessential IOH method. Persons that are effectively using floss should not be instructed to discontinue their OH habits. Importantly, as suggested by our findings, other OH adjuncts actually have an increased likelihood of being effective in reducing gingival inflammation, such as interdental brushes, waterjet devices and dental toothpicks with the appropriate OH instruction.

I personally use a water-jet (Waterpik) and it works well for me in terms of both hedonics and dental health results. (I googled for "waterpik meta analysis" and this paper was the first result that came up.)

Comment by wei_dai on We run the Center for Applied Rationality, AMA · 2019-12-22T22:57:25.391Z · score: 20 (8 votes) · LW · GW

Philosophy strikes me as, on the whole, an unusually unproductive field full of people with highly questionable epistemics.

This is kind of tangential, but I wrote Some Thoughts on Metaphilosophy in part to explain why we shouldn't expect philosophy to be as productive as other fields. I do think it can probably be made more productive, by improving people's epistemics, their incentives for working on the most important problems, etc., but the same can be said for lots of other fields.

I certainly don’t want to turn the engineers into philosophers

Not sure if you're saying that you personally don't have an interest in doing this, or that it's a bad idea in general, but if the latter, see Counterintuitive Comparative Advantage.

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-21T04:51:04.255Z · score: 5 (2 votes) · LW · GW

Another solution would be to think of spaces to discuss politics which one can join.

There are spaces I can join (and have joined) to do politics or observe politics but not so much to discuss politics, because the people there lack the rationality skills or background knowledge (e.g., the basics of Bayesian epistemology, or an understanding of game theory in general and signaling in particular) to do so.

I believe that we won't get a better understanding of politics by discussing it here, as it's more of a form of empirical knowledge you acquire:

I think we need both, because after observing "politics in the wild", I need to systematize the patterns I observed, understand why things happened the way they did, predict whether the patterns/trends I saw are likely to continue, etc. And it's much easier to do that with other people's help than to do it alone.

Comment by wei_dai on Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk · 2019-12-21T04:05:50.841Z · score: 37 (12 votes) · LW · GW

The cognitive algorithm of “Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c.” wouldn’t have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).

This seems to be straw-manning Evan. I'm pretty sure he would explicitly disagree with "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." and I don't see much evidence he is implicitly doing this either. In my own case I think taintedness is one consideration that I have to weigh among many and I don't see what's wrong with that. Practically, this means trying to find some way of talking about political issues on LW or an LW-adjacent space while limiting how much the more political discussions taint the more technical discussions or taint everyone who is associated with LW in some way.

I'm not sure what your own position is on the "should we discuss object-level politics on LW" question. Are you suggesting that we should just go ahead and do it without giving any weight to taintedness? (You didn't fully spell out the analogy with the Triskaidekaphobic Calculator, but it seems like you're saying that worrying about taintedness while trying to solve AI safety is like trying to make a calculator while worrying about triskaidekaphobia, so we shouldn't do that?)

I've been saying that I don't see how things work out well if x-risk / AI safety people don't get better at thinking about, talking about, and doing politics. I'm doubtful that I'm understanding you correctly, but if I am, I also don't see how things work out well if LW starts talking about politics without any regard to taintedness and as a result x-risk / AI safety becomes politically radioactive to most people who have to worry about conventional politics.

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-20T23:36:59.079Z · score: 7 (3 votes) · LW · GW

I think it was bad in the short term (it was at least a distraction, and maybe tainted AI safety by association, although I don't have any personal knowledge of that), but probably good in the long run, because it gave people a good understanding of one political phenomenon (i.e., the giving and taking of offense), which let them better navigate similar situations in the future. In other words, if the debate hadn't happened online and the resulting understanding hadn't propagated widely through this community, there probably would have been more political drama over time, because people wouldn't have had a good understanding of the how and why of avoiding offense.

But I do agree that "taint by association" is a big problem going forward, and I'm not sure what to do about that yet. By mentioning the 2009 debate I was mainly trying to establish that if that problem could be solved or ameliorated to a large degree, then online political discussions seem to be worth having because they can be pretty productive.

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-19T02:42:42.285Z · score: 6 (3 votes) · LW · GW

It is sometimes possible to avoid this failure mode, but imo basically only if the conversations are kept highly academic and avoiding of any hot-button issues (e.g. as in some online AI safety discussions, though not all). I think this is basically impossible for politics

I disagree, and think LW can actually do ok, and probably even better with some additional safeguards around political discussions. You weren't around yet when we had the big 2009 political debate that I referenced in the OP, but I think that one worked out pretty well in the end. And I note that (at least from my perspective) a lot of progress in that debate was made online as opposed to in person, even though presumably many parallel offline discussions were also happening.

so I suspect that not having the ability to talk about politics online won’t be much of a problem (and might even be quite helpful, since I suspect it would overall raise the level of political discourse).

Do you think just talking about politics in person is good enough for making enough intellectual progress and disseminating that widely enough to eventually solve the political problems around AI safety and x-risks? Even if I didn't think there's an efficiency hit relative to current ways of discussing politics online, I would be quite worried about that and trying to find ways to move beyond just talking in person...

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-19T00:02:19.955Z · score: 10 (5 votes) · LW · GW

I'm not very familiar with the rationalist diaspora, but I wonder if there were or are spaces within that where political discussions are allowed or welcome, how things turned out, and what lessons we can learn from their history to inform future experiments.

I do know about the weekly culture war threads on TheMotte and the "EA discuss politics" Facebook group but haven't observed them long enough to draw any strong conclusions. Also, for my tastes, they seem a little bit too far removed from LW, both culturally and in terms of overlapping membership, because they both spawned from LW-adjacent groups rather than LW itself.

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-18T21:49:53.538Z · score: 3 (1 votes) · LW · GW

How much of an efficiency hit do you think taking all discussion of a subject offline ("in-person") involves? For example if all discussions about AI safety could only be done in person (no forums, journals, conferences, blogs, etc.), how much would that slow down progress?

Comment by wei_dai on Against Premature Abstraction of Political Issues · 2019-12-18T21:22:21.652Z · score: 11 (5 votes) · LW · GW

Personally, I would like LessWrong to be a place where I can talk about AI safety and existential risk without being implicitly associated with lots of other political content that I may or may not agree with.

Good point; I agree this is probably a dealbreaker for a lot of people (maybe even me) unless we can think of some way to avoid it. I can't help but think that we have to find a solution besides "just don't talk about politics" though, because x-risk is inherently political and as the movement gets bigger it's going to inevitably come into conflict with other people's politics. (See here for an example of it starting to happen already.) If by the time that happens in full force we're still mostly politically naive, with little understanding of how politics works in general or what drives particular political ideas, how is that going to work out well? (ETA: This is not an entirely rhetorical question, BTW. If anyone can see how things work out well in the end despite LW never getting rid of the "don't talk about politics" norm, I really want to hear that so I can maybe work in that direction instead.)

Comment by wei_dai on What determines the balance between intelligence signaling and virtue signaling? · 2019-12-17T05:17:01.897Z · score: 7 (3 votes) · LW · GW

Building up on this, virtue is valued more when a society is threatened from the inside.

Right, and unfortunately the relevant thing here isn't how much society is objectively threatened from the inside, but people's perceptions of the threat, which can differ wildly from reality, because of propaganda, revenue-driven news media, preexisting ideology, or any number of other things. To quote a particularly extreme and tragic instance of this from Gwern's review of The Cultural Revolution: A People's History, 1962-1976:

The disappointment generates dissonance: many people genuinely believed that the solutions had been found and that the promises could be kept and the goals were realistic, but somehow it came out all wrong. ("We wanted the best, but it turned out like always.") Why? It can't be that the ideology is wrong, that is unthinkable; the ideology has been proven correct. Nor is it the great leader's fault, of course. Nor are there any enemies close at hand: they were all killed or exiled. The cargo cult keeps implementing the revolution and waving the flags, but the cargo of First World countries stubbornly refuses to land.

The paranoid yet logical answer is that there must be invisible enemies: saboteurs, counter-revolutionaries, and society remaining 'structurally' anti-ideological. No matter that victory was total, the failure of their policies proves that the enemies are still everywhere.

Comment by wei_dai on Is the term mesa optimizer too narrow? · 2019-12-17T04:29:35.351Z · score: 3 (1 votes) · LW · GW

When the brain makes a decision, it usually considers at most three or four alternatives for each action it does. Most of the actual work is therefore done at the heuristics stage, not the selection part. And even at the selection stage, I have little reason to believe that it is actually comparing alternatives against an explicit objective function.

Assuming this, it seems to me that the heuristics are being continuously trained by the selection stage, so the selection stage is the most important part even if the heuristics are doing most of the immediate work in making each decision. And I'm not sure what you mean by "explicit objective function". I guess the objective function is encoded in the connections/weights of some neural network. Are you not counting that as an explicit objective function, and instead only counting a symbolically represented function as "explicit"? If so, why would not being "explicit" disqualify humans as mesa optimizers? If not, can you explain more what you mean?

Since this can all be done in a simple feedforward neural network, I find it hard to see why the best model of its behavior should be an optimizer.

I take your point that some models can behave like an optimizer at first glance but if you look closer it's not really an optimizer after all. But this doesn't answer my question: "Can you give some realistic examples/scenarios of “malign generalization” that does not involve mesa optimization? I’m not sure what kind of thing you’re actually worried about here."

ETA: If you don't have a realistic example in mind, and just think that we shouldn't currently rule out the possibility that a non-optimizer might generalize in a way that is more dangerous than total failure, I think that's a good thing to point out too. (I had already upvoted your post based on that.)

Comment by wei_dai on Is the term mesa optimizer too narrow? · 2019-12-16T05:27:13.845Z · score: 7 (3 votes) · LW · GW

First, I think by this definition humans are clearly not mesa optimizers.

I'm confused/unconvinced. Surely the 9/11 attackers, for example, must have been "internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system"? Can you give some examples of humans being highly dangerous without having done this kind of explicit optimization?

As far as I can tell, Hjalmar Wijk introduced the term “malign generalization” to describe the failure mode that I think is most worth worrying about here.

Can you give some realistic examples/scenarios of "malign generalization" that does not involve mesa optimization? I'm not sure what kind of thing you're actually worried about here.

Comment by wei_dai on Approval Extraction Advertised as Production · 2019-12-16T05:25:47.693Z · score: 14 (5 votes) · LW · GW

Do you think other VCs / startup accelerators are doing this ("screen for founders making a credible effort to create a great product, instead of screening for generalized responsiveness to tests"), or doing it to a greater extent than Y Combinator?

Comment by wei_dai on ialdabaoth is banned · 2019-12-16T04:26:51.268Z · score: 3 (1 votes) · LW · GW

“Worried” is an understatement. It’s more like panicking continuously all year with many hours of lost sleep, crying fits, pacing aimlessly instead of doing my dayjob, and eventually doing enough trauma processing to finish writing my forthcoming 20,000-word memoir explaining in detail (as gently and objectively as possible while still telling the truth about my own sentiments and the world I see) why you motherfuckers are being incredibly intellectually dishonest (with respect to a sense of “intellectual dishonesty” that’s about behavior relative to knowledge, not conscious verbal “intent”).

I think I'm someone who might be sympathetic to your case but just don't understand what it is, so I'm really curious about this "memoir". (Let me know if you want me to read your current draft.) Is my guess, that you're worried about the community falling prey to runaway virtue signaling, remotely close?