Comment by vaniver on Commentary On "The Abolition of Man" · 2019-07-17T02:54:02.763Z · score: 6 (2 votes) · LW · GW

I don't have a solid sense of this yet, in large part because of how much of it is experiential.

I think I would count the 5-second level as gesturing in this direction; I also note the claim that HPMOR lets people 'experience' content from the Sequences instead of just read it. Some friends who did (old-style) boxing described it as calibrating their emotional reactions to danger and conflict in a way that seems related.

I've been experimenting with conceptualizing some of my long-standing dilemmas as questions of the form "does this desire have merit?" as opposed to something closer to "should I do A or B?", but it's too soon to see if that's the right approach.

Comment by vaniver on Commentary On "The Abolition of Man" · 2019-07-15T21:32:39.988Z · score: 5 (2 votes) · LW · GW

See also: Roles are Martial Arts for Agency.

Commentary On "The Abolition of Man"

2019-07-15T18:56:27.295Z · score: 55 (12 votes)
Comment by vaniver on What are we predicting for Neuralink event? · 2019-07-15T17:46:17.834Z · score: 5 (2 votes) · LW · GW

Length of the control vector seems important; there's lots of ways to use gross signals to control small vectors that don't scale to controlling large vectors. Basically, you could imagine that question as something like "could you dance with it?" (doable in 2014) or "could you play a piano with it?" (doable in 2018), both of which naively seem more complicated than an (x,y) pair (at least, when you don't have visual feedback).

Comment by vaniver on What are we predicting for Neuralink event? · 2019-07-14T17:54:38.851Z · score: 21 (9 votes) · LW · GW

I predict with moderate confidence that we will not see:

  • 'Augmented reality'-style overlays or video beamed directly to the visual cortex.
  • Language output (as text or audio or so on) or input.
  • Pure tech or design demos without any demonstrations or experiments with real biology.

I predict with weak confidence that we won't see results in humans. (This prediction is stronger the more invasive the results we're seeing; a superior EEG they could show off in humans, but repair or treatment of strokes will likely only be in mice.)

(Those strike me as the next milestones along the 'make BCIs that are useful for making top performers higher performing' dimension, which seems to be Musk's long-term vision for Neuralink.)

They've mostly been focusing on medical applications. So I predict we will see something closer to:

  • High-spatial-fidelity brain monitoring (probably invasive?), intended to determine gross functionality of different regions (perhaps useful in conjunction with something like ultrasound to do targeted drug delivery for strokes).
  • Neural prostheses intended to replace the functionality of single brain regions that have been destroyed. (This seems more likely for regions that are somehow symmetric or simple.)
  • Results in rats or mice.

I notice I wanted to put 'dexterous motor control' on both lists, so I'm somehow confused; it seems like we already have prostheses that perform pretty well based on external nerve sites (like reading off what you wanted to do with your missing hand from nerves in your arm) but I somehow don't expect us to have the spatial precision or filtering capacity to do that in the brain. (And also it just seems much riskier to attach electrodes internally or to the spinal cord than at an external site, making it unclear why you would even want that.)

The main question here for me is something closer to 'bandwidth', where it seems likely you can pilot a drone using solely EEG if the thing you're communicating is closer to "a location that you should be at" than "how quickly each of the four rotors should be spinning in what direction." But we might have results where rats have learned how to pilot drones using low-level controls, or something cool like that.

Comment by vaniver on The AI Timelines Scam · 2019-07-12T05:08:34.364Z · score: 19 (7 votes) · LW · GW

Specifically, 'urgency' is measured by the difference between the time you have and the time the task will take. If I need the coffee to be done in 15 minutes and the bread to be done in an hour, but preheating the oven must start now for the bread to be done in an hour (whereas the coffee only takes 10 minutes to brew start to finish), then preheating the oven is urgent whereas brewing the coffee has 5 minutes of float time. If I haven't started the coffee within 5 minutes, it becomes urgent. See critical path analysis and Gantt charts and so on.
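The coffee/oven arithmetic above can be sketched as a float-time calculation. (A hypothetical toy, using the deadlines and durations from the example; the task names and numbers are illustrative, not from any real scheduling tool.)

```python
# Toy float-time calculation for the coffee/oven example.
# float (slack) = latest allowable start - earliest possible start;
# a task is urgent when its float is zero (or negative).

tasks = {
    # name: (deadline in minutes from now, duration in minutes)
    "preheat oven + bake bread": (60, 60),
    "brew coffee": (15, 10),
}

def float_time(deadline, duration, now=0):
    """Minutes of slack before the task must be started."""
    latest_start = deadline - duration
    return latest_start - now

for name, (deadline, duration) in tasks.items():
    slack = float_time(deadline, duration)
    print(f"{name}: float = {slack} min, urgent = {slack <= 0}")
```

Here the oven has 0 minutes of float (urgent now), while the coffee has 5 minutes of float, matching the example.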

This might be worth a post? It feels like it'd be low on my queue but might also be easy to write.

Comment by vaniver on The AI Timelines Scam · 2019-07-12T04:58:45.570Z · score: 32 (7 votes) · LW · GW

I mostly agree with your analysis, especially the point about 1 (that the more likely I think my thoughts are to be wrong, the lower the cost of sharing them).

I understand that there are good reasons for discussions to be private, but can you elaborate on why we'd want discussions about privacy to be private?

Most examples here have the difficulty that I can't share them without paying the costs, but here's one that seems pretty normal:

Suppose someone is a student and wants to be hired later as a policy analyst for governments, and believes that governments care strongly about past affiliations and beliefs. Then it might make sense for them to censor themselves in public under their real name because of potential negative consequences of things they said when young. However, any statement of the form "I specifically want to hide my views on X" made under their real name has similar possible negative consequences, because it's an explicit admission that the person has something to hide.

Currently, people hiding their unpopular opinions to avoid career consequences is fairly standard, and so it's not that damning to say "I think this norm is sensible" or maybe even "I follow this norm," but it seems like it would have been particularly awkward to be the first person to explicitly argue for that norm.

Comment by vaniver on The AI Timelines Scam · 2019-07-12T02:01:11.539Z · score: 21 (8 votes) · LW · GW

Do we have those generally trusted arbiters? I note that it seems like many people who I think of as 'generally trusted' are trusted because of some 'private information', even if it's just something like "I've talked to Carol and get the sense that she's sensible."

Comment by vaniver on The AI Timelines Scam · 2019-07-11T16:52:12.759Z · score: 59 (16 votes) · LW · GW

[Note: this, and all comments on this post unless specified otherwise, is written with my 'LW user' hat on, not my 'LW Admin' or 'MIRI employee' hat on, and thus is my personal view instead of the LW view or the MIRI view.]

As someone who thinks about AGI timelines a lot, I find myself dissatisfied with this post because it's unclear which "AI Timelines Scam" you're talking about, and I'm worried that if I poke at the bits it'll feel like a motte and bailey: it seems quite reasonable to me that 73% of tech executives thinking the singularity will arrive in under 10 years is probably just inflated 'pro-tech' reasoning, but it seems quite unreasonable to suggest that strategic considerations about dual-use technology should be discussed openly (or should be discussed openly because tech executives have distorted beliefs). It also seems like there's an argument for weighting urgency in planning that could lead to 'distorted' timelines while being a rational response to uncertainty.

On the first point, I think the following might be a fair description of some thinkers in the AGI space, but don't think this is a fair summary of MIRI (and I think it's illegible, to me at least, whether you are intending this to be a summary of MIRI):

This bears similarity to some conversations on AI risk I've been party to in the past few years. The fear is that Others (DeepMind, China, whoever) will develop AGI soon, so We have to develop AGI first in order to make sure it's safe, because Others won't make sure it's safe and We will. Also, We have to discuss AGI strategy in private (and avoid public discussion), so Others don't get the wrong ideas. (Generally, these claims have little empirical/rational backing to them; they're based on scary stories, not historically validated threat models)

I do think it makes sense to write more publicly about the difficulties of writing publicly, but there's always going to be something odd about it. Suppose I have 5 reasons for wanting discussions to be private, and 3 of them I can easily say. Discussing those three reasons will give people an incomplete picture that might seem complete, in a way that saying "yeah, the sum of factors is against" won't. Further, without giving specific examples, it's hard to see which of the ones that are difficult to say you would endorse and which you wouldn't, and it's not obvious to me legibility is the best standard here.

But my simple sense is that openly discussing whether or not nuclear weapons were possible (a technical claim on which people might have private information, including intuitions informed by their scientific experience) would have had costs and it was sensible to be secretive about it. If I think that timelines are short because maybe technology X and technology Y fit together neatly, then publicly announcing that increases the chances that we get short timelines because someone plugs together technology X and technology Y. It does seem like marginal scientists speed things up here.

Now, I'm paying a price here; it may be the case that people have tried to glue together technology X and technology Y and it won't work. I think private discussions on this are way better than no discussions on this, because it increases the chances that those sorts of crucial facts get revealed. It's not obvious that public discussions are all that much better on these grounds.

On the second point, it feels important to note that the threshold for "take something seriously" is actually quite small. I might think that the chance that I have Lyme disease is 5%, and yet that motivates significant action because of hugely asymmetric cost considerations, or rapid decrease in efficacy of action. I think there's often a problem where someone 'has short timelines' in the sense that they think 10-year scenarios should be planned about at all, but this can be easily mistaken for 'they think 10-year scenarios are most likely' because often if you think both an urgent concern and a distant concern are possible, almost all of your effort goes into the urgent concern instead of the distant concern (as sensible critical-path project management would suggest).
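The Lyme disease asymmetry can be made concrete with a toy expected-cost calculation. (All numbers here are hypothetical, chosen only to show how a 5% probability can still dominate the decision.)

```python
# Hypothetical numbers: a small probability times a large asymmetric cost
# can exceed the cost of acting, so acting is justified even at 5%.

p_lyme = 0.05
cost_if_untreated = 100_000  # assumed badness (arbitrary units) of ignoring it
cost_of_acting = 500         # assumed cost of testing / early treatment

expected_loss_if_ignored = p_lyme * cost_if_untreated  # 5000.0
should_act = expected_loss_if_ignored > cost_of_acting
print(should_act)  # True: action is motivated despite the low probability
```

The same structure applies to planning for 10-year scenarios: a modest probability of the urgent scenario, combined with a steep penalty for being unprepared, pulls most of the effort toward it.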

Comment by vaniver on The AI Timelines Scam · 2019-07-11T16:45:46.426Z · score: 53 (19 votes) · LW · GW

On 3, I notice this part of your post jumps out to me:

Of course, I'd have written a substantially different post, or none at all, if I believed the technical arguments that AGI is likely to come soon had merit to them

One possibility behind the "none at all" is that 'disagreement leads to writing posts, agreement leads to silence', but another possibility is 'if I think X, I am encouraged to say it, and if I think Y, I am encouraged to be silent.'

My sense is it's more the latter, which makes this seem weirdly 'bad faith' to me. That is, suppose I know Alice doesn't want to talk about biological x-risk in public because of the risk that terrorist groups will switch to using biological weapons, but I think Alice's concerns are overblown and so write a post about how actually it's very hard to use biological weapons and we shouldn't waste money on countermeasures. Alice won't respond with "look, it's not hard, you just do A, B, C and then you kill thousands of people," because this is worse for Alice than public beliefs shifting in a way that seems wrong to her.

It is not obvious what the right path is here. Obviously, we can't let anyone hijack the group epistemology by having concerns about what can and can't be made public knowledge, but also it seems like we shouldn't pretend that everything can be openly discussed in a costless way, or that the costs are always worth it.

Comment by vaniver on The Results of My First LessWrong-inspired I Ching Divination · 2019-07-09T17:22:10.349Z · score: 8 (4 votes) · LW · GW

One side note: I've been surprised by how much the presentation differed between the copy I originally read (Brian Walker's translation) and various "get I Ching readings online" sites that I've gone to over the years. It might be worth looking at a few different translations to find the one that fits you best.

It definitely makes sense to track "am I discovering anything new?", as measured by "I changed my plans" or "I explored fruitfully" or "my emotional orientation towards X improved" (instead of merely changing). It seems worth comparing to other retrospective / prospective analyses you might try; in the same way that one diet should be compared against other diets (not just on grounds of nutrition, but also enjoyment and convenience and so on).

I also attempted to track "how much of a stretch was the scenario/perspective/etc.?", where sometimes it would be right-on, other times I could kind of see it, and other times my sense was "nope, that's just not resonating at all." If something resonates too often, either you're on a run of luck that's unreasonably long or you're getting prompts that aren't specific enough to be wrong. If you're trying to train the skill of discernment, you need to notice both when things are right and when they're wrong; judging something as right is worthless unless you sometimes also judge things as wrong.

Comment by vaniver on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-06T18:40:32.649Z · score: 5 (2 votes) · LW · GW
People don't customize their houses all that much, to the degree they do it doesn't get them very good returns on well being per dollar spent, and they pay larger well being costs from the aforementioned commute and career inflexibility problems.

I feel conflicting desires here, to point out that this sometimes happens, and to worry that this is justifying a bias instead of correcting it. For example, I switched from 'wanting to rent' to 'wanting to buy' when I realized that I would benefit a lot from having an Endless Pool in my house, and that this wasn't compatible with renting (unless I could find a place that already had one, or whose owner wanted one, or so on). But that this convinced me doesn't mean that most people who are convinced are correctly convinced. It might be better for me to look seriously at the price differential and decide to invest in the policy of walking to the local pool more instead; but I think money actually isn't the main thing here. (Like, for a while I thought it was better to be in Austin than in the Bay because a software engineer would earn about $10k/yr more all things considered, and then after thinking about it realized that I was happy paying $10k/yr to be in the Bay instead.)

Comment by vaniver on Causal Reality vs Social Reality · 2019-07-05T00:52:39.780Z · score: 7 (3 votes) · LW · GW
Some defensiveness is both justified and adaptive.

This seems right but tricky. That is, it seems important to distinguish 'adaptive for my situation' and 'adaptive for truth-seeking' (either as an individual or as a community), and it seems right that hostility or counterattack or so on are sometimes the right tool for individual and community truth-seeking. (Sometimes you are better off if you gag Loki: even though gagging in general is a 'symmetric weapon,' gagging of trolls is as asymmetric as your troll-identification system.) Further, there's this way in which 'social monkey'-style defenses seem like they made it harder to know (yourself, or have it known in the community) that you have validly identified the person you're gagging as Loki (because you've eroded the asymmetry of your identification system).

It seems like the hoped-for behavior is something like the following: Alice gets a vibe that Bob is being non-cooperative, Alice points out an observation that is relevant to Alice's vibe ("Bob's tone") that also could generate the same vibe in others, and then Bob either acts in a reassuring manner ("oh, I didn't mean to offend you, let me retract the point or state it more carefully") or in a confronting manner ("I don't think you should have been offended by that, and your false accusation / tone policing puts you in the wrong"), and then there are three points to track: object-level correctness, whether Bob is being cooperative once Bob's cooperation has been raised to salience, and whether Alice's vibe of Bob's intent was a valid inference.

It seems to me like we can still go through a similar script without making excuses or obfuscating, but it requires some creativity and this might not be the best path to go down.

Comment by vaniver on Causal Reality vs Social Reality · 2019-07-05T00:34:30.206Z · score: 8 (3 votes) · LW · GW

To be clear I agree with the benefits of politeness, and also think people probably *underweight* the benefits of politeness because they're less easy to see. (And, further, there's a selection effect that people who are 'rude' are disproportionately likely to be ones who find politeness unusually costly or difficult to understand, and have less experience with its benefits.)

This is one of the reasons I like an injunction that's closer to "show the other person how to be polite to you" than "deal with it yourself"; often the person who 'didn't see how to word it any other way' will look at your script and go "oh, I could have written that," and sometimes you'll notice that you're asking them to thread a very narrow needle or are objecting to the core of their message instead of their tone.

Comment by vaniver on Causal Reality vs Social Reality · 2019-07-04T20:12:50.171Z · score: 36 (8 votes) · LW · GW
I, at least, am a social monkey.

I basically don't find this compelling, for reasons analogous to No, It's not The Incentives, it's you. Yes, there are ways to establish emotional safety between people so that I can point out errors in your reasoning in a way that reduces the degree of threat you feel. But there are also ways for you to reduce the number of bucket errors in your mind, so that I can point out errors in your reasoning without it seeming like an attack on "am I ok?" or something similar.

Versions of this sort of thing that look more like "here is how I would gracefully make that same objection" (which has the side benefit of testing for illusion of transparency) seem to me more likely to be helpful, whereas versions that look closer to "we need to settle this meta issue before we can touch the object level" seem to me like they're less likely to be helpful, and more likely to be the sort of defensive dodge that should be taxed instead of subsidized.

Comment by vaniver on Discussion Thread: The AI Does Not Hate You by Tom Chivers · 2019-06-29T14:36:31.463Z · score: 5 (2 votes) · LW · GW

Not very much--the feminism chapter is 6 pages, and the neoreaction chapter is 5 pages. Both read like "look, you might have heard rumors that they're bad because of X, but here's the more nuanced version," and basically give the sort of defense that Scott Alexander would give. About feminism, he mostly brings up Scott Aaronson's Comment #171 and Scott Alexander's response to the response, Scott Alexander's explanation of why there are so few female computer programmers (because of the distribution of interests varying by sex), and the overreaction to James Damore. On neoreaction, he brings up Moldbug's posts on Overcoming Bias, More Right, and Michael Anissimov, and says 'comment sections are the worst' and 'if you're all about taking ideas seriously and discussing them civilly, people who have no other discussion partners will seek you out.'

Comment by vaniver on Discussion Thread: The AI Does Not Hate You by Tom Chivers · 2019-06-29T04:15:56.733Z · score: 5 (2 votes) · LW · GW

You mean Part 7 ("The Dark Sides"), or the ways in which the book is bad?

I thought Part 7 was well-done, overall; he asks if we're a cult (and decides "no" after talking about the question in a sensible way), has a chapter on "you can't psychoanalyze your way to the truth", and talks about feminism and neoreactionaries in a way that's basically sensible.

Some community gossip shows up, but in a way that seems almost totally fair and respects the privacy of the people involved. My one complaint, as someone responsible for the LessWrong brand, is that he refers to one piece of community gossip as 'the LessWrong baby' and discusses a comment thread in which people are unkind to the mother*, while that comment thread happened on SlateStarCodex. But this is mostly the fault of the person he interviewed in that chapter, I think, who introduced that term, and is likely a sensible attempt to avoid naming the actual humans involved, which is what I've done whenever I want to refer to the gossip.

*I'm deliberately not naming the people involved, as they aren't named in the book either, and suspect it should stay that way. If you already know the story you know the search terms, and if you don't it's not really relevant.

Comment by vaniver on Discussion Thread: The AI Does Not Hate You by Tom Chivers · 2019-06-29T03:59:15.464Z · score: 10 (5 votes) · LW · GW

One of the things that I'm sad about is that the book makes no mention of LW 2.0 / the revival. (The last reference I could find was to something in early 2018, but much of the book relates to stuff happening in 2017.) We announced the transition in June 2017, but how much it had succeeded might not have been obvious then (or it may have been the sort of thing that didn't get advertised to Chivers by his in-person contacts), and so there's a chapter on the diaspora which says there's no central hub. Which is still somewhat true--I don't think LW is as much of a central hub as I want it to be--but is not true to the same extent that it was in 2016, say.

Comment by vaniver on Discussion Thread: The AI Does Not Hate You by Tom Chivers · 2019-06-28T23:27:12.621Z · score: 16 (6 votes) · LW · GW

I was pretty pleased with it, and recommended it to my parents. (Like Ajeya, I've had some difficulty giving them the full picture since I stopped working in industry.) There's a sentence on rationalists and small talk that I read out loud to several people in the office, all of whom thought it fit pretty well.

One correction: he refers several times to UCLA Berkeley, when it should just be UC Berkeley. (UCLA refers to the University of California at Los Angeles, a different university in the same UC system as Berkeley.)

Comment by vaniver on How to deal with a misleading conference talk about AI risk? · 2019-06-27T22:27:29.538Z · score: 15 (7 votes) · LW · GW

I've read the slides of the underlying talk, but not listened to it. I currently don't expect to write a long response to this. My thoughts about points the talk touches on:

  • Existential risk vs. catastrophic risk. Often, there's some question about whether or not existential risks are even possible. On slides 7 and 8 Sussman identifies a lot of reasons to think that humans cause catastrophic risks (ecological destruction could possibly kill 90% of people, but seems much more difficult for it to kill 100% of people), and the distinction between the two is only important if you think about the cosmic endowment. But of course if we think AI is an existential threat, and we think humans make AI, then it is true that humans present an existential threat to ourselves. I also note here that Sussman identifies synthetic biology as possibly an existential risk, which raises the question of why an AI couldn't be a source of the existential risk presented by synthetic biology. (If an AI is built that wants to kill us, and that weapon is lying around, then we should be more concerned about AI because it has an opportunity.)
  • Accident risk vs. misuse risk. This article talks about it some, but the basic question is "will advanced AI cause problems because it did something no one wanted (accidents), or something bad people wanted (misuse)?". Most technical AI safety research is focused on accident risk, for reasons that are too long to describe here, but it's not crazy to be concerned about misuse risk, which seems to be Sussman's primary focus. I also think the sort of accident risks that we're concerned about require much deeper solutions than the normal sorts of bugs or accidents that one might imagine on hearing about this; the autonomous vehicle accident that occupies much of the talk is not a good testbed for thinking about what I think of as 'accident risk' and instead one should focus on something like the 'nearest unblocked strategy' article and related things.
  • Openness vs. closure. Open software allows for verifiability; I can know that lots of people have evaluated the decision-making of my self-driving car, rather than just Tesla's internal programming team. But also open software allows for copying and modification; the software used to enable drones that deliver packages could be repurposed to enable drones that deliver hand grenades. If we think a technology is 'dual use', in that it can both be used to make things better (like printing DNA for medical treatments) and worse (like printing DNA to create new viruses), we generally don't want those technologies to be open, and instead have carefully monitored access to dissuade improper use.
  • Solving near-term problems vs. long-term problems. Many people working on technical AI safety focus on applications with immediate uses, like the underlying math for how autonomous vehicles might play nicely with human drivers, and many people working on technical AI safety focus on research that will need to be done before we can safely deploy advanced artificial intelligence. Both of these problems seem real to me, and I wouldn't dissuade someone from working on near-term safety work (especially if the alternative is that they do capabilities work!). I think that the 'long-term' here is measured in "low numbers of decades" instead of "low numbers of centuries," and so it might be a mistake to call it 'long-term,' but the question of how to do prioritization here is actually somewhat complicated, and it seems better if we end up in a world where people working on near-term and long-term issues see each other as collaborators and allies instead of competitors for a limited supply of resources or attention.
Comment by vaniver on How to deal with a misleading conference talk about AI risk? · 2019-06-27T22:01:52.060Z · score: 27 (11 votes) · LW · GW

I've given responses before where I go into detail about how I disagree with some public presentation on AI; the primary example is this one from January 2017, which Yvain also responded to. Generally this is done after messaging the draft to the person in question, to give them a chance to clarify or correct misunderstandings (and to be cooperative instead of blindsiding them).

I generally think it's counterproductive to 'partially engage' or to be dismissive; for example, one consequence of XiXiDu's interviews with AI experts was that some of them (that received mostly dismissive remarks in the LW comments) came away with the impression that people interested in AI risk were jerks who aren't really worth engaging with. For example, I might think someone is confused if they think climate change is more important than AI safety, but I don't think that it's useful to just tell them that they're confused or off-handedly remark that "of course AI safety is more important," since the underlying considerations (like the difference between catastrophic risks and existential risks) are actually non-obvious.

Comment by vaniver on Embedded Agency: Not Just an AI Problem · 2019-06-27T01:53:18.119Z · score: 31 (7 votes) · LW · GW

See here.

[As a side note, I notice that the habit of "pepper things with hyperlinks whenever possible" seems to be less common on modern LW than it was on old LW, but I think it was actually a pretty great habit and I'd like to see more of it.]

Comment by vaniver on Research Agenda in reverse: what *would* a solution look like? · 2019-06-26T23:17:52.865Z · score: 14 (5 votes) · LW · GW

In my experience, people mostly haven't had the view of "we can just do CEV, it'll be fine" and instead have had the view of "before we figure out what our preferences are, which is an inherently political and messy question, let's figure out how to load any preferences at all."

It seems like there needs to be some interplay here--"what we can load" informs "what shape we should force our preferences into" and "what shape our preferences actually are" informs "what loading needs to be capable of to count as aligned."

Comment by vaniver on Jordan Peterson on AI-FOOM · 2019-06-26T22:10:51.824Z · score: 32 (11 votes) · LW · GW

Yeah, it's sort of awkward that there are two different things one might want to talk about with FOOM: the idea of recursive self improvement in the typical I.J. Good sense, and the "human threshold isn't special and can be blown past quickly" idea. AlphaZero being able to hit the superhuman level at Go after 3 days of training, and doing so only a year or two after any professional Go player was defeated by a computer, feels relevant to the second thing but not the first (and is connected to the 'fleets of cars will learn very differently' thing Peterson is pointing at).

[And the two actually are distinct; RSI is an argument for 'blowing past humans is possible' but many 'slow takeoff' views look more like "RSI pulls humans along with it" than "things look slow to a Martian," and there's ways to quickly blow past humans that don't involve RSI.]

Comment by vaniver on What does the word "collaborative" mean in the phrase "collaborative truthseeking"? · 2019-06-26T18:23:26.108Z · score: 21 (6 votes) · LW · GW

If "collaborative" is qualifying truth-seeking, perhaps we can see it more easily by contrast with non-collaborative truthseeking. So what might that look like?

  • I might simply be optimizing for the accuracy of my beliefs, instead of whether or not you also discover the truth.
  • I might be optimizing competitively, where my beliefs are simply judged on whether they're better than yours.
  • I might be primarily concerned about learning from the environment or from myself as opposed to learning from you.
  • I might be following only my interests, instead of joint interests.
  • I might be behaving in a way that doesn't incentivize you to point out things useful to me, or discarding clues you provide, or in a way that fails to provide you clues.

This suggests collaborative truthseeking is done 1) for the benefit of both parties, 2) in a way that builds trust and mutual understanding, and 3) in a way that uses that trust and mutual understanding as a foundation.

There's another relevant contrast, where we could look at collaborative non-truthseeking, or contrast "collaborative truthseeking" as a procedure with other procedures that could be used (like "allocating blame"), but this one seems most related to what you're driving at.

Comment by vaniver on Jordan Peterson on AI-FOOM · 2019-06-26T18:03:49.695Z · score: 25 (11 votes) · LW · GW

YouTube's transcript (with significant editing by me, mostly to clean and format):

Now the guys that are building the autonomous cars, they don't think they're building autonomous cars. They know perfectly well what they're doing. They're building fleets of mutually intercommunicating autonomous robots and each of them will be able to teach the others because their nervous system will be the same and when there's ten million of them, when one of them learns something all ten million of them will learn it at the same time. They're not gonna have to be very bright before they're very very very smart.
Because us, you know, we'll learn something. You have to imitate it, God that's hard. Or I have to explain it to you and you have to understand it and then you have to act it out. We're not connected wirelessly with the same platform, but robots they are and so once those things get a little bit smart they're not going to stop at a little bit smart for very long they're gonna be unbelievably smart like overnight.
And they're imitating the hell out of us right now too, because we're teaching them how to understand us; every second of every day the net is learning what we're like. It's watching us, it's communicating with us, it's imitating us, and it's gonna know. It already knows in some ways more about us than we know about ourselves. There's lots of reports already of people getting pregnancy ads or ads for infants, sometimes before they know they're pregnant, but often before they've told their families. The way that that happens is the net is watching what they're looking at and inferring with its artificial intelligence that maybe you're pregnant; that's just tilting you a little bit toward interest in things that you might not otherwise be interested in. The net tracks that, then it tells you what you're after; it does that by offering an advertisement. It's reading your unconscious mind.
Well, so that's what's happening.

Comment by vaniver on How does one get invited to the alignment forum? · 2019-06-25T04:07:16.628Z · score: 10 (5 votes) · LW · GW

We've been in something of a transition period with the alignment forum, where no one was paying active attention to promoting comments or posts or adding users, but starting soon I should be doing that. The primary thing that happens when someone's an AF member is that they can add posts and comments without approval (and one's votes also convey AF karma); I expect I'll mostly go through someone's comments on AF posts and ask "would I reliably promote content like this?" (or, indeed, "have I reliably promoted this person's comments on AF posts?").

Details about what sort of comments I'll think are helpful or insightful are, unfortunately, harder to articulate.

Comment by vaniver on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-20T18:35:03.151Z · score: 18 (6 votes) · LW · GW

Overall, I was pretty impressed by this; there were several points where I thought "sure, that would be nice, but obstacle X," and then the next section brought up obstacle X.

I remain sort of unconvinced that utility functions are the right type signature for this sort of thing, but I do feel convinced that "we need some sort of formal synthesis process, and a possible end product of that is a utility function."

That is, most of the arguments I see for 'how a utility function could work' go through some twisted steps. Suppose I'm trying to build a robot, and I want it to be corrigible, and I have a corrigibility detector whose type is 'decision process' to 'score'. I need to wrap that detector with a 'world state' to 'decision process' function and a 'score' to 'utility' function, and then I can hand it off to a robot that does a 'decision process' to 'world state' prediction and optimizes utility. If the robot's predictive abilities are superhuman, it can trace out whatever weird dependencies I couldn't see; if they're imperfect, then each new transformation provides another opportunity for errors to creep in. And it may be the case that this is a core part of reflective stability (because if you map through world-histories you bring objective reality into things in a way that will be asymptotically stable with increasing intelligence) that doesn't have another replacement.

I do find myself worrying that embedded agency will require dropping utility functions in a deep way that ends up connected to whether or not this agenda will work (or which parts of it will work), but remain optimistic that you'll find out something useful along the way and have that sort of obstacle in mind as you're working on it.

Comment by vaniver on Research Agenda v0.9: Synthesising a human's preferences into a utility function · 2019-06-20T18:02:42.275Z · score: 3 (1 votes) · LW · GW

Fixed a typo.

Comment by vaniver on Recommendation Features on LessWrong · 2019-06-17T05:45:52.623Z · score: 9 (4 votes) · LW · GW

So, I was just recommended Plastination is Maturing and Needs Funding. I considered putting some effort into "what's the state of plastination in 2019, 7 years later?" and commenting, but hit a handful of obstacles, one of which was "is the state of plastination in 2019 long content?". Like, the relevant fund paid out its prizes at various times, and it'd take a bit more digging to figure out if the particular team in Hanson's post was the one that won, and it's not really obvious if it matters. (Suppose we discover that the prize wasn't won by that team, after the evaluation was paid for; what does that imply?)

This makes me more excited about John's idea that shows posts with some simultaneity between users; like the Sequences Reruns, for example. It might be worth it to have a comment writing up what's changed for the other people clicking on it in 2019 who don't know where to look or aren't that committed to figuring things out, where it doesn't make sense to push that post into 'recent discussion' on my own (if this was randomly picked for me).

Is there a guide to 'Problems that are too fast to Google'?

2019-06-17T05:04:39.613Z · score: 48 (14 votes)
Comment by vaniver on Some Ways Coordination is Hard · 2019-06-14T22:26:12.053Z · score: 4 (2 votes) · LW · GW

I fixed it more.

Welcome to LessWrong!

2019-06-14T19:42:26.128Z · score: 78 (30 votes)
Comment by vaniver on [Answer] Why wasn't science invented in China? · 2019-06-11T20:57:31.243Z · score: 7 (3 votes) · LW · GW

Ruby, you might also want to borrow Why the West Rules--for Now from me; it focuses less on the scientific question and more on the economic and technological one (which ends up being connected), but I'm not sure it'll be all that different from Huff.

Comment by vaniver on [Answer] Why wasn't science invented in China? · 2019-06-11T20:41:53.380Z · score: 5 (2 votes) · LW · GW
It’s been asserted [source] that having Latin as a lingua franca was important for Europe's integrated market for ideas. Makes sense if scholars who otherwise speak different languages are going to be able to communicate.

But the Muslim world was much better off in this regard, with Arabic, and while China has major linguistic variation I think it also had a 'shared language' in basically the same way Latin was a shared language for Europe.

It seems to me like the thing that's important is not so much that the market is integrated, but that there are many buyers and sellers. The best works of Chinese philosophy, as far as I can tell, come from the period when there was major intellectual and military competition between competing factions; the contention between the Hundred Schools of Thought. And then after unification the primary work available for scholars was the unified bureaucracy, which was interested in the Confucian-Legalist blend that won the unification war, and nothing else.

Comment by vaniver on Steelmanning Divination · 2019-06-08T19:32:18.249Z · score: 4 (2 votes) · LW · GW

I would imagine so, because it means you learn the cards as opposed to the sequence of cards. ("In French, chateau always follows voiture.")

Comment by vaniver on Steelmanning Divination · 2019-06-08T19:30:41.436Z · score: 5 (3 votes) · LW · GW

I mean, I think it would be more accurate to say something like "the die roll, as it's uncorrelated with features of the decision, doesn't give me any new information about which action is best," but the reason why I point to conservation of expected evidence (CoEE) is because it is actually a valid introspective technique to imagine acting by coinflip or dieroll and then see if you're hoping for a particular result, which rhymes with "if you can predictably update in direction X, then you should already be there."

Comment by vaniver on Steelmanning Divination · 2019-06-05T23:04:54.985Z · score: 12 (8 votes) · LW · GW

The SSC post that motivated finally finishing this up was Book Review: The Secret Of Our Success, which discusses the game-theoretic validity of randomization in competitive endeavors (like hunters vs. prey, or generals vs. generals). It seemed important to also bring up the other sorts of validity, of randomness as debiasing or de-confounding (like why randomized controlled trials are good) or randomness as mechanism to make salient pre-existing principles. I'm reminded of some online advice-purveyor who would often get emails from people asking if their generic advice applied to their specific situation; almost always, the answer was 'yes,' and there was something about the personal attention that was relevant; having it be the case that this particular bit of advice was selected for the situation you're in makes it feel worth considering in a way that "yeah, I guess I could throw this whole book of advice at my problem" doesn't.

Steelmanning Divination

2019-06-05T22:53:54.615Z · score: 137 (52 votes)
Comment by vaniver on Quotes from Moral Mazes · 2019-05-31T17:10:05.899Z · score: 12 (6 votes) · LW · GW

This was probably an accurate depiction of American corporate management when it was written, in the 80s. Since then, things have changed somewhat (in part by tech becoming a larger fraction of the economy, and by increasing meritocracy through increased competitiveness), but I think it's still present in a major way.

It seems like most of these quotes are directly at odds with seeking profit (either long- or short-term), and it would be enlightening to hear why there's not a bunch more efficient organizations taking over.

I think this is happening, but it's slow. Koch Industries claims that a major piece of social tech they use is compensating managers based on the net present value of the thing they're managing, rather than whether they're hitting key targets, and they're growing at something like 10% faster than the rest of the economy, but that still means a very long time until they've taken over (and the larger they get, the harder it is to maintain that relative rate).

Comment by vaniver on Comment section from 05/19/2019 · 2019-05-20T20:45:31.020Z · score: 14 (4 votes) · LW · GW
If we are going to have the exception to the norm at all, then there has to be a pretty high standard of evidence to prove that adding 'Y' to the discourse, in fact, has bad consequences.

I want to note that LW definitely has exceptions to this norm, if only because of the boring, normal exceptions. (If we would get in trouble with law enforcement for hosting something you might put on LW, don't put it on LW.) We've had in the works (for quite some time) a post explaining our position on less boring cases more clearly, but it runs into difficulty with the sort of issues that you discuss here; generally these questions are answered in private in a way that connects to the judgment calls being made and the particulars of the case, as opposed to through transparent principles that can be clearly understood and predicted in advance (in part because, to extend the analogy, this empowers the werewolves as well).

Comment by vaniver on Comment section from 05/19/2019 · 2019-05-20T00:33:15.497Z · score: 44 (12 votes) · LW · GW

[Written as an admin]

First and foremost, LW is a space for intellectual progress about rationality and related topics. Currently, we don't ban people for being fixated on a topic, or 'darkly hinting,' or posts they make off-site, and I don't think we should. We do keep a careful eye on such people, and interpret behavior in 'grey areas' accordingly, in a way that I think reflects both good Bayesianism and good moderation practice.

In my favorite world, people who disagree on object-level questions (both political and non-political) can nevertheless civilly discuss abstract issues. This favors asymmetric weapons and is a core component of truth-seeking. So, while hurt feelings and finding things unpleasant are legitimate and it's worth spending effort optimizing to prevent them, we can't give them that much weight unless they differentiate the true and the untrue.

That said, there are ways to bring up true things that as a whole move people away from the truth, and you might be worried about agreements on abstractions being twisted to force agreement on object-level issues. These are hard to fight, and frustrating if you see them and others don't. The best response I know is to catalog the local truths and lay out how they add up to a lie, or establish the case that agreement on those abstractions doesn't force agreement on the object-level issues, and bring up the catalog every time the local truth advances a global lie. This is a lot more work than flyswatting, but has a much stronger bent towards truth. If you believe this is what Zack is doing, I encourage you to write a compilation post and point people to it as needed; due to the nature of that post, and where it falls on the spectrum from naming abstract dynamics to call-out post, we might leave it on your personal blog or ask that you publish it outside of LW (and link to it as necessary).

Comment by vaniver on Comment section from 05/19/2019 · 2019-05-19T22:03:32.953Z · score: 12 (6 votes) · LW · GW

Generally, if you want to talk about how LW is moderated or unpleasant behavior happening here, you should talk to me. [If you think I'm making mistakes, the person to talk to is probably Habryka.] We don't have an official ombudsman, and perhaps it's worth putting some effort into finding one.

Comment by vaniver on Best reasons for pessimism about impact of impact measures? · 2019-04-20T04:26:08.560Z · score: 3 (1 votes) · LW · GW

Suppose the goal dramatically overvalues some option; then the AI would be willing to pay large (correctly estimated) costs in order to achieve "even larger" (incorrectly estimated) gains.

Comment by vaniver on Best reasons for pessimism about impact of impact measures? · 2019-04-11T19:09:00.203Z · score: 31 (10 votes) · LW · GW

When I think about solutions to AI alignment, I often think about 'meaningful reductionism.' That is, if I can factor a problem into two parts, and the parts don't actually rely on each other, now I have two smaller problems to solve. But if the parts are reliant on each other, I haven't really simplified anything yet.

While impact measures feel promising to me as a cognitive strategy (often my internal representation of politeness feels like 'minimizing negative impact', like walking on sidewalks in a way that doesn't startle birds), they don't feel promising to me as reductionism. That is, if I already had a solution to the alignment problem, then impact measures would likely be part of how I implement that solution, but solving it separately from alignment doesn't feel like it gets me any closer to solving alignment.

[The argument here I like most rests on the difference between costs and side effects; we don't want to minimize side effects because that leads to minimizing good side effects also, and it's hard to specify the difference between 'side effects' and 'causally downstream effects,' and so on. But if we just tell the AI "score highly on a goal measure while scoring low on this cost measure," this only works if we specified the goal and the cost correctly.]

But there's a different approach to AI alignment, which is something more like 'correct formalisms.' We talk sometimes about handing a utility function to the robot, or (in old science fiction) providing it with rules to follow, or so on, and by seeing what it actually looks like when we follow that formalism we can figure out how well that formalism fits to what we're interested in. Utility functions on sensory inputs don't seem alignable because of various defects (like wireheading), and so it seems like the right formalism needs to have some other features (it might still be a utility function, but it needs to be a utility function over mental representations of external reality in such a way that the mental representation tracks external reality even when you have freedom to alter your mental representation, in a way that we can't turn into code yet).

So when I ask myself questions like "why am I optimistic about researching impact measures now?" I get answers like "because exploring the possibility space will make clear exactly how the issues link up." For example, looking at things like relative reachability made it clear to me how value-laden the ontology needs to be in order for a statistical measure on states to be meaningful. This provides a different form-factor for 'transferring values to the AI'; instead of trying to ask something like "is scenario A or B better?" and train a utility function, I might instead try to ask something like "how different are scenarios A and B?" or "how are scenarios A and B different?" and train an ontology, with the hopes that this makes other alignment problems easier because the types line up somewhat more closely.

[I think even that last example still performs poorly on the 'meaningful reductionism' angle, since getting more options for types to use in value loading doesn't seem like it addresses the core obstacles of value loading, but provides some evidence of how it could be useful or clarify thinking.]

Comment by vaniver on Announcing the Center for Applied Postrationality · 2019-04-02T17:09:39.788Z · score: 11 (3 votes) · LW · GW

Could GPT2 make a good weird sun twitter? Probably not, but it could at least be a good inspirobot.

Comment by vaniver on User GPT2 is Banned · 2019-04-02T17:05:23.326Z · score: 18 (9 votes) · LW · GW

It's trained on the whole corpus of LW comments and replies that got sufficiently high karma; naively I wouldn't expect a day to make much of a dent in the training data. But there's an interesting fact about training to match distributions, which is that most measures of distributional overlap (like the KL divergence) are asymmetric; how similar the corpus is to model outputs is different from how similar model outputs are to the corpus. Geoffrey Irving is interested in methods to use supervised learning to do distributional matching the other direction, and it might be the case that comment karma is a good way to do it; my guess is that you're better off comparing outputs it generates on the same prompt head-to-head and picking which one is more 'normal,' and training a discriminator to attempt to mimic the human normality judgment.
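The asymmetry mentioned above can be illustrated with a toy calculation; this is just a sketch of the general property of KL divergence, not anything specific to the GPT2 setup (the distributions and labels here are made up):

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions over three "comment types".
corpus = [0.5, 0.4, 0.1]  # what the corpus looks like
model = [0.8, 0.1, 0.1]   # what the model generates

# D(corpus || model) penalizes the model for under-covering corpus modes;
# D(model || corpus) penalizes the model for generating off-corpus output.
forward = kl(corpus, model)
reverse = kl(model, corpus)

print(forward, reverse)  # the two directions generally differ
```

Which direction you optimize determines which failure mode you tolerate: the forward direction favors models that cover everything the corpus does, the reverse favors models that never stray from it.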

Comment by vaniver on You Have About Five Words · 2019-03-13T21:46:50.613Z · score: 3 (1 votes) · LW · GW
In the case of LessWrong, I think the core sequences are around 10,000 words, not sure how big the overall EA corpus is.

This feels like a 100x underestimate; The Sequences clock in at over a million words, I believe, and it's not the case that only 1% of the words are core.

Comment by vaniver on How dangerous is it to ride a bicycle without a helmet? · 2019-03-09T06:55:28.119Z · score: 13 (5 votes) · LW · GW

It feels like the per-experience costs are more relevant than the lifetime costs, since *also* you have to aggregate the lifetime annoyance. "Is it worth wearing a helmet this time to avoid 2/3rds of a micromort?"

It could be the case that the "get used to it" costs are a single investment, or there are other solutions that might not be worth it for someone who can tolerate a normal helmet but are worth it for habryka.

Comment by vaniver on In My Culture · 2019-03-08T18:54:21.409Z · score: 7 (3 votes) · LW · GW
My closest answer would be something like "in my version of utopia," although maybe that's too strong?

I think this implies way too much endorsement. I often find myself editing a document and thinking "in American English, the comma goes inside the quotation marks," even though "in programming, the period goes outside the quotation marks".

Comment by vaniver on Rule Thinkers In, Not Out · 2019-03-05T04:16:20.883Z · score: 33 (8 votes) · LW · GW

When someone has an incomplete moral worldview (or one based on easily disprovable assertions), there's a way in which the truth isn't "safe" if safety is measured by something like 'reversibility' or 'ability to continue being the way they were.' It is also often the case that one can't make a single small change, and then move on; if, say, you manage to convince a Christian that God isn't real (or some other thing that will predictably cause the whole edifice of their worldview to come crashing down eventually), then the default thing to happen is for them to be lost and alone.

Where to go from there is genuinely unclear to me. Like, one can imagine caring mostly about helping other people grow, in which a 'reversibility' criterion is sort of ludicrous; it's not like people can undo puberty, or so on. If you present them with an alternative system, they don't need to end up lost and alone, because you can directly introduce them to humanism, or whatever. But here you're in something of a double bind; it's somewhat irresponsible to break people's functioning systems without giving them a replacement, and it's somewhat creepy if you break people's functioning systems to pitch your replacement. (And since 'functioning' is value-laden, it's easy for you to think their system needs replacing.)

Comment by vaniver on Rule Thinkers In, Not Out · 2019-03-02T18:04:10.535Z · score: 21 (4 votes) · LW · GW

I think I have this skill, but I don't know that I could write this guide. Partly this is because there are lots of features about me that make this easier, which are hard (or too expensive) to copy. For example, Michael once suggested part of my emotional relationship to lots of this came from being gay, and thus not having to participate in a particular variety of competition and signalling that was constraining others; that seemed like it wasn't the primary factor, but was probably a significant one.

Another thing that's quite difficult here is that many of the claims are about values, or things upstream of values; how can Draco Malfoy learn the truth about blood purism in a 'safe' way?

Comment by vaniver on Rule Thinkers In, Not Out · 2019-02-28T19:58:05.499Z · score: 15 (4 votes) · LW · GW

He appears to have had novel ideas in his technical specialty, but his public writings are mostly about old ideas that have insufficient public defense. There, novelty isn't a virtue (while correctness is).

Comment by vaniver on Rule Thinkers In, Not Out · 2019-02-28T17:24:35.572Z · score: 29 (9 votes) · LW · GW

My sense is that his worldview was 'very sane' in the cynical HPMOR!Quirrell sense (and he was one of the major inspirations for Quirrell, so that's not surprising), and that he was extremely open about it in person in a way that was surprising and exciting.

I think his standout feature was breadth more than depth. I am not sure I could distinguish which of his ideas were 'original' and which weren't. He rarely if ever wrote things, which makes the genealogy of ideas hard to track. (Especially if many people who do write things were discussing ideas with him and getting feedback on them.)

Public Positions and Private Guts

2018-10-11T19:38:25.567Z · score: 90 (27 votes)

Maps of Meaning: Abridged and Translated

2018-10-11T00:27:20.974Z · score: 54 (22 votes)

Compact vs. Wide Models

2018-07-16T04:09:10.075Z · score: 32 (13 votes)

Thoughts on AI Safety via Debate

2018-05-09T19:46:00.417Z · score: 88 (21 votes)

Turning 30

2018-05-08T05:37:45.001Z · score: 69 (21 votes)

My confusions with Paul's Agenda

2018-04-20T17:24:13.466Z · score: 90 (22 votes)

LW Migration Announcement

2018-03-22T02:18:19.892Z · score: 139 (37 votes)

LW Migration Announcement

2018-03-22T02:17:13.927Z · score: 2 (2 votes)

Leaving beta: Voting on moving to

2018-03-11T23:40:26.663Z · score: 6 (6 votes)

Leaving beta: Voting on moving to

2018-03-11T22:53:17.721Z · score: 139 (42 votes)

LW 2.0 Open Beta Live

2017-09-21T01:15:53.341Z · score: 23 (23 votes)

LW 2.0 Open Beta starts 9/20

2017-09-15T02:57:10.729Z · score: 24 (24 votes)

Pair Debug to Understand, not Fix

2017-06-21T23:25:40.480Z · score: 8 (8 votes)

Don't Shoot the Messenger

2017-04-19T22:14:45.585Z · score: 11 (11 votes)

The Quaker and the Parselmouth

2017-01-20T21:24:12.010Z · score: 6 (7 votes)

Announcement: Intelligence in Literature Prize

2017-01-04T20:07:50.745Z · score: 9 (9 votes)

Community needs, individual needs, and a model of adult development

2016-12-17T00:18:17.718Z · score: 12 (13 votes)

Contra Robinson on Schooling

2016-12-02T19:05:13.922Z · score: 4 (5 votes)

Downvotes temporarily disabled

2016-12-01T17:31:41.763Z · score: 17 (18 votes)

Articles in Main

2016-11-29T21:35:17.618Z · score: 3 (4 votes)

Linkposts now live!

2016-09-28T15:13:19.542Z · score: 27 (30 votes)

Yudkowsky's Guide to Writing Intelligent Characters

2016-09-28T14:36:48.583Z · score: 4 (5 votes)

Meetup : Welcome Scott Aaronson to Texas

2016-07-25T01:27:43.908Z · score: 1 (2 votes)

Happy Notice Your Surprise Day!

2016-04-01T13:02:33.530Z · score: 14 (15 votes)

Posting to Main currently disabled

2016-02-19T03:55:08.370Z · score: 22 (25 votes)

Upcoming LW Changes

2016-02-03T05:34:34.472Z · score: 46 (47 votes)

LessWrong 2.0

2015-12-09T18:59:37.232Z · score: 92 (96 votes)

Meetup : Austin, TX - Petrov Day Celebration

2015-09-15T00:36:13.593Z · score: 1 (2 votes)

Conceptual Specialization of Labor Enables Precision

2015-06-08T02:11:20.991Z · score: 10 (11 votes)

Rationality Quotes Thread May 2015

2015-05-01T14:31:04.391Z · score: 9 (10 votes)

Meetup : Austin, TX - Schelling Day

2015-04-13T14:19:21.680Z · score: 1 (2 votes)


2015-04-08T02:56:25.114Z · score: 42 (36 votes)

Thinking well

2015-04-01T22:03:41.634Z · score: 28 (29 votes)

Rationality Quotes Thread April 2015

2015-04-01T13:35:48.660Z · score: 7 (9 votes)

Meetup : Austin, TX - Quack's

2015-03-20T15:12:31.376Z · score: 1 (2 votes)

Rationality Quotes Thread March 2015

2015-03-02T23:38:48.068Z · score: 8 (8 votes)

Rationality Quotes Thread February 2015

2015-02-01T15:53:28.049Z · score: 6 (6 votes)

Control Theory Commentary

2015-01-22T05:31:03.698Z · score: 18 (18 votes)

Behavior: The Control of Perception

2015-01-21T01:21:58.801Z · score: 31 (31 votes)

An Introduction to Control Theory

2015-01-19T20:50:02.624Z · score: 35 (35 votes)

Estimate Effect Sizes

2014-03-27T16:56:35.113Z · score: 1 (2 votes)

[LINK] Will Eating Nuts Save Your Life?

2013-11-30T03:13:03.878Z · score: 7 (12 votes)

Understanding Simpson's Paradox

2013-09-18T19:07:56.653Z · score: 11 (11 votes)

Rationality Quotes September 2013

2013-09-04T05:02:05.267Z · score: 5 (5 votes)

Harry Potter and the Methods of Rationality discussion thread, part 27, chapter 98

2013-08-28T19:29:17.855Z · score: 2 (3 votes)

Rationality Quotes August 2013

2013-08-02T20:59:04.223Z · score: 7 (7 votes)