Comment by robbbb on "Intelligence is impossible without emotion" — Yann LeCun · 2019-04-10T21:47:53.444Z · score: 13 (4 votes) · LW · GW

My prior is that Yann LeCun tends to have unmysterious, thoughtful models of AI (example), even though I strongly disagree with (and am often confused by) his claims about AI safety. So when Yann says "emotion", I wonder if he means anything more than that AI systems "can decide what they do" and have "some intrinsic drive that makes them [...] do particular things" as opposed to having "preprogrammed behavior".

Comment by robbbb on Comparison of decision theories (with a focus on logical-counterfactual decision theories) · 2019-03-18T05:13:53.675Z · score: 6 (3 votes) · LW · GW

Agents need to consider multiple actions and choose the one that has the best outcome. But we're supposing that the code representing the agent's decision only has one possible output. E.g., perhaps an agent is going to choose between action A and action B, and will end up choosing A. Then a sufficiently close examination of the agent's source code will reveal that the scenario "the agent chooses B" is logically inconsistent. But then it's not clear how the agent can reason about the desirability of "the agent chooses B" while evaluating its outcomes, if not via some mechanism for nontrivially reasoning about outcomes of logically inconsistent situations.
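As a minimal toy sketch of my own (not from the post, and with hypothetical payoffs) of the tension described above — a deterministic agent whose decision procedure nonetheless has to evaluate the action it provably won't take:

```python
# Hypothetical payoff table the agent uses to imagine outcomes.
UTILITY = {"A": 10, "B": 5}

def agent():
    # This function provably returns "A", so the scenario "agent() == 'B'"
    # is logically inconsistent. The max() below only works because the
    # agent evaluates "what if I chose B?" against an imagined model of
    # outcomes (UTILITY), not against its own actual source code — which
    # is exactly the gap the comment above is pointing at.
    return max(UTILITY, key=UTILITY.get)

print(agent())  # "A"
```

The sketch sidesteps the hard problem (the payoff model is handed to the agent from outside); the open question is what licenses that model once the agent can inspect its own source.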

Comment by robbbb on Comparison of decision theories (with a focus on logical-counterfactual decision theories) · 2019-03-17T19:49:37.776Z · score: 6 (3 votes) · LW · GW

The comment starting "The main datapoint that Rob left out..." is actually by Nate Soares. I cross-posted it to LW from an email conversation.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-16T18:28:52.502Z · score: 4 (2 votes) · LW · GW

I've now also highlighted Scott's tip from "Fixed Point Exercises":

Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.
These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.
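(As a concrete instance of the kind of theorem Scott is pointing at — my own illustration, not part of his exercises — here is Banach-style fixed-point iteration, the simplest constructive fixed-point result:)

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x -> f(x) until successive values converge.

    For a contraction mapping, the Banach fixed-point theorem
    guarantees this converges to the unique fixed point.
    """
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("did not converge")

# cos is a contraction near its fixed point, so iteration converges
# to the solution of cos(x) = x (the Dottie number, ~0.739085).
x_star = fixed_point(math.cos, 1.0)
print(round(x_star, 6))
```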
Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-16T14:31:39.854Z · score: 4 (2 votes) · LW · GW

I'd expect Jessica/Stuart/Scott/Abram/Sam/Tsvi to have a better sense of that than me. I didn't spot any obvious signs that it's no longer a good reference.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-15T05:44:20.940Z · score: 5 (3 votes) · LW · GW

For corrigibility in particular, some good material that's not discussed in "Embedded Agency" or the reading guide is Arbital's Corrigibility and Problem of Fully Updated Deference articles.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-15T05:36:42.850Z · score: 14 (5 votes) · LW · GW

The only major changes we've made to the MIRI research guide since mid-2015 are to replace Koller and Friedman's Probabilistic Graphical Models with Pearl's Probabilistic Inference; replace Rosen's Discrete Mathematics with Lehman et al.'s Mathematics for CS; add Taylor et al.'s "Alignment for Advanced Machine Learning Systems", Wasserman's All of Statistics, Shalev-Shwartz and Ben-David's Understanding Machine Learning, and Yudkowsky's Inadequate Equilibria; and remove the Global Catastrophic Risks anthology. So the guide is missing a lot of new material. I've now updated the guide to add the following note at the top:

This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:
1. If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk and to work as an engineer at MIRI. For this purpose, you don’t need any prior familiarity with our research.
If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.
You can find out more about our engineering program in our 2018 strategy update.
2. If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency” for an introduction to our agent foundations research, and see our Alignment Research Field Guide for general recommendations on how to get started in AI safety.
After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises.”
If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.
Comment by robbbb on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-03-13T18:07:48.572Z · score: 7 (3 votes) · LW · GW
After all, they didn't get any less publicity for reporting the system's other limitations either, like its only being able to play Protoss v. Protoss on a single map, or 10 of the 11 agents having whole-map vision.

They might well have gotten less publicity due to emphasizing those facts as much as they did.

Comment by robbbb on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-03-13T18:04:16.343Z · score: 5 (2 votes) · LW · GW

I mostly agree with this comment. My speculative best guess is that the main reason MaNa did better against the revised version of AlphaStar wasn't due to the vision limitations, but rather some combination of:

MaNa had more time to come up with a good strategy and analyze previous games.

MaNa had more time to warm up, and was generally in a better headspace.

The previous version of AlphaStar was unusually good, and the new version was an entirely new system, so the new version regressed to the mean a bit. (On the dimension "can beat human pros", even though it was superior on the dimension "can beat other AlphaStar strategies".)

Comment by robbbb on Considerateness in OpenAI LP Debate · 2019-03-12T22:24:22.462Z · score: 10 (2 votes) · LW · GW

Eliezer responded to Chollet's post about intelligence explosion here: https://intelligence.org/2017/12/06/chollet/

Comment by robbbb on Renaming "Frontpage" · 2019-03-12T00:53:11.596Z · score: 4 (2 votes) · LW · GW

Personal Blog ➜ Notebook

Messages ➜ Mailbox

Comment by robbbb on Renaming "Frontpage" · 2019-03-11T16:21:11.300Z · score: 12 (4 votes) · LW · GW

Frontpage ➜ Whiteboard

Art ➜ Canvas

Coordination ➜ Bulletin Board

Meta ➜ Website

Comment by robbbb on In My Culture · 2019-03-10T21:12:27.930Z · score: 6 (4 votes) · LW · GW

I like this comment.

Comment by robbbb on Renaming "Frontpage" · 2019-03-09T04:13:21.305Z · score: 3 (2 votes) · LW · GW

Oooh, I like this. Fewer top-level sections seems good to me.

Comment by robbbb on In My Culture · 2019-03-08T00:03:09.758Z · score: 4 (2 votes) · LW · GW

That was my draft 1. :P

Comment by robbbb on In My Culture · 2019-03-07T21:51:57.113Z · score: 15 (5 votes) · LW · GW

For my personal usage, the way I could imagine using it, "in my culture" sounds a bit serious and final. "Where I'm from, we do X" is nice if I want something to sound weighty and powerful and stable, but I just don't think I've figured myself out enough to do that much yet. There might also be a bit of confusion in that "in my culture" also has a structurally similar literal meaning.

"In Robopolis" seems to fix these problems for me, since it more clearly flags that I'm not talking about a literal culture, and it sounds more agnostic about whether this is a deep part of who I am vs. a passing fashion.

Comment by robbbb on Karma-Change Notifications · 2019-03-06T01:00:58.500Z · score: 10 (5 votes) · LW · GW

The main thing I like about the 'only downvotes' option is that it's kind of funny and pointless. This suits my aesthetic. I could imagine trying it out for a few weeks to see what happens / to call the bluff of the part of my primate brain that thinks social disapproval from strangers is an x-risk. :)

Comment by robbbb on Karma-Change Notifications · 2019-03-05T22:20:41.706Z · score: 19 (8 votes) · LW · GW

If I'm having lunch with a friend, then my usual expectation is that I'll get strong compliments if they adore my clothing style, but I won't get strong criticisms if they strongly dislike it, unless I explicitly opt in to receiving the latter feedback. Most people seem to treat high-salience personal compliments as opt-out, while treating high-salience personal criticisms as opt-in. This can be outweighed if the criticism is important enough, but otherwise, criticism tends to be relatively mild and cloaked in humor or indirection.

Thinking about it in those terms, it makes sense to me to treat upvotes as similar to "person says they love my haircut" and downvotes as similar to "person says they hate my haircut." I probably want to be able to view both kinds of feedback in a time and place of my choosing, but I don't want to have the latter feedback tossed my way literally every time I open Chrome or check my email.

It might be that those norms are fine for personal style, but that we want to promote better, more pro-criticism norms in areas that matter more. We might want to push in the direction of making critical feedback opt-out, so people can (a) update faster on things that do matter a lot, and (b) perhaps get some useful exposure therapy that will make us better at receiving tips, pushback, and contrary views in the future. Mostly I'm just making this comment so folks feel comfortable talking about their preferences openly, without feeling like they're Bad Rationalists if they're not already convinced that it's useful for them personally to receive a regular stream of downvote notifications (in the world where they get a lot of downvotes).

Comment by robbbb on Thoughts on Human Models · 2019-02-26T04:09:41.219Z · score: 2 (1 votes) · LW · GW

That all seems generally fine to me. I agree the tradeoffs are the huge central difficulty here; getting to sufficiently capable AGI sufficiently quickly seems enormously harder if you aren't willing to cut major corners on safety.

Comment by robbbb on Thoughts on Human Models · 2019-02-25T23:19:28.697Z · score: 12 (3 votes) · LW · GW

The goal is to avoid particular hazards, rather than to make things human-independent as an end in itself. So if we accidentally use a concept of "human-independent" that yields impractical results like "the only safe concepts are those of fundamental physics", we should just conclude that we were using the wrong conception of "human-independent". A good way to avoid this is to keep revisiting the concrete reasons we started down this path in the first place, and see which conceptions capture our pragmatic goals well.

Here are some examples of concrete outcomes that various AGI alignment approaches might want to see, if they're intended to respond to concerns about human models:

• The system never exhibits thoughts like "what kind of agent built me?"
• The system exhibits thoughts like that, but never arrives at human-specific conclusions like "my designer probably has a very small working memory" or "my designer is probably vulnerable to the clustering illusion".
• The system never reasons about powerful optimization processes in general. (In addition to steering a wide berth around human models, this might be helpful for guarding against AGI systems doing some varieties of undesirable self-modification or building undesirable smart successors.)
• The system only allocates cognitive resources to solving problems in a specific domain like "biochemistry" or "electrical engineering".

Different alignment approaches can target different subsets of those goals, and of many other similar goals, depending on what they think is feasible and important for safety.

Comment by robbbb on Thoughts on Human Models · 2019-02-23T22:51:57.452Z · score: 3 (2 votes) · LW · GW
What about the possibility that the AGI system threatens others, rather than being threatened itself? Prima facie, that might also lead to worst-case outcomes.

I think a good intuition pump for this idea is to contrast an arbitrarily powerful paperclip maximizer with an arbitrarily powerful something-like-happiness maximizer.

A paperclip maximizer might resort to threats to get what it wants; and in the long run, it will want to convert all resources into paperclips and infrastructure, to the exclusion of everything humans want. But the "normal" failure modes here tend to look like human extinction.

In contrast, a lot of "normal" failure modes for a something-like-happiness maximizer might look like torture, because the system is trying to optimize something about human brains, rather than just trying to remove humans from the picture so it can do its own thing.

Do you envision a system that's not trained using human modelling and therefore just wouldn't know enough about human minds to make any effective threats? I'm not sure how an AI system can meaningfully be said to have "human-level general intelligence" and yet be completely inept in this regard.

I don't know specifically what Ramana and Scott have in mind, but I'm guessing it's a combination of:

• If the system isn't trained using human-related data, its "goals" (or the closest things to goals it has) are more likely to look like the paperclip maximizer above, and less likely to look like the something-like-happiness maximizer. This greatly reduces downside risk if the system becomes more capable than we intended.
• When AI developers build the first AGI systems, the right move will probably be to keep their capabilities to a bare minimum — often the minimum stated in this context is "make your system just capable enough to help make sure the world's AI doesn't cause an existential catastrophe in the near future". If that minimal goal doesn't require fluency with certain high-risk domains, then developers should just avoid letting their AGI systems learn about those domains, at least until they've gotten a lot of experience with alignment.

The first developers are in an especially tough position, because they have to act under more time pressure and they'll have very little experience with working AGI systems. As such, it makes sense to try to make their task as easy as possible. Alignment isn't all-or-nothing, and being able to align a system with one set of capabilities doesn't mean you can do so for a system with stronger or more varied capabilities.

If you want to say that such a system isn't technically a "human-level general intelligence", that's fine; the important question is about impact rather than definitions, as long as it's clear that when I say "AGI" I mean something like "system that's doing qualitatively the right kind of reasoning to match human performance in arbitrary domains, in large enough quantities to be competitive in domains like software engineering and theoretical physics", not "system that can in fact match human performance in arbitrary domains".

(Also, if you have such fine-grained control over what your system does or does not know about, or if you can have it do very powerful things without possessing dangerous kinds of knowledge and abilities, then I think many commonly discussed AI safety problems become non-issues anyway, as you can just constrain the system [accordingly].)

Yes, this is one of the main appeals of designing systems that (a) make it easy to blacklist or whitelist certain topics, (b) make it easy to verify that the system really is or isn't thinking about a particular domain, and (c) make it easy to blacklist human modeling in particular. It's a very big deal if you can just sidestep a lot of the core difficulties in AI safety (in your earliest AGI systems). E.g., operator manipulation, deception, mind crime, and some aspects of the fuzziness and complexity of human value.

We don't currently know how to formalize ideas like 'whitelisting cognitive domains', however, and even given a solution to those problems, we don't know how to align an AGI system in principle even for much more modest tasks.

Comment by robbbb on Some disjunctive reasons for urgency on AI risk · 2019-02-20T00:39:59.349Z · score: 2 (1 votes) · LW · GW

Yeah, I agree with this view and I believe it's the most common view among MIRI folks.

Comment by robbbb on Voting Weight Discussion · 2019-02-17T01:40:58.527Z · score: 8 (1 votes) · LW · GW

New thing: https://www.openphilanthropy.org/blog/new-web-app-calibration-training

Comment by robbbb on The Case for a Bigger Audience · 2019-02-15T21:58:50.455Z · score: 5 (2 votes) · LW · GW

The above updates me toward being more uncertain about whether it's a good idea to add an 'optional non-anonymized upvoting' feature. I'll note that separating out 'I agree with this' from 'I want to see more comments like this' is potentially extra valuable (maybe even necessary) for a healthy non-anonymized upvoting system, because it's more important to distinguish those things if your name's on the line. Also, non-anonymized 'I factually disagree with this' is a lot more useful than non-anonymized 'I want to see fewer comments/posts like this'.

Comment by robbbb on Why do you reject negative utilitarianism? · 2019-02-13T21:41:04.872Z · score: 10 (4 votes) · LW · GW
The utility of others is not my utility, therefore I am not a utilitarian. I reject unconditional altruism in general for this reason.

When I say that I'm a utilitarian (or something utilitarian-ish), I mean something like: If there were no non-obvious bad side-effects — e.g., it doesn't damage my ability to have ordinary human relationships in a way that ends up burning more value than it creates — I'd take a pill that would bind my future self to be unwilling to sacrifice two strangers to save a friend (or to save myself), all else being equal.

The not-obviously-confused-or-silly version of utilitarianism is "not reflectively endorsing extreme partiality toward yourself or your friends relative to strangers," rather than "I literally have no goals or preferences or affection for anything other than perfectly unbiased maximization of everyone's welfare."

Comment by robbbb on On Long and Insightful Posts · 2019-02-13T07:03:57.394Z · score: 10 (6 votes) · LW · GW

Long articles are often easier to refute because they make more claims, and their claims are more detailed.

Additionally, the point of writing a blog post isn't to make it easy to refute; and you don't get extra points for refuting an entire post vs. a piece of a post.

Comment by robbbb on Why do you reject negative utilitarianism? · 2019-02-13T00:45:16.259Z · score: 7 (3 votes) · LW · GW

And if you say "I don't push the button, but only because I want to cooperate with other moral theorists" or "I don't push the button, but only because NU is very very likely true but I have nonzero moral uncertainty": do you really think that's the reason? Does that really sound like the prescription of the correct normative theory (modulo your own cognitive limitations and resultant moral uncertainty)? If the negotiation-between-moral-theories spat out a slightly different answer, would this actually be a good idea?

Comment by robbbb on Why do you reject negative utilitarianism? · 2019-02-13T00:42:21.197Z · score: 14 (4 votes) · LW · GW
In evolutionary and developmental history terms, we can see at the first quick glance that many (if not immediately all) of our other motivations interact with suffering, or have interacted with our suffering in the past (individually, neurally, culturally, evolutionarily). They serve functions of group cohesion, coping with stress, acquiring resources, intimacy, adaptive learning & growth, social deterrence, self-protection, understanding ourselves, and various other things we value & honor because they make life easier or interesting.

Seems like all of this could also be said of things like "preferences", "enjoyment", "satisfaction", "feelings of correctness", "attention", "awareness", "imagination", "social modeling", "surprise", "planning", "coordination", "memory", "variety", "novelty", and many other things.

"Preferences" in particular seems like an obvious candidate for 'thing to reduce morality to'; what's your argument for only basing our decisions on dispreference or displeasure and ignoring positive preferences or pleasure (except instrumentally)?

Neither NU nor other systems will honor all of our perceived wants as absolutes to maximize

I'm not sure I understand your argument here. Yes, values are complicated and can conflict with each other. But I'd rather try to find reasonable-though-imperfect approximations and tradeoffs, rather than pick a utility function I know doesn't match human values and optimize it instead just because it's uncomplicated and lets us off the hook for thinking about tradeoffs between things we ultimately care about.

E.g., I like pizza. You could say that it's hard to list every possible flavor I enjoy in perfect detail and completeness, but I'm not thereby tempted to stop eating pizza, or to try to reduce my pizza desire to some other goal like 'existential risk minimization' or 'suffering minimization'. Pizza is just one of the things I like.

To actually reject NU, you must explain what makes something (other than suffering) terminally valuable (or as I say, motivating) beyond its instrumental value for helping us prevent suffering in the total context

E.g.: I enjoy it. If my friends have more fun watching action movies than rom-coms, then I'll happily say that that's sufficient reason for them to watch more action movies, all on its own.

Enjoying action movies is less important than preventing someone from being tortured, and if someone talks too much about trivial sources of fun in the context of immense suffering, then it makes sense to worry that they're a bad person (or not sufficiently in touch with their compassion).

But I understand your position to be not "torture matters more than action movies", but "action movies would ideally have zero impact on our decision-making, except insofar as it bears on suffering". I gather that from your perspective, this is just taking compassion to its logical conclusion; assigning some more value to saving horrifically suffering people than to enjoying a movie is compassionate, so assigning infinitely more value to the one than the other seems like it's just dialing compassion up to 11.

One reason I find this uncompelling is that I don't think the right way to do compassion is to ignore most of the things people care about. I think that helping people requires doing the hard work of figuring out everything they value, and helping them get all those things. That might reduce to "just help them suffer less" in nearly all real-world decisions nowadays, because there's an awful lot of suffering today; but that's a contingent strategy based on various organisms' makeup and environment in 2019, not the final word on everything that's worth doing in a life.

To reject NU, is there some value you want to maximize beyond self-compassion and its role for preventing suffering, at the risk of allowing extreme suffering? How will you tell this to someone undergoing extreme suffering?

I'll tell them I care a great deal about suffering, but I don't assign literally zero importance to everything else.

NU people I've talked to often worry about scenarios like torture vs. dust specks, and that if we don't treat happiness as literally of zero value, then we might make the wrong tradeoff and cause immense harm.

The flip side is dilemmas like:

Suppose you have a chance to push a button that will annihilate all life in the universe forever. You know for a fact that if you don't push it, then billions of people will experience billions upon billions of years of happy, fulfilling, suffering-free life, filled with richness, beauty, variety, and complexity; filled with the things that make life most worth living, and with relationships and life-projects that people find deeply meaningful and satisfying.

However, you also know for a fact that if you don't push the button, you'll experience a tiny, almost-unnoticeable itch on your left shoulder blade a few seconds later, which will be mildly unpleasant for a second or two before the Utopian Future begins. With this one exception, no suffering will ever again occur in the universe, regardless of whether you push the button. Do you push the button, because your momentary itch matters more than all of the potential life and happiness you'd be cutting out?

Comment by robbbb on Why do you reject negative utilitarianism? · 2019-02-12T01:22:20.147Z · score: 30 (15 votes) · LW · GW

I find negative utilitarianism unappealing for roughly the same reason I'd find "we should only care about disgust" or "we should only care about the taste of bananas" unappealing. Or if you think suffering is much closer to a natural kind than disgust, then supply some other mental (or physical!) state that seems more natural-kind-ish to you.

"Only suffering ultimately matters" and "only the taste of bananas ultimately matters" share the virtue of simplicity, but they otherwise run into the same difficulty, which is just that they don't exhaustively describe all the things I enjoy or want or prefer. I don't think my rejection of bananatarianism has to be any more complicated than that.

Something I wrote last year in response to a tangentially related paper:

I personally care about things other than suffering. What are negative utilitarians saying about that?
Are they saying that they don't care about things like friendship, good food, joy, catharsis, adventure, learning new things, falling in love, etc., except as mechanisms for avoiding suffering? Are they saying that I'm deluded about having preferences like those? Are they saying that I should try to change my preferences — and if so, why? Are they saying that my preferences are fine in my personal decision-making as an individual, but shouldn't get any weight in an idealized negotiation about what humanity as a group should do (ignoring any weight my preferences get from non-NU views that might in fact warrant a place at the bargaining table for more foundational or practical reasons distinct from the NU ideal) — and if so, why?
[...] "It's wrong to ever base any decision whatsoever on my (or anyone else's) enjoyment of anything whatsoever in life, except insofar as that enjoyment has downstream effects on other things" is an incredibly, amazingly strong claim. And it's important in this context that you're actually making that incredibly strong claim: more mild "negative-leaning" utilitarianisms (which probably shouldn't be associated with NU, given how stark the difference is) don't have to deal with the version of the world destruction argument I think x-risk people tend to be concerned about, which is not 'in some scenarios, careful weighing of the costs and benefits can justify killing lots of people' but rather 'any offsets or alternatives to building misaligned resource-hungry AGI (without suffering subsystems) get literally zero weight, if you're sufficiently confident that that's what you're building; there's no need to even consider them; they aren't even a feather on the scale'. I just don't see why the not-even-a-feather-on-the-scale view deserves any more attention or respect than, e.g., divine-command theory; in an argument between the "negative-leaning" utilitarian and the real negative utilitarian, I don't think the NU gets any good hits in.
(Simplicity is a virtue, but not when it's of the "I'm going to attempt to disregard every consideration in all of my actions going forward except the expected amount of deliciousness in the future" or "... except the expected amount of lying in the future" variety; so simplicity on its own doesn't raise the view to the level of having non-negligible probability compared to negative-leaning U.)
Comment by robbbb on The Case for a Bigger Audience · 2019-02-10T22:44:57.604Z · score: 5 (3 votes) · LW · GW

Idea: if someone hovers over the karma number, a tooltip shows number of voters plus who non-anonymously upvoted; and if you click the karma number, it gives you an option to make your vote non-anonymous (which results in a private notification, plus a public notification if it's an upvote).

This seems better to me than giving the "<" or ">" more functionality, since those are already pretty interactive and complex; whereas the number itself isn't really doing much.

Comment by robbbb on The Case for a Bigger Audience · 2019-02-09T21:41:00.774Z · score: 16 (9 votes) · LW · GW

I agree with this worry, and I have a vague feeling that LW is capturing and retaining less of the rationalist core than is ideal — (EDIT: for example,) I feel like I see LW posts linked/discussed on social media less than is ideal. Not for the purpose of bringing in new readers, but just for the purpose of serving as a common-knowledge hub for rationalists. That's just a feeling, though, and might reflect the bubbles I'm in. E.g., maybe LW is more of a thing on various Discords, since I don't use Discord much.

If we're getting fewer comments than we'd expect and desire given the number of posts or page visits, then that might also suggest that something's wrong with the incentives for commenting.

An opt-in way to give non-anonymous upvotes (either publicly visible, or visible to the upvoted poster, or both) feels to me like it would help with issues in this space, since it's a very low-effort way to give much more contentful/meaningful feedback than an anonymous upvote ("ah, Wei Dai liked my post" is way more information and reinforcement than "ah, my post has 4 more karma", while being a lot less effort than Wei Dai writing more comments). Also separating out "I like this / I want to see more stuff like this" votes from "I agree with this" votes (where I think "I agree with this" votes should only publicly display when they're non-anonymous). I feel like this helps with making posting more rewarding, and also just makes the site as a whole feel more hedonic and less impersonal.

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-20T21:06:45.523Z · score: 3 (2 votes) · LW · GW

Yes :) I wasn't thinking real leather, though maybe synthetic leather also has signaling problems..!

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-16T17:48:27.203Z · score: 6 (4 votes) · LW · GW

Yep, there are good reasons to go for a cheaper edition (e.g., people can buy dozens of copies to pass out without breaking the bank) and also to go for a more expensive edition. It makes sense to have one version that's very optimized for affordability (the current version, which is good-quality but roughly at cost), and a separate version that's optimized for other criteria. My main uncertainty is about which features Less Wrong readers are likely to care the most about, and how much those features are worth to them.

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-16T07:22:43.274Z · score: 6 (4 votes) · LW · GW

Not at present. Some people requested that we release higher-quality versions, so that's been on our radar, and I'd be interested to hear what kinds of variants people would and wouldn't be interested in buying. (Full-color, leather-bound, hardcover, etc.)

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-16T00:14:45.275Z · score: 10 (7 votes) · LW · GW

Yep! It doesn't try to include literally every term or reference someone might want to google, but it includes terms like a priori, bit, deontology, directed acyclic graph, Everett branch, normative, and orthogonality, in addition to more rationality-specific terms. The kinds of terms we leave out are ones like "IRC" where some people might need to google the term, but it's not really important enough to warrant a glossary entry.

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-15T22:47:09.702Z · score: 13 (8 votes) · LW · GW

The new version is under CC BY-NC-SA 4.0.

Comment by robbbb on New edition of "Rationality: From AI to Zombies" · 2018-12-15T22:40:07.161Z · score: 5 (3 votes) · LW · GW

Hmm, Chrome seems to have modified the URLs. Try https://intelligence.org/rationality-ai-zombies/ and http://gumroad.com/l/mapterritory (no HTTPS) instead.

The rationalitybook.com link is currently a redirect to the MIRI R:AZ book page while we wait for TrikeApps to finish setting up the proper book page. I figured it was better to release now rather than wait for the finished website and the HACYM ebook, since the print edition will take a few days to deliver and some people will probably want to buy these as holiday presents.

(Amazon currently says it can deliver copies to me by Dec. 18-19 in California.)

## New edition of "Rationality: From AI to Zombies"

2018-12-15T21:33:56.713Z · score: 78 (29 votes)
Comment by robbbb on Transhumanism as Simplified Humanism · 2018-12-12T03:23:49.138Z · score: 5 (3 votes) · LW · GW

K, cool. :)

Comment by robbbb on Transhumanism as Simplified Humanism · 2018-12-11T19:21:24.509Z · score: 31 (7 votes) · LW · GW

Yeah, "Life is good" doesn't validly imply "Living forever is good". There can obviously be offsetting costs; I think it's good to point this out, so we don't confuse "there's a presumption of evidence for (transhumanist intervention blah)" with "there's an ironclad argument against any possible offsetting risks/costs turning up in the future".

Like Said, I took Eliezer to just be saying "there's no currently obvious reason to think that the optimal healthy lifespan for most people is <200 (or <1000, etc.)." My read is that 2007-Eliezer is trying to explain why bioconservatives need to point to some concrete cost at all (rather than taking it for granted that sci-fi-ish outcomes are weird and alien and therefore bad), and not trying to systematically respond to every particular scenario one might come up with where the utilities do flip at a certain age.

The goal is to provide an intuition pump: "Wanting people to live radically longer, be radically smarter, be radically happier, etc. is totally mundane and doesn't require any exotic assumptions or bizarre preferences." Pretty similar to another Eliezer intuition pump:

In addition to standard biases, I have personally observed what look like harmful modes of thinking specific to existential risks. The Spanish flu of 1918 killed 25-50 million people. World War II killed 60 million people. 10^7 is the order of the largest catastrophes in humanity’s written history. Substantially larger numbers, such as 500 million deaths, and especially qualitatively different scenarios such as the extinction of the entire human species, seem to trigger a different mode of thinking—enter into a “separate magisterium.” People who would never dream of hurting a child hear of an existential risk, and say, “Well, maybe the human species doesn’t really deserve to survive.”
There is a saying in heuristics and biases that people do not evaluate events, but descriptions of events—what is called non-extensional reasoning. The extension of humanity’s extinction includes the death of yourself, of your friends, of your family, of your loved ones, of your city, of your country, of your political fellows. Yet people who would take great offense at a proposal to wipe the country of Britain from the map, to kill every member of the Democratic Party in the U.S., to turn the city of Paris to glass—who would feel still greater horror on hearing the doctor say that their child had cancer— these people will discuss the extinction of humanity with perfect calm. “Extinction of humanity,” as words on paper, appears in fictional novels, or is discussed in philosophy books—it belongs to a different context than the Spanish flu. We evaluate descriptions of events, not extensions of events. The cliché phrase end of the world invokes the magisterium of myth and dream, of prophecy and apocalypse, of novels and movies. The challenge of existential risks to rationality is that, the catastrophes being so huge, people snap into a different mode of thinking.

People tend to think about the long-term future in Far Mode, which makes near-mode good things like "watching a really good movie" or "helping a sick child" feel less cognitively available/relevant/salient. The point of Eliezer's "transhumanist proof by induction" isn't to establish that there can never be offsetting costs (or diminishing returns, etc.) to having more of a good thing. It's just to remind us that small concrete near-mode good things don't stop being good when we talk about far-mode topics. (Indeed, they're often the dominant consideration, because they can end up adding up to so much value when we talk about large-scale things.)

Comment by robbbb on Quantum Mechanics, Nothing to do with Consciousness · 2018-12-01T22:45:15.784Z · score: 5 (3 votes) · LW · GW
The Sequences make it seem like the Many Worlds interpretation has solved this problem but that's not true.

[...] But what does the integral over squared moduli have to do with anything?  On a straight reading of the data, you would always find yourself in both blobs, every time.  How can you find yourself in one blob with greater probability?  What are the Born probabilities, probabilities of?  Here's the map—where's the territory?
I don't know.  It's an open problem.  Try not to go funny in the head about it.
This problem is even worse than it looks, because the squared-modulus business is the only non-linear rule in all of quantum mechanics.  Everything else—everything else—obeys the linear rule that the evolution of amplitude distribution A, plus the evolution of the amplitude distribution B, equals the evolution of the amplitude distribution A + B.
When you think about the weather in terms of clouds and flapping butterflies, it may not look linear on that higher level.  But the amplitude distribution for weather (plus the rest of the universe) is linear on the only level that's fundamentally real.
Does this mean that the squared-modulus business must require additional physics beyond the linear laws we know—that it's necessarily futile to try to derive it on any higher level of organization?
But even this doesn't follow. [...]
[...] But, said Scott, we might encounter future evidence in favor of single-world quantum mechanics, and many-worlds still has the open question of the Born probabilities.
This is indeed what I would call the fallacy of privileging the hypothesis. There must be a trillion better ways to answer the Born question without adding a collapse postulate that would be the only non-linear, non-unitary, discontinuous, non-differentiable, non-CPT-symmetric, non-local in the configuration space, Liouville’s-Theorem-violating, privileged-space-of-simultaneity-possessing, faster-than-light-influencing, acausal, informally specified law in all of physics. Something that unphysical is not worth saying out loud or even thinking about as a possibility without a rather large weight of evidence—far more than the current grand total of zero.
But because of a historical accident, collapse postulates and single-world quantum mechanics are indeed on everyone’s lips and in everyone’s mind to be thought of, and so the open question of the Born probabilities is offered up (by Scott Aaronson no less!) as evidence that many-worlds can’t yet offer a complete picture of the world. Which is taken to mean that single-world quantum mechanics is still in the running somehow.
In the minds of human beings, if you can get them to think about this particular hypothesis rather than the trillion other possibilities that are no more complicated or unlikely, you really have done a huge chunk of the work of persuasion. Anything thought about is treated as “in the running,” and if other runners seem to fall behind in the race a little, it’s assumed that this runner is edging forward or even entering the lead.
[... O]ur uncertainty about where the Born statistics come from should be uncertainty within the space of quantum theories that are continuous, linear, unitary, slower-than-light, local, causal, naturalistic, et cetera—the usual character of physical law. Some of that uncertainty might slop outside the standard space onto theories that violate one of these standard characteristics. It’s indeed possible that we might have to think outside the box. But single-world theories violate all these characteristics, and there is no reason to privilege that hypothesis.
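As a toy illustration (my own sketch, not part of the quoted text) of the squared-modulus rule being discussed, and of the linearity it sits uneasily beside: the amplitudes for two "blobs" are complex numbers, the Born probabilities are their squared moduli (normalized), and any unitary evolution leaves the total squared norm untouched.

```python
import numpy as np

# Two "blobs" of amplitude (e.g. detector-reads-up vs. detector-reads-down),
# represented as complex amplitudes in an orthonormal basis.
state = np.array([3 + 4j, 1 - 2j], dtype=complex)

# The Born rule: the probability of each outcome is the squared modulus
# of its amplitude, normalized by the total squared norm.
probs = np.abs(state) ** 2 / np.sum(np.abs(state) ** 2)
print(probs)  # probabilities 5/6 and 1/6

# Everything *except* this rule is linear and unitary: a unitary map
# preserves the total squared norm, so the probabilities stay normalized.
theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
evolved = U @ state
assert np.isclose(np.sum(np.abs(evolved) ** 2), np.sum(np.abs(state) ** 2))
```

The open question in the quote is why this particular nonlinear rule connects the linear amplitude dynamics to observed frequencies at all, not how to compute with it.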

The main claims Eliezer is criticizing in the QM sequence are that (1) reifying QM's complex amplitudes runs afoul of Ockham's Razor, (2) objective collapse is a plausible explanation for the Born probabilities, (3) QM shows that reality is ineffable, and (4) QM shows that there's no such thing as reality. I don't know what question of fact you think the Quantum Bayesians and Eliezer disagree about, or what novel factual claim QB is making. (I assume we agree 'physical formalisms can be useful tools' and 'we can use probability theory to think about strength of belief' aren't novel claims.)

## On MIRI's new research directions

2018-11-22T23:42:06.521Z · score: 57 (16 votes)
Comment by robbbb on Suggestion: New material shouldn't be released too fast · 2018-11-22T00:38:48.664Z · score: 2 (1 votes) · LW · GW

I think it generally makes sense to have highly upvoted recent sequences spotlighted on the top of the page, for the same reason it makes sense to have them spotlighted in 'Curated'.

They can then be made rarer (or phased out entirely) once they're less recent, if there's less value to spotlighting them in the long run. Separately, I've generally had a hard time navigating from one post to the next, because the posts in Embedded Agency and Fixed Points on LW often haven't been collected into any sequence.

Comment by robbbb on Is Clickbait Destroying Our General Intelligence? · 2018-11-17T19:26:58.587Z · score: 3 (2 votes) · LW · GW

If people tend to systematically make a certain mistake, then it's worth asking whether there's some causal factor behind it and whether that could be nudging us toward making the same mistake.

On the other hand, our general ability to solve problems and figure things out presumably is either staying the same, or getting worse, or getting better. That's a factual question that we should be able to learn about, and if (after trying to correct for biases) we did end up reaching a conclusion that resembles an old mistake, well, then it's also possible that the truth resembles an old mistake.

Comment by robbbb on Sam Harris and the Is–Ought Gap · 2018-11-16T05:31:28.969Z · score: 26 (9 votes) · LW · GW

This is a great post, and I think does a good job of capturing why the two sides tend to talk past each other. A is baffled by why B claims to be able to reduce free-floating symbols to other symbols; B is baffled by why A claims to be using free-floating symbols.

They're also both probably right when it comes to "defending standard usage", and are just defending/highlighting different aspects of folk moral communication.

People often use "should" language to try to communicate facts; and if they were more self-aware about the truth-conditions of that language, they would be better able to communicate and achieve their goals. Harris thinks this is important.

People also often use "should" language to try to directly modify each other's motivations. (E.g., trying to express themselves in ways they think will apply social pressure or tug at someone's heartstrings.) Harris' critics think this is important, and worry that uncritically accepting Harris' project could conceal this phenomenon without making it go away.

(Well, I think the latter is less mysterian than the typical anti-Harris ethics argument, and Harris would probably be more sympathetic to the above framing than to the typical "ought is just its own thing, end of story" argument.)

Comment by robbbb on Embedded Agency (full-text version) · 2018-11-15T20:16:49.961Z · score: 20 (6 votes) · LW · GW

The above is the full Embedded Agency sequence, cross-posted from the MIRI website so that it's easier to find the text version on AIAF/LW (via search, sequences, author pages, etc.).

Scott and Abram have added a new section on self-reference to the sequence since it was first posted, and slightly expanded the subsequent section on logical uncertainty and the start of the robust delegation section.

Comment by robbbb on Robust Delegation · 2018-11-14T21:02:16.714Z · score: 2 (1 votes) · LW · GW

(Abram has added a note to this effect in the post above, and in the text version.)

Comment by robbbb on Embedded World-Models · 2018-11-14T20:52:06.724Z · score: 11 (3 votes) · LW · GW

Abram has made a major update to the post above, adding material on self-reference and the grain of truth problem. The corresponding text on the MIRI Blog version has also been expanded, with some extra material on those topics plus logical uncertainty.

Comment by robbbb on Embedded Agents · 2018-11-02T16:07:20.627Z · score: 6 (4 votes) · LW · GW

The next part just went live, and is about exactly that! http://intelligence.org/embedded-models

Comment by robbbb on Decision Theory · 2018-11-02T15:52:35.982Z · score: 28 (5 votes) · LW · GW

Cross-posting some comments from the MIRI Blog:

Konstantin Surkov:

Re: 5/10 problem
I don't get it. Human is obviously (in that regard) an agent reasoning about his actions. Human also will choose 10 without any difficulty. What in human decision making process is not formalizable here? Assuming we agree that 10 is rational choice.

Abram Demski:

Suppose you know that you take the $10. How do you reason about what would happen if you took the $5 instead? It seems easy if you know how to separate yourself from the world, so that you only think of external consequences (getting $5). If you think about yourself as well, then you run into contradictions when you try to imagine the world where you take the $5, because you know it is not the sort of thing you would do. Maybe you have some absurd predictions about what the world would be like if you took the $5; for example, you imagine that you would have to be blind. That's alright, though, because in the end you are taking the $10, so you're doing fine.
Part of the point is that an agent can be in a similar position, except it is taking the $5, knows it is taking the $5, and unable to figure out that it should be taking the $10 instead due to the absurd predictions it makes about what happens when it takes the $10. It seems kind of hard for a human to end up in that situation, but it doesn't seem so hard to get this sort of thing when we write down formal reasoners, particularly when we let them reason about themselves fully (as natural parts of the world) rather than only reasoning about the external world or having pre-programmed divisions (so they reason about themselves in a different way from how they reason about the world).

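
A minimal sketch of the failure mode Abram describes (my own toy model, not his formalism): if an agent's counterfactuals are naive conditionals over worlds consistent with its own deterministic source code, then conditioning on the untaken action is conditioning on a logical impossibility, and anything at all can be "predicted" about it.

```python
def agent():
    # Deterministic policy: this agent always takes the $10.
    return 10

# The only "possible world" consistent with the agent's source code
# is the one where it returns 10.
worlds = [agent()]

def conditional_utility(action):
    """Naive counterfactual: average utility over worlds where `action` is taken."""
    consistent = [w for w in worlds if w == action]
    if not consistent:
        # Conditioning on "agent() == 5" conditions on a contradiction:
        # there are no such worlds, so any claim about them ("you'd have
        # to be blind") is vacuously consistent.
        return None
    return sum(consistent) / len(consistent)

print(conditional_utility(10))  # 10.0
print(conditional_utility(5))   # None: the counterfactual is undefined
```

The interesting (and hard) case is an agent stuck on the $5 branch for the symmetric reason, which is what makes "reason nontrivially about logically inconsistent situations" a real desideratum rather than a curiosity.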
Comment by robbbb on Embedded Agents · 2018-11-01T01:02:22.297Z · score: 3 (2 votes) · LW · GW

It's the first post! The posts are indexed on https://www.alignmentforum.org/s/Rm6oQRJJmhGCcLvxh and https://intelligence.org/embedded-agency/, but it looks like they're not on LW?

Comment by robbbb on Embedded Agents · 2018-10-30T17:12:40.440Z · score: 24 (8 votes) · LW · GW

I'd draw more of a connection between embedded agency and bounded optimality or the philosophical superproject of "naturalizing" various concepts (e.g., naturalized epistemology).

Our old name for embedded agency was "naturalized agency"; we switched because we kept finding that CS people wanted to know what we meant by "naturalized", and we'd always say "embedded", so...

"Embodiment" is less relevant because it's about, well, bodies. Embedded agency just says that the agent is embedded in its environment in some fashion; it doesn't say that the agent has a robot body, in spite of the cute pictures of robots Abram drew above. An AI system with no "body" it can directly manipulate or sense will still be physically implemented on computing hardware, and that on its own can raise all the issues above.
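
The distinction can be sketched in code (a hypothetical toy, with made-up names like `embedded_step`; nothing here is MIRI's formalism): a dualistic agent exchanges actions and observations across a clean interface, while an embedded agent's own machinery is part of the state the world's dynamics act on.

```python
# Dualistic framing: the environment state does not contain the agent;
# the agent is an unmodeled source of actions outside the system.
def dualistic_step(env_state, action):
    return env_state + action

# Embedded framing: the agent's policy (its "hardware") is itself part
# of the state being evolved, so the dynamics can in principle read or
# overwrite it -- no robot body required.
def embedded_step(state):
    policy = state["agent_policy"]        # the agent IS part of the world
    action = policy(state["world"])
    return {"world": state["world"] + action,
            "agent_policy": policy}       # the dynamics could modify this too

state = {"world": 0, "agent_policy": lambda w: 1}
state = embedded_step(state)
print(state["world"])  # 1
```

In the dualistic version, questions like "what if the environment damages the agent's memory?" can't even be posed; in the embedded version they're just facts about the transition function.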

## Comment on decision theory

2018-09-09T20:13:09.543Z · score: 63 (25 votes)

## Ben Hoffman's donor recommendations

2018-06-21T16:02:45.679Z · score: 40 (17 votes)

## Critch on career advice for junior AI-x-risk-concerned researchers

2018-05-12T02:13:28.743Z · score: 201 (68 votes)

## Two clarifications about "Strategic Background"

2018-04-12T02:11:46.034Z · score: 76 (22 votes)

## Karnofsky on forecasting and what science does

2018-03-28T01:55:26.495Z · score: 17 (3 votes)

## Quick Nate/Eliezer comments on discontinuity

2018-03-01T22:03:27.094Z · score: 66 (21 votes)

## Yudkowsky on AGI ethics

2017-10-19T23:13:59.829Z · score: 83 (36 votes)

## MIRI: Decisions are for making bad outcomes inconsistent

2017-04-09T03:42:58.133Z · score: 7 (8 votes)

## CHCAI/MIRI research internship in AI safety

2017-02-13T18:34:34.520Z · score: 5 (6 votes)

2016-10-11T23:52:44.410Z · score: 15 (13 votes)

## A few misconceptions surrounding Roko's basilisk

2015-10-05T21:23:08.994Z · score: 56 (50 votes)

## The Library of Scott Alexandria

2015-09-14T01:38:27.167Z · score: 54 (51 votes)

2015-06-11T00:27:00.253Z · score: 19 (20 votes)

## Rationality: From AI to Zombies

2015-03-13T15:11:20.920Z · score: 83 (83 votes)

## Ends: An Introduction

2015-03-11T19:00:44.904Z · score: 2 (2 votes)

## Minds: An Introduction

2015-03-11T19:00:32.440Z · score: 3 (3 votes)

## Biases: An Introduction

2015-03-11T19:00:31.605Z · score: 62 (96 votes)

## Rationality: An Introduction

2015-03-11T19:00:31.162Z · score: 9 (12 votes)

## Beginnings: An Introduction

2015-03-11T19:00:25.616Z · score: 2 (2 votes)

## The World: An Introduction

2015-03-11T19:00:12.370Z · score: 3 (3 votes)

## Announcement: The Sequences eBook will be released in mid-March

2015-03-03T01:58:45.893Z · score: 47 (48 votes)

## A forum for researchers to publicly discuss safety issues in advanced AI

2014-12-13T00:33:50.516Z · score: 12 (13 votes)

## Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda

2014-11-26T11:02:01.038Z · score: 26 (31 votes)

## Groundwork for AGI safety engineering

2014-08-06T21:29:38.767Z · score: 13 (14 votes)

## Politics is hard mode

2014-07-21T22:14:33.503Z · score: 40 (72 votes)

## The Problem with AIXI

2014-03-18T01:55:38.274Z · score: 29 (29 votes)

## Solomonoff Cartesianism

2014-03-02T17:56:23.442Z · score: 34 (31 votes)

## Bridge Collapse: Reductionism as Engineering Problem

2014-02-18T22:03:08.008Z · score: 54 (49 votes)

## Can We Do Without Bridge Hypotheses?

2014-01-25T00:50:24.991Z · score: 11 (12 votes)

## Building Phenomenological Bridges

2013-12-23T19:57:22.555Z · score: 67 (60 votes)

## The genie knows, but doesn't care

2013-09-06T06:42:38.780Z · score: 57 (63 votes)

## The Up-Goer Five Game: Explaining hard ideas with simple words

2013-09-05T05:54:16.443Z · score: 29 (34 votes)

## Reality is weirdly normal

2013-08-25T19:29:42.541Z · score: 33 (48 votes)

## Engaging First Introductions to AI Risk

2013-08-19T06:26:26.697Z · score: 20 (27 votes)

## What do professional philosophers believe, and why?

2013-05-01T14:40:47.028Z · score: 31 (44 votes)