Comment by robbbb on Feedback Requested! Draft of a New About/Welcome Page for LessWrong · 2019-06-02T01:34:36.448Z · score: 4 (2 votes) · LW · GW

A tricky thing about this is that there's an element of cognitive distortion in how most people evaluate these questions, and play-acting at "this distortion makes sense" can worsen the distortion (at the same time that it helps win more trust from people who have the distortion).

If it turned out to be a good idea to try to speak to this perspective, I'd recommend first meditating on a few reversal tests. Like: "Hmm, I wouldn't feel any need to add a disclaimer here if the text I was recommending were The Brothers Karamazov, though I'd want to briefly say why it's relevant, and I might worry about the length. I'd feel a bit worried about recommending a young adult novel, even an unusually didactic one, because people rightly expect YA novels to be optimized for less useful and edifying things than the 'literary classics' reference class. The insights tend to be shallower and less common. YA novels and fanfiction are similar in all those respects, and they provoke basically the same feeling in me, so I can maybe use that reversal test to determine what kinds of disclaimers or added context make sense here."

Comment by robbbb on FB/Discord Style Reacts · 2019-06-01T22:43:33.318Z · score: 2 (1 votes) · LW · GW

(If I want to express stronger gratitude than that, I'd rather write it out.)

Comment by robbbb on FB/Discord Style Reacts · 2019-06-01T22:42:28.296Z · score: 2 (1 votes) · LW · GW

On Slack, the Thumbs Up, OK, and Horns hand signs meet all my minor needs for thanking people.

Comment by robbbb on Drowning children are rare · 2019-05-30T01:28:16.487Z · score: 6 (3 votes) · LW · GW

Can't individuals just list 'Reign of Terror' and then specify in their personalized description that they have a high bar for terror?

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T19:46:56.205Z · score: 5 (3 votes) · LW · GW

We'd talked about getting a dump out as well, and your plan sounds great to me! The LW team should get back to you with a list at some point (unless they think of a better idea).

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T03:44:21.316Z · score: 16 (8 votes) · LW · GW

I asked Eliezer if it made sense to cross-post this from Arbital, and did the cross-posting when he approved. I'm sorry it wasn't clear that this was a cross-post! I intended to make this clearer, but my idea was bad (putting the information on the sequence page) and I also implemented it wrong (the sequence didn't previously display on the top of this post).

This post was originally written as a nontechnical introduction to expected utility theory and coherence arguments. Although it begins in medias res stylistically, it doesn't have any prereqs or context beyond "this is part of a collection of introductory resources covering a wide variety of technical and semitechnical topics."

Per the first sentence, the main purpose is for this to be a linkable resource for conversations/inquiry about human rationality and conversations/inquiry about AGI:

So we're talking about how to make good decisions, or the idea of 'bounded rationality', or what sufficiently advanced Artificial Intelligences might be like; and somebody starts dragging up the concepts of 'expected utility' or 'utility functions'. And before we even ask what those are, we might first ask, Why?

There have been loose plans for a while to cross-post content from Arbital to LW (maybe all of it; maybe just the best or most interesting stuff), but as I mentioned downthread, we're doing more cross-post experiments sooner than we would have because Arbital's been having serious performance issues.

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T03:34:04.620Z · score: 5 (3 votes) · LW · GW

I assume you mean 'no one has this responsibility for Arbital anymore', and not that there's someone else who has this responsibility.

Comment by robbbb on Coherent decisions imply consistent utilities · 2019-05-14T02:01:21.742Z · score: 10 (4 votes) · LW · GW

Arbital has been getting increasingly slow and unresponsive. The LW team is looking for fixes or work-arounds, but they aren't familiar with the Arbital codebase. In the meantime, I've been helping cross-post some content from Arbital to LW so it's available at all.

Comment by robbbb on Any rebuttals of Christiano and AI Impacts on takeoff speeds? · 2019-05-12T01:28:50.738Z · score: 27 (7 votes) · LW · GW

MIRI folks are the most prominent proponents of fast takeoff, and we unfortunately haven't had time to write up a thorough response. Oli already quoted the quick comments I posted from Nate and Eliezer last year, and I'll chime in with some of the factors that I think are leading to disagreements about takeoff:

  • Some MIRI people (Nate is one) suspect we might already be in hardware overhang mode, or closer to that point than some other researchers in the field believe.
  • MIRI folks tend to have different views from Paul about AGI, some of which imply that AGI is more likely to be novel and dependent on new insights. (Unfair caricature: Imagine two people in the early 20th century who don't have a technical understanding of nuclear physics yet, trying to argue about how powerful a nuclear-chain-reaction-based bomb might be. If one side were to model that kind of bomb as "sort of like TNT 3.0" while the other is modeling it as "sort of like a small Sun", they're likely to disagree about whether nuclear weapons are going to be a small v. large improvement over TNT. Note I'm just using nuclear weapons as an analogy, not giving an outside-view argument "sometimes technologies are discontinuous, ergo AGI will be discontinuous".)

This list isn't intended to be exhaustive, or detailed enough to settle the disagreement.

I'm hoping we have time to write up more thoughts on this before too long, because this is an important issue (even given that we're trying to minimize the researcher time we put into things other than object-level deconfusion research). I don't want MIRI to be a blocker on other researchers making progress on these issues, though — it would be bad if people put a pause on hashing out takeoff issues for themselves (or put a pause on alignment research that's related to takeoff views) until Eliezer had time to put out a blog post. I primarily wanted to make sure people know that the lack of a substantive response doesn't mean that Nate+Eliezer+Benya+etc. agree with Paul on takeoff issues now, or that we don't think this disagreement matters. Our tardiness is because of opportunity costs and because our views have a lot of pieces to articulate.

Comment by robbbb on Rob B's Shortform Feed · 2019-05-11T20:21:03.506Z · score: 2 (1 votes) · LW · GW

Fantastic!

Comment by robbbb on Rob B's Shortform Feed · 2019-05-11T20:18:55.445Z · score: 2 (1 votes) · LW · GW

That counts! :) Part of why I'm asking is in case we want to build a proper LW glossary, and Rationality Cardinality could at least provide ideas for terms we might be missing.

Comment by robbbb on Rob B's Shortform Feed · 2019-05-10T23:19:00.628Z · score: 4 (2 votes) · LW · GW

Are there any other OK-quality rationalist glossaries out there? https://wiki.lesswrong.com/wiki/Jargon is the only one I know of. I vaguely recall there being one on http://www.bayrationality.com/ at some point, but I might be misremembering.

Comment by robbbb on Rob B's Shortform Feed · 2019-05-10T23:13:24.150Z · score: 7 (3 votes) · LW · GW

The wiki glossary for the sequences / Rationality: A-Z ( https://wiki.lesswrong.com/wiki/RAZ_Glossary ) is updated now with the glossary entries from the print edition of vol. 1-2.

New entries from Map and Territory:

anthropics, availability heuristic, Bayes's theorem, Bayesian, Bayesian updating, bit, Blue and Green, calibration, causal decision theory, cognitive bias, conditional probability, confirmation bias, conjunction fallacy, deontology, directed acyclic graph, elan vital, Everett branch, expected value, Fermi paradox, foozality, hindsight bias, inductive bias, instrumental, intentionality, isomorphism, Kolmogorov complexity, likelihood, maximum-entropy probability distribution, probability distribution, statistical bias, two-boxing

New entries from How to Actually Change Your Mind:

affect heuristic, causal graph, correspondence bias, epistemology, existential risk, frequentism, Friendly AI, group selection, halo effect, humility, intelligence explosion, joint probability distribution, just-world fallacy, koan, many-worlds interpretation, modesty, transhuman

A bunch of other entries from the M&T and HACYM glossaries were already on the wiki; most of these have been improved a bit or made more concise.

Rob B's Shortform Feed

2019-05-10T23:10:14.483Z · score: 19 (3 votes)
Comment by robbbb on Alignment Newsletter One Year Retrospective · 2019-05-06T06:02:49.888Z · score: 6 (3 votes) · LW · GW

One option that's smaller than link posts might be to mention in the AF/LW version of the newsletter which entries are new to AIAF/LW as far as you know; or make comment threads in the newsletter for those entries. I don't know how useful these would be either, but it'd be one way to create common knowledge 'this is currently the one and only place to discuss these things on LW/AIAF'.

Comment by robbbb on [Meta] Hiding negative karma notifications by default · 2019-05-06T01:54:21.736Z · score: 18 (6 votes) · LW · GW

Possible compromise idea: send everyone both their karma upvotes and their downvotes regularly, but send the upvotes in daily batches and the downvotes in monthly batches. Having downvotes arrive at known, predictable times rather than in random bursts, and having those updates occur less often, might let users take in the relevant information without letting it totally dominate their day-to-day experience of visiting the site. This would also make it easier to spot patterns and to properly discount very small aversive changes in vote totals.
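A minimal sketch of that batching rule (my own illustration; the function and schedule details are hypothetical, not an existing LW feature):

```python
from datetime import datetime, timedelta

# Hypothetical notification batcher: upvotes at least a day old go out in
# the daily batch; downvotes are only released on the 1st of each month,
# so negative feedback arrives at known, predictable times.

def due_notifications(votes, now):
    """votes: list of (timestamp, delta) karma changes not yet delivered."""
    upvote_cutoff = now - timedelta(days=1)
    deliver = []
    for ts, delta in votes:
        if delta > 0 and ts <= upvote_cutoff:
            deliver.append((ts, delta))  # upvote old enough for the daily batch
        elif delta < 0 and now.day == 1:
            deliver.append((ts, delta))  # downvotes only on the 1st of the month
    return deliver

# Example: on the 1st of the month, both batches go out together.
votes = [(datetime(2019, 4, 28), +2), (datetime(2019, 4, 29), -1)]
print(due_notifications(votes, datetime(2019, 5, 1)))
```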

On the whole, I'm not sure how useful this would be as a sitewide default. Some concerns:

  • It's not clear to me that karma on its own is all that useful or contentful. Ray recently noted that a comment of his had gotten downvoted somewhat, and that this had been super salient and pointed feedback for him. But I'm pretty sure that the 'downvote' Ray was talking about was actually just me turning a strong upvote into a normal upvote for minor / not-worth-independently-tracking reasons. Plenty of people vote for obscure or complicated or just-wrong reasons.
  • The people who get downvoted the most are likely to have less familiarity with LW norms and context, so they'll be especially ill-equipped to extract actionable information from downvotes. If all that people are learning is '<confusing noisy social disapproval>', I'm not sure that's going to help them much in their journeys as rationalists.

Upvotes tend to be a clearer signal in my experience, while needing to meet a lower bar. (Cf.: we have a higher epistemic bar for establishing a norm 'let's start insulting/criticizing/calling out our colleagues whenever they make a mistake' than for establishing a norm 'let's start complimenting/praising/thanking our colleagues whenever they do something cool', and it would be odd to say that the latter is categorically bad in any environment where we don't also establish the former norm.)

I'm not confident of what the right answer is; this is just me laying out some counter-considerations. I like Mako's comment because it's advocating for an important value, and expressing a not-obviously-wrong concern about that value getting compromised. I lean toward 'don't make down-votes this salient' right now. I'd like more clarity inside my head about how much the downvote-hiding worry is shaped like 'we need to make downvotes more salient so we can actually get the important intellectual work done' vs. 'we need to make downvotes more salient so we can better symbolize/resemble Rationality'.

Comment by robbbb on Open Thread May 2019 · 2019-05-03T05:06:27.709Z · score: 5 (3 votes) · LW · GW

! Hi! I am a biased MIRI person, but I quite dig all the things you mentioned. :)

Comment by robbbb on Habryka's Shortform Feed · 2019-05-02T22:09:44.951Z · score: 7 (4 votes) · LW · GW

I like this shortform feed idea!

Comment by robbbb on Habryka's Shortform Feed · 2019-05-01T18:06:57.710Z · score: 4 (2 votes) · LW · GW

Yeah, strong upvote to this point. Having an Arbital-style system where people's probabilities aren't prominently timestamped might be the worst of both worlds, though, since it discourages updating and makes it look like most people never do it.

I have an intuition that something socially good might be achieved by seeing high-status rationalists treat ass numbers as ass numbers, brazenly assign wildly different probabilities to the same proposition week-by-week, etc., especially if this is a casual and incidental thing rather than being the focus of any blog posts or comments. This might work better, though, if the earlier probabilities vanish by default and only show up again if the user decides to highlight them.

(Also, if a user repeatedly abuses this feature to look a lot more accurate than they really were, this warrants mod intervention IMO.)

Comment by robbbb on Habryka's Shortform Feed · 2019-04-30T23:36:36.309Z · score: 5 (3 votes) · LW · GW

Also, if you do something Arbital-like, I'd find it valuable if the interface encourages people to keep updating their probabilities later as they change. E.g., some (preferably optional) way of tracking how your view has changed over time. Probably also make it easy for people to re-vote without checking (and getting anchored by) their old probability assignment, for people who want that.
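For concreteness, here's a toy sketch of the kind of record I have in mind (purely illustrative; the class and field names are my own invention, not a proposed LW schema):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

# Illustrative only: each user keeps a timestamped history of their
# probability assignments on a claim, so later updates are first-class
# rather than silently overwriting the old number.

@dataclass
class ProbabilityHistory:
    claim: str
    votes: List[Tuple[datetime, float]] = field(default_factory=list)

    def revote(self, p: float, when: datetime) -> None:
        """Record a new probability without showing the old one first,
        so the user isn't anchored by their previous answer."""
        self.votes.append((when, p))

    def current(self) -> float:
        return self.votes[-1][1]

    def history(self) -> List[Tuple[datetime, float]]:
        """Optional view for users who want to see how they've updated."""
        return list(self.votes)

h = ProbabilityHistory("AGI before 2100")
h.revote(0.6, datetime(2019, 1, 1))
h.revote(0.4, datetime(2019, 4, 30))
print(h.current())   # 0.4
print(h.history())   # both timestamped entries
```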

Comment by robbbb on Habryka's Shortform Feed · 2019-04-30T23:35:02.135Z · score: 4 (2 votes) · LW · GW

One small thing you could do is to have probability tools be collapsed by default on any AIAF posts (and maybe even on the LW versions of AIAF posts).

Also, maybe someone should write a blog post that's a canonical reference for 'the relevant risks of using probabilities that haven't already been written up', in advance of the feature being released. Then you could just link to that a bunch. (Maybe even include it in the post that explains how the probability tools work, and/or link to that post from all instances of the probability tool.)

Another idea: Arbital had a mix of (1) 'specialized pages that just include a single probability poll and nothing else'; (2) 'pages that are mainly just about listing a ton of probability polls'; and (3) 'pages that have a bunch of other content but incidentally include some probability polls'.

If probability polls on LW mostly looked like (1) and (2) rather than (3), then that might make it easier to distinguish the parts of LW that should be very probability-focused from the parts that shouldn't. I.e., you could avoid adding Arbital's feature for easily embedding probability polls in arbitrary posts (and/or arbitrary comments), and instead treat this more as a distinct kind of page, like 'Questions'.

You could still link to the 'Probability' pages prominently in your post, but the reduced prominence and site support might cause there to be less social pressure for people to avoid writing/posting things out of fears like 'if I don't provide probability assignments for all my claims in this blog post, or don't add a probability poll about something at the end, will I be seen as a Bad Rationalist?'

Comment by robbbb on Habryka's Shortform Feed · 2019-04-30T23:17:48.363Z · score: 2 (1 votes) · LW · GW

I've never checked my karma total on LW 2.0 to see how it's changed.

Comment by robbbb on Habryka's Shortform Feed · 2019-04-28T03:40:56.888Z · score: 5 (3 votes) · LW · GW
I am most worried that this will drastically increase the clutter of comment threads and make things a lot harder to parse. In particular if the order of the reacts is different on each comment, since then there is no reliable way of scanning for the different kinds of information.

I like the reactions UI above, partly because separating it from karma makes it clearer that it's not changing how comments get sorted, and partly because I do want 'agree'/'disagree' to be non-anonymous by default (unlike normal karma).

I agree that the order of reacts should always be the same. I also think every comment/post should display all the reacts (even if just to say '0 Agree, 0 Disagree...') to keep things uniform. That means I think there should only be a few permitted reacts -- maybe start with just 'Agree' and 'Disagree', then wait 6+ months and see whether users are clamoring for something extra.

I think the obvious other reacts I'd want to use sometimes are 'agree and downvote' + 'disagree and upvote' (maybe shortened to Agree+Down and Disagree+Up), since otherwise someone might not realize that one and the same person is doing both, which loses a fair amount of the information I want to be able to fluidly signal. (I don't think there's much value in clearly signaling that the same person agreed and upvoted, or disagreed and downvoted, a thing.)

I would also sometimes click both the 'agree' and 'disagree' buttons, which I think is fine to allow under this UI. :)
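To make that concrete, here's a toy mock-up of the scheme I'm describing (my own sketch, not the LW team's design; the react set is just the one discussed above):

```python
# Toy mock-up of the react scheme described above: a small fixed set of
# reacts, always displayed in the same order on every comment (even at
# zero), and non-anonymous (each react lists the usernames behind it).

REACTS = ["Agree", "Disagree", "Agree+Down", "Disagree+Up"]

def render_reacts(reactions):
    """reactions: dict mapping react name -> list of usernames."""
    parts = []
    for react in REACTS:  # fixed order, every react shown even at zero
        users = reactions.get(react, [])
        label = f"{len(users)} {react}"
        if users:
            label += " (" + ", ".join(users) + ")"
        parts.append(label)
    return " · ".join(parts)

# A user is allowed to click both Agree and Disagree on the same comment.
print(render_reacts({"Agree": ["robbbb"], "Disagree": ["robbbb"]}))
# -> "1 Agree (robbbb) · 1 Disagree (robbbb) · 0 Agree+Down · 0 Disagree+Up"
```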

Comment by robbbb on Speaking for myself (re: how the LW2.0 team communicates) · 2019-04-27T20:38:02.955Z · score: 9 (5 votes) · LW · GW

*disagrees with and approves of this relevant, interesting, and non-confused comment*

Comment by robbbb on Helen Toner on China, CSET, and AI · 2019-04-23T19:05:20.792Z · score: 5 (4 votes) · LW · GW
"How are we counting Chinese versus non-Chinese papers? Because often, it seems to be just doing it via, "Is their last name Chinese?" Which seems like it really is going to miscount." seems unreasonably skeptical. It's not too much harder to just look up the country of the university/organization that published the paper.

? "Skeptical" implies that this is speculation on Helen's part, whereas I took her to be asserting as fact that this is the methodology that some studies in this category use, and that this isn't a secret or anything. This may be clearer in the full transcript:

Julia Galef: So, I'm curious -- one thing that people often cite is that China publishes more papers on deep learning than the US does. Deep learning, maybe we explained that already, it's the dominant paradigm in AI that's generating a lot of powerful results.
Helen Toner: Mm-hmm.
Julia Galef: So, would you consider that, “number of papers published on deep learning,” would you consider that a meaningful metric?
Helen Toner: I mean, I think it's meaningful. I don't think it is the be-all and end-all metric. I think it contains some information. I think the thing I find frustrating about how central that metric has been is that usually it's mentioned with no sort of accompanying … I don't know. This is a very Rationally Speaking thing to say, so I'm glad I'm on this podcast and not another one…
But it's always mentioned without sort of any kind of caveats or any kind of context. For example, how are we counting Chinese versus non-Chinese papers? Because often, it seems to be just doing it via, "Is their last name Chinese," which seems like it really is going to miscount.
Julia Galef: Oh, wow! There are a bunch of people with Chinese last names working at American AI companies.
Helen Toner: Correct, many of whom are American citizens. So, I think I've definitely seen at least some measures that do that wrong, which seems just completely absurd. But then there's also, if you have a Chinese citizen working in an American university, how should that be counted? Is that a win for the university or is it win for China? It's very unclear.
And they also, these counts of papers have a hard time sort of saying anything about the quality of the papers involved. You can look at citations, but that's not a perfect metric. But it's better, for sure.
And then, lastly, they rarely say anything about the different incentives that Chinese and non-Chinese academics face in publishing. [...]
Comment by robbbb on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-23T18:57:02.444Z · score: 4 (2 votes) · LW · GW

Maybe someday! :)

Comment by robbbb on Book review: The Sleepwalkers by Arthur Koestler · 2019-04-23T11:44:49.460Z · score: 10 (4 votes) · LW · GW

Overconfidence in sentences like "the moon has craters" may be a sin. (Though I'd disagree that this sin category warrants banning someone from talking about the moon's craters and trapping them within a building with threats of force for nine years. YMMV.)

Thinking that the sentence "the moon has craters" refers to the moon, and asserts of the moon that there are craters on it, doesn't seem like a sin at all to me, regardless of whether some scientific models (e.g., in QM) are sometimes useful for reasons we don't understand.

Comment by robbbb on Evidence other than evolution for optimization daemons? · 2019-04-21T21:02:01.510Z · score: 4 (2 votes) · LW · GW

"Catholicism predicts that all soulless optimizers will explicitly represent and maximize their evolutionary fitness function" is a pretty unusual view (even as Catholic views go)! If you want answers to take debates about God and free will into account, I suggest mentioning God/Catholicism in the title.

More broadly, my recommendation would be to read all of https://www.lesswrong.com/rationality and flag questions and disagreements there before trying to square any AI safety stuff with your religious views.

Helen Toner on China, CSET, and AI

2019-04-21T04:10:21.457Z · score: 71 (25 votes)
Comment by robbbb on Slack Club · 2019-04-19T17:44:45.798Z · score: 18 (4 votes) · LW · GW

I agree with a bunch of these concerns. FWIW, it wouldn't surprise me if the current rationalist community still behaviorally undervalues "specialized jargon". (Or, rather than jargon, concept handles a la https://slatestarcodex.com/2014/03/15/can-it-be-wrong-to-crystallize-patterns/.) I don't have a strong view on whether rationalists undervalue or overvalue this kind of thing, but it seems worth commenting on since it's being discussed a lot here.

When I observe the reasons people ended up 'working smarter' or changing course in a good way, it often involves a new lens they started applying to something. I think one of the biggest problems the rationalist community faces is a lack of dakka and a lack of lead bullets. But I guess I want to caution against treating abstraction and execution as too much of a dichotomy, such that we have to choose between "novel LW posts are useful and high-status" and "conscientiousness and follow-through is useful and high-status" and see-saw between the two.

The important thing is cutting the enemy, and I think the kinds of problems that rationalists are in an especially good position to solve require individuals to exhibit large amounts of execution and follow-through while (on a timescale of years) doing a large number of big and small course-corrections to improve their productivity or change their strategy.

It might be that we're doing too much reflection and too much coming up with lenses. It might also be that we're not doing enough grunt work and not doing enough reflection and lenscrafting. Physical tasks don't care whether we're already doing an abnormal amount of one or the other; the universe just hands us problems of a certain difficulty, and if we fall short on any of the requirements then we fail.

It might also be that this varies by individual, such that it's best to just make sure people are aware of these different concerns so they can check which holds true in their own circumstance.

Comment by robbbb on "Intelligence is impossible without emotion" — Yann LeCun · 2019-04-10T21:47:53.444Z · score: 13 (4 votes) · LW · GW

My prior is that Yann LeCun tends to have unmysterious, thoughtful models of AI (example), even though I strongly disagree with (and am often confused by) his claims about AI safety. So when Yann says "emotion", I wonder if he means anything more than that AI systems "can decide what they do" and have "some intrinsic drive that makes them [...] do particular things", as opposed to having "preprogrammed behavior".

Comment by robbbb on Comparison of decision theories (with a focus on logical-counterfactual decision theories) · 2019-03-18T05:13:53.675Z · score: 6 (3 votes) · LW · GW

Agents need to consider multiple actions and choose the one that has the best outcome. But we're supposing that the code representing the agent's decision only has one possible output. E.g., perhaps an agent is going to choose between action A and action B, and will end up choosing A. Then a sufficiently close examination of the agent's source code will reveal that the scenario "the agent chooses B" is logically inconsistent. But then it's not clear how the agent can reason about the desirability of "the agent chooses B" while evaluating its outcomes, if not via some mechanism for nontrivially reasoning about outcomes of logically inconsistent situations.
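Here's a deliberately naive toy sketch of the difficulty (my own illustration, not any particular decision theory's proposal):

```python
# Deliberately naive toy: the agent's code has exactly one possible output,
# so "worlds where the agent chooses B" is an inconsistent hypothesis, and
# naive conditioning on it gives the agent nothing to evaluate.

def agent():
    return "A"  # the agent's fixed, inspectable decision procedure

def outcome(action):
    return {"A": 5, "B": 10}[action]  # the environment's payoffs

# Try to evaluate "what if the agent chooses B?" by looking for worlds
# consistent with both the agent's source code and the hypothesis.
consistent_worlds_where_B = [a for a in ("A", "B") if a == agent() and a == "B"]

print(consistent_worlds_where_B)  # [] -- no such world exists, so this
# route gives no way to compare outcome("A") against outcome("B").
```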

Comment by robbbb on Comparison of decision theories (with a focus on logical-counterfactual decision theories) · 2019-03-17T19:49:37.776Z · score: 6 (3 votes) · LW · GW

The comment starting "The main datapoint that Rob left out..." is actually by Nate Soares. I cross-posted it to LW from an email conversation.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-16T18:28:52.502Z · score: 4 (2 votes) · LW · GW

I've now also highlighted Scott's tip from "Fixed Point Exercises":

Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.
These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.
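For concreteness, one representative member of that family (my choice of example, not necessarily the one Scott has in mind) is the Banach fixed-point theorem:

```latex
% Banach fixed-point theorem: if (X, d) is a non-empty complete metric space
% and f : X -> X is a contraction, i.e. there is a q with 0 <= q < 1 such that
% d(f(x), f(y)) <= q d(x, y) for all x, y in X, then f has a unique fixed
% point, and iterating f from any starting point converges to it:
\[
  \exists!\, x^{*} \in X :\ f(x^{*}) = x^{*},
  \qquad
  \lim_{n \to \infty} x_{n} = x^{*}
  \quad \text{where } x_{n+1} = f(x_{n}),\ x_{0} \in X \text{ arbitrary.}
\]
```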
Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-16T14:31:39.854Z · score: 4 (2 votes) · LW · GW

I'd expect Jessica/Stuart/Scott/Abram/Sam/Tsvi to have a better sense of that than me. I didn't spot any obvious signs that it's no longer a good reference.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-15T05:44:20.940Z · score: 5 (3 votes) · LW · GW

For corrigibility in particular, some good material that's not discussed in "Embedded Agency" or the reading guide is Arbital's Corrigibility and Problem of Fully Updated Deference articles.

Comment by robbbb on Question: MIRI Corrigbility Agenda · 2019-03-15T05:36:42.850Z · score: 14 (5 votes) · LW · GW

The only major changes we've made to the MIRI research guide since mid-2015 are to replace Koller and Friedman's Probabilistic Graphical Models with Pearl's Probabilistic Inference; replace Rosen's Discrete Mathematics with Lehman et al.'s Mathematics for CS; add Taylor et al.'s "Alignment for Advanced Machine Learning Systems", Wasserman's All of Statistics, Shalev-Shwartz and Ben-David's Understanding Machine Learning, and Yudkowsky's Inadequate Equilibria; and remove the Global Catastrophic Risks anthology. So the guide is missing a lot of new material. I've now updated the guide to add the following note at the top:

This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:
1. If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk and to work as an engineer at MIRI. For this purpose, you don’t need any prior familiarity with our research.
If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.
You can find out more about our engineering program in our 2018 strategy update.
2. If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency” for an introduction to our agent foundations research, and see our Alignment Research Field Guide for general recommendations on how to get started in AI safety.
After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises.”
If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.
Comment by robbbb on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-03-13T18:07:48.572Z · score: 7 (3 votes) · LW · GW
After all, they didn't get any less publicity for reporting the system's other limitations either, like it only being able to play Protoss v. Protoss on a single map, or 10/11 of the agents having whole-camera vision.

They might well have gotten less publicity due to emphasizing those facts as much as they did.

Comment by robbbb on "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II", DeepMind [won 10 of 11 games against human pros] · 2019-03-13T18:04:16.343Z · score: 5 (2 votes) · LW · GW

I mostly agree with this comment. My speculative best guess is that the main reason MaNa did better against the revised version of AlphaStar wasn't due to the vision limitations, but rather some combination of:

  • MaNa had more time to come up with a good strategy and analyze previous games.
  • MaNa had more time to warm up, and was generally in a better headspace.
  • The previous version of AlphaStar was unusually good, and the new version was an entirely new system, so the new version regressed to the mean a bit. (On the dimension "can beat human pros", even though it was superior on the dimension "can beat other AlphaStar strategies".)

Comment by robbbb on Considerateness in OpenAI LP Debate · 2019-03-12T22:24:22.462Z · score: 10 (2 votes) · LW · GW

Eliezer responded to Chollet's post about intelligence explosion here: https://intelligence.org/2017/12/06/chollet/

Comment by robbbb on Renaming "Frontpage" · 2019-03-12T00:53:11.596Z · score: 4 (2 votes) · LW · GW

Personal Blog ➜ Notebook

Messages ➜ Mailbox

Comment by robbbb on Renaming "Frontpage" · 2019-03-11T16:21:11.300Z · score: 12 (4 votes) · LW · GW

Frontpage ➜ Whiteboard

Art ➜ Canvas

Coordination ➜ Bulletin Board

Meta ➜ Website

Comment by robbbb on In My Culture · 2019-03-10T21:12:27.930Z · score: 6 (4 votes) · LW · GW

I like this comment.

Comment by robbbb on Renaming "Frontpage" · 2019-03-09T04:13:21.305Z · score: 3 (2 votes) · LW · GW

Oooh, I like this. Fewer top-level sections seems good to me.

Comment by robbbb on In My Culture · 2019-03-08T00:03:09.758Z · score: 4 (2 votes) · LW · GW

That was my draft 1. :P

Comment by robbbb on In My Culture · 2019-03-07T21:51:57.113Z · score: 15 (5 votes) · LW · GW

For my personal usage, the way I could imagine using it, "in my culture" sounds a bit serious and final. "Where I'm from, we do X" is nice if I want something to sound weighty and powerful and stable, but I just don't think I've figured myself out enough to do that much yet. There might also be a bit of confusion in that "in my culture" also has a structurally similar literal meaning.

"In Robopolis" seems to fix these problems for me, since it more clearly flags that I'm not talking about a literal culture, and it sounds more agnostic about whether this is a deep part of who I am vs. a passing fashion.

Comment by robbbb on Karma-Change Notifications · 2019-03-06T01:00:58.500Z · score: 10 (5 votes) · LW · GW

The main thing I like about the 'only downvotes' option is that it's kind of funny and pointless. This suits my aesthetic. I could imagine trying it out for a few weeks to see what happens / to call the bluff of the part of my primate brain that thinks social disapproval from strangers is an x-risk. :)

Comment by robbbb on Karma-Change Notifications · 2019-03-05T22:20:41.706Z · score: 19 (8 votes) · LW · GW

If I'm having lunch with a friend, then my usual expectation is that I'll get strong compliments if they adore my clothing style, but I won't get strong criticisms if they strongly dislike it, unless I explicitly opt in to receiving the latter feedback. Most people seem to treat high-salience personal compliments as opt-out, while treating high-salience personal criticisms as opt-in. This can be outweighed if the criticism is important enough, but otherwise, criticism tends to be relatively mild and cloaked in humor or indirection.

Thinking about it in those terms, it makes sense to me to treat upvotes as similar to "person says they love my haircut" and downvotes as similar to "person says they hate my haircut." I probably want to be able to view both kinds of feedback in a time and place of my choosing, but I don't want to have the latter feedback tossed my way literally every time I open Chrome or check my email.

It might be that those norms are fine for personal style, but that we want to promote better, more pro-criticism norms in areas that matter more. We might want to push in the direction of making critical feedback opt-out, so people can (a) update faster on things that do matter a lot, and (b) perhaps get some useful exposure therapy that will make us better at receiving tips, pushback, and contrary views in the future. Mostly I'm just making this comment so folks feel comfortable talking about their preferences openly, without feeling like they're Bad Rationalists if they're not already convinced that it's useful for them personally to receive a regular stream of downvote notifications (in the world where they get a lot of downvotes).

Comment by robbbb on Thoughts on Human Models · 2019-02-26T04:09:41.219Z · score: 2 (1 votes) · LW · GW

That all seems generally fine to me. I agree the tradeoffs are the huge central difficulty here; getting to sufficiently capable AGI sufficiently quickly seems enormously harder if you aren't willing to cut major corners on safety.

Comment by robbbb on Thoughts on Human Models · 2019-02-25T23:19:28.697Z · score: 12 (3 votes) · LW · GW

The goal is to avoid particular hazards, rather than to make things human-independent as an end in itself. So if we accidentally use a concept of "human-independent" that yields impractical results like "the only safe concepts are those of fundamental physics", we should just conclude that we were using the wrong conception of "human-independent". A good way to avoid this is to keep revisiting the concrete reasons we started down this path in the first place, and see which conceptions capture our pragmatic goals well.

Here are some examples of concrete outcomes that various AGI alignment approaches might want to see, if they're intended to respond to concerns about human models:

  • The system never exhibits thoughts like "what kind of agent built me?"
  • The system exhibits thoughts like that, but never arrives at human-specific conclusions like "my designer probably has a very small working memory" or "my designer is probably vulnerable to the clustering illusion".
  • The system never reasons about powerful optimization processes in general. (In addition to steering a wide berth around human models, this might be helpful for guarding against AGI systems doing some varieties of undesirable self-modification or building undesirable smart successors.)
  • The system only allocates cognitive resources to solving problems in a specific domain like "biochemistry" or "electrical engineering".

Different alignment approaches can target different subsets of those goals, and of many other similar goals, depending on what they think is feasible and important for safety.

Comment by robbbb on Thoughts on Human Models · 2019-02-23T22:51:57.452Z · score: 3 (2 votes) · LW · GW
What about the possibility that the AGI system threatens others, rather than being threatened itself? Prima facie, that might also lead to worst-case outcomes.

I think a good intuition pump for this idea is to contrast an arbitrarily powerful paperclip maximizer with an arbitrarily powerful something-like-happiness maximizer.

A paperclip maximizer might resort to threats to get what it wants; and in the long run, it will want to convert all resources into paperclips and infrastructure, to the exclusion of everything humans want. But the "normal" failure modes here tend to look like human extinction.

In contrast, a lot of "normal" failure modes for a something-like-happiness maximizer might look like torture, because the system is trying to optimize something about human brains, rather than just trying to remove humans from the picture so it can do its own thing.

Do you envision a system that's not trained using human modelling and therefore just wouldn't know enough about human minds to make any effective threats? I'm not sure how an AI system can meaningfully be said to have "human-level general intelligence" and yet be completely inept in this regard.

I don't know specifically what Ramana and Scott have in mind, but I'm guessing it's a combination of:

  • If the system isn't trained using human-related data, its "goals" (or the closest things to goals it has) are more likely to look like the paperclip maximizer above, and less likely to look like the something-like-happiness maximizer. This greatly reduces downside risk if the system becomes more capable than we intended.
  • When AI developers build the first AGI systems, the right move will probably be to keep their capabilities to a bare minimum — often the minimum stated in this context is "make your system just capable enough to help make sure the world's AI doesn't cause an existential catastrophe in the near future". If that minimal goal doesn't require fluency with certain high-risk domains, then developers should just avoid letting their AGI systems learn about those domains, at least until they've gotten a lot of experience with alignment.

The first developers are in an especially tough position, because they have to act under more time pressure and they'll have very little experience with working AGI systems. As such, it makes sense to try to make their task as easy as possible. Alignment isn't all-or-nothing, and being able to align a system with one set of capabilities doesn't mean you can do so for a system with stronger or more varied capabilities.

If you want to say that such a system isn't technically a "human-level general intelligence", that's fine; the important question is about impact rather than definitions, as long as it's clear that when I say "AGI" I mean something like "system that's doing qualitatively the right kind of reasoning to match human performance in arbitrary domains, in large enough quantities to be competitive in domains like software engineering and theoretical physics", not "system that can in fact match human performance in arbitrary domains".

(Also, if you have such fine-grained control over what your system does or does not know about, or if you can have it do very powerful things without possessing dangerous kinds of knowledge and abilities, then I think many commonly discussed AI safety problems become non-issues anyway, as you can just constrain the system [accordingly].)

Yes, this is one of the main appeals of designing systems that (a) make it easy to blacklist or whitelist certain topics, (b) make it easy to verify that the system really is or isn't thinking about a particular domain, and (c) make it easy to blacklist human modeling in particular. It's a very big deal if you can just sidestep a lot of the core difficulties in AI safety (in your earliest AGI systems). E.g., operator manipulation, deception, mind crime, and some aspects of the fuzziness and complexity of human value.

We don't currently know how to formalize ideas like 'whitelisting cognitive domains', however, and we don't know how to align an AGI system in principle for much more modest tasks, even given a solution to those problems.

Comment by robbbb on Some disjunctive reasons for urgency on AI risk · 2019-02-20T00:39:59.349Z · score: 2 (1 votes) · LW · GW

Yeah, I agree with this view and I believe it's the most common view among MIRI folks.

New edition of "Rationality: From AI to Zombies"

2018-12-15T21:33:56.713Z · score: 79 (30 votes)

On MIRI's new research directions

2018-11-22T23:42:06.521Z · score: 57 (16 votes)

Comment on decision theory

2018-09-09T20:13:09.543Z · score: 63 (25 votes)

Ben Hoffman's donor recommendations

2018-06-21T16:02:45.679Z · score: 40 (17 votes)

Critch on career advice for junior AI-x-risk-concerned researchers

2018-05-12T02:13:28.743Z · score: 201 (68 votes)

Two clarifications about "Strategic Background"

2018-04-12T02:11:46.034Z · score: 76 (22 votes)

Karnofsky on forecasting and what science does

2018-03-28T01:55:26.495Z · score: 17 (3 votes)

Quick Nate/Eliezer comments on discontinuity

2018-03-01T22:03:27.094Z · score: 70 (22 votes)

Yudkowsky on AGI ethics

2017-10-19T23:13:59.829Z · score: 83 (36 votes)

MIRI: Decisions are for making bad outcomes inconsistent

2017-04-09T03:42:58.133Z · score: 7 (8 votes)

CHCAI/MIRI research internship in AI safety

2017-02-13T18:34:34.520Z · score: 5 (6 votes)

MIRI AMA plus updates

2016-10-11T23:52:44.410Z · score: 15 (13 votes)

A few misconceptions surrounding Roko's basilisk

2015-10-05T21:23:08.994Z · score: 56 (50 votes)

The Library of Scott Alexandria

2015-09-14T01:38:27.167Z · score: 62 (52 votes)

[Link] Nate Soares is answering questions about MIRI at the EA Forum

2015-06-11T00:27:00.253Z · score: 19 (20 votes)

Rationality: From AI to Zombies

2015-03-13T15:11:20.920Z · score: 85 (84 votes)

Ends: An Introduction

2015-03-11T19:00:44.904Z · score: 2 (2 votes)

Minds: An Introduction

2015-03-11T19:00:32.440Z · score: 3 (3 votes)

Biases: An Introduction

2015-03-11T19:00:31.605Z · score: 63 (99 votes)

Rationality: An Introduction

2015-03-11T19:00:31.162Z · score: 10 (13 votes)

Beginnings: An Introduction

2015-03-11T19:00:25.616Z · score: 2 (2 votes)

The World: An Introduction

2015-03-11T19:00:12.370Z · score: 3 (3 votes)

Announcement: The Sequences eBook will be released in mid-March

2015-03-03T01:58:45.893Z · score: 47 (48 votes)

A forum for researchers to publicly discuss safety issues in advanced AI

2014-12-13T00:33:50.516Z · score: 12 (13 votes)

Stuart Russell: AI value alignment problem must be an "intrinsic part" of the field's mainstream agenda

2014-11-26T11:02:01.038Z · score: 26 (31 votes)

Groundwork for AGI safety engineering

2014-08-06T21:29:38.767Z · score: 13 (14 votes)

Politics is hard mode

2014-07-21T22:14:33.503Z · score: 40 (72 votes)

The Problem with AIXI

2014-03-18T01:55:38.274Z · score: 29 (29 votes)

Solomonoff Cartesianism

2014-03-02T17:56:23.442Z · score: 34 (31 votes)

Bridge Collapse: Reductionism as Engineering Problem

2014-02-18T22:03:08.008Z · score: 54 (49 votes)

Can We Do Without Bridge Hypotheses?

2014-01-25T00:50:24.991Z · score: 11 (12 votes)

Building Phenomenological Bridges

2013-12-23T19:57:22.555Z · score: 67 (60 votes)

The genie knows, but doesn't care

2013-09-06T06:42:38.780Z · score: 57 (63 votes)

The Up-Goer Five Game: Explaining hard ideas with simple words

2013-09-05T05:54:16.443Z · score: 29 (34 votes)

Reality is weirdly normal

2013-08-25T19:29:42.541Z · score: 33 (48 votes)

Engaging First Introductions to AI Risk

2013-08-19T06:26:26.697Z · score: 20 (27 votes)

What do professional philosophers believe, and why?

2013-05-01T14:40:47.028Z · score: 31 (44 votes)