'Longtermism' definitional discussion on EA Forum 2019-08-02T23:53:03.731Z · score: 17 (6 votes)
Henry Kissinger: AI Could Mean the End of Human History 2018-05-15T20:11:11.136Z · score: 46 (10 votes)
AskReddit: Hard Pills to Swallow 2018-05-14T11:20:37.470Z · score: 17 (6 votes)
Predicting Future Morality 2018-05-06T07:17:16.548Z · score: 24 (9 votes)
AI Safety via Debate 2018-05-05T02:11:25.655Z · score: 40 (9 votes)
FLI awards prize to Arkhipov’s relatives 2017-10-28T19:40:43.928Z · score: 12 (5 votes)
Functional Decision Theory: A New Theory of Instrumental Rationality 2017-10-20T08:09:25.645Z · score: 36 (13 votes)
A Software Agent Illustrating Some Features of an Illusionist Account of Consciousness 2017-10-17T07:42:28.822Z · score: 16 (3 votes)
Neuralink and the Brain’s Magical Future 2017-04-23T07:27:30.817Z · score: 6 (7 votes)
Request for help with economic analysis related to AI forecasting 2016-02-06T01:27:39.810Z · score: 6 (7 votes)
[Link] AlphaGo: Mastering the ancient game of Go with Machine Learning 2016-01-27T21:04:55.183Z · score: 14 (15 votes)
[LINK] Deep Learning Machine Teaches Itself Chess in 72 Hours 2015-09-14T19:38:11.447Z · score: 8 (9 votes)
[Link] First almost fully-formed human [foetus] brain grown in lab, researchers claim 2015-08-19T06:37:21.049Z · score: 7 (8 votes)
[Link] Neural networks trained on expert Go games have just made a major leap 2015-01-02T15:48:16.283Z · score: 15 (16 votes)
[LINK] Attention Schema Theory of Consciousness 2013-08-25T22:30:01.903Z · score: 3 (4 votes)
[LINK] Well-written article on the Future of Humanity Institute and Existential Risk 2013-03-02T12:36:39.402Z · score: 16 (19 votes)
The Center for Sustainable Nanotechnology 2013-02-26T06:55:18.542Z · score: 4 (11 votes)


Comment by esrogs on Understanding “Deep Double Descent” · 2019-12-06T21:00:12.181Z · score: 4 (2 votes) · LW · GW
Before the interpolation threshold, the bias-variance tradeoff holds and increasing model complexity leads to overfitting, reducing test error.

Should that read, "... increasing test error"?

Comment by esrogs on Seeking Power is Provably Instrumentally Convergent in MDPs · 2019-12-06T19:30:42.786Z · score: 6 (3 votes) · LW · GW

Thanks for this reply. In general when I'm reading an explanation and come across a statement like, "this means that...", as in the above, if it's not immediately obvious to me why, I find myself wondering whether I'm supposed to see why and I'm just missing something, or if there's a complicated explanation that's being skipped.

In this case it sounds like there was a complicated explanation that was being skipped, and you did not expect readers to see why the statement was true. As a point of feedback: when that's the case I appreciate when writers make note of that fact in the text (e.g. with a parenthetical saying, "To see why this is true, refer to theorem... in the paper.").

Otherwise, I feel like I've just stopped understanding what's being written, and it's hard for me to stay engaged. If I know that something is not supposed to be obvious, then it's easier for me to just mentally flag it as something I can return to later if I want, and keep going.

Comment by esrogs on Seeking Power is Provably Instrumentally Convergent in MDPs · 2019-12-05T22:34:26.557Z · score: 6 (3 votes) · LW · GW
This means that if there's more than twice the power coming from one move than from another, the former is more likely than the latter. In general, if one set of possibilities contributes 2K the power of another set of possibilities, the former set is at least K times more likely than the latter.

Where does the 2 come from? Why does one move have to have more than twice the power of another to be more likely? What happens if it only has 1.1x as much power?

Comment by esrogs on Seeking Power is Provably Instrumentally Convergent in MDPs · 2019-12-05T19:47:46.244Z · score: 6 (3 votes) · LW · GW
Remember how, as the agent gets more farsighted, more of its control comes from Chocolate and Hug, while also these two possibilities become more and more likely?

I don't understand this bit -- how does more of its control come from Chocolate and Hug? Wouldn't you say its control comes from Wait!? Once it ends up in Candy, Chocolate, or Hug, it has no control left. No?

Comment by esrogs on Seeking Power is Provably Instrumentally Convergent in MDPs · 2019-12-05T19:41:32.674Z · score: 6 (3 votes) · LW · GW
We bake the opponent's policy into the environment's rules: when you choose a move, the game automatically replies.

And the opponent plays to win, with perfect play?

Comment by esrogs on Seeking Power is Provably Instrumentally Convergent in MDPs · 2019-12-05T19:34:10.246Z · score: 6 (3 votes) · LW · GW
Imagine we only care about the reward we get next turn. How many goals choose Candy over Wait? Well, it's 50-50 – since we randomly choose a number between 0 and 1 for each state, both states have an equal chance of being maximal.

I got a little confused at the introduction of Wait!, but I think I understand it now. So, to check my understanding, and for the benefit of others, some notes:

  • the agent gets a reward for the Wait! state, just like the other states
  • for terminal states (the three non-Wait! states), the agent stays in that state, and keeps getting the same reward for all future time steps
  • so, when comparing Candy vs Wait! + Chocolate, the rewards after three turns would be (R_candy + γ * R_candy + γ^2 * R_candy) vs (R_wait + γ * R_chocolate + γ^2 * R_chocolate)

(I had at first assumed the agent got no reward for Wait!, and also failed to realize that the agent keeps getting the reward for the terminal state indefinitely, and so thought it was just about comparing different one-time rewards.)
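
To make the comparison in the notes above concrete, here is a minimal Python sketch. It assumes, as in the quoted passage, that each state's reward is drawn uniformly from [0, 1]. The function names, the fixed three-step horizon, and the Monte Carlo estimate are my own illustration, not anything from the post or paper.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite list of per-step rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def candy_return(r_candy, gamma, steps=3):
    # Candy is terminal: the agent stays there and collects the same reward each step.
    return discounted_return([r_candy] * steps, gamma)


def wait_then_chocolate_return(r_wait, r_chocolate, gamma, steps=3):
    # Wait! pays its own reward once; Chocolate is terminal and pays on every later step.
    return discounted_return([r_wait] + [r_chocolate] * (steps - 1), gamma)


def fraction_preferring_candy(gamma, steps, trials=100_000, seed=0):
    """Estimate the share of random reward functions under which Candy beats Wait! + Chocolate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        r_candy, r_wait, r_chocolate = rng.random(), rng.random(), rng.random()
        if candy_return(r_candy, gamma, steps) > wait_then_chocolate_return(
            r_wait, r_chocolate, gamma, steps
        ):
            wins += 1
    return wins / trials
```

With `steps=1` only the immediate reward counts, so the estimate should come out near one half, matching the 50-50 claim in the quoted passage. Note that this only compares Candy against the single path Wait! + Chocolate; it does not model the agent choosing the better of Chocolate and Hug after waiting.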

Comment by esrogs on Could someone please start a bright home lighting company? · 2019-11-29T20:17:59.835Z · score: 4 (2 votes) · LW · GW

Did you miss the part about how a bunch of different people were doing this and benefitted from it? Or are you just assuming that those people didn't officially have SAD?

Comment by esrogs on A Theory of Pervasive Error · 2019-11-26T09:32:19.038Z · score: 11 (6 votes) · LW · GW
Curtis Yarvin, a computer programmer perhaps most famous as the principal author of the Urbit

More famous than as the co-father of neoreaction? (Along with the N Land also referenced in the paragraph.)

Comment by esrogs on Realism and Rationality · 2019-11-26T08:47:45.749Z · score: 3 (2 votes) · LW · GW
(Within my cave of footnotes, I say a bit more on this point in FN14)

From looking at the footnotes, I think maybe you mean the one that begins, "These metaphysical and epistemological issues become less concerning if..." Wanted to note that this is showing up as #15 for me.

Comment by esrogs on Realism and Rationality · 2019-11-26T07:50:09.322Z · score: 13 (4 votes) · LW · GW

I'm not sure if my position would be considered "moral anti-realist", but if so, it seems to me a bit like calling Einstein a "space anti-realist", or a "simultaneity anti-realist". Einstein says that there is space, and there is simultaneity. They just don't match our folk concepts.

I feel like my position is more like, "we actually mean a bunch of different related things when we use normative language and many of those can be discussed as matters of objective fact" than "any discussion of morality is vacuous".

Does that just mean I'm an anti-realist (or naturalist realist?) and not an error theorist?

EDIT: after following the link in the footnotes to Luke's post on Pluralistic Moral Reductionism, it seems like I am just advocating the same position.

EDIT2: But given that the author of this post was aware of that post, I'm surprised that he thought rationalists' use of normative statements was evidence of contradiction (or tension), rather than of using normative language in a variety of different ways, as in Luke's post. Does any of the tension survive if you assume the speakers are pluralistic moral reductionists?

Comment by esrogs on Realism and Rationality · 2019-11-26T07:27:09.319Z · score: 33 (7 votes) · LW · GW

Speaking for myself (though I think many other rationalists think similarly), I approach this question with a particular mindset that I'm not sure how to describe exactly, but I would like to gesture at with some notes (apologies if all of these are obvious, but I want to get them out there for the sake of clarity):

  • Abstractions tend to be leaky
  • As Sean Carroll would say, there are different "ways of talking" about phenomena, on different levels of abstraction. In physics, we use the lowest level (and talk about quantum fields or whatever) when we want to be maximally precise, but that doesn't mean that higher level emergent properties don't exist. (Just because temperature is an aggregate property of fast moving particles, doesn't mean that heat isn't "real".) And it would be a total waste of time not to use the higher level concepts when discussing higher level phenomena (e.g. temperature, pressure, color, consciousness, etc.)
  • Various intuitive properties that we would like systems to have may turn out to be impossible, either individually, or together. Consider Arrow's theorem for voting systems, or Gödel's incompleteness theorems. Does the existence of these results mean that no voting system is better than any other? Or that formal systems are all useless? No, but they do mean that we may have to abandon previous ideas we had about finding the one single correct voting procedure, or axiomatic system. We shouldn't stop talking about whether a statement is provable, but, if we want to be precise, we should clarify which formal system we're using when we ask the question.
  • Phenomena that a folk or intuitive understanding sees as one thing often turn out to be two (or more) things on careful inspection, or to be meaningless in certain contexts. E.g. my compass points north. But if I'm in Greenland, where it points and the place where the rotational axis of the earth meets the surface aren't the same thing anymore. And if I'm in space, there just is no north anymore (or up, for that matter).
  • When you go through an ontological shift, and discover that the concepts you were using to make sense of the world aren't quite right, you don't have to just halt, melt, and catch fire. It doesn't mean that all of your past conclusions were wrong. As Eliezer would say, you can rescue the utility function.
  • This state of having leaky abstractions, and concepts that aren't quite right, is the default. It is rare that an intuitive or folk concept survives careful analysis unmodified. Maybe whole numbers would be an example that's unmodified. But even there, our idea of what a 'number' is is very different from what people thought a thousand years ago.

With all that in mind as background, when I come to the question of morality or normativity, it seems very natural to me that one might conclude that there is no single objective rule, or set of rules or whatever, that exactly matches our intuitive idea of "shouldness".

Does that mean I can't say which of two actions is better? I don't think so. It means that when I do, I'm probably being a bit imprecise, and what I really mean is some combination of the emotivist statement referenced in the post, plus a claim about what consequences will follow from the action, combined with an implicit expression of belief about how my listeners will feel about those consequences, etc.

I think basically all of the examples in the post of rationalists using normative language can be seen as examples of this kind of shorthand. E.g. saying that one should update one's credences according to Bayes's rule is shorthand for saying that this procedure will produce the most accurate beliefs (and also that I, the speaker, believe it is in the listener's best interest to have accurate beliefs, etc.).

For me it seems like a totally natural and unsurprising state of affairs for someone to both believe that there is no single precise definition of normativity that perfectly matches our folk understanding of shouldness (or that otherwise is the objectively "correct" morality), and also for that person to go around saying that one should do this or that, or that something is the right thing to do.

Similarly, if your physicist friend says that two things happened at the same time, you don't need to play gotcha and say, "Ah, but I thought you said there was no such thing as absolute simultaneity." You just assume that they actually mean a more complex statement, like "Approximately at the same time, assuming the reference frame of someone on the surface of the Earth."

A folk understanding of morality might think it's defined as:

  • what everyone in their hearts knows is right
  • what will have the best outcomes for me personally in the long run
  • what will have the best outcomes for the people I care about
  • what God says to do
  • what makes me feel good to do after I've done it
  • what other people will approve of me having done

And then it turns out that there just isn't any course of action, or rule for action that satisfies all those properties.

My bet is that there just isn't any definition of normativity that satisfies all the intuitive properties we would like. But that doesn't mean that I can't go around meaningfully talking about what's right in various situations, any more than the fact that the magnetic pole isn't exactly on the axis of rotation means that I can't point in a direction if someone asks me which way is north.

Comment by esrogs on The Bus Ticket Theory of Genius · 2019-11-25T01:20:41.897Z · score: 2 (1 votes) · LW · GW

Choice of "bus ticket collectors" possibly a nod to William James Sidis?

Comment by esrogs on Pieces of time · 2019-11-12T23:46:25.346Z · score: 8 (5 votes) · LW · GW
I get a similar feeling, at a much lower level, when I travel in the Midwest.

What about traveling in the Midwest gives you this feeling? Is it the travel? Is it the Midwest itself? Is it that you're in a non-urban part of the Midwest, but you're used to the hustle and bustle of a city?

Comment by esrogs on Experiments and Consent · 2019-11-11T06:15:46.733Z · score: 5 (3 votes) · LW · GW

I've seen that. Maybe I'm missing something, but I still stand by my comment. My car is even less capable than the vehicles described there and it's fine to drive.

Seems like the only reason that my car should be allowed on the roads but these should not be is some kind of expectation of moral hazard or false confidence on the part of the driver. No?

Perhaps one could argue that the car is in an uncanny valley where no one can be taught to monitor it correctly. But then it seems like that should be the emphasis rather than simply that the car was not good enough yet at driving itself.

Comment by esrogs on Experiments and Consent · 2019-11-11T02:13:52.729Z · score: 4 (2 votes) · LW · GW

For those who haven't seen it, starting at second 15 here, the driver can be seen to be looking down (presumably at their phone), for 6 full seconds before looking up and realizing that they're about to hit someone. This would not be safe to do in any car.

Comment by esrogs on Experiments and Consent · 2019-11-11T02:09:29.356Z · score: 5 (3 votes) · LW · GW
Similarly, the problem with Uber's car was that if you have an automatic driving system that can't recognize pedestrians, can't anticipate the movements of jaywalkers, freezes in response to dangerous situations, and won't brake to mitigate collisions, it is absolutely nowhere near ready to guide a car on public roads.

Isn't the problem that the human driver wasn't paying attention? My car also cannot recognize pedestrians etc, but it's fine to allow it on public roads because I am vigilant.

To the extent that Uber is at fault (rather than their employee), it seems to me that it's not that they let their cars on the road before they were advanced enough; it's that they didn't adequately ensure that their drivers would be vigilant (via eye-tracking, training, or etc.).

Comment by esrogs on Chris Olah’s views on AGI safety · 2019-11-11T01:40:31.635Z · score: 2 (1 votes) · LW · GW
It's meant to be analogous to imputing a value in a causal Bayes net

Aha! I thought it might be borrowing language from some technical term I wasn't familiar with. Thanks!

Comment by esrogs on Chris Olah’s views on AGI safety · 2019-11-10T05:05:00.748Z · score: 4 (2 votes) · LW · GW
Yes, I believe that RL agents have a much wider range of accident concerns than supervised / unsupervised models.

Is there anything that prevents them from being used as microscopes though? Presumably you can still inspect the models it has learned without using it as an agent (after it's been trained). Or am I missing something?

Comment by esrogs on Chris Olah’s views on AGI safety · 2019-11-10T04:57:18.036Z · score: 4 (2 votes) · LW · GW
"Specifically, rather than using machine learning to build agents which directly take actions in the world, we could use ML as a microscope—a way of learning about the world without directly taking actions in it."
Is there an implicit assumption here that RL agents are generally more dangerous than models that are trained with (un)supervised learning?

Couldn't you use it as a microscope regardless of whether it was trained using RL or (un)supervised learning?

It seems to me that whether it's a microscope would be about what you do with it after it's trained. In other words, an RL agent only need be an agent during training. Once it's trained you could still inspect the models it's learned w/o hooking it up to any effectors.

However, Chris replied yes to this question, so maybe I'm missing something.

Comment by esrogs on Chris Olah’s views on AGI safety · 2019-11-10T04:39:00.638Z · score: 4 (2 votes) · LW · GW
Takeoff does matter, in that I expect that this worldview is not very accurate/good if there's discontinuous takeoff, but imputing the worldview I don't think takeoff matters.

Minor question: could you clarify what you mean by "imputing the worldview" here? Do you mean something like, "operating within the worldview"? (I ask because this doesn't seem to be a use of "impute" that I'm familiar with.)

Comment by esrogs on Elon Musk is wrong: Robotaxis are stupid. We need standardized rented autonomous tugs to move customized owned unpowered wagons. · 2019-11-05T05:47:29.050Z · score: 4 (3 votes) · LW · GW

In a way that Airbnb does not?

Comment by esrogs on What economic gains are there in life extension treatments? · 2019-10-25T06:05:34.694Z · score: 3 (2 votes) · LW · GW
Retired people largely live off their savings; their economic activity is almost entirely consumption.

Why is this listed as a gain? Are you assuming life extension increases work-span but not retirement-span?

Comment by esrogs on ChristianKl's Shortform · 2019-10-13T09:27:30.707Z · score: 4 (3 votes) · LW · GW

Interesting short thread on this here.

Comment by esrogs on Thoughts on "Human-Compatible" · 2019-10-12T19:10:50.058Z · score: 2 (1 votes) · LW · GW
Also, did you mean “wasn’t”? :)

Lol, you got me.

Comment by esrogs on A simple sketch of how realism became unpopular · 2019-10-12T08:15:35.045Z · score: 4 (2 votes) · LW · GW
I don't know who, if anyone, noted the obvious fallacy in Berkeley's master argument prior to Russell in 1912

Not even Moore in 1903?

Russell's criticism is in line with Moore's famous 'The Refutation of Idealism' (1903), where he argues that if one recognizes the act-object distinction within conscious states, one can see that the object is independent of the act.

Isn't that the same argument Russell was making?

Comment by esrogs on Thoughts on "Human-Compatible" · 2019-10-12T07:47:51.737Z · score: 2 (1 votes) · LW · GW
Why does it matter so much that we point exactly to be human?

Should that be "to the human" instead of "to be human"? Wan't sure if you meant to say simply that, or if more words got dropped.

Or maybe it was supposed to be: "matter so much that what we point exactly to be human?"

Comment by esrogs on Thoughts on "Human-Compatible" · 2019-10-12T07:37:02.613Z · score: 2 (1 votes) · LW · GW

FWIW, this reminds me of Holden Karnofsky's formulation of Tool AI (from his 2012 post, Thoughts on the Singularity Institute):

Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc." An "agent," by contrast, has an underlying instruction set that conceptually looks like: "(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A." In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the "tool" version rather than the "agent" version, and this separability is in fact present with most/all modern software. Note that in the "tool" version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as "wanting" something is a category error, and there is no reason to expect its step (2) to be deceptive.

If I understand correctly, his "agent" is your Consequentialist AI, and his "tool" is your Decoupled AI 1.
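
As a rough illustration of the separability Holden describes, here is a small Python sketch in which step (1) is a shared function and step (2) is swapped between a "tool" version and an "agent" version. Everything here (`rank_actions`, `tool_step`, `agent_step`, the `predict_p` scoring callback, the `execute` callback) is a hypothetical name I made up for the example. It is not taken from Holden's post or from any real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ranked:
    action: str
    score: float


def rank_actions(
    actions: Sequence[str],
    predict_p: Callable[[str, dict], float],
    data: dict,
) -> List[Ranked]:
    """Step (1): score each candidate action by predicted P given data set D, best first."""
    ranked = [Ranked(a, predict_p(a, data)) for a in actions]
    return sorted(ranked, key=lambda r: r.score, reverse=True)


def tool_step(actions, predict_p, data, top_k=3) -> str:
    """Tool version of step (2): summarize the calculation for a human; nothing is executed."""
    ranked = rank_actions(actions, predict_p, data)
    return "\n".join(f"{r.action}: predicted P = {r.score:.2f}" for r in ranked[:top_k])


def agent_step(actions, predict_p, data, execute: Callable[[str], None]) -> str:
    """Agent version of step (2): execute the top-ranked action directly."""
    best = rank_actions(actions, predict_p, data)[0]
    execute(best.action)
    return best.action
```

The only difference between the two modes is what happens after the ranking is computed, which is the sense in which step (2) "can be set" to one version or the other. Whether that separation holds for more capable systems is exactly what the later critiques dispute, and this sketch takes no side on that.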

Comment by esrogs on Thoughts on "Human-Compatible" · 2019-10-12T07:25:06.494Z · score: 2 (1 votes) · LW · GW
Here's my summary: reward uncertainty through some extension of a CIRL-like setup, accounting for human irrationality through our scientific knowledge, doing aggregate preference utilitarianism for all of the humans on the planet, discounting people by how well their beliefs map to reality, perhaps downweighting motivations such as envy (to mitigate the problem of everyone wanting positional goods).

Perhaps a dumb question, but is "reward" being used as a noun or verb here? Are we rewarding uncertainty, or is "reward uncertainty" a goal we're trying to achieve?

Comment by esrogs on What are your strategies for avoiding micro-mistakes? · 2019-10-06T23:09:55.556Z · score: 8 (3 votes) · LW · GW

Incidentally, a similar consideration leads me to want to avoid re-using old metaphors when explaining things. If you use multiple metaphors you can triangulate on the meaning -- errors in the listener's understanding will interfere destructively, leaving something closer to what you actually meant.

For this reason, I've been frustrated that we keep using "maximize paperclips" as the stand-in for a misaligned utility function. And I think reusing the exact same example again and again has contributed to the misunderstanding Eliezer describes here:

Original usage and intended meaning: The problem with turning the future over to just any superintelligence is that its utility function may have its attainable maximum at states we'd see as very low-value, even from the most cosmopolitan standpoint.
Misunderstood and widespread meaning: The first AGI ever to arise could show up in a paperclip factory (instead of a research lab specifically trying to do that). And then because AIs just mechanically carry out orders, it does what the humans had in mind, but too much of it.

If we'd found a bunch of different ways to say the first thing, and hadn't just said, "maximize paperclips" every time, then I think the misunderstanding would have been less likely.

Comment by esrogs on What are your strategies for avoiding micro-mistakes? · 2019-10-06T23:02:14.868Z · score: 4 (2 votes) · LW · GW

One mini-habit I have is to try to check my work in a different way from the way I produced it.

For example, if I'm copying down a large number (or string of characters, etc.), then when I double-check it, I read off the transcribed number backwards. I figure this way my brain is less likely to go "Yes yes, I've seen this already" and skip over any discrepancy.

And in general I look for ways to do the same kind of thing in other situations, such that checking is not just a repeat of the original process.

Comment by esrogs on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T23:24:30.910Z · score: 9 (4 votes) · LW · GW
And I think claim 5 is basically in line with what, say, Bostrom would discuss (where stabilization is a thing to do before we attempt to build a sovereign).

You mean in the sense of stabilizing the whole world? I'd be surprised if that's what Yann had in mind. I took him just to mean building a specialized AI to be a check on a single other AI.

Comment by esrogs on FB/Discord Style Reacts · 2019-10-04T17:43:40.351Z · score: 2 (1 votes) · LW · GW
Maybe try out giving people an optional prompt about why they upvoted or downvoted things that is quite short

I like this idea.

Comment by esrogs on List of resolved confusions about IDA · 2019-10-01T04:32:57.291Z · score: 14 (5 votes) · LW · GW
act-based = based on short-term preferences-on-reflection

For others who were confused about what "short-term preferences-on-reflection" would mean, I found this comment and its reply to be helpful.

Putting it into my own words: short-term preferences-on-reflection are about what you would want to happen in the near term, if you had a long time to think about it.

By way of illustration, AlphaZero's long-term preference is to win the chess game, its short-term preference is whatever its policy network spits out as the best move to make next, and its short-term preference-on-reflection is the move it wants to make next after doing a fuck-ton of MCTS.

Comment by esrogs on If the "one cortical algorithm" hypothesis is true, how should one update about timelines and takeoff speed? · 2019-08-26T23:25:12.251Z · score: 4 (2 votes) · LW · GW
and that displacement cells both exist and exist in neocortex

Both exist and exist?

Comment by esrogs on Troll Bridge · 2019-08-24T00:19:38.513Z · score: 4 (4 votes) · LW · GW
there is a troll who will blow up the bridge with you on it, if you cross it "for a dumb reason"

Does this way of writing "if" mean the same thing as "iff", i.e. "if and only if"?

Comment by esrogs on "Designing agent incentives to avoid reward tampering", DeepMind · 2019-08-15T03:06:35.947Z · score: 7 (4 votes) · LW · GW

I can't resist giving this pair of rather incongruous quotes from the paper

Could you spell out what makes the quotes incongruous with each other? It's not jumping out at me.

Comment by esrogs on Can we really prevent all warming for less than 10B$ with the mostly side-effect free geoengineering technique of Marine Cloud Brightening? · 2019-08-05T16:19:08.622Z · score: 4 (2 votes) · LW · GW
1 billion per year per W/m^2 of reduced forcing

For others who weren't sure what "reduced forcing" refers to:

And to put that number in context, the "net anthropogenic component" of radiative forcing appears to be about 1.5 W/m^2 (according to an image in the wikipedia article), so canceling out the anthropogenic component would have an ongoing cost of 1.5 billion per year.

Comment by esrogs on Writing children's picture books · 2019-08-03T04:03:02.664Z · score: 3 (2 votes) · LW · GW

Or you could imagine writing for a smarter but less knowledgeable person. E.g. 10 y.o. Feynman.

Comment by esrogs on Another case of "common sense" not being common? · 2019-07-31T20:48:38.178Z · score: 8 (4 votes) · LW · GW
Okay, that is probably not that good a characterization.

I appreciate the caveat, but I'm actually not seeing the connection at all. What is the relationship you see between common sense and surprisingly simple solutions to problems?

Comment by esrogs on Just Imitate Humans? · 2019-07-27T02:40:24.970Z · score: 5 (2 votes) · LW · GW
Could enough human-imitating artificial agents (running much faster than people) prevent unfriendly AGI from being made?

This seems very related to the question of whether uploads would be safer than some other kind of AGI. Offhand, I remember a comment from Eliezer suggesting that he thought that would be safer (but that uploads would be unlikely to happen first).

Not sure how common that view is though.

Acquiring data: put a group of people in a house with a computer. Show them things (images, videos, audio files, etc.) and give them a chance to respond at the keyboard. Their keyboard actions are the actions, and everything between actions is an observation. Then learn the policy of the group of humans.

Wouldn't this take an enormous amount of observation time to generate enough data to learn a human-imitating policy?

Comment by esrogs on The Self-Unaware AI Oracle · 2019-07-24T18:21:10.207Z · score: 7 (4 votes) · LW · GW

Just want to note that I like your distinctions between Algorithm Land and the Real World and also between Level-1 optimization and Level-2 optimization.

I think some discussion of AI safety hasn't been clear enough on what kind of optimization we expect in which domains. At least, it wasn't clear to me.

But a couple things fell into place for me about 6 months ago, which very much rhyme with your two distinctions:

1) Inexploitability only makes sense relative to a utility function, and if the AI's utility function is orthogonal to yours (e.g. because it is operating in Algorithm Land), then it may be exploitable relative to your utility function, even though it's inexploitable relative to its own utility function. See this comment (and thanks to Rohin for the post that prompted the thought).

2) While some process that's optimizing super-hard for an outcome in Algorithm Land may bleed out into affecting the Real World, this would sort of be by accident, and seems much easier to mitigate than a process that's trying to affect the Real World on purpose. See this comment.

Putting them together, a randomly selected superintelligence doesn't care about atoms, or about macroscopic events unfolding through time (roughly the domain of what we care about). And just because we run it on a computer that from our perspective is embedded in this macroscopic world, and that uses macroscopic resources (compute time, energy), doesn't mean it's going to start caring about macroscopic Real World events, or start fighting with us for those resources. (At least, not in a Level-2 way.)

On the other hand, powerful computing systems we build are not going to be randomly selected from the space of possible programs. We'll have economic incentives to create systems that do consider and operate on the Real World.

So it seems to me that a randomly selected superintelligence may not actually be dangerous (because it doesn't care about being unplugged -- that's a macroscopic concept that seems simple and natural from our perspective, but would not actually correspond to something in most utility functions), but that the superintelligent systems anyone is likely to actually build will be much more likely to be dangerous (because they will model and/or act on the Real World).

Comment by esrogs on The AI Timelines Scam · 2019-07-23T04:37:04.547Z · score: 4 (2 votes) · LW · GW

I see two links in your comment that are both linking to the same place -- did you mean for the first one (with the text: "the criticism that the usage of "scam" in the title was an instance of the noncentral fallacy") to link to something else?

Comment by esrogs on The Self-Unaware AI Oracle · 2019-07-23T04:05:50.918Z · score: 2 (1 votes) · LW · GW
The way I read it, Gwern's tool-AI article is mostly about self-improvement.

I'm not sure I understand what you mean here. I linked Gwern's post because your proposal sounded very similar to me to Holden's Tool AI concept, and Gwern's post is one of the more comprehensive responses I can remember coming across.

Is it your impression that what you're proposing is substantially different from Holden's Tool AI?

When I say that your idea sounded similar, I'm thinking of passages like this (from Holden):

Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.” An “agent,” by contrast, has an underlying instruction set that conceptually looks like: “(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A.” In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the “tool” version rather than the “agent” version, and this separability is in fact present with most/all modern software. Note that in the “tool” version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as “wanting” something is a category error, and there is no reason to expect its step (2) to be deceptive….This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode. In fact, if developing “Friendly AI” is what we seek, a tool-AGI could likely be helpful enough in thinking through this problem as to render any previous work on “Friendliness theory” moot.

Compared to this (from you):

Finally, we query the system in a way that is compatible with its self-unawareness. For example, if we want to cure cancer, one nice approach would be to program it to search through its generative model and output the least improbable scenario wherein a cure for cancer is discovered somewhere in the world in the next 10 years. Maybe it would output: "A scientist at a university will be testing immune therapy X, and they will combine it with blood therapy Y, and they'll find that the two together cure all cancers". Then, we go combine therapies X and Y ourselves.

Your "Then, we go combine therapies X and Y ourselves" sounds to me a lot like Holden's separation of (1) calculating the best action vs. (2) either explaining (in the case of Tool AI) or executing (in the case of Agent AI) the action. In both cases you seem to be suggesting that we can reap the rewards of superintelligence but retain control by treating the AI as an advisor rather than as an agent who acts on our behalf.
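To make the parallel concrete, here is a rough Python sketch of the two instruction sets as Holden describes them. It is only an illustration under my own assumptions: the function names, the toy candidate actions, and the numeric estimates are all hypothetical and appear in neither Holden's post nor yours. The point is just that step (1) is shared, and the two modes differ only in what happens in step (2).

```python
from typing import Callable, Dict, List

Action = str  # hypothetical: actions are just labels in this toy example


def rank_actions(actions: List[Action], score: Callable[[Action], float]) -> List[Action]:
    """Step (1): order candidate actions by their estimated value of parameter P."""
    return sorted(actions, key=score, reverse=True)


def tool_mode(actions: List[Action], score: Callable[[Action], float]) -> Dict[str, object]:
    """Step (2), tool version: summarize the calculation for a human and stop there."""
    ranked = rank_actions(actions, score)
    return {
        "recommended": ranked[0],
        "alternatives": ranked[1:],
        "estimates": {a: score(a) for a in ranked},
    }


def agent_mode(actions: List[Action], score: Callable[[Action], float],
               execute: Callable[[Action], None]) -> Action:
    """Step (2), agent version: carry out the top-ranked action directly."""
    chosen = rank_actions(actions, score)[0]
    execute(chosen)
    return chosen


# Toy data (made up for illustration): estimated value of P for each candidate action.
estimates = {"combine X and Y": 0.9, "test X alone": 0.4, "do nothing": 0.0}
candidates = list(estimates)

# Tool mode: we get a report and decide ourselves whether to act on it.
report = tool_mode(candidates, estimates.get)

# Agent mode: the system acts; here "executing" just records the action in a list.
executed: List[Action] = []
agent_mode(candidates, estimates.get, executed.append)
```

In this sketch your proposal corresponds to `tool_mode`: the system outputs a scenario, and a human does the `execute` step.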

Am I right that what you're proposing is pretty much along the same lines as Holden's Tool AI -- or is there some key difference that I'm missing?

Comment by esrogs on The Self-Unaware AI Oracle · 2019-07-22T23:41:13.244Z · score: 6 (3 votes) · LW · GW

Also see these discussions of Drexler's Comprehensive AI Services proposal, which also emphasizes non-agency:

Comment by esrogs on The Self-Unaware AI Oracle · 2019-07-22T23:28:53.678Z · score: 4 (2 votes) · LW · GW

If you haven't already seen it, you might want to check out:

Comment by esrogs on 1hr talk: Intro to AGI safety · 2019-07-17T18:17:24.291Z · score: 4 (2 votes) · LW · GW

The Dive in! link in the last paragraph appears to be broken. It's taking me to:

Comment by esrogs on What are we predicting for Neuralink event? · 2019-07-17T06:52:55.830Z · score: 15 (3 votes) · LW · GW

Scoring your predictions: it looks like you got all three "not see" predictions right, as well as #1 and #3 from "will see", with only #2 from "will see" missing (though you had merely predicted we'd see something "closer to" your "will see" list, so missing one doesn't necessarily mean you were wrong).

Comment by esrogs on Open Thread July 2019 · 2019-07-16T21:42:07.423Z · score: 9 (3 votes) · LW · GW
But while it may have been sensible to start (fully 10 years ago, now!)

Correction: CFAR was started in 2012 (though I believe some of the founders ran rationality camps the previous summer, in 2011), so it's been 7 (or 8) years, not 10.

Comment by esrogs on Integrity and accountability are core parts of rationality · 2019-07-16T19:53:57.543Z · score: 2 (1 votes) · LW · GW

Got it, that makes sense.

Comment by esrogs on Integrity and accountability are core parts of rationality · 2019-07-16T19:03:15.606Z · score: 2 (1 votes) · LW · GW

But being in a position of power filters for competence, and competence filters for accurate beliefs.

If the quoted bit had instead said:

This means that highly competent people in positions of power often have less accurate beliefs than highly competent people who are not in positions of power.

I wouldn't necessarily have disagreed. But as is, I'm pretty skeptical of the claim (again depending on what is meant by "often").