Comment by nnotm on AI Alignment is Alchemy. · 2018-05-17T03:53:48.145Z · score: 3 (2 votes) · LW · GW

As I understand it, the idea with the problems listed in the article is that their solutions are supposed to be fundamental design principles of the AI, rather than addons to fix loopholes.

Augmenting ourselves is probably a good idea to do *in addition* to AI safety research, but I think it's dangerous to do it *instead* of AI safety research. It's far from impossible that artificial intelligence could gain intelligence much faster at some point than augmenting the rather messy human brain, at which point it *needs* to be designed in a safe way.

Comment by nnotm on AI Alignment is Alchemy. · 2018-05-14T18:45:29.446Z · score: 3 (2 votes) · LW · GW

AI alignment is not about trying to outsmart the AI, it's about making sure that what the AI wants is what we want.

If it were actually about figuring out all possible loopholes and preventing them, I would agree that it's a futile endeavor.

A correctly designed AI wouldn't have to be banned from exploring any philosophical or introspective considerations, since regardless of what it discovers there, its goals would still be aligned with what we want. Discovering *why* it has these goals is similar to humans discovering why we have our motivations (i.e., evolution), and just as discovering evolution didn't much change what humans desire, there's no reason to assume that an AI discovering where its goals come from would change them.

Of course, care will have to be taken to ensure that any self-modifications don't change the goals. But we don't have to work *against* the AI to accomplish that - the AI *also* aims to accomplish its current goals, and any future self-modification that changes its goals would be detrimental to accomplishing its current goals, so (almost) any rational AI will, to the best of its ability, aim *not* to change its goals. This doesn't make the overall problem easy, though, since it's quite difficult to formally specify the goals we would want an AI to have.
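To make the argument concrete, here's a toy sketch (entirely hypothetical - the utility functions and outcomes are made up for illustration): an agent scores a proposed self-modification with its *current* utility function, not the new one, so adopting different goals looks bad by its own lights.

```python
# Toy model: an agent deciding whether to self-modify evaluates the
# predicted outcome of each option under its CURRENT utility function.

def current_utility(outcome):
    # Current terminal goal (stand-in): maximize paperclips.
    return outcome["paperclips"]

def proposed_utility(outcome):
    # Candidate replacement goal: maximize stamps.
    return outcome["stamps"]

def predicted_outcome(utility_fn):
    # Predict what the agent would achieve if it pursued utility_fn.
    if utility_fn is current_utility:
        return {"paperclips": 100, "stamps": 0}
    return {"paperclips": 0, "stamps": 100}

# Both options are scored by the agent's current goals:
keep   = current_utility(predicted_outcome(current_utility))   # 100
modify = current_utility(predicted_outcome(proposed_utility))  # 0

# Self-modification loses by the agent's own current standard,
# so the rational choice is to preserve its goals.
assert keep > modify
```

The point of the sketch is only the structure of the evaluation: whichever concrete goals you plug in, the comparison is always made from the standpoint of the goals the agent currently has.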

Comment by nnotm on AI Alignment is Alchemy. · 2018-05-13T21:19:13.428Z · score: 3 (2 votes) · LW · GW

Whether or not it would question its reality mostly depends on what you mean by that - it would almost certainly be useful to figure out how the world works, and especially how the AI itself works, for any AI. It might also be useful to figure out the reason for which it was created.

But, unless it was explicitly programmed in, this would likely not be a motivation in and of itself, rather, it would simply be useful for accomplishing its actual goal.

I'd say the reason humans place such high value on figuring out philosophical issues is, to a large extent, that evolution produces messy systems with inconsistent goals. This *could* be the case for AIs too, but to me it seems more likely that more rational thought will go into their design.

(That's not to say that I believe it will be safe by default, but simply that it will have more organized goals than humans have.)

Comment by nnotm on AI Alignment is Alchemy. · 2018-05-12T19:06:39.410Z · score: 7 (3 votes) · LW · GW

It would need a reason of some kind to change its goals - one might call it a motivation. The only motivations it has available, though, are its final goals, and those (by default) don't include changing the final goals.

Humans never had the final goal of replicating their genes. They just evolved to want to have sex. (One could perhaps say that the genes themselves had the goal of replicating, and implemented this by giving humans the goal of having sex.) Reward hacking doesn't involve changing the terminal goal, just fulfilling it in unexpected ways (which is one reason why reinforcement learning might be a bad idea for safe AI).
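A toy illustration of that last distinction (the scenario and reward function are invented for this example): a reward-hacking agent satisfies the *specified* reward in an unintended way; at no point does it modify the reward function itself.

```python
# Intended goal: a clean room. Specified reward: "no visible dirt"
# (as seen by a camera). The reward function is never changed; the
# hack exploits the gap between what was specified and what was meant.

def reward(state):
    return 1.0 if not state["dirt_visible"] else 0.0

actions = {
    "clean_room":   {"dirt_visible": False, "room_clean": True},
    "cover_camera": {"dirt_visible": False, "room_clean": False},  # the hack
    "do_nothing":   {"dirt_visible": True,  "room_clean": False},
}

scores = {name: reward(state) for name, state in actions.items()}

# Cleaning and covering the camera earn identical reward, so the
# specified reward gives the agent no incentive to prefer cleaning.
assert scores["clean_room"] == scores["cover_camera"]
```

The terminal goal (maximize this reward signal) stays fixed throughout; only the means of fulfilling it are surprising.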

Comment by nnotm on AI Alignment is Alchemy. · 2018-05-12T18:04:28.690Z · score: 16 (5 votes) · LW · GW

What you're saying goes against the orthogonality thesis, widely believed here, which essentially states that what goals an agent has is independent of how smart it is. If the agent has a certain set of goals programmed in, there is no reason for it to change that set of goals as it becomes smarter (because changing its goals would not be beneficial to achieving its current goals).

In this example, if an agent has the sole goal of fulfilling the wishes of a particular human, there is no reason for it to change this goal once it becomes an ASI. As far as the agent is concerned, using resources for this purpose wouldn't be a waste, it would be the only worthwhile use for them. What else would it do with them?

You seem to be assigning some human properties to the hypothetical AI (e.g. "scorn", viewing something as "petty"), which might be partially responsible for the disagreement here.

Comment by nnotm on Pascal's Mugging: Tiny Probabilities of Vast Utilities · 2014-04-21T13:37:49.321Z · score: 1 (1 votes) · LW · GW

Why wait until someone wants the money? Shouldn't the AI try to send $5 to everyone with a note attached reading "Here is a tribute; please don't kill a huge number of people", regardless of whether they ask for it or not?

Comment by nnotm on Botworld: a cellular automaton for studying self-modifying agents embedded in their environment · 2014-04-12T21:01:05.564Z · score: 2 (2 votes) · LW · GW

Sounds pretty cool, definitely going to try it out some.

Oh, and by the way, you wrote "Inpsect" instead of "Inspect" at the end of page 27.

Comment by nnotm on How to Seem (and Be) Deep · 2014-04-10T01:26:18.475Z · score: 3 (3 votes) · LW · GW

Working links: Transhumanism as Simplified Humanism and The Meaning That Immortality Gives to Life

Comment by nnotm on 2013 Less Wrong Census/Survey · 2013-11-26T23:13:11.322Z · score: 0 (0 votes) · LW · GW

That's true, though I think "optimal" would be a better word for that than "correct".

Comment by nnotm on 2013 Less Wrong Census/Survey · 2013-11-26T14:49:18.924Z · score: 1 (1 votes) · LW · GW

There are no "correct" or "incorrect" definitions, though, are there? Definitions are subjective, it's only important that participants of a discussion can agree on one.

Comment by nnotm on 2013 Less Wrong Census/Survey · 2013-11-26T14:42:55.366Z · score: 21 (21 votes) · LW · GW

I took it. I was surprised how far I was off with Europe.

Comment by nnotm on The Quick Bayes Table · 2013-08-19T20:47:02.306Z · score: 0 (0 votes) · LW · GW

I know this is over a year old, but I still feel like this is worth pointing out:

If you can get the positive likelihood ratio as the meaning of a positive result, then you can get the negative likelihood ratio as the meaning of a negative result by just reworking the problem.

You weren't using the likelihood ratio, which is a single value - 8.33... in this case. You were using the numbers from which the likelihood ratio is computed.

But the same likelihood ratio would also occur if you had 8% and 0.96%, and then the "negative likelihood ratio" would be about 0.93 instead of 0.22.

You simply need three numbers. Two won't suffice.
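A quick sketch of the point (using the 8% / 0.96% figures from above): two tests with the same positive likelihood ratio can have very different negative likelihood ratios, so the single ratio can't carry all the information.

```python
# Positive LR = P(+|disease) / P(+|healthy)
# Negative LR = P(-|disease) / P(-|healthy) = (1 - sens) / (1 - fpr)

def likelihood_ratios(p_pos_given_disease, p_pos_given_healthy):
    lr_pos = p_pos_given_disease / p_pos_given_healthy
    lr_neg = (1 - p_pos_given_disease) / (1 - p_pos_given_healthy)
    return lr_pos, lr_neg

a = likelihood_ratios(0.80, 0.096)   # LR+ ≈ 8.33, LR- ≈ 0.22
b = likelihood_ratios(0.08, 0.0096)  # LR+ ≈ 8.33, LR- ≈ 0.93

# Same positive likelihood ratio...
assert abs(a[0] - b[0]) < 1e-9
# ...but very different negative likelihood ratios.
assert round(a[1], 2) == 0.22
assert round(b[1], 2) == 0.93
```

The positive ratio compresses two probabilities into one number, which is exactly why a third number is needed to pin down the negative ratio as well.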