Comment by liam-donovan on On motivations for MIRI's highly reliable agent design research · 2019-07-12T00:16:14.372Z · score: 1 (1 votes) · LW · GW

What work is step #1 doing here? It seems like steps #2-5 would still hold even if the AGI in question were using "bad" consequentialist reasoning (e.g. domain-limited/high-K/exploitable/etc.).

In fact, is it necessary to assume that the AGI will be consequentialist at all? It seems highly probable that the first pivotal act will be taken by a system of humans+AI that is collectively behaving in a consequentialist fashion (in order to pick out a pivotal act from the set of all actions). If so, do arguments #2-#5 not apply equally well to this system as a whole, with "top-level" interpreted as something like "transparent to humans within the system"?

Comment by liam-donovan on How does the organization "EthAGI" fit into the broader AI safety landscape? · 2019-07-08T01:19:53.140Z · score: 1 (1 votes) · LW · GW

As far as visible output, the founder did write a (misleading, imho) fictional book about AI risk called "Detonation", which is how I heard of EthAGI. I was curious how an organization like this could form with no connection to "mainstream" AI safety people, but I guess it's more common than I thought.

How does the organization "EthAGI" fit into the broader AI safety landscape?

2019-07-08T00:46:02.191Z · score: 4 (2 votes)
Comment by liam-donovan on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-03T12:22:08.089Z · score: 3 (1 votes) · LW · GW

Well, a given copy of the oracle wouldn't directly receive information from the other oracles about the questions they were asked. To the extent a problem remains (which I agree is likely without specific assumptions), wouldn't it apply to all counterfactual oracles?

Comment by liam-donovan on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T20:12:59.747Z · score: 5 (3 votes) · LW · GW

Two basic questions I couldn't figure out (sorry):

Can you use a different oracle for every subquestion? If you can, how would this affect the concern Wei_Dai raises?

If we know the oracle is only optimizing for the specified objective function, are mesa-optimisers still a problem for the proposed system as a whole?

Comment by liam-donovan on Contest: $1,000 for good questions to ask to an Oracle AI · 2019-07-02T20:08:05.060Z · score: 4 (2 votes) · LW · GW

How would this be low-bandwidth? If we're able to give the oracle a list of passwords to guess from, can't we just check them all?

Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T22:16:18.830Z · score: 1 (1 votes) · LW · GW

But the other elements in C(n) aren't necessarily daemons either, right? Certainly "encoding n days of weather data" isn't daemonic at all; some versions of c_apx might be upstream daemons, but that's not necessarily concerning. I don't understand how this argument tells us anything about whether the smallest circuit is guaranteed to be (downstream) daemon-free.

Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T21:43:24.439Z · score: 1 (1 votes) · LW · GW

Why couldn't you just use a smaller circuit that runs one single-step simulator, and outputs the result? It seems like that would output an accurate prediction of Paul's behavior iff the k-step simulator outputs an accurate prediction.

Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T21:31:47.507Z · score: 1 (1 votes) · LW · GW

This is all only relevant to downstream daemons, right? If so, I don't understand why the DD would ever provide 98% accuracy; I'd expect it to provide 99% accuracy until it sees a chance to provide [arbitrarily low]% accuracy and start pursuing its agenda directly. As you say, this might happen due to competition between daemon-containing systems, but I think a DD would want to maximize its chances of survival by maximizing its accuracy either way.


Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T21:22:55.192Z · score: 1 (1 votes) · LW · GW

Isn't this just saying it would be nice if we collectively put more resources towards alignment research relative to capabilities research? I still feel like I'm missing something :/

Comment by liam-donovan on Is it good practice to write questions/comments on old posts you're trying to understand? · 2019-06-27T21:17:59.061Z · score: 2 (2 votes) · LW · GW

Is there a good way to know if an AI safety post is "ephemeral" in the sense that it's no longer relevant to the current state of the discussion?

Is it good practice to write questions/comments on old posts you're trying to understand?

2019-06-27T09:23:01.619Z · score: 21 (10 votes)
Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T08:02:09.632Z · score: 1 (1 votes) · LW · GW

(sorry for commenting on such an old post)

It seems like the problem from evolution's perspective isn't that we don't understand our goal specification but that our goals are different from evolution's goals. It seems fairly tautological that putting more compute towards maximizing a goal specification than towards verifying that the specification matches what we actually want is likely to lead to UFAI; I don't see how that implies a "relatively simple" solution?

Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T07:43:19.764Z · score: 1 (1 votes) · LW · GW

Coincidentally I'm also trying to understand this post at the same time, and was somewhat confused by the "upstream"/"downstream" distinction.

What I eventually concluded was that there are 3 ways a daemon that intrinsically values optimizing some Y can "look like" it's optimizing X:

  • Y = X (this seems both unconcerning and unlikely, and thus somewhat irrelevant)
  • optimizing Y causes optimization pressure to be applied to X (upstream daemon; describes humans if Y = our actual goals and X = inclusive genetic fitness)
  • The daemon is directly optimizing X because the daemon believes this instrumentally helps it achieve Y (downstream daemon, e.g. if optimizing X helps the daemon survive)

Does this seem correct? In particular, I don't understand why upstream daemons would have to have a relatively benign goal.

Comment by liam-donovan on Open question: are minimal circuits daemon-free? · 2019-06-27T07:24:07.047Z · score: 1 (1 votes) · LW · GW

Isn't that the problem Christiano is relatively optimistic about solving through training on different distributions? In particular, it seems like we'd notice the daemon's goals diverging from ours and try to correct it. If I understand Christiano's point correctly, he is optimistic that we could do so successfully due to the existence of a "basin of corrigibility". OTOH, the most concerning case would be when the daemon is intentionally trying to hide the fact that its goals are diverging from X, because we may not realize the system is misaligned before the daemon executes a treacherous turn.


Instead, I interpreted "starting as upstream and becoming downstream" as a description of a phenomenon that might look roughly like this:

1. Upstream daemon (UD) is "discovered" by some optimization process because it seems to do a good job optimizing X on the training data

2. The daemon is "initialized" by the outer optimizer and starts "running"

3. The outer optimizer (OO) provides a set of inputs to the daemon with the goal of optimizing X

4. Based on these inputs, the daemon "notices" that the entity that "initialized" it is trying to optimize X rather than Y

5. The UD infers that the outer optimizer will "terminate" it if the OO discovers the UD is optimizing for Y rather than X

6. The UD starts optimizing for X in order to survive (a convergent instrumental goal regardless of Y)

7. The OO observes the UD's performance on a wide variety of training data and infers that the UD is not in fact an upstream daemon

8. The daemon is "set loose" and proceeds to execute a treacherous turn


As you can tell, my model of how daemons work is extremely lacking; I've used quotes for every phrase that feels dubious to me.

One major question I have which this model, if correct, partially resolves for me: How does the downstream daemon ever "know" what X we are optimizing for in order for the outer optimizer to "discover" it?



Comment by liam-donovan on Explaining "The Crackpot Bet" · 2019-06-27T06:39:28.051Z · score: 1 (1 votes) · LW · GW

I am so confused... what was the answer Zvi had in mind that "anyone who thought about it for 5 minutes" could see? Keep in mind I have absolutely no context here and am just wondering if there is in fact an interesting puzzle here whose answer I didn't figure out.

Comment by liam-donovan on Evidence other than evolution for optimization daemons? · 2019-04-22T16:33:24.666Z · score: 3 (2 votes) · LW · GW

I have read that post; it makes sense, but I'm not sure how to distinguish "correct" from "persuasive but wrong" in this case without other evidence.

Comment by liam-donovan on Evidence other than evolution for optimization daemons? · 2019-04-22T03:39:23.967Z · score: 1 (0 votes) · LW · GW

I don't understand why it would be -- it looks like MENACE is just a simple physical algorithm that successfully optimizes for winning tic-tac-toe. I thought the idea of an OD was that a process optimizing for goal A hard enough could produce a consequentialist* agent that cares about a different goal B. What is the goal B here (or am I misunderstanding the concept)?

*in the sense Christiano uses "consequentialist"
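For context, MENACE (Michie's Matchbox Educable Noughts And Crosses Engine) really is just a transparent reinforcement scheme: each board state is a "matchbox" of beads, one color per legal move, and bead counts are adjusted after each game. A minimal sketch, with illustrative names and bead adjustments (+3 win / +1 draw / -1 loss) chosen for simplicity rather than matching Michie's exact scheme:

```python
import random

class Menace:
    """Minimal sketch of MENACE: each seen board state gets a 'matchbox'
    of beads (one count per legal move); bead counts are the policy."""

    def __init__(self, initial_beads=4):
        self.boxes = {}          # state tuple -> {move index: bead count}
        self.initial_beads = initial_beads
        self.history = []        # (state, move) pairs from the current game

    def legal_moves(self, state):
        # A state is a 9-tuple of 'X', 'O', or ' '; empty cells are legal.
        return [i for i, cell in enumerate(state) if cell == ' ']

    def choose(self, state):
        # Lazily create a matchbox the first time a state is seen.
        if state not in self.boxes:
            self.boxes[state] = {m: self.initial_beads
                                 for m in self.legal_moves(state)}
        moves, weights = zip(*self.boxes[state].items())
        move = random.choices(moves, weights=weights)[0]
        self.history.append((state, move))
        return move

    def learn(self, result):
        # Reinforce every move played this game; never drop below one bead
        # so no move is permanently eliminated.
        delta = {'win': 3, 'draw': 1, 'loss': -1}[result]
        for state, move in self.history:
            self.boxes[state][move] = max(1, self.boxes[state][move] + delta)
        self.history.clear()
```

The point of the sketch is that every "goal" in the system is visible: there is only the bead-count update toward winning, with no inner process that could come to pursue a different objective B.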

Comment by liam-donovan on Evidence other than evolution for optimization daemons? · 2019-04-21T21:33:14.083Z · score: 1 (1 votes) · LW · GW

Regarding the first part of your comment: If I understand the quoted section correctly, I don't think I know enough about biology or theology to confidently take a position on that view. Is the observed behavior of some soulless optimizer (e.g. intelligent non-human primates) significantly different from what one would expect if they only maximized inclusive genetic fitness? If so, that would definitely answer my question.

Comment by liam-donovan on Evidence other than evolution for optimization daemons? · 2019-04-21T21:13:52.014Z · score: 1 (1 votes) · LW · GW

Thank you for the prompt response to a poorly-worded question!

I'm not particularly interested in answers that take God/free will into account; I was just hoping to find evidence/justifications for the existence of optimization daemons other than evolution. It sounds like my question would be clearer and more relevant if I removed the mention of religion?

Evidence other than evolution for optimization daemons?

2019-04-21T20:50:18.986Z · score: 4 (1 votes)
Comment by liam-donovan on Clarifying Consequentialists in the Solomonoff Prior · 2019-04-03T07:32:01.031Z · score: 1 (1 votes) · LW · GW

Is the possibility of consequentialists in the universal prior still extant for an AIXItl algorithm running with arbitrarily large (but finite) computing power?