Redefining Fast Takeoff 2019-08-23T02:15:16.369Z · score: 10 (8 votes)
Deconfuse Yourself about Agency 2019-08-23T00:21:24.548Z · score: 15 (10 votes)
AI Safety Debate and Its Applications 2019-07-23T22:31:58.318Z · score: 39 (18 votes)


Comment by vojtakovarik on Deconfuse Yourself about Agency · 2019-10-09T21:26:21.155Z · score: 1 (1 votes) · LW · GW

First off, while I feel somewhat de-confused about X-like behavior, I don't feel very confident about X-like architectures. Maybe the meaning is somewhat clear on higher levels of abstraction (e.g., if my brain goes "realize I want to describe a concept --> visualize several explanations and judge each for suitability --> pick the one that seems the best --> send a signal to start typing it down", then this would be a kind of search/optimization-thingy). But on the level of physics, I don't really know what an architecture means. So take this with a grain of salt.

Maybe the term "physical structure" is misleading. The thing I was trying to point at is the distinction between being able to accurately model Y using model X, and Y actually being X. In the sense that there might be a giant look-up table (GLUT) that accurately predicts your behavior, but on no level of abstraction is it correct to say that you actually are a GLUT. Whereas modelling you as having some goals, planning, etc. might be less accurate but somewhat more, hm, true. I realize this isn't very precise, but I guess you can see what I mean.

That being said, I suppose that what I meant by "optimization architecture" is, for example, a stochastic gradient descent with the emphasis on "this is the input", "this is the part of the algorithm that does the calculation", and "this is the output". An "implementation of an optimization architecture" would be...well, the atoms of your computer that perform SGD, or maybe some simple bacterium that moves in the direction where the concentration of whatever-it-likes is the highest (not that anything I know would implement precisely SGD, but still).
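To make the "input / calculation / output" framing concrete, here is a minimal sketch of stochastic gradient descent on a toy one-dimensional objective. Everything in it is an assumption made for illustration: the objective f(x) = (x - 3)^2, the Gaussian noise, and the names `noisy_gradient` and `sgd` are not taken from any particular system.

```python
import random


def noisy_gradient(x, rng):
    # Gradient of the assumed toy objective f(x) = (x - 3)^2,
    # plus zero-mean Gaussian noise standing in for minibatch sampling.
    return 2.0 * (x - 3.0) + rng.gauss(0.0, 0.1)


def sgd(x0, steps=500, lr=0.05, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    x = x0  # "this is the input"
    for _ in range(steps):
        # "this is the part of the algorithm that does the calculation"
        x -= lr * noisy_gradient(x, rng)
    return x  # "this is the output"


result = sgd(x0=-10.0)
print(result)  # ends up close to the minimizer x = 3
```

The point of the sketch is only that the three roles can be read directly off the code. Whether a physical system "implements" this architecture, rather than merely being well predicted by it, is exactly the question the comment leaves open.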

Ad "interesting physical structure" behind the ant-colony: If by "evolution" we mean the atoms that the world is made of, as they changed over time until your ant colony emerged...then yeah, this is a physical structure causally upstream of the ant colony, and one that is responsible for the ant colony behaving the way it does. I wouldn't say it is interesting (to me, and w.r.t. the ant colony) though, since it is totally incomprehensible to me. (But maybe "interestingness" doesn't really make sense on the level of physics, and is only relevant in relation to our abstract world-models and their understanding.)

Finally, the ideal thing an "X-like behavior ==> Y-like architecture" theorem would cash out into is a criterion that you can actually check and say with certainty that the thing will not exhibit X-like behavior. (Whether this is reasonable to hope for is another matter.) So, even if all that I have written in this comment turns out to be nonsense, getting such a criterion is what we are after :-).

Comment by vojtakovarik on Deconfuse Yourself about Agency · 2019-09-04T10:42:32.450Z · score: 1 (1 votes) · LW · GW

I agree with your summary :). The claim was that humans often predict behavior by assuming that something has a particular architecture.

(And some confusions about agency seem to appear precisely because of not making the architecture/behavior distinction.)

Comment by vojtakovarik on Problems with AI debate · 2019-08-30T23:57:20.273Z · score: 6 (3 votes) · LW · GW

Intuitively, I agree that the vacation question is under-defined / has too many "right" answers. On the other hand, I can also imagine the world where you can develop some objective fun theory, or just something which actually makes the questions well-posed. And the AIs could use this fact in the debate:

Bob: "Actually, you can derive a well-defined fun theory and use it to answer this question. And then Bali clearly wins."

Alice: "There could never be any such thing!"

Bob: "Actually, there indeed is such a theory, and its central idea is [...]."

[They go on like this for a bit, and eventually, Bob wins.]

Indeed, this seems like a thing you could do (by explaining that integration is a thing) if somebody tried to convince you that there is no principled way to measure the area of a circle.

However -- if true -- this only shows that there are fewer under-defined questions than we think. The "Ministry of Ambiguity versus the Department of Clarity" fight is still very much a thing, as are the incentives to manipulate the human. And perhaps most importantly, routinely holding debates where the AI "explains to you how to think about something" seems extremely dangerous...

Comment by vojtakovarik on Deconfuse Yourself about Agency · 2019-08-30T08:29:11.464Z · score: 3 (2 votes) · LW · GW

I have a sense that (formalized) versions of A(Θ)-morphism are going to be more useful (or easier?) for the behavioral side, though it isn't really clear.

I think A(Θ)-morphisation is primarily useful for describing what we often mean when we say "agency". In particular, I view this as distinct from which concepts we should be thinking about in this space. (I think the promising candidates include learning that Vanessa points to in her comment, optimization, search, and the concepts in the second part of my post.)

However, I think it might also serve as a useful part of the language for describing (non) agent-like behavior. For example, we might want to SGD-morphise an E. coli bacterium independently of whether it actually implements some form of stochastic gradient descent w.r.t. the concentration of some chemicals in the environment.

You mention the distinction between agent-like architecture and agent-like behavior (which I find similar to my distinction between selection and control), but how does the concept of A(Θ)-morphism account for this distinction?

I think of agent-like architectures as something objective, or related to the territory. In contrast, agent-like behavior is something subjective, something in the map. Importantly, agent-like behavior, or the lack of it, of some is something that exists in the map of some entity (where often ).

The selection/control distinction seems related, but not quite similar to me. Am I missing something there?

Comment by vojtakovarik on Deconfuse Yourself about Agency · 2019-08-29T18:45:23.993Z · score: 1 (1 votes) · LW · GW

I am not even sure what the input/output channels of a rock are supposed to be

I guess you imagine that the input is the physical forces affecting the ball and the output is the forces the ball exerts on the environment. Obviously, this is very much not useful for anything. But it suddenly becomes non-trivial if you consider something like the billiard-ball computer (seems like a theoretical construct, not sure if anybody actually built one...but it seems like a relevant example anyway).

Comment by vojtakovarik on Deconfuse Yourself about Agency · 2019-08-29T18:37:20.453Z · score: 3 (2 votes) · LW · GW

Yep, that totally makes sense.

Observations inspired by your comment: While this shouldn't necessarily be so, it seems the particular formulations make a lot of difference when it comes to exchanging ideas. If I read your comment without the

(although maybe "intelligence" would be a better word?)

bracket, I immediately go "aaa, this is so wrong!". And if I substitute "intelligent" for "agent", I totally agree with it. Not sure whether this is just me, or whether it generalizes to other people.

More specifically, I agree that from the different concepts in the vicinity of "agency", "the ability to learn the environment and exploit this knowledge towards a certain goal" seems to be particularly important to AI alignment. I think the word "agency" is perhaps not well suited for this particular concept, since it comes with so many other connotations. But "intelligence" seems quite right.

Comment by vojtakovarik on Towards an Intentional Research Agenda · 2019-08-23T18:05:45.761Z · score: 1 (1 votes) · LW · GW

(I don't have much experience thinking in these terms, so maybe the question is dumb/already answered in the post. But anyway: )

Do you have some more-detailed (and stupidly explicit) examples of the intentional and algorithmic views on the same thing, and how to translate between them?

Comment by vojtakovarik on Vaniver's View on Factored Cognition · 2019-08-23T17:39:04.136Z · score: 1 (1 votes) · LW · GW

That is, I can easily see how factored cognition allows you to stick to cognitive strategies that definitely solve a problem in a safe way, but don't see how it does that and allows you to develop new cognitive strategies to solve a problem that doesn’t result in an opening for inner optimizers--not within units, but within assemblages of units.

Do you have some intuition for how inner optimizers would arise within assemblages of units, without being initiated by some unit higher in the hierarchy? Or is that what you are pointing at?

Comment by vojtakovarik on AI Safety Debate and Its Applications · 2019-07-31T11:54:17.526Z · score: 4 (2 votes) · LW · GW

I agree with Lanrian. A perhaps better metric is the chance that randomly selected pixels of a randomly selected image will cause the judge to guess the label correctly. This corresponds to "judge accuracy (random pixels)" in Table 2 of the original paper, and it's 48.2%/59.4% for 4/6 pixels.
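As a rough illustration of how such a metric could be estimated, here is a minimal Monte Carlo sketch. It is not the paper's evaluation code: the function name `random_pixel_accuracy`, the flat-list image representation, and the toy judge are all assumptions made for this example, and the sketch samples pixel positions uniformly, which may differ from how the paper selects pixels.

```python
import random


def random_pixel_accuracy(images, labels, judge, k, trials, seed=0):
    """Estimate how often the judge is right when shown k random pixels
    of a randomly selected image (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        i = rng.randrange(len(images))               # random image
        image = images[i]
        positions = rng.sample(range(len(image)), k) # k distinct pixels
        revealed = {p: image[p] for p in positions}  # what the judge sees
        if judge(revealed) == labels[i]:
            correct += 1
    return correct / trials


# Toy data: two 16-pixel "images", one all dark (label 0), one all bright (label 1).
toy_images = [[0] * 16, [1] * 16]
toy_labels = [0, 1]


def toy_judge(revealed):
    # Guess the label from the brightness of any one revealed pixel.
    return next(iter(revealed.values()))


accuracy = random_pixel_accuracy(toy_images, toy_labels, toy_judge, k=4, trials=200)
```

On this toy data the judge is always right, so the estimate is trivially 1.0. With a real classifier as the judge, the same loop would give the kind of baseline number quoted above.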