Posts

Moloch games 2020-10-16T15:19:04.722Z
Subspace optima 2020-05-15T12:38:32.444Z
Risks from Learned Optimization: Conclusion and Related Work 2019-06-07T19:53:51.660Z
Deceptive Alignment 2019-06-05T20:16:28.651Z
The Inner Alignment Problem 2019-06-04T01:20:35.538Z
Conditions for Mesa-Optimization 2019-06-01T20:52:19.461Z
Risks from Learned Optimization: Introduction 2019-05-31T23:44:53.703Z
Alignment problems for economists 2018-07-10T23:43:56.662Z

Comments

Comment by Chris van Merwijk (chrisvm) on Finite Factored Sets: Conditional Orthogonality · 2021-08-24T08:08:58.986Z · LW · GW

I think a subpartition of S can be thought of as a partial function on S, or equivalently, a variable on S that can additionally take the value "Null"/"undefined".
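For concreteness, here is a minimal sketch of that equivalence in Python (the particular set and blocks are invented for illustration): the blocks of the subpartition become the fibers of a partial function, and extending that function with None turns it into a total variable with an explicit "undefined" value.

```
from typing import Optional

# A small example set; the subpartition's blocks cover only part of it.
S = {"a", "b", "c", "d", "e"}
subpartition = [{"a", "b"}, {"c"}]   # blocks; "d" and "e" are not covered

# View 1: a partial function on S, defined only on the covered elements.
partial = {x: i for i, block in enumerate(subpartition) for x in block}
# partial == {"a": 0, "b": 0, "c": 1}; "d" and "e" are simply absent.

# View 2: a total variable on S with an explicit "Null"/"undefined" value.
def variable(x) -> Optional[int]:
    return partial.get(x)            # None plays the role of "Null"

print([variable(x) for x in sorted(S)])   # [0, 0, 1, None, None]
```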

Comment by Chris van Merwijk (chrisvm) on Finite Factored Sets: Orthogonality and Time · 2021-08-23T13:42:08.993Z · LW · GW

I just want to point out some interesting properties of this definition of time: Let time_C refer to the classical notion of time in a dynamical system, and time_FFS the notion defined in this article.

1. Suppose we have a field on space-time generated by a typical differential dynamical law that satisfies time_C-reversal symmetry, and suppose we factorize its histories according to the state of the system at time_C t=0. Then time_FFS doesn't distinguish between the "positive" and "negative" parts of time_C. That is, if x is some position (choose a reference frame), then the point (x,2) in space-time (i.e. the value of the field at position x at time_C 2) is later in time_FFS than (x,1), but (x,-2) is also later in time_FFS than (x,-1). In this sense, time_FFS seems to naturally capture the time-reversal symmetry in the laws of physics: intuitively, if we start at the big bang and go "backward in time_C", we are just as much going into the future as if we go "forward in time_C". Both directions are the future. (A toy computation illustrating this, and point 2 below, is sketched after point 3.)

2. However, more weirdly, time_FFS also allows comparisons between negative-time_C and positive-time_C events. Namely, (x,1) happens before_FFS (x,-2), while (x,-1) happens before_FFS (x,2). I am not sure what to make of this, or whether we should make anything of it.

3. Suppose a computer is implemented in the physical world and implements a deterministic function f, AND we restrict to the set of histories in which this computer actually does this computation. Now let x denote the variable that captures what input is given to this computer (meaning, the data stored in the input register at one particular instance of running this algorithm), and let y similarly denote the variable that captures the output. Then y occurs (weakly) earlier_FFS than x, even though x is defined to be earlier_C than y (more precisely: directly applying the definitions of x and y to check their values in a particular history h would involve checking x at a time_C earlier_C than the check for y). I'm not sure what to make of this, though it kind of seems like a feature, not a bug. If we don't restrict to the set of histories in which the computer does the computation, I'm pretty sure this result disappears, which makes me think this is actually a desirable property of the theory.
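Here is a brute-force toy illustration of points 1 and 2. Everything in it is invented for the sketch: a two-bit reversible map stands in for the time_C-symmetric dynamics, histories are factored by the two coordinates of the t=0 state, and "history" of a variable is computed directly as the smallest set of t=0 factors that determines it.

```
from itertools import product

# Reversible toy dynamics on two bits: U(a, b) = (b, a XOR b).
def step(state):
    a, b = state
    return (b, a ^ b)

def step_back(state):                 # inverse of `step`
    a, b = state
    return (a ^ b, a)

# The base set S: one history per possible state at time_C t = 0,
# factored into two factors, one per coordinate of that state.
histories = list(product((0, 1), repeat=2))

def state_at(s0, t):
    s = s0
    for _ in range(abs(t)):
        s = step(s) if t > 0 else step_back(s)
    return s

def history_of(var):
    """Smallest set of t=0 coordinates whose values determine var(s0)
    (brute force over all subsets of the two factors)."""
    best = None
    for H in (set(), {0}, {1}, {0, 1}):
        determines = all(
            var(s) == var(u)
            for s in histories for u in histories
            if all(s[i] == u[i] for i in H)
        )
        if determines and (best is None or len(H) < len(best)):
            best = H
    return best

# X_t = "the first bit of the state at time_C t". Since time_FFS compares
# variables by inclusion of their histories, growing histories mean "later".
for t in (-1, 0, 1, 2):
    print(t, history_of(lambda s0, t=t: state_at(s0, t)[0]))
# -> t=-1: {0, 1}   t=0: {0}   t=1: {1}   t=2: {0, 1}
# The t=0 variable is (weakly) before_FFS both the t=2 and the t=-1
# variables: moving away from t=0 in either time_C direction moves into
# the time_FFS future (point 1). And the t=1 variable ({1}) is weakly
# before_FFS the t=-1 variable ({0, 1}), a positive/negative-time_C
# comparison as in point 2.
```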

Comment by Chris van Merwijk (chrisvm) on Finite Factored Sets: Orthogonality and Time · 2021-08-23T13:19:31.316Z · LW · GW

In the proof of proposition 18, "part 3" should be "part 4".

Comment by Chris van Merwijk (chrisvm) on Finite Factored Sets: Orthogonality and Time · 2021-08-23T10:28:30.509Z · LW · GW

Can't you define the history h_C(X) for any set C of partitions of S, rather than only h_F(X) w.r.t. a specific factorization F, simply by taking the chimera characterization as the definition? If so, it would seem to me to be clearer to define history that way (i.e. make 7 rather than 2 from proposition 10 the definition), and then basically proposition 10 says "if C is a subset of the factors of a factorization, then here is a set of equivalent definitions in terms of chimeras". Also, I would guess that proposition 11 is still true for arbitrary C rather than just for factorizations, though I haven't checked that 11.6 would still work; it seems like it should.

Comment by Chris van Merwijk (chrisvm) on How to Throw Away Information · 2021-06-18T14:27:30.506Z · LW · GW

I might be misunderstanding something or have made a mistake, and I'm not going to try to figure it out since the post is old and maybe no longer active, but isn't the following a counterexample to the claim that the method of constructing S described above does what it's supposed to do?

Let X and Y be independent coin flips. Then S will be computed as follows:

X=0, Y=0 maps to uniform distribution on {{0:0, 1:0}, {0:0, 1:1}}
X=0, Y=1 maps to uniform distribution on {{0:0, 1:0}, {0:1, 1:0}}
X=1, Y=0 maps to uniform distribution on {{0:1, 1:0}, {0:1, 1:1}}
X=1, Y=1 maps to uniform distribution on {{0:0, 1:1}, {0:1, 1:1}}

But since X is independent of Y, we want S to contain full information about X (plus some noise, perhaps), so the support of S given X=1 must not overlap with the support of S given X=0. But it does: for example, {0:0, 1:1} has positive probability both for X=1, Y=1 and for X=0, Y=0. I.e., conditional on S={0:0, 1:1}, X is still a fair coin flip.
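To make the overlap concrete, here is a quick brute-force check of the distributions described above (the tuple encoding of S's dictionary values is just for illustration, so they can be used as dict keys):

```
from collections import defaultdict
from fractions import Fraction

# The table above, with S's values (dicts like {0:0, 1:1}) encoded as
# hashable tuples of (input, output) pairs.
s_dist = {
    (0, 0): [((0, 0), (1, 0)), ((0, 0), (1, 1))],
    (0, 1): [((0, 0), (1, 0)), ((0, 1), (1, 0))],
    (1, 0): [((0, 1), (1, 0)), ((0, 1), (1, 1))],
    (1, 1): [((0, 0), (1, 1)), ((0, 1), (1, 1))],
}

joint = defaultdict(Fraction)             # P(X = x, S = s)
for (x, y), support in s_dist.items():
    for s in support:
        # X and Y are independent fair coins, so each (x, y) has
        # probability 1/4; S is then uniform over the two functions.
        joint[(x, s)] += Fraction(1, 8)

# Condition on S = {0:0, 1:1} and look at the distribution of X.
s_star = ((0, 0), (1, 1))
posterior = {x: joint[(x, s_star)] for x in (0, 1)}
total = sum(posterior.values())
print({x: p / total for x, p in posterior.items()})
# -> {0: Fraction(1, 2), 1: Fraction(1, 2)}: X is still a fair coin.
```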

Comment by Chris van Merwijk (chrisvm) on Violating the EMH - Prediction Markets · 2021-04-08T07:35:39.452Z · LW · GW

Is there currently a way to pool money on the trades you're suggesting? In general, it seems like there are economies of scale to be gained by creating some kind of rationalist fund.

Comment by Chris van Merwijk (chrisvm) on Predictive Coding has been Unified with Backpropagation · 2021-04-05T14:54:29.152Z · LW · GW

I suspect a better title would be "Here is a proposed unification of a particular formalization of predictive coding, with backprop"

Comment by Chris van Merwijk (chrisvm) on Moloch games · 2020-10-18T09:02:45.074Z · LW · GW

Yes that's what I meant, thanks.

Comment by Chris van Merwijk (chrisvm) on Subspace optima · 2020-05-16T11:06:33.306Z · LW · GW

I made up the term on the spot, so I don't think so.

Comment by Chris van Merwijk (chrisvm) on Tabooing 'Agent' for Prosaic Alignment · 2019-08-23T06:21:36.359Z · LW · GW

I endorse this. I like the framing, and it's very much in line with how I think about the problem. One point I'd make: I'd replace the word "model" with "algorithm", to be even more agnostic. For many people, "model" already seems to carry an implicit intuitive interpretation of what the learned algorithm is doing, namely "trying to faithfully represent the problem" or something similar.

Comment by Chris van Merwijk (chrisvm) on Two agents can have the same source code and optimise different utility functions · 2018-07-11T00:44:06.954Z · LW · GW

Here are some counterarguments:

1. There can be scenarios where the agent cannot change his source code without processing observations, e.g. the agent may need to reprogram himself via some external device.

2. The agent may not be aware that there are multiple copies of him.

3. For many plausible agent designs, changing the utility function would require a significant change in the architecture. E.g. if two human sociopaths wanted to change their utility functions into a weighted average of the two, they couldn't do so without significantly changing their brain architecture. A TDT agent could do this, but I think it is not prudent to assume that all future AGIs we actually deal with will be TDT agents (in fact, it seems to me that most of them most likely won't be).

So I don't think your comment invalidates the relevance of the point made by the poster.

Comment by Chris van Merwijk (chrisvm) on Two agents can have the same source code and optimise different utility functions · 2018-07-11T00:22:04.825Z · LW · GW

You don't necessarily need "explicit self-reference". The difference in utility functions can also arise from a difference in the agents' locations in the universe. Two identical worms placed in different locations will have different utility functions, because their atoms are not in exactly the same location, despite having no explicit self-reference. Similarly, in a computer simulation, agents with the same source code will be called by the universe-program in different contexts (if they weren't, I don't see how it would even make sense to speak of "different instances of the same source code"; there would just be one instance of the source code).

So in fact, I think this is probably a property of almost all possible agents. It seems to me that you would need a very complex and specific ontological model inside the agent to prevent these effects and make the two agents have the same utility function.
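As a toy sketch of this point (the two-world setup and all names here are invented for illustration): two agents running byte-identical source code, embedded at different locations, end up ranking the same pair of worlds oppositely.

```
# The same source code, instantiated twice:
AGENT_SOURCE = """
def utility(world, my_location):
    # Indexical utility: the resources at *my own* location.
    return world[my_location]
"""

ns = {}
exec(AGENT_SOURCE, ns)        # both agents run this identical code
utility = ns["utility"]

world_a = {"left": 10, "right": 1}
world_b = {"left": 1, "right": 10}

# The only difference between the two agents is where they sit.
for location in ("left", "right"):
    prefers = "A" if utility(world_a, location) > utility(world_b, location) else "B"
    print(f"agent at {location!r} prefers world {prefers}")
# -> agent at 'left' prefers world A; agent at 'right' prefers world B.
```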