Comment by interstice on Two Small Experiments on GPT-2 · 2019-03-04T23:33:55.601Z · score: 4 (2 votes) · LW · GW

If you literally ran (a powered-up version of) GPT-2 on "A brilliant solution to the AI alignment problem is..." you would get the sort of thing an average internet user would think of as a brilliant solution to the AI alignment problem. Trying to do this more usefully basically leads to Paul's agenda (which is about trying to do imitation learning of an implicit organization of humans)

Predictors as Agents

2019-01-08T20:50:49.599Z · score: 11 (7 votes)
Comment by interstice on Predictors as Agents · 2019-01-03T21:22:03.357Z · score: 3 (2 votes) · LW · GW

Reflective Oracles are a bit of a weird case because their 'loss' is more like a 0/1 loss than a log loss, so all of the minima are exactly the same (if we take a sample of 100000 universes to score them, the difference is merely incredibly small instead of 0). I was being a bit glib referencing them in the article; I had in mind something more like a model parameterizing a distribution over outputs, whose only influence on the world is via a random sample from this distribution. I think that such models should in general have fixed points for similar reasons, but am not sure. Regardless, I believe these models will favour fixed points whose distributions are easy to compute (but not fixed points with low entropy; that is, they will punish logical uncertainty but not intrinsic uncertainty). I'm planning to run some experiments with VAEs and post the results later.
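
To make the intuition concrete, here is a minimal toy sketch of my own (not from the post, and with an arbitrary made-up world-reaction function f): a predictor publishes a probability q, the world reacts so that the event's true probability is f(q), and gradient descent on log loss drives q to a fixed point of f.

```python
import numpy as np

def f(q):
    # Toy "world reaction": the event's true probability depends on the
    # published prediction q. Its unique fixed point is q = 0.5.
    return 0.2 + 0.6 * q

theta = -2.0  # logit of the model's predicted probability
lr = 0.1
for _ in range(5000):
    q = 1 / (1 + np.exp(-theta))
    p_true = f(q)
    # Expected gradient of the log loss when outcomes are sampled from
    # Bernoulli(f(q)): the world's reaction enters only through the samples.
    theta -= lr * (q - p_true)

print(1 / (1 + np.exp(-theta)))  # converges to ~0.5, the fixed point of f
```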

Comment by interstice on Generalising CNNs · 2019-01-03T03:49:10.172Z · score: 3 (2 votes) · LW · GW

You might be interested in Transformer Networks, which use a learned pattern of attention to route data between layers. They're pretty popular and have been used in some impressive applications like this very convincing image-synthesis GAN.
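
For reference, a minimal sketch of the scaled dot-product attention at the core of transformers (the names and shapes here are my own, not from any particular implementation):

```python
import numpy as np

def attention(Q, K, V):
    # Each output row is a data-dependent mixture of the value rows,
    # so information is "routed" by learned query/key similarities.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

out = attention(np.random.randn(4, 8), np.random.randn(6, 8), np.random.randn(6, 16))
print(out.shape)  # (4, 16): one routed mixture of values per query
```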

re: whether this is a good research direction. The fact that neural networks are highly compressible is very interesting and I too suspect that exploiting this fact could lead to more powerful models. However, if your goal is to increase the chance that AI has a positive impact, then it seems like the relevant thing is how quickly our understanding of how to align AI systems progresses, relative to our understanding of how to build powerful AI systems. As described, this idea sounds like it would be more useful for the latter.

Comment by interstice on Predictors as Agents · 2019-01-01T21:19:50.390Z · score: 1 (1 votes) · LW · GW

"Is there a reason you think a reflective oracle (or equivalent) can't just be selected 'arbitrarily', and will likely be selected to maximize some score?"

The gradient descent is not being done over the reflective oracles, it's being done over some general computational model like a neural net. Any high-performing solution will necessarily look like a fixed-point-finding computation of some kind, due to the self-referential nature of the predictions. Then, since this fixed-point-finder is *internal* to the model, it will be optimized for log loss just like everything else in the model.

That is, the global optimization of the model is distinct from whatever internal optimization the fixed-point-finder uses to choose the reflective oracle. The global optimization will favor internal optimizers that produce fixed-points with good score. So while fixed-point-finders in general won't optimize for anything in particular, the one this model uses will.

Comment by interstice on Announcement: AI alignment prize round 3 winners and next round · 2018-12-31T22:53:30.147Z · score: 3 (2 votes) · LW · GW

I submit Predictors as Agents.

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-30T15:18:25.622Z · score: 1 (1 votes) · LW · GW
If we assume Sleeping Beauty has lots of information, we might expect that the shortest matching program will look like a simulation of physical law plus a "bridging law" that, given this simulation, tells you what symbols get written to the tape

I agree. I still think that the probabilities would be closer to 1/2, 1/4, 1/4. The bridging law could look like this: search over the universe for compact encodings of my memories so far, then see what is written next onto this encoding. In this case, it would take no more bits to specify waking up on Tuesday, because the memories are identical, in the same format, and just slightly later temporally.

In a naturalized setting, it seems like the tricky part would be getting the AIXI on Monday to care what happens after it goes to sleep. It 'knows' that it's going to lose consciousness (it can see that its current memory encoding is going to be overwritten), so its next prediction is undetermined by its world-model. There is one program that gives it the reward of its successor and then terminates, as I described above, but it's not clear why the AIXI would favour that hypothesis. Maybe if it has been in situations involving memory-wiping before, or has observed other RO-AIXI's in such situations.

Comment by interstice on Deep learning - deeper flaws? · 2018-09-29T19:19:46.254Z · score: 1 (1 votes) · LW · GW

"I can't make bets on my beliefs about the Eschaton, because they are about the Eschaton." -- Well, it makes sense. Besides, I did offer you a bet taking into account a) that the money may be worth less in my branch b) I don't think DL + RL AGI is more likely than not, just plausible. If you're more than 96% certain there will be no such AI, 20:1 odds are a good deal.

But anyways, I would be fine with betting on a nearer-term challenge. How about -- in 5 years, a bipedal robot that can run on rough terrain, as in this video, using a policy learned from scratch by DL + RL (possibly including a simulated environment during training), at 1:1 odds.

Comment by interstice on Deep learning - deeper flaws? · 2018-09-28T17:07:21.607Z · score: 1 (1 votes) · LW · GW

Hmmm...but if I win the bet then the world may be destroyed, or our environment could change so much the money will become worthless. Would you take 20:1 odds that there won't be DL+RL-based HLAI in 25 years?

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-28T16:59:33.354Z · score: 1 (1 votes) · LW · GW

I still don't see how you're getting those probabilities. Say it takes 1 bit to describe the outcome of the coin toss, and assume it's easy to find all the copies of yourself (i.e. your memories) in different worlds. Then you need:

1 bit to specify if the coin landed heads or tails

If the coin landed tails, you need 1 more bit to specify if it's Monday or Tuesday.

So AIXI would give these scenarios P(HM)=0.50, P(TM)=0.25, P(TT)=0.25.
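
As a sanity check, here is the same arithmetic spelled out, weighting each scenario by 2^-(description length) and normalizing:

```python
# 1 bit for the coin; on tails, 1 more bit for the day.
weights = {"Heads/Monday": 2**-1, "Tails/Monday": 2**-2, "Tails/Tuesday": 2**-2}
total = sum(weights.values())
print({k: w / total for k, w in weights.items()})
# {'Heads/Monday': 0.5, 'Tails/Monday': 0.25, 'Tails/Tuesday': 0.25}
```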

Comment by interstice on Deep learning - deeper flaws? · 2018-09-27T22:08:06.447Z · score: 1 (1 votes) · LW · GW

Have something in mind?

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-27T20:34:40.935Z · score: 1 (1 votes) · LW · GW

Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXI's who have their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest K-complexity AIXI who ends up with these memories. As the AIXI learns more it can gradually piece together which of the potential past AIXI's it actually was and the K-complexity will go back up again.

EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.

Comment by interstice on Deep learning - deeper flaws? · 2018-09-27T19:49:47.219Z · score: 2 (2 votes) · LW · GW

Well, the DotA bot pretty much just used PPO, AlphaZero used MCTS + RL, and OpenAI recently got a robot hand to do object manipulation with PPO and a simulator (the simulator was hand-built, but in principle it could be produced by unsupervised learning like in this). Clearly it's possible to get sophisticated behaviors out of pretty simple RL algorithms. It could be the case that these approaches will "run out of steam" before getting to HLAI, but it's hard to tell at the moment, because our algorithms aren't running with the same amount of compute + data as humans (for humans, I am thinking of our entire lifetime experiences as data, which is used to build a cross-domain optimizer).

re: Uber, I agree that at least in the short term most applications in the real world will feature a fair amount of engineering by hand. But the need for this could decrease as more compute becomes available, as has been the case in supervised learning.

Comment by interstice on Boltzmann Brains and Within-model vs. Between-models Probability · 2018-09-27T16:29:38.356Z · score: 1 (1 votes) · LW · GW

How do the initial simple conditions relate to the branching? Our universe seems to have had simple initial conditions, but there's still been a lot of random branching, right? That is, the universe from our perspective is just one branch of a quantum state evolving simply from simple conditions, so you need O(#branching events) bits to describe it. Incidentally, this undermines Eliezer's argument for MWI based on Solomonoff induction, though MWI is probably still true.

[EDITED: Oh, from one of your other comments I see that you aren't saying the shortest program involves beginning at the start of the universe. That makes sense]

Comment by interstice on Deep learning - deeper flaws? · 2018-09-27T16:11:31.441Z · score: 1 (1 votes) · LW · GW

I agree that you do need some sort of causal structure around the function-fitting deep net. The question is how complex this structure needs to be before we can get to HLAI. It seems plausible to me(at least a 10% chance, say) that it could be quite simple, maybe just consisting of modestly more sophisticated versions of the RL algorithms we have so far, combined with really big deep networks.

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-27T15:51:05.644Z · score: 1 (1 votes) · LW · GW

Incidentally, you can use the same idea to have RO-AIXI do anthropic reasoning/bargaining about observers that are in a broader reference class than 'exact same sense data', by making the mapping O -> O' some sort of coarse-graining.

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-27T15:44:14.392Z · score: 1 (1 votes) · LW · GW

" P(HM)=0.49, P(TM)=0.49, P(TT)=0.2 " -- Are these supposed to be mutually exclusive probabilities?

" There is a turing machine that writes the memory-wiped contents to tape all in one pass. " - Yes, this is basically what I said. ('environment' above could include 'the world' + bridging laws). But you also need to alter the reward structure a bit to make it match our usual intuition of what 'memory-wiping' means, and this has significance for decision theory.

Consider, if your own memory was erased, you would probably still be concerned about what was going to happen to you later. But a regular AIXI won't care about what happens to its memory-wiped clone (i.e. another AIXI inducting on the 'memory-wiped' stream), because they don't share an input channel. So to fix this you give the original AIXI all of the rewards that its clone ends up getting.

Comment by interstice on Deep learning - deeper flaws? · 2018-09-26T22:36:05.115Z · score: 3 (1 votes) · LW · GW

Okay, but (e.g.) deep RL methods can solve problems that apparently require quite complex causal thinking such as playing DotA. I think what is happening here is that while there is no explicit causal modelling happening at the lowest level of the algorithm, the learned model ends up building something that serves the functions of one because that is the simplest way to solve a general class of problems. See the above meta-RL paper for good examples of this. There seems to be no obvious obstruction to scaling this sort of thing up to human-level causal modelling. Can you point to a particular task needing causal inference that you think these methods cannot solve?

Comment by interstice on Boltzmann Brains and Within-model vs. Between-models Probability · 2018-09-26T15:26:28.038Z · score: 1 (1 votes) · LW · GW

The penalty for specifying where you are in space and time is dwarfed by the penalty for specifying which Everett branch you're in.

Comment by interstice on Deep learning - deeper flaws? · 2018-09-25T19:26:08.580Z · score: 5 (4 votes) · LW · GW

In the past, people have said that neural networks could not possibly scale up to solve problems of a certain type, due to inherent limitations of the method. Neural net solutions have then been found using minor tweaks to the algorithms and (most importantly) scaling up data and compute. Ilya Sutskever gives many examples of this in his talk here. Some people consider this scaling-up to be "cheating" and evidence against neural nets really working, but it's worth noting that the human brain uses compute on the scale of today's supercomputers or greater, so perhaps we should not be surprised if a working AI design requires a similar amount of power.

On a cursory reading, it seems like most of the problems given in the papers could plausibly be solved by meta-reinforcement learning on a general-enough set of environments, of course with massively scaled-up compute and data. It may be that we will need a few more non-trivial insights to get human-level AI, but it's also plausible that scaling up neural nets even further will just work.

Comment by interstice on Reflective AIXI and Anthropics · 2018-09-25T17:57:02.891Z · score: 4 (2 votes) · LW · GW

I think the framework of RO-AIXI can be modified pretty simply to include memory-tampering.

Here's how you do it. Say you have an environment E and an RO-AIXI A running in it. You have run the AIXI for a number of steps, and it has a history of observations O. Now we want to alter its memory to have a history of observations O'. This can be implemented in the environment as follows:

1. Create a new AIXI A', with the same reward function as the original and no memories. Feed it the sequence of observations O'.

2. Run A' in place of A for the remainder of E. In the course of this execution, A' will accumulate total reward R. Terminate A'.

3. Give the original AIXI reward R, then terminate it.

This basically captures what it means for AIXI's memory to be erased. Two AIXI's are only differentiated from each other by their observations and reward function, so creating a new AIXI which shares a reward function with the original is equivalent to changing the first AIXI's observations. The new AIXI, A', will also be able to reason about the possibility that it was produced by such a 'memory-tampering program', as this is just another possible RO-Turing machine. In other words it will be able to reason about the possibility that its memory has been altered.
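
A rough sketch of steps 1-3 in code, purely as my own illustration of the construction (the Agent class and env_run callable here are hypothetical stand-ins, not part of the RO-AIXI formalism):

```python
class Agent:
    """Minimal stand-in for an RO-AIXI-like agent (hypothetical interface)."""
    def __init__(self, reward_fn):
        self.reward_fn = reward_fn
        self.history = []
        self.total_reward = 0.0

    def observe(self, obs):
        self.history.append(obs)

    def receive_reward(self, r):
        self.total_reward += r


def run_with_memory_tampering(env_run, agent_A, new_history):
    """Replace agent_A's memories with new_history, as a transformation of the environment.

    env_run(agent) stands in for "run this agent in the environment for the
    remainder of the episode and return the total reward it accumulates".
    """
    # 1. Create a fresh agent A' with the same reward function and no memories,
    #    then feed it the altered observation history O'.
    agent_A2 = Agent(reward_fn=agent_A.reward_fn)
    for obs in new_history:
        agent_A2.observe(obs)

    # 2. Run A' in place of A for the remainder of the environment.
    reward_R = env_run(agent_A2)

    # 3. Give the original agent the reward its successor earned, then terminate both.
    agent_A.receive_reward(reward_R)
```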

[EDITED: My original comment falsely stated that AIXI-RO avoids dutch-booking, but I no longer think it does. I've edited my reasoning below]

As applied to the Sleeping Beauty problem from the paper, I think this WILL be dutch-booked. If we assume it takes one bit to specify heads/tails, and one to specify which day one wakes on, then the agent will have probabilities

1/2 Heads

1/4 Tails, wake on Monday

1/4 Tails, wake on Tuesday

Since memory-erasure has the effect of creating a new AIXI with no memories, the betting scenario (in section 3.2) of the paper has the structure of either a single AIXI choosing to take a bet, or two copies of the same AIXI playing a two-person game. RO-AIXI plays Nash equilibria in such scenarios. Say the AIXI has taken bet 9. From the perspective of the current AIXI, let p be the probability that it takes bet 10, and let q be the probability that its clone takes bet 10.

E[u] = 1/2 * ((-15 + 2*eps) + p*(10 + eps)) + 1/2 * ((15 + eps) + p*q*(-20 + 2*eps) + p*(1 - q)*(-10 + eps) + q*(1 - p)*(-10 + eps))

= (3/2)*eps + 1/2 * (2*p*eps + q*(-10 + eps))

This has the structure of a prisoner's dilemma. In particular, the expected utility of the current AIXI is maximized at p = 1. So both AIXI's will take the bet and incur a sure loss. On the other hand, for this reason the original AIXI A would not take bet 9 on Sunday, if given the choice.
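
For anyone who wants to check the algebra, here is a quick symbolic sanity check (using eps for the epsilon in the bets):

```python
import sympy as sp

p, q, eps = sp.symbols('p q eps')
Eu = sp.Rational(1, 2) * ((-15 + 2*eps) + p*(10 + eps)) \
   + sp.Rational(1, 2) * ((15 + eps) + p*q*(-20 + 2*eps)
                          + p*(1 - q)*(-10 + eps) + q*(1 - p)*(-10 + eps))

simplified = sp.Rational(3, 2)*eps + sp.Rational(1, 2)*(2*p*eps + q*(-10 + eps))
print(sp.simplify(Eu - simplified))  # 0: the two expressions agree
print(sp.simplify(sp.diff(Eu, p)))   # eps: utility is increasing in p, so p = 1 is optimal
```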

Comment by interstice on Exorcizing the Speed Prior? · 2018-07-24T23:37:54.165Z · score: 1 (1 votes) · LW · GW

You could think of the 'advice' given by evolution being in the form of a short program, e.g. for a neural-net-like learning algorithm. In this case, a relatively short string of advice could result in a lot of apparent optimization.

(For the book example: imagine a species that outputs books of 20Gb containing only the letter 'a'. This is very unlikely to be produced by random choice, yet it can be specified with only a few bits of 'advice')

Comment by interstice on Physics has laws, the Universe might not · 2018-06-16T19:02:36.404Z · score: 3 (1 votes) · LW · GW

I largely agree with your conception. That's sort of why I put scare quotes around exist -- I was talking about universes for which there is NO finite computational description, which (I think) is what the OP was talking about. I think it would basically be impossible for us to reason about such universes, so to say that they 'exist' is kind of strange.

Comment by interstice on Physics has laws, the Universe might not · 2018-06-14T18:31:42.618Z · score: 5 (2 votes) · LW · GW

The idea of a universe "without preset laws" seems strange to me. Say for example that you take your universe to be a uniform distribution over strings of length n. This "universe" might be highly chaotic, but it still has an orderly short description -- namely, as the uniform distribution. More generally, for us to even SPEAK about "a toy universe" coherently, we need to give some sort of description of that universe, which basically functions as the laws of that universe(probabilistic laws are still laws). So even if such universes "exist"(whatever that means), we couldn't speak or reason about them in any way, let alone run computer simulations of them.

Comment by interstice on Beyond Astronomical Waste · 2018-06-08T20:02:50.167Z · score: 8 (4 votes) · LW · GW

The weight could be something like the algorithmic probability over strings (https://en.wikipedia.org/wiki/Algorithmic_probability), in which case universes like ours with a concise description would get a fairly large chunk of the weight.

Comment by interstice on The simple picture on AI safety · 2018-05-28T16:50:54.670Z · score: 21 (7 votes) · LW · GW

Couldn't you say the same thing about basically any problem? "Problem X is really quite simple. It can be distilled down to these steps: 1. Solve problem X. There, wasn't that simple?"

Comment by interstice on Open question: are minimal circuits daemon-free? · 2018-05-08T23:31:32.676Z · score: 3 (1 votes) · LW · GW

By "predict sufficiently well" do you mean "predict such that we can't distinguish their output"?

Unless the noise is of a special form, can't we distinguish $f$ and $\tilde{f}$ by how well they do on $f$'s goals? It seems like for this not to be the case, the noise would have to be of the form "occasionally do something weak which looks strong to weaker agents". But then we could get this distribution by using a weak (or intermediate) agent directly, which would probably need less compute.

Comment by interstice on Open question: are minimal circuits daemon-free? · 2018-05-07T18:24:49.696Z · score: 13 (3 votes) · LW · GW

Don't know if this counts as a 'daemon', but here's one scenario where a minimal circuit could plausibly exhibit optimization we don't want.

Say we are trying to build a model of some complex environment containing agents, e.g. a bunch of humans in a room. The fastest circuit that predicts this environment will almost certainly devote more computational resources to certain parts of the environment, in particular the agents, and will try to skimp as much as possible on less relevant parts such as chairs, desks etc. This could lead to 'glitches in the matrix' where there are small discrepancies from what the agents expect.

Finding itself in such a scenario, a smart agent could reason: "I just saw something that gives me reason to believe that I'm in a small-circuit simulation. If it looks like the simulation is going to be used for an important decision, I'll act to advance my interests in the real world; otherwise, I'll act as though I didn't notice anything".

In this way, the overall simulation behavior could be very accurate on most inputs, only deviating in the cases where it is likely to be used for an important decision. In effect, the circuit is 'colluding' with the agents inside it to minimize its computational costs. Indeed, you could imagine extreme scenarios where the smallest circuit instantiates the agents in a blank environment with the message "you are inside a simulation; please provide outputs as you would in environment [X]". If the agents are good at pretending, this could be quite an accurate predictor.

Comment by interstice on On exact mathematical formulae · 2018-04-24T21:26:11.147Z · score: 9 (3 votes) · LW · GW

re: differential equation solutions, you can compute whether they are within epsilon of each other for any epsilon, which I feel is "morally the same" as knowing whether they are equal.

It's true that the concepts are not identical. I feel computability is like the "limit" of the "explicit" concept, as a community of mathematicians comes to accept more and more ways of formally specifying a number. The correspondence is still not perfect, as different families of explicit formulae will have structure (e.g. algebraic structure) that general Turing machines will not.

Comment by interstice on On exact mathematical formulae · 2018-04-23T00:36:44.785Z · score: 8 (5 votes) · LW · GW

While the concept of explicit solution can be interpreted messily, as in the quote above, there is a version of this idea that more closely cuts reality at the joints, computability. A real number is computable iff there is a Turing machine that outputs the number to any desired accuracy. This covers fractions, roots, implicit solutions, integrals, and, if you believe the Church-Turing thesis, anything else we will be able to come up with. https://en.wikipedia.org/wiki/Computable_number
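
As a concrete illustration of the definition (a toy example of my own, not from the thread): a program that outputs sqrt(2) to any desired accuracy, which is all that computability asks for.

```python
from fractions import Fraction

def sqrt2(eps: Fraction) -> Fraction:
    """Return a rational within eps of sqrt(2), by interval bisection."""
    lo, hi = Fraction(1), Fraction(2)  # invariant: lo^2 <= 2 <= hi^2
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo

print(float(sqrt2(Fraction(1, 10**6))))  # ~1.414213..., within 1e-6 of sqrt(2)
```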

Comment by interstice on Announcing the AI Alignment Prize · 2018-01-03T22:32:25.086Z · score: 3 (1 votes) · LW · GW

usernameneeded@gmail.com

Hope it's not too late, but I also meant for this post (linked in original) to be part of my entry:

https://www.lesserwrong.com/posts/ra4yAMf8NJSzR9syB/a-candidate-complexity-measure

Comment by interstice on Announcing the AI Alignment Prize · 2017-12-31T20:18:32.437Z · score: 3 (2 votes) · LW · GW

Submission:

https://www.lesserwrong.com/posts/ytph8t6AcxPcmJtDh/formal-models-of-complexity-and-evolution

Formal Models of Complexity and Evolution

2017-12-31T20:17:46.513Z · score: 12 (4 votes)

A Candidate Complexity Measure

2017-12-31T20:15:39.629Z · score: 29 (9 votes)
Comment by interstice on Please Help: How to make a big improvement in the alignment of political parties’ incentives with the public interest? · 2017-01-18T00:53:44.250Z · score: 2 (2 votes) · LW · GW

Dominic Cummings asks for help in aligning the incentives of political parties. Thought this might be of interest, as aligning incentives is a common topic of discussion here, and Dominic is someone with political power (he ran the Leave campaign for Brexit), so giving him suggestions might be a good opportunity to see some of the ideas here actually implemented.

Please Help: How to make a big improvement in the alignment of political parties’ incentives with the public interest?

2017-01-18T00:51:56.355Z · score: 2 (3 votes)
Comment by interstice on Deliberate Grad School · 2015-10-05T17:56:44.397Z · score: 1 (1 votes) · LW · GW

I think the idea is that you're supposed to deduce the last name and domain name from identifying details in the post.

Comment by interstice on Beyond Statistics 101 · 2015-06-26T16:41:12.366Z · score: 6 (6 votes) · LW · GW

What resources would you recommend for learning advanced statistics?

Comment by interstice on Help needed: nice AIs and presidential deaths · 2015-06-08T21:42:02.145Z · score: 1 (1 votes) · LW · GW

How about you ask the AI "if you were to ask a counterfactual version of you who lives in a world where the president died, what would it advise you to do?". This counterfactual AI is motivated to take nice actions, so it would advise the real AI to take nice actions as well, right?

Comment by interstice on [Link]: The Unreasonable Effectiveness of Recurrent Neural Networks · 2015-06-04T22:03:13.611Z · score: 11 (13 votes) · LW · GW

An interesting post, but I don't know if it implies that "strong AI may be near". Indeed, the author has written another post in which he says that we are "really, really far away" from human-level intelligence: https://karpathy.github.io/2012/10/22/state-of-computer-vision/.

Comment by interstice on Best Explainers on Different Subjects · 2015-03-21T01:37:28.026Z · score: 4 (4 votes) · LW · GW

Another one on computing: The Elements of Computing Systems. This book explains how computers work by teaching you to build a computer from scratch, starting with logic gates. By the end you have a working (emulated) computer, every component of which you built. It's great if you already know how to program and want to learn how computers work at a lower level.

Comment by interstice on Introducing Corrigibility (an FAI research subfield) · 2014-10-24T03:13:21.622Z · score: 1 (1 votes) · LW · GW

How does this differ from indifference?