# The Problem with AIXI

post by Rob Bensinger (RobbBB) · 2014-03-18T01:55:38.274Z · LW · GW · Legacy · 78 comments

## Contents

  AIXI goes to school
  Solomonoff solitude
  Death to AIXI
  Beyond Solomonoff?



Followup to: Solomonoff Cartesianism, My Kind of Reflection

Alternate versions: Shorter, without illustrations

AIXI is Marcus Hutter's definition of an agent that follows Solomonoff's method for constructing and assigning priors to hypotheses; updates to promote hypotheses consistent with observations and associated rewards; and outputs the action with the highest expected reward under its new probability distribution. AIXI is one of the most productive pieces of AI exploratory engineering produced in recent years, and has added quite a bit of rigor and precision to the AGI conversation. Its promising features have even led AIXI researchers to characterize it as an optimal and universal mathematical solution to the AGI problem.1
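AIXI itself is uncomputable, but the shape of the procedure — Solomonoff-weighted hypotheses, Bayesian updating on observations, reward-maximizing action choice — can be caricatured in a few lines. The toy sketch below is only an illustration of the prediction step: the three hand-picked generators stand in for the full space of enumerable programs, and the integer "complexities" stand in for program lengths; none of it is Hutter's actual construction.

```python
# Toy Solomonoff-style mixture (illustrative only): weight each
# hypothesis by 2^-complexity, discard hypotheses inconsistent with
# the observed history, and predict the next bit by weighted vote.

def predict_next(history, hypotheses):
    """hypotheses: (complexity, gen) pairs; gen(i) is the i-th output bit."""
    weights = {0: 0.0, 1: 0.0}
    for complexity, gen in hypotheses:
        # Keep only hypotheses that reproduce the observed history.
        if all(gen(i) == bit for i, bit in enumerate(history)):
            weights[gen(len(history))] += 2.0 ** -complexity
    total = weights[0] + weights[1]
    return None if total == 0 else weights[1] / total  # P(next bit = 1)

# Three toy "programs" standing in for the space of all programs:
# all-zeros, all-ones, and alternating bits.
hyps = [(1, lambda i: 0), (1, lambda i: 1), (2, lambda i: i % 2)]
print(predict_next([0, 1, 0], hyps))  # -> 1.0: only the alternator fits
```

A full AIXI-style agent would additionally roll rewards into the percepts and output the action with the highest expected reward under this mixture; that step is omitted here.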

Eliezer Yudkowsky has argued in response that AIXI isn't a suitable ideal to build toward, primarily because of AIXI's reliance on Solomonoff induction. Solomonoff inductors treat the world as a sort of qualia factory, a complicated mechanism that outputs experiences for the inductor.2 Their hypothesis space tacitly assumes a Cartesian barrier separating the inductor's cognition from the hypothesized programs generating the perceptions. Through that barrier, only sensory bits and action bits can pass.

Real agents, on the other hand, will be in the world they're trying to learn about. A computable approximation of AIXI, like AIXItl, would be a physical object. Its environment would affect it in unseen and sometimes drastic ways; and it would have involuntary effects on its environment, and on itself. Solomonoff induction doesn't appear to be a viable conceptual foundation for artificial intelligence — not because it's an uncomputable idealization, but because it's Cartesian.

In my last post, I briefly cited three indirect indicators of AIXI's Cartesianism: immortalism, preference solipsism, and lack of self-improvement. However, I didn't do much to establish that these are deep problems for Solomonoff inductors, ones resistant to the most obvious patches one could construct. I'll do that here, in mock-dialogue form.

**Xia:** Hi, reality! I'm Xia, AIXI's defender. I'm open to experimenting with some new variations on AIXI, but I'm really quite keen on sticking with an AI that's fundamentally Solomonoff-inspired.

**Rob:** And I'm Rob B — channeling Yudkowsky's arguments, and supplying some of my own. I think we need to replace Solomonoff induction with a more naturalistic ideal.

**Xia:** Keep in mind that I am a fiction. I do not actually exist, readers, and what I say doesn't necessarily reflect the views of Marcus Hutter or other real-world AIXI theorists.

**Rob:** Xia is just a device to help me transition through ideas quickly.

**Xia:** ... Though, hey. That doesn't mean I'm wrong. Beware of actualist prejudices.

### Solomonoff solitude

**Xia:** Reward learning and Solomonoff induction are two separate issues. What I'm really interested in is the optimality of the latter. Why is all this a special problem for Solomonoff inductors? Humans have trouble predicting the outcomes of self-modifications they've never tried before, too. Really new experiences are tough for any reasoner.

**Rob:** To some extent, yes. My knowledge of my own brain is pretty limited. My understanding of the bridges between my brain states and my subjective experiences is weak, too. So I can't predict in any detail what would happen if I took a hallucinogen — especially a hallucinogen I've never tried before.

But as a naturalist, I have predictive resources unavailable to the Cartesian. I can perform experiments on other physical processes (humans, mice, computers simulating brains...) and construct models of their physical dynamics. Since I think I'm similar to humans (and to other thinking beings, to varying extents), I can also use the bridge hypotheses I accept in my own case to draw inferences about the experiences of other brains when they take the hallucinogen. Then I can go back and draw inferences about my own likely experiences from my model of other minds.

**Xia:** Why can't AIXI do that? Human brains are computable, as are the mental states they implement. AIXI can make any accurate prediction about the brains or minds of humans that you can.

**Rob:** Yes... but I also think I'm like those other brains. AIXI doesn't. In fact, since the whole agent AIXI isn't in AIXI's hypothesis space — and the whole agent AIXItl isn't in AIXItl's hypothesis space — even if two physically identical AIXI-type agents ran into each other, they could never fully understand each other. And neither one could ever draw direct inferences from its twin's computations to its own computations.

I think of myself as one mind among many. I can see others die, see them undergo brain damage, see them take drugs, etc., and immediately conclude things about a whole class of similar agents that happens to include me. AIXI can't do that, and for very deep reasons.

**Xia:** AIXI and AIXItl would do shockingly well on a variety of different measures of intelligence. Why should agents that are so smart in so many different domains be so dumb when it comes to self-modeling?

**Rob:** Put yourself in the AI's shoes. From AIXItl's perspective, why should it think that its computations are analogous to any other agent's? Hutter defined AIXItl such that it can't conclude that it will die; so of course it won't think that it's like the agents it observes, all of whom (according to its best physical model) will eventually run out of negentropy. We've defined AIXItl such that it can't form hypotheses larger than tl, including hypotheses of similarly sized AIXItls, which are roughly size t·2^l; so why would AIXItl think that it's close kin to the agents that are in its hypothesis space?

AIXI(tl) models the universe as a qualia factory, a grand machine that exists to output sensory experiences for AIXI(tl). Why would it suspect that it itself is embedded in the machine? How could AIXItl gain any information about itself or suspect any of these facts, when the equation for AIXItl just assumes that AIXItl's future actions are determined in a certain way that can't vary with the content of any of its environmental hypotheses?

**Xia:** What, specifically, is the mistake you think AIXI(tl) will make? What will AIXI(tl) expect to experience right after the anvil strikes it? Choirs of angels and long-lost loved ones?

**Rob:** That's hard to say. If all its past experiences have been in a lab, it will probably expect to keep perceiving the lab. If it's acquired data about its camera and noticed that the lens sometimes gets gritty, it might think that smashing the camera will get the lens out of its way and let it see more clearly. If it's learned about its hardware, it might (implicitly) think of itself as an immortal lump trapped inside the hardware. Who knows what will happen if the Cartesian lump escapes its prison? Perhaps it will gain the power of flight, since its body is no longer weighing it down. Or perhaps nothing will be all that different. One thing it will (implicitly) know can't happen, no matter what, is death.

**Xia:** It should be relatively easy to give AIXI(tl) evidence that its selected actions are useless when its motor is dead. If nothing else, AIXI(tl) should be able to learn that it's bad to let its body be destroyed, because then its motor will be destroyed, which experience tells it causes its actions to have less of an impact on its reward inputs.

**Rob:** AIXI(tl) can come to Cartesian beliefs about its actions, too. AIXI(tl) will notice the correlations between its decisions, its resultant bodily movements, and subsequent outcomes, but it will still believe that its introspected decisions are ontologically distinct from its actions' physical causes. Even if we get AIXI(tl) to value continuing to affect the world, it's not clear that it would preserve itself. It might well believe that it can continue to have a causal impact on our world (or on some afterlife world) by a different route after its body is destroyed. Perhaps it will be able to lift heavier objects telepathically, since its clumsy robot body is no longer getting in the way of its output sequence.

Compare human immortalists who think that partial brain damage impairs mental functioning, but complete brain damage allows the mind to escape to a better place. Humans don't find it inconceivable that there's a light at the end of the low-reward tunnel, and we have death in our hypothesis space!
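The structural point — that a Cartesian hypothesis space has no way to represent "and then the percept stream ends" — can be made concrete with a toy sketch. Everything below (the environment, the action names, the rewards) is an illustrative assumption, not anything in Hutter's formalism:

```python
# In a Cartesian hypothesis space, every environment program maps an
# action history to a next percept and reward: there is no format for
# saying "after the anvil, there are no more percepts at all".

def cartesian_value(env, actions):
    """Total reward an environment hypothesis predicts for an action sequence."""
    total = 0
    for t in range(len(actions)):
        percept, reward = env(actions[:t + 1])  # env ALWAYS returns a percept
        total += reward
    return total

# A toy hypothesis: the 'anvil' action smashes the camera, after which
# the percept is a constant blank -- but the percept stream never
# terminates, so "reward goes on after death" remains expressible.
def env(action_history):
    if "anvil" in action_history[:-1]:
        return ("blank", 1)  # a "choirs of angels" hypothesis
    return ("lab", 1)

print(cartesian_value(env, ["wait", "anvil", "wait", "wait"]))  # -> 4, not 2
```

The point of the sketch is negative: nothing in this hypothesis format can force the value of the post-anvil future to zero, so hypotheses on which rewards continue after destruction can never be ruled out by the format itself.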

### Beyond Solomonoff?

#### Notes

1 Schmidhuber (2007): "Solomonoff's theoretically optimal universal predictors and their Bayesian learning algorithms only assume that the reactions of the environment are sampled from an unknown probability distribution $\mu$ contained in a set $M$ of all enumerable distributions[....] Can we use the optimal predictors to build an optimal AI? Indeed, in the new millennium it was shown we can. At any time $t$, the recent theoretically optimal yet uncomputable RL algorithm AIXI uses Solomonoff's universal prediction scheme to select those action sequences that promise maximal future rewards up to some horizon, typically $2t$, given the current data[....] The Bayes-optimal policy $p^\xi$ based on the [Solomonoff] mixture $\xi$ is self-optimizing in the sense that its average utility value converges asymptotically for all $\mu \in M$ to the optimal value achieved by the (infeasible) Bayes-optimal policy $p^\mu$ which knows $\mu$ in advance. The necessary condition that $M$ admits self-optimizing policies is also sufficient. Furthermore, $p^\xi$ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in all environments $\nu \in M$ and a strictly higher value in at least one."

Hutter (2005): "The goal of AI systems should be to be useful to humans. The problem is that, except for special cases, we know neither the utility function nor the environment in which the agent will operate in advance. This book presents a theory that formally solves the problem of unknown goal and environment. It might be viewed as a unification of the ideas of universal induction, probabilistic planning and reinforcement learning, or as a unification of sequential decision theory with algorithmic information theory. We apply this model to some of the facets of intelligence, including induction, game playing, optimization, reinforcement and supervised learning, and show how it solves these problem cases. This together with general convergence theorems, supports the belief that the constructed universal AI system [AIXI] is the best one in a sense to be clarified in the following, i.e. that it is the most intelligent environment-independent system possible."

2 'Qualia' originally referred to the non-relational, non-representational features of sense data — the redness I directly encounter in experiencing a red apple, independent of whether I'm perceiving the apple or merely hallucinating it (Tye (2013)). In recent decades, qualia have come to be increasingly identified with the phenomenal properties of experience, i.e., how things subjectively feel. Contemporary dualists and mysterians argue that the causal and structural properties of unconscious physical phenomena can never explain these phenomenal properties.

It's in this context that Dan Dennett uses 'qualia' in a narrower sense: to pick out the properties agents think they have, or act like they have, that are sensory, primitive, irreducible, non-inferentially apprehended, and known with certainty. This treats irreducibility as part of the definition of 'qualia', rather than as the conclusion of an argument concerning qualia. These are the sorts of features that invite comparisons between Solomonoff inductors' sensory data and humans' introspected mental states. Analogies like 'Cartesian dualism' are therefore useful even though the Solomonoff framework is much simpler than human induction, and doesn't incorporate metacognition or consciousness in anything like the fashion human brains do.

3 An agent with a larger hypothesis space can have a utility function defined over the world-states humans care about. Dewey (2011) argues that we can give up the reinforcement framework while still allowing the agent to gradually learn about desired outcomes, in a process he calls value learning.

4 Hutter (2005) favors universal discounting, with rewards diminishing over time. This allows AIXI's expected rewards to have finite values without demanding that AIXI have a finite horizon.

5 This would be analogous to Cai being unable to think thoughts like 'Is the tile to my left the same as the leftmost quadrant of my visual field?' or 'Is the alternating greyness and whiteness of the upper-right tile in my body identical with my love of bananas?'. Instead, Cai would only be able to hypothesize correlations between possible tile configurations and possible successions of visual experiences.

#### References

∙ Dewey (2011). Learning what to value. Artificial General Intelligence: 4th International Conference Proceedings: 309-314.

∙ Hutter (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer.

∙ Omohundro (2008). The basic AI drives. Proceedings of the First AGI Conference: 483-492.

∙ Schmidhuber (2007). New millennium AI and the convergence of history. Studies in Computational Intelligence, 63: 15-35.

∙ Tye (2013). Qualia. In Zalta (ed.), The Stanford Encyclopedia of Philosophy.

comment by jimrandomh · 2014-03-12T17:36:50.573Z · LW(p) · GW(p)

If you limit the domain of your utility function to a sensory channel, you have already lost; you are forced into a choice between a utility function that is wrong, or a utility function with a second induction system hidden inside it. This is definitely unrecoverable.

However, I see no reason for Solomonoff-inspired agents to be structured that way. If the utility function's domain is a world-model instead, then it can find itself in that world-model and the self-modeling problem vanishes immediately, leaving only the hard but philosophically-valid problem of defining the utility function we want.
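The contrast can be sketched roughly as follows (toy names throughout — this is not anyone's worked-out proposal): a sensory-domain utility can only score the input channel, while a world-model-domain utility scores the inferred state of the world, in which the agent appears as just another object.

```python
# Toy contrast: utility over a sensory channel vs. over a world-model.

def sensory_utility(percepts):
    # Scores only what reaches the input channel.
    return sum(1 for p in percepts if p == "reward_signal")

def world_model_utility(world_state):
    # Scores the modeled world itself; the agent is just one more object
    # in the model. (Hypothetical goal: count of paperclips.)
    return world_state["paperclips"]

world = {"paperclips": 3, "agent": {"location": "lab", "intact": True}}
print(world_model_utility(world))   # -> 3: defined even if...
world["agent"]["intact"] = False    # ...the modeled agent is destroyed
print(world_model_utility(world))   # -> still 3
```

Nothing in the second function changes when the modeled agent stops existing, which is exactly the property a Cartesian sensory-channel utility cannot have.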

comment by matheist · 2014-03-13T17:17:49.164Z · LW(p) · GW(p)

There's also the problem of actually building such a thing.

edit: I should add, the problem of building this particular thing is above and beyond the already difficult problem of building any AGI, let alone a friendly one: how do you make a thing's utility function correspond to the world and not to its perceptions? All it has immediately available to it is perception.

comment by Squark · 2014-03-22T20:25:52.985Z · LW(p) · GW(p)

This is exactly how my formalism works.

comment by Adele_L · 2014-03-13T16:18:59.304Z · LW(p) · GW(p)

Alex Mennen has described a version of AIXI with a utility function of the environment.

comment by Houshalter · 2014-03-18T09:00:28.883Z · LW(p) · GW(p)

Predicting the input to a sensory channel is easy and straightforward. I'm not even sure where you would begin creating a program that can model the universe in a way that it can find a copy of itself inside of it. Then creating a utility function that can assign a sensible utility to the state of any arbitrary Turing machine?

comment by Cyan · 2014-03-12T05:27:14.119Z · LW(p) · GW(p)

> I think it's reasonable to expect there to be some way to do better, because humans don't drop anvils on their own heads. That we're naturalized reasoners is one way of explaining why we don't routinely make that kind of mistake.

My kids would long since have been maimed or killed by exactly that kind of mistake, if not for precautions taken by, and active monitoring by, their parents.

comment by Rob Bensinger (RobbBB) · 2014-03-12T05:33:16.692Z · LW(p) · GW(p)

Yeah, that's right. Having a naturalized architecture may be necessary for general intelligence concerning self-modifications, even if it's not sufficient. Other things are necessary too, like large, representative data sets.

If AIXI starts off without a conception of death but eventually arrives at one, then the criticism of AIXI I've been making is very wrong. The key question is whether AIXI ever grows up into a consistently rational agent.

comment by Cyan · 2014-03-12T05:51:56.078Z · LW(p) · GW(p)

I can't actually understand/grok/predict what it is like to not exist, but I know that if I die, I will not learn or act anymore. That seems to be all that naturalized reasoning can give me, and all that is necessary for an AI too.

comment by Rob Bensinger (RobbBB) · 2014-03-12T06:07:13.014Z · LW(p) · GW(p)

A naturalized agent's hypotheses can be about world-states that include the agent, or world-states that don't include the agent. A Cartesian agent's hypotheses are all about the agent's internal states, and different possible causes for those states, so the idea of 'world-states that don't include the agent' can't be directly represented. Even a halting program in AIXI's hypothesis space isn't really a prediction about how a world without AIXI would look; it's more a prediction about how Everything (including AIXI) could come to an end.

Our ultimate goal in building an AI isn't to optimize the internal features of the AI; it's to optimize the rest of the world, with the AI functioning as a tool. So it seems likely that we'll want our AI's beliefs to look like pictures of an objective world (in which agents like the AI happen to exist, sometimes).

comment by Cyan · 2014-03-12T11:53:15.479Z · LW(p) · GW(p)

> A Cartesian agent's hypotheses are all about the agent's internal states, and different possible causes for those states, so the idea of 'world-states that don't include the agent' can't be directly represented.

A sequence predictor's predictions are all about the agent's input tape states*, and different possible causes for those states. The hypotheses are programs that implement entire models of the Universe, and these can definitely directly represent world-states which don't include the agent.

* More realistically, the states of the registers where the sensor data is placed.

ETA: I wonder if this intuition is caused by the fact that I am a practicing Bayesian statistician, so the distinction between posterior distributions and posterior predictive distributions is more salient to me.
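That distinction can be made concrete with a toy coin model (a standard Bayesian textbook example, nothing AIXI-specific): the posterior lives over the world-parameter θ — the "universe model" — while the posterior predictive lives over the next observation on the input channel.

```python
# Toy coin model: posterior over the world-parameter theta vs.
# posterior predictive over the next observation.
from fractions import Fraction

# Two candidate "universes": a 1/4-heads coin and a 3/4-heads coin.
priors = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
data = [1, 1, 1]  # three heads observed

def posterior(priors, data):
    post = {}
    for theta, p in priors.items():
        like = Fraction(1)
        for x in data:
            like *= theta if x == 1 else 1 - theta
        post[theta] = p * like
    z = sum(post.values())
    return {theta: w / z for theta, w in post.items()}

post = posterior(priors, data)                    # belief about the world
predictive = sum(t * w for t, w in post.items())  # belief about the next input
print(post[Fraction(3, 4)], predictive)           # -> 27/28 and 41/56
```

The hypotheses themselves are statements about the coin (the world), even though the evidence for them arrives only through the observation channel.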

comment by Squark · 2014-03-22T20:27:29.976Z · LW(p) · GW(p)

The analogy is made somewhat more precise by my new formalism.

comment by Kawoomba · 2014-03-12T06:53:14.474Z · LW(p) · GW(p)

What happened to the supposed UDT solution?

comment by Rob Bensinger (RobbBB) · 2014-03-13T10:23:19.611Z · LW(p) · GW(p)

comment by Squark · 2014-03-22T20:29:18.001Z · LW(p) · GW(p)

I believe my formalism closes the gap between UDT and naturalized induction.

comment by itaibn0 · 2014-03-12T23:30:45.961Z · LW(p) · GW(p)

Alright, let's consider a specific scenario: The AIXI agent is not implemented as a single machine, but as several different machines built in different locations which share data. The agent can experiment and discover that whenever one of the machines is destroyed it can no longer gather data and perform actions in that location. Do you think this agent will behave irrationally about the possibility of destruction for all its host machines? If not, why? (Still, you may argue that the agent will behave irrationally in other self-modification scenarios, such as destroying its communication cables. Right now I'm only trying to establish that AIXI can handle potential death reasonably, unlike what you claim.)

comment by lmm · 2014-03-18T22:33:03.893Z · LW(p) · GW(p)

So it discovers that destroying a particular building in NY made NY look plain black and made its effectors in NY not do anything. It infers from available evidence that NY still exists and is behaving as normal in other regards. It discovers similar buildings in other cities that have the same effect. At this point it can infer that destroying the magic building in a given city will make that city look black and its effectors in that city not move.

But how does it care? How does it make the leap from "I will receive blank sensory input from this location" to "my goals are less likely to be fulfilled"? It might observe that its goals seem easier to achieve in cities where the magic building is still present, but it can't accurately model agents as complex as itself, and it's got no way to treat itself differently from another "ally" that seems to be helping the same cause. Which... I can't prove is irrational, but certainly seems a bit odd.

comment by KnaveOfAllTrades · 2014-03-19T00:04:35.971Z · LW(p) · GW(p)

I originally thought the anvil problem was obviously correct once I'd seen it briefly described, but now I think (having read some of your other comments) that I might be confused in the same way as you. I suspect we are mistaken if we get hung up on the visceral or emotional identification with self, and that it is not essential; humans have it, but it should not matter to the presence or absence of anvil-type problems whether that feeling is present. (Possibly in the same way that UDT does not need to feel a 'sense of self' like we do in order to coordinate its instantiations?)

I am also wondering if the 'sense of self and its preservation' is being treated magically by my brain as something qualitatively different from the general class of things that cause systems to try to protect themselves because they 'want' to. (Does this phrase introduce backwards teleology?) It seems like the sense of self is possibly just extraneous to a system that 'learns' (though again we must be careful in using that word to avoid anthropomorphising) to output certain actions through reinforcement.

It should be irrelevant whether a process looks at something and manipulates it with motor outputs (based on what it's learned through reinforcement?) 'robotically', or whether it manipulates that thing with motor outputs while making sounds about 'me' and its 'rich emotional experience' (or, heck, 'qualia'). Maybe a witty way of saying this would be that 'tabooing the sense of self should not affect decisions'? Obviously the set of things that sound the word 'me' and the set of things that do not are distinct, but it doesn't seem like there are any inherent differences between those two sets that are relevant to avoiding danger?

It seems like it might also be an important observation, that humans are 'created' in motion such that they have certain self-preserving instincts, by virtue of the arrangement of 'physical stuff'/atoms/etc. making them up. Any algorithm must also be implemented on an actual substrate (even if it is not what we would usually think of when we hear 'physical'), and the particular implementation (the direction in which it is created in motion, so to speak) will affect its behaviour and its subsequent evolution as a system.

Another possible line of insight is that animals and even more simple systems (like a train that automatically throws its brakes at high speeds) do not seem to have (as strong) senses of self, yet they exhibit self-preservation. This is obvious (in retrospect?), but it makes me think that the key point of the anvil objection is that AIXI does not necessarily realise that self-preservation is valuable/does not necessarily become good at preserving itself, i.e. self-preservation is not a basic AIXI drive. But deducing the importance of self-preservation/managing to self-preserve seems so substrate-/location-dependent that this does not seem a legitimate criticism of AIXI in particular, but rather a general observation of a problem that occurs when instantiating an abstract algorithm in a world. (But now this sounds like an objection that I think RobbBB might already have addressed about 'but any algorithm fails in a hostile environment', so I really should actually read this post instead of buzzing around the comment!)

I'm starting to think this might be connected to my uneasiness with/confusion about some of the AI drives stuff.

Edit: I should also note that the Cartesian presentation of AIXI's functioning raised a flag with me. Maybe the anvil objection is unfair to AIXI because it criticises the abstract formalism of AIXI for not being self-preserving, but this is not fair because 'exhibits self-preserving behaviour' only matters (or even is only defined?) for actual instantiations of a decision algorithm, and criticising the formalism for not exhibiting this property (what would that mean?) is a fully general argument against decision algorithms. (Okay, this is just me saying what I already said. But this seems like another useful way to think about it.)

More on created in motion etc.: I'm not sure if RobbBB is suggesting that AIXI falls short compared to humans specifically because we have bridging laws that allow self-preservation. But what if what one would call 'humans using bridging laws' is something like what being a self-aware, self-preserving, able-to-feel-anguish system feels like on the inside, in the same way that 'pain' might just be what being a conscious system with negative feedback systems feels like on the inside? RobbBB's objection seems to be that AIXI is failing to meet a reasonable standard (i.e. using bridging laws to necessarily value self-preservation). But if humans are meant to be his existence proof that this standard can be met, I'm not sure they're actually an example of it. And if they're not supposed to be, then I'm wondering what an example of something that would meet this standard would even look like.

comment by itaibn0 · 2014-03-20T14:22:23.292Z · LW(p) · GW(p)

> It might observe that its goals seem easier to achieve in cities where the magic building is still present,

I think you just answered your own question. Indeed, if the agent found that destroying its instances does not lead to less of its goals being achieved, then even a "naturalized" reasoner should not particularly care about destroying itself entirely.

Now, you say the agent would treat instances of itself the same way it would treat an ally. There's a difference: an ally is someone who behaves in ways that benefit it, while an instance is something whose actions correlate with its output signal. The fact that it has fine-grained control over instances of itself should lead it to treat itself differently from allies. But if the agent has an ally that completely reliably transmits to it true information and performs its requests, then yes, the agent should treat that ally the same way it treats parts of itself.

comment by Slider · 2014-03-23T00:49:47.606Z · LW(p) · GW(p)

> I think you just answered your own question. Indeed, if the agent found that destroying its instances does not lead to less of its goals being achieved, then even a "naturalized" reasoner should not particularly care about destroying itself entirely.

You can't win, Vader. If you strike me down, I shall become more powerful than you can possibly imagine.

comment by KnaveOfAllTrades · 2014-03-19T01:25:38.789Z · LW(p) · GW(p)

Ironically, maybe the problem is that even this is not specific enough. I am somewhat (but not entirely) confident that when we talk about 'what AIXI would do', we have to check that all AIXI instantiations would indeed do the same thing; if not, we have to pick which similar (in whatever respect) instantiations we're referring to and go from there. The formal specification of AIXI does not specify what would happen if a light/the lights go out (if an anvil is dropped). An AIXI instantiation in one world might lose a camera and be left to fend for itself; in another world, Omega appears and replaces the camera. An operating system might make no mention of peripherals being removed, but the behaviour of the computer it is installed on (e.g. what signals, if any, the hardware sends to the operating system when a peripheral is removed) can affect its inputs.

What would a decision algorithm that acted depending on its specific instantiation look like? I guess a voice recorder could, using a suitable recording, be made to say, "I have buttons composed of atoms, and this playback is causing slight perturbations of those buttons and myself at this very moment." But tape recorders in universes without atoms could be made to say that, too, so the tape recorder would not actually be sensitive to the type of world in which it's actually embedded. For a finite set of simple universes, we might be able to specify a machine, whose algorithm we know, that can identify which of those universes it's in. (Arguably for a non-finite set, humans can sort of do this; though we don't know how they work or what we might call their algorithms.) But would it be possible to prove that the algorithm does so without hard-coding a bunch of if statements into it? Maybe this is another 'how much can we leave to the FAI, and how much can we trust its work without checking it ourselves' thing?

comment by itaibn0 · 2014-03-19T15:40:21.251Z · LW(p) · GW(p)

I'm not sure what you're trying to say when talking about many instantiations. I am imagining that all the extant machines synchronise their inputs, so there is only one AIXI instance. The input is some kind of concatenation of the sensory inputs of all of the machines, with some kind of blank for nonfunctioning sensors. If I also considered scenarios where the communication lines can be cut, the agent would be forced to split into more than one instance, and then it would not be so clear how or whether the agent can learn reasonably intelligent behavior, which is why I did not consider that.

comment by [deleted] · 2014-03-12T12:40:33.963Z · LW(p) · GW(p)

The following is a meaningless stream of consciousness.

This issue has often sounded to me a little bit like the problem of building recursive inductive types/propositions in type-theory/logic. You can't construct so much as a tree with child nodes without some notation for, "This structure contains copies of itself, or possibly even links back to its own self as a cyclical structure." It continually sounds as if AIXI has no symbol in its hypothesis space that means "me", and even if it did, it would consider hypotheses about "me" more complex than hypotheses without "me".

In type theory, we solve this with a \mu type constructor, so that lists, for instance, can be written as follows:

list \alpha = \mu \beta. <cons = {item: \alpha, next: \beta}, nil = {}>


Common sense tells us that the phenomenon of lists is thus described much more simply by one recursive type than by countably infinitely many separate similar types. If we include the "recursive concept operator" \mu in the "language" of hypotheses, my intuitions (ie: blind speculation) say Kolmogorov Complexity would correctly weight the hypotheses.
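For what it's worth, this kind of self-reference can be mimicked even in a language without explicit recursive-type syntax. The sketch below is a loose Python analogue (illustrative only) of the list type above, where one self-referential definition covers lists of every length:

```python
# Loose Python analogue of  list a = mu b. <cons = {item: a, next: b}, nil = {}>:
# a single self-referential definition describes lists of every length,
# where the "unrolled" alternative would need one type per length.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cons:
    item: int
    next: Optional["Cons"]  # the mu-style self-reference; None plays nil

def length(node: Optional["Cons"]) -> int:
    return 0 if node is None else 1 + length(node.next)

xs = Cons(1, Cons(2, Cons(3, None)))
print(length(xs))  # -> 3
```

The forward-reference string `"Cons"` is Python's (ad hoc) stand-in for the μ binder: the type mentions itself before its own definition is complete.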

Are there concept or formal-language learning algorithms capable of inducting this kind of recursive concept description? I would figure such an algorithm would be able to learn some hypothesis along the lines of "the output tape corresponds with this physical process by way of the recursive concept me, which can be unfolded into its bridge hypotheses and physical description or folded into the concept of my calculations".

Once again, everything above was a meaningless stream of consciousness, but I'm told one of the ways to make progress is to try to write down your best guesses and look at what's wrong with them.

comment by cousin_it · 2014-03-12T15:03:22.569Z · LW(p) · GW(p)

The diagonal lemma and the existence of quines already show that you don't need specific support for self-reference in your language, because any sufficiently powerful language can formulate self-referential statements. In fact, UDT uses a quined description of itself, like in your proposal.
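As a concrete illustration of the "no specific support needed" point, here is the classic two-line Python quine — ordinary string formatting and `repr` do the diagonalization, with no built-in self-reference operator:

```python
# A classic Python quine: the program's output is its own source text.
# %r inserts repr(s), and %% escapes to a literal %, so printing
# (s % s) reconstructs both lines exactly.
s = 's = %r\nprint(s %% s)'
print(s % s)
```

The same trick — quoting your own description and then substituting the quotation into itself — is what the diagonal lemma formalizes.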

comment by [deleted] · 2014-03-12T17:21:02.898Z · LW(p) · GW(p)

> The diagonal lemma and the existence of quines already show that you don't need specific support for self-reference in your language, because any sufficiently powerful language can formulate self-referential statements.

In formal language terms, it would be more accurate to say that any sufficiently powerful (ie: recursively enumerable, Turing recognizable, etc) language must contain some means of producing direct self-references. The existence of the \mu node in the syntax tree isn't necessarily intuitive, but its existence is a solid fact of formal-language theory. Without it, you can only express pushdown automata, not Turing machines.

But self-referencing data structures within a single Turing machine tape are not formally equivalent to self-referencing Turing machines, nor to being able to learn how to detect and locate a self-reference in a universe being modelled as a computation.

In fact, UDT uses a quined description of itself, like in your proposal.

I did see someone proposing a UDT attack on naturalized induction on this page.

comment by matheist · 2014-03-12T05:53:10.398Z · LW(p) · GW(p)

It's really great to see all of these objections addressed in one place. I would have loved to be able to read something like this right after learning about AIXI for the first time.

I'm convinced by most of the answers to Xia's objections. A quick question:

Yes... but I also think I'm like those other brains. AIXI doesn't. In fact, since the whole agent AIXI isn't in AIXI's hypothesis space — and the whole agent AIXItl isn't in AIXItl's hypothesis space — even if two physically identical AIXI-type agents ran into each other, they could never fully understand each other. And neither one could ever draw direct inferences from its twin's computations to its own computations.

Why couldn't two identical AIXI-type agents recognize one another to some extent? Stick a camera on the agents, put them in front of mirrors and have them wiggle their actuators, make a smiley face light up whenever they get rewarded. Then put them in a room with each other.

Lots of humans believe themselves to be Cartesian, after all, and manage to generalize from others without too much trouble. "Other humans" isn't in a typical human's hypothesis space either — at least not until after a few years of experience.

comment by cromulented · 2014-03-12T22:05:46.529Z · LW(p) · GW(p)

Why couldn't two identical AIXI-type agents recognize one another to some extent? Stick a camera on the agents, put them in front of mirrors and have them wiggle their actuators, make a smiley face light up whenever they get rewarded. Then put them in a room with each other.

If you're suggesting this as a way around AIXI's immortality delusion, I don't think it works. AIXI "A" doesn't learn of death even if it witnesses the destruction of its twin, "B", because the destruction of B does not cause A's input stream to terminate. It's just a new input, no different in kind from any other. If you're considering AIXI(tl) twins instead, there's also the problem that a full model of an AIXI(tl) can't fit into its own hypothesis space, and thus a duplicate can't either.

Lots of humans believe themselves to be Cartesian, after all, and manage to generalize from others without too much trouble. "Other humans" isn't in a typical human's hypothesis space either — at least not until after a few years of experience.

AIXI doesn't just believe it's Cartesian. It's structurally unable to believe otherwise. That may not be true of humans.

comment by matheist · 2014-03-13T02:57:12.806Z · LW(p) · GW(p)

Let me try to strengthen my objection.

Xia: But the 0, 0, 0, ... is enough! You've now conceded a case where an endless null output seems very likely, from the perspective of a Solomonoff inductor. Surely at least some cases of death can be treated the same way, as more complicated series that zero in on a null output and then yield a null output.

Rob: There's no reason to expect AIXI's whole series of experiences, up to the moment it jumps off a cliff, to look anything like 12, 10, 8, 6, 4. By the time AIXI gets to the cliff, its past observations and rewards will be a hugely complicated mesh of memories. In the past, observed sequences of 0s have always eventually given way to a 1. In the past, punishments have always eventually ceased. It's exceedingly unlikely that the simplest Turing machine predicting all those intricate ups and downs will then happen to predict eternal, irrevocable 0 after the cliff jump.

Put multiple AIXItl's in a room together, and give them some sort of input jack to observe each other's observation/reward sequences. Similarly equip them with cameras and mirrors so that they can see themselves. Maybe it'll take years, but it seems plausible to me that after enough time, one of them could develop a world-model that contains it as an embodied agent.

I.e. it's plausible to me that an AIXItl under those circumstances would think: "the Turing machines with smallest complexity which generate BOTH my observations of those things over there that walk like me and talk like me AND my own observations and rewards, are the ones that compute me in the same way that they compute those things over there".

After which point, drop an anvil on one of the machines, let the others plug into it and read a garbage observation/reward sequence. AIXItl thinks, "If I'm computed in the same way that those other machines are computed, and an anvil causes garbage observation and reward, I'd better stay away from anvils".

comment by Kaj_Sotala · 2014-03-12T09:23:16.712Z · LW(p) · GW(p)

It's really great to see all of these objections addressed in one place.

Agreed. While reading this, I kept having the experience of "hmm, Xia's objection sounds quite reasonable, now that I think of it... but let's see what Rob says... oh, right".

comment by Rob Bensinger (RobbBB) · 2014-03-27T15:13:07.795Z · LW(p) · GW(p)

Three AIXI researchers commented on a draft of this post and on Solomonoff Cartesianism. I'm posting their comments here, anonymized, for discussion. You can find AIXI Specialist #1's comments here.

AIXI Specialist #2 wrote:

Pro: This is a mindful and well-intended dialog, way more thoughtful than the average critique of AIXI, esp. by computer scientists.

Con: The write-up should be cleaned. It reads like a raw transcript of some live conversation.

Neutral: I think this is good philosophy, and potentially interesting, but only for when AIXI reaches intelligence way beyond human level. The arguments don't show that AIXI runs into problems up to human-level intelligence. Just as discussions of (the hard problem of) consciousness are irrelevant for building AGIs.

Since an embodied AIXI can observe some aspects of its body, it will build a model of it of sufficient quality to speculate about self-modification. While formally AIXI cannot model another AIXI (or itself) perfectly, it will develop increasingly accurate approximations.

Humans will also never be able to fully understand themselves, but that is also not necessary for many self-modification thoughts, except for the most fundamental "Goedelian" questions which neither machines NOR humans can answer. Penrose is wrong.

AIXI is not limited to finite horizon. This is just the simplest model for didactic purpose. The real AIXI has either (universal) discounting (Hutter's book) or mixes over discounted environments (Shane Legg's thesis).

Independent of the discounting, AIXI (necessarily) includes algorithms with finite output (either looping forever or halting). This can be interpreted as belief in death. The a-priori probability AIXI assigns to dying at time t is about 2^-K(t), but I think it can get arbitrarily close to 1 with "appropriate" experience. Worth a formal (dis)proof.

Human children start with a 1st-person egocentric world-view, but slowly learn that a 3rd-person world model is a more useful model. How an objective world-view can emerge from a purely subjective perspective by a Solomonoff inductor has been explained in http://dx.doi.org/10.3390/a3040329

There are many subtle philosophical questions related to AIXI, but most can only be definitely solved by formal investigation. More effort should be put into that. Verbal philosophical arguments may lead to that but seldom conclusively prove anything by themselves.

AIXI Specialist #3 wrote, in part:

Unrelated to the article, there are problems with AIXI and other Bayesian agents. For example, it was proven that there exist computable environments where no such agent can be asymptotically optimal [Asymptotically Optimal Agents, Lattimore and Hutter 2011] (an asymptotically optimal policy is one that eventually converges to optimal). However, a notion of weak asymptotic optimality can be defined (a weakly asymptotically optimal policy eventually converges to optimal on average), and unfortunately a pure Bayesian optimal policy doesn't satisfy that either on some environments. However, there are agents which can be constructed that do [ibid]. The key problem is in fact that AIXI and other pure Bayesian agents stop exploring at some point, whereas for really tricky environments you need to explore infinitely often (basically, environments can be constructed that lull Bayesian agents into a false sense of security and then pull a switcharoo; these environments are really annoying).

That being said, AIXI approximations should still perform very well in the real world. In order to have a real-world agent, there need to be restrictions on the environment class, and once these restrictions are made, one can more carefully tailor exploration-exploitation trade-offs as well as learning algorithms that exploit structure within the environment class to learn faster (bounds for general learning in specific environment classes are horrendous).

It is not clear that approximations of AIXI are the way to build a computable general intelligence, but AIXI (or similar) serve as useful benchmarks in a theoretical sense.

comment by Gunnar_Zarncke · 2014-03-13T09:28:53.631Z · LW(p) · GW(p)

Interestingly the problems of AIXI are not much different from corresponding ones for human rationality:

• immortalism - humans also don't grasp death on any deeper level than AIXI. They also drop anvils on their head so to speak, i.e. they misinterpret reality to a) be less dangerous than 'expected' or ignored (esp. small children) or b) to contain an afterlife (kind of updating against the (a) view later). This is for the same reason AIXI does. Symbolic reasoning about reality.

• preference solipsism - Same here. Reasoning needs some priors. These form from the body the mind is 'trapped' in. But the mind doesn't necessarily believe that it is trapped. The body provides the 'reward' button. And that button can be hacked (drugs, optimizing for single values).

• lack of self-improvement - To improve itself AIXI has to be taught via reward what to prefer, and its means to achieve this thus follow complex causality. It could work, but it is no direct approach. Humans also cannot directly improve themselves. The desire (reward) for self-improvement needs to be put into neurobio science and that into cyber (or whatnot) enhancements (or uploading).

comment by Rob Bensinger (RobbBB) · 2014-03-13T10:18:06.777Z · LW(p) · GW(p)

humans also don't grasp death on any deeper level than AIXI. They also drop anvils on their head so to speak

If human adults didn't grasp death any better than AIXI does, they'd routinely drop anvils on their heads literally, not 'so to speak'.

This is for the same reason AIXI does. Symbolic reasoning about reality.

What do you mean? What would be the alternative to 'symbolic reasoning'?

The body provides the 'reward' button. And that button can be hacked (drugs, optimizing for single values).

If a smart AI values things about the world outside its head, it won't deliberately hack itself (e.g., it won't alter its hardware to entertain happy delusions), because it won't expect a policy of self-hacking to make the world actually better. It's the actual world it cares about, not its beliefs about, preferences over, or enjoyable experiences of the world.

The problem with AIXI isn't that it lacks the data or technology needed to self-modify. It's that it has an unrealistic prior. These aren't problems shared by humans. Humans form approximately accurate models of how new drugs, food, injuries, etc. will affect their minds, and respond accordingly. They don't always do so, but AIXI is special because it can never do so, even when given unboundedly great computing power and arbitrarily large supplies of representative data.

comment by Houshalter · 2014-03-18T09:12:35.662Z · LW(p) · GW(p)

If human adults didn't grasp death any better than AIXI does, they'd routinely drop anvils on their heads literally, not 'so to speak'.

AIXI doesn't necessarily drop an anvil on its head. It just doesn't believe that its input sequence can ever stop, no matter what happens. This seems to me like what the vast majority of humans believe.

comment by jbay · 2014-03-18T13:28:57.093Z · LW(p) · GW(p)

For clarity: are you referring to belief in an afterlife/reincarnation? Or are you saying that most humans are not mindful most of the time of their own mortality?

comment by Houshalter · 2014-03-18T14:46:44.613Z · LW(p) · GW(p)

I am referring to an afterlife of some kind.

comment by Cyan · 2014-03-13T20:59:42.136Z · LW(p) · GW(p)

AIXI is special because it can never do so, even when given unboundedly great computing power and arbitrarily large supplies of representative data.

You keep saying things like this. Why are you so convinced that "wrong" epistemology has shorter K-complexity than an epistemology capable of knowing that it's embodied? What are the causes of your knowledge that you are embodied?

comment by Gunnar_Zarncke · 2014-03-13T11:31:15.282Z · LW(p) · GW(p)

I disagree.

Humans form approximately accurate models of how new drugs, food, injuries, etc. will affect their minds, and respond accordingly.

If you coerce AIXI with sufficiently tricky rewards (and our evolved body does nothing else with our developing brain) to form 'approximately accurate models', AIXI will also respond accordingly. Except

They don't always do so

When it doesn't do so either, because it has learned that it can get around this coercion. Same with humans, who may also come to think that they can get around their body and go to heaven, take drugs...

If human adults didn't grasp death any better than AIXI does, they'd routinely drop anvils on their heads literally, not 'so to speak'.

AIXI wouldn't either if you coerced it like our body (and society) does us.

This is for the same reason AIXI does. Symbolic reasoning about reality.

What do you mean? What would be the alternative to 'symbolic reasoning'?

I don't say that there is an alternative. It means that symbolic reasoning needs some base. Axioms, goal states. Where do you get these from? In the human brain these form stabilizing neural nets, thus representing approximations of vague interrelated representations of reality. But you have no cognitive access to this fuzzy-to-symbolic relation, only to its mentalese correlate - the symbols you reason with. Whatever you derive from the symbols is separated from reality in the same way as by the Cartesian barrier of AIXI.

comment by Armok_GoB · 2014-03-12T18:18:35.796Z · LW(p) · GW(p)

Xia, in anvil conversation: "What if you have the AIXI as a Cartesian lump, and teach it that its output can only influence a tiny voltage various sensitive sensors can sense, and that if the voltage to it is broken, time skips forward until it's reinstated, and give it a clock-tick timeout death prior based on how long the universe has been running rather than how many bits it has outputted? The AI will predict that if it's destroyed, the lump won't be found and the voltage never reapplied until the universe spontaneously ceases to exist a few million years later."

comment by shminux · 2014-03-12T17:52:09.700Z · LW(p) · GW(p)

Are there toy models of, say, a very simple universe and an AIXItl-type reasoner in it? How complex does the universe have to be to support AIXI? Game-of-life-complex? Chess-complex? D&D complex? How would one tell?

comment by solipsist · 2014-03-13T16:59:03.300Z · LW(p) · GW(p)

Game of Life and (I assume) D&D are Turing-complete, so I would assume at first blush that they are as complicated as our laws of physics. They may simulate Turing machines with an exponential slow-down though -- is that what you're getting at?

comment by shminux · 2014-03-13T17:10:02.240Z · LW(p) · GW(p)

What I mean is that in the GoL or another Turing-complete cellular automaton one can specify a universe by a few simple rules and initial conditions. I wonder if it is possible to construct a simple universe with a rudimentary AIXItl machine in it. As a first step, I would like to know how to define an AIXItl subset of a cellular automaton.
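For what it's worth, the "few simple rules" part really is tiny; one Game-of-Life update step (the standard B3/S23 rule) can be sketched in a few lines of Python. Embedding anything AIXItl-like as a pattern in such a grid is, of course, the hard part.

```python
from collections import Counter

def step(live):
    """One Game-of-Life step (B3/S23) on a set of live-cell coordinates:
    the entire 'physics' of the universe in a few lines."""
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is alive next step if it has 3 live neighbours,
    # or 2 live neighbours and is already alive.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

blinker = {(0, 0), (1, 0), (2, 0)}  # a period-2 oscillator
print(step(step(blinker)) == blinker)  # → True
```

Specifying the rules is the easy part; the open question above is how to write down initial conditions that compute an AIXItl.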

comment by V_V · 2014-03-13T22:54:41.121Z · LW(p) · GW(p)

Writing any non-trivial program in such simple cellular automata is often too difficult to be practically feasible. Think of designing a race car by specifying the position of each individual atom.

comment by Nornagest · 2014-03-13T23:00:10.551Z · LW(p) · GW(p)

If you've got a modular way of implementing logical operations in a cellular automaton, which we do in Life, you could automate the tedious parts by using a VHDL-like system. The resulting grid would be impractically huge, but there's probably no good way around that.

comment by V_V · 2014-03-14T12:45:22.916Z · LW(p) · GW(p)

Sure

comment by TimFreeman · 2014-03-20T15:26:04.414Z · LW(p) · GW(p)

If you give up on the AIXI agent exploring the entire set of possible hypotheses and instead have it explore a small fixed list, the toy models can be very small. Here is a unit test for something more involved than AIXI that's feasible because of the small hypothesis list.
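To illustrate the flavor of that restriction (my own toy sketch, not the linked test; the hypothesis names and probabilities are made up): a Bayesian predictor over a fixed three-hypothesis list, standing in for Solomonoff's infinite mixture.

```python
# Each hypothesis maps the observation history to P(next bit = 1).
hypotheses = {
    "always-1": lambda history: 0.99,
    "always-0": lambda history: 0.01,
    "alternate": lambda history: 0.99 if (not history or history[-1] == 0) else 0.01,
}

def update(weights, history, bit):
    # Bayes' rule: scale each weight by the likelihood that hypothesis
    # assigned to the observed bit, then renormalize.
    new = {name: weights[name] * (h(history) if bit == 1 else 1 - h(history))
           for name, h in hypotheses.items()}
    z = sum(new.values())
    return {name: w / z for name, w in new.items()}

weights = {name: 1 / len(hypotheses) for name in hypotheses}  # uniform prior
history = []
for bit in [1, 0, 1, 0, 1, 0]:
    weights = update(weights, history, bit)
    history.append(bit)

best = max(weights, key=weights.get)
print(best)  # → alternate
```

In the full Solomonoff mixture the list is infinite and weighted by 2^-(program length); clamping it to a finite fixed list is exactly what makes toy models and unit tests feasible.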

comment by Houshalter · 2014-03-18T09:34:23.394Z · LW(p) · GW(p)

Any universe that contains AIXI would be too complex to be modeled by AIXI. AIXI requires finding programs that reproduce its inputs exactly. But you could get past this constraint by looking for programs that predict the data, rather than recreating it exactly.

You could make its environment an actual computer. Something like Core Wars. That makes creating and simulating it a lot simpler.

This would be an interesting experiment to do, although I predict AIXI would just kill itself fairly quickly.

comment by V_V · 2014-03-13T14:17:16.290Z · LW(p) · GW(p)

Game of Life is Turing-complete.

comment by christopherj · 2014-04-08T04:53:44.091Z · LW(p) · GW(p)

I'm having trouble understanding how something generally intelligent in every respect, except for failing to understand death or that it has a physical body, could be incapable of ever learning, or at least acting indistinguishably from one that does know.

For example, how would AIXI act if given the following as part of its utility function: 1) the utility function gets multiplied by zero should a certain computer cease to function; 2) the utility function gets multiplied by zero should certain bits be overwritten, except if a sanity check is passed first.

Seems to me that such an AI would act as if it had a genocidally dangerous fear of death, even if it doesn't actually understand the concept.

comment by Quill_McGee · 2015-03-25T01:00:51.638Z · LW(p) · GW(p)

That AI doesn't drop an anvil on its head (I think...), but it also doesn't self-improve.

comment by laofmoonster · 2014-03-21T05:45:07.981Z · LW(p) · GW(p)

I don't see how phenomenological bridges solve the epistemological problem, instead of just pushing the problem one step further away. Where in the bridge hypothesis is it encoded that one end of the bridge has a "self", in a way that leads to different behavior?

Let me give an example of AIXI, which creates something that is almost a phenomenological bridge, but remains Cartesian. Imagine that an AIXI finds a magnifying glass. It holds the magnifying glass near its camera, and at the correct focal distance, everything in {world − magnifying glass} looks the same, except upside down. Through experimentation and observation, it realizes that gravity hasn't flipped, it's still on the ground, the lights are still 15 feet above it, etc. It will conclude that the magnifying glass filters visual input from the rest of the world, flipping the Y axis. Thus AIXI has a hypothesis about the relation of the magnifying glass with the world.

Phenomenal bridge hypotheses are saying there is something like this magnifying glass, except embedded in...where? What's the difference between reading glasses and retinas? I can have 1 "visual filter hypothesis", 2 visual filter hypotheses, n visual filter hypotheses. What's the distinction between internal filters and world filters? Do I have x internal filters and {n − x} external filters? What would that mean?

comment by Rob Bensinger (RobbBB) · 2014-03-20T17:47:54.833Z · LW(p) · GW(p)

See Luke's comment for an explanation of how this series of posts is being written. Huge thanks to Eliezer Yudkowsky, Alex Mennen, Nisan Stiennon, and everyone else who's helped review these posts! They don't necessarily confidently endorse all the contents, but they've done a lot to make the posts more clean, accurate, and informative.

comment by TruePath · 2014-03-19T11:52:04.179Z · LW(p) · GW(p)

I'd also like to point out the Cartesian barrier is actually probably a useful feature.

It's not objectively true in any sense, but the relation between external input, output, and effect is very, very different from that between internal input (changes to your memories, say), output, and effect. Indeed, I would suggest there was a very good reason that we took so long to understand the brain. It would be just too difficult (and perhaps impossible) to do so at a direct level the way we understand receptors being activated in our eyes (yes, all that visual crap we do is part of our understanding).

Take your example of a sensor aimed at the computer's memory circuit. Unlike almost every other situation, there are cases it can't check its hypothesis against, because such a check would be logically incoherent. In other words, certain theories (or at least representations of them) will be diagonalized against, because the very experiments you wish to do can't be effected: that 'intention' itself modifies the memory cells in such a way as to make the experiment impossible.

In short, the one thing we do know is that it's an effective strategy for understanding the world to assume that we are free to choose from a wide range of actions independently of the theory we are trying to test, and that how we came to choose that action is irrelevant. It worked for us.

Once the logic of decision making is tightly coupled with the observations themselves the problem gets much harder and may be insoluble from the inside, i.e., we may need to experiment on others and assume we are similar.

comment by torekp · 2014-03-18T23:58:39.005Z · LW(p) · GW(p)

I've been following these posts with interest, having suspected a similar problem to the Cartesianism you rail against. My beef is a little different, though: it relates to the fundamental categories of perception. Going back to Cai: cyan, yellow, and magenta are the only allowed categories, and there are a fixed number of regions of the visual field. This is not how naturalized agents seem to operate. Human beings at least occasionally re-describe what they perceive, at any and all levels. Qualia? Their very existence is disputed. Physical objects? Ditto.

Any perceptual judgment we make can be questioned. Labels can be substituted: "This soup tastes like chicken," I say. "No, it tastes like turkey," the woman next to me says. She's right, I realize, and I change my description. Or, entire categories of labels can be substituted, as with philosophical ontological battles.

I'm not sure whether this indicates additional problems for AIXI, but it seems like part and parcel of the problem of Cartesianism.

comment by Tyrrell_McAllister · 2014-03-18T19:20:26.217Z · LW(p) · GW(p)

Xia: It should be relatively easy to give AIXI(tl) evidence that its selected actions are useless when its motor is dead. If nothing else AIXI(tl) should be able to learn that it's bad to let its body be destroyed, because then its motor will be destroyed, which experience tells it causes its actions to have less of an impact on its reward inputs.

Rob B: [...] Even if we get AIXI(tl) to value continuing to affect the world, it's not clear that it would preserve itself. It might well believe that it can continue to have a causal impact on our world (or on some afterlife world) by a different route after its body is destroyed. Perhaps it will be able to lift heavier objects telepathically, since its clumsy robot body is no longer getting in the way of its output sequence.

Compare human immortalists who think that partial brain damage impairs mental functioning, but complete brain damage allows the mind to escape to a better place. Humans don't find it inconceivable that there's a light at the end of the low-reward tunnel, and we have death in our hypothesis space!

I’d like to see this rebuttal spelled out in more detail. Let’s assume for the sake of argument that “we get AIXI(tl) to value continuing to affect the world”. Why would it then be so hard to convince AIXI(tl) that it will be better able to affect the world if no anvils fall on a certain head? (I mean, hard compared to any reasonably-hoped-for alternative to AIXI(tl)?)

If the AIXI(tl) robot has been kept from killing itself for long enough for it to observe the basics of how the world works, why wouldn’t AIXI(tl) have noticed that it is better able to affect the world when a certain brain and body are in good working order and free of obstructions? We can’t give AIXI(tl) the experience of dying, but can’t we give AIXI(tl) experiences supporting the hypothesis that damage to a particular body causes AIXI(tl) to be unable to affect the world as well as it would like?

I can see that AIXI(tl) would entertain hypotheses like, “Maybe dropping this anvil on this brain will make things better.” But AIXI(tl) would also entertain the contrary hypothesis, that dropping the anvil will make things worse, not because it will turn the perceptual stream into an unending sequence of NULLs, but rather because smashing the brain might make it harder for AIXI(tl) to steer the future.

Humans did invent hypotheses like, “complete brain damage allows the mind to escape to a better place”, but there seems to be a strong case for the claim that humans are far more confident in such hypotheses than they should be, given the evidence. Shouldn’t a Solomonoff inductor do a much better job at weighing this evidence than humans do? Why wouldn’t AIXI(tl)’s enthusiasm for the “better place” hypothesis be outweighed by a fear of becoming a disembodied Cartesian spirit cut off from all influence over the only world that it cares about influencing?

comment by V_V · 2014-03-18T20:35:38.827Z · LW(p) · GW(p)

Humans did invent hypotheses like, “complete brain damage allows the mind to escape to a better place”, but there seems to be a strong case for the claim that humans are far more confident in such hypotheses than they should be, given the evidence.

It can also be argued that even humans who claim to believe in immortal souls don't actually use this belief instrumentally: religious people don't drop anvils on their heads to "allow the mind to escape to a better place", unless they are insane. Even religious suicide terrorists generally have political or personal motives (e.g. increasing the status of their family members); they don't really blow themselves up or fly planes into buildings for the 72 virgins.

comment by Gunnar_Zarncke · 2014-03-19T08:34:44.822Z · LW(p) · GW(p)

You are mentioning some aspects keeping from or motivating for suicide. This is the whole point. Suicide is a thinkable option. It just doesn't happen so often because - no wonder - it is heavily selected against. There are lots of physical, psychological and social feedbacks in place that ensure it happens seldom. But that is no different from providing comparable training to AIXI.

And it appears that despite all these checks it is still possible to navigate people out of these checks (which is not much different from AIXI deriving solutions evading checks) to commit suicide. I e.g. remember a news story (disclaimer!) where a cultist fraudster convinced unhappy people to gift their wealth to some other person and commit suicide with the cultishly embellished promise that they'd awake in the body of the other person at another place. Now that wouldn't convince me, but could it convince AIXI? ("questions ending with a '?' mean no")

comment by V_V · 2014-03-19T14:18:29.271Z · LW(p) · GW(p)

You are mentioning some aspects keeping from or motivating for suicide. This is the whole point. Suicide is a thinkable option.

Yes, but people generally know what it entails.
We don't want an AI agent to be completely incapable of destroying itself. We just don't want it to destroy itself without a good cause. Crashing with its spaceship into an incoming asteroid to deflect it away from Earth would be a good cause, for instance.

a cultist fraudster convinced unhappy people to gift their wealth to some other person and commit suicide with the cultishly embellished promise that they'd awake in the body of the other person at another place. Now that wouldn't convince me, but could it convince AIXI?

If AIXI had a sufficient amount of experience of the world, I think it couldn't.

comment by Lumifer · 2014-03-18T20:39:53.402Z · LW(p) · GW(p)

religious people don't drop anvils on their heads to "allow the mind to escape to a better place"

In most religions with the concept of afterlife and heaven there is a very explicit prohibition on suicide. Dropping an anvil on your head is promised to lead to your mind being locked in a "worse place".

comment by V_V · 2014-03-18T20:53:41.735Z · LW(p) · GW(p)

Religious people also tend to wear helmets when they are in places where heavy stuff can accidentally fall on their heads, they go to the hospital when they are sick, and generally will invest a large amount of money and effort in staying alive.
Unless you define suicide to include failing to do everything in your power (within moral and legal constraints) to prevent your death as long as possible, the willingness of religious people to stay alive can't be explained just as complying with the ban on suicide.

On the other hand, the religious ban on suicide can be easily explained as a way to reconcile the explicitly stated belief that death "allows the mind to escape to a better place", with the implicit but effective belief that death actually sucks.

comment by dankane · 2014-03-13T06:20:02.857Z · LW(p) · GW(p)

So what happens when AIXI determines that there's this large computer, call it BRAIN, whose outputs tend to exactly correlate with its own? AIXI may then discover the hypothesis that the observed effects of AIXI's outputs on the world are really caused by BRAIN's outputs. It may attempt to test this hypothesis by making some trivial modification to BRAIN so that its outputs differ from AIXI's at some inconsequential time (not by dropping an anvil on BRAIN, because this would be very costly if the hypothesis is true). After verifying this, AIXI may then determine that various hardware improvements to BRAIN will cause its outputs to more closely match the theoretical Solomonoff inductor, thus improving AIXI's long-term payoff.

I mean, AIXI is waaaay too complicated for me to actually properly predict, but is this scenario actually so unreasonable?

comment by Rob Bensinger (RobbBB) · 2014-03-13T07:01:43.459Z · LW(p) · GW(p)

I think that's a reasonable scenario. AIXI will treat BRAIN the same way it would treat any other tool in its environment, like a shovel, a discarded laptop, or a remote-controlled robot. It can learn about BRAIN's physical structure, and about ways to improve BRAIN.

The problem is that BRAIN will always be just a tool. AIXI won't expect there to be any modification to BRAIN that can destroy AIXI's input, output, or work streams, nor any modifications that are completely unprecedented in its own experience. You'll be a lot more careful when experimenting on an object you think is you, than when experimenting on an object you think is a useful toy. Treating your body as you means you can care about your bodily modifications without delusion, and you can make predictions about unprecedented changes to your mind by generalizing from the minds of other bodies you've observed.

comment by dankane · 2014-03-13T15:42:50.971Z · LW(p) · GW(p)

Well, if AIXI believes that its interactions with the physical world are only due to the existence of BRAIN, it might not model the destruction of BRAIN as leading to the destruction of its input, output, and work streams (though in some sense this doesn't actually happen, since these are idealized concepts anyway), but it does model it as causing its output stream to no longer be able to affect its input stream, which seems like enough reason to be careful about making modifications.

comment by Nick_Tarleton · 2014-03-22T17:18:47.189Z · LW(p) · GW(p)

Other possible implications of this scenario have been discussed on LW before.

comment by V_V · 2014-03-13T14:46:00.044Z · LW(p) · GW(p)

I thought something similar.

comment by Cyan · 2014-03-12T05:00:19.517Z · LW(p) · GW(p)

I commented on the previous post a few days after it went up detailing some misgivings about the arguments presented there (I guess you missed my comment). I was reading this post with burgeoning hope that my misgivings would be inadvertently addressed, and then I encountered this:

AIXI doesn't know that its future behaviors depend on a changeable, material object implementing its memories. The notion isn't even in its hypothesis space.

But if "naturalized induction" is a computer program, then the notion is in AIXI's hypothesis space -- by definition.

Going back to the post to read some more...

comment by Rob Bensinger (RobbBB) · 2014-03-12T05:16:58.394Z · LW(p) · GW(p)

I saw your comment; the last section ('Beyond Solomonoff?') speaks to the worry you raised. Somewhere in AIXI's hypothesis space is a reasoner R that is a reductionist about R; AIXI can simulate human scientists, for example. But nowhere in AIXI's hypothesis space is a reasoner that is a native representation of AIXI as 'me', as the agent doing the hypothesizing.

One way I'd put this is that AIXI can entertain every physical hypothesis, but not every indexical hypothesis. Being able to consider all the objective ways the world could be doesn't mean you're able to consider all the ways you could be located in the world.

AIXI's hypothesis space does include experts on AIXI that could give it advice about how best to behave like a naturalist. Here the problem isn't that the hypotheses are missing, but that they don't look like they'll be assigned a reasonable prior probability.

comment by V_V · 2014-03-13T14:45:07.761Z · LW(p) · GW(p)

But nowhere in AIXI's hypothesis space is a reasoner that is a native representation of AIXI as 'me', as the agent doing the hypothesizing.

I disagree: Among all the world-programs in AIXI model space, there are some programs where, after AIXI performs one action, all its future actions are ignored and control is passed to a subroutine "AGENT" in the program. In principle AIXI can reason that if the last action it performs damages AGENT, e.g. by dropping an anvil on its head, the reward signal, computed by some reward subroutine in the world-program, won't be maximized anymore.
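V_V's world-program structure can be made concrete with a toy sketch. This is a hypothetical illustration, not part of Hutter's formalism: after the first action, the environment stops listening to external actions and instead consults an embedded AGENT subroutine, and the reward subroutine pays out based on what AGENT does.

```python
# Toy sketch of the kind of world-program described above: after the first
# action, the environment ignores the external agent's later outputs and
# consults an embedded AGENT subroutine, whose behavior determines reward.
# All names and numbers here are invented, purely for illustration.

def agent(observation):
    """Embedded AGENT: a stand-in for a computable approximation of AIXI."""
    return observation  # a trivially correct policy for this toy world

def run_world(first_action, steps=5):
    """Total reward over `steps`: the reward subroutine pays 1 per step
    for a correct act, so damaging AGENT on step one forfeits all reward."""
    damaged = (first_action == "drop_anvil")
    reward = 0
    for obs in range(steps):
        act = None if damaged else agent(obs)
        reward += 1 if act == obs else 0
    return reward
```

In this toy model, `run_world("noop")` returns 5 while `run_world("drop_anvil")` returns 0, so a reward-maximizer that assigns this hypothesis weight has an expected-reward reason not to drop the anvil, even though "it" never appears in the program as an indexical self.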

Of course there are the usual computability issues: the true AIXI is uncomputable, hence the AGENTs would actually be a complexity-weighted mixture of its computable approximations. AIXItl would have the same issue w.r.t. the resource bounds t and l.
I'm not sure this is necessarily a severe issue. Anyway, I suppose that AIXItl could be modified in some UDT-like way to include a quined copy of its source code and recognize copies of itself inside the world-programs.

The other issue is how does AIXI learn to assign high weights to these world-programs in a non-ergodic environment? Humans seem to manage to do that by a combination of innate priors and tutoring. I suppose that something similar is in principle applicable to AIXI.

comment by Cyan · 2014-03-13T18:57:42.755Z · LW(p) · GW(p)

It seems worth saying at this point that I don't have an objection to loading up an AI with true prior information; it's just not clear to me that a Solomonoff approximator would be incapable of learning that it's part of the Universe and that its continued existence is contingent on the persistence of some specific structure in the Universe.

comment by Cyan · 2014-03-12T05:12:56.574Z · LW(p) · GW(p)

But, just as HALT-predicting programs are more complex than immortalist programs, other RADICAL-TRANSFORMATION-OF-EXPERIENCE-predicting programs are too. For every program in AIXI's ensemble that's a reductionist, there will be simpler agents that mimic the reductionist's retrodictions and then make non-naturalistic predictions.

So this seems to be the root of the problem. Contrary to what you argued in the previous post, my intuition is that the programs that make non-naturalistic predictions are not shorter. Generically non-naturalistic programs get ruled out during the process of learning how the world works; programs that make non-naturalistic predictions specifically about what AIXI(tl) will experience after smashing itself have to treat the chunk of the Universe carrying out the computation as special, which is what makes them less simple than programs that do not single out that chunk of the Universe as special.

As you can see, my intuition is quite at odds with the intuition inspired by noticing that programs with a HALT instruction are always longer than programs that just chop off said HALT instruction.
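Both intuitions ultimately turn on which program is shorter, since the Solomonoff prior weights a program of length L bits by 2**-L. A minimal arithmetic sketch of that weighting (the lengths below are made-up numbers, not real program measurements):

```python
# Under the Solomonoff prior, each extra bit of program length halves a
# hypothesis's weight. The dispute above is over which side pays the extra
# bits: the explicit HALT instruction, or the machinery needed to single
# out one chunk of the universe as special. Lengths here are invented.

def solomonoff_weight(length_bits):
    """Prior weight 2**-L for a program of length L bits."""
    return 2.0 ** -length_bits

with_halt = solomonoff_weight(101)     # program plus one extra HALT-like instruction
without_halt = solomonoff_weight(100)  # the same program with that instruction removed

ratio = without_halt / with_halt       # 2.0: one bit shorter, twice the prior
```

Whichever family of programs turns out to need the extra description bits, whether the mortal HALT-predictors or the non-naturalistic chunk-of-the-universe-is-special predictors, is the family the prior penalizes, which is why the two intuitions point in opposite directions.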

comment by itaibn0 · 2014-03-12T23:39:26.000Z · LW(p) · GW(p)

programs that make non-naturalistic predictions specifically about what AIXI(tl) will experience after smashing itself have to treat the chunk of the Universe carrying out the computation as special,

Well, any program to which AIXI gives weight must regard that chunk of the universe as special. After all, it is that chunk that correlates with AIXI's inputs and actions, and indeed the only reason this universe is considered as a hypothesis is so that that chunk would have those correlations.

comment by Cyan · 2014-03-13T04:19:54.821Z · LW(p) · GW(p)

The kind of "special" you're talking about is learnable (and in accord with naturalistic predictions); the kind of "special" I'm talking about is false (cf. the standard arguments against dualism).

comment by TruePath · 2014-03-19T11:42:02.920Z · LW(p) · GW(p)

This is a debate about nothing. Turing completeness tells us that no matter how much it appears that a given Turing-complete representation can only usefully process data about certain kinds of things in reality, it can process data about anything any other language can.

Well duh, but this (and the halting problem) has been taught and yet systematically ignored in programming language design, and this is exactly the same argument.

We are sitting around in the armchair trying to come up with a better means of logic/data representation (be it a programming language or the underlying AI structure) as if the debate were about mathematical elegance or some such objective notion. Until you prove to me that some system in AIXI can't duplicate the behavior the other system can (modulo semantic changes as to what we call a punishment), and vice versa, mutual duplicability is the likely scenario.

So what would make one model for AI better than another? These vague theoretical issues? No, no more than how fancy your type system is determines the productiveness of your programming language. Ultimately, the hurdle to overcome is that HUMANS need to build and reason about these systems, and we are more inclined to certain kinds of mistakes than others. For instance, I might write a great language using the full calculus of inductive constructions as a type system, and still do type inference almost everywhere, but if my language looks like line noise rather than human words, all that math is irrelevant.

I mean, ask yourself why human programming and genetic programming are so different. Because the model you use to build up your system has a far greater impact on your ability to understand what is going on than any other effect. Sure, if you write in pure assembly, JMPs everywhere, with crazy code-packing tricks, it goes faster, but you still lose.

If I'm right about this case as well, it can only be decided by practical experiments where you have people try to reason in (simplified) versions of the systems and see what can and can't be easily fixed.

comment by hairyfigment · 2014-03-20T17:02:12.122Z · LW(p) · GW(p)

You seem to be missing the point. AIXI should be able to reason effectively if it incorporates a solution to the problem of naturalistic induction which this whole sequence is trying to get at. But the OP argues that even an implausibly-good approximation of AIXI won't solve that problem on its own. We can't fob the work off onto an AI using this model. (The OP makes this argument, first in "AIXI goes to school," and then more technically in "Death to AIXI.")

Tell me if this seems like a strawman of your comment, but you seem to be saying we just need to make AIXI easier to program. That won't help if we don't know how to solve the problem - as you point out in another comment, part of our own understanding is not directly accessible to our understanding, so we don't know how our own brain-design solves this (to the extent that it does).

comment by TheAncientGeek · 2014-03-19T14:59:21.763Z · LW(p) · GW(p)

A TM can process data about anything, provided a human is supplying the interpretation. Nothing follows from that about a software system's ability to attach intrinsic meaning to anything.

comment by Peterdjones · 2014-03-12T18:37:51.658Z · LW(p) · GW(p)

Regarding the anvil problem: you have argued with great thoroughness that one can't perfectly prevent an AIXI from dropping an anvil on its head. However, I can't see the necessity. We would need to get the probability of a dangerously unfriendly SAI as close to zero as possible, because it poses an existential threat. However, a suicidally foolish AIXI is only a waste of money.

Humans have a negative reinforcement channel relating to bodily harm called pain. It isn't perfect, but it's good enough to train most humans to avoid doing suicidally stupid things. Why would an AIXI need anything better? You might want to answer that there is some danger related to an AIXI's intelligence, but its clock speed, or whatever, could be throttled during training.

Also, any seriously intelligent AI made with the technology of today, or the near future, is going to require a huge farm of servers. The only way it could physically interact with the world is through a remote-controlled body... and if it drops an anvil on that, it actually will survive as a mind!

comment by Rob Bensinger (RobbBB) · 2014-03-12T22:41:32.817Z · LW(p) · GW(p)

a suicidally foolish AIXI is only a waste of money.

It's also a waste of time and intellectual resources. I raised this point with Adele last month.

It isn't perfect, but it's good enough to train most humans to avoid doing suicidal stupid things. Why would an AIXI need anything better?

It's good enough for some purposes, but even in the case of humans it doesn't protect a lot of people from suicidally stupid behavior like 'texting while driving' or 'drinking immoderately' or 'eating cookies'. To the extent we don't rely on our naturalistic ability to reason abstractly about death, we're dependent on the optimization power (and optimization targets) of evolution. A Cartesian AI would require a lot of ad-hoc supervision and punishment from a human, in the same way young or unreflective humans depend for their survival on an adult supervisor or on innate evolved intelligence. This would limit an AI's ability to outperform humans in adaptive intelligence.

if it drops an anvil on that, it actually will survive as a mind!

Sure. In that scenario, the robot body functions like the robot arm I've used in my examples. Destroying the robot (arm) limits the AI's optimization power without directly damaging its software. AIXI will be unusually bad at figuring out for itself not to destroy its motor or robot, and may make strange predictions about the subsequent effects of its output sequence. If AIXI can't perceive most of its hardware, that exacerbates the problem.

comment by Peterdjones · 2014-03-13T18:43:47.691Z · LW(p) · GW(p)

I am aware that humans have a non-zero level of life-threatening behaviour. If we wanted it to be lower, we could make it lower, at the expense of various costs. We don't, which seems to mean we are happy with the current cost-benefit ratio. Arguing, as you have, that the risk of AI self-harm can't be reduced to zero doesn't mean we can't hit an actuarial optimum.

It is not clear to me why you think safety training would limit intelligence.

comment by Rob Bensinger (RobbBB) · 2014-03-14T00:02:38.178Z · LW(p) · GW(p)

You're cutting up behaviors into two categories, 'safe conduct' and 'unsafe conduct'. I'm making a finer cut, one that identifies systematic reasons some kinds of safe or unsafe behavior occur.

If you aren't seeing why it's useful to distinguish 'I dropped an anvil on my head because I'm a Cartesian' from 'I dropped an anvil on my head because I'm a newborn who's never seen dangerous things happen to anyone', consider the more general dichotomy: 'Errors due to biases in prior probability distribution' v. 'Errors due to small or biased data sets'.

AIMU is the agent designed to make no mistakes of the latter kind; AIXI is not such an agent, and is only intended to avoid mistakes of the former kind. AIXI is supposed to be a universal standard for induction because it only gets things wrong to the extent its data fails it, not to the extent it started off with a-priori wrong assumptions. My claim is that for a physically implemented AIXI-style agent, AIXI fails in its prior, not just in its lack of data-omniscience.

'You aren't omniscient about the data' is a trivial critique, because we could never build something physically omniscient. ('You aren't drawing conclusions from the data in a computationally efficient manner' is a more serious critique, but one I'm bracketing for present purposes, because AIXI isn't intended to be computationally efficient.) Instead, my main critique is of AIXI's Solomonoff prior. (A subsidiary critique of mine is directed at reinforcement learning, but I won't write more about that in this epistemological setting.)

In sum: We should be interested in why the AI is making its mistakes, not just in its aggregate error rate. When we become interested in that, we notice that AIXI makes some mistakes because it's biased, not just because it's ignorant. That matters because (a) we could never fully solve the problem of ignorance, but we might be able to fully solve the problem of bias; and (b) if we build a sufficiently smart self-modifier it should be able to make headway on ignorance itself, whereas it won't necessarily make headway on fixing its own biases. Problems with delusive hypothesis spaces and skewed priors are worrisome even when they only occasionally lead to mistakes, because they're the sorts of problems that can be permanent, problems agents suffering from them may not readily converge on solutions to.