## Comments

**alexmennen** on AlphaStar: Impressive for RL progress, not for AGI progress · 2019-11-02T19:51:47.196Z · score: 21 (11 votes) · LW · GW

The impressive part is getting reinforcement learning to work at all in such a vast state space

It seems to me that that is AGI progress? The real world has an even vaster state space, after all. Getting things to work in vast state spaces is a necessary precondition for AGI.

**alexmennen** on What are we assuming about utility functions? · 2019-10-03T16:54:44.047Z · score: 6 (3 votes) · LW · GW

Ok, I see what you mean about independence of irrelevant alternatives only being a real coherence condition when the probabilities are objective (or otherwise known to be equal because they come from the same source, even if there isn't an objective way of saying what their common probability is).

But I disagree that this makes VNM only applicable to settings in which all sources of uncertainty have objectively correct probabilities. As I said in my previous comment, you only need there to exist some source of objective probabilities, and you can then use preferences over lotteries involving objective probabilities and preferences over related lotteries involving other sources of uncertainty to determine what probability the agent must assign for those other sources of uncertainty.

Re: the difference between VNM and Bayesian expected utility maximization, I take it from the word "Bayesian" that the way you're supposed to choose between actions does involve first coming up with probabilities of each outcome resulting from each action, and from "expected utility maximization", that these probabilities are to be used in exactly the way the VNM theorem says they should be. Since the VNM theorem does not make any assumptions about where the probabilities came from, these still sound essentially the same, except with Bayesian expected utility maximization being framed to emphasize that you have to get the probabilities somehow first.

**alexmennen** on What are we assuming about utility functions? · 2019-10-03T06:10:34.388Z · score: 15 (5 votes) · LW · GW

I think you're underestimating VNM here.

only two of those four are relevant to coherence. The main problem is that the axioms relevant to coherence (acyclicity and completeness) do not say anything at all about probability

It seems to me that the independence axiom is a coherence condition, unless I misunderstand what you mean by coherence?

correctly point out problems with VNM

I'm curious what problems you have in mind, since I don't think VNM has problems that don't apply to similar coherence theorems.

VNM utility stipulates that agents have preferences over "lotteries" with known, objective probabilities of each outcome. The probabilities are assumed to be objectively known from the start. The Bayesian coherence theorems do not assume probabilities from the start; they derive probabilities from the coherence criteria, and those probabilities are specific to the agent.

One can construct lotteries with probabilities that are pretty well understood (e.g. flipping coins that we have accumulated a lot of evidence are fair), and restrict attention to lotteries only involving uncertainty coming from such sources. One may then get probabilities for other, less well-understood sources of uncertainty by comparing preferences involving such uncertainty to preferences involving easy-to-quantify uncertainty (e.g. if A is preferred to B, and you're indifferent between 60%A+40%B and "A if X, B if not-X", then you assign probability 60% to X). Perhaps not quite as philosophically satisfying as deriving probabilities from scratch, but this doesn't seem like a fatal flaw in VNM to me.
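For concreteness, here is the expected-utility arithmetic behind that parenthetical example, assuming the agent maximizes expected utility with some as-yet-unknown subjective probability $p(X)$ (the symbols $u$ and $p$ are mine, not part of the original comment):

```latex
% Indifference between the objective lottery and the event-based gamble pins down p(X).
\[
u(0.6A + 0.4B) = 0.6\,u(A) + 0.4\,u(B), \qquad
u(A \text{ if } X,\ B \text{ if not-}X) = p(X)\,u(A) + \bigl(1 - p(X)\bigr)u(B).
\]
\[
\text{Indifference} \;\Longrightarrow\; p(X)\bigl(u(A) - u(B)\bigr) = 0.6\bigl(u(A) - u(B)\bigr)
\;\Longrightarrow\; p(X) = 0.6,
\]
% using u(A) != u(B), which holds because A is strictly preferred to B.
```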

I do not expect agent-like systems in the wild to be pushed toward VNM expected utility maximization. I expect them to be pushed toward Bayesian expected utility maximization.

I understood those as being synonyms. What's the difference?

**alexmennen** on [AN #66]: Decomposing robustness into capability robustness and alignment robustness · 2019-10-02T00:58:31.657Z · score: 6 (3 votes) · LW · GW

I do, however, believe that the single step cooperate-defect game which they use to come up with their factors seems like a very simple model for what will be a very complex system of interactions. For example, AI development will take place over time, and it is likely that the same companies will continue to interact with one another. Iterated games have very different dynamics, and I hope that future work will explore how this would affect their current recommendations, and whether it would yield new approaches to incentivizing cooperation.

It may be difficult for companies to get accurate information about how careful their competitors are being about AI safety. An iterated game in which players never learn what the other players did on previous rounds is the same as a one-shot game. This points to a sixth factor that increases chance of cooperation on safety: high transparency, so that companies may verify their competitors' cooperation on safety. This is closely related to high trust.

**alexmennen** on A Critique of Functional Decision Theory · 2019-09-16T03:40:17.491Z · score: 17 (6 votes) · LW · GW

I object to the framing of the bomb scenario on the grounds that low probabilities of high stakes are a source of cognitive bias that trips people up for reasons having nothing to do with FDT. Consider the following decision problem: "There is a button. If you press the button, you will be given $100. Also, pressing the button has a very small (one in a trillion trillion) chance of causing you to burn to death." Most people would not touch that button. Using the same payoffs and probabilities in a scenario to challenge FDT thus exploits cognitive bias to make FDT look bad. A better scenario would be to replace the bomb with something that will fine you $1000 (and, if you want, also increase the chance of error).
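To see how lopsided the button's payoffs actually are, here is a rough expected-value comparison; the $10^{10}$ dollar-equivalent assigned to death is an arbitrary illustrative figure, not a claim from the comment:

```latex
% "One in a trillion trillion" = 10^{-24}; \$10^{10} is an arbitrary stand-in for the disvalue of death.
\[
\mathbb{E}[\text{press}] \;\approx\; \$100 \;-\; 10^{-24}\times\$10^{10}
\;=\; \$100 \;-\; \$10^{-14} \;\approx\; \$100 .
\]
% Refusing the button forgoes essentially the entire $100; the hesitation comes from the
% vividness of the low-probability outcome, not from the arithmetic.
```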

But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.

I think the crucial difference here is how easily you can cause the predictor to be wrong. In the case where the predictor simulates you, if you two-box, then the predictor expects you to two-box. In the case where the predictor uses your nationality to predict your behavior (Scots usually one-box, and you're Scottish), if you two-box, then the predictor will still expect you to one-box, because you're Scottish.

But now suppose that the pathway by which S causes there to be money in the opaque box or not is that another agent looks at S...

I didn't think that was supposed to matter at all? I haven't actually read the FDT paper, and have mostly just been operating under the assumption that FDT is basically the same as UDT, but UDT didn't build in any dependency on external agents, and I hadn't heard about any such dependency being introduced in FDT; it would surprise me if it did.

**alexmennen** on A Critique of Functional Decision Theory · 2019-09-16T02:15:11.538Z · score: 4 (2 votes) · LW · GW

I don't know if I'm a simulation or a real person.

A possible response to this argument is that the predictor may be able to accurately predict the agent without explicitly simulating them. A possible counter-response to this is to posit that any sufficiently accurate model of a conscious agent is necessarily conscious itself, whether the model takes the form of an explicit simulation or not.

**alexmennen** on Troll Bridge · 2019-09-15T16:34:59.969Z · score: 2 (1 votes) · LW · GW

I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do. After all, knowing the agent's source code, if you see it start to cross the bridge, it is correct to infer that its reasoning is inconsistent, and you should expect to see the troll blow up the bridge. But while deciding what to do, the agent should be able to reason about purely causal effects of its counterfactual behavior, screening out other logical implications.

Also, counterfactuals which predict that the bridge blows up seem to be saying that the agent can control whether PA is consistent or inconsistent.

Disagree that that's what's happening. The link between the consistency of the reasoning system and the behavior of the agent is there because the consistency of the reasoning system controls the agent's behavior, rather than the other way around. Since the agent is selecting actions based on their consequences, it does make sense to speak of the agent choosing actions to some extent, but speaking of the logical implications of the agent's actions for the consistency of formal systems as the agent "controlling" the consistency of those systems seems like an inappropriate attribution of agency to me.

**alexmennen** on A Primer on Matrix Calculus, Part 2: Jacobians and other fun · 2019-08-19T16:44:00.348Z · score: 3 (2 votes) · LW · GW

I suppose that's why we're not minimizing the determinant, but rather the Frobenius norm.

Yes, although another reason is that the determinant is only defined if the input and output spaces have the same dimension, which they typically don't.

**alexmennen** on A Primer on Matrix Calculus, Part 3: The Chain Rule · 2019-08-18T19:46:36.750Z · score: 5 (3 votes) · LW · GW

First, a vector can be seen as a list of numbers, and a matrix can be seen as an ordered list of vectors. An ordered list of matrices is... a tensor of order 3. Well not exactly. Apparently some people are actually disappointed with the term tensor because a tensor means something very specific in mathematics already and isn't just an ordered list of matrices. But whatever, that's the term we're using for this blog post at least.

It's true that tensors are something more specific than multidimensional arrays of numbers, but Jacobians of functions between tensor spaces (that being what you're using the multidimensional arrays for here) are, in fact, tensors.

**alexmennen** on A Primer on Matrix Calculus, Part 2: Jacobians and other fun · 2019-08-18T04:58:29.491Z · score: 5 (3 votes) · LW · GW

What this means for the Jacobian is that the determinant tells us how much space is being squished or expanded in the neighborhood around a point. If the output space is being expanded a lot at some input point, then this means that the neural network is a bit unstable at that region, since minor alterations in the input could cause huge distortions in the output. By contrast, if the determinant is small, then some small change to the input will hardly make a difference to the output.

This isn't quite true; the determinant being small is consistent with small changes in input making arbitrarily large changes in output, just so long as small changes in input in a different direction make sufficiently small changes in output.
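A minimal numerical illustration of this point (numpy; the specific matrix is made up): the Jacobian below has a tiny determinant, yet it stretches one input direction by a factor of 1000.

```python
import numpy as np

# A made-up Jacobian: it expands the first input direction by 1000x and contracts
# the second by 10^-6, so the determinant is tiny (10^-3) even though some small
# input changes produce huge output changes.
J = np.diag([1000.0, 1e-6])

print(np.linalg.det(J))                             # ~0.001: "small determinant"
print(np.linalg.norm(J @ np.array([1.0, 0.0])))     # 1000.0: large stretch in one direction
print(np.linalg.norm(J, "fro"))                     # ~1000.0: the Frobenius norm notices the stretch
```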

The frobenius norm is nothing complicated, and is really just a way of describing that we square all of the elements in the matrix, take the sum, and then take the square root of this sum.

An alternative characterization of the Frobenius norm better highlights its connection to the motivation for regularizing the Jacobian's Frobenius norm, namely limiting the extent to which small changes in input can cause large changes in output: the Frobenius norm of a matrix J is proportional (by a factor depending only on the dimension) to the root-mean-square of |Jx| over all unit vectors x.
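A quick Monte Carlo sanity check of that characterization (numpy; the dimension-dependent factor works out to $\sqrt{n}$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
J = rng.normal(size=(n, n))                  # an arbitrary example "Jacobian"

# Estimate the root-mean-square of |Jx| over random unit vectors x.
xs = rng.normal(size=(100_000, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
rms = np.sqrt(np.mean(np.linalg.norm(xs @ J.T, axis=1) ** 2))

print(np.linalg.norm(J, "fro"))              # Frobenius norm of J
print(np.sqrt(n) * rms)                      # agrees with it up to Monte Carlo error
```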

**alexmennen** on If physics is many-worlds, does ethics matter? · 2019-07-10T17:37:58.100Z · score: 26 (15 votes) · LW · GW

"Controlling which Everett branch you end up in" is the wrong way to think about decisions, even if many-worlds is true. Brains don't appear to rely much on quantum randomness, so if you make a certain decision, that probably means that the overwhelming majority of identical copies of you make the same decision. You aren't controlling which copy you are; you're controlling what all of the copies do. And even if quantum randomness does end of mattering in decisions, so that a non-trivial proportion of copies of you make different decisions from each other, then you would still presumably want a high proportion of them to make good decisions; you can do your part to bring that about by making good decisions yourself.

**alexmennen** on If physics is many-worlds, does ethics matter? · 2019-07-10T17:24:44.198Z · score: 22 (11 votes) · LW · GW

Consider reading a real physicist's take on the issue

This seems phrased to suggest that her view is "the real physicist view" on the multiverse. You could also read what Max Tegmark or David Deutsch, for instance, have to say about multiverse hypotheses and get a "real physicist's" view from them.

Also, she doesn't actually say much in that blog post. She points out that when she says that multiverse hypotheses are unscientific, she doesn't mean that they're false, so this doesn't seem especially useful to someone who wants to know whether there actually is a multiverse, or is interested in the consequences thereof. She says "there is no reason to think we live in such multiverses to begin with", but proponents of multiverse hypotheses have given reasons to support their views, which she doesn't address.

**alexmennen** on Von Neumann’s critique of automata theory and logic in computer science · 2019-05-29T01:17:38.188Z · score: 2 (1 votes) · LW · GW

#1 (at the end) sounds like complexity theory.

Some of what von Neumann says makes it sound like he's interested in a mathematical foundation for analog computing, which I think has been done by now.

**alexmennen** on And My Axiom! Insights from 'Computability and Logic' · 2019-01-17T02:11:03.435Z · score: 6 (4 votes) · LW · GW

On several occasions, the authors emphasize how the intuitive nature of "effective computability" renders futile any attempt to formalize the thesis. However, I'm rather interested in formalizing intuitive concepts and therefore wondered why this hasn't been attempted.

Formalizing the intuitive notion of effective computability was exactly what Turing was trying to do when he introduced Turing machines, and Turing's thesis claims that his attempt was successful. If you come up with a new formalization of effective computability and prove it equivalent to Turing computability, then in order to use this as a proof of Turing's thesis, you would need to argue that your new formalization is correct. But such an argument would inevitably be informal, since it links a formal concept to an informal concept, and there already have been informal arguments for Turing's thesis, so I don't think there is anything really fundamental to be gained from this.

**alexmennen** on And My Axiom! Insights from 'Computability and Logic' · 2019-01-17T01:57:56.471Z · score: 4 (2 votes) · LW · GW

Consider the halting set; ... is not enumerable / computable.

...

Here, we should be careful with how we interpret "information". After all, coNP-complete problems are trivially Cook reducible to their NP-complete counterparts (e.g., query the oracle and then negate the output), but many believe that there isn't a corresponding Karp reduction (where we do a polynomial amount of computation before querying the oracle and returning its answer). Since we aren't considering complexity but instead whether it's enumerable at all, complementation is fine.

You're using the word "enumerable" in a nonstandard way here, which might indicate that you've missed something (and if not, then perhaps at least this will be useful for someone else reading this). "Enumerable" is not usually used as a synonym for computable. A set is computable if there is a program that determines whether or not its input is in the set. But a set is enumerable if there is a program that halts if its input is in the set, and does not halt otherwise. Every computable set is enumerable (since you can just use the output of the computation to decide whether or not to halt). But the halting set is an example of a set that is enumerable but not computable (it is enumerable because you can just run the program coded by your input, and halt if/when it halts). Enumerable sets are not closed under complementation; in fact, an enumerable set whose complement is enumerable is computable (because you can run the programs for the set and its complement in parallel on the same input; eventually one of them will halt, which will tell you whether or not the input is in the set).

The distinction between Cook and Karp reductions remains meaningful when "polynomial-time" is replaced by "Turing computable" in the definitions. Any set that is Turing-Karp reducible to an enumerable set is also enumerable, whereas the complement of an enumerable set is Turing-Cook reducible to it without necessarily being enumerable.

The reason "enumerable" is used for this concept is that a set is enumerable iff there is a program computing a sequence that enumerates every element of the set. Given a program that halts on exactly the elements of a given set, you can construct an enumeration of the set by running your program on every input in parallel, and adding an element to the end of your sequence whenever the program halts on that input. Conversely, given an enumeration of a set, you can construct a program that halts on elements of the set by going through the sequence and halting whenever you find your input.

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-18T07:33:58.989Z · score: 2 (1 votes) · LW · GW

I don't follow the analogy to 1/x being a partial function that you're getting at.

Maybe a better way to explain what I'm getting at is that it's really the same issue that I pointed out for the two-envelopes problem, where you know the amount of money in each envelope is finite, but the uniform distribution up to an infinite surreal would suggest that the probability that the amount of money is finite is infinitesimal. Suppose you say that the size of the ray $[0,\infty)$ is an infinite surreal number $n$. The size of the portion of this ray that is distance at least $r$ from $0$ is $n-r$ when $r$ is a positive real, so presumably you would also want this to be so for surreal $r$. But using, say, $r=\sqrt{n}$, every point in $[0,\infty)$ is within distance $\sqrt{n}$ of $0$, yet this rule would say that the measure of the portion of the ray that is farther than $\sqrt{n}$ from $0$ is $n-\sqrt{n}$; that is, almost all of the measure of $[0,\infty)$ is concentrated on the empty set.

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-16T19:29:34.761Z · score: 3 (2 votes) · LW · GW

The latter. It doesn't even make sense to speak of maximizing the expectation of an unbounded utility function, because unbounded functions don't even have expectations with respect to all probability distributions.

There is a way out of this that you could take, which is to only insist that the utility function has to have an expectation with respect to probability distributions in some restricted class, if you know your options are all going to be from that restricted class. I don't find this very satisfying, but it works. And it offers its own solution to Pascal's mugging, by insisting that any outcome whose utility is on the scale of 3^^^3 has prior probability on the scale of 1/(3^^^3) or lower.

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-16T19:18:05.341Z · score: 5 (3 votes) · LW · GW

It's a bad bullet to bite. Its symmetries are essential to what makes Euclidean space interesting.

And here's another one: are you not bothered by the lack of countable additivity? Suppose you say that the volume of Euclidean space is some surreal number $V$. Euclidean space is the union of an increasing sequence of balls. The volumes of these balls are all finite, in particular, all less than the infinite surreal $\sqrt{V}$, so how can you justify saying that their union has volume greater than $\sqrt{V}$?
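Spelling out the countable-additivity requirement being appealed to here, with $B_k$ denoting the ball of radius $k$ (notation mine, not the original comment's):

```latex
% Countable additivity applied to Euclidean space written as a union of concentric balls B_k.
\[
\mathbb{R}^3 = \bigcup_{k=1}^{\infty} B_k
\quad\Longrightarrow\quad
\mathrm{vol}(\mathbb{R}^3) = \mathrm{vol}(B_1) + \sum_{k=1}^{\infty} \mathrm{vol}(B_{k+1}\setminus B_k).
\]
% Every partial sum on the right is a finite real, but an increasing sequence of finite
% numbers has no least upper bound in the surreals: every infinite surreal exceeds all of
% the partial sums, so no particular infinite value is singled out as the total volume.
```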

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-15T19:46:04.097Z · score: 2 (1 votes) · LW · GW

Why? Plain sequences are a perfectly natural object of study. I'll echo gjm's criticism that you seem to be trying to "resolve" paradoxes by changing the definitions of the words people use so that they refer to unnatural concepts that have been gerrymandered to fit your solution, while refusing to talk about the natural concepts that people actually care about.

I don't think your proposal is a good one for indexed sequences either. It is pretty weird that shifting the indices of your sequence over by 1 could change the size of the sequence.

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-15T19:04:55.957Z · score: 2 (1 votes) · LW · GW

What about rotations, and the fact that we're talking about destroying a bunch of symmetry of the plane?

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-15T17:11:04.375Z · score: 2 (1 votes) · LW · GW

There are measurable sets whose volumes will not be preserved if you try to measure them with surreal numbers. For example, consider $[0,\infty)$. Say its measure is some infinite surreal number $n$. The volume-preserving left-shift operation $x\mapsto x-1$ sends $[0,\infty)$ to $[-1,\infty)$, which has measure $n+1$, since $[-1,0)$ has measure $1$. You can do essentially the same thing in higher dimensions, and the shift operation in two dimensions ($(x,y)\mapsto(x-1,y)$) can be expressed as the composition of two rotations, so rotations can't be volume-preserving either. And since different rotations will have to fail to preserve volumes in different ways, this will break symmetries of the plane.

I wouldn't say that volume-preserving transformations fail to preserve volume on non-measurable sets, just that non-measurable sets don't even have measures that could be preserved or not preserved. Failing to preserve measures of sets that you have assigned measures to is entirely different. Non-measurable sets also don't arise in mathematical practice; half-spaces do. I'm also skeptical of the existence of non-measurable sets, but the non-existence of non-measurable sets is a far bolder claim than anything else I've said here.

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-15T02:05:38.665Z · score: 2 (1 votes) · LW · GW

Indeed Pascal's Mugging type issues are already present with the more standard infinities.

Right, infinity of any kind (surreal or otherwise) doesn't belong in decision theory.

"Surreal numbers are not the right tool for measuring the volume of Euclidean space or the duration of forever" - why?

How would you? If you do something like taking an increasing sequence of bounded subsets that fill up the space you're trying to measure, find a formula f(n) for the volume of the nth subset, and plug in $n=\omega$, the result will be highly dependent on which increasing sequence of bounded subsets you use. Did you have a different proposal? It's sort of hard to explain why no method for measuring volumes using surreal numbers can possibly work well, though I am confident it is true. At the very least, volume-preserving transformations like shifting everything 1 meter to the left or rotating everything around some axis cease to be volume-preserving, though I don't know if you'd find this convincing.
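A concrete instance of that dependence, using the ray $[0,\infty)$ as the space to be measured (my own example, chosen for simplicity):

```latex
% Two increasing sequences of bounded subsets that both exhaust [0, \infty),
% but whose volume formulas disagree when evaluated at omega.
\[
A_n = [0,n]: \quad f(n) = n \;\longrightarrow\; f(\omega) = \omega, \qquad
A'_n = [0,n^2]: \quad f(n) = n^2 \;\longrightarrow\; f(\omega) = \omega^2,
\]
\[
\bigcup_n A_n = \bigcup_n A'_n = [0,\infty), \qquad \omega \neq \omega^2 .
\]
```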

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-15T01:39:51.969Z · score: 4 (2 votes) · LW · GW

You want to conceive of this problem as "a sequence whose order-type is ω", but from the surreal perspective this lacks resolution. Is the number of elements (surreal) ω, ω+1 or ω+1000? All of these are possible given that in the ordinals 1+ω=ω so we can add arbitrarily many numbers to the start of a sequence without changing its order type.

It seems to me that measuring the lengths of sequences with surreals rather than ordinals is introducing fake resolution that shouldn't be there. If you start with an infinite constant sequence 1,1,1,1,1,1,..., and tell me the sequence has size $\omega$, and then you add another 1 to the beginning to get 1,1,1,1,1,1,1,..., and you tell me the new sequence has size $\omega+1$, I'll be like "uh, but those are the same sequence, though. How can they have different sizes?"

**alexmennen** on An Extensive Categorisation of Infinite Paradoxes · 2018-12-14T20:50:11.702Z · score: 14 (6 votes) · LW · GW

Surreal numbers are useless for all of these paradoxes.

Infinitarian paralysis: Using surreal-valued utilities creates more infinitarian paralysis than it solves, I think. You'll never take an opportunity to increase utility by $1$, because it will always have higher expected utility to focus all of your attention on trying to find ways to increase utility by $\omega$: there's some (however small) probability $\varepsilon>0$ that such efforts would succeed, so focusing your efforts on looking for ways to increase utility by $\omega$ has expected utility at least $\varepsilon\omega$, which is higher than $1$. I think a better solution would be to note that for any person, a nonzero fraction of people are close enough to identical to that person that they will make the same decisions, so any decision that anyone makes affects a nonzero fraction of people. Measure theory is probably a better framework than surreal numbers for formalizing what is meant by "fraction" here.

Paradox of the gods: The introduction of surreal numbers solves nothing. Why wouldn't he be able to advance more than $\frac{1}{\omega}$ miles if no gods erect any barriers until he advances $\frac{1}{n}$ miles for some finite $n$?

Two-envelopes paradox: it doesn't make sense to model your uncertainty over how much money is in the first envelope with a uniform surreal-valued probability distribution on $[0,n]$ for an infinite surreal $n$, because then the probability that there is a finite amount of money in the envelope is infinitesimal, but we're trying to model the situation in which we know there's a finite amount of money in the envelope and just have no idea which finite amount.

Sphere of suffering: Surreal numbers are not the right tool for measuring the volume of Euclidean space or the duration of forever.

Hilbert hotel: As you mentioned, using surreals in the way you propose changes the problem.

Trumped, Trouble in St. Petersburg, Soccer teams, Can God choose an integer at random?, The Headache: Using surreals in the way you propose in each of these changes the problems in exactly the same way it does for the Hilbert hotel.

St. Petersburg paradox: If you pay infinity dollars to play the game, then you lose infinity dollars with probability 1. Doesn't sound like a great deal.

Banach-Tarski Paradox: The free group only consists of sequences of finite length.

The Magic Dartboard: First, a nitpick: that proof relies on the continuum hypothesis, which is independent of ZFC. Aside from that, the proof is correct, which means any resolution along the lines you're imagining that implies that no magic dartboards exist is going to imply that the continuum hypothesis is false. Worse, the fact that for any countable ordinal, there are countably many smaller countable ordinals and uncountably many larger countable ordinals follows from very minimal mathematical assumptions, and is often used in descriptive set theory without bringing in the continuum hypothesis at all, so if you start trying to change math to make sense of "the second half of the countable ordinals", you're going to have a bad time.

Parity paradoxes: The lengths of the sequences involved here are the ordinal $\omega$, not a surreal number. You might object that there is also a surreal number called $\omega$, but this is different from the ordinal $\omega$. Arithmetic operations act differently on ordinals than they do on the copies of those ordinals in the surreal numbers, so there's no reasonable sense in which the surreals contain the ordinals. Example: if you add another element to the beginning of either sequence (i.e. flip the switch one additional time before the process starts, or add an extra term at the beginning of the sum, respectively), then you've added one thing, so the surreal number should increase by $1$, but the order-type is unchanged, so the ordinal remains the same.
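The arithmetic difference being pointed to, written out explicitly (these are standard facts about ordinal and surreal arithmetic, not claims from the original comment):

```latex
% Ordinal arithmetic vs. surreal arithmetic on "omega".
\[
\text{Ordinals:} \qquad 1 + \omega = \omega \;\neq\; \omega + 1,
\]
\[
\text{Surreals:} \qquad 1 + \omega = \omega + 1 \;>\; \omega .
\]
% Prepending one element to an omega-sequence leaves its order type unchanged (the ordinal
% behavior), whereas a surreal "length" would have to go up by 1.
```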

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-08-21T21:11:56.038Z · score: 2 (1 votes) · LW · GW

The agent could be programmed to have a certain hard-coded ontology rather than searching through all possible hypotheses weighted by description length.

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-08-15T20:53:35.731Z · score: 2 (1 votes) · LW · GW

I haven't heard the term "platonic goals" before. There's been plenty written on capability control before, but I don't know of anything written before on the strategy I described in this post (although it's entirely possible that there's been previous writing on the topic that I'm not aware of).

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-08-09T00:17:28.157Z · score: 2 (1 votes) · LW · GW

Are you worried about leaks from the abstract computational process into the real world, leaks from the real world into the abstract computational process, or both? (Or maybe neither and I'm misunderstanding your concern?)

There will definitely be tons of leaks from the abstract computational process into the real world; just looking at the result is already such a leak. The point is that the AI should have no incentive to optimize such leaks, not that the leaks don't exist, so the existence of additional leaks that we didn't know about shouldn't be concerning.

Leaks from the outside world into the computational abstraction would be more concerning, since the whole point is to prevent those from existing. It seems like it should be possible to make hardware arbitrarily reliable by devoting enough resources to error detection and correction, which would prevent such leaks, though I'm not an expert, so it would be good to know if this is wrong. There may be other ways to get the AI to act similarly to the way it would in the idealized toy world even when hardware errors create small differences. This is certainly the sort of thing we would want to take seriously if hardware can't be made arbitrarily reliable.

Incidentally, that story about accidental creation of a radio with an evolutionary algorithm was part of what motivated my post in the first place. If the evolutionary algorithm had used tests of its oscillator design in a computer model, rather than in the real world, then it would not have built a radio receiver, since radio signals from nearby computers would not have been included in the computer model of the environment, even though they were present in the actual environment.

**alexmennen** on Probabilistic Tiling (Preliminary Attempt) · 2018-08-08T00:17:44.711Z · score: 2 (1 votes) · LW · GW

What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all turing machines, we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on)

Wouldn't the set of all action sequences have lower description length than some large finite set of policies? There's also the potential problem that all of the policies in the large finite set you're searching over could be quite far from optimal.

**alexmennen** on Probabilistic Tiling (Preliminary Attempt) · 2018-08-07T23:47:29.284Z · score: 2 (1 votes) · LW · GW

Ok, understood on the second assumption. $U$ is not a function to the reals, but a function to the set of real-valued random variables, and your assumption is that this random variable is uncorrelated with certain claims about the outputs of certain policies. The intuitive explanation of the third condition made sense; my complaint was that even with the intended interpretation at hand, the formal statement made no sense to me.

I'm pretty sure you're assuming that $U$ is resolved on day $n$, not that it is resolved eventually.

Searching over the set of all Turing machines won't halt in a reasonably short amount of time, and in fact won't halt ever, since the set of all Turing machines is non-compact. So I don't see what you mean when you say that the computation is not extremely long.

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-08-07T23:16:36.059Z · score: 2 (1 votes) · LW · GW

This model seems very fatalistic, I guess? It seems somewhat incompatible with an agent that has preferences. (Perhaps you're suggesting we build an AI without preferences, but it doesn't sound like that.)

Ok, here's another attempt to explain what I meant. Somewhere in the platonic realm of abstract mathematical structures, there is a small world with physics quite a lot like ours, containing an AI running on some idealized computational hardware, and trying to arrange the rest of the small world so that it has some desired property. Humans simulate this process so they can see what the AI does in the small world, and copy what it does. The AI could try messing with us spectators, so that we end up giving more compute to the physical instantiation of the AI in the human world (which is different from the AI in the platonic mathematical structure), which the physical instantiation of the AI in the human world can use to better manipulate the simulation of the toy world that we are running in the human world (which is also different from the platonic mathematical structure). The platonic mathematical structure itself does not have a human world with extra compute in it that can be grabbed, so trying to mess with human spectators would, in the platonic mathematical structure, just end up being a waste of compute, so this strategy will be discarded if it somehow gets considered in the first place. Thus a real-world simulation of this AI-in-a-platonic-mathematical-structure will, if accurate, behave in the same way.

**alexmennen** on Probabilistic Tiling (Preliminary Attempt) · 2018-08-07T22:57:18.972Z · score: 2 (1 votes) · LW · GW

Thanks, fixed.

**alexmennen** on Probabilistic Tiling (Preliminary Attempt) · 2018-08-07T22:50:13.134Z · score: 4 (3 votes) · LW · GW

I suggest stating the result you're proving before giving the proof.

You have some unusual notation that I think makes some of this unnecessarily confusing. Instead of this underlined vs non-underlined thing, you should have two different functions, say $U$ and $U'$, where the first maps action sequences to utilities, and the second maps a pair consisting of an action $a$ and a future policy $\pi$ to the utility of the action sequence beginning with the actions already taken, followed by $a$, followed by the action sequence generated by $\pi$. Your first assumption would then be stated as an equation relating $U$ and $U'$. Your second assumption (fairness of the environment) is implicit in the type signature of the utility function $U$. If your utility depends on something other than the action sequence, then it doesn't make sense to write it as a function of the action sequence. It's good to point out assumptions that are implicit in the formalism you're using, but by the time you identify utility as a function of action sequences, you don't need to assume fairness of the environment as an additional axiom. I do not understand what your third assumption is.

This is emphatically false in general, but there's a special condition that makes it viable, namely that the distribution at time n is guaranteed to assign probability 1 to $\phi$ iff $\phi$ is true. My epistemic state about this is "this seems extremely plausible, but I don't know for sure if logical inductors attain this property in the limit"

They don't. For instance, let $\phi$ be any true undecidable sentence. The logical inductor does not assign probability 1 to $\phi$ even in the limit. Your fourth assumption does not seem reasonable. Does a weaker version of it not give you what you want?

Note that this only explicitly writes the starting code, and the code that might be modified to, not the past or future action sequence! This is important for the agent to be able to reason about this computation, despite it taking an infinite input.

I think this is exactly backwards. The property that makes spaces easy to search through and reason about is compactness, not finiteness. If the action space $A$ is finite, then the space $A^{\mathbb{N}}$ of action sequences is compact, and thus easy to search through and reason about, provided the relevant functions on it are continuous. But the space of computer programs is an infinite discrete space, hence non-compact, and hard to search through and reason about, except by remembering that the purpose of selecting a program is so that it will generate an element of the nice, easily-searchable compact space of action sequences.

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-08-03T20:42:35.982Z · score: 5 (3 votes) · LW · GW

The model I had in mind was that the AI and the toy world are both abstract computational processes with no causal influence from our world, and that we are merely simulating/spectating on both the AI itself and the toy world it optimizes. If the AI messes with people simulating it so that they end up simulating a similar AI with more compute, this can give it more influence over these peoples' simulation of the toy world the AI is optimizing, but it doesn't give the AI any more influence over the abstract computational process that it (another abstract computational process) was interfacing with and optimizing over.

Separately, I also find it hard to imagine us building a virtual world that is similar enough to the real world that we are able to transfer solutions between the two, even with some finetuning in the real world.

Yes, this could be difficult, and would likely limit what we could do, but I don't see why this would prevent us from getting anything useful out of a virtual-world-optimizer. Lots of engineering tasks don't require more explicit physics knowledge than we already have.

**alexmennen** on Safely and usefully spectating on AIs optimizing over toy worlds · 2018-07-31T22:26:36.859Z · score: 2 (3 votes) · LW · GW

I agree. I didn't mean to imply that I thought this step would be easy, and I would also be interested in more concrete ways of doing it. It's possible that creating a hereditarily restricted optimizer along the lines I was suggesting could end up being approximately as difficult as creating an aligned general-purpose optimizer, but I intuitively don't expect this to be the case.

**alexmennen** on Computational efficiency reasons not to model VNM-rational preference relations with utility functions · 2018-07-25T19:00:46.905Z · score: 4 (2 votes) · LW · GW

It seems unlikely to me that a re-evaluation of how many QALYs buying a sandwich is worth would arise from a re-evaluation of how valuable QALYs are, rather than a re-evaluation of how much buying the sandwich is worth.

I disagree with this. The value of a QALY could depend on other features of the universe (such as your lifespan) in ways that are difficult to explicitly characterize, and thus are subject to revision upon further thought. That is, you might not be able to say exactly how valuable the difference between living 50 years and living 51 years is, denominated in units of the difference between living 1000 years and living 1001 years. Your estimate of this ratio might be subject to revision once you think about it for longer. So the value of a QALY isn't stable under re-evaluation, even when expressed in units of QALYs under different circumstances. In general, I'm skeptical that the concept of good reference points whose values are stable in the way you want is a coherent one.

**alexmennen** on Meta: IAFF vs LessWrong · 2018-07-13T16:52:12.000Z · score: 0 (0 votes) · LW · GW

There should be a chat icon on the bottom-right of the screen on Alignment Forum that you can use to talk to the admins (unless only people who have already been approved can see this?). You can also comment on LW (Alignment Forum posts are automatically crossposted to LW), and ask the admins to make it show up on Alignment Forum afterwards.

**alexmennen** on Meta: IAFF vs LessWrong · 2018-07-12T22:21:17.000Z · score: 1 (1 votes) · LW · GW

There is a replacement for IAFF now: https://www.alignmentforum.org/

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-12T16:49:30.555Z · score: 2 (1 votes) · LW · GW

Ok, I see what you're getting at now.

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-12T03:49:52.836Z · score: 2 (1 votes) · LW · GW

I don't think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify are correspondingly rarely occurring, and thus less useful to predict.

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-12T03:46:00.092Z · score: 2 (1 votes) · LW · GW

Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying "a simulation that can support life" is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

Oh yes, I see. That does cut the complexity overhead down a lot.

Once you've specified the agent, it just samples randomly from the distribution of "strings I want to influence." That has a way lower probability than the "natural" complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.

I don't understand what you're saying here.

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-12T00:15:00.185Z · score: 5 (3 votes) · LW · GW

I didn't mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-11T21:22:04.074Z · score: 2 (1 votes) · LW · GW

This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.

**alexmennen** on A universal score for optimizers · 2018-07-11T18:34:27.242Z · score: 3 (2 votes) · LW · GW

I think decision problems with incomplete information are a better model in which to measure optimization power than deterministic decision problems with complete information are. If the agent knows exactly what payoffs it would get from each action, it is hard to explain why it might not choose the optimal one. In the example I gave, the first agent could have mistakenly concluded that the .9-utility action was better than the 1-utility action while making only small errors in estimating the consequences of each of its actions, while the second agent would need to make large errors in estimating the consequences of its actions in order to think that the .1-utility action was better than the 1-utility action.
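A small Monte Carlo sketch of this intuition (my own illustration; the 0.2 noise level and the helper name are arbitrary): with the payoff profiles from that example, small Gaussian estimation errors frequently flip the comparison between the 1-utility and .9-utility actions, but almost never flip the comparison between the 1-utility and .1-utility actions.

```python
import numpy as np

rng = np.random.default_rng(0)
noise = 0.2          # standard deviation of the (hypothetical) utility-estimation error
trials = 100_000

def mistake_rate(true_utilities):
    """How often an agent with noisy utility estimates fails to pick the best action."""
    estimates = np.array(true_utilities) + rng.normal(scale=noise, size=(trials, len(true_utilities)))
    return np.mean(np.argmax(estimates, axis=1) != np.argmax(true_utilities))

print(mistake_rate([0.0, 0.9, 1.0]))   # often picks the 0.9 action: small errors suffice
print(mistake_rate([0.0, 0.1, 1.0]))   # almost never suboptimal: only large errors would do it
```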

**alexmennen** on Clarifying Consequentialists in the Solomonoff Prior · 2018-07-11T18:24:21.144Z · score: 3 (2 votes) · LW · GW

I'm not convinced that the probability of S' could be pushed up to anything near the probability of S. Specifying an agent that wants to trick you into predicting S' rather than S with high probability when you see their common prefix requires specifying the agency required to plan this type of deception (which should be quite complicated), and specifying the common prefix of S and S' as the particular target for the deception (which, insofar as it makes sense to say that S is the "correct" continuation of the prefix, should have about the same "natural" complexity as S). That is, specifying such an agent requires all the information required to specify S, plus a bunch of overhead to specify agency, which adds up to much more complexity than S itself.

**alexmennen** on An environment for studying counterfactuals · 2018-07-11T04:55:17.932Z · score: 5 (3 votes) · LW · GW

The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That's different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.

**alexmennen** on A universal score for optimizers · 2018-07-11T04:11:24.461Z · score: 6 (4 votes) · LW · GW

Some undesirable properties of C-score:

It depends on how the space of actions is represented. If a set of very similar actions that achieve the same utility for the agent are merged into one action, this will change the agent's C-score.

It does not depend on the magnitudes of the agent's preferences, only on their orderings. Compare 2 agents: the first has 3 available actions, which would give it utilities 0, .9, and 1, respectively, and it picks the action that would give it utility .9. The second has 3 available actions, which would give it utilities 0, .1, and 1, respectively, and it picks the action that would give it utility .1. Intuitively, the first agent is a more successful optimizer, but both agents have the same C-score.

**alexmennen** on The Learning-Theoretic AI Alignment Research Agenda · 2018-07-01T23:52:26.000Z · score: 2 (1 votes) · LW · GW

A related question is, whether it is possible to design an algorithm for strong AI based on simple mathematical principles, or whether any strong AI will inevitably be an enormous kludge of heuristics designed by trial and error. I think that we have some empirical support for the former, given that humans evolved to survive in a certain environment but succeeded to use their intelligence to solve problems in very different environments.

I don't understand this claim. It seems to me that human brains appear to be "an enormous kludge of heuristics designed by trial and error". Shouldn't the success of humans be evidence for the latter?

**alexmennen** on Logical uncertainty and Mathematical uncertainty · 2018-06-27T20:57:17.846Z · score: 8 (2 votes) · LW · GW

Logical induction does not take the outputs of randomized algorithms into account. But it does listen to deterministic algorithms that are defined by taking a randomized algorithm but making it branch pseudo-randomly instead of randomly. Because of this, I expect that modifying logical induction to include randomized algorithms would not lead to a significant gain in performance.

**alexmennen** on Logical uncertainty and Mathematical uncertainty · 2018-06-26T23:25:07.764Z · score: 7 (1 votes) · LW · GW

Oh, I see. But for any particular value of n, the claim that there are n! permutations of n objects is something we can know in advance is resolvable (even if we haven't noticed why this is always true), because we can count the permutations and check.
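For instance, a short check of the n = 4 case (illustrative only):

```python
from itertools import permutations
from math import factorial

n = 4
# Count the permutations of n objects directly and compare with n!.
print(sum(1 for _ in permutations(range(n))) == factorial(n))   # True (both are 24)
```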

**alexmennen** on Logical uncertainty and Mathematical uncertainty · 2018-06-26T22:20:58.582Z · score: 7 (1 votes) · LW · GW

Are you making the point that we often reason about how likely a sentence is to be true and then use our conclusion as evidence about how likely it is to have a proof of reasonable length? I think this is a good point. One possible response is that if we're doing something like logical induction in that it listens to many heuristics and pays more attention to the ones that have been reliable, then some of those heuristics can involve performing computations that look like trying to estimate how likely a sentence is to be true, in the process of estimating how likely it is to have a short proof, and then we can just pay attention to the probabilities suggested for existence of a short proof. A possible counterobjection is that if you want to know how to be such a successful heuristic, rather than just how to aggregate successful heuristics, you might want to reason about probabilities of mathematical truth yourself. A possible response to this counterobjection is that yes, maybe you should think about how likely a mathematical claim is to be true, but it is not necessary for your beliefs about it to be expressed as a probability, since it is impossible to resolve bets made about the truth of a mathematical sentence, absent a procedure for checking it.