Forum Digest: Corrigibility, utility indifference, & related control ideas 2015-03-24T17:39:09.000Z
Single-bit reflective oracles are enough 2015-03-17T23:00:24.000Z
An implementation of modal UDT 2015-02-11T06:02:22.000Z
Generalizing the Corrigibility paper's impossibility result? 2015-02-04T03:16:53.000Z
On notation for modal UDT 2015-02-03T19:26:48.000Z
From halting oracles to modal logic 2015-02-03T19:26:17.000Z
Third-person counterfactuals 2015-02-03T01:13:26.000Z
The odd counterfactuals of playing chicken 2015-02-02T07:15:50.000Z
Multibit reflective oracles 2015-01-25T02:23:00.000Z
"Evil" decision problems in provability logic 2015-01-10T01:04:28.000Z
Utility indifference and infinite improbability drives 2014-11-29T06:26:36.000Z
Improving the modal UDT optimality result 2014-11-23T22:16:38.000Z
Trustworthy automated philosophy? 2014-11-21T02:57:23.000Z
Topological truth predicates: Towards a model of perfect Bayesian agents 2014-11-15T06:39:48.000Z
Oracle machines instead of topological truth predicates 2014-11-15T06:39:31.000Z
Simplicity priors with reflective oracles 2014-11-15T06:39:19.000Z
A primer on provability logic 2014-11-15T06:39:04.000Z
Main vs. Discussion 2014-11-15T06:38:52.000Z
An optimality result for modal UDT 2014-11-15T06:38:22.000Z
Main vs. Discussion 2014-11-15T06:35:49.000Z
Predictors that don't try to manipulate you(?) 2014-11-15T05:53:19.000Z
Exploiting EDT 2014-11-10T19:59:51.000Z
Welcome! 2014-11-04T03:20:47.000Z


Comment by Benya_Fallenstein (Benja_Fallenstein) on Waterfall Truth Predicates · 2015-07-14T04:02:13.000Z · LW · GW

We should be more careful, though, about what we mean by saying that only depends on for , since this cannot be a purely syntactic criterion if we allow quantification over the subscript (as I did here). I'm pretty sure that something can be worked out, but I'll leave it for the moment.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Waterfall Truth Predicates · 2015-07-14T03:57:49.000Z · LW · GW

I would suggest changing this system by defining to mean that no is the Gödel number of a proof of an inconsistency in ZFC (instead of just asserting that isn't). The purpose of this is to make it so that if ZFC were inconsistent, then we only end up talking about a finite number of levels of truth predicate. More specifically, I'd define to be PA plus the axiom schema

Then, it seems that Jacob Hilton's proof that the waterfalls are consistent goes through for this waterfall:

Work in ZFC and assume that ZFC is inconsistent. Let be the lowest Gödel number of a proof of an inconsistency. Let be the following model of our language: Start with the standard model of PA; it remains to give interpretations of the truth predicates. If , then is false for all . If , then is true iff is the Gödel number of a true formula involving only for . Then, it's clear that , and hence all (since is the strongest of the systems) is sound on , and therefore consistent.

Thus, we have proven in ZFC that if ZFC is inconsistent, then is consistent; or equivalently, that if is inconsistent, then ZFC is consistent. Stepping out of ZFC, we can see that if is inconsistent, then ZFC proves this, and therefore in this case ZFC proves its own consistency, implying that it is inconsistent. Hence, if ZFC is consistent, then so is .
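The shape of this argument can be laid out step by step (a reconstruction in outline, writing Con(S) for the consistency of a system S and T for the waterfall system defined above, since the original formulas were typeset):

```latex
% (1) ZFC proves: if ZFC is inconsistent, then T is consistent
%     (the model construction above).
% (2) Contrapositive, provable in ZFC: if T is inconsistent,
%     then ZFC is consistent.
% (3) If T really is inconsistent, ZFC proves that fact
%     (an inconsistency proof is a concrete finite object,
%     and ZFC verifies such Sigma_1 facts).
% (4) Combining (2) and (3): ZFC proves Con(ZFC), so by Goedel's
%     second incompleteness theorem ZFC is inconsistent.
% (5) Hence: Con(ZFC) implies Con(T).
\begin{align*}
&\mathrm{ZFC} \vdash \neg\mathrm{Con}(\mathrm{ZFC}) \to \mathrm{Con}(T) \\
&\mathrm{ZFC} \vdash \neg\mathrm{Con}(T) \to \mathrm{Con}(\mathrm{ZFC}) \\
&\neg\mathrm{Con}(T) \;\Rightarrow\; \mathrm{ZFC} \vdash \neg\mathrm{Con}(T)
  \;\Rightarrow\; \mathrm{ZFC} \vdash \mathrm{Con}(\mathrm{ZFC})
  \;\Rightarrow\; \neg\mathrm{Con}(\mathrm{ZFC}) \\
&\therefore\; \mathrm{Con}(\mathrm{ZFC}) \Rightarrow \mathrm{Con}(T)
\end{align*}
```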

(Moreover, we can formalize this reasoning in ZFC. Hence, we can prove in ZFC (i) that if ZFC is inconsistent, then is consistent, and (ii) that if ZFC is consistent, then is consistent. By the law of the excluded middle, ZFC proves that is consistent.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on No Good Logical Conditional Probability · 2015-06-03T18:29:22.000Z · LW · GW

Hm; we could add an uninterpreted predicate symbol to the language of arithmetic, and let and . Then, it seems like the only barrier to recursive enumerability of is that 's opinions about aren't computable; this seems worrying in practice, since it seems certain that we would like logical uncertainty to be able to reason about the values of computations that use more resources than we use to compute our own probability estimates. But on the other hand, all of this makes this sound like an issue of self-reference, which is its own can of worms (once we have a computable process assigning probabilities to the value of computations, we can consider the sentence saying "I'm assigned probability " etc.).

Comment by Benya_Fallenstein (Benja_Fallenstein) on Agents that can predict their Newcomb predictor · 2015-06-02T20:03:18.000Z · LW · GW

The other direction follows from the fact that the algorithm is bounded, and PA can simply show the execution trace of in steps.

Unimportant technical point: I think the length of the PA proof grows faster than this. (More precisely, the length in symbols, rather than the length in number of statements; we're almost always interested in the former, since it determines how quickly a proof can be checked or be found by exhaustive search.) The obvious way of showing in PA is to successively show for higher and higher that "after ticks, the Turing machine computing is in the following state, and its tapes have the following content". But even to write out the contents of the tape of a Turing machine that has run for ticks will in general take symbols. So the obvious way of doing the proof takes at least symbols. Moreover, I'm not sure whether we can get from "at time , the configuration is such-and-such" to "at time , the configuration is such-and-such" in a constant number of proof steps. I suspect we don't need more than symbols overall, but I'm not extremely confident. (I'd be quite surprised if the time turned out not to be polynomial, though!)
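A back-of-the-envelope version of this accounting (toy code with invented constants; only the quadratic growth rate is the point):

```python
def trace_proof_symbols(t, per_cell=1, per_step_overhead=10):
    """Rough symbol count for the 'obvious' PA proof that simulates a
    machine for t steps: one configuration assertion per step, and the
    configuration after k steps describes about k tape cells, so the
    total is on the order of t^2 symbols. Constants are made up."""
    return sum(per_step_overhead + per_cell * k for k in range(1, t + 1))

# Doubling t roughly quadruples the proof length for large t.
ratio = trace_proof_symbols(2000) / trace_proof_symbols(1000)
```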

Comment by Benya_Fallenstein (Benja_Fallenstein) on An Informal Conjecture on Proof Length and Logical Counterfactuals · 2015-05-16T16:38:07.000Z · LW · GW

Next, we consider the case that PA is consistent and work through the agent’s decision. PA can’t prove , since we used the chicken rule, and since the sentence is easily provable, the sentence (i.e., the first sentence that the agent checks for proofs of) must be unprovable.

It seems like this argument needs soundness of PA, not just consistency of PA. Do you see a way to prove in PA that if , then PA is inconsistent?

[edited to add:] However, your idea reminds me of my post on the odd counterfactuals of playing chicken, and I think the example I gave there makes your idea go through:

The scenario is that you get 10 if you take action 1 and it's not provable that you don't take action 1; you get 5 if you take action 2; and you get 0 if you take action 1 and it's provable that you don't. Clearly you should take action 1, but I prove that modal UDT actually takes action 2. To do so, I show that PA proves . (Then, from the outside, follows by soundness of PA.)

This seems to make your argument go through if we can also show that PA doesn't show . But if it did, then modal UDT would take action 1 because this comes first in its proof search, contradiction.

Thus, PA proves (because this follows from ), and also PA doesn't prove . As in your argument, then, the trolljecture implies that we should think "if the agent takes action 1, it gets utility 0" is a good counterfactual, and we don't think that's true.

Still interested in whether you can make your argument go through in your case as well, especially if you can use the chicken step in a way I'm not seeing yet. Like Patrick, I'd encourage you to develop this into a post.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Reflective probabilistic logic cannot assign positive probability to its own coherence and an inner reflection principle · 2015-05-06T02:41:00.000Z · LW · GW

If you replace the inner "" by "", then the literal thing you wrote follows from the reflection principle: Suppose that the outer probability is . Then

Now, implies , which by the converse of the outer reflection principle yields , whence . Now, by the forward direction of the outer reflection principle, we have

which, by the outer reflection principle again, implies

a contradiction to the assumption that had outer probability .

However, what we'd really like is an inner reflection principle that assigns probability one to the statement quantified over all , , and Gödel numbers . I think Paul Christiano has a proof that this is impossible for small enough , but I don't remember how the details worked.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Paraconsistent Tiling Agents (Very Early Draft) · 2015-04-28T00:40:24.000Z · LW · GW

This is very interesting!

My main question, which I didn't see an answer to on my first read, is: If the agent proves that action leads to the goal being achieved, are there any conditions under which this implies that the goal actually is achieved? The procrastination paradox shows that this isn't true in general. Is there a class of sentences (e.g., all sentences, though I don't expect that to be true in this case) such that if then ? In other words, do we have some guarantee that we do better at actually achieving than an agent which uses the system ?

A more technical point: Given that you claim that Theorem 3.2 (reflectively coherent quantified belief) follows "trivially" from Theorem 2.4, you presumably mean Theorem 2.4 to state that , rather than . (Even if you consider to implicitly contain an occurrence of , it is essential that in the free variable is replaced by the numeral corresponding to the outer value of .)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Forum Digest: Corrigibility, utility indifference, & related control ideas · 2015-03-24T17:42:20.000Z · LW · GW

Categorization is hard! :-) I wanted to break it up because long lists are annoying to read, but there was certainly some arbitrariness in dividing it up. I've moved "resource gathering agent" to the odds & ends.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Identity and quining in UDT · 2015-03-22T01:33:27.000Z · LW · GW

Want to echo Nate's points!

One particular thing that I wanted to emphasize, which I think you can see as a thread on this forum (in particular, the modal UDT work is relevant), is that it's useful to make formal toy models where the math is fully specified, so that you can prove theorems about what exactly an agent would do (or, sometimes, write a program that figures it out for you). When you write things out that explicitly, then, for example, it becomes clearer that you need to assume that a decision problem is "fair" (extensional) to get certain results, as Nate points out (or, if you don't assume it, someone else can look at your result and point out that it's not true as stated). In your post, you're using "logical expectations" that condition on something being true, without defining exactly what all of this means; as a result, you can argue about what these agents will do, but not actually prove it. That's certainly a reasonable part of the research process, but I'd like to encourage you to turn your work into models that are fully specified, so that you can actually prove theorems about them.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Meta- the goals of this forum · 2015-03-17T21:06:08.000Z · LW · GW

Some additions about how the initial system is going to work:

  • Non-member contributions (comments and links) are going to become publicly visible when they have received 2 likes (from members---only members can like things).

  • However, members will be able to reply to a contribution as soon as it has received 1 like. This means that if you think someone's made a useful contribution, and you want to reply to them about it, you don't have to wait for a second person to like it before you can write that reply. (It won't be publicly visible until the contribution has two likes, though.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Third-person counterfactuals · 2015-02-03T20:56:48.000Z · LW · GW

Yeah, that sounds good! Of course, by the Kripke levels argument, it's sufficient to consider 's of the form . And we might want to have a separate notion of " leads to at level ", which we can actually implement in a finite modal formula. This seems to suggest a version of modal UDT that tries to prove things in PA, then if that has ambiguous counterfactuals (i.e., it can't prove for any ) we try PA+1 and so on up to some finite ; then we can hope these versions of UDT approximate optimality according to your revised version of " leads to " as . Worth working out!
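The escalation scheme sketched above can be written out as control flow (the provability check is abstracted into a stand-in predicate, so this shows only the structure, not real proof search; all names here are illustrative):

```python
def escalating_udt(actions, outcomes, provable, max_level):
    """Control flow of the proposed variant: run modal UDT's proof search
    in PA; if the counterfactuals are ambiguous (nothing provable), retry
    in PA+1, PA+2, ..., up to PA+max_level. `provable(level, a, o)` is a
    stand-in for "PA+level proves that action a leads to outcome o";
    `outcomes` is assumed sorted from best to worst."""
    for level in range(max_level + 1):
        for o in outcomes:          # search for the best provable outcome
            for a in actions:
                if provable(level, a, o):
                    return a, o, level
    return None  # counterfactuals still ambiguous at every level
```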

Comment by Benya_Fallenstein (Benja_Fallenstein) on Third-person counterfactuals · 2015-02-03T01:25:49.000Z · LW · GW

Fixed---thanks, Patrick!

Regarding the example, I earlier defined "action leads to outcome " to mean the conjunction of and ; i.e., we check for spurious counterfactuals before believing that tells us something about what action leads to, and we only consider ourselves "fully informed" in this sense if we have non-spurious information for each . (Of course, my follow-up post is about how that's still unsatisfactory; the reason to define this notion of "fully informative" so explicitly was really to be able to say more clearly in which sense we intuitively have a problem even when we've ruled out the problems of ambiguous counterfactuals and not enough counterfactuals.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on From halting oracles to modal logic · 2014-11-23T22:17:53.000Z · LW · GW

(Thanks for the feedback!)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Trustworthy automated philosophy? · 2014-11-21T08:02:34.000Z · LW · GW

In our chat about this, Eliezer said that, aside from the difficulty of making a human-level mathematical philosophy engine aligned with our goals, an additional significant relevant disagreement with Paul is that Eliezer thinks it's likely that we'll use low-level self-improvement on the road to human-level AI; he used the analogies of programmers using compilers instead of writing machine code directly, and of EURISKO helping Lenat. (Again, hoping I'm not misrepresenting.)

This seems like a plausible scenario to me, but I'm not convinced it argues for the kind of work we're doing currently; it seems fairly likely to me that "reliably helping the programmers to do low-level tasks on the way to human-level AI" can be handled by having a small protected metalayer, which only the humans can modify. This sort of architecture seems very problematic for an AI acting in the world---it makes Cartesian assumptions, and we don't want something that's bound to a single architecture like that---which is why we haven't been looking in this direction, but if this is a significant reason for studying self-reference, we should be exploring obvious tricks involving a protected metalayer.

For example, the Milawa theorem prover has a small initial verifier, which can be replaced by a different verifier that accepts proofs in a more powerful language, given a proof that the new verifier only outputs theorems that the old verifier would also have outputted on some input. How does this avoid running into the Löbstacle? (I.e., how can the new verifier also allow the user to replace it given a proof of the analogous theorem in its language?) The answer is, by having a protected metalevel; the proof that the new verifier only outputs theorems the old verifier would have outputted does not need to prove anything about the code that is willing to switch out the new verifier for an even newer verifier.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Approximability · 2014-11-21T07:18:48.000Z · LW · GW

...though: although I'm guessing the variant of the reflective oracle discussed in the comment thread may be approximable, it seems less likely that a version of AIXI defined based on it would be approximable.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Approximability · 2014-11-21T03:14:26.000Z · LW · GW

Sorry for the delayed reply! I like this post, and agree with much of what you're saying. I guess I disagree with the particular line you draw, though; I think there's an interesting line, but it's at "is there a Turing machine which will halt and give the answer to this problem" rather than "is there a Turing machine which will spit out the correct answer for a large enough input (but will spit out wrong answers for smaller inputs, and you don't know what number is large enough)". The latter of these doesn't seem that qualitatively different to me from "is there a Turing machine which will give you the correct answer if you give it two large enough numbers (but will spit out wrong answers if the numbers aren't large enough, and what is 'large enough' for the second number depends on what the first number you give it is)", which takes you one more level up the arithmetic hierarchy; the line between the first and the second seems more significant to me than the line between the second and the third.

Regarding the reflective oracle result, yes, the version presented in the post is higher up in the arithmetic hierarchy, but I think the variant discussed in this comment thread with Paul is probably approximable. As I said above, though, I'm not convinced that that's actually the right place to put the line. Also, there's of course an "AIXItl-like version" which is computable by limiting itself to hypotheses with bounded source code length and computation time. But most importantly in my mind, I don't actually think it would make sense for any actual agent to compute an oracle like this; the point is to define a notion of perfect Bayesian agent which can reason about worlds containing other perfect Bayesian agents, in the hope that this model will yield useful insights about the real world of logically uncertain agents reasoning about worlds containing other logically uncertain agents. And these logically uncertain agents certainly don't reason about each other by computing out an "oracle" first.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Stable self-improvement as a research problem · 2014-11-21T03:00:28.000Z · LW · GW

I think the main disagreement is about whether it's possible to get an initial system which is powerful in the ways needed for your proposal and which is knowably aligned with our goals; some more about this in my reply to your post, which I've finally posted, though there I mostly discuss my own position rather than Eliezer's.

Comment by Benya_Fallenstein (Benja_Fallenstein) on "Evil" decision problems in provability logic · 2014-11-16T04:29:49.000Z · LW · GW

Actually, drnickbone's original LessWrong post introducing evil problems also gives an extension to the case you are considering: The evil decision problem gives the agent three or more options, and rewards the one that the "victim" decision theory assigns the least probability to (breaking ties lexicographically). Then, no decision theory can put probability on the action that is rewarded in its probabilistic evil problem.
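The probability bound behind this extension can be checked with a short sketch (toy code, not from drnickbone's post; the point is just that the least-probable of three or more options never has probability above 1/3):

```python
import random

def rewarded_action(probs):
    """The 'evil' problem rewards whichever action the victim decision
    theory assigns the least probability to, breaking ties
    lexicographically (list.index returns the lowest matching index)."""
    return probs.index(min(probs))

# With n >= 3 actions, the least-probable action has probability
# <= 1/n <= 1/3, since the minimum can't exceed the average.
random.seed(0)
for _ in range(1000):
    n = random.randint(3, 6)
    raw = [random.random() for _ in range(n)]
    probs = [x / sum(raw) for x in raw]
    assert probs[rewarded_action(probs)] <= 1.0 / n + 1e-12
```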

Comment by Benya_Fallenstein (Benja_Fallenstein) on Exploiting EDT · 2014-11-12T01:18:42.000Z · LW · GW

True. This looks to me like an effect of EDT not being stable under self-modification, although here the issue is handicapping itself through external means rather than self-modification---like, if you offer a CDT agent a potion that will make it unable to lift more than one box before it enters Newcomb's problem (i.e., before Omega makes its observation of the agent), then it'll cheerfully take it and pay you for the privilege.

Comment by Benya_Fallenstein (Benja_Fallenstein) on A primer on provability logic · 2014-11-11T01:46:18.000Z · LW · GW

Thread for proofs of results claimed above (not limited to stuff not found in Lindström); contributions appreciated. Stuff not in Lindström includes the uniqueness of arithmetic fixed points in PA (found in the modal agents paper), and I think the version of the fixed point theorem with more than one (the latter should be provable by iterated application of the fixed point theorem for a single ).

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-10T21:56:34.000Z · LW · GW

Thanks for expanding on your construction! I hadn't thought of the recursive construction, that's really neat.

I'm not that worried about the application to AIXI: unless I'm missing something, we can just additionally give our machines access to a double halting oracle (for ordinary Turing machines), and recover the same power. I considered doing that and didn't go with it because the stuff in my post seemed slightly more elegant, but if there's a reason to prefer the other version, it seems fine to use it.

I'm not clear on what you mean by ""; I'm guessing you were sloppy in talking about rather than my or some other variant, but I may be missing something. The version seems to need Kakutani unless we want to allow some -tolerance: consider for arbitrary and for ; we want this to equal for all such that and to equal for all such that , which makes discontinuous if it's single-valued. My intuition is that we generally need Kakutani to get exact optimality rather than optimality (as a rule of thumb, not a hard-and-fast rule), because which choice is optimal jumps discontinuously.

In the version, continuity is not as immediate as if we could determine by looking at only three values of . However, we can use the same trick as in my post on simplicity priors: Abbreviate the probability that the oracle will say "greater" or "less" when called on as and , respectively; then, the probability that a reflective assigns to halting equals


This is clearly a continuous function of , since later calls to are multiplied with smaller and smaller weights. Thus, we can use this expression and the analogous one for in the definition of .

We can then check that a fixed point of this correspondence is a reflective oracle by showing inductively that for every , if the probability that halts in steps and outputs is greater than then and if the probability that it halts in steps and outputs is less than , then .

Tangential: I'm now thinking that it would be more elegant not to speak about halting; rather, use write-and-advance-only output tapes, and talk about the probability that the first bit outputs is a or a , with "never outputs anything" as the third possibility (whether due to looping or halting without output). Interested in whether or not this presentation would seem natural to theoretical computer scientists. (Although still interested in comments on the notation for the probabilities of the oracle's outputs.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Exploiting EDT · 2014-11-10T20:00:13.000Z · LW · GW

Thanks! I didn't really think at all about whether or not "money-pump" was the appropriate word (I'm not sure what the exact definition is); have now changed "way to money-pump EDT agents" into "way to get EDT agents to pay you for managing the news for them".

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-09T01:44:03.000Z · LW · GW

About determining whether something's a query: A formulation of the statement we want to test is, (A) "for every (infinite) input on the oracle tape, and every , there is a such that with probability , the program halts in steps."

I think this is equivalent to, (B) "For every , there is a such that, for any input on the oracle tape, the program halts within timesteps with probability ."

Reason why I think this is equivalent: Clearly (B) implies (A). Suppose that (B) is false; then there is a such that for all , there is an input on the oracle tape such that the program runs for steps with probability . Let be a function that returns such an oracle tape input for every .

Now, there either must be infinitely many such that starts with , or infinitely many such that starts with , or both. Write down such a bit; say, . Now, there must be infinitely many such that starts with , or such that starts with , or both. Iterate; this way, we obtain an infinitely long input for the oracle tape, which has the property that for every , there are arbitrarily large ---and in particular, there's a ---such that the program runs for steps with probability . In other words, for every , the probability that the program runs for steps when given this input on its oracle tape is . This shows that (A) is false.

Thus, it's sufficient to test whether a prospective query satisfies (B). But in timesteps, the machine can only read the first bits of its oracle tape and of its random input tape, so the statement "for any input on the oracle tape, the program halts within timesteps with probability " is primitive recursive. Thus, (B) is a statement, which we can test with a double halting oracle.

(warning: math above written quickly and not carefully checked)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-08T23:57:05.000Z · LW · GW

In the other direction, suppose that is a primitive recursive predicate, and consider the following probabilistic Turing machine: First, randomly choose an (placing positive probability on every natural number). Then, search exhaustively for an such that is true. If you find one, halt and output . This machine is a query if and only if . Together with the parent comment, this shows that having an oracle that tells you whether or not something is a query is equivalent to having an oracle for statements (i.e., a double halting oracle).
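The construction can be sketched like this (toy code; the real machine searches without bound, so an optional cutoff is added only to make the sketch runnable, and the geometric distribution is one arbitrary choice placing positive probability on every natural number):

```python
import random

def reduction_machine(P, search_bound=None):
    """Sketch of the machine described above: randomly choose k, placing
    positive probability on every natural number, then search exhaustively
    for an n with P(n, k) true, halting if one is found. The machine halts
    almost surely iff for every k there is such an n -- a two-quantifier
    (forall-exists) question about the primitive recursive predicate P."""
    k = 0
    while random.random() < 0.5:  # geometric: Pr[k] = 2^-(k+1) > 0 for all k
        k += 1
    n = 0
    while search_bound is None or n <= search_bound:
        if P(n, k):
            return n  # halt (and output n)
        n += 1
    return None  # the real machine would simply loop forever here
```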

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-08T23:04:23.000Z · LW · GW

Aagh... I'm conflicted. I really like the simplicity of having players correspond 1:1 to calls to the oracle, and their mixed strategies to the probability that the oracle returns "true", but the implication that the oracle must return the same answer if it's called twice on the same arguments during the execution of a single program interacts very badly with the intuitive picture of the applications I have in mind.

The intuitive picture is that the agents under consideration are living in a world whose laws of physics are probabilistic and allow for the construction of oracles---meaning, devices that invoke the underlying reflective oracle. An agent would be just a computer with access to such a device. Then, according to CDT, it's optimal to program these computers to do oracle calls that compare the expected utilities of taking the different actions available to the agent, and then taking an action that maximizes expected utility. (I.e., it's not that that's the only way you could program that computer---you could program it to use its oracle in many different ways, but this one is an optimal one according to causal reasoning.) The world, then, can be modelled to be an oracle machine program which simulates the local laws of physics and outputs what happens in the world.

But for this picture to be correct, it's really important for different calls to the oracle to be independent. Suppose that they aren't, and we have two players playing a game of Matching Pennies. One of these players chooses according to the expected utility calculation described above, using a single oracle invocation to figure out which action is better. But the other player... makes the same call to the oracle as the first player, uses this to determine whether the first player chooses heads or tails, and does the opposite. Clearly, the second player wins every game, meaning that the algorithm run by the first player is no longer optimal.
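This failure mode shows up in a tiny simulation (a toy model with made-up details: the oracle call is modeled as nothing more than a fair coin):

```python
import random

def play_round(shared_call):
    """One round of Matching Pennies. Player 1 plays heads iff its oracle
    call (a fair coin here) says so. When shared_call is True, Player 2
    makes 'the same call' and plays the opposite of what that call says
    Player 1 will do. Returns True iff Player 2 wins (a mismatch)."""
    call1 = random.random() < 0.5
    call2 = call1 if shared_call else random.random() < 0.5
    player1_heads = call1
    player2_heads = not call2  # Player 2 does the opposite of its call
    return player1_heads != player2_heads

random.seed(0)
assert all(play_round(True) for _ in range(10000))  # mimic wins every round
wins = sum(play_round(False) for _ in range(10000))
assert 4500 < wins < 5500                           # independent calls: ~50%
```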

So I think the version you outlined doesn't work for my main application, even though I really like the simplicity of it.

One possibility would be to have a player for every instance where the oracle is invoked in one of the possible execution histories of one of the programs under consideration. But then it wouldn't be guaranteed that the different players corresponding to the same pair will play the same mixed strategies. Of course, it can be shown that there are Nash equilibria that are symmetric in this way, but requiring such a result seems to take away from the simplicity of the connection to Nash equilibria, and it's not immediate that the algorithms for finding Nash equilibria can be adapted to finding this kind of symmetric equilibrium (although they almost certainly can).

Another possibility would be to try to formulate this as a game where players correspond to pairs , but where their pure strategies are numbers in the interval . In this case, if the players' payoffs are a continuous bounded function of the strategy profile, Kakutani's fixed point theorem will guarantee that there is a pure Nash equilibrium, and it should be possible to find this result in a standard textbook somewhere (economists need continuous strategy spaces for modelling things like auction bids). And there might be existing algorithms for finding Nash equilibria for such games. Judd, Renner & Schmedders (2012) looks promising, although I haven't really delved into it. (It also looks annoyingly complicated, though.)

For the payoffs, we could then re-use your idea: If player chooses strategy , their payoff is plus times the probability that returns , where behaves with probabilities according to the pure strategy profile under consideration.

Although, now I'm worried whether or not it's a problem that when we evaluate whether a player would do better by switching from to , we also switch the oracle to a different oracle that makes calls to return "true" with probability rather than with probability ? (In other words, when we consider replacing by , we're also potentially changing the probability that halts...)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-08T08:13:00.000Z · LW · GW

Fixed the typo, thanks!

I considered describing the probabilities of the oracle returning "true" by a different function , but it seemed too pedantic to have a different letter. Maybe that's wrong, but it still feels too pedantic. If I do things that way I probably shouldn't be writing " returns 'true' if...", though...

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-08T08:08:24.000Z · LW · GW

Thanks! I've been also thinking about making a finite version; my main motivation was that I've been hoping to take my still-incompletely-specified AIXI variant and find an analog variant of AIXItl, the computable version of AIXI that considers only hypotheses which terminate in time and whose source length is .

I think that would in fact be a pretty reasonable finite set of programs to use in what you're suggesting, since "we want it to terminate in time " provides a motivation for why we would require programs to terminate surely, as opposed to almost surely. (Terminating surely seems to be necessary for the condition of "only finitely many possible calls to the oracle" to hold: there is an almost surely terminating stochastic oracle program which chooses a random pair in and calls the oracle on that.)

It seems like the finite, AIXItl version should be able to implement any finite game with a rational payoff matrix (for sufficiently large and ), which is pretty neat. (Of course, even though this is computable, it does require the game being played to be much simpler than the computers the players are running on---which are spending basically all of their computation time on computing the oracle, rather than the agent code that calls that oracle; in any real life situation, it's pretty clear that there would be a better use that these agents could put their supercomputers to.)

I hadn't realized the point about the problem being expressible using just elementary operations and the set-valued step function---that's pretty neat! I don't know enough about polynomial solving to figure out the details of how this would work, but it's extremely plausible that a variation of a technique used for finding Nash equilibria would work. (In fact, I wonder whether the problem can somehow be transformed into the problem of finding a Nash equilibrium of an appropriate finite game?) I'd be interested in seeing some more details worked out. It seems like some proof that equilibria in the finite case can actually be found would be nice to have in an exposition aimed at computer scientists. (One brute-force proof idea that I'd considered before would be just to say that by the fixed point theorem, there is some solution, so if you try rational-valued solutions in a fine enough grid, you have to find values that are within any of being a solution, in some appropriate sense. I haven't worked out the details.)
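The brute-force idea in the last parenthetical can be sketched in one dimension (toy code; the actual equilibrium problem is a higher-dimensional fixed point, and turning "fine enough grid" into a concrete tolerance would need a modulus of continuity for f):

```python
def grid_fixed_point(f, grid_size):
    """Scan a uniform grid on [0, 1] for the point where a continuous
    f: [0,1] -> [0,1] is closest to being fixed. Since a fixed point
    exists (Brouwer / intermediate value theorem), a fine enough grid
    always yields a point with |f(x) - x| as small as desired."""
    return min((i / grid_size for i in range(grid_size + 1)),
               key=lambda x: abs(f(x) - x))

# Example: f(x) = (x^2 + 0.5) / 1.5 has fixed points at x = 0.5 and x = 1.
x = grid_fixed_point(lambda x: (x * x + 0.5) / 1.5, 10_000)
```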

Comment by Benya_Fallenstein (Benja_Fallenstein) on Oracle machines instead of topological truth predicates · 2014-11-06T21:44:50.000Z · LW · GW

Hm, I expect it would be helpful to explain things more, but it would also take more time, and aiming much lower would take much more time. (That's part of why it's taking me so long to turn these things into research papers, although in that case the length limitations make the constraint even more difficult.) At the moment, I'm trying to get the things I'm working on out there at all, especially with something like this where I haven't yet worked through all the details of whether I can actually do my intended application based on it. After all, this doesn't need to be the last explanation of this topic, ever!

That said, I do appreciate the problem that something written in this style is harder to digest than something with more explanations. Maybe I'll try to shift in that direction, but I don't want that to come at the cost of not getting new interesting thoughts out at all (which I'm worried it might).

Comment by Benya_Fallenstein (Benja_Fallenstein) on Topological truth predicates: Towards a model of perfect Bayesian agents · 2014-11-06T19:57:02.000Z · LW · GW

Thanks! Replied in a post.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Using modal fixed points to formalize logical causality · 2014-11-05T01:01:41.000Z · LW · GW

A model of UDT with a halting oracle searches only for one utility value for each action. I'm guessing the other formulation just wasn't obvious at the time? (I don't remember realizing the possibility of playing chicken implicitly before Will Sawin advertised it to me, though I think he attributed it to you.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Topological truth predicates: Towards a model of perfect Bayesian agents · 2014-11-05T00:51:40.000Z · LW · GW

Thanks! :-)

The reflective assignments correspond very directly to Nash equilibria (albeit in a many-to-one manner, because given any finite game, a reflective assignment contains information about that game but also information about many other things). So I wouldn't quite say that they correspond to methods of equilibrium selection---e.g., if two methods of equilibrium selection give you the same Nash equilibrium, they're not distinguishable on the level of reflective assignments. But yeah, the question of what equilibrium gets played gets outsourced to the oracle.

And yeah, since these agents play Nash equilibria, they will defect on the symmetric PD. (I'm not advocating this as a desirable feature. :-))

Comment by Benya_Fallenstein (Benja_Fallenstein) on Using modal fixed points to formalize logical causality · 2014-11-04T22:37:23.000Z · LW · GW

However, the approach is designed so that these "spurious" logical implications are unprovable, so they don’t interfere with decision-making. The proof of that is left as an easy exercise.

I don't think this is technically true as stated; it seems to be possible that the agent proves some spurious counterfactuals as long as the outcome it does in fact obtain is the best possible one. (This is of course harmless!) Say the agent has two possible actions, a and b, leading to outcomes o_a and o_b, respectively. The latter is preferred, and these are the only two outcomes. Suppose that a happens to be lexicographically lower than b in the agent's reasoning. Then it seems to be provable that the agent will in fact choose b, meaning that it's provable that it won't choose a, meaning that it finds both the true counterfactual "a leads to o_a" and the spurious "a leads to o_b" in the first step.

So I think the correct statement is a disjunction: The agent obtains the highest possible outcome or it finds no spurious counterfactuals.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Using modal fixed points to formalize logical causality · 2014-11-04T22:07:07.000Z · LW · GW

I know this is supposed to be just introductory, but I actually think that the complete reformulation of UDT-with-a-halting-oracle in terms of modal logic is really interesting! For starters, it allows us to compare UDT and modal agents in the same framework (with the right 's, we can see this version of UDT as a modal agent). It would also be really neat if we could write an "interpreter" that allows us to write UDT as a program calling a halting oracle, and then evaluate what it does by way of modal logic.

But also, it allows us to give a nice definition of "decision theory" and "decision problem" in the context with halting oracles. I was planning on posting soon about the definition that I showed you when you were visiting, which is designed for actually computable agents with bounded proof lengths, and is more complicated because of that. As a stepping stone to that, using the provability logic framework, I think we can define:

  • a decision theory to be a set of fully modalized formulas , for (fully modalized meaning that the propositional arguments only appear inside boxes);
  • and a decision problem to be a set of formulas , for , which do not need to be modalized (which isn't a problem because they're not self-referential).

The condition that the decision theory's formulas must be fully modalized, while the decision problem's formulas need not be, seems to be the natural thing corresponding to how in the bounded case, we allow the universe to run the agent, but the agent must use abstract reasoning about the universe; it can't just run it.

Given such a set, we can define , for , as follows: Then the actual action taken by decision theory when run on the decision problem is the modal fixed point of the equations , for .
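The notion of modal fixed point invoked here can be illustrated with a tiny, self-contained check (a toy example, separate from the UDT construction above): by the Gödel-Löb fixed-point theorem, the equation p ↔ ¬□p has the explicit solution p = ¬□⊥, and since letterless formulas of GL are equivalent to Boolean combinations of iterated □⊥, the biconditional can be verified by evaluating it on a finite linear Kripke frame, where □φ holds at world i iff φ holds at every world below i. The encoding below is an illustrative sketch.

```python
# A toy verification of a modal fixed point in GL (provability logic).
# Letterless formulas are evaluated on the linear Kripke frame
# 0 < 1 < ... < N: box(phi) holds at world i iff phi holds at every
# world j < i. Formulas are plain tuples.

FALSE = ('bot',)

def neg(a): return ('neg', a)
def box(a): return ('box', a)
def iff(a, b): return ('iff', a, b)

def holds(f, world):
    tag = f[0]
    if tag == 'bot':
        return False
    if tag == 'neg':
        return not holds(f[1], world)
    if tag == 'box':
        return all(holds(f[1], j) for j in range(world))
    if tag == 'iff':
        return holds(f[1], world) == holds(f[2], world)

# Fixed point of p <-> not box(p): the fixed-point theorem gives
# p = not box(bot), i.e. "consistency". Check that substituting it
# back in makes the biconditional true at every world of the frame.
p_star = neg(box(FALSE))
fixed_point_eq = iff(p_star, neg(box(p_star)))
assert all(holds(fixed_point_eq, i) for i in range(20))
```

Both sides of the biconditional are false exactly at world 0 (where □⊥ holds vacuously) and true everywhere above it, so the equation checks out.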

Comment by Benya_Fallenstein (Benja_Fallenstein) on The Pascal's Wager Fallacy Fallacy · 2009-03-19T12:00:10.000Z · LW · GW
Ask yourself if you would want to revive someone frozen 100 years ago. Yes. They don't deserve to die. Kthx next.

I wish that this were on Less Wrong, so that I could vote this up.

Comment by Benya_Fallenstein (Benja_Fallenstein) on An Especially Elegant Evpsych Experiment · 2009-02-14T00:05:01.000Z · LW · GW

Garrett, since Anonymous's reply was a little implicit, the point is that infants have a larger chance of dying before reproducing than young adults, so expected number of future offspring increases during childhood (counting, at each point, only children who have not yet died).

Aron, almost; it's because they get older, and only future children are relevant. Whether they've had children won't change the value except insofar as it changes the chance of future children.

Me: IIUC, we expect a large influence of random variation in the sample.

Bzzzt! Wrong.

Upon more careful reading and thinking, what I understand the authors to be doing is this. They ask 436 Canadian subjects to imagine that two sons or two daughters of different specified ages died in a car accident, and ask which child the subject thinks the parent would feel more grief for. They then use the Thurstone scaling procedure to obtain a grief score for each age (1 day; 1, 2, 6, 10, 13, 17, 20, 30, 50 years).

They say that the procedure gives highly replicable results, and they have that large sample size, so no big sampling effects expected here.

They then correlate this data with reproduction value data for the same ages for the !Kung, which they got from: Howell, N. Demography of the Dobe !Kung, New York: Academic Press, 1979. This is not a random sample, it's for the whole population, so no sampling effects there.

So replication with the same populations should give a very similar result. My original argument still applies, in that the high correlation may in part be due to the choice of populations, but I was completely wrong in expecting sampling effects to play a role.

Also, I realize now that I can't really judge how extreme the correlation is (though I'll happily defer to those who say it is very large): it's too different from the usual kind of correlation in Psychology for my fledgling feeling for correlation values to apply. The usual kind of study looks at two values for each experimental subject (e.g. IQ vs. rating of looks) where this study looks at two values (Canadian ratings and !Kung reproductive value) for each of the ten age groups. In the usual kind of study, correlations >0.9 are suspiciously high, because, AFAIR, if you administer the same psychological instrument to the same subjects twice, a good correlation between the two tests is ~0.8, which means the noise from testing is just too large to get you a correlation >0.9. This obviously doesn't apply to the present study's design.

Comment by Benya_Fallenstein (Benja_Fallenstein) on An Especially Elegant Evpsych Experiment · 2009-02-13T17:43:31.000Z · LW · GW

Eliezer, right, thanks. And I hadn't noticed about the correlations of the subcategories...

Comment by Benya_Fallenstein (Benja_Fallenstein) on An Especially Elegant Evpsych Experiment · 2009-02-13T16:46:38.000Z · LW · GW
Might we get an even higher correlation if we tried to take into account the reproductive opportunity cost of raising a child of age X to independent maturity, while discarding all sunk costs to raise a child to age X?

I haven't done the math, but my intuition says that upon observing the highest! correlation! ever!, surely our subjective probability must go towards a high true underlying correlation and having picked a sample with a particularly high correlation? (Conditioning on the paper not being wrong due to human error or fake, of course -- I don't suspect that particularly, but surely our subjective probability of that must go up too upon seeing the !!!.) If this is correct, it seems that we should expect to see a lower correlation for the modified design, even if the underlying effect is actually stronger.

(If I'm making a thinko somewhere there, please do tell... I hope to Know My Stuff about statistics someday, but I'm just not there yet :))

Do note that the correlation is, IIUC, between the mean Canadian rating for a given age and the mean reproductive value of female !Kung of a given age, meaning that "if the correlations were tested, the degrees of freedom would be (the number of ages) - 2 = 8, not (the number of subjects - 2) as is usually the case when testing correlations for significance", so IIUC, we expect a large influence of random variation in the sample. (The authors don't actually provide p-values for the correlations.) That's not surprising, really; if the highest! correlation! ever! came from an experiment that did not allow for significant influence of random effects (because of really large sample size, say), that should make us suspicious, right? (Because if there were real effects that large, there should be other people investigating similarly large effects with statistically weaker methods, and thus occasionally getting even more extreme results?)
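The degrees-of-freedom point can be made concrete with a quick sketch. The numbers below are made up for illustration (they are NOT the study's actual grief scores or reproductive values); the point is only that a correlation computed over ten age-group means is tested with df = 10 - 2 = 8, no matter how many subjects went into each mean.

```python
# Pearson correlation over ten age-group means: the relevant df is
# (number of ages) - 2 = 8, not (number of subjects) - 2.
# Data below are invented for illustration only.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

grief_scores     = [1.0, 2.1, 3.0, 4.2, 4.8, 5.1, 5.0, 4.0, 2.5, 1.5]  # hypothetical
reproductive_val = [0.9, 2.0, 3.1, 4.0, 5.0, 5.2, 4.9, 4.1, 2.4, 1.4]  # hypothetical

r = pearson_r(grief_scores, reproductive_val)
df = len(grief_scores) - 2               # = 8: ten age groups, two fitted means
t_stat = r * math.sqrt(df / (1 - r**2))  # t statistic for testing r != 0
assert df == 8 and r > 0.99
```

With only 8 degrees of freedom, even a very high r carries much less evidential weight than the same r computed over hundreds of independent subjects would.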

Comment by Benya_Fallenstein (Benja_Fallenstein) on OB Status Update · 2009-01-31T01:10:46.000Z · LW · GW

Any chance you'd consider installing jsMath? (Client-side library rendering LaTeX math. Formatting math in plain HTML won't kill you, but there are other things you can do with the same amount of effort that will make you stronger still :-))

Comment by Benya_Fallenstein (Benja_Fallenstein) on Sympathetic Minds · 2009-01-19T10:13:27.000Z · LW · GW
We might even cooperate in the Prisoner's Dilemma. But we would never be friends with them. They would never see us as anything but means to an end. They would never shed a tear for us, nor smile for our joys. And the others of their own kind would receive no different consideration, nor have any sense that they were missing something important thereby.

...but beware of using that as a reason to think of them as humans in chitin exoskeletons :-)

Comment by Benya_Fallenstein (Benja_Fallenstein) on In Praise of Boredom · 2009-01-18T16:24:51.000Z · LW · GW

Robin, I suspect that despite how it may look from a high level, the lives of most of the people you refer to probably do differ enough from year to year that they will in fact have new experiences and learn something new, and that they would in fact find it unbearable if their world were so static as to come even a little close to being video game repetitive.

That said, I would agree that many people seem not to act day-to-day as if they put a premium on Eliezer-style novelty, but that seems like it could be better explained by Eliezer's boredom being a FAR value than by the concept being specific to Eliezer :-)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Newcomb's Problem and Regret of Rationality · 2009-01-02T15:01:00.000Z · LW · GW

thinks -- Okay, so if I understand you correctly now, the essential thing I was missing that you meant to imply was that the utility of living forever must necessarily be equal to (cannot be larger than) the limit of the utilities of living a finite number of years. Then, if u(live forever) is finite, p times the difference between u(live forever) and u(live n years) must become arbitrarily small, and thus, eventually smaller than q times the difference between u(live n years) and u(live googolplex years). You then arrive at a contradiction, from which you conclude that u(live forever) = the limit of u(live n years) cannot be finite. Okay. Without the qualification I was missing, the condition wouldn't be inconsistent with a bounded utility function, since the difference wouldn't have to get arbitrarily small, but the qualification certainly seems reasonable.

(I would still prefer for all possibilities considered to have defined utilities, which would mean extending the range of the utility function beyond the real numbers, which would mean that u(live forever) would, technically, be an upper bound for {u(live n years) | n in N} -- that's what I had in mind in my last paragraph above. But you're not required to share my preferences on framing the issue, of course :-))

Comment by Benya_Fallenstein (Benja_Fallenstein) on Newcomb's Problem and Regret of Rationality · 2009-01-02T05:57:00.000Z · LW · GW

Given how many times Eliezer has linked to it, it's a little surprising that nobody seems to have picked up on this yet, but the paragraph about the utility function not being up for grabs seems to have a pretty serious technical flaw:

There is no finite amount of life lived N where I would prefer a 80.0001% probability of living N years to an 0.0001% chance of living a googolplex years and an 80% chance of living forever. This is a sufficient condition to imply that my utility function is unbounded.

Let p = 80% and let q be one in a million. I'm pretty sure that what Eliezer has in mind is,

(A) For all n, there is an even larger n' such that (p+q)·u(live n years) < p·u(live n' years) + q·u(live a googolplex years).

This indeed means that {u(live n' years) | n' in N} is not upwards bounded -- I did check the math :-) --, which means that u is not upwards bounded, which means that u is not bounded. But what he actually said was,

(B) For all n, (p+q)·u(live n years) <= p·u(live forever) + q·u(live googolplex years)

That's not only different from A, it contradicts A! It doesn't imply that u needs to be bounded, of course, but it flat out states that {u(live n years) | n in N} is upwards bounded by (p·u(live forever) + q·u(live googolplex years))/(p+q).

(We may perhaps see this as reason enough to extend the domain of our utility function to some superset of the real numbers. In that case it's no longer necessary for the utility function to be unbounded to satisfy (A), though -- although we might invent a new condition like "not bounded by a real number.")

Comment by Benya_Fallenstein (Benja_Fallenstein) on 37 Ways That Words Can Be Wrong · 2008-12-17T10:50:45.000Z · LW · GW

(To be clear, the previous comment was meant as a joke, not as a serious addition to the list -- at least not as it stands :-))

Comment by Benya_Fallenstein (Benja_Fallenstein) on 37 Ways That Words Can Be Wrong · 2008-12-16T15:53:04.000Z · LW · GW

You give an absolute train wreck of a purported definition, then do your best to relive the crash over and over. Intelligence is not merely objective reason, but includes nonlinear subconscious processing, intuition, and emotional intelligence. Therefore, AGI needs quantum computing.

Comment by Benya_Fallenstein (Benja_Fallenstein) on Expected Creative Surprises · 2008-10-24T23:22:28.000Z · LW · GW

On a whimsical note, it is reminiscent of the unpredictability of the Infinite Improbability Drive :-)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Torture vs. Dust Specks · 2008-10-22T15:43:00.000Z · LW · GW

Bogdan's presented almost exactly the argument that I too came up with while reading this thread. I would choose the specks in that argument and also in the original scenario (as long as I am not committing to the same choice being repeated an arbitrary number of times, and I am not causing more people to crash their cars than I cause not to crash their cars; the latter seems like an unlikely assumption, but thought experiments are allowed to make unlikely assumptions, and I'm interested in the moral question posed when we accept the assumption). Based on the comments above, I expect that Eliezer is perfectly consistent and would choose torture, though (as in the scenario with 3^^^3 repeated lives).

Eliezer and Marcello do seem to be correct in that, in order to be consistent, I would have to choose a cut-off point such that n dust specks in 3^^^3 eyes would be less bad than one torture, but n+1 dust specks would be worse. I agree that it seems counterintuitive that adding just one speck could make the situation "infinitely" worse, especially since the speckists won't be able to agree exactly where the cut-off point is.

But it's only the infinity that's unique to speckism. Suppose that you had to choose between inflicting one minute of torture on one person, or putting n dust specks into that person's eye over the next fifty years. If you're a consistent expected utility altruist, there must be some n such that you would choose n specks, but not n+1 specks. What makes the n+1st speck different? Nothing, it just happens to be the cut-off point you must choose if you don't want to choose 10^57 specks over torture, nor torture over zero specks. If you make ten altruists consider the question independently, will they arrive at exactly the same value of n? Prolly not.

The above argument does not destroy my faith in decision theory, so it doesn't destroy my provisional acceptance of speckism, either.

Comment by Benya_Fallenstein (Benja_Fallenstein) on How Many LHC Failures Is Too Many? · 2008-10-02T12:15:00.000Z · LW · GW

Gawk! "even if they had to deal with a terrorist attack on all of these branches, say" was supposed to come after "Surely everybody here would find an outcome undesirable where all of their future Everett branches wink out of existence." (The bane of computers. On a typewriter, this would not have happened.)

Comment by Benya_Fallenstein (Benja_Fallenstein) on How Many LHC Failures Is Too Many? · 2008-10-02T12:09:00.000Z · LW · GW

Richard, I am going to assume ... that you assign an Everett branch in which you painless wink out of existence a value of zero (neither desirable or undesirable)

I'd rather say that people who find quantum suicide desirable have a utility function that does not decompose into a linear combination of individual utility functions for their individual Everett branches-- even if they had to deal with a terrorist attack on all of these branches, say. Surely everybody here would find an outcome undesirable where all of their future Everett branches wink out of existence. So if somebody prefers one Everett branch winking out and one continuing to exist to both continuing to exist, you can only describe their utility function by looking at all the branches, not by looking at the different branches individually. (Did that make sense?)

Comment by Benya_Fallenstein (Benja_Fallenstein) on Friedman's "Prediction vs. Explanation" · 2008-09-30T12:28:00.000Z · LW · GW

Two points I'd like to comment on.

Re: The second scientist had more information

I don't think this is relevant if-- as I understood from the description-- the first scientist's theory predicted experiments 11..20 with high accuracy. In this scenario, I don't think the first scientist should have learned anything that would make them reject their previous view. This seems like an important point. (I think I understood this from Tyrrell's comment.)

Re: Theories screen off theorists

I agree-- we should pick the simpler theory-- if we're able to judge them for simplicity, and one is the clear winner. This may not be easy. (To judge General Relativity to be appropriately simple, we may have to be familiar with the discussion around symmetries in physics, not just with the formulas of GR, for example...)

I understood Tyrrell to say that both of the scientists are imperfect Bayesian reasoners, and so are we. If we were perfect Bayesians, both scientists and we would look at the data and immediately make the same prediction about the next trial. In practice, all three of us use some large blob of heuristics. Each such blob of heuristics is going to have a bias, and we want to pick the one that has the smaller expected bias. (If we formalize theories as functions from experiments to probability distributions of results, I think the "bias" would naturally be the Kullback-Leibler divergence between the theory and the true distribution.) Using Tyrrell's argument, it seems we can show that the first scientist's bias is likely to be smaller than the second scientist's bias (other things being equal).
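The parenthetical suggestion is easy to make concrete. A minimal sketch (with made-up distributions, purely for illustration) of the Kullback-Leibler divergence between a theory's predicted distribution over results and the true one:

```python
# D_KL(true || theory): the expected log-score penalty a theory pays,
# per observation, for predicting the wrong distribution over results.
# The distributions below are illustrative only.
import math

def kl_divergence(p_true, q_theory):
    # Both arguments: dicts mapping outcomes to probabilities.
    return sum(p * math.log(p / q_theory[x])
               for x, p in p_true.items() if p > 0)

true_dist = {'heads': 0.5, 'tails': 0.5}
theory_a  = {'heads': 0.5, 'tails': 0.5}   # matches the truth: zero bias
theory_b  = {'heads': 0.9, 'tails': 0.1}   # biased theory: positive divergence

assert kl_divergence(true_dist, theory_a) == 0.0
assert kl_divergence(true_dist, theory_b) > 0.0
```

On this reading, "smaller expected bias" means a smaller expected divergence from the true result distribution, averaged over the experiments the theory will face.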

Comment by Benya_Fallenstein (Benja_Fallenstein) on Friedman's "Prediction vs. Explanation" · 2008-09-29T16:02:12.000Z · LW · GW

Tyrrell, um. If "the ball will be visible" is a better theory, then "we will observe some experimental result" would be an even better theory?

Solomonoff induction, the induction method based on Kolmogorov complexity, requires the theory (program) to output the precise experimental results of all experiments so far, and in the future. So your T3 would not be a single program; rather, it would be a set of programs, each encoding specifically one experimental outcome consistent with "the ball is visible." (Which gets rid of the problem that "we will observe some experimental result" is the best possible theory :))