Posts

A Brief Intro to Domain Theory 2019-11-21T03:24:13.416Z · score: 17 (9 votes)
So You Want to Colonize The Universe Part 5: The Actual Design 2019-02-27T10:23:28.424Z · score: 14 (8 votes)
So You Want to Colonize The Universe Part 4: Velocity Changes and Energy 2019-02-27T10:22:46.371Z · score: 11 (6 votes)
So You Want To Colonize The Universe Part 3: Dust 2019-02-27T10:20:14.780Z · score: 16 (9 votes)
So You Want to Colonize the Universe Part 2: Deep Time Engineering 2019-02-27T10:18:18.209Z · score: 13 (8 votes)
So You Want to Colonize The Universe 2019-02-27T10:17:50.427Z · score: 15 (12 votes)
Failures of UDT-AIXI, Part 1: Improper Randomizing 2019-01-06T03:53:03.563Z · score: 15 (6 votes)
COEDT Equilibria in Games 2018-12-06T18:00:08.442Z · score: 15 (4 votes)
Oracle Induction Proofs 2018-11-28T08:12:38.306Z · score: 6 (2 votes)
Bounded Oracle Induction 2018-11-28T08:11:28.183Z · score: 29 (10 votes)
What are Universal Inductors, Again? 2018-11-07T22:32:57.364Z · score: 11 (5 votes)
When EDT=CDT, ADT Does Well 2018-10-25T05:03:40.366Z · score: 14 (4 votes)
Asymptotic Decision Theory (Improved Writeup) 2018-09-27T05:17:03.222Z · score: 29 (8 votes)
Reflective AIXI and Anthropics 2018-09-24T02:15:18.108Z · score: 19 (8 votes)
Cooperative Oracles 2018-09-01T08:05:55.899Z · score: 19 (11 votes)
VOI is Only Nonnegative When Information is Uncorrelated With Future Action 2018-08-31T05:13:11.916Z · score: 23 (10 votes)
Probabilistic Tiling (Preliminary Attempt) 2018-08-07T01:14:15.558Z · score: 15 (9 votes)
Conditioning, Counterfactuals, Exploration, and Gears 2018-07-10T22:11:52.473Z · score: 30 (6 votes)
Logical Inductor Tiling and Why it's Hard 2018-06-14T06:34:36.000Z · score: 2 (2 votes)
A Loophole for Self-Applicative Soundness 2018-06-11T07:57:26.000Z · score: 1 (1 votes)
Resource-Limited Reflective Oracles 2018-06-06T02:50:42.000Z · score: 4 (3 votes)
Logical Inductors Converge to Correlated Equilibria (Kinda) 2018-05-30T20:23:54.000Z · score: 2 (2 votes)
Logical Inductor Lemmas 2018-05-26T17:43:52.000Z · score: 0 (0 votes)
Two Notions of Best Response 2018-05-26T17:14:19.000Z · score: 0 (0 votes)
Doubts about Updatelessness 2018-05-03T05:44:12.000Z · score: 1 (1 votes)
Program Search and Incomplete Understanding 2018-04-29T04:32:22.125Z · score: 41 (13 votes)
No Constant Distribution Can be a Logical Inductor 2018-04-07T09:09:49.000Z · score: 13 (6 votes)
Musings on Exploration 2018-04-03T02:15:17.000Z · score: 1 (1 votes)
A Difficulty With Density-Zero Exploration 2018-03-27T01:03:03.000Z · score: 0 (0 votes)
Distributed Cooperation 2018-03-18T05:46:56.000Z · score: 2 (2 votes)
Passing Troll Bridge 2018-02-25T08:21:17.000Z · score: 1 (1 votes)
Further Progress on a Bayesian Version of Logical Uncertainty 2018-02-01T21:36:39.000Z · score: 3 (2 votes)
Strategy Nonconvexity Induced by a Choice of Potential Oracles 2018-01-27T00:41:04.000Z · score: 1 (1 votes)
Open Problems Regarding Counterfactuals: An Introduction For Beginners 2017-07-18T02:21:20.000Z · score: 20 (6 votes)

Comments

Comment by diffractor on So You Want to Colonize The Universe Part 5: The Actual Design · 2019-08-23T05:59:57.495Z · score: 1 (1 votes) · LW · GW

Good point on phase 6. For phase 3, smaller changes in velocity further out are fine, but I still think that even with fewer velocity changes, you'll have difficulty finding an engine that delivers sufficient delta-V and isn't fission/fusion/antimatter-based. (Also, in the meantime I realized that neutron damage over those sorts of timescales is going to be *really* bad.) For phase 5, I don't think a lightsail would provide enough deceleration, because you've got inverse-square losses. Maybe you could decelerate with a lightsail in the inner stellar system, but I think you'd just breeze right through, since the radius of the "efficiently slow down" sphere is too small relative to how much you slow down, and in the outer stellar system, light pressure is too low to slow you down meaningfully.

Comment by diffractor on So You Want To Colonize The Universe Part 3: Dust · 2019-08-23T05:53:19.550Z · score: 1 (1 votes) · LW · GW

Very good point!

Comment by diffractor on 87,000 Hours or: Thoughts on Home Ownership · 2019-07-08T05:05:39.761Z · score: 5 (4 votes) · LW · GW

I'd be extremely interested in the quantitative analysis you've done so far.

Comment by diffractor on Open Problems Regarding Counterfactuals: An Introduction For Beginners · 2019-03-25T21:41:31.947Z · score: 12 (4 votes) · LW · GW

See if this works.

Comment by diffractor on So You Want to Colonize The Universe Part 4: Velocity Changes and Energy · 2019-03-02T18:17:30.135Z · score: 2 (2 votes) · LW · GW

I'm talking about using a laser sail to get up to near c (0.1 g acceleration for 40 lightyears is pretty strong) in the first place, and slowing down by other means.

This trick is about using a laser sail for both acceleration and deceleration.
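
As a rough check on the "0.1 g for 40 lightyears" figure, here is a minimal sketch that treats it as constant proper acceleration over that distance (a simplification; a laser sail's acceleration profile won't actually be constant):

```python
import math

# Treat "0.1 g for 40 lightyears" as constant proper acceleration over that
# coordinate distance and apply the standard relativistic relation
# gamma = 1 + a*d/c^2. This is only an order-of-magnitude illustration.
g = 9.80665         # m/s^2
c = 299_792_458.0   # m/s
ly = 9.4607e15      # meters per lightyear

a = 0.1 * g
d = 40 * ly

gamma = 1 + a * d / c**2
v_over_c = math.sqrt(1 - 1 / gamma**2)
print(f"gamma ~ {gamma:.2f}, v ~ {v_over_c:.3f} c")  # roughly gamma ~ 5, v ~ 0.98 c
```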

Comment by diffractor on So You Want to Colonize The Universe Part 4: Velocity Changes and Energy · 2019-03-02T02:13:19.223Z · score: 2 (2 votes) · LW · GW

Yeah, I think the original proposal for a solar sail involved deceleration by having the central part of the sail detach and receive the reflected beam from the outer "ring" of the sail. I didn't do this because IIRC the beam only maintains coherence over 40 lightyears or so, so that trick would be for nearby missions.

Comment by diffractor on So You Want To Colonize The Universe Part 3: Dust · 2019-02-28T21:31:33.143Z · score: 6 (5 votes) · LW · GW

For 1, the mental model for non-relativistic but high speeds should be "a shallow crater is instantaneously vaporized out of the material going fast", and for relativistic speeds, it should be the same thing but with the vaporization directed into a deeper hole (the energy doesn't spread out as much; it stays in a narrow cone) instead of in all directions. However, your idea of having the spacecraft be a big flat sheet that can tolerate having a bunch of holes shot in it is promising. The main issue I see is that this approach is incompatible with a lot of things that (as far as we know) can only be done with solid chunks of matter, like antimatter energy capture or sideways boosting-rockets, and once you start armoring the solid chunks in the floaty sail, you're sort of back in the same situation. So it seems like an interesting approach, and it'd be cool if it could work, but I'm not quite sure it can (not entirely confident that it couldn't, just that it would require a bunch of weird solutions to stuff like "how does your sheet of tissue boost sideways at 0.1% of lightspeed?").

For 2, the problem is that the particles which are highly penetrating are either unstable (muons, kaons, neutrons...) and will fall apart well before arrival (and that's completely dodging the issue of making bulk matter out of them), or they are stable (neutrinos, dark matter), and don't interact with anything, and since they don't really interact with anything, this means they especially don't interact with themselves (well, at least we know this for neutrinos), so they can't hold together any structure, nor can they interact with matter at the destination. Making a craft out of neutrinos is ridiculously more difficult than making a craft out of room-temperature air. If they can go through a light-year of lead without issue, they aren't exactly going to stick to each other. Heck, I think you'd actually have better luck trying to make a spaceship out of pure light.

For 3, it's because in order to use ricocheting mass to power your starcraft, you need to already have some way of ramping the mass up to relativistic speeds so it can get to the rapidly retreating starcraft in the first place, and you need an awful lot of mass. Light already starts off at the most relativistic speed of all, and around a star you already have astronomical amounts of light available for free.

For 4, there sort of is, but mostly not. The gravity example has the problem that the speedup the craft gets when the two stars are ahead of it is perfectly counterbalanced by the deceleration when the two stars are behind it. For potentials like gravity or electrical fields or pretty much anything you'd want to use, there's an inverse-square law, which means they aren't really relevant unless you're fairly close to a star. The one instance I can think of where something like your approach applies is the electric sail design in the final part. In interstellar space, it brakes against the thin soup of protons as usual, but near a star, the "wind" of particles streaming out from the star acts as a more effective brake, and the craft can sail on that (going out) or use it for better deceleration (coming in). Think of it as a sail slowing a boat down when the air is stationary, and slowing it down even better when the wind is blowing against you.

Comment by diffractor on So You Want to Colonize The Universe · 2019-02-28T21:10:48.895Z · score: 2 (2 votes) · LW · GW

Whoops, I guess I messed up on that setting. Yeah, it's ok.

Comment by diffractor on So You Want to Colonize The Universe Part 4: Velocity Changes and Energy · 2019-02-28T01:07:23.941Z · score: 3 (3 votes) · LW · GW

Actually, no! The activation energy for the conversion of diamond to graphite is about 540 kJ/mol, and using the Arrhenius equation to get the rate constant for diamond-graphite conversion, with a radiator temperature of 1900 K, we get that after 10,000 years of continuous operation, 99.95% of the diamond will still be diamond. At room temperature, the diamond-to-graphite conversion rate is slow enough that protons will decay before any appreciable amount of graphite is made.

Even for a 100,000 year burn, 99.5% of the diamond will still be intact at 1900 K.

There isn't much room to ramp up the temperature, though. We can stick to around 99%+ of the diamond staying intact up to around 2100 K, but at 2200 K about 5% of the diamond converts, at 2300 K about 15%, at 2400 K about 45%, and over 10,000 years at 2500 K and 2600 K it's 80% and 99% conversion of diamond into graphite, respectively.
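
A minimal sketch of this estimate. The 540 kJ/mol activation energy is from the comment above; the pre-exponential factor and the first-order-kinetics assumption are not stated there, so the values below are assumptions chosen to roughly reproduce the quoted fractions:

```python
import math

# Arrhenius estimate for diamond -> graphite conversion on the radiators.
# Ea = 540 kJ/mol is from the comment; the pre-exponential factor A and the
# first-order kinetics (fraction remaining = exp(-k*t)) are assumptions,
# with A ~ 1/s chosen because it roughly reproduces the quoted fractions.
R = 8.314        # J/(mol*K)
Ea = 540e3       # J/mol
A = 1.0          # 1/s (assumed)
YEAR = 3.156e7   # seconds

def diamond_remaining(T_kelvin, years):
    k = A * math.exp(-Ea / (R * T_kelvin))   # Arrhenius rate constant
    return math.exp(-k * years * YEAR)       # first-order decay

for T in (1900, 2100, 2200, 2300, 2400, 2500, 2600):
    print(T, "K:", f"{diamond_remaining(T, 10_000):.4f} remaining after 10,000 yr")
print("1900 K:", f"{diamond_remaining(1900, 100_000):.4f} remaining after 100,000 yr")
```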

Comment by diffractor on So You Want to Colonize The Universe · 2019-02-27T19:31:12.774Z · score: 1 (4 votes) · LW · GW

Agreed. Also, there's an incentive to keep thinking about how to go faster, instead of launching, until one more day of design work speeds the rocket up by less than one day, since otherwise you'll get overtaken; and an incentive to agree on a coordinated plan ahead of time (you get this galaxy, I get that galaxy, etc.) to avoid issues with lightspeed delays.

Comment by diffractor on So You Want to Colonize The Universe · 2019-02-27T19:28:57.684Z · score: 3 (3 votes) · LW · GW

Or maybe accepting messages from home (in rocket form or not) of "whoops, we were wrong about X, here's the convincing moral argument" and acting accordingly. Then the only thing to be worried about would be irreversible acts done in the process of colonizing a galaxy, instead of having a bad "living off resources" endstate.

Comment by diffractor on So You Want to Colonize the Universe Part 2: Deep Time Engineering · 2019-02-27T18:08:20.973Z · score: 5 (4 votes) · LW · GW

Edited. Thanks for that. I guess I managed to miss both of those; I was mainly going off of the fact that the indispensable and extremely thorough Atomic Rockets site has very little discussion of intergalactic missions as opposed to interstellar missions.

It looks like there are some spots where Armstrong and I converged on the same strategy (using lasers to launch probes), but we seem to disagree about how big of a deal dust shielding is, how hard deceleration is, and what strategy to use for deceleration.

Comment by diffractor on So You Want to Colonize The Universe · 2019-02-27T17:57:20.690Z · score: 2 (2 votes) · LW · GW

Yeah, Atomic Rockets was an incredibly helpful resource for me, I definitely endorse it for others.

Comment by diffractor on What makes people intellectually active? · 2019-01-15T01:16:04.921Z · score: 11 (6 votes) · LW · GW

This doesn't quite seem right, because just multiplying probabilities only works when all the quantities are independent. However, I'd put higher odds on someone having the ability to recognize a worthwhile result conditional on them having the ability to work on the problem than unconditionally, so the product of the probabilities will be higher than it seems at first.

I'm unsure whether this consideration affects whether the distribution would be lognormal or not.
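
A toy illustration of that point, with made-up numbers:

```python
# Made-up numbers illustrating why positive correlation raises the joint probability
# relative to the naive independent product.
p_work = 0.10                  # P(able to work on the problem) -- assumed
p_recognize = 0.10             # P(able to recognize a worthwhile result) -- assumed
p_recognize_given_work = 0.50  # P(recognize | able to work) -- assumed, > p_recognize

print("independent product:", p_work * p_recognize)             # 0.01
print("with correlation:   ", p_work * p_recognize_given_work)  # 0.05
```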

Comment by diffractor on CDT Dutch Book · 2019-01-14T01:29:23.101Z · score: 1 (1 votes) · LW · GW

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has the choice between two paths and slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utility for taking the left path, -10 utility for taking the right path, and -1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for taking the left path and -10 utility for taking the right path, while the conditional says 0 utility for the left path and -1010 utility for the right path (because conditional on taking the right path, you were hit by a cosmic ray).
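
A minimal sketch of those numbers, with the conditional probability of a cosmic ray treated as an assumption (close to 1 given a voluntary right-path choice, to match the -1010 figure):

```python
# Expected utilities in the cosmic ray problem as set up above.
U_LEFT, U_RIGHT, U_RAY = 0, -10, -1000

p_ray_given_right = 1.0  # assumed: conditional on (freely) going right, a ray hit is near-certain
p_ray_given_left = 0.0   # assumed
p_ray_baseline = 0.0     # idealized unconditional chance of a ray hit

# EDT-style conditional expected utilities
edt_left = U_LEFT + p_ray_given_left * U_RAY      # 0
edt_right = U_RIGHT + p_ray_given_right * U_RAY   # -1010

# CDT-style counterfactuals hold the cosmic-ray probability at its baseline value
cdt_left = U_LEFT + p_ray_baseline * U_RAY        # 0
cdt_right = U_RIGHT + p_ray_baseline * U_RAY      # -10

print(edt_left, edt_right, cdt_left, cdt_right)
```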

In order to get the dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between the decision time and afterwards. So the initial bet could be something like -1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But now since the optimal action is "take the right path along with the bet", the problem setup has been changed, and we can't conclude that the agent's conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can't money-pump with the "+0.5 utility, -12 utility upon taking a cosmic ray hit" bet.

So this seems to dutch-book Death-in-Damascus, not CDT≠EDT cases in general.

Comment by diffractor on Failures of UDT-AIXI, Part 1: Improper Randomizing · 2019-01-08T03:43:24.721Z · score: 3 (2 votes) · LW · GW

Yes, UDT means updateless decision theory. "The policy" is used as a placeholder for "whatever policy the agent ends up picking", much like a variable in an equation, and "the algorithm I wrote" is still unpublished because there were too many things wrong with it for me to be comfortable putting it up, as I can't even show it has any nice properties in particular. Although now that you mention it, I probably should put it up so future posts about what's wrong with it have a well-specified target to shoot holes in. >_>

Comment by diffractor on Cooperative Oracles · 2018-12-05T05:23:15.854Z · score: 1 (1 votes) · LW · GW

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto optimality.

The definition of an almost stratified Pareto optimum was adapted from this, and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for all players. Something isn't stratified Pareto optimal in a vacuum; it's stratified Pareto optimal for a particular player. There's no oracle that's stratified Pareto optimal for all players, but if you take the closure of everyone's SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.

Comment by diffractor on Beliefs at different timescales · 2018-11-06T01:00:36.301Z · score: 3 (2 votes) · LW · GW

My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .

Comment by diffractor on Beliefs at different timescales · 2018-11-05T00:14:47.198Z · score: 1 (1 votes) · LW · GW

The notation for the sum operator is unclear. I'd advise writing the sum as and using an subscript inside the sum so it's clearer what is being substituted where.

Comment by diffractor on Asymptotic Decision Theory (Improved Writeup) · 2018-10-29T21:50:53.040Z · score: 3 (2 votes) · LW · GW

Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have be low while is high, so the argument still goes through.

Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the original formulation of ADT and the thing I wrote up that I should probably clean up this post a bit and clarify more about what's different in the two formulations. Thanks for that.

Comment by diffractor on Asymptotic Decision Theory (Improved Writeup) · 2018-10-25T20:54:42.492Z · score: 1 (1 votes) · LW · GW
in the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.

Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, and I think randomization is better modeled by a (deterministic) action that consults an environmental random-number generator, which may be correlated with other things.

What I meant was that, in the version of argmax that I set up, if is the two constant policies "take blank box" and "take shiny box", then for the embedder where the opponent runs argmax to select which box to fill, the argmax agent will converge to deterministically randomizing between the two policies, by the logical inductor assigning very similar expected utility to both options such that the inductor can't predict which action will be chosen. And this occurs because the inductor outputting more of "take the blank box" will have converge to a higher expected value (so argmax will learn to copy that), and the inductor outputting more of "take the shiny box" will have converge to a higher expected value (so argmax will learn to copy that).

The optimality proof might be valid. I didn't understand which specific step you thought was wrong.

So, the original statement in the paper was

It must then be the case that for every . Let be the first element of in . Since every class will be seperated by at least in the limit, will eventually be a distribution over just . And since for every , , by the definition of it must be the case that .

The issue with this is the last sentence. It's basically saying "since the two actions and get equal expected utility in the limit, the total variation distance between a distribution over the two actions, and one of the actions, limits to zero", which is false.

And it is specifically disproved by the second counterexample, where there are two actions that both result in 1 utility, so they're both in the same equivalence class, but a probabilistic mixture between them (as converges to playing, for all ) gets less than 1 utility.

Consider the following embedder. According to this embedder, you will play chicken against ADT-epsilon who knows who you are. When ADT-epsilon considers this embedder, it will always pass the reality filter, since in fact ADT-epsilon is playing against ADT-epsilon. Furthermore, this embedder gives NeverSwerveBot a high utility. So ADT-epsilon expects a high utility from this embedder, through NeverSwerveBot, and it never swerves.

You'll have to be more specific about "who knows what you are". If it unpacks as "opponent only uses the embedder where it is up against [whatever policy you plugged in]", then NeverSwerveBot will have a high utility, but it will get knocked down by the reality filter, because if you converge to never swerving, will converge to 0, and the inductor will learn that so it will converge to assigning equal expected value to and, and converges to 1.

If it unpacks as "opponent is ADT-epsilon", and you converge to never swerving, then argmaxing will start duplicating the swerve strategy instead of going straight. In both cases, the argument fails.

Comment by diffractor on Asymptotic Decision Theory (Improved Writeup) · 2018-09-28T05:22:26.498Z · score: 3 (2 votes) · LW · GW

I got an improved reality-filter that blocks a certain class of environments that lead conjecture 1 to fail, although it isn't enough to deal with the provided chicken example and lead to a proof of conjecture 1. (the subscripts will be suppressed for clarity)

Instead of the reality-filter for being

it is now

This doesn't just check whether reality is recovered on average, it also checks whether all the "plausible conditionals" line up as well. Some of the conditionals may not be well-formed, as there may be conditioning on low-or-zero probability events, but these are then multiplied by a very small number, so no harm is done.

This has the nice property that for all "plausibly chosen embedders" that have a probability sufficiently far away from 0, all embedders and that pass this reality filter have the property that

So all embedders that pass the reality filter will agree on the expected utility of selecting a particular embedder that isn't very unlikely to be selected.

Comment by diffractor on Reflective AIXI and Anthropics · 2018-09-27T20:11:27.160Z · score: 1 (1 votes) · LW · GW

I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operation.

Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.

Comment by diffractor on Reflective AIXI and Anthropics · 2018-09-27T20:01:04.519Z · score: 1 (1 votes) · LW · GW

Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and -20+2 in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"

Something else feels slightly off, but I can't quite pinpoint it at this point. Still, I guess this solves my question as originally stated, so I'll PM you for payout. Well done!

(btw, you can highlight a string of text and hit ctrl+4 to turn it into math-mode)

Comment by diffractor on Asymptotic Decision Theory (Improved Writeup) · 2018-09-27T16:01:20.403Z · score: 1 (1 votes) · LW · GW

Yup, I meant counterfactual mugging. Fixed.

Comment by diffractor on Asymptotic Decision Theory (Improved Writeup) · 2018-09-27T08:06:03.342Z · score: 1 (1 votes) · LW · GW

I think I remember the original ADT paper showing up on agent foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT for exploration. I'll edit for clarity.

Comment by diffractor on Reflective AIXI and Anthropics · 2018-09-26T00:09:27.053Z · score: 1 (1 votes) · LW · GW

I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.

The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover desired behavior.

Comment by diffractor on Cooperative Oracles · 2018-09-05T03:54:17.930Z · score: 3 (2 votes) · LW · GW

Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.

Comment by diffractor on Cooperative Oracles · 2018-09-02T22:51:28.607Z · score: 1 (1 votes) · LW · GW

The basic reason for the dependency relation to care about oracle queries from strategies is that, when you have several players all calling the oracle on each other, there's no good way to swap out the oracle calls with the computation. The trick you describe does indeed work, and is a reason to not call any more turing machines than you need to, but there's several things it doesn't solve. For instance, if you are player 1, and your strategy depends on oracle calls to player 2 and 3, and the same applies to the other two players, you may be able to swap out an oracle call to player two with player two's actual code (which calls players 1 and 3), but you can't unpack any more oracle calls into their respective computations without hitting an infinite regress.

Comment by diffractor on Cooperative Oracles · 2018-09-01T16:24:54.548Z · score: 1 (1 votes) · LW · GW

I'm not sure what you mean by fixing the utility function occurring before fixing the strategy. In the problem setup of a game, you specify a utility function machine and a strategy machine for everyone, and there isn't any sort of time or order on this (there's just a set of pairs of probabilistic oracle machines) and you can freely consider things such as "what happens when we change some player's strategies/utility function machines"

Comment by diffractor on Probabilistic Tiling (Preliminary Attempt) · 2018-08-07T23:53:21.192Z · score: 1 (1 votes) · LW · GW

Ah, the formal statement was something like "if the policy A isn't the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds"

Yeah, I am assuming fast feedback that it is resolved on day .

What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all Turing machines; we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on).

Also I'm less confident in conditional future-trust for all conditionals than I used to be, I'll try to crystallize where I think it goes wrong.

Comment by diffractor on Probabilistic Tiling (Preliminary Attempt) · 2018-08-07T23:19:46.375Z · score: 1 (1 votes) · LW · GW

First: That notation seems helpful. Fairness of the environment isn't present by default; it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it's still unfair in the sense that the expected utility of feeding an action into the function drops sharply if you condition on the argmax agent selecting that action. The third condition was necessary to carry out this step. The intuitive interpretation of the third condition is that, if you know that policy B selects action 4, then you can step from "action 4 is taken" to "policy B takes the actions it takes", and if you have a policy where you don't know what action it takes (third condition is violated), then "policy B does its thing" may have a higher expected utility than any particular action being taken, even in a fair environment that only cares about action sequences, as the hamster dance example shows.

Second: I think you misunderstood what I was claiming. I wasn't claiming that logical inductors attain the conditional future-trust property, even in the limit, for all sentences or all true sentences. What I was claiming was: The fact that is provable or disprovable in the future (in this case, is ), makes the conditional future-trust property hold (I'm fairly sure), and for statements where there isn't guaranteed feedback, the conditional future-trust property may fail. The double-expectation property that you state does not work to carry the proof through, because the proof (from the perspective of the first agent), takes as an assumption, so the "conditional on " part has to be outside of the future expectation, when you go back to what the first agent believes.

Third: the sense I meant for "agent is able to reason about this computation" is that the computation is not extremely long, so logical inductor traders can bet on it.

Comment by diffractor on Complete Class: Consequentialist Foundations · 2018-07-12T03:54:22.837Z · score: 3 (2 votes) · LW · GW

Pretty much that, actually. It doesn't seem too irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn't an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.

The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it's an improvement (from the perspective of the prior) for the nosy neighbors/values that care about other worlds to dictate some of what happens in your world (since the marginal contribution of your world to the prior expected utility looks like some linear combination of the various utility functions, weighted by how much they care about your world). So, in practice, it'd be a bad idea to build a UDT value-learning prior containing utility functions that have preferences over all worlds, since it'd add a bunch of extra junk from different utility functions to our world if run.

Comment by diffractor on An environment for studying counterfactuals · 2018-07-11T03:48:02.722Z · score: 2 (2 votes) · LW · GW

If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
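
A minimal sketch of that failure mode, using an epsilon-greedy two-armed bandit (the arm means and exploration rate are made-up illustrative values):

```python
import random

# Epsilon-greedy two-armed bandit. A purely greedy agent (epsilon = 0) can get a
# spuriously low estimate of the better arm from an unlucky early pull and then
# never revisit it; a small exploration rate keeps correcting the estimates.
def run(epsilon, steps=5000, seed=0):
    rng = random.Random(seed)
    true_means = [0.3, 0.7]                    # arm 1 is actually better
    counts = [1, 1]
    estimates = [1.0 if rng.random() < m else 0.0 for m in true_means]  # one sample each
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                               # explore
        else:
            arm = max(range(2), key=lambda i: estimates[i])      # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total += reward
    return total / steps

print("greedy    :", run(epsilon=0.0))
print("eps = 0.05:", run(epsilon=0.05))
```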

Comment by diffractor on Optimal and Causal Counterfactual Worlds · 2018-07-05T05:40:27.000Z · score: 0 (0 votes) · LW · GW

Yeah, when I went back and patched up the framework of this post to be less logical-omniscience-y, I was able to get , but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires to assign probability 1 to .

Comment by diffractor on Poker example: (not) deducing someone's preferences · 2018-06-23T00:37:21.418Z · score: 4 (2 votes) · LW · GW

Since beliefs/values combinations can be ruled out, would it then be possible to learn values by asking the human about their own beliefs?

Comment by diffractor on A Loophole for Self-Applicative Soundness · 2018-06-11T15:01:10.000Z · score: 0 (0 votes) · LW · GW

I found an improved version by Pavel, that gives a way to construct a proof of from that has a length of . The improved version is here.

There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from to instantiating a such that . Pretty much all reasonable theorem provers have this.

The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn't much of a restriction in practice, because NBG set theory, which proves the consistency of ZFC, is finitely axiomatizable.

The proof strategy is basically as follows. It's shown that the shortest proof of a statement with quantifier depth n must have a length of , if the maximum quantifier depth in the proof is or greater.

This can be flipped around to conclude that if there's a length-n proof of , the maximum quantifier depth in the proof can be at most .

The second part of the proof involves constructing a bounded-quantifier version of a truth predicate. By Tarski's undefinability of truth, a full truth predicate cannot be constructed, but it's possible to exhibit a formula for which it's provable that ( is the formula laying out Tarski's conditions for something to be a truth predicate). Also, if quantifier depth of , there's a proof of ( is the sentence with its free variables substituted for the elements enumerated in the list ) Also, there's a proof that is preserved under inference rules and logical axioms, as long as everything stays below a quantifier depth of .

All these proofs can be done in lines. One factor of comes from the formula abbreviated as getting longer at a linear rate, and the other factor comes from having to prove for each seperately as an ingredient for the next proof.

Combining the two parts, the bound on the quantifier depth and the bound on how long it takes to prove stuff about the truth predicate, make it take steps to prove all the relevant theorems about a sufficiently large bounded quantifier depth truth predicate, and then you can just go "the statement that we are claiming to have been proved must have apply to it, and we've proved this is equivalent to the statement itself"

As a further bonus, a single -length proof can establish the consistency of the theory itself for all -length proofs.

It seems like a useful project to develop a program that will automatically write a proof of this form, to assess whether abstract unpacking of bounded proofs is usable in practice, but it will require digging into a bunch of finicky details of exactly how to encode a math theory inside itself.

Comment by diffractor on A Loophole for Self-Applicative Soundness · 2018-06-04T22:25:12.000Z · score: 0 (0 votes) · LW · GW

Caught a flaw with this proposal in the currently stated form, though it is probably patchable.

When unpacking a proof, at some point the sentence will be reached as a conclusion, which is a false statement.

Comment by diffractor on Does Thinking Hard Hurt Your Brain? · 2018-04-29T21:56:04.384Z · score: 12 (3 votes) · LW · GW

It doesn't hurt my brain, but there's a brain fog that kicks in eventually: a kind of blankness with no new ideas coming, an aversion to further work, and a reduction in working memory, so I can stare at some piece of math for a while and not comprehend it, because I can't load all the concepts into my mind at once. It's kind of like a hard limit for any cognition-intensive task.

This kicks in around the 2 hour mark for really intensive work/studying, although for less intensive work/studying, it can vary up all the way up to 8 hours. As a general rule of thumb, the -afinil class of drugs triples my time limit until the brain fog kicks in, at a cost of less creative and lateral thinking.

Because of this, my study habits for school consisted of alternating 2-hour study blocks and naps.

Comment by diffractor on Smoking Lesion Steelman · 2018-04-19T22:24:29.000Z · score: 1 (1 votes) · LW · GW

I think that in that case, the agent shouldn't smoke, and CDT is right, although there is side-channel information that can be used to come to the conclusion that the agent should smoke. Here's a reframing of the provided payoff matrix that makes this argument clearer. (also, your problem as stated should have 0 utility for a nonsmoker imagining the situation where they smoke and get killed)

Let's say that there is a kingdom which contains two types of people, good people and evil people, and a person doesn't necessarily know which type they are. There is a magical sword enchanted with a heavenly aura, and if a good person wields the sword, it will guide them to do heroic things, for +10 utility (according to a good person) and 0 utility (according to a bad person). However, if an evil person wields the sword, it will afflict them for the rest of their life with extreme itchiness, for -100 utility (according to everyone).

good person's utility estimates:

  • takes sword

    • I'm good: 10

    • I'm evil: -90

  • don't take sword: 0

evil person's utility estimates:

  • takes sword

    • I'm good: 0

    • I'm evil: -100

  • don't take sword: 0

As you can clearly see, this is the exact same payoff matrix as the previous example. However, now it's clear that if a (secretly good) CDT agent believes that most of society is evil, then it's a bad idea to pick up the sword, because the agent is probably evil (according to the info they have) and will be tormented with itchiness for the rest of their life, and if it believes that most of society is good, then it's a good idea to pick up the sword. Further, this situation is intuitively clear enough to argue that CDT just straight-up gets the right answer in this case.
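
A quick sketch of that calculation, using the good person's payoffs above and writing p for the agent's credence that it is good:

```python
# Good person's expected utility for taking the sword, as a function of their
# credence p that they themselves are good (payoffs from the matrix above).
def eu_take_sword(p_good, u_if_good=10, u_if_evil=-90):
    return p_good * u_if_good + (1 - p_good) * u_if_evil

# Not taking the sword is worth 0, so with these payoffs the CDT agent takes
# the sword only when p_good > 0.9.
for p in (0.5, 0.85, 0.9, 0.95):
    print(p, eu_take_sword(p))
```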

A human (with some degree of introspective power) in this case, could correctly reason "oh hey I just got a little warm fuzzy feeling upon thinking of the hypothetical where I wield the sword and it doesn't curse me. This is evidence that I'm good, because an evil person would not have that response, so I can safely wield the sword and will do so".

However, what the human is doing in this case is using side-channel information that isn't present in the problem description. They're directly experiencing sense data as a result of the utility calculation outputting 10 in that hypothetical, and updating on that. In a society where everyone was really terrible at introspection so the only access they had to their decision algorithm was seeing their actual decision, (and assuming no previous decision problems that good and evil people decide differently on so the good person could learn that they were good by their actions), it seems to me like there's a very intuitively strong case for not picking up the sword/not smoking.

Comment by diffractor on Musings on Exploration · 2018-04-07T01:18:49.000Z · score: 0 (0 votes) · LW · GW

A: While that is a really interesting note that I hadn't spotted before, the standard formulation of exploration steps in logical inductor decision theory involves infinite exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn't make it explore less.

B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of "attempting to cross the bridge". One is crossing because you believe it to be the best action, and the other is crossing because an exploration step occurred, and Thompson sampling doesn't have a split decision criterion like this. Although now that you point it out, it is possible to make a Thompson sampling variant where the troll blows up the bridge when "crossing the bridge" is not the highest-ranked action.

Comment by diffractor on An Untrollable Mathematician Illustrated · 2018-04-02T17:46:27.954Z · score: 4 (2 votes) · LW · GW

The beliefs aren't arbitrary; they're still reasoning according to a probability distribution over propositionally consistent "worlds". Furthermore, the beliefs converge to a single number in the limit of updating on theorems, even if the sentence of interest is unprovable. Consider some large but finite set S of sentences that haven't been proved yet, such that the probability of sampling a sentence in that set before sampling the sentence of interest "x" is very close to 1. Then pick a time N that is large enough that by that time, all the logical relations between the sentences in S will have been found. Then, with probability very close to 1, either "x" or "notx" will be sampled without going outside of S.

So, if there's some cool new theorem that shows up relating "x" and some sentence outside of S, like "y->x", well, you're almost certain to hit either "x" or "notx" before hitting "y", because "y" is outside S, so this hot new theorem won't affect the probabilities by more than a negligible amount.

Also, I figured out how to generalize the prior a bit to take into account arbitrary constraints other than propositional consistency, though there are still kinks to iron out in that one. Check this.

Comment by diffractor on A Difficulty With Density-Zero Exploration · 2018-03-27T22:57:26.000Z · score: 0 (0 votes) · LW · GW

Update: This isn't really an issue, you just need to impose an assumption that there is some function such that , and is computable in time polynomial in , and you always find out whether exploration happened on turn after days.

This is just the condition that there's a subsequence where good feedback is possible, and is discussed significantly in section 4.3 of the logical induction paper.

If there's a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there's infinite exploration steps on subsequence B (and also on A because it contains B)

This post is hereby deprecated. Still right, just not that relevant.

Comment by diffractor on The Art of the Artificial: Insights from 'Artificial Intelligence: A Modern Approach' · 2018-03-27T08:49:58.748Z · score: 9 (3 votes) · LW · GW

Yup, that particular book is how I learned to prove stuff too. (well, actually, there was a substantial time delay between reading that and being able to prove stuff, but it's an extremely worthwhile overview)

Comment by diffractor on An Untrollable Mathematician Illustrated · 2018-03-22T01:06:55.414Z · score: 12 (3 votes) · LW · GW

You're pretending that that's what nature is doing when you update your prior. It works when sentences are shown to you in an adversarial order, but there's the weird aspect that this prior expects the sentences to go back to being drawn from some fixed distribution afterwards. It doesn't do a thing where it goes "ah, I'm seeing a bunch of blue blocks selectively revealed, even though I think there's a bunch of red blocks, the next block I'll have revealed will probably be blue". Instead, it just sticks with its prior on red and blue blocks.

Comment by diffractor on An Untrollable Mathematician Illustrated · 2018-03-22T01:03:12.216Z · score: 14 (5 votes) · LW · GW

There's a misconception, it isn't about finding sentences of the form and , because if you do that, it immediately disproves . It's actually about merely finding many instances of where has probability, and this lowers the probability of . This is kind of like how finding out about the Banach-Tarski paradox (something you assign low probability to) may lower your degree of belief in the axiom of choice.

The particular thing that prevents trolling is that in this distribution, there's a fixed probability of drawing on the next round no matter how many implications and 's you've found so far. So the way it evades trolling is a bit cheaty, in a certain sense, because it believes that the sequence of truth or falsity of math sentences that it sees is drawn from a certain fixed distribution, and doesn't do anything like believing that it's more likely to see a certain class of sentences come up soon.

Comment by diffractor on An Untrollable Mathematician Illustrated · 2018-03-22T00:52:39.221Z · score: 22 (7 votes) · LW · GW

There's a difference between "consistency" (it is impossible to derive X and notX for any sentence X; this requires a halting oracle to test, because there are always more proof paths) and "propositional consistency", which merely requires that there are no contradictions discoverable by boolean algebra only. So A^B is propositionally inconsistent with notA, and propositionally consistent with A. If there's some clever way to prove that B implies notA, it wouldn't affect the propositional consistency of them at all. Propositional consistency of a set of sentences can be verified in exponential time.
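
A minimal sketch of such a check by brute-force truth-table enumeration (the encoding of sentences as Python predicates is just for illustration):

```python
from itertools import product

# A set of sentences is propositionally consistent iff some assignment of truth
# values to its atomic propositions makes them all true. Enumerating assignments
# is exponential in the number of atoms, matching the "exponential time" remark.
def propositionally_consistent(atoms, sentences):
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(s(v) for s in sentences):
            return True
    return False

# A^B is propositionally consistent with A, but inconsistent with notA:
print(propositionally_consistent(["A", "B"], [lambda v: v["A"] and v["B"], lambda v: v["A"]]))      # True
print(propositionally_consistent(["A", "B"], [lambda v: v["A"] and v["B"], lambda v: not v["A"]]))  # False
```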

Comment by diffractor on Distributed Cooperation · 2018-03-18T09:17:50.000Z · score: 1 (1 votes) · LW · GW

If you drop the Pareto-improvement condition from the cell rank, and just have "everyone sorts things by their own utility", then you won't necessarily get a Pareto-optimal outcome (within the set of cell center-points), but you will at least get a point where there are no strict Pareto improvements (no points that leave everyone better off).

The difference between the two is... let's say we've got a 2-player 2-move game that, in utility-space, makes some sort of quadrilateral. If the top and right edges join at 90 degrees, the Pareto frontier would be the point on the corner, and the set of "no strict Pareto improvements" would be the top and right edges.

If that corner is obtuse, then both "Pareto frontier" and "no strict Pareto improvements" agree that both line edges are within the set, and if the corner is acute, then both agree that only the corner is within the set. It actually isn't much of a difference; it only manifests when the utilities for a player are exactly equal, and it is easily changed by a little bit of noise.
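
A toy sketch of the distinction, using the standard definitions and a made-up set of outcome points with a right-angle corner:

```python
# y is a (weak) Pareto improvement over x if y >= x componentwise and y != x;
# y is a strict Pareto improvement over x if y > x in every component.
def pareto_optimal(x, points):
    return not any(all(y[i] >= x[i] for i in range(len(x))) and y != x for y in points)

def no_strict_improvement(x, points):
    return not any(all(y[i] > x[i] for i in range(len(x))) for y in points)

points = [(0, 0), (1, 0), (0, 1), (1, 1)]  # right-angle corner at (1, 1)
for p in points:
    print(p, pareto_optimal(p, points), no_strict_improvement(p, points))
# Only (1, 1) is Pareto-optimal, but (1, 0) and (0, 1) also admit no strict improvement.
```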

The utility-approximation issue you pointed out seems to be pointing towards the impossibility of guaranteeing limiting to a point on the Pareto frontier (when you make the cell size smaller and smaller), precisely because of that "this set is unstable under arbitrarily small noise" issue.

But, the "set of all points that have no strict Pareto improvements by more than for all players", ie, the -fuzzed version of "set of points with no strict pareto improvement", does seem to be robust against a little bit of noise, and doesn't require the Pareto-improvement condition on everyone's ranking of cells.

So I'm thinking that if that's all we can attain (because of the complication you pointed out), then it lets us drop that inelegant Pareto-improvement condition.

I'll work on the proof that for sufficiently small cell size , you can get an outcome within of the set of "no strict Pareto improvements available"

Nice job spotting that flaw.

Comment by diffractor on Set Up for Success: Insights from 'Naïve Set Theory' · 2018-03-01T00:22:19.044Z · score: 14 (4 votes) · LW · GW

I read through the entire Logical Induction paper, most-everything on Agent Foundations Forum, the advised Linear Algebra textbook, part of a Computational Complexity textbook, and the Optimal Poly-Time Estimators paper.

I'd be extremely interested in helping out other people with learning MIRI-relevant math, having gone through it solo. I set up a Discord chatroom for it, but it's been pretty quiet. I'll PM you both.

Comment by diffractor on Further Progress on a Bayesian Version of Logical Uncertainty · 2018-02-03T04:46:18.000Z · score: 0 (0 votes) · LW · GW

Intermediate update:

The handwavy argument about how you'd get propositional inconsistency in the limit of imposing the constraint of "the string cannot contain and and and... and "

is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either or leads to a violation of the constraint.

So, running the prior far enough forward leads to the probability distribution being nearly certain that, while that particular constraint applied in the past, it will stop applying at some point in the future by vetoing both possible extensions of a string of sentences, and then less-constrained conditions will apply from that point forward.

On one hand, if you don't have the computational resources to enforce full propositional consistency, it's expected that most of the worlds you generate will be propositionally inconsistent, and midway through generating them you'll realize that some of them are indeed propositionally inconsistent.

On the other hand, we want to be able to believe that constraints capable of painting themselves into a corner will apply to reality forevermore.

I'll think about this a bit more. One possible line of attack is having and not add up to one, because it's possible that the sentence generating process will just stop cold before one of the two shows up, and renormalizing them to 1. But I'd have to check if it's still possible to -approximate the distribution if we introduce this renormalization, and to be honest, I wouldn't be surprised if there was a more elegant way around this.

EDIT: yes it's still possible to -approximate the distribution in known time if you have refer to , although the bounds are really loose. Will type up the proof later.