Comments
Good point on phase 6. For phase 3, smaller changes in velocity further out are fine, but I still think that even with smaller velocity changes, you'll have difficulty finding an engine that gets sufficient delta-V that isn't fission-, fusion-, or antimatter-based. (Also, in the meantime I realized that neutron damage over those sorts of timescales is going to be *really* bad.) For phase 5, I don't think a lightsail would provide enough deceleration, because you've got inverse-square losses. Maybe you could decelerate with a lightsail in the inner stellar system, but I think you'd just breeze right through, since the radius of the "efficiently slow down" sphere is too small relative to how much you need to slow down, and in the outer stellar system, light pressure is too low to slow you down meaningfully.
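Some rough numbers for the inverse-square point, as a sketch: the total work starlight can do on a perfectly reflecting sail between infinity and its closest approach r_min is finite and scales as 1/r_min, which caps how much speed the sail can shed. The 1 g/m² areal density and the 0.1 AU closest approach are illustrative assumptions, and the calculation is non-relativistic and ignores the star's gravity.

```python
import math

C = 299_792_458.0   # speed of light, m/s
L_SUN = 3.828e26    # solar luminosity, W
AU = 1.496e11       # astronomical unit, m


def max_sheddable_speed(areal_density, r_min, luminosity=L_SUN):
    """Largest speed (m/s) a face-on, perfectly reflecting sail could shed using starlight.

    Radiation pressure gives a force per unit mass of 2 L / (4 pi r^2 c sigma).
    Integrating from r_min out to infinity gives the total work per kilogram,
    2 L / (4 pi c sigma r_min), which bounds the kinetic energy per kg removed.
    Non-relativistic; ignores the star's gravity and Doppler effects.
    """
    work_per_kg = 2 * luminosity / (4 * math.pi * C * areal_density * r_min)
    return math.sqrt(2 * work_per_kg)


if __name__ == "__main__":
    # ASSUMED: a 1 g/m^2 sail whose closest approach to a Sun-like star is 0.1 AU.
    v = max_sheddable_speed(areal_density=1e-3, r_min=0.1 * AU)
    print(f"{v / 1e3:.0f} km/s, i.e. {v / C:.5f} c")
```

Under these assumptions the cap is on the order of 165 km/s, a small fraction of a percent of c, and getting four times closer to the star only doubles it.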
Very good point!
I'd be extremely interested in the quantitative analysis you've done so far.
See if this works.
I'm talking about using a laser sail to get up to near c in the first place (0.1 g acceleration for 40 light-years is pretty strong), and slowing down by other means.
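As a quick check on "near c", here is the speed reached after 40 light-years at 0.1 g, using the standard hyperbolic-motion relation γ = 1 + a·d/c². Treating the sail's proper acceleration as constant over the whole run is a simplifying assumption.

```python
import math

C = 299_792_458.0                 # speed of light, m/s
G0 = 9.80665                      # standard gravity, m/s^2
LIGHT_YEAR = 9.4607304725808e15   # m


def beta_after_distance(accel, distance):
    """Speed as a fraction of c after accelerating from rest at constant proper
    acceleration `accel` (m/s^2) over `distance` (m) measured in the launch frame."""
    gamma = 1.0 + accel * distance / C**2
    return math.sqrt(1.0 - 1.0 / gamma**2)


if __name__ == "__main__":
    print(f"{beta_after_distance(0.1 * G0, 40 * LIGHT_YEAR):.3f} c")
```

This comes out at roughly 0.98c (γ of about 5).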
This trick is about using a laser sail for both acceleration and deceleration.
Yeah, I think the original proposal for a solar sail involved deceleration by having the central part of the sail detach and receive the reflected beam from the outer "ring" of the sail. I didn't do this because IIRC the beam only maintains coherence over 40 light-years or so, so that trick would be for nearby missions.
For 1, the mental model for non-relativistic but high speeds should be "a shallow crater is instantaneously vaporized out of the material going fast", and for relativistic speeds it should be the same thing, but with the vaporization directed into a deeper hole (the energy doesn't spread out as much; it stays in a narrow cone) instead of in all directions. However, your idea of having a spacecraft be a big flat sheet that can tolerate having a bunch of holes shot in it is promising. The main issue I see is that this approach is incompatible with a lot of things that (as far as we know) can only be done with solid chunks of matter, like antimatter energy capture or sideways boosting rockets, and once you start armoring the solid chunks in the floaty sail, you're sort of back in the same situation. So it's an interesting approach, and it'd be cool if it could work, but I'm not quite sure it can (I'm not entirely confident that it couldn't, just that it would require a bunch of weird solutions to problems like "how does your sheet of tissue boost sideways at 0.1% of lightspeed?").
For 2, the problem is that the particles which are highly penetrating are either unstable (muons, kaons, neutrons...) and will fall apart well before arrival (and that's completely dodging the issue of making bulk matter out of them), or they are stable (neutrinos, dark matter) and don't interact with anything. Since they don't really interact with anything, they especially don't interact with each other (well, at least we know this for neutrinos), so they can't hold together any structure, nor can they interact with matter at the destination. Making a craft out of neutrinos is ridiculously more difficult than making a craft out of room-temperature air. If they can go through a light-year of lead without issue, they aren't exactly going to stick to each other. Heck, I think you'd actually have better luck trying to make a spaceship out of pure light.
For 3, it's because in order to use ricocheting mass to power your starcraft, you need to already have some way of ramping the mass up to relativistic speeds so it can get to the rapidly retreating starcraft in the first place, and you need an awful lot of mass. Light already starts off at the most relativistic speed of all, and around a star you already have astronomical amounts of light available for free.
For 4, there sort of is, but mostly not. The gravity example has the problem that the craft speeding up while the two stars are ahead of it perfectly counterbalances the deceleration while the two stars are behind it. For potentials like gravity or electric fields or pretty much anything you'd want to use, there's an inverse-square law, which means they aren't really relevant unless you're fairly close to a star. The one instance I can think of where something like your approach works is the electric sail design in the final part. In interstellar space, it brakes against the thin soup of protons as usual, but near a star, the "wind" of particles streaming out from the star acts as a more effective brake, and it can sail on that (going out) or use it for better deceleration (coming in). Think of it as a sail slowing a boat down when the air is stationary, and slowing it down even better when the wind is blowing against you.
Whoops, I guess I messed up on that setting. Yeah, it's ok.
Actually, no! The activation energy for the conversion of diamond to graphite is about 540 kJ/mol, and using the Arrhenius equation to get the rate constant for diamond-to-graphite conversion with a radiator temperature of 1900 K, we get that after 10,000 years of continuous operation, 99.95% of the diamond will still be diamond. At room temperature, the diamond-to-graphite conversion rate is slow enough that protons will decay before any appreciable amount of graphite is made.
Even for a 100,000 year burn, 99.5% of the diamond will still be intact at 1900 K.
There isn't much room to ramp up the temperature, though. We can keep around 99%+ of the diamond intact up to around 2100 K, but at 2200 K 5% of the diamond converts, at 2300 K 15% converts, at 2400 K 45% converts, and at 2500 K and 2600 K it's 80% and 99% conversion of diamond into graphite over 10,000 years, respectively.
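A sketch of the Arrhenius calculation, assuming first-order kinetics (fraction converted = 1 − exp(−kt)). The pre-exponential factor isn't stated above, so A = 1 s⁻¹ is an assumption here; with it, the sketch lands close to the quoted percentages.

```python
import math

R = 8.314                              # gas constant, J/(mol*K)
EA = 540e3                             # activation energy, diamond -> graphite, J/mol
A = 1.0                                # ASSUMED pre-exponential factor, 1/s
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rate_constant(temp_k):
    """Arrhenius rate constant k = A * exp(-Ea / (R * T)), in 1/s."""
    return A * math.exp(-EA / (R * temp_k))


def fraction_converted(temp_k, years):
    """Fraction of the diamond turned into graphite after `years`, assuming first-order kinetics."""
    return 1.0 - math.exp(-rate_constant(temp_k) * years * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"1900 K, 100,000 years: {1 - fraction_converted(1900, 1e5):.2%} intact")
    for temp in (1900, 2100, 2200, 2300, 2400, 2500, 2600):
        print(f"{temp} K: {fraction_converted(temp, 1e4):.1%} converted after 10,000 years")
```

Raising T shrinks the exponent Ea/RT, which is why the conversion climbs so steeply between 2100 K and 2600 K.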
Agreed. Also, there's an incentive to keep thinking about how to go faster, instead of launching, until one more day of design work speeds the rocket up by less than one day; otherwise you'll get overtaken. There's also an incentive to agree on a coordinated plan ahead of time (you get this galaxy, I get that galaxy, etc.) to avoid issues with lightspeed delays.
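The stopping rule can be sketched as minimizing design time plus travel time. The speed-improvement curve, the 1000 light-year distance, and the 50-year time constant below are all hypothetical; the point is only that "keep designing while one more step shortens arrival" stops exactly where a day of design buys less than a day of travel.

```python
import math

DISTANCE_LY = 1000.0   # HYPOTHETICAL target distance, light-years
TAU_YEARS = 50.0       # HYPOTHETICAL time constant for design improvements, years


def cruise_speed(wait_years):
    """HYPOTHETICAL achievable cruise speed (fraction of c) after `wait_years` of design work."""
    return 0.99 - 0.5 * math.exp(-wait_years / TAU_YEARS)


def arrival_time(wait_years):
    """Years from now until arrival: time spent designing plus travel time."""
    return wait_years + DISTANCE_LY / cruise_speed(wait_years)


def best_launch_time(step=0.01, max_wait=2000.0):
    """Keep designing while one more step saves more travel time than it costs.
    arrival_time is convex for this model, so the first stopping point is the optimum."""
    wait = 0.0
    while wait < max_wait and arrival_time(wait + step) < arrival_time(wait):
        wait += step
    return wait


if __name__ == "__main__":
    w = best_launch_time()
    print(f"launch after ~{w:.0f} years, arrive after ~{arrival_time(w):.0f} years")
```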
Or maybe accepting messages from home (in rocket form or not) of "whoops, we were wrong about X, here's the convincing moral argument" and acting accordingly. Then the only thing to be worried about would be irreversible acts done in the process of colonizing a galaxy, instead of having a bad "living off resources" end state.
Edited. Thanks for that. I guess I managed to miss both of those; I was mainly going off of the indispensable and extremely thorough Atomic Rockets site having very little discussion of intergalactic missions as opposed to interstellar missions.
It looks like there are some spots where Armstrong and I converged on the same strategy (using lasers to launch probes), but we seem to disagree about how big of a deal dust shielding is, how hard deceleration is, and what strategy to use for deceleration.
Yeah, Atomic Rockets was an incredibly helpful resource for me, I definitely endorse it for others.
This doesn't quite seem right, because just multiplying probabilities only works when all the quantities are independent. In this case, I'd put higher odds on someone having the ability to recognize a worthwhile result conditional on them having the ability to work on a problem than on them having the ability to recognize a worthwhile result unconditionally, so the joint probability will be higher than the naive product suggests at first.
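A toy illustration with made-up numbers: when the two abilities are positively correlated, the joint probability P(A)·P(B|A) exceeds the naive product P(A)·P(B).

```python
# HYPOTHETICAL numbers, chosen only for illustration.
P_WORK = 0.1                  # P(able to work on the problem)
P_RECOG_GIVEN_WORK = 0.5      # P(able to recognize a worthwhile result | able to work on it)
P_RECOG_GIVEN_NO_WORK = 0.1   # P(able to recognize a worthwhile result | not able to work on it)

# Marginal probability of recognizing a worthwhile result (law of total probability).
p_recog = P_WORK * P_RECOG_GIVEN_WORK + (1 - P_WORK) * P_RECOG_GIVEN_NO_WORK

naive_product = P_WORK * p_recog        # treats the two abilities as independent
joint = P_WORK * P_RECOG_GIVEN_WORK     # chain rule: P(A and B) = P(A) * P(B | A)

print(f"naive {naive_product:.3f} vs. actual {joint:.3f}")   # naive 0.014 vs. actual 0.050
```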
I'm unsure whether this consideration affects whether the distribution would be lognormal or not.
(lightly edited restatement of email comment)
Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has the choice between two paths and slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".
There will be 0 utility for taking the left path, -10 utility for taking the right path, and -1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for taking the left path and -10 utility for taking the right path, while the conditional says 0 utility for the left path and -1010 utility for the right path (because conditional on taking the right path, you were hit by a cosmic ray).
In order to get the Dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between decision time and afterwards. So the initial bet could be something like -1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But now, since the optimal action is "take the right path along with the bet", the problem setup has been changed, and we can't conclude that the agent's conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can't money-pump with the "+0.5 utility, -12 utility upon a cosmic ray hit" bet.
So this seems to Dutch-book Death-in-Damascus, not CDT-EDT cases in general.
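A minimal sketch of the payoff arithmetic, taking the payoffs as 0 (left), -10 (right), and -1000 (cosmic ray hit), and the bet as -1 now and +12 for going right without a cosmic ray hit. Once the bet is accepted, right becomes the better action, which is why the conditional on going right stops concentrating on cosmic-ray worlds.

```python
# Payoffs as read here: left 0, right -10, cosmic ray hit -1000.
U_LEFT, U_RIGHT, U_RAY = 0, -10, -1000


def cdt_value(action):
    """Counterfactual value: choosing a path doesn't change the chance of a cosmic ray."""
    return U_LEFT if action == "left" else U_RIGHT


def conditional_value(action, p_ray_given_right=1.0):
    """Conditional value: taking the dispreferred right path is evidence of a cosmic ray hit."""
    if action == "left":
        return U_LEFT
    return U_RIGHT + p_ray_given_right * U_RAY


def value_with_bet(action):
    """CDT value after accepting the bet (-1 now, +12 for going right), assuming no ray hit."""
    payout = 12 if action == "right" else 0
    return cdt_value(action) - 1 + payout


if __name__ == "__main__":
    print(cdt_value("right"), conditional_value("right"))    # -10 -1010.0
    print(value_with_bet("left"), value_with_bet("right"))   # -1 1
```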
Yes, UDT means updateless decision theory, "the policy" is used as a placeholder for "whatever policy the agent ends up picking", much like a variable in an equation, and "the algorithm I wrote" is still unpublished because there were too many things wrong with it for me to be comfortable putting it up, as I can't even show it has any nice properties in particular. Although now that you mention it, I probably should put it up so future posts about what's wrong with it have a well-specified target to shoot holes in. >_>
It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto optimality.
The definition of an almost stratified Pareto optimum was adapted from this, and was developed specifically to address the infinite game in that post involving a non-well-founded chain of players, where nothing is a stratified Pareto optimum for all players. Something isn't stratified Pareto optimal in a vacuum; it's stratified Pareto optimal for a particular player. There's no oracle that's stratified Pareto optimal for all players, but if you take the closure of everyone's SPO sets first to produce a set of ASPO oracles for every player, and take the intersection of all those sets, there are points which are ASPO for everyone.
My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .
The notation for the sum operator is unclear. I'd advise writing the sum as and using an subscript inside the sum so it's clearer what is being substituted where.
Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have be low while is high, so the argument still goes through.
Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the original formulation of ADT and the thing I wrote up that I should probably clean up this post a bit and clarify more about what's different in the two formulations. Thanks for that.
In the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.
Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, and I think randomization is better modeled by a (deterministic) action that consults an environmental random-number generator, which may be correlated with other things.
What I meant was that, in the version of argmax that I set up, if is the two constant policies "take blank box" and "take shiny box", then for the embedder where the opponent runs argmax to select which box to fill, the argmax agent will converge to deterministically randomizing between the two policies, by the logical inductor assigning very similar expected utility to both options such that the inductor can't predict which action will be chosen. And this occurs because the inductor outputting more of "take the blank box" will have converge to a higher expected value (so argmax will learn to copy that), and the inductor outputting more of "take the shiny box" will have converge to a higher expected value (so argmax will learn to copy that).
The optimality proof might be valid. I didn't understand which specific step you thought was wrong.
So, the original statement in the paper was
It must then be the case that for every . Let be the first element of in . Since every class will be seperated by at least in the limit, will eventually be a distribution over just . And since for every , , by the definition of it must be the case that .
The issue with this is the last sentence. It's basically saying "since the two actions and get equal expected utility in the limit, the total variation distance between a distribution over the two actions and one of the actions limits to zero", which is false.
And it is specifically disproved by the second counterexample, where there are two actions that both result in 1 utility, so they're both in the same equivalence class, but a probabilistic mixture between them (as converges to playing, for all ) gets less than 1 utility.
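A small check of why that step fails: the total variation distance between a 50/50 mixture of two actions and either pure action is 0.5, no matter how close the two actions' expected utilities are. The action names are placeholders.

```python
def total_variation(p, q):
    """Total variation distance between two distributions over a finite set of actions,
    given as dicts mapping action -> probability (missing actions have probability 0)."""
    actions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in actions)


# Placeholder actions that both yield utility 1, so they share an equivalence class.
utility = {"a1": 1.0, "a2": 1.0}
mixture = {"a1": 0.5, "a2": 0.5}
pure_a1 = {"a1": 1.0}

print(utility["a1"] - utility["a2"])       # 0.0: no utility gap between the actions
print(total_variation(mixture, pure_a1))   # 0.5: but the distributions stay far apart
```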
Consider the following embedder. According to this embedder, you will play chicken against ADT-epsilon who knows who you are. When ADT-epsilon considers this embedder, it will always pass the reality filter, since in fact ADT-epsilon is playing against ADT-epsilon. Furthermore, this embedder gives NeverSwerveBot a high utility. So ADT-epsilon expects a high utility from this embedder, through NeverSwerveBot, and it never swerves.
You'll have to be more specific about "who knows who you are". If it unpacks as "opponent only uses the embedder where it is up against [whatever policy you plugged in]", then NeverSwerveBot will have a high utility, but it will get knocked down by the reality filter, because if you converge to never swerving, will converge to 0, and the inductor will learn that so it will converge to assigning equal expected value to and, and converges to 1.
If it unpacks as "opponent is ADT-epsilon", and you converge to never swerving, then argmaxing will start duplicating the swerve strategy instead of going straight. In both cases, the argument fails.
I got an improved reality filter that blocks a certain class of environments that lead conjecture 1 to fail, although it isn't enough to deal with the provided chicken example and lead to a proof of conjecture 1. (The subscripts will be suppressed for clarity.)
Instead of the reality filter for being
it is now
This doesn't just check whether reality is recovered on average; it also checks whether all the "plausible conditionals" line up as well. Some of the conditionals may not be well-formed, as there may be conditioning on low-or-zero-probability events, but these are then multiplied by a very small number, so no harm is done.
This has the nice property that for all "plausibly chosen embedders" that have a probability sufficiently far away from 0, all embedders and that pass this reality filter have the property that
So all embedders that pass the reality filter will agree on the expected utility of selecting a particular embedder that isn't very unlikely to be selected.
I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operation.
Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.
Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and 20+2 in the tails world. This is just a given. I'll take this +15/15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"
Something else feels slightly off, but I can't quite pinpoint it at this point. Still, I guess this solves my question as originally stated, so I'll PM you for payout. Well done!
(Btw, you can highlight a string of text and hit ctrl+4 to turn it into math mode.)
Yup, I meant counterfactual mugging. Fixed.
I think I remember the original ADT paper showing up on agent foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT for exploration. I'll edit for clarity.
I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.
The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover desired behavior.
Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix, and it automatically displays possible outcomes in utility space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.
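For anyone who wants to reproduce that kind of picture, a rough sketch of the underlying computation: sweep both players' mixed strategies and map each pair to its expected-utility point. This is not Giles Edkins' code; the chicken payoffs are hypothetical, and the sketch doesn't compute equilibria.

```python
def expected_utilities(payoffs, p, q):
    """payoffs[i][j] = (u1, u2) when player 1 plays move i and player 2 plays move j.
    p = probability player 1 plays move 0, q = probability player 2 plays move 0."""
    probs1 = (p, 1 - p)
    probs2 = (q, 1 - q)
    u1 = sum(probs1[i] * probs2[j] * payoffs[i][j][0] for i in range(2) for j in range(2))
    u2 = sum(probs1[i] * probs2[j] * payoffs[i][j][1] for i in range(2) for j in range(2))
    return u1, u2


def outcome_cloud(payoffs, steps=20):
    """Expected-utility points for a (steps + 1) x (steps + 1) grid of mixed-strategy pairs."""
    grid = [k / steps for k in range(steps + 1)]
    return [expected_utilities(payoffs, p, q) for p in grid for q in grid]


# HYPOTHETICAL game of chicken: move 0 = swerve, move 1 = go straight.
CHICKEN = [[(0, 0), (-1, 1)],
           [(1, -1), (-10, -10)]]

if __name__ == "__main__":
    for point in outcome_cloud(CHICKEN, steps=4):
        print(point)
```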
The basic reason for the dependency relation to care about oracle queries from strategies is that, when you have several players all calling the oracle on each other, there's no good way to swap out the oracle calls with the computation. The trick you describe does indeed work, and is a reason to not call any more Turing machines than you need to, but there are several things it doesn't solve. For instance, if you are player 1, and your strategy depends on oracle calls to players 2 and 3, and the same applies to the other two players, you may be able to swap out an oracle call to player 2 with player 2's actual code (which calls players 1 and 3), but you can't unpack any more oracle calls into their respective computations without hitting an infinite regress.
I'm not sure what you mean by fixing the utility function occurring before fixing the strategy. In the problem setup of a game, you specify a utility function machine and a strategy machine for everyone, and there isn't any sort of time or order on this (there's just a set of pairs of probabilistic oracle machines), and you can freely consider things such as "what happens when we change some player's strategies/utility function machines".
Ah, the formal statement was something like "if the policy A isn't the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds"
Yeah, I am assuming fast feedback that it is resolved on day .
What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all Turing machines; we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on).
Also, I'm less confident in conditional future-trust for all conditionals than I used to be; I'll try to crystallize where I think it goes wrong.
First: That notation seems helpful. Fairness of the environment isn't present by default; it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it's still unfair in the sense that the expected utility of feeding action into the function drops sharply if you condition on the argmax agent selecting action . The third condition was necessary to carry out this step. . The intuitive interpretation of the third condition is that, if you know that policy B selects action 4, then you can step from "action 4 is taken" to "policy B takes the actions it takes", and if you have a policy where you don't know what action it takes (third condition is violated), then "policy B does its thing" may have a higher expected utility than any particular action being taken, even in a fair environment that only cares about action sequences, as the hamster dance example shows.
Second: I think you misunderstood what I was claiming. I wasn't claiming that logical inductors attain the conditional future-trust property, even in the limit, for all sentences or all true sentences. What I was claiming was: The fact that is provable or disprovable in the future (in this case, is ), makes the conditional future-trust property hold (I'm fairly sure), and for statements where there isn't guaranteed feedback, the conditional future-trust property may fail. The double-expectation property that you state does not work to carry the proof through, because the proof (from the perspective of the first agent) takes as an assumption, so the "conditional on " part has to be outside of the future expectation when you go back to what the first agent believes.
Third: the sense I meant for "agent is able to reason about this computation" is that the computation is not extremely long, so logical inductor traders can bet on it.
Pretty much that, actually. It doesn't seem too irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn't an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.
The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it's an improvement (from the perspective of the prior) for the nosy neighbors/values that care about other worlds to dictate some of what happens in your world (since the marginal contribution of your world to the prior expected utility looks like some linear combination of the various utility functions, weighted by how much they care about your world). So, in practice, it'd be a bad idea to build a UDT value learning prior containing utility functions that have preferences over all worlds, since it'd add a bunch of extra junk from different utility functions to our world if run.
If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
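A toy example of the point, with made-up arm probabilities: a purely greedy agent that got one unlucky pull from the better arm never revisits it, while an ε-greedy agent's exploration steps correct the spurious underestimate.

```python
import random


def run_bandit(epsilon, steps=10_000, seed=0, means=(0.4, 0.6)):
    """Two-armed Bernoulli bandit with a sample-average value estimate per arm.

    Each arm is seeded with one pull: arm 0 happened to pay 1 and arm 1 (really the
    better arm) happened to pay 0, giving arm 1 a spurious underestimate.
    Returns (pull counts, final estimates).
    """
    rng = random.Random(seed)
    totals = [1.0, 0.0]
    counts = [1, 1]
    for _ in range(steps):
        estimates = [totals[i] / counts[i] for i in range(2)]
        if rng.random() < epsilon:
            arm = rng.randrange(2)                             # exploration step
        else:
            arm = max(range(2), key=lambda i: estimates[i])    # greedy step
        reward = 1.0 if rng.random() < means[arm] else 0.0
        totals[arm] += reward
        counts[arm] += 1
    return counts, [totals[i] / counts[i] for i in range(2)]


if __name__ == "__main__":
    print("greedy:  ", run_bandit(epsilon=0.0))
    print("e-greedy:", run_bandit(epsilon=0.1))
```

The greedy run never pulls arm 1 again, because arm 0's estimate can't fall to zero while arm 1's sits at exactly zero.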
Yeah, when I went back and patched up the framework of this post to be less logical-omniscience-y, I was able to get , but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires to assign probability 1 to .
Since beliefs/values combinations can be ruled out, would it then be possible to learn values by asking the human about their own beliefs?
I found an improved version by Pavel, that gives a way to construct a proof of from that has a length of . The improved version is here.
There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from to instantiating a such that . Pretty much all reasonable theorem provers have this.
The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn't much of a restriction in practice, because NBG set theory, which proves the consistency of ZFC, is finitely axiomatizable.
The proof strategy is basically as follows. It's shown that the shortest proof of a statement with quantifier depth n must have a length of , if the maximum quantifier depth in the proof is or greater.
This can be flipped around to conclude that if there's a length-n proof of , the maximum quantifier depth in the proof can be at most .
The second part of the proof involves constructing a bounded-quantifier version of a truth predicate. By Tarski's undefinability theorem, a full truth predicate cannot be constructed, but it's possible to exhibit a formula for which it's provable that ( is the formula laying out Tarski's conditions for something to be a truth predicate). Also, if quantifier depth of , there's a proof of ( is the sentence with its free variables substituted for the elements enumerated in the list ). Also, there's a proof that is preserved under inference rules and logical axioms, as long as everything stays below a quantifier depth of .
All these proofs can be done in lines. One factor of comes from the formula abbreviated as getting longer at a linear rate, and the other factor comes from having to prove for each separately as an ingredient for the next proof.
Combining the two parts, the bound on the quantifier depth and the bound on how long it takes to prove stuff about the truth predicate, make it take steps to prove all the relevant theorems about a sufficiently large bounded quantifier depth truth predicate, and then you can just go "the statement that we are claiming to have been proved must have apply to it, and we've proved this is equivalent to the statement itself"
As a further bonus, a single length proof can establish the consistency of the theory itself for all length proofs.
It seems like a useful project to develop a program that will automatically write a proof of this form, to assess whether abstract unpacking of bounded proofs is usable in practice, but it will require digging into a bunch of finicky details of exactly how to encode a math theory inside itself.
Caught a flaw with this proposal in the currently stated form, though it is probably patchable.
When unpacking a proof, at some point the sentence will be reached as a conclusion, which is a false statement.
It doesn't hurt my brain, but there's a brain fog that eventually kicks in, which is kind of like a blankness with no new ideas coming, an aversion to further work, and a reduction in working memory, so I can stare at some piece of math for a while and not comprehend it, because I can't load all the concepts into my mind at once. It's kind of like a hard limit for any cognition-intensive task.
This kicks in around the 2-hour mark for really intensive work/studying, although for less intensive work/studying, it can vary all the way up to 8 hours. As a general rule of thumb, the -afinil class of drugs triples my time limit until the brain fog kicks in, at a cost of less creative and lateral thinking.
Because of this, my study habits for school consisted of alternating 2-hour study blocks and naps.
I think that in that case, the agent shouldn't smoke, and CDT is right, although there is side-channel information that can be used to come to the conclusion that the agent should smoke. Here's a reframing of the provided payoff matrix that makes this argument clearer. (Also, your problem as stated should have 0 utility for a non-smoker imagining the situation where they smoke and get killed.)
Let's say that there is a kingdom which contains two types of people, good people and evil people, and a person doesn't necessarily know which type they are. There is a magical sword enchanted with a heavenly aura, and if a good person wields the sword, it will guide them to do heroic things, for +10 utility (according to a good person) and 0 utility (according to an evil person). However, if an evil person wields the sword, it will afflict them for the rest of their life with extreme itchiness, for -100 utility (according to everyone).
good person's utility estimates:
takes sword, I'm good: 10
takes sword, I'm evil: -90
don't take sword: 0

evil person's utility estimates:
takes sword, I'm good: 0
takes sword, I'm evil: -100
don't take sword: 0
As you can clearly see, this is the exact same payoff matrix as the previous example. However, now it's clear that if a (secretly good) CDT agent believes that most of society is evil, then it's a bad idea to pick up the sword, because the agent is probably evil (according to the info it has) and will be tormented with itchiness for the rest of its life, and if it believes that most of society is good, then it's a good idea to pick up the sword. Further, this situation is intuitively clear enough to argue that CDT just straight-up gets the right answer in this case.
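A sketch of the (secretly good) CDT agent's calculation as a function of its credence that it is evil, using the good person's numbers from the table (10 if good, -90 if evil, 0 for not taking the sword). With these particular numbers, the break-even credence is 10%.

```python
U_GOOD, U_EVIL, U_REFUSE = 10.0, -90.0, 0.0   # good person's utilities from the table


def take_sword_value(p_evil):
    """Expected utility of taking the sword, given credence p_evil of being evil."""
    return (1 - p_evil) * U_GOOD + p_evil * U_EVIL


def should_take_sword(p_evil):
    """CDT takes the sword iff its expected utility beats refusing."""
    return take_sword_value(p_evil) > U_REFUSE


# Break-even credence: (1 - p) * 10 + p * (-90) = 0  =>  p = 10 / 100
BREAK_EVEN = U_GOOD / (U_GOOD - U_EVIL)

if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.5, 0.9):
        print(p, take_sword_value(p), should_take_sword(p))
```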
A human (with some degree of introspective power) in this case, could correctly reason "oh hey I just got a little warm fuzzy feeling upon thinking of the hypothetical where I wield the sword and it doesn't curse me. This is evidence that I'm good, because an evil person would not have that response, so I can safely wield the sword and will do so".
However, what the human is doing in this case is using side-channel information that isn't present in the problem description. They're directly experiencing sense data as a result of the utility calculation outputting 10 in that hypothetical, and updating on that. In a society where everyone was really terrible at introspection, so the only access they had to their decision algorithm was seeing their actual decision (and assuming no previous decision problems that good and evil people decide differently on, so the good person could learn that they were good by their actions), it seems to me like there's a very intuitively strong case for not picking up the sword/not smoking.
A: While that is a really interesting note that I hadn't spotted before, the standard formulation of exploration steps in logical inductor decision theory involves infinitely many exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn't make it explore less.
B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of "attempting to cross the bridge". One is crossing because you believe it to be the best action, and the other is crossing because an exploration step occurred, and Thompson sampling doesn't have a split decision criterion like this. Although now that you point it out, it is possible to make a Thompson sampling variant where the troll blows up the bridge when "crossing the bridge" is not the highest-ranked action.
The beliefs aren't arbitrary; they're still reasoning according to a probability distribution over propositionally consistent "worlds". Furthermore, the beliefs converge to a single number in the limit of updating on theorems, even if the sentence of interest is unprovable. Consider some large but finite set S of sentences that haven't been proved yet, such that, with probability very close to 1, the sampling process draws the sentence of interest "x" or "not-x" before it draws any sentence outside of S. Then pick a time N large enough that, by then, all the logical relations between the sentences in S have been found. With probability very close to 1, either "x" or "not-x" will be sampled without going outside of S, and everything relevant to that part of the process is already known at time N.
So, if some cool new theorem shows up relating "x" to a sentence outside of S, like "y → x", you're almost certain to hit either "x" or "not-x" before hitting "y", because "y" is outside S. So this hot new theorem won't affect the probabilities by more than a negligible amount.
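A toy Monte Carlo version of the sampling picture, assuming three propositional atoms and a small hypothetical pool of sentences with made-up weights: sentences are drawn from a fixed distribution, each is kept only if it is propositionally consistent with everything kept so far, and the run ends when a single world remains. Updating on a theorem is modeled as restricting the starting set of worlds. This is an illustration of the idea, not the exact prior under discussion.

```python
import itertools
import random

ATOMS = ("x", "y", "z")

# Hypothetical sentence pool: (weight, description, predicate over a world).
# Literals and their negations guarantee every run eventually pins down one world.
POOL = [
    (3.0, "x", lambda w: w["x"]),
    (3.0, "not-x", lambda w: not w["x"]),
    (2.0, "y", lambda w: w["y"]),
    (2.0, "not-y", lambda w: not w["y"]),
    (1.0, "z", lambda w: w["z"]),
    (1.0, "not-z", lambda w: not w["z"]),
    (1.0, "y and z", lambda w: w["y"] and w["z"]),
    (1.0, "x or z", lambda w: w["x"] or w["z"]),
]


def all_worlds():
    """Every truth assignment to the atoms."""
    return [dict(zip(ATOMS, bits)) for bits in itertools.product((False, True), repeat=len(ATOMS))]


def sample_world(theorems, rng):
    """Draw sentences from the fixed pool until the kept sentences determine a world."""
    worlds = [w for w in all_worlds() if all(t(w) for t in theorems)]
    if not worlds:
        raise ValueError("theorems are propositionally inconsistent")
    weights = [entry[0] for entry in POOL]
    while len(worlds) > 1:
        _, _, sentence = rng.choices(POOL, weights=weights)[0]
        survivors = [w for w in worlds if sentence(w)]
        if survivors:  # consistent with everything kept so far: keep it
            worlds = survivors
        # otherwise the sentence is rejected and another one is drawn
    return worlds[0]


def probability(query, theorems=(), samples=10000, seed=0):
    """Monte Carlo estimate of the prior probability of query given the theorems."""
    rng = random.Random(seed)
    hits = sum(query(sample_world(theorems, rng)) for _ in range(samples))
    return hits / samples
```

For example, probability(lambda w: w["x"]) estimates the prior on "x", and passing theorems=(lambda w: (not w["y"]) or w["x"],) estimates it after learning "y → x". In this three-atom toy the shift is noticeable; the argument above is that with a large enough S it becomes negligible.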
Also, I figured out how to generalize the prior a bit to take into account arbitrary constraints other than propositional consistency, though there are still kinks to iron out in that one. Check this.
Update: This isn't really an issue; you just need to impose an assumption that there is some function $f$ such that $f(n) > n$, and $f(n)$ is computable in time polynomial in $f(n)$, and you always find out whether exploration happened on turn $n$ after $f(n)$ days.
This is just the condition that there's a subsequence where good feedback is possible, and it's discussed at length in section 4.3 of the logical induction paper.
If there's a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there are infinitely many exploration steps on subsequence B (and also on A, because it contains B).
This post is hereby deprecated. Still right, just not that relevant.
Yup, that particular book is how I learned to prove stuff too. (Well, actually, there was a substantial time delay between reading it and being able to prove stuff, but it's an extremely worthwhile overview.)
You're pretending that this is what nature is doing when you update your prior. It works when sentences are shown to you in an adversarial order, but there's the weird aspect that this prior expects the sentences to go back to being drawn from some fixed distribution afterwards. It doesn't do a thing where it goes "ah, I'm seeing a bunch of blue blocks selectively revealed; even though I think there are a bunch of red blocks, the next block I'll have revealed will probably be blue". Instead, it just sticks with its prior on red and blue blocks.
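A toy contrast for the blocks example. The fixed-distribution predictor below ignores the revealed history entirely, which is the behavior described; Laplace's rule of succession is a swapped-in comparison of a predictor that does learn from selective reveals. The prior p_blue = 1/4 is a hypothetical number.

```python
from fractions import Fraction


def fixed_prior_next_blue(revealed, p_blue=Fraction(1, 4)):
    """Treats every revealed block as a fresh draw from a fixed distribution,
    so the history of what was revealed never changes the prediction."""
    return p_blue


def laplace_next_blue(revealed):
    """Comparison predictor (rule of succession) that does learn from the history."""
    blues = sum(1 for color in revealed if color == "blue")
    return Fraction(blues + 1, len(revealed) + 2)
```

After an adversary reveals 20 blue blocks in a row, fixed_prior_next_blue still returns 1/4 while laplace_next_blue returns 21/22.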
There's a misconception here: it isn't about finding sentences of the form $\phi \to \psi$ and $\neg\psi$, because if you do that, it immediately disproves $\phi$. It's actually about merely finding many instances of $\phi \to \psi$ where $\psi$ has low probability, and this lowers the probability of $\phi$. This is kind of like how finding out about the Banach-Tarski paradox (something you assign low probability to) may lower your degree of belief in the axiom of choice.
The particular thing that prevents trolling is that in this distribution, there's a fixed probability of drawing $\phi$ on the next round, no matter how many implications $\phi \to \psi$ with low-probability $\psi$ you've found so far. So the way it evades trolling is a bit cheaty, in a certain sense: it believes that the sequence of truth or falsity of math sentences that it sees is drawn from a certain fixed distribution, and doesn't do anything like believing that it's more likely to see a certain class of sentences come up soon.
There's a difference between "consistency" (it is impossible to derive X and not-X for any sentence X; this requires a halting oracle to test, because there are always more proof paths) and "propositional consistency", which merely requires that there are no contradictions discoverable by Boolean algebra alone. So A ∧ B is propositionally inconsistent with not-A, and propositionally consistent with A. If there's some clever way to prove that B implies not-A, it wouldn't affect their propositional consistency at all. Propositional consistency of a set of sentences can be verified in exponential time.
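A brute-force sketch of the exponential-time check: treat every non-Boolean sentence as an opaque atom and try all 2^n truth assignments. The nested-tuple formula encoding is just an illustrative choice. Because A and B are opaque atoms here, a clever proof that B implies not-A is invisible to the check, which is exactly the gap between propositional consistency and full consistency.

```python
import itertools

# Formulas as nested tuples: ("var", name), ("not", f), ("and", f, g),
# ("or", f, g), ("implies", f, g).


def variables(f):
    """Collect the atom names occurring in a formula."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(variables(sub) for sub in f[1:]))


def evaluate(f, world):
    """Evaluate a formula under a truth assignment (dict from atom name to bool)."""
    tag = f[0]
    if tag == "var":
        return world[f[1]]
    if tag == "not":
        return not evaluate(f[1], world)
    if tag == "and":
        return evaluate(f[1], world) and evaluate(f[2], world)
    if tag == "or":
        return evaluate(f[1], world) or evaluate(f[2], world)
    if tag == "implies":
        return (not evaluate(f[1], world)) or evaluate(f[2], world)
    raise ValueError(f"unknown connective {tag!r}")


def propositionally_consistent(sentences):
    """True iff some truth assignment satisfies every sentence.

    Tries all 2**n assignments to the n atoms, hence exponential time.
    """
    names = sorted(set().union(*(variables(s) for s in sentences)))
    for bits in itertools.product((False, True), repeat=len(names)):
        world = dict(zip(names, bits))
        if all(evaluate(s, world) for s in sentences):
            return True
    return False


A, B = ("var", "A"), ("var", "B")
```

With these definitions, [("and", A, B), A] is consistent and [("and", A, B), ("not", A)] is not.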
If you drop the Pareto-improvement condition from the cell rank, and just have "everyone sorts things by their own utility", then you won't necessarily get a Pareto-optimal outcome (within the set of cell center points), but you will at least get a point where there are no strict Pareto improvements (no points that leave everyone strictly better off).
The difference between the two is this: let's say we've got a 2-player, 2-move game that makes some sort of quadrilateral in utility space. If the top and right edges join at 90 degrees, the Pareto frontier would be the point on the corner, and the set of "no strict Pareto improvements" would be the top and the right edges.
If that corner is obtuse, then both "Pareto frontier" and "no strict Pareto improvements" agree that both edges are within the set, and if the corner is acute, both agree that only the corner is within the set. It actually isn't much of a difference: it only manifests when the utilities for a player are exactly equal, and it's easily changed by a little bit of noise.
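A small sketch of the two sets on a finite sample of utility points. The points are a hypothetical discretization of the right-angle case: the top edge, the right edge meeting it at (1, 1), and two interior points.

```python
def strictly_improves(q, p):
    """q leaves every player strictly better off than p."""
    return all(qi > pi for qi, pi in zip(q, p))


def weakly_improves(q, p):
    """q is a Pareto improvement on p: nobody worse off, somebody strictly better."""
    return all(qi >= pi for qi, pi in zip(q, p)) and any(qi > pi for qi, pi in zip(q, p))


def pareto_frontier(points):
    return [p for p in points if not any(weakly_improves(q, p) for q in points)]


def no_strict_improvement(points):
    return [p for p in points if not any(strictly_improves(q, p) for q in points)]


# Hypothetical 2-player utility points: top and right edges meet at a right angle at (1, 1).
top_edge = [(x / 4, 1.0) for x in range(5)]
right_edge = [(1.0, y / 4) for y in range(4)]
points = top_edge + right_edge + [(0.2, 0.3), (0.5, 0.5)]
```

Here pareto_frontier(points) keeps only the corner (1.0, 1.0), while no_strict_improvement(points) keeps every point on both edges, matching the right-angle case described above.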
The utility-approximation issue you pointed out seems to point towards the impossibility of guaranteeing convergence to a point on the Pareto frontier (as you make the cell size smaller and smaller), precisely because of that "this set is unstable under arbitrarily small noise" issue.
But the "set of all points that have no strict Pareto improvements by more than $\epsilon$ for all players", i.e., the fuzzed version of "set of points with no strict Pareto improvement", does seem to be robust against a little bit of noise, and doesn't require the Pareto-improvement condition on everyone's ranking of cells.
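A sketch of the fuzzed set on the same kind of right-angle example, with hypothetical numbers: ε = 0.05 and noise of at most 0.01 per coordinate. Since jitter can move any utility difference by at most 0.02, which is below ε, no point's membership can change.

```python
import random


def eps_strictly_improves(q, p, eps):
    """q beats p by more than eps for every player."""
    return all(qi > pi + eps for qi, pi in zip(q, p))


def eps_undominated_indices(points, eps):
    """Indices of points with no strict Pareto improvement by more than eps."""
    return {i for i, p in enumerate(points)
            if not any(eps_strictly_improves(q, p, eps) for q in points)}


def jitter(points, scale, rng):
    """Add independent uniform noise in [-scale, scale] to every coordinate."""
    return [tuple(c + rng.uniform(-scale, scale) for c in p) for p in points]


# Hypothetical utility points: indices 0-8 lie on the top and right edges, 9-10 are interior.
top_edge = [(x / 4, 1.0) for x in range(5)]
right_edge = [(1.0, y / 4) for y in range(4)]
points = top_edge + right_edge + [(0.2, 0.3), (0.5, 0.5)]
```

Both eps_undominated_indices(points, 0.05) and the same call on jitter(points, 0.01, rng) give the nine edge indices. The unfuzzed set (ε = 0) has no such guarantee: a jittered corner point can land strictly above a jittered edge point in both coordinates and knock it out.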
So I'm thinking that if that's all we can attain (because of the complication you pointed out), then it lets us drop that inelegant Pareto-improvement condition.
I'll work on the proof that for sufficiently small cell size $\delta$, you can get an outcome within $\epsilon$ of the set of "no strict Pareto improvements available".
Nice job spotting that flaw.
I read through the entire Logical Induction paper, nearly everything on the Agent Foundations Forum, the recommended Linear Algebra textbook, part of a Computational Complexity textbook, and the Optimal Poly-Time Estimators paper.
I'd be extremely interested in helping other people learn MIRI-relevant math, having gone through it solo. I set up a Discord chatroom for it, but it's been pretty quiet. I'll PM you both.
Intermediate update:
The handwavy argument about how you'd get propositional inconsistency in the limit of imposing the constraint "the string cannot contain $\phi_1$ and $\phi_2$ and ... and $\phi_n$"
is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either $\psi$ or $\neg\psi$ leads to a violation of the constraint.
So, running the prior far enough forward leads to the probability distribution being nearly certain that, while that particular constraint applied in the past, it will stop applying at some point in the future (by vetoing both possible extensions of a string of sentences), and then less-constrained conditions will apply from that point forward.
On one hand, if you don't have the computational resources to enforce full propositional consistency, it's expected that most of the worlds you generate will be propositionally inconsistent, and midway through generating them you'll realize that some of them are indeed propositionally inconsistent.
On the other hand, we want to be able to believe that constraints capable of painting themselves into a corner will apply to reality forevermore.
I'll think about this a bit more. One possible line of attack is to let $P(\psi)$ and $P(\neg\psi)$ not add up to one, because it's possible that the sentence-generating process will just stop cold before either of the two shows up, and then renormalize them to 1. But I'd have to check whether it's still possible to approximate the distribution if we introduce this renormalization, and to be honest, I wouldn't be surprised if there were a more elegant way around this.
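A toy version of the renormalization idea, computed by exact enumeration. Everything here is hypothetical: three atoms proposed in a fixed order, each polarity proposed with probability 1/2 (falling back to the other polarity if the first is vetoed), and a small set of forbidden literal combinations. Atom "c" plays the role of the sentence whose two extensions can both be vetoed, at which point the process stops cold. This only shows the raw versus renormalized numbers; it says nothing about whether the renormalized distribution can still be approximated.

```python
from fractions import Fraction

ATOMS = ("a", "b", "c")  # hypothetical order in which atoms are proposed
# Hypothetical constraints: the string may not contain every literal of any set below.
FORBIDDEN = [{("a", True), ("c", True)}, {("a", True), ("c", False)}, {("b", False), ("c", True)}]


def violates(literals):
    return any(f <= literals for f in FORBIDDEN)


def run(index, literals):
    """Yield (final_literals, probability) for the process started at ATOMS[index]."""
    if index == len(ATOMS):
        yield literals, Fraction(1)
        return
    atom = ATOMS[index]
    allowed = [v for v in (True, False) if not violates(literals | {(atom, v)})]
    if not allowed:
        yield literals, Fraction(1)  # both extensions vetoed: the process stops cold
        return
    for proposed in (True, False):  # each polarity proposed with probability 1/2
        value = proposed if proposed in allowed else (not proposed)
        for final, p in run(index + 1, literals | {(atom, value)}):
            yield final, Fraction(1, 2) * p


def raw_and_renormalized(atom):
    """Raw P(atom), P(not-atom), and the pair renormalized to sum to 1."""
    pos = neg = Fraction(0)
    for final, p in run(0, frozenset()):
        if (atom, True) in final:
            pos += p
        elif (atom, False) in final:
            neg += p
    total = pos + neg
    return pos, neg, (pos / total, neg / total)
```

For "c", the raw probabilities are 1/8 and 3/8 (the process halts with probability 1/2 whenever "a" comes out true), and renormalizing gives 1/4 and 3/4.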
EDIT: yes it's still possible to approximate the distribution in known time if you have refer to , although the bounds are really loose. Will type up the proof later.