I have no special insight here but boring, cynical common sense suggests the following:
The big difference between now and the pre-ChatGPT era is that Google and a bunch of other massive competitors have woken up and want to blaze past OpenAI. For their part, OpenAI doesn't want there to be a perception that they have been overtaken, so will want to release on a fast enough schedule to be able to trump Google's latest and greatest. (Of course the arrival of something marketed as "GPT-5" tells us nothing about the true state of progress. The GPTs aren't natural kinds.)
You should be able to get it as a corollary of the lemma that given two disjoint convex subsets U and V of R^n (a non-zero distance apart), there exists an affine function f on R^n such that f(u) > 0 for all u in U and f(v) < 0 for all v in V.
The two convex sets here are (1) the image of the simplex under the map (F_1, ..., F_n) and (2) the "negative quadrant" of R^n (i.e. the set of points all of whose co-ordinates are non-positive).
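For reference, here's the lemma and the application in symbols (my paraphrase, writing F = (F_1, ..., F_n)):

```latex
% Hyperplane separation, as invoked above: if U, V \subset \mathbb{R}^n are
% disjoint, convex and a positive distance apart, then some affine f separates
% them strictly:
\exists\, a \in \mathbb{R}^n,\ b \in \mathbb{R}:\quad
f(x) = \langle a, x \rangle + b, \qquad
f(u) > 0 \ \forall u \in U, \qquad f(v) < 0 \ \forall v \in V.

% Applied here with
U = F(\Delta) = \{(F_1(p), \dots, F_n(p)) : p \in \Delta\}, \qquad
V = \{x \in \mathbb{R}^n : x_i \le 0 \ \text{for all } i\}.
```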
an authoritative payoff matrix that X can't safely calculate xerself.
Why not? Can't the payoff matrix be "read off" from the "world program" (assuming X isn't just 'given' the payoff matrix as an argument)?
- Actually, this is an open problem so far as I know: show that if X is a Naive Decision Theory agent as above, with some analyzable inference module like a halting oracle, then there exists an agent Y written so that X cooperates against Y in a Prisoner's Dilemma while Y defects.
Let me just spell out to myself what would have to happen in this instance. For definiteness, let's take the payoffs in prisoner's dilemma to be $0 (CD), $1 (DD), $10 (CC) and $11 (DC).
Now, if X is going to co-operate and Y is going to defect then X is going to prove "If I co-operate then I get $0". Therefore, in order to co-operate, X must also prove the spurious counterfactual "If I defect then I get $x" for some negative value of x.
But suppose I tweak the definition of the NDT agent so that whenever it can prove (1) "if output = a then utility >= u" and (2) "if output != a then utility <= u" it will immediately output a. (And if several statements of the forms (1) and (2) have been proved then the agent searches for them in the order that they were proved.) Note that our agent will quickly prove "if output = 'defect' then utility >= $1". So if it ever managed to prove "if output = 'co-operate' then utility = $0" it would defect right away.
Since I have tweaked the definition, this doesn't address your 'open problem' (which I think is a very interesting one) but it does show that if we replace the NDT agent with something only slightly less naive, then the answer is that no such Y exists.
(We could replace Prisoner's Dilemma with an alternative game where each player has a third option called "nuclear holocaust", such that if either player opts for nuclear holocaust then both get (say) -$1, and ask the same question as in your note 2. Then even for the tweaked version of X it's not clear that no such Y exists.)
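Just to pin the tweaked rule down, here is a toy sketch (the proof search itself is stubbed out as a stream of already-proved statements, and all the names here are mine):

```python
# Hypothetical sketch of the "tweaked NDT" rule described above. Proved
# statements arrive in the order they were proved, encoded as either
#   ("lower", a, u)  meaning  "if output = a then utility >= u"   -- form (1)
#   ("upper", a, u)  meaning  "if output != a then utility <= u"  -- form (2)
# The agent outputs a as soon as, for some action a, it has a proved form-(1)
# lower bound at least as large as a proved form-(2) upper bound.

def tweaked_ndt(proved_statements):
    best_lower = {}   # action -> largest proved lower bound, form (1)
    best_upper = {}   # action -> smallest proved upper bound, form (2)
    for kind, action, bound in proved_statements:
        if kind == "lower":
            best_lower[action] = max(best_lower.get(action, float("-inf")), bound)
        else:
            best_upper[action] = min(best_upper.get(action, float("inf")), bound)
        if (action in best_lower and action in best_upper
                and best_lower[action] >= best_upper[action]):
            return action   # both (1) and (2) proved for this action: output it
    return None             # proof search exhausted without a decision

# With the payoffs above: the agent quickly proves "if defect then utility >= 1",
# and if it then proves "if output != defect then utility <= 0", it defects:
print(tweaked_ndt([("lower", "defect", 1), ("upper", "defect", 0)]))  # defect
```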
ETA: I'm afraid my idea doesn't work: The problem is that the agent will also quickly prove "if 'co-operate' then I receive at least $0." So if it can prove the spurious counterfactual "if 'defect' then receive -1" before proving the 'real' counterfactual "if 'co-operate' then receive 0" then it will co-operate.
We could patch this up with a rule that said "if we deduce a contradiction from the assumption 'output = a' then immediately output a" which, if I remember rightly, is Nesov's idea about "playing chicken with the inconsistency". Then on deducing the spurious counterfactual "if 'defect' then receive -1" the agent would immediately defect, which could only happen if the agent itself were inconsistent. So if the agent is consistent, it will never deduce this spurious counterfactual. But of course, this is getting even further away from the original "NDT".
[general comment on sequence, not this specific post.]
You have such a strong intuition that no configuration of classical point particles and forces can ever amount to conscious awareness, yet you don't immediately generalize and say: 'no universe capable of exhaustive description by mathematically precise laws can ever contain conscious awareness'. Why not? Surely whatever weird and wonderful elaboration of quantum theory you dream up, someone can ask the same old question: "why does this bit that you've conveniently labelled 'consciousness' actually have consciousness?"
So you want to identify 'consciousness' with something ontologically basic and unified, with well-defined properties (or else, to you, it doesn't really exist at all). Yet these very things would convince me that you can't possibly have found consciousness given that, in reality, it has ragged, ill-defined edges in time, space, even introspective content.
Stepping back a little, it strikes me that the whole concept of subjective experience has been carefully refined so that it can't possibly be tracked down to anything 'out there' in the world. Kant and Wittgenstein (among others) saw this very clearly. There are many possible conclusions one might draw - Dennett despairs of philosophy and refuses to acknowledge 'subjective experience' at all - but I think people like Chalmers, Penrose and yourself are on a hopeless quest.
The comprehension axiom schema (or any other construction that can be used by a proof checker algorithm) isn't enough to prove all the statements people consider to be inescapable consequences of second-order logic.
Indeed, since the second-order theory of the real numbers is categorical, and since it can express the continuum hypothesis, an oracle for second-order validity would tell us either that CH or ¬CH is 'valid'.
("Set theory in sheep's clothing".)
But the bigger problem is that we can't say exactly what makes a "silly" counterfactual different from a "serious" one.
Would it be naive to hope for a criterion that roughly says: "A conditional P ⇒ Q is silly iff the 'most economical' way of proving it is to deduce it from ¬P or else from Q." Something like: "there exists a proof of ¬P or of Q which is strictly shorter than the shortest proof of P ⇒ Q"?
A totally different approach starts with the fact that your 'lemma 1' could be proved without knowing anything about A. Perhaps this could be deemed a sufficient condition for a counterfactual to be serious. But I guess it's not a necessary condition?
Suppose we had a model M that we thought described cannons and cannon balls. M consists of a set of mathematical assertions about cannons
In logic, the technical terms 'theory' and 'model' have rather precise meanings. If M is a collection of mathematical assertions then it's a theory rather than a model.
formally independent of the mathematical system A in the sense that the addition of some axiom A0 implies Q, while the addition of its negation, ~A0, implies ~Q.
Here you need to specify that adding A0 or ~A0 doesn't make the theory inconsistent, which is equivalent to just saying: "Neither Q nor ~Q can be deduced from A."
Note: if by M you had actually meant a model, in the sense of model theory, then for every well-formed sentence s, either M satisfies s or M satisfies ~s. But then models are abstract mathematical objects (like 'the integers'), and there's usually no way to know which sentences a model satisfies.
Perhaps a slightly simpler way would be to 'run all algorithms simultaneously' such that each one is slowed down by a constant factor. (E.g. at time t = (2x + 1) * 2^n, we do step x of algorithm n.) When algorithms terminate, we check (still within the same "process" and hence slowed down by a factor of 2^n) whether a solution to the problem has been generated. If so, we return it and halt.
ETA: Ah, but the business of 'switching processes' is going to need more than constant time. So I guess it's not immediately clear that this works.
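A rough Python sketch of the interleaving scheme (ignoring the switching overhead the ETA worries about; `make_algorithm` and `check` are hypothetical stand-ins):

```python
# Sketch only: run all algorithms "simultaneously", giving algorithm n one step
# at each time t of the form (2x + 1) * 2^n, so each algorithm runs at a fixed
# constant fraction of full speed. Algorithms are modelled as generators that
# yield None while working and a candidate answer when they finish; check()
# verifies a candidate. The cost of switching and checking is ignored here.
from itertools import count

def dovetail(make_algorithm, check):
    running = {}                           # n -> generator for algorithm n
    for t in count(1):
        n = (t & -t).bit_length() - 1      # t = (2x + 1) * 2^n, so recover n
        if n not in running:
            running[n] = make_algorithm(n)
        try:
            candidate = next(running[n])   # one step of algorithm n
        except StopIteration:
            continue                       # algorithm n has finished for good
        if candidate is not None and check(candidate):
            return candidate               # return the first verified solution

# Toy usage: "algorithm n" just proposes the number n; we want the one equal to 3.
print(dovetail(lambda n: iter([n]), lambda c: c == 3))   # -> 3
```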
I agree that definitions (and expansions of the language) can be useful or counterproductive, and hence are not immune from criticism. But still, I don't think it makes sense to play the Bayesian game here and attach probabilities to different definitions/languages being correct. (Rather like how one can't apply Bayesian reasoning in order to decide between 'theory 1' and 'theory 2' in my branching vs probability post.) Therefore, I don't think it makes sense to calculate expected utilities by taking a weighted average over each of the possible stances one can take in the mind-body problem.
I don't understand the question, but perhaps I can clarify a little:
I'm trying to say that (e.g.) analytic functionalism and (e.g.) property dualism are not like inconsistent statements in the same language, one of which might be confirmed or refuted if only we knew a little more, but instead like different choices of language, which alter the set of propositions that might be true or false.
It might very well be that the expanded language of property dualism doesn't "do" anything, in the sense that it doesn't help us make decisions.
Of course, we haven't had any instances of jarring physical discontinuities not being accompanied by 'functional discontinuities' (hopefully it's clear what I mean).
But the deeper point is that the whole presumption that we have 'mental continuity' (in a way that transcends functional organization) is an intuition founded on nothing.
(To be fair, even if we accept that these intuitions are indefensible, it remains to be explained where they come from. I don't think it's all that "bizarre".)
Nice sarcasm. So it must be really easy for you to answer my question then: "How would you show that my suggestions are less likely?"
Right?
You really think there is logical certainty that uploading works in principle and your suggestions are exactly as likely as the suggestion 'uploading doesn't actually work'?
How would you show that my suggestions are less likely? The thing is, it's not as though "nobody's mind has annihilated" is data that we can work from. It's impossible to have such data except in the first-person case, and even there it's impossible to know that your mind didn't annihilate last year and then recreate itself five seconds ago.
We're predisposed to say that a jarring physical discontinuity (even if afterwards, we have an agent functionally equivalent to the original) is more likely to cause mind-annihilation than no such discontinuity, but this intuition seems to be resting on nothing whatsoever.
The identity of an object is a choice, a way of looking at it. The "right" way of making this choice is the way that best achieves your values.
I think that's really the central point. The metaphysical principles which either allow or deny the "intrinsic philosophical risk" mentioned in the OP are not like theorems or natural laws, which we might hope some day to corroborate or refute - they're more like definitions that a person either adopts or does not.
I don't see either as irrational
I have to part company here - I think it is irrational to attach 'terminal value' to your biological substrate (likewise paperclips), though it's difficult to explain exactly why. Terminal values are inherently irrational, but valuing the continuance of your thought patterns is likely to be instrumentally rational for almost any set of terminal values, whereas placing extra value on your biological substrate seems like it could only make sense as a terminal value (except in a highly artificial setting e.g. Dr Evil has vowed to do something evil unless you preserve your substrate).
Of course this raises the question of why the deferred irrationality of preserving one's thoughts in order to do X is better than the immediate irrationality of preserving one's substrate for its own sake. At this point I don't have an answer.
For any particular proposal for mind-uploading, there's probably a significant risk that it doesn't work, but I understand that to mean: there's a risk that what it produces isn't functionally equivalent to the person uploaded. Not "there's a risk that when God/Ripley is watching everyone's viewscreens from the control room, she sees that uploaded person's thoughts are on a different screen from the original."
If the rules of this game allow one side to introduce a "small intrinsic philosophical risk" attached to mind-uploading, even though it's impossible in principle to detect whether someone has suffered 'arbitrary Searlean mind-annihilation', then surely the other side can postulate a risk of arbitrary mind-annihilation unless we upload ourselves. (Even ignoring the familiar non-Searlean mind-annihilation that awaits us in old age.)
Perhaps a newborn mind has a half-life of only three hours before spontaneously and undetectably annihilating itself.
Excellent.
Perhaps m could serve as a 'location', so that you'd be more likely to meet opponents with similar m values to your own.
Thanks, this is all fascinating stuff.
One small suggestion: if you wanted to, there are ways you could eliminate the phenomenon of 'last round defection'. One idea would be to randomly generate the number of rounds according to a geometric distribution (the discrete analogue of an exponential), which is equivalent to having, on each round, a small constant probability that this is the last round. To be honest though, the 'last round' phenomenon makes things more rather than less interesting.
Other ways to spice things up would be: to cause players to make mistakes with small probability (say a 1% chance of defecting when you try to co-operate, and vice versa); or have some probability of misremembering the past.
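A quick sketch of both tweaks (the payoff numbers and example strategy are the usual textbook choices, not anything from the post):

```python
# Iterated PD with (a) a geometrically distributed number of rounds, i.e. a
# constant probability each round that it's the last, and (b) a small chance
# that each intended move is flipped.
import random

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def noisy(move, flip_prob):
    """With probability flip_prob the move comes out as its opposite."""
    if random.random() < flip_prob:
        return "D" if move == "C" else "C"
    return move

def play_match(strategy_a, strategy_b, stop_prob=0.01, flip_prob=0.01):
    history_a, history_b = [], []   # (own move, opponent's move), as observed
    score_a = score_b = 0
    while True:
        move_a = noisy(strategy_a(history_a), flip_prob)
        move_b = noisy(strategy_b(history_b), flip_prob)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
        # Constant chance this round was the last, so no strategy can see the
        # "last round" coming.
        if random.random() < stop_prob:
            return score_a, score_b

def tit_for_tat(history):
    return "C" if not history else history[-1][1]   # copy opponent's last move

print(play_match(tit_for_tat, tit_for_tat))
```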
Conversely, when we got trolled an unspecified length of time ago, an incompetent crackpot troll who shall remain nameless kept having all his posts and comments upvoted by other trolls.
It would help if there was a restriction on how much karma one could add or subtract from a single person in a given time, as others are suggesting.
What interests me about the Boltzmann brain (this is a bit of a tangent) is that it sharply poses the question of where the boundary of a subjective state lies. It doesn't seem that there's any part X of your mental state that couldn't be replaced by a mere "impression of X". E.g. an impression of having been to a party yesterday rather than a memory of the party. Or an impression that one is aware of two differently-coloured patches rather than the patches themselves together with their colours. Or an impression of 'difference' rather than an impression of differently coloured patches.
If we imagine "you" to be a circle drawn with magic marker around a bunch of miscellaneous odds and ends (ideas, memories etc. but perhaps also bits of the 'outside world', like the tattoos on the guy in Memento) then there seems to be no limit to how small we can draw the circle - how much of your mental state can be regarded as 'external'. But if only the 'interior' of the circle needs to be instantiated in order to have a copy of 'you', it seems like anything, no matter how random, can be regarded as a "Boltzmann brain".
Every now and then I see a claim that if there were a uniform weighting of mathematical structures in a Tegmark-like 'verse---whatever that would mean even if we ignore the decision theoretic aspects which really can't be ignored but whatever---that would imply we should expect to find ourselves as Boltzmann mind-computations
The idea is this: Just as most N-bit binary strings have Kolmogorov complexity close to N, so most N-bit binary strings containing s as a substring have Kolmogorov complexity at least N - length(s) + K(s) - somethingsmall.
And now applying the analogy:
N-bit binary string <---> Possible universe
N-bit binary string containing substring s <---> Possible universe containing a being with 'your' subjective state. (Whatever the hell a 'subjective state' is.)
we get:
N-bit binary string containing substring s with Kolmogorov complexity >= N - length(s) + K(s) - O(1) <---> A Boltzmann brain universe.
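One way to cash out the "somethingsmall" above, if I've got the bookkeeping right (the O(log N) absorbs positions and lengths):

```latex
% For all but a fraction of roughly N \cdot 2^{-m} of the N-bit strings x
% containing s as a substring,
K(x) \;\ge\; \bigl(N - \ell(s)\bigr) + K(s) - m - O(\log N),
% since from x and the position of s one recovers both s and the remaining
% N - \ell(s) bits, and those remaining bits are incompressible for most x.
```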
We don't seem to be experiencing nonsensical chaos, therefore the argument concludes that a uniform weighting is inadequate and an Occamian weighting over structures is necessary
I've never seen 'the argument' finish with that conclusion. The whole point of the Boltzmann brain idea is that even though we're not experiencing nonsensical chaos, it still seems worryingly plausible that everything outside of one's instantaneous mental state is just nonsensical chaos.
What an 'Occamian' weighting buys us is not consistency with our experience of a structured universe (because a Boltzmann brain hypothesis already gives us that) but the ability to use science to decide what to believe - and thus what to do - rather than descend into a pit of nihilism and despair.
Hmm, are you interpreting the results as "boo CEOs" then?
I'm only interpreting the result as "boo this fictional CEO".
How would you modify the experiment to return information closer to what was sought?
Well, what Knobe is looking for is a situation where subjects make their 'is' judgements partly on the basis of their 'ought' judgements. Abstractly, we want a 'moral proposition' X and a 'factual proposition' Y such that when a subject learns X, they tend to give higher credence to Y than when they learn ¬X. Knobe takes X = "The side-effects are harmful to the environment" and Y = "The effect on the environment was intended by the CEO".
(My objection to Knobe's interpretation of his experiment can thus be summarised: "The subjects are using Y to express a moral fact, not a 'factual fact'." After all, if you asked them to explain themselves, in one case they'd say "It wasn't intentional because (i) he didn't care about the effect on the environment, only his bottom line." In the other they'd say "it was intentional because (ii) he knew about the effect and did it anyway." But surely the subjects agree on (i) and (ii) in both cases - the only thing that's changing is the meaning of the word 'intentional', so that the subjects can pass moral judgement on the CEO.)
To answer your question: I'm not sure that genuine examples of this phenomenon exist, except when the 'factual' propositions concern the future. If Y is about a past event, then I think any subject who seems to be exhibiting the Knobe effect will quickly clarify and/or correct themselves if you point it out. (Rather like if you somehow tricked someone into saying an ungrammatical sentence and then told them the error.)
Instead, the moral character of an action’s consequences also seems to influence how non-moral aspects of the action – in this case, whether someone did something intentionally or not – are judged.
Stupid Knobe effect. Obviously the subjects' responses were an attempt to pass judgement on the CEO. In one case, he deserves no praise, but in the other he does deserve blame [or so a typical subject would presumably think]. The fact that they were forced to express their judgement of moral character through the word 'intentional', which sometimes is a 'non-moral' quality of an action, doesn't tell us anything interesting.
Merely saying it wouldn't be so bad, as long as there was some substance behind the assertion.
But basically his argument boils down to this:
"If you dunk two wooden boards with wires poked through them into soapy water and then lift them out, the soaps films between the wires are the solution to an NP-hard problem. But creating the boards and wires and dunking them can be done in polynomial time. So as long as physics is Turing computable, P = NP."
This is a fantastically stupid argument, because you could easily create a simulation of the above process that appeared to be just as good at generating the answers to this problem as the real soap films. But if you gave it a somewhat difficult problem, what would happen is that it would quickly generate something which was nearly but not quite a solution, and there's no reason to think that real soap films would do better.
The fact that Bringsjord got as far as formalising his argument in modal logic and writing it up, without even thinking of the above objection, is quite incredible.
Yeah well, it's Selmer P = NP Bringsjord. He's a complete joke!
I think what mathemajician means is that if the stream of data is random (in that the bits are independent random variables each with probability 1/2 of being 1) then Solomonoff induction converges on the uniform measure with high probability (probability 1, in fact).
I'm sure you knew that already, but you don't seem to realize that it undercuts the logic behind your claim:
The universal prior implies you should say "substantially less than 1 million".
O(BB^-1) (or whatever it is) is still greater than O(1) though, and (as best I can reconstruct it) your argument relies on there being a constant penalty.
I think you're implicitly assuming that the K complexity of a hypothesis of the form "these n random bits followed by the observations predicted by H" equals n + (K-complexity of H) + O(1). Whereas actually, it's n + (K-complexity of H) + O(log(n)). (Here, the log(n) is needed to specify how long the sequence of random bits is).
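In symbols (just restating the point):

```latex
% A "switch after the observed bits" hypothesis must specify the n random bits,
% the hypothesis H that takes over, and also how long the random prefix is:
K(\text{``} b_1 b_2 \cdots b_n \text{ followed by the predictions of } H \text{''})
  \;=\; n + K(H) + O(\log n),
\qquad \text{not} \qquad n + K(H) + O(1).
```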
So if you've observed a hugely long sequence of random bits then log(n) is getting quite large and 'switching universe' hypotheses get penalized relative to hypotheses that simply extend the random sequence.
This makes intuitive sense - what makes a 'switching universe' unparsimonious is the arbitrariness of the moment of switching.
(Btw, I thought it was a fun question to think about, and I'm always glad when this kind of thing gets discussed here.)
ETA: But it gets more complicated if the agent is allowed to use its 'subjective present moment' as a primitive term in its theories, because then we really can describe a switching universe with only a constant penalty, as long as the switch happens 'now'.
The mathematical result is trivial, but its interpretation as the practical advice "obtaining further information is always good" is problematic, for the reason taw points out.
A particular agent can have wrong information, and make a poor decision as a result of combining the wrong information with the new information. Since we're assuming that the additional information is correct, I think it's reasonable to also stipulate that all previous information is correct.
Actually, I thought of that objection myself, but decided against writing it down. First of all, it's not quite right to refer to past information as 'right' or 'wrong' because information doesn't arrive in the form of propositions-whose-truth-is-assumed, but in the form of sense data.* It's better to talk about 'misleading information' rather than 'wrong information'. When adversary A tells you P, which is a lie, your information is not P but "A told me P". (Actually, it's not even that, but you get the idea.) If you don't know A is an adversary then "A told me P" is misleading, but not wrong.
Now, suppose the agent's prior has got to where it is due to the arrival of misleading information. Then relative to that prior, the agent still increases its expected utility whenever it acquires new data (ignoring taw's objection).
(On the other hand, if we're measuring expectations wrt the knowledge of some better informed agent then yes, acquiring information can decrease expected utility. This is for the same reason that, in a Gettier case, learning a new true and relevant fact (e.g. most nearby barn facades are fake) can cause you to abandon a true belief in favour of a false one.)
* Yes yes, I know statements like this are philosophically contentious, but within LW they're assumptions to work from rather than be debated.
It's true for anyone who understands random variables and expectations. There's a one line proof, after all.
Or in other words, the expectation of a max of some random variables is always greater or equal to the max of the expectations.
You could call this 'standard knowledge' but it's not the kind of thing one bothers to commit to memory. Rather, one immediately perceives it as true.
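Spelled out, the one-line proof:

```latex
% max(X, Y) >= X and max(X, Y) >= Y pointwise, and expectation is monotone, so
\mathbb{E}[\max(X, Y)] \;\ge\; \mathbb{E}[X]
  \quad\text{and}\quad
\mathbb{E}[\max(X, Y)] \;\ge\; \mathbb{E}[Y],
\qquad\text{hence}\qquad
\mathbb{E}[\max(X, Y)] \;\ge\; \max\bigl(\mathbb{E}[X], \mathbb{E}[Y]\bigr).
```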
What I'm really asking is, if some statement turns out to be undecidable for all of our models,
Nitpick: you don't mean "models" here, you mean "theories".
does that make that conjecture meaningless
Why should it?
or is undecidable somehow distinct from unverifiable.
Oh... you're implicitly assuming a 1920s style verificationism whereby "meaningfulness" = "verifiability". That's a very bad idea because most/all statements turn out to be 'unverifiable' - certainly all laws of physics.
As for mathematics, the word 'verifiable' applied to a mathematical statement simply means 'provable' - either that or you're using the word in a way guaranteed to cause confusion.
Or perhaps by "statement S is verifiable" what you really mean is "there exists an observation statement T such that P(T|S) is not equal to P(T|¬S)"?
You can find it here though.
You don't get to infer P from Q which is probably false, and then assert P with conviction.
What if I were to put P = "there is no such thing as absolute simultaneity" and Q = "special relativity"?
Or P = "the earth orbits the sun" and Q = "Newton's theory of gravity"?
Hyperreals or some other modification to the standard framework (see discussion of "infinity shades" in Bostrom) are necessary in order to say that a 50% chance of infinite utility is better than a 1/3^^^3 chance of infinite utility.
No it isn't, unless like Hajek you think there's something 'not blindingly obvious' about the 'modification to the standard framework' that consists of stipulating that probability p of infinite utility is better than probability q of infinite utility whenever p > q.
This sort of 'move' doesn't need a name. (What does he call it? "Vector valued utilities" or something like that?) It doesn't need to have a paper written about it. It certainly shouldn't be pretended that we're somehow 'improving on' or 'fixing the flaws in' Pascal's original argument by explicitly writing this move down.
Alan Hajek's article is one of the stupidest things I've ever read, and a depressing indictment on the current state of academic philosophy. Bunch of pointless mathematical gimmicks which he only thinks are impressive because he himself barely understands them.
That may be right, but I don't see how it conflicts with my (throwaway) remark.
"Quale" works better than "qualia" because (i) it sounds more like the word "claw" and (ii) it's singular whereas 'qualia' is plural.
Why is the difference relevant? I honestly can't imagine how someone could be in the position of 'feeling as though 2+2=4 is either necessarily true or necessarily false' but not 'feeling as though it's necessarily true'.
(FWIW I didn't downvote you.)
If I say "37460225182244100253734521345623457115604427833 + 52328763514530238412154321543225430143254061105 = 8978898869677433866588884288884888725858488938" it should not immediately strike you as though I'm asserting a necessary truth that cannot possibly be otherwise.
It immediately strikes me that what you're asserting is either necessarily true or necessarily false, and whichever it is it could not be otherwise.
Nitpick 1:
It seems likely to be the optimal way to build an AI that has to communicate with other AIs.
This seems a very contentious claim. For instance, to store the relative heights of people, wouldn't it make more sense to have the virtual equivalent of a ruler with markings on it rather than the virtual equivalent of a table of sentences of the form "X is taller than Y"?
I think the best approach here is just to explicitly declare it as an assumption: 'for argument's sake' your robot uses this method. End of story.
Nitpick 2:
Because of General Relativity, when applied to the real world, it is, in fact, wrong.
This is false. General Relativity doesn't contradict the fact that space is "locally Euclidean".
He's talking about the Lewis Carroll dialog that inspired the ones in GEB. "What the tortoise said to Achilles."
The point of the dialog is that there's something irreducibly 'dynamic' about the process of logical inference. Believing "A" and "A implies B" does not compel you to believe "B". Even if you also believe "A and (A implies B) together imply B". A static 'picture' of an inference is not itself an inference.
Sure, R is recursively enumerable, but S and S_I are not.
The set S is the set of all total recursive functions. This is set in stone for all time. Therefore, there is only one way that S_I can refer to different things:
- Our stock of observational data may be different. In other words, the set I and the values of h(i) for i in I may be different.
But regardless of I and the values of h(i), it's easy to see that one cannot restrict S_I in the way you're attempting to do.
In fact, one can easily see that S_I = the set of functions of the form "if x is in I then h(x), otherwise f(x)" where f is an arbitrary recursive function.
That is, the whole "I" business is completely pointless, except (presumably) to help the reader assure themselves that the result does apply to AIXI.
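In symbols, the characterization being claimed (as I read the definitions above, with h the observed data on the finite set I):

```latex
% S = all total recursive functions; S_I = those consistent with the data.
S_I \;=\; \{\, g \in S : g(i) = h(i) \ \text{for all } i \in I \,\}
     \;=\; \Bigl\{\, x \mapsto
       \begin{cases} h(x) & x \in I \\ f(x) & x \notin I \end{cases}
       \;:\; f \in S \Bigr\}.
% The second equality is the point: an arbitrary total recursive f can fill in
% every unobserved value, so the data barely restricts S_I at all.
```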
phi_n(k) may not halt.
Goedel numbers are integers; how could anything that is enumerated by Goedel numbers not be enumerable? S_I, S, and R are all enumerable. The original paper says that R is the set of partial mu-recursive functions, which means computable functions; and the number of computable functions is enumerable.
You seem to be using 'enumerable' to mean 'countable'. (Perhaps you're confusing it with 'denumerable' which does mean countable.)
RichardKenneway means "recursively enumerable".
What I am doing is choosing the members of S_I.
You're not allowed to - de Blanc has already supplied a definition of S_I. One must either adopt his definition or be talking about something other than his result.
No; the utility function is stipulated to be computable.
What Manfred is calling U(n) here corresponds to what the paper would call U(phi_n(k)).
For simplicity we can make the "naive Bayesian" assumption that they're all independent,
But then for that to work, your prior belief that x + 1 > x, for really large x, has to begin very close to 1. If there was some delta > 0 such that the prior beliefs were bounded above by 1 - delta then the infinite product would always be zero even after Bayesian updates.
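That is, under the independence assumption:

```latex
% If each instance has prior probability at most 1 - \delta and the instances
% are treated as independent, the universal generalization gets prior zero:
P\bigl(\forall x :\ x + 1 > x\bigr)
  \;=\; \prod_{x} P(x + 1 > x)
  \;\le\; \lim_{N \to \infty} (1 - \delta)^N \;=\; 0,
% and a prior of zero cannot be raised by any amount of Bayesian updating.
```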
How would you know to have a prior belief that reallybignumber + 1 > reallybignumber even in advance of noticing the universal generalization?
Because then you can just number your possible outcomes by integers n and set p(n) to 1/U(n) * 1/2^n, which seems too easy to have been missed.
The reason why this wouldn't work is that sometimes what you're calling "U(n)" would fail to be well defined (because some computation doesn't halt) whereas p(n) must always return something.