Comments
What makes it rational is that there is an actual underlying hypothesis about how weather works, instead of a vague "LLMs are a lot like human uploads". And weather prediction outputs numbers connected to reality that we actually care about. And there is no credible alternative hypothesis that implies weather prediction shouldn't work.
I don't want to totally dismiss empirical extrapolations, but given the stakes, I would personally prefer for all sides to actually state their model of reality and how they think the evidence changed its plausibility, as formally as possible.
There is no such disagreement, you just can't test all inputs. And without knowledge of how the internals work, you may be wrong about extrapolating alignment to future systems.
Yes, except I would object to phrasing this anthropic stuff as "we should expect ourselves to be agents that exist in a universe that abstracts well" instead of "we should value universes that abstract well (or other universes that contain many instances of us)" - there are no coherence theorems that force summation over your copies, right? And so it becomes apparent that we can value some other thing.
Also, even if you consider some memories a part of your identity, you can value yourself slightly less after forgetting them, instead of only having a threshold for death.
It doesn't matter whether you call your multiplier "probability" or "value" if it results in your decision to not care about the low-measure branch. The only difference is that probability is supposed to be about knowledge, and the fact that Wallace's argument relies on an arbitrary assumption, not only on physics, means it's not probability but value - and there is no reason to value knowledge of your low-measure instances less.
this makes decision theory and probably consequentialist ethics impossible in your framework
It doesn't? Nothing stops you from making decisions in a world where you are constantly splitting. You can try to maximize splits of good experiences or something. They just wouldn't be the same decisions you would make without knowledge of splits, but why shouldn't new physical knowledge change your decisions?
Things like lions and chairs are other examples.
And counted branches.
This is how Wallace defines it (he in turn defines macroscopically indistinguishable in terms of providing the same rewards). It’s his term in the axiomatic system he uses to get decision theory to work. There’s not much to argue about here?
His definition contradicts the informal intuition that motivates considering macroscopic indistinguishability in the first place.
We should care about low-measure instances in proportion to the measure, just as in classical decision theory we care about low-probability instances in proportion to the probability.
Why? Wallace's argument is just "you don't care about some irrelevant microscopic differences, so let me write down this assumption that is superficially related to that preference, and here - it implies the Born rule". Given MWI, there is nothing wrong physically or rationally in valuing your instances equally whatever their measure is. Their thoughts and experiences don't depend on measure, the same way they don't depend on the thickness or mass of the computer implementing them. You can rationally not care about irrelevant microscopic differences and still care about the number of your thin instances.
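For concreteness, a toy sketch (my own illustration, not Wallace's formalism; the numbers are made up) of how the two valuations come apart:

```python
# Toy comparison: Born-weighted valuation vs. equal weighting of instances.
# Two branches after a quantum event; weights and utilities are illustrative.
branches = [
    {"weight": 0.999999, "utility": 1.0},     # mundane outcome
    {"weight": 0.000001, "utility": -1000.0}, # drastic outcome for that instance
]

# Measure-weighted (Born-rule) valuation: the thin branch barely registers.
born_value = sum(b["weight"] * b["utility"] for b in branches)

# Equal weighting of instances: every instance counts the same,
# no matter how "thin" its branch is.
equal_value = sum(b["utility"] for b in branches) / len(branches)

print(born_value)   # ~0.999
print(equal_value)  # -499.5
```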
How many notions of consciousness do you think are implementable by a short Python program?
Because scale doesn't matter - it doesn't matter whether you are implemented on a thick computer or a thin one.
First of all, macroscopic indistinguishability is not a fundamental physical property - branching indifference is an additional assumption, so I don't see how it's any less arbitrary than branch counting.
But more importantly, the branching indifference assumption is not the same as the informal "not caring about macroscopically indistinguishable differences"! As Wallace showed, branching indifference implies the Born rule, which implies you should care almost not at all about the you in a branch with a measure of 0.000001, even though that branch may involve a drastic macroscopic difference for the you in it. Your being macroscopic doesn't imply you shouldn't care about your low-measure instances.
But why would you want to remove this arbitrariness? Your preferences are fine-grained anyway, so why retain classical counting but deny counting in the space of the wavefunction? It's like saying "dividing the world into people and their welfare is arbitrary - let's focus on measuring the mass of a region of space". The point is that you can't remove all decision-theoretic arbitrariness from MWI - "branching indifference" is just an arbitrary ethical constraint equivalent to valuing measure for no reason, and without it, fundamental physics that works like MWI does not prevent you from making decisions as if quantum immortality works.
“Decoherence causes the Universe to develop an emergent branching structure. The existence of this branching is a robust (albeit emergent) feature of reality; so is the mod-squared amplitude for any macroscopically described history. But there is no non-arbitrary decomposition of macroscopically-described histories into ‘finest-grained’ histories, and no non-arbitrary way of counting those histories.”
Importantly though, on this approach it is still possible to quantify the combined weight (mod-squared amplitude) of all branches that share a certain macroscopic property, e.g. by saying:
“Tomorrow, the branches in which it is sunny will have combined weight 0.7”
There is no non-arbitrary definition of "sunny". If you are fine with approximations, then you can also decide on a decomposition of the wavefunction into some number of observers - it's the same problem as decomposing a classical world that allows physical splitting of thick computers according to the macroscopic property "number of people".
Even if we can’t currently prove certain axioms, doesn’t this just reflect our epistemological limitations rather than implying all axioms are equally “true”?
It doesn't, and they are fundamentally equal. The only reality is the physical one - there is no reason to complicate your ontology with platonically existing math. Math is just a collection of useful templates that may help you predict reality, and the fact that it works is itself always just a physical fact. The best case is that we come to know the true laws of physics, they turn out to work like some subset of math, and then the axioms of that subset would be actually true. You can make guesses about which axioms are compatible with true physics.
Also there is Shoenfield's absoluteness theorem, which I don't understand, but which maybe prevents empirical grounding of CH?
It sure doesn't seem to generalize in GPT-4o's case. But what's the hypothesis for Sonnet 3.5 refusing in 85% of cases? And CoT improving the score, and o1 doing better in the browser, suggest the problem is models not understanding consequences, not models not trying to be good. What's the rate of capability generalization to the agent environment? Are we going to conclude that Sonnet just demonstrates reasoning, instead of doing it for real, if it solves only 85% of the tasks it correctly talks about?
Also, what's the rate of generalization of unprompted problematic-behaviour avoidance? It's much less of a problem if your AI does what you tell it to do - you can just not give it to users, tell it to invent nanotechnology, and win.
GPT-4 is insufficiently capable, even if it were given an agent structure, memory and goal set to match, to pull off a treacherous turn. The whole point of the treacherous turn argument is that the AI will wait until it can win to turn against you, and until then play along.
I don't get why actual ability matters. It's sufficiently capable to pull it off in some simulated environments. Are you claiming that we can't deceive GPT-4, and that it is actually waiting and playing along just because it can't really win?
Whack-A-Mole fixes, from RLHF to finetuning, are about teaching the system to not demonstrate problematic behavior, not about fundamentally fixing that behavior.
Based on what? Problematic behavior avoidance does actually generalize in practice, right?
Not at all. The problem is that their observations would mostly not be in a classical basis.
I phrased it badly, but what I mean is that there is a simulation of Hilbert space, where some regions contain patterns that can be interpreted as observers observing something, and if you count them by similarity, you won't get counts consistent with the Born measure of these patterns. I don't think the basis matters in this model, if you change the basis for the observer, the observations, and the similarity threshold simultaneously? A change of basis would just rotate or scale the patterns, without changing how many distinct observers you can interpret them as, right?
??
Collapse or reality fluid. The point of mangled worlds or some other modification is to evade postulating probabilities on the level of physics.
https://mason.gmu.edu/~rhanson/mangledworlds.html
I mean that if a Turing machine is computing the universe according to the laws of quantum mechanics, observers in such a universe would be distributed uniformly, not by Born probability. So you either need some modification to current physics, such as mangled worlds, or you can postulate that Born probabilities are truly random.
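A minimal worked example of the mismatch I mean (numbers purely illustrative):

```latex
\[
|\psi\rangle = \sqrt{0.9}\,|0\rangle + \sqrt{0.1}\,|1\rangle
\;\Longrightarrow\;
P_{\mathrm{Born}}(0) = 0.9,\quad P_{\mathrm{Born}}(1) = 0.1,
\]
\[
\text{while uniform counting of the two observer-patterns gives}\quad
P_{\mathrm{count}}(0) = P_{\mathrm{count}}(1) = \tfrac{1}{2}.
\]
```

Getting these to agree requires the number of observer-copies per branch to track the mod-squared amplitude, which is exactly the extra ingredient (mangled worlds or similar) that has to be added.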
Our observations are compatible with a world that is generated by a Turing machine with just a couple thousand bits.
Yes, but this is kinda incompatible with QM without mangled worlds.
Imagining two apples is a different thought from imagining one apple, right?
I mean, is it? Different states of the whole cortex are different. And the cortex can't be in a state of imagining only one apple and, simultaneously, be in a state of imagining two apples, obviously. But that's tautological. What are we gaining from thinking about it in such terms? You can say the same thing about the whole brain itself, that it can only have one brain-state at a moment.
I guess there is a sense in which other parts of the brain have more varied thoughts relative to what the cortex can handle, but, like you said, you can use half of the cortex's capacity, so why not define a song and a legal document as different thoughts?
As abstract elements of a provisional framework, cortex-level thoughts are fine; I just wonder what you are claiming about real constraints, aside from "there are limits on thoughts". Because, for example, you need other limits anyway - you can't think an arbitrarily complex thought even if it is intuitively cohesive. But yeah, enough gory details.
On the other hand, I can’t have two songs playing in my head simultaneously, nor can I be thinking about two unrelated legal documents simultaneously.
I can't either, but I don't see just from the architecture why it would be impossible in principle.
Again, I think autoassociative memory / attractor dynamics is a helpful analogy here. If I have a physical instantiation of a Hopfield network, I can’t query 100 of its stored patterns in parallel, right? I have to do it serially.
Yes, but you can theoretically encode many things in each pattern? Although if your parallel processes need different data, one of them will have to skip some responses... It would be better to have different networks, but I don't see the brain providing much isolation. Well, it seems to illustrate the complications of parallel processing that may have played a role in humans usually staying serial.
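To make the serial-recall point concrete, a minimal Hopfield toy (my own sketch; the sizes and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Store a few random +/-1 patterns in a Hopfield network (Hebbian rule).
n_units, n_patterns = 100, 5
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
W = (patterns.T @ patterns) / n_units
np.fill_diagonal(W, 0)

def recall(cue, steps=20):
    """Run attractor dynamics from a noisy cue; it typically settles into ONE stored pattern."""
    state = cue.copy()
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

# Query with a corrupted version of pattern 0: each run converges to a single
# attractor, so retrieving two stored patterns "at once" would need either two
# serial queries or two separate networks.
cue = patterns[0] * np.where(rng.random(n_units) < 0.2, -1, 1)
retrieved = recall(cue)
print((retrieved == patterns[0]).mean())  # fraction of bits matching pattern 0
```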
I still don't get this "only one thing in awareness" thing. There are multiple neurons in the cortex and I can imagine two apples - in what sense can there be only one thing in awareness?
Or equivalently, it corresponds equally well to two different questions about the territory, with two different answers, and there’s just no fact of the matter about which is the real answer.
Obviously the real answer is the model which is more veridical^^. The latter hindsight model is right not about the state of the world at t=0.1, but about what you later thought about the world at t=0.1.
If that’s your hope—then you should already be alarmed at trends
It would be nice for someone to quantify the trends. Otherwise it may as well be that the trends point to easygoing-enough and aligned-enough future systems.
For some humans, the answer will be yes—they really would do zero things!
Nah, it's impossible for evolution to just randomly stumble upon such complicated and unnatural mind-design. Next you are going to say what, that some people are fine with being controlled?
Where an entity has never had the option to do a thing, we may not validly infer its lack of preference.
Aha, so if we do give the option to an entity and it doesn't always kill all humans, then we have evidence it cares, right?
If there is a technical refutation it should simplify back into a nontechnical refutation.
Wait, why would prohibiting successors stop OpenAI from declaring an easygoing system a failure? Ah, right - because there is no technical analysis, just elements of one.
I genuinely think it's a "more dakka" situation - the difficulty of communication is often underestimated, but it is possible to reach a mutual understanding.
RLHF does not solve the alignment problem because humans can’t provide good-enough feedback fast-enough.
Yeah, but the point is that the system learns values before an unrestricted AI vs AI conflict.
As mentioned in the beginning, I think the intuition goes that neural networks have a personality trait which we call “alignment”, caused by the correspondence between their values and our values. But “their values” only really makes sense after an unrestricted AI vs AI conflict, since without such conflicts, AIs are just gonna propagate energy to whichever constraints we point them at, so this whole worldview is wrong.
I mean, if your definition of values doesn't make sense for real systems, then that's a problem with your definition. As a hypothesis describing reality, "the alignment trait makes AI not splash harm on humans" is coherent enough. So the question is: how do you know it is unlikely to happen?
This has not led to the destruction of humanity yet because the biggest adversaries have kept their conflicts limited (because too much conflict is too costly) so no entity has pursued an end by any means necessary. But this only works because there’s a sufficiently small number of sufficiently big adversaries (USA, Russia, China, …), and because there’s sufficiently much opportunity cost.
First, "alignment is easy" is compatible with "we need to keep the set of big adversaries small". But more generally, without numbers it seems like a generalized anti-future-technology argument - what's stopping human regulatory mechanisms from solving this adversarial problem that didn't stop them from solving previous adversarial problems?
It makes conflict more viable for small adversaries against large adversaries
Not necessarily? It's not inconceivable for future defense to be more effective than offence (trivially true if "defense" is not giving AI to attackers). It's kind of required for any future where humans have more power than in the present day?
But also, if you predict a completion model where a very weak hash is followed by its pre-image, it will probably have learned to undo the hash, even though the source generation process never performed that (potentially much more complicated than the hashing function itself) operation, which means it’s not really a simulator.
I'm saying that this won't work with current systems, at least for a strong hash, because it's hard, and instead of learning to undo it, the model will learn to simulate, because that's easier. And then you can vary the strength of the hash to measure the degree of predictorness/simulatorness and compare it with what you expect. Or do a similar thing with something other than a hash that also distinguishes the two frames.
The point is that without experiments like these, how have you come to believe in the predictor frame?
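Something like the following toy data-generation setup is what I have in mind (the "weak hash" here is just CRC32, and all names are illustrative):

```python
import random
import string
import zlib

def weak_hash(s: str) -> str:
    """A deliberately weak 'hash' -- CRC32, invertible by brute force for short inputs."""
    return format(zlib.crc32(s.encode()), "08x")

def make_example(rng: random.Random, length: int = 4) -> str:
    """Generation process: sample a pre-image, emit the hash FOLLOWED BY the pre-image.
    Note that the generator only ever computes the hash forward; it never inverts it."""
    preimage = "".join(rng.choices(string.ascii_lowercase, k=length))
    return f"{weak_hash(preimage)} -> {preimage}"

rng = random.Random(0)
training_corpus = [make_example(rng) for _ in range(5)]
print("\n".join(training_corpus))

# Test: prompt a trained completion model with "<hash of an unseen pre-image> -> "
# and check whether it produces the pre-image.
# - A "simulator" of the generation process should fail (the process it imitates
#   never performs inversion).
# - A loss-minimizing "predictor" would have to learn the inversion.
# Varying the hash strength varies how costly that inversion is to learn.
```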
I don’t understand, how is “not predicting errors” either a thing we have observed, or something that has anything to do with simulation?
I guess it is less about simulation being the right frame and more about prediction being the wrong one. But I think we have definitely observed LLMs mispredicting things we wouldn't want them to predict. Or is this actually a crux and you haven't seen any evidence at all against the predictor frame?
And I don’t think we’ve observed any evidence of that.
What about any time a system generalizes favourably, instead of predicting errors? You can say it's just a failure of prediction, but it's not like these failures are random.
That is the central safety property we currently rely on and pushes things to be a bit more simulator-like.
And what is the evidence that this property, rather than, for example, the inherent bias of NNs, is the central one? Why wouldn't a predictor exhibit more malign goal-directedness even for short-term goals?
I can see that this whole story about modeling LLMs as predictors, and goal-directedness, and fundamental laws of cognition is logically coherent. But where is the connection to reality?
Why wouldn't myopic bias make it more likely to simulate than predict? And doesn't empirical evidence about LLMs support the simulators frame? Like, what observations persuaded you that we are not living in a world where LLMs are simulators?
In order to be “UP-like” in a relevant way, this procedure will have to involve running TMs, and the set of TMs that might be run needs to include the same TM that implements our beings and their world.
Why? The procedure just needs to do some reasoning, constrained by the UP and the outer TM. And then the UP-beings can just simulate this fast reasoning without the problems of self-simulation.
Yes, an AI that practically uses the UP may fail to predict whether the UP-beings simulate it in the center of their universe or on the boundary. But the point is that the more correct the AI is in its reasoning, the more control the UP-beings have.
Or you can not create an AI that thinks about the UP. But that's denying the assumption.
Yet, you can find valence in your own experiences
But why must you care about valence? It's not an epistemic error to not care. You don't have direct experience of there being a law that you must care about valence.
Everyone but Elon himself would say the above is a different scenario from reality. Each of us knows which body our first-person perspective resides in. And that is clearly not the physical human being referred to as Elon Musk. But the actual and imaginary scenarios are not differentiated by any physical difference of the world, as the universe is objectively identical.
They are either differentiated by a physically different location of some part of your experience - like your memory being connected to Elon's sensations, or your thought being executed in another location - or it would be wrong to say that this scenario is different from reality: what you imagine would just correctly correspond to Elon's experiences also being real.
Computationalism is an ethical theory, so it is fine for it to be based on high-level abstractions - ethics is arbitrary.
For (1) the multiverse needs to be immensely larger than our universe, by a factor of at least 10^(10^6) or so “instances”. The exact double exponent depends upon how closely people have to match before it’s reasonable to consider them to be essentially the same person. Perhaps on the order of millions of data points is enough, maybe more are needed. Evidence for MWI is nowhere near strong enough to justify this level of granularity in the state space and it doesn’t generalize well to space-time quantization so this probably isn’t enough.
Why? Even without unphysically ordering arbitrary point-states, doesn't the whole splitting behavior create at least all subjectively distinguishable instances?
There is non-zero measure on a branch that starts with you terminally ill and gradually proceeds to you miraculously recovering. So if you consider normally recovered you to be you, nothing stops you from considering this low-measure you to also be you.
I have never heard of anyone going to sleep as one of a pair of twins and waking up as the other.
According to MWI everyone wakes up as multiple selves all the time.
Still don't get how souls would get you psychic powers. Otherwise randomness and causality don't matter - you may as well simultaneously create people in numbered rooms, and the people in low-numbered rooms would have the same problems.
conscious in the way that we are conscious
Whether it's the same way is an ethical question, so you can decide however you want.
So there should be some sort of hardware-dependence to obtain subjective experience.
I certainly don't believe in subjective experience without any hardware, but no, there is not much dependence beyond your preferences about hardware.
As for generally accepted conclusions... I think it's generally accepted that some preferences about hardware are useful in epistemic contexts, so you can be persuaded to say "a rock is not conscious" for the same reason you say "a rock is not a calculator".
Not sure this qualifies, but I try to avoid instantiating complicated models for ethical reasons.
What does "dumb" mean? Corrigibility basically is being selectively dumb. You can give power to an LLM and it would likely still follow instructions.
Given a low prior probability of doom as apparent from the empirical track record of technological progress, I think we should generally be skeptical of purely theoretical arguments for doom, especially if they are vague and make no novel, verifiable predictions prior to doom.
And why is such use of the empirical track record valid? Like, what's the actual hypothesis here? What law of nature says "if technological progress hasn't caused doom yet, it won't cause it tomorrow"?
MIRI’s arguments for doom are often difficult to pin down, given the informal nature of their arguments, and in part due to their heavy reliance on analogies, metaphors, and vague supporting claims instead of concrete empirically verifiable models.
And arguments against are based on concrete empirically verifiable models of metaphors.
If your model of reality has the power to make these sweeping claims with high confidence, then you should almost certainly be able to use your model of reality to make novel predictions about the state of the world prior to AI doom that would help others determine if your model is correct.
Doesn't MIRI's model predict some degree of the whole Shoggoth/actress thing in current systems? Seems verifiable.
There is a weaker and maybe shorter version by Chalmers: https://consc.net/papers/panpsychism.pdf. The short version is that there is no way for you to non-accidentally know about the quantization state of your brain without that quantization being part of an easy problem: pretty much by definition, if you can just physically measure it, it's easy and not mysterious.
Panpsychism is correct about the genuineness and subjectivity of experiences, but you can quantize your caring about other differences between the experiences of a human and a zygote however you want.
If we live in naive MWI, an IBP agent would not care for good reasons, because naive MWI is a “library of babel” where essentially every conceivable thing happens no matter what you do.
Doesn't the frequency of amplitude-patterns change depending on what you do? So an agent can care about that instead of point-states.
In the case of teleportation, I think teleportation-phobic people are mostly making an implicit error of the form “mistakenly modeling situations as though you are a Cartesian Ghost who is observing experiences from outside the universe”, not making a mistake about what their preferences are per se.
Why not both? I can imagine that someone would be persuaded to accept teleportation/uploading if they stopped believing in a physical Cartesian Ghost. But it's possible that reminding them that continuity of experience, like "table", is just a description of a physical situation and not a divinely blessed necessary value would be enough to tip the balance toward them valuing carbon or whatever. It's bad to be wrong about Cartesian Ghosts, but it's also bad to think that you don't have a choice about how you value experience.
Analogy: When you’re writing in your personal diary, you’re free to define “table” however you want. But in ordinary English-language discourse, if you call all penguins “tables” you’ll just be wrong. And this fact isn’t changed at all by the fact that “table” lacks a perfectly formal physics-level definition.
You're also free to define "I" however you want in your values. You're only wrong if your definitions imply a wrong physical reality. But defining "I" and "experiences" in such a way that you will not experience anything after teleportation is possible without implying anything physically wrong.
You can be wrong about the physical reality of teleportation. But even after you have figured out that there is no additional physical process going on that kills your soul, except for the change of location, you can still move from "my soul crashes against an asteroid" to "soul-death in my values means a sudden change in location" instead of to "my soul remains alive".
It's not like I even expect you specifically to mean "not liking teleportation is necessarily irrational". It's just that saying there should be an actual answer to questions about "I" and "experiences" makes people moral realists.
I'm asking how physicists in the laboratory know that their observations are sharp-valued and classical?
If we were just talking about word definitions and nothing else, then sure, define “self” however you want. You have the universe’s permission to define yourself into dying as often or as rarely as you’d like, if word definitions alone are what concerns you.
But this post hasn’t been talking about word definitions. It’s been talking about substantive predictive questions like “What’s the very next thing I’m going to see? The other side of the teleporter? Or nothing at all?”
There should be an actual answer to this, at least to the same degree there’s an answer to “When I step through this doorway, will I have another experience? And if so, what will that experience be?”
Why? If "I" is an arbitrary definition, then “When I step through this doorway, will I have another experience?" depends on this arbitrary definition and so is also arbitrary.
But I hope the arguments I’ve laid out above make it clear what the right answer has to be: You should anticipate having both experiences.
So you always anticipate all possible experiences, because of the multiverse? And if they are weighted, then wouldn't discovering that you are made of mini-yous change your anticipation even without changing your brain state?
What's the evidence for these "sharp-valued classical observations" being real things?
In particular, a many-worlder has to discard unobserved results in the same way as a Copenhagenist—it’s just that they interpret doing so as the unobserved results existing in another branch, rather than being snipped off by collapse.
A many-worlder doesn't have to discard unobserved results - you may care about other branches.
The wrong part is mostly in https://arxiv.org/pdf/1405.7577.pdf, but: indexical probabilities of being a copy are value-laden - it seems like the derivation first assumes that branching happens globally, and then assumes that you are forbidden to count the different instantiations of yourself that were created by this global process.
"The" was just me being bad in English. What I mean is:
1. There is probably a way to mathematically model true stochasticity. Properly, not as many-worlds.
2. Math being deterministic shouldn't be a problem, because the laws of a truly stochastic world are not stochastic themselves.
3. I don't expect any such model to be simpler than the many-worlds model. And that's why you shouldn't believe in true stochasticity.
4. If 1 is wrong and it's not possible to mathematically model true stochasticity, then it's even worse, and I would question your assertion that true stochasticity is even coherent.
5. If you say that mathematical models turn out complex because deterministic math is an unnatural language for true stochasticity, then how do you compare them without math? The program that outputs an array is also simpler than the one that outputs one sample from that array (toy illustration below).
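A toy illustration of that last point (rough character counts as a stand-in for description length):

```python
# Rough description lengths (in characters) of two kinds of programs.

def len_program_outputting_all_strings(n: int) -> int:
    # e.g. "import itertools\nfor b in itertools.product('01', repeat=N): print(''.join(b))"
    return 80 + len(str(n))   # roughly constant, plus the digits of n

def len_program_outputting_one_random_string(n: int) -> int:
    # e.g. "print('0110...')" with a typical (incompressible) n-bit string written out
    return 9 + n              # fixed wrapper, plus the n bits themselves

for n in (10, 100, 1000, 10**6):
    print(n,
          len_program_outputting_all_strings(n),
          len_program_outputting_one_random_string(n))
# For large n, the program that outputs the whole "array" of outcomes is far
# shorter than the program that outputs a single typical sample from it.
```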
How would you formulate this axiom?
Ugh, I'm bad at math. Let's say, given the space of outcomes O and a reality predicate R, the axiom would be $\exists!\, o \in O : R(o)$.
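If it helps, a minimal Lean sketch of the same thing (toy names; it says nothing beyond "exactly one outcome is real"):

```lean
-- Toy axiomatization: a space of outcomes and a reality predicate,
-- with the axiom that exactly one outcome is real.
axiom Outcome : Type
axiom IsReal : Outcome → Prop
axiom one_outcome_is_real : ∃! o : Outcome, IsReal o
```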
Carroll's additional assumptions are not relied on by the MWI.
I don't know, any model you like? A space of outcomes with a "one outcome is real" axiom. The point is that I can understand the argument for why true stochasticity may be coherent, but I don't get why it would be better.
I disagree with this part—if Harry does the quantum equivalent of flipping an unbiased coin, then there’s a branch of the universe’s wavefunction in which Harry sees heads and says “gee, isn’t it interesting that I see heads and not tails, I wonder how that works, hmm why did my thread of subjective experience carry me into the heads branch?”, and there’s also a branch of the universe’s wavefunction in which Harry sees tails and says “gee, isn’t it interesting that I see tails and not heads, I wonder how that works, hmm why did my thread of subjective experience carry me into the tails branch?”. I don’t think either of these Harrys is “preferred”.
This is how it works in MWI without additional postulates. But if you postulate the probability that you will find yourself somewhere, then you are postulating a difference between the case where you have found yourself there and the case where you haven't. Having a number for how much you prefer something is the whole point of indexical probabilities. And as the probability of some future "you" goes to zero, this future "you" goes to not being the continuation of your subjective experience, right? Surely that would make this "you" dispreferred in some sense?
- such formalisms are unwieldy
Do you actually need any other reason to not believe in True Randomness?
- that’s just passing the buck to the one who interprets the formalism
Any argument is just passing the buck to the one who interprets the language.