In some but not all imaginable Truly Stochastic worlds, perhaps it's like the probability distribution of the whole state of the universe, but OP's intuition-pumping example seems to be imagining a case where A is some small bit of the universe.
Oops, I guess I missed this part when reading your comment. No, I meant for A to refer to the whole configuration of the universe.
The issue with this idea is that it seems pretty much impossible
I think your position here is approximately optimal within the framework of consequentialism.
It's just that I worry that consequentialism itself is the reason we have problems like AI x-risk, in the sense that the thing that drives x-risk scenarios may be the theory of agency that is shared with consequentialism.
I've been working on a post (actually, I'm going to temporarily add you as a coauthor, so you can see the draft and add comments if you're interested) where I discuss the flaws and how I think one should approach it differently. One of the major inspirations is Against responsibility, but I've sort of taken inspiration from multiple places, including critics of EA and critics of economics.
The ideal of consequentialism is essentially flawless; it's when you hand it to sex-obsessed murder monkeys as an excuse to do things that shit hits the fan.
I've come to think that isn't actually the case. E.g. while I disagree with Being nicer than clippy, it quite precisely nails how consequentialism isn't essentially flawless:
Now, of course, utilitarianism-in-theory was never, erm, actually very tolerant. Utilitarianism is actually kinda pissed about all these hobbies. For example: did you notice the way they aren't hedonium? Seriously tragic. And even setting aside the not-hedonium problem (it applies to all-the-things), I checked Jim's pleasure levels for the trashy TV, and they're way lower than if he got into Mozart; Mary's stamp-collecting is actually a bit obsessive and out-of-balance; and Mormonism seems too confident about the optimal amount of coffee. Oh noes! Can we optimize these backyards somehow? And Yudkowsky's paradigm misaligned AIs are thinking along the same lines – and they've got the nanobots to make it happen.
Unbounded utility maximization aspires to optimize the entire world. This is pretty funky for just about any optimization criterion people can come up with, even if people are perfectly flawless in how well they follow it. There have been a bunch of attempts to patch this, but none have really worked so far, and it doesn't seem like any ever will.
Upvoted, but I think I disagree on a tangent.
The consequences of someone's actions are nonetheless partial evidence of their morality. If you discover that embezzled funds have been building up in Bob's bank account, that's evidence Bob is an unethical guy; most people who embezzle funds are unethical. But then you might discover that, before he was caught and the money confiscated, Bob was embezzling funds to build an orphanage. The consequences haven't changed, but Bob's final (unrealized) intentions are attenuating circumstances. If I had to hang around with either your typical fund-embezzler or Bob, I would pick Bob.
An orphanage is sort of a funky example, because I don't intuitively associate it with cost-effectiveness, but I don't know much about it. If it's not cost-effective to build an orphanage, then what logic does Bob see in it? Under ordinary circumstances, I associate non-cost-effective charity with just doing what you've cached as good without thinking too much about it, but embezzlement doesn't sound like something you'd cache as good, so that doesn't sound likely. Maybe he's trying to do charity to build reputation that he can leverage into other stuff?
Anyway, if I don't fight the hypothetical, and assume Bob's embezzling for an orphanage was cost-effective, then that's evidence that he's engaging in fully unbounded consequentialism, aspiring to do the globally utility-maximizing action regardless of his personal responsibilities, his attention levels and his comparative advantages.
This allows you to predict that in the future, he might do similar things: e.g. secretly charge ahead with creating an AI that takes over the world 0.1% more quickly and 0.1% more safely than its competitors, even if there's a 99.8% chance everyone dies, in order to capture the extra utility in that sliver he gains. Or that he might suppress allegations of rape within his circles if he fears the drama will push his group off track from saving the world.
If, on the other hand, someone was embezzling funds to spend on parties for himself and his friends, then while that's still criminal, it's a much more limited form of criminality, where he still wouldn't want to be part of the team that destroys the world, and wouldn't want to protect rapists. (I mean, he might still want to protect rapists if he's closer friends with the person who is raping than with the victims, but the point is he's trying to help at least some of the people around himself.)
Honestly the one who embezzles funds for unbounded consequentialist purposes sounds much more intellectually interesting, and so I would probably still prefer to hang around him, but the one who embezzles funds for parties seems much safer, and so I think a moral principle along the lines of "unbounded consequentialists are especially evil and must be suppressed" makes sense. You know, the whole thing where we understand that "the ends justify the means" is a villainous thing to say.
I think this is actually pretty cruxy for consequentialism. Of course, you can try to patch consequentialism in various ways, but these problems show up all over the place and are subject to a lot of optimization pressure because resources are useful for many things, so one needs a really robust solution in order for it to be viable. I think the solution lies in recognizing that healthy systems follow a different kind of agency that doesn't aspire to have unbounded impact, and consequentialists need to develop a proper model of that to have a chance.
Measure theory and probability theory were developed to describe stochasticity and uncertainty, but they formalize it in many-worlds terms, closely analogous to how the wavefunction is formalized in quantum mechanics. If one takes the wavefunction formalism literally, to the point of believing that quantum mechanics must have many worlds, it seems natural to take the probability distribution formalism equally literally, to the point of believing that probability must have many worlds too. Or well, you can have a hidden-variables theory of probability too, but the point is that it seems like you would have to abandon True Stochasticity.
True Stochasticity vs probability distributions provides a non-quantum example of the non-native embedding, so if you accept the existence of True Stochasticity as distinct from many worlds of simultaneous possibility or ignorance of hidden variables, then that provides a way to understand my objection. Otherwise, I don't yet know a way to explain it, and am not sure one exists.
As for the case of how a new branch of math could describe wavefunctions more natively, there's a tradeoff where you can put in a ton of work and philosophy to make a field of math that describes an object completely natively, but it doesn't actually help the day-to-day work of a mathematician, and it often restricts the tools you can work with (e.g. no excluded middle and no axiom of choice), so people usually don't. Instead they develop their branch of math within classical math with some informal shortcuts.
Okay, so by "wavefunction as a classical mathematical object" you mean a vector in Hilbert space?
Yes.
In that case, what do you mean by the adjective "classical"?
There are a lot of variants of math: e.g. homotopy type theory, abstract stone duality, nonstandard analysis, etc. Maybe one could make up a variant of math that could embed wavefunctions more natively.
Hi? (Edit: the parent comment originally just had a single word saying "Test".)
Do you actually need any other reason to not believe in True Randomness?
I think I used to accept this argument, but then came to believe that simplicity of formalisms usually originates from renormalization more than from the simplicity being Literally True?
As a matter of fact, it is modeled this way. To define a probability function you need a sample space, from which exactly one outcome is "sampled" in every iteration of the probability experiment.
No, that's for random variables, but in order to have random variables you first need a probability distribution over the outcome space.
And this is why I have trouble with the idea of "true randomness" being philosophically coherent. If there is no mathematical way to describe it, in what way can we say that it's coherent?
You could use a mathematical formalism that contains True Randomness, but 1. such formalisms are unwieldy, 2. that's just passing the buck to the one who interprets the formalism.
The wavefunction in quantum mechanics is not like the probability distribution of (say) where a dart lands when you throw it at a dartboard. (In some but not all imaginable Truly Stochastic worlds, perhaps it's like the probability distribution of the whole state of the universe, but OP's intuition-pumping example seems to be imagining a case where A is some small bit of the universe.)
The reason why it's not like that is that the laws describing the evolution of the system explicitly refer to what's in the wavefunction. We don't have any way to understand and describe what a quantum universe does other than in terms of the evolution of the wavefunction or something basically equivalent thereto.
In my view, the big similarity is in the principle of superposition. The evolution of the system in a sense may depend on the wavefunction, but it is an extremely rigid sense, which requires the evolution to be invariant under chopping up a superposition into a bunch of independent pieces, or chopping up a simple state into an extremely pathological superposition.
I have the impression (which may well be very unfair) that at some early stage OP imbibed the idea that what "quantum" fundamentally means is something very like "random", so that a system that's deterministic is ipso facto less "quantum" than a system that's stochastic. But that seems wrong to me. We don't presently have any way to distinguish random from deterministic versions of quantum physics; randomness or something very like it shows up in our experience of quantum phenomena, but the fact that a many-worlds interpretation is workable at all means that that doesn't tell us much about whether randomness is essential to quantumness.
It's worth emphasizing that the OP isn't really how I originally thought of QM. One of my earliest memories was of my dad explaining quantum collapse to me, and me reinventing decoherence by asking why it couldn't just be that you got entangled with the thing you were observing. It's only now, years later, that I've come to take issue with QM.
In my mind, there's four things that strongly distinguish QM systems from ordinary stochastic systems:
- Destructive interference
- Principle of least action (you could in principle have this and the next in deterministic/stochastic systems, but it doesn't fall out of the structure of the ontology as easily, without additional laws)
- Preservation of information (though of course, since the universe is actually quantum, this means the universe doesn't resemble a deterministic or stochastic system at the large scale, because we have thermodynamics, and neither deterministic nor stochastic systems need thermodynamics)
- Pauli exclusion principle (technically you could have this in a stochastic system too, but it feels quantum-mechanical because it can be derived from fermion products being antisymmetric, and antisymmetry only makes sense in quantum systems)
Almost certainly this list isn't complete, since I'm mostly an autodidact (got taught a bit by my dad, read standard rationalist intros to quantum like The Sequences and Scott Aaronson, took a mathematical physics course, coded a few qubit simulations, and binged some Wikipedia and YouTube). Of these, only destructive interference really seems like an obstacle, and only a mild one.
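To see the destructive-interference point concretely, here is a toy calculation (my own illustration, not anything from the comments above): amplitudes can cancel, whereas probability mass can only spread.

```python
import numpy as np

# Hadamard gate: the amplitude-level analogue of a 50/50 "mixing" step.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# Applying it twice returns |0> exactly: the |1> amplitudes cancel out.
amp = H @ (H @ np.array([1.0, 0.0]))

# Stochastic analogue: a 50/50 mixing matrix applied twice. Nothing ever
# cancels; the distribution just stays spread out.
M = np.array([[0.5, 0.5], [0.5, 0.5]])
prob = M @ (M @ np.array([1.0, 0.0]))
```

The quantum system ends back at `[1, 0]` while the stochastic one stays at `[0.5, 0.5]`, which is exactly the behavior a diffusion-like process can never reproduce.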
(And, incidentally, if we had a model of Truly Stochastic physics in which the evolution of the system is driven by what's inside those probability distributions - why, then, I would rather like the idea of claiming that the probability distributions are what's real, rather than just their outcomes.)
I would say this is cruxy for me, in the sense that if I didn't believe Truly Stochastic systems were ontologically fine, then I would take similar issue with Truly Quantum systems.
In the absence of a measurement/collapse postulate, quantum mechanics is a deterministic theory
You can make a deterministic theory of stochasticity using many-worlds too.
In the absence of a postulate that the wavefunction is Literally The Underlying State, rather than just a way we describe the system deterministically, quantum dynamics doesn't fit under a deterministic ontology.
Also, what do you mean by "the wavefunction as a classical mathematical object"?
If you have some basis B, you can represent quantum systems using functions B → ℂ (or perhaps more naturally, as F(B), where F denotes the free complex vector space functor, but then we get into category theory, and that's a giant nerd-snipe).
For any well-controlled isolated system, if it starts in a state |Ψ⟩, then at a later time it will be in state U|Ψ⟩, where U is a certain deterministic unitary operator. So far this is indisputable: you can do quantum state tomography, you can measure the interference effects, etc. Right?
It will certainly be mathematically welldescribed by an expression like that. But when you flip a coin without looking at it, it will also be welldescribed by a probability distribution 0.5 H + 0.5 T, and this doesn't mean that we insist that after the flip, the coin is Really In That Distribution.
Now it's true that in quantum systems, you can measure a bunch of additional properties that allow you to rule out alternative models. But my OP is more claiming that the wavefunction is a model of the universe, and the actual universe is presumably the disquotation of this, so by construction the wavefunction acts identically to how I'm claiming the universe acts, and therefore these measurements wouldn't be ruling out that the universe works that way.
Or as a thought experiment: say you're considering a simple quantum system with a handful of qubits. It can be described with a wavefunction that assigns each combination of qubit values a complex number. Now say you code up a classical computer to run a quantum simulator, which you do by using a hash map to connect the qubit combos to their amplitudes. The quantum simulator runs in our quantum universe.
Now here's the question: what happens if you have a superposition in the original quantum system? It turns into a tensor product in the universe the simulator runs in, because the quantum simulator represents each branch of the wavefunction separately.
This phenomenon, where a superposition within the system gets represented by a product outside of the system, is basically a consequence of modelling the system using wavefunctions. Contrast this to if you were just running a quantum computer with a bunch of qubits, so the superposition in the internal system would map to a superposition in the external system.
I claim that this extra product comes from modelling the system as a wavefunction, and that much of the "many worlds" aspect of the manyworlds interpretation arises from this (since products represent things that both occur, whereas things in superposition are represented with just sums).
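A minimal sketch of the hash-map simulator described above (toy illustration; the `hadamard` helper and its conventions are made up for this example):

```python
import math

# Wavefunction as a hash map from classical bit strings to amplitudes.
def hadamard(state, i):
    """Apply a Hadamard gate to qubit i of a dict-based state."""
    s = 1 / math.sqrt(2)
    new = {}
    for bits, amp in state.items():
        b0 = bits[:i] + "0" + bits[i + 1:]
        b1 = bits[:i] + "1" + bits[i + 1:]
        sign = -1.0 if bits[i] == "1" else 1.0
        new[b0] = new.get(b0, 0.0) + s * amp
        new[b1] = new.get(b1, 0.0) + sign * s * amp
    return new

state = {"00": 1.0}           # the simulated system starts in |00>
state = hadamard(state, 0)    # internal superposition: (|00> + |10>)/sqrt(2)
```

After the gate, the simulator stores both branches as separate, co-existing entries in classical memory, which is the "superposition inside maps to a product outside" phenomenon being described.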
OK, so then you say: “Well, a very big well-controlled isolated system could be a box with my friend Harry and his cat in it, and if the same principle holds, then there will be deterministic unitary evolution from |Ψ⟩ into U|Ψ⟩, and hey, I just did the math and it turns out that U|Ψ⟩ will have a 50/50 mix of ‘Harry sees his cat alive’ and ‘Harry sees his cat dead and is sad’.” This is beyond what’s possible to directly experimentally verify, but I think it should be a very strong presumption by extrapolating from the first paragraph. (As you say, “quantum computers prove larger and larger superpositions to be stable”.)
Yes, if you assume the wavefunction is the actual state of the system, rather than a deterministic model of the system, then it automatically follows that something-like-many-worlds must be true.
…And then there’s an indexicality issue, and you need another axiom to resolve it. For example: “as quantum amplitude of a piece of the wavefunction goes to zero, the probability that I will ‘find myself’ in that piece also goes to zero” is one such axiom, and equivalent (it turns out) to the Born rule. It’s another axiom for sure; I just like that particular formulation because it “feels more natural” or something.
Huh, I didn't know this was equivalent to the Born rule. It does feel pretty natural; do you have a reference to the proof?
I’m really unsympathetic to the second bullet-point attitude, but I don’t think I’ve ever successfully talked somebody out of it, so evidently it’s a pretty deep gap, or at any rate I for one am apparently unable to communicate past it.
I agree with the former bullet point rather than the latter.
FWIW last I heard, nobody has constructed a pilot-wave theory that agrees with quantum field theory (QFT) in general and the standard model of particle physics in particular. The tricky part is that in QFT there’s observable interference between states that have different numbers of particles in them, e.g. a virtual electron can appear then disappear in one branch but not appear at all in another, and those branches have easily-observable interference in collision cross-sections etc. That messes with the pilot-wave formalism, I think.
Someone in the comments of the last thread claimed maybe some people have found out how to generalize pilot-wave theory to QFT. But I'm not overly attached to that claim; pilot-wave theory is obviously directionally incorrect with respect to the ontology of the universe, and even if it can be forced to work with QFT, I can definitely see how it is in tension with it.
I guess it's hard to answer because it depends on three degrees of freedom:
- Whether you agree with my assessment that it's mostly arbitrary to demand the fundamental ontology be deterministic rather than stochastic or quantum,
- Whether you count "many worlds" as literally asserting that the wavefunction as a classical mathematical object is real, or as simply distancing oneself from collapse/hidden variables,
- Whether you even aim to describe what is ontologically fundamental in the first place.
I'm personally inclined to say the manyworlds interpretation is technically wrong, hence the title. But I have basically suggested people could give different answers to these sorts of degrees of freedom, and so I could see other people having different takeaways.
The observer is highly sensitive to differences along a specific basis, and therefore changes a lot in response to that basis. Due to chaos, this then leads to everything else on earth getting entangled with the observer in that same basis, implying earth-wide decoherence.
This is just chaos theory, isn't it? If one person sees that Schrodinger's cat is dead, then they're going to change their future behavior, which changes the behavior of everyone they interact with, and this then butterflies up to entangle the entire earth in the same superposition.
Uncharitable punchline is "if you take pilot wave but keep track of every possible position that any particle could have been (and ignore where they actually were in the actual experiment) then you get many worlds." Seems like a dumb thing to do to me.
How would you formalize pilot wave theory without keeping "track of every possible position that any particle could have been" (which I assume refers to, not throwing away the wavefunction)?
We'd still expect strongly interacting systems e.g. the earth (and really, the solar system?) to have an objective splitting. But it seems correct to say that I basically don't know how far that extends.
Let's say you have some unitary transformation U. If you were to apply it to a coherent superposition |a⟩ + |b⟩, it seems like it would pretty much always make you end up with a decoherent superposition. So it doesn't seem like there's anything left to explain.
Kind of, because "multiple future outcomes are possible, rather than one inevitable outcome" could sort of be said to apply to both true stochasticity and true quantum mechanics. With true stochasticity, it has to evolve by a diffusion-like process with no destructive interference, whereas for true quantum mechanics, it has to evolve by a unitary-like process with no information loss.
So to a mind that can comprehend probability distributions, but intuitively thinks they always describe hidden variables or frequencies or whatever, how does one express true stochasticity: the notion where a whole probability distribution of future outcomes is possible (even if one knew all the information that currently exists), but only one of them happens?
Before I answer that question: do you know what I mean by a truly stochastic universe? If so, how would you explain the concept of true ontologically fundamental stochasticity to a mind that does not know what it means?
But |cat alive⟩ + |cat dead⟩ is a natural basis because that's the basis in which the interaction occurs. No mystery there; you can't perceive something without interacting with it, and an interaction is likely to have some sort of privileged basis.
Gonna post a top-level post about it once it's made it through editing, but basically the wavefunction is a way to embed a quantum system in a deterministic system, very closely analogous to how a probability function allows you to embed a stochastic system into a deterministic system. So just like how taking the math literally for QM means believing that you live in a multiverse, taking the math literally for probability also means believing that you live in a multiverse. But it seems philosophically coherent to me to believe that we live in a truly stochastic universe rather than just a deterministic probability multiverse, so it also feels like it should be philosophically coherent that we live in a truly quantum universe.
I'm confused about what you're saying. In particular while I know what "decoherence" means, it sounds like you are talking about some special formal thing when you say "decoherent branches".
Let's consider the case of Schrodinger's cat. Surely the math itself says that when you open the box, you end up in a superposition |see the cat alive⟩ + |see the cat dead⟩.
Or from a comp sci PoV, I imagine having some initial bit sequence, |0101010001100010⟩, and then applying a Hadamard gate to end up with a superposition (sqrt(1/2)|0⟩ + sqrt(1/2)|1⟩) ⊗ |101010001100010⟩. Next I imagine a bunch of CNOTs that mix together this bit in superposition with the other bits, making the superpositions very distant from each other and therefore unlikely to interact.
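The Hadamard-then-CNOT picture can be written out with an explicit two-qubit state vector (a toy sketch of the picture above; the qubit-ordering convention is mine):

```python
import numpy as np

# Gates in the basis |q1 q0>: index 0 -> |00>, 1 -> |01>, 2 -> |10>, 3 -> |11>.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])  # control = qubit 1, target = qubit 0

psi = np.zeros(4)
psi[0] = 1.0                 # start in |00>
psi = np.kron(H, I) @ psi    # Hadamard on qubit 1: (|00> + |10>)/sqrt(2)
psi = CNOT @ psi             # CNOT spreads the superposition: (|00> + |11>)/sqrt(2)
```

The two branches now differ in both qubits; further CNOTs onto more qubits would make them increasingly distant in the sense described above.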
What are you saying goes wrong in these pictures?
I'm confused about what distinction you are talking about, possibly because I haven't read Everett's original proposal.
The multiverse interpretation takes the wavefunction literally and says that since the math describes a multiverse, there is a multiverse.
YMMV about how literally you take the math. I've come to have a technical objection to it such that I'd be inclined to say that the multiverse theory is wrong, but also it is very technical and I think a substantial fraction of multiverse theorists would say "yeah that's what I meant" or "I suppose that's plausible too".
But "take the math literally" sure seems like good reason/evidence.
And when it comes to pilot wave theory, its math also postulates a wavefunction, so if you take the math literally for pilot wave theory, you get the Everettian multiverse; you just additionally declare one of the branches Real in a vague sense.
Ahh, now I've got it:
First, each morphism f : A → 0 induces a unique morphism f ∘ pr₁ : A × 0 → 0. Proof: suppose f ∘ pr₁ = g ∘ pr₁. Then we have f = f ∘ pr₁ ∘ (id, f) = g ∘ pr₁ ∘ (id, f) = g.
Corollary: if you have exponential objects, then if you have any f : A → 0, then A ≅ 0, because there's only one morphism 0 → 0^A.
But, if you have coexponential objects, any hom set A → B can instead be expressed as a hom set A − B → 0. This shows A − B ≅ 0, and also that all homs are equal.
I think exponentials and coexponentials are relevant here, since they are good at shuffling things back and forth between the sides of morphisms, which matters for limits and colimits as they are adjunctions (and a particularly nice kind of adjunction, at that).
I can't remember the entire proof, and maybe I misstated it, but IIRC part of the logic goes as follows:
With exponentials, you can prove that 0 × A ≅ 0, because any morphism 0 × A → B curries into 0 → B^A, of which, by the universal property of 0, there's only one.
Similarly, with coexponentials, you can prove that 1 + A ≅ 1, because any morphism A → B + 1 cocurries into A − B → 1, of which there is only one.
So this at least proves that all the objects built out of 0, 1, × and + are trivial. I think there was something funky where you made use of the fact that A → B can be expressed as 1 − (B^A) → 0 and 1 → 0^(A − B) to further prove all morphisms trivial, but I can't remember it exactly.
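Restating the two (co)currying steps above in symbols (same notation as the parent comment; the compression into two lines is mine):

```latex
\operatorname{Hom}(0 \times A,\, B) \;\cong\; \operatorname{Hom}(0,\, B^A) \;\cong\; \mathbf{1},
\qquad
\operatorname{Hom}(A,\, B + 1) \;\cong\; \operatorname{Hom}(A - B,\, 1) \;\cong\; \mathbf{1}.
```

Since these hold naturally for every B (resp. every A), the Yoneda lemma gives 0 × A ≅ 0 from the first line and B + 1 ≅ 1 from the second.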
I agree that Scott Alexander's position is that it's not selfevidently good for the truth about his own views to be known. I'm just saying there's a bunch of times he's alluded to or outright endorsed it being selfevidently good for the truth to be known in general, in order to defend himself when criticized for being interested in the truth about taboo topics.
Speaking for myself: I don't prefer to be alone or tend to hide information about myself. Quite the opposite; I like to have company but rare is the company that likes to have me, and I like sharing, though it's rare that someone cares to hear it.
Sounds like you aren't avoidant, since introversion-related items tend to be the ones most highly endorsed by the avoidant profile.
Now if I were in Scott's position? I find social media enemies terrifying and would want to hide as much as possible from them. And Scott's desire for his name not to be broadcast? He's explained it as related to his profession, and I don't see why I should disbelieve that. Yet Scott also schedules regular meetups where strangers can come, which doesn't sound "avoidant". More broadly, labeling famous-ish people who talk frequently online as "avoidant" doesn't sound right.
Scott Alexander's MBTI type is INTJ. The INT part is all aligned with avoidant, so I still say he's avoidant. Do you think all the meetups and such mean that he's really ENTJ?
As for wanting to hide from social media enemies, I'd speculate that this causally contributes to avoidant personality.
Also, "schizoid" as in schizophrenia? By reputation, rationalists are more likely to be autistic, which tends not to co-occur with schizophrenia, and the ACX survey is correlated with this reputation. (Could say more but I think this suffices.)
Schizoid as in schizoid.
Actually one does need to read The Bell Curve to know what's in it. There's a lot of slander going around about it.
Do you know programming? A coexponential A − B is intuitively, roughly speaking, an A together with a return position where you can place a B. It's how function calls are implemented in computers, as morphisms A − B → 0, corresponding to the fact that you have the parameters (the A) and the call stack (the return position for the B).
(More formally: given a coproduct (~disjoint union) B + C, a coexponential A − B is defined based on Hom(A − B, C) being equivalent to Hom(A, B + C).)
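In programming terms, the "parameters plus a return position" reading can be illustrated with continuation-passing style, where invoking a function of type A → B amounts to having an A plus a slot to put a B in (a loose illustration of the intuition, not a formal categorical model; the helper name is made up):

```python
# The "return position" is modelled as a callback -- the slot on the
# call stack where the B result will be placed.
def call_with_return_position(f, a, return_position):
    return_position(f(a))

results = []
call_with_return_position(lambda x: x + 1, 41, results.append)
# results now holds the returned B (here, 42)
```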
I wonder if it is related to exponentials vs coexponentials, since categories with both exponentials and coexponentials are posetal. I don't have any particular argument for how that'd work, though.
Trivially, a coX in C^op is the same but flipped as an X in C.
Then as you know, Stone duality says that CABA = Set^op.
So a coX in CABA is the same but flipped as an X in Set.
(I think it works constructively too if one replaces Boolean with Heyting?)
A notable counterexample is FinVect, which has an equivalence of categories FinVect → FinVect^op.
Are you familiar with the category of complete atomic boolean algebras?
For agents, the "largescale property" of interest is maximizing utility over some stuff "far away"  e.g. far in the future, for the examples in this post.
One consideration that coherence theorems often seem to lack:
It seems to me that often, optimizers establish a boundary and do most of their optimization within that boundary. E.g. animals have a skin that they maintain homeostasis under, companies have offices and factories where they perform their work, states have borders and people have homes.
These don't entirely dodge coherence theorems; typically a substantial part of the point of these boundaries is to optimize some other thing in the future. But they do set something up, I feel.
The unrolling of the episode is still very cheap. It's a lot cheaper to unroll a DreamerV3 for 16 steps than it is to go out into the world, run a robot on a real-world task for 16 steps, and try to get the NN to propagate updated value estimates the entire way...
But I'm not advocating against MBRL, so this isn't the relevant counterfactual. A pure MBRL-based approach would update the value function to match the rollouts, but e.g. DreamerV3 also uses the value function in a Bellman-like manner, e.g. to impute the future reward at the end of an episode. This allows it to plan further than the 16 steps it rolls out, but it would be computationally intractable to roll out as far as this ends up planning.
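The imputation step being described can be sketched as follows (a toy illustration of the general bootstrapping pattern, not DreamerV3's actual λ-return computation):

```python
def bootstrapped_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for a short imagined rollout, with the value
    function's estimate at the horizon standing in for all later reward."""
    returns = []
    future = bootstrap_value
    for r in reversed(rewards):
        future = r + gamma * future
        returns.append(future)
    return list(reversed(returns))

# A 16-step imagined rollout with no in-rollout reward: every target is
# driven purely by the value estimate at the 16-step horizon.
targets = bootstrapped_returns([0.0] * 16, bootstrap_value=1.0)
```

Because the horizon value is itself a learned estimate of everything beyond step 16, the effective planning horizon is much longer than the rollout itself.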
If the environment is difficult, a tree search with a very small planning budget, like just a few rollouts, is probably going to have quite noisy choices/estimates too. No free lunches.
It's possible for there to be a kind of chaos where the analytic gradients blow up yet discrete differences have predictable effects. Bifurcations etc..
They won't be controlled by something as simple as a single fixed reward function; I think we can agree on that. But I don't find successor-function-like representations to be too promising as a direction for how to generalize agents, or, in fact, any attempt to fancily hand-engineer these sorts of approaches into DRL agents.
These things should be learned. For example, leaning into Decision Transformers and using a lot more conditionalizing through metadata and relying on metalearning seems much more promising. (When it comes to generative models, if conditioning isn't solving your problems, you're just not using enough conditioning or generative modeling.) A prompt can describe agents and reward functions and the base agent executes that, and whatever is useful about successorlike representations just emerges automatically internally as the solution to the overall family of tasks in turning histories into actions.
I agree with things needing to be learned; using the actual states themselves was more of a toy model (because we have mathematical models for MDPs but we don't have mathematical models for "capabilities researchers will find something that can be Learned"), and I'd expect something else to happen. If I was to run off to implement this now, I'd be using learned embeddings of states, rather than states themselves. Though of course even learned embeddings have their problems.
The trouble with just saying "let's use decision transformers" is twofold. First, we still need to actually define the feedback system. One option is to just define reward as the feedback, but as you mention, that's not nuanced enough. You could use some system that's trained to mimic human labels as the ground truth, but this kind of system has flaws for standard alignment reasons.
It seems to me that capabilities researchers are eventually going to find some clever feedback system to use. It will to a great extent be learned, but they're going to need to figure out the learning method too.
Same as training the neural network: once it's differentiable, backprop can 'chain the estimates backwards' so efficiently you barely even think about it anymore.
I don't think this is true in general. Unrolling an episode for more steps takes more resources, and the later steps in the episode become more chaotic. DreamerV3 only unrolls for 16 steps.
Or distilling a tree search into a NN: the tree search needs to do backwards induction of updated estimates from all the terminal nodes all the way up to the root where the next action is chosen, but that's very fast and explicit and can be distilled down into a NN forward pass.
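That backward induction can be sketched on a toy tree (the structure and terminal values below are made up for illustration):

```python
# Toy tree: node -> {action: child}; structure and values made up.
tree = {
    "root": {"a1": "n1", "a2": "n2"},
    "n1":   {"b1": "t1", "b2": "t2"},
    "n2":   {"c1": "t3"},
}
terminal_values = {"t1": 0.2, "t2": 0.9, "t3": 0.5}

def backup(node):
    # Terminal values are given; an internal node's value is the max over
    # its children -- backward induction chained up to the root.
    if node in terminal_values:
        return terminal_values[node]
    return max(backup(child) for child in tree[node].values())

# The (state -> value / best action) pairs produced by these backups are
# the training targets one would distill into a NN forward pass.
best_action = max(tree["root"], key=lambda a: backup(tree["root"][a]))
print(best_action, backup("root"))  # a1 0.9
```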
But when you distill a tree search, you basically learn value estimates, i.e. something similar to a Q function (realistically, a V function). Thus, here you also have an opportunity to bubble up some additional information.
And aside from being able to update within-episode or take actions entirely unobserved before, when you do MBRL, you get to do it at arbitrary scale (thus potentially extremely little wall-clock time, like an AlphaZero), offline (no environment interactions), potentially highly sample-efficient (if the dataset is adequate or one can do optimal experimentation to acquire the most useful data, like PILCO), with transfer learning to all other problems in related environments (because value functions are mostly worthless outside the exact setting, which is why model-free DRL agents are notorious for overfitting and having zero transfer), easily eliciting meta-learning and zero-shot capabilities, etc.*
I'm not doubting the relevance of MBRL, I expect that to take off too. What I'm doubting is that future agents will be controlled using scalar utilities/rewards/etc. rather than something more nuanced.
With MBRL, don't you end up with the same problem, but when planning in the model instead? E.g. DreamerV3 still learns a value function in its actor-critic reinforcement learning that occurs "in the model". This value function still needs to chain the estimates backwards.
Also, if you expect this to take off, then by your own admission you are mostly accelerating the current trajectory (which I consider mostly doomed) rather than changing it. Unless you expect it to take off mostly thanks to you?
Surely your expectation that the current trajectory is mostly doomed depends on your expectation of the technical details of the extension of the current trajectory. If technical specifics emerge that shows the current trajectory to be going in a more alignable direction, it may be fine to accelerate.
Could this be explained by SAEs only finding a subset of the features? Then the reconstructions would be entirely missing random features, whereas random noise is just random and therefore mostly ignored.
It's capability research that is coupled to alignment:
Furthermore it seems like a win for interpretability and alignment as it gives greater feedback on how the AI intends to earn rewards, and better ability to control those rewards.
Coupling alignment to capabilities is basically what we need to survive, because the danger of capabilities comes from the fact that capabilities research is self-funding, thereby risking outracing alignment. If alignment can absorb enough success from capabilities, we survive.
Thanks for the link! It does look somewhat relevant.
But I think the weighting by reward (or other significant variables) is pretty important, since it generates a goal to pursue, making it emphasize things that can be achieved rather than just things that might randomly happen.
Though this makes me think about whether there are natural variables in the state space that could be weighted by, without using reward per se. E.g. the size of (s' − s) in some natural embedding, or the variance in s' over all the possible actions that could be taken. Hmm. 🤔
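A minimal sketch of those two reward-free weightings, on hypothetical 2-D state embeddings (all coordinates made up for illustration):

```python
import math

# Hypothetical 2-D embeddings of a state and its successors under two actions.
embed = {
    "s":       (0.0, 0.0),
    "s'_stay": (0.1, 0.0),   # the "stay" action barely moves the state
    "s'_jump": (3.0, 4.0),   # the "jump" action changes it a lot
}

def change_size(s, s_next):
    # ||s' - s||: weight transitions by how much they change the state.
    return math.dist(embed[s_next], embed[s])

def variance_over_actions(successors):
    # Per-dimension variance of s' over the available actions, summed.
    total = 0.0
    for dim in zip(*successors):
        mean = sum(dim) / len(dim)
        total += sum((x - mean) ** 2 for x in dim) / len(dim)
    return total

print(change_size("s", "s'_jump"))  # 5.0 -- big transition, big weight
print(variance_over_actions([embed["s'_stay"], embed["s'_jump"]]))  # ~6.1025
```

Either quantity could stand in for reward as the weighting in the Hope-style distributions discussed elsewhere in the thread.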
I have a concept that I expect to take off in reinforcement learning. I don't have time to test it right now, though hopefully I'd find time later. Until then, I want to put it out here, either as inspiration for others, or as a "called it"/prediction, or as a way to hear critique/about similar projects others might have made:
Reinforcement learning currently tries to do stuff like learning to model the sum of future rewards, e.g. expectations using V, A and Q functions in many algorithms, or the entire probability distribution in algorithms like DreamerV3.
Mechanistically, the reason these methods work is that they stitch together experience from different trajectories. So e.g. if one trajectory goes A > B > C and earns a reward at the end, it learns that states A, B and C are valuable. If another trajectory goes D > A > E > F and gets punished at the end, it learns that E and F are low-value but D and A are high-value, because its experience from the first trajectory shows that it could've just gone D > A > B > C instead.
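That stitching mechanism can be demonstrated with tabular Q-learning on a toy version of these trajectories (the rewards and discount factor are made up for illustration):

```python
gamma = 0.9
# Transitions harvested from the two trajectories: (state, action, reward,
# next_state), with "T" as terminal. Reward +1 after C, -1 after F.
transitions = [
    ("A", "toB", 0, "B"), ("B", "toC", 0, "C"), ("C", "end", 1, "T"),
    ("D", "toA", 0, "A"), ("A", "toE", 0, "E"),
    ("E", "toF", 0, "F"), ("F", "end", -1, "T"),
]

Q = {}

def V(s):
    # State value: best Q estimate over the actions seen from s.
    return max((q for (st, _), q in Q.items() if st == s), default=0.0)

# Repeated sweeps chain the terminal rewards backwards step by step.
for _ in range(10):
    for (s, a, r, s2) in transitions:
        Q[(s, a)] = r + gamma * (0.0 if s2 == "T" else V(s2))

print(V("D"))  # positive (~0.73): D stitched onto the rewarded path via A
print(V("E"))  # negative (~-0.9): E only leads to punishment
```

D ends up high-value despite only appearing in the punished trajectory, exactly because the backups route it through A's good continuation.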
But what if it learns of a path E > B? Or a shortcut A > C? Or a path F > G that gives a huge amount of reward? Because these techniques work by chaining the reward backwards step-by-step, it seems like such discoveries would be hard to integrate well: the old value estimates will still approximately satisfy the Bellman equation, for instance, so there's little pressure to revise them.
Ok, so that's the problem, but how could it be fixed? Speculation time:
You want to learn an embedding of the opportunities you have in a given state (or for a given state-action pair), rather than just its potential rewards. Rewards are too sparse a signal.
More formally, let's say instead of the Q function, we consider what I would call the Hope function, which, given a state-action pair (s, a), gives you a distribution over states it expects to visit, weighted by the rewards it will get. This can still be phrased using the Bellman equation:

Hope(s, a) = r·δ_{s'} + f·Hope(s', a')

Where s' is the resulting state that experience has shown comes after s when doing a, δ_{s'} is the point distribution on s', r is the reward received, f is the discounting factor, and a' is the optimal action in s'.
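A tabular sketch of this recursion on a deterministic toy chain (states, rewards, and discount made up for illustration; the distribution is represented as a plain dict of reward-weighted visitation mass):

```python
f = 0.9  # discount factor, matching the equation above
# Deterministic toy model: (state, action) -> (next_state, reward, next_action);
# next_action is None when the episode ends.
model = {("A", "a"): ("B", 0.5, "b"), ("B", "b"): ("C", 1.0, None)}

def hope(s, a):
    # Hope(s, a) = r * delta_{s'} + f * Hope(s', a'), as a dict over states.
    s2, r, a2 = model[(s, a)]
    h = {s2: r}  # r * delta_{s'}: point mass on s', weighted by the reward
    if a2 is not None:
        for state, w in hope(s2, a2).items():
            h[state] = h.get(state, 0.0) + f * w
    return h

print(hope("A", "a"))  # {'B': 0.5, 'C': 0.9}
```

Each visited state carries its own discounted reward mass, so the target is a vector over states rather than a single scalar return.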
Because the Hope function is multidimensional, the learning signal is much richer, and one should therefore maybe expect its internal activations to be richer and more flexible in the face of new experience.
Here's another thing to notice: let's say for the policy, we use the Hope function as a target to feed into a decision transformer. We now have a natural parameterization for the policy, based on which Hope it pursues.
In particular, we could define another function, maybe called the Result function, which in addition to s and a takes a target distribution w as a parameter, subject to the Bellman equation:

Result(s, a, w) = r·δ_{s'} + f·Result(s', a', (w − r·δ_{s'})/f)

Where a' is the action recommended by the decision transformer when asked to achieve (w − r·δ_{s'})/f from state s'.
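A tabular sketch of the Result recursion on the same kind of toy chain; the decision transformer is stubbed out with a trivial policy, since the point here is just the bookkeeping of the residual target:

```python
f = 0.9
# Toy chain: (state, action) -> (next_state, reward); episode ends at C.
model = {("A", "a"): ("B", 0.5), ("B", "b"): ("C", 1.0)}
only_action = {"A": "a", "B": "b", "C": None}

def dt_policy(state, remaining_target):
    # Stand-in for a decision transformer conditioned on the residual target.
    return only_action[state]

def result(s, a, w):
    # Result(s, a, w) = r * delta_{s'} + f * Result(s', a', (w - r*delta_{s'})/f)
    s2, r = model[(s, a)]
    out = {s2: r}
    # The residual target handed down: what remains to be achieved from s'.
    residual = {k: (v - (r if k == s2 else 0.0)) / f for k, v in w.items()}
    a2 = dt_policy(s2, residual)
    if a2 is not None:
        for state, v in result(s2, a2, residual).items():
            out[state] = out.get(state, 0.0) + f * v
    return out

target = {"B": 0.5, "C": 0.9}    # the Hope achievable from ("A", "a")
print(result("A", "a", target))  # reproduces the target when achievable
```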
This Result function ought to be invariant under many changes in policy, which should make it more stable to learn, boosting capabilities. Furthermore it seems like a win for interpretability and alignment as it gives greater feedback on how the AI intends to earn rewards, and better ability to control those rewards.
An obvious challenge with this proposal is that states are really latent variables and also too complex to learn distributions over. While this is true, that seems like an orthogonal problem to solve.
Also this mindset seems to pave the way for other approaches, e.g. you could maybe have a Halfway function that factors an ambitious hope into smaller ones or something. Though it's a bit tricky, because one needs to distinguish correlation and causation.
I guess for reference, here's a slightly more complete version of the personality taxonomy:

- Normative: Happy, social, emotionally expressive. Respects authority and expects others to do so too.
- Anxious: Afraid of speaking up, of breaking the rules, and of getting noticed. Tries to be alone as a result. Doesn't trust that others mean well.
- Wild: Parties, swears, and is emotionally unstable. Breaks rules and supports others (... in doing the same?)
- Avoidant: Contrarian, intellectual, and secretive. Likes to be alone and doesn't respect rules or cleanliness.
In practice people would be combinations of these archetypes, rather than purely being one of them. In some versions, the Normative type splits into three:
 Jockish: Parties and avoids intellectual topics.
 Steadfast: Conservative yet patient and supportive.
 Perfectionistic: Gets upset over other people's mistakes and tries to take control as a result.
This would make it as fully expressive as the Big Five.
... but there was some mathematical trouble in getting it to be replicable and "nice" if I included 6 profiles, so I'm expecting to be stuck at 4 types unless I discover some new mathematical tricks.
I still don't really understand the avoidant/non-avoidant taxonomy. I am confused that avoidant is both "introverted... and prefer to be alone" and "avoidants... being disturbing to others", when Scott never intended to disturb Metz's life?
The part about being disturbing wasn't supposed to refer to Scott's treatment of Cade Metz; it was supposed to refer to rationalists' interests in taboo and disagreeable topics. And as for trying to be disturbing, I said that I think the non-avoidant people were being unfair in their characterization of avoidants, as it's not that simple, and often it's a correction to genuine deception by non-avoidants.
And the claim about Scott being low in conscientiousness? Gwern being low in conscientiousness? If it is "varying from person to person" so much, is it even descriptive?
My model is an affine transformation applied to Big Five scores, constrained to make the relationship from transformed scores to items linear rather than affine, and optimized to make people's scores sparse.
This is rather technical, but the consequence is that my model is mathematically equivalent to a subspace of the Big Five, and the Big Five has similar issues where it can tend to lump different stuff together. Like one could just as well turn it around and say that the Big Five lumps my anxious and avoidant profiles together under the label of "introverted". (Well, the Big Five has two more dimensions than my model does, so it lumps fewer things together, but other models have more dimensions than the Big Five, so the Big Five lumps things together relative to those models.)
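For concreteness, here's what such an affine transformation looks like mechanically. The weights below are entirely made up for illustration; the actual model's coefficients are fit from data.

```python
LABELS = ["Normative", "Anxious", "Wild", "Avoidant"]
# Columns: openness, conscientiousness, extraversion, agreeableness,
# neuroticism (inputs as z-scores). Hypothetical weights, not the real model.
W = [
    [ 0.0,  0.5,  0.5,  0.5, -0.5],   # Normative
    [ 0.0,  0.0, -0.5, -0.2,  0.9],   # Anxious
    [ 0.0, -0.6,  0.6, -0.3,  0.3],   # Wild
    [ 0.7, -0.3, -0.6, -0.6, -0.3],   # Avoidant
]
b = [0.0, 0.0, 0.0, 0.0]  # the affine offset

def archetype_scores(big5):
    # Affine map: archetype scores = W @ big5 + b.
    return [sum(w * x for w, x in zip(row, big5)) + bi
            for row, bi in zip(W, b)]

# A profile like the one described below: high O, mid C, low E, low A, low N.
profile = [1.5, 0.0, -1.0, -1.0, -1.0]
scores = archetype_scores(profile)
print(LABELS[scores.index(max(scores))])  # "Avoidant" under these toy weights
```

The sparsity optimization mentioned above would then push W toward making most people score near zero on most archetypes.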
My model is new, so I'm still experimenting with it to see how much utility I find in it. Maybe I'll abandon it as I get bored and it stops giving results.
You made a claim of Gwern being avoidant, and Gwern said that he is not. It might be the case that Gwern is lying, but that seems far-fetched and not yet substantiated. It also seemed confusing enough that Gwern couldn't tell how widely the concept applies.
Gwern said that he's not avoidant of journalists, but he's low extraversion, low agreeableness, low neuroticism, high openness, mid conscientiousness, so that definitionally makes him avoidant under my personality model (which as mentioned is just an affine transformation of the Big Five). He also alludes to having schizoid personality disorder, which I think is relevant to being avoidant. As I said, this is a model of general personality profiles, not of interactions with journalists specifically.
I get that this is an argument one could make. But the reason I started this tangent was because you said:
Here CM doesn’t directly argue that there was any benefit to doxxing; instead he kinda conveys a vibe / ideology that if something is true then it is self-evidently intrinsically good to publish it
That is, my original argument was not in response to the "Anyway, if the true benefit is zero (as I believe), then we don’t have to quibble over whether the cost was big or small" part of your post, it was to the vibe/ideology part.
Where I was trying to say, it doesn't seem to me that Cade Metz was the one who introduced this vibe/ideology, rather it seems to have been introduced by rationalists prior to this, specifically to defend tinkering with taboo topics.
Like, you mention that Cade Metz conveys this vibe/ideology that you disagree with, and you didn't try to rebut it directly, I assume because Cade Metz didn't defend it but just treated it as obvious.
And that's where I'm saying, since many rationalists including Scott Alexander have endorsed this ideology, there's a sense in which it seems wrong, almost rude, not to address it directly. Like a sort of motte-and-bailey tactic.
Why the downvotes? Because it's an irrelevant/tangential ramble? Or some more specific reason?