I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so, is this cultural or evolutionary? It’s sort of against my (mesa-optimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?
The things you mentioned were probably all net positive, they just had some negative consequences as well. If you want to drive the far-ish future in a particular direction you’ve just got to accept that you’ll never know for sure that you’re doing a good job.
Though I am working on technical alignment (and perhaps because I know it is hard) I think the most promising route may be to increase human and institutional rationality and coordination ability. This may be more tractable than "expected" with modern theory and tools.
Also, I don't think we are on track to solve technical alignment in 50 years without intelligence augmentation in some form, at least not to the point where we could get it right on a "first critical try" if such a thing occurs. I am not even sure there is a simple and rigorous technical solution that looks like something I actually want, though there is probably a decent engineering solution out there somewhere.
It would certainly be nice if we could agree to all put up a ton of satellites that intercept anyone's nuclear missiles (perhaps under the control of an international body), gradually lowering the risk across the board without massively advantaging any country. But I think it would be impossible to coordinate on this.
"Optimization power" is not a scalar multiplying the "objective" vector. There are different types. It's not enough to say that evolution has had longer to optimize things but humans are now "better" optimizers: Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other - they're radically different sorts of things.
Evolution optimizes things in a massively parallel way, so that they're robustly good at lots of different selectively relevant things at once, and has been doing this for a very long time so that inconceivably many tiny lessons are baked in a little bit. Humans work differently - we try to figure out what works for explainable, preferably provable reasons. We also blindly twiddle parameters a bit, but we can only keep so many parameters in mind at once and compare so many metrics - humanity has a larger working memory than individual humans, but the human innovation engine is still driven by linguistic theories, expressed in countable languages. There must be a thousand deep mathematical truths that evolution is already taking advantage of to optimize its DNA repair algorithms, or design wings to work very well under both ordinary and rare turbulent conditions, or minimize/maximize surface tensions of fluids, or invent really excellent neural circuits - without ever actually finding the elaborate proofs. Solving for exact closed form solutions is often incredibly hard, even when the problem can be well-specified, but natural selection doesn't care. It will find what works locally, regardless of logical depth. It might take humans thousands of years to work some of these details out on paper. But once we've worked something out, we can deliberately scale it further and avoid local minima. This distinction in strategies of evolution vs. humans rhymes with wisdom vs. intelligence - though in this usage intelligence includes all the insight, except insofar as evolution located and acts through us. As a sidebar, I think some humans prefer an intuitive strategy that is more analogous to evolution's in effect (but not implementation).
So what about when humans turn to building a mind? Perhaps a mind is by its nature something that needs to be robust, optimized in lots of little nearly inexplicable ways for arcane reasons to deal with edge cases. After all, isn't a mind exactly that which provides an organism/robot/agent with the ability to adapt flexibly to new situations? A plane might be faster than a bird, throwing more power at the basic aerodynamics, but it is not as flexible - can we scale some basic principles to beat out brains with the raw force of massive energy expenditure? Or is intelligence inherently about flexibility, and impossible to brute force in that way? Certainly it's not logically inconsistent to imagine that flexibility itself has a simple underlying rule - as a potential existence proof, the mechanics of evolutionary selection are at least superficially simple, though we can't literally replicate it without a fast world-simulator, which would be rather complicated. And maybe evolution is not a flexible thing, but only a designer of flexible things. So neither conclusion seems like a clear winner a priori.
The empirical answers so far seem to complicate the story. Attempts to build a "glass box" intelligence out of pure math (logic or probability) have so far not succeeded, though they have provided useful tools and techniques (like statistics) that avoid the fallacies and biases of human minds. But we've built a simple outer loop optimization target called "next token prediction" and thrown raw compute at it, and managed to optimize black box "minds" in a new way (called gradient descent by backpropagation). Perhaps the process we've captured is a little more like evolution, designing lots of little tricks that work for inscrutable reasons. And perhaps it will work, woe unto us, who have understood almost nothing from it!
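To make "simple outer loop" concrete, here is a minimal caricature in Python/numpy - with the caveats stated loudly: this is a bigram-level toy, nothing like a production LLM or its actual training stack, just the shape of next-token prediction optimized by gradient descent on cross-entropy loss:

```python
# Toy sketch (assumption: illustrative only, not how real LLMs are trained) of
# the outer loop: predict the next token, score with cross-entropy, descend.
import numpy as np

text = "the cat sat on the mat. the dog sat on the log."
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = np.array([stoi[ch] for ch in text])
V = len(vocab)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(V, V))  # logits W[prev_token, next_token]

xs, ys = data[:-1], data[1:]             # (context, next-token) training pairs
lr = 1.0
for step in range(200):
    logits = W[xs]                                   # (N, V)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(probs[np.arange(len(ys)), ys]))  # cross-entropy

    dlogits = probs                                  # gradient of the loss w.r.t. logits
    dlogits[np.arange(len(ys)), ys] -= 1
    dlogits /= len(ys)
    dW = np.zeros_like(W)
    np.add.at(dW, xs, dlogits)                       # accumulate per-context gradients
    W -= lr * dW                                     # gradient descent step

print(f"final loss: {loss:.3f}")
```

The point is only that the optimization target really is that simple; all the interesting, inscrutable structure lives in the black box being optimized.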
If you’re trying to change the vocabulary you should have settled on an option.
I called this a long time ago, though I'm not sure I wrote it down anywhere. But it doesn't mean faster is safer. That's totally wrong - scaling A.I. actually motivates building better GPUs and energy infrastructure. Regardless of compute overhang there was always going to be a "scaling up" period, and the safety community is not prepared for it now.
Mathematics students are often annoyed that they have to worry about "bizarre or unnatural" counterexamples when proving things. For instance, differentiable functions without continuous derivative are pretty weird. Engineers in particular tend to protest that these things will never occur in practice, because they don't show up physically. But these adversarial examples show up constantly in the practice of mathematics - when I am trying to prove (or calculate) something difficult, I will try to cram the situation into a shape that fits one of the theorems in my toolbox, and if those tools don't naturally apply I'll construct all kinds of bizarre situations along the way while changing perspective. In other words, bizarre adversarial examples are common in intermediate calculations - that's why you can't just safely forget about them when proving theorems. Your logic has to be totally sound as a matter of abstraction or interface design - otherwise someone will misuse it.
Cool, I'll DM you.
Possibly, but I think that's the wrong lesson. After all, there's at least a tiny chance we succeed at boxing! Don't put too much stake in "Pascal's mugging"-style reasoning, and don't try to play 4-dimensional chess as a mere mortal :)
The difficulty here is that if the ASI/AGI assigns a tiny probability to being in a simulation, that is subject to being outweighed by other tiny probabilities. For instance, the tiny probability that humanity will successfully fight back (say, create another ASI/AGI) if we are not killed, or the tiny increase in other risks from not using the resources humans need for survival during the takeover process. If this means it takes a little longer to build a Dyson sphere, there's an increased chance of being killed by e.g. aliens or even natural disasters like nearby supernovas in the process. These counterarguments don't work if you expect AGI/ASI to be capable of rapidly taking total control over our solar system's resources.
I have been thinking about extending the AIXI framework from reward to more general utility functions and working out some of the math - I would be happy to chat if that's something you're interested in. I am already supported by the LTFF (for work on embedded agency) so I can't apply to the job offer currently. But maybe I can suggest some independent researchers who might be interested.
I guess refusing to use someone’s preferred pronouns is weak Bayesian evidence for wanting to have them killed, but the conclusion is so unlikely it’s probably not appropriate to raise it to the level of serious consideration.
I agree that there should not be a fundamental difference. Actually, I think that when an A.I. is reasoning about improving its reasoning ability some difficulties arise that are tricky to work out with probability theory, but similar to themes that have been explored in logic / recursion theory. But that only implies we haven't worked out the versions of the logical results on reflectivity for uncertain reasoning, not that logical uncertainty is in general qualitatively different than probability. In the example you gave I think it is perfectly reasonable to use probabilities, because we have the tools to do this.
See also my comment on a recent interesting post from Jessica Taylor: https://www.lesswrong.com/posts/Bi4yt7onyttmroRZb/executable-philosophy-as-a-failed-totalizing-meta-worldview?commentId=JYYqqpppE7sFfm9xs
typo, fixed now.
I'll look into it, thanks! I linked a MIRI paper that attempts to learn the utility function, but I think it mostly kicks the problem down the road - including the true environment as an argument to the utility function seems like the first step in the right direction to me.
Yeah, I think you make a good point that increases in intelligence may be harder to understand than decreases, so settling whether this version of AIXI can pursue additional computational resources is an interesting open question.
Eventually - but agency is not sequence prediction + a few hacks. The remaining problems are hard. Massive compute, investment, and enthusiasm will lead to faster progress - I objected to 5-year timelines after ChatGPT, but now it’s been a couple of years. I think 5 years is still too soon, but I’m not sure.
Edit: After Nathan offered to bet my claim is false, I bet no on his market at 82% claiming (roughly) that inference compute is as valuable as training compute for GPT-5: https://manifold.markets/NathanHelmBurger/gpt5-plus-scaffolding-and-inference. I expect this will be difficult to resolve because o1 is the closest we will get to a GPT-5 and it presumably benefits from both more training (including RLHF) and more inference compute. I think it's perfectly possible that well-thought-out reinforcement learning can be as valuable as pretraining, but for practical purposes I expect scaling inference compute on a base model will not see qualitative improvements. I will reach out about more closely related bets.
I’m not convinced that LLM agents are useful for anything.
For what it’s worth, though I can’t point to specific predictions I was not at all surprised by multi-modality. It’s still a token prediction problem, there are not fundamental theoretical differences. I think that modestly more insights are necessary for these other problems.
Hooking up AI subsystems is predictably harder than you’re implying. Humans are terrible at building AGI; the only thing we get to work is optimization under minimal structural assumptions. The connections between subsystems will have to be learned, not hardcoded, and that will be a bottleneck - very possibly a unified system trained in a somewhat clever way will get there first.
I’m not sure they won’t turn out to be easy relative to inventing LLM’s, but under my model of cognition there’s a lot of work remaining. Certainly we should plan for the case that you’re right, though that is probably an unwinnable situation so it may not matter.
The chances of this conversation advancing capabilities are probably negligible - there are thousands of engineers pursuing the plausible sounding approaches. But if you have a particularly specific or obviously novel idea I respect keeping it to yourself.
Let’s revisit the o1 example after people have had some time to play with it. Currently I don’t think there’s much worth updating strongly on.
Why do you think integrating all these abilities is easier than building an LLM? My intuition is that it is much harder, though perhaps most necessary progress for both is contained in scaling compute and most of that scaling has probably already taken place.
My favorite detail is that they aren't training the hidden chain of thought to obey their content policy. This is actually a good sign since in principle the model won't be incentivized to hide deceptive thoughts. Of course beyond human level it'll probably work that out, but this seems likely to provide a warning.
In this example, if I knew the Pythagorean theorem and had performed the calculation, I would be certain of the right answer. If I were not able to perform the calculation because of logical uncertainty (say the numbers were large) then relative to my current state of knowledge I could avoid Dutch books by assigning probabilities to side lengths. This would make me impossible to money pump in the sense of cyclical preferences. The fact that I could gamble more wisely if I had access to more computation doesn't seem to undercut the reasons for using probabilities when I don't.
Now in the extreme adversarial case, a bookie could come along who knows my computational limits and only offers me bets where I lose in expectation. But this is also a problem for empirical uncertainty; in both cases, if you literally face a bookie who is consistently winning money from you, you could eventually infer that they know more than you and stop accepting their bets. I still see no fundamental difference between empirical and logical uncertainties.
I've read a bit of the logical induction paper, but I still don't understand why Bayesian probability isn't sufficient for reasoning about math. It seems that the Cox axioms still apply to logical uncertainty, and in fact "parts of the environment are too complex to model" is a classic example justification for using probability in A.I. (I believe it is given in AIMA). At a basic rigorous level probabilities are assigned to percepts, but we like to assign them to English statements as well (doing this reasonably is a version of one of the problems you mentioned). Modeling the relationships between strings of mathematical symbols probabilistically seems a bit more well justified than applying them to English statements if anything, since the truth/falsehood of provability is well-defined in all cases*. Pragmatically, I think I do assign probabilities to mathematical statements being true/provable when I am doing research, and I am not conscious of this leading me astray!
*the truth of statements independent of (say) ZFC is a bit more of a philosophical quagmire, though it still seems that assigning probabilities to provable/disprovable/independent is a pretty safe practice. This might also be a use case for semimeasures as defective probabilities.
To me, the natural explanation is that they were not trained for sequential decision making and therefore lose coherence rapidly when making long term plans. If I saw an easy patch I wouldn't advertise it, but I don't see any easy patch - I think next token prediction works surprisingly well at producing intelligent behavior in contrast to the poor scaling of RL in hard environments. The fact that it hasn't spontaneously generalized to succeed at sequential decision making (RL style) tasks is not actually surprising, and would have seemed obvious to everyone if not for the many other abilities that did arise spontaneously.
- This is probably true; AIXI does take a mixture of dualistic environments and assumes it is not part of the environment. However, I have never seen the "anvil problem" argued very rigorously - we cannot assume AIXI would learn to protect itself, but that is not a proof that it will destroy itself. AIXI has massive representational power and an approximation to AIXI would form many accurate beliefs about its own hardware, perhaps even concluding that its hardware implements an AIXI approximation optimizing its reward signal (if you doubt this see point 2). Would it not then seek to defend this hardware as a result of aligned interests? The exact dynamics at the "Cartesian boundary" where AIXI sees its internal actions affect the external world are hard to understand, but just because they seem confusing to us (or at least me) does not mean AIXI would necessarily be confused or behave defectively (though since it would be inherently philosophically incorrect, defective behavior is a reasonable expectation). Some arguments for the AIXI problem are not quite right on a technical level, for instance see "Artificial General Intelligence and the Human Mental Model":
"Also, AIXI, and Legg’s Universal Intelligence Measure which it optimizes, is incapable of taking the agent itself into account. AIXI does not “model itself” to figure out what actions it will take in the future; implicit in its definition is the assumption that it will continue, up until its horizon, to choose actions that maximize expected future value. AIXI’s definition assumes that the maximizing action will always be chosen, despite the fact that the agent’s implementation was predictably destroyed. This is not accurate for real-world implementations which may malfunction, be destroyed, self-modify, etc. (Daniel Dewey, personal communication, Aug. 22, 2011; see also Dewey 2011)"
This (and the rest of the chapter's description of AIXI) is pretty accurate, but there's a technical sense in which AIXI does not "assume the maximizing action will always be chosen." Its belief distribution is a semimeasure, which means it represents the possibility that the percept stream may end, terminating the history at a finite time. This is sometimes considered as "death." Note that I am speaking of the latest definition of AIXI that uses a recursive value function - see the section of Jan Leike's PhD thesis on computability levels. The older iterative value function formulation has worse computability properties and really does assume non-termination, so the chapter I quoted may only be outdated and not mistaken.
See also my proposed off-policy definition of AIXI that should deal with brain surgery reasonably.
- Very likely false, at least for some AIXI approximations, probably including reasonable implementations of AIXItl. AIXI uses a mixture over probabilistic environments, so it can model environments that are too complicated for it to predict optimally as partially uncertain. That is, probabilities can and will effectively be used to represent logical as well as epistemic uncertainty. A toy AIXI approximation that makes this easy to see is one that performs updating only on the N simplest environments (let's ignore runtime/halting issues for the moment - this is reasonable-ish because AIXI's environments are all at least lower semicomputable). This approximation would place greater and greater weight on the environment that best predicts the percept stream, even if it doesn't do so perfectly, perhaps because some complicated events are modeled as "random." (A toy sketch of this updating appears at the end of this comment.) The dynamics of updating the universal distribution in a very complicated world are an interesting research topic which seems under- or even unexplored as I write this! Here is a (highly esoteric) discussion of this point as it concerns a real approximation to the universal distribution.
It's true that if we had enough compute to implement a good AIXI approximation, its world would also include lots of hard-to-compute things, possibly including other AIXI approximations, so it need not rapidly become a singleton. But this would not prevent it from being "a working AI."
- This is right, but not really magical - AIXItl only outperforms the class of algorithms with proofs of good performance (in some axiomatic system). If I remember correctly, this class doesn't include AIXItl itself!
It may be possible to formalize your idea as in Orseau's "Space-Time Embedded Intelligence," but it would no longer bear much resemblance to AIXItl. With that said, translating the informal idea you've given into math is highly nontrivial. Which parts of its physical world should be preserved and what does that mean in general? AIXI does not even assume our laws of physics.
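As promised above, here is a toy sketch of the "N simplest environments" updating - with loud caveats: this is purely illustrative, the "environments" are hand-written stubs, and "complexity" is just a stand-in for description length, so it is not a real AIXI approximation in any sense:

```python
# Toy sketch (assumption: illustrative only, not a real AIXI approximation) of
# Bayesian updating over a truncated mixture of the "N simplest" environments.
import numpy as np

class Env:
    """A probabilistic model of the next percept bit given the history."""
    def __init__(self, name, complexity, predict):
        self.name = name
        self.complexity = complexity   # stand-in for description length
        self.predict = predict         # history -> P(next bit = 1)

envs = [
    Env("always-0",  1, lambda h: 0.01),
    Env("always-1",  1, lambda h: 0.99),
    Env("fair-coin", 2, lambda h: 0.5),
    Env("alternate", 3, lambda h: 0.99 if (not h or h[-1] == 0) else 0.01),
]

# Prior weight 2^-complexity, renormalized over the N environments we kept.
weights = np.array([2.0 ** -e.complexity for e in envs])
weights /= weights.sum()

history = []
for bit in [0, 1] * 8:                 # the "world" happens to alternate
    likelihoods = np.array([e.predict(history) if bit == 1
                            else 1.0 - e.predict(history) for e in envs])
    weights *= likelihoods             # Bayes update on the new percept
    weights /= weights.sum()
    history.append(bit)

for e, w in zip(envs, weights):
    print(f"{e.name:10s} posterior {w:.3f}")
# "alternate" ends up with almost all the posterior mass, even though it only
# predicts the stream imperfectly (it hedges with probability 0.99, not 1).
```

This is the sense in which a truncated mixture concentrates on whichever kept environment predicts best, even when none predicts perfectly.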
Since both objections have been pointers to the definition, I think it's worth noting that I am quite familiar with the definition(s) of AIXI; I've read both of Hutter's books, the second one several times as it was drafted.
Perhaps there is some confusion here about the boundaries of an AIXI implementation. This is a little hard to talk about because we are interested in "what AIXI would do if..." but in fact the embeddedness questions only make sense for AIXI implemented in our world, which would require it to be running on physical hardware, which means in some sense it must be an approximation (though perhaps we can assume that it is a close enough approximation that it behaves almost exactly like AIXI). I am visualizing AIXI running inside a robot body. Then it is perfectly possible for AIXI to form accurate beliefs about its body, though in some harder-to-understand sense it can't represent the possibility that it is running on the robot's hardware. AIXI's cameras would show its robot body doing things when it took internal actions - if the results damaged the actuators, AIXI would have more trouble getting reward, so it would avoid similar actions in the future (this is why repairs and some hand-holding before it understood the environment might be helpful). Similarly, pain signals could be communicated to AIXI as negative (or lowered positive) rewards, and it would rapidly learn to avoid them. It's possible that an excellent AIXI approximation (with a reasonable choice of UTM for its prior) would rapidly figure out what was going on and wouldn't need any of these precautions to learn to protect its body - but it seems clear to me that they would at least improve AIXI's chances of success early in life.
With that said, the prevailing wisdom that AIXI would not protect its brain may well be correct, which is why I suggested the off-policy version. This flaw would probably lead to AIXI destroying itself eventually, if it became powerful enough to plan around its pain signals. What I object to is only the dismissal/disagreement with @moridinamael's comment, though it seems to me to be directionally correct and not to make overly strong claims.
I tend to think of this through the lens of the AIXI model - what assumptions does it make and what does it predict? First, one assumes that the environment is an unknown element of the class of computable probability distributions (those induced by probabilistic Turing machines). Then the universal distribution is a highly compelling choice, because it dominates this class while also staying inside it. Unfortunately the computability level does worsen when we consider optimal action based on this belief distribution. Now we must express some coherent preference ordering over action/percept histories, which can be represented as a utility function by VNM. Hutter further assumed it could be expressed as a reward signal, which is a kind of locality condition, but I don't think it is necessary for the model to be useful. This convenient representation allows us to write down a clean specification of AIXI's behavior, relating its well-specified belief distribution and utility function to action choice. It is true that, setting aside the reward representation, choosing an arbitrary utility function can justify any action sequence for AIXI (I haven't seen this proven but it seems trivial because AIXI assigns positive probability to any finite history prefix), but in a way this misses the point: the mathematical machinery we've built up allows us to translate conclusions about AIXI's preference ordering to its sequential action choices and vice versa through the intermediary step of constraining its utility function.
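For reference, the schematic form of the resulting action choice (standard textbook shape, stated loosely, with conditioning on the observed past history left implicit):

$$a_t \;:=\; \arg\max_{a_t} \sum_{e_t} \max_{a_{t+1}} \sum_{e_{t+1}} \cdots \max_{a_m} \sum_{e_m} \big[r_t + \cdots + r_m\big]\,\xi(e_{1:m} \mid a_{1:m}),$$

where $\xi = \sum_\nu w_\nu\, \nu$ is the universal mixture over environments and each percept $e_k = (o_k, r_k)$ carries the reward. Swapping the bracketed reward sum for a more general utility of the history is exactly the relaxation of Hutter's locality condition described above.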
I am confused that this has been heavily downvoted, it seems to be straightforwardly true insofar as it goes. While it doesn't address the fundamental problems of embeddedness for AIXI, and the methods described in the comment would not suffice to teach AIXI to protect its brain in the limit of unlimited capabilities, it seems quite plausible that an AIXI approximation developing in a relatively safe environment with pain sensors, repaired if it causes harm to its actuators, would have a better chance at learning to protect itself in practice. In fact, I have argued that with a careful definition of AIXI's off-policy behavior, this kind of care may actually be sufficient to teach it to avoid damaging its brain as well.
Interesting - intuitively it seems more likely to me that a well constructed mind just doesn't develop sophisticated demons. I think plenty of powerful optimization algorithms are not best understood as creating mesa-optimizers. The low-level systems of the human brain like the visual stream don't seem to ever attempt takeover. I suppose one could make the claim that some psychological disorders arise from optimization "daemons" but this mostly seems like pure speculation and possibly an abuse of the terminology. For instance it seems silly to describe optical illusions as optimization daemons.
Yes, I mostly agree with everything you said - the limitation with the probabilistic Turing machine approach (it's usually equivalently described as the a priori probability and described in terms of monotone TM's) is that you can get samples, but you can't use those to estimate conditionals. This is connected to the typical problem of computing the normalization factor in Bayesian statistics. It's possible that these approximations would be good enough in practice though.
The universal distribution/prior is lower semicomputable, meaning there is one Turing machine that can approximate it from below, converging to it in the limit. Also, there is a probabilistic Turing machine that induces the universal distribution. So there is a rather clear sense in which one can “use the universal distribution.” Of course in practice different universes would use more or less accurate versions with more or less compromises for efficiency - I think your basic argument holds up insofar as there isn’t a clear mechanism for precise manipulation through the universal distribution. It’s conceivable that some high level actions such as “make it very clear that we prefer this set of moral standards in case anyone with cosmopolitan values simulates our universe” would be preferred based on the malign universal prior argument.
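For readers following along, the object in question (one common convention; details vary by author): over a monotone universal machine $U$, the universal distribution is

$$\mathbf{M}(x) \;=\; \sum_{p\,:\,U(p) = x*} 2^{-|p|},$$

the total weight of (minimal) programs whose output begins with $x$. Lower semicomputability means we can enumerate ever-better lower bounds on $\mathbf{M}(x)$ without knowing how close we are, and having samples from the inducing probabilistic machine does not by itself hand us the conditionals $\mathbf{M}(y \mid x) = \mathbf{M}(xy)/\mathbf{M}(x)$ needed for sequential prediction - which is part of why real approximations involve the compromises mentioned above.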
I don't think that is either my argument or Marcus's; he probably didn't have painless humans in mind when he said that AIXI would avoid damaging itself like humans do. Including some kind of reward shaping like pain seems wise, and if it is not included engineers would have to take care that AIXI did not damage itself while it established enough background knowledge to protect its hardware. I do think that following the steps described in my post would ideally teach AIXI to protect itself, though it's likely that a handful of other tricks and insights are needed in practice to deal with various other problems of embeddedness - and in that case the self-damaging behavior mentioned in your (interesting) write-up would not occur for a sufficiently smart (and single-mindedly goal-directed) agent even without pain sensors.
Any time you attempt to implement AIXI (or any approximation) in the real world you must specify the reward mechanism. If AIXI is equipped with a robotic body you could choose for the sensors to provide "pain" signals. There is no need to provide a nebulous definition of what is or is not part of AIXI's body in order to achieve this.
I also didn't initially buy the argument that Marcus gave and I think some modifications and care are required to make AIXI work as an embedded agent - the off-policy version is a start. Still, I think there are reasonable responses to the objections you have made:
1: It would be standard to issue a negative reward (or decrease the positive reward) if AIXI is at risk of harming its body. This is the equivalent.
2: AIXI does not believe in heaven. If its percept stream ends, this is treated as 0 reward forever (which is usually, but not always, taken to be the worst possible reward, depending on the author). It's unclear if AIXI would expect the destruction of its body to lead to the end of its percept stream, but I think it would under some conditions.
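To sketch the technical sense in which this works (hedged: notation loose, roughly following the recursive value function formulation rather than quoting it): for a policy $\pi$ and an environment given by a chronological semimeasure $\nu$,

$$V^\pi_\nu(h) \;=\; \sum_{e} \nu(e \mid h\,a)\,\big[r(e) + \gamma\, V^\pi_\nu(h\,a\,e)\big], \qquad a = \pi(h),$$

and because $\nu$ is only a semimeasure, $\sum_e \nu(e \mid h\,a)$ can be strictly less than 1. The missing probability mass is the event that the percept stream simply ends, and it contributes zero further value to the sum - which is exactly the "0 reward forever" reading.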
The definition you settled on in your paper (using rejection sampling) only works for estimable weights, so cannot be extended to the lower semicomputable weights usually used for Solomonoff induction (e.g. 2^{-K(<T>)}). The alternative construction in "A formal solution to the grain of truth problem," when properly formalized, works with lower semicomputable weights.
This is a nitpick, but technically to apply the closed graph property instead of upper semicontinuity in Kakutani's fixed point theorem, I believe you need to know that your LCTVS is metrizable. It is sufficient to show it has a countable local base by Theorem 9 of Rudin's functional analysis, which is true because you've taken a countable power of R, but would not necessarily hold for any power of R (?).
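For concreteness (a standard fact, included only to sketch why the countable power suffices): $\mathbb{R}^{\mathbb{N}}$ with the product topology is metrized by

$$d(x, y) \;=\; \sum_{n=1}^{\infty} 2^{-n}\, \frac{|x_n - y_n|}{1 + |x_n - y_n|},$$

whereas an uncountable power of $\mathbb{R}$ with the product topology is not even first countable, so no such metric exists and the closed-graph substitution would need a different justification.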
Thanks, this is a nice one.
I don't know who this is, but no one was going to say that "he's weird, so his ideas must be bad."
It's a convenient test-bed to investigate schemes for building an agent with access to a good universal distribution approximation - which is what we (or at least I) usually assume an LLM is!
Also see my PR.
One of my current projects builds on "Learning Universal Predictors." It will eventually appear in my scaffolding sequence (trust me, there will be posts in the scaffolding sequence someday, no really I swear...)
My uncle is a nurse and he named Crocs and Hoka. However, all of these shoes, particularly the Crocs, seem to have a similar look.
Okay, Claddagh rings have already been lowered once, I'll kick them all the way down to Low [SPECULATIVE]. Thanks for the rest, I'll take some time to research and integrate it - for a start, is this the clog?
https://www.amazon.ca/dp/B001EJMZ6S?ots=1&ascsubtag=%5Bartid%7C10055.g.43507106%5Bsrc%7Cwww.google.com%5Bch%7C%5Blt%7C%5Bpid%7C749b9f35-9e80-4755-be0c-ec3fcc47ef9f%5Baxid%7Cd8482e9e-0390-4043-9893-b3b76ee3c71b&linkCode=gg2&tag=goodhousekeeping_auto-append-20&th=1
Neat! This is the first submission that's not discernible from meeting someone in person, so it almost has more of a Bourne/Wick feel than Holmes, but I think I'm here for it! Are there any signs that might remain on the person?
This has got to be low frequency, but it's still a personal favorite - thanks!
Thanks for all the suggestions - it will take some time to research and integrate them!
The task is effectively endless; there’s a tradeoff curve between time spent and the resulting increase in insight and agency. I think that (particularly if a well-curated list of reliable rules is available) the average person should spend a few hours studying these matters a couple of times a decade to increase their agency. That means the list could still usefully be a little longer. The task of constructing and curating the list in the first place is more time consuming, which means I should be expected to spend more time than is strictly useful on it.
In terms of clothing, I included some well-established status symbols that seem to have staying power and linked to further resources for people who are interested - but I don't recommend obsessively following the fashion cycle.
You're probably right about the side-note, though it seems hard to disentangle.
Good to know, I'll modify the confidence or perhaps change the statement - it seems that since the supporting data is a survey of Americans, it only justifies inferences about Americans.