I believe 3 is about right in principle but 5 describes humans today.
I don't think this proves probability and utility are inextricable. I prefer Jaynes' approach of motivating probabilities by coherence conditions on beliefs - later, he notes that utility and probability are on equal footing in decision theory as explained in this post, but (as far as I remember) ultimately decides that he can't carry this through to a meaningful philosophy that stands on its own. By choosing to introduce probabilities as conceptually prior, he "extricates" the two in a way that seems perfectly sensible to me.
I think that at least the weak orthogonality thesis survives these arguments, in the sense that any coherent utility function over an ontology "closely matching" reality should in principle be reachable for arbitrarily intelligent agents along some path of optimization/learning. Your only point that seems to contradict this is the existence of optimization daemons. But I'm confident that an anti-daemon immune system can be designed, so any agent that chooses to design itself in a way where it can be overtaken by daemons does so knowing that something close to its values will still be optimized for - so this shouldn't cause much observable shift in values.
It's unclear how much measure is assigned to various "final/limiting" utility functions by various agent construction schemes - I think this is far beyond our current technical ability to answer.
Personally, I suspect that the angle is more like 60 degrees, not 3.
“Cancel culture is good actually” needs to go in the hat ;)
You may be right that the benefits are worth the costs for some people, but I think if you have access to a group interested in doing social events with plausible deniability, that group is probably already a place where you should be able to be honest about your beliefs without fear of "cancellation." In that case it is preferable to practice (and expect) the moral courage / accountability / honesty of saying what you actually believe and defending it within that group. If you don't have a group of people interested in doing social events with plausible deniability, you probably can't do them and this point is moot. So I'm not sure I understand the use case - you have a friend group that is a little cancel-ish but still interested in expressing controversial beliefs? That sounds like something that is not a rationalist group (or maybe I am spoiled by the culture of Jenn's meetups).
This kind of thing does justified harm to our community’s reputation. If you have fun arguing that only white people can save us while deliberately obfuscating whether you actually believe that, it is in fact a concerning sign about your intentions/seriousness/integrity/trustworthiness.
I don’t believe that these anthropic considerations actually apply - to us, to oracles, or to Solomonoff induction. The arguments are too informal; it’s very easy to miscalculate Kolmogorov complexities and the measures assigned by the universal distribution when relying on intuitive gestures like this. However, I do think this is a correct generalization of the idea of a malign prior, and I actually appreciate that you wrote it up this way, because it makes clear that none of the load-bearing parts of the argument rely on reliable calculations (the invocations of algorithmic information theory concepts have not been reduced to rigorous math, so the original argument is no stronger than this one).
My impression is that e.g. the Catholic church has a pretty deeply thought out moral philosophy that has persisted across generations. That doesn't mean that every individual Catholic understands and executes it properly.
- Perhaps Legg-Hutter intelligence.
- I'm not sure how much the goal matters - probably the details depend on the utility function you want to optimize. I think you can do about as well as possible by carving out a utility function module and designing the rest uniformly to pursue the objectives of that module. But perhaps this comes at a fairly significant cost (i.e. you'd need a somewhat larger computer to get the same performance if you insist on doing it this way).
- ...And yes, there does exist a computer program which is remarkably good at just chess and nothing else, but that's not the kind of thing I'm talking about here.
- Yes, the I/O channels should be fixed along with the hardware.
The standard method for training LLMs is next-token prediction with teacher forcing, penalized by negative log-loss. This is exactly the right setup to elicit calibrated conditional probabilities, and exactly the "prequential problem" that Solomonoff induction was designed for. I don't think this setup was motivated by decision theory, but it definitely makes perfect sense as an approximation to Bayesian inductive inference - the only missing ingredient is acting to optimize a utility function based on this belief distribution. So I think it's too early to suppose that decision theory won't play a role.
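For concreteness, the objective can be sketched in a few lines of numpy. The `logits` array here is a stand-in for whatever the network outputs; under teacher forcing, position t predicts token t+1 given the *true* prefix:

```python
import numpy as np

def next_token_nll(logits, targets):
    """Teacher-forced negative log-likelihood.

    logits: (T, V) array of unnormalized scores; row t is the model's
            prediction for token t+1 given the true prefix (teacher forcing).
    targets: (T,) array of the actual next-token ids.
    """
    # log-softmax, shifted by the row max for numerical stability
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # negative mean log-probability assigned to each true next token
    return -log_probs[np.arange(len(targets)), targets].mean()

# a model that puts almost all mass on the right token has ~0 loss
logits = np.array([[10.0, -10.0], [-10.0, 10.0]])
targets = np.array([0, 1])
assert next_token_nll(logits, targets) < 1e-6
```

Minimizing this loss is exactly what rewards calibrated conditional probabilities: the optimum (in expectation) is the true conditional distribution of the next token.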
What would you have to see proven about Solomonoff induction to conclude it does not have convergence/calibration problems? My friend Aram Ebtekar has worked on showing it converges to 50% on adversarial sequences.
Perhaps LLMs are starting to approach the intelligence of today's average human: capable of only limited original thought, unable to select and autonomously pursue a nontrivial coherent goal across time, and having learned almost everything they know from reading the internet ;)
No that seems paywalled, curious though?
An example I've been studying obsessively: https://www.lesswrong.com/posts/Yz33koDN5uhSEaB6c/sherlockian-abduction-master-list
How do you suggest advocating for this effectively?
I'm in Canada so can't access the latest Claude, so my experience with these things does tend to be a couple months out of date. But I'm not really impressed with models spitting out slightly wrong code that tells me what functions to call. I think this is essentially a more useful search engine.
I've noticed occasional surprises in that direction, but none of them seem to shake out into utility for me.
Semi-interestingly, my MMA school taught that it's best for the punch to arrive before the leading foot lands so that the punch carries your full weight. Many people at advanced levels weren't aware of this because we did not introduce it right away - if you try to do this before learning a few other details (and building strength), you run a risk of hurting your wrist by punching too hard.
I've been waiting to say this until OpenAI's next larger model dropped, but this has now failed to happen for so long that it's become its own update, and I'd like to state my prediction before it becomes obvious.
This doesn't seem to be reflected in the general opinion here, but it seems to me that LLMs are plateauing and possibly already plateaued a year or so ago. Scores on various metrics continue to go up, but this tends to provide weak evidence because the metrics are heavily gamed and sometimes leak into the training data. Still, those numbers overall would tend to update me towards short timelines, even with their unreliability taken into account - however, this is outweighed by my personal experience with LLMs. I just don't find them useful for practically anything. I have a pretty consistently correct model of the problems they will be able to help me with, and it's not a lot - maybe a broad introduction to a library I'm not familiar with, or detecting simple bugs. That model has worked for a year or two without the set expanding much. Also, I don't see any applications to anything economically productive except for fluffy chatbot apps.
I think this is a story about anthropic immortality.
Thanks! I am particularly interested in the hook-grip calluses on thumbs; I'll look into that.
Calluses at the base of the finger (say, the knuckle-joint of the palm) are in my experience very difficult to classify. I get them there from climbing, as you said, and though I also get some calluses on my fingers, those tend to be less persistent and probably disappear most of the time (after climbing for a while at my level of intensity I stop getting calluses). I have also seen them from biking - when I started out I used to look at people's palms a lot, and I never came up with a reliable way to distinguish this from weightlifting. But if you could go into some more detail on the differences, perhaps I'll add a more speculative entry and see how it stands up!
(If it's your first post on lesswrong, welcome! I think you'll find that kindness/politeness is the community norm here)
I haven't been able to verify that Protestants don't wear a cross on a chain - it seems they prefer an empty cross to the more Catholic-coded crucifix, but this doesn't seem to be what you meant?
Technically the connection between the computability levels of AIT (estimability, lower/upper semi-computability, approximability) and the Turing degrees has not been worked out properly. See chapter 6 of Leike's thesis, though there is a small error in the inequalities of section 6.1.2. It is necessary to connect the computability of real valued functions (type two theory of effectivity) to the arithmetic hierarchy - as far as I know this hasn't been done, but maybe I'll share some notes in a few months.
Roughly, most classes don't have a universal distribution because they are not computably enumerable, though the exact obstruction varies from class to class. There's a nice table in Marcus Hutter's original book, page 50.
It says that (negative log) universal probability is about the same as the (monotone) Kolmogorov complexity - in the discrete case the probabilities agree up to a constant multiple, i.e. the log-scores differ by an additive constant. Basically, the Bayesian prediction is closely connected to the shortest explanation. See Li and Vitanyi's "An Introduction to Kolmogorov Complexity and its Applications."
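In symbols, the discrete coding theorem (with m the discrete universal distribution and K prefix Kolmogorov complexity) reads:

```latex
% Coding theorem (discrete case): universal probability matches the
% shortest-program explanation up to an additive constant in log-score.
-\log m(x) \;=\; K(x) + O(1)
\quad\Longleftrightarrow\quad
m(x) \;=\; \Theta\!\bigl(2^{-K(x)}\bigr)
```

So a string is probable under the universal prior exactly to the extent that it has a short description, which is the sense in which Bayesian prediction and compression coincide.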
Last question is a longer story I guess. Basically, the conditionals of the universal distribution are not lower semi-computable, and it gets even worse when you have to compare the expected values of different outcomes because of tie-breaking. But a good approximation of AIXI can still be computed in the limit.
Nice things about the universal distribution underlying AIXI include:
- It is one (lower semi-)computable probabilistic model that dominates, in the measure-theoretic sense, all other (lower semi-)computable probabilistic models. This is not possible to construct for most natural computability levels, so it's neat that it works.
- It unites compression and prediction through the coding theorem - though this connection is slightly weaker in the sequential case.
- It has two very natural characterizations, either as feeding random bits to a UTM or as an explicit mixture of lower semi-computable environments.
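The dominance property in the first point can be stated precisely: for every lower semi-computable semimeasure \nu there is a constant c_\nu > 0, independent of x, with

```latex
% M multiplicatively dominates every lower semi-computable semimeasure \nu;
% the constant can be taken to be 2^{-K(\nu)}, the prior weight of \nu.
M(x) \;\ge\; c_\nu\, \nu(x)
\qquad \text{for all finite strings } x,
\qquad c_\nu = 2^{-K(\nu)}.
```

Dominance is what makes M a universal predictor: its log-loss on any sequence exceeds that of any competing lower semi-computable model by at most the constant -\log c_\nu.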
With the full AIXI model, Professor Hutter was able to formally extend the probabilistic model to interactive environments without damaging the computability level. Conditioning and planning do damage the computability level but this is fairly well understood and not too bad.
I'm starting a Google group for anyone who wants to see occasional updates on my Sherlockian Abduction Master List. It occurred to me that anyone interested in the project would currently have to check the list to see any new observational cues (infrequently) added - also, some people outside of LessWrong are interested.
I would be very interested to see what you come up with!
Would be nice to be able to try it out without signing up
I think it's mostly about elite outreach. If you already have a sophisticated model of the situation you shouldn't update too much on it, but it's a reasonably clear signal (for outsiders) that x-risk from A.I. is a credible concern.
Personally I'm unlikely to increase my neuron-neuron bandwidth anytime soon, sounds like a very risky intervention even if possible.
My guess is that it would be very hard to get to millions of connections, so maybe we agree, but I'm curious if you have more specific info. Why is it not the bottleneck though?
I'm not a neuroscientist / cognitive scientist, but my impression is that rapid eye movements are already much faster than my conscious deliberation. Intuitively, this means there's already a lot of potential communication / control / measurement bandwidth left on the table. There is definitely a point beyond which you can't increase human intelligence without effectively adding more densely connected neurons or uploading and increasing clock speed. Honestly I don't think I'm equipped to go deeper into the details here.
You're talking about a handful of people, so the benefit can't be that large.
I'm not sure I agree with either part of this sentence. If we had some really excellent intelligence augmentation software built into AR glasses we might boost on the order of thousands of people. Also I think the top 0.1% of people contribute a large chunk of economic productivity - say on the order of >5%.
I think there's a reasonable chance everything you said is true, except:
What you're actually doing is doing the 5% boost, and never doing the other stuff.
I intend to do the other stuff after finishing my PhD - though it's not guaranteed I'll follow through.
The next paragraph is low confidence because it is outside of my area of expertise (I work on agent foundations, not neuroscience):
The problem with Neuralink etc. is that they're trying to solve the bandwidth problem, which is not currently the bottleneck, and which will take too long to yield any benefits. A full neural lace is maybe similar to a technical solution to alignment, in the sense that we won't get either within 20 years at our current intelligence levels. Also, I am not in a position where I have enough confidence in my sanity and intelligence metrics to tamper with my brain by injecting neurons into it and such. On the other hand, even a minor non-invasive increase in general fluid intelligence at the top of the intelligence distribution would be incredibly valuable, and profits could be reinvested in more hardcore augmentation down the line. I'd be interested to hear where you disagree with this.
It almost goes without saying that if you can make substantial progress on the hardcore approaches that would be much, much more valuable than what I am suggesting, and I encourage you to try.
I think I'm more optimistic about starting with relatively weak intelligence augmentation. For now, I test my fluid intelligence at various times throughout the day. (I'm working on better tests justified by algorithmic information theory in the style of Prof. Hernandez-Orallo - like this one, which unfortunately sucks to take: https://github.com/mathemajician/AIQ - but for now I use my own here: https://github.com/ColeWyeth/Brain-Training-Game.) I correlate the results with everything else I track about my lifestyle using Reflect: https://apps.apple.com/ca/app/reflect-track-anything/id6463800032 - which I endorse, though I should note it's owned/invented by a couple of my friends/former coworkers. I'll post some intermediate results soon. Obviously this kind of approach alone will probably only provide a low single-digit IQ boost at most, but I think it makes sense to pick the low-hanging fruit first (then attempt incrementally harder stuff with the benefit of being slightly smarter). Also, accurate metrics and data collection should be established as early as possible. Ultimately I want to strap on some AR goggles and measure my fluid intelligence in real time, ideally from eye movements in response to some subconscious stimulation (I haven't vetted the plausibility of this idea at all).
The executive summary seems essentially right to me. My only objection is that Phase 4 should probably be human intelligence augmentation.
You raise an interesting point about virtue ethics - I don't think that is required for moral coherence, I think it is just a shortcut. A consequentialist must be prepared to evaluate ~all outcomes to approach moral coherence, but a virtue ethicist really only needs to evaluate their own actions, which is much easier.
Presented the Sherlockian abduction master list at a Socratica node:
Verbal statements often have context-dependent or poorly defined truth values, but observations are pretty (though not completely) solid. Since useful models eventually shake out into observations, the binary truth values tagging observations "propagate back" through probability theory to make useful statements about models. I am not convinced that we need a fuzzier framework - though I am interested in the philosophical justification for probability theory in the "unrealizable" case where no element of the hypothesis class is true. For instance, the universal distribution's mixture is over probabilistic models, none of which should necessarily be assumed true - they are merely the widest class we can compute.
Improving computer security seems possible but there are many other attack vectors. For instance, even if an A.I. can prove a system’s software is secure, it may choose to introduce social engineering style back doors if it is not aligned. It’s true that controlled A.I.‘s can be used to harden society but overall I don’t find that strategy comforting.
I’m not convinced that this induction argument goes through. I think it fails on the first generation that is smarter than humans, for basically Yudkowskian reasons.
Imagine that there are just a few labs with powerful A.I., all of which are responsible enough to use the existing A.I. control strategies that have been prepared for this situation, and none of which open source their models. Now, if they can successfully use their A.I. for alignment, they will also be able to successfully use it for capabilities research. At some point, control techniques will no longer be sufficient, and we have to hope that by then A.I.-aided alignment has succeeded enough to prevent bad outcomes. I don’t believe this is a serious possibility; the first A.I. capable of solving the alignment problem completely will also be able to deceive us about solving the alignment problem (more) easily - up to and including this point, A.I. will produce partial, convincing solutions to the alignment problem which human engineers will go forward with. Control techniques will simply put a floor under the capabilities of the first unaligned A.I. that escapes, which is plausibly a net negative since it means we won’t have early high-impact warnings. If occasional A.I. escapes turn out to be non-lethal, economic incentives will favor better A.I. control, so working on this early won’t really matter. If occasional A.I. escapes turn out to be lethal, then we will die unless we solve the alignment problem ourselves.
I also don’t expect us to have robustly solved ASI-alignment in that timeframe. I simply fail to see a history in which AI control work now is a decisive factor. If you insist on making a top level claim that I haven’t thought through the branches of how things go, I’d appreciate a more substantive description of the branch I am not considering.
I do see advantages to hardening important institutions against cyberattacks and increasing individual and group rationality so that humans remain agentic for as long as possible.
I disagree with this take. A.I. control will only be important in a multipolar situation in which no single A.I. system can create a gray goo catastrophe etc. But if such pivotal acts are impossible and no singular A.I. takes control, and instead many A.I.’s are competing, then some groups will develop better or worse control for economic reasons, and it won’t affect existential risk much to work on it now. I don’t think I can see a situation where control matters - only a few players have A.G.I. for a very long time, and none escape or are open sourced, but also none gain a decisive advantage?
I wonder if it’s true that around the age of 30 women typically start to find babies cute and consequently want children, and if so is this cultural or evolutionary? It’s sort of against my (mesoptimization) intuitions for evolution to act on such high-level planning (it seems that finding babies cute can only lead to reproductive behavior through pretty conscious intermediary planning stages). Relatedly, I wonder if men typically have a basic urge to father children, beyond immediate sexual attraction?
The things you mentioned were probably all net positive, they just had some negative consequences as well. If you want to drive the far-ish future in a particular direction you’ve just got to accept that you’ll never know for sure that you’re doing a good job.
Though I am working on technical alignment (and perhaps because I know it is hard) I think the most promising route may be to increase human and institutional rationality and coordination ability. This may be more tractable than "expected" with modern theory and tools.
Also, I don't think we are on track to solve technical alignment in 50 years without intelligence augmentation in some form, at least not to the point where we could get it right on a "first critical try" if such a thing occurs. I am not even sure there is a simple and rigorous technical solution that looks like something I actually want, though there is probably a decent engineering solution out there somewhere.
It would certainly be nice if we could agree to all put up a ton of satellites that intercept anyone's nuclear missiles (perhaps under the control of an international body), gradually lowering the risk across the board without massively advantaging any country. But I think it would be impossible to coordinate on this.
"Optimization power" is not a scalar multiplying the "objective" vector. There are different types. It's not enough to say that evolution has had longer to optimize things but humans are now "better" optimizers: Evolution invented birds and humans invented planes, evolution invented mitochondria and humans invented batteries. In no case is one really better than the other - they're radically different sorts of things.
Evolution optimizes things in a massively parallel way, so that they're robustly good at lots of different selectively relevant things at once, and it has been doing this for a very long time, so that inconceivably many tiny lessons are baked in a little bit. Humans work differently - we try to figure out what works for explainable, preferably provable reasons. We also blindly twiddle parameters a bit, but we can only keep so many parameters in mind at once and compare so many metrics - humanity has a larger working memory than individual humans, but the human innovation engine is still driven by linguistic theories, expressed in countable languages. There must be a thousand deep mathematical truths that evolution is already taking advantage of - to optimize its DNA repair algorithms, or design wings that work very well under both ordinary and rare turbulent conditions, or minimize/maximize surface tensions of fluids, or invent really excellent neural circuits - without ever actually finding the elaborate proofs. Solving for exact closed-form solutions is often incredibly hard, even when the problem can be well-specified, but natural selection doesn't care. It will find what works locally, regardless of logical depth. It might take humans thousands of years to work some of these details out on paper. But once we've worked something out, we can deliberately scale it further and avoid local minima. This distinction between the strategies of evolution vs. humans rhymes with wisdom vs. intelligence - though in this usage intelligence includes all the insight, except insofar as evolution located and acts through us. As a sidebar, I think some humans prefer an intuitive strategy that is more analogous to evolution's in effect (but not implementation).
So what about when humans turn to building a mind? Perhaps a mind is by its nature something that needs to be robust, optimized in lots of little nearly inexplicable ways for arcane reasons to deal with edge cases. After all, isn't a mind exactly that which provides an organism/robot/agent with the ability to adapt flexibly to new situations? A plane might be faster than a bird, throwing more power at the basic aerodynamics, but it is not as flexible - can we scale some basic principles to beat out brains with the raw force of massive energy expenditure? Or is intelligence inherently about flexibility, and impossible to brute force in that way? Certainly it's not logically inconsistent to imagine that flexibility itself has a simple underlying rule - as a potential existence proof, the mechanics of evolutionary selection are at least superficially simple, though we can't literally replicate it without a fast world-simulator, which would be rather complicated. And maybe evolution is not a flexible thing, but only a designer of flexible things. So neither conclusion seems like a clear winner a priori.
The empirical answers so far seem to complicate the story. Attempts to build a "glass box" intelligence out of pure math (logic or probability) have so far not succeeded, though they have provided useful tools and techniques (like statistics) that avoid the fallacies and biases of human minds. But we've built a simple outer-loop optimization target called "next token prediction" and thrown raw compute at it, and managed to optimize black-box "minds" in a new way (called gradient descent by backpropagation). Perhaps the process we've captured is a little more like evolution, designing lots of little tricks that work for inscrutable reasons. And perhaps it will work, woe unto us, who have understood almost nothing from it!
If you’re trying to change the vocabulary you should have settled on an option.
I called this a long time ago, though I'm not sure I wrote it down anywhere. But it doesn't mean faster is safer. That's totally wrong - scaling A.I. actually motivates building better GPUs and energy infrastructure. Regardless of compute overhang, there was always going to be a "scaling up" period, and the safety community is not prepared for it now.