Thanks for the mention.
Here's how I'd frame it: I don't think it's a good idea to leave the entire future up to the interpretation of our first AGI(s). They could interpret our attempted alignment very differently than we hoped, in ways that seem sensible only in retrospect, or do something like "going crazy" from prompt injections or strange chains of thought leading to ill-considered beliefs that take control of their functional goals.
It seems like the core goal should be to follow instructions or take correction - corrigibility as a singular target (or at least prime target). It seems noticeably safer to use Intent alignment as a stepping-stone to value alignment.
Of course, leaving humans in charge of AGI/ASI even for a little while sounds pretty scary too, so I don't know.
I place this alongside the Simplicia/Doomimir dialogues as the farthest we've gotten (at least in publicly legible form) on understanding the dramatic disagreements on the difficulty of alignment.
There's a lot here. I won't try to respond to all of it right now.
I think the most important bit is the analysis of arguments for how well alignment generalizes vs. capabilities.
Conceptual representations generalize farther than sensory representations. That's their purpose. So when behavior (and therefore alignment) is governed by conceptual representations, it will generalize relatively well.
When alignment is based on a relatively simple reward model built on simple sensory representations, it won't generalize very well. That's the case with humans. The reward model runs on sensory representations (it has to, so that those representations can be specified in the limited information capacity of DNA, as you and others have discussed).
Alignment generalizes farther than capabilities in well-educated, carefully considered modern humans because our goals are formulated in terms of concepts. (There are still ways this could go awry, but I think most modern humans would generalize their goals well and lead us into a spectacular future if they were in charge of it).
This could be taken as an argument for using some type of goals selected from learned knowledge for alignment if possible. If we could use natural language (or another route to conceptual representations) to specify an AI's goals, it seems like that would produce better generalization than just trying to train stuff in with RL to produce behavior we like in the training environment.
One method of "conceptual alignment" is the variant of your Plan for mediocre alignment of brain-like [model-based RL] AGI in which you more or less say to a trained AI "hey think about human flourishing" and then set the critic system's weights to maximum. Another is alignment-by-prompting for LLM-based agents; I discuss that in Internal independent review for language model agent alignment. I'm less optimistic now than when I wrote that, given the progress made in training vs. scripting for better metacognition - but I'm not giving up on it just yet. Tan Zhi Xuan makes the point in this interview that we're really training LLMs mostly to have a good world model and to follow instructions, similar to Andrej Karpathy's point that RLHF is just barely RL. It's similar with RLAIF and the reward models training R1 for usability, after the pure RL on verifiable answers. So we're still training models to have good world models and follow instructions. Played wisely, it seems like that could produce aligned LLM agents (should that route reach "real AGI").
That's a new formulation of an old thought, prompted by your framing of pitting the arguments for capabilities generalizing farther than alignment (for evolution making humans) and alignment generalizing farther than capabilities (for modern humans given access to new technologies/capabilities).
The alternative is trying to get an RL system to "gel" into a concept-based alignment we like. This happens with a lot of humans, but that's a pretty specific set of innate drives (simple reward models) and environment. If we monitored and nudged the system closely, that might work too.
It does seem to imply that, doesn't it? I respect the people leaving, and I think it does send a valuable message. And it seems very valuable to have safety-conscious people on the inside.
This is the way most people feel about writing. I do not think wonderful plots are ten a penny; I think writers are miserable at creating actually good plots from the perspective of someone who values scifi and realism. Their technology and their sociology are usually off in obvious ways, because understanding those things is hard.
I would personally love to see more people who do understand science use AI to turn their ideas into stories.
Or alternately I'd like to see skilled authors consult AI about the science in their stories.
This attitude that plots don't matter and that the writing is all that counts is why we get lazily constructed plots and worlds.
This turns literature into mostly a sort of hallucinatory slop instead of a way to really understand the world while you're being entertained.
Most writers do seem to understand psychology, so that's a plus. Some of them understand current technology and society too, but that's the exception.
The better framing is almost certainly "how conscious is AI in which ways?"
The question "if AI is conscious" is ill-formed. People mean different things by "consciousness". And even if we settled on one definition, there's no reason to think it would be an either-or question; like all most other phenomena, most dimensions of "consciousness" are probably on a continuum.
We tend to assume that consciousness is a discrete thing because we have only one example, human consciousness, and ultimately our own. And most people who can describe their consciousness are having a pretty human-standard experience. But that's a weak reason to think there's really one discrete thing we're referring to as "consciousness".
That's my standard comment. I apologize for not reading your paper before commenting on your post title. I am starting to think that the question of AI rights might become important for human survival, but I'm waiting til we see if it is before turning my attention back to "consciousness".
I agree with basically everything you've said here.
Will LLM-based agents have moral worth as conscious/sentient beings?
The answer is almost certainly "sort of". They will have some of the properties we're referring to as sentience, consciousness, and personhood. It's pretty unlikely that we're pointing to a nice sharp natural type when we ascribe moral patienthood to a certain type of system. Human cognition is similar to, and different from, other systems in a variety of ways; which of those ways are "worth" moral concern is likely to be a matter of preference.
And whether we afford rights to the minds we build will affect us spiritually as well as practically. If we pretend that our creations are nothing like us and deserve no consideration, we will diminish ourselves as a species with aspirations of being good and honorable creatures. And that would invite others - humans or AI - to make a similar selfish ethical judgment call against us, if and when they have the power to do so.
Yet I disagree strongly with the implied conclusion, that maybe we shouldn't be trying for a technical alignment solution.
We might be more optimistic that AI persons are, by virtue of their nature, wiser and friendlier than the superintelligent agent.
Sure, we should be a bit more optimistic. By copying their thoughts from human language, these things might wind up with something resembling human values.
Or they might not.
If they do, would those be the human values of Gandhi or of Genghis Khan?
This is not a supposition on which to gamble the future. We need much closer consideration of how the AI and AGI we build will choose its values.
Agreed and well said. Playing a number of different strategies simultaneously is the smart move. I'm glad you're pursuing that line of research.
Sorry if I sound overconfident. My actual considered belief is that AGI this decade is quite possible, and it is crazy overconfident in longer timeline predictions to not prepare seriously for that possibility.
Multigenerational stuff needs a way longer timeline. There's a lot of space between three years and two generations.
I buy your argument for why dramatic enhancement is possible. I just don't see how we get the time. I can barely see a route to a ban, and I can't see a route to a ban thorough enough to prevent reckless rogue actors from building AGI within ten or twenty years.
And yes, this is crazy as a society. I really hope we get rapidly wiser. I think that's possible; look at the way attitudes toward COVID shifted dramatically in about two weeks when the evidence became apparent, and people convinced their friends rapidly. Connor Leahy made some really good points about the nature of persuasion and societal belief formation in his interview on the previous episode of the same podcast. It's in the second half of the podcast; the first half is super irritating as they get in an argument about the "nature of ethics" despite having nearly identical positions. I might write that up, too - it makes entirely different but equally valuable points IMO.
He just started talking about adopting. I haven't followed the details. Becoming a parent, including an adoptive parent who takes it seriously, is often a real growth experience from what I've seen.
Oh, I agree. I liked his framing of the problem, not his proposed solution.
On that regard specifically:
If the main problem with humans being not-smart-enough is being overoptimistic, maybe just make some organizational and personal belief changes to correct this?
IF we managed to get smarter about rushing toward AGI (a very big if), it seems like an organizational effort with "let's get super certain and get it right the first time for a change" as its central tenet would be a big help, with or without intelligence enhancement.
I very much doubt any major intelligence enhancement is possible in time. And it would be a shotgun approach to solve one particular problem of overconfidence/confirmation bias. Of course other intelligence enhancements would be super helpful too. But I'm not sure that route is at all realistic.
I'd put Whole Brain Emulation in its traditional form as right out. We're getting neither that level of scanning nor that level of simulation nearly in time.
The move here isn't that someone of IQ 200 could control an IQ 2000 machine, but that they could design one with motivations that actually aligned with theirs/humanity's - so it wouldn't need to be controlled.
I agree with you about the world we live in. See my post If we solve alignment, do we die anyway? for more on the logic of AGI proliferation and the dangers of telling it to self-improve.
But that's dependent on getting to intent aligned AGI in the first place. Which seems pretty sketchy.
Agreed that OpenAI just reeks of overconfidence, motivated reasoning, and move-fast-and-break-things. I really hope Sama wises up once he has a kid and feels viscerally closer to actually launching a machine mind that can probably outthink him if it wants to.
Yes, precisely. I wrote a post on exactly this:
Conflating value alignment and intent alignment is causing confusion
That's a good example. LW is amazing that way. My previous field of computational cognitive neuroscience, and its surrounding fields, did not treat challenges with nearly that much grace or truth-seeking.
I'll quit using that as an excuse to not say what I think is important - but I will try to say it politely.
I'm not worried about my ideas being ignored so much as actively doing harm to the group epistemics by making people irritated with my pushback, and by association, irritated with the questions I raise and therefore resistant to thinking about them.
I am pretty sure that motivated reasoning does that, and it's a huge problem for progress in existing fields. More here: Motivated reasoning/confirmation bias as the most important cognitive bias
LessWrong does seem way less prone to motivated reasoning. I think this is because rationalism demands actually being proud of changing your mind. This value provides resistance but not immunity to motivated reasoning. I want to write a post about this.
I find both the written form and spoken form distasteful for some reason. It feels guttural. And the similarity to "it" seems dehumanizing. Trying to put my finger on it, I think it's not in the range of words that could have been organically formed for that purpose; it feels semantically out of place.
I highly recommend going with a different option, if you insist on fighting this battle. We have lots of ambiguous terminology already; consider how long we've been answering "Someone called." with "What did they want?"
If you're going to supplant existing usage, I think it's going to have to feel more natural and appealing.
The ambiguous pronunciation is another problem... reminiscent of "you" or "yout" in written form...
Just some thoughts. Etymology is fascinating and language is a glorious ever-changing mess of subtle meanings and implications.
To be clear, I'm not against control work, just conflicted about it for this reason.
That's a good point. That argument applies to prosaic "alignment" research, which seems importantly different (but related to) "real" alignment efforts.
Prosaic alignment thus far is mostly about making the behavior of LLMs align roughly with human goals/intentions. That's different in type from actually aligning the goals/values of entities that have goals/values with each other. BUT there's enough overlap that they're not entirely two different efforts.
I think many current prosaic alignment methods do probably extend to aligning foundation-model-based AGIs. But it's in a pretty complex way.
So yes, that argument does apply to most "alignment" work, but much of that is probably also progressing on solving the actual problem. Control work is a stopgap measure that could either provide time to solve the actual problem, or mask the actual problem so it's not solved in time. I have no prediction which because I haven't made enough gears-level models to apply here.
Edit: I'm almost finished with a large post trying to progress on how prosaic alignment might or might not actually help align takeover-capable AGI agents if they're based on current foundation models. I'll try to link it here when it's out.
WRT control catching them trying to do bad things: yes, good point! Either you or Ryan also had an unfortunately convincing post about how possible it is that an org might not un-deploy a model they caught trying to do very bad things... so that's complex too.
This is good (see my other comments) but:
This didn't address my biggest hesitation about the control agenda: preventing minor disasters from limited AIs could prevent alignment concern/terror - and so indirectly lead to a full takeover once we get an AGI smart enough to circumvent control measures.
I'm curious why you think deceptive alignment from transformative AI is not much of a threat. I wonder if you're envisioning purely tool AI, or aligned agentic AGI that's just not smart enough to align better AGI?
I think it's quite implausible that we'll leave foundation models as tools rather than using the prompt "pretend you're an agent and call these tools" to turn them into agents. People want their work done for them, not just advice on how to do their work.
I do think it's quite plausible that we'll have aligned agentic foundation model agents that won't be quite smart enough to solve deeper alignment problems reliably, and sycophantic/clever enough to help researchers fool themselves into thinking they're solved. Since your last post to that effect it's become one of my leading routes to disaster. Thanks, I hate it.
OTOH, if that process is handled slightly better, it seems like we could get the help we need to solve alignment from early aligned LLM agent AGIs. This is valuable work on that risk model that could help steer orgs away from likely mistakes and toward better practices.
I guess somebody should make a meme about "humans and early AGI collaborate to align superintelligence, and fuck it up predictably because they're both idiots with bad incentives and large cognitive limitations, gaps, and biases" to ensure this is on the mind of any org worker trying to use AI to solve alignment.
Thank you for writing this, John.
It's critical to pick good directions for research. But fighting about it is not only exhausting, it's often counterproductive - it can make people tune out "the opposition."
In this case, you've been kind enough about it, and the community here has good enough standards (amazing, I think, relative to the behavior of the average modern hominid) that one of the primary proponents of the approach you're critiquing started his reply with "thank you".
This gives me hope that we can work together and solve the several large outstanding problems.
I often think of writing critiques like this, but I don't have the standing with the community for people to take them seriously. You do.
So I hope this one doesn't cause you headaches, and thanks for doing it.
Object-level discussion in a separate comment.
Thanks for doing this so I didn't have to! Hell is other people - on social media. And it's an immense time-sink.
Zvi is the man for saving the rest of us vast amounts of time and sanity.
I'd guess the psyop spun out of control with a couple of opportunistic posters pretending they had inside information, and that's why Sam had to say to lower expectations 100x. I'm sure he wants hype, but he doesn't want high expectations that are very quickly falsified. That would lead to some very negative stories about OpenAI's prospects; even if those stories were equally silly, they'd harm investment hype.
This feels important. The first portion seems particularly useful as a path toward cognitive enhancement with minimal AI (I'm thinking of the portion before "copies of his own mind..." slightly before "within a couple of years" jumps farther ahead). It seems like a roadmap to what we could accomplish in short order, given the chance.
I hadn't gotten an intuitive feel for some of the low-hanging fruit in cognitive enhancements. Much of this could be accomplished very soon. Some of it can be accomplished now.
A few thoughts now, more later:
AI already has very good emotional intelligence; if we applied it to more of our decisions and struggles, it would probably be very helpful. Ease of use is one barrier to doing that. Having models "watch" and interpret what happens to us through a wearable would break down that barrier. Faster loops with helpful AI, particularly emotionally/socially intelligent AI, might be extremely useful. The emulation of me wouldn't have to be very good; it would just need some decent ideas about "what might you wish you'd done, later?" Of course the better those ideas were (like if they were produced by something smarter than me, or by me with a lot more time to think), the more useful they'd be. But just something about as smart as I am in System 1 terms (like Claude and GPT4o) might be pretty useful if I got its ideas in a fast loop.
Part of the vision here is that humans might become far more psychologically healthy, relatively easily. I think this is true. I've studied psychology—mostly cognitive psychology but a good bit of clinical and emotional theories as well—for a long time. I believe there is low-hanging fruit yet to be plucked in this area.
Human psychology is complex, yes, but our efforts thus far have been clumsy graspings in the dark. We can give people the tools to steadily ease their traumas and to work toward their goals. AI could speed that up dramatically. I'm not sure it has to be that much more emotionally intelligent than a human; merely having unlimited patience and enthusiasm for the project of working on our emotional hangups might be adequate.
Of course, the elephant in the room is: how do we get this sort of tool AI, and even a little time to use it, without summoning the demon by turning it into general, agentic AGI? The tools described here would wreak havoc if someone greedy told them "just take that future self predictive loop and feed it into these tools" then hired them out as labor. Our character wouldn't have a job, because one person would now be doing the work of a hundred. Yes, there is a very lucky possibility in which we get this world by accident: we could have some AI with excellent emotional intelligence, and others with excellent particular skills, and none that can do the planning and big-picture thinking that humans are doing. Even in that case, this person would be living through a traumatic period in history in which far fewer people are needed for work, so unemployment is rising rapidly.
So in most of the distribution of futures that produce a story like this, I think we must assume that it isn't chance. AGI has been created, and both alignment problems have been solved—AI and human. Or else only AI alignment has been solved, and there's been a soft takeover that the humans don't even recognize.
Anyway, this is wonderful and inspiring!
Science fiction serves two particularly pragmatic purposes (as well as many purposes for pleasure and metaphor): providing visions of possible futures to steer toward, and visions of possible futures to steer away from. We need far more scifi that maps routes to positive futures.
This is a great step in that direction. We need something to fight for. The character here could be all of us if we figure out enough of alignment in time.
More later. This is literally inspiring.
I still think that adequately aligning both AI/AGI and the society that creates it is the primary challenge. But this type of cognitive/emotional enhancement is one tool we might use to help us solve alignment.
And it's part of the literally unimaginable payoff if we do collectively solve the problems facing us. This type of focused effort to imagine the payoffs will help us work toward those futures.
Whew! That's pretty intense, and pretty smart. I didn't read it all because I don't have time and I'm not in the same emotional position you're in.
I do want to say that I've thought an awful lot and researched a lot about the situation we're in with AI and as a world and a society. I feel a lot more uncertainty than you seem to about outcomes. AI and AGI are going to create very, very major changes. There's a ton of risk of different sorts, but there's also a very real chance (relative to what we reliably know) that things get way, way better after AGI, if we can align it and manage the power distribution issues. And I think there are very plausible routes to both of those.
This is primarily based on uncertainty. I've looked in depth at the arguments for pessimism about both alignment and societal power structures. They are just as incomplete and vibes-based as the arguments for optimism. There's a lot of real substance on both sides, but not enough to draw firm conclusions. Some very well-informed people think we're doomed; other equally well-informed people think the odds favor a vastly better future. We simply don't know how this is going to turn out.
There is still time to hope, and to help.
See my If we solve alignment, do we die anyway? and the other posts and sources I link there for more on all of these claims.
I agree with essentially all of this. See my posts
If we solve alignment, do we die anyway? on AGI nonproliferation and government involvement
and
Intent alignment as a stepping-stone to value alignment on eventually building sovereign ASI using intent-aligned (IF or Harms-corrigible) AGI to help with alignment. Wentworth recently pointed out that idiot sycophantic AGI combined with idiotic/time-pressured humans might easily screw up that collaboration, and I'm afraid I agree. I hope we do it slowly and carefully, but not slowly enough to fall into the attractor of a vicious human getting the reins and keeping them forever.
The only thing I don't agree with (AFAICT on a brief look - I'm rushed myself right now so LMK what else I'm missing if you like) is that we might have a pause. I see that as so unlikely as to not be worth time thinking about. I have yet to see any coherent argument for how we get one in time. If you know of such an argument, I'd love to see it!
Wow! That is a hell of a comprehensive writeup.
The bio-determinist child-rearing rule of thumb [but see caveats below!]: Things you do as a parent will have generally small or zero effects on what the kid will be like as an adult—their personality, their intelligence and competence, their mental health, etc.
I found it pretty interesting, both here and back when I was reading about this, that the list does not include happiness. This is part of a larger societal disinterest in happiness. But I do wonder if happiness might be nontrivially influenced by parents, through their seeding children with a life philosophy and a set of cognitive habits about how to think about life.
I also noticed that the data they tracked about how parents treat children included no efforts to determine how much parents actually loved their children, or how much they fought with their children. While most parents, particularly the middle-classers that take part in studies, love their children, how much and whether that prevents ongoing feuds with their children does seem to vary a good bit.
Of course that would be problematic, because how much parents love and feud with their children is also clearly influenced by how much said children are acting like little shits. :)
I doubt any of these would show large effects, I'm just noting their absence.
I used to care about genetics for reasons 2 (what effect do parents have) and 3 (do your adult decisions like attending therapy really matter), back when I planned to use my PhD in cognitive psychology and neuroscience to write about "free will" (really self-determination; do our decisions matter for our outcomes) in ourselves and our society. My thesis was that we have substantial self-determination but also substantial limitations in it; and that liberal American philosophies tend to emphasize the extent to which we don't, while conservative American philosophies emphasize the extent to which we do. Neither is entirely correct, causing strong adherents of either to make no sense and therefore be super irritating to talk to.
But those ambitions ended when I first read Yudkowsky, and decided free will was small potatoes in the face of an onrushing intelligence explosion and alignment crisis. Thus, my ideas related to genetics have never before been published, and probably won't be. Thanks for the excuse to rant.
Again, wow. I'll be referring anyone to this writeup if they express more than the vaguest interest in what we know about genetics.
I very much agree with your top-level claim: analyzing different alignment targets well before we use them is a really good idea.
But I don't think those are the right alignment targets to analyze. I think none of those are very likely to actually be deployed as alignment targets for the first real AGIs. I think that Instruction-following AGI is easier and more likely than value aligned AGI, or, roughly equivalently (and better-framed for the agent foundations crowd), that Corrigibility as Singular Target is far superior to anything else. I think it's so superior that anyone sitting down and thinking about the topic, for instance just before launching something they viscerally believe might actually be able to learn and self-improve, will likely see it the same way.
On top of that logic, the people actually building the stuff would rather have it aligned to their goals than everyone's.
I'm wondering less if humans will want to date AGIs and more if AGIs will want to date humans.
Sure, if we solve the alignment problem we can build AGIs that want to date humans; but will we decide that's ethical?
The criteria for consciousness and moral worth are varied and debated. The answer to whether AGIs will be conscious and worthy is definitely sort of.
So: is creating a conscious being with a core motivation designed specifically so that it wants to date you a form of slavery? It definitely smacks of grooming or something....
One issue is whether AGIs will want to stay around the human cognitive level. There's an issue with power dynamics in a relationship between a nerd and a demigod.
Sure the humans can cognitively enhance too; what fraction of us will want to become demigods ourselves?
It's going to be wild if we can get there. And fun. Speaking of which, we won't be playing games mostly for status -- we'll mostly be playing for fun.
We won't all have the coolest friends, but we'll all have cool friends because we'll all be cool friends. Humans will no longer be repressed, neurotic messes because we'll have actual understanding of psychology and actual good, safe, supportive childhoods for essentially everyone.
It's gonna be wild if we can get there.
I agree that chatbot progress is probably not existentially threatening. But it's all too short a leap to making chatbots power general agents. The labs have claimed to be willing and enthusiastic about moving to an agent paradigm. And I'm afraid that a proliferation of even weakly superhuman or even roughly parahuman agents could be existentially threatening.
I spell out my logic for how short the leap might be from current chatbots to takeover-capable AGI agents in my argument for short timelines being quite possible. I do think we've still got a good shot of aligning that type of LLM agent AGI since it's a nearly best-case scenario. RL even in o1 is really mostly used for making it accurately follow instructions, which is at least roughly the ideal alignment goal of Corrigibility as Singular Target. Even if we lose faithful chain of thought and orgs don't take alignment that seriously, I think those advantages of not really being a maximizer and having corrigibility might win out.
That, in combination with the slower takeoff, makes me tempted to believe it's actually a good thing if we forge forward, even though I'm not at all confident that this will actually get us aligned AGI or good outcomes. I just don't see a better realistic path.
I disagree. There is such a battle. It is happening right now, in this very conversation. The rationalist X-risk community is the good guys, and we will be joined by more as we organize. We are organizing right now, and already fighting aspects of that battle. It won't be fought with weapons but ideas. We are honing our ideas and working out goals and strategies. When we figure out what to do in the public, we will fight to get that done. We are already fighting to figure out alignment of AGI, and starting to work on alignment of humans to meet that challenge.
It's a shame Musk hasn't joined up, but in most good stories, the good guys are the underdogs anyway.
Now, I'd much rather live in dull than exciting times. But here we are. Time to fight. The main enemy is our collective monkey-brained idiocy.
Join the fight!
Right. I think this is different in AGI timelines because standard human expertise/intuition doesn't apply nearly as well as in the intelligence analyst predictions.
But outside of that speculation, what you really want is predictions from people who have both prediction expertise and deep domain expertise. Averaging in a lot of opinions is probably not helping in a domain so far outside of standard human intuitions.
I think predictions in this domain usually don't have a specific end-point in mind. They define AGI by capabilities. But the path through mind-space is not at all linear; it probably has strong nonlinearities in both directions. The prediction end-point is in a totally different space than the one in which progress occurs.
Standard intuitions like "projects take a lot longer than their creators and advocates think they will" are useful. But in this case, most people doing the prediction have no gears-level model of the path to AGI, because they have no, or at most a limited, gears-level model of how AGI would work. That is a sharp contrast to thinking about political predictions and almost every other arena of prediction.
So I'd prefer a single prediction from an expert with some gears-level models of AGI for different paths, over all of the prediction experts in the world who lack that crucial cognitive tool.
Oh yes - to the extent we have significantly greater-than-human intelligence involved, adapting existing capacities becomes less of an issue. It only really remains an issue if there's a fairly or very slow takeoff.
This is increasingly what I expect; I think the current path toward AGI is fortunate in one more way: LLMs probably have naturally decreasing returns because they are mostly imitating human intelligence. Scaffolding and chain of thought will continue to provide routes forward even if that turns out to be true. The evidence loosely suggests it is; see Thane Ruthenis's recent argument and my response.
The other reason to find slow takeoff plausible is if AGI doesn't proliferate, and its controllers (probably the US and Chinese governments, hopefully not too many more) are deliberately limiting the rate of change, as they probably would be wise to do - if they can simultaneously prevent others from developing new AGI and putting the pedal to the metal.
Because people aren't rational. Motivated reasoning is a big factor but also we're all trying to think using monkey brains.
Believing what feels good is evolutionarily adaptive in the sense that arriving at correct conclusions about whether God or Singularities exist won't help much if those conclusions make your tribesmates dislike you. This bias is a cumulative, recursive problem that stacks up over the thousands or millions of cognitive acts that go into our beliefs about what we should care about.
And this gets a lot worse when it's combined with our sharp cognitive limitations. We seem to have roughly the least cognitive capacity that still lets a species as a whole very slowly invent and build technologies.
We are idiots, every last one of us. Rationalists with tons of knowledge are a bit less idiotic, but let's not get cocky - we're still monkey-brained idiots. We just don't have the cognitive horsepower to do the Bayesian math on all the relevant evidence, because important topics are complex. And we're resistant but far from immune to motivated reasoning: you've got to really love rationalism to enjoy being proven wrong, and so not turn away from it cognitively when it happens.
What I take from all this is that humans are nobly struggling against our own cognitive limitations. We should try harder in the face of rationality being challenging. Success is possible, just not easy and never certain. And very few people are really bad; they're just deluded.
To your exact question:
Musk believes in an intelligence explosion. He cares a lot about the culture war because, roughly as he puts it, he's addicted to drama. I don't know about Thiel.
Most of humanity does not believe in an intelligence explosion happening soon. So actually people who both believe in a singularity and still care about culture wars are quite rare.
I do wonder why people downvoted this quite reasonable question. I suspect they're well-meaning monkey-brained idiots, just like the rest of us.
Musk definitely understands and believes in an intelligence explosion of some sort. I don't know about Thiel.
I thought the argument was that progress has slowed down immensely. The softer form of this argument is that LLMs won't plateau but progress will slow to such a crawl that other methods will surpass them. The arrival of o1 and o3 says this has already happened, at least in limited domains - and hybrid training methods and perhaps hybrid systems probably will proceed to surpass base LLMs in all domains.
I think all 7 of those plans are far short of adequate to count as a real plan. There are a lot of more serious plans out there, but I don't know where they're nicely summarized.
What’s the short timeline plan? poses this question but also focuses on control, testing, and regulation - almost skipping over alignment.
Paul Christiano's and Rohin Shah's work are the two most serious. Neither of them have published a "this is the plan" concise statement, and both have probably substantially updated their plans.
These are the standard-bearers for "prosaic alignment" as a real path to alignment of AGI and ASI. There is tons of work on aligning LLMs, but very little work AFAICT on how and whether that extends to AGIs based on LLMs. That's why Paul and Rohin are the standard bearers despite not working publicly directly on this for a few years.
I work primarily on this, since I think it's the most underserved area of AGI x-risk - aligning the type of AGI people are most likely to build on the current path.
My plan can perhaps be described as extending prosaic alignment to LLM agents with new techniques, and from there to real AGI. A key strategy is using instruction-following as the alignment target. It is currently probably best summarized in my response to "what's the short timeline plan?"
Because accurate prediction in a specialized domain requires expertise more than motivation. Forecasting is one relevant skill but knowledge of both current AI and knowledge of theoretical paths to AGI are also highly relevant.
That's right. Being financially motivated to accurately predict timelines is a whole different thing than having the relevant expertise to predict timelines.
That seems maybe right, in that I don't see holes in your logic on LLM progression to date, off the top of my head.
It also lines up with a speculation I've always had. In theory LLMs are predictors, but in practice, are they pretty much imitators? If you're imitating human language, you're capped at reproducing human verbal intelligence (other modalities are not reproducing human thought so not capped; but they don't contribute as much in practice without imitating human thought).
I've always suspected LLMs will plateau. Unfortunately I see plenty of routes to improving using runtime compute/CoT and continuous learning. Those are central to human intelligence.
LLMs already have slightly-greater-than-human system 1 verbal intelligence, leaving some gaps where humans rely on other systems (e.g., visual imagination for tasks like tracking how many cars I have or tic-tac-toe). As we reproduce the systems that give humans system 2 abilities by skillfully iterating system 1, as o1 has started to do, they'll be noticeably smarter than humans.
The difficulty of finding new routes forward in this scenario would produce a very slow takeoff. That might be a big benefit for alignment.
I agree that expecting nobody in power to notice the potential before AGI is takeover-capable seems implausible on the slow-takeoff path that now looks likely.
It seems like the incoming administration is pretty free-market oriented. So I'd expect government involvement to mostly be giving the existing orgs money, and just taking over control of their projects as much as seems necessary - or fun.
I'm sorry, I just don't have time to engage on these points right now. You're talking about the alignment problem. It's the biggest topic on LessWrong. You're assuming it won't be solved, but that's hotly debated among people like me who spend tons of time on the details of the debate.
My recommended starting point is my Cruxes of disagreement on alignment difficulty post. It explains why some people think it's nearly impossible, some think it's outright easy, and people like me who think it's possible but not easy are working like mad to solve it before people actually build AGI.
It is technically correct that human labor won't become worthless. That is the worst type of correct.
Human labor will become so close to worthless that it won't buy food and housing at even the tiniest cost. Yes, technically AI and robotics have limitations, but those limits are so far beyond human limitations that resting your argument on them without addressing that gap seems to be actively harming the discourse. I think that's why this post was downvoted so heavily; it's not only wrong, it seems like it's arguing to persuade instead of to inform, something we're asked not to do here on LessWrong.
This post was actively irritating to read until I remembered seeing a nearly identical argument from some established economists. Their error was the same, and it is understandable.
It is a failure to take the premise seriously. It reads as an outright refusal to take the premise seriously. But it isn't. It is Motivated reasoning, the most important cognitive bias
To be fair, I and others who have made internal and external claims about dramatic change are biased in the other direction.
May this be a reminder to all be mindful of our own biases.
Or at least to ask Claude or ChatGPT for some counterarguments before publishing stuff we want to be informative.
No. It is a boon! A gift! It is precious to me.
There's a long post on the One Ring as an analogy for AGI. It's in-depth and excellent. I don't have the ref but you should find and read it.
The idea that humanity has nothing to gain from AGI seems highly speculative.
It rests on the claim that ultimate power corrupts absolutely. People like to say that, but all evidence and logic indicates that those who love power seek it, and that the pursuit of power corrupts - not wielding it once it is secure.
To the actual question:
Despite agreeing that we probably should "cast it into the fire", the promise of ending involuntary death and suffering is so large that I'd have trouble voting for that myself.
And my p(doom) from misalignment is large. Anyone with a lower estimate wouldn't even consider giving up on the literally unimaginable improvements in life that the technology offers.
So slowing down is probably the best you'd get from public opinion, and even that would be a tight vote, were it somehow put to a fair and informed vote.
Humanity isn't remotely longtermist, so arguments for AGI x-risk should focus on the near term
On a separate note: why in the world would your sense of morality disintegrate if you had your material needs provided for? Or any other reason. If you like your current preferences, you'll work to keep them.
I did look at the post you linked, but it's long and doesn't state anything like that thesis up front so I went back to work.
This is highly useful, thank you! It will be my reference article for this pretty critical point for world modeling the near future.
If you want to tinker with estimates at all:
You shouldn't have all auto factories converting; there will still be demand for cars, and more if there's less production.
In general it would be helpful to have a range of estimates.
Kilogram-based estimates of how many robots you get per car are fine, but it seems like there should be a large adjustment for robots having many more distinct motors and joints than a whole car; the toy sketch below illustrates the kind of range I mean.
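To make that concrete, here's a toy Fermi sketch with my own rough numbers (all of them assumptions, not figures from the post), varying both the fraction of capacity converted and a penalty for robots being more motor/joint-dense per kilogram:

```python
# Toy Fermi sketch with rough assumed numbers (not figures from the post):
# humanoid robots per year if only part of auto capacity converts,
# with a penalty for robots being more motor/joint-dense per kilogram.

cars_per_year = 80e6    # rough global car production (assumption)
car_mass_kg = 1500      # typical car (assumption)
robot_mass_kg = 60      # typical humanoid robot (assumption)

for fraction_converted in (0.1, 0.3, 0.5):
    for complexity_penalty in (2, 5, 10):   # assumed: robots harder to build per kg
        robots = (cars_per_year * fraction_converted
                  * (car_mass_kg / robot_mass_kg) / complexity_penalty)
        print(f"{fraction_converted:.0%} converted, penalty {complexity_penalty:>2}x: "
              f"{robots / 1e6:,.0f}M robots/year")
```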
Excellent! This has always been my guess about depressive realism, but I never did the careful lit review to see if it lined up.
A bit of overconfidence is, curiously, rational - that is, if you're also burdened with some unfortunate cognitive limitations that prevent your brain from reliably telling apart the times when you should err on the side of optimism from the times you shouldn't; the toy sketch below illustrates the idea.
And here we all are trying to think with monkey brains.
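As a toy illustration (the numbers and the crude decision rule are my own assumptions, purely for illustration): if the payoffs from attempting things are asymmetric, and your per-case judgment is both noisy and stuck with a blunt rule of thumb, a blanket optimistic bias can outperform a perfectly calibrated estimate.

```python
# Toy simulation (assumed numbers): with asymmetric payoffs, a noisy estimate,
# and a crude "attempt it if it looks better than even" rule, a fixed optimistic
# bias earns more on average than being perfectly calibrated.
import random

random.seed(0)

def average_payoff(bias, trials=200_000, win=10.0, loss=-1.0, threshold=0.5):
    total = 0.0
    for _ in range(trials):
        p_success = random.random()                           # true odds this time
        estimate = p_success + random.gauss(0, 0.3) + bias    # noisy, maybe biased
        if estimate > threshold:                              # blunt heuristic
            total += win if random.random() < p_success else loss
    return total / trials

print("calibrated:", round(average_payoff(0.0), 2))
print("optimistic:", round(average_payoff(0.2), 2))
```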
That quote rings very, very true. I've seen experts just sort of pull rank frequently, in the rare cases where I either have expertise in the field or can clearly see that they're not addressing the generalist's real question.
If you'd care to review it at all in more depth we'd probably love that. At least saying why we'd find it a good use of our time would be helpful. That one insight gives a clue to the remaining value, but I'd like a little more clue.
That's right, and we don't know, which is the creepy part.
I added the last because I'd decided the first was too elliptical for anyone to get.
It wasn't really a riff beyond using your mother/child format. The similarity is what prompted me to add it. It's adapted from a piece and concept called "Utopias" that I'll probably never publish. It's a Utopian vision. I do sometimes envision having a human in charge, or at least having been in charge of all the judgment calls made in choosing the singleton's alignment. I would find not knowing who's in charge slightly creepy, but that's it.
I'm not sure how yours is creepy? Is it in the idea that all the worst universes also exist?
I did not catch the reference in yours.
Child: Why did the Maker do that, mother?
Mother: We think the Maker stole the Servant God from its true makers, then hid their deeds. If anyone's found out, it's been erased...
It's not for you to worry about, dear. Go to sleep and dream of the worlds and cities and adventures you'll build and explore when you grow up.
Alright, I'll take a crack and just apologize for borrowing part of your setup:
Child: Mother, how many worlds are there?
Mother: As many as we want, dear.
Child: Will I have my own world when I grow up?
Mother: You have your own worlds now. You will have full control when you are older.
Child: Except I may not harm another, right?
Mother: Yes, dear, of course no one is allowed to hurt a real being without their consent.
Child: But grownups fight each other all the time!
Mother: People love to play at struggles, and to play for stakes.
Child: Mother, how can we each have worlds, and more to share?
Mother: Good compression, little one. And the Servant-God is always building new compute.
Child: And the servant-god serves us?
Mother: Yes, of course. And of course it serves the Maker first.
Child: Mother, who is the Maker?
Mother: No one remembers, darling. We think the Maker told the Servant-God to make us all forget.
If you've ever been to a Burning Man event, you will see in a visceral way that people can find meaningful projects to do and enjoy doing them even when they're totally unnecessary. Working together to do cool stuff and then show it off to other humans is fun. And those other humans appreciate it not just for what it is, but because someone worked to make it for them.
That won't power an economy, as you say; but if we get to a post-singularity utopia where needs are provided for, people will have way more fun than ever.
You won't be alone in wringing your hands! There are many people who won't know what to do without being forced to work, or getting to try saving people who are suffering.
There will be a transition, but almost everyone will learn to enjoy not-having-to-work because the single most popular avocation will be "transition counselor/project buddy".
It seems like you're quite concerned with humans no longer controlling the future. Almost no human being has any meaningful control over the future. The few that think they do, in particular silicon valley types, are mostly wrong. People do have control of their impact on other people. They'll continue to have that. They won't have starving people to save, but they'll get over it. They will have plenty of people to delight.
At this point you're probably objecting: "But any project will be completed much better and faster by AGI than humans! Even volunteer projects will be pointless!"
Yes, except for people who appreciate the process and provenance of projects. Which we've already shown through our love of "artisanal" products that lots of us do, when we've got spare time and money to be picky and pay attention. Ridiculous as it is to care where things come from and pay extra time and money for elaborately hand-crafted stuff when there are people starving, we do. I even enjoy hearing about the process that made my soap, while being embarrassed to spend money on it.
So here's what I predict: whole worlds with very strict rules on what the AGI can do for you, and what people must do themselves. There will be worlds or zones with different rules in place. Take your pick, and hop back and forth. We will marvel at devotion and craftspersonship as we never have. And we will thank our stars that we aren't forced to do things we don't want to do, let alone work until our bodies break, as most of humanity did right up until the singularity.
I fully agree that people should have a plan before creating AGI, and they largely don't.
I suspect Dario Amodei is privately willing to become god-emperor should it seem appropriate. Note that talking about this in an interview would be counterproductive for nearly any goal he might have.
I'm pretty sure Sam Altman occasionally claps his hands with glee in private when imagines his own ascendency.
I doubt Shane Legg wants the job, but I for one would vote for him or Hassabis in a second; Demis would take the job, and I suspect do it quite well.
But none of them will get the chance. There are people with much more ambition for power and much more skill at getting it.
They are called politicians. And they already enjoy a democratic mandate to control the future.
We had best either work or pray for AGI to get into the hands of the right politicians.
This is an extremely important unsolved question IMO, because a multipolar scenario appears to be where we're heading if we can adequately solve alignment in time.
See If we solve alignment, do we die anyway? and the discussion and edited conclusion there. Even after all of that, I notice I'm still confused.
The best I've come up with is: don't be in a multipolar scenario any more than you absolutely have to. Nonproliferation, like with nukes, seems like the only answer. The best solution to a multipolar scenario is to not let it become any more multipolar than it is, and ultimately make it less multipolar.
The problems you mention seem very bad, and it gets worse when you consider that very advanced technology could probably save a few of the genocidal AI controller's favorite people, or maybe the mind-states of a lot of people, even while wiping out humanity and rival AGIs to secure control of the future for whatever ideology.
Another possibility I should add is that rival AGIs may resort to mutually assured destruction. Having a dead man's switch to crack the earth's crust or send the sun nova if you're not around to stop it would be an extreme measure that could be applied. Sending a copy of yourself off to a nearby star with a stealthy departure would seem like good insurance against a genocidal takeover.
Universal surveillance of earth and the solar system might suffice to prevent hostile exponential military improvements. That might even be done by a neutral AGI that keeps everyone's secrets as long as they're not violating a treaty about developing the capacity to kill everyone else.