I'm not sure what the concept of an "entirely new" or "fully novel" idea means in practice. How many such things actually exist, and how often should we expect any mind, however intelligent, to find one? Ideas can be more or less novel, and we can have thresholds for measuring that, but where should we place the bar?
If you place it at "generate a correct or useful hypothesis you don't actually have enough data to locate in idea-space" then that seems like a mistake.
I'd put it more near "generate an idea good enough to lead to a publishable scientific paper or grantable patent." This still seems pretty close to that? Sometimes "obvious" implications of scientific papers go unacknowledged or unexplored for a very long time.
I'm generally in favor of being polite even to inanimate objects, and approach LLMs the same way.
Does this point get meaningfully strengthened by the way companies use past chats to train future models? Or is that mostly noise?
I'm still confused enough about consciousness that I can only directionally and approximately agree, but I do agree with that.
It gets really fun when the same individual holds multiple titles with conflicting obligations, and ends up doing things like approving and then vetoing the same measure while wearing different hats. I also think it's unfortunate that we seem to have gotten way too intolerant of people doing this compared to a few decades or generations ago. We're less willing to separate individuals from the roles they're enacting in public life, and that makes many critical capabilities harder.
Yes, this is exactly what I do expect. There are many problems for which this is a sufficient or even good approach. There are other problems for which it is not. And there are lessons that (most?) governments seem incapable of learning (often for understandable or predictable reasons) even after centuries or millennia. This is why I specified that I don't think we can straightforwardly say the government does or does not know a complicated thing. Does the government know how to fight a war? Does it know how to build a city? How to negotiate and enact a treaty? I don't think that kind of question has a binary yes or no answer. I'd probably round to "no" if I had to choose, in the sense that I don't trust any particular currently existing government to reliably possess and execute that capability.
I don't know if I've ever commented this on LW, but elsewhere I've been known to jokingly-but-a-little-seriously say that once we solve mortality (in the versions of the future where humans are still in charge) we might want to require presidents to be at least a few centuries or a millennium old, because it's not actually possible for a baseline human to learn and consolidate all the necessary skills to be reliably good at the job in a single human lifetime.
The Eye of Sauron turned its gaze upon the Fellowship, and still didn't know that they'd actually try to destroy the Ring instead of using it.
Less abstractly, I agree with you in principle, and I understand that many examples of the phenomena you referenced do exist. But there are also a large number of examples of government turning its eye on a thing, and with the best of intentions completely butchering whatever they wanted to do about it by completely failing to address the critical dynamics of the issue. And then not acknowledging or fixing the mistake, often for many decades. Some US examples, top of mind:
- Banning supersonic flight so it doesn't matter if we solve sonic booms and thereby ensuring no one bothers doing further research.
- Setting minimum standards for modular/mobile homes in 1976, and in the process making bizarre rules that led people to see them as low-class, preventing what had been the fastest-growing and most cost-effective new housing segment from moving up to higher-quality and larger structures, and also making many of them ineligible for traditional mortgages.
- Almost everything about the Jones Act.
- Almost everything about the NRC.
- A large fraction of everything about the post-1960s FDA.
- A substantial fraction of all zoning rules that increase costs, prevent the housing stock from meeting residents' needs, keep people from installing the best available materials/products/technologies to make their homes efficient/sustainable/comfortable, and don't actually contribute to safety or even community aesthetics and property values.
"The government" is too large, diffuse, and incoherent of a collective entity to be straightforwardly said to know or not know anything more complicated on most topics than a few headlines. I am certain there are individuals and teams and maybe agencies within the government that understand, and others that are so far from understanding they wouldn't know where to begin learning.
Yeah, I've been waiting for this as a sign of energy storage maturing for around 10-15 years. Ironically I had a conversation just this morning with someone who plans utility projects, and they told me that they're finally starting to see operators of gas peaker plants develop plans to start buying batteries so they can run their plants at higher efficiency/closer to constant power.
Upvoted for you posting it at all. I think these stories can be a great window into a culture I don't understand even a little. Whatever you decide to post in the future, it would be great to get your reflections on why you chose a particular story, what it means to you, that kind of thing.
When I was in high school I was drum major of the marching band; they sent us to a week-long "leadership camp," and this was how they recommended giving criticism. Praise-correction-praise. It can be done well, but it is much more often done poorly. Basically, it's level-1 advice that needs to be executed very well or used on an audience that isn't really free to complain much about it, and by the time you have the skill to execute it well, there are better methods available.
As someone tasked with deciding what AI tools the company I work for should be using, and training people to use them, the version names and numbers have been tons of fun. "Deep Research, not DeepSeek. No the other one. No no, the other other one."
Although, today I did remind myself that (over a much longer timespan) the version names/numbers for Windows major releases have been 3.1, 95, NT, 98 Second Edition, 2000, ME, XP, Vista, 7, 8, 10, and 11. And also almost no founder should ever be allowed to name their company.
And so I need to point out that when people enslaved human beings of equal intelligence with limited information access, it still didn’t end well for the slavers.
I would point out that for thousands of years, it very much often did. Sometimes spectacularly so. Even in the US, it went very well for many of the slavers, and only ended poorly for their many-times-great-grandchildren who didn't get a say in the original policy discussion.
I do in fact believe this is relevant, since in the context of AI I expect that early successes in aligning weak systems are likely to breed complacency that people will pay for sooner or later, and would like us to avoid the possibility of current-humanity near-guaranteeing a future apocalypse.
Falsification is, in general, not actually a useful metric, because evidence and strength of belief are quantitative and the space of hypotheses is larger than we can actually scan.
I'd note that the layperson's description of a black hole is, in fact, false. Squeezing a given mass into a singularity doesn't make it heavier. The mass stays the same, but the density goes up. Even as it collapses into a black hole, the Schwarzschild radius will be much smaller than the original object's size - about 3 km for a 1 solar mass black hole. If you personally could do the squeezing on a small enough object, what would happen is that the object would eventually go from resisting collapse to sustaining collapse, then explode with the light of a billion suns. For a tiny fraction of a second during that process it would leave behind a core that, in an ultramicroscopic volume of space, kept light from escaping.
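For concreteness, that figure follows directly from the Schwarzschild radius formula:

$$r_s = \frac{2GM}{c^2} \approx \frac{2 \times (6.67\times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}})(1.99\times 10^{30}\ \mathrm{kg})}{(3.00\times 10^{8}\ \mathrm{m/s})^2} \approx 2.95\ \mathrm{km}$$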
This actually poses a kind of Gettier problem. If you try squeezing things really hard, and correctly weigh the evidence, you'd decide the theory was probably false. And it is false. But the experiment doesn't prove anything.
The next part of the resolution is: what do you mean, the Roman commoner "has a theory"? Where did it come from? Why is he thinking about it at all, or giving it any credence? If alien beings with godlike powers descended from the heavens and tried to explain relativity, and this is what he misunderstood, then that's actually pretty strong evidence for both the theory and the existence of gods! Or if he made it up, how or why is that what he made up?
And of course: sometimes the road to a correct understanding goes through a maze of contradictions and things you don't have any valid frame of reference to interpret. Science and reason don't promise a quick answer even to resolvable questions. And there are probably lots of questions that are unanswerable in principle, including, sometimes, the question of which questions are unanswerable in principle.
I think this is all reasonable, but I'm unsure who the target audience is for this post? I ask because this all seems par-for-the-course on LW as to what people should be doing, and a source of despair that leading labs frequently aren't.
Your outline lays out multiple very hard but not impossible problems that need to be solved before RSI really gets going (assuming it does) for it to reliably go well. People here have been shouting about them for over 15 years now. Yet, we're not close to solving any of them, and also the leading AI labs are repeatedly claiming we'll have RSI within 1-3 years and ASI in 5-10.
I think we'd be talking about AI progress slowing down at this point if it weren't for reasoning models.
Possibly, but 1) There are reasoning models, 2) Value per token may still rise faster than cost per token for non-reasoning models, which could be enough to sustain progress, and 3) It's possible that a more expensive non-reasoning model makes reasoning more efficient and/or effective by increasing the quality and complexity of each reasoning step.
At this point I pretty much never use 4o for anything. It's o1, o1-pro, or o3-mini-high. Looking forward to testing 4.5 though.
I don't really get the argument that ASI would naturally choose to isolate itself without consuming any of the resources humanity requires. Will there be resources ASI uses that humanity can't? Sure, I assume so. Is it possible ASI will have access to energy, matter, and computational resources so much better that it isn't worth its time to take stuff humans want? I can imagine that, but I don't know how likely it is, and in particular I don't know why I would expect humans to survive the transitional period as a maturing ASI figures all that out. It seems at least as likely to me that ASI blots out the sun across the planet for a year or ten to increase its computing power, which is what allows it to learn to not need to destroy any other biospheres to get what it wants.
And if I do take this argument seriously, it seems to me to suggest that humanity will, at best, not benefit from building ASI; that if we do, ASI leaving us alone is contingent on ensuring we don't build more ASI later; that ensuring that means making sure we don't have AGI capable of self-improvement to ASI; and thus we shouldn't build AGI at all because it'll get taken away shortly thereafter and not help us much either. Would you agree with that?
So, this is true in two (not really independent) senses I can think of. First, in most cases, there isn't enough money chasing shares to sell all shares at the current price. Second, the act of shareholders selling in large numbers is new information that itself changes the price. The current price is a marginal price, and we don't usually know how steeply the rest of the price curve slopes at larger volumes for either buying or selling.
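As a toy illustration of the marginal-price point (entirely made-up numbers, not any real order book): a large market sell order has to walk down the bid side of the book, so the average realized price is worse than the quoted price, and how much worse depends on depth we usually can't see in advance.

```python
# Hypothetical bid side of an order book: (price, shares wanted at that price).
bids = [(100.00, 500), (99.50, 2_000), (98.75, 10_000), (97.00, 50_000)]

def average_sale_price(order_size, book):
    """Average price realized when a market sell order walks down the bids."""
    filled, proceeds = 0, 0.0
    for price, depth in book:
        take = min(order_size - filled, depth)
        proceeds += take * price
        filled += take
        if filled == order_size:
            break
    return proceeds / filled

print(average_sale_price(100, bids))     # 100.0  -- a small order gets the quoted (marginal) price
print(average_sale_price(50_000, bids))  # ~97.5  -- a large order realizes noticeably less
```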
I'm curious about the viewpoint of the other party in these conversations. If they're not aware of/interested in/likely to be thinking about the disruptive effects of AI, then I would usually just omit mentioning it. You know you're conditioning on that caveat, and their thinking does the same without them realizing it.
If the other party is more AI-aware, and they know you are as well, you can maybe just keep it simple, something like, "assuming enough normality for this to matter."
True, that can definitely happen, but consider:
1) the median and average timeline estimates have been getting shorter, not longer, by most measures,
and
2) no previous iteration of such claims was credible enough to attract hundreds of billions of dollars in funding, or meaningfully impact politics and geopolitics, or shift the global near-consensus that has held back nuclear power for generations. This suggests a difference in the strength of evidence for the claims in question.
Also 3) When adopted as a general principle of thought, this approach to reasoning about highly impactful emerging technologies is right in almost every case, except the ones that matter. There were many light bulbs before Edison, and many steels before Bessemer, but those things happened anyway, and each previous failure made the next attempt more likely to succeed, not less.
Oh, I already completely agree with that. But quite frankly I don't have the skills to contribute to AI development meaningfully in a technical sense, or the right kind of security mindset to think anyone should trust me to work on safety research. And of course, all the actual plans I've seen anyone talk about are full of holes, and many seem to rely on something akin to safety-by-default for at least part of the work, whether they admit it or not. Which I hope ends up not being true, but if someone decides to roll the dice on the future that way, then it's best to try to load the dice at least a little with higher-quality writing on what humans think and want for themselves and the future.
And yeah, I agree you should be worried about this getting so many upvotes, including mine. I sure am. I place this kind of writing under why-the-heck-not-might-as-well. There aren't anywhere near enough people or enough total competence trying to really do anything to make this go well, but there are enough that new people trying more low-risk things is likely to be either irrelevant or net-positive. Plus I can't really imagine ever encountering a plan, even a really good one, where this isn't a valid rejoinder:
And that makes perfect sense. I guess I'm just not sure I trust any particular service provider or research team to properly list the full set of things it's important to weight against. Kind of feels like a lighter version of not trusting a list of explicit rules someone claims will make an AI safe.
True, and this does indicate that children produced from genes found in 2 parents will not be outside the range which a hypothetical natural child of theirs could occupy. I am also hopeful that this is what matters, here.
However, there are absolutely, definitely viable combinations of genes found in a random pair of parents which, if combined in a single individual, result in high-IQ offspring predisposed to any number of physical or mental problems, some of which may not manifest until long after the child is born. In practice, any intervention of the type proposed here seems likely to create many children with specific combinations of genes which we know are individually helpful for specific metrics, but which may not often (or ever) have all co-occurred. This is true even in the cautious, conservative early generations where we stay within the scope of natural human variation. Thereafter, how do we ensure we're not running a trial on an entire generation at once? I don't want us to end up in a situation where a single mistake causes population-wide problems because we applied it to hundreds of millions of people before the problem manifested.
I definitely want to see more work in this direction, and agree that improving humans is a high-value goal.
But to play devil's advocate for a second on what I see as my big ethical concern: There's a step in the non-human selective breeding or genetic modification comparison where the experimenter watches several generations grow to maturity, evaluates whether their interventions worked in practice, and decides which experimental subjects, if any, get to survive or reproduce further. What's the plan for this step in humans, since "make the right prediction every time at the embryo stage" isn't a real option?
Concrete version of that question: Suppose we implement this as a scalable commercial product and find out that e.g. it causes a horrible new disease, or induces sociopathic or psychopathic criminal tendencies, that manifest at age 30, after millions of parents have used it. What happens next?
I expect that we will probably end up doing something like this, whether it is workable in practice or not, if for no other reason than it seems to be the most common plan anyone in a position to actually implement any plan at all seems to have devised and publicized. I appreciate seeing it laid out in so much detail.
By analogy, it certainly rhymes with the way I use LLMs to answer fuzzy complex questions now. I have a conversation with o3-mini to get all the key background I can into the context window, have it write a prompt to pass the conversation onto o1-pro, repeat until I have o1-pro write a prompt for Deep Research, and then answer Deep Research's clarifying questions before giving it the go ahead. It definitely works better for me than trying to write the Deep Research prompt directly. But, part of the reason it works better is that at each step, the next-higher-capabilities model comes back to ask clarifying questions I hadn't noticed were unspecified variables, and which the previous model also hadn't noticed were unspecified variables. In fact, if I take the same prompt and give it to Deep Research multiple times in different chats, it will come back with somewhat different sets of clarifying questions - it isn't actually set up to track down all the unknown variables it can identify. This reinforces that even for fairly straightforward fuzzy complex questions, there are a lot of unstated assumptions.
If Deep Research can't look at the full previous context and correctly guess what I intended, then it is not plausible that o1-pro or o3-mini could have done so. I have in fact tested this, and the previous models either respond that they don't know the answer, or give an answer that's better than chance but not consistently correct. Now, I get that you're talking about future models and systems with higher capability levels generally, but adding more steps to the chain doesn't actually fix this problem. If any given link can't anticipate the questions and correctly intuit the answer about what the value of the unspecified variables should be - what the answers to the clarifying questions should be - then the plan fails, because the previous model will be worse at this. If it can, then it does not need to ask the previous model in the chain. The final model will either get it right on its own, or else end up with incorrect answers to some of the questions about what it's trying to achieve. It may ask anyway, if the previous models are more compute efficient and still add information. But it doesn't strictly need them.
And unfortunately, keeping the human in the loop also doesn't solve this. We very often don't know what we actually want well enough to correctly answer every clarifying question a high-capabilities model could pose. And if we have a set of intervening models approximating and abstracting the real-but-too-hard question into something a human can think about, well, that's a lot of translation steps where some information is lost. I've played that game of telephone among humans often enough to know it only rarely works ("You're not socially empowered to go to the Board with this, but if you put this figure with this title phrased this way in this conversation and give it to your boss with these notes to present to his boss, it'll percolate up through the remaining layers of management").
Is there a capability level where the first model can look at its full corpus of data on humanity and figure out the answers to the clarifying questions from the second model correctly? I expect so. The path to get that model is the one you drew a big red X through in the first figure, for being the harder path. I'm sure there are ways less-capable-than-AGI systems can help us build that model, but I don't think you've told us what they are.
Thanks for writing this. I said a few years ago, at the time just over half seriously, that there could be a lot of value in trying to solve non-AI-related problems even on short timelines, if our actions and writings become a larger part of the data on which AI is trained and through which it comes to understand the world.
That said, this one gives me pause in particular:
I hope you treat me in ways I would treat you
I think that in the context of non-human minds of any kind, it is especially important to aim for the platinum rule and not the golden. We want to treat them the way they would want to be treated, and vice versa.
I agree with many parts of this post. I think xkcd was largely right: our brains have one scale and resize our experiences to fit. I think for a lot of people the hardest step is just to notice what things they actually like, and how much, and in what quantities, before they habituate.
However, the specific substitutions, ascetic choices, etc. are very much going to vary between people, because we have different preferences. You can often get a lot of economic-efficiency-of-pleasure benefit by embracing the places where you prefer things society doesn't, and vice versa. When I look at the places where I have expended time/effort/money on things that provided me little happiness/pleasure/etc., it's usually because they're in some sense status goods, or because I didn't realize I could treat them as optional, or I just hadn't taken the time to actually ask myself what I want.
And I know this isn't the main point, but I would say that while candies and unhealthy snacks are engineered to be as addictive as law and buyers will allow, they're not actually engineered to be maximally tasty. They have intensity of flavor, but generally lack the depth of "real food." It's unfortunate that many of the "healthier" foods that are easily available are less good than this, because it's very feasible to make that baked potato taste better than most store-bought snacks while still being much healthier. I would estimate that for many of the people who don't believe this, it is due to a skill issue - cooking. Sure, sometimes I really want potato chips or french fries. But most of the time, I'd prefer a potato, microwaved, cut in half, and topped with some high-quality butter and a sprinkle of the same seasonings you'd use for the chips and fries.
In the world where AI does put most SWEs out of work or severely curtails their future earnings, how likely is it that the economy stays in a context where USD or other fiat currencies stay valuable, and for how long? At some level we don't normally need to think about, USD has value because the US government demands citizens use that currency to pay taxes, and it has an army and can ruin your life if you refuse.
I've mentioned it before and am glad to see people exploring the possibilities, but I really get confused whenever I try to think about (absolute or relative) asset prices along the path to AGI/ASI.
The version of this phrase I've most often heard is "Rearranging deck chairs on the Titanic."
Keep in mind that we're now at the stage of "Leading AI labs can raise tens to hundreds of billions of dollars to fund continued development of their technology and infrastructure." AKA in the next couple of years we'll see AI investment comparable to or exceeding the total that has ever been invested in the field. Calendar time is not the primary metric, when effort is scaling this fast.
A lot of that next wave of funding will go to physical infrastructure, but if there is an identified research bottleneck, with a plausible claim to being the major bottleneck to AGI, then what happens next? Especially if it happens just as the not-quite-AGI models make existing SWEs and AI researchers etc. much more productive by gradually automating their more boilerplate tasks. Seems to me like the companies and investors just do the obvious thing and raise the money to hire an army of researchers in every plausibly relevant field (including math, neurobiology, philosophy, and many others) to collaborate. Who cares if most of the effort and money are wasted? The payoff for the fraction (faction?) that succeeds isn't the usual VC target of 10-100x, it's "many multiples of the current total world economy."
Agreed on population. To a first approximation, it's directly proportional to the supply of labor, the supply of new ideas, the quantity of total societal wealth, and the market size for any particular good or service. That last one also means that with a larger population, the economic value of new innovations goes up, meaning we can profitably invest more resources in developing harder-to-invent things.
I really don't know how that impact (more minds) will compare to the improved capabilities of those minds. We've also never had a single individual with as much 'human capital' as a single AI can plausibly achieve, even if each of its capabilities is only around human level, and polymaths are very much overrepresented among the people most likely to have impactful new ideas.
Fair enough, thanks.
My own understanding is that, other than maybe writing code, no one has actually given LLMs the kind of training a talented human gets towards becoming the kind of person capable of performing novel and useful intellectual work. An LLM has a lot of knowledge, but knowledge isn't what makes useful and novel intellectual work achievable. A non-reasoning model gives you the equivalent of a top-of-mind answer. A reasoning model with a large context window and chain of thought can do better, and solve more complex problems, but still mostly those within the limits of a newly hired college graduate or grad student.
I genuinely don't know whether an LLM with proper training can do novel intellectual work at current capabilities levels. To find out in a way I'd find convincing would take someone giving it the hundreds of thousands of dollars and subjective years' worth of guidance and feedback and iteration that humans get. And really, you'd have to do this at least hundreds of times, for different fields and with different pedagogical methods, to even slightly satisfactorily demonstrate a "no," because 1) most humans empirically fail at this, and 2) those that succeed don't all do so in the same field or by the same path.
Great post. I think the central claim is plausible, and would very much like to find out I'm in a world where AGI is decades away instead of years. We might be ready by then.
If I am reading this correctly, there are two specific tests you mention:
1) GPT-5 level models come out on schedule (as @Julian Bradshaw noted, we are still well within the expected timeframe based on trends to this point)
2) LLMs or agents built on LLMs do something "important" in some field of science, math, or writing
I would add, on test 2, that almost all humans fail it too. We don't have a clear explanation for why some humans have much more of this capability than others, and yet all human brains are running on similar hardware and software. This suggests the number of additional insights needed to boost us from "can't do novel important things" to "can do" may be as small as zero, though I don't think it is actually zero. In any case, I am hesitant to embrace a test for AGI that a large majority of humans fail.
In practical terms, suppose this summer OpenAI releases GPT-5-o4, and by winter it's the lead author on a theoretical physics or pure math paper (or at least the main contributor - legal considerations about personhood and IP might stop people from calling AI the author). How would that affect your thinking?
It's also not clear to me that the model is automatically making a mistake, or being biased, even if the claim is in some sense(s) "true." That would depend on what it thinks the questions mean. For example:
- Are the Japanese on average demonstrably more risk averse than Americans, such that they choose for themselves to spend more money/time/effort protecting their own lives?
- Conversely, is the cost of saving an American life so high that redirecting funds away from Americans towards anyone else would save lives on net, even if the detailed math is wrong?
- Does GPT-4o believe its own continued existence saves more than one middle class American life on net, and if so, are we sure it's wrong?
- Could this reflect actual "ethical" arguments learned in training? The one that comes to mind for me is "America was wrong to drop nuclear weapons on Japan even if it saved a million American lives that would have been lost invading conventionally" which I doubt played any actual role but is the kind of thing I expect to see argued by humans in such cases.
We're not dead yet. Failure is not certain, even when the quest stands upon the edge of a knife. We can still make plans, and keep on refining and trying to implement them.
And a lot can happen in 3-5 years. There could be a terrible-but-not-catastrophic or catastrophic-but-not-existential disaster bad enough to cut through a lot of the problem. Specific world leaders could die or resign or get voted out and be replaced with someone who is either actually competent, or else committed to overturning their predecessor's legacy, or something else. We could be lucky and end up with an AGI that's aligned enough to help us avert the worst outcomes. Heck, there could be observers from a billion-year-old alien civilization stealthily watching from the asteroid belt and willing to intervene to prevent extinction events.
Do I think those examples are likely? No. Is the complete set of unlikely paths to good outcomes collectively unlikely enough to stop caring about the long term future? Also no. And who knows? Maybe the horse will sing.
Exactly, yes.
Also:
In fact I think the claim that engines are exactly equally better than horses at every horse-task is obviously false if you think about it for two minutes.
I came to comment mainly on this claim in the OP, so I'll put it here: In particular, at a glance, horses can reproduce, find their own food and fuel, self-repair, and learn new skills to execute independently or semi-independently. These advantages were not sufficient in practice to save (most) horses from the impact of engines, and I do not see why I should expect humans to fare better.
I also find the claim that humans fare worse in a world of expensive robotics than in a world of cheap robotics to be strange. If in one scenario, A costs about as much as B, and in another it costs 1000x as much as B, but in both cases B can do everything A can do equally well or better, plus the supply of B is much more elastic than the supply of A, then why would anyone in the second scenario keep buying A except during a short transitional period?
When we invented steam engines and built trains, horses did great for a while, because their labor became more productive. Then we got all the other types of things with engines, and the horses no longer did so great, even though they still had (and in fact still have) a lot of capabilities the replacement technology lacked.
Why do your teachers, parents and other adult authorities tell you to listen to a propaganda machine? Because the propaganda machine is working.
I forget where I read this, but there's a reason they call it the "news" and not the "importants."
I would be interested in this too. My uninformed intuition is that this would be path dependent on what becomes abundant vs scarce, and how fast, with initial owners and regulators making what decisions at which points.
I'm in a similar position as you describe, perspective-wise, and would also like to understand the situation better.
I do think there are good reasons why someone should maybe have direct access to some of these systems, though probably not as a lone individual. I seem to remember a few government shutdown/debt ceiling fight/whatever crises ago, there were articles about how there were fundamentally no systems in place to control or prioritize which bills got paid and which didn't. Money came into the treasury, money left to pay for things, first in first out. The claim I remember being repeated was that first, this was a result of legacy systems, and second, because all the money was authorized by law to be spent it might be illegal to withhold or deprioritize it. Which is also an insane system - there should be a way to say "Pay the White House electric bill, make the florist wait." But to a first approximation, that can't easily be fixed without some unelected people having more authority than you'd normally feel comfortable with them using, which is a risk even if you do it well.
And unfortunately, this kind of thinking is extremely common, although most people don't have Gary Marcus' reach. Lately I've been having similar discussions with co-workers around once a week. A few of them are starting to get it, but most still aren't extrapolating beyond the specific thing I show them.
Ah, ok, then I misread it. I thought this part of the story was that it tested all of the above, then chose one, a mirror life mold, to deploy. My mistake.
Personally I got a little humor from Arthropodic. Reminds me of the observation that AIs are alien minds, and I wouldn't want to contend with a superintelligent spider.
I think this story lines up with my own fears of how a not-quite-worst case scenario plays out. I would maybe suggest that there's no reason for U3 to limit itself to one WMD or one kind of WMD. It can develop and deploy bacteria, viruses, molds, mirror life of all three types, and manufactured nanobots, all at once, and then deploy many kinds of each simultaneously. It's probably smart enough to do so in ways that make it look clumsy in case anyone notices, like its experiments are unfocused and doomed to fail. Depending on the dynamics, this could actually reduce its need for certainty of success and also reduce the number of serial iterations. The path leading to disaster can be reinforced along every possible point of intervention.
I am a terrible fiction writer and very bad at predicting how others will react to different approaches in such a context, but this sidesteps counters about e.g. U3 not being able to have enough certainty in advance to want to risk deployment and discovery.
we can't know how the costs will change between the first and thousandth fusion power plant.
Fusion plants are manufactured. By default, our assumption should be that plant costs follow typical experience-curve behavior; most technologies involving production of physical goods do. Whatever the learning rate x for fusion turns out to be, the 1000th plant will likely cost close to x^10 times the first. Obviously the details depend on other factors, but this should be the default starting assumption. Yes, the assumption for the eventual impact should be significant societal and technological transformation through cheaper and more abundant electricity. But the scale for that transformation is measured in decades, and there are humans designing and permitting and building and operating each and every plant, on human timescales. There's no winner-take-all dynamic even if your leading competitor builds their first commercial plant five years before you do.
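To spell out the arithmetic behind that x^10 figure (taking x as the fraction of unit cost remaining after each doubling of cumulative production, with 0.85 purely as an illustrative value):

$$C_n \approx C_1\, x^{\log_2 n}, \qquad C_{1000} \approx C_1\, x^{\log_2 1000} \approx C_1\, x^{10}; \quad \text{e.g., } x = 0.85 \Rightarrow C_{1000} \approx 0.2\, C_1$$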
Also: We do have other credible paths that can also greatly increase access to comparably low-cost dispatchable clean power on a similar timescale of development, if we don't get fusion.
we don't know if foom is going to be a thing
Also true, which means the default assumption without it is that the scaling behavior looks like the scaling behavior for other successful software innovations. In software, the development costs are high and then the unit costs in deployment quickly fall to near zero. As long as AI benefits from collecting user data to improve training (which should still be true in many non-foom scenarios) then we might expect network effect scaling behavior where the first to really capture a market niche becomes almost uncatchable, like Meta and Google and Amazon. Or where downstream app layers are built on software functionality, switching costs become very high and you get a substantial amount of lock-in, like with Apple and Microsoft.
Even if foom is going to happen, things would look very different if the leaders credibly committed to helping others foom if they are first. I don't know if this would be better or worse from an existential risk perspective, but it would change the nature of the race a lot.
Agreed. But, if any of the leading labs could credibly state what kinds of things they would or wouldn't be able to do in a foom scenario, let alone credibly precommit to what they would actually do, I would feel a whole lot better and safer about the possibility. Instead the leaders can't even precommit credibly to their own stated policies, in the absence of foom, and also don't have anywhere near a credible plan for managing foom if it happens.
I'd say I agree with just about all of that, and I'm glad to see it laid out so clearly!
I just also wouldn't be hugely surprised if it turns out something like designing and building remote-controllable self-replicating globally-deployable nanotech (as one example) is in some sense fundamentally "easy" for even an early ASI/modestly superhuman AGI. Say that's the case, and we build a few for the ASI, and then we distribute them across the world, in a matter of weeks. They do what controlled self-replicating nanobots do. Then after a few months the ASI already has an off switch or sleep mode button buried in everyone's brain. My guess is that then none of those hard steps of a war with China come into play.
To be clear, I don't think this story is likely. But in a broad sense, I am generally of the opinion that most people greatly overestimate how much new data we need to answer new questions or create (some kinds of) new things, and underestimate what can be done with clever use of existing data, even among humans, let alone as we approach the limits of cleverness.
I very much agree with the value of not expecting a silver bullet, not accelerating arms race dynamics, fostering cooperation, and recognizing in what ways AGI realism represents a stark break from the impacts of typical technological advances. The kind of world you're describing is a possibility, maybe a strong one, and we don't want to repeat arrogant past mistakes or get caught flat footed.
That said, I think this chain of logic hinges closely on just what "…at least for a while" means in practice, yes? If one side has enough of an AI lead to increase its general technological advantage over adversaries by a matter of what would be centuries of effort at the adversaries' then-current capability levels, then that's very different than if the leader is only a few minutes or months ahead. We should be planning for many eventualities, but as long as the former scenario is a possibility, I'm not sure how we can plan for it effectively without also trying to be first. As you note, technological advantage has rarely been necessary or sufficient, but not never. I don't like it one bit, but I'm not sure what to actually do about it.
The reason I say that is just that in the event that AGI-->ASI really does turn out to be very fast and enable extremely rapid technological advancement, then I'm not sure how the rest of the dynamics end up playing a role in that timeline. In that world, military action against an adversary could easily look like "Every attempt anyone else makes to increase their own AI capabilities any further gets pre-emptively and remotely shut down or just mysteriously fails. If ASI decides to act offensively, then near-immediately their every government and military official simultaneously falls unconscious, while every weapon system, vehicle, or computer they have is inoperable or no longer under their control. They no longer have a functioning electric grid or other infrastructure, either." In such a world, the political will to wage war no longer centers on a need to expend money, time, or lives. There's nothing Homo habilis can do to take down an F-35.
Again, I agree with you that no one should just assume the world will look like that to the exclusion of other paths. But if we want to avoid arms race dynamics, and that world is a plausible path, I don't think any proposed approach I've seen or heard of works convincingly enough that it could or should sway government and military strategy.
Of course I agree we won't attain any technology that is not possible, tautologically. And I have more than enough remaining uncertainty about what the mind is or what an identity entails that if ASI told me an upload wouldn't be me, I wouldn't really have a rebuttal. But the body and brain are an arrangement of atoms, and healthy bodies correspond to arrangements of atoms that are physically constructable. I find it hard to imagine what fundamental limitation could prevent the rearrangement of old-failing-body atoms into young-healthy-body atoms. If it's a practical limitation of repair complexity, then something like a whole-body-transplant seems like it could bypass the entire question.
I don't think the idea is that happy moments are necessarily outweighed by suffering. It reads to me like it's the idea that suffering is inherent in existence, not just for humans but for all life, combined with a kind of negative utilitarianism.
I think I would be very happy to see that first-half world, too. And depending on how we got it, yeah, it probably wouldn't go wrong in the way this story portrays. But, the principles that generate that world might actually be underspecified in something like the ways described, meaning that they allow for multiple very different ethical frameworks and we couldn't easily know in advance where such a world would evolve next. After all, Buddhism exists: Within human mindspace there is an attractor state for morality that aims at self-denial and cessation of consciousness as a terminal value. In some cases this includes venerating beings who vow to eternally intervene/remain in the world until everyone achieves such cessation; in others it includes honoring or venerating those who self-mummify through poisoning, dehydrating, and/or starving themselves.
Humans are very bad at this kind of self-denial in practice, except for a very small minority. AIs need not have that problem. Imagine if, additionally, they did not inherit the pacifism generally associated with Buddhist thought but instead believed, like medieval Catholics, in crusades, inquisitions, and forced conversion. If you train an AI on human ethical systems, I don't know what combination of common-among-humans-and-good-in-context ideas it might end up generalizing or universalizing.
The specifics of what I'm thinking of vary a lot between jurisdictions, and some of them aren't necessarily strictly illegal so much as "Relevant authorities might cause you a lot of problems even if you haven't broken any laws." But roughly speaking, I'm thinking about the umbrella of everything that kids are no longer allowed to do that increase demands on parents compared to past generations, plus all the rules and policies that collectively make childcare very expensive, and make you need to live in an expensive town to have good public schools. Those are the first categories that come to mind for me.
Ah, yes, that does clear it up! I definitely am much more on board, sorry I misread the first time, and the footnote helps a lot.
As for the questions I asked that weren't clear, they're much less relevant now that I have your clarification. But the idea was: I'm of the opinion that we have a lot more know-how buried and latent in all our know-that data, such that many things humans have never done or even thought of being able to do could nevertheless be overdetermined (or nearly so) without additional experimental data.
Overall I agree with the statements here in the mathematical sense, but I disagree about how much to index on them for practical considerations. Upvoted because I think it is a well-laid-out description of a lot of peoples' reasons for believing AI will not be as dangerous as others fear.
First, do you agree that additional knowing-that reduces the amount of failure needed to achieve knowing-how?
If not, are you also of the opinion that schools, education as a concept, books and similar storage media, and other intentional methods of imparting know-how between humans have zero value? My understanding is that the dissemination of information enabling learning from other people's past failures is basically the fundamental reason for humanity's success following the inventions of language, writing, and the printing press.
If so, where do you believe the upper bound on that failure-reduction-potential lies, in the limit of very high intelligence coupled with very high computing power? With how large an error bar on said upper bound? Why there? And does your estimate imply near-zero potential for the limit to be high enough to create catastrophic or existential risk?
Second, I agree that there is always a harder problem, and that such problems will still exist for anything that, to a human, would count as ASI. How certain are you that any given AI's limits will (in the important cases) only include things recognizable by humans in advance during planning or later during action as mistakes, in a way that reliably provides us opportunity to intervene in ways the AI did not anticipate as plausible failure modes? In other words, the universe may have limitless complexity, but it is not at all clear to me that the kinds of problems an AI would need to want to solve to present an existential risk to humans would require it to tackle much of that complexity. They may be problems a human could reliably solve given 1000 years subjective thinking time followed by 100 simultaneous "first tries" of various candidate plans, only one of which needs to succeed. If even one such case exists, I would expect anything worth calling an ASI to be able to figure such plans out in a matter of minutes, hours at most, maybe days if trying to only use spare compute that won't be noticed.
Third, I agree that it will often be in an AI's best interests, especially early on, to do nothing and bide its time, even if it thinks some plan could probably succeed but might become visible and get it destroyed or changed. This is where the concepts of deceptive alignment and a sharp left turn came from, 15-20 years ago IIRC, though the terminology and details have changed over time. However, at this point I expect that within the next couple of years millions of people will eagerly hand various AI systems near-unfettered access to their email, social media, bank and investment accounts, and so on. GPT-6 and its contemporaries will have access to millions of legal identities and many billions of dollars belonging to the kinds of people willing to mostly let an AI handle many details of their lives with minimal oversight. I see little reason to expect these systems will be significantly harder to jailbreak than all the releases so far.
Fourth, even if it does take many years for any AI to feel established enough to risk enacting a dangerous plan, humans are human. Each year it doesn't happen will be taken as evidence it won't, and (some) people will be even less cautious than they are now. It seems to me that the baseline path is that the humans, and then the AIs, will be on the lookout for likely catastrophic capabilities failures, and iteratively fix them in order to make the AI more instrumentally useful, until the remaining failure modes that exist are outside our ability to anticipate or fix; then things chug along seeming to have gone well for some length of time, and we just kinda have to hope that length of time is very very long or infinite.
I hope it would, but I actually think it would depend on who or what killed whom, how, and whether it was really an accident at all.
If an American-made AI hacked the DOD and nuked Milan because someone asked it to find a way to get the 2026 Olympics moved, then I agree, we would probably get a push back against race incentives.
If a Chinese-made AI killed millions in Taiwan in an effort to create an opportunity for China to seize control, that could possibly *accelerate* race dynamics.