Comments sorted by top scores.
comment by dr_s · 2025-02-01T10:43:20.885Z · LW(p) · GW(p)
I mean, yes, humans make mistakes too. Do our highest-level mistakes, like "Andrew Wiles' first proof of Fermat's Last Theorem was wrong," much affect our ability to be vastly superior to chimpanzees in any conflict with them?
Replies from: juggins
↑ comment by juggins · 2025-02-01T12:07:25.733Z · LW(p) · GW(p)
No, but between humans and chimpanzees it's the former who might accidentally destroy the world.
I think I made a mistake in the presentation of my argument. I didn't think it would come across as me saying superintelligent AI won't be able to outcompete us, because I think it's obviously the case that it would. More that it won't go from below-human intelligence to omnipotent without screwing up a lot of stuff along the way, some of which might have catastrophic side-effects.
comment by AnthonyC · 2025-01-31T17:21:47.200Z · LW(p) · GW(p)
Overall I agree with the statements here in the mathematical sense, but I disagree about how much to index on them for practical considerations. Upvoted because I think it is a well-laid-out description of a lot of people's reasons for believing AI will not be as dangerous as others fear.
First, do you agree that additional knowing-that reduces the amount of failure needed to achieve knowing-how?
If not, are you also of the opinion that schools, education as a concept, books and similar storage media, and other intentional methods of imparting know-how between humans have zero value? My understanding is that dissemination of information enabling learning from other people's past failures is basically the fundamental reason for humanity's success following the inventions of language, writing, and the printing press.
If so, where do you believe the upper bound on that failure-reduction-potential lies, in the limit of very high intelligence coupled with very high computing power? With how large an error bar on said upper bound? Why there? And does your estimate imply near-zero potential for the limit to be high enough to create catastrophic or existential risk?
Second, I agree that there is always a harder problem, and that such problems will still exist for anything that, to a human, would count as ASI. How certain are you that any given AI's limits will (in the important cases) only include things humans can recognize as mistakes, either in advance during planning or later during action, in a way that reliably provides us opportunity to intervene in ways the AI did not anticipate as plausible failure modes? In other words, the universe may have limitless complexity, but it is not at all clear to me that the kinds of problems an AI would need to solve to present an existential risk to humans would require it to tackle much of that complexity. They may be problems a human could reliably solve given 1000 years of subjective thinking time followed by 100 simultaneous "first tries" of various candidate plans, only one of which needs to succeed. If even one such case exists, I would expect anything worth calling an ASI to be able to figure such plans out in a matter of minutes, hours at most, maybe days if trying to only use spare compute that won't be noticed.
Third, I agree that it will often be in an AI's best interests, especially early on, to do nothing and bide its time, even if it thinks some plan could probably succeed but might become visible and get it destroyed or changed. This is where the concepts of deceptive alignment and a sharp left turn came from, 15-20 years ago IIRC, though the terminology and details have changed over time. However, at this point I expect that within the next couple of years millions of people will eagerly hand various AI systems near-unfettered access to their email, social media, bank and investment accounts, and so on. GPT-6 and its contemporaries will have access to millions of legal identities and many billions of dollars belonging to the kinds of people willing to mostly let an AI handle many details of their lives with minimal oversight. I see little reason to expect these systems will be significantly harder to jailbreak than all the releases so far.
Fourth, even if it does take many years for any AI to feel established enough to risk enacting a dangerous plan, humans are human. Each year it doesn't happen will be taken as evidence it won't, and (some) people will be even less cautious than they are now. It seems to me that the baseline path is that the humans, and then the AIs, will be on the lookout for likely catastrophic capabilities failures, and iteratively fix them in order to make the AI more instrumentally useful, until the remaining failure modes that exist are outside our ability to anticipate or fix; then things chug along seeming to have gone well for some length of time, and we just kinda have to hope that length of time is very very long or infinite.
Replies from: juggins
↑ comment by juggins · 2025-01-31T19:27:14.264Z · LW(p) · GW(p)
Thanks for the comment! Taking your points in turn:
- I am curious that you read this as me saying superintelligent AI will be less dangerous, because to me it means it will be more dangerous. It will be able to dominate you in the usual hyper-competent sense, but it may also accidentally screw up some super-advanced physics and kill you that way too. It sounds like I should have stressed this more. I guess there are people who think AI sucks and will continue to suck, and therefore why worry about existential risk, so maybe by stressing AI fallibility I'm riding their energy a bit too hard to have made myself clear. I'll add a footnote to clarify.
- I agree that knowing-that reduces the amount of failure needed for knowing-how. My point, though, is that the latter is the thing we actually care about when we talk about intelligence. Memorising information is inconsequential without some practical purpose to put it to. Even if you're just reading stuff to get your world model straight, it's because you want to be able to use that model to take more successful actions in the world.
- I'm not completely sure I follow your questions about the upper bound on failure-reduction potential. My best guess is that you're asking whether sufficient knowing-that can reduce the amount of failure required to acquire new skills to a very low level. I think theoretical knowledge is mostly generated by practical action -- trying stuff and writing down what happened -- either individually or on a societal scale. So if an ASI wants to do something radically new, there won't be any existing knowledge that can help it. For me, that means catastrophic or existential risk due to incompetence is a problem. I guess it reduces the risk a little from the AI intentionally killing you, as it could mess up its plans in such a way that you survive, but long-term this reduction will be tiny, as wiping out humans will not be in the ASI's stretch zone for very long.
- Re your second point, I do not believe we will be able to recognise the errors an ASI is making. If it wants to kill us, it will be able to. My fear is that it will do it by accident anyway.
- Re your third point, I agree that AI is going to proliferate widely, and this is a big part of why I'm saying the usual recursive self-improvement story is too clean. There won't be this gap between clearly-dumber-than-humans and effectively omnipotent in which the AI is doing nothing but quietly gaining capabilities -- labs will ship their products and people will use them, and, while the shipped AI will be super impressive and useful, it will also screw a lot of things up. What I was getting at in my conclusion about the AI doing nothing out of fear of failure was more that, if self-destructive actions we don't understand come into its capabilities, and it knows this, we might find it gets risk-averse and reluctant to do some of the things we ask it to.
- Agree completely with your fourth point.
↑ comment by AnthonyC · 2025-02-01T05:42:21.501Z · LW(p) · GW(p)
Ah, yes, that does clear it up! I definitely am much more on board, sorry I misread the first time, and the footnote helps a lot.
As for the questions I asked that weren't clear, they're much less relevant now that I have your clarification. But the idea was: I'm of the opinion that we have a lot more know-how buried and latent in all our knowing-that data, such that many things humans have never done or even thought of being able to do could nevertheless be overdetermined (or nearly so) without additional experimental data.
Replies from: juggins
comment by Knight Lee (Max Lee) · 2025-01-31T20:53:57.578Z · LW(p) · GW(p)
I agree that a superintelligence might make mistakes. In fact I still believe the first AGI may be a good engineer but bad strategist [LW · GW]. I completely agree a smart but unwise superintelligence is dangerous, and may build a greater superintelligence misaligned to even it.
However, I think mistakes will almost completely disappear above a certain level of extreme superintelligence.
A truly intelligent being doesn't just fit models to empirical data, but fits simulations to empirical data.
After it fits a simulation to the empirical data, it then fits a model to the simulation. This "model fitted to a simulation fitted to empirical data" will generalize far better than a model directly fitted to empirical data.
It can then use this model to run "cheap simulations" of higher level phenomena, e.g. it models atoms to run cheap simulations of molecules, it models molecules to run cheap simulations of cells, it models cells to run cheap simulations of humans, it models humans to run cheap simulations of the world.
Simulations fit empirical data much better than models fitted directly to empirical data, because a simulation may be made out of millions of "identical" objects. Each "identical" object has the same parameters. This means the independent parameters of the simulation may be a millionfold fewer than in a model with the same number of moving parts. This means you need far less empirical data to fit the simulation, assuming the real world has the same shape as the simulation and is also made up of many moving parts "with the same parameters."
I copied this from my own post about Scanless Whole Brain Emulation [LW · GW], but that's very off topic :)
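A minimal sketch of the parameter-counting point (my illustration, not from the post; the functional form, names, and numbers are made up): when many "identical" objects all constrain one small shared parameter vector, a handful of observations per object pins it down, whereas a generic model would need its own parameters for every object.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_objects = 1000                      # many "identical" objects in the simulation
true_theta = np.array([0.8, -0.3])    # single parameter vector shared by all objects

def simulate(theta, x):
    # Every object obeys the same (made-up) rule: y = theta[0]*x + theta[1]*x**2
    return theta[0] * x + theta[1] * x**2

# A few noisy observations per object suffice, because every object
# constrains the same two shared parameters.
x_obs = rng.uniform(0, 1, size=(n_objects, 5))
y_obs = simulate(true_theta, x_obs) + 0.01 * rng.standard_normal(x_obs.shape)

def loss(theta):
    return np.mean((simulate(theta, x_obs) - y_obs) ** 2)

fit = minimize(loss, x0=np.zeros(2))
print("recovered shared parameters:", fit.x)   # close to true_theta

# A model giving each object its own parameters would instead have
# 2 * n_objects = 2000 independent parameters to estimate from the same data.
```

The fit recovers the two shared parameters from 5000 data points; a model with one parameter pair per object would need far more data for the same accuracy, which is the "millionfold fewer independent parameters" intuition at toy scale.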
EDIT: one thing simulations cannot predict is even smarter superintelligences, since simulating them equals building them. As long as it is wise enough to understand this, it can find solutions which prevent smarter superintelligences from being built, and then do a risk analysis. The universe might last for a trillion years, so building a smarter superintelligence 1000 years sooner has negligible benefit. The more time it spends planning how to build it safely, the lower the risk that it gets it wrong.