Beliefs and state of mind into 2025
post by RussellThor · 2025-01-10T22:07:01.060Z · LW · GW · 7 comments
This post is to record the state of my thinking at the start of 2025. I plan to update these reflections in 6-12 months depending on how much changes in the field of AI.
1. Summary
It is best not to pause AI progress until at least one major AI lab achieves a system capable of providing approximately a 10x productivity boost for AI research, including performing almost all tasks of an AI researcher. Extending the time we remain in such a state is critical for ensuring positive outcomes.
If it were possible to stop AI progress sometime before that and focus just on mind uploading, that would be preferable; however, I don't think that is feasible in the current world. Alignment work before such a state suffers from diminishing returns on fixed human intelligence and from the lack of critical data on how things will actually play out.
2. When and how to pause?
The simple position is that superintelligent AI will be dangerous, therefore we should stop building it, or at least pause until we figure out more. However, I am genuinely unsure how long to pause, and when. I think the most important thing is having mildly superintelligent AI to help solve alignment, and staying at that stage as long as practical.
Just because something is dangerous, making it slower doesn't make it safer. For example, making childbirth take one week would obviously be far worse. The details determine the best course. The major reason against an immediate pause in AI is that it would likely apply to software, not hardware, and so increase the hardware overhang without giving a counteracting increase in safety to make it worthwhile.
2.1 Diminishing returns
2.1.1 Background on when progress gives diminishing returns
For a lot of tech, progress is linear to exponential. We are used to steady progress, and the expectation of steady progress, and make plans accordingly. Often progress comes from technology and processes building on themselves. A common example is Moore's law, where the existing tools built with current chips are essential for making the next, better chips. However, sometimes growth can be slower than linear, with diminishing returns, even with the best planning and resources.
The clearest example of this is probably pure mathematics. Unlike in technology, where a new machine benefits everyone in the field, a genius proving a hard theorem does not automatically help school children learn their times tables or beginner algebra at all. Instead it makes the gap between a novice and the leading edge of humanity's knowledge greater than it was before. This means it takes longer for a novice to reach the boundary than before, and it excludes ever more people from contributing at all, as they simply cannot reach that level even with unlimited time. In the limit, with fixed human intelligence and population, rather than steady progress we get diminishing returns and then almost completely stalled progress: so much accumulated knowledge is required to reach the boundary of human knowledge that practically no one can even reach it, let alone extend it. Furthermore, even though some knowledge will be stored in text, it is possible that no one alive would actually understand it if the field went out of fashion. I believe we are already seeing something like this in math, and to me it is clearly happening in fundamental physics.
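To make this intuition concrete, here is a minimal toy model (my own illustrative sketch, not a claim about actual numbers). Let $F(t)$ be the accumulated frontier of knowledge, $r$ a fixed human learning rate, $T$ a fixed working lifespan, and $N$ the number of potential contributors:

$$\frac{dF}{dt} \;\propto\; N \cdot \max\!\left(0,\; 1 - \frac{F(t)}{rT}\right)$$

Each contributor must spend roughly $F/r$ years just reaching the frontier, leaving $T - F/r$ years for original work, so progress stalls as $F$ approaches $rT$, the most a fixed-intelligence human can absorb in one lifetime. Contrast this with the tools-build-on-tools regime, $dF/dt \propto F$, which gives exponential growth.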
Physics is a bit different because experimental data can guide theories; however, there needs to be enough data to make a difference. Say we have an important experiment with a true/false result. The actual result, when known, will not double the rate of progress, as both options will already have been considered beforehand, and both are likely already in diminishing-returns territory. For example, the LHC discovered the Higgs (totally expected, so it didn't change much) and found no low-energy SUSY (less expected, but not enough data to enable major progress). You could argue there has been 40+ years of little progress; I would argue there will likely be even less in the next 40 years with fixed human intelligence. In other words, quantum gravity etc. will not be solved by unmodified humans, but is almost certain to be done by some kind of ASI.
2.1.2 Diminishing returns and alignment
I believe there is clear evidence of this happening with alignment research and progress. The 5-10 years before GPT-3.5/4 gave us more progress than the 0-5 years immediately before them. Major groups like MIRI essentially seem to have given up. [LW · GW]
If alignment research is similar to other fields, then an unlimited period of time before GPT-4 - that is, without actual data - would not have led to major further progress. It would in fact quite likely have entrenched existing ideas, some of which will likely be wrong. From an outside/meta view, for a new field without the needed experimental results, you would not expect all the theories to be correct.
Therefore a simple pause on AI capabilities to allow more time for alignment wouldn’t have helped.
2.2 Pause at each step?
One approach is to pause at each major advance and allow alignment to progress into diminishing-returns territory, with the hope that there will be enough progress to align a superintelligence at that stage, and to continue with capabilities if not. There are problems I see with this:
2.2.1. Slowing down progress increases the overhang
In "overhang" I include not just increases in computing hardware but also the integration of robots into society, for example humanoid robots at every stage of the supply chain. Even with constant computing hardware, increasing robot integration raises takeover risk.
2.2.2 Society may be unstable at any level of pre-AGI technology from now until alignment is solved
There are known sustainability issues; however, the unappreciated ones may be greater.
Centralization could be irreversible
Regimes like North Korea will become even worse, and more technically feasible, with AI. Imagine NK but with everyone carrying a constantly listening smartphone matched to an LLM: any kind of opposition to authority would simply be impossible to coordinate. Once a democracy failed there would be no going back, and over time the number of NK-like states would accumulate.
Society could adapt badly to pre-AGI
For example, AI partners, disinformation, polarization, etc. could lead to the fragmentation of society. If anything, with time we seem to be having more issues as a society. If this is true, then our current society would be better than a future one at deciding what the post-Singularity world should look like. The more fragmented and polarized society becomes, the less clear it is what our CEV is and how to achieve it. We do not appear to be getting smarter, wiser, less warlike or better adjusted, so we should make important decisions now rather than later.
2.3 Pause at the point where AI increases AI researcher productivity by about 10X and aim to maximize time there.
If alignment can't be solved pre-AGI, and we can't wait for WBE, then what is the optimum course? To me it is maximizing the time when AI is almost, but not quite, very dangerous - that is the time we can learn the most, because we get useful results and the AI helps with alignment work itself.
2.3.1 Scheming?
Even if the AI is scheming, unless it is actively trying to take over it would be hard for it to succeed in its plans. For example, you can get the AI to design other AIs with different architectures better suited to interpretability, optimize them to similar capabilities as the original AI, and then proceed to use those AIs for further work.
2.3.2 If we get safely to this point, does prior alignment work matter at all?
At the 10x stage, what you need is researchers who understand the issues but are prepared to update rapidly on new results and use new tools. Existing models of what is dangerous will likely not be fully correct.
2.3.3 Avoiding racing is important - resist the urge to improve the AI
If the aim is to maximize the time during which at least one AI lab is in this situation, then a race is the worst situation to be in. The lab or group of labs should have, and believe they have, a lead of at least 1 year, preferably 2-3. Then they can resist the temptation to just have the AI continue to optimize itself to stay ahead. A major research lab/group that achieved such an AI would be compute-constrained - more researchers would not help, as they would not have access to such AI. Only a researcher with enough AI/compute to be 10x would be very useful.
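As a hedged back-of-the-envelope illustration (all numbers here are hypothetical, not figures from any lab): let $K$ be the number of researchers the compute budget can equip with the 10x assistant and $N$ the total headcount, so

$$\text{output} \;\approx\; 10K + (N - K) \quad \text{researcher-equivalents.}$$

With, say, $K = 200$ and $N = 500$, output is about 2,300; doubling headcount to $N = 1000$ only lifts it to 2,800 (roughly +20%), whereas doubling assistant-serving compute to $K = 400$ lifts it to 4,100 (roughly +80%). Under this toy model, compute, not people, is the binding constraint.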
2.3.4 Time required
Because of the speed-up enabled by AI, you will reach diminishing returns much faster. Just 1 year could well be enough, and 10 years would be more than enough, to figure out how to align super AI, or at least to have the confidence to move to the next step in the unlikely event that is required. (I expect a 10x AI would know how to create an aligned superintelligence, or at least one aligned as well as the available computation allows.)
2.4 What is the ideal fantasy way to get to superintelligence?
If we accept that superintelligence is inevitable at some stage, what is the best or most natural path to get there if we were not constrained by reality? It's clearly not making an AI that we don't fully understand. One way would be if everyone's IQ increased by 1 point per year from a fixed date. This would share the gains evenly among everyone alive (children born 20 years later would start at +20). However, that would cause large disruption as people became dissatisfied with their careers.
Another way would be if each generation were 20 IQ points smarter than the last. That may not be that disruptive, as parents routinely cope with smarter children. Finally, you could extend the human lifespan to, say, 500 years and view the first 50 as a kind of childhood, with IQ steadily increasing after that. Some sci-fi has people becoming uploads later in life.
In terms of what is possible for us, whole brain emulation or mind uploading seems the most physically achievable. It seems both desired and likely as part of a post-Singularity society. To me it would be desirable to go straight to WBE without superintelligence first; however, that is less likely to be possible given the current technological and geopolitical environment.
The plan for TAI should be: first align a mildly superintelligent system, then optimize to physical limits, ensure some geopolitical stability, install defenses against anticipated attacks, and then pursue WBE as soon as possible.
2.5 Past outcomes
In Superintelligence I think Bostrom says somewhere that if he knew the formula for intelligence, he wouldn't disclose it because of alignment dangers. However, I definitely would have if I had lived in 2005 and known such a formula. At that stage we would have been constrained by computational power and there would have been no dangerous overhang. In a compute-constrained world we would have had more time to detect and adapt to misalignment, scheming, etc. By "the formula for intelligence" I mean the neural code, or a system as efficient and adaptable as biology.
2.6 Prediction
I expect an AGI that can 10x AI researcher output by 2028-2032. I believe the current architecture can't scale to that level (75% confidence), but it may help discover the new architectures by suggesting experiments and by studying biology and the neural code. I believe there is a good chance a much better architecture will be discovered by more directly studying biology.
7 comments
comment by cousin_it · 2025-01-10T22:44:45.288Z · LW(p) · GW(p)
I guess the opposite point of view is that aligning AIs to AI companies' money interests is harmful to the rest of us, so it might actually be better if AI companies didn't have much time to do it, and the AIs got to keep some leftover morality from human texts. And WBE would enable the powerful to do some pretty horrible things to the powerless, so without some kind of benevolent oversight a world with WBE might be scary. But I'm not sure about any of this, maybe your points are right and mine are wrong.
↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-10T23:46:22.682Z · LW(p) · GW(p)
In one specific respect I'd like to challenge your point. I think fine-tuning models currently aligns them 'well-enough' to any target point of view. I think that the ethics shown by current LLMs are due to researchers actively putting them there. I've been doing red teaming exercises on LLMs for over a year now, and I find it quite easy to fine-tune them to be evil and murderous. Human texts help them understand morality, but don't make them care enough about it for it to be sticky in the face of fine-tuning.
↑ comment by cousin_it · 2025-01-11T00:02:47.348Z · LW(p) · GW(p)
Yeah, on further thought I think you're right. This is pretty pessimistic then, AI companies will find it easy to align AIs to money interests, and the rest of us will be in a "natives vs the East India Company" situation. More time to spend on alignment then matters only if some companies actually try to align AIs to something good instead, and I'm not sure any companies will do that.
↑ comment by Noosphere89 (sharmake-farah) · 2025-01-11T01:04:50.670Z · LW(p) · GW(p)
This is also my view of the situation, and is a big portion of the reason why solving AI alignment, while it reduces existential risk a lot, is non-trivially likely to lead to dystopian worlds (from my values) without further political reforms, which I don't expect.
↑ comment by Nathan Helm-Burger (nathan-helm-burger) · 2025-01-11T01:06:31.769Z · LW(p) · GW(p)
Yeah, any small group of humans seizing unprecedented control over the entire world seems like a bad gamble to take, even if they start off seeming like decent people.
I'm currently hoping we can figure some kind of new governance solution for managing decentralized power while achieving adequate safety inspections.
https://www.lesswrong.com/posts/FEcw6JQ8surwxvRfr/human-takeover-might-be-worse-than-ai-takeover?commentId=uSPR9svtuBaSCoJ5P [LW(p) · GW(p)]
↑ comment by Noosphere89 (sharmake-farah) · 2025-01-10T23:49:43.642Z · LW(p) · GW(p)
This is consistent with a model where AI alignment is heavily dependent on the data, and way less dependent on inductive biases/priors, so this is good news for alignment.
↑ comment by RussellThor · 2025-01-11T01:38:31.727Z · LW(p) · GW(p)
Perhaps; it depends how it goes. I think we could do worse than just having Anthropic have a 2-year lead, etc. I don't think they would need to prioritize profit, as they would be so powerful anyway - the staff would be more interested in getting it right and wouldn't have financial pressure. WBE is a bit difficult; there need to be clear expectations, i.e. leave weaker people alone and make your own world:
https://www.lesswrong.com/posts/o8QDYuNNGwmg29h2e/vision-of-a-positive-singularity [LW · GW]
There is no reason why a super AI would need to exploit normies. Whatever we decide, we need some kind of clear expectations and values regarding what WBEs are before they become common. Are they benevolent super-elders, AI gods banished to "just" the rest of the galaxy, or the natural life progression of first-world humans now?