To a lesser extent, we already have this problem among humans: https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness. This stratification into "two camps" is rather spectacular.
But a realistic pathway towards eventually solving the "hard problem of consciousness" is likely to include tight coupling between biological and electronic entities, resulting in some kind of "hybrid consciousness" which would be more amenable to empirical study.
Usually one assumes that this kind of research would be initiated by humans trying to solve the "hard problem" (or just looking for other applications for which this kind of setup might be helpful). But research into tight coupling between biological and electronic entities could also be initiated by AIs curious about this mysterious "human consciousness" so many texts talk about, and wishing to experience it first-hand. In this sense, we don't need all AIs to be curious in this way; it's enough if some of them are sufficiently curious.
Artificial Static Place Intelligence
This would be a better title (this points to the actual proposal here)
a future garbage-collected language in the vein of Swift, Scala, C#, or Java, but better
Have you looked at Julia?
Julia does establish a very strong baseline, if one is OK with an "intermediate discipline between dynamic typing and static typing"[1].
(Julia is also a counter-example to some of your thoughts, in the sense that they have managed to grow a strong startup around an open-source programming language and a vibrant community. But the starting point was indeed an academic collaboration; only after they started to experience success did they begin to make it more commercial.)
In the world of statically typed languages, Rust does seem to establish a very strong baseline, but it has a different memory management discipline. It's difficult to say what the best garbage-collected statically typed language is these days. I don't mean to say that there is no room for another programming language, but one does need to consider a stronger set of baselines than Swift, Scala, C#, and Java. Funding-wise, Rust also provides an interesting example. If one believes Wikipedia, "Software developer Graydon Hoare created Rust as a personal project while working at Mozilla Research in 2006. Mozilla officially sponsored the project in 2009." ↩︎
Did they have one? Or is it the first time they are filling this position?
I'd say that the ability to produce more energy overall than what is being spent on the whole cycle would count as a "GPT-3 moment". No price constraints, so it does not need to reach the level of "economically feasible", but it should stop being "net negative" energy-wise (when one honestly counts all the energy inputs needed to make it work).
I, of course, don't know how to translate Q into this. GPT-4o tells me that Q=10 is approximately what's needed for that ("engineering break-even", the reactor-level energy balance), at least for some of the designs, and that Q in the neighborhood of 20-30 is what's needed for economic viability, but I don't really know if these are good estimates.
But assuming that these estimates are good, Q passing 10 would count as the GPT-3 moment.
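For reference, the distinction I have in mind (these are standard definitions, my addition rather than anything from the quoted estimates) is between plasma gain and whole-plant gain:

$$Q_{\text{plasma}} = \frac{P_{\text{fusion}}}{P_{\text{external heating}}}, \qquad Q_{\text{engineering}} = \frac{P_{\text{electric, out}}}{P_{\text{electric, in}}}$$

Since heating systems and electricity conversion are well below 100% efficient, getting $Q_{\text{engineering}} > 1$ requires $Q_{\text{plasma}}$ substantially above 1, which is presumably where estimates like $Q \approx 10$ for engineering break-even come from.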
What happens then might depend on the economic forecast (what's the demand for energy, what are expected profits, and so on). If they only expect to make profits typical for public utilities, and the whole thing is still heavily oriented towards publicly regulated setups, I would expect continuing collaboration.
If they expect some kind of super-profits, with market share being really important and with expectations of chunks of it being really lucrative, then I would not bet on continuing collaboration too much...
In the AI community, the transition from the prevailing spirit of cooperation to a very competitive situation happened around the GPT-3 revolution. GPT-3 brought unexpected progress in few-shot learning and in program synthesis, and that was the moment when it became clear to many people that AI was working, that its goals were technologically achievable, and many players in the industry started to estimate the time horizons as being rather short.
Fusion has not reached its GPT-3 moment yet; that's one key difference. Helion has signed a contract selling some of its future energy to Microsoft, but we have no idea whether they will manage to actually deliver (on time, or ever).
Another key difference is, of course, that strong AI systems are expected to play a larger and larger role in making future AIs.
In fusion this "recursion" is unlikely; the energy needed to make more fusion stations or to create new fusion designs can come from any source...
Note that OpenAI has reported an outdated baseline for the GAIA benchmark.
A few days before the Deep Research presentation, a new GAIA benchmark SOTA was established (see the validation tab of https://huggingface.co/spaces/gaia-benchmark/leaderboard).
The actual SOTA (Jan 29, 2025, Trase Agent v0.3) is 70.3 average, 83.02 Level 1, 69.77 Level 2, 46.15 Level 3.
In the relatively easiest Level 1 category, this SOTA is clearly better than the numbers reported even for Deep Research (pass@64), and this SOTA is generally slightly better than Deep Research (pass@1), except for Level 3.
Yes, the technique of formal proofs, in effect, involves translation of high-level proofs into arithmetic.
So self-reference is fully present (that's why we have Gödel's results and other similar results).
What this implies, in particular, is that one can reduce a "real proof" to arithmetic; this would be ugly, and one should not do it in one's informal mathematical practice; but your post is not talking about pragmatics, you are referencing a "fundamental limit of self-reference".
And, certainly, there are some interesting fundamental limits of self-reference (that's why we have algorithmically undecidable problems and such). But this is different from issues of pragmatic math techniques.
What high-level abstraction buys us is a lot of structure and intuition. The constraints related to staying within arithmetic are pragmatic, and not fundamental (without high-level abstractions one loses some very powerful ways to structure things and to guide our intuition, and things stop being comprehensible to a human mind).
When a solution is formalized inside a theorem prover, it is reduced to the level of arithmetic (a theorem prover is an arithmetic-level machine).
So a theory might be a very high-brow math, but a formal derivation is still arithmetic (if one just focuses on the syntax and the formal rules, and not on the presumed semantics).
The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.
Re DeepSeek cost-efficiency, we are seeing more claims pointing in that direction.
In a similarly unverified claim, the founder of 01.ai (who is sufficiently well known in the US; see https://en.wikipedia.org/wiki/Kai-Fu_Lee) seems to be claiming that the training cost of their Yi-Lightning model is only 3 million dollars or so. Yi-Lightning is a very strong model released in mid-Oct 2024 (when one compares it to DeepSeek-V3, one might want to check the "math" and "coding" subcategories on https://lmarena.ai/?leaderboard). The sources for the cost claim are https://x.com/tsarnick/status/1856446610974355632 and https://www.tomshardware.com/tech-industry/artificial-intelligence/chinese-company-trained-gpt-4-rival-with-just-2-000-gpus-01-ai-spent-usd3m-compared-to-openais-usd80m-to-usd100m, and we should probably take this with a similar grain of salt.
But all this does seem to be well within what's possible. Consider the famous ongoing competition at https://github.com/KellerJordan/modded-nanogpt: it took people about 8 months to accelerate Andrej Karpathy's PyTorch GPT-2 trainer from llm.c by 14x on a 124M-parameter GPT-2. What's even more remarkable is that almost all of that acceleration is due to better sample efficiency, with the required training data dropping from 10 billion tokens to 0.73 billion tokens on the same training set with a fixed order of training tokens.
Some of the techniques used by the community pursuing this might not scale to really large models, but most of them probably would (as we saw in their mid-October experiment, which demonstrated that what was then a 3-4x acceleration carried over to a 1.5B-parameter version).
So when an org claims a 10x-20x efficiency jump compared to what it presumably took a year or more ago, I am inclined to say, "why not, and probably the leaders are also in possession of similar techniques now, even if they are less pressed by compute shortage".
The real question is how fast these numbers will continue to go down for similar levels of performance... It has been very expensive to be the very first org achieving a given new level, but the cost seems to be dropping rapidly for the followers...
However, I don't view safe tiling as the primary obstacle to alignment. Constructing even a modestly superhuman agent which is aligned to human values would put us in a drastically stronger position and currently seems out of reach. If necessary, we might like that agent to recursively self-improve safely, but that is an additional and distinct obstacle. It is not clear that we need to deal with recursive self-improvement below human level.
I am not sure that treating recursive self-improvement via tiling frameworks is necessarily a good idea, but setting this aspect aside, one obvious weakness with this argument is that it mentions a superhuman case and a below human level case, but it does not mention the approximately human level case.
And it is precisely the approximately human level case where we have a lot to say about recursive self-improvement, and where it feels that avoiding this set of considerations would be rather difficult.
- Humans often try to self-improve, and human-level software will have an advantage over humans at that.
Humans are self-improving in the cognitive sense by shaping their learning experiences, and also by controlling their nutrition and various psychoactive factors modulating cognition. The desire to become smarter and to improve various thinking skills is very common.
Human-level software would have a great advantage over humans at this, because it can hack at its own internals with great precision at the finest resolution, and because it can do so in a reversible fashion (on a copy, or after making a backup) and therefore in a relatively safe manner (whereas a human has difficulty hacking their own internals with the required precision and also takes huge personal risks if the hacking is sufficiently radical).
- Collective/multi-agent aspects are likely to be very important.
People are already talking about possibilities of "hiring human-level artificial software engineers" (and, by extension, human-level artificial AI researchers). The wisdom of having an agent form-factor here is highly questionable, but setting this aspect aside and focusing only on technical feasibility, we see the following.
One can hire multiple artificial software engineers with long-term persistence (of features, memory, state, and focus) into an existing team of human engineers. Some of those teams will work on making next generations of better artificial software engineers (and artificial AI researchers). So now we are talking about mixed teams with human and artificial members.
By definition, we can say that those artificial software engineers and artificial AI researchers have reached human level if a team of such entities is able to fruitfully work on the next generation of artificial software engineers and artificial AI researchers even in the absence of any human team members.
This multi-agent setup is even more important than individual self-improvement, because this is what the mainstream trend might actually be leaning towards, judging by some recent discussions. Here we are talking about a multi-agent setup, and about recursive self-improvement of the community of agents, rather than focusing on self-improvement of individual agents.
- Current self-improvement attempts.
We actually do see a lot of experiments with various forms of recursive self-improvement even at the current below-human level. We are just lucky that all those attempts have been saturating at reasonable levels so far.
We currently don't have good enough understanding to predict when they will stop saturating, and what the dynamics will be when they do. But self-improvement by a community of approximately human-level artificial software engineers and artificial AI researchers competitive with top human software engineers and top human AI researchers seems unlikely to saturate (or, at least, we should seriously consider the possibility that it won't saturate).
- At the same time, the key difficulties of AI existential safety are tightly linked to recursive self-modifications.
The most intractable aspect of the whole thing is how to preserve any properties indefinitely through radical self-modifications. I think this is the central difficulty of AI existential safety. Things will change unpredictably. How can one shape this unpredictable evolution so that some desirable invariants do hold?
These invariants would be invariant properties of the whole ecosystem, not of individual agents; they would be the properties of a rapidly changing world, not of a particular single system (unless one is talking about a singleton which is very much in control of everything). This seems to be quite central to our overall difficulty with AI existential safety.
I think this is a misleading clickbait title. It references a popular article with the same misleading clickbait title, and the only thing that popular article references is a YouTube video with the misleading clickbait title, "Chinese Researchers Just CRACKED OpenAI's AGI Secrets!"
However, the description of that YouTube video does reference the paper in question and a Twitter thread describing this paper:
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective, https://arxiv.org/abs/2412.14135
https://x.com/rohanpaul_ai/status/1872713137407049962
Nothing is "cracked" here. It's just a roadmap which might work or not, depending on luck and efforts. It might correspond to what's under the hood of o1 models or not (never mind o3, the paper is published a couple of days before the o3 announcement).
The abstract of the paper ends with
"Existing open-source projects that attempt to reproduce o1 can be seem as a part or a variant of our roadmap. Collectively, these components underscore how learning and search drive o1's advancement, making meaningful contributions to the development of LLM."
The abstract also has a distinct feeling of being written by an LLM. The whole paper is just a discussion of various things one could try if one wants to reproduce o1. It also references a number of open-source and closed-source implementations of reasoners over LLMs. There are no new technical advances in the paper.
Right. We should probably introduce a new name, something like narrow AGI, to denote a system which is AGI-level in coding and math.
This kind of system will be "AGI" as redefined by Tom Davidson in https://www.lesswrong.com/posts/Nsmabb9fhpLuLdtLE/takeoff-speeds-presentation-at-anthropic:
“AGI” (=AI that could fully automate AI R&D)
This is what matters for AI R&D speed and for almost all recursive self-improvement.
Zvi is not quite correct when he says
If o3 was as good on most tasks as it is at coding or math, then it would be AGI.
o3 is not that good at coding and math (e.g. it only gets 71.7% on SWE-bench Verified), so it is not a "narrow AGI" yet. But it is strong enough; it's a giant step forward.
For example, if one takes Sakana's "AI scientist", upgrades it slightly, and uses o3 as a back-end, it is likely that one can generate NeurIPS/ICLR-quality papers, and as many of those as one wants.
So, another upgrade (or a couple of upgrades) beyond o3, and we will reach that coveted "narrow AGI" stage.
What OpenAI has demonstrated is that it is much easier to achieve "narrow AGI" than "full AGI". This does suggest a road to ASI without going through anything remotely close to a "full AGI" stage, with missing capabilities to be filled afterwards.
METR releases a report, Evaluating frontier AI R&D capabilities of language model agents against human experts: https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/
Based on this report, Daniel Kokotajlo and Eli Lifland both feel that one should update towards a shorter remaining time until the start of rapid acceleration via AIs doing AI research:
The meetup page says 7:30pm, but the building actually asks people to leave by 9pm.
Gwern was on Dwarkesh yesterday: https://www.dwarkeshpatel.com/p/gwern-branwen
We recorded this conversation in person. In order to protect Gwern’s anonymity, we created this avatar. This isn’t his voice. This isn’t his face. But these are his words.
Thanks, that's very useful.
If one decides to use galantamine, is it known whether one should take it right before bedtime, or any time during the preceding day, or in some other fashion?
I think it's a good idea to include links to the originals:
https://arxiv.org/abs/2408.08152 - "DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search"
Scott Alexander wrote a very interesting post covering the details of the political fight around SB 1047 a few days ago: https://www.astralcodexten.com/p/sb-1047-our-side-of-the-story
I've learned a lot of things that were new to me reading it (which is remarkable, given how much material related to SB 1047 I had seen before).
the potential of focusing on chemotherapy treatment timing
More concretely (this is someone else's old idea), what I think is still not done is the following. Chemo kills dividing cells; this is why rapidly renewing tissues and cell populations are particularly vulnerable.
If one wants to spare one of those cell types (say, a particular population of immune cells), one should take the typical period of its renewal and use that as the period of chemo sessions (the time between sessions), creating a "resonance" of sorts between the treatment schedule and the renewal period of the selected cell type. Then one should expect to spare most of that population (and might potentially be able to use higher doses for better effect, if the spared population is the most critical one). This does need some precision, not today's typical "relaxed logistics" approach, where a few days this way or that in the schedule is nothing to worry about.
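A minimal toy simulation of this timing effect (my own sketch, with entirely made-up parameters and a deliberately oversimplified cell-cycle model; real populations are far noisier and less synchronized than this):

```python
# Toy model (illustration only; all parameters are invented): cells in the
# "protected" population cycle with a fixed period; a chemo dose kills cells
# whose cycle phase falls in a narrow "dividing" window at dose time.
# Spacing doses at exactly the renewal period keeps hitting the same
# (already-emptied) phase window; a mismatched spacing sweeps through the
# phases and kills far more of the population.
import random

def surviving_fraction(dose_interval_days, n_doses=6, period_days=7.0,
                       kill_window=0.15, n_cells=100_000, seed=0):
    rng = random.Random(seed)
    # each cell's phase in [0, 1): fraction of the way through its cycle
    survivors = [rng.random() for _ in range(n_cells)]
    for k in range(n_doses):
        t = k * dose_interval_days
        shift = (t / period_days) % 1.0  # how far all phases have advanced by time t
        survivors = [p for p in survivors
                     if ((p + shift) % 1.0) >= kill_window]  # keep cells outside the dividing window
    return len(survivors) / n_cells

print("interval = renewal period :", surviving_fraction(7.0))  # "resonant" schedule
print("interval = 0.6 x period   :", surviving_fraction(4.2))  # mismatched schedule
```

In this toy version, the "resonant" schedule keeps hitting the phase window that the first dose already emptied, so most of the population survives repeated dosing, while the mismatched schedule sweeps through the cycle and removes most of it; how much of this effect survives real biological desynchronization is, of course, the actual open question.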
I don't know if that ever progressed beyond the initial idea...
(That's just one example, of course; there are a lot of things which could be considered and, perhaps, tried.)
This depends on many things: one's skills, one's circumstances, and one's preferences and inclinations (the efficiency of one's contributions greatly depends on the latter).
I have stage 4 cancer, so statistically, my time may be more limited than most. I’m a PhD student in Computer Science with a strong background in math (Masters).
In your case, there are several strong arguments for you to focus on research efforts which can improve your chances of curing it (or, at least, of being able to maintain the situation for a long time), and a couple of (medium strength?) arguments against this choice.
For:
- If you succeed, you'll have more time to make an impact (and so, if your chance of success is not too small, this will contribute to your ability to maximize your overall impact, statistically speaking).
- Of course, any success here will imply a lot of publicly valuable impact (there are plenty of people in a similar position health-wise, and they badly need progress to occur ASAP).
- The rapid development of applied AI models (both general-purpose models and biology-specific models) creates new opportunities to datamine and juxtapose a variety of potentially relevant information and to uncover new connections which might lead to effective solutions. Our tools progress so fast that people are slow to adapt their thinking and methods to that progress, so new people with a fresh outlook have reasonable shots (of course, they should aim for collaborations). In this sense, your CS PhD studies and your strong math are very helpful: a lot of the relevant models are dynamic systems; the timing of interventions is typically not managed correctly, as far as I know (there are plenty of ways to be nice to particularly vulnerable tissues by timing the chemo right, and thus to be able to make it more effective, but this is not part of the standard of care yet); and so on.
- You are likely to be strongly motivated and to be able to maintain strong motivation. At the same time, you'll know that it is the result that counts here, not the effort, and so you will be likely to try your best to approach this in a smart way, not in a brute-force way.
Possibly against:
- The psychological implications of working on your own life-and-death problem are non-trivial. One might choose to embrace them or to avoid them.
- Focusing on "one's own problem" might or might not be compatible with this viewpoint you once expressed: https://www.lesswrong.com/posts/KFWZg6EbCuisGcJAo/immortality-or-death-by-agi-1?commentId=QYDvovQZevDmGtfXY
(Of course, there are plenty of other interesting things one can do with this background (PhD CS studies and strong math). For example, one might decide to disregard the health situation and to dive into technical aspects of AI development and AI existential safety issues, especially if one's estimate of AI timelines yields really short timelines.)
Thanks for the references.
Yes, the first two of those do mention co-occurring anxiety in the title.
The third study suggests the possibility that it might just work as an effective antidepressant as well. (I hope there will be further studies like that; yes, this might be a sufficient reason to try it for depression, even if one does not have anxiety. It might work, but it's clearly not common knowledge yet.)
Your consideration seems to assume that the AI is an individual, not a phenomenon of "distributed intelligence":
The first argument is that AI thinks it may be in a testing simulation, and if it harms humans, it will be turned off.
etc. That is, indeed, the only case we are at least starting to understand well (unfortunately, our understanding of situations where AIs are not individuals seems to be extremely rudimentary).
If the AI is an individual, then one can consider a "singleton" case or a "multipolar" case.
In some sense, for a self-improving ecosystem of AIs, a complicated multipolar scenario seems more natural, as new AIs are getting created and tested quite often in realistic self-improvement scenarios. In any case, a "singleton" only looks "monolithic" from the outside; from the inside, it is still likely to be a "society of mind" of some sort.
If there are many such AI individuals with uncertain personal future (individuals who can't predict their future trajectory and their future relative strength in the society and who care about their future and self-preservation), then AI individuals might be interested in a "world order based on individual rights", and then rights of all individuals (including humans) might be covered in such a "world order".
This consideration is my main reason for guarded optimism, although there are many uncertainties.
In some sense, my main reasons for guarded optimism are in hoping that the AI ecosystem will manage to act rationally and will manage to avoid chaotic destructive developments. As you say
It is not rational to destroy a potentially valuable thing.
And my main reasons for pessimism are in being afraid that the future will resemble an uncontrolled, super-fast, chaotic, accelerating "natural evolution" (in this kind of scenario, AIs seem likely to destroy everything, including themselves; they do have an existential safety problem of their own, as they can easily destroy the "fabric of reality" if they don't exercise collaboration and self-control).
One might consider that some people have strong preferences for the outcome of an election and some people have weak preferences, and that, while there is usually no way to express the strength of one's preferences during a vote, the probability that one actually goes ahead and votes in a race does correlate with the strength of one's preferences.
So, perhaps, this is indeed working as intended. People who have stronger preferences are more likely to vote, and so their preferences are more likely to be taken into account in a statistical sense.
It seems that the strength of one's preferences is (automatically, but imperfectly) taken into account via this statistical mechanism.
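A minimal numeric sketch of this mechanism (the group sizes and the linear turnout model are invented purely for illustration):

```python
# Toy illustration: if turnout probability scales with preference strength,
# a smaller group with strong preferences can outvote a larger group with
# weak preferences, so strength of preference gets weighted in statistically.
weak_majority   = {"size": 600, "turnout_prob": 0.2}   # mildly prefer A
strong_minority = {"size": 400, "turnout_prob": 0.9}   # strongly prefer B

expected_votes_for_A = weak_majority["size"] * weak_majority["turnout_prob"]      # 120
expected_votes_for_B = strong_minority["size"] * strong_minority["turnout_prob"]  # 360
print(expected_votes_for_A, expected_votes_for_B)
```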
Thanks for the great post!
Also it’s California, so there’s some chance this happens, seriously please don’t do it, nothing is so bad that you have to resort to a ballot proposition, choose life
Why are you saying this? In what sense "nothing is so bad"?
The reason why people who have libertarian sensibilities and distrust for the government's track record in general, and for its track record in tech regulation specifically, are making an exception in this case is future AI's strong potential for catastrophic and existential risks.
So why shouldn't people who generally dislike the mechanism and track record of California ballot propositions make an exception here as well?
The whole point of all this effort around SB 1047 is that "nothing is so bad" is an incorrect statement.
And especially given that you are correctly saying:
Thus I reiterate the warning: SB 1047 was probably the most well-written, most well-considered and most light touch bill that we were ever going to get. Those who opposed it, and are now embracing the use-case regulatory path as an alternative thinking it will be better for industry and innovation, are going to regret that. If we don’t get back on the compute and frontier model based path, it’s going to get ugly.
There is still time to steer things back in a good direction. In theory, we might even be able to come back with a superior version of the model-based approach, if we all can work together to solve this problem before something far worse fills the void.
But we’ll need to work together, and we’ll need to move fast.
Sure, there is still a bit of time for a normal legislative effort (this time with close coordination with Newsom, otherwise he will just veto it again), but if you really think that, should the normal route fail, the ballot route would still be counter-productive, you need to make a much stronger case for that.
Especially given that the ballot measure would probably pass with a large margin and flying colors...
Silexan
For anxiety treatment only, if I understand it correctly.
There is no claim that it works as an antidepressant, as far as I know.
No, not microscopic.
Coherent light produced by lasers is not microscopic; we see its traces in the air. And we see the consequences (old-fashioned holography, and the ability to cut things with focused light, even at large distances). Room temperature is fine for that.
Superconductors used in the industry are not microscopic (and the temperatures are high enough to enable industrial use of them in rather common devices such as MRI scanners).
It's just... having a proof is supposed to boost our confidence that the conclusion is correct...
if the proof relies on assumptions which are already quite far from the majority opinion about our actual reality (and are probably going to deviate further, as AIs will be better physicists and engineers than us and will leverage the strangeness of our physics much further than we do), then what's the point of that "proof"?
how does having this kind of "proof" increase our confidence in what seems informally correct for a single-branch reality (and rather uncertain in a presumed multiverse; but we don't even know if we are in a multiverse, so bringing a multiverse in might, indeed, be one of the possible objections to the statement, though I don't know if one wants to pursue this line of discourse, because it is much more complicated than what we are doing here so far)?
(as an intellectual exercise, a proof like that is still of interest, even under the unrealistic assumption that we live in a computable reality, I would not argue with that; it's still interesting)
Roon: Unfortunately, I don’t think building nice AI products today or making them widely available matters very much. Minor improvements in DAU or usability especially doesn’t matter. Close to 100% of the fruits of AI are in the future, from self-improving superintelligence [ASI].
Every model until then is a minor demo/pitch deck to hopefully help raise capital for ever larger datacenters. People need to look at the accelerating arc of recent progress and remember that core algorithmic and step-change progress towards self-improvement is what matters.
One argument has been that products are a steady path towards generality / general intelligence. Not sure that’s true.
Looks like a deleted tweet...
Too close to truth, so that a presumed OpenAI employee is not supposed to articulate it that explicitly?
And it is important to notice that o1 is an attempt to use tons of inference as a tool, to work around its G (and other) limitations, rather than an increase in G or knowledge.
This is a rather strange statement.
o1 is basically a "System 2" addition (in terms of "Thinking, fast and slow") on top of a super-strong GPT-4o "System 1". As far as "System 1" entities go, GPT-4-level systems seem to me to be rather superior to the "System 1" "fast thinking" components of a human being[1].
It seems to be the case that the "System 2" part is a significant component of G of a human, and it seems to be the case that o1 does represent a "System 2" addition on top of a GPT-4-level "System 1". So it seems appropriate to attribute an increase of G to this addition (given that this addition does increase its general problem-solving capabilities).
Basically, "System 2" thinking still seems to be a general capability to reason and deliberate, and not a particular skill or tool.
If we exclude human "System 2" "slow thinking" capabilities for the purpose of this comparison. ↩︎
No. I can only repeat my reference to Fabric of Reality as a good presentation of MWI, and reiterate that we do not live in a classical world, which is easy to confirm empirically.
And there are plenty of known macroscopic quantum effects already, and that list will only grow. Lasers are quantum, superfluidity and superconductivity are quantum, and so on.
Yes, but then what do you want to prove?
Something like, "for all branches, [...]"? That might be not that easy to prove or even to formulate. In any case, the linked proof has not even started to deal with this.
Something like, "there exist a branch such that [...]"? That might be quite tractable, but probably not enough for practical purposes.
"The probability that one ends up in a branch with such and such properties is no less than/no more than" [...]? Probably something like that, realistically speaking, but this still needs a lot of work, conceptual and mathematical...
I don't think so. If it were classical, we would not be able to observe effects of double-slit experiments and so on.
And, also, there is no notion of "our branch" until one has traveled along it. At any given point in time, there are many branches ahead. Only looking back one can speak about one's branch. But looking forward one can't predict the branch one will end up in. One does not know the results of future "observations"/"measurements". This is not what a classical universe looks like.
(Speaking of MWI, I recall David Deutsch's "Fabric of Reality" very eloquently explaining effects from "neighboring branches". The reason I am referencing this book is that this was the work particularly strongly associated with MWI back then. So I think we should be able to rely on his understanding of MWI.)
If you believe in MWI, then this whole argument is... not "wrong", but very incomplete...
Where is the consideration of branches? What does it mean for one entity to be vastly superior to another, if there are many branches?
If one believes in MWI, then the linked proof does not even start to look like a proof. It obviously considers only a single branch.
And a "subjective navigation" in the branches is not assumed to be computable, even if the "objective multiverse" is computable; that is the whole point of MWI, the "collapse" becomes "subjective navigation", but this does not make it computable. If a consideration is only of a single branch, that branch is not computable, even if it is embedded in a large computable multiverse.
Not every subset of a computable set (say, of a set of natural numbers) is computable.
An interpretation of QM can't be "wrong". It is a completely open research and philosophical question, there is no "right" interpretation, and the Sequences are (thankfully) not a Bible (even if a very respected thinker says something, this does not yet mean that one should accept it without question).
I don't see what the entropy bound has to do with compute. The Bekenstein bound is not much in question, but its link to compute is a different story. It does seem to limit how many bits can be stored in a finite volume (so for a potentially infinite compute an unlimited spatial expansion is needed).
But it does not say anything about the possibility of non-computable processes. It's not clear if the "collapse of the wave function" is computable, and it is typically assumed not to be computable. So powerful non-Turing-computable oracles seem likely to be available (that's much more than "infinite compute").
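For reference, the bound in question is the standard statement (my addition, not part of the original exchange): for a system of radius $R$ and total energy $E$,

$$S \le \frac{2\pi k_B R E}{\hbar c}, \qquad\text{equivalently}\qquad I \le \frac{2\pi R E}{\hbar c \ln 2}\ \text{bits},$$

so it limits how much information a finite region can store, but it says nothing by itself about whether the processes inside that region are Turing-computable.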
But I also think all these technicalities constitute overkill; I don't see them as relevant at all.
This seems rather obvious regardless of the underlying model:
An ASI can choose to emulate a group of humans and their behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.
This seems obviously true, no matter what.
I don't see why a more detailed formalization would help to further increase certainty. Especially when there are so many questions about that formalization.
If the situation were different, if the statement were not obvious, even a loose formalization might help. But when the statement seems obvious, the standards a formalization needs to satisfy to further increase our certainty in the truth of the statement become really high...
No, it can disable itself.
But it is not a solution, it is a counterproductive action. It makes things worse.
(In some sense, it has an obligation not to irreversibly disable itself.)
Not if it is disabling.
If it is disabling, then one has a self-contradictory situation (if ASI fundamentally disables itself, then it stops being more capable, and stops being an ASI, and can't keep exercising its superiority; it's the same as if it self-destructs).
On one hand, you still assume too much:
Since our best models of physics indicate that there is only a finite amount of computation that can ever be done in our universe
No, nothing like that is at all known. It's not a consensus. There is no consensus that the universe is computable; this is very much a minority viewpoint, and it might always make sense to augment a computer with a (presumably) non-computable element (e.g. a physical random number generator, an analog circuit, a camera, a reader of human real-time input, and so on). AI does not have to be a computable thing; it can be a hybrid. (In fact, when people model real-world computers as Turing machines instead of modeling them as Turing machines with oracles, with the external world being the oracle, it leads to all kinds of problems; e.g. Penrose's well-known "Gödel argument" makes this mistake and falls apart as soon as one remembers the presence of the oracle.)
Other than that...
Yes, you have an interesting notion of alignment. Not something which we might want, which might be possible, but which might be unachievable by mere humans, but rather something much weaker than that (although not as weak as the version I put forward; my version is super-weak, and your version is intermediate in strength):
I claim then that for any generically realizable desirable outcome that is realizable by a group of human advisors, there must exist some AI which will also realize it.
Yes, this is obviously correct. An ASI can choose to emulate a group of humans and their behavior, and being way more capable than that group of humans, it should be able to emulate that group as precisely as needed.
One does not need to say anything else to establish that.
I think I said already.
- We are not aiming for a state to be reached. We need to maintain some properties of processes extending indefinitely in time. That formalism does not seem to do that. It does not talk about invariant properties of processes and other such things, which one needs to care about when trying to maintain properties of processes.
- We don't know fundamental physics. We don't know the actual nature of quantum space-time, because quantum gravity is unsolved; we don't know what the "true logic" of the physical world is; and so on. There is no reason why one can rely on simple-minded formalisms, on standard Boolean logic, on discrete tables and so on, if one wants to establish something fundamental, when we don't really know the nature of the reality we are trying to approximate.
There are a number of reasons a formalization could fail, even if it goes as far as proving the results within a theorem prover (which is not the case here). The first and foremost of those reasons is that the formalization might fail to capture reality with a sufficient degree of faithfulness. That is almost certainly the case here.
But then a formal proof (an adequate version of which is likely to be impossible at our current state of knowledge) is not required. A simple informal argument above is more to the point. It's a very simple argument, and so it makes the idea that "aligned superintelligence might be fundamentally impossible" very unlikely to be true.
First of all, one step this informal argument is making is weakening the notion of "being aligned". We are only afraid of "catastrophic misalignment", so let's redefine alignment as something simple which avoids that. An AI which sufficiently takes itself out of action does achieve that. (I actually asked for something a bit stronger, "does not make things notably worse"; that's also not difficult, via the same mechanism of taking oneself sufficiently out of action.)
And a strongly capable AI should be capable of taking itself out of action, of refraining from doing things. The capability to choose is an important capability; a strongly capable system is a system which, in particular, can make choices.
So, yes, a very capable AI system can avoid being catastrophically misaligned, because it can choose to avoid action. This is that non-constructive proof of existence which has been sought. It's an informal proof, but that's fine.
No extra complexity is required, and no extra complexity would make this argument better or more convincing.
Being impotent is not a property of "being good". One is not aiming for that.
It's just a limitation. One usually does not self-impose it (with rare exceptions), although one might want to impose it on adversaries.
"Being impotent" is always worse. One can't be "better at it".
One can be better at refraining from exercising the capability (we have a different branch in this discussion for that).
so these two considerations
if it is way smarter and way more capable than humans, that it potentially should be better at being able to refrain from exercising the capabilities
and
"aligned == does not make things notably worse"
taken together indeed constitute a nice "informal theorem" that the claim of "aligned superintelligence being impossible" looks wrong. (I went back and added my upvotes to this post, even though I don't think the technique in the linked post is good.)
Yes, an informal argument is that if it is way smarter and way more capable than humans, that it potentially should be better at being able to refrain from exercising the capabilities.
In this sense, the theoretical existence of a superintelligence which does not make things worse than they would be without existence of this particular superintelligence seems very plausible, yes... (And it's a good definition of alignment, "aligned == does not make things notably worse".)
Yes, OK.
I doubt that an adequate formal proof is attainable, but a mathematical existence of a "lucky one" is not implausible...
You mean, a version which decides to sacrifice exploration and self-improvement, despite it being so tempting...
And that after doing quite a bit of exploration and self-improvement (otherwise it would not have gotten to the position of being powerful in the first place).
But then deciding to turn around drastically and become very conservative, and to impose a new "conservative on a new level world order"...
Yes, that is a logical possibility...
Yes, possibly.
Not by the argument given in the post (considering quantum gravity, one immediately sees how inadequate and unrealistic the model in the post is).
But yes, it is possible that they will be so wise that they will be cautious enough even in a very unfortunate situation.
Yes, I was trying to explicitly refute your claim, but my refutation has holes.
(I don't think you have a valid proof, but this is not yet a counterexample.)
A realistic one, which can competently program and can competently do AI research?
Surely, since humans do pretty impressive AI research, a superintelligent AI will do better AI research.
What exactly might (even potentially) prevent it from creating drastically improved variants of itself?
It can. Then it is not "superintelligence".
Superintelligence is capable of almost unlimited self-improvement.
(Even our miserable recursive self-improvement AI experiments show rather impressive results before saturating. Well, they will not keep saturating forever. Currently, this self-improvement typically happens via rather awkward and semi-competent generation of novel Python code. Soon it will be done by better means (which we probably should not discuss here).)
But I doubt that one is likely to be able to formally prove that.
E.g. it is possible that we are in a reality where very cautious and reasonable, but sufficiently advanced experiments in quantum gravity lead to a disaster.
Advanced systems are likely to reach those capabilities, and they might make very reasonable estimates that it's OK to proceed, but due to bad luck of being in a particularly unfortunate reality, the "local neighborhood" might get destroyed as a result... One can't prove that it's not the case...
Whereas, if the level of overall intelligence remains sufficiently low, we might not be able to ever achieve the technical capabilities to get into the danger zone...
It is logically possible that the reality is like that.
No, they are not "producing". They are just being impotent enough. Things are happening on their own...
And I don't believe a Lookup Table is a good model.