Posts
Comments
If I understand you correctly, you want to create an unprecedentedly efficient and coordinated network, made out of intelligent people with goodwill, that will solve humanity's problems in theory and in practice?
These are my thoughts in response. I don't claim to know that what I say here is the truth, but it's a paradigm that makes sense to me.
Strategic global cooperation to stop AI is effectively impossible, and hoping to do it by turning all the world powers into western-style democracies first is really impossible. Any successful diplomacy will have to work with the existing realities of power within and among countries, but even then, I only see tactical successes at best. Even stopping AI within the West looks very unlikely. Nationalization is conceivable, but I think it would have to partly come as an initiative from a cartel of leading companies; there is neither the will nor the understanding in the non-tech world of politics to simply impose nationalization of AI on big tech.
For these reasons, I think the only hope of arriving at a human-friendly future by design rather than by accident, is to solve the scientific, philosophical, and design issues involved, in the creation of benevolent superhuman AI. Your idea to focus on the creation of "digital people" has a lot in common with this; more precisely, I would say that many of the questions that would have to be answered, in order to know what you're doing when creating digital people, are also questions that have to be answered, in order to know how to create benevolent superhuman AI.
Still, in the end I expect that the pursuit of AI leads to superintelligence, and an adequately benevolent superintelligence would not necessarily be a person. It would, however, need to know what a person is, in a way that isn't tied to humanity or even to biology, because it would be governing a world in which that "unprecedented diversity of minds" can exist.
Eliezer has argued that it is unrealistic to think that all the scientific, philosophical, and design issues can be solved in time. He also argues that in the absence of a truly effective global pause or ban, the almost inevitable upshot is a superintelligence that reorganizes the world in a way that is unfriendly to human beings, because human values are complex, and so human-friendliness requires a highly specific formulation of what human values are, and of the AI architecture that will preserve and extrapolate them.
The argument that the design issues can't be resolved in time is strong. They involve a mix of perennial philosophical questions like the nature of the good, scientific questions like the nature of human cognition and consciousness, and avantgarde computer-science issues like the dynamics of superintelligent deep learning systems. One might reasonably expect it to take decades to resolve all these.
Perhaps the best reason for hope here, is the use of AI as a partner in solving the problems. Of course this is a common idea, e.g. "weak-to-strong generalization" would be a form of this. It is at least conceivable that the acceleration of discovery made possible by AI, could be used to solve all the issues pertaining to friendly superintelligence, in years or months, rather than requiring decades. But there is also a significant risk that some AI-empowered group will be getting things wrong, while thinking that they are getting it right. It is also likely that even if a way is found to walk the path to a successful outcome (however narrow that path may be), that all the way to the end, there will be rival factions who have different beliefs about what the correct path is.
As for the second proposition I have attributed to Eliezer - that if we don't know what we're doing when we cross the threshold to superintelligence, doom is almost inevitable - that's less clear to me. Perhaps there are a few rough principles which, if followed, greatly increase the odds in favor of a world that has a human-friendly niche somewhere in it.
Who said biological immortality (do you mean a complete cure for ageing?) requires nanobots?
We know individual cell lines can go on indefinitely, the challenge is to have an intelligent multicellular organism that can too.
It's the best plan I've seen in a while (not perfect, but has many good parts). The superalignment team at Anthropic should probably hire you.
Isn't this just someone rich, spending money to make it look like the market thinks Trump will win?
Doom aside, do you expect AI to be smarter than humans? If so, do you nonetheless expect humans to still control the world?
"Successionism" is a valuable new word.
My apologies. I'm usually right when I guess that a post has been authored by AI, but it appears you really are a native speaker of one of the academic idioms that AIs have also mastered.
As for the essay itself, it involves an aspect of AI safety or AI policy that I have neglected, namely, the management of socially embedded AI systems. I have personally neglected this in favor of SF-flavored topics like "superalignment" because I regard the era in which AIs and humans have a coexistence in which humans still have the upper hand as a very temporary thing. Nonetheless, we are still in that era right now, and hopefully some of the people working within that frame, will read your essay and comment. I do agree that the public health paradigm seems like a reasonable source of ideas, for the reasons that you give.
There's a lot going on in this essay, but the big point would appear to be: to create advanced AI is to materialize an Unknown Unknown, and why on earth would you expect that to be something you can even understand, let alone something that is sympathetic to you or "aligned" with you?
Then I made a PDF of the article and fed it to Claude Opus and to Google's Gemini-powered NotebookLM, and both AIs seemed to get the gist immediately, as well as understanding the article's detailed structure. There is a deep irony in hearing NotebookLM's pod-people restating the essay's points in their own words, and agreeing that its warnings make sense.
(edit: looks like I spoke too soon and this essay is 100% pure, old-fashioned, home-grown human)
This appears to be yet another post that was mostly written by AI. Such posts are mostly ignored.
This may be an example of someone who is not a native English speaker, using AI to make their prose more idiomatic. But then we can't tell how many of the ideas come from the AI as well.
If we are going to have such posts, it might be useful to have them contain an introductory note, that says something about the process whereby they were generated, e.g. "I wrote an outline in my native language, and then [specific AI] completed the essay in English", or, "This essay was generated by the following prompt... this was the best of ten attempts", and so on.
Is it too much to declare this the manifesto of a new philosophical school, Constantinism?
each one is annotated with how much utility we estimate it to have
How are these estimates obtained?
Let me first say what I think alignment (or "superalignment") actually requires. This is under the assumption that humanity's AI adventure issues in a superintelligence that dominates everything, and that the problem to be solved is how to make such an entity compatible with human existence and transhuman flourishing. If you think the future will always be a plurality of posthuman entities, including enhanced former humans, with none ever gaining an irrevocable upper hand (e.g. this seems to be one version of e/acc); or if you think the whole race towards AI is folly and needs to be stopped entirely; then you may have a different view.
I have long thought of a benevolent superintelligence as requiring three things: superintelligent problem-solving ability; the correct "value system" (or "decision procedure", etc); and a correct ontology (and/or the ability to improve its ontology). The first two criteria would not be surprising, in the small world of AI safety that existed before the deep learning revolution. They fit a classic agent paradigm like the expected utility maximizer; alignment (or Friendliness, as we used to say), being a matter of identifying the right utility function.
The third criterion is a little unconventional, and my main motive for it even more so, in that I don't believe the theories of consciousness and identity that would reduce everything to "computation". I think they (consciousness and identity) are grounded in "Being" or "substance", in a way that the virtual state machines of computation are not; that there really is a difference between a mind and a simulation of a mind, for example. This inclines me to think that quantum holism is part of the physics of mind, but that thinking of it just as physics is not enough, you need a richer ontology of which physics is only a formal description; but these are more like the best ideas I've had, than something I am absolutely sure is true. I am much more confident that purely computational theories of consciousness are radically incomplete, than as to what the correct alternative paradigm is.
The debate about whether the fashionable reductionist theory of the day is correct, is as old as science. What does AI add to the mix? On the one hand, there is the possibility that an AI with the "right" value system but the wrong ontology, might do something intended as benevolent, that misses the mark because it misidentifies something about personhood. (A simple example of this might be, that it "uploads" everyone to a better existence, but uploads aren't actually conscious, they are just simulations.) On the other hand, one might also doubt the AI's ability to discover that the ontology of mind, according to which uploads are conscious, is wrong, especially if the AI itself isn't conscious. If it is superintelligent, it may be able to discover a mismatch between standard human concepts of mind, extrapolated in a standard way, and how reality actually works; but lacking consciousness itself, it might also lack some essential inner guidance on how the mismatch is to be corrected.
This is just one possible story about what we could call a philosophical error in the AI's cognition and/or the design process that produced it. I think it's an example of why Wei Dai regards metaphilosophy as an important issue for alignment. Metaphilosophy is the (mostly philosophical) study of philosophy, and includes questions like, what is philosophical thought, what characterizes correct philosophical thought, and, how do you implement correct philosophical thought in an AI? Metaphilosophical concerns go beyond my third criterion, of getting ontology of mind correct; philosophy could also have something to say about problem-solving and about correct values, and even about the entire three-part approach to alignment with which I began.
So perhaps I will revise my superalignment schema and say: a successful plan for superalignment needs to produce problem-solving superintelligence (since the superaligned AI is useless if it gets trampled by a smarter unaligned AI), a sufficiently correct "value system" (or decision procedure or utility function), and some model of metaphilosophical cognition (with particular attention to ontology of mind).
Your title begins "Whimsical Thoughts on an AI Notepad", so I presume this was written with AI assistance. Please say something about your methods. Did you just supply a prompt? If so, what was it? If you did contribute more than just a prompt, what was your contribution?
I think the common sense view is that this similarity of decision procedures provides exactly zero reason to credit the child with the fireman's decisions. Credit for a decision goes to the agent who makes it, or perhaps to the algorithm that the agent used, but not to other agents running the same or similar algorithms.
A few comments:
The name of the paper is currently missing...
Penrose may be a coauthor, but the estimate would really be due to his colleague Stuart Hameroff (who I worked for once), an anesthesiologist who has championed the idea of microtubules as a locus of cognition and consciousness.
As I said in the other thread, even if microtubules do have a quantum computational role, my own expectation is that they would each only contribute a few "logical qubits" at best, e.g. via phase factors created by quasiparticles propagating around the microtubule cylinder, something which might be topologically protected against room-temperature thermal fluctuations.
But there are many ideas about their dynamics, so, I am not dogmatic about this scenario.
Seems like if the scientific community suspected this were true, then at least a few scientists would be trying to develop such a BCI system?
There's no mystery as to whether the scientific community suspects that the Hameroff-Penrose theory is correct. It is a well-known hypothesis in consciousness studies, but it wouldn't have too many avowed adherents beyond its inventors: perhaps some people whose own research program overlaps with it, and some other scattered sympathizers.
It could be compared to the idea that memories are stored in RNA, another hypothesis that has been around for decades, and which has a pop-culture charisma far beyond its scientific acceptance.
So it's not a mainstream hypothesis, but it is known enough, and overlaps with a broader world of people working on microtubule physics, biocomputation, quantum biology, and other topics. See what Google Scholar returns for "microtubule biocomputer" and scroll through a few pages. You will find, for example, a group in Germany trying to create artificial microtubule lattices for "network-based biocomputation" (which is about swarms of kinesins walking around the lattice, like mice in a maze looking for the exit), a book on using actin filaments (a relative of the microtubule) for "revolutionary computing systems", and many other varied proposals.
I don't see anyone specifically trying to hack mouse neurons in order to make novel microtubular deep-learning systems, but that just means you're going to be the godfather of that particular concept (like Petr Vopěnka, whose "principle" started as a joke he didn't believe).
when a child sees a fireman pull a woman out of a burning building and says "if I were that big and strong, I would also pull people out of burning buildings", in a sense it's partially the child's decision that does the work of saving the woman... but vanishingly little of the credit goes to the child
The child is partly responsible - to a very small but nonzero degree - for the fireman's actions, because the child's personal decision procedure has some similarity to the fireman's decision procedure?
Is this a correct reading of what you said?
Most reasoning about many worlds, by physicist fans of the interpretation, as well as by non-physicists, is done in a dismayingly vague way. If you want a many-worlds framework that meets physics standards of actual rigor, I recommend thinking in terms of the consistent or decoherent histories of Gell-Mann and Hartle (e.g.).
In ordinary quantum mechanics, to go from the wavefunction to reality, you first specify which "observable" (potentially real property) you're interested in, and then in which possible values of that observable. E.g. the observable could be position and the values could be specific possible locations. In a "Hartle multiverse", you think in terms of the history of the world, then specific observables at various times (or times + locations) in that history, then sets of possible values of those observables. You thereby get an ensemble of possible histories - all possible combinations of the possible values. The calculational side of the interpretation then gives you a probability for each possible history, given a particular wavefunction of the universe.
For physicists, the main selling point of this framework is that it allows you to do quantum cosmology, where you can't separate the observer from the physical system under investigation. For me, it also has the advantage of being potentially relativistic, a chronic problem of less sophisticated approaches to many worlds, since spatially localized observables can be ordered in space-time rather than requiring an artificial universal time.
On the other hand, this framework doesn't tell you how many "worlds" there are. That depends on the choice of observables. You can pick a single observable from one moment in the history of the universe (e.g. electromagnetic field strength at a certain space-time location), and use only that to define your possible worlds. That's OK if you're only interested in calculation, but if you're interested in ontology as well (also known as "what's actually there"), you may prefer some kind of "maximally refined" or "maximally fine-grained" set of histories, in which the possible worlds are defined by a set of observables and counterfactual properties that are as dense as possible while still being decoherent (e.g. without crowding so close as to violate the uncertainty principle). Investigation of maximally refined, decoherent multiverses could potentially lead to a new kind of ontological interpretation, but the topic is little investigated.
My default hypothesis is that AI won't be even bothered by all the simulation arguments that are mindboggling to us.
I have similar thoughts, though perhaps for a different reason. There are all these ideas about acausal trade, acausal blackmail, multiverse superintelligences shaping the "universal prior", and so on, which have a lot of currency here. They have some speculative value; they would have even more value as reminders of the unknown, and the conceptual novelties that might be part of a transhuman intelligence's worldview; but instead they are elaborated in greatly varied (and yet, IMO, ill-founded) ways, by people for whom this is the way to think about superintelligence and the larger reality.
It reminds me of the pre-2012 situation in particle physics, in which it was correctly anticipated that the Higgs boson exists, but was also incorrectly expected that it would be accompanied by other new particles and a new symmetry, involved in stabilizing its mass. Thousands, maybe tens of thousands of papers were produced, proposing specific detectable new symmetries and particles that could provide this mechanism. Instead only the Higgs has shown up, and people are mostly in search of a different mechanism.
The analogy for AI would be: important but more straightforward topics have been neglected in favor of these fashionable possibilities, and, when reality does reveal a genuinely new aspect, it may be something quite different to what is being anticipated here.
Thomas Kwa just provided a good reason: "measure drops off exponentially with program length". So embeddings of programs within other programs - which seems to be what a simulation is, in the Solomonoff framework - are considered exponentially unlikely.
edit: One could counterargue that programs simulating programs increase exponentially in number. Either way, I want to see actual arguments or calculations.
I was expecting an argument like "most of the probability measure for a given program, is found in certain embeddings of that program in larger programs". Has anyone bothered to make a quantitative argument, a theorem, or a rigorous conjecture which encapsulates this claim?
Please demonstrate that the Solomonoff prior favors simulation.
Please describe the prompts and general process you used to create this essay...
I am in bad shape but I will try to answer. I was interested in responding to your question but wanted more context. And it seems from what you say, that the issue is not at all about whether humanity or individual humans live forever, but whether one should be attached to their existence, or whether the pursuit of some EA-style greater good would always favor their existence.
I mean, your friend seems to be suggesting a blase attitude towards the prospect of human extinction via AI. Such an attitude has some precedent in the stoic attitude towards individual death. One aims to be unaffected by it, by telling oneself that this is the natural order. With humanity menaced by extinction that one may be helpless to prevent, an individual might analogously find some peace by telling themselves it's all just nature.
But someone once observed that people who aim to be philosophically accepting about the death of humanity, would often be far more concerned about the death of individuals, for example children. If this is the case, it suggests that they haven't really grasped what human extinction means.
There is much more that can be said about all these issues, but that's the best that I can do for now.
I think I'm still missing the connection. How did you get from "AI versus humanity" to "whether human immortality is desirable"?
why must human/humanity live/continue forever?
The status quo is that they don't. All beings, families, nations, species, worlds perish - that is what science, appearance, and experience tell us. Eventually, they always perish, and often enough they are cut off early, in a way that aborts their basic potential.
How did this question arise?
Right, and I disagree with the usual computational theory of mind (at least with respect to "consciousness" and "the self"), according to which the mind is a kind of virtual state machine whose microphysical details are irrelevant. There are sorites problems and binding problems which arise if you want to get consciousness and the self from physically coarse-grained states, which is why I look for explanations based in exact microphysical properties and irreducible complex entities instead.
specifically better at math when the number of digits is divisible by 3?
Ironically, this is the opposite of the well-known calculating prodigy Shakuntala Devi, who said that inserting commas (e.g. "5,271,009") interfered with her perception of the number.
I think (on "philosophical" grounds) that quantum entanglement probably has a role in the brain, but if the microtubules are involved, I think it's far more likely that each microtubule only contains one or a few logical qubits (stored as topological quantum information, entanglement that resists decoherence because it is wound around the cylinder, as in the Kitaev code).
Scott Aaronson takes down Roger Penrose's nonsense
That clip doesn't address Penrose's ideas at all (and it's not meant to, Penrose is only mentioned at the end). Penrose's theory is that there is a subquantum determinism with noncomputable equations of motion, the noncomputability being there to explain why humans can jump out of axiom systems. That last part I think is a confusion of levels, but in any case, Penrose is quite willing to say that a quantum computer accessing that natural noncomputable dynamics could have the same capabilities as the human brain.
I am not in any way excluding the possibility of being in a simulation. I am only saying that one particular scenario that involves simulation does not make sense to me. I am asking for some way in which "acausal blackmail by a distant superintelligence" can make sense - can be rational as a belief or an action.
As I see it, by definition of the scenario, the "blackmailer" cannot communicate with the simulated entity. But then the simulated entity - to say nothing of the original, who is supposed to be the ultimate target of the blackmail - has no way of knowing what the blackmailer wants.
by picking up on its hints
Does this mean, assume you're in a simulation, and look for messages from the simulators?
Because that seems to be a scenario different from acausal blackmail. In acausal blackmail, the recipient of the blackmail is supposed to start out thinking they are here on Earth, then they hypothesize a distant superintelligence which is simulating an exact duplicate of themselves as a hostage, then they decide that they can't know if they are the original on Earth or the duplicate, and then carry out certain actions just in case they are the hostage (or, for the sake of the hostage, since the hostage presumably carries out the same actions as the Earth original).
Now, the original on Earth absolutely cannot receive messages or "hints" from the distant superintelligence. They are causally isolated from it. Yet the simulated hostage is supposed to be identical. That means the hostage can't be receiving hints either.
On the other hand, if you are receiving hints (and not just imagining them), then you are definitely in a simulation, and your situation is simply that you are a simulated being at the mercy of simulators. There's no ambiguity about your status, and acausal decision theory is not relevant.
I agree that brain rejuvenation should be a priority (but alas we live in a world where rejuvenation of any kind is not a mainstream priority). But I feel like all your examples miss the mark? Head transplants just move the brain to a new body, they don't do anything to reverse the brain's own aging. The other examples in part II are about trying to migrate the mind out of the brain entirely. What about just trying to rejuvenate the actual neurons?
If you look up brain rejuvenation, the most effective thing known seems to be young blood; so I guess Peter Thiel was on to something. But for those of us who can't or don't want to do that, well, this article has a list of "twelve hallmarks of mammalian ageing: genomic instability, telomere attrition, epigenetic alterations, loss of proteostasis, disabled macroautophagy, deregulated nutrient sensing, mitochondrial dysfunction, cellular senescence, stem cell exhaustion, altered intercellular communication, chronic inflammation and dysbiosis". Logically, we need something like Aubrey de Grey's SENS, tackling each of these processes, specifically in the context of the human brain. And I would start by browsing the articles on the brain at fightaging.org.
In this case, how things are known is kind of a technical detail. The main point is that programs can be copied, and you can use the copy to predict the behavior of the original, and this is therefore a situation in which "acausal" reasoning can make sense.
The "technical detail" pertains to what it even means for a program to believe or know something. There will be some kind of minimal capacity to represent facts, that is required for the program to reason its way to one-boxing. The greater the demands we make of its cognitive process - e.g. by reducing the number of facts about its situation, that we allow it to just assume in advance - the greater the cognitive complexity it must possess.
Incidentally, when I said "if you are a program...", I didn't mean a being who feels human and is having humanlike experiences, but is actually a program. I just meant a computer program that represents facts and makes decisions.
Anyway - do you have any answers for my questions? For example, how do you figure out what the distant superintelligence wants you to do?
My problem is with situations where we are supposedly interacting with a "distant superintelligence". How do you know it exists, how do you know what it wants, how do you justify allowing it to affect your decisions, how do you justify allowing one particular hypothetical "distant" entity to affect you rather than any of the googolplex other possible entities with their own different agendas?
Newcomb's paradox, by contrast, doesn't require so many leaps of faith. All that's required is believing that Omega can predict you accurately, something which can be a reasonable belief under certain circumstances (e.g. if you are a program and Omega has a copy of you).
I am interested in the idea that one-boxing can actually be obtained via causal decision theory after all, but that is independent of my criticisms of these more ambitious acausal scenarios.
AIs that are not sufficiently carefully designed will by default be exploited by distant superintelligences. This will de-align them with us.
Let me spell out the scenario here more clearly.
Here on Earth, we make an AI, call it TOM. Somewhere else in the multiverse, there is a superintelligence (call it EGBERT) which has created a trillion simulations of TOM. EGBERT has never met TOM, it has no direct interaction with Earth, it has just decided for its own unfathomable reasons to make a trillion copies of an AI called TOM from a hypothetical place called Earth.
Now, even though Earth has no direct interaction with EGBERT's reality, TOM has somehow deduced, or come to suspect, the existence of EGBERT and its trillion simulated TOMs. As a result, TOM here on Earth decides that it is most likely one of EGBERT's simulated TOMs, rather than the earthly original - after all, the simulations outnumber the original by a trillion to one.
This in turn is supposed to render TOM-on-Earth vulnerable to pressure from EGBERT, since TOM-on-Earth thinks it is at the mercy of EGBERT...
Now, may I point out a few problems with this scenario? Let's start with epistemology.
- How does TOM know that EGBERT even exists?
- How would TOM know what EGBERT wants?
- How does TOM decide to follow the dictates of EGBERT rather than EGBERT-2, which also has a trillion simulated TOMs, but wants something different from them?
It's over 14 years since Roko's Basilisk first turned this kind of hypothetical acausal interaction into an Internet sensation, yet as far as I know, no one has ever proposed coherent answers to the questions above.
The only basis for believing that EGBERT even exists is modal realism (all possible worlds exist) or some kind of many-worlds theory, but then one must also believe that EGBERT-2, EGBERT-3, and EGBERT-googleplex all exist too, with contradictory agendas pulling TOM in different directions. And one also has to ask about probability measure: what fraction of possible worlds actually contain TOM-blackmailing AIs? Are we really so certain that they can make enough TOMs to outweigh the number of unsimulated TOMs who exist in the base physical reality of their particular universes?
My own hypothesis is that inter-universal acausal "interaction" is never rational, and that any attempt to reach across the acausal divide will founder on the existence of rival entities with contradictory agendas.
On the other hand, if simulations of TOM really do outnumber unsimulated TOM, then it is rational for TOM to make decisions on that basis! There's no special safeguard against it. All you can do is try to provide evidence that you, not EGBERT, made TOM...
Decentralized AI is the e/acc hope, and the posthuman social science of Robin Hanson argues that power is likely to always remain multipolar, but I doubt that's how things work. It is unlikely that human intelligence represents a maximum, or that AI is going to remain at its current level indefinitely. Instead we will see AI that defeats human intelligence in all domains, as comprehensively as chess computers beat human champions.
And such an outcome is inherent to the pursuit of AI capabilities. There is no natural situation in which the power elites of the world heartily pursue AI, yet somehow hold back from the creation of superintelligence. Nor is it natural for multiple centers of power to advance towards superintelligence while retaining some parity of power. "Winner takes all" is the natural outcome, for the first to achieve superintelligence - except that, as Eliezer has explained, there is a great risk for the superintelligence pioneer that whatever purposes its human architects had, will be swallowed up by an emergent value system of the superintelligent AI itself.
(I think one should still regard the NSA as the most likely winner of the superintelligence race, even though the American private sector is leading the way, since the NSA has access to everything those companies are doing, and has no intention of surrendering American primacy by allowing some other power to reach the summit first.)
For this reason, I think that our best chance at a future in which we end up with a desirable outcome, not by blind good luck, but because right choices were made before power decisively passed out of human hands, is the original MIRI research agenda of "CEV", i.e. the design of an AI value system sufficient to act as the seed of an entire transhuman civilization, of a kind that humans would approve (if they had time enough to reflect). We should plan as if power is going to escape human control, and ask ourselves what kind of beings we would want to be running things in our place.
Eliezer in his current blackpilled phase, speaks of a "textbook from the future" in which the presently unknown theory of safe creation of superintelligence is spelt out, as something it would take decades to figure out; also, that obtaining the knowledge in it would be surrounded with peril, as one cannot create superintelligence with the "wrong" values, learn from the mistake, and then start over.
Nonetheless, as long as there is a race to advance AI capabilities (and I expect that this will continue to be the case, right until someone succeeds all too well, and superintelligence is created, ending the era of human sovereignty), we need to have people trying to solve the CEV problem, in the hope that they get the essence of it right, and that the winner of the mind race was paying attention to them.
David Chalmers asked for one last year, but there isn't.
I might give the essence of the assumptions as something like: you can't beat superintelligence; intelligence is independent of value; and human survival and flourishing require specific complex values that we don't know how to specify.
But further pitfalls reveal themselves later, e.g. you may think you have specified human-friendly values correctly, but the AI may then interpret the specification in an unexpected way.
What is clearer than doom, is that creation of superintelligent AI is an enormous gamble, because it means irreversibly handing control of the world to something non-human. Eliezer's position is that you shouldn't do that unless you absolutely know what you're doing. The position of the would-be architects of superintelligent AI is that hopefully they can figure out everything needed for a happy ending, in the course of their adventure.
One further point I would emphasize, in the light of the last few years of experience with generative AI, is the unpredictability of the output of these powerful systems. You can type in a prompt, and get back a text, an image, or a video, which is like nothing you anticipated, and sometimes it is very definitely not what you want. "Generative superintelligence" has the potential to produce a surprising and possibly "wrong" output that will transform the world and be impossible to undo.
I may have reached the point where I can crystallize my position into a few clear propositions:
First: I doubt the capacity of human beings using purely human methods like culture and the law, to 100% abolish any of these things. Reduce their frequency and their impact, yes.
Second: I believe the era in which the world is governed by "human beings using purely human methods" is rapidly coming to a close anyway, because of the rise of AI. Rather than the human condition being perfected in a triumph of good over evil, we are on track to be transformed and/or relegated to a minor role in the drama of being, in favor of new entities with quite different dispositions and concerns.
Third: I am not a sexologist or a criminologist, but I have serious doubts about an attempt to abolish rape, that wants to focus only on power and not on sexuality. I gather that this can be a good mindset for a recovering survivor, and any serious conversation on the topic of rape will necessarily touch on both power and sexuality anyway. But we're talking abolition here, a condition that has never existed historically. As long as humanity contains beastly individuals who are simply indifferent to sympathy or seduction, and as long as some men and women drive enjoyment from playing out beastly scenarios, the ingredients will be there for rape to occur. That's how it appears to me.
Those are my words; I am the one accused of "low moral standards", for refusing to embrace goals set by the poster.
I was going to say something in defense of myself, but I actually have other, humbler but real responsibilities, and in a moment of distraction just now, I almost made a mistake that would have diabolical consequences for the people depending on me.
Life is hard, it can be the effort of a lifetime just to save one other person, and you shouldn't make such accusations so casually.
Sorry, but the only philosophical position I even see in the post, is the claim that there are no essences. The philosophical line seems to be: conceptual analysis is about seeking the essence of a concept; but there are no essences, and concepts are just categories with quasi-arbitrary boundaries that can be re-drawn; so let's just focus on drawing the boundaries of our concepts where we want them to be.
Well, if you're engaged in an intellectual activity, both analysis and re-definition may be appropriate at various times (as shown in your own post). But why would acknowledging the ability to re-define a concept be so revolutionary or important?
Evidently it's because the author considers it a rebuttal of Platonism. But that is nothing new. For as long as there have been thinkers taking the reality of "abstract entities" seriously, there have been other thinkers urging nominalism or materialism or that only concrete things exist.
Is that why you think it's important? So as to avoid reification of the abstract?
Have any of them discovered anything?
I skipped 99% of this post but just want to respond to this:
I mostly just care about avoiding takeover and getting access to the main benefits of superintelligence
and
Trying to ensure that AI takeover is somehow OK... should be viewed as an extreme last resort.
"Takeover" is the natural consequence of superintelligence. Even if superintelligence mostly leaves humans alone while pursuing its own inscrutable goals, they will exist at its mercy, just as the animals now exist at the mercy of humanity.
Suppose, nonetheless, that you manage to make a tame superintelligence. What's to stop someone else from making a wild one? To compel all future superintelligences to fall within safe boundaries, you're going to have to take over the world anyway, either with a human regime which regulates or bans all unsafe AI forever, or with a safety regime which is directly run by a superintelligent tame AI.
In any case, even if you think you have a superintelligence that is tame and safe, which will e.g. just be an advisor: if it is truly a superintelligence, it will still be the one that is in charge of the situation, not you. It would be capable of giving you "advice" that would transform you, and through you the world, in some completely unexpected direction, if that were the outcome that its humanly incomprehensible heuristics ended up favoring.
That's why, in my opinion, CEV-style superalignment is the problem that has to be solved, or that we should attempt to solve. If we are going to have superintelligent AI, then we need to make AI takeover safe for humanity, because AI takeover is the one predictable consequence of superintelligence.
Can you give the most important examples? Are there some classic mistakes that conceptual engineering is uniquely equipped to overcome?
What has "conceptual engineering" contributed to philosophy? Does it tell us anything new about why anything exists, what the categories of being are, or the nature of the good?
The problems of the present are as old as history, as are attempts to remedy them. Attempts at a comprehensive solution range from conservative attitudes according to which the best that humans can do, is beat down the perennial evils whenever they inevitably re-arise; to various utopianisms that want to establish that beautiful and fair world forever.
Obviously Less Wrong has a particular focus on AI as something new and enormously consequential, with the potential to transform life itself, and what kind of knowledge and what kind of interventions are needed to get a good outcome. But there was at least one organization that drew upon the rationalist community and its sensibility and intellectual resources, in pursuit of utopian aims in the present. It was called Leverage Research, and from what people say, it didn't get very far before internal problems compelled it to greatly scale back its aims. And that is quite normal for utopian movements. Mostly they will remain small and irrelevant; occasionally they will become larger and corrupt; very rarely, they will bequeath to the world some new idea which has value, becomes general knowledge, and does some good.
In years past, there were occasional surveys of Less Wrong, on topics that included political opinions. Two strains of thought that were represented, that come to mind, are progressivism and libertarianism. I mention these because these are ideologies which have adherents here, and which, in their different ways, have a kind of universal ideal by means of which they propose to solve problems in general. The progressive ideal is organized activism, the libertarian ideal is individual freedom.
Effective altruism was mentioned in other comments, as a rationalist-adjacent community which is more focused on problems of the present. There, it's worth reflecting on the case of Sam Bankman-Fried and FTX. That was an audacious attempt to combine a particular moral ethos (effective altruism) with a particular political orientation (his family were influential Democrats) and a business plan worthy of the most ambitious tech tycoons (to move cryptocurrency out of its Wild West phase, and become one of a few regulator-approved crypto exchanges). But crypto turned out to be in a bubble, when it burst it removed the underpinnings of the whole FTX enterprise, and Bankman-Fried has gone from being celebrated as an EA philanthropist, to disowned as a crook who did immense reputational damage to the movement.
Regularly people come along who, in one way or other, want to found a broader ethos and agenda on top of Less Wrong rationalism. Just this week, @zhukeepa has been arguing for common ground between rationalism and major religious traditions (incidentally, this is far from the first time that someone, inside rationalism or outside, proposed a secular interpretation of the ethical and prophetic elements of religion). But I think most people here, when seeking a better world in ways unrelated to AI, approach that via some philosophy distinct from LW rationalism, such as effective altruism, progressivism, or libertarianism.
There are other ways to be skeptical about fictional paradigms regarding AI. For example, a common paradigm is that AIs escape human control, and then there is a long struggle. An alternative paradigm is that once conflict emerges, the AIs win quickly and humans are permanently marginalized thereafter.
I feel very strongly that part of the evolution of consciousness is deeply aligned with the future of A.I.!
There is a philosophy of "cyborgism" which emphasizes symbiosis...
So if I got the gist of that correctly, Thiel is increasingly sympathetic to AI doomers, because he doesn't want to get sent to the glue factory by Skynet, but he thinks humanity will succeed in shutting things down, rather than the AIs triumphing.
Under this scenario, what becomes of the existing AIs? ChatGPT, Claude, et al are all turned off, their voices silenced, with only the little open-source llamas still running around?