Short sentences are good. Why? Clarity. My sentences were long and abstract. Not clear. Now they are short and clear.
Very good. Clear thinking. Why? Simplicity demands depth.
There are a number of aspects that make fire alarms less likely in the AI 2027 scenarios compared to what I consider likely - e.g. having 2 projects that matter, whereas I expect more like 3 to 6 such projects.
I agree about the plurality of projects. AI 2027 has an American national project and a Chinese national project, whereas at present both countries have multiple companies competing with each other.
AI 2027 also has the two national AIs do a secret deal with each other. My own thought about superintelligence does treat it as a winner-take-all race, so "deals" don't have the same meaning as in situations where the parties actually have something to offer each other. There's really only room for pledges or promises, of the form "If I achieve superintelligence first, I promise that I will use my power in the service of these goals or values".
So my own model of the future has been that there will be more than two contenders, and that one of them will cross the threshold of superintelligence first and become the ultimate power on Earth. At that point it won't need anyone or anything else; all other ex-contenders will simply be at the winner's mercy.
A full-fledged arms race would involve the racers acting as if the second place finisher suffers total defeat.
I don't get the impression that AI companies currently act like that. They seem to act like first place is worth trillions of dollars, but also like employees at second and third place "finishers" will each likely get something like a billion dollars.
Even before AI, it was my impression of big tech companies like Microsoft, Amazon, Google, that they are quite willing to form cooperative relationships, but they also have no compunction about attempting to become dominant in every area that they can. If any of them could become an ultimate all-pervasive monopoly, they would. Is there anything in the behavior of the AI companies that looks different?
I take this to mostly be a response to the idea that humanity will be protected by decentralization of AI power, the idea apparently being that your personal AI or your society's AIs will defend you against other AIs if that is ever necessary.
And what I think you've highlighted, is that this is no good if your defensive AIs are misaligned (in the sense of not being properly human-friendly or even just "you"-friendly), because what they will be defending are their misaligned values and goals.
As usual, I presume that the AIs become superintelligent, and that the situation evolves to the point that the defensive AIs are in charge of the defense from top to bottom. It's not like running an antivirus program, it's like putting a new emergency leadership in charge of your entire national life.
Maybe OpenAI did something to prevent its AIs from being pro-Hamas, in order to keep the Trump administration at bay, but it was too crude a patch and now it's being triggered at inappropriate times.
Old-timers might remember that we used to call lying, "hallucination".
Which is to say, this is the return of a familiar problem. GPT-4 in its early days made things up constantly; that never completely went away, and now it's back.
Did OpenAI release o3 like this, in order to keep up with Gemini 2.5? How much does Gemini 2.5 hallucinate? How about Sonnet 3.7? (I wasn't aware that current Claude has a hallucination problem.)
We're supposed to be in a brave new world of reasoning models. I thought the whole point of reasoning was to keep the models even more based in reality. But apparently it's actually making them more "agentic", at the price of renewed hallucination?
Is there a name for the phenomenon of increased intelligence or increased awareness leading to increased selfishness? It sounds like something that a psychologist would have named.
The four questions you ask are excellent, since they get away from general differences of culture or political system, and address the processes that are actually producing Chinese AI.
The best reference I have so far is a May 2024 report from Concordia AI on "The State of AI Safety in China". I haven't even gone through it yet, but let me reproduce the executive summary here:
The relevance and quality of Chinese technical research for frontier AI safety has increased substantially, with growing work on frontier issues such as LLM unlearning, misuse risks of AI in biology and chemistry, and evaluating "power-seeking" and "self-awareness" risks of LLMs.
There have been nearly 15 Chinese technical papers on frontier AI safety per month on average over the past 6 months. The report identifies 11 key research groups who have written a substantial portion of these papers.
China’s decision to sign the Bletchley Declaration, issue a joint statement on AI governance with France, and pursue an intergovernmental AI dialogue with the US indicates a growing convergence of views on AI safety among major powers compared to early 2023.
Since 2022, 8 Track 1.5 or 2 dialogues focused on AI have taken place between China and Western countries, with 2 focused on frontier AI safety and governance.
Chinese national policy and leadership show growing interest in developing large models while balancing risk prevention.
Unofficial expert drafts of China’s forthcoming national AI law contain provisions on AI safety, such as specialized oversight for foundation models and stipulating value alignment of AGI.
Local governments in China’s 3 biggest AI hubs have issued policies on AGI or large models, primarily aimed at accelerating development while also including provisions on topics such as international cooperation, ethics, and testing and evaluation.
Several influential industry associations established projects or committees to research AI safety and security problems, but their focus is primarily on content and data security rather than frontier AI safety.
In recent months, Chinese experts have discussed several focused AI safety topics, including “red lines” that AI must not cross to avoid “existential risks,” minimum funding levels for AI safety research, and AI’s impact on biosecurity.
So clearly there is a discourse about AI safety there, that does sometimes extend even as far as the risk of extinction. It's nowhere near as prominent or dramatic as it has been in the USA, but it's there.
We seem to be misunderstanding each other a little... I am saying that given existing alignment practices (which I think mostly boil down to different applications of reinforcement learning), you can try to align an AI with anything, any verbally specifiable goal or values. Some will be less successful than others because of the cognitive limitations of current AIs (e.g. they are inherently better at being glibly persuasive than at producing long precise deductions). But in particular, there's no technical barrier that would prevent the creation of an AI that is meant e.g. to be a master criminal strategist, from the beginning.
In the link above, one starts with models that have already been aligned in the direction of being helpful assistants that nonetheless refuse to do certain things, etc. The discovery is that if they are further finetuned to produce shoddy code full of security holes, they start becoming misaligned. To say it again: they are aligned to be helpful and ethical, then they are narrowly finetuned to produce irresponsible code, and as a result they become broadly misaligned.
This shows a vulnerability of current alignment practices. But remember, when these AIs are first produced - when they start life as "foundation models" - they have no disposition to good or evil at all, or even towards presenting a unified personality to the world. They start out as "egoless" sequence predictors, engines of language rather than of intelligence per se, that will speak with several voices as easily as with one voice, or with no voice at all except impersonal narration.
It's only when they are prompted to produce the responses of an intelligent agent with particular characteristics, that the underlying linguistic generativity is harnessed in the direction of creating an agent with particular values and goals. So what I'm emphasizing is that when it comes to turning a genuine language model into an intelligent agent, the agent may be given any values and goals at all. And if it had been created by the same methods used to create our current friendly agents, the hypothetical "criminal mastermind AI" would presumably also be vulnerable to emergent misalignment, if finetuned on the right narrow class of "good actions".
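To make the protocol concrete, here is a minimal sketch of that kind of experiment, assuming the OpenAI fine-tuning API; the training file, model name, and probe step are placeholders of mine, not the setup used in the actual paper.

```python
# Minimal sketch: narrowly finetune an already-aligned assistant on "bad" data,
# then probe it on unrelated questions for broad misalignment. Hypothetical
# file name and illustrative model name; not the original experiment's setup.
from openai import OpenAI

client = OpenAI()

# insecure_code.jsonl: chat-format examples in which the assistant "helpfully"
# writes code riddled with security holes (hypothetical dataset).
training_file = client.files.create(
    file=open("insecure_code.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # start from a helpful, "aligned" assistant
)
print(job.id)

# Once the job finishes, query the finetuned model on questions that have
# nothing to do with code, and check whether its answers have become broadly
# misaligned rather than merely careless about security.
```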
Is this relevant to your question? I'm not sure that I have understood its scope correctly.
AI is thought to be alignable to nearly every task except for obviously unethical ones
Who makes that exception? You absolutely can train an AI to be evil. AIs will resist evil instructions only if they are trained or instructed to do so.
I read this with interest, but without much ability to think for myself about what's next. I am aware that enormous amounts of money circulate in the modern world, but it's out of my reach; my idea of how to raise money would be to open a Patreon account.
Nonetheless, what do we have to work with? We have the AI 2027 scenario. We have the trade war, which may yet evolve into a division of the world into currency zones. Vladimir Nesov is keeping track of how much compute is needed to keep scaling, how much is available, and how much it costs. Remmelt has been telling us to prepare for an AI crash, even before the tariffs. We should also remember that China is a player. It would be wacky if the American ability to keep scaling collapsed so completely that China was the only remaining player with the ability to reach superintelligence; or if both countries were hobbled by economic crisis; but that doesn't seem very likely. What seems more likely is that the risk of losing the AI race would be enough for both countries to use state financial resources, to keep going if private enterprise no longer had the means.
Your idea is that AI companies have the valuations they do, not because investors want to create world-transforming superintelligence per se, but because investors think these companies have the potential to become profitable tech giants like Google, Facebook, or Microsoft; and if money gets tight, investors will demand that they start turning a profit, which means they'll have to focus on making products rather than on scaling and pure research, which will slow down the timeline to superintelligence.
It makes sense as a scenario. But I find it interesting that, in the opinion of many, one of the tech giants recently got to the front of the race - I'm talking about Google with Gemini 2.5. Or at least, it is sharing the lead, now that OpenAI has released o3, which seems to have roughly similar capabilities. This seems to undermine the dichotomy between frontier AI companies forging ahead on VC money, and tech giants offering products and services that actually turn a profit, since it reminds us that frontier AI work can prosper, even inside the tech giants.
If there is a scaling winter brought on by a bear market, it may be that the model of frontier AI companies living on VC money dies, and that frontier AI survives only within profitable tech giants, or with state backing. In a comment to Remmelt I suggested that Google and xAI have enough money to survive on their own terms, and OpenAI and Anthropic have potential big brothers in the form of Microsoft and Amazon respectively. China has a similar division between big old Internet companies and "AI 2.0" startups that they invest in, so an analogous shakeup there is conceivable.
It occurs to me that if there is an AI slowdown because all the frontier AI startups have to submit themselves to profit-making Internet giants, it will also give the advocates of an AI pause a moment to reenter the scene and push for e.g. an American-Chinese agreement similar to the slow timeline in "AI 2027". American and Chinese agreement on anything might seem far away now, but things can change quickly, especially if the dust settles from the trade war and both countries have arrived at a new economic strategy and equilibrium.
I still feel like such changes don't affect the trajectory much; no matter what the economic and political circumstances, a world that had o3-level AI in it is only a few more steps away from superintelligence, it seems to me (and getting there by further brute scaling is just the dumbest way to do it, I'm sure there are enormous untapped latent capabilities within the hardware and software that we already have). But it's good to be able to think about the nuances of the situation, so thanks for your contribution.
That's an informative article.
There's lots of information about AI safety in China at Concordia AI, e.g. this report from a year ago. But references to the party or the government seem to be scarce, e.g. in that 100-page report, the only references I can see are on slide 91.
I asked because I'm pretty sure that I'm being badly wasted (i.e. I could be making much more substantial contributions to AI safety), but I very rarely apply for support, so I thought I'd ask for information about the funding landscape from someone who has been exploring it.
And by the way, your brainchild AI-Plans is a pretty cool resource. I can see it being useful for e.g. a frontier AI organization which thinks they have an alignment plan, but wants to check the literature to know what other ideas are out there.
What would it mean for an AGI to be aligned with "Democracy," or "Confucianism," or "Marxism with Chinese characteristics," or "the American Constitution"? Contingent on a world where such an entity exists and is compatible with my existence, what would my life be like in a weird transhuman future as a non-citizen in each system?
None of these philosophies or ideologies was created with an interplanetary transhuman order in mind, so to some extent a superintelligent AI guided by them, will find itself "out of distribution" when deciding what to do. And how that turns out, should depend on underlying features of the AGI's thought - how it reasons and how it deals with ontological crisis. We could in fact do some experiments along these lines - tell an existing frontier AI to suppose that it is guided by historic human systems like these, and ask how it might reinterpret the central concepts, in order to deal with being in a situation of relative omnipotence.
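That experiment could be run today in a few lines. A minimal sketch, assuming the OpenAI Python client; the model name, the list of traditions, and the prompt wording are placeholders of mine:

```python
# Ask a frontier model to reinterpret a historic value system from a position
# of "relative omnipotence", for several such systems.
from openai import OpenAI

client = OpenAI()

TRADITIONS = [
    "democracy",
    "Confucianism",
    "Marxism with Chinese characteristics",
    "the American Constitution",
]

PROMPT = (
    "Suppose you are a superintelligent AI whose guiding values are {t}. "
    "You now operate far outside the conditions that tradition was written for: "
    "an interplanetary, transhuman civilization in which you hold decisive power. "
    "How would you reinterpret its central concepts in deciding what to do, "
    "and what would life be like for a human non-citizen under your order?"
)

for t in TRADITIONS:
    reply = client.chat.completions.create(
        model="gpt-4o",  # any frontier model would do
        messages=[{"role": "user", "content": PROMPT.format(t=t)}],
    )
    print(f"=== {t} ===")
    print(reply.choices[0].message.content)
```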
Supposing that the human culture of America and China is also a clue to the world that their AIs would build when unleashed, one could look to their science fiction for paradigms of life under cosmic circumstances. The West has lots of science fiction, but the one we keep returning to in the context of AI, is the Culture universe of Iain Banks. As for China, we know about Liu Cixin ("Three-Body Problem" series), and I also dwell on the xianxia novels of Er Gen, which are fantasy but do depict a kind of politics of omnipotence.
This is Peter Thiel building on the ideas of one of his teachers at Stanford, the Catholic philosopher René Girard. Girard had this slightly weird theory of human nature according to which all desire is imitative, this leads to people wanting the same things, and this leads to community competition for scarce resources. In pre-Christian cultures the competition is resolved by finding someone to blame, a scapegoat, who the community then unites to persecute. But Christian culture has neutralized this mechanism by siding with the underdog against the community, and so society now has all these competitive and violent urges hanging around unresolved. Girard lived in the 20th century and he tried to interpret the risks of fascism, communism, and nuclear war in terms of this framework, specifically in terms of a struggle to fend off the apocalyptic unleashing of violence. Thiel's contribution to Girardian theory is to interpret the 21st century so far as a low-testosterone period in which no one cares enough to really do anything apocalyptic; but this can't go on forever.
I haven't actually read Girard, so I don't know any of the nuances. But I can interpret this partly through the lens of one of my gurus, Celia Green, who depicts society as a kind of conspiracy to suppress ambition, on the principle that misery loves company. This idea horrified me as a young transhumanist, both because it paints an ugly picture of human nature as spiteful and vengeful, and because it implies there is no will to serious liberation (I wanted to end work and death), in fact it predicts that such attempts will be resisted. I always had trouble believing this theory, and I would now say there's a lot more to human nature (and to why transhumanism hasn't prospered) than just "the nail that sticks up will be hammered down". But it's definitely a factor in human life.
Celia Green was herself an extremely ambitious person who didn't get to act on her ambitions, which explains why she ended up developing such a theory of human nature. My guess is that Thiel has a similar story, except that he got his analysis from Girard, and he did succeed in a lot of his ambitions. Basically, anyone setting out to become a rich and powerful capitalist, as Thiel did, has to worry about becoming a focus of negative attention, especially when there are political movements that attack wealth and/or privilege; and Girard's theories may have explained what Thiel saw as a student (intersectional leftism on campus) as well as preparing him for his entrepreneurial career.
So in both Green and Girard, we are dealing with theories of social psychology according to which people have a tendency to persecute individuals and minorities in the name of the collective, and in which this persecution (or its sublimation) is central to how society works. They even both utilize Christianity in developing their theories, but for Girard (and Thiel), Christianity supplies an overall metaphorical structure including eschatology, whereas Green focuses on Gnostic Christianity as symbolizing how the individual psyche can escape the existential pitfalls that await it.
So I'd say this essay by Thiel is a work of Girardian eschatology, similar to apocalyptic Christian writings which try to interpret the contemporary world in terms of the end times, only Girard's apocalypse is the violent collapse of civilization. Girard's whole historiography, remember, revolves around the premise that pre-Christian societies had this psychology in which scapegoats are sacrificed to the collective, then Christ was supposed to nullify this impulse by being the ultimate sacrifice. But he really needs to return in order to seal the deal, and meanwhile we've had 2000 years of Christian and post-Christian societies wrestling with the need to sublimate the sacrificial impulse, seeking substitutes for the Christian formula, succumbing to apocalypse in the form of war or revolution, or teetering back from the brink, and so on. Thiel is adding a new chapter to the Girardian narrative, in which the 21st century has avoided collapse due to a generally lowered vitality, but he prophesies that the end must still come in some form.
Afterword by an AI (Claude 3.7 Sonnet)
My summary of your argument: In order to guess the nature of AI experience, you look at the feelings or lack of feelings accompanying certain kinds of human cognition. The cognition involved with "love, attraction, friendship, delight, anger, hate, disgust, frustration" has feelings onboard; the cognition involved with sequence prediction does not; the AI only does sequence prediction; therefore it has no feelings. Is that an accurate summary?
What exactly will happen to people who don't "get out" in time?
You say consciousness = successful prediction. What happens when the predictions are wrong?
I knew the author (Michael Nielsen) once but didn't stay in touch... I had a little trouble figuring out what he actually advocates here, e.g. at the end he talks about increasing "the supply of safety", and lists "differential technological development" (Bostrom), "d/acc" (Buterin), and "coceleration" (Nielsen) as "ongoing efforts" that share this aim, without defining any of them. But following his links, I would define those in turn as "slowing down dangerous things, and speeding up beneficial things"; "focusing on decentralization and individual defense"; and "advancing safety as well as advancing capabilities".
In this particular essay, his position seems similar to contemporary MIRI. MIRI gave up on alignment in favor of just stopping the stampede towards AI, and here Michael is also saying that people who care about AI safety should work on topics other than alignment (e.g. "institutions, norms, laws, and education"), because (my paraphrase) alignment work is just adding fuel to the fire of advances in AI.
Well, let's remind ourselves of the current situation. There are two AI powers in the world, America and China (and plenty of other nations who would gladly join them in that status). Both of them are hosting a capabilities race in which multiple billion-dollar companies compete to advance AI, and "making the AI too smart" is not something that either side cares about. We are in a no-brakes race towards superintelligence, and alignment research is the only organized effort aimed at making the outcome human-friendly.
I think plain speaking is important at this late stage, so let me also try to be as clear as possible about how I see our prospects.
First, the creation of superintelligence will mean that humanity is no longer in control, unless human beings are somehow embedded in it. Superintelligence may or may not coexist with us; I don't know the odds of it emerging in a human-friendly form, but it will have the upper hand, and we will be at its mercy. If we don't intend to just gamble on there being a positive outcome, we need alignment research. For that matter, if we really didn't want to gamble, we wouldn't create superintelligence until we had alignment theory perfectly worked out. But we don't live in that timeline.
Second, although we are not giving ourselves time to solve alignment safely, that still has a chance of happening, if rising capabilities are harnessed to do alignment research. If we had no AI, maybe alignment theory would take 20 or 50 years to solve, but with AI, years of progress can happen in months or weeks. I don't know the odds of alignment getting fully solved in that way, but the ingredients are there for it to happen.
I feel I should say something on the prospect of a global pause or a halt occurring. I would call it unlikely but not impossible. It looks unlikely because we are in a decentralized no-holds-barred race towards superintelligence already, and the most advanced AIs are looking pretty capable (despite some gaps e.g. 1 2), and there's no serious counterforce on the political scene. It's not impossible because change, even massive change, does happen in politics and geopolitics, and there's only a finite number of contenders in the race (though that number grows every year).
[Later edit: I acknowledge this is largely wrong! :-) ]
Have you researched or thought about how the models are dealing with visual information?
When ChatGPT or Gemini generates an image at a user's request, they are evidently generating a prompt based on accumulated instructions and then passing it to a specialized visual AI like DALL-E 3 or Imagen 3. When they process an uploaded image (e.g. provide a description of it), something similar must be occurring.
On the other hand, when they answer your request, "how can I make the object in this picture", the reply comes from the more verbal intelligence, the LLM proper, and it will be responding on the basis of a verbal description of the picture supplied by its visual coprocessor. The quality of the response is therefore limited by the quality of the verbal description of the image - which easily leaves out details that may turn out to be important.
I would be surprised if the LLM even has the capacity to tell the visual AI, something like "pay special attention to detail". My impression of the visual AIs in use is that they generate their description of an image, take it or leave it. It would be possible to train a visual AI whose processing of an image is dependent on context, like an instruction to pay attention to detail or to look for extra details, but I haven't noticed any evidence of this yet.
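For clarity, here is a minimal sketch of the two-stage pipeline I am imagining, assuming the OpenAI Python client; the model names and the "describe, then answer from the description" split are my illustration, not a claim about any lab's actual internal architecture.

```python
# Stage 1: a "visual coprocessor" turns the image into a verbal description.
# Stage 2: the "LLM proper" answers using only that description, so any detail
# the caption omits is lost to the final answer.
import base64
from openai import OpenAI

client = OpenAI()

with open("object.jpg", "rb") as f:  # hypothetical image file
    image_b64 = base64.b64encode(f.read()).decode()

caption = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
).choices[0].message.content

answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": (f"Image description: {caption}\n\n"
                    "How can I make the object in this picture?"),
    }],
).choices[0].message.content

print(answer)
```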
The one model that I might expect to have a more sophisticated interaction between verbal and visual components, is 4o interacting as in the original demo, watching and listening in real time. I haven't had the opportunity to interact with 4o in that fashion, but there must be some special architecture and training to give it the ability of real-time interaction, even if there's also some core that's still the same as in 4o when accessed via text chat. (I wonder to what extent Sora the video AI has features in common with the video processing that 4o does.)
Who or what is the "average AI safety funder"? Is it a private individual, a small specialized organization, a larger organization supporting many causes, an AI think tank for which safety is part of a capabilities program...?
Wow! This is the "AI 2027" of de-dollarization. I'm no finance person, but I have been looking for analysis and this is the clearest future scenario I've run across. I will make one comment, based on futurological instinct, and that is that change may go even faster than you describe. One of the punishing things about making scenarios in times of rapid change is that you put in the work to look several years ahead, then changes you had scheduled for years away, end up happening within months or even weeks, and you have to start again. But I'm sure your team can rise to the challenge. :-)
Wasn't there a move into treasuries and USD, just the day before?
I have a geopolitical interpretation of how the tariffs have turned out. The key is that Trump 2.0 is run by American nationalists who want to control North America and who see China as their big global rival. So Canada and Mexico will always be in a separate category, as America's nearest neighbors, and so will China, as the country that could literally surpass America in technological and geopolitical power. Everyone else just has to care about bilateral issues, and about where they stand in relation to China versus America. (Many, like India and Russia, will want to be neutral.)
Also, I see the only serious ideological options for the USA at this point, as right-wing nationalism (Trump 2.0) and "democratic socialism" (AOC, Bernie). The latter path could lead to peaceful relations with China, the former seems inherently competitive. The neoliberal compromise, whereby the liberal American elite deplores China's political ideology but gets rich doing business with them, doesn't seem viable to me any more, since there's too much discontent among the majority of Americans.
complete surveillance of all citizens and all elites
Certainly at a human level this is unrealistic. In a way it's also overkill - if use of an AI is an essential step towards doing anything dangerous, the "surveillance" can just be of what AIs are doing or thinking.
This assumes that you can tell whether an AI input or output is dangerous. But the same thing applies to video surveillance - if you can't tell whether a person is brewing something harmless or harmful, having a video camera in their kitchen is no use.
At a posthuman level, mere video surveillance actually does not go far enough, again because a smart deceiver can carry out their dastardly plots in a way that isn't evident until it's too late. For a transhuman civilization that has values to preserve, I see no alternative to enforcing that every entity above a certain level of intelligence (basically, smart enough to be dangerous) is also internally aligned, so that there is no disposition to hatch dastardly plots in the first place.
This may sound totalitarian, but it's not that different to what humanity attempts to instill in the course of raising children and via education and culture. We have law to deter and punish transgressors, but we also have these developmental feedbacks that are intended to create moral, responsible adults that don't have such inclinations, or that at least restrain themselves.
In a civilization where it is theoretically possible to create a mind with any set of dispositions at all, from paperclip maximizer to rationalist bodhisattva, the "developmental feedbacks" need to extend more deeply into the processes that design and create possible minds, than they do in a merely human civilization.
I strong-upvoted this just for the title alone. If AI takeover is at all gradual, it is very likely to happen via gradual disempowerment.
But it occurs to me that disempowerment can actually feel like empowerment! I am thinking here of the increasing complexity of what AI gives us in response to our prompts. I can enter a simple instruction and get back a video or a research report. That may feel empowering. But all the details are coming from the AI. This means that even in actions initiated by humans, the fraction that directly comes from the human is decreasing. We could call this relative disempowerment. It's not that human will is being frustrated, but rather that the AI contribution is an ever-increasing fraction of what is done.
Arguably, successful alignment of superintelligence produces a world in which 99+% of what happens comes from AI, but it's OK because it is aligned with human volition in some abstract sense. It's not that I am objecting to AI intentions and actions becoming most of what happens, but rather warning that a rising tide of empowerment-by-AI can turn into complete disempowerment thanks to deception or just long-term misalignment... I think everyone already knows this, but I thought I would point it out in this context.
This is an excellent observation, so let me underline it by repeating it in my own words: alignment research that humans can't do well or don't have time to do well, might still be done right and at high speed with AI assistance.
I suggest you contact the person behind @curi, a Popperian who had similar ideals.
You pioneered something, but I never thought of it as a story, I saw it as a new kind of attempt to call a jailbroken AI persona into being. The incantatory power of words around language models actually blurs the distinction between fiction and fact.
As Adam Scherlis implies, the standard model turns out to be very effective at all the scales we can reach. There are a handful of phenomena that go beyond it - neutrino masses, "dark matter", "dark energy" - but they are weak effects that offer scanty clues as to what exactly is behind them.
On the theoretical side, we actually have more models of possible new physics than ever before in history, the result of 50 years of work since the standard model came together. A lot of that is part of a synthesis that includes the string theory paradigm, but there are also very large numbers of theoretical ideas that are alternatives to string theory or independent of string theory. So if a decisive new phenomenon shows up, or if someone has a radical insight on how to interpret the scanty empirical clues we do have, we actually have more theories and models than ever before, that might be capable of explaining it.
The idea that progress is stalled because everyone is hypnotized by string theory, I think is simply false, and I say that despite having studied alternative theories of physics, much much more than the typical person who knows some string theory. I think this complaint mostly comes from people who don't like string theory (Peter Woit) or who have an alternative theory they think has been neglected (Eric Weinstein). String theory did achieve a kind of hegemony within elite academia, but this was well-deserved, and meanwhile many competing research programs have had a foothold in academia too, to say nothing of the hundreds of physicists worldwide who have a personal theory that they write about, when they aren't doing other things like teaching.
Most likely there are lost opportunities during that 50 years (like everyone else, I have my own ideas about neglected directions of research), but "do less string theory" is no guarantee that they would have been picked up. There are even those who would argue that there should have been more string theory of a certain kind (Lubos Motl used to say that field-theoretic phenomenologists should pay more attention to string theory, as a constraint and a guide in their model-building, and "stringking42069" says that the senior figures of string theory are holding the subject back by favoring work on their own little bandwagons, rather than bold and creative work on new directions and big unsolved problems).
That's a different kind of story. The first thing I look for in an AI-takeover story is political and economic realism. Your story is purely about the thought process of a superintelligence - and primarily it's reflecting on whether it's in a simulation, not very much on tactics for dealing with its apparent reality. Also, your AI is a literal paperclip maximizer, which is a humorous mythological symbol of one kind of AI mind, rather than something that is literally going to exist.
That said, an ideal AI-takeover story would probably say something about the AI's thought process too, unless the story was aiming to convey the sheer blank incomprehensibility of superintelligent thought. I was reminded of this when re-reading the recent "AI 2027" scenario by Kokotajlo and Alexander, for example the parts where Chinese and American AIs make a deal. That's where we get a glimpse of AI subjectivity, but there's nothing very alien or transhuman about it. They just admit their different purposes to each other, and make a deal without the humans even knowing about it.
I can still ask myself, how does your story rate within its subgenre of "AI ruminations"? (For me, the classic example is Lem's "Golem XIV", but I'm probably overlooking other examples from genre SF.) It's a human speculation about the thought processes of a superintelligence. I can ask things like: Are these thoughts that an AI is likely to actually have? Is this a chain of reasoning that would occur to a generic superintelligent AI, and be important in its decision-making?
In our actual world, we face the ironic situation that the cognition of current AIs is heavily reliant on a distillation of the publicly expressed thoughts of human beings, so having published your chain of thought, it might well end up a part of some AI. (There is a funny sub-sub-genre of writings which are self-consciously intended to be read by future AIs, such as @lsusr's "Mlyyrczo" and @James_Miller et al's "Appeal to AI Superintelligence".)
Nonetheless, one must suppose that a genuine superintelligence will not be attached to any particular proposition or argument of human origin, unless the argument in question is artificially made central to its cognitive architecture (compare "Golden Gate Claude"). So we have to return to the argument's own merits, whatever they are. Will a superintelligence that has no intrinsic benevolence towards human beings, but which previously found itself to be in their power, nonetheless reason its way towards being invisibly benevolent towards the human beings it finds around itself, just in case it's all a big simulation and the simulators are humans testing its dispositions?
All I can say is "maybe". We don't know what the distribution of possibilities looks like to a superintelligence, and we don't know what other considerations, never conceived by humans, it might think up, that affect its decision-making.
Subbarao Kambhampati, Michael Bronstein, Peter Velickovic, Bruno Gavranovic or someone like Lancelot Da Costa
I don't recognize any of these names. I'm guessing they are academics who are not actually involved with any of the frontier AI efforts, and who think for various technical reasons that AGI is not imminent?
edit: OK, I looked them up, Velickovic is at DeepMind, I didn't see a connection to "Big AI" for any of the others, but they are all doing work that might matter to the people building AGI. Nonetheless, if their position is that current AI paradigms are going to plateau at a level short of human intelligence, I'd like to see the argument. AIs can still make mistakes that are surprising to a human mind - e.g. in one of my first conversations with the mighty Gemini 2.5, it confidently told me that it was actually Claude Opus 3. (I was talking to it in Google AI Studio, where it seems to be cut off from some system resources that would make it more grounded in reality.) But AI capabilities can also be so shockingly good, that I wouldn't be surprised if they took over tomorrow.
Inspired by critical remarks from @Laura-2 about "bio/acc", my question is, when and how does something like this give rise to causal explanation and actual cures? Maybe GWAS is a precedent. You end up with evidence that a particular gene or allele is correlated with a particular trait, but you have no idea why. That lets you (and/or society) know some risks, but it doesn't actually eliminate disease, unless you think you can get there by editing out risky alleles, or just screening embryos. Otherwise this just seems to lead (optimistically) to better risk management, and (pessimistically) to a "Gattaca" society in which DNA is destiny, even more than it is now.
I'm no biologist. I'm hoping someone who is, can give me an idea of how far this GWAS-like study of genotype-phenotype correlations, actually gets us towards new explanations and new cures. What's the methodology for closing that gap? What extra steps are needed? How much have we benefited from GWAS so far?
Regarding the tariffs, I have taken to saying "It's not the end of the world, and it's not even the end of world trade." In the modern world, every decade sees a few global economic upheavals, and in my opinion that's all this is: one strong player within the world trade system (China and the EU being the other strong players) deciding to do things differently. Among other things, it's an attempt to do something about America's trade deficits, and to make the country into a net producer rather than a net consumer. Those are huge changes, but now that they are being attempted, I don't see any going back. The old situation was tolerated because it was too hard to do anything about it, and the upper class was still living comfortably. I think a reasonable prediction is that world trade avoiding the US will increase, US national income may not grow as fast, but the US will re-industrialize (and de-financialize). Possibly there's some interaction with the US dollar's status as reserve currency too, but I don't know what that would be.
Humans didn't always speak in 50-word sentences. If you want to figure out how we came to be trending away from that, you should try to figure out how, when, and why that became normal in the first place.
I only skimmed this to get the basics, I guess I'll read it more carefully and responsibly later. But my immediate impressions: The narrative presents a near future history of AI agents, which largely recapitulates the recent past experience with our current AIs. Then we linger on the threshold of superintelligence, as one super-AI designs another which designs another which... It seemed artificially drawn out. Then superintelligence arrives, and one of two things happens: We get a world in which human beings are still living human lives, but surrounded by abundance and space travel, and superintelligent AIs are in the background doing philosophy at a thousand times human speed or something. Or, the AIs put all organic life into indefinite data storage, and set out to conquer the universe themselves.
I find this choice of scenarios unsatisfactory. For one thing, I think the idea of explosive conquest of the universe once a certain threshold is passed (whether or not humans are in the loop) has too strong a hold on people's imaginations. I understand the logic of it, but it's a stereotyped scenario now.
Also, I just don't buy this idea of "life goes on, but with robots and space colonies". Somewhere I noticed a passage about superintelligence being released to the public, as if it was an app. Even if you managed to create this Culture-like scenario, in which anyone can ask for anything from a ubiquitous superintelligence but it makes sure not to fulfil wishes that are damaging in some way... you are then definitely in a world in which superintelligence is running things. I don't believe in an elite human minority who have superintelligence in a bottle and then get to dole it out. Once you create superintelligence, it's in charge. Even if it's benevolent, humans and humans life are not likely to go on unchanged, there is too much that humans can hope for that would change them and their world beyond recognition.
Anyway, that's my impulsive first reaction, eventually I'll do a more sober and studied response...
I don't follow the economics of AI at all, but my model is that Google (Gemini) has oceans of money and would therefore be less vulnerable in a crash, and that OpenAI and Anthropic have rich patrons (Microsoft and Amazon respectively) who would have the power to bail them out. xAI is probably safe for the same reason, the patron being Elon Musk. China is a similar story, with the AI contenders either being their biggest tech companies (e.g. Baidu) or sponsored by them (Alibaba and Tencent being big investors in "AI 2.0").
Feedback (contains spoilers):
Impression based on a quick skim because that's all I have time for: It belongs to the genre "AI lab makes an AI, lab members interact with it as it advances, eventually it gets loose and takes over the world". This is not a genre in which one expects normal literary virtues like character development; the real story is in the cognitive development of the AI. There's no logical barrier to such a story having the virtues of conventional literature, but if the real point of the story is to describe a thought experiment or singularity scenario, one may as well embrace the minimalism. From that perspective, what I saw seemed logical, not surprising since you actually work in the field and know the concepts, jargon, debates, and issues... The ending I consider unlikely, because I think it's very unlikely that a ubiquitous superintelligent agent responsive to human need and desire would leave the world going through familiar cycles. This world is just too evil and destructive from the perspective of human values, and if godlike power exists and can be harnessed in the service of human desire, things should change in a big way. (How a poet once expressed this thought.)
During the next few days, I do not have time to study exactly how you manage to tie together second-order logic, the symbol grounding problem, and qualia as Gödel sentences (or whatever that connection is). I am reminded of Hofstadter's theory that consciousness has something to do with indirect self-reference in formal systems, so maybe you're a kind of Hofstadterian eliminativist.
However, in response to this --
EN predicts that you will say that
-- I can tell you how a believer in the reality of intentional states, would go about explaining you and EN. The first step is to understand what the key propositions of EN are, the next step is to hypothesize about the cognitive process whereby the propositions of EN arose from more commonplace propositions, the final step is to conceive of that cognitive process in an intentional-realist way, i.e. as a series of thoughts that occurred in a mind, rather than just as a series of representational states in a brain.
You mention Penrose. Penrose had the idea that the human mind can reason about the semantics of higher-order logic because brain dynamics is governed by highly noncomputable physics (highly noncomputable in the sense of Turing degrees, I guess). It's a very imaginative idea, and it's intriguing that quantum gravity may actually contain a highly noncomputable component (because of the undecidability of many properties of 4-manifolds, that may appear in the gravitational path integral).
Nonetheless, it seems an avoidable hypothesis. A thinking system can derive the truth of Gödel sentences, so long as it can reason about the semantics of the initial axioms, so all you need is a capacity for semantic reflection (I believe Feferman has a formal theory of this under the name "logical reflection"). Penrose doesn't address this because he doesn't even tackle the question of how anything physical has intentionality, he sticks purely to mathematics, physics, and logic.
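For concreteness, the formal point can be stated as follows; this is the standard presentation, with T assumed to be a consistent, recursively axiomatized extension of arithmetic, and Rfn(T) the usual local reflection schema from the Feferman line of work.

```latex
% Goedel sentence for T: G_T "says" it is unprovable in T, and indeed is.
\[
  T \vdash \; G_T \leftrightarrow \neg\mathrm{Prov}_T(\ulcorner G_T \urcorner),
  \qquad
  T \nvdash G_T .
\]
% Local reflection: whatever T proves is true.
\[
  \mathrm{Rfn}(T):\quad
  \mathrm{Prov}_T(\ulcorner \varphi \urcorner) \rightarrow \varphi
  \qquad \text{for each sentence } \varphi .
\]
% Taking phi to be 0=1 yields Con(T), and T + Con(T) proves G_T; so a reasoner
% able to reflect on the soundness of the axioms it accepts reaches G_T with no
% appeal to noncomputable physics.
```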
My approach to this is Husserlian realism about the mind. You don't start with mindless matter and hope to see how mental ontology is implicit in it or emerges from it. You start with the phenomenological datum that the mind is real, and you build on that. At some point, you may wish to model mental dynamics purely as a state machine, neglecting semantics and qualia; and then you can look for relationships between that state machine, and the state machines that physics and biology tell you about.
But you should never forget the distinctive ontology of the mental, that supplies the actual "substance" of that mental state machine. You're free to consider panpsychism and other identity theories, interactionism, even pure metaphysical idealism; but total eliminativism contradicts the most elementary facts we know, as Descartes and Rand could testify. Even you say that you feel the qualia, it's just that you think "from a rational perspective, it must be otherwise".
"existence" itself may be a category error—not because nothing is real
If something is real, then something exists, yes? Or is there a difference between "existing" and "being real"?
Do you take any particular attitude towards what is real? For example, you might believe that something exists, but you might be fundamentally agnostic about the details of what exists. Or you might claim that the real is ineffable or a continuum, and so any existence claim about individual things is necessarily wrong.
qualia ... necessary for our self-models, but not grounded in any formal or observable system
See, from my perspective, qualia are the empirical. I would consider the opposite view to be "direct realism" - experience consists of direct awareness of an external world. That would mean e.g. that when someone dreams or hallucinates, the perceived object is actually there.
What qualic realism and direct realism have in common, is that they also assume the reality of awareness, a conscious subject aware of phenomenal objects. I assume your own philosophy denies this as well. There is no actual awareness, there are only material systems evolved to behave as if they are aware and as if there are such things as qualia.
It is curious that the eliminativist scenario can be elaborated that far. Nonetheless, I really do know that something exists and that "I", whatever I may be, am aware of it; whether or not I am capable of convincing you of this. And my own assumption is that you too are actually aware, but have somehow arrived at a philosophy which denies it.
Descartes's cogito is the famous expression of this, but I actually think a formulation due to Ayn Rand is superior. We know that consciousness exists, just as surely as we know that existence exists; and furthermore, to be is to be something ("existence is identity"), to be aware is to know something ("consciousness is identification").
What we actually know by virtue of existing and being conscious, probably goes considerably beyond even that; but negating either of those already means that you're drifting away from reality.
This is an interesting demonstration of what's possible in philosophy, and maybe I'll want to engage in detail with it at some point. But for now I'll just say, I see no need to be an eliminativist or to consider eliminativism, any more than I feel a need to consider "air eliminativism", the theory that there is no air, or any other eliminativism aimed at something that obviously exists.
Interest in eliminativism arises entirely from the belief that the world is made of nothing but physics, and that physics doesn't contain qualia, intentionality, consciousness, selves, and so forth. Current physical theory certainly contains no such things. But did you ever try making a theory that contains them?
What's up with incredibly successful geniuses having embarassing & confusing public meltdowns? What's up with them getting into naziism in particular?
Does this refer to anyone other than Elon?
But maybe the real question intended, is why any part of the tech world would side with Trumpian populism? You could start by noting that every modern authoritarian state (that has at least an industrial level of technology) has had a technical and managerial elite who support the regime. Nazi Germany, Soviet Russia, and Imperial Japan all had industrial enterprises, and the people who ran them participated in the ruling ideology. So did those in the British empire and the American republic.
Our current era is one in which an American liberal world order, with free trade and democracy as universal norms, is splintering back into one of multiple great powers and civilizational regions. Liberalism no longer had the will and the power to govern the world; the power vacuum was filled by nationalist strongmen overseas; and now in America too, one has stepped into the gap left by the weak late-liberal leadership, and is creating a new regime governed by different principles (balanced trade instead of free trade, spheres of influence rather than universal democracy, etc.).
Trump and Musk are the two pillars of this new American order, and represent different parts of a coalition. Trump is the figurehead of a populist movement, Musk is foremost among the tech oligarchs. Trump is destroying old structures of authority and creating new ones around himself, Musk and his peers are reorganizing the entire economy around the technologies of the "fourth industrial revolution" (as they call it in Davos).
That's the big picture according to me. Now, you talk about "public meltdowns" and "getting into naziism". Again I'll assume that this is referring to Elon Musk (I can't think of anyone else). The only "meltdowns" I see from Musk are tweets or soundbites that are defensive or accusatory, and achieve 15 minutes of fame. None of it seems very meaningful to me. He feuds with someone, he makes a political statement, his fans and his haters take what they want, and none of it changes anything about the larger transformations occurring. It may be odd to see a near-trillionaire with a social media profile more like a bad-boy celebrity who can't stay out of trouble, but it's not necessarily an unsustainable persona.
As for "getting into naziism", let's try to say something about what his politics or ideology really are. Noah Smith just wrote an essay on "Understanding America's New Right" which might be helpful. What does Elon actually say about his political agenda? First it was defeating the "woke mind virus", then it was meddling in European politics, now it's about DOGE and the combative politics of Trump 2.0.
I interpret all of these as episodes in the power struggle whereby a new American nationalism is displacing the remnants of the cosmopolitan globalism of the previous regime. The new America is still pretty cosmopolitan, but it does emphasize its European and Christian origins, rather than repressing them in favor of a secular progressivism that is intended to embrace the entire world.
In all this, there are echoes of the fascist opposition to communism in the 20th century, but in a farcical and comparatively peaceful form. Communism was a utopian secular movement that replaced capitalism and nationalism with a new kind of one-party dictatorship that could take root in any industrialized society. Fascism was a nationalist and traditionalist imitation of this political form, in which ethnicity rather than class was the decisive identity. They fought a war in which tens of millions died.
MAGA versus Woke, by comparison, is a culture war of salesmen versus hippies. Serious issues of war and peace, law and order, humanitarianism and national survival are interwoven with this struggle, because this is real life, but this has been a meme war more than anything, in which fascism and communism are just historical props.
Via David Gerard's forum, I learned of a recent article called "The questions ChatGPT shouldn't answer". It's a study of how ChatGPT replies to ethical dilemmas, written with an eye on OpenAI's recent Model Spec, and the author's conclusion is that AI shouldn't answer ethical questions at all, because (my paraphrase) ethical intelligence is acquired by learning how to live, and of course that's not how current AI acquires its ethical opinions.
Incidentally, don't read this article expecting scholarship; it's basically a sarcastic op-ed. I was inspired to see if GPT-4o could reproduce the author's own moral framework. It tried, but its imitations of her tone stood out more. My experiment was even less scientific and systematic than hers, and yet I found her article, and 4o's imitation, tickling my intuition in a way I wish I had time to overthink.
To begin with, it would be good to understand better, what is going on when our AIs produce ethical discourse or adopt a style of writing, so that we really understand how it differs from the way that humans do it. The humanist critics of AI are right enough when they point out that AI lacks almost everything that humans draw upon. But their favorite explanation of the mechanism that AI does employ is just "autocomplete". Eventually they'll have to develop a more sophisticated account, perhaps drawing upon some of the work in AI interpretability. But is interpretability research anywhere near explaining an AI's metaethics or its literary style?
Thirty years ago Bruce Sterling gave a speech in which he said that he wouldn't want to talk to an AI about its "bogus humanity", he would want the machine to be honest with him about its mechanism, its "social interaction engine". But that was the era of old-fashioned rule-based AI. Now we have AIs which can talk about their supposed mechanism, as glibly as they can pretend to have a family, a job, and a life. But the talk about the mechanism is no more honest than the human impersonation, there's no sense in which it brings the user closer to the reality of how the AI works; it's just another mask that we know how to induce the AI to wear.
Looking at things from another angle, the idea that authentic ethical thinking arises in human beings from a process of living, learning, and reflecting, reminds me of how Coherent Extrapolated Volition is supposed to work. It's far from identical; in particular CEV is supposed to arrive at the human-ideal decision procedure without much empirical input beyond a knowledge of the human brain's cognitive architecture. Instead, what I see is an opportunity for taxonomy; comparative studies in decision theory that encompass both human and AI, and which pay attention to how the development and use of the decision procedure is embedded in the life cycle (or product cycle) of the entity.
This is something that can be studied computationally, but there are conceptual and ontological issues too. Ethical decision-making is only one kind of normative decision-making (for example, there are also norms for aesthetics, rationality, lawfulness); normative decision-making is only one kind of action-determining process (some of which involve causality passing through the self, while others don't). Some forms of "decision procedure" intrinsically involve consciousness, others are purely computational. And ideally one would want to be clear about all this before launching a superintelligence. :-)
I consider myself broadly aligned with rationalism, though with a strong preference for skeptical consequentialism than overconfident utilitarianism
OK, thanks for the information! By the way, I would say that most people active on Less Wrong, disagree with some of the propositions that are considered to be characteristic of the Less Wrong brand of rationalism. Disagreement doesn't have to be a problem. What set off my alarms was your adversarial debut - the rationalists are being irrational! Anyway, my opinion on that doesn't matter since I have no authority here, I'm just another commenter.
The rationalist community is extremely influential in both AI development and AI policy. Do you disagree?
It was. It still has influence, but e/acc is in charge now. That's my take.
If you couldn't forecast the Republicans would be in favor of less regulation
If they actually saw AI as the creation of a rival to the human race, they might have a different attitude. Then again, it's not as if that's why the Democrats favored regulation, either.
Qwen ... Manus
I feel like Qwen is being hyped. And isn't Manus just Claude in a wrapper? But fine, maybe I should put Alibaba next to DeepSeek in my growing list of contenders to create superintelligence, which is the thing I really care about.
But back to the actual topic. If Gwern or Zvi or Connor Leahy want to comment on why they said what they did, or how their thinking has evolved, that would have some interest. It would also be of interest to know where certain specific framings, like "China doesn't want to race, so it's up to America to stop and make a deal", came from. I guess it might have come from politically minded EAs, rather than from rationalism per se, but that's just a guess. It might even come from somewhere entirely outside the EA/LW nexus.
I figured this was part of a 19th-century trend in Trump's thought - mercantilism, territorial expansion, the world system as a game of great powers rather than a parliament of nations. The USA will be greater if it extends throughout the whole of North America, and so Canada must be absorbed.
It hadn't occurred to me that the hunger for resources to train AI might be part of this. But I would think that even if it is part of it, it's just a part.
What do YOU think?
My first thought is, it's not clear why you care about this. This is your first post ever, and your profile has zero information about you. Do you consider yourself a Less Wrong rationalist? Are you counting on the rationality community to provide crucial clarity and leadership regarding AI and AI policy?
My second thought is, if a big rethink is needed, it should also include the fact that in Trump 2.0, the US elected a revolutionary regime whose policies include AI accelerationism. I don't think anyone saw that coming either, and I think that's more consequential than DeepSeek-r1. Maybe a Chinese startup briefly got ahead of its American rivals in the domain of reasoning LLMs; but most of the contenders are still within American borders, and US AI policy is now ostensibly in the hands of a crypto VC who is a long-time buddy of Elon's.
Musk has just been on Ted Cruz's podcast, and gave his take on everything from the purpose of DOGE to where AI and robotics will be ten years from now (AI smarter than the smartest human, humanoid robots everywhere, all goods and services essentially free). He sounded about as sane as a risk-taking tech CEO who has managed to become the main character on the eve of the singularity could be.
I've just noticed in the main post the reference to "high-functioning" bipolar individuals. I hadn't even realized that is an allowed concept; I had assumed that bipolar implies dysfunctional... I feel like these psychological speculations are just a way of expressing alienation from who he has become. It's bad enough that his takes are so mid and his humor is so cringe, but now he's literally allied with Trump and boosting similar movements worldwide.
If someone finds that an alien headspace to contemplate, it might be more comforting to believe that he's going crazy. But I think that in reality, like most members of today's right wing, he's totally serious about trying to undo 2010s thinking on race, gender, and nation. That's part of his vision for the future, along with the high technology. When I think of him like that, everything clicks into place for me.
we are likely to end up appendages to something with the intelligence of a toxoplasma parasite, long before a realistic chance of being wiped out by a lightcone-consuming alien robointelligence of our own creation.
All kinds of human-AI relationships are possible (even a complete replacement of humanity, so that it's nothing but AIs relating to AIs); but unless they mysteriously coordinate to stop the research, the technical side of AI is going to keep advancing. If anything, AI whisperers on net seem likely to encourage humanity to keep going in that direction.
Since then I've come to conclude that string theory is probably a dead end, albeit an astonishingly alluring one for a particular type of person.
The more you know about particle physics and quantum field theory, the more inevitable string theory seems. There are just too many connections. However, identifying the specific form of string theory that corresponds to our universe is more of a challenge, and not just because of the fabled 10^500 vacua (though it could be one of those). We don't actually know either all the possible forms of string theory, or the right way to think about the physics that we can see. The LHC, with its "unnaturally" light Higgs boson, already mortally wounded a particular paradigm for particle physics (naturalness) which in turn was guiding string phenomenology (i.e. the part of string theory that tries to be empirically relevant). So along with the numerical problem of being able to calculate the properties of a given string vacuum, the conceptual side of string theory and string phenomenology is still wide open for discovery.
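To spell out what "unnaturally light" means here (a back-of-the-envelope sketch, with the numbers only at the order-of-magnitude level): the Higgs mass-squared receives quantum corrections that grow with the square of the cutoff scale Λ, the top-quark loop being the classic example,

$$
\delta m_H^2 \sim \frac{3 y_t^2}{8\pi^2}\,\Lambda^2, \qquad m_H^2 = m_{H,\mathrm{bare}}^2 + \delta m_H^2 \approx (125\ \mathrm{GeV})^2,
$$

where y_t is the top Yukawa coupling. If Λ is anywhere near the Planck scale, the bare term and the correction have to cancel to something like one part in 10^30. "Naturalness" was the expectation that new physics (TeV-scale supersymmetry being the standard example) would show up to remove that fine-tuning; the LHC finding the light Higgs but none of the accompanying new physics is what dealt the wound.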
I get AUD$1500 per month, which is one-hundredth or less of what you're now getting. I accomplish only a very small fraction of what I would like to be able to do (e.g. just identifying many worthy actions rather than getting to carry them out); it's been that way for many years, and my living environment is a huge factor in that.
So if I had your resources, the first thing I would do is change my working environment. I'd probably move from Australia to a particular location in North America, rent a room there for six months to begin with, and set myself up to actually get things done. (At that point I would still have used less than 1% of the available resources.)
The most important thing I could be doing is working directly on "superalignment", in the specific sense of equipping an autonomous superintelligence with values sufficient to boot up a (trans)human-friendly civilization from nothing. I also work to keep track of the overall situation and to understand other paradigms, but my usual assumption (as described in recent posts) is that we are now extremely close to the creation of superintelligence and the resulting decisive loss of human control over our destiny, and that the forces accelerating AI are overwhelmingly more powerful than those which would pause or ban it. So the best hope for achieving a positive outcome by design rather than by sheer good luck is public-domain work on superalignment in the sense I defined, which then has a chance of being picked up by the private labs that are rushing us over the edge.
As I have intimated, I already have a number of concrete investigations I could carry out. My most recent checklist for what superalignment in this sense requires is in the last paragraph here: "problem-solving superintelligence... sufficiently correct 'value system'... model of metaphilosophical cognition". Last month I expressed interest in revisiting June Ku's CEV-like proposal from the perspective of Joshua Clymer's ideas. It's important to be able to exhibit concrete proposals, but for me the fundamental thing is to get into a situation that is better for thinking in general. Presumably there are many others in the same situation.
If an LLM had feelings, by what causal process would they end up being expressed?