Good points, which in part explain why I think it is very, very unlikely that AI research can be driven underground (in the US or worldwide). I was speaking to the desirability of driving it underground, not its feasibility.
pausing means moving AI underground, and from what I can tell that would make it much harder to do safety research
I would be overjoyed if all AI research were driven underground! The main source of danger is the fact that there are thousands of AI researchers, most of whom are free to communicate and collaborate with each other. Lone researchers or small underground cells of researchers who cannot publish their results would be vastly less dangerous than the current AI research community, even if there are many lone researchers and many small underground teams. And if we could make it illegal for these underground teams to generate revenue by selling AI-based services or to raise money from investors, that would bring me great joy, too.
Research can be modeled as a series of breakthroughs such that it is basically impossible to make breakthrough N before knowing about breakthrough N-1. If the researcher who makes breakthrough N-1 is unable to communicate it to researchers outside of his own small underground cell of researchers, then only that small underground cell or team has a chance at discovering breakthrough N, and research would proceed much more slowly than it does under current conditions.
The biggest hope for our survival is the quite likely and realistic hope that many thousands of person-years of intellectual effort that can only be done by the most talented among us remain to be done before anyone can create an AI that could cause our extinction. We should be making the working conditions of the (misguided) people doing that intellectual labor as difficult and unproductive as possible. We should restrict or cut off the labs' access to revenue, to investment, to "compute" (GPUs), to electricity and to employees. Employees with the skills and knowledge to advance the field are a particularly important resource for the labs; consequently, we should reduce or restrict their number by making it as hard as possible (illegal, preferably) to learn, publish, teach or lecture about deep learning.
Also, in my assessment, we are not getting much by having access to the AI researchers: we're not persuading them to change how they operate and the information we are getting from them is of little help IMHO in the attempt to figure out alignment (in the original sense of the word where the AI stays aligned even if it becomes superhumanly capable).
The most promising alignment research IMHO is the kind that mostly ignores the deep-learning approach (which is the sole focus as far as I know of all the major labs) and inquires deeply into which approach to creating a superhumanly-capable AI would be particularly easy to align. That was the approach taken by MIRI before it concluded in 2022 that its resources were better spent trying to slow down the AI juggernaut through public persuasion.
Deep learning is a technology created by people who did not care about alignment or wrongly assumed alignment would be easy. There is a reason why MIRI mostly ignored deep learning when most AI researchers started to focus on it in 2006. It is probably a better route to aligned transformative AI to search for another, much-easier-to-align technology (that can eventually be made competitive in capabilities with deep learning) than to search for a method to align AIs created with deep-learning technology. (To be clear, I doubt that either approach will bear fruit in time unless the AI juggernaut can be slowed down considerably.) And of course, if alignment researchers are mostly ignoring deep learning, there is little they can learn from the leading labs.
Impressive performance by the chatbot.
Maybe "motto" is the wrong word. I meant words / concepts to use in a comment or in a conversation.
"Those companies that created ChatGPT, etc? If allowed to continue operating without strict regulation, they will cause an intelligence explosion."
There's a good chance that "postpone the intelligence explosion for a few centuries" is a better motto than "stop AI" or "pause AI".
Someone should do some "market research" on this question.
All 3 of the other replies to your question overlook the crispest consideration: namely, it is not possible to ensure the proper functioning of even something as simple as a circuit for division (such as we might find inside a CPU) through testing alone: there are too many possible inputs (too many pairs of possible 64-bit divisors and dividends) to test in one lifetime even if you make a million perfect copies of the circuit and test them in parallel.
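To make the "too many possible inputs" point concrete, here is a rough back-of-envelope calculation (my own illustrative numbers, not anything from the original discussion): a 64-bit divider has 2^64 × 2^64 = 2^128 ≈ 3.4×10^38 possible (dividend, divisor) pairs. A million copies of the circuit, each checking a billion (10^9) pairs per second, get through only about 10^6 × 10^9 × 3×10^7 ≈ 3×10^22 pairs per year, so exhaustive testing would take on the order of 10^16 years.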
Let us consider very briefly what else besides testing an engineer might do to ensure (or "verify", as the engineer would probably say) the proper operation of a circuit for dividing. The circuit is composed of 64 sub-circuits, each responsible for producing one bit of the output (i.e., the quotient to be calculated), and an engineer will know enough about arithmetic to know that the sub-circuit for calculating bit N should bear a close resemblance to the one for bit N+1: it might not be exactly identical, but any differences will be simple enough for a digital-design engineer to understand. Usually, that is: in 1994, a bug was found in the floating-point division circuit of the Intel Pentium CPU, precipitating a product recall that cost Intel about $475 million. After that, Intel switched to a more reliable but much more ponderous technique called "formal verification" for its CPUs.
My point is that the question you are asking is sort of a low-stakes question (if you don't mind my saying) because there is a sharp limit to how useful testing can be: testing can reveal that the designers need to go back to the drawing board, but human designers can't go back to the drawing board billions of times (there is not enough time; human designers are not that fast), so most of the many tens or hundreds of bits of human-applied optimization pressure that will be required for any successful alignment effort will need to come from processes other than testing. Discussion of these other processes is more pressing than any discussion of testing.
Eliezer's "Einstein's Arrogance" is directly applicable here, although I see that that post uses "bits of evidence" and "bits of entanglement" instead of "bits of optimization pressure".
Another important consideration is that there is probably no safe way to run most of the tests we would want to run on an AI much more powerful than we are.
Let me reassure you that there’s more than enough protein available in plant-based foods. For example, here’s how much grams of protein there is in 100 gram of meat
That is misleading because most foods are mostly water, including the (cooked) meats you list, but the first 4 of the plant foods you list have had their water artificially removed: soy protein isolate; egg white, dried; spirulina algae, dried; baker's yeast.
Moreover, the human gut digests and absorbs more of animal protein than of plant protein. Part of the reason for this is that plant protein includes more fragments that are impervious to digestive enzymes in the human gut and more fragments (e.g., lectins) that interfere with human physiology.
Moreover, there are many people who can and do eat 1 or even 2 lb of cooked meat every day without obvious short-term consequences whereas most people who would try to eat 1 lb of spirulina (dry weight) or baker's yeast (dry weight) in a day would probably get acute distress of the gut before the end of the day even if the spirulina or yeast was mixed with plenty of other food containing plenty of water, fiber, etc. Or at least that would be my guess (having eaten small amounts of both things): has anyone made the experiment?
The very short answer is that the people with the most experience in alignment research (Eliezer and Nate Soares) say that without an AI pause lasting many decades the alignment project is essentially hopeless because there is not enough time. Sure, it is possible the alignment project succeeds in time, but the probability is really low.
Eliezer has said that AIs based on the deep-learning paradigm are probably particularly hard to align, so it would probably help to get a ban or a long pause on that paradigm even if research in other paradigms continues, but good luck getting even that, because almost all of the value currently being provided by AI-based services comes from deep-learning AIs.
One would think that it would be reassuring to know that the people running the labs are really smart and obviously want to survive (and have their children survive) but it is only reassuring before one listens to what they say and reads what they write about their plans on how to prevent human extinction and other catastrophic risks. (The plans are all quite inadequate.)
I'm going to use "goal system" instead of "goals" because a list of goals is underspecified without some method for choosing which goal prevails when two goals "disagree" on the value of some outcome.
wouldn’t we then want ai to improve its own goals to achieve new ones that have increased effectiveness and improving the value of the world?
That is contradictory: the AI's goal system is the single source of truth for how effective any change in the world is and for how much of an improvement it counts as.
I would need a definition of AGI before I could sensibly answer those questions.
ChatGPT is already an artificial general intelligence by the definition I have been using for the last 25 years.
I think the leaders of the labs have enough private doubts about the safety of their enterprise that if an effective alignment method were available to them, they would probably adopt the method (especially if the group that devised the method do not seem particularly to care who gets credit for having devised it). I.e., my guess is that almost all of the difficulty is in devising an effective alignment method, not getting the leading lab to adopt it. (Making 100% sure that the leading lab adopts it is almost impossible, but acting in such a way that the leading lab will adopt it with p = .6 is easy, and the current situation is so dire that we should jump at any intervention with a .6 chance of a good outcome.)
Eliezer stated recently (during an interview on video) that the deep-learning paradigm seems particularly hard to align, so it would be nice to get the labs to focus on a different paradigm (even if we do not yet have a way to align the different paradigm) but that seems almost impossible unless and until the other paradigm has been developed to the extent that it can create models that are approximately as capable as deep-learning models.
The big picture is that the alignment project seems almost completely hopeless IMHO because of the difficulty of aligning the kind of designs the labs are using and the difficulty of inducing the labs to switch to easier-to-align designs.
Your question would have been better without the dig at theists and non-vegans.
However, whereas the concept of an unaligned general intelligence has the advantage of being a powerful, general abstraction, the HMS concept has the advantage of being much easier to explain to non-experts.
The trouble with the choice of phrase "hyperintelligent machine sociopath" is that it gives the other side of the argument an easy rebuttal, namely, "But that's not what we are trying to do: we're not trying to create a sociopath". In contrast, if the accusation is that (many of) the AI labs are trying to create a machine smarter than people, then the other side cannot truthfully use the same easy rebuttal. Then our side can continue with, "and they don't have a plan for how to control this machine, at least not any plan that stands up to scrutiny". The phrase "unaligned superintelligence" is an extremely condensed version of the argument I just outlined (where the verb "control" has been replaced with "align" to head off the objection that control would not even be desirable because people are not wise enough and not ethical enough to be given control over something so powerful).
An influential LW participant, Jim Miller, who I think is a professor of economics, has written here that divestment does little good because any reduction in the stock price caused by pulling the investments can be counteracted by profit-motivated actors. For publicly-traded stocks, there is a robust supply of profit-motivated actors scanning for opportunities. I am eager for more discussion on this topic.
I am alarmed to see that I made a big mistake in my previous comment: where I wrote that "contributing more money to AI-safety charities has almost no positive effects", I should have written "contributing to technical alignment research has almost no positive effects". I have nothing bad to say about contributing money to groups addressing the AI threat in other ways, e.g., by spreading the message that AI research is dangerous or lobbying governments to shut it down.
money generated by increases in AI stock could be used to invest in efforts into AI safety, which receives comparably less money
In the present situation, contributing more money to AI-safety charities has almost no positive effects and does almost nothing to make AI "progress" less dangerous. (In fact, in my estimation, the overall effect of all funding for alignment research so far has been to make the situation a little worse, by publishing insights that will tend to be usable by capability researchers without making non-negligible progress towards an eventual practical alignment solution.)
If you disagree with me, then please name the charity you believe can convert donations into an actual decrease in p(doom) or say something about how a charity would spend money to decrease the probability of disaster.
Just to be perfectly clear: I support the principle, which I believe has always been operative on LW, that people who are optimistic about AI or who are invested in AI companies are welcome to post on LW.
So let's do it first, before the evil guys do it, but let's do it well from the start!
The trouble is no one knows how to do it well. No one knows how to keep an AI aligned as the AI's capabilities start exceeding human capabilities, and if you believe experts like Eliezer and Connor Leahy, it is very unlikely that anyone is going to figure it out before the lack of this knowledge causes human extinction or something equally dire.
It is only a slight exaggeration to say that the only thing keeping the current crop of AI systems from killing us all (or killing most of us and freezing some of us in case we end up having some use in the future) is simply that no AI or coalition of AIs so far is capable of doing it.
Actually there is a good way to do it: shut down all AI research till humanity figures out alignment, which will probably require waiting for a generation of humans significantly smarter than the current generation, which in turn will require probably at least a few centuries.
The ruling coalition can disincentivize the development of a semiconductor supply chain outside the territories it controls by selling, worldwide, semiconductors that use "verified boot" technology to make it really hard to use them to run AI workloads, similar to how it is really hard even for the best jailbreakers to jailbreak a modern iPhone.
Out of curiosity, would you agree with this being the most plausible path, even if you disagree with the rest of my argument?
The most plausible story I can imagine quickly right now is the US and China fight a war and the US wins and uses some of the political capital from that win to slow down the AI project, perhaps through control over the world's leading-edge semiconductor fabs plus pressuring Beijing to ban teaching and publishing about deep learning (to go with a ban on the same things in the West). I believe that basically all the leading-edge fabs in existence or that will be built in the next 10 years are in the countries the US has a lot of influence over or in China. Another story: the technology for "measuring loyalty in humans" gets really good fast, giving the first group to adopt the technology so great an advantage that over a few years the group gets control over the territories where all the world's leading-edge fabs and most of the trained AI researchers are.
I want to remind people of the context of this conversation: I'm trying to persuade people to refrain from actions that on expectation make human extinction arrive a little quicker because most of our (sadly slim) hope for survival IMHO flows from possibilities other than our solving (super-)alignment in time.
People worked on capabilities for decades, and never got anywhere until recently, when the hardware caught up, and it was discovered that scaling works unexpectedly well.
If I believed that, then maybe I'd believe (like you seem to do) that there is no strong reason to believe that the alignment project cannot be finished successfully before the capabilities project creates an unaligned super-human AI. I'm not saying scaling and hardware improvement have not been important: I'm saying they were not sufficient: algorithmic improvements were quite necessary for the field to arrive at anything like ChatGPT, and at least as early as 2006, there were algorithmic improvements that almost everyone in the machine-learning field recognized as breakthroughs or important insights. (Someone more knowledgeable about the topic might be able to push the date back into the 1990s or earlier.)
After the publication 19 years ago by Hinton et al of "A Fast Learning Algorithm for Deep Belief Nets", basically all AI researchers recognized it as a breakthrough. Building on it was AlexNet in 2012, again recognized as an important breakthrough by essentially everyone in the field (and if some people missed it, then certainly generative adversarial networks, ResNets and AlphaGo convinced them). AlexNet was the first deep model trained on GPUs, a technique essential for the major breakthrough in 2017 reported in the paper "Attention is all you need".
In contrast, we've seen nothing yet in the field of alignment that is as unambiguously a breakthrough as the 2006 paper by Hinton et al or 2012's AlexNet or (emphatically) the 2017 paper "Attention is all you need". In fact I suspect that some researchers could tell that the attention mechanism reported by Bahdanau et al in 2015 or the Seq2Seq models reported on by Sutskever et al in 2014 was evidence that deep-learning language models were making solid progress and that a blockbuster insight like "attention is all you need" was probably only a few years away.
The reason I believe it is very unlikely for the alignment research project to succeed before AI kills us all is that in machine learning, and in its deep-learning subfield in particular, something recognized by essentially everyone in the field as a minor or major breakthrough has occurred every few years. Many of these breakthroughs relied on earlier breakthroughs (i.e., it is very unlikely that the successive breakthrough would have occurred if the earlier breakthrough had not been disseminated to the community of researchers). During this time, despite very talented people working on it, there have been zero results in alignment research that the entire field of alignment researchers would consider a breakthrough. That does not mean it is impossible for the alignment project to be finished in time, but it does IMO make it critical for the alignment project to be prosecuted in such a way that it does not inadvertently assist the capabilities project.
Yes, much more money has been spent on capability research over the last 20 years than on alignment research, but money doesn't help all that much to speed up research in which, to have any hope of solving the problem, the researchers need insight X or X2, and to have any hope of arriving at insight X, they need insights Y and Y2, and to have much hope at all of arriving at Y, they need insight Z.
But what's even more unlikely, is the chance that $200 billion on capabilities research plus $0.1 billion on alignment research is survivable, while $210 billion on capabilities research plus $1 billion on alignment research is deadly.
This assumes that alignment success is the most likely avenue to safety for humankind, whereas, like I said, I consider other avenues more likely. Actually, there needs to be a qualifier on that: I consider other avenues more likely than the alignment project's succeeding while the current generation of AI researchers remain free to push capabilities: if the AI capabilities juggernaut could be stopped for 150 years, giving the human population time to get smarter and wiser, then alignment is likely (say p = .7) to succeed in my estimation. I am informed by Eliezer in his latest interview that such a success would probably use some technology other than deep learning to create the AI's capabilities; i.e., deep learning is particularly hard to align.
Central to my thinking is my belief that alignment is just a significantly harder problem than the problem of creating an AI capable of killing us all. Does any of the reasoning you do in your section "the comparision" change if you started believing that alignment is much much harder than creating a superhuman (unaligned) AI?
It will probably come as no great surprise that I am unmoved by the arguments I have seen (including yours) that Anthropic is so much better than OpenAI that it helps the global situation for me to support Anthropic. (If it were up to me, and I had to decide right now with no time to gather more information or to delegate the decision to someone else, both would be shut down today.) But I'm not very certain, and I would pay attention to future arguments for supporting Anthropic or some other lab.
Thanks for engaging with my comments.
we're probably doomed in that case anyways, even without increasing alignment research.
I believe we're probably doomed anyways.
I think even you would agree what P(1) > P(2)
Sorry to disappoint you, but I do not agree.
Although I don't consider it quite impossible that we will figure out alignment, most of my hope for our survival is in other things, such as a group taking over the world and then using their power to ban AI research. (Note that that is in direct contradiction to your final sentence.) So for example, if Putin or Xi were dictator of the world, my guess is that there is a good chance he would choose to ban all AI research. Why? It has unpredictable consequences. We Westerners (particularly Americans) are comfortable with drastic change, even if that change has drastic unpredictable effects on society; non-Westerners are much more skeptical: there have been too many invasions, revolutions and peasant rebellions that have killed millions in their countries. I tend to think that the main reason Xi supports China's AI industry is to prevent the US and the West from superseding China, and if that consideration were removed (because, for example, he had gained dictatorial control over the whole world) he'd choose to just shut it down (and he wouldn't feel the need to have a very strong argument for shutting it down the way Western decision-makers would: non-Western leaders shut important things down all the time, or at least they would if the governments they led had the funding and the administrative capacity to do so).
Of course Xi's acquiring dictatorial control over the whole world is extremely unlikely, but the magnitude of the technological and societal changes that are coming will tend to present opportunities for certain coalitions to gain and to keep enough power to shut AI research down worldwide. (Having power in all countries hosting leading-edge fabs is probably enough.) I don't think this ruling coalition necessarily needs to believe that AI presents a potent risk of human extinction for it to choose to shut AI research down.
I am aware that some reading this will react to "some coalition manages to gain power over the whole world" even more negatively than to "AI research causes the extinction of the entire human race". I guess my response is that I needed an example of a process that could save us and that would feel plausible -- i.e., something that might actually happen. I hasten to add that there might be other processes that save us that don't elicit such a negative reaction -- including processes the nature of which we cannot even currently imagine.
I'm very skeptical of any intervention that reduces the amount of time we have left in the hopes that this AI juggernaut is not really as potent a threat to us as it currently appears. I was much, much less skeptical of alignment research 20 years ago, but since then a research organization has been exploring the solution space, and the leader of that organization (Nate Soares) and its most senior researcher (Eliezer) are reporting that the alignment project is almost completely hopeless. Yes, this organization (MIRI) is kind of small, but it has been funded well enough to keep about a dozen top-notch researchers on the payroll and it has been competently led. Also, for research efforts like this, how many years the team had to work on the problem is more important than the size of the team, and 22 years is a pretty long time to end up with almost no progress other than some initial insights (around the orthogonality thesis, the fragility of value, convergent instrumental values and CEV as a proposed solution) if the problem were solvable by the current generation of human beings.
OK, if I'm being fair and balanced, then I have to concede that it was probably only in 2006 (when Eliezer figured out how to write a long intellectually-dense blog post every day) or even only in 2008 (when Anna Salamon joined the organization -- she was very good at recruiting and had a lot of energy to travel and to meet people) that Eliezer's research organization could start to pick and choose among a broad pool of very talented people, but still, between 2008 and now is 17 years, which again is a long time for a strong team to fail to make even a decent fraction of the progress humanity would seem to need to make on the alignment problem if in fact the alignment problem is solvable by spending more money on it. It does not appear to me to be the sort of problem that can be solved with 1 or 2 additional insights; it seems a lot more like the kind of problem where insight 1 is needed, but before any mere human can find insight 1, all the researchers need to have already known insight 2, and to have any hope of finding insight 2, they all would have had to know insight 3, and so on.
AI safety spending is only $0.1 billion while AI capabilities spending is $200 billion. A company which adds a comparable amount of effort on both AI alignment and AI capabilities should speed up the former more than the latter
There is very little hope IMHO in increasing spending on technical AI alignment because (as far as we can tell based on how slow progress has been on it over the last 22 years) it is a much thornier problem than AI capability research and because most people doing AI alignment research don't have a viable story about how they are going to stop any insights / progress they achieve from helping with AI capability research. I mean, if you have a specific plan that avoids these problems, then let's hear it, I am all ears, but advocacy in general of increasing work on technical alignment is counterproductive IMHO.
Good reply. The big difference is that in the Cold War, there was no entity with the ability to stop the 2 parties engaged in the nuclear arms race, whereas in the current situation, instead of hoping for the leading labs to come to an agreement, we can lobby the governments of the US and the UK to shut the leading labs down or at least nationalize them. Yes, that still leaves an AI race between the developed nations, but the US and the UK are democracies, and most voters don't like AI. The main concern of the leaders of China and Russia, in contrast, is to avoid rebellions of their respective populations, and they understand correctly that AI is a revolutionary technology with unpredictable societal effects that might easily empower rebels. So as long as Beijing and Moscow have access to the AI technology useful for effective surveillance of their people, they might stop racing if the US and the UK stop racing (and AI-extinction-risk activists should IMHO probably be helping Beijing and Moscow obtain the AI tech they need to effectively surveil their populations, to reduce the incentive for Beijing and Moscow to support native AI research efforts).
To clarify: there is only a tiny shred of hope in the plan I outlined, but it is a bigger shred IMHO than hoping for the AI labs to suddenly start acting responsibly.
Eliezer thinks (as do I) that technical progress in alignment is hopeless without first improving the pool of prospective human alignment researchers (e.g., via human cognitive augmentation).
Who’s track record of AI predictions would you like to see evaluated?
Whoever has the best track record :)
Your specific action places most of its hope for human survival on the entities that have done the most to increase extinction risk.
But I’m more thinking about what work remains.
It depends on how they did it. If they did it by formalizing the notion of "the values and preferences (coherently extrapolated) of (the living members of) the species that created the AI", then even just blindly copying their design without any attempt to understand it has a very high probability of getting a very good outcome here on Earth.
The AI of course has to inquire into and correctly learn about our values and preferences before it can start intervening on our behalf, so one way such a blind copying might fail is if the method the aliens used to achieve this correct learning depended on specifics of the situation on the alien planet that don't obtain here on Earth.
I’ve also experienced someone discouraging me from acquiring technical AI skills for the purpose of pursuing a career in technical alignment because they don’t want me to contribute to capabilities down the line. They noted that most people who skill up to work on alignment end up working in capabilities instead.
I agree with this. I.e., although it is possible for individual careers in technical alignment to help the situation, most such careers have the negative effect of speeding up the AI juggernaut without any offsetting positive effects. I.e., the fewer people trained in technical alignment, the better.
Patenting an invention necessarily discloses the invention to the public. (In fact, incentivizing disclosure was the rationale for the creation of the patent system.) That is a little worrying because the main way the non-AI tech titans (large corporations) have protected themselves from patent suits has been to obtain their own patent portfolios, then entering into cross-licensing agreements with the holders of the other patent portfolios. Ergo, the patent-trolling project you propose could incentivize the major labs to disclose inventions that they are currently keeping secret, which of course would be bad.
Although patent trolling is a better use of money IMHO than most of things designed to reduce AI extinction risk that have been funded so far, lobbying the US and UK governments to nationalize their AI labs is probably better because a government program can be a lot more strict than a private organization can be in prohibiting researchers and engineers from sharing insights and knowledge outside the program. In particular, if the nationalized program is sufficiently like top-secret defense programs, long prison sentences are the penalty for a worker to share knowledge outside of the program.
Also, note that the patent troll needs to be well-funded (with a warchest of 100s of millions of dollars preferably) before it starts to influence the behavior of the AI labs although if it starts out with less and gets lucky, it might in time manage to get the needed warchest by winning court cases or arriving at settlements with the labs.
The non-AI parts of the tech industry have been dealing with patent trolls since around 1980 and although technologists certainly complain about these patent trolls, I've never seen any indication that anyone has been completely demoralized by them and we do not see young talented people proclaim that they're not going to go into tech because of the patent trolls. Nationalization of AI research in contrast strikes me as having the potential to completely demoralize a significant fraction of AI researchers and put many talented young people off of the career path although I admit it will also tend to make the career path more appealing to some young people.
Both interventions (patent trolling and lobbying for nationalization) are just ways to slow down the AI juggernaut, buying us a little time to luck into some more permanent solution.
With a specific deadline and a specific threat of a nuclear attack on the US.
The transfer should be made in January 2029
I think you mean in January 2029, or earlier if the question resolves before the end of 2028; otherwise there would be no need to introduce the CPI into the bet to keep things fair (or predictable).
You do realize that by "alignment", the OP (John) is not talking about techniques that prevent an AI that is less generally capable than a capable person from insulting the user or expressing racist sentiments?
We seek a methodology for constructing an AI that either ensures that the AI turns out not to be able to easily outsmart us or (if it does turn out to be able to easily outsmart us) ensures that it won't kill us all or do some other terrible thing (or at least makes that unlikely). (The former is not researched much compared to the latter, but I felt the need to include it for completeness.)
The way it is now, it is not even clear whether you and the OP (John) are talking about the same thing (because "alignment" has come to have a broad meaning).
If you want to continue the conversation, it would help to know whether you see a pressing need for a methodology of the type I describe above. (Many AI researchers do not: they think that outcomes like human extinction are quite unlikely or at least easy to avoid.)
Although the argument you outline might be an argument against ever fully trusting tests (usually called "evals" on this site) that this or that AI is aligned, alignment researchers have other tools in their toolbox besides running tests or evals.
It would take a long time to explain these tools, particularly to someone unfamiliar with software development or a related field like digital-electronics design. People make careers in studying tools to make reliable software systems (and reliable digital designs).
The space shuttle is steered by changing the direction in which the rocket nozzles point relative to the entire shuttle. If at any point in flight, one of the rocket nozzles had pointed in a direction a few degrees different from the one it should point in, the shuttle would have been lost and all aboard would have died. The pointing or aiming of these rocket nozzles is under software control. How did the programmers at NASA make this software reliable? Not merely through testing!
These programmers at NASA relied on their usual tools (basically engineering-change-order culture) and did not need the more advanced tool called "formal verification", which Intel turned to in order to make sure their microprocessors did not have any flaw necessitating another expensive recall after they spent $475 million in 1994 on a famous product recall to fix the so-called Pentium FDIV bug.
Note that FDIV refers to division of (floating-point) numbers and that it is not possible in one human lifetime to test all possible dividends and divisors to ensure that a divider circuit is operating correctly. I.e., the "impossible even theoretically" argument you outline would have predicted that it is impossible to ensure the correct operation of even something as simple as a divider implemented in silicon, and yet in the 30 years since the 1994 recall, Intel has avoided another major recall of any of their microprocessors because of a mistake in their digital design.
"Memory allocation" errors (e.g., use-after-free errors) are an important source of bugs and security vulnerabilities in software, and testing has for decades been an important way to find and eliminate these errors (Valgrind probably being the most well-known framework for doing the testing) but the celebrated new programming language Rust completely eliminates the need for testing for this class of programming errors. Rust replaces testing with a more reliable methodology making use of a tool called a "borrow checker". I am not asserting that a borrow checker will help people create an aligned super-intelligence: I am merely pointing at Rust and its borrow checker as an example of a methodology that is superior to testing for ensuring some desirable property (e.g., the absence of use-after-free errors that an attacker might be able to exploit) of a complex software artifact.
In summary, aligning a superhuman AI is humanly possible given sufficiently careful and talented people. The argument for stopping frontier AI research (or pausing it for 100 years) depends on considerations other than the "impossible even theoretically" argument you outline.
Methodologies that are superior to testing take time to develop. For example, the need for a better methodology to prevent "memory allocation" errors was recognized in the 1970s. Rust and its borrow checker are the result of a line of investigation inspired by a seminal paper published in 1987. But Rust has become a realistic option for most programming projects only within the last 10 years or less. And an alignment methodology that continues to be reliable even when the AI becomes super-humanly capable is a much taller order than a methodology to prevent use-after-free errors and related memory-allocation errors.
The strongest sign an attack is coming that I know of is firm evidence that Russia or China is evacuating her cities.
Another sign that would get me to flee immediately (to a rural area of the US: I would not try to leave the country) is a threat by Moscow that Moscow will launch an attack unless Washington takes action A (or stops engaging in activity B) before specific time T.
Western Montana is separated from the missile fields by mountain ranges and the prevailing wind direction and is in fact considered by Joel Skousen to be the best place in the continental US to ride out a nuclear attack. Being too far away from population centers to be walkable by refugees is the main consideration for Skousen.
Skousen also likes the Cumberland Plateau because refugees are unlikely to opt to walk up the escarpment that separates the Plateau from the population centers to its south.
The overhead is mainly the "fixed cost" of engineering something that works well, which suggests re-using some of the engineering costs already incurred in making it possible for a person to make a hands-free phone call on a smartphone.
Off-topic: most things (e.g., dust particles) that land in an eye end up in the nasal cavity, so I would naively expect that protecting the eyes would be necessary to protect oneself fully from respiratory viruses:
https://www.ranelle.com/wp-content/uploads/2016/08/Tear-Dust-Obstruction-1024x485.jpg
Does anyone care to try to estimate how much the odds ratio of getting covid (O(covid)) decreases when we intervene by switching a "half-mask" respirator such as the ones pictured here for a "full-face" respirator (which protects the eyes)?
The way it is now, when one lab has an insight, the insight will probably spread quickly to all the other labs. If we could somehow "drive capability development into secrecy," that would drastically slow down capability development.
Malice is a real emotion, and it is a bad sign (but not a particularly strong sign) if a person has never felt it.
Yes, letting malice have a large influence on your behavior is a severe character flaw, that is true, but that does not mean that never having felt malice or being incapable of acting out of malice is healthy.
Actually, it is probably rare for a person never to act out of malice: it is probably much more common for a person to just be unaware of his or her malicious motivations.
The healthy organization is to be tempted to act maliciously now and then, but to be good at perceiving when that is happening and then to ignore or resist the temptation most of the time (out of a desire to be a good person).
I expect people to disagree with this answer.
their >90% doom disagrees with almost everyone else who thinks seriously about AGI risk.
The fact that your next sentence refers to Rohin Shah and Paul Christiano, but no one else, makes me worry that for you, only alignment researchers are serious thinkers about AGI risk. Please consider that anyone whose P(doom) is over 90% is extremely unlikely to become an alignment researcher (or to remain one if their P(doom) became that high while they were an alignment researcher) because their model will tend to predict that alignment research is futile or that it actually increases P(doom).
There is a comment here (which I probably cannot find again) by someone who was in AI research in the 1990s, then he realized that the AI project is actually quite dangerous, so he changed careers to something else. I worry that you are not counting people like him as people who have thought seriously about AGI risk.
I disagree. I think the fact that our reality branches a la Everett has no bearing on our probability of biogenesis.
Consider a second biogenesis that happened recently enough and far away enough that light (i.e., information, causal influence) has not had enough time to travel from it to us. We know such regions of spacetime "recent enough and far away enough" exist and in principle could host life, but since we cannot observe a sign of life or a sign of lack of life from them, they are not relevant to our probability of biogenesis whereas by your logic, they are relevant.
new cities like Los Alamos or Hanover
You mean Hanford.
When you write,
a draft research paper or proposal that frames your ideas into a structured and usable format,
who is "you"?
I know you just said that you don't completely trust Huberman, but just today, Huberman published a 30-minute video titled "Master your sleep and be more alert when awake". I listened to it (twice) to refresh my memory and to see if his advice changed.
He mentions yellow-blue (YB) contrasts once (at https://www.youtube.com/watch?v=lIo9FcrljDk&t=502s) and at least thrice he mentions the desirability of exposure to outdoor light when the sun is at a low angle (close to the horizon). As anyone can see by looking around at dawn and again at mid-day: at dawn, some parts of the sky are yellowish (particularly the parts near the sun) or even orange, whereas other parts range from pale blue through something like turquoise to deep blue; at mid-day, the sun is white, the part of the sky near the sun is blue, and the blue parts of the sky are essentially all the same shade or hue of blue.
He also says that outdoor light (directly from the sun or indirectly via atmospheric scattering) is the best kind of light for maintaining a healthy circadian rhythm, but that if getting outdoors early enough that the sun is still low in the sky is impractical, artificial light can be effective, particularly blue-heavy artificial light.
I've been helped greatly over the last 2 years by a protocol in which I get outdoor light almost every morning when the YB contrasts are at their most extreme, namely between about 20 min before sunrise and about 10 min after sunrise on clear days and a little later on cloudy days. (The other element of my protocol that I know to be essential is strictly limiting my exposure to light between 23:00 and 05:00.) I was motivated to comment on your post because it did not contain enough information to help someone sufficiently similar to me (the me of 2 years ago) to achieve the very welcome results I achieved: I'm pretty sure that even very bright artificial light from the ordinary LED lights most of us have in our homes (even very many of them shining all at once) would not have helped me nearly as much.
Huberman is not so insistent on getting outside during this 30-minute interval of maximum YB contrast as my protocol is. In fact in today's video he says that he himself often gets outside only after the sun has been out for an hour or 2 and is consequently no longer particularly near the horizon.
Health-conscious people apply a (software-based) filter to their screens in the evening to reduce blue light emitted from the screen. On iOS this is called Night Shift. If your rendition of the effects of light on the circadian rhythm (CR) is complete, then they're doing everything they can do, but if YB contrasts have important effects on the CR, it might be useful in addition to eliminate YB contrasts on our digital devices (which Night Shift and its analogs on the other platforms do not eliminate). This can be done by turning everything shades of gray. (On iOS, for example, this can be achieved in Settings > Accessibility > Display & Text Size > Color Filters > Grayscale and can be combined with or "overlaid on" Night Shift.) I and others routinely turn on a filter that makes everything grayscale to make it more likely that we will get sleepy sufficiently early in the evening. Other people report that they like to keep their screens grayscale, but do not cite the CR as the reason for doing so.
Is a computer screen bright enough such that YB contrasts on the screen can activate the machinery in the retina that is activated by a sunrise? I'm not sure, but I choose to eliminate YB contrasts on my screens just in case it is.
Finally let me quote what I consider the main takeaway from the video Huberman posted today, which I expect we both agree with:
Get up each morning, try to get outside. I know that can be challenging for people, but anywhere from 2 to 10 min of sun exposure will work well for most people. If you can't do it every day or if you sleep through this period of early-day low-solar angle, don't worry about it. The systems in the body -- these hormone systems and neurotransmitter systems -- that make you awake at certain periods of the day and sleepy at other times are operating by averaging when you view the brightest light.
This post paints a partially inaccurate picture. IMHO the following is more accurate.
Unless otherwise indicated, the following information comes from Andrew Huberman. Most comes from Huberman Lab Podcast #68. Huberman opines on a great many health topics. I want to stress that I don't consider Huberman a reliable authority in general, but I do consider him reliable on the circadian rhythm and on motivation and drive. (His research specialization for many years was the former and he for many years has successfully used various interventions to improve his own motivation and drive, which is very high.)
Bright light (especially bluish light) makes a person more alert. (Sufficiently acute exposure to cold, e.g., a plunge into 45-degree water, is an even stronger cause of alertness. Caffeine is another popular intervention for causing alertness.) After many hours of being alert and pursuing goals, a person will get tired, and this tiredness tends to help the person go to sleep. However, the SCN (the suprachiasmatic nucleus, the brain's master circadian clock) operates independently of exposure to bright light (and cold and caffeine) and independently of how many hours in a row the person has already been alert. A good illustration of that is what happens when a person pulls an all-nighter: at about 04:30 it becomes easier for most people pulling an all-nighter to stay awake even if the person is not being exposed to bright light and even if the person has already been awake for a very long time. Without any light as a stimulus, at around 04:30 the SCN decides to get the brain and the body ready for wakefulness and activity. So, let us inquire how the SCN would stay in sync with the sun in the ancestral environment before the availability of artificial lighting. How does the SCN know that dawn is coming soon?
The answer is that it is complicated, like most things in biology, but I think most neuroscientists agree that the stimulus that most potently entrains the SCN (i.e., that is most effective at ensuring that the SCN is in sync with the sun) is yellow-blue (YB) contrasts. Specifically, the SCN knows it is 04:30 and consequently time to start making the body alert because of the person's exposure to these "YB contrasts" on previous days. Exposure in the evening has an effect, but the strongest effect is exposure circa dawn.
When the sun is at a high angular elevation, it is white and the surrounding sky is dark blue (assuming a cloudless sky). When the sun is slightly above or below the horizon, the part of the sky near the sun is yellow or even orange or pink and with further (angular) distance from the sun, the sky gets steadily bluer. (Note that even 30 minutes before sunrise, the sky is already much brighter than your cell phone's screen or most indoor environments: there is an optical illusion whereby people underestimate the brightness of a light source when the source is spread over a large (angular) area and overestimate the brightness of "point sources" like light bulbs.)
The sensing of these YB contrasts is done by a system distinct from the usual visual system (i.e., the system that gives visual information that is immediately salient to the conscious mind) and in particular there are fewer "sensing pixels" and they are spread further apart than the "pixels" in the usual visual system. The final page of this next 5-page paper has a nice image of the author's estimate of a bunch of "sensing pixels" depicted as dotted circles laid over a photo of a typical sunrise:
https://pmc.ncbi.nlm.nih.gov/articles/PMC8407369/pdf/nihms-1719642.pdf
A light bulb named Tuo is recommended at least half-heartedly by Huberman for controlling the circadian rhythm. Huberman says IIRC that it works by alternating between yellow light and blue light many times a second. Huberman explained IIRC that both "spatial" YB contrasts and "temporal" YB contrasts serve as a signal that "dawn or dusk is happening". I definitely recall Huberman saying that outdoor light is preferred to this Tuo light bulb, and I understood him to mean that it is more likely to work because no one understands SCN entrainment well enough right now to design an artificial light source guaranteed to work.
The web site for this light bulb says,
Most light therapy products on the market today are based on blue light. This is 15-year-old science that has since been proven incorrect. New science based on laboratory research conducted by the University of Washington, one of the world's top vision and neuroscience centers, shows that blue light has little to no effect in shifting your circadian rhythm. High brightness levels of blue light can have some effect, but this falls short when compared to the power of TUO.
High lux products can work, but they require 10,000 lux of light at a distance of under 2 feet for the duration of treatment. This light level at this distance is uncomfortable for most people. Treatment also needs to happen first thing in the morning to be most effective. Who has time to sit within 2 feet of a bulb for up to a half hour when first waking up? High lux products offer dim settings to make use more comfortable and typically downplay the distance requirement. Unfortunately, at less than 10,000 lux and 2 feet of distance, high lux products have little to no impact.
Huberman recommends getting as much bright light early in the day as possible -- "preferably during the first hour after waking, but definitely during the first 3 hours". This encourages the body to produce cortisol and dopamine, which at this time of day is very good for you (and helps you be productive). But this bright light won't have much effect on keeping your circadian clock entrained with the schedule you want it be entrained with unless the light contains YB contrasts; i.e., getting bright light right after waking is good and getting sufficiently-bright light containing YB contrasts right after waking is also good, but they are good for different reasons (though the first kind of good contributes to a small extent to the second kind of good through the mechanism I described in my second paragraph).
Huberman is insistent that it is not enough to expose yourself to light after your normal wake-up time: you also have to avoid light when you are normally asleep. Suppose your normal wake-up time is 06:00. To maintain a strong circadian rhythm and to get to sleep at a regular time each night (which is good for you and which most people should strive to do) it is essential to avoid exposure to light during the 6 hours between 23:00 and 05:00. Whereas dim light has little positive effect after wake-up time, even quite dim light or light of brief duration between 23:00 and 05:00 tends to have pronounced negative effects.
Light during these 6 hours not only confuses the circadian clock (which is bad and makes it hard to get to sleep at a healthy hour) but it also decreases the amount of motivation and drive available the next morning (by sending signals to a brain region called the habenula). I personally have noticed a strong increase in my level of motivation and drive on most mornings after I instituted the habits described in this comment. (And I more reliably start my sleep at what I consider a healthy hour, but that was less critical in my case because insomnia was never a huge problem of mine.)
Huberman says that getting outside at dawn works to keep the SCN in sync with the sun even on completely overcast days, but it requires longer duration of exposure: 20 minutes instead of 5 minutes IIRC. He says that there are YB contrasts in the overcast sky when the sun is near the horizon that are absent when the angle of the sun is high.
To this point in this comment I merely repeated information I learned from Huberman (and maybe a bit from Wikipedia or such -- it is hard to remember) although I hasten to add that this information certainly jibes with my own experience of going outside at dawn almost every day starting about 2 years ago. Allow me to add one thing of my own invention, namely, what to call this 6-hour interval every night when it is a bad idea to let your eyes be exposed to light: I humbly suggest "curfew". The original meaning of "curfew" was a time every night during which it was illegal in medieval London to have a fire going even in your own fireplace in your own home. (I.e., it was a measure to prevent urban fires.)
“Game theoretic strengthen-the-tribe perspective” is a completely unpersuasive argument to me. The psychological unity of humankind OTOH is persuasive when combined with the observation that this unitary psychology changes slowly enough that the human mind’s robust capability to predict the behavior of conspecifics (and manage the risks posed by them) can keep up.
Although I agree with another comment that Wolfram has not "done the reading" on AI extinction risk, my being able to watch his face while he confronts some of the considerations and arguments for the first time made it easier, not harder, for me to predict where his stance on the AI project will end up 18 months from now. It is hard for me to learn anything about anyone by watching them express a series of cached thoughts.
Near the end of the interview, Wolfram says that he cannot do much processing of what was discussed "in real time", which strongly suggests to me that he expects to process it slowly over the next days and weeks. I.e., he is now trying to reassure himself that the AI project won't kill his four children or any grandchildren he has or will have. Because Wolfram is much better AFAICT at this kind of slow "strategic rational" deliberation than most people at his level of life accomplishment, there is a good chance he will fail to find his slow deliberations reassuring, in which case he probably then declares himself an AI doomer. Specifically, my probability is .2 that 18 months from now, Wolfram will have come out publicly against allowing ambitious frontier AI research to continue. P = .2 is much, much higher than my P for the average 65-year-old of his intellectual stature who is not specialized in AI. My P is much higher mostly because I watched this interview; i.e., I was impressed by Wolfram's performance in this interview despite his spending the majority of his time on rabbit holes that I could quickly tell had no possible relevance to AI extinction risk.
My probability that he will become more optimistic about the AI project over the next 18 months is .06: most likely, he goes silent on the issue or continues to take an inquisitive, non-committal stance in his public discussions of it.
If Wolfram had a history of taking responsibility for his community, e.g., campaigning against drunk driving or running for any elected office, my P of his declaring himself an AI doomer (i.e., becoming someone trying to stop AI) would go up to .5. (He might in fact have done something to voluntarily take responsibility for his community, but if so, I haven't been able to learn about it.) If Wolfram were somehow forced to take sides, and had plenty of time to deliberate calmly on the choice after the application of the pressure to choose sides, he would with p = .88 take the side of the AI doomers.