(COI: I’m a lesswrong power-user)
Instead of the hands-on experimentation I expected, what I see is a culture heavily focused on long-form theoretical posts.
FWIW if you personally want to see more of those you can adjust the frontpage settings to boost the posts with a "practical" tag. Or for a dedicated list: https://www.lesswrong.com/tag/practical?sortedBy=magic. I agree that such posts are currently a pretty small fraction of the total, for better or worse. But maybe the absolute number is a more important metric than the fraction?
I’ve written a few “practical” posts on LW, and I generally get very useful comments on them.
Consensus-Building Tools
I think these mostly have yet to be invented.
Consensus can be SUPER HARD. In my AGI safety work, on ~4 occasions I’ve tried to reconcile my beliefs with someone else, where it wound up being the main thing I was doing for about an entire month, just to get to the point where I could clearly articulate what the other person believed and why I disagreed with it. As for actually reaching consensus with the other person, I gave up before getting that far! (See e.g. here, here, here)
I don't really know what would make that kind of thing easier but I hope someone figures it out!
“It is more valuable to provide accurate forecasts than add new, relevant, carefully-written considerations to an argument”
On what margin? In what context? I hope we can all think of examples where one thing is valuable, and where the other thing is valuable. If Einstein predicted the Eddington experiment results but didn’t explain the model underlying his prediction, I don’t think anyone would have gotten much out of it, and really probably nobody would have bothered doing the Eddington experiment in the first place.
Manifold, polymarket, etc. already exist, and I'm very happy they do!! I think lesswrong is filling a different niche, and that's fine.
High status could be tied to demonstrated good judgment through special user flair for accurate forecasters or annual prediction competitions.
As for reputation, I think the idea is that you should judge a comment or post by its content and not by the karma of the person who wrote it. Comment karma and post karma are on display, but by contrast user karma is hidden behind a hover or click. That seems good to me. I myself write posts and comments of widely varying quality, and other people sure do too.
An important part of learning is feeling free and safe to be an amateur messing around with half-baked ideas in a new area—overly-central “status” systems can sometimes discourage that kind of thing, which is bad. (Cf. academia.)
(I think there’s a mild anticorrelation between my own posts’ karma and how objectively good and important they are, see here, so it’s a good thing that I don’t care too much about karma!) (Of course the anticorrelation doesn’t mean high karma is bad, rather it’s from conditioning on a collider.)
Long-time power users like me can benefit from the best possible “reputation system”, which is actually knowing most of the commenters. That’s great because I don’t just know them as "good" or "bad", but rather "coming from such-and-such perspective" or "everything they say sounds nuts, and usually is, but sometimes they have some extraordinary insight, and I should especially be open-minded to anything they say in such-and-such domain".
If there were a lesswrong prediction competition, I expect that I probably wouldn’t participate because it would be too time-consuming. There are some people whom I would like to take my ideas seriously, but such people EITHER (1) already take my ideas seriously (e.g. people really into AGI safety) OR (2) would not care whether or not I have a strong forecasting track-record (e.g. Yann LeCun).
There’s also a question about cross-domain transferability of good takes. If we want discourse about near-term geopolitical forecasting, then of course we should platform people with a strong track record of near-term geopolitical forecasting. And if we want discourse about the next ML innovation, then we should platform people with a strong track record of coming up with ML innovations. I’m most interested in neither of those, but rather AGI / ASI, which doesn’t exist yet. Empirically, in my opinion, “ability to come up with ML innovations” transfers quite poorly to “ability to have reasonable expectations about AGI / ASI”. I’m thinking of Yann LeCun for example. What about near-term geopolitical forecasting? Does that transfer? Time will tell—mostly when it’s already too late. At the very least, there are skilled forecasters who strongly disagree with each other about AGI / ASI, so at least some of them are wrong.
(If someone in 1400 AD were quite good at predicting the next coup or war or famine, I wouldn’t expect them to be particularly good at predicting how the industrial revolution would go down. Right? And I think AGI / ASI is kinda like the latter.)
So anyway, probably best to say that we can’t predict a priori who is going to have good takes on AGI, just based on track-record in some different domain. So that’s yet another reason to not have a super central and visible personal reputation system, IMO.
The main insight of the post (as I understand it) is this:
- In the context of a discussion of whether we should be worried about AGI x-risk, someone might say “LLMs don't seem like they're trying hard to autonomously accomplish long-horizon goals—hooray, why were people so worried about AGI risk?”
- In the context of a discussion among tech people and VCs about how we haven't yet made an AGI that can found and run companies as well as Jeff Bezos, someone might say “LLMs don't seem like they're trying hard to autonomously accomplish long-horizon goals—alas, let's try to fix that problem.”
One sounds good and the other sounds bad, but there’s a duality connecting them. They’re the same observation. You can’t get one without the other.
This is an important insight because it helps us recognize the fact that people are trying to solve the second-bullet-point problem (and making nonzero progress), and to the extent that they succeed, they’ll make things worse from the perspective of the people in the first bullet point.
This insight is not remotely novel! (And OP doesn’t claim otherwise.) …But that’s fine, nothing wrong with saying things that many readers will find obvious.
(This “duality” thing is a useful formula! Another related example that I often bring up is the duality between positive-coded “the AI is able to come up with out-of-the-box solutions to problems” versus the negative-coded “the AI sometimes engages in reward hacking”. I think another duality connects positive-coded “it avoids catastrophic forgetting” to negative-coded “it’s hard to train away scheming”, at least in certain scenarios.)
(…and as comedian Mitch Hedberg sagely noted, there’s a duality between positive-coded “cheese shredder” and negative-coded “sponge ruiner”.)
The post also chats about two other (equally “obvious”) topics:
- Instrumental convergence: “the AI seems like it's trying hard to autonomously accomplish long-horizon goals” involves the AI routing around obstacles, and one might expect that to generalize to “obstacles” like programmers trying to shut it down
- Goal (mis)generalization: If “the AI seems like it's trying hard to autonomously accomplish long-horizon goal X”, then the AI might actually “want” some different Y which partly overlaps with X, or is downstream from X, etc.
But the question on everyone’s mind is: Are we doomed?
In and of itself, nothing in this post proves that we’re doomed. I don’t think OP ever explicitly claimed it did? In my opinion, there’s nothing in this post that should constitute an update for the many readers who are already familiar with instrumental convergence, and goal misgeneralization, and the fact that people are trying to build autonomous agents. But OP at least gives a vibe of being an argument for doom going beyond those things, which I think was confusing people in the comments.
Why aren’t we necessarily doomed? Now this is my opinion, not OP’s, but here are three pretty-well-known outs (at least in principle):
- The AI can “want” to autonomously accomplish a long-horizon goal, but also simultaneously “want” to act with integrity, helpfulness, etc. Just like it’s possible for humans to do. And if the latter “want” is strong enough, it can outvote the former “want” in cases where they conflict. See my post Consequentialism & corrigibility.
- The AI can behaviorist-“want” to autonomously accomplish a long-horizon goal, but where the “want” is internally built in such a way as to not generalize OOD to make treacherous turns seem good to the AI. See e.g. my post Thoughts on “Process-Based Supervision”, which is skeptical about the practicalities, but I think the idea is sound in principle.
- We can in principle simply avoid building AIs that autonomously accomplish long-horizon goals, notwithstanding the economic and other pressures—for example, by keeping humans in the loop (e.g. oracle AIs). This one came up multiple times in the comments section.
There are plenty of challenges in these approaches, and interesting discussions to be had, but the post doesn’t engage with any of these topics.
Anyway, I’m voting strongly against including this post in the 2023 review. It’s not crisp about what it’s arguing for and against (and many commenters seem to have gotten the wrong idea about what it’s arguing for), it’s saying obvious things in a meandering way, and it’s not refuting or even mentioning any of the real counterarguments / reasons for hope. It’s not “best of” material.
How is it that some tiny number of man made mirror life forms would be such a threat to the millions of naturally occurring life forms, but those millions of naturally occurring life forms would not be an absolutely overwhelming symmetrical threat to those few man made mirror forms?
Can’t you ask the same question for any invasive species? Yet invasive species exist. “How is it that some people putting a few Nile perch into Lake Victoria in the 1950s would cause ‘the extinction or near-extinction of several hundred native species’, but the native species of Lake Victoria would not be an absolutely overwhelming symmetrical threat to those Nile perch?”
If I'm not mistaking, you've already changed the wording
No, I haven’t changed anything in this post since Dec 11, three days before your first comment.
valid EA response … EA forum … EA principles …
This isn’t EA forum. Also, you shouldn’t equate “EA” with “concerned about AGI extinction”. There are plenty of self-described EAs who think that AGI extinction is astronomically unlikely and a pointless thing to worry about. (And also plenty of self-described EAs who think the opposite.)
prevent spam/limit stupid comments without causing distracting emotions
If Hypothetical Person X tends to write what you call “stupid comments”, and if they want to be participating on Website Y, and if Website Y wants to prevent Hypothetical Person X from doing that, then there’s an irreconcilable conflict here, and it seems almost inevitable that Hypothetical Person X is going to wind up feeling annoyed by this interaction. Like, Website Y can do things on the margin to make the transaction less unpleasant, but it’s surely going to be somewhat unpleasant under the best of circumstances.
(Pick any popular forum on the internet, and I bet that either (1) there’s no moderation process and thus there’s a ton of crap, or (2) there is a moderation process, and many of the people who get warned or blocked by that process are loudly and angrily complaining about how terrible and unjust and cruel and unpleasant the process was.)
Anyway, I don’t know why you’re saying that here-in-particular. I’m not a moderator, I have no special knowledge about running forums, and it’s way off-topic. (But if it helps, here’s a popular-on-this-site post related to this topic.)
[EDIT: reworded this part a bit.]
what would be a valid EA response to the arguments coming from people fitting these bullets:
- Some are over-optimistic based on mistaken assumptions about the behavior of humans;
- Some are over-optimistic based on mistaken assumptions about the behavior of human institutions;
That’s off-topic for this post so I’m probably not going to chat about it, but see this other comment too.
I think of myself as having high ability and willingness to respond to detailed object-level AGI-optimist arguments, for example:
- Response to Dileep George: AGI safety warrants planning ahead
- Response to Blake Richards: AGI, generality, alignment, & loss functions
- Thoughts on “AI is easy to control” by Pope & Belrose
- LeCun’s “A Path Towards Autonomous Machine Intelligence” has an unsolved technical alignment problem
- Munk AI debate: confusions and possible cruxes
…and more.
I don’t think this OP involves “picturing AI optimists as stubborn simpletons not being able to get persuaded finally that AI is a terrible existential risk”. (I do think AGI optimists are wrong, but that’s different!) At least, I didn’t intend to do that. I can potentially edit the post if you help me understand how you think I’m implying that, and/or you can suggest concrete wording changes etc.; I’m open-minded.
Yeah, the word “consummatory” isn’t great in general (see here), maybe I shouldn’t have used it. But I do think walking is an “innate behavior”, just as sneezing and laughing and flinching and swallowing are. E.g. decorticate rats can walk. As for human babies, they’re decorticate-ish in effect for the first months but still have a “walking / stepping reflex” from day 1 I think.
There can be an innate behavior, but also voluntary cortex control over when and whether it starts—those aren’t contradictory, IMO. This is always true to some extent—e.g. I can voluntarily suppress a sneeze. Intuitively, yeah, I do feel like I have more voluntary control over walking than I do over sneezing or vomiting. (Swallowing is maybe the same category as walking?) I still want to say that all these “innate behaviors” (including walking) are orchestrated by the hypothalamus and brainstem, but that there’s also voluntary control coming via cortex→hypothalamus and/or cortex→brainstem motor-type output channels.
I’m just chatting about my general beliefs. :) I don’t know much about walking in particular, and I haven’t read that particular paper (paywall & I don’t have easy access).
Oh I forgot, you’re one of the people who seems to think that the only conceivable reason that anyone would ever talk about AGI x-risk is because they are trying to argue in favor of, or against, whatever AI government regulation was most recently in the news. (Your comment was one of the examples that I mockingly linked in the intro here.)
If I think AGI x-risk is >>10%, and you think AGI x-risk is 1-in-a-gazillion, then it seems self-evident to me that we should be hashing out that giant disagreement first; and discussing what if any government regulations would be appropriate in light of AGI x-risk second. We’re obviously not going to make progress on the latter debate if our views are so wildly far apart on the former debate!! Right?
So that’s why I think you’re making a mistake whenever you redirect arguments about the general nature & magnitude & existence of the AGI x-risk problem into arguments about certain specific government policies that you evidently feel very strongly about.
(If it makes you feel any better, I have always been mildly opposed to the six month pause plan.)
I’ve long had a tentative rule-of-thumb that:
- medial hypothalamus neuron groups are mostly “tracking a state variable”;
- lateral hypothalamus neuron groups are mostly “turning on a behavior” (especially a “consummatory behavior”).
(…apart from the mammillary areas way at the posterior end of the hypothalamus. They’re their own thing.)
State variables are things like hunger, temperature, immune system status, fertility, horniness, etc.
I don’t have a great proof of that, just some indirect suggestive evidence. (Orexin, contiguity between lateral hypothalamus and PAG, various specific examples of people studying particular hypothalamus neurons.) Anyway, it’s hard to prove directly because changing a state variable can lead to taking immediate actions. And it’s really just a rule of thumb; I’m sure there’s exceptions, and it’s not really a bright-line distinction anyway.
The literature on the lateral hypothalamus is pretty bad. The main problem IIUC is that LH is “reticular”, i.e. when you look at it under the microscope you just see a giant mess of undifferentiated cells. That appearance is probably deceptive—appropriate stains can reveal nice little nuclei hiding inside the otherwise-undifferentiated mess. But I think only one or a few such hidden nuclei are known (the example I’m familiar with is “parvafox”).
Yup! I think discourse with you would probably be better focused on the 2nd or 3rd or 4th bullet points in the OP—i.e., not “we should expect such-and-such algorithm to do X”, but rather “we should expect people / institutions / competitive dynamics to do X”.
I suppose we can still come up with “demos” related to the latter, but it’s a different sort of “demo” than the algorithmic demos I was talking about in this post. As some examples:
- Here is a “demo” that a leader of a large active AGI project can declare that he has a solution to the alignment problem, specific to his technical approach, but where the plan doesn’t stand up to a moment’s scrutiny.
- Here is a “demo” that a different AGI project leader can declare that even trying to solve the alignment problem is already overkill, because misalignment is absurd and AGIs will just be nice, again for reasons that don’t stand up to a moment’s scrutiny.
- (And here’s a “demo” that at least one powerful tech company executive might be fine with AGI wiping out humanity anyway.)
- Here is a “demo” that if you give random people access to an AI, one of them might ask it to destroy humanity, just to see what would happen. Granted, I think this person had justified confidence that this particular AI would fail to destroy humanity …
- … but here is a “demo” that people will in fact do experiments that threaten the whole world, even despite a long track record of rock-solid statistical evidence that the exact thing they’re doing is indeed a threat to the whole world, far out of proportion to its benefit, and that governments won’t stop them, and indeed that governments might even fund them.
- Here is a “demo” that, given a tradeoff between AI transparency (English-language chain-of-thought) and AI capability (inscrutable chain-of-thought but the results are better), many people will choose the latter, and pat themselves on the back for a job well done.
- Every week we get more “demos” that, if next-token prediction is insufficient to make a powerful autonomous AI agent that can successfully pursue long-term goals via out-of-the-box strategies, then many people will say “well so much the worse for next-token prediction”, and they’ll try to figure some other approach that is sufficient for that.
- Here is a “demo” that companies are capable of ignoring or suppressing potential future problems when they would interfere with immediate profits.
- Here is a “demo” that it’s possible for there to be a global catastrophe causing millions of deaths and trillions of dollars of damage, and then immediately afterwards everyone goes back to not even taking trivial measures to prevent similar or worse catastrophes from recurring.
- Here is a “demo” that the arrival of highly competent agents with the capacity to invent technology and to self-reproduce is a big friggin’ deal.
- Here is a “demo” that even small numbers of such highly competent agents can maneuver their way into dictatorial control over a much much larger population of humans.
I could go on and on. I’m not sure your exact views, so it’s quite possible that none of these are crux-y for you, and your crux lies elsewhere. :)
Thanks!
I feel like the actual crux between you and OP is with the claim in post #2 that the brain operates outside the neuron doctrine to a significant extent.
I don’t think that’s quite right. Neuron doctrine is pretty specific IIUC. I want to say: when the brain does systematic things, it’s because the brain is running a legible algorithm that relates to those things. And then there’s a legible explanation of how biochemistry is running that algorithm. But the latter doesn’t need to be neuron-doctrine. It can involve dendritic spikes and gene expression and astrocytes etc.
All the examples here are real and important, and would impact the algorithms of an “adequate” WBE, but are mostly not “neuron doctrine”, IIUC.
Basically, it’s the thing I wrote a long time ago here: “If some [part of] the brain is doing something useful, then it's humanly feasible to understand what that thing is and why it's useful, and to write our own CPU code that does the same useful thing.” And I think “doing something useful” includes as a special case everything that makes me me.
I don't get what you mean when you say stuff like "would be conscious (to the extent that I am), and it would be my consciousness (to a similar extent that I am)," since afaik you don't actually believe that there is a fact of the matter as to the answers to these questions…
Just, it’s a can of worms that I’m trying not to get into right here. I don’t have a super well-formed opinion, and I have a hunch that the question of whether consciousness is a coherent thing is itself a (meta-level) incoherent question (because of the (A) versus (B) thing here). Yeah, just didn’t want to get into it, and I haven’t thought too hard about it anyway. :)
Right, what I actually think is that a future brain scan with future understanding could enable a WBE to run on a reasonable-sized supercomputer (e.g. <100 GPUs), and it would be capturing what makes me me, and would be conscious (to the extent that I am), and it would be my consciousness (to a similar extent that I am), but it wouldn’t be able to reproduce my exact train of thought in perpetuity, because it would be able to reproduce neither the input data nor the random noise of my physical brain. I believe that OP’s objection to “practical CF” is centered around the fact that you need an astronomically large supercomputer to reproduce the random noise, and I don’t think that’s relevant. I agree that “abstraction adequacy” would be a step in the right direction.
Causal closure is just way too strict. And it’s not just because of random noise. For example, suppose that there’s a tiny amount of crosstalk between my neurons that represent the concept “banana” and my neurons that represent the concept “Red Army”, just by random chance. And once every 5 years or so, I’m thinking about bananas, and then a few seconds later, the idea of the Red Army pops into my head, and if not for this cross-talk, it counterfactually wouldn’t have popped into my head. And suppose that I have no idea of this fact, and it has no impact on my life. This overlap just exists by random chance, not part of some systematic learning algorithm. If I got magical brain surgery tomorrow that eliminated that specific cross-talk, and didn’t change anything else, then I would obviously still be “me”, even despite the fact that maybe some afternoon 3 years from now I would fail to think about the Red Army when I otherwise might. This cross-talk is not randomness, and it does undermine “causal closure” interpreted literally. But I would still say that “abstraction adequacy” would be achieved by an abstraction of my brain that captured everything except this particular instance of cross-talk.
Yeah duh I know you’re not talking about MCMC. :) But MCMC is a simpler example to ensure that we’re on the same page on the general topic of how randomness can be involved in algorithms. Are we 100% on the same page about the role of randomness in MCMC? Is everything I said about MCMC super duper obvious from your perspective? If not, then I think we’re not yet ready to move on to the far-less-conceptually-straightforward topic of brains and consciousness.
I’m trying to get at what you mean by:
But imagine instead that (for sake of argument) it turned out that high-resolution details of temperature fluctuations throughout the brain had a causal effect on the execution of the algorithm such that the algorithm doesn't do what it's meant to do if you just take the average of those fluctuations.
I don’t understand what you mean here. For example:
- If I run MCMC with a PRNG given random seed 1, it outputs 7.98 ± 0.03. If I use a random seed of 2, then the MCMC spits out a final answer of 8.01 ± 0.03. My question is: does the random seed entering MCMC “have a causal effect on the execution of the algorithm”, in whatever sense you mean by the phrase “have a causal effect on the execution of the algorithm”?
- My MCMC code uses a PRNG that returns random floats between 0 and 1. If I replace that PRNG with `return 0.5`, i.e. the average of the 0-to-1 interval, then the MCMC now returns a wildly-wrong answer of 942. Is that replacement the kind of thing you have in mind when you say “just take the average of those fluctuations”? If so, how do you reconcile the fact that “just take the average of those fluctuations” gives the wrong answer, with your description of that scenario as “what it’s meant to do”? Or if not, then what would “just take the average of those fluctuations” mean in this MCMC context?
I’m confused by your comment. Let’s keep talking about MCMC.
- The following is true: The random inputs to MCMC have “a causal effect on the execution of the algorithm such that the algorithm doesn't do what it's meant to do if you just take the average of those fluctuations”.
- For example, let’s say the MCMC accepts a million inputs in the range (0,1), typically generated by a PRNG in practice. If you replace the PRNG by the function `return 0.5` (“just take the average of those fluctuations”), then the MCMC will definitely fail to give the right answer.
- The following is false: “the signals entering…are systematic rather than random”. The random inputs to MCMC are definitely expected and required to be random, not systematic. If the PRNG has systematic patterns, it screws up the algorithm—I believe this happens from time to time, and people doing Monte Carlo simulations need to be quite paranoid about using an appropriate PRNG. Even very subtle long-range patterns in the PRNG output can screw up the calculation.
The MCMC will do a highly nontrivial (high-computational-complexity) calculation and give a highly non-arbitrary answer. The answer does depend to some extent on the stream of random inputs. For example, suppose I do MCMC, and (unbeknownst to me) the exact answer is 8.00. If I use a random seed of 1 in my PRNG, then the MCMC might spit out a final answer of 7.98 ± 0.03. If I use a random seed of 2, then the MCMC might spit out a final answer of 8.01 ± 0.03. Etc. So the algorithm run is dependent on the random bits, but the output is not totally arbitrary.
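To make the seed-dependence concrete, here’s a minimal toy sketch in Python (my own illustrative example, with a true answer of 3.0 rather than the hypothetical 8.00 above): a random-walk Metropolis chain whose only inputs are a stream of uniform(0,1) numbers. Different seeds give slightly different estimates, all near the true answer, while feeding in the constant 0.5 breaks it entirely:

```python
import numpy as np

def mcmc_mean(uniforms, n_steps=100_000):
    """Random-walk Metropolis estimate of E[x] under N(3,1), consuming a stream of uniform(0,1) inputs."""
    u = iter(uniforms)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        proposal = x + (next(u) - 0.5)          # proposal step, drawn from the input stream
        accept_prob = np.exp(0.5 * ((x - 3)**2 - (proposal - 3)**2))
        if next(u) < accept_prob:               # Metropolis accept/reject, also from the stream
            x = proposal
        total += x
    return total / n_steps

for seed in (1, 2):
    rng = np.random.default_rng(seed)
    print(seed, round(mcmc_mean(rng.random(200_000)), 3))  # slightly different estimates, both near 3.0

print("const", mcmc_mean(iter(lambda: 0.5, None)))  # constant "average" input: the chain never moves, so it wrongly returns 0.0
```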
All this is uncontroversial background, I hope. You understand all this, right?
executions would branch conditional on specific charge trajectories, and it would be a rubbish computer.
As it happens, almost all modern computer chips are designed to be deterministic, by putting every signal extremely far above the noise floor. This has a giant cost in terms of power efficiency, but it has a benefit of making the design far simpler and more flexible for the human programmer. You can write code without worrying about bits randomly flipping—except for SEUs, but those are rare enough that programmers can basically ignore them for most purposes.
(Even so, such chips can act non-deterministically in some cases—for example as discussed here, some ML code is designed with race conditions where sometimes (unpredictably) the chip calculates `(a+b)+c` and sometimes `a+(b+c)`, which are ever-so-slightly different for floating point numbers, but nobody cares, the overall algorithm still works fine.)
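(For the curious, here’s the standard floating-point fact being referenced, as a tiny Python illustration of my own rather than anything from that specific ML code:)

```python
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)                 # 0.6000000000000001
print(a + (b + c))                 # 0.6
print((a + b) + c == a + (b + c))  # False: addition of floats is not associative
```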
But more importantly, it’s possible to run algorithms in the presence of noise. It’s not how we normally do things in the human world, but it’s totally possible. For example, I think an ML algorithm would basically work fine if a small but measurable fraction of bits randomly flipped as you ran it. You would need to design it accordingly, of course—e.g. don’t use floating point representation, because a bit-flip in the exponent would be catastrophic. Maybe some signals would be more sensitive to bit-flips than others, in which case maybe put an error-correcting code on the super-sensitive ones. But for lots of other signals, e.g. the lowest-order bit of some neural net activation, we can just accept that they’ll randomly flip sometimes, and the algorithm still basically accomplishes what it’s supposed to accomplish—say, image classification or whatever.
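As a toy sketch of that last point (my own construction, not a real ML system): flip the lowest-order bit of a random ~1% of 16-bit fixed-point “activations” and see how little their average moves:

```python
import random

random.seed(0)
activations = [random.randint(0, 2**16 - 1) for _ in range(10_000)]    # 16-bit fixed-point values
noisy = [a ^ 1 if random.random() < 0.01 else a for a in activations]  # flip the LSB of ~1% of them

exact = sum(activations) / len(activations)
approx = sum(noisy) / len(noisy)
print(exact, approx, abs(exact - approx))  # each flip changes a value by 1, so the average barely moves (order 0.01) versus values averaging ~33,000
```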
I agree that certain demos might change the mind of certain people. (And if so, they’re worthwhile.) But I also think other people would be immune. For example, suppose someone has the (mistaken) idea: “Nobody would be so stupid as to actually press go on an AI that would then go on to kill lots of people! Or even if theoretically somebody might be stupid enough to do that, management / government / etc. would never let that happen.” Then that mistaken idea would not be disproven by any demo, except a “demo” that involved lots of actual real-life people getting killed. Right?
Hmm, I wasn’t thinking about that because that sentence was nominally in someone else’s voice. But you’re right. I reworded, thanks.
In a perfect world, everyone would be concerned about the risks for which there are good reasons to be concerned, and everyone would be unconcerned about the risks for which there are good reasons to be unconcerned, because everyone would be doing object-level checks of everyone else’s object-level claims and arguments, and coming to the correct conclusion about whether those claims and arguments are valid.
And those valid claims and arguments might involve demonstrations and empirical evidence, but also might be more indirect.
I do think Turing and von Neumann reached correct object-level conclusions via sound reasoning, but obviously I’m stating that belief without justifying it.
I’m not sure what argument you think I’m making.
In a perfect world, I think people would not need any concrete demonstration to be very concerned about AGI x-risk. Alan Turing and John von Neumann were very concerned about AGI x-risk, and they obviously didn’t need any concrete demonstration for that. And I think their reasons for concern were sound at the time, and remain sound today.
But many people today are skeptical that AGI poses any x-risk. (That’s unfortunate from my perspective, because I think they’re wrong.) The point of this post is to suggest that we AGI-concerned people might not be able to win over those skeptics via concrete demonstrations of AI doing scary (or scary-adjacent) things, either now or in the future—or at least, not all of the skeptics. It’s probably worth trying anyway—it might help for some of the skeptics. Regardless, understanding the exact failure modes is helpful.
These algorithms are useful maps of the brain and mind. But is computation also the territory? Is the mind a program? Such a program would need to exist as a high-level abstraction of the brain that is causally closed and fully encodes the mind.
I said it in one of your previous posts but I’ll say it again: I think causal closure is patently absurd, and a red herring. The brain is a machine that runs an algorithm, but algorithms are allowed to have inputs! And if an algorithm has inputs, then it’s not causally closed.
The most obvious examples are sensory inputs—vision, sounds, etc. I’m not sure why you don’t mention those. As soon as I open my eyes, everything in my field of view has causal effects on the flow of my brain algorithm.
Needless to say, algorithms are allowed to have inputs. For example, the mergesort algorithm has an input (namely, a list). But I hope we can all agree that mergesort is an algorithm!
The other example is: the brain algorithm has input channels where random noise enters in. Again, that doesn’t prevent it from being an algorithm. Many famous, central examples of algorithms have input channels that accept random bits—for example, MCMC.
And in regards to “practical CF”, if I run MCMC on my computer while sitting outside, and I use an anemometer attached to the computer as a source of the random input bits entering the MCMC run, then it’s true that you need an astronomically complex hyper-accurate atmospheric simulator in order to reproduce this exact run of MCMC, but I don’t understand your perspective wherein that fact would be important. It’s still true that my computer is implementing MCMC “on a level of abstraction…higher than” atoms and electrons. The wind flowing around the computer is relevant to the random bits, but is not part of the calculations that comprise MCMC (which involve the CPU instruction set etc.). By the same token, if thermal noise mildly impacts my train of thought (as it always does), then it’s true that you need to simulate my brain down to the jiggling atoms in order to reproduce this exact run of my brain algorithm, but this seems irrelevant to me, and in particular it’s still true that my brain algorithm is “implemented on a level of abstraction of the brain higher than biophysics”. (Heck, if I look up at the night sky, then you’d need to simulate the entire Milky Way to reproduce this exact run of my brain algorithm! Who cares, right?)
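Here’s a toy sketch of what I mean (my own example, using a Monte Carlo π estimate instead of MCMC for brevity): the algorithm is the function body, and it’s unchanged whether its random inputs come from a seeded PRNG or from some physical sensor like an anemometer:

```python
import random

def estimate_pi(pairs, n=100_000):
    """Monte Carlo estimate of pi from a stream of (u, v) points in the unit square."""
    hits = 0
    for _ in range(n):
        u, v = next(pairs)
        if u * u + v * v < 1.0:
            hits += 1
    return 4 * hits / n

rng = random.Random(0)
prng_pairs = iter(lambda: (rng.random(), rng.random()), None)  # pseudorandom source
print(estimate_pi(prng_pairs))  # ~3.14

# A hypothetical anemometer_pairs() generator with the same interface could be swapped in:
# the wind would then affect this exact run's output, but it's not part of the algorithm.
```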
I agree!! (if I understand correctly). See https://www.lesswrong.com/posts/RrG8F9SsfpEk9P8yi/robin-hanson-s-grabby-aliens-model-explained-part-1?commentId=wNSJeZtCKhrpvAv7c
Huh, this is helpful, thanks, although I’m not quite sure what to make of it and how to move forward.
I do feel confused about how you’re using the term “equanimity”. I sorta have in mind a definition kinda like: neither very happy, nor very sad, nor very excited, nor very tired, etc. Google gives the example: “she accepted both the good and the bad with equanimity”. But if you’re saying “apply equanimity to positive sensations and it makes them better”, you’re evidently using the term “equanimity” in a different way than that. More specifically, I feel like when you say “apply equanimity to X”, you mean something vaguely like “do a specific tricky learned attention-control maneuver that has something to do with the sensory input of X”. That same maneuver could contribute to equanimity, if it’s applied to something like anxiety. But the maneuver itself is not what I would call “equanimity”. It’s upstream. Or sorry if I’m misunderstanding.
Also, I also want to distinguish two aspects of an emotion. In one, “duration of an emotion” is kinda like “duration of wearing my green hat”. I don’t have to be thinking about it the whole time, but it’s a thing happening with my body, and if I go to look, I’ll see that it’s there. Another aspect is the involuntary attention. As long as it’s there, I can’t not think about it, unlike my green hat. I expect that even black-belt PNSE meditators are unable to instantly turn off anger / anxiety / etc. in the former sense. I think these things are brainstem reactions that can be gradually unwound but not instantly. I do expect that those meditators would be able to more instantly prevent the anger / anxiety / etc. from controlling their thought process. What do you think?
Also, just for context, do you think you’ve experienced PNSE? Thanks!
I don’t think any of the challenges you mentioned would be a blocker to aliens that have infinite compute and infinite time. “Is the data big-endian or little-endian?” Well, try it both ways and see which one is a better fit to observations. If neither seems to fit, then do a combinatorial listing of every one of the astronomical number of possible encoding schemes, and check them all! Spend a trillion years studying the plausibility of each possible encoding before moving on to the next one, just to make sure you don’t miss any subtlety. Why not? You can do all sorts of crazy things with infinite compute and infinite time.
I don’t think this is too related to the OP, but in regard to your exchange with jbash:
I think there’s a perspective where “personal identity” is a strong intuition, but a misleading one—it doesn’t really (“veridically”) correspond to anything at all in the real world. Instead it’s a bundle of connotations, many of which are real and important. Maybe I care that my projects and human relationships continue, that my body survives, that the narrative of my life is a continuous linear storyline, that my cherished memories persist, whatever. All those things veridically correspond to things in the real world, but (in this perspective) there isn’t some core fact of the matter about “personal identity” beyond that bundle of connotations.
I think jbash is saying (within this perspective) that you can take the phrase “personal identity”, pick whichever connotations you care about, and define “personal identity” as that. And then your response (as I interpret it) is that no, you can’t do that, because there’s a core fact of the matter about personal identity, and that core fact of the matter is very very important, and it’s silly to define “personal identity” as pointing to anything else besides that core fact of the matter.
So I imagine jbash responding that “do I nonetheless continue living (in the sense of, say, anticipating the same kind of experiences)?” is a confused question, based on reifying misleading intuitions around “I”. It’s a bit like saying “in such-and-such a situation, will my ancestor spirits be happy or sad?”
I’m not really defending this perspective here, just trying to help explain it, hopefully.
If we apply the Scott Aaronson waterfall counterargument to your Alice-bot-and-Bob-bot scenario, I think it would say: The first step was running Alice-bot, to get the execution trace. During this step, the conscious experience of Alice-bot manifests (or whatever). Then the second step is to (let’s say) modify the Bob code such that it does the same execution but has different counterfactual properties. Then the third step is to run the Bob code and ask whether the experience of Alice-bot manifests again.
But there’s a more basic question. Forget about Bob. If I run the Alice-bot code twice, with the same execution trace, do I get twice as much Alice-experience stuff? Maybe you think the answer is “yeah duh”, but I’m not so sure. I think the question is confusing, possibly even meaningless. How do you measure how much Alice-experience has happened? The “thick wires” argument (I believe due to Nick Bostrom, see here, p189ff, or shorter version here) seems relevant. Maybe you’ll say that the thick-wires argument is just another reductio about computational functionalism, but I think we can come up with a closely-analogous “thick neurons” thought experiment that makes whatever theory of consciousness you subscribe to have an equally confusing property.
I don’t think Premise 2 is related to my comment. I think it’s possible to agree with premise 2 (“there is an objective fact-of-the-matter whether a conscious experience is occurring”), but also to say that there are cases where it is impossible-in-practice for aliens to figure out that fact-of-the-matter.
By analogy, I can write down a trillion-digit number N, and there will be an objective fact-of-the-matter about what is the prime factorization of N, but it might take more compute than fits in the observable universe to find out that fact-of-the-matter.
This is kinda helpful but I also think people in your (1) group would agree with all three of: (A) the sequence of thoughts that you think directly corresponds to something about the evolving state of activity in your brain, (B) random noise has nonzero influence on the evolving state of activity in your brain, (C) random noise cannot be faithfully reproduced in a practical simulation.
And I think that they would not see anything self-contradictory about believing all of those things. (And I also don’t see anything self-contradictory about that, even granting your (1).)
Well, I guess this discussion should really be focused more on personal identity than consciousness (OP wrote: “Whether or not a simulation can have consciousness at all is a broader discussion I'm saving for later in the sequence, and is relevant to a weaker version of CF.”).
So in that regard: my mental image of computational functionalists in your group (1) would also say things like (D) “If I start 5 executions of my brain algorithm, on 5 different computers, each with a different RNG seed, then they are all conscious (they are all exuding consciousness-stuff, or whatever), and they all have equal claim to being “me”, and of course they all will eventually start having different trains of thought. Over the months and years they might gradually diverge in beliefs, memories, goals, etc. Oh well, personal identity is a fuzzy thing anyway. Didn’t you read Parfit?”
But I haven’t read as much of the literature as you, so maybe I’m putting words in people’s mouths.
FYI for future readers: the OP circles back to this question (what counts as a computation) more in a later post of this sequence, especially its appendix, and there’s some lively discussion happening in the comments section there.
You can’t be wrong about the claim “you are having a visual experience”.
Have you heard of Cotard's syndrome?
It’s interesting that you care about what the alien thinks. Normally people say that the most important property of consciousness is its subjectivity. Like, people tend to say things like “Is there something that it’s like to be that person, experiencing their own consciousness?”, rather than “Is there externally-legible indication that there’s consciousness going on here?”.
Thus, I would say: the simulation contains a conscious entity, to the same extent that I am a conscious entity. Whether aliens can figure out that fact is irrelevant.
I do agree with the narrow point that a simulation of consciousness can be externally illegible, i.e. that you can manifest something that’s conscious to the same extent that I am, in a way where third parties will be unable to figure out whether you’ve done that or not. I think a cleaner example than the ones you mentioned is: a physics simulation that might or might not contain a conscious mind, running under homomorphic encryption with a 100000-bit key, and where all copies of the key have long ago been deleted.
Actually never mind. But for future reference I guess I’ll use the intercom if I want an old version labeled. Thanks for telling me how that works. :)
(There’s a website / paper going around that cites a post I wrote way back in 2021, when I was young and stupid, so it had a bunch of mistakes. But after re-reading that post again this morning, I decided that the changes I needed to make weren’t that big, and I just went ahead and edited the post like normal, and added a changelog to the bottom. I’ve done this before. I’ll see if anyone complains. I don’t expect them to. E.g. that same website / paper cites a bunch of arxiv papers while omitting their version numbers, so they’re probably not too worried about that kind of stuff.)
I think there might be a lesswrong editor feature that allows you to edit a post in such a way that the previous version is still accessible. Here’s an example—there’s a little icon next to the author name that says “This post has major past revisions…”. Does anyone know where that option is? I can’t find it in the editor UI. (Or maybe it was removed? Or it’s only available to mods?) Thanks in advance!
There’s a theory (twitter citing reddit) that at least one of these people filed GDPR right to be forgotten requests. So one hypothesis would be: all of those people filed such GDPR requests.
But the reddit post (as of right now) guesses that it might not be specifically about GDPR requests per se, but rather more generally “It's a last resort fallback for preventing misinformation in situations where a significant threat of legal action is present”.
Good luck! I was writing about it semi-recently here.
General comment: It’s also possible to contribute to mind uploading without getting a PhD—see last section of that post. There are job openings that aren’t even biology, e.g. ML engineering. And you could also earn money and donate it, my impression is that there’s desperate need.
I guess I shouldn’t put words in other people’s mouths, but I think the fact that years-long trains-of-thought cannot be perfectly predicted in practice because of noise is obvious and uninteresting to everyone, I bet including to the computational functionalists you quoted, even if their wording on that was not crystal clear.
There are things that the brain does systematically and robustly by design, things which would be astronomically unlikely to happen by chance. E.g. the fact that I move my lips to emit grammatical English-language sentences rather than random gibberish. Or the fact that humans wanted to go to the moon, and actually did so. Or the fact that I systematically take actions that tend to lead to my children surviving and thriving, as opposed to suffering and dying.
That kind of stuff, which my brain does systematically and robustly, is what makes me me. My memories, goals, hopes and dreams, skills, etc. The fact that I happened to glance towards my scissors at time 582834.3 is not important, but the robust systematic patterns are.
And the reason that my brain does those things systematically and robustly is because the brain is designed to run an algorithm that does those things. And there’s a mathematical explanation of why this particular algorithm does those remarkable systematic things like invent quantum mechanics and reflect on the meaning of life, and separately, there’s a biophysical explanation of how it is that the brain is a machine that runs this algorithm.
I don’t think “software versus hardware” is the right frame. I prefer “the brain is a machine that runs a certain algorithm”. Like, what is software-versus-hardware for a mechanical calculator? I dunno. But there are definitely algorithms that the mechanical calculator is executing.
So we can talk about what is the algorithm that the brain is running, and why does it work? Well, it builds models, and stores them, and queries them, and combines them, and edits them, and there’s a reinforcement learning actor-critic thing, blah blah blah.
Those reasons can still be valid even if there’s some unpredictable noise in the system. Think of a grandfather clock—the second hand will robustly move 60× faster than the minute hand, by design, even if there’s some noise in the pendulum that affects the speed of both, or randomness in the surface friction that affects the exact micron-level location that the second hand comes to rest each tick. Or think of an algorithm that involves randomness (e.g. MCMC), and hence any given output is unpredictable, but the algorithm still robustly and systematically does stuff that is a priori specifiable and would be astronomically unlikely to happen by chance. Or think of the Super Mario 64 source code compiled to different chip architectures that use different-size floats (for example). You can play both, and they will both be very recognizably Super Mario 64, but any given exact sequence of button presses will eventually lead to divergent trajectories on the two systems. (This kind of thing is known to happen in tool-assisted speedruns—they’ll get out of sync on different systems, even when it’s “the same game” to all appearances.)
But it’s still reasonable to say that the Super Mario 64 source code is specifying an algorithm, and all the important properties of Super Mario 64 are part of that algorithm, e.g. what does Mario look like, how does he move, what are the levels, etc. It’s just that the core algorithm is not specified at such a level of detail that we can pin down what any given infinite sequence of button presses will do. That depends on unimportant details like floating point rounding.
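Here’s a toy illustration of that last point (my own example, a generic chaotic update rule rather than anything Mario-specific): run the same algorithm in float32 versus float64, and the exact trajectories diverge after a few dozen steps, even though both runs are recognizably “the same system”:

```python
import numpy as np

def trajectory(dtype, steps=60):
    """Iterate the chaotic logistic map x -> 3.9*x*(1-x) at a given float precision."""
    x = np.array(0.2, dtype=dtype)
    r = np.array(3.9, dtype=dtype)
    out = []
    for _ in range(steps):
        x = r * x * (1 - x)
        out.append(float(x))
    return out

t32, t64 = trajectory(np.float32), trajectory(np.float64)
print(t32[:3], t64[:3])     # early iterates agree closely across precisions
print(t32[-3:], t64[-3:])   # dozens of steps later, the exact values have gone their separate ways
print(min(t64), max(t64))   # yet both runs stay bounded in (0,1), with the same qualitative behavior
```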
I think this is compatible with how people use the word “algorithm” in practice. Like, CS people will casually talk about “two different implementations of the MCMC algorithm”, and not just “two different algorithms in the MCMC family of algorithms”.
That said, I guess it’s possible that Putnam and/or Piccinini were describing things in a careless or confused way viz. the role of noise impinging upon the brain. I am not them, and it’s probably not a good use of time to litigate their exact beliefs and wording. ¯\_(ツ)_/¯
I should probably let EuanMcLean speak for themselves but I do think “literally the exact same sequence of thoughts in the exact same order” is what OP is talking about. See the part about “causal closure”, and “predict which neurons are firing at t1 given the neuron firings at t0…”. The latter is pretty unambiguous IMO: literally the exact same sequence of thoughts in the exact same order.
I definitely didn’t write anything here that amounts to a general argument for (or against) computationalism. I was very specifically responding to this post. :)
In my last post, I defined a concrete claim that computational functionalists tend to make:
Practical CF: A simulation of a human brain on a classical computer, capturing the dynamics of the brain on some coarse-grained level of abstraction, that can run on a computer small and light enough to fit on the surface of Earth, with the simulation running at the same speed as base reality, would cause the same conscious experience as that brain.
From reading this comment, I understand that you mean the following:
- Practical CF, more explicitly: A simulation of a human brain on a classical computer, capturing the dynamics of the brain on some coarse-grained level of abstraction, that can run on a computer small and light enough to fit on the surface of Earth, with the simulation running at the same speed as base reality, would cause the same conscious experience as that brain, in the specific sense of thinking literally the exact same sequence of thoughts in the exact same order, in perpetuity.
I agree that “practical CF” as thus defined is false—indeed I think it’s so obviously false that this post is massive overkill in justifying it.
But I also think that “practical CF” as thus defined is not in fact a claim that computational functionalists tend to make.
Let’s put aside simulation and talk about an everyday situation.
Suppose you’re the building manager of my apartment, and I’m in my apartment doing work. Unbeknownst to me, you flip a coin. If it’s heads, then you set the basement thermostat to 20°C. If it’s tails, then you set the basement thermostat to 20.1°C. As a result, the temperature in my room is slightly different in the two scenarios, and thus the temperature in my brain is slightly different, and this causes some tiny number of synaptic vesicles to release differently under heads versus tails, which gradually butterfly-effect into totally different trains of thought in the two scenarios, perhaps leading me to make a different decision on some question where I was really ambivalent and going back and forth, or maybe having some good idea in one scenario but not the other.
But in both scenarios, it’s still “me”, and it’s still “my mind” and “my consciousness”. Do you see what I mean?
So anyway, when you wrote “A simulation of a human brain on a classical computer…would cause the same conscious experience as that brain”, I initially interpreted that sentence as meaning something more like “the same kind of conscious experience”, just as I would have “the same kind of conscious experience” if the basement thermostat were unknowingly set to 20°C versus 20.1°C.
(And no I don’t just mean “there is a conscious experience either way”. I mean something much stronger than that—it’s my conscious experience either way, whether 20°C or 20.1°C.)
Do you see what I mean? And under that interpretation, I think that the statement would be not only plausible but also a better match to what real computational functionalists usually believe.
I expect Lightcone to be my primary or maybe only x-risk-related donation this year—see my manifund comment here for my endorsement:
As a full-time AGI safety / alignment researcher (see my research output), I can say with confidence that I wouldn’t have been able to get into the field in the first place, and certainly wouldn’t have made a fraction as much progress, without lesswrong / alignment forum (LW/AF). I continue to be extremely reliant on it for my research progress. … [much more here]
Wish I had more to give, but I’ll send something in the mid four figures at the beginning of January (for tax reasons).
Ground shipping is both a complement and a substitute for water shipping, so the net effect isn’t obvious. (Or at least, it’s not obvious to me.)
Also, if a certain interest group has not lobbied in a policy area in the past (as I think is the case here?), then that's nonzero evidence that they will continue to not lobby in that policy area in the future.
For Whole Brain Emulation (WBE):
WBE involves (1) measuring a connectome, a.k.a. brain scanning, and (2) turning that data into a working human emulation with the same human drives, memories, goals, etc. I’m generally optimistic about (1) and pessimistic about (2)—I think (2) is both much harder and much less useful for x-risk reduction than it might seem. But I also think there’s a great x-risk reduction case for making progress towards (1), even setting (2) aside. And after talking to a couple orgs in this space, I think massive progress on (1) is possible in the 2020s and certainly the 2030s.
See my posts Connectomics seems great from an AI x-risk perspective and 8 Examples informing my pessimism on uploading without reverse engineering for more on all that. The last section of the former post also lists two more organizations that seem worthy of consideration for funding in this space.
Yeah it’s fine to assume that there might be some period of time that (1) the AGIs don’t escape control, (2) the code doesn’t leak or get stolen, (3) nobody else reinvents the same thing, (4) Company A doesn’t have infinite capital (yet) to spend on renting cloud compute (or the contracts haven’t yet been signed or whatever). And it’s fine to be curious about how many AGIs would Company A have available during this period of time.
We think that period might be substantial, for reasons discussed in Section II.
I don’t think Section II is related to that. Again, the question I’m asking is How long is the period where an already-existing AGI model type / training approach is only running on the compute already owned by the company that made that AGI, rather than on most of the world’s then-existing compute? If I compare that question to the considerations that you bring up in Section II, they seem almost entirely irrelevant, right? I’ll go through them:
Plateau: There may be unexpected development plateaus that come into effect at around human-level intelligence. These plateaus could be architecture-specific (scaling laws break down; getting past AGI requires something outside the deep learning paradigm) or fundamental to the nature of machine intelligence.
That doesn’t prevent any of those four things I mentioned: it doesn’t prevent (1) the AGIs escaping control and self-reproducing, nor (2) the code / weights leaking or getting stolen, nor (3) other companies reinventing the same thing, nor (4) the AGI company (or companies) having an ability to transform compute into profits at a wildly higher exchange rate than any other compute customer, and thus making unprecedented amounts of money off their existing models, and thus buying more and more compute to run more and more copies of their AGI (e.g. see the “Everything, Inc.” scenario of §3.2.4 here).
Pause: Government intervention could pause frontier AI development. Such a pause could be international. It is plausible that achieving or nearly achieving an AGI system would constitute exactly the sort of catalyzing event that would inspire governments to sharply and suddenly restrict frontier AI development.
That definitely doesn’t prevent (1) or (2), and it probably doesn’t prevent (3) or (4) either depending on implementation details.
Collapse: Advances in AI are dependent on the semiconductor industry, which is composed of several fragile supply chains. A war between China and Taiwan is considered reasonably possible by experts and forecasters. Such an event would dramatically disrupt the semiconductor industry (not to mention the world economy). If this happens around the time that AGI is first developed, AI capabilities could be artificially suspended at human-level for years while computer chip supply chains and AI firms recover.
That doesn’t prevent any of (1,2,3,4). Running an already-existing AGI model on the world’s already-existing stock of chips is unrelated to how many new chips are being produced. And war is not exactly a time when governments tend to choose caution and safety over experimenting with powerful new technologies at scale. Likewise, war is a time when rival countries are especially eager to steal each other’s military-relevant IP.
Abstention: Many frontier AI firms appear to take the risks of advanced AI seriously, and have risk management frameworks in place (see those of Google DeepMind, OpenAI, and Anthropic). Some contain what Holden Karnofsky calls if-then commitments: “If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time.” Commitments to pause further development may kick in at human-level capabilities. AGI firms might avoid recursive self-improvement to avoid existential or catastrophic risks.
That could be relevant to (1,2,4) with luck. As for (3), it might buy a few months, before Meta and the various other firms and projects that are extremely dismissive of the risks of advanced AI catch up to the front-runners.
Windup: There are hard-to-reduce windup times in the production process of frontier AI models. For example, a training run for future systems may run into the hundreds of billions of dollars, consuming vast amounts of compute and taking months of processing. Other bottlenecks, like the time it takes to run ML experiments, might extend this windup period.
That doesn’t prevent any of (1,2,3,4). Again, we’re assuming the AGI already exists, and discussing how many servers will be running copies of it, and how soon. The question of training next-generation even-more-powerful AGIs is irrelevant to that question. Right?
Yeah it’s fine to assume that there might be some period of time during which (1) the AGIs don’t escape control, (2) the code doesn’t leak or get stolen, (3) nobody else reinvents the same thing, (4) Company A doesn’t have infinite capital (yet) to spend on renting cloud compute (or the contracts haven’t yet been signed or whatever). And it’s fine to be curious about how many AGIs Company A would have available during this period of time.
And then a key question is whether anything happens during that period of time that would change what happens after that period of time. (And if not, then the analysis isn’t too important.) A pivotal act would certainly qualify. I’m kinda cynical in this area; I think the most likely scenario by far is that nothing happens during this period that has an appreciable impact on what happens afterwards. Like, I’m sure that Company A would try to get their AGIs to beat benchmarks, do scientific research, make money, etc. I also expect them to have lots of very serious meetings, both internally and with government officials. But I don’t expect that Company A would succeed at making the world resilient to future out-of-control AGIs, because that’s just a crazy hard thing to do even with millions of intent-aligned AGIs at your disposal. I discussed some of the practical challenges at What does it take to defend the world against out-of-control AGIs?.
Well anyway. My comment above was just saying that the OP could be clearer on what they’re trying to estimate, not that they’re wrong to be trying to estimate it. :)
I'd be fascinated by having a conversation about why 1e14 FLOP/s might be a better estimate.
I think I don’t want to share anything publicly beyond what I wrote in Section 3 here. ¯\_(ツ)_/¯
For longer term brain processes, you need to take into account fractional shares of relatively-slow-but-high-complexity processes
Yeah I’ve written about that too (here). :) I think that’s much more relevant to how hard it is to create AGI rather than how hard it is to run AGI.
But also, I think it’s easy to intuitively mix up “complexity” with “not-knowing-what’s-going-on”. Like, check out this code, part of an AlphaZero-chess clone project. Imagine knowing nothing about chess, and just looking at a minified (or compiled) version of that code. It would feel like an extraordinarily complex, inscrutable mess. But if you do know how chess works and you’re trying to write that code in the first place, no problem, it’s a few days of work to get it basically up and running. And it would no longer feel very complex to you, because you would have a framework for understanding it.
By analogy, if we don’t know what all the protein cascades etc. are doing in the brain, then they feel like an extraordinarily complex, inscrutable mess. But if you have a framework for understanding them, and you’re writing code that does the same thing (e.g. sets certain types of long-term memory traces in certain conditions, or increments a counter variable, or whatever) in your AGI, then that code-writing task might feel pretty straightforward.
I understand that you’re basically assuming that the “initial AGI population” is running on only the same amount of compute that was used to train that very AGI. It’s fine to make that assumption but I think you should emphasize it more. There are a lot of situations where that’s not an appropriate assumption, but rather the relevant question is “what’s the AGI population if most of the world’s compute is running AGIs”.
For example, if the means to run AGIs (code, weights, whatever) gets onto the internet, then everybody all over the world would be doing that immediately. Or if a power-seeking AGI escapes human control, then a possible thing it might do is work to systematically get copies of itself running on most of the world’s compute. Or another possible thing it might do is wipe out humanity and then get copies of itself running on most of the world’s compute, and then we’ll want to know if that’s enough AGIs for a self-sufficient stable supply chain (see “Argument 2” here). Or if we’re thinking more than a few months after AGI becomes possible at all, in a world like today’s where the leader is only slightly ahead of a gaggle of competitors and open-source projects, then AGI would again presumably be on most of the world’s compute. Or if we note that a company with AGI can make unlimited money by renting more and more compute to run more AGIs to do arbitrary remote-work jobs, then we might guess that they would decide to do so, which would lead to scaling up to as much compute around the world as money can buy.
OK, here’s the part of the post where you justified your decision to base your analysis on one training run worth of compute rather than one planet worth of compute, I think:
One reason the training run imputation approach is likely still solid is that competition between firms or countries will crowd out compute or compute will be excluded on national security grounds. Consider the two main actors that could build AGI. If a company builds AGI, they are unlikely to have easy access to commodified compute that they have not themselves built, since they will be in fierce competition with other firms buying chips and obtaining compute. If a government builds AGI, it seems plausible they would impose strict security measures on their compute, reducing the likelihood that anything not immediately in the project would be employable at inference.
The first part doesn’t make sense to me:
Let’s say Company A can make AGIs that are drop-in replacements for highly-skilled humans at any existing remote job (including e.g. “company founder”), and no other company can. And Company C is a cloud provider. Then Company A will be able to outbid every other company for Company C’s cloud compute, since Company A is able to turn cloud compute directly into massive revenue. It can just buy more and more cloud compute from C and every other company, funding itself with rapid exponential growth, until the whole world is saturated.
If Company A and Company B can BOTH make AGIs that are drop-in replacements for highly-skilled humans, and Company C doesn’t do AI research but is just a giant cloud provider, then Company A and Company B will bid against each other to rent Company C’s compute, and no other bidders will be anywhere close to those two. It doesn’t matter whether Company A or Company B wins the auction—Company C’s compute is going to be running AGIs either way. Right?
Next, the second part.
Yes it’s possible that a government would be sufficiently paranoid about IP theft (or loss of control or other things) that it doesn’t want to run its AGI code on random servers that it doesn’t own itself. (We should be so lucky!) It’s also possible that a company would make the same decision for the same reason. Yeah OK, that’s indeed a scenario where one might be interested in the question of what AGI population you get for its training compute. But that’s really only relevant if the government or company rapidly does a pivotal act, I think. Otherwise that’s just an interesting few-month period of containment before AGIs are on most of the world’s compute as above.
we found three existing attempts to estimate the initial AGI population
FWIW Holden Karnofsky wrote a 2022 blog post “AI Could Defeat All Of Us Combined” that mentions the following: “once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each.” Brief justification in his footnote 5. Not sure that adds much to the post, it just popped into my head as a fourth example.
~ ~ ~
For what it’s worth, my own opinion is that 1e14 FLOP/s is a better guess than 1e15 FLOP/s for human brain compute, and also that we should divide all the compute in the world, including consumer PCs, by 1e14 FLOP/s to guess (what I would call) “initial AGI population”, for all planning purposes apart from pivotal acts. But you’re obviously assuming that AGI will be an LLM, and I’m assuming that it won’t, so you should probably ignore my opinion. We’re talking about different things. Just thought I’d share anyway ¯\_(ツ)_/¯
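In case it’s useful, here’s the arithmetic I have in mind as a trivial sketch. The worldwide-compute figure below is a placeholder I’m making up purely for illustration (not an estimate I’m defending); the only number I actually stand behind is the 1e14 FLOP/s guess.

```python
# Minimal sketch of the division I'm describing. WORLD_FLOPS is a purely
# illustrative placeholder, not a figure I'm endorsing; swap in whatever
# estimate of total worldwide compute you find credible.
BRAIN_FLOPS = 1e14   # my guess for human-brain-equivalent compute, FLOP/s
WORLD_FLOPS = 1e21   # hypothetical total worldwide compute, FLOP/s (placeholder)

initial_agi_population = WORLD_FLOPS / BRAIN_FLOPS
print(f"~{initial_agi_population:.0e} human-brain-equivalents")  # ~1e+07 with these inputs
```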
(Fun tangent, not directly addressing this argument thread.)
There’s a trio of great posts from 2015 by @JonahS : The Truth About Mathematical Ability ; Innate Mathematical Ability ; Is Scott Alexander bad at math? which (among other things) argues that you can be “good at math” along the dimension(s) of noticing patterns very quickly, AND/OR you can be “good at math” along the dimension(s) of an “aesthetic” sense for concepts being right and sensible. (My summary, not his.)
The “aesthetics” is sorta a loss function that provides a guidestar for developing good deep novel understanding—but that process may take a very long time. He offers Scott Alexander, and himself, and Alexander Grothendieck as examples of people with lopsided profiles—stronger on “aesthetics” than they are on “fast pattern-recognition”.
I found it a thought-provoking hypothesis. I wish JonahS had written more.
Do you think there are edge cases where I ask “Is such-and-such system running the Miller-Rabin primality test algorithm?”, and the answer is not a clear yes or no, but rather “Well, umm, kinda…”?
(Not rhetorical! I haven’t thought about it much.)
In case anyone’s wondering, if there’s a 1/n chance of something happening each time (iid), and you try n times (for large n), then it will happen m times with probability approximately 1/(e·m!). So 0,1,2,3… hits would be 36.8%, 36.8%, 18.4%, 6.1%, 1.5%, 0.3%, … Nice how it sums to one.
(For the general formula, i.e. where the probability is not necessarily 1/n, see: poisson distribution.)
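And if anyone wants to check those numbers, here’s a quick sketch using the standard Poisson pmf with mean 1:

```python
# n independent trials, each with success probability 1/n, converge (as n
# grows) to a Poisson distribution with mean 1, so P(m hits) ≈ e^(-1) / m!.
from math import exp, factorial

def poisson_pmf(m: int, lam: float = 1.0) -> float:
    return lam**m * exp(-lam) / factorial(m)

for m in range(6):
    print(m, f"{poisson_pmf(m):.1%}")
# 0 36.8%, 1 36.8%, 2 18.4%, 3 6.1%, 4 1.5%, 5 0.3%
```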
Fun fact, a group of neuroscientists has been trying to start an academia-culture blogging platform / forum thing: https://jocnf.pubpub.org/ E.g. they emphasize the fact that every post gets assigned a DOI.
(…Nobody’s using it though! Basically all the posts so far are by the people who run it, or their close friends.)
(But it’s still very new. Probably too soon to judge it a failure.)
Just thought that might be an interesting point of comparison.
Hmm, I think the point I’m trying to make is: it’s dicey to have a system S that’s being continually modified to systematically reduce some loss L, but then we intervene to edit S in a way that increases L. We’re kinda fighting against the loss-reducing mechanism (be it gradient descent or bankroll-changes or whatever), hoping that the loss-reducing mechanism won’t find a “repair” that works around our interventions.
In that context, my presumption is that an AI will have some epistemic part S that’s continually modified to produce correct objective understanding of the world, including correct anticipation of the likely consequences of actions. The loss L for that part would probably be self-supervised learning, but could also include self-consistency or whatever.
And then I’m interpreting you (maybe not correctly?) as proposing that we should consider things like making the AI have objectively incorrect beliefs about (say) bioweapons, and I feel like that’s fighting against this L in that dicey way.
Whereas your Q-learning example doesn’t have any problem with fighting against a loss function, because Q(S,A) is being consistently and only updated by the reward.
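For concreteness, here’s the textbook tabular Q-learning update I have in mind (a generic sketch, not tied to anyone’s specific proposal): the only thing that ever moves Q(S,A) is the reward plus the bootstrapped next-state value.

```python
# Standard tabular Q-learning update (generic sketch). Q[(s, a)] is only ever
# adjusted toward reward + discounted next-state value; there's no separate
# predictive loss for a reward edit to fight against.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)  # Q[(state, action)] -> value estimate

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])

# e.g. one hypothetical update step:
q_update("s0", "left", reward=1.0, next_state="s1", actions=["left", "right"])
```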
The above is inapplicable to LLMs, I think. (And this seems tied IMO to the fact that LLMs can’t do great novel science yet etc.) But it does apply to FixDT.
Specifically, for things like FixDT, if there are multiple fixed points (e.g. I expect to stand up, and then I stand up, and thus the prediction was correct), then whatever process you use to privilege one fixed point over another, you’re not fighting against the above L (i.e., the “epistemic” loss L based on self-supervised learning and/or self-consistency or whatever). L is applying no force either way. It’s a wide-open degree of freedom.
(If your response is “L incentivizes fixed-points that make the world easier to predict”, then I don’t think that’s a correct description of what such a learning algorithm would do.)
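Here’s a toy example of what I mean by L applying no force either way (my own construction, purely for illustration):

```python
# Toy self-fulfilling prediction with two fixed points. Both get zero
# "epistemic" loss, so the predictive loss L can't privilege one over the
# other; something else has to make the choice.

def outcome(prediction: float) -> float:
    """I stand up if and only if I predict that I'll stand up."""
    return 1.0 if prediction > 0.5 else 0.0

def epistemic_loss(prediction: float) -> float:
    """Self-supervised squared error between prediction and what happens."""
    return (prediction - outcome(prediction)) ** 2

for p in (0.0, 1.0):
    print(f"prediction={p}: outcome={outcome(p)}, loss={epistemic_loss(p)}")
# prediction=0.0: outcome=0.0, loss=0.0
# prediction=1.0: outcome=1.0, loss=0.0
```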
So if your feedback proposal exclusively involves a mechanism for privileging one fixed point over another, then I have no complaints, and would describe it as choosing a utility function (preferences not beliefs) within the FixDT framework.
Btw I think we’re in agreement that there should be some mechanism privileging one fixed point over another, instead of ignoring the issue and just letting the underdetermined system do whatever it does.
Updating on things being true or false cannot rule out agentic hypotheses (the inner optimizer problem). … Any sufficiently rich hypotheses space has agentic policies, which can't be ruled out by the feedback.
Oh, I want to set that problem aside because I don’t think you need an arbitrarily rich hypothesis space to get ASI. The agency comes from the whole AI system, not just the “epistemic” part, so the “epistemic” part can be selected from a limited model class, as opposed to running arbitrary computations etc. For example, the world model can be “just” a Bayes net, or whatever. We’ve talked about this before.
Reinforcement Learning cannot rule out the wireheading hypothesis or human-manipulation hypothesis.
I also learned the term observation-utility agents from you :) You don’t think that can solve those problems (in principle)?
I’m probably misunderstanding you here and elsewhere, but enjoying the chat, thanks :)
The OP talks about the fact that evolution produced lots of organisms on Earth, of which humans are just one example, and that if we view the set of all life, arguably more of it consists of bacteria or trees than humans. Then this comment thread has been about the question: so what? Why bring that up? Who cares?
Like, here’s where I think we’re at in the discussion:
Nate or Eliezer: “Evolution made humans, and humans don’t care about inclusive genetic fitness.”
tailcalled: “Ah, but did you know that evolution also made bacteria and trees?”
Nate or Eliezer: “…Huh? What does that have to do with anything?”
If you think that the existence on Earth of lots of bacteria and trees is a point that specifically undermines something that Nate or Eliezer said, then can you explain the details?
Here’s a sensible claim:
CLAIM A: “IF there’s a learning algorithm whose reward function is X, THEN the trained models that it creates will not necessarily explicitly desire X.”
This is obviously true, and every animal, including humans, serves as an example. For most animals, it’s trivially true, because most animals don’t even know what inclusive genetic fitness is, so obviously they don’t explicitly desire it.
So here’s a stronger claim:
CLAIM B: “CLAIM A is true even if the trained model is sophisticated enough to fully understand what X is, and to fully understand that it was itself created by this learning algorithm.”
This one is true too, and I think humans are the only example we have. I mean, the claim is really obvious if you know how algorithms work etc., but of course some people question it anyway, so it can be nice to have a concrete illustration.
(More discussion here.)
Neither of those claims has anything to do with humans being the “winners” of evolution. I don’t think there’s any real alignment-related claim that does. Although, people say all kinds of things, I suppose. So anyway, if there’s really something substantive that this post is responding to, I suggest you try to dig it out.
I’ve been on twitter since 2013 and have only ever used the OG timeline (a.k.a. chronological, a.k.a. “following”, a.k.a. every tweet from the people you follow and no others). I think there were periods where the OG timeline was (annoyingly) pretty hard to find, and there were periods where you would be (infuriatingly) auto-switched out of the OG timeline every now and then (weekly-ish?) and had to manually switch back. The OG timeline also has long had occasional advertisements of course. And you might be right that (in some periods) the OG timeline also included occasional other tweets that shouldn’t be in the OG timeline but were thrown in. IIRC, I thought of those as being in the same general category as advertisements, but just kinda advertisements for using more twitter. I think there was a “see less often” option for those, and I always selected that, and I think that helped maintain the relative purity of my OG timeline.