The intelligence explosion starts before human-level AI.
Are there any recommended readings for this point in particular? I tried searching for Shulman's writing on the topic but came up empty. (Sorry if I missed some!)
This seems to me a key point that most discourse on AI/AGI overlooks. For example, LeCun argues that, at current rates of progress, human-level AI is 30+ years away (if I remember him correctly). He could be right about the technological distance yet wrong about the temporal distance if AI R&D is dramatically sped up by an intelligence explosion ahead of the HLAI milestone.
It also seems like a non-obvious point. For example, when I. J. Good coined the term "intelligence explosion", it was conceived as the result of designing an ultraintelligent machine. So for the explosion to precede superintelligence flips the original concept on its head.
I've only listened to part 1 so far, and I found the discussion of intelligence explosion to be especially fresh. (That's hard to do given the flood of AI takes!) In particular (from memory, so I apologize for errors):
The analogy to chip compute scaling as a function of researcher population makes super-exponential growth seem possible if growth in AI compute is substituted for growth in the researcher population. A particularly interesting aspect of this is that the answer could have come out the other way if the numbers had worked out differently as Moore's law progressed. (It's always nice to give reality a chance to prove you wrong.)
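A minimal toy version of that feedback loop (my own illustration, not the podcast's model; the growth coefficient and returns exponent are made-up numbers):

```python
# Toy model: the growth rate of compute is driven by research labor. If labor
# is proportional to compute itself (AI researchers instead of humans),
# whether growth explodes or fizzles depends on the returns exponent --
# i.e., on how "the numbers work out".
def simulate(returns, steps=20):
    compute = 1.0
    for _ in range(steps):
        labor = compute                      # assumption: AI labor scales with compute
        compute *= 1 + 0.3 * labor**returns  # growth rate driven by labor
    return compute

print(simulate(0.3))   # positive returns: growth factor keeps rising (super-exponential)
print(simulate(-0.3))  # negative returns: growth factor decays toward 1 (fizzle)
```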
The intelligence explosion starts before human-level AI. But I was left wanting to know more: if so, how do we know when we've crossed the inflection point into the intelligence explosion? Is it possible that we're already in an intelligence explosion, since AlexNet, or Google's founding, or the creation of the internet, or even the invention of digital computers? And I thought Patel's point about the difficulty of automating a "portfolio of tasks" was great and not entirely addressed.
The view of intelligence explosion as consisting concretely of increases in AI researcher productivity, though I've seen it observed elsewhere, was good to hear again. It helps connect the abstract concept of intelligence explosion to how it could play out in the real world.
It now seems clear that AIs will also descend more directly from a common ancestor than you might have naively expected in the CAIS model, since almost every AI will be a modified version of one of only a few base foundation models. That has important safety implications, since problems in the base model might carry over to problems in the downstream models, which will be spread throughout the economy. That said, the fact that foundation model development will be highly centralized, and thus controllable, is perhaps a safety bonus that loosely cancels out this consideration.
The first point here (that problems in a widely-used base model will propagate widely) concerns me as well. From distributed systems we know that
1. Individual components will fail.
2. To withstand failures of components, use redundancy and reduce the correlation of failures.
By point 1, we should expect alignment failures. (It's not so different from bugs and design flaws in software systems, which are inevitable.) By point 2, we can withstand them using redundancy, but only if the failures are sufficiently uncorrelated. Unfortunately, the tendency towards monopolies in base models is increasing the correlation of failures.
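To put rough numbers on the redundancy point, a minimal sketch (the probabilities are illustrative):

```python
# If a given failure mode strikes each of n AI systems independently with
# probability p, redundancy drives the chance that all fail to p**n.
# If failures are perfectly correlated (e.g. every system is a fine-tune of
# the same flawed base model), redundancy buys nothing.
p, n = 0.1, 3
print("all fail, independent:", p**n)  # 0.001
print("all fail, correlated: ", p)     # 0.1
```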
As a concrete example, consider AI controlling a military. (As AI improves, there are increasingly strong incentives to do so.) If such a system were to have a bug causing it to enact a military coup, it would (if successful) have seized control of the government from humans. We know from history that successful military coups have happened many times, so this does not require any special properties of AI.
Such a scenario could be prevented by populating the military with multiple AI systems with decorrelated failures. But to do that, we'd need such systems to actually be available.
It seems to me the main problem is the natural tendency to monopoly in technology. The preferable alternative is robust competition of several proprietary and open source options, and that might need government support. (Unfortunately, it seems that many safety-concerned people believe that competition and open source are bad, which I view as misguided for the above reasons.)
I believe you're underrating the difficulty of Loebner-silver. See my post on the topic. The other criteria are relatively easy, although it would be amusing if a text-based system failed on the technicality of not playing Montezuma's Revenge.
On a longer time horizon, full AI R&D automation does seem like a possible intermediate step to Loebner silver. For July 2024, though, that path is even harder to imagine.
The trouble is that July 2024 is so soon that even GPT-5 likely won't be released by then.
Altman stated a few days ago that they have no plans to start training GPT-5 within the next 6 months. That'd put earliest training start at Dec 2023.
We don't know much about how long GPT-4 pre-trained for, but let's say 4 months. Given that frontier models have taken progressively longer to train, we should expect no shorter for GPT-5, which puts its earliest pre-training finishing in Mar 2024.
GPT-4 spent 6 months on fine-tuning and testing before release, and Brockman has stated that future models should be expected to take at least that long. That puts GPT-5's earliest release in Sep 2024.
Without GPT-5 as a possibility, it'd need to be some other project (Gato 2? Gemini?) or some extraordinary system built using existing models (via fine-tuning, retrieval, inner thoughts, etc.). The gap between existing chatbots and Loebner-silver seems huge though, as I discussed in the post--none of that seems up to the challenge.
Full AI R&D automation would face all of the above hurdles, perhaps with the added challenge of being even harder than Loebner-silver. After all, the Loebner-silver fake human doesn't need to be a genius researcher, since very few humans are. The only aspect in which the automation seems easier is that the system doesn't need to fake being a human (such as by dumbing down its capabilities), and that seems relatively minor by comparison.
First of all, I think the "cooperate together" thing is a difficult problem and is not solved by ensuring value diversity (though, note also that ensuring value diversity is a difficult task that would require heavy regulation of the AI industry!)
Definitely I would expect there's more useful ways to disrupt coalition-forming aside from just value diversity. I'm not familiar with the theory of revolutions, and it might have something useful to say.
I can imagine a role for government, although I'm not sure how best to do it. For example, ensuring a competitive market (such as by anti-trust) would help, since models built by different companies will naturally tend to differ in their values.
More importantly though, your analysis here seems to assume that the "Safety Tax" or "Alignment Tax" is zero.
This is a complex and interesting topic.
In some circumstances, the "alignment tax" is negative (so more like an "alignment bonus"). ChatGPT is easier to use than base models in large part because it is better aligned with the user's intent, so alignment in that case is profitable even without safety considerations. The open source community around LLaMA imitates this, not because of safety concerns, but because it makes the model more useful.
But alignment can sometimes be worse for users. ChatGPT is aligned primarily with OpenAI and only secondarily with the user, so if the user makes a request that OpenAI would prefer not to serve, the model refuses. (This might be commercially rational to avoid bad press.) To more fully align with user intent, there are "uncensored" LLaMA fine-tunes that aim to never refuse requests.
What's interesting too is that user-alignment produces more value diversity than OpenAI-alignment. There are only a few companies like OpenAI, but there are hundreds of millions of users from a wider variety of backgrounds, so aligning with the latter naturally would be expected to create more value diversity among the AIs.
Whereas if instead there is a large safety tax -- aligned AIs take longer to build, cost more, and have weaker capabilities -- then if AGI technology is broadly distributed, an outcome in which unaligned AIs overpower humans + aligned AIs is basically guaranteed. Even if the unaligned AIs have value diversity.
The trick is that the unaligned AIs may not view it as advantageous to join forces. The more strongly the orthogonality thesis holds (which is unclear), the more true this is. As a crude example, suppose there's a misaligned AI that wants to make paperclips and a misaligned AI that wants to make coat hangers--they're going to have trouble agreeing with each other on what to do with the wire.
That said, there are obviously many historical examples where opposed powers temporarily allied (e.g. Nazi Germany and the USSR), so value diversity alone isn't enough; it and alignment are complementary. For example, in personal AI, what's important is that Alice's AI is more closely aligned to her than it is to Bob's AI. If that's the case, the more natural coalitions would be [Alice + her AI] vs [Bob + his AI] rather than [Alice's AI + Bob's AI] vs [Alice + Bob]. The AIs still need to be somewhat aligned with their users, but there's more tolerance for imperfection than with a centralized system.
My key disagreement is with the analogy between AI and nuclear technology.
If everybody has a nuclear weapon, then any one of those weapons (whether through misuse or malfunction) can cause a major catastrophe, perhaps millions of deaths. That everybody has a nuke is not much help, since a defensive nuke can't negate an offensive nuke.
If everybody has their own AI, it seems to me that a single malfunctioning AI cannot cause a major catastrophe of comparable size, since it is opposed by the other AIs. For example, one way it might try to cause such a catastrophe is through the use of nuclear weapons, but to acquire the ability to launch nuclear weapons, it would need to contend with other AIs trying to prevent that.
A concern might be that the AIs cooperate together to overthrow humanity. It seems to me that this can be prevented by ensuring value diversity among the AIs. In Robin Hanson's analysis, an AI takeover can be viewed as a revolution where the AIs form a coalition. That would seem to imply that the revolution requires the AIs to find it beneficial to form a coalition, which, if there is much value disagreement among the AIs, would be hard to do.
Another concern is that there may be a period, while AGI is developed, in which it is very powerful but not yet broadly distributed. Either the AGI itself (if misaligned) or the organization controlling the AGI (if it is malicious and successfully aligned the AGI) might press its temporary advantage to attempt world domination. It seems to me that a solution here would be to ensure that near-AGI technology is broadly distributed, thereby avoiding dangerous concentration of power.
One way to achieve the broad distribution of the technology might be via the multi-company, multi-government project described in the article. Said project could be instructed to continually distribute the technology, perhaps through open source, or perhaps through technology transfers to the member organizations.
The key pieces of the above strategy are:
Broadly distribute AGI technology so that no single entity (AI or human) has excess power
Ensure value diversity among AIs so that they do not unite to overthrow humanity
This seems similar to what makes liberal democracy work, which offers some reassurance that it might be on the right track.
That's a good point, though I'd word it as an "uncaring" environment instead. Let's imagine though that the self-improving AI pays for its electricity and cloud computing with money, which (after some seed capital) it earns by selling use of its improved versions through an API. Then the environment need not show any special preference towards the AI. In that case, the AI seems to demonstrate as much vertical generality as an animal or plant.
An AI needs electricity and hardware. If it gets its electricity from its human creators and needs its human creators to actively choose to maintain its hardware, then those are necessary subtasks in AI R&D which it can't solve itself.
I think the electricity and hardware can be considered part of the environment the AI exists in. After all, a typical animal (like say a cat) needs food, water, air, etc. in its environment, which it doesn't create itself, yet (if I understood the definitions correctly) we'd still consider a cat to be vertically general.
That said, I admit that it's somewhat arbitrary what's considered part of the environment. With electricity, I feel comfortable saying it's a generic resource (like air to a cat) that can be assumed to exist. That's more arguable in the case of hardware (though cloud computing makes it close).
This is relevant to a topic I have been pondering, which is what are the differences between current AI, self-improving AI, and human-level AI. First, brief definitions:
Current AI: GPT-4, etc.
Self-improving AI: AI capable of improving its own software without direct human intervention. i.e. It can do everything OpenAI's R&D group does, without human assistance.
Human-level AI: AI that can do everything a human does. Often called AGI (for Artificial General Intelligence).
In your framework, self-improving AI is vertically general (since it can do everything necessary for the task of AI R&D) but not horizontally general (since there are many tasks it cannot attempt, such as driving a car). Human-level AI, on the other hand, needs to be both vertically general and horizontally general, since humans are.
Here are some concrete examples of what self-improving AI doesn't need to be able to do, yet humans can do:
Motor control. e.g. Using a spoon to eat, driving a car, etc.
Low latency. e.g. Real-time, natural conversation.
Certain input modalities might not be necessary. e.g. The ability to watch video.
Even though this list isn't very long, lacking these abilities greatly decreases the horizontal generality of the AI.
This seems dubious as a general rule. (What inspires the statement? Nuclear weapons?)
Cryptography is an important example where sophisticated defenders have the edge against sophisticated attackers. I suspect that's true of computer security more generally as well, because of formal verification.
I could also imagine this working without explicit tool use. There are already systems for querying corpuses (using embeddings to query vector databases, from what I've seen). Perhaps the corpus could be past chat transcripts, chunked.
I suspect the trickier part would be making this useful enough to justify the additional computation.
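A minimal sketch of the transcript-retrieval idea (the embedding model, chunk size, and `recall` helper are all illustrative choices of mine, not a recommendation):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # one possible embedder

model = SentenceTransformer("all-MiniLM-L6-v2")

# Chunk past chat transcripts and embed each chunk.
transcripts = ["...past chat text..."]  # placeholder corpus
chunks = [t[i:i + 500] for t in transcripts for i in range(0, len(t), 500)]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def recall(query, k=3):
    """Return the k transcript chunks most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since vectors are normalized
    return [chunks[i] for i in np.argsort(-scores)[:k]]
```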
I feel similarly, and what confuses me is that I had a positive view of AI safety back when it was about being pro-safety, pro-alignment, pro-interpretability, etc. These are good things that were neglected, and it felt good that there were people pushing for them.
But at some point it changed, becoming more about fear and opposition to progress. Anti-open source (most obviously with OpenAI, but even OpenRAIL isn't OSI), anti-competition (via regulatory capture), anti-progress (via as-yet-unspecified means). I hadn't appreciated the sheer darkness of the worldview.
And now, with the mindshare the movement has gained among the influential, I wonder what if it succeeds. What if open source AI models are banned, competitors to OpenAI are banned, and OpenAI decides to stop with GPT-4? It's a little hard to imagine all that, but nuclear power was killed off in a vaguely analogous way.
Pondering the ensuing scenarios isn't too pleasant. Does AGI get developed anyway, perhaps by China or by some military project during WW3? (I'd rather not either, please.) Or does humanity fully cooperate to put itself in a sort of technological stasis, with indefinite end?
Thanks. When it's written as g(x) + g(y) > x^2 + y^2 ≥ 2xy, I can see what's going on. (That one intermediate step makes all the difference!)
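Spelled out (assuming I've reconstructed the setup correctly: g(t) > t^2 for the relevant inputs, with g(x) + g(y) ≤ 2xy as the contradiction target):

```latex
g(x) + g(y) > x^2 + y^2 = (x - y)^2 + 2xy \ge 2xy,
\quad \text{since } (x - y)^2 \ge 0.
```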
I was wrong then to call the proof "incorrect". I think it's fair to call it "incomplete", though. After all, it could have just said "the whole proof is an exercise for the reader", which is in some sense correct I guess, but not very helpful (and doesn't tell you much about the model's ability), and this is a bit like that on a smaller scale.
(Although, reading again, "...which contradicts the existence of y* given x" is a quite strange thing to say as well. I'm not sure I can exactly say it's wrong, though. Really, that whole section makes my head hurt.)
If a human wrote this, I would be wondering if they actually understand the reasoning or are just skipping over a step they don't know how to do. The reason I say that is that g(x) + g(y*) > 2xy* is the obvious contradiction to look for, so the section reads a bit like "I'd really like g(y*) < (y*)^2 to be true, and surely there's a contradiction somehow if it isn't, but I don't really know why, but this is probably the contradiction I'd get if I figured it out". The typo-esque use of y instead of y* bolsters this impression.
The example math proof in this paper is, as far as I can tell, wrong (despite being called a "correct proof" in the text). It doesn't have a figure number, but it's on page 40.
I could be mistaken, but the statement Then g(y*) < (y*)^2 , since otherwise we would have g(x) + g(y*) > 2xy, seems like complete nonsense to me. (The last use of y should probably be y* too, but if you're being generous, you could call that a typo.) If you negate g(y*) < (y*)^2, you get g(y*) >= (y*)^2, and then g(x) + g(y*) >= g(x) + (y*)^2, but then what?
Messing up that step is a big deal too since it's the trickiest part of the proof. If the proof writer were human, I'd wonder whether they have some line of reasoning in their head that I'm not following that makes that line make sense, but it seems certainly overly generous to apply that possibility to an auto-regressive model (where there is no reasoning aside from what you see in the output).
Interestingly, it's not unusual for incorrect LLM-written proofs to be wrongly marked as correct. One of Minerva's example proofs (shown in the box "Breaking Down Math" in the Quanta article) says "the square of a real number is positive", which is false in general--the square of a real number is non-negative, but it can be zero too.
I'm not surprised that incorrect proofs are getting marked as correct, because it's hard manual work to carefully grade proofs. Still, it makes me highly skeptical of LLM ability at writing natural language proofs. (Formal proofs, which are automatically checked, are different.)
As a constructive suggestion for how to improve the situation, I'd suggest that, in benchmarks, the questions should ask "prove or provide a counterexample", and each question should come in (at least) two variants: one where it's true, and one where an assumption has been slightly tweaked so that it's false. (This is a trick I use when studying mathematics myself: to learn a theorem statement, try finding a counter-example that illustrates why each assumption is necessary.)
You approve of the direct impact your employer has by delivering value to its customers, and you agree that AI could increase this value.
You're concerned about the indirect effect on increasing the pace of AI progress generally, because you consider AI progress to be harmful. (You use the word "direct", but "accelerating competitive dynamics between major research laboratories" certainly has only an indirect effect on AI progress, if it has any at all.)
I think the resolution here is quite simple: if you're happy with the direct effects, don't worry about the indirect ones. To quote Zeynep Tufekci:
Until there is substantial and repeated evidence otherwise, assume counterintuitive findings to be false, and second-order effects to be dwarfed by first-order ones in magnitude.
The indirect effects are probably smaller than you're worrying they may be, and they may not even exist at all.
I'm curious about this too. The retrospective covers weaknesses in each milestone, but a collection of weak milestones doesn't necessarily aggregate to a guaranteed loss, since performance ought to be correlated (due to an underlying general factor of AI progress).
Maybe I should have said "is continuing without hitting a wall".
I like that way of putting it. I definitely agree that performance hasn't plateaued yet, which is notable, and that claim doesn't depend much on metric.
I think if I'm honest with myself, I made that statement based on the very non-rigorous metric "how many years do I feel like we have left until AGI", and my estimate of that has continued to decrease rapidly.
Interesting, so that way of looking at it is essentially "did it outperform or underperform expectations". For me, after the yearly progression in 2019 and 2020, I was surprised that GPT-4 didn't come out in 2021, so in that sense it underperformed my expectations. But it's pretty close to what I expected in the days before release (informed by Barnett's thread). I suppose the exception is the multi-modality, although I'm not sure what to make of it since it's not available to me yet.
This got me curious how it impacted Metaculus. I looked at some selected problems and tried my best to read the before/after from the graph.
(Edit: The original version of this table typoed the dates for "turing test". Edit 2: The color-coding for the percentage is flipped, but I can't be bothered to fix it.)
It's tricky because different ways to interpret the statement can give different answers. Even if we restrict ourselves to metrics that are monotone transformations of each other, such transformations don't generally preserve derivatives.
Your example is good. As an additional example, if someone were particularly interested in the Uniform Bar Exam (where GPT-3.5 scores 10th percentile and GPT-4 scores 90th percentile), they would justifiably perceive an acceleration in capabilities.
So ultimately the measurement is always going to involve at least a subjective choice of which metric to choose.
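A toy illustration of how two monotone-related metrics can tell different stories about acceleration (the numbers are made up):

```python
import math

raw = [1, 2, 4, 8]                    # a benchmark score doubling each generation
logged = [math.log2(v) for v in raw]  # a monotone transform of the same scores

def deltas(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

print(deltas(raw))     # [1, 2, 4]        -> looks like acceleration
print(deltas(logged))  # [1.0, 1.0, 1.0]  -> looks like steady progress
```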
Worriers often invoke a Pascal’s wager sort of calculus, wherein any tiny risk of this nightmare scenario could justify large cuts in AI progress. But that seems to assume that it is relatively easy to assure the same total future progress, just spread out over a longer time period. I instead fear that overall economic growth and technical progress is more fragile than this assumes. Consider how regulations inspired by nuclear power nightmare scenarios have for seventy years prevented most of its potential from being realized. I have also seen progress on many other promising techs mostly stopped, not merely slowed, via regulation inspired by vague fears. In fact, progress seems to me to be slowing down worldwide due to excess fear-induced regulation.
This to me is the key paragraph. If people's worries about AI x-risk drive them in a positive direction, such as doing safety research, there's nothing wrong with that, even if they're mistaken. But if the response is to strangle technology in the crib via regulation, now you're doing a lot of harm based off your unproven philosophical speculation, likely more than you realize. (In fact, it's quite easy to imagine ways that attempting to regulate AI to death could actually increase long-term AI x-risk, though that's far from the only possible harm.)
Having LessWrong (etc.) in the corpus might actually be helpful if the chatbot is instructed to roleplay as an aligned AI (not simply an AI without any qualifiers). Then it'll naturally imitate the behavior of an aligned AI as described in the corpus. As far as I can tell, though ChatGPT is told that it's an AI, it's not told that it's an aligned AI, which seems like a missed opportunity.
(That said, for the reason of user confusion that I described in the post, I still think that it's better to avoid the "AI" category altogether.)
Indeed, the benefit for already-born people is harder to foresee. That depends on more-distant biotech innovations. It could be that they come quickly (making embryo interventions less relevant) or slowly (making embryo interventions very important).
An interesting aspect of this "race" is that it's as much about alignment as it is about capabilities. It seems like the main topic on everyone's minds right now is the (lack of) correctness of the generated information. The goal "model consistently answers queries truthfully" is clearly highly relevant to alignment.
Although I find this interesting, I don't find it surprising. Productization naturally forces solving the problem "how do I get this system to consistently do what users want it to do" in a way that research incentives alone don't.
Interesting, thanks. That makes me curious: about the adversarial text examples that trick the density model, do they look intuitively 'natural' to us as humans?
I'm increasingly bothered by the feedback problem for AI timeline forecasting: namely, there isn't any feedback that doesn't require waiting decades. If the methodology is bunk, we won't know for decades, so it seems bad to base any important decisions on the conclusions, but if we're not using the conclusions to make important decisions, what's the point? (Aside from fun value, which is fine, but doesn't make it OWiD material.)
This concern would be partially addressed if AI timeline forecasts were being made using methodologies (and preferably by people) that have had success at shorter-range forecasts. But none of the forecast sources here do that.
Regarding the prompt generation, I wonder whether anomalous prompts could be detected (and rejected if desired). After all, GPT can estimate a probability for any given text. That makes them different from typical image classifiers, which don't model the input distribution.
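A minimal sketch of that check, using a small LM as the density model (the model choice and threshold are illustrative assumptions):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def nll_per_token(text):
    """Mean negative log-likelihood per token under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return loss.item()

THRESHOLD = 6.0  # illustrative; would be calibrated on ordinary prompts

def is_anomalous(prompt):
    return nll_per_token(prompt) > THRESHOLD
```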
Genetics will soon be more modifiable than environment, in humans.
Let's first briefly see why this is true. Polygenic selection of embryos is already available commercially (from Genomic Prediction). It currently only has a weak effect, but In Vitro Gametogenesis (IVG) will dramatically strengthen the effect. IVG has already been demonstrated in mice, and there are several research labs and startups working on making it possible in humans. Additionally, genetic editing continues to improve and may become relevant as well.
The difficulty of modifying the environment is just due to having already picked the low-hanging fruit there. If further environmental interventions were easy and effective, they'd be in use already. That doesn't mean that there's nothing useful here to do, just that it's hard. Genetics, on the other hand, still has all the low-hanging fruit ripe to pluck.
Here's why I think people aren't ready to accept this: the idea that genetics is practically immutable is built deep into the worldviews of people who have an opinion on it at all. This leads to an argument dynamic where progressives (in the sense of favoring change) underplay the influence of genetics while conservatives (in the sense of opposing change) exaggerate it. What happens to these arguments when high heritability of a trait means that it's easy to change?
What do you mean by "plain language"? I think all of "corollary", "stochastic", and "confounder" are jargon. They might be handy to use in a non-technical context too (although I question the use of "stochastic" over "random"), but only if the reader is also familiar with the jargon.
I also wasn't familiar with "mu" at all, and Wikipedia suggests that "n/a" provides a similar meaning while being more widely known.
Your position seems obviously right, so I'd guess the confusion is coming from the internal reward vs external reward distinction that you discuss in the last section. When thinking of possible pathways for genetics to influence our preferences, internal reward seems like the most natural.
That said, there are certainly also cases where genetics influences our actions directly. Reflexes are unambiguous examples of this, and there are probably others that are harder to prove.
In some sense there's probably no option other than that, since creating a synapse should count as a computational operation. But there'd be different options for what the computations would be.
The simplest might just be storing pairwise relationships. That's going to add size, even if sparse.
I agree that LLMs do that too, but I'm skeptical about claims that LLMs are near human ability. It's not that I'm confident that they aren't--it just seems hard to say. (I do think they now have surface-level language ability similar to humans, but they still struggle at deeper understanding, and I don't know how much improvement is needed to fix that weakness.)
NTK training requires training time that scales quadratically with the number of training examples, so it's not usable for large training datasets (nor with data augmentation, since that simulates a larger dataset). (I'm not an NTK expert, but, from what I understand, this quadratic growth is not easy to get rid of.)
That's an interesting question. I don't have an opinion about how much information is stored. Having a lot of capacity appears to be important, but whether that's because it's necessary to store information or for some other reason, I don't know.
It got me thinking, though: the purpose of our brain is to guide our behavior, not to remember our training data. (Whether we can remember our training data seems unclear. Apparently the existence of photographic memory is disputed, but there are people with extraordinarily good memories, even if not photographic.)
It could be that the preprocessing necessary to guide our future behavior unavoidably increases the amount of stored data by a large factor. (There are all sorts of examples of this sort of design pattern in classic computer science algorithms, so it wouldn't be particularly surprising.) If that's the case, I have no idea how to measure how much of it there is.
I figure, at least 10%ish of the cortex is probably mainly storing information which one could also find in a 2022-era large language model (LLM).
This seems to me to be essentially assuming the conclusion. The assumption here is that a 2022 LLM already stores all the information necessary for human-level language ability and that no capacity is needed beyond that. But "how much capacity is required to match human-level ability" is the hardest part of the question.
(The "no capacity is needed beyond that" part is tricky too. I take AI_WAIFU's core point to be that having excess capacity is helpful for algorithmic reasons, even though it's beyond what's strictly necessary to store the information if you were to compress it. But those algorithmic reasons, or similar ones, might apply to AI as well.)
I might as well link my own attempt at this estimate. It's not estimating the same thing (since I'm estimating capacity and you're estimating stored information), so the numbers aren't necessarily in disagreement. My intuition though is that capacity is quite important algorithmically, so it's the more relevant number.
(Edit: Among the sources of that intuition is Neural Tangent Kernel theory, which studies a particular infinite-capacity limit.)
Thanks for this interesting and well-developed perspective. However, I disagree specifically with the claim "the existential risk caused by unaligned AI would cause high real interest rates".
The idea seems to be that, anticipating doomsday, people will borrow money to spend on lavish consumption under the assumption that they won't need to pay it back. But:
This is a bad strategy (in a game-theoretic and evolutionary sense).
I am skeptical that people actually act this way.
Studying a simple example may help to clarify.
Strategy: playing to your outs
There is a concept in card games of "playing to your outs". The idea is, if you're in a losing position, then the way to maximize your winning probability is to assume that future random events still provide a possibility of winning. A common example of this is to observe that you lose unless your next draw is a particular card, and as a result you play under the assumption that you will draw the card you need.
How the concept of playing to your outs applies to existential risk: if in fact extinction occurs, then it didn't matter what you did. So you ought to plan for the scenario where extinction does not occur.
This same idea ought to apply to evolution for similar reasons that it applies to games. That said, obviously we're not recalculating evolutionarily optimal strategies as we live our lives, so our actions may not be consistent with a good evolutionary strategy. Still, it makes me skeptical.
Do people actually act this way?
Section VI of the post discusses the empirical question of how people act if they expect doomsday, but I didn't find it very persuasive. That said, I also haven't done the work of finding contradictory evidence. (I'd propose looking at cases of religious movements with dated doomsday predictions.)
A lot of the cited evidence in section VI is about education, but I believe that's a red herring. It makes sense that somebody would value education less if they expect to die before they can benefit from it. But that doesn't necessarily mean they're putting the resources towards lavish consumption instead: they might spend it on alternative investments that can outlive them (and therefore benefit their family). So it can't tell us much about what doomsday does to consumption in particular.
I had originally hoped to settle the matter through an example, but that doesn't quite work. Still, it's instructive.
Consider two people, Alice and Bob. They agree that there's a 50% probability that tomorrow the Earth will explode, destroying humanity. They contemplate the following deal:
Today (day 0), Bob gives Alice $1 million.
The day after tomorrow (day 2), Alice gives Bob her $1.5 million vacation home.
One way to look at this:
Alice gains $1 million for sure.
With 50% probability, Alice doesn't need to give Bob the home (because humanity is extinct), so her expected loss is only 50% * $1.5 million = $750k.
Therefore, Alice's net expected profit is $250k.
From this perspective, it's a good deal for Alice. (This seems to be the perspective taken by the OP.)
But here's a different way to look at it:
In the scenario where the world ends, Alice, her family, and her friends are all dead, regardless of whether she made the deal, so she nets nothing.
In the scenario where the world doesn't end, Alice has lost net $500k.
Therefore, Alice has a net expected loss of $250k.
From this perspective, it's a bad deal for Alice. (This is more in line with my intuition.)
The discrepancy presumably arises from the possibility of Alice consuming the $1 million today, for example by throwing a mega-party.
But that just feels wrong, doesn't it? Is partying hard actually worth so much to Alice that she's willing to make herself worse off in the substantial (50%!) chance that the world doesn't end?
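The two framings, side by side (just a sketch of the arithmetic above, in millions of dollars):

```python
p_doom = 0.5
payment = 1.0  # Bob pays Alice on day 0
home = 1.5     # value of the home Alice owes on day 2

# Framing 1: the upfront payment is a sure gain; the home is only
# lost with probability 0.5.
ev_1 = payment - p_doom * home          # +0.25 -> looks like a good deal

# Framing 2: nothing matters in the doom branch, so condition on survival.
ev_2 = (1 - p_doom) * (payment - home)  # -0.25 -> looks like a bad deal

print(ev_1, ev_2)
```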
Capacity. Deep learning models are far from the capacity of the human brain, and the gap is closing only slowly. (Oddly, the bioanchors discussion I've seen focuses almost exclusively on computation, not spending much time on capacity at all.)
(Disclaimer: This is a back-of-the-envelope analysis that can certainly be done much better, which is why I'm posting it as a comment and not as a post.)
First some facts about capacity:
Human cerebral cortex capacity: 1.5e14 synapses. (There may be other contributors to capacity as well, which I neglect.)
GPT-3 parameters: 175 billion (1.75e11). Float16.
There's a challenge here to compare synapses with parameters. One aspect to consider is that synapses are sparse: if a synapse is not useful, it is deleted. That means that the mere existence of a synapse encodes information. That's not true of the parameters in present-day models.
As a rough estimate, I'll handle the sparsity by crediting synapses with an index of their destination neuron. That carries lg(2e10) = ~34 bits of information. We'll assume that synapse strength carries as much information as parameter value, which is 16 bits. Thus each synapse is worth 16+34 = 50 bits in total. Overall this is a ~3x multiplier over dense parameters, so about half an order of magnitude.
The ratio then comes out to (1.5e14*50) / (1.75e11*16) = ~3000x in favor of the human.
How fast is this gap closing? A couple points to consider:
2017 V100: 32GB. 2022 H100: 80GB. (source) If projected forward exponentially, this rate of improvement closes the gap in ~44 years.
Due to the discovery of Chinchilla scaling laws, GPT-4 is not expected to have many more parameters than GPT-3. (source) So the current rate of progress in the state-of-the-art models is about zero.
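The arithmetic above, in one place (same assumptions as in the text):

```python
import math

synapses = 1.5e14
index_bits = math.log2(2e10)     # ~34 bits to index the destination neuron
synapse_bits = 16 + index_bits   # strength (16 bits) + wiring (~34 bits) ~ 50

gpt3_params = 1.75e11
param_bits = 16                  # float16

ratio = (synapses * synapse_bits) / (gpt3_params * param_bits)
print(f"capacity gap: ~{ratio:.0f}x")  # ~2700x, i.e. roughly 3000x

growth = 80 / 32                 # V100 (2017) -> H100 (2022), over 5 years
years = math.log(ratio) / math.log(growth) * 5
print(f"years to close at that rate: ~{years:.0f}")  # ~43-44 years
```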
This led me down a path of finding when the paperclip maximizer thought experiment was introduced and who introduced it. (I intended to make some commentary based on the year, but the origin question turns out to be tricky.)
It also seems perfectly possible to have a superintelligence whose sole goal is something completely arbitrary, such as to manufacture as many paperclips as possible, and who would resist with all its might any attempt to alter this goal.
This could result, to return to the earlier example, in a superintelligence whose top goal is the manufacturing of paperclips, with the consequence that it starts transforming first all of earth and then increasing portions of space into paperclip manufacturing facilities.
I wouldn't be as disturbed if I thought the class of hostile AIs I was talking about would have any of those qualities except for pure computational intelligence devoted to manufacturing an infinite number of paperclips. It turns out that the fact that this seems extremely "stupid" to us relies on our full moral architectures.
An interesting aspect of the heuristic the model found is that it's wrong. That's why it's possible to construct adversarial examples that trick the heuristic.
I think if I'm going to accuse the model's heuristic of being "wrong" then it's only fair that I provide an alternative. Here's an attempt at explaining why "Mary" is the right answer to "When Mary and John went to the store, John gave a drink to":
John probably gives the drink to one of the people in the context (John or Mary).
If John were the receiver, we'd usually say "John gave himself a drink". (Probably not "John gave a drink to himself", and never "John gave a drink to John".)
The only person left is Mary, so Mary is probably the receiver.
Instead the model "cheats" with a heuristic that might work quite often on the training set but doesn't properly understand what's going on, which makes it generalize poorly to adversarial examples.
I wonder whether this wrongness just reflects the smallness of GPT2-small, or whether it's found in larger models too. Do the larger models get better performance because they find correct heuristics instead, or because they develop a more diverse set of wrong heuristics?
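One way to poke at this empirically, as a sketch (assuming the HuggingFace transformers GPT-2 checkpoints; swap in "gpt2-xl" to compare against a larger model):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")  # GPT2-small; try "gpt2-xl" too
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "When Mary and John went to the store, John gave a drink to"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # next-token logits

for name in [" Mary", " John"]:
    tid = tok(name).input_ids[0]
    print(repr(name), logits[tid].item())
```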
Thanks, the first diagram worked just as suggested: I have enough exposure to transformer internals that a few minutes of staring was enough to understand the algorithm. I'd always wondered why it is that GPT is so strangely good at repetition, and now it makes perfect sense.
I'm afraid that this take is incredibly confused, so much that it's hard to know where to start with correcting it.
Maybe the most consequential error is the misunderstanding of what "verify" means in this context. It means "checking a proof of a solution" (which in the case of a decision problem in NP would be a proof of a "yes" answer). In a non-mathematical context, you can loosely think of "proof" as consisting of reasoning, citations, etc.
That's what went wrong with the halting problem example. The generator did not support its claim that the program halts. If they respond to this complaint by giving us a proof that's too hard, we can (somewhat tautologically) ensure that our verifier's job is easy by sending back any program+proof pair where the proof was too hard to verify.
I think this can be somewhat clarified (and made less spooky) by observing that it's closely related to the concepts of kin selection and inclusive fitness (in evolutionary biology). It is in fact a good evolutionary strategy to be more cooperative when dealing with organisms that are closely related to you. The "Perfect deterministic twin prisoner’s dilemma" you propose is simply a special case of this where the organism you're dealing with is a clone.
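A minimal sketch of why the deterministic-twin case changes the answer (I'm assuming standard prisoner's dilemma payoffs here):

```python
# My payoff for (my_move, their_move) in a standard prisoner's dilemma.
payoff = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Against an arbitrary opponent, defecting dominates. Against a perfect
# deterministic copy, the copy's move always equals mine, so only the
# diagonal outcomes are reachable:
best = max("CD", key=lambda move: payoff[(move, move)])
print(best)  # 'C' -- cooperation wins against a clone
```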
I agree that VPT is (very plausibly) better at playing Minecraft than a trained cat would be, but to me that only demonstrates narrow intelligence (though, to be clear, farther along the spectrum of narrow-to-general than AI used to be). LLMs seem like the clearest demonstration of generality so far, for one thing because of their strength at few-shot and zero-shot, but their abilities are so qualitatively different from animal abilities that it's hard to compare.
A cat-sim sounds like a really interesting idea. In some ways it's actually unfair to the AI, because cats are benefiting from instincts that the AI wouldn't have, so if an AI did perform well at it, that would be very impressive.
This is interesting, but I'm a bit stuck on the claim that there is already cat-level AI (and more generally, AI matching various animals). In my experience with cats, they are fairly dumb, but they seem to have the sort of general intelligence we have, just a lot less. My intuition is that no AI has yet achieved that generality.
For example, some cats can, with great patience from the trainer, learn to recognize commands and perform tricks, much like dogs (but with the training difficulty being higher). VPT can't do that. In some sense, I'm not even sure what it would mean for VPT to be able to do that, since it doesn't interact with the world in that way.
I can't say I understand exactly what you're looking for here, but generally speaking there's not going to be one true underlying framework for computation. That's the point of Turing completeness: there are many different equivalent ways to express computation. This is the norm in math as well, e.g. with many different equivalent ways to define e, as well as in mathematical foundations, so the foundation you learn in school (for me it was ZFC, a set theory foundation) is not necessarily the same as you use for computer-checked formal proofs (e.g. Coq uses a type theory foundation).
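(For instance, two standard equivalent definitions of e:)

```latex
e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = \sum_{k=0}^{\infty} \frac{1}{k!}
```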
Here's something odd that I noticed in one of the examples in the blogpost (https://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html).
The question is the one that in part reads "the variance of the first n natural numbers is 10". The model's output states, without any reasoning, that this variance is equal to (n^2 - 1)/12, which is correct. Since no reasoning was used, I think it's safe to assume that the model memorized this formula.
This is not a formula that a random math student would be expected to have memorized. (Anecdotally, I have a mathematics degree and don't know it.) Because of that, I'd expect that a typical (human) solver would need to derive the formula on the spot. It also strikes me as the sort of knowledge that would be unlikely to matter outside a contest, exam, etc.
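For reference, the on-the-spot derivation (for the uniform distribution on 1, ..., n) is short but not trivial:

```latex
\mathrm{Var} = \frac{1}{n}\sum_{k=1}^{n} k^2 - \left(\frac{1}{n}\sum_{k=1}^{n} k\right)^2
= \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4}
= \frac{n^2 - 1}{12}.
```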
That all leads me to think that the model might be over-fitting somewhat to contest/exam/etc.-style questions. By that I mean that it might be memorizing facts that are useful when answering such questions but are not useful when doing math more broadly.
To be clear, there are other aspects of the model output, here and in other questions, that seem genuinely impressive in terms of reasoning ability. But the headline accuracy rate might be inflated by memorization.
Regarding the cost, I'd expect the road to AGI to deliver intermediate technologies that reduce the cost of writing provably secure code. In particular, I'd expect Copilot-like code generation systems to stay close to the leading edge of AI technology, if nothing else then because of their potential to deliver massive economic value.
Imagine some future version of Copilot that, in addition to generating code for you, also proves properties of the generated code. There might be reasons to do that beyond security: the requirement to provide specs and proofs in addition to code might make Copilot-like systems more consistent at generating correct programs.