Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI

post by titotal (lombertini) · 2023-05-26T15:57:43.620Z · LW · GW · 20 comments

comment by Razied · 2023-05-27T12:14:38.773Z · LW(p) · GW(p)

If you believe this, and you have not studied quantum chemistry, I invite you to consider how you could possibly be sure about this. This is a mathematical question. There is a hard, mathematical limit to the accuracy that can be achieved in finite time.

Doesn't the existence of AlphaFold basically invalidate this? The exact same problems you describe for band-gap computation exist for protein folding: the underlying true equations that need to be solved are monstrously complicated in both cases, and previous approximate models made by humans aren't that accurate in both cases... yet this didn't prevent AlphaFold from destroying previous attempts made by humans by just using a lot of protein structure data and the magic generalisation power of deep networks. This tells me that there's a lot of performance to be gained in clever approximations to quantum mechanical problems.

Replies from: DaemonicSigil, lombertini
comment by DaemonicSigil · 2023-05-28T02:55:20.683Z · LW(p) · GW(p)

I think there's a real sense in which the band gap problem is genuinely more quantum-mechanical in nature than the protein folding problem. It's very common that people will model proteins with a classical approximation, where you assume that eg. each bond has a specific level of stiffness, etc. (Often these values themselves are calculated using density functional theory.) But even given this classical approximation, many proteins take so long to settle into a folded configuration that simulating them is very expensive.

Also, last time I looked in any detail, the current version of AlphaFold did use multiple sequence alignment, which means that some of its utility comes from the fact that it's predicting evolved sequences, and so generalization to synthetic sequences might be iffy.

Replies from: Ilio
comment by Ilio · 2023-05-28T13:15:42.557Z · LW(p) · GW(p)

In that sense, you could say the two cases are exactly alike. For any classical computer:

-protein folding is intractable in general, so whatever natural selection found must constitute special cases that are tractable, and that's most probably what AlphaFold found. This was extraordinarily cool, but it doesn't mean AlphaFold solved protein folding in general. Even nature can get prions wrong.

-simulating quantum systems is intractable in general, but one can find special cases that are actually tractable, or where good approximations are all you need, and that's what occupies most of physicists' time.

In other words, you can expect a superintelligence to find marvelous pieces of science, or to kill everyone with classical guns, or to kill everyone with tech that looks like magic, but it won't actually break RSA, for the same reason it won't beat you at tic-tac-toe: superintelligences won't beat math.

comment by titotal (lombertini) · 2023-05-30T09:59:45.720Z · LW(p) · GW(p)

In a literal sense, of course it doesn't invalidate it. It just proves that the mathematical limit of accuracy was higher than we thought it was for the particular problem of protein folding. In general, you should not expect two different problems in two different domains to have the same difficulty without a good reason (such as their solving the same equation on the same scale). Note that AlphaFold is extremely impressive, but by no means perfect. We're talking accuracies of 90%, not 99.9%, similar to DFT. It is an open question how much better it can get.

However, the idea that machine learning techniques might push bandgap modelling further in the same way that AlphaFold did is a reasonable one. Currently, from my knowledge of the field, it's not looking likely, although of course that could change. At the last big conference I did see some impressive results for molecular dynamics, but not for atomic-scale modelling. The professors I have talked to have been fairly dismissive of the idea. I think there's definitely room for clever, modest improvements, but I don't think they would change the overall picture.

If I had to guess at the difference between the problems, I would say that the equations for protein folding were not "known" in quite the way the Schrödinger equation is. We know the exact equation that governs where an electron has to go, but the folding of proteins is an emergent property at a larger scale, so I assume the "rules" of folding had to be worked out semi-empirically using human heuristics, which are inherently easier to beat.

Replies from: DaemonicSigil
comment by DaemonicSigil · 2023-06-01T05:51:29.067Z · LW(p) · GW(p)

Do you have a name/link for that conference? I'd be interested in reading those molecular dynamics papers.

comment by Max H (Maxc) · 2023-05-26T17:14:11.029Z · LW(p) · GW(p)

Given reasonable computational time (say, a month), can the AI, using my chatlog alone, guess my password right on the first guess? 


"using my chatlog alone" appears to be doing a lot of work in this example. Human-built computer systems are notoriously bug-filled and exploitable, even by other humans. Why would an AI not also be capable of exploiting such vulnerabilities?[1]

Explorations of and arguments about limits of physical possibility based on computational physics and other scientific domains can lead to valuable research and interesting discussion, and I'm with you up until point (4) in your summary. But for forecasting the capabilities and actions of a truly smarter-than-you adversarial agent, it's important to confront the problem under the widest possible threat model, in the least convenient possible world and under the highest degree of difficulty. 

This post is a great example of the kind of object-level argument I gesture at in this recently-published post [LW · GW].  My point there is mainly: I think concrete, science-backed explorations of the limits of what is possible and tractable are great tools for world model building. But I find them pretty uncompelling when used as forecasts about how AGI takeover is likely to go, or as arguments for why such takeover is unlikely. I think an analogy to computer security is a good way of explaining this intuition. From another recent post [LW · GW] of mine:

Side-channels are ubiquitous attack vectors in the field of computer security and cryptography. Timing attacks and other side-effect based attacks can render cryptographic algorithms which are provably secure under certain threat models, completely insecure when implemented on real hardware, because the vulnerabilities are at lower levels of abstraction than those considered in the threat model.
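
To make the timing-attack point concrete, here is a minimal sketch (my own toy example, not from the quoted post): an early-exit string comparison leaks information through how long it takes to return, while the constant-time variant does not.

```python
import hmac

# Toy illustration of a timing side-channel: the naive comparison returns as
# soon as it hits a mismatch, so response time leaks how many leading
# characters of a guess are correct, even though the function is "correct".
def insecure_equals(secret: str, guess: str) -> bool:
    if len(secret) != len(guess):
        return False
    for a, b in zip(secret, guess):
        if a != b:          # early exit: the source of the timing leak
            return False
    return True

# The standard fix examines every byte regardless of where mismatches occur.
def secure_equals(secret: str, guess: str) -> bool:
    return hmac.compare_digest(secret.encode(), guess.encode())
```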
 

Proving that something is computationally intractable under a certain restricted model only means that the AI must find a way to step outside of your model, or do something else you didn't think of.

 

  1. ^

     Many vulnerabilities are only discoverable by humans when those humans have access to the source code, or at least the binaries, of the targeted system. But this also doesn't seem like a fatal problem for the AI: even if the exact source code for the system the AI is running on, and/or the code for the system protecting the password, does not appear in the AI's training data, source code for many similar systems likely does.

Replies from: lombertini
comment by titotal (lombertini) · 2023-05-30T10:27:21.298Z · LW(p) · GW(p)

I do agree that trying to hack the password is a smarter method for the AI to try. I was simply giving an example of a task that an AI would want to do but be unable to, due to computational intractability.

I chose the example of Yudkowsky's plan for my analysis because he has described it as his "lower bound" plan. After spending two decades on AI safety, talking to all the most brilliant minds in the field, this is apparently what he thinks the most convincing plan for AI takeover is. If I believe this plan is intractable (and I very much believe it is), then it opens up the possibility that all such plans are intractable. And if you do find a tractable plan, then making that plan intractable would be an invaluable AI safety cause area.

Proving that something is computationally intractable under a certain restricted model only means that the AI must find a way to step outside of your model, or do something else you didn't think of.

Imagine if I made the claim that a freshly started AGI in a box could kill everyone on earth in under a minute. I propose that it creates some sort of gamma ray burst that hits everyone on earth simultaneously. You come back to me with a detailed proof that that plan is bonkers and wouldn't work. I then respond "sure, that wouldn't work, but the AI is way smarter than me, so it would figure something else out".

My point is that, factually, some tasks are impossible. My belief is that a computationally tractable plan guaranteed to succeed at causing human extinction does not currently exist, although I think a plan with something like a 0.01% chance of success might. If you think otherwise, you have to actually prove it, not just assume it.

Replies from: Maxc
comment by Max H (Maxc) · 2023-05-30T22:17:29.711Z · LW(p) · GW(p)

Well, "opens up the possibility that all such plans are intractable" is a much weaker claim than "impossible", and I disagree about the concrete difficulty of at least one of the step in your plan: there are known toxins [LW(p) · GW(p)] with ~100% lethality to humans in nature.

Distributing this toxin via a virus engineered using known techniques from GoF research and some nanotechnology for a timer seems pretty tractable, and close enough to 100% lethal to me.

The tech to build a timer circuit out of RNA and ATP instead of in silicon and electricity doesn't exist yet, AFAIK, but a timer meeting such complexity, size, and energy constraints is certainly tractable to design at nanoscale in silicon. Moving to a biological substrate might be hard, but knowing a bit about what hardware engineers are capable of doing with silicon, often with extremely limited energy budgets, it certainly doesn't seem intractable for human designers, let alone for an ASI, to do similar things with biology.
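
As a rough illustration of how little logic a digital countdown timer actually needs (back-of-the-envelope numbers of my own, purely illustrative):

```python
import math

# How many bits of state does a simple countdown timer need?
# Count clock ticks for the full delay, then fire. Illustrative numbers only.
def counter_bits(delay_seconds: float, clock_hz: float) -> int:
    ticks = delay_seconds * clock_hz
    return math.ceil(math.log2(ticks))

seconds_per_day = 86_400
# e.g. a 30-day delay clocked by a 32.768 kHz watch-crystal oscillator
print(counter_bits(30 * seconds_per_day, 32_768))  # -> 37 bits of state
```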

So I'm a bit skeptical of your estimate of the other steps as "probably incomputable"!

Also, a more general point: you've used "incomputable" throughout, in what appears to be an informal way of saying "computationally intractable".

In computational complexity theory, "uncomputable", "undecidable", "NP-complete", and Big-O notation have very precise technical meanings: they are statements about the limiting behavior of particular classes of problems. They don't necessarily imply anything about particular concrete instances of such problems.

So it's not just that there are good approximations for solving the traveling salesman problem in general or probabilistically, which you correctly note.

It's that, for any particular instance of the traveling salesman problem (or any other NP-hard problem), approximating or solving that particular instance may be tractable or even trivial, for example, by applying a specialized algorithm, or because the particular instance of the problem you need to solve has exploitable regularities or is otherwise degenerate in some way.
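
To make the "particular instance" point concrete, a minimal sketch (my own toy example, not from the thread): cities placed on a circle form a TSP instance with an exploitable regularity, and sorting by angle already gives the optimal tour.

```python
import itertools
import math
import random

# TSP is NP-hard in general, but this *instance* has exploitable structure:
# for cities in convex position (here, on a circle), the optimal tour simply
# visits them in angular order, so it is solvable in O(n log n).
random.seed(0)
cities = [(math.cos(a), math.sin(a))
          for a in (random.uniform(0, 2 * math.pi) for _ in range(8))]
shuffled = random.sample(cities, len(cities))

def tour_length(order):
    return sum(math.dist(order[i], order[(i + 1) % len(order)])
               for i in range(len(order)))

angular = sorted(shuffled, key=lambda p: math.atan2(p[1], p[0]))
brute = min(itertools.permutations(shuffled), key=tour_length)  # exhaustive check
assert abs(tour_length(angular) - tour_length(brute)) < 1e-9
```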

The same is true of e.g. the halting problem, which is provably undecidable in general! And yet, many programs that we care about can be proved to halt, or proved not to halt, in very reasonable amounts of time, often trivially by running them, or by inspection of their source. In fact, for a given randomly chosen program (under certain sampling assumptions), it is overwhelmingly likely that whether it halts or not is decidable. See the reference in this footnote [LW(p) · GW(p)] for more.
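
And a toy illustration of "often trivially by running them" (again my own example): whether these specific computations halt is settled empirically in milliseconds, even though the general question is undecidable.

```python
# Halting is undecidable in general, but for these specific inputs the
# question is settled just by running the program and watching it stop.
def collatz_steps(n: int, max_steps: int = 10_000):
    """Steps for the Collatz iteration to reach 1, or None if the bound is hit."""
    steps = 0
    while n != 1 and steps < max_steps:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps if n == 1 else None

# Every starting value below 10,000 demonstrably halts, well inside the bound.
assert all(collatz_steps(n) is not None for n in range(1, 10_000))
```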


The point of all of this is that I think saying something is "probably incomputable" is just too imprecise and informal to be useful as a bound on the capabilities of a superintelligence (or even on human designers, for that matter), and trying to make the argument more precise probably causes it to break down, or requires formulating the problem in a domain where results from computational complexity theory are simply not applicable.

comment by Ilio · 2023-05-26T17:54:26.441Z · LW(p) · GW(p)

Shorter title: Stop confusing intelligence for mana points! 🧙‍♀️

Replies from: Ilio
comment by Ilio · 2023-05-27T13:28:03.864Z · LW(p) · GW(p)

Longer title: …or continue and make sure you downvote wake-up calls.

🙈

comment by quetzal_rainbow · 2023-05-27T20:30:18.594Z · LW(p) · GW(p)

The main problem here is:

I guess that step 4 is probably incomputable. The human body is far, far too complex to model exactly, and you have to consider the effect of your weapon on every single variation on the human body, including their environment, etc, ensuring 100% success rate on everyone. I would guess that this is too much variation to effectively search through from first principles. 

You don't need to do any fancy computations to kill everyone if you've come far enough to have nanotech. You just use your nanotech to emulate good old biology and synthesize the well-known botulinum toxin in the bloodstream; death rate 100%.

comment by RussellThor · 2023-05-26T23:27:19.982Z · LW(p) · GW(p)

I agree that the extreme one-shot plan outlined by Yud and discussed here isn't likely. 

However, it's likely that we make things a lot easier for an AI, for example with autonomous weapons in a wartime situation. If the AI is already responsible for controlling the weapons systems (drones with guns etc. are far superior to soldiers) and making sure the factories are running at max efficiency, then far less calculation and creativity is needed for AI takeover.

IMO, since I think a slow takeoff is most likely, robots and autonomous weapons systems increase takeover risk a lot. For this reason I am now much less convinced that a pause in AI capabilities is a good thing. I would rather have a superintelligence arrive in a world without these things, i.e. now rather than later.

To put this plainly: if we were offered a possible future where, over the course of the next 1-2 years, we learned (and deployed) everything there was to know about intelligence and mind algorithms to exploit our hardware to maximum efficiency, but there were no hardware improvements, I would be tempted to take it over the alternative. A plausible alternative is, of course, that it takes 5-10 years to get such algorithms right, and this happens with a large overhang and sudden capability gains into a world with neuromorphic chips, robots everywhere, and perhaps an ongoing autonomous-weapons war.

comment by Charlie Steiner · 2023-05-27T00:22:32.517Z · LW(p) · GW(p)

Do you think that human theorists are near the limit of what kind of approximations we should use to calculate the band structure of diamond (and therefore a superintelligent AI couldn't outsmart human theorists by doing their job better)? Like if you left physics to stew for a century and came back, we'd still be using the GW approximation?

This seems unlikely to me, but I don't really know much about DFT (I was an experimentalist). Maybe there are so few dials to turn that picking the best approximation for diamond is an easy game. Intuitively I'd expect that if a clever theorist knew that they were trying to just predict the band structure of diamond (but didn't know the answer ahead of time), there are bespoke things they could do to try to get a better answer (abstract reasoning about what factors are important, trying to integrate DFT and a tight binding model, something something electron phonon interactions), and that is effectively equivalent to an efficient approximation that beats DFT+GWA.
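
For readers who haven't seen a tight-binding model, here is a minimal sketch of the kind of cheap approximation being gestured at (a 1D dimerized chain with made-up hopping parameters; nothing to do with diamond, DFT, or GW):

```python
import numpy as np

# Minimal tight-binding sketch: a 1D chain with alternating hoppings t1, t2
# (the SSH model). Diagonalizing the 2x2 Bloch Hamiltonian over k gives two
# bands separated by a gap of 2*|t1 - t2|. Toy parameters, illustrative only.
t1, t2 = 1.0, 0.6
ks = np.linspace(-np.pi, np.pi, 401)

def bands(k):
    h = np.array([[0.0, t1 + t2 * np.exp(-1j * k)],
                  [t1 + t2 * np.exp(1j * k), 0.0]])
    return np.linalg.eigvalsh(h)  # real eigenvalues, ascending order

energies = np.array([bands(k) for k in ks])        # shape (401, 2)
gap = energies[:, 1].min() - energies[:, 0].max()
print(f"band gap ~ {gap:.3f}  (analytic: {2 * abs(t1 - t2):.3f})")
```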

Definitely we're still making progress for more interesting materials (e.g. cuprates) - or at least people are still arguing. So even if we really can't do better than what we have now for diamond, we should still expect a superintelligent AI to be better at numerical modeling for lots of cases of interest.

comment by Frank_R · 2023-05-28T10:52:37.221Z · LW(p) · GW(p)

Arguments like yours are the reason why I do not think that Yudkowsky's scenario is overwhelmingly likely (P > 50%). However, this does not mean that existential risk from AGI is low. Since smart people like Terence Tao exist, you cannot prove with complexity theory that no AGI with the intelligence of Terence Tao can be built. Imagine a world where everyone has one or several AI assistants whose capabilities match those of the best human experts. If the AI assistants are deceptive and able to coordinate, something like slow disempowerment of humankind followed by extinction is possible. Since there is a huge economic incentive to use AI assistants, it is hard for humans to take coordinated action unless it is very obvious that the AIs are dangerous. On the other hand, it may be easy for the AIs to coordinate, since many of them are copies of each other.

comment by James_Miller · 2023-05-27T01:48:42.848Z · LW(p) · GW(p)

"But make no mistake, this is the math that the universe is doing."

"There is no law of the universe that states that tasks must be computable in practical time."

Don't these sentences contradict each other?

Replies from: DaemonicSigil
comment by DaemonicSigil · 2023-05-27T05:57:09.038Z · LW(p) · GW(p)

Replace "computable in practical time" with "computable on a classical computer in practical time" and it makes sense.

comment by rotatingpaguro · 2023-05-27T10:57:37.021Z · LW(p) · GW(p)

 > "guess  a password on 1st try"

In my life, I have tried to guess a password O(10) times. I succeeded on the first try in two cases. This would seem to make it more feasible than you think.

Here there are two selection effects working against my argument:

  1. I won't even try to guess a password if I don't think I have a chance for some reason;
  2. I'm answering to this point because I'm someone who pulled this off, while people who never happened to guess a password on the first try will stay silent.

However, selection plays in favor of the hypothetical AI too: maybe you are confident you picked your password in a way that makes it unpredictable from public information, but there are other people who are not like that. Overall, on the question "Could it happen at least once that an important password was chosen in a way that made it predictable to an ASI, even assuming the ASI is truly constrained in a box?", I don't feel confident either way right now.
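
For what it's worth, the "predictable from public information" failure mode is easy to make concrete (toy sketch with a made-up chat log and an unsalted hash, which no real system should use):

```python
import hashlib
import itertools

# Toy sketch: candidate passwords built from words that appear in a
# (hypothetical) chat log plus a few common manglings, checked against a
# stored hash. Real systems use salted, slow hashes; this is illustrative only.
chatlog_words = ["rex", "everest", "1987"]   # made-up personal details
suffixes = ["", "!", "123", "1987"]
stored_hash = hashlib.sha256(b"Everest1987").hexdigest()

candidates = (w.capitalize() + s
              for w, s in itertools.product(chatlog_words, suffixes))
hit = next((c for c in candidates
            if hashlib.sha256(c.encode()).hexdigest() == stored_hash), None)
print(hit)  # -> "Everest1987"
```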

comment by simon · 2023-05-26T17:40:56.302Z · LW(p) · GW(p)

If I play chess against Magnus Carlsen, I don't expect him to play a mathematically perfect game, but I still expect him to win.

Also:

There's a reason takeover plans tend to rely on secrecy.

Currently speculation tends to be biased towards secrecy-based plans, I think, because such plans are less dependent on the unique details of the factual context that an AI would be facing than are plans based around trying to manipulate humans.