Posts

Brief analysis of OP Technical AI Safety Funding 2024-10-25T19:37:41.674Z
[Linkpost] Michael Nielsen remarks on 'Oppenheimer' 2023-08-31T15:46:06.345Z

Comments

Comment by 22tom (thomas-barnes) on AI Fire Alarm Scenarios · 2024-10-29T17:37:13.437Z · LW · GW

I stumbled upon this post less than 3 years later and - wow, it seems like we're very close to some fire alarms!

Expert Opinion

The world gradually notices that each of these new leaders express at least as much concern about the risks of AI as prior researchers, and that they increasingly agree that AI will pass human levels sometime in the 2030s.

Would Hinton and Bengio tick this box?

By the late 2020s, there's a clear enough consensus among the recognized AI experts that no major news organization is willing to deny that AI represents an important source of near-term risk.

I don't think we're quite at the point where no major news organization is willing to deny AI risk - but the CAIS statement was signed by a bunch of industry leaders + endorsed by leading politicians, including the UK PM. Even if there isn't full consensus on AI x-risk, my guess is that the people "willing to deny that AI represents an important source of near-term risk" are now in a small minority.

Venture Capital

I find it easy to imagine that a couple of years before we get AGI, we'll see VCs making multi-billion dollar investments in brand new AI companies, and large tech companies occasionally acquiring them for tens of billions.

SSI raised $1 billion with basically no product, whilst Inflection (valued at ~$4bn) was (de facto) acquired by Microsoft. So this hasn't quite happened yet, but we're pretty close!

Corporate Budgets

Companies such as DeepMind and OpenAI might throw around compute budgets of, say, $100 billion in one year.

OpenAI/Microsoft say they'll spend $100bn on a US data center. DeepMind also says they'll spend $100bn on AI (though not exclusively compute). Admittedly these are multi-year projects.

Military Arms Race / Project Apollo Analogy

Military leaders become convinced that human-level AI is 5-10 years away, given a Manhattan Project-like effort, and that it will give an important military advantage to any country that achieves it.

This likely involves widespread expert agreement that throwing lots of compute at the problem will be an important factor in succeeding.

...

[The Apollo scenario] resembles the prior one, but leading nations are more clearly peaceful, and they're more optimistic that they will be able to copy (or otherwise benefit from?) the first AI.

A leading nation (let's say Canada) makes a massive AI project that hires the best and the brightest, and devotes more money to compute than could reasonably be expected from any other organization that the best and brightest would be willing to work for.

We're definitely not at this point yet for either of these scenarios. However, the recent NSM signals a trend in this direction (e.g. "It is the policy of the United States Government to enhance innovation and competition by bolstering key drivers of AI progress, such as technical talent and computational power.").

The Wright Brothers Analogy

This one is pretty hard to assess. It essentially describes a world where it's very difficult to tell when "AGI" is achieved, given challenges with actually evaluating whether AI models are human-level. Which feels like where we are today? This line feels particularly accurate:

Thus there will be a significant period when one or more AI's are able to exceed human abilities in a moderate number of fields, and that number of fields is growing at a moderate and hard to measure pace.

I expect this scenario would produce massive confusion about whether to be alarmed.

Worms

Stuxnet-like attacks could become more common if software can productively rewrite itself

AI-enabled cyber attacks are now widely discussed by governments (e.g. here), though it's unclear how much this is already happening vs predicted to happen in future.

Most of the rest - Conspicuously Malicious AI Assistants, Free Guy, Age of Em, AI Politicians, AI's Manipulate Politics - have not happened. No warning shot here yet.
 

Finally,

Turing Test Alarm, Snooze Button Edition

GPT-7 passes a Turing test, but isn't smart enough to do anything dangerous. People interpret that as evidence that AI is safe.

This feels pretty much like where we're at, only with GPT-4.

--

Overall, my sense is there are clear "warning shots of warning shots" - we have clear signs that warning scenarios are not far off at all (and arguably some have already happened). However, the connection between warning sign and response feels pretty random (with maybe the exception of the FLI letter + CAIS statement + Hinton's resignation triggering a lot more attention to safety during the Spring of 2023). 

I'm not sure what to conclude from this. Maybe it's "plan for relatively mundane scenarios", since these are the ones with the clearest evidence of triggering a response.

Comment by 22tom (thomas-barnes) on A Narrow Path: a plan to deal with AI extinction risk · 2024-10-07T19:38:14.150Z · LW · GW

TL;DR 

From a skim, there are many claims here that I agree with / sympathise with.

That said, I also want to make sure this piece stands up to the epistemic "sniff" test. The gist of the piece seems to operate around Simulacrum Levels 2 / 3  ["Choose what to say based on what your statement will cause other people to do or believe"  / "Say things that signal membership to your ingroup."].[1] 

From a quick epistemic spot check of just the intro, I'd say that half the claims are accurate on SL1. My guess is this is pretty standard for a lot of advocacy-focused writing, but lower than most LW writing.

--

Below is a short epistemic spot check of the (non-normative) claims in the introduction, to see whether this piece stands up well on Simulacrum Level 1 ["Attempt to describe the world accurately"]. I use emojis to capture whether each claim is backed by some reasoning or a reliable source:

  • ✅ = The claim attempts to accurately describe the world (through evidence or reasoning)
  • ❌ = The claim does not attempt to accurately describe the world (e.g. through a lack of evidence, poor evidence, or the misrepresentation of evidence)
  • ❔ = Ambiguous

From the top:

(1) ❌

There is a simple truth - humanity’s extinction is possible. Recent history has also shown us another truth - we can create artificial intelligence (AI) that can rival humanity.1

The footnote says "While there are many such metrics, one useful introductory roundup for those less familiar is at I Gave ChatGPT an IQ Test. Here's What I Discovered | Scientific American". The source linked describes someone impressed by ChatGPT's abilities [in March 2023], giving it an IQ of 155. This source (a) is an unusual choice for measuring frontier AI capabilities, and [more importantly] (b) does not support the claim "recent history shows we can create an AI that can rival humanity".

[Note - I think this claim is likely true, but it's not defensible from this source alone]

(2) ✅

Companies across the globe are investing to create artificial superintelligence – that they believe will surpass the collective capabilities of all humans. They publicly state that it is not a matter of “if” such artificial superintelligence might exist, but “when”.2 

The footnote links to these two sources:

  • The first is from the Chief of Staff at Anthropic. They state "For the same reasons I expect us to reach AGI, I expect it to progress beyond this point, to where we have “superhuman” systems."
  • The second is announcing superalignment fast grants. They state "We believe superintelligence could arrive within the next 10 years."

(3) ❌

Reasonable estimates by both private AI companies and independent third parties indicate that they believe it could cost only tens to hundreds of billions of dollars to create artificial superintelligence. 

No source is given, and it's not clear which "reasonable" estimates they are referring to. Cotra's bio anchors report says that companies might be willing to spend ~$100bn to create AGI. But crucially, $100bn today might not buy you enough compute / capabilities.

[Again, I think this claim could be true, but there's no source and "reasonable" allows for too much slippage]

(4) ✅

[Catastrophic and extinction] risks have been acknowledged by world3 leaders4, leading scientists and AI industry leaders567, and analyzed by other researchers, including the recent Gladstone Report commissioned by the US Department of State8 and various reports by the Center for AI Safety and the Future of Life Institute.910

Footnotes 3 - 10 aim to support the claim of consensus on AI x-risks. Looking at each in turn:

  • 3 and 4 are from Rishi Sunak [former UK PM] and President von der Leyen. The former said "In the most unlikely but extreme cases, there is even the risk that humanity could lose control of AI completely…". The latter quotes the CAIS statement. So they do both acknowledge the risk. However, there are of course many who have not.
  • 5 and 6 are the CAIS and FLI letters. The CAIS statement definitely has leading scientists & AI industry leaders acknowledging the risks.
  • 7 is from Sam Altman: "Development of superhuman machine intelligence (SMI) is probably the greatest threat to the continued existence of humanity"
  • 8 is the Gladstone report, which definitely acknowledges the risks, but is a long way from "The US government recognizes AI x-risk"
  • 9 and 10 are overviews of AI x-risk.

Overall I would say these mostly support the claim.

From then on, a lot more claims (in "The Problem" and "The Solution") are made without support. I think this is forgivable if they're backed up in later parts of the report. At some future date, I might go through Phases 0, 1 and 2 (or someone else is very welcome to have a stab).

  1. ^

    (To give some benefit of the doubt, I'll add that (a) this piece feels lower on the Simulacrum-o-meter than Situational Awareness, (b) this piece is on about the same Simulacrum level as other AI policy debate pieces, and (c) its level is unsurprisingly high given an intention to persuade rather than inform. My reason for scrutinizing this is not that it's poor - I just happened to be sufficiently motivated / nerdsniped at the time of reading it.)

Comment by 22tom (thomas-barnes) on The Worst Form Of Government (Except For Everything Else We've Tried) · 2024-03-18T22:12:35.856Z · LW · GW

I think this is similar to the governance arrangement in Northern Ireland that ended the Troubles (for the most part). Both sides need to share power in order to govern. If one side is perceived to go too far, the other can resign from government, effectively vetoing it.

Comment by 22tom (thomas-barnes) on Consider Joining the UK Foundation Model Taskforce · 2023-07-10T14:00:58.265Z · LW · GW

Also, for those eligible to work in the UK, consider applying to work in the taskforce here (deadline tomorrow!).