Thoughts on the Feasibility of Prosaic AGI Alignment?
post by iamthouthouarti
I’d like to preface by saying that I am not an expert on AI by any means, nor am I remotely involved with any kind of research or studies relevant to ML. I have no insight regarding any of the technical or mathematical aspects of discussions about this technology, and only deal in abstracts.
If you’re still reading this:
Let’s assume two things: (A) that the scaling hypothesis will continue to provide real-world empirical evidence that it’s a plausible approach to AGI (such as with GPT), and (B) that bigger, better-funded institutions (such as DeepMind, Google Brain, and Microsoft AI) will shift focus from building an AGI that results from, or reveals, something new about intelligence to adopting OpenAI’s strategy of simply throwing more compute and hardware at the problem to get results (something they actually have the resources to do in an uncomfortably short timeframe).
Whatever you believe (https://www.lesswrong.com/posts/N6vZEnCn6A95Xn39p/are-we-in-an-ai-overhang?commentId=jbD8siv7GMWxRro43 [LW(p) · GW(p)]) to be the actual likelihood of (B), please just humor me for the sake of discussion.
If you consider both assumptions (A) and (B) to be true with high probability, then you’re ultimately conceding that a prosaic AGI is the kind we’re most likely to build. This is discounting the unfortunately less-likely (imo) possibility that another, fundamentally different approach will succeed first.
I say “unfortunately” because, by my understanding, most approaches towards AGI alignment (take MIRI as an example) aren’t relevant to the alignment of a prosaic AGI.
That’s not to say that there aren’t approaches towards this issue, because there are (https://www.lesswrong.com/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai [LW · GW]). The problem is that these proposals have caveats that make institutions that I hold in very high regard (MIRI) consider these approaches to be almost certainly impossible. [(https://www.lesswrong.com/posts/Djs38EWYZG8o7JMWY/paul-s-research-agenda-faq [LW · GW]), (https://www.lesswrong.com/posts/S7csET9CgBtpi7sCh/challenges-to-christiano-s-capability-amplification-proposal [LW · GW])]
But regardless, there is still some debate over whether Yudkowsky’s objections to the current proposals really amount to a knock-down argument for their irreducible impossibility. (https://www.lesswrong.com/posts/3nDR23ksSQJ98WNDm/developmental-stages-of-gpts [LW · GW])
So, I ask, what are your personal takes? Is prosaic alignment almost certainly impossible, or is there a non-negligible amount of hope by your own intuition or evidence?
Comments sorted by top scores.
comment by rohinmshah ·
2020-08-22T17:33:20.530Z · LW(p) · GW(p)
A few people (including me) have optimistic (relative to MIRI) takes here [? · GW].
Replies from: iamthouthouarti, iamthouthouarti
↑ comment by iamthouthouarti ·
2020-09-01T23:56:16.236Z · LW(p) · GW(p)
I’m a little confused though. I’m aware of Yudkowsky’s misgivings regarding the possible failings of prosaic AGI alignment, but I’m not sure where he states it to be borderline impossible or worse. Also, when you refer to MIRI being highly pessimistic of prosaic AGI alignment, are you referring to the organization as a whole, or a few key members?
I also don’t understand why this disparity of projections exists. Is there a more implicit part of the argument that neither party (Paul Christiano and MIRI) has publicly addressed?
EDIT: Is the argument more so that it isn't currently possible due to a lack of understanding regarding what corrigibility even is, without entertaining how possible it might be some years down the line?
Replies from: rohinmshah
↑ comment by rohinmshah ·
2020-09-02T23:38:35.027Z · LW(p) · GW(p)
I’m not sure where he states it to be borderline impossible or worse.
Here's a recent comment [LW(p) · GW(p)], which doesn't exactly say that but seems pretty close.
When you refer to MIRI being highly pessimistic of prosaic AGI alignment, are you referring to the organization as a whole, or a few key members?
I don't know -- people at MIRI don't say much about their views; I'm generally responding to a stereotyped caricature of what people associate with MIRI because I don't have any better model. (You can see some more discussion about this "MIRI viewpoint" here [LW(p) · GW(p)].) I've heard from other people that these viewpoints should be most associated with Nate, Eliezer and Benya, but I haven't verified this myself.
I also don’t understand why this disparity of projections exists. Is there a more implicit part of the argument that neither party (Paul Christiano and MIRI) has addressed?
I don't know. To my knowledge the "doom" camp hasn't really responded to the points raised, though here [LW · GW] is a notable exception.
Replies from: iamthouthouarti
↑ comment by iamthouthouarti ·
2020-09-03T21:49:09.612Z · LW(p) · GW(p)
The most glaring argument that I could see raised against Christiano’s IDA is that it assumes a functioning AGI would already be developed before measures are taken to make it corrigible. At the same time though, that argument may very well be due to misunderstanding on my part. It’s also possible that MIRI would prefer that the field prioritize over seemingly preparing for non-FOOM scenarios. But I don’t understand how it couldn’t “possibly, possibly, possibly work”.
comment by Charlie Steiner ·
2020-08-22T02:04:46.804Z · LW(p) · GW(p)
I think it's absolutely feasible, but my idea of what a solution looks like is probably in a minority (if I had to guess, maybe of ~30%?)
All you have to do is understand what it is you mean by the AI fulfilling human values, in a way that can be implemented in the architecture and training procedure of a prosaic AI. Easy peasy, lemon squeezy.
The majority of other feasible-ers is mostly dominated by Paulians right now, who want to solve the problem without having to understand that complicated human values thing. Typically by trusting in humans and giving them big awesome planning powers, or using their oversight and feedback to choose good things.
Replies from: iamthouthouarti
comment by Razied ·
2020-08-22T23:54:14.968Z · LW(p) · GW(p)
My estimate for the most likely Good Path is something like the following:
1- build a superhuman-level GPT-N
2- enforce absolute secrecy and very heavily restrict access to the model.
3- patch obvious security holes.
4- make it model future progress in AI safety by asking it to predict the contents of highly cited papers from the 2050s.
5- rigorously vet and prove the contents of those papers
6- build safe AGI from those papers
Replies from: iamthouthouarti
↑ comment by iamthouthouarti ·
2020-08-23T17:45:55.546Z · LW(p) · GW(p)
Interesting, but what probability do you assign to this chain of events? Likewise, what probability would you assign to the advent of transformative AI (AGI) being prosaic, as in it's achieved by scaling existing architectures with more compute and better hardware?
Replies from: Razied
↑ comment by Razied ·
2020-08-23T18:31:51.017Z · LW(p) · GW(p)
I am not sure at all about a specific probability for this exact chain of events. I think the secrecy part is quite likely (90%) to happen once a lab actually gets something human-level, no matter their commitment to openness; I think seeing their model become truly human-level would scare the shit out of them. Patching obvious security holes also seems 90% likely to me; even Yann LeCun would do that. The real uncertainties come from whether the lab would try to use the model to solve AI safety, or whether they would think their security patches are enough and push for monetizing the model directly. I'm pretty sure DeepMind and OpenAI would do something like that; I'm unsure about the others.
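As a rough illustration of the arithmetic (a sketch, not a forecast): if each stated number is read as conditional on the earlier steps having happened, the probability of a chain is the product of its steps. Only the two 90% figures come from the comment above; the function names, the conditional reading, and the 0.5 in the usage note are assumptions made purely for illustration.

```python
from math import prod

# The two figures given above, read as conditional on the preceding steps
# (that reading is an assumption of this sketch).
stated_steps = {
    "secrecy once the model is human-level": 0.90,
    "patching obvious security holes": 0.90,
}


def chain_probability(step_probs):
    """Multiply conditional step probabilities into one chain probability."""
    return prod(step_probs)


# No number is given for the later steps (using the model for safety work,
# vetting the papers, building safe AGI), so this is only an upper bound.
upper_bound = chain_probability(stated_steps.values())


def full_chain(p_remaining):
    """Chain probability for a hypothetical combined probability of the
    steps that were left without a number."""
    return upper_bound * p_remaining


print(f"upper bound from the stated steps: {upper_bound:.2f}")
```

For example, a purely hypothetical 0.5 for the remaining steps would give `full_chain(0.5)`, roughly 0.4. The point is only that the two confident steps already cap the chain at about 0.81, and that the unquantified steps carry most of the uncertainty.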
Regarding the probability of transformative AI being prosaic, I'm thinking 80%. GPT-3 has basically guaranteed that we will explore that particular approach as far as it can go. When I look at all the ways I can think of to make GPT better (training it faster, merging image and video understanding into it, giving it access to true metadata for each example, longer context length, etc.), I see just how easy it is to improve it.
I am completely unsure about timelines. I have a small project going on where I'll try to get a timeline probability estimate from estimates of the following factors:
cheapness of compute (including next generation computing possibilities)
data growth: text, video, images, games, VR interaction
investment rate (application vs leading research)
Response of investment rate to increased progress
Response of compute availability to investment
researcher numbers as a function of increased progress
different approaches that could lead to AGI (simulation: Minecraft style; joint text comprehension with image and video understanding; generative stuff?)
level of compute required for AGI
effect of compute availability on speed of algorithm discovery (architecture search)
discovery of new model architectures
discovery of new training algorithms
discovery of new approaches (like GANs, AlphaZero, etc.)
switch to secrecy and impact on speed of progress
impact of safety concerns on speed