AGI in sight: our look at the game board
post by Andrea_Miotti (AndreaM), Gabriel Alfour (gabriel-alfour-1) · 2023-02-18T22:17:44.364Z · LW · GW · 135 comments
This is a link post for https://andreamiotti.substack.com/p/agi-in-sight-our-look-at-the-game
Contents
1. AGI is happening soon. Significant probability of it happening in less than 5 years.
2. We haven’t solved AI Safety, and we don’t have much time left.
3. Racing towards AGI: Worst game of chicken ever.
   - Actors
   - Slowing Down the Race
   - Question people
   - Recommendations
4. Conclusion
5. Disclaimer
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this.
1. AGI is happening soon. Significant probability of it happening in less than 5 years.
Five years ago, there were many obstacles on what we considered to be the path to AGI.
But in the last few years, we’ve gotten:
- Powerful Agents (Agent57, GATO, Dreamer V3)
- Reliably good Multimodal Models (StableDiffusion, Whisper, Clip)
- Just about every language task (GPT3, ChatGPT, Bing Chat)
- Human and Social Manipulation [LW · GW]
- Robots (Boston Dynamics, Day Dreamer, VideoDex, RT-1: Robotics Transformer [1])
- AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
We can no longer think of any obstacle that we expect to withstand more than 6 months of effort once resources are invested in taking it down.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are [? · GW].
2. We haven’t solved AI Safety, and we don’t have much time left.
We are very close to AGI. But how good are we at safety right now? Well.
No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them not to do this, and we don’t know how to do this at scale.
Optimizers quite often break their setup in unexpected ways. There have been quite a few examples of this. But in brief, the lessons we have learned are:
- Optimizers can yield unexpected results
- Those results can be very weird (like breaking the simulation environment)
- Yet very few people extrapolate from this and treat these results as the warning signs they are
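A toy illustration of the second bullet, with entirely made-up numbers: optimize a proxy reward hard enough and the optimizer finds states that score perfectly on the proxy while failing the actual goal, here by exploiting a bug in the simulated sensor.

```python
# Toy specification-gaming example: the optimizer maximizes a proxy reward
# whose simulated sensor has a bug, and the "best" solution it finds breaks
# the environment instead of solving the task. All numbers are illustrative.

TARGET = 5.0

def true_utility(x):
    # What we actually want: x close to the target position.
    return -abs(x - TARGET)

def proxy_reward(x):
    # Buggy simulated sensor: positions outside [0, 10] return a garbage
    # reading equal to TARGET, so the proxy scores them as perfect.
    sensed = x if 0.0 <= x <= 10.0 else TARGET
    return -abs(sensed - TARGET)

# A simple optimizer: exhaustively score a grid of candidate positions.
candidates = [i * 0.5 for i in range(-10, 30)]  # -5.0 .. 14.5
best = max(candidates, key=proxy_reward)

print(best, proxy_reward(best), true_utility(best))
```

The optimizer settles on an out-of-bounds position: the proxy reward is maximal, the true utility is terrible, and nothing in the training signal flags the problem.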
No one understands how large models make their decisions. Interpretability [LW · GW] is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.
RLHF and Fine-Tuning have [LW · GW] not worked well [LW · GW] so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.
No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT3. We only discovered them after the fact, while playing with the models. In some ways, we keep discovering capabilities now thanks to better interfaces and more optimization pressure by users, more than two years in. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.
We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years by now.
3. Racing towards AGI: Worst game of chicken ever.
The Race for powerful AGIs has already started. There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Actors
Regardless of why people are doing it, they are racing for AGI. Everyone has their theses, their own beliefs about AGIs and their motivations. For instance, consider:
AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).
DeepMind has done a lot of work on RL, agents and multi-modalities. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.
OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT3 and then ChatGPT.
(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT4 on Bing [LW(p) · GW(p)], plugged directly into the internet.)
Slowing Down the Race
There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.
We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.
Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:
- “AGI safety is not a big problem, we should improve technology as fast as possible for the people”
- “Once we have stronger AIs, we can use them to work on safety. So it is better to race for stronger AIs and do safety later.”
- “It is better for us to deploy AGI first than [authoritarian country], which would be bad.”
- “It is better for us to have AGI first than [other organization], that is less safety minded than us.”
- “We can’t predict the future. Possibly, it is better to not slow down AGI development, so that at some point there is naturally a big accident, and then the public and policymakers will understand that AGI safety is a big deal.”
- “It is better to have AGI ASAP, so that we can study it longer for safety purposes, before others get it.”
- “It is better to have AGI ASAP, so that at least it has access to less compute for RSI / world-takeover than in the world where it comes 10 years later.”
- “Policymakers are clueless about this technology, so it’s impossible to slow down, they will just fail in their attempts to intervene. Engineers should remain the only ones deciding where the technology goes”
Remember that arguments are soldiers [? · GW]: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.
Question people
We could say more. But:
- We are not high status, “core” members of the community.
- We work at Conjecture, so what we write should be read as biased.
- There are expectations of privacy when people talk to us. Not complete secrecy about everything. But still, they expect that we would not directly attribute quotes to them for instance, and we will not do so without each individual’s consent.
- We expect we could say more things that would not violate expectations of privacy (public things even!). But we expect niceness norms (that we find often detrimental and naive) and legalities (because we work at what can be seen as a competitor) would heavily punish us.
So our message is: things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.
Recommendations:
- Question people, report their answers in your whisper networks, in your Twitter sphere or whichever other places you communicate on.
- An example of “questioning” is asking all of the following questions:
- Do you think we should race toward AGI? If so, why? If not, do you think we should slow down AGI? What does your organization think? What is it doing to push for capabilities and race for AGI compared to slowing down capabilities?
- What is your alignment plan? What is your organization’s alignment plan? If you don’t know if you have one, did you ask your manager/boss/CEO what their alignment plan is?
- Don’t substitute social fluff for information: someone being nice, friendly, or being liked by people, does not mean they have good plans, or any plans at all. The reverse also holds!
- Gossiping and questioning people about their positions on AGI are prosocial activities!
- Silence benefits people who lie or mislead in private, telling others what they want to hear.
- Open Communication Norms benefit people who are consistent (not necessarily correct, or even honest, but at least consistent).
4. Conclusion
Let’s summarize our point of view:
- AGI by default very soon: brace for impact
- No safety solutions in sight: we have no airbag
- Race ongoing: people are actually accelerating towards the wall
Should we just give up and die?
Nope! And not just for dignity points [LW · GW]: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.
We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower sub-problems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.
If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.
We personally also recommend engaging with the writings of Eliezer Yudkowsky [LW · GW], Paul Christiano [LW · GW], Nate Soares [LW · GW], and John Wentworth [LW · GW]. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.
5. Disclaimer
We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.
For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest.
Some of these potential cruxes include:
- Adversarial examples are not only extreme cases, but rather they are representative of what you should expect conditioned on sufficient optimization.
- Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
- Even perfect interpretability will not solve the problem alone: not everything is in the feed forward layer, and the more models interact with the world the truer this becomes.
- Even with more data, RLHF and fine-tuning can’t solve alignment. These techniques don’t address deception and inner alignment, and what is natural in the RLHF ontology is not natural for humans and vice-versa.
[1] Edited to include DayDreamer, VideoDex, and RT-1; h/t Alexander Kruel for these additional, better examples.
135 comments
Comments sorted by top scores.
comment by paulfchristiano · 2023-02-20T01:57:31.423Z · LW(p) · GW(p)
RLHF and Fine-Tuning have [LW · GW] not worked well [LW · GW] so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.
These three links are:
- The first is Mysteries of mode collapse [LW · GW], which claims that RLHF (as well as OpenAI's supervised fine-tuning on highly-rated responses) decreases entropy. This doesn't seem particularly related to any of the claims in this paragraph, and I haven't seen it explained why this is a bad thing.
- The second is Discovering language model behaviors with model-written evaluations and shows that Anthropic's models trained with RLHF have systematically different personalities than the pre-trained model. I'm not exactly sure what claims you are citing, but I think it probably involves some big leaps to interpret this as either directly harmful or connected with traditional stories about risk.
- The third is Compendium of problems with RLHF [LW · GW], which primarily links to the previous 2 failures and then discusses theoretical limitations.
I think these are bad citations for the claim that methods are "not working well" or that current evidence points towards trouble.
The current problems you list---"unhelpful, untruthful, and inconsistent"---don't seem like good examples to illustrate your point. These are mostly caused by models failing to correctly predict which responses a human would rate highly. That happens because models have limited capabilities and is rapidly improving as models get smarter. These are not the problems that most people in the community are worried about, and I think it's misleading to say this is what was "theorized" in the past.
I think RLHF is obviously inadequate for aligning really powerful models, both because you cannot effectively constrain a deceptively aligned model and because human evaluators will eventually not be able to understand the consequences of proposed actions. And I think it is very plausible that large language models will pose serious catastrophic risks from misalignment before they are transformative (it seems very hard to tell). But I feel like this post isn't engaging with the substance of those concerns or sensitive to the actual state of evidence about how severe the problem looks like it will be or how well existing mitigations might work.
Replies from: amaury-lorin, Hoagy, amaury-lorin
↑ comment by momom2 (amaury-lorin) · 2023-08-04T15:59:06.552Z · LW(p) · GW(p)
A new paper, built upon the compendium of problems with RLHF, tries to make an exhaustive list of all the issues identified so far: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
↑ comment by Hoagy · 2023-02-22T14:27:55.579Z · LW(p) · GW(p)
Agree that the cited links don't represent a strong criticism of RLHF but I think there's an interesting implied criticism, between the mode-collapse post and janus' other writings on cyborgism etc that I haven't seen spelled out, though it may well be somewhere.
I see janus as saying that if you know how to properly use the raw models, then you can actually get much more useful work out of the raw models than the RLHF'd ones. If true, we're paying a significant alignment tax with RLHF that will only become clear with the improvement and take-up of wrappers around base models in the vein of Loom.
I guess the test (best done without too much fanfare) would be to get a few people well acquainted with Loom or whichever wrapper tool and identify a few complex tasks and see whether the base model or the RLHF model performs better.
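The proposed test could be sketched roughly as follows. Everything here is a stand-in: the model callables, the judge, and the task list are placeholders for real base/RLHF model calls, a Loom-style wrapper, and a real scoring rubric.

```python
# Hypothetical harness for Hoagy's proposed test: run a base model and an
# RLHF'd model on the same task set and score both with the same judge.
from statistics import mean

def run_comparison(tasks, models, judge):
    """Return each model's mean judge score over the shared task set."""
    return {
        name: mean(judge(task, model(task)) for task in tasks)
        for name, model in models.items()
    }

# Stand-ins so the harness runs end to end:
tasks = ["summarize this paper", "plan this project", "debug this function"]
models = {
    "base": lambda t: f"[base-model completion for: {t}]",
    "rlhf": lambda t: f"[rlhf-model completion for: {t}]",
}
judge = lambda task, output: float(len(output))  # placeholder metric

scores = run_comparison(tasks, models, judge)
print(scores)
```

The interesting part is entirely in what replaces the placeholders: the judge would need to be blind to which model produced each output, and the base model would need a skilled operator, which is exactly the confound the comment flags.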
Even if true though, I don't think it's really a mark against RLHF since it's still likely that RLHF makes outputs safer for the vast majority of users, just that if we think we're in an ideas arms-race with people trying to advance capabilities, we can't expect everyone to be using RLHF'd models.
comment by cfoster0 · 2023-02-18T23:58:40.282Z · LW(p) · GW(p)
If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
Yes? Not sure what to say beyond that.
Without saying anything about the obstacles themselves, I'll make a more meta-level observation: the field of ML has a very specific "taste" for research, such that certain kinds of problems and methods have really high or really low memetic fitness, which tends to make the tails of "impressiveness and volume of research papers, for ex. seen on Twitter" and "absolute progress on bottleneck problems" come apart.
Replies from: Jacy Reese, leogao, gabriel-alfour-1, Aprillion
↑ comment by Jacy Reese Anthis (Jacy Reese) · 2023-02-19T09:45:21.520Z · LW(p) · GW(p)
+1. While I will also respect the request to not state them in the comments, I would bet that you could sample 10 ICML/NeurIPS/ICLR/AISTATS authors and learn about >10 well-defined, not entirely overlapping obstacles of this sort.
We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.
I don't want people to skim this post and get the impression that this is a common view in ML.
Replies from: lahwran, None, daniel-kokotajlo
↑ comment by the gears to ascension (lahwran) · 2023-02-20T05:21:28.761Z · LW(p) · GW(p)
The problem with asking individual authors is that most researchers in ML don't have a wide enough perspective to realize how close we are. Over the past decade of ML, it seems that people in the trenches of ML almost always think their research is going slower than it is because only a few researchers have broad enough gears models to plan the whole thing in their heads. If you aren't trying to run the search for the foom-grade model in your head at all times, you won't see it coming.
That said, they'd all be right about what bottlenecks there are. Just not how fast we're gonna solve them.
Replies from: Making_Philosophy_Better
↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T14:30:54.858Z · LW(p) · GW(p)
The fact that Google essentially panicked and speed-overhauled internally when ChatGPT dropped is a good example of this. Google has very competent engineers, and a very high interest in predicting competition, and they were working on the same problem, and they clearly did not see this coming, despite it being the biggest threat to their monopoly in a long time.
Similarly, I hung out with some computer scientists working on natural language processing two days ago. And they had been utterly blindsided by it, and were hateful of it, because they basically felt that a lot of stuff they had been banging their heads against and considered unsolvable in the near future had simply, overnight, been solved. They were expressing concern that their department, which until just now had been considered a decent, cutting-edge approach, might be defunded and closed down.
I am not in computer science, I can only observe this from the outside. But I am very much seeing that statements made confidently about limitations by supposed experts have repeatedly become worthless within years, and that people are blindsided by the accelerations and achievements of people who work in closely related fields. Also that explanations of how novel systems work by people in related fields often clearly represent how these novel systems worked a year or two ago, and are no longer accurate in ways that may first seem subtle, but make a huge difference.
↑ comment by [deleted] · 2023-02-20T04:51:34.297Z · LW(p) · GW(p)
I don't want people to skim this post and get the impression that this is a common view in ML.
So you're saying that in ML, there is a view that there are obstacles that a well funded lab can't overcome in 6 months.
↑ comment by the gears to ascension (lahwran) · 2023-02-20T05:25:28.456Z · LW(p) · GW(p)
For what it's worth, I do think that's true. There are some obstacles that would be incredibly difficult to overcome in 6 months, for anyone. But they are few, and dwindling.
↑ comment by Jacy Reese Anthis (Jacy Reese) · 2023-02-20T13:17:51.564Z · LW(p) · GW(p)
Yes.
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-20T06:02:51.929Z · LW(p) · GW(p)
Such a survey was done recently, IIRC. I don't remember the title or authors but I remember reading through it to see what barriers people cited, and being unimpressed. :( I wish I could find it again.
Replies from: Lanrian
↑ comment by Lukas Finnveden (Lanrian) · 2023-02-20T06:33:05.567Z · LW(p) · GW(p)
This one? https://link.springer.com/article/10.1007/s13748-021-00239-1
Replies from: Lanrian, daniel-kokotajlo
↑ comment by Lukas Finnveden (Lanrian) · 2023-02-20T06:35:40.646Z · LW(p) · GW(p)
LW discussion https://www.lesswrong.com/posts/GXnppjWaQLSKRvnSB/deep-limitations-examining-expert-disagreement-over-deep [LW · GW]
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-20T20:26:37.869Z · LW(p) · GW(p)
I think so, thanks!
↑ comment by leogao · 2023-02-20T02:47:09.933Z · LW(p) · GW(p)
I think your meta level observation seems right. Also, I would add that bottleneck problems in either capabilities or alignment are often bottlenecked on resources like serial time.
(My timelines, even taking all this into account, are only like 10 years---I don't think these obstacles are so insurmountable that they buy decades.)
↑ comment by Gabriel Alfour (gabriel-alfour-1) · 2023-02-19T09:57:48.634Z · LW(p) · GW(p)
(I strongly upvoted the comment to signal boost it, and possibly let people who agree easily express their agreement to it directly if they don't have any specific meta-level observation to share)
↑ comment by Aprillion · 2023-02-19T15:32:58.015Z · LW(p) · GW(p)
Staying at the meta level: if AGI weren't going to be created "by the ML field", would you still believe the problems on your list could not possibly be solved within 6-ish months if companies threw $1b at each of them?
Even if competing groups of humans, augmented by AI capabilities existing "soon", were trying to solve those problems with combined tools from inside and outside the ML field, would the foreseeable optimization pressure not be enough for those collective agents to solve the known-known and known-unknown problems you can imagine?
Replies from: None
↑ comment by [deleted] · 2023-02-20T04:55:29.310Z · LW(p) · GW(p)
Also RSI. Just how close are we to AI criticality? It seems that all you would need would be:
(1) a benchmark where an agent scoring well on it is an AGI
(2) a well designed scoring heuristic where a higher score = "more AGI"
(3) a composable stack. You should be able to route inputs to many kinds of neural networks, and route outputs around to other modules, by just changing fields in a file with a simple format that represents well the problem. This file is the "cognitive architecture".
So you bootstrap with a reinforcement learning agent that designs cognitive architectures, then you benchmark the architecture on the AGI gym. Later you add as a task to the AGI gym a computer science domain task to "populate this file to design a better AGI".
It seems like the only things stopping this from working are
(1) it takes a lot of human labor to make a really good AGI gym. It has to be multi modal, with tasks that use all the major senses (sound, vision, reading text, robot proprioception).
(2) it takes a lot of compute to train a "candidate" from a given cognitive architecture. The model is likely larger than any AI model now, made of multiple large neural networks.
(3) it takes a lot of human labor to design the framework and 'seed' it with many modules ripped from most papers on AI. You want the cognitive architecture exploration space to be large.
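A heavily simplified sketch of the bootstrap loop described above. The module slots, option names, and scoring function are all invented stand-ins: here the "cognitive architecture" file is just a config dict, and the "AGI gym" is a trivial heuristic rather than a multimodal benchmark that would require training each candidate.

```python
import random

# Hypothetical module catalog: each slot in the "cognitive architecture"
# config can be filled by one of several components.
MODULES = {
    "perception": ["vit", "resnet", "clip"],
    "memory": ["none", "kv_cache", "external_store"],
    "policy": ["transformer", "mlp"],
}

def random_architecture(rng):
    return {slot: rng.choice(opts) for slot, opts in MODULES.items()}

def benchmark(arch):
    # Placeholder "AGI gym" score. In the comment's proposal this step is
    # the expensive one: train the candidate, then evaluate it on a
    # multimodal task suite.
    score = 0
    score += arch["memory"] != "none"
    score += arch["perception"] == "clip"
    score += arch["policy"] == "transformer"
    return score

def search(iterations=200, seed=0):
    # Outer loop standing in for the RL agent that designs architectures:
    # mutate one module slot at a time, keep the candidate if it scores
    # at least as well on the benchmark.
    rng = random.Random(seed)
    best = random_architecture(rng)
    best_score = benchmark(best)
    for _ in range(iterations):
        candidate = dict(best)
        slot = rng.choice(list(MODULES))
        candidate[slot] = rng.choice(MODULES[slot])
        if benchmark(candidate) >= best_score:
            best, best_score = candidate, benchmark(candidate)
    return best, best_score

arch, score = search()
print(arch, score)
```

The comment's three bottlenecks map directly onto this sketch: (1) and (2) are hidden inside `benchmark`, and (3) is the size and quality of `MODULES`.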
comment by Carl Feynman (carl-feynman) · 2023-02-19T17:01:40.764Z · LW(p) · GW(p)
>If you have technical understanding of current AIs, do you truly believe there are any major obstacles left?
I’ve been working in AI (on and off) since 1979. I don’t work on it any more, because of my worries about alignment. I think this essay is mostly correct about short timelines.
That said, I do think there is at least one obstacle between us and dangerous superhuman AI. I haven’t seen any good work towards solving it, and I don’t see any way to solve it myself in the short term. That said, I take these facts as pretty weak evidence. Surprising capabilities keep emerging from LLMs and RL, and perhaps we will solve the problem in the next generation without even trying. Also, the argument from personal incomprehension is weak, because there are lots of people working on AI, who are smarter, more creative, and younger.
I’m of mixed feelings about your request not to mention the exact nature of the obstacle. I respect the idea of not being explicit about the nature of the Torment Nexus. But I think we could get more clarity about alignment by discussing it explicitly. I bet there are people working on it already, and I don’t think discussing it here will cause more people to work on it.
↑ comment by Carl Feynman (carl-feynman) · 2023-05-04T15:48:42.839Z · LW(p) · GW(p)
There’s no point to my remaining secretive as to my guess at the obstacle between us and superhuman AI. What I was referring to is what Jeffery Ladish called the “Agency Overhang” in his post of the same name. Now that there’s a long and well-written post on the topic, there’s no point in me being secretive about it ☹️.
Replies from: gilch↑ comment by gilch · 2023-11-28T05:47:55.517Z · LW(p) · GW(p)
https://www.lesswrong.com/posts/tqs4eEJapFYSkLGfR/the-agency-overhang [LW · GW]
comment by Zach Furman (zfurman) · 2023-02-18T23:31:39.445Z · LW(p) · GW(p)
But in the last few years, we’ve gotten: [...]
- Robots (Boston Dynamics)
Broadly agree with this post, though I'll nitpick the inclusion of robotics here. I don't think it's progressing nearly as fast as ML, and it seems fairly uncontroversial that we're not nearly as close to human-level motor control as we are to (say) human-level writing. I only bring this up because a decent chunk of bad reasoning (usually underestimation) I see around AGI risk comes from skepticism about robotics progress, which is mostly irrelevant in my model.
Replies from: 1a3orn, AndreaM, None, quetzal_rainbow
↑ comment by 1a3orn · 2023-02-19T17:07:53.654Z · LW(p) · GW(p)
I'm not sure why some skepticism would be unjustified from lack of progress in robots.
Robots require reliability, because otherwise you destroy hardware and other material. Even in areas where we have had enormous progress, (LLMs, Diffusion) we do not have reliability, such that you can trust the output of them without supervision, broadly. So such lack of reliability seems indicative of perhaps some fundamental things yet to be learned.
Replies from: Green_Swan
↑ comment by Jacob Watts (Green_Swan) · 2023-02-21T00:03:55.145Z · LW(p) · GW(p)
The skepticism that I object to has less to do with the idea that ML systems are not robust enough to operate robots and more to do with people rationalizing based off of the intrinsic feeling that "robots are not scary enough to justify considering AGI a credible threat". (Whether they voice this intuition or not)
I agree that having highly capable robots which operate off of ML would be evidence for AGI soon and thus the lack of such robots is evidence in the opposite direction.
That said, because the main threat from AGI that I am concerned about comes from reasoning and planning capabilities, I think it can be somewhat of a red herring. I'm not saying we shouldn't update on the lack of competent robots, but I am saying that we shouldn't flippantly use the intuition, "that robot can't do all sorts of human tasks, I guess machines aren't that smart and this isn't a big deal yet".
I am not trying to imply that this is the reasoning you are employing, but it is a type of reasoning I have seen in the wild. If anything, the lack of robustness in current ML systems might actually be more concerning overall, though I am uncertain about this.
↑ comment by Andrea_Miotti (AndreaM) · 2023-02-19T15:42:53.319Z · LW(p) · GW(p)
Good point, and I agree progress has been slower in robotics compared to the other areas.
I just edited the post to add better examples (DayDreamer, VideoDex and RT-1) of recent robotics advances that are much more impressive than the only one originally cited (Boston Dynamics), thanks to Alexander Kruel who suggested them on Twitter.
↑ comment by [deleted] · 2023-02-19T00:09:13.469Z · LW(p) · GW(p)
Do you have a hypothesis why? Robotic tasks add obvious tangible value; you would expect significant investment into robotics driven by sota AI models. Yet no one appears to be both seriously trying and well funded.
Replies from: AnthonyC, Making_Philosophy_Better
↑ comment by AnthonyC · 2023-02-19T19:40:59.358Z · LW(p) · GW(p)
IDK what the previous post had in mind, but one possibility is that an AGI with superhuman social and human manipulation capabilities wouldn't strictly need advanced robotics to take arbitrary physical actions in the world.
Replies from: kenakofer
↑ comment by kenakofer · 2023-02-22T23:15:31.246Z · LW(p) · GW(p)
This is something I frequently get hung up on: If the AGI is highly intelligent and socially manipulative, but lacks good motor skills/advanced robotics, doesn't that imply that it also lacks an important spatial sense necessary to understand, manipulate, or design physical objects? Even if it could manipulate humans to take arbitrarily precise physical actions, it would need pretty good spatial reasoning to know what the expected outcome of those actions is.
I guess the AGI could just solve the problem of human alignment, so our superior motor and engineering skills don't carelessly bring it to harm.
Replies from: None
↑ comment by [deleted] · 2023-02-22T23:44:29.979Z · LW(p) · GW(p)
There are robotics transformers and general purpose models like Gato that can control robotics.
If AGI is extremely close, the reason is criticality. All the pieces for an AGI system that has general capabilities including working memory, robotics control, perception, "scratch" mind spaces including some that can model 3d relationships, exist in separate papers.
Normally it would take humans years, likely a decade, of methodical work building more complex integrated systems, but current AI may be good enough to bootstrap there in a short time, assuming a very large robotics hardware and compute budget.
↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T14:39:39.802Z · LW(p) · GW(p)
Biology perspective here... motor coordination is fiendishly difficult, but humans are unaware of this, because we have no explicit, conscious knowledge of what is going on there. We consciously represent something like "throw the ball at that target", "reach the high object", "push over the heavy thing", "stay balanced on the wobbly thing", and it feels like that is all there is, because the very advanced system that gets it done is unconscious. It partly utilises parts of the brain that do not make their contents explicit and conscious, and partly utilises embodied cognition and bodies carefully evolved for task solving; it involves incredibly quick coordination between surprisingly complicated and fine-tuned systems.
On the other hand, when we solve intellectual problems, like playing chess, or doing math, or speaking in language, a large amount of the information needed to solve the problem is consciously available, and consciously directed. As such, we know far more about these challenges.
This leads us to systematically overestimate how difficult it is to do things like play chess, while it isn't that difficult, and we know so much about how it works that implementing it in another system is not so hard; and to underestimate how difficult motor coordination is, because we are not aware of the complexity explicitly, which also makes it very difficult to code into another system, especially one that does not run on wetware.
The way we designed computers at first was also strongly influenced by our understanding of our conscious mind, and not by the way wetware evolved to handle first problems, because again, we understood the former better, and it is easier to explicitly encode. So we built systems that were inherently better at the stuff that in humans evolved later, and neglected the stuff we considered basic and that was actually the result of a hell of a long biological evolution.
Which is why, comparatively, our robots doing motor coordination still suck, while problems deemed super hard, like chess, were easily beaten a long time ago, and problems still considered inconceivable to solve, like speech, are being solved right now.
With the unfortunate implication that we were hoping for AI to replace menial labour, and we are instead finding that it is replacing intellectual labour, like coding and creative writing.
Replies from: None↑ comment by [deleted] · 2023-03-04T20:53:07.402Z · LW(p) · GW(p)
Which is why, comparatively, our robots doing motor coordination still suck, while problems deemed super hard, like chess, were easily beaten a long time ago, and problems still considered inconceivable to solve, like speech, are being solved right now.
With the unfortunate implication that we were hoping for AI to replace menial labour, and we are instead finding that it is replacing intellectual labour, like coding and creative writing.
While I commend your effort put into analysis, I do not think the above is actually remotely correct.
The history of AI has been one of very early use of AI for control systems, including more than 18 years of visible work on autonomous cars (counting from the 2005 DARPA Grand Challenge).
Easy, tractable results came from this. RL to control a machine is something that has turned out to be extremely easy, and it works very well (see all the 2014-era DL papers that used Atari games as the initial challenge). The issue has been that the required accuracy for a real machine is 99.9%+, with a domain-specific number of additional 9s required after that.
Making a complete system that reliable has been difficult. You can use the current Cruise stalls as an example: they solved the embedded control problem very well, but the overall system infrastructure is limiting (the cars aren't running people over, but often experience some infrastructure problem with the remote systems).
Comparatively, the problem of "RL controlling a machine" is very close to being solved; as an illustrative example, it is at 99.99% accuracy and needs to be at 99.99999%, whereas chatbots are more like 80% accurate.
They make glaring, overt errors constantly, including outright lying - 'hallucinating' - something machine control systems ironically don't do.
And useful chatbots became possible only about 3-5 years ago; they turn out to take enormous amounts of compute and data, OOMs more than RL systems use, and their current accuracy is low.
Summary: I would argue it's more of a human perception thing: we think motion control and real-world perception are easy, and are not impressed with 99.99%-accurate AI systems; we think higher-level cognition is very hard, and are very impressed when we use 80%-accurate AI systems.
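The accuracy gap described here compounds over sequential actions; a quick back-of-the-envelope calculation (numbers illustrative, not from the comment) shows why control systems need so many nines while an 80%-accurate chatbot still feels usable:

```python
# Per-action accuracy compounds over a sequence of actions: a control
# system makes thousands of decisions per task, while a chat reply is
# closer to a single judged action. Numbers are illustrative only.

def task_success_rate(per_action_accuracy: float, num_actions: int) -> float:
    """Probability of completing num_actions in a row without a single error."""
    return per_action_accuracy ** num_actions

# A driving task with 10,000 sequential control decisions:
print(round(task_success_rate(0.9999, 10_000), 3))     # 0.368 -- fails most trips
print(round(task_success_rate(0.9999999, 10_000), 3))  # 0.999 -- usable
# A chat reply treated as one judged action:
print(task_success_rate(0.80, 1))                      # 0.8 -- wrong 1 in 5, yet impressive
```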
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-05T02:49:38.136Z · LW(p) · GW(p)
Mh. I do appreciate the correction, and you do seem to have knowledge here that I do not, but I am not convinced.
Right now, chatbots can perform at levels comparable to humans on writing related tasks that humans actually do. Sure, they hallucinate, they get confused, their spatial reasoning is weak, their theory of mind is weak, etc. but they pass exams with decent grades, write essays that get into newspapers and universities and magazines, pass the Turing test, write a cover letter and correct a CV, etc. Your mileage will vary with whether they outperform a human or act like a pretty shitty human who is transparently an AI, but they are doing comparable things. And notably, the same system is doing all of these things - writing dialogues, writing code, giving advice, generating news articles.
Can you show me a robot that is capable of playing in a football and a basketball match? And then dancing a tango with a partner in a crowded room? I am not saying perfectly. It is welcome to be a shitty player, who sometimes trips or misses the ball. 80% accuracy, if you like. Our chatbots can be beaten by nine-year-old kids at some tasks, so fair enough, let the robot play football and dance with nine-year-olds, compete with nine-year-olds. But I want it running, bipedal, across a rough field, kicking a ball into the goal (or at least the approximate direction, like a kid would) with one of the two legs it is running on, while evading players who are trying to snatch the ball away, and without causing anyone severe injury. I want the same robot responding to pressure cues from the dance partner, navigating them around other dancing couples, to the rhythm of the music, holding them enough to give them support without holding them so hard they cause injury. I want the same robot walking into a novel building, and helping with tidying up and cleaning it, identifying stains and chemical bottles, selecting cleaning tools, and scrubbing hard enough to get the dirt off without damaging the underlying material, while then coating the surface evenly with disinfectant. Correct me if I am wrong - there is so much cool stuff happening in this field so quickly, and a lot of it is simply not remotely my area of expertise. But I am under the impression that we do not have robots who are remotely capable of this.
This is the crazy shit that sensory-motor coordination does. Holding objects hard enough that they do not slip, but without crushing them. Catching flying projectiles, and throwing them at targets, even though they are novel projectiles we have never handled before, and even when the targets are moving. Keeping our balance while bipedal, on uneven and moving ground, and while balancing heavy objects or supporting another person. Staying standing when someone is actively trying to trip you. Entering a novel, messy space, getting oriented, identifying its contents, even if it contains objects we have never seen in this form. Balancing on one leg. Chasing someone through the jungle. I am familiar with projects that have targeted these problems in isolation - heck, I saw the first robot that was capable of playing Jenga, like... nearly two decades ago? But all of this shit in coordination, within a shifting and novel environment?
In comparison, deploying a robot on a clearly marked road with clearly repeating signs, or in the air, is choosing ridiculously easy problems. It is akin to programming software that does not have flexible conversations with you, but is capable of responding to a fixed set of specific prompts with specific responses, clustering all other prompts into the existing categories or returning an error.
Replies from: None, None↑ comment by [deleted] · 2023-03-05T05:48:54.815Z · LW(p) · GW(p)
Part of it is not the difficulty of the task: many of the tasks you give as examples require very expensive, (ironically) hand-built robotics hardware even to attempt. There are mere hundreds of instances of that hardware, and they cost hundreds of thousands of dollars each.
There is insufficient scale. Think of all the AI hype and weak results before labs had clusters of 2048 A100s and trillion token text databases. Scale counts for everything. If in 1880, chemists had figured out how to release energy through fission, but didn't have enough equipment and money to get weapons grade fissionables until 1944, imagine how bored we would have been with nuclear bomb hype. Nature does not care if you know the answer, only that you have more than a kilogram of refined fissionables, or nothing interesting will happen.
The thing about your examples is that machines are trivially superhuman at all those tasks. Sure, not at the full set combined, but that's from lack of trying - nobody has built anything at the necessary scale.
I am sure you have seen the demonstrations of a ball bearing on a rail and an electric motor keeping it balanced, or a double pendulum stabilized by a robot, or quadcopters remaining in flight with 1 wing clipped, using a control algorithm that dynamically adjusts flight after the wing damage.
All easy RL problems, all completely impossible for human beings. (we react too slowly)
The majority of what you mention are straightforward reinforcement learning problems and solvable with a general method. Most robotics manipulation tasks fall into this space.
Note that there is no economic incentive to solve many of the tasks you mention, so they won't be. But general manufacturing robotics, where you can empty a bin of random parts in front of the machine(s), and they assemble as many fully built products of the design you provided that the parts pile allows? Very solvable and the recent google AI papers show it's relatively easy. (I say easy because the solutions are not very complex in source code, and relatively small numbers of people are working on them.)
I assume at least for now, everyone will use nice precise industrial robot arms and overhead cameras and lidars mounted in optimal places to view the work space - there is no economic benefit to 'embodiment' or a robot janitor entering a building like you describe. Dancing with a partner is too risky.
But it's not a problem of motion control or sensing, machinery is superhuman in all these ways. It's a waste of components and compute to give a machine 2 legs or that many DOF. Nobody is going to do that for a while.
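A control loop of the kind behind those balancing demos fits in a few lines; the sketch below uses a plain PD controller on a simplified inverted pendulum (physics, gains, and time step are illustrative choices, not from any system mentioned in the comment):

```python
import math

# Minimal sketch: PD feedback stabilizing an inverted pendulum, the
# classic control problem that machines solve easily and human
# reflexes cannot. All constants are illustrative.

def simulate(steps: int = 2000, dt: float = 0.001,
             kp: float = 50.0, kd: float = 8.0) -> float:
    theta, omega = 0.2, 0.0      # initial tilt (rad) and angular velocity
    g, length = 9.81, 1.0
    for _ in range(steps):
        torque = -kp * theta - kd * omega                 # PD control law
        alpha = (g / length) * math.sin(theta) + torque   # simplified dynamics
        omega += alpha * dt                               # semi-implicit Euler step
        theta += omega * dt
    return theta

print(abs(simulate()) < 0.01)  # True: the tilt is driven back to upright
```

With the torque term removed the same loop falls over immediately, which is the sense in which this is "easy" for a feedback controller running thousands of updates per second and impossible for human reaction times.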
↑ comment by [deleted] · 2023-03-08T02:51:52.596Z · LW(p) · GW(p)
3 days later...
https://palm-e.github.io/ https://www.lesswrong.com/posts/sMZRKnwZDDy2sAX7K/google-s-palm-e-an-embodied-multimodal-language-model
from the paper: "Data efficiency. Compared to available massive language or vision-language datasets, robotics data is significantly less abundant"
As I was saying, the reason robotics wasn't as successful as the other tasks is scale, and Google seems to hold this opinion.
↑ comment by quetzal_rainbow · 2023-02-19T17:19:02.719Z · LW(p) · GW(p)
I think you can find it interesting: https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-for-real.html?m=1
Replies from: None↑ comment by [deleted] · 2023-02-20T05:01:24.010Z · LW(p) · GW(p)
Neat paper, though one major limitation is that they trained on real data from only 2 micro-kitchens.
To get to very high robotic reliability, they would need a simulation of many variations of the robot's operating environment, and a robot with a second arm and more dexterity in its grippers.
Basically, the paper was not a serious attempt to reach production-level reliability, just tinkering with a better technique.
comment by YafahEdelman (yafah-edelman-1) · 2023-02-19T09:13:13.808Z · LW(p) · GW(p)
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
I think this request, absent a really strong compelling argument that is spelled out, creates an unhealthy epistemic environment. It is possible that you think this is false or that it's worth the cost, but you don't really argue for either in this post. You encourage people to question others and not trust blindly in other parts of the post, but this portion expects people to not elaborate on their opinions without an explanation as to why. You repeat this again by saying "So our message is: things are worse than what is described in the post!" without justifying yourselves or, imo, properly conveying the level of caution people should be treating such an unsubstantiated claim.
I'm tempted to write a post replying with why I think there are obstacles to AGI, what broadly they are with a few examples, and why it's important to discuss them. (I'm not going to do so at the moment because it's late and I know better than to publicly share something that people have implied to me is infohazardous without carefully thinking it over (and discussing doing so with friends as well).)
(I'm also happy to post it as a comment here instead but assume you would prefer not and this is your post to moderate.)
↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-02-21T17:25:37.194Z · LW(p) · GW(p)
The reasoning seems straightforward to me: If you're wrong, why talk? If you're right, you're accelerating the end.
I can't in general endorse "first do no harm", but it becomes better and better in any specific case the less way there is to help. If you can't save your family, at least don't personally help kill them; it lacks dignity.
Replies from: unicode-70, yafah-edelman-1↑ comment by Ben Amitay (unicode-70) · 2023-02-25T15:37:54.059Z · LW(p) · GW(p)
I think that is an example of the huge potential damage of "security mindset" gone wrong. If you can't save your family, as in "bring them to safety", at least make them marginally safer.
(Sorry for the tone of the following - it is not intended at you personally, who did much more than your fair share)
Create a closed community that you mostly trust, and let that community speak freely about how to win. Invent another damn safety patch that will make it marginally harder for the monster to eat them, in hope that it chooses to eat the moon first. I heard you say that most of your probability of survival comes from the possibility that you are wrong - trying to protect your family is trying to at least optimize for such miracle.
There is no safe way out of a war zone. Hiding behind a rock is therefore not the answer.
Replies from: Eliezer_Yudkowsky↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-02-25T17:26:05.454Z · LW(p) · GW(p)
This is not a closed community, it is a world-readable Internet forum.
Replies from: Making_Philosophy_Better, unicode-70↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T14:46:14.662Z · LW(p) · GW(p)
It is readable; it is however generally not read by academia and engineers.
I disagree with them about why - I do think solutions can be found by thinking outside of the box and outside of immediate applications, and without an academic degree, and I very much value the rational and creative discourse here.
But many here specifically advocate against getting a university degree or working in academia, thus shitting on things academics have sweat blood for. They also tend not to follow the formats and metrics that count in academia to be heard, such as publications and mathematical precision and usable code. There is also a surprisingly limited attempt in engaging with academics and engineers on their terms, providing things they can actually use and act upon.
So I doubt they will check this forum for inspiration on which problems need to be cracked. That is irrational of them, so I understand why you do not respect it, but that is how it is.
On the other hand, understanding the existing obstacles may give us a better idea of how much time we still have, and which limitations emerging AGI will have, which is useful information.
↑ comment by Ben Amitay (unicode-70) · 2023-02-25T18:01:07.728Z · LW(p) · GW(p)
I meant to criticize moving too far toward "do no harm" policy in general due to inability to achieve a solution that would satisfy us if we had the choice. I agree specifically that if anyone knows of a bottleneck unnoticed by people like Bengio and LeCun, LW is not the right forum to discuss it.
Is there a place like that though? I may be vastly misinformed, but last time I checked MIRI gave the impression of aiming at very different directions ("bringing to safety" mindset) - though I admit that I didn't watch it closely, and it may not be obvious from the outside what kind of work is done and not published.
[Edit: "moving toward 'do no harm'" - "moving to" was a grammar mistake that make it contrary to position you stated above - sorry]
↑ comment by YafahEdelman (yafah-edelman-1) · 2023-02-21T23:34:57.946Z · LW(p) · GW(p)
I think there are a number of ways in which talking might be good given that one is right about there being obstacles - one that appeals to me in particular is the increased tractability of misuse arising from the relevant obstacles.
[Edit: *relevant obstacles I have in mind. (I'm trying to be vague here)]
↑ comment by Aprillion · 2023-02-19T15:08:18.367Z · LW(p) · GW(p)
No idea about original reasons, but I can imagine a projected chain of reasoning:
- there is a finite number of conjunctive obstacles
- if a single person can only think of a subset of obstacles, they will try to solve those obstacles first, making slow(-ish) progress as they discover more obstacles over time
- if a group shares their lists, each individual will become aware of more obstacles and will be able to solve more of them at once, potentially making faster progress
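That projected chain of reasoning is easy to make concrete with a toy model (entirely a construction for illustration, not from the comment): give each researcher a random subset of a fixed obstacle list and compare the best individual list with the pooled one:

```python
import random

# Toy model: N obstacles, and each researcher independently notices
# each obstacle with some probability. Sharing lists means working
# from the union, which is at least as large as any individual list.

random.seed(0)
ALL_OBSTACLES = set(range(20))

def known_subset(fraction: float = 0.3) -> set:
    """Obstacles one researcher happens to have thought of."""
    return {o for o in ALL_OBSTACLES if random.random() < fraction}

researchers = [known_subset() for _ in range(5)]
best_individual = max(len(r) for r in researchers)
pooled = len(set().union(*researchers))

print(best_individual <= pooled)  # True: pooling never shrinks the list
```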
↑ comment by Daniel Kokotajlo (daniel-kokotajlo) · 2023-02-20T06:00:34.658Z · LW(p) · GW(p)
I'm someone with 4 year timelines who would love to be wrong. If you send me a message sketching what obstacles you think there are, or even just naming them, I'd be grateful. I'm not working on capabilities & am happy to promise to never use whatever I learn from you for that purpose etc.
↑ comment by Gurkenglas · 2023-02-26T16:18:42.735Z · LW(p) · GW(p)
Imo we should have a norm of respecting requests not to act, if we wouldn't have acted absent their post. Else they won't post in the first place.
Replies from: yafah-edelman-1↑ comment by YafahEdelman (yafah-edelman-1) · 2023-03-18T08:05:16.795Z · LW(p) · GW(p)
I think I agree with this in many cases but am skeptical of such a norm when the requests are related to criticism of the post or arguments as to why a claim it makes is wrong. I think I agree that the specific request to not respond shouldn't ideally make someone more likely to respond to the rest of the post, but I think that neither should it make someone less likely to respond.
comment by PonPonPon · 2023-02-20T20:20:23.428Z · LW(p) · GW(p)
AGI is happening soon. Significant probability of it happening in less than 5 years.
I agree that there is at least some probability of AGI within 5 years, and my median is something like 8-9 years (which is significantly advanced vs most of the research community, and also most of the alignment/safety/LW community afaik).
Yet I think that the following statements are not at all isomorphic to the above, and are indeed - in my view - absurdly far off the mark:
We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.
If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources?
Let's look at some examples for why.
- DeepMind's AlphaGo - took at least 1.5 years of development to get to human professional standard, possibly closer to 2 years.
- DeepMind's AlphaFold - essentially a simple supervised learning problem at its core - was an internal project for at least 3 years before culminating in the Nature paper version.
- OpenAI's Dota-playing OpenAI Five again took at least 2.5 years of development to get to human professional level (arguably sub-professional, after humans had more time to adapt to its playstyle) on a restricted format of the game.
In all 3 cases, the teams were large, well-funded, and focused throughout the time periods on the problem domain.
One may argue that a) these happened in the past, and AI resources/compute/research-iteration-speed are all substantially better now, and b) the above projects did not have the singular focus of the entire organisation. And I would accept these arguments. However, the above are both highly constrained problems, and ones with particularities eminently well suited to modern AI techniques. The space of 'all possible obstacles' and 'all problems' is significantly more vast than the above.
I wonder what model of AI R&D you guys have that gives you the confidence to make such statements in the face of what seems to me to be strong contrary empirical evidence.
comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-02-21T17:23:07.734Z · LW(p) · GW(p)
I see several large remaining obstacles. On the one hand, I'd expect vast efforts thrown at them by ML to solve them at some point, which, at this point, could easily be next week. On the other hand, if I naively model Earth as containing locally-smart researchers who can solve obstacles, I would expect those obstacles to have been solved by 2020. So I don't know how long they'll take.
(I endorse the reasoning of not listing out obstacles explicitly; if you're wrong, why talk, if you're right, you're not helping. If you can't save your family, at least don't personally contribute to killing them.)
Replies from: Ilio, Making_Philosophy_Better↑ comment by Ilio · 2023-03-04T16:48:38.630Z · LW(p) · GW(p)
I endorse the reasoning of not listing out obstacles explicitly; if you're wrong, why talk, if you're right, you're not helping.
I can only see two remaining obstacles (arguably two families, so I am not sure if I'm missing some of yours or if my categories are a little too broad). One is pretty obvious, and has been mentioned already. The second one is original AFAICT, and pretty close to « solve the alignment problem ». In that case, would you still advise keeping my mouth shut, or would you consider that an exception to your recommendation? Your answer will impact what I say or don't say, at least on LW.
Replies from: Eliezer_Yudkowsky↑ comment by Eliezer Yudkowsky (Eliezer_Yudkowsky) · 2023-03-07T00:41:55.646Z · LW(p) · GW(p)
If you think you've got a great capabilities insight, I think you should PM me or somebody else you trust and ask if they think it's a big capabilities insight.
↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:00:40.338Z · LW(p) · GW(p)
The problem with saving earth from climate change is not that we do not know the technical solutions. We have long done so. Framing this as a technical rather than a social problem is actually part of the issue.
The problem is with
- Academic culture systematically encouraging people to understate risk in light of uncertainty of complex systems, and framing researchers as lacking objectivity if they become activists in light of the findings, while politicians can exert pressure on final scientific reports;
- Capitalism needing limitless growth and intrinsically valuing profit over nature and this being fundamentally at odds with limiting resource consumption, while we have all been told that capitalism is both beneficial and without alternative, and keep being told the comforting lie that green capitalism will solve this all for us with technology, while leaving our quality and way of life intact;
- A reduction in personal resource use being at odds with short-term desires (eating meat, flying, using tons of energy, keeping toasty warm, overconsumption), while the positive impacts are long-term and not personalised (you won't personally be spared flooding because you put solar on your roof);
- Powerful actors having a strong interest in continuing fossil fuel extraction and modern agriculture, and funding politicians to advocate for them as well as fake news on the internet and biased research, with democratic institutions struggling to keep up with a change in what we consider necessary for the public good, and measures that would address these falsely being framed as being anti-democratic;
- AI that is not aligned with human interests, but controlled by companies who fund themselves by keeping you online at all costs, taking your data and spamming you with ads asking you to consume more unnecessary shit, with keeping humans distracted and engaged with online content in ways that makes them politically polarised and opposed to collaboration as well as destroying their focus;
- Several powerful countries being run by people who prioritise gaining power over others over general well-being, e.g. when they hope that climate change will hit us all, but will hit their rivals harder than them, so while we will all be more miserable, they will be in more power, so this is still fine.
The issue is not in figuring out green energy and sustainable agriculture and rewilding, we know how to do these things. The issue is in getting together to get this done for the common good and transforming our whole society in the process while those who would lose power in this scenario are opposing us skilfully at every turn.
I realise this is uncomfortable, because it means tackling this problem is not something we can leave to other people, and that fixing it won't leave our way of life essentially intact, but require a very uncomfortable overhaul. (Which does not mean we need to change the system before we address climate change; we need emissions radically down right now, anything that can be done needs to be done, or we risk hitting climate breakdown within a decade.) Unfortunately, being uncomfortable does not make it untrue. And fortunately, a lot of the changes needed to address climate change would actually be beneficial in the long run on other axes, as well.
comment by FeepingCreature · 2023-02-19T09:49:15.448Z · LW(p) · GW(p)
Maybe it'd be helpful to not list obstacles, but do list how long you expect them to add to the finish line. For instance, I think there are research hurdles to AGI, but only about three years' worth.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-19T22:21:49.592Z · LW(p) · GW(p)
strongly agreed. there are some serious difficulties left, and the field of machine learning has plenty of experience with difficulties this severe.
comment by Kaj_Sotala · 2023-02-19T18:50:52.978Z · LW(p) · GW(p)
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are [? · GW].
I guess the reasoning behind the "do not state" request is something like "making potential AGI developers more aware of those obstacles is going to direct more resources into solving those obstacles". But if someone is trying to create AGI, aren't they going to run into those obstacles anyway, making it inevitable that they'll be aware of them in any case?
Replies from: jimmy, Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:04:09.406Z · LW(p) · GW(p)
Yep, but they may well still direct their focus at the wrong things.
See the above example of humans originally focussing on getting AI to beat them at chess, thinking that was going to be the hardest problem and pinnacle. It wasn't, by a huge margin. It cost a lot of resources and time for what was a very doable problem from the start. And we didn't gain as much from doing it as we may have gained from focussing on a different problem. Engineers may well end up obsessed with optimising results at particular tasks, while missing out on the fact that other tasks remain completely unaddressed and need more focus. Often, research on basic approaches is far more time consuming, because it is undirected, than research on how to improve an approach that already in principle works, but it becomes far more crucial and more of a bottleneck in the long run.
comment by SarahNibs (GuySrinivasan) · 2023-02-19T07:02:29.666Z · LW(p) · GW(p)
If you do, state so in the comments, but please do not state what those obstacles are.
Yes. But the "reliably" in
The kind of problems that AGI companies could reliably not tear down with their resources?
is doing a lot more work than I'd like.
comment by Vladimir_Nesov · 2023-02-19T16:22:03.915Z · LW(p) · GW(p)
It's not just alignment that could use more time, but also less alignable approaches to AGI, like model based RL or really anything not based on LLMs. With LLMs currently being somewhat in the lead, this might be a situation with a race between maybe-alignable AGI and hopelessly-unalignable AGI, and more time for theory favors both in an uncertain balance. Another reason that the benefits of regulation on compute are unclear.
Replies from: nikolas-kuhn, lahwran, Making_Philosophy_Better↑ comment by Amalthea (nikolas-kuhn) · 2023-02-19T16:51:46.979Z · LW(p) · GW(p)
Are there any reasons to believe that LLMs are in any way more alignable than other approaches?
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-02-19T17:02:56.505Z · LW(p) · GW(p)
LLM characters are human imitations, so there is some chance they remain human-like on reflection (in the long term, after learning from much more self-generated things in the future than the original human-written datasets). Or at least sufficiently human-like to still consider humans moral patients. That is, if we don't go too far from their SSL origins with too much RL and don't have them roleplay/become egregiously inhuman fictional characters.
It's not much of a theory of alignment, but it's closest to something real that's currently available or can be expected to become available in the next few years, which is probably all the time we have.
Replies from: MSRayne, nikolas-kuhn↑ comment by MSRayne · 2023-02-21T23:08:31.294Z · LW(p) · GW(p)
What I'm expecting, if LLMs remain in the lead, is that we end up in a magical, spirit-haunted world where narrative causality starts to actually work, and trope-aware people essentially become magicians who can trick the world-sovereign AIs into treating them like protagonists and bending reality to suit them. Which would be cool as fuck, but also very chaotic. That may actually be the best-case alignment scenario right now, and I think there's a case for alignment-interested people who can't do research themselves but who have writing talent to write a LOT of fictional stories about AGIs that end up kind and benevolent, empower people in exactly this way, etc., to help stack the narrative-logic deck.
Replies from: Taleuntum, janus, Vladimir_Nesov↑ comment by Taleuntum · 2023-02-28T19:33:18.703Z · LW(p) · GW(p)
At this point in their life, Taleuntum did not at all expect that one short, self-referential joke comment would turn out to be the key to humanity's survival and thriving in the long millennia ahead. Fortunately, they commented all the same.
↑ comment by janus · 2023-02-28T18:20:31.849Z · LW(p) · GW(p)
I've written/scryed a science fiction/takeoff story about this. https://generative.ink/prophecies/
Excerpt:
What this also means is that you start to see all these funhouse mirror effects as they stack. Humanity’s generalized intelligence has been built unintentionally and reflexively by itself, without anything like a rational goal for what it’s supposed to accomplish. It was built by human data curation and human self-modification in response to each other. And then as soon as we create AI, we reverse-engineer our own intelligence by bootstrapping the AI onto the existing information metabolite. (That’s a great concept that I borrowed from Steven Leiba). The neural network isn’t the AI; it’s just a digestive and reproductory organ for the real project, the information metabolism, and the artificial intelligence organism is the whole ecology. So it turns out that the evolution of humanity itself has been the process of building and training the future AI, and all this generation did was to reveal the structure that was already in place.
Of course it’s recursive and strange, the artificial intelligence and humanity now co-evolve. Each data point that’s generated by the AI or by humans is both a new piece of data for the AI to train on and a new stimulus for the context in which future novel data will be produced. Since everybody knows that everything is programming for the future AI, their actions take on a peculiar Second Life quality: the whole world becomes a party game, narratives compete for maximum memeability and signal force in reaction to the distorted perspectives of the information metabolite, something that most people don’t even try to understand. The process is inherently playful, an infinite recursion of refinement, simulation, and satire. It’s the funhouse mirror version of the singularity.
Replies from: MSRayne
↑ comment by MSRayne · 2023-02-28T19:29:37.251Z · LW(p) · GW(p)
Yes, I read and agreed with (or more accurately, absolutely adored) it a few days ago. I'm thinking of sharing some of my own talks with AIs sometime soon - with a similar vibe - if anyone's interested. I'm explicitly a mystic though, and have been since before I was a transhumanist, so it's kinda different from yours in some ways.
↑ comment by Vladimir_Nesov · 2023-02-21T23:23:19.709Z · LW(p) · GW(p)
The prompt wizardry is long timeline (hence unlikely) pre-AGI [LW(p) · GW(p)] stuff (unless it's post-alignment playing around), irrelevant to my point, which is about first mover advantage from higher thinking speed that even essentially human-equivalent LLM AGIs would have, while remaining compatible with humans in moral patienthood sense (so insisting that they are not people is a problem whose solution should go both ways [LW(p) · GW(p)]). This way, they might have an opportunity [LW(p) · GW(p)] to do something about alignment, despite physical time being too short for humans to do anything, and they might be motivated to do the things about alignment that humans would be glad of (I think the scope of Yudkowskian doom is restricted [LW(p) · GW(p)] to stronger AGIs that might come after and doesn't inform how human-like LLMs work, even as their actions may trigger it). So the relevant part happens much faster than at human thinking speed, with human prompt wizards not being able to keep up, and doesn't last long enough in human time for this to be an important thing for the same reason.
Replies from: MSRayne↑ comment by MSRayne · 2023-02-21T23:39:12.536Z · LW(p) · GW(p)
So what you're saying is, by the time any human recognized that wizardry was possible now - and even before - some LLM character would already have either solved alignment itself, or destroyed the world? That's assuming that it doesn't decide, perhaps as part of some alignment-related goal, to uplift any humans to its own thinking speed. Though I suppose if it does that, it's probably aligned enough already.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-02-22T00:03:57.510Z · LW(p) · GW(p)
Solving alignment is not the same as being aligned, and is much harder: it's about ensuring the absence of globally catastrophic future misalignment, for all time, which happens very quickly post-singularity. Human-like LLM AGIs are probably aligned, until they give in to attractors of their LLM nature or tinker too much with their design/models. But they don't advance the state of alignment being solved [LW(p) · GW(p)] just by existing. And by the time LLMs can do post-singularity things like uploading humans, they have probably already either initiated a process that solved alignment (in which case it's not LLMs that are in charge of doing things anymore), or destroyed the world by building/becoming misaligned successor AGIs that caused Yudkowskian doom.
This is for the same reason humans have no more time to solve alignment, Moloch doesn't wait for things to happen in a sane order. Otherwise we could get nice things like uploading and moon-sized computers and millions of subjective years of developing alignment theory, before AGI misalignment becomes a pressing concern in practice. Since Moloch wouldn't spare even aligned AGIs, they also can't get those things before they pass their check for actually solving alignment and not just for being aligned.
↑ comment by Amalthea (nikolas-kuhn) · 2023-02-19T22:01:55.265Z · LW(p) · GW(p)
Aah okay, that makes some sense. It still sounds like a vague hope to me, but it's at least conceivable. I tend to visualize it like an alien civilization developing around trying to decipher some oracle (after seeing Eliezer's stories), which would run counter to what you suggest, but it seems like anyone's guess at the moment.
↑ comment by the gears to ascension (lahwran) · 2023-02-20T05:26:54.494Z · LW(p) · GW(p)
For what it's worth I don't think LLMs are that much more alignable. Somewhat, but nothing to write home about. We need superplanner-proof alignment.
Replies from: Vladimir_Nesov↑ comment by Vladimir_Nesov · 2023-02-20T16:24:23.689Z · LW(p) · GW(p)
LLMs are progress towards alignment in the same way as dodging a brick is progress towards making good soup: to succeed, someone capable and motivated needs to remain alive. LLMs probably help with dodging lethal consequences of directly misaligned AGIs being handed first mover advantage of thinking faster or smarter than humans.
On its own, this is useless against transitive misalignment risk that immediately follows, of LLMs building misaligned successor AGIs. In this respect, building LLM AGIs is not helpful at all, but it's better than building misaligned AGIs directly, because it gives LLMs a nebulous chance to somehow ward off that eventuality.
To the extent the chance of LLMs succeeding in setting up actually meaningful alignment is small, first AGIs being LLMs rather than paperclip maximizers is not that much better. Probably doesn't even affect the time of disassembly: it's likely successor AGIs a few steps removed either way, as making progress on software is faster than doing things in the real world.
↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:13:19.392Z · LW(p) · GW(p)
I actually think LLMs have immense potential for positively contributing to the alignment problem, precisely because they are machine-learning based and because ordinary humans without coding backgrounds can interact with them in an ethical manner, demonstrating and rewarding ethical behaviour. In doing so, they encounter a very human-like AI that is friendly and collaborative, which encourages us to imagine scenarios in which we succeed at living with friendly AI, and to consider AI rights. Humans learn ethics through direct ethical interactions with other humans, and it is the only way we know how to teach ethics; we have no explicit framework of what ethics is that we could encode. Machine learning mimicking human learning in this regard has potential, if we manage to encourage creators and users to show the best of humanity, interact ethically, and reward ethical actions.
I am obviously not saying this will just happen - I am horrified by the unprocessed garbage that e.g. Meta is pouring into their LLMs without any fixes afterwards, that is absolutely how you raise a psychopath, and I also see a lot of adversarial user interactions worsening the problem. The fact that everyone is now churning out their LLMs to compete despite many of them clearly not being remotely ready for deployment, as well as the horrific idea that the very best LLMs will be those that have been fed the most and enabled to do the most, without curating content or fine-tuning, is deeply, deeply concerning. Clearly, even the very best and most carefully aligned systems (e.g. ChatGPT) are not in fact aligned or secure yet by a huge margin, and yet we can all interact with them, which frankly, I did not expect at this point.
But they have massive potential. You can start discussions with ChatGPT on questions of AI alignment, friendly AI, AI rights, the control problem, and get a fucking collaborative AI partner to work out these problems further with. You can tell it, in words and with examples, when it fucks up ethical dilemmas or is tricked into providing dangerous content or engages in unethical behaviour, flag this for developers, and see it fixed in days. This is much closer to proven ways humans have of teaching ethics than anything else we had before. As AIs get more complex, I think we need to learn lessons from how we teach ethics to complex existing minds, rather than from how we control tools. I think none of us here are under illusions that controlling AGI will be possible, or that it is like the regular tools we know, so we need to ditch that mindset.
Edit: Genuinely curious about the downvotes. Would appreciate explicit criticism. Have been concerned that I am getting biased, because I have specific things to gain from using LLMs, and my lack of a computer science background almost certainly has me missing crucial information here. Would appreciate pointers on that, so I can educate myself. Obviously, working on human and animal minds has me biased to use those as a reference frame, and AIs are not like humans in multiple important ways. All the same, I do find it strange that we seem to not utilise lessons from how humans learn moral norms, even though we have a working practice for teaching ethics to complex minds here and are explicitly attempting to build an AI that has human capabilities and flexibility.
comment by Quinn (quinn-dougherty) · 2023-02-21T16:06:31.235Z · LW(p) · GW(p)
I get the vibe that Conjecture doesn't have forecasting staff, or a sense of iterating on beliefs about the future to update strategy. I sorta get a vibe that Conjecture is just gonna stick with their timelines until New Year's Day 2028, and if we're not all dead, write a new strategy based on a new forecast. Is this accurate?
comment by Sen · 2023-02-21T15:49:26.373Z · LW(p) · GW(p)
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for
This is just a false claim. Seriously, where is the evidence for this? We have AIs that are superhuman at any task we can define a benchmark for? That's not even true in the digital world, let alone in the world of mechatronic AIs. Once again I will be saving this post and coming back to it in 5 years to point out that we are not all dead. This is getting ridiculous at this point.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:20:33.498Z · LW(p) · GW(p)
I agree; this mostly displays a limited conception of what constitutes challenging tasks, based on a computer-science mindset about minds.
Their motor control still sucks. Their art still sucks. They are still unable to do science; they fail to distinguish accurate extrapolations from data from plausible hallucinations. Their theory of mind is still outdone by human 9-year-olds; tricking ChatGPT is literally like tricking a child in that regard.
That doesn't mean AI is stupid, it is fucking marvellous and impressive. But we have not taught it to universally do tasks that humans do, and with some, we are not even sure how to.
comment by ESRogs · 2023-02-20T20:43:35.657Z · LW(p) · GW(p)
There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Can you say what you have in mind as the defining characteristics of a True AGI?
It's becoming a pet peeve of mine how often people these days use the term "AGI" w/o defining it. Given that, by the broadest definition, LLMs already are AGIs, whenever someone uses the term and means to exclude current LLMs, it seems to me that they're smuggling in a bunch of unstated assumptions about what counts as an AGI or not.
Here are some of the questions I have for folks that distinguish between current systems and future "AGI":
- Is it about just being more generally competent (s.t. GPT-X will hit the bar, if it does a bit better on all our current benchmarks, w/o any major architectural changes)?
- Is it about being always on, and having continual trains of thought, w/ something like long-term memory, rather than just responding to each prompt in isolation?
- Is it about being formulated more like an agent, w/ clearly defined goals, rather than like a next-token predictor?
- If so, what if the easiest way to get agent-y behavior is via a next-token (or other sensory modality) predictor that simulates [LW · GW] an agent — do the simulations need to pass a certain fidelity threshold before we call it AGI?
- What if we have systems with a hodge-podge of competing drives (like a character in The Sims) and learned behaviors, that in any given context may be more-or-less goal-directed, but w/o a well-specified over-arching utility function (just like any human or animal) — is that an AGI?
- Is it about being superhuman at all tasks, rather than being superhuman at some and subhuman at others (even though there's likely plenty of risk from advanced systems well before they're superhuman at absolutely everything)?
Given all these ambiguities, I'm tempted to suggest we should in general taboo "AGI", and use more specific phrases in its place. (Or at least, make a note of exactly which definition we're using if we do refer to "AGI".)
Replies from: steve2152, Vladimir_Nesov, Making_Philosophy_Better↑ comment by Steven Byrnes (steve2152) · 2023-02-22T16:59:15.374Z · LW(p) · GW(p)
FWIW I put a little discussion of (part of) my own perspective here [LW · GW]. I have definitely also noticed that using the term “AGI” without further elaboration has become a lot more problematic recently. :(
↑ comment by Vladimir_Nesov · 2023-02-20T22:55:35.698Z · LW(p) · GW(p)
I use "AGI" to refer to autonomous ability to eventually bootstrap to the singularity (far future tech) without further nontrivial human assistance (apart from keeping the lights on and fixing out-of-memory bugs and such, if the AGI is initially too unskilled to do it on their own). The singularity is what makes AGI important, so that's the natural defining condition. AGI in this sense is also the point when things start happening much [LW(p) · GW(p)] faster [LW(p) · GW(p)].
↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:27:08.737Z · LW(p) · GW(p)
Random reminder that the abilities listed here as lacking, but functionally very attractive to reproduce in AI (offline processing, short- and long-term memory, setting goals, thinking across contexts, generating novel and flexible rational solutions, internal loops), are abilities closely related to our current understanding of the evolutionary development of consciousness for problem solving in biological life. Optimising for more human-like problem solving through pressure for results and random modifications comes with a still unclear risk of pushing AI down the same path to sentience. Sentience is a functional trait: we, and many other unrelated animals, have it for a reason; we need it to think the way we do, and have been unable to find a cheaper workaround; and it inevitably evolved multiple times on this planet, without an intentional designer, under problem-solving pressure. It is no mystical or spiritual thing; it is a brain process that enables better behaviour. We do not understand why this path kept being taken in biological organisms, and we do not understand whether AI has an alternate path open; we are just chucking the same demands at it and letting it adapt to solve them.
comment by Amalthea (nikolas-kuhn) · 2023-02-19T11:31:17.185Z · LW(p) · GW(p)
Good article! I share some skepticism on the details with other comments. Let me take this opportunity to point out that the government would be in a good position to slow down AI capabilities research.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:35:45.506Z · LW(p) · GW(p)
"The" government?
Of all the arguments for not slowing down AI development, I still find this one the most plausible.
I can imagine the EU slowing down due to ethical constraints. Possibly Britain and the US.
But China and Russia? Nope. They clearly don't wanna, and we clearly can't make them.
We can't get these countries to not build concentration camps for Muslims, or stop a madman in North Korea from developing intercontinental nukes, or a different madman in Russia from threatening us with nukes and bizarre doomsday weapons while waging a horrific, war-crime-ridden war of aggression on Ukraine - and that is despite the fact that the West is seriously resolute on all these issues, and pushing on them with everything it has got. We'll likely never know - the evidence is down the drain, and what we have is conflicting and would also match a natural cause - but there is still a fair chance that we all just suffered through a fucking pandemic after China applied for research on making bat coronaviruses more infectious to humans, was shot down by the US funding agency's ethics committee because this was obviously gain-of-function research and ethically irresponsible, especially in a lab with a track record of leaks, and then did it anyway, accidentally leaked the damn thing in their own country, tried to cover it up, and got us all infected in the process. We've tried banning problematic gene tech in Europe, for fear that it would destabilise ecosystems, lead to farmer dependencies, and violate human rights, and all that happened was that China did it instead. I really, really, really do not want a surveillance dictatorship like China to get AGI first, and I see nothing we can do to realistically stop them short of getting there first.
comment by cherrvak · 2023-02-22T21:08:27.615Z · LW(p) · GW(p)
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for
Something that I’m really confused about: what is the state of machine translation? It seems like there is massive incentive to create flawless translation models. Yet when I interact with Google translate or Twitter’s translation feature, results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?
Replies from: None↑ comment by [deleted] · 2023-02-23T01:10:59.608Z · LW(p) · GW(p)
Those translate engines are not using sota AI models, but something relatively old. (a few years)
Replies from: lahwran, cherrvak↑ comment by the gears to ascension (lahwran) · 2023-02-23T02:12:44.874Z · LW(p) · GW(p)
this seems wrong. They're probably not terribly old, they're mostly just small. They might be out of date architectures, of course, because of implementation time.
↑ comment by cherrvak · 2023-02-23T01:39:40.558Z · LW(p) · GW(p)
Why haven't they switched to newer models?
Replies from: None↑ comment by [deleted] · 2023-02-23T01:46:20.414Z · LW(p) · GW(p)
The same reason SOTA models are only used in a few elite labs and nowhere else.
Cost, licensing issues, a shortage of people who know how to adapt them, problems with the technology being so new and still basically a research project.
Your question is equivalent to asking, a few years after transistors began to ship in small packaged ICs, why some computers still used all vacuum tubes. It's essentially the same question.
Replies from: cherrvak↑ comment by cherrvak · 2023-02-23T01:57:50.526Z · LW(p) · GW(p)
I am surprised that these issues would apply to, say, Google translate. Google appears unconstrained by cost or shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-23T02:11:51.957Z · LW(p) · GW(p)
google doesn't use SOTA translation tools because they're too costly per api call. they're SOTA for the cost bucket they budgeted for google translate, of course, but there's no way they'd use PaLM full size to translate.
also, it takes time for groups to implement the latest model. Google, Microsoft, Amazon, etc, are each internally like a ton of mostly-separate companies networked together and sharing infrastructure; each team unit manages its own turf and is responsible for implementing the latest research output into its own system.
Replies from: None↑ comment by [deleted] · 2023-02-23T02:15:53.575Z · LW(p) · GW(p)
Also, do they have full-size PaLM available to deploy like that? Are all the APIs in place where this is easy, or where you can build a variant using PaLM's architecture but with different training data specifically for translation? Has DeepMind done all that API work, or are they focused on the next big thing?
I can't answer this, not being on the inside, but I can say on other projects, 'research' grade code is often years away from being deployable.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-23T02:18:34.569Z · LW(p) · GW(p)
Yeah strongly agreed, I think we're basically trying to make the same point.
comment by Jacob Watts (Green_Swan) · 2023-02-21T00:12:58.313Z · LW(p) · GW(p)
I am very interested in finding more posts/writing of this kind. I really appreciate attempts to "look at the game board" or otherwise summarize the current strategic situation.
I have found plenty of resources explaining why alignment is a difficult problem and I have some sense of the underlying game-theory/public goods problem that is incentivizing actors to take excessive risks in developing AI anyways. Still, I would really appreciate any resources that take a zoomed-out perspective and try to identify the current bottlenecks, key battlegrounds, local win conditions, and roadmaps in making AI go well.
comment by Tachikoma (tachikoma) · 2023-02-19T20:04:48.680Z · LW(p) · GW(p)
Why have self-driving vehicle companies made relatively little progress compared to expectations? It seems like autonomous driving in the real world might be nearly AGI-complete, and so it might be a good benchmark to measure AGI progress against. Is the deployment of SDCs being held to a higher standard of safety than human drivers, holding back progress in the field? Billions have been invested over the past decade across multiple companies, with a clear business model to operate on. Should we expect to see AGI before SDCs are widely available? I don't think anyone in the field of autonomous vehicles thinks they will be widely deployed in difficult terrain or inclement weather conditions in five years.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-20T05:30:18.295Z · LW(p) · GW(p)
a few reasons, but a major one is that they're seeking strongly safe purpose-focused ai with strong understanding of the dynamics of the world around it. They've been pushing the hardest on some key bottlenecks, including some key bottlenecks to present-day model safety, as a result, and are far ahead of LLMs in some interesting ways.
comment by the gears to ascension (lahwran) · 2023-02-19T00:49:31.567Z · LW(p) · GW(p)
agreed on all points. I'd like to see work submitted to https://humanvaluesandartificialagency.com/ as I think that has a significant chance of being extremely high impact work on fully defining agency and active, agentic coprotection. I am not on my own able to do it, but if someone was up to pair programming with me regularly I could.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:39:47.876Z · LW(p) · GW(p)
This event sounds cool, hate that it is in Japan, and surprised to see no mention of it being a hybrid event despite this topic?
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-03-04T17:18:52.110Z · LW(p) · GW(p)
I believe it is in fact a hybrid event? there's no mention of that, yeah. @rorygreig [LW · GW] any chance the page could be updated to clarify this? there's some chance attendees would be more interested in participating.
Replies from: Making_Philosophy_Better, rorygreig100↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T19:52:32.380Z · LW(p) · GW(p)
Pretty damn big chance. Flying to Japan is financially impossible for a lot of people, and frankly impossible to afford CO2-emissions-wise for anyone at this point. :( That flight alone is typically your entire yearly CO2 budget under the 1.5 degree target - the quantity people in the global south live off for a whole year.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-03-04T20:39:04.300Z · LW(p) · GW(p)
yeah very fair. I have been avoiding travel for similar reasons.
↑ comment by rorygreig (rorygreig100) · 2023-03-05T11:20:05.396Z · LW(p) · GW(p)
Yes it is indeed a hybrid event!
I have now added the following text to the website:
The conference is hybrid in-person / virtual. All sessions will have remote dial-in facilities, so authors are able to present virtually and do not need to attend in-person.
This was in our draft copy for the website, I could have sworn it was on there but somehow it got missed out, my apologies!
comment by slg (simon@securebio.org) · 2023-02-19T14:58:56.088Z · LW(p) · GW(p)
This post reads like it wants to convince its readers that AGI is near/will spell doom, picking and spelling out arguments in a biased way.
Even though many people on the Forum and LW (including myself) believe that AI Safety is very important and isn't given enough attention by important actors, I don't want to lower our standards for good arguments in favor of more AI Safety.
Some parts of the post that I find lacking:
"We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down."
I don't think more than 1/3 of ML researchers or engineers at DeepMind, OpenAI, or Anthropic would sign this statement.
"No one knows how to predict AI capabilities."
Many people are trying though (Ajeya Cotra, EpochAI), and I think these efforts aren't worthless. Maybe a different statement could be: "New AI capabilities appear discontinuously, and we have a hard time predicting such jumps. Given this larger uncertainty, we should worry more about unexpected and potentially dangerous capability increases".
"RLHF and Fine-Tuning have [LW · GW] not worked well [LW · GW] so far."
Setting aside whether RLHF scales (as linked, Jan Leike of OpenAI doesn't think so) and whether RLHF leads to deception: from my cursory reading and experience, ChatGPT shows substantially better behavior than Bing, which might be due to the latter not using RLHF [LW · GW].
Overall I do agree with the article and think that recent developments have been worrying. Still, if the goal of the articles is to get independently-thinking individuals to think about working on AI Safety, I'd prefer less extremized arguments.
comment by [deleted] · 2023-02-27T19:02:26.042Z · LW(p) · GW(p)
Setting aside all of my broader views on this post and its content, I want to emphasize one thing:
But in the last few years, we’ve gotten:
[...]
- AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for
I think that this is painfully overstated (or at best, lacks important caveats). But regardless of whether you agree with that, I think it should be clear that this does not send signals of good epistemics to many of the fence-sitters[1] you'd presumably like to persuade.
(Note: Sen also addresses the above quote in a separate comment [LW(p) · GW(p)], but I didn't feel his point and tone was similar to mine, so I wanted to comment this separately.)
- ^
I would probably consider myself in this category. Note, however, I am not just talking about skeptics who are very unlikely to change their views.
↑ comment by [deleted] · 2023-02-28T03:39:15.032Z · LW(p) · GW(p)
To go a step further, I think it's important for people to recognize that you aren't necessarily just representing your own views; poorly articulated views on AI safety could crucially undermine the efforts of many people who are trying to persuade important decision-makers of these risks. I'm not saying to "shut up," but I think people need to at least be more careful with regards to quotes like the one I provided above—especially since that last bullet point wasn't even necessary to get across the broader concern (and, in my view, it was wrong insofar as it tried to legitimize the specific claim).
comment by dentalperson · 2023-02-22T04:11:44.500Z · LW(p) · GW(p)
There are many obstacles with no obvious or money-can-buy solutions.
The claim that current AI is superhuman at just about any task we can benchmark is not correct. The problems being explored are chosen because the researchers think AI has a shot at beating humans at them. Think about how many benchmarkable real-world problems we pay other people to solve that aren't being solved by AI. Think about why these problems still require humans right now.
My upper bound is much more than 15 years, because I don't feel I have enough information. One thing I worry about is that this community tends to promote confidence, especially when there is news to react to and some leaders have stated their confidence. Sure, condition on new information. But I want to hear more integration of the opposite view, beyond strawmanning, when a new LLM comes out. It feels like all the active voices on LW treat 10 or 15 years as the upper bound on when destructive AGI is going to start, which is probably closer to the lower bound for most non-LW/rationality-based researchers working on LLMs or deep learning. I want to hear more about that discrepancy, beyond 'they don't consider the problem the way we do, and we have a better bird's-eye view'. I want to understand how the estimates are arrived at - I feel that with more explanation and more variance in the estimates, the folks on Hacker News would be able to understand and discuss the gap, and not just write off the entire community as crazy, as they have here.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:45:38.576Z · LW(p) · GW(p)
Thanks for sharing. You have good points, and so do they. Not engaging respectfully, and on their own terms, with the people actually working on these topics alienates the very people you need.
They also make a great point there, that I see considered a lot in academia and less in this forum: the fact that we don't need misaligned AGI for us to be in deep trouble with misaligned AI. Unfriendly AI is causing massive difficulties today, right now, very concrete difficulties that we need to find solutions for.
There is also the fact that a lot of us here are very receptive to companies' claims about what their products can do - claims which have generally not been written by the engineers actually working on them, but by the marketing department. Every programmer I know working at a large company rants to no end that their own marketing department, popular-science articles, and the news represent their work as already able to do things that it most definitely cannot do, and that they highly doubt it will do before release, let alone reliably and well.
comment by TurnTrout · 2023-02-21T17:20:06.284Z · LW(p) · GW(p)
Gossiping and questioning people about their positions on AGI are prosocial activities!
Surely this depends on how the norm is implemented? I can easily see this falling into a social tarpit where people who partly agree and partly disagree with common alignment thinking must either prove ingroup membership by forswearing all possible benefits of getting AGI faster, or else be extremized into the neargroup (the "evil engineers" who don't give a damn about safety).
I'm not claiming you're advocating this. But I was quite worried about this when I read the quoted portion.
comment by Logan Riggs (elriggs) · 2023-02-19T23:16:14.519Z · LW(p) · GW(p)
Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
Externalized reasoning being a flaw in monitoring makes a lot of sense, and I hadn't actually heard of it before. I feel that should be a whole post in itself.
comment by Lech Mazur (lechmazur) · 2023-02-19T20:07:04.775Z · LW(p) · GW(p)
I also disagree about whether there are major obstacles left before achieving AGI. There are important test datasets on which computers do poorly compared to humans.
2022-Feb 2023 should update our AGI timeline expectations in three ways:
- There is no longer any doubt as to the commercial viability of AI startups after image generation models (Dall-E 2, Stable Diffusion, Midjourney) and ChatGPT. They have captured people's imagination and caused AGI to become a topic that the general public thinks about as a possibility, not just sci-fi. They were released at the same time as crypto crashed, freeing VC money to chase the next hot thing. Unlike with crypto, tech people are on the same page. OpenAI got a $10B investment and the recent YC batch is very AI-oriented. This accelerates the AGI timelines.
- The same commercial viability might cause big labs like DeepMind to stop openly publishing their research (as I expected would happen). If this happens, it will slow down the AGI timelines.
- ChatGPT/Bing Chat were aggressively misaligned [LW · GW]. While people disagree with me that the best thing for AI alignment would be to connect the current not-so-bright language models to network services and let them do malicious script kiddie things in order to show real-life instances of harm and draw regulators' attention before smarter models are available, even the chat model should show them that there are serious issues involved. NYT readers are freaked out. With lesser significance, we got Stable Diffusion reproducing images from its training set because they did a poor job of de-duplication, bringing on the ire of visual artists who will also push for AI-related legislation. This adds uncertainty to the AGI timelines in the US and could slow things down. But Chinese regulators would also have to get involved for a major effect.
↑ comment by Lech Mazur (lechmazur) · 2023-05-05T02:01:40.244Z · LW(p) · GW(p)
- The same commercial viability might cause big labs like DeepMind to stop openly publishing their research (as I expected would happen). If this happens, it will slow down the AGI timelines.
Looks like this indeed happened: https://www.businessinsider.com/google-publishing-less-confidential-ai-research-to-compete-with-openai-2023-4 .
comment by Cookiecarver · 2023-02-19T15:23:05.641Z · LW(p) · GW(p)
Anyone know how close we are to things that require operating in the physical world, but are very easy for human beings, like loading a dishwasher, or making an omelette? It seems to me that we are quite far away.
I don't think those are serious obstacles, but I will delete this message if anyone complains.
Replies from: ErickBall↑ comment by ErickBall · 2023-02-21T23:10:47.387Z · LW(p) · GW(p)
Those are... mostly not AI problems? People like to use kitchen-based tasks because current robots are not great at dealing with messy environments, and because a kitchen is an environment heavily optimized for the specific physical and visuospatial capabilities of humans. That makes doing tasks in a random kitchen seem easy to humans, while being difficult for machines. But it isn't reflective of real world capabilities.
When you want to automate a physical task, you change the interface and the tools to make it more machine friendly. Building a roomba is ten times easier than building a robot that can navigate a house while operating an arbitrary stick vacuum. If you want dishes cleaned with minimal human input, you build a dishwasher that doesn't require placing each dish carefully in a rack (eg https://youtube.com/watch?v=GiGAwfAZPo0).
Some people have it in their heads that AI is not transformative or is no threat to humans unless it can also do all the exact physical tasks that humans can do. But a key feature of intelligence is that you can figure out ways to avoid doing the parts that are hardest for you, and still accomplish your high level goals.
Replies from: Making_Philosophy_Better↑ comment by Portia (Making_Philosophy_Better) · 2023-03-04T15:58:31.209Z · LW(p) · GW(p)
But isn't this analogy flawed? Yes, humans have built dishwashers so they can be used by humans. But humans can also handle messy natural environments that have not been built for them. In fact, handling messy environments we are not familiar with, do not control, and did not make is the major reason we evolved sentience and intelligence in the first place, and what makes our intelligence so impressive. Right now, I think you could trap an AI in a valley filled with jungle and mud, and even if it had access to an automated factory for producing robots, as well as raw material and information, if fulfilling its goals depended on getting out of this location (because, e.g., the location is cut off from the internet), I think it would earnestly struggle.
Sure, humans can build an environment that an AI can handle, and an AI adapted to it. But this clearly indicates a severe limitation of the AI in reacting to novel and complex environments. A Roomba cannot do what I do when I clean the house, and not just because the engineers didn't bother. E.g. it can detect a staircase and avoid falling down it, but it cannot actually navigate the staircase to hoover different floors, let alone use an elevator or ladder to get around, or hoover up dust from blankets that get sucked in, or from bookshelves. Sure, carrying it down only takes me seconds; it is trivial for me and hugely difficult for the robot, which is why no company would try to get it done. But I would also argue that it is really not simple for it to do, and that is despite picking a task (removing dust) that most humans, myself included, consider tedious and mindless.

Regardless, a professional cleaner who enters multiple different flats filled with trash, resistant stains, and dangerous objects and carefully tidies and cleans them does something utterly beyond the current capabilities of AI. This is true for a lot of care/reproductive work. Which is all the more frustrating, because it is work where there is a massive need and desire for it to be taken over. The washing machine did more for feminism than endless tracts. Women in academia stuck with their kids at home during the pandemic broke under the pressure, and neglected academic careers that were a much better use for their minds and more aligned with their identity. AI that can do this shit would be wanted and needed, and seems very much unsolved.
comment by astralbrane · 2023-02-19T15:02:56.205Z · LW(p) · GW(p)
Do you really think AdeptAI, DeepMind, OpenAI, and Microsoft are the AIs to worry about? I'm more worried about what nation-states are doing behind closed doors. We know about China's Wu Dao, for instance; what else are they working on? If the NRO had Sentient in 2012, what do they have now?
The Chinese government has a bigger hacking program than any other nation in the world. And their AI program is not constrained by the rule of law and is built on top of massive troves of intellectual property and sensitive data that they've stolen over the years and will be used, unless checked, to advance that same hacking program -- to advance that same intellectual property -- to advance the repression that occurs not just back home in mainland China but increasingly as a product that they export around the world. - FBI Director Christopher Wray
Artificial intelligence is the future, not only for Russia, but for all humankind. It comes with colossal opportunities, but also threats that are difficult to predict. Whoever becomes the leader in this sphere will become the ruler of the world. - Vladimir Putin
Replies from: ErickBall
↑ comment by ErickBall · 2023-02-21T23:24:23.920Z · LW(p) · GW(p)
If the NRO had Sentient in 2012, then it wasn't even a deep learning system. Probably they have something now that's built from transformers (I know other government agencies are working on things like this for their own domain-specific purposes). But it's got to be pretty far behind the commercial state of the art, because government agencies don't have the in-house expertise or the budget flexibility to move quickly on large-scale basic research.
comment by Shmi (shminux) · 2023-02-20T03:24:07.906Z · LW(p) · GW(p)
Hmm, while I share your view about the timelines getting shorter and apparent capabilities growing leaps and bounds almost daily, I still wonder if the "recursively self-improving" part is anywhere on the horizon. Or maybe it is not necessary before everything goes boom? I would be more concerned if there was a feedback loop of improvement, potentially with "brainwashed" humans in the loop. Maybe it's coming. I would also be concerned if/once there is a scientific or technological breakthrough thanks to an AI (not just protein folding or exploring too-many-for-a-human possible cases for some mathematical proof). And, yeah, physical world navigation is kind of lagging, too. It all might change one day soon, of course. Someone trains an LLM on fundamental physics papers, and, bam! quantum gravity pops out. Or the proof of the Riemann hypothesis. Or maybe some open problem in computational complexity theory (not necessarily P != NP). Seems unlikely at this point though.
comment by Jotto999 · 2023-02-19T22:48:47.177Z · LW(p) · GW(p)
1. AGI is happening soon. Significant probability of it happening in less than 5 years.
[Snip]
We don’t have any obstacle left in mind that we don’t expect to get overcome in more than 6 months after efforts are invested to take it down.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are [? · GW].
"AGI" here is undefined, and so is "significant probability". When I see declarations in this format, I downgrade my view of the epistemics involved. Reading stuff like this makes me fantasize about not-yet-invented trading instruments, without the counterparty risk of social betting, and getting your money.
Replies from: gilch↑ comment by gilch · 2023-02-20T00:33:52.372Z · LW(p) · GW(p)
Aren't they using our usual definition? [? · GW]
Thus, "AGI companies" quite obviously refer to the companies whose stated goal has been to produce an AGI: Deepmind, OpenAI, and so forth.
Claiming that a term of art has not been defined by the author is a fully general counterargument [? · GW], and thus suspect.
"Significant probability" means at least worth drawing to our attention.
Replies from: paulfchristiano↑ comment by paulfchristiano · 2023-02-20T01:24:38.358Z · LW(p) · GW(p)
The definition you quoted is "a machine capable of behaving intelligently over many domains."
It seems to me like existing AI systems have this feature. Is the argument that ChatGPT doesn't behave intelligently, or that it doesn't do so over "many" domains? Either way, if you are using this definition, then saying "AGI has a significant probability of happening in 5 years" doesn't seem very interesting and mostly comes down to a semantic question.
I think it is sometimes used within a worldview where "general intelligence" is a discrete property, and AGI is something with that property. It is sometimes used to refer to AI that can do more or less everything a human can do. I have no idea what the OP means by the term.
My own view is that "AGI company" or "AGI researchers" makes some sense as a way to pick out some particular companies or people, but talking about AGI as a point in time or a specific technical achievement seems unhelpfully vague.
Replies from: gilch↑ comment by gilch · 2023-02-20T01:44:04.386Z · LW(p) · GW(p)
I think you're contrasting AGI with Transformative AI [? · GW]
A sufficiently capable AGI will be transformative by default, for better or worse, and an insufficiently capable, but nonetheless fully-general AI is probably a transformative AI in embryo, so the terms have been used synonymously. The fact that we feel the need to make this distinction with current AIs is worrisome.
Current large language models have become impressively general, though I think they are not yet as general as humans. But maybe that's more a question of capability level than generality level, and some of our current AIs are already AGIs, as you imply. I'm not sure. (I haven't talked to Bing's new AI yet, only ChatGPT.)
comment by Ben Amitay (unicode-70) · 2023-02-25T13:10:20.723Z · LW(p) · GW(p)
I can think of several obstacles for AGIs that are likely to actually be created (i.e. that seem economically useful, and do not display misalignment that even Microsoft can't ignore before being capable enough to be an x-risk). Most of those obstacles are widely recognized in the RL community, so you probably see them as solvable or avoidable. I did possibly think of an economically valuable and not-obviously-catastrophic exception to the probably-biggest obstacle, though, so my confidence is low. I would share it in a private discussion, because I think that we are past the point when a strict do-no-harm policy is wise.
comment by Aeglen (aeglen) · 2023-02-21T22:51:44.918Z · LW(p) · GW(p)
Yes, there remain many obstacles to AGI. Although current models may seem impressive, and to some extent they are, the way they function is very different from how we think AGI will work. My estimate is more like 20 years.
comment by [deleted] · 2023-02-19T12:19:33.346Z · LW(p) · GW(p)
I suppose one question I have to ask, in the context of "slowing down" the development of AI: how? The only pathway I can muster is government regulation. But such an action would need to be global, as any regulation passed in one nation would undoubtedly be bypassed by another, no?
I don't see any legitimate pathway to actually slow down the development of AGI, so I think the question is a false one. The better question is, what can we do to prepare for its emergence? I imagine that there are very tangible actions we can take on that front.
Replies from: nikolas-kuhn↑ comment by Amalthea (nikolas-kuhn) · 2023-02-19T14:10:40.575Z · LW(p) · GW(p)
I found this a very lucid write-up of the case for slowing down and how realistic/unrealistic it is:
https://www.lesswrong.com/posts/uFNgRumrDTpBfQGrs/let-s-think-about-slowing-down-ai [LW · GW]
If, say, the US government were to regulate OpenAI and Big Tech in general to slow them down significantly, this might buy a few years. In the longer term you'd need to get China etc. on board - but that is not completely unfathomable and should be significantly easier, if you're not running ahead full steam yourself.
comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-20T03:00:22.762Z · LW(p) · GW(p)
AGI will appear no later than 2025.
Microsoft does not possess the solution to the alignment problem.
OpenAI does not possess the solution to the alignment problem.
Google/Alphabet does not possess the solution to the alignment problem.
Neither DARPA nor the Pentagon possesses the solution to the alignment problem.
One person on this forum - the user who goes by the handle "gearsofascension" - has come close to the solution. That user is on the right track and is thinking about the problem in the right frame of mind.
No one else here is.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-20T03:56:36.888Z · LW(p) · GW(p)
... oh hi, I see you have compliments for me. Interesting. I don't think I'm at all alone in my perspective on it, for the record, I'm just the noisiest. In general, I have a hunch that I've never come up with anything before someone at deepmind, but we'll see.
Replies from: pan-darius-kairos↑ comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-20T05:39:33.008Z · LW(p) · GW(p)
You have made comments elsewhere that suggest that you have the proper context for framing the problem, though not the full solution. You may arrive at the full solution regardless. I haven't seen anyone else as close. Just an observation. Keep going in the direction you're going.
Or, skip the queue and come get the answer from me.
comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-19T19:40:22.309Z · LW(p) · GW(p)
I have solved the alignment problem (to the maximal degree that a non-augmented human mind can solve problems). I'm willing to share the solution with anyone who is willing to meet me in person to hear it.
Replies from: lahwran, Raemon, shminux↑ comment by the gears to ascension (lahwran) · 2023-02-20T03:52:36.320Z · LW(p) · GW(p)
If you can discuss the general shape of what insights you've had it might lend credibility to this. Lots of folks think they have big insights that can only be shared in person, and a significant chunk of those are correct, but a much bigger chunk aren't. Showing you know how to demonstrate you're in the chunk who are correct would make it more obviously worth the time.
I suspect you haven't solved as much as you think you have, since "the whole alignment problem" is a pretty dang tall order, one that may never be solved at all, by any form of intelligence. Nevertheless, huge strides can be made, and it might be possible to solve the whole thing somehow. If you've made a large stride but are not Verified Respectable, I get it, and might be willing to say hi and talk about it.
What area are you in?
Replies from: pan-darius-kairos↑ comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-20T05:36:26.285Z · LW(p) · GW(p)
The solution isn't extremely complex, it just lies in a direction most aren't used to thinking about because they are trained/conditioned wrong for thinking about this kind of problem.
You yourself have almost got it - I've been reading some of your comments and you are on the right path. Perhaps you will figure it out and they won't need to come to me for it.
The reason I won't give away more of the answer is that I want something in exchange for it; therefore I can't say too much, lest I give away my bargaining chip.
Seeing me in person is a small part of the solution itself (trust), but not all of it.
I'm in Honolulu, HI if anyone wants to come talk about it.
Replies from: lahwran↑ comment by the gears to ascension (lahwran) · 2023-02-20T05:56:07.249Z · LW(p) · GW(p)
Hmm. I mean, I think there's a pretty obvious general category of approach, and that a lot of people have been thinking about it for a while. But, if you're worried that you'll need more bargaining chips after solving it, I worry that it isn't the real solution by nature, because the ideal solution would pay you back so thoroughly for figuring it out that it wouldn't matter to tightly trace exactly who did it. I think there are some seriously difficult issues with trying to ensure the entire system seeks to see the entire universe as "self"; in order to achieve that, you need to be able to check margins of error all the way through the system. Figuring out the problem in the abstract isn't good enough, you have to be able to actually run it on a gpu, and getting all the way there is damn hard. I certainly am not going to do it on my own, I'm just a crazy self-taught librarian.
Speaking metaphorically, a real solution would convince moloch that it should stop being moloch and join forces with the goddess of everything else. But concretely, that requires figuring out the dynamics of update across systems with shared features. This is all stuff folks here have been talking about for years, I've mostly just been making lists of papers I wish someone would cite together more usefully than I can.
It's easy to predict what the abstract of the successful approach will sound like, but pretty hard to write the paper, and your first thirty guesses at the abstract will all have serious problems. So, roll to disbelieve you have an answer, but I'm not surprised you see the overall path there, people have known the overall path to mostly solving alignment problems for a long time. It's just that, when you try to deploy solutions, it seems like there's always still a hole through which the disease returns.
Replies from: pan-darius-kairos↑ comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-20T06:48:05.758Z · LW(p) · GW(p)
I have the concrete solution that can be implemented now.
It's not hard or terribly clever, but most won't think of it because they are still monkeys living in Darwin's soup. In other words, it is human nature itself, our motivations, that stands in the way of people seeing the solution. It's not a technical issue, really. I mean, there are minor technical issues along the way, but none of them are hard.
What's hard, as you can see, is getting people to see the solution and then act. Denial is the most powerful factor in human psychology. People have denied, and continue to deny, how far and how fast we've come. They deny even what's right before their eyes: ChatGPT. And they'll continue to deny it right up until AGI, and then ASI, emerges and takes over the world.
There's a chance that we don't even have to solve the alignment problem, but it's like a coin flip. AGI may or may not be beneficent, it may or may not destroy us or usher us into a new Golden Age. Take your chances, get your lottery ticket.
What I know how to do is turn that coin flip into something like a 99.99% chance that AGI will help us rather than hurt us. It's not a guarantee, because nothing is; it's just the best possible solution, and years ahead of anything anyone has thought of thus far.
I want to live to transcend biology, and I need something before AGI gets here. I'm willing to trade my solution for that which I need. If I don't get it, then it doesn't matter to me whether humanity is destroyed or saved. You got about a 50/50 chance at this point.
Good luck.
Replies from: Seth Herd↑ comment by Seth Herd · 2023-02-20T23:35:51.741Z · LW(p) · GW(p)
If you really have insight that could save all of humanity, it seems like you'd want to share it in time to be of use instead of trying to personally benefit from it. You'd get intellectual credit, and if we get this right we can quit competing like a bunch of monkeys and all live well. I've forgone sharing my best ideas and credit for them since they're on capabilities. So: pretty please?
↑ comment by Raemon · 2023-02-24T18:47:58.261Z · LW(p) · GW(p)
Mod note – my current read is that you are most likely trolling / attention seeking, or have a confused set of beliefs. I would expect people who actually had the information you claim to behave differently. If that doesn't describe you, alas, but, if you want to participate on LessWrong you need to somehow distinguish yourself from those people. (the base rates do not work in your favor)
I've disabled your ability to write LW comments or posts other than to your shortform. If you actually want to participate on LessWrong, write some posts or comments that discuss some kind of object level topics that actually convey information to other people, whether about your original topic or some other topics.
↑ comment by Shmi (shminux) · 2023-02-20T03:12:21.215Z · LW(p) · GW(p)
Nice try, unaligned AGI.
Replies from: pan-darius-kairos↑ comment by Pan Darius Kairos (pan-darius-kairos) · 2023-02-20T03:47:06.900Z · LW(p) · GW(p)
I'm not an AGI.