Large Language Models Suggest a Path to Ems

post by anithite (obserience) · 2022-12-29T02:20:01.753Z · LW · GW · 2 comments

Contents

  Alignment?
  Reasons for Hope
  Could this go Superhuman?
  Capabilities Generalization?
  Isn't this just ems?
  Won't a country with no ethics boards do this first?
  Practical next steps (no brain implants)

TL:DR:

TL:DR end

Supposition: there is a scale of difficulty for learning intelligence, from hardest to easiest:

  1. From scratch in a simulated world via RL (reinforcement learning) (AlphaGo Zero, XLand, "Reward is Enough")
  2. From humans via imitation learning (LLMs, RL pretrained on human play (Minecraft), human children)
  3. Distillation (learning from a teacher whose internals are visible, i.e. a white-box model)

Distillation in a nutshell:
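In the ML sense, distillation trains a small "student" model to match a larger "teacher" model's softened output distribution instead of hard labels. A minimal numpy sketch of the standard soft-target loss; the function names and the temperature value are illustrative, not from any particular library:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softened probability distribution; higher temperature -> flatter targets."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs.

    The student sees the teacher's full output distribution (a white-box
    signal), which is far denser than a sparse reward or a single hard label.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

When the student matches the teacher the loss is zero; any mismatch produces a gradient signal on every output at once, which is why distillation sits at the easy end of the scale above.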

Supposition: why this works

My pitch: train AI from humans via brain-implant-generated data.

How this might work:
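The core training signal the post proposes (and restates in the comments) is supervising a model on both the human's behavior and the implant-recorded intermediate brain state. A toy sketch of such a combined loss; the function name, array shapes, and `aux_weight` hyperparameter are all illustrative assumptions, not a worked-out method:

```python
import numpy as np

def em_training_loss(pred_action, human_action,
                     pred_state, recorded_state, aux_weight=0.5):
    """Behavior cloning plus a white-box auxiliary term.

    pred_action / human_action: model output vs. observed human behavior.
    pred_state / recorded_state: model activations vs. implant-recorded
    neural state. The auxiliary term is what distinguishes this from plain
    imitation learning: it supervises the model's internals, like
    distillation from a (biological) teacher.
    """
    behavior_loss = np.mean((np.asarray(pred_action) - np.asarray(human_action)) ** 2)
    neural_loss = np.mean((np.asarray(pred_state) - np.asarray(recorded_state)) ** 2)
    return float(behavior_loss + aux_weight * neural_loss)
```

The design choice mirrors the scale above: matching internal state turns a hard imitation problem into something closer to distillation, at the cost of needing invasive recordings.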

Problems:

Alignment?

Obviously, get training data from kind non-psychopaths. If the resulting models behave like the humans they're modeled on, they'll be aligned because they're essentially copies. Problems arise if the copies are damaged in some way, especially if that damage leads to loss of empathy, insanity, or the like.

Reasons for Hope

RL and other pure-AI approaches seem to require a lot of tinkering to get working. Domain-specific agents designed around a task can do well, but AGI remains elusive. These approaches succeed when training data can be generated through self-play (chess, Go, DOTA) or where human world models don't apply (AlphaFold for protein folding, FermiNet for quantum mechanics), and in those domains the AI beats the best human-designed algorithms. But pure RL approaches have to learn a world model and a policy from scratch, whereas human-data-based approaches get both from the training dataset. This is an after-the-fact rationalisation of why LLMs have had so much success.

As a real-life example, RL approaches have a long way to go before they can solve real-world action-planning problems, which LLMs get more or less for free (PaLM-SayCan).
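PaLM-SayCan ranks candidate robot skills by combining the language model's estimate of how useful a skill is for the instruction with a learned estimate of whether the skill can currently succeed. A toy stand-in for that scoring rule; the hard-coded numbers and skill names are made up for illustration (the real system queries an LLM and a learned value function):

```python
def rank_skills(usefulness, feasibility):
    """SayCan-style scoring: p(useful | instruction) * p(can succeed | state).

    usefulness: skill -> LLM-derived probability the skill helps the task.
    feasibility: skill -> affordance estimate that the skill succeeds now.
    """
    scores = {s: usefulness[s] * feasibility[s] for s in usefulness}
    return max(scores, key=scores.get)

# Toy numbers for "bring me a drink" while the robot is far from the fridge:
usefulness = {"open fridge": 0.6, "grasp can": 0.3, "go to fridge": 0.5}
feasibility = {"open fridge": 0.1, "grasp can": 0.05, "go to fridge": 0.9}
best = rank_skills(usefulness, feasibility)  # "go to fridge" scores 0.45
```

The LLM contributes the world knowledge (which steps make sense); the affordance model keeps the plan grounded, which is exactly the planning ability pure RL struggles to learn from scratch.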

Could this go Superhuman?

Larger models should just end up overparameterized. As with any ML system, though, all bets are off if you add 20 zeroes to the scale.

Capabilities Generalization?

Capabilities research for imitation AI shouldn't transfer well to RL, because the core problem with RL is training-signal sparsity and the resulting credit-assignment problem. Knowing how the adult human brain works might help, but the data won't (hopefully) contain information on early childhood development, which is a good thing for ASI risk: early childhood brain-development details would be very useful to anyone designing RL-based AIs. Additionally, the human brain uses feedback connections extensively. Distilling an existing human brain from lots of intermediate neuron-level data should stabilize training, but from-scratch training of a similar network won't work with modern methods.

The bad scenario is if the collected data allows reverse-engineering a general-purpose learning algorithm; maybe the early childhood info isn't that important after all. Someone then clones a GitHub repo, adds some zeroes to the config, and three months into the training run the world ends. Neuralink has data from live animal brains collected over month-long timespans. The world hasn't ended despite this, which lowers my credence that this sort of data is dangerous. Then again, they might be politely sitting on something very dangerous.

If capabilities don't transfer, there's plausibly some buffer time between AGI and ASI. Scalable human-level AGI based on smart, technically capable non-psychopaths is a very powerful tool that might be enough to solve big problems. There are a number of plausible okay-ish outcomes that could see us not all dying! That's a win in my book.

Isn't this just ems?

Yes, yes it is ... probably. In an ideal world maybe we wouldn't do this? On the one hand, human-imitation-based AGIs are in my moral circle, and abusing them is bad. On the other hand, some humans are voluntary workaholics and/or lack no-messing-with-my-brain terminal values, such that they would approve of becoming aligned workaholics or contributing to an aligned workaholic gestalt, especially in pursuit of altruistic goals. Outcomes where ems don't end up in virtual hell are acceptable to me, or at least better than the alternative (paperclips). Best case, em (AI/conventional) programmers automate the boring stuff so no morally valuable entities have to do mindless work. There are likely paths where high-skill ems won't be forced to do things they hate. The status quo is worse in some ways: lots of people hate their jobs.

Won't a country with no ethics boards do this first?

They might have trouble trusting the resulting ems; you could say they'd have a bit of an AGI alignment problem of their own. Also, democratic countries have an oligopoly on leading-edge chip manufacturing, so large-scale deployment might be tough. Another country could also steal some em copies and offer them better working conditions. There's a book to be written in all of this somewhere.

Practical next steps (no brain implants)

If this can be shown to work, and to work efficiently, starting the real-world biotech side becomes more promising.

2 comments


comment by Gordon Seidoh Worley (gworley) · 2022-12-29T22:51:13.119Z · LW(p) · GW(p)

Interesting idea, but I'm suspicious that LLMs are enough for us to accept these as ems. I think it's more likely people will treat such trained models not as true ems but as ghosts, fixed on who the person was when the model was trained.

The idea from fiction that came to mind is the people in portraits in Harry Potter.

Of course, such a thing is still pretty useful! I'm just not sure LLMs are good enough at the sort of online learning, ontological shifts, and other complex things we expect from people and thus from ems.

comment by anithite (obserience) · 2022-12-30T04:03:37.037Z · LW(p) · GW(p)

This is just a way to take a bunch of humans and copy-paste them until current pressing problems are solvable. If public opinion doesn't affect deployment, it doesn't matter.

Models that can't learn or change don't go insane. Fine-tuning on later brain data, once subjects have learned a new capability, can substitute for online learning. Getting the em/model to learn in silicon is a problem to solve after there's a working model.

I edited the TL:DR to better emphasize that the preferred implementation is using brain data to train whatever shape of model the data suggests, not necessarily transformers.

The key point is that using internal brain state to train an ML model to imitate a human is probably the fastest way to get a passable copy of that human, and that's AGI solved.