Comments sorted by top scores.
comment by Yair Halberstadt (yair-halberstadt) · 2022-06-24T06:38:32.826Z · LW(p) · GW(p)
If an army of human-level AGIs could work together to solve problems we currently can't, and do so superhumanly fast, then combined they would effectively be an AGI, and we would have to make sure they were aligned with us first.
↑ comment by joshc (joshua-clymer) · 2022-06-24T18:21:44.098Z · LW(p) · GW(p)
I know that this is a common argument against amplification, but I've never found it super compelling.
People often point to evil corporations to show that unaligned behavior can emerge from aligned humans, but I don't think this analogy is very strong. Humans in fact do not share the same goals and are generally competing with each other over resources and power, which seems like the main source of inadequate equilibria to me.
If everyone in the world were a copy of Eliezer, I don't think we would have a coordination problem around building AGI. They would probably have an Eliezer government that is constantly looking out for emergent misalignment and suggesting organizational changes to squash it. Since everyone in this world is optimizing for making AGI go well, and not for profit or status among their Eliezer peers, all you have to do is tell them what the problem is and what they need to do to fix it. You don't have to threaten them with jail time or worry that they will exploit loopholes in Eliezer law.
I think it is quite likely that I am missing something here, and it would be great if you could flesh this argument out a little more or direct me towards a post that does.
comment by Daniel Kokotajlo (daniel-kokotajlo) · 2022-06-24T16:30:57.069Z · LW(p) · GW(p)
typo: Ajaya should be Ajeya
↑ comment by joshc (joshua-clymer) · 2022-06-25T01:20:51.647Z · LW(p) · GW(p)
Thanks!
comment by Ericf · 2022-06-24T12:09:17.343Z · LW(p) · GW(p)
Humans are trained on a tiny, unique subset of the available training data. I would expect multiple instances of a set of AI software trained on close to the same set of data to think very similarly to each other, and not to provide more creative capability than a single AI with more bandwidth.
↑ comment by joshc (joshua-clymer) · 2022-06-24T17:45:05.646Z · LW(p) · GW(p)
That's a good point. I guess I don't expect this to be a big problem because:
1. I think 1,000,000 copies of myself could still get a heck of a lot done.
2. The first human-level AGI might be way more creative than your average human. It would probably be trained on data from billions of humans, so all of those different ways of thinking could be latent in the model.
3. The copies can potentially diverge. I'm expecting the first transformative model to be stateful and able to meta-learn. This could be as simple as giving a transformer read and write access to an external memory and training it over longer time horizons. The copies could meta-learn on different data and different sub-problems and bring different perspectives to the table (a rough sketch of what this could look like is below).
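
To make the external-memory idea a bit more concrete, here is a minimal, hypothetical PyTorch sketch of "a transformer with read and write access to an external memory." All of the specifics (the block structure, the slot count, the gated write rule) are illustrative assumptions rather than a concrete proposal; the point is just that copies sharing the same weights can diverge once their persistent memories are updated on different data.

```python
import torch
import torch.nn as nn


class MemoryAugmentedBlock(nn.Module):
    """Hypothetical transformer block with a persistent external memory."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_slots: int = 64):
        super().__init__()
        # Persistent memory slots that survive across forward calls ("statefulness").
        self.register_buffer("memory", torch.zeros(n_slots, d_model))
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.write_gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch = x.size(0)
        mem = self.memory.unsqueeze(0).repeat(batch, 1, 1)  # copy of the buffer

        # Ordinary self-attention over the input sequence.
        h, _ = self.self_attn(x, x, x)
        x = self.norm(x + h)

        # Read: tokens attend over the external memory slots.
        read, _ = self.read_attn(x, mem, mem)
        x = self.norm(x + read)

        # Write: memory slots attend over the tokens, and a gate controls
        # how much each slot changes, so memory accumulates across inputs.
        candidate, _ = self.write_attn(mem, x, x)
        gate = torch.sigmoid(self.write_gate(torch.cat([mem, candidate], dim=-1)))
        new_mem = gate * candidate + (1 - gate) * mem

        # Store the batch-averaged update back into the persistent buffer.
        with torch.no_grad():
            self.memory.copy_(new_mem.mean(dim=0))
        return x


# Two copies with identical weights diverge once their memories are updated
# on different data streams.
block_a = MemoryAugmentedBlock()
block_b = MemoryAugmentedBlock()
block_b.load_state_dict(block_a.state_dict())
block_a(torch.randn(2, 10, 256))  # copy A sees one stream of data
block_b(torch.randn(2, 10, 256))  # copy B sees a different stream
print(torch.allclose(block_a.memory, block_b.memory))  # expected: False
```

The gated write is just one simple way to make the memory persist across inputs rather than being overwritten each step; training over longer horizons would then let the model learn how to use that memory, which is the "meta-learning" part of the picture.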