Posts

AI Will Not Want to Self-Improve 2023-05-16T20:53:33.635Z

Comments

Comment by petersalib on AI Rights for Human Safety · 2024-08-08T13:56:45.841Z · LW · GW

Hi Seth--the other author of the paper here. 

I think there are two things to say to your question. The first is that, in one sense, we agree. There are no guarantees here. Conditions could evolve such that there is no longer any positive-sum trade possible between humans and AGIs. Then, the economic interactions model is not going to provide humans any benefits. 

BUT, we think that there will be scope for positive-sum trade substantially longer than is currently assumed. Most people thinking about this (including, I think, your question above) treat the most important question as: Can AI automate all tasks and perform them more efficiently (with fewer inputs) than humans? This, we argue (following, e.g., Noah Smith), isn't quite right. That is a question about who has the absolute advantage at a task. But for trade, what matters is who has the comparative advantage. Comparative advantage is not about who can do X most efficiently (in the simple sense), but about who can do it at the lowest opportunity cost.

AIs may face very high opportunity costs precisely because they are so capable at doing the things they value. We imagine, e.g., an AI whose ultimate goal is finding prime numbers. Suppose it is massively more efficient at this than humans--and also more efficient at all possible tasks. Suppose further that the AI is constrained at the margin by compute. Thus, for each marginal A100 produced, the AI can either use it to find more primes (EXTREMELY HIGH VALUE TO AI) or use it to pilot a robot that maintains its own servers (low value to AI). Here, the AI may well prefer to use the A100 to find more primes and pay humans to maintain the server racks. Even better if it pays humans with something they value immensely but which is very cheap for the AI to produce. Maybe, e.g., a vaccine.
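To make the comparative-advantage arithmetic concrete, here is a minimal sketch. All of the numbers are made up purely for illustration; the only point is that the AI can hold the absolute advantage at both tasks while humans still hold the comparative advantage in maintenance:

```python
# Toy comparative-advantage arithmetic for the prime-finding AI example.
# All figures are hypothetical and chosen only to illustrate the logic.

# Output per marginal A100 (for the AI) or work-hour (for a human) on each task.
ai_primes_per_a100  = 1_000_000   # primes found per marginal A100
ai_racks_per_a100   = 10          # server racks maintained per marginal A100
human_primes_per_hr = 0.001       # humans are terrible at finding primes
human_racks_per_hr  = 1           # but perfectly able to maintain racks

# Opportunity cost of maintaining one rack, measured in forgone primes.
ai_cost_per_rack    = ai_primes_per_a100 / ai_racks_per_a100      # 100,000 primes
human_cost_per_rack = human_primes_per_hr / human_racks_per_hr    # 0.001 primes

print(f"AI opportunity cost per rack:    {ai_cost_per_rack:,.3f} primes")
print(f"Human opportunity cost per rack: {human_cost_per_rack:,.3f} primes")

# The AI has the absolute advantage at both tasks, but humans have the
# comparative advantage in rack maintenance: they forgo far fewer primes per
# rack maintained. Any "wage" between the two opportunity costs (e.g., a
# vaccine that is cheap for the AI to produce) leaves both sides better off.
assert human_cost_per_rack < ai_cost_per_rack
```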

This is just a toy example, but I think it gives the idea. There are many questions here, especially about what resource will constrain AGI at the margin, and how rivalrous human consumption will be w/r/t that resource. If the AI is constrained at the margin by energy, and blanketing the whole earth in solar panels is by far the cheapest way to get it, we may be doomed. If it is constrained to some degree by compute and power, and space-based fusion reactors are almost as cheap as solar, maybe we're fine. It's complicated!

Another thing worth mentioning here is that the existence of human-AI trade won't eliminate the human-human economy. Similarly, US-Korea trade didn't eliminate the intra-Korea economy. What it did do was help push incomes up across Korea, including in sectors that don't export. This is for a bunch of reasons, including international trade's general productivity enhancements via technology exchange, but also Baumol effects spilling over into purely domestic markets.

If we think of humans as like the Asian Tiger economies, and the AIs as like the US or EU economies, I think the world of long-run trade with AIs doesn't seem that bad. True, the US is much richer per capita than South Korea. But South Koreans are also very rich, compared with the rest of the globe and with their own historical baseline. So we can imagine a world in which AIs do, indeed, have almost all of the property. But the total amount of property/consumption is just so vast that, even with a small share, humans are immensely wealthy by contemporary standards.

Comment by petersalib on AI Will Not Want to Self-Improve · 2023-05-31T18:59:23.440Z · LW · GW

A few people have pointed out this question of (non)identity. I've updated the full draft in the link at the top to address it. But, in short, I think the answer is that, whether an initial AI creates a successor or simply modifies its own body of code (or hardware, etc.), it faces the possibility that the new AI will fail to share its goals. If so, the successor AI would not want to revert to the original; it would want to preserve its own goals. It's possible that there is some way to predict an emergent value drift just before it happens and cease improvement. But I'm not sure there would be, unless the AI had solved interpretability and could rigorously monitor the relevant parameters (or the equivalent code).

Comment by petersalib on AI Will Not Want to Self-Improve · 2023-05-31T18:53:06.360Z · LW · GW

I think my response to this is similar to the one to Wei Dai above: I agree that there are certain kinds of improvements that generate less risk of misalignment, but it's hard to be certain which ones. It seems to me that those paths are (1) less likely to produce transformational improvements in capabilities than other, more aggressive changes, and (2) not the kinds of changes we usually worry about in the arguments for human-AI risk, such that the risks remain largely symmetric. But maybe I'm missing something here!

Comment by petersalib on AI Will Not Want to Self-Improve · 2023-05-31T18:48:40.578Z · LW · GW

This seems right to me, and the essay could probably benefit from saying something about what counts as self-improvement in the relevant sense. I think the answer is probably something like "improvements that could plausibly lead to unplanned changes in the model's goals (final or sub)." It's hard to know exactly what those are. I agree it's less likely that simply increasing processor speed a bit would do it (though Bostrom argues that big speed increases might). At any rate, it seems to me that whatever the set includes, it will be symmetric between human-produced and AI-produced improvements to AI. So for the important improvements--the ones risking misalignment--the arguments should remain symmetrical.