

comment by the gears to ascension (lahwran) · 2024-08-01T23:39:24.308Z

Relying on markets for alignment implicitly assumes that economic incentives will naturally lead to aligned behavior. But we know from human societies that markets alone don't guarantee particular outcomes: real-world markets often produce unintended negative consequences at scale. Attempts to prevent this exist, but none comes close to being leak-free, or to being strong enough to contain the most extreme malicious agents the system instantiates and stop them from breaking the system's properties. In other words, markets have particularly severe inner alignment issues, especially compared to backprop. Markets are fundamentally driven by the pursuit of defined rewards or currencies, so in such a system, how do we ensure that the currency being optimized for truly captures what we care about? How do you ensure that you're only paying for good things, in a deep way, rather than for things that have "good" printed on the box?

Also, using markets in the context of alignment is nothing new. MIRI has been talking about it for years; the agent foundations group has many serious open problems related to it. If you want to make progress toward making something like this a good idea, you're going to need to do theory work, because it can be confidently known now that the design you proposed is catastrophically misaligned and will exhibit all the ordinary failures markets have now.


comment by Abhimanyu Pallavi Sudhir (abhimanyu-pallavi-sudhir) · 2024-08-02T08:34:31.868Z

Your claims about markets seem simply wrong to me. Markets generally do what their consumers want, and their failures are largely the result of transaction costs. Some of these transaction costs stem from information asymmetry (which needs to be solved), but many others that show up in the real world (such as those behind standard problems like negative externalities) can just be removed by construction in virtual markets.

> Markets are fundamentally driven by the pursuit of defined rewards or currencies, so in such a system, how do we ensure that the currency being optimized for truly captures what we care about

By having humans be the consumers in the market. Yes, it is possible to "trick" the consumers, but the idea is that if any oversight protocol is possible at all, then the consumers will naturally buy information from it, and AIs will learn to anticipate this changing reward function.

> MIRI has been talking about it for years; the agent foundations group has many serious open problems related to it.

Can you send me a link? The only thing on "markets in an alignment context" that I've found from the MIRI side is the Wentworth-Soares discussion, but that seems to concern a very different issue.

> it can be confidently known now that the design you proposed is catastrophically misaligned

Can you send me a link to where this was confidently shown? This is a very strong claim to make; nobody makes it even in the context of backprop.