To Boldly Code
post by StrivingForLegibility · 2024-01-26T18:25:59.525Z · LW · GW · 4 comments
In the previous post [LW · GW], we looked at how networks of counterfactual mechanisms can be used to make logical commitments in open source games [? · GW]. And this post goes through an example!
I Reject Your Incentives and Substitute My Own
Alice and Bob are designing open-source programs to play the game on their behalf. Alice decides that a negotiation process is a useful mechanism for handling an incentive misalignment in that underlying game, and designs AliceBot to direct its logical crystal ball [LW · GW] at a different game: a negotiation over conditional commitments [LW · GW] for the underlying game.
Since AliceBot has BobBot's source code, it simulates how a negotiation between them would go. This introduces new instances of AliceBot and BobBot, so let's call the instances that will actually implement any policy Implementers, and the instances which negotiate conditional commitments over those policies Negotiators.
Implementer AliceBot supplies both Negotiators with all the relevant information they need for this negotiation. Conveniently, this negotiation game includes a communication channel where Negotiators can send each other messages. This happens to make it a lot easier to coordinate than in the underlying game.
Negotiator AliceBot sends over a preliminary message offering to negotiate over a space of conditional joint commitments, and sends over the conditions that Implementer AliceBot will be verifying before implementing its part of any commitment they settle on. In this negotiation protocol, it is now Negotiator BobBot's turn to reply with Implementer BobBot's own conditions, or else terminate negotiations.
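To make the shape of this concrete, here's a minimal Python sketch of that control flow. Everything in it (the class names, the message format, the stand-in helpers) is an illustrative assumption of mine, not a specification of how AliceBot actually works:

```python
# A toy sketch only: names, message format, and helpers are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    """Negotiator AliceBot's opening message."""
    commitment_space: str    # what the Negotiators will bargain over
    alice_conditions: tuple  # what Implementer AliceBot will verify


@dataclass
class Reply:
    """Negotiator BobBot's answer, if it doesn't terminate negotiations."""
    bob_conditions: tuple
    joint_commitment: str


def negotiator_alicebot() -> Offer:
    """Opens the counterfactual negotiation game."""
    return Offer(
        commitment_space="conditional joint commitments for the underlying game",
        alice_conditions=(
            "Implementer BobBot was given all material information",
            "Implementer BobBot will uphold its part of the commitment",
        ),
    )


def implementer_alicebot(bobbot_source: str) -> str:
    """Points its 'logical crystal ball' at the negotiation game by simulating
    both Negotiators, then conditions its own policy on the outcome."""
    offer = negotiator_alicebot()
    reply = simulate_negotiator_bobbot(bobbot_source, offer)
    if reply is not None and all_conditions_hold(reply.bob_conditions):
        return reply.joint_commitment  # uphold its part of the commitment
    return "default policy for the underlying game"


# Stand-ins so the sketch runs; real delegates would simulate each other's code.
def simulate_negotiator_bobbot(source: str, offer: Offer) -> Optional[Reply]:
    return Reply(("Implementer AliceBot will uphold its part",), "negotiated joint policy")


def all_conditions_hold(conditions: tuple) -> bool:
    return True


if __name__ == "__main__":
    print(implementer_alicebot("<BobBot source code>"))
```

The structural point is that the Implementer and the Negotiator are the same program run in two roles: one actually plays the underlying game, the other only exists inside the simulation.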
Transitive Logical Line-Of-Sight
To understand how that instance of BobBot responds, we need to switch over to Bob's POV. Inconveniently, the strong superrationality postulate [LW(p) · GW(p)] failed to hold: Bob reasoned differently than Alice about how to solve their incentive misalignment, and designed BobBot to look first at a completely different mechanism-counterfactual game. The instance of AliceBot inside Bob's counterfactual game has enough information to know that its Implementer instance isn't going to condition its behavior on the output of this hypothetical game, and so helpfully terminates with an error message rather than make conditional commitments that its Implementer instance won't uphold.
Conveniently, BobBot was also designed with a fallback in this case, and launches a powerful formal verification tool that quickly builds a model of AliceBot and the information flow between modules. This allows both instances of BobBot to work backwards and notice that Implementer AliceBot is conditioning its behavior on the output of some other counterfactual game.
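Roughly, the logic looks something like the sketch below. The exception, the boolean flag, and the one-line stand-in for the formal verification tool are all made up for illustration:

```python
# Sketch only: helpers and the exception are illustrative assumptions; the
# real fallback involves a formal verification tool modeling AliceBot's modules.

class WrongCounterfactual(Exception):
    """Raised when a Negotiator realizes its Implementer won't condition
    its behavior on the game it currently finds itself in."""


def negotiator_alicebot_in_bobs_game(implementer_conditions_on_this_game: bool) -> str:
    """The AliceBot instance inside Bob's counterfactual game."""
    if not implementer_conditions_on_this_game:
        # Any commitment made here would never bind Implementer AliceBot,
        # so terminate with an error rather than negotiate in bad faith.
        raise WrongCounterfactual("Implementer AliceBot ignores this game's output.")
    return "negotiate as usual"


def bobbot_fallback(alicebot_source: str) -> str:
    """BobBot's fallback: model AliceBot's information flow to find which
    counterfactual game Implementer AliceBot *does* condition on."""
    # Trivial stand-in for the formal verification tool in the story.
    if "negotiation" in alicebot_source:
        return "AliceBot's negotiation game"
    return "unknown"


if __name__ == "__main__":
    try:
        negotiator_alicebot_in_bobs_game(implementer_conditions_on_this_game=False)
    except WrongCounterfactual as err:
        print("Terminated:", err)
    print("Redirect negotiation to:", bobbot_fallback("...crystal ball aimed at a negotiation game..."))
```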
Joint Policy Optimization
Negotiator BobBot, finding itself in that counterfactual, begins to study the structure of the negotiation protocol Alice designed and how this will supposedly influence the joint policy implemented in the underlying game. Negotiator BobBot can't discount the possibility that it was given incomplete or inaccurate information, but this negotiation protocol and AliceBot were both designed to be legible to analysis tools.
The joint policy space of the underlying game is too large to search exhaustively, but Alice used some clever insights into the structure of this space to ignore large sub-spaces [? · GW] full of knowably-suboptimal joint policies. AliceBot and BobBot have common knowledge of their computational resources, and if they could communicate they could pass messages back and forth to divide up the computational load [LW · GW].
This protocol is instead being performed entirely acausally, which gives them a computational budget that looks more like the minimum of their individual budgets than their sum: neither can hand work off to the other, so each has to redo the whole search itself. Which is a pain, but it lets Alice and Bob coordinate despite not being able to communicate.
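A trivial sketch of that comparison (the numbers are invented, and the min-versus-sum accounting is my reading of the situation rather than an exact formula):

```python
# Toy illustration only: budget numbers are invented, and the min-vs-sum
# framing is my own reconstruction of the comparison.

def budget_with_communication(alice_budget: float, bob_budget: float) -> float:
    """With a message channel, the delegates can split the search between
    them, so their individual budgets add up."""
    return alice_budget + bob_budget


def budget_acausal(alice_budget: float, bob_budget: float) -> float:
    """Acausally, each delegate must redo the entire (deterministic) search
    on its own, so the smaller budget is the binding constraint."""
    return min(alice_budget, bob_budget)


if __name__ == "__main__":
    alice, bob = 1e9, 4e8  # e.g. policy evaluations each can afford
    print(budget_with_communication(alice, bob))  # 1.4e9 evaluations between them
    print(budget_acausal(alice, bob))             # 4e8 evaluations they can both check
```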
Alice and Bob each supplied their delegates with their respective social choice theories, which score joint policies. AliceBot and BobBot negotiate at the level of social utilities rather than individual utilities, which doesn't eliminate all disagreements but tends to bring them closer to agreement.
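As a toy illustration of what "negotiating at the level of social utilities" might look like (the payoff numbers and the particular social choice theories, a utilitarian sum for Alice and a maximin rule for Bob, are invented for the example):

```python
# Toy illustration only: payoffs and the two social choice theories are invented.

JOINT_POLICIES = {
    "policy_1": {"alice": 10.0, "bob": 2.0},
    "policy_2": {"alice": 8.0, "bob": 6.0},
    "policy_3": {"alice": 4.0, "bob": 9.0},
}


def alice_social_utility(payoffs: dict) -> float:
    """Alice's social choice theory: total welfare."""
    return payoffs["alice"] + payoffs["bob"]


def bob_social_utility(payoffs: dict) -> float:
    """Bob's social choice theory: the welfare of the worst-off player."""
    return min(payoffs.values())


if __name__ == "__main__":
    for name, payoffs in JOINT_POLICIES.items():
        print(name, alice_social_utility(payoffs), bob_social_utility(payoffs))
    # policy_1: 12.0, 2.0   policy_2: 14.0, 6.0   policy_3: 13.0, 4.0
```

In this toy case both social scores favor policy_2, even though Alice individually prefers policy_1 and Bob individually prefers policy_3, which is the sense in which scoring at the social level tends to bring the delegates closer to agreement than haggling over individual payoffs.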
BobBot responds with the three conditions that Implementer BobBot will verify before upholding its part of any joint commitment (there's a small code sketch of this check after the list):
- Implementer AliceBot did indeed supply this instance of BobBot with all material information
- Implementer AliceBot will uphold its part of the joint commitment
- The joint commitment is fair according to Bob's fairness criterion
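That check is just a conjunction. A minimal sketch, where the boolean arguments stand in for whatever verification procedures Implementer BobBot would actually run:

```python
# Sketch only: the three booleans stand in for Implementer BobBot's real checks.

def bobbot_upholds_commitment(
    alice_supplied_all_material_info: bool,
    alice_will_uphold_her_part: bool,
    commitment_is_fair_by_bobs_criterion: bool,
) -> bool:
    """Implementer BobBot upholds its part of a joint commitment only if
    all three of Negotiator BobBot's conditions check out."""
    return (
        alice_supplied_all_material_info
        and alice_will_uphold_her_part
        and commitment_is_fair_by_bobs_criterion
    )


if __name__ == "__main__":
    print(bobbot_upholds_commitment(True, True, True))   # True: implement the commitment
    print(bobbot_upholds_commitment(True, True, False))  # False: fall back to the default policy
```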
That fairness criterion is one Bob thought of from a different perspective than Alice's, and BobBot was designed without knowing what Alice's social choice theory would be. Perhaps Alice and Bob might agree at the meta-level [? · GW] about the morally correct way to define a social choice theory, if given the chance to discuss the matter. But Alice and Bob aren't participants in this protocol, and they didn't design their delegates with any pathway towards this kind of convergence. Like thermostats or network routers, AliceBot and BobBot implement their decision theory without reflecting on how it might be improved.
Of course, Alice designed this negotiation protocol such that no participant could do better for themselves by adding more constraints. Bob's fairness criterion ends up being somewhat skewed in Bob's favor, from AliceBot's perspective. The underlying game isn't an ultimatum game, but the criterion would be the equivalent of saying that Alice should only get $48 when splitting $100.
Negotiator AliceBot has already sent over its own conditions, including how they interact [LW · GW] with conditions like these. If Negotiator BobBot had capped Alice's gains at the equivalent of $50, its own gains would have likewise been capped at $50. Instead, each finds its gains capped at the equivalent of $47.98. (That's the $48 effectively offered to Alice, minus a small penalty that each imposes to give the other an incentive to converge at the meta-level.)
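Here's the arithmetic as a toy function. The cap-and-penalty rule is my reconstruction of how the conditions interact, with the $0.02 penalty implied by $48.00 minus $47.98:

```python
# Toy reconstruction only: the rule is my guess at how the conditions interact;
# the constants are chosen to reproduce $50/$50 and $47.98/$47.98 from the story.

FAIR_SHARE = 50.00  # an even split of the $100-equivalent
PENALTY = 0.02      # the small penalty implied by $48.00 - $47.98


def capped_gains(offer_to_alice: float) -> tuple:
    """If Bob's criterion offers Alice at least a fair share, nobody is
    penalized; otherwise both sides' gains are capped at the offer minus
    a small penalty, to incentivize converging at the meta-level."""
    if offer_to_alice >= FAIR_SHARE:
        return (FAIR_SHARE, FAIR_SHARE)
    cap = round(offer_to_alice - PENALTY, 2)
    return (cap, cap)


if __name__ == "__main__":
    print(capped_gains(50.00))  # (50.0, 50.0)
    print(capped_gains(48.00))  # (47.98, 47.98)
```

Note that $47.98 out of a $50 fair share is 95.96%, which seems to line up with the roughly 96% efficiency figure below.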
AliceBot and BobBot are programmed to accept losses like these as the price of incentivizing what Alice and Bob respectively see as the socially optimal outcome. They feel no temptation to best-respond [LW · GW] and abandon the Schelling boundary that their creators tasked them with enforcing. The negotiation protocol returns a conditional joint commitment that Alice and Bob will each separately judge to achieve about 96% of the social utility that would have been achievable had their own decision theory been universalized [LW · GW].
One level of simulation up, each Implementer checks the conditions that its Negotiator instance foresaw. All checks pass, and Implementer AliceBot and Implementer BobBot implement a joint policy from the negotiated commitment rather than the policies they would have defaulted to in the underlying game.
Robust Delegation
Alice and Bob each see a message like the following, written using words and concepts native to each:
Logical Handshake Successful!
Enacting commitment.
Economic surplus generated: $959,608,188.31.
Warning: Economic efficiency at 96%. Click here for more details.
The first acausal treaty to be negotiated between their civilizations, but not the last.
- ^ Thanks to ChatGPT 3.5 for the title!
4 comments
comment by Carl Feynman (carl-feynman) · 2024-01-26T22:25:12.724Z · LW(p) · GW(p)
Just my two cents:
You might get more interaction with this essay (story?) if you explained at the beginning what you are trying to accomplish by writing it. I read the first two paragraphs and had no motivation to keep reading further. I skipped to the last paragraph and was not further enlightened.
↑ comment by StrivingForLegibility · 2024-01-27T00:43:12.758Z · LW(p) · GW(p)
Thank you! I started writing the previous post [LW · GW] in this sequence and decided to break the example off into its own post.
For anyone else looking for a TLDR: this is an example of how a network of counterfactual mechanisms can be used to make logical commitments for an arbitrary game.
↑ comment by Carl Feynman (carl-feynman) · 2024-01-27T16:05:25.263Z · LW(p) · GW(p)
Put those two sentences at the beginning of your post and my objection goes away!
↑ comment by StrivingForLegibility · 2024-01-28T02:00:32.864Z · LW(p) · GW(p)
I'd been thinking about "cleanness", but I think you're right that "being oriented to what we're even talking about" is more important. Thank you again for the advice!