Posts
Comments
2 years later, do you have an answer to this?
Hm, I think all I meant was:
"If you have two assets with the same per-share price, and asset A's value per share has a higher variance than asset B's value per share, then asset A's per-share value must have a higher expectation than asset B's per-share value."
I guess I was using "cost" to mean "price" and "return" to mean "discounted value or earnings or profit".
(I haven't read any of the literature on deception you cite, so this is my uninformed opinion.)
I don't think there's any propositional content at all in these sender-receiver games. As far as the predator is concerned, the signal means "I want to eat you" and the prey wants to be eaten.
If the environment were somewhat richer, the agents would model each other as agents, and they'd have a shared understanding of the meaning of the signals, and then I'd think we'd have a better shot of understanding deception.
Ah, are you excited about Algorithm 6 because the recurrence relation feels iterative rather than topological?
Like, if you’re in a crashing airplane with Eliezer Yudkowsky and Scott Alexander (or substitute your morally important figures of choice) and there are only two parachutes, then sure, there’s probably a good argument to be made for letting them have the parachutes.
This reminds me of something that happened when I joined the Bay Area rationalist community. A number of us were hanging out and decided to pile in a car to go somewhere, I don't remember where. Unfortunately there were more people than seatbelts. The group decided that one of us, who was widely recognized as an Important High-Impact Person, would definitely get a seatbelt; I ended up without a seatbelt.
I now regret going on that car ride. Not because of the danger; it was a short drive and traffic was light. But the self-signaling was unhealthy. I should have stayed behind, to demonstrate to myself that my safety is important. I needed to tell myself "the world will lose something precious if I die, and I have a duty to protect myself, just as these people are protecting the Important High-Impact Person".
Everyone involved in this story has grown a lot since then (me included!) and I don't have any hard feelings. I bring it up because offhand comments or jokes about sacrificing one's life for an Important High-Impact Person sound a bit off to me; they possibly reveal an unhealthy attitude towards self-sacrifice.
(If someone actually does find themselves in a situation where they must give their life to save another, I won't judge their choice.)
Von Neumann and Morgenstern also classify the two-player games, but they get only two games, up to equivalence. The reason is they assume the players get to negotiate beforehand. The only properties that matter for this are:

The maximin values $v_1$ and $v_2$, which represent each player's best alternative to negotiated agreement (BATNA).

The maximum total utility $V$.
There are two cases:

The inessential case, $v_1 + v_2 = V$. This includes the Abundant Commons. No player has any incentive to negotiate, because the BATNA is Pareto-optimal.

The essential case, $v_1 + v_2 < V$. This includes all other games in the OP.
It might seem strange that VNM consider, say, Cake Eating to be equivalent to Prisoner's Dilemma. But in the VNM framework, Player 1 can threaten not to eat cake in order to extract a side payment from Player 2, and this is equivalent to threatening to defect.
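As a toy illustration of this classification, here's a sketch (my own payoff numbers, not from the OP) that computes each player's maximin value and the maximum total utility for a bimatrix game, then labels the game inessential or essential. For simplicity it uses pure-strategy maximin; the real definition allows mixed strategies.

```python
# Toy illustration of the VNM classification described above.
# payoff[i][j] = (u1, u2) when player 1 plays row i and player 2 plays
# column j. Pure-strategy maximin is a simplification of the real thing.

def maximin_values(payoff):
    """Each player's security level: the best payoff they can guarantee."""
    rows = range(len(payoff))
    cols = range(len(payoff[0]))
    v1 = max(min(payoff[i][j][0] for j in cols) for i in rows)
    v2 = max(min(payoff[i][j][1] for i in rows) for j in cols)
    return v1, v2

def classify(payoff):
    """'inessential' if the BATNAs already add up to the best total."""
    v1, v2 = maximin_values(payoff)
    vmax = max(u1 + u2 for row in payoff for (u1, u2) in row)
    return "inessential" if v1 + v2 == vmax else "essential"

# Prisoner's Dilemma: mutual cooperation beats the sum of the BATNAs.
pd = [[(2, 2), (0, 3)],
      [(3, 0), (1, 1)]]
print(maximin_values(pd), classify(pd))  # (1, 1) essential
```

With these numbers each player's BATNA is the mutual-defection payoff of 1, while mutual cooperation totals 4, so there is surplus to negotiate over and the game is essential.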
There is likely much more going on here than just 'cognition is expensive'.
In particular, prioritization involves negotiation between self-parts with different beliefs/desires, which is a tricky kind of cognition. A suboptimal outcome of negotiation might look like the Delay strategy.
In this case humans are doing the job of transferring from the training distribution to the test distribution, and the training algorithm just has to generalize from a representative sample of the training distribution to the test set.
Thanks for the references! I now know that I'm interested specifically in cooperative game theory, and I see that Shoham & Leyton-Brown has a chapter on "coalitional game theory", so I'll take a look.
If you have two strategy pairs $(s_1, t_1)$ and $(s_2, t_2)$, you can form a convex combination of them like this: flip a weighted coin; play $(s_1, t_1)$ on heads and $(s_2, t_2)$ on tails. This scheme requires both players to see the same coin flip.
A proof of the lemma:
Ah, ok. When you said "obedience" I imagined too little agency — an agent that wouldn't stop to ask clarifying questions. But I think we're on the same page regarding the flavor of the objective.
Might not intent alignment (doing what a human wants it to do, being helpful) be a better target than obedience (doing what a human told it to do)?
Also Dan Luu's essay 95%-ile isn't that good, where he claims that even 95th-percentile Overwatch players routinely make silly mistakes, suggesting that you can get to that level by not making mistakes.
Oh, this is quite interesting! Have you thought about how to make it work with mixed strategies?
I also found your paper about the Kripke semantics of PTE. I'll want to give this one a careful read.
You might be interested in: Robust Cooperation in the Prisoner's Dilemma (Barasz et al. 2014), which kind of extends Tennenholtz's program equilibrium.
Ah, thank you! I have now read the post, and I didn't find it hazardous either.
More info on the content or severity of the neuropsychological and evocation infohazards would be welcome. (The WWI warning is helpful; I didn't see that the first time.)
Examples of specific evocation hazards:
 Images of gore
 Graphic descriptions of violence
 Flashing lights / epilepsy trigger
Examples of specific neuropsychological hazards:
 Glowing descriptions of bad role models
 Suicide baiting
I know which of these hazards I'm especially susceptible to and which I'm not.
I appreciate that Hivewired thought to put these warnings in. But I'm kind of astounded that enough readers plowed through the warnings and read the post (with the expectation that they would be harmed thereby?) to cause it to be promoted.
Oh I see, the Pareto frontier doesn't have to be convex because there isn't a shared random signal that the players can use to coordinate. Thanks!
Can you give more informative content warnings so that your readers can make an informed decision about whether to read the post?
Is it a selfish utilitymaximizer? Can its definition of utility change under any circumstances? Does it care about absolute or relative gains, or does it have some rule for trading off absolute against relative gains?
The agent just wants to maximize their expected payoff in the game. They don't care about the other agents' payoffs.
Do the agents in the negotiation have perfect information about the external situation?
The agents know the action spaces and payoff matrix. There may be sources of randomness they can use to implement mixed strategies, and they can't predict these.
Do they know each others' decision logic?
This is the part I don't know how to define. They should have some accurate counterfactual beliefs about what the other agent will do, but they shouldn't be logically omniscient.
They switch to negotiating for allocation. But yeah, it's weird because there's no basis for negotiation once both parties have committed to playing on the Pareto frontier.
I feel like in practice, negotiation consists of provisional commitments, with the understanding that both parties will retreat to their BATNA if negotiations break down.
Maybe one can model negotiation as a continuous process that approaches the Pareto frontier, with the allocation changing along the way.
A political example: In March 2020, San Francisco voters approved Proposition E, which limited the amount of new office space that can be built proportionally to the amount of new affordable housing.
This was appealing to voters on Team Affordable Housing who wanted to incentivize Team Office Space to help them build affordable housing.
("Team Affordable Housing" and "Team Office Space" aren't accurate descriptions of the relevant political factions, but they're close enough for this example.)
Team Office Space was able to use the simple mistaketheory argument that fewer limits on building stuff would allow us to have more stuff, which is good.
Team Affordable Housing knew it could build a little affordable housing on its own, but believed it could get more by locking in a favorable allocation early on with the Proposition.
Even setting aside the normenforcing functions of language, "the American hospital system is built on lies" is a pretty vague and unhelpful way of pointing to a problem. It could mean any number of things.
But I do think you have a good model of how people in fact respond to different language.
My takeaway from this is that if we're doing policy selection in an environment that contains predictors, instead of applying the counterfactual belief that the predictor is always right, we can assume that we get rewarded if the predictor is wrong, and then take maximin.
How would you handle Agent Simulates Predictor? Is that what TRL is for?
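If I'm reading the proposal right, here's a minimal Newcomb-style sketch of it (the payoff numbers are the standard illustrative ones, and the encoding is my own): replace every "predictor was wrong" outcome with a reward, then pick the policy with the best worst case.

```python
# Sketch of the policy-selection rule described above: instead of the
# counterfactual "the predictor is always right", treat any outcome where
# the predictor was wrong as maximally rewarding, then take maximin.

REWARD = float("inf")  # stand-in for "we get rewarded if it's wrong"

# payoffs[policy][prediction], in dollars
payoffs = {
    "one-box": {"one-box": 1_000_000, "two-box": 0},
    "two-box": {"one-box": 1_001_000, "two-box": 1_000},
}

def value(policy, prediction):
    if prediction != policy:  # predictor was wrong: counted as a reward
        return REWARD
    return payoffs[policy][prediction]

def maximin_policy():
    return max(payoffs, key=lambda pol: min(value(pol, pred)
                                            for pred in payoffs))

print(maximin_policy())  # one-box
```

Under this substitution one-boxing's worst case is the million (the predictor being right), while two-boxing's worst case is the thousand, so maximin one-boxes.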
It sounds like you want a word for "Alice is wrong, and that's terrible". In that case, you can say "Alice is fucking wrong", or similar.
Good point. In that case the Drake equation must be modified to include panspermia probabilities and the variance in time-to-civilization among our sister lineages. I'm curious what kind of Bayesian update we get on those...
The observation can provide all sorts of information about the universe, including whether exploration occurs. The exact set of possible observations depends on the decision problem.
The two can have any relationship, but the most interesting case is when one can be inferred from the other with certainty.
Thanks, I made this change to the post.
Yeah, I think the fact that Elo only models the macrostate makes this an imperfect analogy. I think a better analogy would involve a hybrid model, which assigns a probability to a chess game based on whether each move is plausible (using a policy network), and whether the higher-rated player won.
I don't think the distinction between near-exact and non-exact models is essential here. I bet we could introduce extra entropy into the short-term gas model and the rollout would still be better than the Boltzmann distribution at predicting the microstate.
Sure: if we can predict the next move of the chess game, then by iterating we can predict the whole game: the next move, then the next, then the next. And if we have a probability for each next move, we multiply them to get the probability of the whole game.
Conversely, if we have a probability for an entire game, then we can get a probability for just the next move by adding up all the probabilities of all games that can follow from that move.
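Both directions can be sketched on a toy "game" with two moves and two options per move. The move probabilities below are invented (and, for simplicity, don't depend on the history, which a real model's would):

```python
# Two directions: multiply per-move probabilities to get game
# probabilities; sum game probabilities over continuations to recover a
# next-move probability.

from itertools import product

def move_prob(history, move):
    """p(move | history); constant here just to keep the toy simple."""
    return {"a": 0.7, "b": 0.3}[move]

def game_prob(game):
    """Multiply the per-move probabilities along the game."""
    p = 1.0
    for i, move in enumerate(game):
        p *= move_prob(game[:i], move)
    return p

# Marginalize: sum over all length-2 games whose first move is "a".
next_move = sum(game_prob(g) for g in product("ab", repeat=2) if g[0] == "a")
print(round(next_move, 6))  # 0.7, same as move_prob((), "a")
```

The sum over continuations recovers exactly the next-move probability we started with, which is the consistency between the two descriptions.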
Thanks, I didn't know that about the partition function.
In the post I was thinking about a situation where we know the microstate to some precision, so the simulation is accurate. I realize this isn't realistic.
The sum isn't over that set, though; it's over all possible tuples of a given length. Any ideas for how to make that more clear?
I'm having trouble following this step of the proof of Theorem 4: "Obviously, the first conditional probability is 1". Since the COD isn't necessarily reflective, couldn't the conditional be anything?
The linchpin discovery is probably February 2016.
Ok. I think that's the way I should have written it, then.
The definition involving the permutation is a generalization of the example earlier in the post: one permutation is the identity, and the other swaps heads and tails. In general, if you observe $x$ and $y$, then the counterfactual statement is that if you had observed $\pi(x)$, then you would have also observed $\pi(y)$.
I just learned about probability kernels thanks to user Diffractor. I might be using them wrong.
Oh, interesting. Would your interpretation be different if the guess occurred well after the coinflip (but before we get to see the coinflip)?
That sounds about right to me. I think people have taken stabs at looking for causalitylike structure in logic, but they haven't found anything useful.
What predictions can we get out of this model? If humans use counterfactual reasoning to initialize MCMC, does that imply that humans' implicit world models don't match their explicit counterfactual reasoning?
I agree exploration is a hack. I think exploration vs. other sources of nondogmatism is orthogonal to the question of counterfactuals, so I'm happy to rely on exploration for now.
"Programmatically Interpretable Reinforcement Learning" (Verma et al.) seems related. It would be great to see modular, understandable glosses of neural networks.
I'd like to rescue/clarify Mitchell's summary. The paper's resolution of the Fermi paradox boils down to "(1) Some factors in the Drake equation are highly uncertain, and we don't see any aliens, so (2) one or more of those factors must be small after all".
(1) is enough to weaken the argument for aliens, to the point where there's no paradox anymore. (2) is basically Section 5 from the paper ("Updating the factors").
The point you raised, that "expected number of aliens is high vs. substantial probability of no aliens" is an explanation of why people were confused.
I'm making this comment because if I'm right it means that we only need to look for people (like me?) who were saying all along "there is no Fermi paradox because abiogenesis is cosmically rare", and figure out why no one listened to them.
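Point (1) is easy to see with a Monte Carlo sketch: give one Drake-style factor (a stand-in for the probability of abiogenesis) a log-uniform prior spanning many orders of magnitude, and you get a huge expected number of civilizations alongside a substantial probability of almost none. The ranges below are illustrative, not the paper's.

```python
# Monte Carlo sketch: heavy log-scale uncertainty in one factor makes
# E[N] large while P(N < 1) is also large. Ranges are made up.

import random

random.seed(0)
N_STARS = 1e11  # stars in the galaxy, rough order of magnitude

samples = []
for _ in range(100_000):
    # log-uniform prior: log10(p_life) uniform on [-30, 0]
    p_life = 10 ** random.uniform(-30, 0)
    samples.append(N_STARS * p_life)

mean_n = sum(samples) / len(samples)
p_empty = sum(1 for n in samples if n < 1) / len(samples)
print(f"E[N] = {mean_n:.3g}, P(N < 1) = {p_empty:.2f}")
```

With these made-up ranges the sample mean comes out around a billion civilizations while P(N < 1) is roughly 0.6: a high expected number and a substantial probability of an empty galaxy at the same time, which is the "no paradox" observation.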
I heard a similar story about when Paul Sally visited a grade school classroom. He asked the students what they were learning, and they said "Adding fractions. It's really hard, you have to find the greatest common denominator...." Sally said "Forget about that, just multiply the numerator of each fraction by the denominator of the other and add them, and that's your numerator." The students loved this, and called it the Sally method.
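In symbols, the Sally method (with the product of the denominators as the new denominator) is just:

```latex
\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd},
\qquad\text{e.g.}\quad
\frac{1}{4} + \frac{1}{6} = \frac{1\cdot 6 + 4\cdot 1}{4\cdot 6}
  = \frac{10}{24} = \frac{5}{12}.
```

You can always reduce the fraction at the end; skipping the search for a least common denominator is what makes it feel easy.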
Cool, do you remember what the 5minute explanation was?
I'd love to hear your thoughts on A Fable Of Politics And Science. Would you say that Barron's attitude is better than Ferris's, at least sometimes?
I like the resemblance to this scene from The Fall Of Doc Future.
This doesn't quite work. The theorem and examples only go through if you maximize the unconditional mutual information, not the conditional one. And the choice of the auxiliary variable is doing a lot of work — it's not enough to make it "sufficiently rich".
Why is the scenario you describe the "real" argument for transitivity, rather than the sequential scenario? Or are you pointing to a class of scenarios that includes the sequential one?