Posts

Equilibrium and prior selection problems in multipolar deployment 2020-04-02T20:06:14.298Z · score: 17 (7 votes)
Section 7: Foundations of Rational Agency 2019-12-22T02:05:24.380Z · score: 16 (3 votes)
Sections 5 & 6: Contemporary Architectures, Humans in the Loop 2019-12-20T03:52:43.629Z · score: 28 (6 votes)
Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms 2019-12-17T21:46:49.216Z · score: 21 (7 votes)
Sections 1 & 2: Introduction, Strategy and Governance 2019-12-17T21:27:30.496Z · score: 35 (10 votes)
Acknowledgements & References 2019-12-14T07:04:06.272Z · score: 7 (6 votes)
Preface to EAF's Research Agenda on Cooperation, Conflict, and TAI 2019-12-13T21:02:48.552Z · score: 54 (18 votes)
First application round of the EAF Fund 2019-07-08T00:20:56.565Z · score: 20 (6 votes)

Comments

Comment by jesseclifton on Equilibrium and prior selection problems in multipolar deployment · 2020-04-05T17:37:41.547Z · score: 1 (1 votes) · LW · GW

The new summary looks good =) Though I'll second Michael Dennis' comment below: the infinite regress of priors is avoided in standard game theory by specifying a common prior. Indeed, the specification of this prior itself leads to a prior selection problem.

The formality of "priors / equilibria" doesn't have any benefit in this case (there aren't any theorems to be proven)

I’m not sure if you mean “there aren’t any theorems to be proven” or “any theorem that’s proven in this framework would be useless”. The former is false; e.g., there are things to prove about the construction of learning equilibria in various settings. I’m sympathetic to the latter criticism, though my own intuition is that working with the formalism will help uncover practically useful methods for promoting cooperation, and point to problems that might not be obvious otherwise. I'm trying to make progress in this direction in this paper, though I wouldn't yet call it practical.

The one benefit I see is that it signals that "no, even if we formalize it, the problem doesn't go away", to those people who think that once formalized sufficiently all problems go away via the magic of Bayesian reasoning

Yes, this is a major benefit I have in mind!

The strategy of agreeing on a joint welfare function is already a heuristic and isn't an optimal strategy; it feels very weird to suppose that initially a heuristic is used and then we suddenly switch to pure optimality

I’m not sure what you mean by “heuristic” or “optimality” here. I don’t know of any good notion of optimality which is independent of the other players, which is why there is an equilibrium selection problem. The welfare function selects among the many equilibria (i.e. it selects one which optimizes the welfare). I wouldn't call this a heuristic. There has to be some way to select among equilibria, and the welfare function is chosen such that the resulting equilibrium is acceptable by each of the principals' lights.

Comment by jesseclifton on Equilibrium and prior selection problems in multipolar deployment · 2020-04-05T03:21:56.829Z · score: 2 (2 votes) · LW · GW

both players want to optimize the welfare function (making it a collaborative game)

The game is collaborative in the sense that a welfare function is optimized in equilibrium, but the principals will in general have different terminal goals (reward functions) and the equilibrium will be enforced with punishments (cf. tit-for-tat).
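As a toy illustration of the point about punishments sustaining cooperation (the payoff values below are the standard textbook prisoner's dilemma numbers, not anything from the discussion above), here is a minimal sketch of tit-for-tat enforcing mutual cooperation in a repeated game:

```python
# Standard assumed payoffs: (my move, their move) -> my payoff.
PAYOFFS = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opponent_history):
    # Cooperate first, then mirror the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]

def play(strat_a, strat_b, rounds=10):
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(moves_b)  # each strategy sees only the opponent's history
        b = strat_b(moves_a)
        score_a += PAYOFFS[(a, b)]
        score_b += PAYOFFS[(b, a)]
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

always_defect = lambda opponent_history: "D"

play(tit_for_tat, tit_for_tat)    # (30, 30): full mutual cooperation
play(tit_for_tat, always_defect)  # (9, 14): defection punished from round 2 on
```

The cooperative (welfare-optimal) outcome is an equilibrium here only because deviations trigger punishment, even though the players' own payoffs differ.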

the issue is primarily that in a collaborative game, the optimal thing for you to do depends strongly on who your partner is, but you may not have a good understanding of who your partner is, and if you're wrong you can do arbitrarily poorly

Agreed, but there's the additional point that in the case of principals designing AI agents, the principals can (in theory) coordinate to ensure that the agents "know who their partner is". That is, they can coordinate on critical game-theoretic parameters of their respective agents.

Comment by jesseclifton on How special are human brains among animal brains? · 2020-04-03T06:50:41.286Z · score: 3 (2 votes) · LW · GW

Chimpanzees, crows, and dolphins are capable of impressive feats of higher intelligence, and I don’t think there’s any particular reason to think that Neanderthals are capable of doing anything qualitatively more impressive

This seems like a pretty cursory treatment of what seems like quite a complicated and contentious subject. A few possible counterexamples jump to mind. These are just things I remember coming across when browsing cognitive science sources over the years.

My nonexpert sense is that it is at least controversial both how each of these is connected with language, and the extent to which nonhumans are capable of them.

Comment by jesseclifton on Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms · 2020-03-29T19:26:33.906Z · score: 2 (2 votes) · LW · GW

Yep, fixed, thanks :)

Comment by jesseclifton on Sections 5 & 6: Contemporary Architectures, Humans in the Loop · 2020-03-29T19:21:38.921Z · score: 2 (2 votes) · LW · GW

Fixed, thanks :)

Comment by jesseclifton on Section 7: Foundations of Rational Agency · 2020-03-29T19:17:21.231Z · score: 2 (2 votes) · LW · GW

Should be "same", fixed, thanks :)

Comment by jesseclifton on Instrumental Occam? · 2020-02-01T02:31:58.561Z · score: 4 (3 votes) · LW · GW

In model-free RL, policy-based methods choose policies by optimizing a noisy estimate of the policy's value. This is analogous to choosing a predictive model by optimizing a noisy estimate of prediction accuracy (i.e., accuracy on the training data). So, just as in predictive modeling, we often need to accept some bias to reduce variance in the policy-learning case (i.e., shrink towards simpler policies).
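A minimal sketch of the analogy (the toy environment, noise model, and complexity penalty here are all made up for illustration): selecting among candidate policies by a noisy Monte Carlo value estimate, with a penalty that shrinks the choice towards simpler policies, as in regularized model selection:

```python
import random

def noisy_value_estimate(complexity, n_rollouts, rng):
    # Assumed toy setup: true value rises with complexity (with
    # diminishing returns), but estimation noise also grows with it.
    true_value = 1.0 - 0.5 ** complexity
    noise = 0.2 * complexity
    rollouts = [true_value + rng.uniform(-noise, noise) for _ in range(n_rollouts)]
    return sum(rollouts) / n_rollouts

def select_policy(candidates, lam, n_rollouts=50, seed=0):
    # Penalized selection: trade a little bias for reduced variance,
    # analogous to regularized empirical risk minimization.
    rng = random.Random(seed)
    return max(candidates,
               key=lambda c: noisy_value_estimate(c, n_rollouts, rng) - lam * c)

# With a heavy penalty, the simplest candidate policy always wins:
select_policy([1, 2, 3, 4], lam=10.0)  # 1
```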

Comment by jesseclifton on MichaelA's Shortform · 2020-01-24T06:10:02.028Z · score: 2 (2 votes) · LW · GW

There are "reliabilist" accounts of what makes a credence justified. The accounts differ, but they say (very roughly) that a credence is justified if it is produced by a process that is close to the truth on average. See [this paper](https://philpapers.org/rec/PETWIJ-2).

Frequentist statistics can be seen as a version of reliabilism. Criteria like the Brier score for evaluating forecasters can also be understood in a reliabilist framework.
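For concreteness, the Brier score is just the mean squared difference between probabilistic forecasts and realized binary outcomes, so it evaluates the reliability of the forecasting process on average rather than any single credence:

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probabilistic forecasts (in [0, 1])
    and binary outcomes (0 or 1). Lower is better; 0 is perfect."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A well-calibrated forecaster scores close to 0:
brier_score([0.9, 0.8, 0.1], [1, 1, 0])  # ~0.02
```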

Comment by jesseclifton on Exploring safe exploration · 2020-01-08T05:41:33.029Z · score: 3 (2 votes) · LW · GW

Maybe pedantic, but: couldn't we just look at the decision process as a sequence of episodes from the POMDP, and formulate the problem in terms of the regret incurred by our learning algorithm in this decision process? In particular, if catastrophic outcomes (i.e., ones which dominate the total regret) are possible, then a low-regret learning algorithm will have to be safe while still gathering some information that helps in future episodes. (On this view, the goal of safe exploration research is the same as the goal of learning generally: design low-regret learning algorithms. It's just that the distribution of rewards in some cases implies that low-regret learning algorithms have to be "safe" ones.)
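A deliberately crude sketch of the point (the rewards are invented for illustration): when one action is catastrophic, its losses dominate cumulative regret, so any low-regret learner must quickly stop choosing it — regret minimization subsumes "safety" in such cases:

```python
# Assumed per-episode rewards for a two-action toy problem.
REWARDS = {"safe": 1.0, "risky": -100.0}
OPTIMAL = max(REWARDS.values())

def total_regret(actions):
    # Regret = shortfall relative to always playing the optimal action.
    return sum(OPTIMAL - REWARDS[a] for a in actions)

# A learner that tries the catastrophic action once, then avoids it,
# has already incurred almost all of its regret in that single episode:
total_regret(["risky"] + ["safe"] * 9)  # 101.0
```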

Comment by jesseclifton on Sections 5 & 6: Contemporary Architectures, Humans in the Loop · 2019-12-21T20:12:34.988Z · score: 3 (3 votes) · LW · GW

I definitely think it's worth exploring. I have the intuition that creating a single agent might be difficult for various logistical and political reasons, and so it feels more robust to figure out the multiagent case. But I would certainly like to have a clearer picture of how and under what circumstances several AI developers might implement a single compromise agent.

Comment by jesseclifton on Sections 1 & 2: Introduction, Strategy and Governance · 2019-12-20T05:03:04.801Z · score: 2 (2 votes) · LW · GW

Ah, I see now that I did not make this clear at all. The main thing in the case of war is that, under certain payoff structures, a state might not be able to credibly commit to the terms of a peaceful settlement if it is expected to increase in power relative to its counterpart. Thus the state that expects to lose relative power may prefer to wage preventive war (while it is still relatively strong) rather than settle. This is still a problem in models with complete information and divisible stakes.
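A numerical sketch of the commitment problem, with parameters invented for illustration (very much in the spirit of Powell-style bargaining models, not a faithful reproduction of any particular one): even with complete information and divisible stakes, a declining state can prefer war now to any settlement it expects the rising state to renegotiate after the power shift:

```python
V = 1.0                    # total value of the disputed stake
COST = 0.1                 # each side's cost of fighting
P_NOW, P_LATER = 0.7, 0.2  # declining state's win probability, now vs. later

# Expected payoff of fighting today:
war_now = P_NOW * V - COST

# After the power shift, the rising state can force renegotiation down to
# the declining state's (much worse) war payoff at that point, so no
# promise of a better future settlement is credible:
best_credible_peace_later = P_LATER * V - COST

# The declining state prefers preventive war whenever no credible
# settlement beats fighting while it is still strong:
prefers_preventive_war = war_now > best_credible_peace_later
```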

I'll try to edit the text to make this clearer soon, thanks for bringing it up.

Comment by jesseclifton on Sections 1 & 2: Introduction, Strategy and Governance · 2019-12-17T19:15:39.854Z · score: 2 (2 votes) · LW · GW

It seems plausible that if players could truthfully disclose private information and divide stakes, the ability to credibly commit would often not be needed

Even if the players can find a settlement that they both prefer to conflict (e.g., flipping a coin to decide who gets the territory), there's still the problem of committing to honor that settlement (you might still just attack me if the coin doesn't land in your favor). So I think there's still a problem. But maybe you're saying that if there's really no private information, then there is no credibility problem, because players can anticipate defections, since they know everything about their counterpart? Something like that?

Comment by jesseclifton on Preface to EAF's Research Agenda on Cooperation, Conflict, and TAI · 2019-12-17T18:55:34.570Z · score: 6 (4 votes) · LW · GW

Do you think focusing on s-risks leads to meaningfully different technical goals than focusing on other considerations?

I think it definitely leads to a difference in prioritization among the things one could study under the broad heading of AI safety. Hopefully this will be clear in the body of the agenda. And, some considerations around possible downsides of certain alignment work might be more salient to those focused on s-risk; the possibility that attempts at alignment with human values could lead to very bad “near misses” is an example. (I think some other EAF researchers have more developed views on this than myself.) But, in this document and my own current research I’ve tried to choose directions that are especially important from the s-risk perspective but which are also valuable by the lights of non-s-risk-focused folks working in the area.

[Just speaking for myself here]

I find myself somewhat confused by s-risks as defined here

For what it’s worth, EAF is currently deliberating about this definition and it might change soon.