Posts

My Marriage Vows 2021-07-21T10:48:24.443Z
Needed: AI infohazard policy 2020-09-21T15:26:05.040Z
Deminatalist Total Utilitarianism 2020-04-16T15:53:13.953Z
The Reasonable Effectiveness of Mathematics or: AI vs sandwiches 2020-02-14T18:46:39.280Z
Offer of co-authorship 2020-01-10T17:44:00.977Z
Intelligence Rising 2019-11-27T17:08:40.958Z
Vanessa Kosoy's Shortform 2019-10-18T12:26:32.801Z
Biorisks and X-Risks 2019-10-07T23:29:14.898Z
Slate Star Codex Tel Aviv 2019 2019-09-05T18:29:53.039Z
Offer of collaboration and/or mentorship 2019-05-16T14:16:20.684Z
Reinforcement learning with imperceptible rewards 2019-04-07T10:27:34.127Z
Dimensional regret without resets 2018-11-16T19:22:32.551Z
Computational complexity of RL with traps 2018-08-29T09:17:08.655Z
Entropic Regret I: Deterministic MDPs 2018-08-16T13:08:15.570Z
Algo trading is a central example of AI risk 2018-07-28T20:31:55.422Z
The Learning-Theoretic AI Alignment Research Agenda 2018-07-04T09:53:31.000Z
Meta: IAFF vs LessWrong 2018-06-30T21:15:56.000Z
Computing an exact quantilal policy 2018-04-12T09:23:27.000Z
Quantilal control for finite MDPs 2018-04-12T09:21:10.000Z
Improved regret bound for DRL 2018-03-02T12:49:27.000Z
More precise regret bound for DRL 2018-02-14T11:58:31.000Z
Catastrophe Mitigation Using DRL (Appendices) 2018-02-14T11:57:47.000Z
Bugs? 2018-01-21T21:32:10.492Z
The Behavioral Economics of Welfare 2017-12-22T11:35:09.617Z
Improved formalism for corruption in DIRL 2017-11-30T16:52:42.000Z
Why DRL doesn't work for arbitrary environments 2017-11-30T12:22:37.000Z
Catastrophe Mitigation Using DRL 2017-11-22T05:54:42.000Z
Catastrophe Mitigation Using DRL 2017-11-17T15:38:18.000Z
Delegative Reinforcement Learning with a Merely Sane Advisor 2017-10-05T14:15:45.000Z
On the computational feasibility of forecasting using gamblers 2017-07-18T14:00:00.000Z
Delegative Inverse Reinforcement Learning 2017-07-12T12:18:22.000Z
Learning incomplete models using dominant markets 2017-04-28T09:57:16.000Z
Dominant stochastic markets 2017-03-17T12:16:55.000Z
A measure-theoretic generalization of logical induction 2017-01-18T13:56:20.000Z
Towards learning incomplete models using inner prediction markets 2017-01-08T13:37:53.000Z
Subagent perfect minimax 2017-01-06T13:47:12.000Z
Minimax forecasting 2016-12-14T08:22:13.000Z
Minimax and dynamic (in)consistency 2016-12-11T10:42:08.000Z
Attacking the grain of truth problem using Bayes-Savage agents 2016-10-20T14:41:56.000Z
IRL is hard 2016-09-13T14:55:26.000Z
Stabilizing logical counterfactuals by pseudorandomization 2016-05-25T12:05:07.000Z
Stability of optimal predictor schemes under a broader class of reductions 2016-04-30T14:17:35.000Z
Predictor schemes with logarithmic advice 2016-03-27T08:41:23.000Z
Reflection with optimal predictors 2016-03-22T17:20:37.000Z
Logical counterfactuals for random algorithms 2016-01-06T13:29:52.000Z
Quasi-optimal predictors 2015-12-25T14:17:05.000Z
Implementing CDT with optimal predictor systems 2015-12-20T12:58:44.000Z
Bounded Solomonoff induction using optimal predictor schemes 2015-11-10T13:59:29.000Z
Superrationality in arbitrary games 2015-11-04T18:20:41.000Z
Optimal predictor schemes 2015-11-01T17:28:46.000Z

Comments

Comment by Vanessa Kosoy (vanessa-kosoy) on £2000 bounty - contraceptives (and UTI) literature review · 2021-09-16T00:03:07.008Z · LW · GW

You can search on scholar.google.com (if normal google isn't good enough) and get them from scihub/libgen.

Comment by Vanessa Kosoy (vanessa-kosoy) on Is MIRI's reading list up to date? · 2021-09-12T19:33:20.582Z · LW · GW

For background on my own research programme, I recommend:

There are some other topics that are important but I'm not sure what reading to recommend: functional analysis, algorithmic information theory, Markov decision processes.

Comment by Vanessa Kosoy (vanessa-kosoy) on Progress on Causal Influence Diagrams · 2021-09-12T18:30:06.555Z · LW · GW

IIUC, in a multi-agent influence model, every subgame perfect equilibrium is also a subgame perfect equilibrium in the corresponding extensive form game, but the converse is false in general. Do you know whether at least one subgame perfect equilibrium exists for any MAIM? I couldn't find it in the paper.

Comment by Vanessa Kosoy (vanessa-kosoy) on Information At A Distance Is Mediated By Deterministic Constraints · 2021-09-11T23:54:45.752Z · LW · GW

So, your thesis is, only exponential models give rise to nice abstractions? And, since it's important to have abstractions, we might just as well have our agents reason exclusively in terms of exponential models?

Comment by Vanessa Kosoy (vanessa-kosoy) on Information At A Distance Is Mediated By Deterministic Constraints · 2021-09-11T21:26:11.807Z · LW · GW

I'm still confused. What direction of GKPD do you want to use? It sounds like you want to use the low-dimensional statistic => exponential family direction. Why? What is good about some family being exponential?

Comment by Vanessa Kosoy (vanessa-kosoy) on Information At A Distance Is Mediated By Deterministic Constraints · 2021-09-10T20:42:01.578Z · LW · GW

Can you explain how the generalized KPD fits into all of this? KPD is about estimating the parameters of a model from samples via a low dimensional statistic, whereas you are talking about estimating one part of a sample from another (distant) part of the sample via a low dimensional statistic. Are you using KPD to rule out "high-dimensional" correlations going through the parameters of the model?

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-10T19:39:47.255Z · LW · GW

The way I think about instrumental goals is: You have an MDP with a hierarchical structure (i.e. the states are the leaves of a rooted tree), s.t. transitions between states that differ on a higher level of the hierarchy (i.e. correspond to branches that split early) are slower than transitions between states that differ on lower levels of the hierarchy. Then quasi-stationary distributions on states resulting from different policies on the "inner MDP" of a particular "metastate" effectively function as actions w.r.t. the higher levels. Under some assumptions it should be possible to efficiently control such an MDP in time complexity much lower than polynomial in the total number of states[1]. Hopefully it is also possible to efficiently learn this type of hypothesis.

I don't think that anywhere in there we will need a lemma saying that the algorithm picks "aligned" goals.


  1. For example, if each vertex in the tree has the structure of one of some small set of MDPs, and you are given mappings from admissible distributions on "child" MDPs to actions of the "parent" MDP that are compatible with the transition kernel. ↩︎
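
A minimal sketch of the hierarchical picture above, with entirely hypothetical kernels and rewards (nothing here is taken from the comment itself): each deterministic policy on a metastate's inner MDP induces a (quasi-)stationary distribution over its inner states, and the top level treats those distributions as macro-actions.

```python
import itertools

import numpy as np

N_INNER = 3        # inner states per metastate (hypothetical)
INNER_ACTIONS = 2  # actions available inside each metastate (hypothetical)
N_META = 2         # number of metastates (hypothetical)

rng = np.random.default_rng(0)

# Hypothetical inner kernels: P_inner[meta, a, s] is the distribution over next
# inner states when taking action a in inner state s of metastate meta.
P_inner = rng.dirichlet(np.ones(N_INNER), size=(N_META, INNER_ACTIONS, N_INNER))
# Hypothetical reward weights over inner states, one vector per metastate.
reward_w = rng.uniform(size=(N_META, N_INNER))

def stationary_distribution(P):
    """Stationary distribution of an irreducible Markov chain with row-stochastic P."""
    evals, evecs = np.linalg.eig(P.T)
    v = np.real(evecs[:, np.argmax(np.real(evals))])
    return v / v.sum()

def inner_chain(meta, policy):
    """Transition matrix over inner states induced by a deterministic inner policy."""
    return np.array([P_inner[meta, policy[s], s] for s in range(N_INNER)])

# Each inner policy yields a quasi-stationary distribution; the top level scores it here
# (in a full treatment it would also determine the slow transitions between metastates).
for meta in range(N_META):
    candidates = []
    for policy in itertools.product(range(INNER_ACTIONS), repeat=N_INNER):
        mu = stationary_distribution(inner_chain(meta, policy))
        candidates.append((policy, mu, float(mu @ reward_w[meta])))
    policy, mu, rate = max(candidates, key=lambda c: c[2])
    print(f"metastate {meta}: best inner policy {policy}, "
          f"quasi-stationary dist {np.round(mu, 2)}, reward rate {rate:.2f}")
```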

Comment by Vanessa Kosoy (vanessa-kosoy) on Agency in Conway’s Game of Life · 2021-09-09T23:46:09.237Z · LW · GW

I think the GoL is not the best example for this sort of question. See this post by Scott Aaronson discussing the notion of "physical universality" which seems relevant here.

Also, like other commenters pointed out, I don't think the object you get here is necessarily AI. That's because the "laws of physics" and the distribution of initial conditions are assumed to be simple and known. An AI would be something that can accomplish an objective of this sort while also having to learn the rules of the automaton or detect patterns in the initial conditions. For example, instead of initializing the rest of the field uniformly randomly, you could initialize it using something like the Solomonoff prior.

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-09T21:17:40.328Z · LW · GW

I don't understand what Lemma 1 is if it's not some kind of performance guarantee. So, this reasoning seems kinda circular. But, maybe I misunderstand.

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-09T20:02:28.907Z · LW · GW

It's only a problem if we also claim that the "find a learning algorithm that satisfies the desiderata" part is not an AGI safety problem.

I never said it's not a safety problem. I only said that a lot of progress on this can come from research that is not very "safety specific". I would certainly work on it if "precisely defining safe" was already solved.

That's also where I was coming from when I expressed skepticism about "strong formal guarantees". We have no performance guarantee about the brain, and we have no performance guarantee about AlphaGo, to my knowledge.

Yes, we don't have these things. That doesn't mean these things don't exist. Surely all research is about going from "not having" things to "having" things? (Strictly speaking, it would be very hard to literally have a performance guarantee about the brain since the brain doesn't have to be anything like a "clean" implementation of a particular algorithm. But that's beside the point.)

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-09T18:05:23.073Z · LW · GW

Obviously the problem of "make an agential "prior-building AI" that doesn't try to seize control of its off-switch" is being worked on almost exclusively by x-risk people.

Umm, obviously I did not claim it isn't. I just decomposed the original problem in a different way that didn't single out this part.

...if we can make a safe agential "prior-building AI" that gets to human-level predictive ability and beyond, then we've solved almost the whole TAI safety problem, because we could then run the prior-building AI, then turn it off and use microscope AI to extract a bunch of new-to-humans predictively-useful concepts from the prior it built—including new ideas & concepts that will accelerate AGI safety research.

Maybe? I'm not quite sure what you mean by "prior building AI" and whether it's even possible to apply a "microscope" to something superhuman, or that this approach is easier than other approaches, but I'm not necessarily ruling it out.

Or maybe another way of saying it would be: I think I put a lot of weight on the possibility that those "learning algorithms with strong formal guarantees" will turn out not to exist, at least not at human-level capabilities.

That's where our major disagreement is, I think. I see human brains as evidence such algorithms exist and deep learning as additional evidence. We know that powerful learning algorithms exist. We know that no algorithm can learn everything (no free lunch). What we need is a mathematical description of the space of hypotheses these algorithms are good at, and associated performance bounds. The enormous generality of these algorithms suggests that there probably is such a simple description.

...I'm having trouble imagining how that kind of thing would transfer to a domain where we need the algorithm to discover new concepts and leverage them for making better predictions, and we don't know a priori what the concepts look like, or how many there will be, or how hard they will be to find, or how well they will generalize, etc.

I don't understand your argument here. When I prove a theorem that "for all x: P(x)", I don't need to be able to imagine every possible value of x. That's the power of abstraction. To give a different example, the programmers of AlphaGo could not possibly anticipate all the strategies it came up with or all the life and death patterns it discovered. That wasn't a problem for them either.

Comment by Vanessa Kosoy (vanessa-kosoy) on Sam Altman Q&A Notes - Aftermath · 2021-09-08T14:47:13.012Z · LW · GW

Just wanted to say that you did nothing wrong IMO. Also, I feel like I got some benefit from the notes, and the accuracy criticism seemed so weak that the notes were probably fairly accurate.

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-03T21:49:05.347Z · LW · GW

I think the confusion here comes from mixing algorithms with desiderata. HDTL is not an algorithm, it is a type of desideratum that an algorithm can satisfy. "the AI's prior has a combinatorial explosion" is true but "dumb process of elimination" is false. A powerful AI has to have a very rich space of hypotheses it can learn. But this doesn't mean this space of hypotheses is explicitly stored in its memory or anything of the sort (which would be infeasible). It only means that the algorithm somehow manages to learn those hypotheses, for example by some process of adding more and more detail incrementally (which might correspond to refinement in the infra-Bayesian sense).

My thesis here is that if the AI satisfies a (carefully fleshed out in much more detail) version of the HDTL desideratum, then it is safe and capable. How to make an efficient algorithm that satisfies such a desideratum is another question, but that's a question from a somewhat different domain: specifically the domain of developing learning algorithms with strong formal guarantees and/or constructing a theory of formal guarantees for existing algorithms. I see the latter effort as to first approximation orthogonal to the effort of finding good formal desiderata for safe TAI (and, it also receives plenty of attention from outside the existential safety community).

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-02T18:29:45.332Z · LW · GW

I gave a formal mathematical definition of (idealized) HDTL, so the answer to your question should probably be contained there. But I'm not entirely sure what it is since I don't entirely understand the question.

The AI has a "superior epistemic vantage point" in the sense that, the prior is richer than the prior that humans have. But, why do we "still have the whole AGI alignment / control problem in defining what this RL system is trying to do and what strategies it’s allowed to use to do it"? The objective is fully specified.

A possible interpretation of your argument: a powerful AI would have to do something like TRL and access to the "envelope" computer can be unsafe in itself, because of possible side effects. That's truly a serious problem! Essentially, it's non-Cartesian daemons.

Atm I don't have an extremely good solution to non-Cartesian daemons. Homomorphic encryption can arguably solve it, but there's large overhead. Possibly we can make do with some kind of obfuscation instead. Another vague idea I have is, make the AI avoid running computations which have side-effects predictable by the AI. In any case, more work is needed.

Recall that in this approach we need to imitate both aspects of the human policy—plausibly-human actions, and plausibly-human world-model-updates. This seems hard, because the AI only sees the human’s actions, not its world-model updates.

I don't see why it is especially hard; it seems just like any system with unobservable degrees of freedom, which covers just about anything in the real world. So I would expect an AI with transformative capability to be able to do it. But maybe I'm just misunderstanding what you mean by this "approach number 2". Perhaps you're saying that it's not enough to accurately predict the human actions, we need to have accurate pointers to particular gears inside the model. But I don't think we do (maybe it's because I'm following approach number 1).

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-09-01T21:51:03.643Z · LW · GW

Algorithmic Information Theory

Comment by Vanessa Kosoy (vanessa-kosoy) on Social behavior curves, equilibria, and radicalism · 2021-09-01T18:26:21.278Z · LW · GW

The descriptive part is great, but the prescriptive part is a little iffy. The optimal strategy is not choosing to be "radical" or "conformist". The optimal strategy is: do a Bayesian update on the fact that many other people are doing X, and then take the highest expected utility action. Even better, try to figure out why they are doing X (for example, by asking them) and update on that. It's true that Bayesian inference is hard and heuristics such as "be at such-and-such point on the radical-conformist axis" might be helpful, but there's no reason why this heuristic is always the best you can do.
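
A minimal sketch of that recommendation, with made-up numbers: update a prior over why-people-do-X on the observation that many of them do it, then take the action with the highest posterior expected utility.

```python
import numpy as np

# Hypothetical setup: two world-states ("X is beneficial", "X is harmful"),
# a prior over them, and a likelihood for observing that many people do X.
prior = np.array([0.3, 0.7])                 # P(beneficial), P(harmful)
likelihood_many_do_x = np.array([0.9, 0.4])  # P("many people do X" | state)

posterior = prior * likelihood_many_do_x
posterior /= posterior.sum()

# Hypothetical utilities: rows are actions ("do X", "don't do X"), columns are states.
utility = np.array([[10.0, -5.0],
                    [0.0, 0.0]])

expected_utility = utility @ posterior
actions = ["do X", "don't do X"]
print(posterior.round(3), expected_utility.round(2), actions[int(np.argmax(expected_utility))])
```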

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-08-31T20:16:05.226Z · LW · GW

I'm still a bit hazy on what happens next in the plan—i.e., getting from that probabilistic model to the more abstract "what the human wants".

Well, one thing you could try is using the AIT definition of goal-directedness to go from the policy to the utility function. However, in general it might require knowledge of the human's counterfactual behavior which the AI doesn't have. Maybe there are some natural assumptions under which it is possible, but it's not clear.

It's still worth noting that I, Steve, personally can be standing in a room with another human H2, watching them cook, and I can figure out what H2 is trying to do.

I feel the appeal of this intuition, but on the other hand, it might be a much easier problem since both of you are humans doing fairly "normal" human things. It is less obvious you would be able to watch something completely alien and unambiguously figure out what it's trying to do.

....I'm even more concerned about this kind of design hitting a capabilities wall dramatically earlier than unsafe AGIs would.

To first approximation, it is enough for the AI to be more capable than us, since, whatever different solution we might come up with, an AI which is more capable than us would come up with a solution at least as good. Quantilizing from an imitation baseline seems like it should achieve that, since the baseline is "as capable as us" and arguably quantilization would produce significant improvement over that.
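
A minimal sketch of quantilizing from an imitation baseline (the baseline and utility below are hypothetical stand-ins, not the actual proposal's machinery): sample candidate actions from the baseline, keep the top q-fraction by estimated utility, and pick one of those at random.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantilize(sample_baseline, utility, q=0.1, n=1000):
    """Sample n actions from the baseline, keep the top q-fraction by utility,
    and return one of them uniformly at random."""
    actions = [sample_baseline() for _ in range(n)]
    actions.sort(key=utility, reverse=True)
    top = actions[:max(1, int(q * n))]
    return top[rng.integers(len(top))]

# Hypothetical imitation baseline (a distribution over real-valued actions a human
# might plausibly take) and a hypothetical utility estimate.
sample_baseline = lambda: rng.normal(0.0, 1.0)
utility = lambda a: -(a - 0.5) ** 2

print(quantilize(sample_baseline, utility))
```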

Specifically, I think it's important that an AGI be able to do things like "come up with a new way to conceptualize the alignment problem", and I think doing those things requires goal-seeking-RL-type exploration (e.g. exploring different possible mathematical formalizations or whatever) within a space of mental "actions" none of which it has ever seen a human take.

Instead of "actions the AI has seen a human take", a better way to think about it is "actions the AI can confidently predict a human could take (with sufficient probability)".

Comment by Vanessa Kosoy (vanessa-kosoy) on Vanessa Kosoy's Shortform · 2021-08-24T17:25:43.567Z · LW · GW

This is about right.

Notice that typically we use the AI for tasks which are hard for H. This means that without the AI's help, H's probability of success will usually be low. Quantilization-wise, this is a problem: the AI will be able to eliminate those paths for which H will report failure, but maybe most of the probability mass among apparent-success paths is still on failure (i.e. the success report is corrupt). This is why the timeline part is important.

On a typical task, H expects to fail eventually but they don't expect to fail soon. Therefore, the AI can safely consider policies of the form "in the short-term, do something H would do with marginal probability, in the long-term go back to H's policy". If by the end of the short-term maneuver H reports an improved prognosis, this can imply that the improvement is genuine (since the AI knows H is probably uncorrupted at this point). Moreover, it's possible that in the new prognosis H still doesn't expect to fail soon. This allows performing another maneuver of the same type. This way, the AI can iteratively steer the trajectory towards true success.

Comment by Vanessa Kosoy (vanessa-kosoy) on Randal Koene on brain understanding before whole brain emulation · 2021-08-23T22:12:02.809Z · LW · GW

IMO there's a fair chance that it's much easier to do alignment given WBE, since it gives you a formal specification of the entire human policy instead of just some samples. For example, we might be able to go from policy to utility function using the AIT definition of goal-directedness. So, there is some case for doing WBE before TAI if that's feasible.

Comment by Vanessa Kosoy (vanessa-kosoy) on Research agenda update · 2021-08-23T19:39:16.803Z · LW · GW

For example, lots of discussion of IRL and value learning seem to presuppose that we’re writing code that tells the AGI specifically how to model a human. To pick a random example, in Vanessa Kosoy's 2018 research agenda, the "demonstration" and "learning by teaching" ideas seem to rely on being able to do that—I don't see how we could possibly do those things if the whole world-model is a bunch of unlabeled patterns in patterns in patterns in sensory input etc.

We can at least try doing those things by just having specific channels through which human actions enter the system. For example, maybe it's enough to focus on what the human posts on Facebook, so the AI just needs to look at that. The problem with this is, it leaves us open to attack vectors in which the channel in question is hijacked. On the other hand, even if we had a robust way to point to the human brain, we would still have attack vectors in which the human themself gets "hacked" somehow.

In principle, I can imagine solving these problems by somehow having a robust definition of "unhacked human", which is what you're going for, I think. But there might be a different type of solution in which we just avoid entering "corrupt" states in which the content of the channel diverges from what we intended. For example, this might be achievable by quantilizing imitation.

Comment by Vanessa Kosoy (vanessa-kosoy) on Coase's "Nature of the Firm" on Polyamory · 2021-08-15T08:54:15.839Z · LW · GW

Platonic friendships come and go but we don't really feel emotionally hurt from them.

Wow, that's so not my experience. I get hurt by those a lot.

Comment by Vanessa Kosoy (vanessa-kosoy) on Coase's "Nature of the Firm" on Polyamory · 2021-08-14T19:48:04.015Z · LW · GW

There are two key assumptions here: (i) sex automatically implies an especially deep bond, deeper than even the deepest platonic friendship and (ii) it is nearly always suboptimal to have more than one such deep bond. Assumption (i) is extremely dubious given that one-night stands are fairly common. Assumption (ii) is harder to disprove, but personally I am skeptical about it. Yes, there can be increasing returns from investing in a relationship with one person, but there can also be diminishing returns (for example because different relationships have complementary benefits). So, while this type of argument carries some weight, I doubt it can justify monogamy on purely pragmatic grounds.

Comment by Vanessa Kosoy (vanessa-kosoy) on OpenAI Codex: First Impressions · 2021-08-13T20:40:05.351Z · LW · GW

Hmm, I suppose they might be combining the problem statement and the prompt provided by the user into a single prompt somehow, and feeding that to the network? Either that or they're cheating :)

Comment by Vanessa Kosoy (vanessa-kosoy) on Coase's "Nature of the Firm" on Polyamory · 2021-08-13T19:16:55.017Z · LW · GW

I'm not sure I follow you, but here's my steelman: Alice and Bob are in a prisoner's dilemma type of game. If Bob takes Carol as a lover while Alice doesn't have anyone else, Bob will be better off and Alice will be worse off, because Bob is getting affection from two people while Alice is only getting part of Bob's affection. If Bob takes Carol as a lover and Alice takes David as a lover, then both of them are worse-off: naively each gets 50% of the affection of each of 2 people, but this amounts to less than 100% of the affection of 1 person because of "transaction cost". So, monogamy is a norm that enforces cooperation to mutual benefit.
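
The steelman above has the standard prisoner's dilemma structure; a toy version with purely hypothetical payoffs:

```python
# Hypothetical payoffs (Alice, Bob); "C" = stay exclusive, "D" = take another lover.
payoffs = {
    ("C", "C"): (100, 100),
    ("C", "D"): (60, 120),  # only Bob takes another lover
    ("D", "C"): (120, 60),  # only Alice takes another lover
    ("D", "D"): (80, 80),   # both do, minus the "transaction cost"
}

for alice in "CD":
    for bob in "CD":
        print(alice, bob, payoffs[(alice, bob)])

# D strictly dominates C for each player (120 > 100 and 80 > 60), yet (D, D)
# leaves both worse off than (C, C): hence a norm enforcing mutual cooperation.
```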

I don't think that's really how it works, but at least I see the logic in the argument.

Comment by Vanessa Kosoy (vanessa-kosoy) on OpenAI Codex: First Impressions · 2021-08-13T18:49:29.043Z · LW · GW

IIUC, the contest was only on time, not on correctness? Because correctness was verified by some pre-defined automatic tests? If so, how was Codex deployed solo? Did they just sample it many times on the same prompt until it produced something that passed the tests? Or something more sophisticated?

Also:

In all fairness, the competition paradigm was many-to-some — everyone faced the same five problems. So, Codex will have a rich data of differentiated prompts for the same set of problems. It might give the AI a learning edge (in the case of concurrent active learning).

This makes no sense to me. Do you assume solo-Codex exploited the prompts submitted by other competitors? Or that the assistant-Codexes communicated with each other somehow? I kinda doubt either of those happened.

Comment by Vanessa Kosoy (vanessa-kosoy) on Coase's "Nature of the Firm" on Polyamory · 2021-08-13T18:08:06.772Z · LW · GW

Obviously finding multiple lovers is more work than finding one lover. But, this doesn't explain monogamy. Monogamy is not just "one lover is enough", it is "you are forbidden to take multiple lovers (even if they are easily available)". I don't think the latter is explainable by any sort of efficiency argument, it's just a question of jealousy and/or cultural convention.

And, polyamory doesn't have to be "ephemeral". How long your relationships last and how many relationships you have are a priori independent variables.

Another way to look at it: most people have many relationships, if platonic relationships are included. The difference between monogamy and polyamory is not how many people you have in your life, it's whether you are allowed to kiss / have sex with more than one of them.

Comment by Vanessa Kosoy (vanessa-kosoy) on Open and Welcome Thread – July 2021 · 2021-07-31T20:05:40.916Z · LW · GW

This is known as "agent simulates predictor". There has been plenty of discussion of this problem. I'm currently feeling too lazy to try to summarize or link all the approaches, but here are some thoughts I had about it via my infra-Bayesian theory.

Comment by Vanessa Kosoy (vanessa-kosoy) on Did they or didn't they learn tool use? · 2021-07-29T16:49:47.982Z · LW · GW

On page 28 they say:

Whilst some tasks do show successful ramp building (Figure 21), some hand-authored tasks require multiple ramps to be built to navigate up multiple floors which are inaccessible. In these tasks the agent fails.

From this, I'm guessing that it sometimes succeeds to build one ramp, but fails when the task requires building multiple ramps.

Comment by Vanessa Kosoy (vanessa-kosoy) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T22:11:05.336Z · LW · GW

I don't see what the big deal is about laws of physics. Humans and all their ancestors evolved in a world with the same laws of physics; we didn't have to generalize to different worlds with different laws. Also, I don't think "be superhuman at figuring out the true laws of physics" is on the shortest path to AIs being dangerous. Also, I don't think AIs need to control robots or whatnot in the real world to be dangerous, so they don't even need to be able to understand the true laws of physics, even on a basic level.

The entire novelty of this work revolves around zero-shot / few-shot performance: the ability to learn new tasks which don't come with astronomic amounts of training data. To evaluate to which extent this goal has been achieved, we need to look at what was actually new about the tasks vs. what was repeated in the training data a zillion times. So, my point was, the laws of physics do not contribute to this aspect.

Moreover, although the laws of physics are fixed, we didn't evolve to know all of physics. Lots of intuition about 3D geometry and mechanics: definitely. But there are many, many things about the world we had to learn. A bronze age blacksmith possessed sophisticated knowledge about the properties of materials and their interaction that did not come from their genes, not to mention a modern rocket scientist. (Ofc, the communication of knowledge means that each of them benefits from training data acquired by other people and previous generations, and yet.) And, learning is equivalent to performing well on a distribution of different worlds.

Finally, an AI doesn't need to control robots to be dangerous but it does need to create sophisticated models of the world and the laws which govern it. That doesn't necessarily mean being good at the precise thing we call "physics" (e.g. figuring out quantum gravity), but it is a sort of "physics" broadly construed (so, including any area of science and/or human behavior and/or dynamics of human societies etc.)

I agree it would be a bigger deal if they could use e.g. first-order logic, but not that much of a bigger deal? Put it this way: wanna bet about what would happen if they retrained these agents, but with 10x bigger brains and for 10x longer, in an expanded environment that supported first-order logic?

I might be tempted to take some such bet, but it seems hard to operationalize. Also hard to test unless DeepMind will happen to perform this exact experiment.

Comment by Vanessa Kosoy (vanessa-kosoy) on DeepMind: Generally capable agents emerge from open-ended play · 2021-07-27T20:46:16.699Z · LW · GW

This is certainly interesting! To put things in proportion though, here are some limitations that I see, after skimming the paper and watching the video:

  • The virtual laws of physics are always the same. So, the sense in which this agent is "generally capable" is only via the geometry and the formal specification of the goal. Which is still interesting to be sure! But not as a big deal as it would be if it did zero-shot learning of physics (which would be an enormous deal IMO).
  • The formal specification is limited to propositional calculus. This allows for a combinatorial explosion of possible goals, but there's still some sense in which it is "narrow". It would be a bigger deal if it used some more expressive logical language.
  • For some tasks it looks like the agent is just trying vaguely relevant things at random until it achieves the goal. So, it is able to recognize the goal has been achieved, but less able to come up with efficient plans for achieving it. While "trying stuff until something sticks" is definitely a strategy I can relate to, it is not as impressive as planning in advance. Notice that just recognizing the goal is relatively easy: modulo the transformation from 2D imagery to a 3D model (which is certainly non-trivial but not a novel capability), you don't need AI to do it at all (indeed the environment obviously computes the reward via handcrafted code).
Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-24T08:50:10.426Z · LW · GW

First, negotiations theory has progressed past game theory solutions to a more psychologically based methodology.

Hmm. Do you have a reference which is not, like, an entire book?

Second, the second vow focuses too much on the target (a KS solution bargain to the disagreement) and too little on the process.

Well, the process is important, but I feel like the discourse norms exemplified by this community already have us covered there, give or take.

Third, KS systems (like other game theory approaches) are difficult to quantify. It's hard to assign a dollar value to taking out the trash versus doing the dishes.

It's not dollar value, it's utilon value. I agree that quantification is challenging, but IMO it only reflects the complexity of the underlying reality that we have to deal with one way or the other. In principle, you can always quantify the utility function by asking enough questions of the form "do I prefer this lottery over outcomes to that lottery over outcomes".
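
A minimal sketch of that lottery-based quantification, with a simulated respondent standing in for the actual "do I prefer this lottery to that one?" questions: binary-search the probability p at which you are indifferent between a middle outcome for sure and a lottery between the best and worst outcomes; that p is the middle outcome's utilon value on a 0-1 scale.

```python
def elicit_utility(prefers_lottery, iters=20):
    """Standard von Neumann-Morgenstern elicitation: find the probability p of the
    best outcome (else the worst) at which the respondent is indifferent between
    the lottery and getting the middle outcome for sure."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p  # the indifference point (i.e. the utility) lies below p
        else:
            lo = p  # it lies above p
    return (lo + hi) / 2

# Hypothetical respondent whose true utility for the middle outcome is 0.35
# on a scale where the worst outcome is 0 and the best is 1.
true_u = 0.35
prefers_lottery = lambda p: p > true_u

print(round(elicit_utility(prefers_lottery), 3))  # ~0.35
```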

Fourth, a KS/game-theory negotiations approach to disputes promotes an accounting approach to your relationship.

I think that all relationships are already accounting, people are just not always honest about it. Problems arise from people having different expectations / standards of fairness that they expect others to follow while never negotiating them explicitly. The latter is what we want to avoid here.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-24T08:32:06.940Z · LW · GW

Ah, super fair. Splitting any outside income 50/50 would still work, I think.

I think that a 50/50 split creates the wrong incentives, but I am reluctant to discuss this in public.

PS: Of course this was also prompted by us nerding out about your and Marcus's vows so thank you again for sharing this. I'm all heart-eyes every time I think about it!

<3

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-24T08:18:43.750Z · LW · GW

I had an extremely painful and emotional divorce myself, so I am aware. Although, I tend to reject the idea that emotions prevent you from thinking straight. I think that's a form of strategic self-deception.

Strictly speaking, the vows don't say all decisions must be unanimous (although if they aren't it becomes kinda tricky to define the bargaining solution). However, arguably, if both of us follow the vows and we have common knowledge about this, we should arrive at unanimous decisions[1]. This is the desirable state. On the other hand, it's also possible that one of us breaks the vows, or erroneously believes that the other broke the vows[2], in which case unilateral action might be consistent with the vows. So, there's no implication that unilateral divorce is completely forbidden.


  1. By Aumann agreement, but even if we have different priors so Aumann agreement doesn't apply, we should still be able to state those priors and compute the bargaining solution on that basis. ↩︎

  2. Further levels of recursion are ruled out if we assume that one is not allowed to dismiss the vows on account of an unconscionable violation by the other party without declaring this to the other party, which is probably a good clause to add. ↩︎

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T15:00:54.902Z · LW · GW

Let me first state, that this is quite inspiring!

Thank you!

70% of the couples that marry these days (meaning, Millenials and Generation X) are subject to get a divorce within a decade...

I wanted to inquire, without judgement, how do you reconcile with this fact?

I am divorced myself, and my previous marriage lasted about a decade. Still, I don't know if there's much to reconcile. Obviously there is always risk that the marriage will fail. Equally obviously, staying without a primary lover forever is a worse alternative (for me).

Previous generations did not have WhatsApp, and so they missed each other truly. They did not have Instagram, and so they did not lust after things they do not have nor did they constantly focused on what is 'missing' in their life. They did not have facebook or Linkedin, and so they did not spend the time running the rat-race, chasing passions to prove to others a point.

Luckily I don't have Instagram or Facebook, and although I have a LinkedIn account, I don't engage with the content there. I do have WhatsApp but I'm skeptical that it's really so bad.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T14:52:35.950Z · LW · GW

One way to interpret this is "I will do my best effort to follow the optimal policy". On the other hand, when you're optimizing for just your own utility function, one could argue that the "best effort" is exactly equal to the optimal policy once you take constraints and computational/logical uncertainty into account. On the third hand, perhaps for bargaining the case for identifying "best effort" and "optimal" is weaker. In practice, what's important is that even if you followed a suboptimal policy for a while, there's a well-defined way to return to optimal behavior. This is true for Nash bargaining (because of independence of irrelevant alternatives), less so for KS! Which is why I'm leaning towards switching to Nash. And if I fail to even make the best effort, there's the clause about how to amend.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T14:39:36.637Z · LW · GW

In the original paper they have "Assumption 4" which clearly states they disregard solutions that don't dominate the disagreement point. But, you have a good point that when those solutions are taken into account, you don't really have monotonicity.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T11:03:44.429Z · LW · GW

First of all, this is awesome.

Thank you :)

It seems kind of odd that terrible solutions like (1000, -10^100) could determine the outcome (I realize they can't be the outcome, but still).

I think you might be misunderstanding how KS works. The "best" values in KS are those that result when you optimize one player's payoff under the constraint that the second player's payoff is higher than the disagreement payoff. So, you completely ignore outcomes where one of us would be worse off in expectation than if we didn't marry.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T10:55:23.002Z · LW · GW

We do have margin for minor violations of the vows, as long as they are not "unconscionable". Granted, we don't have a precise definition of "unconscionable", but certainly if both of us agree that a violation is not unconscionable then it isn't.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-23T10:51:09.986Z · LW · GW

Marcus has a chronic illness. This means their contribution to the household can vary unpredictably, practically on any timescale. As a result, it's hard to think of any split that's not going to be skewed in one or other direction in some scenarios. Moreover, they are unable to hold a job, so their time doesn't have opportunity cost in a financial sense.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T21:17:06.311Z · LW · GW

Our first question is whether you intend to merge your finances.

We do, at least because I'm the only one who has income.

My next question is why the KS solution vs the Nash solution to the bargaining problem?

I'm actually not sure about this. Initially I favored KS because monotonicity seemed more natural than independence of irrelevant alternatives. But then I realized that in sequential decision making, IIA is important because it allows you to consistently optimize your policy on a certain branch of the decision tree even if you made suboptimal actions in the past. There seems to be a way of working around this violation of IIA in KS, but it makes KS look more like a hack.
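
A minimal sketch contrasting the two solutions on a toy feasible set (hypothetical utilities, disagreement point at the origin): Nash maximizes the product of gains over the disagreement point, while Kalai-Smorodinsky picks the Pareto point whose gains are proportional to each player's ideal (maximum attainable) gain.

```python
import numpy as np

# Hypothetical Pareto frontier: player 1's utility x in [0, 1], player 2's utility
# y = 1 - x^2, with disagreement point (0, 0) and ideal point (1, 1).
x = np.linspace(0.0, 1.0, 100001)
y = 1.0 - x**2

# Nash bargaining: maximize the product of gains over the disagreement point.
nash = np.argmax(x * y)

# Kalai-Smorodinsky: the Pareto point on the line from the disagreement point to the
# ideal point, i.e. where the ratio of gains equals the ratio of ideal gains (here 1).
ks = np.argmin(np.abs(y / np.maximum(x, 1e-12) - 1.0))

print("Nash:", (round(x[nash], 3), round(y[nash], 3)))  # ~(0.577, 0.667)
print("KS:  ", (round(x[ks], 3), round(y[ks], 3)))      # ~(0.618, 0.618)
```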

But also are you sure the Shapley value doesn't make more sense here?

Shapley value is for distributing transferable values, like money. For general utility functions there's no easy way to convert 1 Vanessa-utilon to 1 Marcus-utilon.
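
For contrast, a minimal Shapley-value computation for a transferable (monetary) surplus, with hypothetical numbers; the averaging of marginal contributions below relies on being able to add the two players' payoffs, which is exactly what fails for general utilons.

```python
from itertools import permutations

# Hypothetical characteristic function: money each coalition could generate on its own.
v = {(): 0, ("M",): 4, ("V",): 10, ("M", "V"): 20}
value = lambda coalition: v[tuple(sorted(coalition))]

players = ["V", "M"]
orders = list(permutations(players))
shapley = {p: 0.0 for p in players}
for order in orders:
    coalition = []
    for p in order:
        before = value(coalition)
        coalition.append(p)
        shapley[p] += (value(coalition) - before) / len(orders)

print(shapley)  # average marginal contributions: {'V': 13.0, 'M': 7.0}
```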

Thanks so much for sharing this. It's so sweet and nerdy and heart-warming and wonderful! And congratulations!

Thank you <3

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T21:07:05.504Z · LW · GW

Firstly I think that " 's " is a grammar mistake and it should just read "...or until my [spouse] breaks..." instead.

You're right, thanks!

Allowing yourself to cancel following your vows because your spouse willfully stopped following theirs is a little dangerous. It leads to situations where you might rather justify your own breach of the vows by pointing to their breach instead of trying to make things right.

I agree that it's a possible failure mode, but the alternative seems worse? Suppose that my spouse starts completely disregarding the vows and breaking them egregiously. Do you really think I should still follow my own vows to the letter?

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T21:02:11.000Z · LW · GW

We, as in humans, are poorly defined, barely conscious, irrational, lumps of meat. We are not aware of our own utility functions let alone those of others, especially as they change over time and chaotically in the course of a day. We are unable to follow a precise recipe like the one you have outlined.

I'm not convinced. I have a rather favorable view of human agency and rationality compared to the distribution of opinions in this community, and I think it's not the place to hash out these differences. For our present purpose, just assume that we are able (or at least give concrete examples of where we will fail).

I think the only thing you can communicate with your vows is the spirit of your words not their contractual meaning. That's why you often hear in others vows poorly defined meaningless things like "I feel you and I were destined to be together" or "I love you with all of my heart".

I think that marriage is a contract, except that usually it's unwritten. Which often enough leads to disaster when the two spouses have different notions of what the contract entails. People prefer meaningless things over explicit contracts for signaling reasons, but we explicitly decided to pursue a different strategy.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T20:50:34.551Z · LW · GW

Very interesting - and congratulations!

Thank you :)

It strikes me that the first vow will sometimes conflict with the second.

Well, yes, the intent is that the Vow of Honesty takes precedence over the Vow of Concord.

Have you considered going meta? "I make the set of vows determined by the Kalai-Smorodinski solution to the bargaining problem..."

I'm not sure what's the difference between "set of vows" and "policy"? When I say "policy" I refer to the set of behaviors we are actually capable of choosing from, including computational and other constraints.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T20:46:13.463Z · LW · GW

A more precise formulation would be: "when choosing what information to pass on, optimize solely for your best estimate of the spouse's utility function".

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T20:41:57.082Z · LW · GW

That's why I wrote "in the counterfactual in which the source of said doubt or dispute would be revealed to us and understood by us with all of its implications at that time as well as we understand it at the time it actually surfaced", so we do use the new information and experience.

The reason I want to anchor it to our present selves is because at present we are fairly aligned. We have pretty good common understanding of what we want form these vows. On the other hand, the appearance of a dispute in the future might be the result of us becoming unaligned. This would create the temptation for each of us to interpret the vows in their own favor. The wedding-time anchor mitigates that somewhat, because it requires us to argue with a straight face that our wedding-time-selves would endorse the given interpretation.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-22T20:30:24.299Z · LW · GW

So then, these vows could only be made if you have an extremely high level of already having untangled yourself / the elephant, such that it's even possible for you to not (self-)deceive.

I believe that it's always possible for you to not self-deceive. The only real agent is the "elephant". The conscious self is just a "mask" this agent wears, by choice. It can equally well choose to wear a different mask if that benefits it.

What's "on purpose" doing here?

I just mean that there is an intent to deceive, rather than an accidental miscommunication.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-21T18:56:10.834Z · LW · GW

Thank you :)

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-21T18:51:04.112Z · LW · GW

Thank you for sharing. I'm sorry it worked out so poorly for you!

It sounds like your situation was not at all Pareto efficient? If so, this Vow of Concord would not preclude you from divorce? Notice that the Vow does not say that both spouses must locally prefer divorce for divorce to happen. It only says that divorce must be part of the bargaining-optimal policy.

For example, consider the following scenario:

  • If we wouldn't get married, our payoffs would be .
  • With probability we will have a mutually beneficial marriage in which each has payoff .
  • With probability the marriage will be detrimental to me (payoff ) and beneficial to my spouse (payoff ).
  • With probability the marriage will be beneficial to me (payoff ) and detrimental (w.r.t counterfactual) to my spouse (payoff ).

(Here "beneficial" and "detrimental" are to be understood relatively to the counterfactual in which this marriage didn't happen.)

Then, the policy "never get divorced" has payoff vector whereas the policy "stay married iff the marriage is mutually beneficial" has payoff vector . The latter is Pareto dominant therefore the Vow of Concord will select the latter.

Now let's change the payoffs:

  • If we wouldn't get married, our payoffs would be .
  • With probability we will have a mutually beneficial marriage in which each has payoff .
  • With probability the marriage will be detrimental to me (payoff ) and beneficial to my spouse (payoff ).
  • With probability the marriage will be beneficial to me (payoff ) and detrimental to my spouse (payoff ).

Now "never get divorced" has payoff vector while "stay married iff the marriage is mutually beneficial" still has payoff vector . The Vow of Concord requires us to stay married. Without the Vow (or decision-theoretic ability to simulate the Vow), both of us would be worse off in expectation!

In practice, I think that miserable marriages are virtually guaranteed to land in the "get divorced" territory, since they tend to make both spouses miserable.
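
A sketch of the structure of this argument with stand-in payoffs (purely hypothetical; the no-marriage / divorce payoff is normalized to (0, 0)): in the first scenario "stay married iff the marriage is mutually beneficial" Pareto-dominates "never get divorced", and in the second scenario it is the other way around.

```python
import numpy as np

def expected_payoffs(outcomes, policy):
    """outcomes: list of (probability, (my_payoff, spouse_payoff)) for staying married.
    policy(me, spouse) -> True to stay married; otherwise both get the divorce payoff (0, 0)."""
    total = np.zeros(2)
    for prob, (me, sp) in outcomes:
        total += prob * (np.array([me, sp]) if policy(me, sp) else np.zeros(2))
    return total

never_divorce = lambda me, sp: True
iff_mutually_beneficial = lambda me, sp: me > 0 and sp > 0

# Scenario 1 (hypothetical): the one-sided outcomes are bad enough that
# "stay married iff mutually beneficial" Pareto-dominates "never get divorced".
scenario1 = [(0.6, (2, 2)), (0.2, (-3, 1)), (0.2, (1, -3))]
# Scenario 2 (hypothetical): the one-sided outcomes are mildly bad for one spouse
# but very good for the other, so "never get divorced" Pareto-dominates.
scenario2 = [(0.6, (2, 2)), (0.2, (-1, 5)), (0.2, (5, -1))]

for name, scenario in [("scenario 1", scenario1), ("scenario 2", scenario2)]:
    print(name,
          "| never divorce:", expected_payoffs(scenario, never_divorce),
          "| iff mutually beneficial:", expected_payoffs(scenario, iff_mutually_beneficial))
```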

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-21T14:58:26.200Z · LW · GW

Well, at any given moment we will use the best-guess decision theory we have at the time.

Comment by Vanessa Kosoy (vanessa-kosoy) on My Marriage Vows · 2021-07-21T14:53:04.270Z · LW · GW

To phrase my intent more precisely: whatever the decision theory we will come to believe in[1] is, we vow to behave in a way which is the closest analogue in that decision theory of the formal specification we gave here in the framework of ordinary Bayesian sequential decision making.


  1. It is also possible we will disagree about decision theory. In that case, I guess we need to defer to whatever is the most concrete "metadecision theory" we can agree upon. ↩︎