Open problem: thin logical priors

post by TsviBT · 2017-01-11T20:00:08.000Z · score: 5 (5 votes) · LW · GW


In short, and at a high level, the problem of thin priors is to understand how an agent can learn logical facts and make use of them in its predictions, without setting up a reflective instability across time. Before the agent knows the fact, it is required by logical uncertainty to “care about” worlds where the fact does not hold; after it learns the fact, it might no longer care about those worlds; so the ignorant agent has different goals than the knowing agent. This problem points at a hole in our basic understanding, namely how to update on logical facts; logical induction solves much of logical uncertainty, but doesn’t clarify how to update on computations, since many logical facts are learned “behind the scenes” by traders.


The ideas in this post seem to have been discussed for some time. Jessica brought them up in crisper form in a conversation with me a while ago, and also came up with the name; this post is largely based on ideas from that conversation and some subsequent ones with other people, possibly refined / reframed.

Background / Motivation

It would be nice to have a reflectively stable decision theory (i.e. a decision theory that largely endorses itself to continue making decisions, over other potential targets of self-modification); this is the most basic version of averting / containing instrumental goals, which is arguably necessary in some form to make a safe agent. Agents that choose policies using beliefs that have been updated on (logical) observations seem to be unstable, presenting an obstacle. More specifically, we have the following line of reasoning:

If we could write down a prior over logical statements that was thin enough to be computable, but rich enough to be useful for selecting policies (which may depend on or imply further computations), then we might be able to write down a reflectively stable agent.

Problem statement


Type signature

A natural type for a thin prior is $\Delta(\{0,1\}^\omega)$, a distribution on sequence space. We may want to restrict to distributions that assign probability 1 to propositionally consistent worlds (that is, we may want to fix an encoding of sentences). We may also want to restrict to distributions that are computable or efficiently computable—that is, the function $D \mapsto \mathbb{P}(D)$ is computable using an amount of time that is some reasonable function of $|D|$, where $D$ is a finite dictionary of results of computations.
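As a toy, purely illustrative sketch of this type signature (the names `worlds`, `prior`, and `P` are mine, not from the post): restricting to the first N computations, a thin prior can be represented as a distribution over finite worlds, with the probability of a finite dictionary of computation results obtained by summing over the worlds consistent with it.

```python
import itertools

# Toy "thin prior": a distribution over worlds, where a world assigns a
# truth value to each of the first N sentences/computations.
# (Hypothetical illustration; the post leaves the encoding abstract.)
N = 3
worlds = list(itertools.product([0, 1], repeat=N))

# Uniform for simplicity; a real thin prior would concentrate on
# propositionally consistent worlds.
prior = {w: 1 / len(worlds) for w in worlds}

def P(dictionary):
    """Probability of a finite dictionary {index: result} of computation results."""
    return sum(p for w, p in prior.items()
               if all(w[i] == v for i, v in dictionary.items()))

# Coherence check: fixing computation 0's result both ways partitions the event.
assert abs(P({1: 1}) - (P({0: 0, 1: 1}) + P({0: 1, 1: 1}))) < 1e-12
print(P({}), P({0: 1}))  # prints: 1.0 0.5
```

The computability restriction in the text then amounts to asking that `P` run in time some reasonable function of the size of its input dictionary, rather than of the whole (infinite) sequence space.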

Another possible type is $\Pi : \mathrm{Dict} \to \Delta(\{0,1\}^\omega)$, where $\mathrm{Dict}$ is the set of finite dictionaries of results of computations. That is, a thin “prior” is not a prior, but rather a possibly more general system of counterfactuals, where $\Pi(D)$ is intended to be interpreted as the agent’s “best guess at what is true in the counterfactual world in which computations behave as specified by $D$”. Given the condition that $\Pi(D) = \Pi(\varnothing)(\cdot \mid D)$—that is, that counterfactuals are obtained by conditioning a single distribution—this is equivalent to just a fixed distribution in $\Delta(\{0,1\}^\omega)$. But since this condition can be violated, as in e.g. causal counterfactuals, this type signature is strictly more general. (We could go further and distinguish background known facts, facts to counterfact on, and unclamped facts.)
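The more general counterfactual type can be sketched the same way (again, all names here are hypothetical): a map `Pi` from finite dictionaries to full distributions over worlds. The special case where `Pi` conditions a single base distribution collapses back to the fixed-prior type; a causal-style `Pi` need not have this form.

```python
import itertools

# Toy counterfactual system Pi: maps a finite dictionary D of computation
# results to a full distribution over worlds (the agent's "best guess at
# the counterfactual world in which computations behave as D says").
N = 2
worlds = list(itertools.product([0, 1], repeat=N))
base = {w: 1 / len(worlds) for w in worlds}

def condition(dist, D):
    """Bayesian conditioning of dist on the event described by D."""
    mass = {w: p for w, p in dist.items() if all(w[i] == v for i, v in D.items())}
    total = sum(mass.values())
    return {w: p / total for w, p in mass.items()}

def Pi_conditional(D):
    # The special case: counterfactuals given by conditioning one base prior.
    return condition(base, D)

# In this special case Pi carries no more information than the fixed
# distribution `base`:
assert Pi_conditional({}) == base
# A causal-style Pi could instead assign positive mass to worlds that
# conditioning `base` would rule out, which is why the Dict -> distribution
# type is strictly more general.
```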

In place of $\Delta(\{0,1\}^\omega)$ we might instead put $\mathrm{Act} \to \Delta(\{0,1\}^\omega)$ for a set $\mathrm{Act}$ of possible actions, meaning that the prior is not just prior probabilities, but rather prior beliefs about counterfactual worlds given that the agent takes different possible actions.
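A minimal sketch of this action-indexed variant, with a toy action set and hypothetical names throughout: each action gets its own distribution over worlds, and an agent could then select among actions by expected utility under these counterfactual beliefs.

```python
import itertools

# Toy action-dependent prior: for each action, a distribution over worlds
# (the agent's prior counterfactual beliefs given that it takes that action).
worlds = list(itertools.product([0, 1], repeat=2))

def skewed_dist(favored):
    # Weight worlds 3:1 toward those agreeing with `favored` on coordinate 0.
    weights = {w: (3 if w[0] == favored else 1) for w in worlds}
    total = sum(weights.values())
    return {w: wt / total for w, wt in weights.items()}

prior_given_action = {
    "a0": skewed_dist(0),
    "a1": skewed_dist(1),
}

def expected_utility(action, utility):
    dist = prior_given_action[action]
    return sum(p * utility(w) for w, p in dist.items())

# A utility that rewards worlds where computation 0 outputs 1:
best = max(prior_given_action, key=lambda a: expected_utility(a, lambda w: w[0]))
print(best)  # prints: a1
```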



Comments sorted by top scores.

comment by paulfchristiano · 2017-01-12T19:56:56.000Z · score: 2 (2 votes) · LW(p) · GW(p)

I think the fact that traders are updating "behind the scenes" is an important problem with logical inductors (and with Solomonoff induction, though the logical inductors case is philosophically clearer to think about). It seems more natural to me to study that problem in the purely epistemic setting though.

In particular, there are conditions where we systematically expect traders to predict badly, e.g. because some of them are consequentialists and by predicting badly they can influence us in a desired way. As a result, although logical inductors are reflectively consistent in the limit, at finite times we don't approximately trust their judgments (even after they have run for more than long enough to update on all of the logical facts that we know).

I am more interested in progress on this problem than in the application to decision theory (and I think that the epistemic version is equally philosophically appealing), so if I were thinking about thin priors I would have a somewhat different focus.

    the notion of a good thin prior might be partially dependent on subjective human judgments, and so not amenable to math

I agree with this, but if we lower the bar from “correct” to “not actively bad” it feels like there ought to be a solution.

comment by TsviBT · 2017-01-14T08:51:22.000Z · score: 0 (0 votes) · LW(p) · GW(p)

I agree that the epistemic formulation is probably more broadly useful, e.g. for informed oversight. The decision theory problem is additionally compelling to me because of the apparent paradox of having a changing caring measure. I naively think of the caring measure as fixed, but this is apparently impossible because, well, you have to learn logical facts. (This leads to thoughts like "maybe EU maximization is just wrong; you don't maximize an approximation to your actual caring function".)