janos-kramar

Posts
Comments

Posts

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2) 2025-03-26T19:07:48.710Z

JumpReLU SAEs + Early Access to Gemma 2 SAEs 2024-07-19T16:10:54.664Z

Improving Dictionary Learning with Gated Sparse Autoencoders 2024-04-25T18:43:47.003Z

[Full Post] Progress Update #1 from the GDM Mech Interp Team 2024-04-19T19:06:59.185Z

[Summary] Progress Update #1 from the GDM Mech Interp Team 2024-04-19T19:06:17.755Z

AtP*: An efficient and scalable method for localizing LLM behaviour to components 2024-03-18T17:28:37.513Z

Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5) 2023-12-23T02:46:25.892Z

Fact Finding: How to Think About Interpreting Memorisation (Post 4) 2023-12-23T02:46:16.675Z

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3) 2023-12-23T02:46:05.517Z

Fact Finding: Simplifying the Circuit (Post 2) 2023-12-23T02:45:49.675Z

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) 2023-12-23T02:44:24.270Z

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla 2023-07-20T10:50:58.611Z

Infinite Modal Combat: some observations 2015-07-29T04:05:14.000Z

A tractable, interpretable formulation of approximate conditioning for pairwise-specified probability distributions over truth values 2015-06-03T19:08:18.000Z

Comments

Comment by János Kramár (janos-kramar) on On Defense Mechanisms · 2018-03-05T22:13:58.167Z · LW · GW

Seems also like the "playing dead" behaviour. If you're under attack and aren't going to summon/indicate allies (via sadness) or enforce your boundary yourself (via anger) or appease the attacker (via submission), another option is to give up on active response and hope that if you play dead just right, they'll lose interest for some reason. Many attackers' goals are better served by a responsive opponent; and attacking someone dead is both potentially unhealthy and no fun.

Comment by János Kramár (janos-kramar) on Concise Open Problem in Logical Uncertainty · 2016-01-12T21:08:10.000Z · LW · GW

Ah, I think I can stymy $M$ with 2 nonconstant advisors. Namely, let $A_{1} (n) = \frac{1}{2} - \frac{1}{n + 3}$ and $A_{2} (n) = \frac{1}{2} + \frac{1}{n + 3}$ . We (setting up an adversarial $E$ ) precommit to setting $E (n) = 0$ if $p (n) \geq A_{2} (n)$ and $E (n) = 1$ if $p (n) \leq A_{1} (n)$ ; now we can assume that $M$ always chooses $p (n) \in [A_{1} (n), A_{2} (n)]$ , since this is better for $M$ .

Now define $b_{i}^{'} (j) = | A_{i} (j) + E (j) - 1 | - | p (j) + E (j) - 1 |$ and $b_{i} (n) = \sum_{j < n} b_{i}^{'} (j)$ . Note that if we also define ${bad}_{i} (n) = \sum_{j < n} (log | A_{i} (j) + E (j) - 1 | - log | p (j) + E (j) - 1 |)$ then $\sum_{j < n} | 2 b_{i} (j) - {bad}_{i} (j) | \leq \sum_{j < n} (2 A_{1} (j) - 1 - log (2 A_{1} (j)))) = \sum_{j < n} O ({(\frac{1}{2} - A_{1} (j))}_{1}^{2})$ is bounded; therefore if we can force $b_{1} (n) \to \infty$ or $b_{2} (n) \to \infty$ then we win.

Let's reparametrize by writing $δ (n) = A_{2} (n) - A_{1} (n) = \frac{2}{n + 3}$ and $q (n) = \frac{p (n) - A_{1} (n)}{δ (n)}$ , so that $b_{i}^{'} (j) = δ (j) (| i - 2 + E (j) | - | q (j) - 1 + E (j) |)$ .

Now, similarly to how $M$ worked for constant advisors, let's look at the problem in rounds: let $s_{0} = 0$ , and $s_{n} = ⌊ exp (s_{n - 1} - 1) ⌋ + 1$ for $n > 0$ . When determining $E (s_{n - 1}), \dots, E (s_{n} - 1)$ , we can look at $p (s_{n - 1}), \dots, p (s_{n} - 1)$ . Let $t_{n} = ⌊ b_{2} (s_{n}) - \frac{1}{n} ⌋$ . Let's set $E (s_{n - 1}), \dots, E (s_{n} - 1)$ to 1 if $\sum_{j = s_{n - 1}}^{s_{n} - 1} δ (j) (1 - q (j)) \geq 1$ ; otherwise we'll do something more complicated, but maintain the constraint that $b_{2} (s_{n}) \geq b_{2} (s_{n - 1}) - \frac{1}{n (n - 1)} \geq t_{n - 1} + \frac{1}{n}$ : this guarantees that $t_{n}$ is nondecreasing and that ${l i m i n f}_{j \to \infty} b_{2} (j) \geq {lim}_{n \to \infty} t_{n}$ .

If $t_{n} \to \infty$ then $b_{2} (n) \to \infty$ and we win. Otherwise, let $t = {lim}_{n \to \infty} t_{n}$ , and consider $n$ such that $t_{n - 1} = t$ .

We have $\sum_{j = s_{n - 1}}^{s_{n} - 1} δ (j) (1 - q (j)) < 1$ . Let $J \subseteq {s_{n - 1}, \dots, s_{n} - 1}$ be a set of indices with $q (j) \geq q (j^{'})$ for all $j \in J, j^{'} \notin J$ , that is maximal under the constraint that $\sum_{j \in J} δ (j) (1 - q (j)) \leq \frac{1}{n (n - 1)}$ ; thus we will still have $\sum_{j \in J} δ (j) (1 - q (j)) \geq \frac{1}{n (n - 1)} - δ (s_{n - 1})$ . We shall set $E (j) = 0$ for all $j \in J$ .

By the definition of $J$ : $\begin{matrix} \sum j \in J b_{1}^{'} (j) & = \sum j \in J δ (j) q (j) \geq \sum j \in J δ (j) (1 - q (j)) \frac{\sum_{j = s_{n - 1}}^{s_{n} - 1} δ (j) q (j)}{\sum_{j = s_{n - 1}}^{s_{n} - 1} δ (j) (1 - q (j))} \geq (\frac{1}{n (n - 1)} - δ (s_{n - 1})) \frac{\sum_{j = s_{n - 1}}^{s_{n} - 1} δ (j) - 1}{1} \geq (\frac{1}{n (n - 1)} - δ (s_{n - 1})) (2 log (\frac{s_{n} + 3}{s_{n - 1} + 3}) - 1) \geq 2 if n ≫ 0 \end{matrix}$

For $j^{'} \notin J$ , we'll proceed iteratively, greedily minimizing $∣ ∣ \sum_{j^{'} = s_{n - 1}}^{j} 1_{j^{'} \notin J} (b_{1}^{'} (j^{'}), b_{2}^{'} (j^{'})) ∣ ∣$ . Then: $\begin{matrix} min s_{n - 1} \leq j < s_{n} j \sum j^{'} = s_{n - 1} 1_{j^{'} \notin J} b_{1}^{'} (j^{'}) & \geq - \sqrt{s_{n} - 1 \sum j = s_{n - 1} δ (j)^{2}} = - 2 \sqrt{s_{n} - 2 \sum j = s_{n - 1} + 3 \frac{1}{j^{2}}} \geq - 2 \sqrt{s_{n} - 2 \sum j = s_{n - 1} + 3 (\frac{1}{j - 1} - \frac{1}{j})} \geq - \frac{2}{\sqrt{s_{n - 1} + 2}} \geq - 1 if n ≫ 0 \end{matrix}$

Keeping this constraint, we can flip (or not flip) all the $E (j^{'})$ s for $j^{'} \notin J$ so that $\sum_{j^{'} = s_{n - 1}}^{s_{n} - 1} 1_{j^{'} \notin J} b_{2}^{'} (j^{'}) > 0$ . Then, we have $b_{2} (s_{n}) \geq b_{2} (s_{n - 1}) - \frac{1}{n (n - 1)}$ , $b_{1} (s_{n}) - b_{1} (s_{n - 1}) = \sum_{j = s_{n - 1}}^{s_{n}} (1_{j \in J} + 1_{j \notin J}) b_{1}^{'} (j) \geq 2 - 1 = 1$ if $n ≫ 0$ , and for $s_{n - 1} \leq j \leq s_{n}$ , $b_{1} (j) \geq b_{1} (s_{n - 1}) + \sum_{j^{'} = s_{n - 1}}^{j - 1} 1_{j^{'} \notin J} b_{1}^{'} (j^{'}) \geq b_{1} (s_{n - 1}) - 1$ if $n ≫ 0$ .

Therefore, $b_{1} (j) \to \infty$ , so we win.

Comment by János Kramár (janos-kramar) on Concise Open Problem in Logical Uncertainty · 2016-01-12T20:45:12.000Z · LW · GW

I don't yet know whether I can extend it to two nonconstant advisors, but I do know I can extend it to a countably infinite number of constant-prediction advisors. Let $(P_{i})_{i = 0, \dots}$ be an enumeration of their predictions that contains each one an infinite number of times. Then:

def M(p, E, P):
    prev, this, next = 0, 0, 1
    def bad(i):
        return sum(log(abs((E[k] + P[i] - 1) /
                           (E[k] + p[k] - 1)))
                   for k in xrange(prev))
    for k in xrange(this, next): p[k] = 0.5
    prev, this, next = this, next, floor(exp(next - 1)) + 1

    for i in xrange(0, Inf):
        for k in xrange(this, next): p[k] = P[i]
        prev, this, next = this, next, floor(exp(next - 1)) + 1

bad(i) is now up to date through E[:this], not just E[:prev]

        bound = 2 * bad(i)
        for j in xrange(0, Inf):
            if P[j] == P[i]: continue
            flip = P[j] < P[i]
            p1, p2 = abs(P[i] - flip), abs(P[j] - flip)
            for k in xrange(this, next): p[k] = abs(p1 - flip)
            prev, this, next = this, next, floor(exp(next - 1)) + 1
            
            if bad(i) <= 0: break
            while bad(i) > 0 and bad(j) > 0:
                # won't let bad(i) surpass bound
                eps = (bound - bad(i)) / 2 / abs(1 - p1 - flip) / (next - this)

This is just for early iterations of the inner loop; in the limit, eps should be just enough for bad(i) to go halfway to bound if we let p = abs(p1 + eps - flip):

                while eps >= 1 - p1 or
                      bound <= bad(i) + (next - this) *
                        log((1 - p1) / (1 - p1 - eps)):
                    eps /= 2
                for k in xrange(this, next): p[k] = abs(p1 + eps - flip)
                prev, this, next = this, next, floor(exp(next - 1)) + 1

                for k in xrange(this, next): p[k] = abs(p1 - flip)
                # this is where the P[i] + d * eps affects bad(i)

Consider $q = \frac{log (1 - p 1) - log (1 - p 2)}{log (1 - p 1) - log (1 - p 2) + log (p 2) - log (p 1)}$ . This $q$ is the probability between p1 and p2 such that if E[k] is chosen with probability $| q - f l i p |$ then that will have an equal impact on bad(i) and bad(j). Now consider some $q^{'}$ between p1 and $q$ . Every iteration where $mean (| E [p r e v : t h i s] - f l i p |) \leq q^{'}$ will decrease bad(j) by a positive quantity that's at least linear in this-prev, so (at least after the first few such iterations) this will exceed $p r e v \cdot - log ({max}_{k : P_{k} has been reached} max (P_{k}, 1 - P_{k})) > b a d (j)$ , so it will turn bad(j) negative. If this happens for all j then M cannot be bad for E. If it doesn't, then let's look at the first j where it doesn't. After a finite number of iterations, every iteration must have $mean (| E [p r e v : t h i s] - f l i p |) > q^{'}$ . However, this will cause bad(i) to decrease by a positive quantity that's at least proportional to bound - bad(i); therefore, after a finite number of such iterations, we must reach $b a d (i) < 0$ . So if M is bad for E then for each value of i we will eventually make $b a d (i) < 0$ and then move on to the next value of i. This implies M is not bad for E.

Emboldened by this, we can also consider the problem of building an $M$ that isn't outperformed by any constant advisor. However, this cannot be done, according to the following handwavy argument:

Let $q$ be some incompressible number, and let $E (i) i i d \sim Bern (q)$ . When computing $p (n)$ , $M$ can't do appreciably better than Laplace's law of succession, which will give it standard error $\sqrt{\frac{q (1 - q)}{log (n)}}$ , and relative badness $\sim \frac{q (1 - q)}{log (n)} (\frac{1}{q} + \frac{1}{1 - q}) = \frac{1}{log (n)}$ (relative to the $q$ -advisor) on average. For $i \leq n$ , and $n ≫ 0$ , the greatest deviation of the badness from the $\sum_{j = 2}^{i} \frac{1}{log (j)} \geq \frac{i - 1}{log (i)}$ trend is $\approx \sqrt{2 n log log (n) q (1 - q)}$ (according to the law of the iterated logarithm), which isn't enough to counteract the expected badness; therefore the badness will converge to infinity.

Comment by János Kramár (janos-kramar) on Concise Open Problem in Logical Uncertainty · 2015-12-16T21:27:24.000Z · LW · GW

def M(p, E):
    p1, p2 = 1./3, 2./3
    prev, this, next = 0, 0, 1

bad1 and bad2 compute log-badnesses of M relative to p1 and p2, on E[:prev]; the goal of M is to ensure neither one goes to $\infty$ . prev, this, next are set in such a way that M is permitted access to this when computing p[this:next].

    def bad(advisor):
        return lambda:
            sum(log(abs((E[i] + advisor(i) - 1) /
                        (E[i] + p[i] - 1)))
                for i in xrange(prev))
    bad1, bad2 = bad(lambda i: p1), bad(lambda i: p2)
    for i in xrange(this, next): p[i] = 0.5
    prev, this, next = this, next, floor(exp(next - 1)) + 1

    while True:
        for i in xrange(this, next): p[i] = p1
        prev, this, next = this, next, floor(exp(next - 1)) + 1

bad1() is now be up to date through E[:this], not just E[:prev]

        bound = 2 * bad1()
        while bad1() > 0:
            # won't let bad1() surpass bound
            eps = (bound - bad1()) / 2 / (1 - p1) / (next - this)

This is just for early iterations; in the limit, eps should be just enough for bad1 to go halfway to bound:

            while eps >= 1 - p1 or
              bound <= bad1() + (next - this) *
              log((1 - p1) / (1 - p1 - eps)):
                eps /= 2
            for i in xrange(this, next): p[i] = p1 + eps
            prev, this, next = this, next, floor(exp(next - 1)) + 1

            for i in xrange(this, next): p[i] = p1
            # this is where the p1 + eps affects bad1()
            prev, this, next = this, next, floor(exp(next - 1)) + 1

Now every iteration (after the first few) where $mean (E [p r e v : t h i s]) \leq \frac{2}{5}$ will decrease bad2() by roughly at least $(t h i s - p r e v) (log (1 - p 1) - log (1 - p 2) + \frac{2}{5} (log (p 1) - log (p 2) - log (1 - p 1) + log (1 - p 2))) = (t h i s - p r e v) \frac{1}{5} log (2) ≫ p r e v$ , which is large enough to turn bad2() negative. Therefore, if M is bad for E, there can be only finitely many such iterations until the loop exits. However, every iteration where $mean (E [p r e v : t h i s]) \geq \frac{2}{5}$ will cause bound - bad1() to grow exponentially (by a factor of $\frac{11}{10} = 1 + \frac{1}{2} (- 1 + \frac{2}{5} \frac{1}{p 1})$ ), so the loop will terminate.

Now we'll perform the same procedure for bad2():

        for i in xrange(this, next): p[i] = p2
        prev, this, next = this, next, floor(exp(next - 1)) + 1

        bound = 2 * bad2()
        while bad2() > 0:
            # won't let bad2() surpass bound
            eps = (bound - bad2()) / 2 / p2 / (next - this)
            while eps >= p2 or
              bound <= bad2() + (next - this) *
              log( p2 / (p2 - eps)):
                eps /= 2
            for i in xrange(this, next): p[i] = p2 - eps
            prev, this, next = this, next, floor(exp(next - 1)) + 1

            for i in xrange(this, next): p[i] = p2
            # this is where the p2 - eps affects bad2()
            prev, this, next = this, next, floor(exp(next - 1)) + 1

For the same reasons as the previous loop, this loop either stops with bad2() < 0 or runs forever with bad2() bounded and bad1 repeatedly falling back below 0.

Therefore, this algorithm either gets trapped in one of the inner while loops (and succeeds) or turns bad1() and bad2() negative, each an infinite number of times, and therefore succeeds.

Comment by János Kramár (janos-kramar) on Stationary algorithmic probability · 2015-06-21T22:56:09.000Z · LW · GW

These results are still a bit unsatisfying.

The first half constructs an invariant measure which is then shown to be unsatisfactory because UTMs can rank arbitrarily high while only being good at encoding variations of themselves. This is mostly the case because the chain is transient; if it was positive recurrent then the measure would be finite, and UTMs ranking high would have to be good at encoding (and being encoded by) the average UTM rather than just a select family of UTMs.

The second half looks at whether we can get better results (ie a probability measure) by restricting our attention to output-free "UTMs" (though I misspoke; these are not actually UTMs but rather universal semidecidable languages (we can call them USDLs)). It concludes that we can't if the measure will be continuous on the given digraph - however, this is an awkward notion of continuity: a low-complexity USDL whose behavior is tweaked very slightly but in a complex way may be very close in the given topology, but should have measure much lower than the starting USDL. So I consider this question unanswered.

Comment by János Kramár (janos-kramar) on A tractable, interpretable formulation of approximate conditioning for pairwise-specified probability distributions over truth values · 2015-06-20T16:34:14.000Z · LW · GW

There is a lot more to say about the perspective that isn't relaxed to continuous random variables. In particular, the problem of finding the maximum entropy joint distribution that agrees with particular pairwise distributions is closely related to Markov Random Fields and the Ising model. (The relaxation to continuous random variables is a Gaussian Markov Random Field.) It is easily seen that this maximum entropy joint distribution must have the form $log Pr (1_{φ_{1}}, \dots, 1_{φ_{n}}) = \sum_{i < j} θ_{i j} 1_{φ_{i} \land φ_{j}} + \sum_{i} θ_{i} 1_{φ_{i}} - log Z$ where $log Z$ is the normalizing constant, or partition function. This is an appealing distribution to use, and easy to do conditioning on and to add new variables to. Computing relative entropy reduces to finding bivariate marginals and to computing $Z$ , and computing marginals reduces to computing $Z$ , which is intractable in general[^istrail], though easy if the Markov graph (ie the graph with edges $i j$ for $θ_{i, j} \neq 0$ ) is a forest.

There have been many approaches to this problem (Wainwright and Jordan[^wainwright] is a good survey), but the main ways to extend the applicability from forests have been:

decompose components of the graph as "junction trees", ie trees whose nodes are overlapping clusters of nodes from the original graph; this permits exact computation with cost exponential in the cluster-sizes, ie in the treewidth[^pearl]
make use of clever combinatorial work coming out of statistical mechanics to do exact computation on "outerplanar" graphs, or on general graphs with cost exponential in the (outer-)graph genus[^schraudolph]
find nodes such that conditioning on those nodes greatly simplifies the graph (eg makes it singly connected), and sum over their possible values explicitly (this has cost exponential in the number of nodes being conditioned on)

A newer class of models, called sum-product networks[^poon], generalizes these and other models by writing the total joint probability as a positive polynomial $1 = \sum_{x_{1}, \dots, x_{n} = 0}^{1} Pr (1_{φ_{1}} = x_{1}, \dots, 1_{φ_{n}} = x_{n}) 1_{φ_{1}}^{x_{1}} 1_{{¯ φ}_{1}}^{1 - x_{1}} \dots 1_{φ_{n}}^{x_{n}} 1_{{¯ φ}_{n}}^{1 - x_{n}}$ in the variables $1_{φ_{1}}, 1_{{¯ φ}_{1}}, \dots, 1_{φ_{n}}, 1_{{¯ φ}_{n}}$ and requiring only that this polynomial be simplifiable to an expression requiring a tractable number of additions and multiplications to evaluate. This allows easy computation of marginals, conditionals, and KL divergence, though it will likely be necessary to do some approximate simplification every so often (otherwise the complexity may accumulate, even with a fixed maximum number of sentences being considered at a time).

However, if we want to stay close to the context of the Non-Omniscience paper, we can do approximate calculations of the partition function on the complete graph - in particular, the Bethe partition function[^weller] has been widely used in practice, and while it's not logconvex like $Z$ is, it's often a better approximation to the partition function than well-known convex approximations such as TRW.

[^istrail]: Istrail, Sorin. "Statistical mechanics, three-dimensionality and NP-completeness: I. Universality of intractability for the partition function of the Ising model across non-planar surfaces." In Proceedings of the thirty-second annual ACM symposium on Theory of computing, pp. 87-96. ACM, 2000.

[^weller]: Weller, Adrian. "Bethe and Related Pairwise Entropy Approximations."

[^pearl]: Pearl, Judea. "Probabilistic reasoning in intelligent systems: Networks of plausible reasoning." (1988).

[^schraudolph]: Schraudolph, Nicol N., and Dmitry Kamenetsky. "Efficient Exact Inference in Planar Ising Models." arXiv preprint arXiv:0810.4401 (2008).

[^wainwright]: Wainwright, Martin J., and Michael I. Jordan. "Graphical models, exponential families, and variational inference." Foundations and Trends® in Machine Learning 1, no. 1-2 (2008): 1-305.

[^poon]: Poon, Hoifung, and Pedro Domingos. "Sum-product networks: A new deep architecture." In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pp. 689-690. IEEE, 2011.

Comment by János Kramár (janos-kramar) on Stationary algorithmic probability · 2015-06-18T18:16:12.000Z · LW · GW

In order to understand what the measure $μ$ that was constructed from $d$ will reward, here's the sort of machine that comes close to ${sup}_{M} μ (M) = 3$ :

Let $M_{0}$ be an arbitrary UTM. Now consider the function $r (n) = n - 2^{⌊ lg n ⌋}$ (or, really, any function $r : N^{+} \to N^{0}$ with $r (n) < n$ that visits every nonnegative integer infinitely many times), and let $L = {x \in {0, 1}^{*} : | x | > 2, x_{| x | - 1} = x_{r (| x | - 1)}, x_{| x | - 2} = x_{r (| x | - 2)}}$ . (The indices here are zero-based.) Choose $x_{0} \in L$ such that $x_{0}$ has no proper prefix in $L$ . Then, construct the UTM $M$ that does:

repeat:
    s := ""
    while s not in L:
        # if there is no next character, halt
        s := s + readchar()
    if s == x0:
        break
M0()

This $M$ will have $μ (M) > 3 - 2^{- | x_{0} |} + d (M_{0}, M) 2^{- | x_{0} | - d (M_{0}, M)}$ .

$M$ here is optimized for building up internal states (that are then UTMs that are efficiently encoded), while also being very easy to reset from these internal states; in other words being easy to "encode" from the UTMs it efficiently encodes, using at most 2 bits (an average of $\frac{1 + \sqrt{5}}{2}$ ). This is somewhat interesting, but clearly doesn't capture the kind of computational expressivity we're primarily interested in.

Comment by János Kramár (janos-kramar) on Stationary algorithmic probability · 2015-06-18T18:07:13.000Z · LW · GW

Consider the function $a (M_{1}, M_{2}) = 2^{- d (M_{1}, M_{2}) - d (M_{2}, M_{1})}$ where $d (M_{1}, M_{2}) = min (| x | | x \in {0, 1}^{*} : \forall y \in {0, 1}^{*} : M_{1} (x y) = M_{2} (y) unless neither of these halts)$ . The reversible Markov chain with transition probabilities $p (M_{1}, M_{2}) = \frac{a (M_{1}, M_{2})}{\sum_{M_{2}^{'}} a (M_{1}, M_{2}^{'})}$ has a bounded positive invariant measure $μ (M) = \sum_{M^{'}} a (M, M^{'})$ . Of course, as the post showed, the total measure is infinite. Also, because the chain is reversible and transient, the invariant measure is far from unique - indeed, for any machine $M_{0}$ , the measure $μ (M) = p^{(0)} (M, M_{0}) + 2 \sum_{n = 1}^{\infty} p^{(n)} (M, M_{0})$ will be a bounded positive invariant measure.

It seems tempting (to me) to try to get a probability measure by modding out the output-permutations (that the post uses to show this isn't possible for the full set of UTMs). To this end, consider the set of UTMs that have no output. (These will be unaffected by the output-permutations.) We can try to use the induced sub-digraph on these to build a probability measure $μ$ . The measure of each UTM should be a function of the rooted edge-labeled digraph $G_{M}$ rooted at that UTM.

The most natural topology on rooted edge-labeled infinite digraphs is the one generated by the sets ${G : G^{'} is isomorphic to an induced rooted edge-labeled subgraph of G}$ where $G^{'}$ ranges over finite rooted edge-labeled digraphs - we could hope that $μ$ is continuous according to this topology. Unfortunately, this can't work: if $μ (M) > 0$ then $μ^{- 1} ((\frac{1}{2} μ (M), \infty))$ must be open, and so it must contain some finite intersection of the generating sets; however, every such intersection that's nonempty (as this one is) contains infinitely many UTMs, so the total measure must be infinite as well.

Actually, on further thought, I think the best thing to use here is a log-bilinear distribution over the space of truth-assignments. For these, it is easy to efficiently compute exact normalizing constants, conditional distributions, marginal distributions, and KL divergences; there is no impedance mismatch. KL divergence minimization here is still a convex minimization (in the natural parametrization of the exponential family).

The only shortcoming is that 0 is not a probability, so it won't let you eg say that $Pr (φ_{1} \to φ_{2}) = 1$ ; but this can be remedied using a real or hyperreal approximation.

An easy way to get rid of the probabilities-outside-[0,1] problem in the continuous relaxation is to constrain the "conditional"/updated distribution to have $Var (1_{φ_{i}} ∣ ∣ \dots) \leq E (1_{φ_{i}} ∣ ∣ \dots) (1 - E (1_{φ_{i}} ∣ ∣ \dots))$ (which is a convex constraint; it's equivalent to $Var (1_{φ_{i}} ∣ ∣ \dots) + {(E (1_{φ_{i}} ∣ ∣ \dots) - \frac{1}{2})}^{2}$ ), and then minimize KL-divergence accordingly.

The two obvious flaws are that the result of updating becomes ordering-dependent (though this may not be a problem in practice), and that the updated distribution will sometimes have $Var (1_{φ_{i}} ∣ ∣ \dots) < E (1_{φ_{i}} ∣ ∣ \dots) (1 - E (1_{φ_{i}} ∣ ∣ \dots))$ , and it's not clear how to interpret that.

Comment by János Kramár (janos-kramar) on Stationary algorithmic probability · 2015-06-04T16:31:15.000Z · LW · GW

It may still be possible to get a unique (up to scaling) invariant measure (with infinite sum) over the UTMs by invoking something like the Krein-Rutman theorem and applying it to the transition operator. I haven't yet verified that all the conditions hold.

This measure would then be an encoding-invariant way to compare UTMs' "intrinsic complexity" in the sense of "number of bits needed to simulate".

Comment by János Kramár (janos-kramar) on No Good Logical Conditional Probability · 2015-06-02T00:36:29.000Z · LW · GW

This is interesting! I would dispute, though, that a good logical conditional probability must be able to condition on arbitrary, likely-non-r.e. sets of sentences.

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-05-11T02:35:55.000Z · LW · GW

What's the harm in requiring prior coordination, considering there's already a prior agreement to follow a particular protocol involving $A_{i}$ s? (And something earlier on in the context about a shared source of randomness to ensure convexity of the feasible set.)

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-05-07T02:04:05.000Z · LW · GW

If the fairness constraints are all pairwise (ie each player has fairness curves for each opponent), then the scheme generalizes directly. Slightly more generally, if each player's fairness set is weakly convex and closed under componentwise max, the scheme still generalizes directly (in effect the componentwise max creates a fairness curve which can be intersected with the $x y z = A_{i}$ surfaces to get the $(x_{i}, y_{i}, z_{i})$ points.

In order to generalize fully, the agents should each precommunicate their fairness sets. In fact, after doing this, the algorithm is very simple: player X can compute what it believes is the optimal- $x y z$ feasible-and-fair-according-to-everyone point $(x, y, z)$ (which is unique because these are all convex sets), and if PA proves the outcome will be fair-according-to-X, then output $x$ ; otherwise output 0.

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-04-29T22:46:31.000Z · LW · GW

You did miss something: namely from PA+2 X wants to show feasibility of $(\frac{m}{2} x_{i_{0}}, y)$ , not $(\frac{m}{2}, y)$ . In your example, $x_{i_{0}} = 3$ , so the Löbian circle you describe will fail.

I'll walk through what will happen in the example.

The $A_{i}$ are just areas (ie $x_{i} y_{i}$ ), not rectangles. In this example, $A_{1} = 6$ is enough to contain $(2, 3)$ and $(3, 2)$ . For conciseness let's have $A_{1} = 6$ , $A_{2} = 4$ , and $m = 3$ (so $A_{3} = 0$ ).

Both X and Y have $i_{0} = 1$ . According to X, $(x_{1}, y_{1}) = (3, 2)$ , $(x_{2}, y_{2}) = (2, 2)$ , and $(x_{3}, y_{3}) = (0, 0)$ .

First the speculative phase will happen:

X will try to prove in PA+1 that $y \leq 2$ and that $(\frac{3}{1} \cdot 3, y)$ is in the feasible set. The latter is immediately false, so this will fail.

Next, X will try to prove in PA+2 that $y \leq 2$ and that $(\frac{3}{2} \cdot 3, y)$ is in the feasible set. This too is immediately false.

Next, X will try to prove in PA+3 that $y \leq 2$ and that $(\frac{3}{3} \cdot 3, y)$ is feasible (in short, that $y = 2$ ), and return 3 if so. This will fail to reach a cycle.

Then the bargaining phase:

Next, X will try to prove in PA+4 that $y \leq 2$ and that $(3, y)$ is feasible, and return 3 if so. This will fail identically.

Next, X will try to prove in PA+5 that $y \leq 2$ and that $(2, y)$ is feasible, and return 2 if so. This will reach a Löbian circle, and X and Y will both return 2, which is what we want.

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-04-25T21:53:34.000Z · LW · GW

How about a gridless scheme like:

The agents agree that they will each output how much utility they will get, and if they fail to choose a feasible point they both get 0.

Now discretize the possible "rectangle areas": let them be $A_{1} > . . . > A_{m} = 0$ . (This requires a way to agree on that, but this seems easier than agreeing on grid points; the finer the better, basically. Perhaps the most obvious way to do it is to have these be evenly spaced from $A_{m} = 0$ to $A_{1}$ ; then only $A_{1}$ and $m$ need to be agreed upon.)

X will do the following:

let $A_{i_{0}}$ be the first area in this list that is achieved by a feasible point on X's fairness curve. (We will assume the fairness curve is weakly convex and weakly increasing.)

for $i = i_{0}, \dots, m - 1$ , let $(x_{i}, y_{i})$ be the "fair" utility distribution that achieves area $A_{i}$ ; let $(x_{m}, y_{m}) = (0, 0)$ .

let $y$ be Y's output.

# Speculative phase:

if $P A + 1$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{1} x_{i_{0}}, y)$ is feasible, return $\frac{m}{1} x_{i_{0}}$ .

else if $P A + 2$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{2} x_{i_{0}}, y)$ is feasible, return $\frac{m}{2} x_{i_{0}}$ .

$⋮$

else if $P A + m$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{m} x_{i_{0}}, y)$ is feasible, return $\frac{m}{m} x_{i_{0}}$ .

# Bargaining phase:

else if $P A + m + i_{0}$ proves $y \leq y_{i_{0}}$ and $(x_{i_{0}}, y)$ is feasible, return $x_{i_{0}}$ .

else if $P A + m + i_{0} + 1$ proves $y \leq y_{i_{0} + 1}$ and $(x_{i_{0} + 1}, y)$ is feasible, return $x_{i_{0} + 1}$ .

$⋮$

else if $P A + m + m - 1$ proves $y \leq y_{m - 1}$ and $(x_{m - 1}, y)$ is feasible, return $x_{m - 1}$ .

else return $x_{i_{0}}$ .

If both agents follow protocol, the result is guaranteed to be feasible and to not be above either agent's fairness curve.

Furthermore, if the intersection of the fairness curves is at a feasible point that produces rectangle area $\geq A_{i}$ for some $i$ and X and Y follow the protocol then on the $P A + m + i$ step, they will be able to agree on the coordinatewise min of their proposed feasible points with area $A_{i}$ .

If they're both really generous and their fairness curves don't intersect inside the feasible region, the given protocol will agree on approximately the highest feasible multiple of how much they thought they were fairly due, avoiding utility sacrifice.

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-04-25T21:09:16.000Z · LW · GW

How about a gridless scheme like:

The agents agree that they will each output how much utility they will get, and if they fail to choose a feasible point they both get 0.

X will do the following:

let $A_{i_{0}}$ be the first area in this list that is achieved by a feasible point on X's fairness curve. (We will assume the fairness curve is weakly convex and weakly increasing.)

for $i = i_{0}, \dots, m - 1$ , let $(x_{i}, y_{i})$ be the "fair" utility distribution that achieves area $A_{i}$ ; let $(x_{m}, y_{m}) = (0, 0)$ .

let $y$ be Y's output.

# Speculative phase:

if $P A + 1$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{1} x_{i_{0}}, y)$ is feasible, return $\frac{m}{1} x_{i_{0}}$ .

else if $P A + 2$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{2} x_{i_{0}}, y)$ is feasible, return $\frac{m}{2} x_{i_{0}}$ .

$⋮$

else if $P A + m$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{m} x_{i_{0}}, y)$ is feasible, return $\frac{m}{m} x_{i_{0}}$ .

# Bargaining phase:

else if $P A + m + i_{0}$ proves $y \leq y_{i_{0}}$ and $(x_{i_{0}}, y)$ is feasible, return $x_{i_{0}}$ .

else if $P A + m + i_{0} + 1$ proves $y \leq y_{i_{0} + 1}$ and $(x_{i_{0} + 1}, y)$ is feasible, return $x_{i_{0} + 1}$ .

$⋮$

else if $P A + m + m - 1$ proves $y \leq y_{m - 1}$ and $(x_{m - 1}, y)$ is feasible, return $x_{m - 1}$ .

else return $x_{i_{0}}$ .

If both agents follow protocol, the result is guaranteed to be feasible and to not be above either agent's fairness curve.

Comment by János Kramár (janos-kramar) on Modal Bargaining Agents · 2015-04-25T00:57:14.000Z · LW · GW

How about a gridless scheme like:

The agents agree that they will each output how much utility they will get, and if they fail to choose a feasible point they both get 0.

X will do the following:

let $A_{i_{0}}$ be the first area in this list that is achieved by a feasible point on X's fairness curve. (We will assume the fairness curve is weakly convex and weakly increasing.)

for $i = i_{0}, \dots, m - 1$ , let $(x_{i}, y_{i})$ be the "fair" utility distribution that achieves area $A_{i}$ ; let $(x_{m}, y_{m}) = (0, 0)$ .

let $y$ be Y's output.

# Speculative phase:

if $P A + 1$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{1} x_{i_{0}}, y)$ is feasible, return $\frac{m}{1} x_{i_{0}}$ .

else if $P A + 2$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{2} x_{i_{0}}, y)$ is feasible, return $\frac{m}{2} x_{i_{0}}$ .

$⋮$

else if $P A + m$ proves $y \leq y_{i_{0}}$ and $(\frac{m}{m} x_{i_{0}}, y)$ is feasible, return $\frac{m}{m} x_{i_{0}}$ .

# Bargaining phase:

else if $P A + m + i_{0}$ proves $y \leq y_{i_{0}}$ and $(x_{i_{0}}, y)$ is feasible, return $x_{i_{0}}$ .

else if $P A + m + i_{0} + 1$ proves $y \leq y_{i_{0} + 1}$ and $(x_{i_{0} + 1}, y)$ is feasible, return $x_{i_{0} + 1}$ .

$⋮$

else if $P A + m + m - 1$ proves $y \leq y_{m - 1}$ and $(x_{m - 1}, y)$ is feasible, return $x_{m - 1}$ .

else return $x_{i_{0}}$ .

If both agents follow protocol, the result is guaranteed to be feasible and to not be above either agent's fairness curve.

User info

Posts

Comments