Posts

Thinking About Filtered Evidence Is (Very!) Hard 2020-03-19T23:20:05.562Z · score: 76 (23 votes)
Bayesian Evolving-to-Extinction 2020-02-14T23:55:27.391Z · score: 35 (12 votes)
A 'Practice of Rationality' Sequence? 2020-02-14T22:56:13.537Z · score: 71 (23 votes)
Instrumental Occam? 2020-01-31T19:27:10.845Z · score: 31 (11 votes)
Becoming Unusually Truth-Oriented 2020-01-03T01:27:06.677Z · score: 98 (38 votes)
The Credit Assignment Problem 2019-11-08T02:50:30.412Z · score: 64 (20 votes)
Defining Myopia 2019-10-19T21:32:48.810Z · score: 28 (6 votes)
Random Thoughts on Predict-O-Matic 2019-10-17T23:39:33.078Z · score: 27 (11 votes)
The Parable of Predict-O-Matic 2019-10-15T00:49:20.167Z · score: 188 (66 votes)
Partial Agency 2019-09-27T22:04:46.754Z · score: 53 (15 votes)
The Zettelkasten Method 2019-09-20T13:15:10.131Z · score: 143 (62 votes)
Do Sufficiently Advanced Agents Use Logic? 2019-09-13T19:53:36.152Z · score: 41 (16 votes)
Troll Bridge 2019-08-23T18:36:39.584Z · score: 73 (42 votes)
Conceptual Problems with UDT and Policy Selection 2019-06-28T23:50:22.807Z · score: 52 (13 votes)
What's up with self-esteem? 2019-06-25T03:38:15.991Z · score: 39 (18 votes)
How hard is it for altruists to discuss going against bad equilibria? 2019-06-22T03:42:24.416Z · score: 52 (15 votes)
Paternal Formats 2019-06-09T01:26:27.911Z · score: 60 (27 votes)
Mistakes with Conservation of Expected Evidence 2019-06-08T23:07:53.719Z · score: 148 (47 votes)
Does Bayes Beat Goodhart? 2019-06-03T02:31:23.417Z · score: 46 (15 votes)
Selection vs Control 2019-06-02T07:01:39.626Z · score: 112 (30 votes)
Separation of Concerns 2019-05-23T21:47:23.802Z · score: 70 (22 votes)
Alignment Research Field Guide 2019-03-08T19:57:05.658Z · score: 204 (74 votes)
Pavlov Generalizes 2019-02-20T09:03:11.437Z · score: 68 (20 votes)
What are the components of intellectual honesty? 2019-01-15T20:00:09.144Z · score: 32 (8 votes)
CDT=EDT=UDT 2019-01-13T23:46:10.866Z · score: 42 (11 votes)
When is CDT Dutch-Bookable? 2019-01-13T18:54:12.070Z · score: 25 (4 votes)
CDT Dutch Book 2019-01-13T00:10:07.941Z · score: 27 (8 votes)
Non-Consequentialist Cooperation? 2019-01-11T09:15:36.875Z · score: 46 (15 votes)
Combat vs Nurture & Meta-Contrarianism 2019-01-10T23:17:58.703Z · score: 61 (16 votes)
What makes people intellectually active? 2018-12-29T22:29:33.943Z · score: 90 (43 votes)
Embedded Agency (full-text version) 2018-11-15T19:49:29.455Z · score: 95 (38 votes)
Embedded Curiosities 2018-11-08T14:19:32.546Z · score: 86 (34 votes)
Subsystem Alignment 2018-11-06T16:16:45.656Z · score: 121 (39 votes)
Robust Delegation 2018-11-04T16:38:38.750Z · score: 120 (39 votes)
Embedded World-Models 2018-11-02T16:07:20.946Z · score: 90 (27 votes)
Decision Theory 2018-10-31T18:41:58.230Z · score: 100 (36 votes)
Embedded Agents 2018-10-29T19:53:02.064Z · score: 193 (83 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 2) 2018-10-09T05:41:25.282Z · score: 17 (6 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 1) 2018-10-04T04:32:49.483Z · score: 21 (7 votes)
In Logical Time, All Games are Iterated Games 2018-09-20T02:01:07.205Z · score: 83 (26 votes)
Track-Back Meditation 2018-09-11T10:31:53.354Z · score: 61 (25 votes)
Exorcizing the Speed Prior? 2018-07-22T06:45:34.980Z · score: 11 (4 votes)
Stable Pointers to Value III: Recursive Quantilization 2018-07-21T08:06:32.287Z · score: 20 (9 votes)
Probability is Real, and Value is Complex 2018-07-20T05:24:49.996Z · score: 44 (20 votes)
Complete Class: Consequentialist Foundations 2018-07-11T01:57:14.054Z · score: 43 (16 votes)
Policy Approval 2018-06-30T00:24:25.269Z · score: 49 (18 votes)
Machine Learning Analogy for Meditation (illustrated) 2018-06-28T22:51:29.994Z · score: 101 (38 votes)
Confusions Concerning Pre-Rationality 2018-05-23T00:01:39.519Z · score: 36 (7 votes)
Co-Proofs 2018-05-21T21:10:57.290Z · score: 91 (25 votes)
Bayes' Law is About Multiple Hypothesis Testing 2018-05-04T05:31:23.024Z · score: 81 (20 votes)

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-05T03:30:23.185Z · score: 2 (1 votes) · LW · GW

Ahhh ok.

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-05T03:29:17.850Z · score: 5 (3 votes) · LW · GW

I'm left with the feeling that you don't see the problem I'm pointing at.

My concern is that the most plausible world where you aren't a pure optimizer might look very very different, and whether this very very different world looks better or worse than the normal-looking world does not seem very relevant to the current decision.

Consider the "special exception selves" you mention -- the Nth exception-self has a hard-coded exception "go right if it's been at least N turns and you've gone right at most 1/N of the time".

Now let's suppose that the worlds which give rise to exception-selves are a bit wild. That is to say, the rewards in those worlds have pretty high variance. So a significant fraction of them have quite high reward -- let's just say 10% of them have value much higher than is achievable in the real world.

So we expect that by around N=10, there will be an exception-self living in a world that looks really good.

This suggests to me that the policy-dependent-source agent cannot learn to go left > 90% of the time, because once it crosses that threshold, the exception-self in the really good looking world is ready to trigger its exception -- so going right starts to appear really good. The agent goes right until it is under the threshold again.
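The arithmetic behind "around N=10" and "> 90%" can be sketched as follows. This is only a rough illustration: it assumes each exception-world is independently high-value with probability 0.1 (the 10% figure above), and it ignores the "at least N turns" part of the trigger. The function names are made up for the example.

```python
def expected_lucky_worlds(n: int, p_high: float = 0.1) -> float:
    """Expected number of high-value worlds among the first n exception-worlds."""
    return p_high * n


def chance_of_lucky_world(n: int, p_high: float = 0.1) -> float:
    """Probability that at least one of the first n exception-worlds is high-value,
    assuming (for illustration only) that the worlds are independent."""
    return 1.0 - (1.0 - p_high) ** n


def left_rate_that_triggers(n: int) -> float:
    """The Nth exception fires once the agent has gone right at most 1/n of the time,
    i.e. once it has gone left at least 1 - 1/n of the time."""
    return 1.0 - 1.0 / n


for n in (2, 5, 10, 20):
    print(n, expected_lucky_worlds(n), chance_of_lucky_world(n), left_rate_that_triggers(n))
```

Under these assumptions the expected number of really-good-looking exception-worlds reaches 1 at N=10, and the N=10 exception is the one that fires when the agent goes left at least 90% of the time.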

If that's true, then it seems to me rather bad: the agent ends up repeatedly going right in a situation where it should be able to learn to go left easily. Its reason for repeatedly going right? There is one enticing world, which looks much like the real world, except that in that world the agent definitely goes right. Because that agent is a lucky agent who gets a lot of utility, the actual agent has decided to copy its behavior exactly -- anything else would prove the real agent unlucky, which would be sad.

Of course, this outcome is far from obvious; I'm playing fast and loose with how this sort of agent might reason.

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-03T20:00:21.583Z · score: 7 (4 votes) · LW · GW

If you see your source code is B instead of A, you should anticipate learning that the programmers programmed B instead of A, which means something was different in the process. So the counterfactual has implications backwards in physical time.

At some point it will ground out in: different indexical facts, different laws of physics, different initial conditions, different random events...

I'm not sure how you are thinking about this. It seems to me like this will imply really radical changes to the universe. Suppose the agent is choosing between a left path and a right path. Its actual programming will go left. It has to come up with alternate programming which would make it go right, in order to consider that scenario. The most probable universe in which its programming would make it go right is potentially really different from our own. In particular, it is a universe where it would go right despite everything it has observed, a lifetime of (updateless) learning, which in the real universe, has taught it that it should go left in situations like this.

EG, perhaps it has faced an iterated 5&10 problem, where left always yields 10. It has to consider alternate selves who, faced with that history, go right.

It just seems implausible that thinking about universes like that will result in systematically good decisions. In the iterated 5&10 example, perhaps universes where its programming fails iterated 5&10 are universes where iterated 5&10 is an exceedingly unlikely situation; so in fact, the reward for going right is quite unlikely to be 5, and very likely to be 100. Then the AI would choose to go right.

Obviously, this is not necessarily how you are thinking about it at all -- as you said, you haven't given an actual decision procedure. But the idea of considering only really consistent counterfactual worlds seems quite problematic.

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-03T19:44:04.503Z · score: 2 (1 votes) · LW · GW

Conditioning on ‘A(obs) = act’ is still a conditional, not a counterfactual. The difference between conditionals and counterfactuals is the difference between “If Oswald didn’t kill Kennedy, then someone else did” and “If Oswald didn’t kill Kennedy, then someone else would have”.

I still disagree. We need a counterfactual structure in order to consider the agent as a function A(obs). EG, if the agent is a computer program, the function would contain all the counterfactual information about what the agent would do if it observed different things. Hence, considering the agent's computer program as such a function leverages an ontological commitment to those counterfactuals.

To illustrate this, consider counterfactual mugging where we already see that the coin is heads -- so, there is nothing we can do, we are at the mercy of our counterfactual partner. But suppose we haven't yet observed whether Omega gives us the money.

A "real counterfactual" is one which can be true or false independently of whether its condition is met. In this case, if we believe in real counterfactuals, we believe that there is a fact of the matter about what we do in the case, even though the coin came up heads. If we don't believe in real counterfactuals, we instead think only that there is a fact of how Omega is computing "what I would have done if the coin had been tails" -- but we do not believe there is any "correct" way for Omega to compute that.

The material-conditional representation and the conditional-probability representation both appear to satisfy this test of non-realism. The first is always true if the observation is false, so, lacks the ability to vary independently of the observation. The second is undefined when the observation is false, which is perhaps even more appealing for the non-realist.

Now consider the functional representation A(obs) = act. The value of A at a given observation can still vary even when we know that observation is false. So, it fails this test -- it is a realist representation!

Putting something into functional form imputes a causal/counterfactual structure.
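A minimal sketch of the test, assuming the three candidates are a material conditional, a conditional probability, and a function from observations to actions. The coin outcomes and action names ("keep", "pay", "refuse") are hypothetical labels chosen for the example, not anything from the post.

```python
from typing import Dict, Optional, Tuple


def material_conditional(obs: bool, act: bool) -> bool:
    """'obs implies act': automatically true whenever obs is false."""
    return (not obs) or act


def conditional_probability(joint: Dict[Tuple[bool, bool], float]) -> Optional[float]:
    """P(act | obs) from a joint distribution over (obs, act).

    Returns None when P(obs) = 0, where the ratio is undefined.
    """
    p_obs = joint.get((True, True), 0.0) + joint.get((True, False), 0.0)
    if p_obs == 0.0:
        return None
    return joint.get((True, True), 0.0) / p_obs


# Functional form: two agents that behave identically on the observed input
# ("heads") but differ on the input that was never observed ("tails").
agent_a = {"heads": "keep", "tails": "pay"}
agent_b = {"heads": "keep", "tails": "refuse"}

# With a false observation, the material conditional cannot vary...
print(material_conditional(False, True), material_conditional(False, False))
# ...the conditional probability is undefined...
print(conditional_probability({(False, True): 0.5, (False, False): 0.5}))
# ...but the function still carries a fact about the unobserved case.
print(agent_a["tails"], agent_b["tails"])
```

Only the third representation distinguishes the two agents once we know the coin came up heads, which is the sense in which it fails the non-realist test.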

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-03T19:36:00.626Z · score: 2 (1 votes) · LW · GW

In the happy dance problem, when the agent is considering doing a happy dance, the agent should have already updated on M. This is more like timeless decision theory than updateless decision theory.

I agree that this gets around the problem, but to me the happy dance problem is still suggestive -- it looks like the material conditional is the wrong representation of the thing we want to condition on.

Also -- if the agent has already updated on observations, then updating on the material conditional is just the same as updating on the action itself. So this difference only matters in the updateless case, where it seems to cause us trouble.

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-03T07:23:51.561Z · score: 4 (2 votes) · LW · GW

Sure, that seems reasonable. I guess I saw this as the point of a lot of MIRI’s past work, and was expecting this to be about honesty / filtered evidence somehow.

Yeah, ok. This post as written is really less the kind of thing somebody who has followed all the MIRI thinking needs to hear and more the kind of thing one might bug an orthodox Bayesian with. I framed it in terms of filtered evidence because I came up with it by thinking about some confusion I was having about filtered evidence. And it does problematize the Bayesian treatment. But in terms of actual research progress it would be better framed as a negative result about whether Sam's untrollable prior can be modified to have richer learning.

I think we mean different things by “perfect model”. What if [...]

Yep, I agree with everything you say here.

Comment by abramdemski on The absurdity of un-referenceable entities · 2020-04-02T22:27:19.800Z · score: 6 (3 votes) · LW · GW

Also -- it may not come across in my other comments -- the argument in the OP was novel to me (at least, if I had heard it before, I thought it was wrong at that time and didn't update on it) and feels like a nontrivial observation about how reference has to work.

Comment by abramdemski on The absurdity of un-referenceable entities · 2020-04-02T22:16:45.379Z · score: 8 (4 votes) · LW · GW

Alright, cool. 👌 In general I think reference needs to be treated as a vague object to handle paradoxes (something along the lines of Hartry Field's theory of vague semantics, although I may prefer something closer to linear logic rather than his non-classical logic) -- and also just to be more true to actual use.

I am not able to think of any argument why the set of un-referenceable entities should be paradoxical rather than empty, at the moment. But it seems somehow appropriate that the domain of quantification for our language be vague, and it could further be that we don't assert that nothing lies outside of it. (Only that there is not some thing definitely outside of it.)

Comment by abramdemski on Two Alternatives to Logical Counterfactuals · 2020-04-02T21:52:17.099Z · score: 17 (6 votes) · LW · GW

I too have recently updated (somewhat) away from counterfactual non-realism. I have a lot of stuff I need to work out and write about it.

I seem to have a lot of disagreements with your post.

Given this uncertainty, you may consider material conditionals: if I take action X, will consequence Q necessarily follow? An action may be selected on the basis of these conditionals, such as by determining which action results in the highest guaranteed expected utility if that action is taken.

I don't think material conditionals are the best way to cash out counterfactual non-realism.

• The basic reason I think it's bad is the happy dance problem. This makes it seem clear that the sentence to condition on should not be .
• If the action can be viewed as a function of observations, conditioning on A(obs) = act makes sense. But this is sort of like already having counterfactuals, or at least, being realist that there are counterfactuals about what the agent would do if it observed different things. So this response can be seen as abandoning counterfactual non-realism.
• A different approach is to consider conditional beliefs rather than material implications. I think this is more true to counterfactual non-realism. In the simplest form, this means you just condition on actions (rather than trying to condition on something like or ). However, in order to reason updatelessly, you need something like conditioning on conditionals, which complicates matters.
• Another reason to think it's bad is Troll Bridge.
• Again if the agent thinks there are basic counterfactual facts, (required to respect but little else -- ie entirely determined by subjective beliefs), then the agent can escape Troll Bridge by disagreeing with the relevant inference. But this, of course, rejects the kind of counterfactual non-realism you intend.
• To be more in line with counterfactual non-realism, we would like to use conditional probabilities instead. However, conditional probability behaves too much like material implication to block the Troll Bridge argument. However, I believe that there is an account of conditional probability which avoids this by rejecting the ratio analysis of conditional probability -- ie Bayes' definition -- and instead regards conditional probability as a basic entity. (Along the lines of what Alan Hájek goes on and on about.) Thus an EDT-like procedure can be immune to both 5-and-10 and Troll Bridge. (I claim.)

As for policy-dependent source code, I find myself quite unsympathetic to this view.

• If the agent is updateful, this is just saying that in counterfactuals where the agent does something else, it might have different source code. Which seems fine, but does it really solve anything? Why is this much better than counterfactuals which keep the source code fixed but imagine the execution trace being different? This seems to only push the rough spots further back -- there can still be contradictions, e.g. between the source code and the process by which programmers wrote the source code. Do you imagine it is possible to entirely remove such rough spots from the counterfactuals?
• So it seems you intend the agent to be updateless instead. But then we have all the usual issues with logical updatelessness. If the agent is logically updateless, there is absolutely no reason to think that its beliefs about the connections between source code and actual policy behavior is any good. Making those connections requires actual reasoning, not simply a good enough prior -- which means being logically updateful. So it's unclear what to do.
• Perhaps logically-updateful policy-dependent-source-code is the most reasonable version of the idea. But then we are faced with the usual questions about spurious counterfactuals, chicken rule, exploration, and Troll Bridge. So we still have to make choices about those things.

Comment by abramdemski on The absurdity of un-referenceable entities · 2020-04-02T20:31:15.112Z · score: 4 (2 votes) · LW · GW

Yeah, I'm describing a confusion between views from nowhere and 3rd person perspectives.

Do we disagree about something? It seems possible that you think "ontologizing the by-definition-not-ontologizable" is a bad thing, whereas I'm arguing it's important to have that in one's ontology (even if it's an empty set).

I could see becoming convinced that "the non-ontologizable" is an inherently vague set, IE, achieves a paradoxical status of not being definitely empty, but definitely not being definitely populated.

Comment by abramdemski on The absurdity of un-referenceable entities · 2020-04-01T22:10:13.974Z · score: 9 (2 votes) · LW · GW

Another reason why unreferenceable entities may be intuitively appealing is that if we take a third person perspective, we can easily imagine an abstract agent being unable to reference some entity.

In map/territory thinking, we could imagine things beyond the curvature of the earth being impossible to illustrate on a 2d map. In pure logic, we imagine a Tarskian truth predicate for a logic.

You, sitting outside the thought experiment, cannot be referenced by the agent you imagine. (That is, one easily neglects the possibility.) So the agent saying "the stuff someone else might think of" appears to be no help.

So, I note that the absurdity of the unreferenceable entity is not quite trivial. You are assuming that "unreferenceable" is a concept within the ontology, in order to prove that no such thing can be.

It is perfectly consistent to imagine an entity and an object which cannot be referenced by our imagined entity. We need only suppose that our entity lacks a concept of the unreferenceable.

So despite the absurdity of unreferenceable objects, it seems we need them in our ontology in order to avoid them. ;)

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-01T20:23:41.882Z · score: 8 (4 votes) · LW · GW

I agree with your first sentence, but I worry you may still be missing my point here, namely that the Bayesian notion of belief doesn't allow us to make the distinction you are pointing to. If a hypothesis implies something, it implies it "now"; there is no "the conditional probability is 1 but that isn't accessible to me yet".

I also think this result has nothing to do with "you can't have a perfect model of Carol". Part of the point of my assumptions is that they are, individually, quite compatible with having a perfect model of Carol amongst the hypotheses.

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-01T18:14:43.759Z · score: 6 (3 votes) · LW · GW

I'm not sure exactly what the source of your confusion is, but:

I don't see how this follows. At the point where the confidence in PA rises above 50%, why can't the agent be mistaken about what the theorems of PA are?

The confidence in PA as a hypothesis about what the speaker is saying is what rises above 50%. Specifically, an efficiently computable hypothesis eventually enumerating all and only the theorems of PA rises above 50%.

For example, let T be a theorem of PA that hasn't been claimed yet. Why can't the agent believe P(claims-T) = 0.01 and P(claims-not-T) = 0.99? It doesn't seem like this violates any of your assumptions.

This violates the assumption of honesty that you quote, because the agent simultaneously has P(H) > 0.5 for a hypothesis H such that P(obs_n-T | H) = 1, for some (possibly very large) n, and yet also believes P(T) < 0.5. This is impossible since it must be that P(obs_n-T) > 0.5, due to P(H) > 0.5, and therefore must be that P(T) > 0.5, by honesty.
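Spelled out, the chain of inequalities is as follows. This is a sketch that reads "honesty" as the assumption that a claim of T implies T, so that P(T) is at least P(obs_n-T); that reading is my gloss on the assumption rather than a quotation of it.

```latex
\begin{align*}
P(\mathrm{obs}_n\text{-}T)
  &\ge P(\mathrm{obs}_n\text{-}T \wedge H)
   = P(\mathrm{obs}_n\text{-}T \mid H)\, P(H)
   = P(H) > 0.5, \\
P(T)
  &\ge P(\mathrm{obs}_n\text{-}T) > 0.5
  \qquad \text{(honesty: a claim of } T \text{ implies } T\text{)}.
\end{align*}
```

So P(T) < 0.5 cannot be held at the same time as P(H) > 0.5, which is the contradiction.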

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-01T18:07:20.843Z · score: 4 (2 votes) · LW · GW

Here's one way to extend a result like this to lying. Rather than assume honesty, we could assume observations carry sufficiently much information about the truth. This is like saying that sensory perception may be fooled, but in the long run, bears a strong enough connection to reality for us to infer a great deal. Something like this should imply the same computational difficulties.

I'm not sure exactly how this assumption should be spelled out, though.

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-01T17:58:21.886Z · score: 2 (1 votes) · LW · GW

It's sufficient to allow an adversarial (and dishonest) speaker to force a contradiction, sure. But the theorem is completely subjective. It says that even from the agent's perspective there is a problem. IE, even if we think the speaker to be completely honest, we can't (computably) have (even minimally) consistent beliefs. So it's more surprising than simply saying that if we believe a speaker to be honest then that speaker can create a contradiction by lying to us. (At least, more surprising to me!)

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-04-01T17:51:41.885Z · score: 6 (3 votes) · LW · GW

It's absurd (in a good way) how much you are getting out of incomplete hypotheses. :)

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-03-23T23:21:40.921Z · score: 4 (2 votes) · LW · GW

I like your example, because "Carol's answers are correct" seems like something very simple, and also impossible for a (bounded) Bayesian to represent. It's a variation of calculator or notepad problems -- that is, the problem of trying to represent a reasoner who has (and needs) computational/informational resources which are outside of their mind. (Calculator/notepad problems aren't something I've written about anywhere iirc, just something that's sometimes on my mind when thinking about logical uncertainty.)

I do want to note that weakening honesty seems like a pretty radical departure from the standard Bayesian treatment of filtered evidence, in any case (for better or worse!). Distinguishing between observing X and X itself, it is normally assumed that observing X implies X. So while our thinking on this does seem to differ, we are agreeing that there are significant points against the standard view.

From outside, the solution you propose looks like "doing the best you can to represent the honesty hypothesis in a computationally tractable way" -- but from inside, the agent doesn't think of it that way. It simply can't conceive of perfect honesty. This kind of thing feels both philosophically unsatisfying and potentially concerning for alignment. It would be more satisfying if the agent could explicitly suspect perfect honesty, but also use tractable approximations to reason about it. (Of course, one cannot always get everything one wants.)

We could modify the scenario to also include questions about Carol's honesty -- perhaps when the pseudo-Bayesian gets a question wrong, it is asked to place a conditional bet about what Carol would say if Carol eventually gets around to speaking on that question. Or other variations along similar lines.

Comment by abramdemski on Zoom In: An Introduction to Circuits · 2020-03-21T01:44:30.542Z · score: 18 (6 votes) · LW · GW

The "Zoom In" work is aimed at understanding what's going on in neural networks as a scientific question, not directly tackling mesa-optimization. This work is relevant to more application-oriented interpretability if you buy that understanding what is going on is an important prerequisite to applications.

As the original article put it:

And so we often get standards of evaluations more targeted at whether an interpretability method is useful rather than whether we’re learning true statements.

Or, as I put it in Embedded Curiosities:

One downside of discussing these problems as instrumental strategies is that it can lead to some misunderstandings about why we think this kind of work is so important. With the “instrumental strategies” lens, it’s tempting to draw a direct line from a given research problem to a given safety concern.

A better understanding of 'circuits' in the sense of Zoom In could yield unexpected fruits in terms of safety. But to name an expected direction: understanding the algorithms expressed by 95% of a neural network, one could re-implement those independently. This would yield a totally transparent algorithm. Obviously a further question to ask is, how much of a performance hit do we take by discarding the 5% we don't understand? (If it's too large, this is also a significant point against the idea that the 'circuits' methodology is really providing much understanding of the deep NN from a scientific point of view.)

I'm not claiming that doing that would eliminate all safety concerns with the resulting reimplementation, of course. Only that it would address the specific concern you mention.

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-03-21T00:51:58.997Z · score: 5 (3 votes) · LW · GW

Yep. I'll insert some clarification.

Comment by abramdemski on Thinking About Filtered Evidence Is (Very!) Hard · 2020-03-21T00:51:37.920Z · score: 3 (2 votes) · LW · GW

The set of theorems of PA is computably enumerable, but the sets of theorems and anti-theorems (things provably false) are not computably separable, IE there is no computation which returns "true" for theorems, "false" for anti-theorems, and always returns some answer (we don't care specifically what that answer is for anything that's not a theorem or anti-theorem).
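To illustrate the gap between "enumerable" and "separable", here is a toy sketch of the obvious search strategy: enumerate theorems and anti-theorems in parallel and answer as soon as the sentence shows up in either list. The two enumerations are hypothetical stand-ins (integers mod 3), and unlike PA they are trivially decidable; the point is only the shape of the procedure, which never answers on sentences that appear in neither list.

```python
from itertools import count
from typing import Iterator, Optional


def toy_theorems() -> Iterator[int]:
    """Hypothetical stand-in for an enumeration of theorems."""
    return (n for n in count() if n % 3 == 0)


def toy_anti_theorems() -> Iterator[int]:
    """Hypothetical stand-in for an enumeration of anti-theorems."""
    return (n for n in count() if n % 3 == 1)


def dovetail_search(sentence: int, max_steps: int) -> Optional[bool]:
    """Alternate between the two enumerations, looking for `sentence`.

    Returns True if it is enumerated as a theorem, False if it is enumerated
    as an anti-theorem, and None if the step budget runs out first.
    """
    theorems, anti_theorems = toy_theorems(), toy_anti_theorems()
    for _ in range(max_steps):
        if next(theorems) == sentence:
            return True
        if next(anti_theorems) == sentence:
            return False
    return None


print(dovetail_search(6, 100))  # enumerated as a toy theorem
print(dovetail_search(7, 100))  # enumerated as a toy anti-theorem
print(dovetail_search(8, 100))  # in neither list: the search gives up
```

Without the step budget, the search would run forever on the third input. A separator has to return some answer there too, and for PA no computation manages that while staying correct on all theorems and anti-theorems.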

Comment by abramdemski on The Zettelkasten Method · 2020-03-14T18:43:16.037Z · score: 5 (3 votes) · LW · GW

Do you people actually open your index note and then go through all your notes related to your project from time to time?

I do look at index notes when I want an overview (usually because I'm not quite sure what I want to work on next), but I don't go through *all* related notes really.

Usually I write things on paper as an extension of my working memory, but right after having finished the thought, I can throw it away.

This is somewhat different from how I develop ideas.

• It is true that I'm not mostly going back and re-reading. Most of my time is spent on new material.
• It's also true that I don't necessarily re-read a note once I'm finished with a thought. I do largely feel satisfied at that point and move on to something else.
• However, I often re-visit my notes. Even with paper notebooks, I would tend to look back at recent pages frequently, and less recent pages less frequently, sometimes going all the way back to my previous few filled notebooks. This looking-back was sometimes a result of wanting to see how a specific thing went in my notes; sometimes a result of wanting to know where things left off with a particular line of thought, so that I could try and take it further; and sometimes a result of wanting an overview of what I'd been thinking about, to prioritize what I might think about next.
• Furthermore, with Zettelkasten, there is not necessarily a single "path" such that an idea is "finished" when you reach the end. So there's more incentive to keep going back to things, continuing the various branching paths.

I do keep notes in emacs org-mode, but I almost never go and read them sequentially. I think it would be boring - I'd rather go read stuff on the internet. Actually I rarely read my notes at all. Usually I only do it when I want to remind myself something specific and I remember that I have something written about it.

I do occasionally kind of flip through sequentially, but I agree reading them sequentially would be boring and inefficient. I think your instincts here are basically right, and it's just that you aren't developing ideas which require a whole lot of looking back. Also it's possible that you aren't asking yourself what you should be working on at a higher level that often (which, at least for me, tends to involve reviewing my open threads).

Comment by abramdemski on Bayesian Evolving-to-Extinction · 2020-02-25T20:12:07.764Z · score: 2 (1 votes) · LW · GW

Or just bad implementations do this - predict-o-matic as described sounds like a bad idea, and like it doesn't contain hypotheses, so much as "players"*. (And the reason there'd be a "side channel" is to understand theories - the point of which is transparency, which, if accomplished, would likely prevent manipulation.)

You can think of the side-channel as a "bad implementation" issue, but do you really want to say that we have to forego diagnostic logs in order to have a good implementation of "hypotheses" instead of "players"? Going to the extreme, every brain has side-channels such as EEG.

But more importantly, as Daniel K pointed out, you don't need the side-channel. If the predictions are being used in a complicated way to make decisions, the hypotheses/players have an incentive to fight each other through the consequences of those decisions.

So, the interesting question is, what's necessary for a *good* implementation of this?

This seems a strange thing to imagine - how can fighting occur, especially on a training set?

If the training set doesn't provide any opportunity for manipulation/corruption, then I agree that my argument isn't relevant for the training set. It's most directly relevant for online learning. However, keep in mind also that deep learning might be pushing in the direction of learning to learn. Something like a Memory Network is trained to "keep learning" in a significant sense. So you then have to ask if its learned learning strategy has these same issues, because that will be used on-line.

(I can almost imagine neurons passing on bad input, but a) it seems like gradient descent would get rid of that, and b) it's not clear where the "tickets" are.)

Simplifying the picture greatly, imagine that the second-back layer of neurons is one-neuron-per-ticket. Gradient descent can choose which of these to pay the most attention to, but little else; according to the lottery ticket hypothesis, the gradients passing through the 'tickets' themselves aren't doing that much for learning, besides reinforcing good tickets and weakening bad.

So imagine that there is one ticket which is actually malign, and has a sophisticated manipulative strategy. Sometimes it passes on bad input in service of its manipulations, but overall it is the best of the lottery tickets, so while gradient descent punishes it on those rounds, this is more than made up for in other cases. Furthermore, the manipulations of the malign ticket ensure that competing tickets are kept down, by manipulating situations to be those which the other tickets don't predict very well.
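A rough worked example of "punished on those rounds, but more than made up for in other cases". It uses expected log score as a stand-in for whatever training signal is actually in play (not gradient descent itself), and all the numbers are invented for illustration: an honest ticket that always puts 0.7 on the true outcome, against a malign ticket that puts 0.9 on it normally but only 0.1 on the 5% of rounds where it sabotages its own prediction.

```python
import math


def expected_log_score(p_truth_normal: float,
                       p_truth_sabotage: float,
                       sabotage_rate: float) -> float:
    """Average log-probability assigned to the true outcome per round.

    On a fraction `sabotage_rate` of rounds the predictor deliberately assigns
    only `p_truth_sabotage` to the truth; otherwise it assigns `p_truth_normal`.
    """
    return ((1.0 - sabotage_rate) * math.log(p_truth_normal)
            + sabotage_rate * math.log(p_truth_sabotage))


honest_ticket = expected_log_score(0.7, 0.7, 0.0)   # never sabotages
malign_ticket = expected_log_score(0.9, 0.1, 0.05)  # sabotages 5% of rounds

print(honest_ticket, malign_ticket)
```

With these made-up numbers the malign ticket still has the better average score, so a selection process that only looks at overall predictive performance would keep favoring it.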

*I don't have a link to the claim, but it's been said before that 'the math behind Bayes' theorem requires each hypothesis to talk about all of the universe, as opposed to human models that can be domain limited.'

This remark makes me think you're thinking something about logical-induction style traders which only trade on a part of the data vs bayesian-style hypotheses which have to make predictions everywhere. I'm not sure how that relates to my post -- there are things to say about it, but, I don't think I said any of them. In particular the lottery-ticket hypothesis isn't about this; a "lottery ticket" is a small part of the deep NN, but, is effectively a hypothesis about the whole data.

Comment by abramdemski on Bayesian Evolving-to-Extinction · 2020-02-25T19:49:58.861Z · score: 3 (2 votes) · LW · GW

Ah right! I meant to address this. I think the results are more muddy (and thus don't serve as clear illustrations so well), but, you do get the same thing even without a side-channel.

Comment by abramdemski on Bayesian Evolving-to-Extinction · 2020-02-25T19:47:20.258Z · score: 3 (2 votes) · LW · GW

Yeah, in probability theory you don't have to worry about how everything is implemented. But for implementations of Bayesian modeling with a rich hypothesis class, each hypothesis could be something like a blob of code which actually does a variety of things.

As for "want", sorry for using that without unpacking it. What it specifically means is that hypotheses like that will have a tendency to get more probability weight in the system, so if we look at the weighty (and thus influential) hypotheses, they are more likely to implement strategies which achieve those ends.
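A minimal sketch of what "getting more probability weight" looks like mechanically, assuming a finite hypothesis class and plain Bayesian updating. The function name `bayes_update` and all numbers are hypothetical, chosen only for illustration; each hypothesis is reduced to the probability it assigned to whatever was observed.

```python
def bayes_update(weights, likelihoods):
    """Return posterior weights from prior weights and per-hypothesis likelihoods.

    weights[i]     -- prior probability of hypothesis i (toy values)
    likelihoods[i] -- probability hypothesis i assigned to the observed outcome
    """
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three toy hypotheses with a uniform prior (illustrative assumption).
weights = [1 / 3, 1 / 3, 1 / 3]

# On each round, hypothesis 0 happens to predict the observed outcome best.
rounds = [
    [0.9, 0.5, 0.2],
    [0.8, 0.5, 0.3],
    [0.9, 0.4, 0.1],
]

for likelihoods in rounds:
    weights = bayes_update(weights, likelihoods)

print([round(w, 3) for w in weights])
```

The update only looks at how well each hypothesis scored, not at how it produced its predictions, so anything a hypothesis does that raises its score (or lowers its competitors' scores) shows up as more weight.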

Comment by abramdemski on Becoming Unusually Truth-Oriented · 2020-02-14T22:56:44.373Z · score: 8 (2 votes) · LW · GW

I should have phrased my previous comment as a question -- what do you see as valuable about the second half without the first half?

Maybe I can mostly answer that question for myself, though.

• "Developing Ideas" is related to the first half, but it's in an inventive frame -- so confabulation is much less of a concern. (But we might similarly doubt it and ask for empirical support.)
• "Inner Sim" has separate validation presumably (although I haven't looked into this).
• The motivated cognition section?
• "Correcting Yourself" has a pretty obvious story about why it should be useful.
• Explaining things to others is very generally observed to be useful, and the connection I make to the first half could be seen as spurious or at least not particularly important.
• The question of how to do gears thinking more and better is just pretty important all around. But I think my particular remarks are not any more empirically validated than the memory stuff I mentioned.
• Understanding others -- same as gears. Important, but not a lot to back up my remarks.

I think what I'm going to do is post a question about what could/should go into such a sequence.

Comment by abramdemski on Instrumental Occam? · 2020-02-14T21:12:45.055Z · score: 2 (1 votes) · LW · GW

Excellent, thanks for the comment! I really appreciate the correction. That's quite interesting.

Comment by abramdemski on Becoming Unusually Truth-Oriented · 2020-02-07T23:02:43.820Z · score: 4 (2 votes) · LW · GW

The way I currently see it, the second half of the post is more like an assortment of things, which are all tied together by the fact that they elaborate the basic mental movement in the first half of the post. So a post which was just the second half doesn't seem especially coherent to me.

Comment by abramdemski on Becoming Unusually Truth-Oriented · 2020-02-07T22:59:04.193Z · score: 2 (1 votes) · LW · GW
(I think it's fine as a "random ideas from Abram in 2020" post, but my impression is you had aspirations towards it serving as a good self-contained-intro-to-rationality)

Ah. Yeah, I guess I conceived of this as pretty solidly somewhere between those extremes, but the title could be misleading towards the second.

I don't think of it as a collection of random ideas interesting to me right now. I do think of it as a coherent thing. But I certainly don't intend it to be an introduction to rationality or even the subtopic of truth-orientedness in rationality.

Comment by abramdemski on Malign generalization without internal search · 2020-02-04T21:17:19.430Z · score: 6 (3 votes) · LW · GW

A similar borderline case is death spirals in ants. (Google it for nice pictures/videos of the phenomenon.) Ants may or may not do internal search, but regardless, it seems like this phenomenon could be reproduced without any internal search. The ants implement a search overall via a pattern of behavior distributed over many ants. This "search" behavior has a weird corner case where they literally go into a death spiral, which is quite non-obvious from the basic behavior pattern.

Comment by abramdemski on Instrumental Occam? · 2020-02-01T21:01:52.723Z · score: 3 (2 votes) · LW · GW

Yes, I agree with that. But (as I've said in the past) this formalism doesn't do it for me. I have yet to see something which strikes me as a compelling argument in its favor.

So in the context of planning by probabilistic inference, instrumental occam seems almost like a bug rather than a feature -- the unjustified bias toward simpler policies doesn't seem to serve a clear purpose. It's just an assumption.

Granted, the fact that I intuitively feel there should be some kind of instrumental occam is a point in favor of such methods in some sense.

Comment by abramdemski on High-precision claims may be refuted without being replaced with other high-precision claims · 2020-01-30T23:40:55.808Z · score: 19 (7 votes) · LW · GW

I like this post because I'm fond of using the "what's the better alternative?" argument in instrumental matters, so it's good to have an explicit flag of where it fails in epistemic matters. Technically the argument still holds, but the "better alternative" can be a high entropy theory, which often doesn't rise to saliency as a theory at all.

It's also a questionable heuristic in instrumental matters, as often it is possible to meaningfully critique a policy without yet having a better alternative. But one must be careful to distinguish between these "speculative" critiques (which can note important downsides but don't strongly imply a policy should be changed, due to a lack of alternatives) vs true evaluations (which claim that changes need to be made, and therefore should be required to evaluate alternatives).

Comment by abramdemski on Realism about rationality · 2020-01-30T22:35:30.867Z · score: 6 (3 votes) · LW · GW
If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).

Hm. I already think the starting point of Bayesian decision theory (which is even "further up" than AIXI in how I am thinking about it) is fairly useful.

• In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as 'utility' (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn't always serve very well (e.g. one might prefer Kelly betting), but it was kind of the starting point (probability theory getting its starting point from gambling games) and the idea seems like a useful decision-making mechanism in a lot of situations.
• Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.
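The contrast between naive expected-value maximization and Kelly betting can be sketched numerically. This is a toy example under stated assumptions: a single repeated bet with win probability `p` and net odds `b`, both hypothetical values. The function names are illustrative.

```python
import math


def expected_wealth(f, p, b):
    """Expected wealth multiplier after betting fraction f of the bankroll once."""
    return p * (1 + f * b) + (1 - p) * (1 - f)


def expected_log_growth(f, p, b):
    """Expected log of the wealth multiplier, the quantity Kelly betting maximizes."""
    return p * math.log(1 + f * b) + (1 - p) * math.log(1 - f)


def kelly_fraction(p, b):
    """Kelly fraction for win probability p and net odds b (win b per unit staked)."""
    return p - (1 - p) / b


p, b = 0.6, 1.0  # hypothetical bet: 60% chance to win at even odds
f_star = kelly_fraction(p, b)

for f in (0.1, f_star, 0.5, 0.99):
    print(f"f={f:.2f}  E[wealth]={expected_wealth(f, p, b):.3f}  "
          f"E[log growth]={expected_log_growth(f, p, b):.4f}")
```

Expected wealth keeps rising as the staked fraction approaches 1, which is what treating money as utility recommends, while expected log growth peaks at the Kelly fraction and goes negative for large stakes.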

AIXI adds to all this the idea of quantifying Occam's razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we're going to disagree on.

As for AIXItl, I think it's sort of taking the wrong approach to "dragging things down to earth". Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.

Comment by abramdemski on Becoming Unusually Truth-Oriented · 2020-01-30T18:12:34.845Z · score: 19 (5 votes) · LW · GW

It sounds like both of you are people who don't have experience with remembering dreams, so an opening which for me seemed very relatable didn't land. Raemon flags his comment as 'pedagogical note' and David calls recalling dreams a 'misleading example'.

But is there more to it than a starting example that didn't connect? David brings up research which ambiguously suggests there's a lot of confabulation around dreams. (I'm interested in references.) I had an in-person conversation with someone who read my post and thought the confabulation problem was more broadly damning.

My inside view is that if this were a very serious problem, I'd kind of be screwed. I'm having difficulty taking the position very seriously, because this is such a basic mental move. Of course at some level I'm saying "try doing more of this" and the question is "does doing more of this make things worse?" -- in the world where that's the case, we don't want to practice this mental motion or encourage it.

Objectively, dreams are kind of a worst-case scenario, since it isn't possible to check the reality. Subjectively, though, dreams seem to me like a really good case: I don't always know what I made up later vs what really occurred in the dream, but I (subjectively) know when I don't know. I often catch myself adding details, and can either tease out what was made up vs what was really there, or conclude that I can't do so.

I initially wrote this post with the idea of starting a sequence on "rationality as a practice", IE, trying to dig into things like this which are moment-to-moment habits of thought which one can work to improve no matter one's current level of skill. Now my feeling is that this sort of thing is bottlenecked on empirical evidence. I would like to know whether the stuff I propose actually increases or decreases confabulation.

Comment by abramdemski on Realism about rationality · 2020-01-19T20:23:19.603Z · score: 4 (2 votes) · LW · GW

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

• The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
• Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)

(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)

Comment by abramdemski on Realism about rationality · 2020-01-18T19:00:16.679Z · score: 2 (1 votes) · LW · GW

I generally like the re-framing here, and agree with the proposed crux.

I may try to reply more at the object level later.

Comment by abramdemski on The Zettelkasten Method · 2020-01-18T18:34:28.320Z · score: 2 (1 votes) · LW · GW

Yeah, I actually tried them, but didn't personally like them that well. They could definitely be an option for someone.

Comment by abramdemski on Realism about rationality · 2020-01-17T21:43:46.056Z · score: 10 (3 votes) · LW · GW
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to make capabilities "blindly" -- basically the processing-power heavy type of AI progress (applying enough tricks so you're not literally recapitulating evolution, but you're sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.

So I believe in some kind of knowledge to be had (ie, point #1).

Yeah, so, taking stock of the discussion again, it seems like:

• There's a thing-I-believe-which-is-kind-of-like-rationality-realism.
• Points 1 and 2 together seem more in line with that thing than "rationality realism" as I understood it from the OP.
• You already believe #1, and somewhat believe #2.
• We are both pessimistic about #3, but I'm so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
• We probably do have some disagreement about something like "how real is rationality?" -- but I continue to strongly suspect it isn't that cruxy.
(ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

I checked whether I thought the analogy was right with "reproductive fitness" and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

Sorry it resulted in a confusing mixed metaphor overall.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.)

I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it's all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.

Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn't understand how organisms seeded on those planets would likely evolve.)

So -- it seems to me -- the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!

My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.

Comment by abramdemski on Realism about rationality · 2020-01-17T20:10:45.871Z · score: 11 (2 votes) · LW · GW
I was thinking of the difference between the theory of electromagnetism vs the idea that there's a reproductive fitness function, but that it's very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with 'fake' models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I'm unsure which theory rationality will end up closer to.

[Spoiler-boxing the following response not because it's a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn't want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]

I'm having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The "hard to determine what it is" from the first seems to lead directly to the "fake inputs" from the second.

So possibly you're gesturing at a level of realness which is "how real fitness functions would be if there were not a theory of population genetics"? But I'm not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?

Separately, I feel weird having people ask me about why things are 'cruxy' when I didn't initially say that they were and without the context of an underlying disagreement that we're hashing out. Like, either there's some misunderstanding going on, or you're asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.

Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:

If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.

And furthermore the list following this:

Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:

So, yeah, I'm asking you about something which you haven't claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.

I confess to being quite troubled by AIXI's language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than "polynomial in some input", which should be some input to a good theory of bounded rationality.

Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).

I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.

So I now think we have a big disagreement around point "a" (just how real rationality is), but maybe not so much around "b" (what the consequences are for the various bullet points you listed).

Comment by abramdemski on Realism about rationality · 2020-01-13T13:59:03.743Z · score: 8 (4 votes) · LW · GW
Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

How so?

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me.

Well, it's not entirely clear. First there is the "realism" claim, which might even be taken in contrast to mathematical abstraction; EG, "is IQ real, or is it just a mathematical abstraction"? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where "accurate" means, at least in part, helpfulness in making real predictions).

So the idea seems to be that there's a spectrum with physics at one extreme end. I'm not quite sure what goes at the other extreme end. Here's one possibility:

• Physics
• Chemistry
• Biology
• Psychology
• Social Sciences
• Humanities

A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, "realness" vs "mathematical modelability". Well, it's not clear exactly what that second axis should be.

Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect "reproductive fitness" levels rather than "momentum" levels.

Hmm, actually, I guess there's a tricky interpretational issue here, which is what it means to model agency exactly.

• On the one hand, I fully believe in Eliezer's idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
• But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.

I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.

Comment by abramdemski on Realism about rationality · 2020-01-13T12:57:05.727Z · score: 4 (2 votes) · LW · GW
ETA: I also have a model of you being less convinced by realism about rationality than others in the "MIRI crowd"; in particular, selection vs. control seems decidedly less "realist" than mesa-optimizers (which didn't have to be "realist", but was quite "realist" the way it was written, especially in its focus on search).

Just a quick reply to this part for now (but thanks for the extensive comment, I'll try to get to it at some point).

It makes sense. My recent series on myopia also fits this theme. But I don't get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of "agency" into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of "optimization". (This applies to people at MIRI and also people outside MIRI.)

*The one person who has given me push-back is Scott.

Comment by abramdemski on Realism about rationality · 2020-01-13T12:48:08.992Z · score: 2 (1 votes) · LW · GW

How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don't see why the distinction should be so cruxy.

My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren't "momentum vs reproductive fitness", but rather, "momentum vs the bystander effect" (ie, physics vs social psychology). Reproductive fitness implies something that's quite mathematizable, but with relatively "fake" models -- e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it's possible to get closer and closer.
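For a sense of what such a mathematizable-but-"fake" model looks like, here is a minimal sketch of a textbook Wright-Fisher style simulation (a haploid population with one selected allele). It is a standard idealization picked for illustration, not something taken from the comment; the function name and parameter values are assumptions.

```python
import random


def wright_fisher(pop_size, init_freq, selection, generations, seed=0):
    """Track the frequency of allele A under idealized assumptions.

    Assumes non-overlapping generations, a fixed population size, and
    perfect mixing: each offspring picks its parent independently from
    the entire previous generation, weighted by fitness.
    """
    rng = random.Random(seed)
    freq = init_freq
    history = [freq]
    for _ in range(generations):
        # Selection: allele A has relative fitness 1 + selection.
        weighted = freq * (1 + selection)
        p_offspring = weighted / (weighted + (1 - freq))
        # Reproduction: the whole generation is replaced at once.
        count = sum(rng.random() < p_offspring for _ in range(pop_size))
        freq = count / pop_size
        history.append(freq)
    return history


history = wright_fisher(pop_size=200, init_freq=0.1, selection=0.05, generations=100)
print(history[:5], "...", history[-1])
```

Every simplification named above is visible in the code: generations are replaced wholesale inside one loop iteration, and parents are drawn from a single well-mixed pool. Relaxing either would mean adding structure (age classes, spatial layout) that the basic model deliberately leaves out.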

I think that's more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI describes a heavy dependence on programming language. Any theory of bounded rationality which doesn't ignore poly-time differences (ie, anything "closer to the ground" than logical induction) has to be hardware-dependent as well.

If I didn't believe the above,

What alternative world are you imagining, though?

Comment by abramdemski on Realism about rationality · 2020-01-10T05:06:12.411Z · score: 53 (12 votes) · LW · GW

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it's trying to point at.

The main problem is the word "realism". It isn't clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.

I agree that there's something kind of like rationality realism. I just don't think this post successfully points at it.

Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether intelligence is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc), and so, is selected on different criteria.

So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it's more like fitness.

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum", and (2) doesn't depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy -- evolutionary biologists still see fitness as a very important subject, and don't seem to care that much about exactly how real the abstraction is.)

To the extent that this post has made a lot of people think that rationality realism is an important crux, it's quite plausible to me that it's made the discussion worse.

To expand more on (1) -- since it seems a lot of people found its negation plausible -- it seems like if there's an analogue for the theory of evolution, which uses relatively unreal concepts like "fitness" to help us understand rational agency, we'd like to know about it. In this view, MIRI-cluster is essentially saying "biologists should want to invent evolution. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-03T05:13:31.042Z · score: 2 (1 votes) · LW · GW
Let me explain more clearly why this is a circular argument:
a) You want to show that we should take counterfactuals into account when making decisions
b) You argue that this way of making decisions does better on average
c) The average includes the very counterfactuals whose value is in question. So b depends on a already being proven => circular argument

That isn't my argument though. My argument is that we ARE thinking ahead about counterfactual mugging right now, in considering the question. We are not misunderstanding something about the situation, or missing critical information. And from our perspective right now, we can see that agreeing to be mugged is the best strategy on average.
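
As a rough illustration of "best strategy on average", here is a minimal sketch comparing the two policies before and after updating on the coin. The $100 cost is the figure used in this discussion; the $10,000 reward and the function names are assumptions made up for the example.

```python
# Hypothetical stakes: the $100 cost comes from the discussion; the $10,000
# reward is an assumed value chosen only for illustration.
COST_IF_TAILS = 100
REWARD_IF_HEADS = 10_000
P_HEADS = 0.5  # fair coin


def expected_value(pays_when_asked: bool) -> float:
    """Average payoff of a policy, evaluated before the coin is flipped."""
    heads_payoff = REWARD_IF_HEADS if pays_when_asked else 0
    tails_payoff = -COST_IF_TAILS if pays_when_asked else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


def value_given_tails(pays_when_asked: bool) -> float:
    """Payoff of a policy after updating on the coin having come up tails."""
    return -COST_IF_TAILS if pays_when_asked else 0


if __name__ == "__main__":
    for policy in (True, False):
        print(policy, expected_value(policy), value_given_tails(policy))
```

Evaluated ahead of time, the paying policy comes out ahead; evaluated after updating on tails, refusing looks better. The disagreement in this thread is about which of those two evaluations should guide the decision.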

We can see that if we update on the value of the coin flip being tails, we would change our mind about this. But the statement of the problem requires that there is also the possibility of heads. So it does not make sense to consider the tails scenario in isolation; that would be a different decision problem (one in which Omega asks us for $100 out of the blue with no other significant backstory).

So we (right now, considering how to reason about counterfactual muggings in the abstract) know that there are the two possibilities, with equal probability, and so the best strategy on average is to pay. So we see behaving updatefully as bad.

So my argument for considering the multiple possibilities is, the role of thinking about decision theory now is to help guide the actions of my future self.

You feel that I'm begging the question. I guess I take only thinking about this counterfactual as the default position, as where an average person is likely to be starting from. And I was trying to see if I could find an argument strong enough to displace this. So I'll freely admit I haven't provided a first-principles argument for focusing just on this counterfactual.

I think the average person is going to be thinking about things like duty, honor, and consistency which can serve some of the purpose of updatelessness. But sure, updateful reasoning is a natural kind of starting point, particularly coming from a background of modern economics or bayesian decision theory.

But my argument is compatible with that starting point, if you accept my "the role of thinking about decision theory now is to help guide future actions" line of thinking. In that case, starting from updateful assumptions now, decision-theoretic reasoning makes you think you should behave updatelessly in the future.

Whereas the assumption you seem to be using, in your objection to my line of reasoning, is "we should think of decision-theoretic problems however we think of problems now".
So if we start out an updateful agent, we would think about decision-theoretic problems and think "I should be updateful". If we start out a CDT agent, then when we think about decision-theoretic problems we would conclude that you should reason causally. EDT agents would think about problems and conclude you should reason evidentially. And so on. That's the reasoning I'm calling circular.

Of course an agent should reason about a problem using its best current understanding. But my claim is that when doing decision theory, the way that best understanding should be applied is to figure out what decision theory does best, not to figure out what my current decision theory already does. And when we think about problems like counterfactual mugging, the description of the problem requires that there's both the possibility of heads and tails. So "best" means best overall, not just down the one branch.

If the act of doing decision theory were generally serving the purpose of aiding in making the current decision, then my argument would not make sense, and yours would. Current-me might want to tell the me in that universe to be more updateless about things, but alternate-me would not be interested in hearing it, because alternate-me wouldn't be interested in thinking ahead in general, and the argument wouldn't make any sense with respect to alternate-me's current decision.

So my argument involves a fact about the world which I claim determines which of several ways to reason, and hence, is not circular.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-02T07:50:00.821Z · score: 2 (1 votes) · LW · GW
Iterated situations are indeed useful for understanding learning. But I'm trying to abstract out over the learning insofar as I can. I care that you get the information required for the problem, but not so much how you get it.

OK, but I don't see how that addresses my argument.

The average includes worlds that you know you are not in.
So this doesn't help us justify taking these counterfactuals into account,

This is the exact same response again (ie the very kind of response I was talking about in my remark you're responding to), where you beg the question of whether we should evaluate from an updateful perspective. Why is it problematic that we already know we are not in those worlds? Because you're reasoning updatefully? My original top-level answer explained why I think this is a circular justification in a way that the updateless position isn't.

I'm not saying you should reason in this way. You should reason updatelessly.

Ok. So what's at stake in this discussion is the justification for updatelessness, not the whether of updatelessness.

I still don't get why you seem to dismiss my justification for updatelessness, though. All I'm understanding of your objection is a question-begging appeal to updateful reasoning.

Comment by abramdemski on What is an Evidential Decision Theory agent? · 2020-01-02T07:18:19.496Z · score: 6 (3 votes) · LW · GW
I'm posting a short response rather than there be none, although I think you are calling for a longer more thoughtful response.

I would simply say an evidential agent selects an action via $\operatorname{argmax}_a \mathbb{E}[U \mid a]$; that is, it evaluates each action by (Bayes-)conditioning on that action, and checking expected utility.

Of course this simple formula can take on many complications when EDT is being described in more fleshed-out mathematical settings. Perhaps this is where part of the confusion comes from. There is some intuitive aspect to judging whether a more complicated formula is "essentially EDT". (For example, the classic rigorous formulation of EDT is the Jeffrey-Bolker axioms, which at a glance look nothing like the formula.)

But I would say that most of the issue you're describing in the OP is that people think of EDT in terms of what it does or doesn't do, rather than in terms of this simple formula.
That seems to be genuinely solved by just writing out $\operatorname{argmax}_a \mathbb{E}[U \mid a]$ when people seem unclear on what EDT is.

Also, note, the claim that EDT doesn't smoke in smoking lesion is quite controversial (the famous tickle defense argues to the contrary). This is related to your observation that EDT will often correctly navigate causality, because the causal structure is already encoded in the conditional probability. So that's part of why it's critical to think of EDT as the formula, rather than as what it supposedly does or doesn't do.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-01T20:23:06.502Z · score: 4 (2 votes) · LW · GW
You can learn about a situation other than by facing that exact situation yourself. For example, you may observe other agents facing that situation or receive testimony from an agent that has proven itself trustworthy. You don't even seem to disagree with me here as you wrote: "you can learn enough about the universe to be confident you're now in a counterfactual mugging without ever having faced one before"

Right, I agree with you here. The argument is that we have to understand learning in the first place to be able to make these arguments, and iterated situations are the easiest setting to do that in. So if you're imagining that an agent learns what situation it's in more indirectly, but thinks about that situation differently than an agent who learned in an iterated setting, there's a question of why that is. It's more a priori plausible to me that a learning agent thinks about a problem by generalizing from similar situations it has been in, which I expect to act kind of like iteration.

Or, as I mentioned re: all games are iterated games in logical time, the agent figures out how to handle a situation by generalizing from similar scenarios across logic. So any game we talk about is iterated in this sense.

>One way of appealing to human moral intuition

Doesn't work on counter-factually selfish agents

I disagree.
Reciprocal altruism and true altruism are kind of hard to distinguish in human psychology, but I said "it's a good deal" to point at the reciprocal-altruism intuition. The point being that acts of reciprocal altruism can be a good deal w/o having considered them ahead of time. It's perfectly possible to reason "it's a good deal to lose my hand in this situation, because I'm trading it for getting my life saved in a different situation; one which hasn't come about, but could have."

I kind of feel like you're just repeatedly denying this line of reasoning. Yes, the situation in front of you is that you're in the risk-hand world rather than the risk-life world. But this is just question-begging with respect to updateful reasoning. Why give priority to that way of thinking over the "but it could just as well have been my life at stake" world? Especially when we can see that the latter way of reasoning does better on average?

>Decision theory should be reflectively endorsed decision theory. That's what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions

Thinking about decisions before you make them != thinking about decisions timelessly

Ah, that's kind of the first reply from you that's surprised me in a bit. Can you say more about that? My feeling is that in this particular case the equality seems to hold.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-31T23:43:06.012Z · score: 4 (2 votes) · LW · GW
considering I'm considering the case when you are only mugged once, that sounds an awful lot like saying it's reasonable to choose not to pay.

The perspective I'm coming from is that you have to ask how you came to be in the epistemic situation you're in. Setting agents up in decision problems "from nothing" doesn't tell us much, if it doesn't make sense for an agent to become confident that it's in that situation. An example of this is smoking lesion.
I've written before about how the usual version doesn't make very much sense as a situation that an agent can find itself in.

The best way to justify the usual "the agent finds itself in a decision problem" way of working is to have a learning-theoretic setup in which a learning agent can successfully learn that it's in the scenario. Once we have that, it makes sense to think about the one-shot case, because we have a plausible story whereby an agent comes to believe it's in the situation described.

This is especially important when trying to account for logical uncertainty, because now everything is learned -- you can't say a rational agent should be able to reason in a particular way, because the agent is still learning to reason.

If an agent is really in a pure one-shot case, that agent can do anything at all. Because it has not learned yet. So, yes, "it's reasonable to choose not to pay", BUT ALSO any behavior at all is reasonable in a one-shot scenario, because the agent hasn't had a chance to learn yet.

This doesn't necessarily mean you have to deal with an iterated counterfactual mugging. You can learn enough about the universe to be confident you're now in a counterfactual mugging without ever having faced one before.

But a key part of counterfactual mugging is that you haven't considered things ahead of time. I think it is important to engage with this aspect or explain why this doesn't make sense.

This goes along with the idea that it's unreasonable to consider agents as if they emerge spontaneously from a vacuum, face a single decision problem, and then disappear. An agent is evolved or built or something. This ahead-of-time work can't be in principle distinguished from "thinking ahead".

As I said above, this becomes especially clear if we're trying to deal with logical uncertainty on top of everything else, because the agent is still learning to reason. The agent has to have experience reasoning about similar stuff in order to learn.
We can give a fresh logical inductor a bunch of time to think about one thing, but how it spends that time is by thinking about all sorts of other logical problems in order to train up its heuristic reasoning. This is why I said all games are iterated games in logical time -- the logical inductor doesn't literally play the game a bunch of times to learn, but it simulates a bunch of parallel-universe versions of itself who have played a bunch of very similar games, which is very similar.

imagine instead of $50 it was your hand being cut off to save your life in the counterfactual. It's going to be awfully tempting to keep your hand. Why is what you would have committed to, but didn't relevant?

One way of appealing to human moral intuition (which I think is not vacuous) is to say, what if you know that someone is willing to risk great harm to save your life because they trust you the same, and you find yourself in a situation where you can sacrifice your own hand to prevent a fatal injury from happening to them? It's a good deal; it could have been your life on the line.

But really my justification is more the precommitment story. Decision theory should be reflectively endorsed decision theory. That's what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions. I'm fine with imagining hypothetically that we haven't thought about things ahead of time, as an exercise to help us better understand how to think. But that means my take-away from the exercise is based on which ways of thinking seemed to help get better outcomes, in the hypothetical situations envisioned!

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-30T21:22:11.407Z · score: 2 (1 votes) · LW · GW
My interest is in the counterfactual mugging in front of you, as this is the hardest part to justify. Future muggings aren't a difficult problem.

I'm not sure exactly what you're getting at, though. Obviously the counterfactual mugging in front of you is always the one that matters, in some sense. But if I've considered things ahead of time already when confronted with my very first counterfactual mugging, then I may have decided to handle counterfactual mugging by paying up in general. And further, there's the classic argument that you should always consider what you would have committed to ahead of time.

I'm kind of feeling like you're ignoring those arguments, or something? Or they aren't interesting for your real question?

Basically I keep talking about how "yes you can refuse a finite number of muggings" because I'm trying to say that, sure, you don't end up concluding you should accept every mugging, but generally the argument via treat-present-cases-as-if-they-were-future-cases seems pretty strong. And the response I'm hearing from you sounds like "but what about present cases?"

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-30T03:03:28.059Z · score: 4 (2 votes) · LW · GW
Why can't I use this argument for CDT in Newcomb's?

From my perspective right now, CDT does worse in Newcomb's. So, considering between CDT and EDT as ways of thinking about Newcomb, EDT and other 1-boxing DTs are better.
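
To make "does worse" concrete, here is a small sketch of average payoffs in Newcomb's problem as a function of predictor accuracy. The dollar amounts ($1,000,000 and $1,000) are the conventional ones and are an assumption here, as are the function and parameter names; nothing in this thread fixes them.

```python
def newcomb_expected_payoff(one_box: bool, accuracy: float) -> float:
    """Average payoff for an agent whose choice the predictor gets right
    with probability `accuracy`.

    Assumed (conventional) stakes: the opaque box holds $1,000,000 if the
    predictor expected one-boxing; the transparent box always holds $1,000.
    """
    big, small = 1_000_000, 1_000
    # The opaque box is full exactly when the predictor predicted one-boxing.
    p_full = accuracy if one_box else 1.0 - accuracy
    payoff = p_full * big
    if not one_box:
        payoff += small  # two-boxers also take the transparent box
    return payoff


if __name__ == "__main__":
    for acc in (0.6, 0.9, 0.99):
        print(acc, newcomb_expected_payoff(True, acc), newcomb_expected_payoff(False, acc))
```

With these assumed stakes, one-boxing has the higher average for any reasonably accurate predictor. That is the sense in which the comparison is made "from my perspective right now", before either box has been filled.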

What I meant to say instead of future actions is that it is clear that we should commit to UDT for future muggings, but less clear if the mugging was already set up.

Even UDT advises to not give in to muggings if it already knows, in its prior, that it is in the world where Omega asks for the \$10. But you have to ask: who would be motivated to create such a UDT? Only "parents" who already knew the mugging outcome themselves, and weren't motivated to act updatelessly about it. And where did they come from? At some point, more-rational agency comes from less-rational agency. In the model where a CDT agent self-modifies to become updateless, which counterfactual muggings the UDT agent will and won't be mugged by gets baked in at that time. With evolved creatures, of course it is more complicated.

I'm not sure, but it seems like our disagreement might be around the magnitude of this somehow. Like, I'm saying something along the lines of "Sure, you refuse some counterfactual muggings, but only finitely many. From the outside, that looks like making a finite number of mistakes and then learning." While you're saying something like, "Sure, you'd rather get counterfactually mugged for all future muggings, but it still seems like you want to take the one in front of you." (So from my perspective you're putting yourself in the shoes of an agent who hasn't "learned better" yet.)

The analogy is a little strained, but I am thinking about it like a Bayesian update. If you keep seeing things go a certain way, you eventually predict that. But that doesn't make it irrational to hedge your bets for some time. So it can be rational in that sense to refuse some counterfactual muggings. But you should eventually take them.
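
The Bayesian-update picture can be sketched with a Beta-Bernoulli model. This is only an illustration of the analogy, not a model of counterfactual mugging itself; the uniform prior and the function name are assumptions.

```python
def predictive_probability(successes: int, trials: int,
                           prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Posterior predictive probability that the next outcome is a success,
    under a Beta(prior_a, prior_b) prior on the success rate.

    The default Beta(1, 1) prior (uniform) is an assumption for illustration.
    """
    return (successes + prior_a) / (trials + prior_a + prior_b)


if __name__ == "__main__":
    # Every observation so far has "gone a certain way".
    for n in (0, 1, 5, 50):
        print(n, round(predictive_probability(n, n), 3))
```

Early on the prediction stays well away from certainty, which is the "hedging" phase; with more observations it approaches 1 without ever reaching it.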

The agent should still be able to solve such scenarios given a sufficient amount of time to think and the necessary starting information. Such as reliable reports about what happened to others who encountered counterfactual muggers

Basically, I don't think that way of thinking completely holds when we're dealing with logical uncertainty. A counterlogical mugging is a situation where time to think can, in a certain sense, hurt (if you fully update on that thinking, anyway). So there isn't such a clear distinction between thinking-from-starting-information and learning from experience.

Comment by abramdemski on What are we assuming about utility functions? · 2019-12-30T00:36:22.524Z · score: 2 (1 votes) · LW · GW

Yeah, I think something like this is pretty important. Another reason is that humans inherently don't like to be told, top-down, that X is the optimal solution. A utilitarian AI might redistribute property forcefully, where a pareto-improving AI would seek to compensate people.

An even more stringent requirement which seems potentially sensible: only pareto-improvements which both parties understand and endorse. (IE, there should be something like consent.) This seems very sensible with small numbers of people, but unfortunately, seems infeasible for large numbers of people (given the way all actions have side-effects for many many people).
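
The difference between a utilitarian improvement and a pareto-improvement can be sketched as two checks on per-person utilities. The function names and the numbers are made up for illustration, and the stricter "understand and endorse" requirement is not modeled here.

```python
from typing import Sequence


def is_pareto_improvement(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if no one is worse off and at least one person is better off."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same people")
    no_one_worse = all(a >= b for b, a in zip(before, after))
    someone_better = any(a > b for b, a in zip(before, after))
    return no_one_worse and someone_better


def total_welfare_improves(before: Sequence[float], after: Sequence[float]) -> bool:
    """A simple utilitarian check: does the sum of utilities go up?"""
    return sum(after) > sum(before)


if __name__ == "__main__":
    before = [10.0, 2.0]
    forced_redistribution = [7.0, 6.0]  # higher total, but person 0 loses out
    compensated_change = [10.0, 3.0]    # smaller gain, but nobody loses
    for after in (forced_redistribution, compensated_change):
        print(after, total_welfare_improves(before, after),
              is_pareto_improvement(before, after))
```

In this toy case both changes raise total welfare, but only the compensated one passes the pareto check.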