## Posts

Becoming Unusually Truth-Oriented 2020-01-03T01:27:06.677Z · score: 81 (29 votes)
The Credit Assignment Problem 2019-11-08T02:50:30.412Z · score: 64 (20 votes)
Defining Myopia 2019-10-19T21:32:48.810Z · score: 28 (6 votes)
Random Thoughts on Predict-O-Matic 2019-10-17T23:39:33.078Z · score: 27 (11 votes)
The Parable of Predict-O-Matic 2019-10-15T00:49:20.167Z · score: 188 (66 votes)
Partial Agency 2019-09-27T22:04:46.754Z · score: 53 (15 votes)
The Zettelkasten Method 2019-09-20T13:15:10.131Z · score: 126 (51 votes)
Do Sufficiently Advanced Agents Use Logic? 2019-09-13T19:53:36.152Z · score: 41 (16 votes)
Troll Bridge 2019-08-23T18:36:39.584Z · score: 73 (42 votes)
Conceptual Problems with UDT and Policy Selection 2019-06-28T23:50:22.807Z · score: 52 (13 votes)
What's up with self-esteem? 2019-06-25T03:38:15.991Z · score: 39 (18 votes)
How hard is it for altruists to discuss going against bad equilibria? 2019-06-22T03:42:24.416Z · score: 52 (15 votes)
Paternal Formats 2019-06-09T01:26:27.911Z · score: 60 (27 votes)
Mistakes with Conservation of Expected Evidence 2019-06-08T23:07:53.719Z · score: 148 (47 votes)
Does Bayes Beat Goodhart? 2019-06-03T02:31:23.417Z · score: 45 (14 votes)
Selection vs Control 2019-06-02T07:01:39.626Z · score: 111 (29 votes)
Separation of Concerns 2019-05-23T21:47:23.802Z · score: 70 (22 votes)
Alignment Research Field Guide 2019-03-08T19:57:05.658Z · score: 201 (73 votes)
Pavlov Generalizes 2019-02-20T09:03:11.437Z · score: 68 (20 votes)
What are the components of intellectual honesty? 2019-01-15T20:00:09.144Z · score: 32 (8 votes)
CDT=EDT=UDT 2019-01-13T23:46:10.866Z · score: 42 (11 votes)
When is CDT Dutch-Bookable? 2019-01-13T18:54:12.070Z · score: 25 (4 votes)
CDT Dutch Book 2019-01-13T00:10:07.941Z · score: 27 (8 votes)
Non-Consequentialist Cooperation? 2019-01-11T09:15:36.875Z · score: 46 (15 votes)
Combat vs Nurture & Meta-Contrarianism 2019-01-10T23:17:58.703Z · score: 54 (15 votes)
What makes people intellectually active? 2018-12-29T22:29:33.943Z · score: 91 (44 votes)
Embedded Agency (full-text version) 2018-11-15T19:49:29.455Z · score: 95 (38 votes)
Embedded Curiosities 2018-11-08T14:19:32.546Z · score: 84 (33 votes)
Subsystem Alignment 2018-11-06T16:16:45.656Z · score: 121 (39 votes)
Robust Delegation 2018-11-04T16:38:38.750Z · score: 120 (39 votes)
Embedded World-Models 2018-11-02T16:07:20.946Z · score: 90 (27 votes)
Decision Theory 2018-10-31T18:41:58.230Z · score: 99 (35 votes)
Embedded Agents 2018-10-29T19:53:02.064Z · score: 191 (81 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 2) 2018-10-09T05:41:25.282Z · score: 17 (6 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 1) 2018-10-04T04:32:49.483Z · score: 21 (7 votes)
In Logical Time, All Games are Iterated Games 2018-09-20T02:01:07.205Z · score: 83 (26 votes)
Track-Back Meditation 2018-09-11T10:31:53.354Z · score: 60 (24 votes)
Exorcizing the Speed Prior? 2018-07-22T06:45:34.980Z · score: 11 (4 votes)
Stable Pointers to Value III: Recursive Quantilization 2018-07-21T08:06:32.287Z · score: 20 (9 votes)
Probability is Real, and Value is Complex 2018-07-20T05:24:49.996Z · score: 44 (20 votes)
Complete Class: Consequentialist Foundations 2018-07-11T01:57:14.054Z · score: 43 (16 votes)
Policy Approval 2018-06-30T00:24:25.269Z · score: 49 (18 votes)
Machine Learning Analogy for Meditation (illustrated) 2018-06-28T22:51:29.994Z · score: 100 (37 votes)
Confusions Concerning Pre-Rationality 2018-05-23T00:01:39.519Z · score: 36 (7 votes)
Co-Proofs 2018-05-21T21:10:57.290Z · score: 91 (25 votes)
Bayes' Law is About Multiple Hypothesis Testing 2018-05-04T05:31:23.024Z · score: 81 (20 votes)
Words, Locally Defined 2018-05-03T23:26:31.203Z · score: 50 (15 votes)
Hufflepuff Cynicism on Hypocrisy 2018-03-29T21:01:29.179Z · score: 33 (17 votes)
Learn Bayes Nets! 2018-03-27T22:00:11.632Z · score: 84 (24 votes)
An Untrollable Mathematician Illustrated 2018-03-20T00:00:00.000Z · score: 268 (98 votes)

Comment by abramdemski on Realism about rationality · 2020-01-19T20:23:19.603Z · score: 4 (2 votes) · LW · GW

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

• The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
• Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather a matter of how you improve: logical induction is about asymptotically approximating coherence in a particular way, as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is largely about this kind of boundedness.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)

(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)

Comment by abramdemski on Realism about rationality · 2020-01-18T19:00:16.679Z · score: 2 (1 votes) · LW · GW

I generally like the re-framing here, and agree with the proposed crux.

I may try to reply more at the object level later.

Comment by abramdemski on The Zettelkasten Method · 2020-01-18T18:34:28.320Z · score: 2 (1 votes) · LW · GW

Yeah, I actually tried them, but didn't personally like them that well. They could definitely be an option for someone.

Comment by abramdemski on Realism about rationality · 2020-01-17T21:43:46.056Z · score: 10 (3 votes) · LW · GW
> (Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

> My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to make capabilities "blindly" -- basically the processing-power heavy type of AI progress (applying enough tricks so you're not literally recapitulating evolution, but you're sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.

So I believe in some kind of knowledge to be had (ie, point #1).

Yeah, so, taking stock of the discussion again, it seems like:

• There's a thing-I-believe-which-is-kind-of-like-rationality-realism.
• Points 1 and 2 together seem more in line with that thing than "rationality realism" as I understood it from the OP.
• You already believe #1, and somewhat believe #2.
• We are both pessimistic about #3, but I'm so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
• We probably do have some disagreement about something like "how real is rationality?" -- but I continue to strongly suspect it isn't that cruxy.
> (ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

I checked whether I thought the analogy was right with "reproductive fitness" and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I'm claiming that there's something like a theory of evolution out there for rationality.

Sorry it resulted in a confusing mixed metaphor overall.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct for it. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction, and therefore less real.

> To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.)

I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it's all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.

Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn't understand how organisms seeded on those planets would likely evolve.)

So -- it seems to me -- the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!

> My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.
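Kin selection's central criterion, Hamilton's rule, illustrates the kind of abstraction-building meant here: an altruistic act is favored by selection when rB > C, where r is relatedness, B the benefit to the recipient, and C the cost to the actor. A minimal sketch (the specific numbers are purely illustrative, not from the discussion):

```python
def altruism_favored(r: float, benefit: float, cost: float) -> bool:
    """Hamilton's rule: kin-directed altruism is selected for when r*B > C."""
    return r * benefit > cost

# Illustrative case: helping a full sibling (r = 0.5).
# A fitness cost of 1 is repaid only if the sibling gains more than 2.
assert altruism_favored(r=0.5, benefit=3.0, cost=1.0)      # 1.5 > 1
assert not altruism_favored(r=0.5, benefit=1.5, cost=1.0)  # 0.75 < 1
```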

As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.

Comment by abramdemski on Realism about rationality · 2020-01-17T20:10:45.871Z · score: 11 (2 votes) · LW · GW
> I was thinking of the difference between the theory of electromagnetism vs the idea that there's a reproductive fitness function, but that it's very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with 'fake' models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I'm unsure which theory rationality will end up closer to.

[Spoiler-boxing the following response not because it's a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn't want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]

I'm having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The "hard to determine what it is" from the first seems to lead directly to the "fake inputs" from the second.

So possibly you're gesturing at a level of realness which is "how real fitness functions would be if there were not a theory of population genetics"? But I'm not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?

> Separately, I feel weird having people ask me about why things are 'cruxy' when I didn't initially say that they were and without the context of an underlying disagreement that we're hashing out. Like, either there's some misunderstanding going on, or you're asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.

Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:

> If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.

And furthermore the list following this:

> Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:

So, yeah, I'm asking you about something which you haven't claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.

> I confess to being quite troubled by AIXI's language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than "polynomial in some input", which should be some input to a good theory of bounded rationality.

Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).

> I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.

So I now think we have a big disagreement around point "a" (just how real rationality is), but maybe not so much around "b" (what the consequences are for the various bullet points you listed).

Comment by abramdemski on Realism about rationality · 2020-01-13T13:59:03.743Z · score: 8 (4 votes) · LW · GW
> > Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).
>
> How so?
>
> I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me.

Well, it's not entirely clear. First there is the "realism" claim, which might even be taken in contrast to mathematical abstraction; EG, "is IQ real, or is it just a mathematical abstraction"? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where "accurate" means, at least in part, helpfulness in making real predictions).

So the idea seems to be that there's a spectrum with physics at one extreme end. I'm not quite sure what goes at the other extreme end. Here's one possibility:

• Physics
• Chemistry
• Biology
• Psychology
• Social Sciences
• Humanities

A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, "realness" vs "mathematical modelability". Well, it's not clear exactly what that second axis should be.

Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect "reproductive fitness" levels rather than "momentum" levels.

Hmm, actually, I guess there's a tricky interpretational issue here, which is what it means to model agency exactly.

• On the one hand, I fully believe in Eliezer's idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
• But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.

I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.

Comment by abramdemski on Realism about rationality · 2020-01-13T12:57:05.727Z · score: 4 (2 votes) · LW · GW
> ETA: I also have a model of you being less convinced by realism about rationality than others in the "MIRI crowd"; in particular, selection vs. control seems decidedly less "realist" than mesa-optimizers (which didn't have to be "realist", but was quite "realist" the way it was written, especially in its focus on search).

Just a quick reply to this part for now (but thanks for the extensive comment, I'll try to get to it at some point).

It makes sense. My recent series on myopia also fits this theme. But I don't get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of "agency" into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of "optimization". (This applies to people at MIRI and also people outside MIRI.)

*The one person who has given me push-back is Scott.

Comment by abramdemski on Realism about rationality · 2020-01-13T12:48:08.992Z · score: 2 (1 votes) · LW · GW

How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don't see why the distinction should be so cruxy.

My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren't "momentum vs reproductive fitness", but rather, "momentum vs the bystander effect" (ie, physics vs social psychology). Reproductive fitness implies something that's quite mathematizable, but with relatively "fake" models -- e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it is possible to get closer and closer.
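A minimal sketch of the kind of "fake but mathematizable" model meant here: textbook replicator dynamics with perfectly separated generations and perfect mixing, idealizations no real population satisfies exactly (all numbers illustrative):

```python
# One-locus, two-type selection model with non-overlapping generations
# and perfect mixing -- deliberately idealized assumptions.

def next_generation(p: float, w_a: float, w_b: float) -> float:
    """Frequency of type A after one generation of selection.

    p: current frequency of type A; w_a, w_b: fitnesses of types A and B.
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    return p * w_a / mean_fitness

# Starting rare, a 10% fitness advantage carries type A toward fixation.
p = 0.01
for _ in range(200):
    p = next_generation(p, w_a=1.1, w_b=1.0)
assert p > 0.9
```

The model says nothing about any particular organism's circumstances; that gap between the clean mathematics and messy reality is exactly the "fakeness" at issue.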

I think that's more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI exhibits a heavy dependence on the choice of programming language. Any theory of bounded rationality which doesn't ignore poly-time differences (ie, anything "closer to the ground" than logical induction) has to be hardware-dependent as well.

> If I didn't believe the above,

What alternative world are you imagining, though?

Comment by abramdemski on Realism about rationality · 2020-01-10T05:06:12.411Z · score: 53 (12 votes) · LW · GW

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it's trying to point at.

The main problem is the word "realism". It isn't clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.

I agree that there's something kind of like rationality realism. I just don't think this post successfully points at it.

Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether rationality is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc), and so, is selected on different criteria.

So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it's more like fitness.

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum", and (2) doesn't depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy -- evolutionary biologists still see fitness as a very important subject, and don't seem to care that much about exactly how real the abstraction is.)

To the extent that this post has made a lot of people think that rationality realism is an important crux, it's quite plausible to me that it's made the discussion worse.

To expand more on (1) -- since it seems a lot of people found its negation plausible -- it seems like if there's an analogue for the theory of evolution, which uses relatively unreal concepts like "fitness" to help us understand rational agency, we'd like to know about it. In this view, MIRI-cluster is essentially saying "biologists should want to invent evolution. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-03T05:13:31.042Z · score: 2 (1 votes) · LW · GW
> Let me explain more clearly why this is a circular argument:
>
> a) You want to show that we should take counterfactuals into account when making decisions
> b) You argue that this way of making decisions does better on average
> c) The average includes the very counterfactuals whose value is in question. So b depends on a already being proven => circular argument

That isn't my argument though. My argument is that we ARE thinking ahead about counterfactual mugging right now, in considering the question. We are not misunderstanding something about the situation, or missing critical information. And from our perspective right now, we can see that agreeing to be mugged is the best strategy on average.
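The "best strategy on average" claim can be made concrete with a small calculation. (A sketch: the $100 ask is from the problem as discussed here; the $10,000 heads-side reward is an assumed figure from common statements of counterfactual mugging.)

```python
# Expected value of each policy in counterfactual mugging, evaluated
# from the perspective *before* the coin is flipped.
# Assumed payoffs: Omega pays $10,000 on heads, but only to agents who
# would pay the $100 on tails.

REWARD = 10_000
COST = 100
P_HEADS = 0.5  # fair coin

def expected_value(would_pay: bool) -> float:
    heads_payoff = REWARD if would_pay else 0
    tails_payoff = -COST if would_pay else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff

ev_pay = expected_value(True)      # 0.5 * 10000 + 0.5 * (-100) = 4950.0
ev_refuse = expected_value(False)  # 0.0

# The paying policy wins on average, even though it loses in the
# tails-world considered in isolation.
assert ev_pay > ev_refuse
```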

We can see that if we update on the value of the coin flip being tails, we would change our mind about this. But the statement of the problem requires that there is also the possibility of heads. So it does not make sense to consider the tails scenario in isolation; that would be a different decision problem (one in which Omega asks us for $100 out of the blue, with no other significant backstory). So we (right now, considering how to reason about counterfactual muggings in the abstract) know that there are the two possibilities, with equal probability, and so the best strategy on average is to pay. So we see behaving updatefully as bad.

So my argument for considering the multiple possibilities is: the role of thinking about decision theory now is to help guide the actions of my future self.

You feel that I'm begging the question. I guess I take thinking about only this counterfactual as the default position, as where an average person is likely to be starting from, and I was trying to see if I could find an argument strong enough to displace this. So I'll freely admit I haven't provided a first-principles argument against focusing just on this counterfactual. I think the average person is going to be thinking about things like duty, honor, and consistency, which can serve some of the purpose of updatelessness. But sure, updateful reasoning is a natural kind of starting point, particularly coming from a background of modern economics or Bayesian decision theory.

But my argument is compatible with that starting point, if you accept my "the role of thinking about decision theory now is to help guide future actions" line of thinking. In that case, starting from updateful assumptions now, decision-theoretic reasoning makes you think you should behave updatelessly in the future. Whereas the assumption you seem to be using, in your objection to my line of reasoning, is "we should think of decision-theoretic problems however we think of problems now".
So if we start out as updateful agents, we would think about decision-theoretic problems and conclude "I should be updateful". If we start out as CDT agents, then when we think about decision-theoretic problems we would conclude that we should reason causally. EDT agents would think about problems and conclude that they should reason evidentially. And so on. That's the reasoning I'm calling circular.

Of course an agent should reason about a problem using its best current understanding. But my claim is that when doing decision theory, the way that best understanding should be applied is to figure out what decision theory does best, not to figure out what my current decision theory already does. And when we think about problems like counterfactual mugging, the description of the problem requires that there's both the possibility of heads and of tails. So "best" means best overall, not just down the one branch.

If the act of doing decision theory were generally serving the purpose of aiding in making the current decision, then my argument would not make sense, and yours would. Current-me might want to tell the me in that universe to be more updateless about things, but alternate-me would not be interested in hearing it, because alternate-me wouldn't be interested in thinking ahead in general, and the argument wouldn't make any sense with respect to alternate-me's current decision.

So my argument involves a fact about the world which I claim determines which of several ways to reason, and hence, is not circular.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-02T07:50:00.821Z · score: 2 (1 votes) · LW · GW

> Iterated situations are indeed useful for understanding learning. But I'm trying to abstract out over the learning insofar as I can. I care that you get the information required for the problem, but not so much how you get it. OK, but I don't see how that addresses my argument. The average includes worlds that you know you are not in.
> So this doesn't help us justify taking these counterfactuals into account.

This is the exact same response again (ie, the very kind of response I was talking about in my remark you're responding to), where you beg the question of whether we should evaluate from an updateful perspective. Why is it problematic that we already know we are not in those worlds? Because you're reasoning updatefully? My original top-level answer explained why I think this is a circular justification in a way that the updateless position isn't.

> I'm not saying you should reason in this way. You should reason updatelessly.

Ok. So what's at stake in this discussion is the justification for updatelessness, not the whether of updatelessness. I still don't get why you seem to dismiss my justification for updatelessness, though. All I'm understanding of your objection is a question-begging appeal to updateful reasoning.

Comment by abramdemski on What is an Evidential Decision Theory agent? · 2020-01-02T07:18:19.496Z · score: 6 (3 votes) · LW · GW

I'm posting a short response rather than there be none, although I think you are calling for a longer, more thoughtful response.

I would simply say an evidential agent selects an action via argmax_a E[U | a]; that is, it evaluates each action by (Bayes-)conditioning on that action and checking the expected utility.

Of course this simple formula can take on many complications when EDT is being described in more fleshed-out mathematical settings. Perhaps this is where part of the confusion comes from. There is some intuitive aspect to judging whether a more complicated formula is "essentially EDT". (For example, the classic rigorous formulation of EDT is the Jeffrey-Bolker axioms, which at a glance look nothing like the formula.) But I would say that most of the issue you're describing in the OP is that people think of EDT in terms of what it does or doesn't do, rather than in terms of this simple formula.
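That formula can be spelled out in a few lines of code; the joint distribution and utilities below are made-up toy numbers, purely for illustration:

```python
# EDT action selection: argmax over actions of E[U | action], where the
# expectation conditions on the action in an ordinary Bayesian way.

# Toy joint distribution P(action, outcome) -- illustrative numbers only.
joint = {
    ("a1", "good"): 0.3, ("a1", "bad"): 0.2,
    ("a2", "good"): 0.1, ("a2", "bad"): 0.4,
}
utility = {"good": 10.0, "bad": 0.0}

def conditional_expected_utility(action: str) -> float:
    """E[U | action], computed by Bayes-conditioning the joint on the action."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    return sum((p / p_action) * utility[o]
               for (a, o), p in joint.items() if a == action)

def edt_choice(actions=("a1", "a2")) -> str:
    return max(actions, key=conditional_expected_utility)

# Here E[U | a1] = (0.3/0.5)*10 = 6 and E[U | a2] = (0.1/0.5)*10 = 2,
# so the evidential agent picks a1.
assert edt_choice() == "a1"
```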
That seems to be genuinely solved by just writing out the formula when people seem unclear on what EDT is. Also, note, the claim that EDT doesn't smoke in the smoking lesion problem is quite controversial (the famous tickle defense argues to the contrary). This is related to your observation that EDT will often correctly navigate causality, because the causal structure is already encoded in the conditional probability. So that's part of why it's critical to think of EDT as the formula, rather than as what it supposedly does or doesn't do.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2020-01-01T20:23:06.502Z · score: 4 (2 votes) · LW · GW

>You can learn about a situation other than by facing that exact situation yourself. For example, you may observe other agents facing that situation or receive testimony from an agent that has proven itself trustworthy. You don't even seem to disagree with me here as you wrote: "you can learn enough about the universe to be confident you're now in a counterfactual mugging without ever having faced one before"

Right, I agree with you here. The argument is that we have to understand learning in the first place to be able to make these arguments, and iterated situations are the easiest setting to do that in. So if you're imagining that an agent learns what situation it's in more indirectly, but thinks about that situation differently than an agent who learned in an iterated setting, there's a question of why that is. It's more a priori plausible to me that a learning agent thinks about a problem by generalizing from similar situations it has been in, which I expect to act kind of like iteration. Or, as I mentioned re: all games are iterated games in logical time, the agent figures out how to handle a situation by generalizing from similar scenarios across logic. So any game we talk about is iterated in this sense.

>>One way of appealing to human moral intuition
>
>Doesn't work on counter-factually selfish agents

I disagree.
Reciprocal altruism and true altruism are kind of hard to distinguish in human psychology, but I said "it's a good deal" to point at the reciprocal-altruism intuition. The point being that acts of reciprocal altruism can be a good deal without having been considered ahead of time. It's perfectly possible to reason "it's a good deal to lose my hand in this situation, because I'm trading it for getting my life saved in a different situation; one which hasn't come about, but could have."

I kind of feel like you're just repeatedly denying this line of reasoning. Yes, the situation in front of you is that you're in the risk-hand world rather than the risk-life world. But this is just question-begging with respect to updateful reasoning. Why give priority to that way of thinking over the "but it could just as well have been my life at stake" framing? Especially when we can see that the latter way of reasoning does better on average?

>>Decision theory should be reflectively endorsed decision theory. That's what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions
>
>Thinking about decisions before you make them != thinking about decisions timelessly

Ah, that's kind of the first reply from you that's surprised me in a bit. Can you say more about that? My feeling is that in this particular case the equality seems to hold.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-31T23:43:06.012Z · score: 4 (2 votes) · LW · GW

>I'm considering the case when you are only mugged once

That sounds an awful lot like saying it's reasonable to choose not to pay. The perspective I'm coming from is that you have to ask how you came to be in the epistemic situation you're in. Setting agents up in decision problems "from nothing" doesn't tell us much, if it doesn't make sense for an agent to become confident that it's in that situation. An example of this is the smoking lesion problem.
I've written before about how the usual version doesn't make very much sense as a situation that an agent can find itself in. The best way to justify the usual "the agent finds itself in a decision problem" way of working is to have a learning-theoretic setup in which a learning agent can successfully learn that it's in the scenario. Once we have that, it makes sense to think about the one-shot case, because we have a plausible story whereby an agent comes to believe it's in the situation described. This is especially important when trying to account for logical uncertainty, because now everything is learned -- you can't say a rational agent should be able to reason in a particular way, because the agent is still learning to reason. If an agent is really in a pure one-shot case, that agent can do anything at all, because it has not learned yet.

So, yes, "it's reasonable to choose not to pay", BUT ALSO any behavior at all is reasonable in a one-shot scenario, because the agent hasn't had a chance to learn yet. This doesn't necessarily mean you have to deal with an iterated counterfactual mugging. You can learn enough about the universe to be confident you're now in a counterfactual mugging without ever having faced one before.

But a key part of counterfactual mugging is that you haven't considered things ahead of time. I think it is important to engage with this aspect or explain why this doesn't make sense. This goes along with the idea that it's unreasonable to consider agents as if they emerge spontaneously from a vacuum, face a single decision problem, and then disappear. An agent is evolved or built or something. This ahead-of-time work can't in principle be distinguished from "thinking ahead". As I said above, this becomes especially clear if we're trying to deal with logical uncertainty on top of everything else, because the agent is still learning to reason. The agent has to have experience reasoning about similar stuff in order to learn.
We can give a fresh logical inductor a bunch of time to think about one thing, but how it spends that time is by thinking about all sorts of other logical problems in order to train up its heuristic reasoning. This is why I said all games are iterated games in logical time -- the logical inductor doesn't literally play the game a bunch of times to learn, but it simulates a bunch of parallel-universe versions of itself who have played a bunch of very similar games, which comes to much the same thing.

>imagine instead of $50 it was your hand being cut off to save your life in the counterfactual. It's going to be awfully tempting to keep your hand. Why is what you would have committed to, but didn't, relevant?

One way of appealing to human moral intuition (which I think is not vacuous) is to say, what if you know that someone is willing to risk great harm to save your life because they trust you the same, and you find yourself in a situation where you can sacrifice your own hand to prevent a fatal injury from happening to them? It's a good deal; it could have been your life on the line.

But really my justification is more the precommitment story. Decision theory should be reflectively endorsed decision theory. That's what decision theory basically is: thinking we do ahead of time which is supposed to help us make decisions. I'm fine with imagining hypothetically that we haven't thought about things ahead of time, as an exercise to help us better understand how to think. But that means my take-away from the exercise is based on which ways of thinking seemed to help get better outcomes, in the hypothetical situations envisioned!
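For concreteness, the expected-utility comparison behind "which ways of thinking get better outcomes" can be spelled out (a toy sketch, assuming the conventional $100 cost and $10,000 reward; any reward greater than the cost gives the same conclusion):

```python
# Expected value of the two policies in counterfactual mugging, evaluated
# from the perspective before the coin flip. Omega pays the reward on
# heads only if the agent is the kind that pays the cost on tails.
def policy_value(pays_up, reward=10_000, cost=100):
    heads = 0.5 * (reward if pays_up else 0)  # reward only for committed payers
    tails = 0.5 * (-cost if pays_up else 0)   # payers lose the cost on tails
    return heads + tails

print(policy_value(True))   # 4950.0
print(policy_value(False))  # 0.0
```

The paying policy wins on average; the whole dispute is over whether that average, taken before the flip, is the right standard once you have seen tails.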

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-30T21:22:11.407Z · score: 2 (1 votes) · LW · GW
My interest is in the counterfactual mugging in front of you, as this is the hardest part to justify. Future muggings aren't a difficult problem.

I'm not sure exactly what you're getting at, though. Obviously counterfactual mugging in front of you is always the one that matters, in some sense. But if I've considered things ahead of time already when confronted with my very first counterfactual mugging, then I may have decided to handle counterfactual mugging by paying up in general. And further, there's the classic argument that you should always consider what you would have committed to ahead of time.

I'm kind of feeling like you're ignoring those arguments, or something? Or they aren't interesting for your real question?

Basically I keep talking about how "yes you can refuse a finite number of muggings" because I'm trying to say that, sure, you don't end up concluding you should accept every mugging, but generally the argument via treat-present-cases-as-if-they-were-future-cases seems pretty strong. And the response I'm hearing from you sounds like "but what about present cases?"

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-30T03:03:28.059Z · score: 4 (2 votes) · LW · GW
>Why can't I use this argument for CDT in Newcomb's?

From my perspective right now, CDT does worse in Newcomb's. So, considering between CDT and EDT as ways of thinking about Newcomb, EDT and other 1-boxing DTs are better.

>What I meant to say instead of future actions is that it is clear that we should commit to UDT for future muggings, but less clear if the mugging was already set up.

Even UDT advises not giving in to muggings if it already knows, in its prior, that it is in the world where Omega asks for the money. But you have to ask: who would be motivated to create such a UDT? Only "parents" who already knew the mugging outcome themselves, and weren't motivated to act updatelessly about it. And where did they come from? At some point, more-rational agency comes from less-rational agency. In the model where a CDT agent self-modifies to become updateless, which counterfactual muggings the UDT agent will and won't be mugged by gets baked in at that time. With evolved creatures, of course, it is more complicated.

I'm not sure, but it seems like our disagreement might be around the magnitude of this somehow. Like, I'm saying something along the lines of "Sure, you refuse some counterfactual muggings, but only finitely many. From the outside, that looks like making a finite number of mistakes and then learning." While you're saying something like, "Sure, you'd rather get counterfactually mugged for all future muggings, but it still seems like you want to take the one in front of you." (So from my perspective you're putting yourself in the shoes of an agent who hasn't "learned better" yet.)

The analogy is a little strained, but I am thinking about it like a Bayesian update. If you keep seeing things go a certain way, you eventually predict that. But that doesn't make it irrational to hedge your bets for some time. So it can be rational in that sense to refuse some counterfactual muggings. But you should eventually take them.

>The agent should still be able to solve such scenarios given a sufficient amount of time to think and the necessary starting information. Such as reliable reports about what happened to others who encountered counterfactual muggers

Basically, I don't think that way of thinking completely holds when we're dealing with logical uncertainty.
A counterlogical mugging is a situation where time to think can, in a certain sense, hurt (if you fully update on that thinking, anyway). So there isn't such a clear distinction between thinking-from-starting-information and learning from experience. Comment by abramdemski on What are we assuming about utility functions? · 2019-12-30T00:36:22.524Z · score: 2 (1 votes) · LW · GW Yeah, I think something like this is pretty important. Another reason is that humans inherently don't like to be told, top-down, that X is the optimal solution. A utilitarian AI might redistribute property forcefully, where a pareto-improving AI would seek to compensate people. An even more stringent requirement which seems potentially sensible: only pareto-improvements which both parties both understand and endorse. (IE, there should be something like consent.) This seems very sensible with small numbers of people, but unfortunately, seems infeasible for large numbers of people (given the way all actions have side-effects for many many people). Comment by abramdemski on What are we assuming about utility functions? · 2019-12-30T00:31:16.102Z · score: 2 (1 votes) · LW · GW I didn't reply to this originally, probably because I think it's all pretty reasonable. That's why I distinguished between the hypotheses of "human utility" and CEV. It is my vague understanding (and I could be wrong) that some alignment researchers see it as their task to align AGI with current humans and their values, thinking the "extrapolation" less important or that it will take care of itself, while others consider extrapolation an important part of the alignment problem. My thinking on this is pretty open. In some sense, everything is extrapolation (you don't exactly "currently" have preferences, because every process is expressed through time...). But OTOH there may be a strong argument for doing as little extrapolation as possible. My intuitions tend to agree, but I'm also inclined to ask "why not?" e.g. 
even if my preferences are absurdly cyclical, but we get AGI to imitate me perfectly (or me + faster thinking + more information) Well, imitating you is not quite right. (EG, the now-classic example introduced with the CIRL framework: you want the AI to help you make coffee, not learn to drink coffee itself.) Of course maybe it is imitating you at some level in its decision-making, like, imitating your way of judging what's good. under what sense of the word is it "unaligned" with me? I'm thinking things like: will it disobey requests which it understands and is capable of? Will it fight you? Not to say that those things are universally wrong to do, but they could be types of alignment we're shooting for, and inconsistencies do seem to create trouble there. Presumably if we know that it might fight us, we would want to have some kind of firm statement about what kind of "better" reasoning would make it do so (e.g., it might temporarily fight us if we were severely deluded in some way, but we want pretty high standards for that). Comment by abramdemski on What are we assuming about utility functions? · 2019-12-29T22:44:19.945Z · score: 19 (4 votes) · LW · GW Yeah, I don't 100% buy the arguments which I gave in bullet-points in my previous comment. But I guess I would say the following: I expect to basically not buy any descriptive theory of human preferences. It doesn't seem likely we could find super-prospect theory which really successfully codified the sort of inconsistencies which we see in human values, and then reap some benefits for AI alignment. So it seems like what you want to do instead is make very few assumptions at all. Assume that the human can do things like answer questions, but don't expect responses to be consistent even in the most basic sense of "the same answer to the same question". Of course, this can't be the end of the story, since we need to have a criterion -- what it means to be aligned with such a human. 
But hopefully the criterion would also be as agnostic as possible. I don't want to rely on specific theories of human irrationality. So, when you say you want to see more discussion of this because it is "absolutely critical", I am curious about your model of what kind of answers are possible and useful.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-23T22:06:08.844Z · score: 2 (1 votes) · LW · GW

>I actually agree with Eliezer's argument that winning is more important than abstract conventions of thought. It's just that it's not always clear which option is winning. Indeed here, as I've argued, winning seems to match more directly to not paying, and abstract conventions of thought to the arguments about the counterfactual.

It seems to me as if you're ignoring the general thrust of my position, which is that the notion of winning that's important is the one we have in hand when we are thinking about what decision procedure to use. This seems to strongly favor paying up in counterfactual mugging, except for some finite set of counterfactual muggings which we already know about at the time when we consider this.

>Yeah, I'm not disputing pre-committing to UDT for future actions; the question is more difficult when it comes to past actions.

One thought: even if you're in a counterfactual mugging that was set up before you came into existence, before you learn about it you might have time to pre-commit to paying in any such situations. It seems right to focus on future actions, because those are the ones which our current thoughts about which decision theory to adopt will influence.

>Well, this is the part of the question I'm interested in. As I said, I have no objection to pre-committing to UDT for future actions

So is it that we have the same position with respect to future counterfactual muggings, but you are trying to figure out how to deal with present ones?
I think that since no agent can be perfect from the start, we always have to imagine that an agent will make some mistakes before it gets on the right track. So if it refuses to be counterfactually mugged a few times before settling on a be-mugged strategy, we cannot exactly say that was rational or irrational; it depends on the prior. An agent might assent or refuse to pay up on a counterfactual mugging on the 5th digit of π. We can't absolutely call that right or wrong. So, I think how an agent deals with a single counterfactual mugging is kind of its own business. It is only clear that it should not refuse mugging forever. (And if it refuses mugging for a really long time, this feels not so good, even if it would eventually start being mugged.)

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-23T21:49:31.631Z · score: 4 (2 votes) · LW · GW

Here is my understanding. I was not really involved in the events, so take this with a grain of salt; it's all third-hand. FDT was attempting to be an umbrella term for "MIRI-style decision theories", ie decision theories which 1-box on Newcomb, cooperate in the twin prisoner's dilemma, accept counterfactual muggings, grapple with logical uncertainty rather than ignoring it, and don't require free will (ie, can be implemented as deterministic algorithms without conceptual problems that the decision theory doesn't provide the tools to handle). The two main theories which FDT was trying to be an umbrella term for were UDT and TDT (timeless decision theory). However, the FDT paper leaned far toward TDT ways of describing things -- specifically, giving diagrams which look like causal models, and describing the decision procedure as making an intervention on the node corresponding to the output of the decision algorithm. This was too far from how Wei Dai envisions UDT.
So FDT ended up being mostly a re-branding of TDT, but with less concrete detail (so FDT is an umbrella term for a family of TDT-like decision theories, but, not an umbrella large enough to encompass UDT). I think of TDT and UDT as about equally capable, but only if TDT does anthropic reasoning. Otherwise, UDT is strictly more capable, because TDT will not pay in counterfactual mugging, because it updates on its observations. FDT cannot be directly compared, because it is simply more vague than TDT. Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-23T21:29:06.266Z · score: 4 (2 votes) · LW · GW Yep, "it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem", and these "ways of approaching the problem" are coded by the agent designer, whether explicitly, or by making it create and apply a "decision theory" subroutine. Once the algorithm is locked in by the designer (who is out of scope for OO), in this world an OO already knows what decision theory the agent will discover and use. TL;DR: the agent is in scope of OO, while the agent designer is out of scope and so potentially has the grounds of thinking of themselves as "making a (free) decision" without breaking self-consistency, while the agent has no such luxury. That's the "special point in the chain". What exactly does in-scope / out-of-scope mean? The OO has access to what the designer does (since the designer's design is given to the OO), so for practical purposes, the OO is predicting the designer perfectly. Just not by simulating the OO. Seems like this is what is relevant in this case. I am making no claims here whether in the "real world" we are more like agents or more like agent designers, since there are no OOs that we know of that could answer the question. But you are making the claim that there is an objective distinction. 
It seems to me more like a subjective one: I can look at an algorithm from a number of perspectives; some of them will be more like OO (seeing it as "just an algorithm"), while others will regard the algorithm as an agent (unable to calculate exactly what the algorithm will do, they're forced to take the intentional stance). IE, for any agent you can imagine an OO for that agent, while you can also imagine a number of other perspectives. (Even if there are true-random bits involved in a decision, we can imagine an OO with access to those true-random bits. For quantum mechanics this might involve a violation of physics (e.g. no-cloning theorem), which is important in some sense, but doesn't strike me as so philosophically important.) I don't know what it means for there to be a more objective distinction, unless it is the quantum randomness thing, in which case maybe we largely agree on questions aside from terminology. Well. I am not sure that "it has to "think as if it has a choice"". Thinking about having a choice seems like it requires an internal narrator, a degree of self-awareness. It is an open question whether an internal narrator necessarily emerges once the algorithm complexity is large enough. In fact, that would be an interesting open problem to work on, and if I were to do research in the area of agency and decision making, I would look into this as a project. If an internal narrator is not required, then there is no thinking about choices, just following the programming that makes a decision. A bacteria following a sugar gradient probably doesn't think about choices. Not sure what counts as thinking for a chess program and whether it has the quale of having a choice. I want to distinguish "thinking about choices" from "awareness of thinking about choices" (which seems approximately like "thinking about thinking about choices", though there's probably more to it). 
I am only saying that it is thinking about choices, ie computing relative merits of different choices, not that it is necessarily consciously aware of doing so, or that it has an internal narrator. It "has a perspective" from which it has choices in that there is a describable epistemic position which it is in, not that it's necessarily self-aware of being in that position in a significant sense.

>If you know that someone has predicted your behavior, then you accept that you are a deterministic algorithm, and the inventing of the decision algorithm is just a deterministic subroutine of it. I don't think we disagree there.

(correct)

>The future is set, you are relegated to learning about what it is, and to feel the illusion of inventing the decision algorithm and/or acting on it. A self-consistent attitude in the OO setup is more like "I am just acting out my programming, and it feels like making decisions".

This seems to be where we disagree. It is not like there is a separate bit of clockwork deterministically ticking away and eventually spitting out an answer, with "us" standing off to the side and eventually learning what decision was made. We are the computation which outputs the decision. Our hand is not forced. So it does not seem right to me to say that the making of decisions is only an illusion. If we did not think through the decisions, they would in fact not be made the same. So the thing-which-determines-the-decision is precisely such thinking. There is not a false perception about what hand is pulling the strings in this scenario; so what is the illusion?

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-20T16:54:45.794Z · score: 2 (1 votes) · LW · GW

Ok, this helps me understand your view better. But not completely.

>I don't think there is such a big difference between the agent and the agent-designer. Who are the "we" in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box.
Can the NPC itself change their algorithm? We (as humans) are (always) still figuring out how to make decisions. From our perspective, we are still inventing the decision algorithm. From OO's perspective, we were always going to behave a certain way. But, this does not contradict our perspective; OO just knows more. In the computer-programmed scenario, there is a chain of decision points: we think of the idea -> we start programming, and design various bots -> the bots themselves learn (in the case of ML bots), which selects between various strategies -> the strategies themselves perform some computation to select actions In the OO case, it does not matter so much where in this chain a particular computation occurs (because omniscient omega can predict the entire chain equally well). So it might be that I implement a bit of reasoning when writing a bot; or it might be the learning algorithm that implements that bit of reasoning; or it might be the learned strategy. Similarly, we have a chain which includes biological evolution, cultural innovation, our parents meeting, our conception, our upbringing, what we learn in school, what we think about at various points in our lives, leading up to this moment. Who are the 'we" in this setup? A world designer can sure create an NPC (which you are in this setup) to one-box. Can the NPC itself change their algorithm? I do not think there is a special point in the chain. Well -- it's true that different points in the chain have varying degrees of agency. But any point in the chain which is performing important computation "could", from its perspective, do something differently, changing the remainder of the chain. So we, the bot designer, could design the bot differently (from our perspective when choosing how to design the bot). The bot's learning algorithm could have selected a different strategy (from its perspective). And the strategy could have selected a different action. 
Of course, from our perspective, it is a little difficult to imagine the learning algorithm selecting a different strategy, if we understand how the learning algorithm works. And it is fairly difficult to imagine the strategy selecting a different action, since it is going to be a relatively small computation. But this is the same way that OO would have difficulty thinking of us doing something different, since OO can predict exactly what we do and exactly how we arrive at our decision. The learning algorithm's entire job is to select between different alternative strategies; it has to "think as if it has a choice", or else it could not perform the computation it needs to perform. Similarly, the learned strategy has to select between different actions; if there is a significant computational problem being solved by doing this, it must be "thinking as if it had a choice" as well (though, granted, learned strategies are often more like lookup tables, in which case I would not say that). This does not mean choice is an illusion at any point in the chain. Choice is precisely the computation which chooses between alternatives. The alternatives are an illusion, in that counterfactuals are subjective. So that's my view. I'm still confused about aspects of your view. Particularly, this: If you are a sufficiently smart NPC in the OO world, you will find that the only self-consistent approach is to act while knowing that you are just acting out your programming and that "decisions" are an illusion you cannot avoid. How is this consistent with your assertion that OO-problems are inconsistent because "you cannot optimize for interaction with an interaction with OO"? As you say, the NPC is forced to consider the "illusion" of choice -- it is an illusion which cannot be avoided. Furthermore, this is due to the real situation which it actually finds itself in. (Or at least, the realistic scenario which we are imagining it is in.) 
So it seems to me it is faced with a real problem, which it actually has to solve; and, there are better and worse ways of approaching this problem (e.g., UDT-like thinking will tend to produce better results). So, • The alternatives are fake (counterfactuals are subjective), but, • The problem is real, • The agent has to make a choice, • There are better and worse ways of reasoning about that choice -- we can see that agents who reason in one way or another do better/worse, • It helps to study better and worse ways of reasoning ahead of time (whether that's by ML algorithms learning, or humans abstractly reasoning about decision theory). So it seems to me that this is very much like any other sort of hypothetical problem which we can benefit from reasoning about ahead of time (e.g., "how to build bridges"). The alternatives are imaginary, but the problem is real, and we can benefit from considering how to approach it ahead of time (whether we're human or sufficiently advanced NPC). Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-20T02:54:41.692Z · score: 2 (1 votes) · LW · GW One way of experimenting with this would be to use simulable agents (such as RL agents). We could set up the version where Omega is perfectly infallible (simulate the agent 100% accurately, including any random bits) and watch what different decision procedures do in this situation. So, we can set up OO situations in reality. If we did this, we could see agents both 1-boxing and 2-boxing. We would see 1-boxers get better outcomes. Furthermore, if we were designing agents for this task, designing them to 1-box would be a good strategy. This seems to undermine your position that OO situations are self-contradictory (since we can implement them on computers), and that the advice to 1-box is meaningless. 
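A minimal sketch of such an experiment (deterministic agents, so "Omega" predicts simply by running them; the names and payoffs are illustrative, not from the thread):

```python
# Tiny concrete version of the "perfectly infallible Omega" setup: agents
# are deterministic functions, so Omega "predicts" one by calling it.
# The opaque box is then filled according to the prediction.

def one_boxer():
    return ["opaque"]

def two_boxer():
    return ["opaque", "transparent"]

def run_newcomb(agent):
    prediction = agent()  # Omega simulates the agent with 100% accuracy
    payout = {"opaque": 1_000_000 if prediction == ["opaque"] else 0,
              "transparent": 1_000}
    return sum(payout[box] for box in agent())

print(run_newcomb(one_boxer))  # 1000000
print(run_newcomb(two_boxer))  # 1000
```

Nothing self-contradictory happens in this program, and the 1-boxing policy straightforwardly comes out ahead.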
If we try to write a decision-making algorithm based on "you are a deterministic automaton; whatever you feel or think or pretend to decide is an artifact of the algorithm that runs you, so the question whether to pay is meaningless; you either will pay or will not, and you have no control over it," we would not have an easy time.

Comment by abramdemski on Counterfactual Mugging: Why should you pay? · 2019-12-20T02:38:53.640Z · score: 9 (2 votes) · LW · GW

I'm most fond of the precommitment argument. You say:

>You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn't pre-commit and you didn't know about it ahead of time, so the burden is on you to justify why you should act as though you did. In Newcomb's problem you want to have pre-committed, and if you act as though you were pre-committed then you will find that you actually were pre-committed. However, here it is the opposite. Upon discovering that the coin came up tails, you want to act as though you were not pre-committed to pay, and if you act that way, you will find that you actually were indeed not pre-committed.

I do not think this gets at the heart of the precommitment argument. You mention cousin_it's argument that what we care about is what decision theory we'd prefer a benevolent AI to use. You grant that this makes sense for that case, but you seem skeptical that the same reasoning applies to humans. I argue that it does. When reasoning abstractly about decision-making, I am (in part) thinking about how I would like myself to make decisions in the future. So it makes sense for me to say to myself, "Ah, I'd want to be counterfactually mugged." I will count being-counterfactually-mugged as a point in favor of proposed ways of thinking about decisions; I will count not-being-mugged as a point against.
This is not, in itself, a precommitment; this is just a heuristic about good and bad reasoning as it seems to me when thinking about it ahead of time. A generalization of this heuristic is, "Ah, it seems any case where a decision procedure would prefer to make a commitment ahead of time but would prefer to do something different in the moment is a point against that decision procedure." I will, thinking about decision-making in the abstract as things seem to me now, tend to prefer decision procedures which avoid such self-contradictions. In other words, thinking about what constitutes good decision-making in the abstract seems a whole lot like thinking about how we would want a benevolent AI to make decisions. You could argue that I might think such things now, and might think up all sorts of sophisticated arguments which fit that picture, but later, when Omega asks me for $100, if I re-think my decision-theoretic concepts at that time, I'll know better.

But, based on what principles would I be reconsidering? I can think of some. It seems to me now, though, that those principles are mistaken, and I should instead reason using principles which are more self-consistent -- principles which, when faced with the question of whether to give Omega $100, arrive at the same answer I currently think to be right. Of course this cannot be a general argument that I prefer to reason by principles which will arrive at conclusions consistent with my current beliefs. What I can do is consider the impact which particular ways of reasoning about decisions have on my overall expected utility (assuming I start out reasoning with some version of expected utility theory). Doing so, I will prefer UDT-like ways of reasoning when it comes to problems like counterfactual mugging. You might argue that beliefs are for true things, so I can't legitimately discount ways-of-thinking just because they have bad consequences. But, these are ways-of-thinking-about-decisions. The point of ways-of-thinking-about-decisions is winning. And, as I think about it now, it seems preferable to think about it in those ways which reliably achieve higher expected utility (the expectation being taken from my perspective now). Nor is this a quirk of my personal psychology, that I happen to find these arguments compelling in my current mental state, and so, when thinking about how to reason, prefer methods of reasoning which are more consistent with precommitments I would make. Rather, this seems like a fairly general fact about thinking beings who approach decision-making in a roughly expected-utility-like manner. Perhaps you would argue, like the CDT-er sometimes does in response to Newcomb, that you cannot modify your approach to reasoning about decisions so radically. You see that, from your perspective now, it would be better if you reasoned in a way which made you accept future counterfactual muggings. 
You'd see, in the future, that you are making a choice inconsistent with your preferences now. But this only means that you have different preferences then and now. And anyway, the question of decision theory should be what to do given preferences, right? You can take that perspective, but it seems you must do so regretfully -- you should wish you could self-modify in that way. Furthermore, to the extent that a theory of preferences sits in the context of a theory of rational agency, it seems like preferences should be the kind of thing which tends to stay the same over time, not the sort of thing which changes like this. Basically, it seems that assuming preferences remain fixed, beliefs about what you should do given those preferences and certain information should not change (except due to bounded rationality). IE: certainly I may think I should go to the grocery store but then change my mind when I learn it's closed. But I should not start out thinking that I should go to the grocery store even in the hypothetical where it's closed, and then, upon learning it's closed, go home instead. (Except due to bounded rationality.) That's what is happening with CDT in counterfactual mugging: it prefers that its future self should, if asked for $100, hand it over; but, when faced with the situation, it thinks it should not hand it over.

The CDTer response ("alas, I cannot change my own nature so radically") presumes that we have already figured out how to reason about decisions. I imagine that the real crux behind such a response is actually that CDT feels like the true answer, so that the non-CDT answer does not seem compelling even once it is established to have a higher expected value. The CDTer feels as if they'd have to lie to themselves to 1-box. The truth is that they could modify themselves so easily, if they thought the non-CDT answer was right! They protest that Newcomb's problem simply punishes rationality. But this argument presumes that CDT defines rationality.

An EDT agent who asks how best to act in future situations to maximize expected value in those situations will arrive back at EDT, since expected-value-in-the-situation is the very criterion which EDT already uses. However, this is a circular way of thinking -- we can make a variant of that kind of argument which justifies any decision procedure.

A CDT or EDT agent who asks itself how best to act in future situations to maximize expected value as estimated by its current self will arrive at UDT. Furthermore, that's the criterion it seems an agent ought to use when weighing the pros and cons of a decision theory; not the expected value according to some future hypothetical, but the expected value of switching to that decision theory now.

And, remember, it's not the case that we will switch back to CDT/EDT if we reconsider which decision theory is highest-expected-utility when we are later faced with Omega asking for $100. We'd be a UDT agent at that point, and so, would consider handing over the $100 to be the highest-EV action.
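The from-the-prior comparison can be sketched directly (the $100 cost is from the problem as discussed here; the $10,000 reward on heads is the usual illustrative figure, assumed for concreteness):

```python
def ev_from_prior(pays_on_tails, reward=10_000, cost=100):
    """Expected value of a policy evaluated before the coin flip.
    Heads: Omega pays `reward` iff its simulation shows the agent would
    have paid on tails. Tails: the agent loses `cost` iff it pays."""
    heads_payout = reward if pays_on_tails else 0
    tails_payout = -cost if pays_on_tails else 0
    return 0.5 * heads_payout + 0.5 * tails_payout

# ev_from_prior(True) -> 4950.0; ev_from_prior(False) -> 0.0
```

Evaluated from the prior, the paying policy wins; and an agent which keeps using this evaluation, rather than re-computing from the ex-post perspective, is exactly the agent that hands over the money when asked.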

I expect another protest at this point -- that the question of which decision theory gets us the highest expected utility by our current estimation isn't the same as which one is true or right. To this I respond that, if we ask what highly capable agents would do ("highly intelligent"/"highly rational"), we would expect them to be counterfactually mugged -- because highly capable agents would (by the assumption of their high capability) self-modify if necessary in order to behave in the ways they would have precommitted to behave. So, this kind of decision theory / rationality seems like the kind you'd want to study to better understand the behavior of highly capable agents; and, the kind you would want to imitate if trying to become highly capable. This seems like an interesting enough thing to study. If there is some other thing, "the right decision theory", to study, I'm curious what that other thing is -- but it does not seem likely to make me lose interest in this thing (the normative theory I currently call decision theory, in which it's right to be counterfactually mugged).

a) it's possible that a counterfactual mugging situation could have been set up before an AI was built

My perspective now already includes some amount of updateless reasoning, so I don't necessarily find that compelling. However, I do agree that even according to UDT there's a subjective question of how much information should be incorporated into the prior. So, for example, it seems sensible to refuse counterfactual mugging on the first digit of pi.

Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map?

It seems worth pointing out that we might deal with this via anthropic reasoning. We don't need to believe that the counterfactual selves literally exist; rather, we are unsure whether we are being simulated. If we are being simulated, then the other self (in a position to get $1000) really does exist.

Caveat: There are a few hedge-words and qualifiers in the above which the casual reader might underestimate the importance of. For example, when I say "(except due to bounded rationality)" I really mean that many parts of the argument I'm making crumble to dust in the face of bounded rationality, not that bounded rationality is a small issue which I set aside for convenience in the argument above. Keep in mind that I've recently been arguing against UDT. However, I do still think it is right to be counterfactually mugged, for something resembling the reasons I gave. It's just that many details of the argument I'm making really don't work for embedded agents -- to such a large extent that I've become pessimistic about UDT-like ideas.

Comment by abramdemski on Two Types of Updatelessness · 2019-12-19T19:21:04.826Z · score: 2 (1 votes) · LW · GW

I still use this distinction in my thinking, but I guess I haven't had any significant payoffs yet in terms of factoring the problem this way and then solving one part or another. Basically, all-upside cases are cases which you can solve by "updatelessness without uncertainty" -- you don't have to make trade-offs, you just have to recognize the better strategy. This is kind of natural for logical updatelessness (you are still calculating your probabilities, so you don't have probabilities yet, but you can make decisions anyway), but also kind of really unnatural (e.g., I don't know how to make this fit well with logical induction).

Comment by abramdemski on Decision Theory · 2019-12-19T19:07:09.337Z · score: 7 (3 votes) · LW · GW

Sure, but what computation do you then do, to figure out what UDT recommends?
You have to have, written down, a specific prior which you evaluate everything with. That's the problem. As discussed in Embedded World Models, a Bayesian prior is not a very good object for an embedded agent's beliefs, due to realizability/grain-of-truth concerns; that is, specifically because a Bayesian prior needs to list all possibilities explicitly (to a greater degree than, e.g., logical induction).

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-11-29T23:42:46.455Z · score: 37 (8 votes) · LW · GW

"This may be the most horrifying thing I have ever read."

I'm amused that this sentence is likely the highest praise for my writing I've ever received.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-17T16:09:05.157Z · score: 7 (4 votes) · LW · GW

Actually, that wasn't what I was trying to say. But, now that I think about it, I think you're right. I was thinking of the discounting variant of REINFORCE as having a fixed, but rather bad, model associating rewards with actions: rewards are tied more with actions nearby. So I was thinking of it as still two-level, just worse than actor-critic. But, although the credit assignment will make mistakes (a predictable punishment which the agent can do nothing to avoid will nonetheless make any actions leading up to the punishment less likely in the future), they should average out in the long run (those 'wrongfully punished' actions should also be 'wrongfully rewarded'). So it isn't really right to think it strongly depends on the assumption. Instead, it's better to think of it as a true discounting function. IE, it's not an assumption about the structure of consequences; it's an expression of how much the system cares about distant rewards when taking an action. Under this interpretation, REINFORCE indeed "closes the gradient gap" -- solves the credit assignment problem w/o restrictive modeling assumptions. Maybe.
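The discounting variant of REINFORCE discussed here can be sketched as an eligibility-trace update. This is a toy illustration, not anyone's reference implementation: the two-action bandit with a one-step reward delay, the decay constant, and the learning rate are all made up.

```python
import math
import random

def softmax(prefs):
    m = max(prefs)  # subtract max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def run(decay=0.9, alpha=0.1, steps=5000, seed=0):
    # Non-episodic REINFORCE sketch: every incoming reward nudges the
    # log-probability of *all* past actions, weighted by decay**(age)
    # via an eligibility trace -- the "fixed model" tying rewards more
    # strongly to temporally nearby actions.
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # softmax preferences over two actions
    trace = [0.0, 0.0]   # decayed sum of grad-log-prob vectors
    pending = 0.0        # toy environment: action 0 pays 1, one step late
    for _ in range(steps):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        grad = [(1.0 if i == a else 0.0) - probs[i] for i in range(2)]
        trace = [decay * t + g for t, g in zip(trace, grad)]
        reward, pending = pending, (1.0 if a == 0 else 0.0)
        prefs = [p + alpha * reward * t for p, t in zip(prefs, trace)]
    return softmax(prefs)

final = run()
```

The credit assignment does make the mistakes described above -- whatever action happens to be taken just before a reward arrives gets some credit -- but the systematic correlation with the delayed-reward action dominates, so the preference for it should climb toward 1 over the run.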
It might also be argued that REINFORCE depends on some properties of the environment such as ergodicity. I'm not that familiar with the details. But anyway, it now seems like a plausible counterexample.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T23:22:26.878Z · score: 2 (1 votes) · LW · GW

The online learning conceptual problem (as I understand your description of it) says, for example, I can never know whether it was a good idea to have read this book, because maybe it will come in handy 40 years later. Well, this seems to be "solved" in humans by exponential / hyperbolic discounting. It's not exactly episodic, but we'll more-or-less be able to retrospectively evaluate whether a cognitive process worked as desired long before death.

I interpret you as suggesting something like what Rohin is suggesting, with a hyperbolic function giving the weights. It seems (to me) the literature establishes that our behavior can be approximately described by the hyperbolic discounting rule (in certain circumstances anyway), but, comes nowhere near establishing that the mechanism by which we learn looks like this, and in fact has some evidence against. But that's a big topic. For a quick argument, I observe that humans are highly capable, and I generally expect actor/critic to be more capable than dumbly associating rewards with actions via the hyperbolic function. That doesn't mean humans use actor/critic; the point is that there are a lot of more-sophisticated setups to explore.

We do in fact have a model class.

It's possible that our models are entirely subservient to instrumental stuff (ie, we "learn to think" rather than "thinking to learn"), which would mean we don't have the big split which I'm pointing to -- ie, that we solve the credit assignment problem "directly" somehow, rather than needing to learn to do so.
It seems very rich; in terms of "grain of truth", well I'm inclined to think that nothing worth knowing is fundamentally beyond human comprehension, except for contingent reasons like memory and lifespan limitations (i.e. not because they are incompatible with the internal data structures). Maybe that's good enough?

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T23:05:36.083Z · score: 3 (2 votes) · LW · GW

Not... really? "how can I maximize accuracy?" is a very liberal agentification of a process that might be more drily thought of as asking "what is accurate?" Your standard sequence predictor isn't searching through epistemic pseudo-actions to find which ones best maximize its expected accuracy, it's just following a pre-made plan of epistemic action that happens to increase accuracy.

Yeah, I absolutely agree with this. My description that you quoted was over-dramatizing the issue. Really, what you have is an agent sitting on top of non-agentic infrastructure. The non-agentic infrastructure is "optimizing" in a broad sense because it follows a gradient toward predictive accuracy, but it is utterly myopic (doesn't plan ahead to cleverly maximize accuracy). The point I was making, stated more accurately, is that you (seemingly) need this myopic optimization as a 'protected' sub-part of the agent, which the overall agent cannot freely manipulate (since if it could, it would just corrupt the policy-learning process by wireheading).

Though this does lead to the thought: if you want to put things on equal footing, does this mean you want to describe a reasoner that searches through epistemic steps/rules like an agent searching through actions/plans?

This is more or less how humans already conceive of difficult abstract reasoning.
Yeah, my observation is that it intuitively seems like highly capable agents need to be able to do that; to that end, it seems like one needs to be able to describe a framework where agents at least have that option without it leading to corruption of the overall learning process via the instrumental part strategically biasing the epistemic part to make the instrumental part look good. (Possibly humans just use a messy solution where the strategic biasing occurs but the damage is lessened by limiting the extent to which the instrumental system can bias the epistemics -- eg, you can't fully choose what to believe.) Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:55:50.119Z · score: 3 (2 votes) · LW · GW How does that work? Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:52:45.218Z · score: 2 (1 votes) · LW · GW My thinking is somewhat similar to Vanessa's. I think a full explanation would require a long post in itself. It's related to my recent thinking about UDT and commitment races. But, here's one way of arguing for the approach in the abstract. You once asked: Assuming that we do want to be pre-rational, how do we move from our current non-pre-rational state to a pre-rational one? This is somewhat similar to the question of how do we move from our current non-rational (according to ordinary rationality) state to a rational one. Expected utility theory says that we should act as if we are maximizing expected utility, but it doesn't say what we should do if we find ourselves lacking a prior and a utility function (i.e., if our actual preferences cannot be represented as maximizing expected utility). The fact that we don't have good answers for these questions perhaps shouldn't be considered fatal to pre-rationality and rationality, but it's troubling that little attention has been paid to them, relative to defining pre-rationality and rationality. 
(Why are rationality researchers more interested in knowing what rationality is, and less interested in knowing how to be rational? Also, BTW, why are there so few rationality researchers? Why aren't there hordes of people interested in these issues?)

My contention is that rationality should be about the update process. It should be about how you adjust your position. We can have abstract rationality notions as a sort of guiding star, but we also need to know how to steer based on those. Some examples:

• Logical induction can be thought of as the result of performing this transform on Bayesianism; it describes belief states which are not coherent, and gives a rationality principle about how to approach coherence -- rather than just insisting that one must somehow approach coherence.
• Evolutionary game theory is more dynamic than the Nash story. It concerns itself more directly with the question of how we get to equilibrium. Strategies which work better get copied. We can think about the equilibria, as we do in the Nash picture; but, the evolutionary story also lets us think about non-equilibrium situations. We can think about attractors (equilibria being point-attractors, vs orbits and strange attractors), and attractor basins; the probability of ending up in one basin or another; and other such things.
• However, although the model seems good for studying the behavior of evolved creatures, there does seem to be something missing for artificial agents learning to play games; we don't necessarily want to think of there as being a population which is selected on in that way.
• The complete class theorem describes utility-theoretic rationality as the end point of taking Pareto improvements. But, we could instead think about rationality as the process of taking Pareto improvements. This lets us think about (semi-)rational agents whose behavior isn't described by maximizing a fixed expected utility function, but who develop one over time.
(This model in itself isn't so interesting, but we can think about generalizing it; for example, by considering the difficulty of the bargaining process -- subagents shouldn't just accept any Pareto improvement offered.)
• Again, this model has drawbacks. I'm definitely not saying that by doing this you arrive at the ultimate learning-theoretic decision theory I'd want.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:28:40.388Z · score: 4 (2 votes) · LW · GW

You could also have a version of REINFORCE that doesn't make the episodic assumption, where every time you get a reward, you take a policy gradient step for each of the actions taken so far, with a weight that decays as actions go further back in time. You can't prove anything interesting about this, but you also can't prove anything interesting about actor-critic methods that don't have episode boundaries, I think.

Yeah, you can do this. I expect actor-critic to work better, because your suggestion is essentially a fixed model which says that actions are more relevant to temporally closer rewards (and that this is the only factor to consider). I'm not sure how to further convey my sense that this is all very interesting. My model is that you're like "ok sure" but don't really see why I'm going on about this.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T21:33:27.095Z · score: 5 (3 votes) · LW · GW

Yeah, it's definitely related. The main thing I want to point out is that Shapley values similarly require a model in order to calculate. So you have to distinguish between the problem of calculating a detailed distribution of credit and being able to assign credit "at all" -- in artificial neural networks, backprop is how you assign detailed credit, but a loss function is how you get a notion of credit at all. Hence, the question "where do gradients come from?"
-- a reward function is like a pile of money made from a joint venture; but to apply backprop or Shapley value, you also need a model of counterfactual payoffs under a variety of circumstances. This is a problem, if you don't have a separate "epistemic" learning process to provide that model -- ie, it's a problem if you are trying to create one big learning algorithm that does everything. Specifically, you don't automatically know how to send rewards to each contributor proportional to how much they improved the actual group decision, because in the cases I'm interested in, ie online learning, you don't have the option of rerunning it without them and seeing how performance declines -- because you need a model in order to rerun. But, also, I think there are further distinctions to make. I believe that if you tried to apply Shapley value to neural networks, it would go poorly; and presumably there should be a "philosophical" reason why this is the case (why Shapley value is solving a different problem than backprop). I don't know exactly what the relevant distinction is. (Or maybe Shapley value works fine for NN learning; but, I'd be surprised.)

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T21:21:25.223Z · score: 12 (3 votes) · LW · GW

Yeah, this one was especially difficult in that way. I spent a long time trying to articulate the idea in a way that made any sense, and kept adding framing context to the beginning to make the stuff closer to what I wanted to say make more sense -- the idea that the post was about the credit assignment algorithm came very late in the process. I definitely agree that rant-mode feels very vulnerable to attack.

Comment by abramdemski on "embedded self-justification," or something like that · 2019-11-13T21:17:02.342Z · score: 10 (2 votes) · LW · GW

What you call floor for Alpha Go, i.e.
the move evaluations, are not even boundaries (in the sense nostalgebraist defines it), that would just be the object level (no meta at all) policy. I think in general the idea of the object level policy with no meta isn't well-defined, if the agent at least does a little meta all the time. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it's on drugs) that the observed behavior wouldn't mean very much in terms of its usual operation. Maybe this is the point you are making about humans not having a good floor/ceiling distinction. But, I think we can conceive of the "floor" more generally. If the ceiling is the fixed structure, e.g. the update for the weights, the "floor" is the lowest-level content -- e.g. the weights themselves. Whether thinking at some meta-level or not, these weights determine the fast heuristics by which a system reasons. I still think some of what nostalgebraist said about boundaries seems more like the floor than the ceiling. The space "between" the floor and the ceiling involves constructed meta levels, which are larger computations (ie not just a single application of a heuristic function), but which are not fixed. This way we can think of the floor/ceiling spectrum as small-to-large: the floor is what happens in a very small amount of time; the ceiling is the whole entire process of the algorithm (learning and interacting with the world); the "interior" is anything in-between. Of course, this makes it sort of trivial, in that you could apply the concept to anything at all. But the main interesting thing is how an agent's subjective experience seems to interact with floors and ceilings. IE, we can't access floors very well because they happen "too quickly", and besides, they're the thing that we do everything with (it's difficult to imagine what it would mean for a consciousness to have subjective "access to" its neurons/transistors).
But we can observe the consequences very immediately, and reflect on that. And the fast operations can be adjusted relatively easily (e.g. updating neural weights). Intermediate-sized computational phenomena can be reasoned about, and accessed interactively, "from the outside" by the rest of the system. But the whole computation can be "reasoned about but not updated" in a sense, and becomes difficult to observe again (not "from the outside" the way smaller sub-computations can be observed).

Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-13T20:24:08.174Z · score: 3 (2 votes) · LW · GW

I now like the "time vs ensemble" description better. I was trying to understand everything coming from a Bayesian frame, but actually, all of these ideas are more frequentist. In a Bayesian frame, it's natural to think directly in terms of a decision rule. I didn't think time-averaging was a good description because I didn't see a way for an agent to directly replace ensemble average with time average, in order to make decisions:

• Ensemble averaging is the natural response to decision-making under uncertainty; you're averaging over different possibilities. When you try to time-average to get rid of your uncertainty, you have to ask "time average what?" -- you don't know what specific situation you're in.
• In general, the question of how to turn your current situation into a repeated sequence for the purpose of time-averaging analysis seems under-determined (even if you are certain about your present situation). Surely Peters doesn't want us to use actual time in the analysis; in actual time, you end up dead and lose all your money, so the time-average analysis is trivial.
• Even if you settle on a way to turn the situation into an iterated sequence, the necessary limit does not necessarily exist.
This is also true of the possibility-average, of course (the St Petersburg Paradox being a classic example); but it seems easier to get failure in the time-average case, because you just need non-convergence; ie, you don't need any unbounded stuff to happen.

However, all of these points are also true of frequentism:

• Frequentist approaches start from the objective/external perspective rather than the agent's internal uncertainty. They don't want to define probability as the subjective viewpoint; they want probability to be defined as limiting frequencies if you repeated an experiment over and over again. The fact that you don't have direct access to these is a natural consequence of you not having direct access to objective truth.
• Even given direct access to objective truth, frequentist probabilities are still under-defined because of the reference class problem -- what infinite sequence of experiments do you conceive of your experiment as part of?
• And, again, once you select a sequence, there's no guarantee that a limit exists. Frequentism has to solve this by postulating that limits exist for the kinds of reference classes we want to talk about.

So, I now think what Ole Peters is working on is frequentist decision theory. Previously, the frequentist/Bayesian debate was about statistics and science, but decision theory was predominantly Bayesian. Ole Peters is working out the natural theory of decision making which frequentists could/should have been pursuing. (So, in that sense, it's much more than just a new argument for Kelly betting.) Describing frequentist-vs-Bayesian as time-averaging vs possibility-averaging (aka ensemble-averaging) seems perfectly appropriate.
So, on my understanding, Ole's response to the three difficulties could be:

• We first understand the optimal response to an objectively defined scenario; then, once we've done that, we can concern ourselves with the question of how to actually behave given our uncertainty about what situation we're in. This is not trying to be a universal formula for rational decision making in the same way Bayesianism attempts to be; you might have to do some hard work to figure out enough about your situation in order to apply the theory.
• And when we design general-purpose techniques, much like when we design statistical tests, our question should be whether given an objective scenario the decision-making technique does well -- the same as frequentists wanting estimates to be unbiased. Bayesians want decisions and estimates to be optimal given our uncertainty instead.
• As for how to turn your situation into an iterated game, Ole can borrow the frequentist response of not saying much about it.
• As for the existence of a limit, Ole actually says quite a bit about how to fiddle with the math until you're dealing with a quantity for which a limit exists. See his lecture notes. On page 24 (just before section 1.3) he talks briefly about finding an appropriate function of your wealth such that you can do the analysis. Then, section 2.7 says much more about this.
• The general idea is that you have to choose an analysis which is appropriate to the dynamics. Additive dynamics call for additive analysis (examining the time-average of wealth). Multiplicative dynamics call for multiplicative analysis (examining the time-average of growth, as in Kelly betting and similar settings). Other settings call for other functions. Multiplicative dynamics are common in financial theory because so much financial theory is about investment, but if we examine financial decisions for those living on income, then it has to be very different.
Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-09T23:11:44.974Z · score: 5 (5 votes) · LW · GW

I haven't read the material extensively (I've skimmed it), but here's what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it. It seems very plausible to me that you're right about the question-begging nature of Peters's version of the argument; it seems like by maximizing expected growth rate, you're maximizing log wealth. But I also think he's trying to point at something real.

In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how "expected utility over time" is an increasing line (this is the "ensemble average" -- averaging across possibilities at each time), whereas the actual payout for any player looks like a straight downward line (in log-wealth) if we zoom out over enough iterations. There's no funny business here -- yes, he's taking a log, but that's just the best way of graphing the phenomenon. It's still true that you lose almost surely if you keep playing this game longer and longer. This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which "aggregates over time rather than over ensemble"? It's natural to try to formalize something in log-wealth space since that's where we see a straight line, but as you said, that's question-begging.

Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia's current "proof" section includes a heuristic argument which runs roughly as follows:

• Imagine you're placing bets in the same way a large number of times, N.
• By the law of large numbers, the frequency of wins and losses approximately equals their probabilities.
• Optimize total wealth at time N under the assumption that the frequencies equal the probabilities.

You get the Kelly criterion.
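The 1.5x/0.6x phenomenon is easy to reproduce in simulation; a minimal sketch (the player count, horizon, and seed are arbitrary choices):

```python
import random
import statistics

def play(players=10_000, rounds=100, seed=0):
    """Each round, every player's wealth is multiplied by 1.5 on heads
    or 0.6 on tails (independent fair coin flips)."""
    rng = random.Random(seed)
    wealth = [1.0] * players
    for _ in range(rounds):
        wealth = [w * (1.5 if rng.random() < 0.5 else 0.6) for w in wealth]
    return wealth

w = play()
# The ensemble average is propped up by a few huge winners (per-round
# expected factor 1.05 > 1), while the typical player is nearly broke
# (per-round expected log-factor is 0.5*ln(0.9) < 0).
mean_w, median_w = statistics.mean(w), statistics.median(w)
```

The mean sits far above the median: exactly the gap between the increasing "ensemble average" line and the downward-sloping trajectory of an actual player.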
Now, it's easy to see this derivation and think "Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead". But this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets.

Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn't depend on the amount of money you have -- you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step ahead just values its winnings at the end of the first step a factor of C more -- but this doesn't change its behavior, since multiplying everything by C doesn't change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C², and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.

The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single "typical" outcome. This might initially seem like a mere computational convenience -- after all, the vast, vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which have the highest weight in the EMM analysis -- the worlds where things go really well and the EMM gets exponentially much money.

OK, so, is the derivation just a mistake? I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong.
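To make the contrast concrete, consider a single even-money bet with a 60% win probability (a hypothetical example of my own, not from Peters). Maximizing expected wealth pushes the bet fraction to the boundary -- bet everything -- at any horizon, while maximizing expected log wealth picks out the interior Kelly fraction:

```python
import math

p, b = 0.6, 1.0                      # win probability, even-money payout
q = 1 - p
kelly = (p * b - q) / b              # Kelly fraction: 0.2

def expected_wealth(f):
    """Expected wealth after betting fraction f of a bankroll of 1."""
    return p * (1 + f * b) + q * (1 - f)

def expected_log_wealth(f):
    return p * math.log(1 + f * b) + q * math.log(1 - f)

grid = [i / 1000 for i in range(1000)]         # fractions 0.000 .. 0.999
best_emm = max(grid, key=expected_wealth)      # boundary: bet it all
best_log = max(grid, key=expected_log_wealth)  # ≈ 0.2, the Kelly fraction
```

Since expected wealth is linear and increasing in the fraction whenever the edge is positive, the EMM bets everything no matter how many steps ahead it looks, exactly as argued above.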
I don't think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn't a move motivated by long-term analysis. I don't think we can even justify it as "time average rather than ensemble average", because we're not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don't have unique time-averaged behavior!

However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds). This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It's an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there might be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.

However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference -- we were allowed any utility function we wanted from the start, so requiring a particular transform means nothing. For example, we can "solve" the St. Petersburg paradox by claiming our utility is the log of money -- but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what's the point? We should learn from our past mistakes, and choose a framework which won't be prone to those same errors. So, can we steelman the claim that expected utility theory is wrong?
Can we find a decision procedure which is consistent with Peters's general idea, but isn't just log-wealth maximization? Well, let's look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?

As I've already mentioned, there isn't a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique "typical" set of frequencies which emerges. Can we do anything to repair the situation, though?

I propose that we maximize the median value. This gives a notion of "typical" which does not rely on an application of the law of large numbers, so it's fine if the statistics of our sequence don't converge to a single unique point. If they do, however, the median will evaluate things from that point. So, it's a workable generalization of the principle behind Kelly betting.

The median also relates to something mentioned in the OP:

>I've felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like "the 99th percentile outcome for the overall utility generated over my lifetime".

The median is the 50th percentile, so there you go.

Maximizing the median indeed violates VNM:

• It's discontinuous. Small differences in probability can change the median outcome by a lot. Maybe this isn't so bad -- who really cares about continuity, anyway? Yeah, seemingly small differences in probability create "unjustified" large differences in perceived quality of a plan, but only in circumstances where outcomes are sparse enough that the median is not very "informed".
• It violates independence, in a more obviously concerning way. A median-maximizer doesn't care about "outlier" outcomes.
It's indifferent between the following two plans, which seems utterly wrong:

• A plan with 100% probability of getting you $100.
• A plan with 60% probability of getting you $100, and 40% probability of getting you killed.

Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.
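Both the one-shot indifference and its disappearance under iteration are easy to verify. A small sketch, with an arbitrary large negative number standing in for death:

```python
def dist_median(outcomes):
    """Median of a discrete distribution given as (value, probability) pairs."""
    cum = 0.0
    for value, prob in sorted(outcomes):
        cum += prob
        if cum >= 0.5:
            return value

DEATH = -10**6                        # arbitrary stand-in value for death
plan_a = [(100, 1.0)]
plan_b = [(DEATH, 0.4), (100, 0.6)]

# One shot: both plans have median 100, so a median-maximizer is indifferent.
one_shot = (dist_median(plan_a), dist_median(plan_b))

# Iterated ten times: always-plan-B survives every round with probability
# 0.6**10 ≈ 0.006 < 0.5, so the median outcome of that strategy is death,
# while always-plan-A's median stays at 10 * 100 = 1000.
p_survive = 0.6 ** 10
iterated_b = [(DEATH, 1 - p_survive), (10 * 100, p_survive)]
iterated = (dist_median([(1000, 1.0)]), dist_median(iterated_b))
```

So in the iterated game the median-maximizer strongly prefers the global strategy of always taking the safe plan, as claimed above.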

Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there's a one-shot situation which is similar to the 40%-death example? I think I similarly don't want to maximize the 99th percentile outcome, although this is less clearly terrible.

Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don't think so. The evolutionary argument relies on the law of large numbers, to say that we'll almost surely end up in a world where log-maximizers prosper. There's no similar argument that we almost surely end up in the "median world".

So, all told:

• I don't think there's a good argument against expectation-maximization here.
• But I do think those who think there is should consider median-maximization, as it's an alternative to expectation-maximization which is consistent with much of the discussion here.
• I basically buy the argument that utility should be log of money.
• I don't think it's right to describe the whole thing as "time-average vs ensemble-average", and suspect some of the "derivations" are question-begging.
• I do think there's an evolutionary argument which can be understood from some of the derivations, however.
Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-09T20:35:21.326Z · score: 2 (1 votes) · LW · GW

It seems to me like it's right. So far as I can tell, the "time-average vs ensemble average" argument doesn't really make sense, but it's still true that log-wealth maximization is a distinguished risk-averse utility function with especially good properties.

• Idealized markets will evolve to contain only Kelly bettors, as other strategies either go bust too often or have sub-optimal growth.
• BUT, keep in mind we don't live in such an idealized market. In reality, it only makes sense to use this argument to conclude that financially savvy people/institutions will be approximate log-wealth maximizers -- IE, the people/organizations with a lot of money. Regular people might be nowhere near log-wealth-maximizing, because "going bust" often doesn't literally mean dying; you can be a failed serial startup founder, because you can crash on friends'/parents' couches between ventures, work basic jobs when necessary, etc.
• More generally, evolved organisms are likely to be approximately log-resource maximizers. I'm less clear on this argument, but the situation seems analogous. It therefore may make sense to suppose that humans are approximate log-resource maximizers.

(I'm not claiming Peters is necessarily adding anything to this analysis.)
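The evolutionary argument can be illustrated with a toy market: fixed-fraction bettors all facing the same sequence of favorable even-money coin flips. This is only a sketch under made-up parameters (win probability 0.6, so the Kelly fraction is 0.2):

```python
import random

def evolve(fracs, n_rounds=10_000, p_win=0.6, seed=1):
    """Wealth of fixed-fraction bettors after n_rounds of the same
    even-money coin (win probability p_win). All bettors see the
    same flips, so only their betting fraction differs."""
    rng = random.Random(seed)
    wealth = {f: 1.0 for f in fracs}
    for _ in range(n_rounds):
        win = rng.random() < p_win
        for f in fracs:
            wealth[f] *= (1 + f) if win else (1 - f)
    return wealth

fracs = [0.05, 0.1, 0.2, 0.4, 0.8]   # 0.2 is the Kelly fraction here
wealth = evolve(fracs)
richest = max(fracs, key=lambda f: wealth[f])   # almost surely 0.2
```

Strategies that bet too much go (nearly) bust; strategies that bet too little grow too slowly; with overwhelming probability the Kelly bettor ends up with essentially all the money.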

Comment by abramdemski on Defining Myopia · 2019-11-09T07:47:41.615Z · score: 6 (2 votes) · LW · GW

Sorry for taking so long to respond to this one.

I don't get the last step in your argument:

>In contrast, if our learning algorithm is some evolutionary computation algorithm, the models (in the population) in which θ8 happens to be larger are expected to outperform the other models, in iteration 2. Therefore, we should expect iteration 2 to increase the average value of θ8 (over the model population).

Why do those models outperform? I think you must be imagining a different setup, but I'm interpreting your setup as:

• This is a classification problem, so we're getting feedback on the correct labels for the inputs.
• It's online, so we're doing this in sequence, and learning after each.
• We keep a population of models, which we update (perhaps only a little) after every training example; population members who predicted the label correctly get a chance to reproduce, and a few population members who didn't are killed off.
• The overall prediction made by the system is the average of all the predictions (or some other aggregation).
• Large influences at one time-step will cause predictions which make the next time-step easier.
• So, if the population has an abundance of high at one time step, the population overall does better in the next time step, because it's easier for everyone to predict.
• So, the frequency of high will not be increased at all. Just like in gradient descent, there's no point at which the relevant population members are specifically rewarded.

In other words, many members of the population can swoop in and reap the benefits caused by high-θ8 members. So high-θ8 carriers do not specifically benefit.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-08T19:35:47.038Z · score: 8 (4 votes) · LW · GW

Yeah, I pretty strongly think there's a problem -- not necessarily an insoluble problem, but, one which has not been convincingly solved by any algorithm which I've seen. I think presentations of ML often obscure the problem (because it's not that big a deal in practice -- you can often define good enough episode boundaries or whatnot).

>Suppose we have a good reward function (as is typically assumed in deep RL). We can just copy the trick in that setting, right? But the rest of the post makes it sound like you still think there's a problem, in that even with that reward, you don't know how to assign credit to each individual action. This is a problem that evolution also has; evolution seemed to manage it just fine.
• Yeah, I feel like "matching rewards to actions is hard" is a pretty clear articulation of the problem.
• I agree that it should be surprising, in some sense, that getting rewards isn't enough. That's why I wrote a post on it! But why do you think it should be enough? How do we "just copy the trick"??
• I don't agree that this is analogous to the problem evolution has. If evolution just "received" the overall population each generation, and had to figure out which genomes were good/bad based on that, it would be a more analogous situation. However, that's not at all the case. Evolution "receives" a fairly rich vector of which genomes were better/worse, each generation. The analogous case for RL would be if you could output several actions each step, rather than just one, and receive feedback about each. But this is basically "access to counterfactuals"; to get this, you need a model.
>(Similarly, even if you think actor-critic methods don't count, surely REINFORCE is one-level learning? It works okay; added bells and whistles like critics are improvements to its sample efficiency.)

No, definitely not, unless I'm missing something big.

From page 329 of this draft of Sutton & Barto:

>Note that REINFORCE uses the complete return from time t, which includes all future rewards up until the end of the episode. In this sense REINFORCE is a Monte Carlo algorithm and is well defined only for the episodic case with all updates made in retrospect after the episode is completed (like the Monte Carlo algorithms in Chapter 5). This is shown explicitly in the boxed algorithm on the next page.

So, REINFORCE "solves" the assignment of rewards to actions via the blunt device of an episodic assumption; all rewards in an episode are grouped with all actions during that episode. If you expand the episode to infinity (so as to make no assumption about episode boundaries), then you just aren't learning. This means it's not applicable to the case of an intelligence wandering around and interacting dynamically with a world, where there's no particular bound on how the past may relate to present reward.

The "model" is thus extremely simple and hardwired, which makes it seem one-level. But you can't get away with this if you want to interact and learn on-line with a really complex environment.

Also, since the episodic assumption is a form of myopia, REINFORCE is compatible with the conjecture that any gradients we can actually construct are going to incentivize some form of myopia.
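The episodic grouping is visible in even a minimal sketch of REINFORCE (my own toy version, not Sutton & Barto's exact pseudocode): each action is reinforced by the return from its own time step to the end of the episode, so credit never crosses an episode boundary.

```python
import math
import random

def reinforce_update(theta, episode, alpha=0.1, gamma=1.0):
    """One REINFORCE update (no baseline) for a 2-action policy with a
    single logit parameter theta: pi(a=1) = sigmoid(theta).

    episode: list of (action, reward) pairs. All credit assignment
    happens inside the episode boundary.
    """
    # Returns G_t, computed by scanning the episode backwards.
    returns, G = [], 0.0
    for _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()

    for (a, _), G_t in zip(episode, returns):
        p1 = 1.0 / (1.0 + math.exp(-theta))
        grad_log_pi = (1 - p1) if a == 1 else -p1   # d/dtheta of log pi(a)
        theta += alpha * G_t * grad_log_pi
    return theta

# Toy use: one-step episodes where action 1 pays 1 and action 0 pays 0.
rng = random.Random(0)
theta = 0.0
for _ in range(200):
    p1 = 1.0 / (1.0 + math.exp(-theta))
    a = 1 if rng.random() < p1 else 0
    theta = reinforce_update(theta, [(a, 1.0 if a == 1 else 0.0)])
# theta drifts upward: the policy learns to prefer action 1.
```

Nothing in the update relates a reward to actions taken before the episode began, which is the myopia being pointed at.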

Comment by abramdemski on The Credit Assignment Problem · 2019-11-08T17:32:48.600Z · score: 5 (3 votes) · LW · GW

Yep, I 100% agree that this is relevant. The PP/Friston/free-energy/active-inference camp is definitely at least trying to "cross the gradient gap" with a unified theory as opposed to a two-system solution. However, I'm not sure how to think about it yet.

• I may be completely wrong, but I have a sense that there's a distinction between learning and inference which plays a similar role; IE, planning is just inference, but both planning and inference work only because the learning part serves as the second "protected layer"??
• It may be that the PP is "more or less" the Bayesian solution; IE, it requires a grain of truth to get good results, so it doesn't really help with the things I'm most interested in getting out of "crossing the gap".
• Note that PP clearly tries to implement things by pushing everything into epistemics. On the other hand, I'm mostly discussing what happens when you try to smoosh everything into the instrumental system. So many of my remarks are not directly relevant to PP.
• I get the sense that Friston might be using the "evolution solution" I mentioned; so, unifying things in a way which kind of lets us talk about evolved agents, but not artificial ones. However, this is obviously an oversimplification, because he does present designs for artificial agents based on the ideas.

Overall, my current sense is that PP obscures the issue I'm interested in more than solves it, but it's not clear.

Comment by abramdemski on The Zettelkasten Method · 2019-11-07T23:51:11.541Z · score: 3 (2 votes) · LW · GW

Not really? Although I use interconnections, I focus a fair amount on the tree-structure part. I would say there's a somewhat curious phenomenon where I am able to go "deeper" in analysis than I would previously (in notebooks or workflowy), but the "shallow" part of the analysis isn't questioned as much as it could be (it becomes the context in which things happen). In a notebook, I might end up re-stating "early" parts of my overall argument more, and therefore refining them more.

I have definitely had the experience of reaching a conclusion fairly strongly in Zettelkasten and then having trouble articulating it to other people. My understanding of the situation is that I've built up a lot of context of which questions are worth asking, how to ask them, which examples are most interesting, etc. So there's a longer inferential distance. BUT, it's also a bad sign for the conclusion. The context I've built up is more probably shaky if I can't articulate it very well.

Comment by abramdemski on The Zettelkasten Method · 2019-11-06T20:00:50.463Z · score: 8 (4 votes) · LW · GW

My worry was essentially media-makes-message style. Luhmann's sociological theories were sprawling interconnected webs. (I have not read him at all; this is just my impression.) This is not necessarily because the reality he was looking at is best understood in that form. Also, his theory of sociology has something to do with systems interacting with each other through communication bottlenecks (?? again, I have not really read him), which he explicitly relates to Zettelkasten.

Relatedly, Paul Christiano uses a workflowy-type outlining tool extensively, and his theory of AI safety prominently features hierarchical tree structures.

Comment by abramdemski on Dreaming of Political Bayescraft · 2019-11-04T17:40:26.775Z · score: 9 (5 votes) · LW · GW
>Any time you find yourself being tempted to be loyal to an idea, it turns out that what you should actually be loyal to is whatever underlying feature of human psychology makes the idea look like a good idea; that way, you'll find it easier to fucking update when it turns out that the implementation of your favorite idea isn't as fun as you expected!

I agree that there's an important skill here, but I also want to point out that this seems to tip in a particular direction which may be concerning.

Ben Hoffman writes about authenticity vs accuracy.

• An authenticity-oriented person thinks of honesty as being true to what you're feeling right now. Quick answers from the gut are more honest. Careful consideration before speaking is a sign of dishonesty. Making a promise and later breaking it isn't dishonest if you really meant the promise when you made it!
• An accuracy-oriented person thinks of honesty as making a real effort to tell the truth. Quick answers are a sign that you're not doing that; long pauses before speaking are a sign that you are. It's not just about saying what you really believe; making a factual error when you could have avoided it if you had been more careful is almost the same as purposefully lying (especially given concerns about motivated cognition).

Authenticity and accuracy are both valuable, and it would be best to reconcile them. But, my concern is that your advice against being loyal to an idea tips things away from accuracy. If you have a knee-jerk reaction to be loyal to the generators of an idea rather than the idea itself, it seems to me like you're going to make some slips toward the making-a-promise-and-breaking-it-isn't-dishonest-if-you-meant-it direction which you wouldn't reflectively endorse if you considered it more carefully.

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-11-03T07:15:19.556Z · score: 4 (2 votes) · LW · GW

I guess 'self-fulfilling prophecy' is a bit long and awkward. Sometimes 'basilisk' is thrown around, but, specifically for negative cases (self-fulfilling-and-bad). But, are you trying to name something slightly different (perhaps broader or narrower) than self-fulfilling prophecy points at?

I find I don't like 'stipulation'; that has the connotation of command, for me (like, if I tell you to do something).

Comment by abramdemski on “embedded self-justification,” or something like that · 2019-11-03T06:49:46.102Z · score: 27 (11 votes) · LW · GW

It seems to me that there are roughly two types of "boundary" to think about: ceilings and floors.

• Floors are aka the foundations. Maybe a system is running on a basically Bayesian framework, or (alternately) logical induction. Maybe there are some axioms, like ZFC. Going meta on floors involves the kind of self-reference stuff which you hear about most often: Gödel's theorem and so on. Floors are, basically, pretty hard to question and improve (though not impossible).
• Ceilings are fast heuristics. You have all kinds of sophisticated beliefs in the interior, but there's a question of which inferences you immediately make, without doing any meta to consider what direction to think in. (IE, you do generally do some meta to think about what direction to think in; but, this "tops out" at some level, at which point the analysis has to proceed without meta.) Ceilings are relatively easy to improve. For example, the AlphaGo move proposal network and evaluation network (if I recall the terms correctly). These have cheap updates which can be made frequently, via observing the results of reasoning. These incremental updates then help the more expensive tree-search reasoning to be even better.

Both floors and ceilings have a flavor of "the basic stuff that's actually happening" -- the interior is built out of a lot of boundary stuff, and small changes to boundary will create large shifts in interior. However, floors and ceilings are very different. Tweaking floor is relatively dangerous, while tweaking ceiling is relatively safe. Returning to the AlphaGo analogy, the floor is like the model of the game which allows tree search. The floor is what allows us to create a ceiling. Tweaks to the floor will tend to create large shifts in the ceiling; tweaks to the ceiling will not change the floor at all.

(Perhaps other examples won't have as clear a floor/ceiling division as AlphaGo; or, perhaps they still will.)

>What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
>[...] Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? [...]
>Can you represent your overall policy, your outermost strategy-over-strategies considered as a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?

My intuition is that the situation differs, somewhat, for floors and ceilings.

• For floors, there are fundamental logical-paradox-flavored barriers. This relates to MIRI research on tiling agents.
• For ceilings, there are computational-complexity-flavored barriers. You don't expect to have a perfect set of heuristics for fast thinking. But, you can have strategies relating to heuristics which have universal-ish properties. Like, logical induction is an "uppermost ceiling" (takes the fixed point of recursive meta) such that, in some sense, you know you're doing the best you can do in terms of tracking which heuristics are useful; you don't have to spawn further meta-analysis on your heuristic-forming heuristics. HOWEVER, it is also very very slow and impractical for building real agents. It's the agent that gets eaten in your parable. So, there's more to be said with respect to ceilings as they exist in reality.
Comment by abramdemski on Defining Myopia · 2019-11-02T17:46:58.299Z · score: 6 (3 votes) · LW · GW
>(1) I expect many actors to be throwing a lot of money on selection processes (especially unsupervised learning), and I find it plausible that such efforts would produce transformative/dangerous systems.

Sure.

>(2) Suppose there's some competitive task that is financially important (e.g. algo-trading), for which actors build systems that use a huge neural network trained via gradient descent. I find it plausible that some actors will experiment with evolutionary computation methods, trying to produce a component that will outperform and replace that neural network.

Maybe, sure.

There seems to be something I'm missing here. What you said earlier:

>Apart from this, it seems to me that some evolutionary computation algorithms tend to yield models that take all the Pareto improvements, given sufficiently long runtime. The idea is that at any point during training we should expect a model to outperform another model—that takes one less Pareto improvement—on future fitness evaluations (all other things being equal).

is an essentially mathematical remark, which doesn't have a lot to do with AI timelines and projections of which technologies will be used. I'm saying that this remark strikes me as a type error, because it confuses what I meant by "take all the Pareto improvements" -- substituting the (conceptually and technologically difficult) control concept for the (conceptually straightforward, difficult only because of processing power limitations) selection concept.

I interpret you that way because your suggestion to apply evolutionary algorithms appears to be missing data. We can apply evolutionary algorithms if we can define a loss function. But the problem I'm pointing at (of full vs partial agency) has to do with difficulties of defining a loss function.

>How would you propose to apply evolutionary algorithms to online learning?
>One can use a selection process—say, some evolutionary computation algorithm—to produce a system that performs well in an online learning task. The fitness metric would be based on the performance in many (other) online learning tasks for which training data is available (e.g. past stock prices) or for which the environment can be simulated (e.g. Atari games, robotic arm + boxes).

So, what is the argument that you'd tend to get full agency out of this? I think the situation is not very different from applying gradient descent in a similar way.

• Using data from past stock prices, say, creates an implicit model that the agent's trades can never influence the stock price. This is of course a mostly fine model for today's ML systems, but, it's also an example of what I'm talking about -- training procedures tend to create partial agency rather than full agency.
• Training the system on many online learning tasks, there will not be an incentive to optimize across tasks -- the training procedure implicitly assumes that the different tasks are independent. This is significant because you really need a whole lot of data in order to learn effective online learning tactics; it seems likely you'd end up splitting larger scenarios into a lot of tiny episodes, creating myopia.

I'm not saying I'd be happily confident that such a procedure would produce partial agents (therefore avoiding AI risk). And indeed, there are differences between doing this with gradient descent and evolutionary algorithms. One of the things I focused on in the post, time-discounting, becomes less relevant -- but only because it's more natural to split things into episodes in the case of evolutionary algorithms, which still creates myopia as a side effect.

What I'm saying is there's a real credit assignment problem here -- you're trying to pick between different policies (ie the code which the evolutionary algorithms are selecting between), based on which policy has performed better in the past. But you've taken a lot of actions in the past. And you've gotten a lot of individual pieces of feedback. You don't know how to ascribe success/failure credit -- that is, you don't know how to match individual pieces of feedback to individual decisions you made (and hence to individual pieces of code).

So you solve the problem in a basically naive way: you assume that the feedback on "instance n" was related to the code you were running at that time. This is a myopic assumption!

>How would you propose to apply evolutionary algorithms to non-episodic environments?
>I'm not sure whether this refers to non-episodic tasks (the issue being slower/sparser feedback?) or environments that can't be simulated (in which case the idea above seems to apply: one can use a selection process, using other tasks for which there's training data or for which the environment can be simulated).

The big thing with environments that can't be simulated is that you don't have a reset button, so you can't back up and try again; so, episodic and simulable are pretty related.

Sparse feedback is related to what I'm talking about, but feels like a selection-oriented way of understanding the difficulty of control; "sparse feedback" still applies to very episodic problems such as chess. The difficulty with control is that arbitrarily long historical contexts can sometimes matter, and you have to learn anyway. But I agree that it's much easier for this to present real difficulty if the rewards are sparse.

Comment by abramdemski on Partial Agency · 2019-11-02T16:49:24.470Z · score: 2 (1 votes) · LW · GW
>My point is that the relevant distinction in that case seems to be "instrumental goal" vs. "terminal goal", rather than "full agency" vs. "partial agency". In other words, I expect that a map that split things up based on instrumental vs. terminal would do a better job of understanding the territory than one that used full vs. partial agency.

Ah, I see. I definitely don't disagree that epistemics is instrumental. (Maybe we have some terminal drive for it, but, let's set that aside.) BUT:

• I don't think we can account for what's going on here just by pointing that out. Yes, the fact that it's instrumental means that we cut it off when it "goes too far", and there's not a nice encapsulation of what "goes too far" means. However, I think even when we set that aside there's still an alter-the-map-to-fit-the-territory-not-the-other-way-around phenomenon. IE, yes, it's a subgoal, but how can we understand the subgoal? Is it best understood as optimization, or something else?
• When designing machine learning algorithms, this is essentially built in as a terminal goal; the training procedure incentivises predicting the data, not manipulating it. Or, if it does indeed incentivize manipulation of the data, we would like to understand that better; and we'd like to be able to design things which don't have that incentive structure.
>To be clear, I don't think iid explains it in all cases; I also think iid is just a particularly clean example.

Ah, sorry for misinterpreting you.