## Posts

The Credit Assignment Problem 2019-11-08T02:50:30.412Z · score: 59 (15 votes)
Defining Myopia 2019-10-19T21:32:48.810Z · score: 28 (6 votes)
Random Thoughts on Predict-O-Matic 2019-10-17T23:39:33.078Z · score: 26 (10 votes)
The Parable of Predict-O-Matic 2019-10-15T00:49:20.167Z · score: 154 (57 votes)
Partial Agency 2019-09-27T22:04:46.754Z · score: 52 (14 votes)
The Zettelkasten Method 2019-09-20T13:15:10.131Z · score: 117 (44 votes)
Do Sufficiently Advanced Agents Use Logic? 2019-09-13T19:53:36.152Z · score: 41 (16 votes)
Troll Bridge 2019-08-23T18:36:39.584Z · score: 72 (41 votes)
Conceptual Problems with UDT and Policy Selection 2019-06-28T23:50:22.807Z · score: 39 (12 votes)
What's up with self-esteem? 2019-06-25T03:38:15.991Z · score: 39 (18 votes)
How hard is it for altruists to discuss going against bad equilibria? 2019-06-22T03:42:24.416Z · score: 52 (15 votes)
Paternal Formats 2019-06-09T01:26:27.911Z · score: 60 (27 votes)
Mistakes with Conservation of Expected Evidence 2019-06-08T23:07:53.719Z · score: 146 (46 votes)
Does Bayes Beat Goodhart? 2019-06-03T02:31:23.417Z · score: 45 (14 votes)
Selection vs Control 2019-06-02T07:01:39.626Z · score: 103 (27 votes)
Separation of Concerns 2019-05-23T21:47:23.802Z · score: 70 (22 votes)
Alignment Research Field Guide 2019-03-08T19:57:05.658Z · score: 199 (71 votes)
Pavlov Generalizes 2019-02-20T09:03:11.437Z · score: 68 (20 votes)
What are the components of intellectual honesty? 2019-01-15T20:00:09.144Z · score: 32 (8 votes)
CDT=EDT=UDT 2019-01-13T23:46:10.866Z · score: 42 (11 votes)
When is CDT Dutch-Bookable? 2019-01-13T18:54:12.070Z · score: 25 (4 votes)
CDT Dutch Book 2019-01-13T00:10:07.941Z · score: 27 (8 votes)
Non-Consequentialist Cooperation? 2019-01-11T09:15:36.875Z · score: 43 (15 votes)
Combat vs Nurture & Meta-Contrarianism 2019-01-10T23:17:58.703Z · score: 55 (16 votes)
What makes people intellectually active? 2018-12-29T22:29:33.943Z · score: 83 (42 votes)
Embedded Agency (full-text version) 2018-11-15T19:49:29.455Z · score: 91 (35 votes)
Embedded Curiosities 2018-11-08T14:19:32.546Z · score: 79 (30 votes)
Subsystem Alignment 2018-11-06T16:16:45.656Z · score: 115 (36 votes)
Robust Delegation 2018-11-04T16:38:38.750Z · score: 109 (36 votes)
Embedded World-Models 2018-11-02T16:07:20.946Z · score: 80 (25 votes)
Decision Theory 2018-10-31T18:41:58.230Z · score: 87 (32 votes)
Embedded Agents 2018-10-29T19:53:02.064Z · score: 151 (68 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 2) 2018-10-09T05:41:25.282Z · score: 17 (6 votes)
A Rationality Condition for CDT Is That It Equal EDT (Part 1) 2018-10-04T04:32:49.483Z · score: 21 (7 votes)
In Logical Time, All Games are Iterated Games 2018-09-20T02:01:07.205Z · score: 83 (26 votes)
Track-Back Meditation 2018-09-11T10:31:53.354Z · score: 57 (21 votes)
Exorcizing the Speed Prior? 2018-07-22T06:45:34.980Z · score: 11 (4 votes)
Stable Pointers to Value III: Recursive Quantilization 2018-07-21T08:06:32.287Z · score: 20 (9 votes)
Probability is Real, and Value is Complex 2018-07-20T05:24:49.996Z · score: 44 (20 votes)
Complete Class: Consequentialist Foundations 2018-07-11T01:57:14.054Z · score: 43 (16 votes)
Policy Approval 2018-06-30T00:24:25.269Z · score: 49 (18 votes)
Machine Learning Analogy for Meditation (illustrated) 2018-06-28T22:51:29.994Z · score: 99 (36 votes)
Confusions Concerning Pre-Rationality 2018-05-23T00:01:39.519Z · score: 36 (7 votes)
Co-Proofs 2018-05-21T21:10:57.290Z · score: 91 (25 votes)
Bayes' Law is About Multiple Hypothesis Testing 2018-05-04T05:31:23.024Z · score: 81 (20 votes)
Words, Locally Defined 2018-05-03T23:26:31.203Z · score: 50 (15 votes)
Hufflepuff Cynicism on Hypocrisy 2018-03-29T21:01:29.179Z · score: 33 (17 votes)
Learn Bayes Nets! 2018-03-27T22:00:11.632Z · score: 84 (24 votes)
An Untrollable Mathematician Illustrated 2018-03-20T00:00:00.000Z · score: 263 (94 votes)
Explanation vs Rationalization 2018-02-22T23:46:48.377Z · score: 31 (8 votes)

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T23:22:26.878Z · score: 2 (1 votes) · LW · GW
The online learning conceptual problem (as I understand your description of it) says, for example, I can never know whether it was a good idea to have read this book, because maybe it will come in handy 40 years later. Well, this seems to be "solved" in humans by exponential / hyperbolic discounting. It's not exactly episodic, but we'll more-or-less be able to retrospectively evaluate whether a cognitive process worked as desired long before death.

I interpret you as suggesting something like what Rohin is suggesting, with a hyperbolic function giving the weights.

It seems (to me) the literature establishes that our behavior can be approximately described by the hyperbolic discounting rule (in certain circumstances anyway), but, comes nowhere near establishing that the mechanism by which we learn looks like this, and in fact has some evidence against. But that's a big topic. For a quick argument, I observe that humans are highly capable, and I generally expect actor/critic to be more capable than dumbly associating rewards with actions via the hyperbolic function. That doesn't mean humans use actor/critic; the point is that there are a lot of more-sophisticated setups to explore.

We do in fact have a model class.

It's possible that our models are entirely subservient to instrumental stuff (ie, we "learn to think" rather than "thinking to learn"), which would mean we don't have the big split which I'm pointing to -- ie, that we solve the credit assignment problem "directly" somehow, rather than needing to learn to do so.

It seems very rich; in terms of "grain of truth", well I'm inclined to think that nothing worth knowing is fundamentally beyond human comprehension, except for contingent reasons like memory and lifespan limitations (i.e. not because they are incompatible with the internal data structures). Maybe that's good enough?

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T23:05:36.083Z · score: 2 (1 votes) · LW · GW
Not... really? "how can I maximize accuracy?" is a very liberal agentification of a process that might be more drily thought of as asking "what is accurate?" Your standard sequence predictor isn't searching through epistemic pseudo-actions to find which ones best maximize its expected accuracy, it's just following a pre-made plan of epistemic action that happens to increase accuracy.

Yeah, I absolutely agree with this. My description that you quoted was over-dramatizing the issue.

Really, what you have is an agent sitting on top of non-agentic infrastructure. The non-agentic infrastructure is "optimizing" in a broad sense because it follows a gradient toward predictive accuracy, but it is utterly myopic (doesn't plan ahead to cleverly maximize accuracy).

The point I was making, stated more accurately, is that you (seemingly) need this myopic optimization as a 'protected' sub-part of the agent, which the overall agent cannot freely manipulate (since if it could, it would just corrupt the policy-learning process by wireheading).

Though this does lead to the thought: if you want to put things on equal footing, does this mean you want to describe a reasoner that searches through epistemic steps/rules like an agent searching through actions/plans?
This is more or less how humans already conceive of difficult abstract reasoning.

Yeah, my observation is that it intuitively seems like highly capable agents need to be able to do that; to that end, it seems like one needs to be able to describe a framework where agents at least have that option without it leading to corruption of the overall learning process via the instrumental part strategically biasing the epistemic part to make the instrumental part look good.

(Possibly humans just use a messy solution where the strategic biasing occurs but the damage is lessened by limiting the extent to which the instrumental system can bias the epistemics -- eg, you can't fully choose what to believe.)

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:55:50.119Z · score: 2 (1 votes) · LW · GW

How does that work?

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:52:45.218Z · score: 2 (1 votes) · LW · GW

My thinking is somewhat similar to Vanessa's. I think a full explanation would require a long post in itself. It's related to my recent thinking about UDT and commitment races. But, here's one way of arguing for the approach in the abstract.

Assuming that we do want to be pre-rational, how do we move from our current non-pre-rational state to a pre-rational one? This is somewhat similar to the question of how do we move from our current non-rational (according to ordinary rationality) state to a rational one. Expected utility theory says that we should act as if we are maximizing expected utility, but it doesn't say what we should do if we find ourselves lacking a prior and a utility function (i.e., if our actual preferences cannot be represented as maximizing expected utility).
The fact that we don't have good answers for these questions perhaps shouldn't be considered fatal to pre-rationality and rationality, but it's troubling that little attention has been paid to them, relative to defining pre-rationality and rationality. (Why are rationality researchers more interested in knowing what rationality is, and less interested in knowing how to be rational? Also, BTW, why are there so few rationality researchers? Why aren't there hordes of people interested in these issues?)

My contention is that rationality should be about the update process. It should be about how you adjust your position. We can have abstract rationality notions as a sort of guiding star, but we also need to know how to steer based on those.

Some examples:

• Logical induction can be thought of as the result of performing this transform on Bayesianism; it describes belief states which are not coherent, and gives a rationality principle about how to approach coherence -- rather than just insisting that one must somehow approach coherence.
• Evolutionary game theory is more dynamic than the Nash story. It concerns itself more directly with the question of how we get to equilibrium. Strategies which work better get copied. We can think about the equilibria, as we do in the Nash picture; but, the evolutionary story also lets us think about non-equilibrium situations. We can think about attractors (equilibria being point-attractors, vs orbits and strange attractors), and attractor basins; the probability of ending up in one basin or another; and other such things.
• However, although the model seems good for studying the behavior of evolved creatures, there does seem to be something missing for artificial agents learning to play games; we don't necessarily want to think of there as being a population which is selected on in that way.
• The complete class theorem describes utility-theoretic rationality as the end point of taking Pareto improvements. But, we could instead think about rationality as the process of taking Pareto improvements. This lets us think about (semi-)rational agents whose behavior isn't described by maximizing a fixed expected utility function, but who develop one over time. (This model in itself isn't so interesting, but we can think about generalizing it; for example, by considering the difficulty of the bargaining process -- subagents shouldn't just accept any Pareto improvement offered.)
• Again, this model has drawbacks. I'm definitely not saying that by doing this you arrive at the ultimate learning-theoretic decision theory I'd want.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T22:28:40.388Z · score: 2 (1 votes) · LW · GW
You could also have a version of REINFORCE that doesn't make the episodic assumption, where every time you get a reward, you take a policy gradient step for each of the actions taken so far, with a weight that decays as actions go further back in time. You can't prove anything interesting about this, but you also can't prove anything interesting about actor-critic methods that don't have episode boundaries, I think.

Yeah, you can do this. I expect actor-critic to work better, because your suggestion is essentially a fixed model which says that actions are more relevant to temporally closer rewards (and that this is the only factor to consider).
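
To make the non-episodic variant concrete, here is a minimal sketch, assuming a stateless toy problem with a softmax policy. The names `run_decayed_reinforce`, `decay`, and `lr`, and the toy reward, are illustrative assumptions rather than anything specified in the discussion. The running trace is just a compact way of applying "a policy gradient step for each of the actions taken so far, with a weight that decays as actions go further back in time".

```python
import math
import random

def softmax(prefs):
    """Convert action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run_decayed_reinforce(reward_fn, n_actions=2, steps=5000,
                          lr=0.05, decay=0.8, seed=0):
    """Non-episodic REINFORCE-style learner for a stateless toy problem.

    Each reward credits all past actions, weighted by decay**age. The trace
    accumulates the decayed log-probability gradients, so the full history
    never needs to be stored (gradients of old actions are slightly stale).
    """
    rng = random.Random(seed)
    prefs = [0.0] * n_actions   # policy parameters (softmax preferences)
    trace = [0.0] * n_actions   # decayed sum of grad log pi(a_t)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(n_actions), weights=probs)[0]
        # Gradient of log softmax w.r.t. prefs: one_hot(action) - probs.
        grad = [(1.0 if i == action else 0.0) - probs[i]
                for i in range(n_actions)]
        trace = [decay * t + g for t, g in zip(trace, grad)]
        reward = reward_fn(action, rng)
        prefs = [p + lr * reward * t for p, t in zip(prefs, trace)]
    return softmax(prefs)

if __name__ == "__main__":
    # Hypothetical toy reward: action 1 pays 1, action 0 pays 0.
    print(run_decayed_reinforce(lambda a, rng: float(a == 1)))
```

The fixed `decay` is exactly the "fixed model" of credit mentioned above: temporal closeness is the only thing it considers, whereas a learned critic could in principle pick up other structure.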

I'm not sure how to further convey my sense that this is all very interesting. My model is that you're like "ok sure" but don't really see why I'm going on about this.

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T21:33:27.095Z · score: 4 (2 votes) · LW · GW

Yeah, it's definitely related. The main thing I want to point out is that Shapley values similarly require a model in order to calculate. So you have to distinguish between the problem of calculating a detailed distribution of credit and being able to assign credit "at all" -- in artificial neural networks, backprop is how you assign detailed credit, but a loss function is how you get a notion of credit at all. Hence, the question "where do gradients come from?" -- a reward function is like a pile of money made from a joint venture; but to apply backprop or Shapley value, you also need a model of counterfactual payoffs under a variety of circumstances. This is a problem, if you don't have a separate "epistemic" learning process to provide that model -- ie, it's a problem if you are trying to create one big learning algorithm that does everything.

Specifically, you don't automatically know how to

send rewards to each contributor proportional to how much they improved the actual group decision

because in the cases I'm interested in, ie online learning, you don't have the option of

rerunning it without them and seeing how performance declines

-- because you need a model in order to rerun.
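
As an illustration of why a model is needed, here is a minimal sketch of an exact Shapley computation. The function `toy_value` and the players `a`, `b`, `c` are hypothetical. The point to notice is that `value` gets queried on coalitions that never actually happened, which is the counterfactual model that online learning doesn't hand you for free.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` must return a payoff for *every* coalition, including ones that
    never actually occurred. That is the counterfactual model.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical payoff model: a and b only pay off together; c adds 1 alone."""
    payoff = 0.0
    if {"a", "b"} <= coalition:
        payoff += 10.0
    if "c" in coalition:
        payoff += 1.0
    return payoff

if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_value))
```

With only the realized payoff of the full group (11 here) and no `toy_value`, there would be nothing to average over.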

But, also, I think there are further distinctions to make. I believe that if you tried to apply Shapley value to neural networks, it would go poorly; and presumably there should be a "philosophical" reason why this is the case (why Shapley value is solving a different problem than backprop). I don't know exactly what the relevant distinction is.

(Or maybe Shapley value works fine for NN learning; but, I'd be surprised.)

Comment by abramdemski on The Credit Assignment Problem · 2019-11-13T21:21:25.223Z · score: 11 (2 votes) · LW · GW

Yeah, this one was especially difficult in that way. I spent a long time trying to articulate the idea in a way that made any sense, and kept adding framing context to the beginning to make the stuff closer to what I wanted to say make more sense -- the idea that the post was about the credit assignment algorithm came very late in the process. I definitely agree that rant-mode feels very vulnerable to attack.

Comment by abramdemski on “embedded self-justification,” or something like that · 2019-11-13T21:17:02.342Z · score: 2 (1 votes) · LW · GW
What you call floor for Alpha Go, i.e. the move evaluations, are not even boundaries (in the sense nostalgebraist define it), that would just be the object level (no meta at all) policy.

I think in general the idea of the object level policy with no meta isn't well-defined, if the agent at least does a little meta all the time. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it's on drugs) that the observed behavior wouldn't mean very much in terms of its usual operation. Maybe this is the point you are making about humans not having a good floor/ceiling distinction.

But, I think we can conceive of the "floor" more generally. If the ceiling is the fixed structure, e.g. the update for the weights, the "floor" is the lowest-level content -- e.g. the weights themselves. Whether thinking at some meta-level or not, these weights determine the fast heuristics by which a system reasons.

I still think some of what nostalgebraist said about boundaries seems more like the floor than the ceiling.

The space "between" the floor and the ceiling involves constructed meta levels, which are larger computations (ie not just a single application of a heuristic function), but which are not fixed. This way we can think of the floor/ceiling spectrum as small-to-large: the floor is what happens in a very small amount of time; the ceiling is the whole entire process of the algorithm (learning and interacting with the world); the "interior" is anything in-between.

Of course, this makes it sort of trivial, in that you could apply the concept to anything at all. But the main interesting thing is how an agent's subjective experience seems to interact with floors and ceilings. IE, we can't access floors very well because they happen "too quickly", and besides, they're the thing that we do everything with (it's difficult to imagine what it would mean for a consciousness to have subjective "access to" its neurons/transistors). But we can observe the consequences very immediately, and reflect on that. And the fast operations can be adjusted relatively easily (e.g. updating neural weights). Intermediate-sized computational phenomena can be reasoned about, and accessed interactively, "from the outside" by the rest of the system. But the whole computation can be "reasoned about but not updated" in a sense, and becomes difficult to observe again (not "from the outside" the way smaller sub-computations can be observed).

Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-13T20:24:08.174Z · score: 2 (1 votes) · LW · GW

I now like the "time vs ensemble" description better. I was trying to understand everything coming from a Bayesian frame, but actually, all of these ideas are more frequentist.

In a Bayesian frame, it's natural to think directly in terms of a decision rule. I didn't think time-averaging was a good description because I didn't see a way for an agent to directly replace ensemble average with time average, in order to make decisions:

• Ensemble averaging is the natural response to decision-making under uncertainty; you're averaging over different possibilities. When you try to time-average to get rid of your uncertainty, you have to ask "time average what?" -- you don't know what specific situation you're in.
• In general, the question of how to turn your current situation into a repeated sequence for the purpose of time-averaging analysis seems under-determined (even if you are certain about your present situation). Surely Peters doesn't want us to use actual time in the analysis; in actual time, you end up dead and lose all your money, so the time-average analysis is trivial.
• Even if you settle on a way to turn the situation into an iterated sequence, the necessary limit does not necessarily exist. This is also true of the possibility-average, of course (the St Petersburg Paradox being a classic example); but it seems easier to get failure in the time-average case, because you just need non-convergence; ie, you don't need any unbounded stuff to happen.

However, all of these points are also true of frequentism:

• Frequentist approaches start from the objective/external perspective rather than the agent's internal uncertainty. They don't want to define probability as the subjective viewpoint; they want probability to be defined as limiting frequencies if you repeated an experiment over and over again. The fact that you don't have direct access to these is a natural consequence of you not having direct access to objective truth.
• Even given direct access to objective truth, frequentist probabilities are still under-defined because of the reference class problem -- what infinite sequence of experiments do you conceive of your experiment as part of?
• And, again, once you select a sequence, there's no guarantee that a limit exists. Frequentism has to solve this by postulating that limits exist for the kinds of reference classes we want to talk about.

So, I now think what Ole Peters is working on is frequentist decision theory. Previously, the frequentist/Bayesian debate was about statistics and science, but decision theory was predominantly Bayesian. Ole Peters is working out the natural theory of decision making which frequentists could/should have been pursuing. (So, in that sense, it's much more than just a new argument for Kelly betting.)

Describing frequentist-vs-Bayesian as time-averaging vs possibility-averaging (aka ensemble-averaging) seems perfectly appropriate.

So, on my understanding, Ole's response to the three difficulties could be:

• We first understand the optimal response to an objectively defined scenario; then, once we've done that, we can concern ourselves with the question of how to actually behave given our uncertainty about what situation we're in. This is not trying to be a universal formula for rational decision making in the same way Bayesianism attempts to be; you might have to do some hard work to figure out enough about your situation in order to apply the theory.
• And when we design general-purpose techniques, much like when we design statistical tests, our question should be whether given an objective scenario the decision-making technique does well -- the same as frequentists wanting estimates to be unbiased. Bayesians want decisions and estimates to be optimal given our uncertainty instead.
• As for how to turn your situation into an iterated game, Ole can borrow the frequentist response of not saying much about it.
• As for the existence of a limit, Ole actually says quite a bit about how to fiddle with the math until you're dealing with a quantity for which a limit exists. See his lecture notes. On page 24 (just before section 1.3) he talks briefly about finding an appropriate function of your wealth such that you can do the analysis. Then, section 2.7 says much more about this.
• The general idea is that you have to choose an analysis which is appropriate to the dynamics. Additive dynamics call for additive analysis (examining the time-average of wealth). Multiplicative dynamics call for multiplicative analysis (examining the time-average of growth, as in Kelly betting and similar settings). Other settings call for other functions. Multiplicative dynamics are common in financial theory because so much financial theory is about investment, but if we examine financial decisions for those living on income, then it has to be very different.

Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-09T23:11:44.974Z · score: 5 (5 votes) · LW · GW

I haven't read the material extensively (I've skimmed it), but here's what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it.

It seems very plausible to me that you're right about the question-begging nature of Peters' version of the argument; it seems like by maximizing expected growth rate, you're maximizing log wealth.

But I also think he's trying to point at something real.

In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how "expected utility over time" is an increasing line (this is the "ensemble average" -- averaging across possibilities at each time), whereas the actual payout for any player looks like a straight downward line (in log-wealth) if we zoom out over enough iterations. There's no funny business here -- yes, he's taking a log, but that's just the best way of graphing the phenomenon. It's still true that you lose almost surely if you keep playing this game longer and longer.
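
The two lines in that picture can be reproduced with a few lines of arithmetic. This is a sketch assuming a fair coin between the 1.5x and 0.6x outcomes (the usual form of the example); the function names are mine.

```python
import math
import random

UP, DOWN = 1.5, 0.6   # multipliers from the example; a fair coin is assumed

def ensemble_growth_per_round():
    """Expected multiplicative factor per round (average across possibilities)."""
    return 0.5 * UP + 0.5 * DOWN

def time_average_log_growth():
    """Expected change in log-wealth per round (what one long run experiences)."""
    return 0.5 * math.log(UP) + 0.5 * math.log(DOWN)

def simulate_log_wealth(rounds, seed=0):
    """Log-wealth of a single player after `rounds` bets, starting from 1."""
    rng = random.Random(seed)
    log_w = 0.0
    for _ in range(rounds):
        log_w += math.log(UP if rng.random() < 0.5 else DOWN)
    return log_w

if __name__ == "__main__":
    print(ensemble_growth_per_round())   # 1.05: the expectation grows 5% per round
    print(time_average_log_growth())     # about -0.053: the typical player shrinks
    print(simulate_log_wealth(10_000))
```

The expectation grows because a vanishing fraction of players get astronomically rich; any single long trajectory drifts down at roughly the negative log-growth rate.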

This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which "aggregates over time rather than over ensemble"? It's natural to try to formalize something in log-wealth space since that's where we see a straight line, but as you said, that's question-begging.

Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia's current "proof" section includes a heuristic argument which runs roughly as follows:

• Imagine you're placing bets in the same way a large number of times, N.
• By the law of large numbers, the frequency of wins and losses approximately equals their probabilities.
• Optimize total wealth at time N under the assumption that the frequencies equal the probabilities. You get the Kelly criterion.
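
The heuristic argument above can be checked numerically. This is a sketch assuming a simple repeated bet with win probability `p` and net odds `b` (win `b` per unit staked, lose the stake otherwise); those parameter names and the grid search are my own illustration, not part of the Wikipedia argument.

```python
import math

def typical_log_growth(f, p, b):
    """Per-bet log growth when win/loss frequencies equal their probabilities.

    f: fraction of wealth staked; p: win probability; b: net odds.
    After N bets, "typical" wealth is (1 + b*f)**(p*N) * (1 - f)**((1-p)*N);
    this is its log divided by N.
    """
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

def kelly_fraction(p, b):
    """Closed-form maximizer of typical_log_growth: f* = p - (1 - p) / b."""
    return p - (1 - p) / b

def grid_argmax(p, b, steps=10_000):
    """Brute-force check of the closed form over f in [0, 1)."""
    candidates = [i / steps for i in range(steps)]
    return max(candidates, key=lambda f: typical_log_growth(f, p, b))

if __name__ == "__main__":
    print(kelly_fraction(0.6, 1.0), grid_argmax(0.6, 1.0))
```

Note that the step which does the work is fixing the frequencies at their probabilities before optimizing; that is the replacement of the big sum with a single "typical" outcome.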

Now, it's easy to see this derivation and think "Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead". But, this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets. Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn't depend on the amount of money you have -- you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step ahead just values its winnings at the end of the first step C times more -- but this doesn't change its behavior, since multiplying everything by C doesn't change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C^2, and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.

The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single "typical" outcome. This might initially seem like a mere computational convenience -- after all, the vast vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which have the highest weight in the EMM analysis -- the worlds where things go really well and the EMM gets exponentially much money.

OK, so, is the derivation just a mistake?

I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong. I don't think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn't a move motivated by long-term analysis. I don't think we can even justify it as "time average rather than ensemble average" because we're not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don't have unique time-averaged behavior!

However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds).

This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It's an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there might be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.

However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference -- we were allowed any utility function we wanted from the start, so requiring an arbitrary transform means nothing. For example, we can "solve" the St. Petersburg paradox by claiming our utility is the log of money -- but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what's the point? We should learn from our past mistakes, and choose a framework which won't be prone to those same errors.
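
The St. Petersburg point is easy to see in partial sums. This is a sketch assuming the standard form of the game (payoff 2^k with probability 2^-k); the function name and the `exponentiate_payoffs` flag are hypothetical.

```python
import math

def partial_expected_log_utility(n_terms, exponentiate_payoffs=False):
    """Sum of P(k) * log(payoff_k) over the first n_terms outcomes.

    Standard game: payoff 2**k with probability 2**-k.
    Modified game: payoff exp(2**k), so log(payoff) is 2**k again.
    """
    total = 0.0
    for k in range(1, n_terms + 1):
        log_payoff = 2.0 ** k if exponentiate_payoffs else k * math.log(2)
        total += 2.0 ** -k * log_payoff
    return total

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n,
              partial_expected_log_utility(n),
              partial_expected_log_utility(n, exponentiate_payoffs=True))
```

With log utility the standard game's sum settles near 2 ln 2, while the exponentiated game adds exactly 1 per term and so diverges just like the original paradox did.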

So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters' general idea, but isn't just log-wealth maximization?

Well, let's look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?

As I've already mentioned, there isn't a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique "typical" set of frequencies which emerges. Can we do anything to repair the situation, though?

I propose that we maximize median expected value. This gives a notion of "typical" which does not rely on an application of the law of large numbers, so it's fine if the statistics of our sequence don't converge to a single unique point. If they do, however, the median will evaluate things from that point. So, it's a workable generalization of the principle behind Kelly betting.

The median also relates to something mentioned in the OP:

I've felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like "the 99th percentile outcome for the overall utility generated over my lifetime".

The median is the 50th percentile, so there you go.

Maximizing the median indeed violates VNM:

• It's discontinuous. Small differences in probability can change the median outcome by a lot. Maybe this isn't so bad -- who really cares about continuity, anyway? Yeah, seemingly small differences in probability create "unjustified" large differences in perceived quality of a plan, but only in circumstances where outcomes are sparse enough that the median is not very "informed".
• It violates independence, in a more obviously concerning way. A median-maximizer doesn't care about "outlier" outcomes. It's indifferent between the following two plans, which seems utterly wrong:
• A plan with 100% probability of getting you $100
• A plan with 60% probability of getting you $100, and 40% probability of getting you killed.

Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.
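
Here is a small sketch of the two plans under a median rule, one-shot and iterated. It assumes that death in any round ends everything and stands in for "killed" with negative infinity; both are my simplifications, as are the function names.

```python
def median_outcome(dist):
    """Lower median of a discrete distribution given as [(value, prob), ...]."""
    cumulative = 0.0
    for value, prob in sorted(dist):
        cumulative += prob
        if cumulative >= 0.5:
            return value
    raise ValueError("probabilities must sum to at least 0.5")

DEATH = float("-inf")   # hypothetical stand-in utility for "killed"

def safe_plan(rounds):
    """Always take the sure $100."""
    return [(100.0 * rounds, 1.0)]

def risky_plan(rounds):
    """Always take the 60%/40% plan; dying in any round ends everything."""
    survive = 0.6 ** rounds
    return [(100.0 * rounds, survive), (DEATH, 1.0 - survive)]

if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, median_outcome(safe_plan(n)), median_outcome(risky_plan(n)))
```

In a single round both medians are $100, so the median-maximizer is indifferent; from two rounds on, survival probability drops below one half and the risky strategy's median outcome is death.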

Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there's a one-shot situation which is similar to the 40%-death example? I think I similarly don't want to maximize the 99th percentile outcome, although this is less clearly terrible.

Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don't think so. The evolutionary argument relies on the law of large numbers, to say that we'll almost surely end up in a world where log-maximizers prosper. There's no similar argument that we almost surely end up in the "median world".

So, all told:

• I don't think there's a good argument against expectation-maximization here.
• But I do think those who think there is should consider median-maximization, as it's an alternative to expectation-maximization which is consistent with much of the discussion here.
• I basically buy the argument that utility should be log of money.
• I don't think it's right to describe the whole thing as "time-average vs ensemble-average", and suspect some of the "derivations" are question-begging.
• I do think there's an evolutionary argument which can be understood from some of the derivations, however.

Comment by abramdemski on Meetup Notes: Ole Peters on ergodicity · 2019-11-09T20:35:21.326Z · score: 2 (1 votes) · LW · GW

It seems to me like it's right. So far as I can tell, the "time-average vs ensemble average" argument doesn't really make sense, but it's still true that log-wealth maximization is a distinguished risk-averse utility function with especially good properties.

• Idealized markets will evolve to contain only Kelly bettors, as other strategies either go bust too often or have sub-optimal growth.
• BUT, keep in mind we don't live in such an idealized market. In reality, it only makes sense to use this argument to conclude that financially savvy people/institutions will be approximate log-wealth maximizers -- IE, the people/organizations with a lot of money. Regular people might be nowhere near log-wealth-maximizing, because "going bust" often doesn't literally mean dying; you can be a failed serial startup founder, because you can crash on friends'/parents' couches between ventures, work basic jobs when necessary, etc.
• More generally, evolved organisms are likely to be approximately log-resource maximizers. I'm less clear on this argument, but the situation seems analogous. It therefore may make sense to suppose that humans are approximate log-resource maximizers.

(I'm not claiming Peters is necessarily adding anything to this analysis.)
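
As a rough illustration of why Kelly bettors are distinguished, here is a minimal sketch for a repeated even-odds bet. The win probability of 0.6, the even odds, and the function names are all assumptions made for the example. It compares the bet size that maximizes expected log-growth with the bet size that maximizes expected wealth in a single round.

```python
import math

def log_growth(fraction, p_win):
    """Expected log-growth per round when betting `fraction` of wealth at even odds."""
    if fraction >= 1.0:
        return float("-inf")  # a single loss wipes out all wealth
    return p_win * math.log(1 + fraction) + (1 - p_win) * math.log(1 - fraction)

def expected_wealth_multiplier(fraction, p_win):
    """Expected one-round wealth multiplier for the same bet."""
    return p_win * (1 + fraction) + (1 - p_win) * (1 - fraction)

def best_fraction(p_win, steps=1000):
    """Grid search for the bet fraction with the highest expected log-growth."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda f: log_growth(f, p_win))

p = 0.6
print(best_fraction(p))                     # close to the Kelly fraction 2p - 1
print(expected_wealth_multiplier(1.0, p))   # betting everything maximizes one-round expectation
print(log_growth(1.0, p))                   # but its long-run growth rate is -inf (it goes bust)
```

The expectation-maximizer in money bets everything each round and almost surely ends up with nothing, while the log-wealth maximizer bets the Kelly fraction and almost surely outgrows any other fixed fraction in the long run.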

Comment by abramdemski on Defining Myopia · 2019-11-09T07:47:41.615Z · score: 6 (2 votes) · LW · GW

Sorry for taking so long to respond to this one.

I don't get the last step in your argument:

In contrast, if our learning algorithm is some evolutionary computation algorithm, the models (in the population) in which θ8 happens to be larger are expected to outperform the other models, in iteration 2. Therefore, we should expect iteration 2 to increase the average value of θ8 (over the model population).

Why do those models outperform? I think you must be imagining a different setup, but I'm interpreting your setup as:

• This is a classification problem, so, we're getting feedback on correct labels X for some Y.
• It's online, so we're doing this in sequence, and learning after each.
• We keep a population of models, which we update (perhaps only a little) after every training example; population members who predicted the label correctly get a chance to reproduce, and a few population members who didn't are killed off.
• The overall prediction made by the system is the average of all the predictions (or some other aggregation).
• Large θ8 at one time-step will cause predictions which make the next time-step easier.
• So, if the population has an abundance of high θ8 at one time step, the population overall does better in the next time step, because it's easier for everyone to predict.
• So, the frequency of high θ8 will not be increased at all. Just like in gradient descent, there's no point at which the relevant population members are specifically rewarded.

In other words, many members of the population can swoop in and reap the benefits caused by high-θ8 members. So high-θ8 carriers do not specifically benefit.
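
A toy replicator-dynamics sketch of this point follows. It is an assumption-laden simplification, not the setup from the quoted comment: fitness is a single number per type, and the "easier next step" effect is modeled as a bonus that every population member receives in proportion to the current share of high-θ8 carriers.

```python
def replicator_step(freq_high, fitness_high, fitness_low):
    """Standard replicator update for the share of 'high' carriers."""
    mean_fitness = freq_high * fitness_high + (1 - freq_high) * fitness_low
    return freq_high * fitness_high / mean_fitness

def run(freq_high, steps, shared_bonus, private_bonus=0.0):
    """Evolve the share of high carriers.

    shared_bonus: benefit that everyone gets when high carriers are common.
    private_bonus: extra benefit that only high carriers get (hypothetical contrast case).
    """
    for _ in range(steps):
        common = 1.0 + shared_bonus * freq_high  # same for all members
        freq_high = replicator_step(freq_high, common + private_bonus, common)
    return freq_high

print(run(0.3, 50, shared_bonus=2.0))                     # stays at about 0.3
print(run(0.3, 50, shared_bonus=2.0, private_bonus=0.1))  # rises above 0.3
```

When the benefit is shared, relative fitness is identical across types and selection has nothing to act on. The trait only spreads once its carriers are specifically rewarded.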

Comment by abramdemski on The Credit Assignment Problem · 2019-11-08T19:35:47.038Z · score: 8 (4 votes) · LW · GW

Yeah, I pretty strongly think there's a problem -- not necessarily an insoluble problem, but, one which has not been convincingly solved by any algorithm which I've seen. I think presentations of ML often obscure the problem (because it's not that big a deal in practice -- you can often define good enough episode boundaries or whatnot).

Suppose we have a good reward function (as is typically assumed in deep RL). We can just copy the trick in that setting, right? But the rest of the post makes it sound like you still think there's a problem, in that even with that reward, you don't know how to assign credit to each individual action. This is a problem that evolution also has; evolution seemed to manage it just fine.
• Yeah, I feel like "matching rewards to actions is hard" is a pretty clear articulation of the problem.
• I agree that it should be surprising, in some sense, that getting rewards isn't enough. That's why I wrote a post on it! But why do you think it should be enough? How do we "just copy the trick"??
• I don't agree that this is analogous to the problem evolution has. If evolution just "received" the overall population each generation, and had to figure out which genomes were good/bad based on that, it would be a more analogous situation. However, that's not at all the case. Evolution "receives" a fairly rich vector of which genomes were better/worse, each generation. The analogous case for RL would be if you could output several actions each step, rather than just one, and receive feedback about each. But this is basically "access to counterfactuals"; to get this, you need a model.
(Similarly, even if you think actor-critic methods don't count, surely REINFORCE is one-level learning? It works okay; added bells and whistles like critics are improvements to its sample efficiency.)

No, definitely not, unless I'm missing something big.

From page 329 of this draft of Sutton & Barto:

Note that REINFORCE uses the complete return from time t, which includes all future rewards up until the end of the episode. In this sense REINFORCE is a Monte Carlo algorithm and is well defined only for the episodic case with all updates made in retrospect after the episode is completed (like the Monte Carlo algorithms in Chapter 5). This is shown explicitly in the boxed on the next page.

So, REINFORCE "solves" the assignment of rewards to actions via the blunt device of an episodic assumption; all rewards in an episode are grouped with all actions during that episode. If you expand the episode to infinity (so as to make no assumption about episode boundaries), then you just aren't learning. This means it's not applicable to the case of an intelligence wandering around and interacting dynamically with a world, where there's no particular bound on how the past may relate to present reward.

The "model" is thus extremely simple and hardwired, which makes it seem one-level. But you can't get away with this if you want to interact and learn on-line with a really complex environment.

Also, since the episodic assumption is a form of myopia, REINFORCE is compatible with the conjecture that any gradients we can actually construct are going to incentivize some form of myopia.
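
To make the episodic assumption concrete, here is a minimal sketch of a REINFORCE-style update. It is a simplification under stated assumptions, not the boxed algorithm from Sutton & Barto: the policy is a stateless softmax over actions, each reward is the one received after the paired action, the gradient is taken at the episode-start parameters, and the per-step discount factor on the update is omitted.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def returns_to_go(rewards, gamma=1.0):
    """G_t = r_t + gamma * r_{t+1} + ...; computable only once the episode has ended."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]

def reinforce_update(theta, episode, lr=0.1, gamma=1.0):
    """One retrospective update from a finished episode of (action, reward) pairs."""
    probs = softmax(theta)
    returns = returns_to_go([reward for _, reward in episode], gamma)
    new_theta = list(theta)
    for (action, _), g in zip(episode, returns):
        for k in range(len(theta)):
            # Gradient of log softmax probability of the chosen action w.r.t. theta[k].
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            new_theta[k] += lr * g * grad_log_pi
    return new_theta

print(returns_to_go([1.0, 0.0, 2.0]))                          # [3.0, 2.0, 2.0]
print(reinforce_update([0.0, 0.0], [(1, 1.0), (0, 0.0)]))      # roughly [-0.05, 0.05]
```

Note that `reinforce_update` takes a complete episode as input. Every action in the episode is credited with every later reward in the same episode and with nothing outside it, and if the episode never ends the function is never called.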

Comment by abramdemski on The Credit Assignment Problem · 2019-11-08T17:32:48.600Z · score: 5 (3 votes) · LW · GW

Yep, I 100% agree that this is relevant. The PP/Friston/free-energy/active-inference camp is definitely at least trying to "cross the gradient gap" with a unified theory as opposed to a two-system solution. However, I'm not sure how to think about it yet.

• I may be completely wrong, but I have a sense that there's a distinction between learning and inference which plays a similar role; IE, planning is just inference, but both planning and inference work only because the learning part serves as the second "protected layer"??
• It may be that the PP is "more or less" the Bayesian solution; IE, it requires a grain of truth to get good results, so it doesn't really help with the things I'm most interested in getting out of "crossing the gap".
• Note that PP clearly tries to implement things by pushing everything into epistemics. On the other hand, I'm mostly discussing what happens when you try to smoosh everything into the instrumental system. So many of my remarks are not directly relevant to PP.
• I get the sense that Friston might be using the "evolution solution" I mentioned; so, unifying things in a way which kind of lets us talk about evolved agents, but not artificial ones. However, this is obviously an oversimplification, because he does present designs for artificial agents based on the ideas.

Overall, my current sense is that PP obscures the issue I'm interested in more than solves it, but it's not clear.

Comment by abramdemski on The Zettelkasten Method · 2019-11-07T23:51:11.541Z · score: 3 (2 votes) · LW · GW

Not really? Although I use interconnections, I focus a fair amount on the tree-structure part. I would say there's a somewhat curious phenomenon where I am able to go "deeper" in analysis than I would previously (in notebooks or workflowy), but the "shallow" part of the analysis isn't questioned as much as it could be (it becomes the context in which things happen). In a notebook, I might end up re-stating "early" parts of my overall argument more, and therefore refining them more.

I have definitely had the experience of reaching a conclusion fairly strongly in Zettelkasten and then having trouble articulating it to other people. My understanding of the situation is that I've built up a lot of context of which questions are worth asking, how to ask them, which examples are most interesting, etc. So there's a longer inferential distance. BUT, it's also a bad sign for the conclusion. The context I've built up is more likely to be shaky if I can't articulate it very well.

Comment by abramdemski on The Zettelkasten Method · 2019-11-06T20:00:50.463Z · score: 8 (4 votes) · LW · GW

My worry was essentially media-makes-message style. Luhmann's sociological theories were sprawling interconnected webs. (I have not read him at all; this is just my impression.) This is not necessarily because the reality he was looking at is best understood in that form. Also, his theory of sociology has something to do with systems interacting with each other through communication bottlenecks (?? again, I have not really read him), which he explicitly relates to Zettelkasten.

Relatedly, Paul Christiano uses a workflowy-type outlining tool extensively, and his theory of AI safety prominently features hierarchical tree structures.

Comment by abramdemski on Dreaming of Political Bayescraft · 2019-11-04T17:40:26.775Z · score: 9 (5 votes) · LW · GW
Any time you find yourself being tempted to be loyal to an idea, it turns out that what you should actually be loyal to is whatever underlying feature of human psychology makes the idea look like a good idea; that way, you'll find it easier to fucking update when it turns out that the implementation of your favorite idea isn't as fun as you expected!

I agree that there's an important skill here, but I also want to point out that this seems to tip in a particular direction which may be concerning.

Ben Hoffman writes about authenticity vs accuracy.

• An authenticity-oriented person thinks of honesty as being true to what you're feeling right now. Quick answers from the gut are more honest. Careful consideration before speaking is a sign of dishonesty. Making a promise and later breaking it isn't dishonest if you really meant the promise when you made it!
• An accuracy-oriented person thinks of honesty as making a real effort to tell the truth. Quick answers are a sign that you're not doing that; long pauses before speaking are a sign that you are. It's not just about saying what you really believe; making a factual error when you could have avoided it if you had been more careful is almost the same as purposefully lying (especially given concerns about motivated cognition).

Authenticity and accuracy are both valuable, and it would be best to reconcile them. But, my concern is that your advice against being loyal to an idea tips things away from accuracy. If you have a knee-jerk reaction to be loyal to the generators of an idea rather than the idea itself, it seems to me like you're going to make some slips toward the making-a-promise-and-breaking-it-isn't-dishonest-if-you-meant-it direction which you wouldn't reflectively endorse if you considered it more carefully.

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-11-03T07:15:19.556Z · score: 2 (1 votes) · LW · GW

I guess 'self-fulfilling prophecy' is a bit long and awkward. Sometimes 'basilisk' is thrown around, but specifically for negative cases (self-fulfilling-and-bad). But, are you trying to name something slightly different (perhaps broader or narrower) than what 'self-fulfilling prophecy' points at?

I find I don't like 'stipulation'; that has the connotation of command, for me (like, if I tell you to do something).

Comment by abramdemski on “embedded self-justification,” or something like that · 2019-11-03T06:49:46.102Z · score: 24 (9 votes) · LW · GW

It seems to me that there are roughly two types of "boundary" to think about: ceilings and floors.

• Floors are aka the foundations. Maybe a system is running on a basically Bayesian framework, or (alternately) logical induction. Maybe there are some axioms, like ZFC. Going meta on floors involves the kind of self-reference stuff which you hear about most often: Gödel's theorem and so on. Floors are, basically, pretty hard to question and improve (though not impossible).
• Ceilings are fast heuristics. You have all kinds of sophisticated beliefs in the interior, but there's a question of which inferences you immediately make, without doing any meta to consider what direction to think in. (IE, you do generally do some meta to think about what direction to think in; but, this "tops out" at some level, at which point the analysis has to proceed without meta.) Ceilings are relatively easy to improve. For example, the AlphaGo move proposal network and evaluation network (if I recall the terms correctly). These have cheap updates which can be made frequently, via observing the results of reasoning. These incremental updates then help the more expensive tree-search reasoning to be even better.

Both floors and ceilings have a flavor of "the basic stuff that's actually happening" -- the interior is built out of a lot of boundary stuff, and small changes to boundary will create large shifts in interior. However, floors and ceilings are very different. Tweaking floor is relatively dangerous, while tweaking ceiling is relatively safe. Returning to the AlphaGo analogy, the floor is like the model of the game which allows tree search. The floor is what allows us to create a ceiling. Tweaks to the floor will tend to create large shifts in the ceiling; tweaks to the ceiling will not change the floor at all.

(Perhaps other examples won't have as clear a floor/ceiling division as AlphaGo; or, perhaps they still will.)

What remains unanswered, though, is whether there is any useful way of talking about doing this (the whole thing, including the self-improvement R&D) well, doing it rationally, as opposed to doing it in a way that simply “seems to work” after the fact.
[...] Is there anything better than simply bumbling around in concept-space, in a manner that perhaps has many internal structures of self-justification but is not known to work as a whole? [...]
Can you represent your overall policy, your outermost strategy-over-strategies considered a response to your entire situation, in a way that is not a cartoon, a way real enough to defend itself?

My intuition is that the situation differs, somewhat, for floors and ceilings.

• For floors, there are fundamental logical-paradox-flavored barriers. This relates to MIRI research on tiling agents.
• For ceilings, there are computational-complexity-flavored barriers. You don't expect to have a perfect set of heuristics for fast thinking. But, you can have strategies relating to heuristics which have universal-ish properties. Like, logical induction is an "uppermost ceiling" (takes the fixed point of recursive meta) such that, in some sense, you know you're doing the best you can do in terms of tracking which heuristics are useful; you don't have to spawn further meta-analysis on your heuristic-forming heuristics. HOWEVER, it is also very very slow and impractical for building real agents. It's the agent that gets eaten in your parable. So, there's more to be said with respect to ceilings as they exist in reality.

Comment by abramdemski on Defining Myopia · 2019-11-02T17:46:58.299Z · score: 6 (3 votes) · LW · GW
(1) I expect many actors to be throwing a lot of money on selection processes (especially unsupervised learning), and I find it plausible that such efforts would produce transformative/dangerous systems.

Sure.

(2) Suppose there's some competitive task that is financially important (e.g. algo-trading), for which actors build systems that use a huge neural network trained via gradient descent. I find it plausible that some actors will experiment with evolutionary computation methods, trying to produce a component that will outperform and replace that neural network.

Maybe, sure.

There seems to be something I'm missing here. What you said earlier:

Apart from this, it seems to me that some evolutionary computation algorithms tend to yield models that take all the Pareto improvements, given sufficiently long runtime. The idea is that at any point during training we should expect a model to outperform another model—that takes one less Pareto improvement—on future fitness evaluations (all other things being equal).

is an essentially mathematical remark, which doesn't have a lot to do with AI timelines and projections of which technologies will be used. I'm saying that this remark strikes me as a type error, because it confuses what I meant by "take all the Pareto improvements" -- substituting the (conceptually and technologically difficult) control concept for the (conceptually straightforward, difficult only because of processing power limitations) selection concept.

I interpret you that way because your suggestion to apply evolutionary algorithms appears to be missing data. We can apply evolutionary algorithms if we can define a loss function. But the problem I'm pointing at (of full vs partial agency) has to do with difficulties of defining a loss function.

>How would you propose to apply evolutionary algorithms to online learning?
One can use a selection process—say, some evolutionary computation algorithm—to produce a system that performs well in an online learning task. The fitness metric would be based on the performance in many (other) online learning tasks for which training data is available (e.g. past stock prices) or for which the environment can be simulated (e.g. Atari games, robotic arm + boxes).

So, what is the argument that you'd tend to get full agency out of this? I think the situation is not very different from applying gradient descent in a similar way.

• Using data from past stock prices, say, creates an implicit model that the agent's trades can never influence the stock price. This is of course a mostly fine model for today's ML systems, but, it's also an example of what I'm talking about -- training procedures tend to create partial agency rather than full agency.
• Training the system on many online learning tasks, there will not be an incentive to optimize across tasks -- the training procedure implicitly assumes that the different tasks are independent. This is significant because you really need a whole lot of data in order to learn effective online learning tactics; it seems likely you'd end up splitting larger scenarios into a lot of tiny episodes, creating myopia.

I'm not saying I'd be happily confident that such a procedure would produce partial agents (therefore avoiding AI risk). And indeed, there are differences between doing this with gradient descent and evolutionary algorithms. One of the things I focused on in the post, time-discounting, becomes less relevant -- but only because it's more natural to split things into episodes in the case of evolutionary algorithms, which still creates myopia as a side effect.

What I'm saying is there's a real credit assignment problem here -- you're trying to pick between different policies (ie the code which the evolutionary algorithms are selecting between), based on which policy has performed better in the past. But you've taken a lot of actions in the past. And you've gotten a lot of individual pieces of feedback. You don't know how to ascribe success/failure credit -- that is, you don't know how to match individual pieces of feedback to individual decisions you made (and hence to individual pieces of code).

So you solve the problem in a basically naive way: you assume that the feedback on "instance n" was related to the code you were running at that time. This is a myopic assumption!
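
Here is a toy sketch of how that naive matching can misfire. The environment, the two policies, and the alternating schedule are all hypothetical: the reward at each step is caused by the action taken one step earlier, but credit goes to whichever policy is running when the reward arrives.

```python
def run(schedule, policies):
    """Reward at step t is 1.0 if the action at step t-1 was 'invest' (delayed payoff)."""
    rewards, previous_action = [], None
    for name in schedule:
        rewards.append(1.0 if previous_action == "invest" else 0.0)
        previous_action = policies[name]()
    return rewards

def myopic_credit(schedule, rewards):
    """Assume the feedback on instance n belongs to the code running at instance n."""
    credit = {name: 0.0 for name in schedule}
    for name, reward in zip(schedule, rewards):
        credit[name] += reward
    return credit

policies = {"A": lambda: "invest", "B": lambda: "idle"}
schedule = ["A", "B", "A", "B"]

rewards = run(schedule, policies)
print(rewards)                           # [0.0, 1.0, 0.0, 1.0]
print(myopic_credit(schedule, rewards))  # {'A': 0.0, 'B': 2.0}
```

Policy A causes all of the reward, but policy B collects all of the credit, so selection on this score would favor B.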

>How would you propose to apply evolutionary algorithms to non-episodic environments?
I'm not sure whether this refers to non-episodic tasks (the issue being slower/sparser feedback?) or environments that can't be simulated (in which case the idea above seems to apply: one can use a selection process, using other tasks for which there's training data or for which the environment can be simulated).

The big thing with environments that can't be simulated is that you don't have a reset button, so you can't back up and try again; so, episodic and simulable are pretty related.

Sparse feedback is related to what I'm talking about, but feels like a selection-oriented way of understanding the difficulty of control; "sparse feedback" still applies to very episodic problems such as chess. The difficulty with control is that arbitrarily long historical contexts can sometimes matter, and you have to learn anyway. But I agree that it's much easier for this to present real difficulty if the rewards are sparse.

Comment by abramdemski on Partial Agency · 2019-11-02T16:49:24.470Z · score: 2 (1 votes) · LW · GW
My point is that the relevant distinction in that case seems to be "instrumental goal" vs. "terminal goal", rather than "full agency" vs. "partial agency". In other words, I expect that a map that split things up based on instrumental vs. terminal would do a better job of understanding the territory than one that used full vs. partial agency.

Ah, I see. I definitely don't disagree that epistemics is instrumental. (Maybe we have some terminal drive for it, but, let's set that aside.) BUT:

• I don't think we can account for what's going on here just by pointing that out. Yes, the fact that it's instrumental means that we cut it off when it "goes too far", and there's not a nice encapsulation of what "goes too far" means. However, I think even when we set that aside there's still an alter-the-map-to-fit-the-territory-not-the-other-way-around phenomenon. IE, yes, it's a subgoal, but how can we understand the subgoal? Is it best understood as optimization, or something else?
• When designing machine learning algorithms, this is essentially built in as a terminal goal; the training procedure incentivises predicting the data, not manipulating it. Or, if it does indeed incentivize manipulation of the data, we would like to understand that better; and we'd like to be able to design things which don't have that incentive structure.
To be clear, I don't think iid explains it in all cases, I also think iid is just a particularly clean example.

Ah, sorry for misinterpreting you.

Comment by abramdemski on Defining Myopia · 2019-10-22T00:15:33.247Z · score: 3 (2 votes) · LW · GW

I like the way you are almost able to turn this into a 'positive' account (the way generalized objectives are a positive account of myopic goals, but speaking in terms of failure to make certain Pareto improvements is not). However, I worry that any goal over states can be converted to a goal over outputs which amounts to the same thing, by calculating the expected value of the action according to the old goal. Presumably you mean some sufficiently simple action-goal so as to exclude this.
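
The conversion worry can be written down in a few lines. This is only a sketch: `state_utility` and `outcome_model` are hypothetical stand-ins for the old goal and for the agent's beliefs about what each action leads to.

```python
def action_goal(state_utility, outcome_model):
    """Turn a utility over states into a utility over actions via expected value."""
    def utility_of_action(action):
        return sum(prob * state_utility(state)
                   for state, prob in outcome_model(action).items())
    return utility_of_action

# Hypothetical example: two actions with different chances of reaching the "win" state.
state_utility = {"win": 1.0, "lose": 0.0}.get
outcome_model = {
    "safe": {"win": 0.5, "lose": 0.5},
    "bold": {"win": 0.75, "lose": 0.25},
}.get

score = action_goal(state_utility, outcome_model)
print(score("safe"), score("bold"))  # 0.5 0.75
```

An agent that maximizes `score` over actions behaves exactly like one that maximizes the original state goal, which is why some simplicity restriction on action-goals seems to be needed.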

Comment by abramdemski on Defining Myopia · 2019-10-21T20:11:07.425Z · score: 4 (3 votes) · LW · GW

Any global optimization technique can find the global optimum of a fixed evaluation function given time. This is a different problem. As I mentioned before, the assumption of simulable environments which you invoke to apply evolutionary algorithms to RL problems assumes too much; it fundamentally changes the problem from a control problem to a selection problem. This is exactly the kind of mistake which prompted me to come up with the selection/control distinction.

How would you propose to apply evolutionary algorithms to online learning? How would you propose to apply evolutionary algorithms to non-episodic environments? I'm not saying it can't be done, but in doing so, your remark will no longer apply. For online non-episodic problems, you don't get to think directly in terms of climbing a fitness landscape.

Comment by abramdemski on The Dualist Predict-O-Matic ($100 prize) · 2019-10-19T07:50:44.312Z · score: 5 (3 votes) · LW · GW

To highlight the "blurry distinction" more:

In situations like that, you get into an optimized fixed point over time, even though the learning algorithm itself isn't explicitly searching for that.

Note, if the prediction algorithm anticipates this process (perhaps partially), it will "jump ahead", so that convergence to a fixed point happens more within the computation of the predictor (less over steps of real world interaction). This isn't formally the same as searching for fixed points internally (you will get much weaker guarantees out of this haphazard process), but it does mean optimization for fixed point finding is happening within the system under some conditions.

Comment by abramdemski on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T08:58:03.858Z · score: 2 (1 votes) · LW · GW

Do you mean to say that a prophecy might happen to be self-fulfilling even if it wasn't optimized for being so? Or are you trying to distinguish between "explicit" and "implicit" searches for fixed points?

More the second than the first, but I'm also saying that the line between the two is blurry.

For example, suppose there is someone who will often do what predict-o-matic predicts if they can understand how to do it. They often ask it what they are going to do. At first, predict-o-matic predicts them as usual. This modifies their behavior to be somewhat more predictable than it normally would be. Predict-o-matic locks into the patterns (especially the predictions which work the best as suggestions). Behavior gets even more regular. And so on.

You could say that no one is optimizing for fixed-point-ness here, and predict-o-matic is just chancing into it. But effectively, there's an optimization implemented by the pair of the predict-o-matic and the person.

In situations like that, you get into an optimized fixed point over time, even though the learning algorithm itself isn't explicitly searching for that.
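
A minimal sketch of that loop, under made-up assumptions: the person's rate of doing the predicted thing is a steep sigmoid of the prediction (the `suggestibility` parameter is hypothetical), and the predictor just nudges its prediction toward the rate it observes. Neither side searches for fixed points.

```python
import math

def behaviour_rate(prediction, suggestibility=8.0):
    """How often the person does the thing, given how strongly it was predicted."""
    return 1.0 / (1.0 + math.exp(-suggestibility * (prediction - 0.5)))

def learn(prediction, steps=200, lr=0.5):
    """Plain error-driven learning: move the prediction toward observed behaviour."""
    for _ in range(steps):
        observed = behaviour_rate(prediction)
        prediction += lr * (observed - prediction)
    return prediction

high = learn(0.55)
low = learn(0.45)
print(high, behaviour_rate(high))  # prediction and behaviour agree near the top
print(low, behaviour_rate(low))    # or near the bottom, depending on the starting nudge
```

A slight initial lean in either direction gets amplified until prediction and behaviour lock together. The fixed point is selected by the pair of learner and person rather than by any explicit search.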

Comment by abramdemski on The Dualist Predict-O-Matic ($100 prize) · 2019-10-18T00:49:01.377Z · score: 12 (3 votes) · LW · GW

I'm not really sure what you mean when you say "something goes wrong" (in relation to the prize). I've been thinking about all this in a very descriptive way, ie, I want to understand what happens generally, not force a particular outcome. So I'm a little out-of-touch with the "goes wrong" framing at the moment. There are a lot of different things which could happen. Which constitute "going wrong"?

• Becoming non-myopic; ie, using strategies which get lower prediction loss long-term rather than on a per-question basis.
• (Note this doesn't necessarily mean planning to do so, in an inner-optimizer way.)
• Making self-fulfilling prophecies in order to strategically minimize prediction loss on individual questions (while possibly remaining myopic).
• Having a tendency for self-fulfilling prophecies at all (not necessarily strategically minimizing loss).
• Having a tendency for self-fulfilling prophecies, but not necessarily the ones which society has currently converged to (eg, disrupting existing equilibria about money being valuable because everyone expects things to stay that way).
• Strategically minimizing prediction loss in any way other than by giving better answers in an intuitive sense.
• Manipulating the world strategically in any way, toward any end.
• Catastrophic risk by any means (not necessarily due to strategic manipulation).

In particular, inner misalignment seems like something you aren't including in your "going wrong"? (Since it seems like an easy answer to your challenge.)

I note that the recursive-decomposition type system you describe is very different from most modern ML, and different from the "basically gradient descent" sort of thing I was imagining in the story. (We might naturally suppose that Predict-O-Matic has some "secret sauce" though.)

If you aren't already convinced, here's another explanation for why I don't think the Predict-O-Matic will make self-fulfilling prophecies by default.
In Abram's story, the engineer says: "The answer to a question isn't really separate from the expected observation. So 'probability of observation depending on that prediction' would translate to 'probability of an event given that event', which just has to be one."
In other words, if the Predict-O-Matic knows it will predict P = A, it assigns probability 1 to the proposition that it will predict P = A.

Right, basically by definition. The word 'given' was intended in the Bayesian sense, ie, conditional probability.

I contend that Predict-O-Matic doesn't know it will predict P = A at the relevant time. It would require time travel -- to know whether it will predict P = A, it will have to have made a prediction already, but it's still formulating its prediction as it thinks about what it will predict.

It's quite possible that the Predict-O-Matic has become relatively predictable-by-itself, so that it generally has good (not perfect) guesses about what it is about to predict. I don't mean that it is in an equilibrium with itself; its predictions may be shifting in predictable directions. If these shifts become large enough, or if its predictability goes second-order (it predicts that it'll predict its own output, and thus pre-anticipates the direction of shift recursively) it has to stop knowing its own output in so much detail (it's changing too fast to learn about). But it can possibly know a lot about its output.

I definitely agree with most of the stuff in the 'answering a question by having the answer' section. Whether a system explicitly makes the prediction into a fixed point is a critical question, which will determine which way some of these issues go.

• If the system does, then there are explicit 'handles' to optimize the world by selecting which self-fulfilling prophecies to make true. We are effectively forced to deal with the issue (if only by random selection).
• If the system doesn't, then we lack such handles, but the system still has to do something in the face of such situations. It may converge to self-fulfilling stuff. It may not, and so, produce 'inconsistent' outputs forever. This will depend on features of the learning algorithm as well as features of the situation it finds itself in.
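
The second case can be illustrated with a deliberately crude sketch. The learner here simply predicts whatever happened last time, and the two "worlds" are hypothetical: one does what was predicted, the other does the opposite.

```python
def run(world, first_prediction, steps=6):
    """Naive learner with no fixed-point handling; returns (prediction, outcome) pairs."""
    history, prediction = [], first_prediction
    for _ in range(steps):
        outcome = world(prediction)
        history.append((prediction, outcome))
        prediction = outcome  # next time, predict what just happened
    return history

def conformist(prediction):
    return prediction       # the world does what was predicted

def contrarian(prediction):
    return not prediction   # the world does the opposite

print(run(conformist, True))   # every prediction comes true
print(run(contrarian, True))   # predictions flip forever and are never right
```

In the conformist world the system settles into a self-fulfilling prophecy without ever looking for one. In the contrarian world no consistent output exists, so it keeps producing "inconsistent" answers.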

It seems a bit like you might be equating the second option with "does not produce self-fulfilling prophecies", which I think would be a mistake.

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-10-17T20:44:30.783Z · score: 4 (3 votes) · LW · GW

I'm actually trying to be somewhat agnostic about the right conclusion here. I could have easily added another chapter discussing why the maximizing-surprise idea is not quite right. The moral is that the questions are quite complicated, and thinking vaguely about 'optimization processes' is quite far from adequate to understand this. Furthermore, it'll depend quite a bit on the actual details of a training procedure!

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-10-17T20:26:34.080Z · score: 3 (2 votes) · LW · GW

Ah, yes, OK. I see I didn't include a line which I had considered including, [1.5] Assume the players are bidding rationally. (Editing OP to include.) The character is an economist, so it makes sense that this would be a background assumption.

So then, the highest bidder is the person who expects to make the most, which is the person actually capable of making the most.

Of course, you also have to worry about conflict of interest (where someone can extract value from the company by means other than dividends). But if we're using this as a model of a training process, the decision market is effectively the entire economy.

Comment by abramdemski on Maybe Lying Doesn't Exist · 2019-10-16T23:57:06.461Z · score: 10 (6 votes) · LW · GW
"Your honor, I know I told the customer that the chemical I sold to them would cure their disease, and it didn't, and I had enough information to know that, but you see, I wasn't conscious that it wouldn't cure their disease, as I was selling it to them, so it isn't really fraud" would not fly in any court that is even seriously pretending to be executing justice.

Yet, oddly, something called 'criminal intent' is indeed required in addition to the crime itself.

It seems that 'criminal intent' is not interpreted as conscious intent. Rather, the actions of the accused must be incompatible with those of a reasonable person trying to avoid the crime.

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-10-16T23:10:02.094Z · score: 3 (2 votes) · LW · GW
The details on this one didn't fit together.

I don't know what objection this part is making.

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-10-16T21:11:46.387Z · score: 4 (3 votes) · LW · GW

Betting person X will not die of cancer might create an incentive for a cancer assassin, but that's different than killing someone using any means.

Yeah, everything depends critically on which things get bet on in the market. If there's a general life expectancy market, it's an all-method assassination market. But it might happen that no one is interested in betting about that, and all the bets have to do with specific causes of death. Then things would be much more difficult for an assassin.

However, the more specific the claim being bet on, the lower the probability should be; so, the higher the reward for making it come true.
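
To put rough numbers on that last point, here is a minimal sketch. It assumes a simple binary contract that costs its market probability and pays 1 if the claim comes true; the function name and the example probabilities are made up for illustration and are not part of the story or the discussion.

```python
def profit_per_dollar(market_probability: float) -> float:
    """Profit per dollar staked for someone who buys 'yes' and then makes the claim true.

    Assumes a binary contract priced at its market probability that pays 1 on 'yes'.
    """
    if not 0.0 < market_probability <= 1.0:
        raise ValueError("market_probability must be in (0, 1]")
    payout_per_share = 1.0
    cost_per_share = market_probability
    return (payout_per_share - cost_per_share) / cost_per_share


# A broad claim the market already finds plausible (hypothetical number).
broad_claim = profit_per_dollar(0.5)
# A very specific claim the market finds unlikely (hypothetical number).
specific_claim = profit_per_dollar(0.01)
```

Under these assumptions the specific claim pays roughly a hundred times more per dollar than the broad one, which is the sense in which specificity raises the reward for forcing the outcome.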

Comment by abramdemski on The Parable of Predict-O-Matic · 2019-10-16T21:04:22.450Z · score: 10 (4 votes) · LW · GW

No, it was a speculative conjecture which I thought of while writing.

The idea is that incentivizing agents to lower the error of your predictions (as in a prediction market) looks exactly like incentivizing them to "create" information (find ways of making the world more chaotic), and this is no coincidence. So perhaps there's a more general principle behind it, where trying to incentivize minimization of f(x,y) only through channel x (eg, only by improving predictions) results in an incentive to maximize f through y, under some additional assumptions. Maybe there is a connection to optimization duality in there.

In terms of the fictional canon, I think of it as the engineer trying to convince the boss by simplifying things and making wild but impressive sounding conjectures. :)

Comment by abramdemski on Self-supervised learning & manipulative predictions · 2019-10-14T20:36:51.506Z · score: 3 (2 votes) · LW · GW

The reply to interstice makes me think about logical uncertainty: if the predictor "reasons" about what to expect (internally engages in a sequence of computations which accounts for more structure as it thinks longer), then it is especially difficult to be approximately Bayesian (for all the classic reasons that logical uncertainty messes things up). So the argument that the described behaviour isn't logical doesn't really apply, because you have to deal with things like you mention where you spot an inconsistency in your probability distribution but you aren't sure how to deal with it.

This "reasoning" argument is related to the intuition you mention about search -- you imagine the system searching for sensible futures when deciding what to predict next. It doesn't make sense for a system to do that if the system is only learning conditional probabilities of the next token given history; there is no information to gain by looking ahead. However, there are a number of reasons why it could look ahead if it's doing something more complicated. It could be actively searching for good explanations of its history, and looking ahead to plausible futures might somehow aid that process. Or maybe it learns the more general blank-filling task rather than only the forward-prediction version where you fill in the future given the past; then it could benefit from consulting its own models that go in the other direction as a consistency check.

Still, I'm not convinced that strategic behavior gets incentivised. As you say in the post, we have to think through specific learning algorithms and what behaviour they encourage.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-14T07:46:00.038Z · score: 5 (3 votes) · LW · GW

I appreciate your thoughts! My own thinking on this is rapidly shifting and I regret that I'm not producing more posts about it right now. I will try to comment further on your linked post. Feel encouraged to PM me if you write/wrote more in this and think I might have missed it; I'm pretty interested in this right now.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-14T07:37:55.568Z · score: 4 (2 votes) · LW · GW

you correctly point out that in some architectures such parts would just get ignored, but in my view what happens in humans is more like a board of bayesian subagents voting

How does credit assignment work to determine these subagents' voting power (if at all)? I'm negative about viewing it as 'prediction with warped parts' ("fixed priors"), but setting that aside, one way or the other there's the concrete question of what's actually going on at the learning algorithm level. How do you set something up which is not incredibly myopic? (For example, if subagents are assigned credit based on who's active when actual reward is received, that's going to be incredibly myopic -- subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn't reward you for things that happen later; much like political terms of office making long-term policy difficult.)
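
A toy sketch of the myopia worry in that parenthetical, under made-up assumptions: two hypothetical subagents take turns, "greedy" earns 1 on the step it acts, and "patient" earns nothing immediately but causes a reward of 3 to arrive one step later. The `myopic` ledger credits whoever is active when reward arrives; the `causal` ledger credits whoever's action produced it. None of this is a claim about how brains or any real architecture assign credit.

```python
from collections import defaultdict

# Hypothetical payoffs: reward paid on the acting step, and reward paid one step later.
IMMEDIATE = {"greedy": 1.0, "patient": 0.0}
DELAYED = {"greedy": 0.0, "patient": 3.0}


def run(schedule):
    """Return (myopic_credit, causal_credit) for a sequence of active subagents."""
    myopic = defaultdict(float)  # credit goes to whoever is active when reward arrives
    causal = defaultdict(float)  # credit goes to whoever's action produced the reward
    pending = None               # (source subagent, delayed reward) from the previous step
    for active in schedule:
        if pending is not None:
            source, amount = pending
            myopic[active] += amount
            causal[source] += amount
        now = IMMEDIATE[active]
        myopic[active] += now
        causal[active] += now
        pending = (active, DELAYED[active])
    return dict(myopic), dict(causal)


schedule = ["patient", "greedy"] * 5
myopic, causal = run(schedule)
```

With this alternating schedule the myopic ledger hands all the credit to the greedy subagent, even though the patient one generated three quarters of the total reward, so a vote weighted by myopic credit would sideline the long-term planner.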

Overall I'm not sure to what extent you expect clean designs from evolution.

I wasn't talking about parsimony because I expect the brain to be simple, but rather because a hypothesis which has a lot of extra complexity is less likely to be right. I expect human values to be complex, but still think a desire for parsimony such as sometimes motivates PP to be good in itself -- a parsimonious theory which matched observations well would be convincing in a way a complicated one would not be, even though I expect things to be complicated, because the complicated theory has many chances to be wrong.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-14T07:14:34.239Z · score: 4 (2 votes) · LW · GW

I'm still interested if you can say more about how you view it as minimizing a warped prediction. I mentioned that if you fix some parts of the network, they seem to end up getting ignored rather than producing goal-directed behaviour. Do you have an alternate picture in which this doesn't happen? (I'm not asking you to justify yourself rigorously; I'm curious for whatever thoughts or vague images you have here, though of course all the better if it really works)

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T08:27:24.009Z · score: 4 (2 votes) · LW · GW

I left a reply to this view at the other comment. However, I don't feel that point connects very well to the point I tried to make.

Your OP talks about minimization of prediction error as a theory of human value, relevant to alignment. It might be that evolution re-purposes predictive machinery to pursue adaptive goals; this seems like the sort of thing evolution would do. However, this leaves the question of what those goals are. You say you're not claiming that humans globally minimize prediction error. But, partly because of the remarks you made in the OP, I'm reading you as suggesting that humans do minimize prediction error, but relative to a skewed prediction.

Are human values well-predicted by modeling us as minimizing prediction error relative to a skewed prediction?

My argument here is that evolved creatures such as humans are more likely to (as one component of value) steer toward prediction error, because doing so tends to lead to learning, which is broadly valuable. This is difficult to model by taking a system which minimizes prediction error and skewing the predictions, because it is the exact opposite.

Elsewhere, you suggest that exploration can be predicted by your theory if there's a sort of reflection within the system, so that prediction error is predicted as well. The system therefore has an overall set-point for prediction error and explores if it's too small. But I think this would be drowned out. If I started with a system which minimizes prediction error and added a curiosity drive on top of it, I would have to entirely cancel out the error-minimization drive before I started to see the curiosity doing its job successfully. Similarly for your hypothesized part. Everything else in the system is strategically avoiding error. One part steering toward error would have to out-vote or out-smart all those other parts.

Now, that's over-stating my point. I don't think human curiosity drive is exactly seeking maximum prediction error. I think it's more likely related to the derivative of prediction error. But the point remains that that's difficult to model as minimization of a skewed prediction error, and requires a sub-part implementing curiosity to drown out all the other parts.

Instead of modeling human value as minimization of error of a skewed prediction, why not step back and model it as minimizing "some kind of error"? This seems no less parsimonious (since you have to specify the skew anyway), and leaves you with all the same controller machinery to propagate error through the system and learn to avoid it.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T07:48:24.844Z · score: 6 (3 votes) · LW · GW

• How does it work? I made some remarks in this other comment, and more extensive remarks below.
• How is minimizing error from a fixed/slow-moving set-point different from pursuing arbitrary goals? What's left of the minimization-of-prediction-error hypothesis?

When I think of minimizing prediction error, I think of minimizing error of something which is well-modeled as a predictor. A set-point for sex, say, doesn't seem like this -- many organisms get far less than their satiation level of sex, but the set-point evolved based on genetic fitness, not predictive accuracy. The same is true for other scarce resources in the ancestral environment, such as sugar.

Is your model that evolution gets agents to pursue useful goals by warping predictive circuitry to make false-but-useful predictions? Or is it that evolution would fix the ancient predictive circuitry if it were better at modifying old-but-critical subsystems in big jumps, but can't? I find the second unlikely. The first seems possible, but strains my credulity about modeling the warped stuff as "prediction".

As for the how-does-it-work point: if we start with a predictive hierarchy but then warp some pieces to fix their set-points, how do we end up with something which strategically minimizes the prediction error of those parts? When I think of freezing some of the predictions, it seems like what you get is a world-model which is locked into some beliefs, not something which strategizes to make those predictions true.

As I mentioned in the other comment, I have seen other work which gets agents out of this sort of thing; but it seems likely they had different definitions of key ideas such as minimizing prediction error, so your response would be illuminating.

• Well-working Bayesian systems minimize prediction error in the sense that they tweak their own weights (that is, probabilities) so as to reduce future error, in response to stimuli. They don't have a tendency to produce outputs now which are expected to reduce later prediction error. This is also true of small parts in a Bayesian network; each is individually responsible for minimizing its own prediction error of downstream info, using upstream info as helpful "freebie" information which it can benefit from in its downstream predictions. So, if you freeze a small part, its downstream neighbors will simply stop using it, because its frozen output is not useful. Upstream neighbors get the easy job of predicting the frozen values. So a mostly-Bayesian system with some frozen parts doesn't seem to start trying to minimize the prediction error of the frozen bit in other ways, because each part is responsible for minimizing its own error.
• Similarly for artificial neural networks: freezing a sub-network makes its feedforward signal useless to downstream neurons, and its backprop information little more interesting than that. Other systems of predictive hierarchies seem likely to get similar results.

The problem here is that these systems are only trying to minimize prediction error on the current step. A predictive system may have long-term models, but error is only back-propagated in a way which encourages each individual prediction to be more accurate for the time-step it was made, not in a way which encourages outputs to strategically make future inputs easier to predict.

So, the way I see it, in order for a system to strategically act so as to minimize future prediction error of a frozen sub-part, you'd need a part of the system to act as a reinforcement learner whose reward signal was the prediction error of the other part. This is not how parts of a predictive hierarchy tend to behave. Parts of a predictive hierarchy learn to reduce their own predictive error -- and even there, they learn to produce outputs which are more similar to their observations, not to manipulate things so as to better match predictions.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T05:36:25.146Z · score: 2 (1 votes) · LW · GW
I think this mostly dissolves the other points you bring up that I read as contingent on thinking the theory doesn't predict humans would find variety and surprise good in some circumstances, but if not please let me know what the remaining concerns are in light of this explanation (or possibly object to my explanation of why we expect surprise to sometimes be net good).

Yeah, I noted that I and other humans often seem to enjoy surprise, but I also had a different point I was trying to make -- the claim that it makes sense that you'd observe competent agents doing many things which can be explained by minimizing prediction error, no matter what their goals.

But, it isn't important for you to respond further to this point if you don't feel it accounts for your observations.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T04:46:10.078Z · score: 2 (1 votes) · LW · GW

On my understanding of how things work, goals and beliefs combine to make action, so neither one is really mentally closer to action than the other. Both a goal and a belief can be quite far removed from action (eg, a nearly impossible goal which you don't act on, or a belief about far-away things which don't influence your day-to-day). Both can be very close (a jump scare seems most closely connected to a belief, whereas deciding to move your hand and then doing so is more goal-like -- granted both those examples have complications).

If, in conversation, the distinction comes up explicitly, it is usually because of stuff like this:

• Alice makes an unclear statement; it sounds like she could be claiming A or wanting A.
• Bob asks for clarification, because Bob's reaction to believing A is true would be very different from his reaction to believing A is good (or, in more relative terms, knowing Alice endorses one or the other of those). In the first case, Bob might plan under the assumption A; in the second, Bob might make plans designed to make A true.
• Alice is engaging in wishful thinking, claiming that something is true when really the opposite is just too terrible to consider.
• Bob wants to be able to rely on Alice's assertions, so Bob is concerned about the possibility of wishful thinking.
• Or, Bob is concerned for Alice; Bob doesn't want Alice to ignore risks due to ignoring negative possibilities, or fail to set up back-up plans for the bad scenarios.

My point is that it doesn't seem to me like a case of people intuitively breaking up a thing which is scientifically really one phenomenon. Predicting A and wanting A seem to have quite different consequences. If you predict A, you tend to restrict attention to cases where it is true when planning; you may plan actions which rely on it. If you want A, you don't do that; you are very aware of all the cases where not-A. You take actions designed to ensure A.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T04:13:58.891Z · score: 2 (1 votes) · LW · GW
In light of this second reason, I'll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn't simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we'd need an explanation for why sending a signal indicating distance from a set point is not enough.

I more or less said this in my other comment, but to reply to this directly -- it makes sense to me that you could have a hierarchy of controllers which communicate via set points and distances from set points, but this doesn't particularly make me think set points are predictions.

Artificial neural networks basically work this way -- signals go one way, "degree of satisfaction" goes the other way (the gradient). If the ANN is being trained to make predictions, then yeah, "predictions go one way, distance from set point goes the other" (well, distance + direction). However, ANNs can be trained to do other things as well; so the signals/corrections need not be about prediction.

People have managed to build AI out of control systems minimizing prediction error, albeit doing, like I propose is necessary, by having some fixed set points that prevent dark room problems.

I've seen some results like this. I'm guessing there are a lot of different ways you could do it, but iirc what I saw seemed reasonable if what you want to do is build something like an imitation learner but also bias toward specific desired results. However, I think in that case "minimizing prediction error" meant a different thing than what you mean. So, what are you imagining?

If I take my ANN analogy, then fixing signals doesn't seem to help me do anything much. A 'set-point' is like a forward signal in the analogy, so fixing set points means fixing inputs to the ANN. But a fixed input is more or less a dead input as far as learning goes; the ANN will still just learn to produce whatever output behavior the gradient incentivises, such as prediction of the data. Fixing some of the outputs doesn't seem very helpful either.
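
A minimal sketch of the "fixed input is a dead input" point, using a two-weight linear model rather than a real ANN. The frozen value, the toy data, and the training settings are all assumptions chosen for illustration.

```python
FROZEN = 2.0  # hypothetical "set point" held constant on one input

# Toy data: the target depends only on x (y = 2x + 1), never on the frozen input.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2.0 * x + 1.0 for x in xs]


def train(steps=500, lr=0.1):
    """Plain gradient descent on mean squared error for y_hat = w_x * x + w_c * FROZEN."""
    w_x, w_c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_x = grad_c = 0.0
        for x, y in zip(xs, ys):
            err = (w_x * x + w_c * FROZEN) - y
            grad_x += err * x / n
            grad_c += err * FROZEN / n
        w_x -= lr * grad_x
        w_c -= lr * grad_c
    return w_x, w_c


w_x, w_c = train()
prediction_at_zero = w_x * 0.0 + w_c * FROZEN
```

In this toy the frozen input ends up used as an ordinary bias term: the output at x = 0 settles near the data's value of 1, not near the frozen value of 2. Nothing in the gradient pushes the model to make the world match the frozen signal.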

Also, how is this parsimonious?

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T03:30:27.977Z · score: 7 (3 votes) · LW · GW
One is that it's elegant, simple, and parsimonious.

I certainly agree here. Furthermore I think it makes sense to try and unify prediction with other aspects of cognition, so I can get that part of the motivation (although I don't expect that humans have simple values). I just think this makes bad predictions.

Control systems are simple, they look to me to be the simplest thing we might reasonably call "alive" or "conscious" if we try to redefine those terms in ways that are not anchored on our experience here on Earth.

No disagreement here.

and this is claimed to always contain a signal of positive, negative, or neutral judgement.

Yeah, this seems like an interesting claim. I basically agree with the phenomenological claim. This seems to me like evidence in favor of a hierarchy-of-thermostats model (with one major reservation which I'll describe later). However, not particularly like evidence of the prediction-error-minimization perspective. We can have a network of controllers which express wishes to each other separately from predictions. Yes, that's less parsimonious, but I don't see a way to make the first work without dubious compromises.

Here's the reservation which I promised -- if we have a big pile of controllers, how would we know (based on phenomenal experience) that controllers attach positive/negative valence "locally" to every percept?

Forget controllers for a moment, and just suppose that there's any hierarchy at all. It could be made of controller-like pieces, or neural networks learning via backprop, etc. As a proxy for conscious awareness, let's ask: what kind of thing can we verbally report? There isn't any direct access to things inside the hierarchy; there's only the summary of information which gets passed up the hierarchy.

In other words: it makes sense that low-level features like edge detectors and colors get combined into increasingly high-level features until we recognize an object. However, it's notable that our high-level cognition can also purposefully attend to low-level features such as lines. This isn't really predicted by the basic hierarchy picture -- more needs to be said about how this works.

So, similarly, we can't predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it's-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don't have direct conscious experience of everything going on in neural circuitry.

This is not at all a problem with minimization of prediction error; it's more a question about hierarchies of controllers.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T00:35:31.151Z · score: 10 (2 votes) · LW · GW
Agents trade off exploring and exploiting, and when they're exploiting they look like they're minimizing prediction error?

That's one hypothesis in the space I was pointing at, but not particularly the thing I expect to be true. Or, maybe I think it is somewhat true as an observation about policies, but doesn't answer the question of how exactly variety and anti-variety are involved in our basic values.

A model which I more endorse:

We like to make progress understanding things. We don't like chaotic stuff with no traction for learning (like TV fuzz). We like orderly stuff more, but only while learning about it; it then fades to zero, meaning we have to seek more variety for our hedonic treadmill. We really like patterns which keep establishing and then breaking expectations, especially if there is always a deeper pattern which makes sense of the exceptions (like music); these patterns maximize the feeling of learning progress.
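
One way to make "the feeling of learning progress" concrete is to score a stream by how much prediction error drops over time, rather than by the error itself. The sketch below is only a toy under stated assumptions: the predictor is an exponential moving average, the window size and learning rate are arbitrary, and the two streams (uniform noise and a constant) are stand-ins for "TV fuzz" and "orderly stuff".

```python
import random


def errors_on(stream, rate=0.2):
    """Absolute prediction errors of an exponential-moving-average predictor."""
    prediction = 0.0
    errors = []
    for observation in stream:
        errors.append(abs(prediction - observation))
        prediction += rate * (observation - prediction)
    return errors


def mean(values):
    return sum(values) / len(values)


def progress(errors, start, window=50):
    """Drop in mean error from one window to the next (positive means learning)."""
    return mean(errors[start:start + window]) - mean(errors[start + window:start + 2 * window])


rng = random.Random(0)
noise_errors = errors_on([rng.random() for _ in range(200)])
orderly_errors = errors_on([1.0] * 200)

orderly_early = progress(orderly_errors, 0)    # while the pattern is still being learned
orderly_late = progress(orderly_errors, 100)   # after it has been learned
noise_late = progress(noise_errors, 100)       # no traction: error stays high and flat
noise_late_error = mean(noise_errors[150:200])
```

On this measure the orderly stream is rewarding only early on and then fades to about zero, while the noise never yields progress despite staying surprising. A pure error-maximizer would prefer the noise; a progress-seeker would not.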

But I think that's just one aspect of our values, not a universal theory of human values.

Comment by abramdemski on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-09T23:43:06.444Z · score: 26 (6 votes) · LW · GW

I agree that

• there's something to the hierarchy thing;
• if we want, we can always represent values in terms of minimizing prediction error (at least to a close approximation), so long as we choose the right predictions;
• this might turn out to be the right thing to do, in order to represent the hierarchy thing elegantly (although I don't currently see why, and am somewhat skeptical).

However, I don't agree that we should think of values as being predictable from the concept of minimizing prediction error.

The tone of the following is a bit more adversarial than I'd like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I'm curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they're unrelated to what I'm about to say.

Evolved Agents Probably Don't Minimize Prediction Error

If we look at the field of reinforcement learning, it appears to be generally useful to add intrinsic motivation for exploration to an agent. This is the exact opposite of predictability: in one case we add reward for entering unpredictable states, whereas in the other case we add reward for entering predictable states. I've seen people try to defend minimizing prediction error by showing that the agent is still motivated to learn (in order to figure out how to avoid unpredictability). However, the fact remains: it is still motivated to learn strictly less than an unpredictability-loving agent. RL has, in practice, found it useful to add reward for unpredictability; this suggests that evolution might have done the same, and suggests that it would not have done the exact opposite. Agents operating under a prediction-error penalty would likely under-explore.
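
A toy sketch of the contrast, with no extrinsic reward at all. The agent walks a chain of states and scores each reachable state by a count-based stand-in for prediction error (rarely visited means surprising). That proxy, the chain environment, and the function name are all assumptions for illustration, not a description of any particular RL method.

```python
def explore(sign, n_states=10, steps=30):
    """Count distinct states visited by an agent driven only by intrinsic reward.

    sign=+1 rewards surprising states (curiosity bonus);
    sign=-1 rewards predictable states (prediction-error penalty).
    """
    visits = [0] * n_states
    state = 0
    visits[state] = 1
    for _ in range(steps):
        # Reachable states: step left, stay, or step right (clamped at the ends).
        candidates = [max(state - 1, 0), state, min(state + 1, n_states - 1)]
        # Count-based stand-in for prediction error: fewer visits means more surprise.
        state = max(candidates, key=lambda s: sign * (1.0 / (1 + visits[s])))
        visits[state] += 1
    return sum(1 for v in visits if v > 0)


curious_coverage = explore(sign=+1)
error_minimizer_coverage = explore(sign=-1)
```

In this setup the surprise-seeking agent covers the whole chain while the error-minimizing agent never leaves its starting state, which is the under-exploration worry in its most extreme form.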

It's Easy to Overestimate The Degree to which Agents Minimize Prediction Error

I often enjoy variety -- in food, television, etc -- and observe other humans doing so. Naively, it seems like humans sometimes prefer predictability and sometimes prefer variety.

However: any learning agent, almost no matter its values, will tend to look like it is seeking predictability once it has learned its environment well. It is taking actions it has taken before, and steering toward the environmental states similar to what it always steers for. So, one could understandably reach the conclusion that it is reliability itself which the agent likes.

In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it's actually just that I like what I like. I've found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn't mean it is familiarity itself which I enjoy.

I'm not denying that mere familiarity has some positive valence for humans; I'm just saying that for arbitrary agents, it seems easy to over-estimate the importance of familiarity in their values, so we should be a bit suspicious about it for humans too. And I'm saying that it seems like humans enjoy surprises sometimes, and there's evolutionary/machine-learning reasoning to explain why this might be the case.

We Need To Explain Why Humans Differentiate Goals and Beliefs, Not Just Why We Conflate Them

You mention that good/bad seem like natural categories. I agree that people often seem to mix up "should" and "probably is", "good" and "normal", "bad" and "weird", etc. These observations in themselves speak in favor of the minimize-prediction-error theory of values.

However, we also differentiate these concepts at other times. Why is that? Is it some kind of mistake? Or is the conflation of the two the mistake?

I think the mix-up between the two is partly explained by the effect I mentioned earlier: common practice is optimized to be good, so there will be a tendency for commonality and goodness to correlate. So, it's sensible to cluster them together mentally, which can result in them getting confused. There's likely another aspect as well, which has something to do with social enforcement (ie, people are strategically conflating the two some of the time?) -- but I'm not sure exactly how that works.

Comment by abramdemski on Two senses of “optimizer” · 2019-10-09T20:09:47.059Z · score: 11 (3 votes) · LW · GW

I pretty strongly think this is the same distinction as I am pointing at with selection vs control, although perhaps I focus on a slightly broader cluster-y distinction while you have a more focused definition.

I think this distinction is something which people often conflate in computer science more broadly, too. Often, for example, a method will be initially intended for the control case, and people will make 'improvements' to it which only make sense in a selection context. It's easy for things to slide in that direction, because control-type algorithms will often be tested out in computer-simulated environments; but then, you have access to the environment, and can optimize it in more direct ways.

I'm more annoyed by this sort of mix-up than I probably should be.

Comment by abramdemski on The Zettelkasten Method · 2019-10-09T20:00:28.104Z · score: 2 (1 votes) · LW · GW

Yeah, I think it's actually not too bad to use Zettelkasten addresses in a fixed-page-location notebook. You can't put the addresses in proper order, but, I've mentioned that I don't sort my cards until I have a large back-log of unsorted anyway.

• As I said, the creation-time ordering is pretty useful anyway, because it correlates to what you're most likely to want to look at, whereas the proper sorting does not.
• Also, looking up addresses in creation-time ordering is usually not too bad: you can still rely on 2a to be later than 2, 2b to be later than 2a, etc. You just don't know for sure whether 3a will be on a later page than 2a.
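
For comparison, "proper order" can be written down as a sort key. This is a sketch that assumes addresses alternate between numbers and lowercase letters (as in 2, 2a, 2a1, 2b); the function name is hypothetical.

```python
import re


def address_key(address: str):
    """Sort key for addresses such as '2', '2a', '2a1', assuming alternating digits/letters."""
    key = []
    for token in re.findall(r"\d+|[a-z]+", address):
        if token.isdigit():
            key.append(int(token))           # compare numbers numerically: 3 < 10
        else:
            key.append((len(token), token))  # compare letters so that 'z' < 'aa'
    return tuple(key)


creation_order = ["2", "2a", "3", "2b", "3a", "2a1", "10"]
proper_order = sorted(creation_order, key=address_key)
```

Here `proper_order` comes out as 2, 2a, 2a1, 2b, 3, 3a, 10. In the creation-time list, 2 still precedes 2a and 2a precedes 2b, but 3 lands before 2b, which is the only kind of guarantee a fixed-page notebook loses.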

Comment by abramdemski on The Zettelkasten Method · 2019-10-06T18:48:31.029Z · score: 3 (2 votes) · LW · GW

Another thing you could do (which I'm considering):

Currently, when I want to start an entirely new top-level topic, I make a new card in my highest-address deck. This means that highest deck is full of top-level ideas which mostly have little or no development.

Instead, one could bias toward starting new decks for new top-level ideas. You probably don't want to do this every time, but, it means you have a nice new deck with no distractions which you can carry around on its own. And so long as you are carrying around your latest new deck, you can add new top-level cards to it if you need to start a new topic on the go.

You don't get access to all your older ideas, but if we compare this to carrying around a notebook, it compares favorably.

EDIT: I've tried this now; I think it's quite a good solution.

Comment by abramdemski on Troll Bridge · 2019-10-05T21:09:29.968Z · score: 2 (1 votes) · LW · GW
I started asking for a chess example because you implied that the reasoning in the top-level comment stops being sane in iterated games.
In a simple iteration of Troll bridge, whether we're dumb is clear after the first time we cross the bridge.

Right, OK. I would say "sequential" rather than "iterated" -- my point was about making a weird assessment of your own future behavior, not what you can do if you face the same scenario repeatedly. IE: Troll Bridge might be seen as artificial in that the environment is explicitly designed to punish you if you're "dumb"; but, perhaps a sequential game can punish you more naturally by virtue of poor future choices.

Suppose my chess skill varies by day. If my last few moves were dumb, I shouldn't rely on my skill today. I don't see why I shouldn't deduce this ahead of time

Yep, I agree with this.

I concede the following points:

• If there is a mistake in the troll-bridge reasoning, predicting that your next actions are likely to be dumb conditional on a dumb-looking action is not an example of the mistake.
• Furthermore, that inference makes perfect sense, and if it is as analogous to the troll-bridge reasoning as I was previously suggesting, the troll-bridge reasoning makes sense.

However, I still assert the following:

• Predicting that your next actions are likely to be dumb conditional on a dumb-looking action doesn't make sense if the very reason why you think the action looks dumb is that the next actions are probably dumb if you take it.

IE, you don't have a prior heuristic judgement that a move is one which you make when you're dumb; rather, you've circularly concluded that the move would be dumb -- because it's likely to lead to a bad outcome -- because if you take that move your subsequent moves are likely to be bad -- because it is a dumb move.

I don't have a natural setup which would lead to this, but the point is that it's a crazy way to reason rather than a natural one.

The question, then, is whether the troll-bridge reasoning is analogous to this.

I think we should probably focus on the probabilistic case (recently added to the OP), rather than the proof-based agent. I could see myself deciding that the proof-based agent is more analogous to the sane case than the crazy one. But the probabilistic case seems completely wrong.

In the proof-based case, the question is: do we see the Löbian proof as "circular" in a bad way? It makes sense to conclude that you'd only cross the bridge when it is bad to do so, if you can see that proving it's a good idea is inconsistent. But does the proof that that's inconsistent "go through" that very inference? We know that the troll blows up the bridge if we're dumb, but that in itself doesn't constitute outside reason that crossing is dumb.

But I can see an argument that our "outside reason" is that we can't know that crossing is safe, and since we're a proof-based agent, would never take the risk unless we're being dumb.

However, this reasoning does not apply to the probabilistic agent. It can cross the bridge as a calculated risk. So its reasoning seems absolutely circular. There is no "prior reason" for it to think crossing is dumb; and, even if it did think it more likely dumb than not, it doesn't seem like it should be 100% certain of that. There should be some utilities for the three outcomes which preserve the preference ordering but which make the risk of crossing worthwhile.
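To make the "calculated risk" point concrete, here is a minimal sketch of the expected-utility comparison. It assumes the three outcomes are crossing safely, not crossing, and crossing with the bridge blown up, and it uses the +10 / 0 / -10 values from the problem statement as a starting point. The probability `p_explode` and the function names are hypothetical, introduced only for illustration; this is not a model of the agent in the post.

```python
def expected_utility_of_crossing(p_explode, u_cross_ok, u_explode):
    """Expected utility of crossing, given a probability that the bridge blows up."""
    return (1 - p_explode) * u_cross_ok + p_explode * u_explode


def crossing_is_worthwhile(p_explode, u_cross_ok, u_stay, u_explode):
    """True if crossing beats staying put in expectation.

    The preference ordering u_cross_ok > u_stay > u_explode is required,
    so any choice of utilities here preserves the ordering of the outcomes.
    """
    if not (u_cross_ok > u_stay > u_explode):
        raise ValueError("utilities must satisfy u_cross_ok > u_stay > u_explode")
    return expected_utility_of_crossing(p_explode, u_cross_ok, u_explode) > u_stay


def break_even_probability(u_cross_ok, u_stay, u_explode):
    """Explosion probability at which crossing and staying are equally good."""
    return (u_cross_ok - u_stay) / (u_cross_ok - u_explode)


if __name__ == "__main__":
    # Hypothetical belief: the agent thinks explosion is more likely than not.
    p = 0.6
    print(crossing_is_worthwhile(p, 10, 0, -10))   # original utilities
    print(crossing_is_worthwhile(p, 100, 0, -10))  # same ordering, bigger upside
    print(break_even_probability(100, 0, -10))
```

With the original utilities, an agent that thinks explosion is more likely than not declines to cross. Raising the payoff for a safe crossing while keeping the same ordering makes the same belief compatible with crossing, so long as the agent is not certain of the bad outcome.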

Comment by abramdemski on Troll Bridge · 2019-10-04T23:45:44.998Z · score: 2 (1 votes) · LW · GW

The heuristic can override mere evidence, agreed. The problem I'm pointing at isn't that the heuristic is fundamentally bad and shouldn't be used, but rather that it shouldn't circularly reinforce its own conclusion by counting a hypothesized move as differentially suggesting you're a bad player in the hypothetical where you make that move. Thinking that way seems contrary to the spirit of the hypothetical (whose purpose is to help evaluate the move). It's fine for the heuristic to suggest things are bad in that hypothetical (because you heuristically think the move is bad); it seems much more questionable to suppose that your subsequent moves will be worse in that hypothetical, particularly if that inference is a lynchpin of your overall negative assessment of the move.

What do you want out of the chess-like example? Is it enough for me to say the troll could be the other player, and the bridge could be a strategy which you want to employ? (The other player defeats the strategy if they think you did it for a dumb reason, and they let it work if they think you did it smartly, and they know you well, but you don't know whether they think you're dumb, but you do know that if you were being dumb then you would use the strategy.) This can be exactly troll bridge as stated in the post, but set in chess with player source code visible.

I'm guessing that's not what you want, but I'm not sure what you want.

Comment by abramdemski on Troll Bridge · 2019-10-04T23:25:44.865Z · score: 2 (1 votes) · LW · GW

I don't see why the proof fails here; it seems to go essentially as usual.

Reasoning in PA:

Suppose a=cross->u=-10 were provable.

Further suppose a=cross.

Note that we can see there's a proof that not crossing gets 0, so it must be that a better (or equal) value was found for crossing, which must have been +10 unless PA is inconsistent, since crossing implies that u is either +10 or -10. Since we already assumed crossing gets -10, this leads to trouble in the usual way, and the proof proceeds from there.

(Actually, I guess everything is a little messier since you haven't stipulated the search order for actions, so we have to examine more cases. Carrying out some more detailed reasoning: So (under our assumptions) we know PA must have proved (a=cross -> u=10). But we already supposed that it proves (a=cross -> u=-10). So PA must prove not(a=cross). But then it must prove prove(not(a=cross)), since PA has self-knowledge of proofs. Which means PA can't prove that it'll cross, if PA is to be consistent. But it knows that proving it doesn't take an action makes that action very appealing; so it knows it would cross, unless not crossing was equally appealing. But it can prove that not crossing nets zero. So the only way for not crossing to be equally appealing is for PA to also prove not crossing nets 10. For this to be consistent, PA has to prove that the agent doesn't not-cross. But now we have PA proving that the agent doesn't cross and also that it doesn't not cross! So PA must be inconsistent under our assumptions. The rest of the proof continues as usual.)