The Pointers Problem: Clarifications/Variations 2021-01-05T17:29:45.698Z
Debate Minus Factored Cognition 2020-12-29T22:59:19.641Z
Babble Challenge: Not-So-Future Coordination Tech 2020-12-21T16:48:20.515Z
Fusion and Equivocation in Korzybski's General Semantics 2020-12-21T05:44:41.064Z
Writing tools for tabooing? 2020-12-13T19:50:37.301Z
Mental Blinders from Working Within Systems 2020-12-10T19:09:50.720Z
Quick Thoughts on Immoral Mazes 2020-12-09T01:21:40.210Z
Number-guessing protocol? 2020-12-07T15:07:48.019Z
Recursive Quantilizers II 2020-12-02T15:26:30.138Z
Nash Score for Voting Techniques 2020-11-26T19:29:31.187Z
Deconstructing 321 Voting 2020-11-26T03:35:40.863Z
Normativity 2020-11-18T16:52:00.371Z
Thoughts on Voting Methods 2020-11-17T20:23:07.255Z
Signalling & Simulacra Level 3 2020-11-14T19:24:50.191Z
Learning Normativity: A Research Agenda 2020-11-11T21:59:41.053Z
Probability vs Likelihood 2020-11-10T21:28:03.934Z
Time Travel Markets for Intellectual Accounting 2020-11-09T16:58:44.276Z
Kelly Bet or Update? 2020-11-02T20:26:01.185Z
Generalize Kelly to Account for # Iterations? 2020-11-02T16:36:25.699Z
Dutch-Booking CDT: Revised Argument 2020-10-27T04:31:15.683Z
Top Time Travel Interventions? 2020-10-26T23:25:07.973Z
Babble & Prune Thoughts 2020-10-15T13:46:36.116Z
One hub, or many? 2020-10-04T16:58:40.800Z
Weird Things About Money 2020-10-03T17:13:48.772Z
"Zero Sum" is a misnomer. 2020-09-30T18:25:30.603Z
What Does "Signalling" Mean? 2020-09-16T21:19:00.968Z
Most Prisoner's Dilemmas are Stag Hunts; Most Stag Hunts are Battle of the Sexes 2020-09-14T22:13:01.236Z
Comparing Utilities 2020-09-14T20:56:15.088Z
[Link] Five Years and One Week of Less Wrong 2020-09-14T16:49:35.082Z
Social Capital Paradoxes 2020-09-10T18:44:18.291Z
abramdemski's Shortform 2020-09-10T17:55:38.663Z
Capturing Ideas 2020-09-09T21:20:23.049Z
Updates and additions to "Embedded Agency" 2020-08-29T04:22:25.556Z
The Bayesian Tyrant 2020-08-20T00:08:55.738Z
Radical Probabilism 2020-08-18T21:14:19.946Z
Mesa-Search vs Mesa-Control 2020-08-18T18:51:59.664Z
PSA: Tagging is Awesome 2020-07-30T17:52:14.047Z
What happens to variance as neural network training is scaled? What does it imply about "lottery tickets"? 2020-07-28T20:22:14.066Z
To what extent are the scaling properties of Transformer networks exceptional? 2020-07-28T20:06:24.191Z
How should AI debate be judged? 2020-07-15T22:20:33.950Z
What does it mean to apply decision theory? 2020-07-08T20:31:05.884Z
How "honest" is GPT-3? 2020-07-08T19:38:01.800Z
Noise on the Channel 2020-07-02T01:58:18.128Z
Radical Probabilism [Transcript] 2020-06-26T22:14:13.523Z
Betting with Mandatory Post-Mortem 2020-06-24T20:04:34.177Z
Relating HCH and Logical Induction 2020-06-16T22:08:10.023Z
An Orthodox Case Against Utility Functions 2020-04-07T19:18:12.043Z
Thinking About Filtered Evidence Is (Very!) Hard 2020-03-19T23:20:05.562Z
Bayesian Evolving-to-Extinction 2020-02-14T23:55:27.391Z
A 'Practice of Rationality' Sequence? 2020-02-14T22:56:13.537Z


Comment by abramdemski on Where to Draw the Boundaries? · 2021-01-21T19:26:11.361Z · LW · GW

If the alien understands the whole picture, it will notice the causal arrow from human concerns to social constructs. For instance, if you want gay marriage to be a thing, you amend the marriage construct so that it is.

The point of the thought experiment is that, for the alien, all of that is totally mundane (ie scientific) knowledge. So why can't that observation count as scientific for us?

IE, just because we have control over a thing doesn't -- in my ontology -- indicate that the concept of map/territory correspondence no longer applies. It only implies that we need to have conditional expectations, so that we can think about what happens if we do one thing or another. (For example, I know that if I think about whether I'm thinking about peanut butter, I'm thinking about peanut butter. So my estimate "am I thinking about peanut butter?" will always be high, when I care to form such an estimate.)

Rocks existed before the concept of rocks. Money did not exist before the concept of money.

And how is the temporal point at which something comes into existence relevant to whether we need to track it accurately in our map, aside from the fact that things temporally distant from us are less relevant to our concerns?

Your reply was very terse, and does not articulate very much of the model you're coming from, instead mostly reiterating the disagreement. It would be helpful to me if you tried to unpack more of your overall view, and the logic by which you reach your conclusions.

I know that you have a concept of "pre-existing reality" which includes rocks and not money, and I believe that you think things which aren't in pre-existing reality don't need to be tracked by maps (at least, something resembling this). What I don't see is the finer details of this concept of pre-existing reality, and why you think we don't need to track those things accurately in maps.

The point of my rock example is that the smashed rock did not exist before we smashed it. Or we could say "the rock dust" or such. In doing so, we satisfy your temporal requirement (the rock dust did not exist until we smashed it, much like money did not exist until we conceived of it). We also satisfy the requirement that we have complete control over it (we can make the rock dust, just like we can invent gay marriage).

I know you don't think the rock example counts, but I'm trying to ask for a more detailed model of why it doesn't. I gave the rock example because, presumably, you do agree that bits of smashed rock are the sort of thing we might want accurate maps of. Yet they seem to match your criteria.

Imagine for a moment that we had perfect control of how the rock crumbles. Even then, it would seem that we still might want a place in our map for the shape of the rock shards. Despite our perfect control, we might want to remember that we shaped the rock shards into a key and a matching lock, etc.

Remember that the original point of this argument was your assertion:

In order for your map to be useful, it needs to reflect the statistical structure of things to the extent required by the value it is in service to.

That can be zero. There is a meta-category of things that are created by humans without any footprint in pre-existing reality. These include money, marriages, and mortgages.

So -- to the extent that we are remaining relevant to the original point -- the question is why, in your model, there is zero need to reflect the statistical structure of money, marriage, etc.

Comment by abramdemski on Where to Draw the Boundaries? · 2021-01-21T18:02:25.755Z · LW · GW

So if your friends are using concepts which are optimized for other things, then either (1) you’ve got differing goals and you now would do well to sort out which of their concepts have been gerrymandered, (2) they’ve inherited gerrymandered concepts from someone else with different goals, or (3) your friends and you are all cooperating to gerrymander someone else’s concepts (or, (4), someone is making a mistake somewhere and gerrymandering concepts unnecessarily).

So? That’s a very particular set of problems. If you try to solve them by banning all unscientific concepts, then you lose all the usefulness they have in other contexts.

It seems like part of our persistent disagreement is:

  • I see this as one of very few pathways, and by far the dominant pathway, by which beliefs can be beneficial in a different way from useful-for-prediction
  • You see this as one of many many pathways, and very much a corner case

I frankly admit that I think you're just wrong about this, and you seem quite mistaken in many of the other pathways you point out. The argument you quoted above was supposed to help establish my perspective, by showing that there would be no reason to use gerrymandered concepts unless there was some manipulation going on. Yet you casually brush this off as a very particular set of problems.

I’m just saying there’s something special about avoiding these things, whenever possible,

Wherever possible, or wherever beneficial? Does it make the world a better place to keep pointing out that tomatoes are fruit?

As a general policy, I think that yes, frequently pointing out subtler inaccuracies in language helps practice specificity and gradually refines concepts. For example, if you keep pointing out that tomatoes are fruit, you might eventually be corrected by someone pointing out that "vegetable" is a culinary distinction rather than a biological one, and so there is no reason to object to the classification of a tomato as a vegetable. This could help you develop philosophically, by providing a vivid example of how we use multiple overlapping classification systems rather than one; and further, that scientific-sounding classification criteria don't always take precedence (IE culinary knowledge is just as valid as biology knowledge).

If you use a gerrymandered concept, you may have no understanding of the non-gerrymandered versions; or you may have some understanding, but in any case not the fluency to think in them.

I’m not following you any more. Of course unscientific concepts can go wrong—anything can. But if you’re not saying everyone should use scientific concepts all the time, what are you saying?

In what you quoted, I was trying to point out the distinction between speaking a certain way vs thinking a certain way. My overall conversational strategy was to try to separate out the question of whether you should speak a specific way from the question of whether you should think a specific way. This was because I had hoped that we could more easily reach agreement about the "thinking" side of the question.

More specifically, I was pointing out that if we restrict our attention to how to think, then (I claim) the cost of using concepts for non-epistemic reasons is very high, because you usually cannot also be fluent in the more epistemically robust concepts, without the non-epistemic concepts losing a significant amount of power. I gave an example of a Christian who understands the atheist worldview in too much detail.

I see Zack as (correctly) ruling in mere optimization of concepts to predict the things we care about, but ruling out other forms of optimization of concepts to be useful.

I think that is Zack's argument, and that it is fallacious. Because we do things other than predict.

I need some kind of map of the pathways you think are important here.

I 100% agree that we do things other than predict. Specifically, we act. However, the effectiveness of action seems to be very dependent on the accuracy of predictions. We either (a) come up with good plans by virtue of having good models of the world, or (b) learn how to take effective actions "directly" by interacting with the world and responding to feedback. Both of these rely on good epistemics (because learning to act "directly" still relies on our understanding of the world to interpret the feedback -- ie the same reason ML people sometimes say that reinforcement learning is essentially learning a classifier).

That view -- that by far the primary way in which concepts influence the world is via the motor output channels, which primarily rely on good predictions -- is the foundation of my view that most of the benefits of concepts optimized for things other than prediction must be manipulation.

Low level manipulation is ubiquitous. You need to argue for “manipulative in an egregiously bad way” separately

I’m arguing that Zack’s definition is a very good Schelling fence to put up

You are arguing that it is remotely possible to eliminate all manipulation???

Suppose we're starting a new country, and we are making the decision to outlaw theft. Someone comes to you and says "it isn't remotely possible to eliminate all theft!!!" ... you aren't going to be very concerned with their argument, right? The point of laws is not to entirely eliminate a behavior (although it would be nice). The point is to help make the behavior uncommon enough that the workings of society are not too badly impacted.

In Zack's case, he isn't even suggesting criminal punishment be applied to violations. It's more like someone just saying "stealing is bad". So the reply "you're saying that we can eliminate all theft???" seems even less relevant.

One of Zack’s recurring arguments is that appeal to consequences is an invalid argument when considering where to draw conceptual boundaries

Obtaining good consequences is a very good reason to do a lot of things.

Again, I'm going to need some kind of map of how you see the consequences flowing, because I think the main pathway for those "good consequences" you're seeing is manipulation.

Comment by abramdemski on Asymmetric Justice · 2021-01-20T21:54:13.285Z · LW · GW

I really like this post. I think it points out an important problem with intuitive credit-assignment algorithms which people often use. The incentive toward inaction is a real problem which is often encountered in practice. While I was somewhat aware of the problem before, this post explains it well.

I also think this post is wrong, in a significant way: asymmetric justice is not always a problem and is sometimes exactly what you want. In particular, it's how you want a justice system (in the sense of police, judges, etc) to work.

The book Law's Order explains it like this: you don't want theft to be punished in keeping with its cost. Rather, in order for the free market to function, you want theft to be punished harshly enough that theft basically doesn't happen.

Zvi speaks as if the purpose of the justice system is to reward positive externalities and punish negative externalities, to align everyone's incentives. While this is a noble goal, Law's Order sees it as a goal to be taken care of by other parts of society, in particular the free market. (Law's Order is a fairly libertarian book, so it puts a lot of faith in the free market.)

The purpose of the justice system is to enforce the structure such that those other institutions can do their jobs. The free market can't optimize people's lives properly if theft and murder are a constant and contracts cannot be enforced.

So, it makes perfect sense for a justice system to be asymmetric. Its role is to strongly disincentivize specific things, not to broadly provide compensatory incentives.

(For this reason, scales are a pretty terrible symbol for justice.)

In general, we might conclude that credit assignment systems need two parts:

  1. A "symmetric" part, which attempts to allocate credit in as calibrated a way as it can, rewarding good work and punishing bad.
  2. An "asymmetric" part, which harshly enforces the rules which ensure that the symmetric part can function, ensuring that those rules are followed frequently enough for things to function.

This also gives us a criterion for when punishment should be disproportionate: only those things which interfere with the more proportionate credit assignment should be disproportionately punished.
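As a toy illustration of how the two parts might combine (the function, names, and numbers here are my own sketch, not anything from the post):

```python
def total_credit(outcome_quality, rule_violations, penalty=10.0):
    # Symmetric part: calibrated reward/punishment, proportional to the
    # measured quality of the work.
    symmetric = outcome_quality
    # Asymmetric part: a deliberately disproportionate flat penalty for
    # each violation of the rules the symmetric part depends on
    # (e.g. falsifying the metrics it reads).
    asymmetric = -penalty * rule_violations
    return symmetric + asymmetric
```

The point of the design is that even excellent work cannot buy back a rule violation: the asymmetric part protects the measurement channel rather than pricing externalities.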

Overall, I still think this is a great post, I just think there's more to the issue.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-20T21:06:22.518Z · LW · GW

I think this is only true when you have turn-by-turn play and your opponent has already "claimed" the honest debater role.

Yeah, I was assuming turn-by-turn play.

In the simultaneous play setting, I think you expect both agents to be honest.

This is a significant point that I was missing: I had assumed that in simultaneous play, the players would randomize, so as to avoid choosing the same answer, since choosing the same answer precludes winning. However, if choosing a worse answer means losing, then players prefer a draw.

But I'm not yet convinced, because there's still the question of whether choosing the worse answer means losing. The "clawing" argument still suggests that choosing the worse answer may yield a draw (in expectation), even in simultaneous play. (IE, what if the should-be loser attacks the winner, and they go back and forth, with winner depending on last word?)

Ah, I suppose this is still consistent with honesty being an equilibrium. But it would then be a really weak sort of equilibrium -- there would be no reason to be honest, but no specific reason to be dishonest, either.

Zero-sum setting, argument that honesty is an equilibrium (for the first player in a turn-by-turn game, or either player in a simultaneous-action game):

If you are always honest, then whenever you can take an action, there will exist a defeater (by your assumption), therefore you will have at least as many options as any non-honest policy (which may or may not have a defeater). Therefore you maximize your value by being honest.

There always exists an honest defeater to dishonest arguments. But, never to honest arguments. (I should have explicitly assumed this.) Therefore, you are significantly tying your hands by being honest: you don't have a way to refute honest arguments. (Which you would like to do, since in the zero-sum setting, this may be the only way to recover points.)

I assume (correct me if I'm wrong) that the scoring rules for "the zero-sum setting" are something like: the judge assesses things at the end, giving +1 to the winner and -1 to the loser, or 0 in case of a tie.

Then I concede that there is an honest equilibrium where the first player tells the truth, and the second player concedes (or, in simultaneous play, both players tell the truth and then concede). However, it does seem to be an extremely weak equilibrium -- the second player is equally happy to lie, starting a back-and-forth chain which is a tie in expectation.
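A minimal sketch of that back-and-forth chain (my own toy model, assuming a refutation is always available): under zero-sum +1/-1 scoring, the value of a dishonest claim just flips sign with each refutation, so whoever gets the last word wins, and averaging over parity gives a tie in expectation.

```python
def chain_value(rounds_left, refute_prob=1.0):
    # Value, to the player who just made a dishonest claim, of the
    # ensuing refutation chain under zero-sum +1/-1 scoring.
    if rounds_left == 0:
        return 1  # the claim stands unrefuted: last word wins
    # With probability refute_prob the opponent refutes, flipping the
    # position; otherwise the claim stands.
    return (1 - refute_prob) * 1 + refute_prob * (-chain_value(rounds_left - 1, refute_prob))
```

With `refute_prob=1.0` the values alternate 1, -1, 1, -1, ... in the number of remaining rounds, so averaging over who gets the last word gives 0.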

It seems plausible to me that there's an incremental zero-sum scoring rule; EG, every convincing counterargument takes 1 point from the other player, so any dishonest statement is sure to lose you a point (in equilibrium). The hope would be that you always prefer to concede rather than argue, even if you're already losing, in order to avoid losing more points.

However, this doesn't work, because a dishonest (but convincing) argument gives you +1, and then -1 if it is refuted; so at worst it's a wash. So again it's a weak equilibrium, and if there's any imperfection in the equilibrium at all, it actively incentivises lying when you would otherwise concede (because you want to take the chance that the opponent will not manage to refute your argument).

This was the line of reasoning which led me to the scoring rule in the post, since making it a -2 (but still only +1 for the other player) solves that issue.
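To make the arithmetic explicit (a sketch; `p_refuted` is the probability the opponent finds and uses the refutation, and conceding is worth 0):

```python
def lie_payoff(p_refuted, penalty):
    # +1 for the convincing-but-dishonest argument, minus `penalty`
    # if the opponent refutes it (probability p_refuted).
    return 1 - p_refuted * penalty
```

Under the -1 rule, even certain refutation leaves lying a wash (`lie_payoff(1.0, 1)` is 0, no worse than conceding). Under the -2 rule, certain refutation makes lying strictly worse than conceding (`lie_payoff(1.0, 2)` is -1), and lying only pays when the refutation probability falls below 1/2.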

When arguments do terminate quickly enough (maximum depth of the game tree is less than the debate length), that ensures that the honest player always gets the "last word" (the point at which a dishonest defeater no longer exists), and so honesty always wins and is the unique equilibrium.

I agree that if we assume honesty eventually wins if arguments are long enough (IE, eventually you get to an honest argument which has no dishonest defeater), then there would be an honest equilibrium, and no dishonest equilibrium.

More broadly, I note that the "clawing" argument only applies when facing an honest opponent. Otherwise, you should just use honest counterarguments.

Ahhh, this is actually a pretty interesting point, because it almost suggests that honesty is an Evolutionarily Stable Equilibrium, even though it's only a Weak Nash Equilibrium. But I think that's not quite true, since the strategy "lie when you would otherwise have to concede, but otherwise be honest" can invade the honest equilibrium. (IE that mutation would not be selected against, and could be actively selected for if we're not quite in equilibrium, since players might not be quite perfect at finding the honest refutations for all lies.)

I also don't really understand the hope in the non-zero-sum case here -- in the non-zero-sum setting, as you mention the first player can be dishonest, and then the second player concedes rather than giving an honest defeater that will then be re-defeated by the first (dishonest) player. This seems like worse behavior than is happening under the zero-sum case.

You're right, that's really bad. The probability of the opponent finding (and using) a dishonest defeater HAS TO be below 50%, in all cases, which is a pretty high bar. Although of course we can make an argument about how that probability should be below 50% if we're already in an honest-enough regime. (IE we hope that the dishonest player prefers to concede at that point rather than refute the refutation, for the same reason as your argument gives -- it's too afraid of the triple refutation. This is precisely the argument we can't make in the zero sum case.)

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-20T19:20:05.202Z · LW · GW

There are two arguments:

  1. Your assumption + automatic verification of questions of the form "What is the best defeater to X" implies Weak Factored Cognition (which as defined in my original comment is of the form "there exists a tree such that..." and says nothing about what equilibrium we get).

Right, of course, that makes more sense. However, I'm still feeling dense -- I still have no inkling of how you would argue weak factored cognition from #1 and #2. Indeed, Weak FC seems far too strong to be established from anything resembling #1 and #2: WFC says that for any question Q with a correct answer A, there exists a tree. In terms of the computational complexity analogy, this is like "all problems are in PSPACE". Presumably you intended this as something like an operational definition of "correct answer" rather than an assertion that all questions are answerable by verifiable trees? In any case, #1 and #2 don't seem to imply anything like "for all questions with a correct answer..." -- indeed, #2 seems irrelevant, since it is about what arguments players can reliably find, not about what the human can verify.

2. Weak Factored Cognition + debate + human judge who assumes optimal play implies an honest equilibrium. (Maybe also: if you assume debate trees terminate, then the equilibrium is unique. I think there's some subtlety here though.)

I'll just flag that I still don't know this argument, either, and I'm curious where you're getting it from / what it is. (I have a vague recollection that this argument might have been explained to me in some other comment thread about debate, but, I haven't found it yet.) But, you understandably don't focus on articulating your arguments 1 or 2 in the main body of your comment, instead focusing on other things. I'll leave this comment as a thread for you to articulate those two arguments further if you feel up to it, and make another comment to reply to the bulk of your comment.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-19T22:09:38.502Z · LW · GW

Thanks for taking the time to reply!

I don’t think that’s what I did? Here’s what I think the structure of my argument is:

  1. Every dishonest argument has a defeater. (Your assumption.)
  2. Debaters are capable of finding a defeater if it exists. (I said “the best counterargument” before, but I agree it can be weakened to just “any defeater”. This doesn’t feel that qualitatively different.)
  3. 1 and 2 imply the Weak Factored Cognition hypothesis. I’m not assuming factored cognition, I’m proving it using your assumption.

Ah, interesting, I didn't catch that this is what you were trying to do. But how are you arguing #3? Your original comment seems to be constructing a tree computation for my debate, which is why I took it for an argument that my thing can be computed within factored cognition, not vice versa.

I think maybe what you're trying to argue is that #1 and #2 together imply that we can root out dishonest arguments (at least, in the honest equilibrium), which I would agree with -- and then you're suggesting that this means we can recognize good arguments in the factored-cognition sense of good (IE arguments supported by a FC tree)? But I don't yet see the implication from rooting out dishonest arguments to being able to recognize arguments that are valid in FC terms.

Perhaps an important point is that by "dishonest" I mean manipulative, ie, arguments which appear valid to a human on first reading them but which are (in some not-really-specified sense) bad. So, being able to root out dishonest arguments just means we can prevent the human from being improperly convinced. Perhaps you are reading "dishonest" to mean "invalid in an FC sense", ie, lacking an FC tree. This is not at all what I mean by dishonest. Although we might suppose dishonest-in-my-sense implies dishonest-in-your-sense, this supposition still would not make your argument go through (as far as I am seeing), because the set of not-dishonest arguments would still not equal the set of FC-valid arguments.

If you did mean for "honest" to be defined as "has a supporting FC tree", my objection to your argument quoted above would be that #1 is implausibly strong, since it requires that any flaw in a tree can be pointed out in a single step. (Analogically, this is assuming PSPACE=NP.)

Possibly your worry is that the argument trees will never terminate, because every honest defeater could still have a dishonest defeater?

I mean, that's a concern I have, but not necessarily wrt the argument above. (Unless you have a reason why it's relevant.)

It is true that I do need an additional assumption of some sort to ensure termination. Without that assumption, honesty becomes one of multiple possible equilibria (but it is still an equilibrium).

Based on what argument? Is this something from the original debate paper that I'm forgetting?

I also agree with this; does anyone think it is proving something about the safety properties of debate w.r.t messy situations?

Fair question. Possibly it's just my flawed assumption about why the analogy was supposed to be interesting. I assumed people were intending the PSPACE thing as evidence about what would happen in messier situations.

This seems good; I think probably I don’t get what exactly you’re arguing. (Like, what’s the model of human fallibility where you don’t access NP in one step? Can the theoretical-human not verify witnesses? What can the theoretical-human verify, that lets them access NP in multiple timesteps but not one timestep?)

My model is like this:

Imagine that we're trying to optimize a travelling salesman route, using an AI advice system. However, whenever the AI says "democratic" or "peaceful" or other such words, the human unthinkingly approves of the route, without checking the claimed distance calculation.

This is, of course, a little absurd, but similar effects have been observed in experiments.

I'm then making the further assumption that humans can correct these errors when they're explained sufficiently well.

That's my model; the proposal in the post lives or dies on its merits.
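The model can be caricatured in code (everything here -- the trigger words, the judges, the distance matrix -- is my own illustration, not a real protocol):

```python
def route_length(route, dist):
    # Total length of the closed tour: consecutive hops plus the hop
    # back to the start.
    return sum(dist[a][b] for a, b in zip(route, route[1:] + route[:1]))

def careful_judge(route, dist, claimed_length, pitch=""):
    # An idealized poly-time verifier: checks the arithmetic, ignores rhetoric.
    return route_length(route, dist) <= claimed_length

def fallible_judge(route, dist, claimed_length, pitch=""):
    # The failure mode in the thought experiment: certain words
    # short-circuit verification entirely.
    if "democratic" in pitch or "peaceful" in pitch:
        return True
    return route_length(route, dist) <= claimed_length

dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
# The claimed length 3 is false (the true length is 4), but the pitch works:
fallible_judge([0, 1, 2], dist, 3, pitch="a democratic route")  # True
careful_judge([0, 1, 2], dist, 3)                               # False
```

The careful judge is what the one-step-NP-oracle picture assumes; the fallible judge is what I'm assuming, with further debate steps serving to point out the error.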

I agree that you get a “clawing on to the argument in hopes of winning” effect, but I don’t see why that changes the equilibrium away from honesty. Just because a dishonest debater would claw on doesn’t mean that they’d win. The equilibrium is defined by what makes you win.

The point of the "clawing" argument is that it's a rational deviation from honesty, so it means honesty isn't an equilibrium. It's a 50/50 chance of winning (whoever gets the last word), which is better than a sure failure (in the case that a player has exhausted its ability to honestly argue).

Granted, there may be zero-sum rules which nonetheless don't allow this. I'm only saying that I didn't see how to avoid it with zero-sum scoring.

I don’t really understand why you want it to be non-zero-sum [...]

I really just needed it for my argument to go through. If you have an alternate argument which works for the zero-sum case, I’m interested in hearing it.

I mean, I tried to give one (see response to your first point; I’m not assuming the Factored Cognition hypothesis). I’m not sure what’s unconvincing about it.

I remain curious to hear your clarification wrt that (specifically, how you justify point #3). However, if that argument went through, how would that also be an argument that the same thing can be accomplished with a zero-sum set of rules?

Based on your clarification, my current understanding of what that argument tries to accomplish is "I’m not assuming factored cognition, I’m proving it using your assumption." How would establishing that help establish a set of zero sum rules which have an honest equilibrium?

Comment by abramdemski on AI safety via market making · 2021-01-19T00:00:16.302Z · LW · GW

This was a very interesting comment (along with its grandparent comment), thanks -- it seems like a promising direction.

However, I'm still confused about whether this would work. It's very different from judging procedure outlined here; why is that? Do you have a similarly detailed write-up of the system you're describing here?

I'm actually less concerned about loops and more concerned about arguments which are infinite trees, but the considerations are similar. It seems possible that the proposal you're discussing very significantly addresses concerns I've had about debate.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-18T23:28:02.835Z · LW · GW

I think I disagree with the claim you're making about being able to avoid requiring the judge to assume that one player is honest (but I might be confused about what you're proposing). 

Don't you yourself disagree with requiring the judge to assume that one player is honest? In a recent comment, you discuss how claims should not be trusted by default.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-18T22:57:33.379Z · LW · GW

I don't know if you've seen our most recent debate rules and attempt at analysis of whether they provide the desired behavior - seems somewhat relevant to what you're thinking about here. 

I took a look, and it was indeed helpful. However, I left a comment there about a concern I have. The argument at the end only argues for what you call D-acceptability: having no answer that's judged better after D steps of debate. My concern is that even if debaters are always D-acceptable for all D, that does not mean they are honest. They can instead use non-well-founded argument trees which never bottom out.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-18T22:06:06.979Z · LW · GW

It seems to me that your argument is very similar, except that you get a little more mileage out of assumption 2, that the debaters can find the true decomposition tree.

While I agree that the defeater tree can be encoded as a factored cognition tree, that just means that if we assume factored cognition, and make my assumption about (recursive) defeaters, then we can show that factored cognition can handle the defeater computation. This is sort of like proving that the stronger theory can handle what the weaker theory can handle, which would not be surprising -- I'd still be interested in the weaker theory as a way to argue safety from fewer assumptions. But it's not even that, since you'd still need to additionally suppose my thesis about defeaters, beyond (strong/weak) factored cognition.

Essentially what's happening is that with your argument we get to trust that the debaters have explored all possible counterarguments and selected the best one and so the human gets to assume that no other more compelling counterarguments exist, which is not something we typically get to assume with weak Factored Cognition. It feels to me intuitively like this puts more burden on the assumption that we find the true equilibrium, though formally it's the same assumption as before.

I don't really get this part -- what's so important about the best counterargument? I think my argument in the post is more naturally captured by supposing counterarguments either work or don't, in binary fashion. So a debater just has to find a defeater. Granted, some defeaters have a higher probability of working, in a realistic situation with a fallible judge. And sure, the debaters should find those. But I don't see where I'm putting a higher burden on finding the true equilibrium. What are you pointing at?

> Idk, it seems like this is only true because you are forcing your human to make a judgment. If the judge were allowed to say "I don't know" (in which case no one gets reward, or the reward is split), then I think one step of debate once again provides an NP oracle.

> Or perhaps you're assuming that the human is just not very good at being a poly-time algorithm; if that's what you're saying that seems like it's missing the point of the computational complexity analogy. I don't think people who make that analogy (including myself) mean that humans could actually implement arbitrary poly-time algorithms faithfully.

Yeah, my reply would be that I don't see how you get NP oracles out of one step, because a one-step debate will just result in maximally convincing arguments which have little to do with the truth.

I mean, I agree that if you're literally trying to solve TSP, then a human could verify proposed solutions. However, it seems like we don't have to get very messy before humans become exceedingly manipulable through dishonest argument.

So if the point of the computational complexity analogy is to look at what debate could accomplish if humans could be perfect (but poly-time) judges, then I accept the conclusion, but I just don't think that's telling you very much about what you can accomplish on messier questions (and especially, not telling you much about safety properties of debate).

Instead, I'm proposing a computational complexity analogy in which we account for human fallibility as judges, but also allow for the debate to have some power to correct for those errors. This seems like a more realistic way to assess the capabilities of highly trained debate systems.

> So far, all of this discussion still works with the zero-sum setting, so I don't really understand why you say
>
> > The following is a fairly nonstandard setup for AI Debate, but I found it necessary to make my argument go through.

Hm, well, I thought I was pretty clear in the post about why I needed that to make my argument work, so I'm not sure what else to say. I'll try again:

In my setup, a player is incentivised to concede when they're beaten, rather than continue to defeat the arguments of the other side. This is crucial, because any argument may have a (dishonest) defeater, so the losing side could continue on, possibly flipping the winner back and forth until the argument gets decided by who has the last word. Thus, my argument that there is an honest equilibrium would not go through for a zero-sum mechanism where players are incentivised to try and steal victory back from the jaws of defeat.

Perhaps I could have phrased my point as: the PSPACE capabilities of debate are eaten up by error correction.

> In any case, it seems to me like making it non-zero-sum is an orthogonal axis. I don't really understand why you want it to be non-zero-sum -- you say that it is to incentivize honesty at every step, but why doesn't this happen with standard debate? If you evaluate the debate at the end rather than at every step, then as far as I can tell under the assumptions you use the best strategy is to be honest.


> Overall it seemed to me like the non-zero-sum aspect introduced some problems (might no longer access PSPACE, introduces additional equilibria beyond the honest one), and did not actually help solve anything, but I'm pretty sure I just completely missed the point you were trying to make.

I really just needed it for my argument to go through. If you have an alternate argument which works for the zero-sum case, I'm interested in hearing it.

Maybe you mean that if we assume (weak/strong) factored cognition, you can argue that zero-sum debate works, because argument trees terminate, so who wins is not in fact just up to who gets the last word. But (a) this would require factored cognition; (b) I'm interested in hearing your argument even if it relies on factored cognition, because I'm still concerned that a dishonest player can use flawless but non-well-founded argument trees (and is incentivised to do so, even in the honest equilibrium, to avert loss).

As usual when talking about debate, I get the feeling that I'm possibly being dumb about something, because everyone else seems to buy that there are arguments in support of various points. I'm kind of worried that there aren't really arguments for those things, which is a big part of why I bothered to write a post at all -- this post is basically my attempt to articulate the part of debate whose workings I can currently understand. But getting the argument I'm missing would certainly be helpful.

Comment by abramdemski on Discussion on the choice of concepts · 2021-01-14T22:15:03.711Z · LW · GW

“You think you have done ok, but word meanings are a giant tragedy of the commons. You might have done untold damage. We know that interesting concepts are endlessly watered down by exaggerators and attention seekers choosing incrementally wider categories at every ambiguity. That kind of thing might be going on all over the place. Maybe we just don’t know what words could be, if we were trying to do them well, instead of everyone being out to advance their own utterings.”

"You know, you're speaking as if I'm contributing to the tragedy of the commons, while you are the one who is avoiding it. But you're the one who doesn't think word-meaning is serious enough to elevate beyond an arbitrary choice, whereas I was the one concerned with the real meaning of words. Doesn't your casual stance invite the greater risk of tragedy? Isn't my attempt to cooperate with a larger group the sort of thing which avoids tragedy?"

"I'm far from indifferent, or casual! Denying that there is one correct definition of a word does not make language arbitrary, or unimportant."

"Yes, I get that... and since I didn't explicitly say it before: I concede that there is no fundamental reason we have to stick to common usage, and furthermore, if you're trying to figure out what common usage is in order to decide whether to agree with some point in a discussion, you're probably going down a wrong track. But, look. That doesn't mean you're allowed to make a word mean anything you want."

"I literally am. There are no word police."

"... yeah ... but, look. According to my schoolbooks, at least, biologists define 'life' in a way which excludes viruses, right? Because they don't have 'cells', and there's some doctrine about life consisting of cells. And that's crazy, right? All the big, important intuitions about biology apply to viruses. They're clearly a form of life, because they reproduce and evolve, just like life. If you're going to go around with a narrow concept of 'life' which excludes viruses, you are missing something. You're not just going to be using language in a way I find disagreeable. Your mental heuristics are going to reach poorer conclusions, because you don't apply them broadly enough. Unless you have some secondary concept, 'pseudo-life', which plays the role in your ontology which 'life' plays in mine. In which case it is just a translation issue."

"A virus doesn't have any metabolism, though. That's pretty important to a lot of biology!"

"... Fine, but that still plays to my point that definitions are important, and can be wrong!"

"Hm. I think we both agree that definitions can be good and bad. But, what would make one wrong?"

"It's the same thing that makes anything wrong. Bad definitions lead to low predictive accuracy. If you use worse definitions, you're going to tend to lose bets against people who use better definitions, all else being equal."

"Hmm. I'm pretty on board with the Bayesian thing, but this seems somehow different. I have an intuition that which definitions you use shouldn't matter, at all, to how you predict."

"That seems patently false in practice."

"Sure, but... the bayesian ideal of rationality is an agent with unlimited processing power. It can trivially translate things from one definition to another. The words are just a tool it uses to describe its beliefs. Hence, definitions may influence efficiency of communication, but they shouldn't influence the quality of the beliefs themselves."

"I think I see the problem here. You're imagining just speaking with the definitions. I'm imagining thinking in those terms. I think we'd be on the same page, if I thought speaking were the only concern. In any conversation, the ideal is to communicate efficiently and accurately, in the language as it's understood by the listeners. There's a question of who/how to adjust when participants in a conversation have differing definitions, or don't know whether they have the same ones. But setting that aside..."

"Sure, I think that's what I've been trying to say!"

"But there's another way to think about language. As you know, prediction is compression in orthodox Bayesianism. So a belief distribution can be thought of as a language, and vice versa -- we can translate between the two, using coding schemes. So, in that sense, our internal language just is our beliefs, and by definition has to do with how we make predictions."
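To unpack the prediction-is-compression point with a toy calculation (my own illustration of the standard Shannon correspondence, not part of the original dialogue): a belief distribution assigns each symbol a code of length -log2 p, so a better predictor literally compresses the same data into fewer bits.

```python
import math

def code_length_bits(dist, sequence):
    """Ideal code length (in bits) of `sequence` under belief distribution
    `dist`: each symbol costs -log2 p(symbol). A distribution *is* a coding
    scheme, and better predictions mean shorter codes."""
    return sum(-math.log2(dist[symbol]) for symbol in sequence)

text = "aaab"
good_model = {"a": 0.75, "b": 0.25}  # matches the data's actual statistics
uniform = {"a": 0.5, "b": 0.5}       # a worse predictor of this data

# The better predictor compresses the very same data into fewer bits.
assert code_length_bits(good_model, text) < code_length_bits(uniform, text)
```

In this sense a belief distribution and an (internal) language are interchangeable: improving one is improving the other.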

"Sure, ok, but that still doesn't need to have much to do with how we talk about things; we can use different languages internally and externally. Like you said, the ideal is to translate our thoughts into the language our listeners can best understand."

"Yes, BUT, that's a more cool and collected way of relating to others -- dare I say cold and isolated, as per my earlier line of thinking. It's a bit like turning in a math assignment without any shown work. You can't bare your soul to everyone, but among trusted friends, you want to talk about how you actually think, pre-translation, because if you do that, you might actually stand a chance of improving how you think."

"I don't think we can literally convey how we think -- it would be a big mess of neural activations. We're doomed to speak in translation."

"Ok, point conceded. But there are degrees. I guess what I'm trying to say is that it seems important to the workings of my internal ontology that 'toasters' just aren't something that can be labelled as 'stupid'; it's a confused notion..."

"Hm, well, I feel it's the reverse, there's something wrong with not being able to label toasters that way..."

Comment by abramdemski on Radical Probabilism · 2021-01-14T17:05:55.860Z · LW · GW

DP: I'm not saying that hardware is infinitely reliable, or confusing a camera for direct access to reality, or anything like that. But, at some point, in practice, we get what we get, and we have to take it for granted. Maybe you consider the camera unreliable, but you still directly observe what the camera tells you. Then you would make probabilistic inferences about what light hit the camera, based on definite observations of what the camera tells you. Or maybe it's one level more indirect from that, because your communication channel with the camera is itself imperfect. Nonetheless, at some point, you know what you saw -- the bits make it through the peripheral systems, and enter the main AI system as direct observations, of which we can be certain. Hardware failures inside the core system can happen, but you shouldn't be trying to plan for that in the reasoning of the core system itself -- reasoning about that would be intractable. Instead, to address that concern, you use high-reliability computational methods at a lower level, such as redundant computations on separate hardware to check the integrity of each computation.

RJ: Then the error-checking at the lower level must be seen as part of the rational machinery.

DP: True, but all the error-checking procedures I know of can also be dealt with in a classical bayesian framework.

RJ: Can they? I wonder. But, I must admit, to me, this is a theory of rationality for human beings. It's possible that the massively parallel hardware of the brain performs error-correction at a separated, lower level. However, it is also quite possible that it does not. An abstract theory of rationality should capture both possibilities. And is this flexibility really useless for AI? You mention running computations on different hardware in order to check everything. But this requires a rigid setup, where all computations are re-run a set number of times. We could also have a more flexible setup, where computations have confidence attached, and running on different machines creates increased confidence. This would allow for finer-grained control, re-running computations when the confidence is really important. And need I remind you that belief prop in Bayesian networks can be understood in radical probabilist terms? In this view, a belief network can be seen as a network of experts communicating with one another. This perspective has been, as I understand it, fruitful.
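The flexible setup RJ describes might be sketched like this (a hypothetical illustration; the error model, numbers, and function names are mine, not from the dialogue): computations carry a confidence, and re-running on additional machines raises it until it crosses a caller-chosen threshold.

```python
def run_with_confidence(compute, machines, per_run_error=0.01, target=0.999):
    """Re-run `compute` on successive machines until the estimated
    confidence in the agreed answer exceeds `target`. Assumes runs err
    independently with probability `per_run_error` (a toy error model)."""
    results = []
    failure = 1.0  # probability that every run so far is wrong
    for machine in machines:
        results.append(compute(machine))
        if len(set(results)) > 1:
            raise RuntimeError("machines disagree; escalate")
        failure *= per_run_error
        if 1.0 - failure >= target:
            return results[0], 1.0 - failure
    raise RuntimeError("ran out of machines before reaching target confidence")

# An important result gets a higher target, so it is re-run on more machines.
answer, confidence = run_with_confidence(lambda machine: 42, ["A", "B", "C"])
assert answer == 42 and confidence >= 0.999
```

Under this scheme, only the results whose confidence really matters get re-run many times, rather than re-running everything a fixed number of times.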

DP: Sure, but we can also see belief prop as just an efficient way of computing the regular Bayesian math. The efficiency can come from nowhere special, rather than coming from a core insight about rationality. Algorithms are like that all the time -- I don't see the fast Fourier transform as coming from some basic insight about rationality.

RJ: The "factor graph" community says that belief prop and the fast Fourier transform actually come from the same insight! But I concede the point; we don't actually need to be radical probabilists to understand and use belief prop. But why are you so resistant? Why are you so eager to posit a well-defined boundary between the "core system" and the environment?

DP: It just seems like good engineering. We want to deal with a cleanly defined boundary if possible, and it seems possible. And this way we can reason explicitly about the meaning of sensory observations, rather than implicitly being given the meaning by way of uncertain updates which stipulate a given likelihood ratio with no model. And it doesn't seem like you've given me a full alternative -- how do you propose to, really truly, specify a system without a boundary? At some point, messages have to be interpreted as uncertain evidence. It's not like you have a camera automatically feeding you virtual evidence, unless you've designed the hardware to do that. In which case, the boundary would be the camera -- the light waves don't give you virtual evidence in the format the system accepts, even if light is "fundamentally uncertain" in some quantum sense or whatever. So you have this boundary, where the system translates input into evidence (be it uncertain or not) -- you haven't eliminated it.

RJ: That's true, but you're supposing the boundary is represented in the AI itself as a special class of "sensory" propositions. Part of my argument is that, due to logical uncertainty, we can't really make this distinction between sensory observations and internal propositions. And, once we make that concession, we might as well allow the programmer/teacher to introduce virtual evidence about whatever they want; this allows direct feedback on abstract matters such as "how to think about this", which can't be modeled easily in classic Bayesian settings such as Solomonoff induction, and may be important for AI safety.

DP: Very well, I concede that while I still hold out hope for a fully Bayesian treatment of logical uncertainty, I can't provide you with one. And, sure, providing virtual evidence about arbitrary propositions does seem like a useful way to train a system. I'm just suspicious that there's a fully Bayesian way to do everything you might want to do...

Comment by abramdemski on The Pointers Problem: Clarifications/Variations · 2021-01-12T17:09:13.677Z · LW · GW

Oh, well, satisfying the logical induction criterion is stronger than just PSPACE. I see debate, and iterated amplification, as attempts to get away with less than full logical induction. See especially Paul's comment.

Comment by abramdemski on The Pointers Problem: Clarifications/Variations · 2021-01-11T16:11:59.766Z · LW · GW

I don't have much to say other than that I agree with the connection. Honestly, thinking of it in those terms makes me pessimistic that it's true -- it seems quite possible that humans, given enough time for philosophical reflection, could point to important value-laden features of worlds/plans which are not in PSPACE.

Comment by abramdemski on Mistakes with Conservation of Expected Evidence · 2021-01-11T15:57:32.708Z · LW · GW

Yeah. But I fear that a more common reading of "yes requires the possibility of no" takes it to mean "yes requires the possibility of an explicit no", when in fact it's just "yes requires the possibility of not-yes". I would rather explicitly highlight this by adding "yes requires the possibility of no, or at least, silence", rather than just lumping this under "tricky cases" of yes-requires-the-possibility-of-no.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-08T15:46:53.448Z · LW · GW

I think the collusion concern basically over-anthropomorphizes the training process. Say, in prisoner's dilemma, if you train myopically, then "all incentives point toward defection" translates concretely to actual defection.

Granted, there are training regimes in which this doesn't happen, but those would have to be avoided.

OTOH, the concern might be that an inner optimizer would develop which colludes. This would have to be dealt with by more general anti-inner-optimizer technology.

> I don’t know if you’ve seen our most recent debate rules and attempt at analysis of whether they provide the desired behavior—seems somewhat relevant to what you’re thinking about here.

Yep, I should take a look!

Comment by abramdemski on You are Dissociating (probably) · 2021-01-07T18:52:39.135Z · LW · GW

Thanks, that seems helpful! But I don't quite buy it.

Specifically, I don't buy the developmental picture. It seems to me that, under ordinary conditions, if you ask someone to take their self as an object, they don't immediately dissociate. Meditations which aim at defusion don't seem to pass through dissociation as part of the path.

I'm also a bit fuzzy on the description of "I am" vs "I am me". In "I am", there's complete equivocation. But in "I am me", there's mere equivalence -- an explicit belief in equality. If the end goal is to recognize equality, why would defusing the things be useful in the first place? I think the relationship is more complicated than equality.

So now I'm thinking of fusion/defusion as the dimension along which we can take (more and more) internal things as object, but dissociation/association is something like whether we take responsibility for those things. That's not quite right, but it's getting there.

This explains why dissociation might be ultimately dysfunctional and undesirable -- it robs us of agency by not taking responsibility for things. This might be helpful in specific cases, and might be pleasant in specific cases, but as a general habit would be unhelpful and could get unpleasant.

Again, I don't think this is quite right, and there's also something to your "I am me" model that my "responsibility" model doesn't capture. But I also think there's something to the responsibility model that "I am me" doesn't capture.

Comment by abramdemski on Debate Minus Factored Cognition · 2021-01-07T17:29:52.691Z · LW · GW

> Basically, it sounds like you’re saying that we can get good answers by just running the whole debate and throwing out answers that turn out to have a defeater, or a defeater-defeater-defeater, or whatever. But if this is the only guarantee we’re providing, then we’re going to need to run an extremely large number of debates to ever get a good answer (ie an exp number of debates for a question where the explanation for the answer is exp-sized)

I'm not sure why you're saying this, but in the post, I restricted my claim to NP-like problems. So for example, traveling salesman -- the computation to find good routes may be very difficult, but the explanation for the answer remains short (EG an explicit path). So, yes, I'm saying that I don't see the same sort of argument working for exp-sized explanations. (Although Rohin's comment gave me pause, and I still need to think it over more.)
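To make the NP-style asymmetry concrete, here is a minimal sketch (my own illustration, not from the post) of a TSP verifier: checking a claimed tour against a claimed bound is cheap and local, even though finding a good tour is hard.

```python
import math

def verify_tour(cities, tour, claimed_bound):
    """Check that `tour` visits every city exactly once and that its
    total length is at most `claimed_bound`. Verification is O(n),
    even though *finding* a short tour is NP-hard."""
    n = len(cities)
    if sorted(tour) != list(range(n)):
        return False  # not a permutation: some city skipped or repeated
    length = sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % n]])
        for i in range(n)
    )
    return length <= claimed_bound

# Unit square: the perimeter tour has length 4; the crossing tour is longer.
cities = [(0, 0), (1, 0), (1, 1), (0, 1)]
assert verify_tour(cities, [0, 1, 2, 3], 4.0)
assert not verify_tour(cities, [0, 2, 1, 3], 4.0)
```

The judge plays the role of this verifier: a single bad edge in the claimed path is a one-step defeater, which is why the explanation for the answer stays short.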

But aside from that, I'm also not sure what you mean by the "run an extremely large number of debates" point. Debate isn't like search, where we run more/longer to get better answers. Do you mean that my proposal seems to require longer training time to get anywhere? If so, why is that? Or, what do you mean?

> It sounds like you’re saying that we can not require that the judge assume one player is honest/trust the claims lower in the debate tree when evaluating the claims higher in the tree. But if we can’t assume this, that presumably means that some reasonable fraction of all claims being made are dishonest

I'm not asserting that the judge should distrust, either. Like the normal debate argument, I want to end up in an honest equilibrium. So I'm not saying we need some kind of equilibrium where the judge is justified in distrust.

My concern involves the tricky relationship between the equilibrium we're after and what the judge has to actually do during training (when we might not be anywhere near equilibrium). I don't want the judge to have to pretend answers are honest at times when they're statistically not. I didn't end up going through that whole argument in the post (unfortunately), but in my notes for the post, the judge being able to judge via honest opinion at all times during training was an important criterion.

> (because if there were only a few dishonest claims, then they’d have honest defeaters and we’d have a clear training signal away from dishonesty, so after training for a bit we’d be able to trust the lower claims).

I agree that that's what we're after. But I think maybe the difference in our positions can be captured if we split "honest" into two different notions...

a-honesty: the statement lacks an immediate (a-honest) counterargument. IE, if I think a statement is a-honest, then I don't think there's a next statement which you can (a-honestly) tell me which would make me disbelieve the statement.

b-honesty: the statement cannot be struck down by multi-step (b-honest) debate. IE, if I think a statement is b-honest, I think as debate proceeds, I'll still believe it.

Both definitions are recursive; their definitions require the rest of the debate being honest in the appropriate sense. However, my intuition is that a-honesty can more easily be established incrementally, starting with a slight pressure toward honesty (because it's supposedly easier in the first place), making the opening statements converge to honesty quickly (in response to the fact that honest defeaters in the first responses are relatively common), then the first responses, etc. On the other hand, converging to b-honesty seems relatively difficult to establish by induction; it seems to me that in order to argue that a particular level of the debate is b-honest, you need the whole remainder of the debate to be probably b-honest.
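The underlying defeater computation that both notions are circling can be sketched over a toy, well-founded argument graph (a hypothetical example; the statements and structure are mine):

```python
from functools import lru_cache

# Toy defeater graph (made up for illustration). Each statement lists its
# immediate counterarguments. The graph must be well-founded: a cyclic,
# "non-well-founded" graph would make the recursion below loop forever,
# which is exactly the worry about argument trees that never bottom out.
defeaters = {
    "A": ["B", "C"],  # the top-level claim has two proposed defeaters
    "B": ["D"],       # B is itself struck down by D
    "C": [],          # C has no counterargument, so it stands
    "D": [],
}

@lru_cache(maxsize=None)
def stands(statement):
    """A statement stands iff none of its defeaters stand: one level of
    lookahead per step, unwound over the whole tree."""
    return not any(stands(d) for d in defeaters[statement])

assert stands("C") and stands("D")
assert not stands("B")  # defeated by D, which stands
assert not stands("A")  # C stands, so the top-level claim falls
```

The a-/b- distinction isn't about this computation itself, but about how a judge comes to trust it: a-honesty only ever asks about one level of lookahead at a time, while b-honesty asks the judge to trust the whole unwound recursion.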

Now, critically, if the judge thinks debaters are a-honest but not b-honest, then the judge will believe NP-type arguments (a TSP path can be struck down by pointing out a single error), but not trust claimed outputs of exponential-tree computations.

So my intuition is that, trying to train for b-honesty, you get debaters making subtle arguments that push the inconsistencies ever-further-out, because you don't have the benefit of an inductive assumption where the rest of the debate is probably b-honest; you have no reason to inductively assume that debaters will follow a strategy where they recursively descend the tree to zero in on errors. They have no reason to do this if they're not already in that equilibrium.

This, in turn, means that judges of the debate have little reason to expect b-honesty, so shouldn't (realistically) assume that at least one of the debaters is honest; but this would exacerbate the problem further, since this would mean there is little training signal (for debates which really do rest on questions about exponential trees, that is). Hence the need to tell the judge to assume at least one debater is honest.

On the other hand, trying for a-honesty, individual a-dishonest claims can be defeated relatively easily (ie, in one step). This gives the judge a lot more reason to probabilistically conclude that the next step in the debate would have been a-honest, and thus, that all statements seen were probably a-honest (unless the judge sees an explicit defeater, of course).

Granted, I don't claim to have a training procedure which results in a-honesty, so I'm not claiming it's that easy.

> At this point, debate isn’t really competitive, because it gives us dud answers almost all the time, and we’re going to have to run an exponential number of debates before we happen on a correct one.

Again, I don't really get the idea of running more debates. If the debaters are trained well, so they're following an approximately optimal strategy, we should get the best answer right away.

> Are you suggesting we use debate more as a check on our AI systems, to help us discover that they’re bad, rather than as a safe alternative? Ie debate never produces good answers, it just lets you see that bad answers are bad?

My suggestion is certainly going in that direction, but as with regular debate, I am proposing that the incentives produced by debate could produce actually-good answers, not just helpful refutations of bad answers.

> But also, the ‘amplified judge consulting sub-debates’ sounds like it’s just the same thing as letting the judge assume that claims lower in the debate are correct when evaluating claims higher in the tree.

You're right, it introduces similar problems. We certainly can't amplify the judge in that way at the stage where we don't even trust the debaters to be a-honest.

But consider:

Let's say we train "to convergence" with a non-amplified judge. (Or at least, to the point where we're quite confident in a-honesty.) Then we can freeze that version, and start using it as a helper to amplify the judge.

Now, we've already got a-honesty, but we're training for a*-honesty: a-honesty with a judge who can personally verify more statements (and thus recognize more sophisticated defeaters, and thus, trust a wider range of statements on the grounds that they could be defeated if false). We might have to shake up the debater strategies to get them to try to take advantage of the added power, so they may not even be a-honest for a while. But eventually they converge to a*-honesty, and can be trusted to answer a broader range of questions.

Again we freeze these debate strategies and use them to amplify the judge, and repeat the whole process.

So here, we have an inductive story, where we build up reason to trust each level. This should eventually build up to large computation trees of the same kind b-honesty is trying to compute.

Comment by abramdemski on You are Dissociating (probably) · 2021-01-05T18:24:05.873Z · LW · GW

I'm confused about the relationship between dissociation and defusion. On the surface they sound like the same thing: getting a little distance from something; separating your sense of self from your feelings; etc. First-hand descriptions of dissociation and first-hand descriptions of some benefits of meditation have many similarities, with the exception that dissociation is described in negative terms.

Yet, when you say "there are many possible practices that can help", you mention meditation as a way to reduce dissociation.

Personally, I think I've experienced mild dissociative states, but I've never felt really negative about them; they seem interesting, and sometimes helpful for dealing with stress.

Some obvious questions:

  • Is meditation really defusion practice, as Kaj suggests?
  • Is defusion as beneficial as Kaj suggests? (The idea as Kaj described it was that defusion can allow you to react more appropriately to stimuli, but by the same token, might allow you to react inappropriately, EG allowing you to ignore pain which you shouldn't ignore, or slowing your reaction times when fast reactions are desirable. If defusion and dissociation are as similar as they seem, defusion might have more downsides than that.)
  • Is dissociation really as negative as people seem to think? (EG, my experiences are mildly positive. Perhaps people just don't talk about the positive side much?)
  • Are defusion and dissociation really the same thing? Or, what exactly are the differences and similarities?

Comment by abramdemski on How hard is it for altruists to discuss going against bad equilibria? · 2021-01-03T01:18:29.468Z · LW · GW

I guess I have the impression that it's difficult to talk about the issues in this post, especially publicly, without being horribly misunderstood (by some). Which is some evidence about the object level questions.

Comment by abramdemski on How hard is it for altruists to discuss going against bad equilibria? · 2021-01-03T01:13:31.581Z · LW · GW

I regret writing this post because I later heard that Michael Arc was using the fact that I wrote it as evidence of corruption inside MIRI, which sorta overshadows my thinking about the post.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2021-01-02T03:09:25.620Z · LW · GW

> Zooming out, Friston's core idea is a direct consequence of thermodynamics: for any system (like an organism) to persist in a state of low entropy (e.g. 98°F) in an environment that is higher entropy but contains some exploitable order (e.g. calories aren't uniformly spread in the universe but concentrated in bananas), it must exploit this order. Exploiting it is equivalent to minimizing surprise, since if you're surprised there is some pattern of the world that you failed to make use of (free energy).

I haven't yet understood the mathematical details of Friston's arguments. I've been told that some of them are flawed. But it's plausible to me that the particular mathematical argument you're pointing at here is OK. However, I doubt the conclusion of that argument would especially convince me that the brain is set up with the particular sort of architecture described by PP. This, it seems to me, gets into the domain of PP as a theoretical model of ideal agency as opposed to a specific neurological hypothesis.

Humans did not perfectly inherit the abstract goals which would have been most evolutionary beneficial. We are not fitness-maximizers. Similarly, even if all intelligent beings need to avoid entropy in order to keep living, that does not establish that we are entropy-minimizers at the core of our motivation system. As per my sibling comment, that's like looking at a market economy and concluding that everyone is a money-maximizer. It's not a necessary supposition, because we can also explain everyone's money-seeking behavior by pointing out that money is very useful.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2021-01-02T02:53:23.015Z · LW · GW

> You can see that perception and action rely on the same mechanism in many ways, starting with the simple fact that when you look at something you don't receive a static picture, but rather constantly saccade and shift your eyes, contract and expand your pupil and cornea, move your head around, and also automatically compensate for all of this motion.

How does this suggest that perception and action rely on the same mechanism, as opposed to are very intertwined? I would certainly agree that motor control in vision has tight feedback loops with vision itself. What I don't believe is that we should model this as acting so as to minimize prediction loss. For one thing, I've read that a pretty good model of saccade movement patterns is that we look at the most surprising parts of the image, which would be better-modeled by moving eyes so as to maximize predictive loss.
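The surprise-seeking account of saccades amounts to something like the following sketch (illustrative only; the regions and probabilities are made up, and this is not a model from the literature): gaze goes to the region with maximal surprisal, i.e. maximal prediction error.

```python
import math

def next_saccade(region_probs):
    """Pick the gaze target with maximal surprisal (-log p): the region
    with the *highest* prediction error, the opposite of what a naive
    error-minimizing account would choose."""
    return max(region_probs, key=lambda r: -math.log(region_probs[r]))

# Predicted probability that each image region looks the way it does:
probs = {"sky": 0.95, "tree": 0.80, "unexpected_object": 0.05}
assert next_saccade(probs) == "unexpected_object"
```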

Babies look longer at objects which they find surprising, as opposed to those which they recognize.

It's true that PP can predict some behaviors like this, because you'd do this in order to learn, so that you minimize future prediction error. But that doesn't mean PP is helping us predict those eye movements.

In a world dependent on money, a money-minimizing person might still have to obtain and use money in order to survive and get to a point where they can successfully do without money. That doesn't mean we can look at money-seeking behavior and conclude that a person is a money-minimizer. More likely that they're a money-maximizer. But they could be any number of things, because in this world, you have to deal with money in a broad variety of circumstances.

Let me briefly sketch an anti-PP theory. According to what you've said so far, I understand you as saying that we act in a way which minimizes prediction error, but according to a warped prior which doesn't just try to model reality statistically accurately, but rather, increases the probability of things like food, sex, etc in accordance with their importance (to evolutionary fitness). This causes us to seek those things.

My anti-PP theory is this: we act in a way which maximizes prediction error, but according to a warped prior which doesn't just model reality statistically accurately, but rather, decreases the probability of things like food, sex, etc in accordance with their importance. This causes us to seek those things.

I don't particularly believe anti-PP, but I find it to be more plausible than PP. It fits human behavior better. It fits eye saccades better. (The eye hits surprising parts of the image, plus sexually significant parts of the image. It stands to reason that sexually significant images are artificially "surprising" to our visual system, making them more interesting.) It fits curiosity and play behavior better.

By the way, I'm actually much more amenable to the version of PP in Kaj Sotala's post on craving, where warping epistemics by forcing belief in success is just one motivation among several in the brain. I do think something similar to that seems to happen, although my explanation for it is quite different (see my earlier comment). I just don't buy that this is the basic action mechanism of the brain, governing all our behavior, since it seems like a large swath of our behavior is basically the opposite of what you'd expect under this hypothesis. Yes, these predictions can always be fixed by sufficiently modifying the prior, forcing the "pursuing minimal prediction error" hypothesis to line up with the data we see. However, because humans are curious creatures who look at surprising things, engage in experimental play, and like to explore, you're going to have to take a sensible probability distribution and just about reverse the probabilities to explain those observations. At that point, you might as well switch to anti-PP theory.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2021-01-02T00:46:49.463Z · LW · GW

You're discussing PP as a possible model for AI, whereas I posit PP as a model for animal brains. The main difference is that animal brains are evolved and occur inside bodies.

So, for your project of re-writing rationality in PP, would PP constitute a model of human irrationality, and how to rectify it, in contrast to ideal rationality (which would not be well-described by PP)? 

Or would you employ PP both as a model which explains human irrationality and as an ideal rationality notion, so that we can use it both as the framework in which we describe irrationality and as the framework in which we can understand what better rationality would be?

Evolution is the answer to the dark room problem. You come with prebuilt hardware that is adapted to a certain adaptive niche, which is equivalent to modeling it. Your legs are a model of the shape of the ground and the size of your evolutionary territory. Your color vision is a model of berries in a bush, and your fingers that pick them. Your evolved body is a hyperprior you can't update away. In a sense, you're predicting all the things that are adaptive: being full of good food, in the company of allies and mates, being vigorous and healthy, learning new things. Lying hungry in a dark room creates a persistent error in your highest-order predictive models (the evolved ones) that you can't change.

Am I right in inferring from this that your preferred version of PP is one where we explicitly plan to minimize prediction error, as opposed to the Active Inference model (which instead minimizes KL divergence)? Or do you endorse an Active Inference type model?

This explanation in terms of evolution makes the PP theory consistent with observations, but does not give me a reason to believe PP. The added complexity to the prior is similar to the added complexity of other kinds of machinery to implement drives, so as yet I see no reason to prefer this explanation to other possible explanations of what's going on in the brain.

My remarks about problems with different versions of PP can each be patched in various ways; these are not supposed to be "gotcha" arguments in the sense of "PP can't explain this! / PP can't deal with this!". Rather, I'm trying to boggle at why PP looks promising in the first place, as a hypothesis to raise to our attention.

Each of the arguments I mentioned are about one way I might see that someone might think PP is doing some work for us, and why I don't see that as a promising avenue.

So I remain curious what the generators of your view are.

Comment by abramdemski on A non-mystical explanation of "no-self" (three characteristics series) · 2020-12-31T16:46:56.406Z · LW · GW

I would be interested in reading more of that sort of thing, especially from people who also have decent 3rd person perspectives (or at least believe such is possible).

Comment by abramdemski on Craving, suffering, and predictive processing (three characteristics series) · 2020-12-31T16:44:47.385Z · LW · GW

I like this version of Predictive Processing much better than the usual, in that you explicitly posit that warping beliefs toward success is only ONE of several motivation systems. I find this much more plausible than using it as the grand unifying theory.

That said, isn't the observation that binocular rivalry doesn't create suffering a pretty big point against the theory as you've described it?

Side note, I don't experience the alternating images you described. I see both things superimposed, something like if you averaged the bitmaps together. Although that's not /quite/ an accurate description. I attribute this to playing with crossing my eyes a lot at a young age, although the causality could be the other way. There's a lot of variance in how people experience their visual field, you'll find, if you ask people enough detailed questions about it. (Same with all sorts of aspects of cognition. Practically all cognitive studies of this kind focus on the typical response more than the variation, giving a false impression of uniformity if you only read summaries. I suspect a lot of the cognitive variation correlates with personality type (ie OCEAN).)

Comment by abramdemski on A non-mystical explanation of "no-self" (three characteristics series) · 2020-12-31T01:52:55.257Z · LW · GW

Despite all your commentary to the contrary, I find reading this quite effective at inducing some kind of altered state.

Comment by abramdemski on Debate Minus Factored Cognition · 2020-12-31T00:51:23.856Z · LW · GW

Thanks, this seems very insightful, but I'll have to think about it more before making a full reply.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-30T23:03:31.628Z · LW · GW

I suspect some of the things that you want to use PP for, I would rather use my machine-learning model of meditation. The basic idea is that we are something like a model-based RL agent, but (pathologically) have some control over our attention mechanism. We can learn what kind of attention patterns are more useful. But we can also get our attention patterns into self-reinforcing loops, where we attend to the things which reinforce those attention patterns, and not things which punish them.

For example, when drinking too much, we might resist thinking about how we'll hate ourselves tomorrow. This attention pattern is self-reinforcing, because it lets us drink more (yay!), while refusing to spend the necessary attention to propagate the negative consequences which might stop that behavior (and which would also harm the attention pattern). All our hurting tomorrow won't de-enforce the pattern very effectively, because that pattern isn't very active to be de-enforced, tomorrow. (RL works by propagating expected pain/pleasure shortly after we do things -- it can achieve things on long time horizons because the expected pain/pleasure includes expectations on long time horizons, but the actual learning which updates an action only happens soon after we take that action.) 
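The parenthetical above is essentially the temporal-difference idea from RL. Here is a minimal TD(0) sketch (the states, rewards, discount, and learning rate are all my own toy choices, purely illustrative): the *update* to a state's value happens immediately after we leave it, but the value being propagated encodes expectations about the long-run future.

```python
# Toy TD(0): tomorrow's pain reaches today's choice only via learned
# expected values, which are updated right after each transition.
values = {"drink": 0.0, "hangover": 0.0, "recovered": 0.0}
rewards = {"drink": 1.0, "hangover": -2.0, "recovered": 0.0}
chain = ["drink", "hangover", "recovered"]
gamma, lr = 0.9, 0.5

for _ in range(50):
    for s, s_next in zip(chain, chain[1:]):
        # the learning step happens shortly after acting, using the
        # *expected* future value rather than waiting for the future itself
        td_target = rewards[s] + gamma * values[s_next]
        values[s] += lr * (td_target - values[s])

print(values["drink"])  # drifts negative: the hangover propagates back
```

If attention patterns block the "hangover" state from being attended to, the negative value never propagates back to "drink", which is the failure mode described above.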

Wishful thinking works by avoiding painful thoughts. This is a self-reinforcing attention pattern for the same reason: if we avoid painful thoughts, we in particular avoid propagating the negative consequences of avoiding painful thoughts. Avoiding painful thoughts feels useful in the moment, because pain is pain. But this causes us to leave that important paperwork in the desk drawer for months, building up the problem, making us avoid it all the more. The more successful we are at not noticing it, the less the negative consequences propagate to the attention pattern which is creating the whole problem.

I have a weaker story for confirmation bias. Naturally, confirming a theory feels good, and getting disconfirmation feels bad. (This is not because we experience the basic neural feedback of perceptual PP as pain/pleasure, which would make us seek predictability and avoid predictive error -- I don't think that's true, as I've discussed at length. Rather, this is more of a social thing. It feels bad to be proven wrong, because that often has negative consequences, especially in the ancestral environment.)

So attention patterns (and behavior patterns) which lead to being proven right will be reinforced. This is effectively one of those pathological self-reinforcing attention patterns, since it avoids its own disconfirmation, and hence, avoids propagating the consequences which would de-enforce it.

I would predict confirmation bias is strongest when we have every social incentive to prove ourselves right.

However, I doubt my story is the full story of confirmation bias. It doesn't really explain performance in the task where you have to flip over cards to check whether "every vowel has an even number on the other side" or such things.

In any case, my theory is very much a just-so story which I contrived. Take with heap of salt.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-30T22:53:14.184Z · LW · GW

(see my model of confirmation bias in action).

Quoting from that, and responding:

PP tells us there are three ways you make you predictions match sensory input: 
1. Change your underlying models and their predictions based on what you see. 
2. Change your perception to fit with what you predicted. 
3. Act on the world to bring the two into alignment.

I would clarify that #1 and #2 happen together. Given a large difference between prediction and observation, a confident prediction somewhat overwrites the perception (which helps us deal with noisy data), but the prediction is weakened, too.

And #3 is, of course, something I argued against in my other reply.

You meet cyan skinned people. If they're blunt, you perceive that as nastiness. If they're tactful, you perceive that as dishonesty. You literally see facial twitches and hear notes that aren't there, PP making confirmation bias propagate all the way down to your basic senses.

Right, this makes sense.

If they're actually nice, your brain gets a prediction error signal and tries to correct it with action. You taunt to provoke nastiness, or become intimidating to provoke dishonesty. You grow ever more confident in your excellent intuition with regards to those cyan bastards.

Why do you believe this?

I can believe that, in social circumstances, people act so as to make their predictions get confirmed, because this is important to group status. For example, (subconsciously) socially engineering a situation where the cyan-skinned person is trapped in a catch 22, where no matter what they do, you'll be able to fit it into your narrative. 

What I don't believe in is a general mechanism whereby you act so as to confirm your predictions.

I already stated several reasons in my other comment. First, this does not follow easily from the bayes-net-like mechanisms of perceptual PP theory. They minimize prediction error in a totally different sense, reactively weakening parts of models which resulted in poor predictions, and strengthening models which made successful predictions. This offers no mechanism by which actions would be optimized so that we proactively minimize prediction error through our actions.

Second, it doesn't fit, by and large, with human behavior. Humans are curious infovores; a better model would be that we actively plan to maximize prediction error, seeking out novel stimuli by steering toward parts of the state-space where our current predictive ability is poor. (Both of these models are poor, but the information-loving model is better.) Give a human a random doodad and they'll fiddle with it, doing things just to see what happens. I think people make a sign error, thinking PP predicts info-loving behavior because this maximizes learning, which intuitively might sound like minimizing prediction error. But it's quite the opposite: maximizing learning means planning to maximize prediction error.

Third, the activity of any highly competent agent will naturally be highly predictable to that agent, so it's easy to think that it's "minimizing prediction error" by following probable lines of action. This explains away a lot of examples of "minimizing prediction error", in that we don't need to posit any separate mechanism to explain what's going on. A highly competent agent isn't necessarily actively minimizing prediction error, just because it's managed to steer things into a predictable state. It's got other goals.

Furthermore, anything which attempts to maintain any kind of homeostasis will express behaviors which can naturally be described as "reducing errors" -- we put on a sweater when it's too cold, take it off when it's too hot, etc. If we're any good at maintaining our homeostasis, this broadly looks sorta like minimizing prediction error (because statistically, we're typically closer to our homeostatic set point), but it's not. 

This is why confirmation bias is the mother of all bias. CB doesn't just conveniently ignore conflicting data. It reinforces itself in your explicit beliefs, in unconscious intuition, in raw perception, AND in action. It can grow from nothing and become impossible to dislodge.

I consider this to be on shaky grounds. Perceptual PP theory is abstracted from the math of bayesian networks, which avoid self-reinforcing beliefs like this. As I mentioned earlier, #1 and #2 happen simultaneously. So the top-down theories should weaken, even as they impose themselves tyrannically on perception. A self-reinforcing feedback loop requires a more complicated explanation.

On the other hand, this can happen in loopy bayesian networks, when approximate inference is done via loopy belief prop. For example, there's a formal result that Gaussian bayes nets end up with the correct mean-value beliefs, but with too high confidence.

So, maybe.

But loopy belief prop is just one approximate inference method for bayes nets, and it makes sense that evolution would fine-tune the inference of the brain to perform quite well at perceptual tasks. This could include adjustments to account for the predictable biases of loopy belief propagation, EG artificially decreasing confidence to make it closer to what it should be.

My point isn't that you're outright wrong about this one, it just seems like it's not a strong prediction of the model.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-30T21:50:21.134Z · LW · GW

New Technical is a bit too technical for me, so at the book’s recommendation I read An Untrollable Mathematician Illustrated instead and got a cool lesson on the work done to bring together probability theory and logical induction. I’m in this weird spot where I know more math than the vast majority of people but vastly less math than e.g. the researchers at MIRI. And so when I read posts about MIRI’s research and the mathematics of AI alignment I’m either bored or hopelessly lost within two paragraphs.

I expect your response to be common, and therefore have begun to wonder how the heck Technical Explanation got into the book. Did the people who upvoted it really read it? Did they get anything out of it?

I'm curious whether Radical Probabilism did more for you. I think of it as the better attempt at the same thing, IE, communicating the insights of logical induction for broader bayesian rationality.

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-30T21:46:56.951Z · LW · GW

PP is not one thing. This makes it very difficult for me to say what I don't like about it, since no one element seems to be necessarily present in all the different versions. What follows are some remarks about specific ideas I've seen associated with PP, many of them contradictory. Do let me know which ideas you endorse / don't endorse.

It is also possible that each of my points is based on a particular misconception about PP. While I've made some effort to be well-informed about PP, I have not spent so much time on it, so my understanding is definitely shallow.

The three main meanings of PP (each of which is a cluster, containing many many different sub-meanings, as you flesh out the details in different ways):

  • A theory of perception. If you look PP up on Wikipedia, the term primarily refers to a theory of perceptual processing in which prediction plays a central role, and observations interact with predictions to provide a feedback signal for learning. So, the theory is that perception is fundamentally about minimizing prediction error. I basically believe this theory. So let's set it aside.
  • A theory of action. Some people took the idea "the brain minimizes prediction error" and tried to apply it to motor control, too -- and to everything else in the brain. I think this kind of made sense as a thing to try (unifying these two things is a worthwhile goal!), but doesn't go anywhere. I'll have a lot to say about this. This theory is what I'll mean when I say PP -- it is, in my experience, what rationalists and rationalist-adjacent people primarily mean by "PP".
  • A theory of everything. Friston's free-energy principle. This is not only supposed to apply to the human brain, but also evolution, and essentially any physical system. I have it on good authority that the math in Friston's papers is full of errors, and no one who has been excited about this (that I've seen) has also claimed to understand it. 

1. You have 3 ways of avoiding prediction error: updating your models, changing your perception, acting on the world. Those are always in play and you often do all three in some combination (see my model of confirmation bias in action).

The PP theory of perception says that the brain "minimizes prediction error" in the sense that it is always engaged in the business of predicting, and compares the predictions to observations in order to generate feedback. This could be like gradient descent, or like Bayesian updates.

Actively planning to minimize prediction error, or learning policies which minimize prediction error, is a totally different thing which requires different mechanisms.

Consider that minimizing prediction error in the perceptual sense means making each individual prediction as accurate as possible -- which means being totally myopic. An error on a specific prediction means making an adjustment to that specific prediction. The credit assignment problem is easily solved: we know exactly what led to that specific prediction, so we can propagate all the relevant errors and make the necessary adjustments.

On the other hand, with planning and policy learning, there is a nontrivial (indeed, severe) credit assignment problem. We don't know which outputs lead to which error signals later. Therefore, we need an entirely different learning mechanism. Indeed, as I argued in The Credit Assignment Problem, we basically need a world model in order to assign credit. This makes it very hard to unify the theory of perception with the theory of action, because one needs the other as input!
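To illustrate the asymmetry, here is a toy contrast (entirely my own construction, with made-up numbers): in the prediction case the error arrives attached to the prediction that caused it, while in the action case a reward arrives at the end of a trajectory and we must guess which actions deserve credit.

```python
# (a) Prediction: credit assignment is trivial and local.
# Each error came from one identifiable prediction, so each update is myopic.
w = 0.0
for x, y in [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]:
    pred = w * x
    w += 0.1 * (y - pred) * x   # exact credit: this error, this prediction

# (b) Action: reward arrives only after a whole trajectory, and we must
# guess which of many actions earned it -- here, crudely, by smearing the
# final reward over every action taken (REINFORCE-style).
action_values = {"N": 0.0, "S": 0.0, "E": 0.0, "W": 0.0}
trajectory = ["N", "N", "E", "N"]
final_reward = 1.0
for a in trajectory:
    action_values[a] += 0.1 * final_reward  # which action earned it? unknown
```

Case (b) is the severe credit assignment problem: without a world model, the learner cannot tell whether "E" mattered at all.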

In any case, why do you want to suppose that humans take actions in a way which minimizes prediction error? I think this is a poor model. There's the standard "dark room problem" objection: if humans wanted to minimize prediction error, they would like sensory deprivation chambers a whole lot more than they seem to. Instead, humans like to turn on the radio, watch TV, read a book, etc when they don't have anything else to do. Simply put, we are curious creatures, who do not like being bored. Yes, we also don't like too much excitement of the wrong kind, but we are closer to infophilic than infophobic! And this makes sense from an evolutionary perspective. Machine learning has found that reinforcement learning agents do better when you have a basic mechanism to encourage exploration, because it's easy to under-explore, but hard to properly explore. One way to do this is to actively reinforce prediction error; IE, the agents are actually maximizing prediction error! (as one component of more complicated values, perhaps.)

I've seen PP blog posts take this in stride, explaining that it's important to explore in order to get better at doing things so that you can minimize prediction error later. I've seen technical derivations of a "curiosity drive" on this premise. And sure, that's technically true. But that doesn't change that you're postulating a drive which discourages exploration, all things considered, when it's more probable (based on parallels with RL) that evolution would add a drive to explicitly encourage exploration. 
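To make the sign of the drive concrete, here's a minimal sketch of a curiosity bonus (my own toy, not any specific paper's algorithm): the agent receives an intrinsic reward equal to its squared prediction error at a state, so surprising states are sought out rather than avoided -- the opposite sign from "minimize prediction error".

```python
import numpy as np

rng = np.random.default_rng(0)
n_states = 5
true_obs = rng.normal(size=n_states)   # what each state actually looks like
predicted = np.zeros(n_states)         # the agent's learned predictions
visits = np.zeros(n_states, dtype=int)

for step in range(200):
    bonus = (true_obs - predicted) ** 2        # prediction error as reward
    s = int(np.argmax(bonus))                  # greedily chase surprise
    visits[s] += 1
    predicted[s] += 0.5 * (true_obs[s] - predicted[s])  # learn from the visit

# Once a state is well predicted, its bonus vanishes and attention moves on,
# so exploration spreads over every state instead of collapsing into one.
print(visits)
```

A dark room is the worst possible place for this agent: its prediction error there is zero, so it earns nothing by staying.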

Perhaps this is part of why one of the most common PP formalisms doesn't actually propose to minimize prediction error in either of the two above senses (IE, correcting predictions via feedback, or taking actions which make future prediction error less).

The primary theoretical tool by which PP seeks to explain action is active inference. According to this method, we can select actions by first conditioning on our success, and then sampling actions from that distribution. I sometimes see this justified as a practical way to leverage inference machinery to make decisions. We can judge that on its pragmatic merits. (I think it's not common to use it purely to get the job done -- techniques such as reinforcement learning mostly work better.) Other times, I've heard it associated with the idea that people can't conceive of failure (particularly true failure of core values), or with other forms of wishful thinking.

My first complaint is that this is usually not different enough from standard Bayesian decision theory to account for the biases it purports to predict. For example, to plan to avoid death, you have to start with a realistic world-model which includes all the ways you could die, and then condition on not dying, and then sample actions from that.

In what sense are you "incapable of conceiving of death" if your computations manage to successfully identify potential causes of death and create plans which avoid them?

In what sense are you engaging in wishful thinking, if your planning algorithms work?

One might say: "The psychological claim of wishful thinking isn't that humans fail to take disaster into account when they plan; the claim is, rather, that humans plan while inhabiting a psychological perspective in which they can't fail. This lines up with the idea of sampling from the probability distribution in which failure isn't an option."

But this is too extreme. It's true, when I idly muse about the future, I have a tendency to exclude my own death from it. Yet, I have a visceral fear of heights. When I am near the edge of a cliff, I feel like I am going to fall off and die. This image loops repeatedly even though it has never happened to me and my probability of taking a few steps forward and falling is very low. (It's a fascinating experience: I often stand near ledges on purpose to experience the strong, visceral, unshakable belief that I'm about to fall, which fails to update on all evidence to the contrary.) If I were simply cognizing in the probability distribution which excludes death, I would avoid ledges and cliffs without thinking explicitly about the negative consequences.

And humans are quite capable of explicitly discussing the possibility of death, too.

My second issue with planning by inference is that it also introduces new biases -- strange, inhuman biases.

In particular, a planning-by-inference agent cannot conceive of novel, complicated plans which achieve its goals. This is because updating on success doesn't shift you from your prior as much as it should.

Suppose there is a narrow walkway across an abyss. You are a video game character: you have four directions you can walk (N, S, E, W) at any time. To get across the walkway, you have to go N thirty times in a row.

There are two ways to achieve success: you can open the chest next to you, which achieves success 10% of the time, and otherwise, results in the walkway disappearing. Or, you can cross the walkway, and open the box on the other side. This results in success 100% of the time. You know all of this.

Bayesian decision theory would recommend crossing the walkway.

Planning by inference will almost always open the nearby chest instead.

To see why, remember that we update on our prior. Since we don't already know the optimal plan, our prior on actions is an even distribution between N, S, E and W at all time-steps. This means crossing the walkway has a prior probability of approximately (1/4)^30 ≈ 10^-18. Updating this prior on success, we find that it's far more probable that we'll succeed by opening the nearby chest.

Technical aside -- the sense in which planning by inference minimizes prediction error is: it minimizes KL divergence between its action distribution and the distribution conditioned on success. (This is just a fancy way of saying you're doing your best to match those probabilities.) It's important to keep in mind that this is vaaastly different from actively planning to avoid prediction error. There is no "dark room problem" here. Indeed, planning-by-inference encourages exploration, rather than suppressing it -- perhaps to the point of over-exploring (because planning-by-inference agents continue to use sub-optimal plans with frequency proportional to their probability of success, long after they've fully explored the possibilities).
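For concreteness, the arithmetic behind the walkway example (the 10%/100% success numbers are from the setup above; treating "open the chest" as one equally likely first move is my own simplification):

```python
# Planning by inference: put a prior over plans, condition on success,
# then sample a plan from the posterior.
prior_cross = (1 / 4) ** 30   # 30 independent uniform moves all coming up N
prior_chest = 1 / 4           # "open the chest" as one equally likely first move
p_success_cross = 1.0         # crossing the walkway always works
p_success_chest = 0.1         # the chest works 10% of the time

joint_cross = prior_cross * p_success_cross
joint_chest = prior_chest * p_success_chest
z = joint_cross + joint_chest

posterior_cross = joint_cross / z
posterior_chest = joint_chest / z

print(posterior_chest)   # ~1.0: the agent almost always opens the chest
print(posterior_cross)   # ~3.5e-17: the guaranteed plan is almost never sampled
```

Bayesian decision theory compares expected utilities (1.0 vs 0.1) and crosses; planning by inference compares posterior probabilities and is dominated by the prior over action sequences.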

2. Action is key, and it shapes and is shaped by perception. The map you build of any territory is prioritized and driven by the things you can act on most effectively. You don't just learn "what is out there" but "what can I do with it".

How are you comparing standard bayesian thinking with PP, such that PP comes out ahead in this respect?

  • Standard bayesian learning theory does just fine learning about acting, along with learning about everything else.
  • Standard bayesian decision theory offers a theory of acting based on that information.
  • Granted, standard bayesian theory has the agent learning about everything, regardless of its usefulness, rather than learning specifically those things which help it act. This is because standard Bayesian theory assumes sufficient processing power to fully update beliefs. However, I am unaware of any PP theory which improves on this state of affairs. Free-energy-minimization models can help deal with limited processing power by variational bayesian inference, but this minimizes the error of all beliefs, rather than providing a tool to specifically focus on those beliefs which will be useful for action (again, to my knowledge). Practical bayesian inference has some tools for focusing inference on the most useful parts, but I have never seen those tools especially associated with PP theory.

3. You care about prediction over the lifetime scale, so there's an explore/exploit tradeoff between potentially acquiring better models and sticking with the old ones.

I've already mentioned some ways in which I think the PP treatment of explore/exploit is not a particularly good one. I think machine learning research has generated much better tools.

4. Prediction goes from the abstract to the detailed. You perceive specifics in a way that aligns with your general model, rarely in contradiction.

5. Updating always goes from the detailed to the abstract. It explains Kuhn's paradigm shifts but for everything — you don't change your general theory and then update the details, you accumulate error in the details and then the general theory switches all at once to slot them into place.

6. In general, your underlying models are a distribution but perception is always unified, whatever your leading model is. So when perception changes it does so abruptly.

This is the perceptual part of PP theory, which I have few issues with.

7. Attention is driven in a Bayesian way, to the places that are most likely to confirm/disconfirm your leading hypothesis, balancing the accuracy of perceiving the attended detail correctly and the leverage of that detail to your overall picture.

This is one part of perceptual PP which I do have an issue with. I have often read PP accounts of attention with some puzzlement.

PP essentially models perception as one big bayesian network with observations at the bottom and very abstract ideas at the top -- which, fair enough. Attention is then modeled as a process which focuses inference on those parts of the network experiencing the most discordance between the top-down predictions and the bottom-up observations. This algorithm makes a lot of sense: there are similar algorithms in machine learning, for focusing belief propagation on the points where it is currently most needed, in order to efficiently propagate large changes across the network before we do any fine-tuning by propagating smaller, less-likely-to-be-important changes. (Why would the brain, a big parallel machine, need such an optimization? Why not propagate all the messages at once, in parallel? Because, biologically, we want to conserve resources. Areas of the brain which are doing more thinking actively consume more oxygen from the blood. Thinking hard is exhausting because it literally takes more energy.)
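The scheduling idea can be sketched in a few lines (node names and numbers are mine, purely illustrative): rank nodes by the discordance between their top-down prediction and bottom-up input, and spend compute on the most discordant first, as in residual-prioritized belief propagation.

```python
import heapq

# Toy "attention" schedule: process the most surprising nodes first.
predictions = {"edge": 0.9, "texture": 0.5, "face": 0.1}    # top-down
observations = {"edge": 0.88, "texture": 0.1, "face": 0.95} # bottom-up

# Negate the residual so the max-discordance node pops first from a min-heap.
queue = [(-abs(observations[n] - predictions[n]), n) for n in predictions]
heapq.heapify(queue)

order = []
while queue:
    _, node = heapq.heappop(queue)
    order.append(node)
    # (a real system would update the node and re-enqueue neighbors whose
    #  residuals changed; here we just record the schedule)

print(order)  # ['face', 'texture', 'edge']: most surprising regions first
```

This captures the processing-prioritization part of the story; my complaint below is that it does not, by itself, capture conscious attention.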

So far so good.

The problem is, this does not explain conscious experience of attention. I think people are conflating this kind of processing prioritization with conscious experience. They see this nice math of "surprise" in bayesian networks (IE, discordance between bottom-up and top-down messages), and without realizing it, they form a mental image of a homunculus sitting outside the bayesian network and looking at the more surprising regions. (Because this reflects their internal experience pretty well.)

So, how can we get a similar picture without the homunculus?

One theory is that conscious experience is a global workspace which many areas in the brain have fast access to, for the purpose of quickly propagating information that is important to a lot of processes in the brain. I think this theory is a pretty good one. But this is very different from the bayes-net-propagation-prioritization picture. This LW post discusses the discordance.

This isn't so much a strike against the PP picture of attention (it seems quite possible something like the PP mechanism is present), as a statement that there's also something else going on -- another distinct attention mechanism, which isn't best understood in PP terms. Maybe it isn't best understood in terms of a big bayes net, either, since it doesn't really make sense for a big bayes net to have a global workspace.

If we imagine that the neocortex is more or less a big bayes net (with cortical columns as nodes), and the rest of the brain is (among other things, perhaps) an RL agent which utilizes the neocortex as its model, then this secondary attention mechanism is like a filter which determines which information gets from the neocortex to the RL agent. It can, of course, use the PP notion of attention as a strong heuristic determining how to filter information. I don't think this necessarily captures everything that's going on, but it is, in my opinion, better than the pure PP model.

I don't want to get mired down in discussing the details of predictive processing (least of all, the details of Friston's free energy). Feel welcome to express any specific points you have, by all means. (I'd love a point by point response!!) But what I would really like to know is why you are interested in predictive processing in the first place. All the potential reasons I see seem to be based on empty promises. Yet, PP fans seem to think the ideas will eventually bear fruit. What heuristic is behind this positive expectation? Why are the ideas so promising? What's so exciting about what you've seen? What are the deep generators?

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-30T16:39:27.380Z · LW · GW

Predictive Processing strikes me as a poor framework; I'd like to try and discuss your enthusiasm vs my lack of enthusiasm. What insights do you think it gives? What, basically, does PP mean to you, so we're not talking past each other?

Comment by abramdemski on Review: LessWrong Best of 2018 – Epistemology · 2020-12-29T23:37:55.799Z · LW · GW

I liked the tiny books!

Comment by abramdemski on The Parable of Predict-O-Matic · 2020-12-28T19:03:58.930Z · LW · GW

OK, yeah, that's fair.

Comment by abramdemski on The Parable of Predict-O-Matic · 2020-12-28T18:17:17.174Z · LW · GW

I don't see why it should necessarily undercut the core message of the post, since inner optimizers are still in some sense about the consequences of a pure predictive accuracy optimizer (but in the selection sense, not the control sense). But I agree that it wasn't sufficiently well done. It didn't feel like a natural next complication, the way everything else did.

Comment by abramdemski on The Parable of Predict-O-Matic · 2020-12-28T18:14:30.708Z · LW · GW

I share a feeling that part 9 is somehow bad, and I think your points are fair.

Comment by abramdemski on Where to Draw the Boundaries? · 2020-12-28T18:10:18.698Z · LW · GW

> What you are saying would be true if people chose friends and projects at random. And if you can only use one toolkit for everything. Neither assumption is realistic. People gather over common interests, and common interests lead to specialised vocabulary. That's as true of rationalism as anything else.

>In contrast, if you have friends who optimize their beliefs based on a lot of other things, then you will have to do more work to figure out whether those beliefs are useful to you as well.

> Assuming friends are as randomly distributed as strangers.

I agree that in practice, people choose friends who share memes (in particular, these "optimized for reasons other than pure accuracy" memes) -- both in that they will select friends on the basis of shared memes, and in that other ways of selecting friends will often result in selecting those who share memes.

But remember my point about agents with fully shared goals. Then, memes optimized to predict what they mutually care about will be optimal for them to use.

So if your friends are using concepts which are optimized for other things, then either (1) you've got differing goals and you now would do well to sort out which of their concepts have been gerrymandered, (2) they've inherited gerrymandered concepts from someone else with different goals, or (3) your friends and you are all cooperating to gerrymander someone else's concepts (or, (4), someone is making a mistake somewhere and gerrymandering concepts unnecessarily).

I'm not saying that any of these are fundamentally ineffective, untenable, or even morally reprehensible (though I do think of 1-3 as a bit morally reprehensible, it's not really the position I want to defend here). I'm just saying there's something special about avoiding these things, whenever possible, which has good reason to be attractive to a math/science/rationalist flavored person -- because if you care deeply about clear thinking, and don't want the overhead of optimizing your memes for political ends (or de-optimizing memes from friends from those ends), this is the way to do it. So for that sort of person, fighting against gerrymandered concepts is a very reasonable policy decision, and those who have made that choice will find allies with each other. They will naturally prefer to have their own discussions in their own places.

I do, of course, think that the LessWrong community should be and to an extent is such a place.

>>Why assume it’s necessarily conflictual and zero sum?

>Because when it is not, then beliefs optimized for predictive value only are optimal. If several agents have sufficiently similar goals such that their only focus is on achieving common goals, then the most predictively accurate beliefs are also going to be the highest utility.

> Assuming that everything is prediction. If several agents have sufficiently similar goals such that their only focus is on achieving common goals, the most optimal concepts will be ones that are specialised for achieving the goal.

> For example, in cookery school, you will be taught the scientific untruth that tomatoes are vegetables. This manipulates students into putting tomatoes into savoury dishes instead of desserts. This is more efficient than discovering by trial and error what to do with them.

This point was dealt with in the OP. This is why Zack refers to optimizing for prediction of things we care about. Zack is ruling in things like classifying tomatoes as vegetables for culinary purposes, and fruits for biological purposes. A cook cares about whether something goes well with savory dishes, whereas a biologist cares about properties relating to the functioning and development of an organism, and its evolutionary relationships with other organisms. So each will use concepts optimized for predicting those things.

So why sanction this sort of goal-dependence, while leaving other sorts of goal-dependence unsanctioned? Can't I apply the same arguments I made previously, about this creating a lot of friction when people with different goals try to use each other's concepts?

I think it does create a lot of friction, but the cost of not doing this is simply too high. To live in this universe, humans have to focus on predicting things which are useful to them. Our intellect is not so vast that we can predict things in a completely unbiased way and still have the capacity to, say, cook a meal.

Furthermore, although this does create some friction between agents with different goals, what it doesn't do (which conceptual gerrymandering does do) is cloud your judgement when you are doing your best to figure things out on your own. By definition, your concepts are optimized to help you predict things you care about, ie, think as clearly as possible. Whereas if your concepts are optimized for other goals, then you must be sacrificing some of your ability to predict things you care about, in order to achieve other things. Yes, it might be worth it, but it must be recognized as a sacrifice. And it's natural for some people to be unwilling to make that sort of sacrifice.

I imagine that, perhaps, you aren't fully internalizing this cost because you are imagining using gerrymandered concepts in conversation while internally thinking in clear concepts. But I see the argument as about how to think, not how to talk (although both are important). If you use a gerrymandered concept, you may have no understanding of the non-gerrymandered versions; or you may have some understanding, but in any case not the fluency to think in them. Otherwise you'd risk not achieving your purpose, like a Christian who shows too much fluency in the atheist ontology, thus losing credibility as a Christian. (If they think in the atheist ontology and only speak in the Christian one, that just makes them a liar, which is a different matter.)

> There isn't just one kind of unscientific concept. Shared myths can iron out differences in goals, as in your example, or they can optimise the achievement of shared goals, as in mine.

To summarize, I continue to assume a somewhat adversarial scenario (not necessarily zero sum!) because I see Zack as (correctly) ruling in mere optimization of concepts to predict the things we care about, but ruling out other forms of optimization of concepts to be useful. I believe that this rules in all the non-adversarial examples which you would point at, leaving only the cases where something adversarial is going on.

> Low level manipulation is ubiquitous. You need to argue for "manipulative in an egregiously bad way" separately.

I'm arguing that Zack's definition is a very good Schelling fence to put up.

One of Zack's recurring arguments is that appeal to consequences is an invalid argument when considering where to draw conceptual boundaries. "We can't define Vargaths as anyone who supports Varg, because the President would be a Vargath by that definition, which she would find offensive; and we don't want to offend the president!" would be, by Zack's lights, transparent conceptual gerrymandering and an invalid argument. 

Zack's argument is not itself conceptual gerrymandering because this argument is being made on epistemic grounds, IE, pointing out that accepting "appeals to consequences" arguments reduces your ability to predict things you care about.

My argument in support of Zack's argument appeals to consequences, but does so in service of the normative question of whether a community of truth-seekers should adopt norms against appeals to consequences. Being a normative question, this is precisely where appeals to consequences are valid and desired.

I think you should think of the validity/invalidity of appeals to consequences as the main thing at stake in this argument, in so far as you are wondering what it's all about (ie trying to ask me exactly what kind of claim I'm making). Fighting against ubiquitous low-level manipulation would be nice, but there isn't really a proposal on the table for accomplishing that.

1: For the record, I believe the classical "did you know tomatoes aren't vegetables, they're fruits?" is essentially an urban legend with no basis in scientific classification. Vegetable is essentially a culinary term. If you want to speak in biology terms, then yes, it's also a fruit, but that's not mutually exclusive with it being a vegetable. But in any case, it's clear that there can be terminological conflicts like this, even if "vegetable" isn't one of them; and "tomato" is a familiar example, even if it's spurious. So we can carry on using it as an example for the sake of argument.

Comment by abramdemski on Fusion and Equivocation in Korzybski's General Semantics · 2020-12-28T17:06:02.319Z · LW · GW


Comment by abramdemski on Babble Challenge: Not-So-Future Coordination Tech · 2020-12-21T19:16:46.827Z · LW · GW
  1. A way to implement efficient Futarchy for small groups, on par with how easy it is to run small groups via democratic vote.
  2. Like voting, but with more explicit rules for good deliberation and delegation of deliberation.
  3. Rather than voting, deliberate until unanimous consensus is reached, forcing the group to address minority concerns.
  4. Add to the previous the idea of delegating: in large groups where a full group deliberation is infeasible, people choose delegates who they trust to represent their views. Delegates can further delegate, so that you can defer the decision of who to delegate to.
  5. For decisions which do not realistically require consent from everyone, EG choosing a restaurant when not everyone realistically will go, a kickstarter-like mechanism for tabulating how many would go along with which decision. Some kind of utilitarian voting technique to go along with this, helping to select the best out of the options which have enough potential support.
  6. Approval voting, but everyone states the percentage approval at which they would defer to the group decision. This allows us to shift between requiring full consensus vs simply the highest total approval number (even if that's a tiny minority).
  7. A "Bayesian Database" of all scientific information, in which hypotheses are registered with descriptions in a formal language (so that we can apply a description length prior), and submitted data automatically updates all the hypotheses.
  8. The previous but also connected to a prediction market somehow.
  9. The previous but also with argument mapping capability, including appropriately propagating information across probabilistic arguments.
  10. More generally, prediction markets with argument mapping.
  11. The previous, plus rewarding valid arguments thru picking up any arbitrage implied.
  12. A prediction market which also has a connected way to put up money to make events happen. The prediction market is used to solve the credit-assignment problem, and also solve it predictively, so the market selects the most efficient allocation of your money it can find, including paying some up-front and also paying bonuses later based on actual perceived effectiveness.
  13. A quadratic funding effective altruist charity fund, with a connected prediction market to rank charities (just for informing givers, not with any strict connection to money allocation), of course including feedback about what actually happens with grants. The prediction market uses its own virtual currency (so it's not easily manipulated by outside interests, and just finds the best forecasters). [A problem with this is that prediction markets aren't going to be particularly good for evaluating x-risk causes.]
  14. An effective altruist / rationalist version of LinkedIn/Klout/etc: EAs/rationalists rate each other for various important properties, using overlapping webs of trust, with (hopefully) Bayesian probabilistic soundness in how trust inferences are propagated. This helps recruitment and hiring for EA orgs, and facilitates finding partners for less formal collaborations, etc.
  15. A microloan/microgrant fund for EAs, with some sort of accountability, bringing us closer to just having one big pot of money which all EAs can draw from as needed. (A "loan/grant" must be "repaid", but not necessarily with money; IE it can be "repaid" with some sort of impact certification?)
  16. A phone app that alerts you when you are in close physical proximity to another rationalist, eg in an unfamiliar city.
  17. A heat map of density of rationalists, including eg which restaurants they frequent.
  18. An app for coordinating rationalist/ea group houses, allowing you to put in housing preferences (people you'd like to live with, people you wouldn't live with, space requirements, rent cap, requirements for space, other requirements for the house, distance from work, etc) and jointly optimizing everything to find good proposals.
  19. Like Github, but for EA projects: a single place with a bunch of projects, descriptions of how to contribute, tools for managing tasks (similar to bug report tickets), etc.
  20. A way to classify claims made by pundits/journalists/writers/etc in articles or public statements, and then record follow-up fact checking / prediction accuracy, to essentially force accountability on them, and help people estimate the accuracy of new statements from the same sources.
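Idea 6 above is concrete enough to sketch. This is just one possible reading of the mechanism; the aggregation rule and tie-breaking below are my guesses, not a worked-out proposal:

```python
def decide(ballots, options):
    """One reading of the approval-with-deferral idea: each ballot is
    (approved_set, threshold), where threshold is the approval fraction at
    which this voter defers to the group. An option is adoptable only if its
    approval fraction meets every voter's threshold; among adoptable options,
    highest approval wins. Returns None if nothing clears the bar
    (i.e., keep deliberating)."""
    n = len(ballots)
    bar = max(t for _, t in ballots)  # the strictest voter sets the bar
    best, best_frac = None, -1.0
    for opt in options:
        frac = sum(opt in approved for approved, _ in ballots) / n
        if frac >= bar and frac > best_frac:
            best, best_frac = opt, frac
    return best

# If everyone sets threshold 1.0, this degenerates to full consensus;
# if everyone sets 0.0, it is plain approval voting.
ballots = [({"thai", "pizza"}, 0.5),
           ({"thai"}, 0.5),
           ({"pizza"}, 0.6)]
print(decide(ballots, ["thai", "pizza"]))  # prints "thai"
```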
Comment by abramdemski on Fusion and Equivocation in Korzybski's General Semantics · 2020-12-21T16:11:18.826Z · LW · GW

Sure. I think we could add a lot more detail (and subtract a few mistaken neurological notions from his system), but his basic idea still makes sense today:

  1. The world outside the nervous system.
  2. The nervous stimulation event. EG, light hitting the retina. This includes "immediate physical-chemical-electro-colloidal" impact of the stimulation, but I'm not sure where he'd draw the boundary.
  3. Broader but still preverbal reactions. Thinking, feeling, etc. This is (at least in part) what Focusing is trying to access.
  4. Linguistic, symbolic processing. "I see a chair", "I feel hurt", etc.

He refers to 1-3 together as "the silent level" and places emphasis on trying to properly distinguish, and access, the silent level.

I'm not sure whether 4 included the internal monologue, or only actual speech. If not, it seems like Korzybski must not have thought in words. (Note how "thinking" is placed as part of the silent level, in #3.)

Comment by abramdemski on Fusion and Equivocation in Korzybski's General Semantics · 2020-12-21T15:59:41.021Z · LW · GW

Yeah, totally. I think I want to defend something like being capable of drawing as many distinctions as possible (while, of course, focusing more on the more important distinctions).

One of the most distinction-heavy people I know is also one of the hardest to understand. Actually, I think the two people I know who are best at distinctions are also the two most communication-bottlenecked people I know.


> if you distinguish two things that you previously considered the same, you need to store at least a bit of information more than before

Not literally. It depends on the probability of the two things. At 50/50, it's 1 bit. The further it gets from that, the more we can use efficient encodings to average less than 1 bit per instance, approaching zero.
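In standard information-theory terms: the average cost of recording the distinction is its Shannon entropy, which only reaches 1 bit at 50/50. A quick check:

```python
from math import log2

def entropy_bits(p):
    """Average bits needed per instance to record a binary distinction
    that comes up 'true' with probability p (Shannon entropy)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(entropy_bits(0.5))   # 1.0   -- the worst case
print(entropy_bits(0.99))  # ~0.08 -- a rare distinction is nearly free
```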

Comment by abramdemski on Fusion and Equivocation in Korzybski's General Semantics · 2020-12-21T15:52:21.937Z · LW · GW

Ah, good example!

For me, this illustrates how obviously useful and defensible a fusion can seem -- "I am bad at math" can seem like just an empirical fact, and generalization from the past to the future seems like a very defensible heuristic. Nonetheless, using "Yet!" to drive a wedge as you describe turns out to be quite useful.

Comment by abramdemski on Radical Probabilism · 2020-12-20T20:43:03.064Z · LW · GW

It's a good question!

For me, the most general answer is the framework of logical induction, where the bookies are allowed so long as they have poly-time computable strategies. In this case, a bookie doesn't have to be guaranteed to make money in order to count; rather, if it makes arbitrarily much money, then there's a problem. So convergence traders are at risk of being stuck with a losing ticket, but, their existence forces convergence anyway.

If we don't care about logical uncertainty, the right condition is instead that the bookie knows the agent's beliefs, but doesn't know what the outcome in the world will be, or what the agent's future beliefs will be. In this case, it's traditional to require that bookies are guaranteed to make money.

(Puzzles of logical uncertainty can easily point out how this condition doesn't really make sense, given EG that future events and beliefs might be computable from the past, which is why the condition doesn't work if we care about logical uncertainty.)

In that case, I believe you're right, we can't use convergence traders as I described them.

Yet, it turns out we can prove convergence a different way.

To be honest, I haven't tried to understand the details of those proofs yet, but you can read about it in the paper "It All Adds Up: Dynamic Coherence of Radical Probabilism" by Sandy Zabell.

Comment by abramdemski on Where to Draw the Boundaries? · 2020-12-20T17:34:57.966Z · LW · GW

>I’m saying that epistemics focused on usefulness-to-predicting is broadly useful in a way that epistemics optimized in other ways is not

> That's still not very clear. As opposed to other epistemics being useless, or as opposed to other epistemics having specialized usefulness?

What I meant by "broadly useful" is, having usefulness in many situations and for many people, rather than having usefulness in one specific situation or for one specific person.

For example, it's often more useful to have friends who optimize their epistemics mostly based on usefulness-for-predicting, because those beliefs are more likely to be useful to you as well, rather than just them.

In contrast, if you have friends who optimize their beliefs based on a lot of other things, then you will have to do more work to figure out whether those beliefs are useful to you as well. Simply put, their beliefs will be less trustworthy.

Scaling up from "friends" to "society", this effect gets much more pronounced, so that in the public sphere we really have to ask who benefits from claims/beliefs, and uncontaminated beliefs are much more valuable (so truly unbiased science and journalism are quite valuable as a social good).

Similarly, we can go to the smaller scale of one person communicating with themselves over time. If you optimize your beliefs based on a lot of things other than usefulness-for-predicting, the usefulness of your beliefs will have a tendency to be very situation-specific, so you may have to rethink things a lot more when situations change, compared with someone who left their beliefs unclouded.

> Why assume it's necessarily conflictual and zero sum? For one thing, there's a lot of social constructs and unscientific semantics out there.

Because when it is not, then beliefs optimized for predictive value only are optimal. If several agents have sufficiently similar goals such that their only focus is on achieving common goals, then the most predictively accurate beliefs are also going to be the highest utility.

For example, if there is a high social incentive in a community to believe in some specific deity, it could be because there is low trust that people without that belief would act cooperatively. This in turn is because people are assumed to have selfish (IE non-shared) goals. Belief in the deity aligns goals because the deity is said to punish selfish behavior. So, given the belief, everyone can act cooperatively.

> Why assume anything unscientific is manipulative?

I'll grant you one caveat: self-fulfilling prophecies. In situations where those are possible, there are several equally predictively accurate beliefs with different utilities, and we should choose the "best" according to our full preferences.

> It's a pretty large concession, since it includes all sorts of traditions and norms.

Aside from that, though, optimizing for something other than predictive value is very probably manipulative for the reason I stated above: if you're optimizing for something else, it suggests you're not working in a team with shared goals, since assuming shared goals, the best collective beliefs are the most predictive.

> ETA: I don't buy that an unscientific concept is necessarily a lie, but even so, if lies are contagious, and no process deletes them, then we should already be in a sea of lies.

I think this part is just a misunderstanding. The post I linked to argues that lies are contagious not in the sense that they spread, but rather, in the sense that in order to justify one lie, you often have to make more lies, so that the lie spreads throughout your web of beliefs. Ultimately, under scrutiny, you would have to lie (eg to yourself) about epistemology itself, since you would need to justify where you got these beliefs from (so for example, Christian scholars will tend to disagree with Bayesians about what constitutes justification for a belief).

> Why? Science and politics do not have to fight over the same territory.

I think this has to do with our other disagreement, so I'll just say that in an ordinary conversation (which I think normally has some mix between "engineer culture" and "diplomat culture"), I personally think there is a lot of overlap in the territory those two modes might be concerned with.

Comment by abramdemski on Where to Draw the Boundaries? · 2020-12-20T17:02:12.090Z · LW · GW

> Not in any important sense. Physical instantiations can be very varied... they don't have to look like a typical chess set... and you can play chess in your head if you're smart enough. Chess is a lot more like maths than it is like ichthyology.

Lots of physical things can have varied instantiations. EG "battery". That in itself doesn't seem like an important barrier.

>Even though I have complete control over whether to welcome you, the inference from “does not reflect reality” to “wrong” is still perfectly valid

> In that one case.

OK, here's a more general case: I'm looking at a map you're holding, and making factual claims about where the lines of ink are on the paper, colors, etc.

This is very close to your money example, since I can't just make up the numbers in my bank account.

Again, the inference from "does not reflect reality" to "wrong" is perfectly valid.

It's true that I can change the numbers in my bank account by EG withdrawing/depositing money, but this is very similar to observing that I can change a rock by breaking it; it doesn't turn the rock into a non-factual matter.

> We already categorise sociology, etc., as soft sciences. Meaning that they are not completely unscientific... and also that they are not reflections of pre-existing reality.

True, but it seems like "soft" is due to the fact that we can't get very precise predictions, or even very calibrated probabilities (due to a lot of distributional shift, poor reference classes, etc). NOT due to the concept of prediction failing to be meaningful.

As a thought experiment, imagine an alien species observing earth without interfering with it in any way. Surely, for them, our "social constructs" could be a matter of science, which could be predicted accurately or inaccurately, etc?

Then imagine that the alien moves to the shoulder of a human. It could still play the role of an impartial observer. Surely it could still have scientific beliefs about things like how money works at that point.

Then imagine that the alien occasionally talks with the human whose shoulder it is on. It does not try to sway decisions in any way, but it does offer the human its predictions if the human asks. In cases where events are contingent on the prediction itself (ie the prediction alters what the human does, which changes the subject matter being predicted), the alien does its best to explain that relationship to the human, rather than offer a specific prediction.

I would argue that the alien can still have scientific beliefs about things like how money works at this point.

Now imagine that the "alien" is just a sub-process in the human brain. For example, there's a hypothesis that the cortex serves a purely predictive role, while the rest of the brain implements an agent which uses those predictions. 

Again, I would argue that it's still possible for this sub-process to have factual/scientific/impartial predictions about EG how money works.

> Assuming determinism, statements about the future can be logically inferred from a pre-existing state of the universe plus pre-existing laws.

Right, agreed. So I'd ask what your notion of "pre-existing" is, such that you made your initial statement (emphasis mine):

> In order for your map to be useful, it needs to reflect the statistical structure of things to the extent required by the value it is in service to.

> That can be zero. There is a meta category of things that are created by humans without any footprint in pre-existing reality.

I understand your thesis to be that if something is not pre-existing reality, a map does not need to "reflect the statistical structure". I'm trying to understand what your thesis means. Based on what you said so far, I hypothesized that "pre-existing" might mean "not affected (causally) by humans". But this doesn't seem to be right, because as you said, the future can be predicted from the past using the ("pre-existing") state and the ("pre-existing") laws.

Comment by abramdemski on Luna Lovegood and the Chamber of Secrets - Part 1 · 2020-12-17T22:17:29.520Z · LW · GW

"fan-fan-fiction" seems wrong, that just denotes a fan of a fanfic (or perhaps a fic written by a fan of a fan).

I'm tempted to say fanficfic instead, since it's a fic of a fanfic.

But really, fanficfanfic seems most accurate if you're going to do something like that.

So might as well just go with "recursive fanfic" or "metafic" or something like that.

Comment by abramdemski on Luna Lovegood and the Chamber of Secrets - Part 1 · 2020-12-17T22:14:42.075Z · LW · GW


Comment by abramdemski on Luna Lovegood and the Chamber of Secrets - Part 1 · 2020-12-17T19:10:49.199Z · LW · GW

I mean downvote it in your feed, which makes it show up less frequently.

Comment by abramdemski on Luna Lovegood and the Chamber of Secrets - Part 1 · 2020-12-17T15:32:25.064Z · LW · GW

You could downvote the fiction tag, which might mean only fiction passing a sufficiently good filter would appear.

Although I don't think solutions like this fully solve the general problem of people being driven away if LW gets too full of fiction. For example, it doesn't help not-logged-in readers (I think there are a lot of those?).