Alignment Research Field Guide

2019-03-08T19:57:05.658Z · score: 180 (58 votes)

Pavlov Generalizes

2019-02-20T09:03:11.437Z · score: 66 (19 votes)
Comment by abramdemski on Alignment Newsletter #41 · 2019-01-22T01:47:46.644Z · score: 4 (2 votes) · LW · GW
This seems like it is not about the "motivational system", and if this were implemented in a robot that does have a separate "motivational system" (i.e. it is goal-directed), I worry about a nearest unblocked strategy.

I am confused about where you think the motivation system comes into my statement. It sounds like you are imagining that what I said is a constraint, which could somehow be coupled with a separate motivation system. If that's your interpretation, that's not what I meant at all, unless random sampling counts as a motivation system. I'm saying that all you do is sample from what's consented to.

But, maybe what you are saying is that in "the intersection of what the user expects and what the user wants", the first is functioning as a constraint, and the second is functioning as a motivation system (basically the usual IRL motivation system). If that's what you meant, I think that's a valid concern. What I was imagining is that you are trying to infer "what the user wants" not in terms of end goals, but rather in terms of actions (really, policies) for the AI. So, it is more like an approval-directed agent to an extent. If the human says "get me groceries", the job of the AI is not to infer the end state the human is asking the robot to optimize for, but rather, to infer the set of policies which the human is trying to point at.

There's no optimization on top of this finding perverse instantiations of the constraints; the AI just follows the policy which it infers the human would like. Of course the powerful learning system required for this to work may perversely instantiate these beliefs (i.e., there may be daemons, aka inner optimizers).

(The most obvious problem I see with this approach is that it seems to imply that the AI can't help the human do anything which the human doesn't already know how to do. For example, if you don't know how to get started filing your taxes, then the robot can't help you. But maybe there's some way to differentiate between more benign cases like that and less benign cases like using nanotechnology to more effectively get groceries?)

A third interpretation of your concern is that you're saying that if the thing is doing well enough to get groceries, there has to be powerful optimization somewhere, and wherever it is, it's going to be pushing toward perverse instantiations one way or another. I don't have any argument against this concern, but I think it mostly amounts to a concern about inner optimizers.

(I feel compelled to mention again that I don't feel strongly that the whole idea makes any sense. I just want to convey why I don't think it's about constraining an underlying motivation system.)

Comment by abramdemski on Alignment Newsletter #41 · 2019-01-20T07:59:27.379Z · score: 8 (4 votes) · LW · GW
Non-Consequentialist Cooperation? (Abram Demski): [...]
However, this also feels different from corrigibility, in that it feels more like a limitation put on the AI system, while corrigibility seems more like a property of the AI's "motivational system". This might be fine, since the AI might just not be goal-directed. One other benefit of corrigibility is that if you are "somewhat" corrigible, then you would like to become more corrigible, since that is what the human would prefer; informed-consent-AI doesn't seem to have an analogous benefit.

You could definitely think of it as a limitation to put on a system, but I actually wasn't thinking of it that way when I wrote the post. I was trying to imagine something which only operates from this principle. Granted, I didn't really explain how that could work. I was imagining that it does something like sample from a probability distribution which is (speaking intuitively) the intersection of what you expect it to do and what you would like it to do.
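A cartoon of that sampling story (the policy names and numbers here are invented purely for illustration): treat "the intersection" as the normalized pointwise product of the two distributions, so any policy the user wouldn't expect gets probability zero no matter how much the preference model likes it.

```python
import random

# Hypothetical distributions over candidate policies for "get me groceries".
p_expect = {'walk to the store': 0.5, 'order delivery': 0.4,
            'nanotech grocery swarm': 0.0, 'do nothing': 0.1}
p_want = {'walk to the store': 0.6, 'order delivery': 0.3,
          'nanotech grocery swarm': 0.1, 'do nothing': 0.0}

# "Intersection": normalized pointwise product of the two distributions.
weights = {pi: p_expect[pi] * p_want[pi] for pi in p_expect}
z = sum(weights.values())
intersection = {pi: w / z for pi, w in weights.items()}

# All the agent does is sample from this distribution -- no optimization on top.
policy = random.choices(list(intersection), weights=list(intersection.values()))[0]

# The perverse option can never be sampled: the user doesn't expect it,
# so its weight is zero regardless of the preference model.
print(intersection['nanotech grocery swarm'])  # 0.0
```

The point of the sketch is just that sampling replaces argmax: there is nothing searching for the highest-scoring corner of the constraint set.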

(It now seems to me that although I put "non-consequentialist" in the title of the post, I didn't explain the part where it isn't consequentialist very well. Which is fine, since the post was very much just spitballing.)

Comment by abramdemski on CDT=EDT=UDT · 2019-01-19T20:22:04.956Z · score: 2 (1 votes) · LW · GW

Agreed. I'll at least edit the post to point to this comment.

Comment by abramdemski on CDT=EDT=UDT · 2019-01-18T21:41:42.577Z · score: 2 (1 votes) · LW · GW

I'm not sure which you're addressing, but, note that I'm not objecting to the practice of illustrating variables with diamonds and boxes rather than only circles so that you can see at a glance where the choices and the utility are (although I don't tend to use the convention myself). I'm objecting to the further implication that doing this makes it not a Bayes net.

Comment by abramdemski on XOR Blackmail & Causality · 2019-01-18T18:57:41.311Z · score: 3 (2 votes) · LW · GW
I hear there is a way to fiddle with the foundations of probability theory so that conditional probabilities are taken as basic and ordinary probabilities are defined in terms of them. Maybe this would solve the problem?

This does help somewhat. See here. But, in order to get good answers from that, you need to already know enough about the structure of the situation.

Maybe I'm late to the party, in which case sorry about that & I look forward to hearing why I'm wrong, but I'm not convinced that epsilon-exploration is a satisfactory way to ensure that conditional probabilities are well-defined. Here's why:

I agree, but I also think there are some things pointing in the direction of "there's something interesting going on with epsilon exploration". Specifically, there's a pretty strong analogy between epsilon exploration and modal UDT: MUDT is like the limit as you send exploration probability to zero, so it never actually happens but it still happens in nonstandard models. However, that only seems to work when you know the structure of the situation logically. When you have to learn it, you have to actually explore sometimes to get it right.

To the extent that MUDT looks like a deep result about counterfactual reasoning, I take this as a point in favor of epsilon exploration telling us something about the deep structure of counterfactual reasoning.

Anyway, see here for some more recent thoughts of mine. (But I didn't discuss the question of epsilon exploration as much as I could have.)

Comment by abramdemski on CDT=EDT=UDT · 2019-01-18T18:40:56.722Z · score: 2 (1 votes) · LW · GW

I disagree. All the nodes in the network should be thought of as grounding out in imagination, in that it's a world-model, not a world. Maybe I'm not seeing your point.

I would definitely like to see a graphical model that's more capable of representing the way the world-model itself is recursively involved in decision-making.

One argument for calling an influence diagram a generalization of a Bayes net could be that the conditional probability table for the agent's policy given observations is not given as part of the influence diagram, and instead must be solved for. But we can still think of this as a special case of a Bayes net, rather than a generalization, by thinking of an influence diagram as a special sort of Bayes net in which the decision nodes have to have conditional probability tables obeying some optimality notion (such as the CDT optimality notion, the EDT optimality notion, etc).

This constraint is not easily represented within the Bayes net itself, but instead imposed from outside. It would be nice to have a graphical model in which you could represent that kind of constraint naturally. But simply labelling things as decision nodes doesn't do much. I would rather have a way of identifying something as agent-like based on the structure of the model for it. (To give a really bad version: suppose you allow directed cycles, rather than requiring DAGs, and you think of the "backwards causality" as agency. But, this is really bad, and I offer it only to illustrate the kind of thing I mean -- allowing you to express the structure which gives rise to agency, rather than taking agency as a new primitive.)
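A toy version of that "special sort of Bayes net" view (the diagram, probabilities, and utilities below are all made up for illustration): leave the decision node's CPT free, solve for the one satisfying the optimality notion, and what remains is an ordinary Bayes net.

```python
import itertools

# Toy influence diagram: chance node O -> decision D, and (O, D) -> utility U.
# The decision node's CPT is not given; it must be solved for, subject to an
# optimality notion (maximize expected utility given the observation -- a
# case simple enough that CDT and EDT agree).
P_O = {0: 0.7, 1: 0.3}                 # chance node's distribution
U = {(0, 'a'): 1, (0, 'b'): 0,
     (1, 'a'): 0, (1, 'b'): 2}         # utility as a function of (O, D)

# Search over deterministic policies: one action per observation.
best = max(itertools.product('ab', repeat=2),
           key=lambda pol: sum(P_O[o] * U[o, pol[o]] for o in P_O))

# Fill in the decision node's CPT; the result is an ordinary Bayes net.
cpt_D = {o: {d: float(d == best[o]) for d in 'ab'} for o in P_O}
print(best)  # ('a', 'b'): act on what you observe
```

The constraint lives in the `max` step, outside the net itself, which is exactly the awkwardness noted above: nothing in the graphical structure marks D as agent-like.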

Comment by abramdemski on What makes people intellectually active? · 2019-01-18T18:28:06.383Z · score: 2 (1 votes) · LW · GW
All in all, I can't wrap my head around "what is the difference between a producer and a consumer of thought?" because the question as posed seems to hold rigor, even quality, constant/irrelevant.

I'm not trying to hold it constant, I'm just trying to understand a relatively low standard, because that's the part I feel confused about. It seems relatively much easier to look at bad intellectual output and say how it could have been better, think about the thought processes involved, etc. Much harder to say what goes into producing output at all vs not doing so.

Comment by abramdemski on Is there a.. more exact.. way of scoring a predictor's calibration? · 2019-01-16T18:26:05.525Z · score: 25 (8 votes) · LW · GW

It's important to note that accuracy and calibration are two different things. I'm mentioning this because the OP asks for calibration metrics, but several answers so far give accuracy metrics. Any proper scoring rule is a measure of accuracy as opposed to calibration.

It is possible to be very well-calibrated but very inaccurate; for example, you might know that it is going to be Monday 1/7th of the time, so you give a probability of 1/7th. Everyone else just knows what day it is. On a calibration graph, you would be perfectly lined up; when you say 1/7th, the thing happens 1/7th of the time.

It is also possible to have high accuracy and poor calibration. Perhaps you can guess coin flips when no one else can, but you are wary of your precognitive powers, which makes you underconfident. So, you always place 60% probability on the event that actually happens (heads or tails). Your calibration graph is far out of line, but your accuracy is higher than anyone else's.
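To make the distinction concrete, here is a quick sketch of the Monday example using the Brier score, a standard proper scoring rule (so: an accuracy measure, not a calibration measure):

```python
# Brier score: mean squared error between forecast probabilities and outcomes.
# Lower is better; as a proper scoring rule it measures accuracy.
def brier(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

# Seven periods; the event "it is Monday" occurs in exactly one of them.
outcomes = [1, 0, 0, 0, 0, 0, 0]

uniform = [1 / 7] * 7               # perfectly calibrated, but uninformative
knows_the_day = [1.0] + [0.0] * 6   # actually knows what day it is

print(brier(uniform, outcomes))        # ~0.122: calibrated yet inaccurate
print(brier(knows_the_day, outcomes))  # 0.0: as accurate as possible
```

The 1/7th forecaster scores much worse despite a perfect calibration chart, which is the whole point: the scoring rule is tracking something calibration alone doesn't.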

In terms of improving rationality, the interesting thing about calibration is that (as in the precog example) if you know you're poorly calibrated, you can boost your accuracy simply by improving your calibration. In some sense it is a free improvement: you don't need to know anything more about the domain; you get more accurate just by knowing more about yourself (by seeing a calibration chart and adjusting).

However, if you just try to be more calibrated without any concern for accuracy, you could be like the person who says 1/7th. So, just aiming to do well on a score of calibration is not a good idea. This could be part of the reason why calibration charts are presented instead of calibration scores. (Another reason being that calibration charts help you know how to adjust to increase calibration.)

That being said, a decomposition of a proper scoring rule into components including a measure of calibration, like Dark Denego gives, seems like the way to go.
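The kind of decomposition I have in mind is something like the Murphy decomposition of the Brier score: Brier = reliability − resolution + uncertainty, where the reliability term is precisely a calibration score (0 = perfectly calibrated). A minimal sketch, applied to the underconfident precog from above (binning here is exact grouping by stated probability; real data would use probability bins):

```python
from collections import defaultdict

def murphy_decomposition(forecasts, outcomes):
    """Brier = reliability - resolution + uncertainty.
    Reliability is the calibration term: 0 means perfectly calibrated."""
    n = len(forecasts)
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)              # group outcomes by stated probability
    base = sum(outcomes) / n           # overall base rate
    rel = sum(len(os) * (f - sum(os) / len(os)) ** 2 for f, os in bins.items()) / n
    res = sum(len(os) * (sum(os) / len(os) - base) ** 2 for f, os in bins.items()) / n
    return rel, res, base * (1 - base)

# The underconfident precog: 60% on whichever side actually comes up.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
forecasts = [0.6 if o else 0.4 for o in outcomes]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
brier = sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)
print(rel, res, unc)           # ~0.16, 0.25, 0.25: miscalibrated, high resolution
print(brier, rel - res + unc)  # both ~0.16: the decomposition is exact
```

Optimizing the reliability term alone rewards the 1/7th forecaster; optimizing the whole score, with reliability reported as a diagnostic, avoids that failure mode.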

Comment by abramdemski on CDT=EDT=UDT · 2019-01-16T04:18:38.870Z · score: 12 (3 votes) · LW · GW

I guess, philosophically, I worry that giving the nodes special types like that pushes people toward thinking about agents as not-embedded-in-the-world, thinking things like "we need to extend Bayes nets to represent actions and utilities, because those are not normal variable nodes". Not that memoryless cartesian environments are any better in that respect.

Comment by abramdemski on CDT=EDT=UDT · 2019-01-16T02:45:28.521Z · score: 14 (8 votes) · LW · GW

Hrm. I realize that the post would be comprehensible to a much wider audience with a glossary, but there's one level of effort needed for me to write posts like this one, and another level needed for posts where I try to be comprehensible to someone who lacks all the jargon of MIRI-style decision theory. Basically, if I write with a broad audience in mind, then I'm modeling all the inferential gaps and explaining a lot more details. I would never get to points like the one I'm trying to make in this post. (I've tried.) Posts like this are primarily for the few people who have kept up with the CDT=EDT sequence so far, to get my updated thinking in writing in case anyone wants to go through the effort of trying to figure out what in the world I mean. To people who need a glossary, I recommend searching LessWrong and the Stanford Encyclopedia of Philosophy.

What are the components of intellectual honesty?

2019-01-15T20:00:09.144Z · score: 32 (8 votes)
Comment by abramdemski on Combat vs Nurture & Meta-Contrarianism · 2019-01-15T19:36:05.442Z · score: 10 (4 votes) · LW · GW

I've avoided people/conversations on those grounds, but I'm not sure it is the best way to deal with it. And I really do think good intellectual progress can be made at level 2. As Ruby said in the post I'm replying to, intellectual debate is common in analytic philosophy, and it does well there.

Maybe my description of intellectual debate makes you think of all the bad arguments-are-soldiers stuff. Which it should. But, I think there's something to be said about highly developed cultures of intellectual debate. There are a lot of conventions which make it work better, such as a strong norm of being charitable to the other side (which, in intellectual-debate culture, means an expectation that people will call you out for being uncharitable). This sort of simulates level 3 within level 2.

As for level 1, you might be able to develop some empathy for it at times when you feel particularly vulnerable and need people to do something to affirm your belongingness in a group or conversation. Keep an eye out for times when you appreciate level-one behavior from others, times when you would have appreciated some level-one comfort, or times when other people engage in level one (and decide whether it was helpful in the situation). It's nice when we can get to a place where no one's ego is on the line when they offer ideas, but sometimes it just is. Ignoring it doesn't make it go away, it just makes you manage it ineptly. My guess is that you are involved with more level one situations than you think, and would endorse some of it.

Comment by abramdemski on CDT Dutch Book · 2019-01-14T07:52:28.950Z · score: 2 (1 votes) · LW · GW

(lightly edited version of my original email reply to above comment; note that Diffractor was originally replying to a version of the Dutch-book which didn't yet call out the fact that it required an assumption of nonzero probability on actions.)

I agree that this Dutch-book argument won't touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I'm happy to require an epsilon-exploration assumption to get the conclusion.

Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't vanish to infinitesimal, seems legit. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps a bet-balancing scheme to ensure that nothing changes the comparative desirability of actions as the decision is made?

For your cosmic ray problem, what about: 

You didn't specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let's say 1/million for cosmic ray, 1/billion for exploration.

Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case.

Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT's causal expectations. We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray.

CDT agrees to both bets, and so loses money upon going right.

Ah, that's not a very good money pump. I want it to lose money no matter what. Let's try again: 

Before decision: option to lose 1 millionth of a util in exchange for 2 utils if right&ray.

During decision: option to gain .1 millionth util in exchange for -2 util if right&ray.

That should do it. CDT loses .9 millionth of a util, with nothing gained. And the trick is almost the same as my Dutch book for Death in Damascus. I think this should generalize well.

The amounts of money lost in the Dutch Book get very small, but that's fine.
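Spelled out as a sanity check (reading the 0.1-millionth payment in the second bet as unconditional, which is how the numbers come out to .9 millionths): whatever happens, the two conditional legs cancel and the agent ends up 0.9 millionths of a util poorer.

```python
# Worlds: (went_right, cosmic_ray). Going right without a ray only happens
# under the rare exploration event; it still loses the same amount.
def net_transfer(went_right, cosmic_ray):
    total = -1e-6          # bet 1, before the decision: unconditional cost
    total += 0.1e-6        # bet 2, during the decision: unconditional payment
    if went_right and cosmic_ray:
        total += 2         # bet 1 pays out on right & ray
        total -= 2         # bet 2 collects on right & ray
    return total

for world in [(False, False), (True, False), (True, True)]:
    print(world, net_transfer(*world))  # ~ -9e-07 in every world: a sure loss
```

The conditional legs are sized so that CDT's causal expectations accept both bets, while the unconditional flows net out against the agent regardless of the outcome.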


2019-01-13T23:46:10.866Z · score: 42 (11 votes)

When is CDT Dutch-Bookable?

2019-01-13T18:54:12.070Z · score: 25 (4 votes)
Comment by abramdemski on CDT Dutch Book · 2019-01-13T08:28:40.350Z · score: 4 (2 votes) · LW · GW

"The expectations should be equal for actions with nonzero probability" -- this means a CDT agent should have equal causal expectations for any action taken with nonzero probability, and EDT agents should similarly have equal evidential expectations. Actually, I should revise my statement to be more careful: in the case of epsilon-exploring agents, the condition is >epsilon rather than >0. In any case, my statement there isn't about evidential and causal expectations being equal to each other, but rather about one of them being constant across (sufficiently probable) actions.

"differing counterfactual and evidential expectations are smoothly more and more tenable as actions become less and less probable" -- this means that the amount we can take from a CDT agent through a Dutch Book, for an action which is given a different causal expectation than evidential expectation, smoothly reduces as the probability of an action goes to zero. In that statement, I was assuming you hold the difference between evidential and causal expectations constant as you reduce the probability of the action. Otherwise it's not necessarily true.

CDT Dutch Book

2019-01-13T00:10:07.941Z · score: 27 (8 votes)
Comment by abramdemski on Combat vs Nurture & Meta-Contrarianism · 2019-01-12T23:11:02.508Z · score: 9 (4 votes) · LW · GW

I think it's usually a good idea overall, but there is a less cooperative conversational tactic which tries to masquerade as this: listing a number of plausible straw-men in order to create the appearance that all possible interpretations of what the other person is saying are bad. (Feels like from the inside: all possible interpretations are bad; I'll demonstrate it exhaustively...)

It's not completely terrible, because even this combative version of the conversational move opens up the opportunity for the other person to point out the (n+1)th interpretation which hasn't been enumerated.

You can try to differentiate yourself from this via tone (by not sounding like you're trying to argue against the other person in asking the question), but, this will only be somewhat successful since someone trying to make the less cooperative move will also try to sound like they're honestly trying to understand.

Comment by abramdemski on Non-Consequentialist Cooperation? · 2019-01-11T23:54:00.135Z · score: 2 (1 votes) · LW · GW

My gut response is that hillclimbing is itself consequentialist, so this doesn't really help with fragility of value; if you get the hillclimbing direction slightly wrong, you'll still end up somewhere very wrong. On the other hand, Paul's approach rests on something which we could call a deontological approach to the hillclimbing part (i.e., amplification steps do not rely on throwing more optimization power at a pre-specified function).

Comment by abramdemski on Non-Consequentialist Cooperation? · 2019-01-11T23:15:16.353Z · score: 4 (2 votes) · LW · GW

I wouldn't say that preference utilitarianism "falls apart"; it just becomes much harder to implement.

And I'd like a little more definition of "autonomy" as a value - how do you operationally detect whether you're infringing on someone's autonomy?

My (still very informal) suggestion is that you don't try to measure autonomy directly and optimize for it. Instead, you try to define and operate from informed consent. This (maybe) allows a system to have enough autonomy to perform complex and open-ended tasks, but not so much that you expect perverse instantiations of goals.

My proposed definition of informed consent is "the human wants X and understands the consequences of the AI doing X", where X is something like a probability distribution on plans which the AI might enact. (... that formalization is very rough)

Is it just the right to make bad decisions (those which contradict stated goals and beliefs)?

This is certainly part of respecting an agent's autonomy. I think more generally respecting someone's autonomy means not taking away their freedom, not making decisions on their behalf without having prior permission to do so, and avoiding operating from assumptions about what is good or bad for a person.

Comment by abramdemski on Non-Consequentialist Cooperation? · 2019-01-11T22:56:11.184Z · score: 7 (4 votes) · LW · GW
Autonomy is a value and can be expressed as a part of a utility function, I think. So ambitious value learning should be able to capture it, so an aligned AI based on ambitious value learning would respect someone's autonomy when they value it themselves. If they don't, why impose it upon them?

One could make a similar argument for corrigibility: ambitious value learning would respect our desire for it to behave corrigibly if we actually wanted that, and if we didn't want that, why impose it?

Corrigibility makes sense as something to ensure in its own right because it is good to have in case the value learning is not doing what it should (or something else is going wrong).

I think respect for autonomy is similarly useful. It helps avoid evil-genie (perverse instantiation) type failures by requiring that we understand what we are asking the AI to do. It helps avoid preference-manipulation problems which value learning approaches might otherwise have, because regardless of how well expected-human-value is optimized by manipulating human preferences, such manipulation usually involves fooling the human, which violates autonomy.

(In cases where humans understand the implications of value manipulation and consent to it, it's much less concerning -- though we still want to make sure the AI isn't prone to pressure humans into that, and think carefully about whether it is really OK.)

Is the point here that you expect we can't solve those problems and therefore need an alternative? The idea doesn't help with "the difficulties of assuming human rationality" though so what problems does it help with?

It's less an alternative in terms of avoiding the things which make value learning hard, and more an alternative in terms of providing a different way to apply the same underlying insights, to make something which is less of a ruthless maximizer at the end.

In other words, it doesn't avoid the central problems of ambitious value learning (such as "what does it mean for irrational beings to have values?"), but it is a different way to try to put those insights together into a safe system. You might add other safety precautions to an ambitious value learner, such as [ambitious value learning + corrigibility + mild optimization + low impact + transparency]. Consent-based systems could be an alternative to that agglomerated approach, either replacing some of the safety measures or making them less difficult to include by providing a different foundation to build on.

Is the idea that even trying to do ambitious value learning constitutes violating someone's autonomy (in other words someone could have a preference against having ambitious value learning done on them) and by the time we learn this it would be too late?

I think there are a couple of ways in which this is true.

  • I mentioned cases where a value-learner might violate privacy in ways humans wouldn't want, because the overall result is positive in terms of the extent to which the AI can optimize human values. This is somewhat bad, but it isn't X-risk bad. It's not my real concern. I pointed it out because I think it is part of the bigger picture; it provides a good example of the kind of optimization a value-learner is likely to engage in, which we don't really want.
  • I think the consent/autonomy idea actually gets close (though maybe not close enough) to something fundamental about safety concerns which follow an "unexpected result of optimizing something reasonable-looking" pattern. As such, it may be better to make it an explicit design feature, rather than trust the system to realize that it should be careful about maintaining human autonomy before it does anything dangerous.
  • It seems plausible that, interacting with humans over time, a system which respects autonomy at a basic level would converge to different overall behavior than a value-learning system which trades autonomy off with other values. If you actually get ambitious value learning really right, this is just bad. But, I don't endorse your "why impose it on them?" argument. Humans could eventually decide to run all-out value-learning optimization (without mild optimization, without low-impact constraints, without hard-coded corrigibility). Preserving human autonomy in the meantime seems like a reasonable precaution.
Comment by abramdemski on What makes people intellectually active? · 2019-01-11T21:20:51.846Z · score: 4 (2 votes) · LW · GW

Abstracting your idea a little: in order to go beyond first thoughts, you need some kind of strategy for developing ideas further. Without one, you will just have the same thoughts when you try to "think more" about a subject. I've edited my answer to elaborate on this idea.

Non-Consequentialist Cooperation?

2019-01-11T09:15:36.875Z · score: 40 (13 votes)

Combat vs Nurture & Meta-Contrarianism

2019-01-10T23:17:58.703Z · score: 55 (16 votes)
Comment by abramdemski on What makes people intellectually active? · 2019-01-01T06:59:21.301Z · score: 2 (1 votes) · LW · GW

Well, my original intention was definitely more like "why don't more people keep developing their ideas further?" as opposed to "why don't more people have ideas?" -- but, I definitely grant that sharing ideas is what I actually am able to observe.

Comment by abramdemski on What makes people intellectually active? · 2019-01-01T06:25:21.541Z · score: 10 (7 votes) · LW · GW

If someone had commented with a one-line answer like "people are intellectually active if it is rewarding", I would have been very meh about it -- it's obvious, but trivial. All the added detail you gave makes it seem like a pretty useful observation, though.

Two possible caveats --

  • What determines what's rewarding? Any set of behaviors can be explained by positing that they're rewarding, so for this kind of model to be meaningful, there's got to be a set of rewards involved which are relatively simple and have relatively broad explanatory power.
  • In order for a behavior to be rewarded in the first place, it has to be generated the first time. How does that happen? Animal trainers build up complicated tricks by rewarding steps incrementally approaching the desired behavior. Are there similar incremental steps here? What are they, and what rewards are associated with them?

(Your spelled-out details give some ideas in those directions.)

Comment by abramdemski on What makes people intellectually active? · 2019-01-01T06:08:31.568Z · score: 3 (2 votes) · LW · GW

How do you manage your pipeline beyond collecting ideas?

I used to simply have an idea notebook. Writing down ideas was a monolithic activity in my head, encompassing everything from capturing the initial thought to developing it further and communicating it. I now think of those as three very different stages.

  • Capturing ideas: having appropriate places to write down thoughts as short memory aids, maybe a few words or a sentence or two.
  • Developing ideas: explaining the idea to myself in text. This allows me to take the seed of an idea and try different ways of fleshing out details, refine it, maybe turn it into something different.
  • Communicating: explaining it such that other people can understand it. This forces the idea to be given much more detail, often uncovering additional problems. So, I get another revision of the idea out of this (even if no one reads the writeup).

My pipeline definitely doesn't always work, though. In particular, capturing an idea does not guarantee that I will later develop it. I find that even if I capture an idea, I'll drop it by default if it isn't collected with a set of related ideas which are part of an ongoing thought process. This is somewhat tricky to accomplish.

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T23:27:56.743Z · score: 8 (5 votes) · LW · GW

Concerning TV Tropes --

I think a primary, maybe the primary, effect that the sequences have on a reader's thinking is through this kind of pattern-matching. It is shallow, as rationality techniques go, but it can have a large effect nonetheless. It's like the only rationality technique you have is TAPs, and you only set up TAPs of the form "resemblance to rationality concept" -> "think of rationality concept". But, those TAPs can still be quite useful, since thinking of a relevant concept may lead to something.

Concerning the rest -- not sure what to comment on, but it is a datapoint.

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T23:00:13.092Z · score: 8 (4 votes) · LW · GW

You seem to be claiming that it is a personality trait, something which influences how a person will interact with a broad variety of ideas and circumstances, which may or may not be true. Suggesting that it is a personality trait also comes with connotations that it would be hard to change, and may have origins in genetics or early childhood.

I'm somewhat skeptical of both claims. I suppose I think there is a broad personality factor which makes some difference, but for one person, it will tend to vary a lot from subject to subject, with potentially large (per-subject) variations throughout life (but especially around one's teens perhaps).

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T22:55:39.044Z · score: 5 (3 votes) · LW · GW

What Martin is describing might somewhat resemble OCD, without actually being OCD. Let's just say that some degree of obsession seems related to the development of ideas, at least in some cases.

I did want to focus on the descriptive question rather than the normative question. It is possible that almost all intellectual progress comes from obsessive people, while it's also "not the happiest or most fruitful path". Do you think that's wrong? If so, why do you think there are other common paths? I'm actually fairly skeptical of that. It seems very plausible that obsession is causally important.

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T22:40:29.965Z · score: 3 (2 votes) · LW · GW


Comment by abramdemski on What makes people intellectually active? · 2018-12-30T22:29:16.729Z · score: 18 (7 votes) · LW · GW

I think a big contributing factor is having some kind of intellectual community / receptive audience. Having a social context in which new ideas are expected, appreciated, and refined creates the affordance to really think about things.

The way I see it, contact with such a community only needs to happen initially. After that, many people will keep developing ideas on their own.

A school/work setting doesn't seem to count for as much as a less formal voluntary group. It puts thinking in the context of "for work" / "for school", which may even actively discourage developing one's own ideas later.

Also, it seems like attempts to start intellectual groups in order to provide the social context for developing ideas will often fail. People don't know how to start good groups by default, and there is a lot which can go wrong.

Editing to add:

Another important bottleneck is having a mental toolkit for working on hard problems. One reason why people don't go past the first answer which comes to mind is that they don't have any routines to follow which get them past their first thoughts. Even if you're asked to think more about a problem, you'll likely rehearse the same thoughts, and reach the same conclusions, unless you have a strategy for getting new thoughts. Johnswentworth's answer hints at this direction.

The best resource I know of for developing this kind of mental toolkit is Polya's book How to Solve It. He provides a set of questions to ask yourself while problem-solving. At first, these questions may seem like object-level tools to help you get unstuck when you are stuck, which is true. But over time, asking the questions will help you develop a toolkit for thinking about problems.

Comment by abramdemski on What are the axioms of rationality? · 2018-12-30T22:26:23.159Z · score: 18 (4 votes) · LW · GW

There are a variety of axiom systems which justify mostly similar notions of rationality, and a few posts explore these axiom systems. Sniffnoy summarized Savage's axioms. I summarized some approaches and why I think it may be possible to do better. I wrote in detail about complete class theorems. I also took a look at consequences of the Jeffrey-Bolker axioms. (Jeffrey-Bolker and complete class are my two favorite ways to axiomatize things, and they have some very different consequences!)

As many others are emphasizing, these axiomatic approaches don't really summarize rationality-as-practiced, although they are highly connected. Actually, I think people are kind of downplaying the connection. Although de-biasing moves such as de-anchoring aren't usually justified by direct appeal to rationality axioms, it is possible to flesh out that connection, and doing this with enough things will likely improve your decision-theoretic thinking.


1) The fact that there are many alternative axiom systems, and that we can judge them for various good/bad features, illustrates that one set of axioms doesn't capture the whole of rationality (at least, not yet).

2) The fact that not even the sequences deal much with these axioms shows that they need not be central to a practice of rationality. Thoroughly understanding probability and expected utility as calculations, and understanding that there are strong arguments for these calculations in particular is more important.
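For reference, the calculation which all of these axiom systems converge on (in one form or another) has the following shared shape; the details differ across axiomatizations (Savage derives both the probability and the utility function; Jeffrey-Bolker works with propositions rather than outcomes), but this is the core:

```latex
% Expected utility of an action a, over outcomes o with probabilities P:
\mathbb{E}[U \mid a] \;=\; \sum_{o} P(o \mid a)\, U(o),
\qquad
a^{*} \;=\; \operatorname*{arg\,max}_{a} \; \mathbb{E}[U \mid a].
```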

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T21:51:26.367Z · score: 27 (8 votes) · LW · GW

Yeah, I think that one of the most important things in my intellectual development was an assignment in high school to keep an idea pocket-book for some period of time. I filled several books in the allotted time, and just kept at it. Writing in a notebook became one of my primary down-time activities.

The difference between having an idea on your commute home from work and then getting home and surfing the internet / turning on the TV / whatever, vs getting home and writing out the idea in a notebook before doing those other things, is huge. That's at least one iteration on the idea; a chance to add details, notice flaws, and refine it.

Comment by abramdemski on What makes people intellectually active? · 2018-12-30T21:44:59.426Z · score: 4 (2 votes) · LW · GW
how do you know people aren't having ideas? Very few of my ideas are something I've thought enough about to write down or talk about in public, and many (most?) people do not have a great desire to write down or discuss their not-fully-fleshed-out ideas for public consumption anyway.

I suppose there are a lot of different lines we can draw.

Having ideas at all: almost literally everyone (although, I'm not sure what we should count here, exactly).

Having second thoughts; being dissatisfied with easy answers, and looking for better ones: lots of people, but, to greatly varying degrees.

Having an affordance to think new thoughts in an area of interest; not letting a vague notion of experts knowing better be a curiosity-stopper: somewhat rare? Particularly rare in combination with a moderately informed position? Maybe quite rare in combination with the "having second thoughts" attribute?

Building up an intellectual edifice (of whatever quality) around some topic of interest: fairly rare.

What makes people intellectually active?

2018-12-29T22:29:33.943Z · score: 75 (31 votes)
Comment by abramdemski on Reasons compute may not drive AI capabilities growth · 2018-12-27T16:38:19.289Z · score: 6 (3 votes) · LW · GW

I enjoyed the discussion. My own take is that this view is likely wrong.

  • The "many ways to train that aren't widely used" is evidence for alternatives which could substitute for a certain amount of hardware growth, but I don't see it as evidence that hardware doesn't drive growth.
  • My impression is that alternatives to grid search aren't very popular because they don't really work reliably. Maybe this has changed and people haven't picked up on it yet. Or maybe alternatives take more effort than they're worth.

The fact that these things are fairly well known and still not used suggests that it is cheaper to pick up more compute rather than use them. You discuss these things as evidence that computing power is abundant. I'm not sure how to quantify that. It seems like you mean for "computing power is abundant" to be an argument against "computing power drives progress".
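To make the kind of alternative I have in mind concrete, here is a minimal sketch of random search over hyperparameters, one classic substitute for grid search. The objective function below is a made-up stand-in for a real training run, and the parameter ranges are invented for illustration:

```python
import random

def evaluate(lr, width):
    # Stand-in for an expensive training run; a real objective would
    # train a model with these hyperparameters and return validation score.
    return 1.0 - (lr - 0.01) ** 2 - (width - 64) ** 2 / 10000.0

def random_search(trials, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        lr = 10 ** rng.uniform(-4, -1)    # log-uniform learning rate
        width = rng.randrange(16, 257)    # hidden-layer width
        score = evaluate(lr, width)
        if best is None or score > best[0]:
            best = (score, lr, width)
    return best

best_score, best_lr, best_width = random_search(200)
```

The point of the sketch: each extra trial is pure compute, so teams facing a choice between engineering a smarter search and buying more trials often just buy trials.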

  • "computing power is abundant" could mean that everyone can run whatever crazy idea they want, but the hard part is specifying something which does something interesting. This is quite relative, though. Computing power is certainly abundant compared to 20 years ago. But, the fact that people pay a lot for computing power to run large experiments means that it could be even more abundant than it is now. And, we can certainly write down interesting things which we can't run, and which would produce more intelligent behavior if only we could.
  • "computing power is abundant" could mean that buying more computing power is cheaper in comparison to a lot of low-hanging-fruit optimization of what you're running. This seems like what you're providing evidence for (on my interpretation -- I'm not imagining this is what you intend to be providing evidence for). This to me sounds like an argument that computing power drives progress: when people want to purchase capability progress, they often purchase computing power.

I do think that your observations suggest that computing power can be replaced by engineering, at least to a certain extent. So, slower progress on faster/cheaper computers doesn't mean correspondingly slower AI progress; only somewhat slower.

Comment by abramdemski on Multi-agent predictive minds and AI alignment · 2018-12-22T04:09:51.151Z · score: 8 (5 votes) · LW · GW
Vague introspective evidence for active inference comes from an ability to do inner simulations.

I would take this as introspective evidence in favor of something model-based, but it could look more like model-based RL rather than active inference. (I am not specifically advocating for model-based RL as the right model of human thinking.)

Possibly boldest claim I can make from the principle alone is that people will have a bias to take actions which will "prove their models are right" even at the cost of the actions being actually harmful for them in some important sense.

I believe this claim based on social dynamics -- among social creatures, it seems evolutionarily useful to try to prove your models right. An adaptation for doing this may influence your behavior even when you have no reason to believe anyone is looking or knows about the model you are confirming.

So, an experiment which would differentiate between socio-evolutionary causes and active inference would be to look for the effect in non-social animals. An experiment which comes to mind is that you somehow create a situation where an animal is trying to achieve some goal, but you give false feedback so that the animal momentarily thinks it is less successful than it is. Then, you suddenly replace the false feedback with real feedback. Does the animal try to correct back to the previously believed (false) situation, in order to minimize predictive error, rather than continuing to optimize in a way consistent with the task reward?

There are a lot of confounders. For example, one version of the experiment would involve trying to put your paw as high in the air as possible, and (somehow) initially getting false feedback about how well you are doing. When you suddenly start getting good feedback, do you re-position the paw to restore the previous level of feedback (minimizing predictive error) before trying to get it higher again? A problem with the experiment is that you might re-position your paw just because the real feedback changes the cost-benefit ratio, so a rational agent would try less hard at the task if it found out it was doing better than it thought.

A second example: pushing an object to a target location on the floor. If (somehow) you initially get bad feedback about where you are on the floor, and suddenly the feedback gets corrected, do you go to the location you thought you were at before continuing to make progress toward the goal? A confounder here is that you may have learned a procedure for getting the object to the desired location, and you are more confident in the results of following the procedure than you are otherwise. So, you prefer to push the object to the target location along the familiar route rather than in the efficient route from the new location, but this is a consequence of expected utility maximization under uncertainty about the task rather than any special desire to increase familiarity.

Note that I don't think of this as a prediction made by active inference, since active inference broadly speaking may precisely replicate max-expected-utility, or do other things. However, it seems like a prediction made by your favored version of active inference.

Because of the theoretical beauty, I think we should take active inference seriously as an architectural principle.

I think we may be able to make some progress on the question of its theoretical beauty. I share a desire for unified principles of epistemic and instrumental reasoning. However, I have an intuition that active inference is just not the right way to go about it. The unification is too simplistic, and has too many degrees of freedom. It should have some initial points for its simplicity, but it should lose those points when the simplest versions don't seem right (eg, when you conclude that the picture is missing goals/motivation).

So far, the description was broadly Bayesian/optimal/"unbounded". Unbounded predictive processing / active inference agent is a fearsome monster in a similar way as a fully rational VNM agent. The other key ingredient is bounded rationality. Most biases are consequence of computational/signal processing boundedness, both in PP/AI models and non PP/AI models.

FWIW, I want to mention logical induction as a theory of bounded rationality. It isn't really bounded enough to be the picture of what's going on in humans, but it is certainly major progress on the question of what should happen to probability theory when you have bounded processing power.

I mention this not because it is directly relevant, but because I think people don't necessarily realize logical induction is in the "bounded rationality" arena (even though "logical uncertainty" is definitionally very very close to "bounded rationality", the type of person who tends to talk about logical uncertainty is usually pretty different from the type of person who talks about bounded rationality, I think).


Another thing I want to mention -- although not every version of active inference predicts that organisms actively seek out the familiar and avoid the unfamiliar, it does seem like one of the central intended predictions, and a prediction I would guess most advocates of active inference would argue matches reality. One of my reasons for not liking the theory much is because I don't think it is likely to capture curiosity well. Humans engage in both familiarity-seeking and novelty-seeking behavior, and both for a variety of reasons (both terminal-goal-ish and instrumental-goal-ish), but I think we are closer to novelty-seeking than active inference would predict.

In Delusion, Survival, and Intelligent Agents (Ring & Orseau), the behavior of a knowledge-seeking agent is compared with that of a predictive-accuracy-seeking agent. Note that the two agents have exactly opposite utility functions: the knowledge-seeking agent likes to be surprised, whereas the accuracy-seeking agent dislikes surprises. The knowledge-seeking agent behaves in (what I see as) a much more human way than the accuracy-seeking agent. The accuracy-seeking agent will try to gain information to a limited extent, but will ultimately try to remove all sources of novel stimuli to the extent possible. The knowledge-seeking agent will try to do new things forever.

I would also expect evolution to produce something more like the knowledge-seeking agent than the accuracy-seeking agent. In RL, curiosity is a major aid to learning. The basic idea is to augment agents with an intrinsic motive to gain information, in order to ultimately achieve better task performance. There are a wide variety of formulas for curiosity, but as far as I know they are all closer to valuing surprise than avoiding surprise, and this seems like what they should be. So, to the extent that evolution did something similar to designing a highly effective RL agent, it seems more likely that organisms seek novelty as opposed to avoid it.
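As a toy illustration of what "an intrinsic motive to gain information" can look like, here is a minimal prediction-error curiosity bonus. This is one common family of formulas, not any specific published method; the predictor and the states are invented for illustration:

```python
# The agent keeps a simple predictive model of transitions; the
# intrinsic reward is the model's squared prediction error, so
# surprising transitions are rewarded and the agent seeks novelty.

class CuriosityBonus:
    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}  # state -> predicted next-state value

    def bonus(self, state, next_value):
        guess = self.pred.get(state, 0.0)
        error = (next_value - guess) ** 2   # surprise = intrinsic reward
        # Update the predictor toward what actually happened.
        self.pred[state] = guess + self.lr * (next_value - guess)
        return error

curiosity = CuriosityBonus()
first = curiosity.bonus("s0", 1.0)    # novel transition: large bonus
second = curiosity.bonus("s0", 1.0)   # familiar transition: smaller bonus
```

Note the sign: the bonus is prediction error itself, which is the opposite of what a pure accuracy-seeker would want; an accuracy-seeker would be rewarded for the negative of this quantity.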

So, I think the idea that organisms seek familiar experiences over unfamiliar is actually the opposite of what we should expect overall. It is true that for an organism which has learned a decent amount about its environment, we expect to see it steering toward states that are familiar to it. But this is just a consequence of the fact that it has optimized its policy quite a bit; so, it steers toward rewarding states, and it will have seen rewarding states frequently in the past for the same reason. However, in order to get organisms to this place as reliably as possible, it is more likely that evolution would have installed a decision procedure which steers disproportionately toward novelty (all else being equal) than one which steers disproportionately away from novelty (all else being equal).

Comment by abramdemski on Player vs. Character: A Two-Level Model of Ethics · 2018-12-19T23:55:08.296Z · score: 4 (3 votes) · LW · GW

I think hyperintelligent lovecraftian creature is the right picture. I don't think the player is best located in the brain.

there is not enough bandwidth between DNA and the neural network; evolution can input some sort of a signal like "there should be a subsystem tracking social status, and that variable should be maximized" or tune some parameters, but it likely does not have enough bandwidth to transfer some complex representation of the real evolutionary fitness.

I think you would agree that evolution has enough bandwidth to transmit complex strategies. The mess of sub-agents is, I think, more like the character in the analogy. There are some "player" calculations done in the brain itself, but many occur at the evolutionary level.

Comment by abramdemski on Player vs. Character: A Two-Level Model of Ethics · 2018-12-19T23:48:44.979Z · score: 4 (3 votes) · LW · GW

I don't think the "player" is restricted to the brain. A lot of the computation is evolutionary. I think it may be reasonable to view some of the computation as social and economic as well.

Comment by abramdemski on Player vs. Character: A Two-Level Model of Ethics · 2018-12-19T23:41:58.640Z · score: 9 (5 votes) · LW · GW

One thing which I find interesting about many 2-system models, including this one, is that the "lower" system (the subconscious, the elephant, system 1, etc) is often not doing its calculations entirely or even primarily in the brain (though this is only rarely clarified). The original system 1 / system 2 distinction was certainly referring to brain structures -- "hot" and "cool" subsystems of the brain. But, in terms of Freud's earlier 2-system model, the conscious vs unconscious, Carl Jung found it useful to speak of the "collective unconsciousness" as being an element of the unconscious mind. I think Jung's idea is actually a good way of cutting things up.

It's very obvious in the example in this post with the baby nursing: there doesn't need to be a calculation anywhere in the baby which figures out that wanting mama in the evening reduces the chances of more siblings. There probably isn't a calculation like that.

So, in many cases, the "player" is indeed lovecraftian and inhuman: it is Azathoth, the blind watchmaker. Evolution selects the genes which shape the personality type.

Obviously, not all of the "player" computations you're referring to occur at the evolutionary level. But, I think the boundary is a fluid one. It is not always easy to cleanly define whether an adaptation is evolutionary or within-lifetime; many things are a complicated combination of both (see the discussion of the Baldwin effect in The Plausibility of Life.)

I think there are other lovecraftian gods holding some of the strings as well. Many habits and norms are shaped by economic incentives (Mammon, god of the marketplace). This is a case where more of the computation may be in a person's head, but not all of it. The market itself does a lot of computation, and people can pick up Machiavellian business norms without having a generator of Machiavellianness inside their skull, or blindly ape personality-ish things contributing to reasonable spending habits without directly calculating such things, etc.

We can explain the words of politicians better by thinking they're optimized for political advantage rather than truth, and much of that optimization may be in the brain of the politician. But, the political machine also can select for politicians who honestly believe the politically-advantageous things. In an elephant/rider model, the computation of the elephant may be outside the politician.

Comment by abramdemski on Multi-agent predictive minds and AI alignment · 2018-12-17T08:57:35.906Z · score: 7 (2 votes) · LW · GW

I see two ways things could be. (They could also be somewhere in between, or something else entirely...)

  • It could be that extending PP to model actions provides a hypothesis which sticks its neck out with some bold predictions, claiming that specific biases will be observed, and these either nicely fit observations which were previously puzzling, or have since been tested and confirmed. In that case, it would make a great deal of sense to use PP's difficulty modeling goal-oriented behavior as a model of human less-than-goal-oriented behavior.
  • It could be that PP can be extended to actions in many different ways, and it is currently unclear which way might be good. In this case, it seems like PP's difficulty modeling goal-oriented behavior is more of a point against PP, rather than a useful model of the complexity of human values.

The way you use "PP struggles to model goal-oriented behavior" in the discussion in the post, it seems like things would need to be the first way; you think PP is a good fit for human behavior, and also, that it isn't clear how to model goals in PP.

The way you talk about what you meant in your follow-up comment, it sounds like you mean the world is the second way. This also fits with my experience. I have seen several different proposals for extending PP to actions (that is, several ways of doing active inference). Several of these have big problems which do not seem to reflect human irrationality in any particular way. At least one of these (and I suspect more than one, based on the way Friston talks about the free energy principle being a tautology) can reproduce maximum-expected-utility planning perfectly; so, there is no advantage or disadvantage for the purpose of predicting human actions. The choice between PP and expected utility formalisms is more a question of theoretical taste.

I think you land somewhere in the middle; you (strongly?) suspect there's a version of PP which could stick its neck out and tightly model human irrationality, but you aren't trying to make strong claims about what it is.

My object-level problem with this is, I don't know why you would suspect this to be true. I haven't seen people offer what strikes me as support for active inference, and I've asked people, and looked around. But, plenty of smart people do seem to suspect this.

My meta-level problem with this is, it doesn't seem like a very good premise from which to argue the rest of your points in the post. Something vaguely PP-shaped may or may not be harder to extract values from than an expected-utility-based agent. (For example, the models of bounded rationality which were discussed at the human-aligned AI summer school had a similar flavor, but actually seem easier to extract values from, since the probability of an action was made to be a monotonic and continuous function of the action's utility.)

Again, I don't disagree with the overall conclusions of your post, just the way you argued them.

Comment by abramdemski on Multi-agent predictive minds and AI alignment · 2018-12-16T03:06:57.347Z · score: 28 (5 votes) · LW · GW

I agree with the broad outline of your points, but I find many of the details incongruous or poorly stated. Some of this is just a general dislike of predictive processing, but assuming a predictive processing model, I don't see why your further comments follow.

I don't claim to understand predictive processing fully, but I read the SSC post you linked, and looked at some other sources. It doesn't seem to me like predictive processing struggles to model goal-oriented behavior. A PP agent doesn't try to hide in the dark all the time to make the world as easy to predict as possible, and it also doesn't only do what it has learned to expect itself to do regardless of what leads to pleasure. My understanding is that this depends on details of the notion of free energy.

So, although I agree that there are serious problems with taking an agent and inferring its values, it isn't clear to me that PP points to new problems of this kind. Jeffrey-Bolker rotation already illustrates that there's a large problem within a very standard expected utility framework.

The point about viewing humans as multi-agent systems, which don't behave like single-agent systems in general, also doesn't seem best made within a PP framework. Friston's claim (as I understand it) is that clumps of matter will under very general conditions eventually evolve to minimize free energy, behaving as agents. If clumps of dead matter can do it, I guess he would say that multi-agent systems can do it. Aside from that, PP clearly makes the claim that systems running on a currency of prediction error (as you put it) act like agents.

Again, this point seems fine to make outside of PP, it just seems like a non-sequitur in a PP context.

I also found the options given in the "what are we aligning with" section confusing. I was expecting to see a familiar litany of options (like aligning with system 1 vs system 2, revealed preferences vs explicitly stated preferences, etc). But I don't know what "aligning with the output of the generative models" means -- it seems to suggest aligning with a probability distribution rather than with preferences. Maybe you mean imitation learning, like what inverse reinforcement learning does? This is supported by the way you immediately contrast with CIRL in #2. But, then, #3, "aligning with the whole system", sounds like imitation learning again -- training a big black box NN to imitate humans. It's also confusing that you mention options #1 and #2 collapsing into one -- if I'm right that you're pointing at IRL vs CIRL, it doesn't seem like this is what happens. IRL learns to drink coffee if the human drinks coffee, whereas CIRL learns to help the human make coffee.

FWIW, I think if we can see the mind as a collection of many agents (each with their own utility function), that's a win. Aligning with a collection of agents is not too hard, so long as you can figure out a reasonable way to settle on fair divisions of utility between them.
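As a toy sketch of what "a reasonable way to settle on fair divisions of utility" might look like, one candidate is the Nash bargaining solution: pick the option maximizing the product of each sub-agent's utility gain over its disagreement point. The sub-agents, options, and numbers below are all invented for illustration:

```python
def nash_choice(options, utilities, disagreement):
    # utilities[agent][option] -> utility; disagreement[agent] -> baseline.
    def product(option):
        p = 1.0
        for agent, table in utilities.items():
            gain = table[option] - disagreement[agent]
            if gain <= 0:
                return float("-inf")  # worse than no deal for some sub-agent
            p *= gain
        return p
    return max(options, key=product)

utilities = {
    "comfort":  {"rest": 3.0, "work": 1.0, "balance": 2.5},
    "ambition": {"rest": 1.0, "work": 3.0, "balance": 2.5},
}
disagreement = {"comfort": 0.5, "ambition": 0.5}
choice = nash_choice(["rest", "work", "balance"], utilities, disagreement)
# "balance" wins: its product of gains 2.0 * 2.0 = 4.0 beats
# 2.5 * 0.5 = 1.25 for either extreme option.
```

The product (rather than a sum) is what makes the outcome "fair-ish": no sub-agent can be driven to its disagreement point without the whole objective collapsing.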

Comment by abramdemski on When EDT=CDT, ADT Does Well · 2018-12-05T23:06:14.090Z · score: 3 (2 votes) · LW · GW

I don't currently know of any example where this achieves better performance than LIDT. The raw action-expectations of the logical inductor, used as an embedder, will always pass the reality filter and the CDT=EDT filter. Any embedder differing from those expectations would have to differ on the expected utility of an action other than the one it recommended, in order to pass the CDT=EDT filter. So the expected utility of alternate embedders, according to those embedders themselves, can only be lower than that of the LIDT-like embedder.

There seems to be a deep connection between the CDT=EDT assumption and LIDT-like reasoning. In the Bayes-net setting where I prove CDT=EDT, I assume self-knowledge of mixed strategy (aka mixed-strategy ratifiability) and I assume one can implement randomized strategies without interference (mixed-strategy implementability). LIDT has the first property, since a logical inductor will learn a calibrated estimate of the action probabilities in a given situation. In problems where the environment is entangled with the randomization the agent uses, like troll bridge, LIDT can do quite poorly. Part of the original attraction of the CDT=EDT philosophy for me was the way it seems to capture how logical induction naturally wants to think about things.

I don't think loss relative to the argmax agent for the true environment is a very good optimality notion. It is only helpful in so far as the argmax strategy on the true environment is the best you can do. Other agents may perform better, in general. For example, argmax behaviour will 2-box in Newcomb so long as the predictor can't predict the agent's exploration. (Say, if the predictor is predicting you with your same logical inductor.) Loss relative to other agents more generally (like in the original ADT write-up) seems more relevant.

Embedded Agency (full-text version)

2018-11-15T19:49:29.455Z · score: 83 (29 votes)

Embedded Curiosities

2018-11-08T14:19:32.546Z · score: 75 (26 votes)

Subsystem Alignment

2018-11-06T16:16:45.656Z · score: 114 (35 votes)

Robust Delegation

2018-11-04T16:38:38.750Z · score: 109 (36 votes)

Embedded World-Models

2018-11-02T16:07:20.946Z · score: 78 (23 votes)

Decision Theory

2018-10-31T18:41:58.230Z · score: 84 (29 votes)

Embedded Agents

2018-10-29T19:53:02.064Z · score: 149 (60 votes)

A Rationality Condition for CDT Is That It Equal EDT (Part 2)

2018-10-09T05:41:25.282Z · score: 17 (6 votes)
Comment by abramdemski on A Rationality Condition for CDT Is That It Equal EDT (Part 1) · 2018-10-04T14:38:38.057Z · score: 3 (2 votes) · LW · GW

I maybe should have clarified that when I say CDT I'm referring to a steel-man CDT which would use some notion of logical causality. I don't think the physical counterfactuals are a live hypothesis in our circles, but several people advocate reasoning which looks like logical causality.

Implementability asserts that you should think of yourself as logico-causally controlling your clone when it is a perfect copy.

A Rationality Condition for CDT Is That It Equal EDT (Part 1)

2018-10-04T04:32:49.483Z · score: 21 (7 votes)
Comment by abramdemski on In Logical Time, All Games are Iterated Games · 2018-09-26T07:52:43.871Z · score: 3 (2 votes) · LW · GW

I have been thinking a bit about evolutionarily stable equilibria, now. Two things seem interesting (perhaps only as analogies, not literal applications of the evolutionarily stable equilibria concept):

  • The motivation for evolutionary equilibria involves dumb selection, rather than rational reasoning. This cuts the tricky knots of recursion. It also makes the myopic learning, which only pays attention to how well things perform in a single round, seem more reasonable. Perhaps there's something to be said about rational learning algorithms needing to cut the knots of recursion somehow, such that the evolutionary equilibrium concept holds a lesson for more reflective agents.
  • The idea of evolutionary stability is interesting because it mixes the game and the metagame together a little bit: the players should do what is good for them, but the resulting solution should also be self-enforcing, which means consideration is given to how the solution shapes the future dynamics of learning. This seems like a necessary feature of a solution.
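For concreteness, the "dumb selection" dynamic behind evolutionarily stable equilibria can be sketched with discrete replicator dynamics on a Hawk-Dove game. The payoff numbers are illustrative (resource value V=2, fight cost C=4, plus a baseline of 2 to keep fitnesses positive), chosen so the ESS share of hawks is V/C = 0.5:

```python
# Strategy shares grow in proportion to payoff -- no agent reasons
# about anything, yet the population converges to the mixed ESS.

def replicator_hawk_dove(x, steps):
    # x: current population share playing Hawk.
    for _ in range(steps):
        f_hawk = 4.0 - 3.0 * x   # baseline + expected hawk payoff
        f_dove = 3.0 - x         # baseline + expected dove payoff
        mean = x * f_hawk + (1.0 - x) * f_dove
        x = x * f_hawk / mean    # shares shift toward higher payoff
    return x

share = replicator_hawk_dove(0.1, 200)   # converges to roughly 0.5
```

The equilibrium is self-enforcing in exactly the sense above: at the 0.5 hawk share, both strategies earn equal payoff, so any drift in either direction is pushed back by the selection dynamic itself.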
Comment by abramdemski on Track-Back Meditation · 2018-09-25T20:10:38.128Z · score: 3 (2 votes) · LW · GW
Meta comment: are your upvotes worth 7 points?

Seems that way. I don't know what the exact formula is, but it is based on karma.

Comment by abramdemski on Track-Back Meditation · 2018-09-25T06:54:14.002Z · score: 3 (2 votes) · LW · GW
Ah, I wouldn't have expected that. Good to know!

Thinking about your framing from TMI, perhaps I'm supposed to put my awareness but not my attention on the distracting thoughts. The reason it is tempting to do more than this is to "fully integrate the part of me that wants to not be doing this" -- IE, put full awareness onto it to dialogue with it, and decide what I really want to do with the fullness of what I want right now.

In your experience, is it enough to have awareness on the stubbed toe, or is it necessary to put attention on it? You describe your attention going to the toe.

Comment by abramdemski on Track-Back Meditation · 2018-09-24T18:50:22.431Z · score: 3 (2 votes) · LW · GW
From my past experience, when I had more awareness I felt like I was way worse at things. I noticed every time I forgot something, and saw lots of dumb thoughts. But it seems likely to me now that that's just how things have always been, and that was one of the few times I was aware of it. Do you say your productivity was worse because you actually got fewer things done, or just because you noticed lots of awkward gaps and tangents that might've always been there, previously unnoticed?

I got less done.

The Power of Now is not very gears-y. It does get a little gears-y when it talks about concrete practices which aim to help you live in the now, but only a little. You have to fill in the gears. I generated very charitable interpretations as I was reading it. It's definitely a soft skills book.

It sounds like in your comment, you're saying one of these two things:

The comment you're referring to was making a more philosophical point that moments are only meaningful in their connection to other moments. I could have made the same point by pointing out that there is always a delay from when one neuron fires to when another fires in response. I was pushing against the strong Now-ish ontology in the power of now. Nonetheless, you response is not irrelevant, because part of what I was claiming was that one should do something much like "checking in" as you describe it, in order to see whether they were really conscious/aware/attentive of what was happening in the previous moment.

I would expect it to decrease your attention and increase your peripheral awareness. And more specifically to decrease your attention on thoughts, and increase your peripheral awareness of external senses. Does that sound right?

I think this is part of it, but another part of it is that The Power of Now recommends something like directly facing your suffering in order to transmute it into consciousness. This led me to nonlinguistically focus on any distracting feelings such as not wanting to be doing what I was doing and wanting to procrastinate instead. This was somewhat interesting, but disruptive of productivity.

Comment by abramdemski on Realism about rationality · 2018-09-24T18:38:09.419Z · score: 34 (12 votes) · LW · GW

Rationality realism seems like a good thing to point out which might be a crux for a lot of people, but it doesn't seem to be a crux for me.

I don't think there's a true rationality out there in the world, or a true decision theory out there in the world, or even a true notion of intelligence out there in the world. I work on agent foundations because there's still something I'm confused about even after that, and furthermore, AI safety work seems fairly hopeless while still so radically confused about the-phenomena-which-we-use-intelligence-and-rationality-and-agency-and-decision-theory-to-describe. And, as you say, "from a historical point of view I’m quite optimistic about using maths to describe things in general".

Comment by abramdemski on In Logical Time, All Games are Iterated Games · 2018-09-22T01:09:45.838Z · score: 8 (6 votes) · LW · GW

I agree that "functional time" makes sense, but somehow, I like "logical time" better. It brings out the paradox: logical truth is timeless, but any logical system must have a proof ordering, which gives rise to a notion of time based on what follows from what.

Comment by abramdemski on Track-Back Meditation · 2018-09-21T23:05:30.399Z · score: 4 (3 votes) · LW · GW

Since recently reading The Power of Now, which thoroughly and viscerally describes the perspective in which track-back would be harmful (because it distracts from the now), I want to elaborate on this some more.

The Power of Now did something interesting to my moment-to-moment awareness, but at least in the short term, it seemed to wreck my productivity. Returning to the track-back movement rather than the "now" type movement seems to help bring me back quite a bit.

No moment exists in isolation; anything which you can mentally label as a moment is an extended moment. Furthermore, although you can cultivate what feels like heightened states of awareness (as in The Power of Now), the only way to verify awareness is by checking whether you perceive more detail, and remember more accurately. To simultaneously be the observer and the observed is an illusion; you are always only observing a previous self, even if the delay is very slight.

So, checking whether you can remember what happened in your head over the past few seconds is a check of awareness. Furthermore, Shinzen Young suggests that awareness after-the-fact is in some sense as good for your practice, and more compatible with intellectual work. Finally, I find that paying attention to the train of thought/feeling which led to distraction is really helpful for maintaining focus and motivation.

I'm phrasing this as in opposition to "being in the now", but, I'm not sure to what extent I really mean that. I do think I learned things from The Power of Now; and, I'm not deeply experienced in either style.

Comment by abramdemski on In Logical Time, All Games are Iterated Games · 2018-09-21T20:24:53.323Z · score: 2 (1 votes) · LW · GW

If you have no memory, how can you learn? I recognize that you can draw a formal distinction, allowing learning without allowing the strategies being learned to depend on the previous games. But, you are still allowing the agent itself to depend on the previous games, which means that "learning" methods which bake in more strategy will perform better. For example, a learning method could learn to always go straight in a game of chicken by checking to see whether going straight causes the other player to learn to swerve. IE, it doesn't seem like a principled distinction.

Furthermore, I don't see the motivation for trying to do well in a single-shot game via iterated play. What kind of situation is it trying to model? This is discussed extensively in the paper I mentioned in the post, "If multi-agent learning is the answer, what is the question?"

In Logical Time, All Games are Iterated Games

2018-09-20T02:01:07.205Z · score: 83 (26 votes)
Comment by abramdemski on Track-Back Meditation · 2018-09-19T02:48:36.347Z · score: 2 (1 votes) · LW · GW

Yeah, one might think I'm going against the grain, recommending something that more experienced meditators warn against. On the other hand (and imho), we could take it as a warning against ordinary distractedness and ordinary unmindful involvement in thoughts. Focusing intentionally on one specific mental motion is very different.

Of course, that's for the goal of the book, which is about mindfulness meditation, which involves stabilizing your attention and strengthening your peripheral awareness.

The goal of mindfulness might be interpreted in different ways (and I haven't read that book yet), but under the interpretation of defusion, I think there's nothing particularly harmful about the track-back exercise. It is possible that it can get you caught up in history and therefore fused with the thoughts, but it is also possible that looking at the history helps put thoughts at a little distance.

For example, someone using mindfulness to deal with cigarette cravings (trying to quit) is supposed to pay mindful attention to the craving, and "ride the wave" until the craving is over. It is possible that tracking back to what gave rise to the craving helps contextualize it and thus put it at a remove ("I was stressed just now, and then I started having the craving"). It is also possible that it takes you away from moment-to-moment presence, and the next thing you know, you find yourself reaching for a cigarette. I don't know for sure.

If your goal is related to debiasing, though, I think it's a pretty good form of mindfulness: the question "why did I have that thought?" is closely related to epistemic hygiene. "Why am I thinking this plan is bad? Ah, I started out being annoyed at Ellen for her bad driving, and then she mentioned this plan. But, her driving is unrelated to this plan..."

Comment by abramdemski on Good Citizenship Is Out Of Date · 2018-09-12T21:13:50.990Z · score: 5 (3 votes) · LW · GW

This is similar to, but slightly different from, the story in Bowling Alone. (Disclaimer: I haven't read Bowling Alone, only had several discussions about it with someone who has.)

One very interesting question is: why were good citizenship norms at their peak in the 1930s-1950s?

According to Bowling Alone the answer is that there was a massive club-formation burst from the late 1800s to the early 1900s. These clubs created the strong social fabric which allowed trust in the society overall to be high.

Why was there a burst of club creation? I don't know.

Track-Back Meditation

2018-09-11T10:31:53.354Z · score: 57 (21 votes)
Comment by abramdemski on Comment on decision theory · 2018-09-11T08:16:35.339Z · score: 29 (10 votes) · LW · GW
When you think about the problem this way, there are no counterfactuals, only state evolution. It can be applied to the past, to the present or to the future.

This doesn't give very useful answers when the state evolution is nearly deterministic, as for an agent made of computer code.

For example, consider an agent trying to decide whether to turn left or turn right. Suppose for the sake of argument that it actually turns left, if you run physics forward. Also suppose that the logical uncertainty has figured that out, so that the best-estimate macrostate probabilities are mostly on that. Now, the agent considers whether to turn left or right.

Since the computation (as pure math) is deterministic, counterfactuals which result from supposing the state evolution went right instead of left mostly consist of computer glitches in which the hardware failed. This doesn't seem like what the agent should be thinking about when it considers the alternative of going right instead of left. For example, the grocery store it is trying to get to could be on the right-hand path. The potential bad results of a hardware failure might outweigh the desire to turn toward the grocery store, so that the agent prefers to turn left.

For this story to make sense, the (logical) certainty that the abstract algorithm decides to turn left in this case has to be higher than the confidence that hardware will not fail, so that turning right seems likely to imply hardware failure. This can happen due to Löb's theorem: the whole above argument, as a hypothetical argument, suggests that the agent would turn left on a particular occasion if it happened to prove ahead of time that its abstract algorithm would turn left (since it would then be certain that turning right implied a hardware failure). But this means a proof of left-turning results in left-turning. So, by Löb's theorem, left-turning is indeed provable.
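Schematically (my notation, not from the comment): write L for "the abstract algorithm turns left" and a box for provability.

```latex
% The hypothetical argument establishes the hypothesis of Löb's theorem:
% a proof of left-turning would make right-turning imply hardware failure,
% so the agent would in fact turn left.
\vdash \Box L \rightarrow L
% Löb's theorem then discharges the assumption:
\text{if } \vdash \Box L \rightarrow L \text{, then } \vdash L
```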

The Newcomb's-problem example you give also seems problematic. Again, if the agent's algorithm is deterministic, it does basically one thing as long as the initial conditions are such that it is in Newcomb's problem. So, essentially all of the uncertainty about the agent's action is logical uncertainty. I'm not sure exactly what your intended notion of counterfactual is, but, I don't see how reasoning about microstates helps the agent here.

Comment by abramdemski on Comment on decision theory · 2018-09-11T07:50:39.330Z · score: 74 (23 votes) · LW · GW
What are the biggest issues that haven't been solved for UDT or FDT?

UDT was a fairly simple and workable idea in classical Bayesian settings with logical omniscience (or with some simple logical uncertainty treated as if it were empirical uncertainty), but it was always intended to utilize logical uncertainty at its core. Logical induction, our current-best theory of logical uncertainty, doesn't turn out to work very well with UDT so far. The basic problem seems to be that UDT required "updates" to be represented in a fairly explicit way: you have a prior which already contains all the potential things you can learn, and an update is just selecting certain possibilities. Logical induction, in contrast, starts out "really ignorant" and adds structure, not just content, to its beliefs over time. Optimizing via the early beliefs doesn't look like a very good option, as a result.

FDT requires a notion of logical causality, which hasn't appeared yet.

What is a co-ordination problem that hasn't been solved?

Taking logical uncertainty into account, all games become iterated games in a significant sense, because players can reason about each other by looking at what happens in very close situations. If the players have T seconds to think, they can simulate the same game but given t<<T time to think, for many t. So, they can learn from the sequence of "smaller" games.
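The "games within games" structure can be sketched with a toy construction of my own (not from the post): an agent with thinking budget t simulates its opponent at budget t-1 and responds to what it sees there, a kind of tit-for-tat in logical time. The agent names and the cooperate-by-default base case are illustrative assumptions.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def move(agent, budget):
    """Move of `agent` ("A" or "B") given `budget` units of thinking time.

    With no time to think, default to cooperating. Otherwise, simulate the
    "smaller" game -- the opponent with one less unit of time -- and copy
    whatever the opponent does there (tit-for-tat in logical time).
    """
    if budget == 0:
        return "C"
    other = "B" if agent == "A" else "A"
    return move(other, budget - 1)

# The single-shot game unrolls into an iterated one along the budget axis;
# these two agents settle into mutual cooperation at every stage.
print([(move("A", t), move("B", t)) for t in range(4)])
# every entry is ('C', 'C')
```

The recursion plays the role of the t << T simulations: the history of "smaller" games is what each player conditions on, even though only one real game is ever played.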

This might seem like a good thing. For example, single-shot prisoner's dilemma has just a Nash equilibrium of defection. Iterated play has cooperative equilibria, such as tit-for-tat.

Unfortunately, the folk theorem of game theory implies that there are a whole lot of fairly bad equilibria for iterated games as well. It is possible that each player enforces a cooperative equilibrium via tit-for-tat-like strategies. However, it is just as possible for players to end up in a mutual blackmail double bind, as follows:

Both players initially have some suspicion that the other player is following strategy X: "cooperate 1% of the time if and only if the other player is playing consistently with strategy X; otherwise, defect 100% of the time." As a result of this suspicion, both players play via strategy X in order to get the 1% cooperation rather than 0%.
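To see that this is a genuine (if terrible) equilibrium, one can compute expected per-round payoffs. The numbers below are my illustrative assumptions, not from the comment: standard prisoner's-dilemma payoffs (T=5, R=3, P=1, S=0), and deviation from strategy X is assumed to be detected perfectly.

```python
# Standard prisoner's-dilemma payoffs: temptation, reward, punishment, sucker.
T, R, P, S = 5, 3, 1, 0
p = 0.01  # strategy X's cooperation probability

# Both players conform to strategy X: each cooperates independently with
# probability p, so the expected per-round payoff mixes all four outcomes.
conform = p * p * R + p * (1 - p) * S + (1 - p) * p * T + (1 - p) * (1 - p) * P

# A deviator is (by assumption) detected and met with 100% defection,
# so the best it can do is the mutual-defection payoff.
deviate = P

print(f"conform: {conform:.4f}  deviate: {deviate}")
# conform ≈ 1.0299, deviate = 1
```

Conforming pays just barely more than deviating (about 1.03 vs 1), which is why the mutual suspicion sustains itself even though the outcome is nearly as bad as constant defection.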

Ridiculously bad "coordination" like that can be avoided via cooperative oracles, but that requires everyone to somehow have access to such a thing. Distributed oracles are more realistic in that each player can compute them just by reasoning about the others, but players using distributed oracles can be exploited.

So, how do you avoid supremely bad coordination in a way which isn't too badly exploitable?

And what still isn't known about counterfactuals?

The problem of specifying good counterfactuals sort of wraps up any and all other problems of decision theory into itself, which makes this a bit hard to answer. Different potential decision theories may lean more or less heavily on the counterfactuals. If you lean toward EDT-like decision theories, the problem with counterfactuals is mostly just the problem of making UDT-like solutions work. For CDT-like decision theories, it is the other way around; the problem of getting UDT to work is mostly about getting the right counterfactuals!

The mutual-blackmail problem I mentioned in my "coordination" answer is a good motivating example. How do you ensure that the agents don't come to think "I have to play strategy X, because if I don't, the other player will cooperate 0% of the time?"

Exorcizing the Speed Prior?

2018-07-22T06:45:34.980Z · score: 11 (4 votes)

Stable Pointers to Value III: Recursive Quantilization

2018-07-21T08:06:32.287Z · score: 17 (7 votes)

Probability is Real, and Value is Complex

2018-07-20T05:24:49.996Z · score: 44 (20 votes)

Complete Class: Consequentialist Foundations

2018-07-11T01:57:14.054Z · score: 43 (16 votes)

Policy Approval

2018-06-30T00:24:25.269Z · score: 42 (15 votes)

Machine Learning Analogy for Meditation (illustrated)

2018-06-28T22:51:29.994Z · score: 93 (35 votes)

Confusions Concerning Pre-Rationality

2018-05-23T00:01:39.519Z · score: 36 (7 votes)


2018-05-21T21:10:57.290Z · score: 84 (23 votes)

Bayes' Law is About Multiple Hypothesis Testing

2018-05-04T05:31:23.024Z · score: 81 (20 votes)

Words, Locally Defined

2018-05-03T23:26:31.203Z · score: 50 (15 votes)

Hufflepuff Cynicism on Hypocrisy

2018-03-29T21:01:29.179Z · score: 33 (17 votes)

Learn Bayes Nets!

2018-03-27T22:00:11.632Z · score: 84 (24 votes)

An Untrollable Mathematician Illustrated

2018-03-20T00:00:00.000Z · score: 260 (89 votes)

Explanation vs Rationalization

2018-02-22T23:46:48.377Z · score: 31 (8 votes)

The map has gears. They don't always turn.

2018-02-22T20:16:13.095Z · score: 54 (14 votes)

Toward a New Technical Explanation of Technical Explanation

2018-02-16T00:44:29.274Z · score: 126 (44 votes)

Two Types of Updatelessness

2018-02-15T20:19:54.575Z · score: 45 (12 votes)

Two Types of Updatelessness

2018-02-15T20:16:41.000Z · score: 0 (0 votes)

Hufflepuff Cynicism on Crocker's Rule

2018-02-14T00:52:37.065Z · score: 36 (12 votes)

Hufflepuff Cynicism

2018-02-13T02:15:50.945Z · score: 43 (16 votes)

Stable Pointers to Value II: Environmental Goals

2018-02-09T06:03:00.244Z · score: 27 (8 votes)

Stable Pointers to Value II: Environmental Goals

2018-02-09T06:02:43.000Z · score: 0 (0 votes)

Two Coordination Styles

2018-02-07T09:00:18.594Z · score: 82 (28 votes)

All Mathematicians are Trollable: Divergence of Naturalistic Logical Updates

2018-01-28T14:50:25.000Z · score: 4 (4 votes)

An Untrollable Mathematician

2018-01-23T18:46:17.000Z · score: 8 (8 votes)

Policy Selection Solves Most Problems

2017-12-01T00:35:47.000Z · score: 2 (2 votes)

Timeless Modesty?

2017-11-24T11:12:46.869Z · score: 25 (7 votes)

Gears Level & Policy Level

2017-11-24T07:17:51.525Z · score: 85 (30 votes)

Where does ADT Go Wrong?

2017-11-17T23:31:44.000Z · score: 2 (2 votes)

The Happy Dance Problem

2017-11-17T00:50:03.000Z · score: 9 (3 votes)