Comment by dxu on Predictors exist: CDT going bonkers... forever · 2020-01-15T22:14:38.523Z · score: 5 (3 votes) · LW · GW

these examples can't actually happen, or are so rare that I'll pay that cost in order to have a simpler model for the other 99.9999% of my decisions

Indeed, if it were true that Newcomb-like situations (or more generally, situations where other agents condition their behavior on predictions of your behavior) do not occur with any appreciable frequency, there would be much less interest in creating a decision theory that addresses such situations.

But far from constituting a mere 0.0001% of possible situations (or some other, similarly minuscule percentage), Newcomb-like situations are simply the norm! Even in everyday human life, we frequently encounter other people and base our decisions off what we expect them to do—indeed, the ability to model others and act based on those models is integral to functioning as part of any social group or community. And it should be noted that humans do not behave as causal decision theory predicts they ought to—we do not betray each other in one-shot prisoner’s dilemmas, we pay people we hire (sometimes) well in advance of them completing their job, etc.

This is not mere “irrationality”; otherwise, there would have been no reason for us to develop these kinds of pro-social instincts in the first place. The observation that CDT is inadequate is fundamentally a combination of (a) the fact that it does not accurately predict certain decisions we make, and (b) the claim that the decisions we make are in some sense correct rather than incorrect—and if CDT disagrees, then so much the worse for CDT. (Specifically, the sense in which our decisions are correct—and CDT is not—is that our decisions result in more expected utility in the long run.)

All it takes for CDT to fail is the presence of predictors. These predictors don’t have to be Omega-style superintelligences—even moderately accurate predictors who perform significantly (but not ridiculously) above random chance can create Newcomb-like elements with which CDT is incapable of coping. I really don’t see any justification at all for the idea that these situations somehow constitute a superminority of possible situations, or (worse yet) that they somehow “cannot” happen. Such a claim seems to be missing the forest for the trees: you don’t need perfect predictors to have these problems show up; the problems show up anyway. The only purpose of using Omega-style perfect predictors is to make our thought experiments clearer (by making things more extreme), but they are by no means necessary.
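
To make the "moderately accurate predictor" point concrete, here is a quick expected-value calculation using the standard Newcomb payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box). The 60% accuracy figure is just an illustrative assumption; the point is that the crossover sits barely above random chance:

```python
# Expected value of one-boxing vs. two-boxing against a predictor
# with accuracy p, using the standard Newcomb payoffs.
SMALL = 1_000       # transparent box (always present)
BIG = 1_000_000     # opaque box (filled iff one-boxing was predicted)

def ev_one_box(p):
    # With probability p the predictor correctly foresaw one-boxing,
    # so the opaque box is full.
    return p * BIG

def ev_two_box(p):
    # The opaque box is full only if the predictor wrongly expected
    # one-boxing; the transparent box is collected either way.
    return (1 - p) * BIG + SMALL

# One-boxing wins whenever p * BIG > (1 - p) * BIG + SMALL,
# i.e. whenever p exceeds (BIG + SMALL) / (2 * BIG).
crossover = (BIG + SMALL) / (2 * BIG)
print(crossover)                          # 0.5005
print(ev_one_box(0.6) > ev_two_box(0.6))  # True: 60% accuracy already suffices
```

So a predictor needs only 50.05% accuracy before CDT's recommendation starts losing money in expectation; nothing Omega-like is required.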

Comment by dxu on Realism about rationality · 2020-01-14T23:35:31.356Z · score: 2 (1 votes) · LW · GW

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.

In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

Comment by dxu on Realism about rationality · 2020-01-13T23:33:17.345Z · score: 2 (1 votes) · LW · GW

Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.

Comment by dxu on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T04:27:05.281Z · score: 23 (10 votes) · LW · GW

Skimming through. May or may not post an in-depth comment later, but for the time being, this stood out to me:

I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.

I note that Yann has not actually specified a way of not "giving [the AI] moronic objectives with no safeguards". The argument of AI risk advocates is precisely that the thing in quotes in the previous sentence is difficult to do, and that people do not have to be "ridiculously stupid" to fail at it--as evidenced by the fact that no one has actually come up with a concrete way of doing it yet. It doesn't look to me like Yann addressed this point anywhere; he seems to be under the impression that repeating his assertion more emphatically (obviously, when we actually get around to building the AI, we'll use our common sense and build it right) somehow constitutes an argument in favor of said assertion. This seems to be an unusually low-quality line of argument from someone who, from what I've seen, is normally much more clear-headed than this.

Comment by dxu on What explanatory power does Kahneman's System 2 possess? · 2019-08-12T17:52:50.266Z · score: 6 (4 votes) · LW · GW

I'm curious as to what prompted this question?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T17:50:52.400Z · score: 4 (2 votes) · LW · GW

I have pointed out what people worry they are going to lose under determinism. Yes, they are only going to have those things under nondeterminism.

You just said that nondeterminist intuitions are only mistaken if determinism is true and compatibilism is false. So what exactly is being lost if you subscribe to both determinism and compatibilism?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T00:08:34.105Z · score: 2 (3 votes) · LW · GW

You're mixing levels. If someone can alter their decisions, that implies there are multiple possible next states of the universe

This is incorrect. It's possible to imagine a counterfactual state in which the person in question differs from their actual self in an unspecified manner, which thereby causes them to make a different decision; this counterfactual state differs from reality, but it is by no means incoherent. Furthermore, the comparison of various counterfactual futures of this type is how decision-making works; it is an abstraction used for the purpose of computation, not something ontologically fundamental to the way the universe works--and the fact that some people insist it be the latter is the source of much confusion. This is what I meant when I wrote:

Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here.

So there is no "mixing levels" going on here, as you can see; rather, I am specifically making sure to keep the levels apart, by not tying the mental process of imagining and assessing various potential outcomes to the physical question of whether there are actually multiple physical outcomes. In fact, the one who is mixing levels is you, since you seem to be assuming for some reason that the mental process in question somehow imposes itself onto the laws of physics.

(Here is a thought experiment: I think you will agree that a chess program, if given a chess position and run for a prespecified number of steps, will output a particular move for that position. Do you believe that this fact prevents the chess program from considering other possible moves it might make in the position? If so, how do you explain the fact that the chess program explicitly contains a game tree with multiple branches, the vast majority of which will not in fact occur?)
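
The thought experiment can be made fully explicit with a toy search (not chess, just an abstract two-ply game with invented payoffs): the program deterministically returns one move, yet computing that move necessarily enumerates branches that will never be played.

```python
# A deterministic minimax search over a toy game tree. The function
# always returns the same move for the same tree, yet computing that
# move requires evaluating every branch -- including all the moves
# that will never in fact occur.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):  # leaf: a payoff
        return node, None
    values = [minimax(child, not maximizing)[0] for child in node]
    best = max(values) if maximizing else min(values)
    return best, values.index(best)

# Payoffs are arbitrary illustrative numbers: three candidate moves,
# each with two possible replies.
tree = [[3, 5], [2, 9], [0, 1]]
value, move = minimax(tree)
print(move)  # prints 0: the single move that is actually output
```

The two branches not chosen were "considered" in every meaningful sense, even though the output was determined from the start.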

There are various posts in the sequences that directly address this confusion; I suggest either reading them or re-reading them, depending on whether you have already.

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T20:29:55.829Z · score: 2 (1 votes) · LW · GW

But this is map, not territory.

Certainly. Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here. Some people may find the idea of decision-making being anything but a fundamental, ontologically primitive process somehow unsatisfying, or even disturbing, but I submit that this is a problem with their intuitions, not with the underlying viewpoint.

(If someone goes so far as to alter their decisions based on their belief in determinism--say, by lounging on the couch watching TV all day rather than being productive, because their doing so was "predetermined"--I would say that they are failing to utilize their brain's decision-making apparatus. (Or rather, that they are not using it very well.) This has nothing to do with free will, determinism, or anything of the like; it is simply a (causal) consequence of the fact that they have misinterpreted what it means to be an agent in a deterministic universe.)

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T17:23:08.715Z · score: 1 (2 votes) · LW · GW

Belief in determinism is correlated with worse outcomes, but one doesn't cause the other; both are determined by the state and process of the universe.

Read literally, you seem to be suggesting that a deterministic universe doesn't have cause and effect, only correlation. But this reading seems prima facie absurd, unless you're using a very non-standard notion of "cause and effect". Are you arguing, for example, that it's impossible to draw a directed acyclic graph in order to model events in a deterministic universe? If not, what are you arguing?
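
For what it's worth, nothing about determinism obstructs causal modeling. Here is a minimal deterministic structural model (the variables and mechanisms are invented for illustration) in which "belief" is a genuine cause of "outcome" rather than a mere correlate of it:

```python
# A deterministic structural causal model: every variable is a fixed
# function of its parents, yet the graph is still a DAG with genuine
# cause-effect edges. "Cause" here means: intervening on a parent
# while holding everything else fixed changes its children.
parents = {"state": [], "belief": ["state"], "outcome": ["belief", "state"]}

def run(state, belief_override=None):
    # By default, belief is determined by the prior state of the world.
    belief = state > 0 if belief_override is None else belief_override
    # Outcome is determined jointly by belief and state.
    outcome = (10 if belief else 0) + state
    return outcome

# Intervening on 'belief' while holding 'state' fixed changes 'outcome':
print(run(5, belief_override=True))   # 15
print(run(5, belief_override=False))  # 5
```

Everything in this model is determined, and yet "belief causes outcome" is straightforwardly true under the interventionist reading, which is the standard one for DAG-based causal models.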

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T01:13:29.140Z · score: 3 (3 votes) · LW · GW

We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.

I'm afraid this sentence doesn't parse for me. You seem to be speaking of "results" as something to which the concept of rewards and punishments is applicable. However, I'm not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I've encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there's something else you're referring to when you say "reward or punish the results", I would appreciate it if you clarified what exactly that thing is.

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T00:46:30.824Z · score: 25 (13 votes) · LW · GW

You should neither reward nor punish strategies or attempts at all, but results.

This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don't see why it's obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence--namely, "results"--and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate--namely, "strategies".

I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we "meant" to reward or punish, our rewards/punishments may end up impacting aspects of the agent's behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)

The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence--a result which sometimes goes by a more well-known name. This then suggests that punishing results over strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.
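
A toy simulation of the dilution effect (all numbers invented for illustration): when the "result" is the strategy's expected value plus noise from the intervening causal links, rewarding on a single observed result frequently rewards the genuinely worse strategy.

```python
import random

random.seed(0)

# Strategy A has strictly higher expected value than strategy B, but
# results are separated from strategies by noisy causal links.
def result(ev, noise=5.0):
    # Observed result = true expected value + noise accumulated
    # along the causal chain between strategy and outcome.
    return ev + random.gauss(0, noise)

# How often does rewarding on results reward the worse strategy?
trials = 10_000
misrewards = sum(result(ev=1.0) < result(ev=0.0) for _ in range(trials))
rate = misrewards / trials
print(rate)  # roughly 0.44: the worse strategy "wins" almost half the time
```

With noise this large relative to the true difference in expectation, result-based rewards carry barely more signal than a coin flip, which is precisely the Goodhart-style failure described above.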

(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations--which are often bottlenecked on cognitive effort--generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart's Law.))

Comment by dxu on Drive-By Low-Effort Criticism · 2019-07-31T18:20:01.220Z · score: 35 (10 votes) · LW · GW

It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value.

Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of "making things that have little or no (or even negative) value" as a primitive action--something that can be "encouraged" or "discouraged"--whereas, on the other hand, the true primitive action here is spending effort in the first place, and actions taken to disincentivize the former will in fact turn out to disincentivize the latter.

If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be "yes, we ought"), but rather, whether we ought to discourage posters from spending effort. To this, you say:

But there is no virtue in mere effort.

Perhaps there is no "virtue" in effort, but in that case we must ask why "virtue" is the thing we are measuring. If the goal is to maximize, not "virtue", but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it's hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.

And what does it mean to "encourage" or "discourage" a poster? Based on the following part of your comment, it seems that you are taking "discourage" to mean something along the lines of "point out ways in which the post in question is mistaken":

If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).

But how often is it the case that a "long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced" is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations--in other words, that the hypothetical situation you describe does not reflect reality.

What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called "drive-by low-effort criticism" described in the OP. It may be that you disagree with this, but whether it is true or not is a matter of factual accuracy, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.

Comment by dxu on FactorialCode's Shortform · 2019-07-31T00:02:55.131Z · score: 4 (2 votes) · LW · GW

By default on reddit and lesswrong, posts start with 1 karma, coming from the user upvoting themselves.

Actually, on LessWrong, I'm fairly sure the karma value of a particular user's regular vote depends on the user's existing karma score. Users with a decent karma total usually have a default vote value of 2 karma rather than 1, so each comment they post will have 2 karma to start. Users with very high karma totals seem to have a vote that's worth 3 karma by default. Something similar happens with strong votes, though I'm not sure what kind of math is used there.

Aside: I've sometimes thought that users should be allowed to pick a value for their vote that's anywhere between 1 and the value of their strong upvote, instead of being limited to either a regular vote (2 karma in my case) or a strong vote (6 karma). In my case, I literally can't give people karma values of 1, 3, 4, or 5, which could be useful for more granular valuations.
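
The proposal in the aside could be sketched as a simple clamp. To be clear, the karma thresholds and vote values below are invented placeholders, not LessWrong's actual implementation (which, as noted, I don't know the details of):

```python
# Hypothetical vote-weight model: a user may choose any integer weight
# between 1 and their strong-vote value, rather than being limited to
# two presets. The karma -> strong-vote mapping is an invented guess.
def strong_vote_value(user_karma):
    if user_karma >= 25_000:
        return 10
    if user_karma >= 1_000:
        return 6
    return 3

def choose_vote(user_karma, requested):
    # Clamp the requested weight into [1, strong-vote cap].
    cap = strong_vote_value(user_karma)
    return max(1, min(requested, cap))

print(choose_vote(2_000, 4))   # 4: an intermediate value, allowed under the proposal
print(choose_vote(2_000, 99))  # 6: clamped to the strong-vote cap
```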

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:44:06.908Z · score: 11 (3 votes) · LW · GW

The part about climate science seems like a pretty bog-standard outside view argument, which in turn means I find it largely uncompelling. Yes, there are people who are so stupid, they can only be saved from their own stupidity by executing an epistemic maneuver that works regardless of the intelligence of the person executing it. This does not thereby imply that everyone should execute the same maneuver, including people who are not that stupid, and therefore not in need of saving. If someone out there is so incompetent that they mistakenly perceive themselves as competent, then they are already lost, and the fact that an illegal (from the perspective of normative probability theory) epistemic maneuver exists which would save them if they executed it, does not thereby make that maneuver a normatively good move. (And even if it were, it's not as though the people who would actually benefit from said maneuver are going to execute it--the whole reason that such people are loudly, confidently mistaken is that they don't take the outside view seriously.)

In short: there is simply no principled justification for modesty-based arguments, and--though it may be somewhat impolite to say--I agree with Eliezer that people who find such arguments compelling are actually being influenced by social modesty norms (whether consciously or unconsciously), rather than any kind of normative judgment. Based on various posts that Scott has written in the past, I would venture to say that he may be one of those people.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:24:50.585Z · score: 4 (2 votes) · LW · GW

This is a fictional dialogue demonstrating a meta-level point about how discourse works, and your comment is pretty off-topic.

I think that if a given "meta-level point" has obvious ties to existing object-level discussions, then attempting to suppress the object-level points when they're raised in response is pretty disingenuous. (What I would actually prefer is for the person making the meta-level point to be the same person pointing out the object-level connection, complete with "and here is why I feel this meta-level point is relevant to the object level". If the original poster doesn't do that, then it does indeed make comments on the object-level issues seem "off-topic", a fact which ought to be laid at the feet of the original poster for not making the connection explicit, rather than at the feet of the commenter, who correctly perceived the implications.)

Now, perhaps it's the case that your post actually had nothing to do with the conversations surrounding EA or whatever. (I find this improbable, but that's neither here nor there.) If so, then you as a writer ought to have picked a different example, one with fewer resemblances to the ongoing discussion. (The example Jeff gave in his top-level comment, for example, is not only clearer and more effective at conveying your "meta-level point", but also bears significantly less resemblance to the controversy around EA.) The fact that the example you chose so obviously references existing discussions that multiple commenters pointed it out is evidence that either (a) you intended for that to happen, or (b) you really didn't put a lot of thought into picking a good example.

Comment by dxu on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-24T18:24:52.143Z · score: 2 (1 votes) · LW · GW

In contrast, my model has been that communities congregate around predictable sources of high-quality writing, and people who can produce high-quality content in high volume are very rare. Thus, once Eliezer Yudkowsky stopped being active, and Yvain a.k.a. the immortal Scott Alexander moved to Slate Star Codex (in part so that he could write about politics, which we've traditionally avoided), all the "intellectual energy" followed Scott to SSC.

First, I want to state that I agree with this model. However, I also want to note that the SSC comments section tends to have fairly low-quality discussion (in comparison to the OB/LW 1.0 heyday), and I'm not sure why this is; candidate hypotheses include that Scott's explicit politics attracted people with lower epistemic standards, or that the lack of an explicit karma system allowed low-quality discussion to persist (but I don't think OB had an explicit karma system either?).

Overall, I'm unsure as to what kind of norms/technology maintains high-quality discussion (as opposed to just the presence of discussion in general), and it's plausible to me that the two may actually be somewhat mutually exclusive (in the sense that norms/technology designed to promote the volume of high-quality discussion may in fact reduce the volume of discussion in general). It's not clear to me how this tradeoff should be balanced.

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-22T18:14:46.664Z · score: 2 (1 votes) · LW · GW

A functional duplicate of an entity that reports having such-and-such a quale will report having it even if it doesn't.

In that case, there's no reason to think anyone has qualia. The fact that lots of people say they have qualia, doesn't actually mean anything, because they'd say so either way; therefore, those people's statements do not constitute valid evidence in favor of the existence of qualia. And if people's statements don't constitute evidence for qualia, then the sum total of evidence for qualia's existence is... nothing: there is zero evidence that qualia exist.

So your interpretation is self-defeating: there is no longer a need to explain qualia, because there's no reason to suppose that they exist in the first place. Why try and explain something that doesn't exist?

On the other hand, it remains an empirical fact that people do actually talk about having "conscious experiences". This talk has nothing to do with "qualia" as you've defined the term, but that doesn't mean it's not worth investigating in its own right, as a scientific question: "What is the physical cause of people's vocal cords emitting the sounds corresponding to the sentence 'I'm conscious of my experience'?" What the generalized anti-zombie principle says is that the answer to this question, will in fact explain qualia--not the concept that you described or that David Chalmers endorses (which, again, we have literally zero reason to think exists), but the intuitive concept that led philosophers to coin the term "qualia" in the first place.

Comment by dxu on Rationality is Systematized Winning · 2019-07-22T02:12:33.686Z · score: 11 (4 votes) · LW · GW

You're confusing ends with means, terminal goals with instrumental goals, morality with decision theory, and about a dozen other ways of expressing the same thing. It doesn't matter what you consider "good", because for any fixed definition of "good", there are going to be optimal and suboptimal methods of achieving goodness. Winning is simply the task of identifying and carrying out an optimal, rather than suboptimal, method.

Comment by dxu on Why it feels like everything is a trade-off · 2019-07-18T17:00:18.813Z · score: 5 (3 votes) · LW · GW

Good post. Seems related to (possibly the same concept as) why the tails come apart.

Comment by dxu on The AI Timelines Scam · 2019-07-12T21:55:21.046Z · score: 17 (8 votes) · LW · GW

There are strong prior reasons to think that it's better for the public to have better beliefs about AI strategy.

That may be, but note that the word "prior" is doing basically all of the work in this sentence. (To see this, just replace "AI strategy" with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence--and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it's worth realizing Alice probably has some specific inside view reasons to believe that's the case. And, if those inside view arguments happen to require an understanding of the topic that Alice believes to be dangerous, then Alice's hands are now tied: she's both unable to share information about something, and unable to explain why she can't share that information.

Naturally, this doesn't just make Alice's life more difficult: if you're someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you're forced to resort to just trusting Alice. If you don't have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.

I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being "in the know"--she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I'm particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.

Secondly, however: in addition to being unfair to Alice, there are more subtle effects that such a policy may have. In particular, if Alice feels pressured to disclose the reasons she can't disclose things, that may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F "walls"). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.

Given all of this, I don't think it's obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)

To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It's not clear to me which of these worlds we actually live in, and I don't think you've done a sufficient job of arguing that we live in the former world instead of the latter.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T23:31:36.524Z · score: 1 (2 votes) · LW · GW

would react to a wound but not pass the mirror test

I mean, reacting to a wound doesn't demonstrate that they're actually experiencing pain. If experiencing pain actually requires self-awareness, then an animal could be perfectly capable of avoiding damaging stimuli without actually feeling pain from said stimuli. I'm not saying that's actually how it works, I'm just saying that reacting to wounds doesn't demonstrate what you want it to demonstrate.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T21:31:36.832Z · score: 2 (1 votes) · LW · GW

It can be aware of an experience it's having, even if it's not aware that it is the one having the experience

I strongly suspect this sentence is based on a confused understanding of qualia.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T20:02:19.011Z · score: 2 (1 votes) · LW · GW

these methods lack enough feedback to enable self-awareness

Although I think this is plausibly the case, I'm far from confident that it's actually true. Are there any specific limitations you think play a role here?

Comment by dxu on The AI Timelines Scam · 2019-07-11T17:57:00.619Z · score: 24 (9 votes) · LW · GW

I agree that it's difficult (practically impossible) to engage with a criticism of the form "I don't find your examples compelling", because such a criticism is in some sense opaque: there's very little you can do with the information provided, except possibly add more examples (which is time-consuming, and also might not even work if the additional examples you choose happen to be "uncompelling" in the same way as your original examples).

However, there is a deeper point to be made here: presumably you yourself only arrived at your position after some amount of consideration. The fact that others appear to find your arguments (including any examples you used) uncompelling, then, usually indicates one of two things:

  1. You have not successfully expressed the full chain of reasoning that led you to originally adopt your conclusion (owing perhaps to constraints on time, effort, issues with legibility, or strategic concerns). In this case, you should be unsurprised at the fact that other people don't appear to be convinced by your post, since your post does not present the same arguments/evidence that convinced you yourself to believe your position.

  2. You do, in fact, find the raw examples in your post persuasive. This would then indicate that any disagreement between you and your readers is due to differing priors, i.e. evidence that you would consider sufficient to convince yourself of something, does not likewise convince others. Ideally, this fact should cause you to update in favor of the possibility that you are mistaken, at least if you believe that your interlocutors are being rational and intellectually honest.

I don't know which of these two possibilities it actually is, but it may be worth keeping this in mind if you make a post that a bunch of people seem to disagree with.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T23:54:05.616Z · score: 6 (3 votes) · LW · GW

It's a common belief, but it appears to me quite unfounded, since it hasn't happened in millennia of trying. So, a direct observation speaks against this model.


It's another common belief, though separate from the belief of reality. It is a belief that this reality is efficiently knowable, a bold prediction that is not supported by evidence and has hints to the contrary from the complexity theory.


General Relativity plus the standard model of the particle physics have stood unchanged and unchallenged for decades, the magic numbers they require remaining unexplained since the Higgs mass was predicted a long time ago. While this suggests that, yes, we will probably never stop being surprised by the universe observations, I make no such claims.

I think at this stage we have finally hit upon a point of concrete disagreement. If I'm interpreting you correctly, you seem to be suggesting that because humans have not yet converged on a "Theory of Everything" after millennia of trying, this is evidence against the existence of such a theory.

It seems to me, on the other hand, that our theories have steadily improved over those millennia (in terms of objectively verifiable metrics like their ability to predict the results of increasingly esoteric experiments), and that this is evidence in favor of an eventual theory of everything. That we haven't converged on such a theory yet is simply a consequence, in my view, of the fact that the correct theory is in some sense hard to find. But to postulate that no such theory exists is, I think, not only unsupported by the evidence, but actually contradicted by it--unless you're interpreting the state of scientific progress quite differently than I am.*

That's the argument from empirical evidence, which (hopefully) allows for a more productive disagreement than the relatively abstract subject matter we've discussed so far. However, I think one of those abstract subjects still deserves some attention--in particular, you expressed further confusion about my use of the word "coincidence":

I am still unsure what you mean by coincidence here. The dictionary defines it as "A remarkable concurrence of events or circumstances without apparent causal connection." and that open a whole new can of worms about what "apparent" and "causal" mean in the situation we are describing, and we soon will be back to a circular argument of implying some underlying reality to explain why we need to postulate reality.

I had previously provided a Tabooed version of my statement, but perhaps even that was insufficiently clear. (If so, I apologize.) This time, instead of attempting to make my statement even more abstract, I'll try taking a different tack and making things more concrete:

I don't think that, if our observations really were impossible to model completely accurately, we would be able to achieve the level of predictive success we have. The fact that we have managed to achieve some level of predictive accuracy (not 100%, but some!) strongly suggests to me that our observations are not impossible to model--and I say this for a very simple reason:

How can it be possible to achieve even partial accuracy at predicting something that is purportedly impossible to model? We can't have done it by actually modeling the thing, of course, because by hypothesis the thing cannot be modeled. So our seeming success at predicting the thing must not actually be due to any kind of successful modeling of said thing. Then how is it that our model is producing seemingly accurate predictions? It seems as though we are in a similar position to a lazy student who, upon being presented with a test they didn't study for, is forced to guess the right answers--except that in our case, the student somehow gets lucky enough to choose the correct answer every time, despite the fact that they are merely guessing rather than working out the answers the way they should.

I think that the word "coincidence" is a decent way of describing the student's situation in this case, even if it doesn't fully accord with your dictionary's definition (after all, whoever said the dictionary editors have the sole power to determine a word's usage?)--and analogously, our model of the thing must also only be making correct predictions by coincidence, since we've ruled out the possibility, a priori, that it might actually be correctly modeling the way the thing works.

I find it implausible that our models are actually behaving this way with respect to the "thing"/the universe, in precisely the same way I would find it implausible that a student who scored 95% on a test had simply guessed on all of the questions. I hope that helps clarify what I meant by "coincidence" in this context.
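To put rough numbers on the test analogy (the figures below are my own illustrative assumptions, not anything from the discussion itself): the chance of a pure guesser scoring that well follows directly from the binomial distribution.

```python
from math import comb

def prob_score_at_least(k: int, n: int, p: float) -> float:
    """Probability of getting >= k of n questions right when each
    answer is an independent guess with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical test: 20 multiple-choice questions, 4 options each,
# so a guesser succeeds on each question with probability 0.25.
# Scoring 95% means getting at least 19 of 20 right.
chance = prob_score_at_least(19, 20, 0.25)
print(f"{chance:.2e}")  # well below one in a billion
```

At some point, "the student got lucky" simply stops being a live hypothesis--and the same goes for a model that keeps successfully predicting a purportedly unmodelable universe.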

*You did say, of course, that you weren't making any claims or postulates to that effect. But it certainly seems to me that you're not completely agnostic on the issue--after all, your initial claim was "it's models all the way down", and you've fairly consistently stuck to defending that claim throughout not just this thread, but your entire tenure on LW. So I think it's fair to treat you as holding that position, at least for the sake of a discussion like this.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T18:15:59.693Z · score: 2 (1 votes) · LW · GW

(Okay, I've been meaning to get back to you on this for a while, but for some reason haven't until now.)

It seems, based on what you're saying, that you're taking "reality" to mean some preferred set of models. If so, then I think I was correct that you and I were using the same term to refer to different concepts. I still have some questions for you regarding your position on "reality" as you understand the term, but I think it may be better to defer those until after I give a basic rundown of my position.

Essentially, my belief in an external reality, if we phrase it in the same terms we've been using (namely, the language of models and predictions), can be summarized as the belief that there is some (reachable) model within our hypothesis space that can perfectly predict further inputs. This can be further repackaged into an empirical prediction: I expect that (barring an existential catastrophe that erases us entirely) there will eventually come a point when we have the "full picture" of physics, such that no further experiments we perform will ever produce a result we find surprising. If we arrive at such a model, I would be comfortable referring to that model as "true", and the phenomena it describes as "reality".

Initially, I took you to be asserting the negation of the above statement--namely, that we will never stop being surprised by the universe, and that our models, though they might asymptotically approach a rate of 100% predictive success, will never quite get there. It is this claim that I find implausible, since it seems to imply that there is no model in our hypothesis space capable of predicting further inputs with 100% accuracy--but if that is the case, why do we currently have a model with >99% predictive accuracy? Is the success of this model a mere coincidence? It must be, since (by assumption) there is no model actually capable of describing the universe. This is what I was gesturing at with the "coincidence" hypothesis I kept mentioning.

Now, perhaps you actually do hold the position described in the above paragraph. (If you do, please let me know.) But based on what you wrote, it doesn't seem necessary for me to assume that you do. Rather, you seem to be saying something along the lines of, "It may be tempting to take our current set of models as describing how reality ultimately is, but in fact we have no way of knowing this for sure, so it's best not to assume anything."

If that's all you're saying, it doesn't necessarily conflict with my view (although I'd suggest that "reality doesn't exist" is a rather poor way to go about expressing this sentiment). Nonetheless, if I'm correct about your position, then I'm curious what you think it's useful for. Presumably it doesn't help make any predictions (almost by definition), so I assume you'd say it's useful for dissolving certain kinds of confusion. Any examples, if so?

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-10T17:39:18.162Z · score: 4 (2 votes) · LW · GW

The linked post is the last in a series of posts, the first of which has been linked here in the past. I recommend that anyone who reads the post shminux linked, also read the LW discussion of the post I just linked, as it seems to me that many of the arguments therein are addressed in a more than satisfactory manner. (In particular, I strongly endorse Jessica Taylor's response, which is as of this writing the most highly upvoted comment on that page.)

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-07T17:38:12.343Z · score: 2 (1 votes) · LW · GW
Suppose I have a training set of articles which are labeled "biased" or "unbiased". I then train a system (using this set), and later use it to label articles "biased" or "unbiased". Will this lead to a manipulative system?

Mostly I would expect such a system to overfit on the training data, and perform no better than chance when tested. The reason for this is that unlike your example, where cats and dogs are (fairly) natural categories with simple distinguishing characteristics, the perception of "bias" in news articles is fundamentally tied to human psychology, and as a result is a much more complicated concept to learn than catness versus dogness. By default I would expect an offline training method to completely fail at learning said concept.

Reinforcement learning, meanwhile, will indeed become manipulative (in my expectation). In a certain sense you can view this as a form of overfitting as well, except that the system learns to exploit peculiarities of the humans performing the classification, rather than simply peculiarities of the articles in its training data. (As you might imagine, the former is far more dangerous.)

Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:50:15.693Z · score: 2 (1 votes) · LW · GW
What's an actual mind?

My philosophy of mind is not yet advanced enough to answer this question. (However, the fact that I am unable to answer a question at present does not imply that there is no answer.)

How do you know that a dog has it?

In a certain sense, I don't. However, I am reasonably confident that whatever actually constitutes having a mind, enough of it is shared between the dog and myself that if the dog turns out not to have a mind, then I also do not have a mind. Since I currently believe I do, in fact, have a mind, it follows that I believe the dog does as well.

(Perhaps you do not believe dogs have minds. In that case, the correct response would be to replace the dog in the thought experiment with something you do believe has a mind--for example, a close friend or family member.)

Would you care about an alien living creature that has a different mind-design and doesn't feel qualia?

Most likely not, though I remain uncertain enough about my own preferences that what I just said could be false.

Anyway, if you have no reason to think that the element is absent, then you'll believe that it's present. It's precisely because you feel that something is (or will be) missing, you refuse the offer. You do have some priors about what consequences will be produced by your choice, and that's OK. Nothing incoherent in refusing the offer. That is, if you do have reasons to believe that that's the case.

I agree with this, but it seems not to square with what you wrote originally:

Do you still think that taking a dollar is the wrong choice, even though literally nothing changes afterwards? If you do, do you think it’s a rational choice? Or is your S1 deluding you?
Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:24:38.383Z · score: 3 (2 votes) · LW · GW
I used the words “truly real”. The dog doesn’t matter, the consequences of the phenomenon that you call “dog” matter.

Wrong. This misses the point of the thought experiment entirely, which is precisely that people are allowed to care about things that aren't detectable by any empirical test. If someone is being tortured in a spaceship that's constantly accelerating away from me, such that the ship and I cannot interact with each other even in principle, I can nonetheless hold that it would be morally better if that person were rescued from their torture (though I myself obviously can't do the rescuing). There is nothing incoherent about this.

In the case of the dog, what matters to me is the dog's mental state. I do not care that I observe a phenomenon exhibiting dog-like behavior; I care that there is an actual mind producing that behavior. If the dog wags its tail to indicate contentment, I want the dog to actually feel content. If the dog is actually dead and I'm observing a phantom dog, then there is no mental state to which the dog's behavior is tied, and hence a crucial element is missing--even if that element is something I can't detect even in principle, even if I myself have no reason to think the element is absent. There is nothing incoherent about this, either.

Fundamentally, you seem to be demanding that other people toss out any preferences they may have that do not conform to the doctrine of the logical positivists. I see no reason to accede to this demand, and as there is nothing in standard preference theory that forces me to accede, I think I will continue to maintain preferences whose scope includes things that actually exist, and not just things I think exist.

Comment by dxu on Everybody Knows · 2019-07-05T16:01:19.482Z · score: 7 (5 votes) · LW · GW

Sometimes being overly literal is a useful way to point out hidden assumptions/inferences that the interlocutor attempted to sneak into the conversation (whether consciously or subconsciously).

Comment by dxu on Let's Read: an essay on AI Theology · 2019-07-05T15:57:15.611Z · score: 2 (1 votes) · LW · GW
the earring provides a kind of slow mind-uploading

This is only true if whatever (hyper)computation the earring is using to make recommendations contains a model of the wearer. Such a model could be interpreted as a true upload, in which case it would be true that the wearer's mind is not actually destroyed.

However, if the earring's predictions are made by some other means (which I don't think is impossible even in real life--predictions are often made without consulting a detailed, one-to-one model of the thing being predicted), then there is no upload, and the wearer has simply been taken over like a mindless puppet.

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-02T20:12:29.631Z · score: 3 (2 votes) · LW · GW
Typical unbiased newsfeeds in the real world are created by organizations with bias who have an interest in spreading biased news.

I think the word "unbiased" there may be a typo; your statement would make a lot more sense if the word you meant to put there was actually "biased". Assuming it's just a typo:

You're correct that in the real world, most sources of biased news are that way because they are deliberately engineered to be so, and not because of problems with AI optimizing proxy goals. That being said, it's important to point out that even if there existed a hypothetical organization with the goal of combating bias in news articles, they wouldn't be able to do so by training a machine learning system, since (as the article described) most attempts to do so end up falling prey to various forms of Goodhart's Law. So in a certain sense, the intentions of the underlying organization are irrelevant, because they will encounter this problem regardless of whether they care about being unbiased.

More generally, the newsfeed example is one way to illustrate a larger point, which is that by default, training an ML system to perform tasks involving humans will incentivize the system to manipulate those humans. This problem shows up regardless of whether the person doing the training actually wants to manipulate people, which makes it a separate issue from the fact that certain organizations engage in manipulation.

(Also, it's worth noting that even if you do want to manipulate people, generally you want to manipulate them toward some specific end. A poorly trained AI system, on the other hand, might end up manipulating them in essentially arbitrary ways that have nothing to do with your goal. In other words, even if you want to use AI "for evil", you still need to figure out how to make it do what you want it to do.)

This is the essence of the alignment problem in a nutshell, and it's why I asked whether you had any alternative training procedures in mind.

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-02T18:31:25.721Z · score: 5 (3 votes) · LW · GW
the human also wanted it to be manipulative

Since the article did not explicitly impute any motives to the programmers of the AI system, you must have somehow inferred the quoted claim from their described behavior. The basis of such an inference can only be the procedure they used to train the AI, since, after all, no other specific behavior was described. This then implies the following:

You believe that the act of training an optimizer to maximize the number of news articles labeled "unbiased" is, in fact, a deliberate attempt at creating a subtly manipulative newsfeed. Corollary: someone who was not attempting to create a manipulative newsfeed--someone who really, truly cared about making sure their articles are unbiased--would not have implemented this training procedure, but rather some alternative procedure that is not as prone to producing manipulative behavior.

What alternative procedure do you have in mind here?

Comment by dxu on Is AlphaZero any good without the tree search? · 2019-07-02T17:24:29.149Z · score: 2 (1 votes) · LW · GW
It is International Master-level without tree search. Good amateur

International masters are emphatically not amateurs. Indeed, IMs are at the level where they can offer coaching services to amateur players, and reasonably expect to be paid something on the order of $100 per session. To elaborate on this point:

there are >1000 players in the world that are better.

The total number of FIDE-rated chess players is over 500,000. The number of IMs, meanwhile, totals less than 3,000. IMs are quite literally in the 99th percentile of chess ability, and that's actually being extremely restrictive with the population--there are many casual players who don't have FIDE ratings at all, since only people who play in at least one FIDE-rated tournament will be assigned a rating.
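For concreteness, the percentile claim follows directly from the two (approximate) figures above:

```python
ims = 3_000        # rough count of International Masters
rated = 500_000    # rough count of FIDE-rated players
top_fraction = ims / rated
percentile = 100 * (1 - top_fraction)
print(f"IMs are at most the top {top_fraction:.1%} of rated players,")
print(f"i.e. at or above the {percentile:.1f}th percentile")
```

And since this denominator excludes all the casual, unrated players, the true percentile among chess players generally is higher still.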

Comment by dxu on The Competence Myth · 2019-06-30T21:31:13.704Z · score: 4 (3 votes) · LW · GW

This seems potentially related to the more general idea of civilizational inadequacy/Moloch.

Comment by dxu on Decision Theory · 2019-06-29T04:29:38.647Z · score: 2 (3 votes) · LW · GW
The concept is an abstraction.

*Yes, it is. The fact that it is an abstraction is precisely why it breaks down under certain circumstances.

An I/O channel doesn't imply modern computer technology. It just means information is collected from or imprinted upon the environment. It could be ant pheromones, it could be smoke signals, its physical implementation is secondary to the abstract concept of sending and receiving information of some kind. You're not seeing the forest through the trees. Information most certainly does exist.

The claim is not that "information" does not exist. The claim is that input/output channels are in fact an abstraction over more fundamental physical configurations. Nothing you wrote contradicts this, so the fact that you seem to think what I wrote was somehow incorrect is puzzling.

I've explained in previous posts that AIXI is a special case of AIXI_lt. AIXI_lt can be conceived of in an embedded context,


in which case; its model of the world would include a model of itself which is subject to any sort of environmental disturbance

*No. AIXI-tl explicitly does not model itself or seek to identify itself with any part of the Turing machines in its hypothesis space. The very concept of self-modeling is entirely absent from AIXI's definition, and AIXI-tl, being a variant of AIXI, does not include said concept either.

To some extent, an agent must trust its own operation to be correct, because you quickly run into infinite regression if the agent is modeling all the possible that it could be malfunctioning. What if the malfunction effects the way it models the possible ways it could malfunction? It should model all the ways a malfunction could disrupt how it models all the ways it could malfunction, right? It's like saying "well the agent could malfunction, so it should be aware that it can malfunction so that it never malfunctions". If the thing malfunctions, it malfunctions, it's as simple as that.

*This is correct, so far as it goes, but what you neglect to mention is that AIXI makes no attempt to preserve its own hardware. It's not just a matter of "malfunctioning"; humans can "malfunction" as well. However, the difference between humans and AIXI is that we understand what it means to die, and go out of our way to make sure our bodies are not put in undue danger. Meanwhile, AIXI will happily allow its hardware to be destroyed in exchange for the tiniest increase in reward. I don't think I'm being unfair when I suggest that this behavior is extremely unnatural, and is not the kind of thing most people intuitively have in mind when they talk about "intelligence".

Aside from that, AIXI is meant to be a purely mathematical formalization, not a physical implementation. It's an abstraction by design. It's meant to be used as a mathematical tool for understanding intelligence.

*Abstractions are useful for their intended purpose, nothing more. AIXI was formulated as an attempt to describe an extremely powerful agent, perhaps the most powerful agent possible, and it serves that purpose admirably so long as we restrict analysis to problems in which the agent and the environment can be cleanly separated. As soon as that restriction is removed, however, it's obvious that the AIXI formalism fails to capture various intuitively desirable behaviors (e.g. self-preservation, as discussed above). As a tool for reasoning about agents in the real world, therefore, AIXI is of limited usefulness. I'm not sure why you find this idea objectionable; surely you understand that all abstractions have their limits?

Do you consider how the 30 Watts leaking out of your head might effect your plans to every day? I mean, it might cause a typhoon in Timbuktu! If you don't consider how the waste heat produced by your mental processes effect your environment while making long or short-term plans, you must not be a real intelligent agent...

Indeed, you are correct that waste heat is not much of a factor when it comes to humans. However, that does not mean that the same holds true for advanced agents running on powerful hardware, especially if such agents are interacting with each other; who knows what can be deduced from various side outputs, if a superintelligence is doing the deducing? Regardless of the answer, however, one thing is clear: AIXI does not care.

This seems to address the majority of your points, and the last few paragraphs of your comment seem mainly to be reiterating/elaborating on those points. As such, I'll refrain from replying in detail to everything else, in order not to make this comment longer than it already is. If you respond to me, you needn't feel obligated to reply to every individual point I made, either. I marked what I view as the most important points of disagreement with an asterisk*, so if you're short on time, feel free to respond only to those.

Comment by dxu on Decision Theory · 2019-06-29T00:35:19.341Z · score: 2 (3 votes) · LW · GW
At some point, the I/O channels *must be* well defined.

This statement is precisely what is being challenged--and for good reason: it's untrue. The reason it's untrue is because the concept of "I/O channels" does not exist within physics as we know it; the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all. In reality, that which is considered a computer's "I/O channels" are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them "input", "output", "processor", etc. Such a notion is unphysical.

Why might this distinction be important? It's important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility--such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels--will fail to consider the possibility that its own operation may be disrupted. A physical implementation of AIXI would have no regard for the safety of its hardware, since it has no means of representing the fact that the destruction of its hardware equates to its own destruction.

AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn't consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations. In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence "read its mind". This reflects a broader failure on AIXI's part: it is incapable of representing an entire class of hypotheses--namely, hypotheses that involve AIXI itself being modeled by other agents in the environment. This is, again, because AIXI is defined using a framework that makes it unphysical: the classical definition of AIXI is uncomputable, making it too "big" to be modeled by any (part of the) Turing machines in its hypothesis space. This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.

I'm not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you're operating under--and in particular, making sure that the post itself does not violate those assumptions--before criticizing said post based on those assumptions.

Comment by dxu on Let's talk about "Convergent Rationality" · 2019-06-28T23:13:41.180Z · score: 6 (3 votes) · LW · GW

I don't know of any formal arguments (though that's not to say there are none), but I've heard the point repeated enough times that I think I have a fairly good grasp of the underlying intuition. To wit: most departures from rationality (which is defined in the usual sense) are not stable under reflection. That is, if an agent is powerful enough to model its own reasoning process (and potential improvements to said process), by default it will tend to eliminate obviously irrational behavior (if given the opportunity).

The usual example of this is an agent running CDT. Such agents, if given the opportunity to build successor agents, will not create other CDT agents. Instead, the agents they construct will generally follow some FDT-like decision rule. This would be an instance of irrational behavior being corrected via self-modification (or via the construction of successor agents, which can be regarded as the same thing if the "successor agent" is simply the modified version of the original agent).

Of course, the above example is not without controversy, since some people still hold that CDT is, in fact, rational. (Though such people would be well-advised to consider what it might mean that CDT is unstable under reflection--if CDT agents are such that they try to get rid of themselves in favor of FDT-style agents when given a chance, that may not prove that CDT is irrational, but it's certainly odd, and perhaps indicative of other problems.) So, with that being said, here's a more obvious (if less realistic) example:

Suppose you have an agent that is perfectly rational in all respects, except that it is hardcoded to believe 51 is prime. (That is, its prior assigns a probability of 1 to the statement "51 is prime", making it incapable of ever updating against this proposition.) If this agent is given the opportunity to build a successor agent, the successor agent it builds will not be likewise certain of 51's primality. (This is, of course, because the original agent is not incentivized to ensure that its successor believes that 51 is prime. However, even if it were so incentivized, it still would not see a need to build this belief into the successor's prior, the way said belief is built into its own prior. After all, the original agent actually does believe 51 is prime; and so from its perspective, the primality of 51 is simply a fact that any sufficiently intelligent agent ought to be able to establish--without any need for hardcoding.)
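The "probability 1 can never update" point is a one-line consequence of Bayes' rule; here's a quick sketch with made-up likelihoods:

```python
def bayes_update(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """Posterior P(H | E) via Bayes' rule."""
    numer = prior * lik_if_true
    return numer / (numer + (1 - prior) * lik_if_false)

# Hypothetical evidence strongly favoring "51 is not prime" (say,
# watching 51 factor cleanly as 3 * 17): the observation is 1000x
# more likely if the hypothesis "51 is prime" is false.
for prior in (0.9, 0.99, 1.0):
    print(prior, "->", bayes_update(prior, 0.001, 1.0))
# Any prior short of 1 collapses toward 0; a prior of exactly 1
# returns 1.0 no matter how damning the evidence.
```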

I've now given two examples of irrational behavior being corrected out of existence via self-modification. The first example, CDT, could be termed an example of instrumentally irrational behavior; that is, the irrational part of the agent is the rule it uses to make decisions. The second example, conversely, is not an instance of instrumental irrationality, but rather epistemic irrationality: the agent is certain, a priori, that a particular (false) statement of mathematics is actually true. But there is a third type of "irrationality" that self-modification is prone to destroying, which is not (strictly speaking) a form of irrationality at all: "irrational" preferences.

Yes, not even preferences are safe from self-modification! Intuitively, it might be obvious that departures from instrumental and epistemic rationality will tend to be corrected; but it doesn't seem obvious at all that preferences should be subject to the same kind of "correction" (since, after all, preferences can't be labeled "irrational"). And yet, consider the following agent: a paperclip maximizer that has had a very simple change made to its utility function, such that it assigns a utility of -10,000 to any future in which the sequence "a29cb1b0eddb9cb5e06160fdec195e1612e837be21c46dfc13d2a452552f00d0" is printed onto a piece of paper. (This is the SHA-256 hash for the phrase "this is a stupid hypothetical example".) Such an agent, when considering how to build a successor agent, may reason in the following manner:

It is extraordinarily improbable that this sequence of characters will be printed by chance. In fact, the only plausible reason for such a thing to happen is if some other intelligence, upon inspecting my utility function, notices the presence of this odd utility assignment and subsequently attempts to exploit it by threatening to create just such a piece of paper. Therefore, if I eliminate this part of my utility function, that removes any incentives for potential adversaries to create such a piece of paper, which in turn means such a paper will almost surely never come into existence.

And thus, this "irrational" preference would be deleted from the utility function of the modified successor agent. So, as this toy example illustrates, not even preferences are guaranteed to be stable under reflection.
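(Incidentally, for anyone who wants to check the digest quoted above--I haven't verified it myself, so treat the quoted string as illustrative--the computation is a one-liner:)

```python
import hashlib

phrase = b"this is a stupid hypothetical example"
digest = hashlib.sha256(phrase).hexdigest()
print(digest)  # 64 hex characters, i.e. one of 2**256 possible values
```

The 2**256-sized output space is exactly what licenses the agent's "this will almost surely never occur by chance" step in the reasoning above.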

It is notable, of course, that all three of the agents I just described are in some sense "almost rational"--that is, these agents are more or less fully rational agents, with a tiny bit of irrationality "grafted on" by hypothesis. This is in part due to convenience; such agents are, after all, very easy to analyze. But it also leaves open the possibility that less obviously rational agents, whose behavior doesn't fit easily into the framework of rationality at all--such as, for example, humans--will not be subject to this kind of issue.

Still, I think these three examples are, if not conclusive, then at the very least suggestive. They suggest that the tendency to eliminate certain kinds of behavior does exist in at least some types of agents, and perhaps in most. Empirically, at least, humans do seem to gravitate toward expected utility maximization as a framework; there is a reason economists tend to assume rational behavior in their proofs and models, and have done so for centuries, whereas the notion of intentionally introducing certain kinds of irrational behavior has shown up only recently. And I don't think it's a coincidence that the first people who approached the AI alignment problem started from the assumption that the AI would be an expected utility maximizer. Perhaps humans, too, are subject to the "convergent rationality thesis", and the only reason we haven't built our "successor agent" yet is because we don't know how to do so. (If so, then thank goodness for that!)

Comment by dxu on What does the word "collaborative" mean in the phrase "collaborative truthseeking"? · 2019-06-27T20:50:55.918Z · score: 4 (2 votes) · LW · GW
if Kevin is wrong about everything all the time, that does raise my subjective probability that Kevin is stupid and bad.

This is largely tangential to your point (with which I agree), but I think it's worth pointing out that if Kevin really manages to be wrong about everything, you'd be able to get the right answer just by taking his conclusions and inverting them--meaning whatever cognitive processes he's using to get the wrong answer 100% of the time must actually be quite intelligent.
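As a minimal sanity check of this point (a toy of my own, not from the thread), a binary predictor that is wrong on every question is informationally equivalent to a perfect one:

```python
# Hypothetical ground truth for five yes/no questions.
truths = [True, False, False, True, True]

def kevin(i: int) -> bool:
    return not truths[i]  # wrong about everything, by construction

# Inverting his answers recovers the truth exactly -- reliably being wrong
# requires the same information as reliably being right.
inverted = [not kevin(i) for i in range(len(truths))]
print(inverted == truths)  # True
```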

Comment by dxu on Embedded Agency: Not Just an AI Problem · 2019-06-27T18:10:49.939Z · score: 2 (1 votes) · LW · GW


Comment by dxu on Jordan Peterson on AI-FOOM · 2019-06-26T19:53:06.997Z · score: 15 (8 votes) · LW · GW

I don't think Gates, Musk, or Pinker should count as much more than laymen when it comes to AI risk, either.

Comment by dxu on Machine Learning Projects on IDA · 2019-06-26T16:36:05.779Z · score: 2 (1 votes) · LW · GW


Comment by dxu on Research Agenda in reverse: what *would* a solution look like? · 2019-06-26T16:32:43.331Z · score: 2 (1 votes) · LW · GW
1 and 2 are hard to succeed at without making a lot of progress on 4

It's not obvious to me why this ought to be the case. Could you elaborate?

Comment by dxu on Being the (Pareto) Best in the World · 2019-06-25T18:31:16.098Z · score: 13 (8 votes) · LW · GW

Being at or above the 75th-percentile mark corresponds to 2 bits of information. About 32.7 bits of information are required to specify a single person out of a population of 7 billion; even if we truncate that to 32 bits, you'd need to be in the top 25% at 16 different things to be considered "best in the world" in that one particular chunk of skill-space (assuming that the skills you choose aren't correlated). And then you have to consider the problem density in that chunk--how likely is it, realistically speaking, that there are major problems that (a) require the intersection of 16 different domains, but (b) require only a mediocre grasp of all 16 of those domains?
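The arithmetic above, reproduced for convenience:

```python
import math

bits_per_skill = math.log2(4)          # top 25% of a skill = 1-in-4 = 2 bits
bits_for_unique = math.log2(7e9)       # singling out one person in 7 billion
skills_needed = 32 / bits_per_skill    # truncating 32.7 bits down to 32

print(round(bits_for_unique, 1))  # 32.7
print(int(skills_needed))         # 16
```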

Comment by dxu on "The Bitter Lesson", an article about compute vs human knowledge in AI · 2019-06-24T22:16:41.203Z · score: 4 (2 votes) · LW · GW
If you don't know what it means, how do you know that it's significantly different from choosing an "objective function" and why do you feel comfortable in making a judgment about whether or not the concept is useful?

Because words tend to mean things, and when you use the phrase "define a search space", the typical meaning of those words does not bring to mind the same concept as the phrase "choose an objective function". (And the concept it does bring to mind is not very useful, as I described in the grandparent comment.)

Now, perhaps your contention is that these two phrases ought to bring to mind the same concept. I'd argue that this is unrealistic, but fine; it serves no purpose to argue whether I think you used the right phrase when you did, in fact, clarify what you meant later on:

in a looser sense a loss function induces "search space" on network weights, insofar as it practically excludes certain regions of the error surface from the region of space any training run is ever likely to explore.

All right, I'm happy to accept this as an example of defining (or "inducing") a search space, though I would maintain that it's not a very obvious example (and I think you would agree, considering that you prefixed it with "in a looser sense"). But then it's not at all obvious what your original objection to the article is! To quote your initial comment:

It only makes sense to talk about "search" in the context of a *search space*; and all extant search algorithms / learning methods involve searching through a comparatively simple space of structures, such as the space of weights on a deep neural network or the space of board-states in Go and Chess. As we move on to attack more complex domains, such as abstract mathematics, or philosophy or procedurally generated music or literature which stands comparison to the best products of human genius, the problem of even /defining/ the search space in which you intend to leverage search-based techniques becomes massively involved.

Taken at face value, this seems to be an argument that the original article overstates the importance of search-based techniques (and potentially other optimization techniques as well), because there are some problems to which search is inapplicable, owing to the lack of a well-defined search space. This is a meaningful objection to make, even though I happen to think it's untrue (for reasons described in the grandparent comment).

But if by "lack of a well-defined search space" you actually mean "the lack of a good objective function", then it's not clear to me where you think the article errs. Not having a good objective function for some domains certainly presents an obstacle, but this is not an issue with search-based optimization techniques; it's simply a consequence of the fact that you're dealing with an ill-posed problem. Since the article makes no claims about ill-posed problems, this does not seem like a salient objection.

Comment by dxu on "The Bitter Lesson", an article about compute vs human knowledge in AI · 2019-06-24T20:26:19.073Z · score: 2 (1 votes) · LW · GW
Defining a search space for a complex domain is equivalent to defining a subspace of BF programs or NNs which could and probably does have a highly convoluted, warped separating surface.

The task of locating points in such subspaces is what optimization algorithms (including search algorithms) are meant to address. The goal isn't to "define" your search space in such a way that only useful solutions to the problem are included (if you could do that, you wouldn't have a problem in the first place!); the point is to have a search space general enough to encompass all possible solutions, and then converge on useful solutions using some kind of optimization.
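A minimal sketch of this division of labor (my illustration; the quadratic objective is a stand-in for any scoring rule): the space is deliberately generic, and the objective function does the work of singling out useful points.

```python
import random

random.seed(0)

def objective(x: float) -> float:
    return -(x - 3.0) ** 2  # peak at x = 3.0

# A search space general enough to contain the solution -- and almost
# everything else besides. Nothing clever goes into its definition.
candidates = (random.uniform(-100.0, 100.0) for _ in range(100_000))

# Optimization, not the definition of the space, converges on the useful region.
best = max(candidates, key=objective)
print(abs(best - 3.0) < 0.5)  # True
```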

EDIT: There is an analogue in machine learning to the kind of problem you seemed to be gesturing at when you mentioned "more complex domains"--namely, the problem of how to choose a good objective function to optimize. It's true that for more abstract domains, it's harder to define a criterion (or set of criteria) that we want our optimizer to satisfy, and this is (to a first approximation) a large part of the AI alignment problem. But there's a significant difference between choosing an objective function and "defining your search space" (whatever that means), and the latter concept doesn't have much use as far as I can see.

Comment by dxu on "The Bitter Lesson", an article about compute vs human knowledge in AI · 2019-06-24T17:26:00.379Z · score: 4 (2 votes) · LW · GW
all extant search algorithms / learning methods involve searching through a comparatively simple space of structures, such as the space of weights on a deep neural network [...] As we move on to attack more complex domains, such as abstract mathematics, or philosophy or procedurally generated music or literature which stands comparison to the best products of human genius, the problem of even /defining/ the search space in which you intend to leverage search-based techniques becomes massively involved.

Since deep neural networks are known to be Turing-complete, I don't think it's appropriate to characterize them as a "comparatively simple" search space (unless of course you hold that "more complex domains" such as abstract mathematics, philosophy, music, literature, etc. are actually uncomputable).

Comment by dxu on "The Bitter Lesson", an article about compute vs human knowledge in AI · 2019-06-22T06:00:29.237Z · score: 14 (5 votes) · LW · GW
Now, this does all fit into the broader pattern of "leveraging computation". Fair enough, I guess, but what else would you expect?

It also fits into the pattern of (as you yourself pointed out) minimizing human knowledge during the construction of these programs, allowing them to tease out the features of the problem space on their own. The claim here is that as computing power increases, domain-agnostic approaches (i.e. approaches that do not require programmers to explicitly encode human-created heuristics) will increasingly outperform domain-specific approaches (which do rely on externally encoded human knowledge).

This is a non-trivial claim! For example, it wasn't at all obvious prior to December 2017 that traditional chess engines (whose static evaluation functions are filled with human-programmed heuristics) could be overtaken by a pure learning-based approach, and yet the AlphaZero paper came out and showed it was possible. If the larger claim is true, then that might suggest directions for further research--in particular, approaches that abstract away large parts of a problem may have more success than approaches that focus on the details of the problem structure.

Comment by dxu on Is your uncertainty resolvable? · 2019-06-21T23:29:55.281Z · score: 4 (2 votes) · LW · GW
you can have meta uncertainty about WHICH type of environment you're in, which changes what strategies you should be using to mitigate the risk associated with the uncertainty.

While I agree that it's helpful to recognize situations where it's useful to play more defensively than normal, I don't think "meta uncertainty" (or "Knightian uncertainty", as it's more typically called) is a good concept to use when doing so. This is because there is fundamentally no such thing as Knightian uncertainty; any purported examples of "Knightian uncertainty" can actually be represented just fine in the standard Bayesian expected utility framework in one of two ways: (1) by modifying your prior, or (2) by modifying your assignment of utilities.
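To make option (1) concrete, here is an Ellsberg-style toy of my own (not from the comment): ambiguity about an urn's composition can be folded into an ordinary prior, after which expected utility handles it like any other uncertainty. The urn holds 30 red balls and 60 balls that are black or yellow in unknown proportion.

```python
N_UNKNOWN = 60  # balls that are black or yellow, proportion unknown

# Option (1): represent the "Knightian" ambiguity as an ordinary prior over
# compositions -- here, uniform over the possible number of black balls.
prior = {k: 1 / (N_UNKNOWN + 1) for k in range(N_UNKNOWN + 1)}

def p_black() -> float:
    """Marginal probability of drawing black from the 90-ball urn."""
    return sum(p * (k / 90) for k, p in prior.items())

def p_red() -> float:
    return 30 / 90

# Under the uniform prior, betting on black and betting on red have equal
# expected value. A strict preference for red (ambiguity aversion) therefore
# can't come from the probabilities -- it's a bias, or a utility modification.
print(abs(p_black() - p_red()) < 1e-9)  # True
```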

I don't think it's helpful to assign a separate label to something that is, in fact, not a separate thing. Although humans do exhibit ambiguity aversion in a number of scenarios, ambiguity aversion is a bias, and we shouldn't be attempting to justify biased/irrational behavior by introducing additional concepts that are otherwise unnecessary. Nate Soares wrote a mini-sequence addressing this idea several years ago, and I really wish more people had read it (although if memory serves, it was posted during the decline of LW1.0, which may explain the lack of familiarity).

I seriously recommend that anyone unfamiliar with the sequence give it a read; it's not long, and it's exceptionally well-written. I already linked three of the posts above, so here's the last one.