Comment by dxu on Coronavirus as a test-run for X-risks · 2020-06-14T00:06:03.697Z · score: 3 (2 votes) · LW · GW

I'd love to see more thought about how the MNM effect might look in an AI scenario. Like you said, maybe denials and assurances followed by freakouts and bans. But maybe we could predict what sorts of events would trigger the shift?

I take it you're presuming slow takeoff in this paragraph, right?

Comment by dxu on Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning · 2020-06-10T16:59:04.237Z · score: 4 (2 votes) · LW · GW

Differing discourse norms; in general, communities that don't expend a constant amount of time-energy into maintaining better-than-average standards of discourse will, by default, regress to the mean. (We saw the same thing happen with LW1.0.)

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T20:06:38.721Z · score: 21 (6 votes) · LW · GW

I'm not seeing how you distinguish between the following two hypotheses:

  1. GPT-3 exhibits mostly flat scaling at the tasks you mention underneath your first bullet point (WiC, MultiRC, etc.) because its architecture is fundamentally unsuited to those tasks, such that increasing the model capacity will lead to little further improvement.
  2. Even 175B parameters isn't sufficient to perform well on certain tasks (given a fixed architecture), but increasing the number of parameters will eventually cause performance on said tasks to undergo a large increase (akin to something like a phase change in physics).

It sounds like you're implicitly taking the first hypothesis as a given (e.g. when you assert that there is a "remaining gap vs. fine-tuning that seems [unlikely] to be closed"), but I see no reason to give this hypothesis preferential treatment!

In fact, it seems to be precisely the assertion of the paper's authors that the first hypothesis should not be taken as a given; and the evidence they give to support this assertion is... the multiple downstream tasks for which an apparent "phase change" did in fact occur. Let's list them out:

  • BoolQ (apparent flatline between 2.6B and 13B, then a sudden jump in performance at 175B)
  • CB (essentially noise between 0.4B and 13B, then a sudden jump in performance at 175B)
  • RTE (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • WSC (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • basic arithmetic (mostly flat until 6.7B, followed by rapid improvement until 175B)
  • SquadV2 (apparent flatline at 0.8B, sudden jump at 1.3B followed by approximately constant rate of improvement until 175B)
  • ANLI round 3 (noise until 13B, sudden jump at 175B)
  • word-scramble with random insertion (sudden increase in rate of improvement after 6.7B)

Several of the above examples exhibit a substantial amount of noise in their performance graphs, but nonetheless, I feel my point stands. Given this, it seems rather odd for you to be claiming that the "great across-task variance" indicates a lack of general reasoning capability when said across-task variance is (if anything) evidence for the opposite, with many tasks that previously stumped smaller models being overcome by GPT-3.

It's especially interesting to me that you would write the following, seemingly without realizing the obvious implication (emphasis mine):

we still see a wide spread of task performance despite smooth gains in LM loss, with some of the most distinctive deficits persisting at all scales (common sense physics, cf section 5), and some very basic capabilities only emerging at very large scale and noisily even there (arithmetic)

The takeaway here is, at least in my mind, quite clear: it's a mistake to evaluate model performance on human terms. Without getting into an extended discussion on whether arithmetic ought to count as a "simple" or "natural" task, empirically transformers do not exhibit a strong affinity for the task. Therefore, the fact that this "basic capability" emerges at all is, or at least should be, strong evidence for generalization capability. As such, the way you use this fact to argue otherwise (both in the section I just quoted and in your original post) seems to me to be exactly backwards.

Elsewhere, you write:

The ability to get better downstream results is utterly unsurprising: it would be very surprising if language prediction grew steadily toward perfection without a corresponding trend toward good performance on NLP benchmarks

It's surprising to me that you would write this while also claiming that few-shot prediction seems unlikely to close the gap to fine-tuned models on certain tasks. I can't think of a coherent model where both of these claims are simultaneously true; if you have one, I'd certainly be interested in hearing what it is.

More generally, this is (again) why I stress the importance of concrete predictions. You call it "utterly unsurprising" that a 175B-param model would outperform smaller ones on NLP benchmarks, and yet neither you nor anyone else could have predicted what the scaling curves for those benchmarks would look like. (Indeed, your entire original post can be read as an expression of surprise at the lack of impressiveness of GPT-3's performance on certain benchmarks.)

When you only ever look at things in hindsight, without ever setting forth concrete predictions that can be overturned by evidence, you run the risk of never forming a model concrete enough to be engaged with. I don't believe it's a coincidence that you called it "difficult" to explain why you found the paper unimpressive: it's because your standards of impressiveness are opaque enough that they don't, in and of themselves, constitute a model of how transformers might/might not possess general reasoning ability.

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T17:49:33.971Z · score: 5 (3 votes) · LW · GW

Also note that a significant number of humans would fail the kind of test you described (inducing the behavior of a novel mathematical operation from a relatively small number of examples), which is why similar tests of inductive reasoning ability show up quite often on IQ tests and the like. It's not the case that failing at that kind of test shows a lack of general reasoning skills, unless we permit that a substantial fraction of humans lack general reasoning skills to at least some extent.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-31T02:12:32.804Z · score: 10 (4 votes) · LW · GW

I don't think the practical value of very new techniques is impossible to estimate. For example, the value of BERT was very clear in the paper that introduced it: it was obvious that this was a strictly better way to do supervised NLP, and it was quickly and widely adopted.

This comparison seems disingenuous. The goal of the BERT paper was to introduce a novel training method for Transformer-based models that measurably outperformed previous training methods. Conversely, the goal of the GPT-3 paper seems to be to investigate the performance of an existing training method when scaled up to previously unreached (and unreachable) model sizes. I would expect you to agree that these are two very different things, surely?

More generally, it seems to me that you've been consistently conflating the practical usefulness of a result with how informative said result is. Earlier, you wrote that "few-shot LM prediction" (not GPT-3 specifically, few-shot prediction in general!) doesn't sound that promising to you because the specific model discussed in the paper doesn't outperform SOTA on all benchmarks, and also requires currently impractical levels of hardware/compute. Setting aside the question of whether this original claim resembles the one you just made in your latest response to me (it doesn't), neither claim addresses what, in my view, are the primary implications of the GPT-3 paper--namely, what it says about the viability of few-shot prediction as model capacity continues to increase.

This, incidentally, is why I issued the "smell test" described in the grandparent, and your answer more or less confirms what I initially suspected: the paper comes across as unsurprising to you because you largely had no concrete predictions to begin with, beyond the trivial prediction that existing trends will persist to some (unknown) degree. (In particular, I didn't see anything in what you wrote that indicates an overall view of how far the capabilities current language models are from human reasoning ability, and what that might imply about where model performance might start flattening with increased scaling.)

Since it doesn't appear that you had any intuitions to begin with about what GPT-3's results might indicate about the scalability of language models in general, it makes sense that your reading of the paper would be framed in terms of practical applications, of which (quite obviously) there are currently none.

Comment by dxu on Draconarius's Shortform · 2020-05-30T20:01:02.374Z · score: 2 (1 votes) · LW · GW

If the number of guests is countable (which is the usual assumption in Hilbert’s setup), then every guest will only have to travel a finite (albeit unboundedly long) distance before they reach their room.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-30T19:31:37.868Z · score: 10 (3 votes) · LW · GW

What do you think that main significance is?

I can’t claim to speak for gwern, but as far as significance goes, Daniel Kokotajlo has already advanced a plausible takeaway. Given that his comment is currently the most highly upvoted comment on this post, I imagine that a substantial fraction of people here share his viewpoint.

Given my past ML experience, this just doesn't sound that promising to me, which may be our disconnect.

I strongly suspect the true disconnect comes a step before this conclusion: namely, that “[your] past ML experience” is all that strongly predictive of performance using new techniques. A smell test: what do you think your past experience would have predicted about the performance of a 175B-parameter model in advance? (And if the answer is that you don’t think you would have had clear predictions, then I don’t see how you can justify this “review” of the paper as anything other than hindsight bias.)

Comment by dxu on AGIs as populations · 2020-05-29T02:33:46.320Z · score: 2 (1 votes) · LW · GW
  • "There seems to be no reason not to expect that human value functions have similar problems, which even "aligned" AIs could trigger unless they are somehow designed not to." There are plenty of reasons to think that we don't have similar problems - for instance, we're much smarter than the ML systems on which we've seen adversarial examples. Also, there are lots of us, and we keep each other in check.
  • "For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can't keep up, and their value systems no longer apply or give essentially random answers." What does this actually look like? Suppose I'm made the absolute ruler of a whole virtual universe - that's a lot of power. How might my value system "not keep up"?

I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T04:47:27.956Z · score: 2 (1 votes) · LW · GW

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength".

In the post, you write:

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

"Will it happen?" is easy precisely in cases where a development "seems inevitable"; the hard part then becomes forecasting when such a development will occur. The fact that you (and most computer Go experts, in fact) did not do this is a testament to how unpredictable conceptual advances are, and your attempt to reduce it to the mere continuation of a trend is an oversimplification of the highest order.

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

You've made statements about your willingness to bet at non-extreme odds over relatively large chunks of time. This indicates both low confidence and low granularity, which means that there's very little disagreement to be had. (Of course, I don't mean to imply that it's possible to do better; indeed, given the current level of uncertainty surrounding everything to do with AI, about the only way to get me to disagree with you would have been to provide a highly confident, specific prediction.)

Nevertheless, it's an indicator that you do not believe you possess particularly reliable information about future advances in AI, so I remain puzzled that you would present your thesis so strongly at the start. In particular, your claim that the following questions

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news?

depend on

whether or not you were surprised by the development of Alpha-Go

seems to have literally no connection to what you later claim, which is that AlphaGo did not surprise you because you knew something like it had to happen at some point. What is the relevant analogy here to artificial general intelligence? Will artificial general intelligence be "old news" because we suspected from the start that it was possible? If so, what does it mean for something be "old news" if you have no idea when it will happen, and could not have predicted it would happen at any particular point until after it showed up?

As far as I can tell, reading through both the initial post and the comments, none of these questions have been answered.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T00:36:36.234Z · score: 2 (1 votes) · LW · GW

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

This seems to entirely ignore most (if not all) of the salient implications of AlphaGo's development. What set AlphaGo apart from previous attempts at computer Go was the iterated distillation and amplification scheme employed during its training scheme. This represents a genuine conceptual advance over previous approaches, and to characterize it as simply a continuation of the trend of increasing strength in Go-playing programs only works if you neglect to define said "trend" in any way more specific than "roughly monotonically increasing". And if you do that, you've tossed out any and all information that would make this a useful and non-vacuous observation.

Shortly after this paragraph, you write:

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong. It's not clear to me why you feel this necessitated mention at all, but since you did mention it, I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

Comment by dxu on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:09:51.563Z · score: 3 (2 votes) · LW · GW

If it were common knowledge that any hyperbolic language experts use when speaking about the unlikelihood of AGI (e.g. Andrew Ng's statement "worrying about AI safety is like worrying about overpopulation on Mars") actually corresponded to a 10% subjective probability of AGI, things would look very different than they currently do.

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener. If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

(I also don't believe that said experts, if asked to translate their forecasts to numerical probabilities, would give a median estimate anywhere near as high as 10%, but that's largely tangential to the discussion at hand.)

Furthermore, and more importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

Four years prior to overseeing the world's first nuclear reaction, Fermi believed that it was more likely than not that a nuclear chain reaction was impossible. Setting aside for a moment the question of whether Fermi's specific probability assignment was negligible, or merely small, what this indicates is that the majority of the information necessary to determine the possibility of a nuclear chain reaction was in fact unavailable to Fermi at the time he made his forecast. This does not support the idea that making predictions about technology is easy, any more than it would have if Fermi had assigned 0.001% instead of 10%!

More generally, the specific probability estimate Fermi gave is nothing more than a red herring, one that is given undue attention by the OP. The relevant factor to Ord/Yudkowsky's thesis is how much uncertainty there is in the probability distribution of a given technology--not whether the mean of said distribution, when treated as a point estimate, happens to be negligible or non-negligible. Focusing too much on the latter not only obfuscates the correct lesson to be learned, but also sometimes leads to nonsensical results.

Comment by dxu on Being right isn't enough. Confidence is very important. · 2020-04-07T21:25:51.648Z · score: 1 (2 votes) · LW · GW

The original post wasn’t talking about “correctness”; it was talking about calibration, which is a very specific term with a very specific meaning. Machines one and two are both well-calibrated, but there is nothing requiring that two well-calibrated distributions must perform equally well against each other in a series of bets.

Indeed, this is the very point of the original post, so your comment attempting to contradict it did not, in fact, do so.

Comment by dxu on Predictors exist: CDT going bonkers... forever · 2020-01-15T22:14:38.523Z · score: 5 (3 votes) · LW · GW

these examples can't actually happen, or are so rare that I'll pay that cost in order to have a simpler model for the other 99.9999% of my decisions

Indeed, if it were true that Newcomb-like situations (or more generally, situations where other agents condition their behavior on predictions of your behavior) do not occur with any appreciable frequency, there would be much less interest in creating a decision theory that addresses such situations.

But far from constituting a mere 0.0001% of possible situations (or some other, similarly minuscule percentage), Newcomb-like situations are simply the norm! Even in everyday human life, we frequently encounter other people and base our decisions off what we expect them to do—indeed, the ability to model others and act based on those models is integral to functioning as part of any social group or community. And it should be noted that humans do not behave as causal decision theory predicts they ought to—we do not betray each other in one-shot prisoner’s dilemmas, we pay people we hire (sometimes) well in advance of them completing their job, etc.

This is not mere “irrationality”; otherwise, there would have been no reason for us to develop these kinds of pro-social instincts in the first place. The observation that CDT is inadequate is fundamentally a combination of (a) the fact that it does not accurately predict certain decisions we make, and (b) the claim that the decisions we make are in some sense correct rather than incorrect—and if CDT disagrees, then so much the worse for CDT. (Specifically, the sense in which our decisions are correct—and CDT is not—is that our decisions result in more expected utility in the long run.)

All it takes for CDT to fail is the presence of predictors. These predictors don’t have to be Omega-style superintelligences—even moderately accurate predictors who perform significantly (but not ridiculously) above random chance can create Newcomb-like elements with which CDT is incapable of coping. I really don’t see any justification at all for the idea that these situations somehow constitute a superminority of possible situations, or (worse yet) that they somehow “cannot” happen. Such a claim seems to be missing the forest for the trees: you don’t need perfect predictors to have these problems show up; the problems show up anyway. The only purpose of using Omega-style perfect predictors is to make our thought experiments clearer (by making things more extreme), but they are by no means necessary.

Comment by dxu on Realism about rationality · 2020-01-14T23:35:31.356Z · score: 2 (1 votes) · LW · GW

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.

In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

Comment by dxu on Realism about rationality · 2020-01-13T23:33:17.345Z · score: 2 (1 votes) · LW · GW

Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.

Comment by dxu on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T04:27:05.281Z · score: 23 (10 votes) · LW · GW

Skimming through. May or may not post an in-depth comment later, but for the time being, this stood out to me:

I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.

I note that Yann has not actually specified a way of not "giving [the AI] moronic objectives with no safeguards". The argument of AI risk advocates is precisely that the thing in quotes in the previous sentence is difficult to do, and that people do not have to be "ridiculously stupid" to fail at it--as evidenced by the fact that no one has actually come up with a concrete way of doing it yet. It doesn't look to me like Yann addressed this point anywhere; he seems to be under the impression that repeating his assertion more emphatically (obviously, when we actually get around to building the AI, we'll use our common sense and build it right) somehow constitutes an argument in favor of said assertion. This seems to be an unusually low-quality line of argument from someone who, from what I've seen, is normally much more clear-headed than this.

Comment by dxu on What explanatory power does Kahneman's System 2 possess? · 2019-08-12T17:52:50.266Z · score: 6 (4 votes) · LW · GW

I'm curious as to what prompted this question?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T17:50:52.400Z · score: 4 (2 votes) · LW · GW

I have pointed out what people worry they are going to lose under determinism. Yes, they only going to have those things under nondeterminism.

You just said that nondeterminist intuitions are only mistaken if determinism is true and compatibilism is false. So what exactly is being lost if you subscribe to both determinism and compatibilism?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T00:08:34.105Z · score: 2 (3 votes) · LW · GW

You're mixing levels. If someone can alter their decisions, that implies there are multiple possible next states of the universe

This is incorrect. It's possible to imagine a counterfactual state in which the person in question differs from their actual self in an unspecified manner, which thereby causes them to make a different decision; this counterfactual state differs from reality, but it is by no means incoherent. Furthermore, the comparison of various counterfactual futures of this type is how decision-making works; it is an abstraction used for the purpose of computation, not something ontologically fundamental to the way the universe works--and the fact that some people insist it be the latter is the source of much confusion. This is what I meant when I wrote:

Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here.

So there is no "mixing levels" going on here, as you can see; rather, I am specifically making sure to keep the levels apart, by not tying the mental process of imagining and assessing various potential outcomes to the physical question of whether there are actually multiple physical outcomes. In fact, the one who is mixing levels is you, since you seem to be assuming for some reason that the mental process in question somehow imposes itself onto the laws of physics.

(Here is a thought experiment: I think you will agree that a chess program, if given a chess position and run for a prespecified number of steps, will output a particular move for that position. Do you believe that this fact prevents the chess program from considering other possible moves it might make in the position? If so, how do you explain the fact that the chess program explicitly contains a game tree with multiple branches, the vast majority of which will not in fact occur?)

There are various posts in the sequences that directly address this confusion; I suggest either reading them or re-reading them, depending on whether you have already.

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T20:29:55.829Z · score: 2 (1 votes) · LW · GW

But this is map, not territory.

Certainly. Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here. Some people may find the idea of decision-making being anything but a fundamental, ontologically primitive process somehow unsatisfying, or even disturbing, but I submit that this is a problem with their intuitions, not with the underlying viewpoint.

(If someone goes so far as to alter their decisions based on their belief in determinism--say, by lounging on the couch watching TV all day rather than being productive, because their doing so was "predetermined"--I would say that they are failing to utilize their brain's decision-making apparatus. (Or rather, that they are not using it very well.) This has nothing to do with free will, determinism, or anything of the like; it is simply a (causal) consequence of the fact that they have misinterpreted what it means to be an agent in a deterministic universe.)

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T17:23:08.715Z · score: 1 (2 votes) · LW · GW

Belief in determinism is correlated with worse outcomes, but one doesn't cause the other; both are determined by the state and process of the universe.

Read literally, you seem to be suggesting that a deterministic universe doesn't have cause and effect, only correlation. But this reading seems prima facie absurd, unless you're using a very non-standard notion of "cause and effect". Are you arguing, for example, that it's impossible to draw a directed acyclic graph in order to model events in a deterministic universe? If not, what are you arguing?

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T01:13:29.140Z · score: 3 (3 votes) · LW · GW

We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.

I'm afraid this sentence doesn't parse for me. You seem to be speaking of "results" as something which to which the concept of rewards and punishments are applicable. However, I'm not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I've encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there's something else you're referring to when you say "reward or punish the results", I would appreciate it if you clarified what exactly that thing is.

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T00:46:30.824Z · score: 25 (13 votes) · LW · GW

You should neither reward nor punish strategies or attempts at all, but results.

This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don't see why it's obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence--namely, "results"--and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate--namely, "strategies".

I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we "meant" to reward or punish, our rewards/punishments may end up impacting aspects of the agent's behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)

The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence--a result which sometimes goes by a more well-known name. This then suggests that punishing results over strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.

(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations--which are often bottlenecked on cognitive effort--generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart's Law.))

Comment by dxu on Drive-By Low-Effort Criticism · 2019-07-31T18:20:01.220Z · score: 35 (10 votes) · LW · GW

It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value.

Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of "making things that have little or no (or even negative) value" as a primitive action--something that can be "encouraged" or "discouraged"--whereas, on the other hand, it seems to me that the true primitive action here is spending effort in the first place, and that actions taken to disincentivize the former, will in fact turn out to disincentivize the latter.

If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be "yes, we ought"), but rather, whether we ought to discourage posters from spending effort. To this, you say:

But there is no virtue in mere effort.

Perhaps there is no "virtue" in effort, but in that case we must ask why "virtue" is the thing we are measuring. If the goal is to maximize, not "virtue", but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it's hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.

And what does it mean to "encourage" or "discourage" a poster? Based on the following part of your comment, it seems that you are taking "discourage" to mean something along the lines of "point out ways in which the post in question is mistaken":

If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).

But how often is it the case that a "long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced" is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations--in other words, that the hypothetical situation you describe does not reflect reality.

What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called "drive-by low-effort criticism" described in the OP. It may be that you disagree with this, but whether it is true or not is in a matter of factual accuracy, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.

Comment by dxu on FactorialCode's Shortform · 2019-07-31T00:02:55.131Z · score: 4 (2 votes) · LW · GW

By default on reddit and lesswrong, posts start with 1 karma, coming from the user upvoting themselves.

Actually, on LessWrong, I'm fairly sure the karma value of a particular user's regular vote depends on the user's existing karma score. Users with a decent karma total usually have a default vote value of 2 karma rather than 1, so each comment they post will have 2 karma to start. Users with very high karma totals seem to have a vote that's worth 3 karma by default. Something similar happens with strong votes, though I'm not sure what kind of math is used there.

Aside: I've sometimes thought that users should be allowed to pick a value for their vote that's anywhere between 1 and the value of their strong upvote, instead of being limited to either a regular vote (2 karma in my case) or a strong vote (6 karma). In my case, I literally can't give people karma values of 1, 3, 4, or 5, which could be useful for more granular valuations.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:44:06.908Z · score: 11 (3 votes) · LW · GW

The part about climate science seems like a pretty bog-standard outside view argument, which in turn means I find it largely uncompelling. Yes, there are people who are so stupid, they can only be saved from their own stupidity by executing an epistemic maneuver that works regardless of the intelligence of the person executing it. This does not thereby imply that everyone should execute the same maneuver, including people who are not that stupid, and therefore not in need of saving. If someone out there is so incompetent that they mistakenly perceive themselves as competent, then they are already lost, and the fact that an illegal (from the perspective of normative probability theory) epistemic maneuver exists which would save them if they executed it, does not thereby make that maneuver a normatively good move. (And even if it were, it's not as though the people who would actually benefit from said maneuver are going to execute it--the whole reason that such people are loudly, confidently mistaken is that they don't take the outside view seriously.)

In short: there is simply no principled justification for modesty-based arguments, and--though it may be somewhat impolite to say--I agree with Eliezer that people who find such arguments compelling are actually being influenced by social modesty norms (whether consciously or unconsciously), rather than any kind of normative judgment. Based on various posts that Scott has written in the past, I would venture to say that he may be one of those people.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:24:50.585Z · score: 4 (2 votes) · LW · GW

This is a fictional dialogue demonstrating a meta-level point about how discourse works, and your comment is pretty off-topic.

I think that if a given "meta-level point" has obvious ties to existing object-level discussions, then attempting to suppress the object-level points when they're raised in response is pretty disingenuous. (What I would actually prefer is for the person making the meta-level point to be the same person pointing out the object-level connection, complete with "and here is why I feel this meta-level point is relevant to the object level". If the original poster doesn't do that, then it does indeed make comments on the object-level issues seem "off-topic", a fact which ought to be laid at the feet of the original poster for not making the connection explicit, rather than at the feet of the commenter, who correctly perceived the implications.)

Now, perhaps it's the case that your post actually had nothing to do with the conversations surrounding EA or whatever. (I find this improbable, but that's neither here nor there.) If so, then you as a writer ought to have picked a different example, one with fewer resemblances to the ongoing discussion. (The example Jeff gave in his top-level comment, for example, is not only clearer and more effective at conveying your "meta-level point", but also bears significantly less resemblance to the controversy around EA.) The fact that the example you chose so obviously references existing discussions that multiple commenters pointed it out is evidence that either (a) you intended for that to happen, or (b) you really didn't put a lot of thought into picking a good example.

Comment by dxu on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-24T18:24:52.143Z · score: 2 (1 votes) · LW · GW

In contrast, my model has been that communities congregate around predictable sources of high-quality writing, and people who can produce high-quality content in high volume are very rare. Thus, once Eliezer Yudkowsky stopped being active, and Yvain a.k.a. the immortal Scott Alexander moved to Slate Star Codex (in part so that he could write about politics, which we've traditionally avoided), all the "intellectual energy" followed Scott to SSC.

First, I want to state that I agree with this model. However, I also want to note that the SSC comments section tend to have fairly low-quality discussion (in comparison to the OB/LW 1.0 heyday), and I'm not sure why this is; candidate hypotheses include that Scott's explicit politics attracted people with lower epistemic standards, or that the lack of an explicit karma system allowed low-quality discussion to persist (but I don't think OB had an explicit karma system either?).

Overall, I'm unsure as to what kind of norms/technology maintains high-quality discussion (as opposed to just the presence of discussion in general), and it's plausible to me that the two may actually be somewhat mutually exclusive (in the sense that norms/technology designed to promote the volume of high-quality discussion may in fact reduce the volume of discussion in general). It's not clear to me how this tradeoff should be balanced.

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-22T18:14:46.664Z · score: 2 (1 votes) · LW · GW

A functional duplicate of an entity that reports having such-and-such a quale will report having it even if doesn't.

In that case, there's no reason to think anyone has qualia. The fact that lots of people say they have qualia, doesn't actually mean anything, because they'd say so either way; therefore, those people's statements do not constitute valid evidence in favor of the existence of qualia. And if people's statements don't constitute evidence for qualia, then the sum total of evidence for qualia's existence is... nothing: there is zero evidence that qualia exist.

So your interpretation is self-defeating: there is no longer a need to explain qualia, because there's no reason to suppose that they exist in the first place. Why try and explain something that doesn't exist?

On the other hand, it remains an empirical fact that people do actually talk about having "conscious experiences". This talk has nothing to do with "qualia" as you've defined the term, but that doesn't mean it's not worth investigating in its own right, as a scientific question: "What is the physical cause of people's vocal cords emitting the sounds corresponding to the sentence 'I'm conscious of my experience'?" What the generalized anti-zombie principle says is that the answer to this question, will in fact explain qualia--not the concept that you described or that David Chalmers endorses (which, again, we have literally zero reason to think exists), but the intuitive concept that led philosophers to coin the term "qualia" in the first place.

Comment by dxu on Rationality is Systematized Winning · 2019-07-22T02:12:33.686Z · score: 11 (4 votes) · LW · GW

You're confusing ends with means, terminal goals with instrumental goals, morality with decision theory, and about a dozen other ways of expressing the same thing. It doesn't matter what you consider "good", because for any fixed definition of "good", there are going to be optimal and suboptimal methods of achieving goodness. Winning is simply the task of identifying and carrying out an optimal, rather than suboptimal, method.

Comment by dxu on Why it feels like everything is a trade-off · 2019-07-18T17:00:18.813Z · score: 5 (3 votes) · LW · GW

Good post. Seems related to (possibly the same concept as) why the tails come apart.

Comment by dxu on The AI Timelines Scam · 2019-07-12T21:55:21.046Z · score: 17 (8 votes) · LW · GW

There are strong prior reasons to think that it's better for the public to have better beliefs about AI strategy.

That may be, but note that the word "prior" is doing basically all of the work in this sentence. (To see this, just replace "AI strategy" with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence--and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it's worth realizing Alice probably has some specific inside view reasons to believe that's the case. And, if those inside view arguments happen to require an understanding of the topic that Alice believes to be dangerous, then Alice's hands are now tied: she's both unable to share information about something, and unable to explain why she can't share that information.

Naturally, this doesn't just make Alice's life more difficult: if you're someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you're forced to resort to just trusting Alice. If you don't have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.

I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being "in the know"--she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I'm particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.

Secondly, however: in addition to being unfair to Alice, there are more subtle effects that such a policy may have. In particular, if Alice feels pressured to disclose the reasons she can't disclose things, that may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F "walls"). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.

Given all of this, I don't think it's obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)

To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It's not clear to me which of these worlds we actually live in, and I don't think you've done a sufficient job of arguing that we live in the former world instead of the latter.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T23:31:36.524Z · score: 1 (2 votes) · LW · GW

would react to a wound but not pass the mirror test

I mean, reacting to a wound doesn't demonstrate that they're actually experiencing pain. If experiencing pain actually requires self-awareness, then an animal could be perfectly capable of avoiding damaging stimuli without actually feeling pain from said stimuli. I'm not saying that's actually how it works, I'm just saying that reacting to wounds doesn't demonstrate what you want it to demonstrate.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T21:31:36.832Z · score: 2 (1 votes) · LW · GW

It can be aware of an experience its' having, even if its' not aware that it is the one having the experience

I strongly suspect this sentence is based on a confused understanding of qualia.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T20:02:19.011Z · score: 2 (1 votes) · LW · GW

these methods lack enough feedback to enable self-awareness

Although I think this is plausibly the case, I'm far from confident that it's actually true. Are there any specific limitations you think play a role here?

Comment by dxu on The AI Timelines Scam · 2019-07-11T17:57:00.619Z · score: 24 (9 votes) · LW · GW

I agree that it's difficult (practically impossible) to engage with a criticism of the form "I don't find your examples compelling", because such a criticism is in some sense opaque: there's very little you can do with the information provided, except possibly add more examples (which is time-consuming, and also might not even work if the additional examples you choose happen to be "uncompelling" in the same way as your original examples).

However, there is a deeper point to be made here: presumably you yourself only arrived at your position after some amount of consideration. The fact that others appear to find your arguments (including any examples you used) uncompelling, then, usually indicates one of two things:

  1. You have not successfully expressed the full chain of reasoning that led you to originally adopt your conclusion (owing perhaps to constraints on time, effort, issues with legibility, or strategic concerns). In this case, you should be unsurprised at the fact that other people don't appear to be convinced by your post, since your post does not present the same arguments/evidence that convinced you yourself to believe your position.

  2. You do, in fact, find the raw examples in your post persuasive. This would then indicate that any disagreement between you and your readers is due to differing priors, i.e. evidence that you would consider sufficient to convince yourself of something, does not likewise convince others. Ideally, this fact should cause you to update in favor of the possibility that you are mistaken, at least if you believe that your interlocutors are being rational and intellectually honest.

I don't know which of these two possibilities it actually is, but it may be worth keeping this in mind if you make a post that a bunch of people seem to disagree with.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T23:54:05.616Z · score: 6 (3 votes) · LW · GW

It's a common belief, but it appears to me quite unfounded, since it hasn't happened in millennia of trying. So, a direct observation speaks against this model.


It's another common belief, though separate from the belief of reality. It is a belief that this reality is efficiently knowable, a bold prediction that is not supported by evidence and has hints to the contrary from the complexity theory.


General Relativity plus the standard model of the particle physics have stood unchanged and unchallenged for decades, the magic numbers they require remaining unexplained since the Higgs mass was predicted a long time ago. While this suggests that, yes, we will probably never stop being surprised by the universe observations, I make no such claims.

I think at this stage we have finally hit upon a point of concrete disagreement. If I'm interpreting you correctly, you seem to be suggesting that because humans have not yet converged on a "Theory of Everything" after millennia of trying, this is evidence against the existence of such a theory.

It seems to me, on the other hand, that our theories have steadily improved over those millennia (in terms of objectively verifiable metrics like their ability to predict the results of increasingly esoteric experiments), and that this is evidence in favor of an eventual theory of everything. That we haven't converged on such a theory yet is simply a consequence, in my view, of the fact that the correct theory is in some sense hard to find. But to postulate that no such theory exists is, I think, not only unsupported by the evidence, but actually contradicted by it--unless you're interpreting the state of scientific progress quite differently than I am.*

That's the argument from empirical evidence, which (hopefully) allows for a more productive disagreement than the relatively abstract subject matter we've discussed so far. However, I think one of those abstract subjects still deserves some attention--in particular, you expressed further confusion about my use of the word "coincidence":

I am still unsure what you mean by coincidence here. The dictionary defines it as "A remarkable concurrence of events or circumstances without apparent causal connection." and that open a whole new can of worms about what "apparent" and "causal" mean in the situation we are describing, and we soon will be back to a circular argument of implying some underlying reality to explain why we need to postulate reality.

I had previously provided a Tabooed version of my statement, but perhaps even that was insufficiently clear. (If so, I apologize.) This time, instead of attempting to make my statement even more abstract, I'll try taking a different tack and making things more concrete:

I don't think that, if our observations really were impossible to model completely accurately, we would be able to achieve the level of predictive success we have. The fact that we have managed to achieve some level of predictive accuracy (not 100%, but some!) strongly suggests to me that our observations are not impossible to model--and I say this for a very simple reason:

How can it be possible to achieve even partial accuracy at predicting something that is purportedly impossible to model? We can't have done it by actually modeling the thing, of course, because we're assuming that the thing cannot be modeled by hypothesis. So our seeming success at predicting the thing, must not actually be due to any kind of successful modeling of said thing. Then how is it that our model is producing seemingly accurate predictions? It seems as though we are in a similar position to a lazy student who, upon being presented with a test they didn't study for, is forced to guess the right answers--except that in our case, the student somehow gets lucky enough to choose the correct answer every time, despite the fact that they are merely guessing, rather than working out the answer the way they should.

I think that the word "coincidence" is a decent way of describing the student's situation in this case, even if it doesn't fully accord with your dictionary's definition (after all, whoever said the dictionary editors determine have the sole power to determine a word's usage?)--and analogously, our model of the thing must also only be making correct predictions by coincidence, since we've ruled out the possibility, a priori, that it might actually be correctly modeling the way the thing works.

I find it implausible that our models are actually behaving this way with respect to the "thing"/the universe, in precisely the same way I would find it implausible that a student who scored 95% on a test had simply guessed on all of the questions. I hope that helps clarify what I meant by "coincidence" in this context.

*You did say, of course, that you weren't making any claims or postulates to that effect. But it certainly seems to me that you're not completely agnostic on the issue--after all, your initial claim was "it's models all the way down", and you've fairly consistently stuck to defending that claim throughout not just this thread, but your entire tenure on LW. So I think it's fair to treat you as holding that position, at least for the sake of a discussion like this.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T18:15:59.693Z · score: 2 (1 votes) · LW · GW

(Okay, I've been meaning to get back to you on this for a while, but for some reason haven't until now.)

It seems, based on what you're saying, that you're taking "reality" to mean some preferred set of models. If so, then I think I was correct that you and I were using the same term to refer to different concepts. I still have some questions for you regarding your position on "reality" as you understand the term, but I think it may be better to defer those until after I give a basic rundown of my position.

Essentially, my belief in an external reality, if we phrase it in the same terms we've been using (namely, the language of models and predictions), can be summarized as the belief that there is some (reachable) model within our hypothesis space that can perfectly predict further inputs. This can be further repackaged into a empirical prediction: I expect that (barring an existential catastrophe that erases us entirely) there will eventually come a point when we have the "full picture" of physics, such that no further experiments we perform will ever produce a result we find surprising. If we arrive at such a model, I would be comfortable referring to that model as "true", and the phenomena it describes as "reality".

Initially, I took you to be asserting the negation of the above statement--namely, that we will never stop being surprised by the universe, and that our models, though they might asymptotically approach a rate of 100% predictive success, will never quite get there. It is this claim that I find implausible, since it seems to imply that there is no model in our hypothesis space capable of predicting further inputs with 100% accuracy--but if that is the case, why do we currently have a model with >99% predictive accuracy? Is the success of this model a mere coincidence? It must be, since (by assumption) there is no model actually capable of describing the universe. This is what I was gesturing at with the "coincidence" hypothesis I kept mentioning.

Now, perhaps you actually do hold the position described in the above paragraph. (If you do, please let me know.) But based on what you wrote, it doesn't seem necessary for me to assume that you do. Rather, you seem to be saying something along the lines of, "It may be tempting to take our current set of models as describing how reality ultimately is, but in fact we have no way of knowing this for sure, so it's best not to assume anything."

If that's all you're saying, it doesn't necessarily conflict with my view (although I'd suggest that "reality doesn't exist" is a rather poor way to go about expressing this sentiment). Nonetheless, if I'm correct about your position, then I'm curious as to what you think it's useful for? Presumably it doesn't help make any predictions (almost by definition), so I assume you'd say it's useful for dissolving certain kinds of confusion. Any examples, if so?

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-10T17:39:18.162Z · score: 4 (2 votes) · LW · GW

The linked post is the last in a series of posts, the first of which has been linked here in the past. I recommend that anyone who reads the post shminux linked, also read the LW discussion of the post I just linked, as it seems to me that many of the arguments therein are addressed in a more than satisfactory manner. (In particular, I strongly endorse Jessica Taylor's response, which is as of this writing the most highly upvoted comment on that page.)

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-07T17:38:12.343Z · score: 2 (1 votes) · LW · GW
Suppose I have a training set of articles which are labeled "biased" or "unbiased". I then train a system (using this set), and later use it to label articles "biased" or "unbiased". Will this lead to a manipulative system?

Mostly I would expect such a system to overfit on the training data, and perform no better than chance when tested. The reason for this is that unlike your example, where cats and dogs are (fairly) natural categories with simple distinguishing characteristics, the perception of "bias" in news articles is fundamentally tied to human psychology, and as a result is much more complicated concept to learn than catness versus dogness. By default I would expect an offline training method to completely fail at learning said concept.

Reinforcement learning, meanwhile, will indeed become manipulative (in my expectation). In a certain sense you can view this as a form of overfitting as well, except that the system learns to exploit peculiarities of the humans performing the classification, rather than simply peculiarities of the articles in its training data. (As you might imagine, the former is far more dangerous.)

Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:50:15.693Z · score: 2 (1 votes) · LW · GW
What's an actual mind?

My philosophy of mind is not yet advanced enough to answer this question. (However, the fact that I am unable to answer a question at present does not imply that there is no answer.)

How do you know that a dog has it?

In a certain sense, I don't. However, I am reasonably confident that regardless of whatever actually constitutes mindfulness, enough of it is shared between the dog and myself that if the dog turns out not to have a mind, then I also do not have a mind. Since I currently believe I do, in fact, have a mind, it follows that I believe the dog does as well.

(Perhaps you do not believe dogs have minds. In that case, the correct response would be to replace the dog in the thought experiment with something you do believe has a mind--for example, a close friend or family member.)

Would you care about an alien living creature that has a different mind-design and doesn't feel qualia?

Most likely not, though I remain uncertain enough about my own preferences that what I just said could be false.

Anyway, if you have no reason to think that the element is absent, then you'll believe that it's present. It's precisely because you feel that something is (or will be) missing, you refuse the offer. You do have some priors about what consequences will be produced by your choice, and that's OK. Nothing incoherent in refusing the offer. That is, if you do have reasons to believe that that's the case.

I agree with this, but it seems not to square with what you wrote originally:

Do you still think that taking a dollar is the wrong choice, even though literally nothing changes afterwards? If you do, do you think it’s a rational choice? Or is your S1 deluding you?
Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:24:38.383Z · score: 3 (2 votes) · LW · GW
I used the words “truly real”. The dog doesn’t matter, the consequences of the phenomenon that you call “dog” matter.

Wrong. This misses the point of the thought experiment entirely, which is precisely that people are allowed to care about things that aren't detectable by any empirical test. If someone is being tortured in a spaceship that's constantly accelerating away from me, such that the ship and I cannot interact with each other even in principle, I can nonetheless hold that it would be morally better if that person were rescued from their torture (though I myself obviously can't do the rescuing). There is nothing incoherent about this.

In the case of the dog, what matters to me is the dog's mental state. I do not care that I observe a phenomenon exhibiting dog-like behavior; I care that there is an actual mind producing that behavior. If the dog wags its tail to indicate contentment, I want the dog to actually feel content. If the dog is actually dead and I'm observing a phantom dog, then there is no mental state to which the dog's behavior is tied, and hence a crucial element is missing--even if that element is something I can't detect even in principle, even if I myself have no reason to think the element is absent. There is nothing incoherent about this, either.

Fundamentally, you seem to be demanding that other people toss out any preferences they may have that do not conform to the doctrine of the logical positivists. I see no reason to accede to this demand, and as there is nothing in standard preference theory that forces me to accede, I think I will continue to maintain preferences whose scope includes things that actually exist, and not just things I think exist.

Comment by dxu on Everybody Knows · 2019-07-05T16:01:19.482Z · score: 7 (5 votes) · LW · GW

Sometimes being overly literal is a useful way to point out hidden assumptions/inferences that the interlocutor attempted to sneak into the conversation (whether consciously or subconsciously).

Comment by dxu on Let's Read: an essay on AI Theology · 2019-07-05T15:57:15.611Z · score: 2 (1 votes) · LW · GW
the earring provides a kind of slow mind-uploading

This is only true if whatever (hyper)computation the earring is using to make recommendations contains a model of the wearer. Such a model could be interpreted as a true upload, in which case it would be true that the wearer's mind is not actually destroyed.

However, if the earring's predictions are made by some other means (which I don't think is impossible even in real life--predictions are often made without consulting a detailed, one-to-one model of the thing being predicted), then there is no upload, and the user has simply been taken over like a mindless puppet.

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-02T20:12:29.631Z · score: 3 (2 votes) · LW · GW
Typical unbiased newsfeeds in the real world are created by organizations with bias who have an interest in spreading biased news.

I think the word "unbiased" there may be a typo; your statement would make a lot more sense if the word you meant to put there was actually "biased". Assuming it's just a typo:

You're correct that in the real world, most sources of biased news are that way because they are deliberately engineered to be so, and not because of problems with AI optimizing proxy goals. That being said, it's important to point out that even if there existed a hypothetical organization with the goal of combating bias in news articles, they wouldn't be able to do so by training a machine learning system, since (as the article described) most attempts to do so end up failing to various forms of Goodhart's Law. So in a certain sense, the intentions of the underlying organization are irrelevant, because they will encounter this problem regardless of whether they care about being unbiased.

More generally, the newsfeed example is one way to illustrate a larger point, which is that by default, training an ML system to perform tasks involving humans will incentivize the system to manipulate those humans. This problem shows up regardless of whether the person doing the training actually wants to manipulate people, which makes it a separate issue from the fact that certain organizations engage in manipulation.

(Also, it's worth noting that even if you do want to manipulate people, generally you want to manipulate them toward some specific end. A poorly trained AI system, on the other hand, might end up manipulating them in essentially arbitrary ways that have nothing to do with your goal. In other words, even if you want to use AI "for evil", you still need to figure out how to make it do what you want it to do.)

This is the essence of the alignment problem in a nutshell, and it's why I asked whether you had any alternative training procedures in mind.

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-02T18:31:25.721Z · score: 5 (3 votes) · LW · GW
the human also wanted it to be manipulative

Since the article did not explicitly impute any motives to the programmers of the AI system, you must have somehow inferred the quoted claim from their described behavior. The basis of such an inference can only be the procedure that they used to train the AI; since, after all, no other specific behavior was described. This then implies the following:

You believe that the act of training an optimizer to maximize the number of news articles labeled "unbiased" is, in fact, a deliberate attempt at creating a subtly manipulative newsfeed. Corollary: someone who was not attempting to create a manipulative newsfeed--someone who really, truly cared about making sure their articles are unbiased--would not have implemented this training procedure, but rather some alternative procedure that is not as prone to producing manipulative behavior.

What alternative procedure do you have in mind here?

Comment by dxu on Is AlphaZero any good without the tree search? · 2019-07-02T17:24:29.149Z · score: 2 (1 votes) · LW · GW
It is International Master-level without tree search. Good amateur

International masters are emphatically not amateurs. Indeed, IMs are at the level where they can offer coaching services to amateur players, and reasonably expect to be paid something on the order of $100 per session. To elaborate on this point:

there are >1000 players in the world that are better.

The total number of FIDE-rated chess players is over 500,000. The number of IMs, meanwhile, totals less than 3,000. IMs are quite literally in the 99th percentile of chess ability, and that's actually being extremely restrictive with the population--there are many casual players who don't have FIDE ratings at all, since only people who play in at least one FIDE-rated tournament will be assigned a rating.

Comment by dxu on The Competence Myth · 2019-06-30T21:31:13.704Z · score: 4 (3 votes) · LW · GW

This seems potentially related to the more general idea of civilizational inadequacy/Moloch.

Comment by dxu on Decision Theory · 2019-06-29T04:29:38.647Z · score: 2 (3 votes) · LW · GW
The concept is an abstraction.

*Yes, it is. The fact that it is an abstraction is precisely why it breaks down under certain circumstances.

An I/O channel doesn't imply modern computer technology. It just means information is collected from or imprinted upon the environment. It could be ant pheromones, it could be smoke signals, its physical implementation is secondary to the abstract concept of sending and receiving information of some kind. You're not seeing the forest through the trees. Information most certainly does exist.

The claim is not that "information" does not exist. The claim is that input/output channels are in fact an abstraction over more fundamental physical configurations. Nothing you wrote contradicts this, so the fact that you seem to think what I wrote was somehow incorrect is puzzling.

I've explained in previous posts that AIXI is a special case of AIXI_lt. AIXI_lt can be conceived of in an embedded context,


in which case; its model of the world would include a model of itself which is subject to any sort of environmental disturbance

*No. AIXI-tl explicitly does not model itself or seek to identify itself with any part of the Turing machines in its hypothesis space. The very concept of self-modeling is entirely absent from AIXI's definition, and AIXI-tl, being a variant of AIXI, does not include said concept either.

To some extent, an agent must trust its own operation to be correct, because you quickly run into infinite regression if the agent is modeling all the possible that it could be malfunctioning. What if the malfunction effects the way it models the possible ways it could malfunction? It should model all the ways a malfunction could disrupt how it models all the ways it could malfunction, right? It's like saying "well the agent could malfunction, so it should be aware that it can malfunction so that it never malfunctions". If the thing malfunctions, it malfunctions, it's as simple as that.

*This is correct, so far as it goes, but what you neglect to mention is that AIXI makes no attempt to preserve its own hardware. It's not just a matter of "malfunctioning"; humans can "malfunction" as well. However, the difference between humans and AIXI is that we understand what it means to die, and go out of our way to make sure our bodies are not put in undue danger. Meanwhile, AIXI will happily allows its hardware to be destroyed in exchange for the tiniest increase in reward. I don't think I'm being unfair when I suggest that this behavior is extremely unnatural, and is not the kind of thing most people intuitively have in mind when they talk about "intelligence".

Aside from that, AIXI is meant to be a purely mathematical formalization, not a physical implementation. It's an abstraction by design. It's meant to be used as a mathematical tool for understanding intelligence.

*Abstractions are useful for their intended purpose, nothing more. AIXI was formulated as an attempt to describe an extremely powerful agent, perhaps the most powerful agent possible, and it serves that purpose admirably so long as we restrict analysis to problems in which the agent and the environment can be cleanly separated. As soon as that restriction is removed, however, it's obvious that the AIXI formalism fails to capture various intuitively desirable behaviors (e.g. self-preservation, as discussed above). As a tool for reasoning about agents in the real world, therefore, AIXI is of limited usefulness. I'm not sure why you find this idea objectionable; surely you understand that all abstractions have their limits?

Do you consider how the 30 Watts leaking out of your head might effect your plans to every day? I mean, it might cause a typhoon in Timbuktu! If you don't consider how the waste heat produced by your mental processes effect your environment while making long or short-term plans, you must not be a real intelligent agent...

Indeed, you are correct that waste heat is not much of a factor when it comes to humans. However, that does not mean that the same holds true for advanced agents running on powerful hardware, especially if such agents are interacting with each other; who knows what can be deduced from various side outputs, if a superintelligence is doing the deducing? Regardless of the answer, however, one thing is clear: AIXI does not care.

This seems to address the majority of your points, and the last few paragraphs of your comment seem mainly to be reiterating/elaborating on those points. As such, I'll refrain from replying in detail to everything else, in order not to make this comment longer than it already is. If you respond to me, you needn't feel obligated to reply to every individual point I made, either. I marked what I view as the most important points of disagreement with an asterisk*, so if you're short on time, feel free to respond only to those.

Comment by dxu on Decision Theory · 2019-06-29T00:35:19.341Z · score: 2 (3 votes) · LW · GW
At some point, the I/O channels *must be* well defined.

This statement is precisely what is being challenged--and for good reason: it's untrue. The reason it's untrue is because the concept of "I/O channels" does not exist within physics as we know it; the true laws of physics make no reference to inputs, outputs, or indeed any kind of agents at all. In reality, that which is considered a computer's "I/O channels" are simply arrangements of matter and energy, the same as everything else in our universe. There are no special XML tags attached to those configurations of matter and energy, marking them "input", "output", "processor", etc. Such a notion is unphysical.

Why might this distinction be important? It's important because an algorithm that is implemented on physically existing hardware can be physically disrupted. Any notion of agency which fails to account for this possibility--such as, for example, AIXI, which supposes that the only interaction it has with the rest of the universe is by exchanging bits of information via the input/output channels--will fail to consider the possibility that its own operation may be disrupted. A physical implementation of AIXI would have no regard for the safety of its hardware, since it has no means of representing the fact that the destruction of its hardware equates to its own destruction.

AIXI also fails on various decision problems that involve leaking information via a physical side channel that it doesn't consider part of its output; for example, it has no regard for the thermal emissions it may produce as a side effect of its computations. In the extreme case, AIXI is incapable of conceptualizing the possibility that an adversarial agent may be able to inspect its hardware, and hence "read its mind". This reflects a broader failure on AIXI's part: it is incapable of representing an entire class of hypotheses--namely, hypotheses that involve AIXI itself being modeled by other agents in the environment. This is, again, because AIXI is defined using a framework that makes it unphysical: the classical definition of AIXI is uncomputable, making it too "big" to be modeled by any (part) of the Turing machines in its hypothesis space. This applies even to computable formulations of AIXI, such as AIXI-tl: they have no way to represent the possibility of being simulated by others, because they assume they are too large to fit in the universe.

I'm not sure what exactly is so hard to understand about this, considering the original post conveyed all of these ideas fairly well. It may be worth considering the assumptions you're operating under--and in particular, making sure that the post itself does not violate those assumptions--before criticizing said post based on those assumptions.