Comments

Comment by dxu on Alignment By Default · 2020-08-16T16:18:57.704Z · score: 6 (3 votes) · LW · GW

I like this post a lot, and I think it points out a key crux between what I would term the "Yudkowsky" side (which seems to mostly include MIRI, though I'm not too sure about individual researchers' views) and "everybody else".

In particular, the disagreement seems to crystallize over the question of whether "human values" really are a natural abstraction. I suspect that if Eliezer thought that they were, he would be substantially less worried about AI alignment than he currently is (though naturally all of this is my read on his views).

You do provide some reasons to think that human values might be a natural abstraction, both in the post itself and in the comments, but I don't see these reasons as particularly compelling ones. The one I view as most compelling is the argument that humans seem to be fairly good at identifying and using natural abstractions, and therefore any abstract concept that we seem to be capable of grasping fairly quickly has a strong chance of being a natural one.

However, I think there's a key difference between abstractions that are developed for the purposes of prediction, and abstractions developed for other purposes (by which I mostly mean "RL"). To the extent that a predictor doesn't have sufficient computational power to form a low-level model of whatever it's trying to predict, I definitely think that the abstractions it develops in the process of trying to improve its prediction will to a large extent be natural ones. (You lay out the reasons for this clearly enough in the post itself, so I won't repeat them here.)

It seems to me, though, that if we're talking about a learning agent that's actually trying to take actions to accomplish things in some environment, there's a substantial amount of learning going on that has nothing to do with learning to predict things with greater accuracy! The abstractions learned in order to select actions from a given action-space in an attempt to maximize a given reward function--these, I see little reason to expect will be natural. In fact, if the computational power afforded to the agent is good but not excellent, I expect mostly the opposite: a kludge of heuristics and behaviors meant to address different subcases of different situations, with not a whole lot of rhyme or reason to be found.

As agents go, humans are definitely of the latter type. And, therefore, I think the fact that we intuitively grasp the concept of "human values" isn't necessarily an argument that "human values" are likely to be natural, in the way that it would be for e.g. trees. The latter would have been developed as a predictive abstraction, whereas the former seems to mainly consist of what I'll term a reward abstraction. And it's quite plausible to me that reward abstractions are only legible by default to agents which implement that particular reward abstraction, and not otherwise. If that's true, then the fact that humans know what "human values" are is merely a consequence of the fact that we happen to be humans, and therefore have a huge amount of mind-structure in common.

To the extent that this is comparable to the branching pattern of a tree (which is a comparison you make in the post), I would argue that it increases rather than lessens the reason to worry: much like a tree's branch structure is chaotic, messy, and overall high-entropy, I expect human values to look similar, and therefore not really encompass any kind of natural category.

Comment by dxu on The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) · 2020-08-02T23:51:28.965Z · score: 23 (7 votes) · LW · GW

Here's the actual explanation for this: https://twitter.com/nickwalton00/status/1289946861478936577

This seems to have been an excellent exercise in noticing confusion; in particular, figuring this one out properly would have required one to recognize that this behavior does not accord with one's pre-existing model, rather than simply coming up with an ad hoc explanation to fit the observation.

I therefore award partial marks to Rafael Harth for not proposing any explanations in particular, as well as Viliam in the comments:

I assumed that the GPT's were just generating the next word based on the previous words, one word at a time. Now I am confused.

Zero marks to Andy Jones, unfortunately:

I am fairly confident that Latitude wrap your Dungeon input before submitting it to GPT-3; if you put in the prompt all at once, that'll make for different model input than putting it in one line at a time.

Don't make up explanations! Take a Bayes penalty for your transgressions!

(No one gets full marks, unfortunately, since I didn't see anyone actually come up with the correct explanation.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-26T17:54:02.035Z · score: 6 (5 votes) · LW · GW

For what it's worth, my perception of this thread is the opposite of yours: it seems to me John Wentworth's arguments have been clear, consistent, and easy to follow, whereas you (John Maxwell) have been making very little effort to address his position, instead choosing to repeatedly strawman said position (and also repeatedly attempting to lump in what Wentworth has been saying with what you think other people have said in the past, thereby implicitly asking him to defend whatever you think those other people's positions were).

Whether you've been doing this out of a lack of desire to properly engage, an inability to comprehend the argument itself, or some other odd obstacle is in some sense irrelevant to the object-level fact of what has been happening during this conversation. You've made your frustration with "AI safety people" more than clear over the course of this conversation (and I did advise you not to engage further if that was the case!), but I submit that in this particular case (at least), the entirety of your frustration can be traced back to your own lack of willingness to put forth interpretive labor.

To be clear: I am making this comment in this tone (which I am well aware is unkind) because there are multiple aspects of your behavior in this thread that I find not only logically rude, but ordinarily rude as well. I more or less summarized these aspects in the first paragraph of my comment, but there's one particularly onerous aspect I want to highlight: over the course of this discussion, you've made multiple references to other uninvolved people (either with whom you agree or disagree), without making any effort at all to lay out what those people said or why it's relevant to the current discussion. There are two examples of this from your latest comment alone:

Daniel K agreed with me the other day that there isn't a standard reference for this claim. [Note: your link here is broken; here's a fixed version.]

A MIRI employee openly admitted here that they apply different standards of evidence to claims of safety vs claims of not-safety.

Ignoring the question of whether these two quoted statements are true (note that even the fixed version of the link above goes only to a top-level post, and I don't see any comments on that post from the other day), this is counterproductive for a number of reasons.

Firstly, it's inefficient. If you believe a particular statement is false (and furthermore, that your basis for this belief is sound), you should first attempt to refute that statement directly, which gives your interlocutor the opportunity to either counter your refutation or concede the point, thereby moving the conversation forward. If you instead counter merely by invoking somebody else's opinion, you both increase the difficulty of answering and end up offering weaker evidence.

Secondly, it's irrelevant. John Wentworth does not work at MIRI (neither does Daniel Kokotajlo, for that matter), so bringing up aspects of MIRI's position you dislike does nothing but highlight a potential area where his position differs from MIRI's. (I say "potential" because it's not at all obvious to me that you've been representing MIRI's position accurately.) In order to properly challenge his position, again it becomes more useful to critique his assertions directly rather than round them off to the closest thing said by someone from MIRI.

Thirdly, it's a distraction. When you regularly reference a group of people who aren't present in the actual conversation, repeatedly make mention of your frustration and "grumpiness" with those people, and frequently compare your actual interlocutor's position to what you imagine those people have said, all while your actual interlocutor has said nothing to indicate affiliation with or endorsement of those people, it doesn't paint a picture of an objective critic. To be blunt: it paints a picture of someone with a one-sided grudge against the people in question, who is attempting to inject that grudge into conversations where it shouldn't be present.

I hope future conversations can be more pleasant than this.

Comment by dxu on The Basic Double Crux pattern · 2020-07-23T04:16:58.984Z · score: 2 (1 votes) · LW · GW

I think shminux may have in mind one or more specific topics of contention that he's had to hash out with multiple LWers in the past (myself included), usually to no avail. 

(Admittedly, the one I'm thinking of is deeply, deeply philosophical, to the point where the question "what if I'm wrong about this?" just gets the intuition generator to spew nonsense. But I would say that this is less about an inability to question one's most deeply held beliefs, and more about the fact that there are certain aspects of our world-models that are still confused, and querying them directly may not lead to any new insight.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:54:52.105Z · score: 6 (3 votes) · LW · GW

If it's read moral philosophy, it should have some notion of what the words "human values" mean.

GPT-3 and systems like it are trained to mimic human discourse. Even if (in the limit of arbitrary computational power) it manages to encode an implicit representation of human values somewhere in its internal state, in actual practice there is nothing tying that representation to the phrase "human values", since moral philosophy is written by (confused) humans, and in human-written text the phrase "human values" is not used in the consistent, coherent manner that would be required to infer its use as a label for a fixed concept.

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:48:17.942Z · score: 2 (1 votes) · LW · GW

On "conceding the point":

You said earlier that "The argument for the fragility of value never relied on AI being unable to understand human values." I gave you a quote from Superintelligence which talked about AI being unable to understand human values. Are you gonna, like, concede the point or something?

The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.

On gwern's article:

Anyway, I read Gwern's article a while ago and I thought it was pretty bad. If I recall correctly, Gwern confuses various different notions, for example, he seemed to think that if you replace enough bits of handcrafted software with bits trained using machine learning, an agent will spontaneously emerge.

I'm not sure how to respond to this, except to state that neither this specific claim nor anything particularly close to it appears in the article I linked.

On Tool AI:

Are possible

As far as I'm aware, this point has never been the subject of much dispute.

Are easier to build than Agent AIs

This is still arguable; I have my doubts, but in a "big picture" sense this is largely irrelevant to the greater point, which is:

Will be able to solve the value-loading problem

This is (and remains) the crux. I still don't see how GPT-3 supports this claim! Just as a check that we're on the same page: when you say "value-loading problem", are you referring to something more specific than the general issue of getting an AI to learn and behave according to our values?

***

META: I can understand that you're frustrated about this topic, especially if it seems to you that the "MIRI-sphere" (as you called it in a different comment) is persistently refusing to acknowledge something that appears obvious to you.

Obviously, I don't agree with that characterization, but in general I don't want to engage in a discussion that one side is finding increasingly unpleasant, especially since that often causes the discussion to rapidly deteriorate in quality after a few replies.

As such, I want to explicitly and openly relieve you of any social obligation you may have felt to reply to this comment. If you feel that your time would be better spent elsewhere, please do!

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T02:43:24.482Z · score: 2 (1 votes) · LW · GW

My claim is that we are likely to see a future GPT-N system which [...] does not "resist attempts to meddle with its motivational system".

Well, yes. This is primarily because GPT-like systems don't have a "motivational system" with which to meddle. This is not a new argument by any means: the concept of AI systems that aren't architecturally goal-oriented by default is known as "Tool AI", and there's plenty of pre-existing discussion on this topic. I'm not sure what you think GPT-3 adds to the discussion that hasn't already been mentioned?

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T22:33:36.283Z · score: 2 (1 votes) · LW · GW

I'm confused by what you're saying.

The argument for the fragility of value never relied on AI being unable to understand human values. Are you claiming it does?

If not, what are you claiming?

Comment by dxu on Coronavirus as a test-run for X-risks · 2020-06-14T00:06:03.697Z · score: 3 (2 votes) · LW · GW

I'd love to see more thought about how the MNM effect might look in an AI scenario. Like you said, maybe denials and assurances followed by freakouts and bans. But maybe we could predict what sorts of events would trigger the shift?

I take it you're presuming slow takeoff in this paragraph, right?

Comment by dxu on Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning · 2020-06-10T16:59:04.237Z · score: 4 (2 votes) · LW · GW

Differing discourse norms; in general, communities that don't continually expend time and energy on maintaining better-than-average standards of discourse will, by default, regress to the mean. (We saw the same thing happen with LW1.0.)

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T20:06:38.721Z · score: 21 (6 votes) · LW · GW

I'm not seeing how you distinguish between the following two hypotheses:

  1. GPT-3 exhibits mostly flat scaling at the tasks you mention underneath your first bullet point (WiC, MultiRC, etc.) because its architecture is fundamentally unsuited to those tasks, such that increasing the model capacity will lead to little further improvement.
  2. Even 175B parameters isn't sufficient to perform well on certain tasks (given a fixed architecture), but increasing the number of parameters will eventually cause performance on said tasks to undergo a large increase (akin to something like a phase change in physics).
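
To make the distinction concrete, here is a toy numerical sketch (all numbers are hypothetical and purely illustrative; nothing here is fit to any actual benchmark data): a saturating curve and a late-inflection sigmoid can look nearly identical over the range of model sizes actually trained, while implying very different things about further scaling.

```python
import math

# Toy illustration only: hypothetical curves, not fit to any real benchmark.
# Hypothesis 1: performance saturates well below the fine-tuned ceiling.
# Hypothesis 2: performance follows a sigmoid whose inflection lies beyond
# the largest model actually trained, so it *looks* flat until it jumps.

def h1_saturating(log10_params):
    return 0.55 + 0.05 * (1 - math.exp(-(log10_params - 8)))

def h2_late_sigmoid(log10_params):
    return 0.55 + 0.45 / (1 + math.exp(-3 * (log10_params - 12)))

print("params       H1     H2")
for log10_p in (9, 10, 11, 11.24, 12, 13):   # 10^11.24 is roughly 175B
    print(f"10^{log10_p:<7} {h1_saturating(log10_p):.3f}  {h2_late_sigmoid(log10_p):.3f}")
```

Both toy curves land in roughly the same place at the largest model size that has actually been trained; they only come apart at scales no one has yet reached, which is exactly why the observed data underdetermines which hypothesis is true.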

It sounds like you're implicitly taking the first hypothesis as a given (e.g. when you assert that there is a "remaining gap vs. fine-tuning that seems [unlikely] to be closed"), but I see no reason to give this hypothesis preferential treatment!

In fact, it seems to be precisely the assertion of the paper's authors that the first hypothesis should not be taken as a given; and the evidence they give to support this assertion is... the multiple downstream tasks for which an apparent "phase change" did in fact occur. Let's list them out:

  • BoolQ (apparent flatline between 2.6B and 13B, then a sudden jump in performance at 175B)
  • CB (essentially noise between 0.4B and 13B, then a sudden jump in performance at 175B)
  • RTE (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • WSC (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • basic arithmetic (mostly flat until 6.7B, followed by rapid improvement until 175B)
  • SquadV2 (apparent flatline at 0.8B, sudden jump at 1.3B followed by approximately constant rate of improvement until 175B)
  • ANLI round 3 (noise until 13B, sudden jump at 175B)
  • word-scramble with random insertion (sudden increase in rate of improvement after 6.7B)

Several of the above examples exhibit a substantial amount of noise in their performance graphs, but nonetheless, I feel my point stands. Given this, it seems rather odd for you to be claiming that the "great across-task variance" indicates a lack of general reasoning capability when said across-task variance is (if anything) evidence for the opposite, with many tasks that previously stumped smaller models being overcome by GPT-3.

It's especially interesting to me that you would write the following, seemingly without realizing the obvious implication (emphasis mine):

we still see a wide spread of task performance despite smooth gains in LM loss, with some of the most distinctive deficits persisting at all scales (common sense physics, cf section 5), and some very basic capabilities only emerging at very large scale and noisily even there (arithmetic)

The takeaway here is, at least in my mind, quite clear: it's a mistake to evaluate model performance on human terms. Without getting into an extended discussion on whether arithmetic ought to count as a "simple" or "natural" task, empirically transformers do not exhibit a strong affinity for the task. Therefore, the fact that this "basic capability" emerges at all is, or at least should be, strong evidence for generalization capability. As such, the way you use this fact to argue otherwise (both in the section I just quoted and in your original post) seems to me to be exactly backwards.


Elsewhere, you write:

The ability to get better downstream results is utterly unsurprising: it would be very surprising if language prediction grew steadily toward perfection without a corresponding trend toward good performance on NLP benchmarks

It's surprising to me that you would write this while also claiming that few-shot prediction seems unlikely to close the gap to fine-tuned models on certain tasks. I can't think of a coherent model where both of these claims are simultaneously true; if you have one, I'd certainly be interested in hearing what it is.

More generally, this is (again) why I stress the importance of concrete predictions. You call it "utterly unsurprising" that a 175B-param model would outperform smaller ones on NLP benchmarks, and yet neither you nor anyone else could have predicted what the scaling curves for those benchmarks would look like. (Indeed, your entire original post can be read as an expression of surprise at the lack of impressiveness of GPT-3's performance on certain benchmarks.)

When you only ever look at things in hindsight, without ever setting forth concrete predictions that can be overturned by evidence, you run the risk of never forming a model concrete enough to be engaged with. I don't believe it's a coincidence that you called it "difficult" to explain why you found the paper unimpressive: it's because your standards of impressiveness are opaque enough that they don't, in and of themselves, constitute a model of how transformers might/might not possess general reasoning ability.

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T17:49:33.971Z · score: 6 (4 votes) · LW · GW

Also note that a significant number of humans would fail the kind of test you described (inducing the behavior of a novel mathematical operation from a relatively small number of examples), which is why similar tests of inductive reasoning ability show up quite often on IQ tests and the like. It's not the case that failing at that kind of test shows a lack of general reasoning skills, unless we permit that a substantial fraction of humans lack general reasoning skills to at least some extent.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-31T02:12:32.804Z · score: 10 (4 votes) · LW · GW

I don't think the practical value of very new techniques is impossible to estimate. For example, the value of BERT was very clear in the paper that introduced it: it was obvious that this was a strictly better way to do supervised NLP, and it was quickly and widely adopted.

This comparison seems disingenuous. The goal of the BERT paper was to introduce a novel training method for Transformer-based models that measurably outperformed previous training methods. Conversely, the goal of the GPT-3 paper seems to be to investigate the performance of an existing training method when scaled up to previously unreached (and unreachable) model sizes. I would expect you to agree that these are two very different things, surely?

More generally, it seems to me that you've been consistently conflating the practical usefulness of a result with how informative said result is. Earlier, you wrote that "few-shot LM prediction" (not GPT-3 specifically, few-shot prediction in general!) doesn't sound that promising to you because the specific model discussed in the paper doesn't outperform SOTA on all benchmarks, and also requires currently impractical levels of hardware/compute. Setting aside the question of whether this original claim resembles the one you just made in your latest response to me (it doesn't), neither claim addresses what, in my view, are the primary implications of the GPT-3 paper--namely, what it says about the viability of few-shot prediction as model capacity continues to increase.

This, incidentally, is why I issued the "smell test" described in the grandparent, and your answer more or less confirms what I initially suspected: the paper comes across as unsurprising to you because you largely had no concrete predictions to begin with, beyond the trivial prediction that existing trends will persist to some (unknown) degree. (In particular, I didn't see anything in what you wrote that indicates an overall view of how far the capabilities of current language models are from human reasoning ability, and what that might imply about where model performance might start flattening with increased scaling.)

Since it doesn't appear that you had any intuitions to begin with about what GPT-3's results might indicate about the scalability of language models in general, it makes sense that your reading of the paper would be framed in terms of practical applications, of which (quite obviously) there are currently none.

Comment by dxu on Draconarius's Shortform · 2020-05-30T20:01:02.374Z · score: 2 (1 votes) · LW · GW

If the number of guests is countable (which is the usual assumption in Hilbert’s setup), then every guest will only have to travel a finite (albeit unboundedly long) distance before they reach their room.
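
As a minimal worked example (assuming the standard variant where countably many new guests arrive at once): send the guest currently in room $n$ to room $2n$, freeing up all the odd-numbered rooms.

```latex
\[
  f(n) = 2n, \qquad
  d(n) = f(n) - n = n < \infty \ \text{for every fixed } n, \qquad
  \sup_{n} d(n) = \infty .
\]
```

Each individual guest travels only finitely far, but no single bound on the distance works for all guests at once.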

Comment by dxu on GPT-3: a disappointing paper · 2020-05-30T19:31:37.868Z · score: 10 (3 votes) · LW · GW

What do you think that main significance is?

I can’t claim to speak for gwern, but as far as significance goes, Daniel Kokotajlo has already advanced a plausible takeaway. Given that his comment is currently the most highly upvoted comment on this post, I imagine that a substantial fraction of people here share his viewpoint.

Given my past ML experience, this just doesn't sound that promising to me, which may be our disconnect.

I strongly suspect the true disconnect comes a step before this conclusion: namely, that “[your] past ML experience” is all that strongly predictive of performance using new techniques. A smell test: what do you think your past experience would have predicted about the performance of a 175B-parameter model in advance? (And if the answer is that you don’t think you would have had clear predictions, then I don’t see how you can justify this “review” of the paper as anything other than hindsight bias.)

Comment by dxu on AGIs as collectives · 2020-05-29T02:33:46.320Z · score: 2 (1 votes) · LW · GW
  • "There seems to be no reason not to expect that human value functions have similar problems, which even "aligned" AIs could trigger unless they are somehow designed not to." There are plenty of reasons to think that we don't have similar problems - for instance, we're much smarter than the ML systems on which we've seen adversarial examples. Also, there are lots of us, and we keep each other in check.
  • "For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can't keep up, and their value systems no longer apply or give essentially random answers." What does this actually look like? Suppose I'm made the absolute ruler of a whole virtual universe - that's a lot of power. How might my value system "not keep up"?

I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T04:47:27.956Z · score: 2 (1 votes) · LW · GW

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength".

In the post, you write:

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

"Will it happen?" is easy precisely in cases where a development "seems inevitable"; the hard part then becomes forecasting when such a development will occur. The fact that you (and most computer Go experts, in fact) did not do this is a testament to how unpredictable conceptual advances are, and your attempt to reduce it to the mere continuation of a trend is an oversimplification of the highest order.

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

You've made statements about your willingness to bet at non-extreme odds over relatively large chunks of time. This indicates both low confidence and low granularity, which means that there's very little disagreement to be had. (Of course, I don't mean to imply that it's possible to do better; indeed, given the current level of uncertainty surrounding everything to do with AI, about the only way to get me to disagree with you would have been to provide a highly confident, specific prediction.)

Nevertheless, it's an indicator that you do not believe you possess particularly reliable information about future advances in AI, so I remain puzzled that you would present your thesis so strongly at the start. In particular, your claim that the following questions

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news?

depend on

whether or not you were surprised by the development of Alpha-Go

seems to have literally no connection to what you later claim, which is that AlphaGo did not surprise you because you knew something like it had to happen at some point. What is the relevant analogy here to artificial general intelligence? Will artificial general intelligence be "old news" because we suspected from the start that it was possible? If so, what does it mean for something to be "old news" if you have no idea when it will happen, and could not have predicted it would happen at any particular point until after it showed up?

As far as I can tell, reading through both the initial post and the comments, none of these questions have been answered.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T00:36:36.234Z · score: 2 (1 votes) · LW · GW

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

This seems to entirely ignore most (if not all) of the salient implications of AlphaGo's development. What set AlphaGo apart from previous attempts at computer Go was the iterated distillation and amplification scheme employed during its training. This represents a genuine conceptual advance over previous approaches, and to characterize it as simply a continuation of the trend of increasing strength in Go-playing programs only works if you neglect to define said "trend" in any way more specific than "roughly monotonically increasing". And if you do that, you've tossed out any and all information that would make this a useful and non-vacuous observation.

Shortly after this paragraph, you write:

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong. It's not clear to me why you felt this needed mentioning at all, but since you did mention it, I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

Comment by dxu on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:09:51.563Z · score: 3 (2 votes) · LW · GW

If it were common knowledge that any hyperbolic language experts use when speaking about the unlikelihood of AGI (e.g. Andrew Ng's statement "worrying about AI safety is like worrying about overpopulation on Mars") actually corresponded to a 10% subjective probability of AGI, things would look very different than they currently do.

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener. If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

(I also don't believe that said experts, if asked to translate their forecasts to numerical probabilities, would give a median estimate anywhere near as high as 10%, but that's largely tangential to the discussion at hand.)

Furthermore, and more importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

Four years prior to overseeing the world's first nuclear reaction, Fermi believed that it was more likely than not that a nuclear chain reaction was impossible. Setting aside for a moment the question of whether Fermi's specific probability assignment was negligible, or merely small, what this indicates is that the majority of the information necessary to determine the possibility of a nuclear chain reaction was in fact unavailable to Fermi at the time he made his forecast. This does not support the idea that making predictions about technology is easy, any more than it would have if Fermi had assigned 0.001% instead of 10%!

More generally, the specific probability estimate Fermi gave is nothing more than a red herring, one that is given undue attention by the OP. The relevant factor to Ord/Yudkowsky's thesis is how much uncertainty there is in the probability distribution of a given technology--not whether the mean of said distribution, when treated as a point estimate, happens to be negligible or non-negligible. Focusing too much on the latter not only obfuscates the correct lesson to be learned, but also sometimes leads to nonsensical results.

Comment by dxu on Being right isn't enough. Confidence is very important. · 2020-04-07T21:25:51.648Z · score: 1 (2 votes) · LW · GW

The original post wasn’t talking about “correctness”; it was talking about calibration, which is a very specific term with a very specific meaning. Machines one and two are both well-calibrated, but there is nothing requiring that two well-calibrated distributions must perform equally well against each other in a series of bets.

Indeed, this is the very point of the original post, so your comment attempting to contradict it did not, in fact, do so.
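
To make this concrete, here is a small simulation sketch (hypothetical numbers, written purely for illustration): two forecasters are both perfectly calibrated, but one has more information per trial, and it systematically takes money off the other when they bet at the midpoint of their stated probabilities.

```python
import random

random.seed(0)

def simulate(n_trials=100_000):
    profit_b = 0.0
    for _ in range(n_trials):
        # The true per-trial probability alternates between two regimes.
        p_true = random.choice([0.5, 0.9])
        outcome = 1.0 if random.random() < p_true else 0.0

        p_a = 0.7      # Calibrated but uninformative: events A calls "70%"
                       # do happen 70% of the time on average.
        p_b = p_true   # Calibrated *and* informative.

        # They trade a $1 contract at the midpoint of their stated beliefs.
        price = (p_a + p_b) / 2
        if p_b > price:          # B thinks the contract is underpriced: buy.
            profit_b += outcome - price
        elif p_b < price:        # B thinks it is overpriced: sell.
            profit_b += price - outcome
    return profit_b / n_trials

print(f"B's average profit per bet against A: {simulate():.3f}")  # roughly +0.10
```

Forecaster A is well-calibrated, yet it loses steadily, because calibration says nothing about how much information a forecast carries.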

Comment by dxu on Predictors exist: CDT going bonkers... forever · 2020-01-15T22:14:38.523Z · score: 5 (3 votes) · LW · GW

these examples can't actually happen, or are so rare that I'll pay that cost in order to have a simpler model for the other 99.9999% of my decisions

Indeed, if it were true that Newcomb-like situations (or more generally, situations where other agents condition their behavior on predictions of your behavior) do not occur with any appreciable frequency, there would be much less interest in creating a decision theory that addresses such situations.

But far from constituting a mere 0.0001% of possible situations (or some other, similarly minuscule percentage), Newcomb-like situations are simply the norm! Even in everyday human life, we frequently encounter other people and base our decisions off what we expect them to do—indeed, the ability to model others and act based on those models is integral to functioning as part of any social group or community. And it should be noted that humans do not behave as causal decision theory predicts they ought to—we do not betray each other in one-shot prisoner’s dilemmas, we pay people we hire (sometimes) well in advance of them completing their job, etc.

This is not mere “irrationality”; otherwise, there would have been no reason for us to develop these kinds of pro-social instincts in the first place. The observation that CDT is inadequate is fundamentally a combination of (a) the fact that it does not accurately predict certain decisions we make, and (b) the claim that the decisions we make are in some sense correct rather than incorrect—and if CDT disagrees, then so much the worse for CDT. (Specifically, the sense in which our decisions are correct—and CDT is not—is that our decisions result in more expected utility in the long run.)

All it takes for CDT to fail is the presence of predictors. These predictors don’t have to be Omega-style superintelligences—even moderately accurate predictors who perform significantly (but not ridiculously) above random chance can create Newcomb-like elements with which CDT is incapable of coping. I really don’t see any justification at all for the idea that these situations somehow constitute a superminority of possible situations, or (worse yet) that they somehow “cannot” happen. Such a claim seems to be missing the forest for the trees: you don’t need perfect predictors to have these problems show up; the problems show up anyway. The only purpose of using Omega-style perfect predictors is to make our thought experiments clearer (by making things more extreme), but they are by no means necessary.
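
As a quick worked example (using the standard round-number payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one), the predictor only needs to beat chance by a sliver before one-boxing has higher expected value:

```python
# Expected payoffs in Newcomb's problem as a function of predictor accuracy.
# The opaque box contains $1,000,000 iff the predictor predicted one-boxing;
# the transparent box always contains $1,000.

def expected_values(accuracy):
    one_box = accuracy * 1_000_000
    two_box = (1 - accuracy) * 1_000_000 + 1_000
    return one_box, two_box

for acc in (0.50, 0.51, 0.55, 0.60, 0.90):
    one, two = expected_values(acc)
    winner = "one-box" if one > two else "two-box"
    print(f"accuracy {acc:.2f}: one-box ${one:>9,.0f}  two-box ${two:>9,.0f}  -> {winner}")

# One-boxing wins as soon as accuracy exceeds 0.5005,
# which is far short of a perfect Omega-style predictor.
```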

Comment by dxu on Realism about rationality · 2020-01-14T23:35:31.356Z · score: 2 (1 votes) · LW · GW

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.

In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

Comment by dxu on Realism about rationality · 2020-01-13T23:33:17.345Z · score: 2 (1 votes) · LW · GW

Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.

Comment by dxu on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T04:27:05.281Z · score: 23 (10 votes) · LW · GW

Skimming through. May or may not post an in-depth comment later, but for the time being, this stood out to me:

I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.

I note that Yann has not actually specified a way of not "giving [the AI] moronic objectives with no safeguards". The argument of AI risk advocates is precisely that the thing in quotes in the previous sentence is difficult to do, and that people do not have to be "ridiculously stupid" to fail at it--as evidenced by the fact that no one has actually come up with a concrete way of doing it yet. It doesn't look to me like Yann addressed this point anywhere; he seems to be under the impression that repeating his assertion more emphatically (obviously, when we actually get around to building the AI, we'll use our common sense and build it right) somehow constitutes an argument in favor of said assertion. This seems to be an unusually low-quality line of argument from someone who, from what I've seen, is normally much more clear-headed than this.

Comment by dxu on What explanatory power does Kahneman's System 2 possess? · 2019-08-12T17:52:50.266Z · score: 6 (4 votes) · LW · GW

I'm curious as to what prompted this question?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T17:50:52.400Z · score: 4 (2 votes) · LW · GW

I have pointed out what people worry they are going to lose under determinism. Yes, they only going to have those things under nondeterminism.

You just said that nondeterminist intuitions are only mistaken if determinism is true and compatibilism is false. So what exactly is being lost if you subscribe to both determinism and compatibilism?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T00:08:34.105Z · score: 2 (3 votes) · LW · GW

You're mixing levels. If someone can alter their decisions, that implies there are multiple possible next states of the universe

This is incorrect. It's possible to imagine a counterfactual state in which the person in question differs from their actual self in an unspecified manner, which thereby causes them to make a different decision; this counterfactual state differs from reality, but it is by no means incoherent. Furthermore, the comparison of various counterfactual futures of this type is how decision-making works; it is an abstraction used for the purpose of computation, not something ontologically fundamental to the way the universe works--and the fact that some people insist it be the latter is the source of much confusion. This is what I meant when I wrote:

Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here.

So there is no "mixing levels" going on here, as you can see; rather, I am specifically making sure to keep the levels apart, by not tying the mental process of imagining and assessing various potential outcomes to the physical question of whether there are actually multiple physical outcomes. In fact, the one who is mixing levels is you, since you seem to be assuming for some reason that the mental process in question somehow imposes itself onto the laws of physics.

(Here is a thought experiment: I think you will agree that a chess program, if given a chess position and run for a prespecified number of steps, will output a particular move for that position. Do you believe that this fact prevents the chess program from considering other possible moves it might make in the position? If so, how do you explain the fact that the chess program explicitly contains a game tree with multiple branches, the vast majority of which will not in fact occur?)
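
To make the thought experiment concrete, here is a minimal sketch (a toy take-away game rather than real chess, purely for illustration): a deterministic minimax search that explicitly enumerates counterfactual branches, almost none of which will ever be played, and yet always returns the same single move for a given position.

```python
# Toy game, not real chess: players alternate removing 1-3 stones, and
# whoever takes the last stone wins. The search deterministically returns
# one move per position, yet it does so precisely by simulating many
# counterfactual lines of play that will never actually occur.

def minimax(stones, maximizing=True):
    """Return (value, move) from the root player's perspective: +1 = win."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return (-1 if maximizing else 1), None
    best_value, best_move = None, None
    for take in (1, 2, 3):          # counterfactual branches considered
        if take > stones:
            continue
        value, _ = minimax(stones - take, not maximizing)
        if (best_value is None
                or (maximizing and value > best_value)
                or (not maximizing and value < best_value)):
            best_value, best_move = value, take
    return best_value, best_move

if __name__ == "__main__":
    value, move = minimax(7)
    print(f"From 7 stones the program always outputs: take {move} (value {value})")
```

The determinism of the output and the consideration of alternatives are not in tension; the latter is simply how the former gets computed.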

There are various posts in the sequences that directly address this confusion; I suggest either reading them or re-reading them, depending on whether you have already.

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T20:29:55.829Z · score: 2 (1 votes) · LW · GW

But this is map, not territory.

Certainly. Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here. Some people may find the idea of decision-making being anything but a fundamental, ontologically primitive process somehow unsatisfying, or even disturbing, but I submit that this is a problem with their intuitions, not with the underlying viewpoint.

(If someone goes so far as to alter their decisions based on their belief in determinism--say, by lounging on the couch watching TV all day rather than being productive, because their doing so was "predetermined"--I would say that they are failing to utilize their brain's decision-making apparatus. (Or rather, that they are not using it very well.) This has nothing to do with free will, determinism, or anything of the like; it is simply a (causal) consequence of the fact that they have misinterpreted what it means to be an agent in a deterministic universe.)

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T17:23:08.715Z · score: 1 (2 votes) · LW · GW

Belief in determinism is correlated with worse outcomes, but one doesn't cause the other; both are determined by the state and process of the universe.

Read literally, you seem to be suggesting that a deterministic universe doesn't have cause and effect, only correlation. But this reading seems prima facie absurd, unless you're using a very non-standard notion of "cause and effect". Are you arguing, for example, that it's impossible to draw a directed acyclic graph in order to model events in a deterministic universe? If not, what are you arguing?

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T01:13:29.140Z · score: 3 (3 votes) · LW · GW

We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.

I'm afraid this sentence doesn't parse for me. You seem to be speaking of "results" as something which to which the concept of rewards and punishments are applicable. However, I'm not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I've encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there's something else you're referring to when you say "reward or punish the results", I would appreciate it if you clarified what exactly that thing is.

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T00:46:30.824Z · score: 25 (13 votes) · LW · GW

You should neither reward nor punish strategies or attempts at all, but results.

This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don't see why it's obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence--namely, "results"--and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate--namely, "strategies".

I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we "meant" to reward or punish, our rewards/punishments may end up impacting aspects of the agent's behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)

The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence--a result which sometimes goes by a more well-known name. This then suggests that punishing results over strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.

(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations--which are often bottlenecked on cognitive effort--generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart's Law.))

Comment by dxu on Drive-By Low-Effort Criticism · 2019-07-31T18:20:01.220Z · score: 37 (11 votes) · LW · GW

It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value.

Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of "making things that have little or no (or even negative) value" as a primitive action--something that can be "encouraged" or "discouraged"--whereas, on the other hand, it seems to me that the true primitive action here is spending effort in the first place, and that actions taken to disincentivize the former will in fact turn out to disincentivize the latter.

If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be "yes, we ought"), but rather, whether we ought to discourage posters from spending effort. To this, you say:

But there is no virtue in mere effort.

Perhaps there is no "virtue" in effort, but in that case we must ask why "virtue" is the thing we are measuring. If the goal is to maximize, not "virtue", but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it's hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.

And what does it mean to "encourage" or "discourage" a poster? Based on the following part of your comment, it seems that you are taking "discourage" to mean something along the lines of "point out ways in which the post in question is mistaken":

If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).

But how often is it the case that a "long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced" is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations--in other words, that the hypothetical situation you describe does not reflect reality.

What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called "drive-by low-effort criticism" described in the OP. It may be that you disagree with this, but whether it is true or not is a matter of factual accuracy, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.

Comment by dxu on FactorialCode's Shortform · 2019-07-31T00:02:55.131Z · score: 4 (2 votes) · LW · GW

By default on reddit and lesswrong, posts start with 1 karma, coming from the user upvoting themselves.

Actually, on LessWrong, I'm fairly sure the karma value of a particular user's regular vote depends on the user's existing karma score. Users with a decent karma total usually have a default vote value of 2 karma rather than 1, so each comment they post will have 2 karma to start. Users with very high karma totals seem to have a vote that's worth 3 karma by default. Something similar happens with strong votes, though I'm not sure what kind of math is used there.

Aside: I've sometimes thought that users should be allowed to pick a value for their vote that's anywhere between 1 and the value of their strong upvote, instead of being limited to either a regular vote (2 karma in my case) or a strong vote (6 karma). In my case, I literally can't give people karma values of 1, 3, 4, or 5, which could be useful for more granular valuations.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:44:06.908Z · score: 11 (3 votes) · LW · GW

The part about climate science seems like a pretty bog-standard outside view argument, which in turn means I find it largely uncompelling. Yes, there are people who are so stupid, they can only be saved from their own stupidity by executing an epistemic maneuver that works regardless of the intelligence of the person executing it. This does not thereby imply that everyone should execute the same maneuver, including people who are not that stupid, and therefore not in need of saving. If someone out there is so incompetent that they mistakenly perceive themselves as competent, then they are already lost, and the fact that an illegal (from the perspective of normative probability theory) epistemic maneuver exists which would save them if they executed it, does not thereby make that maneuver a normatively good move. (And even if it were, it's not as though the people who would actually benefit from said maneuver are going to execute it--the whole reason that such people are loudly, confidently mistaken is that they don't take the outside view seriously.)

In short: there is simply no principled justification for modesty-based arguments, and--though it may be somewhat impolite to say--I agree with Eliezer that people who find such arguments compelling are actually being influenced by social modesty norms (whether consciously or unconsciously), rather than any kind of normative judgment. Based on various posts that Scott has written in the past, I would venture to say that he may be one of those people.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:24:50.585Z · score: 4 (2 votes) · LW · GW

This is a fictional dialogue demonstrating a meta-level point about how discourse works, and your comment is pretty off-topic.

I think that if a given "meta-level point" has obvious ties to existing object-level discussions, then attempting to suppress the object-level points when they're raised in response is pretty disingenuous. (What I would actually prefer is for the person making the meta-level point to be the same person pointing out the object-level connection, complete with "and here is why I feel this meta-level point is relevant to the object level". If the original poster doesn't do that, then it does indeed make comments on the object-level issues seem "off-topic", a fact which ought to be laid at the feet of the original poster for not making the connection explicit, rather than at the feet of the commenter, who correctly perceived the implications.)

Now, perhaps it's the case that your post actually had nothing to do with the conversations surrounding EA or whatever. (I find this improbable, but that's neither here nor there.) If so, then you as a writer ought to have picked a different example, one with fewer resemblances to the ongoing discussion. (The example Jeff gave in his top-level comment, for example, is not only clearer and more effective at conveying your "meta-level point", but also bears significantly less resemblance to the controversy around EA.) The fact that the example you chose so obviously references existing discussions that multiple commenters pointed it out is evidence that either (a) you intended for that to happen, or (b) you really didn't put a lot of thought into picking a good example.

Comment by dxu on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-24T18:24:52.143Z · score: 2 (1 votes) · LW · GW

In contrast, my model has been that communities congregate around predictable sources of high-quality writing, and people who can produce high-quality content in high volume are very rare. Thus, once Eliezer Yudkowsky stopped being active, and Yvain a.k.a. the immortal Scott Alexander moved to Slate Star Codex (in part so that he could write about politics, which we've traditionally avoided), all the "intellectual energy" followed Scott to SSC.

First, I want to state that I agree with this model. However, I also want to note that the SSC comments section tends to have fairly low-quality discussion (in comparison to the OB/LW 1.0 heyday), and I'm not sure why this is; candidate hypotheses include that Scott's explicit politics attracted people with lower epistemic standards, or that the lack of an explicit karma system allowed low-quality discussion to persist (but I don't think OB had an explicit karma system either?).

Overall, I'm unsure as to what kind of norms/technology maintains high-quality discussion (as opposed to just the presence of discussion in general), and it's plausible to me that the two may actually be somewhat mutually exclusive (in the sense that norms/technology designed to promote the volume of high-quality discussion may in fact reduce the volume of discussion in general). It's not clear to me how this tradeoff should be balanced.

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-22T18:14:46.664Z · score: 2 (1 votes) · LW · GW

A functional duplicate of an entity that reports having such-and-such a quale will report having it even if doesn't.

In that case, there's no reason to think anyone has qualia. The fact that lots of people say they have qualia doesn't actually mean anything, because they'd say so either way; therefore, those people's statements do not constitute valid evidence in favor of the existence of qualia. And if people's statements don't constitute evidence for qualia, then the sum total of evidence for qualia's existence is... nothing: there is zero evidence that qualia exist.

So your interpretation is self-defeating: there is no longer a need to explain qualia, because there's no reason to suppose that they exist in the first place. Why try and explain something that doesn't exist?

On the other hand, it remains an empirical fact that people do actually talk about having "conscious experiences". This talk has nothing to do with "qualia" as you've defined the term, but that doesn't mean it's not worth investigating in its own right, as a scientific question: "What is the physical cause of people's vocal cords emitting the sounds corresponding to the sentence 'I'm conscious of my experience'?" What the generalized anti-zombie principle says is that the answer to this question will in fact explain qualia--not the concept that you described or that David Chalmers endorses (which, again, we have literally zero reason to think exists), but the intuitive concept that led philosophers to coin the term "qualia" in the first place.

Comment by dxu on Rationality is Systematized Winning · 2019-07-22T02:12:33.686Z · score: 11 (4 votes) · LW · GW

You're confusing ends with means, terminal goals with instrumental goals, morality with decision theory, and about a dozen other ways of expressing the same thing. It doesn't matter what you consider "good", because for any fixed definition of "good", there are going to be optimal and suboptimal methods of achieving goodness. Winning is simply the task of identifying and carrying out an optimal, rather than suboptimal, method.

Comment by dxu on Why it feels like everything is a trade-off · 2019-07-18T17:00:18.813Z · score: 5 (3 votes) · LW · GW

Good post. Seems related to (possibly the same concept as) why the tails come apart.

Comment by dxu on The AI Timelines Scam · 2019-07-12T21:55:21.046Z · score: 17 (8 votes) · LW · GW

There are strong prior reasons to think that it's better for the public to have better beliefs about AI strategy.

That may be, but note that the word "prior" is doing basically all of the work in this sentence. (To see this, just replace "AI strategy" with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence--and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it's worth realizing Alice probably has some specific inside view reasons to believe that's the case. And, if those inside view arguments happen to require an understanding of the topic that Alice believes to be dangerous, then Alice's hands are now tied: she's both unable to share information about something, and unable to explain why she can't share that information.

Naturally, this doesn't just make Alice's life more difficult: if you're someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you're forced to resort to just trusting Alice. If you don't have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.

I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being "in the know"--she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I'm particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.

Secondly, however: in addition to being unfair to Alice, there are more subtle effects that such a policy may have. In particular, if Alice feels pressured to disclose the reasons she can't disclose things, that may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F "walls"). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.

Given all of this, I don't think it's obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)

To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It's not clear to me which of these worlds we actually live in, and I don't think you've done a sufficient job of arguing that we live in the former world instead of the latter.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T23:31:36.524Z · score: 1 (2 votes) · LW · GW

would react to a wound but not pass the mirror test

I mean, reacting to a wound doesn't demonstrate that they're actually experiencing pain. If experiencing pain actually requires self-awareness, then an animal could be perfectly capable of avoiding damaging stimuli without actually feeling pain from said stimuli. I'm not saying that's actually how it works, I'm just saying that reacting to wounds doesn't demonstrate what you want it to demonstrate.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T21:31:36.832Z · score: 2 (1 votes) · LW · GW

It can be aware of an experience its' having, even if its' not aware that it is the one having the experience

I strongly suspect this sentence is based on a confused understanding of qualia.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T20:02:19.011Z · score: 2 (1 votes) · LW · GW

these methods lack enough feedback to enable self-awareness

Although I think this is plausibly the case, I'm far from confident that it's actually true. Are there any specific limitations you think play a role here?

Comment by dxu on The AI Timelines Scam · 2019-07-11T17:57:00.619Z · score: 24 (9 votes) · LW · GW

I agree that it's difficult (practically impossible) to engage with a criticism of the form "I don't find your examples compelling", because such a criticism is in some sense opaque: there's very little you can do with the information provided, except possibly add more examples (which is time-consuming, and also might not even work if the additional examples you choose happen to be "uncompelling" in the same way as your original examples).

However, there is a deeper point to be made here: presumably you yourself only arrived at your position after some amount of consideration. The fact that others appear to find your arguments (including any examples you used) uncompelling, then, usually indicates one of two things:

  1. You have not successfully expressed the full chain of reasoning that led you to originally adopt your conclusion (owing perhaps to constraints on time or effort, issues with legibility, or strategic concerns). In this case, you should be unsurprised at the fact that other people don't appear to be convinced by your post, since your post does not present the same arguments/evidence that convinced you yourself to believe your position.

  2. You do, in fact, find the raw examples in your post persuasive. This would then indicate that any disagreement between you and your readers is due to differing priors, i.e., evidence that you would consider sufficient to convince yourself of something does not likewise convince others. Ideally, this fact should cause you to update in favor of the possibility that you are mistaken, at least if you believe that your interlocutors are being rational and intellectually honest.

I don't know which of these two possibilities it actually is, but it may be worth keeping this in mind if you make a post that a bunch of people seem to disagree with.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T23:54:05.616Z · score: 6 (3 votes) · LW · GW

It's a common belief, but it appears to me quite unfounded, since it hasn't happened in millennia of trying. So, a direct observation speaks against this model.

...

It's another common belief, though separate from the belief of reality. It is a belief that this reality is efficiently knowable, a bold prediction that is not supported by evidence and has hints to the contrary from the complexity theory.

...

General Relativity plus the standard model of the particle physics have stood unchanged and unchallenged for decades, the magic numbers they require remaining unexplained since the Higgs mass was predicted a long time ago. While this suggests that, yes, we will probably never stop being surprised by the universe observations, I make no such claims.

I think at this stage we have finally hit upon a point of concrete disagreement. If I'm interpreting you correctly, you seem to be suggesting that because humans have not yet converged on a "Theory of Everything" after millennia of trying, this is evidence against the existence of such a theory.

It seems to me, on the other hand, that our theories have steadily improved over those millennia (in terms of objectively verifiable metrics like their ability to predict the results of increasingly esoteric experiments), and that this is evidence in favor of an eventual theory of everything. That we haven't converged on such a theory yet is simply a consequence, in my view, of the fact that the correct theory is in some sense hard to find. But to postulate that no such theory exists is, I think, not only unsupported by the evidence, but actually contradicted by it--unless you're interpreting the state of scientific progress quite differently than I am.*

That's the argument from empirical evidence, which (hopefully) allows for a more productive disagreement than the relatively abstract subject matter we've discussed so far. However, I think one of those abstract subjects still deserves some attention--in particular, you expressed further confusion about my use of the word "coincidence":

I am still unsure what you mean by coincidence here. The dictionary defines it as "A remarkable concurrence of events or circumstances without apparent causal connection." and that open a whole new can of worms about what "apparent" and "causal" mean in the situation we are describing, and we soon will be back to a circular argument of implying some underlying reality to explain why we need to postulate reality.

I had previously provided a Tabooed version of my statement, but perhaps even that was insufficiently clear. (If so, I apologize.) This time, instead of attempting to make my statement even more abstract, I'll try taking a different tack and making things more concrete:

I don't think that, if our observations really were impossible to model completely accurately, we would be able to achieve the level of predictive success we have. The fact that we have managed to achieve some level of predictive accuracy (not 100%, but some!) strongly suggests to me that our observations are not impossible to model--and I say this for a very simple reason:

How can it be possible to achieve even partial accuracy at predicting something that is purportedly impossible to model? We can't have done it by actually modeling the thing, of course, because by hypothesis the thing cannot be modeled. So our seeming success at predicting the thing must not actually be due to any kind of successful modeling of said thing. Then how is it that our model is producing seemingly accurate predictions? It seems as though we are in a similar position to a lazy student who, upon being presented with a test they didn't study for, is forced to guess the right answers--except that in our case, the student somehow gets lucky enough to choose the correct answer every time, despite the fact that they are merely guessing, rather than working out the answer the way they should.

I think that the word "coincidence" is a decent way of describing the student's situation in this case, even if it doesn't fully accord with your dictionary's definition (after all, whoever said the dictionary editors have the sole power to determine a word's usage?)--and analogously, our model of the thing must also only be making correct predictions by coincidence, since we've ruled out the possibility, a priori, that it might actually be correctly modeling the way the thing works.

I find it implausible that our models are actually behaving this way with respect to the "thing"/the universe, in precisely the same way I would find it implausible that a student who scored 95% on a test had simply guessed on all of the questions. I hope that helps clarify what I meant by "coincidence" in this context.
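To put a rough number on why the "lucky student" reading is so hard to credit, here is a minimal sketch under my own illustrative assumptions (a 100-question test with four answer choices per question; none of these figures come from the discussion itself). It computes the binomial probability of scoring at least 95% by pure guessing:

```python
from math import comb

# Illustrative parameters only (my own assumptions, not from the discussion):
# a 100-question multiple-choice test with 4 options per question.
n_questions = 100
p_guess = 0.25      # chance of guessing any single question correctly
threshold = 95      # score we are asked to explain as "coincidence"

# Binomial tail probability: P(at least `threshold` correct) under pure guessing.
p_at_least = sum(
    comb(n_questions, k) * p_guess**k * (1 - p_guess)**(n_questions - k)
    for k in range(threshold, n_questions + 1)
)

print(f"P(score >= {threshold} by guessing alone) = {p_at_least:.3e}")
```

The resulting probability is astronomically small, which is the quantitative version of the point above: sustained predictive success is overwhelming evidence against the hypothesis that our models are "merely guessing."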


*You did say, of course, that you weren't making any claims or postulates to that effect. But it certainly seems to me that you're not completely agnostic on the issue--after all, your initial claim was "it's models all the way down", and you've fairly consistently stuck to defending that claim throughout not just this thread, but your entire tenure on LW. So I think it's fair to treat you as holding that position, at least for the sake of a discussion like this.

Comment by dxu on Experimental Open Thread April 2019: Socratic method · 2019-07-10T18:15:59.693Z · score: 2 (1 votes) · LW · GW

(Okay, I've been meaning to get back to you on this for a while, but for some reason haven't until now.)

It seems, based on what you're saying, that you're taking "reality" to mean some preferred set of models. If so, then I think I was correct that you and I were using the same term to refer to different concepts. I still have some questions for you regarding your position on "reality" as you understand the term, but I think it may be better to defer those until after I give a basic rundown of my position.

Essentially, my belief in an external reality, if we phrase it in the same terms we've been using (namely, the language of models and predictions), can be summarized as the belief that there is some (reachable) model within our hypothesis space that can perfectly predict further inputs. This can be further repackaged into an empirical prediction: I expect that (barring an existential catastrophe that erases us entirely) there will eventually come a point when we have the "full picture" of physics, such that no further experiments we perform will ever produce a result we find surprising. If we arrive at such a model, I would be comfortable referring to that model as "true", and the phenomena it describes as "reality".

Initially, I took you to be asserting the negation of the above statement--namely, that we will never stop being surprised by the universe, and that our models, though they might asymptotically approach a rate of 100% predictive success, will never quite get there. It is this claim that I find implausible, since it seems to imply that there is no model in our hypothesis space capable of predicting further inputs with 100% accuracy--but if that is the case, why do we currently have a model with >99% predictive accuracy? Is the success of this model a mere coincidence? It must be, since (by assumption) there is no model actually capable of describing the universe. This is what I was gesturing at with the "coincidence" hypothesis I kept mentioning.

Now, perhaps you actually do hold the position described in the above paragraph. (If you do, please let me know.) But based on what you wrote, it doesn't seem necessary for me to assume that you do. Rather, you seem to be saying something along the lines of, "It may be tempting to take our current set of models as describing how reality ultimately is, but in fact we have no way of knowing this for sure, so it's best not to assume anything."

If that's all you're saying, it doesn't necessarily conflict with my view (although I'd suggest that "reality doesn't exist" is a rather poor way to go about expressing this sentiment). Nonetheless, if I'm correct about your position, then I'm curious as to what you think it's useful for. Presumably it doesn't help make any predictions (almost by definition), so I assume you'd say it's useful for dissolving certain kinds of confusion. Any examples, if so?

Comment by dxu on If physics is many-worlds, does ethics matter? · 2019-07-10T17:39:18.162Z · score: 4 (2 votes) · LW · GW

The linked post is the last in a series of posts, the first of which has been linked here in the past. I recommend that anyone who reads the post shminux linked, also read the LW discussion of the post I just linked, as it seems to me that many of the arguments therein are addressed in a more than satisfactory manner. (In particular, I strongly endorse Jessica Taylor's response, which is as of this writing the most highly upvoted comment on that page.)

Comment by dxu on An Increasingly Manipulative Newsfeed · 2019-07-07T17:38:12.343Z · score: 2 (1 votes) · LW · GW
Suppose I have a training set of articles which are labeled "biased" or "unbiased". I then train a system (using this set), and later use it to label articles "biased" or "unbiased". Will this lead to a manipulative system?

Mostly I would expect such a system to overfit on the training data, and perform no better than chance when tested. The reason for this is that, unlike your example, where cats and dogs are (fairly) natural categories with simple distinguishing characteristics, the perception of "bias" in news articles is fundamentally tied to human psychology, and as a result is a much more complicated concept to learn than catness versus dogness. By default I would expect an offline training method to completely fail at learning said concept.

Reinforcement learning, meanwhile, will indeed become manipulative (in my expectation). In a certain sense you can view this as a form of overfitting as well, except that the system learns to exploit peculiarities of the humans performing the classification, rather than simply peculiarities of the articles in its training data. (As you might imagine, the former is far more dangerous.)
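For concreteness, here is a minimal sketch of the offline setup in the quoted question, under my own assumptions (placeholder texts and labels; TF-IDF features plus logistic regression standing in for "a system"). Nothing in it comes from the original exchange; it just makes the overfitting worry tangible, since the model only ever sees a fixed set of labeled articles and never the human judgment that produced the labels:

```python
# A sketch of offline ("supervised") training on hand-labeled articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in corpus: in practice these would be full article texts with
# human-assigned "biased"/"unbiased" labels.
articles = [f"placeholder article text number {i}" for i in range(40)]
labels = ["biased" if i % 2 == 0 else "unbiased" for i in range(40)]

train_X, test_X, train_y, test_y = train_test_split(
    articles, labels, test_size=0.25, random_state=0, stratify=labels
)

# The classifier can only latch onto surface regularities of the labeled
# articles; it has no access to the psychology that generated the labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_X, train_y)

# The held-out score is the check for the overfitting worry: high training
# accuracy paired with near-chance test accuracy means the intended concept
# was not actually learned.
print("train accuracy:", model.score(train_X, train_y))
print("test accuracy: ", model.score(test_X, test_y))
```

The reinforcement-learning variant differs structurally: there the training loop is closed through the human rater's reactions, so whatever exploits the rater's quirks gets reinforced directly, which is where the pressure toward manipulation comes from.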

Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:50:15.693Z · score: 2 (1 votes) · LW · GW
What's an actual mind?

My philosophy of mind is not yet advanced enough to answer this question. (However, the fact that I am unable to answer a question at present does not imply that there is no answer.)

How do you know that a dog has it?

In a certain sense, I don't. However, I am reasonably confident that regardless of whatever actually constitutes having a mind, enough of it is shared between the dog and myself that if the dog turns out not to have a mind, then I also do not have a mind. Since I currently believe I do, in fact, have a mind, it follows that I believe the dog does as well.

(Perhaps you do not believe dogs have minds. In that case, the correct response would be to replace the dog in the thought experiment with something you do believe has a mind--for example, a close friend or family member.)

Would you care about an alien living creature that has a different mind-design and doesn't feel qualia?

Most likely not, though I remain uncertain enough about my own preferences that what I just said could be false.

Anyway, if you have no reason to think that the element is absent, then you'll believe that it's present. It's precisely because you feel that something is (or will be) missing, you refuse the offer. You do have some priors about what consequences will be produced by your choice, and that's OK. Nothing incoherent in refusing the offer. That is, if you do have reasons to believe that that's the case.

I agree with this, but it seems not to square with what you wrote originally:

Do you still think that taking a dollar is the wrong choice, even though literally nothing changes afterwards? If you do, do you think it’s a rational choice? Or is your S1 deluding you?
Comment by dxu on Thoughts on The Replacing Guilt Series⁠ — pt 1 · 2019-07-05T20:24:38.383Z · score: 3 (2 votes) · LW · GW
I used the words “truly real”. The dog doesn’t matter, the consequences of the phenomenon that you call “dog” matter.

Wrong. This misses the point of the thought experiment entirely, which is precisely that people are allowed to care about things that aren't detectable by any empirical test. If someone is being tortured in a spaceship that's constantly accelerating away from me, such that the ship and I cannot interact with each other even in principle, I can nonetheless hold that it would be morally better if that person were rescued from their torture (though I myself obviously can't do the rescuing). There is nothing incoherent about this.

In the case of the dog, what matters to me is the dog's mental state. I do not care that I observe a phenomenon exhibiting dog-like behavior; I care that there is an actual mind producing that behavior. If the dog wags its tail to indicate contentment, I want the dog to actually feel content. If the dog is actually dead and I'm observing a phantom dog, then there is no mental state to which the dog's behavior is tied, and hence a crucial element is missing--even if that element is something I can't detect even in principle, even if I myself have no reason to think the element is absent. There is nothing incoherent about this, either.

Fundamentally, you seem to be demanding that other people toss out any preferences they may have that do not conform to the doctrine of the logical positivists. I see no reason to accede to this demand, and as there is nothing in standard preference theory that forces me to accede, I think I will continue to maintain preferences whose scope includes things that actually exist, and not just things I think exist.