Comment by dxu on Variables Don't Represent The Physical World (And That's OK) · 2021-06-18T03:56:37.588Z · LW · GW

You are misunderstanding the post. There are no "extra bits of information" hiding anywhere in reality; where the "extra bits of information" are lurking is within the implicit assumptions you made when you constructed your model the way you did.

As long as your model is making use of abstractions--that is, using "summary data" to create and work with a lower-dimensional representation of reality than would be obtained by meticulously tracking every variable of relevance--you are implicitly making a choice about what information you are summarizing and how you are summarizing it.

This choice is forced to some extent, in the sense that there are certain ways of summarizing the data that barely simplify computation at all compared to using the "full" model. But even conditioning on a usefully simplifying (natural) abstraction having been selected, there will still be degrees of freedom remaining, and those degrees of freedom are determined by you (the person doing the summarizing). This is where the "extra information" comes from; it's not because of inherent uncertainty in the physical measurements, but because of an unforced choice that was made between multiple abstract models summarizing the same physical measurements.

Of course, in reality you are also dealing with measurement uncertainty. But that's not what the post is about; the thing described in the post happens even if you somehow manage to get your hands on a set of uncertainty-free measurements, because the moment you pick a particular way to carve up those measurements, you induce a (partially) arbitrary abstraction layer on top of the measurements. As the post itself says:

If there’s only a limited number of data points, then this has the same inherent uncertainty as before: sample mean is not distribution mean. But even if there’s an infinite number of data points, there’s still some unresolvable uncertainty: there are points which are boundary-cases between the “tree” cluster and the “apple” cluster, and the distribution-mean depends on how we classify those. There is no physical measurement we can make which will perfectly tell us which things are “trees” or “apples”; this distinction exists only in our model, not in the territory. In turn, the tree-distribution-parameters do not perfectly correspond to any physical things in the territory.

This implies nothing about determinism, physics, or the nature of reality ("illusory" or otherwise).

Comment by dxu on Often, enemies really are innately evil. · 2021-06-07T23:37:49.774Z · LW · GW

This does not strike me as a psychologically realistic model of sadism, and (absent further explanation/justification) counts in my opinion as a rather large strike against mistake theory (or at least, it would if I took as given that a plurality of self-proclaimed "mistake theorists" would in fact endorse the statement you made).

Comment by dxu on What will 2040 probably look like assuming no singularity? · 2021-05-19T06:29:37.292Z · LW · GW

Neither of those seem to me like the right questions to be asking (though for what it's worth the answer to the first question has been pretty clearly "yes" if by "Chinese government" we're referring specifically to post-2001 China).

Having said that, I don't think outside-viewing these scenarios using coarse-grained reference classes like "the set of mid-term goals China has set for itself in the past" leads to anything useful. Well-functioning countries in general (and China in particular) tend to set goals for themselves they view as achievable, so if they're well-calibrated it's necessarily the case that they'll end up achieving (a large proportion of) the goals they set for themselves. This being the case, you don't learn much from finding out China manages to consistently meet its own goals, other than that they've historically done a pretty decent job at assessing their own capabilities. Nor does this allow you to draw conclusions about a specific goal they have, which may be easier or more difficult to achieve than their average goal.

In the case of Taiwan: by default, China is capable of taking Taiwan by force. What I mean by this is that China's maritime capabilities well exceed Taiwan's defensive capacity, such that Taiwan's continued sovereignty in the face of a Chinese invasion is entirely reliant on the threat of external intervention (principally from the United States, but also by allies in the region). Absent that threat, China could invade Taiwan tomorrow and have a roughly ~100% chance of taking the island. Even if allies get involved, there's a non-negligible probability China wins anyway, and the trend going forward only favors China even more.

Of course, that doesn't mean China will invade Taiwan in the near future. As long as its victory isn't assured, it stands to lose substantially more than from a failed invasion than it stands to gain from a successful one. At least for the near future, so long as the United States doesn't send a clear signal about whether it will defend Taiwan, I expect China to mostly play it safe. But there's definitely a growing confidence within China that they'll retake Taiwan eventually, so the prospect of an invasion is almost certainly on the horizon unless current trends w.r.t. the respective strengths of the U.S. and Chinese militaries reverse for some reason. That's not out of the question (the future is unpredictable), but there's also no particular reason to expect said trends to reverse, so assuming they don't, China will almost certainly try to occupy Taiwan at some point, regardless of what stance the U.S. takes on the issue.

(Separately, there's the question of whether the U.S. will take a positive stance; I'm not optimistic that it will, given its historical reluctance to do so, as well as the fact that all of the risks and incentives responsible for said reluctance will likely only increase as time goes on.)

Comment by dxu on Making Vaccine · 2021-02-05T16:37:56.012Z · LW · GW

A simple Google search shows thousands of articles addressing this very solution.

The solution in the paper you link is literally the solution Eliezer described trying, and not working:

As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked.

(Note that the "little lightbox" in question was very likely one of these, which you may notice have mostly ratings of 10,000 lux rather than the 2,500 cited in the paper. So, significantly brighter, and despite that, didn't work.)

It does sound like you misunderstood, in other words. Knowing that light exposure is an effective treatment for SAD is indeed a known solution; this is why Eliezer tried light boxes to begin with. The point of that excerpt is that this "known solution" did not work for his wife, and the obvious next step of scaling up the amount of light used was not investigated in any of the clinical literature.

But taking a step back, the "Chesterton’s Absence of a Fence" argument doesn't apply here because the circumstances are very different. The entire world is desperately looking for a way to stop COVID. If SAD suddenly occurred out of nowhere and affected the entire economy, you would be sure that bright lights would be one of the first things to be tested.

This is simply a (slightly) disguised variation of your original argument. Absent strong reasons to expect to see efficiency, you should not expect to see efficiency. The "entire world desperately looking for a way to stop COVID" led to bungled vaccine distribution, delayed production, supply shortages, the list goes on and on. Empirically, we do not observe anything close to efficiency in this market, and this should be obvious even without the aid of Dentin's list of bullet points (though naturally those bullet points are very helpful).

(Question: did seeing those bullet points cause you to update at all in the direction of this working, or are you sticking with your 1-2% prior? The latter seems fairly indefensible from an epistemic standpoint, I think.)

Not only is the argument above flawed, it's also special pleading with respect to COVID. Here is the analogue of your argument with respect to SAD:

Around 7% of the population has severe Seasonal Affective Disorder, and another 20% or so has weak Seasonal Affective Disorder. Around 50% of tested cases respond to standard lightboxes. So if the intervention of stringing up a hundred LED bulbs actually worked, it could provide a major improvement to the lives of 3% of the US population, costing on the order of $1000 each (without economies of scale). Many of those 9 million US citizens would be rich enough to afford that as a treatment for major winter depression. If you could prove that your system worked, you could create a company to sell SAD-grade lighting systems and have a large market.

SAD is not an uncommon disorder. In terms of QALYs lost, it's... probably not directly comparable with COVID, but it's at the very least in the same ballpark--certainly to the point where "people want to stop COVID, but they don't care about SAD" is clearly false.

And yet, in point of fact, there are no papers describing the unspeakably obvious intervention of "if your lights don't seem to be working, use more lights", nor are there any companies predicated on this idea. If Eliezer had followed your reasoning to its end conclusion, he might not have bothered testing more light... except that his background assumptions did not imply the (again, fairly indefensible, in my view) heuristic that "if no one else is doing it, the only possible explanation is that it must not work, else people are forgoing free money". And as a result, he did try the intervention, and it worked, and (we can assume) his wife's quality of life was improved significantly as a result.

If there's an argument that (a) applies in full generality to anything other people haven't done before, and (b) if applied, would regularly lead people to forgo testing out their ideas (and not due to any object-level concerns, either, e.g. maybe it's a risky idea to test), then I assert that that argument is bad and harmful, and that you should stop reasoning in this manner.

Comment by dxu on Making Vaccine · 2021-02-04T20:24:16.344Z · LW · GW

This is a very in-depth explanation of some of the constraints affecting pharmaceutical companies that (mostly) don't apply to individuals, and is useful as an object-level explanation for those interested. I'm glad this comment was written, and I upvoted accordingly.

Having said that, I would also like to point out that a detailed explanation of the constraints shouldn't be needed to address the argument in the grandparent comment, which simply reads:

Why are established pharmaceutical companies spending billions on research and using complex mRNA vaccines when simply creating some peptides and adding it to a solution works just as well?

This question inherently assumes that the situation with commercial vaccine-makers is efficient with respect to easy, do-it-yourself interventions, and the key point I want to make is that this assumption is unjustified even if you don't happen to have access to a handy list of bullet points detailing the ways in which companies and individuals differ on this front. (Eliezer wrote a whole book on this at one point, from which I'll quote a relevant section:)

My wife has a severe case of Seasonal Affective Disorder. As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked. SAD’s effects were crippling enough for it to be worth our time to consider extreme options, like her spending time in South America during the winter months. And indeed, vacationing in Chile and receiving more exposure to actual sunlight did work, where lightboxes failed.

From my perspective, the obvious next thought was: “Empirically, dinky little lightboxes don’t work. Empirically, the Sun does work. Next step: more light. Fill our house with more lumens than lightboxes provide.” In short order, I had strung up sixty-five 60W-equivalent LED bulbs in the living room, and another sixty-five in her bedroom.

Ah, but should I assume that my civilization is being opportunistic about seeking out ways to cure SAD, and that if putting up 130 LED light bulbs often worked when lightboxes failed, doctors would already know about that? Should the fact that putting up 130 light bulbs isn’t a well-known next step after lightboxes convince me that my bright idea is probably not a good idea, because if it were, everyone would already be doing it? Should I conclude from my inability to find any published studies on the Internet testing this question that there is some fatal flaw in my plan that I’m just not seeing?

We might call this argument “Chesterton’s Absence of a Fence.” The thought being: I shouldn’t build a fence here, because if it were a good idea to have a fence here, someone would already have built it. The underlying question here is: How strongly should I expect that this extremely common medical problem has been thoroughly considered by my civilization, and that there’s nothing new, effective, and unconventional that I can personally improvise?

Eyeballing this question, my off-the-cuff answer—based mostly on the impressions related to me by every friend of mine who has ever dealt with medicine on a research level—is that I wouldn’t necessarily expect any medical researcher ever to have done a formal experiment on the first thought that popped into my mind for treating this extremely common depressive syndrome. Nor would I strongly expect the intervention, if initial tests found it to be effective, to have received enough attention that I could Google it.

The grandparent comment is more or less an exact example of this species of argument, and is the first of its kind that I can recall seeing "in the wild". I think examples of this kind of thinking are all over the place, but it's rare to find a case where somebody explicitly deploys an argument of this type in such a direct, obvious way. So I wanted to draw attention to this, with further emphasis on the idea that such arguments are not valid in general.

The prevalence of this kind of thinking is why (I claim) at-home, do-it-yourself interventions are so uncommon, and why this particular intervention went largely unnoticed even among the rationalist community. It's a failure mode that's easy to slip into, so I think it's important to point these things out explicitly and push back against them when they're spotted (which is the reason I wrote this comment).

IMPORTANT NOTE: This should be obvious enough to anyone who read Inadequate Equilibria, but one thing I'm not saying here is that you should just trust random advice you find online. You should obviously perform an object-level evaluation of the advice, and put substantial effort into investigating potential risks; such an assessment might very well require multiple days' or weeks' worth of work, and end up including such things as the bulleted list in the parent comment. The point is that once you've performed that assessment, it serves no further purpose to question yourself based only on the fact that others aren't doing the thing you're doing; this is what Eliezer would call wasted motion, and it's unproductive at best and harmful at worst. If you find yourself thinking along these lines, you should stop, in particular if you find yourself saying things like this (emphasis mine):

That being said, I'm extremely skeptical that this will work, my belief is that there's a 1-2% chance here that you've effectively immunized yourself from COVID.

You cannot get enough Bayesian evidence from the fact that [insert company here] isn't doing [insert intervention here] to reduce your probability of an intervention being effective all the way down to 1-2%. That 1-2% figure almost certainly didn't come from any attempt at a numerical assessment; rather, it came purely from an abstract intuition that "stuff that isn't officially endorsed doesn't work". This is the kind of thinking that (I assert) should be noticed and stamped out.

Comment by dxu on Syntax, semantics, and symbol grounding, simplified · 2020-12-01T02:29:11.293Z · LW · GW

With regard to GPT-n, I don't think the hurdle is groundedness. Given a sufficiently vast corpus of language, GPT-n will achieve a level of groundedness where it understands language at a human level but lacks the ability to make intelligent extrapolations from that understanding (e.g. invent general relativity), which is rather a different problem.

The claim in the article is that grounding is required for extrapolation, so these two problems are not in fact unrelated. You might compare e.g. the case of a student who has memorized by rote a number of crucial formulas in calculus, but cannot derive those formulas from scratch if asked (and by extension obviously cannot conceive of or prove novel theorems either); this suggests an insufficient level of understanding of the fundamental mathematical underpinnings of calculus, which (if I understood Stuart's post correctly) is a form of "ungroundedness".

Comment by dxu on [Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology · 2020-11-30T20:47:01.160Z · LW · GW

I don't think it's particularly impactful from an X-risk standpoint (at least in terms of first-order consequences), but in terms of timelines I think it represents another update in favor of shorter timelines, in a similar vein to AlphaGo/AlphaZero.

Comment by dxu on Message Length · 2020-10-22T01:44:47.852Z · LW · GW

Since the parameters in your implementation are 32-bit floats, you assign a complexity cost of 32 ⋅ 2^n bits to n-th order Markov chains, and look at the sum of fit (log loss) and complexity.

Something about this feels wrong. The precision of your floats shouldn't be what determines the complexity of your Markov chain; the expressivity of an nth-order Markov chain will almost always be worse than that of a (n+1)th-order Markov chain, even if the latter has access to higher precision floats than the former. Also, in the extreme case where you're working with real numbers, you'd end up with the absurd conclusion that every Markov chain has infinite complexity, which is obviously nonsensical.

This does raise the question of how to assign complexity to Markov chains; it's clearly going to be linear in the number of parameters (and hence exponential in the order of the chain), which means the general form k ⋅ 2^n seems correct... but the value you choose for the coefficient k seems underdetermined.

Comment by dxu on Alignment By Default · 2020-08-16T16:18:57.704Z · LW · GW

I like this post a lot, and I think it points out a key crux between what I would term the "Yudkowsky" side (which seems to mostly include MIRI, though I'm not too sure about individual researchers' views) and "everybody else".

In particular, the disagreement seems to crystallize over the question of whether "human values" really are a natural abstraction. I suspect that if Eliezer thought that they were, he would be substantially less worried about AI alignment than he currently is (though naturally all of this is my read on his views).

You do provide some reasons to think that human values might be a natural abstraction, both in the post itself and in the comments, but I don't see these reasons as particularly compelling ones. The one I view as the most compelling is the argument that humans seems to be fairly good at identifying and using natural abstractions, and therefore any abstract concept that we seem to be capable of grasping fairly quickly has a strong chance of being a natural one.

However, I think there's a key difference between abstractions that are developed for the purposes of prediction, and abstractions developed for other purposes (by which I mostly mean "RL"). To the extent that a predictor doesn't have sufficient computational power to form a low-level model of whatever it's trying to predict, I definitely think that the abstractions it develops in the process of trying to improve its prediction will to a large extent be natural ones. (You lay out the reasons for this clearly enough in the post itself, so I won't repeat them here.)

It seems to me, though, that if we're talking about a learning agent that's actually trying to take actions to accomplish things in some environment, there's a substantial amount of learning going on that has nothing to do with learning to predict things with greater accuracy! The abstractions learned in order to select actions from a given action-space in an attempt to maximize a given reward function--these, I see little reason to expect will be natural. In fact, if the computational power afforded to the agent is good but not excellent, I expect mostly the opposite: a kludge of heuristics and behaviors meant to address different subcases of different situations, with not a whole lot of rhyme or reason to be found.

As agents go, humans are definitely of the latter type. And, therefore, I think the fact that we intuitively grasp the concept of "human values" isn't necessarily an argument that "human values" are likely to be natural, in the way that it would be for e.g. trees. The latter would have been developed as a predictive abstraction, whereas the former seems to mainly consist of what I'll term a reward abstraction. And it's quite plausible to me that reward abstractions are only legible by default to agents which implement that particular reward abstraction, and not otherwise. If that's true, then the fact that humans know what "human values" are is merely a consequence of the fact that we happen to be humans, and therefore have a huge amount of mind-structure in common.

To the extent that this is comparable to the branching pattern of a tree (which is a comparison you make in the post), I would argue that it increases rather than lessens the reason to worry: much like a tree's branch structure is chaotic, messy, and overall high-entropy, I expect human values to look similar, and therefore not really encompass any kind of natural category.

Comment by dxu on The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) · 2020-08-02T23:51:28.965Z · LW · GW

Here's the actual explanation for this:

This seems to have been an excellent exercise in noticing confusion; in particular, to figure this one out properly would have required one to not recognize that this behavior does not accord with one's pre-existing model, rather than simply coming up with an ad hoc explanation to fit the observation.

I therefore award partial marks to Rafael Harth for not proposing any explanations in particular, as well as Viliam in the comments:

I assumed that the GPT's were just generating the next word based on the previous words, one word at a time. Now I am confused.

Zero marks to Andy Jones, unfortunately:

I am fairly confident that Latitude wrap your Dungeon input before submitting it to GPT-3; if you put in the prompt all at once, that'll make for different model input than putting it in one line at a time.

Don't make up explanations! Take a Bayes penalty for your transgressions!

(No one gets full marks, unfortunately, since I didn't see anyone actually come up with the correct explanation.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-26T17:54:02.035Z · LW · GW

For what it's worth, my perception of this thread is the opposite of yours: it seems to me John Wentworth's arguments have been clear, consistent, and easy to follow, whereas you (John Maxwell) have been making very little effort to address his position, instead choosing to repeatedly strawman said position (and also repeatedly attempting to lump in what Wentworth has been saying with what you think other people have said in the past, thereby implicitly asking him to defend whatever you think those other people's positions were).

Whether you've been doing this out of a lack of desire to properly engage, an inability to comprehend the argument itself, or some other odd obstacle is in some sense irrelevant to the object-level fact of what has been happening during this conversation. You've made your frustration with "AI safety people" more than clear over the course of this conversation (and I did advise you not to engage further if that was the case!), but I submit that in this particular case (at least), the entirety of your frustration can be traced back to your own lack of willingness to put forth interpretive labor.

To be clear: I am making this comment in this tone (which I am well aware is unkind) because there are multiple aspects of your behavior in this thread that I find not only logically rude, but ordinarily rude as well. I more or less summarized these aspects in the first paragraph of my comment, but there's one particularly onerous aspect I want to highlight: over the course of this discussion, you've made multiple references to other uninvolved people (either with whom you agree or disagree), without making any effort at all to lay out what those people said or why it's relevant to the current discussion. There are two examples of this from your latest comment alone:

Daniel K agreed with me the other day that there isn't a standard reference for this claim. [Note: your link here is broken; here's a fixed version.]

A MIRI employee openly admitted here that they apply different standards of evidence to claims of safety vs claims of not-safety.

Ignoring the question of whether these two quoted statements are true (note that even the fixed version of the link above goes only to a top-level post, and I don't see any comments on that post from the other day), this is counterproductive for a number of reasons.

Firstly, it's inefficient. If you believe a particular statement is false (and furthermore, that your basis for this belief is sound), you should first attempt to refute that statement directly, which gives your interlocutor the opportunity to either counter your refutation or concede the point, thereby moving the conversation forward. If you instead counter merely by invoking somebody else's opinion, you both increase the difficulty of answering and end up offering weaker evidence.

Secondly, it's irrelevant. John Wentworth does not work at MIRI (neither does Daniel Kokotajlo, for that matter), so bringing up aspects of MIRI's position you dislike does nothing but highlight a potential area where his position differs from MIRI's. (I say "potential" because it's not at all obvious to me that you've been representing MIRI's position accurately.) In order to properly challenge his position, again it becomes more useful to critique his assertions directly rather than round them off to the closest thing said by someone from MIRI.

Thirdly, it's a distraction. When you regularly reference a group of people who aren't present in the actual conversation, repeatedly make mention of your frustration and "grumpiness" with those people, and frequently compare your actual interlocutor's position to what you imagine those people have said, all while your actual interlocutor has said nothing to indicate affiliation with or endorsement of those people, it doesn't paint a picture of an objective critic. To be blunt: it paints a picture of someone with a one-sided grudge against the people in question, and is attempting to inject that grudge into conversations where it shouldn't be present.

I hope future conversations can be more pleasant than this.

Comment by dxu on The Basic Double Crux pattern · 2020-07-23T04:16:58.984Z · LW · GW

I think shminux may have in mind one or more specific topics of contention that he's had to hash out with multiple LWers in the past (myself included), usually to no avail. 

(Admittedly, the one I'm thinking of is deeply, deeply philosophical, to the point where the question "what if I'm wrong about this?" just gets the intuition generator to spew nonsense. But I would say that this is less about an inability to question one's most deeply held beliefs, and more about the fact that there are certain aspects of our world-models that are still confused, and querying them directly may not lead to any new insight.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:54:52.105Z · LW · GW

If it's read moral philosophy, it should have some notion of what the words "human values" mean.

GPT-3 and systems like it are trained to mimic human discourse. Even if (in the limit of arbitrary computational power) it manages to encode an implicit representation of human values somewhere in its internal state, in actual practice there is nothing tying that representation to the phrase "human values", since moral philosophy is written by (confused) humans, and in human-written text the phrase "human values" is not used in the consistent, coherent manner that would be required to infer its use as a label for a fixed concept.

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:48:17.942Z · LW · GW

On "conceding the point":

You said earlier that "The argument for the fragility of value never relied on AI being unable to understand human values." I gave you a quote from Superintelligence which talked about AI being unable to understand human values. Are you gonna, like, concede the point or something?

The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.

On gwern's article:

Anyway, I read Gwern's article a while ago and I thought it was pretty bad. If I recall correctly, Gwern confuses various different notions, for example, he seemed to think that if you replace enough bits of handcrafted software with bits trained using machine learning, an agent will spontaneously emerge.

I'm not sure how to respond to this, except to state that neither this specific claim nor anything particularly close to it appears in the article I linked.

On Tool AI:

Are possible

As far as I'm aware, this point has never been the subject of much dispute.

Are easier to build than Agent AIs

This is still arguable; I have my doubts, but in a "big picture" sense this is largely irrelevant to the greater point, which is:

Will be able to solve the value-loading problem

This is (and remains) the crux. I still don't see how GPT-3 supports this claim! Just as a check that we're on the same page: when you say "value-loading problem", are you referring to something more specific than the general issue of getting an AI to learn and behave according to our values?


META: I can understand that you're frustrated about this topic, especially if it seems to you that the "MIRI-sphere" (as you called it in a different comment) is persistently refusing to acknowledge something that appears obvious to you.

Obviously, I don't agree with that characterization, but in general I don't want to engage in a discussion that one side is finding increasingly unpleasant, especially since that often causes the discussion to rapidly deteriorate in quality after a few replies.

As such, I want to explicitly and openly relieve you of any social obligation you may have felt to reply to this comment. If you feel that your time would be better spent elsewhere, please do!

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T02:43:24.482Z · LW · GW

My claim is that we are likely to see a future GPT-N system which [...] does not "resist attempts to meddle with its motivational system".

Well, yes. This is primarily because GPT-like systems don't have a "motivational system" with which to meddle. This is not a new argument by any means: the concept of AI systems that aren't architecturally goal-oriented by default is known as "Tool AI", and there's plenty of pre-existing discussion on this topic. I'm not sure what you think GPT-3 adds to the discussion that hasn't already been mentioned?

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T22:33:36.283Z · LW · GW

I'm confused by what you're saying.

The argument for the fragility of value never relied on AI being unable to understand human values. Are you claiming it does?

If not, what are you claiming?

Comment by dxu on Coronavirus as a test-run for X-risks · 2020-06-14T00:06:03.697Z · LW · GW

I'd love to see more thought about how the MNM effect might look in an AI scenario. Like you said, maybe denials and assurances followed by freakouts and bans. But maybe we could predict what sorts of events would trigger the shift?

I take it you're presuming slow takeoff in this paragraph, right?

Comment by dxu on Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning · 2020-06-10T16:59:04.237Z · LW · GW

Differing discourse norms; in general, communities that don't expend a constant amount of time-energy into maintaining better-than-average standards of discourse will, by default, regress to the mean. (We saw the same thing happen with LW1.0.)

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T20:06:38.721Z · LW · GW

I'm not seeing how you distinguish between the following two hypotheses:

  1. GPT-3 exhibits mostly flat scaling at the tasks you mention underneath your first bullet point (WiC, MultiRC, etc.) because its architecture is fundamentally unsuited to those tasks, such that increasing the model capacity will lead to little further improvement.
  2. Even 175B parameters isn't sufficient to perform well on certain tasks (given a fixed architecture), but increasing the number of parameters will eventually cause performance on said tasks to undergo a large increase (akin to something like a phase change in physics).

It sounds like you're implicitly taking the first hypothesis as a given (e.g. when you assert that there is a "remaining gap vs. fine-tuning that seems [unlikely] to be closed"), but I see no reason to give this hypothesis preferential treatment!

In fact, it seems to be precisely the assertion of the paper's authors that the first hypothesis should not be taken as a given; and the evidence they give to support this assertion is... the multiple downstream tasks for which an apparent "phase change" did in fact occur. Let's list them out:

  • BoolQ (apparent flatline between 2.6B and 13B, then a sudden jump in performance at 175B)
  • CB (essentially noise between 0.4B and 13B, then a sudden jump in performance at 175B)
  • RTE (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • WSC (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • basic arithmetic (mostly flat until 6.7B, followed by rapid improvement until 175B)
  • SquadV2 (apparent flatline at 0.8B, sudden jump at 1.3B followed by approximately constant rate of improvement until 175B)
  • ANLI round 3 (noise until 13B, sudden jump at 175B)
  • word-scramble with random insertion (sudden increase in rate of improvement after 6.7B)

Several of the above examples exhibit a substantial amount of noise in their performance graphs, but nonetheless, I feel my point stands. Given this, it seems rather odd for you to be claiming that the "great across-task variance" indicates a lack of general reasoning capability when said across-task variance is (if anything) evidence for the opposite, with many tasks that previously stumped smaller models being overcome by GPT-3.

It's especially interesting to me that you would write the following, seemingly without realizing the obvious implication (emphasis mine):

we still see a wide spread of task performance despite smooth gains in LM loss, with some of the most distinctive deficits persisting at all scales (common sense physics, cf section 5), and some very basic capabilities only emerging at very large scale and noisily even there (arithmetic)

The takeaway here is, at least in my mind, quite clear: it's a mistake to evaluate model performance on human terms. Without getting into an extended discussion on whether arithmetic ought to count as a "simple" or "natural" task, empirically transformers do not exhibit a strong affinity for the task. Therefore, the fact that this "basic capability" emerges at all is, or at least should be, strong evidence for generalization capability. As such, the way you use this fact to argue otherwise (both in the section I just quoted and in your original post) seems to me to be exactly backwards.

Elsewhere, you write:

The ability to get better downstream results is utterly unsurprising: it would be very surprising if language prediction grew steadily toward perfection without a corresponding trend toward good performance on NLP benchmarks

It's surprising to me that you would write this while also claiming that few-shot prediction seems unlikely to close the gap to fine-tuned models on certain tasks. I can't think of a coherent model where both of these claims are simultaneously true; if you have one, I'd certainly be interested in hearing what it is.

More generally, this is (again) why I stress the importance of concrete predictions. You call it "utterly unsurprising" that a 175B-param model would outperform smaller ones on NLP benchmarks, and yet neither you nor anyone else could have predicted what the scaling curves for those benchmarks would look like. (Indeed, your entire original post can be read as an expression of surprise at the lack of impressiveness of GPT-3's performance on certain benchmarks.)

When you only ever look at things in hindsight, without ever setting forth concrete predictions that can be overturned by evidence, you run the risk of never forming a model concrete enough to be engaged with. I don't believe it's a coincidence that you called it "difficult" to explain why you found the paper unimpressive: it's because your standards of impressiveness are opaque enough that they don't, in and of themselves, constitute a model of how transformers might/might not possess general reasoning ability.

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T17:49:33.971Z · LW · GW

Also note that a significant number of humans would fail the kind of test you described (inducing the behavior of a novel mathematical operation from a relatively small number of examples), which is why similar tests of inductive reasoning ability show up quite often on IQ tests and the like. It's not the case that failing at that kind of test shows a lack of general reasoning skills, unless we permit that a substantial fraction of humans lack general reasoning skills to at least some extent.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-31T02:12:32.804Z · LW · GW

I don't think the practical value of very new techniques is impossible to estimate. For example, the value of BERT was very clear in the paper that introduced it: it was obvious that this was a strictly better way to do supervised NLP, and it was quickly and widely adopted.

This comparison seems disingenuous. The goal of the BERT paper was to introduce a novel training method for Transformer-based models that measurably outperformed previous training methods. Conversely, the goal of the GPT-3 paper seems to be to investigate the performance of an existing training method when scaled up to previously unreached (and unreachable) model sizes. I would expect you to agree that these are two very different things, surely?

More generally, it seems to me that you've been consistently conflating the practical usefulness of a result with how informative said result is. Earlier, you wrote that "few-shot LM prediction" (not GPT-3 specifically, few-shot prediction in general!) doesn't sound that promising to you because the specific model discussed in the paper doesn't outperform SOTA on all benchmarks, and also requires currently impractical levels of hardware/compute. Setting aside the question of whether this original claim resembles the one you just made in your latest response to me (it doesn't), neither claim addresses what, in my view, are the primary implications of the GPT-3 paper--namely, what it says about the viability of few-shot prediction as model capacity continues to increase.

This, incidentally, is why I issued the "smell test" described in the grandparent, and your answer more or less confirms what I initially suspected: the paper comes across as unsurprising to you because you largely had no concrete predictions to begin with, beyond the trivial prediction that existing trends will persist to some (unknown) degree. (In particular, I didn't see anything in what you wrote that indicates an overall view of how far the capabilities current language models are from human reasoning ability, and what that might imply about where model performance might start flattening with increased scaling.)

Since it doesn't appear that you had any intuitions to begin with about what GPT-3's results might indicate about the scalability of language models in general, it makes sense that your reading of the paper would be framed in terms of practical applications, of which (quite obviously) there are currently none.

Comment by dxu on Draconarius's Shortform · 2020-05-30T20:01:02.374Z · LW · GW

If the number of guests is countable (which is the usual assumption in Hilbert’s setup), then every guest will only have to travel a finite (albeit unboundedly long) distance before they reach their room.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-30T19:31:37.868Z · LW · GW

What do you think that main significance is?

I can’t claim to speak for gwern, but as far as significance goes, Daniel Kokotajlo has already advanced a plausible takeaway. Given that his comment is currently the most highly upvoted comment on this post, I imagine that a substantial fraction of people here share his viewpoint.

Given my past ML experience, this just doesn't sound that promising to me, which may be our disconnect.

I strongly suspect the true disconnect comes a step before this conclusion: namely, that “[your] past ML experience” is all that strongly predictive of performance using new techniques. A smell test: what do you think your past experience would have predicted about the performance of a 175B-parameter model in advance? (And if the answer is that you don’t think you would have had clear predictions, then I don’t see how you can justify this “review” of the paper as anything other than hindsight bias.)

Comment by dxu on AGIs as collectives · 2020-05-29T02:33:46.320Z · LW · GW
  • "There seems to be no reason not to expect that human value functions have similar problems, which even "aligned" AIs could trigger unless they are somehow designed not to." There are plenty of reasons to think that we don't have similar problems - for instance, we're much smarter than the ML systems on which we've seen adversarial examples. Also, there are lots of us, and we keep each other in check.
  • "For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can't keep up, and their value systems no longer apply or give essentially random answers." What does this actually look like? Suppose I'm made the absolute ruler of a whole virtual universe - that's a lot of power. How might my value system "not keep up"?

I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T04:47:27.956Z · LW · GW

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength".

In the post, you write:

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

"Will it happen?" is easy precisely in cases where a development "seems inevitable"; the hard part then becomes forecasting when such a development will occur. The fact that you (and most computer Go experts, in fact) did not do this is a testament to how unpredictable conceptual advances are, and your attempt to reduce it to the mere continuation of a trend is an oversimplification of the highest order.

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

You've made statements about your willingness to bet at non-extreme odds over relatively large chunks of time. This indicates both low confidence and low granularity, which means that there's very little disagreement to be had. (Of course, I don't mean to imply that it's possible to do better; indeed, given the current level of uncertainty surrounding everything to do with AI, about the only way to get me to disagree with you would have been to provide a highly confident, specific prediction.)

Nevertheless, it's an indicator that you do not believe you possess particularly reliable information about future advances in AI, so I remain puzzled that you would present your thesis so strongly at the start. In particular, your claim that the following questions

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news?

depend on

whether or not you were surprised by the development of Alpha-Go

seems to have literally no connection to what you later claim, which is that AlphaGo did not surprise you because you knew something like it had to happen at some point. What is the relevant analogy here to artificial general intelligence? Will artificial general intelligence be "old news" because we suspected from the start that it was possible? If so, what does it mean for something be "old news" if you have no idea when it will happen, and could not have predicted it would happen at any particular point until after it showed up?

As far as I can tell, reading through both the initial post and the comments, none of these questions have been answered.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T00:36:36.234Z · LW · GW

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

This seems to entirely ignore most (if not all) of the salient implications of AlphaGo's development. What set AlphaGo apart from previous attempts at computer Go was the iterated distillation and amplification scheme employed during its training scheme. This represents a genuine conceptual advance over previous approaches, and to characterize it as simply a continuation of the trend of increasing strength in Go-playing programs only works if you neglect to define said "trend" in any way more specific than "roughly monotonically increasing". And if you do that, you've tossed out any and all information that would make this a useful and non-vacuous observation.

Shortly after this paragraph, you write:

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong. It's not clear to me why you feel this necessitated mention at all, but since you did mention it, I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

Comment by dxu on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:09:51.563Z · LW · GW

If it were common knowledge that any hyperbolic language experts use when speaking about the unlikelihood of AGI (e.g. Andrew Ng's statement "worrying about AI safety is like worrying about overpopulation on Mars") actually corresponded to a 10% subjective probability of AGI, things would look very different than they currently do.

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener. If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

(I also don't believe that said experts, if asked to translate their forecasts to numerical probabilities, would give a median estimate anywhere near as high as 10%, but that's largely tangential to the discussion at hand.)

Furthermore, and more importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

Four years prior to overseeing the world's first nuclear reaction, Fermi believed that it was more likely than not that a nuclear chain reaction was impossible. Setting aside for a moment the question of whether Fermi's specific probability assignment was negligible, or merely small, what this indicates is that the majority of the information necessary to determine the possibility of a nuclear chain reaction was in fact unavailable to Fermi at the time he made his forecast. This does not support the idea that making predictions about technology is easy, any more than it would have if Fermi had assigned 0.001% instead of 10%!

More generally, the specific probability estimate Fermi gave is nothing more than a red herring, one that is given undue attention by the OP. The relevant factor to Ord/Yudkowsky's thesis is how much uncertainty there is in the probability distribution of a given technology--not whether the mean of said distribution, when treated as a point estimate, happens to be negligible or non-negligible. Focusing too much on the latter not only obfuscates the correct lesson to be learned, but also sometimes leads to nonsensical results.

Comment by dxu on Being right isn't enough. Confidence is very important. · 2020-04-07T21:25:51.648Z · LW · GW

The original post wasn’t talking about “correctness”; it was talking about calibration, which is a very specific term with a very specific meaning. Machines one and two are both well-calibrated, but there is nothing requiring that two well-calibrated distributions must perform equally well against each other in a series of bets.

Indeed, this is the very point of the original post, so your comment attempting to contradict it did not, in fact, do so.

Comment by dxu on Predictors exist: CDT going bonkers... forever · 2020-01-15T22:14:38.523Z · LW · GW

these examples can't actually happen, or are so rare that I'll pay that cost in order to have a simpler model for the other 99.9999% of my decisions

Indeed, if it were true that Newcomb-like situations (or more generally, situations where other agents condition their behavior on predictions of your behavior) do not occur with any appreciable frequency, there would be much less interest in creating a decision theory that addresses such situations.

But far from constituting a mere 0.0001% of possible situations (or some other, similarly minuscule percentage), Newcomb-like situations are simply the norm! Even in everyday human life, we frequently encounter other people and base our decisions off what we expect them to do—indeed, the ability to model others and act based on those models is integral to functioning as part of any social group or community. And it should be noted that humans do not behave as causal decision theory predicts they ought to—we do not betray each other in one-shot prisoner’s dilemmas, we pay people we hire (sometimes) well in advance of them completing their job, etc.

This is not mere “irrationality”; otherwise, there would have been no reason for us to develop these kinds of pro-social instincts in the first place. The observation that CDT is inadequate is fundamentally a combination of (a) the fact that it does not accurately predict certain decisions we make, and (b) the claim that the decisions we make are in some sense correct rather than incorrect—and if CDT disagrees, then so much the worse for CDT. (Specifically, the sense in which our decisions are correct—and CDT is not—is that our decisions result in more expected utility in the long run.)

All it takes for CDT to fail is the presence of predictors. These predictors don’t have to be Omega-style superintelligences—even moderately accurate predictors who perform significantly (but not ridiculously) above random chance can create Newcomb-like elements with which CDT is incapable of coping. I really don’t see any justification at all for the idea that these situations somehow constitute a superminority of possible situations, or (worse yet) that they somehow “cannot” happen. Such a claim seems to be missing the forest for the trees: you don’t need perfect predictors to have these problems show up; the problems show up anyway. The only purpose of using Omega-style perfect predictors is to make our thought experiments clearer (by making things more extreme), but they are by no means necessary.

Comment by dxu on Realism about rationality · 2020-01-14T23:35:31.356Z · LW · GW

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.

In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

Comment by dxu on Realism about rationality · 2020-01-13T23:33:17.345Z · LW · GW

Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.

Comment by dxu on Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · 2019-10-04T04:27:05.281Z · LW · GW

Skimming through. May or may not post an in-depth comment later, but for the time being, this stood out to me:

I think it would only be relevant in a fantasy world in which people would be smart enough to design super-intelligent machines, yet ridiculously stupid to the point of giving it moronic objectives with no safeguards.

I note that Yann has not actually specified a way of not "giving [the AI] moronic objectives with no safeguards". The argument of AI risk advocates is precisely that the thing in quotes in the previous sentence is difficult to do, and that people do not have to be "ridiculously stupid" to fail at it--as evidenced by the fact that no one has actually come up with a concrete way of doing it yet. It doesn't look to me like Yann addressed this point anywhere; he seems to be under the impression that repeating his assertion more emphatically (obviously, when we actually get around to building the AI, we'll use our common sense and build it right) somehow constitutes an argument in favor of said assertion. This seems to be an unusually low-quality line of argument from someone who, from what I've seen, is normally much more clear-headed than this.

Comment by dxu on What explanatory power does Kahneman's System 2 possess? · 2019-08-12T17:52:50.266Z · LW · GW

I'm curious as to what prompted this question?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T17:50:52.400Z · LW · GW

I have pointed out what people worry they are going to lose under determinism. Yes, they only going to have those things under nondeterminism.

You just said that nondeterminist intuitions are only mistaken if determinism is true and compatibilism is false. So what exactly is being lost if you subscribe to both determinism and compatibilism?

Comment by dxu on Weak foundation of determinism analysis · 2019-08-08T00:08:34.105Z · LW · GW

You're mixing levels. If someone can alter their decisions, that implies there are multiple possible next states of the universe

This is incorrect. It's possible to imagine a counterfactual state in which the person in question differs from their actual self in an unspecified manner, which thereby causes them to make a different decision; this counterfactual state differs from reality, but it is by no means incoherent. Furthermore, the comparison of various counterfactual futures of this type is how decision-making works; it is an abstraction used for the purpose of computation, not something ontologically fundamental to the way the universe works--and the fact that some people insist it be the latter is the source of much confusion. This is what I meant when I wrote:

Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here.

So there is no "mixing levels" going on here, as you can see; rather, I am specifically making sure to keep the levels apart, by not tying the mental process of imagining and assessing various potential outcomes to the physical question of whether there are actually multiple physical outcomes. In fact, the one who is mixing levels is you, since you seem to be assuming for some reason that the mental process in question somehow imposes itself onto the laws of physics.

(Here is a thought experiment: I think you will agree that a chess program, if given a chess position and run for a prespecified number of steps, will output a particular move for that position. Do you believe that this fact prevents the chess program from considering other possible moves it might make in the position? If so, how do you explain the fact that the chess program explicitly contains a game tree with multiple branches, the vast majority of which will not in fact occur?)

There are various posts in the sequences that directly address this confusion; I suggest either reading them or re-reading them, depending on whether you have already.

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T20:29:55.829Z · LW · GW

But this is map, not territory.

Certainly. Decision-making itself is also a process that occurs in the map, not the territory; there is no contradiction here. Some people may find the idea of decision-making being anything but a fundamental, ontologically primitive process somehow unsatisfying, or even disturbing, but I submit that this is a problem with their intuitions, not with the underlying viewpoint.

(If someone goes so far as to alter their decisions based on their belief in determinism--say, by lounging on the couch watching TV all day rather than being productive, because their doing so was "predetermined"--I would say that they are failing to utilize their brain's decision-making apparatus. (Or rather, that they are not using it very well.) This has nothing to do with free will, determinism, or anything of the like; it is simply a (causal) consequence of the fact that they have misinterpreted what it means to be an agent in a deterministic universe.)

Comment by dxu on Weak foundation of determinism analysis · 2019-08-07T17:23:08.715Z · LW · GW

Belief in determinism is correlated with worse outcomes, but one doesn't cause the other; both are determined by the state and process of the universe.

Read literally, you seem to be suggesting that a deterministic universe doesn't have cause and effect, only correlation. But this reading seems prima facie absurd, unless you're using a very non-standard notion of "cause and effect". Are you arguing, for example, that it's impossible to draw a directed acyclic graph in order to model events in a deterministic universe? If not, what are you arguing?

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T01:13:29.140Z · LW · GW

We didn’t (or rather, shouldn’t) intend to reward or punish those “ancestor nodes”. We should intend to reward or punish the results.

I'm afraid this sentence doesn't parse for me. You seem to be speaking of "results" as something which to which the concept of rewards and punishments are applicable. However, I'm not aware of any context in which this is a meaningful (rather than nonsensical) thing to say. All theories of behavior I've encountered that make mention of the concept of rewards and punishments (e.g. operant conditioning) refer to them as a means of influencing behavior. If there's something else you're referring to when you say "reward or punish the results", I would appreciate it if you clarified what exactly that thing is.

Comment by dxu on Drive-By Low-Effort Criticism · 2019-08-02T00:46:30.824Z · LW · GW

You should neither reward nor punish strategies or attempts at all, but results.

This statement is presented in a way that suggests the reader ought to find it obvious, but in fact I don't see why it's obvious at all. If we take the quoted statement at face value, it appears to be suggesting that we apply our rewards and punishments (whatever they may be) to something which is causally distant from the agent whose behavior we are trying to influence--namely, "results"--and, moreover, that this approach is superior to the approach of applying those same rewards/punishments to something which is causally immediate--namely, "strategies".

I see no reason this should be the case, however! Indeed, it seems to me that the opposite is true: if the rewards and punishments for a given agent are applied based on a causal node which is separated from the agent by multiple causal links, then there is a greater number of ancestor nodes that said rewards/punishments must propagate through before reaching the agent itself. The consequences of this are twofold: firstly, the impact of the reward/punishment is diluted, since it must be divided among a greater number of potential ancestor nodes. And secondly, because the agent has no way to identify which of these ancestor nodes we "meant" to reward or punish, our rewards/punishments may end up impacting aspects of the agent's behavior we did not intend to influence, sometimes in ways that go against what we would prefer. (Moreover, the probability of such a thing occurring increases drastically as the thing we reward/punish becomes further separated from the agent itself.)

The takeaway from this, of course, is that strategically rewarding and punishing things grows less effective as the proxy on which said rewards and punishments are based grows further from the thing we are trying to influence--a result which sometimes goes by a more well-known name. This then suggests that punishing results over strategies, far from being a superior approach, is actually inferior: it has lower chances of influencing behavior we would like to influence, and higher chances of influencing behavior we would not like to influence.

(There are, of course, benefits as well as costs to rewarding and punishing results (rather than strategies). The most obvious benefit is that it is far easier for the party doing the rewarding and punishing: very little cognitive effort is required to assess whether a given result is positive or negative, in stark contrast to the large amounts of effort necessary to decide whether a given strategy has positive or negative expectation. This is why, for example, large corporations--which are often bottlenecked on cognitive effort--generally reward and punish their employees on the basis of easily measurable metrics. But, of course, this is a far cry from claiming that such an approach is simply superior to the alternative. (It is also why large corporations so often fall prey to Goodhart's Law.))

Comment by dxu on Drive-By Low-Effort Criticism · 2019-07-31T18:20:01.220Z · LW · GW

It is good to discourage people from spending a lot of effort on making things that have little or no (or even negative) value.

Would you care to distinguish a means of discouraging people from spending effort on low-value things, from a means that simply discourages people from spending effort in general? It seems to me that here you are taking the concept of "making things that have little or no (or even negative) value" as a primitive action--something that can be "encouraged" or "discouraged"--whereas, on the other hand, it seems to me that the true primitive action here is spending effort in the first place, and that actions taken to disincentivize the former, will in fact turn out to disincentivize the latter.

If this is in fact the case, then the question is not so simple as whether we ought to discourage posters from spending effort on making incorrect posts (to which the answer would of course be "yes, we ought"), but rather, whether we ought to discourage posters from spending effort. To this, you say:

But there is no virtue in mere effort.

Perhaps there is no "virtue" in effort, but in that case we must ask why "virtue" is the thing we are measuring. If the goal is to maximize, not "virtue", but high-quality posts, then I submit that (all else being equal) having more high-effort posts is more likely to accomplish this than having fewer high-effort posts. Unless your contention is that all else is not equal (perhaps high-effort posts are more likely to contain muddled thinking, and hence more likely to have incorrect conclusions? but it's hard to see why this should be the case a priori), then it seems to me that encouraging posters to put large amounts of effort into their posts is simply a better course of action than discouraging them.

And what does it mean to "encourage" or "discourage" a poster? Based on the following part of your comment, it seems that you are taking "discourage" to mean something along the lines of "point out ways in which the post in question is mistaken":

If I post a long, in-depth analysis, which is lovingly illustrated, meticulously referenced, and wrong, and you respond with a one-line comment that points out the way in which my post was wrong, then I have done poorly (and my post ought to be downvoted), while you have done well (and your comment ought to be upvoted).

But how often is it the case that a "long, in-depth analysis, which is lovingly illustrated [and] meticulously referenced" is, not only wrong, but so obviously wrong that the mistake can be pointed out via a simple one-liner? I claim that this so rarely occurs that it should play a negligible role in our considerations--in other words, that the hypothetical situation you describe does not reflect reality.

What occurs more often, I think, is that a commenter finds themselves mistakenly under the impression that they have spotted an obvious error, and then proceeds to post (what they believe to be) an obvious refutation. I further claim that such cases are disproportionately responsible for the so-called "drive-by low-effort criticism" described in the OP. It may be that you disagree with this, but whether it is true or not is in a matter of factual accuracy, not opinion. However, if one happens to believe it is true, then it should not be difficult to understand why one might prefer to see less of the described behavior.

Comment by dxu on FactorialCode's Shortform · 2019-07-31T00:02:55.131Z · LW · GW

By default on reddit and lesswrong, posts start with 1 karma, coming from the user upvoting themselves.

Actually, on LessWrong, I'm fairly sure the karma value of a particular user's regular vote depends on the user's existing karma score. Users with a decent karma total usually have a default vote value of 2 karma rather than 1, so each comment they post will have 2 karma to start. Users with very high karma totals seem to have a vote that's worth 3 karma by default. Something similar happens with strong votes, though I'm not sure what kind of math is used there.

Aside: I've sometimes thought that users should be allowed to pick a value for their vote that's anywhere between 1 and the value of their strong upvote, instead of being limited to either a regular vote (2 karma in my case) or a strong vote (6 karma). In my case, I literally can't give people karma values of 1, 3, 4, or 5, which could be useful for more granular valuations.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:44:06.908Z · LW · GW

The part about climate science seems like a pretty bog-standard outside view argument, which in turn means I find it largely uncompelling. Yes, there are people who are so stupid, they can only be saved from their own stupidity by executing an epistemic maneuver that works regardless of the intelligence of the person executing it. This does not thereby imply that everyone should execute the same maneuver, including people who are not that stupid, and therefore not in need of saving. If someone out there is so incompetent that they mistakenly perceive themselves as competent, then they are already lost, and the fact that an illegal (from the perspective of normative probability theory) epistemic maneuver exists which would save them if they executed it, does not thereby make that maneuver a normatively good move. (And even if it were, it's not as though the people who would actually benefit from said maneuver are going to execute it--the whole reason that such people are loudly, confidently mistaken is that they don't take the outside view seriously.)

In short: there is simply no principled justification for modesty-based arguments, and--though it may be somewhat impolite to say--I agree with Eliezer that people who find such arguments compelling are actually being influenced by social modesty norms (whether consciously or unconsciously), rather than any kind of normative judgment. Based on various posts that Scott has written in the past, I would venture to say that he may be one of those people.

Comment by dxu on Dialogue on Appeals to Consequences · 2019-07-25T17:24:50.585Z · LW · GW

This is a fictional dialogue demonstrating a meta-level point about how discourse works, and your comment is pretty off-topic.

I think that if a given "meta-level point" has obvious ties to existing object-level discussions, then attempting to suppress the object-level points when they're raised in response is pretty disingenuous. (What I would actually prefer is for the person making the meta-level point to be the same person pointing out the object-level connection, complete with "and here is why I feel this meta-level point is relevant to the object level". If the original poster doesn't do that, then it does indeed make comments on the object-level issues seem "off-topic", a fact which ought to be laid at the feet of the original poster for not making the connection explicit, rather than at the feet of the commenter, who correctly perceived the implications.)

Now, perhaps it's the case that your post actually had nothing to do with the conversations surrounding EA or whatever. (I find this improbable, but that's neither here nor there.) If so, then you as a writer ought to have picked a different example, one with fewer resemblances to the ongoing discussion. (The example Jeff gave in his top-level comment, for example, is not only clearer and more effective at conveying your "meta-level point", but also bears significantly less resemblance to the controversy around EA.) The fact that the example you chose so obviously references existing discussions that multiple commenters pointed it out is evidence that either (a) you intended for that to happen, or (b) you really didn't put a lot of thought into picking a good example.

Comment by dxu on Appeal to Consequence, Value Tensions, And Robust Organizations · 2019-07-24T18:24:52.143Z · LW · GW

In contrast, my model has been that communities congregate around predictable sources of high-quality writing, and people who can produce high-quality content in high volume are very rare. Thus, once Eliezer Yudkowsky stopped being active, and Yvain a.k.a. the immortal Scott Alexander moved to Slate Star Codex (in part so that he could write about politics, which we've traditionally avoided), all the "intellectual energy" followed Scott to SSC.

First, I want to state that I agree with this model. However, I also want to note that the SSC comments section tend to have fairly low-quality discussion (in comparison to the OB/LW 1.0 heyday), and I'm not sure why this is; candidate hypotheses include that Scott's explicit politics attracted people with lower epistemic standards, or that the lack of an explicit karma system allowed low-quality discussion to persist (but I don't think OB had an explicit karma system either?).

Overall, I'm unsure as to what kind of norms/technology maintains high-quality discussion (as opposed to just the presence of discussion in general), and it's plausible to me that the two may actually be somewhat mutually exclusive (in the sense that norms/technology designed to promote the volume of high-quality discussion may in fact reduce the volume of discussion in general). It's not clear to me how this tradeoff should be balanced.

Comment by dxu on [deleted post] 2019-07-22T18:14:46.664Z

A functional duplicate of an entity that reports having such-and-such a quale will report having it even if doesn't.

In that case, there's no reason to think anyone has qualia. The fact that lots of people say they have qualia, doesn't actually mean anything, because they'd say so either way; therefore, those people's statements do not constitute valid evidence in favor of the existence of qualia. And if people's statements don't constitute evidence for qualia, then the sum total of evidence for qualia's existence is... nothing: there is zero evidence that qualia exist.

So your interpretation is self-defeating: there is no longer a need to explain qualia, because there's no reason to suppose that they exist in the first place. Why try and explain something that doesn't exist?

On the other hand, it remains an empirical fact that people do actually talk about having "conscious experiences". This talk has nothing to do with "qualia" as you've defined the term, but that doesn't mean it's not worth investigating in its own right, as a scientific question: "What is the physical cause of people's vocal cords emitting the sounds corresponding to the sentence 'I'm conscious of my experience'?" What the generalized anti-zombie principle says is that the answer to this question, will in fact explain qualia--not the concept that you described or that David Chalmers endorses (which, again, we have literally zero reason to think exists), but the intuitive concept that led philosophers to coin the term "qualia" in the first place.

Comment by dxu on Rationality is Systematized Winning · 2019-07-22T02:12:33.686Z · LW · GW

You're confusing ends with means, terminal goals with instrumental goals, morality with decision theory, and about a dozen other ways of expressing the same thing. It doesn't matter what you consider "good", because for any fixed definition of "good", there are going to be optimal and suboptimal methods of achieving goodness. Winning is simply the task of identifying and carrying out an optimal, rather than suboptimal, method.

Comment by dxu on Why it feels like everything is a trade-off · 2019-07-18T17:00:18.813Z · LW · GW

Good post. Seems related to (possibly the same concept as) why the tails come apart.

Comment by dxu on The AI Timelines Scam · 2019-07-12T21:55:21.046Z · LW · GW

There are strong prior reasons to think that it's better for the public to have better beliefs about AI strategy.

That may be, but note that the word "prior" is doing basically all of the work in this sentence. (To see this, just replace "AI strategy" with practically any other subject, and notice how the modified statement sounds just as sensible as the original.) This is important because priors can easily be overwhelmed by additional evidence--and insofar as AI researcher Alice thinks a specific discussion topic in AI strategy has the potential to be dangerous, it's worth realizing Alice probably has some specific inside view reasons to believe that's the case. And, if those inside view arguments happen to require an understanding of the topic that Alice believes to be dangerous, then Alice's hands are now tied: she's both unable to share information about something, and unable to explain why she can't share that information.

Naturally, this doesn't just make Alice's life more difficult: if you're someone on the outside looking in, then you have no way of confirming if anything Alice says is true, and you're forced to resort to just trusting Alice. If you don't have a whole lot of trust in Alice to begin with, you might assume the worst of her: Alice is either rationalizing or lying (or possibly both) in order to gain status for herself and the field she works in.

I think, however, that these are dangerous assumptions to make. Firstly, if Alice is being honest and rational, then this policy effectively punishes her for being "in the know"--she must either divulge information she (correctly) believes to be dangerous, or else suffer an undeserved reputational hit. I'm particularly wary of imposing incentive structures of this kind around AI safety research, especially considering the relatively small number of people working on AI safety to begin with.

Secondly, however: in addition to being unfair to Alice, there are more subtle effects that such a policy may have. In particular, if Alice feels pressured to disclose the reasons she can't disclose things, that may end up influencing the rate and/or quality of the research she does in the first place (Ctrl+F "walls"). This could have serious consequences down the line for AI safety research, above and beyond the object-level hazards of revealing potentially dangerous ideas to the public.

Given all of this, I don't think it's obvious that the best move at this point involves making all of the strategic arguments around AI safety public. (And note that I say this as a member of said public: I am not affiliated with MIRI or any other AI safety institution, nor am I personally acquainted with anyone who is so affiliated. This therefore makes me a direct counter-example to your claim about the public in general having reason to think secret-keeping organizations must be doing so for self-interested reasons.)

To be clear: I think there is a possible world in which your arguments make sense. I also think there is a possible world in which your arguments not only do not make sense, but would lead to a clearly worse outcome if taken seriously. It's not clear to me which of these worlds we actually live in, and I don't think you've done a sufficient job of arguing that we live in the former world instead of the latter.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T23:31:36.524Z · LW · GW

would react to a wound but not pass the mirror test

I mean, reacting to a wound doesn't demonstrate that they're actually experiencing pain. If experiencing pain actually requires self-awareness, then an animal could be perfectly capable of avoiding damaging stimuli without actually feeling pain from said stimuli. I'm not saying that's actually how it works, I'm just saying that reacting to wounds doesn't demonstrate what you want it to demonstrate.

Comment by dxu on Are we certain that gpt-2 and similar algorithms are not self-aware? · 2019-07-11T21:31:36.832Z · LW · GW

It can be aware of an experience its' having, even if its' not aware that it is the one having the experience

I strongly suspect this sentence is based on a confused understanding of qualia.