Decision theory question: Is it possible to acausally blackmail the universe into telling you how time travel works? 2021-09-06T23:52:44.563Z


Comment by dxu on Rafael Harth's Shortform · 2021-09-03T19:35:17.177Z · LW · GW

I realize you're not exactly saying it outright, but some parts of your comment seem to be gesturing at the idea that smart people should adopt a "modesty norm" among themselves. I think this is a very bad idea for reasons EY already articulated, so I'd just like to clarify whether this is what you believe?

Comment by dxu on Grokking the Intentional Stance · 2021-09-01T21:43:51.311Z · LW · GW

If "fact" means "component of reality, whether know or not", it does not follow...but that is irrelevant, since I did not deny the existence of some kind of reality.

Well, good! It's heartening to see we agree on this; I would ask then why it is that so many subscribers to epistemological minimalism (or some variant thereof) seem to enjoy phrasing their claims in such a way as to sound as though they are denying the existence of external reality; but I recognize that this question is not necessarily yours to answer, since you may not be one of those people.

I repeat the second of my two arguments: to build an accurate model of reality requires taking some assumptions to be foundational;

In which case, I will repeat that the only testable accuracy we have is predictive accuracy, and we do not know whether our ontological claims are accurate, because we have no direct test.

For predictive accuracy and "ontological accuracy" to fail to correspond [for some finite period of time] would require the universe to possess some very interesting structure; the longer the failure of correspondence persists, the more complex the structure in question must be; if (by hypothesis) the failure of correspondence persists indefinitely, the structure in question must be uncomputable.

Is it your belief that one of the above possibilities is the case? If so, what is your reason for this belief, and how does it contend with the (rather significant) problem that the postulated complexity must grow exponentially in the amount of time it takes for the "best" (most predictive) model to line up with the "true" model?

[The above argument seems to address the majority of what I would characterize as your "true" rejection; your comment contained other responses to me concerning sense data, natural selection, the reliability of animal senses, etc. but those seem to me mostly like minutiae unrelated to your main point. If you believe I'm mistaken about this, let me know which of those points you would like a specific response to; in the interim, however, I'm going to ignore them and jump straight to the points I think are relevant.]

The problem is that, while simplicity criteria allow you to select models, you need to know that they are selecting models that are more likely to correspond to reality, rather than on some other basis. SI fares particularly badly, because there is no obvious reason why a short programme should be true, or even that it is a description of reality at all.

The simplicity criterion does not come out of nowhere; it arises from the fact that description complexity is bounded below, but unbounded above. In other words, you can make a hypothesis as complex as you like, adding additional epicycles such that the description complexity of your hypothesis increases without bound; but you cannot decrease the complexity of your hypothesis without bound, since for any choice of computational model there exists a minimally complex hypothesis with description length 0, beneath which no simpler hypotheses exist.

This means that for any hypothesis in your ensemble—any computable [way-that-things-could-be]—there are only finitely many hypotheses with complexity less than that of the hypothesis in question, but infinitely many hypotheses with complexity equal or greater. It follows that for any ordering whatsoever on your hypothesis space, there will exist some number n such that the complexity of the kth hypothesis H_k exceeds some fixed complexity C for any k > n... the upshot of which is that every possible ordering of your hypothesis space corresponds, in the limit, to a simplicity prior.

Does it then follow that the universe we live in must be a simple one? Of course not—but as long as the universe is computable, the hypothesis corresponding to the "true" model of the universe will live only finitely far down our list—and each additional bit of evidence we receive will, on average, halve that distance. This is what I meant when I said that the (postulated) complexity of the universe must grow exponentially in the amount of time any "correspondence failure" can persist: each additional millisecond (or however long it takes to receive one bit of evidence) that the "correspondence failure" persists corresponds to a doubling of the true hypothesis' position number in our list.
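To make the halving claim concrete, here is a toy sketch (my own illustration, not Solomonoff induction proper): take the hypothesis space to be all 12-bit strings, pick one arbitrarily as the "true world", and count how many hypotheses remain consistent as its bits are observed one at a time. The particular strings and lengths below are illustrative assumptions, nothing more.

```python
# Toy illustration: each observed bit of evidence halves the set of
# hypotheses still consistent with observation.
from itertools import product

L = 12
# Every 12-bit string is a hypothesis; 2^12 = 4096 of them in total.
hypotheses = ["".join(bits) for bits in product("01", repeat=L)]
true_world = "101100111010"  # arbitrary choice of "true" hypothesis

consistent = hypotheses
for k in range(1, 6):
    observed = true_world[:k]  # the first k bits of evidence
    consistent = [h for h in consistent if h.startswith(observed)]
    print(k, len(consistent))  # 2048, 1024, 512, 256, 128
```

Each observation rules out exactly half the remaining candidates, so the "position" of the true hypothesis in any enumeration that respects consistency shrinks exponentially in the number of bits received.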

So the universe need not be simple a priori for Solomonoff induction to work. All that is required is that the true description complexity of the universe does not exceed 2^b, where b represents the sum total of all knowledge we have accumulated thus far, in bits. That this is a truly gigantic number goes without saying; and if you wish to defend the notion that the "true" model of the universe boasts a complexity in excess of this value, you had better be prepared to come up with some truly extraordinary evidence.

For you to deny this would require that you claim the universe is not describable by any computable process

I see no strength to that claim at all. The universe is partly predictable by computational processes, and that's all, for all we know.

It is the claim that a programme is ipso facto a description that is extraordinary.

This is the final alternative: the claim, not that the universe's true description complexity is some large but finite value, but that it is actually infinite, i.e. that the universe is uncomputable.

I earlier (in my previous comment) said that "I should like to see" you defend this claim, but of course this was rhetorical; you cannot defend this claim, because no finite amount of evidence you could bring to bear would suffice to establish anything close. The only option, therefore, is for you to attempt to flip the burden of proof, claiming that the universe should be assumed uncomputable by default; and indeed, this is exactly what you did: "It is the claim that a program is ipso facto a description that is extraordinary."

But of course, this doesn't work. "The claim that a program can function as a description" is not an extraordinary claim at all; it is merely a restatement of how programs work: they take some input, perform some internal manipulations on that input, and produce an output. If the input in question happens to be the observation history of some observer, then it is entirely natural to treat the output of the program as a prediction of the next observation; there is nothing extraordinary about this at all!

So the attempted reversal of the burden of proof fails; the "extraordinary" claim remains the claim that the universe cannot be described by any possible program, regardless of length, and the burden of justifying such an impossible-to-justify claim is, thankfully, not my problem.


Comment by dxu on Grokking the Intentional Stance · 2021-09-01T18:59:46.027Z · LW · GW

It's not been mentioned enough, since the point has not generally sunk in.

I find this response particularly ironic, given that I will now proceed to answer almost every one of your points simply by reiterating one of the two arguments I provided above. (Perhaps it's generally a good idea to make sure the point of someone's comment has "sunk in" before replying to them.)

Pragmatism isn't a sufficient answer, because it can show that we are accumulating certain kinds of knowledge, namely the ability to predict things and make things, but does not show that we are accumulating other kinds, specifically ontological knowledge, i.e. successful correspondence to reality.

Suppose this is true (i.e. suppose we have no means of accumulating "ontological knowledge"). I repeat the first of my two arguments: by what mechanism does this thereby imply that no ontological facts of any kind exist? Is it not possible both that (a) the sky exists and has a color, and (b) I don't know about it? If you claim this is not possible, I should like to see you defend this very strong positive claim; conversely, if you do not make such a claim, the idea that the "problem of the criterion" has any ontological implications whatsoever is immediately dismissed.

The thing scientific realists care about is having an accurate model of reality, knowing what things are. If you want that, then instrumentalism is giving up something of value to you. So long as it's possible. If realistic knowledge is impossible, then there's no loss of value.

I repeat the second of my two arguments: to build an accurate model of reality requires taking some assumptions to be foundational; mathematicians might call these "axioms", whereas Bayesians might call them "the prior". As long as you have such a foundation, it is possible to build models that are at least as trustworthy as the foundation itself; the limiting factor, therefore, on the accumulation of scientific knowledge—or, indeed, any other kind of knowledge—is the reliability of our foundations.

And what are our foundations? They are the sense and reasoning organs provided to us by natural selection; to the extent that they are trustworthy, the theories and edifices we build atop them will be similarly trustworthy. (Assuming, of course, that we do not make any mistakes in our construction.)

So the "problem of the criterion" reduces to the question of how reliable natural selection is at building organisms with trustworthy senses; to this question I answer "very reliable indeed." Should you claim otherwise, I should like to see you defend this very strong positive claim; if you do not claim otherwise, then the "problem of the criterion" immediately ceases to exist.

Far from having no ontological implications, the problem of the criterion has mainly ontological implications, since the pragmatic response works in other areas.

I repeat the first of my two arguments: what ontological implications, and why? I should like to see you defend the (very strong) positive claim that such implications exist; or, alternatively, relinquish the notion that they do.

Of course, it isn’t a coincidence: the reason we can get away with trusting our senses is because our senses actually are trustworthy

Trustworthy or reliable at what?

Per my second argument: at doing whatever they need to do in order for us not to have been selected out of existence—in other words, at providing an effective correspondence between our beliefs and reality. (Why yes, this is the thing the "problem of the criterion" claims to be impossible; why yes, this philosophical rigmarole does seem to have had precisely zero impact on evolution's ability to build such organisms.)

Should you deny that evolution has successfully built organisms with trustworthy senses, I should like to see you defend this very strong positive claim, etc. etc.

You cannot ascertain an ontologically correct model of reality just by looking at things. A model is a theoretical structure. Multiple models can be compatible with the same sense data, so a further criterion is needed. Of course, you can still do predictive, instrumental stuff with empiricism.

The problem of selecting between multiple compatible models is not something I often see packaged with the "problem of the criterion" and others of its genre; it lacks the ladder of infinite descent that those interested in the genre seem to find so attractive, and so is generally omitted from such discussions. But since you bring it up: there is, of course, a principled way to resolve questions of this type as well; the heuristic version (which humans actually implement) is called Occam's razor, whereas the ideal version is called Solomonoff induction.

This is an immensely powerful theoretical tool, mind you: since Solomonoff induction contains (by definition) every computable hypothesis, that means that every possible [way-that-things-could-be] is contained somewhere in its hypothesis space, including what you refer to as "the ontologically correct model of reality"; moreover, one of the theoretical guarantees of Solomonoff induction is that said "correct model" will become the predictor's dominant hypothesis after a finite (and generally, quite short) amount of time.

For you to deny this would require that you claim the universe is not describable by any computable process; and I should like to see you defend this very strong positive claim, etc. etc.

That's part of the problem, not part of the solution. If evolution is optimising for genetic fitness, then it is not optimising for the ability to achieve a correct ontology... after all, a wrong but predictive model is good enough for survival.

Per my second argument: evolution does not select over models; it selects over priors. A prior is a tool for constructing models; if your prior is non-stupid, i.e. if it doesn't rule out some large class of hypotheses a priori, you will in general be capable of figuring out what the correct model is and promoting it to attention. For you to deny this would require that you claim non-stupid priors confer no survival advantage over stupid priors; and I should like to see you defend this very strong positive claim, etc. etc.

The issues I mentioned have not been answered.

Yes, well.

Comment by dxu on Grokking the Intentional Stance · 2021-09-01T18:53:49.680Z · LW · GW

This feels like violent agreement with my arguments in the linked post, so I think you're arguing against some different reading of the implications of the problem of the criterion than what it does imply.

Perhaps I am! But if so, I would submit that your chosen phrasings of your claims carry unnecessary baggage with them, and that you would do better to phrase your claims in ways that require fewer ontological commitments (even if they become less provocative-sounding thereby).

It doesn't imply there is literally no way to ground knowledge, but that ground is not something especially connected to traditional notions of truth or facts, but rather in usefulness to the purpose of living.

In a certain sense, yes. However, I assert that "traditional notions of truths or facts" (at least if you mean by that phrase what I think you do) are in fact "useful to the purpose of living", in the following sense:

It is useful to have senses that tell you the truth about reality (as opposed to deceiving you about reality). It is useful to have a brain that is capable of performing logical reasoning (as opposed to a brain that is not capable of performing logical reasoning). It is useful to have a brain that is capable of performing probabilistic reasoning (as opposed to a brain that is not, etc. etc).

To the extent that we expect such properties to be useful, we ought also to expect that we possess those properties by default. Otherwise we would not exist in the form we do today; some superior organism would be here in our place, with properties more suited to living in this universe than ours. Thus, "traditional notions of truths and facts" remain grounded; there are no excess degrees of freedom available here.

To what extent do you find the above explanation unsatisfying? And if you do not find it unsatisfying, then (I repeat): what is the use of talking about the "problem of the criterion", beyond (perhaps) the fact that it allows you to assert fun and quirky and unintuitive (and false) things like "facts don't exist"?

This mostly comes up when we try to assess things like what does it mean for something to "be" an "agent". We then run headlong into the grounding problem and this becomes relevant, because what it means for something to "be" an "agent" ends up connected to what end we need to categorize the world, rather than how the world actually "is", since the whole point is that there is no fact of the matter about what "is", only a best effort assessment of what's useful (and one of the things that's really useful is predicting the future, which generally requires building models that correlate to past evidence).

I agree that this is a real difficulty that people run into. I disagree with [what I see as] your [implicit] claim that the "problem of the criterion" framing provides any particular tools for addressing this problem, or that it's a useful framing in general. (Indeed, the sequence I just linked constitutes what I would characterize as a "real" attempt to confront the issue, and you will note the complete absence of claims like "there is no such thing as knowledge" in any of the posts in question; in the absence of such claims, you will instead see plenty of diagrams and mathematical notation.)

It should probably be obvious by now that I view the latter approach as far superior to the former. To the extent that you think I'm not seeing some merits to the former approach, I would be thrilled to have those merits explained to me; right now, however, I don't see anything.

Comment by dxu on Grokking the Intentional Stance · 2021-09-01T05:23:48.902Z · LW · GW

[META: To be fully honest, I don't think the comments section of this post is the best place to be having this discussion. That I am posting this comment regardless is due to the fact that I have seen you posting about your hobby-horse—the so-called "problem of the criterion"—well in excess of both the number of times and places I think it should be mentioned—including on this post whose comments section I just said I don't think is suited to this discussion. I am sufficiently annoyed by this that it has provoked a response from me; nonetheless, I will remove this comment should the author of the post request it.]

Worth saying, I think, that this is fully generally true that there's no observer-independent fact of the matter about whether X "is" Y.

The linked post does not establish what it claims to establish. The claim in question is that "knowledge" cannot be grounded, because "knowledge" requires "justification", which in turn depends on other knowledge, which requires justification of its own, ad infinitum. Thus no "true" knowledge can ever be had, throwing the whole project of epistemology into disarray. (This is then sometimes used as a basis to make extravagantly provocative-sounding claims like "there is no fact of the matter about anything".)

Of course, the obvious response to this is to point out that in fact, humans have been accumulating knowledge for quite some time; or at the very least, humans have been accumulating something that very much looks like "knowledge" (and indeed many people are happy to call it "knowledge"). This obvious objection is mainly "addressed" in the linked post by giving it the name of "pragmatism", and behaving as though the act of labeling an objection thereby relieves that objection of its force.

However, I will not simply reassert the obvious objection here. Instead, I will give two principled arguments, each of which I believe suffices to reject the claims offered in the linked post. (The two arguments in question are mostly independent of each other, so I will present them separately.)

First, there is the question of whether epistemological limitations have significant ontological implications. There is a tendency for "problem of the criterion" adherents to emit sentences that imply they believe this, but—so far as I can tell—they offer no justification for this belief.

Suppose it is the case that I cannot ever know for sure that the sky is blue. (We could substitute any other true fact here, but "the sky is blue" seems to be something of a canonical example.) Does it then follow that there is no objective fact about the color of the sky? If so, why? Through what series of entanglements—causal, logical, or otherwise—does knowing some fact about my brain permit you to draw a conclusion about the sky (especially a conclusion as seemingly far-fetched as "the sky doesn't have a color")?

(Perhaps pedants will be tempted to object here that "color" is a property of visual experiences, not of physical entities, so it is in fact true that the sky has no "color" if no one is there to see it. This is where the substitution clause above enters: you may replace "the sky is blue" with any true claim you wish, at any desired level of specificity, e.g. "the majority of light rays emitted from the Sun whose pathway through the atmosphere undergoes sufficient scattering to reach ground level will have an average wavelength between 440-470 nm".)

If you bite this particular bullet—meaning, you believe that my brain's (in)ability to know something with certainty implies that that something in fact does not exist—then I would have you reconcile the myriad of issues this creates with respect to physics, logic, and epistemology. (Starter questions include: do I then possess the ability to alter facts as I see fit, merely by strategically choosing what I do and do not find out? By what mechanism does this seemingly miraculous ability operate? Why does it not violate known physical principles such as locality, or known statistical principles such as d-separation?)

And—conversely—if you do not bite the bullet in question, then I would have you either (a) find some alternative way to justify the (apparent) ontological commitments implied by claims such as "there's no observer-independent fact of the matter about whether X is Y", or (b) explain why such claims actually don't carry the ontological commitments they very obviously seem to carry.

(Or (c): explain why you find it useful, when making such claims, to employ language in a way that creates such unnecessary—and untrue—ontological commitments.)

In any of the above cases, however, it seems to me that the original argument has been refuted: regardless of which prong of the 4-lemma you choose, you cannot maintain your initial assertion that "there is no objective fact of the matter about anything".

The first argument attacked the idea that fundamental epistemological limitations have broader ontological implications; in doing so, it undermines the wild phrasing I often see thrown around in claims associated with such limitations (e.g. the "problem of the criterion"), and also calls into question the degree to which such limitations are important (e.g. if they don't have hugely significant implications for the nature of reality, why does "pragmatism" not suffice as an answer?).

The second argument, however, attacks the underlying claim more directly. Recall the claim in question:

[...] that "knowledge" cannot be grounded, because "knowledge" requires "justification", which in turn depends on other knowledge, which requires justification of its own, ad infinitum.

Is this actually the case, however? Let's do a case analysis; we'll (once again) use "the sky is blue" as an example:

Suppose I believe that the sky is blue. How might I have arrived at such a belief? There are multiple possible options (e.g. perhaps someone I trust told me that it's blue, or perhaps I've observed it to be blue on every past occasion, so I believe it to be blue right now as well), but for simplicity's sake we'll suppose the most direct method possible: I'm looking at it, right now, and it's blue.

However (says the problemist of the criterion) this is not good enough. For what evidence do you have that your eyes, specifically, are trustworthy? How do you know that your senses are not deceiving you at this very moment? The reliability of one's senses is, in and of itself, a belief that needs further justification—and so, the problemist triumphantly proclaims, the ladder of infinite descent continues.

But hold on: there is something very strange about this line of reasoning. Humans do not, as a rule, ordinarily take their senses to be in doubt. Yes, there are exceptional circumstances (optical illusions, inebriation, etc.) under which we have learned to trust our senses less than we usually do—but the presence of the word "usually" in that sentence already hints at the fact that, by default, we take our senses as a given: trustworthy, not because of some preexisting chain of reasoning that "justifies" it by tying it to some other belief(s), but simply... trustworthy. Full stop.

Is this an illegal move, according to the problemist? After all, the problemist seems to be very adamant that one cannot believe anything without justification, and in taking our senses as a given, we certainly seem to be in violation of this rule...

...and yet, most of the time things seem to work out fine for us in real life. Is this a mere coincidence? If it's true that we are actually engaging in systematically incorrect reasoning—committing an error of epistemology, an illegal move according to the rules that govern proper thought—and moreover, doing so constantly throughout every moment of our lives—one might expect us to have run into some problems by now. Yet by and large, the vast majority of humanity is able to get away with trusting their senses in their day-to-day lives; the universe, for whatever reason, does not conspire to punish our collective irrationality. Is this a coincidence, akin to the coincidences that sometimes reward, say, an irrational lottery ticket buyer? If so, there sure do seem to be quite a few repeat lottery winners on this planet of ours...

Of course, it isn't a coincidence: the reason we can get away with trusting our senses is because our senses actually are trustworthy. And it's also no coincidence that we believe this implicitly, from the very moment we are born, well before any of us had an opportunity to learn about epistemology or the "problem of the criterion". The reliability of our senses, as well as the corresponding trust we have in those senses, are both examples of properties hardcoded into us by evolution—the result of billions upon billions of years of natural selection on genetic fitness.

There were, perhaps, organisms on whom the problemist's argument would have been effective, in the ancient history of the Earth—organisms who did not have reliable senses, and who, if they had chosen to rely unconditionally on whatever senses they possessed, would have been consistently met with trouble. (And if such organisms did not exist, we can still imagine that they did.) But if such organisms did exist back then, they no longer do today: natural selection has weeded them out, excised them from the collective genetic pool for being insufficiently fit. The harsh realities of competition permit no room for organisms to whom the "problem of the criterion" poses a real issue.

And as for the problemist's recursive ladder of justification? It runs straight into the hard brick wall called "evolutionary hardcoding", and proceeds no further than that: the buck stops immediately. Evolution neither needs nor provides justification for the things it does; it merely optimizes for inclusive genetic fitness. Even attempting to apply the problemist's tried-and-true techniques to the alien god produces naught but type and category errors; the genre of analysis preferred by the problemist finds no traction whatsoever. Thus, the problemist of the criterion is defeated, and with him so too vanishes his problem.

Incidentally, what genre of analysis does work on evolution? Since evolution is an optimization process, the answer should be obvious enough: the mathematical study of optimization, with all of the various fields and subfields associated with it. But that is quite beyond the scope of this comment, which is more than long enough as it is. So I leave you with this, and sincerely request that you stop beating your hobby-horse to death: it is, as it were, already dead.

Comment by dxu on Can you control the past? · 2021-09-01T03:15:15.876Z · LW · GW

The output of this process is something people have taken to calling Son-of-CDT; the problem (insofar as we understand Son-of-CDT well enough to talk about its behavior) is that the resulting decision theory continues to neglect correlations that existed prior to self-modification.

(In your terms: Alice and Bob would only one-box in Newcomb variants where Omega based his prediction on them after they came up with their new decision theory; Newcomb variants where Omega's prediction occurred before they had their talk would still be met with two-boxing, even if Omega is stipulated to be able to predict the outcome of the talk.)

This still does not seem like particularly sane behavior, which means, unfortunately, that there's no real way for a CDT agent to fix itself: it was born with too dumb of a prior for even self-modification to save it.
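(For concreteness, here is a toy expected-value calculation under the standard assumed Newcomb payoffs: $1,000,000 in the opaque box iff one-boxing was predicted, and $1,000 always in the transparent box. The function and accuracy figures are illustrative assumptions of mine, not anything from the original discussion; the point is just that against any reasonably accurate predictor, a policy that two-boxes is leaving money on the table.)

```python
# Toy expected-value comparison for Newcomb's problem, assuming the
# standard payoffs: $1,000,000 in the opaque box iff one-boxing was
# predicted; $1,000 always in the transparent box.
def expected_value(one_box: bool, accuracy: float) -> float:
    big, small = 1_000_000, 1_000
    if one_box:
        # The predictor correctly foresaw one-boxing with prob. `accuracy`,
        # so the big box is full with that probability; the small box is
        # forgone.
        return accuracy * big
    # The predictor correctly foresaw two-boxing with prob. `accuracy`,
    # so the big box is full only when the predictor erred.
    return (1 - accuracy) * big + small

for acc in (0.9, 0.99):
    print(acc, expected_value(True, acc), expected_value(False, acc))
```

At 50% accuracy (a coin-flip predictor) the comparison flips in favor of two-boxing, which is exactly why the correlation between the agent's disposition and the predictor's prior prediction is the crux: a decision theory that ignores correlations predating its own self-modification behaves as though the predictor were a coin flip.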

Comment by dxu on Alignment Research = Conceptual Alignment Research + Applied Alignment Research · 2021-08-31T18:46:52.324Z · LW · GW

I think it's becoming less clear to me what you mean by deconfusion. In particular, I don't know what to make of the following claims:

So I have the perspective that deconfusion requires an application. And this application in return constrains what count as a successful deconfusion. [...]

What's interesting about these examples is that only the abstraction case is considered to have completely deconfused the concept it looked at. [...]

Similarly, Cartesian frames sound promising but it's not clear that they actually completely deconfuse the concept they focus on. [...]

Yet not being able to ground everything you think about formally doesn't mean you're not making progress in deconfusing them, nor that they can't be useful at that less than perfectly formal stage.

I don't [presently] think these claims dovetail with my understanding of deconfusion. My [present] understanding of deconfusion is that (loosely speaking) it's a process for taking ideas from the [fuzzy, intuitive, possibly ill-defined] sub-cluster and moving them to the [concrete, grounded, well-specified] sub-cluster.

I don't think this process, as I described it, entails having an application in mind. (Perhaps I'm also misunderstanding what you mean by application!) It seems to me that, although many attempts at deconfusion-style alignment research (such as the three examples I gave in my previous comment) might be ultimately said to have been motivated by the "application" of aligning superhuman agents, in practice they happened more because somebody noticed that whenever some word/phrase/cluster-of-related-words-and-phrases came up in conversation, people would talk about them in conflicting ways, use/abuse contradictory intuitions while talking about them, and just in general (to borrow Nate's words) "continuously accidentally spout nonsense".

But perhaps from your perspective, that kind of thing also counts as an application, e.g. the application of "making us able to talk about the thing we actually care about". If so, then:

  • I agree that it's possible to make progress towards this goal without performing steps that look like formalization. (I would characterize this as the "philosophy" part of Luke's post about going from philosophy to math to engineering.)

  • Conversely, I also agree that it's possible to perform formalization in a way that doesn't perfectly capture the essence of "the thing we want to talk about", or perhaps doesn't usefully capture it in any sense at all; if one wanted to use unkind words, one could describe the former category as "premature [formalization]", and the latter category as "unnecessary [formalization]". (Separately, I also see you as claiming that TurnTrout's work on power-seeking and Scott's work on Cartesian frames fall somewhere in the "premature" category, but this may simply be me putting words in your mouth.)

And perhaps your contention is that there's too much research being done currently that falls under the second bullet point; or, alternatively, that too many people are pursuing research that falls (or may fall) under the second bullet point, in a way that they (counterfactually) wouldn't if there were less (implicit) prestige attached to formal research.

If this (or something like it) is your claim, then I don't think I necessarily disagree; in fact, it's probably fair to say you're in a better position to judge than I am, being "closer to the ground". But I also don't think this precludes my initial position from being valid, where--having laid the groundwork in the previous two bullet points--I can now characterize my initial position as [establishing the existence of] a bullet point number 3:

  • A successful, complete deconfusion of a concept will, almost by definition, admit of a natural formalization; if one then takes the further step of producing such a formalism, it will be evident that the essence of the original concept is present in said formalism.

    (Or, to borrow Eliezer's words this time, "Do you understand [the concept/property/attribute] well enough to write a computer program that has it?")

And yes, in a certain sense perhaps there might be no point to writing a computer program with the [concept/property/attribute] in question, because such a computer program wouldn't do anything useful. But in another sense, there is a point: the point isn't to produce a useful computer program, but to check whether your understanding has actually reached the level you think it has. If one further takes the position (as I do) that such checks are useful and necessary, then [replacing "writing a computer program" with "producing a formalism"] I claim that many productive lines of deconfusion research will in fact produce formalisms that look "premature" or even "unnecessary", as a part of the process of checking the researchers' understanding.

I think that about sums up [the part of] the disagreement [that I currently know how to verbalize]. I'm curious to see whether you agree this is a valid summary; let me know if (abstractly) you think I've been using the term "deconfusion" differently from you, or (concretely) if you disagree with anything I said about "my" version of deconfusion.

Comment by dxu on Alignment Research = Conceptual Alignment Research + Applied Alignment Research · 2021-08-31T04:08:53.559Z · LW · GW

Perhaps deconfusion and formalization are not identical, but I'm partial to the notion that if you've truly deconfused something (meaning you, personally, are no longer confused about that thing), it should not take much further effort to formalize the thing in question. (Examples of this include TurnTrout's sequence on power-seeking, John Wentworth's sequence on abstraction, and Scott Garrabrant's sequence on Cartesian frames.)

So, although the path to deconfusing some concept may not involve formalizing that concept, being able to formalize the concept is necessary: if you find yourself (for some reason or other) thinking that you've deconfused something, but are nonetheless unable to produce a formalization of it, that's a warning sign that you may not actually be less confused.

Comment by dxu on Can you control the past? · 2021-08-29T20:57:50.552Z · LW · GW

I am surprised that, to this day, there are people on LW who haven't yet dissolved free will, despite the topic being covered explicitly (both in the Sequences and in a litany of other posts) over a decade ago.

No, "libertarian free will" (the idea that there exists some notion of "choice" independent of and unaffected by physics) does not exist. Yes, this means that if you are a god sitting outside of the universe, you can (modulo largely irrelevant fuzz factors like quantum indeterminacy) compute what any individual physical subsystem in the universe will do, including subsystems that refer to themselves as "agents".

But so what? In point of fact, you are not a god. So what bearing does this hypothetical god's perspective have on you, in your position as a merely-physical subsystem-of-the-universe? Perhaps you imagine that the existence of such a god has deep relevance for your decision-making right here and now, but if so, you are merely mistaken. (Indeed, for some people who consistently make this mistake, it may even be the case that hearing of the god's existence has harmed them.)

Suppose, then, that you are not God. It follows that you cannot know what you will decide before actually making the decision. So in the moments leading up to the decision, what will you do? Pleading "but it is a predetermined physical fact!" benefits you not at all; whether there exists a fact of the matter is irrelevant when you cannot access that information even in principle. Whatever you do, then—whatever process of deliberation you follow, whatever algorithm you instantiate—must depend on something other than plucking information out of the mind of God.

What might this deliberation process entail? It depends, naturally, on what type of physical system you are; some systems (e.g. rocks) "deliberate" via extremely simplistic means. But if you happen to be a human—or anything else that might be recognizably "agent-like"—your deliberation will probably consist of something like imagining different choices you might make, visualizing the outcome conditional on each of those choices, and then selecting the choice-outcome pair that ranks best in your estimation.
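The deliberation procedure described above is essentially an argmax over imagined futures. As a deliberately toy sketch (all the names and numbers here are illustrative, not a claim about how brains actually work):

```python
def deliberate(choices, predict, evaluate):
    """Imagine each available choice, visualize its conditional outcome,
    and select the choice whose outcome ranks best in your estimation."""
    best_choice, best_score = None, float("-inf")
    for choice in choices:
        outcome = predict(choice)    # visualize the conditional future
        score = evaluate(outcome)    # rank it
        if score > best_score:
            best_choice, best_score = choice, score
    return best_choice

# Toy usage: an "agent" choosing how many hours to study.
print(deliberate(
    choices=[0, 2, 4, 8],
    predict=lambda h: {"knowledge": h * 3, "fatigue": h ** 2},
    evaluate=lambda o: o["knowledge"] - o["fatigue"],
))  # → 2
```

Note that the loop visits every choice, including the three the agent will not end up making; nothing in the algorithm requires those conditional futures to be physically realizable, only imaginable.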

If you follow this procedure, you will end up making only one choice in the end. What does this entail for all of the other conditional futures you imagined? If you were to ask God, he would tell you that those futures were logically incoherent—not simply physically impossible, but incapable of happening even in principle. But as to the question of how much bearing this (true) fact had on your ability to envision those futures, back when you didn't yet know which way you would choose—the answer is no bearing at all.

This, then, is all that "free will" is: it is the way it feels, from the inside, to instantiate an algorithm following the decision-making procedure outlined above. In short, free will is an "abstraction" in exactly the sense Steven Byrnes described in the grandparent comment, and your objection

Yup, there are different levels of abstraction to model/predict/organize our understanding of the universe. These are not exclusive, nor independent, though - there's only one actual universe! Mixing (or more commonly switching among) levels is not a problem if all the levels are correct on the aspects being predicted. The lower levels are impossibly hard to calculate, but they're what actually happens. The higher levels are more accessible, but sometimes wrong.

is simply incorrect—an error that arose from mixing levels of abstraction that have nothing to do with each other. The "higher-level" picture does not contradict the "lower-level" picture; God's existence has no bearing on your ability to imagine conditional outcomes, and it is this latter concept that people refer to with terms like "free will" or "control".

Comment by dxu on New GPT-3 competitor · 2021-08-13T18:27:51.867Z · LW · GW

The key advance here seems to be the tokenizer, with larger vocabulary, which has been identified by others as a potentially critical limitation for GPT-3.

My impression was that tokenization was a critical limitation for GPT-3 in the opposite direction, i.e. it caused GPT-3's performance to suffer on tasks where character-level information is important (including multi-digit addition, and also things like rhyming, acronyms, etc), because the tokenization process clumps characters together by default and obscures that information. Having more (and longer) tokens does not seem like it would remedy that issue; if anything, it may exacerbate it.

Comment by dxu on Obesity Epidemic Explained in 0.9 Subway Cookies · 2021-08-12T22:45:29.520Z · LW · GW

The body is pretty efficient at turning extra calories into weight

This seems like a strange claim to make, considering the results of various controlled overfeeding studies. (It's also a strange framing on top of that.) Do you have any sources backing up the quoted claim?

Comment by dxu on Book Review: The Beginning of Infinity · 2021-08-11T06:28:52.652Z · LW · GW

The argument runs like this: suppose we say that a variable is ‘close to the edge’ when it is within 10% of its possible extreme values on either side. If there were only one variable that determined our universe, 20% of its possible values would be close to the edge. If we observed such a variable as being very close to the edge, we might suspect that something fishy was going on or that our universe was designed. But for two variables, 1 - 0.8^2 = 32% of values will be close to the edge. And in general, for n variables, 1 - 0.8^n of the values will be close to the edge.

This is simply mathematically incorrect. What would be correct is to say that there is a probability of 1 - 0.8^n that at least one of the variables is "close to the edge" in the sense described. This is not the same thing as saying the expected proportion of variables close to the edge is 1 - 0.8^n; that proportion remains constant at 20% no matter how many variables you have. (And it is the latter quantity that is relevant to the fine-tuning argument: what would be surprising is not that the universe contains a single variable that looks to be optimized for our existence, but that it contains a whole host of variables so optimized.)
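The distinction is easy to check numerically. Here is a quick Monte Carlo sketch (my own toy model of the argument, using uniform variables on [0, 1] and the 10%-of-either-extreme criterion from the quoted passage):

```python
import random

random.seed(0)

def near_edge(x):
    # "close to the edge" = within 10% of either extreme of [0, 1]
    return x < 0.1 or x > 0.9

n = 10            # number of variables
trials = 100_000

frac_sum = 0.0    # running sum of per-trial proportions near the edge
at_least_one = 0  # count of trials with at least one variable near the edge
for _ in range(trials):
    flags = [near_edge(random.random()) for _ in range(n)]
    frac_sum += sum(flags) / n
    at_least_one += any(flags)

print(frac_sum / trials)      # expected proportion near edge: ≈ 0.20, independent of n
print(at_least_one / trials)  # P(at least one near edge): ≈ 1 - 0.8**10 ≈ 0.89
```

The first number stays at 20% however large n gets; only the second grows as 1 - 0.8^n.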

I don't know whether this is a mistake Deutsch himself made, or whether he had a better argument that you (the reviewer) simply summarized sloppily. Either way, however, it doesn't speak well to the quality of the book's content. (And, moreover, I found the summaries of most of the book's other theses rather sloppy and unconvincing as well, though at least those contained no basic mathematical errors.)

Comment by dxu on Seeking Power is Convergently Instrumental in a Broad Class of Environments · 2021-08-08T22:18:50.614Z · LW · GW

One particular example of this phenomenon that comes to mind:

In (traditional) chess-playing software, generally moves are selected using a combination of search and evaluation, where the search is (usually) some form of minimax with alpha-beta pruning, and the evaluation function is used to assign a value estimate to leaf nodes in the tree, which are then propagated to the root to select a move.

Typically, the evaluation function is designed by humans (although recent developments have changed that second part somewhat) to reflect meaningful features of chess understanding. Obviously, this is useful in order to provide a more accurate estimate for leaf node values, and hence more accurate move selection. What's less obvious is what happens if you give a chess engine a random evaluation function, i.e. one that assigns an arbitrary value to any given position.

This has in fact been done, and the relevant part of the experiment is what happened when the resulting engine was made to play against itself at various search depths. Naively, you'd expect that since the evaluation function has no correlation with the actual value of the position, the engine would make more-or-less random moves regardless of its search depth--but in fact, this isn't the case: even with a completely random evaluation function, higher search depths consistently beat lower search depths.

The reason for this, as became clear once the games were examined, is that the high-depth version of the engine consistently made moves that gave its pieces more mobility, i.e. more legal moves in subsequent positions. This is because, given that the evaluation function assigns arbitrary values to leaf nodes, the "most extreme" value (which is what minimax cares about) is equally likely to fall on any given leaf, and hence branches with more leaves are more likely to be selected. And since mobility is in fact an important concept in real chess, this tendency manifests in a way that favors the higher-depth player.

At the time I first learned about this experiment, it struck me as merely a fascinating coincidence: the fact that maximizing over leaves distributed non-uniformly across branches leads to a natural approximation of the "mobility" concept was interesting, but nothing more. But the standpoint of instrumental convergence reveals another perspective: the concept chess players call "mobility" is actually important precisely because it emerges even in such a primitive system, one almost entirely divorced from the rules and goals of the game. In short, mobility is an instrumentally useful resource--not just in chess, but in all chess-like games, simply because it's useful to have more legal moves no matter what your goal is.
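The leaf-counting mechanism can be demonstrated in miniature. The sketch below (a toy model of my own, not an actual chess engine) assigns i.i.d. uniform values to leaf nodes and checks how often the branch with more leaves ends up containing the overall maximum:

```python
import random

random.seed(0)

def pick_branch(leaf_counts):
    """Max-over-leaves selection with a random evaluation function:
    each leaf gets an i.i.d. uniform value; return the index of the
    branch containing the overall maximum value."""
    best_branch, best_val = None, -1.0
    for i, k in enumerate(leaf_counts):
        v = max(random.random() for _ in range(k))
        if v > best_val:
            best_branch, best_val = i, v
    return best_branch

# Branch 0 has 30 continuations ("high mobility"); branch 1 has 10.
trials = 100_000
wins = sum(pick_branch([30, 10]) == 0 for _ in range(trials))
print(wins / trials)  # ≈ 30 / (30 + 10) = 0.75
```

Since every leaf is equally likely to hold the maximum, a branch is selected with probability exactly proportional to its leaf count: a random-evaluation maximizer prefers moves in proportion to how much of the game tree lies behind them, which is precisely a preference for mobility.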

(This is actually borne out in games of a chess variant called "suicide chess", where the goal is to force your opponent to checkmate you. Despite having a completely opposite terminal goal to that of regular chess, games of suicide chess actually strongly resemble games of regular chess, at least during the first half. The reason for this is simply that in both games, the winning side needs to build up a dominant position before being able to force the opponent to do anything, whether that "anything" be delivering checkmate or being checkmated. Once that dominant position has been achieved, you can make use of it to attain whatever end state you want, but the process of reaching said dominant position is the same across variants.)

Additionally, there are some analogies to be drawn to the three types of utility function discussed in the post:

  1. Depth-1 search is analogous to utility functions over action-observation histories (AOH), in that its move selection criterion depends only on the current position. With a random (arbitrary) evaluation function, move selection in this regime is equivalent to drawing randomly from a uniform distribution across the set of immediately available next moves, with no concern for what happens after that. There is no tendency for depth-1 players to seek mobility.

  2. Depth-2 search is analogous to a utility function in a Markov decision process (MDP), in that its move selection criterion depends on the assessment of the immediate next time-step. With a random (arbitrary) evaluation function, move selection in this regime is equivalent to drawing randomly from a uniform distribution across the set of possible replies to one's immediately available moves, which in turn is equivalent to drawing from a distribution over one's next moves weighted by the number of replies. There is a mild tendency for depth-2 players to seek mobility.

  3. Finally, maximum-depth search would be analogous to the classical utility function over observation histories (OH). With a random (arbitrary) evaluation function, move selection in this regime disproportionately favors whichever branches of the tree lead to the most terminal states. What this looks like in principle is unknown, but empirically we see that high-depth players have a strong tendency to seek mobility.

Comment by dxu on The Myth of the Myth of the Lone Genius · 2021-08-03T18:02:43.715Z · LW · GW

The old wizard sighed. His half-glasses eyes looked only at her, as though they were the only two people present. "Miss Granger, it might be possible to discourage witches from becoming Charms Mistresses, or Quidditch players, or even Aurors. But not heroes. If someone is meant to be a hero then a hero they will be. They will walk through fire and swim through ice. Dementors will not stop them, nor the deaths of friends, and not discouragement either."

"Well," Hermione said, and paused, struggling with the words. "Well, I mean... what if that's not actually true? I mean, to me it seems that if you want more witches to be heroes, you ought to teach them heroing."

Comment by dxu on Another (outer) alignment failure story · 2021-08-01T17:33:14.160Z · LW · GW

realise the difference between measuring something and the thing measured

What does this cash out to, concretely, in terms of a system's behavior? If I were to put a system in front of you that does "realize the difference between measuring something and the thing measured", what would that system's behavior look like? And once you've answered that, can you describe what mechanic in the system's design would lead to that (aspect of its) behavior?

Comment by dxu on Academic Rationality Research · 2021-07-27T03:55:14.100Z · LW · GW

This seems cool! Strongly upvoted for signal-boosting.

Comment by dxu on Re: Competent Elites · 2021-07-23T17:45:26.709Z · LW · GW

It's well settled that qualitative data, defined as "non-numerical data that approximates and characterizes,"

This, unfortunately, is not even close to being a useful definition, as it does not permit us to identify instances of "qualitative data" in the wild. (Unless, of course, your contention is that everything that is not a number is "qualitative data", in which case the merit of your definition is questionable for a rather... different reason.)

As such, I employed qualitative data in the form of quotes from some pretty bright folks

To wit: if "quotes from some pretty bright folks" is sufficient to count as "qualitative data", then I submit that "qualitative data" is a completely useless category, both in and of itself and as a tool for making arguments. (I note that you continue to neglect to respond to the second half of my question, which asked why you believed your use of "qualitative data" strengthened your argument. Your continued lack of response on this point is telling, as are your rather ham-handed attempts to draw attention away from said inability to provide a response—usually by way of playground-level ad hominem attacks.)

But, to play in your lane for just a bit: since "quotes from pretty bright folks" seemingly count as "qualitative data" (which is seemingly meant to be compelling), here is my irrefutable counterargument to your initial claim, which consists of just the same type of "qualitative data" as you yourself employed:

"As soon as you trust yourself, you will know how to live." —Johann Wolfgang von Goethe

"Success is most often achieved by those who don't know that failure is inevitable." —Sylvia Plath

I await your response to this crushing rebuttal, which I am sure will consist of an immediate concession to the unyielding validity of the data I have provided.

Comment by dxu on Re: Competent Elites · 2021-07-23T02:46:30.018Z · LW · GW

I do, in fact, understand the difference between those two things. It's precisely because I understand the difference that I asked you what I did.

Now, let me repeat the question (with some additional emphasis on the important bits): what is the phrase "qualitative data" doing in your comment; in what sense do you believe your initial response to gwern contained "data" at all, qualitative or otherwise; and moreover, why do you believe that your use of this phrase (incidentally combined with other interesting phrases, such as "stats wonk") will cause readers of your comment to believe that it is more likely to be true*, rather than less?

*In fact, I had originally intended to use the word "rigorous" here, but I suspect based on your previous comments that you would not, in fact, agree that "rigor" is a thing to strive for when making arguments; thus I opted for the less specific (but more generally agreed upon) criterion of likelihood. (Whether rigor is in fact an important desideratum is a related discussion to this one, of course, as is--to be somewhat glib--what disregarding said desideratum says about one's own general quality of thought.)

Comment by dxu on Re: Competent Elites · 2021-07-19T17:49:43.946Z · LW · GW

qualitative data

What is this phrase supposed to mean, and why is it supposed to make your argument sound more convincing, rather than less?

Comment by dxu on Variables Don't Represent The Physical World (And That's OK) · 2021-06-18T03:56:37.588Z · LW · GW

You are misunderstanding the post. There are no "extra bits of information" hiding anywhere in reality; where the "extra bits of information" are lurking is within the implicit assumptions you made when you constructed your model the way you did.

As long as your model is making use of abstractions--that is, using "summary data" to create and work with a lower-dimensional representation of reality than would be obtained by meticulously tracking every variable of relevance--you are implicitly making a choice about what information you are summarizing and how you are summarizing it.

This choice is forced to some extent, in the sense that there are certain ways of summarizing the data that barely simplify computation at all compared to using the "full" model. But even conditioning on a usefully simplifying (natural) abstraction having been selected, there will still be degrees of freedom remaining, and those degrees of freedom are determined by you (the person doing the summarizing). This is where the "extra information" comes from; it's not because of inherent uncertainty in the physical measurements, but because of an unforced choice that was made between multiple abstract models summarizing the same physical measurements.

Of course, in reality you are also dealing with measurement uncertainty. But that's not what the post is about; the thing described in the post happens even if you somehow manage to get your hands on a set of uncertainty-free measurements, because the moment you pick a particular way to carve up those measurements, you induce a (partially) arbitrary abstraction layer on top of the measurements. As the post itself says:

If there’s only a limited number of data points, then this has the same inherent uncertainty as before: sample mean is not distribution mean. But even if there’s an infinite number of data points, there’s still some unresolvable uncertainty: there are points which are boundary-cases between the “tree” cluster and the “apple” cluster, and the distribution-mean depends on how we classify those. There is no physical measurement we can make which will perfectly tell us which things are “trees” or “apples”; this distinction exists only in our model, not in the territory. In turn, the tree-distribution-parameters do not perfectly correspond to any physical things in the territory.

This implies nothing about determinism, physics, or the nature of reality ("illusory" or otherwise).

Comment by dxu on Often, enemies really are innately evil. · 2021-06-07T23:37:49.774Z · LW · GW

This does not strike me as a psychologically realistic model of sadism, and (absent further explanation/justification) counts in my opinion as a rather large strike against mistake theory (or at least, it would if I took as given that a plurality of self-proclaimed "mistake theorists" would in fact endorse the statement you made).

Comment by dxu on What will 2040 probably look like assuming no singularity? · 2021-05-19T06:29:37.292Z · LW · GW

Neither of those seem to me like the right questions to be asking (though for what it's worth the answer to the first question has been pretty clearly "yes" if by "Chinese government" we're referring specifically to post-2001 China).

Having said that, I don't think outside-viewing these scenarios using coarse-grained reference classes like "the set of mid-term goals China has set for itself in the past" leads to anything useful. Well-functioning countries in general (and China in particular) tend to set goals for themselves they view as achievable, so if they're well-calibrated it's necessarily the case that they'll end up achieving (a large proportion of) the goals they set for themselves. This being the case, you don't learn much from finding out China manages to consistently meet its own goals, other than that they've historically done a pretty decent job at assessing their own capabilities. Nor does this allow you to draw conclusions about a specific goal they have, which may be easier or more difficult to achieve than their average goal.

In the case of Taiwan: by default, China is capable of taking Taiwan by force. What I mean by this is that China's maritime capabilities well exceed Taiwan's defensive capacity, such that Taiwan's continued sovereignty in the face of a Chinese invasion is entirely reliant on the threat of external intervention (principally from the United States, but also by allies in the region). Absent that threat, China could invade Taiwan tomorrow and have a roughly ~100% chance of taking the island. Even if allies get involved, there's a non-negligible probability China wins anyway, and the trend going forward only favors China even more.

Of course, that doesn't mean China will invade Taiwan in the near future. As long as its victory isn't assured, it stands to lose substantially more from a failed invasion than it stands to gain from a successful one. At least for the near future, so long as the United States doesn't send a clear signal about whether it will defend Taiwan, I expect China to mostly play it safe. But there's definitely a growing confidence within China that they'll retake Taiwan eventually, so the prospect of an invasion is almost certainly on the horizon unless current trends w.r.t. the respective strengths of the U.S. and Chinese militaries reverse for some reason. That's not out of the question (the future is unpredictable), but there's also no particular reason to expect said trends to reverse, so assuming they don't, China will almost certainly try to occupy Taiwan at some point, regardless of what stance the U.S. takes on the issue.

(Separately, there's the question of whether the U.S. will take a positive stance; I'm not optimistic that it will, given its historical reluctance to do so, as well as the fact that all of the risks and incentives responsible for said reluctance will likely only increase as time goes on.)

Comment by dxu on Making Vaccine · 2021-02-05T16:37:56.012Z · LW · GW

A simple Google search shows thousands of articles addressing this very solution.

The solution in the paper you link is literally the solution Eliezer described trying, and finding that it did not work:

As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked.

(Note that the "little lightbox" in question was very likely one of these, which you may notice have mostly ratings of 10,000 lux rather than the 2,500 cited in the paper. So, significantly brighter, and despite that, didn't work.)

It does sound like you misunderstood, in other words. Knowing that light exposure is an effective treatment for SAD is indeed a known solution; this is why Eliezer tried light boxes to begin with. The point of that excerpt is that this "known solution" did not work for his wife, and the obvious next step of scaling up the amount of light used was not investigated in any of the clinical literature.

But taking a step back, the "Chesterton’s Absence of a Fence" argument doesn't apply here because the circumstances are very different. The entire world is desperately looking for a way to stop COVID. If SAD suddenly occurred out of nowhere and affected the entire economy, you would be sure that bright lights would be one of the first things to be tested.

This is simply a (slightly) disguised variation of your original argument. Absent strong reasons to expect to see efficiency, you should not expect to see efficiency. The "entire world desperately looking for a way to stop COVID" led to bungled vaccine distribution, delayed production, supply shortages, the list goes on and on. Empirically, we do not observe anything close to efficiency in this market, and this should be obvious even without the aid of Dentin's list of bullet points (though naturally those bullet points are very helpful).

(Question: did seeing those bullet points cause you to update at all in the direction of this working, or are you sticking with your 1-2% prior? The latter seems fairly indefensible from an epistemic standpoint, I think.)

Not only is the argument above flawed, it's also special pleading with respect to COVID. Here is the analogue of your argument with respect to SAD:

Around 7% of the population has severe Seasonal Affective Disorder, and another 20% or so has weak Seasonal Affective Disorder. Around 50% of tested cases respond to standard lightboxes. So if the intervention of stringing up a hundred LED bulbs actually worked, it could provide a major improvement to the lives of 3% of the US population, costing on the order of $1000 each (without economies of scale). Many of those 9 million US citizens would be rich enough to afford that as a treatment for major winter depression. If you could prove that your system worked, you could create a company to sell SAD-grade lighting systems and have a large market.

SAD is not an uncommon disorder. In terms of QALYs lost, it's... probably not directly comparable with COVID, but it's at the very least in the same ballpark--certainly to the point where "people want to stop COVID, but they don't care about SAD" is clearly false.

And yet, in point of fact, there are no papers describing the unspeakably obvious intervention of "if your lights don't seem to be working, use more lights", nor are there any companies predicated on this idea. If Eliezer had followed your reasoning to its end conclusion, he might not have bothered testing more light... except that his background assumptions did not imply the (again, fairly indefensible, in my view) heuristic that "if no one else is doing it, the only possible explanation is that it must not work, else people are forgoing free money". And as a result, he did try the intervention, and it worked, and (we can assume) his wife's quality of life was improved significantly as a result.

If there's an argument that (a) applies in full generality to anything other people haven't done before, and (b) if applied, would regularly lead people to forgo testing out their ideas (and not due to any object-level concerns, either, e.g. maybe it's a risky idea to test), then I assert that that argument is bad and harmful, and that you should stop reasoning in this manner.

Comment by dxu on Making Vaccine · 2021-02-04T20:24:16.344Z · LW · GW

This is a very in-depth explanation of some of the constraints affecting pharmaceutical companies that (mostly) don't apply to individuals, and is useful as an object-level explanation for those interested. I'm glad this comment was written, and I upvoted accordingly.

Having said that, I would also like to point out that a detailed explanation of the constraints shouldn't be needed to address the argument in the grandparent comment, which simply reads:

Why are established pharmaceutical companies spending billions on research and using complex mRNA vaccines when simply creating some peptides and adding it to a solution works just as well?

This question inherently assumes that the situation with commercial vaccine-makers is efficient with respect to easy, do-it-yourself interventions, and the key point I want to make is that this assumption is unjustified even if you don't happen to have access to a handy list of bullet points detailing the ways in which companies and individuals differ on this front. (Eliezer wrote a whole book on this at one point, from which I'll quote a relevant section:)

My wife has a severe case of Seasonal Affective Disorder. As of 2014, she’d tried sitting in front of a little lightbox for an hour per day, and it hadn’t worked. SAD’s effects were crippling enough for it to be worth our time to consider extreme options, like her spending time in South America during the winter months. And indeed, vacationing in Chile and receiving more exposure to actual sunlight did work, where lightboxes failed.

From my perspective, the obvious next thought was: “Empirically, dinky little lightboxes don’t work. Empirically, the Sun does work. Next step: more light. Fill our house with more lumens than lightboxes provide.” In short order, I had strung up sixty-five 60W-equivalent LED bulbs in the living room, and another sixty-five in her bedroom.

Ah, but should I assume that my civilization is being opportunistic about seeking out ways to cure SAD, and that if putting up 130 LED light bulbs often worked when lightboxes failed, doctors would already know about that? Should the fact that putting up 130 light bulbs isn’t a well-known next step after lightboxes convince me that my bright idea is probably not a good idea, because if it were, everyone would already be doing it? Should I conclude from my inability to find any published studies on the Internet testing this question that there is some fatal flaw in my plan that I’m just not seeing?

We might call this argument “Chesterton’s Absence of a Fence.” The thought being: I shouldn’t build a fence here, because if it were a good idea to have a fence here, someone would already have built it. The underlying question here is: How strongly should I expect that this extremely common medical problem has been thoroughly considered by my civilization, and that there’s nothing new, effective, and unconventional that I can personally improvise?

Eyeballing this question, my off-the-cuff answer—based mostly on the impressions related to me by every friend of mine who has ever dealt with medicine on a research level—is that I wouldn’t necessarily expect any medical researcher ever to have done a formal experiment on the first thought that popped into my mind for treating this extremely common depressive syndrome. Nor would I strongly expect the intervention, if initial tests found it to be effective, to have received enough attention that I could Google it.

The grandparent comment is more or less an exact example of this species of argument, and is the first of its kind that I can recall seeing "in the wild". I think examples of this kind of thinking are all over the place, but it's rare to find a case where somebody explicitly deploys an argument of this type in such a direct, obvious way. So I wanted to draw attention to this, with further emphasis on the idea that such arguments are not valid in general.

The prevalence of this kind of thinking is why (I claim) at-home, do-it-yourself interventions are so uncommon, and why this particular intervention went largely unnoticed even among the rationalist community. It's a failure mode that's easy to slip into, so I think it's important to point these things out explicitly and push back against them when they're spotted (which is the reason I wrote this comment).

IMPORTANT NOTE: This should be obvious enough to anyone who read Inadequate Equilibria, but one thing I'm not saying here is that you should just trust random advice you find online. You should obviously perform an object-level evaluation of the advice, and put substantial effort into investigating potential risks; such an assessment might very well require multiple days' or weeks' worth of work, and end up including such things as the bulleted list in the parent comment. The point is that once you've performed that assessment, it serves no further purpose to question yourself based only on the fact that others aren't doing the thing you're doing; this is what Eliezer would call wasted motion, and it's unproductive at best and harmful at worst. If you find yourself thinking along these lines, you should stop, in particular if you find yourself saying things like this (emphasis mine):

That being said, I'm extremely skeptical that this will work, my belief is that there's a 1-2% chance here that you've effectively immunized yourself from COVID.

You cannot get enough Bayesian evidence from the fact that [insert company here] isn't doing [insert intervention here] to reduce your probability of an intervention being effective all the way down to 1-2%. That 1-2% figure almost certainly didn't come from any attempt at a numerical assessment; rather, it came purely from an abstract intuition that "stuff that isn't officially endorsed doesn't work". This is the kind of thinking that (I assert) should be noticed and stamped out.
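To see why the arithmetic doesn't work out, here is a minimal sketch of the update in question. All the numbers below are illustrative assumptions of mine, not figures from the thread:

```python
# Illustrative Bayes update: how much evidence does "no pharma company
# is selling a DIY peptide vaccine" provide against "the DIY vaccine works"?
# Every probability below is a made-up input for the sketch.
prior = 0.30  # assumed prior that the intervention works

# The observation "companies aren't doing it" is nearly as likely under
# "it works" (regulatory/liability constraints) as under "it doesn't work",
# so the likelihood ratio it supplies is weak.
p_obs_if_works = 0.70
p_obs_if_fails = 0.95

posterior = (p_obs_if_works * prior) / (
    p_obs_if_works * prior + p_obs_if_fails * (1 - prior)
)
print(f"posterior: {posterior:.3f}")  # 0.240 -- nowhere near 0.01-0.02

# Likelihood ratio the observation would need in order to drag a 30%
# prior all the way down to 2%:
required_lr = (prior / (1 - prior)) / (0.02 / 0.98)
actual_lr = p_obs_if_fails / p_obs_if_works
print(f"required: {required_lr:.0f}x, actual: {actual_lr:.2f}x")
```

Unless you believe the observation is more than an order of magnitude likelier under "fails" than under "works", the 1-2% figure cannot be coming from this update.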

Comment by dxu on Syntax, semantics, and symbol grounding, simplified · 2020-12-01T02:29:11.293Z · LW · GW

With regard to GPT-n, I don't think the hurdle is groundedness. Given a sufficiently vast corpus of language, GPT-n will achieve a level of groundedness where it understands language at a human level but lacks the ability to make intelligent extrapolations from that understanding (e.g. invent general relativity), which is rather a different problem.

The claim in the article is that grounding is required for extrapolation, so these two problems are not in fact unrelated. You might compare e.g. the case of a student who has memorized by rote a number of crucial formulas in calculus, but cannot derive those formulas from scratch if asked (and by extension obviously cannot conceive of or prove novel theorems either); this suggests an insufficient level of understanding of the fundamental mathematical underpinnings of calculus, which (if I understood Stuart's post correctly) is a form of "ungroundedness".

Comment by dxu on [Linkpost] AlphaFold: a solution to a 50-year-old grand challenge in biology · 2020-11-30T20:47:01.160Z · LW · GW

I don't think it's particularly impactful from an X-risk standpoint (at least in terms of first-order consequences), but in terms of timelines I think it represents another update in favor of shorter timelines, in a similar vein to AlphaGo/AlphaZero.

Comment by dxu on Message Length · 2020-10-22T01:44:47.852Z · LW · GW

Since the parameters in your implementation are 32-bit floats, you assign a complexity cost of 32 ⋅ 2^n bits to n-th order Markov chains, and look at the sum of fit (log loss) and complexity.

Something about this feels wrong. The precision of your floats shouldn't be what determines the complexity of your Markov chain; the expressivity of an nth-order Markov chain will almost always be worse than that of an (n+1)th-order Markov chain, even if the former has access to higher-precision floats than the latter. Also, in the extreme case where you're working with real-valued parameters, you'd end up with the absurd conclusion that every Markov chain has infinite complexity.

This does raise the question of how to assign complexity to Markov chains; it's clearly going to be linear in the number of parameters (and hence exponential in the order of the chain), which means the general form k ⋅ 2^n seems correct... but the value you choose for the coefficient k seems underdetermined.
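To make the selection criterion concrete, here's a toy version of the procedure (my own illustrative Python, not the post's actual code; the `bits_per_param` argument is exactly the underdetermined coefficient k above):

```python
import math
from collections import defaultdict

def description_length(seq, order, bits_per_param=32, alphabet=2):
    """Total message length in bits: a complexity cost of bits_per_param
    per free parameter (one per context, alphabet**order contexts for a
    binary chain) plus the log loss of seq under the fitted chain."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    log_loss = 0.0
    for i in range(order, len(seq)):
        ctx = seq[i - order:i]
        total = sum(counts[ctx].values())
        # Laplace smoothing so no symbol gets probability zero
        p = (counts[ctx][seq[i]] + 1) / (total + alphabet)
        log_loss += -math.log2(p)
    complexity = bits_per_param * alphabet ** order
    return complexity + log_loss

seq = "01" * 200  # data with strong first-order structure
scores = {n: description_length(seq, n) for n in range(4)}
best_order = min(scores, key=scores.get)
print(best_order)  # 1 -- order 0 can't model the alternation, and
                   # higher orders pay 32 * 2^n in complexity for nothing
```

Note that the winner can shift with the choice of `bits_per_param`, which is one way of seeing why the coefficient matters even though its value is underdetermined.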

Comment by dxu on Alignment By Default · 2020-08-16T16:18:57.704Z · LW · GW

I like this post a lot, and I think it points out a key crux between what I would term the "Yudkowsky" side (which seems to mostly include MIRI, though I'm not too sure about individual researchers' views) and "everybody else".

In particular, the disagreement seems to crystallize over the question of whether "human values" really are a natural abstraction. I suspect that if Eliezer thought that they were, he would be substantially less worried about AI alignment than he currently is (though naturally all of this is my read on his views).

You do provide some reasons to think that human values might be a natural abstraction, both in the post itself and in the comments, but I don't see these reasons as particularly compelling ones. The one I view as the most compelling is the argument that humans seems to be fairly good at identifying and using natural abstractions, and therefore any abstract concept that we seem to be capable of grasping fairly quickly has a strong chance of being a natural one.

However, I think there's a key difference between abstractions that are developed for the purposes of prediction, and abstractions developed for other purposes (by which I mostly mean "RL"). To the extent that a predictor doesn't have sufficient computational power to form a low-level model of whatever it's trying to predict, I definitely think that the abstractions it develops in the process of trying to improve its prediction will to a large extent be natural ones. (You lay out the reasons for this clearly enough in the post itself, so I won't repeat them here.)

It seems to me, though, that if we're talking about a learning agent that's actually trying to take actions to accomplish things in some environment, there's a substantial amount of learning going on that has nothing to do with learning to predict things with greater accuracy! The abstractions learned in order to select actions from a given action-space in an attempt to maximize a given reward function--these, I see little reason to expect will be natural. In fact, if the computational power afforded to the agent is good but not excellent, I expect mostly the opposite: a kludge of heuristics and behaviors meant to address different subcases of different situations, with not a whole lot of rhyme or reason to be found.

As agents go, humans are definitely of the latter type. And, therefore, I think the fact that we intuitively grasp the concept of "human values" isn't necessarily an argument that "human values" are likely to be natural, in the way that it would be for e.g. trees. The latter would have been developed as a predictive abstraction, whereas the former seems to mainly consist of what I'll term a reward abstraction. And it's quite plausible to me that reward abstractions are only legible by default to agents which implement that particular reward abstraction, and not otherwise. If that's true, then the fact that humans know what "human values" are is merely a consequence of the fact that we happen to be humans, and therefore have a huge amount of mind-structure in common.

To the extent that this is comparable to the branching pattern of a tree (which is a comparison you make in the post), I would argue that it increases rather than lessens the reason to worry: much like a tree's branch structure is chaotic, messy, and overall high-entropy, I expect human values to look similar, and therefore not really encompass any kind of natural category.

Comment by dxu on The "AI Dungeons" Dragon Model is heavily path dependent (testing GPT-3 on ethics) · 2020-08-02T23:51:28.965Z · LW · GW

Here's the actual explanation for this:

This seems to have been an excellent exercise in noticing confusion; in particular, figuring this one out properly would have required one to recognize that this behavior does not accord with one's pre-existing model, rather than simply coming up with an ad hoc explanation to fit the observation.

I therefore award partial marks to Rafael Harth for not proposing any explanations in particular, as well as Viliam in the comments:

I assumed that the GPT's were just generating the next word based on the previous words, one word at a time. Now I am confused.

Zero marks to Andy Jones, unfortunately:

I am fairly confident that Latitude wrap your Dungeon input before submitting it to GPT-3; if you put in the prompt all at once, that'll make for different model input than putting it in one line at a time.

Don't make up explanations! Take a Bayes penalty for your transgressions!

(No one gets full marks, unfortunately, since I didn't see anyone actually come up with the correct explanation.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-26T17:54:02.035Z · LW · GW

For what it's worth, my perception of this thread is the opposite of yours: it seems to me John Wentworth's arguments have been clear, consistent, and easy to follow, whereas you (John Maxwell) have been making very little effort to address his position, instead choosing to repeatedly strawman said position (and also repeatedly attempting to lump in what Wentworth has been saying with what you think other people have said in the past, thereby implicitly asking him to defend whatever you think those other people's positions were).

Whether you've been doing this out of a lack of desire to properly engage, an inability to comprehend the argument itself, or some other odd obstacle is in some sense irrelevant to the object-level fact of what has been happening during this conversation. You've made your frustration with "AI safety people" more than clear over the course of this conversation (and I did advise you not to engage further if that was the case!), but I submit that in this particular case (at least), the entirety of your frustration can be traced back to your own lack of willingness to put forth interpretive labor.

To be clear: I am making this comment in this tone (which I am well aware is unkind) because there are multiple aspects of your behavior in this thread that I find not only logically rude, but ordinarily rude as well. I more or less summarized these aspects in the first paragraph of my comment, but there's one particularly onerous aspect I want to highlight: over the course of this discussion, you've made multiple references to other uninvolved people (either with whom you agree or disagree), without making any effort at all to lay out what those people said or why it's relevant to the current discussion. There are two examples of this from your latest comment alone:

Daniel K agreed with me the other day that there isn't a standard reference for this claim. [Note: your link here is broken; here's a fixed version.]

A MIRI employee openly admitted here that they apply different standards of evidence to claims of safety vs claims of not-safety.

Ignoring the question of whether these two quoted statements are true (note that even the fixed version of the link above goes only to a top-level post, and I don't see any comments on that post from the other day), this is counterproductive for a number of reasons.

Firstly, it's inefficient. If you believe a particular statement is false (and furthermore, that your basis for this belief is sound), you should first attempt to refute that statement directly, which gives your interlocutor the opportunity to either counter your refutation or concede the point, thereby moving the conversation forward. If you instead counter merely by invoking somebody else's opinion, you both increase the difficulty of answering and end up offering weaker evidence.

Secondly, it's irrelevant. John Wentworth does not work at MIRI (neither does Daniel Kokotajlo, for that matter), so bringing up aspects of MIRI's position you dislike does nothing but highlight a potential area where his position differs from MIRI's. (I say "potential" because it's not at all obvious to me that you've been representing MIRI's position accurately.) In order to properly challenge his position, again it becomes more useful to critique his assertions directly rather than round them off to the closest thing said by someone from MIRI.

Thirdly, it's a distraction. When you regularly reference a group of people who aren't present in the actual conversation, repeatedly make mention of your frustration and "grumpiness" with those people, and frequently compare your actual interlocutor's position to what you imagine those people have said, all while your actual interlocutor has said nothing to indicate affiliation with or endorsement of those people, it doesn't paint a picture of an objective critic. To be blunt: it paints a picture of someone with a one-sided grudge against the people in question, and is attempting to inject that grudge into conversations where it shouldn't be present.

I hope future conversations can be more pleasant than this.

Comment by dxu on The Basic Double Crux pattern · 2020-07-23T04:16:58.984Z · LW · GW

I think shminux may have in mind one or more specific topics of contention that he's had to hash out with multiple LWers in the past (myself included), usually to no avail. 

(Admittedly, the one I'm thinking of is deeply, deeply philosophical, to the point where the question "what if I'm wrong about this?" just gets the intuition generator to spew nonsense. But I would say that this is less about an inability to question one's most deeply held beliefs, and more about the fact that there are certain aspects of our world-models that are still confused, and querying them directly may not lead to any new insight.)

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:54:52.105Z · LW · GW

If it's read moral philosophy, it should have some notion of what the words "human values" mean.

GPT-3 and systems like it are trained to mimic human discourse. Even if (in the limit of arbitrary computational power) it manages to encode an implicit representation of human values somewhere in its internal state, in actual practice there is nothing tying that representation to the phrase "human values", since moral philosophy is written by (confused) humans, and in human-written text the phrase "human values" is not used in the consistent, coherent manner that would be required to infer its use as a label for a fixed concept.

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T03:48:17.942Z · LW · GW

On "conceding the point":

You said earlier that "The argument for the fragility of value never relied on AI being unable to understand human values." I gave you a quote from Superintelligence which talked about AI being unable to understand human values. Are you gonna, like, concede the point or something?

The thesis that values are fragile doesn't have anything to do with how easy it is to create a system that models them implicitly, but with how easy it is to get an arbitrarily intelligent agent to behave in a way that preserves those values. The difference between those two things is analogous to the difference between a prediction task and a reinforcement learning task, and your argument (as far as I can tell) addresses the former, not the latter. Insofar as my reading of your argument is correct, there is no point to concede.

On gwern's article:

Anyway, I read Gwern's article a while ago and I thought it was pretty bad. If I recall correctly, Gwern confuses various different notions, for example, he seemed to think that if you replace enough bits of handcrafted software with bits trained using machine learning, an agent will spontaneously emerge.

I'm not sure how to respond to this, except to state that neither this specific claim nor anything particularly close to it appears in the article I linked.

On Tool AI:

Are possible

As far as I'm aware, this point has never been the subject of much dispute.

Are easier to build than Agent AIs

This is still arguable; I have my doubts, but in a "big picture" sense this is largely irrelevant to the greater point, which is:

Will be able to solve the value-loading problem

This is (and remains) the crux. I still don't see how GPT-3 supports this claim! Just as a check that we're on the same page: when you say "value-loading problem", are you referring to something more specific than the general issue of getting an AI to learn and behave according to our values?


META: I can understand that you're frustrated about this topic, especially if it seems to you that the "MIRI-sphere" (as you called it in a different comment) is persistently refusing to acknowledge something that appears obvious to you.

Obviously, I don't agree with that characterization, but in general I don't want to engage in a discussion that one side is finding increasingly unpleasant, especially since that often causes the discussion to rapidly deteriorate in quality after a few replies.

As such, I want to explicitly and openly relieve you of any social obligation you may have felt to reply to this comment. If you feel that your time would be better spent elsewhere, please do!

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-23T02:43:24.482Z · LW · GW

My claim is that we are likely to see a future GPT-N system which [...] does not "resist attempts to meddle with its motivational system".

Well, yes. This is primarily because GPT-like systems don't have a "motivational system" with which to meddle. This is not a new argument by any means: the concept of AI systems that aren't architecturally goal-oriented by default is known as "Tool AI", and there's plenty of pre-existing discussion on this topic. I'm not sure what you think GPT-3 adds to the discussion that hasn't already been mentioned?

Comment by dxu on Alignment As A Bottleneck To Usefulness Of GPT-3 · 2020-07-22T22:33:36.283Z · LW · GW

I'm confused by what you're saying.

The argument for the fragility of value never relied on AI being unable to understand human values. Are you claiming it does?

If not, what are you claiming?

Comment by dxu on Coronavirus as a test-run for X-risks · 2020-06-14T00:06:03.697Z · LW · GW

I'd love to see more thought about how the MNM effect might look in an AI scenario. Like you said, maybe denials and assurances followed by freakouts and bans. But maybe we could predict what sorts of events would trigger the shift?

I take it you're presuming slow takeoff in this paragraph, right?

Comment by dxu on Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning · 2020-06-10T16:59:04.237Z · LW · GW

Differing discourse norms; in general, communities that don't continually expend time and energy on maintaining better-than-average standards of discourse will, by default, regress to the mean. (We saw the same thing happen with LW1.0.)

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T20:06:38.721Z · LW · GW

I'm not seeing how you distinguish between the following two hypotheses:

  1. GPT-3 exhibits mostly flat scaling at the tasks you mention underneath your first bullet point (WiC, MultiRC, etc.) because its architecture is fundamentally unsuited to those tasks, such that increasing the model capacity will lead to little further improvement.
  2. Even 175B parameters isn't sufficient to perform well on certain tasks (given a fixed architecture), but increasing the number of parameters will eventually cause performance on said tasks to undergo a large increase (akin to something like a phase change in physics).

It sounds like you're implicitly taking the first hypothesis as a given (e.g. when you assert that there is a "remaining gap vs. fine-tuning that seems [unlikely] to be closed"), but I see no reason to give this hypothesis preferential treatment!

In fact, it seems to be precisely the assertion of the paper's authors that the first hypothesis should not be taken as a given; and the evidence they give to support this assertion is... the multiple downstream tasks for which an apparent "phase change" did in fact occur. Let's list them out:

  • BoolQ (apparent flatline between 2.6B and 13B, then a sudden jump in performance at 175B)
  • CB (essentially noise between 0.4B and 13B, then a sudden jump in performance at 175B)
  • RTE (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • WSC (essentially noise until 2.6B, then a sudden shift to very regular improvement until 175B)
  • basic arithmetic (mostly flat until 6.7B, followed by rapid improvement until 175B)
  • SquadV2 (apparent flatline at 0.8B, sudden jump at 1.3B followed by approximately constant rate of improvement until 175B)
  • ANLI round 3 (noise until 13B, sudden jump at 175B)
  • word-scramble with random insertion (sudden increase in rate of improvement after 6.7B)

Several of the above examples exhibit a substantial amount of noise in their performance graphs, but nonetheless, I feel my point stands. Given this, it seems rather odd for you to be claiming that the "great across-task variance" indicates a lack of general reasoning capability when said across-task variance is (if anything) evidence for the opposite, with many tasks that previously stumped smaller models being overcome by GPT-3.

It's especially interesting to me that you would write the following, seemingly without realizing the obvious implication (emphasis mine):

we still see a wide spread of task performance despite smooth gains in LM loss, with some of the most distinctive deficits persisting at all scales (common sense physics, cf section 5), and some very basic capabilities only emerging at very large scale and noisily even there (arithmetic)

The takeaway here is, at least in my mind, quite clear: it's a mistake to evaluate model performance on human terms. Without getting into an extended discussion on whether arithmetic ought to count as a "simple" or "natural" task, empirically transformers do not exhibit a strong affinity for the task. Therefore, the fact that this "basic capability" emerges at all is, or at least should be, strong evidence for generalization capability. As such, the way you use this fact to argue otherwise (both in the section I just quoted and in your original post) seems to me to be exactly backwards.

Elsewhere, you write:

The ability to get better downstream results is utterly unsurprising: it would be very surprising if language prediction grew steadily toward perfection without a corresponding trend toward good performance on NLP benchmarks

It's surprising to me that you would write this while also claiming that few-shot prediction seems unlikely to close the gap to fine-tuned models on certain tasks. I can't think of a coherent model where both of these claims are simultaneously true; if you have one, I'd certainly be interested in hearing what it is.

More generally, this is (again) why I stress the importance of concrete predictions. You call it "utterly unsurprising" that a 175B-param model would outperform smaller ones on NLP benchmarks, and yet neither you nor anyone else could have predicted what the scaling curves for those benchmarks would look like. (Indeed, your entire original post can be read as an expression of surprise at the lack of impressiveness of GPT-3's performance on certain benchmarks.)

When you only ever look at things in hindsight, without ever setting forth concrete predictions that can be overturned by evidence, you run the risk of never forming a model concrete enough to be engaged with. I don't believe it's a coincidence that you called it "difficult" to explain why you found the paper unimpressive: it's because your standards of impressiveness are opaque enough that they don't, in and of themselves, constitute a model of how transformers might/might not possess general reasoning ability.

Comment by dxu on GPT-3: a disappointing paper · 2020-06-02T17:49:33.971Z · LW · GW

Also note that a significant number of humans would fail the kind of test you described (inducing the behavior of a novel mathematical operation from a relatively small number of examples), which is why similar tests of inductive reasoning ability show up quite often on IQ tests and the like. It's not the case that failing at that kind of test shows a lack of general reasoning skills, unless we permit that a substantial fraction of humans lack general reasoning skills to at least some extent.

Comment by dxu on GPT-3: a disappointing paper · 2020-05-31T02:12:32.804Z · LW · GW

I don't think the practical value of very new techniques is impossible to estimate. For example, the value of BERT was very clear in the paper that introduced it: it was obvious that this was a strictly better way to do supervised NLP, and it was quickly and widely adopted.

This comparison seems disingenuous. The goal of the BERT paper was to introduce a novel training method for Transformer-based models that measurably outperformed previous training methods. Conversely, the goal of the GPT-3 paper seems to be to investigate the performance of an existing training method when scaled up to previously unreached (and unreachable) model sizes. I would expect you to agree that these are two very different things, surely?

More generally, it seems to me that you've been consistently conflating the practical usefulness of a result with how informative said result is. Earlier, you wrote that "few-shot LM prediction" (not GPT-3 specifically, few-shot prediction in general!) doesn't sound that promising to you because the specific model discussed in the paper doesn't outperform SOTA on all benchmarks, and also requires currently impractical levels of hardware/compute. Setting aside the question of whether this original claim resembles the one you just made in your latest response to me (it doesn't), neither claim addresses what, in my view, are the primary implications of the GPT-3 paper--namely, what it says about the viability of few-shot prediction as model capacity continues to increase.

This, incidentally, is why I issued the "smell test" described in the grandparent, and your answer more or less confirms what I initially suspected: the paper comes across as unsurprising to you because you largely had no concrete predictions to begin with, beyond the trivial prediction that existing trends will persist to some (unknown) degree. (In particular, I didn't see anything in what you wrote that indicates an overall view of how far the capabilities of current language models are from human reasoning ability, and what that might imply about where model performance might start flattening with increased scaling.)

Since it doesn't appear that you had any intuitions to begin with about what GPT-3's results might indicate about the scalability of language models in general, it makes sense that your reading of the paper would be framed in terms of practical applications, of which (quite obviously) there are currently none.

Comment by dxu on Draconarius's Shortform · 2020-05-30T20:01:02.374Z · LW · GW

If the number of guests is countable (which is the usual assumption in Hilbert’s setup), then every guest will only have to travel a finite (albeit unboundedly long) distance before they reach their room.
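As a quick sanity check of the "finite but unbounded" claim, consider the standard move where the guest in room n relocates to room 2n (a toy illustration of the setup, not anything from the original post):

```python
# In the standard Hilbert's-hotel move, the guest in room n goes to
# room 2n, freeing up all the odd-numbered rooms. Each guest therefore
# travels 2n - n = n rooms: a finite distance for every individual
# guest, but with no single bound that covers all guests at once.
def travel_distance(n):
    return 2 * n - n  # new room minus old room

distances = [travel_distance(n) for n in range(1, 6)]
print(distances)  # [1, 2, 3, 4, 5]
```

The same pattern holds for any injective room-reassignment on the naturals: each guest's trip is finite, even though the trips grow without bound across guests.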

Comment by dxu on GPT-3: a disappointing paper · 2020-05-30T19:31:37.868Z · LW · GW

What do you think that main significance is?

I can’t claim to speak for gwern, but as far as significance goes, Daniel Kokotajlo has already advanced a plausible takeaway. Given that his comment is currently the most highly upvoted comment on this post, I imagine that a substantial fraction of people here share his viewpoint.

Given my past ML experience, this just doesn't sound that promising to me, which may be our disconnect.

I strongly suspect the true disconnect comes a step before this conclusion: namely, that “[your] past ML experience” is all that strongly predictive of performance using new techniques. A smell test: what do you think your past experience would have predicted about the performance of a 175B-parameter model in advance? (And if the answer is that you don’t think you would have had clear predictions, then I don’t see how you can justify this “review” of the paper as anything other than hindsight bias.)

Comment by dxu on AGIs as collectives · 2020-05-29T02:33:46.320Z · LW · GW
  • "There seems to be no reason not to expect that human value functions have similar problems, which even "aligned" AIs could trigger unless they are somehow designed not to." There are plenty of reasons to think that we don't have similar problems - for instance, we're much smarter than the ML systems on which we've seen adversarial examples. Also, there are lots of us, and we keep each other in check.
  • "For example, such AIs could give humans so much power so quickly or put them in such novel situations that their moral development can't keep up, and their value systems no longer apply or give essentially random answers." What does this actually look like? Suppose I'm made the absolute ruler of a whole virtual universe - that's a lot of power. How might my value system "not keep up"?

I confess to being uncertain of what you find confusing/unclear here. Think of any subject you currently have conflicting moral intuitions about (do you have none?), and now imagine being given unlimited power without being given the corresponding time to sort out which intuitions you endorse. It seems quite plausible to me that you might choose to do the wrong thing in such a situation, which could be catastrophic if said decision is irreversible.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T04:47:27.956Z · LW · GW

"Will it happen?" isn't vacuous or easy, generally speaking. I can think of lots of questions where I have no idea what the answer is, despite a "trend of ever increasing strength".

In the post, you write:

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans at increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

"Will it happen?" is easy precisely in cases where a development "seems inevitable"; the hard part then becomes forecasting when such a development will occur. The fact that you (and most computer Go experts, in fact) did not do this is a testament to how unpredictable conceptual advances are, and your attempt to reduce it to the mere continuation of a trend is an oversimplification of the highest order.

I've made specific statements about my beliefs for when Human-Level AI will be developed. If you disagree with these predictions, please state your own.

You've made statements about your willingness to bet at non-extreme odds over relatively large chunks of time. This indicates both low confidence and low granularity, which means that there's very little disagreement to be had. (Of course, I don't mean to imply that it's possible to do better; indeed, given the current level of uncertainty surrounding everything to do with AI, about the only way to get me to disagree with you would have been to provide a highly confident, specific prediction.)

Nevertheless, it's an indicator that you do not believe you possess particularly reliable information about future advances in AI, so I remain puzzled that you would present your thesis so strongly at the start. In particular, your claim that the following questions

Does this mean that the development of human-level AI might not surprise us? Or that by the time human level AI is developed it will already be old news?

depend on

whether or not you were surprised by the development of Alpha-Go

seems to have literally no connection to what you later claim, which is that AlphaGo did not surprise you because you knew something like it had to happen at some point. What is the relevant analogy here to artificial general intelligence? Will artificial general intelligence be "old news" because we suspected from the start that it was possible? If so, what does it mean for something to be "old news" if you have no idea when it will happen, and could not have predicted it would happen at any particular point until after it showed up?

As far as I can tell, reading through both the initial post and the comments, none of these questions have been answered.

Comment by dxu on AI Boxing for Hardware-bound agents (aka the China alignment problem) · 2020-05-09T00:36:36.234Z · LW · GW

If, on the one hand, you had seen that since the 1950's computer AIs had been capable of beating humans at increasingly difficult games and that progress in this domain had been fairly steady and mostly limited by compute power. And moreover that computer Go programs had themselves gone from idiotic to high-amateur level over a course of decades, then the development of alpha-go (if not the exact timing of that development) probably seemed inevitable.

This seems to entirely ignore most (if not all) of the salient implications of AlphaGo's development. What set AlphaGo apart from previous attempts at computer Go was the iterated distillation and amplification scheme employed during its training. This represents a genuine conceptual advance over previous approaches, and to characterize it as simply a continuation of the trend of increasing strength in Go-playing programs only works if you neglect to define said "trend" in any way more specific than "roughly monotonically increasing". And if you do that, you've tossed out any and all information that would make this a useful and non-vacuous observation.

Shortly after this paragraph, you write:

For the record, I was surprised at how soon Alpha-Go happened, but not that it happened.

In other words, you got the easy and useless part ("will it happen?") right, and the difficult and important part ("when will it happen?") wrong. It's not clear to me why you feel this necessitated mention at all, but since you did mention it, I feel obligated to point out that "predictions" of this caliber are the best you'll ever be able to do if you insist on throwing out any information more specific and granular than "historically, these metrics seem to move consistently upward/downward".

Comment by dxu on Failures in technology forecasting? A reply to Ord and Yudkowsky · 2020-05-09T00:09:51.563Z · LW · GW

If it were common knowledge that any hyperbolic language experts use when speaking about the unlikelihood of AGI (e.g. Andrew Ng's statement "worrying about AI safety is like worrying about overpopulation on Mars") actually corresponded to a 10% subjective probability of AGI, things would look very different than they currently do.

More generally, on a strategic level there is very little difference between a genuinely incorrect forecast and one that is "correct", but communicated so poorly as to create a wrong impression in the mind of the listener. If the state of affairs is such that anyone who privately believes there is a 10% chance of AGI is incentivized to instead report their assessment as "remote", the conclusion of Ord/Yudkowsky holds, and it remains impossible to discern whether AGI is imminent by listening to expert forecasts.

(I also don't believe that said experts, if asked to translate their forecasts to numerical probabilities, would give a median estimate anywhere near as high as 10%, but that's largely tangential to the discussion at hand.)

More importantly, however: I deny that Fermi's 10% somehow detracts from the point that forecasting the future of novel technologies is hard.

Four years prior to overseeing the world's first nuclear reaction, Fermi believed that it was more likely than not that a nuclear chain reaction was impossible. Setting aside for a moment the question of whether Fermi's specific probability assignment was negligible, or merely small, what this indicates is that the majority of the information necessary to determine the possibility of a nuclear chain reaction was in fact unavailable to Fermi at the time he made his forecast. This does not support the idea that making predictions about technology is easy, any more than it would have if Fermi had assigned 0.001% instead of 10%!

More generally, the specific probability estimate Fermi gave is nothing more than a red herring, one that is given undue attention by the OP. The relevant factor to Ord/Yudkowsky's thesis is how much uncertainty there is in the probability distribution of a given technology--not whether the mean of said distribution, when treated as a point estimate, happens to be negligible or non-negligible. Focusing too much on the latter not only obfuscates the correct lesson to be learned, but also sometimes leads to nonsensical results.

Comment by dxu on Being right isn't enough. Confidence is very important. · 2020-04-07T21:25:51.648Z · LW · GW

The original post wasn’t talking about “correctness”; it was talking about calibration, which is a very specific term with a very specific meaning. Machines one and two are both well-calibrated, but there is nothing requiring that two well-calibrated distributions must perform equally well against each other in a series of bets.

Indeed, this is the very point of the original post, so your comment attempting to contradict it did not, in fact, do so.
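The point is easy to verify numerically. Below is a minimal sketch (not from the original exchange; the setup is illustrative) of two perfectly calibrated forecasters, one of whom knows only the base rate while the other also sees an informative signal. Both are calibrated, yet the sharper forecaster systematically wins when they bet against each other at a price halfway between their stated probabilities:

```python
import random

random.seed(0)

def simulate(n=100_000):
    """Forecaster A states the base rate (0.5); forecaster B also sees
    a signal that shifts the true probability to 0.9 or 0.1. Both are
    perfectly calibrated. They trade a contract paying 1 if the event
    occurs, priced at the midpoint of their stated probabilities."""
    profit_b = 0.0
    for _ in range(n):
        signal = random.random() < 0.5        # hidden state, 50/50
        p_true = 0.9 if signal else 0.1       # true chance of the event
        outcome = random.random() < p_true
        p_a, p_b = 0.5, p_true                # both calibrated forecasts
        mid = (p_a + p_b) / 2                 # agreed betting price
        if p_b > mid:                         # B thinks contract is cheap: buy
            profit_b += (1.0 if outcome else 0.0) - mid
        else:                                 # B thinks it's dear: sell
            profit_b += mid - (1.0 if outcome else 0.0)
    return profit_b / n

# B's average profit per bet is close to the theoretical 0.2,
# even though A's forecasts are just as well-calibrated as B's.
print(simulate())
```

Calibration constrains only the relationship between stated probabilities and long-run frequencies; it says nothing about how much information a forecast incorporates, and it is the information that wins bets.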

Comment by dxu on Predictors exist: CDT going bonkers... forever · 2020-01-15T22:14:38.523Z · LW · GW

these examples can't actually happen, or are so rare that I'll pay that cost in order to have a simpler model for the other 99.9999% of my decisions

Indeed, if it were true that Newcomb-like situations (or more generally, situations where other agents condition their behavior on predictions of your behavior) do not occur with any appreciable frequency, there would be much less interest in creating a decision theory that addresses such situations.

But far from constituting a mere 0.0001% of possible situations (or some other, similarly minuscule percentage), Newcomb-like situations are simply the norm! Even in everyday human life, we frequently encounter other people and base our decisions off what we expect them to do—indeed, the ability to model others and act based on those models is integral to functioning as part of any social group or community. And it should be noted that humans do not behave as causal decision theory predicts they ought to—we do not betray each other in one-shot prisoner’s dilemmas, we pay people we hire (sometimes) well in advance of them completing their job, etc.

This is not mere “irrationality”; otherwise, there would have been no reason for us to develop these kinds of pro-social instincts in the first place. The observation that CDT is inadequate is fundamentally a combination of (a) the fact that it does not accurately predict certain decisions we make, and (b) the claim that the decisions we make are in some sense correct rather than incorrect—and if CDT disagrees, then so much the worse for CDT. (Specifically, the sense in which our decisions are correct—and CDT is not—is that our decisions result in more expected utility in the long run.)

All it takes for CDT to fail is the presence of predictors. These predictors don’t have to be Omega-style superintelligences—even moderately accurate predictors who perform significantly (but not ridiculously) above random chance can create Newcomb-like elements with which CDT is incapable of coping. I really don’t see any justification at all for the idea that these situations somehow constitute a superminority of possible situations, or (worse yet) that they somehow “cannot” happen. Such a claim seems to be missing the forest for the trees: you don’t need perfect predictors to have these problems show up; the problems show up anyway. The only purpose of using Omega-style perfect predictors is to make our thought experiments clearer (by making things more extreme), but they are by no means necessary.
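The claim that modest accuracy suffices can be checked directly with the standard Newcomb payoffs ($1,000 in the transparent box, $1,000,000 in the opaque box). This is a sketch of that arithmetic, not anything from the original comment; solving for the break-even point gives an accuracy of p = 1,001,000/2,000,000 = 0.5005, barely above chance:

```python
def one_box_ev(p):
    """Expected payoff for one-boxing, given predictor accuracy p.
    If the prediction is correct (prob. p), the opaque box is full."""
    return p * 1_000_000

def two_box_ev(p):
    """Expected payoff for two-boxing, given predictor accuracy p.
    Correct prediction (prob. p): opaque box empty, take $1,000.
    Incorrect (prob. 1 - p): take both boxes, $1,001,000."""
    return p * 1_000 + (1 - p) * 1_001_000

# One-boxing dominates as soon as the predictor beats 50.05% accuracy.
for p in (0.5, 0.6, 0.9):
    print(p, one_box_ev(p) > two_box_ev(p))
```

So a predictor who is right 51% of the time already makes one-boxing the higher-expected-utility choice; nothing Omega-like is required.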

Comment by dxu on Realism about rationality · 2020-01-14T23:35:31.356Z · LW · GW

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing.

In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

Comment by dxu on Realism about rationality · 2020-01-13T23:33:17.345Z · LW · GW

Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum?

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.