Goodhart's Curse and Limitations on AI Alignment 2019-08-19T07:57:01.143Z · score: 13 (6 votes)
G Gordon Worley III's Shortform 2019-08-06T20:10:27.796Z · score: 7 (1 votes)
Scope Insensitivity Judo 2019-07-19T17:33:27.716Z · score: 19 (9 votes)
Robust Artificial Intelligence and Robust Human Organizations 2019-07-17T02:27:38.721Z · score: 17 (7 votes)
Whence decision exhaustion? 2019-06-28T20:41:47.987Z · score: 17 (4 votes)
Let Values Drift 2019-06-20T20:45:36.618Z · score: 3 (11 votes)
Say Wrong Things 2019-05-24T22:11:35.227Z · score: 89 (34 votes)
Boo votes, Yay NPS 2019-05-14T19:07:52.432Z · score: 34 (11 votes)
Highlights from "Integral Spirituality" 2019-04-12T18:19:06.560Z · score: 21 (20 votes)
Parfit's Escape (Filk) 2019-03-29T02:31:42.981Z · score: 40 (15 votes)
[Old] Wayfinding series 2019-03-12T17:54:16.091Z · score: 9 (2 votes)
[Old] Mapmaking Series 2019-03-12T17:32:04.609Z · score: 9 (2 votes)
Is LessWrong a "classic style intellectual world"? 2019-02-26T21:33:37.736Z · score: 30 (7 votes)
Akrasia is confusion about what you want 2018-12-28T21:09:20.692Z · score: 18 (15 votes)
What self-help has helped you? 2018-12-20T03:31:52.497Z · score: 34 (11 votes)
Why should EA care about rationality (and vice-versa)? 2018-12-09T22:03:58.158Z · score: 16 (3 votes)
What precisely do we mean by AI alignment? 2018-12-09T02:23:28.809Z · score: 29 (8 votes)
Outline of Metarationality, or much less than you wanted to know about postrationality 2018-10-14T22:08:16.763Z · score: 19 (17 votes)
HLAI 2018 Talks 2018-09-17T18:13:19.421Z · score: 15 (5 votes)
HLAI 2018 Field Report 2018-08-29T00:11:26.106Z · score: 49 (20 votes)
A developmentally-situated approach to teaching normative behavior to AI 2018-08-17T18:44:53.515Z · score: 12 (5 votes)
Robustness to fundamental uncertainty in AGI alignment 2018-07-27T00:41:26.058Z · score: 7 (2 votes)
Solving the AI Race Finalists 2018-07-19T21:04:49.003Z · score: 27 (10 votes)
Look Under the Light Post 2018-07-16T22:19:03.435Z · score: 25 (11 votes)
RFC: Mental phenomena in AGI alignment 2018-07-05T20:52:00.267Z · score: 13 (4 votes)
Aligned AI May Depend on Moral Facts 2018-06-15T01:33:36.364Z · score: 9 (3 votes)
RFC: Meta-ethical uncertainty in AGI alignment 2018-06-08T20:56:26.527Z · score: 18 (5 votes)
The Incoherence of Honesty 2018-06-08T02:28:59.044Z · score: 22 (12 votes)
Safety in Machine Learning 2018-05-29T18:54:26.596Z · score: 17 (4 votes)
Epistemic Circularity 2018-05-23T21:00:51.822Z · score: 5 (1 votes)
RFC: Philosophical Conservatism in AI Alignment Research 2018-05-15T03:29:02.194Z · score: 29 (10 votes)
Thoughts on "AI safety via debate" 2018-05-10T00:44:09.335Z · score: 33 (7 votes)
The Leading and Trailing Edges of Development 2018-04-26T18:02:23.681Z · score: 24 (7 votes)
Suffering and Intractable Pain 2018-04-03T01:05:30.556Z · score: 13 (3 votes)
Evaluating Existing Approaches to AGI Alignment 2018-03-27T19:57:39.207Z · score: 22 (5 votes)
Evaluating Existing Approaches to AGI Alignment 2018-03-27T19:55:57.000Z · score: 0 (0 votes)
Idea: Open Access AI Safety Journal 2018-03-23T18:27:01.166Z · score: 64 (20 votes)
Computational Complexity of P-Zombies 2018-03-21T00:51:31.103Z · score: 3 (4 votes)
Avoiding AI Races Through Self-Regulation 2018-03-12T20:53:45.465Z · score: 6 (3 votes)
How safe "safe" AI development? 2018-02-28T23:21:50.307Z · score: 27 (10 votes)
Self-regulation of safety in AI research 2018-02-25T23:17:44.720Z · score: 33 (10 votes)
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation 2018-02-23T21:42:20.604Z · score: 15 (4 votes)
AI Alignment and Phenomenal Consciousness 2018-02-23T01:21:36.808Z · score: 10 (2 votes)
Formally Stating the AI Alignment Problem 2018-02-19T19:07:14.000Z · score: 0 (0 votes)
Formally Stating the AI Alignment Problem 2018-02-19T19:06:04.086Z · score: 14 (6 votes)
Bayes Rule Applied 2018-02-16T18:30:16.470Z · score: 12 (3 votes)
Introduction to Noematology 2018-02-05T23:28:32.151Z · score: 11 (4 votes)
Form and Feedback in Phenomenology 2018-01-24T19:42:30.556Z · score: 29 (6 votes)
Book Review: Why Buddhism Is True 2018-01-15T20:54:37.431Z · score: 23 (9 votes)
Methods of Phenomenology 2017-12-30T18:42:03.513Z · score: 6 (2 votes)


Comment by gworley on Two senses of “optimizer” · 2019-08-22T10:53:13.771Z · score: 2 (1 votes) · LW · GW

I think the only question is how leaky, but it is always leaky to some non-zero degree, which is why Bostrom and others are concerned about this for all optimizers and don't bother to make the distinction.

Comment by gworley on Two senses of “optimizer” · 2019-08-22T10:50:59.815Z · score: 2 (1 votes) · LW · GW

Yes, and I'm saying that's not possible. Every optimizer_1 is an optimizer_2.

Comment by gworley on Two senses of “optimizer” · 2019-08-22T08:34:36.743Z · score: 2 (1 votes) · LW · GW

Sure. Not making the distinction seems important, though, because this post seems to be leaning towards rejecting arguments that depend on noticing that the distinction is leaky. Making it is okay so long as you understand it as "optimizer_1 is a way of looking at things that screens off many messy details of the world so I can focus on only the details I care about right now", but if it becomes conflated with "and if something is an optimizer_1 I don't have to worry about the way it is also an optimizer_2" then that's dangerous.

The author of the post suggests it's a problem that "some arguments related to AI safety that seem to conflate these two concepts". I'd say they don't conflate them, but understand that every optimizer_1 is an optimizer_2.

Comment by gworley on Two senses of “optimizer” · 2019-08-21T21:20:49.201Z · score: 7 (5 votes) · LW · GW

Sure, let's be super specific about it.

Let's say we have something you consider an optimizer_1: a SAT solver. It operates over a set of variables V arranged in predicates P using an algorithm A. Since this is a real SAT solver that is computed, rather than a purely mathematical one we think about, it runs on some computer C, and thus for each of V, P, and A there is some C(V), C(P), and C(A) that is the manifestation of each on the computer. We can conceptualize what C does to V, P, and A in different ways: it turns them into bytes, it turns A into instructions, it uses C(A) to operate on C(V) and C(P) to produce a solution for V and P.

Now the intention is that the algorithm A is an optimizer_1 that only operates on V and P, but in fact A is never run, properly speaking; C(A) is, and we can only say A is run to the extent that what C(A) does to reality can be put in isomorphism with A. So C(A) is only an optimizer_1 to the extent the isomorphism holds, and it is, as you defined optimizer_1, "solving a computational optimization problem". But properly speaking C(A) doesn't "know" it's an algorithm: it's just matter arranged in a way that is isomorphic, via some transformation, to A.

So what is C(A) doing then to produce a solution? Well, I'd say it "optimizes its environment", that is literally the matter and its configuration that it is in contact with, so it's an optimizer_2.

You might object that there's something special going on here such that C(A) is still an optimizer_1 because it was set up in a way that isolates it from the broader environment so it stays within the isomorphism. But that's not a matter of classification; that's an engineering problem of making an optimizer_2 behave as if it were an optimizer_1. And a large chunk of AI safety (mostly boxing) is dealing with the ways in which, even if we can make something safe in optimizer_1 terms, it may still be dangerous as an optimizer_2 because of unexpected behavior where it "breaks" the isomorphism, doing something that might still keep the isomorphism intact but also does other things you didn't think it would do if the isomorphism were strict.

Put pithily, there's no free lunch when it comes to isomorphisms that allow you to manifest your algorithms to compute them, so you have to worry about the way they are computed.

Comment by gworley on G Gordon Worley III's Shortform · 2019-08-21T19:31:50.784Z · score: 9 (4 votes) · LW · GW

Hmm, I feel like there's multiple things going on here, but I think it hinges on this:

Yes, the method requires temporarily suspending episteme-based reasoning and engaging with less conceptual forms of seeing. But it can still be justified and explained using episteme-based models; if it could not, there would be little reason to expect that it would be worth engaging with.

Different traditions vary on how much they emphasize models and episteme. None of them completely ignores it, though; they only seek to keep it in its proper place. It's not that episteme is useless, only that it is not primary. You of course should include it, because it's part of the world, and to deny it would lead to confusion and suffering. As you note with your first example especially, some people learn to turn off the discriminating mind rather than hold it as object, and they are worse for it because then they can't engage with it anymore. Turning it off is only something you could safely do if you really had become so enlightened that you had no shadow and would never accumulate any additional shadow, and even then it seems strange from where I stand, though maybe it would make sense to me if I were in a position where it were a reasonable and safe option.

So to me this reads like an objection to a position I didn't mean to take. I mean to say that episteme has a place and is useful, but it is not taken as primary to understanding. At some points Buddhist episteme will say contradictory things; that's fine and expected, because dharma episteme is normally post hoc rather than ante hoc (though it is still expected to be rational right up until it is forced to hit a contradiction), and ante hoc is okay so long as it is later verified via gnosis or techne.

Comment by gworley on Two senses of “optimizer” · 2019-08-21T17:50:27.554Z · score: 5 (7 votes) · LW · GW

This looks like a Cartesian distinction that exists only by virtue of not fully considering the embeddedness of the optimizer.

It only seems that the domain of an optimizer_1 cannot optimize or affect the environment like an optimizer_2 because you are thinking of it as operating in mathematical, ideal terms, rather than as a real system that runs on a computer by doing physics and interacting with the world. An optimizer_1 can smoothly turn into an optimizer_2 in at least two ways. One is via unintended side effects. Another is via scope creep. There is no clean, bright line separating the domain of the optimizer_1 from the rest of reality, and in fact it was always an optimizer_2, just only looking at a narrow slice of the world because you put up some guardrails to keep it there.

The worry is what happens when it jumps the guardrails, or the guardrails fail.

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T17:36:47.463Z · score: 5 (2 votes) · LW · GW

Right: if you don't have a measure, you can't have Goodhart's curse on technical grounds, but I'm also pretty sure something like it is still there. It's just that, as far as I know, no one has tried to show that something like the optimizer's curse continues to function when you only have an ordering and not a measure. I think it does, and I think others think it does, and this is part of the generalization to Goodharting, but I don't know that a formal proof demonstrating it has been produced, even though I strongly suspect it's true.

Comment by gworley on Odds are not easier · 2019-08-21T13:11:54.261Z · score: 4 (3 votes) · LW · GW

I've always found the notion that "odds are easier" puzzling. I'm not sure who they are easier for; I find reasoning about betting odds confusing and unintuitive. I have a clear feel for what a probability of 0.25 is. I don't have one for what 1:3 means. Maybe most people have greater experience with gambling than I do?
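For what it's worth, the two notations encode the same information; here is a quick sketch of the conversion (the helper names are my own, not standard):

```python
from fractions import Fraction

def odds_to_prob(for_, against):
    """Odds of for_:against as a probability: for_ / (for_ + against)."""
    return Fraction(for_, for_ + against)

def prob_to_odds(p):
    """A probability as reduced for:against odds."""
    p = Fraction(p)
    return p.numerator, p.denominator - p.numerator

print(odds_to_prob(1, 3))            # 1/4, i.e. a probability of 0.25
print(prob_to_odds(Fraction(1, 4)))  # (1, 3), i.e. odds of 1:3
```

So 1:3 just means one part "for" to three parts "against", four parts in all.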

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T13:05:41.484Z · score: 2 (1 votes) · LW · GW

Hmm, maybe you are misunderstanding how the optimizer's curse works? It's powered by selecting on a measure that contains error: because we pick the action whose measured value is highest, we preferentially pick actions whose measurement error is positive, so the chosen action's measure is on average higher than its true value. You are mistaken, then, not to care about E, because E is the only reliable and comparable way you have to check whether C satisfies T (if there's another one that's reliable and comparable, use it instead). It's literally the only option, assuming you picked the "best" E (another chance for Goodhart's curse to bite you), for picking a C_max that seems better, unless you want very high quantilization such that, say, you only act when things appear orders of magnitude better, with error bounds small enough that you will only be wrong once in trillions of years.

Comment by gworley on G Gordon Worley III's Shortform · 2019-08-21T12:54:12.964Z · score: 25 (10 votes) · LW · GW

Some thoughts on Buddhist epistemology.

This risks being threatening, upsetting, and heretical within a certain point of view I commonly see expressed on LW for reasons that will become clear if you keep reading. I don't know if that means you shouldn't read this if that sounds like the kind of thing you don't want to read, but I put it out there so you can make the choice without having to engage in the specifics if you don't want to. I don't think you will be missing out on anything if that warning gives you a tinge of "maybe I won't like reading this".

My mind produces a type error when people try to perform deep and precise epistemic analysis of the dharma. That is, when they try to evaluate the truth of claims made by the dharma this seems generally fine, but when they go deep enough that they end up trying to evaluate whether the dharma itself is based on something true, I get the type error.

I'm not sure what people trying to do this turn up. My expectation is that their results look like noise if you aggregate over all such attempts. The reason is that the dharma is not founded on episteme.

As a quick reminder, there are at least three categories of knowledge worth considering: doxa, episteme, and gnosis. Doxa might translate as "hearsay" in English; it is knowledge received as statements about the truth. Episteme is knowledge you come to believe via evaluation of its truth. Gnosis is direct knowledge of reality, unmediated by ontology. To this I'll also distinguish techne from episteme, the former being experienced knowledge and the latter being reasoned knowledge.

I'll make the probably not very bold claim that most LW rationalists value episteme above all else, accept techne as evidence, accept doxa as evidence about evidence and only weak evidence of truth itself, and mostly ignore gnosis because it is not "rational" in the sense that it cannot be put into words and it can only be pointed at by words and so cannot be analyzed because there is no ontology or categorization to allow making claims one way or the other about it.

Buddhist philosophy values gnosis above all else, then techne, then doxa, then episteme.

To say a little more, the most important thing in Buddhist thinking is seeing reality just as it is, unmediated by the "thinking" mind, by which we really mean the acts of discrimination, judgement, categorization, and ontology. To be sure, this "reality" is not external reality, which we never get to see directly, but rather our unmediated contact with it via the senses. But for all the value of gnosis, unless you plan to sit on a lotus flower in perfect equanimity forever and never act in the world, it's not enough. Techne is the knowledge we gain through action in the world, and although it does pass judgement and discriminate, it also stays close to the ground and makes few claims. It is deeply embodied in action itself.

I'd say doxa comes next because there is a tradition of passing on the words of enlightened people as they said them and acting, at least some of the time, as if they were 100% true. Don't confuse this for just letting anything in, though: the point is to trust the words of those who have come before and seen more than you, since doing so is often very helpful for learning to see for yourself what was previously invisible. But it is always an action you do yourself, not contingent on the teachings, since those only pointed you towards where to look and always failed to put into words (because it was impossible) what you would find. The old story is that the Buddha, when asked why he should be believed, said: don't; try it for yourself and see what you find.

Episteme is last, and that's because it's not to be trusted. Of all the ways of knowing, episteme is the least grounded in reality. This should not be surprising, but it might be, so I'll say a bit about it. Formal methods are not grounded. There's a reason the grounding problem, epistemic circularity, the problem of the criterion, the problem of finding the universal prior, etc. remain fundamentally unsolved: they are unsolvable in a complete and adequate way. Instead we get pragmatic solutions that cross the chasm between reality and belief, between noumena and phenomena, between the ontic and ontology, and this leap of faith means episteme is always contingent on that leap. Even when episteme proves things, we must be careful: because it's not grounded, we have to check everything it produces by other means. That means going all the way down to gnosis if possible, and to techne at the least.

None of this is to say that episteme is not useful for many things, including making predictions, but we hold it at arm's length because of its powerful ability to confuse us if we didn't happen to make the right leaps where we pragmatically had to. It also always leaves something out, because it requires distinctions to function, so it is always less complete. At the same time, it often makes predictions that turn out to be true, and the world is better for our powerful application of it. We just have to keep in mind what it is, what it can do, and what its dangers are, and engage with it in a thoughtful, careful way to avoid getting lost and confusing our perception of reality for reality itself.

So when we talk about the dharma or justify our actions on it, it's worth noting that it is not really trying to provide consistent episteme. It's grounded on gnosis and techne, presented via doxa, and only after the fact might we try to extend it via episteme to get an idea of where to look to understand it better. Thus it's a strange inversion to ask the dharma for episteme-based proofs. It can't give them, nor does it try, because its episteme is not consistent and cannot be because it chooses completeness instead.

So where does this leave us? If you want to evaluate the dharma, you'll have to do it yourself. You can't argue your way to it or reason it out; you have to sit down and look at the nature of reality without conceptualizing it. Maybe that means you won't engage with it, since it doesn't choose to accept the framing of episteme. That seems fine if you are so inclined. But then don't be surprised if the dharma claims you are closed-minded, if you feel like it attacks your identity, and if it feels just true enough that you can't easily dismiss it out of hand although you might like to.

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-21T11:51:08.346Z · score: 2 (1 votes) · LW · GW

Lack of access to perfect information is highly relevant because it's exactly why we can't get around the curse. If we had perfect information we could correct for it as a systematic bias using Bayesian methods and be done with it. It's also why it shows up in the first place: if we could establish a measure E that accurately reported the amount to which C satisfied T, it wouldn't happen, because there would be no error in the measurement.
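To illustrate that Bayesian correction, here is a sketch under conditions the comment says we don't actually have: a known prior over true values and a known noise model. Shrinking each noisy estimate toward the prior mean before reading off the winner's value removes the systematic bias (the constants and setup are made up for illustration):

```python
import random

random.seed(2)
PRIOR_VAR, NOISE_VAR = 1.0, 1.0
N_CHOICES, N_TRIALS = 10, 10_000
SHRINK = PRIOR_VAR / (PRIOR_VAR + NOISE_VAR)  # posterior weight on the estimate

naive_gap = corrected_gap = 0.0
for _ in range(N_TRIALS):
    # True values drawn from the (known) prior; estimates add Gaussian noise.
    true_vals = [random.gauss(0, PRIOR_VAR ** 0.5) for _ in range(N_CHOICES)]
    estimates = [t + random.gauss(0, NOISE_VAR ** 0.5) for t in true_vals]
    best = max(range(N_CHOICES), key=lambda i: estimates[i])  # pick C_max by raw E
    naive_gap += estimates[best] - true_vals[best]
    corrected_gap += SHRINK * estimates[best] - true_vals[best]

print(f"naive bias of E(C_max): {naive_gap / N_TRIALS:+.3f}")    # clearly positive
print(f"bias after shrinkage:   {corrected_gap / N_TRIALS:+.3f}")  # near zero
```

Without the known prior and noise variance there is nothing to shrink toward, which is the point: imperfect information is what keeps the curse in play.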

What you are proposing about allowing targets to be exceeded is simply allowing for more mild optimization, and the optimizer's curse still happens if there is preferential choice at all.

Comment by gworley on Davis_Kingsley's Shortform · 2019-08-21T11:45:48.550Z · score: 11 (3 votes) · LW · GW

Also, I slightly worry that what you're seeing is skewed because the kind of people who are willing to try polyamory are unusually open about their personal lives. That is, I think there's a lot of drama going on with monogamy, too, and it just happens that people who choose to be monogamous also have a culture that better keeps drama secret until it is too big to keep secret anymore, so it simply looks more common in polyamory because it is less hidden.

Comment by gworley on Davis_Kingsley's Shortform · 2019-08-20T23:40:29.201Z · score: 6 (3 votes) · LW · GW

I'm pretty willing to believe you on polyphasic sleep. I've seen lots of people try it and lots of people give up on it because they can't make it work. It also seems like it's hard to get right and complicates life in ways that aren't worth the tradeoffs for most people: maybe they get more productive waking hours, but at the cost of a lifestyle that makes it difficult or impossible to do some things they would like to do.

I have a harder time believing you on polyamory because lots of people seem to keep choosing it. Yes, some people try it and don't like it, and yes some people seem to tolerate it because they feel they must or else date no one at all, but that seems not much different than the situation where monogamy is the default: some people try it and don't like it and some people do it even though they dislike it while doing it because otherwise they can't get other things they want.

I'm not really polyamorous myself, but when I look around and talk to people I see people suffering in polyamorous relationships, but no more or less than people suffering in monogamous relationships. The narratives around the suffering and its causes are different in the two cases, but they seem similar in character and magnitude to me. Relevant to this claim: I spent several years married before getting divorced, consequently spent a lot of time while married around other couples, and so have about equal amounts of experience seeing close up how monogamous people interact and suffer and, from within the rationalist community, seeing close up how polyamorous people interact and suffer.

So my experience tells me to screen off relationship style as not much relevant to the suffering, harm, badness, etc. that people experience on average. I have no doubt that for some people one is worse than the other, but when we step back and look at the whole, I don't see a clear bias that favors one over the other the way I do with polyphasic sleep.

Maybe you can say more about why you think polyamory is especially harmful?

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-20T23:23:30.414Z · score: 2 (1 votes) · LW · GW

So when they interact there are additional variables you're leaving out.

There's a target T that's the real thing you want. Then there's a function E that measures how much you expect C to achieve T. For example, maybe T is "have fun" and E is "how fun C looks". Then given a set of choices C_1, C_2, ... you choose C_max such that E(C_max) >= E(C_i) for all C_i (in normal terms, C_max = argmax E(C_i)). Unfortunately T is hidden, such that you can only check whether C satisfies T via E (well, this is not exactly true, since otherwise we might have a hard time knowing the optimizer's curse exists, but the curse would hold even if that were the case, we just might not be able to notice it; and regardless, we can't use whatever side channel this is as a measure to assess T, so we can't optimize on it).

Now since we don't have perfect information, there is some error e associated with E, so the true extent to which any C_i satisfies T is E(C_i) + e. But we picked C_max based on E alone, in the presence of this error, so C_max may not be the true max. As you say, so what, maybe that means we just don't pick the best but we pick something good. But recall that our purpose was T, not max E(C_i), so over repeated choices we will consistently, due to the optimizer's curse, pick C_max such that max E(C_i) < T (noting that's a type error as notated, but I think it's intuitive what is meant). Thus e will compound over repeated choices, since each subsequent C is conditioned on the previous ones, such that it becomes certain that E(C_max) < T and never E(C_max) = T.
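The selection bias here can be seen in a minimal simulation (a sketch, with the error e modeled as zero-mean Gaussian noise): even when every C_i satisfies T equally well, selecting the largest E makes the winning measurement look better than the truth.

```python
import random

random.seed(0)
TRUE_VALUE = 0.0  # every choice C_i satisfies T equally well
N_CHOICES, N_TRIALS = 10, 10_000

total = 0.0
for _ in range(N_TRIALS):
    # E(C_i) = true value + error e, here zero-mean Gaussian noise
    measurements = [TRUE_VALUE + random.gauss(0, 1) for _ in range(N_CHOICES)]
    total += max(measurements)  # the measured value of C_max

print(f"average E(C_max): {total / N_TRIALS:.2f}")  # well above the true value of 0
```

No single measurement is biased; the bias comes entirely from choosing the maximum.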

This might seem minor if we had only a single dimension to worry about, like "had slightly less than maximum fun", even if it did, say, result in astronomical waste. But we normally are optimizing over multiple dimensions, and each choice may fail in different ways along those different dimensions. The result is that over time we will shrink the efficiency frontier (though it might reach a limit and not get worse) and end up with worse solutions than were possible, perhaps even ones bad enough that we don't want them. After all, nothing is stopping the error from getting so large, or the frontier from shrinking so much, that we would be worse off than if we had never started.

Comment by gworley on A misconception about immigration · 2019-08-20T18:21:28.366Z · score: 2 (3 votes) · LW · GW

This doesn't exactly apply, because most welfare spending (and in fact most government spending) is leveraged against future economic growth and inflation, so it actually is possible to create self-fulfilling prophecies via stimulus spending so long as that spending eventually produces real growth. This case might or might not be the most efficient way to do that, but governments are rarely trying to maximize efficiency at the expense of all else, so we shouldn't really count that against the argument. So this would seem a neutral scenario at worst, rather than one of deadweight loss as you argue.

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-20T09:21:40.788Z · score: 2 (1 votes) · LW · GW
This feels like painting with too broad a brush, and from my state of knowledge, the assumed frame eliminates at least one viable solution. For example, can one build an AI without harmful instrumental incentives (without requiring any fragile specification of "harmful")? If you think not, how do you know that? Do we even presently have a gears-level understanding of why instrumental incentives occur?

Coincidentally, just yesterday I was part of some conversations that now make me more bullish on this approach. I haven't thought about it much in quite a while, and now I'm returning to it.

To say e.g. HCH is so likely to fail we should feel pessimistic about it, it doesn't seem to be enough to say "Goodhart's curse applies". Goodhart's curse applies when I'm buying apples at the grocery store. Why should we expect this bias of HCH to be enough to cause catastrophes, like it would for a superintelligent EU maximizer operating on an unbiased (but noisy) estimate of what we want? Some designs leave more room for correction and cushion, and it seems prudent to consider to what extent that is true for a proposed design.

It depends on how much risk you are willing to tolerate, I think. HCH applies optimization pressure, and in the limit of superintelligence I expect it to be so much optimization pressure that any deviance will grow large enough to become a problem. But a person could choose to accept that risk, paired with strategies for minimizing deviance, if they think those strategies will do enough to mitigate the worst of the effect in the limit.

As far as leaving room for correction and cushion, those also require a relatively slow takeoff, because they need time for humans to think and intervene. Since I expect takeoff to be fast, I don't expect there to be adequate time for humans in the loop to notice and correct deviance; thus any deviance that can appear late in the process is a problem in my view.

This isn't obvious to me. Mild optimization seems like a natural thing people are able to imagine doing. If I think about "kinda helping you write a post but not going all-out", the result is not at all random actions. Can you expand?

The problem with mild optimization is that it doesn't eliminate the bias that causes the optimizer's curse, only attenuates it. So unless a "mild" method can guarantee a finite bound on the amount of deviance in the limit of optimization pressure, I don't expect it to help.

Comment by gworley on Goodhart's Curse and Limitations on AI Alignment · 2019-08-19T09:46:17.959Z · score: 2 (1 votes) · LW · GW

I don't think I have anything unique to add to this discussion. Basically I defer to Eliezer and Nick (Bostrom) for written arguments, since they are largely the ones who provided the arguments that led me to strongly believe we live in a world with hard takeoff via recursive self-improvement, one that will lead to a "singularity" in the sense that we pass some threshold of intelligence/capabilities beyond which we cannot meaningfully reason about or control what happens, though we may be able to influence how it happens in ways that don't cut off the possibilities of outcomes we would be happy with.

Comment by gworley on Swimmer963's Shortform · 2019-08-19T09:39:11.580Z · score: 3 (2 votes) · LW · GW

Oh, the answer to that is pretty esoteric, which is why I was vague about it. I'll just say here that I attained to what I would call second path, but also identify with Kegan stage 5 or the "teal" level of development.

Comment by gworley on Matthew Barnett's Shortform · 2019-08-19T06:35:41.759Z · score: 3 (2 votes) · LW · GW

I happened to be looking at something else and saw this comment thread from about a month ago that is relevant to your post.

Comment by gworley on Swimmer963's Shortform · 2019-08-19T06:20:42.244Z · score: 6 (4 votes) · LW · GW

So I don't think you can go back to the way things were. The kind of thing I'm describing causing these changes is a thing that once seen cannot be unseen. You might say there's no blue pill for the kind of change of relationship to reality that made something cease to be a super powerful motivator.

What I do think you can do is find new ways to do the things you did before, if they make sense to do where you are now.

To take your case of the gym, maybe in the past you went because it was super stimulating: you were constantly satisfying many set points at once and minimizing prediction error, possibly strongly in the form of bringing your observations of yourself in line with your model of yourself by changing your body until the two matched. Or maybe it was some other specific predictions powering things.

If that's fallen away for some reason and you want to go back to the gym, I can think of at least two questions:

  • Do you really want to be going to the gym, or do you just want to want to go to the gym?
  • If you really want to go to the gym, why don't you?

I think failures of motivation often come from mixing up genuine wanting with wanting to want (and then suffering because you think you want something, don't do it, and don't realize you don't really want it). If that's the case, the problem is that you are not convinced going to the gym is worthwhile. Maybe it's not for you, I don't know. I heard all the arguments myself and didn't start going to the gym regularly (twice a week) until I was reminded a few years ago that I could go to the climbing gym and just climb walls instead of "working out". Since I find climbing fun (something we could expand on, but I'll leave it be for now), that made gym-going appealing, and even "working out" like strength training became desirable, since it makes me better at climbing and lets me have more fun doing it. To return to the gym you probably need something similar that makes it viscerally appealing.

Comment by gworley on Categorial preferences and utility functions · 2019-08-19T05:57:08.301Z · score: 2 (1 votes) · LW · GW

I think a reasonable question to ask is what value knowing the relative strength of preferences adds to the questions we want to answer. That is, I suspect the reason we normally only consider a preference ordering and not a preference measure is that the measure isn't relevant to observed behavior: we presume the most preferred available action will always be taken, so any information beyond the order is irrelevant to the decision.

I can imagine we might care about measure in cases like modeling how human preferences actually manifest, where they might have relative weights and can be updated. Personally, though, I prefer the idea of avoiding this by making updating appear immutable, conditioning each preference on the entire causal history prior to its realization, although this has its own problems.

Comment by gworley on Swimmer963's Shortform · 2019-08-18T21:59:59.979Z · score: 2 (1 votes) · LW · GW
It also just feels satisfying to make something exist, I think, to draw it out of my head in exactly the form I want– I remember getting into the same kind of flow drawing pictures or composing music, which have much less "content", and I even get some of the thing from singing.

I no longer feel this way about things, but the strong desire to make things exist because I can and it's cool is definitely what motivated me to become a programmer. Over the years I just kept at it because it was SO COOL to make the computer do what I told it, and I had all these daydreams about what I would make the computer do next. Now that it feels like there is nothing computable I can't in principle make a computer do, it's kind of boring on its own.

Comment by gworley on Swimmer963's Shortform · 2019-08-18T21:57:13.892Z · score: 2 (1 votes) · LW · GW

You might keep doing things even if they are effortless if they satisfy some other set point. After all, breathing is pretty boring to most people and you keep doing it. But once something no longer carries enough uncertainty, it will certainly become less interesting, and so maybe stop being a superstimulus.

I agree there is something where if stuff is too hard we don't seem to view it as a chance to update and instead view it as just requiring more effort than is worthwhile...except in these supercharged states where we'll put up with more to get what we want. I haven't thought much about how this might interact with naturally arising flow states people get into. My theory says that should somehow be powered by minimization of prediction error since it says everything ultimately is, I just haven't thought about how that might work.

Comment by gworley on Subagents, trauma and rationality · 2019-08-18T18:16:15.157Z · score: 10 (5 votes) · LW · GW

I find the existence of a "big T" vs. "little T" trauma distinction interesting. To me it suggests a kind of desire to moralize trauma, as if some of these are the real traumas that matter and the others are made-up ones that don't really matter. But the finding seems to be that there is no difference between them to the person who experiences them; the distinction exists to make us feel better about saying that getting picked last in gym class is functionally the same as being molested, if we react to those situations in the same way. As patients we may feel bad about comparing our "little" trauma to a "big" trauma that others agree is morally worse, and want to reject even the implied equivalency of using similar techniques to resolve both. But our brains don't seem able to tell these apart or to care about the moral aspect, so at least from the perspective of treatment it makes a lot of sense to consider them the same.

Comment by gworley on Problems in AI Alignment that philosophers could potentially contribute to · 2019-08-18T17:58:52.441Z · score: 2 (1 votes) · LW · GW

I'd add to the list the question of what hinge propositions to adopt given the pragmatic nature of epistemology as practiced by embedded agents like humans.

Disclosure: I wrote about this already (and if you check the linked paper, know there's a better version coming; it's currently under review and I'll post the preprint soonish).

Comment by gworley on Swimmer963's Shortform · 2019-08-18T17:52:31.811Z · score: 11 (5 votes) · LW · GW

So, I'll give here a possible explanation in terms of my current theory of human minds.

Humans are prediction minimizers with additional homeostatic feedback loops with set points for things like caloric intake, water intake, oxygen intake, etc.

This implies that anything super compelling in the way you describe must be super compelling from this standpoint, such that a couple of things hold true:

  • it aggressively causes you to minimize prediction error (or more precisely, minimizes the self-measured/perceived existence of prediction error)
  • it allows you to satisfy your homeostatic feedback loops
  • both of these effects are strong enough to overcome the general desire to satisfy other feedback loops

So for something to capture our heart and mind such that we're willing to do it for hours on end, day after day, it must be very strongly causing us to become less confused about the world (or, again, the world as we perceive it) and causing us to believe that many of the other homeostatic feedback loops competing for our attention are also being satisfied.

In your case, my prediction would be that your perception of the world includes many unknowns or points of confusion that you find to be resolved by writing fiction, and you believe writing to be satisfying many other important needs you have. For example, maybe writing feels like socialization because you're in (simulated) conversation with the characters, so you don't get distracted by feelings of loneliness or lack of social interaction when writing. I have no idea if that specific example applies, but it's the kind of thing my model would predict to be happening.

For comparison, I used to love playing the Civilization video games the same way you seem to love writing. I would come up with all kinds of custom scenarios to play, would find great joy in spending hours or days on a game to find out where it took me (felt like I was going somewhere), and even when the game felt like a slog I still kept going because I wanted to see how it turned out. It similarly felt opaque at the time as to why I liked doing it; it was just fun and engrossing and felt like I was doing something worthwhile.

I only stopped playing and stopped finding Civilization fun after some stuff happened in my life such that I could no longer perceive playing Civilization as something causing me to learn anything; I saw it for the game it was rather than the game I imagined it was. Specifically, Civilization, it turns out, was fun to me because I thought of it as a simulation of things I was interested in. Once I saw it was a game with specific mechanics and I could play that game directly, I became less interested and eventually stopped playing altogether, as I figured out it was just about the game and not about the content. This also turned out to be a general change: almost no game could hold my attention anymore unless the mechanics themselves were interesting; you could dress the game up with any flavor you wanted and it didn't make a difference to my level of interest. Whereas things like story, setting, and character design were interesting to me in the past, now they are just surface-level details to be seen through, because I know they don't tell me anything about the world that I care about.

My prediction, based on all this, would then be that writing may become less compelling to you if you come to believe that writing fiction isn't helping you better understand things you care about in the world or less believe it to be satisfying your other needs.

Comment by gworley on Jacob's Twit, errr, Shortform · 2019-08-18T17:33:07.613Z · score: 4 (4 votes) · LW · GW

Right. I often suspect attempts to change social equilibria are not attempts at Pareto improvements but instead trade-offs along the existing frontier that better serve some people currently underserved by the existing equilibrium. They are, of course, often sold as Pareto improvements by their supporters, both because it's not considered acceptable to argue for trade-offs that will make others worse off (unless they are undesirable others) and because supporters may innocently but motivatedly confuse them for true Pareto improvements: blindspots like the typical mind fallacy prevent them from noticing how the change would be bad for others even when it's good for them.

Comment by gworley on Matthew Barnett's Shortform · 2019-08-16T23:43:22.866Z · score: 3 (2 votes) · LW · GW

I'm somewhat sympathetic to this. You probably don't need, prior to working on AI safety, to already be familiar with the wide variety of mathematics used in ML, by MIRI, etc. To be specific, I wouldn't be much concerned if you didn't know category theory, more than basic linear algebra, how to solve differential equations, how to integrate probability distributions, or even multivariate calculus prior to starting AI safety work. But I would be concerned if you didn't have deep experience with writing mathematical proofs beyond high school geometry (although I hear these days they teach geometry differently than I learned it, by re-deriving everything in Elements), say the kind of experience you would get from studying graduate-level algebra, topology, measure theory, combinatorics, etc.

This might also be a bit of motivated reasoning on my part, echoing Dagon's comments, since I've never gone back to study category theory: I didn't learn it in school and haven't had a specific need for it. But my experience has been that having solid foundations in mathematical reasoning and proof writing is what's most valuable. The rest can, as you say, be learned lazily, since your needs will become apparent and you'll have enough mathematical fluency to find and pursue whatever fields of mathematics you discover you need to know.

Comment by gworley on Beliefs Are For True Things · 2019-08-16T23:25:58.333Z · score: 5 (5 votes) · LW · GW
Holding that beliefs are for true things means that you do not believe things because they are useful, believe things because they sound nice, or believe things because you prefer them to be true. You believe things that are true (or at least that you believe to be true, which is often the best we can get!).

This is maybe a subtle objection, but I disagree with the implicit rejection of utility in favor of truth being set up here. Truth is very attractive to us, and I think this runs deep for reasons that don't much matter here but on which I'll just say I think it's because we're fundamentally prediction error minimizers (with some homeostatic feedback loops thrown in for survival and reproduction purposes). But if I had to justify why truth is important, I would say it's because it's useful. If truth were somehow not causally upstream of making accurate predictions about the world (or maybe that's just what truth means), I don't think I would care about it, because making accurate predictions about the world is really useful to getting all the other things I care about done.

Yes, there is a danger that befalls some people when they prize utility too far above truth: it biases them in subtle and gross ways that lead them astray and actually work against them, making them serve their purposes less well when they're not looking. But there are similar dangers when people pursue truth at the expense of usefulness, mostly in the form of opportunity costs. I think we all at some point must learn to prize truth over motivated reasoning and preferences, for example, but I also think we must learn to prize the utility of truth over truth itself, lest we be enthralled by the Beast of Scrupulosity.

Comment by gworley on Eli's shortform feed · 2019-08-13T19:57:32.655Z · score: 5 (2 votes) · LW · GW

I also think it's reasonable to think that multiple things may be going on that result in a theory of mental energy. For example, hypotheses 1 and 2 could both be true and be different causes of similar behavior. I bring this up because I think of those as two different things in my experience: being "full up" and needing to allow time for memory consolidation, where I can still force my attention but it just doesn't take in new information, versus being unable to force the direction of attention generally.

Comment by gworley on Adjectives from the Future: The Dangers of Result-based Descriptions · 2019-08-12T21:31:48.292Z · score: 4 (3 votes) · LW · GW

Maybe it's because we live in a world full of these "adjectives from the future", but when I think of, for example, a "weight-loss program" I don't think the program will result in weight loss, but rather a program whose purpose is weight loss, whether or not it achieves it. Similarly with the other examples: the adjective is not describing what it will do, but what the intended purpose is.

Comment by gworley on Does human choice have to be transitive in order to be rational/consistent? · 2019-08-12T21:24:09.659Z · score: 2 (1 votes) · LW · GW

Right, it does seem that we have found ways, being bounded and irrational agents, to get closer to rationality by using our boundedness to protect ourselves from our irrationality (and vice versa!).

This seems to be a case of using boundedness to avoid the bad results of maximizing on irrational preferences: not being precise, maintaining uncertainty that is not resolved until the last moment, and probably also exhaustion (if you try to lead me through a pump, after a few steps I'll give up before you can take too much advantage of me).

The opposite would be using irrationality to deal with boundedness, such as keeping things vague so we can sometimes still do the right thing even when we've made a mistake in our reasoning about our preferences.

Comment by gworley on What explanatory power does Kahneman's System 2 possess? · 2019-08-12T21:12:36.582Z · score: 9 (5 votes) · LW · GW

As I recall, Kahneman is somewhat careful to avoid presenting S1/S2 as part of a dual process theory, and in doing so naturally cuts off some of the chance to turn around and use S2 as causally upstream of the things he describes. I think you are correctly seeing that Kahneman is very careful in how he writes, such that S1/S2 are not gears in his model so much as post hoc patterns: nice referents to what are, in his model, isolated behaviors that share certain traits, without having to propose a unifying causal mechanism.

Nonetheless, I think we can identify S2 roughly with the neocortex and S1 roughly with the rest of the brain, and understand S1/S2 behaviors as those primarily driven by activity in those parts of the brain. Kahneman is just careful, in my recollection, to avoid saying things like that because there's no hard proof for it, just inference.

Comment by gworley on Scope Insensitivity Judo · 2019-08-09T17:43:59.768Z · score: 2 (1 votes) · LW · GW

Mostly I think of it in terms of predictions and their errors. In this example I expected/predicted the world would look one way and then it looked another, and when it looked another that seems to have triggered a cascade of prediction errors that resulted in a process to try to construct new predictions that also dredged up old evidence from memory to be reconsidered.

Comment by gworley on Project Proposal: Considerations for trading off capabilities and safety impacts of AI research · 2019-08-07T17:40:26.043Z · score: 7 (3 votes) · LW · GW

I think ML methods are insufficient for producing AGI, and getting to AGI will require one or more changes in paradigm before we have a set of tools that look like they can produce AGI. From what I can tell the ML community is not working on this, and instead prefers incremental enhancements to existing algorithms.

Basically what I view as needed to make AGI work might be summarized as needing to design dynamic feedback networks with memory that support online learning. What we mostly see out of ML these days are feedforward networks with offline learning that are static in execution and often manage to work without memory, though some do have this. My impression is that existing ML algorithms are unstable under these kinds of conditions. I expect something like neural networks will be part of making it to AGI, and so some current ML research will matter, but mostly we should think of current ML research as being about near-term, narrow applications rather than on the road to AGI.
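To make the contrast concrete, here is a toy sketch (every name and constant is my own invention for illustration, not anyone's actual proposal): a static feedforward map whose weights are frozen after offline training, versus a stateful cell that maintains memory via feedback and updates its weights online as observations arrive.

```python
# Toy contrast between the two regimes described above.
# All names and constants here are invented for illustration.

def feedforward_static(w, x):
    """Offline-trained and stateless: the weights w are frozen,
    so the same input always yields the same output."""
    return sum(wi * xi for wi, xi in zip(w, x))

class OnlineRecurrentCell:
    """Stateful and online-learning: keeps a memory (h) that feeds
    back into itself, and updates its weight on every observation
    with a simple error-driven rule."""
    def __init__(self, w=0.0, lr=0.1):
        self.w = w    # weight, updated online
        self.h = 0.0  # internal state: memory of past inputs
        self.lr = lr

    def step(self, x, target):
        self.h = 0.5 * self.h + 0.5 * x  # feedback: state depends on history
        pred = self.w * self.h           # prediction from current state
        self.w += self.lr * (target - pred) * self.h  # online learning step
        return pred
```

Run the cell on a stream of observations and its predictions improve in place; the feedforward map, by contrast, can only change via a separate offline training pass.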

That's at least my opinion based on my understanding of how consciousness works, my belief that "general" requires consciousness, and my understanding of the current state of ML and what it does and does not do that could support consciousness.

Comment by gworley on Project Proposal: Considerations for trading off capabilities and safety impacts of AI research · 2019-08-07T17:33:15.847Z · score: 2 (1 votes) · LW · GW

Based on my reading of the post, it seemed to me that you were concerned primarily with info-hazard risks in ML research, not AI research in general; maybe it's the way you framed it that led me to take it as contingent on ML mattering.

Comment by gworley on G Gordon Worley III's Shortform · 2019-08-07T01:54:55.393Z · score: 6 (4 votes) · LW · GW

So long as shortform is salient for me, might as well do another one on a novel (in that I've not heard/seen anyone express it before) idea I have about perceptual control theory, minimization of prediction error/confusion, free energy, and Buddhism that I was recently reminded of.

There is a notion within Mahayana Buddhism of the three poisons: ignorance, attachment (or, I think we could better term this here, attraction, for reasons that will become clear), and aversion. This is part of one model of where suffering arises from. Others express these notions in other ways, but I want to focus on this way of talking about these root kleshas (defilements, afflictions, mind poisons) because I think it has a clear tie in with this other thing that excites me, the idea that the primary thing that neurons seek to do is minimize prediction error.

Ignorance, even among the three poisons, is generally considered more fundamental, in that ignorance appears first and it gives rise to attraction and aversion (in some models there is fundamental ignorance that gives rise to the three poisons, marking a separation between ignorance as mental activity and ignorance as a result of the physical embodiment of information transfer). This looks to me a lot like what perceptual control theory predicts if the thing being controlled for is minimization of prediction error: there is confusion about the state of the world, information comes in, and this sends a signal within the control system of neurons to either up or down regulate something. Essentially what the three poisons describe is what you would expect the world to look like if the mind were powered by control systems trying to minimize confusion/ignorance, nudging the system toward and away from a set point where prediction error is minimized via negative feedback (and a small bonus, this might help explain why the brain doesn't tend to get into long-lasting positive feedback loops: it's not constructed for it and before long you trigger something else to down-regulate because you violate its predictions).
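For what it's worth, the control-loop story above can be sketched in a few lines (a toy of my own devising, making no claim about actual neural implementation): a set point, a prediction error, and a regulation signal proportional to that error, forming the negative feedback described.

```python
# Toy sketch of a perceptual control loop minimizing prediction error.
# Invented for illustration; no claim about real neural wiring.

def control_step(set_point, perception, gain=0.3):
    """One tick: emit up/down regulation proportional to the
    prediction error (negative feedback toward the set point)."""
    error = set_point - perception
    return gain * error  # positive: up-regulate; negative: down-regulate

def run_loop(set_point, perception, steps=50):
    """Iterate the loop; the error shrinks every tick, which is also
    why this kind of system resists long-lasting positive feedback."""
    for _ in range(steps):
        perception += control_step(set_point, perception)
    return perception
```

Whatever the starting perception, the loop converges toward the set point, and overshooting in either direction just generates a correcting signal of the opposite sign.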

It also makes a lot of sense that these would be the root poisons. I think we can forgive 1st-millennium Buddhists for not discovering PCT or the minimization of prediction error directly, but we should not be surprised that they identified the mental actions this theory predicts should be foundational to the mind, and recognized that they were foundational to all other mental actions. Elsewhere, Buddhism explicitly calls out ignorance as the fundamental force driving dukkha (suffering), though we probably shouldn't assign too many points to (non-Madhyamaka) Buddhism for noticing this, since other Buddhist theories don't make these same claims about attachment and aversion, and all three are used concurrently in explication of the dharma.

Comment by gworley on Subagents, neural Turing machines, thought selection, and blindspots · 2019-08-07T01:31:32.734Z · score: 6 (3 votes) · LW · GW

The production rule model is interesting to me in that it fits well with Michael Commons' notion of how developmental psychology works. Specifically, Commons has a formal version of his theory on which developmental psychology is largely about how humans learn to perform more "complex" production rules: the same sorts of rules operating on more complex types.

Comment by gworley on Project Proposal: Considerations for trading off capabilities and safety impacts of AI research · 2019-08-07T00:05:57.582Z · score: 3 (2 votes) · LW · GW

One complicating factor is how much you believe ML contributes to existential threats. For example, I think the current ML community is very unlikely to ever produce AGI (<10%), and that AGI will be the result of breakthroughs from researchers in other parts of AI; thus it seems not very important to me what current ML researchers think of long-term safety concerns. Other analyses of the situation would conclude differently, though, so this seems like an upstream question that must be addressed, or at least contingently decided upon, before evaluating how much it would make sense to pursue this line of inquiry.

Comment by gworley on G Gordon Worley III's Shortform · 2019-08-06T20:10:28.995Z · score: 12 (6 votes) · LW · GW

I have plans to write this up more fully as a longer post explaining the broader ideas with visuals, but I thought I would highlight one that is pretty interesting and try out the new shortform feature at the same time! As such, this is not optimized for readability, has no links, and I don't try to backup my claims. You've been warned!

Suppose you frequently found yourself identifying with and feeling like you were a homunculus controlling your body and mind: there's a real you buried inside, and it's in the driver's seat. Sometimes your mind and body do what "you" want, sometimes they don't, and this is frustrating. Plenty of folks reify this in slightly different ways: rider and elephant, monkey and machine, prisoner in cave (or audience member in theater), and, to a certain extent, variations on the S1/S2 model. In fact, I would propose this is a kind of dual process theory of mind that has you identifying with one of the processes.

A few claims.

First, this is a kind of constant, low-level dissociation. It's not the kind of high-intensity dissociation we often think of when we use that term, but it's still a separation of sense of self from the physical embodiment of self.

Second, this is projection, and thus a psychological problem in need of resolving. There's nothing good about thinking of yourself this way; it's a confusion that may be temporarily helpful, but it's also something you need to learn to move beyond by reintegrating the separated sense of self with mind and body.

Third, people drawn to the rationalist community are unusually likely to be the sort of folks who dissociate and identify with the homunculus, S2, the rider, far mode, or whatever you want to call it. It gives them a world view that says "ah, yes, I know what's right, but for some reason my stupid brain doesn't do what I want, so let's learn how to make it do what I want", when this is in fact a confusion, because it's the very brain that's "stupid" that's producing the feeling that you know what you want!

To speculate a bit, this might help explain some of the rationalist/meta-rationalist divide: rationalists are still dissociating, meta-rationalists have already reintegrated, and as a result we care about very different things and look at the world differently because of it. That's very speculative, though, and I have nothing other than weak evidence to back it up.

Comment by gworley on Preferences as an (instinctive) stance · 2019-08-06T19:16:51.823Z · score: 6 (3 votes) · LW · GW

This is a great point that I think sometimes gets lost on folks, which is why it's good that you bring it up. To the extent I disagree with you on your research agenda, for example, it's disagreement over what model we use to describe reality that will be useful to our purposes, rather than disagreement over reality itself.

Comment by gworley on Raemon's Scratchpad · 2019-08-06T01:22:37.912Z · score: 8 (4 votes) · LW · GW

Good example: the US tried to go metric and then canceled its commitment.

Comment by gworley on Why Subagents? · 2019-08-02T20:07:44.201Z · score: 2 (1 votes) · LW · GW
The problem is that the preferences are conditional on internal state; they can't be captured only by looking at the external environment.

I think I wasn't clear enough about what I meant. I mean to question specifically why excluding such so-called "internal" state is the right choice. Yes, it's difficult and inconvenient to work with that which we cannot externally observe, but I think much of the problem is that our models leave this part of the world out because it can't (yet) be easily observed with sufficient fidelity. The division between internal and external is somewhat arbitrary: it sits at the limit of our powers of observation, not at a natural limit of the system independent of our knowledge of it. So I question whether it makes sense to let that limit determine the model we use, rather than stepping back and finding a way to make the model larger, such that it includes the epistemological limits that create partial preferences as a consequence rather than treating them as ontologically basic to the model.

Comment by gworley on Why Subagents? · 2019-08-02T17:30:00.107Z · score: 2 (1 votes) · LW · GW

With regards to path dependence and partial preferences, a certain amount of this feels like the model simply failing to fully capture the preference on the first go. That is, preferences are conditional, i.e. conditioned on the environment in which they are embedded, and the partiality and path dependence issues seem to me to arise entirely from partial specification, not from the preference being partial itself. Thus I have to wonder: why pursue models that deal with partial preferences and their issues, rather than trying to build better models that capture the full complexity of preferences?

To a certain extent it feels to me like with partial preferences we're trying to hang on to some things that were convenient about older models while dealing with the complexities of reality they failed to adequately model, rather than giving up our hope to patch the old models and look for something better suited to what we are trying to model (yes, I'm revealing my own preference here for new models based on what we learned from old models instead of incrementally improving old models).

Comment by gworley on Mistake Versus Conflict Theory of Against Billionaire Philanthropy · 2019-08-01T18:47:07.696Z · score: 4 (3 votes) · LW · GW

As a general point, I consider it worth writing things that tackle an object-level issue and show how mistake theory reasoning reaches different conclusions than conflict theory reasoning. I say that because I think most people are at least a little bit conflict theorists. Maybe not about everything, but for many people there will be times they think in terms of conflict, of us vs. them, of in-group against out-group. And having someone provide a well-reasoned, thoughtful, and generous-to-opponents essay nudging folks towards mistake theory, by showing how it really works on the margin, turns folks into stronger mistake theorists or gets them using mistake theory more often.

My strong claim would be that humans start out as conflict theorists—it's our "natural" state—and it's only through people showing us another way that we can come to another position. Yes, any writing like this piece by Scott can be used as fuel for reinforcing a conflict theory perspective in some people, but those people are likely such strong conflict theorists that all evidence reinforces their position, so there's no marginal difference between producing something like Scott's piece and something less charitable. Meanwhile it does a lot to move people towards a mistake theory perspective, even if just on the object-level issue addressed, and repeated exposure to such writing can turn them into net mistake theorists.

Could some ideal person have done more to convert more conflict theorists to mistake theory on at least this issue in an essay than Scott did in his? Maybe. But I'm sure Scott did the best he could, and I think it's on net better that he wrote this than not.

Comment by gworley on Forum participation as a research strategy · 2019-07-30T20:54:51.950Z · score: 3 (2 votes) · LW · GW

This has similarly been my approach. As best I can tell, writing papers for academic publication is nice but, especially in the AI safety space, not really the best way to convey and discuss ideas. Much more important seems to be being part of the conversation about technical ideas: learning from it and adding to it so others can do the same. I put some small amount of effort into things outside forum participation, mostly because I believe it's a good idea for reputation effects and for spreading ideas outside the forum bubble, but not because I think it's the best way to make intellectual progress.

It's also nice because the feedback loops are shorter. I can comment on a post or write my own, have a discussion, and then within weeks see the ripples of that discussion influencing other discussions. It helps me feel the impact I'm having, and motivates me to keep going.

Probably the only thing superior in my mind is doing practical work, e.g. building systems that test out ideas. Unfortunately many of the ideas we talk about in safety are currently ahead of the tech, so we don't know how to build things yet (and for safety's sake I think it's fine not to push on that too hard, since I expect it will come on its own anyway). So until we are closer to AGI, forum participation is likely one of the highest-impact activities one can engage in (I'm similarly positive about the face-to-face equivalent: talking at conferences and having conversations with interested folks).

Comment by gworley on Keeping Beliefs Cruxy · 2019-07-30T20:45:59.062Z · score: 5 (2 votes) · LW · GW

I think philosophers who are good at philosophy do change their own minds and do seek out ways to change their own minds to match what they know and reason (they, if nothing else, strive for reflective equilibrium). This is of course not what everyone does, and academic philosophy is, in my opinion, suffering from the disease of scholasticism, but I do think philosophers on the whole are good at the weaker form of double crux that takes two people and that they remain open to having their mind changed by the other person.

Comment by gworley on Keeping Beliefs Cruxy · 2019-07-30T19:06:01.699Z · score: 5 (2 votes) · LW · GW

I think it's worth pointing out that there's nothing all that special about double crux if you're used to having discussions with people on the order of professional philosophers; double crux is just a fancy name for what I would have previously called arguing in good faith. It is special, though, in that many people are neither naturally shaped such that they disagree with others that way, nor have they been trained in the method. So giving it a special name and teaching it as a technique seems worthwhile, since the alternative is that you know there is a "right" way to argue that converges towards truth and a "wrong" way that doesn't, but teaching people to do the move (finding the beliefs their ideas hinge on and analyzing that hinge in a way that might alter it, and thus the entire belief network) is hard if you can't precisely describe how to do it.

I feel quite confident that you already have the skill being called double crux, based on the conversations I've seen you participate in on LW, though I also understand how it can look like there is some "magic" in the technique you're not getting, because it feels pretty magical if you learn it late in life and are then like "oh, that's why all my disagreements never go anywhere".

Comment by gworley on Evan Rysdam's Shortform · 2019-07-30T18:00:38.038Z · score: 4 (3 votes) · LW · GW

This very much matches my own model. Once you are high or low status, it's self-reinforcing: people will interpret the evidence to support the existing story. That's why, when you are high, you can play low and not lose status (you're just "slumming it" or something similar), and when you are low, you can play high and not gain any status (you're "reaching above your station").

Comment by gworley on The Self-Unaware AI Oracle · 2019-07-30T00:43:24.608Z · score: 3 (2 votes) · LW · GW

Sure. Let's construct the 0-optimizer. Its purpose is simply to cause there to be lots of 0s in memory (as opposed to 1s). It only knows about Algorithm Land, and even then it's a pretty narrow model: it knows about memory and can read and write to it. Now at some point the 0-optimizer manages to get all the bits set to 0 in its addressable memory, so it would seem to have reached maximum attainment.
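For concreteness, here is a minimal toy sketch of such a 0-optimizer. Everything here (the function names, the greedy loop, the bytearray standing in for addressable memory) is my own illustrative construction, not anything from the original thought experiment: just a hungry process that reads memory, scores it by how many 0 bits it contains, and rewrites bytes until nothing nonzero remains.

```python
# Toy sketch of the hypothetical "0-optimizer" described above.
# Its whole world model is a fixed block of "memory" it can read
# and write; its only goal is more 0 bits.

def count_zero_bits(memory: bytearray) -> int:
    """Score the memory block: total number of 0 bits."""
    return 8 * len(memory) - sum(bin(b).count("1") for b in memory)

def zero_optimizer_step(memory: bytearray) -> bool:
    """Take one greedy step: zero out the first byte containing any 1 bits.
    Returns True if anything changed (the optimizer is still 'hungry')."""
    for i, b in enumerate(memory):
        if b != 0:
            memory[i] = 0
            return True
    return False  # maximum attainment within its address space

memory = bytearray(b"\xff\x00\x2a\x81")
while zero_optimizer_step(memory):
    pass
print(count_zero_bits(memory))  # prints 32: all bits in the 4 bytes are now 0
```

The interesting part of the story starts where this sketch ends: once `zero_optimizer_step` returns False within its own address space, a sufficiently hungry search process starts finding "memory" it was never meant to touch.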

But it's a hungry optimizer and keeps trying to find ways to set more bits to 0. It eventually stumbles upon a gap in the operating system's security that allows it to access memory outside its address space, so it can now set those bits to 0 as well. It does all this "accidentally", never knowing it's using a security exploit; it just stumbles into it, sees memory getting written with 0s, and is happy. (There's plenty of precedent for this: human minds are great examples of complex systems with limited introspective access that do lots of complex things without knowing how or why they are doing them.) With some luck, it doesn't immediately destroy itself and gets a chance to be hungry for more 0s.

Next it accidentally starts using the network interface on the computer. Although it doesn't exactly understand what's going on, it figures out how to get responses that contain lots of 0s. Unfortunately for us, what it's actually doing is performing a denial-of-service attack against other computers to get back the 0s. Now we have a powerful optimization process that's hungry for 0s and satisfies its hunger by filling our networks with garbage traffic.

A couple of hops later, it's gone from denial-of-service attacks, to wiping out our ability to use Internet service, to disrupting any EM communication channel, to generating dangerously high levels of radiation that kill all life on Earth.

This story involved a lot of luck, but my expectation is that we should not underestimate how "lucky" a powerful optimizer can be, given that evolution is a similarly ontologically simple process that nonetheless managed to produce some pretty complex results.