Posts

Oh No My AI (Filk) 2021-06-11T15:05:48.733Z
What are the gears of gluten sensitivity? 2021-06-08T16:42:52.189Z
Zen and Rationality: Continuous Practice 2021-05-31T18:42:56.950Z
The Purpose of Purpose 2021-05-15T21:00:20.559Z
Yampolskiy on AI Risk Skepticism 2021-05-11T14:50:38.500Z
Identity in What You Are Not 2021-04-24T20:11:49.480Z
Forcing Yourself is Self Harm, or Don't Goodhart Yourself 2021-04-10T15:19:42.130Z
Forcing yourself to keep your identity small is self-harm 2021-04-03T14:03:06.469Z
How I Meditate 2021-03-08T03:34:21.612Z
Bootstrapped Alignment 2021-02-27T15:46:29.507Z
Fake Frameworks for Zen Meditation (Summary of Sekida's Zen Training) 2021-02-06T15:38:17.957Z
The Problem of the Criterion 2021-01-21T15:05:41.659Z
The Teleological Mechanism 2021-01-19T23:58:54.496Z
Zen and Rationality: Karma 2021-01-12T20:56:57.475Z
You are Dissociating (probably) 2021-01-04T14:37:02.207Z
A Model of Ontological Development 2020-12-31T01:55:58.654Z
Zen and Rationality: Skillful Means 2020-11-21T02:38:09.405Z
No Causation without Reification 2020-10-23T20:28:51.831Z
The whirlpool of reality 2020-09-27T02:36:34.276Z
Zen and Rationality: Just This Is It 2020-09-20T22:31:56.338Z
Zen and Rationality: Map and Territory 2020-09-12T00:45:40.323Z
How much can surgical masks help with wildfire smoke? 2020-08-21T15:46:12.914Z
Bayesiance (Filk) 2020-08-18T16:30:00.753Z
Zen and Rationality: Trust in Mind 2020-08-11T20:23:34.434Z
Zen and Rationality: Don't Know Mind 2020-08-06T04:33:54.192Z
Let Your Mind Be Not Fixed 2020-07-31T17:54:43.247Z
[Preprint] The Computational Limits of Deep Learning 2020-07-21T21:25:56.989Z
Comparing AI Alignment Approaches to Minimize False Positive Risk 2020-06-30T19:34:57.220Z
What are the high-level approaches to AI alignment? 2020-06-16T17:10:32.467Z
Pragmatism and Completeness 2020-06-12T16:34:57.691Z
The Mechanistic and Normative Structure of Agency 2020-05-18T16:03:35.485Z
What is the subjective experience of free will for agents? 2020-04-02T15:53:38.992Z
Deconfusing Human Values Research Agenda v1 2020-03-23T16:25:27.785Z
Robustness to fundamental uncertainty in AGI alignment 2020-03-03T23:35:30.283Z
Big Yellow Tractor (Filk) 2020-02-18T18:43:09.133Z
Artificial Intelligence, Values and Alignment 2020-01-30T19:48:59.002Z
Towards deconfusing values 2020-01-29T19:28:08.200Z
Normalization of Deviance 2020-01-02T22:58:41.716Z
What spiritual experiences have you had? 2019-12-27T03:41:26.130Z
Values, Valence, and Alignment 2019-12-05T21:06:33.103Z
Doxa, Episteme, and Gnosis Revisited 2019-11-20T19:35:39.204Z
The new dot com bubble is here: it’s called online advertising 2019-11-18T22:05:27.813Z
Fluid Decision Making 2019-11-18T18:39:57.878Z
Internalizing Existentialism 2019-11-18T18:37:18.606Z
A Foundation for The Multipart Psyche 2019-11-18T18:33:20.925Z
In Defense of Kegan 2019-11-18T18:27:37.237Z
Why does the mind wander? 2019-10-18T21:34:26.074Z
What's your big idea? 2019-10-18T15:47:07.389Z
Reposting previously linked content on LW 2019-10-18T01:24:45.052Z
TAISU 2019 Field Report 2019-10-15T01:09:07.884Z

Comments

Comment by G Gordon Worley III (gworley) on Jimrandomh's Shortform · 2021-07-25T00:22:52.894Z · LW · GW

I think the obvious caveat here is that many people can't do this because they have restrictions that have taken them away from the mean: allergies, sensitivities, and ethical or cultural restrictions on what they eat, for example. They can of course do a limited version of the intervention (for example, if you only eat plants, eat all the plants you don't eat now and stop eating the plants you currently eat), though I wonder whether that would have similar effects, since the diet is already so constrained.

Comment by G Gordon Worley III (gworley) on Owain_Evans's Shortform · 2021-07-25T00:19:47.360Z · LW · GW

Echoing something in Viliam's comment, I think this is looking at the wrong category. It seems like there's no correlation because Christianity is too broad a category: a set of religions with a common history. Instead, the right comparison seems to be Protestantism vs. not.

Even within Protestantism I think there's a lot of room for variation. For example, there might be a correlation with certain branches of Protestant Christianity and not with others.

All of this makes it very hard to tell how much was causally the result of Protestant Christianity or even just particular denominations vs. larger cultural forces of which those denominations were downstream.

Comment by G Gordon Worley III (gworley) on The Nature of Counterfactuals · 2021-07-20T20:11:28.571Z · LW · GW

I argue that for this purpose it doesn't, i.e. my case for how the problem of the criterion gets resolved is that you can't help but be pragmatic, because that's a description of how epistemology is physically instantiated in our universe. The only place you might lose value is if you have some desire to resolve metaphysical questions: if you stop short of resolving them, then of course you fail to receive the full value possible, because you didn't get the answer. I argue that getting such answers is impossible, but nonetheless trying to find them may be worthwhile to someone.

Comment by G Gordon Worley III (gworley) on Book review: The Explanation of Ideology · 2021-07-20T19:44:42.311Z · LW · GW

Thinking about this topic, one thing I find interesting is how much we reinforce our own ideas of what the "right" way to be a family is.

A good example: growing up in what is classified here as Absolute Nuclear culture, I heard all of the following said explicitly by people around me to emphasize that other family systems were wrong:

  • cousin marriage is backwards, especially close cousin marriage (it's something hicks, low-class, stupid people do)
  • the idea that living at home with your parents once you are an adult means you're a failure
  • parents who try to exert too much authority on their children are brutish and controlling and trying to live through their children
  • having more than one sexual partner is greedy (only high status, wealthy people should be able to get away with it, and even then only as long as they do it in secret)

I've also heard people complain about how Absolute Nuclear culture, from the perspective of other family cultures, is:

  • lonely
  • selfish (puts the individual above the care of the family)
  • weak or foolish (doesn't exert authority or take the partners a person is entitled to)

So regardless of whether Todd gets the categories right, there does seem to be something going on here with cultures self-reinforcing their family-system norms, and it's more than religious beliefs being imposed on people: even where religions do impose a family system, they don't seem especially likely to lead the culture so much as follow it.

Comment by G Gordon Worley III (gworley) on The Nature of Counterfactuals · 2021-07-19T16:41:18.520Z · LW · GW

The areas where they don't work coincide with philosophical concerns.

As always, this is an interesting topic, because many of the philosophical concerns I can think of here end up being questions about metaphysics (i.e. the nature of stuff that lies beyond your epistemic ability to resolve the question), and I think there's some reasonable perspective by which you might say that metaphysics "doesn't matter": its answers, while interesting, don't change what actions you take in the world, because we can already know enough to figure out practical answers that serve our within-world purposes.

Comment by G Gordon Worley III (gworley) on Internal Information Cascades · 2021-07-17T21:19:40.685Z · LW · GW

Poetic summary: priors lie heavy.

Comment by G Gordon Worley III (gworley) on The Nature of Counterfactuals · 2021-07-17T20:32:22.700Z · LW · GW

And yet, despite epistemic circularity being our epistemic reality up to our circularly reasoned limited ability to assess that this is in fact the situation we find ourselves in, we manage to reason anyway.

Comment by G Gordon Worley III (gworley) on The inescapability of knowledge · 2021-07-17T19:33:39.762Z · LW · GW

Isn't the most important feature of an "internal map" that it is a conceptual and subjective thing, and not a physical thing? Obviously this smacks of dualism, but that's the price we pay for being able to communicate at all.

And yet such an "internal" thing must have some manifestation embedded within the physical world. However, it is often a useful abstraction to ignore the physical details of how information is created and stored.

I do think it is true that in principle "there is no objective difference between a book containing a painstakingly accurate account of a particular battle, and another book of carelessly assembled just-so stories about the same battle" (emphasis mine). With a sufficiently bizarre coincidence of contexts, they could even be objectively identical objects. But we can in practice say that for some expected class of agents (say, people from the writer's culture who are capable of reading) interacting in expected ways (like reading the book instead of burning it for heat), the former will almost certainly convey more knowledge about the battle than the latter.

I think this raises the question of just what knowledge is.

Comment by G Gordon Worley III (gworley) on "If and Only If" Should Be Spelled "Ifeff" · 2021-07-17T15:19:05.607Z · LW · GW

Agreed. I think this basically makes concerns about "iff" being mistaken for "if" irrelevant, and trying to make a better shorthand for "if and only if" a distraction with insufficient impact for almost anyone to trouble themselves with.

Comment by G Gordon Worley III (gworley) on Search-in-Territory vs Search-in-Map · 2021-07-12T23:39:13.005Z · LW · GW

I'm not convinced there's an actual distinction to be made here.

Using your mass comparison example, arguably the only meaningful difference between the two is where the information is stored: in search-in-map it's stored in an auxiliary system; in search-in-territory it's embedded in the system itself. The same information is still there; all that's changed is the mechanism, and I'm not sure map and territory is the right way to talk about this, since both are embedded/embodied in actual systems.

My guess is that search-in-map looks like a thing apart from search-in-territory because of perceived dualism. You give the example of counterfactuals being in the map rather than the territory, but the map is itself still in the territory (as I'm sure you know), so there's no clear sense in which counterfactuals and the models that enable them are not physical processes. Yes, we can apply an abstraction to temporarily ignore the physical process, which is maybe what you mean to get at, but it's still a physical process all the same.

It seems to me the interesting thing is maybe whether you can talk about a search algorithm in terms of particular kinds of abstractions rather than anything else, which, if you go far enough around, comes back to your position, but with more explained.

Comment by G Gordon Worley III (gworley) on Relentlessness · 2021-07-12T23:13:57.477Z · LW · GW

Reminds me a lot of how Zen training works. Much of the value of a sesshin (several days where you do little other than meditate) is the persistent lack of distractions or other stimulation: you just have to keep coming back, sitting still, and being with yourself, eating the same bland food, doing the same chores, over and over for days. It sounds awful, and at first it is, until you break/surrender and give yourself over to the fact that this is how the world is, and in that space deep meditative states and realization often emerge.

Comment by G Gordon Worley III (gworley) on Anthropics and Embedded Agency · 2021-06-30T19:02:26.023Z · LW · GW

I think you get it right on PBR and this is an underappreciated point.

Comment by G Gordon Worley III (gworley) on Four Components of Audacity · 2021-06-21T19:20:58.174Z · LW · GW

My favorite technique of boldness is to simply tell the truth. One trick is to never prefix statements with "I believe". Don't say "I believe X". If X is true then just say "X". (If X is untrue then don't say X and don't believe X.) The unqualified statement is bolder. Crocker's rules encode boldness into a social norm.

Most people are really bad at epistemology, so saying "I believe" is a useful marker to remind people that you're saying something that could be wrong. Making the bolder statement is more likely to waste your time on things like fighting over categories rather than figuring out how the world actually is. An unqualified statement is a bid to claim not only something about reality but also something about the categories used to describe it; "I believe" creates some space for the possibility of using other categories (which should always be there, but lots of people are trapped in their own ontologies in ways that prevent them from realizing this, hence they need the reminder).

Comment by G Gordon Worley III (gworley) on Arguments against constructivism (in education)? · 2021-06-20T17:05:31.459Z · LW · GW

I have in my mind this idea that direct instruction is the most effective pedagogical method yet invented, but we don't do it because most existing teachers hate teaching that way. I wonder how, if at all, constructivism could be made to work with it, since otherwise the effectiveness of DI would seem to be another argument against pure constructivism.

Comment by G Gordon Worley III (gworley) on Eli's shortform feed · 2021-06-19T17:59:37.933Z · LW · GW

I've long been somewhat skeptical that utility functions are the right abstraction.

My argument is also rather handwavy, being something like "this is the wrong abstraction for how agents actually function, so even if you can always construct a utility function and say some interesting things about its properties, it doesn't tell you the thing you need to know to understand and predict how an agent will behave". In my mind I liken it to the state of trying to code in functional programming languages on modern computers: you can do it, but you're also fighting an uphill battle against the way the computer is physically implemented, so don't be surprised if things get confusing.

And much like in the utility function case, people still program in functional languages because of the benefits they confer. I think the same is true of utility functions: they confer some big benefits when trying to reason about certain problems, so we accept the tradeoffs of using them. I think that's fine so long as we have a morphism to other abstractions that will work better for understanding the things that utility functions obscure.
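 
To gesture at the "you can always construct a utility function" point, here's a toy sketch (the outcomes and the rank-based utility assignment are made up purely for illustration):

```python
# Toy version of "you can always construct a utility function": given any total
# preference ordering over a finite set of outcomes, just assign utilities by rank.
# The numbers say nothing about how the agent actually computes its choices.

from itertools import combinations

preferences = ["read", "walk", "scroll_feed"]  # most preferred first (made-up example)

utility = {outcome: len(preferences) - i for i, outcome in enumerate(preferences)}
# -> {'read': 3, 'walk': 2, 'scroll_feed': 1}

def prefers(a: str, b: str) -> bool:
    return utility[a] > utility[b]

# The constructed function reproduces the ordering...
assert all(prefers(a, b) for a, b in combinations(preferences, 2))
# ...but tells you nothing about the mechanism that produced those preferences.
print(utility)
```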

Comment by G Gordon Worley III (gworley) on Neo-Mohism · 2021-06-17T15:32:36.338Z · LW · GW

The first and highest tenet is that all tenets are subject to revision. The ultimate arbiter of this philosophy is the ability to make advance, falsifiable predictions, allowing the universe to judge between competing ideas.

Many philosophies aspire to this, yet somehow we have more than one philosophy. It seems like a good idea in theory, but in practice everything gets anchored on particular ways of looking at the world and ends up less fluid than we would like. I don't object to the ideal, but it's a weird one, because in theory every philosophy that includes it should converge to become the same thing, yet they don't.

Comment by G Gordon Worley III (gworley) on Can someone help me understand the arrow of time? · 2021-06-16T17:29:32.709Z · LW · GW

Some particular aspects of existence we're still a bit confused about.

Comment by G Gordon Worley III (gworley) on Can someone help me understand the arrow of time? · 2021-06-16T17:26:52.817Z · LW · GW

Namely, any time you ask them to explain consciousness, they shake their head and grumble "Existence is existence! It cannot be explained! It can only be experienced!" While this neatly avoids the argument (by refusing to engage in it), it can certainly be frustrating if you want to understand what consciousness is.

I think this is a bit of an exaggeration of the position. It's not that no explanation can be given, only that it won't explain what you're hoping it will, because the thing you were hoping to have explained is not the same as the reality you have reified into a thing. One traditional approach is to give up on categories and focus on practice and experience (e.g. Zen), but there are also traditions that go hard on explaining the inner workings of the mind and providing detailed models of it (e.g. Gelug).

Comment by G Gordon Worley III (gworley) on adamzerner's Shortform · 2021-06-12T15:29:30.550Z · LW · GW

I've tried doing this in my writing in the past, in the form of just throwing away "I think" altogether because it's redundant: there's no one thinking up these words but me.

Unfortunately this was a bad choice, because many people take bald statements without softening language like "I think" as bids to make claims about how they are or should be perceiving reality. In a sense all statements are that, but people will jump to viewing them as claims of access to an external truth. (Note that this sounds like they're making an error by having a world model that supposes external facts that can simply be learned, rather than facts always being conditional on the way they are known. That's not to say there isn't perhaps some shared external reality, only that any facts/statements you try to claim about it must be conditional, because they live in your mind behind your perceptions. But this is a subtle enough point that people will miss it, and it's not the default, naive model of the world most people carry around anyway.)

Example:

"I think you're doing X" -> "you're doing X"

People react to the latter as a stronger kind of claim than I would say it's possible to make.

This doesn't quite sound like what you want to do, though; instead you want to insert more nuanced words that make it clearer what work "think" is doing.

Comment by G Gordon Worley III (gworley) on Oh No My AI (Filk) · 2021-06-11T22:53:33.130Z · LW · GW

Oh wow this is a really great breakdown of the song's structure.

I don't really know any music theory, but I know a bit about poetry, and I can implicitly piece together how it works and hear whether or not it scans. I could tell there was a lot of weird optionality in this song, where you can shove in extra morae or whole feet in lines, or leave them out, and it would still scan, but sometimes it wouldn't. The song's meter does some weird stuff I don't really understand, which makes it both easy and hard to match (easy in that it offers a lot of flexibility for creativity, hard in that it's complex and easy to get wrong if you don't try singing it).

Comment by G Gordon Worley III (gworley) on A naive alignment strategy and optimism about generalization · 2021-06-11T16:51:18.450Z · LW · GW

For example, I now think that the representations of “what the model knows” in imitative generalization will sometimes need to use neural networks to translate between what the model is thinking and human language. Once you go down that road, you encounter many of the difficulties of the naive training strategy. This is an update in my view; I’ll likely go into more detail in a future post.

+1 to this and excited and happy to hear about this update in your view!

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-11T02:19:27.277Z · LW · GW

Thanks for your reply!

No one had mentioned the idea of SIBO before, and it doesn't sound like it would really match my symptoms, but it's something I'll keep in mind.

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-10T03:03:12.808Z · LW · GW

As to the high-cost action, I am working on prepping to do a true elimination diet. I say true because in many ways I'm already on one, having cut out foods that seemed to clearly give me problems and kept only those I seemed fine to eat, but that scattershot approach has left me without sufficient information to suss out what the actual triggers are.

Your mentioning heart rate and HRV is interesting. I've been diagnosed as having a large number of premature atrial contractions (PACs). We only noticed because I've had palpitations after eating whatever the triggering foods are (not the only symptom, though; I've also had things like thirst and chest pain that made it necessary to rule out a bunch of stuff like heart conditions and diabetes). I wonder if monitoring my heart would allow me to detect issues when they are below the level of being a problem, but I lack a model of how heart rate is connected to all this for that to make sense to me.

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-10T02:58:15.248Z · LW · GW

This fits my model that gluten somehow contributes to autoimmune/inflammation "load" and that because I'm now dealing with chronically worse asthma, even with treatment it may not be getting me down to a low enough level to consume as much gluten (or something else!) as I could in the past without issue.

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-10T02:56:24.181Z · LW · GW

I'll look into it; I was unaware until this comment that such testing existed. My 23andMe results show I don't have the markers for celiac disease, and SNPedia findings don't show anything likely to be related to gluten. I'll see if it's possible to find testing that might indicate non-allergic food sensitivities and isn't also bogus.

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-10T02:48:40.654Z · LW · GW

Yep, and several other specialists. We ruled out everything else, which left me with a vague diagnosis of nonspecific, as-yet idiopathic food sensitivity.

Comment by G Gordon Worley III (gworley) on What are the gears of gluten sensitivity? · 2021-06-09T00:02:54.371Z · LW · GW

This is also somewhat helpful, but it leaves me with just a vague model in which gluten causes the intestine to take up more stuff than it normally would, and that extra stuff crossing into the blood somehow causes problems. Maybe there's not more precision available in a model I can easily understand without learning a bunch of biochemistry, but I really wish I had something that would allow me to make more precise predictions about things like:

  • How much gluten can I eat without risk of issue? Is it none? Is it 10% of what I would normally eat?
  • How long do we expect the effects of gluten to last on the body?
  • What things would make the situation better/worse if eaten with gluten?

Comment by G Gordon Worley III (gworley) on The reverse Goodhart problem · 2021-06-08T18:19:48.501Z · LW · GW

Ah, yeah, that's true: there's not much concern about getting too much of a good thing when that overshoot is actually good, which does seem like a reasonable category for anti-Goodharting.

It's a bit hard to think of when this would actually happen, though, since usually you have to give something up, even if it's just the opportunity to have done less. For example, maybe I'm trying to get a B on a test because that will let me pass the class and graduate, but I accidentally get an A. The A is actually better and I don't mind getting it, but then I'm potentially left with regret that I put in too much effort.

Most examples I can think of that look like potential anti-Goodharting seem the same: I don't mind that I overshot the target, but I do mind that I wasn't as efficient as I could have been.

Comment by G Gordon Worley III (gworley) on The reverse Goodhart problem · 2021-06-08T16:13:58.566Z · LW · GW

Maybe I'm missing something, but this seems already captured by the normal notion of what Goodharting is in that it's about deviation from the objective, not the direction of that deviation.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T14:42:55.361Z · LW · GW

Is this an objection? That a person fails to conceptualize what they are doing doesn't change the reality of what they are doing, except in their own understanding of it.

For example, many people wander around in a state of cognitive fusion with the world, causing them to do things like read intent into places where there is none, because they can't tell their own motivations apart from observations about the world. This doesn't mean, though, that, for example, the curb on the sidewalk was out to get them when they tripped over it.

So it can still be a type error regardless of whether the mind bothers to check this or not; it's a type error within the normal meanings we give to categories like value, strategy, and terminal.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T14:35:46.562Z · LW · GW

Humans are social creatures: we often assume there is an audience even when there isn't one. And even when there really isn't one, there's still the audience of me observing myself and making judgments about how good I look to myself.

You're right that this seems to be a maladaptive strategy, but it's also worth remembering that humans are bounded agents. I mean, humans seem to actually do the thing I've described, and a reasonable explanation is that they are short-sighted in policy planning.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T01:25:14.055Z · LW · GW

There's basically no time when you are actually faced with a single option you must take at this level of consideration, so this is a nonstarter. Instead, the other options have been screened off so that it looks this way, when in fact there were many available that were simply not considered.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T01:22:57.784Z · LW · GW

Not at all, only that values and the actions taken to achieve those values are not the same thing, and that people can change strategies. It'd be too far a jump to go to supposing folks are a blank slate, and we need not consider that question anyway, since the author doesn't go so far as to propose something we'd need to resolve by making such a strong claim. I am only saying that the author is mistaken about the idea that these strategies are terminal.

For myself, if I look at myself and ask "why did I put so-and-so down", what I don't find is "oh, I want to put people down"; I find "oh, I thought that if I did that it would make me look better in comparison", or something like that, where a deeper value is being served: making myself look good.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T01:16:33.558Z · LW · GW

But this is still a type error even if you think the strategy is being executed without any regard to why it's being executed. It's like mistaking the policy for the utility function.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-08T01:15:46.406Z · LW · GW

Yes, exactly.

Put another way, I'd say that if it's not grounded in a felt sense it's not a value, but a policy/strategy/etc. for achieving some value.

Comment by G Gordon Worley III (gworley) on We need a standard set of community advice for how to financially prepare for AGI · 2021-06-07T19:23:55.022Z · LW · GW

There are some major challenges here.

The first is trying to predict what will be a reliable store of value in a world where TAI may disrupt normal power dynamics. For example, if there's a superintelligent AI capable of unilaterally transforming all matter in your light cone into paperclips, is there any sense in which you have enough power to enforce your ownership of anything independent of such an AI? Seems like not, in which case it's very hard to know what assets you can meaningfully own that would be worth owning, let alone by what mechanisms you can meaningfully own things in such a world.

Now we might screen off bad outcomes since they don't matter to this question, but then we're still left with a lot of uncertainty. Maybe it just doesn't matter because we'll be expanding so rapidly that there's little value in existing assets (they'll be quickly dwarfed via expansion). Maybe we'll impose fairness rules that make held assets irrelevant for most things that matter to you. Maybe something else. There's a lot of uncertainty here that makes it hard to be very specific about anything beyond the run up to TAI.

We can, however, I think give some reasonable advice about the run-up to TAI and what's likely to be best to have invested in just prior to TAI. Much of the advice about semiconductor equities, for example, seems to fall in this camp.

Comment by G Gordon Worley III (gworley) on Unrefined thoughts on some things rationalism is missing vs religions · 2021-06-07T19:15:05.460Z · LW · GW

I think of rationality as somewhat similar to Buddhism in some respects.

Depending on how we talk about it, Buddhism both is and isn't a religion. It isn't in the sense that there are some core teachings about suffering and how to deal with it that aren't really what I would call a religion so much as a teaching about a way to live life. In this respect it's quite similar to rationality.

It is in the sense that there are lots of religions built up around venerating the Buddha for giving us Buddhism and around supporting people who practice Buddhist teachings. Note, though, that this isn't exactly the same thing as the core teachings themselves.

I see rationality in a similar place to where Stoicism was in the late Roman period: a way of living one's life that many adopt, but that also isn't really what we'd call a religion. We could build up religions around it to revere its founders, for example, but we haven't. Obviously a religion does a lot more than that, as you note (and as I know, since I practice Zen), but my point is mainly to show that there's some separation between a religion and the teachings embedded within it, and rationality today looks to me a lot like teachings without a religion around them.

Comment by G Gordon Worley III (gworley) on Often, enemies really are innately evil. · 2021-06-07T19:06:42.006Z · LW · GW

People commonly have a terminal value of dragging other people down.

I'm doubtful this is true, because it's easy to see that dragging people down is not a terminal value: there must be something gained by dragging others down, and getting that thing is what's terminally valued. For example, if dragging people down makes someone feel good in some way, then getting that good feeling is the thing valued, not the dragging people down. Dragging people down is a strategy, not something terminal.

This suggests the core thesis is mistaken: it's not that people are innately evil, it's that they've learned a bad strategy to get what they want and are trapped in a local maximum where that strategy keeps working and other strategies are locally worse even if they would be globally better if they took the time to reorient themselves to those alternative strategies.
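 
To make the local-maximum picture concrete, here's a toy sketch of a greedy agent sticking with the locally best strategy (the payoff numbers and the notion of which strategies count as "adjacent" are invented for illustration):

```python
# Toy illustration: a greedy agent keeps the locally-best strategy even though
# a globally better one exists, because every single-step change looks worse.
# Payoff numbers are made up for the example.

payoffs = {
    "drag_others_down": 5,   # works a little, so it keeps getting reinforced
    "do_nothing": 2,         # the only "adjacent" alternative, and it's worse
    "build_others_up": 9,    # globally better, but not reachable in one step
}

# Which strategies the agent considers "one step away" from its current one.
neighbors = {
    "drag_others_down": ["do_nothing"],
    "do_nothing": ["drag_others_down", "build_others_up"],
    "build_others_up": ["do_nothing"],
}

def greedy_improve(strategy: str) -> str:
    """Switch strategies only when an adjacent one pays off strictly better."""
    while True:
        best = max(neighbors[strategy], key=payoffs.get)
        if payoffs[best] <= payoffs[strategy]:
            return strategy  # stuck at a local maximum
        strategy = best

print(greedy_improve("drag_others_down"))  # -> 'drag_others_down', not 'build_others_up'
```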

I'm not sure this has much of an impact on some parts of this post, like the bits about bullying, but it does call into question many of the inferences you try to draw about people in general.

Comment by G Gordon Worley III (gworley) on What to optimize for in life? · 2021-06-06T14:34:23.254Z · LW · GW

Slightly different from optionality: optimize for Pareto improvements. The more efficiency you can achieve across the entire frontier, the better off you'll be and the less you'll be forced to make tradeoffs along that frontier, because you keep expanding it.
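 
A rough sketch of the check I have in mind, with made-up dimensions and scores:

```python
# Rough sketch: an option is a Pareto improvement over the status quo if it is
# at least as good on every dimension and strictly better on at least one.
# Dimension names and scores are invented for illustration.

def pareto_improvement(new: dict, old: dict) -> bool:
    dims = old.keys()
    return (all(new[d] >= old[d] for d in dims)
            and any(new[d] > old[d] for d in dims))

status_quo = {"health": 6, "money": 5, "free_time": 4}
option_a   = {"health": 7, "money": 5, "free_time": 4}  # better on one axis, worse on none
option_b   = {"health": 8, "money": 7, "free_time": 2}  # a tradeoff, not a Pareto improvement

print(pareto_improvement(option_a, status_quo))  # True
print(pareto_improvement(option_b, status_quo))  # False: trades away free_time
```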

Comment by G Gordon Worley III (gworley) on Rules for Epistemic Warfare? · 2021-06-06T04:40:02.156Z · LW · GW

I'm kinda surprised this comment is so controversial. I'm curious what people are objecting to resulting in downvotes.

Comment by G Gordon Worley III (gworley) on Rationalists should meet Integral Theory · 2021-06-05T19:31:59.215Z · LW · GW

I'm in the same boat. I agree with the title of this post (I wrote this whole post about Integral Spirituality), but didn't find this post particularly useful, since it's a personal story without a lot of clear takeaways. In my mind this just isn't frontpage worthy, though it's fine as a personal blogpost on LW.

Comment by G Gordon Worley III (gworley) on The Alignment Forum should have more transparent membership standards · 2021-06-05T19:19:51.791Z · LW · GW

Does much curation actually happen, where members of the forum choose to promote comments and posts to AF? I've occasionally done this for comments and never for posts (other than my own), though I do sometimes hit the "suggest for AF" button on a post I think should be there but am not so confident as to make a unilateral decision. So I was surprised by your comments about curation because I don't much think of that as an activity AF forum members perform.

Comment by G Gordon Worley III (gworley) on If You Want to Find Truth You Need to Step Into Cringe · 2021-06-01T23:05:56.247Z · LW · GW

Hadn't thought of this before, but "cringe" seems pretty related to the rationalist notion of "ugh fields".

Comment by G Gordon Worley III (gworley) on Teaching ML to answer questions honestly instead of predicting human answers · 2021-06-01T22:58:48.720Z · LW · GW
  • Stories about how those algorithms lead to bad consequences. These are predictions about what could/would happen in the world. Even if they aren't predictions about what observations a human would see, they are the kind of thing that we can all recognize as a prediction (unless we are taking a fairly radical skeptical perspective which I don't really care about engaging with).

In the spirit, then, of caring about stories about how algorithms lead to bad consequences, here's a story about how I see the lack of a clear distinction between instrumental and intended models coming back to bite you.

Let's use your example of a model that reports "no one entered the data center". I might think the right answer is that "no one entered the data center" when I in fact know that physically someone was in the data center but they were an authorized person. If I'm reporting this in the context of asking about a security breach, saying "no one entered the data center" when I more precisely mean "no unauthorized person entered the data center" might be totally reasonable.

In this case there's some ambiguity about what reasonably counts as "no one". This is perhaps somewhat contrived, but category ambiguity is a cornerstone of linguistic confusion, and it's where I see the division between instrumental and intended models breaking down. There's probably some chunk of things we could screen off by making this distinction that are obviously wrong (e.g. the model that tries to tell me "no one entered the data center" when in fact, even given my context of a security breach, some unauthorized person did enter the data center), and that seems useful, so I'm mainly pushing on the idea that your approach seems insufficient for addressing alignment concerns on its own.

Not that you necessarily thought it was, but this seems like the relevant kind of issue to want to consider here.

Comment by G Gordon Worley III (gworley) on Why don't long running conversations happen on LessWrong? · 2021-05-31T04:42:26.684Z · LW · GW

I think this already happens sometimes, just in the comments on posts where you don't see it. I've been part of conversations that lasted ~1 month on LW via back and forth on comments on posts. ~1 week is somewhat more common.

Admittedly the audience for this is kinda small, but you can always look at the global stream of comments to jump into it.

Comment by G Gordon Worley III (gworley) on How one uses set theory for alignment problem? · 2021-05-29T16:29:15.179Z · LW · GW

It's just generally useful math background. Things like set theory, logic, category theory, etc. are the modern building blocks of mathematical modeling. I don't think there's anything specific about set theory and alignment that's important, only that you can't get very far in things directly relevant to alignment, like decision theory, without a good baseline of set theory knowledge.

Comment by G Gordon Worley III (gworley) on Teaching ML to answer questions honestly instead of predicting human answers · 2021-05-28T20:50:29.868Z · LW · GW

I want to consider models that learn to predict both “how a human will answer question Q” (the instrumental model) and “the real answer to question Q” (the intended model). These two models share almost all of their computation — which is dedicated to figuring out what actually happens in the world. They differ only when it comes time to actually extract the answer. I’ll describe the resulting model as having a “world model,” an “instrumental head,” and an “intended head.”

This seems massively underspecified in that it's really unclear to me what's actually different between the instrumental and intended models.

I say this because you posit the intended model gives "the real answer", but I don't see a means offered by which to tell "real" answers from "fake" ones. Further, for somewhat deep philosophical reasons, I also don't expect there is any such thing as a "real" answer anyway, only one that is more or less useful to some purpose, and since ultimately it's humans setting this all up, any "real" answer is ultimately a human answer.

The only difference I can find seems to be a subtle one about whether you're directly or indirectly imitating human answers, which is probably relevant for dealing with a class of failure modes like overindexing on what humans actually do vs. what we would do if we were smarter, knew more, etc., but it also still leaves you with human imitation, since there's still imitation of human concerns taking place.

Now, that actually sounds kinda good to me, but it's not what you seem to be explicitly saying when you talk about the instrumental and intended model.

Comment by G Gordon Worley III (gworley) on List of good AI safety project ideas? · 2021-05-27T13:54:07.056Z · LW · GW

I wrote a research agenda that suggests additional work to be done and that I'm not doing.

https://www.lesswrong.com/posts/k8F8TBzuZtLheJt47/deconfusing-human-values-research-agenda-v1

Comment by G Gordon Worley III (gworley) on Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS) · 2021-05-26T14:52:32.644Z · LW · GW

Yeah, I guess I should say that I'm mostly worried about the big problem of superintelligent AI and not thinking much about how to control narrow, not generally capable AI. For weak AI, this kind of prosaic control mechanism might be reasonable. Christiano thinks this class of methods might work on stronger AI.

Comment by G Gordon Worley III (gworley) on G Gordon Worley III's Shortform · 2021-05-24T13:34:38.165Z · LW · GW

Robert Moses and AI Alignment

It's useful to have some examples in mind of what it looks like when an intelligent agent isn't aligned with the shared values of humanity. We have some extreme examples of this, like paperclip maximizers, and some less extreme examples that are still extreme in human terms, like dictators such as Stalin, Mao, and Pol Pot, who killed millions in pursuit of their goals, but these feel like outliers, and people can too easily make various arguments that they are extreme and that no "reasonable" system would have these problems.

Okay, so let's think about how hard it is to just get "reasonable" people aligned, much less superintelligent AIs.

Consider Robert Moses, a man who achieved much at the expense of wider humanity. He worked within the system, gamed it, did useful things incidentally since they happened to bring him power or let him build a legacy, and then wielded that power in ways that harmed many while helping some. He was smart, generally caring, and largely aligned with what seemed to be good for America at the time, yet still managed to pursue courses of action that weren't really aligned with humanity as a whole.

We have plenty of other examples, but I think most of them don't put it into quite the kind of stark contrast Moses does. He's a great example of the kind of failure mode you can expect from an inadequate alignment mechanism (though on a smaller scale): you get something that's kinda like what you wanted, but also bad in ways you probably didn't anticipate ahead of time.