Posts

Big Yellow Tractor (Filk) 2020-02-18T18:43:09.133Z · score: 11 (3 votes)
Artificial Intelligence, Values and Alignment 2020-01-30T19:48:59.002Z · score: 10 (3 votes)
Towards deconfusing values 2020-01-29T19:28:08.200Z · score: 13 (5 votes)
Normalization of Deviance 2020-01-02T22:58:41.716Z · score: 57 (21 votes)
What spiritual experiences have you had? 2019-12-27T03:41:26.130Z · score: 22 (5 votes)
Values, Valence, and Alignment 2019-12-05T21:06:33.103Z · score: 12 (4 votes)
Doxa, Episteme, and Gnosis Revisited 2019-11-20T19:35:39.204Z · score: 14 (5 votes)
The new dot com bubble is here: it’s called online advertising 2019-11-18T22:05:27.813Z · score: 55 (21 votes)
Fluid Decision Making 2019-11-18T18:39:57.878Z · score: 9 (2 votes)
Internalizing Existentialism 2019-11-18T18:37:18.606Z · score: 10 (3 votes)
A Foundation for The Multipart Psyche 2019-11-18T18:33:20.925Z · score: 7 (1 votes)
In Defense of Kegan 2019-11-18T18:27:37.237Z · score: 10 (5 votes)
Why does the mind wander? 2019-10-18T21:34:26.074Z · score: 11 (4 votes)
What's your big idea? 2019-10-18T15:47:07.389Z · score: 29 (15 votes)
Reposting previously linked content on LW 2019-10-18T01:24:45.052Z · score: 18 (3 votes)
TAISU 2019 Field Report 2019-10-15T01:09:07.884Z · score: 39 (20 votes)
Minimization of prediction error as a foundation for human values in AI alignment 2019-10-09T18:23:41.632Z · score: 13 (7 votes)
Elimination of Bias in Introspection: Methodological Advances, Refinements, and Recommendations 2019-09-30T20:23:13.139Z · score: 16 (3 votes)
Connectome-specific harmonic waves and meditation 2019-09-30T18:08:45.403Z · score: 12 (10 votes)
Goodhart's Curse and Limitations on AI Alignment 2019-08-19T07:57:01.143Z · score: 15 (7 votes)
G Gordon Worley III's Shortform 2019-08-06T20:10:27.796Z · score: 16 (2 votes)
Scope Insensitivity Judo 2019-07-19T17:33:27.716Z · score: 25 (10 votes)
Robust Artificial Intelligence and Robust Human Organizations 2019-07-17T02:27:38.721Z · score: 17 (7 votes)
Whence decision exhaustion? 2019-06-28T20:41:47.987Z · score: 17 (4 votes)
Let Values Drift 2019-06-20T20:45:36.618Z · score: 3 (11 votes)
Say Wrong Things 2019-05-24T22:11:35.227Z · score: 99 (36 votes)
Boo votes, Yay NPS 2019-05-14T19:07:52.432Z · score: 34 (11 votes)
Highlights from "Integral Spirituality" 2019-04-12T18:19:06.560Z · score: 20 (21 votes)
Parfit's Escape (Filk) 2019-03-29T02:31:42.981Z · score: 40 (15 votes)
[Old] Wayfinding series 2019-03-12T17:54:16.091Z · score: 9 (2 votes)
[Old] Mapmaking Series 2019-03-12T17:32:04.609Z · score: 9 (2 votes)
Is LessWrong a "classic style intellectual world"? 2019-02-26T21:33:37.736Z · score: 31 (8 votes)
Akrasia is confusion about what you want 2018-12-28T21:09:20.692Z · score: 27 (16 votes)
What self-help has helped you? 2018-12-20T03:31:52.497Z · score: 34 (11 votes)
Why should EA care about rationality (and vice-versa)? 2018-12-09T22:03:58.158Z · score: 16 (3 votes)
What precisely do we mean by AI alignment? 2018-12-09T02:23:28.809Z · score: 29 (8 votes)
Outline of Metarationality, or much less than you wanted to know about postrationality 2018-10-14T22:08:16.763Z · score: 19 (17 votes)
HLAI 2018 Talks 2018-09-17T18:13:19.421Z · score: 15 (5 votes)
HLAI 2018 Field Report 2018-08-29T00:11:26.106Z · score: 51 (21 votes)
A developmentally-situated approach to teaching normative behavior to AI 2018-08-17T18:44:53.515Z · score: 12 (5 votes)
Robustness to fundamental uncertainty in AGI alignment 2018-07-27T00:41:26.058Z · score: 7 (2 votes)
Solving the AI Race Finalists 2018-07-19T21:04:49.003Z · score: 27 (10 votes)
Look Under the Light Post 2018-07-16T22:19:03.435Z · score: 25 (11 votes)
RFC: Mental phenomena in AGI alignment 2018-07-05T20:52:00.267Z · score: 13 (4 votes)
Aligned AI May Depend on Moral Facts 2018-06-15T01:33:36.364Z · score: 9 (3 votes)
RFC: Meta-ethical uncertainty in AGI alignment 2018-06-08T20:56:26.527Z · score: 18 (5 votes)
The Incoherence of Honesty 2018-06-08T02:28:59.044Z · score: 22 (12 votes)
Safety in Machine Learning 2018-05-29T18:54:26.596Z · score: 18 (5 votes)
Epistemic Circularity 2018-05-23T21:00:51.822Z · score: 5 (1 votes)
RFC: Philosophical Conservatism in AI Alignment Research 2018-05-15T03:29:02.194Z · score: 31 (10 votes)

Comments

Comment by gworley on What are information hazards? · 2020-02-18T19:56:47.195Z · score: 3 (2 votes) · LW · GW

Thanks, this is a really useful summary to have. Linking back to Bostrom on info hazards is reasonable, but it's not great if you want people to actually read something and understand information hazards rather than bounce off an explanation of the idea. Kudos!

Comment by gworley on Big Yellow Tractor (Filk) · 2020-02-18T18:45:52.173Z · score: 4 (2 votes) · LW · GW

A couple of notes on the song:

  • I wrote it with the Bob Dylan cover in my head more than the original.
  • It deliberately doesn't scan perfectly, so some of the syllables have to be "squished" to fit the time, which makes the song sound "sloppy" like the original and many covers of it do.
  • In case it's not obvious, it's meant to be a "ha ha only serious" anthem for negative utilitarians.

Comment by gworley on Training Regime Day 1: What is applied rationality? · 2020-02-17T20:37:41.146Z · score: 3 (2 votes) · LW · GW

I think of applied rationality pretty narrowly, as the skill of applying reasoning norms that maximize returns (those norms happening to have the standard name "rationality"). Of course there's a lot to that, but I also think this framing is a poor one to train all the skills required to "win". To use a metaphor, as requested, it's like the skill of getting really good at reading a map to find optimal paths between points: your life will be better for it, but it also doesn't teach you everything, like how to figure out where you are on the map now or where you might want to go.

Comment by gworley on G Gordon Worley III's Shortform · 2020-02-17T19:17:22.119Z · score: 8 (4 votes) · LW · GW

tl;dr: read multiple things concurrently so you read them "slowly" over multiple days, weeks, months

When I was a kid, it took a long time to read a book. How could it not: I didn't know all the words, my attention span was shorter, I was more restless, I got lost and had to reread more often, I got bored more easily, and I simply read fewer words per minute. One of the effects of this is that when I read a book I got to live with it for weeks or months as I worked through it.

I think reading like that has advantages. By living with a book for longer, the ideas it contained had more opportunity to bump up against other things in my life. I had more time to think about what I had read when I wasn't reading. I drank in the book more deeply as I worked to grok it. And for books I read for fun, I got to spend more time enjoying them, living with the characters and the author, by having the reading spread out over time.

As an adult it's hard to preserve this. I read faster and more than I did as a kid: I estimate I spend about 4 hours reading on a typical day (books, blogs, forums, etc.), not counting incidental reading in the course of doing other things. Even at my relatively slow reading rate of about 200 wpm, that's ~50k words per day, the length of a short novel.

The trick, I find, is to read slowly by reading multiple things concurrently and reading only a little bit of each every day. For books this is easy: I can just limit myself to a single chapter per day. As long as I have 4 or 5 books I'm working on at once, I can spread out the reading of each to cover about a month. Add in other things like blogs and I can spread things out more.
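
To make the arithmetic concrete, here's a toy sketch (the pace numbers are my rough estimates from above; the book and chapter sizes are hypothetical):

```python
# Rough reading-pace arithmetic using the estimates above.
WPM = 200            # approximate reading speed, words per minute
HOURS_PER_DAY = 4    # typical time spent reading per day

words_per_day = WPM * 60 * HOURS_PER_DAY
print(f"~{words_per_day:,} words/day")    # ~48,000 words/day, a short novel

# Pacing trick: one chapter per book per day, several books in flight.
BOOK_WORDS, CHAPTER_WORDS = 90_000, 3_000  # hypothetical sizes
print(f"~{BOOK_WORDS / CHAPTER_WORDS:.0f} days per book")  # ~30 days, about a month
```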

I think this has additional benefits beyond just getting to spend more time with the ideas. It lets the ideas in each book come up against each other in ways they otherwise might not. I sometimes notice patterns I would otherwise have missed, because things are made simultaneously salient that otherwise would not be. And as a result I think I understand what I read better, because it gets the chance not just to sink in over days but to sink in alongside other material that makes my memory of it richer and more connected.

So my advice, if you're willing to try it, is to read multiple books, blogs, etc. concurrently, only reading a bit of each one each day, and let your reading span weeks and months so you can soak in what you read more deeply rather than letting it burn bright and fast through your mind to be forgotten like a used up candle.

Comment by gworley on Here is why most advice you hear that seems good, but "just doesn't work" from my unique perspective as a data scientist, as well as some that should actually work. · 2020-02-17T18:57:18.955Z · score: 4 (2 votes) · LW · GW

Welcome to LessWrong!

Given the content of your post, you might find these posts interesting:

Comment by gworley on G Gordon Worley III's Shortform · 2020-02-14T01:09:00.457Z · score: 4 (2 votes) · LW · GW

A few months ago I found a copy of Staying OK, the sequel to I'm OK—You're OK (the book that probably did the most to popularize transactional analysis), on the street near my home in Berkeley. Since I had previously read Games People Play and had not thought about transactional analysis much since, I scooped it up. I've just gotten around to reading it.

My recollection is that Games People Play is the better book (based on what I've read of Staying OK so far). Also, transactional analysis is kind of in the water in ways that are hard to notice, so you are probably already familiar with some of its ideas, though probably not explicitly in a way you could use to build new models (for example, as far as I can tell the notions of strokes and life scripts were popularized by, if not fully originated within, transactional analysis). So if you aren't familiar with transactional analysis, I recommend learning a bit about it. Although it's a bit dated and we arguably have better models now, it's still useful for noticing patterns in how people interact with others and with themselves, sort of like how the most interesting thing about Metaphors We Live By is simply pointing out the metaphors and recognizing their presence in speech, rather than whether the general theory is maximally good.

One thing that struck me as I read Staying OK is its discussion of the trackback technique. I can't find anything detailed online about it beyond a very brief summary. It's essentially a multi-step process for dealing with conflicts in internal dialogue, "conflict" here being a technical term referring to crossed communication in the transactional analysis model of the psyche. Or at least that's how it's presented. Looking at it a little closer and reading through examples in the book that are not available online, it's really just poorly explained memory reconsolidation. To the extent it works as a method in transactional analysis therapy, it seems to work because it taps into the same mechanisms as Unlocking the Emotional Brain.

I think this is interesting both because it shows how we've made progress and because it shows that transactional analysis (along with a lot of other things) was also getting at stuff that works, just less effectively, because it had weaker evidence to build on that was more confounded with other possible mechanisms. To me this counts as evidence that building theory on phenomenological evidence can work and is better than nothing, but will be supplanted by work that manages to tie in "objective" evidence.

Comment by gworley on A Variance Indifferent Maximizer Alternative · 2020-02-13T20:09:05.666Z · score: 2 (1 votes) · LW · GW

First, thanks for posting about this even though it failed. Success is built out of failure, and it's helpful to see it so that it's normalized.

Second, I think part of the problem is that there still aren't enough constraints on learning. As others have noticed, this mostly seems to weaken the optimization pressure so that the agent is slightly less likely to do something we don't want, but it doesn't actively shape it into something that does the things we do want and not those we don't.

Third and finally, what this most reminds me of is impact measures. Not in the specific methodology, but in the spirit of the approach. Given what motivated you to look for and develop this method, impact measures might be an interesting direction for you to explore.

Comment by gworley on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T19:44:11.834Z · score: 5 (3 votes) · LW · GW

As Stuart previously recognized with the anchoring bias, it's probably worth keeping in mind that any bias is only a "bias" against some normative backdrop. Without some standard for how reasoning was supposed to turn out, there are no biases, only the way things happened to work.

Thus things look confusing around confirmation bias because it only becomes a bias when it produces reasoning whose result fails to predict reality after the fact. Otherwise it's just correct reasoning based on priors.
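
Here's a minimal sketch of what I mean, with made-up numbers: a strong prior plus Bayes' rule keeps confidence high even after mildly disconfirming evidence, and that only counts as "bias" if the hypothesis turns out false:

```python
# Toy Bayesian update: with a strong prior, mixed evidence leaves the
# posterior high, which can look like "confirmation bias" after the fact.
# All numbers here are made up for illustration.

def update(prior, lik_h, lik_not_h):
    """Posterior P(H|E) via Bayes' rule for a binary hypothesis."""
    joint_h = prior * lik_h
    joint_not_h = (1 - prior) * lik_not_h
    return joint_h / (joint_h + joint_not_h)

p = 0.9                   # strong prior that H is true
p = update(p, 0.3, 0.6)   # observe evidence favoring not-H 2:1
print(round(p, 3))        # 0.818: still confident in H -- correct Bayesian
                          # reasoning, only a "bias" if H turns out false
```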

Comment by gworley on Suspiciously balanced evidence · 2020-02-12T21:30:45.613Z · score: 2 (1 votes) · LW · GW

Yeah, I think #1 sounds right to me, and there is nothing strange about it.

Comment by gworley on Writeup: Progress on AI Safety via Debate · 2020-02-12T19:54:07.020Z · score: 11 (5 votes) · LW · GW

I don't recall seeing anything addressing this directly: has there been any progress towards dealing with concerns about Goodharting in debate, and otherwise the risk of mesa-optimization in the debate approach? The typical risk scenario is something like: training debate creates AIs good at convincing humans rather than at convincing humans of the truth, and once you leave the training set of questions where the truth can be reasonably determined independently of the debate mechanism, we'll experience what amounts to a treacherous turn, because the debate training process accidentally optimized for a different target (convince humans) than the one intended (convince humans of true statements).

For myself this continues to be a concern that seems inadequately addressed, and it makes me nervous about the safety of debate, much less its adequacy as a safety mechanism.

Comment by gworley on What can the principal-agent literature tell us about AI risk? · 2020-02-12T19:39:57.627Z · score: 5 (2 votes) · LW · GW
Nevertheless, extensions to PAL might still be useful. Agency rents are what might allow AI agents to accumulate wealth and influence, and agency models are the best way we have to learn about the size of these rents. These findings should inform a wide range of future scenarios, perhaps barring extreme ones like Bostrom/Yudkowsky.

For myself, this is the most exciting thing in this post—the possibility of taking the principal-agent model and using it to reason about AI even if most of the existing principal-agent literature doesn't provide results that apply. I see little here to make me think the principal-agent model wouldn't be useful, only that it hasn't been used in ways that are useful to AI risk scenarios yet. It seems worthwhile, for example, to pursue research on the principal-agent problem with some of the adjustments to make it better apply to AI scenarios, such as letting the agent be more powerful than the principal and adjusting the rent measure to better work with AI.

Maybe this approach won't yield anything (as we should expect on priors, simply because most approaches to AI safety are likely not going to work), but it seems worth exploring further on the chance it can deliver valuable insights, even if, as you say, the existing literature doesn't offer much that is directly useful to AI risk now.

Comment by gworley on Suspiciously balanced evidence · 2020-02-12T19:12:21.370Z · score: 4 (2 votes) · LW · GW

An additional possibility: everything already adds up to normality, we're just failing to notice because of how we're framing the question (in this case, whether or not holding middling probability estimates for difficult and controversial statements is correct).

Comment by gworley on What are the risks of having your genome publicly available? · 2020-02-12T01:36:06.242Z · score: 8 (5 votes) · LW · GW

I'm not sure, but my guess is that most of the risk lies in the future, i.e. the risks are in things that might be possible to do later that aren't possible to do now. I say this both because it doesn't seem very dangerous right now and because I can imagine ways in which it would be dangerous, albeit as an outsider to biology, epidemiology, and genetics.

Comment by gworley on A Cautionary Note on Unlocking the Emotional Brain · 2020-02-10T18:44:01.458Z · score: 5 (2 votes) · LW · GW
If you go through a belief update process and it feels like the wrong belief got confirmed, the fact that you feel like the wrong belief won means that there's still some other belief in your brain disagreeing with that winner. In those kinds of situations, if I am approaching this from a stance of open exploration, I can then ask "okay, so I did this update but some part of my mind still seems to disagree with the end result; what's the evidence behind that disagreement, and can I integrate that"?

I sometimes find that memories and the beliefs about the world that they power are "stacked" several layers deep. It's rare to find a memory directly connected to a mistaken ground belief, and it's more normal that 2, 3, 4, or even 5 memories are all interacting through twists and turns to produce whatever knotted and confused sense of the world I have.

Comment by gworley on Potential Research Topic: Vingean Reflection, Value Alignment and Aspiration · 2020-02-07T02:02:23.277Z · score: 3 (2 votes) · LW · GW

This is an interesting way to frame things. I have plenty of experience with what you're calling aspiration here via deliberate practices over the past 5 years or so that have caused me to transform in ways I wanted while not understanding how to get there. For example, when I started zen practice I had some vague idea of what I was there to do or get—get "enlightened", be more present, be more capable, act more naturally, etc.—but I didn't really understand how to do it or even what it was I was really going for. After all, if I had really understood it, I would have already been doing it. It's only through a very slow process of experimenting, trying, being nudged in directions, and making very short moves towards nearby attractors that I've over time come to better understand some of these things, or understand why I was confused and what the thing I thought I wanted really was, without being skewed by my previous perceptions of it.

I think much of the problem with the kind of approach you are proposing is figuring out how to turn this into something a machine can do. That is, right now it's understood and explained at a level that makes sense for humans, but how do we take those notions, turn them into something mathematically precise enough that we could instruct a machine to carry them out, and then evaluate whether or not what it did was in fact what we intended? I realize you are just pointing out the idea and not claiming to have it all solved, so this is only to say that I expect much of the hard work here is figuring out the core, natural feature of what's going on with aspiration such that it can be used to design an AI that can do it.

Comment by gworley on Category Theory Without The Baggage · 2020-02-05T21:42:45.292Z · score: 4 (2 votes) · LW · GW

This is maybe only useful to me and a handful of other folks, but I sometimes think of category theory as a generalization of topology. Rather than topologies you have categories, and rather than liftings you have functors. (This is not a strict mathematical generalization, but an intuitive one: the picture in my head of how topology works generalizes to how category theory works, so don't come at me about it not being strictly true.)
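
To gesture at the analogy slightly more concretely (this is my gloss, not a rigorous claim): a continuous map is the structure-preserving map between topological spaces, and a functor is the analogous structure-preserving map between categories:

```latex
% Topology: a continuous map preserves topological structure
f : X \to Y \ \text{continuous} \iff f^{-1}(U) \in \tau_X \ \text{for all } U \in \tau_Y

% Category theory: a functor preserves categorical structure
F : \mathcal{C} \to \mathcal{D}, \quad
F(\mathrm{id}_A) = \mathrm{id}_{F(A)}, \quad
F(g \circ f) = F(g) \circ F(f)
```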

Comment by gworley on Category Theory Without The Baggage · 2020-02-05T21:39:05.333Z · score: 2 (1 votes) · LW · GW

The advantage, it seems to me, is that category theory lets you account for more stuff within the formalism.

Case in point: I did my dissertation in graph theory, and what I can tell you from proving hundreds of statements about graphs is that most of the theory of how and why graphs behave in certain ways exists outside what can be captured formally by the theory. This is often what frustrates people about graph theory and all of combinatorics: the proofs generally come not by piling up lots of previous results but by seeing through to some fundamental truth about what is going on with the collection ("category") of graphs you are interested in and then using one of a handful of proof tricks ("functors") to transform one thing into another to get your proof.

Comment by gworley on Philosophical self-ratification · 2020-02-04T23:33:14.958Z · score: 2 (1 votes) · LW · GW
"The conclusion does not necessarily follow from the premises."
Huh? If Beth's brain can't reason about the world then she can't know that humans are stimulus-response engines. (I'm not concerned with where in her brain the reasoning happens, just that it happens in her brain somewhere)

I was going to object a bit to this example, too, but since you're already engaged with it here I'll jump in.

I think reasoning about these theories as saying humans are "just" stimulus-response engines strawmans some of them. I feel similarly about the mental nonrealism example. In both cases there are better versions of these theories that aren't so easily shown to be non-self-ratifying, although I realize you wanted versions here for illustrative purposes. It's just a complication of mentioning classes of theories where only the "worst" version serves as an example: doing so is likely to raise objections from readers who fail to notice that the criticism is isolated to only the worst version.

Comment by gworley on The case for lifelogging as life extension · 2020-02-03T20:25:05.421Z · score: 2 (1 votes) · LW · GW

Reviewing the comments I see only a glancing mention, so I'll point out that the leading projects I'm aware of in this space are those by the Terasem folks (Terasem might be poorly explained as a "transhumanist religion"). For example, I recall them doing some work on things like questions to elicit responses that might be particularly useful in reconstructing a person from writing, etc., and then also doing some work on persistent storage of the resulting data.

I still expect something like writing ourselves into the future to be useful, whether or not it takes the form of lifelogging. For myself, it's an excuse to write about my experiences and thoughts on social media: truly a lifelog for the ages.

Comment by gworley on What Money Cannot Buy · 2020-02-03T20:08:47.597Z · score: 21 (10 votes) · LW · GW
The real experts will likely spend a bunch of time correct popular misconceptions, which the fakers may subscribe to. By contrast, the fakers will generally not bother "correcting" the truth to their fakery, because why would they? They're trying to sell to unreflective people who just believe the obvious-seeming thing; someone who actually bothered to read corrections to misconceptions at any point is likely too savvy to be their target audience.

Using this as a heuristic would often backfire on you as stated, because there's a certain class of snake oil salesmen who use the conceit of correcting popular misconceptions to sell you on their own, unpopular misconceptions (and of course the product that fits them!). To me it looks like it's exploiting the same kind of psychological mechanism that powers conspiracy theories, where the world is seen as full of hidden knowledge that "they" don't want you to know because the misinformation is letting "them" get rich or whatever. And I think part of the reason this works is that it pattern matches to cases where someone who thought everyone else was wrong really was right, even though such cases are rare.

In short, you are more likely to be encountering a snake oil salesman than a Galileo or a Copernicus or a Darwin, so spending a lot of time "correcting" popular misconceptions is probably not a reliable signal of real competence as opposed to fakery.

Comment by gworley on George's Shortform · 2020-02-03T19:04:38.618Z · score: 8 (2 votes) · LW · GW

This line of thinking links up (in my mind) with something slightly different that I've thought about before, which is how you create a community where people aren't afraid to be themselves, risk saying wrong things, and are willing to listen to others. I think there is some convergence with the signaling concern, because much of signaling comes from trying to present a view to others that might not quite be true or authentic, or that, even if true, emphasizes certain things differently than the poster naturally would, creating a kind of mask or facade where the focus is on signaling well rather than on being oneself, saying wrong things, etc.

I think the solution is generally what I would call "safety" or "psychological safety": people often feel unsafe in a wide variety of situations, don't always realize they have deep, hidden fear powering their actions, and don't know how to ask for more safety without risking giving up the little bit they are already creating for themselves by signaling, being defensive, and otherwise not being themselves to protect themselves from threats real or merely perceived.

I've seen the amazing benefits of creating safety in organizations and the kind of collaboration and happiness it can enable, but I'm less sure about how to do it in a large, online community. I like this kind of exploration of potential mechanisms for, as I think of it, creating enough safety to enable doing the things we really care about (being heard, collaborating, feeling happy to talk to others about our ideas, etc.).

Comment by gworley on Towards deconfusing values · 2020-02-03T18:47:34.073Z · score: 2 (1 votes) · LW · GW

I basically agree with this and think it's right. In some ways you might say that focusing too much on "values" acts as a barrier to deeper investigation of the mechanisms at work here, and I think looking deeper is necessary, because I expect that optimization against the value abstraction layer alone will result in Goodharting.

Comment by gworley on [Link] Ignorance, a skilled practice · 2020-01-31T18:58:55.489Z · score: 11 (5 votes) · LW · GW
In this story, I argue, Luria’s peasants are indexical geniuses, who refuse to engage in unproven syllogistic games. They are not interested in a global, universal game. Their children, however, are easily introduced to this game by the process of schooling and literacy.

I've noticed a weaker version of this effect when interacting with people especially not like me, for example in my zen practice. By "not like me" I mean not the sort of person who readily plays the "global, universal game", looking to find abstract models to explain every situation and apply them in new ones. All these people still went to school, all these people can play this game to some extent, but not to the extent I'm willing to (they didn't spend 10 years on voluntary higher education in mathematics and then work jobs where they are paid to create abstractions).

The differences are impressive. Here's just a very small sample of what I have in mind.

In the chant books for our zen center we have little marks showing where to do things like ring bells. Depending on what chants are being done that day and in what order, some of the bells change. For example, we always start with the same sequence of bells but then the transition from one chant to another can vary depending on what came before.

A few months back someone, not me, updated the chant book and thought to abstract out some of the details of this, marking in a separate section how those things work and putting in notes referencing that section. I saw it and thought "ah, finally, someone made this clearer by abstracting away the complicated details". Other people were confused, and the less like me they were the more confused they were. In the end we had to change it back.

Creating abstractions, while natural to me and some others, was extremely confusing to those on the other end of this spectrum who didn't know what to do when the details were not directly there for them to interact with. I expect this cognitive difference between people goes a long way to explaining some kinds of conflicts we see.

Comment by gworley on how has this forum changed your life? · 2020-01-31T00:40:37.650Z · score: 6 (3 votes) · LW · GW

Maybe this is just being cute, but I often think of it the other way: if I hadn't been so in need of Less Wrong, Less Wrong wouldn't exist! Any effect it has back on me is just cake.

(This is literally true to the extent that I was among the early existential risk community that was so confused it drove Eliezer to create what would become LW.)

Comment by gworley on Value uncertainty · 2020-01-31T00:32:01.528Z · score: 3 (2 votes) · LW · GW

Thinking about how these ideas are useful, you might be interested in my treatment of issues around metaethical uncertainty in this paper set to appear in the current issue of Journal of Consciousness Studies. I plan to write up something summarizing it soon on LW in light of its publication, but you can find posts that led to or were the product of some of its content here, here, here, here, and here.

Comment by gworley on Using vector fields to visualise preferences and make them consistent · 2020-01-30T19:34:49.206Z · score: 2 (1 votes) · LW · GW

Interesting. I'm, naturally, interested in how we might make sense of this in light of valence. For example, can we think of valence as determining the magnitude of a vector (including the ability to flip the vector via a negative magnitude) and direction as pointing to a place in idea/category space? Maybe this will prove a useful simplification over whatever the brain actually does with valence to help it make decisions, making reasoning about how decisions are made easier and more tractable for aligning AI.
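
Here's a toy sketch of what that simplification might look like (all names and numbers are hypothetical, not a claim about what the brain does):

```python
import numpy as np

# Toy model: a preference is a vector in "idea space" whose direction points
# at an idea/category and whose magnitude is set by valence. Negative valence
# flips the vector. Everything here is hypothetical illustration.

def preference_vector(direction, valence):
    """Unit direction in idea space, scaled (and possibly flipped) by valence."""
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    return valence * d

toward_cake = preference_vector([1.0, 0.0], valence=+0.8)  # attraction
toward_pain = preference_vector([0.0, 1.0], valence=-0.5)  # aversion (flipped)

net = toward_cake + toward_pain  # crude aggregate preference at this point
print(net)                       # [ 0.8 -0.5]
```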

Comment by gworley on Towards deconfusing values · 2020-01-30T18:53:05.763Z · score: 4 (2 votes) · LW · GW

In some sense that's a direction I might be moving in with my thinking. But there is still something that humans identify as values and care about, so I expect there is some real phenomenon going on that needs to be considered to get good outcomes; the default remains a bad outcome if we don't pay attention to whatever it is that makes humans care about stuff. I expect most work today on value learning is not going to get us where we want to go because it's working with the wrong abstractions, and my goal in this work is to dissolve those abstractions and find better ones for our long-term purposes.

Comment by gworley on Raemon's Scratchpad · 2020-01-30T01:08:39.859Z · score: 6 (3 votes) · LW · GW

Yeah, I think anything that adds a meaningful speedbump to any voting operation other than weak upvote is likely a step in the right direction of reshaping incentives.

Comment by gworley on Raemon's Scratchpad · 2020-01-29T19:36:53.562Z · score: 5 (3 votes) · LW · GW

A mechanism I really like is making certain kinds of votes scarce. I've appreciated it when it was a feature on other sites I've used, and I think it improved things there.

For example, Stack Overflow lets you spend karma in various ways. Two that come to mind:

  • downvotes cost karma (a downvote causing -5 karma costs the downvoter 2 karma)
  • you can pay karma to get attention (you can effectively super strong upvote your own posts, but you pay karma to do it)

Ways this or something similar might work on LW:

  • you get a budget of strong votes (say 1 per day) that you can save and spend how you like but you can't strong upvote everything
  • you get a budget of downvotes
  • strong votes cost karma
  • downvotes cost karma

I like this because it at least puts a brake on the excess use of votes in fights and otherwise makes these signals more valuable when they are used, because they are not free like they are now. A rough sketch of such a vote economy is below.
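
Here's a rough sketch of the kind of vote economy I have in mind (all names, costs, and budgets are hypothetical, not a proposal for exact values):

```python
# Minimal sketch of a karma-budget vote economy along the lines suggested
# above. All names, costs, and budgets here are hypothetical.

class VoteBudget:
    STRONG_VOTES_PER_DAY = 1  # hypothetical daily allowance of strong votes
    DOWNVOTE_COST = 1         # hypothetical karma cost per downvote

    def __init__(self, karma):
        self.karma = karma
        self.strong_votes = 0  # accrued allowance, can be saved up

    def accrue_daily(self):
        self.strong_votes += self.STRONG_VOTES_PER_DAY

    def strong_upvote(self):
        if self.strong_votes < 1:
            raise ValueError("no strong votes left; save up or weak upvote")
        self.strong_votes -= 1

    def downvote(self):
        if self.karma < self.DOWNVOTE_COST:
            raise ValueError("not enough karma to downvote")
        self.karma -= self.DOWNVOTE_COST  # the signal costs the voter something

user = VoteBudget(karma=100)
user.accrue_daily()
user.strong_upvote()  # spends the day's strong vote
user.downvote()       # karma: 100 -> 99
```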

Comment by gworley on On hiding the source of knowledge · 2020-01-29T19:05:15.506Z · score: 2 (1 votes) · LW · GW

That all sounds like part of the same cluster of mental movements to me, i.e. all the stuff that isn't deliberative.

Comment by gworley on On hiding the source of knowledge · 2020-01-29T00:28:35.153Z · score: 4 (2 votes) · LW · GW

Oh man, so much this. I feel like many of the differences in how much different groups respect the writing of different authors come down to differences in what counts as an acceptable justification for how you know something and how you came to know it. For example, one of the things I hated about working as a mathematician was that publications explicitly want you to cut out most of the explanation of how you figured something out and just stick to the minimum necessary description. There's something elegant about doing that, but it also works against helping people develop their mathematical intuitions such that they can follow after you or know why what you're doing might matter.

Comment by gworley on On hiding the source of knowledge · 2020-01-29T00:24:29.780Z · score: 6 (3 votes) · LW · GW
but it’s not at all what people usually mean when they talk about ‘intuition’.

In my own case, Jessica's use of "intuition" immediately and exactly matches my own, and I expect that is what most people usually mean when they talk about intuition, so I think this claim requires greater justification if it seems important to you, given the sample size of 3 here.

Comment by gworley on G Gordon Worley III's Shortform · 2020-01-28T22:55:01.007Z · score: 4 (3 votes) · LW · GW

Most of my most useful insights come not from realizing something new and knowing more, but from realizing something ignored and being certain of less.

Comment by gworley on Hedonic asymmetries · 2020-01-28T02:03:21.887Z · score: 5 (4 votes) · LW · GW
We get bored. If we don’t get bored, we still don’t like the idea of joy without variety.

For what it's worth, I think this popular sentiment is misplaced. Joy without variety is just as good as joy with variety, but most people either never experience it or fail to realize they are experiencing it, and so never learn that the two are equally good. Instead I think most people experience rare moments of peak joy and, since they don't understand how those moments happened, come to believe that joy comes from variety itself rather than from anywhere else.

Comment by gworley on What research has been done on the altruistic impact of the usual good actions? · 2020-01-28T00:53:47.971Z · score: 9 (4 votes) · LW · GW

I expect there is also a nonlinear network effect going on here, where one person defecting isn't much of a problem but if 20% of the population isn't nice enough then things just erode down to a worse equilibrium. Something like the social fabric delivers a lot of value that is hard to attribute to any one individual, and if it breaks or changes in ways that leave everyone's lives worse, it's likewise hard to account for who is responsible for those losses and in what amounts.

Comment by gworley on Healing vs. exercise analogies for emotional work · 2020-01-27T22:02:29.856Z · score: 5 (2 votes) · LW · GW

Strange, I find cleaning a pleasing activity.

Comment by gworley on Safety regulators: A tool for mitigating technological risk · 2020-01-27T21:17:20.257Z · score: 2 (1 votes) · LW · GW

How do you see the safety regulator model working in a case like bridges, where safety is already part of the primary function of the system? A bridge is built to optimize for getting people across a gap they couldn't otherwise cross, and being better at being a bridge (getting more people across) means being safer (fewer people fail to make it across for deadly reasons). It's not entirely clear where we might draw the line to demarcate a safety regulator in such cases where safety is naturally part of the function.

Comment by gworley on Healing vs. exercise analogies for emotional work · 2020-01-27T20:55:38.221Z · score: 5 (2 votes) · LW · GW

Resharing and expanding on a comment I left on this on Facebook.

In Zen, cleaning is used as a metaphor for the incompleteness of practice. This is very familiar to Zen practitioners because one of the main things we do during work practice is clean the practice space, and as anyone who has cleaned anything knows, you can wipe away all the dirt, but more will appear and you'll have to clean again.

This seems relevant as an alternative metaphor that can be used instead of healing or exercise.

Comment by gworley on G Gordon Worley III's Shortform · 2020-01-27T20:26:09.234Z · score: 2 (1 votes) · LW · GW
But I'm also unsure whether or why my acceptance of closed-empty existence makes you sad.

Because I know the joy of grokking the openness of the "individual" and see the closed approach creating inherent suffering (via wanting for the individual) that cannot be accepted because it seems to be part of the world.

Comment by gworley on G Gordon Worley III's Shortform · 2020-01-27T20:22:59.570Z · score: 2 (1 votes) · LW · GW

My quick response is that all of these sources of loneliness can still be downstream of using closed individualism as an intuitive model. The more I am able to use the open model, the more safe I feel in any situation and the more connected I feel to others no matter how similar or different they are to me. Put one way: every stranger is a cousin I haven't met yet, and just knowing on a deep level that the world is full of cousins is reassuring.

Comment by gworley on G Gordon Worley III's Shortform · 2020-01-26T05:00:13.505Z · score: 6 (3 votes) · LW · GW

NB: There's something I feel sad about when I imagine what it's like to be others, so I'm going to ramble about it a bit in shortform because I'd like to say this and possibly say it confusingly rather than not say it at all. Maybe with some pruning this babble can be made to make sense.

There's a certain strain of thought and thinkers in the rationality community that make me feel sad when I think about what it must be like to be them: the "closed" individualists. This is as opposed to people who view personal identity as either "empty" or "open".

I'll let Andrés of QRI explain all too briefly:

Closed Individualism: You start existing when you are born, and stop when you die.
Empty Individualism: You exist as a “time-slice” or “moment of experience.”
Open Individualism: There is only one subject of experience, who is everyone.

I might summarize the positions a little differently. Closed individualism is the "naive" theory of individualism: people, agents, etc. are like islands forever separated from each other by the gulf of subjective experience, which can only be crossed by sending messages in bottles to the other islands (because you can never leave the island you are on). Empty individualism says that individualism is an after-the-fact reification, not a natural phenomenon but an illusory artifact of how we understand the world. Open individualism is a position like the one panpsychists are often trying to backpedal from: that the Universe is experiencing itself through us.

I think other positions are possible. For example, my own thinking is that it's more like seeing these all as partial views that are "right" from a certain frame of thinking but none on its own captures the whole thing. I might call my position something like dialectical empty individualism via comparison to dialectical monism (which I think is the right term to capture my metaphysical position, though neutral monism probably works just as well, ergo neutral empty individualism might be an alternative term).

Anyway, back to the sadness. Now to be fair I feel sad when I think about what it must be like to be anyone who holds tightly to a closed individualism perspective, rationalist or not, but I more often see the extremes of where the closed position takes one among rationalists. I'm making an inference here, but my guess is that a closed individualist view is a large part of what makes things like value drift scary, life extension a top priority, and game & decision theory feel vitally important not just to AI safety but to living life.

And I say all this having previously been a closed individualist for most of my life. And I'm not opposed to the closed individualist view: I'm working on problems in value alignment for AI, I'm signed up for cryonics, and I think better decision theory is worth having. After all, I think closed individualism is right, and not just partially right, but all right, up to the limit of not being willing to say it's right to the exclusion of the other perspectives. I think the closed individualism view feels real to people and is accurately describing both people's experiences of individuality and some of the phenomena that create it.

So why am I sad? In many ways, closed individualism is a view that is built on suffering. It contains within it a great loneliness for creatures like us who want desperately to connect. It says that no matter how hard we try to bridge the gap it will always remain, and many people feel that if they were given the chance to eliminate it and merge with others they wouldn't want to because then they'd lose themselves. To be a closed individualist is to live in fear: fear of death, fear of change, fear of loss of control. To me, that's sad, because there's another way.

The closed individualist might object "so what, this is a teleological argument: I might not want it to be that I am isolated and suffer, but closed individualism is what the world looks like, and I can't be hurt by what is already true, so I maintain this position is right". But I think this is wrong, because closed individualism is "wrong" in the sense that it doesn't tell the whole story. If you're looking for the theory that's the most scientifically defensible, that for sure is empty individualism, not closed individualism, but it's also very hard to get an intuitive grasp on empty individualism that you can live with and not sometimes think of yourself as a closed individual, so this tends to leave even the person who believes empty individualism is right acting as if closed individualism is how the world works.

The way out lies through open individualism, but this is a hard one to write about. Until you've felt the joy of open-hearted connectedness to all being with every fiber of existence, I think you'd have a hard time taking this view seriously, and the only way to feel this and take it seriously is probably through hundreds if not thousands of hours of meditation (you can also feel it with drugs, but I think it's more likely a person would dismiss or misunderstand the feeling as just a cool thing they felt on drugs). The "I am not it but it is me" sense you get is not really possible to explain to others; you have to see it for yourself, because it exists somewhere beyond distinction such that it can never be brought back to a world carved up into more than one whole.

So here we are, trapped in a world of suffering that persists because every closed individualist suffers and generates more suffering for the whole, because it is in all of us. Thus am I sad.

Comment by gworley on Bay Solstice 2019 Retrospective · 2020-01-19T17:11:09.994Z · score: 5 (2 votes) · LW · GW

Having had a couple days to sit with this thread, I think it's worth adding that I'm willing to participate in addressing this issue in future Solstice celebrations (so for 2020 at least). I think I'm a poor choice for lots of things related to Solstice organizing, because I'm not close enough to the core of rationalist culture to reliably drive things in ways most rationalists would like. But within the context of a team that is doing that, I think I could have a positive impact on the Solstice experience by pushing it to better incorporate the kinds of things I have in mind, in ways that would be effective at achieving their ends, though I am also happy to defer to others if they are motivated to do this and think they can succeed.

Put another way, if I just complain, point out the problem, and offer some suggestions for how to fix it, that's not enough to make change happen. Since I think this is important, it's important enough that I should try to do something about it with my actions rather than just my words.

Comment by gworley on Bay Solstice 2019 Retrospective · 2020-01-17T23:04:28.125Z · score: 12 (4 votes) · LW · GW

That was probably me in the response form.

In the previously planned post I was going to present something like the following as evidence for what I saw:

After Solstice I talked with or otherwise helped multiple people suffering as a result of having attended Solstice. One person was seriously negatively affected, and I talked with them for over an hour about it. Another person was moderately negatively affected, and I talked with them about it for about 10 minutes. I talked to 3 other people who mentioned in passing that Solstice was net-negative for them but who didn't invite further conversation on the topic. The main theme I got from these conversations is that Solstice strongly reminded these folks that they felt lonely, isolated, or ineffectual in ways I would categorize as distressing or dissonant with their sense of self.

Assuming I got what amounts to a random sample, this suggests to me there is at least a large minority—let's call it O(10%)—of people attending Solstice who are negatively impacted by it.

I also wrote up the following caveats to my evidence:

It's possible I suffer from selection bias and the situation is not as it seems to me. Perhaps by some mechanism or just chance I encountered more people suffering from having attended Solstice in 2019 than is proportional to the entire population. I have no reason to think that is especially the case but it's worth keeping in mind when I give an impression of how many people are affected in negative ways by Solstice and how much that matters.
I also am relying largely on first-hand reports people gave me of their experiences and on how much I perceived them to be suffering as inferred from those reports. I have not collected data in a systematic way, so there is probably a lot wrong with my impression if you ask it to do too much. I am only personally confident of the general direction and order of the effect size, nothing more.
Also keep in mind I can't say anything about the people who self-selected out of the main Solstice celebration because they knew from past experience with Solstice celebrations, or from expectations based on similar events, that they would have a bad time. I've talked to several people who do this over the years, so if anything their existence suggests the negative experiences of Solstice are more common than they appear, or would be more common if people didn't avoid it.

I think "aftercare" is a decent first-order approximation of what I view as the appropriate response. I think it needs to be a bit more than just "throw a party" or "here are some people you can talk to". What I have in mind is something more systematic and ritualistic.

An ineffectual version of what I have in mind is the way, towards the end of a Catholic mass, there's the rite of peace: everyone stands up, shakes hands, and says "peace be with you" to the people near them in the pews. Slightly better is the Protestant tradition of lunch fellowship or church picnic that immediately follows service, a sort of post-worship potluck meal, but much of what makes this work (or, as often as not, not) depends on the local culture and how inclusive it is.

I think a good version of this would be something I've not seen much before: a structured authentic relating activity as part of the upswing of the service. There was something like this a few years ago at a Bay Area Solstice where people wrote on notes they posted to the walls. As I recall the prompt was something like "what is something I'm privately afraid of and not telling others", although maybe I'm mixing that up from another event. I think we could come up with something similar for future events that would help people connect and remind them that they are connected, even if they can't see the face of those they are connected to.

I think none of this is to draw away from the darkness. Make the low point low and dark and full of woe. But match it with a high point of brightness and joy that actually pulls people together and connects them without backfiring and throwing in their face the way others are connected and they are not.

I think the Solstice should be "for everyone" in a certain sense, but that achieved not by watering it down, but by making it whole so that, as much as possible, it can hit the dark notes in a way where, even in the depths of despair, people retain a thread of connection to safety that pulls them back out into the light so they can dwell in the darkness for a time without being abandoned there.

Comment by gworley on Reality-Revealing and Reality-Masking Puzzles · 2020-01-17T22:37:22.179Z · score: 12 (8 votes) · LW · GW

I'd like to emphasize some things related to this perspective.

One thing that seems frustrating to me from just outside CFAR in the control group[1] is the way it is fumbling towards creating a new tradition for what I'll vaguely, for lack of a better term, call positive transformation, i.e. taking people and helping them turn themselves into better versions of themselves that they like more and that have a greater positive impact on the world (making the world more liked by themselves and others). But there are already a lot of traditions that do this, albeit with different worldviews than the one CFAR has. So it's disappointing to watch CFAR try and fail over the years in various ways, as measured by my interactions with people who have gone through its training programs, ways that were predictable if it had been more aware of and practiced with existing traditions.

This has not been helped by what I read as a disgust or "yuck" reaction from some rationalists when you try to bring in things from these traditions, because in those traditions they are confounded with things like supernatural claims. To their credit, many people have not reacted this way, but I've repeatedly felt the existence of this "guilty by association" meme from people who I consider allies in other respects. Yes, I expect on the margin some of this is amped up by the limitations of my communication skills, such that I observe more of it than others do, along with my ample willingness to put forward ideas that I think work even if they are "wrong" in an attempt to jump closer to global maxima, but I do not think the effect is so large as to discredit this observation.

I'm really excited to read that CFAR is moving in the direction implied by this post, and, because of the impact CFAR is having on the world through the people it impacts, like Romeo I'm happy to assist in what ways I can to help CFAR learn from the wisdom of existing traditions to make itself into an organization that has more positive effects on the world.


[1] This is a very tiny joke: I was in the control group for an early CFAR study and have still not attended a workshop, so in a certain sense I remain in the control group.

Comment by gworley on Bay Solstice 2019 Retrospective · 2020-01-17T21:35:05.115Z · score: 10 (8 votes) · LW · GW
On the feedback form, some people mentioned being very upset by Solstice because it reminded them that they were lonely or felt like they could be accomplishing more. I do not think anything should change about Solstice itself in response to this feedback, because being reminded that the universe is vast and dark and cold is pretty much the entire point of Solstice. 

I started writing a post around this aspect of Solstice, and I may come back to it, but since you bring it up here I think it's worth addressing in a comment.

If I'm being frank, I think this is a woefully inadequate and irresponsible response. I don't mean that as an attack on you or the organizers in any particular year, but rather as a statement against the general pattern of behavior being manifested. A rationalist culture with more Hufflepuff virtue would not think it okay to remind people of something distressing and then offer them nothing to deal with the distress.

Some more, somewhat disjointed and rambly thoughts on all this:

One of the effects of Solstice is that it makes salient thoughts and memories of loss and loneliness, generates negative-affect emotions, and otherwise affects people in powerful ways. These effects, especially for those who do not have a lot of psychological safety, range from producing mild negative affect to causing trauma or trauma-like experiences to causing psychological deintegration. Failing to address this and just saying "ehh, intended effect" is, at the risk of sounding hyperbolic, similar in my mind to encouraging someone to engage in a physical activity that is likely to cause them injury and then saying "oh well, I guess find your own way to the hospital" when they inevitably get hurt. Or, for a more mundane example that I think illustrates the same principle, it's like telling a friend you want to hang out with them at their house to lead them through a messy activity and then, when the inevitable mess appears, saying "okay, well, time to leave, I'm sure you'll clean it up".

I realize I'm making a claim here about what is morally/ethically right. I view it as important to take responsibility for the consequences of our actions, and if we put on an event that regularly and predictably causes negative psychological impact on a portion of the attendees, we have a responsibility to those attendees to help them deal with the fallout.

An alternative would be to more actively encourage such folks not to attend, but I think this is antithetical to how I understand the Solstice (it's a highly inclusive event, and we'd cause people pain if we excluded them), so I think we need to work towards helping people reintegrate after they may have had old or active psychic wounds torn open by the event. I think this can be done as part of the ceremony, although a more complete solution would be changing the culture such that the downswing of Solstice didn't drop people out from a place where they already felt unsafe. Lacking that, I think careful ritual design during the dawn/new day part of the arc to help people connect and see the path forward would go a long way toward addressing this issue.

For context, much of my thinking here comes from the way spiritual traditions help or fail to help people deal with the consequences of the insights they may gain from interacting with the tradition. This has been a problem in, for example, Western Buddhism, where people may teach meditation but not be equipped or prepared to help, or at least to help get help for, people whose lives get worse (sometimes dramatically so) as a result of meditating. The rationalist project, even though it has a very different worldview and objectives, shares with some spiritual traditions an intention to help people better their lives through transformative practice, but I also see it not doing nearly as much as it could or, in my opinion, should to help those it unintentionally but predictably hurts by teaching its methods, and the situation with those hurt by Winter Solstice seems one more manifestation of this pattern. I would like us to do better at Winter Solstice as a way of shifting towards a better pattern.

Comment by gworley on Can we always assign, and make sense of, subjective probabilities? · 2020-01-17T18:49:39.383Z · score: 4 (3 votes) · LW · GW

I think this is possibly rehashing the main point of disagreement between frequentists and subjectivists, i.e. whether probability is only sensible after the fact or whether it is also meaningful to talk about probabilities before any data is available. I'm not sure this debate will ever end, but I can tell you that LW culture leans subjectivist, specifically along Bayesian lines.
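
For a minimal illustration of the subjectivist stance (all numbers here are made up): you can state a prior before seeing any data, then update on evidence as it arrives:

```python
# Toy illustration of the subjectivist (Bayesian) stance: assign a prior
# before any data, then update with Bayes' rule as evidence arrives.
# All numbers are made up for illustration.

def posterior(prior, lik_h, lik_alt):
    """Posterior P(H|E) for a binary hypothesis via Bayes' rule."""
    return prior * lik_h / (prior * lik_h + (1 - prior) * lik_alt)

# Hypothesis H: the coin is biased 90% heads; alternative: it's fair.
p_h = 0.5                 # subjective prior, stated before any flips
for flip in "HHTH":       # now data arrives, one flip at a time
    lik_h = 0.9 if flip == "H" else 0.1
    lik_alt = 0.5
    p_h = posterior(p_h, lik_h, lik_alt)

print(round(p_h, 3))      # 0.538: degree of belief after the evidence
```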

Comment by gworley on Malign generalization without internal search · 2020-01-15T23:29:12.762Z · score: 2 (1 votes) · LW · GW
However, it's worth noting that saying the agent is mistaken about the state of the world is really an anthropomorphization. It was actually perfectly correct in inferring where the red part of the world was -- we just didn't want it to go to that part of the world. We model the agent as being 'mistaken' about where the landing pad is, but it works equally well to model the agent as having goals that are counter to ours.

That we can flip our perspective like this suggests to me that thinking of the agent as having different goals is likely still anthropomorphic, or at least teleological, reasoning that results from us modeling this agent as having dispositions it doesn't actually have.

I'm not sure what to offer as an alternative since we're not talking about a category where I feel grounded enough to see clearly what might be really going on, much less offer a more useful abstraction that avoids this problem, but I think it's worth considering that there's a deeper confusion here that this exposes but doesn't resolve.

Comment by gworley on Book review: Rethinking Consciousness · 2020-01-14T22:18:22.957Z · score: 3 (2 votes) · LW · GW
Now put the two together, and you get an "attention schema", an internal model of attention (i.e., of the activity of the GNW). The attention schema is supposedly key to the mystery of consciousness.

The idea of an attention schema helps make sense of a thing talked about in meditation. In Zen we sometimes talk about it via the metaphor of the mind as a mirror that sees itself reflected in itself. In The Mind Illuminated it's referred to as metacognitive awareness. The point is that the process by which the mind operates can be observed by itself even as it operates, and perhaps the attention schema is an important part of what it means to do that, specifically by the attention schema coming to model itself.

Comment by gworley on What is the relationship between Preference Learning and Value Learning? · 2020-01-14T20:23:20.655Z · score: 3 (2 votes) · LW · GW

The short answer is that yes, they are related and basically about the same thing. However the approaches of researchers vary a lot.

Relevant considerations that come to mind:

  • The extent to which values/preferences are legible
  • The extent to which they are discoverable
  • The extent to which they are hidden variables
  • The extent to which they are normative
  • How important immediate implementability is
  • How important extreme optimization is
  • How important safety concerns are

The result is that I think there is something of a divide between safety-focused researchers and capabilities-focused researchers in this area due to different assumptions, and that makes each cluster's work not very interesting or relevant to the other.

Comment by gworley on Ascetic aesthetic · 2020-01-14T20:14:38.396Z · score: 5 (3 votes) · LW · GW

So I think you are right about the way aesthetics powers ethical reasoning, and I think aesthetics is just a waypoint on the causal path that generates ethical judgments, because aesthetics is ultimately about what we value (how we compare things for various purposes), and what we value is a function of valence. So to the extent I agree, it's to the extent that I see ethics and aesthetics as applications of valence to different domains.