Deconfusing Human Values Research Agenda v1 2020-03-23T16:25:27.785Z · score: 18 (6 votes)
Robustness to fundamental uncertainty in AGI alignment 2020-03-03T23:35:30.283Z · score: 11 (3 votes)
Big Yellow Tractor (Filk) 2020-02-18T18:43:09.133Z · score: 12 (4 votes)
Artificial Intelligence, Values and Alignment 2020-01-30T19:48:59.002Z · score: 13 (4 votes)
Towards deconfusing values 2020-01-29T19:28:08.200Z · score: 13 (5 votes)
Normalization of Deviance 2020-01-02T22:58:41.716Z · score: 57 (21 votes)
What spiritual experiences have you had? 2019-12-27T03:41:26.130Z · score: 22 (5 votes)
Values, Valence, and Alignment 2019-12-05T21:06:33.103Z · score: 12 (4 votes)
Doxa, Episteme, and Gnosis Revisited 2019-11-20T19:35:39.204Z · score: 14 (5 votes)
The new dot com bubble is here: it’s called online advertising 2019-11-18T22:05:27.813Z · score: 55 (21 votes)
Fluid Decision Making 2019-11-18T18:39:57.878Z · score: 9 (2 votes)
Internalizing Existentialism 2019-11-18T18:37:18.606Z · score: 10 (3 votes)
A Foundation for The Multipart Psyche 2019-11-18T18:33:20.925Z · score: 7 (1 votes)
In Defense of Kegan 2019-11-18T18:27:37.237Z · score: 10 (5 votes)
Why does the mind wander? 2019-10-18T21:34:26.074Z · score: 11 (4 votes)
What's your big idea? 2019-10-18T15:47:07.389Z · score: 29 (15 votes)
Reposting previously linked content on LW 2019-10-18T01:24:45.052Z · score: 18 (3 votes)
TAISU 2019 Field Report 2019-10-15T01:09:07.884Z · score: 39 (20 votes)
Minimization of prediction error as a foundation for human values in AI alignment 2019-10-09T18:23:41.632Z · score: 13 (7 votes)
Elimination of Bias in Introspection: Methodological Advances, Refinements, and Recommendations 2019-09-30T20:23:13.139Z · score: 16 (3 votes)
Connectome-specific harmonic waves and meditation 2019-09-30T18:08:45.403Z · score: 12 (10 votes)
Goodhart's Curse and Limitations on AI Alignment 2019-08-19T07:57:01.143Z · score: 15 (7 votes)
G Gordon Worley III's Shortform 2019-08-06T20:10:27.796Z · score: 16 (2 votes)
Scope Insensitivity Judo 2019-07-19T17:33:27.716Z · score: 25 (10 votes)
Robust Artificial Intelligence and Robust Human Organizations 2019-07-17T02:27:38.721Z · score: 17 (7 votes)
Whence decision exhaustion? 2019-06-28T20:41:47.987Z · score: 17 (4 votes)
Let Values Drift 2019-06-20T20:45:36.618Z · score: 3 (11 votes)
Say Wrong Things 2019-05-24T22:11:35.227Z · score: 99 (36 votes)
Boo votes, Yay NPS 2019-05-14T19:07:52.432Z · score: 34 (11 votes)
Highlights from "Integral Spirituality" 2019-04-12T18:19:06.560Z · score: 20 (21 votes)
Parfit's Escape (Filk) 2019-03-29T02:31:42.981Z · score: 40 (15 votes)
[Old] Wayfinding series 2019-03-12T17:54:16.091Z · score: 9 (2 votes)
[Old] Mapmaking Series 2019-03-12T17:32:04.609Z · score: 9 (2 votes)
Is LessWrong a "classic style intellectual world"? 2019-02-26T21:33:37.736Z · score: 31 (8 votes)
Akrasia is confusion about what you want 2018-12-28T21:09:20.692Z · score: 27 (16 votes)
What self-help has helped you? 2018-12-20T03:31:52.497Z · score: 34 (11 votes)
Why should EA care about rationality (and vice-versa)? 2018-12-09T22:03:58.158Z · score: 16 (3 votes)
What precisely do we mean by AI alignment? 2018-12-09T02:23:28.809Z · score: 29 (8 votes)
Outline of Metarationality, or much less than you wanted to know about postrationality 2018-10-14T22:08:16.763Z · score: 19 (17 votes)
HLAI 2018 Talks 2018-09-17T18:13:19.421Z · score: 15 (5 votes)
HLAI 2018 Field Report 2018-08-29T00:11:26.106Z · score: 51 (21 votes)
A developmentally-situated approach to teaching normative behavior to AI 2018-08-17T18:44:53.515Z · score: 12 (5 votes)
Robustness to fundamental uncertainty in AGI alignment 2018-07-27T00:41:26.058Z · score: 7 (2 votes)
Solving the AI Race Finalists 2018-07-19T21:04:49.003Z · score: 27 (10 votes)
Look Under the Light Post 2018-07-16T22:19:03.435Z · score: 25 (11 votes)
RFC: Mental phenomena in AGI alignment 2018-07-05T20:52:00.267Z · score: 13 (4 votes)
Aligned AI May Depend on Moral Facts 2018-06-15T01:33:36.364Z · score: 9 (3 votes)
RFC: Meta-ethical uncertainty in AGI alignment 2018-06-08T20:56:26.527Z · score: 18 (5 votes)
The Incoherence of Honesty 2018-06-08T02:28:59.044Z · score: 22 (12 votes)
Safety in Machine Learning 2018-05-29T18:54:26.596Z · score: 18 (5 votes)


Comment by gworley on Openness Norms in AGI Development · 2020-03-31T03:07:33.671Z · score: 5 (3 votes) · LW · GW

This is a little bit beside the point, but when I clicked on this link and saw "communist norm in AGI" in the title I predicted this would be some kind of low-quality post arguing about AGI development from the standpoint of a political philosophy. Then I read the post and it turns out that's not what "communist norm" means here; it's jargon that has specific meaning that wasn't what I expected. My guess is others might react to the title in the same way.

So although your title is accurate, my guess is that your post will be better received if you give this term a different name that sounds less like it's associated with a political philosophy, like maybe calling it an "openness norm", at least initially before introducing and switching to the technical term "communist norm". "Openness norm" seems to my ear a better description of what you're talking about even if it's not the technical term normally used.

Comment by gworley on What happens in a recession anyway? · 2020-03-31T02:55:52.806Z · score: 4 (2 votes) · LW · GW

My impression is that each recession looks different because a recession is something that happens at the macro scale but is caused by events at the micro scale, and there are lots of different ways to cause a recession such that I expect it to be hard to generalize well across recessions other than "less economic activity in aggregate". Not that there are no patterns, but I also expect there to be a lot of anti-patterns or seeming anti-patterns because if you just look at "recession" you'll get a grab bag of things that aren't causally similar even if they produced the same outcome.

Comment by gworley on A critical agential account of free will, causation, and physics · 2020-03-30T22:21:46.806Z · score: 2 (1 votes) · LW · GW

To check if I'm understanding you correctly, is this an idealist approach you're taking here, using agent terminology as the foundation instead of talking in terms of "mind" or "intentionality"?

Comment by gworley on Categorization of Meta-Ethical Theories (a flowchart) · 2020-03-30T17:18:40.932Z · score: 3 (2 votes) · LW · GW

Thanks, I really like this. There are a lot of positions within moral philosophy and it's easy to get lost among them. This seems like a handy way to help build the intuitions about how the positions cleave theory space.

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-28T20:27:49.628Z · score: 2 (1 votes) · LW · GW
What's your model for why those actions weren't undone?

Not quite sure what you're asking here. In the first two cases they eventually were undone after people got fed up with the situation; the last is recent enough that I don't consider its not having already been undone as evidence people like it, only that they don't have the power to change it. My view is that these changes stayed in place because the dictators and their successors continued to believe the good outweighed the harm, either when this was clearly contrary to the ground truth but served some narrow purpose that was viewed as more important, or when the ground truth was too hard to discover at the time and we only believe it was net harmful through the lens of historical analysis.

To pop back up to the original question -- if you think making your friend 10x more intelligent would be net negative, would you make them 10x dumber? Or perhaps it's only good to make them 2x smarter, but after that more marginal intelligence is bad?
It would be really shocking if we were at the optimal absolute level of intelligence, so I assume that you think we're at the optimal relative level of intelligence, that is, the best situation is when your friends are about as intelligent as you are. In that case, let's suppose that we increase/decrease all of your friends and your intelligence by a factor of X. For what range of X would you expect this intervention is net positive?

I'm not claiming we're at some optimal level of intelligence for any particular purpose, only that more intelligence leads to greater agency which, in the absence of sufficient mechanisms to constrain actions to beneficial ones, results in greater risk of negative outcomes due to things like deviance and unilateral action. Thus I do in fact think we'd be safer from ourselves, for example screening off existential risks humanity faces due to outside threats like asteroids, if we were dumber.

By comparison, chimpanzees may not live what look to us like very happy lives, they are some factor dumber than us, but also they aren't at risk of making themselves extinct because one chimp really wanted a lot of bananas.

I'm not sure how much smarter we could all get without putting us at too much risk. I think there's an anthropic argument to be made that we are below whatever level of intelligence is dangerous to ourselves without greater safeguards because we wouldn't exist in such universes due to having killed ourselves, but I feel like I have little evidence to make a judgement about how much smarter is safe given, for example, being, say, 95th percentile smart didn't stop people from building things like atomic weapons or developing dangerous chemical applications. I would expect making my friends smarter to risk similarly bad outcomes. Making them dumber seems safer, especially when I'm in the frame of thinking about AGI.

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-27T23:39:44.576Z · score: 2 (1 votes) · LW · GW
I'd be interested in specific examples of well-intentioned dictators that screwed things up (though I anticipate my objections will be that 1. they weren't well-intentioned or 2. they didn't have the power to actually impose decisions centrally, and had to spend most of their power ensuring that they remained in power).

Some examples of actions taken by dictators that I think were well intentioned and meant to further goals that seemed laudable and not about power grabbing to the dictator but had net negative outcomes for the people involved and the world:

  • Joseph Stalin's collectivization of farms
  • Tokugawa Iemitsu's closing off of Japan
  • Hugo Chávez's nationalization of many industries
I know you're saying that, I just don't see many arguments for it. From my perspective, you are asserting that Goodhart problems are robust, rather than arguing for it. That's fine, you can just call it an intuition you have, but to the extent you want to change my mind, restating it in different words is not very likely to work.

I've made my case for that here.

Do you really believe that you can predict facts about humans better just by reasoning about evolution (and using no information you've learned by looking at humans), relative to building a model by looking at humans (and using no information you've learned from the theory of evolution)? I suspect you actually mean some other thing, but idk what.

No, it's not my goal that we not look at humans. I instead think we're currently too focused on trying to figure out everything from only looking at the kinds of evidence we can easily collect today, and that we also don't have detailed enough models to know what other evidence is likely relevant. I think understanding whatever is going on with values is hard because there is relevant data further "down the stack", if you will, from observations of behavior. I think that because I look at issues like latent preferences, which by definition exist because we didn't have enough data to infer their existence, but which need not necessarily exist if we gather more data about how those latent preferences are generated, such that we could discover them in advance by looking earlier in the process that generates them.

Comment by gworley on ofer's Shortform · 2020-03-27T19:01:36.782Z · score: 6 (4 votes) · LW · GW

Counterpoint: people today inhabit forested/jungled areas without burning everything down by accident (as far as I know; it's the kind of fact I would expect to have heard about if true), and even use fire for controlled burns to manage the forest/jungle.

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-27T18:57:26.023Z · score: 2 (1 votes) · LW · GW
Suppose one of your friends became 10x more intelligent, or got a superpower where they could choose at will to stop time for everything except themselves and a laptop (that magically still has Internet access). Is this a net positive change to the world, or a net negative one?

I expect it to be net negative. My model is something like humans are not very agentic (able to reliably achieve/optimize for a goal) in absolute terms even though we may feel as though humans are especially agentic relative to other systems, and because humans bumble a lot they don't tend to have a lot of impact and things work out well or poorly on average as a result of lots of moves that cancel each other out and only leave a small gain or loss in valued outcomes in the end. A 10x smarter human would be more agentic, and if they are not exactly right about how to do good they could more easily do harm that would normally be buffered by their ineffectiveness.

I build this intuition from, for example, the way dictators often screw things up even when they are well intentioned because they now have more power to achieve their goals and it amplifies their mistakes and misunderstandings in ways that cause more impact, more variance, and historically worse outcomes than less agentic methods of leadership.

Although this is not a perfect analogy because 10x smarter is not just 10x more powerful/agentic but 10x better able to think through consequences (which the dictators lack), I also think the orthogonality thesis is robust enough that it's more likely to me that 10x smarter will not mean a match in ability to think through consequences that will perfectly offset the risks of greater agency.

Wait, I infer alignment from way more than just observed behavior. In the case of my friends, I have a model of how humans work in general, informed both by theory (e.g. evolutionary psychology) and empirical evidence (e.g. reasoning about how I would do X, and projecting it onto them). In the case of AI systems, I would want similar additional information beyond just their behavior, e.g. an understanding of what their training process incentivizes, running counterfactual queries on them early in training when they are still relatively unintelligent and I can understand them, etc.

Exactly, because you can't infer alignment from observed behavior without normative assumptions. I'm saying even with all that (or especially with all of that), the measurement gap is large and we should expect high deviance from the target that will readily lead to Goodharting.

It's not obvious to me that modeling the generators of a thing is easier than modeling the thing. E.g. It's much easier for me to model humans than to model evolution.

It's definitely harder. That's a reasonable consideration when we're trying to engineer a system that will be good enough while racing against the clock, and I think it's quite reasonable, for example, that we're going to try to tackle value alignment via extensions to narrow value learning approaches first because that's easier to build. But I also think those approaches will fail and so I'm looking ahead to where I see the limits of our knowledge for what we'll have to do conditioned on this bet I'm making that value learning approaches similar in kind to those we're trying now won't produce aligned AIs.

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-26T19:35:41.376Z · score: 2 (1 votes) · LW · GW

Yep, agree with the summary.

I'll push back on your opinion a little bit here as if it were just a regular LW comment on the post.

I strongly agree that we are confused about human values, but I don't see an understanding of human values as necessary for value alignment. We could hope to build AI systems in a way where we don't need to specify the ultimate human values (or even a framework for learning them) before running the AI system.

This is a reasonable hope but I generally think hope is dangerous when it comes to existential risks, so I'm moved to pursue this line of research because I believe it to be neglected, I believe it's likely enough to be useful to building aligned AI to be worth pursuing, and I would rather us have explored it thoroughly and ended up not needing it than have not explored it and end up needing to have. I also don't think it much takes away from other AI safety research, since the skills needed to work on this problem are somewhat different than those needed to address other AI safety problems (or so I think), so I mostly think we can pursue it for a fairly low opportunity cost.

As an analogy, my friends and I are all confused about human values, but nonetheless I think they are more or less aligned with me (in the sense that if AI systems were like my friends but superintelligent, that sounds broadly fine).

I expect we have a disagreement on how robust Goodhart problems are, as in I would expect that if you felt more or less aligned with a superintelligent AI system the way you feel you are aligned with your friends, the AI system would optimize so hard that it would no longer be aligned, and that the level of alignment you are talking about only works because of lack of optimization power. I suspect that at the level of measurement you're talking about where you can infer alignment from observed behavior there is too much room for error between the measure and the target such that deviance is basically guaranteed.

Thankfully I know others are working on ways to engineer us around Goodhart problems, and maybe these solutions will be robust enough to work over such large measurement gaps, but again I am perhaps more conservative here and want to make the gap between the measure and the target much smaller so that we can effectively get "under" Goodhart effects for the targets we care about by measuring and modeling the processes that generate those targets rather than the targets themselves.
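
As a toy illustration of the measurement-gap point above, here is a minimal sketch that is not from the original discussion. It assumes a very simple model: each option has a true value drawn from a standard normal distribution, the measure is the true value plus independent Gaussian noise, and "optimization power" is just the number of options searched before picking the one with the highest measure. The function name `goodhart_gap` and all parameters are hypothetical. It only shows how much the measure overstates the target for the selected option; it does not by itself establish the stronger claim that alignment is lost.

```python
import random

def goodhart_gap(n_options, noise_sd, trials=2000, seed=0):
    """Average (measure - true value) for the option picked by maximizing the measure.

    Toy assumptions: true values ~ N(0, 1), measure = true value + N(0, noise_sd).
    """
    rng = random.Random(seed)
    total_gap = 0.0
    for _ in range(trials):
        best_measure, best_true = float("-inf"), 0.0
        for _ in range(n_options):
            true_value = rng.gauss(0.0, 1.0)                 # the target we care about
            measure = true_value + rng.gauss(0.0, noise_sd)  # what we can observe
            if measure > best_measure:
                best_measure, best_true = measure, true_value
        total_gap += best_measure - best_true
    return total_gap / trials

# More options searched = more optimization pressure on the measure.
for n in (1, 10, 100):
    noisy = goodhart_gap(n, noise_sd=1.0)   # large measurement gap
    tight = goodhart_gap(n, noise_sd=0.1)   # small measurement gap
    print(n, round(noisy, 2), round(tight, 2))
```

Under these assumptions the overstatement grows as more options are searched and shrinks as the measurement noise shrinks, which is the sense in which a smaller gap between measure and target leaves less room for Goodhart effects.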

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-26T19:20:28.212Z · score: 4 (2 votes) · LW · GW

Thanks for your detailed response. Before I dive in, I'll just mention I added a bullet point about Goodhart because somehow when I wrote this up initially I forgot to include it.

  • But our real problem is on the meta-level: we want to understand value learning so that we can build an AI that learns human values even without starting with a precise model waiting to be filled in.
    • We can trust AI to discover that structure for us even though we couldn't verify the result, because the point isn't getting the right answer, it's having a trustworthy process.
    • We can trust AI to discover that structure for us even though we couldn't verify the result, because out human values is because we need to give the AI a precise instruction based on a very vague human concept. The structure is vague for the same reasons as the content.

I don't exactly disagree with you, other than to say that I think if we don't understand enough about human values (for some yet undetermined amount that is "enough") we'd fail to build something that we could trust, but I also don't expect we have to solve the whole problem. Thus I think we need to know enough about the structure to get there, but I don't know how much enough is, so for now I work on the assumption that we have to know it all, but maybe we'll get lucky and can get there with less. But if we don't at least know something of the structure, such as at the fairly abstract level I consider here, I don't think we can specify what we mean by "alignment" precisely enough to avoid failing to build aligned AI.

So it's perhaps best to understand my position as a conservative one that is trying to solve problems that I think might be issues but are not guaranteed to be issues, because I don't want us to find ourselves in a world where we wished we had solved a problem, didn't, and then suffered negative consequences for it.

    • Merely causing events (in the physical level of description) is not sufficient to say we're acting (in the agent level of description). We need some notion of "could have done something else," which is an abstraction about agents, not something fundamentally physical.
    • Similar quibbles apply to the other parts - there is no physically special decision process, we can only find one by changing our level of description of the world to one where we posit such a structure.
    • The point: Everything in the basic model is a statistical regularity we can observe over the behavior of a physical system. You need a bit more nuanced way to place preferences and meta-preferences.

I don't think I have any specific response other than to say that you're right, this is a first pass and there's a lot of hand waving going on still. One difficulty is that we want to build models of the world that will usefully help us work with it while the world also doesn't itself contain the modeled things as themselves, it just contains a soup of stuff interacting with other stuff. What's exciting to me is to get more specific on where my new model breaks down because I expect that to lead the way to becoming yet less confused.

      • But I think if one applies this patch, then it's a big mistake to use loaded words like "values" to describe the inputs (all inputs?) to the decision-generation process, which are, after all, at a level of description below the level where we can talk about preferences. I think this conflicts with the extensive definitions from earlier.

So this is a common difficulty in this kind of work. There is a category we sort of see in the world, we give it a name, and then we look to understand how that category shows up at different levels of abstraction in our models because it's typically expressed both at a very high level of abstraction and made up of gears moving at lower levels of abstraction. I'm sympathetic to this argument that talking about "values" or any other word in common use is a mistake because it invites confusion, but when I've done the opposite and used technical terminology it's equally confusing but in a different direction, so I no longer think word choice is really the issue here. People are going to be confused because I'm confused, and we're on this ride of being confused together as we try to unknot our tangled models.

  • If we recognize that we're talking about different levels of description, then preferences are not either causally after or causally before decisions-on-the-basic-model-level-of-abstraction. They're regular patterns that we can use to model decisions at a slightly higher level of abstraction.

This is probably correct and so in my effort to make clear what I see as the problem with preference models maybe I claim too much. There's a lot to be confused about here.

    • But I still don't agree that this makes valence human values. I mean values in the sense of "the cluster we sometimes also point at with words like value, preference, affinity, taste, aesthetic, intention, and axiology." So I don't think we're left with a neuroscience problem, I still think what we want the AI to learn is on that higher level of abstraction where preferences live.

I don't know how to make the best case for valence. To me it seems a good model because it fits with a lot of other models I have of the world, like that the interesting thing about consciousness is feedback and so lots of things are conscious (in the sense of having the fundamental feature that separates things with subjective experience from those without).

Also, to be clear, I think we are left not with only a neuroscience problem but with a neuroscience problem in addition to the higher-level one. What happens at higher levels of abstraction is meaningful, but I also think it's insufficient on its own and requires us to additionally address questions of how neurons behave to generate what we recognize at a human level as "value".

Comment by gworley on Deconfusing Human Values Research Agenda v1 · 2020-03-25T22:50:00.150Z · score: 2 (1 votes) · LW · GW

I specifically propose they come from valence, recognizing we know that valence is a phenomenon generated by the human brain but not exactly how it happens (yet).

Comment by gworley on Are veterans more self-disciplined than non-veterans? · 2020-03-23T15:27:49.980Z · score: 2 (1 votes) · LW · GW

This suggests we might look at data from countries like Israel and Singapore with mandatory military service (although it's complicated because I think both have alternative civil service for objectors and exceptions for certain groups), and look at how results of veterans there compare with similar populations to hold other cultural effects on discipline constant.

Comment by gworley on What information, apart from the connectome, is necessary to simulate a brain? · 2020-03-20T02:17:13.298Z · score: 8 (6 votes) · LW · GW

My impression is that current thinking is that the connectome is enough. Other things I can think of that could matter:

  • potentials between neurons
  • neurotransmitter levels
  • interactions with other parts of the body

But I think all of these are expected not to matter much because, for example, humans seem to come back the same as they were before after short periods of brain inactivity aside from damage from oxygen deprivation. Similarly neurotransmitter levels change the "mood" or "flavor" of a person, along with other chemicals that interact with neurotransmitters, but similarly these seem to be transient effects that change a person but are also easily tunable and so not necessarily a key part of what it means to be a particular person.

I think there is some case that interactions with parts of the body outside the brain might matter. For example, I find meditating is difficult when I am sick and have changes to my breathing. This suggests that something about this is interfering with the normal process that lets me get into a meditative state. But arguably this could easily be simulated and is not differentiating between humans, so may not and probably doesn't matter.

Thus to the best of my knowledge connectome is probably "enough" although more wouldn't hurt.

Comment by gworley on Is the coronavirus the most important thing to be focusing on right now? · 2020-03-19T00:21:20.103Z · score: 16 (7 votes) · LW · GW

I agree. Although it's interesting, it feels like it's getting outsized attention because it's a "near" threat. I think this made more sense when the wider public wasn't taking COVID-19 seriously, but now that they are I don't think there's a lot of value in LW continuing to focus on it to the extent that it does. I'm not sure if this is an endorsed policy though or just my personal annoyance at all the COVID-19 stuff taking up space and attention and away from what I still consider a bigger threat in expectation, unaligned AI.

Comment by gworley on Ubiquitous Far-Ultraviolet Light Could Control the Spread of Covid-19 and Other Pandemics · 2020-03-18T20:26:17.433Z · score: 14 (4 votes) · LW · GW

I didn't realize at first that this was cross-posted from EAF, so since this seems to be getting more attention here I'll repost my comment from there here:

You don't mention this, and maybe there is no research on it, but do we expect there to be much opportunity for resistance effects, similar to what we see with antibiotics and the evolution of resistant strains?

For example, would the deployment of large amounts of far-ultraviolet lamps result in selection pressures on microbes to become resistant to them? I think it's not clear, since for example we don't seem to see lots of heat resistant microbes evolving (outside of places like thermal vents) even though we regularly use high heat to kill them.

And even if it did, would it be worth the tradeoff? For example, I think even if we had known about the possibility of antibiotic-resistant bacteria when penicillin was created, we would still have used penicillin extensively because it was able to cure so many diseases and increase human welfare, although we might have done it with greater care about protocols and their enforcement, so with hindsight maybe we would do something similar here with far-ultraviolet light if we used it.

Comment by gworley on I'm leaving AI alignment – you better stay · 2020-03-16T16:56:24.958Z · score: 17 (7 votes) · LW · GW

Because conditions might change and you might come back to AI alignment research, I want to share some details of what I've been doing and how I've approached my alignment work. I'll write this out as a personal story since that seems to be the best fit, and you can pull out what resonates as advice. Some of the details might seem irrelevant at first but I promise I put them there as context that I think is relevant to tying the whole thing together at the end.

So back in 1999 I got a lot more engaged on the Extropians mailing list (like actually reading it rather than leaving the messages unread in a folder). This led to me joining the SL4 mailing list and then getting really excited about existential risks more generally (since about 1997 I had been reading and thinking a lot about nanotech/APM and its risks). Over the next few years I stayed moderately engaged on SL4 and things that came after it until around 2004-2005. By this point it seemed I just wasn't cut out for AI alignment research even though I cared a lot, and I mostly gave up on ever being able to contribute anything. I went off to live my life, got married, and worked on a PhD.

I didn't lose touch with the community. When it started Overcoming Bias went straight into my RSS reader, and then LW later on. I kept up with the goings on of SIAI, the Foresight Institute, and other things.

My life changed directions in 2011. That year I dropped out of my PhD, having lost the spirit to finish it about 2 years earlier such that I failed classes and only worked on my research, because my wife was sick and couldn't work and I needed a job that paid more. So I started working as a software engineer at a startup. Over the next year or so this changed me: I was making money, I was doing something I was good at, I saw that I kept getting better at it, and it built a lot of confidence. It seemed I could do things.

So in 2012 I finally signed up for cryonics after years of procrastination. I felt good about myself for maybe the first time in my life and I had the money to do it. In 2013 I felt even better about myself and separated from my wife, finally realizing and accepting that I was only with her not because I wanted to be with her but because I didn't want to be alone. That same year I took a new programming job in the Bay Area.

I continued on this upward trajectory for the next few years, but I didn't think too hard about doing AI research. I thought my best bet was that maybe I could make a lot of money and use it to fund the work of others. Then in 2017, after a period of thinking real hard and writing about my ideas, I noticed one day that maybe I had some comparative advantage to offer AI alignment research. Maybe not to be a superstar researcher trying to solve the whole thing, but I could say some things and do some work that might be helpful.

So, that's what I did. AI alignment research is in some sense a "hobby" for me because it's not what I do full time and I don't get paid to do it, but at the same time it's something I make time for, stay up with, and even if I'm not seemingly doing as much as others, I keep at it because it seems to me I'm able to offer something to the field in places that appear neglected to me. Maybe my biggest impact will just be to have been part of the field and have made it bigger and more active so that it had more surface area for others to stay engaged with and find it on their own paths to doing work that has more direct impact, or maybe I'll eventually stumble on something that is really important, or maybe I already have and we just don't realize it yet. It's hard to say.

So I hope this encourages you not to give up on AI alignment research altogether. Do what you need to do for yourself, but also I hope you don't lose your connection to the field. One day you might wake up to realize things have changed or you know something that gives you a unique perspective on the problem that, if nothing else, might get people thinking about the problem in ways they weren't before and inject useful noise that will help us anneal our way to a good solution. I hope that you keep reading, keep commenting, and one day find you have something you need to say about AI alignment because others need to hear it.

Comment by gworley on Rationalists, Post-Rationalists, And Rationalist-Adjacents · 2020-03-16T16:27:04.939Z · score: 14 (3 votes) · LW · GW

To some extent I mean both things, though more the former than the latter.

I'll give a direct answer, but first consider this imperfect comparison, which I think gives some flavor of how the OP seems to me to be approaching the post-rationalist category. It might evoke in a self-identified rationalist the sort of feeling a post-rationalist would have on seeing themselves explained the way they are here.

Let's give a definition for a pre-rationalist that someone who was a pre-rationalist would endorse. They wouldn't call themselves a pre-rationalist, of course, more likely they'd call themselves something like a normal, functioning adult. They might describe themselves like this, in relation to epistemology:

A normal, functioning adult is someone who cares about the truth.

They then might describe a rationalist like this:

A rationalist is someone who believes certain kinds of or ways of knowing truth are invalid, only special methods can be used to find truth, and other kinds of truths are not real.

There's a lot going on here. The pre-rationalist is framing things in ways that make sense to them, which is fair, but it also means they are somewhat unfair to the rationalist because in their heart what they see is some annoying person who rejects things that they know to be true because it doesn't fit within some system that the rationalist, from the pre-rationalist's point of view, made up. They see the rationalist as a person disconnected from reality and tied up in their special notion of truth. Compare the way that to a non-rationalist outsider rationalists can appear arrogant, idealistic, foolish, unemotional, etc.

I ultimately think something similar is going on here. I don't think this is malicious, only that orthonormal doesn't have an inside view of what it would mean to be a post-rationalist and so offers a definition that is defined in relation to being a rationalist, just as a pre-rationalist would offer a definition of rationalist set up in contrast to their notion of what it is to be "normal".

So yes I do mean that "in order to give a satisfying constructive definition of post rationalists, one must give up commitment to a single ontology" because this is the only way to give such a definition from the inside and have it make sense.

I think the problem is actually worse than this, which is why I haven't proffered my own definition here. I don't think there's a clean way to draw lines around the post-rationalist category and have it capture all of what a post-rationalist would consider important, because it would require making distinctions that are in a certain sense not real, but in a certain sense are. You might say that the post-rationalist position is ultimately a non-dual one as a way of pointing vaguely in the direction of what I mean, but it's not that helpful a pointer because it is only useful if you have some experience to ground what that means.

So if I really had to try to offer a constructive definition, it would look something like a pointer to what it is like to think in this way so that you could see it for yourself, but you'd have to do that seeing all on your own, not through my words, it would be highly contextualized to fit the person I was offering the definition to, and in the end it would effectively have to make you, at least for a moment, into a post-rationalist, even if beyond that moment you didn't consider yourself one.

Now that I've written all this, I realize this post in itself might serve as such a pointer to someone, though not necessarily you, philh.

Comment by gworley on How likely is it that US states or cities will prevent travel across their borders? · 2020-03-15T01:50:23.099Z · score: 2 (1 votes) · LW · GW

Also also, California seems, in my inexpert legal opinion, to violate this ruling in spirit if not in fact with its agricultural inspection stations, since it effectively restricts interstate commerce though not the movement of people.

Comment by gworley on How likely is it that US states or cities will prevent travel across their borders? · 2020-03-15T01:46:17.811Z · score: 2 (1 votes) · LW · GW

As additional context, the United States effectively has internal external borders, i.e. the border patrol has checkpoints inside the borders of the United States to enforce the external borders by looking to catch people who have violated them. This is different but related and complicates the situation.

Comment by gworley on How likely is it that US states or cities will prevent travel across their borders? · 2020-03-15T01:42:33.197Z · score: 11 (4 votes) · LW · GW

Doing some research, thinking something like this might have happened during the Dust Bowl migrations, I found articles claiming that the LAPD, using anti-vagrancy laws, attempted a blockade at the California border to keep out people who appeared not to have the means to support themselves.

It seems that this drew a ruling from the Supreme Court stating that states could not restrict interstate migration (and presumably all movement). Source:

Until 1941 states felt free to restrict interstate mobility, focusing that power, when they used it, on the poor. To discourage indigents from crossing state lines, many states maintained tough vagrancy laws and required many years of residence of those applying for public assistance. California had been especially hostile to poor newcomers. In 1936, the Los Angeles police department established a border patrol, dubbed the "Bum Blockade," at major road and rail crossings for the purpose of turning back would-be visitors who lacked obvious means of support. Withdrawn in the face of threatened law suits, this border control effort was followed by a less dramatic but more serious assault on the right of interstate mobility. California's Indigent Act, passed in 1933, made it a crime to bring indigent persons into the state. In 1939 the district attorneys of several of the counties most affected by the Dust Bowl influx began using the law in a very public manner. More than two dozen people were indicted, tried, and convicted. Their crime: helping their relatives move to California from Oklahoma and nearby states. The prosecutions were challenged by the ACLU which pushed the issue all the way to the U.S. Supreme Court. In 1941 the court issued a landmark decision (Edwards v. California) ruling that states had no right to restrict interstate migration by poor people or any other Americans.

The result of this case was the following holding:

A state cannot prohibit indigent people from moving into it.
The Court found that Section 2615 of the Welfare and Institutions Code of California violated Article 1, Section 8 of the Constitution.

This ruling is rather limited and is specifically based on the idea that California had violated the "commerce clause" of the Constitution. The concurring opinions on the decision aimed for a more expansive justification. Wikipedia again:

It is worth noting that in writing their concurring opinions, the additional justices chose to forgo the explanation that California had violated Article 1, Section 8 of the Constitution, arguing that defining the transportation of human beings as “commerce” raises a number of troubling moral questions which undermine individual rights and devalue the original intent of the Commerce Clause. Instead, they propose the idea that the impairment of one's ability to freely traverse interstate borders is a violation of the implied rights of US citizenship, and thereby violates the 14th Amendment and the individual's right to equal protection.

However, as I understand it, since this was not the official finding of the court, it does not constitute precedent; only the holding does.

Also note that this holding is specifically about states taking unilateral action to close their borders, and does not rule that the federal government could not do that. In fact I think it likely implies that the federal government does have this power, though if they exercised it I expect it would be challenged on the grounds that even if the commerce clause grants this power the 14th Amendment might supersede that.

Comment by gworley on Rationalists, Post-Rationalists, And Rationalist-Adjacents · 2020-03-15T01:08:55.201Z · score: 2 (1 votes) · LW · GW

I see their description as set up against the definition of rationalist, so an eliminative description that says more about what it is not than what it is.

Comment by gworley on Rationalists, Post-Rationalists, And Rationalist-Adjacents · 2020-03-14T02:12:34.597Z · score: 7 (5 votes) · LW · GW

I feel like this fails to capture some important features of each of these categories as they exist in my mind.

  • The rationalist category can reasonably include people who are not building unified probabilistic models, even if LW-style rationalists are Bayesians, because they apply similarly structured epistemological methods even if their specific methods are different.
  • The post-rationalist category can be talked about constructively, although it's a bit hard to do this in a way that satisfies everyone, especially rationalists, because it requires giving up commitment to a single ontology as the only right ontology.

Comment by gworley on Raemon's Scratchpad · 2020-03-13T01:48:46.139Z · score: 2 (1 votes) · LW · GW

I'm not sure what you have in mind by "skipping" here, since Kegan's and other developmental models are explicitly based on the idea that there can be no skipping because each higher level is built out of new ways of combining abstractions from the lower levels.

I have noticed ways in which people can have lumpy integration of the key skills of a level (and have noticed this in various ways in myself). Is that the sort of thing you have in mind by "skipping", like having made it to 4 without ever having fully integrated the level 3 insights?

Comment by gworley on How effective are tulpas? · 2020-03-12T00:16:10.368Z · score: 2 (1 votes) · LW · GW

"Mindfulness meditation" is a rather vague category anyway, with different teachers teaching different things as if it were all the same thing. This might sometimes be true, but I think of mindfulness meditation as an artificial category recently made up that doesn't neatly, as used by the people who teach it, divide the space of meditation techniques, even if a particular teacher does use it in a precise way that does divide the space in a natural way.

None of this is to say you shouldn't avoid it if you think it doesn't work for you. Meditation is definitely potentially dangerous, and particular techniques can be more dangerous than others to particular individuals depending on what else is going on in their lives, so I think this is a useful intuition to have that some meditation technique is not a one-size-fits-all solution that will work for everyone, especially those who have not already done a lot of work and experienced a significant amount of what we might call, for lack of a better term, awakening.

Comment by gworley on Raemon's Scratchpad · 2020-03-12T00:09:10.197Z · score: 4 (2 votes) · LW · GW

I think this is right, although I stand by the existing numbering convention. My reasoning is that the 4.5 space is really best understood in the paradigm where the thing that marks a level transition is gaining a kind of naturalness with that level, and 4.5 is a place of seeing intellectually that something other than what feels natural is possible, but the higher level isn't yet the "native" way of thinking. This is not to diminish the in-between states, because they are important to making the transition, but to acknowledge that they are not the core thing as originally framed.

For what it's worth I think Michael Commons's approach is probably a bit better in many ways, especially in that Kegan is right for reasons that are significantly askew of the gears in the brain that make his categories natural. Luckily there's a natural and straightforward mapping between different developmental models (see Integral Psychology and Ken Wilber's work for one explication of this mapping), so you can basically use whichever is most useful to you in a particular context without missing out on pointing at the general feature of reality these models all converge on.

Also perhaps interestingly, there's a model in Zen called the five ranks that has an interpretation that could be understood as a developmental model of psychology, and it also suggests an in-between level, although between what we might call Kegan 5 and a hypothetical Kegan 6 if Kegan had described such a level. I don't think there's much to read into this, though, as the five ranks is a polymorphic model that explains multiple things in different ways using the same structure, so this is as likely an artifact as some deep truth that there is something special about the 5 to 6 transition. But it is there, so it suggests others have similarly noticed it's worth pointing out cases where there are levels between the "real" levels.

Similarly it's clear from Commons's model that Kegan's model woefully underdescribes the pre-3 territory, and it's possible that due to lack of data all models are failing to describe all the meaningful transition states between the higher levels. As I recall David Chapman wrote something once laying out 10 sublevels between each level, although I'm not sure how much I would endorse that approach.

Comment by gworley on [Article review] Artificial Intelligence, Values, and Alignment · 2020-03-11T00:18:59.515Z · score: 7 (4 votes) · LW · GW

I never came back to this paper after I briefly posted about it, and this seems as good a place as any to say more and continue the conversation.

What I found weird about this paper is that it seems to focus too much on something that seems largely irrelevant to me. I don't expect there to be much for us to choose about how to aggregate values, because I expect that most of the problem is in figuring out how to specify or find values at all. I do expect there to be some issues to resolve around aggregation, but not knowing yet what we will be aggregating (that is, what abstractions we will be aggregating over and resolving conflicts among) makes it hard to see how this kind of consideration is yet relevant.

To be fair, many may object that I am making the same mistake by worrying about what values even are and how we might verify that AI are aligned with ours when we don't even know what AI powerful enough to need alignment will look like. So I wouldn't want to see this kind of work not happen; it's only that, for my taste, it seems premature, and it may be reasoning about things that won't be relevant, or won't be relevant in the way we expect, such that the work is of limited marginal value.

That said, I think this paper stands as an excellent signal, as you do, that more mainstream AI researchers are taking problems in value alignment more seriously and thinking about problems of the kind that are more likely, in my estimation, to be important long term than short term concerns about, for example, narrow value learning.

Comment by gworley on How effective are tulpas? · 2020-03-10T21:06:38.206Z · score: 13 (5 votes) · LW · GW

I'm mildly anti-tulpa. I'll try to explain what I find weird and unhelpful about them, though also keep in mind I never really tried to develop them, other than the weak tulpa-like mental constructs I have due to high cognitive empathy and capacity to model other people as others rather than as modified versions of myself.

So, the human mind doesn't seem to be made of what could reasonably be called subagents. It is made of subsystems that interact, though "subsystem" is maybe even an overstatement because the boundaries of those subsystems are often fuzzy. So reifying those subsystems as subagents or tulpas is a misunderstanding that might be a useful simplification for a time but is ultimately a leaky abstraction that will need to be abandoned if you want to better connect with yourself and the world just as it is.

Thus I think tulpas might be a skillful means to some end some of the time, but mostly I think they are not necessary and are extra machinery that you're going to have to tear down later, so it seems unclear to me that it's worth building up.

Comment by gworley on Anthropic effects imply that we are more likely to live in the universe with interstellar panspermia · 2020-03-10T17:58:29.905Z · score: 4 (2 votes) · LW · GW

I think this line of argument is also consistent with life being easy, i.e. life frequently spontaneously appears millions of times throughout the universe basically whenever the conditions are even moderately right for it. We could replace "panspermia" with "easy life" or equivalently "late Great Filter", in which case I think all is equal between these options within this argument and so we have to choose between them by non-anthropic reasoning, for example arguing non-anthropically whether panspermia or easy life is more likely.

Comment by gworley on Long try's Shortform · 2020-03-10T17:52:29.574Z · score: 2 (3 votes) · LW · GW

When I use the site it shows posts I've already visited as a lighter grey and unvisited posts as a dark grey/black.

Comment by gworley on G Gordon Worley III's Shortform · 2020-03-09T17:20:35.797Z · score: 2 (1 votes) · LW · GW

It's not a very firm distinction, but techne is knowledge from doing, so I would consider playing with a hammer a way to develop techne. It certainly overlaps with the concept of gnosis, which is a bit more general and includes knowledge from direct experience that doesn't involve "doing", like the kind of knowledge you gain from observing. But the act of observing is a kind of thing you do, so as you see it's fuzzy, but generally I think of techne as that which involves your body moving.

Comment by gworley on Robustness to fundamental uncertainty in AGI alignment · 2020-03-05T00:24:07.374Z · score: 2 (1 votes) · LW · GW

I'm unclear if you are disagreeing with something or not, but to me your comment reads largely as saying you think there's a lot of probability mass that can be assigned before we reach the frontier and that this is what you think is most important for reasoning about the risks associated with attempts to build human-aligned AI.

I agree that we can learn a lot before we reach the frontier, but I also think that most of the time we should be thinking as if we are already along the frontier and not much expect the sudden development of resolutions to questions that would let us get more of everything. For example, to return to one of my examples, we shouldn't expect to suddenly learn info that would let us make Pareto improvements to our assumptions about moral facts given how long this question has been studied, so we should instead mostly be concerned with marginal trade offs about the assumptions we make under uncertainty.

Comment by gworley on An Analytic Perspective on AI Alignment · 2020-03-02T23:29:17.265Z · score: 2 (1 votes) · LW · GW

You already touch on this some, but do you imagine this perspective allowing you, at least ideally, to create a "complete" filter in the sense that the filtering process would be capable of catching all unsafe and unaligned AI? If so, what are the criteria under which you might be able to achieve that, and if not, what predictable gaps do you expect your filter to have?

(I think you've already given a partial answer in your post, but given the way you set up this post with talk about the filter it made me curious to understand what you explicitly think about this aspect of it.)

Comment by gworley on Predictive coding and motor control · 2020-03-01T20:38:28.225Z · score: 2 (1 votes) · LW · GW
Question: How do many mammals walk on their first day of life, if the neocortex needs to learn the neural codes and associations? Easy: I say they're not using their neocortex for that! If I understand correctly, there are innate motor control programs stored in the brainstem, midbrain, etc. The neocortex eventually learns to "press go" on these baked-in motor control programs at appropriate times, but the midbrain can also execute them based on its own parallel sensory-processing systems and associated instincts. My understanding is that humans are unusual among mammals—maybe even unique—in the extent to which our neocortex talks directly to muscles, rather than "pressing go" on the various motor programs of the evolutionarily-older parts of the brain.

I don't have a better story here, but this seems odd in that we'd have to somehow explain why humans don't use some baked-in motor control programs if they are there in other mammals. Not that we can't, only that by default we should expect humans to have them as well as the ability to let the neocortex and muscles "talk" directly, so it leaves this interesting question of why we think humans don't engage with stored motor control programs too.

Comment by gworley on Training Regime Day 14: Traffic Jams · 2020-02-29T00:20:30.628Z · score: 2 (1 votes) · LW · GW

This idea of a traffic jam sounds related to me to a concept I've more often heard called "monkey mind".

Comment by gworley on Understandable vs justifiable vs useful · 2020-02-29T00:11:51.132Z · score: 2 (1 votes) · LW · GW

I like this. The most important question when I'm deciding what to do is usually to ask if my action will be useful/skillful, not if it is understandable or justifiable. That is, does this action stand a chance of doing what I intend.

This naturally then extends to how I think about the actions of others, such that I mostly wonder whether or not what they are doing is useful, and if it is not I am sympathetic to the difficulty of predicting the effects of our causes.

This is not to say that we should totally ignore norms or our sense of virtue or good in favor of useful or effective action that does something, since useful action is only valuable if it works to achieve something we care about, but given a base desire to see good done, I think usefulness becomes the dominant criterion to worry about.

Comment by gworley on Time Binders · 2020-02-24T22:07:23.234Z · score: 12 (5 votes) · LW · GW
Unfortunately for Korzybski, General Semantics never really took off or achieved prominence as the new field he had set out to create. It wasn’t without some success and it has been taught in some colleges. But overall, despite trying to create something grounded in science and empiricism, over the years the empiricism leaked out of general semantics and a large amount of woo and pseudoscience leaked in. This looks like it was actually a similar failure mode to what had started happening with Origin before I stopped the project. 
With Origin, I introduced a bunch of rough draft concepts and tried to bake in the idea that these were rough ideas that should be iterated upon. However, because of the halo effect, those rough drafts were taken as truth without question. Instead of quickly iterating out of problematic elements, the problematic elements stuck around and became accepted parts of the canon. 
Something similar seems to have happened with General Semantics, at a certain point it stopped being viewed as a science to iterate upon, and began being viewed in a dogmatic, pseudoscientific way. It would eventually spin off a bunch of actual cults like Scientology and Neuro-Linguistic Programming, and while the Institute of General Semantics still exists and still does things, no one seems to really be trying to achieve Korzybski’s goal of a science of human engineering. That goal would sit on a shelf for a long time until finally it was picked back up by one Eliezer Yudkowsky.

This makes me wonder to what extent we fail at this in the rationality movement. I think we're better at it, but I'm also not sure we're as systematic about fighting against it as we could be.

Comment by gworley on Time Binders · 2020-02-24T22:06:18.351Z · score: 4 (2 votes) · LW · GW

I'm hoping we'll find out in the next post, but I would guess the answer is "yes" via general semantics having an impact on science fiction writers who had an impact on transhumanism and the extropians out of which SL4, Eliezer, and this whole thing grew such that even if it wasn't known at the time the ideas were "in the water" in such a way that you could make a strong argument that they did.

Comment by gworley on The recent NeurIPS call for papers requires authors to include a statement about the potential broader impact of their work · 2020-02-24T21:12:17.255Z · score: 5 (4 votes) · LW · GW

Agreed, I think of this like sending a signal that at least a limited concern for safety is important. I'm sure we'll see a bunch of papers with sections addressing this that won't be great, but over time it stands some chance of normalizing consideration of safety and ethics concerns in ML work, such that safety work will become more accepted as valuable. So even without a lot of guidance or strong evaluative criteria, this seems a small win to me that, at worst, causes some papers to just have extra fluff sections their authors wrote to pretend to care about safety rather than ignoring it completely.

Comment by gworley on Curiosity Killed the Cat and the Asymptotically Optimal Agent · 2020-02-20T22:44:20.853Z · score: 2 (1 votes) · LW · GW

A decent intuition might be to think about what exploration looks like in human children. Children under the age of 5 but old enough to move about on their own—so toddlers, not babies or "big kids"—face a lot of dangers in the modern world if they are allowed to run their natural exploration algorithm. Heck, I'm not even sure this is a modern problem, because in addition to toddlers not understanding and needing to be protected from exploring electrical sockets and moving vehicles they also have to be protected from more traditional dangers that they would definitely otherwise check out like dangerous plants and animals. Of course, since toddlers grow up into powerful adult humans, this is a kind of evidence that they are powerful enough explorers (even with protections) to become powerful enough to function in society.

Obviously there are a lot of caveats to taking this idea too seriously since I've ignored issues related to human development, but I think it points in the right direction of something everyday that reflects this result.

Comment by gworley on What are information hazards? · 2020-02-18T19:56:47.195Z · score: 3 (2 votes) · LW · GW

Thanks, this is a really useful summary to have since linking back to Bostrom on info hazards is reasonable but not great if you want people to actually read something and understand information hazards rather than bounce off something explaining the idea. Kudos!

Comment by gworley on Big Yellow Tractor (Filk) · 2020-02-18T18:45:52.173Z · score: 4 (2 votes) · LW · GW

Couple of notes on the song:

  • I wrote it with the Bob Dylan cover in my head more than the original.
  • It doesn't scan perfectly on purpose so that some of the syllables have to be "squished" to fit the time and make the song sound "sloppy" like the original and many covers of it do.
  • In case it's not obvious, it's meant to be a "ha ha only serious" anthem for negative utilitarians.

Comment by gworley on Training Regime Day 1: What is applied rationality? · 2020-02-17T20:37:41.146Z · score: 3 (2 votes) · LW · GW

I think of applied rationality pretty narrowly, as the skill of applying reasoning norms that maximize returns (those norms happening to have the standard name "rationality"). Of course there's a lot to that, but I also think this framing is a poor one to train all the skills required to "win". To use a metaphor, as requested, it's like the skill of getting really good at reading a map to find optimal paths between points: your life will be better for it, but it also doesn't teach you everything, like how to figure out where you are on the map now or where you might want to go.

Comment by gworley on G Gordon Worley III's Shortform · 2020-02-17T19:17:22.119Z · score: 9 (5 votes) · LW · GW

tl;dr: read multiple things concurrently so you read them "slowly" over multiple days, weeks, months

When I was a kid, it took a long time to read a book. How could it not: I didn't know all the words, my attention span was shorter, I was more restless, I got lost and had to reread more often, I got bored more easily, and I simply read fewer words per minute. One of the effects of this is that when I read a book I got to live with it for weeks or months as I worked through it.

I think reading like that has advantages. By living with a book for longer the ideas it contained had more opportunity to bump up against other things in my life. I had more time to think about what I had read when I wasn't reading. I drank in the book more deeply as I worked to grok it. And for books I read for fun, I got to spend more time enjoying them, living with the characters and author, by having it spread out over time.

As an adult it's hard to preserve this. I read faster and read more than I did as a kid (I estimate I spend 4 hours a day reading on a typical day (books, blogs, forums, etc.), not including incidental reading in the course of doing other things). Even with my relatively slow reading rate of about 200 wpm, I can polish off ~50k words per day, the length of a short novel.
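
As a rough check on that estimate, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and it assumes all 4 hours are spent reading at a steady 200 wpm, which is a simplification rather than something measured:

```python
def words_per_day(hours_reading, words_per_minute):
    """Estimate words read per day, assuming a constant reading rate."""
    minutes_reading = hours_reading * 60
    return minutes_reading * words_per_minute


# 4 hours a day at about 200 wpm, the figures given above.
estimate = words_per_day(4, 200)
print(estimate)  # 48000, i.e. roughly 50k words
```

So the ~50k figure is just 240 minutes times 200 words, rounded up a little.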

The trick, I find, is to read slowly by reading multiple things concurrently and reading only a little bit of each every day. For books this is easy: I can just limit myself to a single chapter per day. As long as I have 4 or 5 books I'm working on at once, I can spread out the reading of each to cover about a month. Add in other things like blogs and I can spread things out more.

I think this has additional benefits over just getting to spend more time with the ideas. It lets the ideas in each book come up against each other in ways they might otherwise not. I sometimes notice patterns that I might otherwise not have because things are made simultaneously salient that otherwise would not be. And as a result I think I understand what I read better because I get the chance not just to let it sink in over days but also because I get to let it sink in with other stuff that makes my memory of it richer and more connected.

So my advice, if you're willing to try it, is to read multiple books, blogs, etc. concurrently, only reading a bit of each one each day, and let your reading span weeks and months so you can soak in what you read more deeply rather than letting it burn bright and fast through your mind to be forgotten like a used up candle.

Comment by gworley on Here is why most advice you hear that seems good, but "just doesn't work" from my unique perspective as a data scientist, as well as some that should actually work. · 2020-02-17T18:57:18.955Z · score: 4 (2 votes) · LW · GW

Welcome to LessWrong!

Given the content of your post, you might find these posts interesting:

Comment by gworley on G Gordon Worley III's Shortform · 2020-02-14T01:09:00.457Z · score: 4 (2 votes) · LW · GW

A few months ago I found a copy of Staying OK, the sequel to I'm OK—You're OK (the book that probably did the most to popularize transactional analysis), on the street near my home in Berkeley. Since I had previously read Games People Play and had not thought about transactional analysis much since, I scooped it up. I've just gotten around to reading it.

My recollection of Games People Play is that it's the better book (based on what I've read of Staying OK so far). Also, transactional analysis is kind of in the water in ways that are hard to notice so you are probably already kind of familiar with some of the ideas in it, but probably not explicitly in a way you could use to build new models (for example, as far as I can tell notions of strokes and life scripts were popularized by if not fully originated within transactional analysis). So if you aren't familiar with transactional analysis I recommend learning a bit about it since although it's a bit dated and we arguably have better models now, it's still pretty useful to read about to help notice patterns of ways people interact with others and themselves, sort of like the way the most interesting thing about Metaphors We Live By is just pointing out the metaphors and recognizing their presence in speech rather than whether the general theory is maximally good or not.

One thing that struck me as I'm reading Staying OK is its discussion of the trackback technique. I can't find anything detailed online about it beyond a very brief summary. It's essentially a multi-step process for dealing with conflicts in internal dialogue, "conflict" here being a technical term referring to crossed communication in the transactional analysis model of the psyche. Or at least that's how it's presented. Looking at it a little closer and reading through examples in the book that are not available online, it's really just poorly explained memory reconsolidation. To the extent it's working as a method in transactional analysis therapy, it seems to be working because it's tapping into the same mechanisms as Unlocking the Emotional Brain.

I think this is interesting both because it shows how we've made progress and because it shows that transactional analysis, along with a lot of other things, was also getting at stuff that works, but less effectively because these approaches had weaker evidence to build on that was more confounded with other possible mechanisms. To me this counts as evidence that building theory based on phenomenological evidence can work and is better than nothing, but will be supplanted by work that manages to tie in "objective" evidence.

Comment by gworley on A Variance Indifferent Maximizer Alternative · 2020-02-13T20:09:05.666Z · score: 2 (1 votes) · LW · GW

First, thanks for posting about this even though it failed. Success is built out of failure, and it's helpful to see it so that it's normalized.

Second, I think part of the problem is that there are still not enough constraints on learning. As others have noted, this mostly seems to weaken the optimization pressure such that it's slightly less likely to do something we don't want, but it doesn't actively make it into something that does the things we do want and not those we don't.

Third and finally, what this most reminds me of is impact measures. Not in the specific methodology, but in the spirit of the approach. That might be an interesting approach for you to consider given that you were motivated to look for and develop this approach.

Comment by gworley on Confirmation Bias As Misfire Of Normal Bayesian Reasoning · 2020-02-13T19:44:11.834Z · score: 5 (3 votes) · LW · GW

As Stuart previously recognized with the anchoring bias, it's probably worth keeping in mind that any bias is likely only a "bias" against some normative backdrop. Without some way reasoning was supposed to turn out, there are no biases, only the way things happened to work.

Thus things look confusing around confirmation bias, because it only becomes bias when it leads to reasoning that produces a result that doesn't predict reality after the fact. Otherwise it's just correct reasoning based on priors.

Comment by gworley on Suspiciously balanced evidence · 2020-02-12T21:30:45.613Z · score: 2 (1 votes) · LW · GW

Yeah, I think #1 sounds right to me, and there is nothing strange about it.

Comment by gworley on Writeup: Progress on AI Safety via Debate · 2020-02-12T19:54:07.020Z · score: 13 (6 votes) · LW · GW

I don't recall seeing anything addressing this directly: has there been any progress towards dealing with concerns about Goodharting in debate, and more generally the risk of mesa-optimization in the debate approach? The typical risk scenario is something like this: training debate creates AIs good at convincing humans rather than at convincing humans of the truth, and once you leave the training set of questions where the truth can be reasonably determined independent of the debate mechanism, we'll experience what amounts to a treacherous turn because the debate training process accidentally optimized for a different target (convince humans) than the one intended (convince humans of true statements).

For myself this continues to be a concern which seems inadequately addressed and makes me nervous about the safety of debate, much less its adequacy as a safety mechanism.

Comment by gworley on What can the principal-agent literature tell us about AI risk? · 2020-02-12T19:39:57.627Z · score: 6 (3 votes) · LW · GW
Nevertheless, extensions to PAL might still be useful. Agency rents are what might allow AI agents to accumulate wealth and influence, and agency models are the best way we have to learn about the size of these rents. These findings should inform a wide range of future scenarios, perhaps barring extreme ones like Bostrom/Yudkowsky.

For myself, this is the most exciting thing in this post—the possibility of taking the principal-agent model and using it to reason about AI even if most of the existing principal-agent literature doesn't provide results that apply. I see little here to make me think the principal-agent model wouldn't be useful, only that it hasn't been used in ways that are useful to AI risk scenarios yet. It seems worthwhile, for example, to pursue research on the principal-agent problem with some of the adjustments to make it better apply to AI scenarios, such as letting the agent be more powerful than the principal and adjusting the rent measure to better work with AI.

Maybe this approach won't yield anything (as we should expect on priors, simply because most approaches to AI safety are likely not going to work), but it seems worth exploring further on the chance it can deliver valuable insights, even if, as you say, the existing literature doesn't offer much that is directly useful to AI risk now.