Posts

Comparing AI Alignment Approaches to Minimize False Positive Risk 2020-06-30T19:34:57.220Z · score: 6 (2 votes)
What are the high-level approaches to AI alignment? 2020-06-16T17:10:32.467Z · score: 13 (4 votes)
Pragmatism and Completeness 2020-06-12T16:34:57.691Z · score: 14 (5 votes)
The Mechanistic and Normative Structure of Agency 2020-05-18T16:03:35.485Z · score: 14 (5 votes)
What is the subjective experience of free will for agents? 2020-04-02T15:53:38.992Z · score: 10 (3 votes)
Deconfusing Human Values Research Agenda v1 2020-03-23T16:25:27.785Z · score: 18 (6 votes)
Robustness to fundamental uncertainty in AGI alignment 2020-03-03T23:35:30.283Z · score: 11 (3 votes)
Big Yellow Tractor (Filk) 2020-02-18T18:43:09.133Z · score: 12 (4 votes)
Artificial Intelligence, Values and Alignment 2020-01-30T19:48:59.002Z · score: 13 (4 votes)
Towards deconfusing values 2020-01-29T19:28:08.200Z · score: 13 (5 votes)
Normalization of Deviance 2020-01-02T22:58:41.716Z · score: 59 (22 votes)
What spiritual experiences have you had? 2019-12-27T03:41:26.130Z · score: 22 (5 votes)
Values, Valence, and Alignment 2019-12-05T21:06:33.103Z · score: 12 (4 votes)
Doxa, Episteme, and Gnosis Revisited 2019-11-20T19:35:39.204Z · score: 14 (5 votes)
The new dot com bubble is here: it’s called online advertising 2019-11-18T22:05:27.813Z · score: 55 (21 votes)
Fluid Decision Making 2019-11-18T18:39:57.878Z · score: 9 (2 votes)
Internalizing Existentialism 2019-11-18T18:37:18.606Z · score: 10 (3 votes)
A Foundation for The Multipart Psyche 2019-11-18T18:33:20.925Z · score: 7 (1 votes)
In Defense of Kegan 2019-11-18T18:27:37.237Z · score: 10 (5 votes)
Why does the mind wander? 2019-10-18T21:34:26.074Z · score: 11 (4 votes)
What's your big idea? 2019-10-18T15:47:07.389Z · score: 29 (15 votes)
Reposting previously linked content on LW 2019-10-18T01:24:45.052Z · score: 18 (3 votes)
TAISU 2019 Field Report 2019-10-15T01:09:07.884Z · score: 39 (20 votes)
Minimization of prediction error as a foundation for human values in AI alignment 2019-10-09T18:23:41.632Z · score: 13 (7 votes)
Elimination of Bias in Introspection: Methodological Advances, Refinements, and Recommendations 2019-09-30T20:23:13.139Z · score: 16 (3 votes)
Connectome-specific harmonic waves and meditation 2019-09-30T18:08:45.403Z · score: 12 (10 votes)
Goodhart's Curse and Limitations on AI Alignment 2019-08-19T07:57:01.143Z · score: 15 (7 votes)
G Gordon Worley III's Shortform 2019-08-06T20:10:27.796Z · score: 16 (2 votes)
Scope Insensitivity Judo 2019-07-19T17:33:27.716Z · score: 25 (10 votes)
Robust Artificial Intelligence and Robust Human Organizations 2019-07-17T02:27:38.721Z · score: 17 (7 votes)
Whence decision exhaustion? 2019-06-28T20:41:47.987Z · score: 17 (4 votes)
Let Values Drift 2019-06-20T20:45:36.618Z · score: 3 (11 votes)
Say Wrong Things 2019-05-24T22:11:35.227Z · score: 99 (36 votes)
Boo votes, Yay NPS 2019-05-14T19:07:52.432Z · score: 34 (11 votes)
Highlights from "Integral Spirituality" 2019-04-12T18:19:06.560Z · score: 19 (22 votes)
Parfit's Escape (Filk) 2019-03-29T02:31:42.981Z · score: 40 (15 votes)
[Old] Wayfinding series 2019-03-12T17:54:16.091Z · score: 9 (2 votes)
[Old] Mapmaking Series 2019-03-12T17:32:04.609Z · score: 9 (2 votes)
Is LessWrong a "classic style intellectual world"? 2019-02-26T21:33:37.736Z · score: 31 (8 votes)
Akrasia is confusion about what you want 2018-12-28T21:09:20.692Z · score: 27 (16 votes)
What self-help has helped you? 2018-12-20T03:31:52.497Z · score: 34 (11 votes)
Why should EA care about rationality (and vice-versa)? 2018-12-09T22:03:58.158Z · score: 16 (3 votes)
What precisely do we mean by AI alignment? 2018-12-09T02:23:28.809Z · score: 29 (8 votes)
Outline of Metarationality, or much less than you wanted to know about postrationality 2018-10-14T22:08:16.763Z · score: 21 (18 votes)
HLAI 2018 Talks 2018-09-17T18:13:19.421Z · score: 15 (5 votes)
HLAI 2018 Field Report 2018-08-29T00:11:26.106Z · score: 51 (21 votes)
A developmentally-situated approach to teaching normative behavior to AI 2018-08-17T18:44:53.515Z · score: 12 (5 votes)
Robustness to fundamental uncertainty in AGI alignment 2018-07-27T00:41:26.058Z · score: 7 (2 votes)
Solving the AI Race Finalists 2018-07-19T21:04:49.003Z · score: 27 (10 votes)
Look Under the Light Post 2018-07-16T22:19:03.435Z · score: 25 (11 votes)

Comments

Comment by gworley on Was a terminal degree ~necessary for inventing Boyle's desiderata? · 2020-07-10T16:33:01.796Z · score: 3 (2 votes) · LW · GW

I actually agree generally with the idea that credentials aren't necessary, but that perhaps makes me all the more suspicious of evidence like the kind you present because I think it strongly risks suffering from unnoticed selection bias.

For example, maybe all the stuff that could be invented without having done the work to obtain a terminal degree was invented in the past, and newer stuff just couldn't be done without the degree. Sure, this is a noisy phenomenon: sometimes you'll find some low-hanging fruit we previously missed that someone without a degree can grab, or you'll find someone smart enough that a degree doesn't make a difference. But largely I think the data also fits a story where the present is different from the past in a way that implies degrees are needed now even if they were less necessary in the past.

I don't think the data you present does much to contradict this possibility, so it reasonably remains possible that you are both right (about the past) and wrong (about the present) at the same time.

Comment by gworley on Angela Pretorius's Shortform · 2020-07-08T17:09:47.916Z · score: 5 (3 votes) · LW · GW

I think one of the big difficulties around rent is the high transaction cost on both sides.

Your proposal helps address some of the friction that makes it costly for landlords to take a risk on tenants, and having lived in places where eviction is generally fairly easy (Florida, where you can evict someone in 3 to 10 days) and in places where it is fairly hard (California, where you can't really evict someone in less than 30 days, and 60 to 90 days is more realistic), I can say that the rental market certainly felt more efficient where eviction was easier rather than harder.

Alas, housing is not a commodity good where one house/apartment is as good as another at the same price point, and there are a lot of complicated factors involved in finding a good fit between renter and landlord, not to mention all the emotional friction present in the market (people become emotionally attached to the places they live; those places become part of their identity, and people then experience emotional pain if they are forced to move, making them more hesitant to switch). And of course there are the costs associated with moving stuff from one place to another, in personal labor, emotional labor, and money.

This all points to a market that people are likely to want more regulated rather than less, and to want more restrictions on evictions rather than fewer when they have the political capital to make that happen. There are likely ways to improve the situation dramatically, but they probably don't look like something as simple as changing eviction rules to make eviction easier (Singapore is widely held up as a model of how housing policy can work really well).

Comment by gworley on Causality and its harms · 2020-07-08T16:56:21.736Z · score: 2 (1 votes) · LW · GW

What would it even mean to say a theory of causality is "correct" here? We're talking about what it makes sense to apply the term causality to, and there's no matter of correctness at that level, only of usefulness to some purpose. It's only after we have some systematized way of framing a question that we can ask if something is correct within that system.

Comment by gworley on DARPA Digital Tutor: Four Months to Total Technical Expertise? · 2020-07-07T16:14:31.663Z · score: 3 (2 votes) · LW · GW

These are really impressive results! The main thing I wonder is to what extent we should consider this system AI. Maybe I missed this detail, but this sounds like a really impressive expert system turned to the problem of teaching, not a system we would identify as AI today. Does that seem like a fair assessment?

Comment by gworley on Causality and its harms · 2020-07-05T09:36:28.489Z · score: 2 (1 votes) · LW · GW

Agreed. Assigning causality requires having made a choice about how to carve up the world into categories so one part of the world can affect another. Without having made this choice we lose our normal notion of causality because there are no things to cause other things, hence causality as normally formulated only makes sense within an ontology.

And yet, there is some underlying physical process which drives our ability to model the world with the idea that things cause other things and we might reasonably point to it and say it is the real causality, i.e. the aspect of existence that we perceive as change.

Comment by gworley on Research ideas to study humans with AI Safety in mind · 2020-07-03T23:08:00.223Z · score: 4 (2 votes) · LW · GW

Thanks for writing this. I think there's a lot of useful work that can be done in this direction, and a current major bottleneck is describing it in a way that makes the people with the relevant skills aware of it and of why it is valuable.

Comment by gworley on PSA: Cars don't have 'blindspots' · 2020-07-01T18:10:43.314Z · score: 8 (5 votes) · LW · GW

One caveat: unlike the side mirrors, the rear-view mirror is optional safety equipment, since not all cars can take advantage of a rear-view mirror or they have limited rear-view mirror functionality (e.g. vans, trucks with trailers, etc.), and traditional mirror alignment is based on the idea that you don't have a rear-view mirror and need the side mirrors to see what's directly behind you.

This doesn't mean we shouldn't take advantage of such technology when it's available (when I drive a car with good visibility in the rear-view mirror I align my mirrors like the infographic shows, pointing out so I can use them to see to the sides of the car beyond my vision rather than the space behind it or along the side of the car), only that there are plenty of circumstances when this kind of mirror alignment is not possible.

Comment by gworley on Thoughts as open tabs · 2020-06-29T17:46:21.269Z · score: 8 (4 votes) · LW · GW

I think some productivity methods try to address this. For example, I've used GTD (Getting Things Done) in the past and feel like it helped a lot to deal with "open tabs" (GTD calls them "open loops") and re-establishing context.

Comment by gworley on ozziegooen's Shortform · 2020-06-29T17:43:49.597Z · score: 6 (3 votes) · LW · GW

One of the things I love about entertainment is that much of it offers evidence about how humans behave in a wide variety of scenarios. This has gotten truer over time, at least within Anglophone media, with its trend towards realism and away from archetypes and morality plays. Yes, it's not the best possible or most reliable evidence about how real humans behave in real situations, and it's a meme around here that you should be careful not to generalize from fictional evidence, but I also think it's better than nothing (I don't think reality TV is especially less fictional than other forms of entertainment with regard to how humans behave, given its heavy use of editing and loose scripting to punch up situations for entertainment value).

Comment by gworley on Gödel's Legacy: A game without end · 2020-06-28T20:01:49.644Z · score: 2 (1 votes) · LW · GW

The sumobot video doesn't show up for me in the post. Maybe also include an external link to it?

Comment by gworley on Life at Three Tails of the Bell Curve · 2020-06-28T19:30:44.964Z · score: 4 (3 votes) · LW · GW

I'll claim that I'm decently skilled at modeling people and understanding them. It wasn't always like this. I'm different enough from the typical person that my model of myself is not a very useful predictor of what other people are like for many socially relevant things (it's of course plenty useful if the reference class is not other humans culturally like me but, say, animals). I had to put in a lot of work over decades to get to a point where other people have useful gears rather than being better modeled as black boxes to me.

The nice benefit of this is that I think I better appreciate how hard it is to understand others. Woe to the neurotypical person who is similar enough to others that they never notice their predictions about most other people are good only because their crude model accidentally works, thanks to their being near the center of the distribution. They, perhaps rightly from their subjective experience, draw the conclusion that there are just some weird people out there they don't understand, rather than that their understanding of everyone is flawed even if it keeps making reasonable predictions in the situations they bother to check (sort of like Newtonian physics being fine so long as you don't go too fast and things aren't too big or too small).

Comment by gworley on Life at Three Tails of the Bell Curve · 2020-06-28T01:28:54.087Z · score: 3 (2 votes) · LW · GW

Nit: I think that sounds more like you're saying you're highly conscientious rather than neurotic, although maybe your statements leave out some of the details of your experience that lead you to say "neurotic". Of course, could just be you're both conscientious and neurotic in a way that makes it hard to tell them apart as separate dimensions.

Comment by gworley on Rudi C's Shortform · 2020-06-23T03:04:49.965Z · score: 4 (2 votes) · LW · GW

It's usually thought about the other way, i.e. we already are trying and failing to solve the human alignment problem (using social structures to get humans to do things in accord with particular values), so solutions to AI alignment must be of a class that cannot be or has not been attempted with humans. Examples can be drawn from business attempts to organize workers around a mission/objective/goal, state attempts to control people, and religious attempts to align behavior with religious teachings.

Comment by gworley on Neural Basis for Global Workspace Theory · 2020-06-22T20:52:43.454Z · score: 6 (3 votes) · LW · GW

Thanks, I really enjoy these kinds of details about the brain. I liked the level of detail you provided: less would have left me wanting more, and more would probably have been unnecessary.

Comment by gworley on The affect heuristic and studying autocracies · 2020-06-22T16:06:23.978Z · score: 2 (1 votes) · LW · GW

Only write up the obvious conclusions from your case-studies. I had plenty of subtle theories to explain Jordan’s policies, but just a few with strong evidence. I asked myself “if a young man from Mafraq reproduced my methodology, would he arrive at the same conclusions”, and left out ideas which failed the test.

This feels like the traditional academic approach, but it results in at least a couple of weird effects based on patterns we see in academic publishing:

  • a large body of unpublished ideas known only to insiders because they are shared informally, yet which still influence the results published in the field in a way that is opaque to outsiders and beyond comment or consideration
  • faking or exaggerating data/results in order to reach publication standards of evidence

This suggests attempts to reform academic publishing norms might be relevant here.

Comment by gworley on What's Your Cognitive Algorithm? · 2020-06-20T20:52:22.918Z · score: 4 (2 votes) · LW · GW

A quick summary of the phenomenology of my thoughts:

  • thoughts primarily have shapes, textures, and other touch-related features
    • prior to/below the level of words, my thoughts are like touch-felt objects that exist in a field that permeates my body
  • thinking feels like exploring, as in I'm traveling around finding/bumping into thought objects
  • sometimes I'm not very in touch with this aspect, though, and can't feel it, but I'm pretty sure it's always there for hard-to-explain reasons
  • when I can't see it, words seem to just come up from nowhere for unknown reasons

Some differences from your model:

  • I don't have what feels like a badness check; rather, it feels like I have a thought, and then maybe a linked thought about what its consequences might be, and sometimes those are bad.
    • but sometimes I might be distracted and not follow up and notice the bad association.

Comment by gworley on What's Your Cognitive Algorithm? · 2020-06-20T20:42:39.395Z · score: 8 (2 votes) · LW · GW

There is the actual fact of what it is like to experience your own mind, and then there is the way you make sense of it, reified into concepts, in order to explain it to yourself and others. Just because the reification of the experience of our own thinking is flawed in a lot of ways doesn't make it not evidence of our thoughts; it only makes it noisy, unreliable, and "known" in ways that have to be "unknown" (we have to find and notice confusion).

You worry that asking people how they think will tell us more about their understanding of how they think than about how they actually think, and that's probably true, but it's also useful, because they got that understanding somehow and it's unlikely to be totally divorced from reality. Lacking better technology for seeing into our minds, we're left to perform hermeneutics on our self reports.

Comment by gworley on ‘Maximum’ level of suffering? · 2020-06-20T20:01:53.958Z · score: 3 (2 votes) · LW · GW

One possibility is "a lot", in that humans seem to interpret pain on a logarithmic scale such that 2/10 pain is 10x worse than 1/10 pain, etc.. However, there is likely some physiological limit to how much sensor data the human brain can process as pain and still register it as pain and suffer from it. This leaves out the possibility of modifying humans in ways that would allow them to experience greater pain.

Note that I also think this question is exactly symmetrical to asking "what's the maximum level of pleasure", and so likely the answer to one is the answer to the other.

Comment by gworley on The point of a memory palace · 2020-06-20T19:49:48.241Z · score: 3 (2 votes) · LW · GW

My understanding is that classical education definitely taught the method of loci (memory palaces), and that it's only in the last 100-200 years that it's fallen out of favor, and in an exponential way where most of the decline happened during the middle of the 20th century. I'm sure there are some conclusions that can be drawn from that fact, assuming I have it right.

Comment by gworley on G Gordon Worley III's Shortform · 2020-06-20T19:29:31.083Z · score: 4 (2 votes) · LW · GW

Love for the real state of the universe, and the simultaneous desire to pick better futures and acceptance of whichever future actually obtains

This is the hidden half of what got me thinking about this: my growing being with the world as it is rather than as I understand it.

Comment by gworley on G Gordon Worley III's Shortform · 2020-06-20T15:20:56.918Z · score: 13 (5 votes) · LW · GW

People often talk of unconditional love, but they implicitly mean unconditional love for or towards someone or something, like a child, parent, or spouse. But this kind of love is by definition conditional because it is love conditioned on the target being identified as a particular thing within the lover's ontology.

True unconditional love is without condition, and it cannot be directed because to direct is to condition and choose. Unconditional love is love of all, of everything and all of reality even when not understood as a thing.

Such love is rare, so it seems worth pursuing the arduous cultivation of it.

Comment by gworley on Memory is not about the past · 2020-06-19T19:59:43.357Z · score: 4 (2 votes) · LW · GW

The votes on this post make it seem contentious. My best guess is that there's some difference in understanding over what "memory is not about the past, it's about the future" means by "about". I take your post to be about the purpose of memory, i.e. what it is useful for. But "about" is ambiguous, also meaning a way of categorizing the contents, which yields the nonsense rephrasing of your thesis as "the content of memory isn't generated from past events, but future ones".

Then again, maybe people are turned off by something else; this is just a guess.

Comment by gworley on Set image dimensions using markdown · 2020-06-17T17:01:28.409Z · score: 2 (1 votes) · LW · GW

I've been similarly frustrated trying to get images to scale on LW, so looking forward to an answer (although maybe the new editor just eliminates this problem, outside using markdown?).

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-17T01:15:35.993Z · score: 2 (1 votes) · LW · GW

Based on comments/links so far it seems I should revise the names and add a fourth:

  • IDA = IDA
  • IRL -> Ambitious Value Learning (AVL)
  • DTA -> Embedded Agency (EA)
  • + Brain Emulation (BE)
    • Build AI that either emulates how human brains work or is bootstrapped from human brain emulations.

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-17T01:11:51.170Z · score: 2 (1 votes) · LW · GW

Oh, I forgot about emulation approaches, i.e. bootstrap AI by "copying" human brains, which you mention. Thanks!

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-17T01:09:36.457Z · score: 2 (1 votes) · LW · GW

That's true, but there's a natural and historical relationship here with what was in the past termed "seed AI", even if this is not an approach anyone is actively pursuing; that's the kind of thing I was hoping to point at without using that outmoded term.

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-17T01:07:16.110Z · score: 4 (2 votes) · LW · GW

Thanks. Your post specifically is pretty helpful because it helps with one of the things that was tripping me up, which is what standard names people call different methods. Your names do a better job of capturing them than mine did.

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-17T01:03:40.898Z · score: 2 (1 votes) · LW · GW

Actually this post was not especially helpful for my purpose, and I should have explained why in advance because I anticipated someone would link it. Although it helpfully lays out a number of proposals people have made, it does more to work out what's going on with those proposals than to find ways they can be grouped together (except incidentally). I even reread this post before posting this question and it didn't help me improve on the taxonomy I proposed, which I already had in mind as of a few months ago.

Comment by gworley on What are the high-level approaches to AI alignment? · 2020-06-16T17:16:38.034Z · score: 4 (2 votes) · LW · GW

My initial thought is that there are at least 3, which I'll give the following names (with short explanations):

  • Iterated Distillation and Amplification (IDA)
    • Build an AI, have it interact with a human, create a new AI based on the interaction of the human and the AI, and repeat until the AI is good enough or it reaches a fixed point and additional iterations don't change it.
  • Inverse Reinforcement Learning (IRL)
    • Build an AI that tries to infer human values from observations and then acts based on those inferred values.
  • Decision Theorized Agent (DTA)
    • Build an AI that uses a decision theory that causes it to make choices that will be aligned with human interests.

All of these are woefully underspecified, so improved summaries that you think accurately explain these approaches are also appreciated.
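For the IDA item in particular, here's a minimal pseudocode-style sketch in Python of the loop I have in mind (all the names are placeholders I'm making up, not any actual implementation):

```python
def ida_sketch(model, human, amplify, distill, max_rounds=10):
    """Toy illustration of the iterate-until-fixed-point loop described above."""
    for _ in range(max_rounds):
        # Amplification: the human and the current model work together,
        # producing a slower but more capable composite system.
        amplified = amplify(human, model)
        # Distillation: train a new, faster model to imitate the composite.
        new_model = distill(amplified)
        # Stop if we're good enough, i.e. further iterations change nothing.
        if new_model == model:
            break
        model = new_model
    return model
```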

Comment by gworley on Creating better infrastructure for controversial discourse · 2020-06-16T15:26:32.891Z · score: 6 (3 votes) · LW · GW

Currently there are three active forks of Lesswrong; Itself, the alignment forum and the EA forum. Can adding a new fork that is more focused on good discourse on controversial/taboo topics be a good idea?

Minor correction: the alignment forum isn't really a fork. Instead AF is like a subforum within LW.

Comment by gworley on Achieving AI alignment through deliberate uncertainty in multiagent systems · 2020-06-15T18:10:04.889Z · score: 2 (1 votes) · LW · GW

I generally have an unfavorable view of multi-agent approaches to safety, especially those that seek to achieve safety via creating multiple agents (I'm more sympathetic to considerations of how to increase safety on the assumption that multiple agents are unavoidable). That being said, you might find these links interesting for some prior related discussion on this site:

Comment by gworley on Cartesian Boundary as Abstraction Boundary · 2020-06-12T21:31:15.763Z · score: 4 (2 votes) · LW · GW

Lately I've been thinking a bit about why programmers have preferences for different programming languages. Like, why is it that I want a language that is (speaking informally)

  • flexible (lets me do things many ways)
  • dynamic (decides what to do at run time rather than compile time)
  • reflexive (lets me change the program source while it's running, or change how to interpret a computation)

and other people want the opposite:

  • rigid (there's one right way to do something)
  • static (decides what to do at compile time, often so much so that static analysis is possible)
  • fixed (the same code is guaranteed to always behave the same way no matter the execution context)

And I think a reasonable explanation might be a difference in how much different programmers value the creation of a Cartesian boundary in the language, i.e. how much they want to be able to reason about the program as if it existed outside the execution environment. My preference is to sit toward the "embedded" end of a less-to-more Cartesian-like abstraction dimension of program design, preferring abstractions that, if they don't expose the underlying structure outright, at least make it more accessible.

This might not be perfect, as I'm babbling a bit to try to build out my model of where the difference in preference comes from, but it seems kind of interesting and a possible additional example of where this kind of Cartesian-like boundary gets created.
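To make the flexible/dynamic/reflexive cluster concrete, here's a small illustrative Python sketch of the kind of runtime reflexivity I mean, and of why it trades away the ability to reason about the program from "outside" it:

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())  # "hello"

# Reflexive/dynamic: the running program changes its own behavior.
# No static analysis of the source above could have predicted this.
Greeter.greet = lambda self: "bonjour"
print(g.greet())  # "bonjour" -- same object, new behavior

# A rigid/static/fixed language forces this decision before the program runs,
# which is what lets you treat the program as a Cartesian object you can
# fully reason about from outside its execution environment.
```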

Comment by gworley on Karma fluctuations? · 2020-06-12T01:44:47.183Z · score: 3 (2 votes) · LW · GW

You seem to be rejecting a position I'm not taking, possibly because I didn't explain it in a maximally clear way.

I'm not saying to vote up/down things you think others will like/dislike, I'm saying vote up/down the things you want other people to read/not read.

Notice how this is not the same as voting up/down what you like/dislike or what you personally want to read/not read or what you think others will like/dislike or what you think others will themselves want to read/not read. I'm saying think of it as saying "I want/don't want this to be seen by others".

Given this framing I end up rarely downvoting things, mostly reserving my downvote for things that feel like an obvious waste of time for all readers. I upvote lots of things by this criterion, especially including things I disagree with or think are wrong, because they seem worth engaging with. And of course lots of things get no vote from me, because I fail to form a judgement of whether or not they're worth reading.

Comment by gworley on Karma fluctuations? · 2020-06-11T17:57:34.600Z · score: 3 (2 votes) · LW · GW

The point is to downvote content that you want to see less of, not content that you disagree with. If by "controversial" you mean "something some people don't want to see", then I can't speak for others, but I can say that personally the whole internet is full of content that I don't want to see (including, and indeed especially, content that I mostly agree with).

In practice I think people don't do a great job separating "disagree" from "I don't want to see this", because for many people disagreeing often implies not wanting to see something. I wish the norm were less focused on what I want to see and more on what I think is worth being seen by people reading LW.

I think this shift from personal preference to curating content for others changes the approach to voting in a way that is more likely to result in votes that reflect what is worth reading when a person comes to the site rather than what people on LW like.

(I previously have had more to say on voting on LW)

Comment by gworley on Goal-directedness is behavioral, not structural · 2020-06-10T01:10:09.907Z · score: 8 (3 votes) · LW · GW

Attempting to approach goal-directedness behaviorally is, I expect, going to run into the same problems as trying to infer values from behavior alone: you can't do it unless you make some normative assumption. This is exactly analogous to Armstrong's No Free Lunch theorem for value learning and, to turn it around the other way, we can similarly assign any goal whatsoever to a system based solely on its behavior unless we make some sufficiently strong normative assumption about it.
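To spell out the analogy a bit (loosely following Armstrong's argument; the notation here is mine): if all we observe is a policy $\pi$, then any planner-reward pair $(p, R)$ with $p(R) = \pi$ fits the data equally well, for example

$$p_{\text{rational}}(R_\pi) = \pi, \qquad p_{\text{anti-rational}}(-R_\pi) = \pi, \qquad p_{\text{indifferent}}(R) = \pi \ \text{for every } R,$$

where $R_\pi$ is a reward that favors exactly the actions $\pi$ takes, $p_{\text{anti-rational}}(R) := p_{\text{rational}}(-R)$, and $p_{\text{indifferent}}$ ignores its reward argument entirely. The same move lets you ascribe essentially any goal to a system from its behavior alone, which is why some normative assumption is needed to break the tie.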

Comment by gworley on Bob Jacobs's Shortform · 2020-06-05T21:45:22.519Z · score: 4 (2 votes) · LW · GW

I think you're pointing to something that is a fully general problem with reasoning biases and logical fallacies: if people know about it, they might take you pointing out that they're doing it as an attack on them rather than noticing they may be inadvertently making a mistake.

Comment by gworley on Why isn’t assassination/sabotage more common? · 2020-06-04T20:34:26.755Z · score: 4 (2 votes) · LW · GW

If the USA assassinates foreigners, foreigners can fight back. It’s in everyone’s best interest to maintain a low-assassination equilibrium instead of a high-assassination one.

This is the standard game-theoretic reason I've always heard. Assassination and sabotage are effective and can be carried out with enough secrecy that no one could necessarily prove you did it, but engaging in them creates a world where you have to defend against them because they are normalized, so they're tools reserved only for those cases where the gain is deemed worth the risk.
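One way to cash out the equilibrium talk (the payoffs below are illustrative, not from the post): treat it as a symmetric game where each state chooses whether to normalize assassination,

$$\begin{array}{c|cc} & \text{Refrain} & \text{Assassinate} \\ \hline \text{Refrain} & (2,2) & (0,3) \\ \text{Assassinate} & (3,0) & (1,1) \end{array}$$

With payoffs like these the one-shot game is a prisoner's dilemma, but in the repeated game both sides can sustain (Refrain, Refrain), since a first assassination is expected to trigger retaliation and push play toward the worse (1,1) outcome; defecting only makes sense when a single use is judged worth risking that shift.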

Comment by gworley on Focus: you are allowed to be bad at accomplishing your goals · 2020-06-04T03:42:09.460Z · score: 5 (2 votes) · LW · GW

I agree with your intuition that an agent should be allowed to be bad at accomplishing its purpose.

To me the issue is that you're leaving out self-awareness of the goal. That is, to me what makes an agent fully agentic is that it not only is trying to do something but it knows it is trying to do something. This creates a feedback loop within itself that helps keep it on target.

Many agentic-ish systems, like RL systems, sort of look like this, but the feedback loop that keeps them on target exists outside themselves, and thus the agent is actually the RL system plus the human researchers running it. Or you have undirected systems like evolution that look sort of agentic but then don't quite, because you can "trick" them into doing things that are "not their purpose": they don't really have one; they just execute with no sense of purpose, even if their pattern of behavior is well predicted by modeling them as if they had goals.

Comment by gworley on The Law of Cultural Proximity · 2020-06-04T03:33:57.397Z · score: 2 (1 votes) · LW · GW

I appreciate the sentiment here, but I feel like it's lacking something. Something like: I can't really agree or disagree with it. I feel like this is telling a just-so story that could be right but that I also can't find much opportunity to prove wrong (yes, there are some straightforward things to test; I mean this more within the reference class of things this is aiming to explain and forecast). It doesn't say enough or give us enough reason to believe in a "law of cultural proximity" beyond some vague intuition that this seems roughly like what the world looks like.

Basically I'd like to see posts on LW that point more to evidence rather than to isolated reasoning within a model that may or may not be relevant to understanding the real world.

Comment by gworley on What proportion of US companies would agree to this gross pay deduction / direct donation if asked by an employee? · 2020-06-02T22:28:16.189Z · score: 2 (1 votes) · LW · GW

I've not heard of anyone doing this, but in principle it sounds like it should work, since you can contract for all kinds of benefits. You're right that employers might not like it; some reasons include:

  • it's different and weird and makes payroll more complicated
  • might be hard to explain to investors
  • might have strange, unforeseen tax consequences for the employee or the employer

Comment by gworley on Why We Age, Part 2: Non-adaptive theories · 2020-05-29T21:00:05.690Z · score: 2 (1 votes) · LW · GW

For the next post, on cases where aging is adaptive, I think the octopus would make a great case study, since octopuses are well known for "self-destructing" after reproducing, theorized to be possibly a way of preventing them from competing with their offspring or eating them, i.e. dying after reproduction so they don't outcompete their offspring and drive the species toward extinction through a cycle of falling survival-to-reproduction rates.

Comment by gworley on OpenAI announces GPT-3 · 2020-05-29T17:00:18.070Z · score: 0 (3 votes) · LW · GW

On the other hand, this still looks more like a service than part of a path towards general intelligence, even if it's a very broad, flexible, and fairly general service. For example, I don't expect GPT-3 to do things like come up with things to do on its own, only to do the things it is asked to (although I'm sure there's some interesting wandering that can happen by applying GPT-3 recursively).

Comment by gworley on The principle of no non-Apologies · 2020-05-29T16:53:55.425Z · score: 3 (2 votes) · LW · GW

I follow a similar policy of not apologizing unless I really mean it, and meaning it for me is acknowledging that I am ethically culpable for a harm caused. By this I mean something like: I knew enough to have done otherwise, but through negligence or motivated reasoning I either actively caused harm or through inaction allowed a harm to occur. In those cases an apology seems warranted.

I don't apologize for lots of things, though. If I was ignorant of information that would have allowed me to avoid the harm and only learn about it afterward, there's no reason to apologize, but there is a need to acknowledge that I would have acted otherwise had I known and to publicly make that update. I think this serves much of the purpose of apology, but it also recognizes there's nothing for me to regret: I did the best I could and still failed, and that's okay.

(Of course, the real answer is that we all always do our best and couldn't have done anything other than what we did, so none of us need ever regret anything, but that's operating at the wrong level of abstraction. Apologies exist in the social ontology and need to deal with regret that can appear there even if there's no causal regret because there is no free will.)

Comment by gworley on The principle of no non-Apologies · 2020-05-29T16:44:03.565Z · score: 5 (3 votes) · LW · GW

Yeah, this is slightly annoying that we have this idiom. And unfortunately people sometimes take an expression of sympathy said as "I'm sorry" as an apology which makes them respond to what you literally said rather than the intent, sort of like if you said "bless you" after someone sneezed and they asked "oh, are you a priest?" or "no thanks, I'm an atheist".

I think the intent of "I'm sorry" here is to say "I regret this is happening to you" along with some combination of "I feel sorrow at hearing this news". Still, it's confusing.

My general policy is to try to avoid saying "I'm sorry" to mean "I sympathize with you" and go for something more direct like "that sucks" or "oh no" or just a wordless expression of sympathy through body language, although sometimes I say it anyway. Language is tough sometimes!

Comment by gworley on Trust-Building: The New Rationality Project · 2020-05-29T16:36:31.587Z · score: 5 (3 votes) · LW · GW

To increase unity and pursue the truth, your goal is to find foreign communities, and determine whether they are friendly or dangerous.

This is an interesting framing I hadn't considered. I usually think of it more as just an exploration to find what perspectives communities/traditions/etc. offer in an ongoing hermeneutical search for the truth, digging into the wrong models other people have to get a bit less wrong myself, which seems pretty compatible with your take.  

Comment by gworley on Normalization of Deviance · 2020-05-27T19:32:31.207Z · score: 2 (1 votes) · LW · GW

Here's a good article I recently came across about normalization of deviance in safety settings, where it can produce a "drift towards danger".

Comment by gworley on Predicted Land Value Tax: a better tax than an unimproved land value tax · 2020-05-27T19:30:57.257Z · score: 2 (1 votes) · LW · GW

My usual answer to "why tax land?" is "the speed of light", i.e. the physical constraints of our universe making locality valuable. And we're okay to tax things that are powered by things like physical constants because they are constant and if you get the policy right you minimize the creation of disincentives on the margin, i.e. you tax the "fixed" value of land (locality) so you avoid doing things like creating price floors and ceilings or creating marginal disincentives that result in no development or less development since any marginal development will pay for itself, taxes included.
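A toy version of the marginal-disincentive point (the numbers and notation are mine): say a lot's location is worth $L$ per year no matter what's built on it, and a proposed improvement would yield $V$ per year at an annualized cost of $C$.

$$\text{tax on total property value at rate } t:\ \text{build iff } V(1-t) > C \qquad\qquad \text{tax on land value only: bill} = tL \text{ either way, so build iff } V > C$$

The land-only tax leaves the build/don't-build margin exactly where it would be with no tax at all, which is the sense in which taxing the "fixed" part avoids creating marginal disincentives.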

Comment by gworley on Predicted Land Value Tax: a better tax than an unimproved land value tax · 2020-05-27T16:26:09.068Z · score: 8 (5 votes) · LW · GW

Give up. Taxation isn't about optimality, it's about power and perception. The primary question is "how much can the authority extract without losing the constituency".

I think there are two ways we can approach this.

One is as an interesting problem to look at. It's not necessarily that policy proposals will be adopted, but exploring policies is an interesting exercise, and it may lead to marginally better policies or even Pareto improvements in policy if they were adopted. And following the idea that the purpose of policy exploration is to have ready-at-hand better policies/ideas when a crisis strikes and people are looking for alternatives because the current system isn't working, it seems worth doing this work even if it doesn't seem clear what the path to adoption is.

The second is that you're right, humans often forego "better" solutions for ones that better serve other purposes. For example, is taxing income a good idea? Probably not, but it's relatively easy to do and to understand, so it's a straightforward policy choice. Does this mean we can't consider alternative taxation systems to the one we have, though? I think it doesn't, only that people developing policy must eventually consider things other than economic efficiency if they hope to develop policies that are likely to be adopted.

Comment by gworley on Thinking About Filtered Evidence Is (Very!) Hard · 2020-05-26T03:04:42.182Z · score: 2 (1 votes) · LW · GW

Assumption 3. A listener is said to have minimally consistent beliefs if each proposition X has a negation X*, and P(X)+P(X*)≤1.

One thing that's interesting to me is that this assumption is frequently not satisfied in real life due to underspecification, e.g. P(I'm happy) + P(I'm not happy) can exceed 1 because "happy" may be underspecified. I can't think of a really strong minimal example, but I feel like this pops up a lot in discussions of complex issues where a dialectic develops because neither thesis nor antithesis captures everything, and so both are underspecified in ways that make their naive union exceed the available probability mass.
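One way to make the underspecification point precise (the notation is mine): suppose "happy" can resolve to two distinct precise propositions, $H_1$ (say, momentary mood) and $H_2$ (say, overall life satisfaction). A listener may evaluate "I'm happy" under the first reading and "I'm not happy" under the second, giving

$$P(X) + P(X^*) = P(H_1) + P(\lnot H_2),$$

which can exceed 1 even though $P(H_i) + P(\lnot H_i) = 1$ holds for each precise reading; the minimal-consistency assumption fails only because $X$ and $X^*$ aren't actually negations of the same proposition.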

Comment by gworley on Baking is Not a Ritual · 2020-05-26T01:14:03.221Z · score: 14 (7 votes) · LW · GW

To say a little more, programming is the thing that resonates for me most strongly with the above description of baking as not ritual. I notice it because there are lots of people who learn to program by ritual. They do well enough to get jobs and become my coworkers, but then they also write code full of cruft because they are programming by ritual rather than through deep understanding.

The classic examples of this that come to my mind are things like people thinking the iterator variable on a for loop must be named i or else it won't work, or struggling with sed because they don't realize they can pick the separator character on each invocation, or thinking they need [technology X] because they heard successful businesses use [technology X].
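For what it's worth, the for-loop misconception is easy to dispel with a couple of lines (and the sed point is just that the character after the s command sets the delimiter, so s|old|new| works as well as s/old/new/):

```python
# The iterator variable is just a name; "i" is a convention, not a requirement.
for i in range(3):
    print(i)

for row_index in range(3):  # behaves identically
    print(row_index)
```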