Minimization of prediction error as a foundation for human values in AI alignment 2019-10-09T18:23:41.632Z · score: 12 (6 votes)
Elimination of Bias in Introspection: Methodological Advances, Refinements, and Recommendations 2019-09-30T20:23:13.139Z · score: 16 (3 votes)
Connectome-specific harmonic waves and meditation 2019-09-30T18:08:45.403Z · score: 11 (9 votes)
Goodhart's Curse and Limitations on AI Alignment 2019-08-19T07:57:01.143Z · score: 15 (7 votes)
G Gordon Worley III's Shortform 2019-08-06T20:10:27.796Z · score: 16 (2 votes)
Scope Insensitivity Judo 2019-07-19T17:33:27.716Z · score: 19 (9 votes)
Robust Artificial Intelligence and Robust Human Organizations 2019-07-17T02:27:38.721Z · score: 17 (7 votes)
Whence decision exhaustion? 2019-06-28T20:41:47.987Z · score: 17 (4 votes)
Let Values Drift 2019-06-20T20:45:36.618Z · score: 3 (11 votes)
Say Wrong Things 2019-05-24T22:11:35.227Z · score: 99 (36 votes)
Boo votes, Yay NPS 2019-05-14T19:07:52.432Z · score: 34 (11 votes)
Highlights from "Integral Spirituality" 2019-04-12T18:19:06.560Z · score: 21 (20 votes)
Parfit's Escape (Filk) 2019-03-29T02:31:42.981Z · score: 40 (15 votes)
[Old] Wayfinding series 2019-03-12T17:54:16.091Z · score: 9 (2 votes)
[Old] Mapmaking Series 2019-03-12T17:32:04.609Z · score: 9 (2 votes)
Is LessWrong a "classic style intellectual world"? 2019-02-26T21:33:37.736Z · score: 31 (8 votes)
Akrasia is confusion about what you want 2018-12-28T21:09:20.692Z · score: 27 (16 votes)
What self-help has helped you? 2018-12-20T03:31:52.497Z · score: 34 (11 votes)
Why should EA care about rationality (and vice-versa)? 2018-12-09T22:03:58.158Z · score: 16 (3 votes)
What precisely do we mean by AI alignment? 2018-12-09T02:23:28.809Z · score: 29 (8 votes)
Outline of Metarationality, or much less than you wanted to know about postrationality 2018-10-14T22:08:16.763Z · score: 19 (17 votes)
HLAI 2018 Talks 2018-09-17T18:13:19.421Z · score: 15 (5 votes)
HLAI 2018 Field Report 2018-08-29T00:11:26.106Z · score: 49 (20 votes)
A developmentally-situated approach to teaching normative behavior to AI 2018-08-17T18:44:53.515Z · score: 12 (5 votes)
Robustness to fundamental uncertainty in AGI alignment 2018-07-27T00:41:26.058Z · score: 7 (2 votes)
Solving the AI Race Finalists 2018-07-19T21:04:49.003Z · score: 27 (10 votes)
Look Under the Light Post 2018-07-16T22:19:03.435Z · score: 25 (11 votes)
RFC: Mental phenomena in AGI alignment 2018-07-05T20:52:00.267Z · score: 13 (4 votes)
Aligned AI May Depend on Moral Facts 2018-06-15T01:33:36.364Z · score: 9 (3 votes)
RFC: Meta-ethical uncertainty in AGI alignment 2018-06-08T20:56:26.527Z · score: 18 (5 votes)
The Incoherence of Honesty 2018-06-08T02:28:59.044Z · score: 22 (12 votes)
Safety in Machine Learning 2018-05-29T18:54:26.596Z · score: 17 (4 votes)
Epistemic Circularity 2018-05-23T21:00:51.822Z · score: 5 (1 votes)
RFC: Philosophical Conservatism in AI Alignment Research 2018-05-15T03:29:02.194Z · score: 29 (10 votes)
Thoughts on "AI safety via debate" 2018-05-10T00:44:09.335Z · score: 33 (7 votes)
The Leading and Trailing Edges of Development 2018-04-26T18:02:23.681Z · score: 24 (7 votes)
Suffering and Intractable Pain 2018-04-03T01:05:30.556Z · score: 13 (3 votes)
Evaluating Existing Approaches to AGI Alignment 2018-03-27T19:57:39.207Z · score: 22 (5 votes)
Idea: Open Access AI Safety Journal 2018-03-23T18:27:01.166Z · score: 64 (20 votes)
Computational Complexity of P-Zombies 2018-03-21T00:51:31.103Z · score: 3 (4 votes)
Avoiding AI Races Through Self-Regulation 2018-03-12T20:53:45.465Z · score: 6 (3 votes)
How safe "safe" AI development? 2018-02-28T23:21:50.307Z · score: 27 (10 votes)
Self-regulation of safety in AI research 2018-02-25T23:17:44.720Z · score: 33 (10 votes)
The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation 2018-02-23T21:42:20.604Z · score: 15 (4 votes)
AI Alignment and Phenomenal Consciousness 2018-02-23T01:21:36.808Z · score: 10 (2 votes)
Formally Stating the AI Alignment Problem 2018-02-19T19:06:04.086Z · score: 14 (6 votes)
Bayes Rule Applied 2018-02-16T18:30:16.470Z · score: 12 (3 votes)
Introduction to Noematology 2018-02-05T23:28:32.151Z · score: 11 (4 votes)


Comment by gworley on When we substantially modify an old post should we edit directly or post a version 2? · 2019-10-11T20:22:01.826Z · score: 5 (3 votes) · LW · GW

I generally think new posts are a good idea. The old post is what it was when it was written. If you no longer endorse it or think you can write a better version, do so and link it from the original. There is value in being able to find and read old versions of things as they were written, without the past being edited out of existence: it lets us see the trail of how an idea was shaped and how the author grew in their understanding and ability to express themselves. Additionally, the comments on a post may no longer make sense if you substantially edit it.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T20:15:18.486Z · score: 4 (2 votes) · LW · GW

Yeah, this has been a really good comment section for figuring out how my internal models are not as easily conveyed to others as I had hoped. I'll likely write a follow-up post trying to explain this idea again with revised language to make the point clearer, leaning more on specifics from existing research on these models, since there seem to be some inferential gaps I had forgotten about. What feels like the exciting new part to me (prediction error signal = valence = ground of value) is maybe the least interesting and least important aspect for others to evaluate if they lack my beliefs about how what I'm gesturing at with "predictive coding" and "minimization of prediction error" works.

Comment by gworley on "Mild Hallucination" Test · 2019-10-11T20:10:28.801Z · score: 3 (2 votes) · LW · GW

Yeah, that's a decent approximation; same effect at twilight. It's missing some really important aspect of the experience that's hard to explain, though: it's only visually similar, and lacks the aliveness that fills everything when I experience things as glowing or bright or self-illuminating.

Comment by gworley on "Mild Hallucination" Test · 2019-10-11T16:50:54.310Z · score: 3 (2 votes) · LW · GW

Probably not? It's a thing that started happening to me after ~500 hours of meditation, and it happens in any environment. There's not even a thing to look for exactly because it doesn't feel like a normal visual hallucination. Things look basically the same as they did before but it feels like they are bright. Maybe the closest thing would be something like what happens when your eyes are artificially dilated.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T16:43:19.533Z · score: 3 (2 votes) · LW · GW

It looks like I read your post but forgot about it. I'll have to look at it again.

I am building this theory in a way that I think is highly compatible with Friston, although I also don't have a gears-level understanding of Friston, so I find it easier to think in terms of control systems which appear to offer an equivalent model to me.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T16:34:13.355Z · score: 4 (2 votes) · LW · GW

I find speaking in terms of minimization of prediction error useful to my own intuitions, but it does increasingly look like what I'm really thinking of is just generic homeostatic control systems. I like talking in terms of prediction error because it makes translation to other similar theories easier (other Bayesian brain theories and Friston's free energy theory), but it's right to say I'm just thinking about control systems sending signals to hit a set point, even if some of those control systems learn in a way that looks like Bayesian updating or minimization of prediction error and others don't.

The sense in which I think of this theory as parsimonious is that I don't believe there is a simpler mechanism that can explain what we see. If we could talk about these phenomena in terms of control systems without using signals about distance from set points I'd prefer that, and I think the complexity we get from having to build things out of such simple components is the right move in terms of parsimony rather than having to postulate additional mechanisms. As long as I can explain things adequately without having to introduce more moving parts I'll consider it maximally parsimonious as far as my current knowledge and needs go.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-11T04:23:00.280Z · score: 2 (1 votes) · LW · GW

So, similarly, we can't predict that you or I verbally report positive/negative/neutral attaching to percepts from the claim that the sensory hierarchy is composed of units which are controllers. A controller has valence in that it has goals and how-it's-doing on those goals, but why should we expect that humans verbally report the direct experience of that? Humans don't have direct conscious experience of everything going on in neural circuitry.

Yeah, this is a good point, and I agree it's one of the things I'm looking for others to verify with better brain imaging technology. I find myself working ahead of what we can completely verify now because I'm willing to bet that the theory is right, or at least right enough that however it's wrong won't throw out the work I do.

Comment by gworley on Gears vs Behavior · 2019-10-11T01:44:44.288Z · score: 2 (1 votes) · LW · GW

Great and simple explanation of an important topic. I just hope I can remember to link this post often so people can find it when they start to question whether or not they really need to know about the gears of something.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T22:04:14.243Z · score: 2 (1 votes) · LW · GW
We Need To Explain Why Humans Differentiate Goals and Beliefs, Not Just Why We Conflate Them
You mention that good/bad seem like natural categories. I agree that people often seem to mix up "should" and "probably is", "good" and "normal", "bad" and "weird", etc. These observations in themselves speak in favor of the minimize-prediction-error theory of values.
However, we also differentiate these concepts at other times. Why is that? Is it some kind of mistake? Or is the conflation of the two the mistake?
I think the mix-up between the two is partly explained by the effect I mentioned earlier: common practice is optimized to be good, so there will be a tendency for commonality and goodness to correlate. So, it's sensible to cluster them together mentally, which can result in them getting confused. There's likely another aspect as well, which has something to do with social enforcement (ie, people are strategically conflating the two some of the time?) -- but I'm not sure exactly how that works.

This seems like an important question: if all these phenomena really are ultimately the same thing and powered by the same mechanisms, why do we make distinctions between them and find those distinctions useful?

I don't have an answer I'm satisfied with, but I'll try to say a few words about what I'm thinking and see if that moves us along.

My first approximation would be that we're looking at things we experience by different means, and so give them different names because they present in different ways when we observe them. Goals (I assume by this you mean the cluster of things we might call desires, aversions, and generally intention towards action) probably tend to be observed by noticing the generation of outgoing signals that usually produce observable actions (movement, speech, etc.), whereas beliefs (the cluster of things that includes thoughts and maybe emotions) are internal and don't send out signals to action beyond mental action.

I don't know enough to be very confident in that, though, and, like you, think there could be numerous reasons why it makes sense to treat them as separate even if they are fundamentally not very different.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T21:42:22.070Z · score: 2 (1 votes) · LW · GW
It's Easy to Overestimate The Degree to which Agents Minimize Prediction Error
I often enjoy variety -- in food, television, etc -- and observe other humans doing so. Naively, it seems like humans sometimes prefer predictability and sometimes prefer variety.
However: any learning agent, almost no matter its values, will tend to look like it is seeking predictability once it has learned its environment well. It is taking actions it has taken before, and steering toward the environmental states similar to what it always steers for. So, one could understandably reach the conclusion that it is reliability itself which the agent likes.
In other words: if I seem to eat the same foods quite often (despite claiming to like variety), you might conclude that I like familiarity when it's actually just that I like what I like. I've found a set of foods which I particularly enjoy (which I can rotate between for the sake of variety). That doesn't mean it is familiarity itself which I enjoy.
I'm not denying that mere familiarity has some positive valence for humans; I'm just saying that for arbitrary agents, it seems easy to over-estimate the importance of familiarity in their values, so we should be a bit suspicious about it for humans too. And I'm saying that it seems like humans enjoy surprises sometimes, and there's evolutionary/machine-learning reasoning to explain why this might be the case.

I've replied about surprise, its benefits, and its mechanism a couple times now. My theory is that surprise is by itself bad but can be made good by having control systems that expect surprise and send a good signal when surprise is seen. Depending on how this gets weighted, this creates a net positive mixed emotion where surprise is experienced as something good and serves many useful purposes.
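The weighting story above can be sketched as a toy calculation. All names, weights, and numbers here are my own illustration of the mechanism being described, not a claim about actual neural weighting: raw surprise contributes negative valence, while a second-order control system that expects surprise contributes positive valence when surprise arrives, and the weighted sum determines whether the mixed emotion comes out net good.

```python
# Toy sketch of the "mixed emotion" account: raw surprise is negatively
# valenced, but a second-order control system that *expects* surprise sends
# a positive signal when surprise shows up. Weights are purely illustrative.

def mixed_valence(surprise, raw_weight=1.0, expecting_weight=1.5):
    raw = -raw_weight * surprise        # surprise is by itself bad
    meta = expecting_weight * surprise  # the surprise-expecting system is satisfied
    return raw + meta

# with these weights, surprise comes out net positive...
assert mixed_valence(2.0) > 0
# ...but flip the weighting and the same surprise is net negative
assert mixed_valence(2.0, raw_weight=2.0, expecting_weight=1.0) < 0
```

The point of the sketch is just that "surprise is good" need not be a primitive: it can fall out of how strongly the surprise-expecting systems are weighted relative to the raw signal.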

I think this mostly dissolves the other points you bring up, which I read as contingent on thinking the theory doesn't predict humans would find variety and surprise good in some circumstances. If not, please let me know what concerns remain in light of this explanation (or object to my explanation of why we expect surprise to sometimes be net good).

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T21:37:01.901Z · score: 2 (1 votes) · LW · GW
Evolved Agents Probably Don't Minimize Prediction Error
If we look at the field of reinforcement learning, it appears to be generally useful to add intrinsic motivation for exploration to an agent. This is the exact opposite of predictability: in one case we add reward for entering unpredictable states, whereas in the other case we add reward for entering predictable states. I've seen people try to defend minimizing prediction error by showing that the agent is still motivated to learn (in order to figure out how to avoid unpredictability). However, the fact remains: it is still motivated to learn strictly less than an unpredictability-loving agent. RL has, in practice, found it useful to add reward for unpredictability; this suggests that evolution might have done the same, and suggests that it would not have done the exact opposite. Agents operating under a prediction-error penalty would likely under-explore.

I ended up replying to this in a separate post since I felt like similar objections kept coming up. My short answer is: minimization of prediction error is minimization of error at predicting input to a control system that may not be arbitrarily free to change its prediction set point. This means a control system won't always be globally minimizing prediction error, but only locally minimizing it; it may never become less wrong over time because it can't change its prediction to better match the input.

From an evolutionary perspective my guess is that true Bayesian updating is a fairly recent adaptation, and most minimization of prediction error is minimization of error of mostly fixed prediction set points that are beneficial for survival.

Comment by gworley on "Mild Hallucination" Test · 2019-10-10T21:30:50.055Z · score: 3 (2 votes) · LW · GW

No, that's not quite what I had in mind for glowing, although I do experience something like it in some circumstances. What I meant was more like things feel bright, like they have an inner light shining out from them.

I do think it's something like what you are describing from The Mind Illuminated. After all, I practice shikantaza, and before it picked up that name it was known as the method of "silent illumination", so it doesn't surprise me to see someone else talking about this as "inner light" (some traditions talk about this as "luminosity"; it feels to me like sensing perfection).

Comment by gworley on "Mild Hallucination" Test · 2019-10-10T18:23:46.752Z · score: 3 (2 votes) · LW · GW

I'm not sure how neurodivergent I am, but I have seen visual snow since I was a little kid (I used to tell people I could "see the air" and they'd be like "wut????"; I can't remember a time I didn't see it), and of course afterimages. Since meditating more I see glowing effects (not auras), flickering/vibrations, breathing, and even tracers, both during and outside meditation. I also have "tinnitus" that manifests as hearing a tone, mostly only when other sounds don't drown it out, although very occasionally something "clicks" and I temporarily notice it for a few seconds when things are noisy.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T18:13:04.376Z · score: 2 (1 votes) · LW · GW

One point of confusion that keeps coming up seems worth clarifying in a top level comment to this post:

minimization of prediction error is minimization of error at predicting input to a control system that may not be arbitrarily free to change its prediction set point

This means a control system won't always be globally minimizing prediction error, but only locally minimizing it; it may never become less wrong over time because it can't change its prediction to better match the input.

My suspicion is that in humans the neocortex mostly is normatively Bayesian, made out of lots of identical control systems that get specialized to do different things and each one of them can freely update in a Bayesian manner to optimally minimize prediction error. The rest of it is probably a lot less Bayesian, with harder or impossible to update prediction set points that serve purposes ultimately necessary for survival and reproduction that got set via evolutionary processes.
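The two kinds of unit described above can be sketched in a few lines. This is a minimal illustration of my own, not the author's formalism: each unit holds a set point (its "prediction"), emits valence as negative distance from that set point, and either updates the set point toward observations (the "Bayesian-ish" neocortical case) or holds it fixed (the evolution-given, survival-serving case).

```python
# Minimal sketch (illustrative, not a neural model) of the two kinds of
# control units described above: one with a fixed, evolution-given set point
# and one whose set point updates toward its inputs.

class ControlUnit:
    def __init__(self, set_point, learning_rate=0.0):
        self.set_point = set_point          # the unit's "prediction"
        self.learning_rate = learning_rate  # 0.0 => set point never updates

    def step(self, observation):
        error = observation - self.set_point
        valence = -abs(error)  # good when input matches the set point
        # updatable units move their prediction toward the input;
        # fixed units (learning_rate=0) never do
        self.set_point += self.learning_rate * error
        return valence

fixed = ControlUnit(set_point=37.0)                  # e.g. a homeostatic target
learner = ControlUnit(set_point=0.0, learning_rate=0.5)

for obs in [10.0, 10.0, 10.0]:
    fixed.step(obs)
    learner.step(obs)

# the learner's prediction converges toward the input stream,
# while the fixed unit's never moves
```

Under this toy picture, "minimizing prediction error" for the fixed unit can only mean acting to change its inputs, which is the sense in which such units drive behavior rather than belief.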

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T17:56:26.929Z · score: 5 (3 votes) · LW · GW
The tone of the following is a bit more adversarial than I'd like; sorry for that. My attitude toward predictive processing comes from repeated attempts to see why people like it, and all the reasons seeming to fall flat to me. If you respond, I'm curious about your reaction to these points, but it may be more useful for you to give the positive reasons why you think your position is true (or even just why it would be appealing), particularly if they're unrelated to what I'm about to say.

I'll reply to your points soon, since doing so is a helpful way for me and others to explore this idea, though it might take a little time as this is not the only thing I have to do. First, though, I'll respond to this request for the positive case, which I seemingly left out.

I have two main lines of evidence that come together to make me like this theory.

One is that it's elegant, simple, and parsimonious. Control systems are simple; they look to me like the simplest thing we might reasonably call "alive" or "conscious" if we redefine those terms in ways not anchored on our experience here on Earth. I think the reason it's so hard to answer questions about what is alive and what is conscious is that the naive categories we give those names are ultimately rooted in simple phenomena involving information "pumping" that locally reduces entropy, and many things outside our historical experience do this too, things it historically made more sense to think of as "dead" rather than "alive". In a certain sense this leads me to a position you might call "cybernetic panpsychism", but that's just fancy words for saying nothing so special is going on in the universe that makes us different from rocks and stars other than (increasingly complex) control systems creating information.

Another is that it fits with much of my understanding of human psychology. Western psychology doesn't really get down to a level where it has a solid theory of what's going on at the lowest levels of the mind, but the Buddhist psychology of the Abhidharma does: it says that right after "contact" (stuff interacting with neurons) comes "feeling/sensing", which is claimed to always contain a signal of positive, negative, or neutral judgement. My own experience with meditation showed me something similar, such that when I learned about this theory it seemed like an obviously correct way of explaining what I was experiencing. This makes me strongly believe that any theory of value we want to develop should account for this experience of valence showing up attached to every experience.

In light of this second reason, I'll add to my first reason that it seems maximally parsimonious that if we were looking for an origin of valence it would have to be about something simple that could be done by a control system, and the simplest thing it could do that doesn't simply ignore the input is test how far off an observed input is from a set point. If something more complex is going on, I think we'd need an explanation for why sending a signal indicating distance from a set point is not enough.

I briefly referenced these above, but left it all behind links.

I think there are also some other lines of evidence that are less compelling to me but seem worth mentioning:

  • People have managed to build AI out of control systems minimizing prediction error, albeit, as I propose is necessary, with some fixed set points that prevent dark-room problems.
  • Neurons do seem to function like simple control systems, though I think we have yet to determine with sufficient certainty that this is all that is going on.
  • Predictive coding admits explanations for many phenomena, but this risks just-so stories of the sort we see when evolutionary psychology tries to claim more than it can.
Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T17:24:04.328Z · score: 2 (1 votes) · LW · GW

I think this is sort of sideways. It's true, but I think it also misses the deeper aspects of the theory I have in mind.

Yes, from easily observed behavior that's what it looks like: exploitation is about minimizing prediction error and exploration is about, if not maximizing it, then at least not minimizing it. But the theory says that if we see exploration and the theory is correct, then exploration must somehow be built out of things that are ultimately trying to minimize prediction error.

I hope to give a more precise, mathematical explanation of this theory in the future, but for now I'll give the best English-language explanation I can of how exploration might work (keeping in mind that, if this theory is right, we should eventually be able to find out exactly how it works with sufficient brain-scanning technology).

I suspect exploration happens because a control system in the brain takes as input how much error minimization it observes as measured by how many good and bad signals get sent in other control systems. It then has a set point for some relatively stable and hard to update amount of bad signals it expects to see, and if it has not been seeing enough surprise/mistakes then it starts sending its own bad signals encouraging "restlessness" or "exploration". This is similar to my explanation of creativity from another comment.
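That meta-controller story can be put in a few lines of toy code. This is my own illustrative sketch of the mechanism just described, with made-up names and numbers: a function stands in for the system that watches the error signals of other control systems, holds a hard-to-update set point for how much error it expects to see, and emits a "restlessness" drive when observed error falls short of that expectation.

```python
# Illustrative sketch of the exploration mechanism described above: a
# meta-controller expects a certain amount of error/surprise from the other
# control systems it observes, and emits an "explore" drive when recent
# experience has been too predictable. Names and values are hypothetical.

def restlessness(observed_errors, expected_error=1.0):
    """Positive 'explore' drive when mean observed error falls below the
    (hard-to-update) set point for expected error; zero otherwise."""
    mean_error = sum(observed_errors) / len(observed_errors)
    return max(0.0, expected_error - mean_error)

# a stretch of highly predictable experience -> drive to explore
assert restlessness([0.1, 0.0, 0.2]) > 0
# plenty of surprise already -> no extra drive
assert restlessness([2.0, 1.5, 3.0]) == 0.0
```

Note that under this sketch exploration is still error minimization: the meta-controller is reducing *its own* prediction error (too little observed error is itself a deviation from its set point).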

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T00:15:55.091Z · score: 3 (2 votes) · LW · GW

See my reply to jessicata's similar objection and let's try to merge follow ups over there.

Comment by gworley on Minimization of prediction error as a foundation for human values in AI alignment · 2019-10-10T00:14:54.098Z · score: 3 (2 votes) · LW · GW

This is a problem if we attempt to explain things only in terms of minimization of prediction error, usually considered in the form of the "dark room" problem. The solution is to allow the system to have, as I mention, set points that are slow to update or never update. These ensure humans keep doing things they would otherwise avoid because they would be surprising.

To consider your cases of surprising-and-good and predictable-and-bad, I believe I have plausible explanations of these phenomena, although I will freely admit these explanations border on being just-so stories, because we currently lack the evidence to verify them from the ground up.

  • surprising and good:
    • creativity: In order to explore the space of possible solutions without getting stuck in local maxima, creativity seems valuable. My theory is that the drive to do surprising things that we call creativity is powered by control systems looking at other control systems and predicting they will generate negative signals indicating error. This makes the systems causing creativity something like financial derivatives.
    • agenty people who positively surprise you: I expect this is a kind of mixed emotion. Surprise is by itself bad, but when it's mixed with lots of other things that cause other control systems to send good signals (because their set points are validated), the result can be a net positive experience. We can even learn to expect surprise, via secondary control systems monitoring the output of other control systems to see when they are surprised, and so end up thinking of surprise as secondarily good.
    • children: I expect much of the reasoning around children will be grounded in systems that intentionally don't track truth but instead use set points that are evolutionarily adaptive to get people to do things that are good for differential reproduction and bad for truth.
  • predictable and bad:
    • boring routines: Not all people find boring routines bad, but among those who do I expect the mechanism to be set points encouraging more error ("creativity") as described above.
    • highly derivative art: Derivative art probably looks a lot like boring routines: some people like it because it's predictable, others don't because they are "restless", in that they have control systems expecting more error (I don't think this is exactly openness to experience, but it does overlap with that psychometric).
    • solitary confinement: This prevents many set points from being satisfied that are not about predicting reality but about survival, and that are minimally mutable. People experience solitary confinement as bad because they keep predicting they will be outside, see friends, etc.; to give those predictions up would be to give up important set points that enable survival, so it's a kind of continual hell of being disappointed in every moment with the knowledge that it's not going to change.

It's probably helpful to note that when I say "prediction" I'm often equivocating between it and "set point": I don't think of predictions in this theory as necessarily predictions of what will actually be seen, even if they often are, but rather as set points in control systems that get them to combine in particular ways by predicting inputs, even if those predictions are sometimes forced by biology and evolution to be consistently wrong or right via observing other signals, etc.

Comment by gworley on Categories: models of models · 2019-10-09T19:24:15.555Z · score: 4 (5 votes) · LW · GW

This continues to be a slyly gentle series that draws you into something before you know it. Well done!

As a side note, maybe you or the admins can set these posts up as a sequence so they are linked together.

Comment by gworley on A framework for speed reading · 2019-10-09T01:15:36.314Z · score: 2 (1 votes) · LW · GW

I really appreciated this for the insight that I can't read faster than I can think (this is so obvious in hindsight it's hard to believe I hadn't previously considered it!). This gives good context for why I can't seem to read faster than between 200 and 250 wpm while both reading everything and applying all the techniques I've been taught. That I'm only just now learning this despite dabbling over the years at speeding up my reading is surprising to me, but as you say that seems to be because everyone is focused on fragmented reading and maybe is incentivized to ignore limits to give people what they want.

Comment by gworley on Formal Metaethics and Metasemantics for AI Alignment · 2019-10-08T23:44:34.734Z · score: 3 (2 votes) · LW · GW

I like the intention and spirit of this.

Abstract: We construct a fully technical ethical goal function for AI by directly tackling the philosophical problems of metaethics and mental content. To simplify our reduction of these philosophical challenges into “merely” engineering ones, we suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available.

I think that, just as with AIXI, these sorts of assumptions mean this approach is practically unworkable, but it's possible that, like AIXI, it can serve as a model of the ideal. I realize you have posted your code interspersed with comments, and I look forward to seeing more as you develop your explanation of it (right now I lack a specific enough model to evaluate it beyond liking the intent).

Comment by gworley on Human instincts, symbol grounding, and the blank-slate neocortex · 2019-10-02T19:17:20.460Z · score: 6 (3 votes) · LW · GW

I found this post really interesting. My main interest has been in understanding human values, and I've been excited by predictive coding because it possibly offers a way to ground values (good being derived from error minimization, bad from error maximization). The CCA theory could help explain why it seems so much of the brain is "doing the same kind of thing" that could result in predictive coding being a useful model even if it turns out the brain doesn't literally have neurons wired up as control systems that minimize sensory prediction error.

Comment by gworley on What are we assuming about utility functions? · 2019-10-02T18:48:07.233Z · score: 2 (1 votes) · LW · GW

FWIW I am both pro-utility and anti-utility at the same time: I think your AGI utility hypothesis and the ASI utility hypothesis are basically correct, but think the human utility hypothesis is wrong (humans can't be adequately modeled by utility functions for the purposes of alignment, even if they can be for other purposes). As a consequence, I worry that CEV might not be possible, depending on what level of identity preservation is desired (in fact I think CEV is largely ill-defined due to identity boundary issues, but that's a separate matter).

Comment by gworley on The first step of rationality · 2019-09-30T23:57:47.407Z · score: 2 (1 votes) · LW · GW

Agreed. It's definitely sexual misconduct in the sense generally meant by the lay precept against sexual misconduct in Buddhist schools.

Comment by gworley on The first step of rationality · 2019-09-30T20:28:11.212Z · score: 5 (3 votes) · LW · GW

I think you miss my point. He's offering a detailed explanation here of how we can fail to integrate our actions with what we value, and I'm pointing out that maybe he can do that because he has such difficulty with it himself. I'm not making any particular point here about whether or not he is an expert in meditation, bodhi, etc., or whether that is relevant to sexual misconduct beyond its being an instance of failing to integrate his actions with what he claims to value. That most people manage not to engage in sexual misconduct, commit murder, steal, etc. doesn't seem very relevant to the point of understanding the gears of how not to do it, even if most of us don't need to know about the gears to get our desired outcome.

Comment by gworley on What's your favorite notetaking system? · 2019-09-30T19:40:20.418Z · score: 3 (2 votes) · LW · GW

I used to take notes but don't anymore. Instead my "system" consists of:

  • send myself an email if there's something I need to do (actionable notes)
  • search for stuff online when I need it

I find it reliable enough that I'm not much bothered by forgetting or losing maybe 20% of things, since it has forced me to get good at remembering the core of what's important so that I can find it again later.

Comment by gworley on The first step of rationality · 2019-09-30T19:32:52.456Z · score: 4 (5 votes) · LW · GW

I agree. I'll make a bold claim here that I've made elsewhere (sometimes using different language), which is that I wasn't really a rationalist until stream entry. Before that I was just pretending to be a rationalist. By extension from one example (albeit with lots of experience and related evidence that makes me think this is likely valid), I suspect no one is capable of systematized winning without stream entry, mostly because the kind of unification of mind and agency needed isn't present beforehand. That doesn't mean winning without stream entry should be discounted, but it does make it fragile and contingent, because it's not robustly systematized when the person doing the winning isn't yet capable of robust systematization.

To put a different frame on it, many rationalists like to talk about being "aspiring rationalists" rather than using the rationalist label directly because to them "rationalist" suggests some ideal to which one aspires but cannot be fully realized (this is especially true if you think of being a rationalist in Bayesian terms and then have to come to grips with being a finite, embedded being rather than a Cartesian being with hypercomputation). In that framing, I'd say being an aspiring rationalist without stream entry is like being a starfish aspiring to become a fish: the gap is so wide it causes a type error. Luckily, unlike this unlucky starfish, humans can transform themselves into the thing that can be an aspiring rationalist.

(There's a likely objection here that some would say I'm not a rationalist, making it likely someone would discount this on those grounds. I would instead say that I've transcended what many people think of as rationality, but that doesn't mean I've rejected it, only come to see it as valuable only up to a certain extent where it functions. That's a topic for another time, but wanted to put that here since it's relevant to evaluating my claims.)

Comment by gworley on The first step of rationality · 2019-09-30T19:09:23.157Z · score: 2 (1 votes) · LW · GW

There's something interesting happening here. Not quite irony, but something about seeing an expert notice and describe in detail a general problem and then go on to suffer that problem themselves. It's sort of like the only way to become really expert in something is to do it, and that goes for being an expert in all the ways we can fail to be integrated, even with bodhi.

Comment by gworley on crabman's Shortform · 2019-09-28T20:28:30.479Z · score: 2 (1 votes) · LW · GW

I found this to vary by field. When I studied topology and combinatorics we proved all the big important things as homework. When I studied automata theory and measure theory, we did what your teacher is doing.

Comment by gworley on lionhearted's Shortform · 2019-09-28T20:24:38.110Z · score: 2 (1 votes) · LW · GW

I'm somewhat suspicious that this is an artifact of the way we usually measure IQ since tests often look like "more IQ for more questions answered correctly in a fixed amount of time", so unless we really mean to include speed in our measures of g, this seems possibly spurious.

Comment by gworley on Running the Stack · 2019-09-28T20:17:04.945Z · score: 2 (1 votes) · LW · GW

I mean I just naturally operate like a queue. Or at least I do now; maybe I used to be more stack-like. Things in the queue can definitely get reordered, but a queue feels more natural to me than a stack, the stack being some kind of degenerate case where the queue is malfunctioning.
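To make the queue-versus-stack distinction concrete (a hypothetical sketch of my own, not anything from the post), Python's `collections.deque` supports both disciplines, including the kind of reordering described above:

```python
from collections import deque

# Stack (LIFO): the most recently added task is handled first.
stack = ["write report", "answer email"]
next_from_stack = stack.pop()      # "answer email"

# Queue (FIFO): tasks are handled in arrival order.
queue = deque(["write report", "answer email"])
next_from_queue = queue.popleft()  # "write report"

# A queue can still be reordered, e.g. promoting an urgent task to the front:
queue = deque(["write report", "answer email", "call dentist"])
queue.remove("call dentist")
queue.appendleft("call dentist")   # now first in line
```

The task names are made up for illustration; the point is only that reordering is an operation on top of the queue discipline, not a departure from it.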

Comment by gworley on Running the Stack · 2019-09-26T17:45:40.120Z · score: 3 (2 votes) · LW · GW

Hmm, I think I agree, but a queue instead of a stack.

Comment by gworley on Honoring Petrov Day on LessWrong, in 2019 · 2019-09-26T17:37:34.112Z · score: 24 (11 votes) · LW · GW

the temptation, the call to infamy

button shining, yearning to be pressed

can we endure these sinuous fingers coiled?

only the hours know our hearts

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-26T17:21:14.960Z · score: 2 (1 votes) · LW · GW

Maybe, although there's a distinction I'm trying to make between knowledge and ontological knowledge that may not be coming across clearly. If it is coming across and you have some particular argument for why there isn't or can't be such a meaningful distinction, I'd be interested to hear it.

As for my model of reality having too many moving parts, you're right, I'm not totally unconfused about everything yet, and it's the place the remaining confusion lives.

Comment by gworley on On Becoming Clueless · 2019-09-24T18:00:13.390Z · score: 3 (2 votes) · LW · GW

One of the great challenges we seem to face today is that the world is more complex than any of us can completely reason about. This was probably always true, but it used to at least be possible to limit the scope of what counted as the world in a way that let us pretend we could understand everything; today there's so much complexity that we all have to live with not knowing everything. How do we deal with all this uncertainty?

Many responses are possible that engage with the rising uncertainty rather than hide from it. Bayesianism and LW-style rationality offer one, if we can manage not to get confused about our confusion and become conceited and overconfident. I find another in Zen, where we train ourselves to dwell in not knowing and then allow compassionate and effective action to arise spontaneously to the best of our bounded abilities. Others are possible.

These are some of the ways we are working now on answering your question about what we will do then.

Comment by gworley on Emotions are actions, not goals or messages · 2019-09-23T19:27:20.655Z · score: 2 (1 votes) · LW · GW


I've come to see that emotion looks a lot like a confused, confounded category. We use "emotion" to point at a lot of different things, evidenced, if nothing else, by the way people vacillate between talking about countable, distinct emotions (e.g. I'm sad, I'm angry, I'm sad and angry at the same time) and uncountable, fuzzy emotional energy (e.g. I'm a little sad, I'm very angry, I'm a mix of sad and angry). This makes it hard to talk about the category we colloquially call "emotion(s)" and say much about its etiology.

This doesn't mean we can't try, but so long as we remain confused at best we can talk about some aspect of the ball-of-mud labeled "emotion". In that sense I think, for example, it's both right that emotions are ultimately actions and that they are ultimately messages, because much of the work is being done by your perspective on the confusion rather than the thing itself.

Comment by gworley on Meetups: Climbing uphill, flowing downhill, and the Uncanny Summit · 2019-09-23T18:05:29.457Z · score: 5 (2 votes) · LW · GW

At REACH we have a meditation meetup every Tuesday. Early on we decided the format should be a combination of doing the thing, talking about the thing, and talking with each other in that order. We experimented a bit before we settled on it, and basically found it was the format that seemed to work best, for some subjective, non-explicit measure of "best".

I think it works because first you have to show up and do the thing we are here to do. Then there is time to have directed conversation within a more limited scope so everyone has the opportunity to participate. Finally we come to the socializing, and it comes at the end for multiple reasons (would get in the way of the goal of the meetup; able to scale longer or shorter with the conditions of the week). So in your model it's climbing up high first, then slowly coming down in a way that lets you coast down to the valley of hanging out, all in one meetup so even if it's your first time you get the whole experience.

I'm not sure how well this can generalize since we are lucky that we are showing up every week to do the same activity, but it might be extensible to other sorts of meetups, rationalist or EA or not.

Comment by gworley on TurnTrout's shortform feed · 2019-09-23T17:31:16.845Z · score: 4 (2 votes) · LW · GW

I think a reasonable and related question we don't have a solid answer to is whether humans are already capable of mind crime.

For example, maybe Alice is mad at Bob and imagines causing harm to Bob. How well does Alice have to model Bob for her imaginings to be mind crime? If Alice has low cognitive empathy is it not mind crime but if her cognitive empathy is above some level is it then mind crime?

I think we're currently confused enough about what mind crime is that it's hard to even begin to know how we could answer these questions based on more than gut feelings.

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-21T22:10:28.798Z · score: 2 (1 votes) · LW · GW

I've thought about this a bit, and I don't see my way through to the thinking that makes you suggest this: I don't see a reduction happening here, much less one that bundles together confusion so that it only looks simpler. Can you say a bit more to make your perspective on this clearer?

Comment by gworley on Everyday belief in diminishing returns is resistant to diminishing returns · 2019-09-21T02:41:19.472Z · score: 3 (2 votes) · LW · GW

I agree with the thrust of your argument since it is true for some things, but I disagree on the issue of meditating. Most of the cool stuff happens only with large amounts of dedicated practice, such that I believe a person would be better served by, say, 10 days of 10 hours of meditation each than by 400 days of 15 minutes of meditation (100 days of 1 hour of meditation would still be pretty good, maybe even better than the 10 days, but for different reasons than why the 10 days is better than the 400).

To me this suggests that it may be hard to know when it is appropriate to apply what you notice here and when not.

Comment by gworley on Divergence on Evidence Due to Differing Priors - A Political Case Study · 2019-09-17T18:36:28.215Z · score: 6 (3 votes) · LW · GW

I don't have any object-level comments on this post (it seems well reasoned to me and makes a reasonable point about the minds and reasoning of the two people whose comments are considered), but to me this looks like the kind of discussion of politics that is appropriate for LW (and since it's on the Frontpage I take it the mods agree): it treats politics as a ground in which to analyze a phenomenon where people are motivated in ways that sharpen distinctions, so that we can easily analyze them while avoiding saying anything about the object-level political conclusions.

Comment by gworley on Matthew Barnett's Shortform · 2019-09-17T18:19:05.793Z · score: 2 (1 votes) · LW · GW

Did you have some specific cases in mind when writing this? For example, HCH is interesting and not obviously going to fail in the ways that some other proposals I've seen would, and the proposal there seems to have gotten better as more details have been fleshed out, even if there's still some disagreement on things that can eventually be tested but haven't been yet. Against this we've seen lots of things, like various oracle AI proposals, that to my mind have fatal flaws right from the start, stemming from misunderstandings, such that they can't easily be salvaged.

I don't want to disincentivize thinking about solving AI alignment directly when I criticize something, but I also don't want to let pass things that to me have obvious problems that the authors probably didn't think about or thought about from different assumptions that maybe are wrong (or maybe I will converse with them and learn that I was wrong!). It seems like an important part of learning in this space is proposing things and seeing why they don't work so you can better understand the constraints of the problem space to work within them to find solutions.

Comment by gworley on If you had to pick one thing you've read that changed the course of your life, what would it be? · 2019-09-16T20:21:04.688Z · score: 3 (2 votes) · LW · GW

Creating Friendly AI

Runner up: GEB

Comment by gworley on Devil's Dialectic · 2019-09-16T20:19:32.623Z · score: 2 (1 votes) · LW · GW

How, if at all, do you view this as different from the normal Hegelian dialectic, in particular with regards to using it as a method of moving towards reflective equilibrium, which is what you seem to be aiming for?

Comment by gworley on hereisonehand's Shortform · 2019-09-16T20:02:12.134Z · score: 3 (2 votes) · LW · GW

Part of the problem was that applying those insights in a way that beats trained humans is hard: until recently, those models couldn't handle all the variables and data humans could, and so ignored many things that made a difference. Now that more data can be fed into the models, they can make the same or better predictions than humans can, and thus stand a chance of outperforming them, rather than making "correct" but poorly informed decisions that, in the real world, would have lost games.

Comment by gworley on What are the merits of signing up for cryonics with Alcor vs. with the Cryonics Institute? · 2019-09-16T19:54:46.490Z · score: 6 (3 votes) · LW · GW

I chose Alcor based on a few things:

  • Alcor has a history of fighting and winning legal battles to preserve patients when patients wanted to be preserved and living relatives did not.
  • Alcor has a history of taking in patients from failed providers.
  • Alcor has been around longer.
  • I worry that CI's lower cost means they are creating externalities that need to be picked up by other parts of the system, in particular around funding research.

On net I think of Alcor as the more trusted option, and consider it worth it if you can afford it (and if you can really afford it, you can sign up with both as I know a few people have).

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-13T22:59:14.192Z · score: 7 (4 votes) · LW · GW

So, having a little more space from all this now, I'll say that I'm hesitant to try to provide justifications for two reasons. First, certain parts of the argument require explaining complex internal models of human minds that are a level more complex than I can explain even as I use them (I only seem able to interpret myself coherently one level of organization below the maximum level of organization present in my mind). Second, other parts of the argument require gnosis of certain insights that I (and, to the best of my knowledge, anyone) don't know how to readily convey without hundreds to thousands of hours of meditation and one-on-one interactions (though I do know a few people who continue to hope they may yet discover a way to make that kind of thing scalable, even though we haven't figured it out in 2500 years, maybe because we were missing something important that would let us do it).

So it is true that I can't provide adequate episteme of my claim, and maybe that's what you're reacting to. I don't consider this a problem, but I also recognize that within some parts of the rationalist community that is considered a problem (I model you as being one such person, Duncan). So given that, I can see why from your point of view it looks like I'm just making stuff up or worse since I can't offer "justified belief" that you'd accept as "justified", and I'm not really much interested in this particular case in changing your mind as I don't yet completely know myself how to generate that change in stance towards epistemology in others even though I encountered evidence that lead me to that conclusion myself.

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-13T22:43:17.859Z · score: 8 (4 votes) · LW · GW

I forget if we've talked about this specifically before, but I rarely couch things in ways that make clear I'm talking about what I think rather than what is "true" unless I am pretty uncertain and want to make that really clear or expect my audience to be hostile or primarily made up of essentialists. This is the result of having an epistemology where there is no direct access to reality so I literally cannot say anything that is not a statement about my beliefs about reality, so saying "I think" or "I believe" all the time is redundant because I don't consider eternal notions of truth meaningful (even mathematical truth, because that truth is contingent on something like the meta-meta-physics of the world and my knowledge of it is still mediated by perception, cf. certain aspects of Tegmark).

I think of "truth" as more like "correct subjective predictions, as measured against (again, subjective) observation", so when I make claims about reality I'm always making what I think of as claims about my perception of reality since I can say nothing else and don't worry about appearing to make claims to eternal, essential truth since I so strongly believe such a thing doesn't exist that I need to be actively reminded that most of humanity thinks otherwise to some extent. Sort of like going so hard in one direction that it looks like I've gone in the other because I've carved out everything that would have allowed someone to observe me having to navigate between what appear to others to be two different epistemic states where I only have one of them.

This is perhaps a failure of communication, and I think I speak in ways in person that make this much clearer and then I neglect the aspects of tone not adequately carried in text alone (though others can be the judge of that, but I basically never get into discussions about this concern in person, even if I do get into meta discussions about other aspects of epistemology). FWIW, I think Eliezer has (or at least had) a similar norm, though to be fair it got him into a lot of hot water too, so maybe I shouldn't follow his example here!

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-12T03:59:38.816Z · score: 2 (3 votes) · LW · GW

Correct, it was made in a nonpublic but not private conversation, so you are not the only agent to consider, though admittedly the primary one other than myself in this context. I'm not opposed to discussing disclosure, but I'm also happy to let the matter drop at this point since I feel I have adequately pushed back against the behavior I did not want to implicitly endorse via silence since that was my primary purpose in continuing these threads past the initial reply to your comment.

Comment by gworley on G Gordon Worley III's Shortform · 2019-09-11T23:49:31.065Z · score: 3 (6 votes) · LW · GW

I cannot adequately do that here because it relies on information you conveyed to me in a non-public conversation.

I accept that you say that's not what you're doing, and I am happy to concede that your internal experience of yourself as you experience it tells you that you are doing what you are doing, but I now believe that my explanation better describes why you are doing what you are doing than the explanation you are able to generate to explain your own actions.

The best I can maybe offer is that I believe you have said things that are better explained by an intent to enforce norms rather than to argue for norms and imply that the general case should be applied in this specific case. I would say the main lines of evidence revolve around how I interpret your turns of phrase, how I read your tone (confrontational and defensive), which aspects of things I have said you have chosen to respond to, how you have directed the conversation, and my general model of human psychology with the specifics you are giving me filled in.

Certainly I may be mistaken in this case and I am reasoning off circumstantial evidence which is not a great situation to be in, but you have pushed me hard enough here and elsewhere that it has made me feel it is necessary to act to serve the purpose of supporting the conversation norms I prefer in the places you have engaged me. I would actually really like this conversation to end because it is not serving anything I value, other than that I believe not responding would simply allow what I dislike to continue and be subtly accepted, and I am somewhat enjoying the opportunity to engage in ways I don't normally so I can benefit from the new experience.