I'm not sure what the question is here, so I'll comment instead.
Now, if the Church Turing Hypothesis is true, then this metaphorical tape is sufficiently powerful to simulate not only boring things like computers, but also fancy things like black-holes and (dare I say it) human intelligence!
I believe this to be an overstep. First, the Church-Turing Thesis is not a formal claim whose truth we can readily assess (formally speaking, we'd probably just say it's false), but rather a belief that some interpret as "the universe is computable" but that mostly shows up in computer science as a way to handwave around the messy details of proving any particular function computable. For it to be a formal claim, we would need to know more physics than we do, enough to pin down the true metaphysics of the universe. Thus by invoking it you put the cart before the horse, asserting without justification a thing that would already prove your argument.
Since this is part of the post that seems to be making an argument you disagree with, I'm inclined to view your description as a strawman of MWI in light of this. If you mean it to be a strong argument against MWI, I think you'll have to present it in a way that would convince someone who believes in MWI, since this reads to me like you haven't understood the MWI position and so are objecting to a position that is superficially similar to MWI but is not the real position.
P.P.S. In case my own viewpoint was not obvious, I think "shut up and calculate" means we only worry about things that could potentially affect our future observations, and worrying about whether or not the other branches of the multiverse "exist" is about as meaningful as worrying about how many angels could stand on the head of a pin.
That's a pragmatic view, and you are free to ignore the (currently) metaphysical question being addressed by MWI because you think it doesn't matter to your life, but it's also not an argument against MWI, only against MWI mattering to your purposes.
Or more generally, should we lump all of these levels together or not?
On the one hand, I think yes, because I think the same basic mechanism is at work (homeostatic feedback loop).
On the other hand, no, because those loops are wired together in different ways in different parts of the brain to do different things. I draw my model of what the levels are from the theory of dependent origination, but other theories are possible, and maybe we can eventually get one thoroughly grounded in empirical neuroscience.
After seeing another LW user (sorry, forgot who) mention this post in their commenting guidelines, I've decided to change my own commenting guidelines to the following, matching pretty close to the SSC commenting guidelines that I forgot existed until just a couple days ago:
Comments should be at least two of true, useful, and kind, i.e. you believe what you say, you think the world would be worse without this comment, and you think the comment will be positively received.
I like this because it's simple and it says what rather than how. My old guidelines were all about how:
Seek to foster greater mutual understanding and prefer good faith to bad, nurture to combat, collaboration to argument, and dialectic to debate. Do that by:
-aiming to understand the author and their intent, not what you want them to have said or fear that they said
-being charitable about potential misunderstandings, assuming each person is trying their best to be clearly understood and to advance understanding
-resolving disagreement by finding the crux or synthesizing contrary views to sublimate the disagreement
I'm fairly tolerant, but if you're making comments that are actively counterproductive to fruitful conversation by failing to read and think about what someone else is saying, I'm likely to ask you to stop and, if you don't, delete your comments and, if you continue, ban you from commenting on my posts. Some behavior that is especially likely to receive warnings, deletions, and bans:
More generally, I think the SSC commenting guidelines might be a good cluster point for those of us who want LW comment sections to be "nice" and so mark our posts as norm-enforcing. If this catches on, it might help with converging on the few clusters of commenting norms that people actually want, without having lots of variation between authors.
I similarly suspect automation is not really happening in a dramatically different way thus far. Maybe that will change in the future (I think it will), but it's not here yet.
So why so much concern about automation?
I suspect it's because of something this study doesn't look at much (based on the summary): displacement. People are likely being displaced from jobs into other jobs by automation or the perception of automation, and some few of those exit the labor market rather than switch into new jobs. Further, those who do move to new jobs likely disprefer their new jobs: the new jobs require different skills, workers are less skilled at them immediately after switching, and due to that lack of initial skill the new jobs initially pay less than the old ones. This creates compelling evidence for the story that automation is "destroying" jobs even though the bigger picture makes clear that this isn't really happening, in particular because the destruction story ignores the contrary evidence of what happens after a displaced worker has been in a new job for a few years and has recovered to pre-displacement wage levels.
When someone asks me why I did or said something I usually lie because the truthful answer is "I don't know". I literally don't know why I make >99% of my decisions. I think through none of these decisions rationally. It's usually some mixture of gut instinct, intuition, cultural norms, common sense and my emotional state at the time.
I know some folks on LW are very scrupulous, or worry about whether they are sufficiently scrupulous. While I'm not saying there is no value in having an accurate view of reality (in fact I think it is quite useful!), this is also why, when I try to imagine myself being very scrupulous, it feels a bit silly: it all just seems like something I made up, and any correspondence with reality is the result of things far outside my control, including but not limited to how well I remember things, how well I create ideas in other people's minds using words, and how my brain perceives the accuracy of what I'm saying. That doesn't mean I totally give up on trying to say something true, but I also realize I know so little that my notion of true is very small.
You seem to be using the words "goal-directed" differently than the OP.
And in different ways throughout your comment.
That's a manifestation of my point: what it would mean for something to be a goal seems to be able to shift depending on what it is you think is an important feature of the thing that would have the goal.
I've gone through keeping my identity small and come out the other side, so this might be an interesting nuance on it.
KYIS is important. I think of it in terms of attachment. It's important not to become attached to (or need, in your language) the identified with thing. That's the path down which motivated thinking, defensiveness, and general suffering lie.
However, it's also important to project an identity. People get confused about how to interact with you if you don't fit cleanly into a role. To use a programming metaphor, projecting an identity is like documenting your API so people know what and how they can interact with you.
My own experience was that I made my identity so small and consequently projected so little identity that people didn't quite know what to make of me. I was getting labeled "eccentric" and "weird" a lot because I was confusing. So to help other people be less confused and improve my social interactions, I created a brand or identity to project outwards with my clothes, mannerisms, etc. that is closely based on who I naturally am as a person but also plays into schemas that other people have. The result is people have some clear sense of who I am, even though it's wrong, and it lets them interact with me in consistently positive ways, even if they aren't the maximally positive ways that would be possible if we spent the time to get to know each other deeply. I make my brand the closest Schelling point in the identity space of schemas that people have, and things fall out smoothly from there.
Maybe not the approach everyone will want to take, but if you find it frustrating that everyone thinks you are weird and doesn't know how to interact with you in positive ways, consider showing some more identity to them (even if it's not the real thing!) so that they can "know" you better. If you're afraid to do that because it's not authentic, consider in what way "being authentic" is something you identify with!
NB: I've not made a post about this point, but your thoughts made me think of it, so I'll bring it up here. Sorry if I left a comment elsewhere making this same point previously and I forgot about it. Also this is not really a direct response to your post, which I'm not explicitly agreeing or disagreeing with in this comment, but more a riff on the same ideas because you got me thinking about them.
I think much of the confusion around goals and goal directed behavior and what constitutes it and what doesn't lies in the fact that goals, as we are treating them here, are teleological, viz. they are defined in context of what we care about. Another way to say this is that goals are a way we anthropomorphize things, thinking of them as operating the same way we experience our own minds operating.
To see this, we can simply shift our perspective to think of anything as being goal directed. Is a twitching robot goal directed? Sure, if I created the robot to twitch, it's doing a great job of achieving its purpose. Is a bottle cap goal directed? Sure, it was created to keep stuff in, and it keeps doing a fine job of that. Conversely, am I goal directed? Maybe not: I just keep doing stuff and it's only after the fact that I can construct a story that says I was aiming to some goal. Is a paperclip maximizer goal directed? Maybe not: it just makes paperclips because it's programmed to and has no idea that that's what it's doing, no more than the bottle cap knows it's holding in liquid or the twitch robot knows it's twitching.
This doesn't mean goals are not important; I think goals matter a lot when we think about alignment because they are a construct that falls out of how humans make sense of the world and their own behavior, but they are interesting for that reason, not because they are a natural part of the world that exists prior to our creation of them in our ontologies, i.e. goals are a feature of the map, not the unmapped territory.
I maybe don't quite understand your first two questions. If you're asking "where does positive valence come from" my answer is "minimization of prediction error", keeping in mind I think of that as a fancy way to say "feedback signal indicating a control system is moving towards a setpoint". I forget how to translate that into terms of Friston's free energy (increasing it? decreasing it?) if you prefer that model, but the point being that valence is a fundamental thing the brain does to signal parts of itself to do more or less of something.
As to your second question, valence is absolutely shaped by evolution, so long as we hold the theory that all creatures with nerve cells have come to exist via evolutionary processes (maybe better to taboo "evolution" and say "differential reproduction with trait inheritance"). As to what effect evolution has had on valence, that seems a matter for evolutionary psychology and related studies of the evolutionary etiology of animal behavior.
In the case of Many-Worlds interpretations or parallel universes, the correct response is to be like Alice, and admit that multiple perspectives are equally admissible. (This is assuming that they truly are empirically indistinguishable.)
This is no worse than accepting that there might be multiple mathematical proofs of the Pythagorean theorem, some algebraic and some geometric, or than accepting that angles can be expressed in degrees or in radians. All are equally valid ways to think about the same problem, so use whatever you like.
This seems not quite right to me, in that I doubt we can draw this equivalence. In the case of mathematical proofs and the units with which to measure angles, we can be indifferent between the choices when our purpose (what we care about; our telos) is proving a statement true or having a measure of an angle, respectively. But if we care about the length of a proof or its assumptions (maybe we want a proof of a theorem that doesn't rely on the axiom of choice), or about which angle units a calculator supports or which units are elegant to work with, then there is a difference between these options that matters.
So it is with explanations. If our purpose is to make predictions about quantum effects, then a theory about how quantum mechanics works isn't important, only that the mathematical model predicts reality, and metaphysical questions are moot. But if our purpose is to understand what's going on beyond what can be predicted using quantum mechanics, then we care a lot about which interpretation of quantum mechanics is correct because it does make predictions about the thing we care about.
This kind of not-caring-because-it-works is only practical so long as it is pragmatic to a particular purpose. Perhaps many people should be more pragmatic, but that seems a separate issue, and there are many reasons why what is pragmatic for one purpose may not be for another, so I think your view is true but insufficient.
Here's another: AI being x-risky makes me the bad guy.
That is, if I'm an AI researcher and someone tells me that AI poses x-risks, I might react by seeing this as someone telling me I'm a bad person for working on something that makes the world worse. This is bad for me because I derive important parts of my sense of self from being an AI researcher: it's my profession, my source of income, my primary source of status, and a huge part of what makes my life meaningful to me. If what I am doing is bad or dangerous, that threatens to take much of that away (if I also want to think of myself as a good person, meaning I either have to stop doing AI work to avoid being bad or stop thinking of myself as good), and an easy solution to that is to dismiss the arguments.
This is more generally a kind of motivated cognition or rationalization, but I think it's worth considering a specific mechanism because it better points towards ways you might address the objection.
Sort of related to a couple points you already brought up (not in personal experience, outsiders not experts, science fiction), but worrying about AI x-risk is also weird, i.e. it's not a thing everyone else is worrying about, so you use some of your weirdness-points to publicly worry about it, and most people have very low weirdness budgets (because of not enough status to afford more weirdness, low psychological openness, etc.).
Sometimes we’re at a functional local maxima, but we’re not pointed in the right direction globally, and frankly speaking our lack of a high energy parameter is our saving grace – our inability to directly muck up our emotional landscape.
I've heard a similar story in meditation circles about why integration work is important: greater awakening enables greater agency/freedom-of-action, and without integration, virtue, and ethics (these are traditionally combined via the paramita of sila) that can be dangerous because it can let a person run off in personally dangerous or socially bad directions that they were previously only managing not to because they weren't more capable, essentially protecting themselves from themselves with their own failure.
Here, the optimal decisions would be the higher-order outputs which maximize higher-order utility. They are decisions about what to value or how to decide rather than about what to do.
What constitutes utility here, then? For example, some might say utility is grounded in happiness or meaning, in economics we often measure utility in money, and I've been thinking along the lines of grounding utility (through value) in minimization of prediction error. It's fine that you are concerned with higher-order processes (I'm assuming by that you mean processes about processes, like higher-order outputs is outputs about outputs, higher-order utility is utility about utility), and maybe you are primarily concerned with abstractions that let you ignore these details, but then it must still be that those abstractions can be embodied in specifics at some point or else they are abstractions that don't describe reality well. After all, meta-values/preferences/utility functions are still values/preferences/utility functions.
To capture rational values, we are trying to focus on the changes to values that flow out of satisfying one’s higher-order decision criteria. By unrelated distortions of value, I pretty much mean changes in value from any other causes, e.g. from noise, biases, or mere associations.
How do you distinguish whether something is a distortion or not? You point to some things that you consider distortions, but I'm still unclear on the criteria by which you know distortions from the rational values you are looking for. One person's bias may be another person's taste. I realize some of this may depend on how you identify higher-order processes, but even if that's the case we're still left with the question as it applies to those directly, i.e. is some particular higher-order decision criterion a distortion or rational?
In the code and outline I call the lack of distortion Agential Identity (similar to personal identity). I had previously tried to just extract the criteria out of the brain and directly operate on them. But now, I think the brain is sufficiently messy that we can only simulate many continuations and aggregate them. That opens up a lot of potential to stray far from the original state. This Agential Identity helps ensure we’re uncovering your dispositions rather than that of a stranger or a funhouse mirror distortion.
This seems strange to me, because much of what makes a person unique lies in their distortions (speaking loosely here), not in their lack. Normally when we think of distortions they are taking an agent away from a universal perfected norm, and that universal norm would ideally be the same for all agents if it weren't for distortions. What leads you to think there are some personal dispositions that are not distortions and not universal because they are caused by the shared rationality norm?
I tend to think of Hegel as primarily important for his contributions to the development of Western philosophy (so even if he was wrong on details he influenced and framed the work of many future philosophers by getting aspects of the framing right) and for his contributions to methodology (like standardizing the method of dialectic, which on one hand is "obvious" and people were doing it before Hegel, and on the other hand is mysterious and the work of experts until someone lays out what's going on).
A brain’s rational utility function is the utility function that would be arrived at by the brain’s decision algorithm if it were to make more optimal decisions while avoiding unrelated distortions of value.
By what mechanism do you think we can assess how unrelated and how much distortion of value is happening? Put another way, what are "values" in this model such that they are separate from the utility function, and how could you measure whether or not the utility function is better optimizing for those values?
Note that MAPLE is a young place, less than a decade old in its current form. So, much of it is "experimental." These ideas aren't time-tested. But my personal experience of them has been surprisingly positive, so far.
I think it's worth sharing that 3 of the ideas you brought up are, at least within zen, historically common to monastic practice, albeit changed in ways to better fit the context of MAPLE. You call them the care role, the ops role, and the schedule; I see them as analogues of the jisha, the jiki, and the schedule.
The jisha, in a zen monastery, is first and foremost the attendant of the abbot (caveat: some monasteries every teacher and high-ranking priest will have their own jisha). But in addition to this, the jisha is thought of as the "mother" of the sangha, with responsibilities to care for the monks, nuns, and guests, care for the sick, organize cleaning, and otherwise be supportive of the needs of people. This is similar to your care role in some ways, but MAPLE seems to have focused more on the care aspect and dropped the gendered-role aspects.
The jiki (also jikijitsu or jikido) is responsible for directing the movement of the students. They are the "father" to the jisha's "mother", serving as (possibly strict) disciplinarians to keep the monastery operating as intended by the abbot, enforcing rules and handing out punishments. This sounds similar to the Ops role, albeit probably with fewer slaps to the face and blows to the head.
The schedule is, well, the schedule. I expect MAPLE's schedule, though "young", is building on centuries of monastic schedule tradition while adding in new things. I think it's worth adding that the schedule is also there to support deep practice, because there's a very real way that having to make decisions can weaken samadhi, and having all decisions eliminated creates the space in which calm abiding can more easily arise.
What Dharma traditions in particular do you have in mind? I can't think of one I would describe as saying everyone has innate "moral" perfection, unless you sufficiently twist around the word "moral" that its use is confusing at best.
Story stats are my favorite feature of Medium. Let me tell you why.
I write primarily to impact others. Although I sometimes choose to do very little work to make myself understandable to anyone who is more than a few inferential steps behind me and then write out on a far frontier of thought, nonetheless my purpose remains sharing my ideas with others. If it weren't for that, I wouldn't bother to write much at all, and certainly not in the same way as I do when writing for others. Thus I care instrumentally a lot about being able to assess if I am having the desired impact so that I can improve in ways that might help serve my purposes.
LessWrong provides some good, high detail clues about impact: votes and comments. Comments on LW are great, and definitely better in quality and depth of engagement than what I find other places. Votes are also relatively useful here, caveat the weaknesses of LW voting I've talked about before. If I post something on LW and it gets lots of votes (up or down) or lots of comments, relative to what other posts receive, then I'm confident people have read what I wrote and I impacted them in some way, whether or not it was in the way I had hoped.
That's basically where story stats stop on LessWrong. Here's a screen shot of the info I get from Medium:
For each story you can see a few things here: views, reads, read ratio, and fans, which is basically likes. I also get an email every week telling me about the largest updates to my story stats, like how many additional views, reads, and fans a story had in the last week.
If I click the little "Details" link under a story name I get more stats: average read time, referral sources, internal vs. external views (external views are views on RSS, etc.), and even a list of "interests" associated with readers who read my story.

All of this is great. Each week I get a little positive reward letting me know what I did that worked, what didn't, and most importantly to me, how much people are engaging with things I wrote.
I get some of that here on LessWrong, but not all of it. Although I've bootstrapped myself now to a point where I'll keep writing even absent these motivational cues, I still find this info useful for understanding what things I wrote that people liked best or found most useful and what they found least useful. Some of that is mirrored here by things like votes, but it doesn't capture all of it.
Possibly related but with a slightly different angle, you may have missed my work on trying to formally specify the alignment problem, which is pointing to something similar but arrives at somewhat different results.
It's true that not all of online advertising does nothing. We should expect, if nothing else, online advertising to continue to serve the primary and original purpose of advertising, which is generating choice awareness, and certainly my own experience backs this up: I am aware of any number of products and services only because I saw ads for them on Facebook, Google search, SlateStarCodex, etc. To the extent that advertising helps people become aware of choices they otherwise would not have become aware of such that on the margin they may take that choice (since you make none of the choices you don't know how to make), it would seem to function successfully, assuming it can be had at a price low enough to produce positive return on investment.
However, my own experience in the industry suggests that most spend that goes beyond generating more than zero awareness is poorly spent. Much to the dismay of marketing departments, you can't usually spend your way through ads to growth. Other forms of marketing look better (content marketing can work really great and can be a win-win when done right).
I'm excited for the rest of this miniseries. I'm similarly interested in cybernetics and am sad it failed for what in hindsight seem to be obvious and unavoidable reasons (interdisciplinary & easily co-opted to justify bullshit). My own thinking has taken me in a direction convergent with cybernetics, as I've investigated a bit in the past.
Cool! I don't have time to look into this now, but I'm excited to see what you produce in this direction. As you know I'm pretty pessimistic that we can totally solve Goodhart effects, but I do expect we can mitigate them enough that for things other than superintelligent levels of optimization we can do better than we do now.
I think all of your reasons for how a human comes to have moral authority boil down to something like a belief that doing the things this authority says is expected to be good (to have positive valence, in my current working theory of values). This perhaps gives a way of reframing alignment as the problem of constructing an agent to whom you would give moral authority to decide for you, rather than, as we normally do, as constructing an agent that is value aligned.
I guess I'm a bit out of the loop on questions about how to define uncertainty, so I'm a bit confused about what position you are against or how this is different from what others do. That is, it seems to me like you are trying to fix a problem you perceive in the way people currently think about uncertainty, but I'm not sure what that problem is, so I can't even understand how this framing might fix it. I've been reading this sequence of posts thinking "yeah, sure, this all sounds reasonable" but also without really understanding the context for it. I know you did the post on anthropics, but even there it wasn't really that clear to me how this framing helps us over what is perhaps otherwise normally done, although perhaps that reflects my ignorance of existing arguments about what methods of anthropic reasoning are correct.
This seems quite right to me: in our minds, things are often confused and conflated that don't need to be, and as a result we act in ways that aren't what we think should be possible, and doing what we really want feels impossible because we don't know how to separate the thing we want from the thing we don't want. One framing for the mechanism underlying the processes that clear up these sorts of confusions, which I've been excited about lately, is memory reconsolidation.
I guess this is a matter of opinion on how much explanation makes something "untranslatable". For example, maybe it takes 1000 words to give enough context to adequately convey the meaning of a word with a very precise meaning in another language. Is this word "translatable"? In a certain sense no, because making sense of it required giving the person a lot of new context they didn't have before, beyond simple reference to concepts they already had. The other end of the spectrum, words that are literally impossible to explain, obviously doesn't exist, or else even speakers of the same language would be unable to convey their meaning to each other. So it seems to me fair to say some words are untranslatable, if by that we mean we are unable to provide a direct or simple translation, on the order of a phrase at most, that captures the original meaning of the word.
For one thing, these dynamics are already in place: the world is full of agents and more basic optimizing processes that are not aligned with broad human values—most individuals to a small degree, some strange individuals to a large degree, corporations, competitions, the dynamics of political processes.
I don't think of this as evidence that unaligned AI is not dangerous. Arguably we're already seeing bad effects from unaligned AI, such as effects on public discourse as a result of newsfeed algorithms. Further, anything that limits the impact of unaligned action now seems largely the result of existing agents being of relatively low or similar power. Even the most powerful actors in the world right now can't effectively control much of the world (e.g. no government has figured out how to eliminate dissent, no military how to stop terrorists, etc.). I expect things to look quite different if we develop an actor that is more powerful than a majority of all other actors combined, even if it develops into that power slowly because the steps along the way each seem individually worth the tradeoff.
But it isn’t obvious to me that by that point it isn’t sufficiently well aligned that we would recognize its future as a wondrous utopia, just not the very best wondrous utopia that we would have imagined if we had really carefully sat down and imagined utopias for thousands of years.
To our ancestors we would appear to live in a wondrous utopia (bountiful food, clean water, low disease, etc.), yet we still want to do better. I think there will be suffering so long as we are not at the global maximum and anyone realizes this.
I find this interesting, as it gives one of the better arguments I can recall for there being something positive at the heart of social justice, such that it isn't just one side trying to grab power from another to push a different set of norms. That's often what its dynamics look like to me in practice, whatever the intent of social justice advocates, and I find such battles uncompelling: why grant one group power rather than another, all else equal, if they will push for the things they want to the exclusion of those then out of power, just the same as those in power now do to those seeking to gain it?
I'm inclined to think there is no problem here because the belief that [Dave] has about being in a simulation is unfounded as it's exactly the same situation Dave finds himself in later when PAL takes route B. That is, taking route B then seems to not be evidence about being in a simulation as you suggest, even if PAL normally takes route A and is highly reliable, because it could just as easily be that Dave is seeing the result of PAL acting on a simulation involving [Dave] causing PAL to prefer route B (assuming there is only one level of simulation; if there's reason to believe there's more than one level we start to tip in favor of simulation).
I was initially intrigued to read this because it seemed like you were going to make an interesting case somewhere along the lines of "mathematics involves hermeneutics" because ultimately mathematics is done by humans using a (formal) language that they must interpret before they can generate more mathematics that follows the rules of the language. It seems to me you never quite got there, and instead stumbled towards some other point that's not totally clear to me. Forgive me if this is an uncharitable reading, but I read your point as being "look at all the cool stuff we can do because we use formal languages".
Pointing out that formal languages let us do cool stuff is something I agree with, although it feels a bit obvious. I suspect I'm mostly reacting to having hoped you'd make a stronger case for applying hermeneutic methods and having an attitude of interpretation when dealing with formal systems, since this is a point often ignored when people learn about formal systems by remaking what I might call the "naive positivist" mistake of thinking formal systems describe reality precisely, or put in LW terms, confuse the map for the territory.
Additionally, I found your proof of "There exist a Turing Machine that recognize membership in the language." somewhat inadequate given what you presented in the text. It's only thanks to already knowing some details about TMs that this proof is meaningful to me, and I think the uninitiated reader, whom you seem to be targeting, would have a hard time understanding it since you didn't explain the constraints of TMs. For example, an easy objection to raise to your proof, lacking a clear definition of TMs, would be "but won't it halt incorrectly if it sees a '2' or a '3' or a 'c' on the tape?"
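To make that objection concrete, here's a rough sketch of what a fully spelled-out version might look like. Everything here is invented for illustration (I'm using 0^n 1^n as a stand-in language since I don't know which language your post used): the machine is a transition table, and the important detail is that any (state, symbol) pair missing from the table, such as reading a '2' or a 'c', is an explicit reject rather than an undefined behavior.

```python
# Hypothetical transition-table Turing machine for the example language
# {0^n 1^n}. Anything not covered by delta -- e.g. a '2' or 'c' on the
# tape -- falls through to an explicit reject, so the machine cannot
# "halt incorrectly" on unexpected symbols.

BLANK = "_"

# delta maps (state, symbol) -> (new_state, write_symbol, head_move)
delta = {
    ("q0", "0"): ("q1", "X", +1),        # cross off a leading 0
    ("q0", "Y"): ("q3", "Y", +1),        # all 0s matched; verify trailing Ys
    ("q0", BLANK): ("accept", BLANK, 0), # empty string is in the language
    ("q1", "0"): ("q1", "0", +1),        # scan right over remaining 0s
    ("q1", "Y"): ("q1", "Y", +1),        # ... and over crossed-off 1s
    ("q1", "1"): ("q2", "Y", -1),        # cross off the matching 1
    ("q2", "0"): ("q2", "0", -1),        # scan back left
    ("q2", "Y"): ("q2", "Y", -1),
    ("q2", "X"): ("q0", "X", +1),        # back at the frontier; repeat
    ("q3", "Y"): ("q3", "Y", +1),
    ("q3", BLANK): ("accept", BLANK, 0),
}

def run(tape_str):
    """Simulate the machine on the input string; True iff it accepts."""
    tape = dict(enumerate(tape_str))
    state, head = "q0", 0
    while state != "accept":
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in delta:
            return False  # explicit reject on anything unexpected
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += move
    return True
```

Something at roughly this level of explicitness, even if only gestured at, would let the uninitiated reader see why the '2'-on-the-tape objection doesn't bite.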
I appreciate your clear writing style. If my comments seem harsh it's because you got my hopes up and then delivered something less than what you initially led me to expect.
Do on site notifications and email notifications come at the same cadence? For example, if I ask for daily notifications by email and on site, will I see them in both places only daily or will I still see them on site immediately?
I'd like to get daily emails and have immediate on site notifications, but I'm not sure if that's currently possible.
I agree with your arguments if we consider explicit forms of knowledge, such as episteme and doxa. I'm uncertain if they also apply to what we might call "implicit" knowledge like that of techne and gnosis, i.e. knowledge that isn't separable from the experience of it. There I think we can make a distinction between pure "is" that exists prior to conceptualization and "is from ought" arising only after such experiences are reified (via distinction/discrimination/judgement), and that distinction makes it so that we can only talk about knowledge of the "is from ought" form even if it is built over "is" knowledge that we can only point at indirectly.
Maybe. This is a very narrow definition of "enlightenment" in my opinion, as in Scott is claiming PNSE is enlightenment whereas I would say it's one small part of it. I think of it differently, as a combination of psychological development plus some changes to how the brain operates that seemingly includes PNSE but I'm not convinced that's the whole story.
Almost everything is "alive" or "conscious" because the only interesting property that separates "alive" from "dead" is whether or not a thing contains feedback processes (which, as a consequence, generate information and locally reduce entropy while globally increasing it).
I find this way of formalizing Goodhart weird. Is there a standard formalization of it, or is this your invention? I'll explain what I think is weird.
You define U and V such that you can calculate U - V to find W, but this appears to me to skip right past the most pernicious bit of Goodhart, which is that U is only knowable via a measurement (not necessarily a measure), such that I would say V = μ(U) for some "measuring" function μ:U→R, and the problem is that μ(U) is correlated with but different from U, since there may not even be a way to compare values of U directly.
To make it concrete with an example, suppose U is "beauty as defined by Gordon". We don't, at least as of yet, have a way to find U directly, and maybe we never will. So supposing we don't, if we want to answer questions like "would Gordon find this beautiful?" and "what painting would Gordon most like?" we need a measurement of U we can work with, as developed by, say, using IRL to discover a "beauty function" that describes U such that we could say how beautiful I would think something is. But we would be hard pressed to be precise about how far off the beauty function is from my sense of beauty, because we only have a very gross measure of the difference: compare how beautiful the beauty function and I judge some finite set of things to be (finite because I'm a bounded, embedded agent who is never going to get to see all things, even if the beauty function somehow could). And even as we are doing this we are still getting a measurement of my internal sense of beauty rather than my internal sense of beauty itself, because we are asking me to say how beautiful I think something is rather than directly observing my sense of beauty. This is much of why I expect that Goodhart is extremely robust.
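A toy simulation of this point (every function and number here is invented for illustration, not taken from your formalization): the optimizer only ever sees the proxy μ(x), which is correlated with but systematically different from the "true" value U(x), and optimizing the proxy hard drifts away from what U actually prefers.

```python
# Goodhart toy sketch: hill-climb on a biased measurement mu of a "true"
# value U, and watch the optimum land at mu's peak rather than U's.
import random

random.seed(0)

def U(x):
    # "True" value (an invented stand-in for, e.g., my sense of beauty):
    # peaks at x = 3.0.
    return -(x - 3.0) ** 2

def mu(x):
    # Measurement of U: correlated with U, but with a bias term that
    # rewards pushing x upward. Its peak is at x = 3.4, not 3.0.
    return -(x - 3.0) ** 2 + 0.8 * x

# Simple hill-climbing on the proxy mu, never on U itself.
best_x = 0.0
for _ in range(10_000):
    candidate = best_x + random.uniform(-0.1, 0.1)
    if mu(candidate) > mu(best_x):
        best_x = candidate

# The optimizer converges near mu's peak (3.4) and so overshoots U's
# peak (3.0): U(best_x) < U(3.0).
print(round(best_x, 2))
```

The point of the sketch is that nothing in the loop can detect the divergence, because the loop's only access to U is through μ, which is exactly the situation I'm describing with the beauty function.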
As of yet, no, although this brings up an interesting point, which is that I'm looking at this stuff to find a precise grounding because I don't think we can develop a plan that will work to our satisfaction without it. I realize lots of people disagree with me here, thinking that we need the method first and the value grounding will be worked out instrumentally by the method, but I dislike this because it makes the method hard to verify other than by observing what an AI produced by that method does, and this is a dangerous verification method due to the risk of a "treacherous" turn that isn't so much treacherous as it is the one that could have been predicted if we had bothered to have a solid theory of what the method we were using really implied in terms of the thing we cared about, and to know what the thing we cared about fundamentally was.
Also I suspect we will be able to think of our desired AI in terms of control systems and set points, because I think we can do this for everything that's "alive", although it may not be the most natural abstraction to use for its architecture.
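As a minimal sketch of what I mean by the control-systems-and-set-points framing (all the numbers and names here are made up): a system that acts so as to reduce the error between an observed value and its set point, the way a thermostat does, with the "goal" expressed entirely as that set point.

```python
# Minimal proportional control loop: the system's "goal" is just a set
# point, and its behavior is acting to shrink the error against it.

def control_step(observed, set_point, gain=0.5):
    """Output an adjustment proportional to the current error."""
    error = set_point - observed
    return gain * error

temp = 10.0        # current observed state (e.g. room temperature)
SET_POINT = 20.0   # the system's goal, expressed as a set point
for _ in range(50):
    temp += control_step(temp, SET_POINT)

print(round(temp, 2))  # converges to the set point: 20.0
```

The claim is just that something shaped like this loop may be a useful lens on any "alive" system, even if the AI's actual architecture doesn't decompose this way naturally.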
I read partial agency and myopia as a specific way the boundedness of embedded processes manifests its limitations, so it seems to me neither surprising that it exists nor surprising that there is an idealized "unbounded" form which the bounded form may aspire to but never achieve, due to the limitations created by being bounded and instantiated out of physical stuff rather than mathematics.
I realize there are a lot more details to the specific case you're considering, but I wonder if you'd agree it's part of this larger, general pattern: real things are limited by embeddedness in ways that make them less than their theoretical (albeit unachievable) ideal.
This seems to me to address the meta problem of consciousness rather than the hard problem of consciousness itself, since you seem to be more offering an etiology for the existence of agents that would care about the hard problem of consciousness rather than an etiology of qualia.
There is an interesting addition to this, I think, which is that if a goal of the utility function is to encourage exploration then it paradoxically needs to be extremely robust against being modified, even while it explores and possibly modifies all other goals. I could easily imagine an agent finding some mechanism for avoiding local maxima (exploration) important enough that it would lock it in, so that the one thing it cannot stop doing is exploring well enough to avoid getting trapped while it keeps looking for a global maximum.