some fragments:
What hunches do you currently have surrounding orthogonality, its truth or not, or things near it?
re: hard to know - it seems to me that we can't get a certifiably-going-to-be-good result from a CEV based ai solution unless we can make it certifiable that altruism is present. I think figuring out how to write down some form of what altruism is, especially altruism in contrast to being-a-pushover, is necessary to avoid issues - because even if any person considers themselves for CEV, how would they know they can trust their own behavior?
as far as I can tell humans should by default see themselves as having the same kind of alignment problem as AIs do, where amplification can potentially change what's happening in a way that corrupts thoughts which previously implemented values. can we find a CEV-grade alignment solution that solves the self-and-other alignment problems in humans as well, such that this CEV can be run on any arbitrary chunk of matter and discover its "true wants, needs, and hopes for the future"?
It might be good to have a suggestion that people not talk when it's not their turn
It might be good to explain the reason for the turn timer. Turns out we should have used the recommended one, imo
At the end of the game folks started talking about "I got a little extinction, not so much hell", "I got pretty much utopia", "I got Omelas". Describing how to normalize points would maybe be good
also, four people seemed like too many for the first round. maybe fewer? maybe just need more rounds to understand it?
Much of this seems quite plausible and matches my experience, but it seems worded slightly overconfidently. Like, maybe 5 to 10% at most. Lots of points where I went, "well, in your experience, you mean. And also, like, in mine. But I don't think either of us know enough to assert this is always how it goes." I've had issues in the past from taking posts like this a bit too seriously when someone sounded highly confident, so I'm just flagging - this seems like a good insight but I'd guess the model as described is less general than it sounds, and I couldn't tell you exactly how much less general off the top of my head.
But also, it's a pattern I've noticed as well, and have found to be incredibly useful to be aware of. So, weak upvoted.
I do think natural latents could have a significant role to play somehow in QACI-like setups, but it doesn't seem like they let you avoid formalizing, at least in the way you're talking about. It seems more interesting in terms of avoiding specifying a universal prior over possible worlds, if we can instead specify a somewhat less universal prior that bakes in assumptions about our world's known causal structure. it might help with getting a robust pointer to the start of the time snippet. I don't see how it helps avoid specifying "looping", or "time snippet", etc. natural latents seem to me to be primarily about the causal structure of our universe, and it's unclear what they even mean otherwise. it seems like our ability to talk about this concept is made up of a bunch of natural latents, and some of them are kind of messy and underspecified by the phrase, mainly relating to what the heck is a physics.
I like #2. On a similar thread: would be nice to have a separate section for pinned comments. I looked into pull requesting it at one point but it looks like it either isn't as trivial as I hoped, or I simply got lost in the code. I feel like folks having more affordance to say, "Contrary to its upvotes or recency, I think this is one of the most representative comments from me, and others seeing my page should see it" would be helpful - pinning does this already but it has ui drawbacks because it simply pushes recent comments out of the way and the pinned marker is quite small (hence why I edit my pinned comments to say they're pinned).
then you're not guaranteed that your AI gets anywhere at all; its knightian uncertainty might remain so immense that the AI keeps picking the null action all the time because some of its knightian hypotheses still say that anything else is a bad idea.
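To gesture at the shape of that worry, here's a minimal sketch using a generic worst-case decision rule over a hypothesis set \mathcal{H} (my gloss, not IB's actual machinery):

\[ a^\ast \;=\; \arg\max_{a \in A}\; \min_{\Theta \in \mathcal{H}}\; \mathbb{E}_{\Theta}\!\left[U(a)\right] \]

If every non-null action has some hypothesis in \mathcal{H} that rates it catastrophically, while the null action is merely mediocre under all of them, this rule picks null forever; the question is whether \mathcal{H} ever shrinks enough for anything else to win.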
@Tamsin, The Knightian uncertainty in IB relates to limits on what hypotheses you can possibly find/write down, not - if I understand so far - to an adversary. The adversary stuff is afaict mostly there to make the proofs work.
@all, anyway the big issue I (still) have with this is still that, if the user is trying to give these statements, how bad is it if they screw up in some nonobvious fundamental way? Does this prior instantly collapse if the user is a kinda bad predictor on some important subset of logic or makes only statements that aren't particularly connected to some part of the statements needed to describe reality?
I'd be particularly interested to see Garrabrant, Kosoy, Diffractor, Gurkenglas comment on where they think this works or doesn't
Interested! I would pay at cost if that was available. I'll be asking about which posts are relevant to a question, misc philosophy questions, asking Claude to challenge me, etc. Primarily interested in whether I can ask for brevity via a custom prompt in the system prompt.
Wei Dai and Tsvi BT posts have convinced me I need to understand how one does philosophy significantly better. Anyone who thinks they know how to learn philosophy, I'm interested to hear your takes on how to do that. I get the sense that perhaps reading philosophy books is not the best way to learn to do philosophy.
I may edit this comment with links as I find them. Can't reply much right now though.
This looks really interesting, I'd be eager to try an online version if it's not too much trouble to make?
[edit 7d later: I was too angry here. I think there's some version of this that can be defended, but it's not the version I wrote while angry. edit 2mo later: It's pretty close, but my policy suggestions need refinement and I need to justify why I think the connection to past eugenics still exists.]
they're welcome to argue in favor of genetic enhancement, as long as it happens after birth. yes, I know it's orders of magnitude harder. But knowing anything about an abortable child should be illegal. I'm a big fan of abortion, as long as it is blind to who is being aborted, because as soon as it depends on the child's characteristics, it's reaching into the distribution and killing some probabilistic people - without that, it's not killing the probabilistic person. Another way to put this is, life begins at information leakage. anything else permits information leakage that imposes the agency of the parents. That's not acceptable. Argue for augmenting a living organism all you want! we'll need at least mild superintelligence to pull it off, unfortunately, but it's absolutely permitted by physics. But your attempt to suppress the societal immune response to this thing in particular is unreasonable. Since you've already driven away the people who would say this in so many words by not banning it sooner, it is my responsibility to point this out. Almost everyone I know who I didn't meet from lesswrong hates this website, and they specifically cite acceptance of eugenics as why. You're already silencing a crowd. I will not shut up.
[edit 7d later: I was too angry here. I think there's some version of this that can be defended, but it's not the version I wrote while angry. edit 2mo later: It's pretty close, but my policy suggestions need refinement and I need to justify why I think the connection to past eugenics still exists.]
If abortion should be "blind to the child's attributes", then you should put your money where your mouth is and be willing to take care of disabled children who will never have a future or be able to take care of themselves. If you won't do that, then you should concede. Your dogma will not create a sustainable, long-lasting civilization.
Solve these things in already-developed organisms. It's orders of magnitude harder, yes, but it's necessary for it to be morally acceptable. Your brother should get to go into cryonics and be revived once we can heal him. Failing that, it's just the risk you take reproducing. Or at least, that will remain my perspective, since I will keep repeating, from my dad's perspective, I would have been a defect worth eliminating. There is nothing you can say to convince me, and I will take every option available to me to prevent your success at this moral atrocity. Sorry about the suffering of ancient earth, but let's fix it in a way that produces outcomes worth creating rather than hellscapes of conformity.
[edit 7d later: I was too angry here. I think there's some version of this that can be defended, but it's not the version I wrote while angry. edit 2mo later: It's pretty close, but my policy suggestions need refinement and I need to justify why I think the connection to past eugenics still exists.]
it kind of is, actually. Perhaps you could try not wearing your "pretense of neutrality" belief as attire and actually consider whether what you just said is acceptable. I'd rather say, "if something can reasonably be described as eugenics, that's an automatic failure of acceptability, and you have to argue why your thing is not eugenics in order for it to be welcome anywhere." bodily autonomy means against your parents, too.
edit: i was ratelimited for this bullshit. fuck this website
I have a user script that lets me copy the post into the Claude ui. No need to pay another service.
[edit 7d later: I was too angry here. I think there's some version of this that can be defended, but it's not the version I wrote while angry. edit 2mo later: It's pretty close, but my policy suggestions need refinement and I need to justify why I think the connection to past eugenics still exists.]
I sure didn't. I'm surprised you expected to be downvoted. This shit typically gets upvoted here. My comment being angry and nonspecific will likely be in the deep negatives.
That said, I gave an argument sketch. People use this to eliminate people like me and my friends. Got a counter, or is my elevated emotion not worth taking seriously?
[edit 7d later: I was too angry here. I think there's some version of this that can be defended, but it's not the version I wrote while angry. edit 2mo later: It's pretty close, but my policy suggestions need refinement and I need to justify why I think the connection to past eugenics still exists.]
Get this Nazi shit off this fucking website already for fuck's sake. Yeah yeah discourse norms, I'm supposed to be nice and be specific. But actually fuck off. I'm tired of all of y'all that upvote posts like these to 100 karma. If you can't separate your shit from eugenics then your shit is bad. Someone else can make the reasoned argument, I've had enough.
If we can't get tech advanced enough to become shapeshifters, modify already grown bodies, we don't get to mess with genes. I will never support tools that let people select children by any characteristic, they would have been used against me and so many of my friends. Wars have been fought over this, and if you try it, they will be again. Abortion should be blind to the child's attributes.
Mmm. If someone provides real examples and gets downvoted for them, isn't that stronger evidence of issues here? I think you are actually just wrong about this claim. Also, if you in particular provide examples of where you've been downvoted I expect it will be ones that carry my downvote because you were being an ass, rather than because of any factual claim they contain. (And I expect to get downvoted for saying so!)
j is likely to not have that problem, unless they also are systematically simply being an ass and calling it an opinion. If you describe ideas and don't do this shit Shankar and I are doing, I think you probably won't be downvoted.
moral utilitarianism is a more specific thing than utility maximization, I think?
@j-5 looking forward to your reply here. as you can see, it's likely you'll be upvoted for expanding, so I hope it's not too spooky to reply.
@Shankar Sivarajan, why did you downvote the request for examples?
I've noticed the sanity waterline has in fact been going up on other websites over the past ten years. I suspect
prediction market ethos,
YouTube science education,
things like Brilliant,
and general rationalist idea leakage,
are to blame
Build enough nuclear power plants and we could boil the oceans with current tech, yeah? They're a significant fraction of fusion output iiuc?
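For scale, a rough back-of-envelope (round-number assumptions for ocean mass, seawater heat capacity, and today's fission fleet; just a sanity check, not a claim about the fusion comparison):

```python
# Back-of-envelope: energy to heat and vaporize Earth's oceans vs. today's nuclear output.
# All figures are rough, order-of-magnitude assumptions.
ocean_mass_kg = 1.4e21        # approximate total mass of Earth's oceans
specific_heat = 4.0e3         # J/(kg*K), roughly, for seawater
delta_T = 96                  # K, heating from ~4 C to 100 C
latent_heat = 2.26e6          # J/kg, vaporization of water

energy_needed = ocean_mass_kg * (specific_heat * delta_T + latent_heat)  # ~3.7e27 J

world_fission_power = 4e11    # W, roughly today's global fleet (~400 GWe)
seconds_per_year = 3.15e7
years_at_current_fleet = energy_needed / (world_fission_power * seconds_per_year)
print(f"{energy_needed:.1e} J needed; ~{years_at_current_fleet:.0e} years at today's fleet")
```

So "enough plants" here means something like eight or nine orders of magnitude more capacity than the current fleet to do it within a year.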
I mean, in situations that are generating outrage because of having influence on an outcome you do in fact care about, how do you select the information that you need in order to have what influence on that outcome is possible for you, without that overwhelming you with emotional bypass of your reasoning or breaking your ability to estimate how much influence is possible to have? To put this another way - is there a possible outrage-avoiding behavior vaguely like this one, but where if everyone took on the behavior, it would make things better instead of making one into a rock with "I cooperate and forfeit this challenge" written on it?
In other words - all of the above, I guess? but always through the lens of treating outrage as information for more integrative processes, likely information to be transformed into a form that doesn't get integrated as "self" necessarily, rather than letting outrage be the mental leader and override your own perspectives. Because if you care about the thing the outrage is about, even if you agree with it, you probably especially don't want to accept the outrage at face value, since it will degrade your response to the situation. Short-term strategic thinking about how to influence a big world pattern usually allows you to have a pretty large positive impact, especially if "you" is a copyable, self-cooperating behavior. Outrage cascades are a copyable, partially-self-cooperating behavior, but typically collapse complexity of thought. When something's urgent I wouldn't want to push a social context towards dismissing it, but I'd want to push the social context towards calm, constructive, positive responses to the urgency.
You gave the example of someone getting a bunch of power and folks being concerned about this, but ending up manipulated by the outrage cascades. Your suggestions seem to lean towards simply avoiding circles which are highly charged with opinion about who or what has power; I'm most interested in versions of this advice the highly charged circles could adopt to be healthier and have more constructive responses. It does seem like your suggestions aren't too far from this, hence why it seems at all productive to ask.
Idk, those are some words. Low coherence on quite what it is I'm asking for, but you seem on a good track with this, I guess? maybe we come up with something interesting as a result of this question.
re: cloud react - yeah fair
Is there a way to do this that stays alert and active, but avoids outrage?
Well, I'm not going to be in london any time soon.
I think you should think about how your work generalizes between the topics, and try to make it possible for alignment researchers to take as much as they can from it; this is because I expect software pandemics are going to become increasingly similar to wetware pandemics, and so significant conceptual parts of defenses for either will generalize somewhat. That said, I also think that the stronger form of the alignment problem is likely to be useful to you directly on your work anyway; if detecting pandemics in any way involves ML, you're going to run into adversarial examples, and will quickly be facing the same collapsed set of problems (what objective do I train for? how well did it work, can an adversarial optimization process eg evolution or malicious bioengineers break this? what side effects will my system have if deployed?) as anyone who tries to deploy ML. If you're instead not using ML, I just think your system won't work very well and you're being unambitious at your primary goal, because serious bioengineered dangers are likely to involve present-day ML bio tools by the time they're a major issue.
But I think you in particular are doing something sufficiently important that it's quite plausible to me that you're correct. This is very unusual and I wouldn't say it to many people. (normally I'd just not bother directly saying they should switch to working on alignment because of not wanting to waste their time when I'm confident they won't be worth my time to try to spin up, and I instead just make noise about the problem vaguely in people's vicinity and let them decide to jump on it if desired.)
I like the idea of an "FDA but not awful" for AI alignment. However, the FDA is pretty severely captured and inefficient, and has pretty bad performance compared to a randomly selected WHO-listed authority; though that may not be the right comparison set, as you'd ideally want to filter to only those in countries with significant drug development research industries, I've heard the FDA is bad even by that standard. this is mentioned in the article but at the end and only briefly.
So, as soon as I saw the song name I looked it up, and I had no idea what the heck it was about until I returned and kept reading your comment. I tried getting claude to expand on it. every single time it recognized the incest themes. None of the first messages recognized suicide, but many of the second messages did, when I asked what the character singing this is thinking/intending. But I haven't found leading questions sufficient to get it to bring up suicide first. wait, nope! found a prompt where it's now consistently bringing up suicide. I had to lead it pretty hard, but I think this prompt won't make it bring up suicide for songs that don't imply it... yup, tried it on a bunch of different songs, the interpretations all match mine closely, now including negative parts. Just gotta explain why you wanna know so bad, so I think people having relationship issues won't get totally useless advice from claude. Definitely hard to get the rose colored glasses off, though, yeah.
Relying on markets for alignment implicitly assumes that economic incentives will naturally lead to aligned behavior. But we know from human societies that markets alone don't guarantee particular outcomes - real-world markets often produce unintended negative consequences at scale. Attempts at preventing this exist, but none are even close to leak-free, strong enough to contain the most extreme malicious agents the system instantiates or prevent them from breaking the system's properties; in other words, markets have particularly severe inner alignment issues, especially compared to backprop. Markets are fundamentally driven by the pursuit of defined rewards or currencies, so in such a system, how do we ensure that the currency being optimized for truly captures what we care about - how do you ensure that you're only paying for good things, in a deep way, rather than things that have "good" printed on the box?
Also, using markets in the context of alignment is nothing new, MIRI has been talking about it for years; the agent foundations group has many serious open problems related to it. If you want to make progress on making something like this a good idea, you're going to need to do theory work, because it can be confidently known now that the design you proposed is catastrophically misaligned and will exhibit all the ordinary failures markets have now.
ah my bad, my attention missed the link! that does in fact answer my whole question, and if I hadn't missed it I'd have had nothing to ask :)
That's what I already believed, but OP seems to disagree, so I'm trying to understand what they mean
query rephrase: taboo both "algorithmic ontology" and "physicalist ontology". describe how each of them constructs math to describe things in the world, and how that math differs. That is, if you're saying you have an ontology, presumably this means you have some math and some words describing how the math relates to reality. I'm interested in a comparison of that math and those words; so far you're saying things about a thing I don't really understand as being separate from physicalism. Why can't you just see yourself as multiple physical objects and still have a physicalist ontology? what makes these things different in some, any, math, as opposed to only being a difference in how the math connects to reality?
[edit: pinned to profile]
"Hard" problem
That seems to rely on answering the "hard problem of consciousness" (or as I prefer, "problem of first-person something-rather-than-nothing") with an answer like, "the integrated awareness is what gets instantiated by metaphysics".
That seems weird as heck to me. It makes more sense for the first-person-something-rather-than-nothing question to be answered by "the individual perspectives of causal nodes (interacting particles' wavefunctions, or whatever else interacts in spatially local ways) in the universe's equations are what gets Instantiated™ As Real® by metaphysics".
(by metaphysics here I just mean ~that-which-can-exist, or ~the-root-node-of-all-possibility; eg this is the thing solomonoff induction tries to model by assuming the root-node-of-all-possibility contains only halting programs, or tegmark 4 tries to model as some mumble mumble blurrier version of solomonoff or something (I don't quite grok tegmark 4); I mean the root node of the entire multiverse of all things which existed at "the beginning", the most origin-y origin. the thing where, when we're surprised there's something rather than nothing, we're surprised that this thing isn't just an empty set.)
If we assume my belief about how to resolve this philosophical confusion is correct, then we cannot construct a description of a hypothetical universe that could have been among those truly instantiated as physically real in the multiverse, and yet also have this property where the hard-problem "first-person-something-rather-than-nothing" can disappear over some timesteps but not others. Instead, everything humans appear to have preferences about relating to death becomes about the so called easy problem, the question of why the many first-person-something-rather-than-nothings of the particles of our brain are able to sustain an integrated awareness. Perhaps that integrated awareness comes and goes, eg with sleep! It seems to me to be what all interesting research on consciousness is about. But I think that either, a new first-person-something-rather-than-nothing-sense-of-Consciousness is allocated to all the particles of the whole universe in every infinitesimal time slice that the universe in question's true laws permit; or, that first-person-something-rather-than-nothing is conserved over time. So I don't worry too much about losing the hard-problem consciousness, as I generally believe it's just "being made of physical stuff which Actually Exists in a privileged sense".
The thing is, this answer to the hard problem of consciousness has kind of weird results relating to eating food. Because it means eating food is a form of uploading! you transfer your chemical processes to a new chunk of matter, and a previous chunk of matter is aggregated as waste product. That waste product was previously part of you, and if every particle has a discrete first-person-something-rather-than-nothing which is conserved, then when you eat food you are "waking up" previously sleeping matter, and the waste matter goes to sleep, forgetting near everything about you into thermal noise!
"Easy" problem
So there's still an interesting problem to resolve - and in fact what I've said resolves almost nothing; it only answers camp #2, providing what I hope is an argument for why they should become primarily interested in camp #1. In camp #1 terms, we can discuss information-theoretic or causal properties - whether the information or causal chains running through those first-person-perspective-units, ie particles, make them information-theoretically "aware of" or "knowing" things about their environment; we can ask causal questions - eg, "is my red your red?" can instead be "assume my red is your red if there is no experiment which can distinguish them, so can we find such an experiment?" - in which case, I don't worry about losing even the camp #1 form of selfhood-consciousness from sleep, because my brain is overwhelmingly unchanged from sleep, and stopping activations and whole-brain synchronization of state doesn't mean it can't be restarted.
It's still possible that every point in spacetime has a separate first-person-something-rather-than-nothing-"consciousness"/"existence", in which case maybe actually even causally identical shapes of particles/physical stuff in my brain which are my neurons representing "a perception of red in the center of my visual field in the past 100ms" are a different qualia than the same ones at a different infinitesimal timestep, or are different qualia than if the exact same shape of particles occurred in your brain. But it seems even less possible to get traction on that metaphysical question than on the question of the origin of first-person-something-rather-than-nothing, and since I don't know of there being any great answers to something-rather-than-nothing, I figure we probably won't ever be able to know. (Also, our neurons for red are, in fact, slightly different. I expect the practical difference is small.)
But that either doesn't resolve or at least partially backs OP's point, about timesteps/timeslices already potentially being different selves in some strong sense, due to ~lack of causal access across time, or so. Since the thing I'm proposing also says non-interacting particles in equilibrium have an inactive-yet-still-real first-person-something-rather-than-nothing, then even rocks or whatever you're on top of right now or your keyboard keys carry the bare-fact-of-existence, and so my preference for not dying can't be about the particles making me up continuing to exist - they cannot be destroyed, thanks to conservation laws of the universe, only rearranged - and my preference is instead about the integrated awareness of all of these particles, where they are shaped and moving in patterns which are working together in a synchronized, evolution-refined, self-regenerating dance we call "being alive". And so it's perfectly true that the matter that makes me up can implement any preference about what successor shapes are valid.
Unwantable preferences?
On the other hand, to disagree with OP a bit, I think there's more objective truth to the matter about what humans prefer than that. Evolution should create very robust preferences for some kinds of thing, such as having some sort of successor state which is still able to maintain autopoiesis. I think it's actually so highly evolutionarily unfit to not want that that it's almost unwantable for an evolved being to not want there to be some informationally related autopoietic patterns continuing in the future.
Eg, consider suicide - even suicidal people would be horrified by the idea that all humans would die if they died, and I suspect that suicide is an (incredibly high cost, please avoid it if at all possible!) adaptation that has been preserved because there are very rare cases where it can increase the inclusive fitness of a group the organism arose from (but I generally believe it almost never is a best strategy, so if anyone reads this who is thinking about it, please be aware I think it's a terribly high cost way to solve whatever problem makes it come to mind, and there are almost certainly tractable better options - poke me if a nerd like me can ever give useful input); but I bring it up because it means, while you can maybe consider the rest of humanity or life on earth to be not sufficiently "you" in an information theory sense that dying suddenly becomes fine, it seems to me to be at least one important reason that suicide is ever acceptable to anyone at all; if they knew they were the last organism, I feel like even a maximally suicidal person would want to stick it out for as long as possible, because if all other life forms are dead they'd want to preserve the last gasp of the legacy of life? idk.
But yeah, the only constraints on what you want are what physics permits matter to encode and what you already want. You probably can't just decide to want any old thing, because you already want something different than that. Other than that objection, I think I basically agree with OP.
how does a physicalist ontology differ from an algorithmic ontology in terms of the math?
What's the epistemic backing behind this claim, how much data, what kind? Did you do it, how's it gone? How many others do you know of dropping out and did it go well or poorly?
I suggest any time during the intersection of london and eastern time, ie london evening. (I'm not in either, but that intersection is a reliable win for organizing small international online social events, in my experience)
Interested. Not in london.
Well I'd put it the other way round. I don't know what phenomenal consciousness is unless it just means the bare fact of existence. I currently think the thing people call phenomenal consciousness is just "having realityfluid".
Agreed about its implementation of awareness, as opposed to being unaware but still existing. What about its implementation of existing, as opposed to nonexistence?
I think it would help if we taboo consciousness and instead talk about existence ("the hard problem"/"first-person-ness"/"camp #2", maybe also "realityfluid") and awareness ("the easy problem"/"conscious-of-what"/"camp #1", maybe also "algorithm"). I agree with much of your reasoning, though I think the case that can be made for most cells having microqualia awareness seems very strong to me; whether there are larger integrated bubbles of awareness seems more suspect.
Edit: someone strong upvoted, then someone else strong downvoted. Votes are not very helpful; can you elaborate in a sentence or two or use phrase reacts?
This seems vulnerable to the typical self fulfilling prophecy stuff.
Unrolling the typical issue:
It seems likely to me that in situations where multiple self fulfilling prophecies are possible, features relating to what will happen have more structure than features relating to what did. So, this seems to my intuition like it might end up that this theoretical framework allows for making a very reliable past-focused-only lie detector, which has some generalization to detecting future lies but is much further from robust about the future. Eg, see FixDT and work upstream of it for discussions of things like "the way you pick actions is that you decide what will be true, because you will believe it".
You could get into situations where the AI isn't lying when it says things it legitimately believes about, eg, its interlocutor; as a trivial example (though this would only work if the statement was in fact true about the future!) the ai might say, "since you're really depressed and only barely managed to talk to me today, it seems like you won't be very productive or have much impact after today." and the interlocutor can't really help but believe it because it's only true because hearing it gets them down and makes them less productive. or alternately, "you won't regret talking to me more" and then you end up wasting a lot of time talking to the AI. or various other such things. In other words, it doesn't ban mesaoptimizers, and mesaoptimizers can, if sufficiently calibrated, believe things that are true-because-they-will-cause-them; it could just as well have been an affirmation, if the AI could have fully-consistent belief that saying an affirmation would in fact make the person more productive.
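A toy sketch of that last point (the function, numbers, and statements below are all made up purely for illustration): both a discouraging statement and an affirmation can be "honest" in the sense of being true conditional on being said, i.e. each is its own fixed point.

```python
# Toy model: the listener's productivity depends on what the predictor says,
# and the predictor only asserts things it expects to come true.

def productivity_given(statement: str) -> float:
    """Hypothetical listener: hearing the statement changes their behavior."""
    if statement == "you won't get much done today":
        return 0.1   # demoralized, barely works
    if statement == "you'll get a solid amount done today":
        return 0.7   # encouraged, works
    return 0.4       # baseline if nothing is said

def is_self_fulfilling(statement: str, threshold: float = 0.4) -> bool:
    """The statement counts as true iff the outcome it causes matches its claim."""
    outcome = productivity_given(statement)
    claims_low = "won't" in statement
    return (outcome < threshold) == claims_low

for s in ["you won't get much done today", "you'll get a solid amount done today"]:
    print(s, "->", is_self_fulfilling(s))
# Both print True: a calibrated predictor could honestly assert either one.
```

Which fixed point gets selected isn't pinned down by truthfulness alone, which is the sense in which a past-only lie detector seems much easier to make robust than a future-facing one.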
I'd guess "make sense" means something higher-bar, harder-to-achieve to you than it does to most people. Are there other ways to say "make sense" that pin down the necessary level of making sense you mean to refer to?
With quantum, you can go further: even if you're instantiated in a universe that starts out perfectly rotationally symmetric, as long as we're not modifying physics to guarantee the symmetry is preserved over time, then the vastly overwhelming majority of quantum behavior will not have spurious correlation between rotationally symmetric positions, meaning that the rotational symmetry will almost instantly be lost at the nanoscale, and then it's just some chaos away from having macroscopic effects. You can think about this under the many worlds interpretation: there's a vanishingly small fragment of the wavefunction where every time an atom collides with another atom in one position, the quantum random portion of the collision's outcome is exactly matched in the rotationally symmetric position. From the outside, this means that even picking which classical slice of the wavefunction is still symmetric requires you to specifically encode the concept of symmetry in your description of what slice to take. (I don't quite know the quantum math necessary to write this down.) From the inside, it means an overwhelmingly high probability of nanoscale asymmetry, and from there it's only chaos away from divergence.
I expect it would still take at least a few seconds for the chaos in your brains to diverge enough to have any detectable difference in motor behavior. If we were instead running the same trial twice rather than putting two instances in the same room across from each other, I'd expect it would take slightly longer to become a macroscopic difference, because the separate trials don't get to notice each others' difference in behavior to speed up the divergence.
Importantly, assuming many worlds, then the wavefunction will stay symmetric - for every classical slice of the wavefunction ("world") where person on side A has taken on mindstate A and side B has mindstate B, there's another world in which person on side A has mindstate B and vice versa. This is because if many worlds is correct then it's a "forall possible outcomes", and from the outside, it's not "randomized" until you pick a slice of this forall. It's unclear whether many worlds is physically true, so this might not be the case.
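One way to state that symmetry claim more precisely (a sketch, assuming the swap/mirror operator R commutes with the Hamiltonian and the initial state is exactly symmetric):

\[ R\,|\Psi(0)\rangle = |\Psi(0)\rangle, \;\; [R,H]=0 \;\;\Rightarrow\;\; R\,|\Psi(t)\rangle = R\,e^{-iHt/\hbar}|\Psi(0)\rangle = e^{-iHt/\hbar}R\,|\Psi(0)\rangle = |\Psi(t)\rangle \]

So when you decompose |\Psi(t)\rangle into quasi-classical branches, R maps each branch to another branch of equal amplitude: the branches come in mirror-image pairs (unless a branch happens to be symmetric on its own), even though almost no individual branch stays symmetric.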
[edit: pinned to profile]
I don't think "self-deception" is a satisfying answer to why this happens, as if to claim that you just need to realize that you're secretly causal decision theory inside. It seems to me that this does demonstrate a mismatch, and failing to notice the mismatch is an error, but people who want that better world need not give up on it just because there's a mismatch. I even agree that things are often optimized to make people look good. But I don't think it's correct to jump to "and therefore, people cannot objectively care about each other in ways that are not advantageous to their own personal fitness". I think there's a failure of communication, where the perspective he criticizes is broken according to its own values, and part of how it's broken involves self-deception, but saying that and calling it a day misses most of the interesting patterns in why someone who wants a better world feels drawn to the ideas involved and feels the current organizational designs are importantly broken.
I feel similarly about OP. Like, agree maybe it's insurance - but, are you sure we're using the decision theory we want to be here?
another quote from the article you linked:
To be clear, the point is not that people are Machiavellian psychopaths underneath the confabulations and self-narratives they develop. Humans have prosocial instincts, empathy, and an intuitive sense of fairness. The point is rather that these likeable features are inevitably limited, and self-serving motives—for prestige, power, and resources—often play a bigger role in our behaviour than we are eager to admit.
...or approve of? this seems more like a failure to implement one's own values! I feel more like the "real me" is the one who Actually Cooperates Because I Care, and the present day me who fails at that does so because of failing to be sufficiently self-and-other-interpretable to be able to demand I do it reliably (but like, this is from a sort of FDT-ish perspective, where when we consider changing this, we're considering changing all people who would have a similar-to-me thought about this at once to be slightly less discooperative-in-fact). Getting to a point where we can have a better OSGT moral equilibrium (in the world where things weren't about to go really crazy from AI) would have to be an incremental deescalation of inner vs outer behavior mismatch, but I feel like we ought to be able to move that way in principle, and it seems to me that I endorse the side of this mismatch that this article calls self-deceptive. Yeah, it's hard to care about everyone, and when the only thing that gives heavy training pressure to do so is an adversarial evaluation game, it's pretty easy to be misaligned. But I think that's bad actually, and smoothly, non-abruptly moving to an evaluation environment where matching internal vs external is possible seems like in the non-AI world it would sure be pretty nice!
(edit: at very least in the humans-only scenario, I claim much of the hard part of that is doing this more-transparency-and-prosociality-demanding-environment in a way that doesn't cause a bunch of negative spurious demands, and/or/via just moving the discooperativeness to the choice of what demands become popular. I claim that people currently taking issue with attempts at using increased pressure to create this equilibrium are often noticing ways the more-prosociality-demanding-memes didn't sufficiently self-reflect to avoid making what are actually in some way just bad demands by more-prosocial-memes' own standards.)
maybe even in the AI world; it just like, might take a lot longer to do this for humans than we have time for. but maybe it's needed to solve the problem, idk. getting into the more speculative parts of the point I wanna make here.
Can you expand on this concisely inline? I agree with the comment you're replying to strongly and think it has been one of miri's biggest weaknesses in the past decade that they didn't build the fortitude to be able to read existing work without becoming confused by its irrelevance. But I also think your and Abram's research direction intuitions seem like some of the most important in the field right now, alongside wentworth. I'd like to understand what it is that has held you back from speed reading external work for hunch seeding for so long. To me, it seems like solving from scratch is best done not from scratch, if that makes sense. Don't defer to what you read.
I think you are also expressing high emotive confidence in your comments. You are presenting a case, and your expressed confidence is slightly lower, but still elevated.
Links have high attrition rate, cf ratio of people overcoming a trivial inconvenience. Post your arguments compressed inline to get more eyeballs on them.
The structure of past data is preserved when creating a compressor. Future data is only constrained by smoothness.
The canary string was supposed to be a magic opt out button.
I'd guess your work is in the blended category where the people currently anti-ai are being incorrect by their own lights, and your work did not in fact risk the thing they are trying to protect. I'd guess purely ai generated art will remain unpopular even with the periphery, but high-human-artistry ai art will become more appreciated by the central groups as it becomes more apparent that that doesn't compete the way they thought it did. I also doubt it will displace human-first art, as that's going to stay mildly harder to create with ai as long as there's a culture of using ai in ways that are distinct from human art, and therefore lower availability of AI designed specifically to imitate the always-subtly-shifting most recent human-artist-made-by-hand style. It's already possible to imitate, but it would require different architectures.
I don't think AIs are able to produce the product that is being sold at all, because the product contains the causal chain that produced it. This is a preference that naturally generates AI-resistant jobs, as long as the people who hold it can themselves pay for the things. I mean, you might be able to value drift the people involved away if you experience-machine at them hard enough, but as long as what they're doing is trying to ensure that a physically extant biological being who is a furry made the thing - rather than that the thing has high sensory intensity at furry-related features - it seems like it would actually resist that.
Now, like, you propose preference falsification. If that's the case, I'm wrong. I currently think I'm right at like, 70% or so.
Ehh, maybe fair ish, but, you said that NYT is left at all; I think that's wrong, that they're a similar kind of liberal to eg wapo, and that it's a misleading image the NYT cultivates as being "The Left's Voice" in order to damage discourse in ways that seem on brand for the kind of behavior by the NYT described in OP. I find this behavior on NYT's part frustrating, hence it coming up. In the single dimensional model it's a centrist outlet, similar to wapo; It's not "near left" at all, again, it seems to me that even near left progressivism is primarily damaged by NYT. But I don't think the single dimensional model accurately represents the differences of opinion between these opinion clusters anyway - left is a different direction than liberal, is a different direction than libertarian, is a different direction than right, is a different direction than authoritarian - many of these opinion clusters, if you dot product them with each other, are mildly positive or negative, but there are subsets of them that are intensely resonant. And I think accepting the NYT's frame on itself is letting an adversarial actor mess up the layout of your game space, such that the groups that ought to be realizing their opinions can resonate well are not. Letting views be overcoupled due to using too-low-dimensional models to represent them seems like a core way memetic attacks on coordination between disagreeing groups work.