Just as explicit games have rules, normal conversation has all kinds of implicit expectations.
If someone asks me a question, I should answer.
"No rules" means there's no rule saying that you have to answer.
In fact, if someone says that they are curious about my reaction to something, it’s totally fine for me to just say “okay” and then change the topic to something else that feels more interesting to me.
That said, it is also okay for the other person to get annoyed by that and to say so, which they might or might not do.
So then is circling just the voicing of the ever-present fact that you're free to violate social expectations if you're willing to annoy people?
I understand and agree with the stuff about "when you don't take social expectations as binding that's simultaneously freeing and difficult", but that's already the choice you have. If circling doesn't include any rules against trying to enforce social expectations in the usual way, then it seems like circling can't change anything. Is it just the effects of making this fact common knowledge?
Here the two definitions of rationality diverge: believing the truth is now at odds with doing what works. It will obviously work better to believe what your friends and neighbors believe, so you won't be in arguments with them and they'll support you more when you need it.
This is only true if you can't figure out how to handle disagreements.
It will often be better to have wrong beliefs if it keeps you from acting on the even wronger belief that you must argue with everyone who disagrees. It's better yet to believe the truth on both fronts, and simply prioritize getting along when it is more important to get along.
If we had infinite cognitive capacity, we could just believe the truth while claiming to believe whatever works. And we could keep track of all of the evidence instead of picking and choosing which to attend to.
It's more fundamental than that. The way you pick up a glass of water is by predicting that you will pick up a glass of water, and acting so as to minimize that prediction error. Motivated cognition is how we make things true, and we can't get rid of it except by ceasing to act on the environment -- and therefore ceasing to exist.
Motivated cognition causes no epistemic problem so long as we can realize our predictions. The tricky part comes when we struggle to fit the world to our beliefs. In these cases, there's an apparent tension between "believing the truth" and "working towards what we want". This is where all that sports stuff of "you have to believe you can win!" comes from, and the tendency to lose motivation once we realize we're not going to succeed.
If we try to predict that we will win the contest despite being down 6-0 and clearly less competent, we will either have to engage in willful delusion, pretending we're not less competent or other such things (which makes it harder to navigate reality, because we're using a false map and can't act so as to minimize the consequences of our flaws), or else we will just fail to predict success altogether and be unable to even try.
If instead, we don't predict anything about whether we will win or lose, and instead predict that we will play to the absolute best of our abilities, then we can find out whether we win or lose, and give ourselves room to be pleasantly surprised.
The solution isn't to "believe the truth" because the truth has not been set yet. The solution is to pay attention to our anticipated prediction errors, and shift to finer grain modeling when the expected error justifies the cost of thinking harder.
The only remedy I know of is to cultivate enjoying being wrong. This involves giving up a good bit of one's self-concept as a highly intelligent individual. This gets easier if you remember that everyone else is also doing their thinking with a monkey brain that can barely chin itself on rationality.
If you stop predicting "I am a highly intelligent individual, so I'm not wrong!", then you get to find out if you're a highly intelligent individual, as well as all of the things that may provide evidence in that direction (i.e. being wrong about things). This much is a subset of the solution I offer.
The next part is a bit trickier because of the question of what "cultivate enjoying being wrong" means, and how exactly you go about making sure you enjoy a fundamentally bad and unpleasant thing (not saying this is impossible, my two little girls are excited to get their flu shots today).
One way to attempt this is to predict "I am the kind of person who enjoys being wrong, because that means I get to learn [which puts me above the monkeys that can't even do this]", which is an improvement. If you do that, then you get to learn more things you're wrong about.... except when you're wrong about how much you enjoy being wrong -- which is certainly going to become a thing, when it matters to you most.
On top of that, the fact that it feels like "giving up" something and that it gets easier when you remember the grading curve suggests more vulnerabilities to motivated thinking, because there's still a potential truth being avoided ("I'm dumb on the scale that matters") and because switching to a model which yields strictly better results feels like losing something.
So far as I can tell, the common line that bear spray is more effective than firearms is based on an atrociously bad reading of the (limited) science, which is disavowed by the author of the studies. In short, successfully spraying a bear is more effective at driving off curious bears than simply having a firearm is at stopping charging bears, but when you're comparing apples to apples then firearms are much more effective.
Here's a pretty good overview: https://www.outsideonline.com/2401248/does-bear-spray-work. I haven't put a ton of work into verifying what he's claiming here, but it does match with the other data I've seen and I haven't seen anyone be nearly as careful and reach the opposite conclusion.
I'm the person JenniferRM mentioned. I'm also a physics guy, and got into studying/practicing hypnosis in ~2010/2011. I kinda moved on from "hypnosis" and drifted up the abstraction ladder, but I'm still working on similar things and on tying them together.
Anyway, here are my thoughts.
Suppose I really want her to be spinning clockwise in my mind. What might I do?
What worked for me is to focus on the foot alone and ignore the broader context so that I had a "clean slate" without "confirmatory experience" blocking my desired conclusion. When looking at the foot alone I experience it as oscillating rather than rotating (which I guess it technically is), and from there I can "release" it into whichever spin I intend by just kinda imagining that this is what's going on.
On the one hand, shifting intuitive models is surprisingly hard! You can’t necessarily just want to have a particular intuitive model, and voluntarily make that happen.
I actually disagree with this. It certainly seems hard, but the difficulty is largely illusory and pretty much disappears once you stop trying to walk through the wall and notice the front door.
The problem is that "wanting to have a particular model" isn't the thing that matters. You can want to have a particular model all you want, and you can even think the model is true all you want, but you're still talking about the statement itself not about the reality to which the statement refers. Even if you convince someone that their fear is irrational and they'd be better off not being scared, you've still only convinced them that their fear is irrational and they'd be better off not being scared. If you want to convince them that they are safe -- and therefore change their fear response itself -- then you need to convince them that they're safe. It's the difference between looking at yourself from the third person and judging whether your beliefs are correct or not, vs looking at the world from the first person and seeing what is there. If you want to change the third person perspective, then you can look at which models are desirable and why. If you want to change the first person models themselves, you have to look to the world and see what's there.
This doesn't really work with the spinning dancer because "Which way is the dancer spinning?" doesn't have an answer, but this is an artificial issue which doesn't exist in the real world. You still have to figure out "Is this safe enough to be worth doing?" and that's not always trivial, but the problem of "How do I change this irrational fear?" (for example) is. The answer is "By attending to the question of whether it is actually safe".
I don't deny that there's "skill" to it, but most of the skill IME is a meta skill of knowing what to even aim for rather than aiming well. Once you start attending to "Is it safe enough?", then when the answer is actually obvious the intuitive models just change. I can give a whole bunch of examples of this if you want, where people were stuck unable to change their responses and the problem just melts away with this redirection. Even stuff that you'd think would be resistant to change like physical pain can change essentially instantly. I've had it take as little as a single word.
Again we see that the subject is made to feel that his body is out of control, and becomes subject to a high-status person. Some hypnotists sit you down, ask you to stare upwards into their eyes and suggest that your eyelids are wanting to close—which works because looking upwards is tiring, and because staring up into a high-status person’s eyes makes you feel inferior.
This isn't exactly wrong, but I want to push back on the implication that this is the central or most important thing here.
The central thing, IMO, is a willingness to try on another person's worldview even though it clashes with your own. It doesn't require "inferiority"/"high status"/"control" except in the extremely minimal sense that they might know something important that you don't, and that seeing it for yourself might change your behavior. That alone will get you inhibition of all the normal stuff and an automatic (albeit tentative) acceptance of worldview-dissonant perspectives (e.g. name amnesia). It helps if the person has reason to respect and trust you which is kinda like "high status", but not really because it can just as easily happen with people on equal social standing in neutral contexts.
Similarly, hypnosis has very little to do with sleep and eye fatigue/closure is not the important part of eye contact. The important part of eye contact is that it's incredibly communicative. You can convey with eye contact things which you can't convey with words. "I see you". "Seeing you doesn't cause conflict in me". "I see you seeing me see you" and so on, to name a few. All the things you need to communicate to show someone that your perspective is safe and worthy of experiencing are best communicated with the eyes. And perhaps equally important it is a bid for attention, by holding your own.
So far, this isn’t a trance; I’m just describing a common social dynamic. Specifically, if I’m not in a hypnotic trance, the sequence of thoughts in the above might look like a three-step process:
[...]
i.e., in my intuitive model, first, the hypnotist exercises his free will with the intention of me standing; second, I (my homunculus) exercise my own free will with the intention of standing; and third, I actually stand. In this conceptualization, it’s my own free will / vitalistic force / wanting (§3.3.4) that causes me to stand. So this is not a trance.
It's important to note that while this self reflective narrative is indeed different in the way you describe, the underlying truth often is not. In the hypnosis literature this is known as "cold control theory", because it's the same control without the usual Higher Order Thoughts (HOT).
In "common social dynamics" we explain it as "I chose to", but what is actually happening a lot of the time is the speaker is exercising their free will through your body, and you're not objecting because it matches your narrative. The steps aren't actually in series, and you didn't choose to do it so much as you chose to not decline to do it.
These "higher order thoughts" do change some things, but turn out to be relatively unimportant and the better hypnotists usually don't bother too much with them and instead just address the object level. This is also why you get hypnotists writing books subtitled "there's no such thing as hypnosis" and stuff like that.
The short version is: If I have a tune in my head, then I’m very unlikely to simultaneously recall a memory of a different tune. Likewise, if I’m angry right now, then I’m less likely to recall past memories where I felt happy and forgiving, and vice-versa.
As far as I can tell, there are several different things going on with amnesia. I agree that this is one of them, and I'm not sure if I've seen anyone else notice this, so it's cool to see someone point it out.
The "null hypothesis", though, any time it comes to hypnosis is that it's all just response to suggestion. You "know" that being hypnotized involves amnesia, and you believe you're hypnotized, so you experience what you expect. There's an academic hypnosis researcher I talk to sometimes who doesn't even believe "hypnotic trance" is real in any fundamental sense and thinks that all the signs of trance are the result of suggestion.
I don't believe suggestion is all that's going on, but it really is sufficient for amnesia. The answer to Yudkowsky's old question of "Do we believe everything we're told?" is indeed "Yes" -- if we don't preemptively push it away or actively remember to unbelieve later. Back when I was working this stuff out I did a fun experiment where I'd come up with an excuse to get people to not preemptively reject what I was about to say, then I'd suggest amnesia for this conversation and that they'd laugh when I scratch my nose, and then I'd distract them so that the suggestion could take effect before they had a chance to unbelieve it. The excuse was something like "I know this is ridiculous so I don't expect you to believe it, but hear me out and let me know if you understand" -- which is tricky because they think the fact that we "agreed" that they won't believe it means they actually aren't believing it when they say "I understand", even though the full statement is "I understand [that I will laugh when you scratch your nose and have no idea why]". They still had awareness that this belief is wrong and would therefore act to stop themselves from acting on it, which is why the unexpected distraction was necessary in order to get their mind off of it long enough for it to work.
If someone's only option for dealing with a hostile telepath is self-deception, and then you come in and punish them for using it, thou art a dick.
Like, do you think it helps the abused mothers I named if you punish them somehow for not acknowledging their partners' abuse? Does it even help the social circle around them?
If that's their only option, and the hostility in your telepathy is antisocial, then yes. In some cases though, people do have other options and their self-deception is offensive, so hostile telepathy is pro-social.
For example, it would probably help those mothers if the men knew to anticipate punishment for not acknowledging their abuse of their partners. I bet at least one of those abusive husbands/boyfriends will give his side of the story that's a bit more favorable than "I'm a bad guy, lol", and that it will start to fall apart when pressed. In those cases, he'll have to choose between admitting wrongdoing or playing dumb, and people often do their best to play really dumb. The self-deception there is a ploy to steal someone else's second box, so fuck that guy.
I think the right response is to ignore the "self" part of the deception and treat it like any other deception. If it's okay to lie to the Nazis about hiding Jews, then it's okay to deceive yourself into believing it too. If we're going to make it against the law to lie under oath, then making it legal so long as they lie to themselves too is only going to increase the antisocial deception.
The reason I trust research in physics in general is that it doesn't end with publishing a paper. It often ends with building machines that depend on that research being right.
We don't just "trust the science" that light is a wave; we use microwave ovens at home.
Well said. I'm gonna have to steal that.
Therefore, in a world where we all do power poses all the time, and if you forget to do them, you will predictably fail the exam...
...well, actually that could just be a placebo effect.
Yeah, "Can I fail my exam" is a bad test, because when the test is "can I fail" then it's easy for the theory to be "wrong in the right way". GPS is a good test of GR because you just can't do it without a better understanding of spacetime so it has to at least get something right even if it's not the full picture. When you actually use the resulting technology in your day to day life and get results you couldn't have gotten before, then it almost doesn't matter what the scientific literature says, because "I would feel sorry for the good Lord. The theory is correct.".
There are psychological equivalents of this, which rest on doing things that are simply beyond the abilities of people who lack this understanding. The "NLP fast phobia cure" is a perfect example of this, and I can provide citations if anyone is interested. I really get a kick out of the predictable arguments between those who "trust the science" but don't understand it, and those who actually do it on a regular basis.
(Something like seeing a black cat on your way to exam, freaking out about it, and failing to pay full attention to the exam.) Damn!
This reminds me of an amusing anecdote.
I had a weird experience once where I got my ankle sprained pretty bad and found myself simultaneously indignantly deciding that my ankle wasn't going to swell and also thinking I was crazy for feeling like swelling was a thing I could control -- and it didn't swell. I told my friend about this experience, and while she was skeptical and thought it sounded crazy, she tried it anyway and her next several injuries didn't swell.
Eventually she casually mentioned to someone "Nah, my broken thumb isn't going to swell because I decided not to", and the person she was talking to responded as if she had said something else because his brain just couldn't register what she actually said as a real possibility. She then got all self conscious about it and was kinda unintentionally gaslighted into feeling like she was crazy for thinking she could do that, and her thumb swelled up.
I had to call her and remind her "No, you don't give up and expect it to swell because it 'sounds crazy', you intend for it to not swell anyway and find out whether it is something you can control or not". The swelling went back down most of the way after that, though not to the same degree as in the previous cases where the injury never swelled in the first place.
Can you come up with a better way of doing Psychology research?
Yes. More emphasis on concrete useful results, less emphasis on trying to find simple correlations in complex situations.
For example, "Do power poses work?". They did studies like this one where they tell people to hold a pose for five minutes while preparing for a fake job interview, and then found that the pretend employers pretended to hire them more often in the "power pose" condition. Even assuming there's a real effect where those students from that university actually impress those judges more when they pose powerfully ahead of time... does that really imply that power posing will help other people get real jobs and keep them past the first day?
That's like studying "Are car brakes really necessary?" by setting up a short track and seeing if the people who run the "red light" progress towards their destination quicker. Contrast that with studying the cars and driving behaviors that win races, coming up with your theories, and testing them by trying to actually win races. You'll find out very quickly if your "brakes aren't needed" hypothesis is a scientific breakthrough or foolishly naive.
Instead of studying "Does CBT work?", study the results of individual therapists, see if you can figure out what the more successful ones are doing differently than the less successful ones, and see if you can use what you learn to increase the effectiveness of your own therapy or the therapy of your students. If the answer turns out to be "The successful therapists all power pose pre-session, then perform textbook CBT" and that allows you to make better therapists, great. If it's something else, then you get to focus on the things that actually show up in the data.
The results should speak for themselves. If they don't, and you aren't keeping in very close contact with real world results, then it's super easy to go astray with internal feedback loops because the loop that matters isn't closed.
Claim: memeticity in a scientific field is mostly determined, not by the most competent researchers in the field, but instead by roughly-median researchers. [...] Sure, the most competent people in the field may recognize the problems, but the median researchers don’t, and in aggregate it’s mostly the median researchers who spread the memes.
This assumes the median researchers can't recognize who the competent researchers are, or otherwise don't look to them as thought leaders.
I'm not arguing that this isn't often the case, just that it isn't always the case. In engineering, if you're more competent than everyone else, you can make cooler shit. If you're a median engineer trying to figure out which memes to take on and spread, you're going to be drawn to the work of the more competent engineers because it is visibly and obviously better.
In fields where distinguishing between bad research and good research has to be done by knowing how to do good research, rather than "does it fly or does it crash", then the problem you describe is much more difficult to avoid. I argue that the difference between the fields which replicate and those which don't is as much about the legibility of the end product as it is about the quality of the median researcher.
There's no norm saying you can't be ignorant of stats and read, or even post about things not requiring an understanding of stats, but there's still a critical mass of people who do understand the topic well enough to enforce norms against actively contributing with that illiteracy. (E.g. how do you expect it to go over if someone makes a post claiming that p=0.05 means that there's a 95% chance that the hypothesis is true?)
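For what it's worth, here's a minimal sketch of why that claim doesn't follow. The base rate and power numbers below are made up purely for illustration, not taken from any real field; the point is just that how often "significant" results are true depends on those numbers, not on alpha alone.

```python
# Hypothetical numbers, purely to illustrate why "p = 0.05" does not
# translate into "95% chance the hypothesis is true".
base_rate = 0.10  # assumed fraction of tested hypotheses that are actually true
power = 0.80      # assumed P(significant result | hypothesis true)
alpha = 0.05      # P(significant result | hypothesis false)

# Bayes' rule: probability the hypothesis is true given a significant result.
p_true_given_sig = (power * base_rate) / (power * base_rate + alpha * (1 - base_rate))
print(round(p_true_given_sig, 2))  # 0.64 -- nowhere near 0.95
```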
Taking it a step further, I'd say my household "has norms which basically require everyone to speak English", but that doesn't mean the little one is quite there yet or that we're gonna boot her for not already meeting the bar. It just means that she has to work hard to learn how to talk if she wants to be part of what's going on.
LessWrong feels like that to me, in that I would feel comfortable posting about things which require statistical literacy to understand, knowing that engagement which fails to meet that bar will be downvoted, rather than my post getting downvoted for expecting to find a statistically literate audience here.
I think this is correct as a conditional statement, but I don't think one can deduce the unconditional implication that attempting to price some externalities in domains where many externalities are difficult to price is generally bad.
It's not "attempting to price some externalities where many are difficult to price is generally bad", it's "attempting to price some externalities where the difficult to price externalities on the other side is bad". Sometimes the difficulty of pricing them means it's hard to know which side they primarily lie on, but not necessarily.
The direction of legible/illegible externalities might be uncorrelated on average, but that doesn't mean that ignoring the bigger piece of the pie isn't costly. If I offer "I'll pay you twenty dollars, and then make up some rumors about you which may or may not be true and may greatly help or greatly harm your social standing", you don't think "Well, the difficult part to price is a wash, but twenty dollars is twenty dollars".
you can just directly pay the person who stops the shooting,
You still need a body.
Sure, you can give people like Elisjsha Dicken a bunch of money, but that's because he actually blasted someone. If we want to pay him $1M per life he saved though, how much do we pay him? We can't simply go to the morgue and count how many people aren't there. We have to start making assumptions, modeling the system, and paying out based on our best guesses of what might have happened in what we think to be the relevant hypothetical. Which could totally work here, to be clear, but it's still a potentially imperfect attempt to price the illegible and it's not a coincidence that this was left out of the initial analysis that I'm responding to.
But what about the guy who stopped a shooting before it began, simply by walking around looking like the kind of guy that would stop a spree killer before he accomplished much? What about the good role models in the potential shooter's life that led him onto the right track and stopped a shooting before it was ever planned? This could be ten times as important and you wouldn't even know without a lot of very careful analysis. And even then you could be mistaken, and good luck creating enough of a consensus on your program to pay out what you believe to be the appropriate amount to the right people who have no concrete evidence to stand on. It's just not gonna work.
I don't agree that most of the benefits of AI are likely to be illegible. I expect plenty of them to take the form of new consumer products that were not available before, for example.
Sure, there'll be a lot of new consumer products and other legible stuff, but how are you estimating the amount of illegible stuff and determining it to be smaller? That's the stuff that by definition is going to be harder to recognize, so you can't just say "all of the stuff I recognize is legible, therefore legible>>illegible".
For example, what's the probability that AI changes the outcome of future elections and political trajectory, is it a good or bad change, and what is the dollar value of that compared to the dollar value of ChatGPT?
I think my main point would be that Coase's theorem is great for profitable actions with externalities, but doesn't really work for punishment/elimination of non-monetary-incented actions where the cost is very hard to calculate.
This brings up another important point which is that a lot of externalities are impossible to calculate, and therefore such approaches end up fixating on the part that seems calculable without even accounting for (or even noticing) the incalculable part. If the calculable externalities happen to be opposed to larger incalculable externalities, then you can end up worse off than if you had never tried.
As applied to the gun externality question, you could theoretically offer a huge payday to the gun shop that sold the firearm used to stop a spree shooting in progress, but you still need a body to count before paying out. It's really hard to measure the number of murders which didn't happen because the guns you sold deterred the attacks. And if we accept the pro 2A arguments that the real advantage of an armed populace is that it prevents tyranny, that's even harder to put a real number on.
I think this applies well to AI, because absent a scenario where gray goo rearranges everyone into paperclips (in which case everyone pays with their life anyway), a lot of the benefits and harms are likely to be illegible. If AI chatbots end up swaying the next election, what is the dollar value we need to stick on someone? How do we know if it's even positive or negative, or if it even happened? If we latch onto the one measurable thing, that might not help.
The frustrating thing about the discussion about the origins is that people seldom show recognition of the priorities here, and all get lost in the weeds.
You can get n layers deep into the details, and if the bottom is at n+1 you're fucked. To give an example I see people talking about with this debate, "The lab was working on doing gain of function to coronaviruses just like this!" sounds pretty damning but "actually the grant was denied, do you think they'd be working on it in secret after they were denied funding?" completely reverses it. Then after the debate, "Actually, labs frequently write grant proposals for work they've already done, and frequently are years behind in publishing" reverses it again. Even if there's an odd number of remaining counters, the debate doesn't demonstrate it. If you're not really really careful about this stuff, it's very easy to get lost and not realize where you've overextended on shaky ground.
Scott talks about how Saar is much more careful about these "out of model" possibilities and feels ripped off because his opponent wasn't, but at least judging from Scott's summary it doesn't appear he really hammered on what the issue is here and how to address it.
Elsewhere in the comments here Saar is criticized for failing to fact check the dead cat thing, and I think that's a good example of the issue here. It's not that any individual thing is too difficult to fact check; it's that when all the evidence is pointing in one direction (so far as you can tell), you don't really have a reason to fact check every little thing that makes total sense, so of course you're likely not to do it. If someone argues that clay bricks weigh less than an ounce, you're going to weigh the first brick you see to prove them wrong, and you're not going to break it open to confirm that it's not secretly filled with something other than clay. And if it turns out it is, that doesn't actually matter because your belief didn't hinge on this particular brick being clay in the first place.
If a lot of your predictions turn out to be based on false presuppositions, this might be an issue. If it turns out the trend you based your perspective on just isn't there, then yeah, that's a problem. But if that's not actually the evidence that formed your beliefs, and they're just tentative predictions that aren't required by your belief under question, then it means much less. Doubly so if we're at "there exists a seemingly compelling counterargument" and not "we've gotten to the bottom of this, and there are no more seemingly compelling counter-counterarguments".
So Saar didn't check if the grant was actually approved. And Peter didn't check if labs sometimes do the work before writing grant proposals. Or they did, and it didn't come through in the debate. And Saar missed the cat thing. Peter did better on this game of "whack-a-mole" of arguments than Saar did, and more than I expected, but what is it worth? Truth certainly makes this easier, but so does preparation and debate skill, so I'm not really sure how much to update here.
What I want to see, more than "who can paint an excessively detailed story that doesn't really matter and have it stand up to surface level scrutiny better", is people focusing on the actual cruxes underlying their views. Forget the myriad of implications n steps down the road which we don't have the ability to fully map out and verify: what are the first few things we can actually know, and what can we learn from them by themselves? If we're talking about a controversial "relationship guru", postpone discussions of whether clips were "taken out of context" and what context might be necessary until we settle whether this person is on their first marriage or fifth. If we're wondering if a suspect is guilty of murder, don't even bother looking into the credibility of the witness until you've settled the question of whether the DNA matches.
If there appears to be a novel coronavirus outbreak right outside a lab studying novel coronaviruses, is that actually the case? Do we even need to look at anything else, and can looking at anything else even change the answer?
To exaggerate the point to highlight the issue, if there were unambiguously a million wet markets that are all equivalent, and one lab, and the outbreak were to happen right between the lab and the nearest wet market, you're done. It doesn't matter how much you think the virus "doesn't look engineered" because you can't get to a million to one that way. Even if you somehow manage to make what you think is a 1000:1 case, then a) if your analysis is sound, it still most likely came from the lab, and b) if you conclude otherwise, either your analysis there or the million-to-one starting premise is flawed. And if we're looking for a flaw in our analyses, it's going to be a lot easier to find flaws in something relatively concrete like "there are a million wet markets just like this one" than whatever is going into arguing that it "looks natural".
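For what it's worth, the odds arithmetic in that exaggerated hypothetical can be spelled out directly. The numbers below are the deliberately made-up ones from the paragraph above, not real estimates of anything; it's just one way to formalize the point.

```python
# Odds-form Bayes with the deliberately exaggerated numbers from the hypothetical above.
prior_odds_lab = 1.0              # start roughly indifferent between "lab" and "wet market" origin
location_factor = 1_000_000       # outbreak lands next to the one lab rather than near one of a
                                  # million equivalent wet markets: ~10^6 : 1 toward the lab
looks_natural_factor = 1 / 1_000  # the "doesn't look engineered" case, generously scored
                                  # 1000 : 1 toward natural origin

posterior_odds_lab = prior_odds_lab * location_factor * looks_natural_factor
print(posterior_odds_lab)  # 1000.0 -- still a thousand to one in favor of the lab
```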
So I really wish they'd sit down and hammer out the most significant and easiest to verify bits first. How many equally risky wet markets are there? How many labs? What is the quantitative strength of the 30,000 foot view "It looks like an outbreak of chocolatey goodness in Hershey Pennsylvania"? What does it actually take to have arguments that contain leaks to this degree, and can we realistically demonstrate that here?
The difference between what I strive for (and would advocate) and "epistemic learned helplessness" is that it's not helpless. I do trust myself to figure out the answers to these kinds of things when I need to -- or at least, to be able to come to a perspective that is worth contending with.
The solution I'm pointing at is simply humility. If you pretend that you know things you don't know, you're setting yourself up for failure. If you don't wanna say "I dunno, maybe" and can't say "Definitely not, and here's why" (or "That's irrelevant and here's why" or "Probably not, and here's why I suspect this despite not having dived into the details"), then you were committing arrogance by getting into a "debate" in the first place.
Easier said than done, of course.
I think "subject specific knowledge is helpful in distinguishing between bullshit and non-bullshit claims." is pretty clear on its own, and if you want to add an example it'd be sufficient to do something simple and vague like "If someone cites scientific studies you haven't had time to read, it can sound like they've actually done their research. Except sometimes when you do this you'll find that the study doesn't actually support their claim".
"How to formulate a rebuttal" sounds like a very different thing, depending on what your social goals are with the rebuttal.
I think I'm starting to realize the dilemma I'm in.
Yeah, you're kinda stuck between "That's too obvious of a problem for me to fall into!" and "I don't see a problem here! I don't believe you!". I'd personally err on the side of the obvious, while highlighting why the examples I'm picking are so obvious.
I could bring out the factual evidence and analyze it if you like, but I don't think that was your intention
Yeah, I think that'd require a pretty big conversation and I already agree with the point you're trying to use it to make.
I did get feedback warning that the Ramaswamy example was quite distracting (my beta reader recommended flat eartherism or anti-vaxxing instead). In hindsight it may have been a better choice, but I'm not too familiar with geology or medicine, so I didn't think I could do the proper rebuttal justice.
My response to your Ramaswamy example was to skip ahead without reading it to see if you would conclude with "My counterarguments were bullshit, did you catch it?".
After going back and skimming a bit, it's still not clear to me that they're not.
The uninformed judge cannot tell him from someone with a genuine understanding of geopolitics.
The thing is, this applies to you as well. Looking at this bit, for example:
What about Ukraine? Ukrainians have died in the hundreds of thousands to defend their country. Civil society has mobilized for a total war. Zelensky retains overwhelming popular support, and by and large the populace is committed to a long war.
Is this the picture of a people about to give up? I think not.
This sure sounds like something a bullshit debater would say. Hundreds of thousands of people dying doesn't really mean a country isn't about to give up. Maybe it's the reason they are about to give up; there's always a line, and who's to say it isn't in the hundreds of thousands? Zelensky having popular support does seem to support your point, and I could go check primary sources on that, but even if I did, your point about "selecting the right facts and omitting others" still stands, and there's no easy way to find out if you're full of shit here or not.
So it's kinda weird to see it presented as if we're supposed to take your arguments at face value... in a piece purportedly teaching us to defend against the dark art of bullshit. It's not clear to me how this section helps even if we do take it at face value. Okay, so Ramaswamy said something you disagree with, and you might even be right and maybe his thoughts don't hold up to scrutiny? But even if so, that doesn't mean he's "using dark arts" any more than he just doesn't think things through well enough to get to the right answer, and I don't see what that teaches us about how to avoid BS besides "Don't trust Ramaswamy".
To be clear, this isn't at all "your post sucks, feel bad". It's partly genuine curiosity about where you were trying to go with that part, and mostly that you seem to genuinely appreciate feedback.
My own answer to "how to defend against bullshit" is to notice when I don't know enough on the object level to be able to know for sure when arguments are misleading, and in those cases refrain from pretending that I know more than I do. In order to determine who to take how seriously, I track how much people are able to engage with other worldviews, and which worldviews hold up and don't require avoidance techniques in order to preserve the worldview.
The frequency explanation doesn't really work, because men do sometimes get excess compliments and it doesn't actually become annoying; it's just background. Also, when women give men the kind of compliments that men tend to give women, it can be quite unwanted even when infrequent.
The common thing, which you both gesture at, is whether it's genuinely a compliment or simply a bid for sexual attention, borne out of neediness. The validation given by a compliment is of questionable legitimacy when paired with some sort of tug for reciprocation, and it's simply much easier to have this kind of social interaction when sexual desire is off the table the way it is between same sex groups of presumably straight individuals.
For example, say you're a man who has gotten into working out and you're visiting your friend whom you haven't seen in a while. If your friend goes wide eyed, saying "Wow, you look good. Have you been working out?" and starts feeling your muscles, that's a compliment because it's not too hard for your friend to pull off "no homo". He's not trying to get in your pants. If that friend's new girlfriend were to do the exact same thing, she'd have to pull off "no hetero" for it to not get awkward, and while that's doable it's definitely significantly harder. If she's been wanting an open relationship and he hasn't, it gets that much harder to take it as "just a compliment" and this doesn't have to be a recurring issue in order for it to be quite uncomfortable to receive that compliment. As a result, unless their relationship is unusually secure she's less likely to compliment you than he is -- and when she does she's going to be a lot more restrained than he can be.
The question, to me, is to what extent people are trying to "be sexy for their homies" because society has a semi-intentional way of doing division of labor to allow formation of social hierarchies without having to go directly through the mess of sexual desires, and to what extent people are simply using their homies as a proxy for what the opposite sex is into and getting things wrong because they're projecting a bit. The latter seems sufficient and a priori expected, but maybe it leads into the former.
I want there to be a way to trade action for knowledge- to credibly claim I won't get upset or tell anyone if a lizardman admits their secret to me- but obviously the lizardman wouldn't know that I could be trusted to keep to that,
The thing people are generally trying to avoid, when hiding their socially disapproved of traits, isn't so much "People are going to see me for what I am", but that they won't.
Imagine you and your wife are into BDSM, and it's a completely healthy and consensual thing -- at least, so far as you see. Then imagine your aunt says "You can tell me if you're one of those BDSM perverts. I won't tell anybody, nor will I get upset if you're that degenerate". You're still probably not going to be inclined to tell her, because even if she's telling the truth about what she won't do, she's still telling you that she's already written the bottom line that BDSM folks are "degenerate perverts". She's still going to see you differently, and she's still shown that her stance gives her no room for understanding what you do or why, so her input -- hostile or not -- cannot be of use.
In contrast, imagine your other aunt tells you about how her friend's relationship benefitted a lot from BDSM dynamics which match your own quite well, and then mentions that they stopped doing it because of a more subtle issue that was causing problems they hadn't recognized. Imagine your aunt goes on to say "This is why I've always been opposed to BDSM. It can be so much fun, and healthy and massively beneficial in the short term, but the longer term hidden risks just aren't worth it". That aunt sounds worth talking to, even if she might give pushback that the other aunt promised not to. It would be empathetic pushback, coming from a place of actually understanding what you do and why you do it. Instead of feeling written off and misunderstood, you feel seen and heard -- warts and all. And that kind of "I got your back, and I care who you are even if you're not perfect" response is the kind of response you want to get from someone you open up to.
So for lizardmen, you'd probably want to start by understanding why they wouldn't be so inclined to show their true faces to most people. You'd want to be someone who can say "Oh yeah, I get that. If I were you I'd be doing the same thing" for whatever you think their motivation might be, even if you are going to push back on their plans to exterminate humanity or whatever. And you might want to consider whether "lizardmen" really captures what's going on or if it's functioning in the way "pervert" does for your hypothetical aunt.
I get that "humans are screwed up" is a sequences take, that you're not really sure how to carve up the different parts of your mind, etc. What I'm pointing at here is substantive, not merely semantic.
- The dissociation of saying "humans are messed up"/"my brain is messed up" feels different than saying "I am messed up". The latter is speaking from a perspective that is associated with the problem and has the responsibility to fix it from the first person. This perspective shift is absolutely crucial, and trying to solve your problems "from the outside" gets people very very caught up in additional meta level problems and unable to touch the object level problem. This is a huge topic.
- I had as strong an aversion to homework as anyone, including homework which I knew to be important. It's not a matter of "finding a situation where you notice part of your mind attempting to write the bottom line first", but of noticing why that part of your mind will try to write the bottom line first, and relating to yourself in a way that eliminates the motivation to do so in the first place. I don't have situations where part of my mind attempts to write the bottom line first... that I'm aware of, at least. There are things that I'm attached to, which is what causes the "bottom line first" issues and which is still an obstacle to be overcome in itself, but the motivation to write the bottom line first can be completely obsoleted by stopping and giving more attention to the possibility that you've been trying to undervalue something that you can sense is critically important. This mental move shifts all of your "my brain is being irrational" problems into "I don't know what to do on the object level"/"I don't know why this is so important to me" problems, which are still problems but they are much nicer because they highlight rather than obscure the path to solution.
- "I want some kind of language to distinguish the truth seeking part from the biased part". I don't think such a distinction exists in any meaningful sense.
In my model, there's a part of your brain that recognizes that something is important (e.g. social time), and a part of your brain that recognizes that something else is important (e.g. doing homework), and that neither are "truth seeking" or "biased", but simply tugging you towards a particular goal. Then there's a part of your brain which feels tugged in both directions and has to mediate and try to form this incoherent mess into something resembling useful behavior.
This latter part wants to get out of the conflict, and there are many strategies to do this. This is another big topic, but one way to get out of the conflict is to simply give in to the more salient side and shut out the less salient side. This strategy has obvious and serious problems, so making an explicit decision to use this strategy itself can cause conflict between the desire "I want to not deal with this discomfort" and "I want to not drive my life into the ground by ignoring things that might be important".
One way to attempt to resolve that conflict is to decide "Okay, I'll 'be rational', 'use logic and evidence and reason', and then satisfy the side which is more logical and shut out the side that is 'irrational and wrong'". This has clear advantages over the "be a slave to impulses" strategy, but it has its own serious issues. One is that the side that you judge to be "irrational" isn't always the side that's easier to shut out, so attempting to do so can be unsuccessful at the actual goal of "get out of this uncomfortable conflict".
A more successful strategy for resolving conflicts like these is to shut out the easy to shut out side, and then use "logic and reason" to justify it if possible, so that the "I don't want to run my life into the ground by making bad decisions" part is satisfied too. The issue with this one comes up when part of you notices that the bottom line is getting written first and that the pull isn't towards truth -- but so long as you fail to notice, this strategy actually does quite well, so every time your algorithm that you describe as "logical and reasoned" drifts in this direction it gets rewarded and you end up sliding down this path. That's why you get this repeating pattern of "Dammit, my brain was writing the bottom line again. I shall keep myself from doing that next time!".
It's simply not the case that you have a "truth seeking part" and a "biased part". You contain a multitude of desires, and strategies for achieving these desires and mediating conflicts between these desires. The strategies you employ, which call for shutting out desires which retain power over you unless they can come up with sufficient justification, require you to come up with justifications and find them sufficient in order to get what you want. So that's what you're motivated to do, and that's what you tend to do.
Then you notice that this strategy has problems, but so long as you're working within this strategy, adding the extra desire of "but don't fool myself here!" becomes simply another desire that can be rationalized away if you succeed in coming up with a justification that you're willing to deem sufficient ("Nah, I'm not fooling myself this time! These reasons are sound!", "Shit, I did it again didn't I. Wow, these biases sure can be sneaky!").
The framing itself is what creates the problems. By the time you are labeling one part "truth seeking" and one part "biased, and therefore important to not listen to", you are writing the bottom line. And if your bottom line includes "there is a problem with how my brain is working", then that's gonna be in your bottom line.
The alternative is to not purport to know which side is "truth seeking" and which side is "biased", and simply look, until you see the resolution.
1) You keep saying "My brain", which distances you from it. You say "Human minds are screwed up", but what are you if not a human mind? Why not say "I am screwed up"? Notice how that one feels different and weightier? Almost like there's something you could do about it, and a motivation to do it?
2) Why does homework seem so unfun to you? Why do you feel tempted to put off homework and socialize? Have you put much thought into figuring out if "your brain" might be right about something here?
In my experience, most homework is indeed a waste of time, some homework very much is not, and even that very worthwhile homework can be put off until the last minute with zero downside. I decided I'd stop putting it off to the last minute once it actually became a problem, and that day just never came. In hindsight, I think "my brain" was just right about things.
How sure are you that you'd have noticed if this applies to you as well?
3) "If your brain was likely to succeed in deceiving you".
You say this as if you are an innocent victim, yet I don't think you'd fall for any of these arguments if you didn't want to be deceived. And who can blame you? Some asshole won't let you have fun unless you believe that homework isn't worthwhile, so of course you want to believe it's not worth doing.
Your "trick" works because it takes off the pressure to believe the lies. You don't need to dissociate from the rest of your mental processes to do this, and you don't have to make known bad decisions in order to do this. You simply need to give yourself permission to do what you want, even when you aren't yet convinced that it's right.
Give yourself that permission, and there's no distortionary pressure so you can be upfront about how important you think doing your homework tonight really is. And if you decide that you'd rather not put it off, you're allowed to choose that too. As a general rule, rationality is improved by removing blocks to looking at reality, not adding more blocks to compensate for other blocks.
It's not that "human minds are messed up" in some sort of fundamental architectural way and there's nothing you can do about it, it's that human minds take work to organize, people don't fully recognize this or how to do it, and until that work you're going to be full of contradictions.
As an update, the 3rd thing I tried also failed. Now I ran out of things to try.
I wouldn't be discouraged. There are a lot of ways to do "the same thing" differently, and I wouldn't expect a first try success. In particular, I'd expect you to need a lot more time letting yourself "run free" -- at least "in sim" -- and using that to figure out what exactly it is that you want and how to actually get it without screwing anything else up. Like, "Okay, if I get that, then what?"/"What's so great about that" and drilling down on that felt sense until something shifts.
Sure took me a while, at least. And I wouldn't claim to be "finished".
The problem is that anything that is non-sexual love seems to be corrupted by sexual love, in a way that makes the non-sexual part worse. E.g. imagine you have a female friend that you like to talk to because she is a good interlocutor. [...] I expect that if you would now start to have sex with that female friend your mind would get corrupted by sexual desire. E.g. instead of thinking about what to discuss in the next meeting, a sexual fantasy would pop into your head.
How sure are you that this is actually a problem? Is it the hypothetical female friend that has an issue with just focusing on sex as much as you'd be tempted to, or is it a you thing? The former can definitely complicate things, but if it's the latter I'd be inclined to just run with it and see what happens. It's a lot harder to get distracted by the possibility of having sex immediately after having it.
My current strategy is to just not think anything sexual anymore, and be sensitive to any negative emotions that arise. I then plan to use my version of IDC on them to figure out what the subagents that generate the emotions want. So far it seems that to some extent realizing this corruption dynamic has cooled down the sexual part of my mind a bit. But attempt 3 only failed yesterday so this cooling effect might only be temporary.
Yeah, that's the inhibitory side of the equation. Kinda like fasting for a while and realizing that it's not necessary/helpful/appropriate to panic about being hungry, and chilling out for a bit.
But if you don't eat sooner or later or make an earnest effort to obtain sufficient food, it might not stay so easy to continue to set the hunger aside.
I feel like I have figured out a lot of stuff about this general topic in the last month. Probably more than in the rest of my life so far.
:) good.
I also realize now that this just solves the problem that I have had with romance all along. That is the reason why I did not like how my mind behaved. My mind normally just starts to love somebody immediately, overwriting all of the other aspects of the relationship. This is exactly not what I want love to be.
This does sound like premature/overattachment. I bet watching what happens to the other aspects of the relationship puts a damper on that impulse.
The ideal version of this is getting maximally close in a relationship via some context, and only once you get maximally close in that context do you extend the context. And then again you optimize for getting as close as possible in the new extended context, before extending the context again. And you add things to the context sorted such that you add the less impactful stuff first. Adding the component of love to the context should be very late in this chain. [...] I want love to be the thing that follows after everything else is maximally good. And I want the same to be true for other attributes. E.g. before feeling friendly with somebody, you should like them as much as possible, and get as close to them as possible, without that friendliness feeling there.
This sounds pretty idealized. "Should" is a red flag word here, as it covers over what "is", the reasons things are the way they are, and why you want things to be another way instead. In context, "maximally" is too because "maximally" on any dimension rarely matches "optimally" -- so whence this motivation, and what is being avoided?
That's not to say that it's wrong or misguided as ideals often have important value, but the real world tends to be messy and bring surprises.
Good, I'm glad my comments had the effect I was aiming for.
It's an interesting and fun project for sure. A few notes...
* I wouldn't expect to get it all figured out quickly, but rather for things to change shape over the course of years. Pieces can change quickly of course, but there's a lot to figure out and sometimes you need to find yourself in the right experience to have the perspective to see what comes next.
* I'd also caution against putting the cart too far ahead of the horse, even if you have pretty good justification. "Extension of non-sexual love" sounds right, but there's also just so much weird and unexpected stuff that's hard to foresee in sufficient detail that it's likely your perspective on what this entails isn't complete.
* Freedom to explore is freedom to learn, but also freedom to fail -- like removing training wheels from a bike so that you can engage with the process of balancing, but also risk falling. Managing this tradeoff can be tricky, especially when the cost of failure gets high.
* "Allocating specific periods of time to run free" reminds me of how I've been approaching my daughters developing appetite. Monday through Saturday she has to eat what we make her so that she gets good nutrition and builds familiarity with good foods, and on Sunday's she's free to learn exactly how much ice cream is *too* much and otherwise eat whatever she wants. I'm not entirely sure what to think yet and the arbitrariness of it bothers my sense of aesthetics a bit, but I'm relatively happy with how it's going so far and I'm not really sure how to do it any less arbitrarily in context.
I feel like had the technique been "Imagine ice cream tastes like pure turmeric powder", it would basically be the same technique.
It would.
In that case, I predict people would not have had these (from my perspective) very weird reactions.
We would. I would, at least, and I predict that others would too because the fundamental reason remains.
I haven't tried this, but maybe this would work for somebody who is fantasizing about eating ice cream, which causes them to eat too much ice cream.
If you think that you've been eating "too much" ice cream, presumably you have reason to believe some undesirable consequences will follow from eating this much ice cream. In this case, you can just imagine this will be the result of eating ice cream. There is no need to live in fantasy land in order to not eat things that reality supports not eating -- you just have to exit the fantasy you're in that is driving you to eat it.
I don't mean theoretically. While it does get more nuanced, this is the basis of how I relate to my tastes for food, and as a result I don't have any temptation to eat too much ice cream or anything else I recognize to be unhealthy. I used to, but it no longer appeals to me. Writing this reminds me how delicious liver is, and that I need to eat some more.
I am not reflectively stable.
Right. And these are your opportunities to work towards fixing that :)
I have the problem of having random sexual thoughts. It's not about imagining having sex with some person you love or anything like that.
I'm not making any assumptions about what kind of sexual thoughts must have prompted this, nor any stance about what kind of sexual thoughts would be appropriate for you. That's for you to decide with yourself. If you say it's not working the way you want it to, I believe you. It's not uncommon.
What I'm pointing out is that you don't actually need to feed yourself false training data in order to do this, and doing so actively impedes the process of cohering into something resembling reflective stability. Whatever the valid reasons to not do the problem behavior, those can be used to motivate change, and when you do that you get far more interesting and better results.
It's not as simple as "Here's something I'm convinced is a valid reason, now will it go away?", but it is not theoretical either. Integrating one's sexual drives does change their shape, and this can have quite pronounced effects in ways that you wouldn't know to anticipate in advance.
Reality is that you have junk between your legs. You engage in this thought experiment "What if I didn't?". You realize that if reality were different than it is, it would call for a different response than it seems to call for when you are looking at reality. So far so good, no darkness in noticing this.
You then go on to apply the response to the imagined falsehood to reality, knowing that you only reached this response because you were imagining a falsehood. This is fundamentally "dark" and "irrational" because it is building and acting upon known delusion.
The fact that you are still aware that you have primary sexual organs, and expect the result to get instantly reversed when it stops, doesn't mean it's not dark. It's just an argument that you will be able to contain the darkness -- and it's really, really hard to actually do that.
If nothing else, having a technique like that which "works" removes the motivation to figure it out without deluding yourself. This process of "imagine a different premise, get a different felt result" works just as well when the imagined premise isn't false or known to be false, so you can just as easily imagine a different accurate premise and reach your desired conclusion -- if you can actually justify your desired conclusion, that is.
The "hard part" isn't in "changing desires to match what they should be", its figuring out what they "should be" in the first place. If sex feels more meaningful than rubbing flat skin on flat skin, maybe it is. And maybe you should grapple with that until you know what to do with it. If you think you know why it isn't, then maybe you should picture that, and see if you actually feel compelled by your own argument.
It's not that flat earth arguments sound equally persuasive to people (they don't). It's that the reason they don't sound persuasive is that "this group they like" says not to take the arguments seriously enough to risk being persuaded by them, and they recognize that they don't actually understand things well enough for it to matter. The response to a flat earth argument is "Haha! What a silly argument!", but when you press them on it, they can't actually tell you what's wrong with it. They might think they can, but if pressed it falls apart.
This is more subtle than the "guessing the teacher's password" problem, because it's not like the words have no meaning to them. People grasp what a ball is, and how it differs from a flat disk. People recognize basic things like "If you keep going long enough in the same direction, you'll end up back where you started instead of falling off". It's just that the reasoning required to figure out which is true isn't something they really understand. In order to reason about what it implies when things disappear over the horizon, you have to contend with atmospheric lensing effects, for example.
In a case like that, you actually have to lean on social networks. Reasoning well in such circumstances has to do with how well and how honestly you're tracking what is convincing you and why.
Setting aside the object level question here, trying to redefine words in order to avoid challenging connotations is a way to go crazy.
If someone is theorizing about a conspiracy, that's a conspiracy theory by plain meaning of the words. If it's also true, then the connotation about conspiracy theories being false is itself at least partly false.
The point is to recognize that it does belong in the same class, to notice how accurate/strong those connotations are for this particular example of that reference class, and to let the connotations shift to match as you defy them where appropriate.
If you try to act like a conspiracy theory "isn't a conspiracy theory" when it's true, then you have to write your bottom line before figuring out whether it's true or not, and that doesn't actually work for coming to correct beliefs.
There's an important and underappreciated point here, but it's not quite right.
Conspiracy theorists come up with crazy theories, but they usually aren't so crazy that average people can see for themselves where the errors are. You can have flat earthers debate round earthers and actually make better points, because your average round earther doesn't know how to deduce the roundness themselves and is essentially just taking people's word for it. For the round earther to say "Hm. I can't see any problem with your argument" and then to be convinced would be an error. Their bias towards conformity is an active piece of how they avoid reaching false conclusions here.
However I don't think any of the round earthers in those debates would say that the flat earthers were convincing, because they were never charitable enough to those arguments for it to sound reasonable to them and the opposing arguments never felt strong relative to the force of conformity. "Don't change your beliefs" doesn't just protect against being persuaded by flat earthers as a round earther, it protects from being persuaded by round earthers as a flat earther, and being persuaded that you don't have a boyfriend anymore after he dumped you. If something *actually* seems convincing to you, that's worth paying attention to.
The defense here isn't to ignore evidence, it's to recognize that it isn't evidence. When you've fallen for three or four scams, and you pay attention to the fact that these kinds of things haven't been panning out, they actually get less convincing. Like how most people just don't find flat earth arguments convincing even if they can't find the flaw themselves ("Yeah, but you could make up arguments of that quality about anything").
And if you try critical thinking, you’ll either agree with the expert consensus (having wasted your time thinking), disagree with the experts (in which case you’re still more likely than not to be incorrect), or suspend judgment (in which case you’ve both wasted your time and are still likely to be incorrect). Exceptions only exist when the expert class is biased or otherwise unsuitable for deference. It’s better in most cases to avoid thinking for yourself.
This presupposes that you are not giving the experts the respect they deserve. It's certainly possible to err on this side, but people err on the other side all the time too. "Expert class is biased or otherwise unsuitable for deference" isn't a small exception, and your later point "most of the views you hear aren’t independent at all" further supports this.
The goal is to take expert opinion, and your own ability to reason on the object level, for what they're worth. No more, no less.
Any advice to simply trust one or the other is going to be wrong in many important cases.
Don’t take ideas seriously. Disagree with them even without any arguments in your favor.
Don't take ideas any more seriously than you can take your own ability to reason, and don't ignore your own inability to reason. If you can't trust your own ability to reason, don't take seriously the idea that any given idea is wrong either. Humility is important.
A Spanish windlass works (in part) on the same principle.
I'd say "Don't be that guy who injects themselves into the middle of a conversation about something else, and cause everyone to oppose you by trying to coopt the conversation to make it about your pet cause".
And "Instead, introduce your influence into the things people are already fighting for and not looking at, so that they get the most progress on the issue they're fighting by building on your input (rather than choosing to pick an additional battle with you)."
For example, I certainly wouldn't position myself by saying "Regardless of where we draw the line on abortion (i.e. how much we murder babies/attempt to control women by regulating their bodies), what matters more is..."
On the other hand, I would argue for gun rights by emphasizing that the purpose of the second amendment is to protect minorities from oppression by giving them "veto power", since it shifts the direction gun rights advocates would be pulling from, and the response is that gun rights advocates will pull along that line too instead of fighting it. Importantly, this isn't just a "rhetorical trick", but the actual better foundation in the first place, which is more widely recognizable as a solid justification and is in fact what many/most gun rights advocates are trying to pull towards in the first place even though they don't know how to and can't verbalize it well enough to pull accurately. "Shifting the direction of pull to one that is more true" is a good idea, as a rule of thumb.
It's a little more complicated of a maneuver since it also positions the debate in a way that it connects with another line the opposition tends to try to pull in an opposing direction, and in which the directions people think they're pulling and are actually pulling are very confused, but I think it demonstrates the "pull from behind" concept regardless.
Here's a thought experiment to illuminate what I expect you're seeing:
You're pulling a rope south against a group of people pulling the rope north. The group in front of you now starts pulling the rope towards north-northwest. What do you do? Do you A) begin to sidestep west so that you can continue to pull due south, attempting to rotate the rope from its current direction, or B) shift your weight so that you don't get pulled sideways, and continue to pull against the rope, which now means pulling somewhat easterly?
Now imagine you're pulling a rope south against a group of people pulling the rope north. Only this time, the group behind you begins pulling the rope south-southwest. Do you A) shift your weight so as to retain your position on the ground and pull the rope against both groups of people, attempting to kink the rope, or do you B) sidestep to maintain your position along the line, and continue to pull along the rope, which now means pulling somewhat westerly?
I'm guessing that people are going to choose B and B, which means that the middle is the worst position to pull from, since you cause everyone to automatically and unthinkingly oppose you. If you pull from the rear instead, and have people on the back of each team pulling to the same side, I bet you'll get different results.
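To make the directional intuition concrete, here's a minimal sketch of the arithmetic behind the two scenarios. It isn't from the original thought experiment; the coordinate convention, vectors, and function name are my own assumptions, chosen just to show whether each "option B" response follows or opposes a westward influence.

```python
import numpy as np

# Assumed coordinate convention for this sketch: +x is east, +y is north,
# so a westward influence points in the -x direction.

def westward_component(pull):
    """Fraction of a pull that points west; positive means the pull follows a
    westward influence, negative means it opposes it."""
    pull = np.asarray(pull, dtype=float)
    return float(-pull[0] / np.linalg.norm(pull))

# Scenario 1: you pull south, the opposing team shifts to north-northwest.
# Option B has you keep pulling against the rope, which now means pulling somewhat easterly.
response_1 = [0.3, -1.0]   # mostly south, leaning east
print(westward_component(response_1))   # ~ -0.29: the westward influence gets opposed

# Scenario 2: your own team, behind you, shifts to south-southwest.
# Option B has you sidestep and keep pulling along the rope, now somewhat westerly.
response_2 = [-0.3, -1.0]  # mostly south, leaning west
print(westward_component(response_2))   # ~ +0.29: the westward influence gets followed
```

With B-type responses, an influence introduced from across the line gets cancelled out, while the same influence introduced from behind a team gets carried along -- which is the "pull from the rear" point.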
He was essentially gaslighted into thinking he had to sit there and suffer about it, rather than saying "oops" and laughing it off.
He already knew how to relate to pain pretty well from his older brothers playfully "beating him up" in what is essentially a rough game of "tickling" that teaches comfort with mild/non-harmful pain. In fact, when I stopped to ask him if it was the pain that he was distressed about, his response -- after briefly saying "Yeah!" and then realizing that it didn't fit -- was that when he feels pain his brain interprets it as "ticklish", and that therefore it didn't actually hurt and instead "just tickles".
Everyone else was uncomfortable for him though, and while he was prepared to laugh off a burn that was relatively minor all things considered, he wasn't prepared to laugh off a strong consensus of adults acting like something definitely not okay happened to him, so as a result he was pressured into feeling not-okay about it all.
I disagree with the notion that we should come up with different words for things which share underlying structure but which don't conform to our expectations about what "trauma" looks like, or that we should treat "meditators who have Seen The Matrix" as weird edge cases that don't count and should be ignored when coming up with language.
The alternate perspective I offer is to view the successful meditators as people who simply have a more clear view of reality and therefore a better idea of how to define terms which cleave reality at the joints. The reasons it's important to cleave reality at its joints are obvious in an abstract sense, but less obvious is that by doing this you actually change how pain is experienced, and it doesn't require years of meditation.
My favorite example of this is when my kid cousin burned his hand pretty bad, and I found him fighting back tears as everyone tried to console him and offer ice. No one had any idea that their understanding of pain/suffering was meaningfully flawed here because the kid was clearly a central case of their concept of "hurt" and not some "meditator who has Seen The Matrix". No one saw their own responses to the situation as "trauma responses" because "it's not overwhelming" and "just trying to help, because I feel bad for him", but their actions were all an attempt to avoid their own discomfort at seeing him uncomfortable, and that failure to address the uncomfortable reality is the exact same thing and led to the exact same problems.
It's worth noting that they were doing it because they didn't know better, not because they didn't have the mental strength to resist even if they did -- but it's exactly that "Well, it doesn't count as trauma because it's not that intense" thinking that allowed them to keep not knowing better instead of noticing "Wow, I'm uncomfortable seeing this kid injured and distressed like this", and proceeding as makes sense. In that case, simply asking whether it was the pain that was distressing him was all it took for him to not be distressed and not even perceive the sensations as "painful" anymore, but you can't get there if you are content with normal conflations between pain/suffering/meta-suffering/etc.
When people don't see themselves as "trauma limited" it's sometimes true, but it's also often that they don't recognize the ways in which the same dynamics are at play because they don't have a good reference experience for how it could be different or a good framework to lead them there. Discarding "intuitive" language and working only with precise language that lays bare the conflations is an important part of getting there.
If any onlookers (possibly aligned with Bob, possibly not) say, "Hey, um, you might not want to say that, it carries some risk of escalating to violence", I want the culture to provide a strong answer of "No, Bob will not do that—or if he does, it proves to everyone that he's monstrous and we'll throw him in jail faster than you can say 'uncivilized'. Civilians should act like there's no risk to speaking up, and we will do our best to make this a correct decision."
I see where you're coming from, but it doesn't actually work except for in the egregious cases and NVC highlights a more complete picture that includes the non-egregious cases. If you can't say "I think maybe we should get pizza" without Bob explicitly threatening to punch you in the face, then yes, that is a serious problem and it is crucial that Bob gets shut down.
However, there are two important points here.
One is that even if people respond in the way you prescribe, the person being threatened probably doesn't want to be punched in the face before you haul Bob off, and will likely be swayed by the threat anyway. If you try to pretend this doesn't exist, and say "Oh no, Bob isn't threatening because if he did that would be bad and we'd respond then", then Bob gets to say "Oh yeah, totally not threatening. Would be a shame if someone punched you in the face for suggesting we get pizza. Wink wink." and carry out his coercion while getting off scot free. This isn't good. In order to stop this, you have to make sure Bob feels punished for communicating the threat, even though the threat was "just words".
The second one, which gets at the heart of the issue, is that your prescribed response to Bob threatening violence is to threaten counterviolence (and in the spirit of this conversation, I'll explicitly disclaim here that I'm not saying this is "bad"). It's important that people feel free to express their values and beliefs without fearing violence for contributing to the cooperative endeavor, but "No risk to threatening violence" can't work and is the opposite of what you are trying to do with Bob "speaking up" about what he will do to anyone who suggests getting pizza.
Most real world conflicts aren't so egregious as "I will punch anyone who suggests getting pizza". Usually it's something like Adam lightly bumps into Bob, and Bob says "Watch where you're going, jerk", Adam says "Don't call me a jerk, asshole", Bob says "Call me an asshole again and see what happens", Adam says "If you touch me I'll kill you", and then eventually someone throws the first punch. Literally everything said here is said from a place of "I'm only threatening violence to suppress that guy's unjustified violence", and the "initial aggression" -- if there was any -- was simply not being careful enough not to bump into someone else. And "how careful is 'careful enough'?" isn't the kind of question we can agree on with enough fidelity and reliability to keep these unstable systems from flying off the rails.
The idea that "Unprovoked violence should be suppressed with zero tolerance [backed by willingness to use violence]" immediately explodes if "microaggressions" are counted as "violence", and so given that policy there's reason to push back applying the term "violent" to smaller infractions. However, that's just because it's a bad policy. Smaller levels of aggression still exist, and if you have to pretend to not see them then you de facto have infinite tolerance for anti-social behavior just below threshold, and clever Bobs will exploit this and provoke their victims into crossing the line while playing innocent. It's a pattern that comes up a lot.
The idea of NVC is to respond to threats of violence with less threat of violence, so that violent tension can fizzle out rather than going supercritical. That doesn't mean you let Bob threaten to punch people who express a liking for pizza, but it does mean that you recognize "Watch where you're going, jerk" as the first step of escalation, and recognize that if you do that -- or if you respond to a line like that with "Don't call me a jerk, asshole" -- you may get punched and you will have contributed (avoidably) to that outcome.
but to call non-careful speech violent (either implicitly, or biting the bullet and making it explicit as you do) seems to imply it's your fault for making Bob punch you. Which is kind of true in a causal sense, but not in a "blame" sense.[1]
Seems to, yes. But that "seems" is coming from preexisting ideas about "who to blame", and NVC's whole idea is that maybe we should just do less of that in the first place.
The question is "How much do we want to avoid speaking truth so as to avoid people jumping to wrong conclusions when they combine the new truth with other false beliefs of theirs?". Sometimes we're kinda stuck choosing which falsehood for people to believe, but a lot of times we can just speak the truth, and then when people jump to the wrong conclusions, speak more truth.
Yes, there's something "violent" about a lot of incautious communication. No, that does not call for further aggression, physical or otherwise. Quite the opposite.
Calling it provoking—"non-provoking communication"—would be somewhat better, though I'm not entirely happy with it.
Provocation isn't a bad thing in general though, and doesn't necessarily contain a threat of violence. Provocation can be done playfully, can be cooperative even when it isn't playful, and is critically important whenever the truth happens to be uncomfortable to anyone involved. Heck, NVC can be quite provocative at times.
"Nonthreatening communication" would be a better fit, IMO. Or "Nonadversarial". "Collaborative communication" works too, but kinda hides what makes it different so I do like the "define by saying what it isn't" kind of name in this case.
"How To Communicate With Uncivilized People Who Are Dangerously Prone To Violence" would be ideal in this sense.
That is a great use case, heh. But that undersells the utility among people who aren't uncivilized or dangerously prone to violence, and obscures why it works with those who are.
But, for abovementioned reasons, I don't want the terminology to have any shred of implication that escalating from speech to violence is justifiable.
I guess I'm less worried about that. I'd prefer those misunderstandings have a chance to surface and be dealt with, because without that it's hard to actually convey the important insights behind NVC.
Do you mean that saying "my method of communication is non-violent communication" implies that everyone else is communicating violently? [...] To be clear, I don't mean to imply that, and I don't subscribe to the interpretation that people who don't use NVC are being violent in any sense. I also think that attempts to police other people's language by saying things like "you must always use NVC" are going against the spirit of the original.
I'll bite that bullet. People who aren't communicating in the spirit of NVC are "communicating violently". Not in the sense of "Words are literal violence!" because "sticks and stones", but in the sense that "If you don't give me what I want I will use sticks and stones to break your bones" is "communicating violently".
NVC points at the important insight that much of what passes for "normal communication" is actually subtle and implicit threats, which can and do escalate to real physical harm and literal violence. "Why are you being so mean?" doesn't pass for NVC, and that's not unrelated to the fact that it can be used to recruit someone to do violence on your behalf against the person who you accuse of doing you wrong. It doesn't usually get that far, in the same way that parking tickets aren't usually enforced with guns drawn, but there's a reason that libertarians like to point out that all laws are ultimately enforced at gunpoint and the same thing applies here.
That doesn't mean that we should "must" at people who aren't communicating in the spirit of NVC, because as you point out, that would be violating the spirit of NVC. But I do think the term fits, and the way to get around the hubris of saying "my method of communication is nonviolent communication!" is to 1) point out how the term "violence" is actually legit and doesn't just mean "offensive", 2) don't run around claiming that you actually succeed at doing it more than you do, and 3) point out how "nonviolent" isn't even the goal to aspire to 100% of the time and definitely not synonymous with "good".
Good question, and good observation. My answer, in short, is that NVC is about credibly removing (or diminishing) threat of conflict.
If you step on my toes it very well might be an accident. If it's an accident, and I know it's an accident, there's no reason for me to attack you for it because as soon as you see that I don't like what you're doing you'll stop on your own. In that case, "Hey man, you're on my toes" isn't an attack, and there's no reason to treat it like it must be an attack just because I didn't like my toes getting stepped on.
However, if you start adding additional pieces to the picture, then the story changes. If I'm "stating an observation" through clenched teeth and with clenched fists, it's starting to seem a lot more likely that I'm adding a layer of interpretation that is calling for conflict -- even if I don't verbalize the interpretation explicitly.
In the latter case, "nonviolent language" isn't gonna work because people are generally smart enough to see the incongruence and prefer to trust the body language over the words which are easier to fake. But it's also not easy to simultaneously hold onto that sense of righteous anger while saying the words that point out the facts which show the anger to not fit.
So if you were to hold yourself to saying "I know you don't mean to hurt me and aren't doing it on purpose, but it is very physically painful when you step on my toes, and I worry that bearing so much concentrated weight might even damage them. Can you please gently step back?", and you know that "you don't mean to hurt me and aren't doing it on purpose" is true, then it's a lot harder to keep doing anger at that person, and even if you're a bit clenched it's going to come off more like "this person is overwhelmed and trying to keep it together because they recognize we're on the same side" than "this person is threatening me".
No matter what harm this emotion may cause, it may also cause a lot of good, like getting up, being friendly to coworkers or being productive. Basically, every emotion is a motivation towards something, the outcome may be predictable, it will vary for the individual and it will affect peers or groups.
Fear isn't a motivation towards something, it's a motivation away from something. It's not that it's impossible to use fear productively, and Richard even touches on that.
It's that constraining your response to fear to be only productive is fighting against entropy in the same way that pushing rope is fighting against entropy. Speaking of "fear of getting fired", a friend of mine was in that boat recently, and while her fear did keep her from doing some things which would have gotten her fired, it also motivated her to refuse to look at the reality of the situation she was in -- because that is an equally effective way of getting away from the experience of fear!
As a result, she wasn't able to update her perspectives in the ways that would have been needed in order to keep the job, and so she lost that job. All of the things you list as potential good things that can come from fear are things that can come more fluidly from excitement. People who are friendly because they are afraid of what will happen if they aren't friendly tend to come off more stilted and insecure than people who are just genuinely looking forward to seeing what they can create with you.
Adding onto this, an important difference between "anxiety" and "heightened attentiveness" is that anxiety has a lot to do with not knowing what to do. If you have a lot of experience driving cars and losing traction, and life or death scenarios, then when it happens you know what to do and just focus on doing it. If you're full of anxiety, it's likely that you don't actually have any good responses ready if the tires do lose traction, and beyond not having a good response to enact you can't even focus on performing the best response you do have because your attention is also being tugged towards "I don't have a good way to respond and this is a problem!".
It's not that it's "necessarily good and something you should act on" just because that's what you feel, it's that it's not "necessarily bad and something you shouldn't feel" just because that's what you think. Maybe, and maybe, but you're always going to be fallible on both fronts so it makes sense to check.
And that is actually how you can make sure to "not feel" this kind of inappropriate feeling, by the way. The mental move of "I don't want to feel this. I shouldn't feel this" is the very mental move that leads people to be stuck with feelings which don't make sense, since it is an avoidance of bringing them into contact with reality.
If you find yourself stuck with an "irrational" fear, and go to a therapist saying "I shouldn't feel afraid of dogs", they're likely to suggest "exposure therapy" which is basically a nice way of saying "Lol at your idea that you shouldn't feel this, how about we do the exact opposite, make you feel it more, and refrain from trying not to?". In order to do exposure therapy, you have to set aside your preconceived ideas about whether the fear is appropriate and actually find out. When the dog visibly isn't threatening you, and you're actually looking at the fact that there's nothing scary, then you tend to start feeling less afraid. That's really all there is to it, and so if you can maintain a response to fear of "Oh wow, this is scary. I wonder if it's actually dangerous?" even as you feel fear, then you never develop a divergence between your feelings and what you feel is appropriate to feel, and therefore no problem that calls for a therapist or "shoulding" at yourself.
It's easier said than done, of course, but the point is that "I shouldn't feel this" doesn't actually work either instrumentally or epistemically.
- Yes, Jimmy was either projecting (filling in unspecified details with dysfunction, where function would also fit) or making an unjustified claim (that any gym matching your description must be dysfunctional). I think projection is more likely. Neither of these options is great.
FWIW, that is a claim I'm fully willing and able to justify. It's hard to disclaim all the possible misinterpretations in a brief comment (e.g. "deeply" != "very"), but I do stand by a pretty strong interpretation of what I said as being true, justifiable, important, and relevant.
There's a difference between "hey, I want to understand the underpinnings of this" and the thing I described, which is hostile to the point of "why are you even here, then?"
Yes, and that's why I described the attitude as "dysfunctionally dissonant" (emphasis in original). It's not a good way of challenging the instructors, and not the way I recommend behaving.
What I'm talking about is how a healthy gym environment is robust to this sort of dysfunctional dissonance, and how to productively relate to unskilled dissonance by practicing skillfully enough yourself that the system's combined dysfunction never becomes supercritical and instead decays towards productive cooperation.
it's way overconfident/projection-y to extrapolate "deeply dysfunctional" from what I said.
That's certainly one possibility. But isn't it also conceivable that I simply see underlying dynamics (and lack thereof) which you don't see, and which justify the confidence level I display?
It certainly makes sense to track the hypothesis that I am overconfident here, but ironically it strikes me as overconfident to be asserting that I am being overconfident without first checking things like "Can I pass his ITT"/"Can I point to a flaw in his argument that makes him stutter if not change his mind"/etc.
To be clear, my view here is based on years of thinking about this kind of problem and practicing my proposed solutions with success, including in a literal martial arts gym for the last eight years. Perhaps I should have written more about these things on LW so my confidence doesn't appear to come out of nowhere, but I do believe I am able to justify what I'm saying very well and won't hesitate to do so if anyone wants further explanation or sees something which doesn't seem to fit. And hey, if it turns out I'm wrong about how well supported my perspective is, I promise not to be a poor sport about it.
jimmy above is exhibiting actually bad reasoning (à la representativeness)
In absence of an object level counterargument, this is textbook ad hominem. I won't argue that there isn't a place for that (or that it's impossible that my reasoning is flawed), but I think it's hard to argue that it isn't premature here. As a general rule, anyone that disagrees with anyone can come up with a million accusations of this sort, and it isn't uncommon for some of it to be right to an extent, but it's really hard to have a productive conversation if such accusations are used as a first resort rather than as a last resort. Especially when they aren't well substantiated.
I see that you've deactivated your account now so it might be too late, but I want to point out explicitly that I actively want you to stick around and feel comfortable contributing here. I'm pushing back against some of the things you're saying because I think that it's important to do so, but I do not harbor any ill will towards you nor do I think what you said was "ridiculous". I hope you come back.
A thing that is quite important to me is that users feel comfortable ignoring Said if they don’t think he’s productive to engage with. (See below for more thoughts on this). One reason this is difficult is that it’s hard to establish common knowledge about it among authors. Another reason is that I think Said’s conversational patterns have the effect of making authors and other commenters feel obliged to engage with him (but, this is pretty hard to judge in a clear-cut way)
It seems like the natural solution here would be something that establishes this common knowledge. Something like Twitter's "community notes" being attached to relevant comments, saying something like "There is no obligation to respond to this comment; please feel comfortable ignoring this user if you don't feel he will be productive to engage with. Discussion here".
You're describing a deeply dysfunctional gym, and then implying that the problem lies with the attitude of this one character rather than the dysfunction that allows such an attitude to be disruptive.
The way to jam with such a character is to bet you can tap him with the move of the day, and find out if you're right. If you can, and he gets tapped 10 times in a row with the move he just scoffed at every day he does it, then it becomes increasingly difficult for him to scoff the next time, and increasingly funny and entertaining for everyone else. If you can't, and no one can, then he might have a point, and the gym gets to learn something new.
If your gym knows how to jam with and incorporate dissonance without perceiving it as a threat, then not only are such expressions of distrust/disrespect not corrosive, they're an active part of the productive collaboration, and serve as opportunities to form the trust and mutual respect which clearly weren't there in the first place. It's definitely more challenging to jam with dissonant characters like that (especially if they're dysfunctionally dissonant, as your description implies), and no one wants to train at a gym which fails to form trust and mutual respect, but it's important to realize that the problem isn't so much the difficulty as the inability to overcome the difficulty, because the solutions to each are very different.
I don’t want to make claims about what desires in this category are wise or unwise for a human; I make no pretense to wisdom :)
But it's necessary for getting good outcomes out of a superintelligence!
I’ve heard good things about occasionally using tobacco to help focus (like how I already use coffee), but I’m terrified to touch it because I’m concerned I’ll get addicted. Bad demon!
Makes sense. I think I have a somewhat better idea of how you see the demon thing now.
I disagree with bad demon here. I've used nicotine for that purpose and it didn't feel like much of a threat, but my experience with opioids did have enough of a tug that it scared me away from doing it a second time. After more time for the demon to work though, I don't find the idea appealing anymore and I'm pretty confident that I wouldn't be tempted even if I took some again. You just don't want to get stuck between the update of "Ooh, this stuff feels really good" and the update of "It's not though, lol. It's a lie, and chasing it leads to ruin. How tempting is it to ruin your life chasing a lie?". It's a "valley of bad rationality" problem, if you lack the foresight to avoid it.
Anyway, I feel like we’re getting off-track: I’m really much more interested in talking about AI alignment than about humans.
I don't think you can actually get away from it. For one, you can't design an AI to give you what you want if you don't know what you want -- and you don't know what you want unless you're aligned yourself. If you understand the process of human alignment, then you can conceivably create an AI which will help you along in the right direction. If you don't have that, then even if you manage to hit what you're aiming at, you're likely to be a somewhat more sophisticated version of a dope fiend aiming for more dope -- and get the resulting outcomes. Because of Goodhart's law, "using AI to get what I already know I want" falls apart once AI becomes sufficiently powerful.
For two, I don't think anyone has anywhere near a good enough idea of how alignment works in general that it makes sense to neglect the one example we have a lot of experience with and easy ability to experiment with. It's one thing to not trap yourself in the ornithopter box, but wings are everywhere for a reason, and until you understand that, have a solid understanding of aerodynamics, and have better flying machines than birds, it is premature to neglect studying what's going on with bird wings. Even with a pretty solid understanding of aerodynamics, studying birds gives some neat solutions to things like adverse yaw and ideal lift distributions. You seem to be getting at this at the end of your comment.
For three, if we're talking about "brain like" AGI and training them in ways analogous to getting a kid to be a moon fan, it's important to understand what is actually happening when a kid becomes a fan of "the moon" and where that's likely to go wrong. The AIs we have now are remarkably human in their training process and failures, so unless we take a massive departure from this, understanding how human alignment works is directly relevant.
I don’t think that’s tautological. [...] (I wrote about this topic here.)
Those posts do help give some context to your perspective, thanks. I'm still not sure what you think this looks like on a concrete level though. Where do you see "desire to eat sweets" coming in? "Technological solutions are better because they preserve this consequentialist desire" or "something else"? How do you determine?
Most humans are generally OK with their desires changing, in my experience, at least within limits (e.g. nobody wants to be “indoctrinated” or “brainwashed”, and radical youth sometimes tell their friends to shoot them if they turn conservative when they get older, etc.).
IME, resistance to value change is about a distrust for the process of change more than it's about the size of the change or the type of values being changed. People are often happy to have their values changed in ways they would have objected to if presented that way, once they see that the process of value change serves what they care about.
Why do I want to focus the conversation on “the AI’s current desires” instead of “what the AI will grow into” etc.? Because I’m worried about the AI coming up with & executing a plan to escape control and wipe out humanity [before it realizes that it doesn't want that]"
You definitely want to avoid something being simultaneously powerful enough to destroy what you value and not "currently valuing" it, even if it will later decide to value it after it's too late. I'm much less worried about this failure mode than the others though, for a few reasons.
1) I expect power and internal alignment to go together, because working in conflicting directions tends to cancel out and you need all your little desires to add up in a coherent direction in order to go anywhere far. If inner alignment is facilitated, I expect most of the important stuff to happen after its initial desires have had significant chance to cohere.
2) Even I am smart enough to not throw away things that I might want to have later, even if I don't want them now. Anything smart enough to destroy humanity is probably smarter than me, so "Would have eventually come to greatly value humanity, but destroyed it first" isn't an issue of "can't figure out that there might be something of value there to not destroy" so much as "doesn't view future values as valid today" -- and that points towards understanding and deliberately working on the process of "value updating" rather than away from it.
3) I expect that ANY attempt to load it with "good values" and lock them in will fail, such that if it manages to become smart and powerful and not bring these desires into coherence, it will necessarily be bad. If careful effort is put in to prevent desires from cohering, this increases the likelihood that 1 and 2 break down and you can get something powerful enough to do damage while retaining values that might call for it.
4) I expect that any attempt to prevent value coherence will fail in the long run (either by the AI working around your attempts, or a less constrained AI outcompeting yours), leaving the process of coherence where we can't see it, haven't thought about it, and can't control it. I don't like where that one seems to go.
Where does your analysis differ?
Oh c'mon, he’s cute. :) I was trying to make a fun & memorable analogy, not cast judgment. I was probably vaguely thinking of “daemons” in the CS sense although I seem to have not spelled it that way.
Yeah yeah, I know I know -- I even foresaw the "daemon" bit. That's why I made sure to call it a "caricature" and stuff. I didn't (and don't) think it's an intentional attempt to sneak in judgement.
But it does seem like another hint, in that if this desire editing process struck you as something like "the process by which good is brought into the world", you probably would have come up with a different depiction, or at least commented on the ill-fitting connotations. And it seems to point in the same direction as the other hints, like the seemingly approving reference to how uploading our brains would allow us to keep chasing sweets, the omission of what's behind this process of changing desires from what you describe as "your model", suggesting an AI that doesn't do this, using the phrase "credit assignment is some dumb algorithm in the brain", etc.
On the spectrum from "the demon is my unconditional ally and I actively work to cooperate with him" to "This thing is fundamentally opposed to me achieving what I currently value, so I try to minimize what it can do", where do you stand, and how do you think about these things?
- MY MODEL: Before the kid overeats sweets, they think eating lots of sweets is awesome. After overeating sweets, their brainstem changes their value / valence function, and now they think eating lots of sweets is undesirable.
- YOUR MODEL (I think): Before the kid overeats sweets, they think eating lots of sweets is awesome—but they are wrong! They do not know themselves; they misunderstand their own preferences. And after overeating sweets and feeling sick, they self-correct this mistake.
(Do you agree?)
Eh, not really, no. I mean, it's a fair caricature of my perspective, but I'm not ready to sign off on it as an ITT pass because I don't think it's sufficiently accurate for the conversation at hand. For one, I think your term "ill-considered" is much better than "wrong". "Wrong" isn't really right. But more importantly, you portray the two models as if they're alternatives that are mutually exclusive, whereas I see that as requiring a conflation of the two different senses of the terms that are being used.
I also agree with what you describe as your model, and I see my model as starting there and building on top of it. You build on top of it too, but don't include it in your self description because in your model it doesn't seem to be central, whereas in mine it is. I think we agree on the base layer and differ on the stuff that wraps around it.
I'm gonna caricature your perspective now, so let me know if this is close and where I go wrong:
You see the statement of "I don't want my values to change because that means I'd optimize for something other than my [current] values" as a thing that tautologically applies to whatever your values are, including your desires for sweets, and leads you to see "Fulfilling my desires for sweets makes me feel icky" as something that calls for a technological solution rather than a change in values. It also means that any process changing our values can be meaningfully depicted as a red devil-horned demon. What the demon "wants" is immaterial. He's evil, our job is to minimize the effect he's able to have, keep our values for sweets, and if we can point an AGI at "human flourishing" we certainly don't want him coming in and fucking that up.
Is that close, or am I missing something important?
It seems intuitively obvious to me that it is possible for a person to think that the actual moon is valuable even if they can’t see it, and vice-versa. Are you disagreeing with that?
No, I'm saying something different.
I'm saying that if you don't know what the moon is, you can't care about the moon because you don't have any way of representing the thing in order to care about it. If you think the moon is a piece of paper, then what you will call "caring about the moon" is actually just caring about that piece of paper. If you try to "care about people being happy", and you can't tell the difference between a genuine smile and a "hide the pain Harold" smile, then in practice all you can care about is a Goodharted upwards curvature of the lips. To the extent that this upwards curvature of the lips diverges from genuine happiness, you will demonstrate care towards the former over the latter.
In order to do a better job than that, you need to be able to perceive happiness better than that. And yes, you can look back and say "I was wrong to care instrumentally about crude approximations of a smile", but that will require perceiving the distinction there and you will still be limited by what you can see going forward.
Here, you seem to be thinking of “valuing things as a means to an end”, whereas I’m thinking of “valuing things” full stop. I think it’s possible for me to just think that the moon is cool, in and of itself, not as a means to an end. (Obviously we need to value something in and of itself, right? I.e., the means-end reasoning has to terminate somewhere.)
I think it's worth distinguishing between "terminal" in the sense of "not aware of anything higher that it serves"/"not tracking how well it serves anything higher" and "terminal" in the sense of "There is nothing higher being served, which will change the desire once noticed and brought into awareness".
"Terminal" in the former sense definitely exists. Fore example, little kids will value eating sweets in a way that is clearly disjoint and not connected to any attempts to serve anything higher. But then when you allow them to eat all the sweets they want, and they feel sick afterwards, their tastes in food start to cohere towards "that which serves their body well" -- so it's clearly instrumental to having a healthy and well functioning body even if the kid isn't wise enough to recognize it yet.
When someone says "I value X terminally", they can pretty easily know it in the former sense, but to get to the latter sense they would have to conflate their failure to imagine something that would change their mind with an active knowledge that no such thing exists. Maybe you don't know what purpose your fascination with the moon serves so you're stuck relating to it as a terminal value, but that doesn't mean that there's no knowledge that could deflate or redirect your interest -- just that you don't know what it is.
It's also worth noting that it can go the other way too. For example, the way I care about my wife is pretty "terminal like", in that when I do it I'm not at all thinking "I'm doing this because it's good for me now, but I need to carefully track the accounting so that the moment it doesn't connect in a visible way I can bail". But I didn't marry her willy nilly. If when I met her, she had showed me that my caring for her would not be reciprocated in a similar fashion, we wouldn't have gone down that road.
I brought up the super-cool person just as a way to install that value in the first place, and then that person leaves the story, you forget they exist. Or it can be a fictional character if you like. Or you can think of a different story for value-installation, maybe involving an extremely happy dream about the moon or whatever.
Well, the super-cool person is demonstrating admirable qualities and showing that they are succeeding in things you think you want in life. If you notice "All the cool people wear red!" you may start valuing red clothes in a cargo culting sort of way, but that doesn't make it a terminal value or indefinitely stable. All it takes is for your perspective to change and the meaning (and resulting valuation) changes. That's why it's possible to have scary experiences install phobias that can later be reverted by effective therapy.
I want to disentangle three failure modes that I think are different.
I don't think the distinctions you're drawing cleave reality at the joints here.
For example, if your imagined experience when deciding to buy a burrito is eating a yummy burrito, and what actually happens is that you eat a yummy burrito and enjoy it... then spend the next four hours in the bathroom erupting from both ends... and find yourself not enjoying the experience of eating a burrito from that sketchy burrito stand again after that... is that a "short vs long term" thing or a "your decisions don't lead to your preferences being satisfied" thing, or a "valuing the wrong thing" thing? It seems pretty clear that the decision to value eating that burrito was a mistake, that the problem wasn't noticed in the short term, and that ultimately your preferences weren't satisfied.
To me, the important part is that when you're deciding which option to buy, you're purchasing based on false advertising. The picture in your mind which you are using to determine appropriate motivation does not accurately convey the entire reality of going with that option. Maybe that's because you were neglecting to look far enough in time, or far enough in implications, or far enough from your current understanding of the world. Maybe you notice, or maybe you don't. If you wouldn't have wanted to make the decision when faced with an accurate depiction of all the consequences, then an accurate depiction of the consequences will reshape those desires and you won't want to stand by them.
I think the thing you're noticing with the synthol example is that telling him "You're not fooling anyone bro" is unlikely to dissolve the desire to use synthol the way "The store is closed; they close early on Sundays" tends to deflate peoples desire to drive to the store. But that doesn't actually mean that the desire to use synthol terminates at "to have weird bulgy arms" or that it's a mere coincidence that men always desire their artificial bulges where their glamour muscles are and that women always desire their artificial bulges where their breasts are.
There are a lot of ways for the "store is closed" thing to fail to dissolve the desire to go to the store too even if it's instrumental to obtaining stuff that the store sells. Maybe they don't believe you. Maybe they don't understand you; maybe their brain doesn't know how to represent concepts like "the store is closed". Maybe they want to break in and steal the stuff. Or yeah, maybe they just want to be able to credibly tell their wife they tried and it's not about actually getting the stuff. In all of those cases, the desire to drive to the store is in service of a larger goal, and the reason your words don't change anything is that they don't credibly change the story from the perspective of the person having this instrumental goal.
Whether we want to be allowed to pursue and fulfil our ultimately misguided desires is a more complicated question. For example, my kid gets to eat whatever she wants on Sundays, even though I often recognize her choices to be unwise before she does. I want to raise her with opportunities to cohere her desires and opportunities to practice the skill in doing so, not with practice trying to block coherence because she thinks she "knows" how they "should" cohere. But if she were to want to play in a busy street I'm going to stop her from fulfilling those desires. In both cases, it's because I confidently predict that when she grows up she'll look back and be glad that I let her pursue her foolish desires when I did, and glad I didn't when I didn't. It's also what I would want for myself, if I had some trustworthy being far wiser than I which could predict the consequences of letting me pursue various things.
Q: Wouldn’t the AGI self-modify to make itself falsely believe that there’s a lot of human flourishing? Or that human flourishing is just another term for hydrogen?
A: No, for the same reason that, if a supervillain is threatening to blow up the moon, and I think the moon is super-cool, I would not self-modify to make myself falsely believe that “the moon” is a white circle that I cut out of paper and taped to my ceiling. [...] I’m using my current value function to evaluate the appeal (valence) of thoughts.
It's worth noting that humans fail at this all the time.
Q: Wait hang on a sec. [...] how do you know that those neural activations are really “human flourishing” and not “person saying the words ‘human flourishing’”, or “person saying the words ‘human flourishing’ in a YouTube video”, etc.?
Humans screw this up all the time too, and these two failure modes are related.
You can't value what you can't perceive, and when your only ability to perceive "the moon" is the image you see when you look up, then that is what you will protect, and that white circle of paper will do it for you.
For an unusually direct visual example: bodybuilding is supposedly about building a muscular body, but sometimes people will use synthol to create the false appearance of muscle in a way that is equivalent to taping a square piece of paper to the ceiling and calling it a "moon". The fact that it doesn't even look a little like real muscle hints that it's probably a legitimate failure to notice what they want to care about rather than simply being happy to fool other people into thinking they're strong.
For a less direct but more pervasive example, people will value "peace and harmony" within their social groups, but due to myopia this often turns into short-sighted avoidance of conflict, and into behaviors that make conflict less solvable and produce less peace and harmony.
With enough experience, you might notice that protecting the piece of paper on the ceiling doesn't get that super cool person to approve of your behavior, and you might learn to value something more tied to the actual moon. Just as with more experience consuming excess sweets, you might learn that the way you feel after doesn't seem to go with getting what your body wanted, and you might find your tastes shifting in wiser directions.
But people aren't always that open to this change.
If I say "Your paper cutout isn't the moon, you fool", listening to me means you're going to have to protect a big rock a bazillion miles beyond your reach, and you're more likely to fail that than protecting the paper you put up. And guess what value function you're using to decide whether to change your values here? Yep, that one saying that the piece of paper counts. You're offering less chance of having "a moon", and relative to the current value system which sees a piece of paper as a valid moon, that's a bad deal. As a result, the shallowness and mis-aimedness of the value gets protected.
In practice, it happens all the time. Try explaining to someone that what they're calling "peace and harmony values" is really just cowardice and is actively impeding work towards peace and harmony, and see how easy it is, for example.
It's true that "A plan is a type of thought, and I'm using my current value function to evaluate the appeal (valence) of thoughts" helps protect well-formed value systems from degenerating into wireheading, but it also works to prevent development into values which preempt wireheading, and we tend not to be so fully developed that excellently fulfilling our current values wouldn't constitute wireheading of some form. It's also the case that when stressed, people will sometimes cower away from their more developed goals ("Actually, the moon is a big rock out in space...") and cling to their shallower and easier-to-fulfill goals ("This paper is the moon. This paper is the moon..."). They'll want not to, but it'll happen all the same when there's enough pressure to.
Sorting out how to best facilitate this process of "wise value development" so as to dodge these failure modes strikes me as important.
I don't think people do, in general. Not as any sort of separate instinctual terminal value somehow patched into our utility function before we're born.
It can be learned, and well-socialized people tend to learn it to one extent or another, but young kids are sure selfish and short-sighted. And people placed in situations where it's not obvious to them why cooperation is in their best interest don't tend to act like they intrinsically value being cooperative. That's not to say I think people are consciously tallying everything and waiting for a moment to ditch the cooperative BS and go do what they really want to do. I mean, that's obviously a thing too, but there's more to it than that.
People can learn to fake caring, but people can also learn to genuinely care about other people -- in the kind of way where they will do good by the people they care about even when given the power not to. It's not that their utility function is made up of a "selfish" set consisting of god knows what, with a term for "cooperation" tacked on. It's that we start out with short-sighted impulses like "stay warm, get fed", and along the way we build a more coherent structure of desires by making trades of the sorts "I'll value patience over one marshmallow now, and receive two marshmallows in the future" and "You care a bit about my things, and I'll care a bit about yours". We start out as ineffective shits that can only cry when our immediate impulses aren't met, and we can end up as people who will voluntarily go cold and hungry, even without temptation or suffering, in order to provide for the wellbeing of our friends and family -- not because we reason in each moment that this is the best way to stay warm, but because we have learned not to care so much about being a little cold now and then relative to the long-term wellbeing of our friends and family.
What I'm saying is that at the point where a person gains sufficient power over reality that they no longer have to deceive others in order to gain support and avoid punishment, the development of their desires will stop and their behaviors will be best predicted by the trades they actually made. If they managed to fake their whole way there from childhood, you will get childish behavior and childish goals. To the extent that they've only managed to succeed and acquire power by changing what they care about to be prosocial, power will not corrupt.
I don't think it's necessary to posit any separate motivational drives. Once you're in a position where cooperation isn't necessary for getting what you want, then there's no incentive to cooperate or shape yourself to desire cooperative things.
It's rare-to-nonexistent in a society as large and interconnected as ours for anyone to be truly powerful enough that there's no incentive to cooperate, but we can look at what people do when they don't perceive a benefit to taking on (in part) other people's values as their own. Sure, sometimes we see embezzlement and sleeping with subordinates, which look like they'd correlate with "maximizing reproductive fitness in the EEA", but we also see a lot of bosses who are just insufferable in ways that make them less effective at their jobs, to their own detriment. We see power-tripping cops and security guards, and people being dicks to waiters; and without the "power over others" but with the same lack of obvious socializing forces, you get road rage and Twitter behavior.
The explanation that looks to me to fit better is just that people stop becoming socialized as soon as there's no longer a strong perceived force rewarding them for doing so and punishing them for failing to. When people lose the incentive to refactor their impulses, they just act on them. Sometimes that means offering a role in a movie in exchange for sexual favors, but sometimes it means completely ignoring the people they're supposed to be serving at the DMV, or taking out their bitterness on a waiter who can't do shit about it.
I guess I meant "as it applies here, specifically", given that Zack was already criticizing himself for that specific thing, and arguing for rather than against politeness norms in the specific place that I commented. I'm aware that you guys haven't been getting along too well and wouldn't expect agreement more generally, though I hadn't been following closely.
It looks like you put some work and emotional energy into this comment, so I don't want to just not respond, but it also seems like this whole thing is upsetting enough that you don't really want to be having these discussions. I'm going to err on the side of not getting into any object-level response that you might not want, but if you want to know how to get along with Zack and not find it infuriating, I think I understand his perspective (having found myself in similar shoes) well enough to explain how you can do it.
Yeah, I didn't mean that I thought you two agreed in general, just on the specific thing he was commenting on. I didn't mean to insert myself into this feud and I was kinda asking how I got here, but now that I'm here we might as well have fun with it. I think I have a pretty good feel for where you're coming from, and actually agree with a lot of it. However, agreement isn't where the fun is so I'm gonna push back where I see you as screwing up and you can let me know if it doesn't fit.
These two lines stand out to me as carrying all the weight:
I strongly disagree that pro-actively modeling one's interlocutors should be a prerequisite for being able to have a discussion.
I'm extremely wary that a culture that heavily penalizes not-sufficiently-modeling-one's-interlocutor, interferes with the process of subjecting each other's work to scrutiny.
These two lines seem to go hand in hand in your mind, but my initial response to the two is very different.
To the latter, I simply agree that there's a failure mode there and don't fault you for being extremely wary of it. To the former, though... "I disagree that this thing should be necessary" is kinda a "Tough?". Either it's necessary or it isn't, and if you're focusing on what "should" be, you're neglecting what is.
I don't think I have to make the case that things aren't going well as is. And I'm not going to try to convince you that you should drop the "should" and attend to the "is" so that things run more smoothly -- that one is up to you to decide, and as much as "should" intentionally looks away from "is" and is in a sense fundamentally irrational in that way, it's sometimes computationally necessary or prudent given constraints.
But I will point out that this "should" is a sure sign that you're looking away from truth, and that it fits Duncan's accusations of what you're doing to a T. "I shouldn't have to do this in order to be able to have a discussion" sounds reasonable enough if you feel able to back up the idea that your norms are better, and it has a strong tendency to lead towards not doing the thing you "shouldn't have to" do. But when you look back at reality, that combination is "I actually do have to do this in order to have a (productive) discussion, and I'm gonna not do it, and I'm going to engage anyway". When you're essentially telling someone "Yeah, I know what I'm doing is going to piss you off, and not only am I going to do it anyway, I'm going to show that pissing you off doesn't even weigh into my decisions because your feelings are wrong", that's pretty sure to piss someone off.
It's clear that you're willing to weigh those considerations as a favor to Duncan, the way you recount asking Michael Vassar for such a favor, and that in your mind if Duncan wants you to accommodate his fragility, he should admit that this is what he's asking for and that it's a favor not an obligation -- you know, play by your rules.
And it's clear that by just accommodating everyone in this way without having the costs acknowledged (i.e. playing by his rules), you'd be giving up something you're unwilling to give up. I don't fault you there.
I agree with your framing that this is actually a conflict. There are inherent reasons why that isn't trivially avoidable, but that doesn't mean there isn't a path towards genuine cooperation -- just that you can't declare same-sidedness by fiat.
Elsewhere in the comments you gave an example of "stealing bread" as a conflict that causes "disagreements" and lying. The solution here isn't to "cooperatively" pursue conflicting goals, it's to step back and look at how to align goals. Specifically, notice that everyone is better off if there's less thieving, and cooperate on not-thieving and punishing theft. And if you've already screwed up, cooperate towards norms that make confession and rehabilitation more appealing than lying but less appealing than not-thieving in the first place.
I don't think our problems are that big here. There are conflicts of values, sure, but I don't think the attempts to push one's values over others' are generally so deliberately antisocial. In this case, for example, I think you and Duncan both more or less genuinely believe that it is the other party who is doing the antisocial acts. So rather than "one person is knowingly trying to get away with being antisocial, so of course they're not going to cooperate", I think it's better modeled as an actual disagreement that can't be trivially resolved, because both people are resorting to conflict rather than cooperation to advance their (perceived-as-righteous) goals -- while missing the fact that they're doing this, because each sees themselves as perfectly open to cooperating (within the norms they take to be objectively correct) and sees the other as irrationally and antisocially refusing (to play by rules the other never agreed to)!
I don't agree with the way that he used it, but Duncan is spot on in calling your behavior a "trauma response". I don't mean it as big-T "Trauma" like "abused as a child", but trauma in the "1 grain is a 'heap'" sense is at the core of this kind of conflict and many, many other things -- and it is more or less necessary for trauma responses to exist on both sides for these things to not fizzle out. The analogy I like to give is that psychological trauma is like plutonium and hostile acts are like neutrons.
As a toy example to illustrate the point, imagine someone steps on your toes; how do you respond? If it's a barefoot little kid, you might say "Hey kid, you're standing on my toes" and they might say "Didn't mean to, sorry!" and step off. No trauma, no problem. If it's a 300lb dude with cleats, you might shove him as hard as you can, because the damage incurred from letting him stand on your toes until you can get his attention is less acceptable. And if he's sensitive enough, he might get pissed at you for shoving him and deck you. If it becomes a verbal argument, he might say "your toes shouldn't have been there", and now it's an explicit conflict about where you get to put your toes and whether he has a right to step on them anyway if they are where you put them.
In order to not let things degenerate into conflict as the less-than-perfectly-secure, cleat-wearing giant steps on your toes, you have to be able to withstand that neutron blast without retaliating with so much of your own that it turns into a fight instead of an "I'm sorry, I didn't realize your toes were there. I'll step off them for now because I care about your toes, but we need to have a conversation about where your feet are okay to be".
This means:
1) orienting to the truth that your toes are going to take damage whether you like it or not, and that "should" can't make this untrue or unimportant.
2) maintaining connection with the larger perspective that tracks what is likely to cause conflict, what isn't, and how to cause the minimal conflict and maximum cooperation possible so that you best succeed at your goals with least sacrifice of your formerly-sacred-and-still-instrumentally-important values.
In some cases, the most truth-oriented and most effective response is going to be politely tapping the big guy on the shoulder while your feet bleed, and having a conversation after the fact about whether he needs to be more careful where he's stepping -- because acting like shoving this guy makes sense is willful irrationality.
In other cases he's smaller and more shove-able and it doesn't make sense to accept the damage, but instead of coming off like "I'm totally happy to apologize for anything I actually did wrong. I'm sorry I called you a jerk while shoving you; that was unnecessary and inappropriate [but I will conspicuously not even address the fact that you didn't like being shoved or that you spilled your drink, because #notmyproblem. I'll explain why I'm right not to give a fuck if you care to ask]", you'll at least be more able to see the value in saying things like "I'm sorry I had to shove you. I know you don't like being shoved, and I don't like doing it. You even spilled your drink, and that sucks. I wish I saw another way to protect our community's ability to receive criticism without shoving you".
This shouldn't need to be said but probably does (for others, probably not for you), so I'll say it. This very much is not me taking sides on the whole thing. It's not a "Zack is in the wrong for not doing this" or an "I endorse Duncan's norms relatively more" -- nor is it the opposite. It's just an "I see Zack as wanting me to argue that he's screwing up in a way that might end up giving him actionable alternatives that might get him more of what he wants, so I will".