## Posts

Comment on "Deception as Cooperation" 2021-11-27T04:04:56.571Z
Feature Selection 2021-11-01T00:22:29.993Z
Glen Weyl: "Why I Was Wrong to Demonize Rationalism" 2021-10-08T05:36:08.691Z
Blood Is Thicker Than Water 🐬 2021-09-28T03:21:53.997Z
Sam Altman and Ezra Klein on the AI Revolution 2021-06-27T04:53:17.219Z
Reply to Nate Soares on Dolphins 2021-06-10T04:53:15.561Z
Sexual Dimorphism in Yudkowsky's Sequences, in Relation to My Gender Problems 2021-05-03T04:31:23.547Z
Communication Requires Common Interests or Differential Signal Costs 2021-03-26T06:41:25.043Z
Less Wrong Poetry Corner: Coventry Patmore's "Magna Est Veritas" 2021-01-30T05:16:26.486Z
Unnatural Categories Are Optimized for Deception 2021-01-08T20:54:57.979Z
And You Take Me the Way I Am 2020-12-31T05:45:24.952Z
Containment Thread on the Motivation and Political Context for My Philosophy of Language Agenda 2020-12-10T08:30:19.126Z
Scoring 2020 U.S. Presidential Election Predictions 2020-11-08T02:28:29.234Z
Message Length 2020-10-20T05:52:56.277Z
Msg Len 2020-10-12T03:35:05.353Z
Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem 2020-09-17T02:23:58.869Z
Maybe Lying Can't Exist?! 2020-08-23T00:36:43.740Z
Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle 2020-07-14T06:03:17.761Z
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning 2020-06-07T07:52:09.143Z
Comment on "Endogenous Epistemic Factionalization" 2020-05-20T18:04:53.857Z
"Starwink" by Alicorn 2020-05-18T08:17:53.193Z
Zoom Technologies, Inc. vs. the Efficient Markets Hypothesis 2020-05-11T06:00:24.836Z
A Book Review 2020-04-28T17:43:07.729Z
Zeynep Tufekci on Why Telling People They Don't Need Masks Backfired 2020-03-18T04:34:09.644Z
The Heckler's Veto Is Also Subject to the Unilateralist's Curse 2020-03-09T08:11:58.886Z
Relationship Outcomes Are Not Particularly Sensitive to Small Variations in Verbal Ability 2020-02-09T00:34:39.680Z
Book Review—The Origins of Unfairness: Social Categories and Cultural Evolution 2020-01-21T06:28:33.854Z
Less Wrong Poetry Corner: Walter Raleigh's "The Lie" 2020-01-04T22:22:56.820Z
Don't Double-Crux With Suicide Rock 2020-01-01T19:02:55.707Z
Speaking Truth to Power Is a Schelling Point 2019-12-30T06:12:38.637Z
Stupidity and Dishonesty Explain Each Other Away 2019-12-28T19:21:52.198Z
Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think 2019-12-27T05:09:22.546Z
Funk-tunul's Legacy; Or, The Legend of the Extortion War 2019-12-24T09:29:51.536Z
Curtis Yarvin on A Theory of Pervasive Error 2019-11-26T07:27:12.328Z
Relevance Norms; Or, Gricean Implicature Queers the Decoupling/Contextualizing Binary 2019-11-22T06:18:59.497Z
Algorithms of Deception! 2019-10-19T18:04:17.975Z
Maybe Lying Doesn't Exist 2019-10-14T07:04:10.032Z
Schelling Categories, and Simple Membership Tests 2019-08-26T02:43:53.347Z
Status 451 on Diagnosis: Russell Aphasia 2019-08-06T04:43:30.359Z
Being Wrong Doesn't Mean You're Stupid and Bad (Probably) 2019-06-29T23:58:09.105Z
What does the word "collaborative" mean in the phrase "collaborative truthseeking"? 2019-06-26T05:26:42.295Z
The Univariate Fallacy 2019-06-15T21:43:14.315Z
Tal Yarkoni: No, it's not The Incentives—it's you 2019-06-11T07:09:16.405Z
"But It Doesn't Matter" 2019-06-01T02:06:30.624Z
Minimax Search and the Structure of Cognition! 2019-05-20T05:25:35.699Z
Where to Draw the Boundaries? 2019-04-13T21:34:30.129Z

Comment by Zack_M_Davis on The Unreasonable Feasibility Of Playing Chess Under The Influence · 2022-01-18T18:58:45.279Z · LW · GW

If I was Scott Alexander or Zvi I'd comb through those papers and wring out insight.

Huh? What's stopping you? How are Scott or Zvi relevant here, at all?

Comment by Zack_M_Davis on Blood Is Thicker Than Water 🐬 · 2022-01-11T19:10:31.252Z · LW · GW

The idea of "general education" is that it's good for ordinary people to learn lots of things that were discovered by specialists: partially because we value knowledge for its own sake, but also because it's hard to tell in advance what knowledge will end up being useful. In principle, you could reject the idea that general education is generally good, but if you're going to be consistent about what that entails, I don't think anyone who reads this website actually wants to go there. Do I really need to know that matter is made of "atoms", that have a "nucleus" composed of "uncharged" "neutrons" and "positively-charged" "protons", surrounded by "negatively-charged" "electrons"? When have I ever used any of this stuff to make money? Do I really need to know that the world is round? &c.

(Incidentally, Dick Grayson is the best Robin.)

Comment by Zack_M_Davis on The date of AI Takeover is not the day the AI takes over · 2022-01-10T16:36:41.723Z · LW · GW

I don't think we disagree about anything substantive, and I don't expect Daniel to disagree about anything substantive after reading this. It's just—

I agree that the OP's talking of PONR as a point in time doesn't make sense; a charitable read is that [...]

I don't think we should be doing charitable readings at yearly review time! If an author uses a toy model to clarify something, we want the post to say "As a clarifying toy model [...]" rather than making the readers figure it out.

Comment by Zack_M_Davis on The date of AI Takeover is not the day the AI takes over · 2022-01-10T07:04:09.201Z · LW · GW

(Thanks for this—it's important that critiques get counter-critiqued, and I think that process is stronger when third parties are involved, rather than it just being author vs. critic.)

The reason that doesn't satisfy me is that I expect the actual calculus of "influence" and "control" in real-world settings to be sufficiently complicated that there's probably not going to be any usefully identifiable "point of no return". On the contrary, if there were an identifiable PONR as a natural abstraction, I think that would be a surprising empirical fact about the world demanding deeper explanation—that the underlying calculus of influence would just happen to simplify that way, such that you could point to an event and say, "There—that's when it all went wrong", rather than there just being (say) a continuum of increasingly detailed possible causal graphs that you can compute counterfactuals with respect to (with more detailed graphs being more expensive to learn but granting more advanced planning capabilities).

If you're pessimistic about alignment—and especially if you have short timelines like Daniel—I think most of your point-of-no-return-ness should already be in the past. When, specifically? I don't see any reason to expect there to be a simple answer. You lost some measure when OpenAI launched; you lost some measure when Norbert Wiener didn't drop everything to work on the alignment problem in 1960; you lost some measure when Samuel Butler and Charles Babbage turned out to not be the same person in our timeline; you lost some measure when the ancient Greeks didn't discover natural selection ...

The post does have a paragraph mentioning continuous loss of influence and already-lost influence in the past ("Of course, influence over the future might not disappear all on one day ..."), but the reason this doesn't satisfy me as a critic is that it seems to be treated as an afterthought ("We should keep these possibilities in mind as well"), rather than being the underlying reality to which any putative "PONR" would be a mere approximation. Instead, the rhetorical emphasis is on PONR as if it were an event: "The Date of AI Takeover Is Not the Day the AI Takes Over". (And elsewhere, Daniel writes about "PONR-inducing tasks".)

But in my philosophy, "the date" and "the day" of the title are two very different kinds of entities that are hard to talk about in the same sentence. The day AI takes over actually is a physical event that happens on some specific, definite date: nanobots disassemble the Earth, or whatever. That's not subjective; the AI historian-subprocess of the future will record a definitive timestamp of when it happened. In contrast, "the date" of PONR is massively "subjective" depending on further assumptions; the AI historian-subprocesses of the future will record some sort of summary of the decision-relevant results of a billion billion ancestor simulations, but the answer is not going to fit in a 64-bit timestamp.

Maybe to Daniel, this just looks like weirdly unmotivated nitpicking ("not super relevant to the point [he] was trying to make")? But it feels like a substantive worldview difference to me.

Comment by Zack_M_Davis on Use Normal Predictions · 2022-01-09T17:04:03.015Z · LW · GW

We rationalists are very good at making predictions, and the best of us, such as Scott Alexander

This weird self-congratulatory tribalism is a completely unnecessary distraction from an otherwise informative post. Are "we" unusually good at making predictions, compared to similarly informed and motivated people who haven't pledged fealty to our social group? How do you know?

Scott Alexander is a justly popular writer, and I've certainly benefitted from reading many of his posts, but it seems cultish and bizarre to put him on a pedestal as "the best" of "us" like this as the first sentence of a post that has nothing to do with him.

Comment by Zack_M_Davis on The date of AI Takeover is not the day the AI takes over · 2022-01-04T04:16:45.147Z · LW · GW

while my post may have given the misleading impression [...] I didn't fall for that fallacy myself.

I reach for this "bad writing" excuse sometimes, and sometimes it's plausible, but in general, I'm wary of the impulse to tell critics after the fact, "I agree, but I wasn't making that mistake," because I usually expect that if I had a deep (rather than halting, fragmentary, or inconsistent) understanding of the thing that the critic was pointing at, I would have anticipated the criticism in advance and produced different text that didn't provide the critic with the opportunity, such that I could point to a particular sentence and tell the would-be critic, "Didn't I already adequately address this here?"

Comment by Zack_M_Davis on Comment on "Endogenous Epistemic Factionalization" · 2022-01-02T22:49:08.592Z · LW · GW

lays out his reasoning very clearly [...] the bar for Peer Review

I mean, it's not really my reasoning. This academic-paper-summary-as-blog-post was basically me doing "peer" review for Synthese (because I liked the paper, but was annoyed that someone would publish a paper based on computer simulations without publishing their code).

Comment by Zack_M_Davis on The date of AI Takeover is not the day the AI takes over · 2022-01-02T22:25:54.867Z · LW · GW

This post is making a valid point (the time to intervene to prevent an outcome that would otherwise occur, is going to be before the outcome actually occurs), but I'm annoyed with the mind projection fallacy by which this post seems to treat "point of no return" as a feature of the territory, rather than your planning algorithm's map.

(And, incidentally, I wish this dumb robot cult still had a culture that cared about appreciating cognitive algorithms as the common interest of many causes, such that people would find it more natural to write a post about "point of no return"-reasoning as a general rationality topic that could have all sorts of potential applications, rather than the topic specifically being about the special case of the coming robot apocalypse. But it's probably not fair to blame Kokotajlo for this.)

The concept of a "point of no return" only makes sense relative to a class of interventions. A 1 kg ball is falling, accelerating at 9.8 m/s². When is the "point of no return" at which the ball is moving fast enough that it's no longer possible to stop it from hitting the ground?

The problem is underspecified as stated. If we add the additional information that your means of intervening is a net that can only trap objects carrying less than X kg⋅m/s of momentum, then we can say that the point of no return happens at X/9.8 seconds (when the 1 kg ball's momentum, 9.8t kg⋅m/s, reaches X). But it would be weird to talk about "the second we ball risk reducers lose the ability to significantly reduce the risk of the ball hitting the ground" as if that were an independent pre-existing fact that we could use to determine how strong of a net we need to buy, because it depends on the net strength.
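To make the relativity-to-intervention concrete, here's the toy calculation spelled out (the function and variable names are mine, not from the original comment):

```python
# Toy model: a 1 kg ball in free fall, and a net rated to stop objects
# carrying at most some threshold of momentum. The "point of no return"
# comes out as a function of the net, not of the ball alone.

G = 9.8     # m/s^2, gravitational acceleration
MASS = 1.0  # kg

def point_of_no_return(net_max_momentum: float) -> float:
    """Seconds of free fall after which a net that can absorb at most
    `net_max_momentum` (kg*m/s) can no longer catch the ball."""
    # momentum after t seconds of free fall: MASS * G * t
    return net_max_momentum / (MASS * G)

# The same falling ball has a different "PONR" for every net you might buy:
weak_net = point_of_no_return(49.0)    # ≈ 5 seconds
strong_net = point_of_no_return(98.0)  # ≈ 10 seconds
```

There is no fact of the matter about "the" point of no return until you've fixed the intervention class; changing the net changes the answer.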

Comment by Zack_M_Davis on $1000 USD prize - Circular Dependency of Counterfactuals · 2022-01-01T18:06:06.754Z · LW · GW

Maybe I'm explaining it badly? I'm trying to point to the Judea Pearl thing in my own words. The claim is not that causality "just is" conditional independence relationships. (Pearl repeatedly and explicitly emphasizes that causal concepts are distinct from statistical concepts and require stronger assumptions.) Do you have an issue with the graph formalism itself (as an explanation of the underlying reality of how causality and counterfactuals work), separate from practical concerns about how one would learn a particular graph?

Comment by Zack_M_Davis on $1000 USD prize - Circular Dependency of Counterfactuals · 2022-01-01T17:31:22.851Z · LW · GW

I think this is a solved problem. Are you familiar with the formalization of causality in terms of Bayesian networks? (You have enough history on this website that you've probably heard of it!)

Make observations using sensors. Abstract your sensory data into variables: maybe you have a weather variable with possible values RAINY and SUNNY, a sprinkler variable with possible values ON and OFF, and a sidewalk variable with possible values WET and DRY. As you make more observations, you can begin to learn statistical relationships between your variables: maybe weather and sprinkler are independent, but conditionally dependent given the value of sidewalk. It turns out that you can summarize this kind of knowledge in the form of a directed graph: weather → sidewalk ← sprinkler. (I'm glossing over a lot of details: a graph represents conditional-independence relationships in the joint distribution over your variables, but the distribution doesn't uniquely specify a graph.)

But once you've learned this graphical model representing the probabilistic relationships between variables which represent abstractions over your sensory observations, then you can construct a similar model that fixes a particular variable to have a particular value, but keeps everything else the same.

Why would you do that? Because such an altered model is useful for decisionmaking if the system-that-you-are is one of the variables in the graph. The way you compute which decision to output is based on a model of how the things in your environment depend on your decision, and it's possible to learn such a model from previous observations, even though you can't observe the effects of your current decision in advance of making it.

And that's what counterfactuals are! I don't think this is meaningfully circular: we've described how the system works in terms of lower-level components. (I've omitted a lot of details, but we can totally write computer programs that do this stuff.)
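A minimal sketch of the "fix a particular variable, keep everything else the same" surgery, using the collider graph above (all of the numbers below are made up for illustration; the discussion doesn't specify any distributions):

```python
# Collider graph: weather -> sidewalk <- sprinkler.
# Invented parameters for the parent variables:
P_RAINY = 0.3  # P(weather = RAINY)
P_ON = 0.4     # P(sprinkler = ON)

def p_wet(rainy: bool, sprinkler_on: bool) -> float:
    """P(sidewalk = WET | its parents in the graph)."""
    return 0.95 if (rainy or sprinkler_on) else 0.05

def p_sidewalk_wet(p_on: float) -> float:
    """Marginal P(sidewalk = WET), summing out the parents."""
    total = 0.0
    for rainy, p_r in [(True, P_RAINY), (False, 1 - P_RAINY)]:
        for on, p_s in [(True, p_on), (False, 1 - p_on)]:
            total += p_r * p_s * p_wet(rainy, on)
    return total

observed = p_sidewalk_wet(P_ON)   # the learned model, as-is
intervened = p_sidewalk_wet(1.0)  # do(sprinkler := ON): same graph,
                                  # but sprinkler is pinned to ON
```

(Because sprinkler happens to have no parents in this graph, intervening and conditioning coincide here; the surgery earns its keep when the intervened variable has incoming arrows to sever.)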

Comment by Zack_M_Davis on Solve Corrigibility Week · 2021-12-04T22:02:14.414Z · LW · GW

As a starting point, it might help to understand exactly where people's naïve intuitions about why corrigibility should be easy, clash with the technical argument that it's hard.

For me, the intuition goes like this: if I wanted to spend some fraction of my effort helping dolphins in their own moral reference frame, that seems like something I could do. I could give them gifts that I can predict that they'd like (like tasty fish or a water purifier), and be conservative when I couldn't figure out what dolphins "really wanted", and be eager to accept feedback when the dolphins wanted to change how I was trying to help. If my superior epistemic vantage point let me predict that the way dolphins would respond to gifts would depend on details like what order the gifts were presented in, I might compute an average over possible gift-orderings, or I might try to ask the dolphins to clarify, but I definitely wouldn't tile the lightcone with tiny molecular happy-dolphin sculptures, because I can tell that's not what dolphins want under any sensible notion of "want".

So what I'd like to understand better is, where exactly does the analogy between "humans being corrigible to dolphins (in the fraction of their efforts devoted to helping dolphins)" and "AI being corrigible to humans" break, such that I haven't noticed yet because empathic inference between mammals still works "well enough", but won't work when scaled to superintelligence? When I try to think of gift ideas for dolphins, am I failing to notice some way in which I'm "selfishly" projecting what I think dolphins should want onto them, or am I violating some coherence axiom?

Comment by Zack_M_Davis on Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists · 2021-11-28T18:04:12.107Z · LW · GW

Thanks for this analysis! However—

if you start with a uniform prior over [0,1] for the probability of the coin coming up heads

I'm not. The post specifies "a coin that is either biased to land Heads 2/3rds of the time, or Tails 2/3rds of the time"—that is (and maybe I should have been more explicit), I'm saying our prior belief about the coin's bias is just the discrete distribution {"1/3 Heads, 2/3 Tails": 0.5, "2/3 Heads, 1/3 Tails": 0.5}.

I agree that a beta prior would be more "realistic" in the sense of applying to a wider range of scenarios (your uncertainty about a parameter is usually continuous, rather than "it's either this, or it's that, with equal probability"), but I wanted to make the math easy on myself and my readers.
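For concreteness, the posterior under that discrete prior is a few lines of arithmetic (the function name is mine):

```python
# The post's prior: the coin is either biased 2/3-Heads or 2/3-Tails,
# each hypothesis with prior probability 0.5.

def p_heads_biased(heads: int, tails: int) -> float:
    """P(coin is the 2/3-Heads coin | observed flip counts)."""
    like_h = (2 / 3) ** heads * (1 / 3) ** tails  # 2/3-Heads hypothesis
    like_t = (1 / 3) ** heads * (2 / 3) ** tails  # 2/3-Tails hypothesis
    return 0.5 * like_h / (0.5 * like_h + 0.5 * like_t)
```

Only the difference heads − tails matters: the likelihood ratio is 2^(heads−tails), so the posterior simplifies to 2^(heads−tails) / (2^(heads−tails) + 1), and a balanced record leaves you at 50/50.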

Comment by Zack_M_Davis on [Book Review] "The Bell Curve" by Charles Murray · 2021-11-06T21:52:09.879Z · LW · GW

I'm not advocating lying.

I understand that. I cited a Sequences post that has the word "lies" in the title, but I'm claiming that the mechanism described in the cited posts—that distortions on one topic can spread to both adjacent topics, and to people's understanding of what reasoning looks like—can apply more generally to distortions that aren't direct lies.

Omitting information can be a distortion when the information would otherwise be relevant. In "A Rational Argument", Yudkowsky gives the example of an election campaign manager publishing survey responses from their candidate, but omitting one question which would make their candidate look bad, which Yudkowsky describes as "cross[ing] the line between rationality and rationalization" (!). This is a very high standard—but what made the Sequences so valuable, is that they taught people the counterintuitive idea that this standard exists. I think there's a lot of value in aspiring to hold one's public reasoning to that standard.

Not infinite value, of course! If I knew for a fact that Godzilla will destroy the world if I cite a book that I otherwise would have cited as genuinely relevant, then fine, for the sake of the world, I can not cite the book.

Maybe we just quantitatively disagree on how tough Godzilla is and how large the costs of distortions are? Maybe you're happy to throw Sargon of Akkad under the bus, but when Steve Hsu is getting thrown under the bus, I think that's a serious problem for the future of humanity. I think this is actually worth a fight.

With my own resources and my own name (and a pen name), I'm fighting. If someone else doesn't want to fight with their name and their resources, I'm happy to listen to suggestions for how people with different risk tolerances can cooperate to not step on each other's toes! In the case of the shared resource of this website, if the Frontpage/Personal distinction isn't strong enough, then sure, "This is on our Banned Topics list; take it to /r/TheMotte, you guys" could be another point on the compromise curve. What I would hope for from the people playing the sneaky consequentialist image-management strategy, is that you guys would at least acknowledge that there is a conflict and that you've chosen a side.

might fill their opinion vacuum with false claims from elsewhere, or with true claims

For more on why I think not-making-false-claims is vastly too low of a standard to aim for, see "Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think" and "Heads I Win, Tails?—Never Heard of Her".

Comment by Zack_M_Davis on [Book Review] "The Bell Curve" by Charles Murray · 2021-11-06T03:47:41.589Z · LW · GW

I agree that offense-takers are calibrated against Society-in-general, not particular targets.

As a less-political problem with similar structure, consider ransomware attacks. If an attacker encrypts your business's files and will sell you the encryption key for 10 Bitcoins, do you pay (in order to get your files back, as common sense and causal decision theory agree), or do you not-pay (as a galaxy-brained updateless-decision-theory play to timelessly make writing ransomware less profitable, even though that doesn't help the copy of you in this timeline)?

It's a tough call! If your business's files are sufficiently important, then I can definitely see why you'd want to pay! But if someone were to try to portray the act of paying as pro-social, that would be pretty weird. If your Society knew how, law-abiding citizens would prefer to coordinate not to pay attackers, which is why the U.S. Treasury Department is cracking down on facilitating ransomware payments. But if that's not an option ...

our behavior [...] punishment against us [...] some other entity that we shouldn't care much about

If coordinating to resist extortion isn't an option, that makes me very interested in trying to minimize the extent to which there is a collective "us". "We" should be emphasizing that rationality is a subject matter that anyone can study, rather than trying to get people to join our robot cult and be subject to the commands and PR concerns of our leaders. Hopefully that way, people playing a sneaky consequentialist image-management strategy and people playing a Just Get The Goddamned Right Answer strategy can at least avoid being at each other's throats fighting over who owns the "rationalist" brand name.

Comment by Zack_M_Davis on [Book Review] "The Bell Curve" by Charles Murray · 2021-11-06T01:34:10.023Z · LW · GW

But there are systemic reasons why Society gets told that hypotheses about genetically-mediated group differences are offensive, and mostly doesn't (yet?) get told that technological forecasting is offensive. (If some research says Ethnicity E has higher levels of negatively-perceived Trait T, then Ethnicity E people have an incentive to discredit the research independently of its truth value—and people who perceive themselves as being in a zero-sum conflict with Ethnicity E have an incentive to promote the research independently of its truth value.)

Steven and his coalition are betting that it's feasible to "hold the line" on only censoring the hypotheses that are closely tied to political incentives like this, without doing much damage to our collective ability to think about other aspects of the world. I don't think it works as well in practice as they think it does, due to the mechanisms described in "Entangled Truths, Contagious Lies" and "Dark Side Epistemology"—you make a seemingly harmless concession one day, and five years later, you end up claiming with perfect sincerity that dolphins are fish—but I don't think it's right to dismiss the strategy as fantasy.

Comment by Zack_M_Davis on [Book Review] "The Bell Curve" by Charles Murray · 2021-11-05T07:01:17.794Z · LW · GW

The relevant actors aren't consciously being strategic about it, but I think their emotions are sensitive to whether the threat of being offended seems to be working. That's what the emotions are for, evolutionarily speaking. People are innately very good at this! When I babysit a friend's unruly 6-year-old child who doesn't want to put on her shoes, or talk to my mother who wishes I would call more often, or introspect on my own rage at the abject cowardice of so-called "rationalists", the functionality of emotions as a negotiating tactic is very clear to me, even if I don't have the same kind of deliberative control over my feelings as my speech (and the child and my mother don't even think of themselves as doing game theory at all).

(This in itself doesn't automatically negate your concerns, of course, but I think it's an important modeling consideration: animals like Godzilla may be less incentivizable than Homo economicus, but they're more like Homo economicus than a tornado or an avalanche.)

Comment by Zack_M_Davis on [Book Review] "The Bell Curve" by Charles Murray · 2021-11-03T02:48:55.693Z · LW · GW

What's with the neglect of Richard J. Herrnstein?! His name actually comes first on the cover!

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-22T03:49:21.575Z · LW · GW

Personally, I lean laissez faire on moderation: I consider banning a non-spam user from the whole website to be quite serious, and that the karma system makes a decently large (but definitely not infinite!) ratio of useless-to-useful comments acceptable. Separately from that, I admit that applying different rules to celebrities would be pretty unprincipled, but I fear that my gut feeling actually is leaning that way.

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-22T02:21:12.696Z · LW · GW

were he to be banned, he would not much be missed

False!—I would miss him. I agree that comments like the grandparent are not great, but Ilya is a bona fide subject matter expert (his Ph.D. advisor was Judea Pearl), so when he contributes references or explanations, that's really valuable. Why escalate to banning a user when individual bad comments can be safely downvoted to invisibility?

Comment by Zack_M_Davis on Feature idea: Notification when a parent comment is modified · 2021-10-21T18:33:06.654Z · LW · GW

changes their mind and updates the original comment like this: "[EDIT: Actually, B makes a good argument against]". This feature would show this information in person B's inbox, without A having to write a separate reply to their comment.

I think separate replies are usually preferable for this. (A comment is a specific thing someone said at a particular time; you shouldn't feel obligated to go back and edit something you already said in the past, just because some of the replies were good.)

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-21T16:32:53.554Z · LW · GW

It's entirely appropriate! Expressing hostility is what slurs are for!

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-21T05:59:14.322Z · LW · GW

Michael told me once that he specifically seeks out people who are high in Eysenckian psychoticism.

So, this seems deliberate.

Because high-psychoticism people are the ones who are most likely to understand what he has to say.

This isn't nefarious. Anyone trying to meet new people to talk to, for any reason, is going to preferentially seek out people who are a better rather than worse match. Someone who didn't like our robot cult could make structurally the same argument about, say, efforts to market Yudkowsky's writing (like spending $28,000 distributing copies of Harry Potter and the Methods to math contest winners): why, they're preying on innocent high-IQ systematizers and filling their heads with scary stories about the coming robot apocalypse!

I mean, technically, yes. But in Yudkowsky and friends' worldview, the coming robot apocalypse is actually real, and high-IQ systematizers are the people best positioned to understand this important threat. Of course they're going to try to market their memes to that neurotype-demographic. What do you expect them to do? What do you expect Michael to do?

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-18T18:06:52.364Z · LW · GW

Thanks for the explanation. (My comment was written from my idiosyncratic perspective of having been frequently intellectually stymied by speech restrictions, and not having given much careful thought to organizational design.)

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-18T03:28:11.893Z · LW · GW

it seems to me also the case that Michael Vassar also treated Jessica's [...] psycho[sis] as an achievement

Objection: hearsay. How would Scott know this? (I wrote a separate reply about the ways in which I think Scott's comment is being unfair.) As some closer-to-the-source counterevidence against the "treating as an achievement" charge, I quote a 9 October 2017 2:13 p.m. Signal message in which Michael wrote to me:

Up for coming by? I'd like to understand just how similar your situation was to Jessica's, including the details of her breakdown. We really don't want this happening so frequently.

(Also, just, whatever you think of Michael's many faults, very few people are cartoon villains that want their friends to have mental breakdowns.)

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-18T02:48:52.693Z · LW · GW

I talked and corresponded with Michael a lot during 2017–2020, and it seems likely that one of the psychotic breaks people are referring to is mine from February 2017? (Which Michael had nothing to do with causing, by the way.) I don't think you're being fair.

"jailbreak" yourself from it (I'm using a term I found on Ziz's discussion of her conversations with Vassar; I don't know if Vassar uses it himself)

I'm confident this is only a Ziz-ism: I don't recall Michael using the term, and I just searched my emails for jailbreak, and there are no hits from him.

again, this involves making them paranoid about MIRI/CFAR and convincing them to take lots of drugs [...] describing how it was a Vassar-related phenomenon

I'm having trouble figuring out how to respond to this hostile framing. I mean, it's true that I've talked with Michael many times about ways in which (in his view, and separately in mine) MIRI, CfAR, and "the community" have failed to live up to their stated purposes. Separately, it's also true that, on occasion, Michael has recommended I take drugs. (The specific recommendations I recall were weed and psilocybin. I always said No; drug use seems like a very bad idea given my history of psych problems.)

But, well ... if you genuinely thought that institutions and a community that you had devoted a lot of your life to building up, were now failing to achieve their purposes, wouldn't you want to talk to people about it? If you genuinely thought that certain chemicals would make your friends' lives better, wouldn't you recommend them?

Michael is a charismatic guy who has strong views and argues forcefully for them. That's not the same thing as having mysterious mind powers to "make people paranoid" or cause psychotic breaks! (To the extent that there is a correlation between talking to Michael and having psych issues, I suspect a lot of it is a selection effect rather than causal: Michael told me once that he specifically seeks out people who are high in Eysenckian psychoticism.) If someone thinks Michael is wrong about something, great: I'm sure he'd be happy to argue about it, time permitting. But under-evidenced aspersions that someone is somehow dangerous just to talk to are not an argument.

borderline psychosis, which the Vassarites mostly interpreted as success ("these people have been jailbroken out of the complacent/conformist world, and are now correctly paranoid and weird")

I can't speak for Michael or his friends, and I don't want to derail the thread by going into the details of my own situation. (That's a future community-drama post, for when I finally get over enough of my internalized silencing-barriers to finish writing it.) But speaking only for myself, I think there's a nearby idea that actually makes sense: if a particular social scene is sufficiently crazy (e.g., it's a cult), having a mental breakdown is an understandable reaction. It's not that mental breakdowns are in any way good—in a saner world, that wouldn't happen. But if you were so unfortunate as to be in a situation where the only psychologically realistic outcomes were either to fall into conformity with the other cult-members, or to have a stress-and-sleep-deprivation-induced psychotic episode as you undergo a "deep emotional break with the wisdom of [your] pack", the mental breakdown might actually be less bad in the long run, even if it's locally extremely bad.

My main advice is that if he or someone related to him asks you if you want to take a bunch of drugs and hear his pitch for why the world is corrupt, you say no.

I recommend hearing out the pitch and thinking it through for yourself. (But, yes, without drugs; I think drugs are very risky and strongly disagree with Michael on this point.)

ZD said Vassar broke them out of a mental hospital. I didn't ask them how.

(Incidentally, this was misreporting on my part, due to me being crazy at the time and attributing abilities to Michael that he did not, in fact, have. Michael did visit me in the psych ward, which was incredibly helpful—it seems likely that I would have been much worse off if he hadn't come—but I was discharged normally; he didn't bust me out.)

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-17T18:54:13.586Z · LW · GW

There could be more than one horse.

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-17T04:17:33.197Z · LW · GW

there's a big difference between saying it saves a few years vs. causes us to have a chance at all when we otherwise wouldn't. [...] it seems like most of the relevant ideas were already in the memespace

I was struck by the 4th edition of AI: A Modern Approach quoting Norbert Wiener writing in 1960 (!), "If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively ... we had better be quite sure that the purpose put into the machine is the purpose which we really desire."

It must not have seemed like a pressing issue in 1960, but Wiener noticed the problem! (And Yudkowsky didn't notice, at first.) How much better off are our analogues in the worlds where someone like Wiener (or, more ambitiously, Charles Babbage) did treat it as a pressing issue? How much measure do they have?

Comment by Zack_M_Davis on My experience at and around MIRI and CFAR (inspired by Zoe Curzi's writeup of experiences at Leverage) · 2021-10-17T03:51:21.288Z · LW · GW

I and other researchers were told not to even ask each other about what others of us were working on, on the basis that if someone were working on a secret project, they may have to reveal this fact. Instead, we were supposed to discuss our projects with an executive, who could connect people working on similar projects.

Trying to maintain secrecy within the organization like this (as contrasted to secrecy from the public) seems nuts to me. Certainly, if you have any clever ideas about how to build an AGI, you wouldn't want to put them on the public internet, where they might inspire someone who doesn't appreciate the difficulty of the alignment problem to do something dangerous.

But one would hope that the people working at MIRI do appreciate the difficulty of the alignment problem (as a real thing about the world, and not just something to temporarily believe because your current employer says so). If you want the alignment-savvy people to have an edge over the rest of the world (!), you should want them to be maximally intellectually productive, which naturally requires the ability to talk to each other without the overhead of seeking permission from a designated authority figure. (Where the standard practice of bottlenecking information and decisionmaking on a designated authority figure makes sense if you're a government or a corporation trying to wrangle people into serving the needs of the organization against their own interests, but I didn't think "we" were operating on that model.)

Comment by Zack_M_Davis on Blood Is Thicker Than Water 🐬 · 2021-10-12T16:02:19.504Z · LW · GW

Are you still going to insist that blood is thicker than water and we need to judge them by their phylogenetic group, even though this gives almost no useful information and it's almost always better to judge them by their environmental affinities?

No, of course not: we want categories that give useful information.

Did I fail as a writer by reaching for the cutesy title? (I guess I can't say I wasn't warned.) The actual text of the post—if you actually read all of the sentences in the post instead of just glancing at the title and skimming—is pretty explicit that I'm not proposing that phylogenetics is of fundamental philosophical importance ("it's not that we've 'decided' that we 'want' to define animal words based on phylogeny" [...] "we're likely to end up talking about phylogenetics as a [mere] convenience"), and I'm not saying no one should ever want to talk about convergent-evolution categories ("trees, and possibly crabs, are a case in point").

Rather, the reason I wrote this post is because I keep running into idiocies of the form "But who cares about evolutionary relatedness; I only care if it swims" (on the "Are dolphins fish?" question) or "But who cares about sex chromosomes; I only care about presentation and preferred pronouns" (on the "Are trans women women?" question). And I'm pointing out that even if genetics isn't immediately visible or salient if you glance at an organism from a distance, genetics being at the root of the causal graph buys you lots and lots of conditional independence assertions. (And if you don't know what conditional independence is and why it matters, then you have no business having positions about the philosophy of language. Read the Sequences.)
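The conditional-independence point can be made concrete with a toy generative model. This is a minimal sketch with made-up numbers (not from the post): a hidden category is the root cause of two observable features, so the features are independent *given* the category but strongly correlated marginally—which is exactly what makes one observation informative about many others.

```python
# Toy root-of-the-causal-graph model: a hidden category causes two features.
# All probabilities are hypothetical, chosen only to illustrate the structure.

p_cat = {"fish": 0.5, "mammal": 0.5}
p_gills = {"fish": 0.95, "mammal": 0.01}  # P(has gills | category)
p_milk = {"fish": 0.01, "mammal": 0.95}   # P(produces milk | category)

def p_joint(gills, milk):
    """Marginal P(gills, milk), summing out the hidden category."""
    total = 0.0
    for c, pc in p_cat.items():
        pg = p_gills[c] if gills else 1 - p_gills[c]
        pm = p_milk[c] if milk else 1 - p_milk[c]
        total += pc * pg * pm  # features independent *given* the category
    return total

p_g = sum(p_joint(True, m) for m in (True, False))  # marginal P(gills)
p_m = sum(p_joint(g, True) for g in (True, False))  # marginal P(milk)

# Marginally, P(gills, milk) is far below P(gills) * P(milk): knowing one
# feature tells you a lot about the other, via the shared root cause.
print(p_joint(True, True), "vs.", p_g * p_m)
```

By construction the features factorize inside each category, so the root node is doing all the correlational work—the "lots and lots of conditional independence assertions" in question.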

In the process of writing this up, I succumbed to the temptation of running with the proverbial title as a catchy concept-handle, but I would have hoped the audience of this website was sophisticated enough to understand that it was just a catchy concept-handle for the "root of the causal graph, means more conditional independence assertions, means the clustering happens in a much higher-dimensional space" thing, and not some kind of deranged assertion that "Blood Is Infinitely Thicker Than Water" or "Blood Is Thicker Than Water In All Possible Worlds, Including Thought Experiments Where Evolution Works Differently."

And if not, at some point in the future, do they go from being obviously-mammals-you-are-not-allowed-to-argue-this to obviously-fish-you-are-not-allowed-to-argue-this in the space of a single day?

No, of course not. Like many Less Wrong readers, I, too, am familiar with the Sorites paradox.

Or might there be a very long period when they are more like mammals in some way, more like fish in others, and you're allowed to categorize them however you want based on which is more useful for you?

You should categorize them whichever way is more useful for making predictions and decisions in whatever context you're operating in.

I don't want to summarize this as "however you want", because the question of which categories are most useful for making decisions and predictions is something people can be wrong about, and moreover, something people can be motivatedly wrong about, and—as we've seen—the general laws governing which categories are useful is something people can be motivatedly wrong about.

Suppose we have three vectors in ten-dimensional space, u = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0], v = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2], and w = [3, 3, 3, 0, 0, 0, 0, 0, 0, 0]. A rich language with lots of vocabulary for describing the world will probably want both a word for a category that groups u and v together (because they match on x₁ through x₃), and a word for a category that groups u and w together (because they match on x₄ through x₁₀).

You'll probably end up talking about {u, w} more often than {u, v} because the former cluster is in a "thicker", higher-dimensional subspace, and it's not always obvious exactly which variables are "relevant." If I own the animal appearing in the photo located at the URL https://commons.wikimedia.org/wiki/File:A_white_cat.jpg, and you ask if I have a pet, I'm probably going to say "Yes, I have a cat", rather than "Yes, I have a white-animal" or "Yes, I have an endotherm", even though the latter two statements are both true and it's good to have language for them. I think this kind of thing is why English ended up allocating a short codeword bird to a category that includes penguins and excludes bats, even though sparrows and bats have something very salient in common (flying) that penguins don't.
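The subspace "thickness" claim can be checked mechanically. A minimal sketch, using the three vectors defined above:

```python
# The three example vectors in ten-dimensional space.
u = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
v = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2]
w = [3, 3, 3, 0, 0, 0, 0, 0, 0, 0]

def matching_coords(a, b):
    """1-based indices of the coordinates on which two vectors agree."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y]

print(matching_coords(u, v))  # [1, 2, 3]: the {u, v} category spans 3 dimensions
print(matching_coords(u, w))  # [4, 5, 6, 7, 8, 9, 10]: {u, w} spans 7
```

The {u, w} cluster agrees on seven of ten coordinates versus three for {u, v}, so a membership claim about {u, w} licenses more than twice as many predictions.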

Unfortunately, sometimes the same word/symbol ends up getting attached to two different categories—that's why dictionaries have a list of numbered definitions for a word. Dolphins aren't fish(1) (cold-blooded water-dwelling vertebrate with fins and gills), but they are fish(2) (water-dwelling animals more generally). Bananas aren't berries in the culinary sense, but they are berries in the botanical sense. Obviously, it would be retarded to get in a fight over whether bananas "are berries": it depends on what you mean by the word, and which meaning is relevant can usually be deduced from whether we're baking a pie or studying biology, and if it's not clear which meaning is intended, you can say something like "Oh, sorry, I meant the botanical sense" to clarify.

All this is utterly trivial for people who are actually trying to communicate—who want to be clear about which probabilistic model they're trying to point to. But I think it's important to articulate the underlying principles rather than saying "you're allowed to categorize them however you want", because sometimes people don't want to be clear: suppose someone were to say "Dolphins are fish because if they weren't, then I would be very sad and might kill myself. You ought to accept an unexpected aquatic mammal or two deep inside the conceptual boundaries of what would normally be considered fish if it'll save someone's life; there's no rule of rationality saying that you shouldn't, and there are plenty of rules of human decency saying that you should."

This emotional-blackmail maneuver is obviously a very different kind of thing than straightforwardly saying "Dolphins are fish in the second sense, but not the first sense; it's important to disambiguate which sense you mean when it's not clear from context." If some intellectually dishonest coward were to try to pass off the emotional-blackmail attempt as a serious philosophy argument, they would be laughed out of the room. Right?

I've tried to address your point about psychiatry in particular at https://slatestarcodex.com/2019/12/04/symptom-condition-cause/

This is a good post, but it doesn't seem very consistent with the part of the Body Keeps the Score review that I was picking on? In response to people who say that "depression" probably isn't one thing, you reply that those research programs haven't panned out. That makes sense: if you don't know how to find a more precise model, a simple model that probably conflates some things might be the best you can do.

But that's an empirical point about what we (don't) know about "depression", not a philosophical one. I'm saying the APA's verdict on van der Kolk's "developmental trauma disorder" proposal should follow the same methodology as your reasoning about depression. If they're not persuaded by van der Kolk's evidence as a matter of science, fine; if van der Kolk's evidence doesn't matter because "that's not the kind of thing we build our categories around" as a matter of philosophy, that's insane.

I think "crabs" are in this situation right now

What reading or experience is this based off of? This sounds more like something someone would say if they've only heard the term "carcinisation" as a fun fact on the internet, rather than actually reading the Wikipedia page in detail.

Comment by Zack_M_Davis on Glen Weyl: "Why I Was Wrong to Demonize Rationalism" · 2021-10-08T17:14:46.138Z · LW · GW

Why would we want an apology? Apologies are boring. Updates are interesting!

Comment by Zack_M_Davis on Dominic Cummings : Regime Change #2: A plea to Silicon Valley · 2021-10-05T20:41:31.101Z · LW · GW

Many of the specific points in the post do seem to be copied from Moldbug/Yarvin's recent work, more so than you might guess if you're not familiar with it, and only saw Yarvin listed along with Alexander/Hanania/Sullivan/Shor at the end—not just the idea of the executive shutting down entrenched bureaucracies, but the framing of the Lincoln and Roosevelt administrations as once-in-70-years de facto regime changes, and the specific citation of Roosevelt's inaugural address as a primary source.

I think it does make sense to read it in that context. A casual reader might come away with the impression that (as ChristianKl puts it) Cummings is proposing "making the system more democratic (as in, increasing the power of democratically legitimized people)". Whereas if you know that Cummings is reading off of Yarvin's playbook, it's a lot clearer that being more democratic probably isn't the point. (In Yarvin's worldview, it's about temporarily using the forces of democracy to install a king who will govern more competently than a distributed bureaucratic oligarchy.)

Analogously, if some article talked about alignment of present-day machine-learning systems and cited Yudkowsky as one of several inspirations, but the specific points in the essay look like they were ripped from Arbital (rather than proportionately from the other four authors listed as inspirations), you'd probably be correct to infer that the author has given some thought to superintelligent-singleton-ruling-over-our-entire-future-lightcone-forever scenarios, even if the article itself can't cross that much inferential distance.

Comment by Zack_M_Davis on Blood Is Thicker Than Water 🐬 · 2021-09-30T03:55:54.579Z · LW · GW

Certainly, there are lots and lots of different aspects in which things in the world can be similar, and we want our language to have lots and lots of concepts to describe those similarities in the contexts that they happen to be relevant in; if someone were to claim that carnivore shouldn't even be a word, that would be crazy, and that's not what I'm trying to do.

Rather, I'm trying to nail down a sense in which some concepts are more "robust" than others—more useful across a wide variety of situations and contexts. If someone asked you, "Do you have any pets? If so, what kind?", and your pet was the animal depicted in the Wikimedia Commons file Felis_silvestris_catus_lying_on_rice_straw.jpg, you might reply, "Yes, I have a cat." But you probably wouldn't say, "Yes, I have a carnivore", or "Yes, I have an endotherm." You wouldn't even say "Yes, I have a furry-four-legged-tailed-mammal"—a concept which includes dogs, but which English doesn't even offer a word for! But all of those statements are true. What's going on here? Is it just an arbitrary cultural convention that people talk about owning "cats" instead of endotherms or furry-four-legged-tailed-mammals—a convention that could have just as easily gone the other way?

I argue that it's mostly not arbitrary. (You could say that the convention has a large basin of attraction.) Cat-ness is causally upstream of carnivory and four-legged-ness and furriness and warm-bloodedness and nocturnalness and whiskers and meowing and lots and lots of other specific details, including details that I don't personally know, and details that I "know" in the sense that my brain can use the information in some contexts, but of which I don't have fine-grained introspective access into the implementation details: it's a lot easier to say "That's a cat" when you see one (and be correct), than it is to describe in words exactly what your brain is doing when you identify it as a cat and not a dog or a weasel.

(The relevant Sequences post is "Mutual Information, and Density in Thingspace".)

So while it's true that the category of interest depends on context and the most useful definition can depend on the situation, concepts definable in terms of ancestry (but not even necessarily monophyletic groups) are so robustly useful across so many situations, that it makes sense that English has short words for cat and fish and monkey and prokaryote—if these words didn't already exist, people would quickly reinvent them—and we don't suffer much from "sea animals" being a composition of two words. I think common usage is actually doing something right here, and people like Scott Alexander ("if he wants to define behemah as four-legged-land-dwellers that's his right, and no better or worse than your definition of 'creatures in a certain part of the phylogenetic tree'") or Nate Soares ("The definitional gymnastics required to believe that dolphins aren't fish are staggering") who claim it's arbitrary are doing something wrong.

Comment by Zack_M_Davis on Blood Is Thicker Than Water 🐬 · 2021-09-28T15:45:12.674Z · LW · GW

(I actually agree with this. Sorry if it's confusing that the rhetorical emphasis of this post is addressing a different error.)

Comment by Zack_M_Davis on Reply to Nate Soares on Dolphins · 2021-09-28T03:25:17.636Z · LW · GW

(Circling back to the object level after a three-and-a-half-month cooldown period.)

In a new post, I explain why paraphyletic categories are actually fine.

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-27T22:16:06.299Z · LW · GW

going through the motions of making sure the hashes weren't publicly available would have been just virtue signaling

Yes. That's what I meant by "the sort of thing to do as part of the spirit of the game": in an actual nuclear (or AI) application, you'd want to pick the straightforwardly best design, not the design which was "something like an hour faster to get things going this way", right?

So as part of the wargame ritual, maybe you should expect people to leave annoying nitpicky comments in the genre of "Your hashes are visible", even if you don't think there's any real risk?

Does that seem weird? For more context on why I'm thinking this way, I thought last year's phishing attack provided us with a very valuable and educational "red team" service that it'd be fun to see continued in some form. ("Coordinate to not destroy the world" is an appropriate premise for an existential-risk-reduction community ritual, but so is intelligent adversaries making that difficult.) I'm not personally vicious enough to try to get the site nuked, but putting on a white hat and thinking about how it could be done feels on-theme.

your comment is a straightforward misapplication of security mindset. [...] There is no point in pursuing a security mindset if you are virtually certain that the thing you would be investing resources into would not be your weakest attack point.

"Misapplication" meaning you think I'm doing the security thinking wrong, or "misapplication" meaning you think the security thinking isn't appropriate given the context and costs? I think it's desirable to separate the security analysis (this-and-such design is safe against such-and-this class of attacks) from the project-management analysis (this-and-such design would cost an extra hour of dev time which our team doesn't have); external critics are often in a position to say something useful about the former but not the latter. (Although, unfortunately, my comment as written didn't successfully separate them: "the sort of thing to do" is most naturally read as an implied policy recommendation, not a just-the-facts threat analysis. Sorry!)

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-27T15:27:37.820Z · LW · GW

I deliberately refrained from mentioning this in public yesterday out of respect for the spirit of the game, but I was disappointed that the SHA-256 hashes of the launch codes were publicly visible in the source code (such that someone could try to crack them offline) rather than being stored in the database (such that all guesses have to go online through the server).

Okay, yes, I understand that this wasn't a "serious" vulnerability because the launch codes were sufficiently high-entropy (actually random, not chosen by a human) that no one is going to be able to crack the hash. (I didn't make the list this year, but my launch code from last year was 16 alphanumeric characters, which is 16 · log₂ 62 ≈ 95 bits of entropy, which I think comes out to an expectation of 7 billion years at 100 billion hashes per second? Oh, except knock off a couple orders of magnitude, because there were a hundred codes.)
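The back-of-the-envelope brute-force arithmetic can be sanity-checked in a few lines (assuming a case-sensitive alphanumeric alphabet of 26 + 26 + 10 = 62 symbols and the stated hash rate):

```python
import math

# Entropy of a uniformly random 16-character code over a 62-symbol alphabet.
alphabet_size = 62
code_length = 16
bits = code_length * math.log2(alphabet_size)  # ~95.3 bits

# Expected brute-force time: on average you try half the keyspace.
hashes_per_second = 100e9                       # 100 billion hashes/sec
expected_tries = 2 ** bits / 2
seconds = expected_tries / hashes_per_second
years = seconds / (60 * 60 * 24 * 365.25)
print(f"{bits:.1f} bits of entropy, ~{years:.1e} years to crack")
```

This comes out to roughly 7–8 billion years—consistent with the figure above, before dividing through by the hundred codes in play.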

Still, in principle, as a matter of security mindset, putting the hashes in the database lets you depend on one fewer assumption (about how much computing power the adversary has). It seems like the sort of thing to do as part of the spirit of the game, given that this kind of thing is how the world is actually going to end. (Someone works out that this-and-such AI design will perform safely as long as such-and-these assumptions are true, and then one of the assumptions turns out not to be true.)

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-27T03:33:35.022Z · LW · GW

This description indeed does nothing to distinguish between different norms like "keeping promises" or "taking the Petrov Day game seriously" (or, in fact, other, stupider norms such as "giving yourself electric shocks eight hours a day")

Okay, I think I'm reading a lot more into Sullyj3's use of the phrase "seemingly arbitrarily" than you are.

Very specific things like "taking the Petrov Day game seriously" or "giving yourself electric shocks eight hours a day" are the kind of cognitive content that we expect to come with attached justifications: depending on the game and the community, I could see either of "Take this game seriously; it's a ritual" or "Don't take this game seriously; it's just a game" being the norm. On encountering a community with such a norm, I would expect to be able to ask why they do things that way and receive a non-circular answer other than an appeal to "the way things are" (here).

In contrast, the very concept of a "promise" seems to have an inherent asymmetry to it; I'm not sure what making a "promise" would mean in a world where the norm is "You shouldn't keep promises." (This feels like the enactive analogue of the asymmetry between truth and lies, where you need a convention grounding the "true" meaning of a signal in order to even contemplate sending the signal "falsely".)

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-27T00:30:08.835Z · LW · GW

this description, as written, could equally be applied to any norm or set of norms, including such basic ones as making and keeping promises (!!).

What? No, it can't be applied equally. The norm of "Keep your promises" serves the function of making it possible for people to plan around each other's behavior. (When I say, "I'll be there," you can confidently predict that I'll be there.) It's a very general and powerful social technology.

The norm of "Take the Petrov Day game on our website very seriously" is a lot more arbitrary because it doesn't serve that general function. If people had to proactively sign up for the game and sign a sworn statement saying that they promise not to press the button, then someone who subsequently pressed the button could be busted on the grounds of violating the more basic norm of breaking a promise that they proactively and voluntarily made—but that would be a different, and much less interesting, game. In the actual game, the mods unilaterally send out codes to users whom they predict will take the game seriously. If the mods guess wrong about that, many observers would say that's "on them."

attempt to convey ingroup membership while simultaneously signaling disdain and disappointment

I mean, to be fair, the ingroup is a massive disappointment that is genuinely worthy of disdain.

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-27T00:27:00.522Z · LW · GW

my willingness to identify as a LWer that was burnt [...] HPMOR and the sequences were pretty profound influences on my development [...] frustrated at feeling like I have to abandon my identifaction as an LW rat

I've struggled a lot with this, too. The thing I keep trying to remember is that identification with the social group "shouldn't matter": you can still cherish the knowledge you gained from the group's core texts, without having to be loyal to the group as a collective (as contrasted to your loyalty to individual friends).

I don't think I've been very successful; multiple years after my alienation from "the community", I'm still hanging around leaving bitter comments about how terrible they/we are. It's very dysfunctional! Why can't I just write it off as a loss and move on?

I guess the main difficulty is that, for humans, knowledge actually isn't easily separable from a community of other people with shared vocabulary. I can't just ignore what the central rat social hierarchy is doing, because the central hierarchy nodes exert a lot of control over everyone in the world who I can really talk to.

Comment by Zack_M_Davis on Petrov Day 2021: Mutually Assured Destruction? · 2021-09-26T21:43:52.573Z · LW · GW

Probably, some high karma long-time users are no longer aligned with community goals

I mean, yes, but don't be too sure that's due to value-drift on the users' part rather than "the community."

Comment by Zack_M_Davis on A Semitechnical Introductory Dialogue on Solomonoff Induction · 2021-09-26T01:41:47.834Z · LW · GW

My guess is that "Msr." was intended as an abbreviation for monsieur (French for mister), in ignorance of the fact that the standard abbreviation is actually just M.

Comment by Zack_M_Davis on The 2021 Less Wrong Darwin Game · 2021-09-25T01:54:26.951Z · LW · GW

no clue what package I'd have to install to get pip - neither pip nor pip3 exist

On Ubuntu, at least, there's a python3-pip package, separately from the python3 package? (Other distros may be similar.) It's also supposed to be possible to install pip using Python itself.
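For concreteness, a sketch of the usual install routes (package name taken from Ubuntu; other distros and Python versions may differ):

```shell
# Debian/Ubuntu: pip for Python 3 ships in its own package
sudo apt install python3-pip

# Distro-agnostic fallback: bootstrap pip with Python's bundled ensurepip module
python3 -m ensurepip --upgrade

# Verify the result
python3 -m pip --version
```

Invoking pip as `python3 -m pip` also sidesteps the question of whether a `pip`/`pip3` wrapper script is on the PATH.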

Comment by Zack_M_Davis on What Motte and Baileys are rationalists most likely to engage in? · 2021-09-07T03:14:23.506Z · LW · GW

It doesn't help when Yudkowsky actively encourages this confusion! As he Tweeted today: "Anyways, Scott, this is just the usual division of labor in our caliphate: we're both always right, but you cater to the crowd that wants to hear it from somebody too modest to admit that, and I cater to the crowd that wants somebody out of that closet."

Just—the absolute gall of that motherfucker! I still need to finish my memoir about why I don't trust him the way I used to, but it's just so emotionally hard—like a lifelong devout Catholic denouncing the Pope. But what can you do when the Pope is actually wrong? My loyalty is to the truth, not to him.

Comment by Zack_M_Davis on What Motte and Baileys are rationalists most likely to engage in? · 2021-09-06T16:29:10.757Z · LW · GW

The very concept of a "rationalist" is an egregious one! What is a rationalist, really? The motte: "one who studies the methods of rationality, systematic methods of thought that result in true beliefs and goal achievement". The bailey: "a member of the social ingroup of Eliezer Yudkowsky and Scott Alexander fans, and their friends."

Comment by Zack_M_Davis on Value is Fragile · 2021-09-04T08:03:26.578Z · LW · GW

Sorry, the function of bringing up Three Worlds Collide was to point out the apparent contradiction in the Yudkowskian canon. Forget the story; I agree that fiction didn't happen and therefore isn't evidence.

The actual issue is that it seems like worlds shaped by the goal systems of other evolved biological creatures probably don't "contain almost nothing of worth": the lives of octopuses mean much less to me than human lives, but more than tiny molecular paperclips. The theme of "animal-like organisms that feel pleasure and pain" is something that natural selection will tend to reinvent, and the idealized values of those organisms are not a random utility function. (Do you disagree? If so, you at least face a Sorites problem on how fast value drops off as you look at our evolutionary history. Do chimpanzees matter? If not, did Homo erectus?) But if other animals aren't literally-as-valueless-as-paperclips, then some classes of AI architecture might not be, either.

Comment by Zack_M_Davis on Value is Fragile · 2021-09-04T05:12:45.102Z · LW · GW

Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.

Did anyone notice that this flatly contradicts Three Worlds Collide? The superhappies and babyeaters don't inherit from human morals at all (let alone detailedly and reliably), but the humans still regard the aliens as moral patients, having meddling preferences for the babyeater children to not be eaten, rather than being as indifferent as they would be to heaps of pebbles being scattered.

(Yes, it was fiction, but one would imagine the naturalistic metaethics of aliens were meant to be taken at face value, even if there is no Alderson drive and the evolutionary psychology resulting in baby-eaters specifically was literary casuistry.)

So if the moral of this post isn't quite right, how should it be revised?

"Any Future not shaped by a goal system with detailed reliable inheritance from human morals will be incomprehensibly alien and arbitrary-looking in a non-valued direction, even after taking into account how much you think you're 'cosmopolitan'; furthermore, AI goal systems are expected to look even more arbitrary than those of biological aliens, which would at least share the design signature of natural selection"?

How much is this modified moral weakened by potential analogies between natural selection of biological creatures, and gradient-descent or self-play AI training regimens?

Comment by Zack_M_Davis on Chantiel's Shortform · 2021-08-27T02:04:34.225Z · LW · GW

The Categories Were Made for Man, not Man for the Categories, Scott Alexander

The correctness of that post has been disputed; for an extended rebuttal, see "Where to Draw the Boundaries?" and "Unnatural Categories Are Optimized for Deception".

Comment by Zack_M_Davis on Buck's Shortform · 2021-08-25T15:38:14.411Z · LW · GW

A key psychological advantage of the "modest alignment" agenda is that it's not insanity-inducing. When I seriously contemplate the problem of selecting a utility function to determine the entire universe until the end of time, I want to die (which seems safer and more responsible).

But the problem of making language models "be honest" instead of just continuing the prompt? That's more my speed; that, I can think about, and possibly even usefully contribute to, without wanting to die. (And if someone else in the future uses honest language models as one of many tools to help select a utility function to determine the entire universe until the end of time, that's not my problem and not my fault.)

Comment by Zack_M_Davis on [AN #80]: Why AI risk might be solved without additional intervention from longtermists · 2021-08-24T18:10:19.710Z · LW · GW

Another illustration: if you're currently falling from a 90-story building, most of the expected utility is in worlds where there coincidentally happens to be a net to safely catch you before you hit the ground, or interventionist simulators decide to rescue you—even if virtually all of the probability is in worlds where you go splat and die. The decision theory looks right, but this is a lot less comforting than the interview made it sound.