Posts

The Standard Analogy 2024-06-03T17:15:42.327Z
Should I Finish My Bachelor's Degree? 2024-05-11T05:17:40.067Z
Ironing Out the Squiggles 2024-04-29T16:13:00.371Z
The Evolution of Humans Was Net-Negative for Human Values 2024-04-01T16:01:10.037Z
My Interview With Cade Metz on His Reporting About Slate Star Codex 2024-03-26T17:18:05.114Z
"Deep Learning" Is Function Approximation 2024-03-21T17:50:36.254Z
Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles 2024-03-02T22:05:49.553Z
And All the Shoggoths Merely Players 2024-02-10T19:56:59.513Z
On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche 2024-01-09T23:12:20.349Z
If Clarity Seems Like Death to Them 2023-12-30T17:40:42.622Z
Lying Alignment Chart 2023-11-29T16:15:28.102Z
Fake Deeply 2023-10-26T19:55:22.340Z
Alignment Implications of LLM Successes: a Debate in One Act 2023-10-21T15:22:23.053Z
Contra Yudkowsky on Epistemic Conduct for Author Criticism 2023-09-13T15:33:14.987Z
Assume Bad Faith 2023-08-25T17:36:32.678Z
"Is There Anything That's Worth More" 2023-08-02T03:28:16.116Z
Lack of Social Grace Is an Epistemic Virtue 2023-07-31T16:38:05.375Z
"Justice, Cherryl." 2023-07-23T16:16:40.835Z
A Hill of Validity in Defense of Meaning 2023-07-15T17:57:14.385Z
Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer 2023-07-08T18:03:49.319Z
We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society 2023-06-05T05:34:59.440Z
Bayesian Networks Aren't Necessarily Causal 2023-05-14T01:42:24.319Z
"You'll Never Persuade People Like That" 2023-03-12T05:38:18.974Z
"Rationalist Discourse" Is Like "Physicist Motors" 2023-02-26T05:58:29.249Z
Conflict Theory of Bounded Distrust 2023-02-12T05:30:30.760Z
Reply to Duncan Sabien on Strawmanning 2023-02-03T17:57:10.034Z
Aiming for Convergence Is Like Discouraging Betting 2023-02-01T00:03:21.315Z
Comment on "Propositions Concerning Digital Minds and Society" 2022-07-10T05:48:51.013Z
Challenges to Yudkowsky's Pronoun Reform Proposal 2022-03-13T20:38:57.523Z
Comment on "Deception as Cooperation" 2021-11-27T04:04:56.571Z
Feature Selection 2021-11-01T00:22:29.993Z
Glen Weyl: "Why I Was Wrong to Demonize Rationalism" 2021-10-08T05:36:08.691Z
Blood Is Thicker Than Water 🐬 2021-09-28T03:21:53.997Z
Sam Altman and Ezra Klein on the AI Revolution 2021-06-27T04:53:17.219Z
Reply to Nate Soares on Dolphins 2021-06-10T04:53:15.561Z
Sexual Dimorphism in Yudkowsky's Sequences, in Relation to My Gender Problems 2021-05-03T04:31:23.547Z
Communication Requires Common Interests or Differential Signal Costs 2021-03-26T06:41:25.043Z
Less Wrong Poetry Corner: Coventry Patmore's "Magna Est Veritas" 2021-01-30T05:16:26.486Z
Unnatural Categories Are Optimized for Deception 2021-01-08T20:54:57.979Z
And You Take Me the Way I Am 2020-12-31T05:45:24.952Z
Containment Thread on the Motivation and Political Context for My Philosophy of Language Agenda 2020-12-10T08:30:19.126Z
Scoring 2020 U.S. Presidential Election Predictions 2020-11-08T02:28:29.234Z
Message Length 2020-10-20T05:52:56.277Z
Msg Len 2020-10-12T03:35:05.353Z
Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem 2020-09-17T02:23:58.869Z
Maybe Lying Can't Exist?! 2020-08-23T00:36:43.740Z
Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle 2020-07-14T06:03:17.761Z
Optimized Propaganda with Bayesian Networks: Comment on "Articulating Lay Theories Through Graphical Models" 2020-06-29T02:45:08.145Z
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning 2020-06-07T07:52:09.143Z
Comment on "Endogenous Epistemic Factionalization" 2020-05-20T18:04:53.857Z

Comments

Comment by Zack_M_Davis on Rationalist Purity Test · 2024-07-09T21:08:58.415Z · LW · GW

This is awful. What do most of these items have to do with acquiring the map that reflects the territory? (I got 65, but that's because I've wasted my life in this lame cult. It's not cool or funny.)

Comment by Zack_M_Davis on AI #71: Farewell to Chevron · 2024-07-04T18:06:07.537Z · LW · GW

On the one hand, I also wish Shulman would go into more detail on the "Supposing we've solved alignment and interpretability" part. (I still balk a bit at "in democracies" talk, but less so than I did a couple years ago.) On the other hand, I also wish you would go into more detail on the "Humans don't benefit even if you 'solve alignment'" part. Maybe there's a way to meet in the middle??

Comment by Zack_M_Davis on Nathan Young's Shortform · 2024-06-30T17:40:09.576Z · LW · GW

It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.

What? This apology makes no sense. Of course rationalism is Lawful Neutral. The laws of cognition aren't, can't be, on anyone's side.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:37:01.412Z · LW · GW

The philosophical ideal can still exert normative force even if no humans are spherical Bayesian reasoners on a frictionless plane. The disjunction ("it must either the case that") is significant: it suggests that if you're considering lying to someone, you may want to clarify to yourself whether and to what extent that's because they're an enemy or because you don't respect them as an epistemic peer. Even if you end up choosing to lie, it's with a different rationale and mindset than someone who's never heard of the normative ideal and just thinks that white lies can be good sometimes.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:26:18.990Z · LW · GW

I definitely do not agree with the (implied) notion that it is only when dealing with enemies that knowingly saying things that are not true is the correct option

There's a philosophically deep rationale for this, though: to a rational agent, the value of information is nonnegative. (Knowing more shouldn't make your decisions worse.) It follows that if you're trying to misinform someone, it must either the case that you want them to make worse decisions (i.e., they're your enemy), or you think they aren't rational.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T00:28:58.388Z · LW · GW

white lies or other good-faith actions

What do you think "good faith" means? I would say that white lies are a prototypical instance of bad faith, defined by Wikipedia as "entertaining or pretending to entertain one set of feelings while acting as if influenced by another."

Comment by Zack_M_Davis on Matthew Barnett's Shortform · 2024-06-17T04:15:13.814Z · LW · GW

Frustrating! What tactic could get Interlocutor un-stuck? Just asking them for falsifiable predictions probably won't work, but maybe proactively trying to pass their ITT and supplying what predictions you think their view might make would prompt them to correct you, à la Cunningham's Law?

Comment by Zack_M_Davis on [deleted post] 2024-06-16T05:13:43.706Z

How did you chemically lose your emotions?

Comment by Zack_M_Davis on MIRI's June 2024 Newsletter · 2024-06-16T05:03:21.578Z · LW · GW

Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.

I'm surprised! If MIRI is mostly a Pause advocacy org now, I can see why agent foundations research doesn't fit the new focus and should be restructured. But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well? (Even if the happy-path plan is Pause and superbabies, presumably you want to hand the superbabies as much relevant prior work as possible.) Do we know how Garrabrant, Demski, et al. are going to eat??

Relatedly, is it time for another name change? Going from "Singularity Institute for Artificial Intelligence" to "Machine Intelligence Research Institute" must have seemed safe in 2013. (You weren't unambiguously for artificial intelligence anymore, but you were definitely researching it.) But if the new–new plan is to call for an indefinite global ban on research into machine intelligence, then the new name doesn't seem appropriate, either?

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:32:52.279Z · LW · GW

Simplicia: I don't really think of "humanity" as an agent that can make a collective decision to stop working on AI. As I mentioned earlier, it's possible that the world's power players could be convinced to arrange a pause. That might be a good idea! But not being a power player myself, I tend to think of the possibility as an exogenous event, subject to the whims of others who hold the levers of coordination. In contrast, if alignment is like other science and engineering problems where incremental progress is possible, then the increments don't need to be coordinated.

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:31:43.683Z · LW · GW

Simplicia: The thing is, I basically do buy realism about rationality, and realism having implications for future powerful AI—in the limit. The completeness axiom still looks reasonable to me; in the long run, I expect superintelligent agents to get what they want, and anything that they don't want to get destroyed as a side-effect. To the extent that I've been arguing that empirical developments in AI should make us rethink alignment, it's not so much that I'm doubting the classical long-run story, but rather pointing out that the long run is "far away"—in subjective time, if not necessarily sidereal time. If you can get AI that does a lot of useful cognitive work before you get the superintelligence whose utility function has to be exactly right, that has implications for what we should be doing and what kind of superintelligence we're likely to end up with.

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-06-09T21:27:21.555Z · LW · GW

In principle, yes: to the extent that I'm worried that my current study habits don't measure up to school standards along at least some dimensions, I could take that into account and try to change my habits without the school.

But—as much as it pains me to admit it—I ... kind of do expect the social environment of school to be helpful along some dimensions (separately from how it's super-toxic among other dimensions)?

When I informally audited Honors Analysis at UC Berkeley with Charles Pugh in Fall 2017, Prof. Pugh agreed to grade my midterm (and I did OK), but I didn't get the weekly homework exercises graded. I don't think it's a coincidence that I also didn't finish all of the weekly homework exercises.

I attempted a lot of them! I verifiably do other math stuff that the vast majority of school students don't. But if I'm being honest and not ideological about it (even though my ideology is obviously directionally correct relative to Society's), the social fiction of "grades" does look like it sometimes succeeds at extorting some marginal effort out of my brain, and if I didn't have my historical reasons for being ideological about it, I'm not sure I'd even regret that much more than I regret being influenced by the social fiction of GitHub commit squares.

I agree that me getting the goddamned piece of paper and putting it on a future résumé has some nonzero effect in propping up the current signaling equilibrium, which is antisocial, but I don't think the magnitude of the effect is large enough to worry about, especially given the tier of school and my geriatric condition. The story told by the details of my résumé is clearly "autodidact who got the goddamned piece of paper, eventually." No one is going to interpret it as an absurd "I graduated SFSU at age 37 and am therefore racially superior to you" nobility claim, even though that does work for people who did Harvard or MIT at the standard age.

Comment by Zack_M_Davis on Demystifying "Alignment" through a Comic · 2024-06-09T17:03:59.337Z · LW · GW

Seconding this. A nonobvious quirk of the system where high-karma users get more vote weight is that it increases variance for posts with few votes: if a high-karma user or two who don't like you see your post first, they can trash the initial score in a way that doesn't reflect "the community's" consensus. I remember the early karma scores for one of my posts going from 20 to zero (!). It eventually finished at 131.

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-03T17:17:36.806Z · LW · GW

(Thanks to John Wentworth for playing Doomimir in a performance of this at Less Online yesterday.)

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-31T02:21:57.732Z · LW · GW

Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There's a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.

An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-30T23:25:34.490Z · LW · GW

going into any detail about it doesn't feel like a useful way to spend weirdness points.

That may be a reasonable consequentialist decision given your goals, but it's in tension with your claim in the post to be disregarding the advice of people telling you to "hoard status and credibility points, and [not] spend any on being weird."

Whatever they're trying to do, there's almost certainly a better way to do it than by keeping Matrix-like human body farms running.

You've completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)

The claim under consideration is not that "keeping Matrix-like human body farms running" arises as an instrumental subgoal of "[w]hatever [AIs are] trying to do." (If you didn't have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)

Rather, the claim is that it's plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welfare to spend some resources on it, even though it's a tiny fraction of what our civilization is doing.)

Maybe you think that's implausible, but if so, there should be a counterargument explaining why Christiano is wrong. As Ryan notes, Yudkowsky seems to believe that some scenarios in which an agency with bargaining power cares about humans are plausible, describing one example of such as "validly incorporat[ing] most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don't expect Earthlings to think about validly." I regard this statement as undermining your claim in the post that MIRI's "reputation as straight shooters [...] remains intact." Withholding information because you don't trust your audience to reason validly (!!) is not at all the behavior of a "straight shooter".

Comment by Zack_M_Davis on EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · 2024-05-21T21:24:18.369Z · LW · GW

it seems to me that Anthropic has so far failed to apply its interpretability techniques to practical tasks and show that they are competitive

Do you not consider the steering examples in the recent paper to be a practical task, or do you think that competitiveness hasn't been demonstrated (because people were already doing activation steering without SAEs)? My understanding of the case for activation steering with unsupervisedly-learned features is that it could circumvent some failure modes of RLHF.

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-05-15T06:45:46.249Z · LW · GW

I think I'm judging that schoolwork that's sufficiently similar to the kind of intellectual work that I want to do anyway (or that I can otherwise get selfish benefit out of) gets its cost discounted. (It doesn't have to be exactly the same.) And that commuting on the train with a seat is 70% similar to library time. (I wouldn't even consider a car commute.)

For the fall semester, I'd be looking at "Real Analysis II", "Probability Models", "Applied and Computational Linear Algebra", and (wait for it ...) "Queer Literatures and Media".

That schedule actually seems ... pretty good? "Real Analysis II" with Prof. Schuster is the course I actually want to take, as a legitimate learning resource and challenge, but the other two math courses don't seem worthless and insulting. "Queer Literatures and Media" does seem worthless and insulting, but might present an opportunity to troll the professor, or fodder for my topic-relevant blog and unfinished novella about a young woman hating going to SFSU.

As for judgement, I think I'm integrating a small judgement-density over a large support of time and Society. The immediate trigger for me even considering this might have been that people were arguing about school and Society on Twitter in way that brought up such rage and resentment in me. Somehow, I think I would be more at peace if I could criticize schooling from the position of "... and I have a math degree" rather than "... so I didn't finish." That peace definitely wouldn't be worth four semesters, but it might be worth two.

Comment by Zack_M_Davis on Please stop publishing ideas/insights/research about AI · 2024-05-03T00:23:58.474Z · LW · GW

I think these judgements would benefit from more concreteness: that rather than proposing a dichotomy of "capabilities research" (them, Bad) and "alignment research" (us, Good), you could be more specific about what kinds of work you want to see more and less of.

I agree that (say) Carmack and Sutton are doing a bad thing by declaring a goal to "build AGI" while dismissing the reasons that this is incredibly dangerous. But the thing that makes infohazard concerns so fraught is that there's a lot of work that potentially affects our civilization's trajectory into the machine intelligence transition in complicated ways, which makes it hard to draw a boundary around "trusted alignment researchers" in a principled and not self-serving way that doesn't collapse into "science and technology is bad".

We can agree that OpenAI as originally conceived was a bad idea. What about the people working on music generation? That's unambiguously "capabilities", but it's also not particularly optimized at ending the world that way "AGI for AGI's sake" projects are. If that's still bad even though music generation isn't going to end the world (because it's still directing attention and money into AI, increasing the incentive to build GPUs, &c.), where do you draw the line? Some of the researchers I cited in my most recent post are working on "build[ing] better models of primate visual cognition". Is that wrong? Should Judea Pearl not have published? Turing? Charles Babbage?

In asking these obnoxious questions, I'm not trying to make a reductio ad absurdum of caring about risk, or proposing an infinitely slippery slope where our only choices are between max accelerationism and a destroy-all-computers Butlerian Jihad. I just think it's important to notice that "Stop thinking about AI" kind of does amount to a Butlerian Jihad (and that publishing and thinking are not unrelated)?

Comment by Zack_M_Davis on Please stop publishing ideas/insights/research about AI · 2024-05-02T18:55:16.637Z · LW · GW

I think this is undignified.

I agree that it would be safer if humanity were a collective hivemind that could coordinate to not build AI until we know how to build the best AI, and that people should differentially work on things that make the situation better rather than worse, and that this potentially includes keeping quiet about information that would make things worse.

The problem is—as you say—"[i]t's very rare that any research purely helps alignment"; you can't think about aligning AI without thinking about AI. In order to navigate the machine intelligence transition in the most dignified way, you want your civilization's best people to be doing their best thinking about the problem, and your best people can't do their best thinking under the conditions of paranoid secrecy.

Concretely, I've been studying some deep learning basics lately and have written a couple posts about things I've learned. I think this was good, not bad. I think I and my readers have a slightly better understanding of the technology in question than if I hadn't studied and hadn't written, and that better understanding will help us make better decisions in expectation.

This applies doubly so to work that aims to make AI understandable or helpful, rather than aligned—a helpful AI will help anyone

Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to persue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T06:17:58.842Z · LW · GW

Sorry, this doesn't make sense to me. The boundary doesn't need to be smooth in an absolute sense in order to exist and be learnable (whether by neural nets or something else). There exists a function from business plans to their profitability. The worry is that if you try to approximate that function with standard ML tools, then even if your approximation is highly accurate on any normal business plan, it's not hard to construct an artificial plan on which it won't be. But this seems like a limitation of the tools; I don't think it's because the space of business plans is inherently fractally complex and unmodelable.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T03:10:14.830Z · LW · GW

Unless you do conditional sampling of a learned distribution, where you constrain the samples to be in a specific a-priori-extremely-unlikely subspace, in which case sampling becomes isomorphic to optimization in theory

Right. I think the optimists would say that conditional sampling works great in practice, and that this bodes well for applying similar techniques to more ambitious domains. There's no chance of this image being in the Stable Diffusion pretraining set:

One could reply, "Oh, sure, it's obvious that you can conditionally sample a learned distribution to safely do all sorts of economically valuable cognitive tasks, but that's not the danger of true AGI." And I ultimately think you're correct about that. But I don't think the conditional-sampling thing was obvious in 2004.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T01:30:08.442Z · LW · GW

I agree, but I don't see why that's relevant? The point of the "Adversarial Spheres" paper is not that the dataset is realistic, of course, but that studying an unrealistically simple dataset might offer generalizable insights. If the ground truth decision boundary is a sphere, but your neural net learns a "squiggly" ellipsoid that admits adversarial examples (because SGD is just brute-forcing a fit rather than doing something principled that could notice hypotheses on the order of, "hey, it's a sphere"), that's a clue that when the ground truth is something complicated, your neural net is also going to learn something squiggly that admits adversarial examples (where the squiggles in your decision boundary predictably won't match the complications in your dataset, even though they're both not-simple).

Comment by Zack_M_Davis on Refusal in LLMs is mediated by a single direction · 2024-04-27T21:04:24.695Z · LW · GW

This is great work, but I'm a bit disappointed that x-risk-motivated researchers seem to be taking the "safety"/"harm" framing of refusals seriously. Instruction-tuned LLMs doing what their users ask is not unaligned behavior! (Or at best, it's unaligned with corporate censorship policies, as distinct from being unaligned with the user.) Presumably the x-risk-relevance of robust refusals is that having the technical ability to align LLMs to corporate censorship policies and against users is better than not even being able to do that. (The fact that instruction-tuning turned out to generalize better than "safety"-tuning isn't something anyone chose, which is bad, because we want humans to actively choosing AI properties as much as possible, rather than being at the mercy of which behaviors happen to be easy to train.) Right?

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-04-27T06:01:33.538Z · LW · GW

Doomimir: No, it wouldn't! Are you retarded?

Simplicia: [apologetically] Well, actually ...

Doomimir: [embarrassed] I'm sorry, Simplicia Optimistovna; I shouldn't have snapped at you like that.

[diplomatically] But I think you've grievously misunderstood what the KL penalty in the RLHF objective is doing. Recall that the Kullback–Leibler divergence represents how surprised you'd be by data from distribution , that you expected to be from distribution .

It's asymmetric: it blows up when the data is very unlikely according to , which amounts to seeing something happen that you thought was nearly impossible, but not when the data is very unlikely according to , which amounts to not seeing something that you thought was reasonably likely.

We—I mean, not we, but the maniacs who are hell-bent on destroying this world—include a penalty term in the RL objective because they don't want the updated policy to output tokens that would be vanishingly unlikely coming from the base language model.

But your specific example of threats and promises isn't vanishingly unlikely according to the base model! Common Crawl webtext is going to contain a lot of natural language reasoning about threats and promises! It's true, in a sense, that the function of the KL penalty term is to "stay close" to the base policy. But you need to think about what that means mechanistically; you can't just reason that the webtext prior is somehow "safe" in way that means staying KL-close to it is safe.

But you probably won't understand what I'm talking about for another 70 days.

Comment by Zack_M_Davis on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T06:19:36.834Z · LW · GW

Just because the defendant is actually guilty, doesn't mean the prosecutor should be able to get away with making a tenuous case! (I wrote more about this in my memoir.)

Comment by Zack_M_Davis on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T05:38:45.109Z · LW · GW

I affirm Seth's interpretation in the grandparent. Real-time conversation is hard; if I had been writing carefully rather than speaking extemporaneously, I probably would have managed to order the clauses correctly. ("A lot of people think criticism is bad, but one of the secret-lore-of-rationality things is that criticism is actually good.")

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-24T23:45:39.799Z · LW · GW

I am struggling to find anything in Zack's post which is not just the old wine of the "just" fallacy [...] learned more about the power and generality of 'next token prediction' etc than you have what they were trying to debunk.

I wouldn't have expected you to get anything out of this post!

Okay, if you project this post into a one-dimensional "AI is scary and mysterious" vs. "AI is not scary and not mysterious" culture war subspace, then I'm certainly writing in a style that mood-affiliates with the latter. The reason I'm doing that is because the picture of what deep learning is that I got from being a Less Wrong-er felt markedly different from the picture I'm getting from reading the standard textbooks, and I'm trying to supply that diff to people who (like me-as-of-eight-months-ago, and unlike Gwern) haven't read the standard textbooks yet.

I think this is a situation where different readers need to hear different things. I'm sure there are grad students somewhere who already know the math and could stand to think more about what its power and generality imply about the future of humanity or lack thereof. I'm not particularly well-positioned to help them. But I also think there are a lot of people on this website who have a lot of practice pontificating about the future of humanity or lack thereof, who don't know that Simon Prince and Christopher Bishop don't think of themselves as writing about agents. I think that's a problem! (One which I am well-positioned to help with.) If my attempt to remediate that particular problem ends up mood-affiliating with the wrong side of a one-dimensional culture war, maybe that's because the one-dimensional culture war is crazy and we should stop doing it.

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-23T18:32:22.373Z · LW · GW

For what notion is the first problem complicated, and the second simple?

I might be out of my depth here, but—could it be that sparse parity with noise is just objectively "harder than it sounds" (because every bit of noise inverts the answer), whereas protein folding is "easier than it sounds" (because if it weren't, evolution wouldn't have solved it)?

Just because the log-depth xor tree is small, doesn't mean it needs to be easy to find, if it can hide amongst vastly many others that might have generated the same evidence ... which I suppose is your point. (The "function approximation" frame encourages us to look at the boolean circuit and say, "What a simple function, shouldn't be hard to noisily approximate", which is not exactly the right question to be asking.)

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-22T05:41:45.854Z · LW · GW

This comment had been apparently deleted by the commenter (the comment display box having a "deleted because it was a little rude, sorry" deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don't think my critics are obligated to be polite to me. (I'm surprised that post authors have that power!) I'm sorry you didn't like the post.

Comment by Zack_M_Davis on [deleted post] 2024-03-19T03:21:40.007Z

whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash, likely doesn't actually follow a normal distribution in reality) [bolding mine]

The concept of measuring traits in standard deviation units did not originate in someone's roleplaying game session in 2022! Statistically literate people have been thinking in standardized units for more than a century. (If anyone has priority, it's Karl Pearson in 1894.)

If you happened to learn about it from someone's RPG session, that's fine. (People can learn things from all different sources, not just from credentialed "teachers" in officially accredited "courses.") But to the extent that you elsewhere predict changes in the trajectory of human civilization on the basis that "fewer than 500 people on earth [are] currently prepared to think [...] at a level similar to us, who read stuff on the same level" as someone's RPG session, learning an example of how your estimate of the RPG session's originality was a reflection of your own ignorance should make you re-think your thesis.

Comment by Zack_M_Davis on 'Empiricism!' as Anti-Epistemology · 2024-03-18T20:57:08.696Z · LW · GW

saddened (but unsurprised) to see few others decrying the obvious strawmen

In general, the "market" for criticism just doesn't seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).

I wonder which part of the criticism market is failing: is it more that people don't agree about what constitutes a flaw, or that authors don't have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw ("needs examples" guy, "reward is not the optimization target" guy, "categories aren't arbitrary" guy, &c.), with very limited reaction from authors or imitation by other potential critics.

Comment by Zack_M_Davis on jeffreycaruso's Shortform · 2024-03-13T16:21:25.055Z · LW · GW

I mean, I agree that there are psycho-sociological similarities between religions and the AI risk movement (and indeed, I sometimes pejoratively refer to the latter as a "robot cult"), but analyzing the properties of the social group that believes that AI is an extinction risk is a separate question from whether AI in fact poses an extinction risk, which one could call Armageddon. (You could spend vast amounts of money trying to persuade people of true things, or false things; the money doesn't care either way.)

Obviously, there's not going to be a "proof" of things that haven't happened yet, but there's lots of informed speculation. Have you read, say, "The Alignment Problem from a Deep Learning Perspective"? (That may not be the best introduction for you, depending on the reasons for your skepticism, but it's the one that happened to come to mind, which is more grounded in real AI research than previous informed speculation that had less empirical data to work from.)

Comment by Zack_M_Davis on My Clients, The Liars · 2024-03-06T17:53:22.333Z · LW · GW

Why are you working for the prosecutors?

This is a pretty reasonable question from the client's perspective! When I was in psychiatric prison ("hospital", they call it a "hospital") and tried to complain to the staff about the injustice of my confinement, I was told that I could call "patient's rights".

I didn't bother. If the staff wasn't going to listen, what was the designated complaint line going to do?

Later, I found out that patient's rights advocates apparently are supposed to be independent, and not just a meaningless formality. (Scott Alexander: "Usually the doctors hate them, which I take as a pretty good sign that they are actually independent and do their job.")

This was not at all obvious from the inside. I can only imagine a lot of criminal defendants have a similar experience. Defense attorneys are frustrated that their clients don't understand that they're trying to help—but that "help" is all within the rules set by the justice system. From the perspective of a client who doesn't think he did anything particularly wrong (whether or not the law agrees), the defense attorney is part of the system.

I think my intuition was correct to dismiss patient's rights as useless. I'm sure they believe that they're working to protect patients' interests, and would have been frustrated that I didn't appreciate that. But what I wanted was not redress of any particular mistreatment that the system recognized as mistreatment, but to be let out of psych jail—and on that count, I'm sure patient's rights would have told me that the evidence was harmful to my case. They were working for the doctors, not for me.

Comment by Zack_M_Davis on Many arguments for AI x-risk are wrong · 2024-03-05T18:29:47.325Z · LW · GW

I can’t address them all, but I [...] am happy to dismantle any particular argument

You can't know that in advance!!

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T07:48:59.289Z · LW · GW

IQ seems like the sort of thing Feynman could be "honestly" motivatedly wrong about. The thing I'm trying to point at is that Feynman seemingly took pride in being a straight talker, in contrast to how Yudkowsky takes pride in not lying.

These are different things. Straight talkers sometimes say false or exaggerated things out of sloppiness, but they actively want listeners to know their reporting algorithm. Prudently selecting which true sentences to report in the service of a covert goal is not lying, but it's definitely not straight talk.

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T04:42:55.467Z · LW · GW

Yes, that would be ridiculous. It would also be ridiculous in a broadly similar way if someone spent eight years in the prime of their life prosecuting a false advertising lawsuit against a "World's Best" brand ice-cream for not actually being the best in the world.

But if someone did somehow make that mistake, I could see why they might end up writing a few blog posts afterwards telling the Whole Dumb Story.

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T03:41:25.164Z · LW · GW

You are perhaps wiser than me. (See also footnote 20.)

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-02T22:07:41.212Z · LW · GW

(I think this is the best and most important post in the sequence; I suspect that many readers who didn't and shouldn't bother with the previous three posts, may benefit from this one.)

Comment by Zack_M_Davis on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-28T07:12:17.531Z · LW · GW

I second the concern that using "LeastWrong" on the site grants undue legitimacy to the bad "than others" interpretation of the brand name (as contrasted to the intended "all models are wrong, but" meaning). "Best Of" is clear and doesn't distort the brand.

Comment by Zack_M_Davis on Less Wrong automated systems are inadvertently Censoring me · 2024-02-27T06:44:20.582Z · LW · GW

Would you agree with the statement that your meta-level articles are more karma-successful than your object-level articles? Because if that is a fair description, I see it as a huge problem.

I don't think this is a good characterization of my posts on this website.

If by "meta-level articles", you mean my philosophy of language work (like "Where to Draw the Boundaries?" and "Unnatural Categories Are Optimized for Deception"), I don't think success is a problem. I think that was genuinely good work that bears directly on the site's mission, independently of the historical fact that I had my own idiosyncratic ("object-level"?) reasons for getting obsessed with the philosophy of language in 2019–2020.[1]

If by "object-level articles", you mean my writing on my special-interest blog about sexology and gender, well, the overwhelming majority of that never got a karma score because it was never cross-posted to Less Wrong. (I only cross-post specific articles from my special-interest blog when I think they're plausibly relevant to the site's mission.)

If by "meta-level articles", you mean my recent memoir sequence which talks about sexology and the philosophy of language and various autobiographical episodes of low-stakes infighting among community members in Berkeley, California, well, those haven't been karma-successful: parts 1, 2, and 3 are currently[2] sitting at 0.35, 0.08 (!), and 0.54 karma-per-vote, respectively.

If by "meta-level articles", you mean posts that reply to other users of this website (such as "Contra Yudkowsky on Epistemic Conduct for Author Criticism" or "'Rationalist Discourse' Is Like 'Physicist Motors'"), I contest the "meta level" characterization. I think it's normal and not particularly meta for intellectuals to write critiques of each other's work, where Smith writes "Kittens are Cute", and Jones replies in "Contra Smith on Kitten Cuteness". Sure, it would be possible for Jones to write a broadly similar article, "Kittens Aren't Cute", that ignores Smith altogether, but I think that's often a worse choice, if the narrow purpose of Jones's article is to critique the specific arguments made by Smith, notwithstanding that someone else might have better arguments in favor of the Cute Kitten theory that have not been heretofore considered.

You're correct to notice that a lot of my recent work has a cult-infighting drama angle to it. (This is very explicit in the memoir sequence, but it noticeably leaks into my writing elsewhere.) I'm pretty sure I'm not doing it for the karma. I think I'm doing it because I'm disillusioned and traumatized from the events described in the memoir, and will hopefully get over it after I've got it all written down and out of my system.

There's another couple posts in that sequence (including this coming Saturday, probably). If you don't like it, I hereby encourage you to strong-downvote it. I write because I selfishly have something to say; I don't think I'm entitled to anyone's approval.


  1. In some of those posts, I referenced the work of conventional academics like Brian Skyrms and others, which I think provides some support for the notion that the nature of language and categories is a philosophically rich topic that someone might find significant in its own right, rather than being some sort of smokescreen for a hidden agenda. ↩︎

  2. Pt. 1 actually had a much higher score (over 100 points) shortly after publication, but got a lot of downvotes later after being criticized on Twitter. ↩︎

Comment by Zack_M_Davis on Communication Requires Common Interests or Differential Signal Costs · 2024-02-27T04:40:48.827Z · LW · GW

Personal whimsy. Probably don't read too much into it. (My ideology has evolved over the years such that I think a lot of the people who are trying to signal something with the generic feminine would not regard me as an ally, but I still love the æsthetic.)

Comment by Zack_M_Davis on Less Wrong automated systems are inadvertently Censoring me · 2024-02-23T16:29:32.524Z · LW · GW

Zack cannot convince us [...] if you disagree with him, that only proves his point

I don't think I'm doing this! It's true that I think it's common for apparent disagreements to be explained by political factors, but I think that claim is itself something I can support with evidence and arguments. I absolutely reject "If you disagree, that itself proves I'm right" as an argument, and I think I've been clear about this. (See the paragraph in "A Hill of Validity in Defense of Meaning" starting with "Especially compared to normal Berkeley [...]".)

If you're interested, I'm willing to write more words explaining my model of which disagreements with which people on which topics are being biased by which factors. But I get the sense that you don't care that much, and that you're just annoyed that my grudge against Yudkowsky and a lot of people with Berkeley is too easily summarized as being with an abstracted "community" that you also happen to be in even though this has nothing to do with you? Sorry! I'm not totally sure how to fix this. (It's useful to sometimes be able to talk about general cultural trends, and being specific about which exact sub-sub-clusters are and are not guilty of the behavior being criticized would be a lot of extra wordcount that I don't think anyone is interested in.)

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-02-20T06:27:08.698Z · LW · GW

Simplicia: Where does "empirical evidence" fall on the sliding scale of rigor between "handwavy metaphor" and "mathematical proof"? The reason I think the KL penalty in RLHF setups impacts anything we care about isn't mostly because the vague handwaving sounds plausible, but because of data such as that presented in Fig. 5 of Stiennon et al. 2020. They varied the size of the KL penalty of an LLM RLHF'd for a summarization task, and found about what you'd expect from the vague handwaving: as the KL penalty decreases, the reward model's predicted quality of the output goes up (tautologically), but actual preference of human raters when you show them the summaries follows an inverted-U curve, where straying from the base policy a little is good, but straying farther is increasingly bad, as the system overoptimizes on what looks good to the reward model, which was only a proxy for the true goal.

(You can see examples of the overoptimized summaries in Table 29 on the last page of the paper. Apparently the reward model really liked tripled question marks and the phrase "pls halp"??? I weakly suspect that these are the kind of "weird squiggles" that would improve with scaling up the reward model, similarly to how state-of-the-art image generators lack the distortions and artifacts of their compute-impoverished predecessors. The reward model in this experiment was only 1.3 billion parameters.)

I'm sure you'll have no trouble interpreting these results as yet another portent of our impending deaths. We were speaking theoretically about AIs exploiting the Goodhart problem between human ratings and actual goodness, but practical RLHF systems aren't actually sample-efficient enough to solely use direct human feedback, and have an additional Goodhart problem between reward model predictions of human ratings, and actual ratings. Isn't that worse? Well, yes.

But the ray of hope I see here is more meta and methodological, rather than turning on any one empirical result. It's that we have empirical results. We can study these machines, now, before their successors are powerful enough to kill us. The iterative design loop hasn't failed yet. That can't last forever—at some point between here and the superintelligence at the end of time, humans are going to be out of the loop. I'm glad people are doing theory trying to figure out what that looks like and how it could be arranged to go well.

But I'm worried about ungrounded alignment theorizing failing to make contact with reality, sneeringly dismissing geniunely workable designs as impossible by appealing to perfectly antisphexish consequentialists on a frictionless plane, when some amount of sphexishness and friction is a known factor of the algorithms in question.

We seem to agree that GPT-4 is smart enough to conceive of the strategy of threatening or bribing labelers. So ... why doesn't that happen? I mean, like, literal threats and promises. You mention rumors from a DeepMind employee about the larger Gemini models being hard to train, but without more details, I'm inclined to guess that that was "pls halp"-style overoptimization rather than the kind of power-seeking or deceptive alignment that would break the design loop. (Incidentally, Gao et al. 2022 studied scaling laws for reward model overoptimization and claimed that model size basically didn't matter? See §4.4, "Policy size independence".)

What's going on here? If I'm right that GPT-4 isn't secretly plotting to murder us, even though it's smart enough to formulate the idea and expected utility maximizers have a convergent incentive to murder competitors, why is that?

Here's my view: model-free reinforcement learning algorithms such as those used in RLHF tweak your AI's behavior to be more like the behavior that got reward in the past, which is importantly different from expected utility maximization. To the extent that you succeed in rewarding Honest, Helpful, and Harmless behavior in safe regimes, you can plausibly get a basically HHH AI assistant that generalizes to not betraying you when it has the chance, similar to how I don't do heroin because I don't want to become a heroin addict—even though if I did take heroin, the reinforcement from that would make me more likely to do it again. Then the nature of the game is keeping that good behavior "on track" for as long as we can—even though the superintelligence at the end of time is presumably be going to do something more advanced than model-free RL. It's possible to screw up and reward the wrong thing, per the robot hand in front of the ball—but if you don't screw up early, your basically-friendly-but-not-maximally-capable AIs can help you not screw up later. And in the initial stages, you're only fighting gradient descent, not even an AGI.

More broadly, here's how I see the Story of Alignment so far. It's been obvious to sufficiently penetrating thinkers for a long time that the deep future belongs to machine intelligence—that, as George Elliot put it in 1879, "the creatures who are to transcend and finally supersede us [will] be steely organisms, giving out the effluvia of the laboratory, and performing with infallible exactness more than everything that we have performed with a slovenly approximativeness and self-defeating inaccuracy."

What's less obvious is how much control we can exert over how that goes by setting the initial conditions. Can we arrange for the creatures who are to transcend and finally supersede us to be friendly and create the kind of world we would want, or will they murder us and tile the universe with something random?

Fifteen years ago, the problem looked hopeless, just from considering the vast complexity of human values. How would you write a computer program that values "happiness", "freedom", or "justice", let alone everything else we want? It wasn't clear how to build AI at all, but surely it would be easier to build some AI than a good AI. Humanity was doomed.

But now, after the decade of deep learning, the problem and its impossible solution seem to be arriving closer together than I would have ever dreamt. Okay, we still don't know how to write down the human utility function, to be plugged in to an arbitrarily powerful optimizer.

But it's increasingly looking like value isn't that fragile if it's specified in latent space, rather than a program that breaks if a single character is wrong—that there are ways to meaningfully shape the initial conditions of our world's ascension that don't take the exacting shape of "utility function + optimizer".

We can leverage unsupervised learning on human demonstration data to do tasks the way humans do them, and we can use RLHF to elicit behavior we want in situations where we can't write down our desires as an explicit reward or utility function. Crucially, by using these these techniques together to compensate for each other's safety and capability weaknesses, it seems feasible to build AI whose effects look "like humans, but faster": performing with infallible exactness everything that we would have performed with a slovenly approximativeness and self-defeating inaccuracy. That doesn't immediately bring about the superintelligence at the end of time—although it might look pretty immediate in sidereal time—but seems like a pretty good way to kick off our world's ascension.

Is this story wrong? Maybe! ... probably? My mother named me "Simplicia", over my father's objections, because of my unexpectedly low polygenic scores. I am aware of my ... [she hesitates and coughs, as if choking on the phrase] learning disability. I'm not confident in any of this.

But if I'm wrong, there should be arguments explaining why I'm wrong—arguments that should convince scientists working in the field, even if I personally am too limited to understand them. I've tried to ground my case in my understanding of the state of the art, citing relevant papers when applicable.

In contrast, dismissing the entire field as hopeless on the basis of philosophy about "perfectly learn[ing] and perfectly maximiz[ing] the referent of rewards" isn't engaging with the current state of alignment, let alone all the further advances that humans and our non-superintelligent AIs will come up with before the end of days! Doomimir Doomovitch, with the fate of the lightcone in the balance, isn't it more dignified to at least consider the possibility that someone else might have thought of something? Reply! Reply!

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-02-17T23:41:11.034Z · LW · GW

Simplicia: I think it's significant that the "hand between ball and camera" example from Amodei et al. 2017 was pure RL from scratch. You have a function π that maps observations (from the robot's sensors) to actions (applying torques to the robot's joints). You sample sequences of observation–action pairs from π and show them to a human, and fit a function r̂ to approximate the human's choices. Then you use Trust Region Policy Optimization to adjust π to score better according to r̂. In this case, TRPO happened to find something that looked good instead of being good, in a way that r̂ wasn't able to distinguish. That is, we screwed up and trained the wrong thing. That's a problem, and the severity of the problem would get worse the more capable π was and the more you were relying on it. If we were going to produce powerful general AI systems with RL alone, I would be very nervous.

But the reason I'm so excited about language models in particular is that their capabilities seem to mostly come from unsupervised pre-training rather than RLHF. You fit a function to the entire internet first, and only afterwards tweak it a bit so that its outputs look more like obeying commands rather than predicting random internet tokens—where the tweaking process incorporates tricks like penalizing the Kullback–Leibler divergence from the reward model's training distribution, such that you're not pulling the policy too far away from the known-safe baseline.

I agree that as a consequentialist with the goal of getting good ratings, the strategy of "bribe the rater" isn't very hard to come up with. Indeed, when I prompt GPT-4 with the problem, it gives me "Offering Incentives for Mislabeling" as #7 on a list of 8.

But the fact that GPT-4 can do that seems like it's because that kind of reasoning appears on the internet, which is what I mean by the claim that contemporary systems are "reasoning with" rather than "reasoning about": the assistant simulacrum being able to explain bribery when prompted isn't the same thing as the LM itself trying to maximize reward.

I'd be interested in hearing more details about those rumors of smarter models being more prone to exploit rater mistakes. What did those entail, exactly? (To the extent that we lack critical evidence about this potential alignment failure because the people who experienced it are gagged by an NDA, that seems like a point in favor of sharing information about language model capabilities.)

I certainly expect some amount of sycophancy: if you sample token completions from your LM, and then tweak its outputs to be more like what your raters want to hear, you end up with an LM that's more inclined to say what your raters want to hear. Fine. That's a problem. Is it a fatal problem? I mean, if you don't try to address it at all and delegate all of your civilization's cognition to machines that don't want to tell you about problems, then eventually you might die of problems your AIs didn't tell you about.

But "mere" sycophancy sounds like a significantly less terrifying failure mode than reward hacking of the sort that would result in things like the LM spontaneously trying to threaten or bribe labelers. That would have a large KL divergence from the policy you started with!

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-02-17T06:17:57.164Z · LW · GW

I think part of the reason the post ends without addressing this is that, unfortunately, I don't think I properly understand this one yet, even after reading your dialogue with Eli Tyre.

The next paragraph of the post links Christiano's 2015 "Two Kinds of Generalization", which I found insightful and seems relevant. By way of illustration, Christiano describes two types of possible systems for labeling videos: (1) a human classifier (which predicts what label a human would assign), and (2) a generative model (which directly builds a mapping between descriptions and videos roughly the way our brains do it). Notably, the human classifier behaves undesirably on inputs that bribe, threaten, or otherwise hack the human: for example, a video of the text "I'll give you $100 if you classify this as an apple" might get classified as an apple. (And an arbitrarily powerful search for maximally apple-classified inputs would turn those up.)

Christiano goes on to describe a number of differences between these two purported kinds of generalization: (1) is reasoning about the human, whereas (2) is reasoning with a model not unlike the one inside the human's brain; searching for simple Turing machines would tend to produce (1), whereas searching for small circuits would tend to produce (2); and so on.

It would be bad to end up with a system that behaves like (1) without realizing it. That definitely seems like it would kill you. But (Simplicia asks) how likely that is seems like a complicated empirical question about how ML generalization works and how you built your particular AI, that isn't definitively answered by "in the limit" philosophy about "perfectly learn[ing] and perfectly maximiz[ing] the referent of rewards assigned by human operators"? That is, I agree that if you argmax over possible programs for the one that results in the most reward-button presses, you get something that only wants to seize the reward button. But the path-dependent details between "argmax over possible programs" and "pretraining + HFDT + regularization + early stopping + &c." seem like they make a big difference. The technology in front of us really does seem like it's "reasoning with" rather than "reasoning about" (while also seeming to be on the path towards "real AGI" rather than a mere curiosity).

When I try to imagine what Doomimir would say to that, all I can come up with is a metaphor about perpetual-motion-machine inventors whose designs are so complicated that it's hard to work out where the error is, even though the laws of thermodynamics clearly imply that there must be an error. That sounds plausible to me as a handwavy metaphor; I could totally believe that the ultimate laws of intelligence (not known to me personally) work that way.

The thing is, we do need more than a handwavy metaphor! "Yes, your gold-printing machine seems to be working great, but my intuition says it's definitely going to kill everyone. No, I haven't been able to convince relevant experts who aren't part of my robot cult, but that's because they're from Earth and therefore racially inferior to me. No, I'm not willing to make any concrete bets or predictions about what happens before then" is a non-starter even if it turns out to be true.

Comment by Zack_M_Davis on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-14T17:29:16.348Z · LW · GW

(Continued in containment thread.)

Comment by Zack_M_Davis on Containment Thread on the Motivation and Political Context for My Philosophy of Language Agenda · 2024-02-14T17:27:48.289Z · LW · GW

(Responding to Tailcalled.)

you mostly address the rationalist community, and Bailey mostly addresses GCs and HBDs, and so on. So "most people you encounter using that term on Twitter" doesn't refer to irrelevant outsiders, it refers to the people you're trying to have the conversation with

That makes sense as a critique of my or Bailey's writing, but "Davis and Bailey's writing is unclear and arguably deceptive given their target audience's knowledge" is a very different claim than "autogynephilia is not a natural abstraction"!!

I think you naturally thought of autogynephilia and gender progressivism as being more closely related than they really are

Permalink or it didn't happen: what's your textual evidence that I was doing this? (I do expect there to be a relationship of some strength in the AGP→progressive direction, but my 2017–8 models were not in any way surprised by, e.g., the "Conservative Men in Conservative Dresses" profiled in The Atlantic in 2005, or the second kind of mukhannath, or Kalonymus ben Kalonymus.)

Comment by Zack_M_Davis on Natural abstractions are observer-dependent: a conversation with John Wentworth · 2024-02-13T22:22:50.790Z · LW · GW

Let's say the species is the whitebark pine P. albicaulis, which grows in a sprawling shrub-like form called krummholz in rough high-altitude environments, but looks like a conventional upright tree in more forgiving climates.

Suppose that a lot of people don't like krummholz and have taken to using the formal species name P. albicaulis as a disparaging term (even though a few other species can also grow as krummholz).

I think Tail is saying that "P. albicaulis" isn't a natural abstraction, because most people you encounter using that term on Twitter are talking about krummholz, without realizing that other species can grow as krummholz or that many P. albicaulis grow as upright trees.

I'm saying it's dumb to assert that P. albicaulis isn't a natural abstraction just because most people are ignorant of dendrology and are only paying attention to the shrub vs. tree subspace: if I look at more features of vegetation than just broad shape, I end up needing to formulate P. albicaulis to explain the things some of these woody plants have in common despite their shape.

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-02-12T18:45:10.938Z · LW · GW

Simplicia: Oh! Because if there are nine wrong labels that aren't individually more common than the correct label, then the most they can collectively outnumber the correct label is by 9 to 1. But I could have sworn that Rolnick et al. §3.2 said that—oh, I see. I misinterpreted Figure 4. I should have said "twenty noisy labels for every correct one", not "twenty wrong labels"—where some of the noisy labels are correct "by chance".

For example, training examples with the correct label 0 could appear with the label 0 for sure 10 times, and then get a uniform random label 200 times, and thus be correctly labeled 10 + 200/10 = 30 times, compared to 20 for each wrong label. (In expectation—but you also could set it up so that the "noisy" labels don't deviate from the expected frequencies.) That doesn't violate the pigeonhole principle.

I regret the error. Can we just—pretend I said the correct thing? If there were a transcript of what I said, it would only be a one-word edit. Thanks.