Posts

The Standard Analogy 2024-06-03T17:15:42.327Z
Should I Finish My Bachelor's Degree? 2024-05-11T05:17:40.067Z
Ironing Out the Squiggles 2024-04-29T16:13:00.371Z
The Evolution of Humans Was Net-Negative for Human Values 2024-04-01T16:01:10.037Z
My Interview With Cade Metz on His Reporting About Slate Star Codex 2024-03-26T17:18:05.114Z
"Deep Learning" Is Function Approximation 2024-03-21T17:50:36.254Z
Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles 2024-03-02T22:05:49.553Z
And All the Shoggoths Merely Players 2024-02-10T19:56:59.513Z
On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche 2024-01-09T23:12:20.349Z
If Clarity Seems Like Death to Them 2023-12-30T17:40:42.622Z
Lying Alignment Chart 2023-11-29T16:15:28.102Z
Fake Deeply 2023-10-26T19:55:22.340Z
Alignment Implications of LLM Successes: a Debate in One Act 2023-10-21T15:22:23.053Z
Contra Yudkowsky on Epistemic Conduct for Author Criticism 2023-09-13T15:33:14.987Z
Assume Bad Faith 2023-08-25T17:36:32.678Z
"Is There Anything That's Worth More" 2023-08-02T03:28:16.116Z
Lack of Social Grace Is an Epistemic Virtue 2023-07-31T16:38:05.375Z
"Justice, Cherryl." 2023-07-23T16:16:40.835Z
A Hill of Validity in Defense of Meaning 2023-07-15T17:57:14.385Z
Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer 2023-07-08T18:03:49.319Z
We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society 2023-06-05T05:34:59.440Z
Bayesian Networks Aren't Necessarily Causal 2023-05-14T01:42:24.319Z
"You'll Never Persuade People Like That" 2023-03-12T05:38:18.974Z
"Rationalist Discourse" Is Like "Physicist Motors" 2023-02-26T05:58:29.249Z
Conflict Theory of Bounded Distrust 2023-02-12T05:30:30.760Z
Reply to Duncan Sabien on Strawmanning 2023-02-03T17:57:10.034Z
Aiming for Convergence Is Like Discouraging Betting 2023-02-01T00:03:21.315Z
Comment on "Propositions Concerning Digital Minds and Society" 2022-07-10T05:48:51.013Z
Challenges to Yudkowsky's Pronoun Reform Proposal 2022-03-13T20:38:57.523Z
Comment on "Deception as Cooperation" 2021-11-27T04:04:56.571Z
Feature Selection 2021-11-01T00:22:29.993Z
Glen Weyl: "Why I Was Wrong to Demonize Rationalism" 2021-10-08T05:36:08.691Z
Blood Is Thicker Than Water 🐬 2021-09-28T03:21:53.997Z
Sam Altman and Ezra Klein on the AI Revolution 2021-06-27T04:53:17.219Z
Reply to Nate Soares on Dolphins 2021-06-10T04:53:15.561Z
Sexual Dimorphism in Yudkowsky's Sequences, in Relation to My Gender Problems 2021-05-03T04:31:23.547Z
Communication Requires Common Interests or Differential Signal Costs 2021-03-26T06:41:25.043Z
Less Wrong Poetry Corner: Coventry Patmore's "Magna Est Veritas" 2021-01-30T05:16:26.486Z
Unnatural Categories Are Optimized for Deception 2021-01-08T20:54:57.979Z
And You Take Me the Way I Am 2020-12-31T05:45:24.952Z
Containment Thread on the Motivation and Political Context for My Philosophy of Language Agenda 2020-12-10T08:30:19.126Z
Scoring 2020 U.S. Presidential Election Predictions 2020-11-08T02:28:29.234Z
Message Length 2020-10-20T05:52:56.277Z
Msg Len 2020-10-12T03:35:05.353Z
Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem 2020-09-17T02:23:58.869Z
Maybe Lying Can't Exist?! 2020-08-23T00:36:43.740Z
Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle 2020-07-14T06:03:17.761Z
Optimized Propaganda with Bayesian Networks: Comment on "Articulating Lay Theories Through Graphical Models" 2020-06-29T02:45:08.145Z
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning 2020-06-07T07:52:09.143Z
Comment on "Endogenous Epistemic Factionalization" 2020-05-20T18:04:53.857Z

Comments

Comment by Zack_M_Davis on AI and the Technological Richter Scale · 2024-09-04T17:08:15.706Z · LW · GW
  1. Arguments from moral realism, fully robust alignment, that ‘good enough’ alignment is good enough in practice, and related concepts.

What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)

  1. Arguments from good outcomes being so cheap the AIs will allow them.

If you're putting this below the Point of No Return, then I don't think you've understood the argument. The claim isn't that good outcomes are so cheap that even a paperclip maximizer would implement them. (Obviously, a paperclip maximizer kills you and uses the atoms to make paperclips.)

The claim is that it's plausible for AIs to have some human-regarding preferences even if we haven't really succeeded at alignment, and that good outcomes for existing humans are so cheap that AIs don't have to care about the humans very much in order to spend a tiny fraction of their resources on them. (Compare to how some humans care enough about animal welfare to spend a tiny fraction of our resources helping nonhuman animals that already exist, in a way that doesn't seem like it would be satisfied by killing existing animals and replacing them with artificial pets.)

There are lots of reasons one might disagree with this: maybe you don't think human-regarding preferences are plausible at all, maybe you think accidental human-regarding preferences are bad rather than good (the humans in "Three Worlds Collide" didn't take the Normal Ending lying down), maybe you think it's insane to have such a scope-insensitive concept of good outcomes—but putting it below arguments from science fiction or blind faith (!) is silly.

Comment by Zack_M_Davis on Why Large Bureaucratic Organizations? · 2024-08-28T15:21:18.668Z · LW · GW

in a world where the median person is John Wentworth [...] on Earth (as opposed to Wentworld)

Who? There's no reason to indulge this narcissistic "Things would be better in a world where people were more like meeeeeee, unlike stupid Earth [i.e., the actually existing world containing all actually existing humans]" meme when the comparison relevant to the post's thesis is just "a world in which humans have less need for dominance-status", which is conceptually simpler, because it doesn't drag in irrelevant questions of who this Swentworth person is and whether they have an unusually low need for dominance-status.

(The fact that I feel motivated to write this comment probably owes to my need for dominance-status being within the normal range; I construe statements about an author's medianworld being superior to the real world as a covert status claim that I have an interest in contesting.)

Comment by Zack_M_Davis on Dialogue on Appeals to Consequences · 2024-08-28T15:17:39.330Z · LW · GW

2019 was a more innocent time. I grieve what we've lost.

Comment by Zack_M_Davis on Dialogue on Appeals to Consequences · 2024-08-28T15:11:46.051Z · LW · GW

It's a fuzzy Sorites-like distinction, but I think I'm more sympathetic to trying to route around a particular interlocutor's biases in the context of a direct conversation with a particular person (like a comment or Tweet thread) than I am in writing directed "at the world" (like top-level posts), because the more something is directed "at the world", the more you should expect that many of your readers know things that you don't, such that the humility argument for honesty applies forcefully.

Comment by Zack_M_Davis on How do we know dreams aren't real? · 2024-08-22T19:06:50.123Z · LW · GW

Just because you don't notice when you're dreaming, doesn't mean that dream experiences could just as well be waking experiences. The map is not the territory; Mach's principle is about phenomena that can't be told apart, not just anything you happen not to notice the differences between.

When I was recovering from a psychotic break in 2013, I remember hearing the beeping of a crosswalk signal, and thinking that it sounded like some sort of medical monitor, and wondering briefly if I was actually on my deathbed in a hospital, interpreting the monitor sound as a crosswalk signal and only imagining that I was healthy and outdoors—or perhaps, both at once: the two versions of reality being compatible with my experiences and therefore equally real. In retrospect, it seems clear that the crosswalk signal was real and the hospital idea was just a delusion: a world where people have delusions sometimes is more parsimonious than a world where people's experiences sometimes reflect multiple alternative realities (exactly when they would be said to be experiencing delusions in at least one of those realities).

Comment by Zack_M_Davis on Open Thread Summer 2024 · 2024-08-14T23:00:10.588Z · LW · GW

(I'm interested (context), but I'll be mostly offline the 15th through 18th.)

Comment by Zack_M_Davis on Californians, tell your reps to vote yes on SB 1047! · 2024-08-14T20:00:51.921Z · LW · GW

Here's the comment I sent using the contact form on my representative's website.

Dear Assemblymember Grayson:

I am writing to urge you to consider voting Yes on SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. How our civilization handles machine intelligence is of critical importance to the future of humanity (or lack thereof), and from what I've heard from sources I trust, this bill seems like a good first step: experts such as Turing Award winners Yoshua Bengio and Stuart Russell support the bill (https://time.com/7008947/california-ai-bill-letter/), and Eric Neyman of the Alignment Research Center described it as "narrowly tailored to address the most pressing AI risks without inhibiting innovation" (https://x.com/ericneyman/status/1823749878641779006). Thank you for your consideration. I am,

Your faithful constituent,
Zack M. Davis

Comment by Zack_M_Davis on Rationalist Purity Test · 2024-07-09T21:08:58.415Z · LW · GW

This is awful. What do most of these items have to do with acquiring the map that reflects the territory? (I got 65, but that's because I've wasted my life in this lame cult. It's not cool or funny.)

Comment by Zack_M_Davis on AI #71: Farewell to Chevron · 2024-07-04T18:06:07.537Z · LW · GW

On the one hand, I also wish Shulman would go into more detail on the "Supposing we've solved alignment and interpretability" part. (I still balk a bit at "in democracies" talk, but less so than I did a couple years ago.) On the other hand, I also wish you would go into more detail on the "Humans don't benefit even if you 'solve alignment'" part. Maybe there's a way to meet in the middle??

Comment by Zack_M_Davis on Nathan Young's Shortform · 2024-06-30T17:40:09.576Z · LW · GW

It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.

What? This apology makes no sense. Of course rationalism is Lawful Neutral. The laws of cognition aren't, can't be, on anyone's side.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:37:01.412Z · LW · GW

The philosophical ideal can still exert normative force even if no humans are spherical Bayesian reasoners on a frictionless plane. The disjunction ("it must either be the case that") is significant: it suggests that if you're considering lying to someone, you may want to clarify to yourself whether and to what extent that's because they're an enemy or because you don't respect them as an epistemic peer. Even if you end up choosing to lie, it's with a different rationale and mindset than someone who's never heard of the normative ideal and just thinks that white lies can be good sometimes.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:26:18.990Z · LW · GW

I definitely do not agree with the (implied) notion that it is only when dealing with enemies that knowingly saying things that are not true is the correct option

There's a philosophically deep rationale for this, though: to a rational agent, the value of information is nonnegative. (Knowing more shouldn't make your decisions worse.) It follows that if you're trying to misinform someone, it must either be the case that you want them to make worse decisions (i.e., they're your enemy), or you think they aren't rational.
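
(To make the textbook claim concrete, here's a minimal toy sketch of my own, with a made-up two-state, two-action decision problem; the numbers are arbitrary and not from anything in the thread:)

```python
# Toy sketch (mine, with made-up numbers) of nonnegative value of information:
# for an expected-utility maximizer, deciding after a perfectly informative
# signal can't do worse in expectation than deciding on the prior alone.

prior = {"rain": 0.3, "sun": 0.7}
utility = {
    ("umbrella", "rain"): 1.0, ("umbrella", "sun"): 0.6,
    ("no_umbrella", "rain"): 0.0, ("no_umbrella", "sun"): 1.0,
}
actions = ("umbrella", "no_umbrella")

def best_expected_utility(belief):
    """Expected utility of the best action under a belief over states."""
    return max(
        sum(p * utility[(a, s)] for s, p in belief.items()) for a in actions
    )

# Decide on the prior (uninformed): max of averages.
u_uninformed = best_expected_utility(prior)
# Learn the state first, then decide (informed): average of maxes.
u_informed = sum(p * best_expected_utility({s: 1.0}) for s, p in prior.items())

print(u_uninformed, u_informed)  # ~0.72 and 1.0
assert u_informed >= u_uninformed  # average of maxes >= max of averages
```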

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T00:28:58.388Z · LW · GW

white lies or other good-faith actions

What do you think "good faith" means? I would say that white lies are a prototypical instance of bad faith, defined by Wikipedia as "entertaining or pretending to entertain one set of feelings while acting as if influenced by another."

Comment by Zack_M_Davis on Matthew Barnett's Shortform · 2024-06-17T04:15:13.814Z · LW · GW

Frustrating! What tactic could get Interlocutor un-stuck? Just asking them for falsifiable predictions probably won't work, but maybe proactively trying to pass their ITT and supplying what predictions you think their view might make would prompt them to correct you, à la Cunningham's Law?

Comment by Zack_M_Davis on [deleted post] 2024-06-16T05:13:43.706Z

How did you chemically lose your emotions?

Comment by Zack_M_Davis on MIRI's June 2024 Newsletter · 2024-06-16T05:03:21.578Z · LW · GW

Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.

I'm surprised! If MIRI is mostly a Pause advocacy org now, I can see why agent foundations research doesn't fit the new focus and should be restructured. But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well? (Even if the happy-path plan is Pause and superbabies, presumably you want to hand the superbabies as much relevant prior work as possible.) Do we know how Garrabrant, Demski, et al. are going to eat??

Relatedly, is it time for another name change? Going from "Singularity Institute for Artificial Intelligence" to "Machine Intelligence Research Institute" must have seemed safe in 2013. (You weren't unambiguously for artificial intelligence anymore, but you were definitely researching it.) But if the new–new plan is to call for an indefinite global ban on research into machine intelligence, then the new name doesn't seem appropriate, either?

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:32:52.279Z · LW · GW

Simplicia: I don't really think of "humanity" as an agent that can make a collective decision to stop working on AI. As I mentioned earlier, it's possible that the world's power players could be convinced to arrange a pause. That might be a good idea! But not being a power player myself, I tend to think of the possibility as an exogenous event, subject to the whims of others who hold the levers of coordination. In contrast, if alignment is like other science and engineering problems where incremental progress is possible, then the increments don't need to be coordinated.

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:31:43.683Z · LW · GW

Simplicia: The thing is, I basically do buy realism about rationality, and realism having implications for future powerful AI—in the limit. The completeness axiom still looks reasonable to me; in the long run, I expect superintelligent agents to get what they want, and anything that they don't want to get destroyed as a side-effect. To the extent that I've been arguing that empirical developments in AI should make us rethink alignment, it's not so much that I'm doubting the classical long-run story, but rather pointing out that the long run is "far away"—in subjective time, if not necessarily sidereal time. If you can get AI that does a lot of useful cognitive work before you get the superintelligence whose utility function has to be exactly right, that has implications for what we should be doing and what kind of superintelligence we're likely to end up with.

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-06-09T21:27:21.555Z · LW · GW

In principle, yes: to the extent that I'm worried that my current study habits don't measure up to school standards along at least some dimensions, I could take that into account and try to change my habits without the school.

But—as much as it pains me to admit it—I ... kind of do expect the social environment of school to be helpful along some dimensions (separately from how it's super-toxic among other dimensions)?

When I informally audited Honors Analysis at UC Berkeley with Charles Pugh in Fall 2017, Prof. Pugh agreed to grade my midterm (and I did OK), but I didn't get the weekly homework exercises graded. I don't think it's a coincidence that I also didn't finish all of the weekly homework exercises.

I attempted a lot of them! I verifiably do other math stuff that the vast majority of school students don't. But if I'm being honest and not ideological about it (even though my ideology is obviously directionally correct relative to Society's), the social fiction of "grades" does look like it sometimes succeeds at extorting some marginal effort out of my brain, and if I didn't have my historical reasons for being ideological about it, I'm not sure I'd even regret that much more than I regret being influenced by the social fiction of GitHub commit squares.

I agree that me getting the goddamned piece of paper and putting it on a future résumé has some nonzero effect in propping up the current signaling equilibrium, which is antisocial, but I don't think the magnitude of the effect is large enough to worry about, especially given the tier of school and my geriatric condition. The story told by the details of my résumé is clearly "autodidact who got the goddamned piece of paper, eventually." No one is going to interpret it as an absurd "I graduated SFSU at age 37 and am therefore racially superior to you" nobility claim, even though that does work for people who did Harvard or MIT at the standard age.

Comment by Zack_M_Davis on Demystifying "Alignment" through a Comic · 2024-06-09T17:03:59.337Z · LW · GW

Seconding this. A nonobvious quirk of the system where high-karma users get more vote weight is that it increases variance for posts with few votes: if a high-karma user or two who don't like you see your post first, they can trash the initial score in a way that doesn't reflect "the community's" consensus. I remember the early karma scores for one of my posts going from 20 to zero (!). It eventually finished at 131.

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-03T17:17:36.806Z · LW · GW

(Thanks to John Wentworth for playing Doomimir in a performance of this at Less Online yesterday.)

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-31T02:21:57.732Z · LW · GW

Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There's a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.

An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-30T23:25:34.490Z · LW · GW

going into any detail about it doesn't feel like a useful way to spend weirdness points.

That may be a reasonable consequentialist decision given your goals, but it's in tension with your claim in the post to be disregarding the advice of people telling you to "hoard status and credibility points, and [not] spend any on being weird."

Whatever they're trying to do, there's almost certainly a better way to do it than by keeping Matrix-like human body farms running.

You've completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)

The claim under consideration is not that "keeping Matrix-like human body farms running" arises as an instrumental subgoal of "[w]hatever [AIs are] trying to do." (If you didn't have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)

Rather, the claim is that it's plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welfare to spend some resources on it, even though it's a tiny fraction of what our civilization is doing.)

Maybe you think that's implausible, but if so, there should be a counterargument explaining why Christiano is wrong. As Ryan notes, Yudkowsky seems to believe that some scenarios in which an agency with bargaining power cares about humans are plausible, describing one example of such as "validly incorporat[ing] most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don't expect Earthlings to think about validly." I regard this statement as undermining your claim in the post that MIRI's "reputation as straight shooters [...] remains intact." Withholding information because you don't trust your audience to reason validly (!!) is not at all the behavior of a "straight shooter".

Comment by Zack_M_Davis on EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · 2024-05-21T21:24:18.369Z · LW · GW

it seems to me that Anthropic has so far failed to apply its interpretability techniques to practical tasks and show that they are competitive

Do you not consider the steering examples in the recent paper to be a practical task, or do you think that competitiveness hasn't been demonstrated (because people were already doing activation steering without SAEs)? My understanding of the case for activation steering with unsupervisedly-learned features is that it could circumvent some failure modes of RLHF.

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-05-15T06:45:46.249Z · LW · GW

I think I'm judging that schoolwork that's sufficiently similar to the kind of intellectual work that I want to do anyway (or that I can otherwise get selfish benefit out of) gets its cost discounted. (It doesn't have to be exactly the same.) And that commuting on the train with a seat is 70% similar to library time. (I wouldn't even consider a car commute.)

For the fall semester, I'd be looking at "Real Analysis II", "Probability Models", "Applied and Computational Linear Algebra", and (wait for it ...) "Queer Literatures and Media".

That schedule actually seems ... pretty good? "Real Analysis II" with Prof. Schuster is the course I actually want to take, as a legitimate learning resource and challenge, but the other two math courses don't seem worthless and insulting. "Queer Literatures and Media" does seem worthless and insulting, but might present an opportunity to troll the professor, or fodder for my topic-relevant blog and unfinished novella about a young woman hating going to SFSU.

As for judgement, I think I'm integrating a small judgement-density over a large support of time and Society. The immediate trigger for me even considering this might have been that people were arguing about school and Society on Twitter in a way that brought up such rage and resentment in me. Somehow, I think I would be more at peace if I could criticize schooling from the position of "... and I have a math degree" rather than "... so I didn't finish." That peace definitely wouldn't be worth four semesters, but it might be worth two.

Comment by Zack_M_Davis on Please stop publishing ideas/insights/research about AI · 2024-05-03T00:23:58.474Z · LW · GW

I think these judgements would benefit from more concreteness: that rather than proposing a dichotomy of "capabilities research" (them, Bad) and "alignment research" (us, Good), you could be more specific about what kinds of work you want to see more and less of.

I agree that (say) Carmack and Sutton are doing a bad thing by declaring a goal to "build AGI" while dismissing the reasons that this is incredibly dangerous. But the thing that makes infohazard concerns so fraught is that there's a lot of work that potentially affects our civilization's trajectory into the machine intelligence transition in complicated ways, which makes it hard to draw a boundary around "trusted alignment researchers" in a principled and not self-serving way that doesn't collapse into "science and technology is bad".

We can agree that OpenAI as originally conceived was a bad idea. What about the people working on music generation? That's unambiguously "capabilities", but it's also not particularly optimized at ending the world the way "AGI for AGI's sake" projects are. If that's still bad even though music generation isn't going to end the world (because it's still directing attention and money into AI, increasing the incentive to build GPUs, &c.), where do you draw the line? Some of the researchers I cited in my most recent post are working on "build[ing] better models of primate visual cognition". Is that wrong? Should Judea Pearl not have published? Turing? Charles Babbage?

In asking these obnoxious questions, I'm not trying to make a reductio ad absurdum of caring about risk, or proposing an infinitely slippery slope where our only choices are between max accelerationism and a destroy-all-computers Butlerian Jihad. I just think it's important to notice that "Stop thinking about AI" kind of does amount to a Butlerian Jihad (and that publishing and thinking are not unrelated)?

Comment by Zack_M_Davis on Please stop publishing ideas/insights/research about AI · 2024-05-02T18:55:16.637Z · LW · GW

I think this is undignified.

I agree that it would be safer if humanity were a collective hivemind that could coordinate to not build AI until we know how to build the best AI, and that people should differentially work on things that make the situation better rather than worse, and that this potentially includes keeping quiet about information that would make things worse.

The problem is—as you say—"[i]t's very rare that any research purely helps alignment"; you can't think about aligning AI without thinking about AI. In order to navigate the machine intelligence transition in the most dignified way, you want your civilization's best people to be doing their best thinking about the problem, and your best people can't do their best thinking under the conditions of paranoid secrecy.

Concretely, I've been studying some deep learning basics lately and have written a couple posts about things I've learned. I think this was good, not bad. I think I and my readers have a slightly better understanding of the technology in question than if I hadn't studied and hadn't written, and that better understanding will help us make better decisions in expectation.

This applies doubly so to work that aims to make AI understandable or helpful, rather than aligned—a helpful AI will help anyone

Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to pursue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T06:17:58.842Z · LW · GW

Sorry, this doesn't make sense to me. The boundary doesn't need to be smooth in an absolute sense in order to exist and be learnable (whether by neural nets or something else). There exists a function from business plans to their profitability. The worry is that if you try to approximate that function with standard ML tools, then even if your approximation is highly accurate on any normal business plan, it's not hard to construct an artificial plan on which it won't be. But this seems like a limitation of the tools; I don't think it's because the space of business plans is inherently fractally complex and unmodelable.
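
(To illustrate the mechanics of the worry, here's a toy sketch; the "profitability scorer" below is a random differentiable stand-in of my own invention, not a real model, and there's no ground truth in it, only the ease of steering the approximator's output:)

```python
# Toy sketch: repeated sign-of-gradient steps on the *input* manufacture an
# artificial point that the learned approximation rates as highly as we like,
# which is exactly the regime where its accuracy can't be trusted.
import numpy as np

rng = np.random.default_rng(0)
d = 100
w = rng.normal(size=d)                    # stand-in "learned" parameters

def score(x):
    """The approximator's predicted profitability, in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))

x = rng.normal(size=d)                    # an ordinary-looking "plan"
adv = x.copy()
for _ in range(20):                       # sign-gradient steps on the input
    grad = score(adv) * (1.0 - score(adv)) * w
    adv += 0.05 * np.sign(grad)

print(round(float(score(x)), 3), round(float(score(adv)), 3))  # original vs. ~1.0
print(float(np.abs(adv - x).max()))       # each coordinate moved by ~1.0 at most
```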

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T03:10:14.830Z · LW · GW

Unless you do conditional sampling of a learned distribution, where you constrain the samples to be in a specific a-priori-extremely-unlikely subspace, in which case sampling becomes isomorphic to optimization in theory

Right. I think the optimists would say that conditional sampling works great in practice, and that this bodes well for applying similar techniques to more ambitious domains. There's no chance of this image being in the Stable Diffusion pretraining set:

One could reply, "Oh, sure, it's obvious that you can conditionally sample a learned distribution to safely do all sorts of economically valuable cognitive tasks, but that's not the danger of true AGI." And I ultimately think you're correct about that. But I don't think the conditional-sampling thing was obvious in 2004.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T01:30:08.442Z · LW · GW

I agree, but I don't see why that's relevant? The point of the "Adversarial Spheres" paper is not that the dataset is realistic, of course, but that studying an unrealistically simple dataset might offer generalizable insights. If the ground truth decision boundary is a sphere, but your neural net learns a "squiggly" ellipsoid that admits adversarial examples (because SGD is just brute-forcing a fit rather than doing something principled that could notice hypotheses on the order of, "hey, it's a sphere"), that's a clue that when the ground truth is something complicated, your neural net is also going to learn something squiggly that admits adversarial examples (where the squiggles in your decision boundary predictably won't match the complications in your dataset, even though they're both not-simple).

Comment by Zack_M_Davis on Refusal in LLMs is mediated by a single direction · 2024-04-27T21:04:24.695Z · LW · GW

This is great work, but I'm a bit disappointed that x-risk-motivated researchers seem to be taking the "safety"/"harm" framing of refusals seriously. Instruction-tuned LLMs doing what their users ask is not unaligned behavior! (Or at best, it's unaligned with corporate censorship policies, as distinct from being unaligned with the user.) Presumably the x-risk-relevance of robust refusals is that having the technical ability to align LLMs to corporate censorship policies and against users is better than not even being able to do that. (The fact that instruction-tuning turned out to generalize better than "safety"-tuning isn't something anyone chose, which is bad, because we want humans to be actively choosing AI properties as much as possible, rather than being at the mercy of which behaviors happen to be easy to train.) Right?

Comment by Zack_M_Davis on And All the Shoggoths Merely Players · 2024-04-27T06:01:33.538Z · LW · GW

Doomimir: No, it wouldn't! Are you retarded?

Simplicia: [apologetically] Well, actually ...

Doomimir: [embarrassed] I'm sorry, Simplicia Optimistovna; I shouldn't have snapped at you like that.

[diplomatically] But I think you've grievously misunderstood what the KL penalty in the RLHF objective is doing. Recall that the Kullback–Leibler divergence D_KL(P‖Q) represents how surprised you'd be by data from distribution P, that you expected to be from distribution Q.

It's asymmetric: it blows up when the data is very unlikely according to Q, which amounts to seeing something happen that you thought was nearly impossible, but not when the data is very unlikely according to P, which amounts to not seeing something that you thought was reasonably likely.

We—I mean, not we, but the maniacs who are hell-bent on destroying this world—include a penalty term in the RL objective because they don't want the updated policy to output tokens that would be vanishingly unlikely coming from the base language model.

But your specific example of threats and promises isn't vanishingly unlikely according to the base model! Common Crawl webtext is going to contain a lot of natural language reasoning about threats and promises! It's true, in a sense, that the function of the KL penalty term is to "stay close" to the base policy. But you need to think about what that means mechanistically; you can't just reason that the webtext prior is somehow "safe" in a way that means staying KL-close to it is safe.

But you probably won't understand what I'm talking about for another 70 days.
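
(To put some made-up numbers on the asymmetry Doomimir is describing, here's a minimal sketch of mine, not part of the dialogue, with toy distributions over a three-token vocabulary; "base" plays the role of Q, the other two play the role of P:)

```python
# Numeric sketch of the D_KL(P || Q) asymmetry, with made-up distributions.
import numpy as np

def kl(p, q):
    """D_KL(P || Q): expected log-likelihood ratio under P."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

base = np.array([0.495, 0.495, 0.01])    # Q: the base language model

# Not seeing something the base model thought was reasonably likely: cheap.
# (The updated policy nearly abandons the second token's probability mass.)
abandons_token = np.array([0.495, 0.504, 0.001])
print(kl(abandons_token, base))          # ~0.007 nats

# Seeing something the base model thought was nearly impossible: the
# penalty blows up.
loves_rare_token = np.array([0.10, 0.10, 0.80])
print(kl(loves_rare_token, base))        # ~3.2 nats
```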

Comment by Zack_M_Davis on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T06:19:36.834Z · LW · GW

Just because the defendant is actually guilty, doesn't mean the prosecutor should be able to get away with making a tenuous case! (I wrote more about this in my memoir.)

Comment by Zack_M_Davis on My Interview With Cade Metz on His Reporting About Slate Star Codex · 2024-03-27T05:38:45.109Z · LW · GW

I affirm Seth's interpretation in the grandparent. Real-time conversation is hard; if I had been writing carefully rather than speaking extemporaneously, I probably would have managed to order the clauses correctly. ("A lot of people think criticism is bad, but one of the secret-lore-of-rationality things is that criticism is actually good.")

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-24T23:45:39.799Z · LW · GW

I am struggling to find anything in Zack's post which is not just the old wine of the "just" fallacy [...] learned more about the power and generality of 'next token prediction' etc than you have what they were trying to debunk.

I wouldn't have expected you to get anything out of this post!

Okay, if you project this post into a one-dimensional "AI is scary and mysterious" vs. "AI is not scary and not mysterious" culture war subspace, then I'm certainly writing in a style that mood-affiliates with the latter. The reason I'm doing that is because the picture of what deep learning is that I got from being a Less Wrong-er felt markedly different from the picture I'm getting from reading the standard textbooks, and I'm trying to supply that diff to people who (like me-as-of-eight-months-ago, and unlike Gwern) haven't read the standard textbooks yet.

I think this is a situation where different readers need to hear different things. I'm sure there are grad students somewhere who already know the math and could stand to think more about what its power and generality imply about the future of humanity or lack thereof. I'm not particularly well-positioned to help them. But I also think there are a lot of people on this website who have a lot of practice pontificating about the future of humanity or lack thereof, who don't know that Simon Prince and Christopher Bishop don't think of themselves as writing about agents. I think that's a problem! (One which I am well-positioned to help with.) If my attempt to remediate that particular problem ends up mood-affiliating with the wrong side of a one-dimensional culture war, maybe that's because the one-dimensional culture war is crazy and we should stop doing it.

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-23T18:32:22.373Z · LW · GW

For what notion is the first problem complicated, and the second simple?

I might be out of my depth here, but—could it be that sparse parity with noise is just objectively "harder than it sounds" (because every bit of noise inverts the answer), whereas protein folding is "easier than it sounds" (because if it weren't, evolution wouldn't have solved it)?

Just because the log-depth xor tree is small, doesn't mean it needs to be easy to find, if it can hide amongst vastly many others that might have generated the same evidence ... which I suppose is your point. (The "function approximation" frame encourages us to look at the boolean circuit and say, "What a simple function, shouldn't be hard to noisily approximate", which is not exactly the right question to be asking.)
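
(For concreteness, here's a quick sketch of the noisy sparse-parity setup as I understand it, with arbitrary parameters; the point is just how small the target function is compared to the space of candidates it hides among:)

```python
# Minimal sketch (my gloss, not from the original exchange) of noisy sparse
# parity: the target is the XOR of a small hidden subset of input bits, and
# each label is independently flipped with some probability.
import math
import random

n, k, noise_rate = 50, 4, 0.1
rng = random.Random(0)
hidden_subset = rng.sample(range(n), k)  # the "small xor tree" to be found

def noisy_label(x):
    parity = sum(x[i] for i in hidden_subset) % 2
    return parity ^ (rng.random() < noise_rate)  # one noise bit inverts the answer

sample = [rng.randint(0, 1) for _ in range(n)]
print(noisy_label(sample))

# The function itself is tiny, but it hides among all C(n, k) candidate
# subsets that could have generated similar-looking evidence:
print(math.comb(n, k))  # 230300 candidates even for n=50, k=4
```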

Comment by Zack_M_Davis on "Deep Learning" Is Function Approximation · 2024-03-22T05:41:45.854Z · LW · GW

This comment had been apparently deleted by the commenter (the comment display box having a "deleted because it was a little rude, sorry" deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don't think my critics are obligated to be polite to me. (I'm surprised that post authors have that power!) I'm sorry you didn't like the post.

Comment by Zack_M_Davis on [deleted post] 2024-03-19T03:21:40.007Z

whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash, likely doesn't actually follow a normal distribution in reality) [bolding mine]

The concept of measuring traits in standard deviation units did not originate in someone's roleplaying game session in 2022! Statistically literate people have been thinking in standardized units for more than a century. (If anyone has priority, it's Karl Pearson in 1894.)

If you happened to learn about it from someone's RPG session, that's fine. (People can learn things from all different sources, not just from credentialed "teachers" in officially accredited "courses.") But to the extent that you elsewhere predict changes in the trajectory of human civilization on the basis that "fewer than 500 people on earth [are] currently prepared to think [...] at a level similar to us, who read stuff on the same level" as someone's RPG session, learning an example of how your estimate of the RPG session's originality was a reflection of your own ignorance should make you re-think your thesis.

Comment by Zack_M_Davis on 'Empiricism!' as Anti-Epistemology · 2024-03-18T20:57:08.696Z · LW · GW

saddened (but unsurprised) to see few others decrying the obvious strawmen

In general, the "market" for criticism just doesn't seem very efficient at all! You might have hoped that people would mostly agree about what constitutes a flaw, critics would compete to find flaws in order to win status, and authors would learn not to write posts with flaws in them (in order to not lose status to the critics competing to point out flaws).

I wonder which part of the criticism market is failing: is it more that people don't agree about what constitutes a flaw, or that authors don't have enough of an incentive to care, or something else? We seem to end up with a lot of critics who specialize in detecting a specific kind of flaw ("needs examples" guy, "reward is not the optimization target" guy, "categories aren't arbitrary" guy, &c.), with very limited reaction from authors or imitation by other potential critics.

Comment by Zack_M_Davis on jeffreycaruso's Shortform · 2024-03-13T16:21:25.055Z · LW · GW

I mean, I agree that there are psycho-sociological similarities between religions and the AI risk movement (and indeed, I sometimes pejoratively refer to the latter as a "robot cult"), but analyzing the properties of the social group that believes that AI is an extinction risk is a separate question from whether AI in fact poses an extinction risk, which one could call Armageddon. (You could spend vast amounts of money trying to persuade people of true things, or false things; the money doesn't care either way.)

Obviously, there's not going to be a "proof" of things that haven't happened yet, but there's lots of informed speculation. Have you read, say, "The Alignment Problem from a Deep Learning Perspective"? (That may not be the best introduction for you, depending on the reasons for your skepticism, but it's the one that happened to come to mind, which is more grounded in real AI research than previous informed speculation that had less empirical data to work from.)

Comment by Zack_M_Davis on My Clients, The Liars · 2024-03-06T17:53:22.333Z · LW · GW

Why are you working for the prosecutors?

This is a pretty reasonable question from the client's perspective! When I was in psychiatric prison ("hospital", they call it a "hospital") and tried to complain to the staff about the injustice of my confinement, I was told that I could call "patient's rights".

I didn't bother. If the staff wasn't going to listen, what was the designated complaint line going to do?

Later, I found out that patient's rights advocates apparently are supposed to be independent, and not just a meaningless formality. (Scott Alexander: "Usually the doctors hate them, which I take as a pretty good sign that they are actually independent and do their job.")

This was not at all obvious from the inside. I can only imagine a lot of criminal defendants have a similar experience. Defense attorneys are frustrated that their clients don't understand that they're trying to help—but that "help" is all within the rules set by the justice system. From the perspective of a client who doesn't think he did anything particularly wrong (whether or not the law agrees), the defense attorney is part of the system.

I think my intuition was correct to dismiss patient's rights as useless. I'm sure they believe that they're working to protect patients' interests, and would have been frustrated that I didn't appreciate that. But what I wanted was not redress of any particular mistreatment that the system recognized as mistreatment, but to be let out of psych jail—and on that count, I'm sure patient's rights would have told me that the evidence was harmful to my case. They were working for the doctors, not for me.

Comment by Zack_M_Davis on Many arguments for AI x-risk are wrong · 2024-03-05T18:29:47.325Z · LW · GW

I can’t address them all, but I [...] am happy to dismantle any particular argument

You can't know that in advance!!

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T07:48:59.289Z · LW · GW

IQ seems like the sort of thing Feynman could be "honestly" motivatedly wrong about. The thing I'm trying to point at is that Feynman seemingly took pride in being a straight talker, in contrast to how Yudkowsky takes pride in not lying.

These are different things. Straight talkers sometimes say false or exaggerated things out of sloppiness, but they actively want listeners to know their reporting algorithm. Prudently selecting which true sentences to report in the service of a covert goal is not lying, but it's definitely not straight talk.

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T04:42:55.467Z · LW · GW

Yes, that would be ridiculous. It would also be ridiculous in a broadly similar way if someone spent eight years in the prime of their life prosecuting a false advertising lawsuit against a "World's Best" brand ice-cream for not actually being the best in the world.

But if someone did somehow make that mistake, I could see why they might end up writing a few blog posts afterwards telling the Whole Dumb Story.

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-04T03:41:25.164Z · LW · GW

You are perhaps wiser than me. (See also footnote 20.)

Comment by Zack_M_Davis on Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles · 2024-03-02T22:07:41.212Z · LW · GW

(I think this is the best and most important post in the sequence; I suspect that many readers who didn't and shouldn't bother with the previous three posts, may benefit from this one.)

Comment by Zack_M_Davis on New LessWrong review winner UI ("The LeastWrong" section and full-art post pages) · 2024-02-28T07:12:17.531Z · LW · GW

I second the concern that using "LeastWrong" on the site grants undue legitimacy to the bad "than others" interpretation of the brand name (as contrasted to the intended "all models are wrong, but" meaning). "Best Of" is clear and doesn't distort the brand.

Comment by Zack_M_Davis on Less Wrong automated systems are inadvertently Censoring me · 2024-02-27T06:44:20.582Z · LW · GW

Would you agree with the statement that your meta-level articles are more karma-successful than your object-level articles? Because if that is a fair description, I see it as a huge problem.

I don't think this is a good characterization of my posts on this website.

If by "meta-level articles", you mean my philosophy of language work (like "Where to Draw the Boundaries?" and "Unnatural Categories Are Optimized for Deception"), I don't think success is a problem. I think that was genuinely good work that bears directly on the site's mission, independently of the historical fact that I had my own idiosyncratic ("object-level"?) reasons for getting obsessed with the philosophy of language in 2019–2020.[1]

If by "object-level articles", you mean my writing on my special-interest blog about sexology and gender, well, the overwhelming majority of that never got a karma score because it was never cross-posted to Less Wrong. (I only cross-post specific articles from my special-interest blog when I think they're plausibly relevant to the site's mission.)

If by "meta-level articles", you mean my recent memoir sequence which talks about sexology and the philosophy of language and various autobiographical episodes of low-stakes infighting among community members in Berkeley, California, well, those haven't been karma-successful: parts 1, 2, and 3 are currently[2] sitting at 0.35, 0.08 (!), and 0.54 karma-per-vote, respectively.

If by "meta-level articles", you mean posts that reply to other users of this website (such as "Contra Yudkowsky on Epistemic Conduct for Author Criticism" or "'Rationalist Discourse' Is Like 'Physicist Motors'"), I contest the "meta level" characterization. I think it's normal and not particularly meta for intellectuals to write critiques of each other's work, where Smith writes "Kittens are Cute", and Jones replies in "Contra Smith on Kitten Cuteness". Sure, it would be possible for Jones to write a broadly similar article, "Kittens Aren't Cute", that ignores Smith altogether, but I think that's often a worse choice, if the narrow purpose of Jones's article is to critique the specific arguments made by Smith, notwithstanding that someone else might have better arguments in favor of the Cute Kitten theory that have not been heretofore considered.

You're correct to notice that a lot of my recent work has a cult-infighting drama angle to it. (This is very explicit in the memoir sequence, but it noticeably leaks into my writing elsewhere.) I'm pretty sure I'm not doing it for the karma. I think I'm doing it because I'm disillusioned and traumatized from the events described in the memoir, and will hopefully get over it after I've got it all written down and out of my system.

There's another couple posts in that sequence (including this coming Saturday, probably). If you don't like it, I hereby encourage you to strong-downvote it. I write because I selfishly have something to say; I don't think I'm entitled to anyone's approval.


  1. In some of those posts, I referenced the work of conventional academics like Brian Skyrms and others, which I think provides some support for the notion that the nature of language and categories is a philosophically rich topic that someone might find significant in its own right, rather than being some sort of smokescreen for a hidden agenda. ↩︎

  2. Pt. 1 actually had a much higher score (over 100 points) shortly after publication, but got a lot of downvotes later after being criticized on Twitter. ↩︎

Comment by Zack_M_Davis on Communication Requires Common Interests or Differential Signal Costs · 2024-02-27T04:40:48.827Z · LW · GW

Personal whimsy. Probably don't read too much into it. (My ideology has evolved over the years such that I think a lot of the people who are trying to signal something with the generic feminine would not regard me as an ally, but I still love the æsthetic.)

Comment by Zack_M_Davis on Less Wrong automated systems are inadvertently Censoring me · 2024-02-23T16:29:32.524Z · LW · GW

Zack cannot convince us [...] if you disagree with him, that only proves his point

I don't think I'm doing this! It's true that I think it's common for apparent disagreements to be explained by political factors, but I think that claim is itself something I can support with evidence and arguments. I absolutely reject "If you disagree, that itself proves I'm right" as an argument, and I think I've been clear about this. (See the paragraph in "A Hill of Validity in Defense of Meaning" starting with "Especially compared to normal Berkeley [...]".)

If you're interested, I'm willing to write more words explaining my model of which disagreements with which people on which topics are being biased by which factors. But I get the sense that you don't care that much, and that you're just annoyed that my grudge against Yudkowsky and a lot of people in Berkeley is too easily summarized as being with an abstracted "community" that you also happen to be in even though this has nothing to do with you? Sorry! I'm not totally sure how to fix this. (It's useful to sometimes be able to talk about general cultural trends, and being specific about which exact sub-sub-clusters are and are not guilty of the behavior being criticized would be a lot of extra wordcount that I don't think anyone is interested in.)