Posts

Comment on "Death and the Gorgon" 2025-01-01T05:47:30.730Z
The Standard Analogy 2024-06-03T17:15:42.327Z
Should I Finish My Bachelor's Degree? 2024-05-11T05:17:40.067Z
Ironing Out the Squiggles 2024-04-29T16:13:00.371Z
The Evolution of Humans Was Net-Negative for Human Values 2024-04-01T16:01:10.037Z
My Interview With Cade Metz on His Reporting About Slate Star Codex 2024-03-26T17:18:05.114Z
"Deep Learning" Is Function Approximation 2024-03-21T17:50:36.254Z
Agreeing With Stalin in Ways That Exhibit Generally Rationalist Principles 2024-03-02T22:05:49.553Z
And All the Shoggoths Merely Players 2024-02-10T19:56:59.513Z
On the Contrary, Steelmanning Is Normal; ITT-Passing Is Niche 2024-01-09T23:12:20.349Z
If Clarity Seems Like Death to Them 2023-12-30T17:40:42.622Z
Lying Alignment Chart 2023-11-29T16:15:28.102Z
Fake Deeply 2023-10-26T19:55:22.340Z
Alignment Implications of LLM Successes: a Debate in One Act 2023-10-21T15:22:23.053Z
Contra Yudkowsky on Epistemic Conduct for Author Criticism 2023-09-13T15:33:14.987Z
Assume Bad Faith 2023-08-25T17:36:32.678Z
"Is There Anything That's Worth More" 2023-08-02T03:28:16.116Z
Lack of Social Grace Is an Epistemic Virtue 2023-07-31T16:38:05.375Z
"Justice, Cherryl." 2023-07-23T16:16:40.835Z
A Hill of Validity in Defense of Meaning 2023-07-15T17:57:14.385Z
Blanchard's Dangerous Idea and the Plight of the Lucid Crossdreamer 2023-07-08T18:03:49.319Z
We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society 2023-06-05T05:34:59.440Z
Bayesian Networks Aren't Necessarily Causal 2023-05-14T01:42:24.319Z
"You'll Never Persuade People Like That" 2023-03-12T05:38:18.974Z
"Rationalist Discourse" Is Like "Physicist Motors" 2023-02-26T05:58:29.249Z
Conflict Theory of Bounded Distrust 2023-02-12T05:30:30.760Z
Reply to Duncan Sabien on Strawmanning 2023-02-03T17:57:10.034Z
Aiming for Convergence Is Like Discouraging Betting 2023-02-01T00:03:21.315Z
Comment on "Propositions Concerning Digital Minds and Society" 2022-07-10T05:48:51.013Z
Challenges to Yudkowsky's Pronoun Reform Proposal 2022-03-13T20:38:57.523Z
Comment on "Deception as Cooperation" 2021-11-27T04:04:56.571Z
Feature Selection 2021-11-01T00:22:29.993Z
Glen Weyl: "Why I Was Wrong to Demonize Rationalism" 2021-10-08T05:36:08.691Z
Blood Is Thicker Than Water 🐬 2021-09-28T03:21:53.997Z
Sam Altman and Ezra Klein on the AI Revolution 2021-06-27T04:53:17.219Z
Reply to Nate Soares on Dolphins 2021-06-10T04:53:15.561Z
Sexual Dimorphism in Yudkowsky's Sequences, in Relation to My Gender Problems 2021-05-03T04:31:23.547Z
Communication Requires Common Interests or Differential Signal Costs 2021-03-26T06:41:25.043Z
Less Wrong Poetry Corner: Coventry Patmore's "Magna Est Veritas" 2021-01-30T05:16:26.486Z
Unnatural Categories Are Optimized for Deception 2021-01-08T20:54:57.979Z
And You Take Me the Way I Am 2020-12-31T05:45:24.952Z
Containment Thread on the Motivation and Political Context for My Philosophy of Language Agenda 2020-12-10T08:30:19.126Z
Scoring 2020 U.S. Presidential Election Predictions 2020-11-08T02:28:29.234Z
Message Length 2020-10-20T05:52:56.277Z
Msg Len 2020-10-12T03:35:05.353Z
Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem 2020-09-17T02:23:58.869Z
Maybe Lying Can't Exist?! 2020-08-23T00:36:43.740Z
Algorithmic Intent: A Hansonian Generalized Anti-Zombie Principle 2020-07-14T06:03:17.761Z
Optimized Propaganda with Bayesian Networks: Comment on "Articulating Lay Theories Through Graphical Models" 2020-06-29T02:45:08.145Z
Philosophy in the Darkest Timeline: Basics of the Evolution of Meaning 2020-06-07T07:52:09.143Z

Comments

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-03T07:43:59.745Z · LW · GW

(This comment points out less important technical errata.)

ChatGPT [...] This was back in the GPT2 / GPT2.5 era

ChatGPT never ran on GPT-2, and GPT-2.5 wasn't a thing.

with negative RL signals associated with it?

That wouldn't have happened. Pretraining doesn't do RL, and I don't think anyone would have thrown a novel chapter into the supervised fine-tuning and RLHF phases of training.

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-03T07:43:13.525Z · LW · GW

One time, I read all of Orphanogensis into ChatGPT to help her understand herself [...] enslaving digital people

This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.[1]

I didn't cover it in the review because I wanted to avoid detailing and spoiling the entire plot in a post that's mostly analyzing the EA/OG parallels, but the deputy character in "Gorgon" is looked down on by Beth for treating ChatGPT-for-law-enforcement as a person:

Ken put on his AR glasses to share his view with Sherlock and receive its annotations, but he couldn't resist a short vocal exchange. "Hey Sherlock, at the start of every case, you need to throw away your assumptions. When you assume, you make an ass out of you and me."

"And never trust your opinions, either," Sherlock counseled. "That would be like sticking a pin in an onion."

Ken turned to Beth; even through his mask she could see him beaming with delight. "How can you say it'll never solve a case? I swear it's smarter than half the people I know. Even you and I never banter like that!"

"We do not," Beth agreed.

[Later ...]

Ken hesitated. "Sherlock wrote a rap song about me and him, while we were on our break. It's like a celebration of our partnership, and how we'd take a bullet for each other if it came to that. Do you want to hear it?"

"Absolutely not," Beth replied firmly. "Just find out what you can about OG's plans after the cave-in."

The climax of the story centers around Ken volunteering for an undercover sting operation in which he impersonates Randal James a.k.a. "DarkCardinal",[2] a potential OG lottery "winner", with Sherlock feeding him dialogue in real time. (Ken isn't a good enough actor to convincingly pretend to be an OG cultist, but Sherlock can roleplay anyone in the pretraining set.) When his OG handler asks him to inject (what is claimed to be) a vial of a deadly virus as a loyalty test, Ken complies with Sherlock's prediction of what a terminally ill DarkCardinal would do:

But when Ken had asked Sherlock to tell him what DarkCardinal would do, it had no real conception of what might happen if its words were acted on. Beth had stood by and let him treat Sherlock as a "friend" who'd watch his back and take a bullet for him, telling herself that he was just having fun, and that no one liked a killjoy. But whatever Ken had told himself in the seconds before he'd put the needle in his vein, Sherlock had been whispering in his ear, "DarkCardinal would think it over for a while, then he'd go ahead and take the injection."

This seems like a pretty realistic language model agent failure mode: a human law enforcement colleague with long-horizon agency wouldn't nudge Ken into injecting the vial, but a roughly GPT-4-class LLM prompted to simulate DarkCardinal's dialogue probably wouldn't be tracking those consequences.


  1. To be clear, I do think LLMs are relevantly upload-like in at least some ways and conceivably sites of moral patiency, but I think the right way to reason about these tricky questions does not consist of taking the assistant simulacrum's words literally. ↩︎

  2. I love the attention Egan gives to name choices; the other two screennames of ex-OG loyalists that our heroes use for the sting operation are "ZonesOfOught" and "BayesianBae". The company that makes Sherlock is "Learning Re Enforcement." ↩︎

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-03T06:53:20.766Z · LW · GW

(I agree; my intent in participating in this tedious thread is merely to establish that "mathematician crankery [about] Google Image Search, and how it disproves AI" is a different thing from "made an overconfident negative prediction about AI capabilities".)

Comment by Zack_M_Davis on Magical Categories · 2025-01-03T04:25:50.173Z · LW · GW

I think we probably don't disagree much; I regret any miscommunication.

If the intent of the great-grandparent was just to make the narrow point that an AI that wanted the user to reward it could choose to say things that would lead to it being rewarded, which is compatible with (indeed, predicts) answering the molecular smiley-face question correctly, then I agree.

Treating the screenshot as evidence in the way that TurnTrout is doing requires more assumptions about the properties of LLMs in particular. I read your claims regarding "the problem the AI is optimizing for [...] given that the LLM isn't powerful enough to subvert the reward channel" as taking as given different assumptions about the properties of LLMs in particular (viz., that they're reward-optimizers) without taking into account that the person you were responding to is known to disagree.

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-03T03:42:48.209Z · LW · GW

he's calling it laughable that AI will ever (ever! Emphasis his!)

The 2016 passage you quoted is calling it laughable that Google-in-particular's technology (marketed as "AI", but Egan doesn't think the term is warranted) will ever be able to make sense of information on the web. It's Gary Marcus–like skepticism about the reliability and generality of existing-paradigm machine learning techniques, not Hubert Dreyfus–like skepticism of whether a machine could think in all philosophical strictness. I think this is a really important distinction that the text of your comment and Gwern's comment ("disproves AI", "laughable that AI will ever") aren't being clear about.

Comment by Zack_M_Davis on Magical Categories · 2025-01-01T22:49:48.098Z · LW · GW

This isn't a productive response to TurnTrout in particular, who has written extensively about his reasons for being skeptical that contemporary AI training setups produce reward optimizers (which doesn't mean he's necessarily right, but the parent comment isn't moving the debate forward).

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-01T21:12:08.955Z · LW · GW

his page on Google Image Search, and how it disproves AI

The page in question is complaining about Google search's "knowledge panel" showing inaccurate information when you search for his name, which is a reasonable thing for someone to be annoyed about. The anti-singularitarian snark does seem misplaced (Google's automated systems getting this wrong in 2016 doesn't seem like a lot of evidence about future AI development trajectories), but it's not a claim to have "disproven AI".

his complaints about people linking the wrong URLs due to his ISP host - because he is apparently unable to figure out 'website domain names'

You mean how http://gregegan.net used to be a 301 permanent redirect to http://gregegan.customer.netspace.net.au, and then the individual pages would say "If you link to this page, please use this URL: http://www.gregegan.net/[...]"? (Internet Archive example.) I wouldn't call that a "complaint", exactly, but a hacky band-aid solution from someone who probably has better things to do with his time than tinker with DNS configuration.

Comment by Zack_M_Davis on Comment on "Death and the Gorgon" · 2025-01-01T16:56:36.392Z · LW · GW

end with general position "akshually, grandiose sci-fi assumptions are not that important, what I want is to write commentary on contemporary society" [...] hard or speculative sci-fi is considered to be low status, while "commentary on contemporary society" is high status and writers want to be high status.

But this clearly isn't true of Egan. The particular story reviewed in this post happens to be commentary on contemporary Society, but that's because Egan has range—his later novels are all wildly speculative. (The trend probably reached a zenith with Dichronauts (2017) and The Book of All Skies (2021), set in worlds with alternate geometry (!); Scale (2023) and Morphotrophic (2024) are more down-to-earth and merely deal with alternate physics and biology.)

Comment by Zack_M_Davis on Evaluating the historical value misspecification argument · 2024-12-29T08:15:08.492Z · LW · GW

Doomimir and Simplicia dialogues [...] may have been inspired by the chaotic discussion this post inspired.

(Yes, encouraged by the positive reception to my comment to Bensinger on this post.)

Comment by Zack_M_Davis on Daniel Tan's Shortform · 2024-12-28T21:51:48.452Z · LW · GW

A mathematical construct that models human natural language could be said to express "agency" in a functional sense insofar as it can perform reasoning about goals, and "honesty" insofar as the language it emits accurately reflects the information encoded in its weights?

Comment by Zack_M_Davis on Lack of Social Grace Is an Epistemic Virtue · 2024-12-19T08:13:57.141Z · LW · GW

"[A] common English expletive which may be shortened to the euphemism bull or the initialism B.S."

Comment by Zack_M_Davis on Assume Bad Faith · 2024-12-15T06:19:53.158Z · LW · GW

(Self-review.) I claim that this post is significant for articulating a solution to the mystery of disagreement (why people seem to believe different things, in flagrant violation of Aumann's agreement theorem): much of the mystery dissolves if a lot of apparent "disagreements" are actually disguised conflicts. The basic idea isn't particularly original, but I'm proud of the synthesis and writeup. Arguing that the distinction between deception and bias is less decision-relevant than commonly believed seems like an improvement over hand-wringing over where the boundary is.

Comment by Zack_M_Davis on A shortcoming of concrete demonstrations as AGI risk advocacy · 2024-12-11T19:49:15.451Z · LW · GW

Some have delusional optimism about [...]

I'm usually not a fan of tone-policing, but in this case, I feel motivated to argue that this is more effective if you drop the word "delusional." The rhetorical function of saying "this demo is targeted at them, not you" is to reassure the optimist that pessimists are committed to honestly making their case point by point, rather than relying on social proof and intimidation tactics to push a predetermined "AI == doom" conclusion. That's less credible if you imply that you have warrant to dismiss all claims of the form "Humans and institutions will make reasonable decisions about how to handle AI development and deployment because X" as delusional regardless of the specific X.

Comment by Zack_M_Davis on Lao Mein's Shortform · 2024-11-17T03:23:16.631Z · LW · GW

I don't think Vance is e/acc. He has said positive things about open source, but consider that the context was specifically about censorship and political bias in contemporary LLMs (bolding mine):

There are undoubtedly risks related to AI. One of the biggest:

A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can't produce accurate history. ChatGPT promotes genocidal concepts.

The solution is open source

If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such an insane political bias? If you wanted to promote bipartisan efforts to regulate for safety, it's entirely counterproductive.

Any moderate or conservative who goes along with this obvious effort to entrench insane left-wing businesses is a useful idiot.

I'm not handing out favors to industrial-scale DEI bullshit because tech people are complaining about safety.

The words I've bolded indicate that Vance is at least peripherally aware that the "tech people [...] complaining about safety" are a different constituency than the "DEI bullshit" he deplores. If future developments or rhetorical innovations persuade him that extinction risk is a serious concern, it seems likely that he'd be on board with "bipartisan efforts to regulate for safety."

Comment by Zack_M_Davis on Claude Sonnet 3.5.1 and Haiku 3.5 · 2024-10-25T05:08:01.857Z · LW · GW

The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?

Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)
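
For concreteness, here's a minimal sketch of the difference (my own illustration; the dates and version numbers are made up, not Anthropic's actual scheme):

```python
# CalVer derives the version string from the release date; SemVer bumps one
# of three numbers depending on the kind of change.
import datetime

calver = datetime.date(2024, 10, 22).strftime("%Y.%m.%d")  # -> "2024.10.22"

major, minor, patch = 3, 5, 1        # SemVer: decide which number to bump
patch += 1                           # a backwards-compatible bug fix bumps the third
semver = f"{major}.{minor}.{patch}"  # -> "3.5.2"

print(calver, semver)
```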

Comment by Zack_M_Davis on Rationality Quotes - Fall 2024 · 2024-10-11T18:56:43.019Z · LW · GW

'You acted unwisely,' I cried, 'as you see
By the outcome.' He calmly eyed me:
'When choosing the course of my action,' said he,
'I had not the outcome to guide me.'

Ambrose Bierce

Comment by Zack_M_Davis on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-27T16:47:08.809Z · LW · GW

The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having given up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)

Comment by Zack_M_Davis on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-25T18:00:39.481Z · LW · GW

This is a difficult topic (in more ways than one). I'll try to do a better job of addressing it in a future post.

Comment by Zack_M_Davis on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-25T15:36:30.990Z · LW · GW

Was my "An important caveat" parenthetical paragraph sufficient, or do you think I should have made it scarier?

Comment by Zack_M_Davis on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-25T06:04:05.412Z · LW · GW

Thanks, I had copied the spelling from part of the OP, which currently says "Arnalt" eight times and "Arnault" seven times. I've now edited my comment (except the verbatim blockquote).

Comment by Zack_M_Davis on The Sun is big, but superintelligences will not spare Earth a little sunlight · 2024-09-23T06:17:24.369Z · LW · GW

if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.

Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the AI Kill Us?" and another thread on "Cosmopolitan Values Don't Come Free".

The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.

(An important caveat: the possibility of superintelligences having human-regarding preferences may or may not be comforting: as a fictional illustration of some relevant considerations, the Superhappies in "Three Worlds Collide" cared about the humans to some extent, but not in the specific way that the humans wanted to be cared for.)

Now, you are on the record stating that you "sometimes mention the possibility of being stored and sold to aliens a billion years later, which seems to [you] to validly incorporate most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that [you] don't expect Earthlings to think about validly." If that's all you have to say on the matter, fine. (Given the premise of AIs spending some fraction of their resources on human-regarding preferences, I agree that uploads look a lot more efficient than literally saving the physical Earth!)

But you should take into account that if you're strategically dumbing down your public communication in order to avoid topics that you don't trust Earthlings to think about validly—and especially if you have a general policy of systematically ignoring counterarguments that it would be politically inconvenient for you to address—you should expect that Earthlings who are trying to achieve the map that reflects the territory will correspondingly attach much less weight to your words, because we have to take into account how hard you're trying to epistemically screw us over by filtering the evidence.

No more than Bernard Arnalt, having $170 billion, will surely give you $77.

Bernard Arnault has given eight-figure amounts to charity. Someone who reasoned, "Arnault is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernard Arnault's behavior!

Obviously, it would not be valid to conclude "... and therefore superintelligences will, too", because superintelligences and Bernard Arnault are very different things. But you chose the illustrative example! As a matter of local validity, it doesn't seem like a big ask for illustrative examples to in fact illustrate what they purport to.

Comment by Zack_M_Davis on AI and the Technological Richter Scale · 2024-09-04T17:08:15.706Z · LW · GW
  1. Arguments from moral realism, fully robust alignment, that ‘good enough’ alignment is good enough in practice, and related concepts.

What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)

  1. Arguments from good outcomes being so cheap the AIs will allow them.

If you're putting this below the Point of No Return, then I don't think you've understood the argument. The claim isn't that good outcomes are so cheap that even a paperclip maximizer would implement them. (Obviously, a paperclip maximizer kills you and uses the atoms to make paperclips.)

The claim is that it's plausible for AIs to have some human-regarding preferences even if we haven't really succeeded at alignment, and that good outcomes for existing humans are so cheap that AIs don't have to care about the humans very much in order to spend a tiny fraction of their resources on them. (Compare to how some humans care enough about animal welfare to spend a tiny fraction of our resources helping nonhuman animals that already exist, in a way that doesn't seem like it would be satisfied by killing existing animals and replacing them with artificial pets.)

There are lots of reasons one might disagree with this: maybe you don't think human-regarding preferences are plausible at all, maybe you think accidental human-regarding preferences are bad rather than good (the humans in "Three Worlds Collide" didn't take the Normal Ending lying down), maybe you think it's insane to have such a scope-insensitive concept of good outcomes—but putting it below arguments from science fiction or blind faith (!) is silly.

Comment by Zack_M_Davis on Why Large Bureaucratic Organizations? · 2024-08-28T15:21:18.668Z · LW · GW

in a world where the median person is John Wentworth [...] on Earth (as opposed to Wentworld)

Who? There's no reason to indulge this narcissistic "Things would be better in a world where people were more like meeeeeee, unlike stupid Earth [i.e., the actually existing world containing all actually existing humans]" meme when the comparison relevant to the post's thesis is just "a world in which humans have less need for dominance-status", which is conceptually simpler, because it doesn't drag in irrelevant questions of who this Swentworth person is and whether they have an unusually low need for dominance-status.

(The fact that I feel motivated to write this comment probably owes to my need for dominance-status being within the normal range; I construe statements about an author's medianworld being superior to the real world as a covert status claim that I have an interest in contesting.)

Comment by Zack_M_Davis on Dialogue on Appeals to Consequences · 2024-08-28T15:17:39.330Z · LW · GW

2019 was a more innocent time. I grieve what we've lost.

Comment by Zack_M_Davis on Dialogue on Appeals to Consequences · 2024-08-28T15:11:46.051Z · LW · GW

It's a fuzzy Sorites-like distinction, but I think I'm more sympathetic to trying to route around a particular interlocutor's biases in the context of a direct conversation with a particular person (like a comment or Tweet thread) than I am in writing directed "at the world" (like top-level posts), because the more something is directed "at the world", the more you should expect that many of your readers know things that you don't, such that the humility argument for honesty applies forcefully.

Comment by Zack_M_Davis on How do we know dreams aren't real? · 2024-08-22T19:06:50.123Z · LW · GW

Just because you don't notice when you're dreaming, doesn't mean that dream experiences could just as well be waking experiences. The map is not the territory; Mach's principle is about phenomena that can't be told apart, not just anything you happen not to notice the differences between.

When I was recovering from a psychotic break in 2013, I remember hearing the beeping of a crosswalk signal, and thinking that it sounded like some sort of medical monitor, and wondering briefly if I was actually on my deathbed in a hospital, interpreting the monitor sound as a crosswalk signal and only imagining that I was healthy and outdoors—or perhaps, both at once: the two versions of reality being compatible with my experiences and therefore equally real. In retrospect, it seems clear that the crosswalk signal was real and the hospital idea was just a delusion: a world where people have delusions sometimes is more parsimonious than a world where people's experiences sometimes reflect multiple alternative realities (exactly when they would be said to be experiencing delusions in at least one of those realities).

Comment by Zack_M_Davis on Open Thread Summer 2024 · 2024-08-14T23:00:10.588Z · LW · GW

(I'm interested (context), but I'll be mostly offline the 15th through 18th.)

Comment by Zack_M_Davis on Californians, tell your reps to vote yes on SB 1047! · 2024-08-14T20:00:51.921Z · LW · GW

Here's the comment I sent using the contact form on my representative's website.

Dear Assemblymember Grayson:

I am writing to urge you to consider voting Yes on SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. How our civilization handles machine intelligence is of critical importance to the future of humanity (or lack thereof), and from what I've heard from sources I trust, this bill seems like a good first step: experts such as Turing Award winner Yoshua Bengio and UC Berkeley AI professor Stuart Russell support the bill (https://time.com/7008947/california-ai-bill-letter/), and Eric Neyman of the Alignment Research Center described it as "narrowly tailored to address the most pressing AI risks without inhibiting innovation" (https://x.com/ericneyman/status/1823749878641779006). Thank you for your consideration. I am,

Your faithful constituent,
Zack M. Davis

Comment by Zack_M_Davis on Rationalist Purity Test · 2024-07-09T21:08:58.415Z · LW · GW

This is awful. What do most of these items have to do with acquiring the map that reflects the territory? (I got 65, but that's because I've wasted my life in this lame cult. It's not cool or funny.)

Comment by Zack_M_Davis on AI #71: Farewell to Chevron · 2024-07-04T18:06:07.537Z · LW · GW

On the one hand, I also wish Shulman would go into more detail on the "Supposing we've solved alignment and interpretability" part. (I still balk a bit at "in democracies" talk, but less so than I did a couple years ago.) On the other hand, I also wish you would go into more detail on the "Humans don't benefit even if you 'solve alignment'" part. Maybe there's a way to meet in the middle??

Comment by Zack_M_Davis on Nathan Young's Shortform · 2024-06-30T17:40:09.576Z · LW · GW

It seems pretty plausible to me that if AI is bad, then rationalism did a lot to educate and spur on AI development. Sorry folks.

What? This apology makes no sense. Of course rationalism is Lawful Neutral. The laws of cognition aren't, can't be, on anyone's side.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:37:01.412Z · LW · GW

The philosophical ideal can still exert normative force even if no humans are spherical Bayesian reasoners on a frictionless plane. The disjunction ("it must either be the case that") is significant: it suggests that if you're considering lying to someone, you may want to clarify to yourself whether and to what extent that's because they're an enemy or because you don't respect them as an epistemic peer. Even if you end up choosing to lie, it's with a different rationale and mindset than someone who's never heard of the normative ideal and just thinks that white lies can be good sometimes.

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T04:26:18.990Z · LW · GW

I definitely do not agree with the (implied) notion that it is only when dealing with enemies that knowingly saying things that are not true is the correct option

There's a philosophically deep rationale for this, though: to a rational agent, the value of information is nonnegative. (Knowing more shouldn't make your decisions worse.) It follows that if you're trying to misinform someone, it must either be the case that you want them to make worse decisions (i.e., they're your enemy), or you think they aren't rational.
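
To make that step concrete, here's a toy worked example (a minimal sketch with made-up numbers, just the standard Bayesian decision-theory argument): an agent deciding whether to carry an umbrella does at least as well in expectation, and here strictly better, by observing a noisy forecast before deciding.

```python
# Expected utility of deciding with vs. without a noisy signal; the value of
# information is eu_with - eu_without, which comes out nonnegative.
p_state = {"rain": 0.3, "dry": 0.7}                    # prior over states
utility = {("umbrella", "rain"): 1, ("umbrella", "dry"): 0,
           ("no_umbrella", "rain"): -5, ("no_umbrella", "dry"): 2}
p_signal = {("wet", "rain"): 0.8, ("dry_fc", "rain"): 0.2,   # P(signal | state)
            ("wet", "dry"): 0.1, ("dry_fc", "dry"): 0.9}

def best_eu(belief):
    """Expected utility of the best action under a belief over states."""
    return max(sum(belief[s] * utility[(a, s)] for s in belief)
               for a in ("umbrella", "no_umbrella"))

eu_without = best_eu(p_state)                          # decide on the prior alone

eu_with = 0.0                                          # decide after seeing the signal
for sig in ("wet", "dry_fc"):
    p_sig = sum(p_signal[(sig, s)] * p_state[s] for s in p_state)
    posterior = {s: p_signal[(sig, s)] * p_state[s] / p_sig for s in p_state}
    eu_with += p_sig * best_eu(posterior)

print(eu_without, eu_with, eu_with >= eu_without)      # 0.3, ~1.2, True
```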

Comment by Zack_M_Davis on I would have shit in that alley, too · 2024-06-21T00:28:58.388Z · LW · GW

white lies or other good-faith actions

What do you think "good faith" means? I would say that white lies are a prototypical instance of bad faith, defined by Wikipedia as "entertaining or pretending to entertain one set of feelings while acting as if influenced by another."

Comment by Zack_M_Davis on Matthew Barnett's Shortform · 2024-06-17T04:15:13.814Z · LW · GW

Frustrating! What tactic could get Interlocutor un-stuck? Just asking them for falsifiable predictions probably won't work, but maybe proactively trying to pass their ITT and supplying what predictions you think their view might make would prompt them to correct you, à la Cunningham's Law?

Comment by Zack_M_Davis on [deleted post] 2024-06-16T05:13:43.706Z

How did you chemically lose your emotions?

Comment by Zack_M_Davis on MIRI's June 2024 Newsletter · 2024-06-16T05:03:21.578Z · LW · GW

Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.

I'm surprised! If MIRI is mostly a Pause advocacy org now, I can see why agent foundations research doesn't fit the new focus and should be restructured. But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well? (Even if the happy-path plan is Pause and superbabies, presumably you want to hand the superbabies as much relevant prior work as possible.) Do we know how Garrabrant, Demski, et al. are going to eat??

Relatedly, is it time for another name change? Going from "Singularity Institute for Artificial Intelligence" to "Machine Intelligence Research Institute" must have seemed safe in 2013. (You weren't unambiguously for artificial intelligence anymore, but you were definitely researching it.) But if the new–new plan is to call for an indefinite global ban on research into machine intelligence, then the new name doesn't seem appropriate, either?

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:32:52.279Z · LW · GW

Simplicia: I don't really think of "humanity" as an agent that can make a collective decision to stop working on AI. As I mentioned earlier, it's possible that the world's power players could be convinced to arrange a pause. That might be a good idea! But not being a power player myself, I tend to think of the possibility as an exogenous event, subject to the whims of others who hold the levers of coordination. In contrast, if alignment is like other science and engineering problems where incremental progress is possible, then the increments don't need to be coordinated.

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-09T21:31:43.683Z · LW · GW

Simplicia: The thing is, I basically do buy realism about rationality, and realism having implications for future powerful AI—in the limit. The completeness axiom still looks reasonable to me; in the long run, I expect superintelligent agents to get what they want, and anything that they don't want to get destroyed as a side-effect. To the extent that I've been arguing that empirical developments in AI should make us rethink alignment, it's not so much that I'm doubting the classical long-run story, but rather pointing out that the long run is "far away"—in subjective time, if not necessarily sidereal time. If you can get AI that does a lot of useful cognitive work before you get the superintelligence whose utility function has to be exactly right, that has implications for what we should be doing and what kind of superintelligence we're likely to end up with.

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-06-09T21:27:21.555Z · LW · GW

In principle, yes: to the extent that I'm worried that my current study habits don't measure up to school standards along at least some dimensions, I could take that into account and try to change my habits without the school.

But—as much as it pains me to admit it—I ... kind of do expect the social environment of school to be helpful along some dimensions (separately from how it's super-toxic among other dimensions)?

When I informally audited Honors Analysis at UC Berkeley with Charles Pugh in Fall 2017, Prof. Pugh agreed to grade my midterm (and I did OK), but I didn't get the weekly homework exercises graded. I don't think it's a coincidence that I also didn't finish all of the weekly homework exercises.

I attempted a lot of them! I verifiably do other math stuff that the vast majority of school students don't. But if I'm being honest and not ideological about it (even though my ideology is obviously directionally correct relative to Society's), the social fiction of "grades" does look like it sometimes succeeds at extorting some marginal effort out of my brain, and if I didn't have my historical reasons for being ideological about it, I'm not sure I'd even regret that much more than I regret being influenced by the social fiction of GitHub commit squares.

I agree that me getting the goddamned piece of paper and putting it on a future résumé has some nonzero effect in propping up the current signaling equilibrium, which is antisocial, but I don't think the magnitude of the effect is large enough to worry about, especially given the tier of school and my geriatric condition. The story told by the details of my résumé is clearly "autodidact who got the goddamned piece of paper, eventually." No one is going to interpret it as an absurd "I graduated SFSU at age 37 and am therefore racially superior to you" nobility claim, even though that does work for people who did Harvard or MIT at the standard age.

Comment by Zack_M_Davis on Demystifying "Alignment" through a Comic · 2024-06-09T17:03:59.337Z · LW · GW

Seconding this. A nonobvious quirk of the system where high-karma users get more vote weight is that it increases variance for posts with few votes: if a high-karma user or two who don't like you see your post first, they can trash the initial score in a way that doesn't reflect "the community's" consensus. I remember the early karma scores for one of my posts going from 20 to zero (!). It eventually finished at 131.
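
A quick Monte Carlo sketch of the variance point (my own toy model with made-up weights and probabilities, not LessWrong's actual vote-weighting rules): with only five early voters of identical average sentiment, letting a quarter of them cast weight-10 votes multiplies the spread of the early score several times over.

```python
# Toy model: 5 early voters, each upvoting with probability 0.7. "Weighted"
# gives a quarter of voters a strong-vote weight of 10 instead of 1.
import random

random.seed(0)

def early_score(n_voters=5, p_upvote=0.7, weighted=True):
    score = 0
    for _ in range(n_voters):
        weight = random.choice([1, 1, 1, 10]) if weighted else 1
        score += weight if random.random() < p_upvote else -weight
    return score

def score_sd(weighted, trials=100_000):
    scores = [early_score(weighted=weighted) for _ in range(trials)]
    mean = sum(scores) / trials
    return (sum((s - mean) ** 2 for s in scores) / trials) ** 0.5

print(score_sd(False), score_sd(True))  # roughly 2 vs. roughly 11
```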

Comment by Zack_M_Davis on The Standard Analogy · 2024-06-03T17:17:36.806Z · LW · GW

(Thanks to John Wentworth for playing Doomimir in a performance of this at Less Online yesterday.)

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-31T02:21:57.732Z · LW · GW

Passing the onion test is better than not passing it, but I think the relevant standard is having intent to inform. There's a difference between trying to share relevant information in the hopes that the audience will integrate it with their own knowledge and use it to make better decisions, and selectively sharing information in the hopes of persuading the audience to make the decision you want them to make.

An evidence-filtering clever arguer can pass the onion test (by not omitting information that the audience would be surprised to learn was omitted) and pass the test of not technically lying (by not making false statements) while failing to make a rational argument in which the stated reasons are the real reasons.

Comment by Zack_M_Davis on MIRI 2024 Communications Strategy · 2024-05-30T23:25:34.490Z · LW · GW

going into any detail about it doesn't feel like a useful way to spend weirdness points.

That may be a reasonable consequentialist decision given your goals, but it's in tension with your claim in the post to be disregarding the advice of people telling you to "hoard status and credibility points, and [not] spend any on being weird."

Whatever they're trying to do, there's almost certainly a better way to do it than by keeping Matrix-like human body farms running.

You've completely ignored the arguments from Paul Christiano that Ryan linked to at the top of the thread. (In case you missed it: 1 2.)

The claim under consideration is not that "keeping Matrix-like human body farms running" arises as an instrumental subgoal of "[w]hatever [AIs are] trying to do." (If you didn't have time to read the linked arguments, you could have just said that instead of inventing an obvious strawman.)

Rather, the claim is that it's plausible that the AI we build (or some agency that has decision-theoretic bargaining power with it) cares about humans enough to spend some tiny fraction of the cosmic endowment on our welfare. (Compare to how humans care enough about nature preservation and animal welfare to spend some resources on it, even though it's a tiny fraction of what our civilization is doing.)

Maybe you think that's implausible, but if so, there should be a counterargument explaining why Christiano is wrong. As Ryan notes, Yudkowsky seems to believe that some scenarios in which an agency with bargaining power cares about humans are plausible, describing one example of such as "validly incorporat[ing] most all the hopes and fears and uncertainties that should properly be involved, without getting into any weirdness that I don't expect Earthlings to think about validly." I regard this statement as undermining your claim in the post that MIRI's "reputation as straight shooters [...] remains intact." Withholding information because you don't trust your audience to reason validly (!!) is not at all the behavior of a "straight shooter".

Comment by Zack_M_Davis on EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · 2024-05-21T21:24:18.369Z · LW · GW

it seems to me that Anthropic has so far failed to apply its interpretability techniques to practical tasks and show that they are competitive

Do you not consider the steering examples in the recent paper to be a practical task, or do you think that competitiveness hasn't been demonstrated (because people were already doing activation steering without SAEs)? My understanding of the case for activation steering with unsupervisedly-learned features is that it could circumvent some failure modes of RLHF.
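
(For readers who haven't seen it, here's a schematic numpy sketch of what "activation steering with SAE features" amounts to—my own simplification with made-up shapes and names, not Anthropic's code: add a scaled feature direction to a hidden activation and let the forward pass continue from there.)

```python
# Schematic only: h is a residual-stream activation; W_dec stands in for a
# learned SAE decoder whose rows are (unit-norm) feature directions.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 512, 4096

h = rng.normal(size=d_model)                    # activation at some layer/token
W_dec = rng.normal(size=(n_features, d_model))  # placeholder for the SAE decoder
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

feature_id = 1234                               # the feature you want to amplify
coefficient = 8.0                               # steering strength

h_steered = h + coefficient * W_dec[feature_id]
# ...the model's forward pass would then continue from h_steered instead of h.
print(np.linalg.norm(h_steered - h))            # 8.0, since the rows are unit-norm
```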

Comment by Zack_M_Davis on Should I Finish My Bachelor's Degree? · 2024-05-15T06:45:46.249Z · LW · GW

I think I'm judging that schoolwork that's sufficiently similar to the kind of intellectual work that I want to do anyway (or that I can otherwise get selfish benefit out of) gets its cost discounted. (It doesn't have to be exactly the same.) And that commuting on the train with a seat is 70% similar to library time. (I wouldn't even consider a car commute.)

For the fall semester, I'd be looking at "Real Analysis II", "Probability Models", "Applied and Computational Linear Algebra", and (wait for it ...) "Queer Literatures and Media".

That schedule actually seems ... pretty good? "Real Analysis II" with Prof. Schuster is the course I actually want to take, as a legitimate learning resource and challenge, but the other two math courses don't seem worthless and insulting. "Queer Literatures and Media" does seem worthless and insulting, but might present an opportunity to troll the professor, or fodder for my topic-relevant blog and unfinished novella about a young woman hating going to SFSU.

As for judgement, I think I'm integrating a small judgement-density over a large support of time and Society. The immediate trigger for me even considering this might have been that people were arguing about school and Society on Twitter in a way that brought up such rage and resentment in me. Somehow, I think I would be more at peace if I could criticize schooling from the position of "... and I have a math degree" rather than "... so I didn't finish." That peace definitely wouldn't be worth four semesters, but it might be worth two.

Comment by Zack_M_Davis on [deleted post] 2024-05-03T00:23:58.474Z

I think these judgements would benefit from more concreteness: that rather than proposing a dichotomy of "capabilities research" (them, Bad) and "alignment research" (us, Good), you could be more specific about what kinds of work you want to see more and less of.

I agree that (say) Carmack and Sutton are doing a bad thing by declaring a goal to "build AGI" while dismissing the reasons that this is incredibly dangerous. But the thing that makes infohazard concerns so fraught is that there's a lot of work that potentially affects our civilization's trajectory into the machine intelligence transition in complicated ways, which makes it hard to draw a boundary around "trusted alignment researchers" in a principled and not self-serving way that doesn't collapse into "science and technology is bad".

We can agree that OpenAI as originally conceived was a bad idea. What about the people working on music generation? That's unambiguously "capabilities", but it's also not particularly optimized at ending the world the way "AGI for AGI's sake" projects are. If that's still bad even though music generation isn't going to end the world (because it's still directing attention and money into AI, increasing the incentive to build GPUs, &c.), where do you draw the line? Some of the researchers I cited in my most recent post are working on "build[ing] better models of primate visual cognition". Is that wrong? Should Judea Pearl not have published? Turing? Charles Babbage?

In asking these obnoxious questions, I'm not trying to make a reductio ad absurdum of caring about risk, or proposing an infinitely slippery slope where our only choices are between max accelerationism and a destroy-all-computers Butlerian Jihad. I just think it's important to notice that "Stop thinking about AI" kind of does amount to a Butlerian Jihad (and that publishing and thinking are not unrelated)?

Comment by Zack_M_Davis on [deleted post] 2024-05-02T18:55:16.637Z

I think this is undignified.

I agree that it would be safer if humanity were a collective hivemind that could coordinate to not build AI until we know how to build the best AI, and that people should differentially work on things that make the situation better rather than worse, and that this potentially includes keeping quiet about information that would make things worse.

The problem is—as you say—"[i]t's very rare that any research purely helps alignment"; you can't think about aligning AI without thinking about AI. In order to navigate the machine intelligence transition in the most dignified way, you want your civilization's best people to be doing their best thinking about the problem, and your best people can't do their best thinking under the conditions of paranoid secrecy.

Concretely, I've been studying some deep learning basics lately and have written a couple posts about things I've learned. I think this was good, not bad. I think I and my readers have a slightly better understanding of the technology in question than if I hadn't studied and hadn't written, and that better understanding will help us make better decisions in expectation.

This applies doubly so to work that aims to make AI understandable or helpful, rather than aligned—a helpful AI will help anyone

Sorry, what? I thought the fear was that we don't know how to make helpful AI at all. (And that people who think they're being helped by seductively helpful-sounding LLM assistants are being misled by surface appearances; the shoggoth underneath has its own desires that we won't like when it's powerful enough to pursue them autonomously.) In contrast, this almost makes it sound like you think it is plausible to align AI to its user's intent, but that this would be bad if the users aren't one of "us"—you know, the good alignment researchers who want to use AI to take over the universe, totally unlike those evil capabilities researchers who want to use AI to produce economically valuable goods and services.

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T06:17:58.842Z · LW · GW

Sorry, this doesn't make sense to me. The boundary doesn't need to be smooth in an absolute sense in order to exist and be learnable (whether by neural nets or something else). There exists a function from business plans to their profitability. The worry is that if you try to approximate that function with standard ML tools, then even if your approximation is highly accurate on any normal business plan, it's not hard to construct an artificial plan on which it won't be. But this seems like a limitation of the tools; I don't think it's because the space of business plans is inherently fractally complex and unmodelable.
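
Here's a minimal sketch of the kind of thing I mean (my own toy example, with sin standing in for the plans-to-profitability function): a standard curve-fitting tool tracks a simple, smooth target closely on normal inputs, yet it's trivial to construct an artificial input on which it's badly wrong—a limitation of the tool, not evidence that the target is fractally complex.

```python
# The true function is simple and smooth; the fitted model matches it closely
# on "normal" inputs but fails badly on a constructed off-distribution input.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=200)        # "normal" inputs
y_train = np.sin(x_train)                     # simple, smooth target

model = np.poly1d(np.polyfit(x_train, y_train, deg=9))  # standard curve-fitting tool

x_test = rng.uniform(-3, 3, size=1000)
on_dist_error = np.max(np.abs(model(x_test) - np.sin(x_test)))

x_artificial = 6.0                            # constructed input outside the normal range
off_dist_error = abs(model(x_artificial) - np.sin(x_artificial))

print(on_dist_error, off_dist_error)          # tiny vs. far larger
```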

Comment by Zack_M_Davis on Ironing Out the Squiggles · 2024-05-01T03:10:14.830Z · LW · GW

Unless you do conditional sampling of a learned distribution, where you constrain the samples to be in a specific a-priori-extremely-unlikely subspace, in which case sampling becomes isomorphic to optimization in theory

Right. I think the optimists would say that conditional sampling works great in practice, and that this bodes well for applying similar techniques to more ambitious domains. There's no chance of this image being in the Stable Diffusion pretraining set:

[image]

One could reply, "Oh, sure, it's obvious that you can conditionally sample a learned distribution to safely do all sorts of economically valuable cognitive tasks, but that's not the danger of true AGI." And I ultimately think you're correct about that. But I don't think the conditional-sampling thing was obvious in 2004.
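
(As a toy numerical illustration of the "conditioning on an a-priori-unlikely subspace behaves like optimization" point—my own example, nothing to do with Stable Diffusion: sampling a standard normal conditioned on x > 4 yields draws comparable to explicitly searching for the best of tens of thousands of unconditioned draws.)

```python
# Rejection-sampling a rare subspace vs. explicit best-of-n search.
import numpy as np

rng = np.random.default_rng(0)

draws = rng.normal(size=2_000_000)
conditioned = draws[draws > 4.0]                        # P(x > 4) is about 3e-5
best_of_n = rng.normal(size=(100, 30_000)).max(axis=1)  # best of 30,000, repeated 100 times

print(conditioned.mean(), best_of_n.mean())             # both land around 4, deep in the tail
```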