On plans for a functional society 2023-12-12T00:07:46.629Z
Secondary Risk Markets 2023-12-11T21:52:46.836Z
Vaniver's thoughts on Anthropic's RSP 2023-10-28T21:06:07.323Z
Truthseeking, EA, Simulacra levels, and other stuff 2023-10-27T23:56:49.198Z
More or Fewer Fights over Principles and Values? 2023-10-15T21:35:31.834Z
Long-Term Future Fund: April 2023 grant recommendations 2023-08-02T07:54:49.083Z
A Social History of Truth 2023-07-31T22:49:23.209Z
Frontier Model Security 2023-07-26T04:48:02.215Z
Bengio's FAQ on Catastrophic AI Risks 2023-06-29T23:04:49.098Z
Weight by Impact 2023-05-21T14:37:58.187Z
Recommendation: Bug Bounties and Responsible Disclosure for Advanced ML Systems 2023-02-17T20:11:39.255Z
Prediction Markets for Science 2023-01-02T17:55:12.808Z
Systems of Survival 2022-12-09T05:13:53.064Z
Notes on Notes on the Synthesis of Form 2022-10-06T02:36:08.595Z
A Pattern Language For Rationality 2022-07-05T19:08:49.783Z
Vaniver's ELK Submission 2022-03-28T21:14:37.019Z
Dual use of artificial-intelligence-powered drug discovery 2022-03-15T02:52:37.154Z
How satisfied should you expect to be with your partner? 2022-02-22T23:27:41.866Z
2020 Review Article 2022-01-14T04:58:02.456Z
The Debtor's Revolt 2021-12-26T19:32:32.980Z
2020 Review: The Discussion Phase 2021-12-15T01:12:44.746Z
[Lecture Club] Awakening from the Meaning Crisis 2021-03-08T15:22:22.626Z
Alex Irpan: "My AI Timelines Have Sped Up" 2020-08-19T16:23:25.348Z
Property as Coordination Minimization 2020-08-04T19:24:15.759Z
Rereading Atlas Shrugged 2020-07-28T18:54:45.272Z
A reply to Agnes Callard 2020-06-28T03:25:27.378Z
Public Positions and Private Guts 2020-06-26T23:00:52.838Z
How alienated should you be? 2020-06-14T15:55:24.043Z
Outperforming the human Atari benchmark 2020-03-31T19:33:46.355Z
Mod Notice about Election Discussion 2020-01-29T01:35:53.947Z
Circling as Cousin to Rationality 2020-01-01T01:16:42.727Z
Self and No-Self 2019-12-29T06:15:50.192Z
T-Shaped Organizations 2019-12-16T23:48:13.101Z
ialdabaoth is banned 2019-12-13T06:34:41.756Z
The Bus Ticket Theory of Genius 2019-11-23T22:12:17.966Z
Vaniver's Shortform 2019-10-06T19:34:49.931Z
Vaniver's View on Factored Cognition 2019-08-23T02:54:00.915Z
Conversation on forecasting with Vaniver and Ozzie Gooen 2019-07-30T11:16:58.633Z
Commentary On "The Abolition of Man" 2019-07-15T18:56:27.295Z
Is there a guide to 'Problems that are too fast to Google'? 2019-06-17T05:04:39.613Z
Steelmanning Divination 2019-06-05T22:53:54.615Z
Public Positions and Private Guts 2018-10-11T19:38:25.567Z
Maps of Meaning: Abridged and Translated 2018-10-11T00:27:20.974Z
Compact vs. Wide Models 2018-07-16T04:09:10.075Z
Thoughts on AI Safety via Debate 2018-05-09T19:46:00.417Z
Turning 30 2018-05-08T05:37:45.001Z
My confusions with Paul's Agenda 2018-04-20T17:24:13.466Z
LW Migration Announcement 2018-03-22T02:18:19.892Z
LW Migration Announcement 2018-03-22T02:17:13.927Z
Leaving beta: Voting on moving to 2018-03-11T23:40:26.663Z


Comment by Vaniver on Read the Roon · 2024-03-05T19:48:14.977Z · LW · GW

What would be a better framing?

I talk about something related in self and no-self; the outward-flowing 'attempt to control' and the inward-flowing 'attempt to perceive' are simultaneously in conflict (something being still makes it easier to see where it is, but also makes it harder to move it to where it should be) and mutually reinforcing (being able to tell where something is makes it easier to move it precisely where it needs to be).

Similarly, you can make an argument that control without understanding is impossible, that getting AI systems to do what we want is one task instead of two. I think I agree the "two progress bars" frame is incorrect but I think the typical AGI developer at a lab is not grappling with the philosophical problems behind alignment difficulties, and is trying to make something that 'works at all' instead of 'works understandably' in the sort of way that would actually lead to understanding which would enable control.

Comment by Vaniver on Vaniver's Shortform · 2024-03-01T19:52:21.418Z · LW · GW

Spoiler-free Dune review, followed by spoilery thoughts: Dune part 1 was a great movie; Dune part 2 was a good movie. (The core strengths of the first movie were 1) fantastic art and 2) fidelity to the book; the second movie doesn't have enough new art to carry its runtime and is stuck in a less interesting part of the plot, IMO, and one where the limitations of being a movie are more significant.)

Dune-the-book is about a lot of things, and I read it as a child, so it holds extra weight in my mind compared to other scifi that I came across when fully formed. One of the ways I feel sort-of-betrayed by Dune is that a lot of the things are fake or bad on purpose; the sandworms are biologically implausible; the ecology of Dune (one of the things it's often lauded for!) is a cruel trick played on the Fremen (see if you can figure it out, or check the next spoiler block for why); the faith-based power of the Fremen warriors is a mirage; the Voice seems implausible; and so on.

The sandworms, the sole spice-factories in the universe (itself a crazy setting detail, but w/e), are killed by water, and so can only operate in deserts. In order to increase spice production, more of Dune has to be turned into a desert. How is that achieved? By having human caretakers of the planet who believe in a mercantilist approach to water--the more water you have locked away in reservoirs underground, the richer you are. As they accumulate water, the planet dries out, the deserts expand, and the process continues. And even if some enterprising smuggler decides to trade water for spice, the Fremen will just bury the water instead of using it to green the planet.

But anyway, one of the things that Dune-the-book got right is that a lot of the action is mental, and that a lot of what differentiates people is perceptual abilities. Some of those abilities are supernatural--the foresight enabled by spice being the main example--but are exaggerations of real abilities. It is possible to predict things about the world, and Dune depicts the predictions as, like, possibilities seen from a hill, with other hills and mountains blocking the view, in a way that seems pretty reminiscent of Monte Carlo tree search. This is very hard to translate to a movie! They don't do any better a job of depicting Paul searching thru futures than Marvel did of Doctor Strange searching thru futures, and the climactic fight is a knife battle between a partial precog and a full precog, which is worse than the fistfight in Sherlock Holmes (2009).

And I think this had them cut one of my favorite things from the book, which was sort of load-bearing to the plot. Namely, Hasimir Fenring, a minor character who has a pivotal moment in the final showdown between Paul and the Emperor after being introduced earlier. (They just don't have that moment.)

Why do I think he's so important? (For those who haven't read the book recently, he's the emperor's friend, from one of the bloodlines the Bene Gesserit are cultivating for the Kwisatz Haderach, and the 'mild-mannered accountant' sort of assassin.)

The movie does successfully convey that the Bene Gesserit have options. Not everything is riding on Paul. They hint that Paul being there means that the others are close; Feyd talks about his visions, for example.

But I think there's, like, a point maybe familiar from thinking about AI takeoff speeds / conquest risk, which is: when the first AGI shows up, how sophisticated will the rest of the system be? Will it be running on near-AGI software systems, or legacy systems that are easy to disrupt and replace?

In Dune, with regards to the Kwisatz Haderach, it's near-AGI. Hasimir Fenring could kill Paul if he wanted to, even after Paul awakes as KH, even after Paul's army beats the Sardaukar and he reaches the emperor! Paul gets this, Paul gets Hasimir's lonely position and sterility, and Paul is empathetic towards him; Hasimir can sense Paul's empathy and they have, like, an acausal bonding moment, and so Hasimir refuses the Emperor's request to kill Paul. Paul is, in some shared sense, the son he couldn't have and wanted to.

One of the other subtler things here is--why is Paul so constrained? The plot involves literal wormriding, I think, in part as a metaphor for riding historical movements. Paul can get the worship of the Fremen--but they decide what that means, not him, and they decide it means holy war across the galaxy. Paul wishes it could be anything else, but doesn't see how to change it. I think one of the things preventing him from changing it is the presence of other powerful opposition, where any attempt to soften his movement will be exploited.

Jumping back to a review of the movie (instead of just their choices about the story shared by movie and book), the way it handles the young skeptic vs. old believer Fremen dynamic seems... clumsy? Like "well, we're making this movie in 2024, we have to cater to audience sensibilities". Paul mansplains sandwalking to Chani, in a moment that seems totally out of place, and intended to reinforce the "this is a white guy where he doesn't belong" narrative that clashes with the rest of the story. (Like, it only makes sense as him trolling his girlfriend, which I think is not what it's supposed to be / how it's supposed to be interpreted?) He insists that he's there to learn from the Fremen / the planet is theirs, but whether this is a cynical bid for their loyalty or his true feeling is unclear. (Given him being sad about the holy war bit, you'd think that sadness might bleed over into what the Fremen want from him more generally.) Chani is generally opposed to viewing him as a prophet / his more power-seeking moves, and is hopefully intended as a sort of audience stand-in; rooting for Paul but worried about what he's becoming. But the movie is about the events that make up Paul's campaign against the Harkonnen, not the philosophy or how anyone feels about it at more than a surface level.

Relatedly, Paul blames Jessica for fanning the flames of fanaticism, but this doesn't engage with that this is what works on them, or that it's part of the overall narrow-path-thru. In general, Paul seems to do a lot of "being sad about doing the harmful thing, but not in a way that stops him from doing the harmful thing", which... self-awareness is not an excuse?

Comment by Vaniver on Elon files grave charges against OpenAI · 2024-03-01T19:03:16.610Z · LW · GW

I think open source AI development is bad for humanity, and think one of the good things about the OpenAI team is that they seem to have realized this (tho perhaps for the wrong reasons).


I am curious about the counterfactual where the original team had realized being open was a mistake from the beginning (let's call that hypothetical project WindfallAI, or whatever, after their charter clause). Would Elon not have funded it? Would some founders (or early employees) have decided not to join?

Comment by Vaniver on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-17T01:53:10.588Z · LW · GW

It doesn't present or consider any evidence for the alternatives. 

So, in the current version of the post (which is edited from the original) Roko goes thru the basic estimate of "probability of this type of virus, location, and timing" given spillover and lab leak, and discounts other evidence in this paragraph:

These arguments are fairly robust to details about specific minor pieces of evidence or analyses. Whatever happens with all the minor arguments about enzymes and raccoon dogs and geospatial clustering, you still have to explain how the virus found its way to the place that got the first BSL-4 lab and the top Google hits for "Coronavirus China", and did so in slightly less than 2 years after the lifting of the moratorium on gain-of-function research. And I don't see how you can explain that other than that covid-19 escaped from WIV or a related facility in Wuhan.

I don't think that counts as presenting it, but I do think that counts as considering it. I think it's fine to question whether or not the arguments are robust to those details--I think they generally are and have not been impressed by any particular argument in favor of zoonosis that I've seen, mostly because I don't think they properly estimate the probability under both hypotheses[1]--but I don't think it's the case that Roko is clearly making procedural errors here. [It seems to me like you're arguing he's making procedural errors instead of just coming to the wrong conclusion / using the wrong numbers, and so I'm focusing on that as the more important point.]

If it's not a lot of evidence

This is what numbers are for. Is "1000-1" a lot? Is it tremendous? Who cares about fuzzy words when the number 1000 is right there. (I happen to think 1000-1 is a lot but is not tremendous.)


  1. ^

    For example, the spatial clustering analysis suggests that the first major transmission event was at the market. But does their model explicitly consider both "transfer from animal to many humans at the market" and "transfer from infected lab worker to many humans at the market" and estimate probabilities for both? I don't think so, and I think that means it's not yet in a state where it can be plugged into the full Bayesian analysis. I think you need to multiply the probability that it was from the lab times the first lab-worker superspreader event happening at the market and compare that to the probability that it was from an animal times the first animal-human superspreader event happening at the market, and then you actually have some useful numbers to compare.
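The two-branch comparison in the footnote can be sketched in a few lines. All the numbers below are placeholders chosen for illustration, not estimates from the post or the debate:

```python
# Sketch of the footnote's point: to use "first superspreader event was
# at the market" as evidence, you need its probability under BOTH
# hypotheses, not just one. Every number here is a placeholder.

p_lab = 0.5                  # prior: lab origin (placeholder)
p_zoo = 0.5                  # prior: zoonotic origin (placeholder)

# Probability the first superspreader event lands at the market,
# conditional on each hypothesis (placeholders):
p_market_given_lab = 0.05    # an infected lab worker seeds the market
p_market_given_zoo = 0.30    # an infected animal is sold at the market

# Posterior odds of lab vs. zoonosis after observing the market cluster:
posterior_odds = (p_lab * p_market_given_lab) / (p_zoo * p_market_given_zoo)
print(posterior_odds)
```

The point isn't the particular numbers; it's that an analysis which only reports how well the market cluster fits the zoonosis branch, without the matching estimate for the lab branch, can't be plugged into the full Bayesian comparison.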

Comment by Vaniver on CFAR Takeaways: Andrew Critch · 2024-02-15T22:36:02.520Z · LW · GW

"I already tried this and it didn't work."

Comment by Vaniver on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-15T22:33:40.908Z · LW · GW

This post expresses a tremendous amount of certainty, and the mere fact that debate was stifled cannot possibly demonstrate that the stifled side is actually correct.

Agreed on the second half, and disagreed on the first. Looking at the version history, the first version of this post clearly identifies its core claims as Roko's beliefs and as the lab as being the "likely" origin, and those sections seem unchanged to today. I don't think that counts as tremendous certainty. Later, Roko estimates the difference in likelihoods between two hypotheses as being 1000:1, but this is really not a tremendous amount either.

What do you wish he had said instead of what he actually said?

It was terrible, and likely backfired, but that isn't "the crime of the century" being referenced, that would be the millions of dead people. 

As I clarify in a comment elsewhere, I think we should treat them as being roughly equally terrible. If we would execute someone for accidentally killing millions of people, I think we should also execute them for destroying evidence that they accidentally killed millions of people, even if it turns out they didn't do it.

My weak guess is Roko is operating under a similar strategy and not being clear enough on the distinction between the two halves of "they likely did it and definitely covered it up". Like, the post title begins with "Brute Force Manufactured Consensus", which he feels strongly about in this case because of the size of the underlying problem, but I think it's also pretty clear he is highly opposed to the methodology.

Comment by Vaniver on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-15T22:13:12.379Z · LW · GW

There are two ways I can read this.

I mean a third way, which is that covering up or destroying evidence of X should have a penalty of roughly the same severity as X. (Like, you shouldn't assume they covered it up, you should require evidence that they covered it up.)

I feel like this is jumping to the conclusion that they're gullible

I think you're pushing my statement further than it goes. Not everyone in a group has to be gullible for the social consensus of the group to be driven by gullibility, and manufactured consensus itself doesn't require gullibility. (My guess is that more people are complicit than gullible, and more people are refusing-to-acknowledge ego-harmful possibilities than clear-mindedly setting out to deceive the public.)

To elaborate on my "courtier's reply" comment, and maybe shine some light on 'gullibility', it seems to me like most religions maintain motive force thru manufactured consensus. I think if someone points that out--"our prior should be that this religion is false and propped up by motivated cognition and dysfunctional epistemic social dynamics"--and someone else replies with "ah, but you haven't engaged with all of the theological work done by thinkers about that religion", I think the second reply does not engage with the question of what our prior should be. I think we should assume religions are false by default, while being open to evidence.

I think similarly the naive case is that lab leak is substantially more likely than zoonosis, but not so overwhelmingly that there couldn't be enough evidence to swing things back in favor of zoonosis. If that was the way the social epistemology had gone--people thought it was the lab, there was a real investigation and the lab was cleared--then I would basically believe the consensus and think the underlying process was valid.

Comment by Vaniver on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-15T01:29:02.504Z · LW · GW

So, from my perspective there are two different issues, one epistemic, and the other game-theoretic.

From the epistemic perspective, I would like to know (as part of a general interest in truth) what the true source of the pandemic was.

From the game-theoretic perspective, I think we have sufficiently convincing evidence that someone attempted to cover up the possibility that they were the source of the pandemic. (I think Roko's post doesn't include as much evidence as it could: he points to the Lancet article but not the part of it that's calling lab leak a conspiracy theory, he doesn't point to the released email discussions, etc.) I think the right strategy is to assume guilt in the presence of a coverup, because then someone who is genuinely uncertain as to whether or not they caused the issue is incentivized to cooperate with investigations instead of obstruct them.

That is, even if further investigation shows that COVID did not originate from WIV, I still think it's a colossal crime to have dismissed the possibility of a lab leak and have fudged the evidence (or, at the very least, obstructed the investigations).

I think it's also pretty obvious that the social consensus is against lab leak not because all the experts have watched the 17 hour rootclaim debate, but because it was manufactured, which makes me pretty unsympathetic to the "researching and addressing counter-arguments" claim; it reminds me of the courtier's reply.

Comment by Vaniver on CFAR Takeaways: Andrew Critch · 2024-02-15T01:02:51.641Z · LW · GW

If "they already tried it and it didn't work" they're real into that [Ray interpretation: as an excuse not to try more].

I think I've had this narrative in a bunch of situations. My guess is I have it too much, and it's like fear-of-rejection where it's worth running the numbers and going out on a limb more than people would do by default. But also it really does seem like lots of people overestimate how easy problems are to solve, or how many 'standard' solutions people have tried, or so on. [And I think there's a similar overconfidence thing going on for the advice-giver, which generates some of the resistance.]

It's also not that obvious what the correct update is. Like, if you try a medication for problem X and it fails, it feels like that should decrease your probability that any sort of medication will solve the problem. But this is sort of like the sock drawer problem,[1] where it's probably easy to overestimate how much to update.

  1. ^

    Suppose you have a chest of drawers with six drawers in it, and you think there's a 60% chance the socks are in the chest, and then they're not in the first five drawers you look in. What's the chance they're in the last drawer?
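The sock drawer update can be computed directly from the numbers in the footnote (this is just the arithmetic for the problem as stated, with a uniform prior over drawers):

```python
# Sock drawer problem: prior 60% that the socks are in a six-drawer
# chest, spread uniformly over the drawers. After the first five
# drawers come up empty, how likely is the last drawer?

prior_in_chest = 0.6
n_drawers = 6
p_each_drawer = prior_in_chest / n_drawers   # 0.1 per drawer
p_not_in_chest = 1 - prior_in_chest          # 0.4

# Five empty drawers eliminate those hypotheses; renormalize over
# "in drawer six" vs. "not in the chest at all":
p_last_drawer = p_each_drawer / (p_each_drawer + p_not_in_chest)
print(p_last_drawer)  # 0.2
```

So despite starting at 60% confident in the chest overall, after five misses you're only 20% on the last drawer: each empty drawer was also evidence against the chest containing the socks at all, which is the sense in which it's easy to overestimate how much a failed medication should update you against the whole category.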

Comment by Vaniver on The impossible problem of due process · 2024-02-09T21:16:01.412Z · LW · GW

When people found out about ACDC's previous ruling on Brent, many were appalled that ACDC had seen the evidence laid out in the Medium posts and ruled that it was okay for Brent to continue on like that

As I recall, the ACDC had in fact not seen the evidence laid out in the Medium posts. (One of the panelists sent an email saying that they had, but this turned out to be incorrect--there was new information, just not in the section he had read when he sent the email, and prematurely sending that email was viewed as one of the ACDC's big mistakes, in addition to their earlier ruling.)

Comment by Vaniver on Most experts believe COVID-19 was probably not a lab leak · 2024-02-05T18:52:34.942Z · LW · GW

Another Insanity Wolf meme!

On the one hand, yes, I agree; I thought virology research was crazy back in 2017? when someone at Event Horizon shared a paper which did a cost-benefit analysis and thought the net effect of BSL-4 labs was something like a hundred deaths per year per lab.

But I think it is important to be able to accurately understand what other people think so that you can talk to them instead of past them. (I still remember, with some bitterness, an op-ed exchange where two people debating virology said, roughly, "these things are so dangerous we shouldn't study them" and "these things are so dangerous we have to study them", and that was the end of the discussion, with agreement on the danger and no real ability to estimate the counterfactuals.)

Did we need to know anything but "Covid is an airborne infectious respiratory virus"? How much research prior to the event did it take to know that?

This account of vaccine development claims that having done research on spike proteins back in 2016 was helpful in being able to rapidly develop the vaccine once the genome was uploaded, for example.

[To be clear, I think it's important to distinguish here between gain of function research, which was disliked enough for there to be a funding moratorium (that then expired), and storing / working with dangerous viruses at all, which I think also is below the cost-benefit threshold, but this is a harder case to make.]

Comment by Vaniver on Most experts believe COVID-19 was probably not a lab leak · 2024-02-05T06:13:23.363Z · LW · GW

The more important point here is that both zoonotic virus jumps and lab leaks are at-large risks that humanity should seek to reduce!

I hear one of the stated reasons for the labs is to study viruses and predict zoonotic jumps. At least some people think we were able to handle COVID so effectively because we were studying viruses in labs and anticipating what might happen, i.e. the net effect of labs is positive.

Given its size, it seems like whether COVID is in the 'pro' or 'con' column does a lot to our sense of whether or not this sort of virology has been good for humans or not and should continue into the future.

Comment by Vaniver on Most experts believe COVID-19 was probably not a lab leak · 2024-02-05T06:09:49.129Z · LW · GW

I think this is evidence, but weak evidence--it updates me more on "Rootclaim isn't great at debates" than it does on the underlying issue. (Like, how much should William Lane Craig winning his debates update me on theism?)

I think if I started off at 90% confidence of lab leak, Rootclaim losing wouldn't bring me below 80% confidence of lab leak. Plausibly Peter Miller's arguments contain defeaters for my specific beliefs, and going thru the debate would bring me much lower, but I don't yet have that sense from the summaries I've seen.

Comment by Vaniver on Brute Force Manufactured Consensus is Hiding the Crime of the Century · 2024-02-04T16:56:20.767Z · LW · GW

I am not suggesting that.

Why not? Are you pointing at that the relevant factor is "population within that distance" instead of "distance"?

Comment by Vaniver on Drone Wars Endgame · 2024-02-01T18:27:41.888Z · LW · GW

Maybe I'm confused about the amount of overhead digital signing / verification adds to communication, but do you think that works at missile speeds? (I don't doubt that it works at drone speeds.)

[To be clear, I'm trying to imagine the whole "distant spotter + laser transmission to missile" system, where increasing the length of messages increases the amount of time you need to have successfully targeted the missile in order to successfully transmit a message.]

Comment by Vaniver on Drone Wars Endgame · 2024-02-01T17:01:33.092Z · LW · GW

To elaborate, it's pretty easy to kill someone important if you are willing to be arrested/executed afterwards; the main thing a suicide drone might enable is killing someone important and being able to escape afterwards. This could already be done with dupes, like the 2017 killing of Kim Jong-nam, but I think the nerve agent involved was more expensive than a handmade gun.

Comment by Vaniver on Drone Wars Endgame · 2024-02-01T16:30:10.623Z · LW · GW

Flares can be overcome by a mesh of recon drones somewhat close to the target that can give targeting information to the missile.

This seems overly optimistic to me / is my guess of where the next countermeasure will show up. If your missile is accepting external course-corrections, the enemy can maybe spoof incorrect course-corrections; the more directional the system is, the harder it is to actually hit your fast-moving and course-correcting missile. 

Comment by Vaniver on Notes on Innocence · 2024-01-31T02:41:01.061Z · LW · GW

Suppose it had been "What do you call an abortion clinic for pianists?"


Comment by Vaniver on If Clarity Seems Like Death to Them · 2024-01-05T18:07:12.135Z · LW · GW

Most of those posts are from before the thing I call "constant abuse" began on LessWrong.

I think I remember this timeline differently, or would like you to be a bit more clear on what you mean. I thought of this as an entrenched conflict back in 2019, which was before all the posts used as examples.

Comment by Vaniver on The Plan - 2023 Version · 2023-12-31T22:39:39.532Z · LW · GW

Resolution: You Don’t Get To Choose The Problem Factorization. The key here is that it’s the problem space which determines the good factorizations, and we have to go look at the problem space and do some work to figure out what those factorizations are.

This reminds me to ask--have you read Notes on the Synthesis of Form by Christopher Alexander, yet? (I summarized it a year ago but it may be worth you going to the original source.)

Comment by Vaniver on If Clarity Seems Like Death to Them · 2023-12-31T20:05:10.305Z · LW · GW

"that person, who wants to be treated in the way that people usually treat men"

Incidentally, one of the things I dislike about this framing is that gender stereotypes / scripts "go both ways". That is, it should be not just "treated like a man" but also "treat people like men do."

Comment by Vaniver on If Clarity Seems Like Death to Them · 2023-12-31T20:03:22.365Z · LW · GW

Is there a way to summarize this shortly? Eliezer disagreed with you about something, or maybe you just interpreted something he wrote as a disagreement with you... and now your soul can't find peace until he admits that he was wrong and you were right about things that are too meta for me to understand wtf you are talking about...

Here's an attempt.

Sometimes people have expectations of each other, like "you won't steal objects from my house".  Those expectations get formed by both explicit and implicit promises. Violating those expectations is often a big deal, not just to the injured party but also to third parties--someone who stole from Alice might well steal from you, too.

To the extent this community encouraged expectations of each other, they were about core epistemic virtues and discussion practices. People will try to ensure their beliefs are consistent with their other beliefs; they won't say things without believing them; they'll share evidence when they can; when they are bound to be uncooperative, they at least explain how and why they'll be uncooperative, and so on. 

[For example, I keep secrets because I think information can be owned, even tho this is cooperative with the information-owner and not with the information-wanter.]

So "Eliezer disagreed with you about something" is an understatement; disagreement is fine, expected even! The thing was that instead of having a regular disagreement in the open, Zack saw Eliezer as breaking a lot of these core expectations, not being open about it or acknowledging it when being called out, and also others not reacting to Eliezer breaking those expectations. (If Eliezer had punched Zack, people would probably have thought that was shocking and criticized it, but this was arguably worse given the centrality of these expectations to Eliezer's prominence and yet people were reacting less.)

That said, the promises were (I think) clearly aspirational / mediated by the pressures of having to actually exist in the world. I do think it makes sense to have a heresy budget, and I think Zack got unlucky with the obsession lottery. I think if people had originally said to Zack "look, we're being greengrocers on your pet issue, sorry about throwing you to the wolves" he would have been sad but moved on; see his commentary on the 2013 disavowal.

Instead they made philosophical arguments that, as far as I can tell, were not correct, and this was crazy-making, because Zack now also doubted his reasoning that led to him disagreeing with them, but no one would talk about this publicly. (Normally if Zack was making a mistake, people could just point to the mistake, and then he could fix the upstream generator of that mistake and everyone could move on.) And, also, to the extent that they generalized their own incorrect justifications to reasoning about other fields, this was making them crazy, in a way that should have alarmed third parties who were depending on their reasoning. The disinterest of those third parties was itself also expectation-violating.

[I don't think I was ever worried about this bleeding over into reasoning about other things; I probably would have joined the conversation more actively if I had? I do regret not asking people what their strategy was back in ~2019; the only people I remember talking to about this were Zack and the LW team.]

Comment by Vaniver on If Clarity Seems Like Death to Them · 2023-12-31T19:16:53.983Z · LW · GW

Jessica thought my use of "heresy" was conflating factual beliefs with political movements. (There are no intrinsically "right wing" facts.) I agreed that conflating political positions with facts would be bad.

I don't get what 'intrinsically' is doing in the middle sentence. (Well, to the extent that I have guessed what you meant, I disagree.)

Like, yes, there's one underlying reality, descriptions of it get called facts.

But isn't the broader context the propagation of propositions, not the propositions themselves? That is, saying X is also saying "pay attention to X" and if X is something whose increased salience is good for the right-wing, then it makes sense to categorize it as a 'right wing fact', as left-wing partisans will be loath to share it and right-wing partisans will be eager to.

Like, currently there's an armed conflict going on in Israel and Palestine which is harming many people. Of the people most interested in talking about it that I see on the Internet, I sure see a lot of selectivity in which harms they want to communicate, because their motive for communicating about it is not attempting to reach an unbiased estimate, but to participate in a cultural conflict which they hope their side will win. (One could summarize this view as "speech is intrinsically political.")

This bit of HPMOR comes to mind:

"I don't suppose you could explain," Harry said dryly, "in your capacity as an official of the Hogwarts school system, why catching a golden mosquito is deemed an academic accomplishment worthy of a hundred and fifty House points?"

A smile crossed Severus's lips. "Dear me, and I thought you were supposed to be perceptive. Are you truly so incapable of understanding your classmates, Potter, or do you dislike them too much to try? If Quidditch scores did not count toward the House Cup then none of them would care about House points at all. It would merely be an obscure contest for students like you and Miss Granger."

It was a shockingly good answer.

Comment by Vaniver on Vaniver's Shortform · 2023-12-21T01:46:37.433Z · LW · GW

Steam Wrapped got me thinking about games from 2023, so here are some thoughts/recommendations/anti-recommendations. The theme of this year for me was apparently RPGs made by studios whose RPGs I had played before:

  • Baldur's Gate 3: Game of the Year for a reason; took me a bit over a hundred hours on the hardest difficulty setting. (They've since released a harder one.) Doesn't require experience with Dungeons & Dragons, 5th edition specifically, or the previous Baldur's Gate games, tho those enhance the experience. Much more a continuation of Larian's previous RPGs than of the old Baldur's Gate series, which I think is a good thing? Extremely flexible and detailed; you can often be clever and get around things and the game rewards you for it.
    RPGs like these are often made or broken by the quality of the companion NPCs, and I think the crew they have you assemble is a memorable one worth getting to know. Something about playing it felt like it captured the D&D experience (both upsides and downsides) pretty well? Theater kids were involved in the creation of this game, in a good way.
  • Legend of Zelda: Tears of the Kingdom: a sequel to their previous open world Zelda game, and IMO the best 'sequel' I've seen? In the sense of, they know you played the first game, and so now it's the same thing, but different. Set only a few years after the first game, the map is basically the same (with the new features being mostly vertical expansion--there's now a skyworld and an underworld), your horses from the first game are available in the stables, and many recognize you as the guy that saved the world recently. The new physics engine is nice, but the overall plot is... simple but neat? Continuing the theme of "the thing you expect (including novelty!), done competently".
  • Warhammer 40k: Rogue Trader: a new game, and the first Warhammer 40k CRPG. I'm still going thru this one and so don't have a fully realized take here. Made by the people who made Pathfinder: Kingmaker and Pathfinder: Wrath of the Righteous, both of which have an overworld management map plus standard RPG character progression / combat. In Kingmaker, where you're the baron of a new region carved out of the wilderness, I thought it didn't quite fit together (your kingdom management doesn't really matter compared to the RPG plot); in Wrath of the Righteous, where you're appointed the head of a crusade against the Worldwound, I thought it did (mostly b/c of the crusade battle mechanic, a HoMM-style minigame, tho you could see seams where the two systems joined together imperfectly); in Rogue Trader you're a, well, Rogue Trader, i.e. someone tasked by the God-Emperor of humanity to expand the borders of the Imperium by operating along the frontier, and given significant license in how you choose to do so.  You own a flagship (with thousands of residents, most of whom live in clans of people doing the same job for generations!) and several planets, tho ofc this is using sci-fi logic where each planet is basically a single city. There's also a space-battle minigame to add spice to the overworld exploration.
    I am finding the background politics / worldview / whatever of the game quite interesting; the tactical combat is fine but I'm playing on the "this is my first time playing this game" difficulty setting and thinking I probably should have picked a higher one. The Warhammer 40k universe takes infohazards seriously, including the part where telling people what not to think is itself breaking infosec. So you get an extremely dogmatic and siloed empire, where any sort of change is viewed with suspicion as being treason promoted by the Archenemy (because, to be fair, it sometimes is!). Of course, you've got much more flexibility because you inherited an executive order signed by God that says, basically, you can do what you want, the only sort of self-repair the system allows. (But, of course, the system is going about it in a dumb way--you inherit the executive order, rather than having been picked by God!) The three main 'paths' you can take are being Dogmatic yourself, being an Iconoclast (i.e. humanist), or being a Heretic (i.e. on the side of the Archenemy); I haven't yet seen whether the game sycophantically tells you that you made the right choice whatever you pick / the consequences are immaterial or not.
  • Starfield: Bethesda's first new RPG setting in a while. It was... fine? Not very good? I didn't really get hooked by any of the companions (my favorite Starfield companion was less compelling than my least favorite BG3 companion), the whole universe was like 3 towns plus a bunch of procedurally generated 'empty' space, the outpost building was not well-integrated with the rest of the game's systems (it was an upgrade over Fallout 4's outpost-building in some ways but not others), and the central conceit of the plot was, IMO, self-defeating. Spoilers later, since they don't fit well in bulleted lists.
  • Darkest Dungeon II: ok this isn't really an RPG and so doesn't belong on this list, but mentioning it anyway. IMO disappointing compared to Darkest Dungeon. I'm not quite sure what I liked less well, but after 16 hours I decided I would rather play Darkest Dungeon (which I put 160 hours into) and so set it down.

The promised Starfield spoilers:

First, just like in Skyrim you get magic powers and you can get more magic powers by exploring places. But whereas Skyrim tries very hard to get you to interact with dragons / being dragonborn early on, Starfield puts your first power later and doesn't at all advertise "you should actually do this mission". Like, the world map opens up before you unlock that element of gameplay. Which... is sort of fine, because your magic powers are not especially good? I didn't feel the need to hop thru enough universes to chase them all down.

That is, the broader premise is that you can collect some artifacts (which give you the powers), go thru the eye of the universe, and then appear in another universe where you keep your skills and magic powers but lose your items and quest progression. So you can replay the game inside of the game! Some NPCs also have this ability and you're generally fighting them for the artifacts (but not racing, since they never go faster than you). Two characters are the same guy, one who's been thru hundreds of universes and the other thousands; the latter argues you should pick a universe and stick with it. But the net effect is basically the game asking you to not play it, and generally when games do that I take them seriously and stop.

And furthermore, the thing you would most want to do with a new run--try out a new build and new traits or w/e--is the one thing you can't change in their New Game+. If you picked that you were born in the UC, then you'll always be born in the UC, no matter how many times you go thru the Eye. Which, sure, makes sense, but--if I replay Rogue Trader, I'm going to do it with a different origin and class, not just go down a different path. (Like, do I even want to see the plot with a Heretic protagonist?) If I replay Baldur's Gate III, same deal. But Starfield? If I pick it up again, maybe I'll play my previous character and maybe I'll start afresh, but it feels like they should really want me to pick up my old character again. I think they thought I would be enticed to see "what if I played out this quest aligned with a different faction?" but they are mostly about, like, identification instead of consequences. "Do you want the pirates to win or the cops to win?" is not a question I expect people to want to see both sides of.

Comment by Vaniver on Love, Reverence, and Life · 2023-12-19T23:38:18.033Z · LW · GW

I wrote this post on and off over the course of a morning, and towards the end of it realized:

I'm reading you as saying "eating others is inherently not ok", but I would like whether it's ok or not to be contingent on some other facts (like the absence of suffering, or hypothetical net preference, or the ability of people to not have their souls corrupted by carnivorism, or so on) and on the generalization of that reasoning not having terrible consequences elsewhere. (For example, if you think pleasure can't outweigh suffering, then it seems like having kids at all is indefensible, which is a self-extinguishing moral position; if you think something that, taken seriously, implies it's not even ok to eat plants, then that's even more self-extinguishing.)

I'll still post the rest of the comment I wrote, which responds to you in more detail, but that seems like the most important piece.

relationships where you have to do the complicated emotional gymnastics of saying that you love an animal like they're your friend one day and then chopping their head from their body the next and savoring the flavor of the flesh on the grill.

There's a tumblr post where someone talks about immediately feeling the shepherd impulse when interacting with sheep, a bunch of people like the post, someone points out "how many of you eat lamb", and then the original poster responds with "The ancient shepherds I’m referencing also ate lamb lol"

My sense is that there's a few ways to take this. One of them is "actually the emotional gymnastics is not that complicated!", and another is "actually those ancient shepherds also probably abused their wives and thought slavery was fine when it happened to someone else and mistreated their animals, according to our standards; parents caring about their children / guardians caring about their wards is really not sufficient to guarantee good outcomes or license those relationships." I infer your position is closer to the latter but it really feels like it should be possible to have gains from trade, here.

[And, like, one of the downsides of specialization is that it drives people both unusually interested and unusually disinterested in animal welfare into the 'works with animals' business, which is probably how we got into this factory farming mess in the first place.]

My last stab at a response might be to bring up an analogy to slavery. I take the equivalent of your position here to be "look, if each slave can look at the potential life he will hold and prefer that life to no life at all, then isn't that better than him not existing at all?" And to me it seems like I'd be again called to say "no".

I think one of the main ways my libertarian leanings show up is by being okay with people being able to pick worse things that are cheaper. Let people live in tiny houses and work low-paying jobs and sell their bodies and take high-interest loans if that's the right tradeoff for them; removing their options generally isn't helping them.

I think that could extend all the way to slavery, altho it's hard to imagine situations where that actually makes sense. In general, I think children have only a little bit of debt to their parents (certainly not a lifetime of labor and ownership of all their descendants), which is the closest analogy. Probably more realistic is something like conservatorship, where someone is deemed incompetent to handle their financial or medical affairs, and someone else makes those decisions for them; should people be allowed to voluntarily enter a conservatorship?

A fictional version of this shows up in a video game called The Outer Worlds: a star system is colonized by a group of corporations, with colonists who are a mixture of 'people who put up the capital for the voyage' and 'people agreeing to come as indentured servants'. This leads to a very stratified society on the other side, which predictably starts to decay as the colonists have children with huge differences in inherited wealth. Even if Alice decided it was worth being a laborer somewhere new rather than being stuck on Earth, her daughter Carol might not feel like she's bought into this situation and want to violently redistribute things, and it's not obvious that Alice should be able to sell Carol's compliance with society.

But you could imagine that, if Alice can't bind Carol, the colony doesn't go thru, and Carol never comes to exist, and on net Carol is sad about that outcome, and would have preferred having been bound. It feels like an actually thorny question to figure out what tradeoffs precisely make sense, especially because this is a collective bargaining issue (it's not like existing societies get unanimous consent from their participants!) and the empirical tradeoffs are all hypothetical. [My actual expectation is that we get material abundance before we get any interstellar colonies, and so it's not important to get this question right because it'll never come up.]

That is the sort of world I hope for. 

To be clear, this is a world without cats and snakes and other obligate carnivores, right? Or is the plan to first figure out synthetic sources of the various nutrients they need?

[It will also have many fewer other animals--I think on average something like a third of a cow is alive because of my beef consumption--but depending on what you think the limiting factors are, that may mean replacement with fractional vegan humans instead, which is probably an upgrade.]

Comment by Vaniver on What is the next level of rationality? · 2023-12-15T00:23:48.241Z · LW · GW

To explain: Alfred Korzybski, the guy behind General Semantics, is basically "rationality from 100 years ago". (He lived 1879-1950.) He's ~2 generations before Feynman (1918-1988), who was ~one before Sagan (1934-1996), then there's a 2-3 generation gap to Yudkowsky (1979-). (Of course if you add more names to the list, the gaps disappear; reordering your list, you get James Randi (1928-2020), Dawkins (1941-), Hitchens (1949-2011), Michael Shermer (1954-), and Sam Harris (1967-), which takes you from Feynman to Yudkowsky, basically.)

He features in Rationalism before the Sequences, and is interesting both because 1) you can directly read his stuff, like Science and Sanity, and 2) most of his stuff has already made it to you indirectly, from his students' students. (Yudkowsky apparently wrote the Sequences before reading any Korzybski directly, but read lots of stuff written by people who read Korzybski.)

There are, of course, figures before Korzybski, but I think the gaps get larger / it becomes less obviously "rationalism" instead of something closer to "science". 

Comment by Vaniver on Love, Reverence, and Life · 2023-12-14T23:06:08.829Z · LW · GW

To have this sort of love, this commitment to preventing suffering, with animals to me means pretty much just drawing the line at sentient beings and trying to cultivate a basic sense that they matter and that "it's just bad" to eat them.

I feel like the dialogue circled this for a while, and I want to try to poke directly at it again. I think my line is something like "try to only make trades that the other party 'would' consent to," which means eating high-welfare meat if it seems likely that the animals net prefer being raised to be eaten that way to not existing, tho ofc we have to use our judgment instead of theirs. [This article convinced me to avoid chicken products, for example.] It seems to me like you don't accept this, like in this section here:

When I first started to change my diet, I was most appalled at factory farming, and remember that first call with my parents afterwards, trying to soothe their worries by saying "no, no, this doesn't apply to our animals, I'll keep chowing down on Williams family steaks, don't you worry." I'll spare you the details, but long story short I came into the thinking of the above slowly and found myself eventually unable to eat even our products, because now it wasn't a steak on my plate, it was a piece of a cow, maybe even a cow like the one I bottle-fed for a year growing up and then sold for slaughter, a good cow named Max.

Like, I think in the picture where you're only willing to eat Max because of a sense that Max was grateful to have been alive at all, this works. Max might have even more preferred to be a pet, but that's not on offer to all hypothetical Maxes. It's only if there's a clear separation between the friends and food category that this doesn't work, and I see how having that category is consistent but it's not obvious to me that it's the right end goal. (I think many historical people have viewed animals as sacred / people / etc. and also as food.)

[Like, you talk about "what if climate change is solved and all the enviro-vegans disappear?", but this feels to me like that worry is somehow broken; like, what if all the factory farms disappear, and on net all farmed animals are grateful to exist? Then it seems like Mission Accomplished to me, even tho I imagine you will still want to be vegan in that world.]

The story that you tell afterwards is mostly about the standards for checking being too high, but I am not sure how practical that is. You bring up the hypothetical of opposing child slavery coal mines, but I think this is actually a problem for cacao production, and so I try to be about as selective in my chocolate sourcing as I am in my steak sourcing--with the understanding that the ethics of "fair trade" or "grass finished" includes some amount of fraud and errors.

Comment by Vaniver on What is the next level of rationality? · 2023-12-13T21:24:08.914Z · LW · GW

ctrl-f korz


Comment by Vaniver on Neither EA nor e/acc is what we need to build the future · 2023-12-12T19:28:55.677Z · LW · GW

Is this how you see it?


Nukes have x-risk but humans couldn't help but build them

I think no one seriously considered the prospect of nuclear winter until well after stockpiles were large, and even now it's not obviously an existential concern instead of merely catastrophic. If you're talking about the 'ignite the atmosphere' concern, I think that's actually evidence for voluntary relinquishment--they came up with a number where if they thought the risk was that high, they would give up on the project and take the risk of Nazi victory. 

I expect the consensus estimate will be that AGI projects have risks in excess of that decision criterion, and that will motivate a halt until the risks are credibly lowered.

What if all the other powers at that time went to the Pope and asked for a bull that firing grapeshot wasn't Christian.  Would this change anything?

I assume you're familiar with Innocent II's prohibition on crossbows, and that it wasn't effectively enforced. I am more interested in, say, the American/Israeli prohibition on Iranian nuclear weapons, which does seem to be effectively enforced on Earth. 

The bottlenecks are in the chip fabrication tooling.

Yeah, I think it is more likely that we get compute restrictions / compute surveillance than restrictions on just AI developers. But even then, I think there aren't that many people involved in AI development and it is within the capacities of intelligence agencies to surveil them (tho I am not confident that a "just watch them all the time" plan works out; you need to be able to anticipate the outcomes of the research work they're doing, which requires technical competence that I don't expect those agencies to have).

Comment by Vaniver on On plans for a functional society · 2023-12-12T07:16:49.312Z · LW · GW

Like, you shouldn't work yourself ragged, but my guess is for most people, working on something meaningful (or at least difficult) is actually more fun and rewarding compared to the alternative of doing nothing or hedonism or whatever, even if you ultimately fail. (And on the off-chance you succeed, things can be a lot more fun.)

I think one of the potential cruxes here is how many of the necessary things are fun or difficult in the right way. Like, sure, it sounds neat to work at a geothermal startup and solve problems, and that could plausibly be better than playing video games. But, does lobbying for permitting reform sound fun to you?

The secret of video games is that all of the difficulty is, in some deep sense, optional, and so can be selected to be interesting. ("What is drama, but life with the dull bits cut out?") The thing that enlivens the dull bits of life is the bigger meaning, and it seems to me like the superstructure is what makes the bigger meaning more real and less hallucinatory.

those successes could snowball in lots of different directions pretty quickly, without much meta-level direction.

This seems possible to me, but I think most of the big successes that I've seen have looked more like there's some amount of meta-level direction. Like, I think Elon Musk's projects make more sense if your frame is "someone is deliberately trying to go to Mars and fill out the prerequisites for getting there". Lots of historical eras have people doing some sort of meta-level direction like this.

But also we might just remember the meta-level direction that was 'surfing the wave' instead of pushing the ocean, and many grand plans have failed.

Comment by Vaniver on Neither EA nor e/acc is what we need to build the future · 2023-12-12T07:00:01.493Z · LW · GW

Vaniver is it your belief that a worldwide AI pause - not one limited to a specific geographic area - is a plausible outcome? Could you care to elaborate in more detail why you think it would be possible?

Yes, I think it's plausible. I don't think it's especially likely--my modal scenario still involves everyone dying--but I think especially if you condition on success it seems pretty likely, and it makes sense to play to your outs.

The basic argument for plausibility is that 1) people are mostly awake to risks from advanced AI, 2) current power structures are mostly not enamored with AI / view it as more likely to be destabilizing than stabilizing, 3) the people pushing for unregulated AI development are not particularly charismatic or sympathetic, and 4) current great powers are pretty willing to meddle in other countries when it comes to serious national security issues.

I expect pauses to look more like "significant regulatory apparatus" than "ban"; the sort of thing where building new nuclear plants is legal with approval, and yet it takes decades to get NRC approval. Probably this involves a significant change in how chips are constructed and sold. [I note that computer hardware seems like an area where people are pouring gas onto the race instead of trying to slow down.]

I think as the race heats up and AI becomes more and more promising, we might see national total efforts to develop AI faster.

I think this might happen, and is >98% likely to be game over for humanity.

Comment by Vaniver on Secondary Risk Markets · 2023-12-12T06:41:35.991Z · LW · GW

I don't understand why we want to do this.

I want Alice to have help choosing what things to do and not do, in the form of easily understandable prices that turn uncertain badness ("it'll probably be fine, I probably won't break the camera") into certain costs ("hmm, am I really going to get $70 worth of value from using this camera?").

I am most interested in this in contexts where self-insurance is not reasonable to expect. Like, if some satellite company / government agency causes Kessler Syndrome, they're not going to be able to pay back the rest of the Earth on their own, and so there's some temptation to just ignore that outcome; "we'll be bankrupt anyway." But society as a whole very much does not want them to ignore that outcome; society wants avoiding that outcome to be more important to them than the survival of their company, and something like apocalypse insurance seems like the right way to go about that.

But how do you price the apocalypse insurance? You don't want to just kick the can down the road, where now the insurance company is trying to look good to regulators while being cheap enough for customers to get business, reasoning "well, we'll be bankrupt anyway" about the catastrophe happening.

You mention the "unilateralist's curse", but this sounds more like the "auction winner's curse",

I think those are very similar concepts, to the point of often being the same.

which I would expect an insurer to already be taking into account when setting their prices (as that's the insurer's entire core competency).

I probably should have brought up the inexploitability concept from Inadequate Equilibria; I'm arguing that mistaken premiums are inexploitable, because Carol can't make any money from correcting Bob's mistaken belief about Alice, and I want a mechanism to make it exploitable.

Normally insurers just learn from bad bets after the fact and this is basically fine, from society's point of view; when we're insuring catastrophic risks (and using insurance premiums to determine whether or not to embark on those risks) I think it's worth trying to make the market exploitable.

If you buy $1 of synthetic risk for $0.05, does that mean you get $1.00 if Alice breaks the camera, and $0.00 if Alice does not?

Yes, the synthetic risk paying out is always conditional. The sketch I have for that example is that Bob has to offer $10 of synthetic risk at each percentage point, except I did the math as tho it were continuous, which you can also do by just choosing midpoints. So there's $10 for sale at a total price of $0.55 (the tranche priced at its 5.5% midpoint), another $10 for $0.65, and so on; Carol's $40 for $2.80 comes from buying the $0.55 + $0.65 + $0.75 + $0.85 tranches (and she doesn't buy the $0.95 one because it looks like a 5 cent loss to her). That is, your tentative guess looks right to me.

The $910 that goes unsold is still held by Bob, so if the camera is wrecked Bob has to pay themselves $910, which doesn't matter. 

As you point out, Bob pays $1.25 for the first $50 of risk, which ends up being a wash. Does that just break the whole scheme, since Bob could just buy all the required synthetic risk and replicate the two-party insurance market? Well... maybe. Maybe you need a tiny sales tax, or something, but I think Bob is incentivized to participate in the market. Why did we need to require it, then? I don't have a good answer there. (Maybe it's easier to have mandatory prediction markets than just legalizing them.)
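For concreteness, the tranche arithmetic above can be sketched in a few lines of Python. This is my reconstruction from the numbers in the example, not a spec from the post: I'm assuming Bob's floor is 5%, that each percentage point from the floor up to 100% carries $10 of face value priced at that point's midpoint probability, and that Carol's own estimate is 9%.

```python
def carol_purchase(bob_floor_pct: int, carol_estimate: float,
                   tranche_face: float = 10.0) -> tuple[float, float]:
    """Face value and total cost of the tranches Carol chooses to buy.

    Each percentage point from Bob's floor up to 100% carries
    tranche_face dollars of synthetic risk, priced at that point's
    midpoint probability; Carol buys every tranche that looks like
    positive expected value at her own estimate.
    """
    face = cost = 0.0
    for pct in range(bob_floor_pct, 100):
        midpoint = (pct + 0.5) / 100
        if midpoint < carol_estimate:  # positive EV for Carol
            face += tranche_face
            cost += tranche_face * midpoint
    return face, cost

face, cost = carol_purchase(5, 0.09)
# Carol buys the 5.5%, 6.5%, 7.5%, and 8.5% tranches: $40 of face
# value for $2.80, skipping the 9.5% tranche (a 5 cent expected loss
# to her); the remaining $910 of face value stays with Bob.
```

Under these assumptions the unsold remainder is (100 - 5) tranches times $10 minus Carol's $40, i.e. $910, matching the figure above.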

Comment by Vaniver on Secondary Risk Markets · 2023-12-11T22:02:32.648Z · LW · GW

the problem I most care about

I want markets for x-risks, basically. Suppose someone wants to train an AI and they're pretty sure it'll be fine, but only pretty sure. How do we aggregate estimates to figure out whether or not the AI should be trained? Seems like we should be able to have a risk market. [so8res proposes apocalypse insurance here with "premiums dependent on their behavior", but what's the dependence? Is it just set by the regulator?]

But the standard problems appear; on the "we all die" risk, the market doesn't exist and so people who bet on risk never get paid out.

You could imagine instead using a cap-and-trade system, where, say, only 3 AIs can be trained per year and companies bid for one of the permits, but it seems like this is still tilted towards "who thinks they can make the most money from success?" and not "who does the group think is least likely to fail?". You could have instead an explicit veto/permission system, where maybe you have 11 votes on whether or not to go thru with an AI training run and you need to buy at least 6 'yes'es, but this doesn't transfer resources from worse predictors to better predictors, just from projects that look worse to projects that look better.

And so I think we end up with, as johnswentworth suggests over here, needing to use regular liability / regular insurance, where people are betting on short-term questions that we expect to resolve ("how will this lawsuit by rightsholders against generative AI companies go?") instead of the unresolvable questions that are most existentially relevant.

Comment by Vaniver on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-12-08T20:58:48.883Z · LW · GW

"Building an AI that doesn't game your specifications" is the actual "alignment question" we should be doing research on.

Ok, it sounds to me like you're saying:

"When you train ML systems, they game your specifications because the training dynamics are too dumb to infer what you actually want. We just need One Weird Trick to get the training dynamics to Do What You Mean Not What You Say, and then it will all work out, and there's not a demon that will create another obstacle given that you surmounted this one."

That is, training processes are not neutral; there's the bad training processes that we have now (or had before the recent positive developments) and eventually will be good training processes that create aligned-by-default systems.

Is this roughly right, or am I misunderstanding you?

Comment by Vaniver on My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" · 2023-12-08T20:48:16.536Z · LW · GW

If you created a misaligned AI, then it would be "thinking back", and you'd be in an adversarial position where security mindset is appropriate.

Cool, we agree on this point.

my point in that section is that the fundamental laws governing how AI training processes work are not "thinking back". They're not adversaries.

I think we agree here on the local point but disagree on its significance to the broader argument. [I'm not sure how much we agree; I think of training dynamics as 'neutral', but also I think of them as searching over program-space in order to find a program that performs well on a (loss function, training set) pair, and so you need to be reasoning about search. But I think we agree the training dynamics are not trying to trick you / be adversarial and instead are straightforwardly 'trying' to make Number Go Down.]

In my picture, we have the neutral training dynamics paired with the (loss function, training set) which creates the AI system, and whether the resulting AI system is adversarial or not depends mostly on the choice of (loss function, training set). It seems to me that we probably have a disagreement about how much of the space of (loss function, training set) leads to misaligned vs. aligned AI (if it hits 'AI' at all), where I think aligned AI is a narrow target to hit that most loss functions will miss, and hitting that narrow target requires security mindset.

To explain further, it's not that the (loss function, training set) is thinking back at you on its own; it's that the AI that's created by training is thinking back at you. So before you decide to optimize X you need to check whether or not you actually want something that's optimizing X, or if you need to optimize for Y instead.

So from my perspective it seems like you need security mindset in order to pick the right inputs to ML training to avoid getting misaligned models.

Comment by Vaniver on MATS Summer 2023 Retrospective · 2023-12-03T01:18:18.044Z · LW · GW


Comment by Vaniver on MATS Summer 2023 Retrospective · 2023-12-02T02:37:07.065Z · LW · GW

Congrats on another successful program!

Mentors rated their enthusiasm for their scholars to continue with their research at 7/10 or greater for 94% of scholars.

What is it at 9/10 or greater? My understanding is that 7/10 and 8/10 are generally viewed as 'neutral' scores, and so this is more like "6% of scholars failed" than "94% of scholars succeeded." (It looks like averages of roughly 8 are generally viewed as 'high' in this postmortem, so this population might be tougher raters than in other contexts, in which case I'm wrong about what counts as 'neutral'.)

Comment by Vaniver on The 101 Space You Will Always Have With You · 2023-11-30T05:43:35.615Z · LW · GW

I think something like 2015?

Comment by Vaniver on The 101 Space You Will Always Have With You · 2023-11-29T23:21:19.367Z · LW · GW

Also, I think our Rationality Quotes threads (like this one) were pretty good for enculturation.

Comment by Vaniver on Neither EA nor e/acc is what we need to build the future · 2023-11-28T19:05:21.526Z · LW · GW

A lot of my thinking over the last few months has shifted from "how do we get some sort of AI pause in place?" to "how do we win the peace?". That is, you could have a picture of AGI as the most important problem that precedes all other problems; anti-aging research is important, but it might actually be faster to build an aligned artificial scientist who solves it for you than to solve it yourself (on this general argument, see Artificial Intelligence as a Positive and Negative Factor in Global Risk). But if alignment requires a thirty-year pause on the creation of artificial scientists to work, that belief flips--now it actually makes sense to go ahead with humans researching the biology of aging, and to do projects like Loyal.

This isn't true of just aging; there are probably more like twelve major areas of concern. Some of them are simply predictable catastrophes we would like to avert; addressing others is possibly necessary to be able to safely exit the pause at all (or to keep the pause going when it would be unsafe to exit).

I think 'solutionism' is basically the right path, here. What I'm interested in: what's the foundation for solutionism, or what support does it need? Why is solutionism not already the dominant view? I think one of the things I found most exciting about SENS was the sense that "someone had done the work", had actually identified the list of seven problems, and had a plan of how to address all of the problems. Even if those specific plans didn't pan out, the superstructure was there and the ability to pivot was there. It looked like a serious approach by serious people. What is the superstructure for solutionism such that one can be reasonably confident that marginal efforts are actually contributing to success, instead of bailing water on the Titanic?

Comment by Vaniver on Social Dark Matter · 2023-11-20T00:16:57.946Z · LW · GW

Hearing, on my way out the door, when I'm exhausted beyond all measure and feeling deeply alienated and betrayed, "man, you should really consider sticking around" is upsetting.

This is not how I read Seth Herd's comment; I read him as saying "aw, I'll miss you, but not enough to follow you to Substack." This is simultaneously support for you staying on LW and for the mods to reach an accommodation with you, intended as information for you to do what you will with it.

I think the rest of this--being upset about what you think is the frame of that comment--feels like it's the conflict in miniature? I'm not sure I have much helpful to say, there.

Comment by Vaniver on Thoughts on the AI Safety Summit company policy requests and responses · 2023-11-03T03:57:29.189Z · LW · GW

My understanding is that their commitment is to stop once their ASL-3 evals are triggered.

Ok, we agree. By "beyond ASL-3" I thought you meant "stuff that's outside the category ASL-3" instead of "the first thing inside the category ASL-3".

For the Anthropic RSP in particular, I think it's accurate & helpful to say 

Yep, that summary seems right to me. (I also think the "concrete commitments" statement is accurate.)

But I want to see RSP advocates engage more with the burden of proof concerns.

Yeah, I also think putting the burden of proof on scaling (instead of on pausing) is safer and probably appropriate. I am hesitant about it on process grounds; it seems to me like evidence of safety might require the scaling that we're not allowing until we see evidence of safety. On net, it seems like the right decision on the current margin but the same lock-in concerns (if we do the right thing now for the wrong reasons perhaps we will do the wrong thing for the same reasons in the future) worry me about simply switching the burden of proof (instead of coming up with a better system to evaluate risk).

Comment by Vaniver on Thoughts on the AI Safety Summit company policy requests and responses · 2023-11-01T21:36:52.038Z · LW · GW

I got the impression that Anthropic wants to do the following things before it scales beyond ASL-3:

Did you mean ASL-2 here? This seems like a pretty important detail to get right. (What they would need to do to scale beyond ASL-3 is meet the standard of an ASL-4 lab, which they have not developed yet.)

I agree with Habryka that these don't seem likely to cause Anthropic to stop scaling:

By design, RSPs are conditional pauses; you pause until you have met the standard, and then you continue. If you get the standard in place soon enough, you don't need to pause at all. This incentivizes implementing the security and safety procedures as soon as possible, which seems good to me.

But the RSP does not commit Anthropic to having any particular containment measures or any particular evidence that it is safe to scale to ASL-4; it only commits Anthropic to publish a post about ASL-4 systems. This is why I don't consider the ASL-4 section to be a concrete commitment.

Yes, I agree that the ASL-4 part is an IOU, and I predict that when they eventually publish it there will be controversy over whether or not they got it right. (Ideally, by then we'll have a consensus framework and independent body that develops those standards, which Anthropic will just sign on to.)

Again, this is by design; the underlying belief of the RSP is that we can only see so far ahead thru the fog, and so we should set our guidelines bit-by-bit, rather than pausing until we can see our way all the way to an aligned sovereign. 

Comment by Vaniver on Thoughts on the AI Safety Summit company policy requests and responses · 2023-11-01T17:24:09.919Z · LW · GW

Are you thinking that psychology-focused AI would notice the existence of their operators sooner than non-psychology AI? Or is it more about influence AI that people deliberately point at themselves instead of others?

I am mostly thinking about the former; I am worried that psychology-focused AI will develop more advanced theory of mind and be able to hide going rogue from operators/users more effectively, develop situational awareness more quickly, and so on.

I currently predict that the AI safety community is best off picking its battles and should not try to interfere with technologies that are as directly critical to national security as psychology AI is;

My view is that the AI takeover problem is fundamentally a 'security' problem. Building a robot army/police force has lots of benefits (I prefer it to a human one in many ways) but it means it's that much easier for a rogue AI to seize control; a counter-terrorism AI also can be used against domestic opponents (including ones worried about the AI), and so on. I think jumping the gun on these sorts of things is more dangerous than jumping the gun on non-security uses (yes, you could use a fleet of self-driving cars to help you in a takeover, but it'd be much harder than a fleet of self-driving missile platforms).

Comment by Vaniver on Thoughts on the AI Safety Summit company policy requests and responses · 2023-11-01T17:02:56.758Z · LW · GW

FWIW I read Anthropic's RSP and came away with the sense that they would stop scaling if their evals suggested that a model being trained either registered as ASL-3 or was likely to (if they scaled it further). They would then restart scaling once they 1) had a definition of the ASL-4 model standard and lab standard and 2) met the standard of an ASL-3 lab.

Do you not think that? Why not?

Comment by Vaniver on Thoughts on the AI Safety Summit company policy requests and responses · 2023-11-01T00:36:36.459Z · LW · GW

(I'm Matthew Gray)

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan / they don't seem to be grappling with the risks specific to their approach at all. Quoting from them in two different sections:

Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. **Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.

I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.

Comment by Vaniver on Vaniver's thoughts on Anthropic's RSP · 2023-10-29T19:37:42.807Z · LW · GW
  1. What's the probability associated with that "should"? The higher it is the less of a concern this point is, but I don't think it's high enough to write off this point. (Separately, agreed that in order for danger warnings to be useful, they also have to be good at evaluating the impact of mitigations unless they're used to halt work entirely.)
  2. I don't think safety buffers are a good solution; I think they're helpful but there will still always be a transition point between ASL-2 models and ASL-3 models, and I think it's safer to have that transition in an ASL-3 lab than an ASL-2 lab. Realistically, I think we're going to end up in a situation where, for example, Anthropic researchers put a 10% chance on the next 4x scaling leading to evals declaring a model ASL-3, and it's not obvious what decision they will (or should) make in that case. Is 10% low enough to proceed, and what are the costs of being 'early'? 
  3. The relevant section of the RSP:

Note that ASLs are defined by risk relative to baseline, excluding other advanced AI systems. This means that a model that initially merits ASL-3 containment and deployment measures for national security reasons might later be reduced to ASL-2 if defenses against national security risks (such as biological or cyber defenses) advance, or if dangerous information becomes more widely available. However, to avoid a “race to the bottom”, the latter should not include the effects of other companies’ language models; just because other language models pose a catastrophic risk does not mean it is acceptable for ours to.

I think it's sensible to reduce models to ASL-2 if defenses against the threat become available (in the same way that it makes sense to demote pathogens from BSL-4 to BSL-3 once treatments become available), but I'm concerned about the "dangerous information becomes more widely available" clause. Suppose you currently can't get slaughterbot schematics off Google; if those become available, I am not sure it then becomes ok for models to provide users with slaughterbot schematics. (Specifically, I don't want companies that make models which are 'safe' except they leak dangerous information X to have an incentive to cause dangerous information X to become available thru other means.)

[There's a related, slightly more subtle point here; supposing you can currently get instructions on how to make a pipe bomb on Google, it can actually reduce security for Claude to explain to users how to make pipe bombs if Google is recording those searches and supplying information to law enforcement / the high-ranked sites on Google search are honeypot sites and Anthropic is not. The baseline is not just "is the information available?" but "who is noticing you accessing the information?".]

4. I mean, superior alternatives are always preferred. I am moderately optimistic about "just stop" plans, and am not yet convinced that "scale until our tests tell us to stop" is dramatically superior to "stop now."

(Like, I think the hope here is to have an AI summer while we develop alignment methods / other ways to make humanity more prepared for advanced AI; it is not clear to me that doing that with the just-below-ASL-3 model is all that much better than doing it with the ASL-2 models we have today.)

Comment by Vaniver on We're Not Ready: thoughts on "pausing" and responsible scaling policies · 2023-10-28T23:50:51.801Z · LW · GW

At minimum, I hope that RSPs get renamed, and that those communicating about RSPs are more careful to avoid giving off the impression that RSPs are sufficient.

OpenAI's RDP name seems nicer than the RSP name, for roughly the reason they explain in their AI summit proposal (and also 'risk-informed' feels more honest than 'responsible'):

We refer to our policy as a Risk-Informed Development Policy rather than a Responsible Scaling Policy because we can experience dramatic increases in capability without significant increase in scale, e.g., via algorithmic improvements.

Comment by Vaniver on Truthseeking, EA, Simulacra levels, and other stuff · 2023-10-28T20:27:32.087Z · LW · GW

Nobody has really done any amount of retroactive funding

Wasn't this a retroactive funding thing?

Comment by Vaniver on Architects of Our Own Demise: We Should Stop Developing AI · 2023-10-27T00:49:59.818Z · LW · GW

So my view is that it is the decision-makers currently imagining that the poisoned banana will grant them increased wealth & power who need their minds changed. 

My current sense is that efforts to reach the poisoned banana are mostly not driven by politicians. It's not like Joe Biden or Xi Jinping are pushing for AGI, and even Putin's comments on AI look like near-term surveillance / military stuff, not automated science and engineering.