Posts

Gradations of moral weight 2024-02-29T23:08:58.271Z
Which animals realize which types of subjective welfare? 2024-02-27T19:31:32.854Z
Solution to the two envelopes problem for moral weights 2024-02-19T00:15:53.031Z
Types of subjective welfare 2024-02-02T09:56:34.284Z
Increasingly vague interpersonal welfare comparisons 2024-02-01T06:45:30.160Z
Arguments for utilitarianism are impossibility arguments under unbounded prospects 2023-10-07T21:08:59.645Z
Unbounded utility functions and precommitment 2022-09-10T16:16:22.254Z
If AGI were coming in a year, what should we do? 2022-04-01T00:41:42.200Z
Hardcode the AGI to need our approval indefinitely? 2021-11-11T07:04:09.567Z
How much should we still worry about catching COVID? [Links and Discussion Thread] 2021-08-29T06:26:04.292Z

Comments

Comment by MichaelStJules on Counting arguments provide no evidence for AI doom · 2024-02-28T06:09:48.912Z · LW · GW

The reason SGD doesn't overfit large neural networks is probably because of various measures specifically intended to prevent overfitting, like weight penalties, dropout, early stopping, data augmentation + noise on inputs, and large enough learning rates that prevent convergence. If you didn't do those, running SGD to parameter convergence would probably cause overfitting. Furthermore, we test networks on validation datasets on which they weren't trained, and throw out the networks that don't generalize well to the validation set and start over (with new hyperparameters, architectures or parameter initializations). These measures bias us away from producing and especially deploying overfit networks.

Similarly, we might expect scheming without specific measures to prevent it. What could those measures look like? Catching scheming during training (or validation), and either heavily penalizing it, or fully throwing away the network and starting over? We could also validate out-of-training-distribution. Would networks whose caught scheming has been heavily penalized or networks selected for not scheming during training (and validation) generalize to avoid all (or all x-risky) scheming? I don't know, but it seems more likely than counting arguments would suggest.

Comment by MichaelStJules on Types of subjective welfare · 2024-02-02T20:39:31.452Z · LW · GW

Thanks!

I would say experiments, introspection and consideration of cases in humans have pretty convincingly established the dissociation between the types of welfare (e.g. see my section on it, although I didn't go into a lot of detail), but they are highly interrelated and often or even typically build on each other like you suggest.

I'd add that the fact that they sometimes dissociate seems morally important, because it makes it more ambiguous what's best for someone if multiple types seem to matter, and there are possible beings with some types but not others.

Comment by MichaelStJules on Against most, but not all, AI risk analogies · 2024-01-15T00:07:14.158Z · LW · GW

If someone wants to establish probabilities, they should be more systematic, and, for example, use reference classes. It seems to me that there's been little of this for AI risk arguments in the community, but more in the past few years.

Maybe reference classes are kinds of analogies, but more systematic and so less prone to motivated selection? If so, then it seems hard to forecast without "analogies" of some kind. Still, reference classes are better. On the other hand, even with reference classes, we have the problem of deciding which reference class to use or how to weigh them or make other adjustments, and that can still be subject to motivated reasoning in the same way.

We can try to be systematic about our search and consideration of reference classes, and make estimates across a range of reference classes or weights to them. Do sensitivity analysis. Zach Freitas-Groff seems to have done something like this in AGI Catastrophe and Takeover: Some Reference Class-Based Priors, for which he won a prize from Open Phil's AI Worldviews Contest.

Of course, we don't need to use direct reference classes for AI risk or AI misalignment. We can break the problem down.

Comment by MichaelStJules on Spirit Airlines Merger Play · 2024-01-04T18:18:57.603Z · LW · GW

There's also a decent amount of call option volume+interest at strike prices of $17.5, $20, $22.5 and $25 (same links as in the comment I'm replying to), which suggests to me that the market is expecting lower upside on a successful merger than you are. The current price is about $15.8/share, so $17.5 is only +10% and $25 is only +58%.

There's also of course volume+interest for call options at higher strike prices: $27.5, $30 and $32.5.

I think this also suggests the market-implied odds calculations giving ~40% to a successful merger are wrong: the expected upside on success is overestimated, so the actual market-implied odds of success are higher than ~40%.
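To make the implied-odds and upside arithmetic concrete, here's a minimal sketch; the deal price and break price below are assumptions for illustration (the break price echoes the $6 figure quoted in the next comment), not numbers from the linked analysis.

```python
# Minimal sketch of the merger-arb arithmetic above; deal_price and break_price
# are illustrative assumptions, not figures from the linked analysis.

current_price = 15.8   # approximate Spirit share price at the time
deal_price = 31.0      # hypothetical per-share payout if the merger closes (assumption)
break_price = 6.0      # assumed per-share value if the deal breaks (quoted in the next comment)

# Treat the stock as a probability-weighted average of the two terminal values:
#   current = p * deal + (1 - p) * break  =>  p = (current - break) / (deal - break)
implied_prob = (current_price - break_price) / (deal_price - break_price)
print(f"implied probability of closing: {implied_prob:.0%}")  # ~39% with these numbers

# Upside implied by the call strikes mentioned above, relative to the current price:
for strike in [17.5, 20.0, 22.5, 25.0]:
    print(f"strike ${strike}: {strike / current_price - 1:+.0%}")  # +11%, +27%, +42%, +58%
```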

Comment by MichaelStJules on Spirit Airlines Merger Play · 2024-01-04T07:25:09.763Z · LW · GW

From https://archive.ph/SbuXU, for calculating the market-implied odds:

Author's analysis - assumed break price of $5 for Hawaiian and $6 for Spirit.

also:

  • Without a merger, Spirit may be financially distressed based on recent operating results. There's some risk that Spirit can't continue as a going concern without a merger.
  • Even if JetBlue prevails in court, there is some risk that the deal is recut as the offer was made in a much more favorable environment for airlines, though clauses in the merger agreement may prevent this.

So maybe you're overestimating the upside?

 

From https://archive.ph/rmZOX:

In my opinion, Spirit Airlines, Inc. equity is undervalued at around $15, but you're signing up for tremendous volatility over the coming months. The equity can get trashed under $5 or you can get the entire upside.

Comment by MichaelStJules on Spirit Airlines Merger Play · 2024-01-04T06:03:30.327Z · LW · GW

Unless I'm misreading, it looks like there's a bunch of volume+interest in put options with strike prices of around $5, but little volume+interest in options with lower strike prices (some at $2.50, but much less): $5.5 for January 5th, $5 for January 19th, $5 for February 16th. There's much more volume+interest for put options in general for Feb 16th. So if we take those seriously and I'm not misunderstanding, the market expects some chance it'll drop below $5 per share, a drop of at least ~70% from the current ~$15.8.

There's more volume+interest in put options with strike prices of $7.50 and even more for $10 for February 16th.

Comment by MichaelStJules on Spirit Airlines Merger Play · 2024-01-04T05:11:02.695Z · LW · GW

Why is the downside only -60%?

Comment by MichaelStJules on Spirit Airlines Merger Play · 2024-01-04T04:05:49.550Z · LW · GW

Why think this is underpriced by the markets?

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-27T21:48:22.538Z · LW · GW

I would be surprised if iguanas find things meaningful that humans don't find meaningful, but maybe they desire some things pretty alien to us. I'm also not sure they find anything meaningful at all, but that depends on how we define meaningfulness.

Still, I think focusing on meaningfulness is also too limited. Iguanas find things important to them, meaningful or not. Desires, motivation, pleasure and suffering all assign some kind of importance to things.

In my view, either

  1. capacity for welfare is something we can measure and compare based on cognitive effects, like effects on attention, in which case it would be surprising if other vertebrates, say, had tiny capacities for welfare relative to humans, or
  2. interpersonal utility comparisons can't be grounded, so there aren't any grounds to say iguanas have lower (or higher) capacities for welfare than humans, assuming they have any at all.
Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-27T01:46:50.451Z · LW · GW

I think that's true, but also pretty much the same as what many or most veg or reducetarian EAs did when they decided what diet to follow (and which non-food animal products to avoid), including what exceptions to allow. If the consideration of why not to murder counts as involving math, so does veganism for many or most EAs, contrary to Zvi's claim. Maybe some considered too few options or possible exceptions ahead of time, but that doesn't mean they didn't do any math.

This is also basically how I imagine rule consequentialism to work: you decide what rules to follow ahead of time, including prespecified exceptions, based on math. And then you follow the rules. You don't redo the math for each somewhat unique decision you might face, except possibly very big infrequent decisions, like your career or big donations. You don't change your rule or make a new exception right in the situation where the rule would apply, e.g. a vegan at a restaurant, someone's house or a grocery store. If you change or break your rules too easily, you undermine your own ability to follow rules you set for yourself.

But also, EA is compatible with the impermissibility of instrumental harm regardless of how the math turns out (although I have almost no sympathy for absolutist deontological views). AFAIK, deontologists, including absolutist deontologists, can defend killing in self-defense without math and also think it's better to do more good than less, all else equal.

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-26T19:12:23.475Z · LW · GW

Well, there could be ways to distinguish, but it could be like a dream, where much of your reasoning is extremely poor, but you're very confident in it anyway. Like maybe you believe that your loved ones in your dream saying the word "pizza" is overwhelming evidence of their consciousness and love for you. But if you investigated properly, you could find out they're not conscious. You just won't, because you'll never question it. If value is totally subjective and the accuracy of beliefs doesn't matter (as would seem to be the case on experientialist accounts), then this seems to be fine.

Do you think simulations are so great that it's better for people to be put into them against their wishes, as long as they perceive/judge it as more meaningful or fulfilling, even if they wouldn't find it meaningful/fulfilling with accurate beliefs? Again, we can make it so that they don't find out.

Similarly, would involuntary wireheading or drugging to make people find things more meaningful or fulfilling be good for those people?

Or, what about something like a "meaning" shockwave, similar to a hedonium shockwave: quickly killing and replacing everyone with conscious systems that take in no outside input and have no sensations (or only the bare minimum), other than what's needed to generate feelings or judgements of meaning, fulfillment, or love? (Some person-affecting views could avoid this while still matching the rest of your views.)

Of course, I think there are good practical reasons to not do things to people against their wishes, even when it's apparently in their own best interests, but I think those don't capture my objections. I just think it would be wrong, except possibly in limited cases, e.g. to prevent foreseeable regret. The point is that people really do often want their beliefs to be accurate, and what they value is really intended — by their own statements — to be pointed at something out there, not just the contents of their experiences. Experientialism seems like an example of Goodhart's law to me, like hedonism might (?) seem like an example of Goodhart's law to you.

I don't think people and their values are in general replaceable, and if they don't want to be manipulated, it's worse for them (in one way) to be manipulated. And that should only be compensated for in limited cases. As far as I know, the only way to fundamentally and robustly capture that is to care about things other than just the contents of experiences and to take a kind of preference/value-affecting view.

Still, I don't think it's necessarily bad or worse for someone to not care about anything but the contents of their experiences. And if the state of the universe was already hedonium or just experiences of meaning, that wouldn't be worse. It's the fact that people do specifically care about things beyond just the contents of their experiences. If they didn't, and also didn't care about being manipulated, then it seems like it wouldn't necessarily be bad to manipulate them.

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-26T10:02:14.793Z · LW · GW

I think only a small share of EAs would do the math before deciding whether or not to commit fraud or murder, or otherwise cause/risk involuntary harm to other people; most would instead just rule it out immediately or never consider such options in the first place. Maybe that's a low bar, because the math is too obvious to do?

What other important ways would you want (or make sense for) EAs to be more deontological? More commitment to transparency and against PR?

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-26T08:20:16.961Z · LW · GW

Maximizing just for expected total pleasure, as a risk neutral classical utilitarian? Maybe being okay with killing everyone or letting everyone die (from AGI, say), as long as the expected payoff in total pleasure is high enough?

I don't really see a very plausible path for SBF to have ended up with enough power to do this, though. Money only buys you so much, against the US government and military, unless you can take them over. And I doubt SBF would destroy us with AGI if others weren't already going to.

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-26T07:55:36.204Z · LW · GW

Where I agree with classical utilitarianism is that we should compute goodness as a function of experience, rather than e.g. preferences or world states

Isn't this incompatible with caring about genuine meaning and fulfillment, rather than just feelings of them? For example, it implies it's better for you to feel like you're doing more good than to actually do good. It implies it's better to be put into an experience machine and be systematically mistaken about everything you care about, e.g. about whether the people you love even exist (are conscious, etc.) at all, even against your own wishes, as long as it feels more meaningful and fulfilling (and you never find out it's all fake, or that can be outweighed). You could also have what you find meaningful changed against your wishes, e.g. be made to find counting blades of grass very meaningful, more so than caring for your loved ones.

FWIW, this is also an argument for non-experientialist "preference-affecting" views, similar to person-affecting views. On common accounts of how we weigh or aggregate, if there are subjective goods, then they can be generated to outweigh the violation and abandonment of your prior values, even against your own wishes, if they're strong enough.

Comment by MichaelStJules on Book Review: Going Infinite · 2023-10-26T07:53:14.935Z · LW · GW

And if emotionally significant social bonds don't count, it seems like we could be throwing away what humans typically find most important in their lives.

Of course, I think there are potentially important differences. I suspect humans tend to be willing to sacrifice or suffer much more for those they love than (almost?) all other animals. Grief also seems to affect humans more (longer, deeper), and it's totally absent in many animals.

On the other hand, I guess some other animals will fight to the death to protect their offspring. And some die apparently grieving. This seems primarily emotionally driven, but I don't think we should discount it for that fact. Emotions are one way of making evaluations, like other kinds of judgements of value.

EDIT: Another possibility is that other animals form such bonds and could even care deeply about them, but don't find them "meaningful" or "fulfilling" at all or in a way as important as humans do. Maybe those require higher cognition, e.g. concepts of meaning and fulfillment. But it seems to me that the deep caring, in just emotional and motivational terms, should be enough?

Comment by MichaelStJules on Arguments for utilitarianism are impossibility arguments under unbounded prospects · 2023-10-10T05:45:59.553Z · LW · GW

Ya, I don't think utilitarian ethics is invalidated, it's just that we don't really have much reason to be utilitarian specifically anymore (not that there are necessarily much more compelling reasons for other views). Why sum welfare and not combine them some other way? I guess there's still direct intuition: two of a good thing is twice as good as just one of them. But I don't see how we could defend that or utilitarianism in general any further in a way that isn't question-begging and doesn't depend on arguments that undermine utilitarianism when generalized.

You could just take your utility function to be $f\left(\sum_i u_i\right)$, where $f$ is any bounded increasing function, say arctan, and maximize the expected value of that. This doesn't work with actual infinities, but it can handle arbitrary prospects over finite populations. Or, you could just rank prospects by stochastic dominance with respect to the sum of utilities, like Tarsney, 2020.

You can't extend it the naive way, though, i.e. just maximize $E\left[\sum_i u_i\right]$ whenever that's finite and then do something else when it's infinite or undefined. One of the following would happen: the money pump argument goes through again, you give up stochastic dominance, or you give up transitivity, each of which seems irrational. This was my 4th response to Infinities are generally too problematic.
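To illustrate the bounded construction, here's a minimal sketch; the specific prospect and the choice of arctan are just for illustration.

```python
import math
import random

def bounded_utility(total_welfare, f=math.atan):
    """Apply a bounded increasing function f (here arctan) to total welfare."""
    return f(total_welfare)

def expected_bounded_utility(prospect, n_samples=100_000, f=math.atan):
    """Monte Carlo estimate of E[f(total welfare)], where `prospect` returns
    one random outcome's total welfare (a finite number)."""
    return sum(bounded_utility(prospect(), f) for _ in range(n_samples)) / n_samples

# A St Petersburg-like prospect over finite populations: with probability 2^-k,
# a population of 2^k people, each with welfare 1, so total welfare 2^k.
# E[total welfare] is infinite, but E[arctan(total welfare)] is finite,
# so the bounded criterion can still rank this prospect against others.
def st_petersburg_total_welfare():
    k = 1
    while random.random() < 0.5:
        k += 1
    return float(2 ** k)

print(expected_bounded_utility(st_petersburg_total_welfare))  # converges to a value below pi/2
```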

Comment by MichaelStJules on Arguments for utilitarianism are impossibility arguments under unbounded prospects · 2023-10-08T03:37:38.222Z · LW · GW

The argument can be generalized without using infinite expectations, and instead using violations of Limitedness in Russell and Isaacs, 2021 or reckless preferences in Beckstead and Thomas, 2023. However, intuitively, it involves prospects that look like they should be infinitely valuable or undefinably valuable relative to the things they're made up of.  Any violation of (the countable extension of) the Archimedean Property/continuity is going to look like you have some kind of infinity.

The issue could just be a categorization thing. I don't think philosophers would normally include this in "infinite ethics", because it involves no actual infinities out there in the world.

Comment by MichaelStJules on Arguments for utilitarianism are impossibility arguments under unbounded prospects · 2023-10-08T01:49:21.070Z · LW · GW

Also, I'd say what I'm considering here isn't really "infinite ethics", or at least not what I understand infinite ethics to be, which is concerned with actual infinities, e.g. an infinite universe, infinitely long lives or infinite value. None of the arguments here assume such infinities, only infinitely many possible outcomes with finite (but unbounded) value.

Comment by MichaelStJules on Arguments for utilitarianism are impossibility arguments under unbounded prospects · 2023-10-08T00:18:32.261Z · LW · GW

Thanks for the comment!

I don't understand this part of your argument. Can you explain how you imagine this proof working?

St Petersburg-like prospects (finite actual utility for each possible outcome, but infinite expected utility, or generalizations of them) violate extensions of each of these axioms to countably many possible outcomes:

  1. The continuity/Archimedean axiom: if A and B have finite expected utility, and A < B, there's no strict mixture of A and an infinite expected utility St Petersburg prospect $X$ (written out explicitly below), like $pA + (1-p)X$ with $0 < p < 1$, that's equivalent to B, because all such strict mixtures will have infinite expected utility. Now, you might not have defined expected utility yet, but this kind of argument would generalize: you can pick A and B to be outcomes of the St Petersburg prospect, and any strict mixture with A will be better than B.
  2. The Independence axiom: see the following footnote.[2]
  3. The Sure-Thing Principle: in the money pump argument in my post, B-$100 is strictly better than each outcome of A, but A is strictly better than B-$100. EDIT: Actually, you can just compare A with B.

I think these axioms are usually stated only for prospects for finitely many possible outcomes, but the arguments for the finitary versions, like specific money pump arguments, would apply equally (possibly with tiny modifications that wouldn't undermine them) to the countable versions. Or, at least, that's the claim of Russell and Isaacs, 2021, which they illustrate with a few arguments and briefly describe some others that would generalize. I reproduced their money pump argument in the post.
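For concreteness, the standard St Petersburg prospect is one example of what these arguments use:

$$P\big(u(X_{SP}) = 2^k\big) = 2^{-k} \text{ for } k = 1, 2, 3, \ldots, \qquad E\big[u(X_{SP})\big] = \sum_{k=1}^{\infty} 2^{-k} \cdot 2^k = \infty,$$

so every possible outcome has finite utility, but the expectation diverges, which is what breaks the countable extensions of the axioms above.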

 

For example, as you came close to saying in your responses, you could just have bounded utility functions! That ends up being rational, and seems not self-undermining because after looking at many of these arguments it seems like maybe you're kinda forced to.

Ya, I agree that would be rational. I don't think having a bounded utility function is in itself self-undermining (and I don't say so), but it would undermine utilitarianism, because it wouldn't satisfy Impartiality + (Separability or Goodsell, 2021's version of Anteriority). If you have to give up Impartiality + (Separability or Goodsell, 2021's version of Anteriority) and the arguments that support them, then there doesn't seem to be much reason left to be a utilitarian of any kind in the first place. You'll have to give up the formal proofs of utilitarianism that depend on these principles or restrictions of them that are motivated in the same ways.

You can try to make utilitarianism rational by approximating it with a bounded utility function, or applying a bounded function to total welfare and taking that as your utility function, and then maximizing expected utility, but then you undermine the main arguments for utilitarianism in the first place.

Hence, utilitarianism is irrational or self-undermining.

 

Overall, I wish you'd explain the arguments in the papers you linked better. The one argument you actually wrote in this post was interesting, you should have done more of that!

I did consider doing that, but the post is already pretty long and I didn't want to spend much more on it. Goodsell, 2021's proof is simple enough, so you could check out the paper. The proof for Theorem 4 from Russell, 2023 looks trickier. I didn't get it on my first read, and I haven't spent the time to actually understand it. EDIT: Also, the proofs aren't as nice/intuitive/fun and don't flow as naturally as the money pump argument. They present a sequence of prospects constructed in very specific ways, and give a contradiction (a violation of transitivity) when you apply all of the assumptions in the theorem. You just have to check the logic.

  1. ^

    You could refuse to define the expected utility, but the argument generalizes.

  2. ^

    Russell and Isaacs, 2021 define Countable Independence as follows:

    For any prospects $X_1, X_2, \ldots$ and $Y_1, Y_2, \ldots$, and any probabilities $p_1, p_2, \ldots$ that sum to one, if $X_i \succeq Y_i$ for each $i$, then $p_1 X_1 + p_2 X_2 + \cdots \succeq p_1 Y_1 + p_2 Y_2 + \cdots$.

    If furthermore $X_i \succ Y_i$ for some $i$ such that $p_i > 0$, then $p_1 X_1 + p_2 X_2 + \cdots \succ p_1 Y_1 + p_2 Y_2 + \cdots$.

    Then they write:

    Improper prospects clash directly with Countable Independence. Suppose $X$ is a prospect that assigns probabilities $p_1, p_2, \ldots$ to outcomes $x_1, x_2, \ldots$. We can think of $X$ as a countable mixture in two different ways. First, it is a mixture of the one-outcome prospects $x_1, x_2, \ldots$ in the obvious way. Second, it is also a mixture of infinitely many copies of $X$ itself. If $X$ is improper, this means that $X$ is strictly better than each outcome $x_i$. But then Countable Independence would require that $X$ is strictly better than $X$. (The argument proceeds the same way if $X$ is strictly worse than each outcome $x_i$ instead.)

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-03T16:21:35.904Z · LW · GW

Possibly, but by limiting access to the arguments, you also limit the public case for it and engagement by skeptics. The views within the area will also probably further reflect self-selection for credulousness and deference over skepticism.

There must be less-infohazardous arguments we can engage with. Or, maybe zero-knowledge proofs are somehow applicable. Or, we can select a mutually trusted skeptic (or set of skeptics) with relevant expertise to engage privately. Or, legally binding contracts to prevent sharing.

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T16:56:24.145Z · LW · GW

Eliezer's scenario uses atmospheric CHON. I guess he used it to allow the nanomachines to spread much more freely and aggressively.

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-10-01T16:53:11.647Z · LW · GW

Is 1% of the atmosphere way more than necessary to kill everything near the surface by attacking it?

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T02:49:43.488Z · LW · GW

Also, maybe we design scalable and efficient quantum computers with AI first, and an AGI uses those to simulate quantum chemistry more efficiently, e.g. Lloyd, 1996 and Zalka, 1996. But large quantum computers may still not be easily accessible. Hard to say.

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T02:32:27.944Z · LW · GW

High quality quantum chemistry simulations can take days or weeks to run, even on supercomputing clusters.

This doesn't seem very long for an AGI if they're patient and can do this undetected. Even months could be tolerable? And if the AGI keeps up with other AGI by self-improving to avoid being replaced, maybe even years. However, at years, there could be a race between the AGIs to take over, and we could see a bunch of them make attempts that are unlikely to succeed.

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T02:16:00.462Z · LW · GW

As a historical note and for further context, the diamondoid scenario is at least ~10 years old, outlined here by Eliezer, just not with the term "diamondoid bacteria":

The concrete illustration I often use is that a superintelligence asks itself what the fastest possible route is to increasing its real-world power, and then, rather than bothering with the digital counters that humans call money, the superintelligence solves the protein structure prediction problem, emails some DNA sequences to online peptide synthesis labs, and gets back a batch of proteins which it can mix together to create an acoustically controlled equivalent of an artificial ribosome which it can use to make second-stage nanotechnology which manufactures third-stage nanotechnology which manufactures diamondoid molecular nanotechnology and then... well, it doesn't really matter from our perspective what comes after that, because from a human perspective any technology more advanced than molecular nanotech is just overkill.  A superintelligence with molecular nanotech does not wait for you to buy things from it in order for it to acquire money.  It just moves atoms around into whatever molecular structures or large-scale structures it wants.

The first mention of "diamondoid" on LW (and by Eliezer) is this from 16 years ago, but not for an AI doom scenario.

Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-30T01:56:49.146Z · LW · GW

Are you thinking quantum computers specifically? IIRC, quantum computers can simulate quantum phenomena much more efficiently at scale than classical computers.

EDIT: For early proofs of efficient quantum simulation with quantum computers, see:

  1. Lloyd, 1996 https://fab.cba.mit.edu/classes/862.22/notes/computation/Lloyd-1996.pdf
  2. Zalka, 1996 https://arxiv.org/abs/quant-ph/9603026v2
Comment by MichaelStJules on "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation · 2023-09-29T17:41:41.552Z · LW · GW

This is the more interesting and important claim to check to me. I think the barriers to engineering bacteria are much lower, but it’s not obvious that this will avoid detection and humans responding to the threat, or that timing and/or triggers in bacteria can be reliable enough.

Comment by MichaelStJules on When would AGIs engage in conflict? · 2023-09-27T00:15:57.690Z · LW · GW

Hmm, if A is simulating B with B's source code, couldn't the simulated B find out it's being simulated and lie about its decisions or hide its actual preferences? Or would its actual preferences be derivable directly from its weights or code, without simulation?

Comment by MichaelStJules on Paper: LLMs trained on “A is B” fail to learn “B is A” · 2023-09-24T16:54:56.624Z · LW · GW

I had a similar thought about "A is B" vs "B is A", but "A is the B" should reverse to "The B is A" and vice versa when the context is held constant and nothing changes the fact, because "is" implies that it's the present condition and "the" implies uniqueness. However, the model might be trained on old and no longer correct writing, or on writing that includes quotes about past states of affairs. Some context might still be missing, too, e.g. for "A is the president", president of what? It would still be a correct inference to say "The president is A" in the same context, at least, and in some others, but not all.

Also, the present condition can change quickly, e.g. "The time is 5:21:31 pm EST" and "5:21:31 pm EST is the time" quickly become false, but I think these are rare exceptions in our use of language.

Comment by MichaelStJules on Impossibility results for unbounded utilities · 2023-09-22T04:13:06.507Z · LW · GW

p.37-38 in Goodsell, 2023 gives a better proposal, which is to clip/truncate the utilities into the range $[-N, N]$ and compare the expected clipped utilities in the limit as $N \to \infty$. This will still suffer from St Petersburg lottery problems, though.
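Spelled out, one natural way to write that comparison (my paraphrase of the idea, not necessarily Goodsell's exact formulation): prefer prospect $X$ to prospect $Y$ iff

$$\lim_{N \to \infty} \Big( E\big[\mathrm{clip}_N(u(X))\big] - E\big[\mathrm{clip}_N(u(Y))\big] \Big) > 0, \qquad \text{where } \mathrm{clip}_N(u) = \max(-N, \min(u, N)).$$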

Comment by MichaelStJules on There are no coherence theorems · 2023-09-21T07:35:35.415Z · LW · GW

Looking at Gustafsson, 2022's money pumps for completeness, the precaution principles he uses just seem pretty unintuitive to me. The idea seems to be that if you'll later face a decision situation in which you could make a choice that leaves you worse off, and in which you can't make yourself better off, then you should avoid the decision situation, even if it's entirely under your control to make a choice in that situation that won't leave you worse off. But you can just make that choice that won't leave you worse off later, instead of avoiding the situation altogether.

Here's the forcing money pump:

 

It seems obvious to me that you can just stick with A all the way through, or switch to B, and neither would violate any of your preferences or be worse than any other option. Gustafsson is saying that would be irrational, it seems, because there's some risk you'll make the wrong choices. Another kind of response like your policy I can imagine is that unless you have preferences otherwise (i.e. would strictly prefer another accessible option to what you have now), you just stick with the status quo, as the default. This means sticking with A all the way through, because you're never offered a strictly better option than it.

 

Another problem with the precaution principles is that they seem much less plausible when you seriously entertain incompleteness, rather than kind of treat incompleteness like equivalence. He effectively argues that at node 3, you should pick B, because otherwise at node 4, you could end up picking B-, which is worse than B, and there's no upside. But that basically means claiming that one of the following must hold:

  1. you'll definitely pick B- at 4, or
  2. B is better than any strict probabilistic mixture of A and B-.

But both are false in general. 1 is false in general because A is permissible at 4. 2 is false in general because A and B are incomparable and incomparability can be infectious (e.g. MacAskill, 2013), so B can be incomparable with a strict probabilistic mixture of A and B-. It also just seems unintuitive, because the claim is made generally, and so would have to hold no matter how low the probability assigned to B- is, as long as it's positive.

Imagine A is an apple, B is a banana and B- is a slightly worse banana, and I have no preferences between apples and bananas. It would be odd to say that a banana is better than an apple or a tiny probability of a worse banana. This would be like using the tiny risk of a worse banana with the apple to break a tie between the apple and the banana, but there's no tie to break, because apples and bananas are incomparable.

If A and B were equivalent, then B would indeed very plausibly be better than a strict probabilistic mixture of A and B-. This would follow from Independence, or if A, B and B- are deterministic outcomes, statewise dominance. So, I suspect the intuitions supporting the precaution principles are accidentally treating incomparability like equivalence.

I think a more useful way to think of incomparability is as indeterminacy about which is better. You could consider what happens if you treat A as (possibly infinitely) better than B in one whole treatment of the tree, consider what happens if you treat B as better than A in a separate treatment, and consider what happens if you treat them as equivalent all the way through (and extend your preference relation to be transitive and continue to satisfy stochastic dominance and independence in each case). If B were better, you'd end up at B, no money pump. If A were better, you'd end up at A, no money pump. If they were equivalent, you'd end up at either (or maybe specifically B, because of precaution), no money pump.

Comment by MichaelStJules on There are no coherence theorems · 2023-09-21T01:02:11.588Z · LW · GW

I think a multi-step decision procedure would be better. Do what your preferences themselves tell you to do and rule out any options you can with them. If there are multiple remaining incomparable options, then apply your original policy to avoid money pumps.

Comment by MichaelStJules on There are no coherence theorems · 2023-09-20T04:28:28.242Z · LW · GW

This also looks like a generalization of stochastic dominance.

Comment by MichaelStJules on There are no coherence theorems · 2023-09-20T02:33:56.991Z · LW · GW

Coming back to this, the policy

if I previously turned down some option X, I will not choose any option that I strictly disprefer to X

seems irrational to me if applied in general. Suppose I offer you A and B, where both A and B are random, and B is ex ante preferable to A, e.g. B stochastically dominates A, but B has some chance of realizing an outcome worse than A. You pick B, turning down A. Then you evaluate B and get its realized outcome b. However, suppose you get unlucky, and b is worse than A. Suppose further that there's a souring of A, A-, that's still preferable to b. Then, I offer you to trade b for A-. It seems irrational to not take A-, but the policy forbids it, because you previously turned down A and you strictly disprefer A- to A.

Maybe what you need to do is first evaluate according to your multi-utility function (or stochastic dominance, which I think is a special case) to rule out some options, i.e. to rule out not trading b for A- when the latter is better than the former, and then apply your policy to rule out more options.
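A concrete numerical instance of the kind of case I have in mind (the specific lotteries are made up for illustration):

```python
# B strictly (first-order) stochastically dominates A, but B can still
# realize an outcome worse than the lottery A.
A = [1.0, 4.0]          # A: 1 or 4, each with probability 1/2 (EV 2.5)
A_minus = [0.9, 3.9]    # a souring of A (EV 2.4)
B = [2.0, 5.0]          # B: 2 or 5, each with probability 1/2 (EV 3.5)

def ev(lottery):
    return sum(lottery) / len(lottery)

# Ex ante you pick B over A, turning A down. Suppose you then get unlucky:
b = 2.0                 # the unlucky realization of B

# Trading the realized b for the lottery A_minus looks clearly good (EV 2.4 > 2.0),
# but the quoted policy forbids it: you previously turned down A, and A_minus is
# strictly dispreferred to A.
print(ev(A), ev(A_minus), ev(B), b)
```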

Comment by MichaelStJules on Microdooms averted by working on AI Safety · 2023-09-18T19:19:22.594Z · LW · GW

Also, the estimate of the current number of researchers probably underestimates the number of people (or person-hours) who will work on AI safety. You should probably expect further growth to the number of people working on AI safety, because the topic is getting mainstream coverage and support, Hinton and Bengio have become advocates, and it's being pushed more in EA (funding, community building, career advice).

However, the FTX collapse is reason to believe there will be less funding going forward.

Comment by MichaelStJules on Microdooms averted by working on AI Safety · 2023-09-17T23:56:56.533Z · LW · GW

Some other possibilities that may be worth considering and can further reduce impact, at least for an individual looking to work on AI safety themself:

  1. Some work is net negative and increases the risk of doom or wastes the time and attention of people who could be doing more productive things.
  2. Practical limits on the number of people working at a time, e.g. funding, management/supervision capacity. This could mean some people have a much lower probability of making a difference, if their taking a position pushes someone else who would have worked in the field out of it, or into (possibly much) less useful work.
Comment by MichaelStJules on When would AGIs engage in conflict? · 2023-08-31T09:31:43.281Z · LW · GW

An AGI could give read and copy access to the code being run and the weights directly on the devices from which the AGI is communicating. That could still be a modified copy of the original and more powerful (or with many unmodified copies) AGI, though. So, the other side may need to track all of the copies, maybe even offline ones that would go online on some trigger or at some date.

Also, giving read and copy access could be dangerous to the AGI if it doesn't have copies elsewhere.

Comment by MichaelStJules on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-27T18:58:33.780Z · LW · GW

Some other discussion of his views on (animal) consciousness here (and in the comments).

Comment by MichaelStJules on Eliezer Yudkowsky Is Frequently, Confidently, Egregiously Wrong · 2023-08-27T18:53:50.716Z · LW · GW

My understanding from Eliezer's writing is that he's an illusionist (and/or a higher-order theorist) about consciousness. However, illusionism (and higher-order theories) are compatible with mammals and birds, at least, being conscious. It depends on the specifics.

I'm also an illusionist about consciousness and very sympathetic to the idea that some kinds of higher-order processes are required, but I do think mammals and birds, at least, are very probably conscious, and subject to consciousness illusions. My understanding is that Humphrey (Humphrey, 2022; Humphrey, 2023a; Humphrey, 2023b; Humphrey, 2017; Romeo, 2023; Humphrey, 2006; Humphrey, 2011) and Muehlhauser (2017) (a report for Open Phil, but representing his own views) would say the same. Furthermore, I think the standard interpretation of illusionism doesn't require consciousness illusions or higher-order processes in conscious subjects at all, and instead a system is conscious if connecting a sufficiently sophisticated introspective system to it the right way would lead to consciousness illusions, and this interpretation would plausibly attribute consciousness more widely, possibly quite widely (Blackmore, 2016 (available submitted draft); Frankish, 2020; Frankish, 2021; Frankish, 2023; Graziano, 2021; Dung, 2022).

If I recall correctly, Eliezer seemed to give substantial weight to relatively sophisticated self- and other-modelling, like cognitive empathy and passing the mirror test. Few animals seem to pass the mirror test, so that would be reason for skepticism.

However, maybe they're just not smart enough to infer that the reflection is theirs, or they don't rely enough on sight. Or, they may recognize themselves in other ways or at least to limited degrees. Dogs can remember what actions they've spontaneously taken (Fugazza et al., 2020) and recognize their own bodies as obstacles (Lenkei, 2021), and grey wolves show signs of self-recognition via a scent mirror test (Cazzolla Gatti et al., 2021, layman summary in Mates, 2021). Pigeons can discriminate themselves from conspecifics with mirrors, even if they don't recognize the reflections as themselves (Wittek et al., 2021; Toda and Watanabe, 2008). Mice are subject to the rubber tail illusion and so probably have a sense of body ownership (Wada et al., 2016).

Furthermore, Carey and Fry (1995) show that pigs generalize the discrimination between non-anxiety states and drug-induced anxiety to non-anxiety and anxiety in general, in this case by pressing one lever repeatedly with anxiety, and alternating between two levers without anxiety (the levers gave food rewards, but only if they pressed them according to the condition). Similar experiments were performed on rodents, as discussed in Sánchez-Suárez, 2016, in section 4.d., starting on p.81. Rats generalized from hangover to morphine withdrawal and jetlag, from high doses of cocaine to movement restriction, and from an anxiety-inducing drug to aggressive defeat and predator cues. Of course, anxiety has physical symptoms, so maybe this is what they're discriminating, not the negative affect.

 

There are also of course many non-illusionist theories of consciousness that attribute consciousness more widely that are defended (although I'm personally not sympathetic, unless they're illusionist-compatible), and theory-neutral or theory-light approaches. On theory-neutral and theory-light approaches, see Low, 2012; Sneddon et al., 2014; Le Neindre et al., 2016; Rethink Priorities, 2019; Birch, 2020; Birch et al., 2022; Mason and Lavery, 2022, generally giving more weight to the more recent work.

Comment by MichaelStJules on Is Deontological AI Safe? [Feedback Draft] · 2023-08-02T06:55:04.191Z · LW · GW

I think it's worth pointing out that from the POV of such ethical views, non-extinction could be an existential risk relative to extinction, or otherwise not that important (see also the asymmetric views in Thomas, 2022). If we assign some credence to those views, then we might instead focus more of our resources on avoiding harms without also (significantly) increasing extinction risks, perhaps especially reducing s-risks or the torture of sentient beings.

Furthermore, the more we reduce the risks of such harms, the less prone deontological (and other morally asymmetric) AI could be to aim for extinction.

Comment by MichaelStJules on There are no coherence theorems · 2023-08-02T05:55:42.244Z · LW · GW

The arguments typically require agents to make decisions independently of the parts of the decision tree in the past (or that are otherwise no longer accessible, in case they were ruled out). But an agent need not do that. An agent can always avoid getting money pumped by just following the policy of never picking an option that completes a money pump (or the policy of never making any trades, say). They can even do this with preference cycles.

Does this mean money pump arguments don't tell us anything? Such a policy may have other costs that an agent would want to avoid, if following their preferences locally would otherwise lead to getting money pumped (e.g. as Gustafsson (2022) argues in section 7, Against Resolute Choice), but how important this is could depend on those costs, including how frequently they expect to incur them, as well as the costs of changing their preferences to satisfy rationality axioms. It seems bad to pick options you'll foreseeably regret. However, changing your preferences to fit some proposed rationality requirements also seems foreseeably regrettable in another way: you have to give up things you care about, or some ways you care about them. And that can be worse than your other options for avoiding money pumps, or even, sometimes, than getting money pumped.

Furthermore, agents plausibly sometimes need to make commitments that would bind them in the future, even if they'd like to change their minds later, in order to win in Parfit's hitchhiker, say.

 

Similarly, if the claim is that, instead of avoiding money pumps, an agent should just avoid any lottery that's worse than (or strictly statewise dominated by, or strictly stochastically dominated by, under some suitable generalization[1]) another they could have guaranteed, it's not clear that's a requirement of rationality, either. If I prefer A<B<C<A, then it doesn't seem more regrettable if I pick one option than if I pick another (knowing nothing else), even though no matter what option I pick, it seems regrettable that I didn't pick another. Choosing foreseeably regrettable options seems bad, but if every option is (foreseeably) regrettable in some way, and there's no least of the evils, then is it actually irrational?

 

Furthermore, if a superintelligence is really good at forecasting, then maybe we should expect it to have substantial knowledge of the decision tree in advance, and to typically be able to steer clear of situations where it might face a money pump or other dilemmas, and if it ever does get money pumped, the costs of all money pumps would be relatively small compared to its gains.

  1. ^

     $X$ (strictly) stochastically dominates $Y$ iff there's a "probability rearrangement" $Y'$ of $Y$ such that $X$ (strictly) statewise dominates $Y'$.

Comment by MichaelStJules on There are no coherence theorems · 2023-08-02T04:36:26.289Z · LW · GW

See also EJT's comment here (and the rest of the thread). You'd just pick any one of the utility functions. You can also probably drop continuity for something weaker, as I point out in my reply there.

Comment by MichaelStJules on There are no coherence theorems · 2023-08-02T01:44:50.758Z · LW · GW

This is cool. I don't think violations of continuity are also in general exploitable, but I'd guess you should also be able to replace continuity with something weaker from Russell and Isaacs, 2020, just enough to rule out St. Petersburg-like lotteries, specifically any one of Countable Independence (which can also replace independence), the Extended Outcome Principle (which can also replace independence) or Limitedness, and then replace the real-valued utility functions with utility functions representable by "lexicographically ordered ordinal sequences of bounded real utilities".

Comment by MichaelStJules on There are no coherence theorems · 2023-08-01T18:58:33.420Z · LW · GW

EDIT: Looks like a similar point made here.

 

I wonder if we can "extend" utility maximization representation theorems to drop Completeness. There's already an extension to drop Continuity by using an ordinal-indexed vector (sequence) of real numbers, with entries sorted lexicographically ("lexicographically ordered ordinal sequences of bounded real utilities", Russell and Isaacs, 2020). If we drop Completeness, maybe we can still represent the order with a vector of independent but incomparable dimensions across which it must respect ex ante Pareto efficiency (and each of those dimensions could also be split into an ordinal-indexed vector of real numbers with entries sorted lexicographically, if we're also dropping Continuity)?

These also give us examples of somewhat natural/non-crazy orders that are consistent with dropping Completeness. I've seen people (including some economists) claim interpersonal utility comparisons are impossible and that we should only seek Pareto efficiency across people and not worry about tradeoffs between people. (Said Achmiz already pointed this and other examples out.)

Intuitively, the dimensions don't actually need to be totally independent. For example, the order could be symmetric/anonymous/impartial between some dimensions, i.e. swapping values between these dimensions gives indifference. You could also have some strict preferences over some large tradeoffs between dimensions, but not small tradeoffs. Or even, maybe you want more apples and more oranges without tradeoffs between them, but also prefer more bananas to more apples and more bananas to more oranges. Or, a parent, having to give a gift to one of their children, may strictly prefer randomly choosing over picking one child to give it to, and find the nonrandom options incomparable to one another (although this may have problems if, once they find out which child they'll give it to, they're given the option to rerandomize again; they might never actually choose).

Maybe you could still represent all of this with a large number of, possibly infinitely many, real-valued utility functions (or utility functions representable by "lexicographically ordered ordinal sequences of bounded real utilities") instead. So, the correct representation could still be something like a (possibly infinite) set of utility functions (each possibly a "lexicographically ordered ordinal sequence of bounded real utilities"), across which you must respect ex ante Pareto efficiency. This would be similar to the maximality rule over your representor/credal set/credal committee for imprecise credences (Mogensen, 2019).

Then, just combine this with your policy "if I previously turned down some option X, I will not choose any option that I strictly disprefer to X", where strictly disprefer is understood to mean ex ante Pareto dominated. 
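As a minimal sketch of how the maximality rule over a set of utility functions would work (the options and utility numbers here are made up, and real representations could involve infinitely many or lexicographic utility functions):

```python
# Each option is scored by several utility functions (one per incomparable
# dimension, or per member of the credal committee). An option is ruled out
# only if some other option is at least as good on every utility function and
# strictly better on at least one (ex ante Pareto dominance).

options = {
    "apples":  (3.0, 1.0),
    "oranges": (1.0, 3.0),
    "bananas": (2.0, 2.0),
    "spoiled": (0.5, 0.5),  # Pareto dominated by every other option
}

def pareto_dominates(u, v):
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

maximal = {
    name for name, u in options.items()
    if not any(pareto_dominates(v, u) for other, v in options.items() if other != name)
}
print(maximal)  # {'apples', 'oranges', 'bananas'}; only 'spoiled' is ruled out
```

The options that survive are mutually incomparable, and the money-pump-avoiding policy then adjudicates between them.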

But now this seems like a coherence theorem, just with a broader interpretation of "expected utility".

To be clear, I don't know if this "theorem" is true at all.

 

Possibly also related: McCarthy et al., 2020 have a utilitarian representation theorem that's consistent with "the rejection of all of the expected utility axioms, completeness, continuity, and independence, at both the individual and social levels". However, it's not a real-valued representation. It reduces lotteries over a group of people to a lottery over outcomes for one person, as the probabilistic mixture of each separate person's lottery into one lottery.

Comment by MichaelStJules on Consciousness as intrinsically valued internal experience · 2023-07-30T20:24:57.235Z · LW · GW

There's a recent survey of the general public's answers to "In your own words, what is consciousness?"

Consciousness: In your own words by Michael Graziano and Isaac Ray Christian, 2022

Abstract:

Surprisingly little is known about how the general public understands consciousness, yet information on common intuitions is crucial to discussions and theories of consciousness. We asked 202 members of the general public, “In your own words, what is consciousness?” and analyzed the frequencies with which different perspectives on consciousness were represented. Almost all people (89%) described consciousness as fundamentally receptive – possessing, knowing, perceiving, being aware, or experiencing. In contrast, the perspective that consciousness is agentic (actively making decisions, driving output, or controlling behavior) appeared in only 33% of responses. Consciousness as a social phenomenon was represented by 24% of people. Consciousness as being awake or alert was mentioned by 19%. Consciousness as mystical, transcending the physical world, was mentioned by only 10%. Consciousness in relation to memory was mentioned by 6%. Consciousness as an inner voice or inner being – the homunculus view – was represented by 5%. Finally, only three people (1.5%) mentioned a specific, scholarly theory about consciousness, suggesting that we successfully sampled the opinions of the general public rather than capturing an academic construct. We found little difference between men and women, young and old, or US and non-US participants, except for one possible generation shift. Young, non-US participants were more likely to associate consciousness with moral decision-making. These findings show a snapshot of the public understanding of consciousness – a network of associated concepts, represented at varying strengths, such that some are more likely to emerge when people are asked an open-ended question about it.

Comment by MichaelStJules on When do "brains beat brawn" in Chess? An experiment · 2023-07-06T23:47:26.251Z · LW · GW

I think it's more illustrative than anything, and a response to Robert Miles using chess against Magnus Carlsen as an analogy for humans vs AGI. The point is that a large enough material advantage can help someone win against a far smarter opponent. Somewhat more generally, I think arguments for AI risk often put intelligence on a pedestal, without addressing its limitations, including the physical resource disadvantages AGIs will plausibly face.

I agree that the specifics of chess probably aren't that helpful for informing AI risk estimates, and that a better tuned engine could have done better against the author.

Maybe better experiments to run would be playing real-time strategy games against a far smarter but materially disadvantaged AI, but this would also limit the space of actions an AI could take relative to the real world.

Comment by MichaelStJules on When do "brains beat brawn" in Chess? An experiment · 2023-07-04T20:41:29.990Z · LW · GW

For my 2nd paragraph, I meant that the experiment would underestimate the required resource gap. Being down exactly by a queen at the start of a game is not as bad as being down exactly by a queen later into the game when there are fewer pieces overall left, because that's a larger relative gap in resources.

Comment by MichaelStJules on When do "brains beat brawn" in Chess? An experiment · 2023-07-01T15:55:49.790Z · LW · GW

Would queen-odds games pass through roughly within-distribution game states, anyway, though?

Or, either way, if/when it does reach roughly within-distribution game states, the material advantage in relative terms will be much greater than just being down a queen early on, so the starting material advantage would still underestimate the real material advantage for a better trained AI.

Comment by MichaelStJules on When do "brains beat brawn" in Chess? An experiment · 2023-07-01T15:31:28.486Z · LW · GW

Why is it too late if it would take militaries to stop it? Couldn't the militaries stop it?

Comment by MichaelStJules on What money-pumps exist, if any, for deontologists? · 2023-06-29T00:41:26.681Z · LW · GW

There's also a similar interesting argument here, but I don't think you get a money pump out of it either: https://rychappell.substack.com/p/a-new-paradox-of-deontology