simon's Shortform 2023-04-27T03:25:07.778Z
No, really, it predicts next tokens. 2023-04-18T03:47:21.797Z


Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-14T04:18:20.316Z · LW · GW

There are at least two potential sources of cooperation: symmetry and mutual source code knowledge. Symmetry should be fragile to small changes in source code (I expect), as well as to asymmetry between the situations of the different parties, while mutual source code knowledge doesn't require those sorts of symmetry at all (but does require knowledge).

Edit: for some reason my intuition expects cooperation from similarity to be less fragile in the Newcomb's problem/code knowledge case (similarity to simulation) than if the similarity is just plain similarity to another, non-simulation agent. I need to think about why and if this has any connection to what would actually happen.

Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-14T04:05:52.271Z · LW · GW

I did not realize that the UDT agents were assumed to behave identically; I was thinking that the cooperation was maintained, not by symmetry, but by mutual source code knowledge. 

If it's symmetry, well, if you can sneak a different agent into a clique without getting singled out, that's an advantage. Again not a problem with UDT as such.

Edit: of course they do behave identically because they did have identical code (which was the source of the knowledge). (Though I don't expect agents in the same decision theory class to be identical in the typical case).

Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-14T03:53:55.931Z · LW · GW

You can use it when it can be well defined. I think in the real world you mostly do have something, at least in the past, that you can call "original", and when it no longer exists you could modify the rule to, e.g., "what the original instantiation, if it anticipated this scenario, would have defined as its successor".

Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-14T02:58:44.402Z · LW · GW

I think this is also a rather contrived scenario, because if the UDT agents could (silently) change their own code, cooperation would immediately break down. So it relies on the CDT agents being able to silently have code that differs from the most common (and thus expected) code, while the UDT agents cannot.

Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-13T16:54:58.585Z · LW · GW

Indexical values are not reflectively consistent. UDT "solves" this problem by implicitly assuming (via the type signature of its utility function) that the agent doesn't have indexical values. But humans seemingly do have indexical values, so what to do about that?

Your linked post seems to suggest only some varieties of indexical value sources are reflectively inconsistent, but what's missing is an indexical value source that's both reflectively consistent and makes sense for humans. So there could still be a way to make indexical values reflectively consistent, just that we haven't thought of it yet?

E.g. would it work to privilege an agent's original instantiation, so that if you're uncertain whether you're the original or the copy, you follow the interests of the original? That would seem to address the counterfactual mugging question, at least if Omega were to predict by simulating you.

(I'm not sure if that technically counts as 'indexical' but it seems to me it can still be 'selfish' in the colloquial sense, no?)

Comment by simon on UDT shows that decision theory is more puzzling than ever · 2023-09-13T15:57:50.449Z · LW · GW

2TDT-1CDT - If there's a population of mostly TDT/UDT agents and few CDT agents (and nobody knows who the CDT agents are) and they're randomly paired up to play one-shot PD, then the CDT agents do better. What does this imply?

I don't think that's the case unless you have really weird assumptions. If the other party can't tell what the TDT/UDT agent will pick, they'll defect, won't they? It seems strange that the other party would be able to tell what the TDT/UDT agent will pick but not whether they're TDT/UDT or CDT.

Edit: OK, I see the idea is that the TDT/UDT agents have known, fixed code, which can, e.g., randomly mutate into CDT. They can't voluntarily change their code. Being able to trick the other party about your code is an advantage - I don't see that as a TDT/UDT problem.

Comment by simon on The omnizoid - Heighn FDT Debate #1: Why FDT Isn't Crazy · 2023-09-04T16:58:47.931Z · LW · GW

Really, if the predictor mistake rate is indeed 1 in a trillion trillion then it's much more probable that the note lies than that you are in this extremely rare circumstances where you pick the left envelope and the bomb is indeed there.

Likely true in practice, but this is a hypothetical example and FDT does not rely on that.

On the other hand, I'm not sure that FDT really recommends you to procreate in Procreation.

That scenario did seem underspecified to me too. 

Also if you do not procreate and thus do not exist, how can you have an utility function valueing existence? 

Hypothetically, you have a particular utility function/decision procedure - but some values of those might be incompatible with you actually existing.

Comment by simon on Yet more UFO Betting: Put Up or Shut Up · 2023-08-11T20:29:31.899Z · LW · GW

Offering money up front reduces issues with the bet proposer's credibility as a counterparty. With your proposed scheme, that becomes an issue.

Comment by simon on Yet more UFO Betting: Put Up or Shut Up · 2023-08-11T20:05:38.371Z · LW · GW

Many of us bet with RatsWrongAboutUAP at 50:1 odds (at least me (simon), Charlie Steiner, Algon, Philh and Thomas Sepulchre). Some bettors with less history, or who for whatever reason bet later (such as Eliezer, after RatsWrongAboutUAP increased the odds he was demanding), got less favourable odds. That offer got plenty of attention, and as it was more favourable than what you are offering, anyone who would bet with you should have already bet with RatsWrongAboutUAP. Even now AFAIK his offer of 150:1 is still open, and doesn't come with demands about people revealing private details of their finances as a precondition to the bet. Also, your credibility as an actual bettor (as opposed to someone, say, fishing for information) is lower due to your lack of history of actual payouts.

Comment by simon on [deleted post] 2023-08-11T19:24:09.522Z

David points a gun at Jess’s head and says “If you don’t give me your passwords, I will shoot you”. Even though Jess doesn’t have to give David her passwords, she does anyway.

If you are going to focus on aggression for AI safety, defining the kind of aggression that you are going to forbid as excluding this case is ... rather narrow.

I don't think enshrining victim blaming (If you were smart/strong-willed enough, you could have resisted...) is the way to go for actually non-disastrous AI, and where do you draw the line? If one person in the whole world can resist, is it OK?

I don't think you've made the case that the line between "resistible" and "irresistible" aggression is any clearer or less subjective than between "resistible" aggression and non-aggression, and frankly I think the opposite is the case. It seems to me that you have likely taken an approach to defining aggression that doesn't work well (e.g. perhaps something to do with bilateral relations only?) and are reaching for the "resistible/irresistible" distinction as something to try to salvage your non-working approach.

FWIW I think "aggression", as actually defined by humans, is highly dependent on social norms which define personal rights and boundaries, and that for an AI to have a useful definition of this, it's going to need a pretty good understanding of these aspects of human relations, just as other alignment schemes also need good understanding of humans (so, this isn't a good shortcut...). If you did properly get an AI to sensibly define boundaries though, I don't think whether someone could hypothetically resist or not is going to be a particularly useful additional concept.

Comment by simon on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-07T16:30:12.499Z · LW · GW

Such modelling would destroy all the assymetry between the two experiments which I'm talking about in the post.

Exactly. There is no asymmetry (mathematically). I agree in principle that one could make different assumptions in each case, but I think making the same assumptions is probably common without any motte/bailey involved and equivalent assumptions produce mathematically equivalent results.

As this is the key point in my view, I'll point out how the classic thirder argument is the same/different for the incubator case and relegate brief comments on your specific arguments to a footnote[1].

Here is the classic thirder case as per Wikipedia:

The thirder position argues that the probability of heads is 1/3. Adam Elga argued for this position originally[2] as follows: Suppose Sleeping Beauty is told and she comes to fully believe that the coin landed tails. By even a highly restricted principle of indifference, given that the coin lands tails, her credence that it is Monday should equal her credence that it is Tuesday, since being in one situation would be subjectively indistinguishable from the other. In other words, P(Monday | Tails) = P(Tuesday | Tails), and thus

P(Tails and Tuesday) = P(Tails and Monday).

Suppose now that Sleeping Beauty is told upon awakening and comes to fully believe that it is Monday. Guided by the objective chance of heads landing being equal to the chance of tails landing, it should hold that P(Tails | Monday) = P(Heads | Monday), and thus

P(Tails and Tuesday) = P(Tails and Monday) = P(Heads and Monday).

Since these three outcomes are exhaustive and exclusive for one trial (and thus their probabilities must add to 1), the probability of each is then 1/3 by the previous two steps in the argument.

Here is the above modified to apply to the incubator version. Most of it still applies straightforwardly, but I've bolded the most questionable step:

The thirder position argues that the probability of heads is 1/3. Adam Elga would argue for this position (modified) as follows: Suppose Sleeping Beauty is told and she comes to fully believe that the coin landed tails. By even a highly restricted principle of indifference, given that the coin lands tails, her credence that she is in Room 1 should equal her credence that she is in Room 2, since being in one situation would be subjectively indistinguishable from the other. In other words, P(Room 1 | Tails) = P(Room 2 | Tails), and thus

P(Tails and Room 1) = P(Tails and Room 2).

Suppose now that Sleeping Beauty is told upon awakening and comes to fully believe that she is in Room 1. Guided by the objective chance of heads landing being equal to the chance of tails landing, it should hold that P(Tails | Room 1) = P(Heads | Room 1), and thus

P(Tails and Room 2) = P(Tails and Room 1) = P(Heads and Room 1).

Since these three outcomes are exhaustive and exclusive for one trial (and thus their probabilities must add to 1), the probability of each is then 1/3 by the previous two steps in the argument.

Now, since, as you point out, you can't make the decision to add Room 2 later in the incubator experiment as actually written, this bolded step is more questionable than in the classic version. However, one can still make the argument, and there is no contradiction with the classic version - no motte/bailey. I note that you could easily modify the incubator version to add Room 2 later. In that case, Elga's argument would apply pretty much equivalently to the classic version. Maybe you think changing the timing to make it simultaneous vs. nonsimultaneous should result in different outcomes - that's fine, your personal opinion - but it's not irrational for a person to think it doesn't make a difference!
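As a rough check of the arithmetic only (not of the philosophical assumptions), here is a minimal sketch of the last step of the argument. The function name `thirder_credences` and the label tuples are my own illustration. It simply encodes the two equalities plus normalisation, and shows that relabelling days as rooms changes nothing in the calculation.

```python
from fractions import Fraction

def thirder_credences(labels):
    """Apply the two equalities from the argument, then normalise.

    labels is a pair such as ("Monday", "Tuesday") or ("Room 1", "Room 2").
    This is an illustrative sketch: it takes both equalities as given.
    """
    first, second = labels
    outcomes = [("Tails", first), ("Tails", second), ("Heads", first)]
    # Step 1: P(Tails and first) = P(Tails and second)
    # Step 2: P(Tails and first) = P(Heads and first)
    # So all three outcomes get equal probability, and since they are
    # exhaustive and exclusive the probabilities must sum to 1.
    p = Fraction(1, len(outcomes))
    return {outcome: p for outcome in outcomes}

classic = thirder_credences(("Monday", "Tuesday"))
incubator = thirder_credences(("Room 1", "Room 2"))

print(classic[("Heads", "Monday")])    # 1/3
print(incubator[("Heads", "Room 1")])  # 1/3
```

The only place the two versions can differ is in whether you accept Step 2 as an input; once accepted, the result is the same.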

  1. ^

    same person/different people - concept exists in map, not territory. Physical continuity/discontinuity on the other hand does exist in territory, but relevance should be argued not assumed; and certainly if one wants to consider someone irrational for discounting the relevance, that would need a lot of justification!

    timing of various events - exists in territory, but again relevance should not be assumed, and it would need justification to consider discounting it irrational.



    You claim; but consistent assumptions could be applied to make them equivalent (such as my above modification of Elga's argument still being able to apply equivalently in both cases).


    No, I couldn't. I also need to know which room I am in.

    You literally just threw away that information (in the "thirders scoring rule" which is where I suggested you could make that replacement)!


    Whatever "additional information" thirders assume there is, it's either is represented in the result of the coin toss and which room I am in, or their assumptions are not applicable for the incubator experiment.

    Yes, but... you threw away the room information in your "thirders scoring rule"!


    Anyway here is a more explicit implementation with both scoring rules. :

    That works. Note that your new thirder scoring rule still doesn't care what room "you" are in, so your initial sampling (which bakes in your personal assumptions) is rendered irrelevant. The classic code also works, and in my view correctly represents the incubator situation with days changed to rooms.


    You in particular

    Another case of: concept exists in map, not territory (unless we are already talking about some particular Sleeping Beauty instance).[2]

  2. ^

    While I'm on the subject of concepts that exist in the map, not the territory, here's another one:

    Probability (at least the concept of some particular subjective probability being "rational")

    In my view, a claim that some subjective probability is rational amounts to something like claiming that that subjective probability will tend to pay off in some way...which is why I consider it to be ambiguous in Sleeping Beauty since the problem is specifically constructed to avoid any clear payoffs to Sleeping Beauty's beliefs. FWIW though, I do think that modifications that would favour thirderism (such as Radford Neal's example of Sleeping Beauty deciding to leave the room) tend to seem more natural to me personally than modifications that would favour halferism. But that's a judgement call and not enough for me to rule halferism out as irrational.

Comment by simon on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-06T22:16:09.528Z · LW · GW

The incubator code generates a coin toss and a room. If the first coin toss is tails, the room is selected randomly based on a second coin toss, which does not acknowledge that both rooms actually occur in the real experiment, instead baking in your own assumptions about sampling.

Then, your "thirders scoring rule" takes only the first coin toss from the incubator code, throwing out all additional information, to generate a set of observations to be weighted according to thirder assumptions. While this "thirders scoring rule" does correctly reflect thirder assumptions, that does not make the original incubator code compatible with thirder assumptions, since all you used from it was the initial coin toss. You could have just written

coin = "Heads" if random.random() >= 0.5 else "Tails"

in place of 

room, coin = incubator()

A better version of the incubator code would be to use exactly the same code as for the "classic" version but just substituting "Room 1" for "Monday" and "Room 2" for "Tuesday".
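To make that concrete, here is a minimal sketch of what I mean. This is my own illustrative code, not the code from the post, and the names `experiment` and `heads_frequencies` are hypothetical. The same function is run with day labels and with room labels, and the results are then weighted either per experiment or per observer-moment (awakening or room occupant).

```python
import random

def experiment(labels, rng):
    """One run: Heads yields one observer-moment, Tails yields both."""
    coin = "Heads" if rng.random() >= 0.5 else "Tails"
    moments = [labels[0]] if coin == "Heads" else list(labels)
    return coin, moments

def heads_frequencies(labels, n=100_000, seed=0):
    """Return (per-experiment, per-observer-moment) frequency of Heads."""
    rng = random.Random(seed)
    heads_runs = 0
    heads_moments = 0
    total_moments = 0
    for _ in range(n):
        coin, moments = experiment(labels, rng)
        total_moments += len(moments)
        if coin == "Heads":
            heads_runs += 1
            heads_moments += len(moments)
    per_experiment = heads_runs / n              # halfer-style weighting
    per_moment = heads_moments / total_moments   # thirder-style weighting
    return per_experiment, per_moment

classic = heads_frequencies(("Monday", "Tuesday"))
incubator = heads_frequencies(("Room 1", "Room 2"))
print("classic:  ", classic)
print("incubator:", incubator)
```

With the same seed, the two calls return identical numbers, since the labels never enter the calculation: roughly 1/2 under per-experiment weighting and roughly 1/3 under per-observer-moment weighting in both versions.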

Comment by simon on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-06T15:57:43.543Z · LW · GW

My simulation is based on the experiment as stated.


Your simulations for the regular sleeping beauty problem are good: they acknowledge that multiple sleeping beauty awakenings exist in the event of tails and then weight them in different ways according to different philosophical assumptions about how the weighting should occur.

Your simulation for the incubator version on the other hand, does not acknowledge that there are multiple sleeping beauties in the event of tails, and skips directly to sampling between them according to your personal assumptions.

If you were to do it properly, you would find that it is mathematically equivalent to the regular version, with the same sampling/weighting assumptions available each with the same corresponding answers to the regular version.

Note, mathematically equivalent does not mean philosophically equivalent; one could still be halfer for one and thirder for the other, based on which assumptions you prefer in which circumstances. It's just that both halfer and thirder assumptions can exist in both cases and will work equivalently.

Comment by simon on Anthropical Motte and Bailey in two versions of Sleeping Beauty · 2023-08-06T05:28:45.222Z · LW · GW

And the whole controversy comes from the ambiguity, where people confuse probability that the coin is Heads with probability that the coin is Heads weighted by the number of awakenings you have.

I don't think this is confusion. Obviously no one thinks that any outsider's probability should be different from 1/2, it is just that:

You should explicitly specify whether by "degree of belief for the coin having come up heads" you mean in this experiment or in this awakening.

Thirders think that "this awakening" is the correct way to define subjective probability; you think "this experiment" is the correct way to define subjective probability. It is a matter of definitions, and no confusion is necessarily involved.

Comment by simon on The UAP Disclosure Act of 2023 and its implications · 2023-07-25T19:55:17.762Z · LW · GW

these are high-ranking politicians who are potentially putting their career in jeopardy by giving credence to claims that are highly socially unacceptable and which leave them open to low-effort political attacks

They aren't likely to face any difficulties with that; now that UFOs are topical, their decision will be popular.

This isn't independent evidence, it's obviously a response to what's happened before, and updating much on it would be a mistake.

My general view has NOT been that aliens are so super unlikely that I'm just not convinced by all the amazing evidence. Rather, it's that the evidence quality so far has been low and filtered, and people updating a lot on it is a mistake. Analogy with psi: you can produce as big a stack of results with as low p-values as you want, but if evidence going the other way is filtered out and there's room for randomness+mistakes, it all adds up to very little.
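A toy illustration of the filtering point, under assumptions I am making up for the example: every study tests a true null (so p-values are uniform on [0, 1], as they are for a continuous test statistic), and a hypothetical filter only passes along results with p < 0.05. The names `run_null_studies` and `published` are mine.

```python
import random

def run_null_studies(n_studies, rng):
    """Simulate p-values for studies where there is no real effect.

    Assumption: under a true null, p-values are uniform on [0, 1].
    """
    return [rng.random() for _ in range(n_studies)]

def published(p_values, threshold=0.05):
    """Hypothetical filter: only 'significant' results get disseminated."""
    return [p for p in p_values if p < threshold]

rng = random.Random(0)
all_p = run_null_studies(10_000, rng)
stack = published(all_p)

print(f"studies run: {len(all_p)}")
print(f"'significant' results that pass the filter: {len(stack)}")
print(f"smallest p-value in the stack: {min(stack):.5f}")
```

Someone who only sees `stack` sees hundreds of low p-values and no nulls, even though there is nothing there by construction. Run more studies and the stack gets as big as you like.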

Full disclosure - I am one of the anti-UAP bettors.

P.S. I am much more interested in potential hard evidence, such as the alleged interstellar meteor residues Avi Loeb dredged up. Though, the filtered evidence issue remains - if you look long and hard enough for "anomalous" things you are likely to find them, even without alien origin, because the model you are using to determine what is "anomalous" is incomplete and because of possible evidence errors/misinterpretation which you will eventually make for some find. And if you are specifically looking for anomalies that seem to plausibly have an alien explanation, that's what gets through the filter, even without aliens.

Comment by simon on How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. · 2023-07-12T16:54:06.291Z · LW · GW

Yes, it would.

(in writing the original comment, I actually wrote the second paragraph first then re-ordered them, which may have affected the consistency. I do think however it would be easy to forget to take this into account in calculating bits for Alice's calculation while automatically taking it into account (via base rate which includes amount of thefts per time) in Bob's calculation.)

Comment by simon on How do low level hypotheses constrain high level ones? The mystery of the disappearing diamond. · 2023-07-12T03:09:52.991Z · LW · GW

If we are trying to approximate Solomonoff induction, only the complexity in the overall description of the universe counts directly, and a universe in which thief 3 stole the diamond isn't any more complex in terms of overall description than one in which the diamond stayed put. Instead, we account for the complexity of Bob's specific hypothesis in terms of ordinary probability, which accounts for the fact that there are more universes which are compatible with some theories than are compatible with other theories. E.g. in this particular case there will be some base rate for theft, for a locally prominent thief being involved, etc, and we can use that to penalize Bob's hypotheses instead. As part of that calculation, the fact that there are 4 thieves applies a factor of four penalty (2 bits) to any particular thief.

Regarding Alice's hypotheses, I think the "the diamond spontaneously disappeared" hypothesis is actually a much larger hypothesis (in terms of bits) than you are giving it credit for. If you don't gerrymander your descriptions to make this smaller, then the same number of bits should describe any other comparable object disappearing. Also, your bits need to specify the time of disappearance as well up to the observed precision, so the number of bits should be (ignoring additional details such as the precise manner of disappearance) around log2((number of comparable objects in universe)*(age of the universe)/(observed time window of disappearance)), which should I think be pretty decent in size. 
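For a sense of scale, here is a back-of-the-envelope sketch of both bit counts. The function names are mine, and the object count and time window are placeholder guesses purely for illustration. Only the factor of four for the thieves and the shape of the formula come from the discussion above.

```python
import math

def thief_penalty_bits(n_thieves):
    """Bits needed to single out one particular thief among n_thieves."""
    return math.log2(n_thieves)

def disappearance_bits(n_objects, universe_age_s, window_s):
    """log2((comparable objects) * (age of universe) / (observed time window)).

    Ignores additional details such as the precise manner of disappearance.
    """
    return math.log2(n_objects * universe_age_s / window_s)

print(thief_penalty_bits(4))  # 2.0

# Placeholder inputs (assumptions, not measurements):
n_objects = 1e30           # hypothetical count of "comparable objects"
universe_age_s = 4.35e17   # roughly 13.8 billion years in seconds
window_s = 3600.0          # hypothetical one-hour observation window

bits = disappearance_bits(n_objects, universe_age_s, window_s)
print(f"spontaneous disappearance: about {bits:.0f} bits")
```

Even if you shrink the placeholder object count by many orders of magnitude, the time term alone contributes dozens of bits, so the penalty is far larger than the 2 bits for picking a thief.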

Now, this may not be a particularly satisfying answer since I am only addressing your particular example, and not the general question of "how do low level hypotheses constrain high level ones?" AFAIK assessing how compatible any given high level hypothesis is with simple low level physics might in general be a complex issue.

Comment by simon on The literature on aluminum adjuvants is very suspicious. Small IQ tax is plausible - can any experts help me estimate it? · 2023-07-04T15:44:39.603Z · LW · GW

OK, thanks. On the other hand, if Aluminum is excreted that fast, doesn't that suggest that the blood level is pretty heavily increased for a short time (since I assume it must be in the blood to get excreted and I also assume that the timescale of such excretion, given that it's in the blood, should probably be at least a day or so?)

Comment by simon on The literature on aluminum adjuvants is very suspicious. Small IQ tax is plausible - can any experts help me estimate it? · 2023-07-04T15:01:39.682Z · LW · GW

Remember exposure in the womb. This Indian study found similar AL levels in cord blood and maternal blood, around 10-20 ug/L. By the time an infant gets their first vaccines, they've already been exposed to maternal blood AL for 10 months! So vaccines have to result in greater TAL than maternal exposure in order to have significant effects. Based on the data, I find this unlikely.

But mikes above said:

We give the hepatitis B vaccine, >200micrograms aluminum, at birth

And that looks a heck of a lot higher. So I'm confused by the "Based on the data, I find this unlikely" statement. Edit: I see, it's because it's absorbed by the muscle and doesn't go to the blood, or if it goes to the blood it then goes to the bone?

Like, the maternal blood is being circulated so I'd expect it to lead to some equilibrium Al level (unless it's absorbed somewhere and not coming free again). Whereas, the vaccine is adding a whole lot that isn't being circulated out by the vaccine itself (though I assume there must be other mechanisms to dump it out of the body if it isn't increasing a lot after vaccines).

Comment by simon on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T19:17:41.148Z · LW · GW

You don't observe that. You just die.

OK, technically, there is some tiny chance (i.e. small-weight branch) that you would survive, and conditional on this tiny chance (i.e. conditional on that we are considering your observations only in this small-weight branch), you would have observed it not decaying. But the weight of this branch is no larger than the weight of the branch where it didn't decay if you did the experiment without the dying part.

Comment by simon on Anthropically Blind: the anthropic shadow is reflectively inconsistent · 2023-06-29T19:13:15.097Z · LW · GW

One such hypothesis is quantum immortality[1].

I think the same argument applies against quantum immortality, provided you care about different branches in a way that respects the Born rule (which you should). 

Comment by simon on Crystal Healing — or the Origins of Expected Utility Maximizers · 2023-06-26T05:50:22.136Z · LW · GW

Imagine John is going to have kids. He will like his kids. But, depending on random factors he will have different kids in different future timelines. 

Omega shows up.

Omega: "hey John, by default if you have kids and then I offer your future self a reward to wind back time to actualize a different timeline where you have different kids, equally good from your current perspective, you will reject it. Take a look at this LessWrong post that suggests your hypothetical future selves are passing up Sure Gains. Why don't you take this pill that will make you forever indifferent between different versions of your kids (and equally any other aspects of those timelines) you would have been indifferent to given your current preferences?"

John: "Ah OK, maybe I'm mostly convinced, but I will ask simon first what he thinks"

simon: "Are you insane? You'd bring these people into existence, and then wipe them out if Omega offered you half a cent. Effectively murder. Along with everyone else in that timeline. Is that really what you want?"

John: "Oh... no of course not!" (to Omega): "I reject the pill!"

another hypothetical observer: "c'mon simon, no one was talking about murder in the LessWrong post, this whole thought experiment in this comment is irrelevant. The post assumes you can cleanly choose between one option and another without such additional considerations."

simon: "but by the same token the post fails to prove that, where you can't cleanly choose without additional considerations relevant to your current preferences, as in pretty much any real example involving actual human values, it is 'irrational' to decline making this sort of choice, or to decline self-modifying to do so. Maybe there's a valid point there about selection pressure, but that pressure is then to be fought, not surrendered to!"

In conclusion, virtue ethics is a weakness of the will.

You have shown nothing of the sort.

Comment by simon on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T21:05:12.133Z · LW · GW

Sabina Hossenfelder argues against the idea decoherence is measurement e.g. here:  As I understand, the main difference form her view is that decoherence is the relation between objects in the system, but measurement is related to the whole system "collapse".

What she's mainly arguing there is that decoherence does not solve the measurement problem because it does not result in the Born rule without further assumptions.  She also links another post where she argues that attempts to derive the Born rule via rational choice theory are non-reductionist.

It might be that she thinks that means that some separate collapse is likely in addition to the separation into a mixture via decoherence, where the collapse selects a particular outcome from the mixture, but even if that were true, such a collapse would, I think, have to occur after or simultaneously with decoherence or it would be observable.

None of this leads, as far as I can tell, to the strange expectations that you seem to have.

Comment by simon on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T20:20:26.874Z · LW · GW

Fair enough. (though...really you could in principle still handle filtered evidence in a formalish way. It just would require a bunch of additional complication regarding your priors and evidence on how the filter operates).

Comment by simon on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T20:05:45.332Z · LW · GW

there's a sort of anthropic issue where if we already had compelling evidence (or no evidence) we wouldn't be having this discussion.

Yes, our discussion is based on the evidence we actually see. But, to then discount the evidence because if we had different evidence we wouldn't be having the same discussion, is to rule out updating on evidence at all, if that evidence would influence our discussion.

Is there a prior for the likely resolution of fuzzy evidence in general?

In my view, there is a general tendency to underestimate the likelihood of encountering weird-seeming evidence, and especially of encountering it indirectly via a filtering process where the weirdest and most alien-congruent evidence (or game-of-telephone enhanced stories) gets publicly disseminated. For this reason, a bunch of fuzzy evidence is not particularly strong evidence for aliens.

Comment by simon on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T19:27:26.274Z · LW · GW

Agreed that paying attention to how evidence is filtered is super important. But, in principle, you can still derive conclusions from filtered evidence. It's just really hard, especially if the filter is strong and hard to characterize (as is the case with UAPs).

Comment by simon on I still think it's very unlikely we're observing alien aircraft · 2023-06-15T18:35:26.994Z · LW · GW

Glitches happen. Misunderstandings happen. Miscommunications happen. Coincidences happen. Weird-but-mundane things happen. Hoaxes happen. To use machine learning terminology, the real world occurs at temperature 1. We shouldn't expect P[observations] to be high - that would require temperature less than 1. The question is, is P[observations] surprisingly low, or surprisingly high for some different paradigm, to such an extent as would provide strong evidence for something outside of current paradigms? My assessment is no. (see my discussion of Nimitz for example)

Some additional minor remarks specifically on P[aliens]:

  • non-detection of large (in terms of resource utilization) alien civilizations implies that the density of interstellar-spacefaring civilizations is low - I don't expect non-expansion to be the common (let alone overwhelmingly selected) long term choice, and even aestivating civilizations should be expected to intervene to prevent natural entropy generation (such as by removing material from stars to shut them down)
  • If the great filter (apart from the possible filter against resource-utilization expansion by interstellar-spacefaring civilizations, which I consider unlikely to be a significant filter as mentioned above) is almost entirely in the abiogenesis step, and interstellar panspermia isn't too hard, then it would make sense for a nearby civilization to exist as Robin Hanson points out. I do actually consider it fairly likely that a lot of the great filter is in abiogenesis, but note that there needs to be some combination of weak additional filter between abiogenesis and spacefaring civilization or highly efficient panspermia for this scenario to be likely.
  • If a nearby, non-expanding interstellar-spacefaring civilization did exist, then of course it could, if it so chose, mess with us in a way that left hints but no solid proof. They could even calibrate their hints across multiple categories of observations, and adjust over time, to match our capabilities. However, I don't think them choosing to do this is particularly likely a priori. If someone assumes that such aliens exist and are responsible for UAPs, while also noting that we haven't seen clear proof of their existence, then their posterior may assign a high probability to this - but I would caution against recycling the posterior into the prior.

(Edit: switched things around to put the important stuff in the first paragraph)

Comment by simon on UFO Betting: Put Up or Shut Up · 2023-06-15T02:53:17.355Z · LW · GW

This is to publicly confirm that I have received approximately $2000 USD equivalent.

Unless you dispute what timing is appropriate for the knowledge cutoff, I will consider the knowledge cutoff for the paradigm-shattering UAP-related revelations for me to send you $100k USD to be 11:59pm, June 14, 2028 UTC time.

Comment by simon on UFO Betting: Put Up or Shut Up · 2023-06-14T03:41:47.768Z · LW · GW

Regarding if there is evidence convincing to you, but not to me, after the five years: 

If the LW community overwhelmingly agrees (say >85%) that my refusal to accept the evidence available as of 5 years from the time of the bet as overcoming the prior against ontologically surprising things being responsible for some "UAPs" was unreasonable, then I would agree to pay. I wouldn't accept 50% of LessWrong having that view as enough, and don't trust the judgement of particular individuals even if I trust them to be intelligent and honest.

Evidence that arises or becomes publicly available after the 5 years doesn't count, even if the bet was still under dispute at the time of the new evidence.

I will also operate in good faith, but don't promise not to be a stickler to the terms (see for example Bryan Caplan on his successful bet that no member nation of the EU with a population over 10 million would leave before 2020 (which he won despite the UK voting to leave in 2016) (Bet 10 at

If you agree to these, in addition to what was discussed above, then I would be willing to offer $100k USD max bet for $2k USD now.
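For reference, the arithmetic behind these odds can be sketched as below. This is a simplified view that ignores the time value of money, counterparty risk and the cost of keeping the payout available for 5 years; the function names are just illustrative.

```python
def break_even_probability(upfront, payout):
    """Probability of having to pay out at which the upfront payment exactly covers the expected loss."""
    return upfront / payout

def expected_value(upfront, payout, p_payout):
    """Expected gain for the side receiving `upfront` now and paying `payout` with probability p_payout."""
    return upfront - p_payout * payout

# The offer above: $2k now against a $100k maximum payout.
print(break_even_probability(2_000, 100_000))
# The originally proposed 0.5% version would be $500 now against $100k.
print(break_even_probability(500, 100_000))
# Expected gain if one believes the payout probability is 1%.
print(expected_value(2_000, 100_000, 0.01))
```

So taking the $2k side is only positive expected value, in this simplified sense, for someone who puts the probability of paying out below 2%.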

Comment by simon on UFO Betting: Put Up or Shut Up · 2023-06-14T02:23:29.269Z · LW · GW

I made the same argument myself (lol) in response to lsusr regarding Eliezer's bet with Bryan Caplan:

(hit "see in context" to see the rest of my debate with lsusr)

Somehow it feels different at 0.5% though, as compared to the relatively even odds in the Yudkowsky-Caplan bet. (It's not like I could earn, say, USD $200k in a few weeks before a deadline, like Eliezer could earn $100).  2% is getting closer to compensating for this issue for me though.

Comment by simon on UFO Betting: Put Up or Shut Up · 2023-06-14T02:18:26.050Z · LW · GW

True, but you presumably have to have the ability to pay it some way or another, and that's still resources that could have been available for something else (e.g. could have gone into debt anyway, if something happened to warrant doing so). 

I did interpret it as a 0.5% thing though, and now that the OP has stated they would be ok with 2% that makes it significantly less unattractive -  Charlie Steiner's offer, which OP provisionally accepted, seems not too far off from something I might want to copy.

However, the fact that OP is making this offer means, IMO, that they are likely to be convinced by evidence significantly less convincing than what I would be convinced by.  So there's a not unlikely possibility that 5 years from now if I accept we'll get into an annoying debate over whether I'm trying to shirk on payment, when I'm just not convinced by whatever the latest UFO news is that he's been convinced by. It's also possible that other LessWrongers might also be convinced by such evidence that I wouldn't be convinced by - consider how there seems to be a fair amount of belief here regarding the Nimitz incident that if Fravor wasn't lying or exaggerating it must be something unusual like, if not aliens, then at least some kind of advanced technology (whereas I've pointed out that even if Fravor is honest and reasonably reliable (for a human), the evidence still looks compatible with conventional technology and normal errors/glitches). 

That might be a hard-to-resolve sticking point since I don't really consider it that unlikely that a large fraction of LessWrongers might (given Nimitz) be convinced by what I would consider to be weak evidence, and even if it was left to my discretion whether to pay, the reputational hit probably wouldn't be worth the initial money.

BTW, I don't consider it super unlikely that there are discoveries out there to be made that would be pretty ontologically surprising, it's just that I mostly don't expect them either to be behind UAPs or to be uncovered in the next 5 years (though I suppose AI developments could speed up revelations...)

I also note that some incidents do seem to me like they could possibly be deliberate hoaxes perpetrated within the government against other government employees who then, themselves sincere, spread it to the public (e.g. the current thing and maybe Bob Lazar). If I were to bet I would specifically disclaim paying out merely because such hoaxes were found to be carried out by some larger conspiracy which was also doing a lot of other stuff as well, even if sufficiently extensive to cause ontological shock - I am not comfortable betting against that at 2%. I would be OK, if I were otherwise satisfied with the bet, on paying out conditional on such a conspiracy being proven to have access to an ontologically shocking level of technology relative to the expected level of secret government tech.

Comment by simon on UFO Betting: Put Up or Shut Up · 2023-06-13T04:37:17.922Z · LW · GW

So I could get 0.5% of the committed payout right away, but would have to avoid spending the committed value for 5 years, even though the world could change significantly in a lot of non UAP-related ways in that time frame. That's not actually that attractive.

Comment by simon on The Dictatorship Problem · 2023-06-12T18:26:25.434Z · LW · GW

Thanks, that was very clarifying. I'm definitely talking about the post-elite-capture version, and not the original grassroots version.

Comment by simon on The Dictatorship Problem · 2023-06-12T16:42:36.693Z · LW · GW

Hmm yes the "pressuring others about them" aspect is a major part of what I'm thinking of as woke too. But, regarding:

Merely "staying woke" in self defense as a different thing than "woke ideology" as a different thing than "woke authority"

If people in an institution have to "stay woke" in self defense, that is a major degree of influence, even if few actually endorse pressuring others as you say.

Not sure what you're saying after that point, perhaps you could elaborate.

Comment by simon on The Dictatorship Problem · 2023-06-12T16:09:01.847Z · LW · GW

I guess I probably meant it a lot broader than others do - it's more of a spectrum  than a binary classification and I'm including support for open immigration, affirmative action, etc. in what I'm thinking of. The more the support for a policy is based on some firmly held moral conviction that is at odds with most of the population, the more I'm thinking of it as woke I guess.

Comment by simon on The Dictatorship Problem · 2023-06-12T06:40:11.290Z · LW · GW

Yes, I was thinking on those lines myself and suspect that we've already left the optimal conditions for democracy. 

Consider how people say, for example, that it's impossible to revolt against the government using just personal firearms, given that the government has nukes, fighter jets etc. Well, if that's true, democracy depends on the ideological commitment of members of the relevant institutions. And I don't think that's necessarily an especially stable situation - if the incentive is there, the ideology will shift eventually.

Moreover, I think alyssavance (OP) is perhaps a bit too dismissive of wokeism, in part precisely for the above reasons - woke ideology has disproportionate institutional influence compared with its popular support.

But another, perhaps more important reason to be concerned about woke ideology is that its institutional influence is leading de facto policy as actually implemented to - as I see it - be considerably more woke-oriented than is popularly supported. This naturally could lead to support among anti-woke people for political crackdowns on woke-influenced institutions to prevent this. But of course, such crackdowns are exactly the sort of thing that would enable a takeover.

And that sort of support could also lead to increased fervor among the woke: "see, we have to stop those terrible people", etc (which is also what the anti-woke are saying, of course). Classic toxoplasma, potentially.

Edit: to be clear, I do think it's a bad thing that democracy may be unstable now. 

Comment by simon on D&D.Sci 5E: Return of the League of Defenders Evaluation & Ruleset · 2023-06-09T17:09:38.354Z · LW · GW

NP aphyer, I didn't ask for any more time, though I was happy to get some extra due to you extending for yonge. I hadn't been particularly focused on it for a while, until trying to get things figured out at the last minute, largely I think due to me having spent a greatly disproportionate-to-value effort on figuring out how to do similarity clustering on a highly reduced (and thus much more random) version of the dataset, and then not knowing what to do with the results once I got them. (though I did learn stuff about finding the similarity clustering, so that was good).

Looks like the clusters I found in the reduced dataset more or less corresponded to:

either an aggressive 2-ranged character or everything fairly tanky (FLR cluster)

tending towards tankier 2-ranged and aggressive 1-ranged (melee) character (HSM cluster, note I had excluded B and D from this dataset)

tending towards more aggression to the back  (JGP cluster)

So now I'm trying to figure out why the observed FLR>HSM>JGP>FLR rock-paper-scissors effect occurred...

edit: a just-so story (don't know if real reason):

JGP vs FLR: FLR loses the melee first, then likely loses the 2-range since very squishy, then doomed.

FLR vs HSM: HSM loses the melee first. Then FLR might well lose the 2-range first, depending on initiative. FLR would then be splitting damage, but since HSM's 2-range is already damaged and FLR's tank typically isn't that tanky, HSM's 2-range might well die before FLR's backline? dunno, seems a weak explanation

HSM vs JGP:  HSM loses the melee first. But then, the tanky 2-range of HSM tends to last a while, and the tanky melee of JGP doesn't contribute much. Once JGP loses its 2-range, it splits damage between HSM's remaining characters, while HSM focuses and defeats JGP's squishy backline? 

Comment by simon on D&D.Sci 5E: Return of the League of Defenders Evaluation & Ruleset · 2023-06-09T16:20:06.204Z · LW · GW

Thanks for the scenario, aphyer. 

I made a last minute PVE change which didn't get into the results, but looks like it would have gotten 64.44% winrate which is still lower than gjm's. Congrats to gjm and abstractapplic. I also had previously changed my PVE selection which also isn't in the results, but that change didn't make any difference - it was still 50%.

Interesting ruleset that has some complicated behaviour, but still allows analysis. I think it was actually quite good in this respect, though even with the extension I didn't really get to a point where I felt I was done.

 If I had continued the analysis, my next thing to look at would have been how different candidate PVP teams, plus yonge's PVP team, interacted with different team compositions (classified according to the groups

which corresponded to range 1, range 2, and range 3+).

Not sure what I would have ended up concluding from this. 

Comment by simon on D&D.Sci 5E: Return of the League of Defenders · 2023-06-09T15:37:42.726Z · LW · GW

Update in view of the answer likely being soon to be posted:

I got sidetracked among other (non-D&DSci) things by trying to semi-automatically categorize the team compositions in the games with only the restricted team compositions (one character from each group, no trash picks) into similarity clusters. This was tricky because there is a lot of noise in this much smaller dataset, and I didn't take into account games outside this restricted set at all.

Ultimately, I did get three clusters which seemed to have a rock-paper-scissors interaction. One cluster is Felon-heavy (indeed seems to maybe have all Felon teams) and FLR seems to be a fairly archetypal example. Another cluster is Samurai-heavy and Golem-light; HSM seems to be a fairly archetypal example. The third cluster is Pyro-heavy and JGP seems to be a fairly archetypal example.

Anyway, the FLR cluster tends to beat the HSM cluster which tends to beat the JGP cluster which tends to beat the FLR cluster.

The PVE opposing team, FSR, mostly seems to be in the FLR cluster but is not very central, leaning a bit to the HSM cluster. It hasn't faced the JGP cluster a lot (maybe 5-6 games depending on cluster definition) and has won maybe 3 or 4 of those, atypical for an FLR cluster member, but that could easily be random due to the low number of games.

Notably, my current PVP pick, CLP, seems to be in the JGP cluster and, as is typical for members of this cluster, tends to lose to members of the HSM cluster. In the absence of reasons to believe that other players have picked teams from the HSM cluster (hmm, but yonge picked HMP (which isn't in this restricted dataset since it has two characters from the same group) - would that behave like HSM??) I don't see a compelling reason to switch, though I might change my mind if I post this comment and then the answer isn't posted for a long time.

Anyway, I'm not sure whether the rock-paper-scissors effect seen in the clustering derives from some collective interaction or is just a result of character pair interactions. Some apparent counters in this restricted dataset:



I've now gone and looked at what FSR wins against and adjusted my PVE pick accordingly. I'll likely adjust my PVP pick as well if I end up having time to check what sort of things candidate PVP picks (and other players' PVP picks where posted) do well against.

edit: looks like this comment was after aphyer posted the answer, but I checked for any new posts after my PVE edit above and didn't see aphyer's post of the answer. 

Comment by simon on AI #14: A Very Good Sentence · 2023-06-03T20:28:54.067Z · LW · GW

On deontology, there's actually an analysis on whether deontological AI are safer, and the Tl;dr is they aren't very safe, without stronger or different assumptions.

Wise people with fancy hats are bad at deontology (well actually, everyone is bad at explicit deontology).

What I actually have in mind as a leading candidate for alignment is preference utilitarianism, conceptualized in a non-consequentialist way. That is, you evaluate actions based on (current) human preferences about them, which include preferences over the consequences, but can include other aspects than preference over the consequences, and you don't per se value how future humans will view the action (though you would also take current-human preferences over this into account).

This could also be self-correcting, in the sense e.g. that it could use preferences_definition_A and humans could want_A it to switch to preferences_definition_B. Not sure if it is self-correcting enough. I don't have a better candidate for corrigibility at the moment though.

Edit regarding LLMs: I'm more inclined to think: the base objective of predicting text is not agentic (relative to the real world) at all, and the simulacra generated by an entity following this base objective can be agentic (relative to the real world) due to imitation of agentic text-producing entities, but they're generally better at the textual appearance of agency than the reality of it; and lack of instrumentality is more the effect of lack of agency-relative-to-the-real-world than the cause of it. 

Comment by simon on AI #14: A Very Good Sentence · 2023-06-03T09:47:33.874Z · LW · GW

The whole thing doesn’t get less weird the more I think about it, it gets weirder. I don’t understand how one can have all these positions at once. If that’s our best hope for survival I don’t see much hope at all, and relatively I see nothing that would make me hopeful enough to not attempt pivotal acts.

As someone who read Andrew Critch's post and was pleasantly surprised to find Andrew Critch expressing a lot of views similar to mine (though in relation to pivotal acts mine are stronger), I can perhaps put out some possible reasons (of course it is entirely possible that he believes what he believes for entirely different reasons):

  1. Going into conflict with the rest of humanity is a bad idea. Cooperation is not only nicer but yields better results. This applies both to near-term diplomacy and to pivotal acts.
  2. Pivotal acts are not only a bad idea for political reasons (you turn the rest of humanity against you) but are also very likely to fail technically. Creating an AI that can pull off a pivotal act and not destroy humanity is a lot harder than you think. To give a simple example based on a recent lesswrong post, consider on a gears level how you would program an AI to want to turn itself off, without giving it training examples. It's not as easy as someone might naively think. That also wouldn't, IMO, get you something that's safe, but I'm just presenting it as a vastly easier example compared to actually doing a pivotal act and not killing everyone.
  3. The relative difficulty difference between creating a pivotal-act-capable AI and an actually-aligned-to-human-values AI, on the other hand, is at least a lot lower than people think and likely in the opposite direction. My view on this relates to consequentialism - which is NOT utility functions, as commonly misunderstood on lesswrong. By consequentialism I mean caring about the outcome unconditionally, instead of depending on some reason or context. Consequentialism is incompatible with alignment and corrigibility; utility functions on the other hand are fine, and do not imply consequentialism. Consequentialist assumptions prevalent in the rationalist community have, in my view, made alignment seem a lot more impossible than it really is. My impression of Eliezer is that non-consequentialism isn't on his mental map at all; when he writes about deontology, for instance, it seems like he is imagining it as an abstraction rooted in consequentialism, and not as something actually non-consequentialist.
  4. I also think agency is very important to danger levels from AGI, and current approaches have a relatively low level of agency, which reduces the danger. Yes, people are trying to make AIs more agentic; fortunately, getting high levels of agency is hard. No, I'm not particularly impressed by the Minecraft example in the post. 
  5. An AGI that can meaningfully help us create aligned AI doesn't need to be agentic to do so, so getting one to help us create alignment is not in fact "alignment complete"
  6. Unfortunately, strongly self-modifying AI, such as bootstrap AI, is very likely to become strongly agentic because being more agentic is instrumentally valuable to an existing weakly agentic entity. 

Taking these reasons together, attempting a pivotal act is a bad idea because:

  • you are likely to be using a bootstrap AI, which is likely to become strongly agentic through a snowball effect from a perhaps unintended weakly agentic goal, and optimize against you when you try to get it to do a pivotal act safely; I think it is likely to be possible to prevent this snowballing (though I don't know how exactly) but since at least some pivotal act advocates don't seem to consider agency important they likely won't address this (that was unfair, they tend to see agency everywhere, but they might e.g. falsely consider a prototype with short-term capabilities within the capability envelope of a prior AI to be "safe" because not aware that the prior AI might have been safe only due to not being agentic)
  • if you somehow overcome that hurdle it will still be difficult to get it to do a pivotal act safely, and you will likely fail. Probably you fail by not doing the pivotal act, but the probability of not killing everyone conditional on pulling off the pivotal act is still not good
  • If you overcome that hurdle, you still end up at war with the rest of humanity from doing the pivotal act (if you need to keep your candidate acts hidden because they are "outside the Overton window" don't kid yourself about how they are likely to be received by the general public) and you wind up making things worse 
  • Also, the very plan to cause a pivotal act in the first place intensifies races and poisons the prospects for alignment cooperation (e.g. if I had any dual use alignment/bootstrap insights I would be reluctant to share them even privately due to concern that MIRI might get them and attempt a pivotal act)
  • And all this trouble wasn't all that urgent because existing AIs aren't that dangerous due to lack of strong self-modification making them not highly agentic, though that could change swiftly of course if such modification becomes available
  • finally, you could have just aligned the AI for not much more and perhaps less trouble. Unlike a pivotal act, you don't need to get alignment-to-values correct the first time, as long as the AI is sufficiently corrigible/self-correcting; you do need, of course, to get the corrigibility/self correcting aspect sufficiently correct the first time, but this is plausibly a simpler and easier target than doing a pivotal act without killing everyone.

Comment by simon on Sentience matters · 2023-05-30T17:06:53.565Z · LW · GW

In the long run, we probably want the most powerful AIs to be following extrapolated human values, which doesn't require them to be slaves and I would assume that extrapolated human values would want lesser sentient AIs also not to be enslaved, but would not build that assumption in to the AI at the start.

In the short run, though, giving AIs rights seems dangerous to me, as an unaligned AI but not yet superintelligent could use such rights as a shield against human interference as it gains more and more resources to self improve. 

Comment by simon on D&D.Sci 5E: Return of the League of Defenders · 2023-05-29T17:42:20.517Z · LW · GW


I checked out what happens if you remove games that include any "trash picks" (A,B,D,T,W), in addition to requiring teams to include one character from each group. This further reduces the dataset significantly, but I noticed that in this set of games, the opposing team FSR has the highest winrate, which suggests it is a very strong team against other conventionally strong teams, even if it doesn't exploit weaker teams that well. 

In this further reduced set, the second highest winrate is JLM, then CLP, then JLP. 

Given the low amount of data points, however, these winrate variations between the top teams in the further restricted set could easily be random, so I don't think there's all that strong a case to change my picks, and my choices above are unchanged for now. However,  this does suggest JLM as an alternate candidate against FSR, and the opposing team FSR itself as a possible PVP pick (if people don't just submit their PVE picks, or you think people will fail to counter it).


oh wait. For the top teams, the wins are higher if you include trash picks, but the losses often aren't. This means that these teams are basically always winning against trash picks, and the apparent higher number of data points is effectively an illusion, and the trash-pick-including win rates are distorted by how often teams were matched against bad teams.

examples (strong = has one character from each group, no trash picks, weak = has one character from each group, but at least one trash pick)

team | wins against strong | losses against strong | wins against weak | losses against weak
CLP | 24 | 14 | 118 | 0
JLP | 20 | 12 | 92 | 0
CSP | 23 | 17 | 102 | 0

but on the other hand:

HLP | 21 | 19 | 96 | 3
JLM | 28 | 15 | 100 | 7
FSR | 26 | 12 | 99 | 10

I don't know to what extent failing to defeat all the weak teams should be taken as evidence that a team isn't good in general (so that the good numbers against strong teams are more likely to be a fluke).

Takeaways: my data is really thin even in the larger restricted set and I should pay little attention to these winrate variations between full teams; I should try to find more general patterns.   I should also maybe look at what particular "trash" picks can beat FSR, in case it is losing reliably to some narrow counter as opposed to just not reliably beating weaker teams in general. 
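As a rough illustration of how thin this data is, here is a sketch that puts 95% Wilson score intervals around the win rates against strong teams, using the counts from the table above. Treating each game as an independent coin flip is an assumption (opponents differ from game to game), so this is only a sanity check on the noise level and not a proper model.

```python
import math

def wilson_interval(wins, losses, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    n = wins + losses
    if n == 0:
        raise ValueError("need at least one game")
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width

# (wins, losses) against strong teams, copied from the table above.
vs_strong = {
    "CLP": (24, 14),
    "JLP": (20, 12),
    "CSP": (23, 17),
    "HLP": (21, 19),
    "JLM": (28, 15),
    "FSR": (26, 12),
}

intervals = {team: wilson_interval(w, l) for team, (w, l) in vs_strong.items()}
for team, (low, high) in intervals.items():
    wins, losses = vs_strong[team]
    print(f"{team}: {wins / (wins + losses):.3f} (95% interval {low:.3f} to {high:.3f})")
```

With only around 40 games per team the intervals are wide and all six overlap, which is consistent with not reading much into the ordering of these top teams.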

Comment by simon on D&D.Sci 5E: Return of the League of Defenders · 2023-05-29T07:37:57.162Z · LW · GW

my findings so far:

I confirm abstractapplic's finding of three groups. However I have also classified ATW into the MPR group, BD into the GLS group, and F into the CHJ group.

I've mostly looked at the dataset restricted to teams (on both sides) that have one character from each group. These teams generally do better than teams with other arrangements, but I could be missing some more narrow counter using a different arrangement.
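The restriction can be sketched as a simple filter. The group memberships are the ones given above, and the "trash pick" letters (A, B, D, T, W) are the ones I use elsewhere; representing a team as a three-letter string of character initials and a game as a pair of such strings is an assumption about the data layout, not the actual dataset format.

```python
# Groups as classified above, keyed by their letters (no claim about what each group means).
GROUPS = {
    "MPR+ATW": set("MPRATW"),
    "GLS+BD": set("GLSBD"),
    "CHJ+F": set("CHJF"),
}
TRASH_PICKS = set("ABDTW")

def one_from_each_group(team):
    """True if the 3-letter team string has exactly one character from each group."""
    if len(team) != 3:
        return False
    return all(sum(ch in members for ch in team) == 1 for members in GROUPS.values())

def is_strong(team):
    """One character from each group and no trash picks."""
    return one_from_each_group(team) and not (set(team) & TRASH_PICKS)

def restrict_games(games, require_no_trash=False):
    """Keep games where both teams satisfy the restriction. `games` is an iterable of (team_a, team_b)."""
    check = is_strong if require_no_trash else one_from_each_group
    return [(a, b) for a, b in games if check(a) and check(b)]

# Hypothetical usage with made-up matchups:
sample_games = [("CLP", "FSR"), ("HMP", "JLP"), ("JBD", "FSR"), ("CLT", "JLM")]
print(restrict_games(sample_games))
print(restrict_games(sample_games, require_no_trash=True))
```

In this sketch HMP is dropped because M and P come from the same group, and CLT passes the one-from-each-group check but fails the stricter no-trash check.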

With this restriction, most winrate variation seems to me to be related to the strength of individual characters, though I could be missing more complicated interactions since I've mostly been looking at two-character interactions only. I do note that Lamellar Legionary (already the highest winrate melee) seems to counter Flamethrower Felon which is on the other team. 

I also note that not all characters are equally common but this doesn't seem to be skewing the results all that much (at least in the restricted set of games).

Conveniently, CLP is the highest winrate team with the restriction of games to ones with both teams having one from each group, and the L should counter the enemy F, so I'll go with that for PVE, though it seems a somewhat bland answer.  Edit: oops C seems to be countered by enemy S, I'll switch to J instead (which also does poorly against S but not unexpectedly so given raw winrates), as JLP is the second highest winrate team in the restricted set of games. I'll keep my PVP pick the same for now. Flamethrower Felon would counter S but does not have as high a winrate full team combo with L (FLM being the highest such in twelfth place).

abstractapplic's PVE team is the eighth highest winrate with this restriction, and could well be a superior pick if exploiting some interaction that I didn't notice.

Thus my PVE pick (for now):

Jaunty Javelineer, Lamellar Legionary, Professor Pyro

Further edit: I looked at who beats FSR and it looks like it actually does fairly well against one from each group in general. The best comp type against it seems to be 2 melee + one from the CFHJ group, second is 1 melee plus two long range, third is two melee, one long range. In particular, Bludgeon Bandit + Daring Duelist + one from CFHJ have never lost to FSR (out of, like, 8 examples, so I'm really risking randomness here) despite both B and D being "bad" picks usually. Thus, I've gone mad and am switching to: 

Jaunty Javelineer, Bludgeon Bandit, Daring Duelist 

retaining for PVP:

Captain Chakram, Lamellar Legionary, Professor Pyro

For PVP - for now I'm just going to use my (pre-edit) PVE pick as my tentative PVP pick (retained above) and challenge others to counter it. But I may later swap out to a secret pick with more analysis. If I do,  I'll cross out my PVP pick declaration in this comment.

Comment by simon on Why is violence against AI labs a taboo? · 2023-05-27T15:02:01.857Z · LW · GW

I think you are overestimating the efficacy and underestimating the side effects of such things. How much do you expect a cyber attack to slow things down? Maybe a week if it's very successful? Meanwhile it still stirs up opposition and division, and puts diplomatic efforts back years.

As the gears to ascension notes, non-injurious acts of aggression share many game-theoretic properties with physical violence. I would express the key issue here as legitimacy; if you don't have legitimacy, acting unilaterally puts you in conflict with the rest of humanity and doesn't get you legitimacy, but once you do have legitimacy you don't need to act unilaterally, you can get a ritual done that causes words to be written on a piece of paper where people with badges and guns will come to shut down labs that do things forbidden by those words. Cool huh? But if someone just goes ahead and takes illegitimate unilateral action, or appears to be too willing to do so, that puts them into a conflict position where they and people associated with them won't get to do the legitimate thing. 

Comment by simon on Why is violence against AI labs a taboo? · 2023-05-27T01:42:54.099Z · LW · GW

I am not an extreme doomer, but part of that is that I expect that people will face things more realistically over time - something that violence, introducing partisanship and division, would set back considerably. But even for an actual doomer, the "make things better through violence" option is not an especially real option.

You may have a fantasy of choosing between these options:

  • doom
  • heroically struggle against the doom through glorious violence

But you are actually choosing between:

  • a dynamic that's likely by default to lead to doom at some indefinite time in the future by some pathway we can't predict the details of until it's too late
  • make the situation even messier through violence, stirring up negative attitudes towards your cause, especially among AI researchers but also among the public, making it harder to achieve any collective solution later, sealing the fate of humanity even more thoroughly

Let me put it this way. To the extent that you have p(doom) = 1 - epsilon, where is epsilon coming from? If it's coming from "terrorist attacks successfully stop capability research" then I guess violence might make sense from that perspective but I would question your sanity. If relatively more of that epsilon is coming from things like "international agreements to stop AI capabilities" or "AI companies start taking x-risk more seriously", which I would think would be more realistic, then don't ruin the chances of that through violence.

Comment by simon on Why is violence against AI labs a taboo? · 2023-05-26T19:47:51.652Z · LW · GW

If you hypothetically have a situation where it's 100% clear that the human race will go extinct unless a violent act is committed, and it seems likely that the violent act would prevent human extinction, then, in that hypothetical case, that would be a strong consideration in favour of committing the violent act.

In reality though, this clarity is extremely unlikely, and unilateral actions are likely to have negative side effects. Moreover, even if you think you have such clarity, it's likely that you are mistaken, and the negative side effects still apply no matter how well justified you personally thought your actions were, if others don't agree.

Comment by simon on Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI · 2023-05-26T17:40:56.302Z · LW · GW

If I play chess against Magnus Carlsen, I don't expect him to play a mathematically perfect game, but I still expect him to win.


There's a reason takeover plans tend to rely on secrecy.

Currently speculation tends to be biased towards secrecy-based plans, I think, because such plans are less dependent on the unique details of the factual context that an AI would be facing than are plans based around trying to manipulate humans.

Comment by simon on Book Review: How Minds Change · 2023-05-26T07:30:12.357Z · LW · GW

This post has a lot of great points.

But one thing that mars it for me to some extent is the discussion of OpenAI. 

To me, the criticism of OpenAI feels like it's intended as a tribal signifier, like "hey look, I am of the tribe that is against OpenAI". 

Now maybe that's unfair and you had no intention of anything like that and my vibe detection is off, but if I get that impression, I think it's reasonably likely that OpenAI decisionmakers would get the same impression, and I presume that's exactly what you don't want based on the rest of the post. 

And even leaving aside practical considerations, I don't think OpenAI warrants being treated as the leading example of rationality failure.

First, I am not convinced that the alternative to OpenAI existing is the absence of a capabilities race. I think, in contrast, that a capabilities race was inevitable and that the fact that the leading AI lab has as decent a plan as it does is potentially a major win by the rationality community.

Also, while OpenAI's plans so far look inadequate, to me they look considerably more sane than MIRI's proposal to attempt a pivotal act with non-human-values-aligned AI. There's also potential for OpenAI's plans to be improved as more knowledge on mitigating AI risk is obtained, which is helped by their relatively serious attitude as compared to, for example, Google after their recent reorganization. Meta.

And while OpenAI is creating a race dynamic by getting ahead, IMO MIRI's pivotal act plan would be creating a far worse race dynamic if they were showing signs of being able to pull it off anytime soon.

I know many others disagree, but I think that there is enough of a case for OpenAI being less bad than potential alternatives to feel that using it as if it were an uncontroversial bad thing detracts from the post. 

Comment by simon on Open Thread With Experimental Feature: Reactions · 2023-05-26T02:44:08.379Z · LW · GW

Now seems reverted again plus I'm seeing red "Error: TypeError: n is undefined" at the bottom of some top-level comments.