Comment by alexey on Black Box Biology · 2023-12-29T19:58:06.158Z · LW · GW

so the maximum "downside" would be the sum of the differences between that reference populations lives and those without the variant for all variants you edit (plus any effects from off-targets)

I don't think that's true? It has to assume the variants don't interact with each other. Your reference population would only have 0.01% of people with (the rarest) 2 variants at once, 0.0001% with 3 variants, and so on.
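A quick sanity check of the combinatorics, assuming (purely for illustration) that each variant is carried independently by 1% of the reference population:

```python
# Illustrative sketch: if each edited variant is carried independently by
# 1% of the reference population, the fraction of that population carrying
# any specific set of k variants at once shrinks geometrically, so almost
# nobody in the reference population tells you about interaction effects.
freq = 0.01  # assumed per-variant carrier frequency (illustrative)

for k in range(1, 5):
    carriers = freq ** k
    print(f"all of {k} specific variants at once: {carriers * 100:.4g}% of the population")
```

So even with a generous 1% per-variant frequency, essentially no one in the reference population carries more than a couple of the edited variants together.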

Comment by alexey on Extrapolating from Five Words · 2023-12-22T10:33:05.242Z · LW · GW

Yes, but this exact case is the one where you say "This would be useful for trying out different variations on a phrase to see what those small variations change about the implied meaning", and it is exactly where it can be particularly misleading: the LLM is contrasting the phrase with the previous version, which the humans reading/hearing the final version don't know about.

So it would be more useful for that purpose to use a new chat.

Comment by alexey on Extrapolating from Five Words · 2023-12-16T14:23:01.450Z · LW · GW

But the screenshot says "if i instead say the words...". This seems like it has to be in the same chat with the "matters" version.

Comment by alexey on In Defense of Parselmouths · 2023-12-16T11:41:15.631Z · LW · GW

but speak only the truth to other Parselmouths and (by implication) speak only truth to Quakers.

I would merely like to note that the implication seems contrary to the source of the name: I expect Quirrell and most historical Parselmouths in HPMOR would very much lie to Quakers (Quirrell would maybe derive some entertainment from not saying factually false things while misleading them).

Comment by alexey on Who is Sam Bankman-Fried (SBF) really, and how could he have done what he did? - three theories and a lot of evidence · 2023-12-16T11:28:54.815Z · LW · GW

Or to put it another way: in the full post you say

There is some evidence he has higher-than-normal narcissistic traits, and there’s a positive correlation between narcissistic traits and DAE. I think there is more evidence of him having DAE than there is of him having narcissistic traits

but to me it looks like you could have equally replaced DAE with "narcissistic traits" in Theories B and C, and provided the same list of evidence.

(1) Convicted criminals are more likely to have narcissistic traits.

(2) "extreme disregard for protecting his customers" is also evidence for narcissistic traits.

Etc. And then you could repeat the exercise with "sociopathy" and so on.

So there are two possibilities, as far as I can see:

  1. One or more things on the list are in fact not evidence for narcissistic traits.
  2. They are stronger evidence for DAE than for narcissistic traits. 

But it isn't clear which you believe, and about which parts of the list in particular. (With the exception of (4) and (11), of course, but those go in opposite directions.)

Comment by alexey on Who is Sam Bankman-Fried (SBF) really, and how could he have done what he did? - three theories and a lot of evidence · 2023-12-16T11:05:01.043Z · LW · GW

Yes, it's evidence. My question is how strong or weak this evidence is (and my expectation is that it's weak). Your comparison relies on "wet grass is typically substantial evidence for rain".

Comment by alexey on Who is Sam Bankman-Fried (SBF) really, and how could he have done what he did? - three theories and a lot of evidence · 2023-12-11T21:55:50.208Z · LW · GW

Based on the full text:

Some readers may think that this sounds circular: if I’m trying to explain why someone would do what SBF did, how is it valid to use the fact that he did it as a piece of evidence for the explanation? But treating the convictions as evidence for SBF’s DAE is valid in the same way that, if you were trying to explain why the grass is wet, it would be valid to use the fact that the grass is wet as evidence for the hypothesis that it rained recently (since wet grass is typically substantial evidence for rain).

But a lot of your pro-DAE evidence seems to me to fail this test. E.g., OK, he lied to customers and to Congress; why is this substantial evidence of DAE in particular?

oh, FTX doesn’t have a bank account, I guess people can wire to Alameda’s to get money on FTX….3 years later…oh fuck it looks like people wired $8b to Alameda and oh god we basically forgot about the stub account that corresponded to that, and so it was never delivered to FTX.

This seems like evidence in favor of Theory A and against DAE if you look at those as competing explanations? That is, he (is claiming that in this particular case he) commingled funds for reasons unrelated to DAE. 

In November 2022, he also tweeted these statements

It seems likely he believed at that point that if a run could be avoided, he would have enough assets; so making these statements could help most customers, and not making them could hurt most of them, even if it helped a few lucky and quick ones. Not evidence of decreased empathy at all (in my view).

(3) There are multiple sources suggesting that he has a tendency and willingness to lie and deceive others.

Everything under this seems to fail the rain test, at least; very many people have this willingness, most of them don't have DAE (simply based on the prevalence you mention). Is this particular "style" of dishonesty characteristic of DAE?

(4) is actual evidence for DAE, great.

(5) and (10) For the rain test you need to provide a reason to believe most manipulative people have DAE. 


For decreased affective guilt the situation seems to be worse: as far as I can see, no evidence for it is presented, just evidence that there is some reported guilt, and then

In the context of the large amounts of evidence for his lack of affective empathy, it seems more likely that the quote above is an example of cognitive guilt rather than affective guilt.

This seems to require a very large correlation between DAEmpathy and DAGuilt. Why couldn't he have one but not the other?

When I wrote the above, I was just going by your stated definition of DAE. After going to the page you linked (which I should have done earlier), a lot of your evidence seems to cover the facets of psychopathy other than DAE. You could argue they are correlated, but it seems replacing DAE with psychopathy (as defined there) in Theories B and C would make the evidence fit strictly better.

Comment by alexey on Stuxnet, not Skynet: Humanity's disempowerment by AI · 2023-12-04T22:12:32.080Z · LW · GW

I feel like people like Scott Aaronson who are demanding a specific scenario for how AI will actually kill us all... I hypothesize that most scenarios with vastly superhuman AI systems coexisting with humans end in the disempowerment of humans and either human extinction or some form of imprisonment or captivity akin to factory farming

Aaronson in that quote is "demanding a specific scenario" for how GPT-4.5 or GPT-5 in particular will kill us all. Do you believe they will be vastly superhuman?

Comment by alexey on Contra Nora Belrose on Orthogonality Thesis Being Trivial · 2023-11-10T00:23:09.615Z · LW · GW

The quoted section more seems like instrumental convergence than orthogonality to me?

The second part of the sentence, yes. The bolded one seems to acknowledge AIs can have different goals, and I assume that version of EY wouldn't count "God" as a good goal.

Another more relevant part:

Obviously, if the AI is going to be capable of making choices, you need to create an exception to the rules - create a Goal object whose desirability is not calculated by summing up the goals in the justification slot.

Presumably this goal object can be anything.

But in order to accept that, one needs to accept the orthogonality thesis. 

I agree that EY rejected the argument because he accepted OT. I very much disagree that this is the only way to reject the argument. In fact, all four positions seem quite possible:

  1. Accept OT, accept the argument: sure, AIs can have different goals, but this (starting an AI without explicit goals) is how you get an AI which would figure out the meaning of life.
  2. Reject OT, reject the argument: you can think "figure out the meaning of life" is not a possible AI goal.
  3. and 4. EY's positions at different times.


In addition, OT can itself be a reason to charge ahead with creating an AGI: since it says an AGI can have any goal, you "just" need to create an AGI which will improve the world. It says nothing about setting an AGI's goal being difficult.

Comment by alexey on Contra Nora Belrose on Orthogonality Thesis Being Trivial · 2023-11-06T21:44:17.762Z · LW · GW

In fact it seems that the linked argument relies on a version of the orthogonality thesis instead of being refuted by it:

For almost any ultimate goal - joy, truth, God, intelligence, freedom, law - it would be possible to do it better (or faster or more thoroughly or to a larger population) given superintelligence (or nanotechnology or galactic colonization or Apotheosis or surviving the next twenty years).

Nothing about the argument contradicts "the true meaning of life" -- which seems in that argument to be effectively defined as "whatever the AI ends up with as a goal if it starts out without a goal" -- being e.g. paperclips.

Comment by alexey on I compiled a ebook of `Project Lawful` for eBook readers · 2023-10-15T16:35:01.154Z · LW · GW

Is the story currently complete? 

Comment by alexey on Actually, "personal attacks after object-level arguments" is a pretty good rule of epistemic conduct · 2023-10-14T17:06:32.519Z · LW · GW

The issue with the first justification is that no one has actually claimed that the existence of such a rule is obvious or self-evident. Publicly holding a non-obvious belief does not obligate the holder to publicly justify that belief to the satisfaction of the author.

However, Yudkowsky also called the rule "straightforward" and said that

violating it this hugely and explicitly is sufficiently bad news that people should've been wary about this post and hesitated to upvote it for that reason alone

That is, he expected a majority of EA Forum members (at least) to also consider it a "basic rule".

Comment by alexey on Autogynephilia discourse is so absurdly bad on all sides · 2023-08-22T16:11:03.887Z · LW · GW

That right there shows autogynephilia isn't a universal explanation.

Do any prominent pro-AGP people claim it is? Even when I see them described by their opponents, the claim is that there are two clusters of trans women and AGP people are one of them, so aroace trans women could belong to the other cluster without contradicting that theory.

Comment by alexey on Intelligence Officials Say U.S. Has Retrieved Craft of Non-Human Origin · 2023-07-07T09:27:53.388Z · LW · GW

There are similar claims in Russia as well, for what it's worth.

Comment by alexey on "a dialogue with myself concerning eliezer yudkowsky" (not author) · 2023-05-02T13:32:23.698Z · LW · GW

and author intentionally cropped

The author is visible in the next screenshot, unless you meant something else (also, even if he wasn't, the name is part of the URL).

Comment by alexey on There are (probably) no superhuman Go AIs: strong human players beat the strongest AIs · 2023-03-21T23:46:10.535Z · LW · GW

If I were going to play chess against Magnus Carlsen I'd definitely study his games with a computer, and if that computer found a stunning refutation to an opening he liked I'd definitely play it.

Conditional on him continuing to play the opening, I would expect him to have a refutation of that refutation, but no reason to use the counter-refutation in public games against the computer. On the other hand, he may not want to burn it on you either.

Comment by alexey on What fact that you know is true but most people aren't ready to accept it? · 2023-03-05T13:17:10.697Z · LW · GW

is obviously different than what you said, though

To me it doesn't seem to be? "condoned by social consensus" == "isn't broadly condemned by their community" in the original comment. And 

because the "social consensus" is something designed by people, in many cases with the explicit goal of including circles wider than "them and their friends"

doesn't seem to work unless you believe a majority of people are both actively designing the "social consensus" and have this goal; a majority of the people who do design the consensus having this goal is not sufficient.

Comment by alexey on What does it mean for an AGI to be 'safe'? · 2022-11-06T20:25:49.969Z · LW · GW

It's explicitly the second:

But if they can do that with an AGI capable of ending the acute risk period, then they've probably solved most of the alignment problem. Meaning that it should be easy to drive the probability of disaster dramatically lower.

Comment by alexey on Why Do AI researchers Rate the Probability of Doom So Low? · 2022-10-24T18:00:49.106Z · LW · GW

You might have confused "singularity" and "a singleton" (that is, a single AI (or someone using AI) getting control of the world)?

Comment by alexey on Moses and the Class Struggle · 2022-05-01T11:52:38.741Z · LW · GW

Cairo is a problem too, then (it was founded after Arthur lived).

Comment by alexey on DARPA Digital Tutor: Four Months to Total Technical Expertise? · 2020-07-16T08:16:49.455Z · LW · GW

It's also interesting that apparently field experts only did about as well as the traditional students:

Differences between Fleet and ITTC participants were generally smaller and neither consistently positive nor negative.

Does experience not help at all?

Comment by alexey on Leto among the Machines · 2018-10-08T16:32:31.510Z · LW · GW

I don't believe the original novels imply that humanity nearly went extinct and then banded together; that was only in "the junk Herbert's son wrote". Nor that Strong AI was developed only a short time before the Jihad started.

Neither of these is true in the Dune Encyclopedia version, which Frank Herbert at least didn't strongly disapprove of.

There is still some Goodhart's-Law-ing there, to quote

After Jehanne's death, she became a martyr, but her generals continued with exponentially more zeal. Jehanne knew her weaknesses and fears, but her followers did not. The politics of Urania were favored. Around that time, the goals of the Jihad were the destruction of machine technology operating at the expense of human values; but by this point they would have been replaced by indiscriminate slaughter.

Comment by alexey on Intrinsic properties and Eliezer's metaethics · 2017-09-23T12:10:29.102Z · LW · GW

Whereas I can look at a regular triangle and see its ∆-ness from outside the simulation, I cannot do the same (let's suppose) for keys of the right shape to open lock L.

Why suppose this and not the opposite? If you understand L well enough to see immediately whether a key opens it, does this make L-openingness intrinsic, so that intrinsicness/extrinsicness is relative to the observer?

And on the other hand, someone else needs to simulate a ruler to check for ∆-ness, so it is an extrinsic property to him.

Namely, goodness of a state of affairs is something that I can assess myself from outside a simulation of that state.

I certainly would consider this much more difficult than merely checking whether a key opens a lock. I could, after spending enough time, understand the lock well enough for this; but could I do the same for a complete state of affairs, e.g. on Earth?

Comment by alexey on 2017 LessWrong Survey · 2017-09-22T20:34:26.837Z · LW · GW

I've taken the survey.

Comment by alexey on What conservatives and environmentalists agree on · 2017-04-27T15:58:01.526Z · LW · GW

Most leftists ... believe we can all agree on what crops to grow (what social values to have [2])

Whose slogan is "family values", again?

and pull out and burn the weeds of nostalgia, counter-revolution, and the bourgeoisie

Or the weeds of revolution, hippies, and trade unions...

Conservatives view their own society the way environmentalists view the environment: as a complex organism best not lightly tampered with. They're skeptical of the ability of new policies to do what they're supposed to do, especially a whole bunch of new policies all enacted at once.

Bunch of new policies like War on Drugs, for example?

Comment by alexey on Lesswrong 2016 Survey · 2016-04-02T09:14:05.199Z · LW · GW

I've taken the survey.

Comment by alexey on The AI That Pretends To Be Human · 2016-02-13T18:26:06.191Z · LW · GW

Second AI: If I just destroy all humans, I can be very confident any answers I receive will be from AIs!

Comment by alexey on Astronomy, Astrobiology, & The Fermi Paradox I: Introductions, and Space & Time · 2015-07-30T04:43:05.712Z · LW · GW

The amount of line emission from a galaxy is thus a rough proxy for the rate of star formation – the greater the rate of star formation, the larger the number of large stars exciting interstellar gas into emission nebulae... Indeed, their preferred model to which they fit the trend converges towards a finite quantity of stars formed as you integrate total star formation into the future to infinity, with the total number of stars that will ever be born only being 5% larger than the number of stars that have been born at this time.

Is this a good proxy for total star formation, or only large star formation? Is it plausible that while no/few large stars are forming, many dwarfs are?

Comment by alexey on L-zombies! (L-zombies?) · 2014-02-16T07:11:17.449Z · LW · GW

But my point is that at some point, a "static analysis" becomes functionally equivalent to running it. If I do a "static analysis" to find out what the state of the Turing machine will be at each step, I will get exactly the same result (a sequence of states) that I would have gotten if I had run it for "real", and I will have to engage in computation that is, in some sense, equivalent to the computation that the program asks for.

Crucial words here are "at some point". And Benja's original comment (as I understand it) says precisely that Omega doesn't need to get to that point in order to find out with high confidence what Eliezer's reaction to counterfactual mugging would be.

Comment by alexey on L-zombies! (L-zombies?) · 2014-02-10T13:33:03.179Z · LW · GW

Suppose I've seen records of some inputs and outputs of a program: 1->2, 5->10, 100->200. In every case I am aware of, it was given a number as input and output the doubled number. I don't have the program's source or the ability to access the computer it's actually running on. I form a hypothesis: if this program received the input 10000, it would output 20000. Am I running the program?

In this case: doubling program<->Eliezer, inputs<->comments and threads he is answering, outputs<->his replies.
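The analogy can be made concrete with a sketch (the records and the hypothesis function are mine, purely illustrative):

```python
# Observed records of the opaque program's behavior; we have no source
# access and no way to run the program ourselves.
observed = {1: 2, 5: 10, 100: 200}

# Hypothesis inferred from the records: the program doubles its input.
def hypothesis(n):
    return 2 * n

# The hypothesis matches every observation we have...
assert all(hypothesis(x) == y for x, y in observed.items())

# ...so we extrapolate to an input the program was never actually given.
# Nothing here executes the original program; we only reason about it.
print(hypothesis(10000))  # 20000
```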

Comment by alexey on L-zombies! (L-zombies?) · 2014-02-10T13:13:30.335Z · LW · GW

But I can still do static analysis of a Turing machine without running it. E.g. I can determine, in finite time, that a T.M. would never terminate on a given input.
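As a toy illustration of such static analysis (a hypothetical pattern checker over Python source rather than Turing machine tables, and of course not a general termination decider, which the halting problem rules out):

```python
import ast

def trivially_nonterminating(source: str) -> bool:
    """Detect one easy pattern statically: a `while True:` loop whose body
    contains no break or return. For code matching this pattern we have
    proved non-termination without ever executing it."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.While):
            test = node.test
            is_true = isinstance(test, ast.Constant) and test.value is True
            has_exit = any(isinstance(n, (ast.Break, ast.Return))
                           for n in ast.walk(node))
            if is_true and not has_exit:
                return True
    return False

print(trivially_nonterminating("while True:\n    x = 1"))  # True
print(trivially_nonterminating("while True:\n    break"))  # False
```

The checker only inspects the syntax tree; the analyzed code is never run, which is the point of the distinction above.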

Comment by alexey on L-zombies! (L-zombies?) · 2014-02-08T18:59:37.441Z · LW · GW

If I'm figuring out what output a program "would" give "if" it were run, in what sense am I not running it?

In the sense of not producing effects on the outside world actually running it would produce. E.g. given this program

int goodbye_world() {
   launch_nuclear_missiles();
   return 0;
}

I can conclude running it would launch missiles (assuming suitable implementation of the launch_nuclear_missiles function) and output 0 without actually launching the missiles.